Re: [HACKERS] [PATCH]-hash index improving

2008-07-23 Thread Simon Riggs
On Wed, 2008-07-23 at 10:57 +0800, Xiao Meng wrote: Well, I'll do it after I finish my second patch. Hash index should be more efficient than btree when N is big enough. It seems meaningful to find how big N is in an experiment way. Agreed. We should also examine the basic thinking of the

Re: [HACKERS] [PATCH]-hash index improving

2008-07-23 Thread Kenneth Marshall
] [PATCH]-hash index improving Well, I'll do it after I finish my second patch. Hash index should be more efficient than btree when N is big enough. It seems meaningful to find how big N is in an experiment way. The savings will depend on many factors. Another thing (besides volume

Re: [HACKERS] [PATCH]-hash index improving

2008-07-22 Thread Xiao Meng
I'm sorry for delay reply. I couldn't get access to the internet these days for some reason. I do apologize for my rough work and very bad readability. I posted it in a hurry and I didn't mean to cause the reader so much inconvenience. I'll NEVER make such a mistake again. Currently, I've made

Re: [HACKERS] [PATCH]-hash index improving

2008-07-22 Thread Xiao Meng
Well, I'll do it after I finish my second patch. Hash index should be more efficient than btree when N is big enough. It seems meaningful to find how big N is in an experiment way. On Fri, Jul 18, 2008 at 6:35 PM, Simon Riggs [EMAIL PROTECTED] wrote: On Fri, 2008-07-18 at 11:07 +0100, Gregory

Re: [HACKERS] [PATCH]-hash index improving

2008-07-22 Thread Dann Corbit
-Original Message- From: [EMAIL PROTECTED] [mailto:pgsql-hackers- [EMAIL PROTECTED] On Behalf Of Xiao Meng Sent: Tuesday, July 22, 2008 7:57 PM To: Simon Riggs Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] [PATCH]-hash index improving Well, I'll do it after I finish my

Re: [HACKERS] [PATCH]-hash index improving

2008-07-18 Thread Simon Riggs
On Thu, 2008-07-17 at 16:37 -0700, Dann Corbit wrote: Large table unique index equality search should be very fast with hashed index (and the only place where any advantage will be seen). Hashed indexes are useless for any search besides equality and gain more and more when the levels of

Re: [HACKERS] [PATCH]-hash index improving

2008-07-18 Thread Gregory Stark
Simon Riggs [EMAIL PROTECTED] writes: On Thu, 2008-07-17 at 16:37 -0700, Dann Corbit wrote: Large table unique index equality search should be very fast with hashed index (and the only place where any advantage will be seen). Hashed indexes are useless for any search besides equality and

Re: [HACKERS] [PATCH]-hash index improving

2008-07-18 Thread Simon Riggs
On Fri, 2008-07-18 at 11:07 +0100, Gregory Stark wrote: Simon Riggs [EMAIL PROTECTED] writes: hash lookups can in theory be O(1). I'm not sure whether that applies here? I'm interested in how *this* patch will work, not in more generic algorithm theory. To patch authors: Can we please see a

Re: [HACKERS] [PATCH]-hash index improving

2008-07-18 Thread Alvaro Herrera
Gregory Stark escribió: For cpu-bound databases with small indexes there might be a win if you can avoid the binary search of all the elements on a page. (Have we modified btree to do that or does it still scan sequentially on the leaf pages?) Hmm? It has used binary search since as long as

Re: [HACKERS] [PATCH]-hash index improving

2008-07-18 Thread Heikki Linnakangas
Gregory Stark wrote: For i/o-bound databases with very large indexes there should be an opportunity where btree lookups are O(logn) and hash lookups can in theory be O(1). Ignoring the big-O complexity, if a hash index only stores a 32-bit hash code instead of the whole key, it could be a big

Re: [HACKERS] [PATCH]-hash index improving

2008-07-18 Thread Gregory Stark
Heikki Linnakangas [EMAIL PROTECTED] writes: Gregory Stark wrote: For i/o-bound databases with very large indexes there should be an opportunity where btree lookups are O(logn) and hash lookups can in theory be O(1). Ignoring the big-O complexity, if a hash index only stores a 32-bit hash

Re: [HACKERS] [PATCH]-hash index improving

2008-07-18 Thread Jonah H. Harris
On Fri, Jul 18, 2008 at 10:44 AM, Heikki Linnakangas [EMAIL PROTECTED] wrote: Ignoring the big-O complexity, if a hash index only stores a 32-bit hash code instead of the whole key, it could be a big win in storage size, and therefore in cache-efficiency and performance, when the keys are very

Re: [HACKERS] [PATCH]-hash index improving

2008-07-18 Thread Tom Lane
Jonah H. Harris [EMAIL PROTECTED] writes: Agreed. My thinking is that there's either something inherently wrong with the implementation, or we're performing so many disk I/Os that it's nearly equivalent to b-tree. Tom has a couple suggestions which Xiao and I will explore. I finally got a

Re: [HACKERS] [PATCH]-hash index improving

2008-07-18 Thread Kenneth Marshall
I just ran my original 16M word test case against the patched version, and like Tom noted below, the tuples per bucket calculation is wrong which results in identical index sizes for both the original version and the hash-value-only version. I suppose that the main point of #1 is to reduce index

Re: [HACKERS] [PATCH]-hash index improving

2008-07-18 Thread Kenneth Marshall
FYI, I just patched the fill-factor calculation and re-ran my test. The index size dropped from 513M to 43M which is the same disk footprint as the corresponding btree index. Have a nice weekend. Ken On Fri, Jul 18, 2008 at 12:23:14PM -0500, Kenneth Marshall wrote: I just ran my original 16M

[HACKERS] [PATCH]-hash index improving

2008-07-17 Thread Xiao Meng
The patch store hash code only in the index tuple. It based on Neil Conway's patch with an old version of PostgreSQL. It passes the regression test but I didn't test the performance yet. Anyone interested can make a performance test;-) You can undefine the macro HASHVALUE_ONLY in hash.h to get

Re: [HACKERS] [PATCH]-hash index improving

2008-07-17 Thread Jonah H. Harris
On Thu, Jul 17, 2008 at 5:26 AM, Xiao Meng [EMAIL PROTECTED] wrote: The patch store hash code only in the index tuple. It based on Neil Conway's patch with an old version of PostgreSQL. It passes the regression test but I didn't test the performance yet. Anyone interested can make a

Re: [HACKERS] [PATCH]-hash index improving

2008-07-17 Thread Alvaro Herrera
Xiao Meng escribió: The patch store hash code only in the index tuple. It based on Neil Conway's patch with an old version of PostgreSQL. It passes the regression test but I didn't test the performance yet. Anyone interested can make a performance test;-) You can undefine the macro

Re: [HACKERS] [PATCH]-hash index improving

2008-07-17 Thread Jonah H. Harris
On Thu, Jul 17, 2008 at 12:42 PM, Alvaro Herrera [EMAIL PROTECTED] wrote: I think having the HASHVALUE_ONLY define is not a good idea -- it just makes the patch harder to read. I suggest just removing the old code and putting the new code in place. (That's why we have revision control.)

Re: [HACKERS] [PATCH]-hash index improving

2008-07-17 Thread Kenneth Marshall
On Thu, Jul 17, 2008 at 12:42:39PM -0400, Alvaro Herrera wrote: Xiao Meng escribi?: The patch store hash code only in the index tuple. It based on Neil Conway's patch with an old version of PostgreSQL. It passes the regression test but I didn't test the performance yet. Anyone interested

Re: [HACKERS] [PATCH]-hash index improving

2008-07-17 Thread Kenneth Marshall
On Thu, Jul 17, 2008 at 02:00:07PM -0400, Jonah H. Harris wrote: On Thu, Jul 17, 2008 at 1:54 PM, Kenneth Marshall [EMAIL PROTECTED] wrote: I think having the HASHVALUE_ONLY define is not a good idea -- it just makes the patch harder to read. I suggest just removing the old code and

Re: [HACKERS] [PATCH]-hash index improving

2008-07-17 Thread Jonah H. Harris
On Thu, Jul 17, 2008 at 1:54 PM, Kenneth Marshall [EMAIL PROTECTED] wrote: I think having the HASHVALUE_ONLY define is not a good idea -- it just makes the patch harder to read. I suggest just removing the old code and putting the new code in place. (That's why we have revision control.)

Re: [HACKERS] [PATCH]-hash index improving

2008-07-17 Thread Alvaro Herrera
Kenneth Marshall escribió: On Thu, Jul 17, 2008 at 12:42:39PM -0400, Alvaro Herrera wrote: I think having the HASHVALUE_ONLY define is not a good idea -- it just makes the patch harder to read. I suggest just removing the old code and putting the new code in place. (That's why we have

Re: [HACKERS] [PATCH]-hash index improving

2008-07-17 Thread Jonah H. Harris
On Thu, Jul 17, 2008 at 5:26 AM, Xiao Meng [EMAIL PROTECTED] wrote: The patch store hash code only in the index tuple. It based on Neil Conway's patch with an old version of PostgreSQL. It passes the regression test but I didn't test the performance yet. Anyone interested can make a

Re: [HACKERS] [PATCH]-hash index improving

2008-07-17 Thread Kenneth Marshall
On Thu, Jul 17, 2008 at 04:24:28PM -0400, Jonah H. Harris wrote: On Thu, Jul 17, 2008 at 5:26 AM, Xiao Meng [EMAIL PROTECTED] wrote: The patch store hash code only in the index tuple. It based on Neil Conway's patch with an old version of PostgreSQL. It passes the regression test but I

Re: [HACKERS] [PATCH]-hash index improving

2008-07-17 Thread Simon Riggs
On Thu, 2008-07-17 at 16:24 -0400, Jonah H. Harris wrote: I'm not really seeing any performance improvements over b-tree. I'd like to see a theoretical analysis on the benefit case before we spend too many teraflops on investigation. In which cases do we expect that hash indexes will beat

Re: [HACKERS] [PATCH]-hash index improving

2008-07-17 Thread Dann Corbit
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Simon Riggs Sent: Thursday, July 17, 2008 4:10 PM To: Jonah H. Harris Cc: Xiao Meng; pgsql-hackers@postgresql.org; Kenneth Marshall Subject: Re: [HACKERS] [PATCH]-hash index improving On Thu

Re: [HACKERS] [PATCH]-hash index improving

2008-07-17 Thread David Fetter
On Thu, Jul 17, 2008 at 02:11:20PM -0400, Alvaro Herrera wrote: Kenneth Marshall escribió: On Thu, Jul 17, 2008 at 12:42:39PM -0400, Alvaro Herrera wrote: I think having the HASHVALUE_ONLY define is not a good idea -- it just makes the patch harder to read. I suggest just removing

Re: [HACKERS] [PATCH]-hash index improving

2008-07-17 Thread Jonah H. Harris
On Thu, Jul 17, 2008 at 7:37 PM, Dann Corbit [EMAIL PROTECTED] wrote: In which cases do we expect that hash indexes will beat btrees? Large table unique index equality search should be very fast with hashed index (and the only place where any advantage will be seen). Yes, this is the exact

Re: [HACKERS] [PATCH]-hash index improving

2008-07-17 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes: Xiao Meng escribió: You can undefine the macro HASHVALUE_ONLY in hash.h to get the original implementation. I think having the HASHVALUE_ONLY define is not a good idea -- it just makes the patch harder to read. While we are griping about readability:

Re: [HACKERS] [PATCH]-hash index improving

2008-07-17 Thread Jonah H. Harris
On Fri, Jul 18, 2008 at 1:00 AM, Tom Lane [EMAIL PROTECTED] wrote: While we are griping about readability: failure to update the comments to match the code is NOT, NOT, NOT acceptable. I had barely started to read the patch before encountering this insult to the reader: ... As this is Xiao's

Re: [HACKERS] [PATCH]-hash index improving

2008-07-17 Thread Tom Lane
Jonah H. Harris [EMAIL PROTECTED] writes: The real question now has been listed above; why are hash index queries, including this patch, no better than b-tree even for straight equality comparisons? Well, the theoretical advantage is that a hash probe is O(1) while a btree probe is O(log N)