Re: [PATCHES] hash index improving v3

2008-09-24 Thread Simon Riggs
On Wed, 2008-09-24 at 12:04 -0400, Bruce Momjian wrote: > Can we consider this hash thread closed/completed? Please -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes

Re: [PATCHES] hash index improving v3

2008-09-24 Thread Bruce Momjian
Can we consider this hash thread closed/completed? --- Tom Lane wrote: > Simon Riggs <[EMAIL PROTECTED]> writes: > > Thinks: Why not just sort all of the time and skip the debate entirely? > > The sort is demonstrably a los

Re: [PATCHES] hash index improving v3

2008-09-23 Thread Simon Riggs
On Tue, 2008-09-23 at 09:13 -0400, Tom Lane wrote: > Simon Riggs <[EMAIL PROTECTED]> writes: > > Thinks: Why not just sort all of the time and skip the debate entirely? > > The sort is demonstrably a loser for smaller indexes. Admittedly, > if the index is small then the sort can't cost all that

Re: [PATCHES] hash index improving v3

2008-09-23 Thread Simon Riggs
On Tue, 2008-09-23 at 09:13 -0400, Tom Lane wrote: > The other side of that coin is that it's not clear this is really worth > arguing about, much less exposing a separate parameter for. Agreed. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent

Re: [PATCHES] hash index improving v3

2008-09-23 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes: > Thinks: Why not just sort all of the time and skip the debate entirely? The sort is demonstrably a loser for smaller indexes. Admittedly, if the index is small then the sort can't cost all that much, but if the (correct) threshold is some large fraction o

Re: [PATCHES] hash index improving v3

2008-09-23 Thread Simon Riggs
On Tue, 2008-09-23 at 08:16 -0400, Tom Lane wrote: > Simon Riggs <[EMAIL PROTECTED]> writes: > > maintenance_work_mem is already used for 3 separate operations that bear > > little resemblance to each other. If it's appropriate for all of those > > then its appropriate for this usage also. > > No

Re: [PATCHES] hash index improving v3

2008-09-23 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes: > maintenance_work_mem is already used for 3 separate operations that bear > little resemblance to each other. If it's appropriate for all of those > then its appropriate for this usage also. No, it isn't. The fundamental point here is that this isn't a mem

Re: [PATCHES] hash index improving v3

2008-09-22 Thread Simon Riggs
On Tue, 2008-09-23 at 00:48 -0400, Tom Lane wrote: > Simon Riggs <[EMAIL PROTECTED]> writes: > > I wasn't very happy with effective_cache_size and not happy with > > shared_buffers either. If building hash indexes is memory critical then > > we just need to say so and encourage others to set memor

Re: [PATCHES] hash index improving v3

2008-09-22 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes: > I wasn't very happy with effective_cache_size and not happy with > shared_buffers either. If building hash indexes is memory critical then > we just need to say so and encourage others to set memory use correctly. > People are already aware that maintenance

Re: [PATCHES] hash index improving v3

2008-09-22 Thread Simon Riggs
On Tue, 2008-09-23 at 00:05 -0400, Tom Lane wrote: > "Jonah H. Harris" <[EMAIL PROTECTED]> writes: > > On Mon, Sep 22, 2008 at 11:25 PM, Alex Hunsaker <[EMAIL PROTECTED]> wrote: > >> On Sun, Sep 14, 2008 at 8:16 PM, Tom Lane <[EMAIL PROTECTED]> wrote: > >>> I'm considering changing hashbuild to sw

Re: [PATCHES] hash index improving v3

2008-09-22 Thread Tom Lane
"Jonah H. Harris" <[EMAIL PROTECTED]> writes: > On Mon, Sep 22, 2008 at 11:25 PM, Alex Hunsaker <[EMAIL PROTECTED]> wrote: >> On Sun, Sep 14, 2008 at 8:16 PM, Tom Lane <[EMAIL PROTECTED]> wrote: >>> I'm considering changing hashbuild to switch over at shared_buffers instead >>> of effective_cache_s

Re: [PATCHES] hash index improving v3

2008-09-22 Thread Alex Hunsaker
On Mon, Sep 22, 2008 at 7:57 PM, Alex Hunsaker <[EMAIL PROTECTED]> wrote: > 50,000,000 rows and 32,768,000 collisions I should mention thats 50 million rows + 32 million more or 62,768,000 rows which explains some of the added index creation time... > index time: > head: 576600.967 ms > v5:

Re: [PATCHES] hash index improving v3

2008-09-22 Thread Jonah H. Harris
On Mon, Sep 22, 2008 at 11:25 PM, Alex Hunsaker <[EMAIL PROTECTED]> wrote: > On Sun, Sep 14, 2008 at 8:16 PM, Tom Lane <[EMAIL PROTECTED]> wrote: >> BTW, one thing I noticed was that the hash index build time for the >> "wide column" case got a lot worse after applying the patch (from 56 to >> 237

Re: [PATCHES] hash index improving v3

2008-09-22 Thread Alex Hunsaker
On Sun, Sep 14, 2008 at 8:16 PM, Tom Lane <[EMAIL PROTECTED]> wrote: > BTW, one thing I noticed was that the hash index build time for the > "wide column" case got a lot worse after applying the patch (from 56 to > 237 sec). The reason for this turned out to be that with the smaller > predicted in

Re: [PATCHES] hash index improving v3

2008-09-22 Thread Alex Hunsaker
On Fri, Sep 12, 2008 at 8:29 AM, Kenneth Marshall <[EMAIL PROTECTED]> wrote: > On Thu, Sep 11, 2008 at 08:51:53PM -0600, Alex Hunsaker wrote: >> On Thu, Sep 11, 2008 at 9:24 AM, Kenneth Marshall <[EMAIL PROTECTED]> wrote: >> > Alex, >> > >> > I meant to check the performance with increasing numbers

Re: [PATCHES] hash index improving v3

2008-09-14 Thread Tom Lane
I did some testing of my own on the hash index patch, and mostly seem to have confirmed Alex's results. I used a table like this: create table tab (f1 serial primary key, f2 some-datatype); create index ind on tab using hash (f2); and populated it with 2 million rows of data; the

Re: [PATCHES] hash index improving v3

2008-09-12 Thread Alex Hunsaker
On Fri, Sep 12, 2008 at 3:14 AM, Zdenek Kotala <[EMAIL PROTECTED]> wrote: > Alex Hunsaker napsal(a): >> >> On Wed, Sep 10, 2008 at 10:27 AM, Alex Hunsaker <[EMAIL PROTECTED]> wrote: >>> >>> On Wed, Sep 10, 2008 at 8:47 AM, Zdenek Kotala <[EMAIL PROTECTED]> >>> wrote: What locale did you u

Re: [PATCHES] hash index improving v3

2008-09-12 Thread Kenneth Marshall
On Thu, Sep 11, 2008 at 08:51:53PM -0600, Alex Hunsaker wrote: > On Thu, Sep 11, 2008 at 9:24 AM, Kenneth Marshall <[EMAIL PROTECTED]> wrote: > > Alex, > > > > I meant to check the performance with increasing numbers of collisions, > > not increasing size of the hashed item. In other words, somethi

Re: [PATCHES] hash index improving v3

2008-09-12 Thread Zdenek Kotala
Alex Hunsaker napsal(a): On Wed, Sep 10, 2008 at 10:27 AM, Alex Hunsaker <[EMAIL PROTECTED]> wrote: On Wed, Sep 10, 2008 at 8:47 AM, Zdenek Kotala <[EMAIL PROTECTED]> wrote: What locale did you use? It would be nice to have also comparing between C and any UTF8 locale. I think we should see big

Re: [PATCHES] hash index improving v3

2008-09-11 Thread Alex Hunsaker
On Thu, Sep 11, 2008 at 9:24 AM, Kenneth Marshall <[EMAIL PROTECTED]> wrote: > Alex, > > I meant to check the performance with increasing numbers of collisions, > not increasing size of the hashed item. In other words, something like > this: > > for ($coll=500; $i<=100; $i=$i*2) { > for ($i=0;

Re: [PATCHES] hash index improving v3

2008-09-11 Thread Kenneth Marshall
On Wed, Sep 10, 2008 at 10:17:31PM -0600, Alex Hunsaker wrote: > On Wed, Sep 10, 2008 at 9:49 PM, Alex Hunsaker <[EMAIL PROTECTED]> wrote: > > On Wed, Sep 10, 2008 at 7:04 AM, Kenneth Marshall <[EMAIL PROTECTED]> wrote: > >> On Tue, Sep 09, 2008 at 07:23:03PM -0600, Alex Hunsaker wrote: > >>> On Tu

Re: [PATCHES] hash index improving v3

2008-09-10 Thread Alex Hunsaker
On Wed, Sep 10, 2008 at 9:49 PM, Alex Hunsaker <[EMAIL PROTECTED]> wrote: > On Wed, Sep 10, 2008 at 7:04 AM, Kenneth Marshall <[EMAIL PROTECTED]> wrote: >> On Tue, Sep 09, 2008 at 07:23:03PM -0600, Alex Hunsaker wrote: >>> On Tue, Sep 9, 2008 at 7:48 AM, Kenneth Marshall <[EMAIL PROTECTED]> wrote:

Re: [PATCHES] hash index improving v3

2008-09-10 Thread Alex Hunsaker
On Wed, Sep 10, 2008 at 7:04 AM, Kenneth Marshall <[EMAIL PROTECTED]> wrote: > On Tue, Sep 09, 2008 at 07:23:03PM -0600, Alex Hunsaker wrote: >> On Tue, Sep 9, 2008 at 7:48 AM, Kenneth Marshall <[EMAIL PROTECTED]> wrote: >> > I think that the glacial speed for generating a big hash index is >> > th

Re: [PATCHES] hash index improving v3

2008-09-10 Thread Alex Hunsaker
On Wed, Sep 10, 2008 at 10:27 AM, Alex Hunsaker <[EMAIL PROTECTED]> wrote: > On Wed, Sep 10, 2008 at 8:47 AM, Zdenek Kotala <[EMAIL PROTECTED]> wrote: >> What locale did you use? It would be nice to have also comparing between C >> and any UTF8 locale. I think we should see big benefit when non C l

Re: [PATCHES] hash index improving v3

2008-09-10 Thread Alex Hunsaker
On Wed, Sep 10, 2008 at 8:47 AM, Zdenek Kotala <[EMAIL PROTECTED]> wrote: > What locale did you use? It would be nice to have also comparing between C > and any UTF8 locale. I think we should see big benefit when non C locale is > used. Err yes this was UTF8, Ill see about doing a C locale. -- S

Re: [PATCHES] hash index improving v3

2008-09-10 Thread Zdenek Kotala
Alex Hunsaker napsal(a): wide: # NOTE not on the same machine as the "narrow" test was run! # spit out 2, 000, 000 random 100 length strings perl gen.pl > data.sql create table test_hash (wide text); copy test_hash from './data.sql'; create index test_hash_num_idx on test_hash using hash (wide

Re: [PATCHES] hash index improving v3

2008-09-10 Thread Kenneth Marshall
On Tue, Sep 09, 2008 at 07:23:03PM -0600, Alex Hunsaker wrote: > On Tue, Sep 9, 2008 at 7:48 AM, Kenneth Marshall <[EMAIL PROTECTED]> wrote: > > I think that the glacial speed for generating a big hash index is > > the same problem that the original code faced. > > Yeah sorry, I was not saying it

Re: [PATCHES] hash index improving v3

2008-09-09 Thread Alex Hunsaker
On Tue, Sep 9, 2008 at 7:23 PM, Alex Hunsaker <[EMAIL PROTECTED]> wrote: > BTW Im still planning on doing a wide vs narrow test... sometime... :) narrow: (exactly the same as what I just did in the other post) create table test_hash(num int8); insert into test_hash (num) select generate_series(1,

Re: [PATCHES] hash index improving v3

2008-09-09 Thread Alex Hunsaker
On Tue, Sep 9, 2008 at 7:23 PM, Alex Hunsaker <[EMAIL PROTECTED]> wrote: > create table test_hash(num int8); > insert into test_hash (num) select generate_series(1, 200); > create index test_hash_num_idx on test_hash (num); > > pgbench -c1 -n -t1 -f bench_index.sql > cvs head: tps = 3161.50

Re: [PATCHES] hash index improving v3

2008-09-09 Thread Alex Hunsaker
On Tue, Sep 9, 2008 at 7:48 AM, Kenneth Marshall <[EMAIL PROTECTED]> wrote: > I think that the glacial speed for generating a big hash index is > the same problem that the original code faced. Yeah sorry, I was not saying it was a new problem with the patch. Err at least not trying to :) *Both* o

Re: [PATCHES] hash index improving v3

2008-09-09 Thread Kenneth Marshall
On Sat, Sep 06, 2008 at 08:23:05PM -0600, Alex Hunsaker wrote: > On Sat, Sep 6, 2008 at 1:09 PM, Tom Lane <[EMAIL PROTECTED]> wrote: > >For the convenience of anyone intending to test, here is an updated > >patch against CVS HEAD that incorporates Alex's fix. > > Here are the results for a table c

Re: [PATCHES] hash index improving v3

2008-09-08 Thread Tom Lane
Zdenek Kotala <[EMAIL PROTECTED]> writes: > I attach cleanup patch which we discussed last commit fest. It introduce new > macros HashGetMetaPage and HashGetBitmap and of course, it break backward on > disk format compatibility which we will break it anyway when Xiao's patch > will > be committ

Re: [PATCHES] hash index improving v3

2008-09-08 Thread Zdenek Kotala
Tom Lane napsal(a): "Alex Hunsaker" <[EMAIL PROTECTED]> writes: On Fri, Sep 5, 2008 at 2:21 PM, Alex Hunsaker <[EMAIL PROTECTED]> wrote: Ok now that I made it so it actually *test* collisions, with the patch it always returns all rows that matched the hashed "key". And here is the fix, we ju

Re: [PATCHES] hash index improving v3

2008-09-08 Thread Zdenek Kotala
It is nice test case. It should be part of hash index regression test. Zdenek Alex Hunsaker napsal(a): Ok now that I made it so it actually *test* collisions, with the patch it always returns all rows that matched the hashed "key". For example (non lobotomized inthash8, I just brute f

Re: [PATCHES] hash index improving v3

2008-09-06 Thread Alex Hunsaker
2008/9/6 Alex Hunsaker <[EMAIL PROTECTED]>: > pgbench -c1 -n -t1000 -f bench_bitmap.sql > cvs head: tps = 24.011871 > v5: tps = 2.543123 oops forgot to attach bench_bitmap.sql bench_bitmap.sql Description: Binary data -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.o

Re: [PATCHES] hash index improving v3

2008-09-06 Thread Alex Hunsaker
On Sat, Sep 6, 2008 at 1:09 PM, Tom Lane <[EMAIL PROTECTED]> wrote: >For the convenience of anyone intending to test, here is an updated >patch against CVS HEAD that incorporates Alex's fix. Here are the results for a table containing 1 million entries that will generate hash collisions. It paint

Re: [PATCHES] hash index improving v3

2008-09-06 Thread Tom Lane
"Alex Hunsaker" <[EMAIL PROTECTED]> writes: > On Fri, Sep 5, 2008 at 2:21 PM, Alex Hunsaker <[EMAIL PROTECTED]> wrote: >> Ok now that I made it so it actually *test* collisions, with the patch >> it always returns all rows that matched the hashed "key". > And here is the fix, we just forget to set

Re: [PATCHES] hash index improving v3

2008-09-06 Thread Tom Lane
"Alex Hunsaker" <[EMAIL PROTECTED]> writes: > - tbm_add_tuples(tbm, &scan->xs_ctup.t_self, 1, false); > + tbm_add_tuples(tbm, &scan->xs_ctup.t_self, 1, true); Hah, I bet that explains Jonah's complaint that recheck didn't seem to be happening in his tests.

Re: [PATCHES] hash index improving v3

2008-09-05 Thread Alex Hunsaker
] On Fri, Sep 5, 2008 at 2:21 PM, Alex Hunsaker <[EMAIL PROTECTED]> wrote: > Ok now that I made it so it actually *test* collisions, with the patch > it always returns all rows that matched the hashed "key". And here is the fix, we just forget to set the recheck flag for bitmap scans. *** a/src/

Re: [PATCHES] hash index improving v3

2008-09-05 Thread Alex Hunsaker
Ok now that I made it so it actually *test* collisions, with the patch it always returns all rows that matched the hashed "key". For example (non lobotomized inthash8, I just brute forced some collisions for 0 so that anyone can try this) create table test_hash (num int8); insert into test_hash (

Re: [PATCHES] hash index improving v3

2008-09-05 Thread Alex Hunsaker
On Thu, Sep 4, 2008 at 9:48 PM, Alex Hunsaker <[EMAIL PROTECTED]> wrote: > (obviously on a 64 bit machine) > int main(void) > { >unsigned long y = 0; >unsigned cnt = 0; > >printf("insert into test_hash (num) values "); > >//while(cnt != LONG_MAX/UINT_MAX) >wh

Re: [PATCHES] hash index improving v3

2008-09-04 Thread Zdenek Kotala
Alex Hunsaker napsal(a): On Thu, Sep 4, 2008 at 9:48 PM, Alex Hunsaker <[EMAIL PROTECTED]> wrote: Ok here are the results: (data generated from the c program before) select count(1) from test_hash; count --- 10011 create index test_hash_num_idx on test_hash using hash (num); CV

Re: [PATCHES] hash index improving v3

2008-09-04 Thread Alex Hunsaker
On Thu, Sep 4, 2008 at 9:48 PM, Alex Hunsaker <[EMAIL PROTECTED]> wrote: Ok here are the results: (data generated from the c program before) select count(1) from test_hash; count --- 10011 create index test_hash_num_idx on test_hash using hash (num); CVS: Time: 698065.180 ms patc

Re: [PATCHES] hash index improving v3

2008-09-04 Thread Alex Hunsaker
On Thu, Sep 4, 2008 at 8:17 PM, Tom Lane <[EMAIL PROTECTED]> wrote: > I guess one thing we could do for testing purposes is lobotomize one of > the datatype-specific hash functions. For instance, make int8_hash > return the input mod 2^32, ignoring the upper bytes. Then it'd be easy > to compute

Re: [PATCHES] hash index improving v3

2008-09-04 Thread Tom Lane
"Alex Hunsaker" <[EMAIL PROTECTED]> writes: > On Thu, Sep 4, 2008 at 7:13 PM, Tom Lane <[EMAIL PROTECTED]> wrote: >> * check that the queries actually use the indexes (not sure that the >> proposed switch settings ensure this, not to mention you didn't create >> the indexes) > Well I was assuming

Re: [PATCHES] hash index improving v3

2008-09-04 Thread Tom Lane
"Alex Hunsaker" <[EMAIL PROTECTED]> writes: > On Thu, Sep 4, 2008 at 7:45 PM, Tom Lane <[EMAIL PROTECTED]> wrote: >> So what we need for testing is a few different key values that hash to >> the same code. Not sure about an easy way to find such. > Hrm, well I have not really looked at the hash a

Re: [PATCHES] hash index improving v3

2008-09-04 Thread Alex Hunsaker
On Thu, Sep 4, 2008 at 7:45 PM, Tom Lane <[EMAIL PROTECTED]> wrote: > So what we need for testing is a few different key values that hash to > the same code. Not sure about an easy way to find such. Hrm, well I have not really looked at the hash algorithm but I assume we could just reduce the num

Re: [PATCHES] hash index improving v3

2008-09-04 Thread Alex Hunsaker
On Thu, Sep 4, 2008 at 7:13 PM, Tom Lane <[EMAIL PROTECTED]> wrote: > "Alex Hunsaker" <[EMAIL PROTECTED]> writes: >> Ok let me know if this is to naive of an approach or not hitting the >> right cases you want tested. > > You have the unique-versus-not dimension, but I'm also wondering about > narr

Re: [PATCHES] hash index improving v3

2008-09-04 Thread Tom Lane
I wrote: > You have the unique-versus-not dimension, On second thought, actually not. What we want to look at is the penalty for false matches due to *distinct* key values that happen to have the same hash codes. Your test case for all-the-same is using all the same key values, which means it'll

Re: [PATCHES] hash index improving v3

2008-09-04 Thread Tom Lane
"Alex Hunsaker" <[EMAIL PROTECTED]> writes: > Ok let me know if this is to naive of an approach or not hitting the > right cases you want tested. You have the unique-versus-not dimension, but I'm also wondering about narrow vs wide index keys (say about 8 bytes vs 50-100 or so). In the former cas

Re: [PATCHES] hash index improving v3

2008-09-04 Thread Alex Hunsaker
On Thu, Sep 4, 2008 at 5:11 PM, Tom Lane <[EMAIL PROTECTED]> wrote: > So my thinking right now is that we should just test this patch as-is. > If it doesn't show really horrid performance when there are lots of > hash key collisions, we should forget the store-both-things idea and > just go with th

Re: [PATCHES] hash index improving v3

2008-09-04 Thread Tom Lane
Here is an updated patch incorporating Zdenek's review, my own observation that we should make the index tupledesc tell the truth, and some other fixes/improvements such as making backwards scans work as expected. The main thing lacking before this could be committed, from a code standpoint, is a

Re: [PATCHES] hash index improving v3

2008-09-04 Thread Tom Lane
Zdenek Kotala <[EMAIL PROTECTED]> writes: > pgsql/src/backend/access/hash/hashutil.c > > It would be better remove #define from hash.h and setup it there > directly. Actually, I don't like this aspect of the patch one bit: it means that the

Re: [PATCHES] hash index improving v3

2008-09-04 Thread Tom Lane
Zdenek Kotala <[EMAIL PROTECTED]> writes: > I performed code review and see my comments. Thanks for the comments. I've incorporated all of these into an updated patch that I'm preparing, except for > Why not define new datatype for example HashKey instead of uint32? This seems like a good

Re: [PATCHES] hash index improving v3

2008-09-04 Thread Zdenek Kotala
I performed code review and see my comments. pgsql/src/backend/access/hash/hashpage.c use sizeof() or something better the 4. pgsql/src/backend/access/hash/hashpage.c New empty line

Re: [PATCHES] hash index improving v3

2008-09-03 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes: > Right now it seems strange that the index is larger than a btree, yet > the performance tests show that 3 times as much I/O was used accessing > the btree. Well, in an ideal world a hash index probe is O(1) while a btree probe is O(log N), so that result i

Re: [PATCHES] hash index improving v3

2008-09-03 Thread Jonah H. Harris
On Wed, Sep 3, 2008 at 10:06 PM, Simon Riggs <[EMAIL PROTECTED]> wrote: >> It seems hash index is a little better on index creation and >> selection. >> But maybe it's in the range of noise, I'm not sure. >> I'd like to try it with a bigger dataset (e.g. table with 10GB) but >> there is not enough

Re: [PATCHES] hash index improving v3

2008-09-03 Thread Simon Riggs
On Mon, 2008-08-18 at 09:46 +0800, Xiao Meng wrote: > There's minor change against the previous > one( http://archives.postgresql.org/pgsql-hackers/2008-07/msg01183.php ). > * merge branch master(Aug 16) into the patch > * clean code and make some comment > Performance result is here > http://wik

Re: [PATCHES] hash index improving v3

2008-08-19 Thread Xiao Meng
With the help of David Fetter, you can get the copy by git clone http://git.postgresql.org/git/~davidfetter/hash/.git It's in the branch gsoc-hash. Thank you, David. -- Best Regards, Xiao Meng DKERC, Harbin Institute of Technology, China G

[PATCHES] hash index improving v3

2008-08-17 Thread Xiao Meng
There's minor change against the previous one( http://archives.postgresql.org/pgsql-hackers/2008-07/msg01183.php ). * merge branch master(Aug 16) into the patch * clean code and make some comment Performance result is here http://wiki.postgresql.org/wiki/Gsoc08-hashindex It seems hash index is a l