Re: [PATCHES] hash index improving v3

2008-09-24 Thread Bruce Momjian
Can we consider this hash thread closed/completed? --- Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: Thinks: Why not just sort all of the time and skip the debate entirely? The sort is demonstrably a loser for

Re: [PATCHES] hash index improving v3

2008-09-24 Thread Simon Riggs
On Wed, 2008-09-24 at 12:04 -0400, Bruce Momjian wrote: Can we consider this hash thread closed/completed? Please -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes

Re: [PATCHES] hash index improving v3

2008-09-23 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes: maintenance_work_mem is already used for 3 separate operations that bear little resemblance to each other. If it's appropriate for all of those then its appropriate for this usage also. No, it isn't. The fundamental point here is that this isn't a memory

Re: [PATCHES] hash index improving v3

2008-09-23 Thread Simon Riggs
On Tue, 2008-09-23 at 08:16 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: maintenance_work_mem is already used for 3 separate operations that bear little resemblance to each other. If it's appropriate for all of those then its appropriate for this usage also. No, it

Re: [PATCHES] hash index improving v3

2008-09-23 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes: Thinks: Why not just sort all of the time and skip the debate entirely? The sort is demonstrably a loser for smaller indexes. Admittedly, if the index is small then the sort can't cost all that much, but if the (correct) threshold is some large fraction of

Re: [PATCHES] hash index improving v3

2008-09-23 Thread Simon Riggs
On Tue, 2008-09-23 at 09:13 -0400, Tom Lane wrote: The other side of that coin is that it's not clear this is really worth arguing about, much less exposing a separate parameter for. Agreed. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent

Re: [PATCHES] hash index improving v3

2008-09-22 Thread Alex Hunsaker
On Fri, Sep 12, 2008 at 8:29 AM, Kenneth Marshall [EMAIL PROTECTED] wrote: On Thu, Sep 11, 2008 at 08:51:53PM -0600, Alex Hunsaker wrote: On Thu, Sep 11, 2008 at 9:24 AM, Kenneth Marshall [EMAIL PROTECTED] wrote: Alex, I meant to check the performance with increasing numbers of collisions,

Re: [PATCHES] hash index improving v3

2008-09-22 Thread Alex Hunsaker
On Sun, Sep 14, 2008 at 8:16 PM, Tom Lane [EMAIL PROTECTED] wrote: BTW, one thing I noticed was that the hash index build time for the wide column case got a lot worse after applying the patch (from 56 to 237 sec). The reason for this turned out to be that with the smaller predicted index

Re: [PATCHES] hash index improving v3

2008-09-22 Thread Alex Hunsaker
On Mon, Sep 22, 2008 at 7:57 PM, Alex Hunsaker [EMAIL PROTECTED] wrote: 50,000,000 rows and 32,768,000 collisions I should mention thats 50 million rows + 32 million more or 62,768,000 rows which explains some of the added index creation time... index time: head: 576600.967 ms v5:

Re: [PATCHES] hash index improving v3

2008-09-22 Thread Tom Lane
Jonah H. Harris [EMAIL PROTECTED] writes: On Mon, Sep 22, 2008 at 11:25 PM, Alex Hunsaker [EMAIL PROTECTED] wrote: On Sun, Sep 14, 2008 at 8:16 PM, Tom Lane [EMAIL PROTECTED] wrote: I'm considering changing hashbuild to switch over at shared_buffers instead of effective_cache_size --- any

Re: [PATCHES] hash index improving v3

2008-09-22 Thread Simon Riggs
On Tue, 2008-09-23 at 00:05 -0400, Tom Lane wrote: Jonah H. Harris [EMAIL PROTECTED] writes: On Mon, Sep 22, 2008 at 11:25 PM, Alex Hunsaker [EMAIL PROTECTED] wrote: On Sun, Sep 14, 2008 at 8:16 PM, Tom Lane [EMAIL PROTECTED] wrote: I'm considering changing hashbuild to switch over at

Re: [PATCHES] hash index improving v3

2008-09-22 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes: I wasn't very happy with effective_cache_size and not happy with shared_buffers either. If building hash indexes is memory critical then we just need to say so and encourage others to set memory use correctly. People are already aware that

Re: [PATCHES] hash index improving v3

2008-09-22 Thread Simon Riggs
On Tue, 2008-09-23 at 00:48 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: I wasn't very happy with effective_cache_size and not happy with shared_buffers either. If building hash indexes is memory critical then we just need to say so and encourage others to set memory use

Re: [PATCHES] hash index improving v3

2008-09-14 Thread Tom Lane
I did some testing of my own on the hash index patch, and mostly seem to have confirmed Alex's results. I used a table like this: create table tab (f1 serial primary key, f2 some-datatype); create index ind on tab using hash (f2); and populated it with 2 million rows of data;

Re: [PATCHES] hash index improving v3

2008-09-12 Thread Zdenek Kotala
Alex Hunsaker napsal(a): On Wed, Sep 10, 2008 at 10:27 AM, Alex Hunsaker [EMAIL PROTECTED] wrote: On Wed, Sep 10, 2008 at 8:47 AM, Zdenek Kotala [EMAIL PROTECTED] wrote: What locale did you use? It would be nice to have also comparing between C and any UTF8 locale. I think we should see big

Re: [PATCHES] hash index improving v3

2008-09-12 Thread Kenneth Marshall
On Thu, Sep 11, 2008 at 08:51:53PM -0600, Alex Hunsaker wrote: On Thu, Sep 11, 2008 at 9:24 AM, Kenneth Marshall [EMAIL PROTECTED] wrote: Alex, I meant to check the performance with increasing numbers of collisions, not increasing size of the hashed item. In other words, something like

Re: [PATCHES] hash index improving v3

2008-09-12 Thread Alex Hunsaker
On Fri, Sep 12, 2008 at 3:14 AM, Zdenek Kotala [EMAIL PROTECTED] wrote: Alex Hunsaker napsal(a): On Wed, Sep 10, 2008 at 10:27 AM, Alex Hunsaker [EMAIL PROTECTED] wrote: On Wed, Sep 10, 2008 at 8:47 AM, Zdenek Kotala [EMAIL PROTECTED] wrote: What locale did you use? It would be nice to

Re: [PATCHES] hash index improving v3

2008-09-11 Thread Kenneth Marshall
On Wed, Sep 10, 2008 at 10:17:31PM -0600, Alex Hunsaker wrote: On Wed, Sep 10, 2008 at 9:49 PM, Alex Hunsaker [EMAIL PROTECTED] wrote: On Wed, Sep 10, 2008 at 7:04 AM, Kenneth Marshall [EMAIL PROTECTED] wrote: On Tue, Sep 09, 2008 at 07:23:03PM -0600, Alex Hunsaker wrote: On Tue, Sep 9,

Re: [PATCHES] hash index improving v3

2008-09-11 Thread Alex Hunsaker
On Thu, Sep 11, 2008 at 9:24 AM, Kenneth Marshall [EMAIL PROTECTED] wrote: Alex, I meant to check the performance with increasing numbers of collisions, not increasing size of the hashed item. In other words, something like this: for ($coll=500; $i=100; $i=$i*2) { for ($i=0;

Re: [PATCHES] hash index improving v3

2008-09-10 Thread Zdenek Kotala
Alex Hunsaker napsal(a): wide: # NOTE not on the same machine as the narrow test was run! # spit out 2, 000, 000 random 100 length strings perl gen.pl data.sql create table test_hash (wide text); copy test_hash from './data.sql'; create index test_hash_num_idx on test_hash using hash (wide);

Re: [PATCHES] hash index improving v3

2008-09-10 Thread Alex Hunsaker
On Wed, Sep 10, 2008 at 8:47 AM, Zdenek Kotala [EMAIL PROTECTED] wrote: What locale did you use? It would be nice to have also comparing between C and any UTF8 locale. I think we should see big benefit when non C locale is used. Err yes this was UTF8, Ill see about doing a C locale. -- Sent

Re: [PATCHES] hash index improving v3

2008-09-10 Thread Alex Hunsaker
On Wed, Sep 10, 2008 at 10:27 AM, Alex Hunsaker [EMAIL PROTECTED] wrote: On Wed, Sep 10, 2008 at 8:47 AM, Zdenek Kotala [EMAIL PROTECTED] wrote: What locale did you use? It would be nice to have also comparing between C and any UTF8 locale. I think we should see big benefit when non C locale is

Re: [PATCHES] hash index improving v3

2008-09-10 Thread Alex Hunsaker
On Wed, Sep 10, 2008 at 7:04 AM, Kenneth Marshall [EMAIL PROTECTED] wrote: On Tue, Sep 09, 2008 at 07:23:03PM -0600, Alex Hunsaker wrote: On Tue, Sep 9, 2008 at 7:48 AM, Kenneth Marshall [EMAIL PROTECTED] wrote: I think that the glacial speed for generating a big hash index is the same

Re: [PATCHES] hash index improving v3

2008-09-10 Thread Alex Hunsaker
On Wed, Sep 10, 2008 at 9:49 PM, Alex Hunsaker [EMAIL PROTECTED] wrote: On Wed, Sep 10, 2008 at 7:04 AM, Kenneth Marshall [EMAIL PROTECTED] wrote: On Tue, Sep 09, 2008 at 07:23:03PM -0600, Alex Hunsaker wrote: On Tue, Sep 9, 2008 at 7:48 AM, Kenneth Marshall [EMAIL PROTECTED] wrote: I think

Re: [PATCHES] hash index improving v3

2008-09-09 Thread Kenneth Marshall
On Sat, Sep 06, 2008 at 08:23:05PM -0600, Alex Hunsaker wrote: On Sat, Sep 6, 2008 at 1:09 PM, Tom Lane [EMAIL PROTECTED] wrote: For the convenience of anyone intending to test, here is an updated patch against CVS HEAD that incorporates Alex's fix. Here are the results for a table

Re: [PATCHES] hash index improving v3

2008-09-09 Thread Alex Hunsaker
On Tue, Sep 9, 2008 at 7:48 AM, Kenneth Marshall [EMAIL PROTECTED] wrote: I think that the glacial speed for generating a big hash index is the same problem that the original code faced. Yeah sorry, I was not saying it was a new problem with the patch. Err at least not trying to :) *Both* of

Re: [PATCHES] hash index improving v3

2008-09-09 Thread Alex Hunsaker
On Tue, Sep 9, 2008 at 7:23 PM, Alex Hunsaker [EMAIL PROTECTED] wrote: create table test_hash(num int8); insert into test_hash (num) select generate_series(1, 200); create index test_hash_num_idx on test_hash (num); pgbench -c1 -n -t1 -f bench_index.sql cvs head: tps = 3161.500511

Re: [PATCHES] hash index improving v3

2008-09-09 Thread Alex Hunsaker
On Tue, Sep 9, 2008 at 7:23 PM, Alex Hunsaker [EMAIL PROTECTED] wrote: BTW Im still planning on doing a wide vs narrow test... sometime... :) narrow: (exactly the same as what I just did in the other post) create table test_hash(num int8); insert into test_hash (num) select generate_series(1,

Re: [PATCHES] hash index improving v3

2008-09-08 Thread Zdenek Kotala
It is nice test case. It should be part of hash index regression test. Zdenek Alex Hunsaker napsal(a): Ok now that I made it so it actually *test* collisions, with the patch it always returns all rows that matched the hashed key. For example (non lobotomized inthash8, I just brute

Re: [PATCHES] hash index improving v3

2008-09-08 Thread Zdenek Kotala
Tom Lane napsal(a): Alex Hunsaker [EMAIL PROTECTED] writes: On Fri, Sep 5, 2008 at 2:21 PM, Alex Hunsaker [EMAIL PROTECTED] wrote: Ok now that I made it so it actually *test* collisions, with the patch it always returns all rows that matched the hashed key. And here is the fix, we just

Re: [PATCHES] hash index improving v3

2008-09-08 Thread Tom Lane
Zdenek Kotala [EMAIL PROTECTED] writes: I attach cleanup patch which we discussed last commit fest. It introduce new macros HashGetMetaPage and HashGetBitmap and of course, it break backward on disk format compatibility which we will break it anyway when Xiao's patch will be committed.

Re: [PATCHES] hash index improving v3

2008-09-06 Thread Tom Lane
Alex Hunsaker [EMAIL PROTECTED] writes: - tbm_add_tuples(tbm, scan-xs_ctup.t_self, 1, false); + tbm_add_tuples(tbm, scan-xs_ctup.t_self, 1, true); Hah, I bet that explains Jonah's complaint that recheck didn't seem to be happening in his tests. Nice

Re: [PATCHES] hash index improving v3

2008-09-06 Thread Tom Lane
Alex Hunsaker [EMAIL PROTECTED] writes: On Fri, Sep 5, 2008 at 2:21 PM, Alex Hunsaker [EMAIL PROTECTED] wrote: Ok now that I made it so it actually *test* collisions, with the patch it always returns all rows that matched the hashed key. And here is the fix, we just forget to set the recheck

Re: [PATCHES] hash index improving v3

2008-09-06 Thread Alex Hunsaker
On Sat, Sep 6, 2008 at 1:09 PM, Tom Lane [EMAIL PROTECTED] wrote: For the convenience of anyone intending to test, here is an updated patch against CVS HEAD that incorporates Alex's fix. Here are the results for a table containing 1 million entries that will generate hash collisions. It paints a

Re: [PATCHES] hash index improving v3

2008-09-06 Thread Alex Hunsaker
2008/9/6 Alex Hunsaker [EMAIL PROTECTED]: pgbench -c1 -n -t1000 -f bench_bitmap.sql cvs head: tps = 24.011871 v5: tps = 2.543123 oops forgot to attach bench_bitmap.sql bench_bitmap.sql Description: Binary data -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)

Re: [PATCHES] hash index improving v3

2008-09-05 Thread Alex Hunsaker
Ok now that I made it so it actually *test* collisions, with the patch it always returns all rows that matched the hashed key. For example (non lobotomized inthash8, I just brute forced some collisions for 0 so that anyone can try this) create table test_hash (num int8); insert into test_hash

Re: [PATCHES] hash index improving v3

2008-09-05 Thread Alex Hunsaker
] On Fri, Sep 5, 2008 at 2:21 PM, Alex Hunsaker [EMAIL PROTECTED] wrote: Ok now that I made it so it actually *test* collisions, with the patch it always returns all rows that matched the hashed key. And here is the fix, we just forget to set the recheck flag for bitmap scans. ***

Re: [PATCHES] hash index improving v3

2008-09-04 Thread Tom Lane
I wrote: You have the unique-versus-not dimension, On second thought, actually not. What we want to look at is the penalty for false matches due to *distinct* key values that happen to have the same hash codes. Your test case for all-the-same is using all the same key values, which means it'll

Re: [PATCHES] hash index improving v3

2008-09-04 Thread Alex Hunsaker
On Thu, Sep 4, 2008 at 7:13 PM, Tom Lane [EMAIL PROTECTED] wrote: Alex Hunsaker [EMAIL PROTECTED] writes: Ok let me know if this is to naive of an approach or not hitting the right cases you want tested. You have the unique-versus-not dimension, but I'm also wondering about narrow vs wide

Re: [PATCHES] hash index improving v3

2008-09-04 Thread Alex Hunsaker
On Thu, Sep 4, 2008 at 7:45 PM, Tom Lane [EMAIL PROTECTED] wrote: So what we need for testing is a few different key values that hash to the same code. Not sure about an easy way to find such. Hrm, well I have not really looked at the hash algorithm but I assume we could just reduce the number

Re: [PATCHES] hash index improving v3

2008-09-04 Thread Tom Lane
Alex Hunsaker [EMAIL PROTECTED] writes: On Thu, Sep 4, 2008 at 7:45 PM, Tom Lane [EMAIL PROTECTED] wrote: So what we need for testing is a few different key values that hash to the same code. Not sure about an easy way to find such. Hrm, well I have not really looked at the hash algorithm

Re: [PATCHES] hash index improving v3

2008-09-04 Thread Tom Lane
Alex Hunsaker [EMAIL PROTECTED] writes: On Thu, Sep 4, 2008 at 7:13 PM, Tom Lane [EMAIL PROTECTED] wrote: * check that the queries actually use the indexes (not sure that the proposed switch settings ensure this, not to mention you didn't create the indexes) Well I was assuming I could just

Re: [PATCHES] hash index improving v3

2008-09-04 Thread Alex Hunsaker
On Thu, Sep 4, 2008 at 8:17 PM, Tom Lane [EMAIL PROTECTED] wrote: I guess one thing we could do for testing purposes is lobotomize one of the datatype-specific hash functions. For instance, make int8_hash return the input mod 2^32, ignoring the upper bytes. Then it'd be easy to compute

Re: [PATCHES] hash index improving v3

2008-09-03 Thread Simon Riggs
On Mon, 2008-08-18 at 09:46 +0800, Xiao Meng wrote: There's minor change against the previous one( http://archives.postgresql.org/pgsql-hackers/2008-07/msg01183.php ). * merge branch master(Aug 16) into the patch * clean code and make some comment Performance result is here

Re: [PATCHES] hash index improving v3

2008-08-19 Thread Xiao Meng
With the help of David Fetter, you can get the copy by git clone http://git.postgresql.org/git/~davidfetter/hash/.githttp://git.postgresql.org/git/%7Edavidfetter/hash/.git It's in the branch gsoc-hash. Thank you, David. -- Best Regards, Xiao Meng DKERC, Harbin Institute of Technology, China