Can we consider this hash thread closed/completed?
---
Tom Lane wrote:
Simon Riggs [EMAIL PROTECTED] writes:
Thinks: Why not just sort all of the time and skip the debate entirely?
The sort is demonstrably a loser for
On Wed, 2008-09-24 at 12:04 -0400, Bruce Momjian wrote:
Can we consider this hash thread closed/completed?
Please
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
--
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes
Simon Riggs [EMAIL PROTECTED] writes:
maintenance_work_mem is already used for 3 separate operations that bear
little resemblance to each other. If it's appropriate for all of those
then it's appropriate for this usage also.
No, it isn't.
The fundamental point here is that this isn't a memory
On Tue, 2008-09-23 at 08:16 -0400, Tom Lane wrote:
Simon Riggs [EMAIL PROTECTED] writes:
maintenance_work_mem is already used for 3 separate operations that bear
little resemblance to each other. If it's appropriate for all of those
then it's appropriate for this usage also.
No, it
Simon Riggs [EMAIL PROTECTED] writes:
Thinks: Why not just sort all of the time and skip the debate entirely?
The sort is demonstrably a loser for smaller indexes. Admittedly,
if the index is small then the sort can't cost all that much, but if
the (correct) threshold is some large fraction of
On Tue, 2008-09-23 at 09:13 -0400, Tom Lane wrote:
The other side of that coin is that it's not clear this is really worth
arguing about, much less exposing a separate parameter for.
Agreed.
On Fri, Sep 12, 2008 at 8:29 AM, Kenneth Marshall [EMAIL PROTECTED] wrote:
On Thu, Sep 11, 2008 at 08:51:53PM -0600, Alex Hunsaker wrote:
On Thu, Sep 11, 2008 at 9:24 AM, Kenneth Marshall [EMAIL PROTECTED] wrote:
Alex,
I meant to check the performance with increasing numbers of collisions,
On Sun, Sep 14, 2008 at 8:16 PM, Tom Lane [EMAIL PROTECTED] wrote:
BTW, one thing I noticed was that the hash index build time for the
wide column case got a lot worse after applying the patch (from 56 to
237 sec). The reason for this turned out to be that with the smaller
predicted index
On Mon, Sep 22, 2008 at 7:57 PM, Alex Hunsaker [EMAIL PROTECTED] wrote:
50,000,000 rows and 32,768,000 collisions
I should mention that's 50 million rows + 32 million more, i.e. 82,768,000
rows, which explains some of the added index creation time...
index time:
head: 576600.967 ms
v5:
Jonah H. Harris [EMAIL PROTECTED] writes:
On Mon, Sep 22, 2008 at 11:25 PM, Alex Hunsaker [EMAIL PROTECTED] wrote:
On Sun, Sep 14, 2008 at 8:16 PM, Tom Lane [EMAIL PROTECTED] wrote:
I'm considering changing hashbuild to switch over at shared_buffers instead
of effective_cache_size --- any
On Tue, 2008-09-23 at 00:05 -0400, Tom Lane wrote:
Jonah H. Harris [EMAIL PROTECTED] writes:
On Mon, Sep 22, 2008 at 11:25 PM, Alex Hunsaker [EMAIL PROTECTED] wrote:
On Sun, Sep 14, 2008 at 8:16 PM, Tom Lane [EMAIL PROTECTED] wrote:
I'm considering changing hashbuild to switch over at
Simon Riggs [EMAIL PROTECTED] writes:
I wasn't very happy with effective_cache_size and not happy with
shared_buffers either. If building hash indexes is memory critical then
we just need to say so and encourage others to set memory use correctly.
People are already aware that
On Tue, 2008-09-23 at 00:48 -0400, Tom Lane wrote:
Simon Riggs [EMAIL PROTECTED] writes:
I wasn't very happy with effective_cache_size and not happy with
shared_buffers either. If building hash indexes is memory critical then
we just need to say so and encourage others to set memory use
I did some testing of my own on the hash index patch, and mostly seem to
have confirmed Alex's results. I used a table like this:
create table tab (f1 serial primary key, f2 some-datatype);
create index ind on tab using hash (f2);
and populated it with 2 million rows of data;
Alex Hunsaker napsal(a):
On Wed, Sep 10, 2008 at 10:27 AM, Alex Hunsaker [EMAIL PROTECTED] wrote:
On Wed, Sep 10, 2008 at 8:47 AM, Zdenek Kotala [EMAIL PROTECTED] wrote:
What locale did you use? It would be nice to also have a comparison between the C
locale and any UTF8 locale. I think we should see big
On Thu, Sep 11, 2008 at 08:51:53PM -0600, Alex Hunsaker wrote:
On Thu, Sep 11, 2008 at 9:24 AM, Kenneth Marshall [EMAIL PROTECTED] wrote:
Alex,
I meant to check the performance with increasing numbers of collisions,
not increasing size of the hashed item. In other words, something like
On Fri, Sep 12, 2008 at 3:14 AM, Zdenek Kotala [EMAIL PROTECTED] wrote:
Alex Hunsaker napsal(a):
On Wed, Sep 10, 2008 at 10:27 AM, Alex Hunsaker [EMAIL PROTECTED] wrote:
On Wed, Sep 10, 2008 at 8:47 AM, Zdenek Kotala [EMAIL PROTECTED]
wrote:
What locale did you use? It would be nice to
On Wed, Sep 10, 2008 at 10:17:31PM -0600, Alex Hunsaker wrote:
On Wed, Sep 10, 2008 at 9:49 PM, Alex Hunsaker [EMAIL PROTECTED] wrote:
On Wed, Sep 10, 2008 at 7:04 AM, Kenneth Marshall [EMAIL PROTECTED] wrote:
On Tue, Sep 09, 2008 at 07:23:03PM -0600, Alex Hunsaker wrote:
On Tue, Sep 9,
On Thu, Sep 11, 2008 at 9:24 AM, Kenneth Marshall [EMAIL PROTECTED] wrote:
Alex,
I meant to check the performance with increasing numbers of collisions,
not increasing size of the hashed item. In other words, something like
this:
for ($coll=500; $i=100; $i=$i*2) {
for ($i=0;
Alex Hunsaker napsal(a):
wide:
# NOTE not on the same machine as the narrow test was run!
# spit out 2,000,000 random 100-character strings
perl gen.pl > data.sql
create table test_hash (wide text);
copy test_hash from './data.sql';
create index test_hash_num_idx on test_hash using hash (wide);
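The gen.pl script itself isn't shown in the thread. A rough Python equivalent, assuming it simply emits one fixed-width random string per line in the text format COPY expects (the filename and alphabet are guesses), might look like:

```python
import random
import string

def gen_copy_data(path, rows, width=100):
    """Write `rows` random fixed-width strings, one per line,
    in the plain-text format expected by COPY ... FROM."""
    alphabet = string.ascii_letters + string.digits
    with open(path, "w") as f:
        for _ in range(rows):
            f.write("".join(random.choices(alphabet, k=width)) + "\n")

# The thread used 2,000,000 rows; keep it small here for illustration.
gen_copy_data("data.sql", rows=1000)
```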
On Wed, Sep 10, 2008 at 8:47 AM, Zdenek Kotala [EMAIL PROTECTED] wrote:
What locale did you use? It would be nice to also have a comparison between the C
locale and any UTF8 locale. I think we should see a big benefit when a non-C locale
is used.
Err, yes, this was UTF8; I'll see about doing a C locale.
On Wed, Sep 10, 2008 at 10:27 AM, Alex Hunsaker [EMAIL PROTECTED] wrote:
On Wed, Sep 10, 2008 at 8:47 AM, Zdenek Kotala [EMAIL PROTECTED] wrote:
What locale did you use? It would be nice to also have a comparison between the C
locale and any UTF8 locale. I think we should see a big benefit when a non-C locale is
On Wed, Sep 10, 2008 at 7:04 AM, Kenneth Marshall [EMAIL PROTECTED] wrote:
On Tue, Sep 09, 2008 at 07:23:03PM -0600, Alex Hunsaker wrote:
On Tue, Sep 9, 2008 at 7:48 AM, Kenneth Marshall [EMAIL PROTECTED] wrote:
I think that the glacial speed for generating a big hash index is
the same
On Wed, Sep 10, 2008 at 9:49 PM, Alex Hunsaker [EMAIL PROTECTED] wrote:
On Wed, Sep 10, 2008 at 7:04 AM, Kenneth Marshall [EMAIL PROTECTED] wrote:
On Tue, Sep 09, 2008 at 07:23:03PM -0600, Alex Hunsaker wrote:
On Tue, Sep 9, 2008 at 7:48 AM, Kenneth Marshall [EMAIL PROTECTED] wrote:
I think
On Sat, Sep 06, 2008 at 08:23:05PM -0600, Alex Hunsaker wrote:
On Sat, Sep 6, 2008 at 1:09 PM, Tom Lane [EMAIL PROTECTED] wrote:
For the convenience of anyone intending to test, here is an updated
patch against CVS HEAD that incorporates Alex's fix.
Here are the results for a table
On Tue, Sep 9, 2008 at 7:48 AM, Kenneth Marshall [EMAIL PROTECTED] wrote:
I think that the glacial speed for generating a big hash index is
the same problem that the original code faced.
Yeah sorry, I was not saying it was a new problem with the patch. Err
at least not trying to :) *Both* of
On Tue, Sep 9, 2008 at 7:23 PM, Alex Hunsaker [EMAIL PROTECTED] wrote:
create table test_hash(num int8);
insert into test_hash (num) select generate_series(1, 200);
create index test_hash_num_idx on test_hash (num);
pgbench -c1 -n -t1 -f bench_index.sql
cvs head: tps = 3161.500511
On Tue, Sep 9, 2008 at 7:23 PM, Alex Hunsaker [EMAIL PROTECTED] wrote:
BTW I'm still planning on doing a wide vs narrow test... sometime... :)
narrow: (exactly the same as what I just did in the other post)
create table test_hash(num int8);
insert into test_hash (num) select generate_series(1,
It is a nice test case. It should be part of the hash index regression tests.
Zdenek
Alex Hunsaker napsal(a):
Ok now that I made it so it actually *tests* collisions, with the patch
it always returns all rows that matched the hashed key. For example
(non-lobotomized inthash8, I just brute
Tom Lane napsal(a):
Alex Hunsaker [EMAIL PROTECTED] writes:
On Fri, Sep 5, 2008 at 2:21 PM, Alex Hunsaker [EMAIL PROTECTED] wrote:
Ok now that I made it so it actually *tests* collisions, with the patch
it always returns all rows that matched the hashed key.
And here is the fix, we just
Zdenek Kotala [EMAIL PROTECTED] writes:
I attach the cleanup patch which we discussed last commit fest. It introduces new
macros HashGetMetaPage and HashGetBitmap and, of course, it breaks backward
on-disk format compatibility, which we will break anyway when Xiao's patch
is committed.
Alex Hunsaker [EMAIL PROTECTED] writes:
- tbm_add_tuples(tbm, &scan->xs_ctup.t_self, 1, false);
+ tbm_add_tuples(tbm, &scan->xs_ctup.t_self, 1, true);
Hah, I bet that explains Jonah's complaint that recheck didn't seem to
be happening in his tests. Nice
Alex Hunsaker [EMAIL PROTECTED] writes:
On Fri, Sep 5, 2008 at 2:21 PM, Alex Hunsaker [EMAIL PROTECTED] wrote:
Ok now that I made it so it actually *tests* collisions, with the patch
it always returns all rows that matched the hashed key.
And here is the fix, we just forgot to set the recheck
On Sat, Sep 6, 2008 at 1:09 PM, Tom Lane [EMAIL PROTECTED] wrote:
For the convenience of anyone intending to test, here is an updated
patch against CVS HEAD that incorporates Alex's fix.
Here are the results for a table containing 1 million entries that
will generate hash collisions. It paints a
2008/9/6 Alex Hunsaker [EMAIL PROTECTED]:
pgbench -c1 -n -t1000 -f bench_bitmap.sql
cvs head: tps = 24.011871
v5: tps = 2.543123
Oops, forgot to attach bench_bitmap.sql.
Ok now that I made it so it actually *tests* collisions, with the patch
it always returns all rows that matched the hashed key. For example
(non-lobotomized inthash8, I just brute forced some collisions for 0
so that anyone can try this)
create table test_hash (num int8);
insert into test_hash
On Fri, Sep 5, 2008 at 2:21 PM, Alex Hunsaker [EMAIL PROTECTED] wrote:
Ok now that I made it so it actually *tests* collisions, with the patch
it always returns all rows that matched the hashed key.
And here is the fix: we just forgot to set the recheck flag for bitmap scans.
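For anyone wondering why that flag matters, here is a toy Python model of the situation (not Postgres code; the 32-bit mask and sample keys are made up for illustration). Since the patched index stores only hash codes, a probe must recheck the real key against the heap, or distinct keys that happen to share a hash code leak into the result:

```python
from collections import defaultdict

def h(key):
    # toy hash: keep only the low 32 bits of an int8 key
    return key & 0xFFFFFFFF

heap = {}                    # tid -> real key; stands in for the table
buckets = defaultdict(list)  # hash code -> list of tids; the "index"

def insert(key, tid):
    heap[tid] = key
    buckets[h(key)].append(tid)

def scan(key, recheck=True):
    tids = buckets[h(key)]   # the index can only match on hash code
    if recheck:
        # recheck the real key in the heap, discarding false matches
        tids = [t for t in tids if heap[t] == key]
    return tids

insert(0, "tid-1")
insert(2**32, "tid-2")         # distinct key, same low 32 bits as 0
print(scan(0, recheck=False))  # ['tid-1', 'tid-2'] -- bogus extra row
print(scan(0, recheck=True))   # ['tid-1']
```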
I wrote:
You have the unique-versus-not dimension,
On second thought, actually not. What we want to look at is the penalty
for false matches due to *distinct* key values that happen to have the
same hash codes. Your test case for all-the-same is using all the same
key values, which means it'll
On Thu, Sep 4, 2008 at 7:13 PM, Tom Lane [EMAIL PROTECTED] wrote:
Alex Hunsaker [EMAIL PROTECTED] writes:
Ok let me know if this is too naive an approach or not hitting the
right cases you want tested.
You have the unique-versus-not dimension, but I'm also wondering about
narrow vs wide
On Thu, Sep 4, 2008 at 7:45 PM, Tom Lane [EMAIL PROTECTED] wrote:
So what we need for testing is a few different key values that hash to
the same code. Not sure about an easy way to find such.
Hrm, well I have not really looked at the hash algorithm but I assume
we could just reduce the number
Alex Hunsaker [EMAIL PROTECTED] writes:
On Thu, Sep 4, 2008 at 7:45 PM, Tom Lane [EMAIL PROTECTED] wrote:
So what we need for testing is a few different key values that hash to
the same code. Not sure about an easy way to find such.
Hrm, well I have not really looked at the hash algorithm
Alex Hunsaker [EMAIL PROTECTED] writes:
On Thu, Sep 4, 2008 at 7:13 PM, Tom Lane [EMAIL PROTECTED] wrote:
* check that the queries actually use the indexes (not sure that the
proposed switch settings ensure this, not to mention you didn't create
the indexes)
Well I was assuming I could just
On Thu, Sep 4, 2008 at 8:17 PM, Tom Lane [EMAIL PROTECTED] wrote:
I guess one thing we could do for testing purposes is lobotomize one of
the datatype-specific hash functions. For instance, make int8_hash
return the input mod 2^32, ignoring the upper bytes. Then it'd be easy
to compute
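Under that lobotomized int8 hash, colliding keys are indeed trivial to enumerate: any two values that differ by a multiple of 2^32 hash identically. A quick Python sketch of the idea (the function name and constants are illustrative, not the actual Postgres code):

```python
def lobotomized_int8_hash(v):
    # return the input mod 2^32, ignoring the upper bytes
    return v % (2**32)

base = 12345
keys = [base + n * 2**32 for n in range(5)]  # five distinct int8 values
codes = {lobotomized_int8_hash(k) for k in keys}
print(len(set(keys)), len(codes))  # 5 1 -- five keys, one hash code
```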
On Mon, 2008-08-18 at 09:46 +0800, Xiao Meng wrote:
There's a minor change against the previous
one ( http://archives.postgresql.org/pgsql-hackers/2008-07/msg01183.php ).
* merge branch master (Aug 16) into the patch
* clean up the code and add some comments
Performance result is here
With the help of David Fetter, you can get a copy by running:
git clone
http://git.postgresql.org/git/~davidfetter/hash/.git
It's in the branch gsoc-hash.
Thank you, David.
--
Best Regards,
Xiao Meng
DKERC, Harbin Institute of Technology, China