"Jonah H. Harris" <[EMAIL PROTECTED]> writes:
> Agreed.  My thinking is that there's either something inherently wrong
> with the implementation, or we're performing so many disk I/Os that
> it's nearly equivalent to b-tree.  Tom has a couple suggestions which
> Xiao and I will explore.

I finally got a chance to look through the patch in some detail.
If I haven't missed something, there are just two improvement ideas
embodied in it:

1. Store just the hash value, and not the index key value, in hash
index entries.  (This means that all hash indexscans become lossy
and have to be rechecked at the heap.)

2. Keep the contents of each index page ordered by hash value, and use
binary instead of linear search to find the matching item(s) during an
indexscan.  (This depends on #1 because recomputing the hash values
during the binary search would be too expensive --- although you could
also make it work by storing *both* the hash value and the original
key.)

I suppose that the main point of #1 is to reduce index size by
allowing more tuples to fit in each bucket.  However, the patch
neglects to adjust the target-fillfactor calculation in _hash_metapinit,
which means that the code won't actually put any more tuples per bucket
(on average) than it did before.  So the index isn't actually any
smaller and you get no I/O savings --- you just have more unused
space on a typical page.  Fixing that might help.

FWIW, I had always assumed that part of the solution to hash's woes
would involve decoupling the bucket size from the page size, so
that you could have multiple buckets per page.  But maybe the
binary-search idea makes that unnecessary.  I'm not sure.  A whole
lot depends on how evenly the buckets get filled.  You should probably
investigate how many tuples actually end up in each bucket with and
without the patch.

In the realm of micro-optimizations that might be significant, I think
you really need to get rid of all those _create_hash_desc calls,
particularly the one in _hash_checkqual which is part of the inner loop
of an indexscan.  Not only are they slow but they probably represent a
serious memory leak in a scan that returns many tuples.  For reading the
hash value out of an existing index tuple, I don't think you should be
bothering with a tupdesc at all --- don't use index_getattr, just map a
C struct onto the known layout of an index tuple with a single never-null
int field.  This would particularly make for a noticeable improvement in
the speed of _hash_binsearch.  The tupdescs used in storing an index
entry are probably needed, but you could just use a single statically
declared tupdesc (look at the ones in relcache.c for examples of
building one as a constant).

                        regards, tom lane
