Gregory Stark wrote:
Tom Lane <[EMAIL PROTECTED]> writes:

I think the problem may well be that we use hash buckets that are too
large (ie, whole pages).  After we fetch the page, we have to grovel
through every tuple on it to find the one(s) that really match the
query, whereas btree has a much more intelligent strategy (viz binary
search) to do its intrapage searches.  Smaller buckets would help make
up for this.

Hm, you would expect hash indexes to still be a win for very large indexes
where you're worried more about i/o than cpu resources.

Another issue is that we don't store the raw hashcode in the index
tuples, so the only way to test a tuple is to actually invoke the
datatype equality function.  If we stored the whole 32-bit hashcode
we could eliminate non-matching hashcodes cheaply.  I'm not sure how
painful it'd be to do this though ... hash uses the same index tuple
layout as everybody else, and so there's no convenient place to put
the hashcode.

I looked a while back and was suspicious about the actual hash functions too.
It seemed like a lot of them were vastly suboptimal. That would mean we're
often dealing with mostly empty and mostly full buckets instead of well
distributed hash tables.



This is now sounding like a lot of low hanging fruit ... highly performant hash indexed tables could possibly be a very big win.

cheers

andrew

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
      subscribe-nomail command to [EMAIL PROTECTED] so that your
      message can get through to the mailing list cleanly

Reply via email to