Re: [PERFORM] [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL

Neil Conway Mon, 09 May 2005 22:32:44 -0700

Tom Lane wrote:

I have a gut reaction against that: it makes hash indexes fundamentally
subservient to btrees.

I wouldn't say "subservient" -- if there is no ordering defined for the index key, we just do a linear scan.

However: what about storing the things in hashcode order?  Ordering uint32s
doesn't seem like any big conceptual problem.


Hmm, my memory of the hash code is a bit fuzzy. Do I understand correctly?

- we only use some of the bits in the hash to map from the hash of a key to its bucket

- therefore within a bucket, we can still distinguish most of the non-equal tuples from one another by comparing their full hash values

- if we keep the entries in a bucket (or page, I guess -- per earlier mail) sorted by full hash value, we can use that to perform a binary search

Sounds like a good idea to me. How likely is it that the hash index will be sufficiently large that we're using most of the bits in the hash just to map hash values to buckets, so that the binary search won't be very effective? (At this point many of the distinct keys in each bucket will be full-on hash collisions, although sorting by the key values themselves would still be effective.)

I think that efficient implementation of this would require explicitly
storing the hash code for each index entry, which we don't do now, but
it seems justifiable on multiple grounds --- besides this point, the
search could avoid doing the data-type-specific comparison if the hash
code isn't equal.

Another benefit is that it would speed up page splits -- there would be no need to rehash all the keys in a bucket when doing the split.

-Neil

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Re: [PERFORM] [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL

Reply via email to