Currently, hash indexes always store the hash code in the index, but
not the actual Datum.  It's recently been noted that this can make a
hash index smaller than the corresponding btree index would be if the
column is wide.  However, if the index is being built on a fixed-width
column with a typlen <= sizeof(Datum), we could store the original
value in the hash index rather than the hash code without using any
more space.  That would complicate the code, but I bet it would be
faster: we wouldn't need to set xs_recheck, we could rule out hash
collisions without visiting the heap, and we could support index-only
scans in such cases.

Another thought is that hash codes are 32 bits, but a Datum is 64 bits
wide on most current platforms.  So we're wasting 4 bytes per index
tuple storing nothing.  If we generated 64-bit hash codes we could
store as many bits of it as a Datum will hold and reduce hash
collisions.  Alternatively, we could try to stick some other useful
information in those bytes, like an abbreviated abbreviated key.

Not sure if these are good ideas.  They're just ideas.

Robert Haas
The Enterprise PostgreSQL Company

Sent via pgsql-hackers mailing list (
To make changes to your subscription:

Reply via email to