Re: [HACKERS] Hash index todo list item

Jens-Wolfhard Schicke Mon, 10 Sep 2007 07:30:04 -0700

More random thoughts:

- Hash-Indices are best for unique keys, but every table needs a new hashkey, which means one more random page access. Is there any way to build multi-_table_ indices? A join might then fetch all table rows with a given unique key after one page fetch for the combined index.

- Hashes with trivial hash-functions (like identity) can also return rows in a desired order.
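
  A toy sketch of the ordering idea, not PostgreSQL code. It assumes 8-bit integer keys, an identity hash, and a bucket number taken from the top k bits of the key; none of these details are specified above, and the names are made up for illustration. Under those assumptions, walking the buckets in order and sorting only within each bucket yields the keys in ascending order.

```python
KEY_BITS = 8  # assumption: keys are small non-negative integers


def bucket_of(key, k):
    """Identity hash: the hash value is the key; the bucket is its top k bits."""
    return key >> (KEY_BITS - k)


def build(keys, k):
    """Distribute keys over 2**k buckets."""
    buckets = [[] for _ in range(1 << k)]
    for key in keys:
        buckets[bucket_of(key, k)].append(key)
    return buckets


def ordered_scan(buckets):
    """Visit buckets in bucket-number order, sorting only inside each bucket."""
    for bucket in buckets:
        yield from sorted(bucket)
```

  With bucket numbers taken from the low bits (key modulo the bucket count) this property would not hold.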

- Is there a case where sequentially scanning a hash-index is useful? I can't find any, but maybe somebody else has a use-case.


- What about HashJoins when the referenced tables have hash-indices?

- What about hash-indices where entries are inserted for multiple columns? Assume a table like this:


CREATE TABLE link (obj_id1 INT4, obj_id2 INT4);

and a query like

SELECT * FROM link WHERE ? IN (obj_id1, obj_id2);

or some join using a similar condition. It might be a nice thing to insert entries at both HASH(obj_id1) and HASH(obj_id2), which would eliminate the need to check in two indices and do a bitmap OR. OTOH it might not be faster in any significant use cases because who'd need a link table with nearly unique linked objects?
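
  The double-insert idea could look roughly like this. It is a minimal in-memory sketch, not an index access method: the class name, the `rows` dictionary standing in for the heap, and the use of Python's built-in `hash` are all assumptions made for illustration.

```python
from collections import defaultdict


class LinkHashIndex:
    """Toy single hash index covering both columns of link(obj_id1, obj_id2)."""

    def __init__(self):
        # hash value -> set of row ids; a set keeps a row from being
        # listed twice when obj_id1 == obj_id2
        self.buckets = defaultdict(set)

    def insert(self, row_id, obj_id1, obj_id2):
        # One entry per column, both pointing at the same row.
        self.buckets[hash(obj_id1)].add(row_id)
        self.buckets[hash(obj_id2)].add(row_id)

    def lookup(self, value, rows):
        # A single probe answers "value IN (obj_id1, obj_id2)". The recheck
        # against the row is needed because distinct keys can share a hash.
        candidates = self.buckets.get(hash(value), ())
        return sorted(r for r in candidates if value in rows[r])
```

  A lookup touches one bucket instead of probing two indices and OR-ing the results, at the price of two index entries per row.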

- In cases where the distribution of the hash-function is good, but a small and relatively even number of rows exists for each key (as might be the case in the above example), it might be nice to reserve a given number of same-key row entries in each bucket, and hold a fill-count at the front of it. That would avoid costly page fetches after each collision. You'd create a hash-index with n buckets, each m elements large. When the bucket is full, the usual collision handling continues.
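
  A rough sketch of such a layout, under assumptions that go beyond the description above: Python lists stand in for pages, a per-bucket overflow list stands in for "the usual collision handling", and the class and the page-fetch counter are hypothetical names used only to show when an extra fetch would be needed.

```python
class SlottedHashIndex:
    """Toy hash index with n buckets, each reserving m entry slots."""

    def __init__(self, n_buckets, m_slots):
        self.n = n_buckets
        self.m = m_slots
        self.fill = [0] * n_buckets                         # fill-count per bucket
        self.slots = [[None] * m_slots for _ in range(n_buckets)]
        self.overflow = [[] for _ in range(n_buckets)]      # stand-in for overflow pages

    def insert(self, key, row_id):
        b = hash(key) % self.n
        if self.fill[b] < self.m:
            # A reserved slot is free: store the entry in the bucket itself.
            self.slots[b][self.fill[b]] = (key, row_id)
            self.fill[b] += 1
        else:
            # Bucket full: fall back to ordinary collision handling.
            self.overflow[b].append((key, row_id))

    def lookup(self, key):
        """Return (matching row ids, number of simulated page fetches)."""
        b = hash(key) % self.n
        hits = [rid for k, rid in self.slots[b][:self.fill[b]] if k == key]
        fetches = 1
        if self.overflow[b]:
            # Only a bucket that has overflowed costs a second fetch.
            fetches += 1
            hits += [rid for k, rid in self.overflow[b] if k == key]
        return hits, fetches
```

  As long as a key has at most m rows and no other key lands in the same bucket, every lookup is answered from the bucket alone.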

- About hash enlargement: What about always using only the first k bits of each hash value? When you find that the hash is "quite-full" (however that is defined and detected), k is increased by one, effectively doubling the hash size. New entries are then written as usual, while retrieving the old entries needs to test at the k-bit position first and, if there is a miss, also at the (k-1)-bit position and so forth. To limit this search, some background process could, after analyzing the index, move old entries to the now correct k-bit position and increment some "min-k" value once all old entries have been moved. After the hash has been enlarged, the index would then run at approximately half its speed for some time. Additionally, one could also insert the entry at the new position if it has been found only at the old one while using the index. A special "miss" entry at the new position doesn't help if nothing could be found, because the old positions will usually hold some data which resides there even if it uses k bits.
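
  The enlargement scheme can be sketched as follows. This is a minimal in-memory model, not a proposal for the actual index code. Assumptions made here: the "first k bits" are taken to be the low-order k bits, "quite-full" means an average of `max_fill` entries per bucket, and `migrate` plays the role of the background process; all names are hypothetical.

```python
class GrowingHash:
    """Toy hash table addressed by the low k bits of the hash value."""

    def __init__(self, k=1, max_fill=2.0):
        self.k = k              # bits used when inserting new entries
        self.min_k = k          # smallest bit count any stored entry may still use
        self.max_fill = max_fill
        self.count = 0
        self.buckets = {}       # bucket number -> list of (hash, key, value)

    @staticmethod
    def _pos(h, bits):
        return h & ((1 << bits) - 1)

    def insert(self, key, value):
        if self.count >= self.max_fill * (1 << self.k):
            self.k += 1         # doubles the address space; nothing is moved yet
        h = hash(key)
        self.buckets.setdefault(self._pos(h, self.k), []).append((h, key, value))
        self.count += 1

    def lookup(self, key):
        """Return (value or None, number of distinct buckets probed)."""
        h = hash(key)
        seen = set()
        for bits in range(self.k, self.min_k - 1, -1):
            pos = self._pos(h, bits)
            if pos in seen:     # same bucket as a wider probe: no extra fetch
                continue
            seen.add(pos)
            for _, entry_key, value in self.buckets.get(pos, ()):
                if entry_key == key:
                    return value, len(seen)
        return None, len(seen)

    def migrate(self):
        """Background step: move every entry to its k-bit position, raise min_k."""
        old, self.buckets = self.buckets, {}
        for entries in old.values():
            for h, key, value in entries:
                self.buckets.setdefault(self._pos(h, self.k), []).append((h, key, value))
        self.min_k = self.k
```

  Before `migrate` runs, a key written under the smaller k may need a second probe; afterwards every lookup is a single probe again. With low-order bits, half of the old entries already sit at their k-bit position, which is also why a "miss" marker at the new position would not help.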



