More random thoughts:

- Hash-Indices are best for unique keys, but every table needs a new hash key, which means one more random page access. Is there any way to build multi-_table_ indices? A join might then fetch all table rows with a given unique key after one page fetch for the combined index.

- Hashes with trivial hash-functions (like identity) can also return rows in a desired order.

- Is there a case where a sequentially scanning a hash-index is useful? I can't find any, but maybe somebody else has a use-case.

- What about HashJoins when the referenced tables have hash-indices?

- What about hash-indices where entries are inserted for multiple columns. Assume a table like this:

CREATE TABLE link (obj_id1 INT4, obj_id2 INT4);

and a query like

SELECT * FROM link WHERE ? IN (obj_id1, obj_id2);

or some join using a similar condition. It might be a nice thing to insert entries at both HASH(obj_id1) and HASH(obj_id2), which would eliminate the need to check in two indices and do a bitmap OR. OTOH it might not be faster in any significant use cases because who'd need a link table with nearly unique linked objects?

- In cases where the distribution of the hash-function is good, but a small and relatively even number of rows exist for each key (like it might be the case in the above example), it might be nice to reserve a given amount of same-key row entries in each bucket, and hold a fill-count at the front of it. That would avoid costly page fetches after each collision. You'd create a hash-index with n-buckets, each m-elements large. When the bucket is full, the usual collision handling continues.

- About hash enlargement: What about always using only the first k bits of each hash value. When you find that the hash is "quite-full" (however that is defined and detected), k is increased by one, effectively doubling the hash size. New entries are then written as usual, while retrieving the old entries needs to test at the k-bit-position first and if there is a miss, also at the k-1-position and so forth. To limit this search, some background process could after analyzing the index move old entries to the now correct k-bit-position and increment some "min-k"-value once all old entries have been moved. After the hash has been increased, the index would approximately half it's speed for some time then. Additionally one could also insert the entry at the new position if it has been found at the old one only while using the index. A special "miss"-entry at the new position doesn't help if nothing could be found because the old positions will usually hold some data which resides there even if it uses k bits.


---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Reply via email to