On Thu, Sep 06, 2007 at 11:53:45AM -0400, Mark Mielke wrote: > Hannu Krosing wrote: >>>> One approahc is not to mix hashes, but to partition the hash, so that >>>> each column gets its N bits in the hash. >>> How does that help? You still need all the keys to find out which >>> bucket to look in. >>> >> >> no. you need to look at only the buckets where that part of hash matches >> >> say you allocate bits 4-7 for column 2 and then need to look up column 2 >> value with hash 3 . here you need to look at only buckets N*16 + 3, that >> is, you need to examine only each 16th bucket >> >> > > I don't like the truncating hash suggestion because it limits the ability > of a hash code to uniquely identify a key. > > If a user requires the ability to search on both (column1) and (column1, > column2), they can create two hash indexes and the planner can decide which > to use. > Or, they can use a btree. I think hash has a subset of uses where it would > be a significant gain, and focus should be spent on this subset. > > Cheers, > mark > I agree that we should focus primarily on the subset of uses for hash indexes where there would be a significant gain. I do think that being able to use a single O(1) hash lookup against all the values specified in a pseudo-multi-column index could be very beneficial in reducing access time and I/O.
Since we already have to check the actual tuple values for any index lookup in postgresql, we could only store the full hash value and the corresponding TIDs in the bucket. Then when we lookup an item by calculating its hash, if the exact hash is not present in the bucket, then we know that the item is not in the index. If the value exists, then we would check the heap tuples before returning the results. Thus a negative lookup only needs to check the index and if the hash function is "good" there will be optimally only 1 possibly valid heap tuple if there is a match. One very big win for this change is to allow a much smaller index size (hash value + relevant TIDs) and the large column values are only stored in the actual data tuples. Regards, Ken > -- > Mark Mielke <[EMAIL PROTECTED]> > ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate