On Fri, Sep 07, 2007 at 10:36:41AM -0400, Brian Hurt wrote: > Kenneth Marshall wrote: > >> I understand that a hash value is a many-to-one mapping. That is the >> point of the flag in the index. The flag means that there is only one >> item in the heap corresponding to that hash value. In this case we >> know that the value in the heap is the correct one and a possibly >> very expensive string comparison can be skipped. Given that the hash >> function is doing its job, almost every string comparison can be skipped. >> How long would it take to compare 1-32K of data? How much CPU usage? >> With this field in place, you only need to check tuple visibility. >> > > How likely is it that you will get a hash collision, two strings that are > different that will hash to the same value? To avoid this requires a very > large hash key (128 bits, minimum)- otherwise you get into birthday attack > problems. With a 32-bit hash, the likelyhood is greater than 50% that two > strings in a collection of 100,000 will hash to the same value. With a > 64-bit hash, the likelyhood is greater than 50% that two strings in a > collection of 10 billion will has to same value. 10 billion is a large > number, but not an unreasonable number, of strings to want to put into a > hash table- and it's exactly this case where the O(1) cost of hashtables > starts being a real win. > > Brian > Yes, there is a non-negligible chance of collision (In a DB is there any chance that is non-negligible? :) ) and the values must be checked against the actual. The win is the collapse of the index size and only needed to check a small fraction of the actual tuples.
Regards, Ken ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly