Re: [HACKERS] Hash index todo list item

Kenneth Marshall Fri, 07 Sep 2007 08:18:13 -0700

On Fri, Sep 07, 2007 at 11:08:13AM -0400, Brian Hurt wrote:
> Kenneth Marshall wrote:
>
>>>>      
>>> How likely is it that you will get a hash collision, two strings that are 
>>> different that will hash to the same value?  To avoid this requires a 
>>> very large hash key (128 bits, minimum)- otherwise you get into birthday 
>>> attack problems.  With a 32-bit hash, the likelyhood is greater than 50% 
>>> that two strings in a collection of 100,000 will hash to the same value.  
>>> With a 64-bit hash, the likelyhood is greater than 50% that two strings 
>>> in a collection of 10 billion will has to same value.  10 billion is a 
>>> large number, but not an unreasonable number, of strings to want to put 
>>> into a hash table- and it's exactly this case where the O(1) cost of 
>>> hashtables starts being a real win.
>>>
>>> Brian
>>>
>>>    
>> Yes, there is a non-negligible chance of collision (In a DB is there
>> any chance that is non-negligible? :) ) and the values must be checked
>> against the actual. The win is the collapse of the index size and only
>> needed to check a small fraction of the actual tuples.
>>
>>
>>  
>
> Ah, OK- I misunderstood you.  I thought you were saying that the hash 
> values would need to be unique, and you wouldn't check the original values 
> at all.  My bad.
>
> Brian
>
No, you were correct. I misstated originally and you and Mark both pointed
out my mistake.


Regards,
Ken

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faq

Re: [HACKERS] Hash index todo list item

Reply via email to