On Wed, Jan 02, 2013 at 12:26:37PM -0600, Troy Benjegerdes wrote: > The probability may be 'low' but it is not zero. Just because it's > hard to calculate the hash doesn't mean you can't do it. If your > input data is not random the probability of a hash collision is > going to get scewed.
The cost of catching hash collisions is an extra read for every write. It's possible to reduce this with a 2nd hash function and/or caching. I'm not sure it's worth it given the extremely low probability of a hash collision. Venti is an example of an existing system where hash collisions were ignored because the probability is so low. See 3.1. Choice of Hash Function section: http://plan9.bell-labs.com/sys/doc/venti/venti.html Stefan