On Wed, Jan 02, 2013 at 12:26:37PM -0600, Troy Benjegerdes wrote:
> The probability may be 'low' but it is not zero. Just because it's
> hard to calculate the hash doesn't mean you can't do it. If your
> input data is not random the probability of a hash collision is
> going to get scewed.

The cost of catching hash collisions is an extra read for every write.
It's possible to reduce this with a 2nd hash function and/or caching.

I'm not sure it's worth it given the extremely low probability of a hash
collision.

Venti is an example of an existing system where hash collisions were
ignored because the probability is so low.  See 3.1. Choice of Hash
Function section:

http://plan9.bell-labs.com/sys/doc/venti/venti.html

Stefan

Reply via email to