Re: [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication

Stefan Hajnoczi Thu, 03 Jan 2013 04:40:28 -0800

On Wed, Jan 02, 2013 at 12:26:37PM -0600, Troy Benjegerdes wrote:
> The probability may be 'low' but it is not zero. Just because it's
> hard to calculate the hash doesn't mean you can't do it. If your
> input data is not random the probability of a hash collision is
> going to get skewed.


The cost of catching hash collisions is an extra read for every write.
It's possible to reduce this with a 2nd hash function and/or caching.
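
To make the cost concrete, here is a minimal sketch of a verifying
dedup write path. It is a toy in-memory model, not the code from the
patch series: the DedupStore class, its fields, and the choice of
SHA-256 are all assumptions made for illustration.

```python
import hashlib


class DedupStore:
    """Toy in-memory dedup store (hypothetical, not QEMU's implementation)."""

    def __init__(self, verify=True):
        self.verify = verify
        self.index = {}   # primary hash -> block number
        self.blocks = []  # stands in for the disk
        self.reads = 0    # simulated disk reads caused by verification

    def _read(self, block_number):
        # Every call here models one extra disk read.
        self.reads += 1
        return self.blocks[block_number]

    def write(self, data):
        key = hashlib.sha256(data).digest()
        block_number = self.index.get(key)
        if block_number is not None:
            # Hash already known: read the stored block back and compare
            # it byte for byte before trusting the match.
            if self.verify and self._read(block_number) != data:
                raise RuntimeError("hash collision detected")
            return block_number  # deduplicated, nothing new stored
        # Unknown hash: store the block and index it.
        self.blocks.append(data)
        self.index[key] = len(self.blocks) - 1
        return self.index[key]
```

In this sketch the read-back happens on each write whose hash is
already in the index.  With verify=False the store trusts the hash
alone and never reads.  A 2nd hash kept in the index, or a cache of
recently verified blocks, would replace some of those reads.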

I'm not sure it's worth it given the extremely low probability of a hash
collision.
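
For a rough sense of scale, the birthday bound gives an upper limit on
the chance of any collision among n blocks hashed to b bits:
p <= n(n-1)/2 / 2^b.  The sketch below computes it exactly.  It assumes
the hash output is uniformly distributed, so it says nothing about
deliberately constructed inputs.  The function name and the example
figures (2^40 blocks, a 256-bit hash) are assumptions for illustration,
not values taken from the patch series.

```python
from fractions import Fraction


def collision_upper_bound(n_blocks, hash_bits):
    """Birthday upper bound on the probability of any collision.

    Assumes each block's hash is uniform over 2**hash_bits values.
    """
    pairs = n_blocks * (n_blocks - 1) // 2  # number of distinct block pairs
    return Fraction(pairs, 2 ** hash_bits)


# Example: 2**40 blocks (4 PiB of 4 KiB clusters) with a 256-bit hash.
bound = collision_upper_bound(2 ** 40, 256)
print(float(bound))
```

The bound grows with the square of the number of blocks, so doubling
the amount of stored data quadruples it.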

Venti is an example of an existing system where hash collisions are
ignored because the probability is so low.  See section 3.1, "Choice of
Hash Function":

http://plan9.bell-labs.com/sys/doc/venti/venti.html

Stefan
