On 07/11/2012 01:09 PM, Justin Stringfellow wrote:
>> The point is that hash functions are many-to-one, and I think the
>> argument was that verify isn't really needed if the hash function is
>> good enough.
> This is a circular argument really, isn't it? Hash algorithms are never 
> perfect, but we're trying to build a perfect one?
> It seems to me the obvious fix is to use hash to identify candidates for 
> dedup, and then do the actual verify and dedup asynchronously. Perhaps a 
> worker thread doing this at low priority?
> Did anyone consider this?

This assumes you have low volumes of deduplicated data. As your dedup
ratio grows, so does the performance hit from dedup=verify. At, say,
dedupratio=10.0x, on average, every write results in 10 reads.
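
As a rough illustration of how verify reads scale with the dedup ratio,
here is a toy in-memory model. It is not ZFS code: the class and function
names (ToyDedupStore, simulate) are made up for this sketch, and it assumes
that every write whose hash matches an existing block costs exactly one
verification read. Under that assumption a 10.0x ratio gives about 9 reads
per block actually stored (close to the figure above if "write" is taken to
mean a block that lands on disk), or just under one read per incoming write.

```python
import hashlib


class ToyDedupStore:
    """Toy block store with hash-based dedup and optional verify (hypothetical)."""

    def __init__(self, verify=True):
        self.verify = verify
        self.blocks = {}          # digest -> stored block contents
        self.refcount = {}        # digest -> number of references
        self.logical_writes = 0   # writes issued by the caller
        self.physical_writes = 0  # blocks actually stored
        self.reads = 0            # verification reads of existing blocks

    def write(self, data: bytes) -> None:
        self.logical_writes += 1
        digest = hashlib.sha256(data).digest()
        if digest in self.blocks:
            if self.verify:
                # Assumption: one read of the existing block per hash match.
                self.reads += 1
                if self.blocks[digest] != data:
                    # A real system would store the block separately;
                    # the toy model just reports the collision.
                    raise RuntimeError("hash collision detected by verify")
            self.refcount[digest] += 1
            return
        self.blocks[digest] = data
        self.refcount[digest] = 1
        self.physical_writes += 1


def simulate(unique_blocks: int, copies: int) -> ToyDedupStore:
    """Write `copies` passes over `unique_blocks` distinct 512-byte blocks."""
    store = ToyDedupStore(verify=True)
    for _ in range(copies):
        for i in range(unique_blocks):
            store.write(i.to_bytes(8, "little") * 64)
    return store


store = simulate(unique_blocks=100, copies=10)
ratio = store.logical_writes / store.physical_writes
print(f"dedup ratio:              {ratio:.1f}x")
print(f"reads per incoming write: {store.reads / store.logical_writes:.2f}")
print(f"reads per stored block:   {store.reads / store.physical_writes:.2f}")
```

The model ignores caching, frees, and the cost of the dedup table lookup
itself, so it only shows the trend: the more of the incoming data is
duplicate, the larger the share of writes that pay for a read.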

