On 07/11/2012 01:36 PM, casper....@oracle.com wrote:
>> This assumes you have low volumes of deduplicated data. As your dedup
>> ratio grows, so does the performance hit from dedup=verify. At, say,
>> dedupratio=10.0x, on average, every write results in 10 reads.
> I don't follow.
> If dedupratio == 10, it means that each item is *referenced* 10 times
> but it is only stored *once*.  Only when you have hash collisions then 
> multiple reads would be needed.
> Only one read is needed except in the case of hash collisions.

No. With dedup=verify, *every* write that matches an existing DDT entry
results in a block read, collision or not. This is how:

 1) ZIO gets block X and computes HASH(X)
 2) ZIO looks up HASH(X) in DDT
  2a) HASH(X) not in DDT -> unique write; exit
  2b) HASH(X) in DDT; continue
 3) Read original disk block Y with HASH(Y) = HASH(X) <--here's the read
 4) Verify X == Y
  4a) X == Y; increment refcount
  4b) X != Y; hash collision; write new block   <-- here's the collision

In other words, by the time you find out whether you've got a hash
collision, you have already done the read. Ergo, every deduplicated
write creates a read!
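
The steps above can be sketched as a toy in-memory model that counts
reads. This is only an illustration, not ZFS code: ToyDedupStore, its
fields, and the pluggable hash_fn are all made-up names, SHA-256 is
assumed as the checksum, and the "disk" is a Python dict.

```python
import hashlib


class ToyDedupStore:
    """Toy model of a verify-on-dedup write path (hypothetical, not ZFS code)."""

    def __init__(self, hash_fn=None):
        # Assumption: SHA-256 stands in for the dedup checksum.
        self.hash_fn = hash_fn or (lambda b: hashlib.sha256(b).digest())
        self.ddt = {}    # hash -> list of [block_id, refcount]
        self.disk = {}   # block_id -> block contents
        self.reads = 0   # simulated disk reads
        self.writes = 0  # simulated disk writes
        self.next_id = 0

    def _write_new(self, h, data):
        block_id = self.next_id
        self.next_id += 1
        self.disk[block_id] = data
        self.writes += 1
        self.ddt.setdefault(h, []).append([block_id, 1])
        return block_id

    def write(self, data):
        h = self.hash_fn(data)                # 1) compute HASH(X)
        entries = self.ddt.get(h)             # 2) look up HASH(X) in DDT
        if not entries:                       # 2a) not in DDT: unique write
            return self._write_new(h, data)
        for entry in entries:                 # 2b) in DDT: continue
            self.reads += 1                   # 3) read block Y from disk
            if self.disk[entry[0]] == data:   # 4) verify X == Y
                entry[1] += 1                 # 4a) match: bump refcount
                return entry[0]
        return self._write_new(h, data)       # 4b) collision: write new block


store = ToyDedupStore()
for _ in range(10):
    store.write(b"A" * 4096)   # 1 unique write, then 9 duplicate writes
store.write(b"B" * 4096)       # unique write, no read
print(store.reads, store.writes)
```

In this model the read in step 3 happens on every DDT hit, before the
comparison in step 4 can tell a true duplicate from a collision.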

zfs-discuss mailing list
