On 07/11/2012 01:36 PM, casper....@oracle.com wrote:
>> This assumes you have low volumes of deduplicated data. As your dedup
>> ratio grows, so does the performance hit from dedup=verify. At, say,
>> dedupratio=10.0x, on average, every write results in 10 reads.
> I don't follow.
> If dedupratio == 10, it means that each item is *referenced* 10 times
> but it is only stored *once*. Only when you have hash collisions then
> multiple reads would be needed.
> Only one read is needed except in the case of hash collisions.
No, *every* dedup write will result in a block read. This is how:
1) ZIO gets block X and computes HASH(X)
2) ZIO looks up HASH(X) in DDT
2a) HASH(X) not in DDT -> unique write; exit
2b) HASH(X) in DDT; continue
3) Read original disk block Y with HASH(Y) = HASH(X) <-- here's the read
4) Verify X == Y
4a) X == Y; increment refcount
4b) X != Y; hash collision; write new block <-- here's the collision
In other words, by the time you can tell whether you've got a hash
collision, you've already done the read. Ergo, every write that actually
dedups (i.e., hits in the DDT) creates a read!
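To make the flow concrete, here's a rough C sketch of the decision path
above. It is not the actual ZFS code (the real logic lives in
zio_ddt_write() / ddt.c); the helper names, the ddt_entry_t layout, and
the 128K block size are illustrative assumptions only:

/*
 * Simplified sketch of the dedup=verify write path described above.
 * ddt_lookup(), read_block() and write_unique_block() are assumed
 * helpers, not real ZFS APIs.
 */
#include <stdlib.h>
#include <string.h>

enum { BLOCKSIZE = 128 * 1024 };          /* assume 128K records */

typedef unsigned char hash_t[32];         /* e.g. a SHA-256 checksum */

typedef struct ddt_entry {
    hash_t        hash;                   /* HASH(Y) of the stored block */
    unsigned long refcount;               /* logical references to it    */
    unsigned long disk_addr;              /* where block Y lives on disk */
} ddt_entry_t;

/* Assumed helpers, not real ZFS functions: */
extern ddt_entry_t *ddt_lookup(const unsigned char *h);   /* NULL if absent */
extern void read_block(unsigned long addr, void *buf, size_t len);
extern void write_unique_block(const void *buf, size_t len);

void
dedup_verify_write(const void *x, const unsigned char *hash_x)
{
    ddt_entry_t *e = ddt_lookup(hash_x);          /* step 2: DDT lookup    */

    if (e == NULL) {                              /* step 2a: unique data  */
        write_unique_block(x, BLOCKSIZE);         /* no extra read needed  */
        return;
    }

    /*
     * Step 3: hash already in DDT -- read the existing block Y.
     * This read happens on EVERY write that hits in the DDT, i.e.
     * on every write that actually deduplicates.
     */
    void *y = malloc(BLOCKSIZE);
    read_block(e->disk_addr, y, BLOCKSIZE);

    if (memcmp(x, y, BLOCKSIZE) == 0)             /* step 4a: real dup     */
        e->refcount++;
    else                                          /* step 4b: collision    */
        write_unique_block(x, BLOCKSIZE);

    free(y);
}

The key point is visible in the sketch: the read in step 3 happens before
the comparison in step 4 can tell you whether the match was real, so every
write that hits in the DDT pays for one read, collision or not.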