>> This assumes you have low volumes of deduplicated data. As your dedup
>> ratio grows, so does the performance hit from dedup=verify. At, say,
>> dedupratio=10.0x, on average, every write results in 10 reads.
> Well you can't make an omelette without breaking eggs! Not a very nice one, 
> anyway.
> Yes dedup is expensive but much like using O_SYNC, it's a conscious decision 
> here to take a performance hit in order to be sure about our data. Moving the 
> actual reads to an async thread as I suggested should improve things.

And my point here is that the expense is unnecessary and can be avoided
if we choose our algorithms and settings carefully.

"Async" here won't help: you'll still get the same write amplification;
it will only be spread across txgs.
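
To illustrate, here is a toy sketch in Python. It is not ZFS code: the
function name, the sample data, and the cost of one verify read per
dedup hit are all assumptions made for the example. It only counts which
txg each verify read gets charged to, inline versus deferred.

```python
from collections import Counter

def verify_reads_per_txg(write_txgs, is_duplicate, delay=0):
    """Toy model: count verify reads charged to each txg.

    write_txgs   -- txg number in which each write arrives
    is_duplicate -- True where the write hits an existing dedup entry
    delay        -- how many txgs later the verify read is issued
                    (0 = inline, >0 = deferred to an async thread)
    """
    reads = Counter()
    for txg, dup in zip(write_txgs, is_duplicate):
        if dup:
            # Assumption: each dedup hit costs exactly one verify read.
            reads[txg + delay] += 1
    return dict(reads)

# Hypothetical workload: six writes over three txgs, five of them duplicates.
txgs = [1, 1, 1, 2, 2, 3]
dups = [True, True, False, True, True, True]

inline = verify_reads_per_txg(txgs, dups, delay=0)
deferred = verify_reads_per_txg(txgs, dups, delay=1)

print("inline:  ", inline, "total", sum(inline.values()))
print("deferred:", deferred, "total", sum(deferred.values()))
```

Under these assumptions the per-txg distribution shifts, but the total
number of verify reads is identical in both cases.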

