On Tue, Oct 20, 2009 at 10:51:29AM +0100, Darren J Moffat wrote:
> Glad that you do offer verify as a choice. It would be very useful to
> provide some sort of log output for the cases where verify found a
> collision - ie the checksum hashes matched but the verify said they were
> different. Not useful to end users so it could be a DTrace SDT or only
> in a DEBUG kernel. If this ever shows up a "hit" when
> dedup=sha256,verify it will make ZFS famous for finding collisions in
> SHA256.

A collision log for debug purposes would be nice. But collision stats
should be provided in any case, because such stats can be useful for
estimating the usefulness of a hash function for this purpose (how much
time is spent computing hashes vs. how much time is spent verifying
blocks, and how many blocks do collide).

Collision stats could also be a useful way to build confidence in a hash
function ("look! 0 collisions for SHA-3 candidate X on a 1PB pool with
random and real data!"). Of course, a SHA-3 candidate must build
confidence by surviving known cryptanalysis techniques plus any new ones
that cryptographers throw at it, and no collisions in 1PB hardly
constitutes proof, but N>0 collisions in 1PB would likely be worrisome
and indicative that additional analysis is needed.

Yes, a flight of fancy, maybe just eye candy ("ah, SHA-256 seems to be
working as advertised"), but if so, it'd be cheap eye candy.

% zpool get ddhashcolls,ddcollrate rpool
NAME   PROPERTY     VALUE  SOURCE
rpool  ddhashcolls  5      -
rpool  ddcollrate   .0135  -
%

(One prop would count total collisions ever seen, the other would be a
ratio of the first to the pool size.)

Nico
--
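
For illustration only, here is a rough sketch of how the two counters
proposed above might be maintained by a dedup table that verifies on
every hash match. This is a toy in-memory model, not ZFS code: the class
DedupTable, its fields, and the choice to express the rate as collisions
per block written (standing in for "pool size") are all assumptions made
for the example.

```python
import hashlib


class DedupTable:
    """Toy in-memory dedup table with verify. Hypothetical, not ZFS code."""

    def __init__(self, hash_fn=None):
        # Default to SHA-256; a weaker hash_fn can be passed in to force
        # collisions when experimenting.
        self.hash_fn = hash_fn or (lambda b: hashlib.sha256(b).digest())
        self.blocks = {}          # digest -> list of distinct block contents
        self.hash_collisions = 0  # analogue of the proposed 'ddhashcolls'
        self.blocks_written = 0   # stand-in for pool size (assumption)

    def write(self, block: bytes) -> bool:
        """Store a block; return True if it was deduplicated."""
        self.blocks_written += 1
        digest = self.hash_fn(block)
        candidates = self.blocks.setdefault(digest, [])
        # Verify step: a hash match only counts as a duplicate if the
        # block contents are byte-for-byte identical.
        for stored in candidates:
            if stored == block:
                return True
        # Same digest, different contents: a genuine hash collision.
        if candidates:
            self.hash_collisions += 1
        candidates.append(block)
        return False

    def collision_rate(self) -> float:
        """Analogue of the proposed 'ddcollrate' (collisions per write)."""
        if self.blocks_written == 0:
            return 0.0
        return self.hash_collisions / self.blocks_written
```

With the default SHA-256 the collision counter should stay at zero;
passing a deliberately weak hash_fn (say, the first byte of the block)
makes it easy to see the counters move.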