Garrett D'Amore wrote:
> Nicolas Williams wrote:
>> On Tue, Oct 20, 2009 at 10:51:29AM +0100, Darren J Moffat wrote:
>>
>>> Glad that you do offer verify as a choice. It would be very useful
>>> to provide some sort of log output for the cases where verify found a
>>> collision - i.e. the checksum hashes matched but the verify said they
>>> were different. Not useful to end users so it could be a DTrace SDT
>>> or only in a DEBUG kernel. If this ever shows up a "hit" when
>>> dedup=sha256,verify it will make ZFS famous for finding collisions in
>>> SHA256.
>>>
>>
>> A collision log for debug purposes would be nice. But collision stats
>> should be provided in any case because such stats can be useful for
>> estimating the usefulness of a hash function for this purpose (how
>> much time is spent computing hashes vs. how much time is spent verifying
>> blocks, and how many blocks do collide).
>>
>> Collision stats could also be a useful way to build confidence in a hash
>> function ("look! 0 collisions for SHA-3 candidate X on a 1PB pool with
>> random and real data!"). Of course, a SHA-3 candidate must build
>> confidence by surviving known cryptanalysis techniques + any new ones
>> that cryptographers throw at it, and no collisions in 1PB hardly
>> constitutes proof, but N>0 collisions in 1PB would likely be
>> worrisome and indicative that additional analysis is needed. Yes, a
>> flight of fancy, maybe just eye candy ("ah, SHA-256 seems to be working
>> as advertised"), but if so, it'd be cheap eye candy.
>>
>> % zpool get ddhashcolls,ddcollrate rpool
>> NAME   PROPERTY     VALUE   SOURCE
>> rpool  ddhashcolls  5       -
>> rpool  ddcollrate   .0135   -
>> %
>> (One prop would count total collisions ever seen, the other would be a
>> ratio of the first and the pool size.)
>>
>
> How about just a kstat where it can be located easily for debug, without
> polluting normal zfs properties?

kstats don't persist across a reboot or a pool export/import. But I agree
with Adam: this is a future nice-to-have feature that is more about
debugging than run-time stats, not a requirement for dedup's first
integration.

-- 
Darren J Moffat
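
For anyone following the thread, the counters being discussed can be
illustrated with a small toy model. This is only a sketch in Python, not
ZFS code: the class DedupStore, its fields and the hash_bytes knob are
all hypothetical names invented for illustration. It counts a collision
whenever the checksums match but the byte-for-byte verify says the
blocks differ, and truncates SHA-256 so that collisions can actually be
observed. The counters live only in memory, so like a kstat they would
not survive a restart.

```python
import hashlib


class DedupStore:
    """Toy in-memory dedup table with verify and a collision counter.

    Hypothetical illustration only; this does not mirror ZFS internals.
    """

    def __init__(self, hash_bytes=32):
        # hash_bytes < 32 truncates SHA-256 so collisions become observable.
        self.hash_bytes = hash_bytes
        self.table = {}           # checksum -> distinct blocks sharing it
        self.blocks_written = 0
        self.dedup_hits = 0       # checksum matched and verify agreed
        self.hash_collisions = 0  # checksum matched but verify disagreed

    def checksum(self, block):
        return hashlib.sha256(block).digest()[:self.hash_bytes]

    def write(self, block):
        self.blocks_written += 1
        candidates = self.table.setdefault(self.checksum(block), [])
        for existing in candidates:
            if existing == block:  # verify: byte-for-byte comparison
                self.dedup_hits += 1
                return
        if candidates:
            # Same checksum as a stored block, but different contents.
            self.hash_collisions += 1
        candidates.append(block)

    def collision_rate(self):
        # Assumed definition: collisions per block written.
        if self.blocks_written == 0:
            return 0.0
        return self.hash_collisions / self.blocks_written


if __name__ == "__main__":
    store = DedupStore(hash_bytes=1)  # deliberately weak 8-bit checksum
    for i in range(1000):
        store.write(i.to_bytes(4, "big"))
    print(store.hash_collisions, store.collision_rate())
```

With the full 32-byte digest the collision counter would be expected to
stay at zero on any realistic data set, which is the "cheap eye candy"
case from the quoted text.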