On Tue, Oct 20, 2009 at 10:51:29AM +0100, Darren J Moffat wrote:
> Glad that you do offer verify as a choice.   It would be very useful to 
> provide some sort of log output for the cases where verify found a 
> collision - ie the checksum hashes matched but the verify said they were 
> different.  Not useful to end users so it could be a DTrace SDT or only 
> in a DEBUG kernel.  If this ever shows up a "hit" when 
> dedup=sha256,verify it will make ZFS famous for finding collisions in 
> SHA256.

A collision log for debug purposes would be nice.  But collision stats
should be provided in any case because such stats can help in
estimating the usefulness of a hash function for this purpose (how
much time is spent computing hashes vs. how much time is spent verifying
blocks, and how many blocks do collide).

Collision stats could also be a useful way to build confidence in a hash
function ("look! 0 collisions for SHA-3 candidate X on a 1PB pool with
random and real data!").  Of course, a SHA-3 candidate must build
confidence by surviving known cryptanalysis techniques + any new ones
that cryptographers throw at it, and no collisions in 1PB hardly
constitutes proof, but N>0 collisions in 1PB would likely be
worrisome and indicative that additional analysis is needed.  Yes, a
flight of fancy, maybe just eye candy ("ah, SHA-256 seems to be working
as advertised"), but if so, it'd be cheap eye candy.
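
For a rough sense of why N>0 would be alarming, here is a
back-of-the-envelope sketch using the birthday approximation.  The
function name and the 128 KiB block size are my own assumptions for
illustration, and it assumes an ideal hash with every block in the pool
being distinct.

```python
from fractions import Fraction

def expected_collisions(pool_bytes, block_bytes, hash_bits):
    """Expected number of colliding block pairs for an ideal hash.

    Birthday approximation: n*(n-1)/2 pairs of distinct blocks, each
    pair colliding with probability 2**-hash_bits.
    """
    n = pool_bytes // block_bytes          # number of blocks (assumed all distinct)
    pairs = Fraction(n * (n - 1), 2)
    return pairs / Fraction(2) ** hash_bits

PIB = 2 ** 50                              # 1 PiB, standing in for "1PB"
BLOCK = 128 * 1024                         # assumed block size, not taken from ZFS

estimate = expected_collisions(PIB, BLOCK, 256)
print(float(estimate))
```

With those assumptions the pool holds 2^33 blocks, so the expectation
comes out at about 2^-191 collisions.  Any observed collision would be
far outside what an ideal 256-bit hash predicts.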

% zpool get ddhashcolls,ddcollrate rpool
NAME   PROPERTY       VALUE                     SOURCE
rpool  ddhashcolls    5                         -
rpool  ddcollrate     .0135                     -
% 

(One prop would count total collisions ever seen, the other would be a
ratio of the first to the pool size.)
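
To make the bookkeeping concrete, here is a minimal sketch of a dedup
table with verify that keeps the two counters.  It is a toy model, not
ZFS code: the class, its fields, and the deliberately weak 8-bit hash
are all hypothetical, and I am assuming "pool size" means bytes of
unique data stored, since the units of the ratio are left open above.

```python
import hashlib

class DedupTable:
    """Toy dedup table with verify that keeps collision statistics."""

    def __init__(self, hash_fn):
        self.hash_fn = hash_fn
        self.blocks = {}        # digest -> list of distinct blocks with that digest
        self.stored_bytes = 0   # unique bytes kept (assumed stand-in for pool size)
        self.hash_colls = 0     # digest matched but verify found different data

    def write(self, block):
        """Store a block; return True if it was deduplicated."""
        bucket = self.blocks.setdefault(self.hash_fn(block), [])
        if block in bucket:     # verify step: byte-for-byte comparison
            return True
        if bucket:              # same digest, different contents: a collision
            self.hash_colls += 1
        bucket.append(block)
        self.stored_bytes += len(block)
        return False

    def coll_rate(self):
        """Collisions per stored byte (assumed definition of the ratio)."""
        return self.hash_colls / self.stored_bytes if self.stored_bytes else 0.0

def sha256(block):
    return hashlib.sha256(block).digest()

def weak8(block):
    # Truncated to 8 bits purely so that collisions actually show up.
    return hashlib.sha256(block).digest()[:1]

strong = DedupTable(sha256)
weak = DedupTable(weak8)
for i in range(1000):
    data = i.to_bytes(4, "big") * 128      # 1000 distinct 512-byte blocks
    strong.write(data)
    weak.write(data)

for name, table in (("sha256", strong), ("weak8", weak)):
    print(name, "ddhashcolls", table.hash_colls, "ddcollrate", table.coll_rate())
```

The weak hash has only 256 possible digests, so most of the 1000
distinct blocks must collide, while the full SHA-256 table should report
none.  That contrast is the kind of signal the stats would give.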

Nico
-- 
