What I'm saying is that I am getting conflicting information from your 
rebuttals here.

I (and others) say there will be collisions that will cause data loss if verify 
is off.
You say it would be so rare as to be impossible from your perspective.
Tomas says, well then, let's just use the hash value for 4096x compression.
You fluff around his argument calling him names.
I say, well then compute all the possible hashes for all possible bit patterns 
and demonstrate no dupes.
You say it's not possible to do that.
I illustrate a way that loss of data could cost you money.
You say it's impossible for there to be a chance of me constructing a block 
that has the same hash but different content.
Several people have illustrated that 128K to 32 bytes is a huge and lossy 
compression ratio, yet you still say it's viable to leave verify off.
I say that, in fact, the number of unique patterns that can exist on any pool 
is small compared to the total number of possible bit patterns. That 
illustrates that I understand how little of the algorithm's key space is in 
use when looking at a single ZFS pool, and thus how a pool could go without 
ever seeing a collision.
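
To put rough numbers on the two points above (the 4096x ratio and the small 
part of the key space a pool uses), here is a minimal sketch. It assumes a 
128K block, a 256-bit (32-byte) checksum, and a hash that behaves like a 
uniform random function, which is an idealization and not a proven property 
of any real algorithm. The function names are mine, not anything in ZFS.

```python
import math

BLOCK_BYTES = 128 * 1024  # 128K block size discussed in this thread
HASH_BYTES = 32           # assumed 256-bit checksum (128K / 32 bytes = 4096)


def compression_ratio(block_bytes=BLOCK_BYTES, hash_bytes=HASH_BYTES):
    """How many times smaller the hash is than the block it stands for."""
    return block_bytes // hash_bytes


def collision_probability(unique_blocks, hash_bits=HASH_BYTES * 8):
    """Birthday approximation for at least one collision among n blocks.

    p ~= 1 - exp(-n * (n - 1) / 2^(hash_bits + 1)), valid only under the
    assumption that hash outputs are uniformly distributed.
    """
    exponent = -unique_blocks * (unique_blocks - 1) / 2.0 ** (hash_bits + 1)
    # expm1 keeps precision when the exponent is extremely close to zero.
    return -math.expm1(exponent)


if __name__ == "__main__":
    print("ratio:", compression_ratio())
    # 2**35 unique 128K blocks is a 4 PiB pool with no duplicates at all.
    print("p(collision), 256-bit hash:", collision_probability(2 ** 35))
    # The same formula with a 32-bit hash and only 100,000 blocks.
    print("p(collision), 32-bit hash:", collision_probability(100_000, 32))
```

Under those assumptions the 256-bit result is tiny but strictly greater than 
zero, while a 32-bit hash would collide more often than not at a mere 
100,000 blocks. The sketch says nothing about deliberately constructed 
collisions, which depend on the strength of the algorithm itself.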

So I can see what perspective you are drawing your confidence from, but I, and 
others, are not confident that the risk has zero probability.

I'm pushing you to find a way to demonstrate that there is zero risk because, 
if you do, you will in fact have created the ultimate compression factor to 
date for random bit patterns (while enlarging the set of keys that could 
collide, because the pool is now virtually larger), and you will also have 
demonstrated that the particular algorithm is very good for dedup.

That would indicate to me that you can then take that algorithm and run it 
inside of ZFS dedup to automatically manage when verify is necessary, by 
detecting when a collision occurs.
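
As a rough illustration of what detecting a collision would involve, here is 
a toy dedup table. It is a hypothetical sketch, not how ZFS implements dedup: 
the DedupTable class, its write() method, and the injectable hash_fn are all 
made up for this example. Blocks are kept in memory and compared byte for 
byte whenever a digest has been seen before.

```python
import hashlib


def sha256_digest(block):
    """Default hash for the sketch: full 256-bit SHA-256 digest."""
    return hashlib.sha256(block).digest()


class DedupTable:
    """Toy in-memory dedup table that verifies on every digest match."""

    def __init__(self, hash_fn=sha256_digest):
        self.hash_fn = hash_fn
        self.blocks = {}     # digest -> list of distinct blocks with that digest
        self.collisions = 0  # digest matched but the content differed

    def write(self, block):
        """Store a block and return a (digest, index) reference to it."""
        digest = self.hash_fn(block)
        bucket = self.blocks.setdefault(digest, [])
        for index, stored in enumerate(bucket):
            if stored == block:
                return digest, index  # true duplicate: share the stored copy
        if bucket:
            # Same digest, different bytes: without the comparison above,
            # this write would silently alias the earlier block.
            self.collisions += 1
        bucket.append(block)
        return digest, len(bucket) - 1


if __name__ == "__main__":
    # A deliberately weak 8-bit hash forces collisions by pigeonhole.
    weak = DedupTable(hash_fn=lambda b: hashlib.sha256(b).digest()[:1])
    for i in range(300):
        weak.write(i.to_bytes(4, "big"))
    print("collisions with an 8-bit hash:", weak.collisions)
```

Note that in this sketch the only way a collision is noticed is the byte 
comparison itself, so the detection step and the verify step are the same 
work.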

I appreciate the pushback.  I'm trying to drive thinking about this in the 
direction of what is known and finite, away from what is infinitely complex and 
thus impossible to explore.

Maybe all the work has already been done…


On Jul 11, 2012, at 11:02 AM, Sašo Kiselkov wrote:

> On 07/11/2012 05:58 PM, Gregg Wonderly wrote:
>> You're entirely sure that there could never be two different blocks that can 
>> hash to the same value and have different content?
>> Wow, can you just send me the cash now and we'll call it even?
> You're the one making the positive claim and I'm calling bullshit. So
> the onus is on you to demonstrate the collision (and that you arrived at
> it via your brute force method as described). Until then, my money stays
> safely on my bank account. Put up or shut up, as the old saying goes.
> --
> Saso

zfs-discuss mailing list
