On 07/11/2012 04:19 PM, Gregg Wonderly wrote:
> But this is precisely the kind of "observation" that some people seem to miss
> out on the importance of. As Tomas suggested in his post, if this was true,
> then we could have a huge compression ratio as well. And even if there was
> 10% of the bit patterns that created non-unique hashes, you could use the
> fact that a block hashed to a known bit pattern that didn't have collisions,
> to compress the other 90% of your data.
>
> I'm serious about this from a number of perspectives. We worry about the
> time it would take to reverse SHA or RSA hashes to passwords, not even
> thinking that what if someone has been quietly computing all possible hashes
> for the past 10-20 years into a database some where, with every 5-16
> character password, and now has an instantly searchable hash-to-password
> database.

This is something very well known in the security community as "rainbow
tables", and a common way to protect against it is salting. Never use a
password hashing scheme that doesn't use salts, for exactly the reason you
outlined above.

> Sometimes we ignore the scale of time, thinking that only the immediately
> visible details are what we have to work with.
>
> If no one has computed the hashes for every single 4K and 8K block, then
> fine. But, if that was done, and we had that data, we'd know for sure which
> algorithm was going to work the best for the number of bits we are
> considering.

Do you even realize how many 4K or 8K blocks there are?!?! Exactly 2^32768
or 2^65536 respectively (a 4K block is 32768 bits, an 8K block is 65536
bits). I wouldn't worry about somebody having those pre-hashed ;-) Rainbow
tables only work for a very limited subset of data.

> Speculating based on the theory of the algorithms for "random" number of bits
> is just silly. Where's the real data that tells us what we need to know?

If you don't trust math, then there's little I can do to convince you. But
remember our conversation the next time you step into a car or get on an
airplane. The odds that you'll die on that ride are far higher than that
you'll find a random hash collision in a 256-bit hash algorithm...

--
Saso
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
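
For anyone who wants to check the arithmetic above, here is a minimal Python sketch. It rests on assumptions that are not part of the thread: the function names are made up for illustration, the hash is treated as an ideal 256-bit function with uniformly distributed output, and the pool of 2^64 unique blocks is a purely hypothetical (and absurdly large) figure. The collision estimate is the simple pairwise upper bound n(n-1)/2 / 2^bits, not an exact probability.

```python
import math
from fractions import Fraction


def log10_block_count(block_bytes):
    """Return log10 of the number of distinct blocks of the given size.

    A block of block_bytes bytes has 8 * block_bytes bits, so there are
    2^(8 * block_bytes) possible blocks. The logarithm is returned because
    the count itself has thousands of decimal digits.
    """
    return 8 * block_bytes * math.log10(2)


def collision_bound(n_blocks, hash_bits=256):
    """Upper bound on the chance that any two of n_blocks distinct inputs
    collide, ASSUMING an ideal hash with hash_bits bits of uniform output.

    This is the pairwise (birthday) bound: number of pairs / 2^hash_bits.
    """
    pairs = n_blocks * (n_blocks - 1) // 2
    return Fraction(pairs, 2 ** hash_bits)


if __name__ == "__main__":
    for size in (4096, 8192):
        exponent = log10_block_count(size)
        print(f"{size}-byte blocks: roughly 10^{exponent:.0f} possible blocks")

    n = 2 ** 64  # hypothetical number of unique blocks in a pool
    bound = collision_bound(n)
    print(f"collision bound for 2^64 blocks: {float(bound):.3e}")
```

Even with that hypothetical 2^64-block pool, the bound stays below 2^-128, which is the sense in which a random collision in a 256-bit hash is less likely than the everyday risks mentioned above.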