Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-18 Thread Orvar Korvar
...If this is a general rule, maybe it would be worth considering SHA-512 truncated to 256 bits to get more speed... Doesn't it need more investigation whether truncating a 512-bit hash to 256 bits gives security equivalent to a plain 256-bit hash? Maybe truncation will introduce some bias?
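One way to test the speed half of this question on a particular machine is a rough timing harness like the Python sketch below (the function names are illustrative). It only compares throughput on a 4 KiB block; it says nothing about whether plain truncation is as strong as a native 256-bit hash. Note that plain truncation is not the same as the standardized SHA-512/256 from FIPS 180-4, which uses different initial hash values.

```python
import hashlib
import os
import timeit


def sha256_digest(block: bytes) -> bytes:
    """Plain SHA-256: a 32-byte digest."""
    return hashlib.sha256(block).digest()


def sha512_trunc256(block: bytes) -> bytes:
    """SHA-512 with its output cut to the first 32 bytes (256 bits).

    Plain truncation only; NOT the FIPS 180-4 SHA-512/256 variant,
    which starts from different initial hash values.
    """
    return hashlib.sha512(block).digest()[:32]


def bench(fn, block: bytes, rounds: int = 20000) -> float:
    """Average seconds per call of fn(block) over `rounds` calls."""
    return timeit.timeit(lambda: fn(block), number=rounds) / rounds


if __name__ == "__main__":
    block = os.urandom(4096)  # one 4 KiB block of random data
    for name, fn in (("sha256", sha256_digest), ("sha512[:32]", sha512_trunc256)):
        print(f"{name:12s} {bench(fn, block) * 1e6:8.2f} us/block")
```

On 64-bit CPUs, software SHA-512 often gets through more bytes per cycle than SHA-256 because it works on 64-bit words and 128-byte input blocks, but this varies with the CPU and the library, so it is worth measuring rather than assuming.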

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-18 Thread Orvar Korvar
Totally Off Topic: Very interesting. Did you produce some papers on this? Where do you work? Seems like a very fun place to work! BTW, I thought about this. What do you say? Assume I want to compress data and I succeed in doing so. And then I transfer the compressed data. So all the information

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-18 Thread Nicolas Williams
On Tue, Jan 18, 2011 at 07:16:04AM -0800, Orvar Korvar wrote: BTW, I thought about this. What do you say? Assume I want to compress data and I succeed in doing so. And then I transfer the compressed data. So all the information I transferred is the compressed data. But, then you don't count

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-17 Thread Nicolas Williams
On Sat, Jan 15, 2011 at 10:19:23AM -0600, Bob Friesenhahn wrote: On Fri, 14 Jan 2011, Peter Taps wrote: Thank you for sharing the calculations. In lay terms, for Sha256, how many blocks of data would be needed to have one collision? Two. Pretty funny. In this thread some of you are

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-15 Thread Pawel Jakub Dawidek
On Fri, Jan 14, 2011 at 11:32:58AM -0800, Peter Taps wrote: Ed, Thank you for sharing the calculations. In lay terms, for Sha256, how many blocks of data would be needed to have one collision? Assuming each block is 4K in size, we probably can calculate the final data size beyond which

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-15 Thread Edward Ned Harvey
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Peter Taps Thank you for sharing the calculations. In lay terms, for Sha256, how many blocks of data would be needed to have one collision? There is no point in making a generalization and a

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-15 Thread Bob Friesenhahn
On Fri, 14 Jan 2011, Peter Taps wrote: Thank you for sharing the calculations. In lay terms, for Sha256, how many blocks of data would be needed to have one collision? Two. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-14 Thread Peter Taps
Ed, Thank you for sharing the calculations. In lay terms, for Sha256, how many blocks of data would be needed to have one collision? Assuming each block is 4K in size, we probably can calculate the final data size beyond which the collision may occur. This would enable us to make the

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-14 Thread Peter Taps
I am posting this once again as my previous post went into the middle of the thread and may go unnoticed. Ed, Thank you for sharing the calculations. In lay terms, for Sha256, how many blocks of data would be needed to have one collision? Assuming each block is 4K in size, we probably can

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-14 Thread David Magda
On Jan 14, 2011, at 14:32, Peter Taps wrote: Also, another related question. Why were 256 bits chosen and not 128 bits or 512 bits? I guess Sha512 may be overkill. In your formula, how many blocks of data would be needed to have one collision using Sha128? There are two ways to get 128

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-12 Thread Enrico Maria Crisostomo
Edward, this is OT but may I suggest using something like Wolfram Alpha to perform your calculations a bit more comfortably? -- Enrico M. Crisostomo On Jan 12, 2011, at 4:24, Edward Ned Harvey opensolarisisdeadlongliveopensola...@nedharvey.com wrote: For anyone who still cares: I'm

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-12 Thread Edward Ned Harvey
Edward, this is OT but may I suggest using something like Wolfram Alpha to perform your calculations a bit more comfortably? Wow, that's pretty awesome. Thanks.

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-11 Thread Lassi Tuura
Hey there, ~= 5.1E-57 Bah. My math is wrong. I was never very good at PS. I'll ask someone at work tomorrow to look at it and show me the folly. Wikipedia has it right, but I can't evaluate numbers to the few-hundredth power in any calculator that I have handy. bc -l <<EOF scale=150

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-11 Thread Edward Ned Harvey
From: Lassi Tuura [mailto:l...@cern.ch]
    bc -l <<EOF
    scale=150
    define bday(n, h) { return 1 - e(-(n^2)/(2*h)); }
    bday(2^35, 2^256)
    bday(2^35, 2^256) * 10^57
    EOF
Basically, ~5.1 * 10^-57. Seems your number was correct, although I am not sure how you arrived at it. The number was

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-11 Thread Edward Ned Harvey
For anyone who still cares: I'm calculating the odds of a sha256 collision in an extremely large zpool, containing 2^35 blocks of data, and no repetitions. The formula on wikipedia for the birthday problem is: p(n;d) ~= 1-( (d-1)/d )^( 0.5*n*(n-1) ) In this case, n=2^35 d=2^256 The problem
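A likely pitfall when evaluating this formula numerically: with d = 2^256, (d-1)/d is indistinguishable from 1 in double precision, so a literal evaluation returns 0. The Python sketch below (standard library only; the function names are illustrative) rewrites the power as exp(m * log1p(-1/d)) and the final subtraction as -expm1(...), which keeps the tiny result intact.

```python
import math


def collision_prob_naive(n: float, d: float) -> float:
    """1 - ((d-1)/d)^(n(n-1)/2), evaluated literally.

    For d = 2**256, d - 1.0 rounds back to d, so the base is exactly 1.0
    and the result collapses to 0.0.
    """
    return 1.0 - ((d - 1.0) / d) ** (0.5 * n * (n - 1.0))


def collision_prob(n: float, d: float) -> float:
    """Same formula, rearranged to survive tiny values.

    ((d-1)/d)^m = exp(m * log1p(-1/d)), and 1 - exp(y) = -expm1(y).
    """
    m = 0.5 * n * (n - 1.0)  # number of distinct block pairs
    return -math.expm1(m * math.log1p(-1.0 / d))


if __name__ == "__main__":
    n, d = 2.0**35, 2.0**256  # 2^35 blocks, 256-bit hash
    print(collision_prob_naive(n, d))  # precision lost: 0.0
    print(collision_prob(n, d))        # about 5.1e-57
```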

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-11 Thread Edward Ned Harvey
In case you were wondering how big n must be before the probability of collision becomes remotely possible, slightly possible, or even likely: given a fixed probability of collision p, the formula to calculate n is: n = 0.5 + sqrt( ( 0.25 + 2*l(1-p)/l((d-1)/d) ) ) (That's just the same equation
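The inverse formula can be evaluated the same way (bc's `l()` is the natural logarithm). This Python sketch assumes the same birthday formula and, following Peter's question, 4K blocks; `blocks_for_probability` and `BLOCK_BYTES` are illustrative names. Changing `bits` also covers the 128-bit variant asked about elsewhere in the thread.

```python
import math

BLOCK_BYTES = 4096  # assumed 4K blocks, as in Peter's question


def blocks_for_probability(p: float, bits: int) -> float:
    """Number of distinct blocks n at which the collision probability reaches p.

    Solves p = 1 - ((d-1)/d)**(n*(n-1)/2) with d = 2**bits:
      n*(n-1)/2 = L,  where L = ln(1-p) / ln((d-1)/d)
      n = 0.5 + sqrt(0.25 + 2*L)
    log1p keeps ln(1-p) and ln(1 - 1/d) accurate when p or 1/d is tiny.
    """
    d = 2.0 ** bits
    L = math.log1p(-p) / math.log1p(-1.0 / d)
    return 0.5 + math.sqrt(0.25 + 2.0 * L)


if __name__ == "__main__":
    for bits in (128, 256):
        for p in (1e-18, 0.5):
            n = blocks_for_probability(p, bits)
            print(f"{bits}-bit, p={p:g}: ~{n:.3g} blocks, ~{n * BLOCK_BYTES:.3g} bytes")
```

For 256 bits and p = 0.5 this gives roughly 4e38 blocks, i.e. about 1.18 * 2^128, matching the usual birthday rule of thumb n ~ sqrt(2 d ln 2).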

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-10 Thread Robert Milkowski
On 01/ 8/11 05:59 PM, Edward Ned Harvey wrote: Has anybody measured the cost of enabling or disabling verification? The cost of disabling verification is an infinitesimally small number multiplied by possibly all your data. Basically lim-0 times lim-infinity. This can only be evaluated on a

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-10 Thread Pawel Jakub Dawidek
On Sat, Jan 08, 2011 at 12:59:17PM -0500, Edward Ned Harvey wrote: Has anybody measured the cost of enabling or disabling verification? Of course there is no easy answer:) Let me explain how verification works exactly first. You try to write a block. You see that block is already in dedup
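The write path Pawel starts to describe can be modeled with a toy, in-memory dedup table. Everything here (`ToyDedupPool`, `digest_bytes`, storing a mismatching block separately) is a hypothetical simplification, not the ZFS DDT code. It only illustrates where the verification cost comes from: with verify enabled, every write whose checksum is already in the table costs an extra read of the existing block plus a byte comparison.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ToyDedupPool:
    """Hypothetical in-memory model of a dedup table; not the ZFS DDT."""
    verify: bool = True       # compare bytes before sharing a block
    digest_bytes: int = 32    # shrink to force collisions in a demo
    storage: dict = field(default_factory=dict)  # block id -> contents
    ddt: dict = field(default_factory=dict)      # checksum -> [block id, refcount]
    next_id: int = 0

    def checksum(self, data: bytes) -> bytes:
        return hashlib.sha256(data).digest()[: self.digest_bytes]

    def _allocate(self, data: bytes) -> int:
        block_id = self.next_id
        self.next_id += 1
        self.storage[block_id] = data
        return block_id

    def write(self, data: bytes) -> int:
        """Store data and return the id of the block that now represents it."""
        key = self.checksum(data)
        entry = self.ddt.get(key)
        if entry is None:
            # New checksum: allocate a block and start a refcount.
            block_id = self._allocate(data)
            self.ddt[key] = [block_id, 1]
            return block_id
        block_id = entry[0]
        # With verify on, this read + compare is the extra cost.
        if self.verify and self.storage[block_id] != data:
            # Checksum matches but contents differ (a collision):
            # keep the new data in its own, non-deduplicated block.
            return self._allocate(data)
        # Match (verified, or trusted blindly): share the existing block.
        entry[1] += 1
        return block_id
```

Setting `digest_bytes` very small forces collisions so both behaviors can be observed: with `verify=True` the colliding block gets its own storage, while with `verify=False` the later write silently resolves to the earlier block's contents.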

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-10 Thread David Magda
On Mon, January 10, 2011 02:41, Eric D. Mudama wrote: On Sun, Jan 9 at 22:54, Peter Taps wrote: Thank you all for your help. I am the OP. I haven't looked at the link that talks about the probability of collision. Intuitively, I still wonder how the chances of collision can be so low. We

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-10 Thread Edward Ned Harvey
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Peter Taps I haven't looked at the link that talks about the probability of collision. Intuitively, I still wonder how the chances of collision can be so low. We are reducing a 4K block to

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-10 Thread Edward Ned Harvey
From: Pawel Jakub Dawidek [mailto:p...@freebsd.org] Well, I find it quite reasonable. If your block is referenced 100 times, it is probably quite important. If your block is referenced 1 time, it is probably quite important. Hence redundancy in the pool. There are many corruption

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-10 Thread Edward Ned Harvey
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of David Magda Knowing exactly how the math (?) works is not necessary, but understanding Understanding the math is not necessary, but it is pretty easy. And unfortunately it becomes kind of

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-10 Thread Edward Ned Harvey
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Edward Ned Harvey ~= 5.1E-57 Bah. My math is wrong. I was never very good at PS. I'll ask someone at work tomorrow to look at it and show me the folly. Wikipedia has it right, but I can't

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-09 Thread Pawel Jakub Dawidek
On Fri, Jan 07, 2011 at 03:06:26PM -0800, Brandon High wrote: On Fri, Jan 7, 2011 at 11:33 AM, Robert Milkowski mi...@task.gda.pl wrote: end-up with the block A. Now if B is relatively common in your data set you have a relatively big impact on many files because of one corrupted block

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-09 Thread Edward Ned Harvey
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Pawel Jakub Dawidek Dedupditto doesn't work exactly that way. You can have at most 3 copies of your block. Dedupditto minimal value is 100. The first copy is created on first write, the

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-09 Thread Peter Taps
Thank you all for your help. I am the OP. I haven't looked at the link that talks about the probability of collision. Intuitively, I still wonder how the chances of collision can be so low. We are reducing a 4K block to just 256 bits. If the chances of collision are so low, *theoretically* it

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-09 Thread Eric D. Mudama
On Sun, Jan 9 at 22:54, Peter Taps wrote: Thank you all for your help. I am the OP. I haven't looked at the link that talks about the probability of collision. Intuitively, I still wonder how the chances of collision can be so low. We are reducing a 4K block to just 256 bits. If the chances of

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-08 Thread Robert Milkowski
On 01/ 7/11 09:02 PM, Pawel Jakub Dawidek wrote: On Fri, Jan 07, 2011 at 07:33:53PM +, Robert Milkowski wrote: Now what if block B is a meta-data block? Metadata is not deduplicated. Good point but then it depends on a perspective. What if you are storing lots of VMDKs? One

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-08 Thread Edward Ned Harvey
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Robert Milkowski What if you are storing lots of VMDKs? One corrupted block which is shared among hundreds of VMDKs will affect all of them. And it might be a block containing meta-data

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-08 Thread Bob Friesenhahn
On Thu, 6 Jan 2011, David Magda wrote: If you're not worried about disk read errors (and/or are not experiencing them), then you shouldn't be worried about hash collisions. Except for the little problem that if there is a collision then there will always be a collision for the same data and

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-07 Thread Bakul Shah
On Thu, 06 Jan 2011 22:42:15 PST Michael DeMan sola...@deman.com wrote: To be quite honest, I too am skeptical about using de-dupe just based on SHA256. In prior posts it was asked that the potential adopter of the technology provide the mathematical reason to NOT use SHA-256 only.

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-07 Thread Darren J Moffat
On 06/01/2011 23:07, David Magda wrote: On Jan 6, 2011, at 15:57, Nicolas Williams wrote: Fletcher is faster than SHA-256, so I think that must be what you're asking about: can Fletcher+Verification be faster than Sha256+NoVerification? Or do you have some other goal? Would running on
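For reference, the Fletcher-4 recurrence behind ZFS's fletcher4 checksum is tiny, which is why it is so much cheaper than SHA-256. The sketch below is pure Python and assumes little-endian 32-bit words (as on x86); real implementations are optimized C. It also shows why Fletcher alone is unsafe for dedup without verification: the sums are linear in the input words, so collisions are easy to construct.

```python
import struct

MASK64 = (1 << 64) - 1  # the running sums wrap at 64 bits


def fletcher4(data: bytes) -> tuple:
    """Fletcher-4 style checksum: four cascaded 64-bit sums over 32-bit words.

    Sketch only: assumes little-endian words and a length that is a
    multiple of 4 bytes.
    """
    if len(data) % 4:
        raise ValueError("buffer length must be a multiple of 4")
    a = b = c = d = 0
    for (word,) in struct.iter_unpack("<I", data):
        a = (a + word) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return (a, b, c, d)


if __name__ == "__main__":
    # Two different 20-byte buffers with identical Fletcher-4 values.
    x = struct.pack("<5I", 11, 6, 16, 6, 11)
    y = struct.pack("<5I", 10, 10, 10, 10, 10)
    print(x != y, fletcher4(x) == fletcher4(y))  # prints: True True
```

The colliding pair works because adding (1, -4, 6, -4, 1) to five consecutive words leaves all four sums unchanged: it is a fourth finite difference, so it cancels against the constant, linear, quadratic and cubic weights that a, b, c and d apply to each word.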

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-07 Thread Sašo Kiselkov
On 01/07/2011 10:26 AM, Darren J Moffat wrote: On 06/01/2011 23:07, David Magda wrote: On Jan 6, 2011, at 15:57, Nicolas Williams wrote: Fletcher is faster than SHA-256, so I think that must be what you're asking about: can Fletcher+Verification be faster than Sha256+NoVerification? Or do

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-07 Thread Darren J Moffat
On 07/01/2011 11:56, Sašo Kiselkov wrote: On 01/07/2011 10:26 AM, Darren J Moffat wrote: On 06/01/2011 23:07, David Magda wrote: On Jan 6, 2011, at 15:57, Nicolas Williams wrote: Fletcher is faster than SHA-256, so I think that must be what you're asking about: can Fletcher+Verification be

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-07 Thread Sašo Kiselkov
On 01/07/2011 01:15 PM, Darren J Moffat wrote: On 07/01/2011 11:56, Sašo Kiselkov wrote: On 01/07/2011 10:26 AM, Darren J Moffat wrote: On 06/01/2011 23:07, David Magda wrote: On Jan 6, 2011, at 15:57, Nicolas Williams wrote: Fletcher is faster than SHA-256, so I think that must be what

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-07 Thread Edward Ned Harvey
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Bakul Shah See http://en.wikipedia.org/wiki/Birthday_problem -- in particular see section 5.1 and the probability table of section 3.4. They say The expected number of n-bit hashes that can

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-07 Thread David Magda
On Fri, January 7, 2011 04:26, Darren J Moffat wrote: On 06/01/2011 23:07, David Magda wrote: Would running on recent T-series servers, which have on-die crypto units, help any in this regard? The on chip SHA-256 implementation is not yet used see:

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-07 Thread David Magda
On Fri, January 7, 2011 01:42, Michael DeMan wrote: Then - there is the other side of things. The 'black swan' event. At some point, given percentages on a scenario like the example case above, one simply has to make the business justification case internally at their own company about

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-07 Thread Michael DeMan
On Jan 7, 2011, at 6:13 AM, David Magda wrote: On Fri, January 7, 2011 01:42, Michael DeMan wrote: Then - there is the other side of things. The 'black swan' event. At some point, given percentages on a scenario like the example case above, one simply has to make the business justification

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-07 Thread Nicolas Williams
On Fri, Jan 07, 2011 at 06:39:51AM -0800, Michael DeMan wrote: On Jan 7, 2011, at 6:13 AM, David Magda wrote: The other thing to note is that by default (with de-dupe disabled), ZFS uses Fletcher checksums to prevent data corruption. Add also the fact all other file systems don't have any

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-07 Thread Casper . Dik
On Fri, January 7, 2011 01:42, Michael DeMan wrote: Then - there is the other side of things. The 'black swan' event. At some point, given percentages on a scenario like the example case above, one simply has to make the business justification case internally at their own company about

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-07 Thread Robert Milkowski
On 01/ 7/11 02:13 PM, David Magda wrote: Given the above: most people are content enough to trust Fletcher to not have data corruption, but are worried about SHA-256 giving 'data corruption' when it comes de-dupe? The entire rest of the computing world is content to live with 10^-15 (for SAS
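To put the two kinds of risk side by side, here is a back-of-the-envelope Python sketch. It assumes the 10^-15 figure is a per-bit unrecoverable read error rate, reuses the 2^35-block pool from elsewhere in the thread, and assumes 4K blocks; the function names are illustrative. The two numbers are not the same kind of event: a read error is normally reported by the drive, while an unverified hash collision would go unnoticed.

```python
BLOCKS = 2**35        # pool size used earlier in the thread
BLOCK_BYTES = 4096    # assumed 4K blocks
URE_PER_BIT = 1e-15   # quoted rate, treated here as errors per bit read


def expected_read_errors(n_bytes: int, rate_per_bit: float) -> float:
    """Expected unrecoverable read errors when reading n_bytes once."""
    return n_bytes * 8 * rate_per_bit


def sha256_collision_prob(n_blocks: int) -> float:
    """Birthday approximation n^2 / (2 * 2^256); valid while the result is tiny."""
    return n_blocks**2 / (2 * 2**256)


if __name__ == "__main__":
    pool_bytes = BLOCKS * BLOCK_BYTES
    print(f"pool size:                     {pool_bytes / 2**40:.0f} TiB")
    print(f"expected UREs per full read:   {expected_read_errors(pool_bytes, URE_PER_BIT):.2f}")
    print(f"SHA-256 collision probability: {sha256_collision_prob(BLOCKS):.2g}")
```

Under these assumptions, a single full read of a 128 TiB pool is expected to hit about one unrecoverable read error, while the collision probability for the whole pool is around 5 * 10^-57.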

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-07 Thread David Magda
On Fri, January 7, 2011 14:33, Robert Milkowski wrote: On 01/ 7/11 02:13 PM, David Magda wrote: Given the above: most people are content enough to trust Fletcher to not have data corruption, but are worried about SHA-256 giving 'data corruption' when it comes de-dupe? The entire rest of the

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-07 Thread Pawel Jakub Dawidek
On Fri, Jan 07, 2011 at 07:33:53PM +, Robert Milkowski wrote: On 01/ 7/11 02:13 PM, David Magda wrote: Given the above: most people are content enough to trust Fletcher to not have data corruption, but are worried about SHA-256 giving 'data corruption' when it comes de-dupe? The entire

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-07 Thread Brandon High
On Fri, Jan 7, 2011 at 11:33 AM, Robert Milkowski mi...@task.gda.pl wrote: end-up with the block A. Now if B is relatively common in your data set you have a relatively big impact on many files because of one corrupted block (additionally from a fs point of view this is a silent data

[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-06 Thread Peter Taps
Folks, I have been told that the checksum value returned by Sha256 is almost guaranteed to be unique. In fact, if Sha256 fails in some case, we have a bigger problem such as memory corruption, etc. Essentially, adding verification to sha256 is overkill. Perhaps (Sha256+NoVerification)

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-06 Thread David Magda
On Thu, January 6, 2011 14:44, Peter Taps wrote: I have been told that the checksum value returned by Sha256 is almost guaranteed to be unique. In fact, if Sha256 fails in some case, we have a bigger problem such as memory corruption, etc. Essentially, adding verification to sha256 is an

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-06 Thread Robert Milkowski
On 01/ 6/11 07:44 PM, Peter Taps wrote: Folks, I have been told that the checksum value returned by Sha256 is almost guaranteed to be unique. In fact, if Sha256 fails in some case, we have a bigger problem such as memory corruption, etc. Essentially, adding verification to sha256 is an

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-06 Thread Richard Elling
On Jan 6, 2011, at 11:44 AM, Peter Taps wrote: Folks, I have been told that the checksum value returned by Sha256 is almost guaranteed to be unique. In fact, if Sha256 fails in some case, we have a bigger problem such as memory corruption, etc. Essentially, adding verification to sha256

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-06 Thread Nicolas Williams
On Thu, Jan 06, 2011 at 11:44:31AM -0800, Peter Taps wrote: I have been told that the checksum value returned by Sha256 is almost guaranteed to be unique. All hash functions are guaranteed to have collisions [for inputs larger than their output anyways]. In fact, if
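The pigeonhole argument is easy to see by shrinking the hash. A 4K block has 2^32768 possible contents and SHA-256 only 2^256 outputs, so collisions must exist; they are just infeasible to find at full width. The Python sketch below truncates SHA-256 to a few bytes (an illustrative setup, not anything ZFS does) and finds a collision among distinct 4 KiB blocks.

```python
import hashlib
from itertools import count


def truncated_sha256(data: bytes, n_bytes: int) -> bytes:
    """First n_bytes of SHA-256: a deliberately weakened hash for the demo."""
    return hashlib.sha256(data).digest()[:n_bytes]


def make_block(i: int) -> bytes:
    """Distinct 4096-byte block derived from a counter (illustrative data)."""
    return i.to_bytes(8, "big") * 512


def find_collision(n_bytes: int) -> tuple:
    """Hash distinct blocks until two share the same truncated digest.

    There are only 2**(8*n_bytes) possible digests, so by the pigeonhole
    principle a collision is guaranteed within 2**(8*n_bytes) + 1 blocks;
    the birthday bound says one usually appears after about 2**(4*n_bytes).
    """
    seen = {}  # truncated digest -> counter of the block that produced it
    for i in count():
        digest = truncated_sha256(make_block(i), n_bytes)
        if digest in seen:
            return make_block(seen[digest]), make_block(i)
        seen[digest] = i


if __name__ == "__main__":
    a, b = find_collision(2)  # 16-bit digest
    print(a != b, truncated_sha256(a, 2) == truncated_sha256(b, 2))  # prints: True True
```

With a 2-byte digest the search typically ends after a few hundred blocks, close to the birthday estimate of about 2^8; each extra byte of digest multiplies that by 16.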

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-06 Thread David Magda
On Jan 6, 2011, at 15:57, Nicolas Williams wrote: Fletcher is faster than SHA-256, so I think that must be what you're asking about: can Fletcher+Verification be faster than Sha256+NoVerification? Or do you have some other goal? Would running on recent T-series servers, which have

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-06 Thread Nicolas Williams
On Thu, Jan 06, 2011 at 06:07:47PM -0500, David Magda wrote: On Jan 6, 2011, at 15:57, Nicolas Williams wrote: Fletcher is faster than SHA-256, so I think that must be what you're asking about: can Fletcher+Verification be faster than Sha256+NoVerification? Or do you have some other

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-06 Thread Edward Ned Harvey
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Peter Taps Perhaps (Sha256+NoVerification) would work 99.99% of the time. But Append 50 more 9's on there. 99.% See below. I

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-06 Thread Michael DeMan
At the end of the day this issue essentially is about mathematical improbability versus certainty? To be quite honest, I too am skeptical about using de-dupe just based on SHA256. In prior posts it was asked that the potential adopter of the technology provide the mathematical reason to