Zooko Wilcox-O'Hearn wrote: > On Sunday, 2009-12-13, at 20:20 , David-Sarah Hopwood wrote: >>> But how many blocks does it take before you suffer a 10^-5 chance of >>> IV collision? >> >> sqrt(2 * 10^-5) * 2^48 =~ 2^39.7. >> >> (You can do it in your head: nearest power of 2 to 100000 is 2^17, so >> that's roughly 2^(48 - 17/2) = 2^39.5.) >> >>> How about a 10^-9 chance? >> >> sqrt(2 * 10^-9) * 2^48 =~ 2^33.1. > > Hrm, why is this the answer? Can you explain the math to me?
Given n random values drawn from a discrete uniform distribution with d >= n elements, let p be the probability that at least two values are the same. Then 1 - p is the probability that all values are distinct. When choosing the (k+1)th value, k values have already been chosen, so the probability that this value will be distinct from those already chosen is 1 - k/d. Therefore 1 - p = product{k = 1..n-1}(1 - k/d) for n <= d Apply the approximation 1 ? x =~ e^-x for each x = k/d: 1 - p =~ product{k = 1..n-1}(e^(-k/d)) = e^sum{k = 1..n-1}(-k/d) = e^-(n(n-1)/2d) ln(1 - p) =~ -n(n-1)/2d ln(1/(1 - p)) =~ n(n-1)/2d n^2 =~ 2d ln(1/(1 - p)) [when d >> n] n =~ sqrt(2d ln(1/(1 - p))) Apply 1 - p =~ e^-p for small p, therefore ln(1/(1 - p)) =~ p: n =~ sqrt(2dp) = sqrt(2p) sqrt(d) This doesn't tell us how close the approximation is. In fact it is only about 18% out for p = 1/2, and more accurate for all smaller p. > In any case, assuming you are right then this means that having a mere 8 > billion blocks in your filesystem would incur a one-in-a-billion chance > of IV collision. Hm. That's not something I would be comfortable with. Me neither -- at least not for a general-purpose system where you don't know the sensitivity of the data. >> In <http://article.gmane.org/gmane.comp.encryption.general/13719>, I >> suggested a scheme that would also address both issues (maybe it's >> similar to what you're thinking of). Ideally you would want a >> 256-bit-block cipher if you're going to use that approach for CTR >> mode, though. > > Ha ha! Thank you for reminding me of this. Well, I read your proposal > when you posted it, but I didn't really understand it all. Then a > couple of days ago I woke up in the morning with a clever idea in mind > for how to improve ZFS crypto, which I alluded to, above. Re-reading > your post now it appears that my clever invention is nothing but a poor > variation of yours. :-) This has happened to me many times :-) (You very often need to reinvent something in order to see that it's possible, and know what to look for in previous research.) > So, just like you said, this proposal of mine -- I mean of yours -- > would have several advantages: most importantly, it allows deduplication > of encrypted blocks and it does so even if dedupe was not enabled when > that encrypted dataset was created and filled with data. Oh, I hadn't spotted that. > Second most > important, it allows 128-bit IVs, which means that if you are unwilling > to tolerate a 10^-9 chance of IV collision, you can raise your dataset > size limit from 2^33 blocks (with a 96-bit IV) to 2^49 blocks (with a > 128-bit IV). > > Thirdly, your scheme allows any block cipher mode of operation including > unauthenticated ones. > > And like you said, the major drawback to this is that you have to > process each block in two passes -- once to compute the MAC and then a > second time to encrypt. Since the blocks are small enough to fit wholly > in RAM, this may not be a significant performance problem in practice -- > I'm not sure. If I were making the decision then I would measure that. > > You mentioned that this would allow a parallelizable computation of the > encryption, such as by using CTR mode, and that you don't know of any > unpatented parallelizable MAC. However it is easy to make a > parallelizable MAC: use a Merkle Tree! For example, generate a separate > MAC tag over each 4 KiB of the block in a separate thread, resulting in > 32 MAC tags (if the block were 128 KiB in size). Then concatenate all > the MAC tags together and compute a MAC over them. D'oh. Obvious when you point it out. That's simple enough that it's probably worth doing it that way, even if current machines can't take full advantage because of threading overheads. -- David-Sarah Hopwood ? http://davidsarah.livejournal.com -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 292 bytes Desc: OpenPGP digital signature URL: <http://mail.opensolaris.org/pipermail/zfs-crypto-discuss/attachments/20091215/c4d1950a/attachment.bin>