Re: [p2p-hackers] convergent encryption reconsidered -- salting and key-strengthening
On Mar 31, 2008, at 4:47 AM, Ivan Krstić wrote:

> Tahoe doesn't run this service either. I can't use it to make guesses at any of the values you mentioned. I can use it to make guesses at whole documents incorporating such values, which is in most cases a highly non-trivial distinction.

The way that I would phrase this is that convergent encryption exposes whatever data is put into it, in whatever batch-size is put into it, to brute-force/dictionary attacks.

If the data that you put in is unguessable, then you needn't worry about these attacks. (Likewise, as Ben Laurie reminds us, using strong passwords is a sufficient defense against these attacks on passwords.)

You correctly emphasize that typical convergent encryption services (which operate on files, or, in the case of GNUnet, on 32 KiB blocks), and typical uses of those services (which typically store files as produced by apps written for traditional filesystems), batch together data in such a way that the aggregate is more likely to be unguessable than if each field were stored separately. I don't disagree with this observation.

I am often reminded of Niels Ferguson's and Bruce Schneier's dictum, in the excellent _Practical_Cryptography_, that security needs to be a *local* property. They argue that one should be able to tell whether a component is secure by inspecting that component itself, rather than by reasoning about interactions between that component and other components.

Concretely, convergent encryption with a per-user added secret, as currently implemented in Tahoe, can be shown to guarantee confidentiality of the data, regardless of what the data is. Traditional convergent encryption can be shown to offer confidentiality only with the proviso that the data put into it conform to certain criteria -- criteria that can be verified neither by a computer nor by a user who is not a skilled security expert.

You may argue that the chance that a user would put non-conformant data into it is small.
I don't necessarily disagree, although before I became willing to bet on it I would require more quantitative investigation.

However, arguing that component A is secure as long as component B behaves a certain way, and that component B is very likely to behave that way, is a different sort of argument than arguing that component A is secure regardless of the behavior of component B. For one thing, the behavior of component B may change in the future. Concretely, people may write apps that store data in Tahoe in a way that previous apps didn't. Those people will almost certainly be completely unaware of the nature of convergent encryption and brute-force/dictionary attacks.

Now obviously making the security properties of a system modular in this way might impose a performance cost. In the case of Tahoe, that cost is the loss of universal convergence.

Allmydata.com analyzed the space savings due to convergence among our current customers and found savings of around 1%. We (allmydata.com) intend to monitor the potential savings of universal convergence in an on-going way, and if it turns out that there are substantial benefits to be gained then I will revisit this issue and perhaps I will be forced to rely on an argument of the other form -- that users are unlikely to use it in an unsafe way.

Thank you again for your thoughtful comments on this issue.

Regards,

Zooko O'Whielacronx

- The Cryptography Mailing List
Unsubscribe by sending unsubscribe cryptography to [EMAIL PROTECTED]
convergent encryption reconsidered -- salting and key-strengthening
[This conversation is spanning three mailing lists -- cryptography@metzdowd.com, [EMAIL PROTECTED], and tahoe-[EMAIL PROTECTED]. Some of the posts have not reached all three of those lists. I've manually added Jerry Leichter and Ivan Krstić to the approved-senders set for p2p-hackers and tahoe-dev, so that further posts by them will appear on those lists.]

On Mar 30, 2008, at 3:13 PM, Ivan Krstić wrote:

> Unless I'm misunderstanding Zooko's writeup, he's worried about an attacker going from a partially-known plaintext (e.g. a form bank letter) to a completely-known plaintext by repeating the following process:
>
> 1. take partially known plaintext
> 2. make a guess, randomly or more intelligently where possible, about the unknown parts
> 3. take the current integrated partial+guessed plaintext, hash to obtain convergence key
> 4. verify whether that key exists in the storage index
> 5. if yes, you've found the full plaintext. if not, repeat from '2'.
>
> That's a brute force search.

That's right. Your comparison to normal old brute-force/dictionary attack on passwords is a good one, and one that Jim McCoy and Jerry Leichter have alluded to as well.

Think of it like this:

Passwords are susceptible to brute-force and/or dictionary attack. We can't, in general, prevent attackers from trying guesses at our passwords without also preventing users from using them, so instead we employ various techniques:

* salts (to break up the space of targets into subspaces, of which at most one can be targeted by a given brute-force attack)
* key strengthening (to increase by a constant factor the cost of checking a password)
* rate-limits for on-line tries (i.e., you get only a small fixed number of wrong guesses in a row before you are locked out for a time-out period)

However, secrets other than passwords are not usually susceptible to such attacks.
You can store your True Name, credit card number, bank account number, mother's maiden name, and so forth, on the same server as your password, but you don't have to worry about using salts or key strengthening on those latter secrets, because the server doesn't run a service that allows unauthenticated remote people to connect, submit a guess as to their value, and receive confirmation, the way it does for your password. (In other words, for such data we generally use an extreme form of the third defense, rate-limiting tries -- the number of guesses that an attacker gets is limited to none!)

Likewise, if you are going to store or transmit those kinds of secrets in encrypted form using a traditional randomly-generated encryption key, then you don't have to worry about brute-force/dictionary attacks because your strong randomly-selected symmetric encryption key defies them.

The Key Point:

*** Convergent encryption exposes whatever data is put into it to the sorts of attacks that already apply to passwords.

Convergent encryption had been invented, analyzed and used for many years, but to the best of my knowledge the first time that anyone noticed this issue was March 16 of this year (at 3 AM Chicago Time), when Drew Perttula and Brian Warner made that observation. (Although to be fair some of the best-known uses of convergent encryption during these years have been sharing publicly-known files with strangers, in which case I suppose it is assumed that the cleartext does not contain secrets.)

Now PBKDF2 is a combination of the first two defenses -- salting and key strengthening. When you first suggested PBKDF2, I -- and apparently Jerry Leichter -- thought that you were suggesting its salting feature as a solution.

The solution that we've come up with for Tahoe (described in my original note) is much like salting, except that the added value that gets mixed in is secret and unguessable, where I normally think of salt as non-secret.
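The guessing loop quoted earlier in this message, and the effect of mixing in a secret value, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Tahoe's code: SHA-256 stands in for H, the server's storage index is modeled as a set of hashes of convergence keys, and the form letter, the four-digit PIN field, and the names `convergence_key`, `storage_index_for` and `dictionary_attack` are all hypothetical.

```python
import hashlib
import os


def H(*parts: bytes) -> bytes:
    """Hash a sequence of byte strings; length prefixes keep the parts unambiguous."""
    h = hashlib.sha256()
    for part in parts:
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.digest()


def convergence_key(plaintext: bytes, secret: bytes = b"") -> bytes:
    """k = H(p) when no secret is given, k = H(s, p) otherwise."""
    return H(secret, plaintext) if secret else H(plaintext)


def storage_index_for(key: bytes) -> bytes:
    """Assumption: the server indexes stored ciphertexts by a hash of the key."""
    return H(b"storage-index", key)


def dictionary_attack(template: bytes, candidates, server_index: set):
    """Steps 1-5: fill in each guess, derive the key, check the index."""
    for guess in candidates:
        plaintext = template.replace(b"{PIN}", guess)
        if storage_index_for(convergence_key(plaintext)) in server_index:
            return guess  # the index confirmed the guess
    return None


# Hypothetical form letter with one low-entropy unknown field.
TEMPLATE = b"Dear customer, your new PIN is {PIN}. Keep it safe."
CANDIDATES = [f"{n:04d}".encode() for n in range(10_000)]
victim_plaintext = TEMPLATE.replace(b"{PIN}", b"4821")

# Traditional convergent encryption: the attacker can confirm guesses.
plain_index = {storage_index_for(convergence_key(victim_plaintext))}
recovered = dictionary_attack(TEMPLATE, CANDIDATES, plain_index)

# With an added secret the attacker does not know, no guess matches.
user_secret = os.urandom(32)
salted_index = {storage_index_for(convergence_key(victim_plaintext, user_secret))}
blocked = dictionary_attack(TEMPLATE, CANDIDATES, salted_index)
```

In this toy setup `recovered` comes back as the victim's PIN, while `blocked` is `None`, because every key the attacker derives lacks the secret. How often real documents have so little unknown content is exactly the point under discussion in this thread, and the sketch does not settle it.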
Now I see that you are also emphasizing the key strengthening feature of PBKDF2.

k denotes symmetric encryption key, p denotes plaintext, c denotes ciphertext, s denotes salt, E(key, plaintext) is encryption, H() is secure hashing, H^1000() is secure hashing a thousand times over, i.e. H(H(H(H(... a thousand times.

Traditional encryption:

    k = random()
    c = E(k, p)

Traditional convergent encryption:

    k = H(p)
    c = E(k, p)

Tahoe-style convergent encryption with added secret (s); s can be re-used for any number of files, but it is kept secret from everyone except those with whom you wish to converge storage.

    s = random()
    k = H(s, p)
    c = E(k, p)

PBKDF2 (simplified); s can be re-used but is generally not, and it is not secret.

    s = random()
    k = H^1000(s, password)
    c = E(k, p)

Now, one could imagine a variant of traditional convergent encryption which added key strengthening:

    k = H^1000(p)
    c = E(k, p)

This would have a performance impact on normal everyday use of Tahoe without, in my current estimation, making a brute-force/dictionary attack infeasible.
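The key derivations in the notation above map fairly directly onto code. The following is a sketch under assumptions rather than any system's real implementation: SHA-256 stands in for H, the function names are made up for illustration, the encryption step E is left out because the Python standard library has no block cipher, and the PBKDF2 line calls the real `hashlib.pbkdf2_hmac`, which iterates HMAC rather than a bare hash.

```python
import hashlib
import os

ROUNDS = 1000  # the "thousand times over" from the notation


def H(*parts: bytes) -> bytes:
    """Secure hash of one or more byte strings (length-prefixed)."""
    h = hashlib.sha256()
    for part in parts:
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.digest()


def H_iter(data: bytes, rounds: int = ROUNDS) -> bytes:
    """H^rounds(data): feed the hash output back into H repeatedly."""
    for _ in range(rounds):
        data = H(data)
    return data


def traditional_key() -> bytes:
    return os.urandom(32)  # k = random()


def convergent_key(p: bytes) -> bytes:
    return H(p)  # k = H(p)


def tahoe_style_key(s: bytes, p: bytes) -> bytes:
    return H(s, p)  # k = H(s, p), with s kept secret


def pbkdf2_key(s: bytes, password: bytes) -> bytes:
    # k = H^1000(s, password), here via the standard PBKDF2-HMAC-SHA256
    return hashlib.pbkdf2_hmac("sha256", password, s, ROUNDS)


def strengthened_convergent_key(p: bytes) -> bytes:
    return H_iter(p)  # k = H^1000(p)


p = b"the same file contents"
s_alice = os.urandom(32)  # hypothetical per-user secrets
s_bob = os.urandom(32)

# Same plaintext, same key: storage converges for everyone.
universal = convergent_key(p) == convergent_key(p)
# Same plaintext, different secrets: no convergence between Alice and Bob.
split = tahoe_style_key(s_alice, p) != tahoe_style_key(s_bob, p)
```

Both `universal` and `split` evaluate to `True`, which is the trade-off in miniature: the added secret removes the attacker's ability to confirm guesses, and it also removes convergence between users who do not share that secret.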
Re: [tahoe-dev] convergent encryption reconsidered -- salting and key-strengthening
zooko wrote:

> Think of it like this:
>
> Passwords are susceptible to brute-force and/or dictionary attack. We can't, in general, prevent attackers from trying guesses at our passwords without also preventing users from using them, so instead we employ various techniques:
>
> * salts (to break up the space of targets into subspaces, of which at most one can be targeted by a given brute-force attack)
> * key strengthening (to increase by a constant factor the cost of checking a password)
> * rate-limits for on-line tries (i.e., you get only a small fixed number of wrong guesses in a row before you are locked out for a time-out period)

You forgot:

* stronger passwords

Cheers,

Ben.

--
http://www.apache-ssl.org/ben.html http://www.links.org/

There is no limit to what a man can do or how far he can go if he doesn't mind who gets the credit. - Robert Woodruff
Re: [p2p-hackers] convergent encryption reconsidered -- salting and key-strengthening
On Mar 30, 2008, at 9:37 PM, zooko wrote:

> You can store your True Name, credit card number, bank account number, mother's maiden name, and so forth, on the same server as your password, but you don't have to worry about using salts or key strengthening on those latter secrets, because the server doesn't run a service that allows unauthenticated remote people to connect, submit a guess as to their value, and receive confirmation, the way it does for your password.

Tahoe doesn't run this service either. I can't use it to make guesses at any of the values you mentioned. I can use it to make guesses at whole documents incorporating such values, which is in most cases a highly non-trivial distinction.

To make such guesses, I need to account for at least:

- file formats, since an e-mail message has a different on-disk representation depending on the recipient's e-mail client,
- temporal and transport variance, as PDF documents generally incorporate a generation timestamp, and e-mail messages include routing headers (with timestamps!),
- document modifications due to variables other than the one(s) being guessed, e.g. names, e-mail addresses, customized unsubscribe links.

I would be interested to see an actual real-world example of how a document would fall to this attack. It strikes me as a cute threat in theory, but uninteresting in practice.

> *** Convergent encryption exposes whatever data is put into it to the sorts of attacks that already apply to passwords.

Sometimes, under highly peculiar circumstances, etc.

> Convergent encryption had been invented, analyzed and used for many years, but to the best of my knowledge the first time that anyone noticed this issue was March 16 of this year

FWIW, I have discussed this threat verbally with colleagues when I was asked for possible designs for OLPC's server-based automatic backup system. I dismissed it at the time as 'not a real-world concern'. I might even have it in my notes, but those weren't published, so it's moot.
> Now PBKDF2 is a combination of the first two defenses -- salting and key strengthening. When you first suggested PBKDF2, I -- and apparently Jerry Leichter -- thought that you were suggesting its salting feature as a solution.

Yeah, sorry, I wasn't being clear. I should've just said a key strengthening function rather than naming anything in particular.

> This would have a performance impact on normal everyday use of Tahoe without, in my current estimation, making a brute-force/dictionary attack infeasible.

Adding, say, 5 seconds of computation to the time it takes to store a file is likely to be lost as noise in comparison with the actual network upload time, while still making an attacker's life _dramatically_ harder than now.

> The trade-off is actually worse than it appears since the attacker is attacking multiple users at once (in traditional convergent encryption, he is attacking *all* users at once)

Again, is there a real-world example of the kind of data or documents that would show this to be a serious problem? While it's technically true that you're targeting all the users in parallel when brute forcing, note that if you're not actually hyper-targeting your attack, you need to brute force _all_ the variables I mention above in parallel, except in pathological cases -- and those, if you know of some, would be interesting for the discussion.

> economy of scale, and can profitably invest in specialized tools, even specialized hardware such as a COPACOBANA [1].

The OpenBSD eksblowfish/bcrypt design can't be bitsliced and generally doesn't lend itself well to large speedups in hardware, by design.

Cheers,

--
Ivan Krstić [EMAIL PROTECTED] | http://radian.org
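The disagreement over key strengthening comes down to simple arithmetic: strengthening multiplies the cost of each guess by a constant, so what matters is how many guesses the attacker needs and how much hardware is available. The sketch below works through that arithmetic. Only the 5-second figure comes from the message above; the two search-space sizes, the machine count and the function name `attack_seconds` are hypothetical values chosen for illustration.

```python
def attack_seconds(candidates: int, seconds_per_guess: float, machines: int = 1) -> float:
    """Worst-case wall-clock seconds to try every candidate once."""
    return candidates * seconds_per_guess / machines


SECONDS_PER_GUESS = 5.0   # the "say, 5 seconds" of strengthening per file
SMALL_SPACE = 10_000      # hypothetical: one 4-digit field in a known form letter
LARGE_SPACE = 2 ** 40     # hypothetical: about 40 bits of unknown content

SECONDS_PER_HOUR = 3600
SECONDS_PER_YEAR = 3600 * 24 * 365

# One machine against the small space.
small_hours = attack_seconds(SMALL_SPACE, SECONDS_PER_GUESS) / SECONDS_PER_HOUR

# A thousand machines against the large space.
large_years = (
    attack_seconds(LARGE_SPACE, SECONDS_PER_GUESS, machines=1000) / SECONDS_PER_YEAR
)

print(f"small space, 1 machine:     {small_hours:.1f} hours")
print(f"large space, 1000 machines: {large_years:.1f} years")
```

Under these assumed numbers the small space falls in roughly 14 hours on a single machine, while the large space takes on the order of 174 years even with a thousand machines. Strengthening helps a great deal when the unknown part of a document already has substantial entropy and much less when it does not, which is consistent with both positions in the thread.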