Re: [p2p-hackers] convergent encryption reconsidered -- salting and key-strengthening
On Mar 31, 2008, at 4:47 AM, Ivan Krstić wrote:

> Tahoe doesn't run this service either. I can't use it to make guesses at any of the values you mentioned. I can use it to make guesses at whole documents incorporating such values, which is in most cases a highly non-trivial distinction.

The way that I would phrase this is that convergent encryption exposes whatever data is put into it, in whatever batch-size it is put in, to brute-force/dictionary attacks. If the data that you put in is unguessable, then you needn't worry about these attacks. (Likewise, as Ben Laurie reminds us, using strong passwords is a sufficient defense against these attacks on passwords.)

You correctly emphasize that typical convergent encryption services (which operate on files, or, in the case of GNUnet, on 32 KiB blocks), and typical uses of those services (which typically store files as produced by apps written for traditional filesystems), batch together data in such a way that the aggregate is more likely to be unguessable than if each field were stored separately. I don't disagree with this observation.

I am often reminded of Niels Ferguson's and Bruce Schneier's dictum, in the excellent _Practical_Cryptography_, that security needs to be a *local* property: one should be able to tell whether a component is secure by inspecting that component itself, rather than by reasoning about interactions between that component and other components.

Concretely, convergent encryption with a per-user added secret, as currently implemented in Tahoe, can be shown to guarantee confidentiality of the data regardless of what the data is. Traditional convergent encryption can be shown to offer confidentiality only with the proviso that the data put into it conform to certain criteria -- criteria that can be verified neither by a computer nor by a user who is not a skilled security expert. You may argue that the chance that a user would put non-conformant data into it is small.
I don't necessarily disagree, although before I became willing to bet on it I would require more quantitative investigation. However, arguing that component A is secure as long as component B behaves a certain way, and that component B is very likely to behave that way, is a different sort of argument than arguing that component A is secure regardless of the behavior of component B. For one thing, the behavior of component B may change in the future. Concretely, people may write apps that store data in Tahoe in a way that previous apps didn't. Those people will almost certainly be completely unaware of the nature of convergent encryption and brute-force/dictionary attacks.

Now obviously making the security properties of a system modular in this way might impose a performance cost. In the case of Tahoe, that cost is the loss of universal convergence. Allmydata.com analyzed the space savings due to convergence among our current customers and found that it was around 1%. We (allmydata.com) intend to monitor the potential savings of universal convergence in an ongoing way, and if it turns out that there are substantial benefits to be gained then I will revisit this issue and perhaps I will be forced to rely on an argument of the other form -- that users are unlikely to use it in an unsafe way.

Thank you again for your thoughtful comments on this issue.

Regards,

Zooko O'Whielacronx

-
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]
convergent encryption reconsidered -- salting and key-strengthening
[This conversation is spanning three mailing lists -- cryptography@metzdowd.com, [EMAIL PROTECTED], and tahoe-[EMAIL PROTECTED]. Some of the posts have not reached all three of those lists. I've manually added Jerry Leichter and Ivan Krstić to the approved-senders set for p2p-hackers and tahoe-dev, so that further posts by them will appear on those lists.]

On Mar 30, 2008, at 3:13 PM, Ivan Krstić wrote:

> Unless I'm misunderstanding Zooko's writeup, he's worried about an attacker going from a partially-known plaintext (e.g. a form bank letter) to a completely-known plaintext by repeating the following process:
>
> 1. take partially known plaintext
> 2. make a guess, randomly or more intelligently where possible, about the unknown parts
> 3. take the current integrated partial+guessed plaintext, hash to obtain convergence key
> 4. verify whether that key exists in the storage index
> 5. if yes, you've found the full plaintext. if not, repeat from '2'.
>
> That's a brute force search.

That's right. Your comparison to the normal old brute-force/dictionary attack on passwords is a good one, and one that Jim McCoy and Jerry Leichter have alluded to as well. Think of it like this:

Passwords are susceptible to brute-force and/or dictionary attack. We can't, in general, prevent attackers from trying guesses at our passwords without also preventing users from using them, so instead we employ various techniques:

* salts (to break up the space of targets into subspaces, of which at most one can be targeted by a given brute-force attack)
* key strengthening (to increase by a constant factor the cost of checking a password)
* rate-limits for on-line tries (i.e., you get only a small fixed number of wrong guesses in a row before you are locked out for a time-out period)

However, secrets other than passwords are not usually susceptible to such attacks.
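The five steps quoted above can be sketched in a few lines of Python. This is an illustration only, with hypothetical names and a plain SHA-256 standing in for whatever hash a real deployment uses as the convergence key; the "storage index" is modeled as a simple set of keys:

```python
import hashlib

def convergence_key(plaintext: bytes) -> bytes:
    # Convergence key as the secure hash of the plaintext (simplified).
    return hashlib.sha256(plaintext).digest()

def brute_force_field(template: bytes, candidates, storage_index):
    # Steps 2-5: guess the unknown part, derive the key, and check
    # whether that key already exists in the storage index.
    for guess in candidates:
        p = template.replace(b"{FIELD}", guess)
        if convergence_key(p) in storage_index:
            return guess
    return None

# Toy demonstration: the storage index holds the key of a document
# whose only unknown part is a 4-digit account number.
secret_doc = b"Dear customer, your account number is 4217. Regards."
storage_index = {convergence_key(secret_doc)}

template = b"Dear customer, your account number is {FIELD}. Regards."
candidates = (str(n).encode() for n in range(10000))
assert brute_force_field(template, candidates, storage_index) == b"4217"
```

At ten thousand guesses the search is trivial, which is exactly the point: the attack's feasibility is bounded only by the entropy of the unknown field, not by any property of the encryption.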
You can store your True Name, credit card number, bank account number, mother's maiden name, and so forth, on the same server as your password, but you don't have to worry about using salts or key strengthening on those latter secrets, because the server doesn't run a service that allows unauthenticated remote people to connect, submit a guess as to their value, and receive confirmation, the way it does for your password. (In other words, for such data we generally use an extreme form of the third defense, rate-limiting tries -- the number of guesses that an attacker gets is limited to none!)

Likewise, if you are going to store or transmit those kinds of secrets in encrypted form using a traditional randomly-generated encryption key, then you don't have to worry about brute-force/dictionary attacks, because your strong randomly-selected symmetric encryption key defies them.

The Key Point:

*** Convergent encryption exposes whatever data is put into it to the sorts of attacks that already apply to passwords.

Convergent encryption had been invented, analyzed and used for many years, but to the best of my knowledge the first time that anyone noticed this issue was March 16 of this year (at 3 AM Chicago Time), when Drew Perttula and Brian Warner made that observation. (Although to be fair, some of the best-known uses of convergent encryption during these years have been sharing publicly-known files with strangers, in which case I suppose it is assumed that the cleartext does not contain secrets.)

Now, PBKDF2 is a combination of the first two defenses -- salting and key strengthening. When you first suggested PBKDF2, I -- and apparently Jerry Leichter -- thought that you were suggesting its salting feature as a solution. The solution that we've come up with for Tahoe (described in my original note) is much like salting, except that the added value that gets mixed in is secret and unguessable, where I normally think of salt as non-secret.
Now I see that you are also emphasizing the key strengthening feature of PBKDF2.

Notation: k denotes the symmetric encryption key, p the plaintext, c the ciphertext, s a salt, E(key, plaintext) encryption, H() secure hashing, and H^1000() secure hashing a thousand times over, i.e. H(H(H(...H(x)...))) iterated a thousand times.

Traditional encryption:

  k = random()
  c = E(k, p)

Traditional convergent encryption:

  k = H(p)
  c = E(k, p)

Tahoe-style convergent encryption with added secret s (s can be re-used for any number of files, but it is kept secret from everyone except those with whom you wish to converge storage):

  s = random()
  k = H(s, p)
  c = E(k, p)

PBKDF2 (simplified; s can be re-used but generally is not, and it is not secret):

  s = random()
  k = H^1000(s, password)
  c = E(k, p)

Now, one could imagine a variant of traditional convergent encryption which added key strengthening:

  k = H^1000(p)
  c = E(k, p)

This would have a performance impact on normal everyday use of Tahoe without, in my current estimation, making a brute-force/dictionary attack infeasible.
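The key-derivation variants above can be sketched side by side. This is a schematic illustration, not Tahoe's actual construction (Tahoe uses tagged hashes and different parameters); SHA-256 stands in for H(), and only the key derivations are shown, not the encryption step:

```python
import hashlib, os

def H(*parts: bytes) -> bytes:
    # H(): secure hash over the concatenation of its inputs.
    h = hashlib.sha256()
    for part in parts:
        h.update(part)
    return h.digest()

def H_iterated(data: bytes, rounds: int = 1000) -> bytes:
    # H^1000(): secure hashing a thousand times over.
    for _ in range(rounds):
        data = hashlib.sha256(data).digest()
    return data

p = b"some plaintext"

k_traditional = os.urandom(32)        # traditional encryption: k = random()
k_convergent = H(p)                   # traditional convergent: k = H(p)
s = os.urandom(32)                    # per-user added secret
k_tahoe = H(s, p)                     # Tahoe-style: k = H(s, p)
k_strengthened = H_iterated(p)        # strengthened convergent: k = H^1000(p)

# Convergence: the same plaintext always derives the same key...
assert H(p) == k_convergent
# ...but under the Tahoe scheme only for holders of the same secret s.
assert H(os.urandom(32), p) != k_tahoe
```

The final two assertions capture the trade-off in the thread: determinism over p alone gives universal convergence (and the attack), while mixing in s restricts both convergence and the attack to the set of people who know s.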
Re: [p2p-hackers] convergent encryption reconsidered
On Sun, Mar 30, 2008 at 05:13:07PM -0400, Ivan Krstić wrote:

> That's a brute force search. If your convergence key, instead of being a simple file hash, is obtained through a deterministic but computationally expensive function such as PBKDF2 (or the OpenBSD bcrypt, etc), then step 3 makes an exhaustive search prohibitive in most cases while not interfering with normal filesystem operation. What am I missing?

PBKDF2 is excellent for turning interactively typed pass-phrases into keys. It is not entirely clear that it is a good fit for a filesystem. Updating any single file is now a computationally intensive process; the performance impact may be unacceptable.

With PBKDF2 and the iteration count set to the for-now-popular 1000, a 64 KB file will now trigger ~2 million SHA-1 compression function computations (if I remember the SHA-1 block size correctly as 512 bits or 64 bytes). A crude cost estimate on typical hardware (openssl speed):

    Doing sha1 for 3s on 8192 size blocks: 57316 sha1's in 3.00s

Extrapolating from this, on 64 KB files we get ~1200 HMAC operations per second. If we iterate that 1000 times, 1.2 key derivations per second. The throughput to disk is CPU-bound at ~64 KB/s, which is rather poor.

-- Viktor.
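Viktor's extrapolation can be reproduced from the quoted `openssl speed` line. The factor-of-two for HMAC is my reading of his model (two hash passes over the data), so treat the exact constants as an assumption; the point is the order of magnitude:

```python
# Back-of-the-envelope check of the extrapolation, using the numbers
# from the quoted `openssl speed` run. All figures are approximate.
sha1_ops = 57316 / 3.00                 # 8192-byte SHA-1 operations per second
bytes_per_sec = sha1_ops * 8192         # raw SHA-1 throughput, bytes/s

file_size = 64 * 1024                   # 64 KB file used as the "password"
hmac_factor = 2                         # crude: HMAC ~ two hash passes (assumed)
hmacs_per_sec = bytes_per_sec / (file_size * hmac_factor)
assert 1150 < hmacs_per_sec < 1250      # "~1200 HMAC operations per second"

iterations = 1000                       # the "for now popular" PBKDF2 count
derivations_per_sec = hmacs_per_sec / iterations
assert 1.1 < derivations_per_sec < 1.3  # "1.2 key derivations per second"

# Effective CPU-bound throughput to disk -- same order as the ~64 KB/s
# quoted (Viktor evidently rounded more aggressively).
throughput = derivations_per_sec * file_size
assert 60_000 < throughput < 90_000
```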
Re: [tahoe-dev] convergent encryption reconsidered -- salting and key-strengthening
zooko wrote:

> Think of it like this: Passwords are susceptible to brute-force and/or dictionary attack. We can't, in general, prevent attackers from trying guesses at our passwords without also preventing users from using them, so instead we employ various techniques:
>
> * salts (to break up the space of targets into subspaces, of which at most one can be targeted by a given brute-force attack)
> * key strengthening (to increase by a constant factor the cost of checking a password)
> * rate-limits for on-line tries (i.e., you get only a small fixed number of wrong guesses in a row before you are locked out for a time-out period)

You forgot:

* stronger passwords

Cheers,

Ben.

--
http://www.apache-ssl.org/ben.html           http://www.links.org/

"There is no limit to what a man can do or how far he can go if he doesn't mind who gets the credit." - Robert Woodruff
Re: convergent encryption reconsidered
Hi,

Sorry for arriving late into this thread...

zooko [EMAIL PROTECTED] writes:

> The Learn-Partial-Information Attack
>
> They extended the confirmation-of-a-file attack into the learn-partial-information attack. In this new attack, the attacker learns some information from the file. This is done by trying possible values for unknown parts of a file and then checking whether the result matches the observed ciphertext. For example, if you store a document such as a form letter from your bank, which contains a few pages of boilerplate legal text plus a few important parts, such as your bank account number and password, then an attacker who knows the boilerplate might be able to learn your account number and password.

I don't see how this would work. It's different from a dictionary attack because it looks for partial matches, as opposed to exact matches. Suppose you have one (sensitive) file that contains "boilerplatesecret" and another that contains "boilerplateplaceholder". They have different hashes, hence different ciphertexts under convergent encryption. How would one get access to the plaintext of the former when knowing only the latter?

Now, let's assume that said files were split into two blocks before being convergent-encrypted, namely "boilerplate" and "secret" for the former, and "boilerplate" and "placeholder" for the latter. The confirmation-of-a-file (or rather confirmation-of-a-block) attack does work, but it does not reveal anything about the secret.

I'm not sure about Tahoe, but the scheme I had in mind in my thesis was to allow anyone to choose whatever encoding is used [0]. This means that one could choose the algorithm used to split input files into blocks, whether to compress the input file or individual blocks, what compression algorithm to use, what hash and cipher algorithms to use, etc.
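Ludovic's block-splitting example is easy to make concrete. The sketch below (illustrative only; real systems hash and encode blocks differently) shows that whole-file convergence keys differ while the shared boilerplate block is confirmable at the block level, yet confirming it says nothing about the secret block:

```python
import hashlib

def convergence_key(data: bytes) -> bytes:
    # Simplified: the convergence key is the hash of the data.
    return hashlib.sha256(data).digest()

# Two files split into blocks before convergent encryption.
sensitive = [b"boilerplate", b"secret"]
known = [b"boilerplate", b"placeholder"]

# What the attacker can observe: the keys of the victim's stored blocks.
storage_index = {convergence_key(b) for b in sensitive}

# Whole-file keys differ, so knowing one file reveals nothing of the other:
assert convergence_key(b"".join(sensitive)) != convergence_key(b"".join(known))

# Block-level confirmation: the attacker confirms the boilerplate block...
assert convergence_key(b"boilerplate") in storage_index
# ...but a guess at the second block fails, and (crucially) the secret
# block's key only falls to guessing if "secret" itself is guessable.
assert convergence_key(b"placeholder") not in storage_index
```

Note the caveat in the surrounding thread: this defense holds only while the secret block has enough entropy; a low-entropy secret block is exactly as brute-forceable as a low-entropy whole file.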
With that level of freedom, these two attacks are a lesser threat (one might argue that, in practice, many people would use the default settings, which would make them potential victims and attackers of each other...).

Thanks,

Ludovic.

[0] http://www.fdn.fr/~lcourtes/phd/phd-thesis.pdf, e.g., Section 4.3.
Re: [p2p-hackers] convergent encryption reconsidered
Ivan Krstić wrote:

> 1. take partially known plaintext
> 2. make a guess, randomly or more intelligently where possible, about the unknown parts
> 3. take the current integrated partial+guessed plaintext, hash to obtain convergence key
> 4. verify whether that key exists in the storage index
> 5. if yes, you've found the full plaintext. if not, repeat from '2'.
>
> That's a brute force search. If your convergence key, instead of being a simple file hash, is obtained through a deterministic but computationally expensive function such as PBKDF2 (or the OpenBSD bcrypt, etc), then step 3 makes an exhaustive search prohibitive in most cases while not interfering with normal filesystem operation. What am I missing?

Better still, have a limited supply of tickets that enable one to construct the convergence key. Enough tickets for all normal usage, but not enough to perform an exhaustive search.

Assume a small set of ticket-issuing computers hold a narrowly shared secret integer k. Assume a widely shared elliptic curve with the generator G. If h is the hash of the file, the convergence key is h*k*G. If you give the ticket-issuing computers an elliptic point P, they will give you the corresponding elliptic point k*P. If, however, you ask for too many such points, they will stop responding.

Of course, this allows one to be attacked by anyone that holds the narrowly held key.
Re: [p2p-hackers] convergent encryption reconsidered -- salting and key-strengthening
On Mar 30, 2008, at 9:37 PM, zooko wrote:

> You can store your True Name, credit card number, bank account number, mother's maiden name, and so forth, on the same server as your password, but you don't have to worry about using salts or key strengthening on those latter secrets, because the server doesn't run a service that allows unauthenticated remote people to connect, submit a guess as to their value, and receive confirmation, the way it does for your password.

Tahoe doesn't run this service either. I can't use it to make guesses at any of the values you mentioned. I can use it to make guesses at whole documents incorporating such values, which is in most cases a highly non-trivial distinction. To make such guesses, I need to account for at least:

- file formats, since an e-mail message has a different on-disk representation depending on the recipient's e-mail client,
- temporal and transport variance, as PDF documents generally incorporate a generation timestamp, and e-mail messages include routing headers (with timestamps!),
- document modifications due to variables other than the one(s) being guessed, e.g. names, e-mail addresses, customized unsubscribe links.

I would be interested to see an actual real-world example of how a document would fall to this attack. It strikes me as a cute threat in theory, but uninteresting in practice.

> *** Convergent encryption exposes whatever data is put into it to the sorts of attacks that already apply to passwords.

Sometimes, under highly peculiar circumstances, etc.

> Convergent encryption had been invented, analyzed and used for many years, but to the best of my knowledge the first time that anyone noticed this issue was March 16 of this year

FWIW, I discussed this threat verbally with colleagues when I was asked for possible designs for OLPC's server-based automatic backup system. I dismissed it at the time as 'not a real-world concern'. I might even have it in my notes, but those weren't published, so it's moot.
> Now PBKDF2 is a combination of the first two defenses -- salting and key strengthening. When you first suggested PBKDF2, I -- and apparently Jerry Leichter -- thought that you were suggesting its salting feature as a solution.

Yeah, sorry, I wasn't being clear. I should've just said "a key strengthening function" rather than naming anything in particular.

> This would have a performance impact on normal everyday use of Tahoe without, in my current estimation, making a brute-force/dictionary attack infeasible.

Adding, say, 5 seconds of computation to the time it takes to store a file is likely to be lost as noise in comparison with the actual network upload time, while still making an attacker's life _dramatically_ harder than now.

> The trade-off is actually worse than it appears, since the attacker is attacking multiple users at once (in traditional convergent encryption, he is attacking *all* users at once)

Again, is there a real-world example of the kind of data or documents that would show this to be a serious problem? While it's technically true that you're targeting all the users in parallel when brute forcing, note that if you're not actually hyper-targeting your attack, you need to brute force _all_ the variables I mention above in parallel, except in pathological cases -- and those, if you know of some, would be interesting for the discussion.

> economy of scale, and can profitably invest in specialized tools, even specialized hardware such as a COPACOBANA [1].

The OpenBSD eksblowfish/bcrypt design can't be bitsliced and generally doesn't lend itself well to large speedups in hardware, by design.

Cheers,

--
Ivan Krstić [EMAIL PROTECTED] | http://radian.org
Re: [p2p-hackers] convergent encryption reconsidered
On Mar 31, 2008, at 6:44 AM, James A. Donald wrote:

> Better still, have a limited supply of tickets that enable one to construct the convergence key. Enough tickets for all normal usage, but not enough to perform an exhaustive search. [...] If you give the ticket issuing computers an elliptic point P, they will give you the corresponding elliptic point k*P. If, however, you ask for too many such points, they will stop responding.

This isn't a good design. It's incompatible with Tahoe's present architecture, introduces a single point of failure, centralizes the otherwise by-design decentralized filesystem, and presents a simple way to mount denial of service attacks. Finally, since the decentralization in Tahoe is part of its security design (storage servers aren't trusted), you run into the usual "quis custodiet ipsos custodes" problem with the ticket-issuing server that the present system nicely avoids.

Cheers,

--
Ivan Krstić [EMAIL PROTECTED] | http://radian.org
Re: [p2p-hackers] convergent encryption reconsidered
On Mar 20, 2008, at 3:42 PM, zooko wrote:

> They extended the confirmation-of-a-file attack into the learn-partial-information attack. In this new attack, the attacker learns some information from the file. This is done by trying possible values for unknown parts of a file and then checking whether the result matches the observed ciphertext.

How is this conceptually different from classic dictionary attacks, and why does e.g. running the file through PBKDF2 and using the result for convergence not address your concern(s)?

--
Ivan Krstić [EMAIL PROTECTED] | http://radian.org
Re: [p2p-hackers] convergent encryption reconsidered
| They extended the confirmation-of-a-file attack into the
| learn-partial-information attack. In this new attack, the
| attacker learns some information from the file. This is done by
| trying possible values for unknown parts of a file and then
| checking whether the result matches the observed ciphertext.
|
| How is this conceptually different from classic dictionary attacks,
| and why does e.g. running the file through PBKDF2 and using the result
| for convergence not address your concern(s)?

How would that help? Both the ability of convergent encryption to eliminate duplicates, and this attack, depend on there being a deterministic algorithm that computes a key from the file contents. Sure, if you use a different salt for each file, the attack goes away -- but so does the de-duplication. If you don't care about de-duplication, there are simpler, cheaper ways to choose a key.

-- Jerry
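Jerry's point, that the attack and the de-duplication stand or fall together, takes only a few lines to demonstrate (an illustrative sketch; SHA-256 stands in for whatever key-derivation hash a real system uses):

```python
import hashlib, os

def convergent_key(p: bytes) -> bytes:
    # Deterministic: key depends only on file contents.
    return hashlib.sha256(p).digest()

def salted_key(p: bytes, salt: bytes) -> bytes:
    # Per-file random salt mixed into the derivation.
    return hashlib.sha256(salt + p).digest()

p = b"identical file contents stored by two users"

# Determinism is exactly what enables de-duplication (and the attack)...
assert convergent_key(p) == convergent_key(p)

# ...and a fresh random salt per file destroys both at once: the same
# plaintext now yields different keys, hence different ciphertexts.
assert salted_key(p, os.urandom(16)) != salted_key(p, os.urandom(16))
```

Tahoe's added-secret design sits between these extremes: the "salt" is fixed per user (or per convergence set), so duplicates still converge within that set while cross-set dictionary attacks are partitioned.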
Re: [p2p-hackers] convergent encryption reconsidered
On Mar 30, 2008, at 3:12 PM, Leichter, Jerry wrote:

> How would that help?

Unless I'm misunderstanding Zooko's writeup, he's worried about an attacker going from a partially-known plaintext (e.g. a form bank letter) to a completely-known plaintext by repeating the following process:

1. take partially known plaintext
2. make a guess, randomly or more intelligently where possible, about the unknown parts
3. take the current integrated partial+guessed plaintext, hash to obtain convergence key
4. verify whether that key exists in the storage index
5. if yes, you've found the full plaintext. if not, repeat from '2'.

That's a brute force search. If your convergence key, instead of being a simple file hash, is obtained through a deterministic but computationally expensive function such as PBKDF2 (or the OpenBSD bcrypt, etc), then step 3 makes an exhaustive search prohibitive in most cases while not interfering with normal filesystem operation. What am I missing?

Cheers,

--
Ivan Krstić [EMAIL PROTECTED] | http://radian.org
Re: [p2p-hackers] convergent encryption reconsidered
Jim:

Thanks for your detailed response on the convergent encryption issue. In this post, I'll just focus on one very interesting question that you raise: "When do either of these attacks on convergent encryption apply?".

In my original note I was thinking about the allmydata.org Tahoe Least Authority Filesystem. In this post I will attempt to follow your lead in widening the scope. In particular, GNUnet and Freenet are currently active projects that use convergent encryption. The learn-partial-information attack would apply to either system if a user were using it with files that she intended not to divulge, but that were susceptible to being brute-forced in this way by an attacker.

On Mar 20, 2008, at 10:56 PM, Jim McCoy wrote:

> On Mar 20, 2008, at 12:42 PM, zooko wrote:
>> Security engineers have always appreciated that convergent encryption allows an attacker to perform a confirmation-of-a-file attack -- if the attacker already knows the full plaintext of a file, then they can check whether a given user has a copy of that file.
>
> The truth of this depends on implementation details, and is an assertion that cannot be said to cover all or even most of the potential use-cases for this technique.

You're right. I was writing the above in the context of Tahoe, where, as Brian Warner explained, we do not attempt to hide the linkage between users and ciphertexts. What I wrote above doesn't apply in the general case.

However, there is a very general argument about the applicability of these attacks, which is: "Why encrypt?". If your system has strong anonymity properties, preventing people from learning which files are associated with which users, then you can just store the files in plaintext. Ah, but of course you don't want to do that, because even without being linked to users, files may contain sensitive information that the users didn't intend to disclose. But if the files contain such information, then it might be acquired by the learn-partial-information attack.
When designing such a system, you should ask yourself "Why encrypt?". You encrypt in order to conceal the plaintext from someone, but if you use convergent encryption, and they can use the learn-partial-information attack, then you fail to conceal the plaintext from them.

You should use traditional convergent encryption (without an added secret) if:

1. You want to encrypt the plaintext, and
2. You want convergence, and
3. You don't mind exposing the existence of that file (ignoring the confirmation-of-a-file attack), and
4. You are willing to bet that the file has entropy from the attacker's perspective which is greater than his computational capacity (defeating the learn-partial-information attack).

You should use convergent encryption with an added secret (as recently implemented for the Tahoe Least Authority Filesystem) if:

1. You want to encrypt the plaintext, and
2. You want convergence within the set of people who know the added secret, and
3. You don't mind exposing the existence of that file to people in that set, and
4. You are willing to disclose the file to everyone in that set, or else you think that people in that set to whom you do not wish to disclose the file will not try the learn-partial-information attack, or, if they do, that the file has entropy from their perspective which is greater than their computational capacity.

I guess the property of unlinkability between user and file addresses issue 3 in the above list -- the existence of a file is a much less sensitive bit of information than the existence of a file in a particular user's collection. It could also affect issue 4 by increasing the entropy the file has from an attacker's perspective. If he knows that the ciphertext belongs to you, then he can try filling in the fields with information that he knows about you. Without that linkage, he has to try filling in the fields with information selected from what he knows about all users.
But hiding this linkage doesn't actually help in the case that the attacker is already using everything he knows about all users to attack all files in parallel. Note that using an added secret does help in the parallel-attack case, because (just like salting passwords) it breaks the space of targets up into separate spaces which can't all be attacked with the same computation.

> The first problem is isolating the original ciphertext in the pool of storage. If a file is encrypted using convergent encryption and then run through an error-correction mechanism to generate a number of shares that make up the file, an attacker first needs to be able to isolate these shares to generate the original ciphertext. FEC decoding speeds may be reasonably fast, but they are not without some cost. If the storage pool is sufficiently large and you are doing your job to limit the ability of an attacker to see which blocks
convergent encryption reconsidered
(This is an ASCII rendering of https://zooko.com/convergent_encryption_reconsidered.html .)

Convergent Encryption Reconsidered

Written by Zooko Wilcox-O'Hearn, documenting ideas due to Drew Perttula, Brian Warner, and Zooko Wilcox-O'Hearn, 2008-03-20.

Abstract

Convergent encryption is already known to suffer from a confirmation-of-a-file attack. We show that it also suffers from a learn-partial-information attack. The conditions under which this attack works cannot be predicted by a computer program nor by an unsophisticated user. We propose a solution which trades away part of the space-savings benefit of convergent encryption in order to prevent this new attack. Our defense also prevents the old attack. The issues are presented in the context of the Tahoe Least-Authority Grid File System, a secure decentralized filesystem.

Background -- The Confirmation-Of-A-File Attack

Convergent encryption, also known as content hash keying, was first mentioned by John Pettitt on the cypherpunks list in 1996 [1], was used by Freenet [2] and Mojo Nation [3] in 2000, and was analyzed in a technical report by John Douceur et al. in 2002 [4]. Today it is used by at least Freenet, GNUnet [5], flud [6], and the Tahoe Least-Authority Grid File System [7]. The remainder of this note will focus on the Tahoe LAUGFS filesystem. The use of convergent encryption in other systems may have different consequences than described here, because of the different use cases or added defenses that those systems may have.

Convergent encryption is simply encrypting a file using a symmetric encryption key which is the secure hash of the plaintext of the file. Security engineers have always appreciated that convergent encryption allows an attacker to perform a confirmation-of-a-file attack -- if the attacker already knows the full plaintext of a file, then they can check whether a given user has a copy of that file.
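The definition above fits in a few lines. This sketch uses a toy hash-based keystream purely for illustration (it is NOT a real cipher, and real systems such as Tahoe use AES with their own key-derivation details); the property to notice is that identical plaintexts produce identical ciphertexts:

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    # Toy hash-counter keystream, for illustration only -- not a real cipher.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def convergent_encrypt(plaintext: bytes):
    key = hashlib.sha256(plaintext).digest()  # k = H(p)
    ct = bytes(a ^ b for a, b in zip(plaintext, keystream(key, len(plaintext))))
    return key, ct

def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(ciphertext, keystream(key, len(ciphertext))))

p = b"the same file stored by two different users"
k1, c1 = convergent_encrypt(p)
k2, c2 = convergent_encrypt(p)
assert c1 == c2            # identical plaintexts converge to one ciphertext
assert decrypt(k1, c1) == p
```

That convergence is the feature (duplicate files cost storage only once) and, as the rest of this note shows, also the entire attack surface: anyone who can guess p can derive k and confirm the stored ciphertext.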
Whether this confirmation-of-a-file attack is a security or privacy problem depends on the situation. If you want to store banned books or political pamphlets without attracting the attention of an oppressive government, or store pirated copies of music or movies without attracting the attention of copyright holders, then the confirmation-of-a-file attack is potentially a critical problem. On the other hand, if the sensitive parts of your data are secret personal things like your bank account number, passwords, and so forth, then it isn't a problem. Or so I -- and as far as I know everyone else -- thought until March 16, 2008.

I had planned to inform users of the current version of Tahoe -- version 0.9.0 -- about the confirmation-of-a-file attack by adding a FAQ entry:

  Q: Can anyone else see the contents of files that I have not shared?

  A: The files that you store are encrypted so that nobody can see a file's contents (unless of course you intentionally share the file with them). However, if the file that you store is something that someone has already seen, such as if it is a file that you downloaded from the Internet in the first place, then they can recognize it as being the same file when you store it, even though it is encrypted. So basically people can tell which files you are storing if they are publicly known files, but they can't learn anything about your own personal files.

However, four days ago (on March 16, 2008) Drew Perttula and Brian Warner came up with an attack that shows that the above FAQ is wrong.

The Learn-Partial-Information Attack

They extended the confirmation-of-a-file attack into the learn-partial-information attack. In this new attack, the attacker learns some information from the file. This is done by trying possible values for unknown parts of a file and then checking whether the result matches the observed ciphertext.
For example, if you store a document such as a form letter from your bank, which contains a few pages of boilerplate legal text plus a few important parts, such as your bank account number and password, then an attacker who knows the boilerplate might be able to learn your account number and password.

For another example, if you use Tahoe to back up your entire home directory, or your entire filesystem, then the attacker gains the opportunity to try to learn partial information about various files which are of predictable format but have sensitive fields in them, such as .my.cnf (MySQL configuration files), .htpasswd, .cvspass, .netrc, web browser cookie files, etc. In some cases, files such as these will contain too much entropy from the perspective of the attacker to allow this attack, but in other cases the attacker will know, or be able to guess, most of the fields, and brute force
Fwd: [tahoe-dev] [p2p-hackers] convergent encryption reconsidered
Dear Perry Metzger: Jim McCoy asked me to forward this, as he is not subscribed to cryptography@metzdowd.com, so his posting bounced. Regards, Zooko

Begin forwarded message:

From: Jim McCoy [EMAIL PROTECTED]
Date: March 20, 2008 10:56:58 PM MDT
To: theory and practice of decentralized computer networks p2p- [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED], Cryptography cryptography@metzdowd.com
Subject: Re: [tahoe-dev] [p2p-hackers] convergent encryption reconsidered
Reply-To: [EMAIL PROTECTED]

On Mar 20, 2008, at 12:42 PM, zooko wrote:

Security engineers have always appreciated that convergent encryption allows an attacker to perform a confirmation-of-a-file attack -- if the attacker already knows the full plaintext of a file, then they can check whether a given user has a copy of that file.

The truth of this depends on implementation details, and is an assertion that cannot be said to cover all or even most of the potential use-cases for this technique. This property only holds if it is possible for the attacker to link a selected ciphertext/file to a user. Systems which use convergent encryption to populate a shared storage pool _might_ have this property, but it is by no means a certainty; if a system is implemented correctly it is not necessary for users to expose their list of files in order to maintain this shared storage space.

basically people can tell which files you are storing if they are publicly known files, but they can't learn anything about your own personal files.

It sounds like you have a design problem. If nodes that participate in the system can distinguish between publication and _re_-publication/replication (or whatever you want to call the random sharing of arbitrary data blocks for the purposes of increasing file availability) then you have a problem.
If these two activities are indistinguishable then an observer knows you have some blocks of a file but should not be able to distinguish between you publishing the blocks and the act of re-distribution to increase block availability.

The Learn-Partial-Information Attack [...]

A better title for this would be Chosen-Plaintext Attack on Convergent Encryption, since what you are talking about is really a chosen-plaintext attack. To be a bit more specific, this is really just a version of a standard dictionary attack. The solution to this problem is to look at similar systems that suffered from dictionary attacks and see what solutions were created to solve the problem. The most widely known and studied version of this is the old crypt()/passwd problem.

For another example, if you use Tahoe to back up your entire home directory, or your entire filesystem, then the attacker gains the opportunity to try to learn partial information about various files which are of predictable format but have sensitive fields in them, such as .my.cnf (MySQL configuration files), .htpasswd, .cvspass, .netrc, web browser cookie files, etc.

The problems with this imagined attack are twofold. I will use your Tahoe example for my explanations because I have a passing familiarity with the architecture.

The first problem is isolating the original ciphertext in the pool of storage. If a file is encrypted using convergent encryption and then run through an error-correction mechanism to generate a number of shares that make up the file, an attacker first needs to be able to isolate these shares to regenerate the original ciphertext. FEC decoding speeds may be reasonably fast, but they are not without some cost. If the storage pool is sufficiently large and you are doing your job to limit the ability of an attacker to see which blocks are linked to the same FEC operation, then the computational complexity of this attack is significantly higher than you suggest.
Assuming an all-seeing oracle who can watch every bit sent into the storage pool will get us around this first problem, but it does raise the bar for potential attackers.

The second problem an attacker now faces is deciding what sort of format a file might have, what the low-entropy content might be, and then filling in values for these unknowns. If your block size is small (and I mean really small in the context of the sort of systems we are talking about) there might be only a few kilobits of entropy in the first couple of blocks of a file, so either a rainbow-table attack on known file formats or a dedicated effort to grab a specific file might be possible, but this is by no means certain. Increase your block size and this problem becomes much harder for the attacker.

Defense Against Both Attacks [...]

However, we can do better than that by creating a secret value and mixing that value into the per-file encryption key (so instead of symmetric_key = H(plaintext), you have symmetric_key = H(added_secret, plaintext), where "," denotes an unambiguous encoding of both operands). This idea is due to Brian Warner and Drew Perttula.
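The added-secret defense can be sketched as follows. This is an illustration of the construction described above, not Tahoe's actual implementation (which uses tagged SHA-256d hashes); here the "unambiguous encoding" of the two operands is a simple length prefix on the secret:

```python
# Sketch of the Warner/Perttula defense: mix a per-user secret into
# the convergent key, key = H(added_secret, plaintext). The length
# prefix makes the (secret, plaintext) encoding unambiguous, so no
# two distinct pairs can produce the same hash input.
import hashlib
import os

def keyed_convergent_key(added_secret: bytes, plaintext: bytes) -> bytes:
    encoding = (len(added_secret).to_bytes(8, "big")
                + added_secret
                + plaintext)
    return hashlib.sha256(encoding).digest()

secret_a, secret_b = os.urandom(32), os.urandom(32)
doc = b"boilerplate legal text ... account number and password ..."

# Same user, same file: keys still converge, so deduplication of a
# single user's own duplicate data is preserved.
print(keyed_convergent_key(secret_a, doc) == keyed_convergent_key(secret_a, doc))

# Different users, same file: keys differ, so an attacker who lacks
# the secret can no longer test guessed plaintexts against the
# observed ciphertext -- both attacks above are blocked.
print(keyed_convergent_key(secret_a, doc) != keyed_convergent_key(secret_b, doc))
```

The trade-off is exactly the one discussed in this thread: convergence, and hence storage-space sharing, now only occurs among parties who share the same added secret.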
Re: convergent encryption reconsidered
|...Convergent encryption renders user files vulnerable to a |confirmation-of-a-file attack. We already knew that. It also |renders user files vulnerable to a learn-partial-information |attack in subtle ways. We didn't think of this until now. My |search of the literature suggests that nobody else did either.

The phrase "obvious in retrospect" applies here: The vulnerability is closely related to the power of probable-plaintext attacks against systems that are thought to be vulnerable only to known-plaintext attacks. The general principle that needs to be applied is: In any cryptographic setting, if knowing the plaintext is sufficient to get some information out of the system, then it will also be possible to get information out of the system by guessing plaintext -- and one must assume that there will be cases where such guessing is easy enough.

-- Jerry

- The Cryptography Mailing List