Suppose you have some data and you want to control who gets to see it, and you also want anyone who gets to see it to be able to verify its integrity. So far, these requirements are familiar to cryptographers. The obvious answer is to encrypt the data and then to apply a MAC (Message Authentication Code) to the ciphertext. There would be one key for the encryption and one key for the MAC. However, this has the wrong semantics for our purposes -- anyone who is given the ability to check the integrity (by being given the MAC key) is also given the ability to create new texts which would verify. Also, whoever creates the initial MAC tag can also create new MAC tags which would cause new files to also verify. Instead, we want a single file that can pass the integrity check, and nobody -- not a reader who is able to verify integrity nor even the writer who initially created the file -- is able to make a different file which would also pass the integrity check.
Therefore, we want the integrity check value to be the secure hash of the file itself. That's what we currently have in Tahoe-LAFS. The immutable file read-cap is a concatenation of two values: the decryption key and the secure hash. The latter is solely for integrity-checking. Actually in Tahoe-LAFS, the integrity check value is not just a flat hash of the plaintext, but instead it is the hash of the roots of a pair of Merkle Trees, one for verifying the correctness of the shares and the other for verifying the correctness of the ciphertext (see [2]).
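The difference between the two semantics can be sketched in a few lines of Python. This is only an illustration under assumptions not stated above: HMAC-SHA256 stands in for the MAC, a flat SHA-256 of the ciphertext stands in for the integrity check value (the real Tahoe-LAFS value is the hash of Merkle tree roots, not a flat hash), and all names and byte strings are hypothetical.

```python
import hashlib
import hmac


def mac_tag(mac_key: bytes, ciphertext: bytes) -> bytes:
    """Tag a ciphertext with HMAC-SHA256 (assumed MAC, for illustration)."""
    return hmac.new(mac_key, ciphertext, hashlib.sha256).digest()


def mac_verify(mac_key: bytes, ciphertext: bytes, tag: bytes) -> bool:
    """Anyone who can run this check holds mac_key, so they can also tag."""
    return hmac.compare_digest(mac_tag(mac_key, ciphertext), tag)


def hash_check_value(ciphertext: bytes) -> bytes:
    """Flat SHA-256 as a simplified stand-in for the integrity check value."""
    return hashlib.sha256(ciphertext).digest()


def hash_verify(check_value: bytes, ciphertext: bytes) -> bool:
    """Verification needs no secret and gives no power to make new files."""
    return hmac.compare_digest(hash_check_value(ciphertext), check_value)


mac_key = b"shared-mac-key"        # hypothetical key given to every verifier
original = b"original ciphertext"  # hypothetical file contents
forged = b"forged ciphertext"

# A verifier who holds mac_key can tag a different file, and it verifies.
forged_tag = mac_tag(mac_key, forged)
mac_forgery_accepted = mac_verify(mac_key, forged, forged_tag)

# With a hash, the check value is bound to exactly one file.
check_value = hash_check_value(original)
hash_forgery_accepted = hash_verify(check_value, forged)
```

Here `mac_forgery_accepted` ends up true and `hash_forgery_accepted` ends up false, which is the property asked for above: neither a reader nor the writer can produce a second file that matches an already-published hash.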
Now, convergent encryption could do both jobs with one value! If you let the symmetric key be the secure hash of the plaintext, then the reader could use the symmetric key to decrypt, then verify that the key was the hash of the plaintext. However, you can't always use convergent encryption. Not only because of the security issues [1], and not only because it requires two passes over the file which prevents "on-line" processing, but also because you might need to generate the symmetric key and/or the integrity check value in a different way. For example, the Tahoe-LAFS integrity-check value isn't just a secure hash of the plaintext. It would be inefficient to generate the full Tahoe-LAFS integrity check value before beginning to encrypt, and we want to be able to give someone the integrity check value (in a verify cap) without thus giving them the decryption key (i.e. the read-cap).
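For reference, plain convergent encryption as described above might look like the following sketch. The assumptions are mine, not part of the design: SHA-256 is the secure hash, the function names are hypothetical, and the SHAKE-256 keystream is a toy stand-in for a real cipher such as AES in CTR mode, used only so the example runs with the standard library.

```python
import hashlib
import hmac


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (NOT for real use): XOR data with SHAKE-256(key)."""
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def convergent_encrypt(plaintext: bytes) -> tuple[bytes, bytes]:
    """Return (key, ciphertext). Needs two passes over the plaintext."""
    key = hashlib.sha256(plaintext).digest()    # pass 1: derive the key
    ciphertext = keystream_xor(key, plaintext)  # pass 2: encrypt
    return key, ciphertext


def convergent_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """Decrypt, then check that the key really is the hash of the plaintext."""
    plaintext = keystream_xor(key, ciphertext)
    if not hmac.compare_digest(key, hashlib.sha256(plaintext).digest()):
        raise ValueError("integrity check failed")
    return plaintext
```

The two passes in `convergent_encrypt` are the "on-line" processing problem mentioned above: the key is not known until the whole plaintext has been hashed. The sketch also shows why the one value cannot be split: whoever can check integrity here necessarily holds the decryption key.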
So here is my idea to use a single value to accomplish both decryption and integrity checking even when you can't set the symmetric key to be the secure hash of the plaintext. You use the encryption key k to encrypt the plaintext to produce the ciphertext, and in the same pass you compute the integrity-check value v. Then you compute the secure hash of the combination of k and v, let's call the result r = H(k, v). Then you encrypt k using r and store the encrypted k with the ciphertext. Now r is the real key -- the read cap. If someone gives you r, the ciphertext, and the encrypted k, then you first use r to decrypt k, check that r = H(k, v), then perform the decryption and integrity-checking of the ciphertext.
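The steps above can be sketched as follows. This is a minimal illustration under several assumptions that go beyond the description: SHA-256 is the secure hash, H(k, v) length-prefixes its inputs before hashing, v is a flat hash of the ciphertext that the reader recomputes (standing in for the real Tahoe-LAFS integrity check value), and the SHAKE-256 keystream is a toy stand-in for a real cipher. The names `write_file` and `read_file` are hypothetical.

```python
import hashlib
import hmac


def H(*parts: bytes) -> bytes:
    """Hash several values; length prefixes keep (k, v) unambiguous."""
    h = hashlib.sha256()
    for part in parts:
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.digest()


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (NOT for real use): XOR data with SHAKE-256(key)."""
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def integrity_value(ciphertext: bytes) -> bytes:
    """Assumed stand-in for v; computable in the same pass as encryption."""
    return hashlib.sha256(ciphertext).digest()


def write_file(plaintext: bytes, k: bytes) -> tuple[bytes, bytes, bytes]:
    """Return (r, ciphertext, encrypted_k); r is the read cap."""
    ciphertext = keystream_xor(k, plaintext)
    v = integrity_value(ciphertext)
    r = H(k, v)                        # the single short value
    encrypted_k = keystream_xor(r, k)  # stored alongside the ciphertext
    return r, ciphertext, encrypted_k


def read_file(r: bytes, ciphertext: bytes, encrypted_k: bytes) -> bytes:
    """Recover k from r, check r = H(k, v), then decrypt."""
    k = keystream_xor(r, encrypted_k)
    v = integrity_value(ciphertext)
    if not hmac.compare_digest(r, H(k, v)):
        raise ValueError("integrity check failed")
    return keystream_xor(k, ciphertext)
```

Unlike convergent encryption, k here can be chosen freely (for example at random) before the plaintext is read, so encryption and the computation of v can proceed in one pass, and v can be handed out on its own without revealing k.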
Here is a diagram: [3] (also attached).

Regards,
Zooko

[1] http://hacktahoe.org/drew_perttula.html
[2] http://allmydata.org/~zooko/lafs.pdf
[3] http://zooko.com/imm-short-readcap-simple-drawing.svg
