On Sun, Nov 01, 2009 at 10:33:34PM -0700, Zooko Wilcox-O'Hearn wrote:
> I don't understand why you need a MAC when you already have the hash
> of the ciphertext.  Does it have something to do with the fact that
> the checksum is non-cryptographic by default
> (http://docs.sun.com/app/docs/doc/819-5461/ftyue?a=view), and is that
> still true?  Your original design document [1] said you needed a way
> to force the checksum to be SHA-256 if encryption was turned on.  But
> back then you were planning to support non-authenticating modes like
> CBC.  I guess once you dropped non-authenticating modes then you
> could relax that requirement to force the checksum to be secure.

[Not speaking for Darren...]

No, the requirement to use a strong hash remains, but since the hash
would be there primarily for protection against errors, I don't think
the requirement for a strong hash is really needed.

> Too bad, though!  Not only are you now tight on space in part because
> you have two integrity values where one ought to do, but also a
> secure hash of the ciphertext is actually stronger than a MAC!  A
> secure hash of the ciphertext tells whether the ciphertext is right
> (assuming the hash function is secure and implemented correctly).
> Given that the ciphertext is right, then the plaintext is right
> (given that the encryption is implemented correctly and you use the
> right decryption key).  A MAC on the plaintext tells you only that
> the plaintext was chosen by someone who knew the key.  See what I
> mean?  A MAC can't be used to give someone the ability to read some
> data while withholding from them the ability to alter that data.  A
> secure hash can.

Users won't actually get the data keys, only the data key wrapping
keys.  Users who can read the disk and find the wrapped keys and know
the wrapping keys can find the actual data keys, of course, but add in
a host key that the user can't read and now the user cannot recover
their data keys.  One goal is to protect a system against its users,
but another is to protect user data against malicious modification by
anyone else.  A MAC provides the first kind of protection if the user
can't access the data keys, and a MAC provides the second kind of
protection if the data keys can be kept secret.

> One of the founding ideas of the whole design of ZFS was end-to-end
> integrity checking.  It does that successfully now, for the case of
> accidents, using large checksums.  If the checksum is secure then it
> also does it for the case of malice.  In contrast a MAC doesn't do
> "end-to-end" integrity checking.
> For example, if you've previously allowed someone to read a
> filesystem (i.e., you've given them access to the key), but you never
> gave them permission to write to it, but they are able to exploit the
> issues that you mention at the beginning of [1] such as "Untrusted
> path to SAN", then the MAC can't stop them from altering the file,
> nor can the non-secure checksum, but a secure hash can (provided that
> they can't overwrite all the way up the Merkle Tree of the whole pool
> and any copies of the Merkle Tree root hash).

I think we have to assume that an attacker can write to any part of the
pool, including the Merkle tree roots.  It'd be odd to assume that the
attacker can write anywhere but there -- there's nothing to make it so!

I.e., we have to at least authenticate the Merkle tree roots.  That
still means depending on collision resistance of the hash function for
security.  If we authenticate every block we don't have that dependence
(I'll come back to this).

The interesting thing here is that we want the hash _and_ the MAC, not
just the MAC.  The reason is that we want block pointers (which include
the {IV, MAC, hash} for the block being pointed to) to be visible to
the layer below the filesystem, so that we can scrub/resilver and
evacuate devices from a pool (meaning: re-write all the block pointers
that point to blocks on the evacuated devices so that they point
elsewhere) even without having the data keys at hand (more on this
below).

We could MAC the Merkle tree roots alone, thus alleviating the space
situation in the block pointer structure (and also saving precious CPU
cycles).  But interestingly we wouldn't alleviate it that much!  We
need to store a 96-bit IV, and if we don't MAC every block then we'll
want the strongest hash we can use, so we'll need at least another 256
bits, for a total of 352 bits of the 384 that we have to play with.
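
As a quick check of that arithmetic, here is a minimal Python sketch.
The field names, the constant name and the helper `bits_left` are
hypothetical; only the widths (96-bit IV, 256-bit hash, 384 bits
available) come from the text above.

```python
# Hypothetical names; only the bit widths are taken from the message.
BLKPTR_SPARE_BITS = 384  # space available in the block pointer


def bits_left(layout, budget=BLKPTR_SPARE_BITS):
    """Return the unused bits after packing the given fields."""
    used = sum(layout.values())
    if used > budget:
        raise ValueError(f"layout needs {used} bits, only {budget} available")
    return budget - used


# MAC only the Merkle tree roots: per-block IV plus a strong 256-bit hash.
roots_only = {"iv": 96, "hash": 256}

print(sum(roots_only.values()), bits_left(roots_only))
```

So dropping the per-block MAC would free only 32 of the 384 bits.
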
Whereas if we MAC every block we might store a 96-bit IV, a 128-bit
authentication tag and a 160-bit hash, using all 384 bits.

You get more collision resistance from an N-bit MAC than from a hash of
the same length.  That's because in the MAC case the forger can't check
the forgery without knowing the key, while in the hash case the
attacker can verify that some contents collide with another block's
hash.  In the MAC case an attacker that hasn't broken the MAC/key must
wait until the system reads the modified block(s) to determine if
his/her guess was correct.  So a 128-bit MAC provides more protection
than a 160-bit hash, and about as much as a 256-bit hash.  If we remove
the MAC then the hash has to grow longer to compensate, thus the space
gained by not including the MAC is minimal, possibly zero.

If we MAC every block then we don't need the hash function for security
purposes: its main role would still be to provide integrity protection
against errors for scrubbing and resilvering when keys are unavailable.
The hash would continue to provide end-to-end integrity protection
against errors.

The hash would add _some_ security value though: not only must an
attacker seeking to modify data forge the right MAC for the new
contents, they must also find a hash collision (and they must do this
all the way up the Merkle tree).

> Likewise, a secure hash can be relied on as a dedupe tag *even* if
> someone with malicious intent may have slipped data into the pool.

For dedup you want to compare block contents on hash equality.  That's
what ZFS will do.  That defeats your attack on dedup.

> Also, the IVs for GCM don't need to be random, they need only to be
> unique.  Can you use a block number and birth number or other such
> guaranteed-unique data instead of storing an IV?  (Apropos recent
> discussion on the cryptography list [2].)

The block address can't be used: a blkptr_t actually stores 1-3 actual
block addresses, but these can change if a block is relocated.
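
To illustrate the dedup point above (compare block contents on hash
equality), here is a minimal Python sketch.  It assumes a toy in-memory
table; `DedupTable`, `store` and `weak_digest` are hypothetical names
and this is not ZFS's actual dedup implementation.  The deliberately
weak digest only serves to force a collision.

```python
import hashlib


def sha256_digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def weak_digest(data: bytes) -> bytes:
    # Deliberately collision-prone stand-in: first byte only.
    return data[:1]


class DedupTable:
    """Toy dedup table (hypothetical): digest -> distinct blocks seen."""

    def __init__(self, digest_fn=sha256_digest):
        self._digest_fn = digest_fn
        self._blocks = {}

    def store(self, data: bytes):
        """Return (slot, deduplicated) for the block."""
        digest = self._digest_fn(data)
        candidates = self._blocks.setdefault(digest, [])
        for idx, existing in enumerate(candidates):
            # Hash equality is only a hint; compare the contents too.
            if existing == data:
                return idx, True
        candidates.append(data)
        return len(candidates) - 1, False


table = DedupTable(digest_fn=weak_digest)
first = table.store(b"aaa")    # new block
collide = table.store(b"abc")  # same digest, different contents
repeat = table.store(b"aaa")   # true duplicate
print(first, collide, repeat)
```

Because contents are compared whenever digests match, a colliding block
planted in the pool is stored separately rather than being mistaken for
the original.
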

I think the notion that encrypted/authenticated filesystems need not be
logged in in order to perform certain pool operations is both very
useful and rather odd.  Odd because once a filesystem is logged in, an
all-powerful administrator could either learn its keys or, if the
system were using a token to avoid this, the admin could abuse those
keys -- the sysadmin remains so powerful that trying to protect users
against the sysadmin seems like a waste of resources.  But the ability
to perform some pool operations without having the keys is still
useful: the sysadmin is a user, after all, and might not be around.
Think of a SAN operator reconfiguring pools without having to have the
keys to the datasets on those pools.

Nico
-- 