
Re: Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto

2009-11-09 Thread Zooko Wilcox-O'Hearn


On Wednesday, 2009-11-04, at 7:04, Darren J Moffat wrote:

> The SHA-256 is unkeyed so there would be nothing to stop an
> attacker that can write to the disks but doesn't know the key from
> modifying the on disk ciphertext and all the SHA-256 hashes up to
> the top of the Merkle tree to the uberblock.  That would create a
> valid ZFS pool but the data would have been tampered with.   I
> don't see that as an acceptable risk.


I see.  It is interesting that you and I have different intuitions
about this.  My intuition is that it is easier to make sure that the
Merkle Tree root hash wasn't changed without authorization than to make sure
that an unauthorized person hasn't learned a secret.  Is your
intuition the opposite?  I suppose in different situations either one
could be true.


Now I better appreciate why you want to use both a secure hash and a  
MAC.  Now I understand the appeal of Nico Williams's proposal to MAC  
just the root of the tree and not every node of the tree.  That would  
save space in all the non-root nodes but would retain the property  
that you have to both know the secret *and* be able to write to the  
root hash in order to change the filesystem.


> So if I don't truncate the SHA-256, how big does my MAC need to be
> given every ZFS block has its own IV?


I don't know the answer to this question.  I have a hard time  
understanding if the minimum safe size of the MAC is zero (i.e. you  
don't need it anyway) or 128 bits (i.e. you rely on the MAC and you
want 128-bit crypto strength) or something in between.


Regards,

Zooko

-
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to majord...@metzdowd.com

Re: Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto

2009-11-08 Thread David-Sarah Hopwood

Nicolas Williams wrote:
> On Tue, Nov 03, 2009 at 07:28:15PM +, Darren J Moffat wrote:
>> Nicolas Williams wrote:
>>> Interesting.  If ZFS could make sure no blocks exist in a pool from more
>>> than 2^64-1 transactions ago[*], then the txg + a 32-bit per-transaction
>>> block write counter would suffice.  That way Darren would have to store
>>> just 32 bits of the IV.  That way he'd have 352 bits to work with, and
>>> then it'd be possible to have a 128-bit authentication tag and a 224-bit
>>> hash.
>>
>> The logical txg (post dedup integration we have physical and logical 
>> transaction ids) + a 32 bit counter is interesting.   It was actually my 
>> very first design for IVs several years ago!
[...]
>> I suspect that sometime in the next 584,542 years the block pointer size 
>> for ZFS will increase and I'll have more space to store a bigger MAC, 
>> hash and IV.  In fact I guess that will happen even in the next 50 years.
> 
> Heh.  txg + 32-bit counter == 96-bit IVs sounds like the way to go.

I'm confused. How does this allow you to do block-level deduplication,
given that the IV (and hence the ciphertext) will be different for every
block even when the plaintext is the same?
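The premise of this question can be shown in a few lines. A minimal sketch, assuming Nico's txg + 32-bit counter IV packed big-endian, and using a toy HMAC-SHA256 keystream cipher as a stand-in for AES-CCM/GCM (it is not the ZFS cipher and provides no authentication). `make_iv` and `toy_encrypt` are hypothetical names.

```python
import hashlib
import hmac
import struct

# Hypothetical helpers for illustration; not ZFS code.

def make_iv(txg: int, counter: int) -> bytes:
    """Hypothetical 96-bit IV: 64-bit txg then 32-bit per-txg write counter.

    struct.error is raised if either value does not fit its field.
    """
    return struct.pack(">QI", txg, counter)

def toy_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    """Toy stand-in for AES-CCM/GCM: XOR with an HMAC-SHA256 keystream.

    Not authenticated; it only shows that the ciphertext depends on the IV.
    Applying it twice with the same key and IV decrypts.
    """
    out = bytearray()
    for offset in range(0, len(plaintext), 32):
        keystream = hmac.new(key, iv + offset.to_bytes(8, "big"),
                             hashlib.sha256).digest()
        chunk = plaintext[offset:offset + 32]
        out += bytes(p ^ k for p, k in zip(chunk, keystream))
    return bytes(out)

key = bytes(32)  # all-zero demo key
block = b"identical file data".ljust(4096, b"\0")

iv1 = make_iv(txg=100, counter=0)
iv2 = make_iv(txg=101, counter=5)
c1 = toy_encrypt(key, iv1, block)
c2 = toy_encrypt(key, iv2, block)

print(len(iv1) * 8)  # 96
# A dedup table keyed on the ciphertext checksum sees two unrelated blocks.
print(hashlib.sha256(c1).digest() == hashlib.sha256(c2).digest())  # False
```

Because each write gets a fresh IV, identical plaintext yields different ciphertext, so matching blocks by their ciphertext checksum alone cannot find the duplicate.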

-- 
David-Sarah Hopwood  ⚥  http://davidsarah.livejournal.com




Re: Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto

2009-11-03 Thread David-Sarah Hopwood

Zooko Wilcox-O'Hearn wrote:
> Dear Darren J Moffat:
> 
> I don't understand why you need a MAC when you already have the hash of
> the ciphertext.  Does it have something to do with the fact that the
> checksum is non-cryptographic by default
> (http://docs.sun.com/app/docs/doc/819-5461/ftyue?a=view ), and is that
> still true?  Your original design document [1] said you needed a way to
> force the checksum to be SHA-256 if encryption was turned on.  But back
> then you were planning to support non-authenticating modes like CBC.  I
> guess once you dropped non-authenticating modes then you could relax
> that requirement to force the checksum to be secure.
> 
> Too bad, though!  Not only are you now tight on space in part because
> you have two integrity values where one ought to do, but also a secure
> hash of the ciphertext is actually stronger than a MAC!  A secure hash
> of the ciphertext tells whether the ciphertext is right (assuming the
> hash function is secure and implemented correctly).  Given that the
> ciphertext is right, then the plaintext is right (given that the
> encryption is implemented correctly and you use the right decryption
> key).

Hmm. That may be too many "given"s.

Tahoe (see www.allmydata.org) has an open bug to add a plaintext hash,
precisely because the encryption might not be implemented correctly or
the encryption key might not be correct.

It seems as though ZFS (and many other protocols) is in the same position
as Tahoe, in wanting some way to validate that the ciphertext is correct
without needing the decryption key, but also wanting to minimize the risk
of some implementation error, and/or use of the wrong decryption key,
resulting in undetected errors in the plaintext.

I had something similar to the following in mind for the next update to
my proposal for Tahoe's new crypto protocol (simplified here to avoid
Tahoe-specific details and terminology):

 - a "plaintext verifier" is Hash1(index, salt, plaintext).

 - a "ciphertext verifier" is Hash2(index, ciphertext).

 - at a location determined by 'index', store:
   ciphertext = Encrypt[K](salt, plaintext)

This has the following advantages:

 - For integrity of the plaintext, you only need to assume that the
   implementation of the hash is correct. Moreover, if the hash
   implementation is not correct, that is very likely to cause it to
   fail to verify good data, which is noticeable as an error in normal
   operation. To get bad data to pass verification, the attacker would
   need to have some control over the output value of the incorrect
   hash; an error that effectively randomizes the value does not help
   them.

 - The verification also ensures integrity of the index. So, if a
   ciphertext ends up being stored in the wrong place, that will be
   detected.

 - Verification of the plaintext does not require the decryption key;
   it can be done using just the known plaintext verifier, and the
   purported values of 'salt' and 'plaintext' obtained from decryption.

   This is very important "if it must be possible to have all
   cryptographic key material stored and/or created entirely in a
   hardware device", as [1] states as a requirement for ZFS. If the
   verification can be done safely in software and if the encryption
   uses a standard mode, then it is more likely that existing crypto
   hardware, or at least hardware that has no specific dependency on
   ZFS, can be used.

 - Knowledge of the plaintext verifier by itself leaks no information
   about the plaintext, under the assumptions that the hash is one-way,
   and that there is no repetition of an (index, salt, plaintext) triple.

 - A non-malicious corruption of any of the plaintext verifier, the
   ciphertext, or the decryption key will cause the plaintext to fail
   to verify.

 - A malicious change to the ciphertext or any induced error in the
   decryption will cause the plaintext to fail to verify as long as
   the correct plaintext verifier is used.

Contrast with the case where we only use a ciphertext checksum, where
either an error in the decryption, or corruption of the decryption key,
will result in an undetected error in the plaintext.
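The scheme and the contrast above can be sketched as follows, assuming SHA-256 with distinct domain tags for Hash1 and Hash2, length-prefixed fields, a 16-byte salt, and a toy HMAC-SHA256 keystream cipher standing in for Encrypt[K]. All function names are hypothetical, and using the index as the cipher nonce is only safe if an index is never rewritten under the same key.

```python
import hashlib
import hmac
import os

# Hypothetical sketch of the verifier scheme; not Tahoe or ZFS code.
SALT_LEN = 16  # assumed salt size in bytes

def _h(domain: bytes, *fields: bytes) -> bytes:
    """SHA-256 with a domain tag and length-prefixed fields.

    The distinct tags make Hash1 and Hash2 separate functions; the length
    prefixes stop (index, salt, plaintext) fields from running together.
    """
    h = hashlib.sha256(domain)
    for field in fields:
        h.update(len(field).to_bytes(8, "big"))
        h.update(field)
    return h.digest()

def plaintext_verifier(index: bytes, salt: bytes, plaintext: bytes) -> bytes:
    return _h(b"Hash1", index, salt, plaintext)

def ciphertext_verifier(index: bytes, ciphertext: bytes) -> bytes:
    return _h(b"Hash2", index, ciphertext)

def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stand-in for Encrypt[K]: XOR with an HMAC-SHA256 keystream."""
    out = bytearray()
    for off in range(0, len(data), 32):
        ks = hmac.new(key, nonce + off.to_bytes(8, "big"), hashlib.sha256).digest()
        out += bytes(d ^ k for d, k in zip(data[off:off + 32], ks))
    return bytes(out)

def write_block(key: bytes, index: bytes, plaintext: bytes):
    salt = os.urandom(SALT_LEN)
    ciphertext = toy_cipher(key, index, salt + plaintext)  # salt is encrypted too
    return (ciphertext,
            plaintext_verifier(index, salt, plaintext),
            ciphertext_verifier(index, ciphertext))

def read_block(key: bytes, index: bytes, ciphertext: bytes,
               pv: bytes, cv: bytes) -> bytes:
    # Check 1 needs no key, so a scrub or resilver could run it alone.
    if not hmac.compare_digest(ciphertext_verifier(index, ciphertext), cv):
        raise ValueError("ciphertext verifier mismatch")
    decrypted = toy_cipher(key, index, ciphertext)
    salt, plaintext = decrypted[:SALT_LEN], decrypted[SALT_LEN:]
    # Check 2 uses only the decryption output and the stored verifier.
    if not hmac.compare_digest(plaintext_verifier(index, salt, plaintext), pv):
        raise ValueError("plaintext verifier mismatch")
    return plaintext

key = os.urandom(32)
index = (42).to_bytes(8, "big")
ct, pv, cv = write_block(key, index, b"file contents")
print(read_block(key, index, ct, pv, cv))  # b'file contents'

# Wrong key: the ciphertext checksum still passes, the plaintext check fails.
try:
    read_block(os.urandom(32), index, ct, pv, cv)
except ValueError as err:
    print(err)  # plaintext verifier mismatch
```

A ciphertext read back from the wrong index fails the first check, while a wrong decryption key slips past the ciphertext checksum and is caught only by the plaintext verifier.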

Of course we also need to consider the space constraints. 384 bits
would fit two 192-bit hashes for the plaintext and ciphertext
verifiers, but then we would have no space to accommodate the
ciphertext expansion that results from encrypting the salt together
with the plaintext.

I'm not familiar enough with ZFS's on-disk format to tell whether there
is a way around this. Note that the encrypted salt does not need to
be stored in the same place as either the verifiers or the rest of
the ciphertext.

> A MAC on the plaintext tells you only that the plaintext was
> chosen by someone who knew the key.  See what I mean?  A MAC can't be
> used to give someone the ability to read some data while withholding
> from them the ability to alter that data.  A secure hash

Re: Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto

2009-11-02 Thread Nicolas Williams

On Sun, Nov 01, 2009 at 10:33:34PM -0700, Zooko Wilcox-O'Hearn wrote:
> I don't understand why you need a MAC when you already have the hash  
> of the ciphertext.  Does it have something to do with the fact that  
> the checksum is non-cryptographic by default
> (http://docs.sun.com/app/docs/doc/819-5461/ftyue?a=view ), and is that still true?  Your
> original design document [1] said you needed a way to force the  
> checksum to be SHA-256 if encryption was turned on.  But back then  
> you were planning to support non-authenticating modes like CBC.  I  
> guess once you dropped non-authenticating modes then you could relax  
> that requirement to force the checksum to be secure.

[Not speaking for Darren...]  No, the requirement to use a strong hash
remains, but since the hash would be there primarily for protection
against errors, I don't think the requirement for a strong hash is really
needed.

> Too bad, though!  Not only are you now tight on space in part because  
> you have two integrity values where one ought to do, but also a  
> secure hash of the ciphertext is actually stronger than a MAC!  A  
> secure hash of the ciphertext tells whether the ciphertext is right  
> (assuming the hash function is secure and implemented correctly).   
> Given that the ciphertext is right, then the plaintext is right  
> (given that the encryption is implemented correctly and you use the  
> right decryption key).  A MAC on the plaintext tells you only that  
> the plaintext was chosen by someone who knew the key.  See what I  
> mean?  A MAC can't be used to give someone the ability to read some  
> data while withholding from them the ability to alter that data.  A  
> secure hash can.

Users won't actually get the data keys, only the data key wrapping keys.
Users who can read the disk and find the wrapped keys and know the
wrapping keys can find the actual data keys, of course, but add in a
host key that the user can't read and now the user cannot recover their
data keys.  One goal is to protect a system against its users, but
another is to protect user data against malicious modification by anyone
else.  A MAC provides the first kind of protection if the user can't
access the data keys, and a MAC provides the second kind of protection
if the data keys can be kept secret.

> One of the founding ideas of the whole design of ZFS was end-to-end  
> integrity checking.  It does that successfully now, for the case of  
> accidents, using large checksums.  If the checksum is secure then it  
> also does it for the case of malice.  In contrast a MAC doesn't do  
> "end-to-end" integrity checking.  For example, if you've previously  
> allowed someone to read a filesystem (i.e., you've given them access  
> to the key), but you never gave them permission to write to it, but  
> they are able to exploit the issues that you mention at the beginning
> of [1] such as "Untrusted path to SAN", then the MAC can't stop them  
> from altering the file, nor can the non-secure checksum, but a secure  
> hash can (provided that they can't overwrite all the way up the  
> Merkle Tree of the whole pool and any copies of the Merkle Tree root  
> hash).

I think we have to assume that an attacker can write to any part of the
pool, including the Merkle tree roots.  It'd be odd to assume that the
attacker can write anywhere but there -- there's nothing to make it so!

I.e., we have to at least authenticate the Merkle tree roots.  That
still means depending on collision resistance of the hash function for
security.  If we authenticate every block we don't have that dependence
(I'll come back to this).

The interesting thing here is that we want the hash _and_ the MAC, not
just the MAC.  The reason is that we want block pointers (which include
the {IV, MAC, hash} for the block being pointed to) to be visible to the
layer below the filesystem, so that we can scrub/resilver and evacuate
devices from a pool (meaning: rewrite all the block pointers that point to
blocks on the evacuated devices so that they point elsewhere) even
without having the data keys at hand (more on this below).

We could MAC the Merkle tree roots alone, thus alleviating the space
situation in the block pointer structure (and also saving precious CPU
cycles).  But interestingly we wouldn't alleviate it that much!  We need
to store a 96-bit IV, and if we don't MAC every block then we'll want
the strongest hash we can use, so we'll need at least another 256 bits,
for a total of 352 bits of the 384 that we have to play with.  Whereas
if we MAC every block we might store a 96-bit IV, a 128-bit
authentication tag and a 160-bit hash, using all 384 bits.
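The competing layouts in this thread all come down to adding field widths against Darren's 384-bit budget. A quick sketch, assuming the widths exactly as quoted in the thread; the root-MAC-only row assumes the root's tag is stored outside the per-block pointer, and the labels are descriptive, not ZFS names.

```python
BUDGET_BITS = 384  # space in the block pointer for IV + MAC + checksum

# (IV bits, MAC/auth-tag bits, checksum bits), as quoted in the thread.
# Labels are descriptive only.
layouts = {
    "Option 1: IV, MAC, SHA-256 truncated to 128": (96, 128, 128),
    "MAC only the tree roots: IV, full SHA-256": (96, 0, 256),
    "MAC every block: IV, MAC, 160-bit hash": (96, 128, 160),
    "Store only a 32-bit counter of the IV": (32, 128, 224),
    "Everything untruncated": (96, 128, 256),
}

for name, widths in layouts.items():
    used = sum(widths)
    if used <= BUDGET_BITS:
        status = f"fits, {BUDGET_BITS - used} bits spare"
    else:
        status = f"over budget by {used - BUDGET_BITS} bits"
    print(f"{name}: {used} bits, {status}")
```

The last row reproduces Darren's "about 480 bits" for the untruncated case, 96 bits over budget.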

You get more collision resistance from an N-bit MAC than from a hash of
the same length.  That's because in the MAC case the forger can't check
the forgery without knowing the key, while in the hash case the attacker
can verify that some contents collides with another's hash.  In the MAC
case an attacker that hasn't broken the M

Re: Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto

2009-11-02 Thread Matt Ball

Hi Darren,

On Fri, Oct 30, 2009 at 11:30 AM, Darren J Moffat wrote:

> For the encryption functionality in the ZFS filesystem we use AES in CCM or
> GCM mode at the block level to provide confidentiality and authentication.
>  There is also a SHA256 checksum per block (of the ciphertext) that forms a
> Merkle tree of all the blocks in the pool. Note that I have to store the
> full IV in the block.   A block here is a ZFS block which is any power of
> two from 512 bytes to 128k (the default).
>
> The SHA256 checksums are used even for blocks in the pool that aren't
> encrypted and are used for detecting and repairing (resilvering) block
> corruption.  Each filesystem in the pool has its own wrapping key and data
> encryption keys.
>
> Due to some unchangeable constraints I have only 384 bits of space to fit
> in all of: IV, MAC (CCM or GCM Auth Tag), and the SHA256 checksum, which
> best case would need about 480 bits.
>
> Currently I have Option 1 below, but the truncation of SHA256 down to 128
> bits makes me question if this is safe.  Remember the SHA256 is of the
> ciphertext and is used for resilvering.
>
> Option 1
>
> IV        96 bits  (the max CCM allows given the other params)
> MAC       128 bits
> Checksum  SHA256 truncated to 128 bits
>
>
I personally like the default option 1.  All the others have various
uglinesses.

SHA-224 has patent issues (see US patent 6829355).
It's really identical to SHA-256 except that it uses a different initial
value and truncates to 224 bits.  I would love to see SHA-224 completely
disappear.

Cryptographers will all have different opinions about how big a MAC (i.e.,
cryptographic integrity check) should be, but my take on it is to ask how
big of a CRC would you need in a non-adversarial environment to meet the
undetectable error rate specified within the system, and then use that for
the minimum size of the MAC.  For tape drives I've worked on, this was
typically somewhere around 1 undetected error in 10^27 bits.  If you protect
1 data bit, then you'd roughly need a 90-bit CRC, which you could round up
to 96 bits.  Anything more than 96 bits in my opinion is somewhat overkill.
I'd pick a CCM MAC of either 96 bits or 128.
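Matt's sizing rule reduces to a logarithm: if a b-bit check behaves like a random tag, it misses a corrupted block with probability about 2^-b, so a target of 1 undetected error in 10^27 needs b >= 27 log2(10), about 89.7. A small sketch, assuming rounding up to a multiple of 32 bits as the way to get from 90 to 96; `check_bits` is a hypothetical name.

```python
import math

def check_bits(exponent: int, round_to: int = 32) -> tuple[int, int]:
    """Smallest b with 2**-b <= 10**-exponent, and b rounded up to round_to.

    Assumes the check misses an error with probability about 2**-b,
    the premise behind sizing a MAC like a CRC.
    """
    raw = math.ceil(exponent * math.log2(10))
    rounded = -(-raw // round_to) * round_to  # ceiling division
    return raw, rounded

print(check_bits(27))  # (90, 96)
```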

For hashing, it's a little different since you have to worry about the
birthday paradox.  The size of the hashing output depends on the
undetectable error rate of the system, along with the maximum number of
candidate plaintexts that an adversary could create in finding a hash
collision.  Most cryptographers (not knowing more about the system) would be
conservative and say something like "Use the full 256-bits of SHA-256 to get
a minimum of 128-bits of security", but realistically for this system, that
would be way overkill.  There's already a 128-bit CCM MAC to fall back to,
so here again (given the other safety nets in the system), I think that a
128-bit truncated SHA-256 hash would be plenty of assurance for the system.

-- 
Thanks!

Matt Ball, Chair, IEEE P1619 Security in Storage Working Group
Staff Engineer, Sun Microsystems, Inc.

Re: Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto

2009-11-02 Thread Alexander Klimov

On Fri, 30 Oct 2009, Darren J Moffat wrote:
> The SHA256 checksums are used even for blocks in the pool that aren't
> encrypted and are used for detecting and repairing (resilvering) block
> corruption.  Each filesystem in the pool has its own wrapping key and
> data encryption keys.
>
> Due to some unchangeable constraints I have only 384 bits of space to
> fit in all of: IV, MAC (CCM or GCM Auth Tag), and the SHA256 checksum,
> which best case would need about 480 bits.
>
> Currently I have Option 1 below, but the truncation of SHA256 down to
> 128 bits makes me question if this is safe.  Remember the SHA256 is of
> the ciphertext and is used for resilvering.

If you use the hash only to protect against non-malicious corruptions,
then why do you use SHA-2? Wouldn't MD5 or even a CRC be enough?

-- 
Regards,
ASK


Re: Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto

2009-11-02 Thread Zooko Wilcox-O'Hearn


Dear Darren J Moffat:

I don't understand why you need a MAC when you already have the hash  
of the ciphertext.  Does it have something to do with the fact that  
the checksum is non-cryptographic by default
(http://docs.sun.com/app/docs/doc/819-5461/ftyue?a=view ), and is that still true?  Your
original design document [1] said you needed a way to force the  
checksum to be SHA-256 if encryption was turned on.  But back then  
you were planning to support non-authenticating modes like CBC.  I  
guess once you dropped non-authenticating modes then you could relax  
that requirement to force the checksum to be secure.


Too bad, though!  Not only are you now tight on space in part because  
you have two integrity values where one ought to do, but also a  
secure hash of the ciphertext is actually stronger than a MAC!  A  
secure hash of the ciphertext tells whether the ciphertext is right  
(assuming the hash function is secure and implemented correctly).   
Given that the ciphertext is right, then the plaintext is right  
(given that the encryption is implemented correctly and you use the  
right decryption key).  A MAC on the plaintext tells you only that  
the plaintext was chosen by someone who knew the key.  See what I  
mean?  A MAC can't be used to give someone the ability to read some  
data while withholding from them the ability to alter that data.  A  
secure hash can.


One of the founding ideas of the whole design of ZFS was end-to-end  
integrity checking.  It does that successfully now, for the case of  
accidents, using large checksums.  If the checksum is secure then it  
also does it for the case of malice.  In contrast a MAC doesn't do  
"end-to-end" integrity checking.  For example, if you've previously  
allowed someone to read a filesystem (i.e., you've given them access  
to the key), but you never gave them permission to write to it, but  
they are able to exploit the issues that you mention at the beginning
of [1] such as "Untrusted path to SAN", then the MAC can't stop them  
from altering the file, nor can the non-secure checksum, but a secure  
hash can (provided that they can't overwrite all the way up the  
Merkle Tree of the whole pool and any copies of the Merkle Tree root  
hash).


Likewise, a secure hash can be relied on as a dedupe tag *even* if  
someone with malicious intent may have slipped data into the pool.   
An insecure hash or a MAC tag can't -- a malicious actor could submit  
data which would cause a collision in an insecure hash or a MAC tag,  
causing tag-based dedupe to mistakenly unify two different blocks.
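The insecure-hash half of this point is easy to demonstrate. A sketch, assuming a hypothetical dedupe table that trusts CRC-32 alone (a deliberately weak checksum chosen for illustration): since anyone can compute the tag without a key, a birthday search quickly finds two different blocks that such a table would merge. The MAC-tag half of the argument is not shown here.

```python
import hashlib
import itertools
import zlib

# Hypothetical illustration; the dedupe table here is imaginary.

def find_crc32_collision() -> tuple[bytes, bytes]:
    """Birthday search for two different 32-byte blocks with the same CRC-32.

    Candidate blocks are pseudo-random (SHA-256 of a counter). CRC-32 needs
    no key and has only 32 bits, so a match is expected after roughly
    2**16 candidates.
    """
    seen: dict[int, bytes] = {}
    for i in itertools.count():
        block = hashlib.sha256(i.to_bytes(8, "big")).digest()
        tag = zlib.crc32(block)
        if tag in seen:
            return seen[tag], block
        seen[tag] = block

first, second = find_crc32_collision()
print(first != second, zlib.crc32(first) == zlib.crc32(second))  # True True
```

Pseudo-random candidates matter here: CRC-32 detects every difference confined to 32 consecutive bits, so candidates that differ only in a short counter field might never collide.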


So, since you're tight on space, it would be really nice if you could  
tell your users to use a secure hash for the checksum and then  
allocate more space to the secure hash value and less space to the  
now-unnecessary MAC tag.  :-)


Anyway, if this is the checksum which is used for dedupe then  
remember the birthday so-called paradox -- some people may be  
uncomfortable with the prospect of not being able to safely dedupe  
their 2^64-block storage pool if the hash is only 128 bits, for  
example.  :-)  Maybe you could include the MAC tag in the dedupe  
comparison.
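The 2^64-block example can be quantified with the standard birthday approximation. A sketch, assuming hash outputs behave like independent uniform random values (the accidental-collision case); `collision_probability` is a hypothetical name.

```python
import math

def collision_probability(n_blocks: float, hash_bits: int) -> float:
    """Birthday approximation: P(any collision) ~= 1 - exp(-n(n-1) / 2**(b+1)).

    Assumes hash outputs behave like independent uniform random values,
    i.e. accidental collisions, not an attacker searching for one.
    """
    pairs = n_blocks * (n_blocks - 1) / 2
    return -math.expm1(-pairs / 2.0 ** hash_bits)

n = 2.0 ** 64  # a 2^64-block pool
for bits in (128, 160, 224, 256):
    print(f"{bits}-bit hash: {collision_probability(n, bits):.3g}")
```

For a 128-bit hash this gives roughly a 39% chance of at least one accidental collision across the pool. An adversary who searches for a collision instead needs on the order of 2^(b/2) hash evaluations, which is the concern Matt raises about the number of candidate plaintexts.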


Also, the IVs for GCM don't need to be random, they need only to be  
unique.  Can you use a block number and birth number or other such  
guaranteed-unique data instead of storing an IV?  (Apropos recent  
discussion on the cryptography list [2].)


Regards,

Zooko

[1] http://hub.opensolaris.org/bin/download/Project+zfs%2Dcrypto/files/zfs%2Dcrypto%2Ddesign.pdf

[2] http://www.mail-archive.com/cryptography@metzdowd.com/msg11020.html
---
Your cloud storage provider does not need access to your data.
Tahoe-LAFS -- http://allmydata.org

