convergent encryption reconsidered -- salting and key-strengthening

2008-03-31 Thread zooko
[This conversation is spanning three mailing lists --  
cryptography@metzdowd.com, [EMAIL PROTECTED], and tahoe- 
[EMAIL PROTECTED] .  Some of the posts have not reached all three of  
those lists.  I've manually added Jerry Leichter and Ivan Krstić to  
the approved-senders set for p2p-hackers and tahoe-dev, so that  
further posts by them will appear on those lists.]



On Mar 30, 2008, at 3:13 PM, Ivan Krstić wrote:

Unless I'm misunderstanding Zooko's writeup, he's worried about an
attacker going from a partially-known plaintext (e.g. a form bank
letter) to a completely-known plaintext by repeating the following
process:

1. take partially known plaintext
2. make a guess, randomly or more intelligently where possible,
about the unknown parts
3. take the current integrated partial+guessed plaintext, hash
to obtain convergence key
4. verify whether that key exists in the storage index
5. if yes, you've found the full plaintext. if not, repeat from '2'.

That's a brute force search.
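Ivan's five steps can be sketched in Python. This is a minimal sketch under stated assumptions. SHA-256 stands in for the convergence hash. The hypothetical `storage_index()` helper models a public index derived from the key by a second hash. Neither is claimed to be Tahoe's actual construction.

```python
import hashlib
from itertools import product
from typing import Optional, Set


def convergence_key(plaintext: bytes) -> bytes:
    # Traditional convergent encryption: the key is just a hash of the plaintext.
    return hashlib.sha256(plaintext).digest()


def storage_index(key: bytes) -> bytes:
    # Hypothetical: a public storage index derived from the key by a second hash.
    return hashlib.sha256(b"storage-index:" + key).digest()


def brute_force(template: bytes, alphabet: bytes, n_unknown: int,
                index: Set[bytes]) -> Optional[bytes]:
    """Step 1: `template` is the partially known plaintext, b"{}" marks the gap."""
    for guess in product(alphabet, repeat=n_unknown):            # step 2: make a guess
        candidate = template.replace(b"{}", bytes(guess))        # step 3: partial + guessed
        if storage_index(convergence_key(candidate)) in index:   # step 4: look it up
            return candidate                                      # step 5: plaintext found
    return None                                                   # search space exhausted


# A victim stores a form letter; only the 4-digit PIN is unknown to the attacker.
stored = {storage_index(convergence_key(b"Dear customer, your PIN is 4721."))}
print(brute_force(b"Dear customer, your PIN is {}.", b"0123456789", 4, stored))
# b'Dear customer, your PIN is 4721.'
```

The search here needs at most 10**4 hashes. Each additional unknown digit multiplies that by ten, and this is the cost that Ivan proposes to raise in step 3.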


That's right.  Your comparison to normal old brute-force/dictionary  
attack on passwords is a good one, and one that Jim McCoy and Jerry  
Leichter have alluded to as well.


Think of it like this:

Passwords are susceptible to brute-force and/or dictionary attack.   
We can't, in general, prevent attackers from trying guesses at our  
passwords without also preventing users from using them, so instead  
we employ various techniques:


 * salts (to break up the space of targets into subspaces, of which  
at most one can be targeted by a given brute-force attack)
 * key strengthening (to increase by a constant factor the cost of  
checking a password)
 * rate-limits for on-line tries (i.e., you get only a small fixed  
number of wrong guesses in a row before you are locked out for a time- 
out period)
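The three defenses can be combined in a small sketch. It uses the standard library's `hashlib.pbkdf2_hmac` for salting plus strengthening, and a simple failure counter with a time-out for rate-limiting. The `Account` class, the iteration count and the lockout numbers are illustrative assumptions, not taken from any particular system.

```python
import hashlib
import hmac
import os
import time

ITERATIONS = 100_000    # key strengthening: illustrative cost multiplier per guess
MAX_FAILURES = 5        # rate limit: wrong guesses allowed in a row
LOCKOUT_SECONDS = 60.0  # rate limit: length of the time-out period


class Account:
    def __init__(self, password: str) -> None:
        # Salt: fresh random bytes per account, so one hash computation tests only one account.
        self.salt = os.urandom(16)
        self.digest = self._derive(password)
        self.failures = 0
        self.locked_until = float("-inf")

    def _derive(self, password: str) -> bytes:
        # Salted, iterated derivation: PBKDF2-HMAC-SHA256 from the standard library.
        return hashlib.pbkdf2_hmac("sha256", password.encode(), self.salt, ITERATIONS)

    def check(self, guess: str) -> bool:
        now = time.monotonic()
        if now < self.locked_until:
            return False  # locked out: the guess is not even evaluated
        if hmac.compare_digest(self._derive(guess), self.digest):
            self.failures = 0
            return True
        self.failures += 1
        if self.failures >= MAX_FAILURES:
            self.locked_until = now + LOCKOUT_SECONDS  # start the time-out period
            self.failures = 0
        return False


a, b = Account("hunter2"), Account("hunter2")
print(a.digest == b.digest)  # False: same password, different salts
```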


However, secrets other than passwords are not usually susceptible to  
such attacks.  You can store your True Name, credit card number, bank  
account number, mother's maiden name, and so forth, on the same  
server as your password, but you don't have to worry about using  
salts or key strengthening on those latter secrets, because the  
server doesn't run a service that allows unauthenticated remote  
people to connect, submit a guess as to their value, and receive  
confirmation, the way it does for your password.  (In other words,  
for such data we generally use an extreme form of the third defense,  
rate-limiting tries -- the number of guesses that an attacker gets is  
limited to none!)


Likewise, if you are going to store or transmit those kinds of  
secrets in encrypted form using a traditional randomly-generated  
encryption key, then you don't have to worry about brute-force/ 
dictionary attacks because your strong randomly-selected symmetric  
encryption key defies them.


The Key Point:

 *** Convergent encryption exposes whatever data is put into it to  
the sorts of attacks that already apply to passwords.



Convergent encryption had been invented, analyzed and used for many  
years, but to the best of my knowledge the first time that anyone  
noticed this issue was March 16 of this year (at 3 AM Chicago Time),  
when Drew Perttula and Brian Warner made that observation.  (Although  
to be fair some of the best-known uses of convergent encryption  
during these years have been sharing publicly-known files with  
strangers, in which case I suppose it is assumed that the cleartext  
does not contain secrets.)


Now PBKDF2 is a combination of the first two defenses -- salting and  
key strengthening.  When you first suggested PBKDF2, I -- and  
apparently Jerry Leichter -- thought that you were suggesting its  
salting feature as a solution.  The solution that we've come up with  
for Tahoe (described in my original note) is much like salting,  
except that the added value that gets mixed in is secret and  
unguessable, where I normally think of salt as non-secret.


Now I see that you are also emphasizing the key strengthening feature  
of PBKDF2.


k denotes the symmetric encryption key, p denotes plaintext, c
denotes ciphertext, s denotes salt, E(key, plaintext) is
encryption, H() is secure hashing, and H^1000() is secure hashing a
thousand times over, i.e. H(H(H(...H(x)...))) with H applied a thousand times.
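The notation can be made concrete in a few lines of Python. This sketch assumes SHA-256 for H(). It also assumes that a two-argument H(s, p) hashes a length-prefixed concatenation of its inputs. Both are assumptions, because the post leaves H unspecified.

```python
import hashlib


def H(*parts: bytes) -> bytes:
    # SHA-256 over length-prefixed parts, so H(b"ab", b"c") != H(b"a", b"bc").
    h = hashlib.sha256()
    for part in parts:
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.digest()


def H_iter(n: int, *parts: bytes) -> bytes:
    # H^n: hash the inputs once, then re-hash the digest n - 1 more times.
    if n < 1:
        raise ValueError("n must be at least 1")
    digest = H(*parts)
    for _ in range(n - 1):
        digest = H(digest)
    return digest


k = H_iter(1000, b"salt", b"password")  # H^1000(s, password)
```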


Traditional encryption:

k = random()
c = E(k, p)

Traditional convergent encryption:

k = H(p)
c = E(k, p)
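The difference between the two schemes is easiest to see by running them. In this sketch, E is a toy stream cipher that XORs the plaintext with a SHA-256 keystream in counter mode. It was chosen only so the example needs nothing beyond the standard library. It is a stand-in, not a recommendation; a real system would use a vetted cipher such as AES.

```python
import hashlib
import os


def E(key: bytes, plaintext: bytes) -> bytes:
    # Toy cipher: XOR with a keystream of SHA-256(key || counter) blocks.
    # Deterministic in (key, plaintext); reusing one key for two different
    # plaintexts would leak their XOR. Applying E twice decrypts.
    out = bytearray()
    for offset in range(0, len(plaintext), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = plaintext[offset:offset + 32]
        out += bytes(a ^ b for a, b in zip(chunk, block))
    return bytes(out)


def traditional(p: bytes) -> bytes:
    k = os.urandom(32)              # k = random()
    return E(k, p)                  # c = E(k, p)


def convergent(p: bytes) -> bytes:
    k = hashlib.sha256(p).digest()  # k = H(p)
    return E(k, p)                  # c = E(k, p)


doc = b"the same file, uploaded by two different users"
print(traditional(doc) == traditional(doc))  # False: fresh random keys
print(convergent(doc) == convergent(doc))    # True: identical ciphertexts converge
```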

Tahoe-style convergent encryption with added secret (s);  s can  
be re-used for any number of files, but it is kept secret from  
everyone except those with whom you wish to converge storage.


s = random()
k = H(s, p)
c = E(k, p)
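Here is a sketch of the added-secret variant. It assumes H(s, p) is realized as HMAC-SHA256 keyed with s. That is one plausible instantiation; Tahoe's actual derivation is not reproduced here. Because c = E(k, p) is deterministic in k, two users converge on the same ciphertext exactly when they derive the same k, so comparing keys is enough.

```python
import hashlib
import hmac
import os


def convergence_key(secret: bytes, plaintext: bytes) -> bytes:
    # k = H(s, p), realized here as HMAC-SHA256 with the added secret as the HMAC key.
    return hmac.new(secret, plaintext, hashlib.sha256).digest()


family_secret = os.urandom(32)    # s = random(), shared only within one group
outsider_secret = os.urandom(32)  # a different group's s

doc = b"Dear customer, your PIN is 4721."
print(convergence_key(family_secret, doc) == convergence_key(family_secret, doc))    # True
print(convergence_key(family_secret, doc) == convergence_key(outsider_secret, doc))  # False
```

An attacker who does not know `family_secret` cannot run the guess-and-check loop against that group's files, because every candidate key requires s.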

PBKDF2 (simplified);  s can be re-used but is generally not, and it  
is not secret.


s = random()
k = H^1000(s, password)
c = E(k, p)
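Python's standard library exposes PBKDF2 as `hashlib.pbkdf2_hmac`, so the simplified line above can be written directly. The iteration count of 1000 mirrors the post. The password and the `derive_key` name are illustrative.

```python
import hashlib
import os


def derive_key(password: str, salt: bytes, iterations: int = 1000) -> bytes:
    # k = H^1000(s, password): PBKDF2-HMAC-SHA256 with a public, per-use salt.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


salt = os.urandom(16)  # s = random(); stored in the clear alongside the ciphertext
k = derive_key("correct horse battery staple", salt)
print(len(k))  # 32
```

Anyone holding `salt` and the password can recompute k. The salt only ensures that work spent guessing against one salt is useless against another.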

Now, one could imagine a variant of traditional convergent encryption  
which added key strengthening:


k = H^1000(p)
c = E(k, p)

This would have a performance impact on normal everyday use of Tahoe
without, in my current estimation, making a brute-force/dictionary
attack infeasible.

Re: [p2p-hackers] convergent encryption reconsidered

2008-03-31 Thread Victor Duchovni
On Sun, Mar 30, 2008 at 05:13:07PM -0400, Ivan Krstić wrote:

 That's a brute force search. If your convergence key, instead of being  
 a simple file hash, is obtained through a deterministic but  
 computationally expensive function such as PBKDF2 (or the OpenBSD  
 bcrypt, etc), then step 3 makes an exhaustive search prohibitive in  
 most cases while not interfering with normal filesystem operation.  
 What am I missing?

PBKDF2 is excellent for turning interactively typed pass-phrases into
keys. It is not entirely clear that it is a good fit for a filesystem.
Updating any single file is now a computationally intensive process; the
performance impact may be unacceptable. With PBKDF2 and the iteration
count set to the currently popular 1000, a 64K byte file will now trigger
~2 million sha1 compression function computations (if I remember the
sha1 block size correctly as 512 bits or 64 bytes).

A crude cost estimate on typical hardware (openssl speed):

Doing sha1 for 3s on 8192 size blocks: 57316 sha1's in 3.00s

Extrapolating from this, on 64K sized files, we get ~1200 HMAC operations
per second. If we iterate that 1000 times, we get 1.2 key derivations per
second. The throughput to disk is CPU bound at ~64KB/s, which is rather
poor.
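The extrapolation above can be re-run on other hardware with a rough timing harness. This sketch assumes, as the estimate does, that the entire file is used as the PBKDF2 password. `kdf_throughput` and its defaults are hypothetical. Implementations differ here. HMAC hashes an over-long key down once, so a PBKDF2 implementation that reuses that prepared key state need not rehash the whole file on every iteration. Measured numbers may therefore differ a lot from the crude estimate.

```python
import hashlib
import os
import time


def kdf_throughput(file_size: int = 64 * 1024, iterations: int = 1000,
                   trials: int = 5) -> float:
    """Return bytes/second when each file's key is PBKDF2-HMAC-SHA1(file contents, salt)."""
    data = os.urandom(file_size)  # stands in for the file contents
    salt = os.urandom(16)
    start = time.perf_counter()
    for _ in range(trials):
        hashlib.pbkdf2_hmac("sha1", data, salt, iterations)
    elapsed = time.perf_counter() - start
    return trials * file_size / elapsed


print(f"{kdf_throughput() / 1024:.0f} KB/s of key derivation")
```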

-- 
Viktor.

-
The Cryptography Mailing List
Unsubscribe by sending unsubscribe cryptography to [EMAIL PROTECTED]


Re: [tahoe-dev] convergent encryption reconsidered -- salting and key-strengthening

2008-03-31 Thread Ben Laurie

zooko wrote:

Think of it like this:

Passwords are susceptible to brute-force and/or dictionary attack.   
We can't, in general, prevent attackers from trying guesses at our  
passwords without also preventing users from using them, so instead  
we employ various techniques:


  * salts (to break up the space of targets into subspaces, of which  
at most one can be targeted by a given brute-force attack)
  * key strengthening (to increase by a constant factor the cost of  
checking a password)
  * rate-limits for on-line tries (i.e., you get only a small fixed  
number of wrong guesses in a row before you are locked out for a time- 
out period)


You forgot:

  * stronger passwords

Cheers,

Ben.

--
http://www.apache-ssl.org/ben.html   http://www.links.org/

There is no limit to what a man can do or how far he can go if he
doesn't mind who gets the credit. - Robert Woodruff



Re: convergent encryption reconsidered

2008-03-31 Thread Ludovic Courtès
Hi,

Sorry for arriving late to this thread...

zooko [EMAIL PROTECTED] writes:

The Learn-Partial-Information Attack

 They extended the confirmation-of-a-file attack into the
 learn-partial-information attack. In this new attack, the
 attacker learns some information from the file. This is done by
 trying possible values for unknown parts of a file and then
 checking whether the result matches the observed ciphertext.
 For example, if you store a document such as a form letter from
 your bank, which contains a few pages of boilerplate legal text
 plus a few important parts, such as your bank account number
 and password, then an attacker who knows the boilerplate might
 be able to learn your account number and password.

I don't see how this would work.  It's different from a dictionary
attack because it looks for partial matches, as opposed to exact
matches.

Suppose you have one (sensitive) file that contains
"boilerplate" + "secret" and another that contains
"boilerplate" + "placeholder".  They have different hashes, hence
different ciphertexts through convergent encryption.  How would one get
access to the plaintext of the former when knowing only the latter?

Now, let's assume that said files were split into two blocks before
being convergent-encrypted, namely "boilerplate" and "secret" for
the former, and "boilerplate" and "placeholder" for the latter.  The
confirmation-of-a-file (or rather confirmation-of-a-block) attack
does work, but it does not reveal anything about the secret.
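The two-block example can be reproduced with a per-block convergent key. The sketch assumes fixed-size blocks and SHA-256 block keys. `BLOCK_SIZE` and `block_keys` are illustrative names, not taken from the thesis or from Tahoe.

```python
import hashlib
from typing import List

BLOCK_SIZE = 16  # illustrative; real systems use far larger blocks


def block_keys(data: bytes) -> List[bytes]:
    # One convergent key per fixed-size block: k_i = H(block_i).
    return [hashlib.sha256(data[i:i + BLOCK_SIZE]).digest()
            for i in range(0, len(data), BLOCK_SIZE)]


boilerplate = b"BOILERPLATE-TEXT"  # exactly 16 bytes, so it fills block 0
sensitive = block_keys(boilerplate + b"secret")
public = block_keys(boilerplate + b"placeholder")
print(sensitive[0] == public[0])  # True: the shared boilerplate block converges
print(sensitive[1] == public[1])  # False: the differing blocks do not
```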


I'm not sure about Tahoe, but the scheme I had in mind in my thesis was
to allow anyone to choose whatever encoding is used [0].  This means
that one could choose the algorithm used to split input files into
blocks, whether to compress the input file or individual blocks, what
compression algorithm to use, what hash and cipher algorithm to use,
etc.  With that level of freedom, these two attacks are a lesser threat
(one might argue that, in practice, many people would use the default
settings, which would make them potential victims and attackers of each
other...).
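One way to picture this freedom is to fold the encoding choices into the key derivation, so that files stored under different settings never converge. This is a hypothetical sketch. The `Encoding` fields, and the way they are mixed into the hash, are assumptions for illustration, not the format described in the thesis.

```python
import hashlib
import zlib
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Encoding:
    block_size: int
    compress: bool
    hash_name: str  # any name accepted by hashlib.new, e.g. "sha256"


def encode_keys(data: bytes, enc: Encoding) -> List[bytes]:
    if enc.compress:
        data = zlib.compress(data)
    tag = repr(enc).encode()  # the chosen settings become part of every block key
    keys = []
    for i in range(0, len(data), enc.block_size):
        h = hashlib.new(enc.hash_name)
        h.update(tag + b"\x00" + data[i:i + enc.block_size])
        keys.append(h.digest())
    return keys


doc = b"boilerplate" * 100 + b"secret"
a = encode_keys(doc, Encoding(256, False, "sha256"))
b = encode_keys(doc, Encoding(256, False, "sha256"))
c = encode_keys(doc, Encoding(512, True, "sha512"))
print(a == b)           # True: same settings converge
print(set(a) & set(c))  # set(): different settings share no keys
```

As noted above, this only helps to the extent that users actually pick different settings. Everyone on the defaults still shares one key space.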

Thanks,
Ludovic.

[0] http://www.fdn.fr/~lcourtes/phd/phd-thesis.pdf, e.g., Section 4.3.



Re: [p2p-hackers] convergent encryption reconsidered

2008-03-31 Thread James A. Donald

Ivan Krstić wrote:

1. take partially known plaintext
2. make a guess, randomly or more intelligently where possible,
   about the unknown parts
3. take the current integrated partial+guessed plaintext, hash
   to obtain convergence key
4. verify whether that key exists in the storage index
5. if yes, you've found the full plaintext. if not, repeat from '2'.

That's a brute force search. If your convergence key, instead of being a 
simple file hash, is obtained through a deterministic but 
computationally expensive function such as PBKDF2 (or the OpenBSD 
bcrypt, etc), then step 3 makes an exhaustive search prohibitive in most 
cases while not interfering with normal filesystem operation. What am I 
missing?


Better still, have a limited supply of tickets that enable one to
construct the convergence key.  Enough tickets for all normal usage, but
not enough to perform an exhaustive search.


Assume a small set of ticket issuing computers hold a narrowly shared 
secret integer k.  Assume a widely shared elliptic curve with the 
generator G.


If h is the hash of the file, the convergence key is h*k*G.

If you give the ticket issuing computers an elliptic point P, they will
give you the corresponding elliptic point k*P.  If, however, you ask
for too many such points, they will stop responding.


Of course, this allows one to be attacked by anyone that holds the 
narrowly held key.
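The ticket idea can be prototyped in a few lines. This sketch replaces the post's elliptic-curve point multiplication with exponentiation in the multiplicative group modulo the prime 2**127 - 1, so h*k*G becomes G**(h*k) mod MODULUS. The parameters are toy values chosen for readability and are not secure. The `TicketServer` class and its quota are hypothetical.

```python
import hashlib
import secrets

MODULUS = 2**127 - 1  # Mersenne prime; toy group standing in for the elliptic curve
G = 3                 # toy base standing in for the generator point G


class TicketServer:
    """Hypothetical ticket issuer holding the narrowly shared secret k."""

    def __init__(self, quota: int) -> None:
        self._k = secrets.randbelow(MODULUS - 2) + 1  # secret exponent k in [1, MODULUS - 2]
        self._remaining = quota                        # tickets left before it stops responding

    def multiply(self, point: int) -> int:
        # Given point = G**h, return point**k = G**(h*k), the analogue of k*P.
        if self._remaining <= 0:
            raise PermissionError("ticket quota exhausted")
        self._remaining -= 1
        return pow(point, self._k, MODULUS)


def convergence_key(server: TicketServer, file_bytes: bytes) -> bytes:
    h = int.from_bytes(hashlib.sha256(file_bytes).digest(), "big")  # h = hash of the file
    shared = server.multiply(pow(G, h, MODULUS))                    # analogue of h*k*G
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()      # hash down to a symmetric key


server = TicketServer(quota=2)
print(convergence_key(server, b"same file") == convergence_key(server, b"same file"))  # True
try:
    convergence_key(server, b"a third request")
except PermissionError as exc:
    print(exc)  # ticket quota exhausted
```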




Re: [p2p-hackers] convergent encryption reconsidered -- salting and key-strengthening

2008-03-31 Thread Ivan Krstić

On Mar 30, 2008, at 9:37 PM, zooko wrote:

You can store your True Name, credit card number, bank
account number, mother's maiden name, and so forth, on the same
server as your password, but you don't have to worry about using
salts or key strengthening on those latter secrets, because the
server doesn't run a service that allows unauthenticated remote
people to connect, submit a guess as to their value, and receive
confirmation, the way it does for your password.


Tahoe doesn't run this service either. I can't use it to make guesses  
at any of the values you mentioned. I can use it to make guesses at  
whole documents incorporating such values, which is in most cases a  
highly non-trivial distinction.


To make such guesses, I need to account for at least:

- file formats, since an e-mail message has a different on-disk
  representation depending on the recipient's e-mail client,

- temporal and transport variance, as PDF documents generally
  incorporate a generation timestamp, and e-mail messages include
  routing headers (with timestamps!),

- document modifications due to variables other than the one(s) being
  guessed, e.g. names, e-mail addresses, customized unsubscribe links.
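The point is that every variable the attacker cannot pin down multiplies the search. A back-of-the-envelope helper makes that concrete. The candidate counts and the guess rate below are invented purely for illustration; they are not measurements.

```python
from math import prod
from typing import Dict


def search_cost(candidates: Dict[str, int], guesses_per_second: float) -> float:
    """Worst-case seconds to try every combination of the unknown variables."""
    return prod(candidates.values()) / guesses_per_second


unknowns = {
    "4-digit account PIN": 10**4,
    "e-mail client on-disk format": 5,                   # illustrative
    "PDF generation timestamp within one day": 86_400,   # seconds; illustrative
}
seconds = search_cost(unknowns, guesses_per_second=1e6)  # illustrative hash rate
print(f"{seconds:.0f} s, or {seconds / 3600:.1f} hours, in the worst case")
```

One more unguessed variable, such as a routing header with its own timestamp, multiplies the total again.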

I would be interested to see an actual real-world example of how a  
document would fall to this attack. It strikes me as a cute threat in  
theory, but uninteresting in practice.



 *** Convergent encryption exposes whatever data is put into it to
the sorts of attacks that already apply to passwords.


Sometimes, under highly peculiar circumstances, etc.


Convergent encryption had been invented, analyzed and used for many
years, but to the best of my knowledge the first time that anyone
noticed this issue was March 16 of this year


FWIW, I have discussed this threat verbally with colleagues when I was  
asked for possible designs for OLPC's server-based automatic backup  
system. I dismissed it at the time as 'not a real-world concern'. I  
might even have it in my notes, but those weren't published, so it's  
moot.



Now PBKDF2 is a combination of the first two defenses -- salting and
key strengthening.  When you first suggested PBKDF2, I -- and
apparently Jerry Leichter -- thought that you were suggesting its
salting feature as a solution.


Yeah, sorry, I wasn't being clear. I should've just said a key  
strengthening function rather than naming anything in particular.



This would have a performance impact on normal everyday use of Tahoe
without, in my current estimation, making a brute-force/dictionary
attack infeasible.


Adding, say, 5 seconds of computation to the time it takes to store a  
file is likely to be lost as noise in comparison with the actual  
network upload time, while still making an attacker's life  
_dramatically_ harder than now.
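The suggestion amounts to the strengthened convergent variant k = H^n(p), with n chosen so that one derivation costs a fixed time budget. Below is a sketch of that calibration. It assumes iterated SHA-256 as the strengthening function and uses a hypothetical `calibrate_iterations` helper. The budget is a parameter; the 5 seconds mentioned above would be `budget_s=5.0`.

```python
import hashlib
import time


def strengthened_key(plaintext: bytes, n: int) -> bytes:
    # k = H^n(p): hash the plaintext once, then re-hash the 32-byte digest n - 1 times.
    digest = hashlib.sha256(plaintext).digest()
    for _ in range(n - 1):
        digest = hashlib.sha256(digest).digest()
    return digest


def calibrate_iterations(budget_s: float, sample: int = 100_000) -> int:
    # Time a sample run of digest re-hashes and scale it to fill the budget.
    start = time.perf_counter()
    strengthened_key(b"calibration", sample)
    per_iteration = (time.perf_counter() - start) / sample
    return max(1, int(budget_s / per_iteration))


n = calibrate_iterations(budget_s=0.05)
key = strengthened_key(b"contents of the file being stored", n)
```

Only the short digest is iterated, so the added cost is roughly independent of file size. Convergence requires every uploader to derive the identical k, so n could not actually vary per machine. A calibration like this could at most inform a single system-wide constant.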



The trade-off is actually worse than it appears since the attacker is
attacking multiple users at once (in traditional convergent
encryption, he is attacking *all* users at once)


Again, is there a real-world example of the kind of data or documents  
that would show this to be a serious problem? While it's technically  
true that you're targeting all the users in parallel when brute  
forcing, note that if you're not actually hyper-targeting your attack,  
you need to brute force _all_ the variables I mention above in  
parallel, except in pathological cases -- and those, if you know of  
some, would be interesting for the discussion.



economy of scale, and can profitably invest in specialized tools,
even specialized hardware such as a COPACOBANA [1].


The OpenBSD eksblowfish/bcrypt design can't be bitsliced and generally  
doesn't lend itself well to large speedups in hardware, by design.


Cheers,

--
Ivan Krstić [EMAIL PROTECTED] | http://radian.org



Re: [p2p-hackers] convergent encryption reconsidered

2008-03-31 Thread Ivan Krstić

On Mar 31, 2008, at 6:44 AM, James A. Donald wrote:
Better still, have a limited supply of tickets that enable one to
construct the convergence key.  Enough tickets for all normal usage,
but not enough to perform an exhaustive search. [...]


If you give the ticket issuing computers an elliptic point P, they
will give you the corresponding elliptic point k*P.  If, however,
you ask for too many such points, they will stop responding.


This isn't a good design. It's incompatible with Tahoe's present  
architecture, introduces a single point of failure, centralizes the  
otherwise by-design decentralized filesystem, and presents a simple  
way to mount denial of service attacks. Finally, since the  
decentralization in Tahoe is part of its security design (storage  
servers aren't trusted), you run into the usual quis custodiet ipsos  
custodes problem with the ticket-issuing server that the present  
system nicely avoids.


Cheers,

--
Ivan Krstić [EMAIL PROTECTED] | http://radian.org



Re: how to read information from RFID equipped credit cards

2008-03-31 Thread Peter Gutmann
Ben Laurie [EMAIL PROTECTED] writes:

And so we end up at the position that we have ended up at so many times
before: the GTCYM has to have a decent processor, a keyboard and a screen,
and must be portable and secure.

One day we'll stop concluding this and actually do something about it.

Actually there are already companies doing something like this, but they've
run into a problem that no-one has ever considered so far: The GTCYM needs a
(relatively) high-bandwidth connection to a remote server, and there's no easy
way to do this.

(Hint: You can't use anything involving USB because many corporates lock down
USB ports to prevent data leaking onto other corporates' networks, or
conversely to prevent other corporates' data leaking onto their networks. Same
for Ethernet, Firewire, ...).

Peter.
