Re: [p2p-hackers] convergent encryption reconsidered -- salting and key-strengthening

2008-04-02 Thread zooko

On Mar 31, 2008, at 4:47 AM, Ivan Krstić wrote:


Tahoe doesn't run this service either. I can't use it to make guesses
at any of the values you mentioned. I can use it to make guesses at
whole documents incorporating such values, which is in most cases a
highly non-trivial distinction.


The way that I would phrase this is that convergent encryption  
exposes whatever data is put into it, in whatever batch-size is put  
into it, to brute-force/dictionary attacks.


If the data that you put in is unguessable, then you needn't worry  
about these attacks.  (Likewise, as Ben Laurie reminds us, using  
strong passwords is a sufficient defense against these attacks on  
passwords.)


You correctly emphasize that typical convergent encryption services  
(which operate on files, or, in the case of GNUnet, on 32 KiB  
blocks), and typical uses of those services (which typically store  
files as produced by apps written for traditional filesystems),  
batch together data in such a way that the aggregate is more likely  
to be unguessable than if each field were stored separately.  I don't  
disagree with this observation.


I am often reminded of Niels Ferguson's and Bruce Schneier's dictum,  
in the excellent _Practical_Cryptography_, that security needs to be  
a *local* property.  They argue that one should be able to tell  
whether a component is secure by inspecting that component itself,  
rather than by reasoning about interactions between that component  
and other components.


Concretely, convergent encryption with a per-user added secret, as  
currently implemented in Tahoe, can be shown to guarantee  
confidentiality of the data, regardless of what the data is.
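
For concreteness, here is a minimal Python sketch of convergence-key
derivation with a per-user added secret, in the spirit of the
symmetric_key = H(added_secret, plaintext) construction described at
the end of this digest; the hash choice and length-prefixed encoding
are illustrative assumptions, not Tahoe's actual format:

    import hashlib

    def convergence_key(added_secret: bytes, plaintext: bytes) -> bytes:
        # Length-prefix the secret so the (secret, plaintext)
        # boundary is unambiguous.
        data = len(added_secret).to_bytes(8, "big") + added_secret + plaintext
        return hashlib.sha256(data).digest()

    doc = b"Dear customer, your account number is ..."

    # Same secret, same file: keys converge, so de-duplication still
    # works within the set of people sharing the secret.
    assert convergence_key(b"alice", doc) == convergence_key(b"alice", doc)

    # Different secrets: unrelated keys, so a single dictionary attack
    # cannot target all users at once.
    assert convergence_key(b"alice", doc) != convergence_key(b"bob", doc)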


Traditional convergent encryption can be shown to offer  
confidentiality only with the proviso that the data put into it  
conform to certain criteria -- criteria that cannot be verified by a  
computer nor by a user who is not a skilled security expert.


You may argue that the chance that a user would put non-conformant  
data into it is small.  I don't necessarily disagree, although before  
I became willing to bet on it I would require more quantitative  
investigation.


However, arguing that component A is secure as long as component B  
behaves a certain way, and that component B is very likely to behave  
that way, is a different sort of argument than arguing that component  
A is secure regardless of the behavior of component B.


For one thing, the behavior of component B may change in the future.   
Concretely, people may write apps that store data in Tahoe in a way  
that previous apps didn't.  Those people will almost certainly be  
completely unaware of the nature of convergent encryption and brute-force/dictionary attacks.


Now obviously making the security properties of a system modular in  
this way might impose a performance cost.  In the case of Tahoe, that  
cost is the loss of universal convergence.  Allmydata.com analyzed  
the space savings due to convergence among our current customers and  
found them to be around 1%.  We (allmydata.com) intend to monitor  
the potential savings of universal convergence on an ongoing basis,  
and if it turns out that there are substantial benefits to be  
gained, then I will revisit this issue and perhaps be forced to  
rely on an argument of the other form -- that users are unlikely to  
use it in an unsafe way.


Thank you again for your thoughtful comments on this issue.

Regards,

Zooko O'Whielacronx



Re: [p2p-hackers] convergent encryption reconsidered

2008-03-31 Thread Victor Duchovni
On Sun, Mar 30, 2008 at 05:13:07PM -0400, Ivan Krstić wrote:

 That's a brute force search. If your convergence key, instead of being  
 a simple file hash, is obtained through a deterministic but  
 computationally expensive function such as PBKDF2 (or the OpenBSD  
 bcrypt, etc), then step 3 makes an exhaustive search prohibitive in  
 most cases while not interfering with normal filesystem operation.  
 What am I missing?

PBKDF2 is excellent for turning interactively typed pass-phrases into
keys. It is not entirely clear that it is a good fit for a filesystem.
Updating any single file is now a computationally intensive process, and
the performance impact may be unacceptable. With PBKDF2 and the iteration
count set to the currently popular value of 1000, a 64 KB file will
trigger ~2 million SHA-1 compression-function computations (if I remember
the SHA-1 block size correctly as 512 bits, i.e. 64 bytes).

A crude cost estimate on typical hardware (openssl speed):

Doing sha1 for 3s on 8192 size blocks: 57316 sha1's in 3.00s

Extrapolating from this, on 64 KB files we get ~1200 HMAC operations
per second. If we iterate that 1000 times, that leaves ~1.2 key
derivations per second. The throughput to disk is then CPU-bound at
~64 KB/s, which is rather poor.
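
The arithmetic behind this estimate can be reproduced directly; the
snippet below is a back-of-envelope check, assuming, as the figure
above seems to, an HMAC that re-processes the full 64 KB key on
every PBKDF2 iteration:

    # Back-of-envelope reproduction of the numbers above.
    file_size = 64 * 1024   # 64 KB file used as the PBKDF2 "password"
    sha1_block = 64         # SHA-1 compression-function block, in bytes
    iterations = 1000       # PBKDF2 iteration count

    # Inner and outer hash each process the 64 KB key per HMAC call.
    # (A careful HMAC pre-hashes keys longer than one block, so this
    # is an upper bound.)
    compressions_per_hmac = 2 * (file_size // sha1_block)   # 2048
    print(compressions_per_hmac * iterations)               # 2048000, ~2 million

    # "Doing sha1 for 3s on 8192 size blocks: 57316 sha1's in 3.00s"
    sha1_bytes_per_sec = 57316 * 8192 / 3.0                 # ~156 MB/s
    hmacs_per_sec = sha1_bytes_per_sec / (2 * file_size)    # ~1200
    print(hmacs_per_sec, hmacs_per_sec / iterations)        # ~1200, ~1.2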

-- 
Viktor.



Re: [p2p-hackers] convergent encryption reconsidered

2008-03-31 Thread James A. Donald

Ivan Krstić wrote:

1. take partially known plaintext
2. make a guess, randomly or more intelligently where possible,
   about the unknown parts
3. take the current integrated partial+guessed plaintext, hash
   to obtain convergence key
4. verify whether that key exists in the storage index
5. if yes, you've found the full plaintext. if not, repeat from '2'.

That's a brute force search. If your convergence key, instead of being a 
simple file hash, is obtained through a deterministic but 
computationally expensive function such as PBKDF2 (or the OpenBSD 
bcrypt, etc), then step 3 makes an exhaustive search prohibitive in most 
cases while not interfering with normal filesystem operation. What am I 
missing?


Better still, have a limited supply of tickets that enable one to 
construct the convergence key.  Enough tickets for all normal usage, but 
 not enough to perform an exhaustive search.


Assume a small set of ticket issuing computers hold a narrowly shared 
secret integer k.  Assume a widely shared elliptic curve with the 
generator G.


If h is the hash of the file, the convergence key is h*k*G.

If you give the ticket issuing computers an elliptic point P, they will 
 give you the corresponding elliptic point k*P.  If, however, you ask 
for too many such points, they will stop responding.


Of course, this allows one to be attacked by anyone who holds the 
narrowly shared key.
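
A toy sketch of this protocol flow, substituting exponentiation modulo
a prime for the elliptic-curve scalar multiplication purely to keep
the example dependency-free; the group parameters, ticket budget, and
hashing are illustrative assumptions:

    import hashlib

    P = 2**127 - 1    # Mersenne prime; multiplicative group mod P stands in
    G = 3             # for the elliptic curve with generator G
    K = 0x1337C0DE    # narrowly shared secret held by the ticket servers

    class TicketServer:
        def __init__(self, secret: int, max_tickets: int = 1000):
            self.secret, self.remaining = secret, max_tickets

        def issue(self, point: int) -> int:
            # Return "k*P" (here point**k mod P), refusing once the
            # ticket budget that bounds exhaustive search is spent.
            if self.remaining <= 0:
                raise RuntimeError("ticket limit reached")
            self.remaining -= 1
            return pow(point, self.secret, P)

    def convergence_key(server: TicketServer, plaintext: bytes) -> bytes:
        h = int.from_bytes(hashlib.sha256(plaintext).digest(), "big")
        h_G = pow(G, h, P)         # "h*G", computable by anyone
        h_k_G = server.issue(h_G)  # "h*k*G", obtainable only via a ticket
        return hashlib.sha256(h_k_G.to_bytes(16, "big")).digest()

    server = TicketServer(K)
    assert convergence_key(server, b"file") == convergence_key(server, b"file")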




Re: [p2p-hackers] convergent encryption reconsidered -- salting and key-strengthening

2008-03-31 Thread Ivan Krstić

On Mar 30, 2008, at 9:37 PM, zooko wrote:

You can store your True Name, credit card number, bank
account number, mother's maiden name, and so forth, on the same
server as your password, but you don't have to worry about using
salts or key strengthening on those latter secrets, because the
server doesn't run a service that allows unauthenticated remote
people to connect, submit a guess as to their value, and receive
confirmation, the way it does for your password.


Tahoe doesn't run this service either. I can't use it to make guesses  
at any of the values you mentioned. I can use it to make guesses at  
whole documents incorporating such values, which is in most cases a  
highly non-trivial distinction.


To make such guesses, I need to account for at least:

- file formats, since an e-mail message has a different on-disk
  representation depending on the recipient's e-mail client,

- temporal and transport variance, as PDF documents generally
  incorporate a generation timestamp, and e-mail messages include
  routing headers (with timestamps!),

- document modifications due to variables other than the one(s) being
  guessed, e.g. names, e-mail addresses, customized unsubscribe links.

I would be interested to see an actual real-world example of how a  
document would fall to this attack. It strikes me as a cute threat in  
theory, but uninteresting in practice.



 *** Convergent encryption exposes whatever data is put into it to
the sorts of attacks that already apply to passwords.


Sometimes, under highly peculiar circumstances, etc.


Convergent encryption had been invented, analyzed and used for many
years, but to the best of my knowledge the first time that anyone
noticed this issue was March 16 of this year


FWIW, I have discussed this threat verbally with colleagues when I was  
asked for possible designs for OLPC's server-based automatic backup  
system. I dismissed it at the time as 'not a real-world concern'. I  
might even have it in my notes, but those weren't published, so it's  
moot.



Now PBKDF2 is a combination of the first two defenses -- salting and
key strengthening.  When you first suggested PBKDF2, I -- and
apparently Jerry Leichter -- thought that you were suggesting its
salting feature as a solution.


Yeah, sorry, I wasn't being clear. I should've just said "a key-  
strengthening function" rather than naming anything in particular.



This would have a performance impact on normal everyday use of Tahoe
without, in my current estimation, making a brute-force/dictionary
attack infeasible.


Adding, say, 5 seconds of computation to the time it takes to store a  
file is likely to be lost as noise in comparison with the actual  
network upload time, while still making an attacker's life  
_dramatically_ harder than now.
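
As a sketch of what such key strengthening might look like (assuming
Python's hashlib.pbkdf2_hmac, an arbitrary iteration count, and a
fixed, public salt so that identical files still converge):

    import hashlib

    PUBLIC_SALT = b"convergence-strengthening-v1"  # fixed: identical files converge
    ITERATIONS = 250_000  # tuned so each derivation costs seconds, not microseconds

    def strengthened_convergence_key(plaintext: bytes) -> bytes:
        # Hash the file once, then pay the strengthening cost on the
        # 32-byte digest rather than on the full file contents.
        file_hash = hashlib.sha256(plaintext).digest()
        return hashlib.pbkdf2_hmac("sha256", file_hash, PUBLIC_SALT, ITERATIONS)

Deriving from the file's hash rather than from the raw contents also
keeps the strengthening cost independent of file size, which bears on
Viktor Duchovni's throughput estimate elsewhere in this thread.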



The trade-off is actually worse than it appears since the attacker is
attacking multiple users at once (in traditional convergent
encryption, he is attacking *all* users at once)


Again, is there a real-world example of the kind of data or documents  
that would show this to be a serious problem? While it's technically  
true that you're targeting all the users in parallel when brute  
forcing, note that if you're not actually hyper-targeting your attack,  
you need to brute force _all_ the variables I mention above in  
parallel, except in pathological cases -- and those, if you know of  
some, would be interesting for the discussion.



economy of scale, and can profitably invest in specialized tools,
even specialized hardware such as a COPACOBANA [1].


The OpenBSD eksblowfish/bcrypt design can't be bitsliced and generally  
doesn't lend itself well to large speedups in hardware, by design.


Cheers,

--
Ivan Krstić [EMAIL PROTECTED] | http://radian.org



Re: [p2p-hackers] convergent encryption reconsidered

2008-03-31 Thread Ivan Krstić

On Mar 31, 2008, at 6:44 AM, James A. Donald wrote:
Better still, have a limited supply of tickets that enable one to  
construct the convergence key.  Enough tickets for all normal usage,  
but  not enough to perform an exhaustive search. [...]


If you give the ticket issuing computers an elliptic point P, they  
will  give you the corresponding elliptic point k*P.  If, however,  
you ask for too many such points, they will stop responding.


This isn't a good design. It's incompatible with Tahoe's present  
architecture, introduces a single point of failure, centralizes the  
otherwise by-design decentralized filesystem, and presents a simple  
way to mount denial of service attacks. Finally, since the  
decentralization in Tahoe is part of its security design (storage  
servers aren't trusted), you run into the usual quis custodiet ipsos  
custodes problem with the ticket-issuing server that the present  
system nicely avoids.


Cheers,

--
Ivan Krstić [EMAIL PROTECTED] | http://radian.org



Re: [p2p-hackers] convergent encryption reconsidered

2008-03-30 Thread Ivan Krstić

On Mar 20, 2008, at 3:42 PM, zooko wrote:

   They extended the confirmation-of-a-file attack into the
   learn-partial-information attack. In this new attack, the
   attacker learns some information from the file. This is done by
   trying possible values for unknown parts of a file and then
   checking whether the result matches the observed ciphertext.


How is this conceptually different from classic dictionary attacks,  
and why does e.g. running the file through PBKDF2 and using the result  
for convergence not address your concern(s)?


--
Ivan Krstić [EMAIL PROTECTED] | http://radian.org



Re: [p2p-hackers] convergent encryption reconsidered

2008-03-30 Thread Leichter, Jerry
| They extended the confirmation-of-a-file attack into the
| learn-partial-information attack. In this new attack, the
| attacker learns some information from the file. This is done by
| trying possible values for unknown parts of a file and then
| checking whether the result matches the observed ciphertext.
| 
| How is this conceptually different from classic dictionary attacks,
| and why does e.g. running the file through PBKDF2 and using the result
| for convergence not address your concern(s)?
How would that help?

Both the ability of convergent encryption to eliminate duplicates,
and this attack, depend on there being a deterministic algorithm
that computes a key from the file contents.  Sure, if you use a
different salt for each file, the attack goes away - but so does
the de-duplication.  If you don't care about de-duplication, there
are simpler, cheaper ways to choose a key.
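
A short illustration of the trade-off (plain hashlib; the salts are
invented for illustration): a per-file random salt destroys the very
property convergent encryption exists for, while a fixed salt
preserves it and adds only constant work per guess.

    import hashlib, os

    doc = b"identical file stored by two different users"

    # Per-file random salt: keys diverge, so the storage layer can no
    # longer tell that the two uploads are the same file.
    key_a = hashlib.sha256(os.urandom(16) + doc).digest()
    key_b = hashlib.sha256(os.urandom(16) + doc).digest()
    assert key_a != key_b  # de-duplication is lost

    # Fixed public salt: keys converge again, but a guessing attacker
    # pays only one extra hash per trial, so the attack is unaffected.
    SALT = b"fixed-public-salt"
    key_c = hashlib.sha256(SALT + doc).digest()
    key_d = hashlib.sha256(SALT + doc).digest()
    assert key_c == key_d  # de-duplication preserved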
-- Jerry

| --
| Ivan Krstić [EMAIL PROTECTED] | http://radian.org

Re: [p2p-hackers] convergent encryption reconsidered

2008-03-30 Thread Ivan Krstić

On Mar 30, 2008, at 3:12 PM, Leichter, Jerry wrote:

How would that help?


Unless I'm misunderstanding Zooko's writeup, he's worried about an  
attacker going from a partially-known plaintext (e.g. a form bank  
letter) to a completely-known plaintext by repeating the following  
process:


1. take partially known plaintext
2. make a guess, randomly or more intelligently where possible,
   about the unknown parts
3. take the current integrated partial+guessed plaintext, hash
   to obtain convergence key
4. verify whether that key exists in the storage index
5. if yes, you've found the full plaintext. if not, repeat from '2'.

That's a brute force search. If your convergence key, instead of being  
a simple file hash, is obtained through a deterministic but  
computationally expensive function such as PBKDF2 (or the OpenBSD  
bcrypt, etc), then step 3 makes an exhaustive search prohibitive in  
most cases while not interfering with normal filesystem operation.  
What am I missing?
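
Concretely, the loop above might look like the following sketch, with
an invented document template and a four-digit PIN as the unknown
field (pure hashlib; storage index and template are illustrative):

    import hashlib

    def convergence_key(plaintext: bytes) -> bytes:
        return hashlib.sha256(plaintext).digest()

    # The attacker's view: convergence keys observed in the storage index.
    template = b"Dear customer, your PIN is %04d. Sincerely, the bank."
    storage_index = {convergence_key(template % 4271)}  # victim's document

    # Steps 2-5: enumerate the unknown field, hash, test for membership.
    def brute_force_pin():
        for pin in range(10_000):
            if convergence_key(template % pin) in storage_index:
                return pin  # full plaintext recovered
        return None

    print(brute_force_pin())  # -> 4271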


Cheers,

--
Ivan Krstić [EMAIL PROTECTED] | http://radian.org



Re: [p2p-hackers] convergent encryption reconsidered

2008-03-26 Thread zooko

Jim:

Thanks for your detailed response on the convergent encryption issue.

In this post, I'll just focus on one very interesting question that  
you raise: "When do either of these attacks on convergent encryption  
apply?"


In my original note I was thinking about the allmydata.org Tahoe  
Least Authority Filesystem.  In this post I will attempt to follow  
your lead in widening the scope.  In particular GNUnet and Freenet  
are currently active projects that use convergent encryption.  The  
learn-partial-information attack would apply to either system if a  
user were using it with files that she intended not to divulge, but  
that were susceptible to being brute-forced in this way by an attacker.



On Mar 20, 2008, at 10:56 PM, Jim McCoy wrote:


On Mar 20, 2008, at 12:42 PM, zooko wrote:


  Security engineers have always appreciated that convergent
  encryption allows an attacker to perform a
  confirmation-of-a-file attack -- if the attacker already knows
  the full plaintext of a file, then they can check whether a
  given user has a copy of that file.


The truth of this depends on implementation details, and is an
assertion that cannot be said to cover all or even most of the
potential use-cases for this technique.


You're right.  I was writing the above in the context of Tahoe,  
where, as Brian Warner explained, we do not attempt to hide the  
linkage between users and ciphertexts.  What I wrote above doesn't  
apply in the general case.


However, there is a very general argument about the applicability of  
these attacks, which is: "Why encrypt?"


If your system has strong anonymity properties, preventing people  
from learning which files are associated with which users, then you  
can just store the files in plaintext.


Ah, but of course you don't want to do that, because even without  
being linked to users, files may contain sensitive information that  
the users didn't intend to disclose.  But if the files contain such  
information, then it might be acquired by the learn-partial-information attack.


When designing such a system, you should ask yourself "Why  
encrypt?"  You encrypt in order to conceal the plaintext from  
someone, but if you use convergent encryption, and they can use the  
learn-partial-information attack, then you fail to conceal the  
plaintext from them.


You should use traditional convergent encryption (without an added  
secret) if:


1.  You want to encrypt the plaintext, and
2.  You want convergence, and
3.  You don't mind exposing the existence of that file (ignoring the  
confirmation-of-a-file attack), and
4.  You are willing to bet that the file has entropy from the  
attacker's perspective which is greater than his computational  
capacity (defeating the learn-partial-information attack).


You should use convergent encryption with an added secret (as  
recently implemented for the Tahoe Least Authority Filesystem) if:


1.  You want to encrypt the plaintext, and
2.  You want convergence within the set of people who know the added  
secret, and
3.  You don't mind exposing the existence of that file to people in  
that set, and
4.  You are willing to disclose the file to everyone in that set, or  
else you think that people in that set to whom you do not wish to  
disclose the file will not try the learn-partial-information attack,  
or if they do that the file has entropy from their perspective which  
is greater than their computational capacity.


I guess the property of unlinkability between user and file addresses  
issue 3 in the above list -- the existence of a file is a much less  
sensitive bit of information than the existence of a file in a  
particular user's collection.


It could also affect issue 4 by increasing the entropy the file has  
from an attacker's perspective.  If he knows that the ciphertext  
belongs to you then he can try filling in the fields with information  
that he knows about you.  Without that linkage, he has to try filling  
in the fields with information selected from what he knows about all  
users.  But hiding this linkage doesn't actually help in the case the  
attacker is already using everything he knows about all users to  
attack all files in parallel.


Note that using an added secret does help in the parallel attack  
case, because (just like salting passwords) it breaks the space of  
targets up into separate spaces which can't all be attacked with the  
same computation.





Fwd: [tahoe-dev] [p2p-hackers] convergent encryption reconsidered

2008-03-21 Thread zooko

Dear Perry Metzger:

Jim McCoy asked me to forward this, as he is not subscribed to  
cryptography@metzdowd.com, so his posting bounced.


Regards,

Zooko


Begin forwarded message:


From: Jim McCoy [EMAIL PROTECTED]
Date: March 20, 2008 10:56:58 PM MDT
To: theory and practice of decentralized computer networks p2p-[EMAIL PROTECTED]

Cc: [EMAIL PROTECTED], Cryptography cryptography@metzdowd.com
Subject: Re: [tahoe-dev] [p2p-hackers] convergent encryption  
reconsidered

Reply-To: [EMAIL PROTECTED]


On Mar 20, 2008, at 12:42 PM, zooko wrote:


  Security engineers have always appreciated that convergent
  encryption allows an attacker to perform a
  confirmation-of-a-file attack -- if the attacker already knows
  the full plaintext of a file, then they can check whether a
  given user has a copy of that file.


The truth of this depends on implementation details, and is an
assertion that cannot be said to cover all or even most of the
potential use-cases for this technique.  This property only holds if
it is possible for the attacker to link a selected ciphertext/file to
a user.  Systems which use convergent encryption to populate a shared
storage pool _might_ have this property, but it is by no means a
certainty; if a system is implemented correctly it is not necessary
for users to expose their list of files in order to maintain this
shared storage space.


  basically people can tell which files you are storing if they
  are publicly known files, but they can't learn anything
  about your own personal files.


It sounds like you have a design problem.  If nodes that participate
in the system can distinguish between publication and _re_-publication/replication (or whatever you want to call the random sharing of
arbitrary data blocks for the purposes of increasing file
availability) then you have a problem.  If these two activities are
indistinguishable then an observer knows you have some blocks of a
file but should not be able to distinguish between you publishing the
blocks and the act of re-distribution to increase block availability.


 The Learn-Partial-Information Attack [...]


A better title for this would be "Chosen-Plaintext Attack on
Convergent Encryption", since what you are talking about is really a
chosen plaintext attack.  To be a bit more specific, this is really
just a version of a standard dictionary attack.  The solution to this
problem is to look at similar systems that suffered from dictionary
attacks and see what solutions were created to solve the problem.

The most widely known and studied version of this is the old crypt()/
passwd problem.


  For another example, if you use Tahoe to backup your entire
  home directory, or your entire filesystem, then the attacker
  gains the opportunity to try to learn partial information about
  various files which are of predictable format but have
  sensitive fields in them, such as .my.cnf (MySQL configuration
  files), .htpasswd, .cvspass, .netrc, web browser cookie files,
  etc..


The problems with this imagined attack are twofold.  I will use your
Tahoe example for my explanations because I have a passing familiarity
with the architecture.  The first problem is isolating the original
ciphertext in the pool of storage.  If a file is encrypted using
convergent encryption and then run through an error-correction
mechanism to generate a number of shares that make up the file, an
attacker first needs to be able to isolate these shares to generate
the original ciphertext.  FEC decoding speeds may be reasonably fast,
but they are not without some cost.  If the storage pool is
sufficiently large and you are doing your job to limit the ability of
an attacker to see which blocks are linked to the same FEC operation
then the computational complexity of this attack is significantly
higher than you suggest.

Assuming an all-seeing oracle who can watch every bit sent into the
storage pool will get us around this first problem, but it does raise
the bar for potential attackers.

The second problem an attacker now faces is deciding what sort of
format a file might have, what the low-entropy content might be, and
then filling in values for these unknowns.  If your block size is
small (and I mean really small in the context of the sort of systems
we are talking about) there might be only a few kilobits of entropy in
the first couple of blocks of a file so either a rainbow-table attack
on known file formats or a dedicated effort to grab a specific file
might be possible, but this is by no means certain.  Increase your
block size and this problem becomes much harder for the attacker.


 Defense Against Both Attacks

  [...]
  However, we can do better than that by creating a secret value
  and mixing that value into the per-file encryption key (so
  instead of symmetric_key = H(plaintext), you have symmetric_key
  = H(added_secret, plaintext), where "," denotes an unambiguous
  encoding of both operands). This idea is due to Brian Warner
  and Drew Perttula.
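
To see why the encoding must be unambiguous, note that naive
concatenation lets two distinct (added_secret, plaintext) pairs
collide; a minimal demonstration (plain hashlib, illustrative only):

    import hashlib

    def naive_key(secret: bytes, plaintext: bytes) -> bytes:
        # BROKEN: plain concatenation loses the operand boundary.
        return hashlib.sha256(secret + plaintext).digest()

    # Two different (secret, plaintext) pairs yield the same key:
    assert naive_key(b"ab", b"c") == naive_key(b"a", b"bc")

    def unambiguous_key(secret: bytes, plaintext: bytes) -> bytes:
        # Length-prefixing the secret restores the boundary.
        prefix = len(secret).to_bytes(8, "big")
        return hashlib.sha256(prefix + secret + plaintext).digest()

    assert unambiguous_key(b"ab", b"c") != unambiguous_key(b"a", b"bc")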