Re: Comments/summary on unicity discussion

2003-03-09 Thread Joseph Ashwood
----- Original Message -----
From: Joshua Hill [EMAIL PROTECTED]
Subject: Re: Comments/summary on unicity discussion


 It doesn't deal with plaintext, just ciphertext.  In fact, unicity
 distance is only valid for a ciphertext-only attack.  Once you get a
 known plaintext/ciphertext pair, a high unicity distance works against
 you (more on this later).  In addition, it isn't certain that, after
 observing the requisite number of ciphertext units (the unicity
 distance), you can uniquely determine the key; it is merely very likely.

There appears to be an error in there. The unicity distance has a very
strong correlation with the uncertainty of the plaintext (entropy per
message). Having access to a plaintext/ciphertext pair (often it takes
multiple pairs) removes all uncertainty as to the plaintext, and that
changes the unicity distance calculation by making the unicity distance
as short as possible. That makes "Once you get a known
plaintext/ciphertext pair, a high unicity distance works against you"
seem more than a little odd as a statement.
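
To put rough numbers on it, here is a quick Python sketch; the 128-bit
key, the 26-letter alphabet, and the 1.2 bits/letter figure for English
are illustrative assumptions, not anything from the thread:

  # Known plaintext drives the per-letter message entropy H(M) to zero,
  # which maximizes the redundancy |M| - H(M) and so minimizes
  # n = H(K) / (|M| - H(M)).  All figures here are assumed.
  from math import log2

  H_K = 128.0        # assumed key entropy, in bits
  M_max = log2(26)   # max entropy per letter, 26-letter alphabet (~4.7 bits)

  def unicity(H_M):
      """Unicity distance, in letters, for per-letter plaintext entropy H_M."""
      return H_K / (M_max - H_M)

  print(unicity(1.2))  # unknown English plaintext: ~37 letters
  print(unicity(0.0))  # known plaintext: ~27 letters, the smallest n can get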

On K complexity: while K complexity offers a convenient, if somewhat
inaccurate, upper bound on the entropy, that is basically where the
relationship ends. Permit me to give the basic example. Which of these
strings has higher entropy:
kevsnblawtrlnbatkb
kevsnblawtrlnbatkb
One was created by slapping my hands on the keyboard, and so contains some
entropy; the other was created through copy and paste, and so contains none.
However, the K complexity of the two is identical. The portion of the
equation you are forgetting is that the key to the pRNG may itself be
compressible. This leads to somewhat of a logic loop, but at the end of it
is the absolute smallest representation, as a compression relative to a
given language (the only sense in which the notion makes sense).
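
A crude way to see the point in code, using zlib's output length as a
stand-in upper bound for K complexity (a sketch, not a real K-complexity
measurement):

  # Two byte-identical strings necessarily have identical K complexity,
  # no matter how each copy was produced.
  import zlib

  typed  = b"kevsnblawtrlnbatkb"  # imagine this came from keyboard mashing
  pasted = b"kevsnblawtrlnbatkb"  # ...and this from copy and paste

  print(len(zlib.compress(typed)))   # identical lengths...
  print(len(zlib.compress(pasted)))
  print(zlib.compress(typed) == zlib.compress(pasted))  # ...and output: True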
Joseph Ashwood


Trust Laboratories
http://www.trustlaboratories.com


-
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]


Re: Comments/summary on unicity discussion

2003-03-07 Thread Joshua Hill
I'll hop in here, and see if I can give this a swing.  IANAC (I am not
a cryptographer), but I do have access to one at work, and I did make
use of him in gaining an understanding of this area.

Some of these points are minutiae, but they are important, both because
they are common errors and because they really help bring everything together.

In general, it appears that you mixed up your labels in the reference list.
[Sha49] is, indeed, the reference that you want, but that paper should
be listed as "Communication Theory of Secrecy Systems" (published in 1949).

On Mon, Mar 03, 2003 at 01:28:54PM -0800, Ed Gerck wrote:
 1. WHAT IS UNICITY?
 There are three different contexts in which to answer this question!
 
 1.a. Unicity Definition: Shannon [Sha49, page 693] defined unicity
 distance (hereafter, n) as the least amount of plaintext which can be
 uniquely deciphered from the corresponding ciphertext, allowing one to
 determine, without doubt, the key that was used for encryption. 

It doesn't deal with plaintext, just ciphertext.  In fact, unicity
distance is only valid for a ciphertext-only attack.  Once you get a
known plaintext/ciphertext pair, a high unicity distance works against
you (more on this later).  In addition, it isn't certain that, after
observing the requisite number of ciphertext units (the unicity
distance), you can uniquely determine the key; it is merely very likely.

So, the definition should be something more like:
1.a. Unicity Definition: Shannon [Sha49, page 693] defined unicity
distance (hereafter, n) as the least amount of ciphertext which would
reduce the likely number of spurious keys (equivocations) in a random
cipher to zero.
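
Here's a small Python sketch of that reading; the expected-spurious-keys
approximation 2^(H(K) - nD) - 1 is the standard random-cipher estimate,
and the key-entropy and redundancy figures are assumptions for
illustration:

  H_K = 128.0  # assumed key entropy, bits
  D   = 3.5    # assumed redundancy, bits per letter (roughly lg(26) - 1.2)

  def spurious_keys(n):
      """Expected spurious keys left after n letters of ciphertext."""
      return 2 ** (H_K - n * D) - 1

  print(spurious_keys(10))       # ~2^93 wrong keys still consistent
  print(spurious_keys(H_K / D))  # 0.0 -- the unicity distance, ~37 letters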

 1.b. Unicity Model: As first given by Shannon [Sha49] under some restrictive
 assumptions, especially the random cipher assumption, the mathematical
 expression for unicity can be cast in the following unfolded expression
 (his original expression was  n = H(K)/D, where D is the redundancy):
 
 n = H(K)/[|M| - H(M)]

I don't see this.  I do see 
  D_N = log(G) - H(M)
Where D_N is the redundancy of the language, G is the total number of
messages in the language, and H(M) is the entropy of a message of the
language.  Shannon uses log base 10, but we like to talk about 'bits'
of entropy in crypto, so we tend to use log base 2 (often written lg x).

  D = D_N / N

Where D is the redundancy per unit (character/bit/whatever), D_N
is the redundancy of the language, and N is the average number of units
per message.

And finally, the unicity distance is:
  n = H(K) / D

Where n is the unicity distance (expressed in units), H(K) is the
amount of entropy of a key (also in units) for the system, and D is the
redundancy per unit.

So, pulling it together
 n = H(K) * N / (lg(G) - H(M))

(so, it would appear that your equation was off by a factor of the
average length of the message)

But truly, I think that it makes the most sense as just
 n = H(K) / D
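
Chaining those three equations together with some assumed figures
(20-letter messages over a 26-letter alphabet, ~1.2 bits of entropy per
letter, a 128-bit key):

  from math import log2

  N   = 20.0           # assumed average message length, letters
  G   = 26 ** 20       # the "language": all 20-letter strings
  H_M = 1.2 * N        # message entropy at ~1.2 bits per letter
  H_K = 128.0          # assumed key entropy, bits

  D_N = log2(G) - H_M  # redundancy of the language: 70 bits per message
  D   = D_N / N        # redundancy per letter: lg(26) - 1.2, ~3.5 bits
  n   = H_K / D        # unicity distance: ~37 letters
  print(D_N, D, n)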


As an aside, you define:
 H(K) = entropy of keys used in encryption
[...]
 H(M) = entropy of actual message, the plaintext

This seems to imply that a particular message has entropy.  This is
incorrect.  A random variable has entropy, based on the statistical
properties of the variable ('variable' in mathspeak, not 'variable' in
the sense of programming, which has an actual value as the program is
executing; this is a random variable as in X, where X assumes the values
x_1, x_2, ..., x_n with probabilities p_1, p_2, ..., p_n).
A particular value doesn't really have 'entropy', outside the context
of the system that created the value.
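
A tiny sketch of the distinction, with made-up distributions:

  # Entropy is a property of a distribution, not of one outcome:
  # H(X) = -sum(p_i * lg(p_i)) over the values X can assume.
  from math import log2

  def entropy(probs):
      return -sum(p * log2(p) for p in probs if p > 0)

  print(entropy([0.5, 0.5]))  # a fair coin: 1 bit
  print(entropy([0.9, 0.1]))  # a biased coin: ~0.47 bits
  # The particular outcome 'heads' has no entropy; only the coin does.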

Now, having said that, strings (values, messages, etc.) do have
something called Kolmogorov Complexity.  Kolmogorov Complexity is
the size of the smallest program that can produce a particular string.
For a string, the highest Kolmogorov Complexity it can have is the size
of the string.  The lowest would be a very small program that produces
a very large string.

As you might expect, entropy and Kolmogorov Complexity are related.
You would expect a message from a high entropy system to have a high
Kolmogorov Complexity (in fact, the Kolmogorov Complexity should
be comparable to the entropy of the system).  Further, you get some
intuitively nice results where the Kolmogorov Complexity of any output
of a PRNG is at most the size of the PRNG state, plus a bit for the PRNG
algorithm, which agrees nicely with the entropy of the system, which is
(at best) size of the PRNG state.
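
For instance (a sketch using Python's built-in Mersenne Twister, 3.9+,
as the PRNG):

  # A short description -- algorithm plus seed -- reproduces an arbitrarily
  # long PRNG output, so the output's K complexity is bounded by roughly
  # |seed| + |algorithm|, however much output you draw.
  import random

  def stream(seed, nbytes):
      return random.Random(seed).randbytes(nbytes)

  megabyte = stream(42, 1_000_000)
  # The one-liner stream(42, 1_000_000) regenerates all of it exactly:
  assert stream(42, 1_000_000) == megabyte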


 NOTE 1: The model for unicity has no probability error with a tail
 to infinity because only entropy values are used in the formula of n
 and by *definition* of  entropy the entropy is already a limit to
 infinity.

I don't understand what this note means.

 NOTE 2: It does not matter how the attacker may try to decipher
 the message. The attacker can of course use brute-force and try
 out all keys or he can use short-cuts, it is his choice and he is entirely
 free to use any method he desires.

Comments/summary on unicity discussion

2003-03-05 Thread Ed Gerck
List:

The recent thread on "AES-128 keys unique for fixed plaintext/ciphertext
pair" included a discussion on unicity, with some broken dialogues. I wrote
up a summary that I'm sending to this list as a possible seed for further
comments. I apologize for any mistakes or imprecision, as I'm not
trying to be as exact as possible -- just sufficiently exact for the purpose
at hand. I also provide below the online references for Shannon's works
[Sha48, Sha49] that are important to this discussion.

The AES thread discussion is NOT included here.

1. WHAT IS UNICITY?
There are three different contexts in which to answer this question!

1.a. Unicity Definition: Shannon [Sha49, page 693] defined unicity
distance (hereafter, n) as the least amount of plaintext which can be
uniquely deciphered from the corresponding ciphertext, allowing one to
determine, without doubt, the key that was used for encryption. The
amount of plaintext (i.e., n) can be measured in any units the user
may find convenient, such as bits, bytes, letters, symbols, etc. Actually,
Shannon used letters in his paper.

NOTE 1: This is a definition. There is no proof involved here.

1.b. Unicity Model: As first given by Shannon [Sha49] under some restrictive
assumptions, especially the random cipher assumption, the mathematical
expression for unicity can be cast in the following unfolded expression
(his original expression was  n = H(K)/D, where D is the redundancy):

n = H(K)/[|M| - H(M)]

where the quantities are:

n = unicity; least message length that can be uniquely deciphered
H(K) = entropy of keys used in encryption
|M| = maximum possible entropy for the plaintext
H(M) = entropy of actual message, the plaintext

and the entropies are calculated according to the desired units (bits,
bytes, letters, symbols, etc.), which also define the unit for n.
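
For a concrete instance, here is the formula evaluated in Python with
rough textbook figures for a monoalphabetic substitution cipher on
English text (the 1.2 bits/letter value is the estimate discussed in
NOTE 1 below):

  from math import log2, factorial

  H_K   = log2(factorial(26))  # 26! equally likely keys: ~88.4 bits
  M_max = log2(26)             # |M|: ~4.7 bits per letter
  H_M   = 1.2                  # per-letter entropy of English (estimate)

  n = H_K / (M_max - H_M)      # ~25 letters, the classic textbook value
  print(n)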

NOTE 1: The model for unicity has no probability error with a tail
to infinity because only entropy values are used in the formula of n
and by *definition* of  entropy the entropy is already a limit to
infinity.

NOTE 2: It does not matter how the attacker may try to decipher
the message. The attacker can of course use brute-force and try
out all keys or he can use short-cuts, it is his choice and he is entirely
free to use any method he desires.  The work involved may be small,
quite large or even unbounded -- the amount of work is actually
unspecified.

NOTE 3: Shannon's definition of random cipher was that all
decipherments must produce a random flat distribution over all
bits in the plaintext space.

1.c. Unicity Value:  The numerical value of n. It is important not to
confuse a model with a measurement. Models predict measurements,
and do so within an error range. What is the error range for
measuring n?

First, note that the model works for any ciphertext, any plaintext.
And for any such pair, the result n is predicted by the model
even if an attacker has unbounded resources, including infinite time.
The value of n depends on the maximum possible entropy for the
plaintext, the plaintext entropy, the entropy of the keys, and the
assumption that the cipher is a random cipher. Since all good
ciphers should approximate a random cipher, for those ciphers the model
provides a good approximation to what n actually is. The practical
difficulty of reliably estimating the plaintext entropy and even the key
entropy (whose errors contribute to an error in n) has nothing to
do with the model itself or its error for n, but with the errors
in the quantities on which it depends -- however, it's not so
hard to obtain good estimates and several are well-known.

NOTE 1: Estimating the entropy of English (and other languages)
has been the subject of considerable study. Various authors have
measured H(M) for English texts and found values that lie between
1.0 and 1.5 bits per character. The standard value quoted is 1.2, close
to the average of the extreme values. Even though each author has a
different text, different preferred words, and different style
preferences, we all come pretty close to the entropy value of 1.2.
However, XML text (which is in English) is more redundant than natural
English and should have a lower entropy. On the other hand, English text
that is sent by SMS on cell phones has messages such as "Chk tat 4 u 2",
where the redundancy is reduced and the entropy should be higher.
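
One reason such measurements take real work: a naive single-letter
frequency count, as in the sketch below, lands near 4 bits per character;
the 1.0-1.5 range only emerges once inter-letter structure (digrams,
words, grammar) is taken into account:

  from collections import Counter
  from math import log2

  def letter_entropy(text):
      """Zeroth-order (single-letter frequency) entropy, bits per letter."""
      letters = [c for c in text.lower() if c.isalpha()]
      counts = Counter(letters)
      total = len(letters)
      return -sum((k / total) * log2(k / total) for k in counts.values())

  sample = "the quick brown fox jumps over the lazy dog " * 50
  print(letter_entropy(sample))  # ~4.5 on this toy sample; ~4.1 on real corpora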

NOTE 2: The benefit of compression is to increase unicity even
if the compression algorithm is fully known to the attacker. If the
plaintext is compressed before encipherment, then we rightly
expect its entropy per compressed character to increase -- even
though its entropy per English character does not increase. This
is often confusing and may give the wrong impression that
nothing is gained by compression, or that we need to hide
the compression algorithm from the attacker.
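
A sketch of the effect with zlib (the plaintext is a made-up example;
exact sizes vary):

  # Compression packs more entropy into each transmitted character, which
  # shrinks the per-character redundancy D and stretches n = H(K)/D --
  # even though the attacker knows the algorithm.
  import zlib

  plaintext  = b"attack at dawn. " * 200
  compressed = zlib.compress(plaintext)

  # Same information, far fewer characters to spread it over:
  print(len(plaintext), len(compressed))  # 3200 vs a few dozen bytes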


2. READING THE