Re: Comments/summary on unicity discussion
----- Original Message ----- From: Joshua Hill [EMAIL PROTECTED]
Subject: Re: Comments/summary on unicity discussion

> It doesn't deal with plaintext, just ciphertext. In fact, unicity
> distance is only valid for a ciphertext-only attack. Once you get a
> known plaintext/ciphertext pair, a high unicity distance works against
> you (more on this later). In addition, it isn't certain that after
> observing the requisite unicity distance number of ciphertext units
> you can uniquely determine the key; it is merely very likely.

There appears to be an error in there. The unicity distance has a very strong correlation with the uncertainty of the plaintext (entropy per message). Having access to a plaintext/ciphertext pair (often it takes multiple pairs) removes all uncertainty as to the plaintext. This changes the unicity distance calculation by making the unicity distance as short as possible, which would make

> Once you get a known plaintext/ciphertext pair, a high unicity
> distance works against you

seem more than a little odd as a statement.

On K complexity: while K complexity offers a convenient, if somewhat inaccurate, upper bound on the entropy, that is basically where the relationship ends. Permit me to give a basic example. Which of these strings has higher entropy:

kevsnblawtrlnbatkb
kevsnblawtrlnbatkb

One was created by slapping my hands on the keyboard, and so contains some entropy; the other was created through copy and paste, and so contains none. However, the K complexity of the two is identical.

The portion of the equation you are forgetting is that the key to the pRNG may itself be compressible. This leads to somewhat of a logic loop, but at the end of it is the absolute smallest representation, as a compression of a given language (the only sense in which this makes sense).

Joseph Ashwood
Trust Laboratories
http://www.trustlaboratories.com

---------------------------------------------------------------------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]
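Ashwood's copy-and-paste example can be made concrete. The sketch below (my illustration, not from the post) uses zlib-compressed length as a computable stand-in for a Kolmogorov-complexity upper bound: two byte-identical strings necessarily get the same bound, even though the processes that produced them had very different uncertainty.

```python
import zlib

def k_upper_bound(s: str) -> int:
    # The length of any compressed encoding of s is an upper bound on
    # K(s), up to an additive constant for the decompressor itself.
    return len(zlib.compress(s.encode()))

mashed = "kevsnblawtrlnbatkb"   # "typed" by slapping the keyboard
copied = "kevsnblawtrlnbatkb"   # produced by copy and paste

# The two strings are byte-for-byte identical, so any measure computed
# from the string alone -- including any K-complexity bound -- must agree:
assert k_upper_bound(mashed) == k_upper_bound(copied)

# Entropy, by contrast, is a property of the *process* that produced the
# string: the keyboard-mash process had some uncertainty, the
# copy-and-paste process had none. No function of the string alone can
# tell the two apart.
```

The same helper also shows why compression is only an upper bound: a highly regular string compresses far below its literal length.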
Re: Comments/summary on unicity discussion
I'll hop in here and see if I can give this a swing. IANAC (I am not a cryptographer), but I do have access to one at work, and I did make use of him in gaining an understanding of this area. Some of these points are minutiae, but they are important, both because they are common errors and because they really help bring everything together.

In general, it appears that you mixed up your labels in the reference list. [Sha49] is, indeed, the reference that you want, but that paper should be listed as "Communication Theory of Secrecy Systems" (published in 1949).

On Mon, Mar 03, 2003 at 01:28:54PM -0800, Ed Gerck wrote:
> 1. WHAT IS UNICITY? There are three different contexts to answer to
> this question!
>
> 1.a. Unicity Definition: Shannon [Sha49, page 693] defined unicity
> distance (hereafter, n) as the least amount of plaintext which can be
> uniquely deciphered from the corresponding ciphertext, allowing one to
> determine, without doubt, the key that was used for encryption.

It doesn't deal with plaintext, just ciphertext. In fact, unicity distance is only valid for a ciphertext-only attack. Once you get a known plaintext/ciphertext pair, a high unicity distance works against you (more on this later). In addition, it isn't certain that after observing the requisite unicity distance number of ciphertext units you can uniquely determine the key; it is merely very likely. So, the definition should be something more like:

1.a. Unicity Definition: Shannon [Sha49, page 693] defined unicity distance (hereafter, n) as the least amount of ciphertext which would reduce the likely number of spurious keys (equivocations) in a random cipher to zero.

> 1.b. Unicity Model: As first given by Shannon [Sha49] under some
> restrictive assumptions, especially the random cipher assumption, the
> mathematical expression for unicity can be cast in the following
> unfolded expression (his original expression was n = H(K)/D, where D
> is the redundancy):
>
>   n = H(K)/[|M| - H(M)]

I don't see this.
I do see:

  D_N = log(G) - H(M)

where D_N is the redundancy of the language, G is the total number of messages in the language, and H(M) is the entropy of a message of the language. (Shannon uses log base 10, but we like to talk about 'bits' of entropy in crypto, so we tend to use log base 2, often written lg x.)

  D = D_N / N

where D is the redundancy per unit (character/bit/whatever), D_N is the redundancy of the language, and N is the average number of units per message. And finally, the unicity distance is:

  n = H(K) / D

where n is the unicity distance (expressed in units), H(K) is the amount of entropy of a key (also in units) for the system, and D is the redundancy per unit. So, pulling it together:

  n = H(K) * N / (lg(G) - H(M))

(So it would appear that your equation was off by a factor of the average length of the message.) But truly, I think that it makes the most sense as just n = H(K) / D.

As an aside, you define:

> H(K) = entropy of keys used in encryption
> [...]
> H(M) = entropy of actual message, the plaintext

This seems to imply that a particular message has entropy. This is incorrect. A random variable has entropy ('variable' in math-speak, not 'variable' in the sense of programming, which has an actual value as the program is executing; this is a random variable as in X, where X assumes the values x_1, x_2, ..., x_n with probabilities p_1, p_2, ..., p_n), based on the statistical properties of the variable. A particular value doesn't really have 'entropy' outside the context of the system that created the value.

Now, having said that, strings (values, messages, etc.) do have something called Kolmogorov Complexity. Kolmogorov Complexity is the size of the smallest program that can produce a particular string. For a string, the highest Kolmogorov Complexity it can have is the size of the string. The lowest would be a very small program that produces a very large string. As you might expect, entropy and Kolmogorov Complexity are related.
You would expect a message from a high-entropy system to have a high Kolmogorov Complexity (in fact, the Kolmogorov Complexity should be comparable to the entropy of the system). Further, you get some intuitively nice results: the Kolmogorov Complexity of any output of a PRNG is at most the size of the PRNG state, plus a bit for the PRNG algorithm, which agrees nicely with the entropy of the system, which is (at best) the size of the PRNG state.

> NOTE 1: The model for unicity has no probability error with a tail to
> infinity because only entropy values are used in the formula of n and
> by *definition* of entropy the entropy is already a limit to infinity.

I don't understand what this note means.

> NOTE 2: It does not matter how the attacker may try to decipher the
> message. The attacker can of course use brute-force and try out all
> keys or he can use short-cuts, it is his choice and he is entirely
> free to use any
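Hill's point that a PRNG's output has Kolmogorov complexity bounded by its state plus a short description of the algorithm can be sketched as follows. The toy generator below is my own illustration (a non-cryptographic linear congruential generator, not anything from the thread): a general-purpose compressor cannot shrink its output much, yet the few lines of code plus the seed are a complete description of the whole stream.

```python
import zlib

def lcg_bytes(seed: int, n: int) -> bytes:
    # A toy linear congruential generator (NOT cryptographic). The whole
    # "program" for the output is this function plus the integer seed.
    out = bytearray()
    state = seed
    for _ in range(n):
        state = (1103515245 * state + 12345) % (2 ** 31)
        out.append((state >> 16) & 0xFF)   # take some middle bits
    return bytes(out)

stream = lcg_bytes(seed=12345, n=100_000)

# A general-purpose compressor sees the stream as nearly incompressible...
compressed_len = len(zlib.compress(stream))

# ...yet the Kolmogorov complexity of the stream is bounded by the few
# lines of lcg_bytes plus the seed -- a few hundred bytes at most --
# which matches K(output) <= PRNG state size + algorithm description.
print(compressed_len, len(stream))
```

This is also why compressed size is only an upper bound on Kolmogorov complexity: here the bound (roughly the stream length) is wildly larger than the true value.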
Comments/summary on unicity discussion
List: The recent thread on "AES-128 keys unique for fixed plaintext/ciphertext pair" included a discussion on unicity, with some broken dialogues. I wrote up a summary that I'm sending to this list as a possible seed for further comments. I apologize for any mistakes or imprecision, as I'm not trying to be as exact as possible -- just sufficiently exact for the purpose at hand. I also provide below the online references for Shannon's works [Sha48, Sha49] that are important to this discussion. The AES thread discussion is NOT included here.

1. WHAT IS UNICITY? There are three different contexts in which to answer this question!

1.a. Unicity Definition: Shannon [Sha49, page 693] defined unicity distance (hereafter, n) as the least amount of plaintext which can be uniquely deciphered from the corresponding ciphertext, allowing one to determine, without doubt, the key that was used for encryption. The amount of plaintext (i.e., n) can be measured in any units the user may find convenient, such as bits, bytes, letters, symbols, etc. Actually, Shannon used letters in his paper.

NOTE 1: This is a definition. There is no proof involved here.

1.b. Unicity Model: As first given by Shannon [Sha49] under some restrictive assumptions, especially the random cipher assumption, the mathematical expression for unicity can be cast in the following unfolded expression (his original expression was n = H(K)/D, where D is the redundancy):

  n = H(K)/[|M| - H(M)]

where the quantities are:

  n    = unicity; least message length that can be uniquely deciphered
  H(K) = entropy of keys used in encryption
  |M|  = maximum possible entropy for the plaintext
  H(M) = entropy of actual message, the plaintext

and the entropies are calculated according to the desired units (bits, bytes, letters, symbols, etc.), which also define the unit for n.
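The formula above can be exercised with commonly quoted numbers. In this sketch the 128-bit key and the per-letter figures are illustrative assumptions (log2(26) ≈ 4.7 bits as the maximum entropy per English letter, and the 1.2 bits/letter estimate for English that appears later in this post), not values taken from Shannon.

```python
from math import log2

# n = H(K) / (|M| - H(M)), everything in bits per letter.
H_K = 128           # key entropy: assume a uniformly random 128-bit key
M_max = log2(26)    # |M|: max entropy per letter, 26-letter alphabet (~4.7)
H_M = 1.2           # measured entropy of English, bits per letter

D = M_max - H_M     # redundancy per letter, ~3.5 bits
n = H_K / D         # unicity distance, in letters (~36.6)

# Compression raises the per-character entropy toward M_max, shrinking D
# and driving n up; perfectly compressed plaintext (H_M == M_max) would
# make the unicity distance unbounded.
n_compressed = H_K / (M_max - 4.0)   # near-ideal compression: ~183 letters

print(n, n_compressed)
```

So, under these assumptions, a few dozen letters of ciphertext suffice in principle to pin down a 128-bit key for uncompressed English, while compression pushes that requirement up sharply.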
NOTE 1: The model for unicity has no probability error with a tail to infinity because only entropy values are used in the formula for n, and by *definition* of entropy the entropy is already a limit to infinity.

NOTE 2: It does not matter how the attacker may try to decipher the message. The attacker can of course use brute force and try out all keys, or he can use short-cuts; it is his choice and he is entirely free to use any method he desires. The work involved may be small, quite large or even unbounded -- the amount of work is actually unspecified.

NOTE 3: Shannon's definition of a random cipher was that all decipherments must produce a random flat distribution over all bits in the plaintext space.

1.c. Unicity Value: The numerical value of n.

It is important not to confuse a model with a measurement. Models predict measurements, and do so within an error range. What is the error range for measuring n? First, note that the model works for any ciphertext, any plaintext. And for any such pair, the result n is predicted by the model even if an attacker has unbounded resources, including infinite time. The value of n depends on the maximum possible entropy for the plaintext, the plaintext entropy, the entropy of the keys, and the assumption that the cipher is a random cipher. Since all good ciphers should be random ciphers, for those ciphers the model provides a good approximation to what n actually is. The practical difficulty of reliably estimating the plaintext entropy and even the key entropy (whose errors contribute to an error in n) has nothing to do with the model itself or its error for n, but with the errors in the quantities on which it depends -- however, it's not so hard to obtain good estimates, and several are well-known.

NOTE 1: Estimating the entropy of English (and other languages) has been the subject of considerable study. Various authors have measured H(M) for English texts and found values that lie between 1.0 and 1.5.
The standard value quoted is 1.2, close to the average of the extreme values. Even though each author has a different text, different preferred words, and different style preferences, we all come pretty close to the entropy value of 1.2. However, XML text (which is in English) is more redundant than natural English and should have a lower entropy. On the other hand, English text that is sent by SMS on cell phones has messages such as "Chk tat 4 u 2", where the redundancy is reduced and the entropy should be higher.

NOTE 2: The benefit of compression is to increase unicity even if the compression algorithm is fully known to the attacker. If the plaintext is compressed before encipherment, then we rightly expect its entropy per compressed character to increase -- even though its entropy per English character does not increase. This is often confusing and may give the wrong impression that nothing is gained by compression, or that we may need to hide the compression algorithm from the attacker.

2. READING THE