### Re: Entropy Definition (was Re: passphrases with more than 160 bits of entropy)

On Sat, Mar 25, 2006 at 07:26:51PM -0500, John Denker wrote: Executive summary: Small samples do not always exhibit average behavior.

That's not the whole problem: you also have to be looking at the right average. For the long-run encodability of a set of IID symbols produced with probabilities p_i, the relevant average is the Shannon entropy. If you're interested in the mean number of guesses (per symbol) required to guess a long word formed from these symbols, then you should be looking at (\sum_i \sqrt{p_i})^2. Other metrics (min-entropy, work factor, ...) require other averages. To see this behaviour you need both a large sample and the right type of average for your problem (and I've assumed IID).

David.

- The Cryptography Mailing List Unsubscribe by sending unsubscribe cryptography to [EMAIL PROTECTED]
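David's point that different questions call for different averages is easy to check numerically. A minimal sketch (the function names are mine, not from the thread); the guessing average here is expressed as its exponent, log2 of (\sum_i \sqrt{p_i})^2, which is the Rényi entropy of order 1/2:

```python
import math

def shannon_entropy(p):
    """Shannon entropy in bits: -sum p_i * log2(p_i) (zero-probability terms contribute 0)."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def guessing_exponent(p):
    """log2 of (sum_i sqrt(p_i))^2: the per-symbol growth rate of the expected
    number of guesses needed for long words drawn from this distribution."""
    return 2 * math.log2(sum(math.sqrt(x) for x in p))

# A skewed two-symbol source: heads with probability 0.9, tails 0.1.
p = [0.9, 0.1]
print(shannon_entropy(p))    # ~0.469 bits/symbol: the average for encodability
print(guessing_exponent(p))  # ~0.678 bits/symbol: a different average, for guessing
```

For a uniform distribution the two averages coincide; for skewed distributions they diverge, which is exactly why one number cannot serve every purpose.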

### Re: Entropy Definition (was Re: passphrases with more than 160 bits of entropy)

From: John Denker [EMAIL PROTECTED] Sent: Mar 24, 2006 11:57 AM To: Erik Zenner [EMAIL PROTECTED], cryptography@metzdowd.com Subject: Re: Entropy Definition (was Re: passphrases with more than 160 bits of entropy)

Erik Zenner wrote: [0 occurs with probability 1/2; each other number from 1 to 2^{160} happens with probability 2^{-161}.] Is anyone aware of whether (and where) this was discussed in the literature, or what other approaches are taken?

[John Denker wrote:] This particular problem is contrived or at least exaggerated.

The example is a contrived, pathological case, and there are a bunch of easy solutions once you know this is the distribution. The point is to demonstrate that Shannon entropy gives you the wrong answer when you try to use it here.

Now, you might ask whether this is a problem in a real-world setting. So let's imagine a very simple distribution: sequences of flips of a biased coin. There are nice ways to remove the bias, but let's imagine we're not doing that; instead, we're going to take a sequence of coin flips, turn it into a 0/1 sequence, and then hash it down to get a key. Suppose the coin is biased so that heads comes up 0.9 of the time, and that we generate 16-bit sequences at a time. The Shannon entropy of a 16-bit sequence is about 7.5 bits, but the most likely symbol (all heads) comes up about 18% of the time. So if we tried to generate a 7-bit key, we'd lose on the attacker's first guess 18% of the time.

So, let's go further with this. We want to generate a DES key, with 56 bits of entropy. A 64-bit sequence produced with this biased coin has Shannon entropy of about 60 bits, but an attacker has about a 1/1000 chance of guessing the DES key we derive from it on his first try, which is unacceptable for just about any crypto application. (The min-entropy is about ten bits.)

So yes, I used a contrived example to demonstrate the problem, but no, the problem isn't just an ivory-tower concern.
The intuition here is that Shannon entropy is concerned with communications channels, where we assume we have to send every symbol. So when you have lots of low-probability symbols, and those low-probability symbols are collectively chosen 999 times out of 1000, the amount of bandwidth you need easily becomes dominated by them. Most of the time, you're sending one of a large set of symbols, and they all have to be distinguished from one another.

The situation for the attacker is a lot more like having a channel where it's acceptable to send only a small fraction of those symbols and just drop the rest. That is, it's okay for the attacker's model to just ignore the huge set of low-probability symbols that occur 999/1000 of the time, and instead transmit only the highest-probability symbol the remaining 1/1000 of the time. Instead of transmitting it, he's just guessing it. When he gets it right, he learns your DES key.

... This version serves to illustrate, in an exaggerated way, the necessity of not assuming that the raw data words are IID (independent and identically distributed).

Yes! The independence part is much harder to deal with than the per-symbol distribution, in many real-world applications. The worst of these are operating system events (used in Windows and various Unixes to get random bits). Peter Gutmann did some really nice work on this on a bunch of operating systems in a USENIX paper, and he later updated it; a version is available on his website. If you're doing OS-event sampling to generate random bits, you really ought to look at his stuff. (But if you can get some hardware support, either directly or via sampling the microphone as the Turbid design does, you're probably on much firmer ground.)

--John Kelsey, NIST
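The biased-coin numbers above are easy to reproduce. A hedged sketch (the helper name is mine): for n independent flips with P(heads) = 0.9, it computes the sequence's Shannon entropy, its min-entropy, and the probability of the attacker's best first guess (all heads):

```python
import math

def biased_seq_stats(p_heads, n):
    """For n independent flips of a coin with P(heads) = p_heads, return
    (Shannon entropy of the n-bit sequence, min-entropy, P[most likely sequence])."""
    h = -(p_heads * math.log2(p_heads) + (1 - p_heads) * math.log2(1 - p_heads))
    p_max = p_heads ** n  # the all-heads sequence is the most likely one
    return n * h, -math.log2(p_max), p_max

print(biased_seq_stats(0.9, 16))  # Shannon ~7.5 bits, min-entropy ~2.4, P[all heads] ~0.185
_, min_ent_64, p_max_64 = biased_seq_stats(0.9, 64)
print(min_ent_64, p_max_64)       # min-entropy ~9.7 bits; best first guess succeeds ~1/850
```

The gap between the Shannon figure and the min-entropy figure is the whole point of the example: the attacker's first-guess success rate is governed by P[max], not by the average.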

### Re: Entropy Definition (was Re: passphrases with more than 160 bits of entropy)

In the context of: 0 occurs with probability 1/2; each other number from 1 to 2^{160} happens with probability 2^{-161}, I wrote: This ... serves to illustrate, in an exaggerated way, the necessity of not assuming that the raw data words are IID (independent and identically distributed).

Correction: IID isn't the issue here. The raw data words described above are IID. That's not the problem. The problem is that the distribution is highly skewed.

Executive summary: Small samples do not always exhibit average behavior.

Let me explain. There is a very simple way to look at this. Consider the expression for the entropy of the source,

  S := Sum_i P_i log(1/P_i)            [1]

where i runs over all symbols in the alphabet. One may, with a little care, attribute log(1/P_i) bits of unpredictability to the i-th symbol in the alphabet. Then S can be interpreted as the appropriately weighted _average_ unpredictability per symbol.

In the example quoted above, the minimum log(1/P_i) is vastly smaller than the average log(1/P_i): the minimum is 1 bit, while each of the rare symbols contributes 161 bits. So focusing on the average is unlikely to solve all the world's problems.

In crypto applications (including RNG construction), a crude yet provably correct approach is to rely on the minimum (not the average) per-symbol unpredictability. Using this approach, it would require 80 samples of the given distribution to produce an output with 80 bits of entropy. Things can only get better from here:

-- With full knowledge of the source statistics, one can distinguish the large log(1/P_i) words from the small log(1/P_i) words. This would allow the number of required samples to be closer to the typical value (2) than to the worst-case value (80). An example of this is the Huffman coding I discussed in my previous note.

-- With even modest knowledge of the given source, one can (by histogramming if nothing else) estimate the probabilities well enough to yield worthwhile improvements in efficiency.
-- If you want to produce a long sequence of output words, as you often do, a reasonable-sized buffer is all you really need to come fairly close to optimal efficiency, namely only about 1 sample of the distribution, on average, per 80-bit output word. The chance of getting fooled can be made verrry small, at a modest cost in terms of buffer-size and extra samples of the distribution. That is, the law of large numbers comes to your rescue sooner or later, usually rather soon.

-- It may be possible to engineer a different source with a larger minimum log(1/P_i).

Bottom line: Setting up a highly-skewed distribution and then drawing from it only a single sample is not guaranteed to produce average behavior. Duh, no surprise there! The source entropy S in equation [1] is a single number that characterizes the average behavior. For a small sample from a highly-skewed distribution, S doesn't tell you everything you need to know. This has no bearing on the definition of entropy; entropy is defined by equation [1]. It just means that equation [1] by itself doesn't solve all the world's problems.
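The crude min-entropy accounting described above can be spelled out for the skewed source. A small sketch (the arithmetic is exact because all the probabilities are powers of two):

```python
import math

# The skewed source: 0 with probability 1/2; each of the 2**160 other
# values with probability 2**-161.
shannon = 0.5 * math.log2(1 / 0.5) + (2**160) * (2**-161) * 161  # = 0.5 + 80.5 = 81 bits
min_entropy = -math.log2(0.5)                                    # = 1 bit per sample

# Crude but provably safe accounting: credit each sample only its
# minimum per-symbol unpredictability, not the average.
samples_needed = math.ceil(80 / min_entropy)
print(shannon, min_entropy, samples_needed)  # 81.0 1.0 80
```

The average says one sample should nearly suffice; the worst-case accounting demands 80. That gap is exactly the "small samples do not exhibit average behavior" point.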

### RE: Entropy Definition (was Re: passphrases with more than 160 bits of entropy)

From: Erik Zenner [EMAIL PROTECTED] Sent: Mar 24, 2006 4:14 AM To: cryptography@metzdowd.com Subject: RE: Entropy Definition (was Re: passphrases with more than 160 bits of entropy) ...

[I wrote:] 0 occurs with probability 1/2; each other number from 1 to 2^{160} happens with probability 2^{-161}. The Shannon entropy on this distribution is 81.5 bits. But if you tried to sample it once to generate an 80-bit Skipjack key, half the time, I'd guess your key on my first try.

[Erik Zenner wrote:] It's entirely correct that entropy is the wrong measure here, but the question is what a good measure would look like.

There are a bunch of different entropy measures. Shannon entropy is the right one for compression ratios and channel capacity, but not for difficulty of guessing the string.

[Erik Zenner wrote:] Assume that you have a sample space with 2^N elements and an intelligent attacker (i.e., one that tries the most probable elements first). Then what you actually are interested in is that the attacker's probability of success after q sampling attempts is as close as possible to the lowest possible, namely q * 2^{-N}. A natural way of measuring this seems to be some kind of distance between Pr[succ after q samples] and the ideal function q * 2^{-N}. Such a measure might allow a designer to decide whether a non-perfect distribution is still acceptable or simply far out. Is anyone aware of whether (and where) this was discussed in the literature, or what other approaches are taken?

The best discussion I've seen of this is in a couple of papers by John Pliam, about something he calls the work function, W(n). This is just the probability of having guessed some secret value after n operations (typically n guesses). You can generalize this to the number of operations you have to carry out to have a given probability of violating any security property (repeating a nonce, getting a block collision in a block cipher chaining mode, etc.). It's a very general way of thinking about limiting parameters of crypto algorithms.
You're basically heading toward this idea in what you said above. Let's think of this for an ideal case: a 128-bit key. When you have done 2^k guesses of the key, you have a 2^{k-128} chance of having guessed the key correctly. So if you graphed the work vs. probability of success on a log/log chart, you'd get a straight line for the ideal case: W(n) = 2^{-128} n, as you said above.

Now, consider the case where you are generating a 128-bit key from a bunch of sampled events on your computer that have been concatenated together in a string S. Imagine making a list of all the possible values of that string, and sorting them in descending order of probability. You now have the best possible sequence of guesses. W(n) is the sum of the first n probabilities. If W(n) > 2^{-128} n for any n, then the attacker has some point where it is to his advantage to guess S instead of guessing your key.

So, this is where min-entropy comes in. Min-entropy is just -lg(P[max]), the negated base-2 log of the maximum probability in the distribution. You can convince yourself that if the min-entropy of S is at least 128, then it's never easier to guess S than to guess K. This is because each possible input string must have probability no greater than 2^{-128}, so the sum of the first n probabilities satisfies W(n) <= n * 2^{-128}.

This doesn't solve the much harder engineering/data-analysis problem of getting some reasonable approximation for the probabilities of S, unfortunately. The easy way to solve that is to design a hardware RNG in such a way that you pretty much know the probability model to use and can work out how to sample it to get a good estimate of the probabilities. However, it is kind of nice to recognize that you only have to estimate the largest probability to compute the min-entropy. (It ought to be the easiest probability to measure and bound!)

Erik
--
Dr. Erik Zenner, Chief Cryptographer, Cryptico A/S
Phone: +45 39 17 96 06, Mobile: +45 60 77 95 41
Fruebjergvej 3, DK-2100 Copenhagen, [EMAIL PROTECTED], www.cryptico.com

--John Kelsey, NIST
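Pliam's work function and the min-entropy bound discussed above fit in a few lines. A toy sketch with an eight-element distribution (the numbers are mine, chosen to make the skew obvious):

```python
import math

def work_function(probs, n):
    """W(n): the attacker's success probability after n optimal guesses,
    i.e. the sum of the n largest probabilities."""
    return sum(sorted(probs, reverse=True)[:n])

def min_entropy(probs):
    """-lg of the single largest probability in the distribution."""
    return -math.log2(max(probs))

# One heavy value and seven equally light ones.
probs = [0.5] + [0.5 / 7] * 7

print(work_function(probs, 1))  # 0.5, far above the uniform ideal of 1/8
print(min_entropy(probs))       # 1.0 bit, not the lg(8) = 3 of a uniform distribution

# The bound W(n) <= n * 2**(-min_entropy) holds for every n.
assert all(work_function(probs, n) <= n * 2 ** -min_entropy(probs) + 1e-12
           for n in range(1, len(probs) + 1))
```

Note that computing min-entropy needed only the largest probability, which is the point made at the end of the message.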

### Re: Entropy Definition (was Re: passphrases with more than 160 bits of entropy)

Hal Finney wrote: ... This is true; in fact it is sometimes called the universal distribution or universal measure. In more detail, it is a distribution over all finite-length strings. The measure for a particular string X is defined as the sum, over all programs that output X, of 1/2^L_i, where L_i is the length of each such program.

There is no such thing as *the* universal measure; rather there are lots of universal measures. Universality is a rather arcane property of the measure, and the term doesn't mean what most people think it means.
-- Universal does NOT mean all-purpose.
-- Universal does NOT mean general-purpose.
-- There are many inequivalent universal distributions, most of which are not what you want for any given application.
-- Certain things that are true asymptotically are not true for _any_ practical application.
-- Ratio converging to 1 does not imply difference converging to 0.
This is probably not the right forum for cleaning out this Augean stable.

... The universal measure for a string can also be thought of as the probability that, given a fixed Universal Turing Machine (UTM), a randomly chosen input program will output that string. So this is clearly a probability distribution (with some technicalities regarding issues of program lengths being glossed over here) as John Denker says. However to go from this to a notion of entropy is more questionable.

Not really questionable. If you have a probability, you have an entropy.

Shannon entropy is defined over a probability distribution. That is, given a probability distribution we can find its Shannon entropy by summing -p_i log2(p_i) over all members of the distribution. If we approximate the universal distribution by 1/2^n for strings of length n, this sum is clearly divergent. So there is not really a meaningful Shannon entropy for this probability distribution, or in other words the entropy is infinite.
The entropy _per string_ is unbounded, as it jolly well should be for random strings of unbounded length. That's true but uninteresting. A more interesting quantity is the _entropy density_ aka the entropy _per symbol_ which for typical long random bit-strings is one bit of entropy per bit of string-length. Similarly if you have a string of symbols over a 32-symbol alphabet, you would expect to see five bits of entropy per symbol for a typical long random string.

### RE: Entropy Definition (was Re: passphrases with more than 160 bits of entropy)

Shannon entropy is the one most people know, but it's all wrong for deciding how many samples you need to derive a key. The kind of classic illustration of this is the probability distribution: 0 occurs with probability 1/2; each other number from 1 to 2^{160} happens with probability 2^{-161}. The Shannon entropy on this distribution is 81.5 bits. But if you tried to sample it once to generate an 80-bit Skipjack key, half the time, I'd guess your key on my first try.

It's entirely correct that entropy is the wrong measure here, but the question is what a good measure would look like. Assume that you have a sample space with 2^N elements and an intelligent attacker (i.e., one that tries the most probable elements first). Then what you actually are interested in is that the attacker's probability of success after q sampling attempts is as close as possible to the lowest possible, namely q * 2^{-N}. A natural way of measuring this seems to be some kind of distance between Pr[succ after q samples] and the ideal function q * 2^{-N}. Such a measure might allow a designer to decide whether a non-perfect distribution is still acceptable or simply far out. Is anyone aware of whether (and where) this was discussed in the literature, or what other approaches are taken?

Erik
--
Dr. Erik Zenner, Chief Cryptographer, Cryptico A/S
Phone: +45 39 17 96 06, Mobile: +45 60 77 95 41
Fruebjergvej 3, DK-2100 Copenhagen, [EMAIL PROTECTED], www.cryptico.com

### Re: Entropy Definition (was Re: passphrases with more than 160 bits of entropy)

Someone mentioned Physics in this discussion, and this was for me a motivation to point out something that has been forgotten by Shannon, Kolmogorov, Chaitin, and in this thread. Even though Shannon's data-entropy formula looks like an absolute measure (there is no reference included), the often-confusing fact is that it does depend on a reference. The reference is the probability model that you assume to fit the data ensemble. You can have the same data ensemble and many different (infinitely many) probability models that fit that data ensemble, each one giving you a valid but different entropy value. For example, if a source sends the number 1 a thousand times in a row, what is the source's entropy? Aram's assertion that the sequence of bytes from 1 to 256 has maximum entropy would be right if that sequence came as one of the possible outcomes of a neutron counter with a 256-byte register. Someone's assertion that any data has entropy X can be countered by finding a different probability model that also fits the data, even if the entropy is higher (!).

In short, a data-entropy value involves an arbitrary constant. The situation, which seems confusing, improves when we realize that only differences in data entropy can actually be measured, when the arbitrary constant can be canceled, if we are careful. In practice, because data-security studies usually (and often wrongly!) suppose a closed system, then, so to say automatically, only difference states of a single system are ever considered. Under such circumstances, the probability model is well-defined and the arbitrary constant *always* cancels. However, data systems are not really closed, and probability models are not always ergodic or even accurate. Therefore, due care must be exercised when using data entropy.

I don't want to go into too much detail here (the results will be available elsewhere), but it is useful to take a brief look into physics. In Physics, Thermodynamics, entropy is a potential [1].
As is usual for a potential, only *differences* in entropy between different states can be measured. Since the entropy is a potential, it is associated with a *state*, not with a process. That is, it is possible to determine the entropy difference regardless of the actual process which the system may have performed, even whether the process was reversible or not. These are quite general properties. What I'm suggesting is that the idea that entropy depends on a reference also applies to data entropy, not just the entropy of a fluid, and it solves the apparent contradictions (often somewhat acid) found in data entropy discussions. It also explains why data entropy seems confusing and contradictory to use. It may actually be a much more powerful tool for data security than currently used. Cheers, Ed Gerck [1] For example, J. Kestin, A Course in Thermodynamics, Blaisdell, 1966.

### Re: Entropy Definition (was Re: passphrases with more than 160 bits of entropy)

Ed Gerck wrote: In Physics, Thermodynamics, entropy is a potential [1]. That's true in classical (19th-century) thermodynamics, but not true in modern physics, including statistical mechanics. The existence of superconductors and superfluids removes all doubt about the absolute zero of entropy. For details, see http://www.av8n.com/physics/thermo-laws.htm#sec-spectator http://www.av8n.com/physics/thermo-laws.htm#sec-secret-s As is usual for a potential, only *differences* in entropy between different states can be measured. Not true. These are quite general properties. They are neither general nor relevant to crypto.

### Re: Entropy Definition (was Re: passphrases with more than 160 bits of entropy)

Aram Perez wrote: * How do you measure entropy? I was under the (false) impression that Shannon gave a formula that measured the entropy of a message (or information stream).

Entropy is defined in terms of probability. It is a measure of how much you don't know about the situation. If by message you mean a particular message that you are looking at, it has zero entropy. It is what it is; no probability required. If by stream you mean a somewhat abstract notion of an ensemble of messages or symbols that you can only imperfectly predict, then it has some nonzero entropy.

* Can you measure the entropy of a random oracle? Or is that what both Victor and Perry are saying is intractable?

That is a tricky question for a couple of reasons. a) It will never be completely answerable because the question hinges on the word random, which means different things to different people. Thoughtful experts use the word in multiple inconsistent ways. b) It also depends on just what you mean by measure. Often it is possible to _ascertain_ the entropy of a source ... but direct empirical measurements of the output are usually not the best way.

* Are there units of entropy?

Yes. It is naturally _dimensionless_, but dimensionless quantities often have nontrivial units. Commonly used units for entropy include
++ bits
++ joules per kelvin. One J/K equals 1.04×10^23 bits.
For more on this, see http://www.av8n.com/physics/thermo-laws.htm#sec-s-units

* What is the relationship between randomness and entropy?

They are neither exactly the same nor completely unrelated. A pseudorandom sequence may be random enough for many applications, yet has asymptotically zero entropy density. A sequence of _odd_ bytes is obviously not entirely random, yet may have considerable entropy density.

* (Apologies to the original poster) When the original poster requested passphrases with more than 160 bits of entropy, what was he requesting?
When you apply a mathematical function to an ensemble of inputs, it is common to find that the ensemble of outputs has less entropy than the ensemble of inputs. A simple pigeonhole argument suffices to show that a function whose output is represented in 160 bits cannot possibly represent more than 160 bits of entropy per output. So if you want the ensemble of outputs to have more than 160 bits of entropy, it is necessary to do something fancier than a single instance of SHA-1.

* Does processing an 8 character password with a process similar to PKCS#5 increase the entropy of the password?

No.

* Can you add or increase entropy?

Shuffling a deck of cards increases the entropy of the deck. For more on this, see http://www.av8n.com/physics/thermo-laws.htm#sec-entropy
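The answer about PKCS#5-style processing can be made concrete. A hedged sketch (SHA-1 via Python's hashlib stands in for the key-derivation function, and the three-digit PIN space is deliberately tiny): whatever the hash does to the bits, the attacker simply enumerates the same small input space, so the derived key is no harder to guess than the password was.

```python
import hashlib

def derive(pin):
    """Toy stand-in for a password-based key derivation: hash the PIN."""
    return hashlib.sha1(pin.encode()).hexdigest()

target = derive("042")  # the victim's derived "key" (hypothetical PIN)

# The attacker enumerates all 1000 PINs and hashes each candidate.
guessed = next(p for p in (f"{i:03d}" for i in range(1000)) if derive(p) == target)
print(guessed)  # "042", recovered in at most 1000 hashes: still only ~10 bits of work
```

The 160-bit output length is irrelevant here; the entropy of the output ensemble is capped by the entropy of the input ensemble, exactly as the pigeonhole argument says.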

### Re: Entropy Definition (was Re: passphrases with more than 160 bits of entropy)

On Wed, Mar 22, 2006 at 03:29:07PM -0800, Aram Perez wrote: * How do you measure entropy? I was under the (false) impression that Shannon gave a formula that measured the entropy of a message (or information stream).

He did give a formula for the entropy of a source; however the calculation is based on the probabilities of each symbol appearing. Unless you know those, you can't actually apply the formula. So it is computable in theory, just not in practice for any source that is at all interesting.

* Can you measure the entropy of a random oracle? Or is that what both Victor and Perry are saying is intractable?

A random oracle, by definition, produces completely random output. However, since random oracles don't actually exist, that does not seem to be a terribly interesting thing to be measuring the entropy of.

* Are there units of entropy?

Bits are usually the most intuitive/useful unit.

* What is the relationship between randomness and entropy?

I have a vague feeling this question requires a deeper answer than I'm able to provide.

* Does processing an 8 character password with a process similar to PKCS#5 increase the entropy of the password?

No, because there are no additional secrets. Knowledge of the password is all you need to rederive the final output, so clearly there is no additional information (i.e., entropy) in the output that was not in the original password. This ignores the salt, iteration count, and the specification of the algorithm itself, but those are all (usually) public. So while they may contribute to the entropy, they do not contribute to the conditional entropy, which is what we are usually interested in when thinking about entropy and crypto.

* Can you add or increase entropy?

Yes. Let's say the contents of tomorrow's NY Times has n bits of entropy (we probably can't actually calculate n, but we know it is a finite and fixed value). And the LA Times from the same day will also have some amount of entropy (call it n').
However, the entropy of the two papers combined would (probably) not be n+n', but some number x such that max(n,n') <= x <= n+n', because the two papers will report on many of the same issues, carry some of the same AP stories, and so forth. This redundant information doesn't increase the entropy (reading the same AP story in a second newspaper wouldn't give you any new information).

A book you may find interesting is Elements of Information Theory by Cover and Thomas.

-Jack
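The two-newspapers point is subadditivity of joint entropy: H(X,Y) <= H(X) + H(Y), with equality only when the sources share no information. A toy sketch of the extreme case using an empirical distribution (the second "paper" is an exact copy, like reprinting the same AP story):

```python
import math
from collections import Counter

def H(counts):
    """Shannon entropy in bits of an empirical distribution given as a Counter."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

# Y is an exact copy of X: fully redundant.
xs = ["a", "a", "b", "c"]
h_x = H(Counter(xs))
h_xy = H(Counter(zip(xs, xs)))
print(h_x, h_xy)  # both 1.5 bits: the copy contributes no new entropy
```

Here x sits at the lower end of the range, max(n,n'); two genuinely unrelated papers would sit at the upper end, n+n'.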

### Re: Entropy Definition (was Re: passphrases with more than 160 bits of entropy)

Aram Perez [EMAIL PROTECTED] wrote: So, if you folks care to educate me, I have several questions related to entropy and information security (apologies to any physicists):

I'll answer the easier questions. I'll leave the harder ones for someone with a better grounding in information theory.

* What is the relationship between randomness and entropy?

Roughly, they both measure unpredictability. Something that is hard to predict is random, or has high entropy. There are mathematical formulations that make this a lot more precise, but that's the basic idea.

* Does processing an 8 character password with a process similar to PKCS#5 increase the entropy of the password?

Absolutely not! At best, you preserve the original entropy, just distributing it differently. If you get the processing wrong, you can reduce or even entirely destroy it, but no amount of any kind of processing can increase it.

* Can you add or increase entropy?

You can add more entropy, either from another source or more from the same source. That is the only way to increase it.

-- Sandy Harris Zhuhai, Guangdong, China

### Re: Entropy Definition (was Re: passphrases with more than 160 bits of entropy)

From: Jack Lloyd [EMAIL PROTECTED] Sent: Mar 22, 2006 11:30 PM To: cryptography@metzdowd.com Subject: Re: Entropy Definition (was Re: passphrases with more than 160 bits of entropy) ...

As an aside, this whole discussion is confused by the fact that there are a bunch of different domains in which entropy is defined. The algorithmic information theory sense of entropy (how long is the shortest program that produces this sequence?) is miles away from the information theory sense of entropy, and even that has a bunch of different flavors.

For the information theory meanings of entropy, what you're really talking about is a property of a probability distribution. You can do that in terms of Shannon entropy if you want to know about bounds on average bandwidth requirements for transmitting symbols from that distribution. You can look at guessing entropy if you want to know lg(expected work to guess the next symbol). You can look at min-entropy if you want a hard bound on how many symbols you need to sample to derive a 128-bit key that won't ever expose you to weaknesses based on how you selected it. And so on.

Shannon entropy is the one most people know, but it's all wrong for deciding how many samples you need to derive a key. The kind of classic illustration of this is the probability distribution: 0 occurs with probability 1/2; each other number from 1 to 2^{160} happens with probability 2^{-161}. The Shannon entropy on this distribution is 81.5 bits. But if you tried to sample it once to generate an 80-bit Skipjack key, half the time, I'd guess your key on my first try.

... Yes. Let's say the contents of tomorrow's NY Times has n bits of entropy (we probably can't actually calculate n, but we know it is a finite and fixed value). And the LA Times from the same day will also have some amount of entropy (call it n').
However, the entropy of the two papers combined would (probably) not be n+n', but some number x such that max(n,n') <= x <= n+n', because the two papers will report on many of the same issues, carry some of the same AP stories, and so forth. This redundant information doesn't increase the entropy (reading the same AP story in a second newspaper wouldn't give you any new information).

Right. If the probabilities are independent, you can add the entropies. That's why they're defined in log terms.

--John
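Kelsey's closing remark, that independence makes entropies add because the log turns products into sums, can be verified directly. A quick sketch with two independent toy sources:

```python
import math
from itertools import product

def H(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

p_x = [0.5, 0.5]
p_y = [0.25, 0.75]
joint = [a * b for a, b in product(p_x, p_y)]  # independence: p(x, y) = p(x) p(y)
print(H(joint), H(p_x) + H(p_y))  # equal: entropies add for independent sources
```

Replace the product distribution with any correlated joint distribution and H(joint) drops below H(p_x) + H(p_y), which is the newspapers case.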

### Re: Entropy Definition (was Re: passphrases with more than 160 bits of entropy)

John Kelsey wrote: As an aside, this whole discussion is confused by the fact that there are a bunch of different domains in which entropy is defined. The algorithmic information theory sense of entropy (how long is the shortest program that produces this sequence?) is miles away from the information theory sense of entropy, and even that has a bunch of different flavors.

I would have said almost the opposite. The great power and beauty of the entropy idea is that it has the *same* meaning across many domains. Entropy is defined in terms of probability, period. Any other definition is either a) exactly equivalent, b) an approximation, valid under this-or-that restrictive condition, or c) wrong. With some slight fiddling to get the normalization right, 1/2 raised to the power of (program length) defines a probability measure. This may not be the probability you want, but it is a probability, and you can plug it into the entropy definition.

The problem is almost never understanding the definition of entropy. The problem is almost always ascertaining what is the correct probability measure applicable to the problem at hand. Don't get me started about universal probability measures. That has an arcane, narrow technical meaning that is verrry widely misunderstood.

[John Kelsey wrote:] 0 occurs with probability 1/2; each other number from 1 to 2^{160} happens with probability 2^{-161}. The Shannon entropy on this distribution is 81.5 bits.

I think it is very close to 81 bits, not 81.5, but that is a minor point that doesn't change the conclusion:

[John Kelsey wrote:] But if you tried to sample it once to generate an 80-bit Skipjack key, half the time, I'd guess your key on my first try.

Yeah, but if I sample it N times, with high probability I can generate a large number of very good keys. This problem is faced by (and solved by) any decent TRNG.

### Re: Entropy Definition (was Re: passphrases with more than 160 bits of entropy)

At 22:09 -0500 2006/03/22, John Denker wrote:
> Aram Perez wrote:
>> * Can you add or increase entropy?
> Shuffling a deck of cards increases the entropy of the deck.

As a minor nit, shuffling *in an unpredictable manner* adds entropy, because extra randomness is being brought into the process. If I were one of those people who can do a perfect riffle shuffle, reordering the cards in this entirely predictable manner would neither increase nor decrease the existing entropy.

So in one sense, the answer is a simple no: nothing you can do to a passphrase can increase its (that is, the passphrase's) entropy. You can add randomness from another source and increase the total entropy, but I don't think that is relevant to the original question.

Greg.

### Re: Entropy Definition (was Re: passphrases with more than 160 bits of entropy)

From: John Denker [EMAIL PROTECTED]
Sent: Mar 23, 2006 1:44 PM
To: John Kelsey [EMAIL PROTECTED], cryptography@metzdowd.com
Subject: Re: Entropy Definition (was Re: passphrases with more than 160 bits of entropy)

> ... With some slight fiddling to get the normalization right, 1/2 raised to the power of (program length) defines a probability measure. This may not be the probability you want, but it is a probability, and you can plug it into the entropy definition.

No, this isn't right for the algorithmic information theory meaning at all. For that measure, we can intelligently discuss the entropy of a specific random string, without reference to a probability model. Indeed, what's the probability distribution of the sequence of bits defined by Chaitin's Omega? You can certainly complain that they should have used a different term than entropy here, but you're not going to get these to be the same.

> The problem is almost never understanding the definition of entropy. The problem is almost always ascertaining what is the correct probability measure applicable to the problem at hand.

Well, you need to decide which of the probability-distribution-based entropy measures to use, and that differs between applications. If you use min-entropy or guessing entropy to estimate the limits on how well some sequence of symbols will compress, you'll get a pretty lousy answer. The same goes for using Shannon entropy to determine whether you have collected enough entropy in your pool to generate a 128-bit AES key.

...

> > 0 occurs with probability 1/2; each other number from 1 to 2^{160}+1 happens with probability 2^{-161}. The Shannon entropy on this distribution is 81.5 bits.
>
> I think it is very close to 81 bits, not 81.5, but that is a minor point that doesn't change the conclusion:
>
> > But if you tried to sample it once to generate an 80-bit Skipjack key, half the time, I'd guess your key on my first try.
>
> Yeah, but if I sample it N times, with high probability I can generate a large number of very good keys. This problem is faced by (and solved by) any decent TRNG.

Hmmm. I've seen plenty of people get this wrong. If you use Shannon entropy as your measure, and then say "when I have collected 128 bits of Shannon entropy in my pool, I can generate a 128-bit AES key," you will generate keys that aren't as secure as you think they are.

Now, most TRNGs seem to evade this and many other problems by designing the hardware to give a relatively simple probability model. If your probability model is close to uniform, then all these probability-distribution-based entropy measurements converge (up to a constant difference).

In the case above, we had better specify N. If you sample it 16 times, then you have a 2^{-16} chance of still being in a weak state. Once you get enough samples that the probability of being in the pathological worst case is negligible (whatever that means in your application), then you can start generating output bits. That's probably somewhere between N=32 and N=64.

--John
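Kelsey's back-of-envelope for N can be sketched concretely (my numbers, not from the thread): with the distribution above, every sample independently lands on the heavy 0 outcome with probability 1/2, so the chance that the pool still contains nothing unpredictable after N samples is 2^{-N}:

```python
def weak_state_prob(n_samples, p_heavy=0.5):
    """Probability that every one of n_samples draws hits the single
    heavy outcome (probability p_heavy each), i.e. the pool is still
    in the pathological all-zeros state."""
    return p_heavy ** n_samples

for n in (16, 32, 64):
    print(n, weak_state_prob(n))  # 2^-16, 2^-32, 2^-64
```

Whether 2^{-32} or 2^{-64} counts as "negligible" is the application-specific judgment call the post is pointing at.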

### Re: Entropy Definition (was Re: passphrases with more than 160 bits of entropy)

I wrote:
> With some slight fiddling to get the normalization right, 1/2 raised to the power of (program length) defines a probability measure. This may not be the probability you want, but it is a probability, and you can plug it into the entropy definition.

John Kelsey wrote:
> No, this isn't right for the algorithmic information theory meaning at all.

OK, in a moment we will have gone through four plies of no-it-isn't / yes-it-is / no-it-isn't / yes-it-is. Let's get serious.

The axiomatic definition of a measure is:
 -- a mapping from sets to numbers
 -- positive
 -- additive on the countable union of disjoint sets
And a probability measure has the further property of being:
 -- bounded above

I have checked that -- with due attention to trivial details -- .5 ^ (program length) satisfies this definition. If you wish to renew the assertion that there is no such probability measure, please explain which of the axiomatic requirements is not met. Please be specific.

> For that measure, we can intelligently discuss the entropy of a specific random string, without reference to a probability model.

That's like saying we can talk about three-dimensional force, velocity, and acceleration without reference to vectors. Measure theory is the tried-and-true formalism for dealing with random strings. It would be spectacularly contrary to ignore the formalism, and just plain wrong to say the formalism is inapplicable.

> Indeed, what's the probability distribution of the sequence of bits defined by Chaitin's Omega?

Huh? Omega is so obviously a probability that usually the word "probability" is used in its definition. See e.g. http://mathworld.wolfram.com/ChaitinsConstant.html

I suppose a masochistic nitpicker could demand to see a proof that this word is justified, but I'm not going to bother, for reasons given above.
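The "slight fiddling to get the normalization right" is essentially the Kraft inequality: for a prefix-free set of programs, the weights 2^-(length) sum to at most 1, so they can be normalized into a probability measure. A minimal sketch with an invented four-string prefix-free code (my toy example, not from the thread):

```python
# Toy prefix-free set of "programs": no string is a prefix of another.
programs = ["0", "10", "110", "111"]
assert not any(a != b and b.startswith(a) for a in programs for b in programs)

# Kraft inequality: for a prefix-free code, sum of 2^-len <= 1.
kraft_sum = sum(2.0 ** -len(p) for p in programs)
print(kraft_sum)  # 1.0 -- this particular code happens to be complete

# Dividing each weight by kraft_sum yields a genuine probability measure,
# which is the normalization step Denker alludes to.
probs = {p: (2.0 ** -len(p)) / kraft_sum for p in programs}
assert abs(sum(probs.values()) - 1.0) < 1e-12
```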

### Re: Entropy Definition (was Re: passphrases with more than 160 bits of entropy)

This is getting pretty far afield from cryptography, but it is a topic I find very interesting, so I can't resist jumping in.

John Denker writes:
> OK, in a moment we will have gone through four plies of no-it-isn't / yes-it-is / no-it-isn't / yes-it-is. Let's get serious. The axiomatic definition of a measure is: a mapping from sets to numbers; positive; additive on the countable union of disjoint sets. And a probability measure has the further property of being bounded above. I have checked that -- with due attention to trivial details -- .5 ^ (program length) satisfies this definition. If you wish to renew the assertion that there is no such probability measure, please explain which of the axiomatic requirements is not met. Please be specific.

This is true; in fact it is sometimes called the universal distribution or universal measure. In more detail, it is a distribution over all finite-length strings. The measure for a particular string X is defined as the sum, over all programs that output X, of 1/2^L_i, where L_i is the length of each such program.

Often the algorithmic complexity of a string is defined as the length of the shortest program that outputs the string. The universal measure is based on the same idea, but takes into consideration that there may be multiple programs that output the same string. Each program of length L_i contributes 1/2^L_i to the string's measure. If there is only one short program and all others are much longer, then the probability measure is essentially 1/2^C, where C is the length of this shortest program, i.e. the algorithmic complexity.

The universal measure for a string can also be thought of as the probability that, given a fixed Universal Turing Machine (UTM), a randomly chosen input program will output that string. So this is clearly a probability distribution (with some technicalities regarding program lengths glossed over here), as John Denker says.

However, to go from this to a notion of entropy is more questionable. There are a countably infinite number of finite strings, and all of them have non-zero probabilities under this distribution. This means that for most strings the probability must be very low, asymptotically approaching zero. In fact you can show that for most strings of length n, their measure is 1/2^n; this is equivalent to saying that most strings are effectively random and have no shorter program to output them.

Shannon entropy is defined over a probability distribution. That is, given a probability distribution, we can find its Shannon entropy by summing -p_i log2(p_i) over all members of the distribution. If we approximate the universal distribution by 1/2^n for strings of length n, this sum clearly diverges. So there is not really a meaningful Shannon entropy for this probability distribution; in other words, the entropy is infinite.

John Kelsey asked:
> Indeed, what's the probability distribution of the sequence of bits defined by Chaitin's Omega?

This probability distribution is defined only over finite strings, and so Omega is not within the universe of this distribution. It should also be noted that it is impossible for an n-bit program to output more than n bits of Omega (plus or minus an additive constant specific to the UTM). Hence even if we consider successive approximations to Omega of ever-increasing length, their measures would tend asymptotically to zero.

Hal Finney
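The divergence Finney describes can be made concrete. Under the 1/2^n approximation, the 2^n strings of length n together contribute 2^n * (2^-n * n) = n bits to the entropy sum, so the partial sums over lengths 1..N grow like N(N+1)/2. A sketch (mine, assuming the approximation stated in the post):

```python
def partial_entropy(max_len):
    """Partial Shannon entropy sum over string lengths 1..max_len,
    approximating each of the 2^n strings of length n as having
    measure 2^-n, so -p log2(p) = 2^-n * n per string."""
    return sum((2 ** n) * (2.0 ** -n) * n for n in range(1, max_len + 1))

for N in (10, 100, 1000):
    print(N, partial_entropy(N))  # 55.0, 5050.0, 500500.0 -- unbounded growth
```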

### Entropy Definition (was Re: passphrases with more than 160 bits of entropy)

On Mar 22, 2006, at 2:05 PM, Perry E. Metzger wrote:
> Victor Duchovni [EMAIL PROTECTED] writes:
>> Actually calculating the entropy for real-world functions and generators may be intractable...
>
> It is, in fact, generally intractable.
> 1) Kolmogorov-Chaitin entropy is just plain intractable -- finding the smallest possible Turing machine to generate a sequence is not computable.
> 2) Shannon entropy requires a precise knowledge of the probability of all symbols, and in any real world situation that, too, is impossible.

I'm not a cryptographer nor a mathematician, so I stand duly corrected/chastised ;-) So, if you folks care to educate me, I have several questions related to entropy and information security (apologies to any physicists):

* How do you measure entropy? I was under the (false) impression that Shannon gave a formula that measured the entropy of a message (or information stream).
* Can you measure the entropy of a random oracle? Or is that what both Victor and Perry are saying is intractable?
* Are there units of entropy?
* What is the relationship between randomness and entropy?
* (Apologies to the original poster) When the original poster requested passphrases with more than 160 bits of entropy, what was he requesting?
* Does processing an 8 character password with a process similar to PKCS#5 increase the entropy of the password?
* Can you add or increase entropy?

Thanks in advance,
Aram Perez
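For what it's worth, the formula the first question half-remembers is Shannon's H = -sum p_i log2(p_i), whose units are bits when the logarithm is base 2. A minimal sketch (mine, not from the thread), applied to distributions like the biased coin discussed earlier:

```python
import math

def shannon_entropy(probs):
    """Shannon entropy H = -sum p_i log2(p_i), in bits.
    Zero-probability outcomes contribute nothing and are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))  # 1.0 bit: a fair coin flip
print(shannon_entropy([0.9, 0.1]))  # ~0.47 bits: the biased coin from the thread
```

Note that this answers "how much entropy does this *distribution* have," not "how much entropy does this *observed string* have" -- the distinction the rest of the thread is arguing about.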