### Re: Entropy Definition (was Re: passphrases with more than 160 bits of entropy)

On Sat, Mar 25, 2006 at 07:26:51PM -0500, John Denker wrote:
Executive summary:  Small samples do not always exhibit average behavior.

That's not the whole problem - you have to be looking at the right
average too.

If you care about the long-run encodability of a stream of IID symbols
emitted with probabilities p_i, the relevant average is the Shannon
entropy.  If you're interested in the mean number of guesses (per symbol)
required to guess a long word formed from these symbols, then you should
be looking at (\sum_i \sqrt{p_i})^2.  Other metrics (min-entropy, work
factor, ...) require other averages.

To see this behaviour, you need both a large sample and the right
type of average for your problem (and I've assumed IID).
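To make the contrast between these averages concrete, here is a short Python sketch (the skewed distribution is made up purely for illustration):

```python
import math

def shannon_entropy(p):
    """The average governing long-run encodability (bits per symbol)."""
    return sum(-pi * math.log2(pi) for pi in p if pi > 0)

def guessing_average(p):
    """(sum_i sqrt(p_i))^2 -- the average governing the mean number of
    guesses per symbol for long words of IID symbols."""
    return sum(math.sqrt(pi) for pi in p) ** 2

# A made-up skewed distribution: the two averages tell different stories.
p = [0.5] + [0.5 / 1000] * 1000
print(shannon_entropy(p))    # ~5.98 bits
print(guessing_average(p))   # ~532, far below the uniform case's 1001
```

The point of the sketch is just that the two quantities diverge sharply on skewed distributions, so picking the wrong average gives the wrong answer.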

David.

-
The Cryptography Mailing List
Unsubscribe by sending unsubscribe cryptography to [EMAIL PROTECTED]



### Re: Entropy Definition (was Re: passphrases with more than 160 bits of entropy)

From: John Denker [EMAIL PROTECTED]
Sent: Mar 24, 2006 11:57 AM
To: Erik Zenner [EMAIL PROTECTED], cryptography@metzdowd.com
Subject: Re: Entropy Definition (was Re: passphrases with more than 160 bits
of entropy)

Erik Zenner wrote:

0 occurs with probability 1/2
each other number from 1 to 2^{160} happens with
probability 2^{-161}.

Is anyone aware of whether (and where) this was
discussed in the literature, or what other approaches are taken?

This particular problem is contrived or at least exaggerated.

The example is a contrived, pathological case.  And there are a bunch
of easy solutions once you know this is the distribution.  The point
is to demonstrate that Shannon entropy gives you the wrong answer when
you try to use it here.

Now, you might ask if this is a problem in a real-world setting.  So
let's imagine a very simple distribution--sequences of flips of a
biased coin.  There are nice ways to remove the bias, but let's
imagine we're not doing that--instead, we're going to take a sequence
of coin flips, turn it into a 0/1 sequence, and then hash it down to
get a key.

Suppose the coin is biased so that heads comes up 0.9 of the time, and
that we generate 16-bit sequences at a time.  The Shannon entropy of a
16-bit sequence is about 7.5, but the most likely symbol (all heads)
comes up about 18% of the time.  So if we tried to generate a 7-bit
key, we'd lose on the attacker's first guess 18% of the time.

So, let's go further with this.  We want to generate a DES key, with
56 bits of entropy.  A 64-bit sequence produced with this biased coin
has Shannon entropy of about 30 bits, and an attacker has about a
1/1000 chance of guessing the DES key we derive from it on his first
try (all heads comes up with probability 0.9^64, roughly 1/850), which
is unacceptable for just about any crypto application.  (The
min-entropy is about ten bits.)
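These numbers can be rechecked directly; note in particular that 64 flips of the 0.9-biased coin carry about 30 bits of Shannon entropy but under 10 bits of min-entropy:

```python
import math

def H(p):
    """Shannon entropy (bits) of one biased coin flip."""
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

p = 0.9                     # probability of heads
print(16 * H(p))            # ~7.5 bits: Shannon entropy of a 16-bit sequence
print(p ** 16)              # ~0.185: probability of the most likely symbol
print(64 * H(p))            # ~30 bits: Shannon entropy of a 64-bit sequence
print(-64 * math.log2(p))   # ~9.7 bits: min-entropy of the 64-bit sequence
print(p ** 64)              # ~0.0012: attacker's first-guess success, ~1/850
```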

So yes, I used a contrived example to demonstrate the problem, but no,
the problem isn't just an ivory-tower concern.

The intuition here is that Shannon entropy is concerned with
communications channels, where we assume we have to send every
symbol.  So when you have lots of low-probability symbols, and one of
those low-probability symbols is chosen 999 times out of 1000, the
amount of bandwidth you need to transmit those symbols easily becomes
dominated by them.  Most of the time, you're sending one of a large
set of symbols, and they all have to be distinguished from one
another.

The situation for the attacker is a lot more like having a channel
where it's acceptable to only send a small fraction of those symbols,
and just drop the rest.  That is, it's okay for the attacker's model
to just ignore the huge set of low-probability symbols that occur
999/1000 of the time, and instead just transmit the highest
probability symbol 1/1000 of the time.  Instead of transmitting it,
he's just guessing it.  When he gets it right, he learns your DES
key.

...
This version serves to illustrate, in an exaggerated way, the necessity
of not assuming that the raw data words are IID (independent and identically
distributed).

Yes!  The independent part is much harder to deal with than the
per-symbol distribution, in many real-world applications.  The worst
of these are operating system events (used in Windows and various
Unixes to get random bits).  Peter Gutmann did some really nice work
on this on a bunch of operating systems in a Usenix paper, and he
updated that and has a version of it on his website.  If you're doing
OS-event sampling to generate random bits, you really ought to look at
his stuff.  (But if you can get some hardware support, either directly
or via sampling the microphone like the Turbid design does, you're
probably on much firmer ground.)

--John Kelsey, NIST




### Re: Entropy Definition (was Re: passphrases with more than 160 bits of entropy)


In the context of

0 occurs with probability 1/2
each other number from 1 to 2^{160} happens with
probability 2^{-161}.

I wrote:

This ... serves to illustrate, in an exaggerated way, the necessity
of not assuming that the raw data words are IID (independent and identically
distributed).

Correction:  IID isn't the issue here.  The raw data words described
above are IID.  That's not the problem.  The problem is that the
distribution is highly skewed.

Executive summary:  Small samples do not always exhibit average behavior.

Let me explain.  There is a very simple way to look at this.

Consider the expression for the entropy of the source,
S := Sum_i P_i log(1/P_i)  

where i runs over all symbols in the alphabet.

One may, with a little care, attribute log(1/P_i) bits of unpredictability
to the (i)th symbol in the alphabet.  Then S can be interpreted as the
appropriately weighted _average_ unpredictability per symbol.

In the example quoted above, the minimum log(1/P_i) is vastly smaller than
the average log(1/P_i) -- namely 1 versus 161.  So focussing on the average
is unlikely to solve all the world's problems.

In crypto applications (including RNG construction), a crude yet provably
correct approach is to rely on the minimum (not the average) per-symbol
unpredictability.  Using this approach, it would require 80 samples of the
given distribution to produce an output with 80 bits of entropy.
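This bookkeeping is easy to check; a quick Python sketch of the source described above:

```python
import math

# The source: symbol 0 with probability 1/2,
# each of 2^160 other symbols with probability 2^-161.
p0, p_rest, n_rest = 0.5, 2.0 ** -161, 2 ** 160

surprisal_0 = math.log2(1 / p0)          # 1 bit of unpredictability
surprisal_rest = -math.log2(p_rest)      # 161 bits
S = p0 * surprisal_0 + (n_rest * p_rest) * surprisal_rest
print(S)                                 # 81.0: the weighted average

# Crude but provably correct: budget only the minimum per-symbol surprisal.
samples_needed = math.ceil(80 / min(surprisal_0, surprisal_rest))
print(samples_needed)                    # 80 samples for 80 bits
```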

Things can only get better from here:
-- With full knowledge of the source statistics, one can distinguish the
large log(1/P_i) words from the small log(1/P_i) words.  This would allow
the number of required samples to be closer to the typical value (2) than
to the worst-case value (80).  An example of this is the Huffman coding I
discussed in my previous note.
-- With even modest knowledge of the given source, one can (by histogramming
if nothing else) estimate the probabilities well enough to yield worthwhile
improvements in efficiency.
-- If you want to produce a long sequence of output words, as you often do, a
reasonable-sized buffer is all you really need to come fairly close to optimal
efficiency, namely only about 1 sample of the distribution, on average, per
80-bit output word.  The chance of getting fooled can be made verrry small,
at a modest cost in terms of buffer-size and extra samples of the
distribution.  That is, the law of large numbers comes to your rescue sooner
or later, usually rather soon.
-- It may be possible to engineer a different source with a larger minimum
log(1/P_i).

Bottom line:  Setting up a highly-skewed distribution and then drawing from
it only a single sample is not guaranteed to produce average behavior.  Duh,
no surprise there!

The source entropy S given by the equation above is a single number that
characterizes the average behavior.  For a small sample from a highly-skewed
distribution, S doesn't tell you everything you need to know.  This has no
bearing on the definition of entropy;  entropy is defined by that equation.
It just means that the equation by itself doesn't solve all the world's
problems.




### RE: Entropy Definition (was Re: passphrases with more than 160 bits of entropy)

From: Erik Zenner [EMAIL PROTECTED]
Sent: Mar 24, 2006 4:14 AM
To: cryptography@metzdowd.com
Subject: RE: Entropy Definition (was Re: passphrases with more than 160 bits
of entropy)

...
[I wrote:]
0 occurs with probability 1/2
each other number from 1 to 2^{160} happens with
probability 2^{-161}.

The Shannon entropy on this distribution is 81.5 bits.  But
if you tried to sample it once to generate an 80-bit Skipjack
key, half the time, I'd guess your key on my first try.

It's entirely correct that entropy is the wrong measure here, but
the question is what a good measure would look like.

There are a bunch of different entropy measures.  Shannon entropy is
the right one for compression ratios and channel capacity, but not for
difficulty of guessing the string.

Assume that you have a sample space with 2^N elements and an intelligent
attacker (i.e., one that tries the most probable elements first). Then
what you actually are interested in is that the attacker's probability
of success after q sampling attempts is as close as possible to the
lowest possible, namely q * 2^{-N}. A natural way of measuring this
seems to be some kind of distance between Pr[succ after q samples] and
the ideal function q * 2^{-N}. Such a measure might allow a designer
to decide whether a non-perfect distribution is still acceptable or
simply far out. Is anyone aware of whether (and where) this was
discussed in the literature, or what other approaches are taken?

The best discussion I've seen of this is in a couple papers by John
Pliam, about something he calls the work function, W(n).  This is just
the probability of having guessed some secret value after n operations
(typically n guesses). You can generalize this to the number of
operations you have to carry out to have a given probability of
violating any security property (repeating a nonce, getting a block
collision in a block cipher chaining mode, etc).  It's a very general
way of thinking about limiting parameters of crypto algorithms.
You're basically heading toward this idea in what you said above.

Let's think of this for an ideal case: a 128-bit key.  When you have
done 2^k guesses of the key, you have a 2^{k-128} chance of having
guessed the key correctly.  So if you graphed the work vs. probability
of success on a log/log chart, you'd get a straight line for the ideal
case: W(n) = 2^{-128} n, as you said above.

Now, consider the case where you are generating a 128-bit key from a
bunch of sampled events on your computer that have been concatenated
together in a string S.  Imagine making a list of all the possible
values of that string, and sorting them in descending order of
probability.  You now have the best possible sequence of guesses.
W(n) is the sum of the first n probabilities.

If W(n) > 2^{-128} n for some n, then the attacker has some point where
guessing your string S is easier than brute-forcing an ideal 128-bit key.
So, this is where min-entropy comes in.  Min-entropy is just
-lg(P[max]), the base 2 log of the maximum probability in the
distribution.  You can convince yourself that if the min-entropy of S
is at least 128, then it's never easier to guess S than to guess K.
This is because each possible input string must have probability no
greater than 2^{-128}, so the sum of the first n satisfies
W(n) <= n 2^{-128}.
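A minimal Python sketch of the work-function idea, using a made-up 4-bit distribution rather than a real key source:

```python
import math

def work_function(probs, n):
    """Pliam's work function W(n): the best attacker's probability of
    success after n guesses, i.e. the sum of the n largest probabilities."""
    return sum(sorted(probs, reverse=True)[:n])

# A made-up 4-bit "key material" distribution with one overweighted value.
probs = [0.5] + [0.5 / 15] * 15

min_entropy = -math.log2(max(probs))   # -lg(P_max) = 1 bit here
ideal = 1 * 2.0 ** -4                  # ideal 4-bit secret: W(1) = 2^-4
print(work_function(probs, 1), ideal)  # 0.5 vs 0.0625: far from ideal
```

Only the largest probability matters for the min-entropy line, which is why it bounds the whole W(n) curve.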

This doesn't solve the much harder engineering/data analysis problem
of getting some reasonable approximation for the probabilities of S,
unfortunately.  The easy way to solve that is to design a hardware RNG
in such a way that you pretty-much know the probability model to use
and can work out how to sample it to get a good estimate of the
probabilities.  However, it is kind of nice to recognize that you only
have to estimate the largest probability to compute the min-entropy.
(It ought to be the easiest probability to measure and bound!)

Erik

--
Dr. Erik Zenner        Phone:  +45 39 17 96 06     Cryptico A/S
Chief Cryptographer    Mobile: +45 60 77 95 41     Fruebjergvej 3
[EMAIL PROTECTED]      www.cryptico.com            DK 2100 Copenhagen

--John Kelsey, NIST




### Re: Entropy Definition (was Re: passphrases with more than 160 bits of entropy)


Hal Finney wrote:
...

This is true, in fact it is sometimes called the universal distribution
or universal measure.  In more detail, it is a distribution over all
finite-length strings.  The measure for a particular string X is defined
as the sum over all programs that output X of 1/2^L_i, where L_i is the
length of each such program.

There is no such thing as *the* universal measure;  rather
there are lots of universal measures.  Universality is a
rather arcane property of the measure, and the term doesn't mean
what most people think it means.
-- Universal does NOT mean all-purpose.
-- Universal does NOT mean general-purpose.
-- There are many inequivalent universal distributions, most
of which are not what you want for any given application.
-- Certain things that are true asymptotically are not true
for _any_ practical application.
-- Ratio converging to 1 does not imply difference converging to 0.

This is probably not the right forum for cleaning out this Augean stable.

...

The universal measure for a string can also be thought of as the
probability that, given a fixed Universal Turing Machine (UTM), a randomly
chosen input program will output that string.

So this is clearly a probability distribution (with some technicalities
regarding issues of program lengths being glossed over here) as John
Denker says.  However to go from this to a notion of entropy is more
questionable.

Not really questionable.  If you have a probability, you have an
entropy.

Shannon entropy is defined over a probability distribution.  That is,
given a probability distribution we can find its Shannon entropy by
summing -p_i log2(p_i) over all members of the distribution.  If we
approximate the universal distribution by 1/2^n for strings of length
n, this sum is clearly divergent.  So there is not really a meaningful
Shannon entropy for this probability distribution, or in other words
the entropy is infinite.

The entropy _per string_ is unbounded, as it jolly well should be for
random strings of unbounded length.  That's true but uninteresting.
A more interesting quantity is the _entropy density_ aka the entropy
_per symbol_ which for typical long random bit-strings is one bit of
entropy per bit of string-length.  Similarly if you have a string of
symbols over a 32-symbol alphabet, you would expect to see five bits
of entropy per symbol for a typical long random string.
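A quick Python check of the 32-symbol claim:

```python
import math

# A uniformly random symbol from a 32-symbol alphabet carries
# log2(32) = 5 bits, so a typical long random string over that
# alphabet shows 5 bits of entropy per symbol.
p = [1.0 / 32] * 32
entropy_per_symbol = sum(-pi * math.log2(pi) for pi in p)
print(entropy_per_symbol)   # 5.0
```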




### RE: Entropy Definition (was Re: passphrases with more than 160 bits of entropy)


Shannon entropy is the one most people know, but it's all
wrong for deciding how many samples you need to derive a key.
The classic illustration of this is the probability
distribution:

0 occurs with probability 1/2
each other number from 1 to 2^{160} happens with
probability 2^{-161}.

The Shannon entropy on this distribution is 81.5 bits.  But
if you tried to sample it once to generate an 80-bit Skipjack
key, half the time, I'd guess your key on my first try.

It's entirely correct that entropy is the wrong measure here, but
the question is what a good measure would look like.

Assume that you have a sample space with 2^N elements and an intelligent
attacker (i.e., one that tries the most probable elements first). Then
what you actually are interested in is that the attacker's probability
of success after q sampling attempts is as close as possible to the
lowest possible, namely q * 2^{-N}. A natural way of measuring this
seems to be some kind of distance between Pr[succ after q samples] and
the ideal function q * 2^{-N}. Such a measure might allow a designer
to decide whether a non-perfect distribution is still acceptable or
simply far out. Is anyone aware of whether (and where) this was
discussed in the literature, or what other approaches are taken?

Erik

--
Dr. Erik Zenner        Phone:  +45 39 17 96 06     Cryptico A/S
Chief Cryptographer    Mobile: +45 60 77 95 41     Fruebjergvej 3
[EMAIL PROTECTED]      www.cryptico.com            DK 2100 Copenhagen





### Re: Entropy Definition (was Re: passphrases with more than 160 bits of entropy)


Someone mentioned Physics in this discussion, and that motivated me
to point out something that seems to have been overlooked by Shannon,
Kolmogorov, and Chaitin.

Even though Shannon's data entropy formula looks like an
absolute measure (there is no reference included), the often
confusing fact is that it does depend on a reference. The
reference is the probability model that you assume to fit
the data ensemble. You can have the same data ensemble and
many different (infinite) probability models that fit that
data ensemble, each one giving you a valid but different
entropy value. For example, if a source sends the number 1
1,000 times in a row, what would be the source's entropy?

Aram's assertion that the sequence of bytes from 1-256 has
maximum entropy would be right if that sequence came as one of
the possible outcomes of a neutron counter with a 256-byte
register. Someone's assertion that any data has entropy X
can be countered by finding a different probability model that
also fits the data, even if the entropy is higher (!). In short,
a data entropy value involves an arbitrary constant.

The situation, which seems confusing, improves when we realize
that only differences in data entropy can be actually measured,
when the arbitrary constant can be canceled -- if we are careful.

In practice, because data security studies usually (and often
wrongly!) suppose a closed system, only difference states of a
single system are ever considered, more or less automatically.
Under such circumstances, the probability model is well-defined
and the arbitrary constant *always* cancels. However, data systems
are not really closed, and probability models are not always ergodic
or even accurate. Therefore, due care must be exercised when
using data entropy.

I don't want to go into too much detail here (the results will be
available elsewhere), but it is useful to take a brief look at
Physics.

In Physics (Thermodynamics), entropy is a potential [1].
As is usual for a potential, only *differences* in entropy
between different states can be measured. Since the entropy
is a potential, it is associated with a *state*, not with
a process. That is, it is possible to determine the entropy
difference regardless of the actual process which the system
may have performed, even whether the process was reversible or
not.

These are quite general properties. What I'm suggesting is
that the idea that entropy depends on a reference also applies
to data entropy, not just the entropy of a fluid, and it solves
the apparent contradictions (often somewhat acid) found in data
entropy discussions. It also explains why data entropy seems
confusing and contradictory to use. It may actually be a much
more powerful tool for data security than currently used.

Cheers,
Ed Gerck

[1] For example, J. Kestin, A Course in Thermodynamics, Blaisdell, 1966.




### Re: Entropy Definition (was Re: passphrases with more than 160 bits of entropy)


Ed Gerck wrote:

In Physics, Thermodynamics, entropy is a potential.

That's true in classical (19th-century) thermodynamics, but not
true in modern physics, including statistical mechanics.  The
existence of superconductors and superfluids removes all doubt
about the absolute zero of entropy.  For details, see
http://www.av8n.com/physics/thermo-laws.htm#sec-spectator
http://www.av8n.com/physics/thermo-laws.htm#sec-secret-s

As is usual for a potential, only *differences* in entropy
between different states can be measured.

Not true.

These are quite general properties.

They are neither general nor relevant to crypto.




### Re: Entropy Definition (was Re: passphrases with more than 160 bits of entropy)


Aram Perez wrote:

* How do you measure entropy? I was under the (false) impression that
Shannon gave a formula that measured the entropy of a message (or
information stream).

Entropy is defined in terms of probability.  It is a measure of
how much you don't know about the situation.  If by message
you mean a particular message that you are looking at, it has
zero entropy.  It is what it is;  no probability required.

If by stream you mean a somewhat abstract notion of an ensemble
of messages or symbols that you can only imperfectly predict, then
it has some nonzero entropy.

* Can you measure the entropy of a random oracle? Or is that what  both
Victor and Perry are saying is intractable?

That is a tricky question for a couple of reasons.
a) It will never be completely answerable because the question hinges
on the word random, which means different things to different people.
Thoughtful experts use the word in multiple inconsistent ways.
b) It also depends on just what you mean by measure.  Often it
is possible to _ascertain_ the entropy of a source ... but direct
empirical measurements of the output are usually not the best way.

* Are there units of entropy?

Yes.  It is naturally _dimensionless_, but dimensionless quantities
often have nontrivial units.  Commonly used units for entropy
include
++ bits
++ joules per kelvin.  One J/K equals 1.04×10^23 bits
For more on this, see
http://www.av8n.com/physics/thermo-laws.htm#sec-s-units
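The J/K-to-bits conversion follows directly from the Boltzmann constant; a quick Python check:

```python
import math

k_B = 1.380649e-23                  # Boltzmann constant, J/K (exact SI value)
bits_per_joule_per_kelvin = 1.0 / (k_B * math.log(2))
print(bits_per_joule_per_kelvin)    # ~1.045e23 bits per J/K
```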

* What is the relationship between randomness and entropy?

They are neither exactly the same nor completely unrelated.
A pseudorandom sequence may be random enough for many
applications, yet has asymptotically zero entropy density.
A sequence of _odd_ bytes is obviously not entirely random,
yet may have considerable entropy density.

* (Apologies to the original poster) When the original poster  requested
passphrases with more than 160 bits of entropy, what was he requesting?

When you apply a mathematical function to an ensemble of inputs, it
is common to find that the ensemble of outputs has less entropy than
the ensemble of inputs. A simple pigeonhole argument suffices to show
that a function whose output is represented in 160 bits cannot possibly
represent more than 160 bits of entropy per output.  So if you want
the ensemble of outputs to have more than 160 bits of entropy, it is
necessary to do something fancier than a single instance of SHA-1.

* Does processing an 8 character password with a process similar to
PKCS#5 increase the entropy of the password?

No.

* Can you add or increase entropy?

Shuffling a deck of cards increases the entropy of the deck.

For more on this, see
http://www.av8n.com/physics/thermo-laws.htm#sec-entropy




### Re: Entropy Definition (was Re: passphrases with more than 160 bits of entropy)

On Wed, Mar 22, 2006 at 03:29:07PM -0800, Aram Perez wrote:

* How do you measure entropy? I was under the (false) impression that
Shannon gave a formula that measured the entropy of a message (or
information stream).

He did give a formula for the entropy of a source; however, the calculation is
based on the probabilities of each symbol appearing. Unless you know those, you
can't actually apply the formula. So it is computable in theory, just not in
practice for any source that is at all interesting.

* Can you measure the entropy of a random oracle? Or is that what
both Victor and Perry are saying is intractable?

A random oracle, by definition, produces completely random output. However,
since random oracles don't actually exist, that does not seem to be a terribly
interesting thing to measure the entropy of.

* Are there units of entropy?

Bits are usually the most intuitive/useful unit.

* What is the relationship between randomness and entropy?

I have a vague feeling this question requires a deeper answer than I'm able to
provide.

* Does processing an 8 character password with a process similar to
PKCS#5 increase the entropy of the password?

No, because there are no additional secrets. Knowledge of the password is all
you need to rederive the final output, thus clearly there is no additional
information (ie, entropy) in the output that was not in the original password.

This ignores the salt, iteration count, and the specification of the algorithm
itself, but those are all (usually) public. So while they contribute to the
entropy, they do not contribute to the conditional entropy, which is what we
are usually interested in when thinking about entropy and crypto.

* Can you add or increase entropy?

Yes. Let's say the contents of tomorrow's NY Times has n bits of entropy (we
probably can't actually calculate n, but we know it is a finite and fixed
value). And the LA Times from the same day will also have some amount of
entropy (call it n'). However, the entropy of the two papers combined would
(probably) not be n+n', but some number x s.t. max(n,n') <= x <= n+n', because
the two papers will report on many of the same issues, carry some of the same
AP stories, and so forth. This redundant information doesn't increase the
entropy (reading the same AP story in a second newspaper wouldn't give you any
new information).
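A toy Python sketch of this bound (the two-newspaper model and the 80% overlap figure are invented for illustration):

```python
import math
from collections import Counter

def H(dist):
    """Shannon entropy (bits) of an {outcome: probability} mapping."""
    return sum(-p * math.log2(p) for p in dist.values() if p > 0)

# Toy model: paper Y reprints paper X's front-page story 80% of the time,
# otherwise picks its own story independently.  (All numbers are made up.)
px = {'storyA': 0.5, 'storyB': 0.5}
joint = Counter()
for x, p in px.items():
    joint[(x, x)] += p * 0.8                  # Y copies X
    for y, q in px.items():
        joint[(x, y)] += p * 0.2 * q          # Y chooses on its own

n = H(px)                 # one paper alone: 1 bit
both = H(joint)           # combined: ~1.47 bits, well short of n + n' = 2
assert n <= both <= n + n                     # max(n,n') <= x <= n+n'
```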

A book you may find interesting is Elements of Information Theory by Cover
Thomas.

-Jack




### Re: Entropy Definition (was Re: passphrases with more than 160 bits of entropy)

Aram Perez [EMAIL PROTECTED] wrote:

So, if you folks care to educate me, I have several questions related
to entropy and information security (apologies to any physicists):

I'll answer the easier questions. I'll leave the harder ones for someone
with a better grounding in information theory.

* What is the relationship between randomness and entropy?

Roughly, they both measure unpredictability. Something that is hard
to predict is random, or has high entropy. There are mathematical
formulations that make this a lot more precise, but that's the basic
idea.

* Does processing an 8 character password with a process similar to
PKCS#5 increase the entropy of the password?

Absolutely not!

At best, you preserve the original entropy, just distributing it
differently. If you get the processing wrong, you can reduce or
even entirely destroy it, but no amount of any kind of processing
can increase it.

* Can you add or increase entropy?

You can add more entropy, either from another source or more
from the same source. That is the only way to increase it.

--
Sandy Harris
Zhuhai, Guangdong, China




### Re: Entropy Definition (was Re: passphrases with more than 160 bits of entropy)

From: Jack Lloyd [EMAIL PROTECTED]
Sent: Mar 22, 2006 11:30 PM
To: cryptography@metzdowd.com
Subject: Re: Entropy Definition (was Re: passphrases with more than 160 bits
of entropy)

...

As an aside, this whole discussion is confused by the fact that there
are a bunch of different domains in which entropy is defined.  The
algorithmic information theory sense of entropy (how long is the
shortest program that produces this sequence?) is miles away from the
information theory sense of entropy, and even that has a bunch of
different flavors.

For the information theory meanings of entropy, what you're really
talking about is a property of a probability distribution.  You can do
that in terms of Shannon entropy if you want to know about bounds on
average bandwidth requirements for transmitting symbols from that
distribution.  You can look at guessing entropy if you want to know
lg(expected work to guess the next symbol).  You can look at
min-entropy if you want a hard bound on how many symbols you need to
sample to derive a 128-bit key that won't ever expose you to
weaknesses based on how you selected it.  And so on.

Shannon entropy is the one most people know, but it's all wrong for
deciding how many samples you need to derive a key.  The classic
illustration of this is the probability distribution:

0 occurs with probability 1/2
each other number from 1 to 2^{160} happens with probability
2^{-161}.

The Shannon entropy on this distribution is 81.5 bits.  But if you
tried to sample it once to generate an 80-bit Skipjack key, half the
time, I'd guess your key on my first try.

...

Yes. Let's say the contents of tomorrow's NY Times has n bits of entropy (we
probably can't actually calculate n, but we know it is a finite and fixed
value). And the LA Times from the same day will also have some amount of
entropy (call it n'). However, the entropy of the two papers combined would
(probably) not be n+n', but some number x s.t. max(n,n') <= x <= n+n', because
the two papers will report on many of the same issues, carry some of the same
AP stories, and so forth. This redundant information doesn't increase the
entropy (reading the same AP story in a second newspaper wouldn't give you any
new information).

Right.  If the probabilities are independent, you can add the
entropies.  That's why they're defined in log terms.
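A quick Python check that entropies add under independence (the marginals are arbitrary toy choices):

```python
import math

def H(p):
    """Shannon entropy (bits) of a probability vector."""
    return sum(-pi * math.log2(pi) for pi in p if pi > 0)

# Toy marginals: a fair coin and a fair four-sided die.
px = [0.5, 0.5]                      # H = 1 bit
py = [0.25, 0.25, 0.25, 0.25]        # H = 2 bits
# Independence: the joint probabilities are products of the marginals.
pjoint = [a * b for a in px for b in py]
assert abs(H(pjoint) - (H(px) + H(py))) < 1e-9   # 1 + 2 = 3 bits
```

This additivity is exactly why entropy is defined in log terms.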

--John




### Re: Entropy Definition (was Re: passphrases with more than 160 bits of entropy)


John Kelsey wrote:

As an aside, this whole discussion is confused by the fact that there
are a bunch of different domains in which entropy is defined.  The
algorithmic information theory sense of entropy (how long is the
shortest program that produces this sequence?) is miles away from the
information theory sense of entropy, and even that has a bunch of
different flavors.

I would have said almost the opposite.  The great power and beauty
of the entropy idea is that it has the *same* meaning across many
domains.  Entropy is defined in terms of probability, period.  Any
other definition is either
a) exactly equivalent,
b) an approximation, valid under this-or-that restrictive conditions, or
c) wrong.

With some slight fiddling to get the normalization right, 1/2
raised to the power of (program length) defines a probability
measure.  This may not be the probability you want, but it
is a probability, and you can plug it into the entropy definition.

The problem is almost never understanding the definition of
entropy.  The problem is almost always ascertaining what is
the correct probability measure applicable to the problem
at hand.

Don't get me started about universal probability measures.
That has an arcane narrow technical meaning that is verrry widely
misunderstood.

0 occurs with probability 1/2
each other number from 1 to 2^{160} happens with probability
2^{-161}.

The Shannon entropy on this distribution is 81.5 bits.

I think it is very close to 81 bits, not 81.5, but that is a minor
point that doesn't change the conclusion:

But if you
tried to sample it once to generate an 80-bit Skipjack key, half the
time, I'd guess your key on my first try.

Yeah, but if I sample it N times, with high probability I can generate
a large number of very good keys.  This problem is faced by (and solved
by) any decent TRNG.
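
A quick back-of-envelope check of the numbers above (my own sketch, not from the thread; computed analytically since the support is astronomically large, and treating the tail as exactly 2^160 equiprobable values so the probabilities sum to 1):

```python
from math import log2

# Contrived distribution from the thread: 0 with probability 1/2,
# each of 2^160 other values with probability 2^-161.
p_zero = 0.5
n_rest = 2**160          # number of equiprobable "other" outcomes
p_rest = 2.0**-161       # probability of each of them

# Shannon entropy: H = -sum_i p_i log2(p_i), computed in closed form
shannon = -p_zero * log2(p_zero) - n_rest * p_rest * log2(p_rest)

# Min-entropy: -log2(max_i p_i) -- the measure that matters for guessing
min_entropy = -log2(max(p_zero, p_rest))

print(shannon)      # ~81.0 bits, matching the "very close to 81" figure
print(min_entropy)  # 1.0 bit: the first guess succeeds half the time
```

The gap between the two numbers is the whole point of the example: 81 bits of Shannon entropy, but only 1 bit of min-entropy.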




### Re: Entropy Definition (was Re: passphrases with more than 160 bits of entropy)


At 22:09  -0500 2006/03/22, John Denker wrote:

Aram Perez wrote:

* Can you add or increase entropy?

Shuffling a deck of cards increases the entropy of the deck.

As a minor nit, shuffling *in an unpredictable manner* adds entropy,
because extra randomness is being brought into the process. If I were
one of those people who can do a perfect riffle shuffle, reordering
the cards in this entirely predictable manner would neither increase
nor decrease the existing entropy.

So in one sense, the answer is a simple no... nothing you can do to
a passphrase can increase its (that is, the passphrase's) entropy.
You can add randomness from another source, and increase the total
entropy, but I don't think that is relevant to the original question.
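
As a side calculation (mine, not Greg's): the ceiling being discussed here is log2(52!), the entropy of a deck when every ordering is equally likely; a deterministic shuffle is a fixed permutation, so it leaves the distribution over orderings, and hence the entropy, unchanged.

```python
from math import factorial, log2

# Maximum entropy of a 52-card deck: log2 of the number of orderings.
# Any fixed (perfectly predictable) shuffle merely relabels orderings,
# so it cannot move the entropy of the deck's distribution.
max_deck_entropy = log2(factorial(52))

print(round(max_deck_entropy, 1))  # ~225.6 bits
```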

Greg.




### Re: Entropy Definition (was Re: passphrases with more than 160 bits of entropy)

From: John Denker [EMAIL PROTECTED]
Sent: Mar 23, 2006 1:44 PM
To: John Kelsey [EMAIL PROTECTED], cryptography@metzdowd.com
Subject: Re: Entropy Definition (was Re: passphrases with more than 160 bits
of entropy)

...
With some slight fiddling to get the normalization right, 1/2
raised to the power of (program length) defines a probability
measure.  This may not be the probability you want, but it
is a probability, and you can plug it into the entropy definition.

No, this isn't right for the algorithmic information theory meaning at
all.  For that measure, we can intelligently discuss the entropy of a
specific random string, without reference to a probability model.
Indeed, what's the probability distribution of the sequence of bits
defined by Chaitin's Omega?

You can certainly complain that they should have used a different term
than entropy here, but you're not going to get these to be the same.

The problem is almost never understanding the definition of
entropy.  The problem is almost always ascertaining what is
the correct probability measure applicable to the problem
at hand.

Well, you need to decide which of the probability distribution kinds
of entropy measures to use, and that differs in different
applications.  If you use min-entropy or guessing entropy to estimate
the limits on how well some sequence of symbols will compress, you'll
get a pretty lousy answer.  The same goes for using Shannon entropy to
determine whether you have collected enough entropy in your pool to
generate a 128-bit AES key.

...
0 occurs with probability 1/2
each other number from 1 to 2^{160}+1 happens with probability
2^{-161}.

The Shannon entropy on this distribution is 81.5 bits.

I think it is very close to 81 bits, not 81.5, but that is a minor
point that doesn't change the conclusion:

But if you
tried to sample it once to generate an 80-bit Skipjack key, half the
time, I'd guess your key on my first try.

Yeah, but if I sample it N times, with high probability I can generate
a large number of very good keys.  This problem is faced by (and solved
by) any decent TRNG.

Hmmm.  I've seen plenty of people get this wrong.  If you use Shannon
entropy as your measure, and then say "when I have collected 128 bits
of Shannon entropy in my pool, I can generate a 128-bit AES key," you
will generate keys that aren't as secure as you think they are.  Now,
most TRNGs seem to evade this and many other problems by designing the
hardware to give a relatively simple probability model.  If your
probability model is close to uniform, then all these probability
distribution based entropy measurements converge (with a constant
difference).

In the case above, we had better specify N.  If you sample it 16
times, then you have a 2^{-16} chance of still being in a weak state.
Once you get enough samples that the probability of being in the
pathological worst case is negligible (whatever that means in your
application), then you can start generating output bits.  That's
probably somewhere between N=32 and N=64.
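
A back-of-envelope check (my own sketch) of the estimate above: after N independent samples from the pathological distribution, the only way to remain in the weak state is for every sample to have been the high-probability outcome 0, which happens with probability 2^-N.

```python
# Residual probability of the pathological all-zeros state after
# N independent samples, each of which is 0 with probability 1/2.
def weak_state_probability(n_samples):
    return 2.0 ** -n_samples

for n in (16, 32, 64):
    print(n, weak_state_probability(n))
```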

--John




### Re: Entropy Definition (was Re: passphrases with more than 160 bits of entropy)


I wrote:

With some slight fiddling to get the normalization right, 1/2
raised to the power of (program length) defines a probability
measure.  This may not be the probability you want, but it
is a probability, and you can plug it into the entropy definition.

John Kelsey wrote:

No, this isn't right for the algorithmic information theory meaning at
all.

OK, in a moment we will have gone through four plies of no-it-isn't
yes-it-is no-it-isn't yes-it-is.  Let's get serious.  The axiomatic
definition of a measure is
-- a mapping from sets to numbers
-- positive
-- additive on the countable union of disjoint sets

And a probability measure has the further property of being
-- bounded above

I have checked that -- with due attention to trivial details --
.5 ^ (program length) satisfies this definition.  If you wish to
renew the assertion that there is no such probability measure, please
explain which of the axiomatic requirements is not met.  Please be
specific.
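
The normalization detail being waved at here is, I believe, the Kraft inequality (my gloss, not Denker's wording): for a prefix-free set of programs, the weights 2^-(program length) sum to at most 1, which is exactly what lets them be scaled into a probability measure.

```python
# Kraft inequality sketch: for a prefix-free code, sum_p 2^-len(p) <= 1.
def kraft_sum(lengths):
    return sum(2.0 ** -l for l in lengths)

# lengths of the prefix-free code {0, 10, 110, 111}
print(kraft_sum([1, 2, 3, 3]))  # 1.0, so the weights already normalize
```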

For that measure, we can intelligently discuss the entropy of a
specific random string, without reference to a probability model.

That's like saying we can talk about three-dimensional force, velocity,
and acceleration without reference to vectors.

Measure theory is the tried-and-true formalism for dealing with random
strings.  It would be spectacularly contrary to ignore the formalism,
and just plain wrong to say the formalism is inapplicable.

Indeed, what's the probability distribution of the sequence of bits
defined by Chaitin's Omega?

Huh?  Omega is so obviously a probability that usually the word probability
is used in its definition.  See e.g.
http://mathworld.wolfram.com/ChaitinsConstant.html
I suppose a masochistic nitpicker could demand to see a proof that this word
is justified, but I'm not going to bother, for reasons given above.




### Re: Entropy Definition (was Re: passphrases with more than 160 bits of entropy)

This is getting pretty far afield from cryptography but it is a topic
I find very interesting so I can't resist jumping in.

John Denker writes:
OK, in a moment we will have gone through four plies of no-it-isn't
yes-it-is no-it-isn't yes-it-is.  Let's get serious.  The axiomatic
definition of a measure is
-- a mapping from sets to numbers
-- positive
-- additive on the countable union of disjoint sets

And a probability measure has the further property of being
-- bounded above

I have checked that -- with due attention to trivial details --
.5 ^ (program length) satisfies this definition.  If you wish to
renew the assertion that there is no such probability measure, please
explain which of the axiomatic requirements is not met.  Please be
specific.

This is true, in fact it is sometimes called the universal distribution
or universal measure.  In more detail, it is a distribution over all
finite-length strings.  The measure for a particular string X is defined
as the sum over all programs that output X of 1/2^L_i, where L_i is the
length of each such program.

Often the algorithmic complexity of a string is defined as the length
of the shortest program to output the string.  The universal measure is
based on the same idea, but takes into consideration that there may be
multiple programs that output the same string.  Each program of length L_i
contributes 1/2^L_i to the string's measure.  If there is only one short
program and all others are much longer, then the probability measure
is essentially 1/2^C where C is the length of this shortest program,
i.e. the algorithmic complexity.

The universal measure for a string can also be thought of as the
probability that, given a fixed Universal Turing Machine (UTM), a randomly
chosen input program will output that string.
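
A toy illustration of that definition (entirely hypothetical machine and programs, chosen only so the numbers are easy to check): the universal measure of a string X is the sum of 2^-len(p) over all programs p that output X, so a string with several short programs picks up weight from each.

```python
# Hypothetical prefix-free program set for a toy machine:
# program bitstring -> output string.
programs = {
    "0":   "A",
    "10":  "B",
    "110": "A",   # a second, longer program that also outputs "A"
    "111": "C",
}

def universal_measure(x):
    # Sum 2^-len(p) over every program p whose output is x.
    return sum(2.0 ** -len(p) for p, out in programs.items() if out == x)

print(universal_measure("A"))  # 2^-1 + 2^-3 = 0.625
```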

So this is clearly a probability distribution (with some technicalities
regarding issues of program lengths being glossed over here) as John
Denker says.  However to go from this to a notion of entropy is more
questionable.

There are a countably infinite number of finite strings, and all of
them have non-zero probabilities under this distribution.  This means
that for most strings, the probability must be very low, asymptotically
approaching zero.  In fact you can show that for most strings of length n,
their measure is 1/2^n; this is equivalent to saying that most strings
are effectively random and have no shorter program to output them.

Shannon entropy is defined over a probability distribution.  That is,
given a probability distribution we can find its Shannon entropy by
summing -p_i log2(p_i) over all members of the distribution.  If we
approximate the universal distribution by 1/2^n for strings of length
n, this sum is clearly divergent.  So there is not really a meaningful
Shannon entropy for this probability distribution, or in other words
the entropy is infinite.
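
To make the divergence explicit (my own back-of-envelope, using the
1/2^n approximation above): restricting to strings of length at most N,
there are 2^n strings of each length n, each contributing
-2^{-n} \log_2 2^{-n} = n \cdot 2^{-n} to the sum, so

  \sum_{n=1}^{N} \sum_{|x|=n} -2^{-n} \log_2 2^{-n}
    = \sum_{n=1}^{N} 2^n \cdot n \cdot 2^{-n}
    = \sum_{n=1}^{N} n = N(N+1)/2,

which grows without bound as N increases (and indeed the unnormalized
total measure is itself infinite under this approximation).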

Indeed, what's the probability distribution of the sequence of bits
defined by Chaitin's Omega?

This probability distribution is defined only over finite strings and so
Omega is not within the universe of this distribution.  It should also be
noted that it is impossible for an n bit program to output more than n
bits of Omega (plus or minus an additive constant specific to the UTM).
Hence even if we consider successive approximations to Omega of ever
increasing length, their measures would tend asymptotically to zero.

Hal Finney




### Entropy Definition (was Re: passphrases with more than 160 bits of entropy)


On Mar 22, 2006, at 2:05 PM, Perry E. Metzger wrote:

Victor Duchovni [EMAIL PROTECTED] writes:

Actually calculating the entropy for real-world functions and
generators may be intractable...

It is, in fact, generally intractable.

1) Kolmogorov-Chaitin entropy is just plain intractable -- finding the
smallest possible Turing machine to generate a sequence is not
computable.
2) Shannon entropy requires a precise knowledge of the probability of
all symbols, and in any real world situation that, too, is
impossible.

I'm neither a cryptographer nor a mathematician, so I stand duly
corrected/chastised ;-)

So, if you folks care to educate me, I have several questions related
to entropy and information security (apologies to any physicists):

* How do you measure entropy? I was under the (false) impression that
Shannon gave a formula that measured the entropy of a message (or
information stream).
* Can you measure the entropy of a random oracle? Or is that what
both Victor and Perry are saying is intractable?

* Are there units of entropy?
* What is the relationship between randomness and entropy?
* (Apologies to the original poster) When the original poster
requested passphrases with more than 160 bits of entropy, what was
he requesting?
* Does processing an 8 character password with a process similar to
PKCS#5 increase the entropy of the password?

* Can you add or increase entropy?