### Re: statistical inferences and PRNG characterization

```| Hi,
|
| I've been wondering about the proper application of statistics with
| regard to comparing PRNGs and encrypted text to truly random sources.
|
| As I understand it, when looking at output, one can take a
| hypothetical source model (e.g. P(0) = 0.3, P(1) = 0.7, all bits
| independent) and come up with a probability that the source may have
| generated that output.  One cannot, however, say what probability such
| a source had generated the output, because there is an infinite number
| of sources (e.g. P(0) = 0.2.., P(1) = 7.000...).  Can one say
| that, if the source must be A or B, what probability it actually was A
| (and if so, how)?
That's not the way it's done.  Ignore for a moment that we have a sequence
(which is probably irrelevant for this purpose, but might not be).  Instead,
just imagine we have a large collection of values generated by the PRNG -
or, looked at another way, a large collection of values alleged to have been
drawn from a population with P(0) = 0.3 and P(1) = 0.7.  Now take a truely
random sample from that collection and ask the question:  What is the
probability that I would have seen this result, given that the collection
I'm drawing from is really taken from the alleged distribution?  You don't
need any information about *other* possible distributions.  (Not that there
aren't other questions you can ask.  Thus, if the collection could have
been drawn from either of two possible distributions, you can ask which
is more probable to have resulted in the random sample you saw.)

The randomness in the sampling is essential.  When you have it, you wipe out
any underlying bias in the way the collection was created.

| Also, it strikes me that it may not be possible to prove something
| cannot be distinguished from random, but that proofs must be of the
| opposite form, i.e. that some source is distinguishable from random.
Actually, what one tends to prove are things like:  If X is uniformally
randomly distributed over (0,1), then 2X is uniformally randomly distributed
over (0,2).  (On the other hand, X + X, while still random, is *not*
uniformally distributed.)  That's about as close as you are going to get
to a proof of randomness.

| Am I correct?  Are there any other subtleties in the application of
| statistics to crypto that anyone wishes to describe?  I have yet to
| find a good book on statistics in these kinds of situations, or for
| that matter in any.
Statistics in general require subtle reasoning.

| As an aside, it's amusing to see the abuse of statistics and
| probability in the media.  For example, when people ask what's the
| probability of some non-repeating event or condition?
That may or may not be a meaningful concept.  If I toss a coin, and
depending
on the result, blow up a building - there is no way to repeat the blowing up
of the building, but still it's meaningful to say that the probability that
the building gets blown up is 50%.

-- Jerry

| --
| Curiousity killed the cat, but for a while I was a suspect -- Steven
Wright
| Security Guru for Hire http://www.lightconsulting.com/~travis/ --
| GPG fingerprint: 9D3F 395A DAC5 5CCC 9066  151D 0A6B 4098 0C55 1484
|
| -
| The Cryptography Mailing List
| Unsubscribe by sending unsubscribe cryptography to
[EMAIL PROTECTED]
|

-
The Cryptography Mailing List
Unsubscribe by sending unsubscribe cryptography to [EMAIL PROTECTED]

```

### Re: statistical inferences and PRNG characterization

```
Travis H. wrote:
Hi,

I've been wondering about the proper application of statistics with
regard to comparing PRNGs and encrypted text to truly random sources.

As I understand it, when looking at output, one can take a
hypothetical source model (e.g. P(0) = 0.3, P(1) = 0.7, all bits
independent) and come up with a probability that the source may have
generated that output.  One cannot, however, say what probability such
a source had generated the output, because there is an infinite number
of sources (e.g. P(0) = 0.2.., P(1) = 7.000...).  Can one say
that, if the source must be A or B, what probability it actually was A
(and if so, how)?

Short answer: you can say what evidence the observed output provides in
favor of A vs B, and if you're willing to start with a prior belief
(that A and B are equally likely before the observation, say), then you
can calculate the probability after the observation:

P(A) / P(B) = P(D | A) / P(D | B)
(given that they were equally likely before seeing the data D)

There are a few philosophies on the correct way to think about such
issues.  The Bayesian inference approach would simply be to write the
probability that a source model is correct (given some data) in terms of
the probability that one would observe the data (given that source
model) and the probability of the source model being correct,
independent of the data (aka the prior), and the probability of the data
given any model:

P(M1 | D) = P(D | M1) * P(M1) / P(D)

M1 = model 1, D = observed data.  This equation is Bayes' Theorem,
and  is trivially derived from from the fact that
P(M1  D) = P(M1 | D) * P(D) = P(D | M1) * P(M1)

Depending on the type of data, determining P(D) can be an exercise in
philosophical futility, but note that in the ratio of two posterior
probabilities (the terms on the LHS), the P(D) terms will cancel:

P(M1 | D) / P(M2 | D)  =  P(D | M1) / P(D | M2)  * P(M1) / P(M2)

Now, if (say) you are using this as a spam filter, you might determine
that the prior probability (P(M1)) for one model is different from the
prior probability of a different model... for me, the odds that a given
email is spam seems to be about 99%, so P(M1)/P(M2) might be about 0.01.
For a cryptographic application, well, you'd want to figure out
something sensible given the two models.  People who dislike Bayesian
inference tend to have an issue with the arbitrary choice of prior
probabilities.

Note that the ratio of the likelihoods (P(D | M1) is the likelihood of
M1 given the data D) is a measure of the evidence in favor of M1 vs M2,
provided the data D.  Non-Bayesian approaches tend to use the log of
this ratio as a statistic with a chi^2 distribution.

Oh, and do beware the number of degrees of freedom in your two models-
if they are not equal, you need to do a bit more work that outlined here.

-Bill

-
The Cryptography Mailing List
Unsubscribe by sending unsubscribe cryptography to [EMAIL PROTECTED]

```

### Re: Crypto hardware with secure key storage

```Speaking of bulk encryption cards... does the linux 2.6 kernel support
any?  There is a reference to a crypto framework in the
configuration menus, but as is typical of linux, there are no man
pages or other documentation related to it, and I don't feel like
reading source code.  (/usr/src/linux*/Documentation/crypto says next to
nothing, and the two URLs in the file are not working)

Cheers,

/ji

-
The Cryptography Mailing List
Unsubscribe by sending unsubscribe cryptography to [EMAIL PROTECTED]

```

### Re: statistical inferences and PRNG characterization

```

On May 19, 2006, at 6:51, Travis H. wrote:

As I understand it, when looking at output, one can take a
hypothetical source model (e.g. P(0) = 0.3, P(1) = 0.7, all bits
independent) and come up with a probability that the source may have
generated that output.

One can come up with the probability that the defined source will
generate that output in a single run.

One cannot, however, say what probability such
a source had generated the output, because there is an infinite number
of sources (e.g. P(0) = 0.2.., P(1) = 7.000...).  Can one say
that, if the source must be A or B, what probability it actually was A
(and if so, how)?

If you can put your question into the form, Source A or B is chosen
with probability pA or 1-pA.  Output X is generated.  What is the
probability that it was source A that was chosen? then Bayesian
inference can answer the question.  However, you don't generally have
a known a priori probability of each source being chosen, and you
don't even know the characteristics of the other source.  You can
generalize to an arbitrary number of alternative sources, but that
doesn't provide the prior data that's lacking.

-
The Cryptography Mailing List
Unsubscribe by sending unsubscribe cryptography to [EMAIL PROTECTED]

```

### May 24: National Day of Outrage at NSA/Telco surveillance

```Some alternative media groups have called for a national day of protests
against the telcos' latest sleazy activities, including their cooperation
in NSA's illegal surveillance of innocent citizens.

http://saveaccess.org/

Events are already scheduled in Boston, Chicago, San Francisco, and
NYC.  You can register your own local event by sending mail to
[EMAIL PROTECTED]

Curiously, nobody in Washington, DC or Baltimore is protesting yet.
Perhaps a contingent should form outside NSA, with signs showing the
NSA employees on their way to/from work just what we think of their
disrespect for the constitution, the law, and the public.  Do we have
a local volunteer to organize it?

John

PS: I don't agree with all the things these people are protesting, but
I admire their energy.  I haven't seen cryptographers and cypherpunks
with protest signs -- yet.  But I hope to see you out there on May 24th.

-
The Cryptography Mailing List
Unsubscribe by sending unsubscribe cryptography to [EMAIL PROTECTED]

```

### Re: statistical inferences and PRNG characterization

```On Fri, May 19, 2006 at 06:51:55AM -0500, Travis H. wrote:
As I understand it, when looking at output, one can take a
hypothetical source model (e.g. P(0) = 0.3, P(1) = 0.7, all bits
independent) and come up with a probability that the source may have
generated that output.  One cannot, however, say what probability such
a source had generated the output, because there is an infinite number
of sources (e.g. P(0) = 0.2.., P(1) = 7.000...).  Can one say
that, if the source must be A or B, what probability it actually was A
(and if so, how)?

You could do this with relatively simple Bayesian classification.
Start with a prior assumption like As far as I know it is 50/50
that it is source A or B and then for the output you see you
calculate P(A|output) and P(B|outout) using Bayes rule, your
probabilistic model for the source and P(A) = P(B) = 0.5.

P(X|O) = P(O|X) P(X)/P(O)

A finite number of sources is not required here, as long as you're
willing to provide a prior distribution over all possible sources
that you can do calculations with.

Also, it strikes me that it may not be possible to prove something
cannot be distinguished from random, but that proofs must be of the
opposite form, i.e. that some source is distinguishable from random.

I think you're still going to run into the problem of deciding what
is random, and that problem will be tied up in your choice of prior
distribution on the sources.

Am I correct?  Are there any other subtleties in the application of
statistics to crypto that anyone wishes to describe?  I have yet to
find a good book on statistics in these kinds of situations, or for
that matter in any.

I guess the usual proviso: these sort of calculations require
assumptions to make them possible, and the results should not be
confidently applied outside situations where those assumptions are
valid.

David.

-
The Cryptography Mailing List
Unsubscribe by sending unsubscribe cryptography to [EMAIL PROTECTED]

```

### Phil Zimmerman and voice encryption; a Skype problem?

```There's an article in today's NY Times (for subscribers, it's at
on whether Phil Zimmerman's Zfone -- an encrypted VoIP package -- will
invite government scrutiny.  There doesn't seem to be any imminent threat
in the U.S.; the one concrete example mentioned -- the British plan to
give police the power to compel individuals to disclose keys -- doesn't
threaten Zfone, because it uses Diffie-Hellman for (among other things)
perfect forward secrecy and doesn't even have any long-term keys.  (See
draft-zimmermann-avt-zrtp-01.txt for protocol details.)

The fascinating thing, though, was this sentence near the end of the
article:

But at a conference last week in Cyprus, German officials said
they had technology for intercepting and decrypting Skype phone
calls, according to Anthony M. Rutkowski, vice president for
regulatory affairs and standards for VeriSign, a company that
offers security for Internet and phone operations.

The Berson report says that Skype uses AES-256.  NSA rates that as
suitable for Top Secret traffic, so it's presumably not the cipher.
Berson analyzed a number of other possible attack scenarios; the only one
that seems to be possible is an active attack plus forged certificates.
If Berson's analysis was correct -- and we all know how hard it is to
verify cryptographic protocols -- that leaves open the possibility of a
protocol change that implemented some sort of Clipper-like functionality.
A silent change like that would be *very* ominous.

--Steven M. Bellovin, http://www.cs.columbia.edu/~smb

-
The Cryptography Mailing List
Unsubscribe by sending unsubscribe cryptography to [EMAIL PROTECTED]

```

### Re: Phil Zimmerman and voice encryption; a Skype problem?

```
At 10:19 AM -0400 5/22/06, Steven M. Bellovin wrote:

There's an article in today's NY Times (for subscribers, it's at
on whether Phil Zimmerman's Zfone -- an encrypted VoIP package -- will
invite government scrutiny.  There doesn't seem to be any imminent threat
in the U.S.; the one concrete example mentioned -- the British plan to
give police the power to compel individuals to disclose keys -- doesn't
threaten Zfone, because it uses Diffie-Hellman for (among other things)
perfect forward secrecy and doesn't even have any long-term keys.  (See
draft-zimmermann-avt-zrtp-01.txt for protocol details.)

The fascinating thing, though, was this sentence near the end of the
article:

But at a conference last week in Cyprus, German officials said
they had technology for intercepting and decrypting Skype phone
calls, according to Anthony M. Rutkowski, vice president for
regulatory affairs and standards for VeriSign, a company that
offers security for Internet and phone operations.

The Berson report says that Skype uses AES-256.  NSA rates that as
suitable for Top Secret traffic, so it's presumably not the cipher.
Berson analyzed a number of other possible attack scenarios; the only one
that seems to be possible is an active attack plus forged certificates.
If Berson's analysis was correct -- and we all know how hard it is to
verify cryptographic protocols -- that leaves open the possibility of a
protocol change that implemented some sort of Clipper-like functionality.

Please don't forget that the VeriSign spokesperson may be mistaken,
or purposely lying (possibly in order to drum up business for the
company). Neither would be a first for VeriSign.

--Paul Hoffman, Director
--VPN Consortium

-
The Cryptography Mailing List
Unsubscribe by sending unsubscribe cryptography to [EMAIL PROTECTED]

```

### Re: Phil Zimmerman and voice encryption; a Skype problem?

```

Steven M. Bellovin writes:
-+--
| .. -- that leaves open the possibility of a
| protocol change that implemented some sort of Clipper-like functionality.
| A silent change like that would be *very* ominous.
|

I'm reminded of Adi Shamir's 2004 Turing Award Lecture

* Absolutely secure systems do not exist