Re: statistical inferences and PRNG characterization

2006-05-22 Thread Bill Tompkins

Travis H. wrote:
 Hi,
 
 I've been wondering about the proper application of statistics with
 regard to comparing PRNGs and encrypted text to truly random sources.
 
 As I understand it, when looking at output, one can take a
 hypothetical source model (e.g. P(0) = 0.3, P(1) = 0.7, all bits
 independent) and come up with a probability that the source may have
 generated that output.  One cannot, however, say what probability such
 a source had generated the output, because there is an infinite number
 of sources (e.g. P(0) = 0.2.., P(1) = 7.000...).  Can one say
 that, if the source must be A or B, what probability it actually was A
 (and if so, how)?

Short answer: you can say what evidence the observed output provides in
favor of A vs B, and if you're willing to start with a prior belief
(that A and B are equally likely before the observation, say), then you
can calculate the probability after the observation:

P(A) / P(B) = P(D | A) / P(D | B)
 (given that they were equally likely before seeing the data D)


Long answer:

There are a few philosophies on the correct way to think about such
issues.  The Bayesian inference approach would simply be to write the
probability that a source model is correct (given some data) in terms of
the probability that one would observe the data (given that source
model) and the probability of the source model being correct,
independent of the data (aka the prior), and the probability of the data
given any model:

P(M1 | D) = P(D | M1) * P(M1) / P(D)

  M1 = model 1, D = observed data.  This equation is Bayes' Theorem,
  and  is trivially derived from from the fact that
  P(M1  D) = P(M1 | D) * P(D) = P(D | M1) * P(M1)


Depending on the type of data, determining P(D) can be an exercise in
philosophical futility, but note that in the ratio of two posterior
probabilities (the terms on the LHS), the P(D) terms will cancel:

P(M1 | D) / P(M2 | D)  =  P(D | M1) / P(D | M2)  * P(M1) / P(M2)


Now, if (say) you are using this as a spam filter, you might determine
that the prior probability (P(M1)) for one model is different from the
prior probability of a different model... for me, the odds that a given
email is spam seems to be about 99%, so P(M1)/P(M2) might be about 0.01.
 For a cryptographic application, well, you'd want to figure out
something sensible given the two models.  People who dislike Bayesian
inference tend to have an issue with the arbitrary choice of prior
probabilities.

Note that the ratio of the likelihoods (P(D | M1) is the likelihood of
M1 given the data D) is a measure of the evidence in favor of M1 vs M2,
provided the data D.  Non-Bayesian approaches tend to use the log of
this ratio as a statistic with a chi^2 distribution.

Oh, and do beware the number of degrees of freedom in your two models-
if they are not equal, you need to do a bit more work that outlined here.

-Bill

-
The Cryptography Mailing List
Unsubscribe by sending unsubscribe cryptography to [EMAIL PROTECTED]


Re: Open Source Embedded SSL - Export Questions

2003-11-26 Thread Bill Tompkins
On Mon, 2003-11-24 at 21:06, J Harper wrote:

...snip...
 We're not looking for official legal advice, just some pointers to
 current online resources of how to go about registering our product 
 in the US.  I've seen posts that for SSL implementations you just
 need to send a letter to the government, but haven't come across 
 an official government checklist and address.
...snip

http://www.bxa.doc.gov/Encryption/
is the US Dept of Commerce site that has the regulations

http://www.bxa.doc.gov/Encryption/PubAvailEncSourceCodeNofify.html
has the details about what letter you send where for Publicly
Available source code.  You'll want to read the regulations to verify
that the code does qualify as publicly available, etc...

No, I'm not a lawyer, and no, this was not legal advice.

I am, however, an embedded software developer, and am looking forward to
seeing the code :)  I'm guessing the details of the software and license
are already set, but just in case they aren't, I've got a couple of
requests:

1) Not GPL or LPGL, please.  I'm a fan of the GPL for most things, but
for embedded software, especially in the security domain, it's a
killer.  I'm supposed to allow users to modify the software that runs on
their secure token?  And on a small platform where there won't be such
things as loadable modules, or even process separation, the (L)GPL
really does become viral.  This is, I think, why Red Hat releases eCos
under a non-GPL (but still open source) license.

2) Make it functional on systems without memory allocation.  Did I
mention that I work on (very) small embedded systems?  Having fixed
spaces for variables is useful when you want something to run
deterministically for a long time with no resets, and I have yet to find
a free bignum library that didn't want to use malloc all the time.

Thanks in advance for the code release,

-Bill

-
The Cryptography Mailing List
Unsubscribe by sending unsubscribe cryptography to [EMAIL PROTECTED]