Re: statistical inferences and PRNG characterization

2006-05-22 Thread Bill Tompkins

Travis H. wrote:
> Hi,
> 
> I've been wondering about the proper application of statistics with
> regard to comparing PRNGs and encrypted text to truly random sources.
> 
> As I understand it, when looking at output, one can take a
> hypothetical source model (e.g. "P(0) = 0.3, P(1) = 0.7, all bits
> independent") and come up with a probability that the source may have
> generated that output.  One cannot, however, say what probability such
> a source had generated the output, because there is an infinite number
> of sources (e.g. "P(0) = 0.2.., P(1) = 7.000...").  Can one say
> that, if the source must be A or B, what probability it actually was A
> (and if so, how)?

Short answer: you can say what evidence the observed output provides in
favor of A vs B, and if you're willing to start with a prior belief
(that A and B are equally likely before the observation, say), then you
can calculate the probability after the observation:

P(A) / P(B) = P(D | A) / P(D | B)
 (given that they were equally likely before seeing the data D)


Long answer:

There are a few philosophies on the correct way to think about such
issues.  The Bayesian inference approach would simply be to write the
probability that a source model is correct (given some data) in terms of
the probability that one would observe the data (given that source
model) and the probability of the source model being correct,
independent of the data (aka the prior), and the probability of the data
given any model:

P(M1 | D) = P(D | M1) * P(M1) / P(D)

  M1 = model 1, D = observed data.  This equation is Bayes' Theorem,
  and  is trivially derived from from the fact that
  P(M1 & D) = P(M1 | D) * P(D) = P(D | M1) * P(M1)


Depending on the type of data, determining P(D) can be an exercise in
philosophical futility, but note that in the ratio of two "posterior"
probabilities (the terms on the LHS), the P(D) terms will cancel:

P(M1 | D) / P(M2 | D)  =  P(D | M1) / P(D | M2)  * P(M1) / P(M2)


Now, if (say) you are using this as a spam filter, you might determine
that the prior probability (P(M1)) for one model is different from the
prior probability of a different model... for me, the odds that a given
email is spam seems to be about 99%, so P(M1)/P(M2) might be about 0.01.
 For a cryptographic application, well, you'd want to figure out
something sensible given the two models.  People who dislike Bayesian
inference tend to have an issue with the "arbitrary" choice of prior
probabilities.

Note that the ratio of the likelihoods (P(D | M1) is the likelihood of
M1 given the data D) is a measure of the evidence in favor of M1 vs M2,
provided the data D.  Non-Bayesian approaches tend to use the log of
this ratio as a statistic with a chi^2 distribution.

Oh, and do beware the number of degrees of freedom in your two models-
if they are not equal, you need to do a bit more work that outlined here.

-Bill

-
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]


RE: Open Source Embedded SSL - (License and Memory)

2003-12-05 Thread Bill Tompkins
On Thu, 2003-12-04 at 20:15, Peter Gutmann wrote:

> How common is it to find an embedded system sophisticated enough to have a TCP
> stack and ethernet interface (and running SSL), but not sophisticated enough
> to have a malloc() implementation?  I've always assumed that anything with the
> former will also have the latter (I know there are some highly constrained
> embedded platforms used in some web-enabled widgets, but they usually don't
> run SSL).

I can't speak to how common it is, but there are applications that
require crypto, and that require some sort of negotiation protocol, that
don't use TCP or Ethernet.  For example- wireless apps, or various
non-ethernet multi-drop wired interfaces.  While these applications do
require some sort of communications stack, it might be less
sophisticated than what you're used to seeing with TCP/IP (and might be
mostly implemented in off-CPU hardware).

For systems like this, there's probably no requirement to use SSL
(unless it is talking through a gateway to some TCP device), but there
is a requirement for the underlying crypto library, and some protocol
that uses it.  As you've mentioned here before, designing a protocol
that is as secure as SSL is hard, and there's no sense re-inventing the
wheel, so why not use SSL?

Certainly, any system big enough to use SSL could have a simple block
allocator.  My complaint is with the non-determinancy (and implied error
checking/handling) that accompany variably-sized allocations and
reallocations.  As John Gilmore pointed out to me in an email, if the
library were simple enough to use static buffers, but did use memory
allocation, then the user could pre-determine the sizes and times of
memory allocation and write a simple block allocator to suit the
requirements of the library.  In this case, the behavior of the library
would again be deterministic.

I don't mean to be making a big deal about the memory allocation issue;
it's probably more a pet peeve of mine than an industry-wide concern. 
But it is something that would concern me if I was trying to do some
formal analysis to certify an embedded device (although I'm certainly no
expert on such things), or make a device that was simply never allowed
to fail or reset.

-Bill


-
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]


Re: Open Source Embedded SSL - Export Questions

2003-11-26 Thread Bill Tompkins
On Mon, 2003-11-24 at 21:06, J Harper wrote:

...snip...
> We're not looking for official legal advice, just some pointers to
> current online resources of how to go about registering our product 
> in the US.  I've seen posts that for SSL implementations you "just
> need to send a letter to the government", but haven't come across 
> an official government checklist and address.
...snip

http://www.bxa.doc.gov/Encryption/
is the US Dept of Commerce site that has the regulations

http://www.bxa.doc.gov/Encryption/PubAvailEncSourceCodeNofify.html
has the details about what letter you send where for "Publicly
Available" source code.  You'll want to read the regulations to verify
that the code does qualify as "publicly available", etc...

No, I'm not a lawyer, and no, this was not legal advice.

I am, however, an embedded software developer, and am looking forward to
seeing the code :)  I'm guessing the details of the software and license
are already set, but just in case they aren't, I've got a couple of
requests:

1) Not GPL or LPGL, please.  I'm a fan of the GPL for most things, but
for embedded software, especially in the security domain, it's a
killer.  I'm supposed to allow users to modify the software that runs on
their secure token?  And on a small platform where there won't be such
things as loadable modules, or even process separation, the (L)GPL
really does become viral.  This is, I think, why Red Hat releases eCos
under a non-GPL (but still open source) license.

2) Make it functional on systems without memory allocation.  Did I
mention that I work on (very) small embedded systems?  Having fixed
spaces for variables is useful when you want something to run
deterministically for a long time with no resets, and I have yet to find
a free bignum library that didn't want to use malloc all the time.

Thanks in advance for the code release,

-Bill

-
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]