Re: compressing randomly-generated numbers

2006-08-30 Thread Alexander Klimov
On Mon, 28 Aug 2006, Travis H. wrote:
 On 8/23/06, Alexander Klimov [EMAIL PROTECTED] wrote:
  A random bit stream should have two properties: no bias and no
  dependency between bits. If one has biased but independent bits he
  can use the von Neumann algorithm to remove the bias, but if there is
  a dependency no algorithm will be able to `strip' it.

 Interesting claim;

Well, it is not really a claim since there was no definition; here it is:
A ``dependency stripping'' algorithm is a deterministic algorithm that
gets a stream of unbiased (but not necessarily independent) bits and
produces a stream of several independent bits.

 what if I XOR it with a truly random stream,

That is not a deterministic algorithm.

 or a computationally-pseudorandom stream,

This requires a random key.

Recall that the whole point was to extract random bits -- if we
already have useful random bits we can simply output them and discard
the input stream.

 or if I filter out every other bit?

Consider ``x a x b x c x ...'': sampling every other bit throws away
most of the entropy.

In general, there is no ``dependency stripping'' algorithm because an
input x,x,x,..., where all x are the same (either all 0 or all 1), is
possible. There is simply not enough entropy in the input to extract
even two bits.

Interestingly, an additional requirement that the input stream has at
least two bits of entropy does not change the situation: Consider the
first two bits of the output. There are only four possibilities (00,
01, 10, and 11), so if the input is long enough it is possible to
find four different inputs that produce the same first two output
bits, e.g.,

 A(0001) = 00...
 A(0010) = 00...
 A(0100) = 00...
 A(1000) = 00...

An input that takes the values 0001, 0010, 0100, and 1000 with equal
probability has two bits of entropy, but it is always transformed by
the algorithm into 00..., so the first two output bits are constant
and the output is certainly not a stream of independent unbiased
bits. (Strictly speaking, if we recall the requirement that the input
bits be unbiased, we should also include 1110, 1101, ... in the set
of inputs, but this does not change the result.)

 Seems like these would, if not eliminate the dependency, at least
 weaken its effect. The last is equivalent to sampling at a slower
 rate.

See above about ``sampling at a slower rate.''

-- 
Regards,
ASK



Re: compressing randomly-generated numbers

2006-08-30 Thread Travis H.

On 8/29/06, Alexander Klimov [EMAIL PROTECTED] wrote:

Well, it is not really a claim since there was no definition; here it is:
A ``dependency stripping'' algorithm is a deterministic algorithm that
gets a stream of unbiased (but not necessarily independent) bits and
produces a stream of several independent bits.


Well, this is a fairly strict definition; I think output that is
uniform and e-close to independent is still potentially suitable for
use in cryptography.


 or a computationally-pseudorandom stream,
This requires a random key.


I can't use e-close to (uniform and independent)?


Recall that the whole point was to extract random bits -- if we
already have useful random bits we can simply output them and discard
the input stream.


I hear this argument often, and it appears that the people who say it
don't care about the rate of random bits, nor the desirability of not having
to trust any one source to be truly unpredictable, nor have they understood
the point of an extractor or hashing the output of multiple sources.

For example, I'd like to use the Quantis HWRNG, but since it is an
opaque container, I cannot trust it fully.  If I had another source
that I could combine with it, then I would not have to trust it
blindly; the opponent would have to compromise both in order to be
able to predict the combined output.
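
A minimal Python sketch of that combining step (os.urandom is only a
stand-in here for each source; reading from the actual Quantis device
is not shown):

    import os

    def xor_combine(a: bytes, b: bytes) -> bytes:
        """XOR two equally long byte strings; the result is unpredictable
        as long as at least one input is, provided the two are independent."""
        assert len(a) == len(b)
        return bytes(x ^ y for x, y in zip(a, b))

    # Stand-ins for two independently collected streams; in practice one
    # of them would be the bytes read from the opaque hardware device.
    stream_a = os.urandom(32)   # placeholder for the HWRNG output
    stream_b = os.urandom(32)   # second, independent source
    print(xor_combine(stream_a, stream_b).hex())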


Consider ``x a x b x c x ...'': sampling every other bit throws away
most of the entropy.
In general, there is no ``dependency stripping'' algorithm because an
input x,x,x,..., where all x are the same (either all 0 or all 1), is
possible. There is simply not enough entropy in the input to extract
even two bits.


No, you have no idea of the unpredictability of that source, because
it is unspecified and unpredictability is untestable.  That could very
well be the output of a perfect HWRNG.  So could 01010101..., or a run
of all zeros, or a run of all ones.  Each output is equally likely,
and no amount of testing the output can say whether the source is
imperfectly random or truly random.

This was stated several times on this list; entropy depends on the
model of the source, not on the output.  The sequence:
0, 1, 2, 3, 4... 255, 0, 1, 2, 3, 4... 255 ...
has zero entropy if the source is defined as an 8-bit counter starting
at zero, and it has an entropy of 1 bit per output bit (8 bits per
value) if the source is defined as an HWRNG that generates 8-bit
outputs in a uniform and independent manner.
Re-read the definition of entropy, and pay particular attention to the
calculation of the probability of a given event.
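
A small Python illustration of how the entropy number follows from the
model rather than from the bytes (the probabilities below are exactly
the modelling assumption):

    import math

    def shannon_entropy(probs):
        """Shannon entropy, in bits, of a discrete distribution."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    # Model 1: an 8-bit counter.  Given the model, the next output is
    # known with probability 1, so each output contributes 0 bits.
    print(shannon_entropy([1.0]))            # 0.0

    # Model 2: uniform, independent 8-bit outputs.  Each output
    # contributes 8 bits, i.e. an entropy rate of 1 bit per output bit.
    print(shannon_entropy([1 / 256] * 256))  # 8.0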


Interestingly, an additional requirement that the input stream has at
least two bits of entropy does not change the situation


I didn't understand this example.


See above about ``sampling at a slower rate.''


I can take an analog random source, sample it at one rate, and get a
nearly independent stream.  If I increase the sample rate, the bits
start to become more and more dependent on the prior bit.  So, if that
is true, then logically a serially-correlated stream will become less
and less correlated as I decrease the sample rate.  Taking every other
bit corresponds to sampling at half the speed, but doesn't require
modifying the hardware.
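
A rough Python sketch of that effect, using a made-up Markov source
(the flip probability is an arbitrary assumption) and measuring how
often adjacent bits agree:

    import random

    def markov_bits(n, p_flip=0.1, seed=1):
        """Serially correlated bits: each bit flips the previous one
        with probability p_flip (a stand-in for oversampling)."""
        rng = random.Random(seed)
        bits = [rng.randrange(2)]
        for _ in range(n - 1):
            bits.append(bits[-1] ^ (rng.random() < p_flip))
        return bits

    def lag1_bias(bits):
        """Fraction of adjacent pairs that agree, minus the 0.5
        expected for independent bits."""
        agree = sum(a == b for a, b in zip(bits, bits[1:]))
        return agree / (len(bits) - 1) - 0.5

    raw = markov_bits(200_000)
    half_rate = raw[::2]          # every other bit = sample at half speed
    print(lag1_bias(raw))         # about +0.40: strongly correlated
    print(lag1_bias(half_rate))   # about +0.32: weaker, but not gone

So decimation weakens the dependency, but it also discards entropy and
does not remove the dependency entirely.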

It seems that you are saying there is no general solution, given a total lack
of knowledge about the source other than the fact that there is some
dependency between the bits.  I can agree with that.  However, if you
understand the physics of the source, then you do know something about
the random phenomenon used as a source, and in that case you can eliminate
specific kinds of dependencies that it may exhibit.

The easiest way to eliminate bias and dependency (computationally, at
least) in one step is to combine the source with a CSPRNG.  You can
reseed the CSPRNG periodically with the combined output.
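
Something like the following sketch, where a hash-counter generator
stands in for the CSPRNG (SHA-256 is just an illustrative choice, and
os.urandom stands in for the raw source):

    import hashlib
    import os

    class CombiningPRNG:
        """XOR a hash-counter keystream with the raw source and fold the
        combined output back into the key from time to time."""

        def __init__(self, seed: bytes):
            self.key = hashlib.sha256(seed).digest()
            self.counter = 0

        def _keystream(self, n: int) -> bytes:
            out = bytearray()
            while len(out) < n:
                block = hashlib.sha256(
                    self.key + self.counter.to_bytes(8, "big")).digest()
                out.extend(block)
                self.counter += 1
            return bytes(out[:n])

        def combine(self, raw: bytes) -> bytes:
            mixed = bytes(x ^ y
                          for x, y in zip(raw, self._keystream(len(raw))))
            # Reseed: future keystream depends on everything seen so far.
            self.key = hashlib.sha256(self.key + mixed).digest()
            return mixed

    gen = CombiningPRNG(os.urandom(32))
    sample = gen.combine(os.urandom(64))  # os.urandom stands in for the raw source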
--
If you're not part of the solution, you're part of the precipitate.
Unix guru for rent or hire -- http://www.lightconsulting.com/~travis/
GPG fingerprint: 9D3F 395A DAC5 5CCC 9066  151D 0A6B 4098 0C55 1484



Re: compressing randomly-generated numbers

2006-08-11 Thread james hughes


On Aug 9, 2006, at 8:44 PM, Travis H. wrote:


Hey,

I was mulling over some old emails about randomly-generated numbers
and realized that if I had an imperfectly random source (something
less than 100% unpredictable), that compressing the output would
compress it to the point where it was nearly so.  Would there be any
reason to choose one algorithm over another for this application?


This is neither correct nor a good idea.

Taking almost-random information and compressing it will lead to
less-random results.


Specifically, I will give the general LZW case and then go to the BZ2  
case.


1) For LZW (even ignoring the magic numbers): if a byte does not
match anything in the dictionary (it starts with a dictionary of all
0s, so the probability of a match on the first byte is low), then that
byte gets a header of a 0 bit, so the byte becomes 9 bits.  The next
byte sees a dictionary containing only the previous byte and all
zeros; the chance of a match is still small, so it too goes out as a
9-bit field led by a 0.  So in the first 2 bytes of output, I can
almost guarantee I know where 2 zero bits are.


2) BZ2 transforms the data and then uses LZW. See 1)
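
A quick Python check of the point, using the standard bz2 module
(os.urandom stands in for an already-random source):

    import bz2
    import os

    data = os.urandom(4096)        # stand-in for (nearly) random input
    packed = bz2.compress(data)

    print(len(data), len(packed))  # the "compressed" output is longer, not shorter
    print(packed[:4])              # b'BZh9': a fixed, completely predictable header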

The correct way to improve almost-random data is to process it with a
hash-like function that compresses it.
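
For example, a sketch of that kind of conditioning step, hashing 64
raw bytes down to 32 output bytes with SHA-256 (the 2:1 ratio is only
an assumption about how much entropy the raw source delivers):

    import hashlib
    import os

    def condition(raw_block: bytes) -> bytes:
        """Hash a block of imperfectly random input down to 32 bytes; the
        block must be believed to carry at least 256 bits of entropy."""
        return hashlib.sha256(raw_block).digest()

    raw_block = os.urandom(64)   # stand-in for 64 bytes of raw source output
    print(condition(raw_block).hex())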


Jim




RE: compressing randomly-generated numbers

2006-08-11 Thread Jeremy Hansen
 I was mulling over some old emails about randomly-generated 
 numbers and realized that if I had an imperfectly random 
 source (something less than 100% unpredictable), that 
 compressing the output would compress it to the point where 
 it was nearly so.  Would there be any reason to choose one 
 algorithm over another for this application?

I see where you're coming from, but take an imperfectly random source
and apply a deterministic function to it, and if I recall correctly, you
still have an imperfectly random output. It would be better to use
something like von Neumann's unbiasing algorithm (or any of the newer
improvements) to strip out the non-randomness.
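
Von Neumann's trick in a few lines of Python, for reference (it
assumes the input bits are independent, just biased):

    def von_neumann(bits):
        """Read bits in non-overlapping pairs: emit 0 for a 01 pair,
        1 for a 10 pair, and discard 00 and 11 pairs."""
        out = []
        for a, b in zip(bits[0::2], bits[1::2]):
            if a != b:
                out.append(a)
        return out

    print(von_neumann([1, 1, 0, 1, 1, 0, 0, 0, 0, 1]))  # -> [0, 1, 0]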
 
 I recall talking to a CS prof once who said that LZW 
 compression was optimal, which seemed and still seems 
 really odd to me because optimal compression would generate a 
 null output file.  So surely he meant asymptotically optimal, 
 or e-close to optimal, or something like that... anyone know?

He probably meant optimal in the information-theoretic sense. If that
was the case, then no, optimal compression will not yield a null-length
output -- it will give you the minimum-length output that can still
represent your particular input among all possible inputs. Or maybe he
didn't ;)

Regards,
Jeremy Hansen, MS, CISSP
Director of Security
RAIR Technologies





Re: compressing randomly-generated numbers

2006-08-11 Thread Damien Miller
On Wed, 9 Aug 2006, Travis H. wrote:

 Hey,
 
 I was mulling over some old emails about randomly-generated numbers
 and realized that if I had an imperfectly random source (something
 less than 100% unpredictable), that compressing the output would
 compress it to the point where it was nearly so.  Would there be any
 reason to choose one algorithm over another for this application?

What are you trying to achieve?

An ARC4 keystream derived from a fixed key is 100% predictable and yet
cannot be compressed by anything you would normally call a compression
algorithm.
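
A quick Python illustration (this is the textbook RC4/ARC4 keystream
generator; bz2 is just one convenient compressor to try):

    import bz2

    def rc4_keystream(key: bytes, n: int) -> bytes:
        """Textbook RC4: key scheduling (KSA) then the output loop (PRGA)."""
        S = list(range(256))
        j = 0
        for i in range(256):
            j = (j + S[i] + key[i % len(key)]) % 256
            S[i], S[j] = S[j], S[i]
        i = j = 0
        out = bytearray()
        for _ in range(n):
            i = (i + 1) % 256
            j = (j + S[i]) % 256
            S[i], S[j] = S[j], S[i]
            out.append(S[(S[i] + S[j]) % 256])
        return bytes(out)

    stream = rc4_keystream(b"fixed key", 4096)       # fully determined by the key
    print(len(bz2.compress(stream)) >= len(stream))  # True: it does not compress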

-d

