Re: compressing randomly-generated numbers
On Mon, 28 Aug 2006, Travis H. wrote:

>> A random bit stream should have two properties: no bias and no
>> dependency between bits. If one has biased but independent bits he
>> can use the von Neumann algorithm to remove the bias, but if there
>> is a dependency no algorithm will be able to `strip' it.
>
> Interesting claim;

Well, it is not really a claim, since there was no definition; here it
is: a ``dependency stripping'' algorithm is a deterministic algorithm
that gets a stream of unbiased (but not necessarily independent) bits
and produces a stream of several independent bits.

> what if I XOR it with a truly random stream,

That is not a deterministic algorithm.

> or a computationally-pseudorandom stream,

This requires a random key. Recall that the whole point was to extract
random bits -- if we already have useful random bits we can simply
output them and discard the input stream.

> or if I filter out every other bit?

Consider ``x a x b x c x ...'': sampling every other bit throws away
most of the entropy.

In general, there is no ``dependency stripping'' algorithm because an
input x,x,x,..., where all x are the same (either all 0 or all 1), is
possible. There is simply not enough entropy in the input to extract
at least two bits.

Interestingly, an additional requirement that the input stream has at
least two bits of entropy does not change the situation: consider the
first two bits of the output. There are only four possibilities (00,
01, 10, and 11), so if the input is long enough then it is possible to
find four different inputs that give the same bits, e.g.,

  A(0001) = 00...
  A(0010) = 00...
  A(0100) = 00...
  A(1000) = 00...

An input that contains 0001, 0010, 0100, and 1000 with the same
probability has two bits of entropy, but it is always transformed by
the algorithm into 00..., and thus the first two bits are not
independent.
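The von Neumann algorithm mentioned above can be sketched in a few
lines of Python (the function name is illustrative, not from any of
the posts):

```python
def von_neumann(bits):
    """Von Neumann unbiasing: walk non-overlapping pairs of input
    bits; emit 0 for the pair (0, 1), emit 1 for (1, 0), and discard
    (0, 0) and (1, 1).  For *independent* bits with any fixed bias p,
    both emitted outcomes have probability p*(1-p), so the output is
    unbiased -- but nothing here removes dependency between bits."""
    out = []
    for i in range(0, len(bits) - 1, 2):
        a, b = bits[i], bits[i + 1]
        if a != b:
            out.append(a)  # (0,1) -> 0, (1,0) -> 1
    return out
```

Note that the throughput is at best one output bit per two input bits,
and worse the stronger the bias.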
(Strictly speaking, if we recall the requirement that the input bits
are unbiased, we should also include into the set of inputs 1110,
1101, ..., but this does not change the result.)

> Seems like these would, if not eliminate the dependency, at least
> weaken its effect. The last is equivalent to sampling at a slower
> rate.

See above about ``sampling at a slower rate.''

--
Regards, ASK

-
The Cryptography Mailing List
Unsubscribe by sending unsubscribe cryptography to [EMAIL PROTECTED]
Re: compressing randomly-generated numbers
On 8/29/06, Alexander Klimov [EMAIL PROTECTED] wrote:

> Well, it is not really a claim, since there was no definition; here
> it is: a ``dependency stripping'' algorithm is a deterministic
> algorithm that gets a stream of unbiased (but not necessarily
> independent) bits and produces a stream of several independent bits.

Well, this is a fairly strict definition; I think uniform and e-close
to independent is still potentially suitable for use in cryptography.

>> or a computationally-pseudorandom stream,
>
> This requires a random key.

I can't use e-close to (uniform and independent)?

> Recall that the whole point was to extract random bits -- if we
> already have useful random bits we can simply output them and
> discard the input stream.

I hear this argument often, and it appears that the people who say it
don't care about the rate of random bits, nor the desirability of not
having to trust any one source to be truly unpredictable, nor have
they understood the point of an extractor or of hashing the output of
multiple sources. For example, I'd like to use the Quantis HWRNG, but
since it is an opaque container, I cannot trust it fully. If I had
another source that I could combine with it, then I would not have to
trust it blindly; the opponent would have to compromise both in order
to be able to predict the combined output.

> Consider ``x a x b x c x ...'': sampling every other bit throws away
> most of the entropy. In general, there is no ``dependency
> stripping'' algorithm because an input x,x,x,..., where all x are
> the same (either all 0 or all 1), is possible. There is simply not
> enough entropy in the input to extract at least two bits.

No, you have no idea of the unpredictability of that source, because
it is unspecified, and unpredictability is untestable. That could very
well be the output of a perfect HWRNG. So could 01010101, or 000, or
111. Each output is equally likely, and no amount of testing the
output can say whether the source is imperfectly random or truly
random.
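The combining step described above -- XORing the outputs of two
independent sources so that an opponent must compromise both to
predict the result -- can be sketched as follows (the function name
is illustrative):

```python
def xor_combine(stream_a, stream_b):
    """Combine two byte streams of equal length by XOR.  If either
    stream is uniformly random and independent of the other, the
    result is uniformly random: an opponent must predict *both*
    sources to predict the combined output."""
    assert len(stream_a) == len(stream_b)
    return bytes(a ^ b for a, b in zip(stream_a, stream_b))
```

The security argument is one-sided: the combination is at least as
unpredictable as the stronger of the two sources, provided they are
independent.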
This was stated several times on this list; entropy depends on the
model of the source, not on the output. The sequence

  0, 1, 2, 3, 4, ..., 255, 0, 1, 2, 3, 4, ..., 255, ...

has zero entropy if the source is defined as an 8-bit counter starting
at zero, and it has an entropy of 1 bit per bit if the source is
defined as a HWRNG that generates 8-bit outputs in a uniform and
independent manner. Re-read the definition of entropy, and pay
particular attention to the calculation of probability for a given
event.

> Interestingly, an additional requirement that the input stream has
> at least two bits of entropy does not change the situation

I didn't understand this example.

> See above about ``sampling at a slower rate.''

I can take an analog random source and sample it at one rate, and have
a nearly independent stream. If I increase the sample rate, the bits
will start to become more and more dependent on the prior bit. So, if
that is true, then logically a serially-correlated stream will become
less and less correlated as I decrease the sample rate. Taking every
other bit corresponds to sampling at half the speed, but doesn't
require modifying the hardware.

It seems that you are saying there is no general solution, given a
total lack of knowledge about the source other than the fact that
there is some dependency between the bits. I can agree with that.
However, if you understand the physics of the source, then you do know
something about the random phenomenon used as a source, and in that
case you can eliminate specific kinds of dependencies that it may
exhibit.

The easiest way to eliminate (computationally) bias and dependency in
one step is to combine with a CSPRNG. You can reseed it periodically
with the combined output.

--
If you're not part of the solution, you're part of the precipitate.
Unix guru for rent or hire -- http://www.lightconsulting.com/~travis/
GPG fingerprint: 9D3F 395A DAC5 5CCC 9066 151D 0A6B 4098 0C55 1484
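The ``combine with a CSPRNG, reseeding periodically'' idea from the
message above might be sketched like this; SHA-256 in counter mode is
used as a stand-in generator, and the whole construction is
illustrative, not a vetted DRBG design:

```python
import hashlib


class HashPRNG:
    """Toy CSPRNG: SHA-256 over (seed || counter).  A real design
    would use a vetted DRBG such as those in NIST SP 800-90A."""

    def __init__(self, seed):
        self.seed = seed
        self.counter = 0

    def generate(self, n):
        """Produce n keystream bytes."""
        out = bytearray()
        while len(out) < n:
            out += hashlib.sha256(
                self.seed + self.counter.to_bytes(8, "big")).digest()
            self.counter += 1
        return bytes(out[:n])

    def reseed(self, raw_source_bytes):
        # Periodically fold raw (possibly biased, dependent) source
        # output back into the generator state.
        self.seed = hashlib.sha256(self.seed + raw_source_bytes).digest()


def combined_output(prng, raw_bytes):
    """XOR the raw source with the CSPRNG keystream; the result is
    (computationally) unbiased and independent even if the raw
    source is not, as long as the seed is unknown to the opponent."""
    ks = prng.generate(len(raw_bytes))
    return bytes(a ^ b for a, b in zip(raw_bytes, ks))
```

Note the caveat embedded in the thread: this only pushes the problem
into the seed -- the output is computationally indistinguishable from
random, not information-theoretically random.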
Re: compressing randomly-generated numbers
On Aug 9, 2006, at 8:44 PM, Travis H. wrote:

> Hey, I was mulling over some old emails about randomly-generated
> numbers and realized that if I had an imperfectly random source
> (something less than 100% unpredictable), compressing the output
> would compress it to the point where it was nearly so. Would there
> be any reason to choose one algorithm over another for this
> application?

This is neither correct nor a good idea. Taking almost-random
information and compressing it will lead to less random results.
Specifically, I will give the general LZW case and then go to the BZ2
case.

1) For LZW (even ignoring the magic numbers), if a byte does not match
anything in the dictionary (it starts with a dictionary of all 0s, so
the probability of a match on the first byte is low), then that byte
will get a header of a 0 bit. That byte now becomes 9 bits. The next
byte will have a dictionary of the previous byte and all zeros. The
chance of a match will still be small, and putting that into the
dictionary will be a 9-bit field with 0s. So in the first 2 bytes, I
can almost guarantee I know where 2 zero bits are.

2) BZ2 transforms the data and then uses LZW. See 1).

The correct way to improve almost-random data is to process it with a
hash-like function with compression.

Jim
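Jim's suggestion to process almost-random data with a hash-like
compressing function can be sketched as a simple conditioner: hash
fixed-size blocks of raw input down to shorter outputs, so that every
output bit depends on many partially unpredictable input bits. SHA-256
and the block sizes here are illustrative choices, not from the post:

```python
import hashlib


def condition(raw, in_block=64, out_block=16):
    """Crude entropy conditioner: compress each 64-byte block of raw
    source output to 16 bytes via truncated SHA-256.  The 4:1 ratio
    is a safety margin -- it assumes the raw source delivers well
    over 2 bits of entropy per byte, and that assumption must come
    from a model of the source, not from testing its output."""
    out = bytearray()
    for i in range(0, len(raw) - in_block + 1, in_block):
        block = raw[i:i + in_block]
        out += hashlib.sha256(block).digest()[:out_block]
    return bytes(out)
```

Unlike a general-purpose compressor, this never introduces predictable
framing bits into the output.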
RE: compressing randomly-generated numbers
> I was mulling over some old emails about randomly-generated numbers
> and realized that if I had an imperfectly random source (something
> less than 100% unpredictable), compressing the output would compress
> it to the point where it was nearly so. Would there be any reason to
> choose one algorithm over another for this application?

I see where you're coming from, but take an imperfectly random source
and apply a deterministic function to it, and, if I recall correctly,
you still have an imperfectly random output. It would be better to use
something like von Neumann's unbiasing algorithm (or any of the newer
improvements) to strip out the non-randomness.

> I recall talking to a CS prof once who said that LZW compression was
> optimal, which seemed and still seems really odd to me because
> optimal compression would generate a null output file. So surely he
> meant asymptotically optimal, or e-close to optimal, or something
> like that... anyone know?

He probably meant optimal in the information-theoretic sense. If that
was the case, then no, optimal compression will not yield a
null-length output -- it will give you the minimum-length output that
could represent your input from among all inputs. Or maybe he didn't
;)

Regards,
Jeremy Hansen, MS, CISSP
Director of Security
RAIR Technologies

Any views or opinions presented in this email are solely those of the
author and do not necessarily represent those of RAIR Technologies,
Inc. The individual author will be personally liable for any damages
or other liability arising from this email.
RAIR Technologies * Brookfield, WI * 262-780-6000
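The information-theoretic point above can be checked directly: a
high-entropy input cannot be shrunk, and a general-purpose compressor
adds framing overhead on top. Running random bytes through zlib (an
illustrative stand-in for any DEFLATE-style compressor) shows the
direction of the inequality; exact sizes depend on the container
format:

```python
import os
import zlib

data = os.urandom(100_000)           # high-entropy input
compressed = zlib.compress(data, 9)

# DEFLATE falls back to "stored" blocks for incompressible data, so
# the output carries the input essentially verbatim plus header,
# block, and checksum overhead -- it gets slightly *longer*.
assert len(compressed) > len(data)
```

This is the converse of Jeremy's point: an optimal code assigns the
already-uniform input a codeword no shorter than the input itself.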
Re: compressing randomly-generated numbers
On Wed, 9 Aug 2006, Travis H. wrote:

> Hey, I was mulling over some old emails about randomly-generated
> numbers and realized that if I had an imperfectly random source
> (something less than 100% unpredictable), compressing the output
> would compress it to the point where it was nearly so. Would there
> be any reason to choose one algorithm over another for this
> application?

What are you trying to achieve? An ARC4 keystream derived from a fixed
key is 100% predictable and yet cannot be compressed by anything you
would normally call a compression algorithm.

-d

-
The Cryptography Mailing List
Unsubscribe by sending unsubscribe cryptography to [EMAIL PROTECTED]
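The ARC4 observation is easy to verify: a pure-Python RC4 (sketched
below from the well-known KSA/PRGA description) produces a keystream
that is completely determined by its key, yet zlib cannot shrink it --
compressibility measures redundancy, not unpredictability:

```python
import zlib


def rc4_keystream(key, n):
    """Generate n bytes of RC4 (ARC4) keystream for the given key."""
    # Key-scheduling algorithm (KSA)
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    # Pseudo-random generation algorithm (PRGA)
    i = j = 0
    out = bytearray()
    for _ in range(n):
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % 256])
    return bytes(out)


# 100% predictable given the key, yet DEFLATE cannot shrink it:
ks = rc4_keystream(b"Key", 100_000)
assert len(zlib.compress(ks, 9)) > len(ks)
```

(RC4 is shown only as an incompressibility example; it is long since
deprecated for actual encryption.)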