Re: Improve `seed-random-state' in stable-2.0?
Hi Mark, On Fri 20 Jan 2012 23:45, Mark H Weaver m...@netris.org writes: Andy Wingo wi...@pobox.com writes: How about, we extend seed-random-state to operate on bytevectors, and have that interface do the right thing. I agree that it would be nice for `seed-random-state' to support bytevectors as well, but for many (most?) purposes that would be a very awkward interface to use. Really? My thought would be that if I have to initialize a PRNG, I need some random bits. Dunno. Deprecating the number interface does have the advantage that we can perhaps move people to use your nice new functions, instead of initializing with (getpid) or whatever it is that they are using. Even if we keep a broken `seed-random-state', there's another problem: our PRNG sucks rocks. If we constrain ourselves to produce the same sequence of random numbers for a given seed, that means that we're stuck with this very weak PRNG for the entire 2.0 series. Can't we just make a clean break now? 2.0 is still not widely deployed, so now is a great time to assert our right to change the PRNG at will. As you say, it's unlikely that anyone is relying on this anyway. If anyone is, wouldn't it be better to deal with that now? While I agree about the badness of the PRNG -- though we shouldn't overstate that; for being so simple, MWC does well -- but I really don't think that we should change the default behavior now. OTOH, we can make seed-random-state on bytevectors return an rstate with a different implementation of scm_t_rng -- for example, we could use GMP's mersenne twister API. In any case, don't let stability concerns stop you from hacking on a fix! We'll probably get the first 2.2 preview out within a a year, and we certainly need to change the default behavior for then Andy -- http://wingolog.org/
Re: Improve `seed-random-state' in stable-2.0?
On Sat 21 Jan 2012 00:46, Mike Gran spk...@yahoo.com writes: (seed-random-state (current-time)) seems to be a common idiom that you would end up breaking. This is a common idiom that is worth deprecating. Mark's new functions that seed the random state from /dev/urandom are much better. So no, no plans to break anything -- but we should help these people write better programs. Regards, Andy -- http://wingolog.org/
Re: Improve `seed-random-state' in stable-2.0?
From: Andy Wingo wi...@pobox.com (seed-random-state (current-time)) seems to be a common idiom that you would end up breaking. This is a common idiom that is worth deprecating. Mark's new functions that seed the random state from /dev/urandom are much better. Are you suggesting that you'll break the API in the hope that when people's code stops working, they'll reread the manual and notice that random-state-from-platform exists? It seems a rather unfriendly way to accomplish that task. Regards, Mike
Re: Improve `seed-random-state' in stable-2.0?
On Mon 23 Jan 2012 14:06, Mike Gran spk...@yahoo.com writes: From: Andy Wingo wi...@pobox.com (seed-random-state (current-time)) seems to be a common idiom that you would end up breaking. This is a common idiom that is worth deprecating. Mark's new functions that seed the random state from /dev/urandom are much better. Are you suggesting that you'll break the API in the hope that when people's code stops working, they'll reread the manual and notice that random-state-from-platform exists? It seems a rather unfriendly way to accomplish that task. That would indeed be a mean thing to do! It's not what I'm suggesting though. Deprecation means causing Guile to emit warnings, at compile-time or at runtime, indicating that a particular interface will go away at some point, and noting the interface that should be used instead. I think it's fairly helpful, actually, but if you have any suggestions for how it could be improved, they are much welcome. Regards, Andy -- http://wingolog.org/
Re: Improve `seed-random-state' in stable-2.0?
David Kastrup d...@gnu.org writes: Actually, you don't need a PRNG at all. Generate a _good_ random starting value, and count sequentially from there. This is _exactly_ what my patch does, on a per-thread basis. The starting value is read directly from /dev/urandom if available. However, if /dev/urandom cannot be read, the PRNG is used to generate the starting value. This is a last resort, and ideally we should never use it. Thanks, Mark
Re: Improve `seed-random-state' in stable-2.0?
Mark H Weaver m...@netris.org writes: I know how to make this _much_ better, but here's the question: is it okay to change the behavior of the random number generator in 2.0? Or is it important that the same sequence of random numbers are generated from a given seed in the entire stable-2.0 series? As long as the total _sequence_ stays the same (and does not contain duplicate values), I would not worry. If the sequence itself becomes different, then the probability of two random subsequences having at least one match match grows with the product rather than the sum of their length. Of course, if the subsequences _do_ overlap, you tend to get colliding sequences instead of just single outliers. -- David Kastrup
Re: Improve `seed-random-state' in stable-2.0?
On Fri 20 Jan 2012 04:28, Mark H Weaver m...@netris.org writes: `seed-random-state' is poorly implemented when passed a numeric argument. It converts the number to a decimal string, and then `scm_i_init_rstate' takes over and basically adds every 8th byte together to form the 64-bit internal state. Is that what it does? I agree that it's a pretty bad initializer to a pretty bad PRNG, but from what I can tell, it does use all of the bytes in the input. They aren't very dense bytes, entropy-wise, but there are more of them, so I think the amount of entropy added to the seed is the same. Dunno. Am I reading that code wrong? Regards, Andy -- http://wingolog.org/
Re: Improve `seed-random-state' in stable-2.0?
Andy Wingo wi...@pobox.com writes: On Fri 20 Jan 2012 04:28, Mark H Weaver m...@netris.org writes: `seed-random-state' is poorly implemented when passed a numeric argument. It converts the number to a decimal string, and then `scm_i_init_rstate' takes over and basically adds every 8th byte together to form the 64-bit internal state. Is that what it does? I agree that it's a pretty bad initializer to a pretty bad PRNG, but from what I can tell, it does use all of the bytes in the input. Sorry, my explanation was not clear. What I mean is that the result of number-string (suppose it is 12345678901234567890) gets processed like this: 0x31 0x32 0x33 0x34 0x35 0x36 0x37 0x38 0x39 0x30 0x31 0x32 0x33 0x34 0x35 0x36 0x37 0x38 0x39 0x30 (those are the ASCII values of digits 0-9) and then the sum of each column is taken, yielding 64-bits. That's what I meant when I wrote adds every 8th byte together. So yes, indeed every byte is used. They aren't very dense bytes, entropy-wise, but there are more of them, so I think the amount of entropy added to the seed is the same. No, adding numbers does _not_ preserve their total entropy, not even close. Suppose you start with two truly random 8-bit numbers, with all 256 values equally likely. So that's a total of 16 bits of entropy. If you add them together, you end up with a 9-bit number whose possible values are _not_ equally likely, so that's less than 9 bits of entropy. This is what's happening here. We have a 64-bit PRNG, but even if you make sure to seed it with a number that has plenty of entropy, `seed-random-state' will only preserve around 32 bits of it. That's not good. We should fix it if we possibly can. UUIDs really can't afford to have only 32 bits of entropy and still be reliable. If we cannot fix `seed-random-state', we'll have to avoid using it in UUID initialization. Mark
Re: Improve `seed-random-state' in stable-2.0?
Hi Mark, Thanks for the explanations! On Fri 20 Jan 2012 19:52, Mark H Weaver m...@netris.org writes: We have a 64-bit PRNG, but even if you make sure to seed it with a number that has plenty of entropy, `seed-random-state' will only preserve around 32 bits of it. Wow, that's horrible. Agreed that this needs fixing. How about, we extend seed-random-state to operate on bytevectors, and have that interface do the right thing. We deprecate the number interface. At some point in the future we deprecate the string interface as well. I'm hesitant to make seed-random-state change the rstate that it produces, for a given seed. It's unlikely anyone is relying on this, but who knows... WDYT? Andy -- http://wingolog.org/
Re: Improve `seed-random-state' in stable-2.0?
Andy Wingo wi...@pobox.com writes: On Fri 20 Jan 2012 19:52, Mark H Weaver m...@netris.org writes: We have a 64-bit PRNG, but even if you make sure to seed it with a number that has plenty of entropy, `seed-random-state' will only preserve around 32 bits of it. Wow, that's horrible. Agreed that this needs fixing. How about, we extend seed-random-state to operate on bytevectors, and have that interface do the right thing. I agree that it would be nice for `seed-random-state' to support bytevectors as well, but for many (most?) purposes that would be a very awkward interface to use. We deprecate the number interface. At some point in the future we deprecate the string interface as well. I think this would be very unfortunate. Numbers are by far the most natural representation for seed values, and I hope they will continue to be fully supported. I'm hesitant to make seed-random-state change the rstate that it produces, for a given seed. It's unlikely anyone is relying on this, but who knows... Even if we keep a broken `seed-random-state', there's another problem: our PRNG sucks rocks. If we constrain ourselves to produce the same sequence of random numbers for a given seed, that means that we're stuck with this very weak PRNG for the entire 2.0 series. Can't we just make a clean break now? 2.0 is still not widely deployed, so now is a great time to assert our right to change the PRNG at will. As you say, it's unlikely that anyone is relying on this anyway. If anyone is, wouldn't it be better to deal with that now? Thanks, Mark
Re: Improve `seed-random-state' in stable-2.0?
From: Andy Wingo wi...@pobox.com How about, we extend seed-random-state to operate on bytevectors, and have that interface do the right thing. We deprecate the number interface. At some point in the future we deprecate the string interface as well. The number or string argument to seed-random-state is not unique to Guile. Slib uses it, too. The string argument always seemed strange to me and seems to be seldom used. But a websearch says that the integer argument is quite common. (seed-random-state (current-time)) seems to be a common idiom that you would end up breaking. Regards, Mike
Re: Improve `seed-random-state' in stable-2.0?
Mark H Weaver m...@netris.org writes: Can't we just make a clean break now? 2.0 is still not widely deployed, so now is a great time to assert our right to change the PRNG at will. As you say, it's unlikely that anyone is relying on this anyway. If anyone is, wouldn't it be better to deal with that now? For unique identifier generation, you want a PRNG that has a period equal to the size of the generated identifier set and generates the same next value given the current value in order to avoid the birthday paradoxon. That is actually different from what one considers a good PRNG, so it might make sense to have a home-brewn bad PRNG for this purpose. Actually, you don't need a PRNG at all. Generate a _good_ random starting value, and count sequentially from there. The probability of collision is the same as with a pseudo-random sequence, but if there happens to be a problem at any time, it becomes much easier to diagnose. -- David Kastrup
Improve `seed-random-state' in stable-2.0?
Hello all, `seed-random-state' is poorly implemented when passed a numeric argument. It converts the number to a decimal string, and then `scm_i_init_rstate' takes over and basically adds every 8th byte together to form the 64-bit internal state. The problem is that only about 3-bits of entropy is present in each character of the string, and then these are added together in an aligned fashion, which also loses a great deal of entropy. I haven't analyzed this carefully, but I'd be surprised if we preserve much more than 32-bits of entropy from the original seed. I know how to make this _much_ better, but here's the question: is it okay to change the behavior of the random number generator in 2.0? Or is it important that the same sequence of random numbers are generated from a given seed in the entire stable-2.0 series? Mark