Re: Improve `seed-random-state' in stable-2.0?

2012-01-23 Thread Andy Wingo
Hi Mark,

On Fri 20 Jan 2012 23:45, Mark H Weaver m...@netris.org writes:

 Andy Wingo wi...@pobox.com writes:

 How about, we extend seed-random-state to operate on bytevectors, and
 have that interface do the right thing.

 I agree that it would be nice for `seed-random-state' to support
 bytevectors as well, but for many (most?) purposes that would be a very
 awkward interface to use.

Really?  My thought would be that if I have to initialize a PRNG, I need
some random bits.  Dunno.  Deprecating the number interface does have
the advantage that we can perhaps move people to use your nice new
functions, instead of initializing with (getpid) or whatever it is that
they are using.

 Even if we keep a broken `seed-random-state', there's another problem:
 our PRNG sucks rocks.  If we constrain ourselves to produce the same
 sequence of random numbers for a given seed, that means that we're stuck
 with this very weak PRNG for the entire 2.0 series.

 Can't we just make a clean break now?  2.0 is still not widely deployed,
 so now is a great time to assert our right to change the PRNG at will.
 As you say, it's unlikely that anyone is relying on this anyway.
 If anyone is, wouldn't it be better to deal with that now?

While I agree about the badness of the PRNG -- though we shouldn't
overstate that; for being so simple, MWC does well -- but I really don't
think that we should change the default behavior now.

OTOH, we can make seed-random-state on bytevectors return an rstate
with a different implementation of scm_t_rng -- for example, we could
use GMP's mersenne twister API.

In any case, don't let stability concerns stop you from hacking on a
fix!  We'll probably get the first 2.2 preview out within a a year, and
we certainly need to change the default behavior for then

Andy
-- 
http://wingolog.org/



Re: Improve `seed-random-state' in stable-2.0?

2012-01-23 Thread Andy Wingo
On Sat 21 Jan 2012 00:46, Mike Gran spk...@yahoo.com writes:

 (seed-random-state (current-time)) seems to be a common idiom that
 you would end up breaking.

This is a common idiom that is worth deprecating.  Mark's new functions
that seed the random state from /dev/urandom are much better.

So no, no plans to break anything -- but we should help these people
write better programs.

Regards,

Andy
-- 
http://wingolog.org/



Re: Improve `seed-random-state' in stable-2.0?

2012-01-23 Thread Mike Gran
 From: Andy Wingo wi...@pobox.com
 (seed-random-state (current-time)) seems to be a common idiom that
 you would end up breaking.

This is a common idiom that is worth deprecating.  Mark's new functions
that seed the random state from /dev/urandom are much better.
 
Are you suggesting that you'll break the API in the hope that when people's
code stops working, they'll reread the manual and notice that
random-state-from-platform exists?
 
It seems a rather unfriendly way to accomplish that task.

Regards,
 
Mike



Re: Improve `seed-random-state' in stable-2.0?

2012-01-23 Thread Andy Wingo
On Mon 23 Jan 2012 14:06, Mike Gran spk...@yahoo.com writes:

 From: Andy Wingo wi...@pobox.com
 (seed-random-state (current-time)) seems to be a common idiom that
 you would end up breaking.

This is a common idiom that is worth deprecating.  Mark's new functions
that seed the random state from /dev/urandom are much better.
  
 Are you suggesting that you'll break the API in the hope that when people's
 code stops working, they'll reread the manual and notice that
 random-state-from-platform exists?
  
 It seems a rather unfriendly way to accomplish that task.

That would indeed be a mean thing to do!  It's not what I'm suggesting
though.  Deprecation means causing Guile to emit warnings, at
compile-time or at runtime, indicating that a particular interface will
go away at some point, and noting the interface that should be used
instead.

I think it's fairly helpful, actually, but if you have any suggestions
for how it could be improved, they are much welcome.

Regards,

Andy
-- 
http://wingolog.org/



Re: Improve `seed-random-state' in stable-2.0?

2012-01-21 Thread Mark H Weaver
David Kastrup d...@gnu.org writes:
 Actually, you don't need a PRNG at all.  Generate a _good_ random
 starting value, and count sequentially from there.

This is _exactly_ what my patch does, on a per-thread basis.
The starting value is read directly from /dev/urandom if available.

However, if /dev/urandom cannot be read, the PRNG is used to generate
the starting value.  This is a last resort, and ideally we should never
use it.

   Thanks,
 Mark



Re: Improve `seed-random-state' in stable-2.0?

2012-01-20 Thread David Kastrup
Mark H Weaver m...@netris.org writes:

 I know how to make this _much_ better, but here's the question: is it
 okay to change the behavior of the random number generator in 2.0?  Or
 is it important that the same sequence of random numbers are generated
 from a given seed in the entire stable-2.0 series?

As long as the total _sequence_ stays the same (and does not contain
duplicate values), I would not worry.  If the sequence itself becomes
different, then the probability of two random subsequences having at
least one match match grows with the product rather than the sum of
their length.  Of course, if the subsequences _do_ overlap, you tend to
get colliding sequences instead of just single outliers.

-- 
David Kastrup




Re: Improve `seed-random-state' in stable-2.0?

2012-01-20 Thread Andy Wingo
On Fri 20 Jan 2012 04:28, Mark H Weaver m...@netris.org writes:

 `seed-random-state' is poorly implemented when passed a numeric
 argument.  It converts the number to a decimal string, and then
 `scm_i_init_rstate' takes over and basically adds every 8th byte
 together to form the 64-bit internal state.

Is that what it does?

I agree that it's a pretty bad initializer to a pretty bad PRNG, but
from what I can tell, it does use all of the bytes in the input.  They
aren't very dense bytes, entropy-wise, but there are more of them, so I
think the amount of entropy added to the seed is the same.

Dunno.  Am I reading that code wrong?

Regards,

Andy
-- 
http://wingolog.org/



Re: Improve `seed-random-state' in stable-2.0?

2012-01-20 Thread Mark H Weaver
Andy Wingo wi...@pobox.com writes:

 On Fri 20 Jan 2012 04:28, Mark H Weaver m...@netris.org writes:

 `seed-random-state' is poorly implemented when passed a numeric
 argument.  It converts the number to a decimal string, and then
 `scm_i_init_rstate' takes over and basically adds every 8th byte
 together to form the 64-bit internal state.

 Is that what it does?

 I agree that it's a pretty bad initializer to a pretty bad PRNG, but
 from what I can tell, it does use all of the bytes in the input.

Sorry, my explanation was not clear.  What I mean is that the result of
number-string (suppose it is 12345678901234567890) gets processed
like this:

   0x31 0x32 0x33 0x34 0x35 0x36 0x37 0x38
   0x39 0x30 0x31 0x32 0x33 0x34 0x35 0x36
   0x37 0x38 0x39 0x30

   (those are the ASCII values of digits 0-9)

and then the sum of each column is taken, yielding 64-bits.  That's what
I meant when I wrote adds every 8th byte together.  So yes, indeed
every byte is used.

 They aren't very dense bytes, entropy-wise, but there are more of
 them, so I think the amount of entropy added to the seed is the same.

No, adding numbers does _not_ preserve their total entropy, not even
close.  Suppose you start with two truly random 8-bit numbers, with all
256 values equally likely.  So that's a total of 16 bits of entropy.  If
you add them together, you end up with a 9-bit number whose possible
values are _not_ equally likely, so that's less than 9 bits of entropy.

This is what's happening here.  We have a 64-bit PRNG, but even if you
make sure to seed it with a number that has plenty of entropy,
`seed-random-state' will only preserve around 32 bits of it.

That's not good.  We should fix it if we possibly can.  UUIDs really
can't afford to have only 32 bits of entropy and still be reliable.  If
we cannot fix `seed-random-state', we'll have to avoid using it in UUID
initialization.

 Mark



Re: Improve `seed-random-state' in stable-2.0?

2012-01-20 Thread Andy Wingo
Hi Mark,

Thanks for the explanations!

On Fri 20 Jan 2012 19:52, Mark H Weaver m...@netris.org writes:

 We have a 64-bit PRNG, but even if you make sure to seed it with a
 number that has plenty of entropy, `seed-random-state' will only
 preserve around 32 bits of it.

Wow, that's horrible.  Agreed that this needs fixing.

How about, we extend seed-random-state to operate on bytevectors, and
have that interface do the right thing.  We deprecate the number
interface.  At some point in the future we deprecate the string
interface as well.

I'm hesitant to make seed-random-state change the rstate that it
produces, for a given seed.  It's unlikely anyone is relying on this,
but who knows...

WDYT?

Andy
-- 
http://wingolog.org/



Re: Improve `seed-random-state' in stable-2.0?

2012-01-20 Thread Mark H Weaver
Andy Wingo wi...@pobox.com writes:

 On Fri 20 Jan 2012 19:52, Mark H Weaver m...@netris.org writes:

 We have a 64-bit PRNG, but even if you make sure to seed it with a
 number that has plenty of entropy, `seed-random-state' will only
 preserve around 32 bits of it.

 Wow, that's horrible.  Agreed that this needs fixing.

 How about, we extend seed-random-state to operate on bytevectors, and
 have that interface do the right thing.

I agree that it would be nice for `seed-random-state' to support
bytevectors as well, but for many (most?) purposes that would be a very
awkward interface to use.

 We deprecate the number interface.  At some point in the future we
 deprecate the string interface as well.

I think this would be very unfortunate.  Numbers are by far the most
natural representation for seed values, and I hope they will continue to
be fully supported.

 I'm hesitant to make seed-random-state change the rstate that it
 produces, for a given seed.  It's unlikely anyone is relying on this,
 but who knows...

Even if we keep a broken `seed-random-state', there's another problem:
our PRNG sucks rocks.  If we constrain ourselves to produce the same
sequence of random numbers for a given seed, that means that we're stuck
with this very weak PRNG for the entire 2.0 series.

Can't we just make a clean break now?  2.0 is still not widely deployed,
so now is a great time to assert our right to change the PRNG at will.
As you say, it's unlikely that anyone is relying on this anyway.
If anyone is, wouldn't it be better to deal with that now?

Thanks,
  Mark



Re: Improve `seed-random-state' in stable-2.0?

2012-01-20 Thread Mike Gran
 From: Andy Wingo wi...@pobox.com

How about, we extend seed-random-state to operate on bytevectors, and
have that interface do the right thing.  We deprecate the number
interface.  At some point in the future we deprecate the string
interface as well.

The number or string argument to seed-random-state is not unique to
Guile.  Slib uses it, too.
 
The string argument always seemed strange to me and seems to be seldom
used.  But a websearch says that the integer argument is quite common.
 
(seed-random-state (current-time)) seems to be a common idiom that
you would end up breaking.
 
Regards,
 
Mike



Re: Improve `seed-random-state' in stable-2.0?

2012-01-20 Thread David Kastrup
Mark H Weaver m...@netris.org writes:

 Can't we just make a clean break now?  2.0 is still not widely
 deployed, so now is a great time to assert our right to change the
 PRNG at will.  As you say, it's unlikely that anyone is relying on
 this anyway.  If anyone is, wouldn't it be better to deal with that
 now?

For unique identifier generation, you want a PRNG that has a period
equal to the size of the generated identifier set and generates the same
next value given the current value in order to avoid the birthday
paradoxon.  That is actually different from what one considers a good
PRNG, so it might make sense to have a home-brewn bad PRNG for this
purpose.

Actually, you don't need a PRNG at all.  Generate a _good_ random
starting value, and count sequentially from there.  The probability of
collision is the same as with a pseudo-random sequence, but if there
happens to be a problem at any time, it becomes much easier to diagnose.

-- 
David Kastrup




Improve `seed-random-state' in stable-2.0?

2012-01-19 Thread Mark H Weaver
Hello all,

`seed-random-state' is poorly implemented when passed a numeric
argument.  It converts the number to a decimal string, and then
`scm_i_init_rstate' takes over and basically adds every 8th byte
together to form the 64-bit internal state.

The problem is that only about 3-bits of entropy is present in each
character of the string, and then these are added together in an aligned
fashion, which also loses a great deal of entropy.  I haven't analyzed
this carefully, but I'd be surprised if we preserve much more than
32-bits of entropy from the original seed.

I know how to make this _much_ better, but here's the question: is it
okay to change the behavior of the random number generator in 2.0?  Or
is it important that the same sequence of random numbers are generated
from a given seed in the entire stable-2.0 series?

Mark