[issue21470] Better seeding for the random module

2016-06-07 Thread Christian Heimes

Changes by Christian Heimes :


--
nosy: +christian.heimes

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-17 Thread Larry Hastings

Larry Hastings added the comment:

(If the past few weeks have taught us *anything*, it's that we can't look to 
OpenSSL to learn best practices.)

--
nosy: +larry
stage:  - resolved

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-14 Thread Charles-François Natali

Charles-François Natali added the comment:

Thanks for the explanations!

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-13 Thread Marc-Andre Lemburg

Marc-Andre Lemburg added the comment:

neologix:

According to man rand(3ssl), OpenSSL uses an internal state of 1023 bytes for 
the RNG.

You only see it reading 32 bytes from /dev/urandom in the strace because it has 
already loaded 1024 bytes from the RNG state file ~/.rng before adding another 
32 bytes:

open(/home/lemburg/.rnd, O_RDONLY)= 3
read(3, .., 4096) = 1024
read(3, , 4096)   = 0
Generating RSA private key, 512 bit long modulus
open(/dev/urandom, O_RDONLY|O_NOCTTY|O_NONBLOCK) = 3
read(3, ..., 32) = 32

FWIW: I'm with Raymond and Tim on this one. I prefer to have a good seed in an 
RNG per default, simply because most application don't bother to reseed RNGs 
every now and then, so having a good start into the day is important :-)

--
nosy: +lemburg

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-13 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Is ~/.rnd any kind of serious? It hasn't been modified since two weeks on my 
system (which is rebooted every day).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-13 Thread Charles-François Natali

Charles-François Natali added the comment:

 According to man rand(3ssl), OpenSSL uses an internal state of 1023 bytes for 
 the RNG.

 You only see it reading 32 bytes from /dev/urandom in the strace because it 
 has already loaded 1024 bytes from the RNG state file ~/.rng before adding 
 another 32 bytes:

Remove this .rnd file, and you'll see on the next run that it still
only reads 32 bytes.
Same holds for openssh.

I'm not using this as an argument against increasing the seed size -
Tim and Raymond convinced me - I'm just curious.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-13 Thread Marc-Andre Lemburg

Marc-Andre Lemburg added the comment:

On 13.05.2014 11:06, Antoine Pitrou wrote:
 
 Is ~/.rnd any kind of serious? It hasn't been modified since two weeks on my 
 system (which is rebooted every day).

The file is apparently only updated if you use one the OpenSSL commands
which needs random data. grep for RAND_write_file in the apps/ subdir
of the OpenSSL distribution. Of course, applications can also use that API,
so there may be other situations where it gets updated as well.

However, when removing that file, OpenSSL still only reads 32 bytes from
/dev/urandom, which suggests that it's either using some other sources
of randomness as well (there are some timing tricks being used in the
code for this), or (more likely) simply doesn't need more random
bytes to start with.

So while the file does have some meaning, it's not why I had thought
it would be.

Here's a more likely explanation:

The OpenSSL random number source only works with hash
function feedback and random data that gets added to it. It's not
an PRNG with provable characteristics.

OpenSSL uses SHA-1 for hashing which has a 20 byte digest size, so an
initial vector of 32 bytes is a good start (though more are always
better ;-)):

http://en.wikipedia.org/wiki/Randomness_extractor

That said and coming back to the question why 32 bytes are enough for
OpenSSL: the OpenSSL RNG is being seeded with a seed from the
full range of possible values (160 bits). It's period is a lot smaller
than the MT one (19937 bits), which is why fewer random bytes are
needed.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-13 Thread Tim Peters

Tim Peters added the comment:

Crytpo generators are a whole different world, and I wouldn't listen to anyone 
save a bona fide expert in that field.  Plausible:  the hardest thing OpenSSL 
has to do is generate secure RSA keys.  But the bit length of an RSA key can't 
be taken at face value:  the true strength of such a key is measured by the 
number of operations required to break it.  According to (among many others):

http://en.wikipedia.org/wiki/Key_size#Asymmetric_algorithm_key_lengths

NIST key management guidelines further suggest that 15360-bit RSA keys are 
equivalent in strength to 256-bit symmetric keys.

So 32 bytes = 256 bits of entropy is sufficient to generate secure 15360-bit 
RSA keys, which is larger than virtually anyone actually uses (so far), 
provided everything else is done exactly right.

For that reason, bug reports about OpenSSL using only 32 bytes seem to get 
brushed off, like:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=742145

So what does that have to do with Python's random()?  Nothing ;-)

A more fruitful tack would be to investigate switching away from the Mersenne 
Twister.  It was groundbreaking at the time, but nothing lasts forever.  Even 
Wikipedia can come up with a list of its disadvantages now, including the 
state space is too large and uselessly stresses the CPU cache:

http://en.wikipedia.org/wiki/Mersenne_twister#Disadvantages

Worse (according to me), when it reaches a point where most of the bits in 
its state are zeroes, it can take a long time (many calls) before its outputs 
pass randomness tests again - a paucity of 1 bits tends to persist way too 
long.

More recent algorithms claim to address these flaws, with smaller state and 
similar speed.  But they're marginal improvements, and don't seem to be gaining 
traction quickly.  The Twister was a huge improvement at the time, and caught 
on very quickly.

In the meantime, better safe than sorry.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-13 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 7b5265752942 by Raymond Hettinger in branch '2.7':
Issue #21470: Do a better job seeding the random number generator
http://hg.python.org/cpython/rev/7b5265752942

--
nosy: +python-dev

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-13 Thread Roundup Robot

Roundup Robot added the comment:

New changeset c203df907092 by Raymond Hettinger in branch '3.4':
Issue #21470: Do a better job seeding the random number generator
http://hg.python.org/cpython/rev/c203df907092

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-13 Thread Raymond Hettinger

Changes by Raymond Hettinger raymond.hettin...@gmail.com:


--
resolution:  - fixed
status: open - closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-12 Thread STINNER Victor

STINNER Victor added the comment:

MT is equidistributed.  This a major point in its favor but also implies that 
there are long stretches of uninteresting sequences.  When we seed with only 
a subset the state space, there is a risk of systematically landing in those 
stretches.

What is an uninteresting sequence? What are the problem of these sequences?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-12 Thread Tim Peters

Tim Peters added the comment:

[haypo]
 What is an uninteresting sequence? What are the problem of these
 sequences?

A sequence that would greatly surprise a user.  For example, if you generate 
32-bit ints from the Twister in one obvious way, there are starting places 
where you'll get 623 zeroes in a row.  And places where you'll get 2**32-1 623 
times in a row.  Pick a sequence of any 623 32-bit ints, and there's a starting 
place that will deliver that sequence.  And, indeed, a truly random sequence 
must also display that behavior, but it's nevertheless very surprising when you 
see one.  It's very rare, both in a truly random sequence and in sequences 
derived from the Twister.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-12 Thread STINNER Victor

STINNER Victor added the comment:

 A sequence that would greatly surprise a user.

No user complained past years. I don't think that we should worry so much, 
because it looks like reading more data from /dev/urandom can be a more serious 
and concrete issue.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-12 Thread Tim Peters

Tim Peters added the comment:

[haypo]
 No user complained past years.

Raymond said We've previously had this problem with MT (since resolved, where 
it is was landed in a very non-random zone).  Do you believe he was wrong?

 I don't think that we should worry so much, because it looks like
 reading more data from /dev/urandom can be a more serious and
 concrete issue.

Why do you say that?  Most links people have found and posted here clearly say 
that the Linux warnings about urandom are basically nonsense.  Please supply 
references to back up serious and concrete (or point to earlier references, 
if you found them convincing).

I'm with Raymond on this.  There is no useful theory that allows us to predict 
the characteristics of the produced sequences from a set of possible seeds, so 
limiting the set of possible seeds is potentially dangerous.  The theory about 
equidistribution (etc) is very useful, but relies on mathematical analysis of 
the _entire_ period of the generator.  The only way to span the entire period 
is to allow for all possible seeds.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-12 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 There is no useful theory that allows us to predict the
 characteristics of the produced sequences from a set of possible
 seeds, so limiting the set of possible seeds is potentially dangerous.

I still find it difficult to understand where is the said danger. As you point 
out, a sequence of zeroes is a valid random sequence, and there is no reason to 
believe that a sequence of zeroes is more likely with a 256 bits seed, than 
with a 20 kbits seed (it might as well be less likely, for all we know).

We may as well bump it from 256 to 512 or 1024 bits, but 20 kbits sounds 
extremely unusual for a program to read from /dev/urandom at startup.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-12 Thread Tim Peters

Tim Peters added the comment:

[pitrou]
 I still find it difficult to understand where is the said danger.

The theoretical properties that make the Twister so attractive were all proved 
based on mathematical analysis of its entire period.  The only way to get at 
the whole period is to allow for all possible seeds.

If the seeds Python can use are drawn from a relatively tiny subset of the 
possible seeds, nothing can be said about most of the proved correct 
properties anymore.  Maybe they still hold.  Maybe they don't.  In the absence 
of analysis (which, AFAIK, is still too difficult to do), the only way to be 
safe is to refrain from being so bloody clever in the interest of saving a 
few microseconds.

  As you point out, a sequence of zeroes is a valid random
 sequence, and there is no reason to believe that a sequence
 of zeroes is more likely with a 256 bits seed, than with a
 20 kbits seed (it might as well be less likely, for all we know).

That's the point:  we don't _know_ much of anything if we restrict to a subset 
of possible seeds.  But we DO know if all possible seeds are allowed for.  Then 
the Twister has many nice properties, and provably so.  Allowing for all 
possible seeds is judicious:  it's an acknowledgement of our ignorance, and a 
statement that we're more concerned with correctness than micro-efficiency in a 
(typically) zero- or one-time-per-process `random` initialization cost.

 We may as well bump it from 256 to 512 or 1024 bits,

That's as unprincipled as using 1 bit - although _likely_ to give better 
results ;-)

 but 20 kbits sounds extremely unusual for a program to read from
 /dev/urandom at startup.

At least on my box, starting Python does not import `random`, and the seeding 
code isn't called at all.  It's only called when `random` is imported.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-12 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 The theoretical properties that make the Twister so attractive were
 all proved based on mathematical analysis of its entire period.  The
 only way to get at the whole period is to allow for all possible
 seeds.
 
 If the seeds Python can use are drawn from a relatively tiny subset of
 the possible seeds, nothing can be said about most of the proved
 correct properties anymore.  Maybe they still hold.  Maybe they
 don't.  In the absence of analysis (which, AFAIK, is still too
 difficult to do), the only way to be safe is to refrain from being so
 bloody clever in the interest of saving a few microseconds.

Thanks for the explanation. It's much clearer now.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-12 Thread Tim Peters

Tim Peters added the comment:

 Thanks for the explanation. It's much clearer now.

Maybe, but it's also overblown - LOL ;-)  That is, no matter what the starting 
seed, the user will see a microscopically tiny span of the Twister's entire 
period.  So all those provably correct properties that depend on whole-period 
analysis remain pretty much theoretical no matter what we do.

It's just a better safe than sorry thing.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-12 Thread Charles-François Natali

Charles-François Natali added the comment:

Tim, any idea why openssl, openssh  Co get away with just 32 bytes of
seed read from /dev/urandom?
Is it because of a much smaller state space of the underlying CSPRNG?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-11 Thread Charles-François Natali

Charles-François Natali added the comment:

 The default seeding for the random module currently used 32 bytes from
 urandom() to create the initial state of the random number generator.
 This is far less than the number of possible states 2**19937-1.

I'm not a cryptography expert, but IMO 32 bytes is more than enough:
for example, ssh-keygen only reads 32 bytes from /dev/urandom when
generating a new key (and I assume the authors know what they're
doing).
So I'm not sure that convering the state space is a good enough
reason to increase the amount of data read: it can hut performance,
not only for the singla process case, but also because /dev/urandom is
protected by a spinlock/mutex, it can hurt workloads where multiple
processes are simultaneously reading from /dev/urandom (which is the
reason we switched to a persistent /dev/urandom FD), see e.g.
http://drsnyder.us/2014/04/16/linux-dev-urandom-and-concurrency.html

And I especially like this answer by Ted Ts'o (once again, I assume he
knows what he's talking about):

Doctor, doctor it hurts when I do this. Well, then don't do that!

The reason for the spinlock is to avoid having two reads from
/dev/urandom return the same results. That would be undesirable...

If people are trying to read from /dev/urandom a huge number of times
per second, then they're doing something wrong. The real issue, as
Sebastian has pointed out, is that issue spawning a huge number of
curl processes for each request. Then hopefully curl is using
/dev/urandom to initialize its own userspace RNG (many crypto
libraries do this; most *should* do this), so you're not trying to
read from /dev/urandom for every single request. /dev/urandom is
designed for security, not for speed.


--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-11 Thread Raymond Hettinger

Raymond Hettinger added the comment:

Several thoughts:

* We're not reading urandom a huge number of times per second.  This is just 
one read of 2,500 bytes.  What Ted is talking about and what we're doing are as 
different as night and day.

* We're also not doing this in a loop.  It is just once when Random() is 
initialized.  There are no threading issues here.

* 32 bytes is good but it is not enough.  There is a reason that the state 
space for the Mersenne Twister is so large to begin with.  Functions as simple 
as shuffle() eat through the possibilities very quickly.

* The doctor, it hurts quote is funny, but we don't have any hurt here.  
Running import os; os.urandom(2500) takes under a millisecond on my system.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-11 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 http://www.2uo.de/myths-about-urandom/

Thanks, interesting read.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-11 Thread Arfrever Frehtes Taifersar Arahesis

Changes by Arfrever Frehtes Taifersar Arahesis arfrever@gmail.com:


--
nosy: +Arfrever

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-11 Thread Mark Dickinson

Mark Dickinson added the comment:

Raymond:

 Functions as simple as shuffle() eat through the possibilities very quickly.

Can you elaborate on this?  Are there example scenarios where seeding with 32 
bytes isn't likely to be enough?

In the case of shuffle, for a large list, if you do a seed followed by a 
shuffle, the restriction to 32 bytes is restricting us to 'only' about 10**77 
possible different shuffle results.  It's hard to imagine a situation where 
that would be a problem.

--
nosy: +mark.dickinson

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-11 Thread Charles-François Natali

Charles-François Natali added the comment:

 * We're not reading urandom a huge number of times per second.  This is 
 just one read of 2,500 bytes.  What Ted is talking about and what we're doing 
 are as different as night and day.

 * We're also not doing this in a loop.  It is just once when Random() is 
 initialized.  There are no threading issues here.

Well, you don't know how people will use it though: some code spawns
many processes per second (see recent discussion on python-dev).

 * 32 bytes is good but it is not enough.  There is a reason that the state 
 space for the Mersenne Twister is so large to begin with.  Functions as 
 simple as shuffle() eat through the possibilities very quickly.

As I said, I'm not a cryptography expert, but quoting the link you gave:
About 256 bits of entropy are enough to get computationally secure
numbers for a long, long time.

The kernel's CSPRNG itself considers 256 bits enough, so I'm curious
as to what makes you think that 32 *bytes* is not enough.

openssl itself only reads 32 bytes from /dev/urandom:

$ strace -e open,read openssl genrsa
open(/dev/urandom, O_RDONLY|O_NOCTTY|O_NONBLOCK) = 3
read(3, 
\336\314\312\355\305\312\375\244\276G\n\201^\32\236\301\243\327\277\344\320\0\5\3017-\\\346\333G?,
32) = 32


In short, everyone seems to think that 32bytes seeding is more than enough.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-11 Thread Raymond Hettinger

Raymond Hettinger added the comment:

  32 bytes seeding is more than enough.

Not enough to cover *our* state space (2 ** 19937 - 1).

This tracker item boils down to balancing fear that something bad will happen 
when you call urandom(2500) versus having seeding sufficient to cover the 
state space.

My judgment (based on the cited article) is the first is FUD and that the 
second is something that has gives us the full value from the Mersenne Twister.

What constitutes enough is a value judgment that many vary from application 
to application.  For some applications, a much weaker PRNG  would suffice, but 
we decided long ago that we wanted the full power of MT.

If you an show an actual downside (not just fear of badness), then we have 
bigger issues (i.e. needing to document that SystemRandom() and os.urandom() 
have major limitations and should be avoided).

On my system, the difference between a call to urandom(32) and urandom(2500) is 
measured in microseconds.   I believe (but am having a hard time measuring) 
that the call is even cheaper than the subsequent processing we do 
random_seed() and init_by_array().

In other words, the worry about calling urandom(2500) appears to baseless or 
immaterial.  If you have timings that show evidence to the contrary, I would 
like to see them.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-11 Thread Raymond Hettinger

Raymond Hettinger added the comment:

http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/VERSIONS/C-LANG/MersenneTwister.h

http://www.omnetpp.org/doc/omnetpp/api/mersennetwister_8h_source.html

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-11 Thread Raymond Hettinger

Changes by Raymond Hettinger raymond.hettin...@gmail.com:


--
Removed message: http://bugs.python.org/msg218288

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-11 Thread Raymond Hettinger

Raymond Hettinger added the comment:

http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/VERSIONS/C-LANG/MersenneTwister.h

http://www.omnetpp.org/doc/omnetpp/api/mersennetwister_8h_source.html

http://www0.cs.ucl.ac.uk/staff/d.jones/GoodPracticeRNG.pdf

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-11 Thread STINNER Victor

STINNER Victor added the comment:

The default seeding for the random module currently used 32 bytes from 
urandom() to create the initial state of the random number generator.  This is 
far less than the number of possible states 2**19937-1.

I suggest to document how the PRNG is initialized instead of reading more bytes 
from /dev/urandom.

--
nosy: +haypo

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-11 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 What constitutes enough is a value judgment that many vary from
 application to application.  For some applications, a much weaker PRNG 
 would suffice, but we decided long ago that we wanted the full power of MT.

I don't really understand for which application 2 bits of seeding entropy 
would be required *in practice*. Surely MT has other interesting properties 
(such as the statistical distribution of the output) than its insanely large 
cycle length, that make it desirable as a PRNG.

The paper you linked to (Good Practice in (Pseudo) Random Number Generation 
for Bioinformatics Applications) doesn't suggest feeding a 2 bits seed, it 
actually seems to say that 64 bits is enough for numerical simulations run on 
large clusters.

While reading 2 bits off of /dev/urandom might be fast under Linux, it 
might not necessarily be the case on other systems. It doesn't sound reasonable 
to read this many data if there isn't a strong reason for doing it.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-11 Thread Raymond Hettinger

Raymond Hettinger added the comment:

Looking back over this tracker item, I realize that I didn't elaborate 
sufficiently on the problem being addressed:

MT is equidistributed.  This a major point in its favor but also implies that 
there are long stretches of uninteresting sequences.  When we seed with only 
a subset the state space, there is a risk of systematically landing in those 
stretches.

That is why the cited best practices paper recommends filling the seed space 
and likely is why the cited reference implementation uses /urandom to fill the 
state space.

We've previously had this problem with MT (since resolved, where it is was 
landed in a very non-random zone).   We can't avoid the *possibility* of 
landing in one of these zones, but we can make sure that it isn't a 
*systematic* recurring problem.  By using a sufficiently large seed, we avoid 
biasing the selection of which sequences are visitable out of the huge MT 
period.

Though the current 32 bytes is pretty good (I hope so, I'm the one that chose 
that constant), there is no question that there is some benefit from the larger 
seed and that we are following in the footsteps of published reference 
implementations.

The real question is whether there is a actual downside to calling 
urandom(2500).  AFAICT, the impact is insignificant (increasing the cost of 
initialization by mirco-seconds, a percentage increase so small that I can't 
measure it).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-11 Thread Tim Peters

Tim Peters added the comment:

[neologix]
 some code spawns many processes per second (see recent
 discussion on python-dev).

But that doesn't imply they're seeding the random module many times per second, 
right? Seeding isn't part of Python initialization, it's part of importing the 
`random` module.

Note that hash randomization is a different thing, unrelated to random.py.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-11 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 [neologix]
  some code spawns many processes per second (see recent
  discussion on python-dev).
 
 But that doesn't imply they're seeding the random module many times
 per second, right? Seeding isn't part of Python initialization, it's
 part of importing the `random` module.

It's easy to import the random module, even for a specific library
function which may never be called by the program being run.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-10 Thread Raymond Hettinger

New submission from Raymond Hettinger:

The default seeding for the random module currently used 32 bytes from 
urandom() to create the initial state of the random number generator.  This is 
far less than the number of possible states 2**19937-1.

Changing the default seed to a larger number will cover the state space.  This 
gives better security and better randomness by not using just a subset of the 
possible states.

The downside is slightly slower default initialization.

--
assignee: rhettinger
files: better_seed.diff
keywords: patch
messages: 218238
nosy: rhettinger
priority: normal
severity: normal
status: open
title: Better seeding for the random module
type: security
versions: Python 3.4, Python 3.5
Added file: http://bugs.python.org/file35209/better_seed.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-10 Thread Alex Gaynor

Changes by Alex Gaynor alex.gay...@gmail.com:


--
nosy: +alex

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-10 Thread Tim Peters

Tim Peters added the comment:

+1, although it could really use a comment explaining that 2500 bytes was 
chosen to be = the Twister's 19937 internal bits of state.  Otherwise it looks 
as arbitrary as 32 did ;-)

--
nosy: +tim.peters

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-10 Thread Antoine Pitrou

Antoine Pitrou added the comment:

I'm not sure it is good practice to read that many bytes from /dev/urandom. 
Quoting the Linux man page for /dev/urandom:

   The  kernel random-number generator is designed to produce a small 
amount of
   high-quality seed material to seed a cryptographic pseudo-random number 
gen‐
   erator  (CPRNG).   It  is  designed  for  security, not speed, and is 
poorly
   suited to generating large amounts of random data.   Users  should  be  
very
   economical  in  the amount of seed material that they read from 
/dev/urandom
   (and /dev/random); unnecessarily reading large quantities of data from  
this
   device will have a negative impact on other users of the device.

The (default?) entropy pool size under Linux is 4096 bytes, so reading 2500 
bytes will deplete more than half of it, IIUC. Example:

$ cat /proc/sys/kernel/random/poolsize 
4096
$ cat /proc/sys/kernel/random/entropy_avail 
516
$ python -c import os; os.urandom(300)
$ cat /proc/sys/kernel/random/entropy_avail 
160

--
nosy: +neologix, pitrou

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-10 Thread Ezio Melotti

Changes by Ezio Melotti ezio.melo...@gmail.com:


--
nosy: +ezio.melotti

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-10 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Also, the repletion rate of the entropy pool seems quite slow actually (around 
10 bytes per second?).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-10 Thread Donald Stufft

Donald Stufft added the comment:

Depleting the entropy pool is sort of a nonsense idea that /dev/random has. 
Nobody should ever be worried about it and nobody should ever use /dev/random. 
The manpage is wrong and has continued to be wrong because of historical 
reasons and the people involved not wanting to admit a wart.

Anything that relies on /dev/random isn't well written and is going to randomly 
block for bad reasons. We can pull as many bytes as we want off /dev/urandom 
and just forget that /dev/random or /proc/sys/kernel/random/entropy_avail even 
exists.

--
nosy: +dstufft

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-10 Thread Raymond Hettinger

Raymond Hettinger added the comment:

http://www.2uo.de/myths-about-urandom/

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21470] Better seeding for the random module

2014-05-10 Thread Raymond Hettinger

Changes by Raymond Hettinger raymond.hettin...@gmail.com:


--
versions: +Python 2.7

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21470
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com