Tim Peters <t...@python.org> added the comment:

I've been involved - usually very lightly, but sometimes deeply - with PRNGs 
for over 40 years.  I've never seen anyone raise the kinds of concerns you're 
raising here, and, frankly, they don't make sense to me.

But on the chance that I've missed recent fundamental developments, I did some 
searching.  The state of the art for uniform selection from a range appears to 
be what the Go language recently adopted, shown as Algorithm 5 in this paper 
(which also surveys what other languages do):

https://arxiv.org/pdf/1805.10941.pdf

Nobody gives a rip about "correlations" when reusing an internal state.  What 
they're aiming at is a combination of "no bias" and "peak speed".

Python's job is harder, because we couldn't care less what the native machine 
word size is.  For example, randrange(10**500) works fine.  Approaches based on 
chopping back multiplication with a "random" C double can't get more than 53 
bits (on most machines), the limit of IEEE-754 format double precision.  Python 
dropped that approach not only because it was slightly biased, but also because 
it didn't scale to unbounded ranges.
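
To make that 53-bit limit concrete, here's a tiny sketch (not the code Python 
actually shipped, just the general "scale a random double" idea) of why 
int(random() * n) can't even touch most values of a 60-bit range:

    from random import random

    N = 1 << 60   # a 60-bit range
    # random() returns k / 2**53 for some integer k, and multiplying a float
    # by a power of two is exact, so int(random() * N) is always k * 2**7:
    # only 2**53 of the 2**60 possible results can ever appear, and the low
    # 7 bits are always zero.
    low_bits = {int(random() * N) & 0x7f for _ in range(100_000)}
    print(low_bits)   # {0}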

The Go approach works like this Python code, where NBITS would be hard-coded in 
C to 32 or 64, depending on the machine word size.  Its advantage is that it 
_usually_ needs no division at all (the "%" operations), just masking and 
shifting.  At worst, it _may_ need one for-real division.  But it's built to 
work with native machine integer precision, and requires a full double-width 
integer multiply:

    from random import randrange
    NBITS = 64 # need NBITS x NBITS full-precision multiply
    POWER = 1 << NBITS
    MASK = POWER - 1

    def rr(s):  # uniform random int in range(s)
        assert 0 < s <= POWER
        x = randrange(POWER) # i.e., a random bitstring with NBITS bits
        m = x * s # full double-width product
        r = m & MASK
        if r < s:
            t = (POWER - s) % s  # the expensive line
            while r < t:
                x = randrange(POWER)
                m = x * s
                r = m & MASK
        return m >> NBITS
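
A quick smoke test of that sketch (nothing rigorous, just checking that the 
results stay in range(s) and look about flat):

    from collections import Counter

    # Assumes rr() and NBITS from the block above.  Every value in range(6)
    # should appear, with roughly equal counts and nothing out of range.
    counts = Counter(rr(6) for _ in range(600_000))
    assert set(counts) == set(range(6))
    print(sorted(counts.items()))  # each of 0..5 shows up about 100_000 times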

In a nutshell, in each half-open interval [i * POWER, (i+1) * POWER), it 
rejects any multiple of s that lands among the first POWER % s values, leaving 
exactly floor(POWER / s) multiples of s in each interval.
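
The same counting argument with numbers small enough to check by hand (POWER = 
16 and s = 6 here, picked purely for illustration):

    # Multiples of s that land among the first POWER % s == 4 values of an
    # interval are rejected; what's left is exactly POWER // s == 2 per
    # interval.
    POWER, s = 16, 6
    t = POWER % s
    per_interval = []
    for i in range(s):   # the intervals [i*POWER, (i+1)*POWER)
        lo, hi = i * POWER, (i + 1) * POWER
        kept = [m for m in range(lo, hi)
                if m % s == 0 and m % POWER >= t]
        per_interval.append(len(kept))
    print(per_interval)  # [2, 2, 2, 2, 2, 2]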

Of course that suffers "correlations" too if you reuse the seed - although not 
of the exact microscopic kind you're talking about.

You haven't responded to questions about why that specific test is thought to 
be especially important, or to why you believe you can't do an obvious thing 
instead (use different seeds, or don't reset the seed at all).

As to who's making raw assertions here, the situation isn't symmetric ;-)  If 
you want a change, you have to _make a case_ for it.  And a good one.  Every 
core Python developer who has knowledge of this code has now told you they 
personally don't see any case for it.  So, sorry, but the status quo is always 
favored in the absence of compelling reason to change.  There were compelling 
reasons to switch to the current scheme.  As you said yourself:

"""
And I understand the puzzlement about my test file setting the same random seed 
and then complaining about correlations. 
"""

That was insightful.  The difference is we never got over that puzzlement ;-)

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue39867>
_______________________________________