Re: [scikit-learn] NEP: Random Number Generator Policy

Robert Kern Sat, 16 Jun 2018 17:32:26 -0700

On 6/16/18 05:54, [email protected] wrote:

On Sat, Jun 16, 2018 at 3:59 AM, Robert Kern <[email protected]> wrote:

I have made a significant revision. In this version, downstream projects
like scikit-learn should experience significantly less forced churn.


https://github.com/rkern/numpy/blob/nep/rng-clarification/doc/neps/nep-0019-rng-policy.rst

https://mail.python.org/pipermail/numpy-discussion/2018-June/078252.html

tl;dr RandomState lives! But its distributions are forever frozen. So maybe
"undead" is more apt. Anyways, RandomState will continue to provide the same
stream-compatibility that it always has. But it will be internally
refactored to use the same core uniform PRNG objects that the new
RandomGenerator distributions class will use underneath (defaulting to the
current Mersenne Twister, of course). The distribution methods on
RandomGenerator will be allowed to evolve with numpy versions and get
better/faster implementations.

Your code can mix the usage of RandomState and RandomGenerator as needed,
and they can be made to share the same underlying RNG algorithm's state.



Sounds good to me, and I think handles all our concerns.

I also think that the issues behind the np.random.* section about the
global state and seed can be revisited for possible deprecation of
convenience features.

One clarifying question, mainly to see IIUC

in this quote
"""
Calling numpy.random.seed() thereafter SHOULD just pass the given seed
to the current basic RNG object and not attempt to reset the basic RNG
to the Mersenne Twister. The global RandomState instance MUST be
accessible by the name numpy.random.mtrand._rand
"""

"the current basic RNG object" refers to the global object. AFAIU, it
is possible to change it numpy.random.mtrand._rand. Is it?

numpy.random.mtrand._rand would not be a basic RNG object; it would be (as it is now) a RandomState instance. "the current basic RNG object" would be the basic RNG that that global RandomState instance is currently using.

It is not possible (now or in the glorious NEP future) to assign a new instance to numpy.random.mtrand._rand. All of the numpy.random.* functions are actually just simple aliases to the methods on that object when the module is first built. Re-assigning _rand wouldn't reassign those aliases. numpy.random.standard_normal(), for instance, would still be the .standard_normal() method on the RandomState instance that _rand initially pointed to.
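
To make the aliasing point concrete, here is a minimal sketch that does not touch numpy at all. It assumes a hypothetical namespace object (fake_module) standing in for the numpy.random module, and uses the standard library's random.Random in place of RandomState; only the binding behaviour is being illustrated.

```python
import random
from types import SimpleNamespace

# Hypothetical stand-in for the numpy.random module namespace.
fake_module = SimpleNamespace()
fake_module._rand = random.Random(0)          # plays the role of the global RandomState
fake_module.gauss = fake_module._rand.gauss   # alias bound once, at "module build" time

original = fake_module._rand
fake_module._rand = random.Random(12345)      # re-assign the module attribute

# The alias is a bound method: it still points at the original instance.
alias_still_original = fake_module.gauss.__self__ is original
alias_follows_new = fake_module.gauss.__self__ is fake_module._rand
```

Re-assigning the attribute only rebinds the name _rand; the already-created alias keeps drawing from the instance it was bound to.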

Currently and under the NEP, the only way to modify numpy.random.mtrand._rand is to call its methods (i.e. the numpy.random.* convenience functions) to modify its internal state. That's not changing.

The only thing that will change will be that there will be a new numpy.random.* function to call that will let you give the global RandomState a new basic RNG object that it will swap in internally. Let's call it np.random.swap_global_basic_rng(). If you don't use that function, you won't have a problem. I intend to make this new function *very* explicit about what it is doing, and document the crap out of it so it won't be misused like np.random.seed() is.

I never tried that so I didn't know we can change the global
RandomState, and thought we will have to replace np.random.seed usage
with a specific RandomState(seed) instance

I did a quick review of np.random.seed() usage in statsmodels, and I think you are mostly fine. It looks like you mostly use it in unit tests and at the top of examples. The only possible problem that I can see that you might have with the swap_global_basic_rng() is if some other package that you rely on calls it in its library code. Then subsequent statsmodels unit tests might fail because when they call np.random.seed(), it won't be reseeding a Mersenne Twister but another basic RNG.

However, I intend to make that a weird and unnatural thing to do. It's already unlikely to happen as it's a niche requirement that one mostly would need at the start of a whole program, not buried down inside library code. But we will also document that function to discourage such usage, and probably have unconditional noisy warnings that users would have to explicitly silence.

If one of your dependencies did that, you'd be well within your rights to tell them that they are misusing numpy and causing breakage in statsmodels.

In loose analogy:

Matplotlib has a "global" current figure and axis, gca, gcf.
In statsmodels we avoid any access to and usage of it and only work
with individual figure/axis instances that can be provided by the
user. (except for maybe some documentation examples and maybe some
"legacy" code.)
( 
https://github.com/statsmodels/statsmodels/blob/master/statsmodels/graphics/utils.py#L48
)

AFAICS, essentially, statsmodels will need a similar policy for
RandomState/RandomGenerator and give up the usage of the global random
instance.

I mean, you certainly *should* (outside of unit tests) for very similar reasons why you avoid the global state in matplotlib, but this NEP won't force you to. You should do so anyways under the status quo, too. For any of your functions that call np.random.* functions internally, it's hard to use them in threaded applications, for instance, because it is relying on that global state.


scikit-learn's check_random_state() is a good pattern to follow.
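
A minimal sketch of that pattern, written against the standard library's random.Random rather than numpy's RandomState so that it runs anywhere. It is an analogue, not scikit-learn's code: in particular, for None the real helper hands back the global RandomState, whereas this sketch assumes a fresh unseeded instance is acceptable. noisy_mean is a hypothetical library function added only to show the calling convention.

```python
import numbers
import random


def check_random_state(seed=None):
    """Turn `seed` into a random.Random instance (stdlib analogue of the pattern)."""
    if seed is None:
        # Assumption: a fresh, OS-seeded instance; no global state is touched.
        return random.Random()
    if isinstance(seed, numbers.Integral):
        return random.Random(int(seed))
    if isinstance(seed, random.Random):
        return seed  # caller-owned instance is used as-is
    raise ValueError(f"{seed!r} cannot be used to seed a random.Random instance")


def noisy_mean(values, random_state=None):
    """Hypothetical library function: all randomness comes from the passed-in state."""
    rng = check_random_state(random_state)
    return sum(v + rng.gauss(0.0, 1.0) for v in values) / len(values)


# Same integer seed, same result; no global seed() call is involved.
a = noisy_mean([1.0, 2.0, 3.0], random_state=0)
b = noisy_mean([1.0, 2.0, 3.0], random_state=0)
```

Because each call works on its own instance, two threads can call noisy_mean() with separate states without stepping on each other.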

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco

_______________________________________________
scikit-learn mailing list
[email protected]
https://mail.python.org/mailman/listinfo/scikit-learn
