[issue9025] Non-uniformity in randrange for large arguments.

2010-09-06 Thread Raymond Hettinger
Raymond Hettinger rhettin...@users.sourceforge.net added the comment: Put in a fix with r84576. May come back to it to see if it can or should be optimized with C. For now, this gets the job done. -- resolution: - fixed status: open - closed ___

[issue9025] Non-uniformity in randrange for large arguments.

2010-08-07 Thread Raymond Hettinger
Changes by Raymond Hettinger rhettin...@users.sourceforge.net: -- priority: normal - high ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue9025 ___

[issue9025] Non-uniformity in randrange for large arguments.

2010-06-24 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: Distribution with my algorithm: ... from collections import Counter print Counter(_randint(6755399441055744) % 3 for _ in xrange(1)) = Counter({0L: 33342985, 2L: 5781, 1L: 33321234}) Distribution: {0:

[issue9025] Non-uniformity in randrange for large arguments.

2010-06-24 Thread Mark Dickinson
Mark Dickinson dicki...@gmail.com added the comment: A couple of points: (1) In addition to documenting the extent of the repeatability, it would be good to have tests to prevent changes that inadvertently change the sequence of randrange values. (2) For large arguments, cross-platform

[issue9025] Non-uniformity in randrange for large arguments.

2010-06-23 Thread Raymond Hettinger
Raymond Hettinger rhettin...@users.sourceforge.net added the comment: FWIW, here are two approaches to getting an equi-distributed version of int(n*random()) where 0 n = 2**53. The first mirrors the approach currently in the code. The second approach makes fewer calls to random(). def

[issue9025] Non-uniformity in randrange for large arguments.

2010-06-23 Thread Mark Dickinson
Mark Dickinson dicki...@gmail.com added the comment: Either of these looks good to me. If the last line of the second is changed from return int(r) % n to return int(r) // (N // n) then it'll use the high-order bits of random() instead of the low-order bits. This doesn't matter for MT, but

[issue9025] Non-uniformity in randrange for large arguments.

2010-06-23 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: This wouldn't be the first time reproduceability is dropped, since reading from the docs: “As an example of subclassing, the random module provides the WichmannHill class that implements an alternative generator in pure Python. The class

[issue9025] Non-uniformity in randrange for large arguments.

2010-06-23 Thread Senthil Kumaran
Senthil Kumaran orsent...@gmail.com added the comment: I guess, Antoine wanted to point out this: Changed in version 2.3: MersenneTwister replaced Wichmann-Hill as the default generator. But as the paragraph points out Python did provide non default WichmanHill class for generating repeatable

[issue9025] Non-uniformity in randrange for large arguments.

2010-06-23 Thread Mark Dickinson
Mark Dickinson dicki...@gmail.com added the comment: BTW, the Wichmann-Hill code is gone in py3k, so that doc paragraph needs removing or updating. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue9025

[issue9025] Non-uniformity in randrange for large arguments.

2010-06-23 Thread Raymond Hettinger
Raymond Hettinger rhettin...@users.sourceforge.net added the comment: Thanks guys, I've got it from here. Some considerations for the PRNG are: * equidistribution (for quality) * repeatability from the same seed (even in multithreaded environments) * quality and simplicity of API (for

[issue9025] Non-uniformity in randrange for large arguments.

2010-06-23 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: Some considerations for the PRNG are: * equidistribution (for quality) * repeatability from the same seed (even in multithreaded environments) I believe a reasonable (com)promise would be to guarantee repeatability accross a given set of

[issue9025] Non-uniformity in randrange for large arguments.

2010-06-23 Thread Mark Dickinson
Mark Dickinson dicki...@gmail.com added the comment: * possibly providing a C version of rnd2() If recoding in C is acceptable, I think there may be better ( = simpler and faster) ways than doing a direct translation of rnd2. For example, for small k, the following algorithm for randrange(k)

[issue9025] Non-uniformity in randrange for large arguments.

2010-06-23 Thread Mark Dickinson
Mark Dickinson dicki...@gmail.com added the comment: Just to illustrate, here's a patch that adds a method Random._smallrandbelow, based on the algorithm I described above. -- Added file: http://bugs.python.org/file17755/_smallrandbelow.diff ___

[issue9025] Non-uniformity in randrange for large arguments.

2010-06-23 Thread Raymond Hettinger
Raymond Hettinger rhettin...@users.sourceforge.net added the comment: Antoine, there does need to be repeatablity; there's no question about that. The open question for me is how to offer that repeatability in the cleanest manner. People use random.seed() for reproducible tests. They need

[issue9025] Non-uniformity in randrange for large arguments.

2010-06-23 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: randint.py: another algorithm to generate a random integer in a range. It uses only operations on unsigned integers (no evil floatting point number). It calls tick() multiple times to generate enough entropy. It has an uniform

[issue9025] Non-uniformity in randrange for large arguments.

2010-06-23 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: Antoine, there does need to be repeatablity; there's no question about that. Well, that doesn't address my proposal of making it repeatable accross bugfix releases only. There doesn't seem to be a strong use case for perpetual repeatability.

[issue9025] Non-uniformity in randrange for large arguments.

2010-06-21 Thread Terry J. Reedy
Terry J. Reedy tjre...@udel.edu added the comment: 'Random', without qualification, is commonly taken to mean 'with uniform distribution'. Otherwise it has no specific meaning and could well be a synonym for 'arbitrary' or 'haphazard'. The behavior reported is buggy and in my opinion should

[issue9025] Non-uniformity in randrange for large arguments.

2010-06-18 Thread Mark Dickinson
New submission from Mark Dickinson dicki...@gmail.com: Not a serious bug, but worth noting: The result of randrange(n) is not even close to uniform for large n. Witness the obvious skew in the following (this takes a minute or two to run, so you might want to reduce the range argument):

[issue9025] Non-uniformity in randrange for large arguments.

2010-06-18 Thread Mark Dickinson
Mark Dickinson dicki...@gmail.com added the comment: Note: the number 6755399441055744 is special: it's 0.75 * 2**53, and was deliberately chosen so that the non-uniformity is easily exhibited by looking at residues modulo 3. For other numbers of this size, the non-uniformity is just as

[issue9025] Non-uniformity in randrange for large arguments.

2010-06-18 Thread Mark Dickinson
Mark Dickinson dicki...@gmail.com added the comment: Here's an example patch that removes any bias from randrange(n) (except for bias resulting from the imperfectness of the core MT generator). I added a small private method to Modules/_randommodule.c to aid the computation. This only fixes

[issue9025] Non-uniformity in randrange for large arguments.

2010-06-18 Thread Mark Dickinson
Mark Dickinson dicki...@gmail.com added the comment: The nonuniformity of randrange has a knock-on effect in other random module functions. For example, take a sample of 100 elements from range(6004799503160661), and take the smallest element from that sample. Then the exact distribution of

[issue9025] Non-uniformity in randrange for large arguments.

2010-06-18 Thread Alexander Belopolsky
Changes by Alexander Belopolsky belopol...@users.sourceforge.net: -- nosy: +belopolsky ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue9025 ___ ___

[issue9025] Non-uniformity in randrange for large arguments.

2010-06-18 Thread STINNER Victor
Changes by STINNER Victor victor.stin...@haypocalc.com: -- nosy: +haypo ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue9025 ___ ___

[issue9025] Non-uniformity in randrange for large arguments.

2010-06-18 Thread Mark Dickinson
Mark Dickinson dicki...@gmail.com added the comment: Here's a more careful Python-only patch that fixes the bias in randrange and randint (but not in shuffle, choice or sample). It should work well both for Mersenne Twister and for subclasses of Random that use a poorer PRNG with

[issue9025] Non-uniformity in randrange for large arguments.

2010-06-18 Thread Raymond Hettinger
Changes by Raymond Hettinger rhettin...@users.sourceforge.net: -- assignee: - rhettinger ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue9025 ___

[issue9025] Non-uniformity in randrange for large arguments.

2010-06-18 Thread Raymond Hettinger
Raymond Hettinger rhettin...@users.sourceforge.net added the comment: Will take a look at this in the next few days. Am tempted to just either provide a recipe or provide a new method. That way sequences generated by earlier python's are still reproducible. --

[issue9025] Non-uniformity in randrange for large arguments.

2010-06-18 Thread Alexander Belopolsky
Alexander Belopolsky belopol...@users.sourceforge.net added the comment: I would prefer to see correct algorithm in stdlib and a recipe for how to reproduce old sequences for the users who care. -- ___ Python tracker rep...@bugs.python.org

[issue9025] Non-uniformity in randrange for large arguments.

2010-06-18 Thread Raymond Hettinger
Raymond Hettinger rhettin...@users.sourceforge.net added the comment: FWIW, we spent ten years maintaining the ability to reproduce sequences. It has become an implicit promise. I'll take a look at the patch in the next few days. -- ___ Python

[issue9025] Non-uniformity in randrange for large arguments.

2010-06-18 Thread Mark Dickinson
Mark Dickinson dicki...@gmail.com added the comment: Hmm. I hadn't considered the reproducibility problem. Does the module aim for reproducibility across all platforms *and* all versions of Python? Or just one of those? For small n, I think the patched version of randrange(n) produces the