Raymond Hettinger <raymond.hettin...@gmail.com> added the comment:

msg309956 - Author: Johnny Dude (JohnnyD) - Date: 2018-01-15 01:08

When using a tuple that includes a string as the seed, the results are not consistent when invoking a new interpreter or process.
For example, executing the following on a Linux machine will yield a different result on each run:

    python3.6 -c 'import random; random.seed(("a", 1)); print(random.random())'

Please note that the docstring of random.seed states: "Initialize internal state from hashable object." The Python documentation does not say this. (https://docs.python.org/3.6/library/random.html#random.seed)

This is very confusing; I hope you can fix the behavior, not the docstring.

msg309957 - Author: STINNER Victor (vstinner) (Python committer) - Date: 2018-01-15 01:13

random.seed(str) uses:

    if version == 2 and isinstance(a, (str, bytes, bytearray)):
        if isinstance(a, str):
            a = a.encode()
        a += _sha512(a).digest()
        a = int.from_bytes(a, 'big')

Whereas for other types, random.seed(obj) uses hash(obj), and hash is randomized by default in Python 3.

Yeah, the random.seed() documentation should describe the implementation and explain that hash(obj) is used and that the hash function is randomized by default: https://docs.python.org/dev/library/random.html#random.seed

msg310006 - Author: Raymond Hettinger (rhettinger) (Python committer) - Date: 2018-01-15 10:41

I'm getting a nice improvement in dispersion statistics by shuffling in higher bits right at the end:

        /* Disperse patterns arising in nested frozensets */
    +   hash ^= (hash >> 11) ^ (~hash >> 25);
        hash = hash * 69069U + 907133923UL;

Results for range() check:

                     range      range
                     baseline   new
    1st percentile   35.06%     40.63%
    1st decile       48.03%     51.34%
    mean             61.47%     63.24%
    median           63.24%     65.58%

Results for letter_range() check:

                     letter     letter
                     baseline   new
    1st percentile   39.59%     40.14%
    1st decile       50.90%     51.07%
    mean             63.02%     63.04%
    median           65.21%     65.23%

Test code:

    import itertools
    import string

    def letter_range(n):
        return string.ascii_letters[:n]

    def powerset(s):
        for i in range(len(s)+1):
            yield from map(frozenset, itertools.combinations(s, i))

    # range() check
    for i in range(10000):
        for n in range(5, 19):
            t = 2 ** n
            mask = t - 1
            u = len({h & mask for h in map(hash, powerset(range(i, i+n)))})
            print(u/t*100)

    # letter_range() check needs to be restarted (reseeded on every run)
    for n in range(5, 19):
        t = 2 ** n
        mask = t - 1
        u = len({h & mask for h in map(hash, powerset(letter_range(n)))})
        print(u/t*100)

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue26163>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
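Victor's explanation in msg309957 can be checked from Python itself. The sketch below is an illustration, not part of the thread; the helper `run_fresh` and the two snippet constants are hypothetical names. It starts fresh interpreters with different `PYTHONHASHSEED` values and shows that `hash(('a', 1))` changes between them, while seeding with a str (here `repr(('a', 1))`) goes through the SHA-512 path and yields the same `random.random()` value every time. It hashes the tuple directly rather than calling `random.seed(("a", 1))`, because Python 3.11 and later reject tuple seeds with a TypeError.

```python
import os
import subprocess
import sys


def run_fresh(code, hashseed):
    """Run `code` via `python -c` in a new interpreter whose PYTHONHASHSEED
    is fixed to `hashseed`, and return its stripped stdout."""
    env = dict(os.environ, PYTHONHASHSEED=str(hashseed))
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# hash() of a tuple containing a str depends on the per-process str hash salt,
# so it differs between interpreters started with different PYTHONHASHSEED values.
TUPLE_HASH = "print(hash(('a', 1)))"

# Workaround sketch (an assumption, not an official recipe): seed from repr(obj),
# which is a str and therefore takes the deterministic SHA-512 path.
# Only safe when repr(obj) is itself deterministic (true for tuples of str/int).
STABLE_SEED = (
    "import random\n"
    "random.seed(repr(('a', 1)))\n"
    "print(random.random())"
)

if __name__ == "__main__":
    print([run_fresh(TUPLE_HASH, s) for s in (1, 2)])   # two different hashes
    print([run_fresh(STABLE_SEED, s) for s in (1, 2)])  # the same float twice
```

The `repr()` trick is only as stable as the repr: for a set or frozenset of strings, the element order itself depends on the hash salt, so such a seed would still vary between runs.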
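Why the extra line in msg310006 helps can be seen from the arithmetic: in `hash * 69069U + 907133923UL`, the low n bits of the result depend only on the low n bits of `hash`, so patterns in the higher bits never reach the `h & mask` buckets that the test code counts. The added xor-shift folds bits from 11 and 25 places higher down into the low bits first. The sketch below emulates only those two C lines, assuming a 64-bit `Py_uhash_t`, and feeds them hypothetical synthetic inputs that differ only above bit 25. It is not the full frozenset hash and does not reproduce the percentages above.

```python
MASK64 = (1 << 64) - 1  # assumption: Py_uhash_t is a 64-bit unsigned type


def finalize_baseline(h):
    """Baseline step: hash = hash * 69069U + 907133923UL (mod 2**64)."""
    return (h * 69069 + 907133923) & MASK64


def finalize_new(h):
    """Proposed step: XOR higher bits down first, then the same multiply-add."""
    h ^= (h >> 11) ^ ((~h & MASK64) >> 25)  # ~h emulated as a 64-bit complement
    return finalize_baseline(h)


def occupancy(hashes, n):
    """Percentage of the 2**n low-bit buckets that are hit, as in the test code."""
    t = 2 ** n
    mask = t - 1
    return len({h & mask for h in hashes}) / t * 100


# Hypothetical synthetic inputs: 1024 values that differ only in bits 25..34.
inputs = [k << 25 for k in range(2 ** 10)]

if __name__ == "__main__":
    print(occupancy(map(finalize_baseline, inputs), 10))  # 0.09765625
    print(occupancy(map(finalize_new, inputs), 10))       # 100.0
```

With the baseline step every input lands in the same low-bit bucket; with the shift folded in, the ten varying bits reach the low end, and because 69069 is odd the multiply-add permutes the low 10 bits, so all 1024 buckets are hit.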