STINNER Victor <victor.stin...@haypocalc.com> added the comment:

I tried the collision counting with a low number of collisions:
less than 15 collisions
-----------------------

Fail at startup.

5 collisions (32 buckets, 21 used=65.6%): hash=ceb3152f => f
10 collisions (32 buckets, 21 used=65.6%): hash=ceb3152f => f

dict((str(k), 0) for k in range(2000000))
-----------------------------------------

15 collisions (32,768 buckets, 18,024 used=55.0%): hash=0e4631d2 => 31d2
20 collisions (131,072 buckets, 81,568 used=62.2%): hash=12660719 => 719
25 collisions (1,048,576 buckets, 643,992 used=61.4%): hash=6a1f6d21 => f6d21
30 collisions (1,048,576 buckets, 643,992 used=61.4%): hash=6a1f6d21 => f6d21
35 collisions => ? (more than 10,000,000 integers)

random_dict('', 50000, charset, 1, 3)
-------------------------------------
charset = 'abcdefghijklmnopqrstuvwxyz0123456789'

15 collisions (8,192 buckets, 5,083 used=62.0%): hash=1526677a => 77a
20 collisions (32,768 buckets, 19,098 used=58.3%): hash=5d7760e6 => 60e6
25 collisions => <unable to generate a new key>

random_dict('', 50000, charset, 1, 3)
-------------------------------------
charset = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.=+_(){}%'

15 collisions (32,768 buckets, 20,572 used=62.8%): hash=789fe1e6 => 61e6
20 collisions (2,048 buckets, 1,297 used=63.3%): hash=2052533d => 33d
25 collisions => nope

random_dict('', 50000, charset, 1, 10)
--------------------------------------
charset = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.=+_(){}%'

15 collisions (32,768 buckets, 18,964 used=57.9%): hash=94d7c4f5 => 44f5
20 collisions (32,768 buckets, 21,548 used=65.8%): hash=acb5b39e => 339e
25 collisions (8,192 buckets, 5,395 used=65.9%): hash=04d367ae => 7ae
30 collisions => nope

random_dict() comes from the following script (Python 2):
***
import random

def random_string(charset, minlen, maxlen):
    strlen = random.randint(minlen, maxlen)
    return ''.join(random.choice(charset) for index in xrange(strlen))

def random_dict(prefix, count, charset, minlen, maxlen):
    dico = {}
    keys = set()
    for index in xrange(count):
        for tries in xrange(10000):
            key = prefix + random_string(charset, minlen, maxlen)
            if key in keys:
                continue
            keys.add(key)
            break
        else:
            raise ValueError("unable to generate a new key")
        dico[key] = None
    return dico

charset = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.=+_(){}%'
charset = 'abcdefghijklmnopqrstuvwxyz0123456789'  # the last assignment wins
random_dict('', 50000, charset, 1, 3)
***

I ran the Django test suite. With a limit of 20 collisions, 60 tests fail.
With a limit of 50 collisions, there is no failure. But I don't think that
the test suite uses large data sets.

I also tried the Django test suite with a randomized hash function: there
are 46 failures. Many (all?) are related to the order of dict keys, e.g. in
repr(dict) or indirectly in HTML output. I didn't analyze all of the
failures. I suppose that Django could simply run its test suite with
PYTHONHASHSEED=0 (disabling the randomized hash function), at least at
first.

----------
_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue13703>
_______________________________________
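As an aside, the collision-counting mechanism discussed above can be sketched in a few lines. The following is my own toy illustration, not CPython's actual dict code: an open-addressing table, loosely modeled on CPython's probe sequence, that raises once a single insertion meets too many occupied slots. The names CountingDict, CollisionLimitError, the fixed table size, and the limit of 15 are all illustrative assumptions.

```python
class CollisionLimitError(Exception):
    """A single insertion probed more than COLLISION_LIMIT occupied slots."""

class CountingDict(object):
    """Toy open-addressing table with a per-insertion collision limit.

    Sketch only: no resizing, and the probe sequence is a simplified
    version of CPython's (i = 5*i + perturb + 1).
    """
    COLLISION_LIMIT = 15

    def __init__(self, nbuckets=32):
        self.buckets = [None] * nbuckets

    def insert(self, key, value):
        mask = len(self.buckets) - 1
        h = hash(key)
        perturb = h & 0xFFFFFFFF   # keep the perturbation non-negative
        i = h & mask
        collisions = 0
        while True:
            slot = self.buckets[i]
            if slot is None or slot[0] == key:
                self.buckets[i] = (key, value)
                return
            collisions += 1
            if collisions > self.COLLISION_LIMIT:
                raise CollisionLimitError(key)
            # mix the remaining hash bits into the probe sequence
            i = (5 * i + perturb + 1) & mask
            perturb >>= 5

class Evil(object):
    """Key class whose instances all share one hash value (forged collisions)."""
    def __init__(self, name):
        self.name = name
    def __hash__(self):
        return 0x42
    def __eq__(self, other):
        return isinstance(other, Evil) and self.name == other.name

# Twenty ordinary int keys insert fine (hash(n) == n, so slots spread out)...
d = CountingDict()
for n in range(20):
    d.insert(n, None)

# ...but colliding keys hit the limit long before the table is full.
d = CountingDict()
try:
    for n in range(20):
        d.insert(Evil(str(n)), None)
except CollisionLimitError:
    print("collision limit exceeded")   # prints: collision limit exceeded
```

This is the trade-off the collision counts above are probing: the limit must be low enough to stop an attacker who forges keys with identical hashes, but high enough that legitimate data (like the random_dict() workloads) never trips it.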