Eric Appelt added the comment:
I also looked at the hashes of the strings themselves, rather than of
frozensets, to check string hashing directly.
For example, with n=3 the strings are:
['', 'a', 'b', 'c', 'ab', 'ac', 'bc', 'abc']
rather than:
[frozenset(), frozenset({'a'}), frozenset({'b'}), frozenset({'c'}),
frozenset({'b', 'a'}), frozenset({'c', 'a'}), frozenset({'b', 'c'}),
frozenset({'b', 'a', 'c'})]
I made a distribution as in my last comment, but this time counting the
number of unique values of the last 7 bits of the hash in a set of 128 such
strings (n=7), and compared it to the same count for pseudorandom integers,
just as was done before with frozensets of the letter combinations. This is
shown in the file "str_string_n7_10k.png".
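
A minimal sketch of the measurement described above, in case anyone wants to
reproduce it. The helper names (subset_strings, unique_low_bits,
random_baseline) are hypothetical and are not taken from the script used for
the plot. How the string samples were varied between trials is not stated
here; this sketch assumes one sample per interpreter process, since str
hashes are randomized per process unless PYTHONHASHSEED is fixed.

```python
import random
from itertools import combinations
from string import ascii_lowercase


def subset_strings(n):
    """All 2**n strings made from subsets of the first n letters, e.g. n=3
    gives ['', 'a', 'b', 'c', 'ab', 'ac', 'bc', 'abc']."""
    letters = ascii_lowercase[:n]
    return ["".join(c) for r in range(n + 1) for c in combinations(letters, r)]


def unique_low_bits(values, bits=7):
    """Count the distinct values of the lowest `bits` bits among `values`."""
    mask = (1 << bits) - 1
    return len({v & mask for v in values})


def random_baseline(count, bits=7, trials=1000, seed=0):
    """Same count for `count` pseudorandom integers, repeated `trials` times.
    The 64-bit width and the fixed seed are assumptions of this sketch."""
    rng = random.Random(seed)
    return [
        unique_low_bits([rng.getrandbits(64) for _ in range(count)], bits)
        for _ in range(trials)
    ]


if __name__ == "__main__":
    strings = subset_strings(7)  # 128 strings
    # One sample for this process; rerun with different PYTHONHASHSEED
    # values to build up a distribution.
    print("strings:", unique_low_bits(hash(s) for s in strings))
    baseline = random_baseline(len(strings))
    print("random mean:", sum(baseline) / len(baseline))
```

For 128 uniformly random draws into 128 possible 7-bit values, the expected
number of distinct values is 128 * (1 - (127/128)**128), roughly 81, so the
baseline mean should land near that.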
The last 7 bits of the small string hashes produce a distribution much like
that of regular pseudorandom integers.
So if there is a problem with the hash algorithm, it appears to be related to
frozenset hashing and not to string hashing.
----------
Added file: http://bugs.python.org/file45270/str_string_n7_10k.png
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue26163>
_______________________________________