Eric Appelt added the comment:

I also looked at the hashes of the strings themselves, rather than of
frozensets, to check string hashing directly.

For example, n=3:

['', 'a', 'b', 'c', 'ab', 'ac', 'bc', 'abc']

rather than:

[frozenset(), frozenset({'a'}), frozenset({'b'}), frozenset({'c'}), 
frozenset({'b', 'a'}), frozenset({'c', 'a'}), frozenset({'b', 'c'}), 
frozenset({'b', 'a', 'c'})]
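Both lists can be produced by taking every subset of the first n lowercase
letters, ordered by size. Here is a minimal sketch. The helper names are
hypothetical and this is not necessarily how the lists above were generated:

```python
from itertools import combinations
from string import ascii_lowercase

def letter_subsets(n):
    """Every subset of the first n lowercase letters, ordered by size."""
    letters = ascii_lowercase[:n]
    return [combo for k in range(n + 1) for combo in combinations(letters, k)]

def as_strings(n):
    # Join each subset into a string: ('a', 'b') -> 'ab'
    return [''.join(c) for c in letter_subsets(n)]

def as_frozensets(n):
    # Same subsets, wrapped as frozensets (their printed order may vary)
    return [frozenset(c) for c in letter_subsets(n)]

print(as_strings(3))  # ['', 'a', 'b', 'c', 'ab', 'ac', 'bc', 'abc']
```

For n=7 each list has 2**7 = 128 entries.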

As in my last comment, I made a distribution, but this time of the number of
unique last-7-bit sequences in a set of 128 such strings (n=7), and compared
it to pseudorandom integers, just as was done before with frozensets of the
letter combinations. This is shown in the file "str_string_n7_10k.png".
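The measurement can be sketched roughly as follows. This is a minimal sketch
and not the script behind the plot. The helper names are hypothetical, and the
10,000-trial baseline is an assumption suggested only by the file name.
CPython salts str hashes once per process (see PYTHONHASHSEED), so the string
count is fixed within a single run. A distribution for the strings would need
many separate runs, for example with a different PYTHONHASHSEED for each.

```python
import random
from collections import Counter
from itertools import combinations
from string import ascii_lowercase

BITS = 7
MASK = (1 << BITS) - 1  # 0x7f: keeps only the last 7 bits

def letter_strings(n):
    """All 2**n strings built from subsets of the first n letters."""
    letters = ascii_lowercase[:n]
    return [''.join(c) for k in range(n + 1) for c in combinations(letters, k)]

def unique_low_bits(values, mask=MASK):
    """Count the distinct values of hash(v) & mask over the given values."""
    return len({hash(v) & mask for v in values})

def random_baseline(trials, size=1 << BITS, rng=random):
    """Distribution of distinct last-7-bit values among `size` random ints."""
    return Counter(
        len({rng.getrandbits(64) & MASK for _ in range(size)})
        for _ in range(trials)
    )

strings = letter_strings(BITS)   # 128 strings for n=7
print(unique_low_bits(strings))  # one sample; changes between interpreter runs
baseline = random_baseline(10_000)
```

A histogram of `baseline` gives the pseudorandom reference curve. The string
counts from many runs can then be plotted against it.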

The last 7 bits of the small-string hashes produce a distribution much like
that of regular pseudorandom integers.

So if there is a problem with the hash algorithm, it appears to be related to
frozenset hashing rather than to string hashing.

----------
Added file: http://bugs.python.org/file45270/str_string_n7_10k.png

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue26163>
_______________________________________