Tim Peters <[email protected]> added the comment:
> Suppose that there is a hash collision, say hash((3, 3)) ==
> hash((-3, -3)) and you change the hashing algorithm to fix
> this collision.
There are _two_ hash functions at play in that collision: the tuple hash
function, and the integer hash function. Regardless of whether the _tuple_
hash function does
t ^= t << shift;
or
t += t >> shift;
or anything else involving just `t`, that only directly affects the result of
the _int_ hash function. Whose low order bits cannot be improved in general -
they're already optimally spread out in the most important cases.
> If that change does NOT affect the lower N bits,
> then you still have a collision hash((3, 3)) % 2**N ==
> hash((-3, -3)) % 2**N. This is relevant because the dict
> implementation starts by taking the lower bits first. So
> ideally we want to avoid hash collisions in the lower
> bits too.
Which the int hash on its own already does a superb job of doing in the most
important cases. If you feel you just have to mess with low-order bits, do it
instead on the _tuple_ hash intermediate results, not on its inputs. Like
x += x >> 16;
after t is folded in to x. It's x you care about directly, not t. Damaging
desirable properties in t to _indirectly_ gain something in x is too clever by
half.
Mucking with t to avoid the nested-tuple catastrophes is worth it, but I prefer
to do that with as little damage to t's desirable properties as reasonably
possible.
> This may also be a reason to avoid the standard FNV
> multipliers: the 64-bit FNV multiplier was chosen with
> the full 64-bit hash range in mind. It was never meant
> to work well when truncated to some lower N bits.
Do you believe any other multiplier would work better toward that end? For any
odd multiplier M, the last k bits of i*M are determined solely by the last k
bits of i, and
[(last k bits of i*M) for i in range(anything, anything + 2**k)]
is a permutation of range(2**k). The high order bits of i have nothing to do
with it, and any odd multiplier permutes the lower k bits over any stretch of
2**k consecutive multiplicands.
Extracting low-order bits is intended to be done by "xor folding" in FNV, and I
_expect_ the same would be prudent for any other multiplier:
https://tools.ietf.org/html/draft-eastlake-fnv-15#page-7
The dict and set lookup functions already do something stronger than xor
folding to bring high bits into play, but do so incrementally as the probe
sequence unfolds.
----------
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue34751>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe:
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com