Jeroen Demeyer <j.deme...@ugent.be> added the comment:

> Nor is it hard to come up with collisions for most simple hash functions.

The Bernstein hash algorithm is a simple algorithm for which it can be proven 
mathematically that collisions cannot be generated if the multiplier is 
unknown. That is an objective improvement over the current tuple hash. Of 
course, in practice the multiplier is not secret, but it suffices that it is 
random enough.

> Addition also has a relationship to multiplication.

Of course, but not in a way that can be used to generate hash collisions. In 
fact, the simple relation between addition and multiplication makes this an 
actual provable mathematical statement.

> That is more suited to character data and small hash ranges (as opposed to 
> 64-bit).

Maybe you are thinking of the multiplier 33 which is classically used for 
character data? If you replace the multiplier 33 by a larger number like 
_PyHASH_MULTIPLIER == 1000003 (ideally it would be a truly random number), it 
works fine for arbitrary sequences and arbitrary bit lengths.

> The important thing is that it has passed our testing

That's a silly argument. If it passed testing, it is only because the testing 
is insufficient.

But really: what I am proposing is a better hash without obvious collisions 
which won't be slower than the existing hash. Why would you be against that? 
Why should we "roll our own" hash instead of using a known good algorithm?

----------
resolution: rejected -> 
status: closed -> open

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue34751>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to