Re: [Python-Dev] Hash collision security issue (now public)

Christian Heimes Sat, 07 Jan 2012 09:59:18 -0800

Am 07.01.2012 12:02, schrieb Stefan Behnel:
> Wouldn't Bob Jenkins' "lookup3" hash function fit in here? After all, it's
> portable, known to provide a very good distribution for different string
> values and is generally fast on both 32 and 64 bit architectures.
> 
> http://burtleburtle.net/bob/c/lookup3.c
> 
> The analysis is here:
> 
> http://burtleburtle.net/bob/hash/doobs.html
> 
> It seems that there's also support for generating 64bit hash values
> (actually 2x32bits) efficiently.


This thread as well as the ticket is getting so long that people barely
have a chance to catch up ...

Guido has stated that he doesn't want a completely new hash algorithm
for Python 2.x to 3.2. A new hash algorithm for 3.3 needs a PEP, too.

I've done some experiments with FNV and Murmur3. With Murmur3 128bit
I've seen some minor speed improvements on 64bit platforms. At first I
was surprised but it makes sense. Murmur3 operates on uint32_t blocks
while Python's hash algorithm iterates over 1 byte (bytes, ASCII), 2
bytes (USC2) or 4 bytes (USC4) types. Since most strings are either
ASCII or UCS2, the inner loop of the current algorithm is more tight.


> Admittedly, this may require some adaptation for the PEP393 unicode memory
> layout in order to produce identical hashes for all three representations
> if they represent the same content. So it's not a drop-in replacement.

Is this condition required and implemented at the moment?

Christian


_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Hash collision security issue (now public)

Reply via email to