Am 07.01.2012 12:02, schrieb Stefan Behnel: > Wouldn't Bob Jenkins' "lookup3" hash function fit in here? After all, it's > portable, known to provide a very good distribution for different string > values and is generally fast on both 32 and 64 bit architectures. > > http://burtleburtle.net/bob/c/lookup3.c > > The analysis is here: > > http://burtleburtle.net/bob/hash/doobs.html > > It seems that there's also support for generating 64bit hash values > (actually 2x32bits) efficiently.
This thread as well as the ticket is getting so long that people barely have a chance to catch up ... Guido has stated that he doesn't want a completely new hash algorithm for Python 2.x to 3.2. A new hash algorithm for 3.3 needs a PEP, too. I've done some experiments with FNV and Murmur3. With Murmur3 128bit I've seen some minor speed improvements on 64bit platforms. At first I was surprised but it makes sense. Murmur3 operates on uint32_t blocks while Python's hash algorithm iterates over 1 byte (bytes, ASCII), 2 bytes (USC2) or 4 bytes (USC4) types. Since most strings are either ASCII or UCS2, the inner loop of the current algorithm is more tight. > Admittedly, this may require some adaptation for the PEP393 unicode memory > layout in order to produce identical hashes for all three representations > if they represent the same content. So it's not a drop-in replacement. Is this condition required and implemented at the moment? Christian _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com