Re: [Python-3000] Performance Notes - new hash algorithm

Larry Hastings Sat, 08 Sep 2007 19:24:56 -0700

If the Python community is just noticing the Hsieh hash, that implies that the Bob Jenkins hashes are probably unknown as well. Behold:

   http://burtleburtle.net/bob/hash/doobs.html

To save you a little head-scratching, the functions you want to play with are hashlittle()/hashlittle2() in "lookup3.c":

   http://burtleburtle.net/bob/c/lookup3.c

hashlittle() returns a 32-bit hash; hashlittle2() returns two 32-bit hashes on the same input (in effect a 64-bit hash). The "little" implies that the function is better on little-endian machines. (There is a hashbig(), but no hashbig2(); that is left as an exercise for the reader.)
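To make the "in effect a 64-bit hash" remark concrete, here is a minimal sketch of gluing two 32-bit halves into one 64-bit value. The helper name combine_halves and the sample values are hypothetical, and the code does not call lookup3.c itself; putting the first returned value in the low word is an assumption about how one might pack them, not something stated above.

```python
MASK32 = 0xFFFFFFFF

def combine_halves(primary: int, secondary: int) -> int:
    """Pack two 32-bit hash values into a single 64-bit integer.

    Assumption: `primary` goes in the low 32 bits and `secondary`
    in the high 32 bits. Both inputs are truncated to 32 bits first.
    """
    return ((secondary & MASK32) << 32) | (primary & MASK32)

# Made-up values standing in for the two outputs of a hashlittle2()-style call.
combined = combine_halves(0xDEADBEEF, 0x12345678)
print(hex(combined))  # 0x12345678deadbeef
```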

In our testing (at Facebook, for memcached) hashlittle2 was faster than the Hsieh hash; that was done a year ago (and before I joined) so I don't have numbers for you.

One goal of Jenkins's hashes is uniform distribution, so these functions presumably lack the serendipitous "similar inputs hash to similar values" behavior of Python's current hash function. But why is that a feature? (Not that I doubt Tim Peters!)
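For anyone who wants to see the "similar inputs hash to similar values" behavior, here is a rough pure-Python sketch of the multiply-and-XOR string hash that CPython 2.x used. It assumes a 32-bit C long and Latin-1 text, and it leaves out the signed conversion and the -1 special case, so treat it as an illustration rather than the interpreter's exact code. The name old_str_hash is hypothetical.

```python
MASK32 = 0xFFFFFFFF  # assumption: emulate a 32-bit C long

def old_str_hash(s: str) -> int:
    """Sketch of the CPython 2.x style string hash (unsigned, 32-bit)."""
    if not s:
        return 0
    data = s.encode("latin-1")  # assumption: one byte per character
    x = (data[0] << 7) & MASK32
    for byte in data:
        # Multiply by a fixed odd constant, then XOR in the next byte.
        x = ((1000003 * x) ^ byte) & MASK32
    x ^= len(data)
    return x

a = old_str_hash("key1")
b = old_str_hash("key2")
# The two keys differ only in their last character, so the hashes
# differ only in the bits where "1" and "2" differ.
print(hex(a), hex(b), hex(a ^ b))
```

Because the last byte is XORed in after the final multiply, two strings that share everything but their last character get hashes that differ only in the low bits. A uniformly distributing hash such as the Jenkins functions would be expected to scramble those differences across the whole word.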

Oh, and, all the Jenkins code is public domain.

Cheers,


/larry/

