On Sun, Oct 28, 2007 at 08:06:58PM +0000, Simon Riggs wrote: > On Sun, 2007-10-28 at 13:05 -0500, Kenneth Marshall wrote: > > On Sun, Oct 28, 2007 at 05:27:38PM +0000, Simon Riggs wrote: > > > On Sat, 2007-10-27 at 15:15 -0500, Kenneth Marshall wrote: > > > > Its features include a better and faster hash function. > > > > > > Looks very promising. Do you have any performance test results to show > > > it really is faster, when compiled into Postgres? Better probably needs > > > some definition also; in what way are the hash functions better? > > > > > > -- > > > Simon Riggs > > > 2ndQuadrant http://www.2ndQuadrant.com > > > > > The new hash function is roughly twice as fast as the old function in > > terms of straight CPU time. It uses the same design as the current > > hash but provides code paths for aligned and unaligned access as well > > as separate mixing functions for different blocks in the hash run > > instead of having one general purpose block. I think the speed will > > not be an obvious win with smaller items, but will be very important > > when hashing larger items (up to 32kb). > > > > Better in this case means that the new hash mixes more thoroughly > > which results in less collisions and more even bucket distribution. > > There is also a 64-bit varient which is still faster since it can > > take advantage of the 64-bit processor instruction set. > > Ken, I was really looking for some tests that show both of the above > were true. We've had some trouble proving the claims of other algorithms > before, so I'm less inclined to take those things at face value. > > I'd suggest tests with Integers, BigInts, UUID, CHAR(20) and CHAR(100). > Others may have different concerns. > > -- > Simon Riggs > 2ndQuadrant http://www.2ndQuadrant.com > Hi,
I have finally had a chance to do some investigation on the performance of the old hash mix() function versus the updated mix()/final() in the new hash function. Here is a table of my current results for both the old and the new hash function. In this case cracklib refers to the cracklib-dict containing 1648379 unique words massaged in various ways to generate input strings for the hash functions. The result is the number of collisions in the hash values generated. hash input old new ---------- --- --- cracklib 338 316 cracklib x 2 (i.e. clibclib) 305 319 cracklib x 3 (clibclibclib) 323 329 cracklib x 10 302 310 cracklib x 100 350 335 cracklib x 1000 314 309 cracklib x 100 truncated to char(100) 311 327 uint32 from 1-1648379 309 319 (uint32 1-1948379)*256 309 314 (uint32 1-1948379)*16 310 314 "a"uint32 (i.e. a00001,a0002...) 320 321 uint32uint32 (i.e. uint64) 321 287 In addition to these tests, the new mixing functions allow the hash to pass the frog2.c test by Bob Jenkins. Here is his comment from http://burtleburtle.net/bob/hash/doobs.html: lookup3.c does a much more thorough job of mixing than any of my previous hashes (lookup2.c, lookup.c, One-at-a-time). All my previous hashes did a more thorough job of mixing than Paul Hsieh's hash. Paul's hash does a good enough job of mixing for most practical purposes. The most evil set of keys I know of are sets of keys that are all the same length, with all bytes zero, except with a few bits set. This is tested by frog.c.. To be even more evil, I had my hashes return b and c instead of just c, yielding a 64-bit hash value. Both lookup.c and lookup2.c start seeing collisions after 2^53 frog.c keypairs. Paul Hsieh's hash sees collisions after 2^17 keypairs, even if we take two hashes with different seeds. lookup3.c is the only one of the batch that passes this test. It gets its first collision somewhere beyond 2^63 keypairs, which is exactly what you'd expect from a completely random mapping to 64-bit values. If anyone has any other data for me to test with, please let me know. I think this is a reasonable justification for including the new mixing process (mix() and final()) as well as the word-at-a-time processing in our hash function. I will be putting a small patch together to add the new mixing process back in to the updated hash function this weekend in time for the September commit-fest unless there are objections. Both the old and the new hash functions meet the strict avalanche conditions as well. (http://home.comcast.net/~bretm/hash/3.html) I have used an Inline::C perl driver for these tests and can post it if others would like to use it as a testbed. Regards, Ken avalance -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches