Simon Riggs <[EMAIL PROTECTED]> writes:
> On Sun, 2007-10-28 at 13:05 -0500, Kenneth Marshall wrote:
>> The new hash function is roughly twice as fast as the old function in
>> terms of straight CPU time. It uses the same design as the current
>> hash but provides code paths for aligned and unaligned access as well
>> as separate mixing functions for different blocks in the hash run
>> instead of having one general purpose block. I think the speed will
>> not be an obvious win with smaller items, but will be very important
>> when hashing larger items (up to 32kb).
>>
>> Better in this case means that the new hash mixes more thoroughly,
>> which results in fewer collisions and a more even bucket distribution.
>> There is also a 64-bit variant which is still faster since it can
>> take advantage of the 64-bit processor instruction set.
> Ken, I was really looking for some tests that show both of the above
> were true. We've had some trouble proving the claims of other
> algorithms before, so I'm less inclined to take those things at face
> value.

I spent some time today looking at this code more closely and running
some simple speed tests.  It is faster than what we have, although 2X
is the upper limit of the speedups I saw on four different machines.
There are several things going on in comparison to our existing
hash_any:

* If the source data is word-aligned, the new code fetches it a word at
a time instead of a byte at a time; that is

    a += (k[0] + ((uint32) k[1] << 8) + ((uint32) k[2] << 16) + ((uint32) k[3] << 24));
    b += (k[4] + ((uint32) k[5] << 8) + ((uint32) k[6] << 16) + ((uint32) k[7] << 24));
    c += (k[8] + ((uint32) k[9] << 8) + ((uint32) k[10] << 16) + ((uint32) k[11] << 24));

becomes

    a += k[0];
    b += k[1];
    c += k[2];

where k is now a pointer to uint32 instead of uchar.  This accounts for
most of the speed improvement.  However, the results now vary between
big-endian and little-endian machines.  That's fine for PG's purposes.
But it means that we need two sets of code for the unaligned-input code
path, since it clearly won't do for the same bytestring to get two
different hashes depending on whether it happens to be presented
aligned or not.

The presented patch actually offers *four* code paths, so that you can
compute either little-endian-ish or big-endian-ish hashes on either
type of machine.  That's nothing but bloat for our purposes, and should
be reduced to the minimum.

* Given a word-aligned source pointer and a length that isn't a
multiple of 4, the new code fetches the last partial word as a full
word fetch and masks it off, as per the code comment:

     * "k[2]&0xffffff" actually reads beyond the end of the string, but
     * then masks off the part it's not allowed to read.  Because the
     * string is aligned, the masked-off tail is in the same word as the
     * rest of the string.
     * Every machine with memory protection I've seen
     * does it on word boundaries, so is OK with this.  But VALGRIND will
     * still catch it and complain.  The masking trick does make the hash
     * noticably faster for short strings (like English words).

This I think is well beyond the bounds of sanity, especially since we
have no configure support for setting #ifdef VALGRIND.  I'd lose the
"non valgrind clean" paths (which again are contributing to the patch's
impression of bloat/redundancy).

* Independently of the above changes, the actual hash calculation (the
mix() and final() macros) has been changed.  Ken claims that this made
the hash "better", but I'm deeply suspicious of that.  The comments in
the code make it look like Jenkins actually sacrificed hash quality in
order to get a little more speed.  I don't think we should adopt those
changes unless some actual evidence is presented that the hash is
better and not merely faster.

In short: I think we should adopt the changes to use aligned word
fetches where possible, but not adopt the mix/final changes unless more
evidence is presented.

Lastly, the patch adds yet more code to provide the option of computing
a 64-bit hash rather than 32.  (AFAICS, the claim that this part is
optimized for 64-bit machines is mere fantasy.  It's simply Yet Another
duplicate of the identical code, but it gives you back two of its three
words of internal state at the end, instead of only one.)  As I said
before, this is just bloat for us.  I've got zero interest in pursuing
64-bit hashing when we still don't have a hash index implementation
that anyone would consider using in anger.

Let's see if we can make the cake edible before worrying about putting
a better grade of icing on it.

			regards, tom lane

--
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches
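[Editor's note: the two tricks discussed in the message above (word-at-a-time
fetching, which makes the result endian-dependent, and the tail-masking
overread) can be demonstrated with a small standalone C program.  This is
illustrative code only, not taken from the patch; all names here are
invented for the sketch.]

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Old-style fetch: assemble one 32-bit word from four bytes in
     * little-endian order, as in the a += (k[0] + (k[1]<<8) + ...)
     * code quoted above.  Gives the same answer on any machine. */
    static uint32_t fetch32_bytewise(const unsigned char *k)
    {
        return (uint32_t) k[0]
             | ((uint32_t) k[1] << 8)
             | ((uint32_t) k[2] << 16)
             | ((uint32_t) k[3] << 24);
    }

    int main(void)
    {
        /* A word-aligned buffer holding a 7-byte string: one full word
         * plus a 3-byte tail.  Pre-fill with garbage so the byte just
         * past the end of the string is nonzero. */
        uint32_t buf[2];
        buf[0] = buf[1] = 0xdeadbeefu;
        memcpy(buf, "abcdefg", 7);

        const unsigned char *kb = (const unsigned char *) buf;

        uint32_t probe = 1;
        int little_endian = (*(unsigned char *) &probe == 1);

        if (little_endian)
        {
            /* The word fetch buf[0] matches the byte-assembled value
             * only on little-endian hardware, which is why the new
             * code's result depends on endianness. */
            assert(buf[0] == fetch32_bytewise(kb));

            /* Tail-masking trick: fetch the whole second word (reading
             * one byte past the 7-byte string, but staying inside the
             * aligned word) and mask off the byte that is not part of
             * the string. */
            uint32_t tail_masked = buf[1] & 0xffffff;
            uint32_t tail_bytewise = (uint32_t) kb[4]
                                   | ((uint32_t) kb[5] << 8)
                                   | ((uint32_t) kb[6] << 16);
            assert(tail_masked == tail_bytewise);

            printf("word fetch %08x, masked tail %08x\n",
                   (unsigned) buf[0], (unsigned) tail_masked);
        }
        else
            printf("big-endian machine: word fetch differs from byte assembly\n");

        return 0;
    }

(As the comment quoted above notes, the one-byte overread never crosses a
page boundary because the buffer is word-aligned, but tools like Valgrind
will still flag it.)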