On Sat, Apr 05, 2008 at 03:40:35PM -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > On Sun, 2007-10-28 at 13:05 -0500, Kenneth Marshall wrote:
> >> The new hash function is roughly twice as fast as the old function in
> >> terms of straight CPU time. It uses the same design as the current
> >> hash but provides code paths for aligned and unaligned access as well
> >> as separate mixing functions for different blocks in the hash run
> >> instead of having one general purpose block. I think the speed will
> >> not be an obvious win with smaller items, but will be very important
> >> when hashing larger items (up to 32kb).
> >> Better in this case means that the new hash mixes more thoroughly,
> >> which results in fewer collisions and a more even bucket distribution.
> >> There is also a 64-bit variant, which is still faster since it can
> >> take advantage of the 64-bit processor instruction set.
> > Ken, I was really looking for some tests that show both of the above
> > were true. We've had some trouble proving the claims of other algorithms
> > before, so I'm less inclined to take those things at face value.
> I spent some time today looking at this code more closely and running
> some simple speed tests. It is faster than what we have, although 2X
> is the upper limit of the speedups I saw on four different machines.
> There are several things going on in comparison to our existing
> hash function:
> * If the source data is word-aligned, the new code fetches it a word at
> a time instead of a byte at a time; ie, instead of
>
>     a += (k[0] + ((uint32) k[1] << 8) + ((uint32) k[2] << 16) +
>           ((uint32) k[3] << 24));
>     b += (k[4] + ((uint32) k[5] << 8) + ((uint32) k[6] << 16) +
>           ((uint32) k[7] << 24));
>     c += (k[8] + ((uint32) k[9] << 8) + ((uint32) k[10] << 16) +
>           ((uint32) k[11] << 24));
>
> we have
>
>     a += k[0];
>     b += k[1];
>     c += k[2];
>
> where k is now a pointer to uint32 instead of uchar. This accounts for
> most of the speed improvement. However, the results now vary between
> big-endian and little-endian machines. That's fine for PG's purposes.
> But it means that we need two sets of code for the unaligned-input code
> path, since it clearly won't do for the same bytestring to get two
> different hashes depending on whether it happens to be presented aligned
> or not. The presented patch actually offers *four* code paths, so that
> you can compute either little-endian-ish or big-endian-ish hashes on
> either type of machine. That's nothing but bloat for our purposes, and
> should be reduced to the minimum.
I agree that a good portion of the speed up is due to the full word
processing. The original code from Bob Jenkins had all of these code
paths and I just dropped them in with a minimum of changes.
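To make the difference concrete, here is a minimal standalone sketch of the two fetch styles Tom describes, for a single 12-byte block (the function names are mine, not the patch's):

```c
#include <stdint.h>

typedef uint32_t uint32;

/* Byte-at-a-time: works for any alignment and gives the same result
 * on any endianness. */
static void
fetch_bytewise(const unsigned char *k, uint32 *a, uint32 *b, uint32 *c)
{
    *a += k[0] + ((uint32) k[1] << 8) + ((uint32) k[2] << 16) +
          ((uint32) k[3] << 24);
    *b += k[4] + ((uint32) k[5] << 8) + ((uint32) k[6] << 16) +
          ((uint32) k[7] << 24);
    *c += k[8] + ((uint32) k[9] << 8) + ((uint32) k[10] << 16) +
          ((uint32) k[11] << 24);
}

/* Word-at-a-time: requires k to be 4-byte aligned.  On a little-endian
 * machine this produces the same a/b/c as the byte-wise version; on a
 * big-endian machine it does not -- hence the separate code paths. */
static void
fetch_wordwise(const uint32 *k, uint32 *a, uint32 *b, uint32 *c)
{
    *a += k[0];
    *b += k[1];
    *c += k[2];
}
```

The word-wise loop does three loads and three adds where the byte-wise one does twelve loads plus the shift/or work, which is where most of the measured speedup comes from.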
> * Given a word-aligned source pointer and a length that isn't a multiple
> of 4, the new code fetches the last partial word as a full word fetch
> and masks it off, as per the code comment:
> * "k&0xffffff" actually reads beyond the end of the string, but
> * then masks off the part it's not allowed to read. Because the
> * string is aligned, the masked-off tail is in the same word as the
> * rest of the string. Every machine with memory protection I've seen
> * does it on word boundaries, so is OK with this. But VALGRIND will
> * still catch it and complain. The masking trick does make the hash
> * noticeably faster for short strings (like English words).
> This I think is well beyond the bounds of sanity, especially since we
> have no configure support for setting #ifdef VALGRIND. I'd lose the
> "non valgrind clean" paths (which again are contributing to the patch's
> impression of bloat/redundancy).
Okay, I will strip the VALGRIND paths; I did not see a real need for
them either.
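For reference, a standalone sketch of the masking trick under discussion (names are mine; in this demo the word is passed by value, so nothing actually reads out of bounds the way the real aligned-load code does):

```c
#include <stdint.h>
#include <stddef.h>

typedef uint32_t uint32;

/* Safe reference: assemble the final `len` (1..3) bytes one at a time. */
static uint32
tail_bytewise(const unsigned char *k, size_t len)
{
    uint32 c = 0;

    switch (len)
    {
        case 3: c += (uint32) k[2] << 16;   /* FALLTHROUGH */
        case 2: c += (uint32) k[1] << 8;    /* FALLTHROUGH */
        case 1: c += k[0];
    }
    return c;
}

/* The trick: one aligned whole-word load, then mask off the bytes past
 * the end of the string (little-endian layout assumed, as in the
 * aligned little-endian path). */
static uint32
tail_masked(const uint32 *k, size_t len)
{
    static const uint32 masks[4] = {0, 0xff, 0xffff, 0xffffff};

    return *k & masks[len];
}
```

The over-read never crosses a word boundary, which is why it is safe on word-granular memory protection but still flagged by Valgrind.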
> * Independently of the above changes, the actual hash calculation
> (the mix() and final() macros) has been changed. Ken claims that
> this made the hash "better", but I'm deeply suspicious of that.
> The comments in the code make it look like Jenkins actually sacrificed
> hash quality in order to get a little more speed. I don't think we
> should adopt those changes unless some actual evidence is presented
> that the hash is better and not merely faster.
I was repeating the claims made by the function's author after his own
testing. His analysis and tests were reasonable, but I agree that we
need some testing of our own. I will start pulling together some test
cases along the lines of what was discussed earlier with Simon.
> In short: I think we should adopt the changes to use aligned word
> fetches where possible, but not adopt the mix/final changes unless
> more evidence is presented.
Okay, I agree and will work on producing evidence either way.
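As a starting point, the kind of distribution evidence I have in mind might look like this sketch: hash a keyspace, count keys per bucket, and compute the chi-square statistic against the uniform expectation (the harness and toy hashes are mine; the real comparison would plug in the old and new mix/final code):

```c
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t uint32;

/* Chi-square of bucket occupancy for keys 0..nkeys-1 under `hash`.
 * An ideal random hash averages about nbuckets-1; much larger values
 * indicate clumping, i.e. more collisions. */
static double
bucket_chi_square(uint32 (*hash) (uint32), uint32 nkeys, uint32 nbuckets)
{
    uint32     *counts = calloc(nbuckets, sizeof(uint32));
    double      expected = (double) nkeys / nbuckets;
    double      chi2 = 0.0;
    uint32      i;

    for (i = 0; i < nkeys; i++)
        counts[hash(i) % nbuckets]++;
    for (i = 0; i < nbuckets; i++)
    {
        double      d = counts[i] - expected;

        chi2 += d * d / expected;
    }
    free(counts);
    return chi2;
}

/* Toy hashes to exercise the harness. */
static uint32 hash_identity(uint32 k)  { return k; }
static uint32 hash_fibonacci(uint32 k) { return k * 2654435761u; }
```

Running the same harness over realistic key sets (English words, sequential integers, typical index keys) for both the old and new mixers would give the evidence Simon asked for.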
> Lastly, the patch adds yet more code to provide the option of computing
> a 64-bit hash rather than 32. (AFAICS, the claim that this part is
> optimized for 64-bit machines is mere fantasy. It's simply Yet Another
> duplicate of the identical code, but it gives you back two of its three
> words of internal state at the end, instead of only one.) As I said
> before, this is just bloat for us. I've got zero interest in pursuing
> 64-bit hashing when we still don't have a hash index implementation that
> anyone would consider using in anger. Let's see if we can make the cake
> edible before worrying about putting a better grade of icing on it.
You are correct; my 64-bit claim came from misinterpreting some comments
by the author. He has since sent a correction to the mailing list.
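To illustrate Tom's point, here is a sketch of the only real difference between the two entry points: the same mixing pipeline runs in both, and the "64-bit" version merely returns two of the three state words instead of one (the final-step stub and names are mine, not Jenkins' actual code):

```c
#include <stdint.h>

typedef uint32_t uint32;
typedef uint64_t uint64;

/* Stand-in for the shared mix/final pipeline that ends with state a,b,c. */
static void
toy_final(uint32 *a, uint32 *b, uint32 *c)
{
    *c ^= *b; *c -= (*b << 14) | (*b >> 18);
    *a ^= *c; *a -= (*c << 11) | (*c >> 21);
    *b ^= *a; *b -= (*a << 25) | (*a >> 7);
}

static uint32
toy_hash32(uint32 a, uint32 b, uint32 c)
{
    toy_final(&a, &b, &c);
    return c;                       /* one word of internal state */
}

static uint64
toy_hash64(uint32 a, uint32 b, uint32 c)
{
    toy_final(&a, &b, &c);
    return ((uint64) b << 32) | c;  /* two words of internal state */
}
```

No extra mixing work happens in the 64-bit path, which is why duplicating the whole function for it is pure bloat from PG's point of view.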
Sent via pgsql-patches mailing list (email@example.com)