On Sat, Apr 05, 2008 at 03:40:35PM -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > On Sun, 2007-10-28 at 13:05 -0500, Kenneth Marshall wrote:
> >> The new hash function is roughly twice as fast as the old function in
> >> terms of straight CPU time. It uses the same design as the current
> >> hash but provides code paths for aligned and unaligned access as well
> >> as separate mixing functions for different blocks in the hash run
> >> instead of having one general purpose block. I think the speed will
> >> not be an obvious win with smaller items, but will be very important
> >> when hashing larger items (up to 32kb).
> >> 
> >> Better in this case means that the new hash mixes more thoroughly
> >> which results in fewer collisions and more even bucket distribution.
> >> There is also a 64-bit variant which is still faster since it can
> >> take advantage of the 64-bit processor instruction set.
> > Ken, I was really looking for some tests that show both of the above
> > were true. We've had some trouble proving the claims of other algorithms
> > before, so I'm less inclined to take those things at face value.
> I spent some time today looking at this code more closely and running
> some simple speed tests.  It is faster than what we have, although 2X
> is the upper limit of the speedups I saw on four different machines.
> There are several things going on in comparison to our existing
> hash_any:
> * If the source data is word-aligned, the new code fetches it a word at
> a time instead of a byte at a time; that is
>         a += (k[0] + ((uint32) k[1] << 8) + ((uint32) k[2] << 16) + ((uint32) k[3] << 24));
>         b += (k[4] + ((uint32) k[5] << 8) + ((uint32) k[6] << 16) + ((uint32) k[7] << 24));
>         c += (k[8] + ((uint32) k[9] << 8) + ((uint32) k[10] << 16) + ((uint32) k[11] << 24));
> becomes
>         a += k[0];
>         b += k[1];
>         c += k[2];
> where k is now pointer to uint32 instead of uchar.  This accounts for
> most of the speed improvement.  However, the results now vary between
> big-endian and little-endian machines.  That's fine for PG's purposes.
> But it means that we need two sets of code for the unaligned-input code
> path, since it clearly won't do for the same bytestring to get two
> different hashes depending on whether it happens to be presented aligned
> or not.  The presented patch actually offers *four* code paths, so that
> you can compute either little-endian-ish or big-endian-ish hashes on
> either type of machine.  That's nothing but bloat for our purposes, and
> should be reduced to the minimum.

I agree that a good portion of the speedup is due to the full-word
processing. The original code from Bob Jenkins had all of these code
paths, and I dropped them in with a minimum of changes.

> * Given a word-aligned source pointer and a length that isn't a multiple
> of 4, the new code fetches the last partial word as a full word fetch
> and masks it off, as per the code comment:
>      * "k[2]&0xffffff" actually reads beyond the end of the string, but
>      * then masks off the part it's not allowed to read.  Because the
>      * string is aligned, the masked-off tail is in the same word as the
>      * rest of the string.  Every machine with memory protection I've seen
>      * does it on word boundaries, so is OK with this.  But VALGRIND will
>      * still catch it and complain.  The masking trick does make the hash
>      * noticeably faster for short strings (like English words).
> This I think is well beyond the bounds of sanity, especially since we
> have no configure support for setting #ifdef VALGRIND.  I'd lose the
> "non valgrind clean" paths (which again are contributing to the patch's
> impression of bloat/redundancy).

Okay, I will strip the VALGRIND paths. I did not see a real need for them.
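For reference, the trick in question looks roughly like this. This is
my own sketch of the little-endian aligned-tail handling, not the patch
verbatim:

```c
#include <stddef.h>
#include <stdint.h>

/* Masked full-word read of a partial tail word: deliberately reads up
 * to three bytes past the end of the string, relying on those bytes
 * sharing an aligned word with valid data.  Memory protection will not
 * fault on it, but Valgrind reports the over-read. */
static uint32_t tail_masked(const uint32_t *k, size_t tail_len)
{
    switch (tail_len)
    {
        case 3:  return k[0] & 0xffffff;
        case 2:  return k[0] & 0xffff;
        case 1:  return k[0] & 0xff;
        default: return 0;
    }
}

/* Valgrind-clean equivalent: touch only the bytes that exist.  It is
 * slightly slower for short strings, which is all the quoted comment
 * is defending. */
static uint32_t tail_bytewise(const unsigned char *k, size_t tail_len)
{
    uint32_t w = 0;

    switch (tail_len)
    {
        case 3: w += (uint32_t) k[2] << 16;  /* FALLTHROUGH */
        case 2: w += (uint32_t) k[1] << 8;   /* FALLTHROUGH */
        case 1: w += k[0];
    }
    return w;
}
```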

> * Independently of the above changes, the actual hash calculation
> (the mix() and final() macros) has been changed.  Ken claims that
> this made the hash "better", but I'm deeply suspicious of that.
> The comments in the code make it look like Jenkins actually sacrificed
> hash quality in order to get a little more speed.  I don't think we
> should adopt those changes unless some actual evidence is presented
> that the hash is better and not merely faster.

I was repeating the claims made by the function's author after his own
testing. His analysis and tests were reasonable, but I agree that we
need some testing of our own. I will start pulling together test cases
along the lines of what was discussed earlier with Simon.

> In short: I think we should adopt the changes to use aligned word
> fetches where possible, but not adopt the mix/final changes unless
> more evidence is presented.

Okay, I agree and will work on producing evidence either way.

> Lastly, the patch adds yet more code to provide the option of computing
> a 64-bit hash rather than 32.  (AFAICS, the claim that this part is
> optimized for 64-bit machines is mere fantasy.  It's simply Yet Another
> duplicate of the identical code, but it gives you back two of its three
> words of internal state at the end, instead of only one.)  As I said
> before, this is just bloat for us.  I've got zero interest in pursuing
> 64-bit hashing when we still don't have a hash index implementation that
> anyone would consider using in anger.  Let's see if we can make the cake
> edible before worrying about putting a better grade of icing on it.

You are correct; my 64-bit claim came from misinterpreting some comments
by the author. He personally sent a correction to the mailing list.
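For context, based on Bob Jenkins' published lookup3.c (which I believe
the patch tracks), the "64-bit" variant is the same 32-bit mixing with
two of the three state words handed back instead of one, roughly as in
this sketch of mine:

```c
#include <stdint.h>

#define rot(x, k) (((x) << (k)) | ((x) >> (32 - (k))))

/* final() as published in Jenkins' lookup3.c */
#define final(a, b, c) \
{ \
    c ^= b; c -= rot(b, 14); \
    a ^= c; a -= rot(c, 11); \
    b ^= a; b -= rot(a, 25); \
    c ^= b; c -= rot(b, 16); \
    a ^= c; a -= rot(c, 4); \
    b ^= a; b -= rot(a, 14); \
    c ^= b; c -= rot(b, 24); \
}

/* The 32-bit interface returns c alone; the "64-bit" one returns c and
 * b glued together, as lookup3's hashlittle2() does.  No 64-bit
 * arithmetic is involved anywhere, which is Tom's point. */
static uint64_t hash64_tail(uint32_t a, uint32_t b, uint32_t c)
{
    final(a, b, c);
    return ((uint64_t) b << 32) | c;
}
```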

Ken Marshall

Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)