On 4/6/08, Tom Lane <[EMAIL PROTECTED]> wrote:
>  So adopting the mixing changes would make it faster yet.  What we need
>  to be certain of is that this doesn't expose us to poorer hashing.
>  We know that it is critical that all bits of the input affect all bits
>  of the hash fairly uniformly --- otherwise we are subject to very
>  serious performance hits at higher levels in hash join, for instance.
>  The comments in the new code led me to worry that Jenkins had
>  compromised on that property in search of speed.  I looked at his
>  website but couldn't find any real discussion of the design principles
>  for the new mixing code ...

Scroll at the end of doobs.html, there is the longer discussion.

My understanding is following:

His design is based on 2 properties of mixing function:

- reversible - that means for each input tuple of (a,b,c) corresponds
  exactly one output tuple of (a,b,c).  Such property guarantees that
  after repeatedly applying mixing function, no bits get lost.

- avalanche - that means any single bit change in input (a,b,c)
  half of the output bits are affected.

His "insight" (as he called it) when creating lookup3 was that
the bulk mixing that is applied repeatedly does not need to
have avalanche, it only needs to be reversible, meaning all the
bits that went in, are still there after mixing repeatedly.

And only final mixing needs to have avalanche as it produces
final result, but it does not need to be reversible as it wont
be applied repeatedly and most of the result is dropped anyway.

IMHO his choices are reasonable.


Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:

Reply via email to