On 12/20/14, 11:51 AM, Tom Lane wrote:
Andres Freund <and...@2ndquadrant.com> writes:
On 2014-12-19 22:03:55 -0600, Jim Nasby wrote:
What I am thinking is not using all of those fields in their raw form to
calculate the hash value. IE: something analogous to:
hash_any(SharedBufHash, ((rot(forkNum, 2) | dbNode) ^ relNode) << 32 | blockNum)
Perhaps that actual code wouldn't work, but I don't see why we couldn't do
something similar... am I missing something?
I don't think that'd improve anything. Jenkins' hash has quite good
mixing properties; I don't believe the above would improve the
quality of the hash.
I think what Jim is suggesting is to intentionally degrade the quality of
the hash in order to let it be calculated a tad faster. We could do that
but I doubt it would be a win, especially in systems with lots of buffers.
IIRC, when we put in Jenkins hashing to replace the older homebrew hash
function, it improved performance even though the hash itself was slower.
Right. Now that you mention it, I vaguely recall the discussions about changing
the hash function to reduce collisions.
I'll still take a look at fast-hash, but it's looking like there may not be
anything we can do here unless we change how we identify relation files
(combining dbid, tablespace id, fork number and file id, at least for
searching). If we had 64-bit hash support then maybe that'd be a significant
win, since you wouldn't need to hash at all. But that certainly doesn't seem
to be low-hanging fruit to me...
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)