Re: hash_create API changes (was Re: [HACKERS] speedup tidbitmap patch: hash BlockNumber)

Jim Nasby Fri, 19 Dec 2014 14:42:54 -0800

On 12/18/14, 5:00 PM, Jim Nasby wrote:

2201582 20 -- Mostly LOCALLOCK and Shared Buffer


Started looking into this; perhaps https://code.google.com/p/fast-hash/ would 
be worth looking at, though it requires uint64.

It also occurs to me that we're needlessly shoving a lot of 0's into the hash 
input by using RelFileNode and ForkNumber. RelFileNode includes the tablespace 
Oid, which is pointless here because relid is unique per-database. We also have 
very few forks and typically care about very few databases. If we crammed dbid 
and ForkNum together that gets us down to 12 bytes, which at minimum saves us 
the trip through the case logic. I suspect it also means we could eliminate one 
of the mix() calls.

But I wonder if we could still do better, because we typically also won't have 
that many relations. Is there some fast way we could combine dbid, forkNum and 
relid into a uint32? That gets us down to 8 bytes, which means we could use 
fash-hash, or a stripped down mix().

Unfortunately I don't know much about hash algorithms, so I don't know how 
practical any of this actually is, or what a good method for combining those 
fields would be. My current idea is something like (rot(forkNum, 2) | dbid) ^ 
relid, but if you got unlucky with your oid values you could end up with a lot 
of collissions from that.

I can put some effort into this, but I'd like some guidance.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: hash_create API changes (was Re: [HACKERS] speedup tidbitmap patch: hash BlockNumber)

Reply via email to