Re: [HACKERS] [POC] hash partitioning

Andres Freund Thu, 12 Oct 2017 12:44:53 -0700

On 2017-10-12 10:05:26 -0400, Robert Haas wrote:
> On Thu, Oct 12, 2017 at 9:08 AM, amul sul <[email protected]> wrote:
> > How about combining high 32 bits and the low 32 bits separately as shown 
> > below?
> >
> > static inline uint64
> > hash_combine64(uint64 a, uint64 b)
> > {
> >     return (((uint64) hash_combine((uint32) (a >> 32), (uint32) (b >> 32)) << 32)
> >             | hash_combine((uint32) a, (uint32) b));
> > }
> 
> I doubt that's the best approach, but I don't have something specific
> to recommend.


Yea, that doesn't look great. There's basically no intermixing between
the low and high 32 bits going on.  We probably should just expand the
concept of the 32 bit function:

static inline uint32
hash_combine32(uint32 a, uint32 b)
{
        /* 0x9e3779b9 is the golden ratio reciprocal */
        a ^= b + 0x9e3779b9 + (a << 6) + (a >> 2);
        return a;
}

to something roughly like:

static inline uint64
hash_combine64(uint64 a, uint64 b)
{
        /* 0x49A0F4DD15E5A8E3 is 64bit random data */
        a ^= b + 0x49A0F4DD15E5A8E3 + (a << 54) + (a >> 7);
        return a;
}

In contrast to the 32 bit version's fancy use of the golden ratio
reciprocal as a constant, I went brute force and just used 64 bits from
/dev/random. From my understanding the important property is that the
bits are independent from each other, nothing else.

The shift widths are fairly random, but they should bring in enough bit
perturbation when mixing in only 32 bits of hash value (i.e.
0x00000000xxxxxxxx).
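
To make the intermixing point concrete, here is a minimal standalone sketch
that compares the per-half variant with the 64 bit variant. It uses
<stdint.h> types instead of the PostgreSQL typedefs, and the names
hash_combine64_split, low_bit_reaches_high_word and check_low_bit_mixing are
made up for this illustration; none of them come from the patch. It only
checks one narrow property: whether flipping the lowest bit of a changes
anything in the upper 32 bits of the result.

```c
#include <assert.h>
#include <stdint.h>

/* 32 bit combine, as quoted above. */
static inline uint32_t
hash_combine32(uint32_t a, uint32_t b)
{
	a ^= b + 0x9e3779b9 + (a << 6) + (a >> 2);
	return a;
}

/* Per-half variant (hypothetical name): each 32 bit word is combined alone. */
static inline uint64_t
hash_combine64_split(uint64_t a, uint64_t b)
{
	uint32_t	hi = hash_combine32((uint32_t) (a >> 32), (uint32_t) (b >> 32));
	uint32_t	lo = hash_combine32((uint32_t) a, (uint32_t) b);

	return ((uint64_t) hi << 32) | lo;
}

/* 64 bit variant sketched above; UINT64_C just types the constant. */
static inline uint64_t
hash_combine64(uint64_t a, uint64_t b)
{
	a ^= b + UINT64_C(0x49A0F4DD15E5A8E3) + (a << 54) + (a >> 7);
	return a;
}

/*
 * Hypothetical helper: returns 1 if flipping bit 0 of a changes any of the
 * upper 32 bits of combine(a, b), 0 otherwise.
 */
static int
low_bit_reaches_high_word(uint64_t (*combine) (uint64_t, uint64_t),
						  uint64_t a, uint64_t b)
{
	uint64_t	diff = combine(a, b) ^ combine(a ^ 1, b);

	return (diff >> 32) != 0;
}

/* Hypothetical self-check; the input values are arbitrary. */
static void
check_low_bit_mixing(void)
{
	/* a << 54 carries bit 0 of a up to bit 54, so the high word changes. */
	assert(low_bit_reaches_high_word(hash_combine64,
									 0x12345678, 0x9abcdef0));
	/* The per-half variant never lets the low word touch the high word. */
	assert(!low_bit_reaches_high_word(hash_combine64_split,
									  0x12345678, 0x9abcdef0));
}
```

This says nothing about overall hash quality. It only shows that the
(a << 54) term moves low input bits into the high word, which the per-half
variant cannot do by construction.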

Are we going to rely on the combine function to stay the same
forever after?

Greetings,

Andres Freund


