On Tue, May 2, 2017 at 7:01 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Tue, May 2, 2017 at 9:01 PM, Jeff Davis <pg...@j-davis.com> wrote:
>> 1. Consider a partition-wise join of two hash-partitioned tables. If
>> that's a hash join, and we just use the hash opclass, we immediately
>> lose some useful bits of the hash function. Same for hash aggregation
>> where the grouping key is the partition key.
>
> Hmm, that could be a problem in some cases. I think there's probably
> much less of a problem if the modulus isn't a power of two?
That's true, but it's awkward to describe that to users. And I think
most people would be inclined to use a power-of-two number of
partitions, perhaps coming from other systems.

>> To fix this, I think we need to include a salt in the hash API. Each
>> level of hashing can choose a random salt.
>
> Do you mean that we'd salt partitioning hashing differently from
> grouping hashing, which would be salted differently from aggregation
> hashing, which, I suppose, would be salted differently from hash index
> hashing?

Yes. The way I think about it is that choosing a new random salt is an
easy way to get a new hash function.

> Or do you mean that you'd have to specify a salt when
> creating a hash-partitioned table, and make sure it's the same across
> all compatibly partitioned tables you might want to hash-join? That
> latter sounds unappealing.

I don't see a reason to expose the salt to users. If we found a reason
in the future, we could, but it would create all of the problems you
are thinking about.

>> 2. Consider a partition-wise join where the join keys are varchar(10)
>> and char(10). We can't do that join if we just use the existing hash
>> strategy, because 'foo' = 'foo       ' should match, but those values
>> have different hashes when using the standard hash opclass. ...
>
> You're basically describing what a hash opfamily already does, except
> that we don't have a single opfamily that covers both varchar(10) and
> char(10), nor do we have one that covers both int and numeric. We
> have one that covers int2, int4, and int8, though. If somebody wanted
> to make the ones you're suggesting, there's nothing preventing it,
> although I'm not sure exactly how we'd encourage people to start using
> the new one and deprecating the old one. We don't seem to have a good
> infrastructure for that.

OK. I will propose new hash opfamilies for varchar/bpchar/text,
int2/4/8/numeric, and timestamptz/date.

One approach is to promote the narrower type to the wider type, and
then hash.
The problem is that this would substantially slow down the hashing of
integers, so then we'd need to use one hash opfamily for partitioning
and one for hash joins, and it gets messy.

The other approach is to check whether the value of the wider type is
within the domain of the narrower type, and if so, *demote* the value
and then hash. For instance, '4.2'::numeric would hash the same as it
does today, but '4'::numeric would hash as an int2. I prefer this
approach, and int8 already does something resembling it.

For timestamptz/date, it's not nearly as important.

>> My opinion is that we should work on this hashing infrastructure
>> first, and then support the DDL. If we get the hash functions right,
>> that frees us up to create better plans, with better push-downs, which
>> will be good for parallel query.
>
> I am opposed to linking the fate of this patch to multiple
> independent, possibly large, possibly difficult, possibly
> controversial enhancements to the hashing mechanism.

It's a little early in the v11 cycle to be having this argument. Really
what I'm saying is that a small effort now may save us a lot of
headache later.

Regards,
     Jeff Davis


--
Sent via pgsql-hackers mailing list (email@example.com)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers