Re: [HACKERS] [POC] hash partitioning

Robert Haas Wed, 10 May 2017 09:40:02 -0700

On Thu, May 4, 2017 at 1:44 AM, Jeff Davis <pg...@j-davis.com> wrote:
>> Hmm, that could be a problem in some cases.  I think there's probably
>> much less of a problem if the modulus isn't a power of two?
>
> That's true, but it's awkward to describe that to users. And I think
> most people would be inclined to use a power-of-two number of
> partitions, perhaps coming from other systems.


Yeah, true.

>>> To fix this, I think we need to include a salt in the hash API. Each
>>> level of hashing can choose a random salt.
>>
>> Do you mean that we'd salt partitioning hashing differently from
>> grouping hashing which would be salted different from aggregation
>> hashing which, I suppose, would be salted differently from hash index
>> hashing?
>
> Yes. The way I think about it is that choosing a new random salt is an
> easy way to get a new hash function.

OK.  One problem, though, is we don't quite have the opclass
infrastructure for this.  A hash opclass's support function is
expected to take one argument, a value of the data type at issue.  The
first idea that occurred to me was to allow an optional second
argument which would be a seed, but that seems like it would require
extensive changes to all of the datatype-specific hash functions and
some of them would probably emerge noticeably slower.  If a function
is just calling hash_uint32 right now then I don't see how we're going
to replace that with something more complex that folds in a salt
without causing performance to drop.  Even just the cost of unpacking
the extra argument might be noticeable.

Another alternative would be to be to add one additional, optional
hash opclass support function which takes a value of the type in
question as one argument and a seed as a second argument.  That seems
like it might work OK.  Existing code can use the existing support
function 1 with no change, and hash partitioning can use support
function 2.

>> Or do you mean that you'd have to specify a salt when
>> creating a hash-partitioned table, and make sure it's the same across
>> all compatibly partitioned tables you might want to hash-join?  That
>> latter sounds unappealing.
>
> I don't see a reason to expose the salt to users. If we found a reason
> in the future, we could, but it would create all of the problems you
> are thinking about.

Right, OK.

>> You're basically describing what a hash opfamily already does, except
>> that we don't have a single opfamily that covers both varchar(10) and
>> char(10), nor do we have one that covers both int and numeric.  We
>> have one that covers int2, int4, and int8, though.  If somebody wanted
>> to make the ones you're suggesting, there's nothing preventing it,
>> although I'm not sure exactly how we'd encourage people to start using
>> the new one and deprecating the old one.  We don't seem to have a good
>> infrastructure for that.
>
> OK. I will propose new hash opfamilies for varchar/bpchar/text,
> int2/4/8/numeric, and timestamptz/date.

Cool!  I have no idea how we'll convert from the old ones to the new
ones without breaking things but I agree that it would be nicer if it
were like that rather than the way it is now.

> One approach is to promote the narrower type to the wider type, and
> then hash. The problem is that would substantially slow down the
> hashing of integers, so then we'd need to use one hash opfamily for
> partitioning and one for hashjoin, and it gets messy.

Yes, that sounds messy.

> The other approach is to check if the wider type is within the domain
> of the narrower type, and if so, *demote* the value and then hash. For
> instance, '4.2'::numeric would hash the same as it does today, but
> '4'::numeric would hash as an int2. I prefer this approach, and int8
> already does something resembling it.

Sounds reasonable.

> It's a little early in the v11 cycle to be having this argument.
> Really what I'm saying is that a small effort now may save us a lot of
> headache later.

Well, that's fair enough.  My concern is basically that it may the
other way around: a large effort to save a small headache later. I
agree that it's probably a good idea to figure out a way to salt the
hash function so that we don't end up with this and partitionwise join
interacting badly, but I don't see the other issues as being very
critical.  I don't have any evidence that there's a big need to
replace our hash functions with new ones, and over on the
partitionwise join thread we gave up on the idea of a cross-type
partitionwise join.  It wouldn't be particularly common (or sensible,
really) even if we ended up supporting it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [POC] hash partitioning

Reply via email to