Re: [HACKERS] Declarative partitioning

Ashutosh Bapat Wed, 10 Aug 2016 03:19:08 -0700

The FOR VALUES clause of a partition does not allow a constant expression like
((10000/5) - 1). It gives a syntax error:
regression=# create table pt1_p1 partition of pt1 for values start (0) end ((10000/5) - 1);
ERROR:  syntax error at or near "("
LINE 1: ...pt1_p1 partition of pt1 for values start (0) end ((10000/5) ...


Shouldn't we allow constant expressions here?

If this has already been discussed, please forgive me and point me to the
relevant mail chain.

On Tue, Aug 9, 2016 at 12:48 PM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

> What strikes me as odd about these patches is that RelOptInfo has remained
> unmodified. For a base partitioned table, I would expect it to be marked as
> partitioned, maybe indicating the partitioning scheme. Instead, I
> see that the code deals directly with PartitionDesc and PartitionKey, which are
> part of Relation. It also uses other structures like PartitionKeyExecInfo. I
> don't have any specific observation as to why we need such information in
> RelOptInfo, but the lack of it makes me uncomfortable. It could be because
> the inheritance code does not require any mark in RelOptInfo and your patch
> re-uses the inheritance code. I am not sure.
>
> On Tue, Aug 9, 2016 at 9:17 AM, Amit Langote <
> langote_amit...@lab.ntt.co.jp> wrote:
>
>> On 2016/08/09 6:02, Robert Haas wrote:
>> > On Mon, Aug 8, 2016 at 1:40 AM, Amit Langote
>> > <langote_amit...@lab.ntt.co.jp> wrote:
>> >>> +1, if we could do it. It will need a change in the way Amit's patch
>> >>> stores partitioning scheme in PartitionDesc.
>> >>
>> >> Okay, I will try to implement this in the next version of the patch.
>> >>
>> >> One thing that comes to mind is what if a user wants to apply hash
>> >> operator class equality to a list-partitioned key by specifying a hash
>> >> operator class for the corresponding column.  In that case, we would not
>> >> have the ordering procedure with a hash operator class, hence any
>> >> ordering-based optimization becomes impossible to implement.  The current
>> >> patch rejects a column for partition key if its type does not have a btree
>> >> operator class for both range and list methods, so this issue doesn't
>> >> exist, however it could be seen as a limitation.
>> >
>> > Yes, I think you should expect careful scrutiny of that issue.  It
>> > seems clear to me that range partitioning requires a btree opclass,
>> > that hash partitioning requires a hash opclass, and that list
>> > partitioning requires at least one of those things.  It would probably
>> > be reasonable to pick one or the other and insist that list
>> > partitioning always requires exactly that, but I can't see how it's
>> > reasonable to insist that you must have both types of opclass for any
>> > type of partitioning.
>>
>> So because we intend to implement optimizations for list partition
>> metadata that presuppose the existence of a corresponding btree operator class,
>> we should just always require the user to specify one (or error out if the user
>> doesn't specify one and a default doesn't exist).  That way, we explicitly
>> do not support specifying a hash equality operator for list partitioning.
>>
>> >> So, we have 3 choices for the internal representation of list partitions:
>> >>
>> >> Choice 1 (the current approach):  Load them in the same order as they are
>> >> found in the partition catalog:
>> >>
>> >> Table 1: p1 {'b', 'f'}, p2 {'c', 'd'}, p3 {'a', 'e'}
>> >> Table 2: p1 {'c', 'd'}, p2 {'a', 'e'}, p3 {'b', 'f'}
>> >>
>> >> In this case, mismatch on the first list would make the two tables
>> >> incompatibly partitioned, whereas they really aren't incompatible.
>> >
>> > Such a limitation seems clearly unacceptable.  We absolutely must be
>> > able to match up compatible partitioning schemes without getting
>> > confused by little details like the order of the partitions.
>>
>> Agreed.  Will change my patch to adopt the below method.
>>
>> >> Choice 2: Representation with 2 arrays:
>> >>
>> >> Table 1: ['a', 'b', 'c', 'd', 'e', 'f'], [3, 1, 2, 2, 3, 1]
>> >> Table 2: ['a', 'b', 'c', 'd', 'e', 'f'], [2, 3, 1, 1, 2, 3]
>> >>
>> >> It still doesn't help the case of pairwise joins because it's hard to tell
>> >> which value belongs to which partition (the 2nd array carries the original
>> >> partition numbers).  Although it might still work for tuple-routing.
>> >
>> > It's very good for tuple routing.  It can also be used to match up
>> > partitions for pairwise joins.  Compare the first arrays.  If they are
>> > unequal, stop.  Else, compare the second arrays, incrementally
>> > building a mapping between them and returning false if the mapping
>> > turns out to be non-bijective.  For example, in this case, we look at
>> > index 0 and decide that 3 -> 2.  We look at index 1 and decide 1 -> 3.
>> > We look at index 2 and decide 2 -> 1.  We look at index 3 and find
>> > that we already have a mapping for 2, but it's compatible because we
>> > need 2 -> 1 and that's what is already there.  Similarly for the
>> > remaining entries.  This is really a pretty easy loop to write and it
>> > should run very quickly.
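
The loop described above can be sketched as follows. This is a minimal
illustration in Python, not code from the patch (which is C); the names
two_arrays, route and match_partitions are hypothetical, and the use of
binary search for tuple routing is an assumption about why the sorted
first array helps. The input uses the Table 1 and Table 2 partitions from
Choice 1.

```python
from bisect import bisect_left


def two_arrays(partitions):
    """Flatten {partition_number: [values]} into the Choice 2 form:
    a sorted array of values and a parallel array of owning partitions."""
    pairs = sorted((v, p) for p, vals in partitions.items() for v in vals)
    return [v for v, _ in pairs], [p for _, p in pairs]


def route(values, owners, key):
    """Tuple routing: binary-search the sorted values (assumed approach).
    Returns the owning partition number, or None if no partition accepts key."""
    i = bisect_left(values, key)
    if i < len(values) and values[i] == key:
        return owners[i]
    return None


def match_partitions(a, b):
    """Return a mapping from a's partition numbers to b's, or None if the
    two schemes are incompatible (different values, or a non-bijective map)."""
    values_a, owners_a = a
    values_b, owners_b = b
    if values_a != values_b:
        return None
    forward, backward = {}, {}
    for pa, pb in zip(owners_a, owners_b):
        # setdefault records a new pair, or returns the existing one to check.
        if forward.setdefault(pa, pb) != pb or backward.setdefault(pb, pa) != pa:
            return None
    return forward


table1 = two_arrays({1: ['b', 'f'], 2: ['c', 'd'], 3: ['a', 'e']})
table2 = two_arrays({1: ['c', 'd'], 2: ['a', 'e'], 3: ['b', 'f']})
mapping = match_partitions(table1, table2)   # 3 -> 2, 1 -> 3, 2 -> 1
```

The backward dictionary is what catches a non-bijective mapping: two
partitions of the first table landing on the same partition of the second.
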
>>
>> I see, it does make sense to try to implement this way.
>>
>> >> Choice 3: Order all lists' elements for each list individually and then
>> >> order the lists themselves on their first values:
>> >>
>> >> Table 1: p3 {'a', 'e'}, p1 {'b', 'f'}, p2 {'c', 'd'}
>> >> Table 2: p2 {'a', 'e'}, p3 {'b', 'f'}, p1 {'c', 'd'}
>> >>
>> >> This representation makes pairing partitions for pairwise joining
>> >> convenient but for tuple-routing we still need to visit each partition in
>> >> the worst case.
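
For comparison, the Choice 3 ordering can be sketched like this. Again a
hedged Python illustration with hypothetical names (canonical_lists,
pair_partitions, route_linear), assuming every partition has a non-empty
list and that values are comparable with a btree-like ordering. The input
is the Choice 1 catalog order.

```python
def canonical_lists(partitions):
    """Sort each partition's values, then order partitions by first value."""
    ordered = [(name, sorted(vals)) for name, vals in partitions.items()]
    ordered.sort(key=lambda item: item[1][0])
    return ordered


def pair_partitions(a, b):
    """Pair partitions position by position; None if the lists differ."""
    ca, cb = canonical_lists(a), canonical_lists(b)
    if [vals for _, vals in ca] != [vals for _, vals in cb]:
        return None
    return [(na, nb) for (na, _), (nb, _) in zip(ca, cb)]


def route_linear(partitions, key):
    """Tuple routing here may have to visit every partition's list."""
    for name, vals in canonical_lists(partitions):
        if key in vals:
            return name
    return None


table1 = {'p1': ['b', 'f'], 'p2': ['c', 'd'], 'p3': ['a', 'e']}
table2 = {'p1': ['c', 'd'], 'p2': ['a', 'e'], 'p3': ['b', 'f']}
pairs = pair_partitions(table1, table2)
```

Pairing becomes a position-by-position comparison, while routing falls back
to a scan over the partitions, which is the trade-off noted above.
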
>> >
>> > I think this is clearly not good enough for tuple routing.  If the
>> > algorithm I proposed above turns out to be too slow for matching
>> > partitions, then we could keep both this representation and the
>> > previous one.  We are not limited to just one.  But I don't think
>> > that's likely to be the case.
>>
>> I agree.  Let's see how option 2 turns out.
>>
>> > Also, note that all of this presupposes we're doing range
>> > partitioning, or perhaps list partitioning with a btree opclass.  For
>> > partitioning based on a hash opclass, you'd organize the data based on
>> > the hash values rather than range comparisons.
>>
>> Yes, the current patch does not implement hash partitioning, although I
>> have to think about how to support the hash case when designing the
>> internal data structures.
>>
>>
>> By the way, I am planning to start a new thread with the latest set of
>> patches which I will post in a day or two.  I have tried to implement all
>> the bug fixes and improvements that have been suggested on this thread so
>> far.  Thanks to all those who reviewed and gave their comments.  Please
>> check this page to get a link to the new thread:
>> https://commitfest.postgresql.org/10/611/
>>
>> Thanks,
>> Amit
>>
>>
>>
>
>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
