Re: [HACKERS] Declarative partitioning

Robert Haas Tue, 16 Feb 2016 04:58:25 -0800

On Fri, Jan 15, 2016 at 5:48 AM, Amit Langote
<[email protected]> wrote:
> If we have a CREATE statement for each partition, how do we generalize
> that to partitions at different levels? For example, if we use something
> like the following to create a partition of parent_name:
>
> CREATE PARTITION partition_name OF parent_name FOR VALUES ...
>     WITH ... TABLESPACE ...
>
> Do we then say:
>
> CREATE PARTITION subpartition_name OF partition_name ...
>
> to create a level 2 partition (sub-partition) of parent_name?


Yes, exactly.

Personally, I would be more inclined to make this a CREATE TABLE statement, like

CREATE TABLE partition_name PARTITION OF parent_name FOR VALUES ...
CREATE TABLE subpartition_name PARTITION OF partition_name FOR VALUES ...

> I ask that also because it's related to the choice of syntax to use to
> declare the partition key for the multi-level case. I'm considering the
> SUBPARTITION BY notation and perhaps we could generalize it to more than
> just 2 levels. So, for the above case, parent_name would have been created as:
>
> CREATE TABLE parent_name PARTITION BY ... SUBPARTITION BY ...
>
> Needless to say, when subpartition_name is created with the command we saw
> a moment ago, the root partitioned table would be locked. In fact, adding
> a partition anywhere in the hierarchy needs an exclusive lock on the root
> table. Also, partition rule (the FOR VALUES clause) would be validated
> against PARTITION BY or SUBPARTITION BY clause at the respective level.
>
> Although, I must admit I feel a little uneasy about the inherent asymmetry
> in using SUBPARTITION BY for key declaration whereas piggybacking CREATE
> PARTITION for creating sub-partitions. Is there a better way?

I think if you've got SUBPARTITION as a keyword in the syntax
anywhere, you're doing it wrong.  The toplevel object shouldn't really
care whether its children are themselves partitioned or not.

> Do we want this at all? It seems difficult to generalize this to
> multi-level hierarchy of more than 2 levels.

It doesn't do anything for me.  There may be somebody who wants it,
but I don't see much value myself.

> After thinking some more on this - I think that identical tuple
> descriptors may not just be a nice-to-have but critical in some cases. For
> example, consider built-in/trigger-less tuple routing. I'd imagine that
> the partition to insert a tuple into would be determined just before
> calling heap_insert() in ExecInsert() and CopyFrom(). That means the
> HeapTuple that is passed to heap_insert() to insert into the partition
> would be based on the root table's tuple descriptor. Note also that the
> tuple would have passed through BR, IR triggers, constraints of the root
> table. When the data is eventually queried from partitions directly, or
> well even via the root table (considering existing executor capabilities),
> partition's tuple descriptor at that point had better match the data that
> went onto the disk. That means we had better keep at least the following
> things in sync: number of attributes, name, position (attnum), type,
> notnull-ness of individual attributes. So in order to do that, recursively
> apply ADD/DROP COLUMN, SET WITH/WITHOUT OIDS, RENAME COLUMN, ALTER COLUMN
> TYPE, SET/DROP NOT NULL on the root table to all the partitions and
> prevent those sub-commands to be directly applied to any table
> (partitions) in the partitioning hierarchy but the root. I further don't
> see the point of allowing to set (or drop) column defaults in partitions
> because now INSERT or COPY FROM cannot be directly applied to partitions.
> Similar argument could be made for BR, IR triggers and CHECK constraints.
> Am I missing something in all of this?

Well, in the end there are basically two choices.  Either tuple
descriptors have to match exactly, and then you can reuse a tuple
intended for one partition for some other partition without
projection; or else they don't, and you need to project.  I'm not sure
that projection is expensive enough to make disallowing mismatched
tuple descriptors a necessary design choice - and certainly that
design choice is awkward from a UI standpoint, because we will
sometimes not be able to attach a partition for a reason that the user
can neither see in the \d output nor correct.  But on the flip side,
not having to worry about projection is nice.

> An alternative to doing any of that very well may be to design
> trigger-less tuple routing to be smarter about possible mismatch of the
> tuple descriptors but I haven't given that a lot of thought. Is that
> really an alternative worth looking into?

Yes.

> On one hand, I think to keep treating "partition hierarchies" as
> "inheritance hierachies" might have some issues. I am afraid that
> documented inheritance semantics may not be what we want to keep using for
> the new partitioned tables. By that, I mean all the user-facing behaviors
> where inheritance has some bearing. Should it also affect new partitioned
> tables? Consider whether inheritance semantics would render infeasible
> some of the things that we'd like to introduce for the new partitioned
> tables such as automatic tuple routing, or keep us from improving planner
> smarts and executor capabilities for partitioned tables over what we
> already have.
>
> OTOH, I may be looking at it wrongly. We would not be required to enforce
> user-facing inheritance behaviors on the new partitioned tables after all.
> That is to say - it's just that new partitioned tables could still use
> relevant inheritance infrastructure behind-the-scenes for planning,
> execution and a few other things and not care about abiding by regular
> inheritance semantics. I should just go ahead and add special cases in all
> places where existing inheritance handling code stands to cause trouble
> down the line for partitioned tables. We might want to mention that we do
> so somewhere in documentation and also note that regular inheritance
> semantics does not apply. While it sounds probably fine as implementation
> for the feature released initially, a day will inevitably come when this
> behind-the-scenes implementation will be changed to something more
> amenable to better optimization. But that's for future...

This seems pretty vague to me.  I don't know that I have an opinion
without a more specific list of questions.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Declarative partitioning

Reply via email to