On Fri, Jan 15, 2016 at 5:48 AM, Amit Langote <langote_amit...@lab.ntt.co.jp> wrote: > If we have a CREATE statement for each partition, how do we generalize > that to partitions at different levels? For example, if we use something > like the following to create a partition of parent_name: > > CREATE PARTITION partition_name OF parent_name FOR VALUES ... > WITH ... TABLESPACE ... > > Do we then say: > > CREATE PARTITION subpartition_name OF partition_name ... > > to create a level 2 partition (sub-partition) of parent_name?
Yes, exactly. Personally, I would be more inclined to make this a CREATE TABLE statement, like CREATE TABLE partition_name PARTITION OF parent_name FOR VALUES ... CREATE TABLE subpartition_name PARTITION OF partition_name FOR VALUES ... > I ask that also because it's related to the choice of syntax to use to > declare the partition key for the multi-level case. I'm considering the > SUBPARTITION BY notation and perhaps we could generalize it to more than > just 2 levels. So, for the above case, parent_name would have been created as: > > CREATE TABLE parent_name PARTITION BY ... SUBPARTITION BY ... > > Needless to say, when subpartition_name is created with the command we saw > a moment ago, the root partitioned table would be locked. In fact, adding > a partition anywhere in the hierarchy needs an exclusive lock on the root > table. Also, partition rule (the FOR VALUES clause) would be validated > against PARTITION BY or SUBPARTITION BY clause at the respective level. > > Although, I must admit I feel a little uneasy about the inherent asymmetry > in using SUBPARTITION BY for key declaration whereas piggybacking CREATE > PARTITION for creating sub-partitions. Is there a better way? I think if you've got SUBPARTITION as a keyword in the syntax anywhere, you're doing it wrong. The toplevel object shouldn't really care whether its children are themselves partitioned or not. > Do we want this at all? It seems difficult to generalize this to > multi-level hierarchy of more than 2 levels. It doesn't do anything for me. There may be somebody who wants it, but I don't see much value myself. > After thinking some more on this - I think that identical tuple > descriptors may not just be a nice-to-have but critical in some cases. For > example, consider built-in/trigger-less tuple routing. I'd imagine that > the partition to insert a tuple into would be determined just before > calling heap_insert() in ExecInsert() and CopyFrom(). That means the > HeapTuple that is passed to heap_insert() to insert into the partition > would be based on the root table's tuple descriptor. Note also that the > tuple would have passed through BR, IR triggers, constraints of the root > table. When the data is eventually queried from partitions directly, or > well even via the root table (considering existing executor capabilities), > partition's tuple descriptor at that point had better match the data that > went onto the disk. That means we had better keep at least the following > things in sync: number of attributes, name, position (attnum), type, > notnull-ness of individual attributes. So in order to do that, recursively > apply ADD/DROP COLUMN, SET WITH/WITHOUT OIDS, RENAME COLUMN, ALTER COLUMN > TYPE, SET/DROP NOT NULL on the root table to all the partitions and > prevent those sub-commands to be directly applied to any table > (partitions) in the partitioning hierarchy but the root. I further don't > see the point of allowing to set (or drop) column defaults in partitions > because now INSERT or COPY FROM cannot be directly applied to partitions. > Similar argument could be made for BR, IR triggers and CHECK constraints. > Am I missing something in all of this? Well, in the end there are basically two choices. Either tuple descriptors have to match exactly, and then you can reuse a tuple intended for one partition for some other partition without projection; or else they don't, and you need to project. I'm not sure that projection is expensive enough to make disallowing mismatched tuple descriptors a necessary design choice - and certainly that design choice is awkward from a UI standpoint, because we will sometimes not be able to attach a partition for a reason that the user can neither see in the \d output nor correct. But on the flip side, not having to worry about projection is nice. > An alternative to doing any of that very well may be to design > trigger-less tuple routing to be smarter about possible mismatch of the > tuple descriptors but I haven't given that a lot of thought. Is that > really an alternative worth looking into? Yes. > On one hand, I think to keep treating "partition hierarchies" as > "inheritance hierachies" might have some issues. I am afraid that > documented inheritance semantics may not be what we want to keep using for > the new partitioned tables. By that, I mean all the user-facing behaviors > where inheritance has some bearing. Should it also affect new partitioned > tables? Consider whether inheritance semantics would render infeasible > some of the things that we'd like to introduce for the new partitioned > tables such as automatic tuple routing, or keep us from improving planner > smarts and executor capabilities for partitioned tables over what we > already have. > > OTOH, I may be looking at it wrongly. We would not be required to enforce > user-facing inheritance behaviors on the new partitioned tables after all. > That is to say - it's just that new partitioned tables could still use > relevant inheritance infrastructure behind-the-scenes for planning, > execution and a few other things and not care about abiding by regular > inheritance semantics. I should just go ahead and add special cases in all > places where existing inheritance handling code stands to cause trouble > down the line for partitioned tables. We might want to mention that we do > so somewhere in documentation and also note that regular inheritance > semantics does not apply. While it sounds probably fine as implementation > for the feature released initially, a day will inevitably come when this > behind-the-scenes implementation will be changed to something more > amenable to better optimization. But that's for future... This seems pretty vague to me. I don't know that I have an opinion without a more specific list of questions. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (email@example.com) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers