On 2015/12/18 3:56, Robert Haas wrote:
> On Mon, Dec 14, 2015 at 2:14 AM, Amit Langote
> <langote_amit...@lab.ntt.co.jp> wrote:
>> Syntax to create a partitioned table (up to 2 levels of partitioning):
>> CREATE TABLE foo (
>> PARTITION BY R/L ON (key0)
>> SUBPARTITION BY R/L ON (key1)
>> [(PARTITION foo_1 FOR VALUES <val> [<storage_params>] [<tblspc>]
>> [(SUBPARTITION foo_1_1 FOR VALUES <val> [<storage_params>] [<tblspc>],
>> ...)], ...)];
>> The above creates two pg_partitioned_rel entries for foo with partlevel 0
>> and 1, for key0 and key1, respectively. For foo_1 and foo_1_1, this
>> creates pg_partition entries, with foo and foo_1 as partparent,
>> Why just 2 levels? - it seems commonplace and makes the syntax more
>> intuitive? I guess it might be possible to generalize the syntax for
>> multi-level partitioning. Ideas? If we want to support the notion of
>> sub-partition template in future, that would require some thought, more
>> importantly proper catalog organization for the same.
> I do not think this is a particularly good idea. You're going to need
> to dump each partition separately at least in --binary-upgrade mode,
> because each is going to have its own magic OIDs that need to be
> restored, and also because there will most likely be at least some
> properties that are going to vary between partitions. You could
> require that every partition have exactly the same set of columns,
> constraints, rules, triggers, policies, attribute defaults, comments,
> column comments, and everything else that might be different from one
> partition to another, and further require that they have exactly
> matching indexes. It would take a fair amount of code to prohibit all
> that, but it could be done. However, do we really want that? There
> may well be some things were we want to enforce that the parent and
> the child are exactly identical, but I doubt we want that for
> absolutely every property, current and future, of the partition. And
> even if you did, because of the --binary-upgrade stuff, you still need
> to to be able to dump them separately.
> Therefore, I believe it is a whole lot better to make the primary
> syntax for table partitioning something where you issue a CREATE
> statement for the parent and then a CREATE statement for each child.
> If we want to also have a convenience syntax so that people who want
> to create a parent and a bunch of children in one fell swoop can do
> so, fine.
Regarding --binary-upgrade dump mode, how about we teach pg_dump to dump
each partition separately using ALTER TABLE parent ADD PARTITION
especially for the "magic OIDs" reason? It may very well be a CREATE
PARTITION-style command though. Note that each such command could specify
properties that can be different per partition. I said in my email,
perhaps not so clearly, that "only" WITH options, tablespace and
relpersistence can be different per partition. But I can see why that may
be severely restrictive at this stage.
By the way, what do you think about SUBPARTITION keyword-based syntax for
multi-level partitioning? Should we instead require that each partition
has its own PARTITION BY in its creation command?
> I would not choose to model the syntax for creating partitions on
> Oracle. I don't find that syntax particularly nice or easy to
> remember. I say PARTITION BY RANGE, and then inside the parentheses I
> use the PARTITION keyword for each partition? Really? But I think
> copying the style while having the details be incompatible is an even
> worse idea.
As for the convenience syntax (if at all), how about:
CREATE TABLE foo (
PARTITION BY ... ON (...)
SUBPARTITION BY ... ON (...)
where opt_partition_list is:
partname FOR VALUES ... [WITH] [TABLESPACE] opt_subpart_list
where opt_subpart_list is:
subpartname FOR VALUES ... [WITH] [ TABLESPACE]
PARTITIONS, SUBPARTITIONS would be new unreserved keywords. Or we can do
away with them.
>> What about ALTER TABLE? - Instead of allowing ALTER TABLE to be applied
>> directly to partitions on case-by-case basis (they are tables under the
>> hood after all), we should restrict AT to the master table. Most of the AT
>> changes implicitly propagate from the master table to its partitions. Some
>> of them could be directly applied to partitions and/or sub-partitions such
>> as rename, storage manipulations like - changing tablespace, storage
>> parameters (reloptions), etc.:
>> ALTER TABLE foo
>> RENAME PARTITION <partition-name> TO <new-name>;
>> ALTER TABLE foo
>> RENAME SUBPARTITION <sub-partition-name> TO <new-name>;
>> ALTER TABLE foo
>> SET TABLESPACE ... [DEFAULT] FOR PARTITION <partition-name>;
>> ALTER TABLE foo
>> SET TABLESPACE ... FOR SUBPARTITION <sub-partition-name>;
>> ALTER TABLE foo
>> SET (storage_parameter = value) [DEFAULT] FOR PARTITION <partition-name>;
>> ALTER TABLE foo
>> SET (storage_parameter = value) FOR SUBPARTITION <sub-partition-name>;
> I don't think this is a very good idea. This is basically proposing
> that for every DDL command that you can apply to a table, you have to
> spell it differently for a partition. That seems like a lot of extra
> work for no additional functionality.
I didn't mean to propose the special syntax for every available DDL
command. Just those that modify properties that could be different per
partition. I thought there would be very few such properties and hence not
a lot of special commands. But then again, I may be wrong about what
those properties are.
>> By the way, should we also allow changing the logging of
>> partitions/sub-partitions as follows?
> Again, I think you're coming at this from the wrong direction.
> Instead of saying we're going to disallow all changes to the
> partitions and then deciding we need to allow certain changes after
> all, I think we should allow everything that is currently allowed for
> an inherited table and then decide which of those things we need to
> prohibit, and why. For example, if you insist that a child table has
> to have a tuple descriptor that matches the parent, that can improve
> efficiency: Append won't need to project, and so on. But it now
> becomes very difficult to support taking a stand-alone table and
> making it a partition of an existing partitioned table, because the
> set of dropped columns might not match. Having to give an error in
> that case amounts to "we're sorry, we can't attach your partition to
> the partitioning hierarchy because of some invisible state that you
> can't see" isn't very nice. Now I'm not saying that isn't the right
> decision, but I think the design choices here need to be carefully
> thought about.
Yeah, I am concerned about the ATTACH PARTITION USING TABLE case for the
very point you mention. And I can see how it may result from the
restrictive model I propose. FWIW, other databases impose a number of
restrictions on the partition roll-in case but not sure if for the
internal reasons we might want to.
> Stepping away from that particular example, a blanket prohibition on
> changing any attribute of a child table seems like it will prohibit a
> lot of useful things that really ought to work. And again, I don't
> think it's a good idea to implement separate syntax for changing a
> partition vs. changing a table. If I want to set a partition as
> unlogged, I should be able to say ALTER TABLE partition_name UNLOGGED
> or maybe ALTER PARTITION partition_name UNLOGGED, not be forced to use
> some new grammar production that looks completely different.
Okay. ALTER PARTITION may be the way to go.
>> What about index constraints, ie, PRIMARY KEY, UNIQUE and EXCLUSION
>> constraints - 2 things must be clear here: cannot create these constraints
>> on individual partitions and all partition columns (key0 + key1) must be
>> the leading columns of the key. On a related note, creating index on the
>> master table should create the index on all "leaf" partitions. The index
>> on the mater table itself would be just a logical index. Should we allow
>> creating or dropping indexes on partitions directly?
> I don't find this to be particularly clear. You are assuming that
> nobody wants to create a constraint that a certain value is unique
> within a partition. That might not be a real common thing to want to
> do, but it could certainly be useful to somebody, and the current
> system with table inheritance allows it. For example, suppose that we
> have orders partitioned on the order_date column, by month. The user
> might want to create a UNIQUE index on order_id on each partition.
> Maybe they start over with order_id 1 at the beginning of each month.
> But even if, as is more likely, the order IDs keep counting up from
> month to month, they don't want to be forced to include the whole
> partitioning key in the index in order to have it marked UNIQUE. That
> may be enough, in practice, to ensure the global uniqueness of order
> IDs even though the system doesn't technically enforce it in all
Okay, I didn't consider that one may want to create UNIQUE constraint per
partition. Do they need to be allowed to be different per partition?
Sorry, I could not understand your last sentence regarding global
uniqueness. Given the restrictions on its definition, in what ways could
the system fail to enforce it?
> If you want an index created on the parent to cascade down to all
> children, that's a big project to get right. Suppose I create an
> index on the parent table. After a while, I notice that it's getting
> bloated, so I created another index with the same definition. Now, I
> drop one of the two indexes. One of the two indexes from each child
> table needs to go away, and moreover it can't be picked arbitrarily -
> it has to be the one that was created at the same time as the parent
> index I'm dropping. If you want it to behave like this, you need a
> whole system of bookkeeping to make it work right.
> For version 1, I'd go the other way and prohibit index definitions on
> the empty parent rels. Let people create indexes on the children as
> they wish. In a future version, we can add stuff to cascade from
> parents to children.
Okay, I may be missing the subtleties of global uniqueness case. I agree
we can leave it out of version 1.
>> It would be interesting to talk about features like SPLIT, MERGE, EXCHANGE
>> but I'm inclined to leave them as future enhancements. For a functionality
>> similar to EXCHANGE, there are commands like ATTACH/DETACH in the latest
>> patch. We could extend them to also consider sub-partitions:
> We don't need to have these in the first version, but we have to make
> some architectural decisions that affect how feasible they are to
> implement and in which cases, as noted above.
>> One cannot define rules, triggers, and RLS policies on them. Although,
>> AR triggers defined on a partitioned master table are propagated to the
>> "leaf" partitions.
> What value do you see us getting out of restricting these particular things?
Perhaps, I'm wrong in thinking that we should limit these to be associated
with only the top-level partitioned table.
Thanks a lot for the comments!
Sent via pgsql-hackers mailing list (email@example.com)
To make changes to your subscription: