I am trying to create a partitioned table with primary keys on the partitions. Here's the corresponding syntax as per documentation CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] table_name PARTITION OF parent_table [ ( { column_name WITH OPTIONS [ column_constraint [ ... ] ] | table_constraint } [, ... ] ) ] partition_bound_spec
IIUC, it should allow "create table t1_p1 partition of t1 (a primary key) ...", (a primary key) is nothing but "column_name column_constraint", but here's what happens create table t1_p1 partition of t1 (a primary key) for values from (0) to (100); ERROR: syntax error at or near "primary" LINE 1: create table t1_p1 partition of t1 (a primary key) for value... The same syntax also suggests using table_constraints but that too doesn't work create table t1_p1 partition of t1 (primary key (a) ) for values from (0) to (100); ERROR: inherited relation "t1" is not a table or foreign table of course t1 is a table, what it isn't? Am I missing something? How do I define constraints on the partitions? On Tue, Nov 22, 2016 at 2:45 PM, Amit Langote <langote_amit...@lab.ntt.co.jp> wrote: > > Updated patches attached. I merged what used to be 0006 and 0007 into one. > > On 2016/11/19 2:23, Robert Haas wrote: >> On Fri, Nov 18, 2016 at 5:59 AM, Amit Langote >> <langote_amit...@lab.ntt.co.jp> wrote: >>> Oh but wait, that means I can insert rows with NULLs in the range >>> partition key if I choose to insert it directly into the partition, >>> whereas I have been thinking all this while that there could never be >>> NULLs in the partition key of a range partition. What's more, >>> get_qual_for_partbound() (patch 0003) emits a IS NOT NULL constraint for >>> every partition key column in case of a range partition. Is that >>> wrongheaded altogether? (also see my reply to your earlier message about >>> NULLs in the range partition key) >> >> The easiest thing to do might be to just enforce that all of the >> partition key columns have to be not-null when the range-partitioned >> table is defined, and reject any attempt to DROP NOT NULL on them >> later. That's probably better that shoehorning it into the table >> constraint. > > Agreed that forcing range partitioning columns to be NOT NULL during table > creation would be a better approach. But then we would have to reject > using expressions in the range partition key, right? > >>> Thanks for the idea below! >>> >>>> 1. Forget the idea of a tree. Instead, let the total number of tables >>>> in the partitioning hierarchy be N and let the number of those that > > [ ... ] > >>>> >>>> No recursion, minimal pointer chasing, no linked lists. The whole >>>> thing is basically trivial aside from the cost of >>>> list/range_partition_for_tuple itself; optimizing that is a different >>>> project. I might have some details slightly off here, but hopefully >>>> you can see what I'm going for: you want to keep the computation that >>>> happens in get_partition_for_tuple() to an absolute minimum, and >>>> instead set things up in advance so that getting the partition for a >>>> tuple is FAST. And you want the data structures that you are using in >>>> that process to be very compact, hence arrays instead of linked lists. >>> >>> This sounds *much* better. Here is a quick attempt at coding the design >>> you have outlined above in the attached latest set of patches. >> >> That shrank both 0006 and 0007 substantially, and it should be faster, >> too. I bet you can shrink them further: > > Some changes described below have reduced the size to a certain degree. > >> >> - Why is PartitionKeyExecInfo a separate structure and why does it >> have a NodeTag? I bet you can dump the node tag, merge it into >> PartitionDispatch, and save some more code and some more >> pointer-chasing. > > OK, I merged the fields of what used to be PartitionKeyExecInfo into > PartitionDispatchData as the latter's new fields key and keystate. > >> - I still think it's a seriously bad idea for list partitioning and >> range partitioning to need different code-paths all over the place >> here. List partitions support nulls but not multi-column partitioning >> keys and range partitions support multi-column partitioning keys but >> not nulls, but you could use an internal structure that supports both. >> Then you wouldn't need partition_list_values_bsearch and also >> partition_rbound_bsearch; you could have one kind of bound structure >> that can be bsearch'd for either list or range. You might even be >> able to unify list_partition_for_tuple and range_partition_for_tuple >> although that looks a little harder. In either case, you bsearch for >> the greatest value <= the value you have. The only difference is that >> for list partitioning, you have to enforce at the end that it is an >> equal value, whereas for range partitioning less-than-or-equal-to is >> enough. But you should still be able to arrange for more code >> sharing. > > I have considered these suggestions in the latest patch. Now instead of > PartitionListInfo, PartitionRangeInfo, and BoundCollectionData structs, > there is only one PartitionBoundInfo which consolidates the partition > bound information of a partitioned table. Some of the fields are > applicable only to one of list or range case; for example, null-accepting > list partition index, infinite status of individual range datums. > > Also, there is now only one binary search function named > partition_bound_bsearch() which invokes a comparison function named > partition_bound_cmp(). The former searches a probe (a partition bound or > tuple) within a PartitionBoundInfo, which is passed all the way down to > the comparison function. > > Also, we no longer have list_partition_for_tuple() and > range_partition_for_tuple(). Instead, in get_partition_for_tuple() > itself, there is a bsearch followed by list and range partitioning > specific steps based on the returned offset. > >> - I don't see why you need the bound->lower stuff any more. If >> rangeinfo.bounds[offset] is a lower bound for a partition, then >> rangeinfo.bounds[offset+1] is either (a) the upper bound for that >> partition and the partition is followed by a "gap" or (b) both the >> upper bound for that partition and the lower bound for the next >> partition. With the inclusive/exclusive bound stuff gone, every range >> bound has the same sense: if the probed value is <= the bound then >> we're supposed to be a lower-numbered partition, but if > then we're >> supposed to be in this partition or a higher-numbered one. > > OK, I've managed to get rid of lower. At least it is no longer kept in > the new relcache struct PartitionBoundInfo. It is still kept in > PartitionRangeBound which is used to hold individual range bounds when > sorting them (during relcache build). Comparisons invoked during the > aforementioned sorting step still need to distinguish between lower and > upper bounds (such that '1)' < '[1'). > > Tuple-routing no longer needs to look at lower. In that case, what you > described above applies. > > As a result, one change became necessary: to how we flag individual range > bound datum as infinite or not. Previously, it was a regular Boolean > value (either infinite or not) and to distinguish +infinity from > -infinity, we looked at whether the bound is lower or upper (the lower > flag). Now, instead, the variable holding the status of individual range > bound datum is set to a ternary value: RANGE_DATUM_FINITE (0), > RANGE_DATUM_NEG_INF (1), and RANGE_DATUM_POS_INF (2), which still fits in > a bool. Upon encountering an infinite range bound datum, whether it's > negative or positive infinity derives the comparison result. Consider the > following example: > > partition p1 from (1, unbounded) to (1, 1); > partition p2 from (1, 1) to (1, 10); > partition p3 from (1, 10) to (1, unbounded); > partition p4 from (2, unbounded) to (2, 1); > ... so on > > In this case, we need to be able to conclude, say, (1, -inf) < (1, 15) < > (1, +inf), so that tuple (1, 15) is assigned to the proper partition. > > Does this last thing sound reasonable? > > Thanks, > Amit -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers