I was just reading through your patches and here are some quick review
Review comments for 0001-Catalog-and-DDL-for-partitioned-tables-17.patch:
@@ -1102,9 +1104,10 @@ heap_create_with_catalog(const char *relname,
/* Use binary-upgrade override for pg_class.oid/relfilenode? */
if (IsBinaryUpgrade &&
- (relkind == RELKIND_RELATION || relkind == RELKIND_SEQUENCE ||
- relkind == RELKIND_VIEW || relkind == RELKIND_MATVIEW ||
- relkind == RELKIND_COMPOSITE_TYPE || relkind ==
+ (relkind == RELKIND_RELATION || relkind ==
+ relkind == RELKIND_SEQUENCE || relkind == RELKIND_VIEW ||
+ relkind == RELKIND_MATVIEW || relkind ==
+ relkind == RELKIND_FOREIGN_TABLE))
You should add the RELKIND_PARTITIONED_TABLE at the end of if condition that
will make diff minimal. While reading through the patch I noticed that there
is inconsistency - someplace its been added at the end and at few places its
at the start. I think you can make add it at the end of condition and be
consistent with each place.
+ * We need to transform the raw parsetrees corresponding to
+ * expressions into executable expression trees. Like column
+ * and CHECK constraints, we could not have done the transformation
+ * earlier.
Additional space before "Like column defaults".
- char relkind;
+ char relkind,
Newly added variable should be define separately with its type. Something
+ char expected_relkind;
+ /* Prevent partitioned tables from becoming inheritance parents */
+ if (parent_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ errmsg("cannot inherit from partitioned table \"%s\"",
need alignment for last line.
+ atttuple = SearchSysCacheAttName(RelationGetRelid(rel),
+ if (!HeapTupleIsValid(atttuple))
+ errmsg("column \"%s\" named in partition key does
+ attform = (Form_pg_attribute) GETSTRUCT(atttuple);
+ if (attform->attnum <= 0)
+ errmsg("cannot use system column \"%s\" in
need alignment for last line of ereport
+ /* Disallow ROW triggers on partitioned tables */
+ if (stmt->row && rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ errmsg("\"%s\" is a partitioned table",
+ errdetail("Partitioned tables cannot have ROW triggers.")));
@@ -2512,6 +2579,7 @@ transformAlterTableStmt(Oid relid, AlterTableStmt
cxt.blist = NIL;
cxt.alist = NIL;
cxt.pkey = NULL;
+ cxt.ispartitioned = rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE;
I think adding bracket will look code more clear.
+ cxt.ispartitioned = (rel->rd_rel->relkind ==
+ * RelationBuildPartitionKey
+ * Build and attach to relcache partition key data of relation
+ * Partitioning key data is stored in CacheMemoryContext to ensure it
+ * as long as the relcache. To avoid leaking memory in that context in
+ * of an error partway through this function, we build the structure in the
+ * working context (which must be short-lived) and copy the completed
+ * structure into the cache memory.
extra space before "To avoid leaking memory"
+ /* variable-length fields start here, but we allow direct access to
+ int2vector partattrs; /* attribute numbers of columns in
Why partattrs is allow direct access - its not really clear from the
I will continue reading more patch and testing functionality.. will share
comments as I have it.
On Tue, Nov 22, 2016 at 2:45 PM, Amit Langote <langote_amit...@lab.ntt.co.jp
> Updated patches attached. I merged what used to be 0006 and 0007 into one.
> On 2016/11/19 2:23, Robert Haas wrote:
> > On Fri, Nov 18, 2016 at 5:59 AM, Amit Langote
> > <langote_amit...@lab.ntt.co.jp> wrote:
> >> Oh but wait, that means I can insert rows with NULLs in the range
> >> partition key if I choose to insert it directly into the partition,
> >> whereas I have been thinking all this while that there could never be
> >> NULLs in the partition key of a range partition. What's more,
> >> get_qual_for_partbound() (patch 0003) emits a IS NOT NULL constraint for
> >> every partition key column in case of a range partition. Is that
> >> wrongheaded altogether? (also see my reply to your earlier message
> >> NULLs in the range partition key)
> > The easiest thing to do might be to just enforce that all of the
> > partition key columns have to be not-null when the range-partitioned
> > table is defined, and reject any attempt to DROP NOT NULL on them
> > later. That's probably better that shoehorning it into the table
> > constraint.
> Agreed that forcing range partitioning columns to be NOT NULL during table
> creation would be a better approach. But then we would have to reject
> using expressions in the range partition key, right?
> >> Thanks for the idea below!
> >>> 1. Forget the idea of a tree. Instead, let the total number of tables
> >>> in the partitioning hierarchy be N and let the number of those that
> [ ... ]
> >>> No recursion, minimal pointer chasing, no linked lists. The whole
> >>> thing is basically trivial aside from the cost of
> >>> list/range_partition_for_tuple itself; optimizing that is a different
> >>> project. I might have some details slightly off here, but hopefully
> >>> you can see what I'm going for: you want to keep the computation that
> >>> happens in get_partition_for_tuple() to an absolute minimum, and
> >>> instead set things up in advance so that getting the partition for a
> >>> tuple is FAST. And you want the data structures that you are using in
> >>> that process to be very compact, hence arrays instead of linked lists.
> >> This sounds *much* better. Here is a quick attempt at coding the design
> >> you have outlined above in the attached latest set of patches.
> > That shrank both 0006 and 0007 substantially, and it should be faster,
> > too. I bet you can shrink them further:
> Some changes described below have reduced the size to a certain degree.
> > - Why is PartitionKeyExecInfo a separate structure and why does it
> > have a NodeTag? I bet you can dump the node tag, merge it into
> > PartitionDispatch, and save some more code and some more
> > pointer-chasing.
> OK, I merged the fields of what used to be PartitionKeyExecInfo into
> PartitionDispatchData as the latter's new fields key and keystate.
> > - I still think it's a seriously bad idea for list partitioning and
> > range partitioning to need different code-paths all over the place
> > here. List partitions support nulls but not multi-column partitioning
> > keys and range partitions support multi-column partitioning keys but
> > not nulls, but you could use an internal structure that supports both.
> > Then you wouldn't need partition_list_values_bsearch and also
> > partition_rbound_bsearch; you could have one kind of bound structure
> > that can be bsearch'd for either list or range. You might even be
> > able to unify list_partition_for_tuple and range_partition_for_tuple
> > although that looks a little harder. In either case, you bsearch for
> > the greatest value <= the value you have. The only difference is that
> > for list partitioning, you have to enforce at the end that it is an
> > equal value, whereas for range partitioning less-than-or-equal-to is
> > enough. But you should still be able to arrange for more code
> > sharing.
> I have considered these suggestions in the latest patch. Now instead of
> PartitionListInfo, PartitionRangeInfo, and BoundCollectionData structs,
> there is only one PartitionBoundInfo which consolidates the partition
> bound information of a partitioned table. Some of the fields are
> applicable only to one of list or range case; for example, null-accepting
> list partition index, infinite status of individual range datums.
> Also, there is now only one binary search function named
> partition_bound_bsearch() which invokes a comparison function named
> partition_bound_cmp(). The former searches a probe (a partition bound or
> tuple) within a PartitionBoundInfo, which is passed all the way down to
> the comparison function.
> Also, we no longer have list_partition_for_tuple() and
> range_partition_for_tuple(). Instead, in get_partition_for_tuple()
> itself, there is a bsearch followed by list and range partitioning
> specific steps based on the returned offset.
> > - I don't see why you need the bound->lower stuff any more. If
> > rangeinfo.bounds[offset] is a lower bound for a partition, then
> > rangeinfo.bounds[offset+1] is either (a) the upper bound for that
> > partition and the partition is followed by a "gap" or (b) both the
> > upper bound for that partition and the lower bound for the next
> > partition. With the inclusive/exclusive bound stuff gone, every range
> > bound has the same sense: if the probed value is <= the bound then
> > we're supposed to be a lower-numbered partition, but if > then we're
> > supposed to be in this partition or a higher-numbered one.
> OK, I've managed to get rid of lower. At least it is no longer kept in
> the new relcache struct PartitionBoundInfo. It is still kept in
> PartitionRangeBound which is used to hold individual range bounds when
> sorting them (during relcache build). Comparisons invoked during the
> aforementioned sorting step still need to distinguish between lower and
> upper bounds (such that '1)' < '[1').
> Tuple-routing no longer needs to look at lower. In that case, what you
> described above applies.
> As a result, one change became necessary: to how we flag individual range
> bound datum as infinite or not. Previously, it was a regular Boolean
> value (either infinite or not) and to distinguish +infinity from
> -infinity, we looked at whether the bound is lower or upper (the lower
> flag). Now, instead, the variable holding the status of individual range
> bound datum is set to a ternary value: RANGE_DATUM_FINITE (0),
> RANGE_DATUM_NEG_INF (1), and RANGE_DATUM_POS_INF (2), which still fits in
> a bool. Upon encountering an infinite range bound datum, whether it's
> negative or positive infinity derives the comparison result. Consider the
> following example:
> partition p1 from (1, unbounded) to (1, 1);
> partition p2 from (1, 1) to (1, 10);
> partition p3 from (1, 10) to (1, unbounded);
> partition p4 from (2, unbounded) to (2, 1);
> ... so on
> In this case, we need to be able to conclude, say, (1, -inf) < (1, 15) <
> (1, +inf), so that tuple (1, 15) is assigned to the proper partition.
> Does this last thing sound reasonable?
> Sent via pgsql-hackers mailing list (firstname.lastname@example.org)
> To make changes to your subscription: