Re: [HACKERS] Declarative partitioning - another take

Robert Haas Tue, 15 Nov 2016 11:22:40 -0800

On Tue, Nov 15, 2016 at 5:30 AM, Amit Langote
<[email protected]> wrote:
> On 2016/11/11 19:30, Amit Langote wrote:
>>
>> Attached updated patches.
>
> Here is the latest version of the patches with some fixes along with those
> mentioned below (mostly in 0003):
>
> - Fixed the logic to skip the attach partition validation scan such that
>   it won't skip scanning a list partition *that doesn't accept NULLs* if
>   the partition key column is not set NOT NULL (it similarly doesn't skip
>   scanning a range partition if either of the partition key columns is not
>   set NOT NULL, because a range partition key cannot contain NULLs at all)
>
> - Added some more regression tests for ATTACH PARTITION command
>
> - Some fixes to documentation and source code comments


Have you done any performance testing on the tuple routing code?
Suppose we insert a million (or 10 million) tuples into an
unpartitioned table, a table with 10 partitions, a table with 100
partitions, a table with 1000 partitions, and a table that is
partitioned into 10 partitions each of which has 10 subpartitions.
Ideally, the partitioned cases would run almost as fast as the
unpartitioned case, but probably there will be some overhead.
However, it would be useful to know how much.  Also, it would be
useful to set up the same cases with inheritance using a PL/pgsql ON
INSERT trigger for tuple routing and compare.  Hopefully the tuple
routing code is far faster than a trigger, but we should make sure
that's the case and look for optimizations if not.  Also, it would be
useful to know how much slower the tuple-mapping-required case is than
the no-tuple-mapping-required case.

I think the comments in some of the later patches could use some work
yet.  For example, in 0007, FormPartitionKeyDatum()'s header comment
is largely uninformative, get_partition_for_tuple()'s header comment
doesn't explain what the return value means in the non-failure case,
and ExecFindPartition() doesn't have a header comment at all.

The number of places in these patches where you end up reopening a
hopefully-already-locked relation with NoLock (or sometimes with
AccessShareLock) is worrying to me.  I think that's a coding pattern
we should be seeking to avoid; every one of those is not only a hazard
(what if we reach that point in the code without a lock?) but a
possible performance issue (we have to look up the OID in the
backend-private hash table; and if you are passing AccessShareLock
then you might also hit the lock manager which could be slow or create
deadlock hazards).  It would be much better to pass the Relation
around rather than the OID whenever possible.

Also, in 0006:

- I doubt that PartitionTreeNodeData's header comment will survive
contact with pgindent.

In 0007:

- oid_is_foreign_table could/should do a syscache lookup instead of
opening the heap relation.  But actually I think you could drop it
altogether and use get_rel_relkind().

- The XXX in EndCopy will need to be fixed somehow.

- I suspect that many of the pfree's in this patch are pointless
because the contexts in which those allocations are performed will be
reset or deleted shortly afterwards anyway.  Only pfree data that
might otherwise live substantially longer than we want, because pfree
consumes some cycles.

- The code in make_modifytable() to swap out the rte_array for a fake
one looks like an unacceptable kludge.  I don't know offhand what a
better design would look like, but what you've got is really ugly.

- I don't understand how it can be right for get_partition_for_tuple()
to be emitting an error that says "range partition key contains null".
A defective partition key should be detected at partition creation
time, not in the middle of tuple routing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Declarative partitioning - another take

Reply via email to