Re: [HACKERS] UPDATE of partition key

Amit Khandekar Fri, 28 Jul 2017 10:47:02 -0700

On 28 July 2017 at 20:10, Robert Haas <[email protected]> wrote:
> On Wed, Jul 26, 2017 at 2:13 AM, Amit Langote
> <[email protected]> wrote:
>> Sorry to be responding this late to the Amit's make_resultrel_ordered
>> patch itself, but I agree that we should teach the planner to *always*
>> expand partitioned tables in the partition bound order.
>
> Sounds like we have unanimous agreement on that point.


I too agree.

>
>> I checked that we get the same result relation order with both the
>> patches, but I would like to highlight a notable difference here between
>> the approaches taken by our patches.  In my patch, I have now taught
>> RelationGetPartitionDispatchInfo() to lock *only* the partitioned tables
>> in the tree, because we need to look at its partition descriptor to
>> collect partition OIDs and bounds.  We can defer locking (and opening the
>> relation descriptor of) leaf partitions to a point where planner has
>> determined that the partition will be accessed after all (not pruned),
>> which will be done in a separate patch of course.

With Amit Langote's patch, we can very well do the locking beforehand
by find_all_inheritors(), and then run
RelationGetPartitionDispatchInfo() with noLock, so as to remove the
deadlock problem. But I think we should keep these two tasks separate,
i.e. expanding the partition tree in bound order, and making
RelationGetPartitionDispatchInfo() work for the planner.

Regarding building the PartitionDispatchInfo in the planner, we should
do that only after it is known that partition columns are updated, so
it can't be done in expand_inherited_rtentry() because it would be too
soon. For planner setup, RelationGetPartitionDispatchInfo() should
just build the tupmap for each partitioned table, and then initialize
the rest of the fields like tuplslot, reldesc , etc later during
execution.

So for now, I feel we should just do the changes for making sure the
order is same, and then over that, separately modify
RelationGetPartitionDispatchInfo() for planner.

>
> That's very desirable, but I believe it introduces a deadlock risk
> which Amit's patch avoids.  A transaction using the code you've
> written here is eventually going to lock all partitions, BUT it's
> going to move the partitioned ones to the front of the locking order
> vs. what find_all_inheritors would do.  So, when multi-level
> partitioning is in use, I think it could happen that some other
> transaction is accessing the table using a different code path that
> uses the find_all_inheritors order without modification.  If those
> locks conflict (e.g. query vs. DROP) then there's a deadlock risk.

Yes, I agree. Even with single-level partitioning, the leaf partitions
ordered by find_all_inheritors() is by oid values, so that's also
going to be differently ordered.

>
> Unfortunately I don't see any easy way around that problem, but maybe
> somebody else has an idea.

One approach I had considered was to have find_inheritance_children()
itself lock the children in bound order, so that everyone will have
bound-ordered oids, but that would be too expensive since it requires
opening all partitioned tables to initialize partition descriptors. In
find_inheritance_children(), we get all oids without opening any
tables. But now that I think more of it, it's only the partitioned
tables that we have to open, not the leaf partitions; and furthermore,
I didn't see calls to find_inheritance_children() and
find_all_inheritors() in performance-critical code, except in
expand_inherited_rtentry(). All of them are in DDL commands; but yes,
that can change in the future.

Regarding dynamically locking specific partitions as and when needed,
I think this method inherently has the issue of deadlock because the
order would be random. So it feels like there is no way around other
than to lock all partitions beforehand.

----------------

Regarding using first resultrel for mapping RETURNING and WCO, I think
we can use (a renamed) getASTriggerResultRelInfo() to get the root
result relation, and use WCO and RETURNING expressions of this
relation to do the mapping for child rels. This way, there won't be
insert/update specific code, and we don't need to use first result
relation.

While checking the whole-row bug on the other thread [1] , I noticed
that the RETURNING/WCO expressions for the per-subplan result rels are
formed by considering not just simple vars, but also whole row vars
and other nodes. So for update-tuple-routing, there would be some
result-rels WCOs formed using adjust_appendrel_attrs(), while for
others, they would be built using map_partition_varattnos() which only
considers simple vars. So the bug in [1] would be there for
update-partition-key as well, when the tuple is routed into a newly
built resultrel. May be, while fixing the bug in [1] , this might be
automatically solved.

----------------

Below are the TODOS at this point :

Fix for bug reported by Rajkumar about update with join.
Do something about two separate mapping tables for Transition tables
and update tuple-routing.
GetUpdatedColumns() to be moved to header file.
More test scenarios in regression tests.
Need to check/test whether we are correctly applying insert policies
(ant not update) while inserting a routed tuple.
Use getASTriggerResultRelInfo() for attrno mapping, rather than first
resultrel, for generating child WCO/RETURNING expression.
Address Robert's review comments on make_resultrel_ordered.patch.
pgindent.

[1] 
https://www.postgresql.org/message-id/d86d27ea-cc9d-5dbe-b131-e7dec4017983%40lab.ntt.co.jp

Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] UPDATE of partition key

Reply via email to