On 2017/07/29 2:45, Amit Khandekar wrote:
> On 28 July 2017 at 20:10, Robert Haas <robertmh...@gmail.com> wrote:
>> On Wed, Jul 26, 2017 at 2:13 AM, Amit Langote wrote:
>>> I checked that we get the same result relation order with both the
>>> patches, but I would like to highlight a notable difference here between
>>> the approaches taken by our patches.  In my patch, I have now taught
>>> RelationGetPartitionDispatchInfo() to lock *only* the partitioned tables
>>> in the tree, because we need to look at its partition descriptor to
>>> collect partition OIDs and bounds.  We can defer locking (and opening the
>>> relation descriptor of) leaf partitions to a point where planner has
>>> determined that the partition will be accessed after all (not pruned),
>>> which will be done in a separate patch of course.
>> That's very desirable, but I believe it introduces a deadlock risk
>> which Amit's patch avoids.  A transaction using the code you've
>> written here is eventually going to lock all partitions, BUT it's
>> going to move the partitioned ones to the front of the locking order
>> vs. what find_all_inheritors would do.  So, when multi-level
>> partitioning is in use, I think it could happen that some other
>> transaction is accessing the table using a different code path that
>> uses the find_all_inheritors order without modification.  If those
>> locks conflict (e.g. query vs. DROP) then there's a deadlock risk.
> Yes, I agree. Even with single-level partitioning, the leaf partitions
> ordered by find_all_inheritors() is by oid values, so that's also
> going to be differently ordered.

We do require to lock the parent first in any case.  Doesn't that prevent
deadlocks by imparting an implicit order on locking by operations whose
locks conflict.

Having said that, I think it would be desirable for all code paths to
manipulate partitions in the same order.  For partitioned tables, I think
we can make it the partition bound order by replacing all calls to
find_all_inheritors and find_inheritance_children on partitioned table
parents with something else that reads partition OIDs from the relcache
(PartitionDesc) and traverses the partition tree breadth-first manner.

>> Unfortunately I don't see any easy way around that problem, but maybe
>> somebody else has an idea.
> One approach I had considered was to have find_inheritance_children()
> itself lock the children in bound order, so that everyone will have
> bound-ordered oids, but that would be too expensive since it requires
> opening all partitioned tables to initialize partition descriptors. In
> find_inheritance_children(), we get all oids without opening any
> tables. But now that I think more of it, it's only the partitioned
> tables that we have to open, not the leaf partitions; and furthermore,
> I didn't see calls to find_inheritance_children() and
> find_all_inheritors() in performance-critical code, except in
> expand_inherited_rtentry(). All of them are in DDL commands; but yes,
> that can change in the future.

This approach more or less amounts to calling the new
RelationGetPartitionDispatchInfo() (per my proposed patch, a version of
which I posted upthread.)  Maybe we can add a wrapper on top, say,
get_all_partition_oids() which throws away other things that
RelationGetPartitionDispatchInfo() returned.  In addition it locks all the
partitions that are returned, unlike only the partitioned ones, which is
what RelationGetPartitionDispatchInfo() has been taught to do.

> Regarding dynamically locking specific partitions as and when needed,
> I think this method inherently has the issue of deadlock because the
> order would be random. So it feels like there is no way around other
> than to lock all partitions beforehand.

I'm not sure why the order has to be random.  If and when we decide to
open and lock a subset of partitions for a given query, it will be done in
some canonical order as far as I can imagine.  Do you have some specific
example in mind?


Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to