On 28 July 2017 at 20:10, Robert Haas <robertmh...@gmail.com> wrote: > On Wed, Jul 26, 2017 at 2:13 AM, Amit Langote > <langote_amit...@lab.ntt.co.jp> wrote: >> Sorry to be responding this late to the Amit's make_resultrel_ordered >> patch itself, but I agree that we should teach the planner to *always* >> expand partitioned tables in the partition bound order. > > Sounds like we have unanimous agreement on that point.
I too agree. > >> I checked that we get the same result relation order with both the >> patches, but I would like to highlight a notable difference here between >> the approaches taken by our patches. In my patch, I have now taught >> RelationGetPartitionDispatchInfo() to lock *only* the partitioned tables >> in the tree, because we need to look at its partition descriptor to >> collect partition OIDs and bounds. We can defer locking (and opening the >> relation descriptor of) leaf partitions to a point where planner has >> determined that the partition will be accessed after all (not pruned), >> which will be done in a separate patch of course. With Amit Langote's patch, we can very well do the locking beforehand by find_all_inheritors(), and then run RelationGetPartitionDispatchInfo() with noLock, so as to remove the deadlock problem. But I think we should keep these two tasks separate, i.e. expanding the partition tree in bound order, and making RelationGetPartitionDispatchInfo() work for the planner. Regarding building the PartitionDispatchInfo in the planner, we should do that only after it is known that partition columns are updated, so it can't be done in expand_inherited_rtentry() because it would be too soon. For planner setup, RelationGetPartitionDispatchInfo() should just build the tupmap for each partitioned table, and then initialize the rest of the fields like tuplslot, reldesc , etc later during execution. So for now, I feel we should just do the changes for making sure the order is same, and then over that, separately modify RelationGetPartitionDispatchInfo() for planner. > > That's very desirable, but I believe it introduces a deadlock risk > which Amit's patch avoids. A transaction using the code you've > written here is eventually going to lock all partitions, BUT it's > going to move the partitioned ones to the front of the locking order > vs. what find_all_inheritors would do. So, when multi-level > partitioning is in use, I think it could happen that some other > transaction is accessing the table using a different code path that > uses the find_all_inheritors order without modification. If those > locks conflict (e.g. query vs. DROP) then there's a deadlock risk. Yes, I agree. Even with single-level partitioning, the leaf partitions ordered by find_all_inheritors() is by oid values, so that's also going to be differently ordered. > > Unfortunately I don't see any easy way around that problem, but maybe > somebody else has an idea. One approach I had considered was to have find_inheritance_children() itself lock the children in bound order, so that everyone will have bound-ordered oids, but that would be too expensive since it requires opening all partitioned tables to initialize partition descriptors. In find_inheritance_children(), we get all oids without opening any tables. But now that I think more of it, it's only the partitioned tables that we have to open, not the leaf partitions; and furthermore, I didn't see calls to find_inheritance_children() and find_all_inheritors() in performance-critical code, except in expand_inherited_rtentry(). All of them are in DDL commands; but yes, that can change in the future. Regarding dynamically locking specific partitions as and when needed, I think this method inherently has the issue of deadlock because the order would be random. So it feels like there is no way around other than to lock all partitions beforehand. ---------------- Regarding using first resultrel for mapping RETURNING and WCO, I think we can use (a renamed) getASTriggerResultRelInfo() to get the root result relation, and use WCO and RETURNING expressions of this relation to do the mapping for child rels. This way, there won't be insert/update specific code, and we don't need to use first result relation. While checking the whole-row bug on the other thread  , I noticed that the RETURNING/WCO expressions for the per-subplan result rels are formed by considering not just simple vars, but also whole row vars and other nodes. So for update-tuple-routing, there would be some result-rels WCOs formed using adjust_appendrel_attrs(), while for others, they would be built using map_partition_varattnos() which only considers simple vars. So the bug in  would be there for update-partition-key as well, when the tuple is routed into a newly built resultrel. May be, while fixing the bug in  , this might be automatically solved. ---------------- Below are the TODOS at this point : Fix for bug reported by Rajkumar about update with join. Do something about two separate mapping tables for Transition tables and update tuple-routing. GetUpdatedColumns() to be moved to header file. More test scenarios in regression tests. Need to check/test whether we are correctly applying insert policies (ant not update) while inserting a routed tuple. Use getASTriggerResultRelInfo() for attrno mapping, rather than first resultrel, for generating child WCO/RETURNING expression. Address Robert's review comments on make_resultrel_ordered.patch. pgindent.  https://www.postgresql.org/message-id/d86d27ea-cc9d-5dbe-b131-e7dec4017983%40lab.ntt.co.jp Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (email@example.com) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers