On 2018/06/19 2:05, Tom Lane wrote:
> Amit Langote <langote_amit...@lab.ntt.co.jp> writes:
>> [ 0001-Open-partitioned-tables-during-Append-initialization.patch ]
> 
> I took a look at this.  While I'm in agreement with the general idea of
> holding open the partitioned relations' relcache entries throughout the
> query, I do not like anything about this patch:

Thanks for taking a look at it and sorry about the delay in replying.

> * It seems to be outright broken for the case that any of the partitioned
> relations are listed in nonleafResultRelations.  If we're going to do it
> like this, we have to open every one of the partrels regardless of that.

Yes, that was indeed wrong.

> (I wonder whether we couldn't lose PlannedStmt.nonleafResultRelations
> altogether, in favor of merging the related code in InitPlan with this.

Hmm, PlannedStmt.nonleafResultRelations exists for the same reason as why
PlannedStmt.resultRelations does, that is,

/*
 * initialize result relation stuff, and open/lock the result rels.
 *
 * We must do this before initializing the plan tree, else we might try to
 * do a lock upgrade if a result rel is also a source rel.
 */

nonleafResultRelations contains members of partitioned_rels lists of all
ModifyTable nodes contained in a plan.

> That existing code is already a mighty ugly wart, and this patch makes
> it worse by adding new, related warts elsewhere.)

I just realized that there is a thinko in the following piece of code in
ExecLockNonLeafAppendTables

        /* If this is a result relation, already locked in InitPlan */
        foreach(l, stmt->nonleafResultRelations)
        {
            if (rti == lfirst_int(l))
            {
                is_result_rel = true;
                break;
            }
        }

It should actually be:

        /* If this is a result relation, already locked in InitPlan */
        foreach(l, stmt->nonleafResultRelations)
        {
            Index   nonleaf_rti = lfirst_int(l);
            Oid     nonleaf_relid = getrelid(nonleaf_rti,
                                             estate->es_range_table);

            if (relid == nonleaf_relid)
            {
                is_result_rel = true;
                break;
            }
        }

RT indexes in, say, Append.partitioned_rels, are distinct from those in
PlannedStmt.nonleafResultRelations, so the existing test never succeeds,
as also evident from the coverage report:

https://coverage.postgresql.org/src/backend/executor/execUtils.c.gcov.html#864


I'm wondering if we couldn't just get rid of this code.  If an input
partitioned tables is indeed also a result relation, then we would've
locked it in InitPlan with RowExclusiveLock and heap_opening it with a
weaker lock (RowShare/AccessShare) wouldn't hurt.

> * You've got *far* too much intimate knowledge of the possible callers
> in ExecOpenAppendPartitionedTables.
> 
> Personally, what I would have this function do is return a List of
> the opened Relation pointers, and add a matching function to run through
> such a List and close the entries again, and make the callers responsible
> for stashing the List pointer in an appropriate field in their planstate.

OK, I rewrote it to work that way.

> Or maybe what we should do is drop ExecLockNonLeafAppendTables/
> ExecOpenAppendPartitionedTables entirely and teach InitPlan to do it.

Hmm, for InitPlan to do what ExecOpenAppendPartitionedTables does, we'd
have to have all the partitioned tables (contained in partitioned_rels
fields of all Append/MergeAppend/ModifyTable nodes of a plan) also listed
in a global list like rootResultRelations and nonleafResultRelations of
PlannedStmt.

Attached updated patch.

Thanks,
Amit
From 0fd93b0b2108d4d12f483b96aab2c25c120173cb Mon Sep 17 00:00:00 2001
From: amit <amitlangot...@gmail.com>
Date: Thu, 14 Jun 2018 14:50:22 +0900
Subject: [PATCH v2] Fix opening/closing of partitioned tables in Append plans

---
 src/backend/executor/execPartition.c   |  49 ++++----------
 src/backend/executor/execUtils.c       | 114 +++++++++++++++++----------------
 src/backend/executor/nodeAppend.c      |  26 ++++----
 src/backend/executor/nodeMergeAppend.c |  11 ++--
 src/include/executor/execPartition.h   |   4 +-
 src/include/executor/executor.h        |   4 +-
 src/include/nodes/execnodes.h          |   4 ++
 7 files changed, 100 insertions(+), 112 deletions(-)

diff --git a/src/backend/executor/execPartition.c 
b/src/backend/executor/execPartition.c
index 7a4665cc4e..b9bc157bfa 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1399,13 +1399,19 @@ adjust_partition_tlist(List *tlist, TupleConversionMap 
*map)
  * PartitionPruningData for each item in that List.  This data can be re-used
  * each time we re-evaluate which partitions match the pruning steps provided
  * in each PartitionPruneInfo.
+ *
+ * 'partitioned_rels' is a List of same number of elements as there are in
+ * 'partitionpruneinfo' containing the Relation pointers of corresponding
+ * partitioned tables.
  */
 PartitionPruneState *
-ExecCreatePartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
+ExecCreatePartitionPruneState(PlanState *planstate, List *partitionpruneinfo,
+                                                         List 
*partitioned_rels)
 {
        PartitionPruneState *prunestate;
        PartitionPruningData *prunedata;
-       ListCell   *lc;
+       ListCell   *lc1,
+                          *lc2;
        int                     i;
 
        Assert(partitionpruneinfo != NIL);
@@ -1435,9 +1441,10 @@ ExecCreatePartitionPruneState(PlanState *planstate, List 
*partitionpruneinfo)
                                                          
ALLOCSET_DEFAULT_SIZES);
 
        i = 0;
-       foreach(lc, partitionpruneinfo)
+       forboth(lc1, partitionpruneinfo, lc2, partitioned_rels)
        {
-               PartitionPruneInfo *pinfo = castNode(PartitionPruneInfo, 
lfirst(lc));
+               PartitionPruneInfo *pinfo = castNode(PartitionPruneInfo, 
lfirst(lc1));
+               Relation        rel = lfirst(lc2);
                PartitionPruningData *pprune = &prunedata[i];
                PartitionPruneContext *context = &pprune->context;
                PartitionDesc partdesc;
@@ -1460,17 +1467,9 @@ ExecCreatePartitionPruneState(PlanState *planstate, List 
*partitionpruneinfo)
                /* present_parts is also subject to later modification */
                pprune->present_parts = bms_copy(pinfo->present_parts);
 
-               /*
-                * We need to hold a pin on the partitioned table's relcache 
entry so
-                * that we can rely on its copies of the table's partition key 
and
-                * partition descriptor.  We need not get a lock though; one 
should
-                * have been acquired already by InitPlan or
-                * ExecLockNonLeafAppendTables.
-                */
-               context->partrel = relation_open(pinfo->reloid, NoLock);
-
-               partkey = RelationGetPartitionKey(context->partrel);
-               partdesc = RelationGetPartitionDesc(context->partrel);
+               Assert(RelationGetRelid(rel) == pinfo->reloid);
+               partkey = RelationGetPartitionKey(rel);
+               partdesc = RelationGetPartitionDesc(rel);
                n_steps = list_length(pinfo->pruning_steps);
 
                context->strategy = partkey->strategy;
@@ -1546,26 +1545,6 @@ ExecCreatePartitionPruneState(PlanState *planstate, List 
*partitionpruneinfo)
 }
 
 /*
- * ExecDestroyPartitionPruneState
- *             Release resources at plan shutdown.
- *
- * We don't bother to free any memory here, since the whole executor context
- * will be going away shortly.  We do need to release our relcache pins.
- */
-void
-ExecDestroyPartitionPruneState(PartitionPruneState *prunestate)
-{
-       int                     i;
-
-       for (i = 0; i < prunestate->num_partprunedata; i++)
-       {
-               PartitionPruningData *pprune = &prunestate->partprunedata[i];
-
-               relation_close(pprune->context.partrel, NoLock);
-       }
-}
-
-/*
  * ExecFindInitialMatchingSubPlans
  *             Identify the set of subplans that cannot be eliminated by 
initial
  *             pruning (disregarding any pruning constraints involving 
PARAM_EXEC
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index b963cae730..f936f8c3da 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -704,6 +704,65 @@ ExecCloseScanRelation(Relation scanrel)
 }
 
 /*
+ * ExecOpenAppendPartitionedTables
+ *
+ * Opens and/or locks, if necessary, the tables indicated by the RT indexes
+ * contained in the partitioned_rels list of a given Append or MergeAppend
+ * node.  These are the non-leaf tables in the partition tree controlled by
+ * the node.
+ */
+List *
+ExecOpenAppendPartitionedTables(List *partitioned_rels, EState *estate)
+{
+       PlannedStmt *stmt = estate->es_plannedstmt;
+       ListCell   *lc;
+       List       *result = NIL;
+
+       Assert(partitioned_rels != NIL);
+
+       foreach(lc, partitioned_rels)
+       {
+               ListCell   *l;
+               Index           rti = lfirst_int(lc);
+               Oid                     relid = getrelid(rti, 
estate->es_range_table);
+               int                     lockmode = AccessShareLock;
+               PlanRowMark *rc = NULL;
+
+               /* Check if there is a RowMark that requires taking a 
RowShareLock. */
+
+               foreach(l, stmt->rowMarks)
+               {
+                       if (((PlanRowMark *) lfirst(l))->rti == rti)
+                       {
+                               rc = lfirst(l);
+                               break;
+                       }
+               }
+
+               if (rc && RowMarkRequiresRowShareLock(rc->markType))
+                       lockmode = RowShareLock;
+
+               result = lappend(result, heap_open(relid, lockmode));
+       }
+
+       return result;
+}
+
+/* Close each Relation in the input list. */
+void
+ExecCloseAppendPartitionedTables(List *partitioned_rels)
+{
+       ListCell *lc;
+
+       /*
+        * Close partitioned rels that we may have opened for partition
+        * pruning.
+        */
+       foreach(lc, partitioned_rels)
+               heap_close(lfirst(lc), NoLock);
+}
+
+/*
  * UpdateChangedParamSet
  *             Add changed parameters to a plan node's chgParam set
  */
@@ -856,61 +915,6 @@ ShutdownExprContext(ExprContext *econtext, bool isCommit)
 }
 
 /*
- * ExecLockNonLeafAppendTables
- *
- * Locks, if necessary, the tables indicated by the RT indexes contained in
- * the partitioned_rels list.  These are the non-leaf tables in the partition
- * tree controlled by a given Append or MergeAppend node.
- */
-void
-ExecLockNonLeafAppendTables(List *partitioned_rels, EState *estate)
-{
-       PlannedStmt *stmt = estate->es_plannedstmt;
-       ListCell   *lc;
-
-       foreach(lc, partitioned_rels)
-       {
-               ListCell   *l;
-               Index           rti = lfirst_int(lc);
-               bool            is_result_rel = false;
-               Oid                     relid = getrelid(rti, 
estate->es_range_table);
-
-               /* If this is a result relation, already locked in InitPlan */
-               foreach(l, stmt->nonleafResultRelations)
-               {
-                       if (rti == lfirst_int(l))
-                       {
-                               is_result_rel = true;
-                               break;
-                       }
-               }
-
-               /*
-                * Not a result relation; check if there is a RowMark that 
requires
-                * taking a RowShareLock on this rel.
-                */
-               if (!is_result_rel)
-               {
-                       PlanRowMark *rc = NULL;
-
-                       foreach(l, stmt->rowMarks)
-                       {
-                               if (((PlanRowMark *) lfirst(l))->rti == rti)
-                               {
-                                       rc = lfirst(l);
-                                       break;
-                               }
-                       }
-
-                       if (rc && RowMarkRequiresRowShareLock(rc->markType))
-                               LockRelationOid(relid, RowShareLock);
-                       else
-                               LockRelationOid(relid, AccessShareLock);
-               }
-       }
-}
-
-/*
  *             GetAttributeByName
  *             GetAttributeByNum
  *
diff --git a/src/backend/executor/nodeAppend.c 
b/src/backend/executor/nodeAppend.c
index 5ce4fb43e1..77285af194 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -113,18 +113,17 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
        Assert(!(eflags & EXEC_FLAG_MARK));
 
        /*
-        * Lock the non-leaf tables in the partition tree controlled by this 
node.
-        * It's a no-op for non-partitioned parent tables.
-        */
-       ExecLockNonLeafAppendTables(node->partitioned_rels, estate);
-
-       /*
         * create new AppendState for our append node
         */
        appendstate->ps.plan = (Plan *) node;
        appendstate->ps.state = estate;
        appendstate->ps.ExecProcNode = ExecAppend;
 
+       /* Lock and open (if needed) the partitioned tables. */
+       if (node->partitioned_rels != NIL)
+               appendstate->as_partitioned_rels =
+                       ExecOpenAppendPartitionedTables(node->partitioned_rels, 
estate);
+
        /* Let choose_next_subplan_* function handle setting the first subplan 
*/
        appendstate->as_whichplan = INVALID_SUBPLAN_INDEX;
 
@@ -137,8 +136,11 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
                ExecAssignExprContext(estate, &appendstate->ps);
 
                /* Create the working data structure for pruning. */
-               prunestate = ExecCreatePartitionPruneState(&appendstate->ps,
-                                                                               
                   node->part_prune_infos);
+
+               prunestate =
+                               ExecCreatePartitionPruneState(&appendstate->ps,
+                                                                               
          node->part_prune_infos,
+                                                                               
          appendstate->as_partitioned_rels);
                appendstate->as_prune_state = prunestate;
 
                /* Perform an initial partition prune, if required. */
@@ -325,17 +327,13 @@ ExecEndAppend(AppendState *node)
        appendplans = node->appendplans;
        nplans = node->as_nplans;
 
+       ExecCloseAppendPartitionedTables(node->as_partitioned_rels);
+
        /*
         * shut down each of the subscans
         */
        for (i = 0; i < nplans; i++)
                ExecEndNode(appendplans[i]);
-
-       /*
-        * release any resources associated with run-time pruning
-        */
-       if (node->as_prune_state)
-               ExecDestroyPartitionPruneState(node->as_prune_state);
 }
 
 void
diff --git a/src/backend/executor/nodeMergeAppend.c 
b/src/backend/executor/nodeMergeAppend.c
index 118f4ef07d..8de956b129 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -72,11 +72,10 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int 
eflags)
        /* check for unsupported flags */
        Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
 
-       /*
-        * Lock the non-leaf tables in the partition tree controlled by this 
node.
-        * It's a no-op for non-partitioned parent tables.
-        */
-       ExecLockNonLeafAppendTables(node->partitioned_rels, estate);
+       /* Lock and open (if needed) the partitioned tables. */
+       if (node->partitioned_rels != NIL)
+               mergestate->ms_partitioned_rels =
+                       ExecOpenAppendPartitionedTables(node->partitioned_rels, 
estate);
 
        /*
         * Set up empty vector of subplan states
@@ -283,6 +282,8 @@ ExecEndMergeAppend(MergeAppendState *node)
        mergeplans = node->mergeplans;
        nplans = node->ms_nplans;
 
+       ExecCloseAppendPartitionedTables(node->ms_partitioned_rels);
+
        /*
         * shut down each of the subscans
         */
diff --git a/src/include/executor/execPartition.h 
b/src/include/executor/execPartition.h
index 862bf65060..7b0744a2c9 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -209,8 +209,8 @@ extern HeapTuple 
ConvertPartitionTupleSlot(TupleConversionMap *map,
 extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
                                                PartitionTupleRouting *proute);
 extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
-                                                         List 
*partitionpruneinfo);
-extern void ExecDestroyPartitionPruneState(PartitionPruneState *prunestate);
+                                                         List 
*partitionpruneinfo,
+                                                         List 
*partitioned_rels);
 extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
 extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState 
*prunestate,
                                                                int nsubplans);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index f82b51667f..301e4b4059 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -524,7 +524,9 @@ extern void UnregisterExprContextCallback(ExprContext 
*econtext,
                                                          
ExprContextCallbackFunction function,
                                                          Datum arg);
 
-extern void ExecLockNonLeafAppendTables(List *partitioned_rels, EState 
*estate);
+extern List *ExecOpenAppendPartitionedTables(List *partitioned_rels,
+                                                               EState *estate);
+extern void ExecCloseAppendPartitionedTables(List *partitioned_rels);
 
 extern Datum GetAttributeByName(HeapTupleHeader tuple, const char *attname,
                                   bool *isNull);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index da7f52cab0..55996b80d3 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1073,6 +1073,7 @@ typedef struct ModifyTableState
  *                                                     eliminated from the 
scan, or NULL if not possible.
  *             valid_subplans          for runtime pruning, valid appendplans 
indexes to
  *                                                     scan.
+ *             partitioned_rels        Relation pointers of partitioned tables
  * ----------------
  */
 
@@ -1095,6 +1096,7 @@ struct AppendState
        struct PartitionPruneState *as_prune_state;
        Bitmapset  *as_valid_subplans;
        bool            (*choose_next_subplan) (AppendState *);
+       List       *as_partitioned_rels;        /* List of Relation pointers */
 };
 
 /* ----------------
@@ -1106,6 +1108,7 @@ struct AppendState
  *             slots                   current output tuple of each subplan
  *             heap                    heap of active tuples
  *             initialized             true if we have fetched first tuple 
from each subplan
+ *             partitioned_rels        Relation pointers of partitioned tables
  * ----------------
  */
 typedef struct MergeAppendState
@@ -1118,6 +1121,7 @@ typedef struct MergeAppendState
        TupleTableSlot **ms_slots;      /* array of length ms_nplans */
        struct binaryheap *ms_heap; /* binary heap of slot indices */
        bool            ms_initialized; /* are subplans started? */
+       List       *ms_partitioned_rels;        /* List of Relation pointers */
 } MergeAppendState;
 
 /* ----------------
-- 
2.11.0

Reply via email to