Amit Langote <langote_amit...@lab.ntt.co.jp> writes:
> On 2018/06/19 2:05, Tom Lane wrote:
>> Or maybe what we should do is drop ExecLockNonLeafAppendTables/
>> ExecOpenAppendPartitionedTables entirely and teach InitPlan to do it.

> Hmm, for InitPlan to do what ExecOpenAppendPartitionedTables does, we'd
> have to have all the partitioned tables (contained in partitioned_rels
> fields of all Append/MergeAppend/ModifyTable nodes of a plan) also listed
> in a global list like rootResultRelations and nonleafResultRelations of
> PlannedStmt.

> Attached updated patch.

I've been looking at this patch, and while it's not unreasonable on its
own terms, I'm growing increasingly distressed at the ad-hoc and rather
duplicative nature of the data structures that have gotten stuck into
plan trees thanks to partitioning (the rootResultRelations and
nonleafResultRelations lists being prime examples).

It struck me this morning that a whole lot of the complication here is
solely due to needing to identify the right type of relation lock to take
during executor startup, and that THAT WORK IS TOTALLY USELESS.  In every
case, we must already hold a suitable lock before we ever get to the
executor; either one acquired during the parse/plan pipeline, or one
re-acquired by AcquireExecutorLocks in the case of a cached plan.
Otherwise it's entirely possible that the plan has been invalidated by
concurrent DDL --- and it's not the executor's job to detect that and
re-plan; that *must* have been done upstream.

Moreover, it's important from a deadlock-avoidance standpoint that these
locks get acquired in the same order as they were acquired during the
initial parse/plan pipeline.  I think it's reasonable to assume they were
acquired in RTE order in that stage, so AcquireExecutorLocks is OK.  But
even if the current executor logic does take them in that order, that's
both non-obvious and horribly fragile, seeing that the responsibility for
this is split across InitPlan, ExecOpenScanRelation, and
ExecLockNonLeafAppendTables.

So I'm thinking that what we really ought to do here is simplify executor
startup to just open all rels with NoLock, and get rid of any supporting
data structures that turn out to have no other use.  (David Rowley's
nearby patch to create a properly hierarchical executor data structure for
partitioning info is likely to tie into this too, by removing some other
vestigial uses of those lists.)

I think that this idea has been discussed in the past, and we felt at
the time that having the executor take its own locks was a good safety
measure, and a relatively cheap one since the lock manager is pretty
good at short-circuiting duplicative lock requests.  But those are
certainly not free.  Moreover, I'm not sure that this is really a
safety measure at all: if the executor were really taking any lock
not already held, it'd be masking a DDL hazard.

To investigate this further, I made the attached not-meant-for-commit
hack to verify whether InitPlan and related executor startup functions
were actually taking any not-previously-held locks.  I could only find
one such case: the parser always opens tables selected FOR UPDATE with
RowShareLock, but if we end up implementing the resulting row mark
with ROW_MARK_COPY, the executor is acquiring just AccessShareLock
(because ExecOpenScanRelation thinks it needs to acquire some lock).
The patch as presented hacks ExecOpenScanRelation to avoid that, and
it passes check-world.

What we'd be better off doing, if we go this route, is to install an
assertion-build-only test that verifies during relation_open(NoLock)
that some kind of lock is already held on the rel.  That would protect
not only the executor, but a boatload of existing places that open
rels with NoLock on the currently-unverified assumption that a lock is
already held.
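
Just to illustrate the shape of that, the check would live near the top of
relation_open(), say (assertion builds only; the inquiry function named here
is a placeholder I'm inventing, not something we have today):

	/*
	 * Hypothetical: complain if a caller passes NoLock without already
	 * holding some lock on the rel.  "LockIsHeldByMeOnRel" stands in for
	 * whatever lock-manager inquiry function we'd actually provide.
	 */
	if (lockmode == NoLock)
		Assert(LockIsHeldByMeOnRel(relationId));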

I'm also rather strongly tempted to add a field to RangeTblEntry
that records the appropriate lock strength to take, so that we don't
have to rely on keeping AcquireExecutorLocks' logic to decide on the
lock type in sync with whatever the parse/plan pipeline does.  (One
could then imagine adding assertions in the executor that this field
shows a lock strength of at least X, in place of actually opening
the rel with X.)
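
Concretely, something like this (the field name is just for illustration):

	typedef struct RangeTblEntry
	{
		NodeTag		type;
		RTEKind		rtekind;

		/* fields valid for RTE_RELATION: */
		Oid			relid;			/* OID of the relation */
		char		relkind;		/* relation kind (see pg_class.relkind) */
		int			rellockmode;	/* NEW: lock strength the parse/plan
									 * pipeline acquired (a LOCKMODE value) */

		/* ... rest unchanged ... */
	} RangeTblEntry;

AcquireExecutorLocks' lock-strength logic would then collapse to

	LockRelationOid(rte->relid, rte->rellockmode);

and the executor-side check could be no more than, say,

	Assert(rte->rellockmode >= RowShareLock);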

BTW, there'd be a lot to be said for having InitPlan just open all
the rels and build an array of Relation pointers that parallels the
RTE list, rather than doing heap_opens in random places elsewhere.
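
Roughly like this, say (sketch only; I'm not wedded to any of these
details, and the names are invented):

	/* in EState: one Relation pointer per RTE, parallel to es_range_table */
	Relation   *es_relations;

	/* in InitPlan, relying on locks already taken upstream: */
	estate->es_relations = (Relation *)
		palloc0(list_length(rangeTable) * sizeof(Relation));
	i = 0;
	foreach(l, rangeTable)
	{
		RangeTblEntry *rte = (RangeTblEntry *) lfirst(l);

		if (rte->rtekind == RTE_RELATION)
			estate->es_relations[i] = heap_open(rte->relid, NoLock);
		i++;
	}

	/* ... and ExecOpenScanRelation et al reduce to an array lookup: */
	rel = estate->es_relations[scanrelid - 1];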

                        regards, tom lane

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 8026fe2..58c62f3 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -817,6 +817,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 	TupleDesc	tupType;
 	ListCell   *l;
 	int			i;
+	bool		gotlock;
 
 	/*
 	 * Do permissions checks
@@ -852,7 +853,9 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 			Relation	resultRelation;
 
 			resultRelationOid = getrelid(resultRelationIndex, rangeTable);
-			resultRelation = heap_open(resultRelationOid, RowExclusiveLock);
+			gotlock = LockRelationOid(resultRelationOid, RowExclusiveLock);
+			Assert(!gotlock || IsParallelWorker());
+			resultRelation = heap_open(resultRelationOid, NoLock);
 
 			InitResultRelInfo(resultRelInfo,
 							  resultRelation,
@@ -892,7 +895,9 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 				Relation	resultRelDesc;
 
 				resultRelOid = getrelid(resultRelIndex, rangeTable);
-				resultRelDesc = heap_open(resultRelOid, RowExclusiveLock);
+				gotlock = LockRelationOid(resultRelOid, RowExclusiveLock);
+				Assert(!gotlock || IsParallelWorker());
+				resultRelDesc = heap_open(resultRelOid, NoLock);
 				InitResultRelInfo(resultRelInfo,
 								  resultRelDesc,
 								  lfirst_int(l),
@@ -912,8 +917,11 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 				/* We locked the roots above. */
 				if (!list_member_int(plannedstmt->rootResultRelations,
 									 resultRelIndex))
-					LockRelationOid(getrelid(resultRelIndex, rangeTable),
-									RowExclusiveLock);
+				{
+					gotlock = LockRelationOid(getrelid(resultRelIndex, rangeTable),
+											  RowExclusiveLock);
+					Assert(!gotlock || IsParallelWorker());
+				}
 			}
 		}
 	}
@@ -963,10 +971,14 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 			case ROW_MARK_NOKEYEXCLUSIVE:
 			case ROW_MARK_SHARE:
 			case ROW_MARK_KEYSHARE:
-				relation = heap_open(relid, RowShareLock);
+				gotlock = LockRelationOid(relid, RowShareLock);
+				Assert(!gotlock || IsParallelWorker());
+				relation = heap_open(relid, NoLock);
 				break;
 			case ROW_MARK_REFERENCE:
-				relation = heap_open(relid, AccessShareLock);
+				gotlock = LockRelationOid(relid, AccessShareLock);
+				Assert(!gotlock || IsParallelWorker());
+				relation = heap_open(relid, NoLock);
 				break;
 			case ROW_MARK_COPY:
 				/* no physical table access is required */
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index b963cae..cf08b50 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -42,6 +42,7 @@
 
 #include "postgres.h"
 
+#include "access/parallel.h"
 #include "access/relscan.h"
 #include "access/transam.h"
 #include "executor/executor.h"
@@ -645,6 +646,7 @@ ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags)
 	Relation	rel;
 	Oid			reloid;
 	LOCKMODE	lockmode;
+	bool		gotlock;
 
 	/*
 	 * Determine the lock type we need.  First, scan to see if target relation
@@ -659,13 +661,19 @@ ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags)
 		/* Keep this check in sync with InitPlan! */
 		ExecRowMark *erm = ExecFindRowMark(estate, scanrelid, true);
 
-		if (erm != NULL && erm->relation != NULL)
+		/* HACK: assume things are OK for ROW_MARK_COPY case */
+		if (erm != NULL)
 			lockmode = NoLock;
 	}
 
 	/* Open the relation and acquire lock as needed */
 	reloid = getrelid(scanrelid, estate->es_range_table);
-	rel = heap_open(reloid, lockmode);
+	if (lockmode != NoLock)
+	{
+		gotlock = LockRelationOid(reloid, lockmode);
+		Assert(!gotlock || IsParallelWorker());
+	}
+	rel = heap_open(reloid, NoLock);
 
 	/*
 	 * Complain if we're attempting a scan of an unscannable relation, except
@@ -874,6 +882,7 @@ ExecLockNonLeafAppendTables(List *partitioned_rels, EState *estate)
 		Index		rti = lfirst_int(lc);
 		bool		is_result_rel = false;
 		Oid			relid = getrelid(rti, estate->es_range_table);
+		bool		gotlock;
 
 		/* If this is a result relation, already locked in InitPlan */
 		foreach(l, stmt->nonleafResultRelations)
@@ -903,9 +912,10 @@ ExecLockNonLeafAppendTables(List *partitioned_rels, EState *estate)
 			}
 
 			if (rc && RowMarkRequiresRowShareLock(rc->markType))
-				LockRelationOid(relid, RowShareLock);
+				gotlock = LockRelationOid(relid, RowShareLock);
 			else
-				LockRelationOid(relid, AccessShareLock);
+				gotlock = LockRelationOid(relid, AccessShareLock);
+			Assert(!gotlock || IsParallelWorker());
 		}
 	}
 }
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index 7b2dcb6..a1ebf64 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -100,8 +100,9 @@ SetLocktagRelationOid(LOCKTAG *tag, Oid relid)
  *
  * Lock a relation given only its OID.  This should generally be used
  * before attempting to open the relation's relcache entry.
+ * Return TRUE if we acquired a new lock, FALSE if already held.
  */
-void
+bool
 LockRelationOid(Oid relid, LOCKMODE lockmode)
 {
 	LOCKTAG		tag;
@@ -122,7 +123,11 @@ LockRelationOid(Oid relid, LOCKMODE lockmode)
 	 * CommandCounterIncrement, not here.)
 	 */
 	if (res != LOCKACQUIRE_ALREADY_HELD)
+	{
 		AcceptInvalidationMessages();
+		return true;
+	}
+	return false;
 }
 
 /*
diff --git a/src/include/storage/lmgr.h b/src/include/storage/lmgr.h
index a217de9..69e6f7f 100644
--- a/src/include/storage/lmgr.h
+++ b/src/include/storage/lmgr.h
@@ -37,7 +37,7 @@ typedef enum XLTW_Oper
 extern void RelationInitLockInfo(Relation relation);
 
 /* Lock a relation */
-extern void LockRelationOid(Oid relid, LOCKMODE lockmode);
+extern bool LockRelationOid(Oid relid, LOCKMODE lockmode);
 extern bool ConditionalLockRelationOid(Oid relid, LOCKMODE lockmode);
 extern void UnlockRelationId(LockRelId *relid, LOCKMODE lockmode);
 extern void UnlockRelationOid(Oid relid, LOCKMODE lockmode);
