subject:"remove spurious CREATE INDEX CONCURRENTLY wait"

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2021-11-11 Thread Andres Freund

On 2021-11-11 09:38:07 -0300, Alvaro Herrera wrote:
> > I just noticed that this commit (dcfff74fb16) didn't revert the change of 
> > lock
> > level in ReplicationSlotRelease(). Was that intentional?
> 
> Hmm, no, that seems to have been a mistake.  I'll restore it.

Thanks

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2021-11-11 Thread Alvaro Herrera

On 2021-Nov-10, Andres Freund wrote:

> > Reverts 27838981be9d (some comments are kept).  Per discussion, it does
> > not seem safe to relax the lock level used for this; in order for it to
> > be safe, there would have to be memory barriers between the point we set
> > the flag and the point we set the trasaction Xid, which perhaps would
> > not be so bad; but there would also have to be barriers at the readers'
> > side, which from a performance perspective might be bad.
> > 
> > Now maybe this analysis is wrong and it *is* safe for some reason, but
> > proof of that is not trivial.
> 
> I just noticed that this commit (dcfff74fb16) didn't revert the change of lock
> level in ReplicationSlotRelease(). Was that intentional?

Hmm, no, that seems to have been a mistake.  I'll restore it.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"El miedo atento y previsor es la madre de la seguridad" (E. Burke)

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2021-11-10 Thread Andres Freund

Hi,

On 2020-11-25 17:03:58 -0300, Alvaro Herrera wrote:
> On 2020-Nov-23, Andres Freund wrote:
> 
> > On 2020-11-23 12:30:05 -0300, Alvaro Herrera wrote:
> 
> > > In other words, my conclusion is that there definitely is a bug here and
> > > I am going to restore the use of exclusive lock for setting the
> > > statusFlags.
> > 
> > Cool.
> 
> Here's a patch.
> 
> Note it also moves the computation of vacuum's Xmin (per
> GetTransactionSnapshot) to *after* the bit has been set in statusFlags.

> From b813c67a4abe2127b8bd13db7e920f958db15d59 Mon Sep 17 00:00:00 2001
> From: Alvaro Herrera 
> Date: Tue, 24 Nov 2020 18:10:42 -0300
> Subject: [PATCH] Restore lock level to update statusFlags
> 
> Reverts 27838981be9d (some comments are kept).  Per discussion, it does
> not seem safe to relax the lock level used for this; in order for it to
> be safe, there would have to be memory barriers between the point we set
> the flag and the point we set the trasaction Xid, which perhaps would
> not be so bad; but there would also have to be barriers at the readers'
> side, which from a performance perspective might be bad.
> 
> Now maybe this analysis is wrong and it *is* safe for some reason, but
> proof of that is not trivial.

I just noticed that this commit (dcfff74fb16) didn't revert the change of lock
level in ReplicationSlotRelease(). Was that intentional?

Greetings,

Andres Freund

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-11-30 Thread Michael Paquier

On Mon, Nov 30, 2020 at 04:15:27PM -0300, Alvaro Herrera wrote:
> In the interest of showing progress, I'm going to mark this CF item as
> committed, and I'll submit the remaining pieces in a new thread.

Thanks for splitting!
--
Michael


signature.asc
Description: PGP signature

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-11-30 Thread Alvaro Herrera

In the interest of showing progress, I'm going to mark this CF item as
committed, and I'll submit the remaining pieces in a new thread.

Thanks!

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-11-27 Thread Alvaro Herrera

Actually, I noticed two things.  The first of them, addressed in this
new version of the patch, is that REINDEX CONCURRENTLY is doing a lot of
repetitive work by reopening each index and table in the build/validate
loops, so that they can report progress.  This is easy to remedy by
adding a couple more members to the new struct (which I also renamed to
ReindexIndexInfo), for tableId and amId.  The code seems a bit simpler
this way.

The other thing is that ReindexRelationConcurrenty seems to always be
called with the relations already locked by RangeVarGetRelidExtended.
So claiming to acquire locks on the relations over and over is
pointless.  (I only noticed this because there was an obvious deadlock
hazard in one of the loops, that locked index before table.)  I think we
should reduce all those to NoLock.  My patch does not do that.
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index ca24620fd0..a52cb16b5e 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -114,6 +114,17 @@ typedef struct ReindexErrorInfo
 	char		relkind;
 } ReindexErrorInfo;
 
+/*
+ * Index to process in ReindexRelationConcurrently
+ */
+typedef struct ReindexIndexInfo
+{
+	Oid			indexId;
+	Oid			tableId;
+	Oid			amId;
+	bool		safe;			/* for set_indexsafe_procflags */
+} ReindexIndexInfo;
+
 /*
  * CheckIndexCompatible
  *		Determine whether an existing index definition is compatible with a
@@ -385,7 +396,7 @@ CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts)
  * lazy VACUUMs, because they won't be fazed by missing index entries
  * either.  (Manual ANALYZEs, however, can't be excluded because they
  * might be within transactions that are going to do arbitrary operations
- * later.)  Processes running CREATE INDEX CONCURRENTLY
+ * later.)  Processes running CREATE INDEX CONCURRENTLY or REINDEX CONCURRENTLY
  * on indexes that are neither expressional nor partial are also safe to
  * ignore, since we know that those processes won't examine any data
  * outside the table they're indexing.
@@ -1564,9 +1575,11 @@ DefineIndex(Oid relationId,
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
-	/* Tell concurrent index builds to ignore us, if index qualifies */
-	if (safe_index)
-		set_indexsafe_procflags();
+	/*
+	 * This transaction doesn't need to set the PROC_IN_SAFE_IC flag, because
+	 * it only takes a snapshot to do some catalog manipulations, after the
+	 * wait is over.
+	 */
 
 	/* We should now definitely not be advertising any xmin. */
 	Assert(MyProc->xmin == InvalidTransactionId);
@@ -3132,10 +3145,16 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		get_rel_name(cellOid;
 	else
 	{
+		ReindexIndexInfo *idx;
+
 		/* Save the list of relation OIDs in private context */
 		oldcontext = MemoryContextSwitchTo(private_context);
 
-		indexIds = lappend_oid(indexIds, cellOid);
+		idx = palloc(sizeof(ReindexIndexInfo));
+		idx->indexId = cellOid;
+		/* other fields set later */
+
+		indexIds = lappend(indexIds, idx);
 
 		MemoryContextSwitchTo(oldcontext);
 	}
@@ -3172,13 +3191,18 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 			get_rel_name(cellOid;
 		else
 		{
+			ReindexIndexInfo *idx;
+
 			/*
 			 * Save the list of relation OIDs in private
 			 * context
 			 */
 			oldcontext = MemoryContextSwitchTo(private_context);
 
-			indexIds = lappend_oid(indexIds, cellOid);
+			idx = palloc(sizeof(ReindexIndexInfo));
+			idx->indexId = cellOid;
+			indexIds = lappend(indexIds, idx);
+			/* other fields set later */
 
 			MemoryContextSwitchTo(oldcontext);
 		}
@@ -3197,6 +3221,7 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 Oid			heapId = IndexGetRelation(relationOid,
 	  (options & REINDEXOPT_MISSING_OK) != 0);
 Relation	heapRelation;
+ReindexIndexInfo *idx;
 
 /* if relation is missing, leave */
 if (!OidIsValid(heapId))
@@ -3247,7 +3272,10 @@ ReindexRelationConcurrently(Oid relationOid, int options)
  * Save the list of relation OIDs in private context.  Note
  * that invalid indexes are allowed here.
  */
-indexIds = lappend_oid(indexIds, relationOid);
+idx = palloc(sizeof(ReindexIndexInfo));
+idx->indexId = relationOid;
+indexIds = lappend(indexIds, idx);
+/* other fields set later */
 
 MemoryContextSwitchTo(oldcontext);
 break;
@@ -3306,31 +3334,40 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 	foreach(lc, indexIds)
 	{
 		char	   *concurrentName;
-		Oid			indexId = lfirst_oid(lc);
+		ReindexIndexInfo *idx = lfirst(lc);
+		ReindexIndexInfo *newidx;
 		Oid			newIndexId;
 		Relation	indexRel;
 		Relation	heapRel;
 		Relation	newIndexRel;
 		LockRelId  *lockrelid;
 
-		indexRel = index_open(indexId, ShareUpdateExclusiveLock);
+		/* XXX pointless lock acquisition */
+		indexRel =

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-11-26 Thread Alvaro Herrera

On 2020-Nov-26, Alvaro Herrera wrote:

> So let's discuss the next step in this series: what to do about REINDEX
> CONCURRENTLY.

> [...]

... as in the attached.

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index ca24620fd0..9c1c0aad3e 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -385,7 +385,7 @@ CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts)
  * lazy VACUUMs, because they won't be fazed by missing index entries
  * either.  (Manual ANALYZEs, however, can't be excluded because they
  * might be within transactions that are going to do arbitrary operations
- * later.)  Processes running CREATE INDEX CONCURRENTLY
+ * later.)  Processes running CREATE INDEX CONCURRENTLY or REINDEX CONCURRENTLY
  * on indexes that are neither expressional nor partial are also safe to
  * ignore, since we know that those processes won't examine any data
  * outside the table they're indexing.
@@ -1564,9 +1564,11 @@ DefineIndex(Oid relationId,
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
-	/* Tell concurrent index builds to ignore us, if index qualifies */
-	if (safe_index)
-		set_indexsafe_procflags();
+	/*
+	 * This transaction doesn't need to set the PROC_IN_SAFE_IC flag, because
+	 * it only takes a snapshot to do some catalog manipulations, after the
+	 * wait is over.
+	 */
 
 	/* We should now definitely not be advertising any xmin. */
 	Assert(MyProc->xmin == InvalidTransactionId);
@@ -3020,6 +3022,13 @@ ReindexMultipleInternal(List *relids, int options)
  * concerns, so there's no need for the more complicated concurrent build,
  * anyway, and a non-concurrent reindex is more efficient.
  */
+
+typedef struct reindex_idx
+{
+	Oid		indexId;
+	bool	safe;
+} reindex_idx;
+
 static bool
 ReindexRelationConcurrently(Oid relationOid, int options)
 {
@@ -3132,10 +3141,15 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		get_rel_name(cellOid;
 	else
 	{
+		reindex_idx	   *idx;
+
 		/* Save the list of relation OIDs in private context */
 		oldcontext = MemoryContextSwitchTo(private_context);
 
-		indexIds = lappend_oid(indexIds, cellOid);
+		idx = palloc(sizeof(reindex_idx));
+		idx->indexId = cellOid;
+
+		indexIds = lappend(indexIds, idx);
 
 		MemoryContextSwitchTo(oldcontext);
 	}
@@ -3172,13 +3186,17 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 			get_rel_name(cellOid;
 		else
 		{
+			reindex_idx	   *idx;
+
 			/*
 			 * Save the list of relation OIDs in private
 			 * context
 			 */
 			oldcontext = MemoryContextSwitchTo(private_context);
 
-			indexIds = lappend_oid(indexIds, cellOid);
+			idx = palloc(sizeof(reindex_idx));
+			idx->indexId = cellOid;
+			indexIds = lappend(indexIds, idx);
 
 			MemoryContextSwitchTo(oldcontext);
 		}
@@ -3197,6 +3215,7 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 Oid			heapId = IndexGetRelation(relationOid,
 	  (options & REINDEXOPT_MISSING_OK) != 0);
 Relation	heapRelation;
+reindex_idx *idx;
 
 /* if relation is missing, leave */
 if (!OidIsValid(heapId))
@@ -3247,7 +3266,9 @@ ReindexRelationConcurrently(Oid relationOid, int options)
  * Save the list of relation OIDs in private context.  Note
  * that invalid indexes are allowed here.
  */
-indexIds = lappend_oid(indexIds, relationOid);
+idx = palloc(sizeof(reindex_idx));
+idx->indexId = relationOid;
+indexIds = lappend(indexIds, idx);
 
 MemoryContextSwitchTo(oldcontext);
 break;
@@ -3306,8 +3327,10 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 	foreach(lc, indexIds)
 	{
 		char	   *concurrentName;
-		Oid			indexId = lfirst_oid(lc);
+		reindex_idx *index = lfirst(lc);
+		Oid			indexId = index->indexId;
 		Oid			newIndexId;
+		reindex_idx *newidx;
 		Relation	indexRel;
 		Relation	heapRel;
 		Relation	newIndexRel;
@@ -3347,12 +3370,20 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		 */
 		newIndexRel = index_open(newIndexId, ShareUpdateExclusiveLock);
 
+		/* determine safety of this index for set_indexsafe_procflags */
+		index->safe = (newIndexRel->rd_indexprs == NIL &&
+	   newIndexRel->rd_indpred == NIL);
+
 		/*
 		 * Save the list of OIDs and locks in private context
 		 */
 		oldcontext = MemoryContextSwitchTo(private_context);
 
-		newIndexIds = lappend_oid(newIndexIds, newIndexId);
+		newidx = palloc(sizeof(reindex_idx));
+		newidx->indexId = newIndexId;
+		newidx->safe = index->safe;
+
+		newIndexIds = lappend(newIndexIds, newidx);
 
 		/*
 		 * Save lockrelid to protect each relation from drop then close
@@ -3416,6 +3447,11 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/*
+	 * Because we don't take a snapshot in this transaction, there's no
+	 * need to set the

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-11-26 Thread Alvaro Herrera

So let's discuss the next step in this series: what to do about REINDEX
CONCURRENTLY.

I started with Dmitry's patch (an updated version of which I already
posted in [1]).  However, Dmitry missed (and I hadn't noticed) that some
of the per-index loops are starting transactions also, and that we need
to set the flag in those.  And what's more, in a couple of the
function-scope transactions we do set the flag pointlessly: the
transactions there do not acquire a snapshot, so there's no reason to
set the flag at all, because WaitForOlderSnapshots ignores sessions
whose Xmin is 0.

There are also transactions that wait first, without setting a snapshot,
and then do some catalog manipulations.  I think it's prett much useless
to set the flag for those, because they're going to be very short
anyway.  (There's also one case of this in CREATE INDEX CONCURRENTLY.)

But there's a more interesting point also.  In Dmitry's patch, we
determine safety for *all* indexes being processed as a set, and then
apply the flag only if they're all deemed safe.  But we can optimize
this, and set the flag for each index' transaction individually, and
only skip it for those specific indexes that are unsafe.  So I propose
to change the data structure used in ReindexRelationConcurrently from
the current list of OIDs to a list of (oid,boolean) pairs, to be able to
track setting the flag individually.

There's one more useful observation: in the function-scope transactions
(outside the per-index loops), we don't touch the contents of any
indexes; we just wait or do some catalog manipulation.  So we can set
the flag *regardless of the safety of any indexes*.  We only need to
care about the safety of the indexes in the phases where we build the
indexes and when we validate them.


[1] https://postgr.es/m/20201118175804.GA23027@alvherre.pgsql

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-11-26 Thread Alvaro Herrera

On 2020-Nov-25, Alvaro Herrera wrote:

> On 2020-Nov-23, Andres Freund wrote:
> 
> > On 2020-11-23 12:30:05 -0300, Alvaro Herrera wrote:
> 
> > > In other words, my conclusion is that there definitely is a bug here and
> > > I am going to restore the use of exclusive lock for setting the
> > > statusFlags.
> > 
> > Cool.
> 
> Here's a patch.

Pushed, thanks.

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-11-25 Thread Alvaro Herrera

On 2020-Nov-23, Andres Freund wrote:

> On 2020-11-23 12:30:05 -0300, Alvaro Herrera wrote:

> > In other words, my conclusion is that there definitely is a bug here and
> > I am going to restore the use of exclusive lock for setting the
> > statusFlags.
> 
> Cool.

Here's a patch.

Note it also moves the computation of vacuum's Xmin (per
GetTransactionSnapshot) to *after* the bit has been set in statusFlags.
>From b813c67a4abe2127b8bd13db7e920f958db15d59 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 24 Nov 2020 18:10:42 -0300
Subject: [PATCH] Restore lock level to update statusFlags

Reverts 27838981be9d (some comments are kept).  Per discussion, it does
not seem safe to relax the lock level used for this; in order for it to
be safe, there would have to be memory barriers between the point we set
the flag and the point we set the trasaction Xid, which perhaps would
not be so bad; but there would also have to be barriers at the readers'
side, which from a performance perspective might be bad.

Now maybe this analysis is wrong and it *is* safe for some reason, but
proof of that is not trivial.

Discussion: https://postgr.es/m/20201118190928.vnztes7c2sldu...@alap3.anarazel.de
---
 src/backend/commands/vacuum.c | 20 +++-
 src/backend/replication/logical/logical.c |  2 +-
 src/backend/storage/ipc/procarray.c   |  4 +---
 src/include/storage/proc.h|  6 +++---
 4 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 395e75f768..f1112111de 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1712,13 +1712,6 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params)
 	/* Begin a transaction for vacuuming this relation */
 	StartTransactionCommand();
 
-	/*
-	 * Need to acquire a snapshot to prevent pg_subtrans from being truncated,
-	 * cutoff xids in local memory wrapping around, and to have updated xmin
-	 * horizons.
-	 */
-	PushActiveSnapshot(GetTransactionSnapshot());
-
 	if (!(params->options & VACOPT_FULL))
 	{
 		/*
@@ -1739,9 +1732,11 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params)
 		 * Note: these flags remain set until CommitTransaction or
 		 * AbortTransaction.  We don't want to clear them until we reset
 		 * MyProc->xid/xmin, otherwise GetOldestNonRemovableTransactionId()
-		 * might appear to go backwards, which is probably Not Good.
+		 * might appear to go backwards, which is probably Not Good.  (We also
+		 * set PROC_IN_VACUUM *before* taking our own snapshot, so that our
+		 * xmin doesn't become visible ahead of setting the flag.)
 		 */
-		LWLockAcquire(ProcArrayLock, LW_SHARED);
+		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
 		MyProc->statusFlags |= PROC_IN_VACUUM;
 		if (params->is_wraparound)
 			MyProc->statusFlags |= PROC_VACUUM_FOR_WRAPAROUND;
@@ -1749,6 +1744,13 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params)
 		LWLockRelease(ProcArrayLock);
 	}
 
+	/*
+	 * Need to acquire a snapshot to prevent pg_subtrans from being truncated,
+	 * cutoff xids in local memory wrapping around, and to have updated xmin
+	 * horizons.
+	 */
+	PushActiveSnapshot(GetTransactionSnapshot());
+
 	/*
 	 * Check for user-requested abort.  Note we want this to be inside a
 	 * transaction, so xact.c doesn't issue useless WARNING.
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 4324e32656..f1f4df7d70 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -181,7 +181,7 @@ StartupDecodingContext(List *output_plugin_options,
 	 */
 	if (!IsTransactionOrTransactionBlock())
 	{
-		LWLockAcquire(ProcArrayLock, LW_SHARED);
+		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
 		MyProc->statusFlags |= PROC_IN_LOGICAL_DECODING;
 		ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
 		LWLockRelease(ProcArrayLock);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 94edb24b22..c7848c0b69 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -662,10 +662,8 @@ ProcArrayEndTransaction(PGPROC *proc, TransactionId latestXid)
 		/* avoid unnecessarily dirtying shared cachelines */
 		if (proc->statusFlags & PROC_VACUUM_STATE_MASK)
 		{
-			/* Only safe to change my own flags with just share lock */
-			Assert(proc == MyProc);
 			Assert(!LWLockHeldByMe(ProcArrayLock));
-			LWLockAcquire(ProcArrayLock, LW_SHARED);
+			LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
 			Assert(proc->statusFlags == ProcGlobal->statusFlags[proc->pgxactoff]);
 			proc->statusFlags &= ~PROC_VACUUM_STATE_MASK;
 			ProcGlobal->statusFlags[proc->pgxactoff] = proc->statusFlags;
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 00bb244340..22046e4e36 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-11-24 Thread Alvaro Herrera

On 2020-Nov-23, Tom Lane wrote:

> Alvaro Herrera  writes:

> > GetCurrentVirtualXIDs, ComputeXidHorizons, GetSnapshotData:
> 
> > In these cases, what we need is that the code computes some xmin (or
> > equivalent computation) based on a set of transactions that exclude
> > those marked with the flags.  The behavior we want is that if some
> > transaction is marked as vacuum, we ignore the Xid/Xmin *if there is
> > one*.  In other words, if there's no Xid/Xmin, then the flag is not
> > important.  So if we can ensure that the flag is set first, and the
> > Xid/xmin is installed later, that's sufficient correctness and we don't
> > need to hold exclusive lock.  But if we can't ensure that, then we must
> > use exclusive lock, because otherwise we risk another process seeing our
> > Xid first and not our flag, which would be bad.
> 
> I don't buy this either.  You get the same result if someone looks just
> before you take the ProcArrayLock to set the flag.  So if there's a
> problem, it's inherent in the way that the flags are defined or used;
> the strength of lock used in this stanza won't affect it.

The problem is that the writes could be reordered in a way that makes
the Xid appear set to an onlooker before PROC_IN_VACUUM appears set.
Vacuum always sets the bit first, and *then* the xid.  If the reader
always reads it like that then it's not a problem.  But in order to
guarantee that, we would have to have a read barrier for each pass
through the loop.

With the LW_EXCLUSIVE lock, we block the readers so that the bit is
known set by the time they examine it.  As I understand, getting the
lock is itself a barrier, so there's no danger that we'll set the bit
and they won't see it.

... at least, that how I *imagine* the argument to be.  In practice,
vacuum_rel() calls GetSnapshotData() before installing the
PROC_IN_VACUUM bit, and therefore there *is* a risk that reader 1 will
get MyProc->xmin included in their snapshot (because bit not yet set),
and reader 2 won't.  If my understanding is correct, then we should move
the PushActiveSnapshot(GetTransactionSnapshot()) call to after we have
the PROC_IN_VACUUM bit set.

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-11-24 Thread Alvaro Herrera

On 2020-Nov-23, Andres Freund wrote:

> On 2020-11-23 12:30:05 -0300, Alvaro Herrera wrote:

> > PROC_IN_LOGICAL_DECODING:
> > Oddly enough, I think the reset of PROC_IN_LOGICAL_DECODING in
> > ReplicationSlotRelease might be the most problematic one of the lot.
> > That's because a proc's xmin that had been ignored all along by
> > ComputeXidHorizons, will now be included in the computation.  Adding
> > asserts that proc->xmin and proc->xid are InvalidXid by the time we
> > reset the flag, I got hits in pg_basebackup, test_decoding and
> > subscription tests.  I think it's OK for ComputeXidHorizons (since it
> > just means that a vacuum that reads a later will remove less rows.)  But
> > in GetSnapshotData it is just not correct to have the Xmin go backwards.
> 
> I don't think there's a problem. PROC_IN_LOGICAL_DECODING can only be
> set when outside a transaction block, i.e. walsender. In which case
> there shouldn't be an xid/xmin, I think? Or did you gate your assert on
> PROC_IN_LOGICAL_DECODING being set?

Ah, you're right about this one --  I missed the significance of setting
the flag only "when outside of a transaction block" at the time we call
StartupDecodingContext.

Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

2020-11-23 Thread Michael Paquier

On Mon, Nov 23, 2020 at 06:13:17PM -0500, Tom Lane wrote:
> Alvaro Herrera  writes:
>> Please feel free to go ahead, including the change to ProcSleep.
> 
> Will do.

Thank you both for 450c823 and 789b938.
--
Michael


signature.asc
Description: PGP signature

Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

2020-11-23 Thread Alvaro Herrera

On 2020-Nov-23, Tom Lane wrote:

> I never cared that much for "is_log_level_output" either.  Thinking
> about renaming it to "should_output_to_log()", and then the new function
> would be "should_output_to_client()".

Ah, that sounds nicely symmetric and grammatical.

Thanks!

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-11-23 Thread Andres Freund

Hi,

On 2020-11-23 12:30:05 -0300, Alvaro Herrera wrote:
> ProcSleep:  no bug here.
> The flags are checked to see if we should kill() the process (used when
> autovac blocks some other process).  The lock here appears to be
> inconsequential, since we release it before we do kill(); so strictly
> speaking, there is still a race condition where the autovac process
> could reset the flag after we read it and before we get a chance to
> signal it.  The lock level autovac uses to set the flag is not relevant
> either.

Yea. Even before the recent changes building the lock message under the
lock didn't make sense. Now that the flags are mirrored in pgproc, we
probably should just make this use READ_ONCE(autovac->statusFlags),
without any other use of ProcArrayLock.  As you say the race condition
is between the flag test, the kill, and the signal being processed,
which wasn't previously protected either.

> PROC_IN_LOGICAL_DECODING:
> Oddly enough, I think the reset of PROC_IN_LOGICAL_DECODING in
> ReplicationSlotRelease might be the most problematic one of the lot.
> That's because a proc's xmin that had been ignored all along by
> ComputeXidHorizons, will now be included in the computation.  Adding
> asserts that proc->xmin and proc->xid are InvalidXid by the time we
> reset the flag, I got hits in pg_basebackup, test_decoding and
> subscription tests.  I think it's OK for ComputeXidHorizons (since it
> just means that a vacuum that reads a later will remove less rows.)  But
> in GetSnapshotData it is just not correct to have the Xmin go backwards.

I don't think there's a problem. PROC_IN_LOGICAL_DECODING can only be
set when outside a transaction block, i.e. walsender. In which case
there shouldn't be an xid/xmin, I think? Or did you gate your assert on
PROC_IN_LOGICAL_DECODING being set?

> GetCurrentVirtualXIDs, ComputeXidHorizons, GetSnapshotData:
> 
> In these cases, what we need is that the code computes some xmin (or
> equivalent computation) based on a set of transactions that exclude
> those marked with the flags.  The behavior we want is that if some
> transaction is marked as vacuum, we ignore the Xid/Xmin *if there is
> one*. In other words, if there's no Xid/Xmin, then the flag is not
> important.  So if we can ensure that the flag is set first, and the
> Xid/xmin is installed later, that's sufficient correctness and we don't
> need to hold exclusive lock.

That'd require at least memory barriers in GetSnapshotData()'s loop,
which I'd really like to avoid. Otherwise the order in which memory gets
written in one process doesn't guarantee the order of visibility in
another process...

> In other words, my conclusion is that there definitely is a bug here and
> I am going to restore the use of exclusive lock for setting the
> statusFlags.

Cool.

> GetSnapshotData has an additional difficulty -- we do the
> UINT32_ACCESS_ONCE(ProcGlobal->xid[i]) read *before* testing the bit. 
> So it's not valid to set the bit without locking out GSD, regardless of
> any barriers we had; if we want this to be safe, we'd have to change
> this so that the flag is read first, and we read the xid only
> afterwards, with a read barrier.

Which we don't want, because that'd mean slowing down the *extremely*
common case of the majority of backends neither having an xid assigned
nor doing logical decoding / vacuum. We'd be reading two cachelines
instead of one.

Greetings,

Andres Freund

Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

2020-11-23 Thread Tom Lane

Alvaro Herrera  writes:
> On 2020-Nov-23, Tom Lane wrote:
>> I'm not too fussed about whether we invent is_log_level_output_client,
>> although that name doesn't seem well-chosen compared to
>> is_log_level_output.

> Just replacing "log" for "client" in that seemed strictly worse, and I
> didn't (don't) have any other ideas.

I never cared that much for "is_log_level_output" either.  Thinking
about renaming it to "should_output_to_log()", and then the new function
would be "should_output_to_client()".

>> Shall I press forward with this, or do you want to?

> Please feel free to go ahead, including the change to ProcSleep.

Will do.

regards, tom lane

Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

2020-11-23 Thread Alvaro Herrera

On 2020-Nov-23, Tom Lane wrote:

> Ah, I see I didn't cover the case in ProcSleep that you were originally on
> about ... I'd just looked for existing references to log_min_messages
> and client_min_messages.

Yeah, it seemed bad form to add that when you had just argued against it
:-)

> I think it's important to have the explicit check for elevel >= ERROR.
> I'm not too fussed about whether we invent is_log_level_output_client,
> although that name doesn't seem well-chosen compared to
> is_log_level_output.

Just replacing "log" for "client" in that seemed strictly worse, and I
didn't (don't) have any other ideas.

> Shall I press forward with this, or do you want to?

Please feel free to go ahead, including the change to ProcSleep.

Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

2020-11-23 Thread Tom Lane

Alvaro Herrera  writes:
> Your version has the advantage that errstart() doesn't get a new
> function call.  I'm +1 for going with that ... we could avoid the
> duplicate code with some additional contortions but this changes so
> rarely that it's probably not worth the trouble.

I was considering adding that factorization, but marking the function
inline to avoid adding overhead.  Most of elog.c predates our use of
inline, so it wasn't considered when this code was written.

regards, tom lane

Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

2020-11-23 Thread Tom Lane

Alvaro Herrera  writes:
> On 2020-Nov-23, Tom Lane wrote:
>> Here's a draft patch.

> Here's another of my own.  Outside of elog.c it seems identical.

Ah, I see I didn't cover the case in ProcSleep that you were originally on
about ... I'd just looked for existing references to log_min_messages
and client_min_messages.

I think it's important to have the explicit check for elevel >= ERROR.
I'm not too fussed about whether we invent is_log_level_output_client,
although that name doesn't seem well-chosen compared to
is_log_level_output.

Shall I press forward with this, or do you want to?

regards, tom lane

Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

2020-11-23 Thread Alvaro Herrera

On 2020-Nov-23, Alvaro Herrera wrote:

> On 2020-Nov-23, Tom Lane wrote:
> 
> > Here's a draft patch.
> 
> Here's another of my own.  Outside of elog.c it seems identical.

Your version has the advantage that errstart() doesn't get a new
function call.  I'm +1 for going with that ... we could avoid the
duplicate code with some additional contortions but this changes so
rarely that it's probably not worth the trouble.

Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

2020-11-23 Thread Alvaro Herrera

On 2020-Nov-23, Tom Lane wrote:

> Here's a draft patch.

Here's another of my own.  Outside of elog.c it seems identical.

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 03c553e7ea..a4ab9090f9 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -5344,7 +5344,7 @@ static void
 ShowTransactionState(const char *str)
 {
 	/* skip work if message will definitely not be printed */
-	if (log_min_messages <= DEBUG5 || client_min_messages <= DEBUG5)
+	if (message_level_is_interesting(DEBUG5))
 		ShowTransactionStateRec(str, CurrentTransactionState);
 }
 
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 7e915bcadf..32a3099c1f 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -105,7 +105,7 @@ log_invalid_page(RelFileNode node, ForkNumber forkno, BlockNumber blkno,
 	 * tracing of the cause (note the elog context mechanism will tell us
 	 * something about the XLOG record that generated the reference).
 	 */
-	if (log_min_messages <= DEBUG1 || client_min_messages <= DEBUG1)
+	if (message_level_is_interesting(DEBUG1))
 		report_invalid_page(DEBUG1, node, forkno, blkno, present);
 
 	if (invalid_page_tab == NULL)
@@ -159,7 +159,7 @@ forget_invalid_pages(RelFileNode node, ForkNumber forkno, BlockNumber minblkno)
 			hentry->key.forkno == forkno &&
 			hentry->key.blkno >= minblkno)
 		{
-			if (log_min_messages <= DEBUG2 || client_min_messages <= DEBUG2)
+			if (message_level_is_interesting(DEBUG2))
 			{
 char	   *path = relpathperm(hentry->key.node, forkno);
 
@@ -192,7 +192,7 @@ forget_invalid_pages_db(Oid dbid)
 	{
 		if (hentry->key.node.dbNode == dbid)
 		{
-			if (log_min_messages <= DEBUG2 || client_min_messages <= DEBUG2)
+			if (message_level_is_interesting(DEBUG2))
 			{
 char	   *path = relpathperm(hentry->key.node, hentry->key.forkno);
 
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index b0d037600e..0ad647ed9c 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -1146,15 +1146,8 @@ reportDependentObjects(const ObjectAddresses *targetObjects,
 	 * If no error is to be thrown, and the msglevel is too low to be shown to
 	 * either client or server log, there's no need to do any of the rest of
 	 * the work.
-	 *
-	 * Note: this code doesn't know all there is to be known about elog
-	 * levels, but it works for NOTICE and DEBUG2, which are the only values
-	 * msglevel can currently have.  We also assume we are running in a normal
-	 * operating environment.
 	 */
-	if (behavior == DROP_CASCADE &&
-		msglevel < client_min_messages &&
-		(msglevel < log_min_messages || log_min_messages == LOG))
+	if (behavior == DROP_CASCADE && !message_level_is_interesting(msglevel))
 		return;
 
 	/*
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index babee386c4..87c3ea450e 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1215,7 +1215,7 @@ ProcessWalSndrMessage(XLogRecPtr walEnd, TimestampTz sendTime)
 	walrcv->lastMsgReceiptTime = lastMsgReceiptTime;
 	SpinLockRelease(>mutex);
 
-	if (log_min_messages <= DEBUG2)
+	if (message_level_is_interesting(DEBUG2))
 	{
 		char	   *sendtime;
 		char	   *receipttime;
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 5d1b1a16be..2eb19ad293 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1900,7 +1900,7 @@ ProcessStandbyReplyMessage(void)
 	replyTime = pq_getmsgint64(_message);
 	replyRequested = pq_getmsgbyte(_message);
 
-	if (log_min_messages <= DEBUG2)
+	if (message_level_is_interesting(DEBUG2))
 	{
 		char	   *replyTimeStr;
 
@@ -2082,7 +2082,7 @@ ProcessStandbyHSFeedbackMessage(void)
 	feedbackCatalogXmin = pq_getmsgint(_message, 4);
 	feedbackCatalogEpoch = pq_getmsgint(_message, 4);
 
-	if (log_min_messages <= DEBUG2)
+	if (message_level_is_interesting(DEBUG2))
 	{
 		char	   *replyTimeStr;
 
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 40eac704dc..eb2309ed76 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1341,22 +1341,25 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 StringInfoData logbuf;	/* errdetail for server log */
 
 /* report the case, if configured to do so */
-initStringInfo();
-initStringInfo();
-DescribeLockTag(, _copy);
-appendStringInfo(,
- _("Process %d waits for %s on %s."),
- MyProcPid,
- GetLockmodeName(lockmethod_copy, lockmode),
- locktagbuf.data);
+if (message_level_is_interesting(DEBUG1))
+{
+	initStringInfo();
+	initStringInfo();
+	DescribeLockTag(, _copy);
+	appendStringInfo(,
+	 _("Process %d waits for %s on %s."),
+	 MyProcPid,
+

Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

2020-11-23 Thread Tom Lane

Here's a draft patch.

regards, tom lane

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 03c553e7ea..9cd0b7c11b 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -5344,7 +5344,7 @@ static void
 ShowTransactionState(const char *str)
 {
 	/* skip work if message will definitely not be printed */
-	if (log_min_messages <= DEBUG5 || client_min_messages <= DEBUG5)
+	if (message_level_is_interesting(DEBUG5))
 		ShowTransactionStateRec(str, CurrentTransactionState);
 }
 
@@ -5371,7 +5371,6 @@ ShowTransactionStateRec(const char *str, TransactionState s)
 	if (s->parent)
 		ShowTransactionStateRec(str, s->parent);
 
-	/* use ereport to suppress computation if msg will not be printed */
 	ereport(DEBUG5,
 			(errmsg_internal("%s(%d) name: %s; blockState: %s; state: %s, xid/subid/cid: %u/%u/%u%s%s",
 			 str, s->nestingLevel,
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 7e915bcadf..32a3099c1f 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -105,7 +105,7 @@ log_invalid_page(RelFileNode node, ForkNumber forkno, BlockNumber blkno,
 	 * tracing of the cause (note the elog context mechanism will tell us
 	 * something about the XLOG record that generated the reference).
 	 */
-	if (log_min_messages <= DEBUG1 || client_min_messages <= DEBUG1)
+	if (message_level_is_interesting(DEBUG1))
 		report_invalid_page(DEBUG1, node, forkno, blkno, present);
 
 	if (invalid_page_tab == NULL)
@@ -159,7 +159,7 @@ forget_invalid_pages(RelFileNode node, ForkNumber forkno, BlockNumber minblkno)
 			hentry->key.forkno == forkno &&
 			hentry->key.blkno >= minblkno)
 		{
-			if (log_min_messages <= DEBUG2 || client_min_messages <= DEBUG2)
+			if (message_level_is_interesting(DEBUG2))
 			{
 char	   *path = relpathperm(hentry->key.node, forkno);
 
@@ -192,7 +192,7 @@ forget_invalid_pages_db(Oid dbid)
 	{
 		if (hentry->key.node.dbNode == dbid)
 		{
-			if (log_min_messages <= DEBUG2 || client_min_messages <= DEBUG2)
+			if (message_level_is_interesting(DEBUG2))
 			{
 char	   *path = relpathperm(hentry->key.node, hentry->key.forkno);
 
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index b0d037600e..245c2f4fc8 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -1146,15 +1146,9 @@ reportDependentObjects(const ObjectAddresses *targetObjects,
 	 * If no error is to be thrown, and the msglevel is too low to be shown to
 	 * either client or server log, there's no need to do any of the rest of
 	 * the work.
-	 *
-	 * Note: this code doesn't know all there is to be known about elog
-	 * levels, but it works for NOTICE and DEBUG2, which are the only values
-	 * msglevel can currently have.  We also assume we are running in a normal
-	 * operating environment.
 	 */
 	if (behavior == DROP_CASCADE &&
-		msglevel < client_min_messages &&
-		(msglevel < log_min_messages || log_min_messages == LOG))
+		!message_level_is_interesting(msglevel))
 		return;
 
 	/*
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index babee386c4..87c3ea450e 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1215,7 +1215,7 @@ ProcessWalSndrMessage(XLogRecPtr walEnd, TimestampTz sendTime)
 	walrcv->lastMsgReceiptTime = lastMsgReceiptTime;
 	SpinLockRelease(>mutex);
 
-	if (log_min_messages <= DEBUG2)
+	if (message_level_is_interesting(DEBUG2))
 	{
 		char	   *sendtime;
 		char	   *receipttime;
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 5d1b1a16be..2eb19ad293 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1900,7 +1900,7 @@ ProcessStandbyReplyMessage(void)
 	replyTime = pq_getmsgint64(_message);
 	replyRequested = pq_getmsgbyte(_message);
 
-	if (log_min_messages <= DEBUG2)
+	if (message_level_is_interesting(DEBUG2))
 	{
 		char	   *replyTimeStr;
 
@@ -2082,7 +2082,7 @@ ProcessStandbyHSFeedbackMessage(void)
 	feedbackCatalogXmin = pq_getmsgint(_message, 4);
 	feedbackCatalogEpoch = pq_getmsgint(_message, 4);
 
-	if (log_min_messages <= DEBUG2)
+	if (message_level_is_interesting(DEBUG2))
 	{
 		char	   *replyTimeStr;
 
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 1ba47c194b..04c2245338 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -600,6 +600,42 @@ errfinish(const char *filename, int lineno, const char *funcname)
 	CHECK_FOR_INTERRUPTS();
 }
 
+/*
+ * message_level_is_interesting --- would ereport/elog do anything?
+ *
+ * Returns true if ereport/elog with this elevel will not be a no-op.
+ * This is useful to short-circuit any expensive preparatory work that
+ * might be needed for a logging message.  There is no point in
+ *

Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

2020-11-23 Thread Alvaro Herrera

On 2020-Nov-23, Tom Lane wrote:

> Alvaro Herrera  writes:
> > Well, we already do this in a number of places.  But I can get behind
> > this:
> 
> >> Maybe it'd be a good idea to have elog.c expose a new function
> >> along the lines of "bool message_level_is_interesting(int elevel)"
> >> to support this and similar future optimizations in a less fragile way.
> 
> I'll see about a patch for that.

Looking at that now ...

Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

2020-11-23 Thread Tom Lane

Alvaro Herrera  writes:
> Well, we already do this in a number of places.  But I can get behind
> this:

>> Maybe it'd be a good idea to have elog.c expose a new function
>> along the lines of "bool message_level_is_interesting(int elevel)"
>> to support this and similar future optimizations in a less fragile way.

I'll see about a patch for that.

regards, tom lane

Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

2020-11-23 Thread Alvaro Herrera

On 2020-Nov-23, Tom Lane wrote:

> Alvaro Herrera  writes:
> > On 2020-Nov-19, Michael Paquier wrote:
> >> By the way, it strikes me that you could just do nothing as long as
> >> (log_min_messages > DEBUG1), so you could encapsulate most of the
> >> logic that plays with the lock tag using that.
> 
> > Good idea, done.
> 
> I'm less sure that that's a good idea.  It embeds knowledge here that
> should not exist outside elog.c; moreover, I'm not entirely sure that
> it's even correct, given the nonlinear ranking of log_min_messages.

Well, we already do this in a number of places.  But I can get behind
this:

> Maybe it'd be a good idea to have elog.c expose a new function
> along the lines of "bool message_level_is_interesting(int elevel)"
> to support this and similar future optimizations in a less fragile way.

Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

2020-11-23 Thread Tom Lane

Alvaro Herrera  writes:
> On 2020-Nov-19, Michael Paquier wrote:
>> By the way, it strikes me that you could just do nothing as long as
>> (log_min_messages > DEBUG1), so you could encapsulate most of the
>> logic that plays with the lock tag using that.

> Good idea, done.

I'm less sure that that's a good idea.  It embeds knowledge here that
should not exist outside elog.c; moreover, I'm not entirely sure that
it's even correct, given the nonlinear ranking of log_min_messages.

Maybe it'd be a good idea to have elog.c expose a new function
along the lines of "bool message_level_is_interesting(int elevel)"
to support this and similar future optimizations in a less fragile way.

regards, tom lane

Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

2020-11-23 Thread Alvaro Herrera

On 2020-Nov-19, Michael Paquier wrote:

> On Thu, Nov 19, 2020 at 12:13:44PM +0900, Michael Paquier wrote:
> > That still looks useful for debugging, so DEBUG1 sounds fine to me.
> 
> By the way, it strikes me that you could just do nothing as long as
> (log_min_messages > DEBUG1), so you could encapsulate most of the
> logic that plays with the lock tag using that.

Good idea, done.

I also noticed that if we're going to accept a race (which BTW already
exists) we may as well simplify the code about it.

I think the attached is the final form of this.
>From cab95d8813cd79adb7b2a1b5501714c2dd87bec2 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 17 Nov 2020 21:09:22 -0300
Subject: [PATCH] Don't hold ProcArrayLock longer than needed in rare cases

While cancelling an autovacuum worker, we hold ProcArrayLock while
formatting a debugging log string.  We can make this shorter by saving
the data we need to produce the message and doing the formatting outside
the locked region.

This isn't terribly critical, as it only occurs pretty rarely: when a
backend runs deadlock detection and it happens to be blocked by a
autovacuum running autovacuum.  Still, there's no need to cause a hiccup
in ProcArrayLock processing, which can be very high-traffic in some
cases.

While at it, rework code so that we only print the string when it is
really going to be used, as suggested by Michael Paquier.

Discussion: https://postgr.es/m/20201118214127.GA3179@alvherre.pgsql
Reviewed-by: Michael Paquier 
---
 src/backend/storage/lmgr/proc.c | 63 -
 1 file changed, 38 insertions(+), 25 deletions(-)

diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index d1738c65f5..dbfdd7b3c7 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1311,40 +1311,58 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 		{
 			PGPROC	   *autovac = GetBlockingAutoVacuumPgproc();
 			uint8		statusFlags;
+			uint8		lockmethod_copy;
+			LOCKTAG		locktag_copy;
 
-			LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+			/*
+			 * Grab info we need, then release lock immediately.  Note this
+			 * coding means that there is a tiny chance that the process
+			 * terminates its current transaction and starts a different one
+			 * before we have a change to send the signal; the worst possible
+			 * consequence is that a for-wraparound vacuum is cancelled.  But
+			 * that could happen in any case unless we were to do kill() with
+			 * the lock held, which is much more undesirable.
+			 */
+			LWLockAcquire(ProcArrayLock, LW_SHARED);
+			statusFlags = ProcGlobal->statusFlags[autovac->pgxactoff];
+			lockmethod_copy = lock->tag.locktag_lockmethodid;
+			locktag_copy = lock->tag;
+			LWLockRelease(ProcArrayLock);
 
 			/*
 			 * Only do it if the worker is not working to protect against Xid
 			 * wraparound.
 			 */
-			statusFlags = ProcGlobal->statusFlags[autovac->pgxactoff];
 			if ((statusFlags & PROC_IS_AUTOVACUUM) &&
 !(statusFlags & PROC_VACUUM_FOR_WRAPAROUND))
 			{
 int			pid = autovac->pid;
-StringInfoData locktagbuf;
-StringInfoData logbuf;	/* errdetail for server log */
 
-initStringInfo();
-initStringInfo();
-DescribeLockTag(, >tag);
-appendStringInfo(,
- _("Process %d waits for %s on %s."),
- MyProcPid,
- GetLockmodeName(lock->tag.locktag_lockmethodid,
- lockmode),
- locktagbuf.data);
+/* report the case, if configured to do so */
+if (log_min_messages <= DEBUG1 || client_min_messages <= DEBUG1)
+{
+	StringInfoData locktagbuf;
+	StringInfoData logbuf;	/* errdetail for server log */
 
-/* release lock as quickly as possible */
-LWLockRelease(ProcArrayLock);
+	initStringInfo();
+	initStringInfo();
+	DescribeLockTag(, _copy);
+	appendStringInfo(,
+	 _("Process %d waits for %s on %s."),
+	 MyProcPid,
+	 GetLockmodeName(lockmethod_copy, lockmode),
+	 locktagbuf.data);
+
+	ereport(DEBUG1,
+			(errmsg("sending cancel to blocking autovacuum PID %d",
+	pid),
+			 errdetail_log("%s", logbuf.data)));
+
+	pfree(logbuf.data);
+	pfree(locktagbuf.data);
+}
 
 /* send the autovacuum worker Back to Old Kent Road */
-ereport(DEBUG1,
-		(errmsg("sending cancel to blocking autovacuum PID %d",
-pid),
-		 errdetail_log("%s", logbuf.data)));
-
 if (kill(pid, SIGINT) < 0)
 {
 	/*
@@ -1362,12 +1380,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 (errmsg("could not send signal to process %d: %m",
 		pid)));
 }
-
-pfree(logbuf.data);
-pfree(locktagbuf.data);
 			}
-			else
-LWLockRelease(ProcArrayLock);
 
 			/* prevent signal from being sent again more than once */
 			allow_autovacuum_cancel = false;
-- 
2.20.1

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-11-23 Thread Tom Lane

Alvaro Herrera  writes:
> PROC_IN_LOGICAL_DECODING:
> Oddly enough, I think the reset of PROC_IN_LOGICAL_DECODING in
> ReplicationSlotRelease might be the most problematic one of the lot.
> That's because a proc's xmin that had been ignored all along by
> ComputeXidHorizons, will now be included in the computation.  Adding
> asserts that proc->xmin and proc->xid are InvalidXid by the time we
> reset the flag, I got hits in pg_basebackup, test_decoding and
> subscription tests.  I think it's OK for ComputeXidHorizons (since it
> just means that a vacuum that reads a later will remove less rows.)  But
> in GetSnapshotData it is just not correct to have the Xmin go backwards.

> Therefore it seems to me that this code has a bug independently of the
> lock level used.

That is only a bug if the flags are *cleared* in a way that's not
atomic with clearing the transaction's xid/xmin, no?  I agree that
once set, the flag had better stay set till transaction end, but
that's not what's at stake here.

> GetCurrentVirtualXIDs, ComputeXidHorizons, GetSnapshotData:

> In these cases, what we need is that the code computes some xmin (or
> equivalent computation) based on a set of transactions that exclude
> those marked with the flags.  The behavior we want is that if some
> transaction is marked as vacuum, we ignore the Xid/Xmin *if there is
> one*.  In other words, if there's no Xid/Xmin, then the flag is not
> important.  So if we can ensure that the flag is set first, and the
> Xid/xmin is installed later, that's sufficient correctness and we don't
> need to hold exclusive lock.  But if we can't ensure that, then we must
> use exclusive lock, because otherwise we risk another process seeing our
> Xid first and not our flag, which would be bad.

I don't buy this either.  You get the same result if someone looks just
before you take the ProcArrayLock to set the flag.  So if there's a
problem, it's inherent in the way that the flags are defined or used;
the strength of lock used in this stanza won't affect it.

regards, tom lane

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-11-23 Thread Alvaro Herrera

On 2020-Nov-18, Andres Freund wrote:

> In 13 this is:
>   LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
>   MyPgXact->vacuumFlags |= PROC_IN_VACUUM;
>   if (params->is_wraparound)
>   MyPgXact->vacuumFlags |= PROC_VACUUM_FOR_WRAPAROUND;
>   LWLockRelease(ProcArrayLock);
> 
> Lowering this to a shared lock doesn't seem right, at least without a
> detailed comment explaining why it's safe. Because GetSnapshotData() etc
> look at all procs with just an LW_SHARED ProcArrayLock, changing
> vacuumFlags without a lock means that two concurrent horizon
> computations could come to a different result.
> 
> I'm not saying it's definitely wrong to relax things here, but I'm not
> sure we've evaluated it sufficiently.

True.  Let's evaluate it.

I changed three spots where statusFlags bits are written:

a) StartupDecodingContext: sets PROC_IN_LOGICAL_DECODING
b) ReplicationSlotRelease: removes PROC_IN_LOGICAL_DECODING
c) vacuum_rel: sets PROC_IN_VACUUM and potentially
   PROC_VACUUM_FOR_WRAPAROUND

Who reads these flags?

PROC_IN_LOGICAL_DECODING is read by:
 * ComputeXidHorizons
 * GetSnapshotData

PROC_IN_VACUUM is read by:
 * GetCurrentVirtualXIDs
 * ComputeXidHorizons
 * GetSnapshotData
 * ProcArrayInstallImportedXmin

PROC_VACUUM_FOR_WRAPAROUND is read by:
 * ProcSleep

ProcSleep:  no bug here.
The flags are checked to see if we should kill() the process (used when
autovac blocks some other process).  The lock here appears to be
inconsequential, since we release it before we do kill(); so strictly
speaking, there is still a race condition where the autovac process
could reset the flag after we read it and before we get a chance to
signal it.  The lock level autovac uses to set the flag is not relevant
either.

ProcArrayInstallImportedXmin:
This one is just searching for a matching backend; not affected by the
flags.

PROC_IN_LOGICAL_DECODING:
Oddly enough, I think the reset of PROC_IN_LOGICAL_DECODING in
ReplicationSlotRelease might be the most problematic one of the lot.
That's because a proc's xmin that had been ignored all along by
ComputeXidHorizons, will now be included in the computation.  Adding
asserts that proc->xmin and proc->xid are InvalidXid by the time we
reset the flag, I got hits in pg_basebackup, test_decoding and
subscription tests.  I think it's OK for ComputeXidHorizons (since it
just means that a vacuum that reads a later will remove less rows.)  But
in GetSnapshotData it is just not correct to have the Xmin go backwards.

Therefore it seems to me that this code has a bug independently of the
lock level used.

GetCurrentVirtualXIDs, ComputeXidHorizons, GetSnapshotData:

In these cases, what we need is that the code computes some xmin (or
equivalent computation) based on a set of transactions that exclude
those marked with the flags.  The behavior we want is that if some
transaction is marked as vacuum, we ignore the Xid/Xmin *if there is
one*.  In other words, if there's no Xid/Xmin, then the flag is not
important.  So if we can ensure that the flag is set first, and the
Xid/xmin is installed later, that's sufficient correctness and we don't
need to hold exclusive lock.  But if we can't ensure that, then we must
use exclusive lock, because otherwise we risk another process seeing our
Xid first and not our flag, which would be bad.

In other words, my conclusion is that there definitely is a bug here and
I am going to restore the use of exclusive lock for setting the
statusFlags.

GetSnapshotData has an additional difficulty -- we do the
UINT32_ACCESS_ONCE(ProcGlobal->xid[i]) read *before* testing the bit. 
So it's not valid to set the bit without locking out GSD, regardless of
any barriers we had; if we want this to be safe, we'd have to change
this so that the flag is read first, and we read the xid only
afterwards, with a read barrier.

I *think* we could relax the lock if we had a write barrier in
between: set the flag first, issue a write barrier, set the Xid.
(I have to admit I'm confused about what needs to happen in the read
case: read the bit first, potentially skip the PGPROC entry; but can we
read the Xid ahead of reading the flag, and if we do read the xid
afterwards, do we need a read barrier in between?)
Given this uncertainty, I'm not proposing to relax the lock from
exclusive to shared.

I didn't get around to reading ComputeXidHorizons, but it seems like
it'd have the same problem as GSD.

Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

2020-11-18 Thread Michael Paquier

On Thu, Nov 19, 2020 at 12:13:44PM +0900, Michael Paquier wrote:
> That still looks useful for debugging, so DEBUG1 sounds fine to me.

By the way, it strikes me that you could just do nothing as long as
(log_min_messages > DEBUG1), so you could encapsulate most of the
logic that plays with the lock tag using that.
--
Michael


signature.asc
Description: PGP signature

Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

2020-11-18 Thread Michael Paquier

On Wed, Nov 18, 2020 at 02:48:40PM -0800, Andres Freund wrote:
> On 2020-11-18 18:41:27 -0300, Alvaro Herrera wrote:
>> We could make this more concurrent by copying lock->tag to a local
>> variable, releasing the lock, then doing all the string formatting and
>> printing.  See attached quickly.patch.
> 
> Sounds like a plan.

+1.

>> Now, when this code was written (d7318d43d, 2012), this was a LOG
>> message; it was demoted to DEBUG1 later (d8f15c95bec, 2015).  I think it
>> would be fair to ... remove the message?  Or go back to Simon's original
>> formulation from commit acac68b2bca, which had this message as DEBUG2
>> without any string formatting.
> 
> I don't really have an opinion on this.

That still looks useful for debugging, so DEBUG1 sounds fine to me.
--
Michael


signature.asc
Description: PGP signature

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-11-18 Thread Michael Paquier

On Wed, Nov 18, 2020 at 11:09:28AM -0800, Andres Freund wrote:
> Uh, wait a second. The acquisition of this lock hasn't been affected by
> the snapshot scalability changes, and therefore are unrelated to
> ->pgxactoff changing or not.
> 
> In 13 this is:
>   LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
>   MyPgXact->vacuumFlags |= PROC_IN_VACUUM;
>   if (params->is_wraparound)
>   MyPgXact->vacuumFlags |= PROC_VACUUM_FOR_WRAPAROUND;
>   LWLockRelease(ProcArrayLock);
> 
> Lowering this to a shared lock doesn't seem right, at least without a
> detailed comment explaining why it's safe. Because GetSnapshotData() etc
> look at all procs with just an LW_SHARED ProcArrayLock, changing
> vacuumFlags without a lock means that two concurrent horizon
> computations could come to a different result.
> 
> I'm not saying it's definitely wrong to relax things here, but I'm not
> sure we've evaluated it sufficiently.

Yeah.  While I do like the new assertion that 27838981 has added in
ProcArrayEndTransactionInternal(), this commit feels a bit rushed to
me.  Echoing with my comment from upthread, I am not sure that we
still document enough why a shared lock would be completely fine in
the case of statusFlags.  We have no hints that this could be fine
before 2783898, and 2783898 does not make that look better.  FWIW, I
think that just using LW_EXCLUSIVE for the CIC change would have been
fine enough first, after evaluating the performance impact.  We could
evaluate if it is possible to lower the ProcArrayLock acquisition in
those code paths on a separate thread.
--
Michael


signature.asc
Description: PGP signature

Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

2020-11-18 Thread Andres Freund

Hi,

On 2020-11-18 18:41:27 -0300, Alvaro Herrera wrote:
> The amount of stuff that this is doing with ProcArrayLock held
> exclusively seems a bit excessive; it sounds like we could copy the
> values we need first, release the lock, and *then* do all that memory
> allocation and string printing -- it's a lock of code for such a
> contended lock.

Yea, that's a good point.


> Anytime a process sees itself as blocked by autovacuum
> and wants to signal it, there's a ProcArrayLock hiccup (I didn't
> actually measure it, but it's at least five function calls).

I'm a bit doubtful it's that important - it's limited in frequency
by deadlock_timeout. But worth improving anyway.


> We could make this more concurrent by copying lock->tag to a local
> variable, releasing the lock, then doing all the string formatting and
> printing.  See attached quickly.patch.

Sounds like a plan.


> Now, when this code was written (d7318d43d, 2012), this was a LOG
> message; it was demoted to DEBUG1 later (d8f15c95bec, 2015).  I think it
> would be fair to ... remove the message?  Or go back to Simon's original
> formulation from commit acac68b2bca, which had this message as DEBUG2
> without any string formatting.

I don't really have an opinion on this.

Greetings,

Andres Freund

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-11-18 Thread Alvaro Herrera

On 2020-Nov-18, Michael Paquier wrote:

> On Tue, Nov 17, 2020 at 02:14:53PM -0300, Alvaro Herrera wrote:
> > diff --git a/src/backend/replication/logical/logical.c 
> > b/src/backend/replication/logical/logical.c
> > index f1f4df7d70..4324e32656 100644
> > --- a/src/backend/replication/logical/logical.c
> > +++ b/src/backend/replication/logical/logical.c
> > @@ -181,7 +181,7 @@ StartupDecodingContext(List *output_plugin_options,
> >  */
> > if (!IsTransactionOrTransactionBlock())
> > {
> > -   LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
> > +   LWLockAcquire(ProcArrayLock, LW_SHARED);
> > MyProc->statusFlags |= PROC_IN_LOGICAL_DECODING;
> > ProcGlobal->statusFlags[MyProc->pgxactoff] = 
> > MyProc->statusFlags;
> > LWLockRelease(ProcArrayLock);
> 
> We don't really document that it is safe to update statusFlags while
> holding a shared lock on ProcArrayLock, right?  Could this be
> clarified at least in proc.h?

Pushed that part with a comment addition.  This stuff is completely
undocumented ...

> > +   /* Determine whether we can call set_safe_index_flag */
> > +   safe_index = indexInfo->ii_Expressions == NIL &&
> > +   indexInfo->ii_Predicate == NIL;
> 
> This should tell why we check after expressions and predicates, in
> short giving an explanation about the snapshot issues with build and
> validation when executing those expressions and/or predicates.

Fair.  It seems good to add a comment to the new function, and reference
that in each place where we set the flag.


> > + * Set the PROC_IN_SAFE_IC flag in my PGPROC entry.

> Would it be better to document that this function had better be called
> after beginning a transaction?  I am wondering if we should not
> enforce some conditions here, say this one or similar:
> Assert(GetTopTransactionIdIfAny() == InvalidTransactionId);

I went with checking MyProc->xid and MyProc->xmin, which is the same in
practice but notionally closer to what we're doing.

> > +static inline void
> > +set_safe_index_flag(void)
> 
> This routine name is rather close to index_set_state_flags(), could it
> be possible to finish with a better name?

I went with set_indexsafe_procflags().  Ugly ...
>From a1c9fb94a71f0bf9ec492e1715809315e38084b3 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 17 Nov 2020 11:41:19 -0300
Subject: [PATCH v7 1/2] CREATE INDEX CONCURRENTLY: don't wait for its kin
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

In the various waiting phases of CREATE INDEX CONCURRENTLY, we wait
for other processes to release their snapshots; in general terms this is
necessary for correctness.  However, processes doing CIC for other
tables cannot possibly affect CIC done on "this" table, so we don't need
to wait for *those*.  This commit adds a flag in MyProc->statusFlags to
indicate that the current process is doing CIC, so that other processes
doing CIC can ignore it when waiting.

This flag can potentially also be used by processes doing REINDEX
CONCURRENTLY also to avoid waiting, and by VACUUM to ignore the process
in CIC for the purposes of computing an Xmin.  That's left for future
commits.

Author: Álvaro Herrera 
Author: Dimitry Dolgov <9erthali...@gmail.com>
Reviewed-by: Michael Paquier 
Discussion: https://postgr.es/m/20200810233815.GA18970@alvherre.pgsql
---
 src/backend/commands/indexcmds.c | 63 ++--
 src/include/storage/proc.h   |  5 ++-
 2 files changed, 64 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 02c7a0c7e1..9efb6b5420 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -93,6 +93,7 @@ static void ReindexPartitions(Oid relid, int options, bool isTopLevel);
 static void ReindexMultipleInternal(List *relids, int options);
 static bool ReindexRelationConcurrently(Oid relationOid, int options);
 static void update_relispartition(Oid relationId, bool newval);
+static inline void set_indexsafe_procflags(void);
 
 /*
  * callback argument type for RangeVarCallbackForReindexIndex()
@@ -384,7 +385,10 @@ CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts)
  * lazy VACUUMs, because they won't be fazed by missing index entries
  * either.  (Manual ANALYZEs, however, can't be excluded because they
  * might be within transactions that are going to do arbitrary operations
- * later.)
+ * later.)  Processes running CREATE INDEX CONCURRENTLY
+ * on indexes that are neither expressional nor partial are also safe to
+ * ignore, since we know that those processes won't examine any data
+ * outside the table they're indexing.
  *
  * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
  * check for that.
@@ -405,7 +409,8 @@ WaitForOlderSnapshots(TransactionId limitXmin, bool progress)
 	VirtualTransactionId *old_snapshots;
 
 	old_snapshots = GetCurrentVirtualXIDs(limitXmin, true,

"as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

2020-11-18 Thread Alvaro Herrera

> On 2020-11-17 12:55:01 -0300, Alvaro Herrera wrote:
> > ... ah, but I realize now that this means that we can use shared lock
> > here, not exclusive, which is already an enormous improvement.  That's
> > because ->pgxactoff can only be changed with exclusive lock held; so as
> > long as we hold shared, the array item cannot move.
> 
> Uh, wait a second. The acquisition of this lock hasn't been affected by
> the snapshot scalability changes, and therefore are unrelated to
> ->pgxactoff changing or not.

I'm writing a response trying to thoroughly analyze this, but I also
wanted to report that ProcSleep is being a bit generous with what it
calls "as quickly as possible" here:

LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);

/*
 * Only do it if the worker is not working to protect against Xid
 * wraparound.
 */
statusFlags = ProcGlobal->statusFlags[autovac->pgxactoff];
if ((statusFlags & PROC_IS_AUTOVACUUM) &&
!(statusFlags & PROC_VACUUM_FOR_WRAPAROUND))
{
intpid = autovac->pid;
StringInfoData locktagbuf;
StringInfoData logbuf;/* errdetail for server log */

initStringInfo();
initStringInfo();
DescribeLockTag(, >tag);
appendStringInfo(,
 _("Process %d waits for %s on %s."),
 MyProcPid,
 GetLockmodeName(lock->tag.locktag_lockmethodid,
 lockmode),
 locktagbuf.data);

/* release lock as quickly as possible */
LWLockRelease(ProcArrayLock);

The amount of stuff that this is doing with ProcArrayLock held
exclusively seems a bit excessive; it sounds like we could copy the
values we need first, release the lock, and *then* do all that memory
allocation and string printing -- it's a lock of code for such a
contended lock.  Anytime a process sees itself as blocked by autovacuum
and wants to signal it, there's a ProcArrayLock hiccup (I didn't
actually measure it, but it's at least five function calls).  We could
make this more concurrent by copying lock->tag to a local variable,
releasing the lock, then doing all the string formatting and printing.
See attached quickly.patch.

Now, when this code was written (d7318d43d, 2012), this was a LOG
message; it was demoted to DEBUG1 later (d8f15c95bec, 2015).  I think it
would be fair to ... remove the message?  Or go back to Simon's original
formulation from commit acac68b2bca, which had this message as DEBUG2
without any string formatting.

Thoughts?
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index d1738c65f5..caefff41e2 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1325,20 +1325,22 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 int			pid = autovac->pid;
 StringInfoData locktagbuf;
 StringInfoData logbuf;	/* errdetail for server log */
+LOCKTAG		locktag_copy;
+
+/* release lock as quickly as possible */
+locktag_copy = lock->tag;
+LWLockRelease(ProcArrayLock);

 initStringInfo();
 initStringInfo();
-DescribeLockTag(, >tag);
+DescribeLockTag(, _copy);
 appendStringInfo(,
  _("Process %d waits for %s on %s."),
  MyProcPid,
- GetLockmodeName(lock->tag.locktag_lockmethodid,
+ GetLockmodeName(locktag_copy.locktag_lockmethodid,
  lockmode),
  locktagbuf.data);

-/* release lock as quickly as possible */
-LWLockRelease(ProcArrayLock);
-
 /* send the autovacuum worker Back to Old Kent Road */
 ereport(DEBUG1,
 		(errmsg("sending cancel to blocking autovacuum PID %d",

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-11-18 Thread Andres Freund

Hi,

On 2020-11-17 12:55:01 -0300, Alvaro Herrera wrote:
> ... ah, but I realize now that this means that we can use shared lock
> here, not exclusive, which is already an enormous improvement.  That's
> because ->pgxactoff can only be changed with exclusive lock held; so as
> long as we hold shared, the array item cannot move.

Uh, wait a second. The acquisition of this lock hasn't been affected by
the snapshot scalability changes, and therefore are unrelated to
->pgxactoff changing or not.

In 13 this is:
LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
MyPgXact->vacuumFlags |= PROC_IN_VACUUM;
if (params->is_wraparound)
MyPgXact->vacuumFlags |= PROC_VACUUM_FOR_WRAPAROUND;
LWLockRelease(ProcArrayLock);

Lowering this to a shared lock doesn't seem right, at least without a
detailed comment explaining why it's safe. Because GetSnapshotData() etc
look at all procs with just an LW_SHARED ProcArrayLock, changing
vacuumFlags without a lock means that two concurrent horizon
computations could come to a different result.

I'm not saying it's definitely wrong to relax things here, but I'm not
sure we've evaluated it sufficiently.

Greetings,

Andres Freund

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-11-17 Thread Michael Paquier

On Tue, Nov 17, 2020 at 02:14:53PM -0300, Alvaro Herrera wrote:
> diff --git a/src/backend/replication/logical/logical.c 
> b/src/backend/replication/logical/logical.c
> index f1f4df7d70..4324e32656 100644
> --- a/src/backend/replication/logical/logical.c
> +++ b/src/backend/replication/logical/logical.c
> @@ -181,7 +181,7 @@ StartupDecodingContext(List *output_plugin_options,
>*/
>   if (!IsTransactionOrTransactionBlock())
>   {
> - LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
> + LWLockAcquire(ProcArrayLock, LW_SHARED);
>   MyProc->statusFlags |= PROC_IN_LOGICAL_DECODING;
>   ProcGlobal->statusFlags[MyProc->pgxactoff] = 
> MyProc->statusFlags;
>   LWLockRelease(ProcArrayLock);

We don't really document that it is safe to update statusFlags while
holding a shared lock on ProcArrayLock, right?  Could this be
clarified at least in proc.h?

> + /* Determine whether we can call set_safe_index_flag */
> + safe_index = indexInfo->ii_Expressions == NIL &&
> + indexInfo->ii_Predicate == NIL;

This should tell why we check after expressions and predicates, in
short giving an explanation about the snapshot issues with build and
validation when executing those expressions and/or predicates.

> + * Set the PROC_IN_SAFE_IC flag in my PGPROC entry.
> + *
> + * When doing concurrent index builds, we can set this flag
> + * to tell other processes concurrently running CREATE
> + * INDEX CONCURRENTLY to ignore us when
> + * doing their waits for concurrent snapshots.  On one hand it
> + * avoids pointlessly waiting for a process that's not interesting
> + * anyway, but more importantly it avoids deadlocks in some cases.
> + *
> + * This can only be done for indexes that don't execute any expressions.
> + * Caller is responsible for only calling this routine when that
> + * assumption holds true.
> + *
> + * (The flag is reset automatically at transaction end, so it must be
> + * set for each transaction.)

Would it be better to document that this function had better be called
after beginning a transaction?  I am wondering if we should not
enforce some conditions here, say this one or similar:
Assert(GetTopTransactionIdIfAny() == InvalidTransactionId);

> + */
> +static inline void
> +set_safe_index_flag(void)

This routine name is rather close to index_set_state_flags(), could it
be possible to finish with a better name?
--
Michael


signature.asc
Description: PGP signature

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-11-17 Thread Alvaro Herrera

So I made the change to set the statusFlags with only LW_SHARED -- both
in existing uses (see 0002) and in the new CIC code (0003).  I don't
think 0002 is going to have a tremendous performance impact, because it
changes operations that are very infrequent.  But even so, it would be
weird to leave code around that uses exclusive lock when we're going to
add new code that uses shared lock for the same thing.

I still left the REINDEX CONCURRENTLY support in split out in 0004; I
intend to get the first three patches pushed now, and look into 0004
again later.
>From 5e2af4e082df8f839bf257d933d70bb9645f6cfe Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 17 Nov 2020 11:37:36 -0300
Subject: [PATCH v6 1/4] indexcmds.c: reorder function prototypes

... out of an overabundance of neatnikism, perhaps.
---
 src/backend/commands/indexcmds.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 35696f9f75..02c7a0c7e1 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -68,6 +68,7 @@
 
 
 /* non-export function prototypes */
+static bool CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts);
 static void CheckPredicate(Expr *predicate);
 static void ComputeIndexAttrs(IndexInfo *indexInfo,
 			  Oid *typeOidP,
@@ -87,13 +88,11 @@ static char *ChooseIndexNameAddition(List *colnames);
 static List *ChooseIndexColumnNames(List *indexElems);
 static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
 			Oid relId, Oid oldRelId, void *arg);
-static bool ReindexRelationConcurrently(Oid relationOid, int options);
-
+static void reindex_error_callback(void *args);
 static void ReindexPartitions(Oid relid, int options, bool isTopLevel);
 static void ReindexMultipleInternal(List *relids, int options);
-static void reindex_error_callback(void *args);
+static bool ReindexRelationConcurrently(Oid relationOid, int options);
 static void update_relispartition(Oid relationId, bool newval);
-static bool CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts);
 
 /*
  * callback argument type for RangeVarCallbackForReindexIndex()
-- 
2.20.1

>From 3e2778d553448c67e956bd385c4a94039933d3c6 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 17 Nov 2020 13:48:16 -0300
Subject: [PATCH v6 2/4] relax lock level for setting statusFlags

---
 src/backend/commands/vacuum.c | 2 +-
 src/backend/replication/logical/logical.c | 2 +-
 src/backend/replication/slot.c| 2 +-
 src/backend/storage/ipc/procarray.c   | 2 ++
 4 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index d89ba3031f..395e75f768 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1741,7 +1741,7 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params)
 		 * MyProc->xid/xmin, otherwise GetOldestNonRemovableTransactionId()
 		 * might appear to go backwards, which is probably Not Good.
 		 */
-		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+		LWLockAcquire(ProcArrayLock, LW_SHARED);
 		MyProc->statusFlags |= PROC_IN_VACUUM;
 		if (params->is_wraparound)
 			MyProc->statusFlags |= PROC_VACUUM_FOR_WRAPAROUND;
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index f1f4df7d70..4324e32656 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -181,7 +181,7 @@ StartupDecodingContext(List *output_plugin_options,
 	 */
 	if (!IsTransactionOrTransactionBlock())
 	{
-		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+		LWLockAcquire(ProcArrayLock, LW_SHARED);
 		MyProc->statusFlags |= PROC_IN_LOGICAL_DECODING;
 		ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
 		LWLockRelease(ProcArrayLock);
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 5ed955ba0b..9c7cf13d4d 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -527,7 +527,7 @@ ReplicationSlotRelease(void)
 	MyReplicationSlot = NULL;
 
 	/* might not have been set when we've been a plain slot */
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	LWLockAcquire(ProcArrayLock, LW_SHARED);
 	MyProc->statusFlags &= ~PROC_IN_LOGICAL_DECODING;
 	ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
 	LWLockRelease(ProcArrayLock);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index e2cb6e990d..ebe91907d6 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -662,6 +662,8 @@ ProcArrayEndTransaction(PGPROC *proc, TransactionId latestXid)
 		/* avoid unnecessarily dirtying shared cachelines */
 		if (proc->statusFlags & PROC_VACUUM_STATE_MASK)
 		{
+			/* Only safe to change my own flags with only share lock */
+			Assert(proc == MyProc);

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-11-17 Thread Alvaro Herrera

On 2020-Nov-16, Alvaro Herrera wrote:

> On 2020-Nov-09, Tom Lane wrote:
> 
> > Michael Paquier  writes:
> > >> +LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
> > >> +MyProc->vacuumFlags |= PROC_IN_SAFE_IC;
> > >> +ProcGlobal->vacuumFlags[MyProc->pgxactoff] = 
> > >> MyProc->vacuumFlags;
> > >> +LWLockRelease(ProcArrayLock);
> > 
> > > I can't help noticing that you are repeating the same code pattern
> > > eight times.  I think that this should be in its own routine, and that
> > > we had better document that this should be called just after starting
> > > a transaction, with an assertion enforcing that.
> > 
> > Do we really need exclusive lock on the ProcArray to make this flag
> > change?  That seems pretty bad from a concurrency standpoint.
> 
> BTW I now know that the reason for taking ProcArrayLock is not the
> vacuumFlags itself, but rather MyProc->pgxactoff, which can move.

... ah, but I realize now that this means that we can use shared lock
here, not exclusive, which is already an enormous improvement.  That's
because ->pgxactoff can only be changed with exclusive lock held; so as
long as we hold shared, the array item cannot move.

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-11-16 Thread Michael Paquier

On Mon, Nov 16, 2020 at 09:23:41PM -0300, Alvaro Herrera wrote:
> I've been wondering if it would be sane/safe to do the WaitForFoo stuff
> outside of any transaction.

This needs to remain within a transaction as CIC holds a session lock
on the parent table, which would not be cleaned up without a
transaction context.
--
Michael


signature.asc
Description: PGP signature

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-11-16 Thread Alvaro Herrera

I am really unsure about the REINDEX CONCURRENTLY part of this, for two
reasons:

1. It is not as good when reindexing multiple indexes, because we can
only apply the flag if *all* indexes are "safe".  Any unsafe index means
we step down from it for the whole thing.  This is probably not worth
worrying much about, but still.

2. In some of the waiting transactions, we actually do more things than
what we do in CREATE INDEX CONCURRENTLY transactions --- some catalog
updates, but we also do the whole index validation phase.  Is that OK?
It's not as clear to me that it is safe to set the flag in all those
places.

I moved the comments to the new function and made it inline.  I also
changed the way we determine how the function is safe; there's no reason
to build an IndexInfo if we can simply look at rel->rd_indexprs and
rel->indpred.

I've been wondering if it would be sane/safe to do the WaitForFoo stuff
outside of any transaction.

I'll have a look again tomorrow.
>From 1e5560d4a3f3e3b4cb83803f90985701fb4dee8c Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 4 Aug 2020 22:04:57 -0400
Subject: [PATCH v5] Avoid spurious CREATE INDEX CONCURRENTLY waits

---
 src/backend/commands/indexcmds.c | 85 +---
 src/include/storage/proc.h   |  6 ++-
 2 files changed, 83 insertions(+), 8 deletions(-)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 35696f9f75..44ea84c54d 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -68,6 +68,7 @@
 
 
 /* non-export function prototypes */
+static bool CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts);
 static void CheckPredicate(Expr *predicate);
 static void ComputeIndexAttrs(IndexInfo *indexInfo,
 			  Oid *typeOidP,
@@ -87,13 +88,12 @@ static char *ChooseIndexNameAddition(List *colnames);
 static List *ChooseIndexColumnNames(List *indexElems);
 static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
 			Oid relId, Oid oldRelId, void *arg);
-static bool ReindexRelationConcurrently(Oid relationOid, int options);
-
+static void reindex_error_callback(void *args);
 static void ReindexPartitions(Oid relid, int options, bool isTopLevel);
 static void ReindexMultipleInternal(List *relids, int options);
-static void reindex_error_callback(void *args);
+static bool ReindexRelationConcurrently(Oid relationOid, int options);
 static void update_relispartition(Oid relationId, bool newval);
-static bool CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts);
+static inline void set_safe_index_flag(void);
 
 /*
  * callback argument type for RangeVarCallbackForReindexIndex()
@@ -385,7 +385,10 @@ CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts)
  * lazy VACUUMs, because they won't be fazed by missing index entries
  * either.  (Manual ANALYZEs, however, can't be excluded because they
  * might be within transactions that are going to do arbitrary operations
- * later.)
+ * later.)  Processes running CREATE INDEX CONCURRENTLY or REINDEX CONCURRENTLY
+ * on indexes that are neither expressional nor partial are also safe to
+ * ignore, since we know that those processes won't examine any data
+ * outside the table they're indexing.
  *
  * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
  * check for that.
@@ -406,7 +409,8 @@ WaitForOlderSnapshots(TransactionId limitXmin, bool progress)
 	VirtualTransactionId *old_snapshots;
 
 	old_snapshots = GetCurrentVirtualXIDs(limitXmin, true, false,
-		  PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+		  PROC_IS_AUTOVACUUM | PROC_IN_VACUUM
+		  | PROC_IN_SAFE_IC,
 		  _old_snapshots);
 	if (progress)
 		pgstat_progress_update_param(PROGRESS_WAITFOR_TOTAL, n_old_snapshots);
@@ -426,7 +430,8 @@ WaitForOlderSnapshots(TransactionId limitXmin, bool progress)
 
 			newer_snapshots = GetCurrentVirtualXIDs(limitXmin,
 	true, false,
-	PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+	PROC_IS_AUTOVACUUM | PROC_IN_VACUUM
+	| PROC_IN_SAFE_IC,
 	_newer_snapshots);
 			for (j = i; j < n_old_snapshots; j++)
 			{
@@ -519,6 +524,7 @@ DefineIndex(Oid relationId,
 	bool		amcanorder;
 	amoptions_function amoptions;
 	bool		partitioned;
+	bool		safe_index;
 	Datum		reloptions;
 	int16	   *coloptions;
 	IndexInfo  *indexInfo;
@@ -1045,6 +1051,10 @@ DefineIndex(Oid relationId,
 		}
 	}
 
+	/* Determine whether we can call set_safe_index_flag */
+	safe_index = indexInfo->ii_Expressions == NIL &&
+		indexInfo->ii_Predicate == NIL;
+
 	/*
 	 * Report index creation if appropriate (delay this till after most of the
 	 * error checks)
@@ -1431,6 +1441,10 @@ DefineIndex(Oid relationId,
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_safe_index_flag();
+
 	/*
 	 * The index is now visible, so we can report the OID.

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-11-16 Thread Alvaro Herrera

On 2020-Nov-09, Tom Lane wrote:

> Michael Paquier  writes:
> >> +  LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
> >> +  MyProc->vacuumFlags |= PROC_IN_SAFE_IC;
> >> +  ProcGlobal->vacuumFlags[MyProc->pgxactoff] = 
> >> MyProc->vacuumFlags;
> >> +  LWLockRelease(ProcArrayLock);
> 
> > I can't help noticing that you are repeating the same code pattern
> > eight times.  I think that this should be in its own routine, and that
> > we had better document that this should be called just after starting
> > a transaction, with an assertion enforcing that.
> 
> Do we really need exclusive lock on the ProcArray to make this flag
> change?  That seems pretty bad from a concurrency standpoint.

BTW I now know that the reason for taking ProcArrayLock is not the
vacuumFlags itself, but rather MyProc->pgxactoff, which can move.

On the other hand, if we stopped mirroring the flags in ProcGlobal, it
would mean we would have to access all procs' PGPROC entries in
GetSnapshotData, which is undesirable for performance reason (per commit
5788e258bb26).

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-11-16 Thread Alvaro Herrera

On 2020-Nov-16, Dmitry Dolgov wrote:

> The same with reindex without locks:
> 
>  nsecs   : count distribution
>512 -> 1023   : 0||
>   1024 -> 2047   : 111345   ||
>   2048 -> 4095   : 6997627  ||
>   4096 -> 8191   : 18575||
>   8192 -> 16383  : 586  ||
>  16384 -> 32767  : 312  ||
>  32768 -> 65535  : 18   ||
> 
> The same with reindex with locks:
> 
>  nsecs   : count distribution
>512 -> 1023   : 0||
>   1024 -> 2047   : 59438||
>   2048 -> 4095   : 6901187  ||
>   4096 -> 8191   : 18584||
>   8192 -> 16383  : 581  ||
>  16384 -> 32767  : 280  ||
>  32768 -> 65535  : 84   ||
> 
> Looks like with reindex without locks is indeed faster (there are mode
> samples in lower time section), but not particularly significant to the
> whole distribution, especially taking into account extremity of the
> test.

I didn't analyze these numbers super carefully, but yeah it doesn't look
significant.

I'm looking at these patches now, with intention to push.

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-11-16 Thread Simon Riggs

On Tue, 10 Nov 2020 at 03:02, Tom Lane  wrote:
>
> Alvaro Herrera  writes:
> > Yeah ... it would be much better if we can make it use atomics instead.
>
> I was thinking more like "do we need any locking at all".
>
> Assuming that a proc's vacuumFlags can be set by only the process itself,
> there's no write conflicts to worry about.  On the read side, there's a
> hazard that onlookers will not see the PROC_IN_SAFE_IC flag set; but
> that's not any different from what the outcome would be if they looked
> just before this stanza executes.  And even if they don't see it, at worst
> we lose the optimization being proposed.
>
> There is a question of whether it's important that both copies of the flag
> appear to update atomically ... but that just begs the question "why in
> heaven's name are there two copies?"

Agreed to all of the above, but I think the issue is miniscule because
ProcArrayLock is acquired in LW_EXCLUSIVE mode at transaction commit.
So it doesn't seem like there is much need to optimize this particular
aspect of the patch.

-- 
Simon Riggshttp://www.EnterpriseDB.com/

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-11-16 Thread Dmitry Dolgov

> On Fri, Nov 13, 2020 at 09:25:40AM +0900, Michael Paquier wrote:
> On Thu, Nov 12, 2020 at 04:36:32PM +0100, Dmitry Dolgov wrote:
> > Interesting enough, similar discussion happened about vaccumFlags before
> > with the same conclusion that theoretically it's fine to update without
> > holding the lock, but this assumption could change one day and it's
> > better to avoid such risks. Having said that I believe it makes sense to
> > continue with locking. Are there any other opinions? I'll try to
> > benchmark it in the meantime.
>
> Thanks for planning some benchmarking for this specific patch.  I have
> to admit that the possibility of switching vacuumFlags to use atomics
> is very appealing in the long term, with or without considering this
> patch, even if we had better be sure that this patch has no actual
> effect on concurrency first if atomics are not used in worst-case
> scenarios.

I've tried first to test scenarios where GetSnapshotData produces
significant lock contention and "reindex concurrently" implementation
with locks interferes with it. The idea I had is to create a test
function that constantly calls GetSnapshotData (perf indeed shows
significant portion of time spent on contended lock), and clash it with
a stream of "reindex concurrently" of an empty relation (which still
reaches safe_index check). I guess it could be considered as an
artificial extreme case. Measuring GetSnapshotData (or rather the
surrounding wrapper, to distinguish calls from the test function from
everything else) latency without reindex, with reindex and locks, with
reindex without locks should produce different "modes" and comparing
them we can make some conclusions.

Latency histograms without reindex (nanoseconds):

 nsecs   : count distribution
   512 -> 1023   : 0||
  1024 -> 2047   : 10001209 ||
  2048 -> 4095   : 76936||
  4096 -> 8191   : 1468 ||
  8192 -> 16383  : 98   ||
 16384 -> 32767  : 39   ||
 32768 -> 65535  : 6||

The same with reindex without locks:

 nsecs   : count distribution
   512 -> 1023   : 0||
  1024 -> 2047   : 111345   ||
  2048 -> 4095   : 6997627  ||
  4096 -> 8191   : 18575||
  8192 -> 16383  : 586  ||
 16384 -> 32767  : 312  ||
 32768 -> 65535  : 18   ||

The same with reindex with locks:

 nsecs   : count distribution
   512 -> 1023   : 0||
  1024 -> 2047   : 59438||
  2048 -> 4095   : 6901187  ||
  4096 -> 8191   : 18584||
  8192 -> 16383  : 581  ||
 16384 -> 32767  : 280  ||
 32768 -> 65535  : 84   ||

Looks like with reindex without locks is indeed faster (there are mode
samples in lower time section), but not particularly significant to the
whole distribution, especially taking into account extremity of the
test.

I'll take a look at benchmarking of switching vacuumFlags to use
atomics, but as it's probably a bit off topic I'm going to attach
another version of the patch with locks and suggested changes. To which
I have one question:

> Michael Paquier  writes:

> I think that this should be in its own routine, and that we had better
> document that this should be called just after starting a transaction,
> with an assertion enforcing that.

I'm not sure which exactly assertion condition do you mean?
>From 07c4705a22e6cdd5717df46a974ce00a69fc901f Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthali...@gmail.com>
Date: Wed, 11 Nov 2020 15:19:48 +0100
Subject: [PATCH v4 1/2] Rename vaccumFlags to statusFlags

With more flags associated to a PGPROC entry that are not related to
vacuum (currently existing or planned), the name "statusFlags" describes
its purpose better.
---
 src/backend/access/transam/twophase.c |  2 +-
 src/backend/commands/vacuum.c |  6 +--
 src/backend/postmaster/autovacuum.c   |  6 +--
 src/backend/replication/logical/logical.c |  4 +-
 src/backend/replication/slot.c|  4 +-

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-11-12 Thread Michael Paquier

On Thu, Nov 12, 2020 at 04:36:32PM +0100, Dmitry Dolgov wrote:
> Interesting enough, similar discussion happened about vaccumFlags before
> with the same conclusion that theoretically it's fine to update without
> holding the lock, but this assumption could change one day and it's
> better to avoid such risks. Having said that I believe it makes sense to
> continue with locking. Are there any other opinions? I'll try to
> benchmark it in the meantime.

Thanks for planning some benchmarking for this specific patch.  I have
to admit that the possibility of switching vacuumFlags to use atomics
is very appealing in the long term, with or without considering this
patch, even if we had better be sure that this patch has no actual
effect on concurrency first if atomics are not used in worst-case
scenarios.
--
Michael


signature.asc
Description: PGP signature

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-11-12 Thread Dmitry Dolgov

> On Mon, Nov 09, 2020 at 10:02:27PM -0500, Tom Lane wrote:
>
> Alvaro Herrera  writes:
> > Yeah ... it would be much better if we can make it use atomics instead.
>
> I was thinking more like "do we need any locking at all".
>
> Assuming that a proc's vacuumFlags can be set by only the process itself,
> there's no write conflicts to worry about.  On the read side, there's a
> hazard that onlookers will not see the PROC_IN_SAFE_IC flag set; but
> that's not any different from what the outcome would be if they looked
> just before this stanza executes.  And even if they don't see it, at worst
> we lose the optimization being proposed.
>
> There is a question of whether it's important that both copies of the flag
> appear to update atomically ... but that just begs the question "why in
> heaven's name are there two copies?"

Sounds right, but after reading the thread about GetSnapshotData
scalability more thoroughly there seem to be an assumption that those
copies have to be updated at the same time under the same lock, and
claims that in some cases justification for correctness around not
taking ProcArrayLock is too complicated, at least for now.

Interesting enough, similar discussion happened about vaccumFlags before
with the same conclusion that theoretically it's fine to update without
holding the lock, but this assumption could change one day and it's
better to avoid such risks. Having said that I believe it makes sense to
continue with locking. Are there any other opinions? I'll try to
benchmark it in the meantime.

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-11-09 Thread Tom Lane

Alvaro Herrera  writes:
> Yeah ... it would be much better if we can make it use atomics instead.

I was thinking more like "do we need any locking at all".

Assuming that a proc's vacuumFlags can be set by only the process itself,
there's no write conflicts to worry about.  On the read side, there's a
hazard that onlookers will not see the PROC_IN_SAFE_IC flag set; but
that's not any different from what the outcome would be if they looked
just before this stanza executes.  And even if they don't see it, at worst
we lose the optimization being proposed.

There is a question of whether it's important that both copies of the flag
appear to update atomically ... but that just begs the question "why in
heaven's name are there two copies?"

regards, tom lane

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-11-09 Thread Michael Paquier

On Mon, Nov 09, 2020 at 11:31:15PM -0300, Alvaro Herrera wrote:
> Yeah ... it would be much better if we can make it use atomics instead.
> Currently it's an uint8, and in PGPROC itself it's probably not a big
> deal to enlarge that, but I fear that quadrupling the size of the
> mirroring array in PROC_HDR might be bad for performance.  However,
> maybe if we use atomics to access it, then we don't need to mirror it
> anymore?  That would need some benchmarking of GetSnapshotData.

Hmm.  If you worry about the performance impact here, it is possible
to do a small performance test without this patch.  vacuum_rel() sets
the flag for a non-full VACUUM, so with one backend running a manual
VACUUM in loop on an empty relation could apply some pressure on any
benchmark, even a simple pgbench.
--
Michael


signature.asc
Description: PGP signature

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-11-09 Thread Alvaro Herrera

On 2020-Nov-09, Tom Lane wrote:

> Michael Paquier  writes:
> > On Mon, Nov 09, 2020 at 08:32:13PM -0500, Tom Lane wrote:
> >> Do we really need exclusive lock on the ProcArray to make this flag
> >> change?  That seems pretty bad from a concurrency standpoint.
> 
> > Any place where we update vacuumFlags acquires an exclusive LWLock on
> > ProcArrayLock.  That's held for a very short time, so IMO it won't
> > matter much in practice, particularly if you compare that with the
> > potential gains related to the existing wait phases.
> 
> Not sure I believe that it doesn't matter much in practice.  If there's
> a steady stream of shared ProcArrayLock acquisitions (for snapshot
> acquisition) then somebody wanting exclusive lock will create a big
> hiccup, whether they hold it for a short time or not.

Yeah ... it would be much better if we can make it use atomics instead.
Currently it's an uint8, and in PGPROC itself it's probably not a big
deal to enlarge that, but I fear that quadrupling the size of the
mirroring array in PROC_HDR might be bad for performance.  However,
maybe if we use atomics to access it, then we don't need to mirror it
anymore?  That would need some benchmarking of GetSnapshotData.

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-11-09 Thread Tom Lane

Michael Paquier  writes:
> On Mon, Nov 09, 2020 at 08:32:13PM -0500, Tom Lane wrote:
>> Do we really need exclusive lock on the ProcArray to make this flag
>> change?  That seems pretty bad from a concurrency standpoint.

> Any place where we update vacuumFlags acquires an exclusive LWLock on
> ProcArrayLock.  That's held for a very short time, so IMO it won't
> matter much in practice, particularly if you compare that with the
> potential gains related to the existing wait phases.

Not sure I believe that it doesn't matter much in practice.  If there's
a steady stream of shared ProcArrayLock acquisitions (for snapshot
acquisition) then somebody wanting exclusive lock will create a big
hiccup, whether they hold it for a short time or not.

regards, tom lane

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-11-09 Thread Michael Paquier

On Mon, Nov 09, 2020 at 08:32:13PM -0500, Tom Lane wrote:
> Do we really need exclusive lock on the ProcArray to make this flag
> change?  That seems pretty bad from a concurrency standpoint.

Any place where we update vacuumFlags acquires an exclusive LWLock on
ProcArrayLock.  That's held for a very short time, so IMO it won't
matter much in practice, particularly if you compare that with the
potential gains related to the existing wait phases.
--
Michael


signature.asc
Description: PGP signature

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-11-09 Thread Tom Lane

Michael Paquier  writes:
>> +LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
>> +MyProc->vacuumFlags |= PROC_IN_SAFE_IC;
>> +ProcGlobal->vacuumFlags[MyProc->pgxactoff] = 
>> MyProc->vacuumFlags;
>> +LWLockRelease(ProcArrayLock);

> I can't help noticing that you are repeating the same code pattern
> eight times.  I think that this should be in its own routine, and that
> we had better document that this should be called just after starting
> a transaction, with an assertion enforcing that.

Do we really need exclusive lock on the ProcArray to make this flag
change?  That seems pretty bad from a concurrency standpoint.

regards, tom lane

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-11-09 Thread Michael Paquier

On Mon, Nov 09, 2020 at 04:47:43PM +0100, Dmitry Dolgov wrote:
> > On Tue, Nov 03, 2020 at 07:14:47PM +0100, Dmitry Dolgov wrote:
> > > On Thu, Aug 20, 2020 at 03:11:19PM +0900, Michael Paquier wrote:
> > > On Wed, Aug 19, 2020 at 02:16:46PM -0400, Alvaro Herrera wrote:
> > > > I did not set the flag in REINDEX CONCURRENTLY, but as I understand it
> > > > can be done too, since in essence it's the same thing as a CIC from a
> > > > snapshot management point of view.
> > >
> > > Yes, I see no problems for REINDEX CONCURRENTLY as well as long as
> > > there are no predicates and expressions involved.  The transactions
> > > that should be patched are all started in ReindexRelationConcurrently.
> > > The transaction of index_concurrently_swap() cannot set up that
> > > though.  Only thing to be careful is to make sure that safe_flag is
> > > correct depending on the list of indexes worked on.
> >
> > Hi,
> >
> > After looking through the thread and reading the patch it seems good,
> > and there are only few minor questions:
> >
> > * Doing the same for REINDEX CONCURRENTLY, which does make sense. In
> >   fact it's already mentioned in the commentaries as done, which a bit
> >   confusing.
> 
> Just to give it a shot, would the attached change be enough?

Could it be possible to rename vacuumFlags to statusFlags first?  I
did not see any objection to do this suggestion.

> + LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
> + MyProc->vacuumFlags |= PROC_IN_SAFE_IC;
> + ProcGlobal->vacuumFlags[MyProc->pgxactoff] = 
> MyProc->vacuumFlags;
> + LWLockRelease(ProcArrayLock);

I can't help noticing that you are repeating the same code pattern
eight times.  I think that this should be in its own routine, and that
we had better document that this should be called just after starting
a transaction, with an assertion enforcing that.
--
Michael


signature.asc
Description: PGP signature

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-11-09 Thread Dmitry Dolgov

> On Tue, Nov 03, 2020 at 07:14:47PM +0100, Dmitry Dolgov wrote:
> > On Thu, Aug 20, 2020 at 03:11:19PM +0900, Michael Paquier wrote:
> > On Wed, Aug 19, 2020 at 02:16:46PM -0400, Alvaro Herrera wrote:
> > > I did not set the flag in REINDEX CONCURRENTLY, but as I understand it
> > > can be done too, since in essence it's the same thing as a CIC from a
> > > snapshot management point of view.
> >
> > Yes, I see no problems for REINDEX CONCURRENTLY as well as long as
> > there are no predicates and expressions involved.  The transactions
> > that should be patched are all started in ReindexRelationConcurrently.
> > The transaction of index_concurrently_swap() cannot set up that
> > though.  Only thing to be careful is to make sure that safe_flag is
> > correct depending on the list of indexes worked on.
>
> Hi,
>
> After looking through the thread and reading the patch it seems good,
> and there are only few minor questions:
>
> * Doing the same for REINDEX CONCURRENTLY, which does make sense. In
>   fact it's already mentioned in the commentaries as done, which a bit
>   confusing.

Just to give it a shot, would the attached change be enough?
>From d30d8acf91679985970334069ab7f1f8f7fc3ec5 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 4 Aug 2020 22:04:57 -0400
Subject: [PATCH v3] Avoid spurious CREATE INDEX CONCURRENTLY waits

---
 src/backend/commands/indexcmds.c | 122 ++-
 src/include/storage/proc.h   |   6 +-
 2 files changed, 124 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 75552c64ed..5019397d50 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -385,7 +385,10 @@ CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts)
  * lazy VACUUMs, because they won't be fazed by missing index entries
  * either.  (Manual ANALYZEs, however, can't be excluded because they
  * might be within transactions that are going to do arbitrary operations
- * later.)
+ * later.)  Processes running CREATE INDEX CONCURRENTLY or REINDEX CONCURRENTLY
+ * on indexes that are neither expressional nor partial are also safe to
+ * ignore, since we know that those processes won't examine any data
+ * outside the table they're indexing.
  *
  * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
  * check for that.
@@ -406,7 +409,8 @@ WaitForOlderSnapshots(TransactionId limitXmin, bool progress)
 	VirtualTransactionId *old_snapshots;
 
 	old_snapshots = GetCurrentVirtualXIDs(limitXmin, true, false,
-		  PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+		  PROC_IS_AUTOVACUUM | PROC_IN_VACUUM
+		  | PROC_IN_SAFE_IC,
 		  _old_snapshots);
 	if (progress)
 		pgstat_progress_update_param(PROGRESS_WAITFOR_TOTAL, n_old_snapshots);
@@ -426,7 +430,8 @@ WaitForOlderSnapshots(TransactionId limitXmin, bool progress)
 
 			newer_snapshots = GetCurrentVirtualXIDs(limitXmin,
 	true, false,
-	PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+	PROC_IS_AUTOVACUUM | PROC_IN_VACUUM
+	| PROC_IN_SAFE_IC,
 	_newer_snapshots);
 			for (j = i; j < n_old_snapshots; j++)
 			{
@@ -519,6 +524,7 @@ DefineIndex(Oid relationId,
 	bool		amcanorder;
 	amoptions_function amoptions;
 	bool		partitioned;
+	bool		safe_index;
 	Datum		reloptions;
 	int16	   *coloptions;
 	IndexInfo  *indexInfo;
@@ -1045,6 +1051,17 @@ DefineIndex(Oid relationId,
 		}
 	}
 
+	/*
+	 * When doing concurrent index builds, we can set a PGPROC flag to tell
+	 * concurrent VACUUM, CREATE INDEX CONCURRENTLY and REINDEX CONCURRENTLY
+	 * to ignore us when waiting for concurrent snapshots.  That can only be
+	 * done for indexes that don't execute any expressions.  Determine that.
+	 * (The flag is reset automatically at transaction end, so it must be
+	 * set for each transaction.)
+	 */
+	safe_index = indexInfo->ii_Expressions == NIL &&
+		indexInfo->ii_Predicate == NIL;
+
 	/*
 	 * Report index creation if appropriate (delay this till after most of the
 	 * error checks)
@@ -1431,6 +1448,15 @@ DefineIndex(Oid relationId,
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+	{
+		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+		MyProc->vacuumFlags |= PROC_IN_SAFE_IC;
+		ProcGlobal->vacuumFlags[MyProc->pgxactoff] = MyProc->vacuumFlags;
+		LWLockRelease(ProcArrayLock);
+	}
+
 	/*
 	 * The index is now visible, so we can report the OID.
 	 */
@@ -1490,6 +1516,15 @@ DefineIndex(Oid relationId,
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+	{
+		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+		MyProc->vacuumFlags |= PROC_IN_SAFE_IC;
+		ProcGlobal->vacuumFlags[MyProc->pgxactoff] = MyProc->vacuumFlags;
+		LWLockRelease(ProcArrayLock);
+	}
+
 	/*
 	 * Phase 3 of

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-11-03 Thread Dmitry Dolgov

> On Thu, Aug 20, 2020 at 03:11:19PM +0900, Michael Paquier wrote:
> On Wed, Aug 19, 2020 at 02:16:46PM -0400, Alvaro Herrera wrote:
> > I did not set the flag in REINDEX CONCURRENTLY, but as I understand it
> > can be done too, since in essence it's the same thing as a CIC from a
> > snapshot management point of view.
>
> Yes, I see no problems for REINDEX CONCURRENTLY as well as long as
> there are no predicates and expressions involved.  The transactions
> that should be patched are all started in ReindexRelationConcurrently.
> The transaction of index_concurrently_swap() cannot set up that
> though.  Only thing to be careful is to make sure that safe_flag is
> correct depending on the list of indexes worked on.

Hi,

After looking through the thread and reading the patch it seems good,
and there are only few minor questions:

* Doing the same for REINDEX CONCURRENTLY, which does make sense. In
  fact it's already mentioned in the commentaries as done, which a bit
  confusing.

* Naming, to be more precise what suggested Michael:

> Could we consider renaming vacuumFlags?  With more flags associated to
> a PGPROC entry that are not related to vacuum, the current naming
> makes things confusing.  Something like statusFlags could fit better
> in the picture?

  which sounds reasonable, and similar one about flag name
  PROC_IN_SAFE_CIC - if it covers both CREATE INDEX/REINDEX CONCURRENTLY
  maybe just PROC_IN_SAFE_IC?

Any plans about those questions? I can imagine that are the only missing
parts.

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-08-20 Thread Michael Paquier

On Wed, Aug 19, 2020 at 02:16:46PM -0400, Alvaro Herrera wrote:
> I did not set the flag in REINDEX CONCURRENTLY, but as I understand it
> can be done too, since in essence it's the same thing as a CIC from a
> snapshot management point of view.

Yes, I see no problems for REINDEX CONCURRENTLY as well as long as
there are no predicates and expressions involved.  The transactions
that should be patched are all started in ReindexRelationConcurrently.
The transaction of index_concurrently_swap() cannot set up that
though.  Only thing to be careful is to make sure that safe_flag is
correct depending on the list of indexes worked on.

> Also, per [1], ISTM this flag could be used to tell lazy VACUUM to
> ignore the Xmin of this process too, which the previous formulation
> (where all CICs were so marked) could not.  This patch doesn't do that
> yet, but it seems the natural next step to take.
> 
> [1] https://postgr.es/m/20191101203310.GA12239@alvherre.pgsql

Could we consider renaming vacuumFlags?  With more flags associated to
a PGPROC entry that are not related to vacuum, the current naming
makes things confusing.  Something like statusFlags could fit better
in the picture?
--
Michael

signature.asc
Description: PGP signature

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-08-19 Thread Alvaro Herrera

On 2020-Aug-11, James Coleman wrote:

> In [3] I'd brought up that a function index can do arbitrary
> operations (including, terribly, a query of another table), and Andres
> (in [4]) noted that:
> 
> > Well, even if we consider this an actual problem, we could still use
> > PROC_IN_CIC for non-expression non-partial indexes (index operator
> > themselves better ensure this isn't a problem, or they're ridiculously
> > broken already - they can get called during vacuum).
> 
> But went on to describe how this is a problem with all expression
> indexes (even if those expressions don't do dangerous things) because
> of syscache lookups, but that ideally for expression indexes we'd have
> a way to use a different (or more frequently taken) snapshot for the
> purpose of computing those expressions. That's a significantly more
> involved patch though.

So the easy first patch here is to add the flag as PROC_IN_SAFE_CIC,
which is set only for CIC when it's used for indexes that are neither
on expressions nor partial.  Then ignore those in WaitForOlderSnapshots.
The attached patch does it that way.  (Also updated per recent
conflicts).

I did not set the flag in REINDEX CONCURRENTLY, but as I understand it
can be done too, since in essence it's the same thing as a CIC from a
snapshot management point of view.

Also, per [1], ISTM this flag could be used to tell lazy VACUUM to
ignore the Xmin of this process too, which the previous formulation
(where all CICs were so marked) could not.  This patch doesn't do that
yet, but it seems the natural next step to take.

[1] https://postgr.es/m/20191101203310.GA12239@alvherre.pgsql

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 2a5088dfa35cbc800a87dc2154b6ebfa22837a66 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 4 Aug 2020 22:04:57 -0400
Subject: [PATCH v2] Avoid spurious CREATE INDEX CONCURRENTLY waits

---
 src/backend/commands/indexcmds.c | 50 ++--
 src/include/storage/proc.h   |  6 +++-
 2 files changed, 52 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 254dbcdce5..459f6fa5db 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -372,7 +372,10 @@ CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts)
  * lazy VACUUMs, because they won't be fazed by missing index entries
  * either.  (Manual ANALYZEs, however, can't be excluded because they
  * might be within transactions that are going to do arbitrary operations
- * later.)
+ * later.)  Processes running CREATE INDEX CONCURRENTLY or REINDEX CONCURRENTLY
+ * on indexes that are neither expressional nor partial are also safe to
+ * ignore, since we know that those processes won't examine any data
+ * outside the table they're indexing.
  *
  * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
  * check for that.
@@ -393,7 +396,8 @@ WaitForOlderSnapshots(TransactionId limitXmin, bool progress)
 	VirtualTransactionId *old_snapshots;
 
 	old_snapshots = GetCurrentVirtualXIDs(limitXmin, true, false,
-		  PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+		  PROC_IS_AUTOVACUUM | PROC_IN_VACUUM
+		  | PROC_IN_SAFE_CIC,
 		  _old_snapshots);
 	if (progress)
 		pgstat_progress_update_param(PROGRESS_WAITFOR_TOTAL, n_old_snapshots);
@@ -413,7 +417,8 @@ WaitForOlderSnapshots(TransactionId limitXmin, bool progress)
 
 			newer_snapshots = GetCurrentVirtualXIDs(limitXmin,
 	true, false,
-	PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+	PROC_IS_AUTOVACUUM | PROC_IN_VACUUM
+	| PROC_IN_SAFE_CIC,
 	_newer_snapshots);
 			for (j = i; j < n_old_snapshots; j++)
 			{
@@ -506,6 +511,7 @@ DefineIndex(Oid relationId,
 	bool		amcanorder;
 	amoptions_function amoptions;
 	bool		partitioned;
+	bool		safe_index;
 	Datum		reloptions;
 	int16	   *coloptions;
 	IndexInfo  *indexInfo;
@@ -1033,6 +1039,17 @@ DefineIndex(Oid relationId,
 		}
 	}
 
+	/*
+	 * When doing concurrent index builds, we can set a PGPROC flag to tell
+	 * concurrent VACUUM, CREATE INDEX CONCURRENTLY and REINDEX CONCURRENTLY
+	 * to ignore us when waiting for concurrent snapshots.  That can only be
+	 * done for indexes that don't execute any expressions.  Determine that.
+	 * (The flag is reset automatically at transaction end, so it must be
+	 * set for each transaction.)
+	 */
+	safe_index = indexInfo->ii_Expressions == NIL &&
+		indexInfo->ii_Predicate == NIL;
+
 	/*
 	 * Report index creation if appropriate (delay this till after most of the
 	 * error checks)
@@ -1419,6 +1436,15 @@ DefineIndex(Oid relationId,
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+	{
+		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+		MyProc->vacuumFlags |=

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-08-11 Thread James Coleman

On Tue, Aug 11, 2020 at 11:14 AM Tom Lane  wrote:
>
> James Coleman  writes:
> > Why is a CIC in active index-building something we need to wait for?
> > Wouldn't it fall under a similar kind of logic to the other snapshot
> > types we can explicitly ignore? CIC can't be run in a manual
> > transaction, so the snapshot it holds won't be used to perform
> > arbitrary operations (i.e., the reason why a manual ANALYZE can't be
> > ignored).
>
> Expression indexes that call user-defined functions seem like a
> pretty serious risk factor for that argument.  Those are exactly
> the same expressions that ANALYZE will evaluate, as a result of
> which we judge it unsafe to ignore.  Why would CIC be different?

The comments for WaitForOlderSnapshots() don't discuss that problem at
all; as far as ANALYZE goes they only say:

* Manual ANALYZEs, however, can't be excluded because they
* might be within transactions that are going to do arbitrary operations
* later.

But nonetheless over in the thread Álvaro linked to we'd discussed
these issues already. Andres in [1] and [2] believed that:

> For the snapshot waits we can add a procarray flag
> (alongside PROCARRAY_VACUUM_FLAG) indicating that the backend is
> doing. Which WaitForOlderSnapshots() can then use to ignore those CICs,
> which is safe, because those transactions definitely don't insert into
> relations targeted by CIC. The change to WaitForOlderSnapshots() would
> just be to pass the new flag to GetCurrentVirtualXIDs, I think.

and

> What I was thinking of was a new flag, with a distinct value from
> PROC_IN_VACUUM. It'd currently just be specified in the
> GetCurrentVirtualXIDs() calls in WaitForOlderSnapshots(). That'd avoid
> needing to wait for other CICs on different relations. Since CIC is not
> permitted on system tables, and CIC doesn't do DML on normal tables, it
> seems fairly obviously correct to exclude other CICs.

In [3] I'd brought up that a function index can do arbitrary
operations (including, terribly, a query of another table), and Andres
(in [4]) noted that:

> Well, even if we consider this an actual problem, we could still use
> PROC_IN_CIC for non-expression non-partial indexes (index operator
> themselves better ensure this isn't a problem, or they're ridiculously
> broken already - they can get called during vacuum).

But went on to describe how this is a problem with all expression
indexes (even if those expressions don't do dangerous things) because
of syscache lookups, but that ideally for expression indexes we'd have
a way to use a different (or more frequently taken) snapshot for the
purpose of computing those expressions. That's a significantly more
involved patch though.

So from what I understand, everything that I'd claimed in my previous
message still holds true for non-expression/non-partial indexes. Is
there something else I'm missing?

James

1: 
https://www.postgresql.org/message-id/20200325191935.jjhdg2zy5k7ath5v%40alap3.anarazel.de
2: 
https://www.postgresql.org/message-id/20200325195841.gq4hpl25t6pxv3gl%40alap3.anarazel.de
3: 
https://www.postgresql.org/message-id/CAAaqYe_fveT_tvKonVt1imujOURUUMrydGeaBGahqLLy9D-REw%40mail.gmail.com
4: 
https://www.postgresql.org/message-id/20200416221207.wmnzbxvjqqpazeob%40alap3.anarazel.de

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-08-11 Thread Tom Lane

James Coleman  writes:
> Why is a CIC in active index-building something we need to wait for?
> Wouldn't it fall under a similar kind of logic to the other snapshot
> types we can explicitly ignore? CIC can't be run in a manual
> transaction, so the snapshot it holds won't be used to perform
> arbitrary operations (i.e., the reason why a manual ANALYZE can't be
> ignored).

Expression indexes that call user-defined functions seem like a
pretty serious risk factor for that argument.  Those are exactly
the same expressions that ANALYZE will evaluate, as a result of
which we judge it unsafe to ignore.  Why would CIC be different?

regards, tom lane

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-08-10 Thread James Coleman

On Mon, Aug 10, 2020 at 8:37 PM Tom Lane  wrote:
>
> Alvaro Herrera  writes:
> > To recap: currently, any CREATE INDEX CONCURRENTLY will wait for all
> > other CICs running concurrently to finish, because they can't be
> > distinguished amidst other old snapshots.  We can change things by
> > having CIC set a special flag in PGPROC (like PROC_IN_VACUUM) indicating
> > that it's doing CIC; other CICs will see that flag and will know that
> > they don't need to wait for those processes.  With this, CIC on small
> > tables don't have to wait for CIC on large tables to complete.
>
> Hm.  +1 for improving this, if we can, but ...
>
> It seems clearly unsafe to ignore a CIC that is in active index-building;
> a snapshot held for that purpose is just as real as any other.  It *might*
> be all right to ignore a CIC that is just waiting, but you haven't made
> any argument in the patch comments as to why that's safe either.
> (Moreover, at the points where we're just waiting, I don't think we have
> a snapshot, so another CIC's WaitForOlderSnapshots shouldn't wait for us
> anyway.)

Why is a CIC in active index-building something we need to wait for?
Wouldn't it fall under a similar kind of logic to the other snapshot
types we can explicitly ignore? CIC can't be run in a manual
transaction, so the snapshot it holds won't be used to perform
arbitrary operations (i.e., the reason why a manual ANALYZE can't be
ignored).

> Actually, it doesn't look like you've touched the comments at all.
> WaitForOlderSnapshots' header comment has a long explanation of why
> it's safe to ignore certain processes.  That certainly needs to be
> updated by any patch that's going to change the rules.

Agreed that the comment needs to be updated to discuss the
(im)possibility of arbitrary operations within a snapshot held by CIC.

James

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-08-10 Thread Tom Lane

Alvaro Herrera  writes:
> To recap: currently, any CREATE INDEX CONCURRENTLY will wait for all
> other CICs running concurrently to finish, because they can't be
> distinguished amidst other old snapshots.  We can change things by
> having CIC set a special flag in PGPROC (like PROC_IN_VACUUM) indicating
> that it's doing CIC; other CICs will see that flag and will know that
> they don't need to wait for those processes.  With this, CIC on small
> tables don't have to wait for CIC on large tables to complete.

Hm.  +1 for improving this, if we can, but ...

It seems clearly unsafe to ignore a CIC that is in active index-building;
a snapshot held for that purpose is just as real as any other.  It *might*
be all right to ignore a CIC that is just waiting, but you haven't made
any argument in the patch comments as to why that's safe either.
(Moreover, at the points where we're just waiting, I don't think we have
a snapshot, so another CIC's WaitForOlderSnapshots shouldn't wait for us
anyway.)

Actually, it doesn't look like you've touched the comments at all.
WaitForOlderSnapshots' header comment has a long explanation of why
it's safe to ignore certain processes.  That certainly needs to be
updated by any patch that's going to change the rules.

BTW, what about REINDEX CONCURRENTLY?

regards, tom lane

Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-08-10 Thread Alvaro Herrera

+ James Coleman

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

remove spurious CREATE INDEX CONCURRENTLY wait

2020-08-10 Thread Alvaro Herrera

I previously[1] posted a patch to have multiple CREATE INDEX CONCURRENTLY
not wait for the slowest of them.  This is an update of that, with minor
conflicts fixed and a fresh thread.

To recap: currently, any CREATE INDEX CONCURRENTLY will wait for all
other CICs running concurrently to finish, because they can't be
distinguished amidst other old snapshots.  We can change things by
having CIC set a special flag in PGPROC (like PROC_IN_VACUUM) indicating
that it's doing CIC; other CICs will see that flag and will know that
they don't need to wait for those processes.  With this, CIC on small
tables don't have to wait for CIC on large tables to complete.

[1] https://postgr.es/m/20200805021109.GA9079@alvherre.pgsql


-- 
Álvaro Herrerahttp://www.linkedin.com/in/alvherre
"Escucha y olvidarás; ve y recordarás; haz y entenderás" (Confucio)
>From 2596c3033aacceb021463f58b50e2c4eed8a5ab2 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 4 Aug 2020 22:04:57 -0400
Subject: [PATCH] Flag CREATE INDEX CONCURRENTLY to avoid spurious waiting

---
 src/backend/commands/indexcmds.c | 13 +++--
 src/include/storage/proc.h   |  4 +++-
 src/include/storage/procarray.h  |  4 
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index a3cb3cd47f..241c8a337e 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -393,7 +393,7 @@ WaitForOlderSnapshots(TransactionId limitXmin, bool progress)
 	VirtualTransactionId *old_snapshots;
 
 	old_snapshots = GetCurrentVirtualXIDs(limitXmin, true, false,
-		  PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+		  PROC_IS_AUTOVACUUM | PROC_IN_VACUUM | PROC_IN_CIC,
 		  _old_snapshots);
 	if (progress)
 		pgstat_progress_update_param(PROGRESS_WAITFOR_TOTAL, n_old_snapshots);
@@ -413,7 +413,7 @@ WaitForOlderSnapshots(TransactionId limitXmin, bool progress)
 
 			newer_snapshots = GetCurrentVirtualXIDs(limitXmin,
 	true, false,
-	PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+	PROC_IS_AUTOVACUUM | PROC_IN_VACUUM | PROC_IN_CIC,
 	_newer_snapshots);
 			for (j = i; j < n_old_snapshots; j++)
 			{
@@ -1420,6 +1420,9 @@ DefineIndex(Oid relationId,
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Set flag for other concurrent index builds to ignore us */
+	MyPgXact->vacuumFlags |= PROC_IN_CIC;
+
 	/*
 	 * The index is now visible, so we can report the OID.
 	 */
@@ -1482,6 +1485,9 @@ DefineIndex(Oid relationId,
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Set flag for other concurrent index builds to ignore us */
+	MyPgXact->vacuumFlags |= PROC_IN_CIC;
+
 	/*
 	 * Phase 3 of concurrent index build
 	 *
@@ -1541,6 +1547,9 @@ DefineIndex(Oid relationId,
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/* Set flag for other concurrent index builds to ignore us */
+	MyPgXact->vacuumFlags |= PROC_IN_CIC;
+
 	/* We should now definitely not be advertising any xmin. */
 	Assert(MyPgXact->xmin == InvalidTransactionId);
 
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 5ceb2494ba..b030dcde6c 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -52,6 +52,8 @@ struct XidCache
  */
 #define		PROC_IS_AUTOVACUUM	0x01	/* is it an autovac worker? */
 #define		PROC_IN_VACUUM		0x02	/* currently running lazy vacuum */
+#define		PROC_IN_CIC			0x04	/* currently running CREATE INDEX
+		   CONCURRENTLY */
 #define		PROC_VACUUM_FOR_WRAPAROUND	0x08	/* set by autovac only */
 #define		PROC_IN_LOGICAL_DECODING	0x10	/* currently doing logical
  * decoding outside xact */
@@ -59,7 +61,7 @@ struct XidCache
 
 /* flags reset at EOXact */
 #define		PROC_VACUUM_STATE_MASK \
-	(PROC_IN_VACUUM | PROC_VACUUM_FOR_WRAPAROUND)
+	(PROC_IN_VACUUM | PROC_IN_CIC | PROC_VACUUM_FOR_WRAPAROUND)
 
 /*
  * We allow a small number of "weak" relation locks (AccessShareLock,
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index 01040d76e1..c6edeb36e0 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -29,17 +29,21 @@
  */
 #define		PROCARRAY_VACUUM_FLAG			0x02	/* currently running lazy
 	 * vacuum */
+#define		PROCARRAY_CIC_FLAG0x04	/* currently running CREATE INDEX
+	 * CONCURRENTLY */
 #define		PROCARRAY_LOGICAL_DECODING_FLAG 0x10	/* currently doing logical
 	 * decoding outside xact */
 
 #define		PROCARRAY_SLOTS_XMIN			0x20	/* replication slot xmin,
 	 * catalog_xmin */
+
 /*
  * Only flags in PROCARRAY_PROC_FLAGS_MASK are considered when matching
  * PGXACT->vacuumFlags. Other flags are used for different purposes and
  * have no corresponding PROC flag equivalent.
  */
 #define		PROCARRAY_PROC_FLAGS_MASK	(PROCARRAY_VACUUM_FLAG | \
+		 PROCARRAY_CIC_FLAG | \

64 matches

Mail list logo