On 2020/09/08 13:23, Ian Barwick wrote:
On 2020/09/08 13:11, Andres Freund wrote:
Hi,

On 2020-09-08 13:03:01 +0900, Ian Barwick wrote:
(...)
I wonder if it's possible to increment "xactCompletionCount"
during replay along these lines:

     *** a/src/backend/access/transam/xact.c
     --- b/src/backend/access/transam/xact.c
     *************** xact_redo_commit(xl_xact_parsed_commit *
     *** 5915,5920 ****
     --- 5915,5924 ----
              */
             if (XactCompletionApplyFeedback(parsed->xinfo))
                     XLogRequestWalReceiverReply();
     +
     +       LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
     +       ShmemVariableCache->xactCompletionCount++;
     +       LWLockRelease(ProcArrayLock);
       }

which seems to work (though quite possibly I've overlooked something I don't
know that I don't know about and it will all break horribly somewhere,
etc. etc.).

We'd also need the same in a few more places. Probably worth looking at
the list where we increment it on the primary (particularly we need to
also increment it for aborts, and 2pc commit/aborts).

Yup.

At first I was very confused as to why none of the existing tests have
found this significant issue. But after thinking about it for a minute
that's because they all use psql, and largely separate psql invocations
for each query :(. Which means that there's no cached snapshot around...

Do you want to try to write a patch?

Sure, I'll give it a go as I have some time right now.


Attached, though bear in mind I'm not very familiar with parts of this,
particularly 2PC stuff, so consider it educated guesswork.


Regards


Ian Barwick

--
Ian Barwick                   https://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
commit 544e2b1661413fe08e3083f03063c12c0d7cf3aa
Author: Ian Barwick <i...@2ndquadrant.com>
Date:   Tue Sep 8 12:24:14 2020 +0900

    Fix snapshot caching on standbys
    
    Addresses issue introduced in 623a9ba.

diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index b8bedca04a..227d03bbce 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -3320,6 +3320,14 @@ multixact_redo(XLogReaderState *record)
 	}
 	else
 		elog(PANIC, "multixact_redo: unknown op code %u", info);
+
+	/*
+	 * Advance xactCompletionCount so rebuilds of snapshot contents
+	 * can be triggered.
+	 */
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ShmemVariableCache->xactCompletionCount++;
+	LWLockRelease(ProcArrayLock);
 }
 
 Datum
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index af6afcebb1..04ca858918 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -5915,6 +5915,14 @@ xact_redo_commit(xl_xact_parsed_commit *parsed,
 	 */
 	if (XactCompletionApplyFeedback(parsed->xinfo))
 		XLogRequestWalReceiverReply();
+
+	/*
+	 * Advance xactCompletionCount so rebuilds of snapshot contents
+	 * can be triggered.
+	 */
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ShmemVariableCache->xactCompletionCount++;
+	LWLockRelease(ProcArrayLock);
 }
 
 /*
@@ -5978,6 +5986,14 @@ xact_redo_abort(xl_xact_parsed_abort *parsed, TransactionId xid)
 
 	/* Make sure files supposed to be dropped are dropped */
 	DropRelationFiles(parsed->xnodes, parsed->nrels, true);
+
+	/*
+	 * Advance xactCompletionCount so rebuilds of snapshot contents
+	 * can be triggered.
+	 */
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ShmemVariableCache->xactCompletionCount++;
+	LWLockRelease(ProcArrayLock);
 }
 
 void

Reply via email to