sync_standbys_defined read/write race on startup

Maksim.Melnikov Wed, 26 Mar 2025 09:41:39 -0700

Hi,

On my windows machine I've found unstable test, it fails very rarely,but anyway, sometimes it fails.


src/test/recovery/t/009_twophase.pl

# Failed test 'Committed prepared transaction is visible to newsnapshot in standby'

#   at t/009_twophase.pl line 360.
#          got: '0'
#     expected: '2'
# Looks like you failed 1 test of 30.
t/009_twophase.pl ....................
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/30 subtests

cat src/test/recovery/t/009_twophase.pl

....
$cur_primary->psql(
    'postgres', "

SET synchronous_commit='remote_apply'; -- To ensure the standby iscaught up

    CREATE TABLE t_009_tbl_standby_mvcc (id int, msg text);
    BEGIN;

INSERT INTO t_009_tbl_standby_mvcc VALUES (1, 'issued to${cur_primary_name}');

    SAVEPOINT s1;

INSERT INTO t_009_tbl_standby_mvcc VALUES (2, 'issued to${cur_primary_name}');

    PREPARE TRANSACTION 'xact_009_standby_mvcc';
    ");
$cur_primary->stop;
$cur_standby->restart;

# Acquire a snapshot in standby, before we commit the prepared transaction
my $standby_session =
  $cur_standby->background_psql('postgres', on_error_die => 1);
$standby_session->query_safe("BEGIN ISOLATION LEVEL REPEATABLE READ");
$psql_out =

$standby_session->query_safe("SELECT count(*) FROMt_009_tbl_standby_mvcc");

is($psql_out, '0',
    "Prepared transaction not visible in standby before commit");

# Commit the transaction in primary
$cur_primary->start;
$cur_primary->psql(
    'postgres', "
SET synchronous_commit='remote_apply'; -- To ensure the standby is caught up
COMMIT PREPARED 'xact_009_standby_mvcc';
");

# Still not visible to the old snapshot
$psql_out =

$standby_session->query_safe("SELECT count(*) FROMt_009_tbl_standby_mvcc");

is($psql_out, '0',

"Committed prepared transaction not visible to old snapshot instandby");


# Is visible to a new snapshot
$standby_session->query_safe("COMMIT");
$psql_out =

$standby_session->query_safe("SELECT count(*) FROMt_009_tbl_standby_mvcc");

*is($psql_out, '2',

"Committed prepared transaction is visible to new snapshot instandby");*

$standby_session->quit;
.....

Test check visibility of prepared transactions in standby after arestart while primary is down.Failed assert check that changes, commited on transaction on primarynode, were synchronously replicated to standby node.

In test flow we have primary server stop/start events.

Assert failure is hard to reproduce, but I found how to do itsynthetically, can do it by injecting in code synthetical sleep call


--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -924,6 +924,7 @@ SyncRepUpdateSyncStandbysDefined(void)

        if (sync_standbys_defined != WalSndCtl->sync_standbys_defined)
        {
+               sleep(2);
                LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);

                /*

The root cause is WalSndCtl->sync_standbys_defined, it is located inshared memory and can be updated by checkpointer process only,in backends it can be read in case of sync replication. So we have raceon postgres startup between backend, processing commit request, andcheckpointer,

setting WalSndCtl->sync_standbys_defined.

Synchronous replication occurred in SyncRepWaitForLSN(syncrep.c)

void
SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
{
.....

    /*

* Fast exit if user has not requested sync replication, or thereare no

     * sync replication standby names defined.
     *
     * Since this routine gets called every commit time, it's important to
     * exit quickly if sync replication is not requested. So we check
     * WalSndCtl->sync_standbys_defined flag without the lock and exit
     * immediately if it's false. If it's true, we need to check it again
     * later while holding the lock, to check the flag and operate the sync
     * rep queue atomically. This is necessary to avoid the race condition

* described in SyncRepUpdateSyncStandbysDefined(). On the otherhand, if * it's false, the lock is not necessary because we don't touch thequeue.

     */
    if (!SyncRepRequested() ||
        !((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined)
        return;

.....

    if (!WalSndCtl->sync_standbys_defined ||
        lsn <= WalSndCtl->lsn[mode])
    {
        LWLockRelease(SyncRepLock);
        return;
    }

...........

Checkpointer set WalSndCtl->sync_standbys_defined parameter insrc/backend/replication/syncrep.c,

SyncRepUpdateSyncStandbysDefined method

void
SyncRepUpdateSyncStandbysDefined(void)
{
    bool        sync_standbys_defined = SyncStandbysDefined();

    if (sync_standbys_defined != WalSndCtl->sync_standbys_defined)
    {
        LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
..................
        WalSndCtl->sync_standbys_defined = sync_standbys_defined;

        LWLockRelease(SyncRepLock);
    }
}

So my question is

can we accept such behavior as bug of postgres codebase, or, on otherway, it is bug of unstable test?

P.S.

For me it is looking strange, that postgres start processing workloadbefore full config initializing.

Attached possible patch for codebase.

Best regards
Melnikov Maksim

From 8fa295c81a8db88568d45da3f779d3097aeb6942 Mon Sep 17 00:00:00 2001
From: Maksim Melnikov <m.melni...@postgrespro.ru>
Date: Wed, 26 Mar 2025 16:44:09 +0300
Subject: [PATCH] Fix sync_standbys_defined race on startup.

WalSndCtl->sync_standbys_defined, can be updated
by checkpointer process only and in backends it
can be read. So, on postgres startup, we have race
between checkpointer process and backends, that check
sync_standbys_defined for sync replications.

To avoid this, now we are working with sync_standbys_defined
not as bool field, but as bitmask, where 7th bit check
standbys_defined or not, 8th bit check
WalSndCtlData->sync_standbys_defined field has been
initialized by checkpointer. So if in synchronous replication functions
was detected that sync_standbys_defined is not initialized,
check SyncStandbysDefined() value.
---
 src/backend/replication/syncrep.c           | 31 +++++++++++++++++----
 src/include/replication/walsender_private.h | 10 ++++++-
 2 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index d75e3968035..cd29bfbdc94 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -148,6 +148,7 @@ void
 SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 {
 	int			mode;
+	bool		sync_standbys_defined;
 
 	/*
 	 * This should be called while holding interrupts during a transaction
@@ -170,7 +171,8 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 	 * it's false, the lock is not necessary because we don't touch the queue.
 	 */
 	if (!SyncRepRequested() ||
-		!((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined)
+		(((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined & SYNCSTANDBYSDEFINEDINITIALIZED)
+		== SYNCSTANDBYSINITIALIZED)
 		return;
 
 	/* Cap the level for anything other than commit to remote flush only. */
@@ -184,6 +186,8 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 
 	LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
 	Assert(MyProc->syncRepState == SYNC_REP_NOT_WAITING);
+	sync_standbys_defined = WalSndCtl->sync_standbys_defined & SYNCSTANDBYSINITIALIZED
+		? WalSndCtl->sync_standbys_defined & SYNCSTANDBYSDEFINED : SyncStandbysDefined();
 
 	/*
 	 * We don't wait for sync rep if WalSndCtl->sync_standbys_defined is not
@@ -193,7 +197,7 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 	 * condition but we'll be fetching that cache line anyway so it's likely
 	 * to be a low cost check.
 	 */
-	if (!WalSndCtl->sync_standbys_defined ||
+	if (!sync_standbys_defined ||
 		lsn <= WalSndCtl->lsn[mode])
 	{
 		LWLockRelease(SyncRepLock);
@@ -920,9 +924,10 @@ SyncRepWakeQueue(bool all, int mode)
 void
 SyncRepUpdateSyncStandbysDefined(void)
 {
-	bool		sync_standbys_defined = SyncStandbysDefined();
+	bool		new_sync_standbys_defined = SyncStandbysDefined();
+	bool		old_sync_standbys_defined = (WalSndCtl->sync_standbys_defined & SYNCSTANDBYSDEFINED);
 
-	if (sync_standbys_defined != WalSndCtl->sync_standbys_defined)
+	if (new_sync_standbys_defined != old_sync_standbys_defined)
 	{
 		LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
 
@@ -931,7 +936,7 @@ SyncRepUpdateSyncStandbysDefined(void)
 		 * for backends to continue waiting.  Since the user no longer wants
 		 * synchronous replication, we'd better wake them up.
 		 */
-		if (!sync_standbys_defined)
+		if (!new_sync_standbys_defined)
 		{
 			int			i;
 
@@ -946,10 +951,24 @@ SyncRepUpdateSyncStandbysDefined(void)
 		 * backend that hasn't yet reloaded its config might go to sleep on
 		 * the queue (and never wake up).  This prevents that.
 		 */
-		WalSndCtl->sync_standbys_defined = sync_standbys_defined;
+		WalSndCtl->sync_standbys_defined |= new_sync_standbys_defined
+			? (SYNCSTANDBYSDEFINEDINITIALIZED)
+			: SYNCSTANDBYSINITIALIZED;
 
 		LWLockRelease(SyncRepLock);
 	}
+	else if (!(WalSndCtl->sync_standbys_defined & SYNCSTANDBYSINITIALIZED))
+	{
+		/*
+		 * Even if new_sync_standbys_defined == old_sync_standbys_defined, we
+		 * should check WalSndCtl->sync_standbys_defined is initialized and
+		 * set its initialize bit, if it reset. Anyway this bit should be set
+		 * only on start and then it will be always setted.
+		 */
+		LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
+		WalSndCtl->sync_standbys_defined |= SYNCSTANDBYSINITIALIZED;
+		LWLockRelease(SyncRepLock);
+	}
 }
 
 #ifdef USE_ASSERT_CHECKING
diff --git a/src/include/replication/walsender_private.h b/src/include/replication/walsender_private.h
index 0fc77f1b4af..5b096a1dc90 100644
--- a/src/include/replication/walsender_private.h
+++ b/src/include/replication/walsender_private.h
@@ -100,7 +100,7 @@ typedef struct
 	 * config file safely, so checkpointer updates this value as needed.
 	 * Protected by SyncRepLock.
 	 */
-	bool		sync_standbys_defined;
+	uint8		sync_standbys_defined;
 
 	/* used as a registry of physical / logical walsenders to wake */
 	ConditionVariable wal_flush_cv;
@@ -116,6 +116,14 @@ typedef struct
 	WalSnd		walsnds[FLEXIBLE_ARRAY_MEMBER];
 } WalSndCtlData;
 
+/*
+ * WalSndCtlData->sync_standbys_defined bit flags. 7th bit check standbys_defined or not,
+ * 8th bit check WalSndCtlData->sync_standbys_defined field has been initialized.
+ */
+#define SYNCSTANDBYSINITIALIZED	(1 << 0)
+#define SYNCSTANDBYSDEFINED		(1 << 2)
+#define SYNCSTANDBYSDEFINEDINITIALIZED	(SYNCSTANDBYSINITIALIZED & SYNCSTANDBYSDEFINED)
+
 extern PGDLLIMPORT WalSndCtlData *WalSndCtl;
 
 
-- 
2.43.0

sync_standbys_defined read/write race on startup

Reply via email to