On Tue, Mar 2, 2021 at 12:00 AM Thomas Munro <thomas.mu...@gmail.com> wrote:
> On Mon, Nov 16, 2020 at 8:56 PM Michael Paquier <mich...@paquier.xyz> wrote:
> > No objections with the two changes from pg_usleep() to WaitLatch() so
> > they could be applied separately first.
>
> I thought about committing that first part, and got as far as
> splitting the patch into two (see attached), but then I re-read
> Fujii-san's message about the speed of promotion and realised that we
> really should have something like a condition variable for walRcvState
> changes.  I'll look into that.

Here's an experimental attempt at that, though I'm not sure if it's
the right approach.  Of course it's not necessary to use condition
variables here: we could use recoveryWakeupLatch, because we're not in
any doubt about who needs to be woken up.  But then you could get
constant wakeups while recovery is paused, unless you also suppressed
that somehow.  You could use the startup process's procLatch,
advertised in shmem, but that's almost a condition variable.  With a
condition variable, you get to name it something like
walRcvStateChanged, and then the programming rule is very clear: if
you change walRcvState, you need to broadcast that fact (and you don't
have to worry about who might be interested).  One question I haven't
got to the bottom of: is it a problem for the startup process that CVs
use CHECK_FOR_INTERRUPTS()?
From 1e28b9c21a953e66c48f78a90192f3a0ca83d0aa Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Tue, 2 Mar 2021 11:08:06 +1300
Subject: [PATCH v6 1/3] Add condition variable for walreceiver state.

Use this new CV to wait for walreceiver shutdown without a sleep/poll
loop, while also benefiting from standard postmaster death handling.

Reviewed-by: Heikki Linnakangas <hlinn...@iki.fi>
Reviewed-by: Fujii Masao <masao.fu...@oss.nttdata.com>
Reviewed-by: Michael Paquier <mich...@paquier.xyz>
Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com
---
 doc/src/sgml/monitoring.sgml               |  4 +++
 src/backend/postmaster/pgstat.c            |  3 ++
 src/backend/replication/walreceiver.c      | 10 ++++++
 src/backend/replication/walreceiverfuncs.c | 42 ++++++++++++++--------
 src/include/pgstat.h                       |  1 +
 src/include/replication/walreceiver.h      |  2 ++
 6 files changed, 48 insertions(+), 14 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3513e127b7..cf00210cb3 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1757,6 +1757,10 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry>Waiting for confirmation from a remote server during synchronous
        replication.</entry>
      </row>
+     <row>
+      <entry><literal>WalrcvExit</literal></entry>
+      <entry>Waiting for the walreceiver to exit.</entry>
+     </row>
      <row>
       <entry><literal>XactGroupUpdate</literal></entry>
       <entry>Waiting for the group leader to update transaction status at
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index f75b52719d..2dbfb81e40 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4122,6 +4122,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_SYNC_REP:
 			event_name = "SyncRep";
 			break;
+		case WAIT_EVENT_WALRCV_EXIT:
+			event_name = "WalrcvExit";
+			break;
 		case WAIT_EVENT_XACT_GROUP_UPDATE:
 			event_name = "XactGroupUpdate";
 			break;
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index e5f8a06fea..397a94d7af 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -207,6 +207,8 @@ WalReceiverMain(void)
 
 		case WALRCV_STOPPED:
 			SpinLockRelease(&walrcv->mutex);
+			/* We might have changed state and fallen through above. */
+			ConditionVariableBroadcast(&walrcv->walRcvStateChanged);
 			proc_exit(1);
 			break;
 
@@ -249,6 +251,8 @@ WalReceiverMain(void)
 
 	SpinLockRelease(&walrcv->mutex);
 
+	ConditionVariableBroadcast(&walrcv->walRcvStateChanged);
+
 	pg_atomic_write_u64(&WalRcv->writtenUpto, 0);
 
 	/* Arrange to clean up at walreceiver exit */
@@ -647,6 +651,8 @@ WalRcvWaitForStartPosition(XLogRecPtr *startpoint, TimeLineID *startpointTLI)
 	walrcv->receiveStartTLI = 0;
 	SpinLockRelease(&walrcv->mutex);
 
+	ConditionVariableBroadcast(&walrcv->walRcvStateChanged);
+
 	set_ps_display("idle");
 
 	/*
@@ -675,6 +681,8 @@ WalRcvWaitForStartPosition(XLogRecPtr *startpoint, TimeLineID *startpointTLI)
 			*startpointTLI = walrcv->receiveStartTLI;
 			walrcv->walRcvState = WALRCV_STREAMING;
 			SpinLockRelease(&walrcv->mutex);
+
+			ConditionVariableBroadcast(&walrcv->walRcvStateChanged);
 			break;
 		}
 		if (walrcv->walRcvState == WALRCV_STOPPING)
@@ -784,6 +792,8 @@ WalRcvDie(int code, Datum arg)
 	walrcv->latch = NULL;
 	SpinLockRelease(&walrcv->mutex);
 
+	ConditionVariableBroadcast(&walrcv->walRcvStateChanged);
+
 	/* Terminate the connection gracefully. */
 	if (wrconn != NULL)
 		walrcv_disconnect(wrconn);
diff --git a/src/backend/replication/walreceiverfuncs.c b/src/backend/replication/walreceiverfuncs.c
index 63e60478ea..9106f43d51 100644
--- a/src/backend/replication/walreceiverfuncs.c
+++ b/src/backend/replication/walreceiverfuncs.c
@@ -23,8 +23,10 @@
 #include <signal.h>
 
 #include "access/xlog_internal.h"
+#include "pgstat.h"
 #include "postmaster/startup.h"
 #include "replication/walreceiver.h"
+#include "storage/latch.h"
 #include "storage/pmsignal.h"
 #include "storage/shmem.h"
 #include "utils/timestamp.h"
@@ -62,6 +64,7 @@ WalRcvShmemInit(void)
 		/* First time through, so initialize */
 		MemSet(WalRcv, 0, WalRcvShmemSize());
 		WalRcv->walRcvState = WALRCV_STOPPED;
+		ConditionVariableInit(&WalRcv->walRcvStateChanged);
 		SpinLockInit(&WalRcv->mutex);
 		pg_atomic_init_u64(&WalRcv->writtenUpto, 0);
 		WalRcv->latch = NULL;
@@ -95,12 +98,18 @@ WalRcvRunning(void)
 
 		if ((now - startTime) > WALRCV_STARTUP_TIMEOUT)
 		{
-			SpinLockAcquire(&walrcv->mutex);
+			bool		state_changed = false;
 
+			SpinLockAcquire(&walrcv->mutex);
 			if (walrcv->walRcvState == WALRCV_STARTING)
+			{
 				state = walrcv->walRcvState = WALRCV_STOPPED;
-
+				state_changed = true;
+			}
 			SpinLockRelease(&walrcv->mutex);
+
+			if (state_changed)
+				ConditionVariableBroadcast(&walrcv->walRcvStateChanged);
 		}
 	}
 
@@ -140,12 +149,18 @@ WalRcvStreaming(void)
 
 		if ((now - startTime) > WALRCV_STARTUP_TIMEOUT)
 		{
-			SpinLockAcquire(&walrcv->mutex);
+			bool		state_changed = false;
 
+			SpinLockAcquire(&walrcv->mutex);
 			if (walrcv->walRcvState == WALRCV_STARTING)
+			{
 				state = walrcv->walRcvState = WALRCV_STOPPED;
-
+				state_changed = true;
+			}
 			SpinLockRelease(&walrcv->mutex);
+
+			if (state_changed)
+				ConditionVariableBroadcast(&walrcv->walRcvStateChanged);
 		}
 	}
 
@@ -191,6 +206,8 @@ ShutdownWalRcv(void)
 	}
 	SpinLockRelease(&walrcv->mutex);
 
+	ConditionVariableBroadcast(&walrcv->walRcvStateChanged);
+
 	/*
 	 * Signal walreceiver process if it was still running.
 	 */
@@ -199,18 +216,13 @@ ShutdownWalRcv(void)
 
 	/*
 	 * Wait for walreceiver to acknowledge its death by setting state to
-	 * WALRCV_STOPPED.
+	 * WALRCV_STOPPED.  This wait contains a standard CHECK_FOR_INTERRUPTS().
 	 */
+	ConditionVariablePrepareToSleep(&walrcv->walRcvStateChanged);
 	while (WalRcvRunning())
-	{
-		/*
-		 * This possibly-long loop needs to handle interrupts of startup
-		 * process.
-		 */
-		HandleStartupProcInterrupts();
-
-		pg_usleep(100000);		/* 100ms */
-	}
+		ConditionVariableSleep(&walrcv->walRcvStateChanged,
+							   WAIT_EVENT_WALRCV_EXIT);
+	ConditionVariableCancelSleep();
 }
 
 /*
@@ -298,6 +310,8 @@ RequestXLogStreaming(TimeLineID tli, XLogRecPtr recptr, const char *conninfo,
 
 	SpinLockRelease(&walrcv->mutex);
 
+	ConditionVariableBroadcast(&walrcv->walRcvStateChanged);
+
 	if (launch)
 		SendPostmasterSignal(PMSIGNAL_START_WALRECEIVER);
 	else if (latch)
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 724068cf87..222e38037c 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -998,6 +998,7 @@ typedef enum
 	WAIT_EVENT_REPLICATION_SLOT_DROP,
 	WAIT_EVENT_SAFE_SNAPSHOT,
 	WAIT_EVENT_SYNC_REP,
+	WAIT_EVENT_WALRCV_EXIT,
 	WAIT_EVENT_XACT_GROUP_UPDATE
 } WaitEventIPC;
 
diff --git a/src/include/replication/walreceiver.h b/src/include/replication/walreceiver.h
index a97a59a6a3..bddea21a30 100644
--- a/src/include/replication/walreceiver.h
+++ b/src/include/replication/walreceiver.h
@@ -19,6 +19,7 @@
 #include "port/atomics.h"
 #include "replication/logicalproto.h"
 #include "replication/walsender.h"
+#include "storage/condition_variable.h"
 #include "storage/latch.h"
 #include "storage/spin.h"
 #include "utils/tuplestore.h"
@@ -62,6 +63,7 @@ typedef struct
 	 */
 	pid_t		pid;
 	WalRcvState walRcvState;
+	ConditionVariable walRcvStateChanged;
 	pg_time_t	startTime;
 
 	/*
-- 
2.30.1

From f03e945f140d52a9e4910ab0d66a01d659c43c9f Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Tue, 2 Mar 2021 11:47:39 +1300
Subject: [PATCH v6 2/3] Add condition variable for recovery pause/resume.

This gives us a fast reaction time when recovery is resumed or the
postmaster exits.  Unfortunately we still need to wake up every second
to perform more expensive checks during the recovery pause loop.

Reviewed-by: Heikki Linnakangas <hlinn...@iki.fi>
Reviewed-by: Fujii Masao <masao.fu...@oss.nttdata.com>
Reviewed-by: Michael Paquier <mich...@paquier.xyz>
Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com
---
 src/backend/access/transam/xlog.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 377afb8732..17240e9b8f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -723,6 +723,7 @@ typedef struct XLogCtlData
 	TimestampTz currentChunkStartTime;
 	/* Are we requested to pause recovery? */
 	bool		recoveryPause;
+	ConditionVariable recoveryPauseChanged;
 
 	/*
 	 * lastFpwDisableRecPtr points to the start of the last replayed
@@ -5204,6 +5205,7 @@ XLOGShmemInit(void)
 	SpinLockInit(&XLogCtl->info_lck);
 	SpinLockInit(&XLogCtl->ulsn_lck);
 	InitSharedLatch(&XLogCtl->recoveryWakeupLatch);
+	ConditionVariableInit(&XLogCtl->recoveryPauseChanged);
 }
 
 /*
@@ -6016,10 +6018,6 @@ recoveryStopsAfter(XLogReaderState *record)
  * endOfRecovery is true if the recovery target is reached and
  * the paused state starts at the end of recovery because of
  * recovery_target_action=pause, and false otherwise.
- *
- * XXX Could also be done with shared latch, avoiding the pg_usleep loop.
- * Probably not worth the trouble though.  This state shouldn't be one that
- * anyone cares about server power consumption in.
  */
 static void
 recoveryPausesHere(bool endOfRecovery)
@@ -6041,15 +6039,22 @@ recoveryPausesHere(bool endOfRecovery)
 				(errmsg("recovery has paused"),
 				 errhint("Execute pg_wal_replay_resume() to continue.")));
 
+	ConditionVariablePrepareToSleep(&XLogCtl->recoveryPauseChanged);
 	while (RecoveryIsPaused())
 	{
 		HandleStartupProcInterrupts();
 		if (CheckForStandbyTrigger())
-			return;
-		pgstat_report_wait_start(WAIT_EVENT_RECOVERY_PAUSE);
-		pg_usleep(1000000L);	/* 1000 ms */
-		pgstat_report_wait_end();
+			break;
+
+		/*
+		 * We wait on a condition variable that will wake us as soon as the
+		 * pause ends, but we use a timeout so we can check the above exit
+		 * condition periodically too.
+		 */
+		ConditionVariableTimedSleep(&XLogCtl->recoveryPauseChanged, 1000,
+									WAIT_EVENT_RECOVERY_PAUSE);
 	}
+	ConditionVariableCancelSleep();
 }
 
 bool
@@ -6070,6 +6075,8 @@ SetRecoveryPause(bool recoveryPause)
 	SpinLockAcquire(&XLogCtl->info_lck);
 	XLogCtl->recoveryPause = recoveryPause;
 	SpinLockRelease(&XLogCtl->info_lck);
+
+	ConditionVariableBroadcast(&XLogCtl->recoveryPauseChanged);
 }
 
 /*
-- 
2.30.1

From 1dfc8a1b25b33c4d1115bcea68ebe8732d56ba93 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Thu, 24 Sep 2020 17:37:54 +1200
Subject: [PATCH v6 3/3] Poll postmaster less frequently in recovery.

Since commits 9f095299 and f98b8476 we don't poll the postmaster
pipe at all during crash recovery on Linux and FreeBSD, but on other
operating systems we were still doing it for every WAL record.  Do it
less frequently on operating systems where system calls are required, at
the cost of delaying exit a bit after postmaster death, to avoid
expensive system calls reported to slow down CPU-bound recovery by
as much as 10-30%.

Reviewed-by: Heikki Linnakangas <hlinn...@iki.fi>
Reviewed-by: Fujii Masao <masao.fu...@oss.nttdata.com>
Reviewed-by: Michael Paquier <mich...@paquier.xyz>
Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com
Discussion: https://postgr.es/m/7261eb39-0369-f2f4-1bb5-62f3b6083...@iki.fi
---
 src/backend/postmaster/startup.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index f781fdc6fc..682fc675ab 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -134,6 +134,10 @@ StartupRereadConfig(void)
 void
 HandleStartupProcInterrupts(void)
 {
+#ifndef USE_POSTMASTER_DEATH_SIGNAL
+	static int	count = 0;
+#endif
+
 	/*
 	 * Process any requests or signals received recently.
 	 */
@@ -151,9 +155,17 @@ HandleStartupProcInterrupts(void)
 
 	/*
 	 * Emergency bailout if postmaster has died.  This is to avoid the
-	 * necessity for manual cleanup of all postmaster children.
+	 * necessity for manual cleanup of all postmaster children.  Do this less
+	 * frequently on systems for which we don't have signals to make that
+	 * cheap.  Any loop that sleeps should be using WaitLatch or similar and
+	 * handling postmaster death that way; the check here is intended only to
+	 * deal with CPU-bound loops such as the main redo loop.
 	 */
-	if (IsUnderPostmaster && !PostmasterIsAlive())
+	if (IsUnderPostmaster &&
+#ifndef USE_POSTMASTER_DEATH_SIGNAL
+		count++ % 1024 == 0 &&
+#endif
+		!PostmasterIsAlive())
 		exit(1);
 
 	/* Process barrier events */
-- 
2.30.1

Reply via email to