On Sat, Sep 19, 2020 at 6:07 AM Fujii Masao <masao.fu...@oss.nttdata.com> wrote:
> -               pgstat_report_wait_start(WAIT_EVENT_RECOVERY_PAUSE);
> -               pg_usleep(1000000L);    /* 1000 ms */
> -               pgstat_report_wait_end();
> +               WaitLatch(NULL, WL_EXIT_ON_PM_DEATH | WL_TIMEOUT, 1000,
> +                                 WAIT_EVENT_RECOVERY_PAUSE);
>
> This change may cause at most one second delay against the standby
> promotion request during WAL replay pause? It's only one second,
> but I'd like to avoid this (unnecessary) wait to shorten the failover time
> as much as possible basically. So what about using WL_SET_LATCH here?

Right, there is a comment saying that we could do that:

 * XXX Could also be done with shared latch, avoiding the pg_usleep loop.
 * Probably not worth the trouble though.  This state shouldn't be one that
 * anyone cares about server power consumption in.

> But when using WL_SET_LATCH, one concern is that walreceiver can
> wake up the startup process too frequently even during WAL replay pause.
> This is also problematic? I'm ok with this, but if not, using pg_usleep()
> might be better as the original code does.

You're right, at least if we used recoveryWakeupLatch.  Although we'd
react to pg_wal_replay_resume() faster, which would be nice, we
wouldn't be saving energy; we'd be using more energy because of all
the other latch wakeups we'd be ignoring.  I believe the correct
solution to this problem is to add a ConditionVariable
"recoveryPauseChanged" into XLogCtlData, and then broadcast on it in
SetRecoveryPause().  This would be a trivial change, except for one
small problem: ConditionVariableTimedSleep() contains
CHECK_FOR_INTERRUPTS(), but startup.c has its own special interrupt
handling rather than using ProcessInterrupts() from postgres.c.  Maybe
that's OK, I'm not sure, but it requires more thought, and I propose
to keep the existing sloppy polling for now and leave precise wakeup
improvements for a separate patch.  The primary goal of this patch is
to switch to the standard treatment of postmaster death in wait loops,
so that we're free to reduce the sampling frequency in
HandleStartupProcInterrupts(), to fix a horrible performance problem.
I have at least tweaked that comment about pg_usleep(), though,
because that was out of date; I also used (void) WaitLatch(...) to
make it look like other places where we ignore the return value
(perhaps some static analyser out there somewhere cares?).
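
To make that concrete, here's a rough sketch of the CV approach (not
part of the attached patches; the init call, the exact loop shape and
the decision to keep a timeout as a backstop for promotion triggers
are only illustrative, and xlog.c would need storage/condition_variable.h):

/* In XLogCtlData; ConditionVariableInit() it in XLOGShmemInit(). */
ConditionVariable recoveryPauseChanged;

void
SetRecoveryPause(bool recoveryPause)
{
    SpinLockAcquire(&XLogCtl->info_lck);
    XLogCtl->recoveryPause = recoveryPause;
    SpinLockRelease(&XLogCtl->info_lck);

    /* Wake anyone sleeping in recoveryPausesHere(). */
    ConditionVariableBroadcast(&XLogCtl->recoveryPauseChanged);
}

/* In recoveryPausesHere(), replacing the timed WaitLatch() loop. */
while (RecoveryIsPaused())
{
    HandleStartupProcInterrupts();
    if (CheckForStandbyTrigger())
        break;
    /* XXX contains CHECK_FOR_INTERRUPTS(); see caveat above */
    (void) ConditionVariableTimedSleep(&XLogCtl->recoveryPauseChanged,
                                       1000,
                                       WAIT_EVENT_RECOVERY_PAUSE);
}
ConditionVariableCancelSleep();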

By the way, a CV could probably be used for walreceiver state changes
too, to improve ShutdownWalRcv().
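
Something like this, perhaps (again only a sketch; the CV name is made
up, and every place that updates walRcvState would need to broadcast):

/* In WalRcvData. */
ConditionVariable walRcvStateChanged;

/* After any update of walRcvState, once the spinlock is released. */
ConditionVariableBroadcast(&WalRcv->walRcvStateChanged);

/* In ShutdownWalRcv(), instead of the 100ms WaitLatch() loop. */
while (WalRcvRunning())
{
    HandleStartupProcInterrupts();
    (void) ConditionVariableTimedSleep(&WalRcv->walRcvStateChanged,
                                       100,
                                       WAIT_EVENT_WALRCV_EXIT);
}
ConditionVariableCancelSleep();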

Although I know from CI that this builds and passes "make check" on
Windows, I'm hoping to attract some review of the 0001 patch from a
Windows person, and confirmation that it passes "check-world" (or at
least the src/test/recovery tests) there, because I don't have CI scripts
for that yet.
From a415ad0f7733ac85f315943dd0a1de9b24d909c5 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Fri, 18 Sep 2020 00:04:56 +1200
Subject: [PATCH v3 1/3] Allow WaitLatch() to be used without a latch.

Due to flaws in commit 3347c982bab, using WaitLatch() without
WL_LATCH_SET could cause an assertion failure or crash.  Repair.

While here, also add a check that the latch we're switching to belongs
to this backend, when changing from one latch to another.

Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com
---
 src/backend/storage/ipc/latch.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index 4153cc8557..63c6c97536 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -924,7 +924,22 @@ ModifyWaitEvent(WaitEventSet *set, int pos, uint32 events, Latch *latch)
 
 	if (events == WL_LATCH_SET)
 	{
+		if (latch && latch->owner_pid != MyProcPid)
+			elog(ERROR, "cannot wait on a latch owned by another process");
 		set->latch = latch;
+		/*
+		 * On Unix, we don't need to modify the kernel object because the
+		 * underlying pipe is the same for all latches so we can return
+		 * immediately.  On Windows, we need to update our array of handles,
+		 * but we leave the old one in place and tolerate spurious wakeups if
+		 * the latch is disabled.
+		 */
+#if defined(WAIT_USE_WIN32)
+		if (!latch)
+			return;
+#else
+		return;
+#endif
 	}
 
 #if defined(WAIT_USE_EPOLL)
@@ -1386,7 +1401,7 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
 			/* There's data in the self-pipe, clear it. */
 			drainSelfPipe();
 
-			if (set->latch->is_set)
+			if (set->latch && set->latch->is_set)
 			{
 				occurred_events->fd = PGINVALID_SOCKET;
 				occurred_events->events = WL_LATCH_SET;
@@ -1536,7 +1551,7 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
 			/* There's data in the self-pipe, clear it. */
 			drainSelfPipe();
 
-			if (set->latch->is_set)
+			if (set->latch && set->latch->is_set)
 			{
 				occurred_events->fd = PGINVALID_SOCKET;
 				occurred_events->events = WL_LATCH_SET;
@@ -1645,7 +1660,7 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
 			/* There's data in the self-pipe, clear it. */
 			drainSelfPipe();
 
-			if (set->latch->is_set)
+			if (set->latch && set->latch->is_set)
 			{
 				occurred_events->fd = PGINVALID_SOCKET;
 				occurred_events->events = WL_LATCH_SET;
@@ -1812,7 +1827,7 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
 		if (!ResetEvent(set->latch->event))
 			elog(ERROR, "ResetEvent failed: error code %lu", GetLastError());
 
-		if (set->latch->is_set)
+		if (set->latch && set->latch->is_set)
 		{
 			occurred_events->fd = PGINVALID_SOCKET;
 			occurred_events->events = WL_LATCH_SET;
-- 
2.20.1

From 1c72f965e24676d7ade05701617e2381b7ff3065 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Thu, 17 Sep 2020 21:11:01 +1200
Subject: [PATCH v3 2/3] Poll postmaster less frequently in recovery.

Since commits 9f095299 and f98b8476 we don't poll the postmaster
pipe at all during crash recovery on Linux and FreeBSD, but on other
operating systems we were still doing it for every WAL record.  Do it
less frequently on operating systems where system calls are required, at
the cost of delaying exit a bit after postmaster death, to avoid
expensive system calls reported to slow down CPU-bound recovery by
10-30%.

Replace a couple of pg_usleep()-based wait loops with WaitLatch() loops,
to make sure they respond promptly to postmaster death now that
HandleStartupProcInterrupts() doesn't check every time through on some
systems.

Reviewed-by: Heikki Linnakangas <hlinn...@iki.fi>
Reviewed-by: Fujii Masao <masao.fu...@oss.nttdata.com>
Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com
Discussion: https://postgr.es/m/7261eb39-0369-f2f4-1bb5-62f3b6083...@iki.fi
---
 doc/src/sgml/monitoring.sgml               |  4 ++++
 src/backend/access/transam/xlog.c          |  7 +++----
 src/backend/postmaster/pgstat.c            |  3 +++
 src/backend/postmaster/startup.c           | 16 ++++++++++++++--
 src/backend/replication/walreceiverfuncs.c |  6 ++++--
 src/include/pgstat.h                       |  1 +
 6 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 673a0e73e4..b0a8fc1582 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1734,6 +1734,10 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry>Waiting for confirmation from a remote server during synchronous
        replication.</entry>
      </row>
+     <row>
+      <entry><literal>WalrcvExit</literal></entry>
+      <entry>Waiting for the walreceiver to exit.</entry>
+     </row>
      <row>
       <entry><literal>XactGroupUpdate</literal></entry>
       <entry>Waiting for the group leader to update transaction status at
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 61754312e2..f9d9b38a8a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5999,7 +5999,7 @@ recoveryStopsAfter(XLogReaderState *record)
  * the paused state starts at the end of recovery because of
  * recovery_target_action=pause, and false otherwise.
  *
- * XXX Could also be done with shared latch, avoiding the pg_usleep loop.
+ * XXX Could also be done with shared latch, avoiding the WL_TIMEOUT loop.
  * Probably not worth the trouble though.  This state shouldn't be one that
  * anyone cares about server power consumption in.
  */
@@ -6028,9 +6028,8 @@ recoveryPausesHere(bool endOfRecovery)
 		HandleStartupProcInterrupts();
 		if (CheckForStandbyTrigger())
 			return;
-		pgstat_report_wait_start(WAIT_EVENT_RECOVERY_PAUSE);
-		pg_usleep(1000000L);	/* 1000 ms */
-		pgstat_report_wait_end();
+		(void) WaitLatch(NULL, WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, 1000,
+						 WAIT_EVENT_RECOVERY_PAUSE);
 	}
 }
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index e6be2b7836..395d61f082 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3875,6 +3875,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_SYNC_REP:
 			event_name = "SyncRep";
 			break;
+		case WAIT_EVENT_WALRCV_EXIT:
+			event_name = "WalrcvExit";
+			break;
 		case WAIT_EVENT_XACT_GROUP_UPDATE:
 			event_name = "XactGroupUpdate";
 			break;
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 64af7b8707..a802041ca7 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -134,6 +134,10 @@ StartupRereadConfig(void)
 void
 HandleStartupProcInterrupts(void)
 {
+#ifndef USE_POSTMASTER_DEATH_SIGNAL
+	static int	count = 0;
+#endif
+
 	/*
 	 * Process any requests or signals received recently.
 	 */
@@ -151,9 +155,17 @@ HandleStartupProcInterrupts(void)
 
 	/*
 	 * Emergency bailout if postmaster has died.  This is to avoid the
-	 * necessity for manual cleanup of all postmaster children.
+	 * necessity for manual cleanup of all postmaster children.  Do this less
+	 * frequently on systems for which we don't have signals to make that
+	 * cheap.  Any loop that sleeps should be using WaitLatch or similar and
+	 * handling postmaster death that way; the check here is intended only to
+	 * deal with CPU-bound loops such as the main redo loop.
 	 */
-	if (IsUnderPostmaster && !PostmasterIsAlive())
+	if (IsUnderPostmaster &&
+#ifndef USE_POSTMASTER_DEATH_SIGNAL
+		count++ % 1024 == 0 &&
+#endif
+		!PostmasterIsAlive())
 		exit(1);
 
 	/* Process barrier events */
diff --git a/src/backend/replication/walreceiverfuncs.c b/src/backend/replication/walreceiverfuncs.c
index e675757301..059353f342 100644
--- a/src/backend/replication/walreceiverfuncs.c
+++ b/src/backend/replication/walreceiverfuncs.c
@@ -23,8 +23,10 @@
 #include <signal.h>
 
 #include "access/xlog_internal.h"
+#include "pgstat.h"
 #include "postmaster/startup.h"
 #include "replication/walreceiver.h"
+#include "storage/latch.h"
 #include "storage/pmsignal.h"
 #include "storage/shmem.h"
 #include "utils/timestamp.h"
@@ -207,8 +209,8 @@ ShutdownWalRcv(void)
 		 * process.
 		 */
 		HandleStartupProcInterrupts();
-
-		pg_usleep(100000);		/* 100ms */
+		(void) WaitLatch(NULL, WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, 100,
+						 WAIT_EVENT_WALRCV_EXIT);
 	}
 }
 
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 0dfbac46b4..f934acaed9 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -887,6 +887,7 @@ typedef enum
 	WAIT_EVENT_REPLICATION_SLOT_DROP,
 	WAIT_EVENT_SAFE_SNAPSHOT,
 	WAIT_EVENT_SYNC_REP,
+	WAIT_EVENT_WALRCV_EXIT,
 	WAIT_EVENT_XACT_GROUP_UPDATE
 } WaitEventIPC;
 
-- 
2.20.1
