On Tue, Mar 2, 2021 at 12:00 AM Thomas Munro <thomas.mu...@gmail.com> wrote:
> On Mon, Nov 16, 2020 at 8:56 PM Michael Paquier <mich...@paquier.xyz> wrote:
> > No objections with the two changes from pg_usleep() to WaitLatch() so
> > they could be applied separately first.
>
> I thought about committing that first part, and got as far as
> splitting the patch into two (see attached), but then I re-read
> Fujii-san's message about the speed of promotion and realised that we
> really should have something like a condition variable for walRcvState
> changes.  I'll look into that.
Here's an experimental attempt at that, though I'm not sure if it's the right approach. Of course it's not necessary to use condition variables here: we could use recoveryWakeupLatch, because we're not in any doubt about who needs to be woken up. But then you could get constant wakeups while recovery is paused, unless you also suppressed that somehow. You could use the startup process's procLatch, advertised in shmem, but that's almost a condition variable. With a condition variable, you get to name it something like walRcvStateChanged, and then the programming rule is very clear: if you change walRcvState, you need to broadcast that fact (and you don't have to worry about who might be interested).

One question I haven't got to the bottom of: is it a problem for the startup process that CVs use CHECK_FOR_INTERRUPTS()?
From 1e28b9c21a953e66c48f78a90192f3a0ca83d0aa Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Tue, 2 Mar 2021 11:08:06 +1300
Subject: [PATCH v6 1/3] Add condition variable for walreceiver state.

Use this new CV to wait for walreceiver shutdown without a sleep/poll
loop, while also benefiting from standard postmaster death handling.

Reviewed-by: Heikki Linnakangas <hlinn...@iki.fi>
Reviewed-by: Fujii Masao <masao.fu...@oss.nttdata.com>
Reviewed-by: Michael Paquier <mich...@paquier.xyz>
Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com
---
 doc/src/sgml/monitoring.sgml               |  4 +++
 src/backend/postmaster/pgstat.c            |  3 ++
 src/backend/replication/walreceiver.c      | 10 ++++++
 src/backend/replication/walreceiverfuncs.c | 42 ++++++++++++++--------
 src/include/pgstat.h                       |  1 +
 src/include/replication/walreceiver.h      |  2 ++
 6 files changed, 48 insertions(+), 14 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3513e127b7..cf00210cb3 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1757,6 +1757,10 @@ postgres  27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry>Waiting for confirmation from a remote server during
        synchronous replication.</entry>
      </row>
+     <row>
+      <entry><literal>WalrcvExit</literal></entry>
+      <entry>Waiting for the walreceiver to exit.</entry>
+     </row>
      <row>
       <entry><literal>XactGroupUpdate</literal></entry>
       <entry>Waiting for the group leader to update transaction status at
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index f75b52719d..2dbfb81e40 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4122,6 +4122,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
         case WAIT_EVENT_SYNC_REP:
             event_name = "SyncRep";
             break;
+        case WAIT_EVENT_WALRCV_EXIT:
+            event_name = "WalrcvExit";
+            break;
         case WAIT_EVENT_XACT_GROUP_UPDATE:
             event_name = "XactGroupUpdate";
             break;
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index e5f8a06fea..397a94d7af 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -207,6 +207,8 @@ WalReceiverMain(void)
         case WALRCV_STOPPED:
             SpinLockRelease(&walrcv->mutex);
+            /* We might have changed state and fallen through above. */
+            ConditionVariableBroadcast(&walrcv->walRcvStateChanged);
             proc_exit(1);
             break;
@@ -249,6 +251,8 @@ WalReceiverMain(void)
 
     SpinLockRelease(&walrcv->mutex);
 
+    ConditionVariableBroadcast(&walrcv->walRcvStateChanged);
+
     pg_atomic_write_u64(&WalRcv->writtenUpto, 0);
 
     /* Arrange to clean up at walreceiver exit */
@@ -647,6 +651,8 @@ WalRcvWaitForStartPosition(XLogRecPtr *startpoint, TimeLineID *startpointTLI)
     walrcv->receiveStartTLI = 0;
     SpinLockRelease(&walrcv->mutex);
 
+    ConditionVariableBroadcast(&walrcv->walRcvStateChanged);
+
     set_ps_display("idle");
 
     /*
@@ -675,6 +681,8 @@ WalRcvWaitForStartPosition(XLogRecPtr *startpoint, TimeLineID *startpointTLI)
             *startpointTLI = walrcv->receiveStartTLI;
             walrcv->walRcvState = WALRCV_STREAMING;
             SpinLockRelease(&walrcv->mutex);
+
+            ConditionVariableBroadcast(&walrcv->walRcvStateChanged);
             break;
         }
         if (walrcv->walRcvState == WALRCV_STOPPING)
@@ -784,6 +792,8 @@ WalRcvDie(int code, Datum arg)
     walrcv->latch = NULL;
     SpinLockRelease(&walrcv->mutex);
 
+    ConditionVariableBroadcast(&walrcv->walRcvStateChanged);
+
     /* Terminate the connection gracefully. */
     if (wrconn != NULL)
         walrcv_disconnect(wrconn);
diff --git a/src/backend/replication/walreceiverfuncs.c b/src/backend/replication/walreceiverfuncs.c
index 63e60478ea..9106f43d51 100644
--- a/src/backend/replication/walreceiverfuncs.c
+++ b/src/backend/replication/walreceiverfuncs.c
@@ -23,8 +23,10 @@
 #include <signal.h>
 
 #include "access/xlog_internal.h"
+#include "pgstat.h"
 #include "postmaster/startup.h"
 #include "replication/walreceiver.h"
+#include "storage/latch.h"
 #include "storage/pmsignal.h"
 #include "storage/shmem.h"
 #include "utils/timestamp.h"
@@ -62,6 +64,7 @@ WalRcvShmemInit(void)
         /* First time through, so initialize */
         MemSet(WalRcv, 0, WalRcvShmemSize());
         WalRcv->walRcvState = WALRCV_STOPPED;
+        ConditionVariableInit(&WalRcv->walRcvStateChanged);
         SpinLockInit(&WalRcv->mutex);
         pg_atomic_init_u64(&WalRcv->writtenUpto, 0);
         WalRcv->latch = NULL;
@@ -95,12 +98,18 @@ WalRcvRunning(void)
 
         if ((now - startTime) > WALRCV_STARTUP_TIMEOUT)
         {
-            SpinLockAcquire(&walrcv->mutex);
+            bool        state_changed = false;
 
+            SpinLockAcquire(&walrcv->mutex);
             if (walrcv->walRcvState == WALRCV_STARTING)
+            {
                 state = walrcv->walRcvState = WALRCV_STOPPED;
-
+                state_changed = true;
+            }
             SpinLockRelease(&walrcv->mutex);
+
+            if (state_changed)
+                ConditionVariableBroadcast(&walrcv->walRcvStateChanged);
         }
     }
@@ -140,12 +149,18 @@ WalRcvStreaming(void)
 
         if ((now - startTime) > WALRCV_STARTUP_TIMEOUT)
         {
-            SpinLockAcquire(&walrcv->mutex);
+            bool        state_changed = false;
 
+            SpinLockAcquire(&walrcv->mutex);
             if (walrcv->walRcvState == WALRCV_STARTING)
+            {
                 state = walrcv->walRcvState = WALRCV_STOPPED;
-
+                state_changed = true;
+            }
             SpinLockRelease(&walrcv->mutex);
+
+            if (state_changed)
+                ConditionVariableBroadcast(&walrcv->walRcvStateChanged);
         }
     }
@@ -191,6 +206,8 @@ ShutdownWalRcv(void)
     }
     SpinLockRelease(&walrcv->mutex);
 
+    ConditionVariableBroadcast(&walrcv->walRcvStateChanged);
+
     /*
      * Signal walreceiver process if it was still running.
      */
@@ -199,18 +216,13 @@ ShutdownWalRcv(void)
 
     /*
      * Wait for walreceiver to acknowledge its death by setting state to
-     * WALRCV_STOPPED.
+     * WALRCV_STOPPED.  This wait contains a standard CHECK_FOR_INTERRUPTS().
      */
+    ConditionVariablePrepareToSleep(&walrcv->walRcvStateChanged);
     while (WalRcvRunning())
-    {
-        /*
-         * This possibly-long loop needs to handle interrupts of startup
-         * process.
-         */
-        HandleStartupProcInterrupts();
-
-        pg_usleep(100000);      /* 100ms */
-    }
+        ConditionVariableSleep(&walrcv->walRcvStateChanged,
+                               WAIT_EVENT_WALRCV_EXIT);
+    ConditionVariableCancelSleep();
 }
 
 /*
@@ -298,6 +310,8 @@ RequestXLogStreaming(TimeLineID tli, XLogRecPtr recptr, const char *conninfo,
 
     SpinLockRelease(&walrcv->mutex);
 
+    ConditionVariableBroadcast(&walrcv->walRcvStateChanged);
+
     if (launch)
         SendPostmasterSignal(PMSIGNAL_START_WALRECEIVER);
     else if (latch)
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 724068cf87..222e38037c 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -998,6 +998,7 @@ typedef enum
     WAIT_EVENT_REPLICATION_SLOT_DROP,
     WAIT_EVENT_SAFE_SNAPSHOT,
     WAIT_EVENT_SYNC_REP,
+    WAIT_EVENT_WALRCV_EXIT,
     WAIT_EVENT_XACT_GROUP_UPDATE
 } WaitEventIPC;
diff --git a/src/include/replication/walreceiver.h b/src/include/replication/walreceiver.h
index a97a59a6a3..bddea21a30 100644
--- a/src/include/replication/walreceiver.h
+++ b/src/include/replication/walreceiver.h
@@ -19,6 +19,7 @@
 #include "port/atomics.h"
 #include "replication/logicalproto.h"
 #include "replication/walsender.h"
+#include "storage/condition_variable.h"
 #include "storage/latch.h"
 #include "storage/spin.h"
 #include "utils/tuplestore.h"
@@ -62,6 +63,7 @@ typedef struct
      */
    pid_t       pid;
    WalRcvState walRcvState;
+   ConditionVariable walRcvStateChanged;
    pg_time_t   startTime;
 
    /*
-- 
2.30.1
From f03e945f140d52a9e4910ab0d66a01d659c43c9f Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Tue, 2 Mar 2021 11:47:39 +1300
Subject: [PATCH v6 2/3] Add condition variable for recovery pause/resume.

This gives us a fast reaction time when recovery is resumed or the
postmaster exits.  Unfortunately we still need to wake up every second
to perform more expensive checks during the recovery pause loop.

Reviewed-by: Heikki Linnakangas <hlinn...@iki.fi>
Reviewed-by: Fujii Masao <masao.fu...@oss.nttdata.com>
Reviewed-by: Michael Paquier <mich...@paquier.xyz>
Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com
---
 src/backend/access/transam/xlog.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 377afb8732..17240e9b8f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -723,6 +723,7 @@ typedef struct XLogCtlData
     TimestampTz currentChunkStartTime;
     /* Are we requested to pause recovery? */
     bool        recoveryPause;
+    ConditionVariable recoveryPauseChanged;
 
     /*
      * lastFpwDisableRecPtr points to the start of the last replayed
@@ -5204,6 +5205,7 @@ XLOGShmemInit(void)
     SpinLockInit(&XLogCtl->info_lck);
     SpinLockInit(&XLogCtl->ulsn_lck);
     InitSharedLatch(&XLogCtl->recoveryWakeupLatch);
+    ConditionVariableInit(&XLogCtl->recoveryPauseChanged);
 }
 
 /*
@@ -6016,10 +6018,6 @@ recoveryStopsAfter(XLogReaderState *record)
  * endOfRecovery is true if the recovery target is reached and
  * the paused state starts at the end of recovery because of
  * recovery_target_action=pause, and false otherwise.
- *
- * XXX Could also be done with shared latch, avoiding the pg_usleep loop.
- * Probably not worth the trouble though.  This state shouldn't be one that
- * anyone cares about server power consumption in.
  */
 static void
 recoveryPausesHere(bool endOfRecovery)
@@ -6041,15 +6039,22 @@ recoveryPausesHere(bool endOfRecovery)
                 (errmsg("recovery has paused"),
                  errhint("Execute pg_wal_replay_resume() to continue.")));
 
+    ConditionVariablePrepareToSleep(&XLogCtl->recoveryPauseChanged);
     while (RecoveryIsPaused())
     {
         HandleStartupProcInterrupts();
 
         if (CheckForStandbyTrigger())
-            return;
-        pgstat_report_wait_start(WAIT_EVENT_RECOVERY_PAUSE);
-        pg_usleep(1000000L);    /* 1000 ms */
-        pgstat_report_wait_end();
+            break;
+
+        /*
+         * We wait on a condition variable that will wake us as soon as the
+         * pause ends, but we use a timeout so we can check the above exit
+         * condition periodically too.
+         */
+        ConditionVariableTimedSleep(&XLogCtl->recoveryPauseChanged, 1000,
+                                    WAIT_EVENT_RECOVERY_PAUSE);
     }
+    ConditionVariableCancelSleep();
 }
 
 bool
@@ -6070,6 +6075,8 @@ SetRecoveryPause(bool recoveryPause)
     SpinLockAcquire(&XLogCtl->info_lck);
     XLogCtl->recoveryPause = recoveryPause;
     SpinLockRelease(&XLogCtl->info_lck);
+
+    ConditionVariableBroadcast(&XLogCtl->recoveryPauseChanged);
 }
 
 /*
-- 
2.30.1
From 1dfc8a1b25b33c4d1115bcea68ebe8732d56ba93 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Thu, 24 Sep 2020 17:37:54 +1200
Subject: [PATCH v6 3/3] Poll postmaster less frequently in recovery.

Since commits 9f095299 and f98b8476 we don't poll the postmaster
pipe at all during crash recovery on Linux and FreeBSD, but on other
operating systems we were still doing it for every WAL record.  Do it
less frequently on operating systems where system calls are required,
at the cost of delaying exit a bit after postmaster death, to avoid
expensive system calls reported to slow down CPU-bound recovery by as
much as 10-30%.

Reviewed-by: Heikki Linnakangas <hlinn...@iki.fi>
Reviewed-by: Fujii Masao <masao.fu...@oss.nttdata.com>
Reviewed-by: Michael Paquier <mich...@paquier.xyz>
Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com
Discussion: https://postgr.es/m/7261eb39-0369-f2f4-1bb5-62f3b6083...@iki.fi
---
 src/backend/postmaster/startup.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index f781fdc6fc..682fc675ab 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -134,6 +134,10 @@ StartupRereadConfig(void)
 void
 HandleStartupProcInterrupts(void)
 {
+#ifndef USE_POSTMASTER_DEATH_SIGNAL
+    static int  count = 0;
+#endif
+
     /*
      * Process any requests or signals received recently.
      */
@@ -151,9 +155,17 @@ HandleStartupProcInterrupts(void)
 
     /*
      * Emergency bailout if postmaster has died.  This is to avoid the
-     * necessity for manual cleanup of all postmaster children.
+     * necessity for manual cleanup of all postmaster children.  Do this less
+     * frequently on systems for which we don't have signals to make that
+     * cheap.  Any loop that sleeps should be using WaitLatch or similar and
+     * handling postmaster death that way; the check here is intended only to
+     * deal with CPU-bound loops such as the main redo loop.
     */
-    if (IsUnderPostmaster && !PostmasterIsAlive())
+    if (IsUnderPostmaster &&
+#ifndef USE_POSTMASTER_DEATH_SIGNAL
+        count++ % 1024 == 0 &&
+#endif
+        !PostmasterIsAlive())
         exit(1);
 
     /* Process barrier events */
-- 
2.30.1