Hi,

You currently can't tell whether your checkpointer is spending a lot of time waiting for flags in delayChkptFlags to clear. Here's a trivial patch to add wait events for that. I've managed to see them a few times when checkpointing repeatedly under a heavy pgbench workload.
I had to stop and think for a moment about whether these events belong under "WaitEventIPC" ("waiting for notification from another process") or under "WaitEventTimeout" ("waiting for a timeout to expire"). I mean, both? It's using sleep-and-poll instead of (say) a condition variable because of the economics: we want to make the other side as cheap as possible, so we don't mind the checkpointer taking some micro-naps in this case. I feel like the key point here is that it's waiting for another process to do stuff and unblock it.
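To make the shape of that loop concrete, here's a rough standalone sketch of sleep-and-poll bracketed by wait-event reporting. It is not PostgreSQL code: report_wait_start(), report_wait_end(), nap() and still_delayed() are hypothetical stand-ins for pgstat_report_wait_start(), pgstat_report_wait_end(), pg_usleep() and HaveVirtualXIDsDelayingChkpt(), and the "backend" is simulated by a counter so the example runs deterministically without actually sleeping.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the wait-event reporting calls used in the patch. */
static const char *current_wait_event = NULL;

static void report_wait_start(const char *event) { current_wait_event = event; }
static void report_wait_end(void) { current_wait_event = NULL; }

/* Simulated other side: number of naps left before the "backend" clears its flag. */
static int naps_until_clear = 0;
static int naps_taken = 0;

/* Wait event that an observer would have seen during the most recent nap. */
static const char *event_seen_during_nap = NULL;

/* Stand-in for pg_usleep(10000L); it only advances the simulation. */
static void nap(void)
{
    event_seen_during_nap = current_wait_event;
    naps_taken++;
    if (naps_until_clear > 0)
        naps_until_clear--;
}

/* Stand-in for HaveVirtualXIDsDelayingChkpt(): true while the flag is still set. */
static bool still_delayed(void)
{
    return naps_until_clear > 0;
}

/*
 * Waiter side: advertise the wait event, nap and re-check until the flag
 * clears, then stop advertising. Returns the number of naps taken.
 */
static int wait_for_delay_flags(const char *event)
{
    int start = naps_taken;

    report_wait_start(event);
    do
    {
        nap();
    } while (still_delayed());
    report_wait_end();

    return naps_taken - start;
}
```

The point of the pattern is that the side setting and clearing the flag does nothing beyond a store, with no wakeup to send. All the cost lands on the waiter, which is why the waiter is the only place that needs to report anything.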
From fdce1ce74af59efa9020eecf52fe52af07b96670 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Thu, 12 Oct 2023 13:52:26 +1300
Subject: [PATCH] Add wait events for checkpoint delay mechanism.

When MyProc->delayChkptFlags is set to temporarily block phase
transitions in a concurrent checkpoint, the checkpointer enters a
sleep-poll loop to wait for the flag to be cleared. We should show that
as a wait event in the pg_stat_activity view.

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index fcbde10529..45ace193ec 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6720,11 +6720,13 @@ CreateCheckPoint(int flags)
 	vxids = GetVirtualXIDsDelayingChkpt(&nvxids, DELAY_CHKPT_START);
 	if (nvxids > 0)
 	{
+		pgstat_report_wait_start(WAIT_EVENT_CHECKPOINT_DELAY_START);
 		do
 		{
 			pg_usleep(10000L);	/* wait for 10 msec */
 		} while (HaveVirtualXIDsDelayingChkpt(vxids, nvxids,
 											  DELAY_CHKPT_START));
+		pgstat_report_wait_end();
 	}
 	pfree(vxids);
 
@@ -6733,11 +6735,13 @@ CreateCheckPoint(int flags)
 	vxids = GetVirtualXIDsDelayingChkpt(&nvxids, DELAY_CHKPT_COMPLETE);
 	if (nvxids > 0)
 	{
+		pgstat_report_wait_start(WAIT_EVENT_CHECKPOINT_DELAY_COMPLETE);
 		do
 		{
 			pg_usleep(10000L);	/* wait for 10 msec */
 		} while (HaveVirtualXIDsDelayingChkpt(vxids, nvxids,
 											  DELAY_CHKPT_COMPLETE));
+		pgstat_report_wait_end();
 	}
 	pfree(vxids);
 
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 9c5fdeb3ca..d7995931bd 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -97,6 +97,8 @@ BGWORKER_SHUTDOWN "Waiting for background worker to shut down."
 BGWORKER_STARTUP "Waiting for background worker to start up."
 BTREE_PAGE "Waiting for the page number needed to continue a parallel B-tree scan to become available."
 BUFFER_IO "Waiting for buffer I/O to complete."
+CHECKPOINT_DELAY_COMPLETE "Waiting for a backend that blocks a checkpoint from completing."
+CHECKPOINT_DELAY_START "Waiting for a backend that blocks a checkpoint from starting."
 CHECKPOINT_DONE "Waiting for a checkpoint to complete."
 CHECKPOINT_START "Waiting for a checkpoint to start."
 EXECUTE_GATHER "Waiting for activity from a child process while executing a <literal>Gather</literal> plan node."
-- 
2.39.2