Hi,

Currently you can't tell whether your checkpointer is spending a lot of
time waiting around for flags in delayChkptFlags to clear.  Here's a
trivial patch to add wait events for that.  I've managed to see them a
few times when checkpointing repeatedly under a heavy pgbench workload.

I had to stop and think for a moment about whether these events belong
under "WaitEventIPC" ("waiting for notification from another process")
or under "WaitEventTimeout" ("waiting for a timeout to expire").  I
mean, both?  It's using sleep-and-poll instead of (say) a CV because of
the economics: we want to make the other side as cheap as possible, so
we don't mind making the checkpointer take some micro-naps in this
case.  I feel like the key point here is that it's waiting for another
process to do stuff and unblock it, so the patch lists them with the
IPC events.
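
For anyone unfamiliar with the pattern, here is a minimal standalone
sketch of a sleep-and-poll loop bracketed by wait-event reporting.  It
is not PostgreSQL code: report_wait_start(), report_wait_end(),
wait_for_blockers() and countdown_blocked() are hypothetical stand-ins
(for pgstat_report_wait_start(), pgstat_report_wait_end() and
HaveVirtualXIDsDelayingChkpt()), and the event string is only
illustrative.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for nanosleep() */
#endif
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the slot that a monitoring view would read. */
static const char *current_wait_event = NULL;

static void
report_wait_start(const char *event)
{
	current_wait_event = event;
}

static void
report_wait_end(void)
{
	current_wait_event = NULL;
}

/* Callback answering "is anyone still blocking us?" (hypothetical). */
typedef bool (*still_blocked_fn) (void *arg);

/*
 * Sleep-and-poll until still_blocked() returns false.  Like the loops in
 * the patch, this assumes the caller already knows there is at least one
 * blocker, so it naps once before the first re-check.  The blocking side
 * pays nothing extra; only the waiter takes naps.  Returns the nap count.
 */
static int
wait_for_blockers(const char *event, still_blocked_fn still_blocked,
				  void *arg, long nap_usec)
{
	struct timespec nap;
	int			naps = 0;

	nap.tv_sec = nap_usec / 1000000L;
	nap.tv_nsec = (nap_usec % 1000000L) * 1000L;

	report_wait_start(event);
	do
	{
		nanosleep(&nap, NULL);
		naps++;
	} while (still_blocked(arg));
	report_wait_end();

	return naps;
}

/* Example blocker: pretends the flag is cleared after *arg polls. */
static bool
countdown_blocked(void *arg)
{
	int		   *remaining = arg;

	return --(*remaining) > 0;
}
```

With remaining = 3, calling wait_for_blockers("CheckpointDelayStart",
countdown_blocked, &remaining, 10000L) takes three naps and leaves the
wait-event slot cleared afterwards.
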
From fdce1ce74af59efa9020eecf52fe52af07b96670 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Thu, 12 Oct 2023 13:52:26 +1300
Subject: [PATCH] Add wait events for checkpoint delay mechanism.

When MyProc->delayChkptFlags is set to temporarily block phase
transitions in a concurrent checkpoint, the checkpointer enters a
sleep-poll loop to wait for the flag to be cleared.  We should show that
as a wait event in the pg_stat_activity view.

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index fcbde10529..45ace193ec 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6720,11 +6720,13 @@ CreateCheckPoint(int flags)
 	vxids = GetVirtualXIDsDelayingChkpt(&nvxids, DELAY_CHKPT_START);
 	if (nvxids > 0)
 	{
+		pgstat_report_wait_start(WAIT_EVENT_CHECKPOINT_DELAY_START);
 		do
 		{
 			pg_usleep(10000L);	/* wait for 10 msec */
 		} while (HaveVirtualXIDsDelayingChkpt(vxids, nvxids,
 											  DELAY_CHKPT_START));
+		pgstat_report_wait_end();
 	}
 	pfree(vxids);
 
@@ -6733,11 +6735,13 @@ CreateCheckPoint(int flags)
 	vxids = GetVirtualXIDsDelayingChkpt(&nvxids, DELAY_CHKPT_COMPLETE);
 	if (nvxids > 0)
 	{
+		pgstat_report_wait_start(WAIT_EVENT_CHECKPOINT_DELAY_COMPLETE);
 		do
 		{
 			pg_usleep(10000L);	/* wait for 10 msec */
 		} while (HaveVirtualXIDsDelayingChkpt(vxids, nvxids,
 											  DELAY_CHKPT_COMPLETE));
+		pgstat_report_wait_end();
 	}
 	pfree(vxids);
 
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 9c5fdeb3ca..d7995931bd 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -97,6 +97,8 @@ BGWORKER_SHUTDOWN	"Waiting for background worker to shut down."
 BGWORKER_STARTUP	"Waiting for background worker to start up."
 BTREE_PAGE	"Waiting for the page number needed to continue a parallel B-tree scan to become available."
 BUFFER_IO	"Waiting for buffer I/O to complete."
+CHECKPOINT_DELAY_COMPLETE	"Waiting for a backend that blocks a checkpoint from completing."
+CHECKPOINT_DELAY_START	"Waiting for a backend that blocks a checkpoint from starting."
 CHECKPOINT_DONE	"Waiting for a checkpoint to complete."
 CHECKPOINT_START	"Waiting for a checkpoint to start."
 EXECUTE_GATHER	"Waiting for activity from a child process while executing a <literal>Gather</literal> plan node."
-- 
2.39.2
