Hi,

Currently the standby reaches ResolveRecoveryConflictWithVirtualXIDs()
with a waitlist of VXIDs that are preventing it from making progress.
For each one it enters a sleep/poll loop that repeatedly tries to
acquire (conditionally) the VXID lock with VirtualXactLock(vxid,
wait=false), as a reliable wait to know that that VXID is gone.  If it
can't get it, it sleeps initially for 1ms, doubling it each time
through the loop with a cap at 1s, until GetStandbyLimitTime() is
exceeded.  Then it switches to a machine-gun signal mode where it
calls CancelVirtualTransaction() repeatedly with a 5ms sleep, until
eventually it succeeds in acquiring the VXID lock or the universe
ends.

A boring observation: the explanation in comments for the 1s cap is
now extra wrong because pg_usleep() *is* interruptible by signals due
to a recent change, but it was already wrong to assume that that was a
reliable property and has been forever.  We should probably change
that pg_usleep() to a WaitLatch() so we can process incidental
interrupts faster, perhaps with something like the attached.  It still
doesn't help with the condition that the loop is actually waiting for,
though.

I don't really like WaitExceedsMaxStandbyDelay() at all, and would
probably rather refactor this a bit more, though.  It has a misleading
name, an egregious global variable, and generally feels like it wants
to be moved back inside the loop that calls it.

Contemplating that made me brave enough to start wondering what this
code *really* wants, with a view to improving it for v17.  What if,
instead of VirtualXactLock(vxid, wait) we could do
VirtualXactLock(vxid, timeout_ms)?  Then the pseudo-code for each VXID
might be as simple as:

if (!VirtualXactLock(vxid, polite_wait_time))
{
    CancelVirtualTransaction(vxid);
    VirtualXactLock(vxid, -1); // wait forever
}

... with some extra details because after some delay we want to log a
message.  You could ignore the current code that sets PS display with
" ... waiting" after a short time, because that's emulating the lock
manager's PS display now the lock manager would do that.

Which leads to the question: why does the current code believe that it
is necessary to cancel the VXID more than once?  Dare I ask... could
it be because the code on the receiving side of the proc signal is/was
so buggy?

Initially I was suspicious that there may be tricky races to deal with
around that wakeup logic, and the poll/sleep loop was due to an
inability to come up with something reliable.

Maybe someone's going to balk at the notion of pushing a timeout down
through the lock manager.  I'm not sure how far into the code that'd
need to go, because haven't tried and I don't know off the top of my
head whether that'd best be done internally with timers or through
timeout arguments (probably better but requires more threading through
more layers), or if there is some technical reason to object to both
of these.
From 04d5f4177b0112c9f01a5bdc48bc8e3983277740 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Wed, 5 Apr 2023 17:21:18 +1200
Subject: [PATCH] WIP: latchify standby sleep


diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 9f56b4e95c..7770877d9b 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -222,8 +222,8 @@ GetStandbyLimitTime(void)
 	}
 }
 
-#define STANDBY_INITIAL_WAIT_US  1000
-static int	standbyWait_us = STANDBY_INITIAL_WAIT_US;
+#define STANDBY_INITIAL_WAIT_MS  1
+static int	standbyWait_ms = STANDBY_INITIAL_WAIT_MS;
 
 /*
  * Standby wait logic for ResolveRecoveryConflictWithVirtualXIDs.
@@ -235,8 +235,6 @@ WaitExceedsMaxStandbyDelay(uint32 wait_event_info)
 {
 	TimestampTz ltime;
 
-	CHECK_FOR_INTERRUPTS();
-
 	/* Are we past the limit time? */
 	ltime = GetStandbyLimitTime();
 	if (ltime && GetCurrentTimestamp() >= ltime)
@@ -245,17 +243,24 @@ WaitExceedsMaxStandbyDelay(uint32 wait_event_info)
 	/*
 	 * Sleep a bit (this is essential to avoid busy-waiting).
 	 */
-	pgstat_report_wait_start(wait_event_info);
-	pg_usleep(standbyWait_us);
-	pgstat_report_wait_end();
+	if (WaitLatch(MyLatch,
+				  WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+				  standbyWait_ms,
+				  wait_event_info) & WL_LATCH_SET)
+	{
+		ResetLatch(MyLatch);
+		CHECK_FOR_INTERRUPTS();
+	}
 
 	/*
 	 * Progressively increase the sleep times, but not to more than 1s, since
-	 * pg_usleep isn't interruptible on some platforms.
+	 * we don't yet have a way to wake up when the virtual xid we're waiting
+	 * for has gone.  ResolveRecoveryConflictWithVirtualXIDs() will keep
+	 * polling, but at least we will process interrupts that set our latch.
 	 */
-	standbyWait_us *= 2;
-	if (standbyWait_us > 1000000)
-		standbyWait_us = 1000000;
+	standbyWait_ms *= 2;
+	if (standbyWait_ms > 1000)
+		standbyWait_ms = 1000;
 
 	return false;
 }
@@ -376,7 +381,7 @@ ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
 	while (VirtualTransactionIdIsValid(*waitlist))
 	{
 		/* reset standbyWait_us for each xact we wait for */
-		standbyWait_us = STANDBY_INITIAL_WAIT_US;
+		standbyWait_ms = STANDBY_INITIAL_WAIT_MS;
 
 		/* wait until the virtual xid is gone */
 		while (!VirtualXactLock(*waitlist, false))
-- 
2.40.0

Reply via email to