Hi Xuneng,
I took a look at v11. Overall it looks good, and I think the direction
is right: now that the LSN-wait facility was committed independently
(3b4e53a, 447aae1), the patch no longer depends on the full WAIT FOR
machinery -- which I believe addresses Michael's main earlier concern --
and the change in read_local_xlog_page_guts() is small and focused.
I traced the wakeup paths against current master to convince myself the
busy-poll can be safely dropped:
- primary flush waiters are woken in xlog.c (XLogFlush /
XLogBackgroundFlush),
- standby replay waiters in xlogrecovery.c,
- promotion wakes standby replay waiters in xlog.c,
so the cascading A => B => C case (B promoted while decoding on C)
is covered via the replay advance.
WaitForLSN() also adds the waiter to the heap before re-reading the
current LSN, so there is no lost-wakeup window, and it checks for
interrupts before each WaitLatch(). So far so good.
A few comments, all minor:
1. CHECK_FOR_INTERRUPTS() right before the wait is now redundant. It
was needed for the old pg_usleep() busy-loop (pg_usleep does not
service interrupts), but WaitForLSN() already calls
CHECK_FOR_INTERRUPTS() before each WaitLatch(). It can be removed.
2. The switch on the standby WaitForLSN() result (the
SUCCESS/NOT_IN_RECOVERY/default block) doesn't seem to earn its keep.
Both non-default arms just break and fall through to the same "loop
back" as the primary branch, and WAIT_LSN_RESULT_TIMEOUT is
unreachable here because we pass timeout = -1. The existing callers
in commands/wait.c and commands/repack_worker.c switch on the result
precisely because they pass a finite timeout and act differently per
outcome; here the outcomes all lead to the same action, so the
primary branch's "ignore the result" style fits better and makes the
two branches symmetric.
The one part worth keeping is the comment about promotion, which
documents real behavior. So I'd drop the switch and keep the
explanation inline, e.g.:
else
{
/*
* On a standby, wait for replay. We ignore the result: if
* we get promoted while waiting, the next iteration of the
* outer loop takes the primary/flush path; if a cascading
* upstream switches timeline, replay advances and wakes us.
* Either way we loop back and re-evaluate, so no per-result
* handling is needed here.
*/
WaitForLSN(WAIT_LSN_TYPE_STANDBY_REPLAY, loc, -1);
}
3. The patch has no test. I realize a pure performance change is hard
to test, but the trickiest path is "promoted while waiting"
(WAIT_LSN_RESULT_NOT_IN_RECOVERY -> loop back -> flush path). A small
TAP test exercising that, plus a line in the commit message about how
you validated this (the performance comparison), would make reviewers
more comfortable since this touches sensitive WAL-read code. Even if
such a test doesn't end up being committed (recovery TAP tests are
expensive, and the committer may prefer not to add one), writing it
would at least let us confirm the promotion path behaves as intended.
One question on the design: the wakeup/queue pattern here is close to
what synchronous replication does, and Michael raised possible
consolidation with syncrep.c earlier. v11 leaves that out, which seems
reasonable to me for keeping the change small -- but it might be worth
saying so explicitly in the commit message so it's a recorded decision
rather than an omission.
I'll keep following the thread; happy to re-test once you post an
update.
Regards,
Rui Zhao
> 2026年2月2日 11:16,Xuneng Zhou <[email protected]> 写道:
>
> Hi,
>
> Just rebase.
>
> The performance comparison before and after applying the patch is attached.
>
> --
> Best,
> Xuneng
> <image.png><v11-0001-Improve-read_local_xlog_page_guts-by-replacing-p.patch>