Hi Xuneng,

I took a look at v11.  Overall it looks good, and I think the direction
is right: now that the LSN-wait facility was committed independently
(3b4e53a, 447aae1), the patch no longer depends on the full WAIT FOR
machinery -- which I believe addresses Michael's main earlier concern --
and the change in read_local_xlog_page_guts() is small and focused.

I traced the wakeup paths against current master to convince myself the
busy-poll can be safely dropped:

  - primary flush waiters are woken in xlog.c (XLogFlush /
    XLogBackgroundFlush),
  - standby replay waiters in xlogrecovery.c,
  - promotion wakes standby replay waiters in xlog.c,
    so the cascading A => B => C case (B promoted while decoding on C)
    is covered via the replay advance.

WaitForLSN() also adds the waiter to the heap before re-reading the
current LSN, so there is no lost-wakeup window, and it checks for
interrupts before each WaitLatch().  So far so good.

A few comments, all minor:

1. CHECK_FOR_INTERRUPTS() right before the wait is now redundant.  It
   was needed for the old pg_usleep() busy-loop (pg_usleep does not
   service interrupts), but WaitForLSN() already calls
   CHECK_FOR_INTERRUPTS() before each WaitLatch().  It can be removed.

2. The switch on the standby WaitForLSN() result (the
   SUCCESS/NOT_IN_RECOVERY/default block) doesn't seem to earn its keep.
   Both non-default arms just break and fall through to the same "loop
   back" as the primary branch, and WAIT_LSN_RESULT_TIMEOUT is
   unreachable here because we pass timeout = -1.  The existing callers
   in commands/wait.c and commands/repack_worker.c switch on the result
   precisely because they pass a finite timeout and act differently per
   outcome; here the outcomes all lead to the same action, so the
   primary branch's "ignore the result" style fits better and makes the
   two branches symmetric.

   The one part worth keeping is the comment about promotion, which
   documents real behavior.  So I'd drop the switch and keep the
   explanation inline, e.g.:

       else
       {
           /*
            * On a standby, wait for replay.  We ignore the result: if
            * we get promoted while waiting, the next iteration of the
            * outer loop takes the primary/flush path; if a cascading
            * upstream switches timeline, replay advances and wakes us.
            * Either way we loop back and re-evaluate, so no per-result
            * handling is needed here.
            */
           WaitForLSN(WAIT_LSN_TYPE_STANDBY_REPLAY, loc, -1);
       }

3. The patch has no test.  I realize a pure performance change is hard
   to test, but the trickiest path is "promoted while waiting"
   (WAIT_LSN_RESULT_NOT_IN_RECOVERY -> loop back -> flush path).  A small
   TAP test exercising that, plus a line in the commit message about how
   you validated this (the performance comparison), would make reviewers
   more comfortable since this touches sensitive WAL-read code.  Even if
   such a test doesn't end up being committed (recovery TAP tests are
   expensive, and the committer may prefer not to add one), writing it
   would at least let us confirm the promotion path behaves as intended.

One question on the design: the wakeup/queue pattern here is close to
what synchronous replication does, and Michael raised possible
consolidation with syncrep.c earlier.  v11 leaves that out, which seems
reasonable to me for keeping the change small -- but it might be worth
saying so explicitly in the commit message so it's a recorded decision
rather than an omission.

I'll keep following the thread; happy to re-test once you post an
update.

Regards,
Rui Zhao


> 2026年2月2日 11:16,Xuneng Zhou <[email protected]> 写道:
> 
> Hi,
> 
> Just rebase.
> 
> The performance comparison before and after applying the patch is attached.
> 
> --
> Best,
> Xuneng
> <image.png><v11-0001-Improve-read_local_xlog_page_guts-by-replacing-p.patch>



Reply via email to