Hi,

On Fri, Dec 12, 2025 at 4:45 PM Xuneng Zhou <[email protected]> wrote:
>
> Hi Noah,
>
> On Fri, Dec 12, 2025 at 1:05 PM Noah Misch <[email protected]> wrote:
> >
> > On Fri, Dec 12, 2025 at 12:51:00PM +0800, Xuneng Zhou wrote:
> > > Bug #19093 [1] reported that pg_stat_wal_receiver.status = 'streaming'
> > > does not accurately reflect streaming health.  In that discussion,
> > > Noah noted that even before the reported regression, status =
> > > 'streaming' was unreliable because walreceiver sets it during early
> > > startup, before attempting a connection. He suggested:
> > >
> > > "Long-term, in master only, perhaps we should introduce another status
> > > like 'connecting'. Perhaps enact the connecting->streaming status
> > > transition just before tendering the first byte of streamed WAL to the
> > > startup process. Alternatively, enact that transition when the startup
> > > > process accepts the first streamed byte."
> >
> > > == Proposal ==
> > >
> > > Introduce WALRCV_CONNECTING as an intermediate state between STARTING
> > > and STREAMING:
> > >
> > > - When walreceiver starts, it enters CONNECTING (instead of going
> > > directly to STREAMING).
> > > - The transition to STREAMING occurs in XLogWalRcvFlush(), inside the
> > > existing spinlock-protected block that updates flushedUpto.
> >
> > I think this has the drawback that if the primary's WAL is incompatible,
> > e.g. unacceptable timeline, the walreceiver will still briefly enter
> > STREAMING.  That could trick monitoring.
>
> Thanks for pointing this out.
>
> > Waiting for applyPtr to advance
> > would avoid the short-lived STREAMING.  What's the feasibility of that?
>
> I think this could work, but with complications. If replay latency is
> high or replay is paused with pg_wal_replay_pause(), the WalReceiver
> would stay in the CONNECTING state longer than expected. Whether this
> is ok depends on the definition of the 'connecting' state. For the
> implementation, deciding where and when to check applyPtr against LSNs
> like receiveStart is more difficult: the WalReceiver can read applyPtr
> from shared memory, but it isn't notified when that pointer advances.
> If the check is done on the WalReceiver side, there will be latency
> between replay and the check, unless we let the startup process set
> the state, which would couple the two components. Am I missing
> something here?
>

After some thought, one potential approach is to expose a new function
in the WAL receiver code that transitions the state from CONNECTING to
STREAMING. The startup process could then call this function directly
from WaitForWALToBecomeAvailable(), so that the state change coincides
with the startup process actually accepting the WAL stream.
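
To make the idea concrete, here is a minimal standalone sketch, not a
patch. The function name WalRcvMarkStreaming and the MockWalRcvData
struct are hypothetical, the enum lists only a subset of the states,
and WALRCV_CONNECTING is the proposed value rather than an existing
one. Real code would update the shared WalRcvData under its spinlock.

```c
#include <stdbool.h>

/*
 * Simplified stand-in for WalRcvState (subset only).
 * WALRCV_CONNECTING is the proposed new state.
 */
typedef enum
{
    WALRCV_STOPPED,
    WALRCV_STARTING,
    WALRCV_CONNECTING,
    WALRCV_STREAMING
} WalRcvState;

/* Toy stand-in for the shared-memory walreceiver struct (assumption). */
typedef struct
{
    WalRcvState walRcvState;
} MockWalRcvData;

/*
 * Hypothetical helper: the startup process would call this once it
 * accepts streamed WAL.  Only CONNECTING -> STREAMING is allowed; any
 * other state is left untouched.  Returns true if the state changed.
 */
bool
WalRcvMarkStreaming(MockWalRcvData *walrcv)
{
    bool        changed = false;

    /* Real code would acquire the walreceiver spinlock here. */
    if (walrcv->walRcvState == WALRCV_CONNECTING)
    {
        walrcv->walRcvState = WALRCV_STREAMING;
        changed = true;
    }
    /* ... and release the spinlock here. */

    return changed;
}
```

Guarding on CONNECTING keeps the call idempotent, so the startup
process could invoke it on every pass without disturbing a walreceiver
that has already stopped or is restarting.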

-- 
Best,
Xuneng

