On Tue, Feb 24, 2026 at 3:54 PM Shinya Kato <[email protected]> wrote:
>
> Hi hackers,
>
> I have noticed that pg_stat_replication.*_lag sometimes shows NULL
> when inserting a record per second for health checking. This happens
> when the startup process replays WAL fast enough before the
> walreceiver sends its flush notification to the walsender.
>
> Here is the sequence that triggers the issue: (See normal.svg and
> error.svg for diagrams of the normal and problematic cases.)
>
> 1. The walreceiver receives, writes, and flushes WAL, then wakes the
> startup process via WakeupRecovery().
>
> 2. The startup process replays all available WAL quickly, then calls
> WalRcvForceReply() to set force_reply = true and wakes the
> walreceiver.
>
> 3. The walreceiver sends a flush notification to the walsender
> (XLogWalRcvSendReply() in XLogWalRcvFlush()). Since the startup has
> already replayed the WAL by this point, this message reports the
> incremented applyPtr, which equals sentPtr. The walsender processes
> this message, consuming the LagTracker samples and setting
> fullyAppliedLastTime = true.
>
> 4. In the next loop iteration, the walreceiver sees force_reply = true
> and sends another reply with the same positions. The walsender sees
> applyPtr == sentPtr for the second consecutive time and sets
> clearLagTimes = true. Since the LagTracker samples were already
> consumed by step 3, all lag values are -1. With clearLagTimes = true,
> these -1 values are written to walsnd->*Lag, causing
> pg_stat_replication to show NULL.
>
> The comment in ProcessStandbyReplyMessage() says:
>
> * If the standby reports that it has fully replayed the WAL in two
> * consecutive reply messages, then the second such message must result
> * from wal_receiver_status_interval expiring on the standby.
>
> But as shown above, the second message can also come from
> WalRcvForceReply(), violating this assumption.
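To make steps 3 and 4 concrete, here is a minimal C sketch of the clearLagTimes handling described above. It is a simplified model, not walsender code: WalSndModel and model_process_reply() are hypothetical names, positions are plain uint64_t instead of XLogRecPtr, and each lag argument stands for the value LagTracker produced for that reply (-1 meaning no new sample).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, simplified model of the walsender state touched by a
 * standby reply.  Lags are in microseconds; -1 is shown as NULL.
 */
typedef struct
{
    bool    fully_applied_last_time;
    int64_t write_lag;
    int64_t flush_lag;
    int64_t apply_lag;
} WalSndModel;

/* Process one reply carrying positions and the lag samples read for it. */
static void
model_process_reply(WalSndModel *w, uint64_t apply_ptr, uint64_t sent_ptr,
                    int64_t write_lag, int64_t flush_lag, int64_t apply_lag)
{
    bool clear_lag_times = false;

    assert(write_lag >= -1 && flush_lag >= -1 && apply_lag >= -1);

    /* Two consecutive "fully applied" replies are taken to mean idleness. */
    if (apply_ptr == sent_ptr)
    {
        if (w->fully_applied_last_time)
            clear_lag_times = true;
        w->fully_applied_last_time = true;
    }
    else
        w->fully_applied_last_time = false;

    /* -1 normally keeps the previous value, unless lags are being cleared. */
    if (write_lag != -1 || clear_lag_times)
        w->write_lag = write_lag;
    if (flush_lag != -1 || clear_lag_times)
        w->flush_lag = flush_lag;
    if (apply_lag != -1 || clear_lag_times)
        w->apply_lag = apply_lag;
}
```

Feeding it step 3 (samples present, applyPtr == sentPtr) and then step 4 (same positions, all lags -1) leaves every stored lag at -1, which is what pg_stat_replication reports as NULL.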
>
> The attached patch fixes this by adding a check that all lag values
> are -1 to the clearLagTimes condition. This ensures that clearLagTimes
> only triggers when there are truly no new lag samples in two
> consecutive messages (i.e., the system is genuinely idle), and not
> when the samples were simply consumed by a preceding message in a
> burst of replies.
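One possible reading of the proposed condition, sketched as a hypothetical C model rather than the actual patch: an idle flag is set only when a reply reports applyPtr == sentPtr and carries no new lag samples, so clearing requires two such replies in a row. ClearLagState and model_should_clear_lag_times() are made-up names, and the real patch may well be structured differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-walsender state for the patched check. */
typedef struct
{
    bool idle_last_time;   /* previous reply: fully applied, no new samples */
} ClearLagState;

/*
 * Decide whether to clear the stored lag times for this reply.  Each lag
 * argument is the LagTracker result for the reply (-1 = no new sample).
 */
static bool
model_should_clear_lag_times(ClearLagState *s, uint64_t apply_ptr,
                             uint64_t sent_ptr, int64_t write_lag,
                             int64_t flush_lag, int64_t apply_lag)
{
    bool no_new_samples;
    bool clear = false;

    assert(write_lag >= -1 && flush_lag >= -1 && apply_lag >= -1);
    no_new_samples = write_lag == -1 && flush_lag == -1 && apply_lag == -1;

    if (apply_ptr == sent_ptr && no_new_samples)
    {
        /* Second consecutive idle reply: treat the system as idle. */
        if (s->idle_last_time)
            clear = true;
        s->idle_last_time = true;
    }
    else
        s->idle_last_time = false;

    return clear;
}
```

Under this reading, the step 3/step 4 burst no longer clears anything, because the reply that consumed the samples does not count as idle; only a later fully-applied reply with no new samples (for example one sent when wal_receiver_status_interval expires) would.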
Thanks for the patch!

With the patch applied, I set up logical replication and inserted a row
every second. Even with continuous inserts, NULL was still shown in the
lag columns of pg_stat_replication. That makes me wonder whether the
patch's approach is sufficient to address the issue.

Relying solely on replies from the standby or subscriber seems a bit
fragile to me. If the goal is to keep showing the last measured lag for
some time, perhaps we should rate-limit when NULL is displayed in the
lag columns? For example, if there has been no activity (i.e., sentPtr
== applyPtr and applyPtr has not changed since the previous cycle) for,
say, 10 seconds, then we could allow NULL to be shown.

Thoughts?

Regards,

--
Fujii Masao
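A minimal sketch of the suggested rate limit, assuming the walsender remembers when applyPtr last changed and only allows NULL once sentPtr == applyPtr and that position has stayed unchanged for 10 seconds. IdleTracker, idle_long_enough() and IDLE_BEFORE_NULL_US are hypothetical; times are plain microseconds rather than TimestampTz, and the 10-second value is only the example figure from the proposal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: how long the system must look idle before NULL is allowed. */
#define IDLE_BEFORE_NULL_US INT64_C(10000000)   /* 10 seconds */

typedef struct
{
    uint64_t last_apply_ptr;   /* applyPtr seen in the previous cycle */
    int64_t  last_change_us;   /* when applyPtr last advanced */
} IdleTracker;

/*
 * Return true if the lag columns may be reset to NULL: everything sent has
 * been applied and applyPtr has not moved for IDLE_BEFORE_NULL_US.
 */
static bool
idle_long_enough(IdleTracker *t, uint64_t apply_ptr, uint64_t sent_ptr,
                 int64_t now_us)
{
    assert(now_us >= t->last_change_us);   /* expects a monotonic clock */

    if (apply_ptr != t->last_apply_ptr)
    {
        /* Activity: remember the new position and restart the idle timer. */
        t->last_apply_ptr = apply_ptr;
        t->last_change_us = now_us;
        return false;
    }

    return apply_ptr == sent_ptr &&
        now_us - t->last_change_us >= IDLE_BEFORE_NULL_US;
}
```

Measuring elapsed time since the last applyPtr change, rather than counting replies, makes the decision independent of how many replies a single burst of activity produces.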
