On Tue, Feb 24, 2026 at 3:54 PM Shinya Kato <[email protected]> wrote:
>
> Hi hackers,
>
> I have noticed that pg_stat_replication.*_lag sometimes shows NULL
> when inserting a record per second for health checking. This happens
> when the startup process replays WAL fast enough before the
> walreceiver sends its flush notification to the walsender.
>
> Here is the sequence that triggers the issue: (See normal.svg and
> error.svg for diagrams of the normal and problematic cases.)
>
> 1. The walreceiver receives, writes, and flushes WAL, then wakes the
> startup process via WakeupRecovery().
>
> 2. The startup process replays all available WAL quickly, then calls
> WalRcvForceReply() to set force_reply = true and wakes the
> walreceiver.
>
> 3. The walreceiver sends a flush notification to the walsender
> (XLogWalRcvSendReply() in XLogWalRcvFlush()). Since the startup has
> already replayed the WAL by this point, this message reports the
> incremented applyPtr, which equals sentPtr. The walsender processes
> this message, consuming the LagTracker samples and setting
> fullyAppliedLastTime = true.
>
> 4. In the next loop iteration, the walreceiver sees force_reply = true
> and sends another reply with the same positions. The walsender sees
> applyPtr == sentPtr for the second consecutive time and sets
> clearLagTimes = true. Since the LagTracker samples were already
> consumed by step 3, all lag values are -1. With clearLagTimes = true,
> these -1 values are written to walsnd->*Lag, causing
> pg_stat_replication to show NULL.
>
> The comment in ProcessStandbyReplyMessage() says:
>
>      * If the standby reports that it has fully replayed the WAL in two
>      * consecutive reply messages, then the second such message must result
>      * from wal_receiver_status_interval expiring on the standby.
>
> But as shown above, the second message can also come from
> WalRcvForceReply(), violating this assumption.
>
> The attached patch fixes this by adding a check that all lag values
> are -1 to the clearLagTimes condition. This ensures that clearLagTimes
> only triggers when there are truly no new lag samples in two
> consecutive messages (i.e., the system is genuinely idle), and not
> when the samples were simply consumed by a preceding message in a
> burst of replies.

Thanks for the patch!

With the patch applied, I set up logical replication and inserted a row every
second. Even with these continuous inserts, the lag columns of
pg_stat_replication still showed NULL. That makes me wonder whether the
patch's approach is sufficient to address the issue.

Relying solely on replies from the standby or subscriber seems a bit fragile to
me. If the goal is to keep showing the last measured lag for some time,
perhaps we should introduce a rate limit on when NULL is displayed in the lag
columns?

For example, if there has been no activity (i.e., sentPtr == applyPtr and
applyPtr has not changed since the previous cycle) for, say, 10 seconds,
then we could allow NULL to be shown. Thoughts?

Regards,

-- 
Fujii Masao

