On Mon, Mar 2, 2026 at 11:44 PM Fujii Masao <[email protected]> wrote:
> With the patch applied, I set up a logical replication and inserted a row
> every second. Even with continuous inserts, NULL was shown in the lag columns
> of pg_stat_replication. That makes me wonder whether the patch's approach is
> sufficient to address the issue.

Thank you for the review and testing! I had only considered the issue
in the context of physical replication, but as you pointed out, my
approach is insufficient for logical replication.

> Relying solely on replies from the standby or subscriber seems a bit fragile
> to me. If the goal is to keep showing the last measured lag for some time,
> perhaps we should introduce a rate limit on when NULL is displayed in the lag
> columns?

My primary goal was to ensure that the source code comments match the
actual behavior, as the comment stating "the second such message must
result from wal_receiver_status_interval expiring on the standby" is
inaccurate. However, as you noted, the patch alone is not sufficient
to fully address the issue.

> For example, if there has been no activity (i.e., sentPtr == applyPtr and
> applyPtr has not changed since the previous cycle) for, say, 10 seconds,
> then we could allow NULL to be shown. Thought?

I considered a time-based rate limit, but it is difficult to choose an
appropriate threshold. Furthermore, the walsender has no way of
knowing the standby's or subscriber's wal_receiver_status_interval
setting.

The attached v2 patch takes a different approach: it additionally
requires that all reported positions (write/flush/apply) remain
unchanged from the previous reply. This directly detects a truly idle
system without relying on timeouts: if any position has advanced, new
WAL activity must have occurred, so we should not clear the lag values
even if the lag tracker is empty.
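
To make the condition concrete, here is a minimal standalone sketch of the
check described above. It is not the code from the attached patch: the
ReplyPositions struct, the should_clear_lag() function, and the tracker_empty
flag are hypothetical names used only for illustration, and XLogRecPtr is
re-declared locally as a stand-in for the PostgreSQL type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's WAL position type (assumption for this sketch). */
typedef uint64_t XLogRecPtr;

/* Hypothetical record of the positions carried by one reply message. */
typedef struct ReplyPositions
{
    XLogRecPtr  write;
    XLogRecPtr  flush;
    XLogRecPtr  apply;
} ReplyPositions;

/*
 * Decide whether the displayed lag values may be cleared (shown as NULL).
 *
 * The lag is cleared only when the system looks truly idle:
 *   - the lag tracker holds no pending samples,
 *   - the receiver has applied everything that was sent, and
 *   - none of write/flush/apply moved since the previous reply.
 *
 * If any position advanced, WAL activity happened in between, so the
 * last measured lag is kept even though the tracker is empty.
 */
static bool
should_clear_lag(const ReplyPositions *prev, const ReplyPositions *cur,
                 XLogRecPtr sent_ptr, bool tracker_empty)
{
    bool        caught_up = (cur->apply == sent_ptr);
    bool        unchanged = (cur->write == prev->write &&
                             cur->flush == prev->flush &&
                             cur->apply == prev->apply);

    return tracker_empty && caught_up && unchanged;
}
```

With continuous inserts, each reply reports positions that differ from the
previous one, so a check of this shape returns false and the last measured
lag stays visible.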
--
Best regards,
Shinya Kato
NTT OSS Center

Attachment: v2-0001-Fix-spurious-NULL-lag-in-pg_stat_replication.patch