Hi Jim,

On Sun, Mar 29, 2026 at 10:56 AM Jim Jones <[email protected]>
wrote:

> Hi,
>
> When connecting with target_session_attrs=standby (or prefer-standby,
> read-only, any) and multiple standbys are available, libpq currently
> selects the first acceptable candidate without regard for how "current"
> its data is. A standby configured with recovery_min_apply_delay,
> experiencing slow I/O, or otherwise lagging is treated the same as one
> that is fully caught up.
>
> I would like to propose a new libpq connection parameter,
> max_wal_replay_size, that allows clients to skip standby servers whose
> WAL replay backlog exceeds a given threshold.
>
> Example:
>
>   psql "host=host1,host2,host3 port=5111,5222,5333 \
>         target_session_attrs=standby max_wal_replay_size=16MB"
>
> When this parameter is set, libpq executes a small query during
> connection establishment to evaluate:
>
>   pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn())
>
> on the standby. If the result exceeds the specified threshold, the
> server is skipped and the next host in the list is tried. The check is
> skipped entirely when target_session_attrs is set to primary or
> read-write, since those modes already exclude standbys.
>

What happens if none of the standbys meets the criterion? Does the
connection fail? Wouldn't that cause an availability issue?
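
To make the question concrete, here is a minimal sketch of the selection
loop as I read the proposal. It is Python for illustration only, not libpq
code; pick_standby and the backlog numbers are hypothetical, and I am
assuming 16MB means 16 * 1024 * 1024 bytes.

```python
from typing import Dict, Optional


def pick_standby(backlogs: Dict[str, int], max_wal_replay_size: int) -> Optional[str]:
    """Return the first host whose replay backlog does not exceed the threshold.

    `backlogs` maps host name to replay backlog in bytes, in the order the
    hosts appear in the connection string (hypothetical input).
    """
    for host, backlog_bytes in backlogs.items():
        # A host is skipped only when its backlog exceeds the threshold.
        if backlog_bytes <= max_wal_replay_size:
            return host
    # No candidate qualified. The proposal does not say what happens here.
    return None


threshold = 16 * 1024 * 1024  # assumed meaning of max_wal_replay_size=16MB

# Made-up backlogs: every standby is lagging, e.g. after a bulk load.
lagging = {
    "host1": 64 * 1024 * 1024,
    "host2": 32 * 1024 * 1024,
    "host3": 20 * 1024 * 1024,
}

print(pick_standby(lagging, threshold))
```

In this sketch all three standbys are reachable and healthy, yet no host is
returned. If libpq then fails the connection, a temporary burst of WAL
would turn into an outage for every client that sets the parameter.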


>
> If pg_last_wal_receive_lsn() is NULL (e.g. no active WAL receiver due to
> missing primary_conninfo or a disconnected upstream), the backlog cannot
> be determined. In that case, the standby is treated as exceeding the
> threshold and is skipped.
>

A standby that is replaying WAL from the archive can still be caught up,
so skipping it in this case doesn't seem right to me.
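
A minimal sketch of the rule as described, again in Python for
illustration only. lsn_to_int and is_skipped are hypothetical helpers, the
LSN values are made up, and None stands for a NULL result from
pg_last_wal_receive_lsn().

```python
from typing import Optional


def lsn_to_int(lsn: str) -> int:
    """Convert an LSN in 'hi/lo' hex text form to a byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def is_skipped(receive_lsn: Optional[str], replay_lsn: str, threshold: int) -> bool:
    """Apply the proposed rule: skip on NULL receive LSN or on a large backlog."""
    if receive_lsn is None:
        # Proposed behaviour: backlog unknown, so treat it as over the threshold.
        return True
    backlog = lsn_to_int(receive_lsn) - lsn_to_int(replay_lsn)
    return backlog > threshold


threshold = 16 * 1024 * 1024  # assumed meaning of 16MB

# Streaming standby with 8MB of received WAL still to replay.
print(is_skipped("0/5000000", "0/4800000", threshold))

# Archive-only standby: no WAL receiver, so the receive LSN is NULL,
# even though it may have replayed everything available in the archive.
print(is_skipped(None, "0/5000000", threshold))
```

The second standby has replayed further than the first one, but it is the
one that gets skipped.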


>
> This parameter measures only the apply lag on the standby itself, i.e.,
> how much already-received WAL remains to be replayed. It does not
> attempt to measure how far the standby is behind the primary. In
> particular, a standby that is slow to receive WAL but fast to replay it
> may report a small backlog here while still being significantly behind.
>

IMHO, this change does not appear to meet the objective of routing
connections/queries to the most up-to-date standby.
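
To illustrate with made-up numbers (Python for illustration only;
byte_gap and all LSN values are hypothetical, and primary_lsn stands for
what pg_current_wal_lsn() would report on the primary, which the proposed
check never sees):

```python
def lsn_to_int(lsn: str) -> int:
    """Convert an LSN in 'hi/lo' hex text form to a byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def byte_gap(ahead: str, behind: str) -> int:
    """Distance in bytes between two LSNs."""
    return lsn_to_int(ahead) - lsn_to_int(behind)


threshold = 16 * 1024 * 1024  # assumed meaning of 16MB
primary_lsn = "0/A000000"     # made-up current write position on the primary

standbys = {
    # Received everything from the primary, but replay is lagging.
    "host1": {"receive": "0/A000000", "replay": "0/8000000"},
    # Replays instantly, but has received very little WAL.
    "host2": {"receive": "0/3000000", "replay": "0/3000000"},
}

for host, s in standbys.items():
    backlog = byte_gap(s["receive"], s["replay"])  # what the proposal measures
    behind = byte_gap(primary_lsn, s["replay"])    # what the client cares about
    verdict = "skipped" if backlog > threshold else "accepted"
    print(f"{host}: apply backlog={backlog} bytes, "
          f"behind primary={behind} bytes, {verdict}")
```

With these numbers host1 is skipped because of its 32MB apply backlog,
while host2 is accepted although it is 112MB behind the primary.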

Thanks,
Satya
