Hi,
When connecting with target_session_attrs=standby (or prefer-standby,
read-only, any) and multiple standbys are available, libpq currently
selects the first acceptable candidate without regard for how "current"
its data is. A standby configured with recovery_min_apply_delay,
experiencing slow I/O, or otherwise lagging is treated the same as one
that is fully caught up.
I would like to propose a new libpq connection parameter,
max_wal_replay_size, that allows clients to skip standby servers whose
WAL replay backlog exceeds a given threshold.
Example:
psql "host=host1,host2,host3 port=5111,5222,5333 \
target_session_attrs=standby max_wal_replay_size=16MB"
When this parameter is set, libpq executes a small query during
connection establishment to evaluate:
pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn())
on the standby. If the result exceeds the specified threshold, the
server is skipped and the next host in the list is tried. The check is
skipped entirely when target_session_attrs is set to primary or
read-write, since those modes already exclude standbys.
If pg_last_wal_receive_lsn() is NULL (e.g. no active WAL receiver due to
missing primary_conninfo or a disconnected upstream), the backlog cannot
be determined. In that case, the standby is treated as exceeding the
threshold and is skipped.
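
To make the skip rule concrete, here is a minimal stand-alone C sketch of the per-host decision described above. It is not code from the patch: the names standby_sample, backlog_acceptable and pick_host are hypothetical, and LSNs are modelled as plain 64-bit byte positions rather than being fetched from a server.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of what the check query reports for one host. */
typedef struct
{
    bool     in_recovery;       /* result of pg_is_in_recovery() */
    bool     have_receive_lsn;  /* false if pg_last_wal_receive_lsn() is NULL */
    uint64_t receive_lsn;       /* last WAL byte received */
    uint64_t replay_lsn;        /* last WAL byte replayed */
} standby_sample;

/* Returns true if the host passes the max_wal_replay_size check. */
static bool
backlog_acceptable(const standby_sample *s, uint64_t max_bytes)
{
    assert(s != NULL);

    if (!s->in_recovery)
        return true;            /* primaries are never checked */
    if (!s->have_receive_lsn)
        return false;           /* backlog unknown: treated as over threshold */
    if (s->replay_lsn >= s->receive_lsn)
        return true;            /* nothing is waiting to be replayed */

    /* A backlog exactly equal to the threshold is still accepted. */
    return (s->receive_lsn - s->replay_lsn) <= max_bytes;
}

/* Index of the first acceptable host in list order, or -1 if none. */
static int
pick_host(const standby_sample *hosts, size_t nhosts, uint64_t max_bytes)
{
    assert(hosts != NULL);

    for (size_t i = 0; i < nhosts; i++)
    {
        if (backlog_acceptable(&hosts[i], max_bytes))
            return (int) i;
    }
    return -1;
}
```

The sketch keeps list order, as libpq does today, and only filters candidates; it does not try to pick the standby with the smallest backlog.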
This parameter measures only the apply lag on the standby itself, i.e.,
how much already-received WAL remains to be replayed. It does not
attempt to measure how far the standby is behind the primary. In
particular, a standby that is slow to receive WAL but fast to replay it
may report a small backlog here while still being significantly behind.
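
A small numeric illustration of this limitation, as a hedged C sketch. The struct, the function names and the LSN values used below are made up for illustration and do not appear in the patch; the primary's position is included only to show what the standby-side check cannot see.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative positions only; none of these names exist in libpq. */
typedef struct
{
    uint64_t primary_lsn;   /* current write position on the primary */
    uint64_t receive_lsn;   /* last WAL byte received by the standby */
    uint64_t replay_lsn;    /* last WAL byte replayed by the standby */
} lag_sample;

/* What max_wal_replay_size looks at: received but not yet replayed. */
static uint64_t
apply_backlog(const lag_sample *s)
{
    assert(s->receive_lsn >= s->replay_lsn);
    return s->receive_lsn - s->replay_lsn;
}

/* Distance to the primary, which the standby alone cannot report. */
static uint64_t
total_lag(const lag_sample *s)
{
    assert(s->primary_lsn >= s->replay_lsn);
    return s->primary_lsn - s->replay_lsn;
}
```

With primary_lsn = 600000000, receive_lsn = 100004096 and replay_lsn = 100000000, apply_backlog() is 4096 bytes while total_lag() is 500000000 bytes, so a threshold of 16MB would accept this standby even though it is far behind the primary.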
The attached PoC patch may make the behaviour clearer.
Any feedback on this approach would be appreciated.
Best, Jim
From 8d85ed55f46749eff6f493edf1e214f4804726d5 Mon Sep 17 00:00:00 2001
From: Jim Jones <[email protected]>
Date: Sun, 29 Mar 2026 19:18:49 +0200
Subject: [PATCH v1] Add libpq connection parameter max_wal_replay_size
Introduce a new libpq connection parameter, max_wal_replay_size, that
allows clients to skip standby servers whose WAL replay backlog exceeds
a given threshold. The value is a size specifier accepted by
pg_size_bytes(), e.g. "100MB" or "1GB".
When this parameter is set and a candidate host is a standby, libpq
issues a single query during connection establishment to measure the
amount of WAL received but not yet replayed
(pg_last_wal_receive_lsn() - pg_last_wal_replay_lsn()). If the
backlog exceeds the threshold, the connection to that host is closed
and the next host in the list is tried. A standby with no active
WAL receiver is also rejected, as its backlog cannot be determined.
The check is skipped entirely when connecting to a primary, or when
target_session_attrs is set to "primary" or "read-write". It does
apply when target_session_attrs is "any", "read-only", "standby", or
"prefer-standby".
The new state CONNECTION_CHECK_WAL_REPLAY_SIZE is added to the
PQconnectPoll() state machine, following the same async pattern as
the existing CONNECTION_CHECK_STANDBY state.
---
doc/src/sgml/libpq.sgml | 39 ++++
src/interfaces/libpq/fe-connect.c | 199 +++++++++++++++-
src/interfaces/libpq/libpq-fe.h | 1 +
src/interfaces/libpq/libpq-int.h | 2 +
src/test/perl/PostgreSQL/Test/Utils.pm | 1 +
src/test/recovery/meson.build | 1 +
.../recovery/t/053_max_wal_replay_size.pl | 218 ++++++++++++++++++
src/test/regress/pg_regress.c | 1 +
8 files changed, 459 insertions(+), 3 deletions(-)
create mode 100644 src/test/recovery/t/053_max_wal_replay_size.pl
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 6db823808fc..b91cad42c31 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -2424,6 +2424,45 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
</listitem>
</varlistentry>
+ <varlistentry id="libpq-connect-max-wal-replay-size" xreflabel="max_wal_replay_size">
+ <term><literal>max_wal_replay_size</literal></term>
+ <listitem>
+ <para>
+ Specifies the maximum acceptable WAL replay backlog when connecting
+ to a standby server. The backlog is measured as the difference
+ between the last WAL byte received by the standby and the last WAL
+ byte it has replayed, that is, how much received WAL is
+ waiting to be applied. Note that this measures only the apply lag
+ on the standby itself; it does not account for any delay in WAL
+ reaching the standby from the primary. The value must be a size
+ specifier understood by <function>pg_size_bytes</function>, such as
+ <literal>100MB</literal> or <literal>1GB</literal>. If not set, no
+ check is performed.
+ </para>
+
+ <para>
+ When this parameter is set and the selected host is a standby (i.e.,
+ in recovery), <productname>libpq</productname> will query the
+ server's WAL replay backlog and reject the connection if it exceeds
+ the specified threshold. If the standby has no active WAL receiver
+ process, its backlog cannot be determined and the connection is
+ also rejected. In either case, <productname>libpq</productname> will
+ try the next host in the list, if any.
+ </para>
+
+ <para>
+ This parameter is ignored when connecting to a primary or when
+ <literal>target_session_attrs</literal> is set to
+ <literal>primary</literal> or <literal>read-write</literal>. When
+ <literal>target_session_attrs</literal> is set to
+ <literal>any</literal>, the check still applies, and a standby that
+ exceeds the threshold will be skipped in favour of the next candidate
+ host (which may be a primary).
+ </para>
+
+ </listitem>
+ </varlistentry>
+
<varlistentry id="libpq-connect-load-balance-hosts" xreflabel="load_balance_hosts">
<term><literal>load_balance_hosts</literal></term>
<listitem>
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index db9b4c8edbf..f6d0ed238ba 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -385,6 +385,10 @@ static const internalPQconninfoOption PQconninfoOptions[] = {
"Target-Session-Attrs", "", 15, /* sizeof("prefer-standby") = 15 */
offsetof(struct pg_conn, target_session_attrs)},
+ {"max_wal_replay_size", "PGMAXWALREPLAYSIZE", NULL, NULL,
+ "Max-WAL-replay-size", "", 64,
+ offsetof(struct pg_conn, max_wal_replay_size)},
+
{"load_balance_hosts", "PGLOADBALANCEHOSTS",
DefaultLoadBalanceHosts, NULL,
"Load-Balance-Hosts", "", 8, /* sizeof("disable") = 8 */
@@ -512,11 +516,9 @@ static bool sslVerifyProtocolVersion(const char *version);
static bool sslVerifyProtocolRange(const char *min, const char *max);
static bool pqParseProtocolVersion(const char *value, ProtocolVersion *result, PGconn *conn, const char *context);
-
/* global variable because fe-auth.c needs to access it */
pgthreadlock_t pg_g_threadlock = default_threadlock;
-
/*
* pqDropConnection
*
@@ -686,6 +688,7 @@ pqDropServerData(PGconn *conn)
conn->std_strings = false;
conn->default_transaction_read_only = PG_BOOL_UNKNOWN;
conn->in_hot_standby = PG_BOOL_UNKNOWN;
+ conn->wal_replay_size_checked = false;
conn->scram_sha_256_iterations = SCRAM_SHA_256_DEFAULT_ITERATIONS;
conn->sversion = 0;
@@ -2119,6 +2122,43 @@ pqConnectOptions2(PGconn *conn)
}
}
+ /*
+ * Validate the max_wal_replay_size option.
+ *
+ * The value is passed directly into a SQL query as a string literal
+ * (inside single quotes). This allowlist blocks every character that
+ * could escape the quotes or otherwise alter the query, in particular
+ * single quotes, backslashes, and non-ASCII bytes. Signs ('+'/'-') are
+ * also excluded: pg_size_bytes() accepts them, but a negative threshold
+ * is meaningless here, as any non-negative backlog would exceed it.
+ *
+ * This is a security boundary, not a full semantic validator. Values
+ * such as "10XY" pass this check but will be rejected by pg_size_bytes()
+ * on the server with a clear "invalid size" error. There is no
+ * pg_size_bytes()-equivalent available in libpq, so complete client-side
+ * validation would require duplicating backend logic.
+ */
+ if (conn->max_wal_replay_size)
+ {
+ const char *p;
+
+ for (p = conn->max_wal_replay_size; *p; p++)
+ {
+ if (!isascii((unsigned char)*p) ||
+ (!isdigit((unsigned char)*p) &&
+ !isalpha((unsigned char)*p) &&
+ !isspace((unsigned char)*p) &&
+ *p != '.'))
+ {
+ conn->status = CONNECTION_BAD;
+ libpq_append_conn_error(conn, "invalid %s value: \"%s\"",
+ "max_wal_replay_size",
+ conn->max_wal_replay_size);
+ return false;
+ }
+ }
+ }
+
if (conn->min_protocol_version)
{
if (!pqParseProtocolVersion(conn->min_protocol_version, &conn->min_pversion, conn, "min_protocol_version"))
@@ -2947,6 +2987,7 @@ PQconnectPoll(PGconn *conn)
case CONNECTION_CHECK_WRITABLE:
case CONNECTION_CONSUME:
case CONNECTION_CHECK_STANDBY:
+ case CONNECTION_CHECK_WAL_REPLAY_SIZE:
{
/* Load waiting data */
int n = pqReadData(conn);
@@ -4426,6 +4467,58 @@ keep_going: /* We will come back to here until there is
case CONNECTION_CHECK_TARGET:
{
+
+ /*
+ * Servers before 9.0 have no hot standby mode at all, so treat
+ * them as primaries unconditionally. This mirrors the same check
+ * in the SERVER_TYPE_PRIMARY/STANDBY branch below, but must be
+ * done here first so that the WAL replay size check guard (which
+ * tests in_hot_standby) sees the correct value even when
+ * target_session_attrs=any.
+ */
+ if (conn->sversion < 90000)
+ conn->in_hot_standby = PG_BOOL_NO;
+
+ /*
+ * If the user specified a max WAL replay size, and we haven't
+ * yet checked it for this host, and the server is not known to
+ * be a primary, send an async query to evaluate the replay size.
+ * The check is skipped when target_session_attrs requires a
+ * primary or read-write session, and when the server has already
+ * been identified as a primary via startup parameters.
+ */
+ if (conn->max_wal_replay_size &&
+ !conn->wal_replay_size_checked &&
+ conn->target_server_type != SERVER_TYPE_PRIMARY &&
+ conn->target_server_type != SERVER_TYPE_READ_WRITE &&
+ conn->in_hot_standby != PG_BOOL_NO)
+ {
+ char qbuf[512];
+
+ /*
+ * Build a single query that determines recovery status and
+ * evaluates the replay size in one round-trip. We include
+ * pg_is_in_recovery() so that primaries (where in_hot_standby
+ * may still be unknown) are handled correctly without a
+ * separate query.
+ */
+ snprintf(qbuf, sizeof(qbuf),
+ "SELECT"
+ " pg_catalog.pg_is_in_recovery(),"
+ " pg_wal_lsn_diff(pg_last_wal_receive_lsn(),"
+ " pg_last_wal_replay_lsn())"
+ " > pg_catalog.pg_size_bytes('%s'),"
+ " pg_catalog.pg_size_pretty("
+ " pg_wal_lsn_diff(pg_last_wal_receive_lsn(),"
+ " pg_last_wal_replay_lsn()))",
+ conn->max_wal_replay_size);
+ conn->status = CONNECTION_OK;
+ if (!PQsendQueryContinue(conn, qbuf))
+ goto error_return;
+ conn->status = CONNECTION_CHECK_WAL_REPLAY_SIZE;
+ return PGRES_POLLING_READING;
+ }
+
/*
* If a read-write, read-only, primary, or standby connection
* is required, see if we have one.
@@ -4714,6 +4807,105 @@ keep_going: /* We will come back to here until there is
goto keep_going;
}
+ case CONNECTION_CHECK_WAL_REPLAY_SIZE:
+ {
+ /*
+ * Waiting for result of the WAL replay size check query. We
+ * must transiently set status = CONNECTION_OK in order to use
+ * the result-consuming subroutines.
+ *
+ * All functions in the query (pg_is_in_recovery,
+ * pg_last_wal_receive_lsn, pg_last_wal_replay_lsn,
+ * pg_wal_lsn_diff, pg_size_bytes, pg_size_pretty) are
+ * granted to PUBLIC, so unprivileged users obtain the same
+ * results as superusers. This is intentional: the check
+ * must work for any connecting role.
+ *
+ * The query returns three columns:
+ * 0: pg_is_in_recovery() -- bool
+ * 1: size_exceeds_threshold -- bool, or NULL if no
+ * WAL receiver is active
+ * 2: size_pretty -- text, or NULL
+ */
+ conn->status = CONNECTION_OK;
+ if (!PQconsumeInput(conn))
+ goto error_return;
+
+ if (PQisBusy(conn))
+ {
+ conn->status = CONNECTION_CHECK_WAL_REPLAY_SIZE;
+ return PGRES_POLLING_READING;
+ }
+
+ res = PQgetResult(conn);
+ if (res && PQresultStatus(res) == PGRES_TUPLES_OK &&
+ PQntuples(res) == 1)
+ {
+ /* col 0: not in recovery means this is a primary */
+ if (strcmp(PQgetvalue(res, 0, 0), "f") == 0)
+ {
+ conn->in_hot_standby = PG_BOOL_NO;
+ PQclear(res);
+ conn->wal_replay_size_checked = true;
+ conn->status = CONNECTION_CONSUME;
+ goto keep_going;
+ }
+
+ /*
+ * col 1: NULL means pg_last_wal_receive_lsn() returned
+ * NULL, i.e., no WAL receiver is active on this standby.
+ */
+ else if (PQgetisnull(res, 0, 1))
+ {
+ libpq_append_conn_error(conn,
+ "could not determine WAL replay backlog on standby (no WAL receiver active)");
+ PQclear(res);
+ conn->status = CONNECTION_OK;
+ sendTerminateConn(conn);
+ conn->try_next_host = true;
+ goto keep_going;
+ }
+
+ /* col 1: size exceeds the threshold */
+ else if (strcmp(PQgetvalue(res, 0, 1), "t") == 0)
+ {
+ char *size_pretty = PQgetvalue(res, 0, 2);
+
+ libpq_append_conn_error(conn,
+ "WAL replay size on standby is too large: %s (max_wal_replay_size=%s)",
+ size_pretty,
+ conn->max_wal_replay_size);
+ PQclear(res);
+ conn->status = CONNECTION_OK;
+ sendTerminateConn(conn);
+ conn->try_next_host = true;
+ goto keep_going;
+ }
+
+ /* Replay size is within the acceptable threshold */
+ conn->in_hot_standby = PG_BOOL_YES;
+ PQclear(res);
+ conn->wal_replay_size_checked = true;
+ conn->status = CONNECTION_CONSUME;
+ goto keep_going;
+ }
+
+ /*
+ * Something went wrong with the WAL replay size check query.
+ * If the server returned an error (e.g. invalid size), its
+ * message is already in conn->errorMessage; don't pile on a
+ * generic wrapper on top of it.
+ */
+ if (res == NULL ||
+ PQresultStatus(res) != PGRES_FATAL_ERROR)
+ libpq_append_conn_error(conn, "could not evaluate WAL replay size");
+ PQclear(res);
+ conn->status = CONNECTION_OK;
+ sendTerminateConn(conn);
+ conn->try_next_host = true;
+ goto keep_going;
+ }
+
default:
libpq_append_conn_error(conn,
"invalid connection state %d, probably indicative of memory corruption",
@@ -5033,13 +5225,14 @@ pqMakeEmptyPGconn(void)
conn->std_strings = false; /* unless server says differently */
conn->default_transaction_read_only = PG_BOOL_UNKNOWN;
conn->in_hot_standby = PG_BOOL_UNKNOWN;
+ conn->wal_replay_size_checked = false;
conn->scram_sha_256_iterations = SCRAM_SHA_256_DEFAULT_ITERATIONS;
conn->verbosity = PQERRORS_DEFAULT;
conn->show_context = PQSHOW_CONTEXT_ERRORS;
conn->sock = PGINVALID_SOCKET;
conn->altsock = PGINVALID_SOCKET;
conn->Pfdebug = NULL;
-
+ conn->max_wal_replay_size = NULL;
/*
* We try to send at least 8K at a time, which is the usual size of pipe
* buffers on Unix systems. That way, when we are sending a large amount
diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
index f06e7a972c3..c0b8b699a4c 100644
--- a/src/interfaces/libpq/libpq-fe.h
+++ b/src/interfaces/libpq/libpq-fe.h
@@ -110,6 +110,7 @@ typedef enum
CONNECTION_CHECK_TARGET, /* Internal state: checking target server
* properties. */
CONNECTION_CHECK_STANDBY, /* Checking if server is in standby mode. */
+ CONNECTION_CHECK_WAL_REPLAY_SIZE, /* Checking WAL replay size on standby. */
CONNECTION_ALLOCATED, /* Waiting for connection attempt to be
* started. */
CONNECTION_AUTHENTICATING, /* Authentication is in progress with some
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index bd7eb59f5f8..c16fb376621 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -425,6 +425,8 @@ struct pg_conn
char *ssl_min_protocol_version; /* minimum TLS protocol version */
char *ssl_max_protocol_version; /* maximum TLS protocol version */
char *target_session_attrs; /* desired session properties */
+ char *max_wal_replay_size; /* maximum WAL replay size allowed */
+ bool wal_replay_size_checked; /* WAL replay size check done for current host */
char *require_auth; /* name of the expected auth method */
char *load_balance_hosts; /* load balance over hosts */
char *scram_client_key; /* base64-encoded SCRAM client key */
diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm b/src/test/perl/PostgreSQL/Test/Utils.pm
index ff843eecc6e..dbb1e31f014 100644
--- a/src/test/perl/PostgreSQL/Test/Utils.pm
+++ b/src/test/perl/PostgreSQL/Test/Utils.pm
@@ -142,6 +142,7 @@ BEGIN
PGSSLROOTCERT
PGSSLSNI
PGTARGETSESSIONATTRS
+ PGMAXWALREPLAYSIZE
PGUSER
PGPORT
PGHOST
diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build
index 36d789720a3..e824ffadc12 100644
--- a/src/test/recovery/meson.build
+++ b/src/test/recovery/meson.build
@@ -61,6 +61,7 @@ tests += {
't/050_redo_segment_missing.pl',
't/051_effective_wal_level.pl',
't/052_checkpoint_segment_missing.pl',
+ 't/053_max_wal_replay_size.pl',
],
},
}
diff --git a/src/test/recovery/t/053_max_wal_replay_size.pl b/src/test/recovery/t/053_max_wal_replay_size.pl
new file mode 100644
index 00000000000..4929abddf46
--- /dev/null
+++ b/src/test/recovery/t/053_max_wal_replay_size.pl
@@ -0,0 +1,218 @@
+# Copyright (c) 2026, PostgreSQL Global Development Group
+
+# Tests for the max_wal_replay_size parameter with target_session_attrs
+
+use strict;
+use warnings;
+use PostgreSQL::Test::Utils;
+use PostgreSQL::Test::Cluster;
+use Test::More;
+
+# Initialize the primary node
+my $primary = PostgreSQL::Test::Cluster->new('primary');
+$primary->init(allows_streaming => 1);
+$primary->append_conf('postgresql.conf', "listen_addresses = 'localhost'");
+$primary->start;
+
+# Create and start the first standby (will be paused later to induce lag)
+my $standby1_backup = $primary->backup('my_backup1');
+my $standby_node1 = PostgreSQL::Test::Cluster->new('standby1');
+$standby_node1->init_from_backup($primary, 'my_backup1', has_streaming => 1);
+$standby_node1->append_conf('postgresql.conf', "listen_addresses = 'localhost'");
+$standby_node1->start;
+
+# Create and start the second standby (keeps streaming normally)
+my $standby2_backup = $primary->backup('my_backup2');
+my $standby_node2 = PostgreSQL::Test::Cluster->new('standby2');
+$standby_node2->init_from_backup($primary, 'my_backup2', has_streaming => 1);
+$standby_node2->append_conf('postgresql.conf', "listen_addresses = 'localhost'");
+$standby_node2->start;
+
+# Create and start a third standby that is in recovery but has no WAL receiver
+# (no primary_conninfo, no restore_command). pg_is_in_recovery() returns true
+# but pg_last_wal_receive_lsn() returns NULL, exercising the code path that
+# rejects connections with "no WAL receiver active".
+my $standby_node3 = PostgreSQL::Test::Cluster->new('standby3');
+$standby_node3->init_from_backup($primary, 'my_backup2');
+$standby_node3->append_conf('postgresql.conf', "listen_addresses = 'localhost'");
+$standby_node3->set_standby_mode();
+$standby_node3->start;
+
+# Pause WAL replay on standby1 to simulate lag
+$standby_node1->safe_psql('postgres', 'SELECT pg_wal_replay_pause();');
+
+# Generate WAL activity on primary to cause replay lag on standby1
+$primary->safe_psql('postgres', 'CREATE TABLE t AS SELECT generate_series(1,100000) i');
+$primary->safe_psql('postgres', 'SELECT pg_switch_wal();');
+
+# Wait until standby1 has accumulated a meaningful WAL replay backlog (> 2 MB).
+# Polling avoids a fixed-duration sleep that may be too short on slow systems.
+$standby_node1->poll_query_until(
+ 'postgres',
+ "SELECT pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn()) > 2 * 1024 * 1024")
+ or die "standby1 did not accumulate replay lag in time";
+
+# Collect connection ports
+my ($stdout, $stderr);
+my $port_primary = $primary->port;
+my $port_standby1 = $standby_node1->port;
+my $port_standby2 = $standby_node2->port;
+my $port_standby3 = $standby_node3->port;
+
+# 1. Connects to standby1 when lag is below high threshold (1GB)
+$standby_node1->psql(
+ 'postgres',
+ 'SELECT inet_server_port()',
+ connstr => "dbname=postgres host=localhost,localhost,localhost port=$port_primary,$port_standby1,$port_standby2 target_session_attrs=standby max_wal_replay_size=1GB",
+ stdout => \$stdout,
+ stderr => \$stderr,
+);
+is($stdout, $port_standby1, "Connects to standby1 with high size threshold (1GB)");
+
+# 2. Skips standby1 due to replay size (1MB threshold), connects to standby2
+$standby_node1->psql(
+ 'postgres',
+ 'SELECT inet_server_port()',
+ connstr => "dbname=postgres host=localhost,localhost,localhost port=$port_primary,$port_standby1,$port_standby2 target_session_attrs=standby max_wal_replay_size=1MB",
+ stdout => \$stdout,
+ stderr => \$stderr,
+);
+is($stdout, $port_standby2, "Skips standby1 due to replay size, connects to standby2");
+
+# 3. Skips standby1 due to replay size, connects to primary with prefer-standby
+$standby_node1->psql(
+ 'postgres',
+ 'SELECT inet_server_port()',
+ connstr => "dbname=postgres host=localhost,localhost port=$port_primary,$port_standby1 target_session_attrs=prefer-standby max_wal_replay_size=1MB",
+ stdout => \$stdout,
+ stderr => \$stderr,
+);
+is($stdout, $port_primary, "Connects to primary (prefer-standby) due to standby1 replay size");
+
+# 4. Connects to primary with target_session_attrs=any (standby1 exceeds threshold)
+$standby_node2->psql(
+ 'postgres',
+ 'SELECT inet_server_port()',
+ connstr => "dbname=postgres host=localhost,localhost port=$port_standby1,$port_primary target_session_attrs=any max_wal_replay_size=1MB",
+ stdout => \$stdout,
+ stderr => \$stderr,
+);
+is($stdout, $port_primary, "Connects to primary (any) due to standby1 replay size");
+
+# 5. All connections fail: standby1 exceeds replay size, and only standbys are allowed
+$standby_node1->psql(
+ 'postgres',
+ 'SELECT inet_server_port()',
+ connstr => "dbname=postgres host=localhost,localhost port=$port_primary,$port_standby1 target_session_attrs=standby max_wal_replay_size=1MB",
+ stdout => \$stdout,
+ stderr => \$stderr,
+);
+like($stderr, qr/WAL replay size on standby is too large/,
+ "Connection rejected: replay size exceeds 1MB");
+
+# 6. Invalid value for max_wal_replay_size (non-numeric)
+$standby_node1->psql(
+ 'postgres',
+ 'SELECT inet_server_port()',
+ connstr => "dbname=postgres host=localhost port=$port_standby2 max_wal_replay_size=foo",
+ stdout => \$stdout,
+ stderr => \$stderr,
+);
+like($stderr, qr/invalid size/i, "Rejects non-numeric max_wal_replay_size value");
+
+# 7. Negative max_wal_replay_size value
+$standby_node1->psql(
+ 'postgres',
+ 'SELECT inet_server_port()',
+ connstr => "dbname=postgres host=localhost port=$port_standby2 max_wal_replay_size=-1GB",
+ stdout => \$stdout,
+ stderr => \$stderr,
+);
+like($stderr, qr/invalid max_wal_replay_size value/,
+ "Rejects negative max_wal_replay_size value");
+
+# 8. Replay lag equals threshold exactly (should allow connection)
+my $lag_bytes;
+$standby_node1->psql(
+ 'postgres',
+ 'SELECT pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn())',
+ connstr => "dbname=postgres host=localhost port=$port_standby1",
+ stdout => \$lag_bytes,
+);
+chomp($lag_bytes);
+$standby_node1->psql(
+ 'postgres',
+ 'SELECT 1',
+ connstr => "dbname=postgres host=localhost port=$port_standby1 max_wal_replay_size=$lag_bytes",
+ stdout => \$stdout,
+);
+is($stdout, '1', "Connects when replay lag equals threshold exactly ($lag_bytes bytes)");
+
+# 9. Replay lag exceeds threshold by 1 byte (should reject)
+my $lag_low;
+$standby_node1->psql(
+ 'postgres',
+ 'SELECT pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn()) - 1;',
+ connstr => "dbname=postgres host=localhost port=$port_standby1",
+ stdout => \$lag_low,
+);
+chomp($lag_low);
+$standby_node1->psql(
+ 'postgres',
+ 'SELECT 1',
+ connstr => "dbname=postgres host=localhost port=$port_standby1 max_wal_replay_size=$lag_low",
+ stdout => \$stdout,
+ stderr => \$stderr,
+);
+like($stderr, qr/WAL replay size on standby is too large/,
+ "Connection rejected: replay size exceeds max_wal_replay_size ($lag_low bytes)");
+
+# 10. Connects to standby2 after it is freshly restarted
+$standby_node2->restart;
+$standby_node2->psql(
+ 'postgres',
+ 'SELECT inet_server_port()',
+ connstr => "dbname=postgres host=localhost,localhost,localhost port=$port_primary,$port_standby1,$port_standby2 target_session_attrs=standby max_wal_replay_size=1MB",
+ stdout => \$stdout,
+ stderr => \$stderr,
+);
+is($stdout, $port_standby2, "Connects to standby2 after restart");
+
+# 11. Standby with no active WAL receiver is rejected
+# standby3 is in recovery (standby.signal present) but has no primary_conninfo
+# and no restore_command, so pg_last_wal_receive_lsn() returns NULL.
+$standby_node3->psql(
+ 'postgres',
+ 'SELECT inet_server_port()',
+ connstr => "dbname=postgres host=localhost port=$port_standby3 max_wal_replay_size=1MB",
+ stdout => \$stdout,
+ stderr => \$stderr,
+);
+like($stderr, qr/could not determine WAL replay backlog/,
+ "Rejects standby with no active WAL receiver");
+
+# 12. target_session_attrs=read-only: check still applies.
+# standby1 exceeds the 1MB threshold, so libpq skips it and connects to
+# standby2 (which is current).
+$standby_node2->psql(
+ 'postgres',
+ 'SELECT inet_server_port()',
+ connstr => "dbname=postgres host=localhost,localhost,localhost port=$port_primary,$port_standby1,$port_standby2 target_session_attrs=read-only max_wal_replay_size=1MB",
+ stdout => \$stdout,
+ stderr => \$stderr,
+);
+is($stdout, $port_standby2, "read-only: skips standby1 due to replay size, connects to standby2");
+
+# 13. target_session_attrs=read-write: check is bypassed entirely.
+# The impossibly small threshold of 1 byte must not prevent connecting to
+# the primary (which is the only read-write host).
+$primary->psql(
+ 'postgres',
+ 'SELECT inet_server_port()',
+ connstr => "dbname=postgres host=localhost,localhost port=$port_primary,$port_standby2 target_session_attrs=read-write max_wal_replay_size=1",
+ stdout => \$stdout,
+ stderr => \$stderr,
+);
+is($stdout, $port_primary, "read-write: max_wal_replay_size check bypassed, connects to primary");
+
+done_testing();
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index b8b6a911987..5274603cc7c 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -840,6 +840,7 @@ initialize_environment(void)
unsetenv("PGSSLROOTCERT");
unsetenv("PGSSLSNI");
unsetenv("PGTARGETSESSIONATTRS");
+ unsetenv("PGMAXWALREPLAYSIZE");
unsetenv("PGUSER");
/* PGPORT, see below */
/* PGHOST, see below */
--
2.43.0