Hi Sergey,

Thanks for the report and patch. I think the analysis is right, and the
fix is in the right place.

The gap traces back to commit 7185eddf, which deliberately dropped the
wait_for_catchup() and switched the primary from teardown_node() to a
clean stop(), on the grounds that a clean stop flushes all WAL to both
standbys before exiting. That's true, but only for standbys whose
walsender is *connected* at shutdown time -- and ->start() only waits
for the postmaster to accept connections, not for the standby's
walreceiver to have connected back to the primary. So if a standby
hasn't connected yet when the primary stops, the clean-shutdown flush
skips it, and we're back to the exact "standbys received different
amounts of WAL -> timeline fork on reconnect" failure that 7185eddf was
meant to fix.

Polling pg_stat_replication until both walsenders are present closes
that hole: it re-establishes the precondition the clean-stop design
silently assumed. And connection is enough here -- the walsender
shutdown path sends all WAL up to the shutdown checkpoint regardless of
catchup state -- so there's no need to additionally check
state = 'streaming'.
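
For concreteness, I'd expect the wait to look roughly like the sketch
below. This is not the patch text: it assumes the primary's node object
is $node_primary (as in the quoted test), that the query runs against
the 'postgres' database, and that the die message is just a
placeholder. poll_query_until() retries the query until it returns 't'
or the timeout expires.

```perl
# Sketch only: this would go just before $node_primary->stop.
# Wait until both standbys have connected to the primary, so that the
# clean shutdown sends all remaining WAL to both of them.
$node_primary->poll_query_until('postgres',
	"SELECT count(*) = 2 FROM pg_stat_replication")
  or die "timed out waiting for both standbys to connect";
```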

Two small nits: the rest of this file uses count(*), so I'd write
count(*) = 2 rather than count(1) = 2, just for local consistency. And
the comment reads a little better as something like "Wait until both
standbys have connected to the primary", since by this point they've
already started -- what we're waiting for is the connection.

Regards,
Ewan

On Tue, Jun 16, 2026 at 4:01 PM Sergey Tatarintsev
<[email protected]> wrote:
>
> Hi hackers!
>
> I found that after commit 7185eddf0522b3146ed1ff6e063e8e129e77c706 we
> got little omission
> in TAP test 004_timeline_switch:
> ...
> my $node_standby_1 = PostgreSQL::Test::Cluster->new('standby_1');
> ...
> $node_primary->stop;
>
> There is no guarantee that standby_1 and standby_2 was successfully
> connected to primary and start
> streaming before primary stopped.
>
>   I think we must ensure that primary knows about standby_1 and standby_2
>
> --
> With best regards,
> Sergey Tatarintsev,
> PostgresPro

