On Thu, Jan 29, 2026 at 9:22 PM Xuneng Zhou <[email protected]> wrote:
> Thanks for your report. I can reliably reproduce the issue on HEAD
> using your scripts. I’ve analyzed the problem and am proposing a patch
> to fix it.
>
> --- Analysis
> When a cascading standby streams from an archive-only upstream:
>
> 1. The upstream's GetStandbyFlushRecPtr() returns only replay position
>    (no received-but-not-replayed buffer since there's no walreceiver)
> 2. When streaming ends and the cascade falls back to archive recovery,
>    it can restore WAL segments from its own archive access
> 3. The cascade's read position (RecPtr) advances beyond what the
>    upstream has replayed
> 4. On reconnect, the cascade requests streaming from RecPtr, which the
>    upstream rejects as "ahead of flush position"
>
> --- Proposed Fix
>
> Track the last confirmed flush position from streaming
> (lastStreamedFlush) and clamp the streaming start request when it
> exceeds that position:
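A toy model of steps 3-4 and of the proposed clamp, written as bash functions. The function names are hypothetical and this is not the patch's C code; it only assumes that an LSN printed as X/Y is the high and low 32 bits of a 64-bit WAL position in hex, and that the upstream rejects any start point greater than its flush position. Details such as segment alignment of the start point are deliberately ignored.

```shell
#!/usr/bin/env bash
# Toy model only: function names are hypothetical, not PostgreSQL or patch code.

# Convert an LSN in "X/Y" text form (high/low 32 bits, hex) to one integer.
lsn_to_int() {
  local hi=${1%/*} lo=${1#*/}
  echo $(( (16#$hi << 32) | 16#$lo ))
}

# Upstream-side check: a start point past the flush position is rejected.
start_is_acceptable() {
  local start flush
  start=$(lsn_to_int "$1")
  flush=$(lsn_to_int "$2")
  (( start <= flush ))
}

# Idea modelled on the proposed fix: never request streaming from beyond
# the last flush position confirmed while streaming (lastStreamedFlush).
clamp_start() {
  local start last
  start=$(lsn_to_int "$1")
  last=$(lsn_to_int "$2")
  if (( start > last )); then
    echo "$2"
  else
    echo "$1"
  fi
}

# LSNs taken from the FATAL message in the reproduction.
if ! start_is_acceptable 0/04000000 0/03FFE8D0; then
  echo "rejected: 0/04000000 is ahead of flush position 0/03FFE8D0"
fi
echo "clamped start: $(clamp_start 0/04000000 0/03FFE8D0)"
```

Run it with bash: the `16#...` base syntax and `(( ))` arithmetic are bash features, not POSIX sh.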
I haven't read the patch yet, but doesn't lastStreamedFlush represent
the same LSN as tliRecPtr or replayLSN (the arguments to
WaitForWALToBecomeAvailable())? If so, we may not need to introduce
a new variable to track this LSN.

The choice of which LSN is used as the replication start point has
varied over time to handle corner cases (for example, commit
06687198018). That makes me wonder whether we should first better
understand why WaitForWALToBecomeAvailable() currently uses RecPtr
as the starting point.

BTW, with the v1 patch, I was still able to reproduce the issue using
the following steps:

--------------------------------------------
# primary with WAL archiving; restore_command is inherited by the standbys
initdb -D data
mkdir arch
cat <<EOF >> data/postgresql.conf
archive_mode = on
archive_command = 'cp %p ../arch/%f'
restore_command = 'cp ../arch/%f %p'
EOF
pg_ctl -D data start
pg_basebackup -D sby1 -c fast
cp -a sby1 sby2

# sby1: archive-only standby (no primary_conninfo)
cat <<EOF >> sby1/postgresql.conf
port = 5433
EOF
touch sby1/standby.signal
pg_ctl -D sby1 start

# sby2: cascading standby streaming from sby1
cat <<EOF >> sby2/postgresql.conf
port = 5434
primary_conninfo = 'port=5433'
EOF
touch sby2/standby.signal
pg_ctl -D sby2 start

pgbench -i -s2
pg_ctl -D sby2 restart
--------------------------------------------

In this case, after restarting the cascading standby (sby2, which
connects to another standby), I observed the following error:

FATAL:  could not receive data from WAL stream: ERROR:  requested starting point 0/04000000 is ahead of the WAL flush position of this server 0/03FFE8D0

Regards,

-- 
Fujii Masao
