Dear Amit, Alexander,

> > Regarding assertion failure, I've found that assert in
> > PhysicalConfirmReceivedLocation() conflicts with restart_lsn
> > previously set by ReplicationSlotReserveWal().  As I can see,
> > ReplicationSlotReserveWal() just picks fresh XLogCtl->RedoRecPtr lsn.
> > So, it doesn't seems there is a guarantee that restart_lsn never goes
> > backward.  The commit in ReplicationSlotReserveWal() even states there
> > is a "chance that we have to retry".
> >
> 
> I don't see how this theory can lead to a restart_lsn of a slot going
> backwards. The retry mentioned there is just a retry to reserve the
> slot's position again if the required WAL is already removed. Such a
> retry can only get the position later than the previous restart_lsn.

We analyzed the assertion failure that happened in pg_basebackup/020_pg_receivewal [1],
and confirmed that restart_lsn can go backwards. This means that the Assert() added
by ca307d5 can cause a crash.

Background
===========
When pg_receivewal starts replication using a replication slot, it uses the
beginning of the segment that contains the slot's restart_lsn as the startpoint.
E.g., if the restart_lsn of the slot is 0/B000D0, pg_receivewal requests WAL
from 0/B00000.
For more details of this behavior, see commits f61e1dd2 and d9bae531.
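
As a rough illustration, here is a minimal standalone sketch of that rounding
(the program itself and the 1MB wal_segment_size are my assumptions, the size
chosen to match the boundary in the example above; the macro mirrors the
arithmetic of XLogSegmentOffset() in xlog_internal.h):

#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;

/* mirrors the arithmetic of XLogSegmentOffset() in xlog_internal.h */
#define XLogSegmentOffset(xlogptr, wal_segsz_bytes) \
	((xlogptr) & ((wal_segsz_bytes) - 1))

int
main(void)
{
	uint64_t	wal_segsz_bytes = 1024 * 1024;	/* assumed 1MB segments */
	XLogRecPtr	restart_lsn = 0xB000D0;			/* 0/B000D0 from the slot */
	XLogRecPtr	startpoint = restart_lsn;

	/* pg_receivewal always starts streaming at a segment boundary */
	startpoint -= XLogSegmentOffset(startpoint, wal_segsz_bytes);

	/* prints: restart_lsn = 0/B000D0, startpoint = 0/B00000 */
	printf("restart_lsn = %X/%X, startpoint = %X/%X\n",
		   (unsigned) (restart_lsn >> 32), (unsigned) restart_lsn,
		   (unsigned) (startpoint >> 32), (unsigned) startpoint);
	return 0;
}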

What happened here
==================
Based on the above, walsender started sending from the beginning of the segment
(0/B00000). When walreceiver received the data, it sent a reply; at that time the
flushed WAL location was 0/B00000. walsender then set the reported LSN as
restart_lsn in PhysicalConfirmReceivedLocation(). Here the restart_lsn went
backwards (0/B000D0 -> 0/B00000).

The assertion failure could happen if a CHECKPOINT ran at that time. The slot's
last_saved_restart_lsn was 0/B000D0, but data.restart_lsn was 0/B00000, which
violated the assertion added in InvalidatePossiblyObsoleteSlot().
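
To make the sequence concrete, the following is a simplified standalone model
(not the actual server code; the struct, the function names, and the exact form
of the check are my approximations) of how taking the reported flush position
as-is can move restart_lsn backwards past the value saved at the previous
checkpoint:

#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/* illustrative stand-in for the relevant replication-slot fields */
typedef struct SlotModel
{
	XLogRecPtr	restart_lsn;			/* in-memory data.restart_lsn */
	XLogRecPtr	last_saved_restart_lsn;	/* value persisted by the last checkpoint */
} SlotModel;

/* models PhysicalConfirmReceivedLocation(): the reported flush LSN is taken as-is */
static void
confirm_received(SlotModel *slot, XLogRecPtr reported_flush_lsn)
{
	slot->restart_lsn = reported_flush_lsn;
}

/* models the invariant assumed by the Assert() in InvalidatePossiblyObsoleteSlot() */
static void
checkpoint_check(const SlotModel *slot)
{
	assert(slot->last_saved_restart_lsn <= slot->restart_lsn);
}

int
main(void)
{
	SlotModel	slot = {
		.restart_lsn = 0xB000D0,			/* 0/B000D0 */
		.last_saved_restart_lsn = 0xB000D0	/* saved by an earlier checkpoint */
	};

	/* the client streams from the segment boundary and reports 0/B00000 as flushed */
	confirm_received(&slot, 0xB00000);

	/* restart_lsn has gone backwards, so a checkpoint now trips the assertion */
	checkpoint_check(&slot);
	return 0;
}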

Note
====
1.
In this case, starting from the beginning of the segment is not a problem,
because the checkpoint process only removes WAL files from segments that precede
the segment containing restart_lsn. The current segment (0/B00000) will not be
removed, so there is no risk of data loss or inconsistency.
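
For illustration (a standalone sketch under the same assumed 1MB
wal_segment_size; the macro mirrors the arithmetic of XLByteToSeg() in
xlog_internal.h), both LSNs map to the same segment number, so a removal cutoff
computed from either value keeps that segment:

#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;
typedef uint64_t XLogSegNo;

/* mirrors the arithmetic of XLByteToSeg() in xlog_internal.h */
#define XLByteToSeg(xlrp, logSegNo, wal_segsz_bytes) \
	((logSegNo) = (xlrp) / (wal_segsz_bytes))

int
main(void)
{
	uint64_t	wal_segsz_bytes = 1024 * 1024;	/* assumed 1MB segments */
	XLogSegNo	seg_saved, seg_reported;

	XLByteToSeg((XLogRecPtr) 0xB000D0, seg_saved, wal_segsz_bytes);		/* 0/B000D0 */
	XLByteToSeg((XLogRecPtr) 0xB00000, seg_reported, wal_segsz_bytes);	/* 0/B00000 */

	/*
	 * Only segments strictly before the cutoff segment are candidates for
	 * removal; both LSNs fall into segment 11 here, so it is retained.
	 */
	printf("segno(0/B000D0) = %llu, segno(0/B00000) = %llu\n",
		   (unsigned long long) seg_saved, (unsigned long long) seg_reported);
	return 0;
}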

2.
A similar pattern applies to pg_basebackup. Both use logic that adjusts the
requested streaming position to the start of the segment, and both reply with the
received LSN as the flushed location.

3.
I considered the theory above, but I could not reproduce the failure in
040_standby_failover_slots_sync because it is a timing issue. Has anyone else been
able to reproduce it?

We are still investigating the failure in 040_standby_failover_slots_sync.

[1]: 
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=scorpion&dt=2025-06-17%2000%3A40%3A46&stg=pg_basebackup-check

Best regards,
Hayato Kuroda
FUJITSU LIMITED
