Prevent invalidation of newly synced replication slots.

A race condition could cause a newly synced replication slot to become
invalidated between its initial sync and the checkpoint.

When syncing a replication slot to a standby, the slot's initial
restart_lsn is taken from the publisher's remote_restart_lsn. Because slot
sync happens asynchronously, this value can lag behind the standby's
current redo pointer. Without any interlocking between WAL reservation and
checkpoints, a checkpoint may remove WAL required by the newly synced
slot, causing the slot to be invalidated.

To fix this, we acquire ReplicationSlotAllocationLock before reserving WAL
for a newly synced slot, similar to commit 006dd4b2e5. This ensures that
if WAL reservation happens first, the checkpoint process must wait for
slotsync to update the slot's restart_lsn before it computes the minimum
required LSN.

However, unlike in ReplicationSlotReserveWal(), this lock alone cannot
protect a newly synced slot if a checkpoint has already run
CheckPointReplicationSlots() before slotsync updates the slot. In such
cases, the remote restart_lsn may be stale and earlier than the current
redo pointer. To prevent relying on an outdated LSN, we use the oldest
WAL location available if it is greater than the remote restart_lsn.

This ensures that newly synced slots always start with a safe, non-stale
restart_lsn and are not invalidated by concurrent checkpoints.

Author: Zhijie Hou <[email protected]>
Reviewed-by: Hayato Kuroda <[email protected]>
Reviewed-by: Amit Kapila <[email protected]>
Reviewed-by: Vitaly Davydov <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Backpatch-through: 17
Discussion: 
https://postgr.es/m/TY4PR01MB16907E744589B1AB2EE89A31F94D7A%40TY4PR01MB16907.jpnprd01.prod.outlook.com

Branch
------
REL_18_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/919c9fa13cd0684b437a88719d670a9bf6dd0dc8

Modified Files
--------------
src/backend/access/transam/xlog.c                  |  6 +-
src/backend/replication/logical/slotsync.c         | 99 +++++++++++-----------
src/include/access/xlog.h                          |  1 +
src/test/recovery/t/046_checkpoint_logical_slot.pl | 84 +++++++++++++++++-
4 files changed, 137 insertions(+), 53 deletions(-)

Reply via email to