Prevent invalidation of newly created replication slots.

A race condition could cause a newly created replication slot to become
invalidated between WAL reservation and a checkpoint.

Previously, if the required WAL had already been removed, we retried the
reservation. However, the slot could still be invalidated before the retry:
even while the WAL was still present, a checkpoint could advance the redo
pointer beyond the slot's intended restart LSN and compute the minimum LSN
to be preserved for the slots before the slot's restart_lsn was set.

The fix is to acquire an exclusive lock on ReplicationSlotAllocationLock
during WAL reservation, and a shared lock during the minimum LSN
calculation at checkpoints to serialize the process. This ensures that, if
WAL reservation occurs first, the checkpoint waits until restart_lsn is
updated before calculating the minimum LSN. If the checkpoint runs first,
subsequent WAL reservations pick a position at or after the latest
checkpoint's redo pointer.
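
The two orderings described above can be illustrated with a toy model. This
is only a sketch, not the patch itself: LSNs are plain integers, the names
SlotModel, reserve_wal, checkpoint and run_scenario are hypothetical, and the
exclusive and shared modes of ReplicationSlotAllocationLock are collapsed
into a single Python mutex. It also assumes the checkpoint keeps all WAL from
the smaller of the redo pointer and the slots' minimum LSN onwards.

```python
import threading


class SlotModel:
    """Toy model of the locking protocol; LSNs are plain integers."""

    def __init__(self, redo_ptr):
        # Stand-in for ReplicationSlotAllocationLock. The exclusive and
        # shared modes are collapsed into one mutex for simplicity.
        self.allocation_lock = threading.Lock()
        self.redo_ptr = redo_ptr
        self.restart_lsn = {}  # slot name -> reserved restart LSN

    def reserve_wal(self, slot):
        # Reading the redo pointer and publishing restart_lsn happen
        # under the lock, so a checkpoint cannot compute the minimum
        # LSN in between.
        with self.allocation_lock:
            self.restart_lsn[slot] = self.redo_ptr
            return self.restart_lsn[slot]

    def checkpoint(self, new_redo_ptr):
        # The redo pointer advances first; the minimum LSN to preserve
        # is then computed under the lock.
        self.redo_ptr = new_redo_ptr
        with self.allocation_lock:
            return min([self.redo_ptr, *self.restart_lsn.values()])


def run_scenario(reservation_first):
    """Return (restart_lsn, cutoff); WAL before cutoff may be removed."""
    model = SlotModel(redo_ptr=100)
    if reservation_first:
        restart = model.reserve_wal("s1")
        cutoff = model.checkpoint(new_redo_ptr=200)
    else:
        cutoff = model.checkpoint(new_redo_ptr=200)
        restart = model.reserve_wal("s1")
    return restart, cutoff


if __name__ == "__main__":
    for first in (True, False):
        restart, cutoff = run_scenario(first)
        print(first, restart, cutoff, restart >= cutoff)
```

In both orderings the slot's restart LSN is at or after the cutoff, so the
slot is not invalidated in this model.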

A similar fix was applied in HEAD (via commit 006dd4b2e5) and in 18. The
difference is that in 17 and earlier branches we additionally need to handle
the race with the computation of the slots' minimum LSN during checkpoints.

Reported-by: suyu.cmj <[email protected]>
Author: Hou Zhijie <[email protected]>
Author: vignesh C <[email protected]>
Reviewed-by: Hayato Kuroda <[email protected]>
Reviewed-by: Masahiko Sawada <[email protected]>
Reviewed-by: Amit Kapila <[email protected]>
Backpatch-through: 14
Discussion: https://postgr.es/m/5e045179-236f-4f8f-84f1-0f2566ba784c.mengjuan....@alibaba-inc.com

Branch
------
REL_14_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/7406df60569f77b20783950c05d86c157a4a10a2

Modified Files
--------------
src/backend/access/transam/xlog.c |  30 +++++++++--
src/backend/replication/slot.c    | 106 +++++++++++++++++++-------------------
2 files changed, 81 insertions(+), 55 deletions(-)
