Prevent invalidation of newly created replication slots. A race condition could cause a newly created replication slot to become invalidated between WAL reservation and a checkpoint.
Previously, if the required WAL was removed, we retried the reservation process. However, the slot could still be invalidated before the retry if the WAL was not yet removed but the checkpoint advanced the redo pointer beyond the slot's intended restart LSN and computed the minimum LSN that needs to be preserved for the slots. The fix is to acquire an exclusive lock on ReplicationSlotAllocationLock during WAL reservation to serialize WAL reservation and checkpoint's minimum restart_lsn computation. This ensures that, if WAL reservation occurs first, the checkpoint waits until restart_lsn is updated before removing WAL. If the checkpoint runs first, subsequent WAL reservations pick a position at or after the latest checkpoint's redo pointer. We can't use the same fix for branch 17 and prior because commit 2090edc6f3 changed to compute to the minimum restart_LSN among slot's at the beginning of checkpoint (or restart point). The fix for 17 and prior branches is under discussion and will be committed separately. Reported-by: suyu.cmj <[email protected]> Author: Hou Zhijie <[email protected]> Reviewed-by: Vitaly Davydov <[email protected]> Reviewed-by: Masahiko Sawada <[email protected]> Reviewed-by: Amit Kapila <[email protected]> Backpatch-through: 18 Discussion: https://postgr.es/m/5e045179-236f-4f8f-84f1-0f2566ba784c.mengjuan....@alibaba-inc.com Branch ------ master Details ------- https://git.postgresql.org/pg/commitdiff/006dd4b2e5b33a3d0c2780f0bf307f6311ed776b Modified Files -------------- src/backend/replication/slot.c | 100 ++++++++++++++++++++++------------------- 1 file changed, 54 insertions(+), 46 deletions(-)
