Re: Newly created replication slot may be invalidated by checkpoint

Amit Kapila Fri, 05 Dec 2025 04:10:59 -0800

On Thu, Dec 4, 2025 at 12:12 PM Zhijie Hou (Fujitsu)
<[email protected]> wrote:
>
> On Thursday, December 4, 2025 1:58 PM Amit Kapila <[email protected]> 
> wrote:
> >
> > On Thu, Dec 4, 2025 at 2:04 AM Masahiko Sawada
> > <[email protected]> wrote:
> > >
> > > On Tue, Dec 2, 2025 at 10:15 PM Zhijie Hou (Fujitsu)
> > > <[email protected]> wrote:
> > > >
> > > > I think the invalidation cannot occur when copying because:
> > > >
> > > > Currently, there are no CHECK_FOR_INTERRUPTS() calls between the
> > > > initial restart_lsn copy (first phase) and the latest restart_lsn copy 
> > > > (second phase).
> > > > As a result, even if a checkpoint attempts to invalidate a slot and
> > > > sends a SIGTERM to the backend, the backend will first update the
> > > > restart_lsn during the second phase before responding to the signal.
> > > > Consequently, during the next cycle of
> > > > InvalidatePossiblyObsoleteSlot(), the checkpoint will observe the 
> > > > updated
> > > > restart_lsn and skip the invalidation.
> > > >
> > > > For logical slots, although invoking the output plugin startup
> > > > callback presents a slight chance of processing the signal (when
> > > > using third-party plugins), slot invalidation in this scenario
> > > > results in immediate slot dropping, because the slot is in RS_EPHEMERAL
> > state, thus preventing invalidation.
> > >
> > > Thank you for the analysis. I agree.
> > >
> > > > While theoretically, slot invalidation could occur if the code
> > > > changes in the future, addressing that possibility could be
> > > > considered an independent improvement task. What do you think ?
> > >
> > > Okay. I find that it also might make sense for HEAD to use
> > > RS_EPHEMERAL state for physical slots too to avoid being invalidated
> > > during creation, which probably can be discussed later. For back
> > > branches, the proposed idea of acquiring ReplicationSlotAllocationLock
> > > in an exclusive mode would be better. I think we might want to have a
> > > comment in CheckPointReplicationSlots() too that refers to
> > > ReplicationSlotReserveWal().
> > >
> > > Regarding whether we revert the original fix 2090edc6f32 and make it
> > > the same as we did in HEAD ca307d5cec90a4f, we need to change the size
> > > of ReplicationSlot struct. I'm concerned that it's really safe to
> > > change it because the data resides on the shared memory. For example,
> > > we typically iterate over all replication slots as follow:
> > >
> > > for (i = 0; i < max_replication_slots; i++) {
> > >     ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];
> > >
> > > I'm concerned that the arithmetic for calculating the slot address is
> > > affected by the size of ReplicationSlot change.
> > >
> >
> > Yes, this is a valid concern. I think we can go-ahead with fixing the 
> > 0001's-fix
> > in HEAD and 18. We can discuss separately the fix for back-branches prior to
> > 18.
>
> Here are the updated patches for HEAD and 18. I did not add tests since, after
> applying the patch and resolving the issue, the only observable behavior is 
> that
> the checkpoint will wait for another backend to create a slot due to the 
> lwlock
> lock, so it seems not worth to test solely lwlock wait event (I could not 
> find similar
> tests).
>


Fair enough. The patch looks mostly good to me, attached are minor
comment improvements atop the HEAD patch. I'll do some more testing
before push.

Sawada-san/Vitaly, do you have any opinion on patch or the direction
to fix? The idea is to get this fixed for HEAD and 18, then continue
discussion for other bank-branches and the remaining patches.

-- 
With Regards,
Amit Kapila.

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index b496b11b5a9..77e022dfb0a 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1580,16 +1580,19 @@ ReplicationSlotReserveWal(void)
        Assert(!XLogRecPtrIsValid(slot->last_saved_restart_lsn));
 
        /*
+        * The replication slot mechanism is used to prevent the removal of
+        * required WAL.
+        *
         * Acquire an exclusive lock to prevent the checkpoint process from
-        * concurrently calculating the minimum slot LSN (see
-        * CheckPointReplicationSlots), ensuring that the reserved WAL cannot be
-        * removed during a checkpoint.
+        * concurrently computing the minimum slot LSN (see
+        * CheckPointReplicationSlots). This ensures that the WAL reserved for
+        * replication cannot be removed during a checkpoint.
         *
         * The mechanism is reliable because if WAL reservation occurs first, 
the
         * checkpoint must wait for the restart_lsn update before determining 
the
-        * minimum non-removable LSN. On the other hand, if the checkpoint 
occurs
-        * first, subsequent WAL reservations must choose positions beyond or
-        * equal to the redo pointer of checkpoint.
+        * minimum non-removable LSN. On the other hand, if the checkpoint 
happens
+        * first, subsequent WAL reservations will select positions at or beyond
+        * the redo pointer of that checkpoint.
         */
        LWLockAcquire(ReplicationSlotAllocationLock, LW_EXCLUSIVE);
 
@@ -1603,10 +1606,9 @@ ReplicationSlotReserveWal(void)
         * record.
         *
         * None of this is needed (or indeed helpful) for physical slots as
-        * they'll start replay at the last logged checkpoint anyway. Instead
-        * return the location of the last redo LSN. While that slightly 
increases
-        * the chance that we have to retry, it's where a base backup has to 
start
-        * replay at.
+        * they'll start replay at the last logged checkpoint anyway. Instead,
+        * return the location of the last redo LSN, where a base backup has to
+        * start replay at.
         */
        if (SlotIsPhysical(slot))
                restart_lsn = GetRedoRecPtr();
@@ -1622,6 +1624,7 @@ ReplicationSlotReserveWal(void)
        /* prevent WAL removal as fast as possible */
        ReplicationSlotsComputeRequiredLSN();
 
+       /* Checkpoint shouldn't remove the required WAL. */
        XLByteToSeg(slot->data.restart_lsn, segno, wal_segment_size);
        if (XLogGetLastRemovedSegno() >= segno)
                elog(ERROR, "WAL required by replication slot %s has been 
removed concurrently",
@@ -2139,8 +2142,8 @@ CheckPointReplicationSlots(bool is_shutdown)
         * Additionally, acquiring the Allocation lock is necessary to serialize
         * the slot flush process with concurrent slot WAL reservation. This
         * ensures that the WAL position being reserved is either flushed to 
disk
-        * or beyond or equal to the redo pointer (See ReplicationSlotReserveWal
-        * for details).
+        * or is beyond or equal to the redo pointer of the current checkpoint 
(See
+        * ReplicationSlotReserveWal for details).
         */
        LWLockAcquire(ReplicationSlotAllocationLock, LW_SHARED);

Re: Newly created replication slot may be invalidated by checkpoint

Reply via email to