Dear Hayato, On Tuesday, October 07, 2025 14:53 MSK, "Hayato Kuroda (Fujitsu)" <[email protected]> wrote:
> > Yes, I > > propose > > to ensure that the protection happens when we assign restart_lsn. It seems > > to be > > wrong that we invalidate slots by its restart_lsn but protect the wal for > > slots using XLogCtl->replicationSlotsMinLSN. > > Seems valid. There is another corner case that another restart_lsn can be set > in-between, > but they have larger LSN than RedoRecPtr, right? Here we talk about new creating slots. There should be no other processes that can change restart_lsn during slot creation. Once, the slot is successfully created it can be advanced to the greater values only. During advance, the old restart_lsn will protect the slot, because it will be taken into account in the checkpoint. > > There is one subtle thing. Once, the operation of restart_lsn assignment is > > not > > an atomic, the following scenario may happen theoretically: > > 1. Read GetRedoRecPtr() in the backend (ReplicationSlotReserveWal) > > 2. Assign a new redo LSN in the checkpointer > > 3. Call ReplicationSlotsComputeRequiredLSN() in the checkpointer > > 3. Assign the old redo LSN to restart_lsn > > > > In this scenario, the restart_lsn will point to a previous redo LSN and it > > will > > be not protected by the new redo LSN. This scenario is unlikely, but it can > > happen theoretically. I have no ideas how to deal with it, except of > > assigning > > restart_lsn under XLogCtl->info_lck lock to avoid concurrent modification of > > XLogCtl->RecoRecPtr until it is assigned to restart_lsn of a creating slot. > > Oh, your point is there is another race condition, right? Do you have the > reproducer for it? The attached test for the master branch demonstrates a possible but very rare race condition, because we read and assign slot's restart_lsn from redo rec ptr non-atomically. The proposed scenario (see above) seems to be not complete. With best regards, Vitaly
From 85f1b4070bcabdac73a97c3b06a64cf27673eb20 Mon Sep 17 00:00:00 2001 From: Vitaly Davydov <[email protected]> Date: Sun, 26 Oct 2025 11:22:54 +0300 Subject: [PATCH] Test a specific case of physical slot invalidation When ReplicationSlotReserveWal() succeeds to read old RedoRecPtr right before the assignment of a new RedoRecPtr in a checkpointer which runs in parallel, the created slot may be invalidated. --- src/backend/replication/slot.c | 4 + ..._create_physlcal_slot_during_checkpoint.pl | 121 ++++++++++++++++++ 2 files changed, 125 insertions(+) create mode 100644 src/test/recovery/t/100_create_physlcal_slot_during_checkpoint.pl diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c index ac188bb2f77..4310589da26 100644 --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -1578,10 +1578,14 @@ ReplicationSlotReserveWal(void) else restart_lsn = GetXLogInsertRecPtr(); + INJECTION_POINT("physical-slot-reserve-wal-get-redo", NULL); + SpinLockAcquire(&slot->mutex); slot->data.restart_lsn = restart_lsn; SpinLockRelease(&slot->mutex); + INJECTION_POINT("physical-slot-reserve-wal-before-compute-required-lsn", NULL); + /* prevent WAL removal as fast as possible */ ReplicationSlotsComputeRequiredLSN(); diff --git a/src/test/recovery/t/100_create_physlcal_slot_during_checkpoint.pl b/src/test/recovery/t/100_create_physlcal_slot_during_checkpoint.pl new file mode 100644 index 00000000000..f3211e210a5 --- /dev/null +++ b/src/test/recovery/t/100_create_physlcal_slot_during_checkpoint.pl @@ -0,0 +1,121 @@ +# This test checks that the new slot maybe invalidated by checkpoint + +use strict; +use warnings; +use Config; +use PostgreSQL::Test::Cluster; +use PostgreSQL::Test::Utils; +use Test::More; + +if ($ENV{enable_injection_points} ne 'yes') +{ + plan skip_all => 'Injection points not supported by this build'; +} + +my $res; + +# Setup primary node +my $node = PostgreSQL::Test::Cluster->new('primary'); +$node->init(allows_streaming => 1); + +$node->append_conf('postgresql.conf', q{ + wal_level = logical + checkpoint_timeout = 30min + checkpoint_completion_target = 0.1 +}); + +$node->start(); +$node->safe_psql('postgres', q{create extension injection_points;}); + +my $createslot = $node->background_psql('postgres'); +my $checkpoint = $node->background_psql('postgres'); + +sub physreplslot_start_create() +{ + $createslot->query_until( + qr/start-create-slot/, + q(\echo start-create-slot + select pg_create_physical_replication_slot('standby1', true); + )); +} + +sub checkpoint_start() +{ + $checkpoint->query_until( + qr/start-checkpoint/, + q(\echo start-checkpoint + checkpoint; + )); +} + +sub injection_point_attach($$) +{ + my ($injectpname, $event) = @_; + + $node->safe_psql('postgres', "select injection_points_attach('$injectpname','$event')"); +} + +sub injection_point_detach($) +{ + my $injectpname = shift; + + $node->safe_psql('postgres', "select injection_points_detach('$injectpname');"); +} + +sub injection_point_wait($$) +{ + my ($subsystem, $injectpname) = @_; + + note("waiting for $injectpname injection point"); + $node->wait_for_event($subsystem, $injectpname); + note("injection_point $injectpname is reached"); +} + +sub injection_point_wakeup($) +{ + my $injectpname = shift; + + $node->safe_psql('postgres', "select injection_points_wakeup('$injectpname');"); +} + +sub get_slot_invalidation_state($) +{ + my $slotname = shift; + + return $node->safe_psql('postgres', + "select invalidation_reason from pg_replication_slots where slot_name='$slotname'"); +} + +injection_point_attach('physical-slot-reserve-wal-get-redo', 'wait'); +injection_point_attach('checkpoint-before-old-wal-removal', 'wait'); +injection_point_attach("physical-slot-reserve-wal-before-compute-required-lsn", 'wait'); + +# start create physical replication slot +physreplslot_start_create(); + +# reach the get-redo injection point and stop there +injection_point_wait('client backend', 'physical-slot-reserve-wal-get-redo'); + +$node->advance_wal(6); + +# start checkpoint +checkpoint_start(); + +# reach the injection point before old WAL removal and stop there +injection_point_wait('checkpointer', 'checkpoint-before-old-wal-removal'); + +injection_point_wakeup('physical-slot-reserve-wal-get-redo'); + +injection_point_wakeup('checkpoint-before-old-wal-removal'); + +injection_point_wakeup('physical-slot-reserve-wal-before-compute-required-lsn'); + +# complete physical slot creation +eval { $createslot->quit(); }; + +# complete checkpoint +eval { $checkpoint->quit(); }; + +is(get_slot_invalidation_state('standby1'), "", "slot is not invalidated by checkpoint"); + +done_testing(); -- 2.43.0
