Hi,

On Sat, May 30, 2026 at 4:14 PM Zhijie Hou (Fujitsu) <[email protected]> wrote:
>
> On Saturday, May 30, 2026 1:44 AM Srinath Reddy Sadipiralla
> <[email protected]> wrote:
> >
> > On Wed, May 27, 2026 at 5:20 PM Zhijie Hou (Fujitsu)
> > <mailto:[email protected]> wrote:
> > > I haven't attached a test for this fix, as the change is straightforward
> > > and the likelihood of encountering this bug is low, so it may not be
> > > worth adding test cycles for it. However, if others feel differently,
> > > I'm OK to add one.
> >
> > +1 for a test. The fix is just an else, so a future refactor could change
> > it and silently reintroduce the corruption; since it scribbles on an
> > unrelated reused slot, nothing would catch it. Injection points make it
> > deterministic; I've attached a diff patch that adds a test that fails
> > without the fix and passes with it.
>
> Thanks for the test.
>
> I'm not sure if adding an injection point for this rare case is worthwhile.
> Even if we were to add one, future refactoring of that function could shift
> the position of the injection point, so its long-term usefulness is
> uncertain. I don't have a strong opinion on this, so I'll leave it to
> Fujii-San to decide.
Thanks for reporting this issue. I can reproduce it with lldb on Mac.

postgres=# SELECT pg_create_logical_replication_slot(
postgres(#     'test_slot_dropped',
postgres(#     'pgoutput2',
postgres(#     false,
postgres(#     false,
postgres(#     true
postgres(# );
ERROR: could not access file "pgoutput2": No such file or directory

   787       * decoding be disabled.
   788       */
   789      ReplicationSlotDropAcquired(is_logical);
-> 790  }
   791
   792  /*
   793   * If slot needed to temporarily restrain both data and catalog xmin to
Target 0: (postgres) stopped.
(lldb) expr -- slot->data.name.data
(char[64]) $0 = "test_slot_created"
(lldb) expr -- slot->data.persistency
(ReplicationSlotPersistency) $1 = RS_PERSISTENT
(lldb) expr -- slot->active_proc
(ProcNumber) $2 = 126

The fix looks good to me.

There's an adjacent bug around drop_local_obsolete_slots. The root cause of both looks similar: a ReplicationSlot * is a pointer to a reusable shared-memory array cell, not a durable identity for the same slot.

In drop_local_obsolete_slots, the slot has already been freed by ReplicationSlotDropAcquired(false), and another backend may reuse the same cell before the subsequent unlock and log calls read from it. This seems less severe: it does not normally corrupt slot state, because the code only reads through the pointer after the drop. But it can still unlock and log using the identity of a different slot.

Attached are a test that uses an injection point to reproduce it and a patch to fix it.

--
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.
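# Reproducer for stale slot access in drop_local_obsolete_slots(). The test
# pauses slot synchronization at the 'slotsync-obsolete-after-drop' injection
# point right after the obsolete synced slot is dropped locally. It then reuses
# the freed slot array cell with a new physical slot and checks the log for
# evidence that the old pointer was read after the drop.
use strict;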
use warnings FATAL => 'all';
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use Test::More;
my $tag = $$;
my $primary = PostgreSQL::Test::Cluster->new("repro_ip_primary_$tag");
$primary->init(allows_streaming => 'logical');
$primary->append_conf(
'postgresql.conf', qq(
autovacuum = off
log_min_messages = 'debug2'
));
$primary->start;
if (!$primary->check_extension('injection_points'))
{
plan skip_all => 'Extension injection_points not installed';
}
# Create the extension before the base backup so the standby can call its
# functions while in recovery.
$primary->safe_psql('postgres', 'CREATE EXTENSION injection_points;');
my $backup_name = 'backup';
$primary->backup($backup_name);
my $standby = PostgreSQL::Test::Cluster->new("repro_ip_standby_$tag");
$standby->init_from_backup(
$primary, $backup_name,
has_streaming => 1,
has_restoring => 1);
my $primary_connstr = $primary->connstr;
$standby->append_conf(
'postgresql.conf', qq(
hot_standby_feedback = on
primary_slot_name = 'phys_slot'
primary_conninfo = '$primary_connstr dbname=postgres'
log_min_messages = 'debug2'
));
$primary->safe_psql(
'postgres',
q{
SELECT pg_create_logical_replication_slot('victim_slot', 'pgoutput', false, false, true);
SELECT pg_create_physical_replication_slot('phys_slot');
});
$standby->start;
$primary->advance_wal(1);
$primary->wait_for_replay_catchup($standby);
note('attach injection point on standby');
$standby->safe_psql(
'postgres',
q{SELECT injection_points_attach('slotsync-obsolete-after-drop', 'wait');}
);
note('sync failover slot to standby');
$standby->safe_psql('postgres', 'SELECT pg_sync_replication_slots();');
is( $standby->safe_psql(
'postgres',
q{SELECT synced FROM pg_replication_slots WHERE slot_name = 'victim_slot';}
),
't',
'victim slot is synchronized to the standby');
note('drop remote failover slot');
$primary->safe_psql('postgres',
q{SELECT pg_drop_replication_slot('victim_slot');});
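# Remember the current end of the standby log so that wait_for_log() only
# scans messages emitted after this point.
my $log_offset = -s $standby->logfile;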
note('start slot sync and wait at injection point after local drop');
my $sync = $standby->background_psql('postgres', on_error_stop => 0);
$sync->query_until(
qr/start_sync/,
q(
\echo start_sync
SELECT pg_sync_replication_slots();
));
$standby->wait_for_event('client backend', 'slotsync-obsolete-after-drop');
ok( $standby->poll_query_until(
'postgres',
q{SELECT NOT EXISTS (SELECT 1 FROM pg_replication_slots WHERE slot_name = 'victim_slot');}
),
'victim slot has been dropped locally');
note('reuse freed slot array cell with a physical slot');
$standby->safe_psql('postgres',
q{SELECT pg_create_physical_replication_slot('replacement_slot');});
note('release injection point');
$standby->safe_psql(
'postgres',
q{
SELECT injection_points_wakeup('slotsync-obsolete-after-drop');
SELECT injection_points_detach('slotsync-obsolete-after-drop');
});
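# Either message shows the stale pointer being used after the drop. The LOG
# line reports the replacement physical slot (database OID 0) as dropped, or
# the unlock call complains about a lock this backend does not hold.
ok( $standby->wait_for_log(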
qr/dropped replication slot "replacement_slot" of database with OID 0|you don't own a lock of type AccessShareLock/,
$log_offset),
'stale local_slot pointer was observed after drop');
is( $standby->safe_psql(
'postgres',
q{SELECT slot_type FROM pg_replication_slots WHERE slot_name = 'replacement_slot';}
),
'physical',
'replacement slot still exists');
$sync->quit;
done_testing();
v1-0001-Avoid-stale-slot-access-after-dropping-obsolete-s.patch
