Hi,

On Sat, May 30, 2026 at 4:14 PM Zhijie Hou (Fujitsu)
<[email protected]> wrote:
>
> On Saturday, May 30, 2026 1:44 AM Srinath Reddy Sadipiralla 
> <[email protected]>  wrote:
>
> > > On Wed, May 27, 2026 at 5:20 PM Zhijie Hou (Fujitsu) 
> > > <mailto:[email protected]> wrote:
> > > I haven't attached a test for this fix, as the change is straightforward
> > > and the likelihood of encountering this bug is low, so it may not be worth
> > > adding test cycles for it. However, if others feel differently, I'm OK to
> > > add one.
> >
> > +1 for a test. The fix is just an else, so a future refactor could change
> > it and silently reintroduce the corruption. Since it scribbles on an
> > unrelated reused slot, nothing would catch it. Injection points make it
> > deterministic; I've attached a diff patch that adds a test that fails
> > without the fix and passes with it.
>
> Thanks for the test.
>
> I'm not sure if adding an injection point for this rare case is worthwhile.
> Even if we were to add one, future refactoring of that function could shift
> the position of the injection point, so its long-term usefulness is
> uncertain. I don't have a strong opinion on this, so I'll leave it to
> Fujii-San to decide.

Thanks for reporting this issue. I can reproduce it with lldb on Mac.

postgres=# SELECT pg_create_logical_replication_slot(
postgres(#     'test_slot_dropped',
postgres(#     'pgoutput2',
postgres(#     false,
postgres(#     false,
postgres(#     true
postgres(# );
ERROR:  could not access file "pgoutput2": No such file or directory


   787 * decoding be disabled.
   788 */
   789 ReplicationSlotDropAcquired(is_logical);
-> 790 }
   791
   792 /*
   793 * If slot needed to temporarily restrain both data and catalog xmin to
Target 0: (postgres) stopped.
(lldb) expr -- slot->data.name.data
(char[64]) $0 = "test_slot_created"
(lldb) expr -- slot->data.persistency
(ReplicationSlotPersistency) $1 = RS_PERSISTENT
(lldb) expr -- slot->active_proc
(ProcNumber) $2 = 126

The fix looks good to me.

There's an adjacent bug in drop_local_obsolete_slots. The root cause
looks similar -- a ReplicationSlot * is a pointer to a reusable
shared-memory array cell, not a durable identity for the same slot. In
drop_local_obsolete_slots, the slot is freed by
ReplicationSlotDropAcquired(false), and another backend may reuse the
same cell before the subsequent unlock and log message read from it.
This seems less severe -- it does not normally corrupt slot state,
because the code only reads through the pointer after the drop. But
the unlock and the log message can still use the identity of a
different slot. I've attached a test that uses an injection point to
reproduce it, and a patch to fix it.
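
To illustrate the shared root cause, here is a minimal standalone C
sketch. It is not PostgreSQL code: DemoSlot, demo_create, demo_drop and
demo_race are made-up names, and the concurrent backend is simulated by
a plain function call. Copying the name before the drop is just one
possible way to keep a durable identity, assumed here for illustration;
it is not necessarily what the attached patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define DEMO_MAX_SLOTS 2
#define DEMO_NAME_LEN  64

/* Hypothetical stand-in for one cell of a shared-memory slot array. */
typedef struct DemoSlot
{
    bool in_use;
    char name[DEMO_NAME_LEN];
} DemoSlot;

static DemoSlot demo_slots[DEMO_MAX_SLOTS];

/* Claim the first free cell, as another backend creating a slot might. */
static DemoSlot *
demo_create(const char *name)
{
    for (int i = 0; i < DEMO_MAX_SLOTS; i++)
    {
        if (!demo_slots[i].in_use)
        {
            demo_slots[i].in_use = true;
            snprintf(demo_slots[i].name, DEMO_NAME_LEN, "%s", name);
            return &demo_slots[i];
        }
    }
    return NULL;
}

/* Free the cell. The pointer stays valid but no longer identifies the slot. */
static void
demo_drop(DemoSlot *slot)
{
    slot->in_use = false;
}

/*
 * Drop a slot, let the cell be reused, then read the name two ways:
 * through the stale pointer and from a copy taken before the drop.
 * Both output buffers must hold at least DEMO_NAME_LEN bytes.
 */
static void
demo_race(char *stale_name, char *saved_name)
{
    DemoSlot *local_slot;

    memset(demo_slots, 0, sizeof(demo_slots));

    local_slot = demo_create("victim_slot");
    assert(local_slot != NULL);

    /* Copy the identity while the cell still belongs to this slot. */
    snprintf(saved_name, DEMO_NAME_LEN, "%s", local_slot->name);

    demo_drop(local_slot);

    /* Simulated concurrent backend reusing the freed cell. */
    demo_create("replacement_slot");

    /* Reading through the old pointer now sees the other slot's name. */
    snprintf(stale_name, DEMO_NAME_LEN, "%s", local_slot->name);
}
```

Calling demo_race() with two 64-byte buffers from a main() and printing
them shows the stale read reporting the replacement slot's name while
the saved copy still holds the victim's name.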

--
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.
use strict;
use warnings FATAL => 'all';

use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use Test::More;

my $tag = $$;
my $primary = PostgreSQL::Test::Cluster->new("repro_ip_primary_$tag");

$primary->init(allows_streaming => 'logical');
$primary->append_conf(
	'postgresql.conf', qq(
autovacuum = off
log_min_messages = 'debug2'
));
$primary->start;

if (!$primary->check_extension('injection_points'))
{
	plan skip_all => 'Extension injection_points not installed';
}

# Create the extension before the base backup so the standby can call its
# functions while in recovery.
$primary->safe_psql('postgres', 'CREATE EXTENSION injection_points;');

my $backup_name = 'backup';
$primary->backup($backup_name);

my $standby = PostgreSQL::Test::Cluster->new("repro_ip_standby_$tag");
$standby->init_from_backup(
	$primary, $backup_name,
	has_streaming => 1,
	has_restoring => 1);

my $primary_connstr = $primary->connstr;
$standby->append_conf(
	'postgresql.conf', qq(
hot_standby_feedback = on
primary_slot_name = 'phys_slot'
primary_conninfo = '$primary_connstr dbname=postgres'
log_min_messages = 'debug2'
));

$primary->safe_psql(
	'postgres',
	q{
SELECT pg_create_logical_replication_slot('victim_slot', 'pgoutput', false, false, true);
SELECT pg_create_physical_replication_slot('phys_slot');
});

$standby->start;
$primary->advance_wal(1);
$primary->wait_for_replay_catchup($standby);

note('attach injection point on standby');
$standby->safe_psql(
	'postgres',
	q{SELECT injection_points_attach('slotsync-obsolete-after-drop', 'wait');}
);

note('sync failover slot to standby');
$standby->safe_psql('postgres', 'SELECT pg_sync_replication_slots();');

is( $standby->safe_psql(
		'postgres',
		q{SELECT synced FROM pg_replication_slots WHERE slot_name = 'victim_slot';}
	),
	't',
	'victim slot is synchronized to the standby');

note('drop remote failover slot');
$primary->safe_psql('postgres',
	q{SELECT pg_drop_replication_slot('victim_slot');});

my $log_offset = -s $standby->logfile;

note('start slot sync and wait at injection point after local drop');
my $sync = $standby->background_psql('postgres', on_error_stop => 0);
$sync->query_until(
	qr/start_sync/,
	q(
\echo start_sync
SELECT pg_sync_replication_slots();
));

$standby->wait_for_event('client backend', 'slotsync-obsolete-after-drop');

ok( $standby->poll_query_until(
		'postgres',
		q{SELECT NOT EXISTS (SELECT 1 FROM pg_replication_slots WHERE slot_name = 'victim_slot');}
	),
	'victim slot has been dropped locally');

note('reuse freed slot array cell with a physical slot');
$standby->safe_psql('postgres',
	q{SELECT pg_create_physical_replication_slot('replacement_slot');});

note('release injection point');
$standby->safe_psql(
	'postgres',
	q{
SELECT injection_points_wakeup('slotsync-obsolete-after-drop');
SELECT injection_points_detach('slotsync-obsolete-after-drop');
});

ok( $standby->wait_for_log(
		qr/dropped replication slot "replacement_slot" of database with OID 0|you don't own a lock of type AccessShareLock/,
		$log_offset),
	'stale local_slot pointer was observed after drop');

is( $standby->safe_psql(
		'postgres',
		q{SELECT slot_type FROM pg_replication_slots WHERE slot_name = 'replacement_slot';}
	),
	'physical',
	'replacement slot still exists');

$sync->quit;

done_testing();

Attachment: v1-0001-Avoid-stale-slot-access-after-dropping-obsolete-s.patch