> On Feb 2, 2026, at 15:37, Zhijie Hou (Fujitsu) <[email protected]> wrote: > > On Monday, February 2, 2026 3:16 PM Chao Li <[email protected]> wrote: >> >>> On Feb 2, 2026, at 14:43, Zhijie Hou (Fujitsu) <[email protected]> >> wrote: >>> >>> On Monday, February 2, 2026 2:25 PM Chao Li <[email protected]> wrote: >>> >>>> >>>> Another related issue is in StartLogicalReplication(): the function >>>> initially asserts MyReplicationSlot == NULL, then calls >>>> ReplicationSlotAcquire(), which is expected to set up >>>> MyReplicationSlot. Since MyReplicationSlot is dereferenced later in >>>> this function, I think it would be clearer and safer to add an >>>> Assert(MyReplicationSlot != NULL) immediately after the call to >> ReplicationSlotAcquire(). >>> >>> I don't think the suggested Assert is unnecessary, as the primary >>> function of >>> ReplicationSlotAcquire() is to get MyReplicationSlot assigned a valid >>> value. Any issues that arise should be handled within the function >>> itself. I think an Assert placed after this function does not add additional >> safety. >>> >> >> Yes, ReplicationSlotAcquire() has a retry logic to ensure MyReplicationSlot >> to >> be set. So adding a NULL check here is a bit redundant. How about >> Assert(SlotIsLogical(MyReplicationSlot))? Does it make more sense? > > I think we cannot assume the slot type here. A suitable checking might > be: If a physical slot was acquired during logical replication, report an > error, > just like we do in StartReplication().
Good point. In StartReplication(), we check MyReplicationSlot is not logical, correspondingly, in StartLogicalReplication(), we should check MyReplicationSlot is not physical. > > I haven't verified whether this error will occur in practice, but I think we > could reproduce this by dropping the original logical slot and then creating a > new physical slot with the same name concurrently. > Without sufficient load, this race is hard to trigger manually, so I added a TAP test using an injection point. With the injection point in place, the TAP test can hit the error: ``` # +++ tap check in src/test/recovery +++ t/052_logical_slot_type_race.pl .. ok All tests successful. Files=1, Tests=2, 3 wallclock secs ( 0.00 usr 0.00 sys + 0.11 cusr 0.36 csys = 0.47 CPU) Result: PASS ``` I think this test demonstrates that adding a sanity check in StartLogicalReplication() is necessary. I’ve put the TAP script in 0002 and will leave it to the committer to decide whether to accept the test. PFA v2: * 0001 changes the Assert to explicitly verify that MyReplicationSlot is a logical slot. * 0002 adds a TAP test covering a race where a logical slot is dropped while a physical slot with the same name is created concurrently. If 0002 is accepted, I’m fine with squashing it into 0001. Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/
v2-0001-walsender-Assert-MyReplicationSlot-is-set-before-.patch
Description: Binary data
v2-0002-Add-TAP-test-for-logical-slot-type-race.patch
Description: Binary data
