Hi,
While testing “Toggle logical decoding dynamically based on logical slot presence”, I hit an assertion failure during concurrent logical slot creation.
Here is a repro:
1. In session 1, attach the injection point locally and start creating a
logical slot. The session blocks at logical-decoding-activation:
```
evantest=# set application_name = 'slot_a';
SET
evantest=# select injection_points_set_local();
injection_points_set_local
----------------------------
(1 row)
evantest=# select injection_points_attach('logical-decoding-activation', 'wait');
injection_points_attach
-------------------------
(1 row)
evantest=# select pg_create_logical_replication_slot('slot_a', 'pgoutput');
```
2. In session 2, create another logical slot. This succeeds, and
effective_wal_level becomes logical:
```
evantest=# select pg_create_logical_replication_slot('slot_b', 'pgoutput');
pg_create_logical_replication_slot
------------------------------------
(slot_b,0/0902E418)
(1 row)
evantest=# show effective_wal_level;
effective_wal_level
---------------------
logical
(1 row)
```
3. In session 2, cancel session 1 instead of waking it up:
```
evantest=# select pg_cancel_backend(pid) from pg_stat_activity where application_name = 'slot_a';
pg_cancel_backend
-------------------
t
(1 row)
```
Then the server hits this assertion:
```
TRAP: failed Assert("!LogicalDecodingCtl->logical_decoding_enabled"), File: "logicalctl.c", Line: 266, PID: 13768
0 postgres 0x00000001032b35d8 ExceptionalCondition + 216
1 postgres 0x0000000102f64600 abort_logical_decoding_activation + 120
2 postgres 0x0000000102f6451c EnsureLogicalDecodingEnabled + 412
3 postgres 0x0000000102f9f314 create_logical_replication_slot + 164
4 postgres 0x0000000102f9f1c4 pg_create_logical_replication_slot + 312
5 postgres 0x0000000102ce5f48 ExecInterpExpr + 3888
6 postgres 0x0000000102ce48b4 ExecInterpExprStillValid + 76
7 postgres 0x0000000102d57e94 ExecEvalExprNoReturn + 44
8 postgres 0x0000000102d57e54 ExecEvalExprNoReturnSwitchContext + 48
9 postgres 0x0000000102d57d18 ExecProject + 72
10 postgres 0x0000000102d57a9c ExecResult + 312
11 postgres 0x0000000102d06f1c ExecProcNodeFirst + 92
12 postgres 0x0000000102cfd8cc ExecProcNode + 60
13 postgres 0x0000000102cf83fc ExecutePlan + 244
14 postgres 0x0000000102cf8298 standard_ExecutorRun + 456
15 postgres 0x0000000102cf80c0 ExecutorRun + 84
16 postgres 0x000000010306fc64 PortalRunSelect + 296
17 postgres 0x000000010306f674 PortalRun + 656
18 postgres 0x000000010306a220 exec_simple_query + 1372
19 postgres 0x0000000103069348 PostgresMain + 3224
20 postgres 0x0000000103060a3c BackendInitialize + 0
21 postgres 0x0000000102f27db8 postmaster_child_launch + 464
22 postgres 0x0000000102f2f2ec BackendStartup + 304
23 postgres 0x0000000102f2d260 ServerLoop + 372
24 postgres 0x0000000102f2bd8c PostmasterMain + 6256
25 postgres 0x0000000102d99e84 main + 924
26 dyld 0x000000018cef7e00 start + 6992
2026-05-28 13:28:32.526 CST [13753] LOG: client backend (PID 13768) was terminated by signal 6: Abort trap: 6
2026-05-28 13:28:32.526 CST [13753] DETAIL: Failed process was running: select pg_create_logical_replication_slot('slot_a', 'pgoutput');
```
From my tracing, when session 1 is cancelled, it enters
abort_logical_decoding_activation(), which contains this assertion:
```
Assert(!LogicalDecodingCtl->logical_decoding_enabled);
```
But by that point session 2 has already created its slot and set
LogicalDecodingCtl->logical_decoding_enabled to true, so the assertion no
longer holds. This is a race between two concurrent activations.
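To make the interleaving concrete, here is a toy replay of the three repro
steps. It is not PostgreSQL code: the ToyDecodingCtl struct and every toy_
function are made up for illustration, and only the field name
logical_decoding_enabled comes from the assertion above.

```c
#include <stdbool.h>

/* Toy stand-in for the shared state; only the field from the assertion. */
typedef struct
{
    bool logical_decoding_enabled;
} ToyDecodingCtl;

/* Session 2: finishes its own activation while session 1 is still parked. */
void
toy_complete_activation(ToyDecodingCtl *ctl)
{
    ctl->logical_decoding_enabled = true;
}

/* Session 1's abort path: does the asserted precondition still hold? */
bool
toy_abort_precondition_holds(const ToyDecodingCtl *ctl)
{
    return !ctl->logical_decoding_enabled;
}

/* Replays the repro; returns true if the assertion would fire. */
bool
toy_replay_repro(void)
{
    ToyDecodingCtl ctl = { false };

    /* Step 1: session 1 starts activating and blocks; flag is still false. */

    /* Step 2: session 2 creates slot_b and completes activation. */
    toy_complete_activation(&ctl);

    /* Step 3: session 1 is cancelled and runs its abort path. */
    return !toy_abort_precondition_holds(&ctl);
}
```

In this model toy_replay_repro() returns true, matching the TRAP: the abort
path assumes nobody else enabled logical decoding in the meantime.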
I might be overthinking this, but I feel the safest fix is to serialize
EnableLogicalDecoding(). I tried serializing with
LogicalDecodingControlLock and with a separate lock, but both approaches
deadlocked around the barrier wait. I ended up adding an
activation_in_progress flag in shared memory, protected by
LogicalDecodingControlLock, plus a condition variable on which other backends
wait for the in-progress activation to finish.
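The general shape of that approach can be sketched with plain pthreads. This
is only an illustration of the flag-plus-condition-variable pattern, not the
attached patch: the mutex stands in for LogicalDecodingControlLock, the
pthread condition variable stands in for PostgreSQL's ConditionVariable, and
ToyCtl and all toy_ functions are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

typedef struct
{
    pthread_mutex_t lock;           /* stand-in for LogicalDecodingControlLock */
    pthread_cond_t  activation_cv;  /* woken when an activation ends */
    bool            activation_in_progress;
    bool            logical_decoding_enabled;
} ToyCtl;

void
toy_ctl_init(ToyCtl *ctl)
{
    pthread_mutex_init(&ctl->lock, NULL);
    pthread_cond_init(&ctl->activation_cv, NULL);
    ctl->activation_in_progress = false;
    ctl->logical_decoding_enabled = false;
}

/*
 * Wait until no other activation is running, then claim the activator role.
 * Returns false if decoding is already enabled and nothing needs to be done.
 */
bool
toy_begin_activation(ToyCtl *ctl)
{
    bool claimed = false;

    pthread_mutex_lock(&ctl->lock);
    while (ctl->activation_in_progress)
        pthread_cond_wait(&ctl->activation_cv, &ctl->lock);
    if (!ctl->logical_decoding_enabled)
    {
        ctl->activation_in_progress = true;
        claimed = true;
    }
    pthread_mutex_unlock(&ctl->lock);
    return claimed;
}

/*
 * Finish (success) or abort (!success) the activation claimed above.
 * The lock is held only to flip the flags, never across the slow part
 * of the activation, which would sit between begin and end.
 */
void
toy_end_activation(ToyCtl *ctl, bool success)
{
    pthread_mutex_lock(&ctl->lock);
    assert(ctl->activation_in_progress);
    if (success)
        ctl->logical_decoding_enabled = true;
    ctl->activation_in_progress = false;
    pthread_cond_broadcast(&ctl->activation_cv);
    pthread_mutex_unlock(&ctl->lock);
}
```

Because the lock is released between toy_begin_activation() and
toy_end_activation(), a waiter blocks only on the condition variable, and an
aborting activator can be sure nobody else flipped logical_decoding_enabled
while it held the activator role.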
With this fix, rerunning the repro makes session 2 wait while session 1 is
blocked at the injection point. After canceling session 1 from session 3,
session 2 continues, creates the slot successfully, and effective_wal_level
becomes logical.
I didn’t include a test in this patch, as I wasn’t sure such a test would be
desirable. If others think it is worth adding, I can convert the repro into a
TAP test.
See the attached patch for details.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
v1-0001-Fix-race-during-concurrent-logical-decoding-activ.patch
