Fix race when logical decoding activation is concurrently interrupted.

EnableLogicalDecoding() sets xlog_logical_info to true, emits a
procsignal barrier, sets logical_decoding_enabled to true, and then
writes a WAL record. If the activating backend is interrupted between
these steps, a PG_ENSURE_ERROR_CLEANUP() callback runs to undo the
partial activation.

The previous callback asserted that logical_decoding_enabled was still
false and then cleared xlog_logical_info. Both actions were unsafe
when a second backend was activating concurrently: the peer backend
might already have observed xlog_logical_info as true, set
logical_decoding_enabled to true, and written the activation WAL
record before the first backend's callback fired, so the first backend
would fail the assertion.

Fix this by having the abort callback call
RequestDisableLogicalDecoding(), allowing the checkpointer to undo the
partial activation in the same manner as a normal deactivation. This
simplifies the logic by unifying the activation abort and deactivation
paths. While this approach now wakes up the checkpointer when an
activation is interrupted, this should not be a serious issue in
practice since such interruptions are rare.
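
The interleaving described above can be replayed in a small standalone
model. This is only a sketch under stated assumptions, not the code in
logicalctl.c: the ToyLogicalCtl struct, every toy_* function, and the
disable_requested flag are hypothetical stand-ins. The flag merely
represents "a request was handed to the checkpointer"; what
RequestDisableLogicalDecoding() and the checkpointer actually do is
not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the shared state named in the commit message. */
typedef struct ToyLogicalCtl
{
    bool xlog_logical_info;
    bool logical_decoding_enabled;
    bool disable_requested;     /* hypothetical: pending checkpointer request */
} ToyLogicalCtl;

/* First backend: performs only the first activation step, then is interrupted. */
void
toy_begin_activation(ToyLogicalCtl *ctl)
{
    assert(ctl);
    ctl->xlog_logical_info = true;
}

/* Peer backend: runs the activation through to the end. */
void
toy_finish_activation(ToyLogicalCtl *ctl)
{
    assert(ctl);
    ctl->xlog_logical_info = true;
    ctl->logical_decoding_enabled = true;
}

/*
 * Model of the old abort callback.  Returns false in the situation where
 * the assertion on logical_decoding_enabled would have failed; otherwise
 * clears xlog_logical_info and returns true.
 */
bool
toy_old_abort(ToyLogicalCtl *ctl)
{
    assert(ctl);
    if (ctl->logical_decoding_enabled)
        return false;           /* a peer already completed activation */
    ctl->xlog_logical_info = false;
    return true;
}

/*
 * Model of the new abort callback: it leaves both flags alone and only
 * records that a deactivation request was made.
 */
void
toy_new_abort(ToyLogicalCtl *ctl)
{
    assert(ctl);
    ctl->disable_requested = true;
}
```

Calling toy_begin_activation(), then toy_finish_activation(), then
toy_old_abort() on the same struct reproduces the failing case in this
model, while toy_new_abort() in the same position leaves the peer's
completed activation untouched and only sets the request flag.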

Add a test case to 051_effective_wal_level.pl.

Reported-by: Chao Li <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/[email protected]

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/93a3e6839bf8d2e0498291335191b57ddf458b48

Modified Files
--------------
src/backend/replication/logical/logicalctl.c   | 69 ++++++++++++++------------
src/test/recovery/t/051_effective_wal_level.pl | 45 ++++++++++++++++-
2 files changed, 81 insertions(+), 33 deletions(-)
