Fix race when logical decoding activation is concurrently interrupted.

EnableLogicalDecoding() sets xlog_logical_info to true, emits a procsignal barrier, sets logical_decoding_enabled to true, and then writes a WAL record. If the activating backend is interrupted between these steps, a PG_ENSURE_ERROR_CLEANUP() callback runs to undo the partial activation.

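As an illustration only, the interleaving fixed by this commit can be modeled with a small single-threaded sketch. Everything here is hypothetical: ToyDecodingState and the toy_* functions are invented for this note and are not the code in logicalctl.c. The sketch assumes the two flags are the only shared state and that deactivation simply clears both; the real deactivation path may apply checks that are not modeled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the shared control state (not the real struct). */
typedef struct
{
    bool xlog_logical_info;
    bool logical_decoding_enabled;
    bool pending_disable;   /* models a disable request awaiting the checkpointer */
} ToyDecodingState;

/* First activation step: only xlog_logical_info is set. */
static void
toy_begin_activation(ToyDecodingState *s)
{
    s->xlog_logical_info = true;
}

/* Remaining activation steps, collapsed into one for this model. */
static void
toy_finish_activation(ToyDecodingState *s)
{
    s->logical_decoding_enabled = true;
}

/*
 * Models the previous abort callback. Returns false where the original
 * would have failed its assertion; otherwise clears xlog_logical_info.
 */
static bool
toy_abort_old(ToyDecodingState *s)
{
    if (s->logical_decoding_enabled)
        return false;           /* a peer already completed activation */
    s->xlog_logical_info = false;
    return true;
}

/* Models the new abort callback: it only requests deactivation. */
static void
toy_abort_new(ToyDecodingState *s)
{
    s->pending_disable = true;
}

/* Models the checkpointer handling a pending disable request (simplified). */
static void
toy_checkpointer_process(ToyDecodingState *s)
{
    if (!s->pending_disable)
        return;
    s->pending_disable = false;
    s->logical_decoding_enabled = false;
    s->xlog_logical_info = false;
}
```

In this model, if backend A runs toy_begin_activation() and a peer then runs both activation steps before A is interrupted, toy_abort_old() reports the failed assertion, whereas toy_abort_new() followed by toy_checkpointer_process() leaves both flags cleared without inspecting what the peer did.
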
The previous callback asserted that logical_decoding_enabled was still false and then cleared xlog_logical_info. Both actions were unsafe when a second backend was concurrently activating: the peer backend might have already observed xlog_logical_info as true, set logical_decoding_enabled to true, and written the activation WAL record before our callback fired, causing the first backend to hit the assertion failure.

Fix this by having the abort callback call RequestDisableLogicalDecoding(), allowing the checkpointer to undo the partial activation in the same manner as a normal deactivation. This simplifies the logic by unifying the activation abort and deactivation paths. While this approach now wakes up the checkpointer when an activation is interrupted, this should not be a serious issue in practice since such interruptions are rare.

Add a test case to 051_effective_wal_level.pl.

Reported-by: Chao Li <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/[email protected]

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/93a3e6839bf8d2e0498291335191b57ddf458b48

Modified Files
--------------
src/backend/replication/logical/logicalctl.c   | 69 ++++++++++++++------------
src/test/recovery/t/051_effective_wal_level.pl | 45 ++++++++++++++++-
2 files changed, 81 insertions(+), 33 deletions(-)
