Hi, I found a backend crash in WAIT FOR LSN when it is interrupted inside a savepoint and the session then waits again.
I searched for an existing report but could not find one, so I am
posting it here.
While investigating, I noticed that WAIT FOR LSN cleanup is incomplete
on subtransaction abort. An interrupt such as statement_timeout while
waiting inside a savepoint leaves stale per-backend wait state behind,
so a later WAIT FOR LSN in the same backend violates the wait-heap
invariant and crashes an assertion-enabled build.
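To make the failure mode concrete, here is a toy C model of the per-backend state. It is not the xlogwait.c code: the struct and all function names are hypothetical, and only the inHeap flag name is borrowed from the failing assertion. It assumes that the interrupted wait never reaches the code that deregisters it, and that the next wait checks !inHeap on entry.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for the per-backend wait
 * state; only the inHeap name mirrors the real assertion. */
typedef struct ToyProcInfo {
    bool inHeap;   /* is this backend registered in the shared wait heap? */
} ToyProcInfo;

/* Start a wait. Returns false where the real code would hit
 * Assert(!procInfo->inHeap), so the violation can be observed. */
static bool toy_begin_wait(ToyProcInfo *p)
{
    if (p->inHeap)
        return false;
    p->inHeap = true;
    return true;
}

/* Normal completion path: deregister from the heap. */
static void toy_finish_wait(ToyProcInfo *p)
{
    assert(p->inHeap);
    p->inHeap = false;
}

/* Abort-time cleanup (hypothetical analogue of WaitLSNCleanup()). */
static void toy_cleanup(ToyProcInfo *p)
{
    p->inHeap = false;
}

/* Simulates the reproducer: the first wait is interrupted by an error,
 * so toy_finish_wait() never runs; ROLLBACK TO s may or may not clean up. */
static bool toy_run_scenario(bool cleanup_on_subxact_abort)
{
    ToyProcInfo p = { false };

    (void) toy_begin_wait(&p);      /* WAIT FOR LSN '<future-lsn>' */
    /* statement_timeout fires here; the error skips toy_finish_wait() */
    if (cleanup_on_subxact_abort)
        toy_cleanup(&p);            /* ROLLBACK TO s */

    bool ok = toy_begin_wait(&p);   /* second WAIT FOR LSN */
    if (ok)
        toy_finish_wait(&p);
    return ok;
}
```

In this model, toy_run_scenario(false) returns false because the stale flag trips the entry check, while toy_run_scenario(true) returns true.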
A small reproducer is:
    BEGIN;
    SAVEPOINT s;
    SET statement_timeout = '100ms';
    WAIT FOR LSN '<future-lsn>' WITH (MODE 'primary_flush');
    ROLLBACK TO s;
    SET statement_timeout = 0;
    WAIT FOR LSN '0/0' WITH (MODE 'primary_flush', TIMEOUT '10ms', NO_THROW);
    COMMIT;
where <future-lsn> can be generated with:

    SELECT pg_current_wal_insert_lsn() + 10000000000;

On an assertion-enabled build, the second WAIT FOR LSN then fails with:

    TRAP: failed Assert("!procInfo->inHeap"), File: "xlogwait.c"
The attached patch mirrors the top-level abort cleanup by calling
WaitLSNCleanup() from AbortSubTransaction(), after LWLockReleaseAll(). It
also adds a TAP test to verify that WAIT FOR LSN can be reused in the same
backend after a statement_timeout and ROLLBACK TO SAVEPOINT.
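The patch places the call after LWLockReleaseAll(). One plausible reason for that ordering, assuming WaitLSNCleanup() takes the lock protecting the shared wait heap, is that LWLocks are not re-entrant, so cleaning up while the aborting code path still holds that lock would self-deadlock. The toy model below illustrates the ordering; ToyLock, ToyState and the toy_* functions are hypothetical and not PostgreSQL APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-reentrant lock: acquiring a lock we already hold fails
 * (standing in for a self-deadlock). */
typedef struct ToyLock { bool held; } ToyLock;

static bool toy_acquire(ToyLock *l)
{
    if (l->held)
        return false;
    l->held = true;
    return true;
}

static void toy_release(ToyLock *l) { l->held = false; }

typedef struct ToyState {
    ToyLock heapLock;  /* hypothetical lock guarding the shared wait heap */
    int     nwaiters;  /* number of entries in the shared heap */
    bool    inHeap;    /* this backend's per-proc flag */
} ToyState;

/* Hypothetical analogue of WaitLSNCleanup(): remove our heap entry if it
 * is still registered. Returns false if the lock could not be taken. */
static bool toy_wait_cleanup(ToyState *s)
{
    if (!s->inHeap)
        return true;
    if (!toy_acquire(&s->heapLock))
        return false;
    assert(s->nwaiters > 0);
    s->nwaiters--;
    s->inHeap = false;
    toy_release(&s->heapLock);
    return true;
}

/* Subtransaction abort as the patch orders it: release held locks first
 * (LWLockReleaseAll() in the real code), then clean up the wait state. */
static bool toy_abort_subxact(ToyState *s)
{
    toy_release(&s->heapLock);
    return toy_wait_cleanup(s);
}
```

With the lock still held at error time, calling toy_wait_cleanup() directly fails, whereas toy_abort_subxact() releases the lock first and then removes the stale entry.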
Thoughts?
Regards,
Ayush
v1-0001-Fix-WAIT-FOR-LSN-cleanup-on-subtransaction-abort.patch
