Hi, Orphaned prepared transactions cause escalating harm the longer they persist:
1. *Lock retention* : All locks (row-level, table-level, advisory)
acquired during the transaction are held indefinitely, blocking concurrent
DML and DDL
2. *VACUUM blockage :* The prepared transaction's XID becomes the oldest
running transaction, preventing VACUUM from freezing tuples or reclaiming
dead rows across the entire cluster, leading to table and index bloat
3. *Transaction ID wraparound risk *: In extreme cases, the frozen XID
horizon cannot advance, eventually threatening XID wraparound shutdown
4. *Resource consumption *: Shared memory slots
(max_prepared_transactions) remain occupied; the WAL records for the
prepared state persist
Today, the only remediation is manual intervention: a DBA must discover the
orphan (via pg_prepared_xacts), determine it's truly abandoned, and issue
ROLLBACK PREPARED. PostgreSQL already has timeout-based safety nets for
other "stuck" session states such as, idle_in_transaction_session_timeout,
idle_session_timeout, statement_timeout, but no equivalent for prepared
transactions. This patch fills that gap.
*How it works ?*
CleanupOrphanedPreparedTransactions():
Phase 1 — Collect candidates (under TwoPhaseStateLock, shared mode):
for each GlobalTransactionData (gxact) in TwoPhaseState:
if gxact->valid AND
TimestampDifferenceExceeds(gxact->prepared_at, now, timeout):
save gxact->gid to candidate list
Phase 2 — Roll back each candidate (lock released):
for each saved GID:
lock = LockGXactForCleanup(gid)
if lock succeeded:
FinishPreparedTransaction(gid, isCommit=false)
log: "rolling back orphaned prepared transaction %s"
*Safety Properties*
1. Timeout = 0 (default): Feature is completely disabled, no behavior
change from default PostgreSQL
2. No false positives on active transactions: The check uses
prepared_at, which is set once at PREPARE TRANSACTION time. A transaction
that is actively being committed/rolled back by a client will either
complete before the timeout or be skipped by LockGXactForCleanup (which
returns NULL if the gxact is already locked by another backend)
3. Crash-safe: If the checkpointer crashes during cleanup, the prepared
transaction's WAL state is unchanged, it remains prepared and will be
cleaned up after recovery
4. Idempotent: If the GID was already resolved between Phase 1 and Phase
2, LockGXactForCleanup returns NULL and the cleanup is silently skipped
--
Regards,
Nikhil Chawla
Twitter <https://twitter.com/chawlanikhil24> | LinkedIn
<http://linkedin.com/in/chawlanikhil24>
0001-Add-prepared_orphaned_transaction_timeout-GUC.patch
Description: Binary data
