Hi folks,

We observed a case where our backup tooling was periodically failing
for a specific workload: nested subtransactions overflowing the PGPROC
subxid cache. We don't have visibility into the customer's workload
(i.e. whether it uses SAVEPOINT or EXCEPTION handling), but the attached
TAP test reproduces the problem.

The problem detail and proposed fix are described below. Happy to discuss
further.

Problem: When the first XLOG_RUNNING_XACTS record seen during recovery has
subxid_overflow=true, the standby enters STANDBY_SNAPSHOT_PENDING and
hot standby never activates (LocalHotStandbyActive stays false).

As a result, recovery_target_action = 'pause' is silently bypassed:
recoveryPausesHere() returns immediately when hot standby is not yet
active, so the pause is skipped and the server promotes instead.

Fix: in PerformWalRecovery(), when the recovery target is reached and
the snapshot is still PENDING, force a transition to STANDBY_SNAPSHOT_READY
and call CheckRecoveryConsistency() to activate hot standby before the
target action switch is evaluated.

As I understand it, this is safe because subtransaction
commits write to CLOG but produce no WAL entry, so standbys
always see overflowed subxids as INPROGRESS rather than SUB_COMMITTED.

INPROGRESS subxids are invisible without any SubTrans
lookup, so the missing SubTrans entries that STANDBY_SNAPSHOT_PENDING
guards against cannot cause incorrect visibility results.

The patch adds a TAP test (052_pitr_subxid_overflow.pl) that exercises the scenario:
the overflow transaction is kept open during the base backup's forced
checkpoint so that the very first XLOG_RUNNING_XACTS the standby replays
has subxid_overflow=true.  A named restore point is then created while
the overflow transaction is still open.  Without the fix the standby
promotes silently at the target; with the fix it pauses and accepts
hot-standby queries.

Note: subtransaction XIDs are only assigned when the subtransaction writes,
so gen_subxids() must perform an INSERT at each recursion level to force
the PGPROC subxid cache to overflow.

I think this is worth considering for backporting to the supported releases.

Attachment: 0001-Fix-PITR-pause-bypass-when-initial-XLOG_RUNNING_XACT.patch