On Thu, Feb 26, 2026 at 12:57:24PM +0000, Matt Blewitt wrote:
> Problem: When the first XLOG_RUNNING_XACTS record seen during recovery has
> subxid_overflow=true, the standby enters STANDBY_SNAPSHOT_PENDING and
> hot standby never activates (LocalHotStandbyActive stays false).

Yes, this is a historical behavior that has existed for as long as hot
standby has. We cannot connect yet because we don't have a stable state
that live connections could rely on.

> This caused recovery_target_action = 'pause' to be silently bypassed:
> recoveryPausesHere() returns immediately when hot standby is not yet
> active, so the pause is skipped and the server promotes instead.
>
> Fix: in PerformWalRecovery(), when the recovery target is reached and
> the snapshot is still PENDING, force a transition to STANDBY_SNAPSHOT_READY
> and call CheckRecoveryConsistency() to activate hot standby before the
> target action switch is evaluated.
>
> As I understand it, this is safe because subtransaction
> commits write to CLOG but produce no WAL entry, so standbys
> always see overflowed subxids as INPROGRESS rather than SUB_COMMITTED.

This is an interesting argument. To be honest, while it is true that
subtransaction commits do not cause WAL records and flushes (as far as
I recall), I am not completely sure yet whether it is always OK to rely
on that and open the server for connections earlier than we logically
can. PENDING carries the rather old historical assumption that we
should never open connections yet, because we don't have a standby
state initialized yet. That makes the introduction of such shortcuts
very tricky to think about.

The TAP test helps in showing what you are looking for, thanks for
that.

> I would consider this for backporting to supported releases.

Not sure that I would agree with this position. This is also a slight
change of behavior regarding the end of recovery, due to the
interaction with the recovery target being reached. It is not an area
of the code we should underestimate.
--
Michael
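
For readers following the thread, the reported control flow can be sketched as a toy model. This is not PostgreSQL source: the enum and function names below are hypothetical stand-ins, and the only behavior modeled is what the report describes, namely that the pause returns immediately when hot standby is not active, so the target action ends in promotion.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the states named in the thread. */
typedef enum { SNAPSHOT_PENDING, SNAPSHOT_READY } SnapshotState;
typedef enum { ACTION_PAUSE, ACTION_PROMOTE } TargetAction;
typedef enum { OUTCOME_PAUSED, OUTCOME_PROMOTED } Outcome;

/* Assumption of this model: hot standby is active only once the
 * snapshot state has reached READY. */
static bool
hot_standby_active(SnapshotState state)
{
    return state == SNAPSHOT_READY;
}

/* Models the pause step: it gives up immediately when hot standby is
 * not active.  Returns true only if recovery actually paused. */
static bool
try_pause(SnapshotState state)
{
    if (!hot_standby_active(state))
        return false;           /* pause silently skipped */
    return true;
}

/* Models the decision taken once the recovery target is reached. */
static Outcome
target_reached(SnapshotState state, TargetAction action)
{
    if (action == ACTION_PAUSE && try_pause(state))
        return OUTCOME_PAUSED;
    return OUTCOME_PROMOTED;    /* skipped pause ends in promotion */
}
```

In this model, target_reached(SNAPSHOT_PENDING, ACTION_PAUSE) yields OUTCOME_PROMOTED, which is the bypass being reported, while the same call with SNAPSHOT_READY yields OUTCOME_PAUSED. Whether forcing the READY state is actually safe is the open question above and is not something this sketch can answer.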
