On Wed, Mar 25, 2026 at 12:17 PM Bharath Rupireddy
<[email protected]> wrote:
>
> Hi,
>
> On Tue, Mar 24, 2026 at 11:50 PM Masahiko Sawada <[email protected]>
> wrote:
> >
> > > Please find the v3 patch for further review.
> >
> > Thank you for updating the patch. I think the patch is reasonably
> > simple and can avoid unnecessary overheads well due to XID-based
> > checks. Here are some comments:
>
> Thank you for reviewing the patch.
>
> > vacuum_get_cutoff() is also called by VACUUM FULL, CLUSTER, and
> > REPACK. I'm not sure that users would expect the slot invalidation
> > also in these commands. I think it's better to leave
> > vacuum_get_cutoff() a pure cutoff computation function and we can try
> > to invalidate slots in heap_vacuum_rel(). It requires additional
> > ReadNextTransactionId() but we can live with it, or we can make
> > vacuum_get_cutoffs() return the nextXID as well (stored in *cutoffs).
>
> +1. I chose to perform the slot invalidation in heap_vacuum_rel by
> getting the next txn ID and calling vacuum_get_cutoffs again when a
> slot gets invalidated. IMHO, this is simple than adding a flag and do
> the invalidation selectively in vacuum_get_cutoffs.
>
> > if (TransactionIdPrecedes(oldestXmin, cutoffXID))
> > + {
> > + invalidated = InvalidateObsoleteReplicationSlots(RS_INVAL_XID_AGE,
> > + 0,
> > + InvalidOid,
> > +
> > InvalidTransactionId,
> > + nextXID);
> > + }
> >
> > I think it's better to check the procArray->replication_slot_xmin and
> > procArray->replication_slot_catalog_xmin before iterating over each
> > slot. Otherwise, we would end up checking every slot even when a long
> > running transaction holds the oldestxmin back.
>
> +1. Changed.
>
> > + if (!TransactionIdIsNormal(cutoffXID))
> > + cutoffXID = FirstNormalTransactionId;
> >
> > These codes have the same comment but are doing a slightly different
> > thing. I guess the latter is missing '-'?
>
> Fixed the typo.
>
> I fixed a test error being reported in CI.
>
> Please find the attached v4 patch for further review.
Thank you for updating the patch. I've reviewed the patch and have
some review comments:
+ /* translator: %s is a GUC variable name */
+ appendStringInfo(&err_detail, _("The slot's xmin
%u at next transaction ID %u exceeds the age %d specified by
\"%s\"."),
+ xmin,
+ nextXID,
+ max_slot_xid_age,
+ "max_slot_xid_age");
I think it's better to show the age of the slot's xmin instead of the
recent XID.
---
+
+ if (!TransactionIdIsNormal(oldestXmin) || !TransactionIdIsNormal(nextXID))
+ return false;
+
Do we expect that the passed oldestXmin or nextXID could be non-normal
XIDs? I think the function assumes these are valid XIDs.
Also, since this function is called only by heap_vacuum_rel(), we can
call ReadNextTransactionId() within this function.
---
+ if (IsReplicationSlotXIDAged(slot_xmin, slot_catalog_xmin, nextXID))
We compute the cutoff XID in IsReplicationSlotXIDAged() again, which
seems redundant.
I've attached the fixup patch addressing these comments and having
some code cleanups. Please review it.
I'm reviewing the regression test part, and will share review comments soon.
>
> I've also attached the 0002 patch that adds a test case to demo a
> production-like scenario by pushing the database to XID wraparound
> limits and checking if the XID-age based invalidation with the GUC
> setting at the default vacuum_failsafe_age of 1.6B works correctly,
> and whether autovacuum can successfully remove this replication slot
> blocker to proceed with freezing and bring the database back to
> normal. I don't intend to get this committed unless others think
> otherwise, but I wanted to have this as a reference.
Thank you for sharing the test script! I'll check it as well.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 809061110c7..bc9360b9743 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -805,16 +805,15 @@ heap_vacuum_rel(Relation rel, const VacuumParams params,
* vacuum's ability to freeze and remove dead tuples. Since OldestXmin
* already covers the slot xmin/catalog_xmin values, pass it as a
* preliminary check to avoid additional iteration over all the slots.
- *
- * If at least one slot was invalidated, recompute cutoffs so that this
- * vacuum benefits from the advanced horizon immediately.
- *
- * XXX: Next XID could be returned as output from vacuum_get_cutoffs() but
- * for now we live with an additional ReadNextTransactionId() call.
*/
- if (InvalidateXIDAgedReplicationSlots(vacrel->cutoffs.OldestXmin,
- ReadNextTransactionId()))
+ if (MaybeInvalidateXIDAgedSlots(vacrel->cutoffs.OldestXmin))
+ {
+ /*
+ * Some slots have been invalidated based on their XID age; recompute
+ * the vacuum cutoffs.
+ */
vacrel->aggressive = vacuum_get_cutoffs(rel, params, &vacrel->cutoffs);
+ }
vacrel->rel_pages = orig_rel_pages = RelationGetNumberOfBlocks(rel);
vacrel->vistest = GlobalVisTestFor(rel);
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 8ca5d86fe0e..8296856eefe 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -160,7 +160,7 @@ int max_replication_slots = 10; /* the maximum number of replication
int idle_replication_slot_timeout_secs = 0;
/*
- * Invalidate replication slots that have xmin or catalog_xmin greater
+ * Invalidate replication slots that have xmin or catalog_xmin older
* than the specified age; '0' disables it.
*/
int max_slot_xid_age = 0;
@@ -183,8 +183,6 @@ static XLogRecPtr ss_oldest_flush_lsn = InvalidXLogRecPtr;
static void ReplicationSlotShmemExit(int code, Datum arg);
static bool IsSlotForConflictCheck(const char *name);
static void ReplicationSlotDropPtr(ReplicationSlot *slot);
-static bool IsReplicationSlotXIDAged(TransactionId xmin, TransactionId catalog_xmin,
- TransactionId nextXID);
/* internal persistency functions */
static void RestoreSlotFromDisk(const char *name);
@@ -1792,7 +1790,7 @@ ReportSlotInvalidation(ReplicationSlotInvalidationCause cause,
long slot_idle_seconds,
TransactionId xmin,
TransactionId catalog_xmin,
- TransactionId nextXID)
+ TransactionId recentXid)
{
StringInfoData err_detail;
StringInfoData err_hint;
@@ -1845,20 +1843,14 @@ ReportSlotInvalidation(ReplicationSlotInvalidationCause cause,
if (TransactionIdIsValid(xmin))
{
/* translator: %s is a GUC variable name */
- appendStringInfo(&err_detail, _("The slot's xmin %u at next transaction ID %u exceeds the age %d specified by \"%s\"."),
- xmin,
- nextXID,
- max_slot_xid_age,
- "max_slot_xid_age");
+ appendStringInfo(&err_detail, _("The slot's xmin %u is %d transactions old, which exceeds the configured \"%s\" value of %d."),
+ xmin, (int32) (recentXid - xmin), "max_slot_xid_age", max_slot_xid_age);
}
else
{
/* translator: %s is a GUC variable name */
- appendStringInfo(&err_detail, _("The slot's catalog xmin %u at next transaction ID %u exceeds the age %d specified by \"%s\"."),
- catalog_xmin,
- nextXID,
- max_slot_xid_age,
- "max_slot_xid_age");
+ appendStringInfo(&err_detail, _("The slot's xmin %u is %d transactions old, which exceeds the configured \"%s\" value of %d."),
+ catalog_xmin, (int32) (recentXid - catalog_xmin), "max_slot_xid_age", max_slot_xid_age);
}
/* translator: %s is a GUC variable name */
@@ -1905,6 +1897,25 @@ CanInvalidateIdleSlot(ReplicationSlot *s)
!(RecoveryInProgress() && s->data.synced));
}
+/*
+ * Can we invalidate an XID-aged replication slot?
+ *
+ * XID-aged based invalidation is allowed to the given slot when:
+ *
+ * 1. Max XID-age is set
+ * 2. Slot has valid xmin or catalog_xmin
+ * 3. The slot is not being synced from the primary while the server is in
+ * recovery.
+ */
+static inline bool
+CanInvalidateXidAgedSlot(ReplicationSlot *s)
+{
+ return (max_slot_xid_age != 0 &&
+ (TransactionIdIsValid(s->data.xmin) ||
+ TransactionIdIsValid(s->data.catalog_xmin)) &&
+ !(RecoveryInProgress() && s->data.synced));
+}
+
/*
* DetermineSlotInvalidationCause - Determine the cause for which a slot
* becomes invalid among the given possible causes.
@@ -1916,7 +1927,7 @@ static ReplicationSlotInvalidationCause
DetermineSlotInvalidationCause(uint32 possible_causes, ReplicationSlot *s,
XLogRecPtr oldestLSN, Oid dboid,
TransactionId snapshotConflictHorizon,
- TransactionId nextXID,
+ TransactionId recentXid,
TimestampTz *inactive_since, TimestampTz now)
{
Assert(possible_causes != RS_INVAL_NONE);
@@ -1989,9 +2000,23 @@ DetermineSlotInvalidationCause(uint32 possible_causes, ReplicationSlot *s,
}
/* Check if the slot needs to be invalidated due to max_slot_xid_age GUC */
- if ((possible_causes & RS_INVAL_XID_AGE) &&
- IsReplicationSlotXIDAged(s->data.xmin, s->data.catalog_xmin, nextXID))
- return RS_INVAL_XID_AGE;
+ if (possible_causes & RS_INVAL_XID_AGE)
+ {
+ Assert(TransactionIdIsValid(recentXid));
+
+ if (CanInvalidateXidAgedSlot(s))
+ {
+ TransactionId xidLimit;
+
+ xidLimit = TransactionIdRetreatedBy(recentXid, max_slot_xid_age);
+
+ if ((TransactionIdIsValid(s->data.xmin) &&
+ TransactionIdPrecedes(s->data.xmin, xidLimit)) ||
+ (TransactionIdIsValid(s->data.catalog_xmin) &&
+ TransactionIdPrecedes(s->data.catalog_xmin, xidLimit)))
+ return RS_INVAL_XID_AGE;
+ }
+ }
return RS_INVAL_NONE;
}
@@ -2015,7 +2040,7 @@ InvalidatePossiblyObsoleteSlot(uint32 possible_causes,
ReplicationSlot *s,
XLogRecPtr oldestLSN,
Oid dboid, TransactionId snapshotConflictHorizon,
- TransactionId nextXID,
+ TransactionId recentXid,
bool *released_lock_out)
{
int last_signaled_pid = 0;
@@ -2068,7 +2093,7 @@ InvalidatePossiblyObsoleteSlot(uint32 possible_causes,
s, oldestLSN,
dboid,
snapshotConflictHorizon,
- nextXID,
+ recentXid,
&inactive_since,
now);
@@ -2163,7 +2188,7 @@ InvalidatePossiblyObsoleteSlot(uint32 possible_causes,
slotname, restart_lsn,
oldestLSN, snapshotConflictHorizon,
slot_idle_secs, s->data.xmin,
- s->data.catalog_xmin, nextXID);
+ s->data.catalog_xmin, recentXid);
if (MyBackendType == B_STARTUP)
(void) SignalRecoveryConflict(GetPGProcByNumber(active_proc),
@@ -2217,7 +2242,7 @@ InvalidatePossiblyObsoleteSlot(uint32 possible_causes,
slotname, restart_lsn,
oldestLSN, snapshotConflictHorizon,
slot_idle_secs, s->data.xmin,
- s->data.catalog_xmin, nextXID);
+ s->data.catalog_xmin, recentXid);
/* done with this slot for now */
break;
@@ -2259,7 +2284,8 @@ InvalidatePossiblyObsoleteSlot(uint32 possible_causes,
bool
InvalidateObsoleteReplicationSlots(uint32 possible_causes,
XLogSegNo oldestSegno, Oid dboid,
- TransactionId snapshotConflictHorizon, TransactionId nextXID)
+ TransactionId snapshotConflictHorizon,
+ TransactionId recentXid)
{
XLogRecPtr oldestLSN;
bool invalidated = false;
@@ -2298,7 +2324,7 @@ restart:
if (InvalidatePossiblyObsoleteSlot(possible_causes, s, oldestLSN,
dboid, snapshotConflictHorizon,
- nextXID, &released_lock))
+ recentXid, &released_lock))
{
Assert(released_lock);
@@ -3330,104 +3356,53 @@ WaitForStandbyConfirmation(XLogRecPtr wait_for_lsn)
ConditionVariableCancelSleep();
}
-/*
- * Check if the passed-in xmin or catalog_xmin have aged beyond the
- * max_slot_xid_age GUC limit relative to nextXID.
- *
- * Returns true if either value exceeds the configured age.
- */
-static bool
-IsReplicationSlotXIDAged(TransactionId xmin, TransactionId catalog_xmin,
- TransactionId nextXID)
-{
- TransactionId cutoffXID;
- bool aged = false;
-
- if (max_slot_xid_age == 0)
- return false;
-
- if (!TransactionIdIsNormal(nextXID))
- return false;
-
- cutoffXID = nextXID - max_slot_xid_age;
-
- /* ensure it's a "normal" XID, else TransactionIdPrecedes misbehaves */
- /* this can cause the limit to go backwards by 3, but that's OK */
- if (cutoffXID < FirstNormalTransactionId)
- cutoffXID -= FirstNormalTransactionId;
-
- if (TransactionIdIsNormal(xmin) &&
- TransactionIdPrecedes(xmin, cutoffXID))
- aged = true;
-
- if (TransactionIdIsNormal(catalog_xmin) &&
- TransactionIdPrecedes(catalog_xmin, cutoffXID))
- aged = true;
-
- return aged;
-}
-
/*
* Invalidate replication slots whose XID age exceeds the max_slot_xid_age
* GUC.
*
- * The caller supplies oldestXmin, either computed via
- * GetOldestNonRemovableTransactionId during vacuum, or computed via the
- * minimum of slot xmin values obtained from ProcArrayGetReplicationSlotXmin,
- * and nextXID, the next XID to be assigned used to compute the age.
- *
- * Preliminary checks based on the passed-in oldestXmin and the oldest slot
- * xmin and catalog_xmin are done to avoid unnecessarily iterating over all
- * the slots. If the oldestXmin age does not exceed the GUC then no
- * individual slot can either, so the per-slot scan is skipped. For example,
- * if oldestXmin is 100 and the GUC is 500, every slot's xmin must be >= 100,
- * so none can be older than the GUC. Similarly, if the oldest slot xmin and
- * catalog_xmin from ProcArray are not aged, the per-slot scan is skipped;
- * this can happen when a long-running transaction holds the oldestXmin back.
- *
- * Even if the caller passes an oldestXmin that does not include the slot
- * xmin/catalog_xmin range, there is no risk of incorrect invalidation: each
- * slot's own xmin and catalog_xmin are individually verified against the GUC
- * inside IsReplicationSlotXIDAged(). The only downside is an additional
- * iteration over all the slots.
+ * The oldestXmin is expected to be a XID computed via
+ * GetOldestNonRemovableTransactionId() during vacuum. It is used as cutoffs
+ * for individual slot checks; if its age does not exceed the max_slot_xid_age,
+ * no individual slot can either, so we skip per-slot invalidation check.
*
* Returns true if at least one slot was invalidated.
*/
bool
-InvalidateXIDAgedReplicationSlots(TransactionId oldestXmin, TransactionId nextXID)
+MaybeInvalidateXIDAgedSlots(TransactionId oldestXmin)
{
- TransactionId cutoffXID;
+ TransactionId recentXid;
+ TransactionId xidLimit;
+ TransactionId slot_xmin = InvalidTransactionId;
+ TransactionId slot_catalog_xmin = InvalidTransactionId;
bool invalidated = false;
- if (max_slot_xid_age == 0)
- return false;
+ Assert(TransactionIdIsValid(oldestXmin));
- if (!TransactionIdIsNormal(oldestXmin) || !TransactionIdIsNormal(nextXID))
+ if (max_slot_xid_age == 0)
return false;
- cutoffXID = nextXID - max_slot_xid_age;
+ recentXid = ReadNextTransactionId();
+ xidLimit = TransactionIdRetreatedBy(recentXid, max_slot_xid_age);
- /* ensure it's a "normal" XID, else TransactionIdPrecedes misbehaves */
- /* this can cause the limit to go backwards by 3, but that's OK */
- if (!TransactionIdIsNormal(cutoffXID))
- cutoffXID -= FirstNormalTransactionId;
-
- if (TransactionIdPrecedes(oldestXmin, cutoffXID))
- {
- TransactionId slot_xmin;
- TransactionId slot_catalog_xmin;
+ /* oldestXmin is not behind the cutoff; no need to check slots */
+ if (TransactionIdPrecedes(xidLimit, oldestXmin))
+ return false;
- ProcArrayGetReplicationSlotXmin(&slot_xmin, &slot_catalog_xmin);
+ ProcArrayGetReplicationSlotXmin(&slot_xmin, &slot_catalog_xmin);
- if (IsReplicationSlotXIDAged(slot_xmin, slot_catalog_xmin, nextXID))
- {
- invalidated = InvalidateObsoleteReplicationSlots(RS_INVAL_XID_AGE,
- 0,
- InvalidOid,
- InvalidTransactionId,
- nextXID);
- }
- }
+ /*
+ * Invalidate possibly obsolete slots based on XID-age, if either slot's
+ * xmin or catalog_xmin is older than the cutoff.
+ */
+ if ((TransactionIdIsValid(slot_xmin) &&
+ TransactionIdPrecedes(slot_xmin, xidLimit)) ||
+ (TransactionIdIsValid(slot_catalog_xmin) &&
+ TransactionIdPrecedes(slot_catalog_xmin, xidLimit)))
+ invalidated = InvalidateObsoleteReplicationSlots(RS_INVAL_XID_AGE,
+ 0,
+ InvalidOid,
+ InvalidTransactionId,
+ recentXid);
return invalidated;
}
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 83cf8438724..c483f0f531b 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -371,7 +371,8 @@ extern bool InvalidateObsoleteReplicationSlots(uint32 possible_causes,
XLogSegNo oldestSegno,
Oid dboid,
TransactionId snapshotConflictHorizon,
- TransactionId nextXID);
+ TransactionId recentXid);
+extern bool MaybeInvalidateXIDAgedSlots(TransactionId oldestXmin);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name, bool need_lock);
extern int ReplicationSlotIndex(ReplicationSlot *slot);
extern bool ReplicationSlotName(int index, Name name);
@@ -391,6 +392,4 @@ extern bool SlotExistsInSyncStandbySlots(const char *slot_name);
extern bool StandbySlotsHaveCaughtup(XLogRecPtr wait_for_lsn, int elevel);
extern void WaitForStandbyConfirmation(XLogRecPtr wait_for_lsn);
-extern bool InvalidateXIDAgedReplicationSlots(TransactionId oldestXmin, TransactionId nextXID);
-
#endif /* SLOT_H */