On Tue, Mar 24, 2026 at 2:42 PM Bharath Rupireddy
<[email protected]> wrote:
>
> Hi,
>
> On Mon, Mar 23, 2026 at 4:36 PM Masahiko Sawada <[email protected]> wrote:
> >
> > I've studied the discussion on this thread and the patch. I understand
> > the purpose of this feature and agree that it's useful especially in
> > cases where orphaned (physical or logical) replication slots prevent
> > the xmin from advancing and inactive_since based slot invalidation
> > might not fit.
> >
> > And +1 for treating both the slot's xmin and catalog_xmin similarly
> > with the single GUC.
>
> Thanks for reviewing the patch.
>
> > > >> I made the following design choice: try invalidating only once per
> > > >> vacuum cycle, not per table. While this keeps the cost of checking
> > > >> (incl. the XidGenLock contention) for invalidation to a minimum when
> > > >> there are a large number of tables and replication slots, it can be
> > > >> less effective when individual tables/indexes are large. Invalidating
> > > >> during checkpoints can help to some extent with the large table/index
> > > >> cases. But I'm open to thoughts on this.
> > > >
> > > > It may not solve the intent when the vacuum cycle is longer, which one 
> > > > can expect on a large database particularly when there is heavy bloat.
> > >
> > > This design choice boils down to the following: a database instance
> > > having either 1/ a large number of small tables or 2/ large tables.
> > > From my experience, I have seen both cases but mostly case 2 (others
> > > can correct me). In this context, having an XID age based slot
> > > invalidation check once per relation makes sense. However, I'm open to
> > > more thoughts here.
> >
> > ISTM that checking the XID-based slot invalidation per table would be
> > more bullet-proof and cover many cases. How about checking the
> > XID-based slot invalidation opportunity only when the OldestXmin is
> > older than the new GUC? For example, we can do this check in
> > heap_vacuum_rel() based on the VacuumCutoffs returned by
> > vacuum_get_cutoffs(). If we invalidate at least one slot for its XID,
> > we can re-compute the OldestXmin.
>
> Agreed. Here's the patch that moves the XID-age based slot
> invalidation check to vacuum_get_cutoffs. This has some nice
> advantages: 1/ It makes the check once per table (to help with large
> tables). 2/ It makes the check less costly since we rely on already
> computed OldestXmin and nextXID values. 3/ It avoids the checkpointer
> to do XID-age based slot invalidation which keeps the usage of this
> GUC simple with no additional costs to the checkpointer - just the
> vacuum (both vacuum command and autovacuum) does the invalidation when
> needed.
>
> I moved the new tests to the existing TAP test file
> t/019_replslot_limit.pl alongside other invalidation tests.
>
> I added detailed comments around InvalidateXIDAgedReplicationSlots and
> slightly modified the docs.
>
> Please find the v3 patch for further review.

Thank you for updating the patch. I think the patch is reasonably
simple and can avoid unnecessary overheads well due to XID-based
checks. Here are some comments:

+   /*
+    * Try to invalidate XID-aged replication slots that may interfere with
+    * vacuum's ability to freeze and remove dead tuples. Since OldestXmin
+    * already covers the slot xmin/catalog_xmin values, pass it as a
+    * preliminary check to avoid additional iteration over all the slots.
+    *
+    * If at least one slot was invalidated, recompute OldestXmin so that this
+    * vacuum benefits from the advanced horizon immediately.
+    */
+   if (InvalidateXIDAgedReplicationSlots(cutoffs->OldestXmin, nextXID))
+   {
+       cutoffs->OldestXmin = GetOldestNonRemovableTransactionId(rel);
+       Assert(TransactionIdIsNormal(cutoffs->OldestXmin));
+   }

vacuum_get_cutoff() is also called by VACUUM FULL, CLUSTER, and
REPACK. I'm not sure that users would expect the slot invalidation
also in these commands. I think it's better to leave
vacuum_get_cutoff() a pure cutoff computation function and we can try
to invalidate slots in heap_vacuum_rel(). It requires additional
ReadNextTransactionId() but we can live with it, or we can make
vacuum_get_cutoffs() return the nextXID as well (stored in *cutoffs).

---
+   /* ensure it's a "normal" XID, else TransactionIdPrecedes misbehaves */
+   /* this can cause the limit to go backwards by 3, but that's OK */
+   if (!TransactionIdIsNormal(cutoffXID))
+       cutoffXID = FirstNormalTransactionId;
+
+   if (TransactionIdPrecedes(oldestXmin, cutoffXID))
+   {
+       invalidated = InvalidateObsoleteReplicationSlots(RS_INVAL_XID_AGE,
+                                                        0,
+                                                        InvalidOid,
+                                                        InvalidTransactionId,
+                                                        nextXID);
+   }

I think it's better to check the procArray->replication_slot_xmin and
procArray->replication_slot_catalog_xmin before iterating over each
slot. Otherwise, we would end up checking every slot even when a long
running transaction holds the oldestxmin back.

---
+   if (cutoffXID < FirstNormalTransactionId)
+       cutoffXID -= FirstNormalTransactionId;

and

+   if (!TransactionIdIsNormal(cutoffXID))
+       cutoffXID = FirstNormalTransactionId;

These codes have the same comment but are doing a slightly different
thing. I guess the latter is missing '-'?

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


Reply via email to