I've been investigating the assert failure in
ProcArraySetReplicationSlotXmin() and would like to share my approach and
get feedback. Instead of inverting the locks and what robert shared before
[1].
Instead of unconditionally updating procArray->replication_slot_xmin in
ProcArraySetReplicationSlotXmin() in procarray.c, I made the updates
conditional:
1) Only update if the incoming xmin is valid
2) Only update if it's older than the currently stored xmin
3) Do the same for procArray->replication_slot_catalog_xmin

void ProcArraySetReplicationSlotXmin(TransactionId xmin, TransactionId
catalog_xmin, bool already_locked)
{
       Assert(!already_locked || LWLockHeldByMe(ProcArrayLock));

       if (!already_locked)
           LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);

       if (TransactionIdIsValid(xmin))
      {
           if (!TransactionIdIsValid(procArray->replication_slot_xmin) ||
TransactionIdPrecedes(xmin, procArray->replication_slot_xmin))
                 procArray->replication_slot_xmin = xmin;
      }
      if (TransactionIdIsValid(catalog_xmin))
      {
             if
(!TransactionIdIsValid(procArray->replication_slot_catalog_xmin) ||
TransactionIdPrecedes(catalog_xmin,
procArray->replication_slot_catalog_xmin))
                   procArray->replication_slot_catalog_xmin = catalog_xmin;
      }
      if (!already_locked)
          LWLockRelease(ProcArrayLock);

     elog(DEBUG1, "xmin required by slots: data %u, catalog %u", xmin,
catalog_xmin);
}

In above block of code ensures we always track the minimum xmin across all
active replication slots without losing data. And also no need to worry
about locks. And also while reproducing this issue [2]
In SnapBuildInitialSnapshot() while we computing safexid by
calling GetOldestSafeDecodingTransactionId(false) will enters into first
case and update the oldestSafeXid = procArray->replication_slot_xmin. So it
won't return nextXid. And also it solves this issue [2].

[1]
https://www.postgresql.org/message-id/ca+tgmoylzjxcea0acan3kr7o_25g52cbqw-90q0vgrmv3a8...@mail.gmail.com
[2]
https://www.postgresql.org/message-id/CAA4eK1KDFeh=ZbvSWPx=ir2qoxbxjbh0k8yqifdtg3xjenl...@mail.gmail.com

On Fri, Nov 7, 2025 at 11:05 AM Masahiko Sawada <[email protected]>
wrote:

> On Thu, Nov 6, 2025 at 8:05 PM Amit Kapila <[email protected]>
> wrote:
> >
> > On Fri, Nov 7, 2025 at 8:30 AM Zhijie Hou (Fujitsu)
> > <[email protected]> wrote:
> > >
> > > On Friday, November 7, 2025 2:36 AM Masahiko Sawada <
> [email protected]> wrote:
> > > > On Thu, Nov 6, 2025 at 2:36 AM Amit Kapila <[email protected]>
> > > > wrote:
> > > > >
> > > > > Good point. This can happen when the last slot is invalidated or
> dropped.
> > > >
> > > > After the last slot is invalidated or dropped, both slot_xmin and
> > > > slot_catalog_xmin values are set InvalidTransactionId. Then in this
> > > > case, these values are ignored when computing the oldest safe
> decoding
> > > > XID in GetOldestSafeDecodingTransactionId(), no? Or do you mean that
> > > > there is a case where slot_xmin and slot_catalog_xmin retreat to a
> > > > valid XID?
> > >
> > > I think when replication_slot_xmin is invalid,
> > > GetOldestSafeDecodingTransactionId would return nextXid, which can be
> greater
> > > than the original snap.xmin if some transaction IDs have been assigned.
> > >
> >
> > Won't we have a problem that values of
> > procArray->replication_slot_xmin and
> > procArray->replication_slot_catalog_xmin won't be set to
> > InvalidTransactionId after last slot removal due to a new check unless
> > we do special treatment for drop/invalidation of a slot? And that
> > would lead to accumulating dead rows even when not required.
>
> I understand Hou-san's point. Agreed. procArray->replication_slot_xmin
> and replication_slot_catalog_xmin should not retreat to a valid XID
> but could become 0 (invalid). Let's consider the idea of inverting the
> locks as Andres proposed[1].
>
> Regards,
>
> [1]
> https://www.postgresql.org/message-id/20230207194903.ws4acm7ake6ikacn%40awork3.anarazel.de
>
> --
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com
>

Reply via email to