Re: Race conditions in logical decoding

Álvaro Herrera Thu, 22 Jan 2026 15:18:21 -0800

On 2026-Jan-20, Antonin Houska wrote:

> Antonin Houska <[email protected]> wrote:
> 
> > I'm not sure yet how to fix the problem. I tried to call XactLockTableWait()
> > from SnapBuildAddCommittedTxn() (like it happens in 
> > SnapBuildWaitSnapshot()),
> > but it made at least one regression test (subscription/t/010_truncate.pl)
> > stuck - probably a deadlock. I can spend more time on it, but maybe someone
> > can come up with a good idea sooner than me.
> 
> Attached here is what I consider a possible fix - simply wait for the CLOG
> update before building a new snapshot.
> 
> Unfortunately I have no idea right now how to test it using the isolation
> tester. With the fix, the additional waiting makes the current test
> block. (And if a step is added that unblock the session, it will not reliably
> catch failure to wait.)
> 
> -- 
> Antonin Houska
> Web: https://www.cybertec-postgresql.com
>


> @@ -400,6 +400,47 @@ SnapBuildBuildSnapshot(SnapBuild *builder)
>       snapshot->xmin = builder->xmin;
>       snapshot->xmax = builder->xmax;
>  
> +     /*
> +      * Although it's very unlikely, it's possible that a commit WAL record 
> was
> +      * decoded but CLOG is not aware of the commit yet. Should the CLOG 
> update
> +      * be delayed even more, visibility checks that use this snapshot could
> +      * work incorrectly. Therefore we check the CLOG status here.
> +      */
> +     while (true)
> +     {
> +             bool    found = false;
> +
> +             for (int i = 0; i < builder->committed.xcnt; i++)
> +             {
> +                     /*
> +                      * XXX Is it worth remembering the XIDs that appear to 
> be
> +                      * committed per CLOG and skipping them in the next 
> iteration of
> +                      * the outer loop? Not sure it's worth the effort - a 
> single
> +                      * iteration is enough in most cases.
> +                      */
> +                     if 
> (unlikely(!TransactionIdDidCommit(builder->committed.xip[i])))
> +                     {
> +                             found = true;
> +
> +                             /*
> +                              * Wait a bit before going to the next 
> iteration of the outer
> +                              * loop. The race conditions we address here is 
> pretty rare,
> +                              * so we shouldn't need to wait too long.
> +                              */
> +                             (void) WaitLatch(MyLatch,
> +                                                              WL_LATCH_SET | 
> WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
> +                                                              10L,
> +                                                              
> WAIT_EVENT_SNAPBUILD_CLOG);
> +                             ResetLatch(MyLatch);
> +
> +                             break;
> +                     }
> +             }
> +
> +             if (!found)
> +                     break;
> +     }

I think this algorithm is strange -- if you do have to wait more than
once for one transaction, it would lead to doing the
TransactionIdDidCommit again times for _all_ transactions by starting
the inner loop from scratch, which sounds really wasteful.  Why not nest
the for() loops the other way around?  Something like this perhaps,

    for (int i = 0; i < builder->committed.xcnt; i++)
    {
        for (;;)
        {
            if (TransactionIdDidCommit(builder->committed.xip[i]))
                break;
            else
            {
                (void) WaitLatch(MyLatch,
                                 WL_LATCH_SET, WL_TIMEOUT, WL_EXIT_ON_PM_DEATH,
                                 10L,
                                 WAIT_EVENT_SNAPBUILD_CLOG);
                ResetLatch(MyLatch);
            }
            CHECK_FOR_INTERRUPTS();
        }
    }

This way you wait repeatedly for one transaction until it is marked
committed; and once it does, you don't test it again.

I also wondered if it would make sense to get rid of the memcpy, given
that we're doing so much work per xid anyway it won't make any visible
difference (I believe), and do the copy per XID there, like

            if (TransactionIdDidCommit(builder->committed.xip[i]))
            {
                snapshot->xip[i] = builder->committed.xip[i];
                break;
            }
            else
            ...


-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Sallah, I said NO camels! That's FIVE camels; can't you count?"
(Indiana Jones)

Re: Race conditions in logical decoding

Reply via email to