Re: Skip partition tuple routing with constant partition key

Amit Kapila Tue, 12 Apr 2022 22:39:39 -0700

On Tue, Apr 12, 2022 at 6:16 AM Masahiko Sawada <[email protected]> wrote:
>
> Hi,
>
> On Thu, Apr 7, 2022 at 4:37 PM Andres Freund <[email protected]> wrote:
> >
> > Hi,
> >
> > On 2022-04-06 00:07:07 -0400, Tom Lane wrote:
> > > Amit Langote <[email protected]> writes:
> > > > On Sun, Apr 3, 2022 at 10:31 PM Greg Stark <[email protected]> wrote:
> > > >> Is this a problem with the patch or its tests?
> > > >> [18:14:20.798] Test Summary Report
> > > >> [18:14:20.798] -------------------
> > > >> [18:14:20.798] t/013_partition.pl (Wstat: 15360 Tests: 31 Failed: 0)
> > >
> > > > Hmm, make check-world passes for me after rebasing the patch (v10) to
> > > > the latest HEAD (clean), nor do I see a failure on cfbot:
> > > > http://cfbot.cputube.org/amit-langote.html
> > >
> > > 013_partition.pl has been failing regularly in the buildfarm,
> > > most recently here:
> > >
> > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican&dt=2022-03-31%2000%3A49%3A45
> >
> > Just failed locally on my machine as well.
> >
> >
> > > I don't think there's room to blame any uncommitted patches
> > > for that.  Somebody broke it a short time before here:
> > >
> > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-03-17%2016%3A08%3A19
> >
> > The obvious thing to point a finger at is
> >
> > commit c91f71b9dc91ef95e1d50d6d782f477258374fc6
> > Author: Tomas Vondra <[email protected]>
> > Date:   2022-03-16 16:42:47 +0100
> >
> >     Fix publish_as_relid with multiple publications
> >
>
> I've not managed to reproduce this issue on my machine but while
> reviewing the code and the server logs[1] I may have found possible
> bugs:
>
> 2022-04-08 12:59:30.701 EDT [91997:1] LOG:  logical replication apply
> worker for subscription "sub2" has started
> 2022-04-08 12:59:30.702 EDT [91998:3] 013_partition.pl LOG:
> statement: ALTER SUBSCRIPTION sub2 SET PUBLICATION pub_lower_level,
> pub_all
> 2022-04-08 12:59:30.733 EDT [91998:4] 013_partition.pl LOG:
> disconnection: session time: 0:00:00.036 user=buildfarm
> database=postgres host=[local]
> 2022-04-08 12:59:30.740 EDT [92001:1] LOG:  logical replication table
> synchronization worker for subscription "sub2", table "tab4_1" has
> started
> 2022-04-08 12:59:30.744 EDT [91997:2] LOG:  logical replication apply
> worker for subscription "sub2" will restart because of a parameter
> change
> 2022-04-08 12:59:30.750 EDT [92003:1] LOG:  logical replication table
> synchronization worker for subscription "sub2", table "tab3" has
> started
>
> The logs say that the apply worker for "sub2" finished whereas the
> tablesync workers for "tab4_1" and "tab3" started. After these logs,
> there are no logs that these tablesync workers finished and the apply
> worker for "sub2" restarted, until the timeout. While reviewing the
> code, I realized that the tablesync workers can advance its relstate
> even without the apply worker intervention.
>
> After a tablesync worker copies the table it sets
> SUBREL_STATE_SYNCWAIT to its relstate, then it waits for the apply
> worker to update the relstate to SUBREL_STATE_CATCHUP. If the apply
> worker has already died, it breaks from the wait loop and returns
> false:
>
> wait_for_worker_state_change():
>
>     for (;;)
>     {
>         LogicalRepWorker *worker;
>
>         :
>
>         /*
>          * Bail out if the apply worker has died, else signal it we're
>          * waiting.
>          */
>         LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
>         worker = logicalrep_worker_find(MyLogicalRepWorker->subid,
>                                         InvalidOid, false);
>         if (worker && worker->proc)
>             logicalrep_worker_wakeup_ptr(worker);
>         LWLockRelease(LogicalRepWorkerLock);
>         if (!worker)
>             break;
>
>         :
>     }
>
>     return false;
>
> However, the caller doesn't check the return value at all:
>
>     /*
>      * We are done with the initial data synchronization, update the state.
>      */
>     SpinLockAcquire(&MyLogicalRepWorker->relmutex);
>     MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCWAIT;
>     MyLogicalRepWorker->relstate_lsn = *origin_startpos;
>     SpinLockRelease(&MyLogicalRepWorker->relmutex);
>
>     /*
>      * Finally, wait until the main apply worker tells us to catch up and then
>      * return to let LogicalRepApplyLoop do it.
>      */
>     wait_for_worker_state_change(SUBREL_STATE_CATCHUP);
>     return slotname;
>
> Therefore, the tablesync worker started logical replication while
> keeping its relstate as SUBREL_STATE_SYNCWAIT.
>
> Given the server logs, it's likely that both tablesync workers for
> "tab4_1" and "tab3" were in this situation. That is, there were two
> tablesync workers who were applying changes for the target relation
> but the relstate was SUBREL_STATE_SYNCWAIT.
>
> When it comes to starting the apply worker, probably it didn't happen
> since there are already running tablesync workers as much as
> max_sync_workers_per_subscription (2 by default):
>
> logicalrep_worker_launch():
>
>     /*
>      * If we reached the sync worker limit per subscription, just exit
>      * silently as we might get here because of an otherwise harmless race
>      * condition.
>      */
>     if (nsyncworkers >= max_sync_workers_per_subscription)
>     {
>         LWLockRelease(LogicalRepWorkerLock);
>         return;
>     }
>
> This scenario seems possible in principle but I've not managed to
> reproduce this issue so I might be wrong.
>


This is exactly the same analysis I have done in the original thread
where that patch was committed. I have found some crude ways to
reproduce it with a different test as well. See emails [1][2][3].

> Especially, according to the
> server logs, it seems like the tablesync workers were launched before
> the apply worker restarted due to parameter change and this is a
> common pattern among other failure logs. But I'm not sure how it could
> really happen. IIUC the apply worker always re-reads subscription (and
> exits if there is parameter change) and then requests to launch
> tablesync workers accordingly.
>

Is there any rule/documentation which ensures that we must re-read the
subscription parameter change before trying to launch sync workers?

Actually, it would be better if we discuss this problem on another
thread [1] to avoid hijacking this thread. So, it would be good if you
respond there with your thoughts. Thanks for looking into this.


[1] - 
https://www.postgresql.org/message-id/CAA4eK1LpBFU49Ohbnk%3Ddv_v9YP%2BKqh1%2BSf8i%2B%2B_s-QhD1Gy4Qw%40mail.gmail.com
[2] - 
https://www.postgresql.org/message-id/CAA4eK1JzzoE61CY1qi9Vcdi742JFwG4YA3XpoMHwfKNhbFic6g%40mail.gmail.com
[3] - 
https://www.postgresql.org/message-id/CAA4eK1JcQRQw0G-U4A%2BvaGaBWSvggYMMDJH4eDtJ0Yf2eUYXyA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.

Re: Skip partition tuple routing with constant partition key

Reply via email to