On Friday, April 28, 2023 2:18 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
>
> On Fri, Apr 28, 2023 at 11:51 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
> >
> > On Wed, Apr 26, 2023 at 4:11 PM Zhijie Hou (Fujitsu)
> > <houzj.f...@fujitsu.com> wrote:
> > >
> > > On Wednesday, April 26, 2023 5:00 PM Alexander Lakhin
> > > <exclus...@gmail.com> wrote:
> > > >
> > > > IIUC, that assert will fail in case of any error raised between
> > > > ApplyWorkerMain()->logicalrep_worker_attach()->before_shmem_exit() and
> > > > ApplyWorkerMain()->InitializeApplyWorker()->BackgroundWorkerInitializeConnectionByOid()->InitPostgres().
> > >
> > > Thanks for reporting the issue.
> > >
> > > I think the problem is that it tried to release locks in
> > > logicalrep_worker_onexit() before the initialization of the process is complete
> > > because this callback function was registered before the init phase. So I think we
> > > can add a conditional statement before releasing locks. Please find an attached
> > > patch.
> > >
> >
> > Alexander, does the proposed patch fix the problem you are facing?
> > Sawada-San, and others, do you see any better way to fix it than what
> > has been proposed?
>
> I'm concerned that the idea of relying on IsNormalProcessingMode()
> might not be robust since if we change the meaning of
> IsNormalProcessingMode() some day it would silently break again. So I
> prefer using something like InitializingApplyWorker, or another idea
> would be to do cleanup work (e.g., fileset deletion and lock release)
> in a separate callback that is registered after connecting to the
> database.
Thanks for the review. I agree that it's better to use a new variable here.
Attached is the patch for the same.

>
> FWIW, we might need to be careful about the timing when we call
> logicalrep_worker_detach() in the worker's termination process. Since
> we rely on IsLogicalParallelApplyWorker() for the parallel apply
> worker to send ERROR messages to the leader apply worker, if an ERROR
> happens after logicalrep_worker_detach(), we will end up with the
> assertion failure.
>
>     if (IsLogicalParallelApplyWorker())
>         SendProcSignal(pq_mq_parallel_leader_pid,
>                        PROCSIG_PARALLEL_APPLY_MESSAGE,
>                        pq_mq_parallel_leader_backend_id);
>     else
>     {
>         Assert(IsParallelWorker());
>
> It normally would be a should-not-happen case, though.

Yes. I think the parallel apply (PA) worker currently sends the ERROR message
before exiting, so the callback functions are always fired after the above
code, which looks fine to me.

Best Regards,
Hou zj
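For illustration only, here is a minimal standalone C model of the flag-based guard discussed above, assuming a greatly simplified worker exit path. Apart from InitializingApplyWorker and logicalrep_worker_onexit(), which are named in this thread, every identifier below (backend_initialized, locks_released, release_worker_locks(), worker_onexit(), worker_start()) is hypothetical. This is not the code in the attached patch; it only shows why an exit callback registered before initialization must skip cleanup that needs a fully initialized backend.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, simplified model of the apply worker's exit path.
 * Only the name InitializingApplyWorker comes from the discussion;
 * everything else is a stand-in for illustration.
 */
static bool backend_initialized = false;     /* stands in for InitPostgres() having completed */
static bool InitializingApplyWorker = false; /* true while the worker is still starting up */
static int  locks_released = 0;              /* counts successful lock releases */

/* Models lock release, which is only valid once the backend is initialized. */
static void release_worker_locks(void)
{
    assert(backend_initialized);             /* mirrors the assertion that failed in the report */
    locks_released++;
}

/* Models an on-exit callback (like logicalrep_worker_onexit()) registered before init. */
static void worker_onexit(void)
{
    /* An ERROR raised during initialization must not trigger full cleanup. */
    if (!InitializingApplyWorker)
        release_worker_locks();
}

/* Models worker startup; fail_during_init simulates an ERROR raised mid-initialization. */
static void worker_start(bool fail_during_init)
{
    backend_initialized = false;
    InitializingApplyWorker = true;          /* set before the exit callback can fire */

    if (fail_during_init)
    {
        worker_onexit();                     /* exit path runs while still initializing */
        return;
    }

    backend_initialized = true;              /* e.g. after the database connection is set up */
    InitializingApplyWorker = false;         /* from here on, cleanup is safe */
}
```

Calling worker_start(true) models an ERROR during initialization: the callback runs but skips lock release, so the assertion in release_worker_locks() is never reached. A clean worker_start(false) followed by worker_onexit() releases the locks normally. The alternative mentioned above, registering a separate cleanup callback only after connecting to the database, would remove the need for the flag altogether.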
v2-0001-Fix-assert-failure-in-logical-replication-apply-w.patch