On Fri, Apr 28, 2023 at 11:48 AM Masahiko Sawada <sawada.m...@gmail.com> wrote: > > While investigating this issue, I've reviewed the code around > callbacks and worker termination etc and I found a problem. > > A parallel apply worker calls the before_shmem_exit callbacks in the > following order: > > 1. ShutdownPostgres() > 2. logicalrep_worker_onexit() > 3. pa_shutdown() > > Since the worker is detached during logicalrep_worker_onexit(), > MyLogicalReplication->leader_pid is an invalid when we call > pa_shutdown(): > > static void > pa_shutdown(int code, Datum arg) > { > Assert(MyLogicalRepWorker->leader_pid != InvalidPid); > SendProcSignal(MyLogicalRepWorker->leader_pid, > PROCSIG_PARALLEL_APPLY_MESSAGE, > InvalidBackendId); > > Also, if the parallel apply worker fails shm_toc_lookup() during the > initialization, it raises an error (because of noError = false) but > ends up a SEGV as MyLogicalRepWorker is still NULL. > > I think that we should not use MyLogicalRepWorker->leader_pid in > pa_shutdown() but instead store the leader's pid to a static variable > before registering pa_shutdown() callback. >
Why not simply move the registration of pa_shutdown() to someplace after logicalrep_worker_attach()? BTW, it seems we don't have access to MyLogicalRepWorker->leader_pid till we attach to the worker slot via logicalrep_worker_attach(), so we anyway need to do what you are suggesting after attaching to the worker slot. -- With Regards, Amit Kapila.