On Fri, Jul 25, 2025 at 5:25 PM shveta malik <shveta.ma...@gmail.com> wrote: > > On Fri, Jul 25, 2025 at 7:17 AM Fujii Masao <masao.fu...@gmail.com> wrote: > > > > On Thu, Jul 24, 2025 at 6:46 PM shveta malik <shveta.ma...@gmail.com> wrote: > > > Sounds reasonable. > > > Thinking out loud, when cleaning up after a backend or background > > > worker crash, process_pm_child_exit() is invoked, which subsequently > > > calls both CleanupBackend() and HandleChildCrash(). After the cleanup > > > completes, process_pm_child_exit() calls PostmasterStateMachine() to > > > move to the next state. As part of that, PostmasterStateMachine() > > > invokes ResetBackgroundWorkerCrashTimes() (only in crash > > > scenarios/FatalError), to reset a few things. Since it also resets > > > rw_worker.bgw_notify_pid, it seems reasonable to reset the rw_pid as > > > well there. > > > > Thanks! > > Attached is a patch that fixes the issue by resetting rw_pid in > > ResetBackgroundWorkerCrashTimes(). > > > > The patch LGTM.
Thanks to both ChangAo and Shveta for the review! I've pushed the patch and back-patched it to v18. > > We should probably add a regression test for this case, > > but I'd prefer to commit the fix first and work on the test separately. > > Andrey Rudometov proposed a test patch in thread [1], > > which we might use as a starting point. > > Sounds good. This proposed patch adds a new regression test file to verify background worker restarts when restart_after_crash is enabled. However, since we already have 013_crash_restart.pl for testing that scenario, I’m thinking it might be sufficient to check that the logical replication launcher, i.e., the background worker, restarts properly there. For example, we could add a check like the following to 013_crash_restart.pl. Thoughts? ----------------- is($node->safe_psql('postgres', "SELECT count(*) = 1 FROM pg_stat_activity WHERE backend_type = 'logical replication launcher'"), 't', 'logical replication launcher is running after crash'); ----------------- Regards, -- Fujii Masao