At Tue, 3 Sep 2024 09:10:07 +0530, vignesh C <vignes...@gmail.com> wrote in > The attached v2 version patch has the changes for the same.
Sorry for jumping in at this point. I've just reviewed the latest patch (v2), and the frequent Own/Disown-Latch operations caught my attention. Additionally, handling multiple concurrently operating trigger sources with nested latch waits seems bug-prone, which I’d prefer to avoid from both a readability and safety perspective. With that in mind, I’d like to suggest an alternative approach. I may not be fully aware of all the previous discussions, so apologies if this idea has already been considered and dismissed. Currently, WaitForReplicationWorkerAttach() and logicalrep_worker_stop_internal() wait on a latch after verifying the desired state. This ensures that even if there are spurious or missed wakeups, they won't cause issues. In contrast, ApplyLauncherMain() enters a latch wait without checking the desired state first. Consequently, if another process sets the latch to wake up the main loop while the former two functions are waiting, that wakeup could be missed. If my understanding is correct, the problem lies in ApplyLauncherMain() not checking the expected state before beginning to wait on the latch. There is no issue with waiting if the state hasn't been satisfied yet. So, I propose that ApplyLauncherMain() should check the condition that triggers a main loop wakeup before calling WaitLatch(). To do this, we could add a flag in LogicalRepCtxStruct to signal that the main loop has immediate tasks to handle. ApplyLauncherWakeup() would set this flag before sending SIGUSR1. In turn, ApplyLauncherMain() would check this flag before calling WaitLatch() and skip the WaitLatch() call if the flag is set. I think this approach could solve the issue without adding complexity. What do you think? regard. -- Kyotaro Horiguchi NTT Open Source Software Center