Hi,

I noticed that during standby promotion the startup process sends SIGUSR1 to the slotsync worker to make it exit. Is there a reason for using SIGUSR1?

If the slotsync worker is blocked waiting for input from the primary (e.g., due to a network outage between the primary and standby), SIGUSR1 won't interrupt the wait. As a result, the worker can remain stuck and delay promotion for a long time.

Would it make sense to send SIGTERM instead, so the worker can exit promptly even while waiting? I've attached a WIP patch that does this. I haven't updated the source comments yet, but I can do so if we agree on the approach.

SIGTERM alone is not sufficient, though. A new slotsync worker could start immediately after the old one exits and block promotion again. To address this, the patch makes a newly started worker exit immediately if promotion is in progress.

Thoughts?

Regards,

--
Fujii Masao
v1-0001-Use-SIGTERM-to-stop-slotsync-worker-during-standb.patch
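
For illustration only, the behavior described above can be sketched as a small, self-contained C model. This is not the attached patch and it does not use PostgreSQL's internal APIs: the struct and the `model_*` function names are hypothetical, signal delivery is simulated with plain flags rather than real signals, and the `guard_enabled` parameter is just a way to compare the behavior with and without the proposed start-up check.

```c
#include <signal.h>
#include <stdbool.h>

/* Hypothetical stand-in for the shared state; not PostgreSQL's real struct. */
typedef struct SlotSyncModel
{
    bool promotion_in_progress;     /* set once promotion has begun */
    bool worker_running;            /* a slotsync worker currently exists */
    bool worker_blocked_on_primary; /* worker is waiting for input from the primary */
} SlotSyncModel;

/*
 * Simulate delivering a signal to the worker. Following the message above,
 * SIGUSR1 does not interrupt a worker that is blocked waiting on the primary,
 * while SIGTERM makes it exit even while waiting.
 */
void
model_signal_worker(SlotSyncModel *st, int signo)
{
    if (!st->worker_running)
        return;
    if (signo == SIGTERM || !st->worker_blocked_on_primary)
        st->worker_running = false;
}

/*
 * Simulate the startup process beginning promotion with the given signal.
 * Returns true if promotion can proceed, i.e. no worker is left running.
 */
bool
model_begin_promotion(SlotSyncModel *st, int signo)
{
    st->promotion_in_progress = true;
    model_signal_worker(st, signo);
    return !st->worker_running;
}

/*
 * Simulate a newly launched worker. With the guard enabled (the proposed
 * behavior), the worker exits immediately if promotion is in progress.
 * Returns true if the worker stays running.
 */
bool
model_worker_start(SlotSyncModel *st, bool guard_enabled)
{
    if (guard_enabled && st->promotion_in_progress)
        return false;
    st->worker_running = true;
    return true;
}
```

In this model, calling `model_begin_promotion` with SIGUSR1 on a blocked worker leaves the worker running, while SIGTERM clears it. Calling `model_worker_start` afterwards with `guard_enabled` set to false shows how a freshly started worker could block promotion again, which is the case the start-up check is meant to close.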
