Hi,

On 2022-09-30 11:17:00 +0200, Alvaro Herrera wrote:
> But on testing, some nodes linger after being sent a shutdown signal.
> I'm not clear why this is -- I think it's due to the fact that we send
> the signal just as the node is starting up, which means the signal
> doesn't reach the process.
I suspect it's when a test gets interrupted while pg_ctl is starting the
backend. The start() routine only calls _update_pid() after pg_ctl has
finished, and terminate()->stop() returns before doing anything if the pid
isn't defined.

Perhaps the END{} routine should call $node->_update_pid(-1) if
$exit_code != 0 and _pid is undefined?

That does seem to reduce the incidence of "leftover" postgres instances.
001_start_stop.pl leaves some behind, but that makes sense, because it
bypasses the whole node management.

But I still occasionally see some remaining processes if I crank up test
concurrency.

Ah! At least part of the problem is that sub stop() does BAIL_OUT, and of
course it can fail as part of the shutdown.

But there are still some that survive, where your perl.trace doesn't
contain the node getting shut down...

Greetings,

Andres Freund
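The race and the suggested END{} change can be modelled with the following minimal, self-contained Perl sketch. It is not the real PostgreSQL::Test::Cluster code: `SketchNode` and `teardown()` are hypothetical stand-ins, the pid value 12345 is made up, and the argument the real _update_pid() takes (-1 in the suggestion) is not modelled. It only shows the bookkeeping problem. If the postmaster has written postmaster.pid but start() never recorded the pid, stop() silently does nothing unless the pid is re-read first.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical stand-in for a test node; models only the pid bookkeeping.
package SketchNode;

sub new
{
    my ($class, $data_dir) = @_;
    return bless { data_dir => $data_dir, _pid => undef }, $class;
}

# Re-read the pid from postmaster.pid if it exists, else clear it.
# (Simplified: the real method's argument is not modelled.)
sub _update_pid
{
    my ($self) = @_;
    if (open my $fh, '<', "$self->{data_dir}/postmaster.pid")
    {
        chomp(my $pid = <$fh>);
        close $fh;
        $self->{_pid} = $pid;
    }
    else
    {
        $self->{_pid} = undef;
    }
    return $self->{_pid};
}

# Like the stop() described above: without a pid it returns early.
# Returns 1 if it would have shut a server down, 0 otherwise.
sub stop
{
    my ($self) = @_;
    return 0 unless defined $self->{_pid};
    # A real implementation would run "pg_ctl stop" here.
    $self->{_pid} = undef;
    return 1;
}

package main;

# Hypothetical END{}-time teardown with the proposed fix: after an unclean
# exit, re-read the pid before trying to stop the node.
sub teardown
{
    my ($node, $exit_code) = @_;
    $node->_update_pid if $exit_code != 0 && !defined $node->{_pid};
    return $node->stop;
}

my $dir  = tempdir(CLEANUP => 1);
my $node = SketchNode->new($dir);

# Simulate a test interrupted while pg_ctl was starting the server: the
# pid file exists, but start() never got as far as recording the pid.
open my $fh, '>', "$dir/postmaster.pid" or die "cannot write pid file: $!";
print $fh "12345\n";
close $fh;

my $without_fix = $node->stop;           # pid unknown, nothing is stopped
my $with_fix    = teardown($node, 1);    # pid recovered, stop attempted
print "without fix: $without_fix, with fix: $with_fix\n";
```

Running it prints `without fix: 0, with fix: 1`. The sketch deliberately does not model the second problem mentioned above (stop() calling BAIL_OUT during shutdown), which would need stop() to report failure instead of aborting.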