Hi,

While testing something I made the checkpointer process intentionally crash as soon as it started up. The odd thing I observed on macOS is that we start a *new* checkpointer before shutting down:

2023-07-29 14:32:39.241 PDT [65031] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2023-07-29 14:32:39.244 PDT [65031] DEBUG: reaping dead processes
2023-07-29 14:32:39.244 PDT [65031] LOG: checkpointer process (PID 65032) was terminated by signal 11: Segmentation fault: 11
2023-07-29 14:32:39.244 PDT [65031] LOG: terminating any other active server processes
2023-07-29 14:32:39.244 PDT [65031] DEBUG: sending SIGQUIT to process 65034
2023-07-29 14:32:39.245 PDT [65031] DEBUG: sending SIGQUIT to process 65033
2023-07-29 14:32:39.245 PDT [65031] DEBUG: reaping dead processes
2023-07-29 14:32:39.245 PDT [65035] LOG: process 65035 taking over ProcSignal slot 126, but it's not empty
2023-07-29 14:32:39.245 PDT [65031] DEBUG: reaping dead processes
2023-07-29 14:32:39.245 PDT [65031] LOG: shutting down because restart_after_crash is off

Note that a new process (65035) is started after the crash has been observed. I added logging to StartChildProcess(), and the process that's started is another checkpointer.

I could not initially reproduce this on linux. After a fair bit of confusion, I figured out the reason: On macOS it takes a bit longer for the startup process to finish, which means we're still in PM_STARTUP state when we see that crash, instead of PM_RECOVERY or PM_RUN or ...

The problem is that unfortunately HandleChildCrash() doesn't change pmState when in PM_STARTUP:

    /* We now transit into a state of waiting for children to die */
    if (pmState == PM_RECOVERY ||
        pmState == PM_HOT_STANDBY ||
        pmState == PM_RUN ||
        pmState == PM_STOP_BACKENDS ||
        pmState == PM_SHUTDOWN)
        pmState = PM_WAIT_BACKENDS;

Once I figured that out, I put a sleep(1) in StartupProcessMain(), and the problem reproduces on linux as well.

I haven't fully dug through the history, this looks to be a quite old problem.

Arguably we might also be missing PM_SHUTDOWN_2, but I can't really see a bad consequence of that.

Greetings,

Andres Freund
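
For illustration, the state transition discussed above can be modeled in a small standalone sketch. This is not the postmaster.c code: the enum lists only the states named in this mail (its values and ordering are made up), and the function names state_after_crash() and state_after_crash_incl_startup() are hypothetical. The second function is only an assumption about what including PM_STARTUP in the check might look like, not a patch from this thread.

```c
/*
 * Simplified stand-in for the postmaster state enum. Only the states
 * named in this mail are listed; values and ordering are illustrative.
 */
typedef enum
{
    PM_STARTUP,
    PM_RECOVERY,
    PM_HOT_STANDBY,
    PM_RUN,
    PM_STOP_BACKENDS,
    PM_WAIT_BACKENDS,
    PM_SHUTDOWN,
    PM_SHUTDOWN_2
} PMState;

/*
 * Mirrors the condition quoted from HandleChildCrash(): returns the
 * state we would be in after a child crash has been handled.
 * Note that PM_STARTUP (and PM_SHUTDOWN_2) fall through unchanged.
 */
static PMState
state_after_crash(PMState pmState)
{
    if (pmState == PM_RECOVERY ||
        pmState == PM_HOT_STANDBY ||
        pmState == PM_RUN ||
        pmState == PM_STOP_BACKENDS ||
        pmState == PM_SHUTDOWN)
        pmState = PM_WAIT_BACKENDS;

    return pmState;
}

/*
 * Hypothetical variant (an assumption, not taken from the source tree):
 * also treat PM_STARTUP as a state that moves to PM_WAIT_BACKENDS.
 */
static PMState
state_after_crash_incl_startup(PMState pmState)
{
    if (pmState == PM_STARTUP)
        return PM_WAIT_BACKENDS;

    return state_after_crash(pmState);
}
```

Calling state_after_crash(PM_STARTUP) leaves the state at PM_STARTUP, which matches the observed behavior: the state machine never enters the "waiting for children to die" state, so a replacement checkpointer can still be launched. The hypothetical variant returns PM_WAIT_BACKENDS for the same input.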