Hi,

On 2025-01-13 12:20:39 +0000, Bertrand Drouvot wrote:
> On Fri, Jan 10, 2025 at 02:07:25PM -0500, Andres Freund wrote:
> > There wasn't any code to treat receiving PMSIGNAL_XLOG_IS_SHUTDOWN in the
> > wrong state as fatal. I think this needs to be treated the same way we
> > treat not being able to fork checkpointer during a shutdown. Now receiving
> > PMSIGNAL_XLOG_IS_SHUTDOWN while not in PM_WAIT_XLOG_SHUTDOWN triggers
> > FatalError processing after logging "WAL was shut down unexpectedly".
>
> I wonder if we could first ask the postmaster for confirmation that the
> SIGINT is coming from it? (and if not, then simply ignore the SIGINT). I
> thought about a shared memory flag, but the postmaster has to be kept away
> from shared memory operations, so that's not a valid idea. Another idea
> could be that the checkpointer sends a new "confirm" request to the
> postmaster first. But then I'm not sure how the postmaster could reply back
> (given the signals that are already being used), and that might
> overcomplicate things.
That sounds more complicated than it's worth for a hypothetical risk.

> > Maybe PM_WAIT_CHECKPOINTER should be before PM_WAIT_DEAD_END? I earlier
> > thought it'd be better to shut checkpointer down after even dead-end
> > children exited, in case we ever wanted to introduce stats for dead-end
> > children - but that seems rather unlikely?
>
> Yeah, given the above, it looks cleaner to change that ordering.

I'll post a version that does that in a bit.

> > Independent of this patch, we seem to deal with FatalError processing
> > somewhat inconsistently. We have this comment (in master):
> >
> >         /*
> >          * PM_WAIT_DEAD_END state ends when all other children are gone
> >          * except for the logger.  During normal shutdown, all that
> >          * remains are dead-end backends, but in FatalError processing we
> >          * jump straight here with more processes remaining.  Note that
> >          * they have already been sent appropriate shutdown signals,
> >          * either during a normal state transition leading up to
> >          * PM_WAIT_DEAD_END, or during FatalError processing.
> >          */
> >
> > However that doesn't actually appear to be true in the most common way to
> > reach FatalError, via HandleChildCrash():
> >
> >     if (Shutdown != ImmediateShutdown)
> >         FatalError = true;
> >
> >     /* We now transit into a state of waiting for children to die */
> >     if (pmState == PM_RECOVERY ||
> >         pmState == PM_HOT_STANDBY ||
> >         pmState == PM_RUN ||
> >         pmState == PM_STOP_BACKENDS ||
> >         pmState == PM_WAIT_XLOG_SHUTDOWN)
> >         UpdatePMState(PM_WAIT_BACKENDS);
> >
> > Here we
> >
> > a) don't directly go to PM_WAIT_DEAD_END, but to PM_WAIT_BACKENDS.
> >
> >    From PM_WAIT_BACKENDS we'll switch to PM_WAIT_DEAD_END once all
> >    processes other than walsender, archiver and dead-end children have
> >    exited.
> >
> > b) in PM_WAIT_XLOG_SHUTDOWN (previously named PM_SHUTDOWN) we go *back*
> >    to PM_WAIT_BACKENDS?
> > This comment seems to have been added in
> >
> > commit a78af0427015449269fb7d9d8c6057cfcb740149
> > Author: Heikki Linnakangas <heikki.linnakan...@iki.fi>
> > Date:   2024-11-14 16:12:28 +0200
> >
> >     Assign a child slot to every postmaster child process
> >
> > ISTM that section of the comment is largely wrong and has never really
> > been the case, if my git sleuthing is correct?
>
> I agree that does not look correct, but I don't think it's coming from
> a78af0427015449269fb7d9d8c6057cfcb740149.
> ISTM that a78af0427015449269fb7d9d8c6057cfcb740149 re-worded this already
> wrong comment:
>
> "
>     /*
>      * PM_WAIT_DEAD_END state ends when the BackendList is entirely empty
>      * (ie, no dead_end children remain), and the archiver is gone too.
>      *
>      * The reason we wait for those two is to protect them against a new
>      * postmaster starting conflicting subprocesses; this isn't an
>      * ironclad protection, but it at least helps in the
>      * shutdown-and-immediately-restart scenario.  Note that they have
>      * already been sent appropriate shutdown signals, either during a
>      * normal state transition leading up to PM_WAIT_DEAD_END, or during
>      * FatalError processing.
>      */
> "

Hm. This version doesn't really seem wrong in the same way, or to the same
degree.

> while we could also see:
>
> "
>     if (Shutdown != ImmediateShutdown)
>         FatalError = true;
>
>     /* We now transit into a state of waiting for children to die */
>     if (pmState == PM_RECOVERY ||
>         pmState == PM_HOT_STANDBY ||
>         pmState == PM_RUN ||
>         pmState == PM_STOP_BACKENDS ||
>         pmState == PM_SHUTDOWN)
>         pmState = PM_WAIT_BACKENDS;
> "

I don't think this makes the version of the comment you included above
inaccurate? It's saying that the signal has been sent in the states *leading
up* to PM_WAIT_DEAD_END, which does seem accurate? The code the comment is
about just won't be reached until the postmaster transitions to
PM_WAIT_DEAD_END.

> > but I suspect that's just about unreachable these days.
> > If checkpointer exits outside of shutdown processing, it's treated as a
> > reason to crash-restart:
> >
> >             /*
> >              * Any unexpected exit of the checkpointer (including FATAL
> >              * exit) is treated as a crash.
> >              */
> >             HandleChildCrash(pid, exitstatus,
> >                              _("checkpointer process"));
>
> I do agree. I was trying to reach this code path while replying to one of
> the above comments (about the PM_WAIT_DEAD_END ordering) but had to replace
> it with another if condition to reach it during testing.

I was able to reach it, unfortunately:

1) Repeated fork failures of checkpointer can lead to it.

2) When crash-restarting, we don't start checkpointer immediately, but defer
   that until LaunchMissingBackgroundProcesses(). While not easy, it's
   possible to trigger a shutdown before that happens.

Greetings,

Andres Freund