Hi,

On 2025-01-13 12:20:39 +0000, Bertrand Drouvot wrote:
> On Fri, Jan 10, 2025 at 02:07:25PM -0500, Andres Freund wrote:
> > There wasn't any code to treat receiving PMSIGNAL_XLOG_IS_SHUTDOWN in the
> > wrong state as fatal. I think this needs to be treated the same way we
> > treat not being able to fork checkpointer during a shutdown. Now receiving
> > PMSIGNAL_XLOG_IS_SHUTDOWN while not in PM_WAIT_XLOG_SHUTDOWN triggers
> > FatalError processing after logging "WAL was shut down unexpectedly".
>
> I wonder if we could first ask the postmaster for confirmation that the
> SIGINT is coming from it? (and if not, then simply ignore the SIGINT). I
> thought about a shared memory flag, but the postmaster has to be kept away
> from shared memory operations, so that's not a valid idea. Another idea
> could be that the checkpointer sends a new "confirm" request to the
> postmaster first. But then I'm not sure how the postmaster could reply back
> (given the signals that are already being used), and that might
> overcomplicate things.
That sounds more complicated than it's worth for a hypothetical risk.

> > Maybe PM_WAIT_CHECKPOINTER should be before PM_WAIT_DEAD_END? I earlier
> > thought it'd be better to shut checkpointer down after even dead-end
> > children exited, in case we ever wanted to introduce stats for dead-end
> > children - but that seems rather unlikely?
>
> Yeah, given the above, it looks cleaner to change that ordering.

I'll post a version that does that in a bit.

> > Independent of this patch, we seem to deal with FatalError processing
> > somewhat inconsistently. We have this comment (in master):
> >
> >         /*
> >          * PM_WAIT_DEAD_END state ends when all other children are gone
> >          * except for the logger.  During normal shutdown, all that
> >          * remains are dead-end backends, but in FatalError processing we
> >          * jump straight here with more processes remaining.  Note that
> >          * they have already been sent appropriate shutdown signals,
> >          * either during a normal state transition leading up to
> >          * PM_WAIT_DEAD_END, or during FatalError processing.
> >          */
> >
> > However that doesn't actually appear to be true in the most common way to
> > reach FatalError, via HandleChildCrash():
> >
> >     if (Shutdown != ImmediateShutdown)
> >         FatalError = true;
> >
> >     /* We now transit into a state of waiting for children to die */
> >     if (pmState == PM_RECOVERY ||
> >         pmState == PM_HOT_STANDBY ||
> >         pmState == PM_RUN ||
> >         pmState == PM_STOP_BACKENDS ||
> >         pmState == PM_WAIT_XLOG_SHUTDOWN)
> >         UpdatePMState(PM_WAIT_BACKENDS);
> >
> > Here we
> >
> > a) don't directly go to PM_WAIT_DEAD_END, but to PM_WAIT_BACKENDS.
> >
> >    From PM_WAIT_BACKENDS we'll switch to PM_WAIT_DEAD_END once all
> >    processes other than walsender, archiver and dead-end children have
> >    exited.
> >
> > b) in PM_WAIT_XLOG_SHUTDOWN (previously named PM_SHUTDOWN) we go *back*
> >    to PM_WAIT_BACKENDS?
> > This comment seems to have been added in
> >
> > commit a78af0427015449269fb7d9d8c6057cfcb740149
> > Author: Heikki Linnakangas <heikki.linnakan...@iki.fi>
> > Date:   2024-11-14 16:12:28 +0200
> >
> >     Assign a child slot to every postmaster child process
> >
> > ISTM that section of the comment is largely wrong and has never really
> > been the case, if my git sleuthing is correct?
>
> I agree that does not look correct, but I don't think it's coming from
> a78af0427015449269fb7d9d8c6057cfcb740149.
> ISTM that a78af0427015449269fb7d9d8c6057cfcb740149 re-worded this already
> wrong comment:
>
> "
>     /*
>      * PM_WAIT_DEAD_END state ends when the BackendList is entirely empty
>      * (ie, no dead_end children remain), and the archiver is gone too.
>      *
>      * The reason we wait for those two is to protect them against a new
>      * postmaster starting conflicting subprocesses; this isn't an
>      * ironclad protection, but it at least helps in the
>      * shutdown-and-immediately-restart scenario.  Note that they have
>      * already been sent appropriate shutdown signals, either during a
>      * normal state transition leading up to PM_WAIT_DEAD_END, or during
>      * FatalError processing.
>      */
> "

Hm. This version doesn't really seem wrong in the same way, or to the same
degree.

> while we could also see:
>
> "
>     if (Shutdown != ImmediateShutdown)
>         FatalError = true;
>
>     /* We now transit into a state of waiting for children to die */
>     if (pmState == PM_RECOVERY ||
>         pmState == PM_HOT_STANDBY ||
>         pmState == PM_RUN ||
>         pmState == PM_STOP_BACKENDS ||
>         pmState == PM_SHUTDOWN)
>         pmState = PM_WAIT_BACKENDS;
> "

I don't think this makes the version of the comment you included above
inaccurate? It's saying that the signal has been sent in the states *leading
up* to PM_WAIT_DEAD_END, which does seem accurate? The code the comment is
about just won't be reached until the postmaster transitions to
PM_WAIT_DEAD_END.

> > but I suspect that's just about unreachable these days.
> > If checkpointer exits outside of shutdown processing, it's treated as a
> > reason to crash-restart:
> >
> >             /*
> >              * Any unexpected exit of the checkpointer (including FATAL
> >              * exit) is treated as a crash.
> >              */
> >             HandleChildCrash(pid, exitstatus,
> >                              _("checkpointer process"));
>
> I do agree. I was trying to reach this code path while replying to one of
> the above comments (about the PM_WAIT_DEAD_END ordering) but had to replace
> it with another if condition to reach it during testing.

I was able to reach it, unfortunately:

1) Repeated fork failures of checkpointer can lead to it.

2) When crash-restarting, we don't start checkpointer immediately, but defer
   that until LaunchMissingBackgroundProcesses(). While not easy, it's
   possible to trigger a shutdown before that happens.

Greetings,

Andres Freund