Hi,
On 2023-11-28 20:42:47 -0600, Nathan Bossart wrote:
> On Tue, Nov 28, 2023 at 04:03:49PM -0800, Andres Freund wrote:
> > On 2023-11-28 16:05:16 -0600, Nathan Bossart wrote:
> >> From a glance, it looks to me like the problem is that pgstat_shutdown_hook
> >> is registered as a before_shmem_exit callback, while ProcKill is registered
> >> as an on_shmem_exit callback.
> >
> > That's required, as pgstat_shutdown_hook() needs to acquire lwlocks, which
> > you
> > can't after ProcKill(). It's also not unique to pgstat, several other
> > before_shmem_exit() callbacks acquire lwlocks (e.g. AtProcExit_Twophase(),
> > ParallelWorkerShutdown(), do_pg_abort_backup(), the first three uses of
> > before_shmem_exit when git grepping) - which makes sense, they are
> > presumably
> > before_shmem_exit() because they need to manage shared state, which often
> > needs locks.
> >
> > In normal backends this is fine-ish, because ShutdownPostgres() is
> > registered
> > very late (and thus is called early in the shutdown sequence), and the
> > AbortOutOfAnyTransaction() contained therein indirectly calls
> > LWLockReleaseAll() and very little happens outside of the transaction
> > context.
>
> Right. Perhaps we could add a LWLockReleaseAll() to
> pgstat_shutdown_hook() instead of the autovacuum code, but I'm afraid that
> is still just a hack.
Yea, we'd need that in just about all before_shmem_exit() callbacks. I could
see an argument for doing it in proc_exit_prepare(). While that'd be a fairly
gross layering violation, we already do reset a number a bunch of stuff in
there:
/*
* Forget any pending cancel or die requests; we're doing our best to
* close up shop already. Note that the signal handlers will not set
* these flags again, now that proc_exit_inprogress is set.
*/
InterruptPending = false;
ProcDiePending = false;
QueryCancelPending = false;
InterruptHoldoffCount = 1;
CritSectionCount = 0;
> >> I would expect your patch to fix this particular issue, but I'm wondering
> >> whether there's a bigger problem here.
> >
> > Yes, there is - our subsystem initialization, shutdown, error recovery
> > infrastructure is a mess. We've interwoven transaction handling far too
> > tightly with error handling, the order of subystem initialization is
> > basically
> > random and differs between operating systems (due to EXEC_BACKEND) and
> > "mode"
> > of execution (the order differs when using single user mode) and we've
> > distributed error recovery into ~10 places (all the sigsetjmp()s in backend
> > code, xact.c and and a few other places like WalSndErrorCleanup()).
>
> :(
>
> I do remember looking into uniting all the various sigsetjmp() calls
> before. That could be worth another try. The rest will probably require
> additional thought...
It'd definitely be worth some effort. I'm quite sure that we have a number of
hard to find bugs around this.
Greetings,
Andres Freund