Hi, On 2023-11-28 16:05:16 -0600, Nathan Bossart wrote: > On Tue, Nov 28, 2023 at 07:00:16PM +0100, David Geier wrote: > > PostgreSQL hit the following assertion during error cleanup, after being OOM > > in dsa_allocate0(): > > > > void dshash_detach(dshash_table *hash_table) { > > ASSERT_NO_PARTITION_LOCKS_HELD_BY_ME(hash_table); > > > > called from pgstat_shutdown_hook(), called from shmem_exit(), called from > > proc_exit(), called from the exception handler. > > Nice find.
+1 > > AutoVacWorkerMain() pgstat_report_autovac() pgstat_get_entry_ref_locked() > > pgstat_get_entry_ref() dshash_find_or_insert() resize() resize() locks all > > partitions so the hash table can safely be resized. Then it calls > > dsa_allocate0(). If dsa_allocate0() fails to allocate, it errors out. The > > exception handler calls proc_exit() which normally calls LWLockReleaseAll() > > via AbortTransaction() but only if there's an active transaction. However, > > pgstat_report_autovac() runs before a transaction got started and hence > > LWLockReleaseAll() doesn't run before pgstat_shutdown_hook() is called. > > From a glance, it looks to me like the problem is that pgstat_shutdown_hook > is registered as a before_shmem_exit callback, while ProcKill is registered > as an on_shmem_exit callback. That's required, as pgstat_shutdown_hook() needs to acquire lwlocks, which you can't after ProcKill(). It's also not unique to pgstat, several other before_shmem_exit() callbacks acquire lwlocks (e.g. AtProcExit_Twophase(), ParallelWorkerShutdown(), do_pg_abort_backup(), the first three uses of before_shmem_exit when git grepping) - which makes sense, they are presumably before_shmem_exit() because they need to manage shared state, which often needs locks. In normal backends this is fine-ish, because ShutdownPostgres() is registered very late (and thus is called early in the shutdown sequence), and the AbortOutOfAnyTransaction() contained therein indirectly calls LWLockReleaseAll() and very little happens outside of the transaction context. > I would expect your patch to fix this particular issue, but I'm wondering > whether there's a bigger problem here. Yes, there is - our subsystem initialization, shutdown, error recovery infrastructure is a mess. We've interwoven transaction handling far too tightly with error handling, the order of subystem initialization is basically random and differs between operating systems (due to EXEC_BACKEND) and "mode" of execution (the order differs when using single user mode) and we've distributed error recovery into ~10 places (all the sigsetjmp()s in backend code, xact.c and and a few other places like WalSndErrorCleanup()). Greetings, Andres Freund