On Tue, Sep 02, 2025 at 09:09:54PM +0100, Mikhail Kot wrote: > The error originates from pgstat_shmem.c file where shhashent is left in > half-initialized state if pgstat_init_entry(), calling dsa_allocate0(), errors > out with OOM. Then shhashent causes a segmentation fault on access. I propose > a > patch which solves this issue. The patch is for main branch, but the code is > nearly identical in Postgres 13-17 so I suggest backporting it to other > supported versions.
Ugh. Support for pgstats in shared memory has been added in v15, so v13 and v14 are out of scope, aren't they? > The patch changes pgstat_init_entry()'s behaviour, returning NULL if memory > allocation failed. It also adds sanity checks to routines accepting arguments > returned by pgstat_init_entry(). @@ -430,6 +430,9 @@ pgstat_database_flush_cb(PgStat_EntryRef *entry_ref, bool nowait) PgStatShared_Database *sharedent; PgStat_StatDBEntry *pendingent; + if (!entry_ref) + return false; The flush callbacks are by design expected to return false *if and only if* the flush of the stats is *not forced* and they could not be flushed due to lock contention. This addition means that when pgstat_report_stat() uses its "force" mode, then it may randomly behave incorrectly and would fail to fulfill its design contract. > Reproducing this behaviour is tricky, because under OOM Postgres doesn't > necessarily reach the condition where specific dsa_allocate0() call errors. Deterministic testing would not be complicated, one can use an injection point that does a stack manipulation with IS_INJECTION_POINT_ATTACHED() or just return an error, but I don't see much value in doing that here as long as a fix is local to pgstat_init_entry(). Requiring that all the callers of pgstat_init_entry() need to cope with a potential OOM error path seems like a recipe that could be easily forgotten, even if we redesign the flush callbacks to be able to know about a more complex error state, which means to rewrite the code to return an enum state made of three possible values for OOM, success and lock contention. Anyway, couldn't we flip the order of the operations in pgstat_init_entry() so as we do first an allocation and avoid any inconsistency in the shared state? Or we could use DSA_ALLOC_NO_OOM, and put back the shared state in a consistent state before issuing an error if we find that the result of the allocation is NULL. Requiring an error on allocation seems more important to me here, we do that for palloc() and I don't see why we should not just do the same here for this DSA usage in the pgstats code. -- Michael
signature.asc
Description: PGP signature