Re: Unexpected behavior after OOM errors

Matthias van de Meent Fri, 19 Jun 2026 00:18:33 -0700

On Fri, 19 Jun 2026 at 01:55, Michael Paquier <[email protected]> wrote:
>
> On Thu, Jun 18, 2026 at 11:27:28AM +0200, Matthias van de Meent wrote:
> > Each of the calls to
> > CacheRegisterSyscacheCallback/CacheRegisterRelcacheCallback can throw
> > an ERROR when all slots have been used.  This would leave the typcache
> > in an invalid state, so I think that must be wrapped in a critical
> > section: neither syscache nor relcache has options to release
> > callbacks, and we can't safely continue without the callbacks
> > installed, so once an error is thrown here this backend can't ever be
> > properly initialized.  This is unlike OOMs, whose conditions for
> > failure may (and often do) change as workloads change in other
> > backends.
>
> We don't ERROR when failing to register a syscache/relcache callback,
> we FATAL if we reach one of the thresholds.


Ah, thanks for correcting me. I'm not sure why I had ERROR in mind,
but you're obviously correct. Your patch v2 LGTM.

> Reaching these thresholds
> points to me to a programming error anyway, so these should not matter
> in the field.

I don't think that's (necessarily) correct. These callbacks are
accessible to extensions, and if you load sufficiently many of those
you could still run out of slots even if each extension stayed well
within a reasonable threshold.

> > I think Heikki's suggestion for a FATAL critical section option would
> > be a good alternative. It wouldn't always be sufficient, but would fix
> > issues here.
>
> That sounds like an interesting idea, potentially reusable for other
> areas, but I'm not really convinced that we need to add this kind of
> facility for the case dealt with here.  To me, that's also where we
> could use a TRY/CATCH block and call it a day.  If others feel
> differently about this matter, I'm fine to be outvoted.

While TRY/CATCH would work, I'm not so keen on adding those to
system-initalizing code; the allocations generally go into contexts
that aren't cleaned up nicely during error handling. E.g. a partial
hash initialization for any of the cache hashmaps due to OOM will leak
its allocations.
Failing the connection for that makes sure we complete the right
cleanup procedures and not leak those resources (and it adds another
item to the multi-threading concerns list).

-Matthias

Re: Unexpected behavior after OOM errors

Reply via email to