On 2018-07-20 11:53:32 +0300, Heikki Linnakangas wrote:
> ISTM that no-one has any great ideas on what to do about the ereport() in
> quickdie().

I think we actually have some decent ideas how to make that less
problematic in a number of cases. Avoid translation (thereby avoiding
direct malloc()) and rely on the fact that the error message evaluation
happens in a memory context with pre-allocated memory.  That still
leaves openssl, but is better than nothing.  If we wanted, we
additionally could force ssl allocations to happen through another
context with preallocated memory.


> But I think we have consensus on replacing the exit(2) calls
> with _exit(2). If we do just that, it would be better than the status quo,
> even if it doesn't completely fix the problem. This would prevent the case
> that Asim reported, for starters.

Right. You're planning to backpatch?


> With _exit(), I think we wouldn't really need the "PG_SETMASK(&BlockSig);"
> calls in the aux process *_quickdie handlers that don't do anything else
> than call _exit(). But I didn't dare to remove them yet.

I think we should remove it at least in master.


> diff --git a/src/backend/postmaster/bgwriter.c 
> b/src/backend/postmaster/bgwriter.c
> index 960d3de204..010d53a6ce 100644
> --- a/src/backend/postmaster/bgwriter.c
> +++ b/src/backend/postmaster/bgwriter.c
> @@ -404,22 +404,21 @@ bg_quickdie(SIGNAL_ARGS)
>       /*
>        * We DO NOT want to run proc_exit() callbacks -- we're here because
>        * shared memory may be corrupted, so we don't want to try to clean up 
> our
> -      * transaction.  Just nail the windows shut and get out of town.  Now 
> that
> -      * there's an atexit callback to prevent third-party code from breaking
> -      * things by calling exit() directly, we have to reset the callbacks
> -      * explicitly to make this work as intended.
> -      */
> -     on_exit_reset();
> -
> -     /*
> -      * Note we do exit(2) not exit(0).  This is to force the postmaster 
> into a
> -      * system reset cycle if some idiot DBA sends a manual SIGQUIT to a 
> random
> -      * backend.  This is necessary precisely because we don't clean up our
> -      * shared memory state.  (The "dead man switch" mechanism in pmsignal.c
> -      * should ensure the postmaster sees this as a crash, too, but no harm 
> in
> -      * being doubly sure.)
> +      * transaction.  Just nail the windows shut and get out of town.
> +      *
> +      * There's an atexit callback to prevent third-party code from breaking
> +      * things by calling exit() directly.  We don't want to trigger that, so
> +      * we use _exit(), which doesn't run atexit callbacks, rather than 
> exit().
> +      * And exit() wouldn't be safe to run from a signal handler, anyway.

It's much less the exit() that's unsafe, than the callbacks themselves,
right?  Perhaps just restate that we wouldn't want to trigger atexit
processing due to signal safety?

> +      * Note we use _exit(2) not _exit(0).  This is to force the postmaster
> +      * into a system reset cycle if some idiot DBA sends a manual SIGQUIT 
> to a
> +      * random backend.  This is necessary precisely because we don't clean 
> up
> +      * our shared memory state.  (The "dead man switch" mechanism in
> +      * pmsignal.c should ensure the postmaster sees this as a crash, too, 
> but
> +      * no harm in being doubly sure.)
>        */
> -     exit(2);
> +     _exit(2);
>  }

Greetings,

Andres Freund

Reply via email to