Re: [HACKERS] Platform-dependent(?) failure in timeout handling

Martijn van Oosterhout Wed, 27 Nov 2013 00:24:44 -0800

On Tue, Nov 26, 2013 at 06:50:28PM -0500, Tom Lane wrote:
> I believe the reason for this is the mechanism that I speculated about in
> that previous thread.  The platform is blocking SIGALRM while it executes
> handle_sig_alarm(), and that calls LockTimeoutHandler() which does
> "kill(MyProcPid, SIGINT)", and that SIGINT is being delivered immediately
> (or at least before we can get out of handle_sig_alarm).  So now the
> platform blocks SIGINT, too, and calls StatementCancelHandler(), which
> proceeds to longjmp out of the whole signal-handling call stack.  So
> the signal unblocking that would have happened after the handlers
> returned doesn't happen.  In simpler cases we don't see an issue because
> the longjmp returns to the sigsetjmp(foo,1) call in postgres.c, which
> will result in restoring the signal mask that was active at that stack
> level, so we're all good.  However, PG_TRY() uses sigsetjmp(foo,0),
> which means that no signal mask restoration happens if we catch the
> longjmp and don't ever re-throw it.  Which is exactly what happens in
> plpgsql because of the EXCEPTION clause in the above example.
> 
> I don't know how many platforms block signals during handlers in this way,
> but I'm seeing it on Linux (RHEL6.5 to be exact) and we know that at least
> OpenBSD acts likewise, so that's a pretty darn large chunk of the world.


Isn't this why sigsetjmp/siglongjmp were invented? Is there a
situation where you don't want the signal mask restored?
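
To make the difference concrete, here is a minimal sketch (not PostgreSQL
code) of what the savemask argument changes when a handler jumps out
instead of returning. The names jump_env, jump_out_handler and
blocked_after_jump are made up for illustration, and it uses SIGUSR1 with
raise() rather than the SIGALRM/SIGINT chain described above. It assumes
a platform that blocks the signal while its handler runs, which is what
sigaction() without SA_NODEFER is specified to do.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* Hypothetical names; nothing here comes from the PostgreSQL sources. */
static sigjmp_buf jump_env;

/*
 * SIGUSR1 is blocked while this handler runs.  Jumping out instead of
 * returning skips the normal "unblock on handler return" step.
 */
static void
jump_out_handler(int signo)
{
    (void) signo;
    siglongjmp(jump_env, 1);
}

/*
 * Returns 1 if SIGUSR1 is still blocked after jumping out of its handler,
 * 0 if it is not, and -1 if the setup calls fail.
 */
int
blocked_after_jump(int savemask)
{
    struct sigaction sa, old_sa;
    sigset_t    usr1, old_mask, cur_mask;
    int         blocked;

    assert(savemask == 0 || savemask == 1);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = jump_out_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;            /* no SA_NODEFER: block SIGUSR1 in handler */
    if (sigaction(SIGUSR1, &sa, &old_sa) != 0)
        return -1;

    /* Make sure SIGUSR1 starts out unblocked; remember the caller's mask. */
    sigemptyset(&usr1);
    sigaddset(&usr1, SIGUSR1);
    if (sigprocmask(SIG_UNBLOCK, &usr1, &old_mask) != 0)
        return -1;

    /* savemask = 1 saves the current mask in jump_env; 0 does not. */
    if (sigsetjmp(jump_env, savemask) == 0)
        raise(SIGUSR1);         /* handler runs and jumps back here */

    /* Reached via siglongjmp(): inspect the mask we ended up with. */
    sigprocmask(SIG_BLOCK, NULL, &cur_mask);
    blocked = sigismember(&cur_mask, SIGUSR1);

    /* Put the caller's mask and handler back. */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGUSR1, &old_sa, NULL);
    return blocked;
}
```

On a system behaving as described, I would expect blocked_after_jump(0)
to report the signal as still blocked and blocked_after_jump(1) to report
it as unblocked, which is the sigsetjmp(foo,0) versus sigsetjmp(foo,1)
difference in a nutshell.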

BTW, it seems that on BSD systems setjmp behaves like sigsetjmp, i.e. it
also saves and restores the signal mask:

http://www.gnu.org/software/libc/manual/html_node/Non_002dLocal-Exits-and-Signals.html

Have a nice day,
-- 
Martijn van Oosterhout   <klep...@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
   -- Arthur Schopenhauer
