Dan Wood sent me off-list the test case mentioned in


I've been poking at it, and one of the failure modes I'm seeing is that
all the backends hang up without crashing.  I thought at first it was an
undetected deadlock, but what it turns out to be is breakage in timeout
processing.  The symptom is reminiscent of what we saw on OpenBSD a while
back, namely that backends end up with SIGALRM blocked and therefore can't
process any more timeout interrupts.  I've been able to devise a
repeatable test case: do this setup

create table lock_bug(f1 int, f2 int);

create or replace function timeoutfail() returns integer as $$
begin
  SET lock_timeout = 100;
  begin
    INSERT INTO lock_bug VALUES (1, 2);
    return 0;
  exception when lock_not_available then
    return 1;
  end;
end
$$ LANGUAGE plpgsql;

and then try timeoutfail() a few times:

regression=# select timeoutfail();
 timeoutfail
-------------
           0
(1 row)

regression=# select timeoutfail();
 timeoutfail
-------------
           0
(1 row)

All fine so far.  Now, in another session, do

regression=# begin;
regression=# lock table lock_bug ;

and then go back and try timeoutfail() some more.

regression=# select timeoutfail();
 timeoutfail
-------------
           1
(1 row)

regression=# select timeoutfail();
<< hangs ... >>

After the call that returns 1, the process has SIGALRM and SIGINT
blocked, so the timeout that should happen in the next call never comes.

I believe the reason for this is the mechanism that I speculated about in
that previous thread.  The platform is blocking SIGALRM while it executes
handle_sig_alarm(), and that calls LockTimeoutHandler() which does
"kill(MyProcPid, SIGINT)", and that SIGINT is being delivered immediately
(or at least before we can get out of handle_sig_alarm).  So now the
platform blocks SIGINT, too, and calls StatementCancelHandler(), which
proceeds to longjmp out of the whole signal-handling call stack.  So
the signal unblocking that would have happened after the handlers
returned doesn't happen.  In simpler cases we don't see an issue because
the longjmp returns to the sigsetjmp(foo,1) call in postgres.c, which
will result in restoring the signal mask that was active at that stack
level, so we're all good.  However, PG_TRY() uses sigsetjmp(foo,0),
which means that no signal mask restoration happens if we catch the
longjmp and don't ever re-throw it.  Which is exactly what happens in
plpgsql because of the EXCEPTION clause in the above example.

I don't know how many platforms block signals during handlers in this way,
but I'm seeing it on Linux (RHEL6.5 to be exact) and we know that at least
OpenBSD acts likewise, so that's a pretty darn large chunk of the world.

There are several ways that we might address this:

1. Make PG_TRY() use sigsetjmp(foo,1) instead.

2. Establish a coding rule that signal handlers that throw longjmp
must unblock signals first.

3. Establish a coding rule that if you catch an error with PG_TRY()
and don't re-throw it, you have to unblock signals in your PG_CATCH block.

I don't especially like #1 because of performance considerations ---
I assume (perhaps wrongly) that saving and restoring the signal mask would
add noticeably to the cost of PG_TRY, and most of the time we don't need
that.  Also, it's arguable that we don't *want* signals to be unblocked
right away if we're dealing with some arbitrary error condition.  At
least, it seems scary for the PG_TRY mechanism to be enforcing a policy
that we unblock before any cleanup can happen.

#2 is more or less what I proposed in the prior thread.  I experimented
with putting "PG_SETMASK(&UnBlockSig);" into ProcessInterrupts() before
any of the ereport(ERROR) cases, and verified that this seems to resolve
the issue.  One could argue that this approach has the same defect of
unblocking signals before any cleanup can happen --- but the scope for
problems seems considerably narrower than if we make PG_TRY() itself
change behavior.

#3 is probably logically cleanest but I'm worried about the prospects
for missing some places that would need to be changed to conform to
this new coding rule.  In particular, any third-party PL that has
something corresponding to exception catching would have an issue.
(#2 is less bad on the how-much-code-changes scale because there are
very few signal handlers that ought to be throwing longjmps.)

So on balance I like approach #2, but I wonder if anyone has a different
idea.

Also, while looking at this example, I see that there's another issue
internal to timeout.c: if a timeout handler function throws a longjmp
then handle_sig_alarm() loses control, which means that (1) any other
handler functions that should have fired at the same time won't, and
(2) we fail to reschedule the SIGALRM interrupt, meaning that we may
forget to fire future timeout events too.  This is a bit less bad
than the other problem because anything that calls any of the timeout
management functions will result in rescheduling the interrupt, but
it's still not good.  So I think we definitely ought to deprecate this
business of allowing the SIGINT interrupt to happen immediately in
LockTimeoutHandler().  An appropriate patch for that seems to be

diff --git a/src/backend/utils/misc/timeout.c b/src/backend/utils/misc/timeout.c
index 3c3e41e..1499ace 100644
*** a/src/backend/utils/misc/timeout.c
--- b/src/backend/utils/misc/timeout.c
***************
*** 16,21 ****
--- 16,22 ----

  #include <sys/time.h>

+ #include "miscadmin.h"
  #include "storage/proc.h"
  #include "utils/timeout.h"
  #include "utils/timestamp.h"
*************** handle_sig_alarm(SIGNAL_ARGS)
*** 259,264 ****
--- 260,267 ----
  {
  	int			save_errno = errno;

+ 	HOLD_INTERRUPTS();
+
  	/*
  	 * SIGALRM is always cause for waking anything waiting on the process
  	 * latch.  Cope with MyProc not being there, as the startup process also
*************** handle_sig_alarm(SIGNAL_ARGS)
*** 311,316 ****
--- 314,322 ----
  	}

  	errno = save_errno;
+
+ 	RESUME_INTERRUPTS();
  }

(needs a few comments of course) which basically ensures that if the
SIGINT does get delivered while we're still in handle_sig_alarm, it'll
be serviced just before exit instead of while we're still updating the
timeout state.


                        regards, tom lane
