Re: [HACKERS] Platform-dependent(?) failure in timeout handling

Tom Lane Wed, 27 Nov 2013 09:56:00 -0800

Kevin Grittner <kgri...@ymail.com> writes:
> Tom Lane <t...@sss.pgh.pa.us> wrote:
>> 3. Establish a coding rule that if you catch an error with
>> PG_TRY() and don't re-throw it, you have to unblock signals in
>> your PG_CATCH block.


> Could that be done in the PG_END_TRY macro?

Interesting idea ... [ thinks for a bit ... ] but I'm not sure it's really
a great fix.  PG_END_TRY would have to unblock *all* signals, I think,
and it's not clear you want that in that mechanism.  Consider a PG_TRY
executing inside a signal handler.  It would be semantically reasonable
to use sigsetjmp(foo,1) in PG_TRY, but as I said, I'm trying to avoid
that on performance grounds.

In any case, after sleeping on it I've realized that the HOLD_INTERRUPTS
dance I proposed adding to handle_sig_alarm() is necessary but not
sufficient to deal with interruptions from SIGINT.  The real reason that
we need HOLD_INTERRUPTS there is that the signal handler may be executing
while ImmediateInterruptsOK is true.  So if a random SIGINT happened to
arrive while we're updating the timeout data structures, we could lose
control and leave partially updated (ie, totally corrupted) timeout info.
Adding HOLD_INTERRUPTS prevents that case.  But, imagine that SIGINT
happens just as we enter handle_sig_alarm(), before we can do
HOLD_INTERRUPTS.  Then the SIGINT handler could longjmp out of the SIGALRM
handler; the data structures are OK, but we've effectively lost that
interrupt, and no new one is scheduled.  This isn't much of a problem
for the existing timer event handlers, because we'd just cancel them
anyway on the way out of the (sub)transaction, cf LockErrorCleanup.
But it could be a problem for future usages of the timer infrastructure.

In the case where we longjmp all the way out to the idle loop, we cancel
all pending timeouts anyway in postgres.c's sigsetjmp cleanup stanza.
It would be reasonable to add a PG_SETMASK(&UnBlockSig) in the same area.

I am thinking that what we want to do is add a signal unblock step to
(sub)transaction abort, too, as well as a call to timeout.c to let it
re-establish its timer interrupt in case one got lost.  This would fix
the problem as long as you assume that anyone catching an arbitrary error
condition will do a subtransaction abort to get back into a good state.
But we're requiring that already.

                        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Platform-dependent(?) failure in timeout handling

Reply via email to