On Mon, 6 Apr 2009, David Houlder wrote:
> I don't think longjmp() is async signal safe.
There is "safe" and there is "safe".
There is "safe" in the sense of being able to return back to what the
program was doing. But in this case, the program has no intention of
returning. It's reached an exception; it wants to take a specific action
and then exit.
In the case of imapd:
imapd got some form of "time to die" signal: a hangup, a termination, a
kiss of death. imapd has determined that whatever it was doing, it was
NOT an update to the mailbox; it is perfectly alright to abort whatever it
was.
So, imapd has no intention of continuing what it is doing. imapd simply
wants to do the following:
(1) If the mailbox is in traditional UNIX format, it wants to save any unsaved
changes.
(2) it wants to syslog that it is exiting, and why.
(3) it wants to exit.
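
For illustration, the three steps above done directly in a signal handler might look like the sketch below. This is not imapd's actual code: save_unsaved_changes(), die_handler(), install_die_handler() and demo_die_in_child() are hypothetical names, the log message is made up, and the choice of SIGHUP and SIGTERM is an assumption. Note that syslog() is not on the POSIX list of async-signal-safe functions, which is the very point under dispute.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <syslog.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for step (1): flush unsaved changes to a
   traditional UNIX format mailbox. Not imapd's real routine. */
static void save_unsaved_changes(void)
{
}

/* Steps (1)-(3) performed in the handler, never returning to the
   interrupted code. syslog() is NOT async-signal-safe under POSIX. */
static void die_handler(int signo)
{
    save_unsaved_changes();                             /* (1) */
    syslog(LOG_NOTICE, "exiting on signal %d", signo);  /* (2) */
    _exit(1);                                           /* (3) */
}

/* Install die_handler for the "time to die" signals that can be
   caught. Returns 0 on success, -1 on failure. */
int install_die_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = die_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGHUP, &sa, NULL) != 0) return -1;
    if (sigaction(SIGTERM, &sa, NULL) != 0) return -1;
    return 0;
}

/* Demo: run the pattern in a child process and report how it ended.
   Returns the child's exit status, or -1 if it did not exit normally. */
int demo_die_in_child(void)
{
    int status;
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        if (install_die_handler() != 0) _exit(2);
        raise(SIGTERM);   /* handler runs and calls _exit(1) */
        _exit(3);         /* reached only if the handler did not fire */
    }
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Here the child ends with exit status 1 set by the handler, rather than being killed by the signal's default action.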
For 15 or so years, imapd simply did this in the signal handler. That
worked well; and older versions of libc explicitly supported signal
handlers doing this. You could screw up your context as long as you
didn't try to go back. Let me emphasize: libc explicitly supported you
doing this.
Then glibc came along and applied mutexes. Suddenly in newer versions of
Linux, imapd would be hanging in the syslog() because it may have been
doing a printf() in the main line.
And the answer from the glibc developers was that you couldn't do syslog()
in a signal handler. You have to continue what the program was doing, and
somehow in gawdknowswhat code figure out that the signal happened and take
the error path.
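
The approach the glibc developers recommend is usually written as a handler that only sets a flag, with the main line polling it. A minimal sketch follows, assuming SIGTERM as the signal; got_term, on_term(), install_term_flag() and term_requested() are hypothetical names, not anything from imapd or glibc.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler, polled by the main line. */
static volatile sig_atomic_t got_term = 0;

static void on_term(int signo)
{
    (void)signo;
    got_term = 1;   /* the only thing the handler does */
}

/* Install the flag-setting handler for SIGTERM. Returns 0 on success. */
int install_term_flag(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_term;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;   /* no SA_RESTART: interrupted calls can fail with EINTR */
    return sigaction(SIGTERM, &sa, NULL);
}

/* The main line calls this between operations and takes its own
   error path (save, syslog, exit) when it returns nonzero. */
int term_requested(void)
{
    return got_term != 0;
}
```

The weakness is the one described in this message: the flag is only noticed if the main line actually gets back to a point where it checks it.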
The problem was, the server ended up getting hung, typically in TCP wait
on a socket that was dead but somehow failing to fault the IOT on it.
So going back to what the program was doing wasn't working out.
Lo and behold, in looking at glibc code it appeared that longjmp() unwound
the mutexes. And it seems to work.
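
A sketch of that jump-out-of-the-handler pattern, using the POSIX sigsetjmp()/siglongjmp() pair so the signal mask is restored as well. The names die_env, die_signo, jump_handler(), run_until_signal() and demo_work() are hypothetical, and whether this is safe when the interrupted code was inside a non-async-signal-safe function is exactly what is being disputed in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf die_env;                   /* jump target set by the main line */
static volatile sig_atomic_t die_signo = 0;  /* which signal arrived */

static void jump_handler(int signo)
{
    die_signo = signo;
    siglongjmp(die_env, 1);   /* abandon whatever was interrupted */
}

/* Runs work(). Returns the signal number if SIGTERM cut it short,
   0 if work() completed, -1 if the handler could not be installed.
   The handler stays installed, since the caller is expected to exit. */
int run_until_signal(void (*work)(void))
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = jump_handler;
    sigemptyset(&sa.sa_mask);

    die_signo = 0;
    if (sigsetjmp(die_env, 1) != 0) {
        /* Arrived here from the handler: the save, syslog and exit
           steps would go here, outside signal-handler context. */
        return die_signo;
    }
    /* The jump buffer is valid before the handler can ever fire. */
    if (sigaction(SIGTERM, &sa, NULL) != 0) return -1;
    work();
    return 0;
}

/* Demo workload that signals itself partway through. */
void demo_work(void)
{
    raise(SIGTERM);
}
```

Calling run_until_signal(demo_work) returns SIGTERM, because the handler jumps back to the sigsetjmp() point instead of letting demo_work() finish.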
But now we have these weird corruptions, which have nothing to do with
anything since it isn't even writing the file at that point! It's almost
as if glibc randomly picked a file descriptor, seeked to 0, and piddled
some stuff there.
At this point, it looks like the whole exercise is futile. Since glibc
has broken how signal handlers used to work, the only way out is not to
try to log why the server terminated. Just vanish without a trace.
Similarly, don't even try to save updates in traditional UNIX mailbox
format ...even though we KNOW that the server wasn't doing anything to the
file at the time, and therefore that file descriptor is completely clean.
It's a shame that Linux (and I guess BSD) does not have useful signals any
longer. For nearly 40 years, it has been commonplace for a signal handler
to take an abort action with logging without going back to what it was
doing. That apparently has been "improved" into abolition.
-- Mark --
http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.
___
Imap-uw mailing list
Imap-uw@u.washington.edu
http://mailman2.u.washington.edu/mailman/listinfo/imap-uw