Re: [Imap-uw] Re: Imap-uw and mbox corruption (Mark Crispin)

2009-04-05 Thread Mark Crispin

On Mon, 6 Apr 2009, David Houlder wrote:

I don't think longjmp() is async signal safe.


There is "safe" and there is "safe"

There is "safe" in the sense of being able to return back to what the 
program was doing.  But in this case, the program has no intention of 
returning.  It's reached an exception; it wants to take a specific action 
and then exit.


In the case of imapd:

imapd got some form of "time to die" signal: a hangup, a termination, a 
kiss of death.  imapd has determined that whatever it was doing, it was 
NOT an update to the mailbox; it is perfectly alright to abort whatever it 
was.


So, imapd has no intention of continuing what it is doing.  imapd simply
wants to do the following:
 (1) If the mailbox is in traditional UNIX format, it wants to save any
 unsaved changes.
 (2) it wants to syslog that it is exiting, and why.
 (3) it wants to exit.

For 15 or so years, imapd simply did this in the signal handler.  That 
worked well; and older versions of libc explicitly supported signal 
handlers doing this.  You could screw up your context as long as you 
didn't try to go back.  Let me emphasize: libc explicitly supported you 
doing this.


Then glibc came along and applied mutexes.  Suddenly in newer versions of 
Linux, imapd would be hanging in the syslog() because it may have been 
doing a printf() in the main line.


And the answer from the glibc developers was that you couldn't do syslog() 
in a signal handler.  You have to continue what the program was doing, and 
somehow in gawdknowswhat code figure out that the signal happened and take 
the error path.


The problem was, the server ended up getting hung, typically in TCP wait 
on a socket that was dead but somehow failing to fault the IOT on it.


So going back to what the program was doing wasn't working out.

Lo and behold, in looking at glibc code it appeared that longjmp() unwound 
the mutexes.  And it seems to work.


But now we have these weird corruptions, which have nothing to do with 
anything since it isn't even writing the file at that point!  It's almost 
as if glibc randomly picked a file descriptor, seeked to 0, and piddled 
some stuff there.


At this point, it looks like the whole exercise is futile.  Since glibc 
has broken how signal handlers used to work, the only way out is not to 
try to log why the server terminated.  Just vanish without a trace. 
Similarly, don't even try to save updates in traditional UNIX mailbox 
format ...even though we KNOW that the server wasn't doing anything to the 
file at the time therefore that file descriptor is completely clean.


It's a shame that Linux (and I guess BSD) does not have useful signals any 
longer.  For nearly 40 years, it has been commonplace for a signal handler 
to take an abort action with logging without going back to what it was 
doing.  That apparently has been "improved" into abolition.


-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.
___
Imap-uw mailing list
Imap-uw@u.washington.edu
http://mailman2.u.washington.edu/mailman/listinfo/imap-uw


[Imap-uw] Re: Imap-uw and mbox corruption (Mark Crispin)

2009-04-05 Thread David Houlder

Mark Crispin wrote:


To work around this, I tried to use a setjmp()/longjmp() in the signal 
handler that would take imapd back to the main command loop and then to 
code to save and exit.  Supposedly, longjmp() unwinds whatever context 
occurred since the setjmp().



I don't think longjmp() is async signal safe.

http://unix.derkeiler.com/Newsgroups/comp.unix.programmer/2003-04/0984.html


If it isn't the longjmp(), then I don't know what the hell is going on.


I suspect that doing longjmp() in an asynchronously called signal 
handler is asking for trouble.


cheers
David

--
   david.houl...@anu.edu.au NCI National Facility
   Phone: +61 2 6125 0578   and ANU Supercomputer Facility
   Fax:   +61 2 6125 8199   Leonard Huxley Bldg (No. 56)
Australian National University
Canberra, ACT, 0200, Australia