Re: EPIPE
Raphael Quinet wrote: > > On Thu, 11 May 2000, Michael Natterer <[EMAIL PROTECTED]> wrote: > > > > We don't use SA_NODEFER any more. And AFAIK the delivery of SIGCHLD has > > nothing to do with cleaning up zombies. This is why we loop around > > waitpid() because POSIX explicitly says that signals arriving close > > together may be merged by the kernel. > > The delivery of SIGCHLD does not clean up the zombies: this is done by > calling waitpid() or any wait*() function. But the call to waitpid() > will be triggered by a SIGCHLD, and there is a potential race > condition. I will try to explain this in a slightly different way: > regardless of the setting of SA_NODEFER, you cannot ensure that you > will get one SIGCHLD for every dead process. Some kernels merge the > signals that arrive close together, and some of them mask the signal > while the corresponding handler is being called (some systems will > re-deliver the masked signal immediately after the handler returns, > but this is not always the case). Yeah, my confusing sentence above :) wanted to say: The # of SIGCHLD arriving doesn't have to correspond with the # of zombies we have to clean up. Of course just catching SIGCHLD doesn't touch the zombies at all. > Now, consider the following scenario for a race condition: > * The Gimp starts two plug-ins > * plug-in 1 dies > * SIGCHLD delivered, handler called > * inside the signal handler: > * - waitpid() returns the status of the first process > * - waitpid() returns 0 and the while() loop ends > * plug-in 2 dies (before the signal handler returns) > * SIGCHLD cannot be delivered > * - the signal handler returns > * The Gimp continues and the status of second plug-in is never > collected, so this creates a zombie. > Although it is rather unlikely that a context switch happens in the > signal handler just after the while() loop and before returning, it > can and will happen. > > If you are sure that all kernels of the systems that Gimp supports > will re-deliver the second SIGCHLD signal immediately after the signal > handler returns, then there is no problem in the scenario described > above: the handler will be called a second time and will collect the > status of the second plug-in. But I do not think so; that's why I > suggested to call waitpid() outside the signal handler, because then > you avoid the race condition. Hm. I don't think that this is really a problem. The scenario above is, as you say unlikely but will definitely happen. But _if_ it happens we just have a stale zombie hanging around which should be cleaned up the next time the handler is called. > > > Yes... I wrote a few months ago that I would change that and implement > > > some kind of --enable-stack-trace option, but I never took the time to > > > do it. > > > > Now it's there :) We just have to convert the remaining g_print() to write() > > and the handler will be totally safe if enable_stack_trace == FALSE. > > Hmmm... I don't know how you have implemented it (I cannot look at > the CVS code), but I was thinking of having more options for > --enable-stack-trace, both for the configure option and for the > command-line option: > - "never": disable the stack trace and use the default signal handlers > - "query": ask the user, as we are doing now. > - "always": always call the debugger without asking any question (useful > if stdin does not work, but stdout is still OK) > - "wait": always wait 30 seconds, without asking any question (this does > not use stdin or stdout and gives some time to the user if she > wants to attach a debugger to the process) > If the code is compiled or run with the "never" option, then the > signal handlers would never be installed. The configure option would > set the default value for this flag, but it would always be possible > to override it with the command-line option without recompiling. The > default for the code in CVS and for the unstable tarballs should be > "query", and the default for the stable releases could be "never" or > "wait". Wow, this is a bit more than what I was thinking about. I think this would be really useful. Still willing to hack it if comeone kicks you ?-) Ok, kick, kick, kick... > > We should probably re-add the reentrancy guards in the fatal handlers and > > just do a brute force exit() if it's called recursively (which can only > > happen during the stack trace because that's the only case where the signals > > are unblocked in the handler). > > Another option would be to set the signals to SIG_DFL as soon as the > fatal handler is called. Since the default behavior for these signals > is to kill the program, this would prevent the endless loop without > requiring a special flag. Yep, this sounds more useful than the current state (which just unblocks the signals in the handler if the trace is called and causes the loop if another signal arrives during and probably because of the trace). > On
Re: EPIPE
On Fri, May 12, 2000 at 05:43:20PM +0200, Raphael Quinet <[EMAIL PROTECTED]> wrote: > * The Gimp continues and the status of second plug-in is never > collected, so this creates a zombie. > If you are sure that all kernels of the systems that Gimp supports > will re-deliver the second SIGCHLD signal immediately after the signal > handler returns, These kernel exists, but most likely gimp won't compile on them anyway. There is a large body of programs relying on these marginally-working-POSIX signals (bash or perl for example), and breaking the expectations of signal blocking would most likely break these programs. However, the bash tries to workaround bugs in various versions of waitpid > status of the second plug-in. But I do not think so; that's why I > suggested to call waitpid() outside the signal handler, because then > you avoid the race condition. This would indeed be the most safe solution. -- -==- | ==-- _ | ---==---(_)__ __ __ Marc Lehmann +-- --==---/ / _ \/ // /\ \/ / [EMAIL PROTECTED] |e| -=/_/_//_/\_,_/ /_/\_\ XX11-RIPE --+ The choice of a GNU generation | |
Re: EPIPE
On Thu, 11 May 2000, Michael Natterer <[EMAIL PROTECTED]> wrote: > Raphael Quinet wrote: > > No, actually it is not safe on all operating systems: as I wrote > > elsewhere, you cannot always rely on SA_NODEFER. This means that in > > some cases, you could miss a SIGCHLD signal that occurs while you are > > still inside the handler but after the last test on waitpid(). If > > this happens, the main app will not see that one of the plug-ins has > > died (until another one dies and the handler collects the status for > > both). That's why it is safer to make the tests outside the signal > > handler. Otherwise, you could have a race condition on some systems > > (very seldom, but still...). > > We don't use SA_NODEFER any more. And AFAIK the delivery of SIGCHLD has > nothing to do with cleaning up zombies. This is why we loop around > waitpid() because POSIX explicitly says that signals arriving close > together may be merged by the kernel. The delivery of SIGCHLD does not clean up the zombies: this is done by calling waitpid() or any wait*() function. But the call to waitpid() will be triggered by a SIGCHLD, and there is a potential race condition. I will try to explain this in a slightly different way: regardless of the setting of SA_NODEFER, you cannot ensure that you will get one SIGCHLD for every dead process. Some kernels merge the signals that arrive close together, and some of them mask the signal while the corresponding handler is being called (some systems will re-deliver the masked signal immediately after the handler returns, but this is not always the case). Now, consider the following scenario for a race condition: * The Gimp starts two plug-ins * plug-in 1 dies * SIGCHLD delivered, handler called * inside the signal handler: * - waitpid() returns the status of the first process * - waitpid() returns 0 and the while() loop ends * plug-in 2 dies (before the signal handler returns) * SIGCHLD cannot be delivered * - the signal handler returns * The Gimp continues and the status of second plug-in is never collected, so this creates a zombie. Although it is rather unlikely that a context switch happens in the signal handler just after the while() loop and before returning, it can and will happen. If you are sure that all kernels of the systems that Gimp supports will re-deliver the second SIGCHLD signal immediately after the signal handler returns, then there is no problem in the scenario described above: the handler will be called a second time and will collect the status of the second plug-in. But I do not think so; that's why I suggested to call waitpid() outside the signal handler, because then you avoid the race condition. > > Yes... I wrote a few months ago that I would change that and implement > > some kind of --enable-stack-trace option, but I never took the time to > > do it. > > Now it's there :) We just have to convert the remaining g_print() to write() > and the handler will be totally safe if enable_stack_trace == FALSE. Hmmm... I don't know how you have implemented it (I cannot look at the CVS code), but I was thinking of having more options for --enable-stack-trace, both for the configure option and for the command-line option: - "never": disable the stack trace and use the default signal handlers - "query": ask the user, as we are doing now. - "always": always call the debugger without asking any question (useful if stdin does not work, but stdout is still OK) - "wait": always wait 30 seconds, without asking any question (this does not use stdin or stdout and gives some time to the user if she wants to attach a debugger to the process) If the code is compiled or run with the "never" option, then the signal handlers would never be installed. The configure option would set the default value for this flag, but it would always be possible to override it with the command-line option without recompiling. The default for the code in CVS and for the unstable tarballs should be "query", and the default for the stable releases could be "never" or "wait". > We should probably re-add the reentrancy guards in the fatal handlers and > just do a brute force exit() if it's called recursively (which can only > happen during the stack trace because that's the only case where the signals > are unblocked in the handler). Another option would be to set the signals to SIG_DFL as soon as the fatal handler is called. Since the default behavior for these signals is to kill the program, this would prevent the endless loop without requiring a special flag. On Fri, 12 May 2000, Michael Natterer <[EMAIL PROTECTED]> wrote: [...] > So ignoring SIGPIPE in the app causes it to be ignored in children, too. > To avoid the need of resetting the signal we could just connect it > to a dummy handler and let exec() do it's job of resetting it. Well, as I wrote above and in a previous mail, there is a convenient SIG_DFL that exits for that purpose
Re: EPIPE
On Fri, 12 May 2000, Marc Lehmann wrote: > On Thu, May 11, 2000 at 09:05:58PM +0200, Michael Natterer ><[EMAIL PROTECTED]> wrote: > > > The libgimp code could try to set the signal handler to SIG_DFL before > > > executing the code of the plug-in. > > > > We don't need to do this, as (exec()'ed) children can't inherit handlers > > from their parent anyway. > > If you ignore signals (you said so) this is very well inherited by > children (this is how programs like nohup work). However, this is not > necessarily a problem, of course. It might just be unexpected by libgimp > in the child, for example, or by plug-ins, and most probably it's not the > intented action to ignore any signals by default (where the default action > isn't ignore) in the plug-ins. Oops, you're right. Of course only signals which are connected to a handler are reset to their default disposition on exec() at.al. So ignoring SIGPIPE in the app causes it to be ignored in children, too. To avoid the need of resetting the signal we could just connect it to a dummy handler and let exec() do it's job of resetting it. > I would really prefer leaving most of the signal handlers intact (or under > program control). It would be nice if a plug-in could choose to call > > gimp_signal_safetynet (); > > If it wanted all it's signals to be clobbered. And gimp_main could change > the signals it really requires. The problem is that now, a call to > gimp_main (which doesn't return) changes signal handlers without leaving > the caller the option of not (or the option of later re-setting them > again, since "later" might be never) Hm, we install the signal stuff before the call to gimp_loop() which in turn calls the plugin's run() function after which the plugin is closed anyway. So why can't you just do the signal setting which have to be special in the run() procedure. There is nothing really happening in between (well, except the config message arriving but that shouldn't hurt). > > > signal handlers installed by the Gimp should call the ones that were > > > installed before. But then we have to support the systems that have > > > > Do you really think this is neccessary? > > I can imagine that gimp needs control over a few signals (although I > cannot imagine which ones at the moment). The current situation, however, > is, that gimp (or glib?) changes many signals in plug-ins (like SIGPIPE) > without any visible benefit. Inherited settings are not the problem here > (no handlers). Plugins connect fatal signals to the fatal handler (stacktrace), ignore SIGPIPE to get notified of a crashing gimp by the new gimp_plugin_io_error_handler() (because SIGPIPE didn't work) and they connect SIGCHLD to a handler which is identical to the one in the main app (to clean up zombies). > > I'd rather like to wait it the current code (which is not really > > different from the old one except that it uses sigaction()) works as > > expected. > > It's indeed the behaviour of the old code that's bugging me ;) But I get > a bit nervous when the number of clobbered signals increases, rather then > decreases. Only SIGCHLD and SIGPIPE disposition has changed in plugins. But I'm of course willing to change my mind if someone shows me where the problem is. (the problem may be that I still don't understand why you think you cannot change gimp_main()'s signal settings :) bye, --Mitch
Re: EPIPE
On Friday, 12 May 2000, Marc Lehmann wrote: > [1: plugins should have the system default signal handers, not gimp ones] > [2: main gimp app should have system default SIGPIPE handler] Point 1: I mainly agree with this. I can't think of a need to hook any signals. Did plugins used to give backtraces when they crashed? I think they did: if so, and people want to keep this functionality, then unfortunately plugins need to hook signals in a similar manner to the main gimp app. However, plugins definitely won't benefit from SIGCHLD handers or ignoring SIGPIPE. If a plugin tried to write to the gimp pipe and gets a SIGPIPE, it's because the main gimp app has died. The plugin should die as swiftly as possible, in this case. Point 2: The main gimp app should ignore SIGPIPE, so that an EPIPE can be generated in a controlled manner to be collected and acted upon by the plugin management routines in plug_in.c With all this vigorous discussion going on, I seem to be a little confused. Can someone (Mitch, Garry?) summarise what the current CVS signal handling situation is, and what changes (if any) are planned? Austin
Re: EPIPE
On Thu, May 11, 2000 at 09:05:58PM +0200, Michael Natterer <[EMAIL PROTECTED]> wrote: > > The libgimp code could try to set the signal handler to SIG_DFL before > > executing the code of the plug-in. > > We don't need to do this, as (exec()'ed) children can't inherit handlers > from their parent anyway. If you ignore signals (you said so) this is very well inherited by children (this is how programs like nohup work). However, this is not necessarily a problem, of course. It might just be unexpected by libgimp in the child, for example, or by plug-ins, and most probably it's not the intented action to ignore any signals by default (where the default action isn't ignore) in the plug-ins. I would really prefer leaving most of the signal handlers intact (or under program control). It would be nice if a plug-in could choose to call gimp_signal_safetynet (); If it wanted all it's signals to be clobbered. And gimp_main could change the signals it really requires. The problem is that now, a call to gimp_main (which doesn't return) changes signal handlers without leaving the caller the option of not (or the option of later re-setting them again, since "later" might be never) > > signal handlers installed by the Gimp should call the ones that were > > installed before. But then we have to support the systems that have > > Do you really think this is neccessary? I can imagine that gimp needs control over a few signals (although I cannot imagine which ones at the moment). The current situation, however, is, that gimp (or glib?) changes many signals in plug-ins (like SIGPIPE) without any visible benefit. Inherited settings are not the problem here (no handlers). > I'd rather like to wait it the current code (which is not really > different from the old one except that it uses sigaction()) works as > expected. It's indeed the behaviour of the old code that's bugging me ;) But I get a bit nervous when the number of clobbered signals increases, rather then decreases. -- -==- | ==-- _ | ---==---(_)__ __ __ Marc Lehmann +-- --==---/ / _ \/ // /\ \/ / [EMAIL PROTECTED] |e| -=/_/_//_/\_,_/ /_/\_\ XX11-RIPE --+ The choice of a GNU generation | |
Re: EPIPE
On Thu, May 11, 2000 at 08:33:54PM +0200, Michael Natterer <[EMAIL PROTECTED]> wrote: > But we don't do bad things in the SIGCHLD handler anyway > (no need for sharing data with the app) and in the fatal handler, we just > quit the app. "I'll put all my trust in you" ;) > > > The reason to ignore SIGPIPE was to make plugins to bahave like > > > the main app did before :) > > > > I don't think plug-ins should behave like the main application ;) They > > are, after all, plug-ins. > > This was a joke. Same here ;) > But mysteriously, it didn't work. Gimp was always connecting SIGPIPE to > the fatal signal handler and we _never_ got the signal, though. Then _is_ SIGPIPE being ignored or not just now? (confused...). However, I think not connecting SIGPIPE to _anything_ at all is not a good solution. Unless the plug-in needs to do some specific cleanup just terminating silently seems like the best behaviour. And if SIGPIPE isn't sent, changeing it's handler is even less useful. > You can of course set your own signal handlers or reset the signals to > their default behaviour. This is very difficult, since libgimp changes the signal handlers after claling gimp_main, and control doesn't return to the main application very determistically :( > No it hasn't. But as SIGPIPE didn't work (probably because of glib doing > strange stuff in the main loop or whatever), I had to find another way > (which seems to work quite nice so far). Even better, so SIGPIPE is left alone (I presume). > Nope, signals are set back to their default behaviour when doing an exec(). Hmm, AFAICS gimp simply calls fork and then execvp. Nothing signal related at all (and the plug-in inherits the signal behaviour) > > handlers on any plug-in" behaviour is making life very difficult (and this > > has been the case before you started to change anything!). > > Good to hear that. I tested the perl plugins with the new code and it > seemed to work, but it feels better to hear it from da perl man :) Hah! you probably tested it more thoroughly than me, so watch out for the _real_ complaints! ;) > > I don't see why replacing the default behaviour of terminating the process > > is worse than installing a signal handler that just terminates the > > process anyway. > > And does a stacktrace. Nobody needs a stacktrace from gimp + as many stacktraces as plug-ins are currently running. Having a stacktrace for sigpipe in a plug-in seems counterproductive. > We can remove them in the final release. I would however prefer to > leave them there (because the final release is unlikely to be bug-free). I still get endless loops in about 2-10% of all crashes (As I said gimp rarely crashes compared to some months ago). I just close my terminal window since that seems to really kill it in most cases, but sometimes gimp crashes and eats 100% cpu power for a long time, until I can detect that it's still running somewhere in the background (printing messages to nowehere). -- -==- | ==-- _ | ---==---(_)__ __ __ Marc Lehmann +-- --==---/ / _ \/ // /\ \/ / [EMAIL PROTECTED] |e| -=/_/_//_/\_,_/ /_/\_\ XX11-RIPE --+ The choice of a GNU generation | |
Re: EPIPE
On Thu, May 11, 2000 at 08:54:44PM +0200, Michael Natterer <[EMAIL PROTECTED]> wrote: > > Yes... I wrote a few months ago that I would change that and implement > > some kind of --enable-stack-trace option, but I never took the time to > > do it. > > Now it's there :) We just have to convert the remaining g_print() to write() > and the handler will be totally safe if enable_stack_trace == FALSE. Raphael, Mitch. Just to be clear: what _I_ need is a way to not have the plug-in signals changed behind my back. Installing signal handlers for unsuspecting signals, therefore, is my _problem_. Even moving signal initialization to somewhere before gimp_main would improve my situation. Making it optional in a plug-in would be best. I also argued (not independently enough, however, as the topics are similar) that the stack-tracing stuff (and I meant cathing each and every fatal signal) makes a lot of problems for the main _gimp_ app, as it's often not working (no way to get out of the prompt, no sensible stack trace anyway). It does not limit my "programming creativeness" (if there is any), like the first problem. I am just afraid that getting a SIGSEGV is better than looping around in the background. Catching signals and not offering a menu (as mitch seemed to imply) does not change the situation a bit. -- -==- | ==-- _ | ---==---(_)__ __ __ Marc Lehmann +-- --==---/ / _ \/ // /\ \/ / [EMAIL PROTECTED] |e| -=/_/_//_/\_,_/ /_/\_\ XX11-RIPE --+ The choice of a GNU generation | |
Re: EPIPE
(Sorry, I received mitch's mail befoee this one!) On Thu, May 11, 2000 at 07:23:48PM +0200, Raphael Quinet <[EMAIL PROTECTED]> wrote: > Hmmm... In the last discussion, I said that I would implement a > compile-time and a run-time option for choosing this, because some > people (including myself) do like the option to have a stack trace > (when it works). However, since there were such a large number of problems where you had to kill -9 gimp because it tried to catch all and everything, wouldn't it be safer to not try to play games, at least with the more problematic signals like SIGSEGV or SIGILL? (OTOH, gimp crashes ever more rarely the closer 1.2 gets!) > But I did not take the time to do it. If you kick me enough, I will do > it... :-) a compiletime option (per-plug-in) would be perfect, yes, yes, yes! please, please, please My problems are related to not being able to catch many signals (TERM, PIPE) in my plug-ins (gimp-perl as well as some specialised perl-plug-ins like the Perl-Server!) -- -==- | ==-- _ | ---==---(_)__ __ __ Marc Lehmann +-- --==---/ / _ \/ // /\ \/ / [EMAIL PROTECTED] |e| -=/_/_//_/\_,_/ /_/\_\ XX11-RIPE --+ The choice of a GNU generation | |
Re: EPIPE
On Thu, May 11, 2000 at 06:48:17PM +0200, Raphael Quinet <[EMAIL PROTECTED]> wrote: > Note that being MT safe is not enough. For the 4 signals that are > listed above, you can usually expect that your memory is already > corrupted. So if you want to minimize the risks of crashing If it's important enough to reliably catch these (just for the benefit of developers...) then one could try to use sigaltstack where available to at least provide safe local storage. -- -==- | ==-- _ | ---==---(_)__ __ __ Marc Lehmann +-- --==---/ / _ \/ // /\ \/ / [EMAIL PROTECTED] |e| -=/_/_//_/\_,_/ /_/\_\ XX11-RIPE --+ The choice of a GNU generation | |
Re: EPIPE
In regard to: Re: EPIPE, Michael Natterer said (at 9:05pm on May 11, 2000): >> Another solution is to save and re-install the old signal handler. We >> may even want to support some kind of chain of signal handers: the new >> signal handlers installed by the Gimp should call the ones that were >> installed before. But then we have to support the systems that have >> SA_SIGINFO and those that do not have it (this is not part of POSIX). >> The systems that have SA_SIGINFO are using extended prototypes for the >> signal handlers, so some #ifdef's will be necessary if we want to do >> that in the right way. I could contribute some code if you want, >> because I already did that in some other projects. > >Do you really think this is neccessary? I'd rather like to wait it the >current code (which is not really different from the old one except >that it uses sigaction()) works as expected. > >I guess Tim with his hardware zoo will be able to answer the question :) I'll be checking it this weekend, and I'll definitely report to the "signal crew" on my findings. :-) Tim -- Tim Mooney [EMAIL PROTECTED] Information Technology Services (701) 231-1076 (Voice) Room 242-J1, IACC Building (701) 231-8541 (Fax) North Dakota State University, Fargo, ND 58105-5164
Re: EPIPE
Raphael Quinet wrote: > > On Thu, 11 May 2000, Marc Lehmann <[EMAIL PROTECTED]> wrote: > [...] > > What could be the case, however, is, that gimp itself does not reset it's > > signal handlers when it execs the plug-ins. If this is the case, then this > > bug (not restoring signal handlers to their default) might cause many other > > subtle problems. > > The libgimp code could try to set the signal handler to SIG_DFL before > executing the code of the plug-in. We don't need to do this, as (exec()'ed) children can't inherit handlers from their parent anyway. > Another solution is to save and re-install the old signal handler. We > may even want to support some kind of chain of signal handers: the new > signal handlers installed by the Gimp should call the ones that were > installed before. But then we have to support the systems that have > SA_SIGINFO and those that do not have it (this is not part of POSIX). > The systems that have SA_SIGINFO are using extended prototypes for the > signal handlers, so some #ifdef's will be necessary if we want to do > that in the right way. I could contribute some code if you want, > because I already did that in some other projects. Do you really think this is neccessary? I'd rather like to wait it the current code (which is not really different from the old one except that it uses sigaction()) works as expected. I guess Tim with his hardware zoo will be able to answer the question :) And BTW, this gimp-developer traffic is cool. Somehow remembers me of the old days when we used to hack new features :) bye, --Mitch
Re: EPIPE
Raphael Quinet wrote: > > On Thu, 11 May 2000, Michael Natterer <[EMAIL PROTECTED]> wrote: > > > > This is what currently happens (ok, it happens in the handler, but WNOHANG > > *should* be absolutely safe). > > No, actually it is not safe on all operating systems: as I wrote > elsewhere, you cannot always rely on SA_NODEFER. This means that in > some cases, you could miss a SIGCHLD signal that occurs while you are > still inside the handler but after the last test on waitpid(). If > this happens, the main app will not see that one of the plug-ins has > died (until another one dies and the handler collects the status for > both). That's why it is safer to make the tests outside the signal > handler. Otherwise, you could have a race condition on some systems > (very seldom, but still...). We don't use SA_NODEFER any more. And AFAIK the delivery of SIGCHLD has nothing to do with cleaning up zombies. This is why we loop around waitpid() because POSIX explicitly says that signals arriving close together may be merged by the kernel. > [...] > > The usage of SIGCLD is strongly discouraged by Stevens and some Solaris > > document I fould recently. But Gimp uses SIGCHLD anyway. > > And here is an excerpt from /usr/include/sys/signal.h on Solaris 2.6: > > #define SIGCLD 18 /* child status change */ > #define SIGCHLD 18 /* child status change alias (POSIX) */ > > So it does not make much of a difference under recent versions of > Solaris, at least... :-) But they still say in some docs that the > behavior of SIGCLD might change in future releases, so the POSIX > version should be used in new programs. Yeah, although Solaris is definitely more POSIX than Linux, they have to cook their own stuff :) Stevens calls SIGCLD a "questionable practice". > > > In one application that wanted to catch SIGSEGV, SIGBUS, SIGILL and > > > SIGFPE, I created a handler that uses a direct call to write() on an > > > fd that was previously obtained from fileno(stderr) (this fd is saved > > > early so that the write() call can work even if the FILE *stderr is > > > overwritten with garbage). Doing this is safe, AFAIK. > > > > Yep, write() is safe. Gimp uses g_print() which is not really safe, but > > then we call g_on_error_query() which definitely does a bit more than > > what's allowed :) > > Yes... I wrote a few months ago that I would change that and implement > some kind of --enable-stack-trace option, but I never took the time to > do it. Now it's there :) We just have to convert the remaining g_print() to write() and the handler will be totally safe if enable_stack_trace == FALSE. > > >From glib/gerror.c: > > > > /* > > * MT safe ; except for g_on_error_stack_trace, but who wants thread safety > > * then > > */ > > Note that being MT safe is not enough. For the 4 signals that are > listed above, you can usually expect that your memory is already > corrupted. So if you want to minimize the risks of crashing > recursively inside the signal handler, you should avoid using > variables as much as possible. A handler for SIGSEGV is a good place > for paranoia... Like getting another SIGSEGV while glib asks if it should do a stack trace ?-) That's e-v-i-l. But I'm afraid we either have a waterproof SIGSEGV handler or the trace, not both :( We should probably re-add the reentrancy guards in the fatal handlers and just do a brute force exit() if it's called recursively (which can only happen during the stack trace because that's the only case where the signals are unblocked in the handler). > The other signal handlers do not need so much defensive programming. > Being MT safe is usually enough. I think so, too. The SIGCHLD handler is so trivial that it can hardly cause any harm. bye, --Mitch
Re: EPIPE
Marc Lehmann wrote: > > On Thu, May 11, 2000 at 12:47:22PM +0200, Michael Natterer ><[EMAIL PROTECTED]> wrote: > > > The only type that is atomic is sig_atomic_t. Everything else is not atomic > > > one at least one target where gimp runs. Limiting gimp to gnu-libc-platforms > > > looks very bad. > > > > See my other mail. I meant atomic data access which simply has to be > > the case or the os will go berserk :) > > If you are free to use the type you want, I would opt for sig_atomic_t, > which should be available on all platforms that gimp targets (of course, > this is only what I believe ;) And you're right. But we don't do bad things in the SIGCHLD handler anyway (no need for sharing data with the app) and in the fatal handler, we just quit the app. > > The reason to ignore SIGPIPE was to make plugins to bahave like > > the main app did before :) > > I don't think plug-ins should behave like the main application ;) They > are, after all, plug-ins. This was a joke. Gimp used to die on dying children. But that is what plugins should do (die on dying gimp) :) > > They should terminate themselves when Gimp crashes. > > Since the default action of SIGPIPE is to terminate the process, I > really think that ignoring the signal won't improve the situation. To > the contrary, changing all sorts of signals behind the applications > back (remember that there is no way to reset the signal handlers to the > values an application likes, as they are reset inside gimp_main!) creates > headaches for non-trviial plug-ins (like some of mine... ;) But mysteriously, it didn't work. Gimp was always connecting SIGPIPE to the fatal signal handler and we _never_ got the signal, though. You can of course set your own signal handlers or reset the signals to their default behaviour. > > So I inotrduced a handler for IO errors on the read channel which > > doesn't let them hang around as they used to do before. > > This has nothing to do with SIGPIPE, however. If the process didn't receive a > SIGPIPE before (the default action is to terminate the process), ignoring it > will have no effect. No it hasn't. But as SIGPIPE didn't work (probably because of glib doing strange stuff in the main loop or whatever), I had to find another way (which seems to work quite nice so far). > What could be the case, however, is, that gimp itself does not reset it's > signal handlers when it execs the plug-ins. If this is the case, then this > bug (not restoring signal handlers to their default) might cause many other > subtle problems. Nope, signals are set back to their default behaviour when doing an exec(). This is not the case for blocked signals but we don't block signals, we just ignore them. > > But I never claimed the new code to be the correct and final solution. > > So if it breaks perl in any way, we will of course have to change it > > again. > > It doesn't break perl. However, the whole "we will force our %$&§%& signal > handlers on any plug-in" behaviour is making life very difficult (and this > has been the case before you started to change anything!). Good to hear that. I tested the perl plugins with the new code and it seemed to work, but it feels better to hear it from da perl man :) > I thought the last time this was discussed we agreed that, in the release, > the signal handling code that causes frustration and endless loops will be > disabled by default? Most of these signals have very sane default actions > (like SIGPIPE). > > I don't see why replacing the default behaviour of terminating the process > is worse than installing a signal handler that just terminates the > process anyway. And does a stacktrace. We can remove them in the final release. I would however prefer to leave them there (because the final release is unlikely to be bug-free). bye, --Mitch
Re: EPIPE
On Thu, 11 May 2000, Marc Lehmann <[EMAIL PROTECTED]> wrote: [...] > What could be the case, however, is, that gimp itself does not reset it's > signal handlers when it execs the plug-ins. If this is the case, then this > bug (not restoring signal handlers to their default) might cause many other > subtle problems. The libgimp code could try to set the signal handler to SIG_DFL before executing the code of the plug-in. Another solution is to save and re-install the old signal handler. We may even want to support some kind of chain of signal handers: the new signal handlers installed by the Gimp should call the ones that were installed before. But then we have to support the systems that have SA_SIGINFO and those that do not have it (this is not part of POSIX). The systems that have SA_SIGINFO are using extended prototypes for the signal handlers, so some #ifdef's will be necessary if we want to do that in the right way. I could contribute some code if you want, because I already did that in some other projects. [...] > I thought the last time this was discussed we agreed that, in the release, > the signal handling code that causes frustration and endless loops will be > disabled by default? Most of these signals have very sane default actions > (like SIGPIPE). Hmmm... In the last discussion, I said that I would implement a compile-time and a run-time option for choosing this, because some people (including myself) do like the option to have a stack trace (when it works). But I did not take the time to do it. If you kick me enough, I will do it... :-) -Raphael
Re: EPIPE
On Thu, May 11, 2000 at 12:47:22PM +0200, Michael Natterer <[EMAIL PROTECTED]> wrote: > > The only type that is atomic is sig_atomic_t. Everything else is not atomic > > one at least one target where gimp runs. Limiting gimp to gnu-libc-platforms > > looks very bad. > > See my other mail. I meant atomic data access which simply has to be > the case or the os will go berserk :) If you are free to use the type you want, I would opt for sig_atomic_t, which should be available on all platforms that gimp targets (of course, this is only what I believe ;) > The reason to ignore SIGPIPE was to make plugins to bahave like > the main app did before :) I don't think plug-ins should behave like the main application ;) They are, after all, plug-ins. > They should terminate themselves when Gimp crashes. Since the default action of SIGPIPE is to terminate the process, I really think that ignoring the signal won't improve the situation. To the contrary, changing all sorts of signals behind the applications back (remember that there is no way to reset the signal handlers to the values an application likes, as they are reset inside gimp_main!) creates headaches for non-trviial plug-ins (like some of mine... ;) > So I inotrduced a handler for IO errors on the read channel which > doesn't let them hang around as they used to do before. This has nothing to do with SIGPIPE, however. If the process didn't receive a SIGPIPE before (the default action is to terminate the process), ignoring it will have no effect. What could be the case, however, is, that gimp itself does not reset it's signal handlers when it execs the plug-ins. If this is the case, then this bug (not restoring signal handlers to their default) might cause many other subtle problems. > But I never claimed the new code to be the correct and final solution. > So if it breaks perl in any way, we will of course have to change it > again. It doesn't break perl. However, the whole "we will force our %$&§%& signal handlers on any plug-in" behaviour is making life very difficult (and this has been the case before you started to change anything!). I thought the last time this was discussed we agreed that, in the release, the signal handling code that causes frustration and endless loops will be disabled by default? Most of these signals have very sane default actions (like SIGPIPE). I don't see why replacing the default behaviour of terminating the process is worse than installing a signal handler that just terminates the process anyway. -- -==- | ==-- _ | ---==---(_)__ __ __ Marc Lehmann +-- --==---/ / _ \/ // /\ \/ / [EMAIL PROTECTED] |e| -=/_/_//_/\_,_/ /_/\_\ XX11-RIPE --+ The choice of a GNU generation | |
Re: EPIPE
On Thu, 11 May 2000, Michael Natterer <[EMAIL PROTECTED]> wrote: > Raphael Quinet wrote: [...] > > - SIGCHLD or SIGCLD: a child process died. This signal can be > > delivered at any time. Some systems do not provide a reliable way > > to know how many processes exited (if they do not support SA_NODEFER > > or if their waitpid() or wait3() calls are broken), so it is usually > > better to simply set a flag in the signal handler (without calling > > any wait*() function) and to check the status of the children outside > > the signal handler, until some wait*() function reports that there > > are no more dead processes. For example: > > while ((pid = waitpid (-1, &status, WNOHANG)) > 0) > > { ... /* check WIFEXITED(status) and other things */ } > > This is what currently happens (ok, it happens in the handler, but WNOHANG > *should* be absolutely safe). No, actually it is not safe on all operating systems: as I wrote elsewhere, you cannot always rely on SA_NODEFER. This means that in some cases, you could miss a SIGCHLD signal that occurs while you are still inside the handler but after the last test on waitpid(). If this happens, the main app will not see that one of the plug-ins has died (until another one dies and the handler collects the status for both). That's why it is safer to make the tests outside the signal handler. Otherwise, you could have a race condition on some systems (very seldom, but still...). [...] > The usage of SIGCLD is strongly discouraged by Stevens and some Solaris > document I fould recently. But Gimp uses SIGCHLD anyway. And here is an excerpt from /usr/include/sys/signal.h on Solaris 2.6: #define SIGCLD 18 /* child status change */ #define SIGCHLD 18 /* child status change alias (POSIX) */ So it does not make much of a difference under recent versions of Solaris, at least... :-) But they still say in some docs that the behavior of SIGCLD might change in future releases, so the POSIX version should be used in new programs. > > In one application that wanted to catch SIGSEGV, SIGBUS, SIGILL and > > SIGFPE, I created a handler that uses a direct call to write() on an > > fd that was previously obtained from fileno(stderr) (this fd is saved > > early so that the write() call can work even if the FILE *stderr is > > overwritten with garbage). Doing this is safe, AFAIK. > > Yep, write() is safe. Gimp uses g_print() which is not really safe, but > then we call g_on_error_query() which definitely does a bit more than > what's allowed :) Yes... I wrote a few months ago that I would change that and implement some kind of --enable-stack-trace option, but I never took the time to do it. > >From glib/gerror.c: > > /* > * MT safe ; except for g_on_error_stack_trace, but who wants thread safety > * then > */ Note that being MT safe is not enough. For the 4 signals that are listed above, you can usually expect that your memory is already corrupted. So if you want to minimize the risks of crashing recursively inside the signal handler, you should avoid using variables as much as possible. A handler for SIGSEGV is a good place for paranoia... The other signal handlers do not need so much defensive programming. Being MT safe is usually enough. > killall -SIGGIMP ps :-) -Raphael
Re: EPIPE
Marc Lehmann wrote: > > >However, a signal handler can do whatever it likes with the app's structures > >as long as it uses atomic data access (which can be a pointer, as pointers > >have the same size as integers, which are atomic. This is true at least on > >all processors which have a GNU libc port and finding a processor > >where pointers are not atomic looks very unlikely to me). > > The only type that is atomic is sig_atomic_t. Everything else is not atomic > one at least one target where gimp runs. Limiting gimp to gnu-libc-platforms > looks very bad. See my other mail. I meant atomic data access which simply has to be the case or the os will go berserk :) > BTW: what's the reason for messing with SIGPIPE in plug-ins? Wasn't the > original idea to not mess with signals in the release at all? I suffer > quite a lot because libgimp resets signal handlers at places where you > cannot override it ;) The reason to ignore SIGPIPE was to make plugins to bahave like the main app did before :) They should terminate themselves when Gimp crashes. So I inotrduced a handler for IO errors on the read channel which doesn't let them hang around as they used to do before. But I never claimed the new code to be the correct and final solution. So if it breaks perl in any way, we will of course have to change it again. bye, --Mitch
Re: EPIPE
Tim Mooney wrote: > > In regard to: Re: EPIPE, Michael Natterer said (at 12:40am on May 11, 2000): > > >This is what currently happens (ok, it happens in the handler, but WNOHANG > >*should* be absolutely safe). > >However, a signal handler can do whatever it likes with the app's structures > >as long as it uses atomic data access (which can be a pointer, as pointers > >have the same size as integers, which are atomic. This is true at least on > >all processors which have a GNU libc port and finding a processor > >where pointers are not atomic looks very unlikely to me). > > Finding a processor/OS combo where sizeof(pointer) != sizeof(int) is pretty > easy, however. How does this change your thinking? No substantially :) I meant atomic access. A 64bit Processor will still have to access it's pointers atomically even if they are larger than an integer. > >The usage of SIGCLD is strongly discouraged by Stevens and some Solaris > >document I fould recently. But Gimp uses SIGCHLD anyway. > > SIGCLD != SIGCHLD. They're distant relations, but that's it. Is that what > you were trying to say? Since you apparently have access to Stevens' APUE, > look at section 10.7 again if you don't understand what I'm talking about. Yep, this is my source of info. The only common thing between SIGCHLD and SIGCLD is that they have something to do with children. SIGCLD is nonstandard and should therefore not be used. bye, --Mitch
Re: EPIPE
On Wed, May 10, 2000 at 05:19:11PM +0200, Raphael Quinet <[EMAIL PROTECTED]> wrote: > - SIGABRT: usually triggered by the application calling abort() or by a > user who wants to get a core dump from a running process. It can be The official signal to get a coredump without process interaction is SIGQUIT (you can send a SIGQUIT via the keyboard, but you cnanot send SIGABORT). -- -==- | ==-- _ | ---==---(_)__ __ __ Marc Lehmann +-- --==---/ / _ \/ // /\ \/ / [EMAIL PROTECTED] |e| -=/_/_//_/\_,_/ /_/\_\ XX11-RIPE --+ The choice of a GNU generation | |
Re: EPIPE
In regard to: Re: EPIPE, Nick Lamb said (at 2:57am on May 11, 2000): >On Wed, May 10, 2000 at 07:15:57PM -0500, Tim Mooney wrote: >> Finding a processor/OS combo where sizeof(pointer) != sizeof(int) is pretty >> easy, however. How does this change your thinking? > >Wouldn't atomicity guarantees be a processor feature, and hence tied >to word size (probably pointer width if you are taking full advantage >of your CPU) rather than whatever CC might think sizeof(int) is ? > >If I've completely forgotten my architecture course, don't hesitate >to write me a long rant, off-list of course... Nick- You may very well be completely correct, but it's essentially circumstantial to the point I was trying to make. My whole goal with the text you quoted was trying to dissuade anyone from making the "All the world's a ILP32 OS" assumption, which seemed where things might be headed. These days, recent versions of Unix from Sun, HP, SGI, IBM, and Compaq (as well as other's I'm not familiar with) on recent hardware are all LP64 or LP64-capable. Making pointer size assumptions was never a good idea, but it's especially problematic today. Linux and some of the BSD derivatives also have plans to be LP64 some day too, so even Linux-only developers want to keep a mindful eye on code portability to future platforms. Tim -- Tim Mooney [EMAIL PROTECTED] Information Technology Services (701) 231-1076 (Voice) Room 242-J1, IACC Building (701) 231-8541 (Fax) North Dakota State University, Fargo, ND 58105-5164
Re: EPIPE
>However, a signal handler can do whatever it likes with the app's structures >as long as it uses atomic data access (which can be a pointer, as pointers >have the same size as integers, which are atomic. This is true at least on >all processors which have a GNU libc port and finding a processor >where pointers are not atomic looks very unlikely to me). The only type that is atomic is sig_atomic_t. Everything else is not atomic one at least one target where gimp runs. Limiting gimp to gnu-libc-platforms looks very bad. BTW: what's the reason for messing with SIGPIPE in plug-ins? Wasn't the original idea to not mess with signals in the release at all? I suffer quite a lot because libgimp resets signal handlers at places where you cannot override it ;) -- -==- | ==-- _ | ---==---(_)__ __ __ Marc Lehmann +-- --==---/ / _ \/ // /\ \/ / [EMAIL PROTECTED] |e| -=/_/_//_/\_,_/ /_/\_\ XX11-RIPE --+ The choice of a GNU generation | |
Re: EPIPE
On Wed, May 10, 2000 at 07:15:57PM -0500, Tim Mooney wrote: > Finding a processor/OS combo where sizeof(pointer) != sizeof(int) is pretty > easy, however. How does this change your thinking? Wouldn't atomicity guarantees be a processor feature, and hence tied to word size (probably pointer width if you are taking full advantage of your CPU) rather than whatever CC might think sizeof(int) is ? If I've completely forgotten my architecture course, don't hesitate to write me a long rant, off-list of course... Nick.
Re: EPIPE
Date: Wed, 10 May 2000 19:15:57 -0500 (CDT) From: Tim Mooney <[EMAIL PROTECTED]> cc: Raphael Quinet <[EMAIL PROTECTED]>, [EMAIL PROTECTED] In regard to: Re: EPIPE, Michael Natterer said (at 12:40am on May 11, 2000): >This is what currently happens (ok, it happens in the handler, but >WNOHANG *should* be absolutely safe). However, a signal handler >can do whatever it likes with the app's structures as long as it >uses atomic data access (which can be a pointer, as pointers have >the same size as integers, which are atomic. This is true at least >on all processors which have a GNU libc port and finding a >processor where pointers are not atomic looks very unlikely to >me). Finding a processor/OS combo where sizeof(pointer) != sizeof(int) is pretty easy, however. How does this change your thinking? For example, UltraSPARC in 64-bit mode under Solaris. -- Robert Krawitz <[EMAIL PROTECTED]> http://www.tiac.net/users/rlk/ Tall Clubs International -- http://www.tall.org/ or 1-888-IM-TALL-2 Member of the League for Programming Freedom -- mail [EMAIL PROTECTED] Project lead for The Gimp Print -- http://gimp-print.sourceforge.net "Linux doesn't dictate how I work, I dictate how Linux works." --Eric Crampton
Re: EPIPE
In regard to: Re: EPIPE, Michael Natterer said (at 12:40am on May 11, 2000): >This is what currently happens (ok, it happens in the handler, but WNOHANG >*should* be absolutely safe). >However, a signal handler can do whatever it likes with the app's structures >as long as it uses atomic data access (which can be a pointer, as pointers >have the same size as integers, which are atomic. This is true at least on >all processors which have a GNU libc port and finding a processor >where pointers are not atomic looks very unlikely to me). Finding a processor/OS combo where sizeof(pointer) != sizeof(int) is pretty easy, however. How does this change your thinking? >The usage of SIGCLD is strongly discouraged by Stevens and some Solaris >document I fould recently. But Gimp uses SIGCHLD anyway. SIGCLD != SIGCHLD. They're distant relations, but that's it. Is that what you were trying to say? Since you apparently have access to Stevens' APUE, look at section 10.7 again if you don't understand what I'm talking about. You're definitely correct that gimp shouldn't be mucking with SIGCLD, and it's not. Tim -- Tim Mooney [EMAIL PROTECTED] Information Technology Services (701) 231-1076 (Voice) Room 242-J1, IACC Building (701) 231-8541 (Fax) North Dakota State University, Fargo, ND 58105-5164
Re: EPIPE
Raphael Quinet wrote: > > [ a cool summary of signals which i'm thinking about to copy for my students :) ] > - SIGILL: one some processors that do not deliver SIGBUS in all cases, > you can get a SIGILL if a pointer to a callback function was > overwritten with garbage. If the pointer is still referencing some > area inside a code segment (so that you don't get a SIGSEGV) but not > pointing to the start of a valid instruction, you will get the SIGILL. > By the way, the Gimp does not catch this one. Why? Dunno, most likely it was just forgotten in the first place. > - SIGABRT: usually triggered by the application calling abort() or by a > user who wants to get a core dump from a running process. It can be > caught by an application that wants to perform some specific cleanup > tasks, but in most cases it should not be caught by a generic error > handler. I don't understand why the Gimp maps this to the generic > gimp_fatal_error() function??? Yep, we should not catch it but let the kernel do it's job. If the user wants a core dump, she should get one. > - SIGCHLD or SIGCLD: a child process died. This signal can be > delivered at any time. Some systems do not provide a reliable way > to know how many processes exited (if they do not support SA_NODEFER > or if their waitpid() or wait3() calls are broken), so it is usually > better to simply set a flag in the signal handler (without calling > any wait*() function) and to check the status of the children outside > the signal handler, until some wait*() function reports that there > are no more dead processes. For example: > while ((pid = waitpid (-1, &status, WNOHANG)) > 0) > { ... /* check WIFEXITED(status) and other things */ } This is what currently happens (ok, it happens in the handler, but WNOHANG *should* be absolutely safe). However, a signal handler can do whatever it likes with the app's structures as long as it uses atomic data access (which can be a pointer, as pointers have the same size as integers, which are atomic. This is true at least on all processors which have a GNU libc port and finding a processor where pointers are not atomic looks very unlikely to me). The usage of SIGCLD is strongly discouraged by Stevens and some Solaris document I fould recently. But Gimp uses SIGCHLD anyway. > In most of the applications that I wrote, the signal handlers do > nothing directly: they only set a flag that is checked by the main > loop (in an idle function for GTK+ apps, or after poll() or select() > for applications that use low-level calls). I define one flag for > each signal (got_sigchld, got_sigterm, ...) and a master flag that > tells if any of the signal-specific flags have been set. Sometimes I > also use counters instead of boolean flags, but on some systems the > counters are not reliable (especially if there is no SA_NODEFER) so > most of the time they are meaningless. > > In one application that wanted to catch SIGSEGV, SIGBUS, SIGILL and > SIGFPE, I created a handler that uses a direct call to write() on an > fd that was previously obtained from fileno(stderr) (this fd is saved > early so that the write() call can work even if the FILE *stderr is > overwritten with garbage). Doing this is safe, AFAIK. Yep, write() is safe. Gimp uses g_print() which is not really safe, but then we call g_on_error_query() which definitely does a bit more than what's allowed :) >From glib/gerror.c: /* * MT safe ; except for g_on_error_stack_trace, but who wants thread safety * then */ > In most cases, I ignore SIGPIPE (or I only increment a counter for > debugging purposes) because the best way to deal with this is to check > the return value of the write() or send() calls, or to check if a > read() returns 0 later. This is what Gimp does now. Still mysterious why it worked all the time with SIGPIPE being fatal. > Just my 0.02 Euro. But you probably knew all of this already... Almost all... But thanks a lot for writing this summary I've been to lazy to write for years now :):):) And... please torture the new code. I was not able to crash gimp or make it spinning, even when sending SIGSEGV's to plugins while they were heavily using the wire. OTOH I'm almost sure that I'm again the one who introduced the worst bug of the next release... bye, --Mitch - killall -SIGGIMP ps
Re: EPIPE
On Wed, 10 May 2000, Michael Natterer <[EMAIL PROTECTED]> wrote: > Michael Natterer wrote: > > > > Austin Donnelly wrote: > > > > > > [ two mails i totally agree with ] > > > > I'm about to commit some code which should bring the signal > > stuff into a sane state. The ChangeLog entry is quite verbose > > and should explain how I tortured the code. > > Um, 1 minute later I found a bad bug. No commit today :( I don't know if this is relevant (I haven't seen the code that you wanted to commit), but here are some general comments about signals and when they are usually triggered in programs similar to the Gimp... - SIGHUP, SIGINT, SIGQUIT, SIGTERM: usually triggered by the user or by the system shutting down. They can be delivered at any time. - SIGPIPE: an attempt to write() or send() something on a socket has failed because the other party has closed the connection. This signal is usually triggered from within the system call, which is often called from inside a high-level function such as printf(). Since most versions of printf() are not re-entrant, it is usually a bad idea to call printf() or any stdio function in the signal handler. - SIGSEGV: just say "Oops!" or "Eek!". Some bug in the code has corrupted the memory. Usually happens when trying to dereference a NULL pointer or a pointer that has been overwritten with garbage. This also happens quite often inside printf() or sprintf(), for example when you are printing some debugging messages (or error messages) and you did not think that some arguments may be NULL. For this reason, it is also a bad idea to call any stdio functions inside the handler for this signal. - SIGBUS: mostly a variant of SIGSEGV. Happens on many processors when you are trying to access some unaligned data. Again, this is usually due to pointer corruption. - SIGILL: one some processors that do not deliver SIGBUS in all cases, you can get a SIGILL if a pointer to a callback function was overwritten with garbage. If the pointer is still referencing some area inside a code segment (so that you don't get a SIGSEGV) but not pointing to the start of a valid instruction, you will get the SIGILL. By the way, the Gimp does not catch this one. Why? - SIGFPE: usually comes from a division by 0, although some other errors (overflow, invalid operand) can also occur. This is usually triggered while executing a floating-point operation, although some processors or OS's can delay the signal. - SIGABRT: usually triggered by the application calling abort() or by a user who wants to get a core dump from a running process. It can be caught by an application that wants to perform some specific cleanup tasks, but in most cases it should not be caught by a generic error handler. I don't understand why the Gimp maps this to the generic gimp_fatal_error() function??? - SIGCHLD or SIGCLD: a child process died. This signal can be delivered at any time. Some systems do not provide a reliable way to know how many processes exited (if they do not support SA_NODEFER or if their waitpid() or wait3() calls are broken), so it is usually better to simply set a flag in the signal handler (without calling any wait*() function) and to check the status of the children outside the signal handler, until some wait*() function reports that there are no more dead processes. For example: while ((pid = waitpid (-1, &status, WNOHANG)) > 0) { ... /* check WIFEXITED(status) and other things */ } In most of the applications that I wrote, the signal handlers do nothing directly: they only set a flag that is checked by the main loop (in an idle function for GTK+ apps, or after poll() or select() for applications that use low-level calls). I define one flag for each signal (got_sigchld, got_sigterm, ...) and a master flag that tells if any of the signal-specific flags have been set. Sometimes I also use counters instead of boolean flags, but on some systems the counters are not reliable (especially if there is no SA_NODEFER) so most of the time they are meaningless. In one application that wanted to catch SIGSEGV, SIGBUS, SIGILL and SIGFPE, I created a handler that uses a direct call to write() on an fd that was previously obtained from fileno(stderr) (this fd is saved early so that the write() call can work even if the FILE *stderr is overwritten with garbage). Doing this is safe, AFAIK. In most cases, I ignore SIGPIPE (or I only increment a counter for debugging purposes) because the best way to deal with this is to check the return value of the write() or send() calls, or to check if a read() returns 0 later. Just my 0.02 Euro. But you probably knew all of this already... -Raphael
Re: EPIPE
Michael Natterer wrote: > > Austin Donnelly wrote: > > > > [ two mails i totally agree with ] > > I'm about to commit some code which should bring the signal > stuff into a sane state. The ChangeLog entry is quite verbose > and should explain how I tortured the code. Um, 1 minute later I found a bad bug. No commit today :( --Mitch
Re: EPIPE
Austin Donnelly wrote: > > [ two mails i totally agree with ] I'm about to commit some code which should bring the signal stuff into a sane state. The ChangeLog entry is quite verbose and should explain how I tortured the code. Hopefully no -isms this time :) bye, --Mitch
Re: EPIPE
On Monday, 8 May 2000, Michael Natterer wrote: > Unfortunately this is not the reason why gimp dies on just any aborting > child. Although I 100% agree that SIGPIPE being fatal is the wrong thing > to do. I browsed CVS and Gimp is connecting SIGPIPE to on_signal() since > the beginning of CVS time. Ok, fair enough. But gimp used to be able to survive its plugins dying in the past. I'm not sure when this was last working. > There is also an infinite loop using 30% of my cpu time in all cases where > a dying plugin does _not_ kill gimp. Fvwm2 used to have a bug very similar to this. It turned out to be trying to do too much processing in some signal hander function. > [Mitch on temp procs not working from plugins] This seems to be a separate, probably unrelated, bug. On Monday, 8 May 2000, Tim Mooney wrote: > The SIGPIPE problem is because on_signal is currently treating it as a > fatal signal (see the case on_signal in app/main.c). The on_signal routine > should probably be modified to not treat SIGPIPE as fatal. That should fix > the problem Austin is seeing (that others will no doubt see too). Someone > should investigate the handler in 1.1.19 or earlier, and see what was being > done on SIGPIPE there. On Monday, 8 May 2000, Tim Mooney wrote: > So why is Austin observing the problem now? I'm not sure. I'm noting that 2-3 recent bug reports have been related to plugins crashing (the main reason the bug was filed) but as a side effect, the main gimp process is also dying. I happened to look into one of these very briefly, and noted the strange SIGPIPE -> on_signal() -> SIGSEGV behaviour. > It does seem > like it's behaving as expected based on the code. The thing to try would > be to use a different signal handler for SIGPIPE, that doesn't terminate > the gimp. Perhaps just ignore the signal and let the plug-in protocol detect > the problem (assuming it can?)? What happens if we use SIGIGN as the handler for SIGPIPE. That, combined with using EPIPE from g_io_channel_{read,write}() should give us everything we need, without needing the overly complex gimp_nanny() stuff being proposed earlier. Remember: KISS - Keep It Simple, Stupid. On Monday, 8 May 2000, Michael Natterer wrote: > Nope, it unfortunately has another reason. I can't explain why it used to > work with SIGPIPE being fatal but the diffs of app/main.c show no semantical > changes at all down to CVS version 1.1. So, maybe we never used to get SIGPIPEs raised before. Maybe we were just lucky. I still think we need to deal with them properly. > On Monday, 8 May 2000, Tim Mooney wrote: > > Austin is also correct that calling printf from the handler is probably > > asking for trouble. If a message must be written, some other method should > > be found (write *is* ok from a signal handler, but won't using *that* be > > fun...). > > Hm, I guess printf from a signal handler which aborts the program should be > totally safe No it is not. Consider a libc implementation where printf takes out a mutex while appending to the stdio output buffer. If the signal is delivered while printf has this lock held, any attempt to call printf again will deadlock. You lose. This situation may occur if you call _any_ library code you didn't write, so the conclusion is that calling someone else's code from a signal handler is a very bad idea, unless you're guaranteed the code is fully re-entrant. Section 10.6 of Advanced Programming in the UNIX Environment (W. Richard Stevens) lists the POSIX1 mandated re-entrant functions. Printf, malloc etc are not in the list, unsuprisingly. On Monday, 8 May 2000, Michael Natterer wrote: > We should probably ignore SIGPIPE totally and let a more sophisticated > SIGCHLD handler do the work: > > - record which processes have been started ("gimp_nanny()") > - on SIGCHLD check if it crashed... > - ...and somehow clean up the plugin's struct and io channels > (and show a proper error message) > > The problem with this may be that we need a periodic function doing the cleanup > (just like the shell prints it's "bla exited [SIGwhatever]" message before the > prompt) because we cannot do it from the handler. > > OTOH it should be ok to call this cleanup function from two places: > > 1. whenever read/write from/to a plugin pipe fails. > 2. in an idle function. > > The "gimp_fork()" and signal handler stuff I proposed during the SIGCHLD > discussion might so the job but it would be much nicer to fix it to work like > before SIGPIPE indicates a write(2) into the pipe when the plugin has died. If SIGPIPE is being ignored (or the signal handler returns) then the write(2) returns EPIPE. If you read(2) from a pipe when the plugin has died,
Re: EPIPE
Date: Mon, 08 May 2000 20:22:27 +0200 From: Michael Natterer <[EMAIL PROTECTED]> Unfortunately this is not the reason why gimp dies on just any aborting child. Although I 100% agree that SIGPIPE being fatal is the wrong thing to do. I browsed CVS and Gimp is connecting SIGPIPE to on_signal() since the beginning of CVS time. Absolutely. I get hammered by this on the print plugin every now and again when I'm hacking on it and it segv's on me. While we're on the subject of plugin interaction, is there any way that hitting "cancel" could be a bit more graceful, to allow the plugin a chance to clean up? The print plugin doesn't get a chance to clean up its lpr child when the user cancels, with the result that an incomplete print file gets sent to the printer. That's nasty. -- Robert Krawitz <[EMAIL PROTECTED]> http://www.tiac.net/users/rlk/ Tall Clubs International -- http://www.tall.org/ or 1-888-IM-TALL-2 Member of the League for Programming Freedom -- mail [EMAIL PROTECTED] Project lead for The Gimp Print -- http://gimp-print.sourceforge.net "Linux doesn't dictate how I work, I dictate how Linux works." --Eric Crampton
Re: EPIPE
Tim Mooney wrote: > > I just looked at 1.1.19, and on_signal there was doing the same thing it > is now: SIGPIPE caused a call to gimp_terminate. Obviously the current code > is based on the older code. > > So why is Austin observing the problem now? I'm not sure. It does seem > like it's behaving as expected based on the code. The thing to try would > be to use a different signal handler for SIGPIPE, that doesn't terminate > the gimp. Perhaps just ignore the signal and let the plug-in protocol detect > the problem (assuming it can?)? We should probably ignore SIGPIPE totally and let a more sophisticated SIGCHLD handler do the work: - record which processes have been started ("gimp_nanny()") - on SIGCHLD check if it crashed... - ...and somehow clean up the plugin's struct and io channels (and show a proper error message) The problem with this may be that we need a periodic function doing the cleanup (just like the shell prints it's "bla exited [SIGwhatever]" message before the prompt) because we cannot do it from the handler. OTOH it should be ok to call this cleanup function from two places: 1. whenever read/write from/to a plugin pipe fails. 2. in an idle function. The "gimp_fork()" and signal handler stuff I proposed during the SIGCHLD discussion might so the job but it would be much nicer to fix it to work like before :) bye, --Mitch
Re: EPIPE
Tim Mooney wrote: > > In regard to: Re: EPIPE, Michael Natterer said (at 8:22pm on May 8, 2000): > > >Unfortunately this is not the reason why gimp dies on just any aborting > >child. Although I 100% agree that SIGPIPE being fatal is the wrong thing > >to do. I browsed CVS and Gimp is connecting SIGPIPE to on_signal() since > >the beginning of CVS time. > > > >Mysteriously, as you mention, it worked before. I'm also pretty sure that > >the current state of signal processing behaves exactly like before Garry > >started to fix the SIGCHLD bug. > > The current state of signal processing does *not* behave exactly as before: > plug-in query works on alpha-dec-osf now. Using sigaction instead of signal > should improve things across the board, on all platforms. signal() on HP-UX > and Solaris was the SYSV signal, where the handler needed to be reset after > every signal (and there was therefore a race condition), so even though major > problems regarding signal handling were never reported on those platforms, > they were lurking. Ok, the handling of SIGCHLD has changed but SIGPIPE is handled as always in Gimp and the call to gimp_signal_private() emulates the old signal() > The SIGPIPE problem is because on_signal is currently treating it as a > fatal signal (see the case on_signal in app/main.c). The on_signal routine > should probably be modified to not treat SIGPIPE as fatal. That should fix > the problem Austin is seeing (that others will no doubt see too). Someone > should investigate the handler in 1.1.19 or earlier, and see what was being > done on SIGPIPE there. Nope, it unfortunately has another reason. I can't explain why it used to work with SIGPIPE being fatal but the diffs of app/main.c show no semantical changes at all down to CVS version 1.1. > Austin is also correct that calling printf from the handler is probably > asking for trouble. If a message must be written, some other method should > be found (write *is* ok from a signal handler, but won't using *that* be > fun...). Hm, I guess printf from a signal handler which aborts the program should be totally safe (while we should of course _not_ call the handler for SIGPIPE). My current theory (TM) is that there are weird implications between SIGPIPE and SIGCHLD (which we currently ignore) but I suspect this is total nonsense :) bye, --Mitch
Re: EPIPE
I said: >The SIGPIPE problem is because on_signal is currently treating it as a >fatal signal (see the case on_signal in app/main.c). The on_signal routine >should probably be modified to not treat SIGPIPE as fatal. That should fix >the problem Austin is seeing (that others will no doubt see too). Someone >should investigate the handler in 1.1.19 or earlier, and see what was being >done on SIGPIPE there. I just looked at 1.1.19, and on_signal there was doing the same thing it is now: SIGPIPE caused a call to gimp_terminate. Obviously the current code is based on the older code. So why is Austin observing the problem now? I'm not sure. It does seem like it's behaving as expected based on the code. The thing to try would be to use a different signal handler for SIGPIPE, that doesn't terminate the gimp. Perhaps just ignore the signal and let the plug-in protocol detect the problem (assuming it can?)? Tim -- Tim Mooney [EMAIL PROTECTED] Information Technology Services (701) 231-1076 (Voice) Room 242-J1, IACC Building (701) 231-8541 (Fax) North Dakota State University, Fargo, ND 58105-5164
Re: EPIPE
In regard to: Re: EPIPE, Michael Natterer said (at 8:22pm on May 8, 2000): >Austin Donnelly wrote: >> >> Current gimp (1.1.21) seems to have problems with recovering from any >> plugin that dies. Things start going wrong when it takes a SIGPIPE >> while trying to write(read?) to the pipe to the plugin which is dead. >> Rather than ignoring SIGPIPE, and collecting an EPIPE from the io >> operation and using this to trigger dead plugin cleanup operations, >> gimp currently treats SIGPIPE just like any other signal: it's fatal. >> Unfortunately, while attempting to print out some error message or >> other, gimp causes a segfault. This might be due to non-reentrant >> stdio libraries in use, I don't know. According to POSIX, the only >> thing you're allowed to do is read or write variables of type >> sigatomic_t. Calling libc funcitions (including printf()) sounds like >> a recipe for disaster, especially with a non-reentrant libc. >> >> This needs some more thought, and I don't have much time right now to >> look into any more. I'm pretty sure that plugins were correctly >> cleaned up on their unexpected termination at some earlier stage. The >> whole point of plugins being separate processes is that a plugin >> should be unable to cause the main gimp app to crash: if they can then >> this is a fairly critical bug that should be fixed. > >Hi Austin, > >Unfortunately this is not the reason why gimp dies on just any aborting >child. Although I 100% agree that SIGPIPE being fatal is the wrong thing >to do. I browsed CVS and Gimp is connecting SIGPIPE to on_signal() since >the beginning of CVS time. > >Mysteriously, as you mention, it worked before. I'm also pretty sure that >the current state of signal processing behaves exactly like before Garry >started to fix the SIGCHLD bug. Actually, it was Austin's diagnosis (the hard part!) and initial code, with Garry fleshing it out and polishing it up in Austin's abscense, and comments from the peanut gallery (me and you, mainly). The current state of signal processing does *not* behave exactly as before: plug-in query works on alpha-dec-osf now. Using sigaction instead of signal should improve things across the board, on all platforms. signal() on HP-UX and Solaris was the SYSV signal, where the handler needed to be reset after every signal (and there was therefore a race condition), so even though major problems regarding signal handling were never reported on those platforms, they were lurking. The SIGPIPE problem is because on_signal is currently treating it as a fatal signal (see the case on_signal in app/main.c). The on_signal routine should probably be modified to not treat SIGPIPE as fatal. That should fix the problem Austin is seeing (that others will no doubt see too). Someone should investigate the handler in 1.1.19 or earlier, and see what was being done on SIGPIPE there. Austin is also correct that calling printf from the handler is probably asking for trouble. If a message must be written, some other method should be found (write *is* ok from a signal handler, but won't using *that* be fun...). Tim -- Tim Mooney [EMAIL PROTECTED] Information Technology Services (701) 231-1076 (Voice) Room 242-J1, IACC Building (701) 231-8541 (Fax) North Dakota State University, Fargo, ND 58105-5164
Re: EPIPE
Austin Donnelly wrote: > > Current gimp (1.1.21) seems to have problems with recovering from any > plugin that dies. Things start going wrong when it takes a SIGPIPE > while trying to write(read?) to the pipe to the plugin which is dead. > Rather than ignoring SIGPIPE, and collecting an EPIPE from the io > operation and using this to trigger dead plugin cleanup operations, > gimp currently treats SIGPIPE just like any other signal: it's fatal. > Unfortunately, while attempting to print out some error message or > other, gimp causes a segfault. This might be due to non-reentrant > stdio libraries in use, I don't know. According to POSIX, the only > thing you're allowed to do is read or write variables of type > sigatomic_t. Calling libc funcitions (including printf()) sounds like > a recipe for disaster, especially with a non-reentrant libc. > > This needs some more thought, and I don't have much time right now to > look into any more. I'm pretty sure that plugins were correctly > cleaned up on their unexpected termination at some earlier stage. The > whole point of plugins being separate processes is that a plugin > should be unable to cause the main gimp app to crash: if they can then > this is a fairly critical bug that should be fixed. Hi Austin, Unfortunately this is not the reason why gimp dies on just any aborting child. Although I 100% agree that SIGPIPE being fatal is the wrong thing to do. I browsed CVS and Gimp is connecting SIGPIPE to on_signal() since the beginning of CVS time. Mysteriously, as you mention, it worked before. I'm also pretty sure that the current state of signal processing behaves exactly like before Garry started to fix the SIGCHLD bug. There is also an infinite loop using 30% of my cpu time in all cases where a dying plugin does _not_ kill gimp. This also happens if a plugin calls gimp_quit() (which is the recomended way of aborting a plugin and it definitely worked before, too). I suspect there are some strange things going on in plug_in.c like a forgotten plug_in_push() or the following (unrelated): --- /* plug_in_pop (); */ plug_in_params_destroy (proc_run.params, proc_run.nparams, FALSE); old_recurse = plug_in->recurse; plug_in->recurse = TRUE; /* gtk_main (); */ /* return_vals = plug_in_get_current_return_vals (proc_rec); */ --- which, toghether with libgimp/gimp.c: --- /* No longer a return message */ /* proc_return.name = proc_run->name; */ /* proc_return.nparams = nreturn_vals; */ /* proc_return.params = (GPParam*) return_vals; */ /* if (!gp_temp_proc_return_write (_writechannel, &proc_return)) */ /* gimp_quit (); */ --- makes it impossible to call temp procs across plugins. I tried to use temp procs in a naive way (i.e. how expected) and totally failed when hacking the help browser's temp proc. The caller just doesn't get a return message when calling a plugin's temp proc. At least this is my theory :) [ no conclusion ] starting gdb... --Mitch
Re: EPIPE
On Mon, May 08, 2000 at 02:16:06PM +0100, Austin Donnelly wrote: > cleaned up on their unexpected termination at some earlier stage. The > whole point of plugins being separate processes is that a plugin > should be unable to cause the main gimp app to crash: if they can then > this is a fairly critical bug that should be fixed. s/unable to cause/able to prevent/ We don't do nearly enough checks to prevent plug-ins from killing Gimp if they're badly written. A future version could, in theory, but it seems better to assume that plug-in authors have good intentions. However, minor accidents causing a crash in a plug-in should NOT kill Gimp as you said. This is our strength over most PC apps when it comes to integrating 3rd party plug-ins. This has been annoying me, but I presumed that whoever's been tinkering with the signals knew what they were doing...? Nick.
EPIPE
Current gimp (1.1.21) seems to have problems with recovering from any plugin that dies. Things start going wrong when it takes a SIGPIPE while trying to write(read?) to the pipe to the plugin which is dead. Rather than ignoring SIGPIPE, and collecting an EPIPE from the io operation and using this to trigger dead plugin cleanup operations, gimp currently treats SIGPIPE just like any other signal: it's fatal. Unfortunately, while attempting to print out some error message or other, gimp causes a segfault. This might be due to non-reentrant stdio libraries in use, I don't know. According to POSIX, the only thing you're allowed to do is read or write variables of type sigatomic_t. Calling libc funcitions (including printf()) sounds like a recipe for disaster, especially with a non-reentrant libc. This needs some more thought, and I don't have much time right now to look into any more. I'm pretty sure that plugins were correctly cleaned up on their unexpected termination at some earlier stage. The whole point of plugins being separate processes is that a plugin should be unable to cause the main gimp app to crash: if they can then this is a fairly critical bug that should be fixed. Austin