Re: sigCHLD signal handling (Was: Re: pipes? threadWaitRead?)
In local.glasgow-haskell-bugs, you wrote:
> On 10 Jul 2002 at 22:21 CEST, Dean Herington wrote:
> > The first issue I confronted is that the get*ProcessStatus routines return an error rather than nothing if there is no candidate child process.
>
> Yes, `waitpid' might return with EINTR which will cause an exception (I just checked, it did). I'll try to devise a fix for PosixProcPrim.

That's ECHILD, not EINTR. The only nice way to figure this out might be keeping a global counter (in an MVar), incrementing it on each forkProcess and decrementing it for each awaited child. That way, you could safely loop on `waitpid' if you know there should still be a child around (and you'll get `Nothing' if it's not done yet).

Volker
--
http://news.bbc.co.uk: `Israeli forces [...], declaring curfews that confine more than 700,000 people to their homes.'
___
Glasgow-haskell-bugs mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs
Re: sigCHLD signal handling (Was: Re: pipes? threadWaitRead?)
Volker Stolz wrote:
> In local.glasgow-haskell-bugs, you wrote:
> > On 10 Jul 2002 at 22:21 CEST, Dean Herington wrote:
> > > The first issue I confronted is that the get*ProcessStatus routines return an error rather than nothing if there is no candidate child process.
> >
> > Yes, `waitpid' might return with EINTR which will cause an exception (I just checked, it did). I'll try to devise a fix for PosixProcPrim.
>
> That's ECHILD, not EINTR.

Yes, that's what I would expect from reading the man page. I was not complaining about the Unix semantics, just noting that there was a subtlety I wasn't previously aware of that made my code more complex.

> The only nice way to figure this out might be keeping a global counter (in an MVar), increasing it on forkProcess() and counting it down for each awaited child. That way, you could safely loop on `waitpid' if you know there should still be a child around (and you'll get `Nothing' if it's not done yet).

Yes, that's exactly the solution I came up with.

> Volker

Thanks.

Dean
RE: sigCHLD signal handling (Was: Re: pipes? threadWaitRead?)
> I had thought of having the signal handler reap as many terminated child processes as possible, but had been concerned about a possible race condition. After you suggested that approach, I thought some more and decided that no race problem should exist. So I've implemented multiple reaping and it does help. I no longer have any tests hang as before. (Note that I still do see the occasional "EVACUATED object entered!" error.) However, the implementation turned out to be surprisingly complex.

Can you manage to get a repeatable case of the 'EVACUATED object' error? I'd really like to track that one down.

> The first issue I confronted is that the get*ProcessStatus routines return an error rather than nothing if there is no candidate child process. (The GHC routines simply reflect the system call semantics.)

So can't you just catch the error? Something like

  handler = do
    r <- try (getAnyProcessStatus ...)
    case r of
      Left _  -> return ()
      Right _ -> handler

Cheers,
Simon
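A minimal sketch of the catch-the-error loop outlined above, assuming the `getAnyProcessStatus` exported by System.Posix.Process in the current `unix` package (the 2002 library may have differed in detail). The names `reapAll` and `tryIO` are hypothetical and not part of any library. Unlike the outline, this version also stops when the call returns `Nothing`, on the assumption that a non-blocking call returning `Nothing` means children exist but none has terminated yet.

```haskell
import Control.Concurrent (threadDelay)
import Control.Exception (IOException, try)
import System.Posix.Process (forkProcess, getAnyProcessStatus)

-- 'try' specialised to IOException, so no type annotations are needed below.
tryIO :: IO a -> IO (Either IOException a)
tryIO = try

-- | Reap every child that has already terminated, without blocking.
-- Returns how many children were reaped. (Hypothetical helper.)
reapAll :: IO Int
reapAll = go 0
  where
    go n = do
      -- First argument False: do not block.
      -- Second argument False: do not report stopped children.
      r <- tryIO (getAnyProcessStatus False False)
      case r of
        Left _         -> return n    -- e.g. ECHILD: no child processes at all
        Right Nothing  -> return n    -- children exist, none has finished yet
        Right (Just _) -> go (n + 1)  -- reaped one; look for more

main :: IO ()
main = do
  _ <- forkProcess (return ())
  _ <- forkProcess (return ())
  threadDelay 200000  -- crude: give the children time to exit
  n <- reapAll
  putStrLn ("reaped " ++ show n ++ " children")
```

Catching every IOException treats ECHILD and any other failure alike; a stricter version could inspect the error and rethrow anything unexpected.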
RE: sigCHLD signal handling (Was: Re: pipes? threadWaitRead?)
> The fine points of Unix signal semantics have always been somewhat mysterious to me. However, after digging around in man pages for a while, I have a theory as to what's going wrong...

Yes, your diagnosis looks very plausible. The right way, I believe, to handle this in your signal handler is to call getAnyProcessStatus repeatedly until it doesn't return any more children (not forgetting to use the non-blocking version, i.e. the first arg should be False).

Does that help?

Cheers,
Simon
Re: sigCHLD signal handling (Was: Re: pipes? threadWaitRead?)
Simon Marlow wrote:
> > The fine points of Unix signal semantics have always been somewhat mysterious to me. However, after digging around in man pages for a while, I have a theory as to what's going wrong...
>
> Yes, your diagnosis looks very plausible. The right way, I believe, to handle this in your signal handler is to call getAnyProcessStatus repeatedly until it doesn't return any more children (not forgetting to use the non-blocking version, ie. the first arg should be False).
>
> Does that help?

I had thought of having the signal handler reap as many terminated child processes as possible, but had been concerned about a possible race condition. After you suggested that approach, I thought some more and decided that no race problem should exist. So I've implemented multiple reaping and it does help. I no longer have any tests hang as before. (Note that I still do see the occasional "EVACUATED object entered!" error.)

However, the implementation turned out to be surprisingly complex. The first issue I confronted is that the get*ProcessStatus routines return an error rather than nothing if there is no candidate child process. (The GHC routines simply reflect the system call semantics.) This required me to maintain a count of child processes so I could avoid trying to reap nonexistent children. Fortunately, adding the counting to my monad was not difficult.

But having the signal handler avoid subsequent reaping was insufficient. I apparently had instances of the signal handler for which there were no children to reap. What I figure must be happening is something like the following.

  1. A sigCHLD signal comes in.
  2. A signal handler instance is created. But before it is run, sigCHLD is unblocked.
  3. A second sigCHLD signal comes in.
  4. A second signal handler instance is created.
  5. One of the signal handler instances runs and reaps both terminated children.
  6. The second signal handler instance runs and finds nothing to reap.
I bullet-proofed my logic, so that the signal handler conditions even its first getAnyProcessStatus call on a nonzero child count. (By the way, I had to be careful to lock access to the child count appropriately.) The result now seems to work properly.

Two questions:

  1. Though I'm now immune to seeing too many sigCHLD signals, I still rely on seeing enough of them. Can you think of any way that a signal could go unnoticed in my scheme described above?

  2. Is my supposition true, that sigCHLD is unblocked *before* invoking the signal handler? If so, I think this subtle but important semantic difference between GHC RTS and POSIX signal handling should be documented. Are there other differences?

Dean
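The counting scheme described in this message might look roughly like the sketch below. It is an illustration under assumptions, not the code from the original program: `forkCounted`, `reapCounted` and `installReaper` are hypothetical names, and the calls used are those of the current `unix` package (System.Posix.Process and System.Posix.Signals). The child count lives in an MVar, which also serves as the lock; the fork happens while the MVar is held, so a handler instance can never observe a terminated child that has not been counted yet.

```haskell
import Control.Concurrent (threadDelay)
import Control.Concurrent.MVar (MVar, modifyMVar, modifyMVar_, newMVar, readMVar)
import Control.Monad (void)
import System.Posix.Process (forkProcess, getAnyProcessStatus)
import System.Posix.Signals (Handler (Catch), installHandler, sigCHLD)
import System.Posix.Types (ProcessID)

-- | Fork a child and count it. The MVar is held across the fork, so the
-- child action must not touch the counter itself. (Hypothetical helper.)
forkCounted :: MVar Int -> IO () -> IO ProcessID
forkCounted count child = modifyMVar count $ \n -> do
  pid <- forkProcess child
  return (n + 1, pid)

-- | Reap terminated children, but only while the count says some are
-- outstanding, so getAnyProcessStatus is never called with no children.
reapCounted :: MVar Int -> IO ()
reapCounted count = modifyMVar_ count go
  where
    go 0 = return 0                  -- nothing outstanding: skip the call
    go n = do
      r <- getAnyProcessStatus False False  -- non-blocking, ignore stopped
      case r of
        Nothing -> return n          -- children remain, none finished yet
        Just _  -> go (n - 1)        -- reaped one; try for another

-- | Install 'reapCounted' as the sigCHLD handler.
installReaper :: MVar Int -> IO ()
installReaper count =
  void (installHandler sigCHLD (Catch (reapCounted count)) Nothing)

main :: IO ()
main = do
  count <- newMVar 0
  installReaper count
  _ <- forkCounted count (return ())
  threadDelay 200000                 -- crude: wait for the child to exit
  readMVar count >>= print           -- outstanding children still counted
```

The count only stays accurate if every child is created through `forkCounted` and reaped through `reapCounted`; children waited for elsewhere would leave it too high.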
RE: sigCHLD signal handling (Was: Re: pipes? threadWaitRead?)
> After much study I have a new theory. It appears that the pipe machinery is working fine, but that sometimes my program fails to reap all of its terminated child processes. I'm using a `sigCHLD` signal handler that does `getAnyProcessStatus True False` each time it's invoked. It seems that one or more `sigCHLD` signals are occurring while an instance of the handler is already running (having presumably blocked `sigCHLD` during its execution), and hence the contemporaneous signals are getting lost. (Oh, and it seems this unfortunate behavior occurs for me under Linux but not Solaris.)
>
> How can I avoid losing `sigCHLD` signals? It seems that use of the `SA_NODEFER` flag on `sigaction` might do the trick, but that flag is not accessible via `installHandler`.

I can't see a reason why SIGCHLD signals might be lost, but the handler might be deferred in the way that Volker described if you have blocking C calls. Perhaps you could investigate with strace and see if the signal is actually being delivered?

Cheers,
Simon