Re: sigCHLD signal handling (Was: Re: pipes? threadWaitRead?)

2002-07-15 Thread Volker Stolz

In local.glasgow-haskell-bugs, you wrote:
 On 10 Jul 2002 at 22:21 CEST, Dean Herington wrote:
 The first issue I confronted is that the get*ProcessStatus routines return
 an error rather than nothing if there is no candidate child process.
 
 Yes, `waitpid' might return with EINTR which will cause an exception
 (I just checked, it did). I'll try to devise a fix for PosixProcPrim.

That's ECHILD, not EINTR.
The only nice way to figure this out might be keeping a global counter
(in an MVar), increasing it on forkProcess() and counting it down for
each awaited child. That way, you could safely loop on `waitpid' if
you know there should still be a child around (and you'll get `Nothing'
if it's not done yet).
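A minimal Haskell sketch of the counter idea above. It assumes the modern `unix` package (`System.Posix.Process`) rather than the 2002 `Posix` library. `forkCounted` and `reapCounted` are hypothetical names. It also assumes every child of the program is created through `forkCounted`, because `getAnyProcessStatus` reaps *any* child. Holding the `MVar` across `forkProcess` means a concurrent reaper cannot observe the new child before it has been counted.

```haskell
import Control.Concurrent.MVar (MVar, modifyMVar, newMVar, readMVar)
import System.Posix.Process (ProcessStatus, forkProcess, getAnyProcessStatus)
import System.Posix.Types (ProcessID)

-- Hypothetical helper: fork a child and count it.  The MVar is held
-- across forkProcess, so a concurrent reaper blocks until the new
-- child has been counted.
forkCounted :: MVar Int -> IO () -> IO ProcessID
forkCounted children action =
  modifyMVar children $ \n -> do
    pid <- forkProcess action
    return (n + 1, pid)

-- Hypothetical helper: reap terminated children without blocking, but
-- only while the count says some are outstanding, so waitpid is never
-- called when there are no children (which would fail with ECHILD).
reapCounted :: MVar Int -> IO [(ProcessID, ProcessStatus)]
reapCounted children = modifyMVar children loop
  where
    loop :: Int -> IO (Int, [(ProcessID, ProcessStatus)])
    loop 0 = return (0, [])
    loop n = do
      r <- getAnyProcessStatus False False  -- non-blocking; ignore stopped children
      case r of
        Nothing -> return (n, [])           -- children exist, none has exited yet
        Just st -> do
          (n', rest) <- loop (n - 1)
          return (n', st : rest)
```

Usage: create the counter once with `newMVar 0`, and call `reapCounted` from the `sigCHLD` handler. A surplus handler invocation that finds the count at zero returns without touching `waitpid` at all.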

Volker

___
Glasgow-haskell-bugs mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs



Re: sigCHLD signal handling (Was: Re: pipes? threadWaitRead?)

2002-07-15 Thread Dean Herington

Volker Stolz wrote:

 In local.glasgow-haskell-bugs, you wrote:
  On 10 Jul 2002 at 22:21 CEST, Dean Herington wrote:
  The first issue I confronted is that the get*ProcessStatus routines return
  an error rather than nothing if there is no candidate child process.
 
  Yes, `waitpid' might return with EINTR which will cause an exception
  (I just checked, it did). I'll try to devise a fix for PosixProcPrim.

 That's ECHILD, not EINTR.

Yes, that's what I would expect from reading the man page.

I was not complaining about the Unix semantics, just noting that there was a
subtlety I wasn't previously aware of that made my code more complex.

 The only nice way to figure this out might be keeping a global counter
 (in an MVar), increasing it on forkProcess() and counting it down for
 each awaited child. That way, you could safely loop on `waitpid' if
 you know there should still be a child around (and you'll get `Nothing'
 if it's not done yet).

Yes, that's exactly the solution I came up with.

 Volker

Thanks.

Dean




RE: sigCHLD signal handling (Was: Re: pipes? threadWaitRead?)

2002-07-11 Thread Simon Marlow


 I had thought of having the signal handler reap as many terminated child
 processes as possible, but had been concerned about a possible race
 condition.  After you suggested that approach, I thought some more and
 decided that no race problem should exist.  So I've implemented multiple
 reaping and it does help.  I no longer have any tests hang as before.
 (Note that I still do see the occasional EVACUATED object entered!
 error.)  However, the implementation turned out to be surprisingly complex.

Can you manage to get a repeatable case of the 'EVACUATED object' error?
I'd really like to track that one down.

 The first issue I confronted is that the get*ProcessStatus routines return
 an error rather than nothing if there is no candidate child process.
 (The GHC routines simply reflect the system call semantics.)

So can't you just catch the error?  Something like

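handler = do
  r <- try (getAnyProcessStatus ...)
  case r of
    Left  _        -> return ()   -- error (e.g. ECHILD): no children left
    Right Nothing  -> return ()   -- children remain, but none has exited yet
    Right (Just _) -> handler     -- reaped one child; look for another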
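A self-contained version of the loop above. It assumes the modern `unix` and `base` packages and passes `False False` (non-blocking, ignore stopped children) to `getAnyProcessStatus`. Rather than swallowing every error, it treats only `ECHILD` as "no children left", by inspecting the `ioe_errno` field from `GHC.IO.Exception`. The names `reapAll` and `isNoChildError` are illustrative.

```haskell
import Control.Exception (IOException, throwIO, try)
import Foreign.C.Error (Errno (..), eCHILD)
import GHC.IO.Exception (ioe_errno)
import System.Posix.Process (ProcessStatus, getAnyProcessStatus)
import System.Posix.Types (ProcessID)

-- True when the exception came from waitpid failing with ECHILD,
-- i.e. the process has no children left to wait for.
isNoChildError :: IOException -> Bool
isNoChildError e = ioe_errno e == Just code
  where
    Errno code = eCHILD

-- Reap every child that has already terminated, without blocking.
reapAll :: IO [(ProcessID, ProcessStatus)]
reapAll = do
  r <- try (getAnyProcessStatus False False)
  case r of
    Left e
      | isNoChildError e -> return []     -- ECHILD: no children at all
      | otherwise        -> throwIO e     -- anything else is a real error
    Right Nothing        -> return []     -- children exist, none has exited yet
    Right (Just st)      -> fmap (st :) reapAll
```

Distinguishing `ECHILD` keeps unrelated failures (for example, a handler accidentally installed twice in a way that breaks `waitpid`) from being silently ignored.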

Cheers,
Simon




RE: sigCHLD signal handling (Was: Re: pipes? threadWaitRead?)

2002-07-10 Thread Simon Marlow

 The fine points of Unix signal semantics have always been somewhat
 mysterious to me.  However, after digging around in man pages for a while,
 I have a theory as to what's going wrong...

Yes, your diagnosis looks very plausible.

The right way, I believe, to handle this in your signal handler is to
call getAnyProcessStatus repeatedly until it doesn't return any more
children (not forgetting to use the non-blocking version, i.e. the first
arg should be False).  Does that help?
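One way to wire this advice up, sketched against the modern `unix` package. `installHandler sigCHLD (Catch ...) Nothing` runs the reaper on each `SIGCHLD`. The reaper drains every terminated child rather than just one, because a single delivery may stand for several exits. `reapPending`, `installReaper` and `ReapLog` are hypothetical names. Stopping on any `IOException` is a simplification, which also covers the `ECHILD` case once no children remain.

```haskell
import Control.Concurrent.MVar (MVar, modifyMVar_, newMVar, readMVar)
import Control.Exception (IOException, try)
import System.Posix.Process (ProcessStatus, getAnyProcessStatus)
import System.Posix.Signals (Handler (Catch), installHandler, sigCHLD)
import System.Posix.Types (ProcessID)

-- Hypothetical log of every status the handler has collected.
type ReapLog = MVar [(ProcessID, ProcessStatus)]

-- Drain every child that has already terminated.  The first argument to
-- getAnyProcessStatus is False, so the call never blocks the handler.
reapPending :: ReapLog -> IO ()
reapPending logVar = do
  r <- try (getAnyProcessStatus False False)
         :: IO (Either IOException (Maybe (ProcessID, ProcessStatus)))
  case r of
    Left _          -> return ()  -- e.g. ECHILD once no children remain
    Right Nothing   -> return ()  -- children exist, none has exited yet
    Right (Just st) -> do
      modifyMVar_ logVar (return . (st :))
      reapPending logVar          -- one signal may cover several exits

-- Run reapPending on every SIGCHLD; the previous handler is discarded.
installReaper :: ReapLog -> IO ()
installReaper logVar = do
  _ <- installHandler sigCHLD (Catch (reapPending logVar)) Nothing
  return ()
```

Usage: `logVar <- newMVar []`, then `installReaper logVar` before forking any children, and read collected statuses with `readMVar logVar`.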

Cheers,
Simon



Re: sigCHLD signal handling (Was: Re: pipes? threadWaitRead?)

2002-07-10 Thread Dean Herington

Simon Marlow wrote:

  The fine points of Unix signal semantics have always been somewhat
  mysterious to me.  However, after digging around in man pages for a while,
  I have a theory as to what's going wrong...

 Yes, your diagnosis looks very plausible.

 The right way, I believe, to handle this in your signal handler is to
 call getAnyProcessStatus repeatedly until it doesn't return any more
 children (not forgetting to use the non-blocking version, i.e. the first
 arg should be False).  Does that help?

I had thought of having the signal handler reap as many terminated child
processes as possible, but had been concerned about a possible race
condition.  After you suggested that approach, I thought some more and
decided that no race problem should exist.  So I've implemented multiple
reaping and it does help.  I no longer have any tests hang as before.
(Note that I still do see the occasional EVACUATED object entered!
error.)  However, the implementation turned out to be surprisingly complex.

The first issue I confronted is that the get*ProcessStatus routines return
an error rather than nothing if there is no candidate child process.
(The GHC routines simply reflect the system call semantics.)  This required
me to maintain a count of child processes so I could avoid trying to reap
nonexistent children.  Fortunately, adding the counting to my monad was not
difficult.

But having the signal handler avoid subsequent reaping was insufficient.  I
apparently had instances of the signal handler for which there were no
children to reap.  What I figure must be happening is something like the
following.  A sigCHLD signal comes in.  A signal handler instance is
created.  But before it is run, sigCHLD is unblocked.  A second sigCHLD
signal comes in.  A second signal handler instance is created.  One of the
signal handler instances runs and reaps both terminated children.  The
second signal handler instance runs and finds nothing to reap.

I bullet-proofed my logic, so that the signal handler conditions even its
first getAnyProcessStatus call on a nonzero child count.  (By the way, I
had to be careful to lock access to the child count appropriately.)  The
result now seems to work properly.

Two questions:

1. Though I'm now immune to seeing too many sigCHLD signals, I still rely
on seeing enough of them.  Can you think of any way that a signal could
go unnoticed in my scheme described above?

2. Is my supposition true, that sigCHLD is unblocked *before* invoking the
signal handler?  If so, I think this subtle but important semantic
difference between GHC RTS and POSIX signal handling should be documented.
Are there other differences?

Dean




RE: sigCHLD signal handling (Was: Re: pipes? threadWaitRead?)

2002-07-09 Thread Simon Marlow

 After much study I have a new theory.  It appears that the pipe machinery
 is working fine, but that sometimes my program fails to reap all of its
 terminated child processes.  I'm using a `sigCHLD` signal handler that
 does `getAnyProcessStatus True False` each time it's invoked.  It seems
 that one or more `sigCHLD` signals are occurring while an instance of the
 handler is already running (having presumably blocked `sigCHLD` during its
 execution), and hence the contemporaneous signals are getting lost.
 (Oh, and it seems this unfortunate behavior occurs for me under Linux but
 not Solaris.)
 
 How can I avoid losing `sigCHLD` signals?
 
 It seems that use of the `SA_NODEFER` flag on `sigaction` might do the
 trick, but that flag is not accessible via `installHandler`.

I can't see a reason why SIGCHLD signals might be lost, but the handler
might be deferred in the way that Volker described if you have blocking
C calls.

Perhaps you could investigate with strace and see if the signal is
actually being delivered?

Cheers,
Simon