On Tue, Feb 15, 2011 at 2:57 AM, Dinh <pcd...@gmail.com> wrote:
> Hi,
>
> I currently build a process management system which is able to fork child
> processes (fork()) and keep them alive (waitpid() ).
>
>     if pid in self.current_workers:
>         os.waitpid(pid, 0)
>
> If a child process dies, it should trigger a SIGCHLD signal and a handler is
> installed to catch the signal and start a new child process. The code is
> nothing special, just can be seen in any Python tutorial you can find on the
> net.
>
>     signal.signal(signal.SIGCHLD, self.restart_child_process)
>     signal.signal(signal.SIGHUP, self.handle) # reload
>     signal.signal(signal.SIGINT, self.handle)
>     signal.signal(signal.SIGTERM, self.handle)
>     signal.signal(signal.SIGQUIT, self.handle)
>
> However, this code does not always work as expected. Most of the time, it
> works. When a child process exits, the master process receives a SIGCHLD and
> restart_child_process() method is invoked automatically to start a new child
> process. But the problem is that sometimes, I know a child process exits due
> to an unexpected exception (via log file) but it seems that master process
> does not know about it. No SIGCHLD and so restart_child_process() is not
> triggered. Therefore, no new child process is forked.
>
> Could you please kindly tell me why this happens? Is there any special code
> that need being installed to ensure that every dead child will be informed
> correctly?
>
> Mac OSX 10.6
> Python 2.6.6

Hi Dinh.

I've done no Mac OS X programming, but I've done a fair amount with Python and *ix signals, so I'm going to try to help you, but it'll be kind of stabbing in the dark.

*ix signals have historically been rather unreliable and troublesome when used heavily. There are BSD signals, SysV signals, and POSIX signals - they all try to solve the problems in different ways. Oh, and Linux has a way of doing signals using file descriptors (signalfd) that apparently helps quite a bit. I'm guessing your Mac will have BSD and maybe POSIX signals available, but you might check on that.

You might try using ktrace (or dtruss, if your OS X release no longer ships ktrace) to see if any SIGCHLD signals are getting lost (it definitely happens in some scenarios), and hopefully also to see which C-level signal API CPython is using on your Mac.

You might also make sure your SIGCHLD handler is not just calling waitpid once per invocation, but rather doing a nonblocking waitpid in a loop until no exited process is found. Standard signals are not queued: if several children exit before the handler runs, the pending SIGCHLDs can be merged into a single delivery, so one signal may stand for more than one dead child. The same thing can happen when a child exits while the handler is already running.

If the loop in your signal handler doesn't help (enough), you could also try doing a nonblocking waitpid in a SIGALRM handler, in addition to your SIGCHLD handler.

Some signal APIs want you to re-enable the signal as the first action in your signal handler, to shorten a race window. Hopefully Mac OS X doesn't need this, but you might check on it.

BTW, CPython signals and CPython threads don't play very nicely together; if you're combining them, you might want to study up on this.

Oh, also, signals in CPython will tend to cause system calls to return without completing, with EINTR in errno, and not all CPython modules will understand what to do with that. :( Sadly, many application programmers tend to ignore the EINTR possibility.

HTH
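
To make the waitpid-in-a-loop idea concrete, here is a rough sketch of what I mean. It's not your code and it's untested on a Mac: reap_children, install_handlers, restart_child and POLL_SECONDS are all names I made up for illustration, and restart_child stands in for whatever your restart_child_process() does. It drains every exited child per signal, retries on EINTR, and uses a periodic SIGALRM as a safety net in case a SIGCHLD goes missing.

```python
import errno
import os
import signal

POLL_SECONDS = 5  # hypothetical interval for the SIGALRM safety net


def reap_children(on_exit):
    """Collect every child that has already exited, without blocking.

    on_exit(pid, status) is a hypothetical callback, e.g. one that forks
    a replacement worker. Returns the list of pids that were reaped.
    """
    reaped = []
    while True:
        try:
            # -1 means "any child"; WNOHANG means "don't block".
            pid, status = os.waitpid(-1, os.WNOHANG)
        except OSError as exc:
            if exc.errno == errno.EINTR:
                continue  # interrupted by a signal: just try again
            if exc.errno == errno.ECHILD:
                break     # no children left at all
            raise
        if pid == 0:
            break         # children exist, but none has exited yet
        reaped.append(pid)
        on_exit(pid, status)
    return reaped


def install_handlers(restart_child):
    """Install SIGCHLD and SIGALRM handlers that both drain dead children."""
    def on_sigchld(signum, frame):
        # One SIGCHLD may stand for several exits, so drain them all.
        reap_children(restart_child)

    def on_sigalrm(signum, frame):
        # Safety net: catch any child whose SIGCHLD never arrived.
        reap_children(restart_child)
        signal.alarm(POLL_SECONDS)  # re-arm the timer

    signal.signal(signal.SIGCHLD, on_sigchld)
    signal.signal(signal.SIGALRM, on_sigalrm)
    signal.alarm(POLL_SECONDS)
```

You'd call install_handlers(self.restart_child_process) once in the master, with the callback changed to accept (pid, status) instead of (signum, frame). The explicit EINTR retry matters on a Python as old as 2.6; Python 3.5 and later retry interrupted system calls for you, so there the branch is harmless but normally never taken.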