On 09/02/2014 05:08 AM, Adi Roiban wrote:
Hi,

While using spawnProcess on Linux I found out that when an invalid
executable is called there is a corner case in which a zombie process
is left until the main process exits, and it cannot be reaped.

I wrote a test for this, but I was not able to reproduce this error in
isolation, even if I run the test 10000 times. reapProcess will
always succeed on the first call.

For the production code I can always reproduce the problem.

Inspecting the execution thread I found out that all pipes are closed
but the spawned process has not exited yet. Because of this,
Process.maybeCallProcessEnded() will call self.reapProcess().

In my case, os.waitpid(pid, os.WNOHANG) returns 0, and
self.reapProcess() just ignores this case.
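
For reference, here is a minimal stdlib-only sketch of the waitpid behaviour described above, with no Twisted involved. With os.WNOHANG, waitpid returns (0, 0) while the child still exists but has not changed state, and only returns the child's pid once it has actually exited. The names reap_with_retry and demo are made up for illustration, and the sleep-based polling only stands in for "try again later"; it is not how a reactor would do it.

```python
import os
import subprocess
import sys
import time


def reap_with_retry(pid, interval=0.05, attempts=200):
    """Poll waitpid(WNOHANG) until the child has been reaped.

    Returns (number_of_zero_results, raw_wait_status).
    """
    zero_results = 0
    for _ in range(attempts):
        reaped_pid, status = os.waitpid(pid, os.WNOHANG)
        if reaped_pid == 0:
            # Child exists but has not exited yet: nothing to reap.
            zero_results += 1
            time.sleep(interval)
            continue
        return zero_results, status
    raise RuntimeError("child %d did not exit in time" % pid)


def demo():
    # Spawn a child that stays alive briefly before exiting cleanly.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(0.3)"])
    # The child is still running, so this does not reap anything.
    first = os.waitpid(child.pid, os.WNOHANG)
    zero_results, status = reap_with_retry(child.pid)
    return first, zero_results, status
```

Calling demo() should show the first waitpid call coming back with pid 0, which is the case that reapProcess ignores.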

We encountered this problem in our code too. We worked around it with the following code, which basically monkey-patches Twisted to "try again later" when waitpid returns 0. (Most of the code below is just copied from _BaseProcess; the important part is the "elif pid == 0" branch.)

-----

"""Workarounds for problems with Twisted."""

import errno
import os

from twisted.python import log
from twisted.internet.process import (
    _BaseProcess,
    reapAllProcesses,
    unregisterReapProcessHandler
)

def workaround_reapProcess(reactor):
    """Install a workaround for unsticking reapProcess.

    Sometimes a child process takes so long to die that reapProcess
    doesn't catch it in time.  As a hack, we add a timeout to the
    reactor to try again later.
    """

    def reapProcess(self):
        """
        Try to reap a process (without blocking) via waitpid.

        This is called when SIGCHLD is caught or a Process object loses its
        "connection" (stdout is closed).  This ought to result in reaping all
        zombie processes, since it will be called twice as often as it needs
        to be.

        (Unfortunately, this is a slightly experimental approach, since
        UNIX has no way to be really sure that your process is going to
        go away w/o blocking.  I don't want to block.)
        """
        try:
            try:
                pid, status = os.waitpid(self.pid, os.WNOHANG)
            except OSError as e:
                if e.errno == errno.ECHILD:
                    # no child process
                    pid = None
                else:
                    raise
        except:
            log.msg('Failed to reap %d:' % self.pid)
            log.err()
            pid = None
            status = None
        if pid:
            self.processEnded(status)
            unregisterReapProcessHandler(pid, self)
        elif pid == 0:
            # Twisted seems to get stuck if pid is 0, which means that
            # the child process hasn't changed status, but if called
            # after SIGCHLD probably means that the child process is
            # in the process of dying, but hasn't quite died yet.
            # We'll try to kick the reactor to reap the processes
            # again in a bit.
            #
            # We're testing specifically against 0 because pid may
            # also be None in an error case.
            def unstick():
                reapAllProcesses()
            reactor.callLater(1, unstick)

    _BaseProcess.reapProcess = reapProcess

----

To use this, import your reactor and then call workaround_reapProcess(reactor).
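
One caveat, as far as I can tell: reapAllProcesses() calls reapProcess() on every registered process, so with several live children each retry can schedule several more. Below is a minimal sketch of collapsing those into a single pending retry. RetryCoalescer and FakeReactor are hypothetical names for illustration, not Twisted APIs; call_later stands in for reactor.callLater and action for reapAllProcesses.

```python
class RetryCoalescer(object):
    """Keep at most one pending "try again later" call at a time."""

    def __init__(self, call_later, action, delay=1):
        self._call_later = call_later  # e.g. reactor.callLater
        self._action = action          # e.g. reapAllProcesses
        self._delay = delay
        self._pending = False

    def request(self):
        """Schedule the action unless a retry is already pending."""
        if self._pending:
            return False
        self._pending = True
        self._call_later(self._delay, self._fire)
        return True

    def _fire(self):
        # Clear the flag first so the action may request another retry.
        self._pending = False
        self._action()


class FakeReactor(object):
    """Hypothetical stand-in that records calls instead of using timers."""

    def __init__(self):
        self.calls = []

    def callLater(self, delay, func):
        self.calls.append((delay, func))

    def advance(self):
        """Run everything that was scheduled before this call."""
        pending, self.calls = self.calls, []
        for _delay, func in pending:
            func()
```

With something like this, the "elif pid == 0" branch would call coalescer.request() instead of reactor.callLater(1, unstick). We have not run this variant in production, so treat it as an untested idea.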

Now that two of us have seen the same problem, we should probably file a ticket in the bug tracker.
    --Justin

_______________________________________________
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
