This is what it is going to take to fix the race in the kernel.  Please
    test it out.  I'll commit it over the weekend.

        fetch http://apollo.backplane.com/DFlyMisc/fork01.patch

    The basic problem being solved here is that a signal sent to a process
    group can wind up not being propagated to a newly fork()ed child if it
    occurs just before or during the fork().  In order to fix the problem
    we have to do two things:

    (1) We have to interlock new signal delivery to a process group until
        the fork1() code can add the new child to the process group.  This
        is accomplished by adding a lockmgr lock to the pgrp structure.

    (2) fork1() cannot be allowed to run at all if there are pending signals,
        for example, from a previous process group signal delivery that
        completed but which has not yet been processed by the calling process.
        Those signals must be processed BEFORE we can fork a new child or
        the new child might miss a signal sent to the process group that
        would otherwise have killed the parent before the fork().  This
        case is handled by returning ERESTART if pending signals are
        detected.

        If the pending signal would cause the calling process to be killed,
        the processing of the signal then kills the calling process and
        the fork() is never restarted, hence no child is left dangling.

    I have included a test program.  With some playing around you should
    be able to see that children can be left alive and ticking after a ^C
    without the patch, and this hopefully will not occur after the patch.

    It takes a little playing around with the test program to reproduce
    the problem since it is somewhat dependent on the scheduler.

    It is also fairly easy to reproduce this by typing 'make' in a post-built
    kernel, e.g.:

        cd /usr/obj/usr/src/sys/SOMEKERNEL
        make
        ^C
        make
        ^C
        ... repeat ...  sometimes the build will appear to continue in the
        background after the ^C without the patch.

                                        -Matt
                                        Matthew Dillon 
                                        <[EMAIL PROTECTED]>


/*
 * TESTFORK.C
 *
 * ./testfork
 * (hit ^C)
 * .... try again
 *
 * If 'x' and/or 'X' is output after the ^C, the ^C missed a child process
 * during fork().
 */
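#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <stdlib.h>
#include <unistd.h>

int
main(int ac, char **av)
{
    int count = 0;      /* children forked but not yet reaped */
    pid_t pid;

    for (;;) {
        if ((pid = fork()) == 0) {
            /*
             * Child.  If we have been reparented to init, the ^C
             * killed the parent but missed us.
             */
            usleep(1000);
            if (getppid() == 1) {
                usleep(10000);
                write(1, "x", 1);
                sleep(1);
                write(1, "X", 1);
            }
            _exit(0);
        }
        if (pid > 0)    /* do not count a failed fork() */
            ++count;
        /* Reap children that have already exited, without blocking. */
        while (wait3(NULL, WNOHANG, NULL) > 0)
            --count;
        /* Throttle: block until no more than 75 are outstanding. */
        while (count > 75 && wait3(NULL, 0, NULL) > 0)
            --count;
    }
    exit(0);
}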
