This is what it is going to take to fix the race in the kernel. Please
test it out. I'll commit it over the weekend.
fetch http://apollo.backplane.com/DFlyMisc/fork01.patch
The basic problem being solved here is that a signal sent to a process
group can wind up not being propagated to a newly fork()ed child if it
arrives just before or during the fork(). In order to fix the problem
we have to do two things:

    (1) We have to interlock new signal delivery to a process group until
        the fork1() code can add the new child to the process group. This
        is accomplished by adding a lockmgr lock to the pgrp structure.

    (2) fork1() cannot be allowed to run at all if there are pending signals,
        for example, from a previous process group signal delivery that
        completed but which has not yet been processed by the calling process.
        Those signals must be processed BEFORE we can fork a new child or
        the new child might miss a signal sent to the process group that
        would otherwise have killed the parent before the fork(). This
        case is handled by returning ERESTART if pending signals are
        detected.

        If the pending signal would cause the calling process to be killed,
        the processing of the signal then kills the calling process and
        the fork() is never restarted, hence no child is left dangling.

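To make the two rules above concrete, here is a rough userspace model of
the idea. It is NOT code from the patch: model_pgrp, model_proc,
model_pgrp_init(), model_pgsignal(), model_fork() and MODEL_ERESTART are
names made up for illustration, and a pthread mutex merely stands in for
the lockmgr lock in the pgrp structure.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_MEMBERS 16
#define MODEL_ERESTART    (-1)  /* hypothetical stand-in for the kernel's ERESTART */

struct model_proc {
    bool pending_signal;        /* a signal was posted but not yet processed */
};

struct model_pgrp {
    pthread_mutex_t lock;       /* stands in for the lockmgr lock in pgrp */
    struct model_proc *members[MODEL_MAX_MEMBERS];
    int nmembers;
};

/* Set up a group containing only its leader. */
void
model_pgrp_init(struct model_pgrp *pg, struct model_proc *leader)
{
    assert(pg != NULL && leader != NULL);
    pthread_mutex_init(&pg->lock, NULL);
    pg->nmembers = 0;
    pg->members[pg->nmembers++] = leader;
}

/* Post a signal to every current member while holding the group lock. */
void
model_pgsignal(struct model_pgrp *pg)
{
    pthread_mutex_lock(&pg->lock);
    for (int i = 0; i < pg->nmembers; i++)
        pg->members[i]->pending_signal = true;
    pthread_mutex_unlock(&pg->lock);
}

/*
 * Model of the fork path.  The group lock is held from the pending-signal
 * check until the child is a member, so a group signal lands either
 * entirely before the check (rule 2 refuses the fork) or entirely after
 * the child has joined (the child receives it too).
 */
int
model_fork(struct model_pgrp *pg, struct model_proc *parent,
    struct model_proc *child)
{
    int error = 0;

    pthread_mutex_lock(&pg->lock);
    if (parent->pending_signal) {
        error = MODEL_ERESTART;         /* process signals first, then retry */
    } else if (pg->nmembers >= MODEL_MAX_MEMBERS) {
        error = EAGAIN;
    } else {
        child->pending_signal = false;
        pg->members[pg->nmembers++] = child;
    }
    pthread_mutex_unlock(&pg->lock);
    return error;
}
```

In this model a fork attempted after model_pgsignal() has run returns
MODEL_ERESTART and leaves the group unchanged, which corresponds to (2).
Holding the mutex across the membership update corresponds to (1).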
I have included a test program. Without the patch you should be able
to see children left alive and ticking after a ^C, and hopefully this
will not occur with the patch. It takes a little playing around with
the test program to reproduce the problem since it is somewhat
dependent on the scheduler.
It is also fairly easy to reproduce this by typing 'make' in a post-built
kernel, e.g.:

    cd /usr/obj/usr/src/sys/SOMEKERNEL
    make
    ^C
    make
    ^C
    ... repeat ...

Without the patch, the build will sometimes appear to continue in the
background after the ^C.

-Matt
Matthew Dillon
<[EMAIL PROTECTED]>
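
/*
 * TESTFORK.C
 *
 *      ./testfork
 *      (hit ^C)
 *      .... try again
 *
 * If 'x' and/or 'X' is output after the ^C, the ^C missed a child process
 * during fork().
 */
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <stdlib.h>
#include <unistd.h>

int
main(int ac, char **av)
{
    int count = 0;

    for (;;) {
        if (fork() == 0) {
            /* Child: give a ^C time to kill the parent. */
            usleep(1000);
            if (getppid() == 1) {
                /* Reparented to init: the parent died but we did not. */
                usleep(10000);
                write(1, "x", 1);
                sleep(1);
                write(1, "X", 1);
            }
            _exit(0);
        }
        ++count;
        /* Reap any children that have already exited. */
        while (wait3(NULL, WNOHANG, NULL) > 0)
            --count;
        /* Throttle: block while more than 75 children are outstanding. */
        while (count > 75 && wait3(NULL, 0, NULL) > 0)
            --count;
    }
    exit(0);
}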