On 15-04-12 07:14 AM, Ole Tange wrote:

You mean in sub Job::kill():

             # Wait up to 200 ms between TERMs - but only if any pids
are alive
             my $sleep = 1;
             for (my $sleepsum = 0; kill 0, $family_pids[0] and $sleepsum < 200;
                  $sleepsum += $sleep) {
                 $sleep = ::reap_usleep($sleep);
             }

'kill 0, pid' returns true if the process is still running.
$family_pids[0] is the immediate child (i.e. the parent of any
(grand)*children)).
There is no need to see if any (grand)*children are running: it is the
job of $family_pids[0] to kill those.

I agree: it is the job of the parent process to kill its child processes. But this does not match what I see in the code. Here is the kill loop:

    for my $pid (@family_pids) {
        if(kill 0, $pid) {
        # The job still running
        kill $signal, $pid;
        $alive = 1;
        }
    }

The kill loop sends the kill signal to all elements of the family_pids array. So IMO the code and the intent have to match. In fact, I ran into this confusion when I wrote the test (line 19 of this file):
https://github.com/martinda/gnu-parallel/blob/sigterm-1/testsuite/tests-to-run/parallel-local-signals.sh#L19

I propose two solutions:
- Change the code to only kill the parent process (existing users will have to change their application code to kill children processes themselves) - Do not change the default behaviour, but add an option so that only the parent job is killed (existing users see no change)

What solution is preferable?

Martin

Reply via email to