Re: Alternate termination sequence option --term-seq

Rasmus Villemoes Thu, 30 Apr 2015 03:48:08 -0700

On Wed, Apr 29 2015, Ole Tange <[email protected]> wrote:

> On Wed, Apr 29, 2015 at 2:07 PM, Rasmus Villemoes <[email protected]> 
> wrote:
>> On Wed, Apr 29 2015, Ole Tange <[email protected]> wrote:
>>
>>> This still has the risk of killing an innocent PID and its children.
>>
>> Killing (in the sense of sending any signal whatsoever) an
>> innocent/unrelated PID is completely unacceptable, IMO. On a reasonably
>> busy system, PID reuse within 10 seconds is far from unlikely.
>
> On my system this gives PID reuse after 3.1 secs, but that is a very
> extreme case, and I will accept if GNU Parallel deals wrongly with that case:
>
> perl -e 'while(1) { $a=(fork()|| exit); if(not $a %1000) {print "$a\n";}  } '
>

It can happen a lot faster if the system has a lot of running processes
(or not running; 20000 processes doing sleep(100000) has the same
effect). But this is besides the point: I do not want GNU Parallel to
accidentally kill some other process; there's no limit to what kind of
corruption that might cause, especially if the signal is SIGKILL.

>> Mapping the tree even before signalling the immediate children is not
>> enough; some of the grand^nchildren may vanish in the meantime and
>> their PIDs reused before one can use the gathered information.
>
> I doubt that is true in practice. Mapping takes less than 100 ms, so I
> would find it very unlikely that the PID will be reused that fast. I
> understand that this could in theory happen, but I would like to see
> this demonstrated before I consider this a real problem.

I'm a mathematician and a strong believer in Murphy's law. I really
prefer _not_ seeing this happen :-)

> Since GNU Parallel will be sleeping (and not doing anything else) we
> could simply kill 0 all the (grand*)children every second and compute
> the family tree of the current children. If the child dies, remove the
> child from the list to be killed later.
>
> @children=familiy_tree(@job_pids);
> for $signal (@the_signals) {
>   kill $signal, @job_pids;
>   $sleep_time = shift @sleep_times;
>   $time_slept = 0;
>   while($time_slept < $sleep_time and @children) {
>     @children = family_tree(grep { kill( 0, $_) } @children);
>     sleep $a_while;
>     $time_slept += $a_while;
>   }
> }
> kill KILL, @children;
>
> Rasmus: Can you find a situation in which the above will fail?

Yes, but the argument will just be the same: Any pid in @children which
is not a direct child of yours must be considered out-of-bounds. Between
the time the pid is obtained and SIGKILL is sent, that process may have
exited and the pid reused. In fact, the above is worse, since any random
process can sneak into the @children array at some point, causing all
its descendants to be picked up by the family_tree, and then we have the
potential for a real bloodbath.

>> I think the only way to do this right is for GNU Parallel to make each
>> immediate child a process group leader (setpgrp 0,0 immediately after
>> fork).
>
> GNU Parallel uses open3 to spawn children. According to strace -ff
> that does not do a setpgrp.

One can call setpgid on behalf of another process. So GNU Parallel can
put the child created by open3 in its own process group by doing
setpgid(child, 0) (equivalently setpgid(child, child)). In Perl, it
seems that the wrapper for the setpgid system call is setpgrp, but the
semantics should be the same.

On Thu, Apr 30 2015, Martin d'Anjou <[email protected]> wrote:

> To me the "right thing" is to give a chance for well-written programs,
> programs that obey all the rules about cleaning up their child
> processes, a chance to execute without causing in them spurious errors
> like "kill (...) - No such process". Send the signals to the immediate
> child first, then after a delay, if that has failed, do the process
> group kill loop. To those who care, it is a huge improvement, and to
> those who do not care, they won't care one way or the other.

On Wed, Apr 29 2015, Ole Tange <[email protected]> wrote:

> Yes. GNU Parallel should do the right thing in most cases and not
> cause a problem in the rest.

OK, so here's a concrete proposal, which shouldn't cause random
processes to get killed:

- Put every child in its own process group, per the above.

- When it's time to begin the termination sequence, send the first
  signal to each immediate child.

Now the children may respond by (hopefully cleaning up and)
exiting. _Before_ reaping the child, we can send a signal to its entire
process group; if the child was well-behaved, there are no other
processes in its process group, so this is a noop, otherwise this is
precisely when we want to terminate the grandchildren etc. Of course,
the SIGCHLD doesn't tell us which child has died, but we can reliably
check each of our children for being in the zombie state before calling
wait (and then only wait for children which we know are gone).

I'm not sure what the appropriate signal is to send to the grandchildren
- SIGKILL is rather harsh. It may be that the child wasn't well-behaved
and just died upon getting the signal without propagating it further
down, but that some grandchild actually has some state it could and
should clean up. Maybe the right thing to do is send all the signals in
the users chosen termination sequence in reasonably rapid succession,
but still giving the grandchildren some time to respond to them.

In the common case, when there are no grandchildren (often simply
because the child never spawned any), or when the grandchildren actually
respond to the first signal in the sequence, we'd also like to avoid a
bunch of useless sleeps. As an optimization, we could scan the process
tree for processes belonging to the process group of the zombie child,
and stop when only the child is found. Note that I still do not propose
sending signals specifically to any pids found - any signals are still
sent to the process group as a whole, which exists for as long as we
keep the child in the zombie state. The scanning also automatically
introduces a little delay, which may be enough for the grandchildren to
do their cleanup.

After we've sent the last signal to the process group, there's nothing
more we can do, so we can reap the child and forget about it.

Now, this can be repeated for each given signal, with the given delays
inbetween, stopping of course if/when all children are gone.

This became a lot longer than I intended - I hope the idea is obvious
enough.

Rasmus

Re: Alternate termination sequence option --term-seq

Reply via email to