> On March 20, 2018, 5:53 a.m., Alexander Rukletsov wrote:
> > A high level question I have is the semantics of a task kill. You retry
> > `SIGKILL` but not `SIGTERM`. Why is it so? Do you expect frameworks to
> > retry kills at some point in time (which should be less than the kill
> > policy timeout to make sense) if they do not receive a terminal status
> > update? Since the escalation timer is started anyway, is it correct to
> > transition the container back to `killing=false`? This can also lead to
> > multiple `SIGKILL` requests going to the agent for the same container,
> > which is likely not a big deal, but maybe worth checking?
> > 
> > My suggestion would be to retry everything ourselves and not expect
> > frameworks to do this for us. When we faced a similar problem with the
> > docker executor, we decided that once the task has been transitioned to
> > `killed`, there is no way back, even if the kill attempt failed.
Yes, I expect schedulers to retry kills at some point. Some schedulers, e.g., Marathon, already do this. A problem with this approach is that Mesos doesn't send any kind of signal when a kill fails, so probably the easiest way for a scheduler to detect a kill failure is to check whether a task is still in `TASK_KILLING` after the kill grace period has elapsed. This means that the retries will probably happen only after the kill grace period has elapsed.

Given that the escalation is not cancelled when a kill call fails, it is possible for the task to get a `SIGKILL` but no `SIGTERM`. We could fix this by scheduling the escalation only once the `SIGTERM` kill call succeeds.

We definitely want the default executor to retry kills that it initiates (see [MESOS-8532](https://issues.apache.org/jira/browse/MESOS-8532)). I am not sure whether the default executor should also retry kills initiated by a scheduler. It might be better to let it somehow signal the scheduler that the kill request failed, and then let the scheduler decide when to retry the kill, also giving it the chance to adjust the kill policy.


> On March 20, 2018, 5:53 a.m., Alexander Rukletsov wrote:
> > src/launcher/default_executor.cpp
> > Lines 1242-1243 (original)
> > <https://reviews.apache.org/r/65695/diff/3/?file=1978910#file1978910line1242>
> >
> > Is this change related to your chain? Or does this fix a different
> > issue with the broken invariant?

This method can be called by `taskHealthUpdated()` when the executor is not subscribed, so it is wrong to assume that the executor will always be subscribed. Furthermore, the method doesn't require the executor to be subscribed, because it triggers a `KILL_NESTED_CONTAINER` call, which is part of the agent API, not of the executor API.
After having talked to Anand, I think that when this CHECK was added, the following was assumed:

1) The executor will be unsubscribed only for brief periods of time, mostly during agent recovery, when `KILL_NESTED_CONTAINER` calls will most likely fail. Crashing the executor leads to the destruction of all the child containers, so it was a way of working around `KILL_NESTED_CONTAINER` failures.

2) At that time the default executor supported launching only one task group. Under this assumption, killing a task would eventually trigger the destruction of all the child containers, which also happens (less gracefully) when the executor crashes. This made it acceptable for this CHECK to fail.

If the executor detects kill failures and allows the scheduler to retry, or retries by itself, we don't need this CHECK as a workaround, which addresses #1.

The default executor now supports running multiple task groups, which makes assumption #2 invalid and dangerous: if this CHECK fails, all the containers started by the executor will be killed, regardless of which task group they belong to. This is bad and could lead to data loss.

This patch addresses all of this by removing the CHECK, making the executor detect kill failures, and allowing schedulers to retry kills.

- Gaston


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65695/#review199540
-----------------------------------------------------------


On March 20, 2018, 1:50 p.m., Gaston Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65695/
> -----------------------------------------------------------
> 
> (Updated March 20, 2018, 1:50 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Joseph Wu, Qian Zhang, and
> Vinod Kone.
> 
> 
> Bugs: MESOS-8530
>     https://issues.apache.org/jira/browse/MESOS-8530
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The default executor transitions a task to `TASK_KILLING` and marks its
> child container as being killed before posting a `KILL` call to the
> agent.
> 
> The executor ignores kill requests for containers that are marked as
> being killed, and it doesn't remove this mark if the `KILL` call fails.
> This means that it's possible for tasks to get stuck in a `TASK_KILLING`
> state.
> 
> This patch makes the default executor remove the killing mark if a
> `KILL` call fails. That way a scheduler can retry a kill.
> 
> 
> Diffs
> -----
> 
>   src/launcher/default_executor.cpp 906836f3b8e0af79d7c61f90fd8a95f193b26e84 
> 
> 
> Diff: https://reviews.apache.org/r/65695/diff/4/
> 
> 
> Testing
> -------
> 
> `sudo bin/mesos-tests.sh` on GNU/Linux
> 
> 
> Thanks,
> 
> Gaston Kleiman
> 
>
