> On March 20, 2018, 5:53 a.m., Alexander Rukletsov wrote:
> > A high level question I have is the semantics of a task kill. You retry
> > `SIGKILL` but not `SIGTERM`. Why is it so? Do you expect frameworks to
> > retry kills at some point in time (which should be less than the kill
> > policy timeout to make sense) if they do not receive a terminal status
> > update? Since the escalation timer is started anyway, is it correct to
> > transition the container back to `killing=false`? This can also lead to
> > multiple `SIGKILL` requests going to the agent for the same container,
> > which is likely not a big deal, but maybe worth checking?
> > 
> > My suggestion would be to retry everything ourselves and not expect
> > frameworks to do this for us. When we faced a similar problem with the
> > docker executor, we decided that once the task has been transitioned to
> > `killed`, there is no way back, even if the kill attempt failed.
Yes, I expect schedulers to retry kills at some point. Some schedulers, e.g., Marathon, already do this. A problem with this approach is that Mesos doesn't send any kind of signal when a kill fails, so probably the easiest way for a scheduler to detect a kill failure is to check whether a task is still in `TASK_KILLING` after the kill grace period has elapsed. This means that the retries will probably happen only after the kill grace period has elapsed.

Given that the escalation is not cancelled when a kill call fails, it is possible for the task to get a `SIGKILL` but no `SIGTERM`. We could fix this by scheduling the escalation only once the `SIGTERM` kill call succeeds.

We definitely want the default executor to retry kills that it initiates (see [MESOS-8532](https://issues.apache.org/jira/browse/MESOS-8532)). I am not sure whether the default executor should also retry kills initiated by a scheduler. It might be better to let it somehow signal the scheduler that the kill request failed, and then let the scheduler decide when to retry the kill, also giving it the chance to adjust the kill policy.


> On March 20, 2018, 5:53 a.m., Alexander Rukletsov wrote:
> > src/launcher/default_executor.cpp
> > Lines 1242-1243 (original)
> > <https://reviews.apache.org/r/65695/diff/3/?file=1978910#file1978910line1242>
> >
> > Is this change related to your chain? Or does this fix a different
> > issue with the broken invariant?

This method can be called by `taskHealthUpdated()` when the executor is not subscribed, so it is wrong to assume that the executor will always be subscribed. Furthermore, the method doesn't require the executor to be subscribed, because it triggers a `KILL_NESTED_CONTAINER` call, which is part of the agent API, not of the executor API.
After having talked to Anand, I think that when this CHECK was added, the following was assumed:

1) The executor will be unsubscribed only for brief periods of time, mostly during agent recovery, when `KILL_NESTED_CONTAINER` calls will most likely fail. Crashing the executor leads to the destruction of all the child containers, so it was a way of working around `KILL_NESTED_CONTAINER` failures.

2) At that time the default executor supported launching only one task group. Under this assumption, killing a task would eventually trigger the destruction of all the child containers, which also happens (less gracefully) when the executor crashes. This made it acceptable for this CHECK to fail.

If the executor detects kill failures and allows the scheduler to retry, or retries by itself, we don't need this CHECK as a workaround, which addresses #1.

The default executor now supports running multiple task groups, which makes assumption #2 invalid and dangerous: if this CHECK fails, all the containers started by the executor will be killed, regardless of which task group they belong to. This is bad and could lead to data loss.

This patch addresses all of this by removing the CHECK, making the executor detect kill failures, and allowing schedulers to retry kills.

- Gaston


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65695/#review199540
-----------------------------------------------------------


On March 20, 2018, 1:50 p.m., Gaston Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65695/
> -----------------------------------------------------------
> 
> (Updated March 20, 2018, 1:50 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Joseph Wu, Qian Zhang, and
> Vinod Kone.
> 
> 
> Bugs: MESOS-8530
>     https://issues.apache.org/jira/browse/MESOS-8530
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The default executor transitions a task to `TASK_KILLING` and marks its
> child container as being killed before posting a `KILL` call to the
> agent.
> 
> The executor ignores kill requests for containers that are marked as
> being killed, and it doesn't remove this mark if the `KILL` call fails.
> This means that it's possible for tasks to get stuck in a `TASK_KILLING`
> state.
> 
> This patch makes the default executor remove the killing mark if a
> `KILL` call fails. That way a scheduler can retry a kill.
> 
> 
> Diffs
> -----
> 
>   src/launcher/default_executor.cpp 906836f3b8e0af79d7c61f90fd8a95f193b26e84 
> 
> 
> Diff: https://reviews.apache.org/r/65695/diff/4/
> 
> 
> Testing
> -------
> 
> `sudo bin/mesos-tests.sh` on GNU/Linux
> 
> 
> Thanks,
> 
> Gaston Kleiman
> 
>
