> On Aug. 10, 2017, 12:07 p.m., Alexander Rukletsov wrote:
> > src/docker/executor.cpp
> > Lines 410-415 (original), 416-421 (patched)
> > <https://reviews.apache.org/r/61530/diff/1/?file=1794166#file1794166line416>
> >
> >     Let's add a comment explaining why we retry / unblock on failure
> > instead of timing out. Something like:
> >     ```
> >     // Invoking `docker stop` might be unsuccessful, in which case the
> >     // container most probably does not receive the signal. In this case we
> >     // should allow schedulers to retry the kill operation or, if the kill
> >     // was initiated by a failing health check, retry ourselves. We neither
> >     // bail out nor stop retrying, to avoid sending a terminal status update
> >     // while the container might still be running.
> >     //
> >     // NOTE: `docker stop` might also hang. We do not address this for now,
> >     // because there is no evidence that in this case the docker daemon
> >     // might function properly, i.e., that it is only the docker CLI command
> >     // that hangs, and hence there is not much we can do.
> >     ```

We should also refer to https://issues.apache.org/jira/browse/MESOS-6743 in the 
comment so that folks can get more context.


- Alexander


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61530/#review182573
-----------------------------------------------------------


On Aug. 9, 2017, 4:55 p.m., Andrei Budnik wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61530/
> -----------------------------------------------------------
> 
> (Updated Aug. 9, 2017, 4:55 p.m.)
> 
> 
> Review request for mesos and Alexander Rukletsov.
> 
> 
> Bugs: MESOS-6743
>     https://issues.apache.org/jira/browse/MESOS-6743
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Previously, after a `docker stop` failure, the docker executor neither
> allowed a scheduler to retry the `killTask` command nor retried killing
> a task when the kill was triggered by a failed health check.
> 
> 
> Diffs
> -----
> 
>   src/docker/executor.cpp 26f12ec002f754fab0d34c01472cf95b499d8007 
> 
> 
> Diff: https://reviews.apache.org/r/61530/diff/1/
> 
> 
> Testing
> -------
> 
> sudo make check
> 
> Manual testing:
> 
> Emulating `docker stop` errors:
> ===============================
> 1. Add `return fmt.Errorf("Emulating error!")` to 
> https://github.com/docker/docker-ce/blob/master/components/engine/daemon/stop.go#L21
> 2. Build docker from source: 
> http://oyvindsk.com/writing/docker-build-from-source
> 3. Stop the docker service and launch the `dockerd` binary, e.g.: `sudo 
> ./bundles/17.06.0-dev/binary-daemon/dockerd`
> 
> Emulating docker daemon hang:
> =============================
> 1. `ps aux|grep dockerd` should find two processes.
> 2. Just before sending `docker stop`, send SIGSTOP to both docker daemon
> processes: `sudo kill -STOP <PID1> <PID2>`
> 
> Emulating health check failure in docker executor:
> ==================================================
> 1. Add
> ```c++
>   static int fake = 0;
>   if (++fake > 10) {
>     failure();
>     return;
>   }
> ```
> to `HealthChecker::processCheckResult()` in `src/checks/health_checker.cpp`
> 2. Add
> ```c++
>   HealthCheck healthCheck;
>   healthCheck.set_type(HealthCheck::COMMAND);
>   healthCheck.mutable_command()->set_value("exit 0");
>   healthCheck.set_delay_seconds(0);
>   healthCheck.set_interval_seconds(0);
>   healthCheck.set_grace_period_seconds(1);
>   _task.mutable_health_check()->CopyFrom(healthCheck);
> ```
> to `CommandScheduler::offers()` in `src/cli/execute.cpp`
> 3. Compile Mesos.
> 4. Run the Mesos agent: `sudo GLOG_v=1 ./bin/mesos-agent.sh 
> --resources="cpus:10000;mem:1000000" 
> --work_dir='/home/some_user/mesos/build/var/agent-1' 
> --containerizers="docker,mesos" --master="127.0.1.1:5050"`
> 5. Launch a task using the docker executor: `./src/mesos-execute --master="127.0.1.1:5050" 
> --name="a" --containerizer=docker --docker_image="ubuntu:xenial" 
> --command="while true; do : ; done"`
> 
> 
> Thanks,
> 
> Andrei Budnik
> 
>
