-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44571/
-----------------------------------------------------------
(Updated April 8, 2016, 1:20 p.m.)


Review request for mesos, Jie Yu and Joris Van Remoortere.


Changes
-------

Addressed issues.


Bugs: MESOS-4673
    https://issues.apache.org/jira/browse/MESOS-4673


Repository: mesos


Description (updated)
-------

Commands issued to the Docker daemon can hang, causing problems within Mesos. For example, a hanging 'docker stop' can result in an unresponsive executor. The Mesos agent then runs a 'docker stop' itself, which can also hang and leave the agent unresponsive (see MESOS-4673). Adding a timeout can be used as a workaround.


Diffs (updated)
-----

  src/slave/constants.hpp 449c8cd9f43f71b4612023eb463969e9db0bc960
  src/slave/containerizer/docker.hpp 35673214ab4bf50151f15e3fad10ff374cda3bbc
  src/slave/containerizer/docker.cpp 5755effec065650aac4473e4b622f4342ad020a3

Diff: https://reviews.apache.org/r/44571/diff/


Testing
-------

sudo ./bin/mesos-tests.sh (to check whether existing tests break due to the changed behavior)

Because docker must hang for both the Mesos agent and the `mesos-docker-executor`, the timeout can't currently be tested as part of the Mesos integration tests.
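To illustrate the general idea of the workaround outside of Mesos: the sketch below is not part of the patch and does not touch Mesos or Docker. It only shows, under the assumption that GNU coreutils `timeout` is available, how a possibly hanging command can be bounded and escalated to SIGKILL. The helper name `run_with_timeout` and the durations are made up for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not from the patch): run a command, send SIGTERM once
# LIMIT seconds have passed, and send SIGKILL if the command is still alive
# GRACE seconds after that.
#
# Usage: run_with_timeout LIMIT GRACE COMMAND [ARGS...]
run_with_timeout() {
  local limit="$1" grace="$2"
  shift 2
  # --signal:     signal sent when LIMIT expires
  # --kill-after: delay before SIGKILL is sent if the command keeps running
  timeout --signal=TERM --kill-after="$grace" "$limit" "$@"
}

# Example invocations (kept as comments so that sourcing this file has no
# side effects):
#
#   # A well-behaved command that simply runs too long is stopped by SIGTERM.
#   run_with_timeout 1 1 sleep 5
#
#   # A stand-in for a hung command: it ignores SIGTERM, so SIGKILL is needed.
#   run_with_timeout 1 1 bash -c 'trap "" TERM; sleep 5'
```

With GNU `timeout`, the first invocation should return exit status 124 (timed out) and the second 137 (128 + 9, SIGKILL had to be sent). A command that finishes in time keeps its own exit status.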
Here's how to test that the timeout works:

Run with Fedora 23 (Kernel 4.2.3, Docker 1.9.1)

# Start a master
./bin/mesos-master.sh --work_dir=/tmp/mesos &

# Start an agent
sudo ./bin/mesos-slave.sh --master=127.0.0.1:5050 --containerizers=docker &

# Run a task using the docker containerizer
./src/mesos-execute --containerizer=docker --docker_image=alpine --master=127.0.0.1:5050 --name="sleep" --command="sleep 1000" &

# Note the pid of `mesos-execute` as well as the pid of the sleep task run by docker (e.g. 3323 and 3474)

# Have mesos run `docker inspect` to gather the pid of the docker task
curl -X GET localhost:5051/monitor/statistics

# Now overload docker by trying to run a lot of tasks in parallel
for i in `seq 1 100`; do sudo docker run --rm alpine sleep 60 & done

# Wait until the first of these docker tasks finishes; `sudo docker ps` should be unresponsive now

# Kill the `mesos-execute` task (e.g. 3323)
kill 3323

# Watch the logs of the Mesos agent. At some point it will send a SIGKILL to the docker task (e.g. 3474)

# Make sure that the docker task is indeed terminated (using `ps fax` or the like)


Thanks,

Jan Schlicht