> On May 7, 2015, 10:02 a.m., Timothy Chen wrote: > > src/examples/docker_no_executor_framework.cpp, line 109 > > <https://reviews.apache.org/r/33249/diff/4/?file=952333#file952333line109> > > > > is this something you like to commit?
Whops! That snuck in. I'll fix. > On May 7, 2015, 10:02 a.m., Timothy Chen wrote: > > src/slave/containerizer/docker.cpp, line 1187 > > <https://reviews.apache.org/r/33249/diff/4/?file=952336#file952336line1187> > > > > This seems to not respect anymore the explicit state we try to track > > what stage the containerizer was running when we destroy, and what stage it > > failed. > > > > Does the error message has clear message saying it failed to > > fetch/pull/run? tracking the stage the container is in when we destroyed is done by the container->state enum now, right? Yes, the status message on TASK_FAIL that the scheduler gets back now looks like this: Failed to launch container: Failed to 'docker pull bogus-image:latest': exit status = exited with status 1 stderr = time="2015-05-07T17:45:35Z" level=fatal msg="Error: image library/bogus-image:latest not found" I tested fetch failures by providing a bogus URI and run failures by starting the slave with --docker=/tmp/docker which looked like this: $ cat /tmp/docker #!/bin/bash if [[ $1 == "run" ]]; then echo "docker run failure!" 1>&2 exit 123 fi exec docker $* > On May 7, 2015, 10:02 a.m., Timothy Chen wrote: > > src/slave/slave.cpp, line 3125 > > <https://reviews.apache.org/r/33249/diff/4/?file=952339#file952339line3125> > > > > We shouldn't need to destroy if none of the containerizers return true. Ah, that's right. I will just remove this line. - Jay ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33249/#review82832 ----------------------------------------------------------- On May 7, 2015, 9:17 a.m., Jay Buffington wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/33249/ > ----------------------------------------------------------- > > (Updated May 7, 2015, 9:17 a.m.) > > > Review request for mesos, Benjamin Hindman, Jie Yu, and Timothy Chen. > > > Bugs: MESOS-2020 and MESOS-2656 > https://issues.apache.org/jira/browse/MESOS-2020 > https://issues.apache.org/jira/browse/MESOS-2656 > > > Repository: mesos > > > Description > ------- > > Send statusUpdate to scheduler on containerizer launch failure > > This review is actually for three commits. The three commits are separated > out > on github here: > https://github.com/apache/mesos/compare/master...jaybuff:report-launch-failure?expand=1 > > Commit #1: > docker-no-executor-framework example should send ContainerInfo > > docker-no-executor-framework stopped working when the protobuf message > changed during development. Apparently noone noticed because we don't > run it as part of our test suite. I used it to develop a fix for > proper error reporting, so I fixed it. > > Commit #2: > serialize wait and destroy in containerizer > > Containerizers keep track of all the containers they have created. When > a container is destroy()'ed the containerizer removes the container from > its internal hashmap. This means that wait() must be called before the > container is destroyed. > > When the docker containerizer fails because of a "docker pull" or > "docker run" failure, we end up with a race condition where wait() does > not properly return the Termination, so we ended up with a default > failure message of "Abnormal Executor Termination" which isn't very > useful. > > This commit solves this issue by not destroying the container until > after wait is called in the failure case. > > Fixes MESOS-2656 > > Commit #3: > send scheduler containerizer launch failure errors > > If a method in the launch sequence returns a failure the error message > wasn't making it back to the scheduler. > > Fixes MESOS-2020 > > > Diffs > ----- > > src/examples/docker_no_executor_framework.cpp > d5385d9295d30a6039bab9938f3270006582d3da > src/slave/containerizer/containerizer.hpp > 56c088abb036aa26c323c3142fd27ea29ed51d4f > src/slave/containerizer/docker.hpp b25ec55bf3cd30d6e8a804d09d90c632a7d12e3f > src/slave/containerizer/docker.cpp f9fc89ad7e3c853c3f9f6dcf9aa68e54dc1888c6 > src/slave/containerizer/mesos/containerizer.hpp > 5e5f13ed8a71ff9510b40b6032d00fd16d312622 > src/slave/containerizer/mesos/containerizer.cpp > f2587280dc0e1d566d2b856a80358c7b3896c603 > src/slave/slave.cpp c78ee3c9e7fc38ad364e83f4abe267e86bfbbc13 > > Diff: https://reviews.apache.org/r/33249/diff/ > > > Testing > ------- > > I used examples/docker_no_executor_framework.cpp and make check > > > Thanks, > > Jay Buffington > >