> On June 2, 2017, 4:37 p.m., Jie Yu wrote:
> > src/slave/slave.cpp
> > Line 5147 (original), 5147 (patched)
> > <https://reviews.apache.org/r/59746/diff/1/?file=1740554#file1740554line5147>
> >
> > Can you explain to me in what scenario the `future` will be in
> > DISCARDED state? Who discards the promise associated with this future?
>
> Alexander Rukletsov wrote:
> Sure. Consider docker containerizer.
>
> 1) During container launch, docker containerizer calls `pull()`:
> https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/containerizer/docker.cpp#L1238
> 2) The container enters `PULLING` state:
> https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/containerizer/docker.cpp#L435
> 3) While the image is being pulled by docker, future
> `containers_[containerId]->pull` is returned from `pull()`:
> https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/containerizer/docker.cpp#L446
> 4) This future is part of the `.then` chain returned from `_launch()`:
> https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/containerizer/docker.cpp#L1269
> 5) Now while docker is pulling, `destroy()` is called, which discards the
> "pulling future":
> https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/containerizer/docker.cpp#L2126-L2128
> 6) But discarding that future is propagated up the chain:
> https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/3rdparty/libprocess/include/process/future.hpp#L1410-L1411
> 7) Which triggers the `onAny` callback attached to launch:
> https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/slave.cpp#L2800-L2810
> 8) Which in turn gives us a discarded future treated as a launch error:
> https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/slave.cpp#L5147-L5152
>
> Jie Yu wrote:
> > https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/containerizer/docker.cpp#L2126-L2128
>
> This discards the future, but does not necessarily transition the future to
> DISCARDED state. That's the reason we have `hasDiscard` and `isDiscarded`
> methods for Future, because they mean different things. Can you point me to
> where the promise associated with this future is actually being transitioned
> into DISCARDED state?
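
For readers following along, the distinction between a discard *request* and the DISCARDED *state* can be sketched with a toy model. This is not libprocess code: `ToyFuture`, `State`, `onDiscard`, and `transitionToDiscarded` are hypothetical names chosen for illustration, and only the `hasDiscard` / `isDiscarded` method names come from the thread above.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Toy model only, NOT libprocess. All type and member names are hypothetical.
enum class State { PENDING, READY, FAILED, DISCARDED };

struct ToyFuture {
  State state = State::PENDING;
  bool discardRequested = false;

  // Callbacks run when a consumer requests a discard.
  std::vector<std::function<void()>> onDiscard;

  // True once a discard has been requested, regardless of state.
  bool hasDiscard() const { return discardRequested; }

  // True only if the producer actually moved the future to DISCARDED.
  bool isDiscarded() const { return state == State::DISCARDED; }

  // Consumer side: records the request and notifies the producer.
  // The state itself is left untouched.
  void discard() {
    if (state != State::PENDING || discardRequested) {
      return;
    }
    discardRequested = true;
    for (auto& callback : onDiscard) {
      callback();
    }
  }

  // Producer side: honors the request by changing the state.
  bool transitionToDiscarded() {
    if (state != State::PENDING) {
      return false;
    }
    state = State::DISCARDED;
    return true;
  }
};
```

In this model, a future whose producer registered no `onDiscard` handler ends up with `hasDiscard()` true and `isDiscarded()` false, which is the case the question above is probing. Only a producer that reacts to the request makes `isDiscarded()` true.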

Sure. In this case, we discard the pull if the client has discarded the future:
https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/docker/docker.cpp#L1512

Additionally, I've manually reproduced the issue (https://issues.apache.org/jira/browse/MESOS-7601). The command
```
./src/mesos-execute --master=192.99.40.208:5050 --containerizer=docker --docker_image=ubuntu:16.04 --name=pull-test --command="sleep 1000"
```
aborted right after the start, while docker was pulling the image, yielded the following verbose agent log:
```
I0621 12:59:22.271728 28980 fetcher.cpp:324] Starting to fetch URIs for container: e2227d2f-fb6e-4fba-b6b6-528d2da7b276, directory: /tmp/a/slaves/f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-S0/frameworks/f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003/executors/pull-test/runs/e2227d2f-fb6e-4fba-b6b6-528d2da7b276
I0621 12:59:22.272665 28989 docker.cpp:1352] Running docker -H unix:///var/run/docker.sock inspect ubuntu:16.04
I0621 12:59:22.420902 28990 docker.cpp:1426] Running docker -H unix:///var/run/docker.sock pull ubuntu:16.04
I0621 12:59:23.070950 28980 slave.cpp:3130] Asked to shut down framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003 by [email protected]:5050
I0621 12:59:23.071007 28980 slave.cpp:3155] Shutting down framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003
I0621 12:59:23.071146 28980 slave.cpp:5625] Shutting down executor 'pull-test' of framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003
W0621 12:59:23.071171 28980 slave.hpp:732] Unable to send event to executor 'pull-test' of framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003: unknown connection type
I0621 12:59:28.072532 28984 slave.cpp:5698] Killing executor 'pull-test' of framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003
I0621 12:59:28.072849 28985 docker.cpp:2125] Destroying container e2227d2f-fb6e-4fba-b6b6-528d2da7b276 in PULLING state
I0621 12:59:28.073074 28985 docker.cpp:149] 'docker -H unix:///var/run/docker.sock pull ubuntu:16.04' is being discarded
E0621 12:59:28.150388 28981 slave.cpp:5183] Container 'e2227d2f-fb6e-4fba-b6b6-528d2da7b276' for executor 'pull-test' of framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003 failed to start: future discarded
E0621 12:59:28.150698 28978 slave.cpp:5290] Termination of executor 'pull-test' of framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003 failed: unknown container
W0621 12:59:28.150737 28985 composing.cpp:569] Attempted to destroy unknown container e2227d2f-fb6e-4fba-b6b6-528d2da7b276
I0621 12:59:28.150754 28978 slave.cpp:5403] Cleaning up executor 'pull-test' of framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003
```

I believe killing the process tree leads to a discarded future being returned by the `Subprocess` call.

The question here, I think, is whether it is safe to _always_ treat discarded container launch attempts as non-failures. I would argue it makes sense, because for failures we should use future failures : ). What do you think?


- Alexander


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59746/#review176789
-----------------------------------------------------------


On June 2, 2017, 1:10 p.m., Alexander Rukletsov wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59746/
> -----------------------------------------------------------
>
> (Updated June 2, 2017, 1:10 p.m.)
>
>
> Review request for mesos, Ian Downes, Jie Yu, Joseph Wu, and Jan Schlicht.
>
>
> Bugs: MESOS-7601
> https://issues.apache.org/jira/browse/MESOS-7601
>
>
> Repository: mesos
>
>
> Description
> -------
>
> A discarded future returned from containerizer->launch() does not
> necessarily mean that the container launch has failed. For example,
> a framework may stop while its tasks are being started.
>
>
> Diffs
> -----
>
> include/mesos/mesos.proto 5f80170fcd3c05add8b6e9e3107cff062818c1dc
> include/mesos/v1/mesos.proto 4b528751006f709f841e44f48c9f5c2dc035b402
> src/slave/slave.cpp 0c7e5f4ef905b3897d341c3147a208fc7a8a12e0
>
>
> Diff: https://reviews.apache.org/r/59746/diff/1/
>
>
> Testing
> -------
>
> make check on several Linux distros.
>
>
> Thanks,
>
> Alexander Rukletsov
>
>
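
To make the question in the reply concrete, here is a minimal sketch of the proposed policy, assuming a launch result can end in one of three terminal states. It is not the patch under review: `LaunchState`, `Outcome`, and `classifyLaunch` are hypothetical names, and the real agent code works on libprocess futures rather than an enum.

```cpp
#include <cassert>

// Hypothetical names; a sketch of the proposed policy, not the actual patch.
enum class LaunchState { READY, FAILED, DISCARDED };
enum class Outcome { STARTED, FAILED, ABANDONED };

// Maps the terminal state of a launch result to how the caller treats it.
// Only an explicit failure is reported as an error; a discarded launch is
// treated as abandoned (for example, the framework was shut down mid-pull).
Outcome classifyLaunch(LaunchState state) {
  switch (state) {
    case LaunchState::READY:
      return Outcome::STARTED;
    case LaunchState::FAILED:
      return Outcome::FAILED;
    case LaunchState::DISCARDED:
      return Outcome::ABANDONED;
  }
  return Outcome::FAILED;  // Unreachable for valid enum values.
}
```

Under this policy the "failed to start: future discarded" error line in the log above would no longer be emitted for a launch that was cancelled on purpose. Whether that is safe in every case is exactly the open question in the thread.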
