> On June 2, 2017, 4:37 p.m., Jie Yu wrote:
> > src/slave/slave.cpp
> > Line 5147 (original), 5147 (patched)
> > <https://reviews.apache.org/r/59746/diff/1/?file=1740554#file1740554line5147>
> >
> >     Can you explain to me in what scenario the `future` will be in
> >     DISCARDED state? Who discards the promise associated with this future?
>
> Alexander Rukletsov wrote:
>     Sure. Consider docker containerizer.
>
>     1) During container launch, docker containerizer calls `pull()`:
>     https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/containerizer/docker.cpp#L1238
>     2) The container enters `PULLING` state:
>     https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/containerizer/docker.cpp#L435
>     3) While the image is being pulled by docker, the future
>     `containers_[containerId]->pull` is returned from `pull()`:
>     https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/containerizer/docker.cpp#L446
>     4) This future is part of the `.then` chain returned from `_launch()`:
>     https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/containerizer/docker.cpp#L1269
>     5) Now while docker is pulling, `destroy()` is called, which discards the
>     "pulling future":
>     https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/containerizer/docker.cpp#L2126-L2128
>     6) But the discard of that future is propagated up the chain:
>     https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/3rdparty/libprocess/include/process/future.hpp#L1410-L1411
>     7) Which triggers the `onAny` callback attached to launch:
>     https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/slave.cpp#L2800-L2810
>     8) Which in turn gives us a discarded future treated as a launch error:
>     https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/slave.cpp#L5147-L5152
>
> Jie Yu wrote:
>     > https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/containerizer/docker.cpp#L2126-L2128
>
>     This discards the future, but does not necessarily transition the future to
>     the DISCARDED state. That is why Future has both `hasDiscard` and
>     `isDiscarded` methods: they mean different things. Can you point me to
>     where the promise associated with this future is actually transitioned
>     into the DISCARDED state?
>
> Alexander Rukletsov wrote:
>     Sure. In this case, we discard the pull if the client discarded the
>     future:
>     https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/docker/docker.cpp#L1512
>
>     Additionally, I've manually reproduced the issue
>     (https://issues.apache.org/jira/browse/MESOS-7601). The command
>     ```
>     ./src/mesos-execute --master=192.99.40.208:5050 --containerizer=docker
>     --docker_image=ubuntu:16.04 --name=pull-test --command="sleep 1000"
>     ```
>     aborted right after the start, while docker was pulling the image, yielded
>     the following verbose agent log:
>     ```
>     I0621 12:59:22.271728 28980 fetcher.cpp:324] Starting to fetch URIs for
>     container: e2227d2f-fb6e-4fba-b6b6-528d2da7b276, directory:
>     /tmp/a/slaves/f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-S0/frameworks/f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003/executors/pull-test/runs/e2227d2f-fb6e-4fba-b6b6-528d2da7b276
>     I0621 12:59:22.272665 28989 docker.cpp:1352] Running docker -H
>     unix:///var/run/docker.sock inspect ubuntu:16.04
>     I0621 12:59:22.420902 28990 docker.cpp:1426] Running docker -H
>     unix:///var/run/docker.sock pull ubuntu:16.04
>     I0621 12:59:23.070950 28980 slave.cpp:3130] Asked to shut down framework
>     f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003 by master@192.99.40.208:5050
>     I0621 12:59:23.071007 28980 slave.cpp:3155] Shutting down framework
>     f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003
>     I0621 12:59:23.071146 28980 slave.cpp:5625] Shutting down executor
>     'pull-test' of framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003
>     W0621 12:59:23.071171 28980 slave.hpp:732] Unable to send event to
>     executor 'pull-test' of framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003:
>     unknown connection type
>     I0621 12:59:28.072532 28984 slave.cpp:5698] Killing executor 'pull-test'
>     of framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003
>     I0621 12:59:28.072849 28985 docker.cpp:2125] Destroying container
>     e2227d2f-fb6e-4fba-b6b6-528d2da7b276 in PULLING state
>     I0621 12:59:28.073074 28985 docker.cpp:149] 'docker -H
>     unix:///var/run/docker.sock pull ubuntu:16.04' is being discarded
>     E0621 12:59:28.150388 28981 slave.cpp:5183] Container
>     'e2227d2f-fb6e-4fba-b6b6-528d2da7b276' for executor 'pull-test' of framework
>     f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003 failed to start: future discarded
>     E0621 12:59:28.150698 28978 slave.cpp:5290] Termination of executor
>     'pull-test' of framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003 failed:
>     unknown container
>     W0621 12:59:28.150737 28985 composing.cpp:569] Attempted to destroy
>     unknown container e2227d2f-fb6e-4fba-b6b6-528d2da7b276
>     I0621 12:59:28.150754 28978 slave.cpp:5403] Cleaning up executor
>     'pull-test' of framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003
>     ```
>
>     I believe killing the process tree leads to a discarded future being
>     returned by the `Subprocess` call.
>
>     The question here, I think, is whether it is safe to _always_ treat
>     discarded container launch attempts as non-failures. I would argue it makes
>     sense, because for failures we should use failed futures : ). What do you
>     think?
I've realised that I have not answered your question explicitly : ). When
`docker pull` is forcefully killed and the corresponding process is reaped, the
`subprocess.status` future is set to ready, but the chained one (`___pull`, if
my mental compiler works correctly) transitions to `discarded` because of [1],
which leads to the original `launch` future being discarded as well.

[1] https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/3rdparty/libprocess/include/process/future.hpp#L1297


- Alexander


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59746/#review176789
-----------------------------------------------------------


On June 2, 2017, 1:10 p.m., Alexander Rukletsov wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59746/
> -----------------------------------------------------------
>
> (Updated June 2, 2017, 1:10 p.m.)
>
>
> Review request for mesos, Ian Downes, Jie Yu, Joseph Wu, and Jan Schlicht.
>
>
> Bugs: MESOS-7601
>     https://issues.apache.org/jira/browse/MESOS-7601
>
>
> Repository: mesos
>
>
> Description
> -------
>
> A discarded future returned from `containerizer->launch()` does not
> necessarily mean that the container launch has failed. For example,
> a framework may stop while its tasks are being started.
>
>
> Diffs
> -----
>
>   include/mesos/mesos.proto 5f80170fcd3c05add8b6e9e3107cff062818c1dc
>   include/mesos/v1/mesos.proto 4b528751006f709f841e44f48c9f5c2dc035b402
>   src/slave/slave.cpp 0c7e5f4ef905b3897d341c3147a208fc7a8a12e0
>
>
> Diff: https://reviews.apache.org/r/59746/diff/1/
>
>
> Testing
> -------
>
> make check on several Linux distros.
>
>
> Thanks,
>
> Alexander Rukletsov
>
>