> On June 2, 2017, 4:37 p.m., Jie Yu wrote:
> > src/slave/slave.cpp
> > Line 5147 (original), 5147 (patched)
> > <https://reviews.apache.org/r/59746/diff/1/?file=1740554#file1740554line5147>
> >
> >     Can you explain to me in what scenario, the `future` will be in 
> > DISCARDED state? who discard the promise associated with this future?
> 
> Alexander Rukletsov wrote:
>     Sure. Consider docker containerizer.
>     
>     1) During container launch, docker containerizer calls `pull()`: 
> https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/containerizer/docker.cpp#L1238
>     2) The container enters `PULLING` state: 
> https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/containerizer/docker.cpp#L435
>     3) While the image is being pulled by docker, future 
> `containers_[containerId]->pull` is returned from `pull()`: 
> https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/containerizer/docker.cpp#L446
>     4) This future is part of the `.then` chain returned from `_launch()`: 
> https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/containerizer/docker.cpp#L1269
>     5) Now while docker is pulling, `destroy()` is called, which discards the 
> "pulling future": 
> https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/containerizer/docker.cpp#L2126-L2128
>     6) But discarding that future is propagated up the chain: 
> https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/3rdparty/libprocess/include/process/future.hpp#L1410-L1411
>     7) Which triggers the `onAny` callback attached to launch: 
> https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/slave.cpp#L2800-L2810
>     8) Which in turn gives us discarded future treated as launch error: 
> https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/slave.cpp#L5147-L5152
> 
> Jie Yu wrote:
>     
> https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/containerizer/docker.cpp#L2126-L2128
>     
>     This discards the future, but not necessarily transition the future to 
> DISCARDED state. That's the reason we have `hasDiscard` and `isDiscarded` 
> methods for Future becaue they means different things. Can you point to me 
> where the promise associated with this future is actually being transitioned 
> into DISCARDED state?
> 
> Alexander Rukletsov wrote:
>     Sure. In this case, we discard pulling in case client discarded the 
> future: 
> https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/docker/docker.cpp#L1512
>     
>     Additionally, I've manually reproduced the issue 
> (https://issues.apache.org/jira/browse/MESOS-7601)
>     ```
>     ./src/mesos-execute --master=192.99.40.208:5050 --containerizer=docker 
> --docker_image=ubuntu:16.04 --name=pull-test --command="sleep 1000"
>     ```
>     aborted right after the start when docker was pulling the image yielded 
> the following verbose agent log:
>     ```
>     I0621 12:59:22.271728 28980 fetcher.cpp:324] Starting to fetch URIs for 
> container: e2227d2f-fb6e-4fba-b6b6-528d2da7b276, directory: 
> /tmp/a/slaves/f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-S0/frameworks/f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003/executors/pull-test/runs/e2227d2f-fb6e-4fba-b6b6-528d2da7b276
>     I0621 12:59:22.272665 28989 docker.cpp:1352] Running docker -H 
> unix:///var/run/docker.sock inspect ubuntu:16.04
>     I0621 12:59:22.420902 28990 docker.cpp:1426] Running docker -H 
> unix:///var/run/docker.sock pull ubuntu:16.04
>     I0621 12:59:23.070950 28980 slave.cpp:3130] Asked to shut down framework 
> f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003 by master@192.99.40.208:5050
>     I0621 12:59:23.071007 28980 slave.cpp:3155] Shutting down framework 
> f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003
>     I0621 12:59:23.071146 28980 slave.cpp:5625] Shutting down executor 
> 'pull-test' of framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003
>     W0621 12:59:23.071171 28980 slave.hpp:732] Unable to send event to 
> executor 'pull-test' of framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003: 
> unknown connection type
>     I0621 12:59:28.072532 28984 slave.cpp:5698] Killing executor 'pull-test' 
> of framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003
>     I0621 12:59:28.072849 28985 docker.cpp:2125] Destroying container 
> e2227d2f-fb6e-4fba-b6b6-528d2da7b276 in PULLING state
>     I0621 12:59:28.073074 28985 docker.cpp:149] 'docker -H 
> unix:///var/run/docker.sock pull ubuntu:16.04' is being discarded
>     E0621 12:59:28.150388 28981 slave.cpp:5183] Container 
> 'e2227d2f-fb6e-4fba-b6b6-528d2da7b276' for executor 'pull-test' of framework 
> f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003 failed to start: future discarded
>     E0621 12:59:28.150698 28978 slave.cpp:5290] Termination of executor 
> 'pull-test' of framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003 failed: 
> unknown container
>     W0621 12:59:28.150737 28985 composing.cpp:569] Attempted to destroy 
> unknown container e2227d2f-fb6e-4fba-b6b6-528d2da7b276
>     I0621 12:59:28.150754 28978 slave.cpp:5403] Cleaning up executor 
> 'pull-test' of framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003
>     ```
>     
>     I believe killing the process tree leads to discarded future returned by 
> `Subprocess` call.
>     
>     The question here, I think, is whether it is safe to _always_ treat 
> discarded container launch attempts as non-failures. I would argue it makes 
> sense, because for failures we should use future failures : ). What do you 
> think?

I've realised that I have not answered your question explicitly : ). So when 
`docker pull` is forcefully killed and the corresponding process is reaped, the 
`subprocess.status` future is set to ready, but the chained one (`___pull` if 
my mental compiler works correctly) transitions to `discarded` because of [1], 
leading to the original `launch` future being discarded as well.

[1] 
https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/3rdparty/libprocess/include/process/future.hpp#L1297


- Alexander


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59746/#review176789
-----------------------------------------------------------


On June 2, 2017, 1:10 p.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59746/
> -----------------------------------------------------------
> 
> (Updated June 2, 2017, 1:10 p.m.)
> 
> 
> Review request for mesos, Ian Downes, Jie Yu, Joseph Wu, and Jan Schlicht.
> 
> 
> Bugs: MESOS-7601
>     https://issues.apache.org/jira/browse/MESOS-7601
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Discarded future returned from the containerizer->launch() does not
> necessarily mean that the container launch has failed. For example,
> a framework may stop while its task are being started.
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 5f80170fcd3c05add8b6e9e3107cff062818c1dc 
>   include/mesos/v1/mesos.proto 4b528751006f709f841e44f48c9f5c2dc035b402 
>   src/slave/slave.cpp 0c7e5f4ef905b3897d341c3147a208fc7a8a12e0 
> 
> 
> Diff: https://reviews.apache.org/r/59746/diff/1/
> 
> 
> Testing
> -------
> 
> make check on several Linux distros.
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>

Reply via email to