> On June 2, 2017, 4:37 p.m., Jie Yu wrote:
> > src/slave/slave.cpp
> > Line 5147 (original), 5147 (patched)
> > <https://reviews.apache.org/r/59746/diff/1/?file=1740554#file1740554line5147>
> >
> >     Can you explain to me in what scenario the `future` will be in the
> > DISCARDED state? Who discards the promise associated with this future?
> 
> Alexander Rukletsov wrote:
>     Sure. Consider docker containerizer.
>     
>     1) During container launch, docker containerizer calls `pull()`: 
> https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/containerizer/docker.cpp#L1238
>     2) The container enters the `PULLING` state: 
> https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/containerizer/docker.cpp#L435
>     3) While the image is being pulled by Docker, the future 
> `containers_[containerId]->pull` is returned from `pull()`: 
> https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/containerizer/docker.cpp#L446
>     4) This future is part of the `.then` chain returned from `_launch()`: 
> https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/containerizer/docker.cpp#L1269
>     5) Now while docker is pulling, `destroy()` is called, which discards the 
> "pulling future": 
> https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/containerizer/docker.cpp#L2126-L2128
>     6) But discarding that future is propagated up the chain: 
> https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/3rdparty/libprocess/include/process/future.hpp#L1410-L1411
>     7) Which triggers the `onAny` callback attached to launch: 
> https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/slave.cpp#L2800-L2810
>     8) Which in turn results in the discarded future being treated as a launch error: 
> https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/slave.cpp#L5147-L5152
> 
> Jie Yu wrote:
>     
> https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/containerizer/docker.cpp#L2126-L2128
>     
>     This discards the future, but does not necessarily transition it to the
> DISCARDED state. That's the reason we have both `hasDiscard` and `isDiscarded`
> methods for Future: they mean different things. Can you point me to
> where the promise associated with this future is actually transitioned
> into the DISCARDED state?

Sure. In this case, the pull itself is discarded when the client discards the future: 
https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/docker/docker.cpp#L1512

Additionally, I've manually reproduced the issue
(https://issues.apache.org/jira/browse/MESOS-7601). Running
```
./src/mesos-execute --master=192.99.40.208:5050 --containerizer=docker \
--docker_image=ubuntu:16.04 --name=pull-test --command="sleep 1000"
```
and aborting it right after start, while docker was pulling the image,
yielded the following verbose agent log:
```
I0621 12:59:22.271728 28980 fetcher.cpp:324] Starting to fetch URIs for 
container: e2227d2f-fb6e-4fba-b6b6-528d2da7b276, directory: 
/tmp/a/slaves/f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-S0/frameworks/f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003/executors/pull-test/runs/e2227d2f-fb6e-4fba-b6b6-528d2da7b276
I0621 12:59:22.272665 28989 docker.cpp:1352] Running docker -H 
unix:///var/run/docker.sock inspect ubuntu:16.04
I0621 12:59:22.420902 28990 docker.cpp:1426] Running docker -H 
unix:///var/run/docker.sock pull ubuntu:16.04
I0621 12:59:23.070950 28980 slave.cpp:3130] Asked to shut down framework 
f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003 by master@192.99.40.208:5050
I0621 12:59:23.071007 28980 slave.cpp:3155] Shutting down framework 
f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003
I0621 12:59:23.071146 28980 slave.cpp:5625] Shutting down executor 'pull-test' 
of framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003
W0621 12:59:23.071171 28980 slave.hpp:732] Unable to send event to executor 
'pull-test' of framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003: unknown 
connection type
I0621 12:59:28.072532 28984 slave.cpp:5698] Killing executor 'pull-test' of 
framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003
I0621 12:59:28.072849 28985 docker.cpp:2125] Destroying container 
e2227d2f-fb6e-4fba-b6b6-528d2da7b276 in PULLING state
I0621 12:59:28.073074 28985 docker.cpp:149] 'docker -H 
unix:///var/run/docker.sock pull ubuntu:16.04' is being discarded
E0621 12:59:28.150388 28981 slave.cpp:5183] Container 
'e2227d2f-fb6e-4fba-b6b6-528d2da7b276' for executor 'pull-test' of framework 
f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003 failed to start: future discarded
E0621 12:59:28.150698 28978 slave.cpp:5290] Termination of executor 'pull-test' 
of framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003 failed: unknown container
W0621 12:59:28.150737 28985 composing.cpp:569] Attempted to destroy unknown 
container e2227d2f-fb6e-4fba-b6b6-528d2da7b276
I0621 12:59:28.150754 28978 slave.cpp:5403] Cleaning up executor 'pull-test' of 
framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003
```

I believe killing the process tree leads to a discarded future being returned
by the `Subprocess` call.

The question here, I think, is whether it is safe to _always_ treat discarded
container launch attempts as non-failures. I would argue that it is, because
genuine failures should be signalled with failed futures :). What do you think?


- Alexander


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59746/#review176789
-----------------------------------------------------------


On June 2, 2017, 1:10 p.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59746/
> -----------------------------------------------------------
> 
> (Updated June 2, 2017, 1:10 p.m.)
> 
> 
> Review request for mesos, Ian Downes, Jie Yu, Joseph Wu, and Jan Schlicht.
> 
> 
> Bugs: MESOS-7601
>     https://issues.apache.org/jira/browse/MESOS-7601
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> A discarded future returned from `containerizer->launch()` does not
> necessarily mean that the container launch has failed. For example,
> a framework may stop while its tasks are being started.
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 5f80170fcd3c05add8b6e9e3107cff062818c1dc 
>   include/mesos/v1/mesos.proto 4b528751006f709f841e44f48c9f5c2dc035b402 
>   src/slave/slave.cpp 0c7e5f4ef905b3897d341c3147a208fc7a8a12e0 
> 
> 
> Diff: https://reviews.apache.org/r/59746/diff/1/
> 
> 
> Testing
> -------
> 
> make check on several Linux distros.
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>
