-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65071/#review195216
-----------------------------------------------------------
Ship it!
Verified that this patch fix the issue. I am ok with it for Mesos 1.5.0, but we
may need to schedule some long term plan for composing containerizer. Because:
1. Calling the underlying containerizer wait() is consumed as a monitoring
machnism in composing containerizer to detect whether a container is exited. We
are calling the underlying wait() in a couple methods:
a. `__recover()`
b. Two `_launch()`
c. `wait()` (as expected)
d. `destroy()`
2. The complication comes from how do we define the containerizer wait()
interface:
a.
https://github.com/apache/mesos/blob/master/src/slave/containerizer/containerizer.hpp#L131
This is no longer true for nested containers in Mesos Containerizer
(the current semantic
is: if a container has been destroyed and the checkpointed termination
file can be found,
the exited nested container's termination would be returned.
b.
https://github.com/apache/mesos/blob/master/src/slave/containerizer/containerizer.hpp#L141
Same as (a).
3. The composing containerizer code is not easy to read. People new to this
part need to spend more time to understand it (e.g., could we do the
refactoring and do not have the recursive logics?).
TODO list after 1.5.0:
1. Change the `containerizer::destroy()` interface to be:
`virtual process::Future<Option<mesos::slave::ContainerTermination>>
destroy(const ContainerID& containerId) = 0;`
2. Precisely re-define what `containerizer::wait()` means for containers
(either executor container, nested container or standalone container).
3. See if we could de-couple methods in composing containerizer calling the
underlying containerizer `wait()` (e.g., introduce another way to watch
container exited status).
4. Simply and refactor the logics in composing containerizer.
- Gilbert Song
On Jan. 10, 2018, 10:48 a.m., Andrei Budnik wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65071/
> -----------------------------------------------------------
>
> (Updated Jan. 10, 2018, 10:48 a.m.)
>
>
> Review request for mesos, Alexander Rukletsov, Gilbert Song, and Jie Yu.
>
>
> Bugs: MESOS-8391
> https://issues.apache.org/jira/browse/MESOS-8391
>
>
> Repository: mesos
>
>
> Description
> -------
>
> This patch fixes the issue when `wait()` of composing containerizer
> returns a future that is never set to `READY` state due to a missing
> `wait()` call on corresponding containerizer after it's recovery.
> It is necessary to call `wait()` for each known container of composing
> containerizer, because any terminated container should be handled in
> `_destroy()` callback that is bound to that `wait()`.
>
>
> Diffs
> -----
>
> src/slave/containerizer/composing.cpp
> 9ace70d9fbd78182715c5ef13fcaf7ad45f76f97
>
>
> Diff: https://reviews.apache.org/r/65071/diff/2/
>
>
> Testing
> -------
>
> sudo make check (fedora 25)
>
>
> Thanks,
>
> Andrei Budnik
>
>