-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65071/#review195216
-----------------------------------------------------------


Ship it!




Verified that this patch fix the issue. I am ok with it for Mesos 1.5.0, but we 
may need to schedule some long term plan for composing containerizer. Because:

1. Calling the underlying containerizer wait() is consumed as a monitoring 
machnism in composing containerizer to detect whether a container is exited. We 
are calling the underlying wait() in a couple methods:
     a. `__recover()`
     b. Two `_launch()`
     c. `wait()` (as expected)
     d. `destroy()`
2. The complication comes from how do we define the containerizer wait() 
interface:
     a. 
https://github.com/apache/mesos/blob/master/src/slave/containerizer/containerizer.hpp#L131
        This is no longer true for nested containers in Mesos Containerizer 
(the current semantic
        is: if a container has been destroyed and the checkpointed termination 
file can be found,
        the exited nested container's termination would be returned.
     b. 
https://github.com/apache/mesos/blob/master/src/slave/containerizer/containerizer.hpp#L141
        Same as (a).
3. The composing containerizer code is not easy to read. People new to this 
part need to spend more time to understand it (e.g., could we do the 
refactoring and do not have the recursive logics?).

TODO list after 1.5.0:
1. Change the `containerizer::destroy()` interface to be:
   `virtual process::Future<Option<mesos::slave::ContainerTermination>> 
destroy(const ContainerID& containerId) = 0;`
2. Precisely re-define what `containerizer::wait()` means for containers 
(either executor container, nested container or standalone container).
3. See if we could de-couple methods in composing containerizer calling the 
underlying containerizer `wait()` (e.g., introduce another way to watch 
container exited status).
4. Simply and refactor the logics in composing containerizer.

- Gilbert Song


On Jan. 10, 2018, 10:48 a.m., Andrei Budnik wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65071/
> -----------------------------------------------------------
> 
> (Updated Jan. 10, 2018, 10:48 a.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Gilbert Song, and Jie Yu.
> 
> 
> Bugs: MESOS-8391
>     https://issues.apache.org/jira/browse/MESOS-8391
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch fixes the issue when `wait()` of composing containerizer
> returns a future that is never set to `READY` state due to a missing
> `wait()` call on corresponding containerizer after it's recovery.
> It is necessary to call `wait()` for each known container of composing
> containerizer, because any terminated container should be handled in
> `_destroy()` callback that is bound to that `wait()`.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/composing.cpp 
> 9ace70d9fbd78182715c5ef13fcaf7ad45f76f97 
> 
> 
> Diff: https://reviews.apache.org/r/65071/diff/2/
> 
> 
> Testing
> -------
> 
> sudo make check (fedora 25)
> 
> 
> Thanks,
> 
> Andrei Budnik
> 
>

Reply via email to