> On Aug. 7, 2018, 4:03 p.m., Vinod Kone wrote:
> > src/slave/containerizer/mesos/utils.cpp
> > Line 102 (original), 105 (patched)
> > <https://reviews.apache.org/r/68257/diff/1/?file=2069850#file2069850line108>
> >
> >     Are we guaranteed that there are no short-lived processes, other than 
> > the task process, at the 2nd level? If not, we will have the same issue 
> > right?
> >     
> >     Modulo the above question, the change LGTM.
> 
> Andrei Budnik wrote:
>     There are two types of 2nd-level processes:
>     1) the command executor's task
>     2) the nested container's task
>     
>     E.g. the process tree can be like the following:
>     0. mesos-containerizer (`init`)
>        1. mesos-executor (command executor)
>           2. sleep 1000 (command executor's task)
>        1. mesos-containerizer (`init` of a nested container) <- enters `mnt` 
> namespace of command executor's task before forking a task
>           2. echo "echo" (nested container's task)
>     
>     Since we skip 1st-level processes whose `mnt` namespace is not the same 
> as `init` process (PID 1), the algorithm doesn't iterate over their 2nd-level 
> processes. That gives a gurarantee that we only detect command executor's 
> task, but not the short-lived processes.
> 
> Alexander Rukletsov wrote:
>     Let's make it explicit to the reader. I suggest to:
>     1. Collect _candidates_ (2nd level children) first.
>     2. Assert there are >= 1 candidates.
>     3. Filter all candidates except those whose (`mnt` differs from root's 
> `mnt` && parent's `mnt` equals to root's `mnt`).
>     4. Assert there are now 0 or 1 candidates left.
>     5. Return pid of that single candidate or root's pid if there are none.

@andrei Thanks for the info. IIUC, any processes launched by the mesos-executor 
or the task using `LAUNCH_NESTED_CONTAINER` API will end up with the process 
tree you showed above. I just wanted to make sure it is not possible for us to 
see a child process of command executor that is a sibling of the task process 
(e.g., task launched a process without using LNC API but it ended up as a 
sibling, command executor runs a script/binary without using LNC API etc).


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68257/#review206947
-----------------------------------------------------------


On Aug. 7, 2018, 1:46 p.m., Andrei Budnik wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68257/
> -----------------------------------------------------------
> 
> (Updated Aug. 7, 2018, 1:46 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Gilbert Song, Jie Yu, and 
> Kevin Klues.
> 
> 
> Bugs: MESOS-9116
>     https://issues.apache.org/jira/browse/MESOS-9116
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Previously, we were walking the process tree from the container's
> `init` process to find the first process along the way whose `mnt`
> namespace differs from the `init` process. We expected this algorithm
> to always return the PID of the command executor's task. However, if
> someone launches multiple nested containers within the process tree,
> the algorithm might detect the PID of the nested container instead of
> the command executor's task. The detected PID might belong to a
> short-lived container, so the container's process might terminate at
> the moment the containerizer launcher (aka `nanny`) process tries to
> enter its `mnt` namespace. This patch fixes the detection algorithm
> so that it always returns PID of the command executor's task.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/mesos/utils.cpp 
> 30e76d1d91651975033078f5450e45f5f2fd8ba0 
> 
> 
> Diff: https://reviews.apache.org/r/68257/diff/1/
> 
> 
> Testing
> -------
> 
> 1) Internal CI with disabled 
> `ROOT_CGROUPS_LaunchNestedContainerSessionsInParallel` test (see previous 
> patch).
> 2) Fedora 25: `./src/mesos-tests 
> --gtest_filter=*AgentAPITest.LaunchNestedContainerSessionInParallel* 
> --gtest_break_on_failure --gtest_repeat=100 --verbose`
> 
> 
> Thanks,
> 
> Andrei Budnik
> 
>

Reply via email to