[ https://issues.apache.org/jira/browse/MESOS-8313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16358625#comment-16358625 ]

James Peach edited comment on MESOS-8313 at 10/15/18 6:38 PM:
--------------------------------------------------------------

Note: this supervisor needs to reap all of its children, as per MESOS-5893.
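Here is a minimal sketch of one way a supervisor can meet this requirement on Linux, assuming Linux 3.4 or later. The supervisor marks itself as a child subreaper with {{prctl(PR_SET_CHILD_SUBREAPER)}}, so that orphaned descendants are re-parented to it rather than to init. It then loops on {{waitpid(-1, ...)}} until that call fails with {{ECHILD}}. The helper names below are hypothetical and are not taken from the Mesos code base. Whether MESOS-5893 calls for exactly this mechanism is also an assumption.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Ask the kernel to re-parent orphaned descendants of this process to it,
 * rather than to init (Linux >= 3.4). Returns 0 on success, -1 on error. */
int become_subreaper(void)
{
    return prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0);
}

/* Block until every child, including adopted orphans, has been reaped.
 * Returns the number of processes reaped, or -1 on an unexpected error. */
int reap_all_children(void)
{
    int reaped = 0;

    for (;;) {
        int status;
        pid_t pid = waitpid(-1, &status, 0);

        if (pid > 0) {
            reaped++;
        } else if (errno == ECHILD) {
            return reaped; /* no children left to wait for */
        } else if (errno != EINTR) {
            return -1;
        }
    }
}

/* Hypothetical demo helper: fork a child that forks a grandchild and then
 * exits at once, so the grandchild is orphaned and adopted by the subreaper.
 * Returns the child's pid in the caller, or -1 if fork() fails. */
pid_t spawn_orphaning_child(void)
{
    pid_t child = fork();

    if (child != 0)
        return child; /* caller: child pid, or -1 on error */

    if (fork() == 0) {
        /* Grandchild: outlive its parent briefly, then exit. */
        struct timespec delay = { 0, 100 * 1000 * 1000 };
        nanosleep(&delay, NULL);
        _exit(0);
    }

    _exit(0); /* child: exit immediately, orphaning the grandchild */
}
```

For example, after {{become_subreaper()}}, calling {{spawn_orphaning_child()}} and then {{reap_all_children()}} should reap two processes. The child is reaped directly, and the grandchild is reaped after the kernel re-parents it to the supervisor. Blocking until {{ECHILD}} only suits a supervisor whose descendants are all expected to exit. A supervisor that must stay responsive would instead reap with {{WNOHANG}} from a {{SIGCHLD}} handler or event loop.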


was (Author: jamespeach):
Note, this supervisor need to read all its children, as per MESOS-5893.

> Provide a host namespace container supervisor.
> ----------------------------------------------
>
>                 Key: MESOS-8313
>                 URL: https://issues.apache.org/jira/browse/MESOS-8313
>             Project: Mesos
>          Issue Type: Improvement
>          Components: containerization
>            Reporter: James Peach
>            Assignee: James Peach
>            Priority: Major
>         Attachments: IMG_2629.JPG
>
>
> After more investigation into user namespaces, the current implementation of 
> creating the container namespaces needs some adjustment before we can 
> implement user namespaces in a usable fashion.
> The problems we need to address are:
> 1. The containerizer needs to hold {{CAP_SYS_ADMIN}} over the PID namespace 
> to mount {{procfs}}. Currently, this prevents containers joining the host PID 
> namespace. The workaround is to always create a new container PID namespace 
> (as a child of the user namespace) with the {{namespaces/pid}} isolator.
> 2. The containerizer needs to hold {{CAP_SYS_ADMIN}} over the network 
> namespace to mount {{sysfs}}. There's no general workaround for this since we 
> can't generally require containers to not join the host network namespace.
> 3. The containerizer can't enter a user namespace after entering the 
> {{chroot}}. This restriction makes it impossible to keep the existing order 
> of containerizer operations in the case where we want the executor to be 
> in a new user namespace that has no children (i.e. to protect the container 
> from a privileged task).
> After some discussion with [~jieyu], we believe that we can solve most or all 
> of these issues by creating a new containerizer supervisor that runs fully 
> outside the container and is responsible for constructing the rootfs mount 
> namespace, launching the containerizer to enter the rest of the container, 
> and waiting on the entered process.
> Since this new supervisor process is not running in the user namespace, it 
> will be able to construct the container rootfs in a new mount namespace 
> without user namespace restrictions. We can then clone a child to fully 
> create and enter container namespaces along with the prefabricated rootfs 
> mount namespace.
> The only drawback to this approach is that the container's mount namespace 
> will be owned by the root user namespace rather than the container user 
> namespace. We are OK with this for now.
> The plan here is to retain the existing {{mesos-containerizer launch}} 
> subcommand and add a new {{mesos-containerizer supervise}} subcommand, which 
> will be its parent process. This new subcommand will be used for the default 
> executor and custom executor code paths.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
