[ 
https://issues.apache.org/jira/browse/MESOS-9319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach updated MESOS-9319:
-------------------------------
    Comment: was deleted

(was: When using a custom user namespace isolator, the task fails at launch 
because opening devices fails with an {{EPERM}} error. This problem is described 
in [this systemd issue|https://github.com/systemd/systemd/pull/9483] and this 
[lxd issue|https://github.com/lxc/lxd/issues/4950].

The problem arises in the Mesos containerizer due to the order of operations:

# Clone the containerizer with CLONE_NEWNS
# Mount a tmpfs for the devices
# mknod for the various device nodes

Referring back to the lxd issue, because we do (1) before (2), the tmpfs on 
/dev is marked SB_I_NODEV. Due to the new 4.18 behavior, the mknod in (3) now 
succeeds (see commit 
[55956b59df33|https://github.com/torvalds/linux/commit/55956b59df336f6738da916dbb520b6e37df9fbd]).
 Previously it would fail and we would fall back to bind mounting the device. 
However, even though we created the device, we can't actually open it due to 
the SB_I_NODEV flag on the tmpfs mount. It appears that the purpose of allowing 
mknod is so that containers can create overlayfs whiteouts.
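
To make the failure mode concrete, here is a minimal Python sketch of the create-then-fall-back logic described above. It is not Mesos code and not the fix proposed below. The names {{can_open}} and {{create_device}}, and the {{mknod}}, {{bind_mount}} and {{probe}} callables, are hypothetical stand-ins for the real privileged operations. The sketch assumes the kernel's refusal surfaces as a permission error on open. It shows why a fallback keyed only on mknod failing stops working once mknod succeeds, and how probing the node afterwards would restore it.

```python
import os


def can_open(path):
    """Return True if `path` can be opened read-only.

    Hypothetical probe: a device node on a tmpfs marked SB_I_NODEV can be
    created but not opened. PermissionError covers both EPERM and EACCES.
    """
    try:
        fd = os.open(path, os.O_RDONLY)
    except PermissionError:
        return False
    os.close(fd)
    return True


def create_device(path, mknod, bind_mount, probe=can_open):
    """Create a device at `path`; return "mknod" or "bind".

    `mknod(path)` and `bind_mount(path)` are caller-supplied callables
    (assumptions for illustration) that perform the privileged work.
    """
    try:
        mknod(path)
    except PermissionError:
        # Pre-4.18 behavior: mknod is refused, so bind mount the host device.
        bind_mount(path)
        return "bind"

    # 4.18+ behavior: mknod may succeed while the node is still unusable.
    # Without this probe the bind-mount fallback would never run.
    if not probe(path):
        # The node just created serves as the bind mount target.
        bind_mount(path)
        return "bind"

    return "mknod"
```

Passing the operations in as callables keeps the sketch runnable without privileges. A real implementation would call mknod(2) and mount(2) directly.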

One approach to deal with this in the Mesos containerizer is to complete the 
device node cleanup that was begun with the linux/devices isolator. This 
approach involves moving all the responsibility for creating devices back to 
the isolators. Then, at containerization time, we simply bind-mount the whole 
of /dev from the per-container staging area. Since the isolators create the 
devices in the host namespace and on the Mesos work directory, none of the 
conditions that trigger the failure would be invoked.

The failure we observed was that our tasks could not open {{/dev/null}} 
when redirecting it as standard input to a child process.)

> Create all container devices at isolation time
> ----------------------------------------------
>
>                 Key: MESOS-9319
>                 URL: https://issues.apache.org/jira/browse/MESOS-9319
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>            Reporter: James Peach
>            Priority: Major
>
> When using a custom user namespace isolator, the task fails at launch because 
> opening devices fails with an EPERM error. This problem is described in this 
> systemd issue and this lxd issue.
> The problem arises in the Mesos containerizer due to the order of operations:
> 1. Clone the containerizer with CLONE_NEWNS
> 2. Mount a tmpfs for the devices
> 3. mknod for the various device nodes
> Referring back to the lxd issue, because we do (1) before (2), the tmpfs on 
> /dev is marked SB_I_NODEV. Due to the new 4.18 behavior, the mknod in (3) now 
> succeeds (see commit 55956b59df33). Previously it would fail and we would 
> fall back to bind mounting the device. However, even though we created the 
> device, we can't actually open it due to the SB_I_NODEV flag on the tmpfs 
> mount. It appears that the purpose of allowing mknod is so that containers 
> can create overlayfs whiteouts.
> One approach to deal with this in the Mesos containerizer is to complete the 
> device node cleanup that was begun with the linux/devices isolator. This 
> approach involves moving all the responsibility for creating devices back to 
> the isolators. Then, at containerization time, we simply bind-mount the whole 
> of /dev from the per-container staging area. Since the isolators create the 
> devices in the host namespace and on the Mesos work directory, none of the 
> conditions that trigger the failure would be invoked.
> The failure we observed was that our tasks could not open /dev/null when 
> redirecting it as standard input to a child process.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
