James Peach created MESOS-9319:
----------------------------------

             Summary: Create all container devices at isolation time
                 Key: MESOS-9319
                 URL: https://issues.apache.org/jira/browse/MESOS-9319
             Project: Mesos
          Issue Type: Bug
          Components: containerization
         Environment: When using a custom user namespace isolator, the task 
fails at launch because opening devices fails with a {{EPERM}} error. This 
problem is described in [this systemd pull 
request|https://github.com/systemd/systemd/pull/9483] and this [lxd 
issue|https://github.com/lxc/lxd/issues/4950].

The problem arises in the Mesos containerizer due to the order of operations:

# Clone the containerizer with CLONE_NEWNS
# Mount a tmpfs for the devices
# mknod for the various device nodes

Referring back to the lxd issue, because we do (1) before (2), the tmpfs on 
/dev is marked SB_I_NODEV. Due to the new 4.18 behavior, the mknod in (3) now 
succeeds (see commit 
[55956b59df33|https://github.com/torvalds/linux/commit/55956b59df336f6738da916dbb520b6e37df9fbd]).
 Previously it would fail and we would fall back to bind mounting the device. 
However, even though we created the device, we can't actually open it due to 
the SB_I_NODEV flag on the tmpfs mount. It appears that the purpose of allowing 
mknod is so that containers can create overlayfs whiteouts.
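
The interaction described above can be summarized as a small decision table. 
The following is a toy model only, not kernel or Mesos code: the function name 
{{device_outcome}} and its parameters are hypothetical, and it assumes (as 
described above) that a tmpfs mounted after the clone is marked SB_I_NODEV and 
that mknod on such a mount starts succeeding in 4.18.

```python
from typing import Tuple


def device_outcome(kernel: Tuple[int, int], tmpfs_mounted_after_clone: bool) -> str:
    """Toy model of what happens to a device node created on the /dev tmpfs.

    kernel is a (major, minor) version tuple. This mirrors the prose
    description only; it performs no system calls.
    """
    # Assumption from the description: mounting the tmpfs after the clone
    # leaves its superblock flagged SB_I_NODEV.
    sb_i_nodev = tmpfs_mounted_after_clone
    if not sb_i_nodev:
        return "mknod ok, open ok"

    # Before 4.18 the mknod failed, which triggered the bind mount fallback.
    if kernel < (4, 18):
        return "mknod fails, fall back to bind mount"

    # From 4.18 the mknod succeeds, so there is no fallback, but the node
    # still cannot be opened because of SB_I_NODEV.
    return "mknod ok, open fails with EPERM"


if __name__ == "__main__":
    for version in [(4, 17), (4, 18)]:
        print(version, device_outcome(version, tmpfs_mounted_after_clone=True))
```

The point of the model is that the 4.18 change removes the failure that used to 
route us to the working bind mount path, without making the created node usable.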

One approach to deal with this in the Mesos containerizer is to complete the 
device node cleanup that was begun with the linux/devices isolator. This 
approach involves moving all the responsibility for creating devices back to 
the isolators. Then, at containerization time, we simply bind-mount the whole 
of /dev from the per-container staging area. Since the isolators create the 
devices in the host namespace and in the Mesos work directory, none of the 
conditions that trigger the failure would arise.
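
A minimal sketch of the staging idea follows, assuming a hypothetical 
per-container staging path and an illustrative device list (the set of devices 
Mesos actually creates is not stated here). The names {{DeviceNode}}, 
{{plan_devices}}, {{create_devices}} and {{bind_mount_argv}} are made up for 
this example. Only the planning step is side-effect free; {{create_devices}} 
needs CAP_MKNOD in the host namespace.

```python
import os
import stat
from typing import List, NamedTuple


class DeviceNode(NamedTuple):
    path: str  # where the node is created in the staging area
    mode: int  # file type and permission bits
    dev: int   # combined major/minor device number


# Illustrative subset of standard Linux character devices: name -> (major, minor).
STANDARD_DEVICES = {
    "null": (1, 3),
    "zero": (1, 5),
    "full": (1, 7),
    "random": (1, 8),
    "urandom": (1, 9),
    "tty": (5, 0),
}


def plan_devices(staging_dev: str) -> List[DeviceNode]:
    """List the nodes an isolator would create under the staging /dev."""
    return [
        DeviceNode(
            os.path.join(staging_dev, name),
            stat.S_IFCHR | 0o666,
            os.makedev(major, minor),
        )
        for name, (major, minor) in sorted(STANDARD_DEVICES.items())
    ]


def create_devices(plan: List[DeviceNode]) -> None:
    """Create the planned nodes on the host, before any namespace is entered."""
    for node in plan:
        os.mknod(node.path, node.mode, node.dev)
        # mknod applies the umask, so set the permission bits explicitly.
        os.chmod(node.path, stat.S_IMODE(node.mode))


def bind_mount_argv(staging_dev: str, container_rootfs: str) -> List[str]:
    """Command that would bind-mount the staged /dev into the container."""
    return ["mount", "--rbind", staging_dev, os.path.join(container_rootfs, "dev")]
```

Because the nodes live on a filesystem mounted in the host namespace, the 
container only ever sees them through the bind mount and never calls mknod on 
its own tmpfs.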

            Reporter: James Peach

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
