David Robinson created MESOS-6563:
-------------------------------------

             Summary: Shared Filesystem Isolator does not clean up mounts
                 Key: MESOS-6563
                 URL: https://issues.apache.org/jira/browse/MESOS-6563
             Project: Mesos
          Issue Type: Bug
          Components: isolation
            Reporter: David Robinson


While testing the agent's 'filesystem/shared' isolator we discovered that 
mounts are never unmounted: agents ended up with 1000s of mounts, one for each 
task that has run.

To reproduce the problem, start a Mesos agent with 
--isolation="filesystem/shared" and 
--default_container_info="file:///tmp/the-container-info-below.json", then 
launch and kill several tasks. After the tasks are killed the mount points 
should be unmounted, but they are not.

{noformat:title=container info}
{
    "type": "MESOS",
    "volumes": [
        {
            "container_path": "/tmp",
            "host_path": "tmp",
            "mode": "RW"
        }
    ]
}
{noformat}
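
To confirm the reproduction, the number of sandbox volume mounts can be compared before and after a batch of tasks. The helper below is only a sketch: the function name is made up for illustration, and the pattern assumes mount points of the form .../runs/<container-id>/tmp, which is what the container info above produced on our agents.

```shell
#!/bin/sh
# Count mount entries whose mount point looks like a task sandbox volume,
# i.e. ".../runs/<container-id>/tmp" (assumed layout, taken from this report).
# Usage: count_sandbox_mounts [MOUNTS_FILE]   (defaults to /proc/mounts)
count_sandbox_mounts() {
    mounts_file=${1:-/proc/mounts}
    # Field 2 of each line in a mounts file is the mount point.
    awk '$2 ~ "/runs/[^/]+/tmp$" { n++ } END { print n + 0 }' "$mounts_file"
}

# Example use around a test run:
#   before=$(count_sandbox_mounts)
#   ... launch and kill several tasks ...
#   after=$(count_sandbox_mounts)
#   echo "mounts left behind: $((after - before))"
```

If the isolator cleaned up correctly, the difference should return to zero once all tasks have exited.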

Mounts are supposed to be [cleaned up automatically by the kernel when the process 
exits|https://github.com/apache/mesos/blob/3845ab8af83a6eebfbf32e98f9000ab695cf2661/src/slave/containerizer/mesos/isolators/filesystem/shared.cpp#L70],
but I suspect the process that created the mounts in the mesos agent...

{noformat}
// We only need to implement the `prepare()` function in this
// isolator. There is nothing to recover because we do not keep any
// state and do not monitor filesystem usage or perform any action on
// cleanup. Cleanup of mounts is done automatically done by the kernel
// when the mount namespace is destroyed after the last process
// terminates.
Future<Option<ContainerLaunchInfo>> SharedFilesystemIsolatorProcess::prepare(
    const ContainerID& containerId,
    const ContainerConfig& containerConfig)
{
{noformat}
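
The comment above relies on each container having its own mount namespace. One way to check that assumption on a running agent is to compare the namespace links under /proc. This is a sketch, not part of Mesos: the function names and the PROC_ROOT override are hypothetical, and reading another process's ns link normally requires root.

```shell
#!/bin/sh
# Print the mount-namespace identifier of a process, e.g. "mnt:[4026531840]".
# PROC_ROOT is a hypothetical override so the helper can be pointed at a
# fake directory tree; it defaults to the real /proc.
mount_ns_of() {
    readlink "${PROC_ROOT:-/proc}/$1/ns/mnt"
}

# Exit 0 when both PIDs share one mount namespace, 1 when they differ,
# and 2 when either namespace link cannot be read.
same_mount_ns() {
    a=$(mount_ns_of "$1") || return 2
    b=$(mount_ns_of "$2") || return 2
    [ "$a" = "$b" ]
}

# Example: compare the agent with one of its executors.
#   same_mount_ns "$agent_pid" "$executor_pid" && echo "shared namespace"
```

If an executor shared the agent's namespace, its mounts would not go away when the executor exits, because the namespace itself would still be alive.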

We found during testing that an agent would have 1000s of dangling mounts, all 
of them attributed to the mesos agent:

{noformat}
root[7]server-001 ~ # tail /proc/mounts
/dev/sda1 /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-0000000019-0000/executors/thermos-drobinson-test-sleep2-0-dda59747-848a-4b3b-8424-d0032f8a38f7/runs/e31bea31-22d7-4758-bc8b-6837919d7ed7/tmp xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
/dev/sda1 /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-0000000019-0000/executors/thermos-drobinson-test-sleep2-0-3a001926-a442-45c4-9cbc-dad182954fed/runs/bd0a8e36-d147-4511-9cc5-afff9f1c0fbe/tmp xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
/dev/sda1 /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-0000000019-0000/executors/thermos-drobinson-test-sleep2-0-04204a72-53d8-44a8-bac5-613835ff85a7/runs/967739ea-5284-41ed-af1a-1cb5a77dd690/tmp xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
/dev/sda1 /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-0000000019-0000/executors/thermos-drobinson-test-sleep2-0-95d1ac39-323a-4c15-b1dc-645ed79c4128/runs/6ff6d2b3-2867-4ad4-b2bb-20e27a0fa925/tmp xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
/dev/sda1 /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-0000000019-0000/executors/thermos-drobinson-test-sleep2-0-91f6a946-f560-43a3-95c2-424c5dd71684/runs/a4821acc-58f8-4457-bdc9-bd83bdeb8231/tmp xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
/dev/sda1 /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-0000000019-0000/executors/thermos-drobinson-test-sleep2-0-dd3b34f1-10c6-43d3-8741-a3164a642e93/runs/0ef8cf17-6c18-48a4-9943-66c448de5d44/tmp xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
/dev/sda1 /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-0000000019-0000/executors/thermos-drobinson-test-sleep2-0-fb704ef8-1cf9-4d35-854d-7b6247cf4bc2/runs/e65ec976-057f-4939-9053-1ddcddfc98f8/tmp xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
/dev/sda1 /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-0000000019-0000/executors/thermos-drobinson-test-sleep2-0-cdf7b06d-2265-41fe-b1e9-84366dc88b62/runs/1bed4289-7442-4a91-bf45-a7de10ab79bb/tmp xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
/dev/sda1 /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-0000000019-0000/executors/thermos-drobinson-test-sleep2-0-58582496-e551-4d80-8ae5-9eacac5e8a36/runs/6b5a7f56-af89-4eab-bbfa-883ca43744ad/tmp xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
/dev/sda1 /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-0000000019-0000/executors/thermos-drobinson-test-sleep2-0-5d6bc25a-6ba7-48f9-9655-85da6ff0a383/runs/d5cc4b31-7876-4bca-b1fa-b177c5d88bfc/tmp xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
root[7]server-001 ~ # grep -c 'drobinson-test-sleep2' /proc/mounts
4950
root[7]server-001 ~ # pgrep -f /usr/local/bin/mesos-slave
27799
root[7]server-001 ~ # wc -l /proc/27799/mounts
5079 /proc/27799/mounts
root[7]server-001 ~ # grep -c 'drobinson-test-sleep2' /proc/27799/mounts
4950
root[7]server-001 ~ # ps auxww | grep 'drobinson-test-sleep2' -c
5
{noformat}
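
The gap between 4950 mounts and a handful of live processes suggests that most of these mount points belong to containers that no longer exist. A rough way to list them is sketched below. It assumes the caller already has a file with one live container ID per line (how that list is obtained is left open here), and the function name is hypothetical.

```shell
#!/bin/sh
# Print sandbox mount points whose container ID is not in the live list.
# Usage: stale_sandbox_mounts MOUNTS_FILE LIVE_IDS_FILE
# Assumes mount points shaped like ".../runs/<container-id>/tmp".
stale_sandbox_mounts() {
    mounts_file=$1
    live_ids_file=$2
    awk -v live="$live_ids_file" '
        # Load the live container IDs into a lookup table.
        BEGIN { while ((getline id < live) > 0) alive[id] = 1 }
        {
            n = split($2, part, "/")
            # The container ID is the path component between "runs" and "tmp".
            if (n >= 3 && part[n] == "tmp" && part[n - 2] == "runs" &&
                !(part[n - 1] in alive))
                print $2
        }' "$mounts_file"
}

# Example: stale_sandbox_mounts /proc/mounts /tmp/live-container-ids.txt
```

The output is only a report. Nothing is unmounted, so it is safe to run against a production agent.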



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
