[jira] [Commented] (MESOS-8522) `prepareMounts` in Mesos containerizer is flaky.
[ https://issues.apache.org/jira/browse/MESOS-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825419#comment-16825419 ]

Gilbert Song commented on MESOS-8522:
-------------------------------------

Probably we could simply check os::exists(mount.target) for this case?

> `prepareMounts` in Mesos containerizer is flaky.
> ------------------------------------------------
>
>                 Key: MESOS-8522
>                 URL: https://issues.apache.org/jira/browse/MESOS-8522
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>    Affects Versions: 1.5.0
>            Reporter: Chun-Hung Hsiao
>            Assignee: Jie Yu
>            Priority: Major
>              Labels: mesosphere, storage
>
> The [{{prepareMounts()}}|https://github.com/apache/mesos/blob/1.5.x/src/slave/containerizer/mesos/launch.cpp#L244] function in {{src/slave/containerizer/mesos/launch.cpp}} sometimes fails with the following error:
> {noformat}
> Failed to prepare mounts: Failed to mark '/home/docker/containers/af78db6ebc1aff572e576b773d1378121a66bb755ed63b3278e759907e5fe7b6/shm' as slave: Invalid argument
> {noformat}
> The error message comes from https://github.com/apache/mesos/blob/1.5.x/src/slave/containerizer/mesos/launch.cpp#L#L326.
> Although it does not happen frequently, it can be reproduced by repeatedly running tests that need to clone mount namespaces. For example, I just reproduced the bug with the following command after 17 minutes:
> {noformat}
> sudo bin/mesos-tests.sh --gtest_filter='*ROOT_PublishResourcesRecovery' --gtest_break_on_failure --gtest_repeat=-1 --verbose
> {noformat}
> Note that in this example, the test itself does not involve any docker image or docker containerizer.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
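A minimal sketch of the existence check suggested in this comment, not the actual Mesos change. `pathExists` is a hypothetical stand-in for `os::exists` (assumed here to be an `lstat`-based test), and `canIgnoreMarkFailure` is an illustrative helper name that does not appear in the issue.

```cpp
#include <sys/stat.h>

#include <string>

// Hypothetical stand-in for `os::exists`: true if the path can be lstat'ed.
bool pathExists(const std::string& path)
{
  struct stat s;
  return ::lstat(path.c_str(), &s) == 0;
}

// Illustrative helper: after a failed "mark as slave", treat the failure as
// benign only if the mount target has disappeared in the meantime.
bool canIgnoreMarkFailure(const std::string& target)
{
  return !pathExists(target);
}
```

One caveat with this approach: unmounting does not remove the mount point directory, so the path can still exist after the mount entry is gone. The reported error is "Invalid argument" rather than "No such file or directory", which may indicate exactly that situation, so a path check alone might not cover every case.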
[jira] [Commented] (MESOS-8522) `prepareMounts` in Mesos containerizer is flaky.
[ https://issues.apache.org/jira/browse/MESOS-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825418#comment-16825418 ]

Gilbert Song commented on MESOS-8522:
-------------------------------------

[~chhsia0] [~bbannier] What is the priority of this issue? Does it only happen when there is a race with flapping docker containers?
[jira] [Commented] (MESOS-8522) `prepareMounts` in Mesos containerizer is flaky.
[ https://issues.apache.org/jira/browse/MESOS-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825335#comment-16825335 ]

Benjamin Bannier commented on MESOS-8522:
-----------------------------------------

[~jieyu], are you working on this? If not, let's talk with, e.g., [~gilbert] to get this onto somebody else's plate.
[jira] [Commented] (MESOS-8522) `prepareMounts` in Mesos containerizer is flaky.
[ https://issues.apache.org/jira/browse/MESOS-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357239#comment-16357239 ]

Greg Mann commented on MESOS-8522:
----------------------------------

As a mitigation, we could re-scan the mount table after the first pass, and allow these failures if the failed entry no longer exists.

> `prepareMounts` in Mesos containerizer is flaky.
> ------------------------------------------------
>
>                 Key: MESOS-8522
>                 URL: https://issues.apache.org/jira/browse/MESOS-8522
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>    Affects Versions: 1.5.0
>            Reporter: Chun-Hung Hsiao
>            Assignee: Jie Yu
>            Priority: Critical
>              Labels: mesosphere, storage
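The re-scan mitigation described in this comment could look roughly like the sketch below. It is an illustration under stated assumptions, not code from Mesos: `MountEntry`, `markAllTolerant`, and the injected `mark` and `rescan` callbacks are hypothetical names, and the callbacks stand in for the real "mark as slave" call and mount table read so that the logic can be exercised without root.

```cpp
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for one mount table entry; only the target matters here.
struct MountEntry
{
  std::string target;
};

// First pass: try to mark every scanned entry via `mark` (true on success).
// Second pass: re-read the mount table via `rescan` and keep only those
// failures whose target is still listed. Returns the targets that failed
// for a reason other than the mount having vanished.
std::vector<std::string> markAllTolerant(
    const std::vector<MountEntry>& scanned,
    const std::function<bool(const std::string&)>& mark,
    const std::function<std::vector<MountEntry>()>& rescan)
{
  std::vector<std::string> failed;
  for (const MountEntry& entry : scanned) {
    if (!mark(entry.target)) {
      failed.push_back(entry.target);
    }
  }

  if (failed.empty()) {
    return {};
  }

  // Only re-scan when something failed, to keep the common path cheap.
  std::set<std::string> current;
  for (const MountEntry& entry : rescan()) {
    current.insert(entry.target);
  }

  std::vector<std::string> errors;
  for (const std::string& target : failed) {
    if (current.count(target) > 0) {
      errors.push_back(target);
    }
  }

  return errors;
}
```

An empty result means every failure was explained by a mount that disappeared between the two scans. The re-scan is itself only a snapshot, so a mount that vanishes and reappears between the passes would still be reported.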
[jira] [Commented] (MESOS-8522) `prepareMounts` in Mesos containerizer is flaky.
[ https://issues.apache.org/jira/browse/MESOS-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348986#comment-16348986 ]

Jie Yu commented on MESOS-8522:
-------------------------------

Looking at the box, there seems to have been a flapping docker container, which explains this: the mount entry disappears after we scan the mount table but before we mark that entry as a slave mount.

> `prepareMounts` in Mesos containerizer is flaky.
> ------------------------------------------------
>
>                 Key: MESOS-8522
>                 URL: https://issues.apache.org/jira/browse/MESOS-8522
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>    Affects Versions: 1.5.0
>            Reporter: Chun-Hung Hsiao
>            Priority: Critical
>              Labels: mesosphere, storage