[jira] [Commented] (MESOS-8522) `prepareMounts` in Mesos containerizer is flaky.

2019-04-24 Thread Gilbert Song (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825419#comment-16825419
 ] 

Gilbert Song commented on MESOS-8522:
-

Probably we could simply check os::exists(mount.target) for this case?
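
A minimal sketch of what such a check might look like, not the actual Mesos change. The names `markOrSkip`, `MarkResult`, and `pathExists` are hypothetical, `pathExists` uses lstat(2) as a stand-in for stout's os::exists(), and the marking step is passed in as a callback so the logic can be exercised without root:

```cpp
#include <sys/stat.h>

#include <cassert>
#include <functional>
#include <string>

// Outcome of trying to mark one mount table entry (hypothetical type).
enum class MarkResult { MARKED, SKIPPED_VANISHED, FAILED };

// Stand-in for stout's os::exists(): true if `path` still resolves.
inline bool pathExists(const std::string& path)
{
  struct stat s;
  return ::lstat(path.c_str(), &s) == 0;
}

// Calls `markSlave(target)` (assumed to return 0 on success, like mount(2))
// and tolerates a failure only if the target path has disappeared.
inline MarkResult markOrSkip(
    const std::string& target,
    const std::function<int(const std::string&)>& markSlave)
{
  assert(static_cast<bool>(markSlave)); // A callback must be supplied.

  if (markSlave(target) == 0) {
    return MarkResult::MARKED;
  }

  return pathExists(target) ? MarkResult::FAILED
                            : MarkResult::SKIPPED_VANISHED;
}
```

One caveat: a mount point directory can outlive its mount. The reported error is "Invalid argument" rather than "No such file or directory", which may mean the directory was still present when the mark failed. In that case an existence check alone would not distinguish the race from a real failure.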

> `prepareMounts` in Mesos containerizer is flaky.
> 
>
> Key: MESOS-8522
> URL: https://issues.apache.org/jira/browse/MESOS-8522
> Project: Mesos
> Issue Type: Bug
> Components: containerization
> Affects Versions: 1.5.0
> Reporter: Chun-Hung Hsiao
> Assignee: Jie Yu
> Priority: Major
> Labels: mesosphere, storage
>
> The 
> [{{prepareMounts()}}|https://github.com/apache/mesos/blob/1.5.x/src/slave/containerizer/mesos/launch.cpp#L244]
>  function in {{src/slave/containerizer/mesos/launch.cpp}} sometimes fails 
> with the following error:
> {noformat}
> Failed to prepare mounts: Failed to mark 
> '/home/docker/containers/af78db6ebc1aff572e576b773d1378121a66bb755ed63b3278e759907e5fe7b6/shm'
>  as slave: Invalid argument
> {noformat}
> The error message comes from 
> https://github.com/apache/mesos/blob/1.5.x/src/slave/containerizer/mesos/launch.cpp#L326.
> Although it does not happen frequently, it can be reproduced by repeatedly 
> running tests that need to clone mount namespaces. For example, I just 
> reproduced the bug with the following command after 17 minutes:
> {noformat}
> sudo bin/mesos-tests.sh --gtest_filter='*ROOT_PublishResourcesRecovery' 
> --gtest_break_on_failure --gtest_repeat=-1 --verbose
> {noformat}
> Note that in this example, the test itself does not involve any docker image or 
> docker containerizer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-8522) `prepareMounts` in Mesos containerizer is flaky.

2019-04-24 Thread Gilbert Song (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825418#comment-16825418
 ] 

Gilbert Song commented on MESOS-8522:
-

[~chhsia0] [~bbannier] What is the priority of this issue? Does it only happen 
when there is a race with flapping docker containers?






[jira] [Commented] (MESOS-8522) `prepareMounts` in Mesos containerizer is flaky.

2019-04-24 Thread Benjamin Bannier (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825335#comment-16825335
 ] 

Benjamin Bannier commented on MESOS-8522:
-

[~jieyu], are you working on this? If not, let's talk with, e.g., [~gilbert] to 
get this onto somebody else's plate.






[jira] [Commented] (MESOS-8522) `prepareMounts` in Mesos containerizer is flaky.

2018-02-08 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357239#comment-16357239
 ] 

Greg Mann commented on MESOS-8522:
--

As a mitigation, we could re-scan the mount table after the first pass, and 
allow these failures if the failed entry no longer exists.
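
The mitigation could be sketched roughly as follows. This is an illustration under stated assumptions, not Mesos code: `parseMountTargets` and `fatalFailures` are hypothetical helpers, and the table is parsed from text in /proc/[pid]/mountinfo format (where the mount point is the fifth field) instead of going through Mesos' own mount table reader:

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Extracts the mount point (fifth field) from each line of text in
// /proc/[pid]/mountinfo format. Octal escapes such as \040 are not decoded.
inline std::set<std::string> parseMountTargets(const std::string& mountinfo)
{
  std::set<std::string> targets;
  std::istringstream lines(mountinfo);
  std::string line;

  while (std::getline(lines, line)) {
    std::istringstream fields(line);
    std::string field;
    for (int i = 0; i < 5 && (fields >> field); ++i) {
      if (i == 4) {
        targets.insert(field);
      }
    }
  }

  return targets;
}

// Returns the failed targets that are still present in the re-scanned
// table. Only these would be reported as errors; failures for entries
// that have vanished in the meantime are ignored.
inline std::vector<std::string> fatalFailures(
    const std::vector<std::string>& failed,
    const std::set<std::string>& rescanned)
{
  std::vector<std::string> fatal;
  for (const std::string& target : failed) {
    if (rescanned.count(target) > 0) {
      fatal.push_back(target);
    }
  }
  return fatal;
}
```

The first pass would collect the targets that could not be marked instead of failing immediately. The second scan then decides which of those failures matter.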

> `prepareMounts` in Mesos containerizer is flaky.
> 
>
> Key: MESOS-8522
> URL: https://issues.apache.org/jira/browse/MESOS-8522
> Project: Mesos
> Issue Type: Bug
> Components: containerization
> Affects Versions: 1.5.0
> Reporter: Chun-Hung Hsiao
> Assignee: Jie Yu
> Priority: Critical
> Labels: mesosphere, storage
>





[jira] [Commented] (MESOS-8522) `prepareMounts` in Mesos containerizer is flaky.

2018-02-01 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348986#comment-16348986
 ] 

Jie Yu commented on MESOS-8522:
---

Looking at the box, there seemed to be a flapping docker container, which 
explains this: the mount entry disappears after we scan the mount table but 
before we mark it as a slave mount.
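
The scan-then-mark race can be modeled with a toy in-memory table. `FakeMountTable` and `markAll` are hypothetical names for illustration only. Returning EINVAL for a missing entry is a simplification chosen to match the "Invalid argument" error in the report:

```cpp
#include <cerrno>
#include <set>
#include <string>
#include <vector>

// Toy in-memory stand-in for the kernel mount table (not a Mesos type).
class FakeMountTable
{
public:
  void mount(const std::string& target) { targets.insert(target); }
  void unmount(const std::string& target) { targets.erase(target); }

  // Step 1 of the race: take a point-in-time copy of the entries.
  std::vector<std::string> snapshot() const
  {
    return std::vector<std::string>(targets.begin(), targets.end());
  }

  // Step 2: act on one entry. Returns 0, or EINVAL if the entry is gone.
  int markSlave(const std::string& target) const
  {
    return targets.count(target) > 0 ? 0 : EINVAL;
  }

private:
  std::set<std::string> targets;
};

// Marks every entry of `snapshot` against the live table and returns the
// targets for which marking failed.
inline std::vector<std::string> markAll(
    const FakeMountTable& table,
    const std::vector<std::string>& snapshot)
{
  std::vector<std::string> failed;
  for (const std::string& target : snapshot) {
    if (table.markSlave(target) != 0) {
      failed.push_back(target);
    }
  }
  return failed;
}
```

If an entry is unmounted between `snapshot()` and `markAll()`, it shows up in the returned list even though nothing is wrong with the remaining mounts.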

> `prepareMounts` in Mesos containerizer is flaky.
> 
>
> Key: MESOS-8522
> URL: https://issues.apache.org/jira/browse/MESOS-8522
> Project: Mesos
> Issue Type: Bug
> Components: containerization
> Affects Versions: 1.5.0
> Reporter: Chun-Hung Hsiao
> Priority: Critical
> Labels: mesosphere, storage
>


