[jira] [Commented] (MESOS-8951) Flaky `AgentContainerAPITest.RecoverNestedContainer`

2018-06-27 Thread Jie Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-8951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16525319#comment-16525319
 ] 

Jie Yu commented on MESOS-8951:
---

I think we should add a method to `cluster::Slave` that lets a test wait until the agent is ready:
```
Future<Nothing> ready();
```
so that in a test we can write:
```
AWAIT_READY(slave->ready());
```
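
To make the proposal concrete, here is a minimal, self-contained sketch of the idea. It is only an illustration: `FakeAgent`, `finishReregistration()` and `isReady()` are hypothetical names, and the sketch uses `std::promise` / `std::shared_future` from the standard library rather than libprocess, so it is not the real `cluster::Slave` API.

```cpp
#include <cassert>
#include <chrono>
#include <future>

// Hypothetical stand-in for a test wrapper around an agent. This is NOT
// the real cluster::Slave; it only models "expose a future that completes
// once the agent is ready".
class FakeAgent
{
public:
  FakeAgent() : ready_(promise_.get_future().share()) {}

  // Returns a future that completes once re-registration has finished.
  // A shared_future can be handed out to any number of waiters.
  std::shared_future<void> ready() const { return ready_; }

  // Assumed to be called exactly once, at the end of the (simulated)
  // re-registration handler. A second call would throw std::future_error.
  void finishReregistration() { promise_.set_value(); }

private:
  std::promise<void> promise_;       // Declared first: initialized first.
  std::shared_future<void> ready_;
};

// Helper: non-blocking check of whether a future has completed.
inline bool isReady(const std::shared_future<void>& future)
{
  return future.wait_for(std::chrono::seconds(0)) ==
    std::future_status::ready;
}
```

With something like this, a test would block on `agent.ready()` (the analogue of `AWAIT_READY(slave->ready())`) before issuing any API call, instead of inferring readiness from a message that was merely observed on the wire.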

> Flaky `AgentContainerAPITest.RecoverNestedContainer`
> 
>
> Key: MESOS-8951
> URL: https://issues.apache.org/jira/browse/MESOS-8951
> Project: Mesos
> Issue Type: Bug
> Environment: internal CI
>   master-668030da
> Reporter: Andrei Budnik
> Priority: Major
> Labels: flaky, flaky-test
> Attachments: 
> AgentContainerAPITest.RecoverNestedContainer-badrun1.txt, 
> AgentContainerAPITest.RecoverNestedContainer-badrun2.txt
>
>
> {code:java}
> [  FAILED  ] 
> ParentChildContainerTypeAndContentType/AgentContainerAPITest.RecoverNestedContainer/9,
>  where GetParam() = (1, 0, application/json, 
> ("cgroups/cpu,cgroups/mem,filesystem/linux,namespaces/pid", "linux", 
> "ROOT_CGROUPS_")) (15297 ms)
> [  FAILED  ] 
> ParentChildContainerTypeAndContentType/AgentContainerAPITest.RecoverNestedContainer/13,
>  where GetParam() = (1, 1, application/json, 
> ("cgroups/cpu,cgroups/mem,filesystem/linux,namespaces/pid", "linux", 
> "ROOT_CGROUPS_")) (15275 ms){code}
> {code:java}
> ../../src/tests/agent_container_api_tests.cpp:596
> Failed to wait 15secs for wait
> {code}
> There is no call of `WAIT_CONTAINER` in agent logs. It looks like the request 
> wasn't delivered to the agent.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-8951) Flaky `AgentContainerAPITest.RecoverNestedContainer`

2018-06-27 Thread Jie Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-8951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16525298#comment-16525298
 ] 

Jie Yu commented on MESOS-8951:
---

It looks to me like this is a race: waiting for the `SlaveReregisteredMessage` 
to be received does not guarantee that the agent is ready to process API 
calls. We need to make sure `reregistered` has been called and has finished 
before we issue the wait nested container API call.
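
The ordering problem can be shown with a small single-threaded model. This is a sketch under stated assumptions, not Mesos code: `ModelAgent` and its members are hypothetical, the mailbox stands in for an actor's message queue, and the rule that API calls are only served after the handler has run is an assumption of the model.

```cpp
#include <cassert>
#include <functional>
#include <queue>

// Hypothetical model of an actor with a mailbox. Receiving a message and
// running its handler are two separate steps, which is where the race is.
struct ModelAgent
{
  std::queue<std::function<void()>> mailbox;
  bool reregistered = false;

  // Step 1: the message arrives. A test that waits only for "message
  // received" resumes here, before the handler has run.
  void receiveReregisteredMessage()
  {
    mailbox.push([this]() { reregistered = true; });
  }

  // Step 2: the agent gets around to running the next queued handler.
  // Returns false if there was nothing to run.
  bool runOne()
  {
    if (mailbox.empty()) {
      return false;
    }
    std::function<void()> handler = mailbox.front();
    mailbox.pop();
    handler();
    return true;
  }

  // Model assumption: API calls are only served after re-registration.
  bool canServeApiCall() const { return reregistered; }
};
```

Between `receiveReregisteredMessage()` and `runOne()` the model agent has received the message but `canServeApiCall()` is still false, which is the window the test can fall into.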



