Chun-Hung Hsiao created MESOS-9719:
--------------------------------------

             Summary: Test `AgentFailoverHTTPExecutorUsingResourceProviderResources` is flaky.
                 Key: MESOS-9719
                 URL: https://issues.apache.org/jira/browse/MESOS-9719
             Project: Mesos
          Issue Type: Bug
            Reporter: Chun-Hung Hsiao
            Assignee: Chun-Hung Hsiao


The test is flaky for four reasons:
 # It assumes the mock resource provider (RP) never reregisters, which is not 
guaranteed.
 # It does not wait for the task and executor to be reaped, so containerizer 
destroy can race with test teardown and cause cgroups cleanup to fail.
 # It fast-forwards the clock, which can cause containerizer destroy to fail.
 # It assumes that the framework receives only two status updates, which is 
not guaranteed.
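
To illustrate the last point, here is a minimal, self-contained sketch of the difference between a check that requires an exact number of status updates and one that tolerates extra or repeated updates. It is not the test's code and does not use the Mesos API: the `TaskState` enum, the function names, and the STARTING/RUNNING sequence are all assumptions made for illustration only.

```cpp
#include <vector>

// Hypothetical, simplified task states. The names echo Mesos' TASK_* states,
// but this enum is an illustration, not the Mesos API.
enum class TaskState { STARTING, RUNNING, FINISHED, FAILED };

// True for states after which no further progress is expected.
bool isTerminal(TaskState state) {
  return state == TaskState::FINISHED || state == TaskState::FAILED;
}

// Brittle check: passes only if exactly two updates arrive, in this order.
// Any retried or additional update makes it fail.
bool expectsExactlyTwo(const std::vector<TaskState>& updates) {
  return updates.size() == 2 &&
         updates[0] == TaskState::STARTING &&
         updates[1] == TaskState::RUNNING;
}

// Tolerant check: passes if RUNNING is observed before any terminal update,
// no matter how many other updates are delivered before or after it.
bool reachedRunning(const std::vector<TaskState>& updates) {
  for (TaskState state : updates) {
    if (state == TaskState::RUNNING) {
      return true;
    }
    if (isTerminal(state)) {
      return false;
    }
  }
  return false;
}
```

With the sequence STARTING, RUNNING, RUNNING, FAILED, the brittle check fails while the tolerant check still passes. This suggests one possible way to make such an expectation robust: assert on the states that matter rather than on the total count.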

Example failure log:
{noformat}
E0410 00:18:23.526867  1251 slave.cpp:3118] Failed to update resources for 
container f941cb68-9f13-418c-be1b-702e5927b1eb of executor 'default' of 
framework ca96f624-9590-4776-9e83-39714cebd25f-0000, destroying container: 
Collect failed: Failed to publish resources 'disk(allocated: foo)[RAW]:200' for 
container f941cb68-9f13-418c-be1b-702e5927b1eb: Resource provider 
616834b9-4dbb-45a7-b762-831ce5e8534a is not subscribed
I0410 00:18:23.526957  1251 containerizer.cpp:2576] Destroying container 
f941cb68-9f13-418c-be1b-702e5927b1eb in RUNNING state
I0410 00:18:23.526979  1251 containerizer.cpp:3278] Transitioning the state of 
container f941cb68-9f13-418c-be1b-702e5927b1eb from RUNNING to DESTROYING
I0410 00:18:23.526989  1251 containerizer.cpp:2576] Destroying container 
f941cb68-9f13-418c-be1b-702e5927b1eb.523acde5-8c21-4f3f-af71-7cb84b54803e in 
RUNNING state
I0410 00:18:23.526996  1251 containerizer.cpp:3278] Transitioning the state of 
container 
f941cb68-9f13-418c-be1b-702e5927b1eb.523acde5-8c21-4f3f-af71-7cb84b54803e from 
RUNNING to DESTROYING
I0410 00:18:23.527102  1251 linux_launcher.cpp:576] Asked to destroy container 
f941cb68-9f13-418c-be1b-702e5927b1eb.523acde5-8c21-4f3f-af71-7cb84b54803e
...
E0410 00:18:23.535424  1246 slave.cpp:6572] Termination of executor 'default' 
of framework ca96f624-9590-4776-9e83-39714cebd25f-0000 failed: Failed to 
destroy nested containers: Failed to kill all processes in the container: Timed 
out after 1mins
...
I0410 00:18:23.535817  1252 master.cpp:8983] Executor 'default' of framework 
ca96f624-9590-4776-9e83-39714cebd25f-0000 on agent 
ca96f624-9590-4776-9e83-39714cebd25f-S0 at slave(699)@172.16.10.211:33823 
(ip-172-16-10-211.ec2.internal): wait status -1
...
../../src/tests/mesos.cpp:926: Failure
(cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
'/sys/fs/cgroup/memory/mesos_test_b1965800-016c-494b-8d6d-c70437c9405f/f941cb68-9f13-418c-be1b-702e5927b1eb':
 Device or resource busy
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
