Andrei Budnik created MESOS-8568:
------------------------------------

             Summary: Command checks should always call `WAIT_NESTED_CONTAINER` before `REMOVE_NESTED_CONTAINER`
                 Key: MESOS-8568
                 URL: https://issues.apache.org/jira/browse/MESOS-8568
             Project: Mesos
          Issue Type: Task
            Reporter: Andrei Budnik


After the checker library successfully launches a nested container via
`LAUNCH_NESTED_CONTAINER_SESSION`, it calls
[waitNestedContainer|https://github.com/apache/mesos/blob/0a40243c6a35dc9dc41774d43ee3c19cdf9e54be/src/checks/checker_process.cpp#L657]
for that container. Before launching a nested container for a subsequent
check, the library
[calls|https://github.com/apache/mesos/blob/0a40243c6a35dc9dc41774d43ee3c19cdf9e54be/src/checks/checker_process.cpp#L466-L487]
`REMOVE_NESTED_CONTAINER` to remove the previous one. Hence, the
`REMOVE_NESTED_CONTAINER` call follows `WAIT_NESTED_CONTAINER`, which ensures
that the nested container has terminated and can be removed/cleaned up.
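To make this ordering concrete, here is a minimal Python model. It assumes a hypothetical in-memory `FakeAgent` whose method names mirror the agent calls; none of them are Mesos or libprocess APIs. Its `remove_nested_container` returns a 500 with "Nested container has not terminated yet" for a container that is still running, and its `wait_nested_container` simply marks the container as terminated.

{code:python}
# Hypothetical in-memory stand-in for the agent's nested-container calls.
# It is NOT the Mesos API; it only models the ordering constraint.
class FakeAgent:
    def __init__(self):
        self.running = set()  # launched but not yet terminated

    def launch_nested_container_session(self, container_id):
        self.running.add(container_id)
        return 200, ""

    def wait_nested_container(self, container_id):
        # The real call returns once the container exits; here the
        # container is simply marked as terminated.
        self.running.discard(container_id)
        return 200, ""

    def remove_nested_container(self, container_id):
        if container_id in self.running:
            return 500, "Nested container has not terminated yet"
        return 200, ""


def run_checks(agent, container_ids):
    """Run one check per container ID: remove the previous container,
    launch a new one, and wait on it after a successful launch."""
    previous = None
    removals = []
    for container_id in container_ids:
        if previous is not None:
            removals.append(agent.remove_nested_container(previous))
        status, _ = agent.launch_nested_container_session(container_id)
        if status == 200:
            agent.wait_nested_container(container_id)
        previous = container_id
    return removals


print(run_checks(FakeAgent(), ["check-1", "check-2", "check-3"]))
# [(200, ''), (200, '')]
{code}

Because every container in this happy path is waited on before it is removed, each removal succeeds.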

If the launch fails, the library
[doesn't call|https://github.com/apache/mesos/blob/0a40243c6a35dc9dc41774d43ee3c19cdf9e54be/src/checks/checker_process.cpp#L627-L636]
`WAIT_NESTED_CONTAINER`. Despite the failure, the container might still have
been launched, and the subsequent attempt to remove it without first calling
`WAIT_NESTED_CONTAINER` leads to errors like:
{code:java}
W0202 20:03:08.895830 7 checker_process.cpp:503] Received '500 Internal Server Error' (Nested container has not terminated yet) while removing the nested container '2b0c542c-1f5f-42f7-b914-2c1cadb4aeca.da0a7cca-516c-4ec9-b215-b34412b670fa.check-49adc5f1-37a3-4f26-8708-e27d2d6cd125' used for the COMMAND check for task 'node-0-server__e26a82b0-fbab-46a0-a1ea-e7ac6cfa4c91
{code}

The checker library should always call `WAIT_NESTED_CONTAINER` before 
`REMOVE_NESTED_CONTAINER`.
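The proposed behavior can be sketched with a similar self-contained Python model, again assuming a hypothetical `FakeAgent` rather than any Mesos API. Its launch call can report a failure even though the container was started (the hypothetical `fail_after_start` flag). With `fixed=False`, the checker waits only after a successful launch, and the removal is rejected as in the log above; with `fixed=True`, it always waits before removing.

{code:python}
class FakeAgent:
    """Hypothetical stand-in for the agent; not the Mesos API."""

    def __init__(self):
        self.running = set()  # launched but not yet terminated

    def launch_nested_container_session(self, container_id, fail_after_start=False):
        # The container starts even if the caller is told the launch failed.
        self.running.add(container_id)
        if fail_after_start:
            return 500, "launch failed"
        return 200, ""

    def wait_nested_container(self, container_id):
        # Models a wait that returns once the container has exited.
        self.running.discard(container_id)
        return 200, ""

    def remove_nested_container(self, container_id):
        if container_id in self.running:
            return 500, "Nested container has not terminated yet"
        return 200, ""


def run_check(agent, container_id, fixed):
    status, _ = agent.launch_nested_container_session(
        container_id, fail_after_start=True)
    if status == 200:
        # Current behavior: wait only after a successful launch.
        agent.wait_nested_container(container_id)
    # Later, before the next check, the previous container is removed.
    if fixed:
        # Proposed behavior: always wait before removing. In this model a
        # second wait on an already-terminated container is a no-op.
        agent.wait_nested_container(container_id)
    return agent.remove_nested_container(container_id)


print(run_check(FakeAgent(), "check-1", fixed=False))
# (500, 'Nested container has not terminated yet')
print(run_check(FakeAgent(), "check-1", fixed=True))
# (200, '')
{code}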



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)