----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/66668/#review203185 -----------------------------------------------------------
src/slave/containerizer/composing.cpp Line 361 (original), 360-365 (patched) <https://reviews.apache.org/r/66668/#comment285278> Moving our previous discussion from https://reviews.apache.org/r/66669/ to this RR, since the two have been merged. You said: ``` Composing c'zer adds a container to the containers_ map and then calls launch() on underlying c'zer. However, composing c'zer doesn't handle the case when an underlying containerizer returns a failure during the launch, so the container will not be removed from the containers_ map. If we want to call destroy() without the callback, then there is no chance to get the container removed from the containers_ map in this case. To check that, you can remove the callback from destroy() (and optionally copy-paste containerizer->wait(containerId).onAny(...) to _recover()), then run tests and you'll get a few tests failing, including ParentChildContainerTypeAndContentType/AgentContainerAPITest.NestedContainerFailLaunch/0. We could address the issue by handling the launch failures, but there are 3 places where we should handle it. That change would make composing c'zer more complex IMO. ``` I'm a bit confused. Are you saying that in the case of launch failure, we rely on the caller to call `destroy()` in order to remove the container from the `containers_` map? It looks to me like we handle the launch failure case with the following code: ``` // If we are here, the launch is not supported by `containerizer`. // Try the next containerizer. ++containerizer; if (containerizer == containerizers_.end()) { // If we are here none of the containerizers support the launch. // We set this to `None` because the container has no chance of // getting launched by any containerizer. This is similar to what // would happen if the destroy "started" after launch returned false. container->destroyed.set(Option<ContainerTermination>::none()); // We destroy the container irrespective whether // a destroy is already in progress, for simplicity. containers_.erase(containerId); delete container; // None of the containerizers support the launch. return Containerizer::LaunchResult::NOT_SUPPORTED; } ``` My hope would be that if the caller attempts to launch a task with the composing containerizer and the launch fails, then the `containers_` map will be updated so that the container does not exist anymore. Is that not the case? I don't quite understand why it's not simple to remove the `containers_.erase(containerId)` code from `destroy()` and simply rely on the code here to erase it whenever `wait()` returns. - Greg Mann On April 17, 2018, 3:23 p.m., Andrei Budnik wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/66668/ > ----------------------------------------------------------- > > (Updated April 17, 2018, 3:23 p.m.) > > > Review request for mesos, Alexander Rukletsov, Greg Mann, Jie Yu, and Qian > Zhang. > > > Bugs: MESOS-8712 > https://issues.apache.org/jira/browse/MESOS-8712 > > > Repository: mesos > > > Description > ------- > > Previously, we stored `destroyed` promise for each container and used > it to guarantee that `destroy()` returns a non-empty value when the > destroy-in-progress stops an launch-in-progress using the next > containerizer. Since `wait()` and `destroy()` return the same > `ContainerTermination` value when called with the same ContainerID > argument, we can remove `destroyed` promise and add callbacks to > clean up `containers_` map instead. > > > Diffs > ----- > > src/slave/containerizer/composing.cpp > 1fb79f53b48154ecbd3b0165b6a8002c871e6dce > > > Diff: https://reviews.apache.org/r/66668/diff/4/ > > > Testing > ------- > > internal CI > > > Thanks, > > Andrei Budnik > >