> On May 15, 2018, 11:05 p.m., Greg Mann wrote: > > src/slave/containerizer/composing.cpp > > Line 361 (original), 360-365 (patched) > > <https://reviews.apache.org/r/66668/diff/4/?file=2021203#file2021203line361> > > > > Moving our previous discussion from https://reviews.apache.org/r/66669/ > > to this RR, since the two have been merged. > > > > You said: > > ``` > > Composing c'zer adds a container to the containers_ map and then calls > > launch() on underlying c'zer. However, composing c'zer doesn't handle the > > case when an underlying containerizer returns a failure during the launch, > > so the container will not be removed from the containers_ map. If we want > > to call destroy() without the callback, then there is no chance to get the > > container removed from the containers_ map in this case. > > > > To check that, you can remove the callback from destroy() (and > > optionally copy-paste containerizer->wait(containerId).onAny(...) to > > _recover()), then run tests and you'll get a few tests failing, including > > ParentChildContainerTypeAndContentType/AgentContainerAPITest.NestedContainerFailLaunch/0. > > > > We could address the issue by handling the launch failures, but there > > are 3 places where we should handle it. That change would make composing > > c'zer more complex IMO. > > ``` > > > > I'm a bit confused. Are you saying that in the case of launch failure, > > we rely on the caller to call `destroy()` in order to remove the container > > from the `containers_` map? It looks to me like we handle the launch > > failure case with the following code: > > ``` > > // If we are here, the launch is not supported by `containerizer`. > > > > // Try the next containerizer. > > ++containerizer; > > > > if (containerizer == containerizers_.end()) { > > // If we are here none of the containerizers support the launch. > > > > // We set this to `None` because the container has no chance of > > // getting launched by any containerizer. This is similar to what > > // would happen if the destroy "started" after launch returned > > false. > > container->destroyed.set(Option<ContainerTermination>::none()); > > > > // We destroy the container irrespective whether > > // a destroy is already in progress, for simplicity. > > containers_.erase(containerId); > > delete container; > > > > // None of the containerizers support the launch. > > return Containerizer::LaunchResult::NOT_SUPPORTED; > > } > > ``` > > > > My hope would be that if the caller attempts to launch a task with the > > composing containerizer and the launch fails, then the `containers_` map > > will be updated so that the container does not exist anymore. Is that not > > the case? > > > > I don't quite understand why it's not simple to remove the > > `containers_.erase(containerId)` code from `destroy()` and simply rely on > > the code here to erase it whenever `wait()` returns. > > Andrei Budnik wrote: > >we handle the launch failure case with the following code > > We have 3 places in composing where we call `containerizer->launch()` and > then subscribe `_launch()` callback using `.then()` which doesn't call the > callback if a future returned by `containerizer->launch()` is failed. See > test `NestedContainerFailLaunch` from my previous comment that reproduces > this case. > The following code handles only the case when an underlying c'zer doesn't > support the launch. > > I tried to compare two (?) possible solutions: > 1) Add `.onFailed()` callback in all 3 places where we call > `containerizer->launch().next(...)` to clean `containers_` map. > 2) Add `.onAny()` callback for `containerizer->destroy()` in > `ComposingContainer::destroy()` to clean `containers_` map. > > Also, first solution requires adding `containerizer->wait()` code to the > `ComposingContainerizerProcess::__recover()` in order to be able to clean > `containers_` map after recovery. > Finally, I decided to implement second solution, because first one "would > make composing c'zer more complex IMO". > > Greg Mann wrote: > OK that makes sense, thanks!! > > So, the current implementation requires that `destroy()` must be called > after a failed launch in order to clean up the `containers_` map. So, are we > relying on the fact that the caller (Mesos agent) will always call > `destroy()` after a failed launch? Do the Mesos and Docker containerizers > also require that the agent must do this in order to maintain consistent > state in the containerizer? > > Andrei Budnik wrote: > >Do the Mesos and Docker containerizers also require that the agent must > do this in order to maintain consistent state in the containerizer? > > If an attempt to launch a > [mesos](https://github.com/apache/mesos/blob/c020a130afd55dd3f5702a23b13f8234e0ace391/src/slave/containerizer/mesos/containerizer.cpp#L1336-L1345) > or > [docker](https://github.com/apache/mesos/blob/c020a130afd55dd3f5702a23b13f8234e0ace391/src/slave/containerizer/docker.cpp#L1336-L1371) > containerizer fails, then neither mesos nor docker containerizer clean up > their internal state. > > >the current implementation requires that destroy() must be called after > a failed launch in order to clean up the containers_ map. So, are we relying > on the fact that the caller (Mesos agent) will always call destroy() after a > failed launch? > > Yes, [we > are](https://github.com/apache/mesos/blob/c020a130afd55dd3f5702a23b13f8234e0ace391/src/slave/http.cpp#L2578-L2602).
Thanks Andrei! - Greg ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/66668/#review203185 ----------------------------------------------------------- On April 17, 2018, 3:23 p.m., Andrei Budnik wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/66668/ > ----------------------------------------------------------- > > (Updated April 17, 2018, 3:23 p.m.) > > > Review request for mesos, Alexander Rukletsov, Greg Mann, Jie Yu, and Qian > Zhang. > > > Bugs: MESOS-8712 > https://issues.apache.org/jira/browse/MESOS-8712 > > > Repository: mesos > > > Description > ------- > > Previously, we stored `destroyed` promise for each container and used > it to guarantee that `destroy()` returns a non-empty value when the > destroy-in-progress stops an launch-in-progress using the next > containerizer. Since `wait()` and `destroy()` return the same > `ContainerTermination` value when called with the same ContainerID > argument, we can remove `destroyed` promise and add callbacks to > clean up `containers_` map instead. > > > Diffs > ----- > > src/slave/containerizer/composing.cpp > 1fb79f53b48154ecbd3b0165b6a8002c871e6dce > > > Diff: https://reviews.apache.org/r/66668/diff/4/ > > > Testing > ------- > > internal CI > > > Thanks, > > Andrei Budnik > >