> On May 15, 2018, 11:05 p.m., Greg Mann wrote:
> > src/slave/containerizer/composing.cpp
> > Line 361 (original), 360-365 (patched)
> > <https://reviews.apache.org/r/66668/diff/4/?file=2021203#file2021203line361>
> >
> >     Moving our previous discussion from https://reviews.apache.org/r/66669/ 
> > to this RR, since the two have been merged.
> >     
> >     You said:
> >     ```
> >     Composing c'zer adds a container to the containers_ map and then calls 
> > launch() on underlying c'zer. However, composing c'zer doesn't handle the 
> > case when an underlying containerizer returns a failure during the launch, 
> > so the container will not be removed from the containers_ map. If we want 
> > to call destroy() without the callback, then there is no chance to get the 
> > container removed from the containers_ map in this case.
> >     
> >     To check that, you can remove the callback from destroy() (and 
> > optionally copy-paste containerizer->wait(containerId).onAny(...) to 
> > _recover()), then run tests and you'll get a few tests failing, including 
> > ParentChildContainerTypeAndContentType/AgentContainerAPITest.NestedContainerFailLaunch/0.
> >     
> >     We could address the issue by handling the launch failures, but there 
> > are 3 places where we should handle it. That change would make composing 
> > c'zer more complex IMO.
> >     ```
> >     
> >     I'm a bit confused. Are you saying that in the case of launch failure, 
> > we rely on the caller to call `destroy()` in order to remove the container 
> > from the `containers_` map? It looks to me like we handle the launch 
> > failure case with the following code:
> >     ```
> >       // If we are here, the launch is not supported by `containerizer`.
> >     
> >       // Try the next containerizer.
> >       ++containerizer;
> >     
> >       if (containerizer == containerizers_.end()) {
> >         // If we are here none of the containerizers support the launch.
> >     
> >         // We set this to `None` because the container has no chance of
> >         // getting launched by any containerizer. This is similar to what
> >         // would happen if the destroy "started" after launch returned 
> > false.
> >         container->destroyed.set(Option<ContainerTermination>::none());
> >     
> >         // We destroy the container irrespective whether
> >         // a destroy is already in progress, for simplicity.
> >         containers_.erase(containerId);
> >         delete container;
> >     
> >         // None of the containerizers support the launch.
> >         return Containerizer::LaunchResult::NOT_SUPPORTED;
> >       }
> >     ```
> >     
> >     My hope would be that if the caller attempts to launch a task with the 
> > composing containerizer and the launch fails, then the `containers_` map 
> > will be updated so that the container does not exist anymore. Is that not 
> > the case?
> >     
> >     I don't quite understand why it's not simple to remove the 
> > `containers_.erase(containerId)` code from `destroy()` and simply rely on 
> > the code here to erase it whenever `wait()` returns.
> 
> Andrei Budnik wrote:
>     >we handle the launch failure case with the following code
>     
>     We have 3 places in composing where we call `containerizer->launch()` and 
> then subscribe `_launch()` callback using `.then()` which doesn't call the 
> callback if a future returned by `containerizer->launch()` is failed. See 
> test `NestedContainerFailLaunch` from my previous comment that reproduces 
> this case.
>     The following code handles only the case when an underlying c'zer doesn't 
> support the launch.
>     
>     I tried to compare two (?) possible solutions:
>     1) Add `.onFailed()` callback in all 3 places where we call 
> `containerizer->launch().next(...)` to clean `containers_` map.
>     2) Add `.onAny()` callback for `containerizer->destroy()` in 
> `ComposingContainer::destroy()` to clean `containers_` map.
>     
>     Also, first solution requires adding `containerizer->wait()` code to the 
> `ComposingContainerizerProcess::__recover()` in order to be able to clean 
> `containers_` map after recovery.
>     Finally, I decided to implement second solution, because first one "would 
> make composing c'zer more complex IMO".

OK that makes sense, thanks!!

So, the current implementation requires that `destroy()` must be called after a 
failed launch in order to clean up the `containers_` map. So, are we relying on 
the fact that the caller (Mesos agent) will always call `destroy()` after a 
failed launch? Do the Mesos and Docker containerizers also require that the 
agent must do this in order to maintain consistent state in the containerizer?


- Greg


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66668/#review203185
-----------------------------------------------------------


On April 17, 2018, 3:23 p.m., Andrei Budnik wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66668/
> -----------------------------------------------------------
> 
> (Updated April 17, 2018, 3:23 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Greg Mann, Jie Yu, and Qian 
> Zhang.
> 
> 
> Bugs: MESOS-8712
>     https://issues.apache.org/jira/browse/MESOS-8712
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Previously, we stored `destroyed` promise for each container and used
> it to guarantee that `destroy()` returns a non-empty value when the
> destroy-in-progress stops an launch-in-progress using the next
> containerizer. Since `wait()` and `destroy()` return the same
> `ContainerTermination` value when called with the same ContainerID
> argument, we can remove `destroyed` promise and add callbacks to
> clean up `containers_` map instead.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/composing.cpp 
> 1fb79f53b48154ecbd3b0165b6a8002c871e6dce 
> 
> 
> Diff: https://reviews.apache.org/r/66668/diff/4/
> 
> 
> Testing
> -------
> 
> internal CI
> 
> 
> Thanks,
> 
> Andrei Budnik
> 
>

Reply via email to