Re: Review Request 51278: Refactored LinuxLauncher to be nested container aware.

2016-09-26 Thread Benjamin Hindman


> On Sept. 25, 2016, 10:32 p.m., Jie Yu wrote:
> > src/slave/containerizer/mesos/linux_launcher.cpp, line 614
> > 
> >
> > using executor_pid here is weird. I think we should rethink how we 
> > should `ContainerStatus` when we have nested containers. Should we rename 
> > it to `pid`? Sounds like 'executor_pid' is not very precise.
> > 
> > Another to consider is: in the future, user might want to get the pid 
> > of it's actual process, not the 'init' pid we forked. What should we call 
> > that?

I'm going to drop this for now, let's iterate on this together in a subsequent 
review.


> On Sept. 25, 2016, 10:32 p.m., Jie Yu wrote:
> > src/slave/containerizer/mesos/linux_launcher.cpp, lines 565-571
> > 
> >
> > I won't worry about this case because the containerizer will make sure 
> > cleanup/destroy is only called once for launcher/isolator/provisioner.
> > 
> > But I like the fact we remove the container from `containers` map so 
> > that other calls to launcher will be properly handled (e.g., status).

I improved the comment but still mentioned this because we're missing proper 
separation of concerns by assuming that we know how the caller is calling this 
function. I didn't write that code, someone else did, I'm just exposing this 
functionality through an interface!


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51278/#review150361
---


On Sept. 25, 2016, 4:08 p.m., Benjamin Hindman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51278/
> ---
> 
> (Updated Sept. 25, 2016, 4:08 p.m.)
> 
> 
> Review request for mesos, Gilbert Song, Jie Yu, and Kevin Klues.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Refactored LinuxLauncher to be nested container aware.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/launch.hpp 
> 859990cb85e9e8c06400397256cfc512f0811800 
>   src/slave/containerizer/mesos/linux_launcher.hpp 
> 2cd1e4d4b6d9e656c2447a9d3ab8e1f49ba5fffa 
>   src/slave/containerizer/mesos/linux_launcher.cpp 
> f27757032cc28be0f0067bed045536431dde4d6c 
> 
> Diff: https://reviews.apache.org/r/51278/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Benjamin Hindman
> 
>



Re: Review Request 51278: Refactored LinuxLauncher to be nested container aware.

2016-09-26 Thread Jie Yu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51278/#review150481
---


Ship it!




Ship It!

- Jie Yu


On Sept. 25, 2016, 4:08 p.m., Benjamin Hindman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51278/
> ---
> 
> (Updated Sept. 25, 2016, 4:08 p.m.)
> 
> 
> Review request for mesos, Gilbert Song, Jie Yu, and Kevin Klues.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Refactored LinuxLauncher to be nested container aware.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/launch.hpp 
> 859990cb85e9e8c06400397256cfc512f0811800 
>   src/slave/containerizer/mesos/linux_launcher.hpp 
> 2cd1e4d4b6d9e656c2447a9d3ab8e1f49ba5fffa 
>   src/slave/containerizer/mesos/linux_launcher.cpp 
> f27757032cc28be0f0067bed045536431dde4d6c 
> 
> Diff: https://reviews.apache.org/r/51278/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Benjamin Hindman
> 
>



Re: Review Request 51278: Refactored LinuxLauncher to be nested container aware.

2016-09-25 Thread Jie Yu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51278/#review150361
---




src/slave/containerizer/mesos/linux_launcher.cpp (line 105)


= None() is not necessary.



src/slave/containerizer/mesos/linux_launcher.cpp (lines 108 - 110)


Do you still need this function?



src/slave/containerizer/mesos/linux_launcher.cpp (lines 257 - 259)


checkpointed pid? Please update the comment here



src/slave/containerizer/mesos/linux_launcher.cpp (line 270)


`return Failure("Failed to list cgroups under : " + 
cgroups.error());`



src/slave/containerizer/mesos/linux_launcher.cpp (lines 276 - 289)


I suggest we factor out into a helper:
```
ContainerID containerId = getContainerId(cgroup);
```



src/slave/containerizer/mesos/linux_launcher.cpp (lines 308 - 319)


launcher destroy will be called for those orphans that launcher do not know 
about as well (e.g., launcher destroy succeeds, but isolator cleanup is not 
done yet, agent crashes).

Therefore, I'd suggest we adjust the destroy interface to return a 
Future. 'false' means the container is unknown to the launcher. Let the 
caller decide what to do. In Mesos containerizer, if the launcher does not know 
about a container in the destroy path, we probably should just continue the 
chain.

In that way, you don't have to do the reconciliation here.



src/slave/containerizer/mesos/linux_launcher.cpp (line 356)


count returns 'size_t'. So please use xxx.count() != 0



src/slave/containerizer/mesos/linux_launcher.cpp (line 475)


space after `:`



src/slave/containerizer/mesos/linux_launcher.cpp (line 508)


Ditto. Return a false here



src/slave/containerizer/mesos/linux_launcher.cpp (lines 520 - 526)


I won't worry about this case because the containerizer will make sure 
cleanup/destroy is only called once for launcher/isolator/provisioner.

But I like the fact we remove the container from `containers` map so that 
other calls to launcher will be properly handled (e.g., status).



src/slave/containerizer/mesos/linux_launcher.cpp (line 567)


using executor_pid here is weird. I think we should rethink how we should 
`ContainerStatus` when we have nested containers. Should we rename it to `pid`? 
Sounds like 'executor_pid' is not very precise.

Another to consider is: in the future, user might want to get the pid of 
it's actual process, not the 'init' pid we forked. What should we call that?


- Jie Yu


On Sept. 25, 2016, 4:08 p.m., Benjamin Hindman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51278/
> ---
> 
> (Updated Sept. 25, 2016, 4:08 p.m.)
> 
> 
> Review request for mesos, Gilbert Song, Jie Yu, and Kevin Klues.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Refactored LinuxLauncher to be nested container aware.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/launch.hpp 
> 859990cb85e9e8c06400397256cfc512f0811800 
>   src/slave/containerizer/mesos/linux_launcher.hpp 
> 2cd1e4d4b6d9e656c2447a9d3ab8e1f49ba5fffa 
>   src/slave/containerizer/mesos/linux_launcher.cpp 
> f27757032cc28be0f0067bed045536431dde4d6c 
> 
> Diff: https://reviews.apache.org/r/51278/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Benjamin Hindman
> 
>



Re: Review Request 51278: Refactored LinuxLauncher to be nested container aware.

2016-09-25 Thread Gilbert Song

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51278/#review150346
---




src/slave/containerizer/mesos/linux_launcher.cpp (line 578)


This is the only conflict from the recent master head:

s/buildPathForContainer/buildPath/g


- Gilbert Song


On Sept. 25, 2016, 9:08 a.m., Benjamin Hindman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51278/
> ---
> 
> (Updated Sept. 25, 2016, 9:08 a.m.)
> 
> 
> Review request for mesos, Gilbert Song, Jie Yu, and Kevin Klues.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Refactored LinuxLauncher to be nested container aware.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/launch.hpp 
> 859990cb85e9e8c06400397256cfc512f0811800 
>   src/slave/containerizer/mesos/linux_launcher.hpp 
> 2cd1e4d4b6d9e656c2447a9d3ab8e1f49ba5fffa 
>   src/slave/containerizer/mesos/linux_launcher.cpp 
> f27757032cc28be0f0067bed045536431dde4d6c 
> 
> Diff: https://reviews.apache.org/r/51278/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Benjamin Hindman
> 
>



Re: Review Request 51278: Refactored LinuxLauncher to be nested container aware.

2016-09-25 Thread Benjamin Hindman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51278/
---

(Updated Sept. 25, 2016, 4:08 p.m.)


Review request for mesos, Gilbert Song, Jie Yu, and Kevin Klues.


Repository: mesos


Description
---

Refactored LinuxLauncher to be nested container aware.


Diffs (updated)
-

  src/slave/containerizer/mesos/launch.hpp 
859990cb85e9e8c06400397256cfc512f0811800 
  src/slave/containerizer/mesos/linux_launcher.hpp 
2cd1e4d4b6d9e656c2447a9d3ab8e1f49ba5fffa 
  src/slave/containerizer/mesos/linux_launcher.cpp 
f27757032cc28be0f0067bed045536431dde4d6c 

Diff: https://reviews.apache.org/r/51278/diff/


Testing
---


Thanks,

Benjamin Hindman



Re: Review Request 51278: Refactored LinuxLauncher to be nested container aware.

2016-09-25 Thread Benjamin Hindman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51278/
---

(Updated Sept. 25, 2016, 4:07 p.m.)


Review request for mesos, Gilbert Song, Jie Yu, and Kevin Klues.


Repository: mesos


Description (updated)
---

Refactored LinuxLauncher to be nested container aware.


Diffs (updated)
-

  src/slave/containerizer/mesos/launch.hpp 
859990cb85e9e8c06400397256cfc512f0811800 
  src/slave/containerizer/mesos/linux_launcher.hpp 
2cd1e4d4b6d9e656c2447a9d3ab8e1f49ba5fffa 
  src/slave/containerizer/mesos/linux_launcher.cpp 
f27757032cc28be0f0067bed045536431dde4d6c 

Diff: https://reviews.apache.org/r/51278/diff/


Testing
---


Thanks,

Benjamin Hindman



Re: Review Request 51278: Refactored LinuxLauncher to be nested container aware.

2016-08-24 Thread Kevin Klues

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51278/#review146740
---




src/slave/containerizer/mesos/linux_launcher.cpp (line 132)


There's something wrong with this logic. The top level directory of the 
cgroup begins with `mesos`, not the first container ID. We either need to strip 
this here or at a level above.


- Kevin Klues


On Aug. 22, 2016, 4:54 a.m., Benjamin Hindman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51278/
> ---
> 
> (Updated Aug. 22, 2016, 4:54 a.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Note, this doesn't yet include the use of 'ns::enter' for creating
> nested containers.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/linux_launcher.hpp 
> 8fbe1e9742df85b0fc6de46ac81a0c064c845a63 
>   src/slave/containerizer/mesos/linux_launcher.cpp 
> 7377316776646e3d660086da3c0d5b494ce8ace4 
> 
> Diff: https://reviews.apache.org/r/51278/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Benjamin Hindman
> 
>



Re: Review Request 51278: Refactored LinuxLauncher to be nested container aware.

2016-08-23 Thread Jie Yu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51278/#review146532
---




src/slave/containerizer/mesos/linux_launcher.hpp (lines 20 - 22)


Swap these two.



src/slave/containerizer/mesos/linux_launcher.hpp (lines 64 - 85)


Swap these two definitions. According to google style, type definitions 
should be listed first.



src/slave/containerizer/mesos/linux_launcher.hpp (lines 97 - 99)


Move this definition up below `recover` helper.



src/slave/containerizer/mesos/linux_launcher.cpp (lines 131 - 145)


I suggest we move this to a helper. We already have a helper to get cgroup 
from containerId, this will be the reverse operation.

```
string getCgroup(const ContainerID& containerId);
Try getContainerId(const string& cgroup);
```

We can add necessary validation in the helper.



src/slave/containerizer/mesos/linux_launcher.cpp (lines 149 - 151)


As we discussed yesterday, we probably should checkpoint the pid in 
runtime_dir before creating the cgroup in the freezer hierarchy.



src/slave/containerizer/mesos/linux_launcher.cpp (lines 186 - 193)


The accumulation logic here is quite unintuitive. I'd suggest we make 
`cgroups::traverse` simple in the sense that it only applies a _map_ function.

```
template 
Try cgroups::traverse(
  const string& hierarchy,
  const string& cgroup,
  const lambda::function& f,
  const Traversal& traversal = PRE_ORDER);
```

And we perform the accumulation in LinuxLauncher. In that way, `recover` 
helper can be simplified to just return a `Try`.



src/slave/containerizer/mesos/linux_launcher.cpp (lines 201 - 206)


I am wondering if should traverse the runtime_dir. Since we checkpoint the 
pid before we create the cgroup (i.e., we remove cgroup before we remove the 
checkpointed pid), we might run into the case where  the checkpointed pid 
exists but the freezer cgroup does not exist.

However, for legacy containers, we still have to traverse freezer hierarchy.



src/slave/containerizer/mesos/linux_launcher.cpp (line 209)


s/our process/agent



src/slave/containerizer/mesos/linux_launcher.cpp (line 223)


Why do you need this if check?



src/slave/containerizer/mesos/linux_launcher.cpp (line 229)


s/our process/agent



src/slave/containerizer/mesos/linux_launcher.cpp (lines 402 - 409)


When do we remove `Container` from `containers`? We cannot keep them 
forever as it'll cause a memory leak. I think we should remove it when the top 
level container is being destroyed, and we should do it _after_ the 
`cgroups::destory` is successful.



src/slave/containerizer/mesos/linux_launcher.cpp (lines 408 - 409)


Can you be more specific here? Who will be responsible for cleaning up 
sub-directories under runtime_dir when the top level container terminates? I 
feel that Launcher should be responsible for doing that because it was the one 
that checkpoints the data.

The caveat here is that we cannot remove checkpoint dir for nested 
containers until the top level container terminates. This is because the 
executor might issue `WAIT` call to the agent to get the exit status of the 
nested container.

For the top level container, if we move the entire checkpointed dir during 
destroy, and if slave crashes before it checkpoints the sentinel file. During 
recovery, the containerizer will still call `launcher->wait` on the containerId 
that launcher does not know about. But since the agent does not need the exit 
code for the executor, this is fine for now. In the future, if we want to add 
executor status update, we need to adjust the logics here so that we only 
remove the top level checkpoint dir for the top level container after agent 
checkpoint the sentinel file. We probably need an explicit `launcher->cleanup`.



src/slave/containerizer/mesos/linux_launcher.cpp (lines 414 - 419)


This sounds like an uncessary optimization that buy us nothing. This also 
creates an explicit dependency from launcher to pid namespace isolator. I'd 
suggest we simply just remove this code here (i.e., 

Re: Review Request 51278: Refactored LinuxLauncher to be nested container aware.

2016-08-22 Thread Mesos ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51278/#review146309
---



Patch looks great!

Reviews applied: [51272, 51273, 51274, 51275, 51276, 51277, 51278]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker_build.sh

- Mesos ReviewBot


On Aug. 22, 2016, 4:54 a.m., Benjamin Hindman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51278/
> ---
> 
> (Updated Aug. 22, 2016, 4:54 a.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Note, this doesn't yet include the use of 'ns::enter' for creating
> nested containers.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/linux_launcher.hpp 
> 8fbe1e9742df85b0fc6de46ac81a0c064c845a63 
>   src/slave/containerizer/mesos/linux_launcher.cpp 
> 7377316776646e3d660086da3c0d5b494ce8ace4 
> 
> Diff: https://reviews.apache.org/r/51278/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Benjamin Hindman
> 
>