Re: Review Request 51406: Updated 'launcher' to optionally spawn an 'init' process.

2016-08-25 Thread Kevin Klues

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51406/
---

(Updated Aug. 26, 2016, 1:52 a.m.)


Review request for mesos and Jie Yu.


Changes
---

The error handlig for `forwardSignals()` from the last revision caused failures 
because I wasn't filtering out signals properly.


Bugs: MESOS-6088
https://issues.apache.org/jira/browse/MESOS-6088


Repository: mesos


Description
---

Previously the 'mesos-containerizer launch' binary would simply exec
into the actual command we wanted to launch after doing some set of
preperatory work. The problem with this approach, however, is that
this gives us no opportunity to checkpoint the exit status of the
command so the agent can recover it in cases where it is offline at
the time the command completes.

To compensate for this, we now allow 'mesos-containerizer launch' to
take an optional '--exit_status_path' parameter, which indicates where
the exit status of our command should be checkpointed. In order to
do this checkpointing, however, we cannot simply exec into the command
anymore. Instead we now fork-exec the command and reap it once it
completes. We then checkpoint its exit status and return it as our
own. We call the original process the 'init' process of the container
and the fork-exec'd process its child.  As a side effect of doing
things this way, we also have to be careful to forward all signals
sent to the init process down to the child.

In a subsequent commit we will update the mesos containerizer to use
this new functionality.


Diffs (updated)
-

  src/slave/containerizer/mesos/launch.hpp 
0e86da9c7bd9c7fbedd7102d66b902d1c10e5e0b 
  src/slave/containerizer/mesos/launch.cpp 
2db8db5ac2f4614d406b950a1b4e8098a0d90e7b 
  src/slave/containerizer/mesos/launcher.cpp 
413a8afdc56127a58c9599c436d17d6c98e62434 
  src/slave/containerizer/mesos/linux_launcher.cpp 
7377316776646e3d660086da3c0d5b494ce8ace4 

Diff: https://reviews.apache.org/r/51406/diff/


Testing
---


Thanks,

Kevin Klues



Re: Review Request 51406: Updated 'launcher' to optionally spawn an 'init' process.

2016-08-25 Thread Kevin Klues

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51406/
---

(Updated Aug. 26, 2016, 12:34 a.m.)


Review request for mesos and Jie Yu.


Changes
---

Handle error when calling `forwardSignals()`


Bugs: MESOS-6088
https://issues.apache.org/jira/browse/MESOS-6088


Repository: mesos


Description
---

Previously the 'mesos-containerizer launch' binary would simply exec
into the actual command we wanted to launch after doing some set of
preperatory work. The problem with this approach, however, is that
this gives us no opportunity to checkpoint the exit status of the
command so the agent can recover it in cases where it is offline at
the time the command completes.

To compensate for this, we now allow 'mesos-containerizer launch' to
take an optional '--exit_status_path' parameter, which indicates where
the exit status of our command should be checkpointed. In order to
do this checkpointing, however, we cannot simply exec into the command
anymore. Instead we now fork-exec the command and reap it once it
completes. We then checkpoint its exit status and return it as our
own. We call the original process the 'init' process of the container
and the fork-exec'd process its child.  As a side effect of doing
things this way, we also have to be careful to forward all signals
sent to the init process down to the child.

In a subsequent commit we will update the mesos containerizer to use
this new functionality.


Diffs (updated)
-

  src/slave/containerizer/mesos/launch.hpp 
0e86da9c7bd9c7fbedd7102d66b902d1c10e5e0b 
  src/slave/containerizer/mesos/launch.cpp 
2db8db5ac2f4614d406b950a1b4e8098a0d90e7b 
  src/slave/containerizer/mesos/launcher.cpp 
413a8afdc56127a58c9599c436d17d6c98e62434 
  src/slave/containerizer/mesos/linux_launcher.cpp 
7377316776646e3d660086da3c0d5b494ce8ace4 

Diff: https://reviews.apache.org/r/51406/diff/


Testing
---


Thanks,

Kevin Klues



Re: Review Request 51406: Updated 'launcher' to optionally spawn an 'init' process.

2016-08-25 Thread Jie Yu


> On Aug. 25, 2016, 5:34 p.m., Jie Yu wrote:
> > src/slave/containerizer/mesos/launch.cpp, line 502
> > 
> >
> > I'd avoid using subprocess here. Using subprocess here means that we'll 
> > initialize libprocess, creating many unnecessary threads. I would simply 
> > fork-exec here. We don't have two worry about async signal safe problem 
> > here because it's single threaded process.
> > 
> > We need to fix the `pre_exec_comands` above for the same problem.

Not resolved? I'd still suggest you use 'fork-exec' as the code shown above. 
You can avoid the subprocess usage in that way.


- Jie


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51406/#review146791
---


On Aug. 25, 2016, 11:32 p.m., Kevin Klues wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51406/
> ---
> 
> (Updated Aug. 25, 2016, 11:32 p.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-6088
> https://issues.apache.org/jira/browse/MESOS-6088
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Previously the 'mesos-containerizer launch' binary would simply exec
> into the actual command we wanted to launch after doing some set of
> preperatory work. The problem with this approach, however, is that
> this gives us no opportunity to checkpoint the exit status of the
> command so the agent can recover it in cases where it is offline at
> the time the command completes.
> 
> To compensate for this, we now allow 'mesos-containerizer launch' to
> take an optional '--exit_status_path' parameter, which indicates where
> the exit status of our command should be checkpointed. In order to
> do this checkpointing, however, we cannot simply exec into the command
> anymore. Instead we now fork-exec the command and reap it once it
> completes. We then checkpoint its exit status and return it as our
> own. We call the original process the 'init' process of the container
> and the fork-exec'd process its child.  As a side effect of doing
> things this way, we also have to be careful to forward all signals
> sent to the init process down to the child.
> 
> In a subsequent commit we will update the mesos containerizer to use
> this new functionality.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/launch.hpp 
> 0e86da9c7bd9c7fbedd7102d66b902d1c10e5e0b 
>   src/slave/containerizer/mesos/launch.cpp 
> 2db8db5ac2f4614d406b950a1b4e8098a0d90e7b 
>   src/slave/containerizer/mesos/launcher.cpp 
> 413a8afdc56127a58c9599c436d17d6c98e62434 
>   src/slave/containerizer/mesos/linux_launcher.cpp 
> 7377316776646e3d660086da3c0d5b494ce8ace4 
> 
> Diff: https://reviews.apache.org/r/51406/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Kevin Klues
> 
>



Re: Review Request 51406: Updated 'launcher' to optionally spawn an 'init' process.

2016-08-25 Thread Kevin Klues

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51406/
---

(Updated Aug. 25, 2016, 11:32 p.m.)


Review request for mesos and Jie Yu.


Changes
---

Addressed all of Jie's comments except "I'd avoid using subprocess here." That 
should come in a seperate commit and already has a JIRA for it: `MESOS-6075`


Bugs: MESOS-6088
https://issues.apache.org/jira/browse/MESOS-6088


Repository: mesos


Description (updated)
---

Previously the 'mesos-containerizer launch' binary would simply exec
into the actual command we wanted to launch after doing some set of
preperatory work. The problem with this approach, however, is that
this gives us no opportunity to checkpoint the exit status of the
command so the agent can recover it in cases where it is offline at
the time the command completes.

To compensate for this, we now allow 'mesos-containerizer launch' to
take an optional '--exit_status_path' parameter, which indicates where
the exit status of our command should be checkpointed. In order to
do this checkpointing, however, we cannot simply exec into the command
anymore. Instead we now fork-exec the command and reap it once it
completes. We then checkpoint its exit status and return it as our
own. We call the original process the 'init' process of the container
and the fork-exec'd process its child.  As a side effect of doing
things this way, we also have to be careful to forward all signals
sent to the init process down to the child.

In a subsequent commit we will update the mesos containerizer to use
this new functionality.


Diffs (updated)
-

  src/slave/containerizer/mesos/launch.hpp 
0e86da9c7bd9c7fbedd7102d66b902d1c10e5e0b 
  src/slave/containerizer/mesos/launch.cpp 
2db8db5ac2f4614d406b950a1b4e8098a0d90e7b 
  src/slave/containerizer/mesos/launcher.cpp 
413a8afdc56127a58c9599c436d17d6c98e62434 
  src/slave/containerizer/mesos/linux_launcher.cpp 
7377316776646e3d660086da3c0d5b494ce8ace4 

Diff: https://reviews.apache.org/r/51406/diff/


Testing
---


Thanks,

Kevin Klues



Re: Review Request 51406: Updated 'launcher' to optionally spawn an 'init' process.

2016-08-25 Thread Jie Yu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51406/#review146819
---




src/slave/containerizer/mesos/launcher.cpp (lines 91 - 114)


No need to use os::open. Simply call os::read to get the string from the 
file. For such IO operations, no need to make it async (e.g., LOG(INFO) is 
synchrounous as well).

BTW: you forgot to close the fd.


- Jie Yu


On Aug. 25, 2016, 1:21 a.m., Kevin Klues wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51406/
> ---
> 
> (Updated Aug. 25, 2016, 1:21 a.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-6088
> https://issues.apache.org/jira/browse/MESOS-6088
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Previously the 'mesos-containerizer launch' binary would simply exec
> into the actual command we wanted to launch after doing some set of
> preperatory work. The problem with this approach, however, is that
> this gives us no opportunity to checkpoint the exit status of the
> command so the agent can recover it in cases where it is offline at
> the time the command completes.
> 
> To compensate for this, we now allow 'mesos-containerizer launch' to
> take an optional '--exit_status_path' parameter, which indicates where
> the exit status of our command should be checkpointed. In order to
> do this checkpointing, however, we cannot simply exec into the command
> anymore. Instead we now fork-exec the command and reap it once it
> completes. We then checkpoint its exit status and return it as our
> own. We call the original process the 'init' process of the container
> and the fork-exec'd process its child.  As a side effect of doing
> things this way, we also have to be careful to forward all signals
> sent to the init process down to the child.
> 
> In order to properly reap the 'init' process, we also introduce a new
> 'Launcher::wait' command that knows how to reap the 'init' process
> directly and then retreive its exit status from the checkpoint (if
> necessary).
> 
> In a subsequent commit we will update the mesos containerizer to use
> this new functionality.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/launch.hpp 
> 0e86da9c7bd9c7fbedd7102d66b902d1c10e5e0b 
>   src/slave/containerizer/mesos/launch.cpp 
> 2db8db5ac2f4614d406b950a1b4e8098a0d90e7b 
>   src/slave/containerizer/mesos/launcher.hpp 
> 0eae09515d550a2c71ae1414d4a22943f1d09db9 
>   src/slave/containerizer/mesos/launcher.cpp 
> 413a8afdc56127a58c9599c436d17d6c98e62434 
>   src/slave/containerizer/mesos/linux_launcher.hpp 
> 8fbe1e9742df85b0fc6de46ac81a0c064c845a63 
>   src/slave/containerizer/mesos/linux_launcher.cpp 
> 7377316776646e3d660086da3c0d5b494ce8ace4 
>   src/tests/containerizer/launcher.hpp 
> 94c62b761a17436841bd19f3bf622cc8f1047876 
>   src/tests/containerizer/launcher.cpp 
> a92d9890f0931425d69ef8ce0896d081b8722079 
> 
> Diff: https://reviews.apache.org/r/51406/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Kevin Klues
> 
>



Re: Review Request 51406: Updated 'launcher' to optionally spawn an 'init' process.

2016-08-25 Thread Jie Yu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51406/#review146791
---



Can you split the Launcher::wait support to a separate patch?


src/slave/containerizer/mesos/launch.cpp (line 124)


We use the same binary for nested container as well. So it might not be an 
executor. I'd rename it to `containerPid`



src/slave/containerizer/mesos/launch.cpp (lines 130 - 142)


indentation here should be 2 :)



src/slave/containerizer/mesos/launch.cpp (lines 130 - 131)


You can do the following:
```
bufferedSignals[sig].fetch_add(1)
```



src/slave/containerizer/mesos/launch.cpp (line 132)


defer is not async singal safe because it'll try to acquire a lock.

os::kill is async signal safe
http://man7.org/linux/man-pages/man7/signal.7.html

Looking at possible failure scenarios of os::kill, none of them looks 
possible except ESRCH. But in that case, the container pid is gone, and the 
process will exit soon. So no need to print the error message.

Therefore, i'd suggest we simply call os::kill in the signal handler and 
ignore the error. No need to use defer.



src/slave/containerizer/mesos/launch.cpp (line 133)


What if bufferedSignals[i] > 1?



src/slave/containerizer/mesos/launch.cpp (lines 161 - 172)


We have os::signal::install in stout. Please use that



src/slave/containerizer/mesos/launch.cpp (lines 220 - 225)


I'd suggest move this whole block of code after we receive the control pipe 
signal from the parent to continue.

If parent dies, there no point to open and create the exit_status file.



src/slave/containerizer/mesos/launch.cpp (line 224)


even if we pivot_root below.



src/slave/containerizer/mesos/launch.cpp (lines 229 - 239)


I suggest that we create the directory in `Launcher::fork` because it also 
needs to checkpoint the pid there.

This helper binary should just open the file at `exit_status_path`.



src/slave/containerizer/mesos/launch.cpp (line 244)


can you make sure fd has cloexec flag, we don't want to leak this fd to the 
subprocess.



src/slave/containerizer/mesos/launch.cpp (line 245)


I'd insert a new line above.



src/slave/containerizer/mesos/launch.cpp (lines 493 - 497)


I'd suggest we restructure the code like the following. Since we cannot do 
signal forwarding on windows, i'd suggest we simply does not support this 
feature on Windows. This flag will only be used in LinuxLauncher.

```
#ifndef __WINDOWS__
if (exitStatusFd.isSome()) {
  pid_t pid = fork();
  if (pid < 0) {
cerr << ... << endl;
return EXIT_FAILURE;
  } else if (pid > 0) {
// Parent.
forwardSingals(pid);

...
  }
}
#endif // __WINDOWS__

// Child.
if (command->shell()) {
  os::execlp(...);
} else {
  execvp(...);
}
```



src/slave/containerizer/mesos/launch.cpp (line 502)


I'd avoid using subprocess here. Using subprocess here means that we'll 
initialize libprocess, creating many unnecessary threads. I would simply 
fork-exec here. We don't have two worry about async signal safe problem here 
because it's single threaded process.

We need to fix the `pre_exec_comands` above for the same problem.



src/slave/containerizer/mesos/launch.cpp (lines 542 - 544)


else if



src/slave/containerizer/mesos/launch.cpp (line 550)


kill one extra line



src/slave/containerizer/mesos/launch.cpp (line 562)


better to print the exit status path as well here


- Jie Yu


On Aug. 25, 2016, 1:21 a.m., Kevin Klues wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51406/
> ---
> 
> (Updated Aug. 25, 2016, 1:21 a.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-6088
> https://issues.apache.org/jira/browse/MESOS-6088