Re: Review Request 51406: Updated launch helper to optionally spawn an 'init' process on linux.

Kevin Klues Sat, 24 Sep 2016 12:21:05 -0700


> On Sept. 24, 2016, 1:33 a.m., Jie Yu wrote:
> > src/slave/containerizer/mesos/launch.cpp, line 140
> > <https://reviews.apache.org/r/51406/diff/7/?file=1509426#file1509426line140>
> >
> >     Looks like sprintf is not async safe in glibc:
> >     
> > http://stackoverflow.com/questions/14573000/print-int-from-signal-handler-using-write-or-async-safe-functions
> >     
> >     cerr << below is not async safe either.
> >     
> >     I think it's highly unlikely that we'll hit the async safe problem here 
> > because most of the time, our code will be in syscall. So I suggest we 
> > simply this function and add a TODO about async safety (we try to be async 
> > safe, but if it's too hard, use a simple way):
> >     
> >     ```
> >     Try<Nothing> write = os::write(
> >         containerStatusFd.get(),
> >         std::to_string(status));
> >     
> >     if (write.isError()) {
> >       os::write(
> >           STDERR_FILENO,
> >           "Failed to write container status '" + statusString +
> >           "': " + ::strerror(errno));
> >     }
> >     ```
> >     
> >     I'd not do the `_exit` here because this helper is meant to be writting 
> > the status. Leave the `_exit` to the caller.


Yeah, that's my bad. I didn't look closely enough at `sprintf()` and assumed it 
was async signal safe since it takes a pointer to the destination where to 
write the result. I forgot that it does buffered I/O under the hood though. Doh!

The `cerr <<` was just an oversight. Obviously this is not async signal safe!

Regarding the exit... If we don't want to do an exit here, then the function 
should return a `bool` so the caller can decide what to do.


> On Sept. 24, 2016, 1:33 a.m., Jie Yu wrote:
> > src/slave/containerizer/mesos/launch.cpp, line 303
> > <https://reviews.apache.org/r/51406/diff/7/?file=1509426#file1509426line303>
> >
> >     Not your, but what I am thinking here is that if we should 
> > `exitWithStatus(EXIT_FAILURE)` here or simply `exitWithSignal(SIGABRT)`.
> >     
> >     The reason being: If we exit with EXIT_FAILURE, the user might think 
> > their task (or nested container) failed, rather than something wrong with 
> > our init process.
> >     
> >     Let me know what you think. We don't need to do it now as this is the 
> > current semantics.

Either of these seem reasonable to me. I would expect any error at all to 
trigger the user to go look at the stderr log and it should be clear that it 
dies in the init process given the error messages we are logging there.


> On Sept. 24, 2016, 1:33 a.m., Jie Yu wrote:
> > src/slave/containerizer/mesos/launch.cpp, line 33
> > <https://reviews.apache.org/r/51406/diff/7/?file=1509426#file1509426line33>
> >
> >     No need for this as we already included `os.hpp`

Ack.


- Kevin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51406/#review150266
-----------------------------------------------------------


On Sept. 23, 2016, 8:59 p.m., Kevin Klues wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51406/
> -----------------------------------------------------------
> 
> (Updated Sept. 23, 2016, 8:59 p.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-6088
>     https://issues.apache.org/jira/browse/MESOS-6088
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Previously the 'mesos-containerizer launch' binary would simply exec
> into the actual command we wanted to launch after doing some set of
> preperatory work. The problem with this approach, however, is that
> this gives us no opportunity to checkpoint the wait status of the
> command so the agent can recover it in cases where it is offline at
> the time the command completes.
> 
> To compensate for this, we now allow 'mesos-containerizer launch' to
> take an optional '--runtime_directory' argument, which indicates where
> to checkpint the status of the launched command (as returned by
> 'waitpid()'). In order to do this checkpointing, however, we cannot
> simply exec into the command anymore. Instead we now fork-exec the
> command and reap it once it completes. We then checkpoint its status
> and return it as our own. We call the original process the 'init'
> process of the container and the fork-exec'd process its child.  As a
> side effect of doing things this way, we also have to be careful to
> forward all signals sent to the init process down to the child.
> 
> In a subsequent commit we will update the mesos containerizer to use
> this new functionality.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/mesos/launch.hpp 
> 859990cb85e9e8c06400397256cfc512f0811800 
>   src/slave/containerizer/mesos/launch.cpp 
> 777121cffaa5fc76adfefc1e617409d997c7aac8 
> 
> Diff: https://reviews.apache.org/r/51406/diff/
> 
> 
> Testing
> -------
> 
> $ GTEST_FILTER="" make -j check
> $ src/mesos-tests
> $ sudo src/mesos-tests
> 
> 
> Thanks,
> 
> Kevin Klues
> 
>

Re: Review Request 51406: Updated launch helper to optionally spawn an 'init' process on linux.

Reply via email to