On Fri, Apr 11, 2014 at 2:45 AM, Djalal Harouni <tix...@opendz.org> wrote: > nspawn and the container child use eventfd to wait and notify each other > that they are ready so the container setup can be completed. > > However in its current form the wait/notify event ignore errors that > may especially affect the child (container). > > On errors the child will jump to the "child_fail" label and terminate > with _exit(EXIT_FAILURE) without notifying the parent. Since the eventfd > is created without the "EFD_NONBLOCK" flag, this leaves the parent > blocking on the eventfd_read() call. > > To fix this without adding extra overheads, we keep the eventfd logic > and improve it by adding: > > * States of the parent and child setups: > SETUP_INIT, SETUP_SUCCEEDED and SETUP_FAILED > > * In the child if the setup succeeded we notify parent by writing a > SETUP_SUCCEEDED value, otherwise we make sure to write a SETUP_FAILED > before the _exit(). This prevents the parent from waiting on an event > that will never come. > > * In parent read the child setup state, if SETUP_SUCCEEDED continue, > otherwise jump to "check_container_status" label, get the container > child status and release resources. > > https://bugs.freedesktop.org/show_bug.cgi?id=76193 > > Reported-by: Tobias Hunger <tobias.hun...@gmail.com>
Looks good to me. Cheers, Tom > --- > src/nspawn/nspawn.c | 50 ++++++++++++++++++++++++++++++++++++++------------ > 1 file changed, 38 insertions(+), 12 deletions(-) > > diff --git a/src/nspawn/nspawn.c b/src/nspawn/nspawn.c > index d606bf2..21322f7 100644 > --- a/src/nspawn/nspawn.c > +++ b/src/nspawn/nspawn.c > @@ -104,6 +104,16 @@ typedef enum LinkJournal { > LINK_GUEST > } LinkJournal; > > +enum { > + /* States of the parent (nspawn) and child (container) setups > + * These are used for event notification so we can inform > + * the other party if the appropriate setup succeeded or > + * failed. */ > + SETUP_INIT, > + SETUP_SUCCEEDED, > + SETUP_FAILED > +}; > + > static char *arg_directory = NULL; > static char *arg_user = NULL; > static sd_id128_t arg_uuid = {}; > @@ -2815,16 +2825,16 @@ int main(int argc, char *argv[]) { > > for (;;) { > ContainerStatus container_status; > + eventfd_t event_status; > int parent_ready_fd = -1, child_ready_fd = -1; > - eventfd_t x; > > - parent_ready_fd = eventfd(0, EFD_CLOEXEC); > + parent_ready_fd = eventfd(SETUP_INIT, EFD_CLOEXEC); > if (parent_ready_fd < 0) { > log_error("Failed to create event fd: %m"); > goto finish; > } > > - child_ready_fd = eventfd(0, EFD_CLOEXEC); > + child_ready_fd = eventfd(SETUP_INIT, EFD_CLOEXEC); > if (child_ready_fd < 0) { > log_error("Failed to create event fd: %m"); > goto finish; > @@ -2980,7 +2990,7 @@ int main(int argc, char *argv[]) { > /* Tell the parent that we are ready, and that > * it can cgroupify us to that we lack access > * to certain devices and resources. */ > - eventfd_write(child_ready_fd, 1); > + eventfd_write(child_ready_fd, SETUP_SUCCEEDED); > child_ready_fd = safe_close(child_ready_fd); > > if (chdir(arg_directory) < 0) { > @@ -3081,7 +3091,7 @@ int main(int argc, char *argv[]) { > env_use = (char**) envp; > > /* Wait until the parent is ready with the setup, > too... */ > - eventfd_read(parent_ready_fd, &x); > + eventfd_read(parent_ready_fd, &event_status); > parent_ready_fd = safe_close(parent_ready_fd); > > if (arg_boot) { > @@ -3113,17 +3123,29 @@ int main(int argc, char *argv[]) { > log_error("execv() failed: %m"); > > child_fail: > + /* Tell the parent that the setup failed, so he > + * can clean up resources and terminate. */ > + if (child_ready_fd != -1) { > + eventfd_write(child_ready_fd, SETUP_FAILED); > + child_ready_fd = safe_close(child_ready_fd); > + } > _exit(EXIT_FAILURE); > } > > fdset_free(fds); > fds = NULL; > > - /* Wait until the child reported that it is ready with > - * all it needs to do with priviliges. After we got > - * the notification we can make the process join its > - * cgroup which might limit what it can do */ > - eventfd_read(child_ready_fd, &x); > + /* Wait for the child event: > + * If SETUP_FAILED, the child will terminate soon. > + * If SETUP_SUCCEEDED, the child is reporting that it > + * is ready with all it needs to do with priviliges. > + * After we got the notification we can make the > + * process join its cgroup which might limit what it > + * can do */ > + r = eventfd_read(child_ready_fd, &event_status); > + child_ready_fd = safe_close(child_ready_fd); > + if (r < 0 || event_status == SETUP_FAILED) > + goto check_container_status; > > r = register_machine(pid); > if (r < 0) > @@ -3146,9 +3168,12 @@ int main(int argc, char *argv[]) { > goto finish; > > /* Notify the child that the parent is ready with all > - * its setup, and thtat the child can now hand over > + * its setup, and that the child can now hand over > * control to the code to run inside the container. */ > - eventfd_write(parent_ready_fd, 1); > + r = eventfd_write(parent_ready_fd, SETUP_SUCCEEDED); > + parent_ready_fd = safe_close(parent_ready_fd); > + if (r < 0) > + goto finish; > > k = process_pty(master, &mask, arg_boot ? pid : 0, > SIGRTMIN+3); > if (k < 0) { > @@ -3162,6 +3187,7 @@ int main(int argc, char *argv[]) { > /* Kill if it is not dead yet anyway */ > terminate_machine(pid); > > +check_container_status: > /* Redundant, but better safe than sorry */ > kill(pid, SIGKILL); > > -- > 1.8.5.3 > > _______________________________________________ > systemd-devel mailing list > systemd-devel@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/systemd-devel _______________________________________________ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel