Re: [systemd-devel] [PATCH 2/3] nspawn: use Barrier API instead of eventfd-util
Hi On Mon, Jul 14, 2014 at 3:28 AM, Djalal Harouni tix...@opendz.org wrote: ppoll is atomic and it is handled by the kernel, so perhaps setting/restoring sigmask can be done easily! and for nspawn: IMO we need to receive SIGCHLD which implies EINTR. I say EINTR since not only for blocking read or infinite poll, but perhaps for all the other functions that the parent may do to setup the environment of the container, currently nspawn will set network interfaces before moving them into the container, it will also register the machine, and perhaps other operations... So having EINTR errors is useful here not only for direct reads, but for all the other calls that might block! IOW I think that nspawn should have an empty sig handler for SIGCHLD. Barrier reads already use poll and pipe to handle remote abortion since it can *not* be done by eventfd, yes this is perfect but for nspawn we can also achieve the same by combining eventfd and SICCHLD! What do you think if we make Barrier use: eventfd+pipe and/or eventfd+SIGCHLD ? Most complex fork/clone code should receive SIGCHLD, and think about nspawn! we do want it to be as lightweigh as possible, having 4 fds by default (2 eventfd + heavy pipe) may hit some resource limits quickly! compared to: 2 eventfd + empty sig handler! My first attempt was to use a signalfd on SIGCHLD + edge-triggered. If I don't read from the signalfd and only use it to wake up and wall waitid(WNOWAIT), I won't interfere with other signalfds. However, this wasn't really more lightweight than the pipe-method so i ditched it. Regarding dropping the pipe: pipe2() is _really_ fast. I mean, we're fork()ing and running like thousands of syscalls just during container setup. I cannot see how dropping one light pipe2 call is beneficial here? We also destroy the pipe before running the real container. So it's really just during setup. And it seems from the patch you are not checking barrier_place() return code, if the remote aborted ? That's fine. Abortions are remembered and the later barrier_sync() call will return immediately. Thanks for the patches, sure the API is really nice, I'll try to comment on #1 Thanks! David ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] [PATCH 2/3] nspawn: use Barrier API instead of eventfd-util
On Thu, Jul 17, 2014 at 11:30:26AM +0200, David Herrmann wrote: Hi On Mon, Jul 14, 2014 at 3:28 AM, Djalal Harouni tix...@opendz.org wrote: ppoll is atomic and it is handled by the kernel, so perhaps setting/restoring sigmask can be done easily! and for nspawn: IMO we need to receive SIGCHLD which implies EINTR. I say EINTR since not only for blocking read or infinite poll, but perhaps for all the other functions that the parent may do to setup the environment of the container, currently nspawn will set network interfaces before moving them into the container, it will also register the machine, and perhaps other operations... So having EINTR errors is useful here not only for direct reads, but for all the other calls that might block! IOW I think that nspawn should have an empty sig handler for SIGCHLD. Barrier reads already use poll and pipe to handle remote abortion since it can *not* be done by eventfd, yes this is perfect but for nspawn we can also achieve the same by combining eventfd and SICCHLD! What do you think if we make Barrier use: eventfd+pipe and/or eventfd+SIGCHLD ? Most complex fork/clone code should receive SIGCHLD, and think about nspawn! we do want it to be as lightweigh as possible, having 4 fds by default (2 eventfd + heavy pipe) may hit some resource limits quickly! compared to: 2 eventfd + empty sig handler! My first attempt was to use a signalfd on SIGCHLD + edge-triggered. If I don't read from the signalfd and only use it to wake up and wall waitid(WNOWAIT), I won't interfere with other signalfds. However, this wasn't really more lightweight than the pipe-method so i ditched it. Ok. Regarding dropping the pipe: pipe2() is _really_ fast. I mean, we're fork()ing and running like thousands of syscalls just during container setup. I cannot see how dropping one light pipe2 call is beneficial here? We also destroy the pipe before running the real container. So it's really just during setup. Yes, compared to fork() and all the other stuff, pipe2() is fast. My concern was about the other resources that pipe needs and the fd limit. Of course, it depends on nspawn future and plans, 2 or 4 fds sure it will affect systems that will run multiple nspawn instances... but perhaps this is not an issue for nspawn! Otherwise I'm ok with having a pipe as a mechanism to detect container failure, and a good point for general cases: it does not interfere with signal handlers Thanks! -- Djalal Harouni http://opendz.org ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
[systemd-devel] [PATCH 2/3] nspawn: use Barrier API instead of eventfd-util
The Barrier-API simplifies cross-fork() synchronization a lot. Replace the hard-coded eventfd-util implementation and drop it. Compared to the old API, Barriers also handle exit() of the remote side as abortion. This way, segfaults will not cause the parent to deadlock. EINTR handling is currently ignored for any barrier-waits. This can easily be added, but it isn't needed so far so I dropped it. EINTR handling in general is ugly, anyway. You need to deal with pselect/ppoll/... variants and make sure not to unblock signals at the wrong times. So genrally, there's little use in adding it. --- Makefile.am | 2 - src/nspawn/nspawn.c | 136 - src/shared/eventfd-util.c | 169 -- src/shared/eventfd-util.h | 43 4 files changed, 60 insertions(+), 290 deletions(-) delete mode 100644 src/shared/eventfd-util.c delete mode 100644 src/shared/eventfd-util.h diff --git a/Makefile.am b/Makefile.am index 039a83e..d7b0f90 100644 --- a/Makefile.am +++ b/Makefile.am @@ -832,8 +832,6 @@ libsystemd_shared_la_SOURCES = \ src/shared/barrier.h \ src/shared/async.c \ src/shared/async.h \ - src/shared/eventfd-util.c \ - src/shared/eventfd-util.h \ src/shared/copy.c \ src/shared/copy.h \ src/shared/base-filesystem.c \ diff --git a/src/nspawn/nspawn.c b/src/nspawn/nspawn.c index bad93a5..e75cc28 100644 --- a/src/nspawn/nspawn.c +++ b/src/nspawn/nspawn.c @@ -40,7 +40,6 @@ #include sys/un.h #include sys/socket.h #include linux/netlink.h -#include sys/eventfd.h #include net/if.h #include linux/veth.h #include sys/personality.h @@ -84,12 +83,12 @@ #include def.h #include rtnl-util.h #include udev-util.h -#include eventfd-util.h #include blkid-util.h #include gpt.h #include siphash24.h #include copy.h #include base-filesystem.h +#include barrier.h #ifdef HAVE_SECCOMP #include seccomp-util.h @@ -3074,12 +3073,18 @@ int main(int argc, char *argv[]) { for (;;) { ContainerStatus container_status; -int eventfds[2] = { -1, -1 }; +_barrier_destroy_ Barrier barrier = { }; struct sigaction sa = { .sa_handler = nop_handler, .sa_flags = SA_NOCLDSTOP, }; +r = barrier_init(barrier); +if (r 0) { +log_error(Cannot initialize IPC barrier: %s, strerror(-r)); +goto finish; +} + /* Child can be killed before execv(), so handle SIGCHLD * in order to interrupt parent's blocking calls and * give it a chance to call wait() and terminate. */ @@ -3095,9 +3100,9 @@ int main(int argc, char *argv[]) { goto finish; } -pid = clone_with_eventfd(SIGCHLD|CLONE_NEWNS| - (arg_share_system ? 0 : CLONE_NEWIPC|CLONE_NEWPID|CLONE_NEWUTS)| - (arg_private_network ? CLONE_NEWNET : 0), eventfds); +pid = syscall(__NR_clone, SIGCHLD|CLONE_NEWNS| + (arg_share_system ? 0 : CLONE_NEWIPC|CLONE_NEWPID|CLONE_NEWUTS)| + (arg_private_network ? CLONE_NEWNET : 0), NULL); if (pid 0) { if (errno == EINVAL) log_error(clone() failed, do you have namespace support enabled in your kernel? (You need UTS, IPC, PID and NET namespacing built in): %m); @@ -3126,6 +3131,8 @@ int main(int argc, char *argv[]) { }; char **env_use; +barrier_set_role(barrier, BARRIER_CHILD); + envp[n_env] = strv_find_prefix(environ, TERM=); if (envp[n_env]) n_env ++; @@ -3151,26 +3158,26 @@ int main(int argc, char *argv[]) { } log_error(Failed to open console: %s, strerror(-k)); -goto child_fail; +_exit(EXIT_FAILURE); } if (dup2(STDIN_FILENO, STDOUT_FILENO) != STDOUT_FILENO || dup2(STDIN_FILENO, STDERR_FILENO) != STDERR_FILENO) { log_error(Failed to duplicate console: %m); -goto child_fail; +_exit(EXIT_FAILURE); } if (setsid() 0) { log_error(setsid() failed: %m); -goto child_fail; +_exit(EXIT_FAILURE);
Re: [systemd-devel] [PATCH 2/3] nspawn: use Barrier API instead of eventfd-util
Hi, Yes, sending abortion events asynchronously is really nice :-) Just small quick comments for nspawn here, but I'll comment on patch #1 On Sun, Jul 13, 2014 at 12:37:07PM +0200, David Herrmann wrote: The Barrier-API simplifies cross-fork() synchronization a lot. Replace the hard-coded eventfd-util implementation and drop it. Ok Compared to the old API, Barriers also handle exit() of the remote side as abortion. This way, segfaults will not cause the parent to deadlock. Indeed, eventfd only sends POLLHUP wakeups for in kernel clients, userspace do not see it, and we can't complain since the doc states that :-/ Having POLLHUP for userspace will make eventfd replace pipe. I hacked a quick kernel patch to test it, but it would be hard to send it via poll, since the only available mechanism is eventfd counter and it is already used to generate POLLERR... EINTR handling is currently ignored for any barrier-waits. This can easily be added, but it isn't needed so far so I dropped it. EINTR handling in general is ugly, anyway. You need to deal with pselect/ppoll/... variants and make sure not to unblock signals at the wrong times. So genrally, there's little use in adding it. ppoll is atomic and it is handled by the kernel, so perhaps setting/restoring sigmask can be done easily! and for nspawn: IMO we need to receive SIGCHLD which implies EINTR. I say EINTR since not only for blocking read or infinite poll, but perhaps for all the other functions that the parent may do to setup the environment of the container, currently nspawn will set network interfaces before moving them into the container, it will also register the machine, and perhaps other operations... So having EINTR errors is useful here not only for direct reads, but for all the other calls that might block! IOW I think that nspawn should have an empty sig handler for SIGCHLD. Barrier reads already use poll and pipe to handle remote abortion since it can *not* be done by eventfd, yes this is perfect but for nspawn we can also achieve the same by combining eventfd and SICCHLD! What do you think if we make Barrier use: eventfd+pipe and/or eventfd+SIGCHLD ? Most complex fork/clone code should receive SIGCHLD, and think about nspawn! we do want it to be as lightweigh as possible, having 4 fds by default (2 eventfd + heavy pipe) may hit some resource limits quickly! compared to: 2 eventfd + empty sig handler! And it seems from the patch you are not checking barrier_place() return code, if the remote aborted ? Thanks for the patches, sure the API is really nice, I'll try to comment on #1 --- Makefile.am | 2 - src/nspawn/nspawn.c | 136 - src/shared/eventfd-util.c | 169 -- src/shared/eventfd-util.h | 43 4 files changed, 60 insertions(+), 290 deletions(-) delete mode 100644 src/shared/eventfd-util.c delete mode 100644 src/shared/eventfd-util.h diff --git a/Makefile.am b/Makefile.am index 039a83e..d7b0f90 100644 --- a/Makefile.am +++ b/Makefile.am @@ -832,8 +832,6 @@ libsystemd_shared_la_SOURCES = \ src/shared/barrier.h \ src/shared/async.c \ src/shared/async.h \ - src/shared/eventfd-util.c \ - src/shared/eventfd-util.h \ src/shared/copy.c \ src/shared/copy.h \ src/shared/base-filesystem.c \ diff --git a/src/nspawn/nspawn.c b/src/nspawn/nspawn.c index bad93a5..e75cc28 100644 --- a/src/nspawn/nspawn.c +++ b/src/nspawn/nspawn.c @@ -40,7 +40,6 @@ #include sys/un.h #include sys/socket.h #include linux/netlink.h -#include sys/eventfd.h #include net/if.h #include linux/veth.h #include sys/personality.h @@ -84,12 +83,12 @@ #include def.h #include rtnl-util.h #include udev-util.h -#include eventfd-util.h #include blkid-util.h #include gpt.h #include siphash24.h #include copy.h #include base-filesystem.h +#include barrier.h #ifdef HAVE_SECCOMP #include seccomp-util.h @@ -3074,12 +3073,18 @@ int main(int argc, char *argv[]) { for (;;) { ContainerStatus container_status; -int eventfds[2] = { -1, -1 }; +_barrier_destroy_ Barrier barrier = { }; struct sigaction sa = { .sa_handler = nop_handler, .sa_flags = SA_NOCLDSTOP, }; +r = barrier_init(barrier); +if (r 0) { +log_error(Cannot initialize IPC barrier: %s, strerror(-r)); +goto finish; +} + /* Child can be killed before execv(), so handle SIGCHLD * in order to interrupt parent's blocking calls and * give it a chance to call wait() and terminate. */ @@ -3095,9 +3100,9 @@ int main(int argc, char *argv[]) { goto finish;