Re: [systemd-devel] [PATCH 2/3] nspawn: use Barrier API instead of eventfd-util

2014-07-17 Thread David Herrmann
Hi

On Mon, Jul 14, 2014 at 3:28 AM, Djalal Harouni tix...@opendz.org wrote:
 ppoll is atomic and it is handled by the kernel, so perhaps
 setting/restoring sigmask can be done easily! and for nspawn: IMO we need
 to receive SIGCHLD which implies EINTR.

 I say EINTR since not only for blocking read or infinite poll, but
 perhaps for all the other functions that the parent may do to setup the
 environment of the container, currently nspawn will set network
 interfaces before moving them into the container, it will also register
 the machine, and perhaps other operations...

 So having EINTR errors is useful here not only for direct reads, but for
 all the other calls that might block! IOW I think that nspawn should
 have an empty sig handler for SIGCHLD.

 Barrier reads already use poll and pipe to handle remote abortion since
 it can *not* be done by eventfd, yes this is perfect but for nspawn we
 can also achieve the same by combining eventfd and SICCHLD!

 What do you think if we make Barrier use:
 eventfd+pipe and/or eventfd+SIGCHLD ?

 Most complex fork/clone code should receive SIGCHLD, and think about
 nspawn! we do want it to be as lightweigh as possible, having 4 fds by
 default (2 eventfd + heavy pipe) may hit some resource limits quickly!

 compared to: 2 eventfd + empty sig handler!

My first attempt was to use a signalfd on SIGCHLD + edge-triggered. If
I don't read from the signalfd and only use it to wake up and wall
waitid(WNOWAIT), I won't interfere with other signalfds. However, this
wasn't really more lightweight than the pipe-method so i ditched it.

Regarding dropping the pipe: pipe2() is _really_ fast. I mean, we're
fork()ing and running like thousands of syscalls just during container
setup. I cannot see how dropping one light pipe2 call is beneficial
here? We also destroy the pipe before running the real container. So
it's really just during setup.

 And it seems from the patch you are not checking barrier_place() return
 code, if the remote aborted ?

That's fine. Abortions are remembered and the later barrier_sync()
call will return immediately.

 Thanks for the patches, sure the API is really nice, I'll try to comment
 on #1

Thanks!
David
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH 2/3] nspawn: use Barrier API instead of eventfd-util

2014-07-17 Thread Djalal Harouni
On Thu, Jul 17, 2014 at 11:30:26AM +0200, David Herrmann wrote:
 Hi
 
 On Mon, Jul 14, 2014 at 3:28 AM, Djalal Harouni tix...@opendz.org wrote:
  ppoll is atomic and it is handled by the kernel, so perhaps
  setting/restoring sigmask can be done easily! and for nspawn: IMO we need
  to receive SIGCHLD which implies EINTR.
 
  I say EINTR since not only for blocking read or infinite poll, but
  perhaps for all the other functions that the parent may do to setup the
  environment of the container, currently nspawn will set network
  interfaces before moving them into the container, it will also register
  the machine, and perhaps other operations...
 
  So having EINTR errors is useful here not only for direct reads, but for
  all the other calls that might block! IOW I think that nspawn should
  have an empty sig handler for SIGCHLD.
 
  Barrier reads already use poll and pipe to handle remote abortion since
  it can *not* be done by eventfd, yes this is perfect but for nspawn we
  can also achieve the same by combining eventfd and SICCHLD!
 
  What do you think if we make Barrier use:
  eventfd+pipe and/or eventfd+SIGCHLD ?
 
  Most complex fork/clone code should receive SIGCHLD, and think about
  nspawn! we do want it to be as lightweigh as possible, having 4 fds by
  default (2 eventfd + heavy pipe) may hit some resource limits quickly!
 
  compared to: 2 eventfd + empty sig handler!
 
 My first attempt was to use a signalfd on SIGCHLD + edge-triggered. If
 I don't read from the signalfd and only use it to wake up and wall
 waitid(WNOWAIT), I won't interfere with other signalfds. However, this
 wasn't really more lightweight than the pipe-method so i ditched it.
Ok.

 Regarding dropping the pipe: pipe2() is _really_ fast. I mean, we're
 fork()ing and running like thousands of syscalls just during container
 setup. I cannot see how dropping one light pipe2 call is beneficial
 here? We also destroy the pipe before running the real container. So
 it's really just during setup.
Yes, compared to fork() and all the other stuff, pipe2() is fast. My
concern was about the other resources that pipe needs and the fd limit.

Of course, it depends on nspawn future and plans, 2 or 4 fds sure it
will affect systems that will run multiple nspawn instances... but
perhaps this is not an issue for nspawn!

Otherwise I'm ok with having a pipe as a mechanism to detect container
failure, and a good point for general cases: it does not interfere with
signal handlers

Thanks!

-- 
Djalal Harouni
http://opendz.org
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH 2/3] nspawn: use Barrier API instead of eventfd-util

2014-07-13 Thread David Herrmann
The Barrier-API simplifies cross-fork() synchronization a lot. Replace the
hard-coded eventfd-util implementation and drop it.

Compared to the old API, Barriers also handle exit() of the remote side as
abortion. This way, segfaults will not cause the parent to deadlock.

EINTR handling is currently ignored for any barrier-waits. This can easily
be added, but it isn't needed so far so I dropped it. EINTR handling in
general is ugly, anyway. You need to deal with pselect/ppoll/... variants
and make sure not to unblock signals at the wrong times. So genrally,
there's little use in adding it.
---
 Makefile.am   |   2 -
 src/nspawn/nspawn.c   | 136 -
 src/shared/eventfd-util.c | 169 --
 src/shared/eventfd-util.h |  43 
 4 files changed, 60 insertions(+), 290 deletions(-)
 delete mode 100644 src/shared/eventfd-util.c
 delete mode 100644 src/shared/eventfd-util.h

diff --git a/Makefile.am b/Makefile.am
index 039a83e..d7b0f90 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -832,8 +832,6 @@ libsystemd_shared_la_SOURCES = \
src/shared/barrier.h \
src/shared/async.c \
src/shared/async.h \
-   src/shared/eventfd-util.c \
-   src/shared/eventfd-util.h \
src/shared/copy.c \
src/shared/copy.h \
src/shared/base-filesystem.c \
diff --git a/src/nspawn/nspawn.c b/src/nspawn/nspawn.c
index bad93a5..e75cc28 100644
--- a/src/nspawn/nspawn.c
+++ b/src/nspawn/nspawn.c
@@ -40,7 +40,6 @@
 #include sys/un.h
 #include sys/socket.h
 #include linux/netlink.h
-#include sys/eventfd.h
 #include net/if.h
 #include linux/veth.h
 #include sys/personality.h
@@ -84,12 +83,12 @@
 #include def.h
 #include rtnl-util.h
 #include udev-util.h
-#include eventfd-util.h
 #include blkid-util.h
 #include gpt.h
 #include siphash24.h
 #include copy.h
 #include base-filesystem.h
+#include barrier.h
 
 #ifdef HAVE_SECCOMP
 #include seccomp-util.h
@@ -3074,12 +3073,18 @@ int main(int argc, char *argv[]) {
 
 for (;;) {
 ContainerStatus container_status;
-int eventfds[2] = { -1, -1 };
+_barrier_destroy_ Barrier barrier = { };
 struct sigaction sa = {
 .sa_handler = nop_handler,
 .sa_flags = SA_NOCLDSTOP,
 };
 
+r = barrier_init(barrier);
+if (r  0) {
+log_error(Cannot initialize IPC barrier: %s, 
strerror(-r));
+goto finish;
+}
+
 /* Child can be killed before execv(), so handle SIGCHLD
  * in order to interrupt parent's blocking calls and
  * give it a chance to call wait() and terminate. */
@@ -3095,9 +3100,9 @@ int main(int argc, char *argv[]) {
 goto finish;
 }
 
-pid = clone_with_eventfd(SIGCHLD|CLONE_NEWNS|
- (arg_share_system ? 0 : 
CLONE_NEWIPC|CLONE_NEWPID|CLONE_NEWUTS)|
- (arg_private_network ? CLONE_NEWNET : 
0), eventfds);
+pid = syscall(__NR_clone, SIGCHLD|CLONE_NEWNS|
+  (arg_share_system ? 0 : 
CLONE_NEWIPC|CLONE_NEWPID|CLONE_NEWUTS)|
+  (arg_private_network ? CLONE_NEWNET 
: 0), NULL);
 if (pid  0) {
 if (errno == EINVAL)
 log_error(clone() failed, do you have 
namespace support enabled in your kernel? (You need UTS, IPC, PID and NET 
namespacing built in): %m);
@@ -3126,6 +3131,8 @@ int main(int argc, char *argv[]) {
 };
 char **env_use;
 
+barrier_set_role(barrier, BARRIER_CHILD);
+
 envp[n_env] = strv_find_prefix(environ, TERM=);
 if (envp[n_env])
 n_env ++;
@@ -3151,26 +3158,26 @@ int main(int argc, char *argv[]) {
 }
 
 log_error(Failed to open console: %s, 
strerror(-k));
-goto child_fail;
+_exit(EXIT_FAILURE);
 }
 
 if (dup2(STDIN_FILENO, STDOUT_FILENO) != STDOUT_FILENO 
||
 dup2(STDIN_FILENO, STDERR_FILENO) != 
STDERR_FILENO) {
 log_error(Failed to duplicate console: %m);
-goto child_fail;
+_exit(EXIT_FAILURE);
 }
 
 if (setsid()  0) {
 log_error(setsid() failed: %m);
-goto child_fail;
+_exit(EXIT_FAILURE);

Re: [systemd-devel] [PATCH 2/3] nspawn: use Barrier API instead of eventfd-util

2014-07-13 Thread Djalal Harouni
Hi,

Yes, sending abortion events asynchronously is really nice :-)

Just small quick comments for nspawn here, but I'll comment on patch #1

On Sun, Jul 13, 2014 at 12:37:07PM +0200, David Herrmann wrote:
 The Barrier-API simplifies cross-fork() synchronization a lot. Replace the
 hard-coded eventfd-util implementation and drop it.
Ok

 Compared to the old API, Barriers also handle exit() of the remote side as
 abortion. This way, segfaults will not cause the parent to deadlock.
Indeed, eventfd only sends POLLHUP wakeups for in kernel clients, userspace
do not see it, and we can't complain since the doc states that :-/

Having POLLHUP for userspace will make eventfd replace pipe. I hacked a
quick kernel patch to test it, but it would be hard to send it via poll,
since the only available mechanism is eventfd counter and it is already
used to generate POLLERR...

 EINTR handling is currently ignored for any barrier-waits. This can easily
 be added, but it isn't needed so far so I dropped it. EINTR handling in
 general is ugly, anyway. You need to deal with pselect/ppoll/... variants
 and make sure not to unblock signals at the wrong times. So genrally,
 there's little use in adding it.

ppoll is atomic and it is handled by the kernel, so perhaps
setting/restoring sigmask can be done easily! and for nspawn: IMO we need
to receive SIGCHLD which implies EINTR.

I say EINTR since not only for blocking read or infinite poll, but
perhaps for all the other functions that the parent may do to setup the
environment of the container, currently nspawn will set network
interfaces before moving them into the container, it will also register
the machine, and perhaps other operations...

So having EINTR errors is useful here not only for direct reads, but for
all the other calls that might block! IOW I think that nspawn should
have an empty sig handler for SIGCHLD.

Barrier reads already use poll and pipe to handle remote abortion since
it can *not* be done by eventfd, yes this is perfect but for nspawn we
can also achieve the same by combining eventfd and SICCHLD!

What do you think if we make Barrier use:
eventfd+pipe and/or eventfd+SIGCHLD ?

Most complex fork/clone code should receive SIGCHLD, and think about
nspawn! we do want it to be as lightweigh as possible, having 4 fds by
default (2 eventfd + heavy pipe) may hit some resource limits quickly!

compared to: 2 eventfd + empty sig handler!


And it seems from the patch you are not checking barrier_place() return
code, if the remote aborted ?

Thanks for the patches, sure the API is really nice, I'll try to comment
on #1


 ---
  Makefile.am   |   2 -
  src/nspawn/nspawn.c   | 136 -
  src/shared/eventfd-util.c | 169 
 --
  src/shared/eventfd-util.h |  43 
  4 files changed, 60 insertions(+), 290 deletions(-)
  delete mode 100644 src/shared/eventfd-util.c
  delete mode 100644 src/shared/eventfd-util.h
 
 diff --git a/Makefile.am b/Makefile.am
 index 039a83e..d7b0f90 100644
 --- a/Makefile.am
 +++ b/Makefile.am
 @@ -832,8 +832,6 @@ libsystemd_shared_la_SOURCES = \
   src/shared/barrier.h \
   src/shared/async.c \
   src/shared/async.h \
 - src/shared/eventfd-util.c \
 - src/shared/eventfd-util.h \
   src/shared/copy.c \
   src/shared/copy.h \
   src/shared/base-filesystem.c \
 diff --git a/src/nspawn/nspawn.c b/src/nspawn/nspawn.c
 index bad93a5..e75cc28 100644
 --- a/src/nspawn/nspawn.c
 +++ b/src/nspawn/nspawn.c
 @@ -40,7 +40,6 @@
  #include sys/un.h
  #include sys/socket.h
  #include linux/netlink.h
 -#include sys/eventfd.h
  #include net/if.h
  #include linux/veth.h
  #include sys/personality.h
 @@ -84,12 +83,12 @@
  #include def.h
  #include rtnl-util.h
  #include udev-util.h
 -#include eventfd-util.h
  #include blkid-util.h
  #include gpt.h
  #include siphash24.h
  #include copy.h
  #include base-filesystem.h
 +#include barrier.h
  
  #ifdef HAVE_SECCOMP
  #include seccomp-util.h
 @@ -3074,12 +3073,18 @@ int main(int argc, char *argv[]) {
  
  for (;;) {
  ContainerStatus container_status;
 -int eventfds[2] = { -1, -1 };
 +_barrier_destroy_ Barrier barrier = { };
  struct sigaction sa = {
  .sa_handler = nop_handler,
  .sa_flags = SA_NOCLDSTOP,
  };
  
 +r = barrier_init(barrier);
 +if (r  0) {
 +log_error(Cannot initialize IPC barrier: %s, 
 strerror(-r));
 +goto finish;
 +}
 +
  /* Child can be killed before execv(), so handle SIGCHLD
   * in order to interrupt parent's blocking calls and
   * give it a chance to call wait() and terminate. */
 @@ -3095,9 +3100,9 @@ int main(int argc, char *argv[]) {
  goto finish;