Re: [systemd-devel] [PATCH] bus: fix duplicate comparisons

2013-10-10 Thread Carlos Silva
On Thu, Oct 10, 2013 at 5:14 AM, Tero Roponen tero.ropo...@gmail.com
 wrote:

 Testing for y > x is the same as testing for x < y.



 -if (y > x)
 +if (x > y)

<snip>

I think you forgot to change the signs ;)


Re: [systemd-devel] [PATCH] bus: fix duplicate comparisons

2013-10-10 Thread Olivier Brunel
On 10/10/13 12:38, Carlos Silva wrote:
 On Thu, Oct 10, 2013 at 5:14 AM, Tero Roponen tero.ropo...@gmail.com
  wrote:
 
 Testing for y > x is the same as testing for x < y.

 
 
 -if (y > x)
 +if (x > y)

 <snip>
 
 I think you forgot to change the signs ;)

No, I believe that was the point of the patch. The two tests were the
same, first testing (x < y), and then (y > x). Now it properly
tests for (x > y).
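
(An illustrative reconstruction with invented names, not the actual
patched file:)

#include <stdint.h>

static int order(uint64_t x, uint64_t y) {
        if (x < y)
                return -1;
        if (x > y)   /* before the patch: "if (y > x)", i.e. the same test
                      * as the one above, so this branch could never fire */
                return 1;
        return 0;
}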

-j

 
 
 



Re: [systemd-devel] [PATCH] bus: fix duplicate comparisons

2013-10-10 Thread Carlos Silva
On Thu, Oct 10, 2013 at 11:21 AM, Olivier Brunel j...@jjacky.com wrote:

 No, I believe that was the point of the patch. The two tests were the
  same, first testing (x < y), and then (y > x). Now it properly
 tests for (x > y).


Totally didn't read the context of the code, just the changes and the patch
comment. Sorry about that :-/


Re: [systemd-devel] [PATCH] make fsck fix mode a kernel command line option

2013-10-10 Thread Jan Engelhardt

On Monday 2013-10-07 14:25, Karel Zak wrote:
On Tue, Sep 10, 2013 at 04:55:19PM +0100, Colin Guthrie wrote:
 'Twas brillig, and Tom Gundersen at 10/09/13 13:45 did gyre and gimble:
  On Tue, Sep 10, 2013 at 2:31 PM, Jan Engelhardt jeng...@inai.de wrote:
 
  On Tuesday 2013-09-10 13:52, Dave Reisner wrote:
  the FUSE program knows
  nothing about the systemd-specific nofail or x-*.

 Note that mount(8) does not strip "nofail" when calling mount.<type>
 helpers.

And that, I would feel, is a problem, because it would require that
every mount helper out there be updated every time some special
option is added. "nofail" is something that mount should strip, because
it is IMHO specific to mount and/or the system boot, and not to the helper.


[systemd-devel] [udev] Wrong PID used in netlink socket

2013-10-10 Thread Sven Schnelle

Hi List,

I was debugging a problem in my own program which sometimes received
'Address already in use' during creation of the netlink socket. It
turned out that udevd has the same bug. What actually happens is that
udevd opens the netlink socket and forks afterwards. That doesn't sound
bad at all, but the PID of udevd is stored inside the sockaddr_nl
structure. So after udevd has forked, the PID stored in the kernel no
longer exists. If another process is now started that wants to do
netlink communication with the kernel and has (by coincidence) the same
PID, it will fail.
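
To make the failure mode concrete, here is a minimal standalone
reproducer sketch (not udevd code; the multicast group and the message
are illustrative):

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>

int main(void) {
        struct sockaddr_nl sa;
        int fd;

        fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_KOBJECT_UEVENT);
        if (fd < 0)
                return 1;

        memset(&sa, 0, sizeof(sa));
        sa.nl_family = AF_NETLINK;
        sa.nl_pid = getpid();   /* the value autobind would pick for us */
        sa.nl_groups = 1;       /* kernel uevent multicast group */

        /* If a stale socket (e.g. pre-fork udevd) still owns this nl_pid,
         * bind() fails exactly as described above. */
        if (bind(fd, (struct sockaddr *) &sa, sizeof(sa)) < 0 && errno == EADDRINUSE)
                fprintf(stderr, "nl_pid %d already in use\n", getpid());

        close(fd);
        return 0;
}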

Example from my system running udev:

pid of udevd is 18921:

# pidof udevd
18921

get the netlink socket for pid 18921:

$ lsof -np 18921
COMMAND   PID USER   FD   TYPE     DEVICE SIZE/OFF    NODE NAME
udevd   18921 root  cwd   DIR      253,1      4096       2 /
udevd   18921 root  rtd   DIR      253,1      4096       2 /
udevd   18921 root  txt   REG      253,1    161776 7823451 /sbin/udevd
udevd   18921 root  mem   REG      253,1     47080 2122850 /lib/i386-linux-gnu/i686/cmov/libnss_files-2.17.so
udevd   18921 root  mem   REG      253,1     42668 2122852 /lib/i386-linux-gnu/i686/cmov/libnss_nis-2.17.so
udevd   18921 root  mem   REG      253,1     13856 2122844 /lib/i386-linux-gnu/i686/cmov/libdl-2.17.so
udevd   18921 root  mem   REG      253,1    125258 2122837 /lib/i386-linux-gnu/i686/cmov/libpthread-2.17.so
udevd   18921 root  mem   REG      253,1    255908  688195 /lib/i386-linux-gnu/libpcre.so.3.13.1
udevd   18921 root  mem   REG      253,1   1759012 2122841 /lib/i386-linux-gnu/i686/cmov/libc-2.17.so
udevd   18921 root  mem   REG      253,1     30696 2122856 /lib/i386-linux-gnu/i686/cmov/librt-2.17.so
udevd   18921 root  mem   REG      253,1    133088  658519 /lib/i386-linux-gnu/libselinux.so.1
udevd   18921 root  mem   REG      253,1     87940 2122847 /lib/i386-linux-gnu/i686/cmov/libnsl-2.17.so
udevd   18921 root  mem   REG      253,1     30560 2122848 /lib/i386-linux-gnu/i686/cmov/libnss_compat-2.17.so
udevd   18921 root  mem   REG      253,1    134376 8478759 /lib/i386-linux-gnu/ld-2.17.so
udevd   18921 root    0u  CHR        1,3       0t0    1029 /dev/null
udevd   18921 root    1u  CHR        1,3       0t0    1029 /dev/null
udevd   18921 root    2u  CHR        1,3       0t0    1029 /dev/null
udevd   18921 root    3u unix 0xc019b940       0t0  784351 /run/udev/control
udevd   18921 root    4u  netlink              0t0  784352 KOBJECT_UEVENT
udevd   18921 root    5u  REG       0,13         8  784354 /run/udev/queue.bin
udevd   18921 root    6r             0,9         0    4048 anon_inode
udevd   18921 root    7u             0,9         0    4048 anon_inode
udevd   18921 root    8u unix 0xdf574040       0t0  788161 socket
udevd   18921 root    9u unix 0xe3ddb4c0       0t0  788162 socket
udevd   18921 root   10u             0,9         0    4048 anon_inode
udevd   18921 root   11u unix 0xe3ddb940       0t0  788165 socket

-> 784352

check PID with /proc/net/netlink:

$ grep 784352 /proc/net/netlink
e70ad800 15  18920  0001 000 20 784352

It says 18920, which is the PID from before the daemonize fork.

I'm using the following diff (fork before opening the netlink socket):

$ git diff
diff --git a/src/udev/udevd.c b/src/udev/udevd.c
index 7c6c5d6..4e0a789 100644
--- a/src/udev/udevd.c
+++ b/src/udev/udevd.c
@@ -1003,6 +1003,7 @@ int main(int argc, char *argv[])
         /* before opening new files, make sure std{in,out,err} fds are in a sane state */
 
         if (daemonize) {
                 int fd;
+                pid_t pid;
 
                 fd = open("/dev/null", O_RDWR);
                 if (fd >= 0) {
@@ -1016,6 +1017,23 @@ int main(int argc, char *argv[])
                         fprintf(stderr, "cannot open /dev/null\n");
                         log_error("cannot open /dev/null\n");
                 }
+
+                pid = fork();
+                switch (pid) {
+                case 0:
+                        break;
+                case -1:
+                        log_error("fork of daemon failed: %m\n");
+                        rc = 4;
+                        goto exit;
+                default:
+                        rc = EXIT_SUCCESS;
+                        goto exit_daemonize;
+                }
+
+                setsid();
+
+                write_string_file("/proc/self/oom_score_adj", "-1000");
         }
 
         if (systemd_fds(udev, &fd_ctrl, &fd_netlink) >= 0) {
@@ -1081,28 +1099,8 @@ int main(int argc, char *argv[])
                 goto exit;
         }
 
-        if (daemonize) {
-                pid_t pid;
-
-                pid = fork();
-                switch (pid) {
-                case 0:
-                        break;
-                case -1:
-                        log_error("fork of daemon failed: %m\n");
-                        rc = 4;
-                        goto 

Re: [systemd-devel] [PATCH 1/4] cgroups: support for MemoryAndSwapLimit= setting

2013-10-10 Thread Lennart Poettering
On Thu, 10.10.13 02:18, Mika Eloranta (m...@ohmu.fi) wrote:

Mika,

so before we add properties for these settings we need to make sure they
actually have a future in the kernel and are attributes that are going
to stay supported.

For example MemorySoftLimit is something we supported previously, but
which I recently removed because Tejun Heo (the kernel cgroup
maintainer, added to CC) suggested that the attribute wouldn't continue
to exist on the kernel side or at least not in this form.

Tejun, Mika sent patches to wrap memory.memsw.limit_in_bytes,
memory.kmem.limit_in_bytes, memory.soft_limit_in_bytes,
memory.kmem.tcp.limit_in_bytes in high-level systemd attributes. Could
you comment on the future of these attributes in the kernel? Should we
expose them in systemd?

At the systemd hack fest in New Orleans we already discussed
memory.soft_limit_in_bytes and memory.memsw.limit_in_bytes and you
suggested not to expose them. What about the other two?

(I have the suspicion though that if we want to expose something we
probably want to expose a single knob that puts a limit on all kinds of
memory, regardless of RAM, swap, kernel or tcp...)

Thanks, 

Lennart

 Add a MemoryAndSwapLimit setting that behaves the same way
 as MemoryLimit, except that it controls the
 memory.memsw.limit_in_bytes cgroup attribute.
 ---
  man/systemd.resource-control.xml  |  9 +++--
  src/core/cgroup.c | 21 ++---
  src/core/cgroup.h |  1 +
  src/core/dbus-cgroup.c|  1 +
  src/core/load-fragment-gperf.gperf.m4 |  1 +
  src/core/load-fragment.c  | 34 ++
  src/core/load-fragment.h  |  1 +
  src/systemctl/systemctl.c |  2 +-
  8 files changed, 64 insertions(+), 6 deletions(-)
 
 diff --git a/man/systemd.resource-control.xml b/man/systemd.resource-control.xml
 index 8688905..606e078 100644
 --- a/man/systemd.resource-control.xml
 +++ b/man/systemd.resource-control.xml
 @@ -138,7 +138,7 @@ along with systemd; If not, see <http://www.gnu.org/licenses/>.
        </varlistentry>
 
        <varlistentry>
 -        <term><varname>MemoryLimit=<replaceable>bytes</replaceable></varname></term>
 +        <term><varname>MemoryLimit=, MemoryAndSwapLimit=<replaceable>bytes</replaceable></varname></term>
 
         <listitem>
           <para>Specify the limit on maximum memory usage of the
 @@ -149,7 +149,12 @@ along with systemd; If not, see <http://www.gnu.org/licenses/>.
           Megabytes, Gigabytes, or Terabytes (with the base 1024),
           respectively. This controls the
           <literal>memory.limit_in_bytes</literal> control group
 -          attribute. For details about this control group attribute,
 +          attribute.
 +          <literal>MemoryAndSwapLimit</literal> controls the
 +          <literal>memory.memsw.limit_in_bytes</literal> control group
 +          attribute, which sets the limit for the sum of the used
 +          memory and used swap space.
 +          For details about these control group attributes,
           see <ulink
           url="https://www.kernel.org/doc/Documentation/cgroups/memory.txt">memory.txt</ulink>.</para>
  
 diff --git a/src/core/cgroup.c b/src/core/cgroup.c
 index 8bf4d89..3b465cc 100644
 --- a/src/core/cgroup.c
 +++ b/src/core/cgroup.c
 @@ -34,6 +34,7 @@ void cgroup_context_init(CGroupContext *c) {
 
          c->cpu_shares = 1024;
          c->memory_limit = (uint64_t) -1;
 +        c->memory_and_swap_limit = (uint64_t) -1;
          c->blockio_weight = 1000;
  }
 
 @@ -94,6 +95,7 @@ void cgroup_context_dump(CGroupContext *c, FILE* f, const char *prefix) {
                  "%sCPUShares=%lu\n"
                  "%sBlockIOWeight=%lu\n"
                  "%sMemoryLimit=%" PRIu64 "\n"
 +                "%sMemoryAndSwapLimit=%" PRIu64 "\n"
                  "%sDevicePolicy=%s\n",
                  prefix, yes_no(c->cpu_accounting),
                  prefix, yes_no(c->blockio_accounting),
 @@ -101,6 +103,7 @@ void cgroup_context_dump(CGroupContext *c, FILE* f, const char *prefix) {
                  prefix, c->cpu_shares,
                  prefix, c->blockio_weight,
                  prefix, c->memory_limit,
 +                prefix, c->memory_and_swap_limit,
                  prefix, cgroup_device_policy_to_string(c->device_policy));
 
          LIST_FOREACH(device_allow, a, c->device_allow)
 @@ -254,9 +257,8 @@ void cgroup_context_apply(CGroupContext *c, CGroupControllerMask mask, const char *path) {
          }
 
          if (mask & CGROUP_MEMORY) {
 +                char buf[DECIMAL_STR_MAX(uint64_t) + 1];
                  if (c->memory_limit != (uint64_t) -1) {
 -                        char buf[DECIMAL_STR_MAX(uint64_t) + 1];
 -
                          sprintf(buf, "%" PRIu64 "\n", c->memory_limit);
                          r = cg_set_attribute("memory", path, "memory.limit_in_bytes", buf);
                  } else
 @@ -264,6 +266,18 @@ void cgroup_context_apply(CGroupContext *c, CGroupControllerMask mask,

Re: [systemd-devel] [PATCH 1/4] cgroups: support for MemoryAndSwapLimit= setting

2013-10-10 Thread Tejun Heo
Hello,

On Thu, Oct 10, 2013 at 04:03:20PM +0200, Lennart Poettering wrote:
 For example MemorySoftLimit is something we supported previously, but
 which I recently removed because Tejun Heo (the kernel cgroup
 maintainer, added to CC) suggested that the attribute wouldn't continue
 to exist on the kernel side or at least not in this form.

The problem with the current softlimit is that we currently aren't
sure what it means.  Its semantics is defined only by its
implementation details with all its quirks and different parties
interpret and use it differently.  memcg people are trying to clear
that up so I think it'd be worthwhile to wait to see what happens
there.

 Tejun, Mika sent patches to wrap memory.memsw.limit_in_bytes,
 memory.kmem.limit_in_bytes, memory.soft_limit_in_bytes,
 memory.kmem.tcp.limit_in_bytes in high-level systemd attributes. Could
 you comment on the future of these attributes in the kernel? Should we
 expose them in systemd?
 
 At the systemd hack fest in New Orleans we already discussed
 memory.soft_limit_in_bytes and memory.memsw.limit_in_bytes and you
 suggested not to expose them. What about the other two?

Except for soft_limit_in_bytes, at least the meanings of the knobs are
well-defined and stable, so I think it should be at least safe to
expose those.

 (I have the suspicion though that if we want to expose something we
 probably want to expose a single knob that puts a limit on all kinds of
 memory, regardless of RAM, swap, kernel or tcp...)

Yeah, the different knobs grew organically to cover more stuff which
wasn't covered before, so, yeah, when viewed together, they don't
really make cohesive sense.  Another problem is that enabling the kmem
knobs would involve a noticeable amount of extra overhead.  kmem also
has restrictions on when it can be enabled - it can't be enabled on a
populated cgroup.

Maybe an approach which makes sense is one where you set the amount of
memory which can be used and toggle which types of memory should be
included in the accounting.  Setting the kmem limit equal to that of
limit_in_bytes makes limit_in_bytes apply to both kernel and user
memory.  I'll ask the memcg people and find out how viable such an
approach is.

Thanks!

-- 
tejun


Re: [systemd-devel] [PATCH 1/4] cgroups: support for MemoryAndSwapLimit= setting

2013-10-10 Thread Mika Eloranta
Hi,

Thanks guys for the feedback. I'm interested in using these controls
in dense environments (read: overcommitted memory), where striking
a good balance requires individual tuning of these settings.

I'm actually also eyeballing swappiness, pressure_level notifications 
and oom_control (each currently not supported directly by systemd's 
cgroup settings). Do you think adding those would be feasible? I'm
a bit new to systemd's code, but willing to spend some time getting
it done properly...

Another option could be just to provide a proxy interface for setting
any user-defined cgroup attributes, which would shift the responsibility
of using it correctly to the user. Something like:

  
CGroupAttributes=memory.kmem.limit_in_bytes=1024,memory.kmem.tcp.limit_in_bytes=102400

Some of these (excl. kmem) could be tuned outside systemd, but they'd
be much nicer to use directly from systemd's standard configuration.

Cheers,

- Mika

On Oct 10, 2013, at 17:28, Tejun Heo wrote:

 Hello,
 
 On Thu, Oct 10, 2013 at 04:03:20PM +0200, Lennart Poettering wrote:
 For example MemorySoftLimit is something we supported previously, but
 which I recently removed because Tejun Heo (the kernel cgroup
 maintainer, added to CC) suggested that the attribute wouldn't continue
 to exist on the kernel side or at least not in this form.
 
 The problem with the current softlimit is that we currently aren't
 sure what it means.  Its semantics is defined only by its
 implementation details with all its quirks and different parties
 interpret and use it differently.  memcg people are trying to clear
 that up so I think it'd be worthwhile to wait to see what happens
 there.
 
 Tejun, Mika sent patches to wrap memory.memsw.limit_in_bytes,
 memory.kmem.limit_in_bytes, memory.soft_limit_in_bytes,
 memory.kmem.tcp.limit_in_bytes in high-level systemd attributes. Could
 you comment on the future of these attributes in the kernel? Should we
 expose them in systemd?
 
 At the systemd hack fest in New Orleans we already discussed
 memory.soft_limit_in_bytes and memory.memsw.limit_in_bytes and you
 suggested not to expose them. What about the other two?
 
 Except for soft_limit_in_bytes, at least the meanings of the knobs are
 well-defined and stable, so I think it should be at least safe to
 expose those.
 
 (I have the suspicion though that if we want to expose something we
 probably want to expose a single knob that puts a limit on all kinds of
 memory, regardless of RAM, swap, kernel or tcp...)
 
 Yeah, the different knobs grew organically to cover more stuff which
 wasn't covered before, so, yeah, when viewed together, they don't
 really make cohesive sense.  Another problem is that enabling the kmem
 knobs would involve a noticeable amount of extra overhead.  kmem also
 has restrictions on when it can be enabled - it can't be enabled on a
 populated cgroup.
 
 Maybe an approach which makes sense is one where you set the amount of
 memory which can be used and toggle which types of memory should be
 included in the accounting.  Setting the kmem limit equal to that of
 limit_in_bytes makes limit_in_bytes apply to both kernel and user
 memory.  I'll ask the memcg people and find out how viable such an
 approach is.
 
 Thanks!
 
 -- 
 tejun



Re: [systemd-devel] Early review request: socket activation bridge

2013-10-10 Thread Lennart Poettering
On Tue, 08.10.13 02:07, David Strauss (da...@davidstrauss.net) wrote:

 I've attached the initial implementation -- not yet ready to merge --
 for an event-oriented socket activation bridge. It performs well under
 load. I haven't tied up all potential leaks yet, but the normal
 execution paths seem to be clean. I also need to use proper shell
 option management.
 
 The bridge adds about 0.569ms to an average request, which is the same
 overhead I see from a normal, local-network Ethernet hop.
 
 This is with it wrapping nginx using Fedora's default nginx
 configuration and default homepage:

Hmm, so I have serious reservations about using libev. Quite frankly, I
find its code horrible...

So far we used low-level epoll directly everywhere, though it certainly
isn't particularly fun to use and is very limited. In New Orleans Marcel
suggested we should add some kind of event loop abstraction to systemd
that makes working with epoll nicer, and maybe one day even export that
as an API, similar to libevent or libev, but less crazy.

And so I sat down yesterday and wrote some code for this. It's a thin
layer around epoll that makes it easier to use, makes working with
timer events more scalable (i.e. doesn't require one timerfd per timer
event), and adds event prioritisation. I tried hard to make it easy to
use; you'll find the result here:

http://cgit.freedesktop.org/systemd/systemd/tree/src/systemd/sd-event.h
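
For illustration, a minimal usage sketch against that header (error
handling elided; sd_event_new()/sd_event_loop() and the sd_event_add_io()
argument order are assumed from the current header):

#include <sys/epoll.h>

#include "sd-event.h"

static int io_cb(sd_event_source *s, int fd, uint32_t revents, void *userdata) {
        /* dispatch the fd event here */
        return 0;
}

int run_loop(int fd) {
        sd_event *e = NULL;
        sd_event_source *src = NULL;

        sd_event_new(&e);                                   /* create the loop */
        sd_event_add_io(e, fd, EPOLLIN, io_cb, NULL, &src); /* watch the fd */
        sd_event_loop(e);                                   /* run until exit */

        sd_event_source_unref(src);
        sd_event_unref(e);
        return 0;
}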

This should be useful for your specific purpose, but also and especially
for writing bus services. This is going to be part of libsystemd-bus
but usable outside of the immediate bus context, too.

I am planning to port PID 1 and all the auxiliary daemons to it.

 struct proxy_t {

We usually use the _t suffix to indicate typedef'ed types that are used
like a value, rather than an object. (libc does something similar, but
not quite the same way...) Also, OOM...

Lennart

-- 
Lennart Poettering - Red Hat, Inc.


Re: [systemd-devel] Early review request: socket activation bridge

2013-10-10 Thread Lennart Poettering
On Tue, 08.10.13 13:12, Zbigniew Jędrzejewski-Szmek (zbys...@in.waw.pl) wrote:

 
 On Tue, Oct 08, 2013 at 02:07:27AM -0700, David Strauss wrote:
  I've attached the initial implementation -- not yet ready to merge --
  for an event-oriented socket activation bridge. It performs well under
  load. I haven't tied up all potential leaks yet, but the normal
  execution paths seem to be clean. I also need to use proper shell
  option management.
 Hi David,
 
 how do you intend the target service to be started? I understand that the
 intended use case is for non-socket-activatable services, so they
 should be started synchronously in the background. In case of local
 services normal systemd management (over dbus) would work. In case of
 remote systems, maybe too, if dbus over the network was properly
 authorized. Do you have any plans here?

For local systems it should be sufficient to simply invoke the backend
service as a dependency of the proxy instance, i.e. no need to involve
D-Bus; just activate the backend service at the same time as the
socket-activated proxy service. Or am I missing something?

Lennart

-- 
Lennart Poettering - Red Hat, Inc.


Re: [systemd-devel] Early review request: socket activation bridge

2013-10-10 Thread Zbigniew Jędrzejewski-Szmek
On Thu, Oct 10, 2013 at 05:08:26PM +0200, Lennart Poettering wrote:
 On Tue, 08.10.13 13:12, Zbigniew Jędrzejewski-Szmek (zbys...@in.waw.pl) wrote:
 
  
  On Tue, Oct 08, 2013 at 02:07:27AM -0700, David Strauss wrote:
   I've attached the initial implementation -- not yet ready to merge --
   for an event-oriented socket activation bridge. It performs well under
   load. I haven't tied up all potential leaks yet, but the normal
   execution paths seem to be clean. I also need to use proper shell
   option management.
  Hi David,
  
  how do you intend the target service to be started? I understand that the
  intended use case is for non-socket-activatable services, so they
  should be started synchronously in the background. In case of local
  services normal systemd management (over dbus) would work. In case of
  remote systems, maybe too, if dbus over the network was properly
  authorized. Do you have any plans here?
 
 For local systems it should be sufficient to simply invoke the backend
 service as a dependency of the proxy instance, i.e. no need to involve
 D-Bus; just activate the backend service at the same time as the
 socket-activated proxy service. Or am I missing something?
Yeah, that would be enough. I was confused by the idea that we want
to delay the starting of the target service. But we don't have to do that,
because the proxy service is itself socket activated and started when
we actually have a connection.

If the target process is managed by systemd, the target service should
be bound to be started and stopped together with the proxy service. If
systemd.unit(5) is correct, this could be expressed as a combination of
BindsTo=proxy.service and PartOf=proxy.service.
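
Sketched as a unit-file fragment (hypothetical unit names, purely
illustrative):

# backend.service -- the non-socket-activatable target
[Unit]
BindsTo=proxy.service
PartOf=proxy.service

[Service]
ExecStart=/usr/sbin/backend-daemon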

One thing which we can't make work currently is having the target
service managed by systemd but running with PrivateNetwork=yes. In
this case, the bridge process must be inside the target service
and start the target binary itself. But maybe that's not so bad,
since the proxy can be introduced by adding one word to ExecStart=.

Zbyszek


Re: [systemd-devel] [PATCH 1/4] cgroups: support for MemoryAndSwapLimit= setting

2013-10-10 Thread Tejun Heo
(cc'ing Johannes and quoting the whole body for context)

Hey, guys.

On Thu, Oct 10, 2013 at 10:28:16AM -0400, Tejun Heo wrote:
 Hello,
 
 On Thu, Oct 10, 2013 at 04:03:20PM +0200, Lennart Poettering wrote:
  For example MemorySoftLimit is something we supported previously, but
  which I recently removed because Tejun Heo (the kernel cgroup
  maintainer, added to CC) suggested that the attribute wouldn't continue
  to exist on the kernel side or at least not in this form.
 
 The problem with the current softlimit is that we currently aren't
 sure what it means.  Its semantics is defined only by its
 implementation details with all its quirks and different parties
 interpret and use it differently.  memcg people are trying to clear
 that up so I think it'd be worthwhile to wait to see what happens
 there.
 
  Tejun, Mika sent patches to wrap memory.memsw.limit_in_bytes,
  memory.kmem.limit_in_bytes, memory.soft_limit_in_bytes,
  memory.kmem.tcp.limit_in_bytes in high-level systemd attributes. Could
  you comment on the future of these attributes in the kernel? Should we
  expose them in systemd?
  
  At the systemd hack fest in New Orleans we already discussed
  memory.soft_limit_in_bytes and memory.memsw.limit_in_bytes and you
  suggested not to expose them. What about the other two?
 
 Except for soft_limit_in_bytes, at least the meanings of the knobs are
 well-defined and stable, so I think it should be at least safe to
 expose those.
 
  (I have the suspicion though that if we want to expose something we
  probably want to expose a single knob that puts a limit on all kinds of
  memory, regardless of RAM, swap, kernel or tcp...)
 
 Yeah, the different knobs grew organically to cover more stuff which
 wasn't covered before, so, yeah, when viewed together, they don't
 really make cohesive sense.  Another problem is that enabling the kmem
 knobs would involve a noticeable amount of extra overhead.  kmem also
 has restrictions on when it can be enabled - it can't be enabled on a
 populated cgroup.
 
 Maybe an approach which makes sense is one where you set the amount of
 memory which can be used and toggle which types of memory should be
 included in the accounting.  Setting the kmem limit equal to that of
 limit_in_bytes makes limit_in_bytes apply to both kernel and user
 memory.  I'll ask the memcg people and find out how viable such an
 approach is.

I talked with Johannes about the knobs and think something like the
following could be useful.

* A swap knob, which, when set, configures memsw.limit_in_bytes to
  memory.limit_in_bytes + the set value.

* A switch to enable kmem.  When enabled, kmem.limit_in_bytes tracks
  memory.limit_in_bytes, i.e. kmem is accounted and both kernel and
  user memory live under the same memory limit.

* A kmem knob which can be optionally configured to a lower value than
  memory.limit_in_bytes.  This is useful for overcommit scenarios as
  explained in Documentation/cgroups/memory.txt::2.7.3.

* tcp knobs are currently completely separate from other memory
  limits.  This should probably be included in memory.limit_in_bytes.
  I think it probably is a better idea to hold off on this one.

* What softlimit means is still very unclear.  We might end up with
  explicit guarantee knob and keep softlimit as it is, whatever it
  currently means.
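
For the swap knob the mapping would be purely additive; as a one-line
sketch (function and parameter names invented):

#include <stdint.h>

/* memsw.limit_in_bytes = memory.limit_in_bytes + configured swap allowance */
static uint64_t memsw_limit(uint64_t memory_limit, uint64_t swap_knob) {
        return memory_limit + swap_knob;
}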

Caveats

* This setup doesn't allow setting (memory + swap) limit without
  setting memory limit.

* The overcommit scenario described in memory.txt::2.7.3 is somewhat
  bogus because not all userland memory is reclaimable and not all
  kernel memory is unreclaimable.  Oh well...

Thanks.

-- 
tejun


Re: [systemd-devel] Early review request: socket activation bridge

2013-10-10 Thread Lennart Poettering
On Thu, 10.10.13 17:39, Zbigniew Jędrzejewski-Szmek (zbys...@in.waw.pl) wrote:

 
 On Thu, Oct 10, 2013 at 05:08:26PM +0200, Lennart Poettering wrote:
  On Tue, 08.10.13 13:12, Zbigniew Jędrzejewski-Szmek (zbys...@in.waw.pl) 
  wrote:
  
   
   On Tue, Oct 08, 2013 at 02:07:27AM -0700, David Strauss wrote:
I've attached the initial implementation -- not yet ready to merge --
for an event-oriented socket activation bridge. It performs well under
load. I haven't tied up all potential leaks yet, but the normal
execution paths seem to be clean. I also need to use proper shell
option management.
   Hi David,
   
   how do you intend the target service to be started? I understand that the
   intended use case is for non-socket-activatable services, so they
   should be started synchronously in the background. In case of local
   services normal systemd management (over dbus) would work. In case of
   remote systems, maybe too, if dbus over the network was properly
   authorized. Do you have any plans here?
  
  For local systems it should be sufficient to simply invoke the backend
  service as a dependency of the proxy instance, i.e. no need to involve
  D-Bus; just activate the backend service at the same time as the
  socket-activated proxy service. Or am I missing something?
 Yeah, that would be enough. I was confused by the idea that we want
 to delay the starting of the target service. But we don't have to do that,
 because the proxy service is itself socket activated and started when
 we actually have a connection.
 
 If the target process is managed by systemd, the target service should
 be bound to be started and stopped together with the proxy service. If
 systemd.unit(5) is correct, this could be expressed as a combination of
 BindsTo=proxy.service and PartOf=proxy.service.
 
 One thing which we can't make work currently is having the target
 service managed by systemd but running with PrivateNetwork=yes. In
 this case, the bridge process must be inside the target service
 and start the target binary itself. But maybe that's not so bad,
 since the proxy can be introduced by adding one word to ExecStart=.

Hmm, that's actually a good idea. The tool should have a mode where you
can prefix the command line of another daemon with an invocation of this
tool. It would then fork the proxy bit into the background, and use
PR_SET_PDEATHSIG to make sure it will die along with the process it is
the proxy for.
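
Roughly like this hypothetical sketch (names invented, error handling
elided):

#include <signal.h>
#include <sys/prctl.h>
#include <unistd.h>

static int run_with_proxy(char **daemon_argv) {
        pid_t daemon_pid = getpid();

        if (fork() == 0) {
                /* child: the proxy; ask for SIGTERM when the daemon
                 * (our parent, since exec keeps its PID) exits */
                prctl(PR_SET_PDEATHSIG, SIGTERM);
                if (getppid() != daemon_pid)
                        _exit(1);    /* parent died before prctl() took effect */
                /* ... run the proxy event loop here ... */
                _exit(0);
        }

        execv(daemon_argv[0], daemon_argv);  /* parent becomes the daemon */
        return -1;                           /* only reached if exec fails */
}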

Lennart

-- 
Lennart Poettering - Red Hat, Inc.


Re: [systemd-devel] [PATCH] bus: fix duplicate comparisons

2013-10-10 Thread Lennart Poettering
On Thu, 10.10.13 08:14, Tero Roponen (tero.ropo...@gmail.com) wrote:

 Testing for y > x is the same as testing for x < y.

Thanks!

Applied!

Lennart

-- 
Lennart Poettering - Red Hat, Inc.


Re: [systemd-devel] dbus API for unit state change?

2013-10-10 Thread Lennart Poettering
On Sun, 06.10.13 21:11, Brandon Philips (bran...@ifup.co) wrote:

 
 On Sun, Oct 6, 2013 at 3:10 PM, Lennart Poettering
 lenn...@poettering.net wrote:
  So, yeah, if you respond to each UnitNew signal you get with a property
  Get/GetAll call, then this will result in endless ping pong, which is
  certainly not a good idea.
 
  What are you trying to do? Write some tool that tracks all units that
  are loaded?
 
 Yes, I want to register services into a networked service registry. An
 example use case would be an HTTP load balancer that is service
 registry aware and adds machines to the load balancer based on certain
 unit files appearing/leaving.
 
 An alternative solution is making a user explicitly add a
 service-registry-notifier@.service to my-application.service.wants but
 I wanted to avoid making registration a special case. For example:
 https://gist.github.com/philips/6710008
 
 Maybe there is a middle ground solution? Does it make sense to send
 LoadState with UnitNew? I will have to look tomorrow because I think
 without that trying to do other things gets racy with transient units.

Hmm, so I thought a bit about the issue.

If I got this right, then you get the UnitNew, immediately issue a
Get/GetAll, then you get a UnitRemoved, then you get another UnitNew,
and then the response to Get/GetAll, right? If so, it would work to
simply ignore all UnitNew signals between the request and the response,
no?
Lennart

-- 
Lennart Poettering - Red Hat, Inc.


Re: [systemd-devel] [PATCH 2/2] Run with a custom SMACK domain (label).

2013-10-10 Thread Lennart Poettering
On Tue, 08.10.13 22:29, Schaufler, Casey (casey.schauf...@intel.com) wrote:

  On Mon, 07.10.13 10:30, Kok, Auke-jan H (auke-jan.h@intel.com) wrote:
  
Hi,
  the patches look OK. I don't have a system with smack support at
  hand, but I tested them on Fedora, and didn't notice any adverse
  effects.
  If you've tested them with smack, then they should be applied, imo.
  
   Thanks, I just applied them myself - I just wanted to give folks a bit
   of time to read and test - so thanks for doing so!
  
  Hmm, the patches as they are merged now try to mount the SMACK version
  of /run and /dev/shm also in containers. Will this work?
 
 So long as the cgroup filesystem propagates the xattrs to and from the real
 filesystem it won't be a problem. If the cgroup filesystem is not doing that
 there will be a problem.

I can't parse this.

  So far (at least for SELinux) we tried to turn off all security layers in
  containers, since the policies are not virtualized.
 
 I don't know what you mean by virtualized in this context.

Well, unlike for example the PID namespace stuff where the PIDs are
virtualized, there is no scheme by which the SMACK enforcement could be
virtualized, so that an OS container could install its own SMACK policy,
and so that SMACK labels from the container are different things even
though they share the same name with labels from the host. (I mean, I am
not saying this would be even desirable...)

Lennart

-- 
Lennart Poettering - Red Hat, Inc.


Re: [systemd-devel] [PATCH 2/2] Run with a custom SMACK domain (label).

2013-10-10 Thread Schaufler, Casey
 -Original Message-
 From: Lennart Poettering [mailto:lenn...@poettering.net]
 Sent: Thursday, October 10, 2013 9:51 AM
 To: Schaufler, Casey
 Cc: Kok, Auke-jan H; Zbigniew Jędrzejewski-Szmek; systemd-devel
 Subject: Re: [systemd-devel] [PATCH 2/2] Run with a custom SMACK domain
 (label).
 
 On Tue, 08.10.13 22:29, Schaufler, Casey (casey.schauf...@intel.com) wrote:
 
   On Mon, 07.10.13 10:30, Kok, Auke-jan H (auke-jan.h@intel.com)
 wrote:
  
 Hi,
 the patches look OK. I don't have a system with smack support at
 hand, but I tested them on Fedora, and didn't notice any adverse
 effects.
 If you've tested them with smack, then they should be applied, imo.
   
Thanks, I just applied them myself - I just wanted to give folks a
bit of time to read and test - so thanks for doing so!
  
   Hmm, the patches as they are merged now try to mount the SMACK
   version of /run and /dev/shm also in containers. Will this work?
 
  So long as the cgroup filesystem propagates the xattrs to and from the
  real filesystem it won't be a problem. If the cgroup filesystem is not
  doing that there will be a problem.
 
 I can't parse this.

That's because it doesn't make sense.
I had been under the impression that cgroupfs was something
other than what it is. Now that I understand better I see that
this is a nonsensical statement.

Read it as "everything is OK".
 
   So far (at least for SELinux) we tried to turn off all security
   layers in containers, since the policies are not virtualized.
 
  I don't know what you mean by virtualized in this context.
 
 Well, unlike for example the PID namespace stuff where the PIDs are
 virtualized, there is no scheme by which the SMACK enforcement could be
 virtualized, so that an OS container could install its own SMACK policy,
 and so that SMACK labels from the container are different things even
 though they share the same name with labels from the host. (I mean, I am
 not saying this would be even desirable...)

OK, that 

We've identified how we could do Smack namespaces if we wanted
to. I am pretty sure that we don't want to at this point, and that
we probably won't in the near future.

 
 Lennart
 
 --
 Lennart Poettering - Red Hat, Inc.


Re: [systemd-devel] Early review request: socket activation bridge

2013-10-10 Thread David Strauss
On Thu, Oct 10, 2013 at 8:39 AM, Zbigniew Jędrzejewski-Szmek
zbys...@in.waw.pl wrote:
 One thing which we can't make work currently, is having the target
 service managed by systemd, but running with PrivateNetwork=yes. In
 this case, the bridge process must be inside of the target service
 and start the target binary itself. But maybe that's not so bad,
 since the proxy can be introduced by adding one word to ExecStart=.

This is why I opted for the tiny script to start nginx and then the
bridge in my proof-of-concept implementation.

-- 
David Strauss
   | da...@davidstrauss.net
   | +1 512 577 5827 [mobile]


Re: [systemd-devel] Early review request: socket activation bridge

2013-10-10 Thread David Strauss
I was actually planning to rewrite on top of libuv today, but I'm
happy to port to the new, native event library.

Is there any best-practice for using it with multiple threads?


Re: [systemd-devel] Early review request: socket activation bridge

2013-10-10 Thread Zbigniew Jędrzejewski-Szmek
On Thu, Oct 10, 2013 at 01:12:26PM -0700, David Strauss wrote:
 I was actually planning to rewrite on top of libuv today, but I'm
 happy to port to the new, native event library.
 
 Is there any best-practice for using it with multiple threads?
Best-practice is using just one thread :)

Zbyszek



Re: [systemd-devel] Early review request: socket activation bridge

2013-10-10 Thread David Strauss
On Thu, Oct 10, 2013 at 1:20 PM, Zbigniew Jędrzejewski-Szmek
zbys...@in.waw.pl wrote:
 Best-practice is using just one thread :)

That depends on whether you need to scale up to multiple cores.

-- 
David Strauss
   | da...@davidstrauss.net
   | +1 512 577 5827 [mobile]


Re: [systemd-devel] [PATCH 2/2] core: require $XDG_RUNTIME_DIR to be set for user instances

2013-10-10 Thread Kok, Auke-jan H
On Wed, Oct 9, 2013 at 4:57 AM, Mantas Mikulėnas graw...@gmail.com wrote:
 It seems that some places use /run otherwise, which isn't going to work.
 ---
  src/core/main.c | 6 ++
  1 file changed, 6 insertions(+)

 diff --git a/src/core/main.c b/src/core/main.c
 index fe291f8..36543c6 100644
 --- a/src/core/main.c
 +++ b/src/core/main.c
 @@ -1404,6 +1404,12 @@ int main(int argc, char *argv[]) {
                  goto finish;
          }
 
 +        if (arg_running_as == SYSTEMD_USER &&
 +            !getenv("XDG_RUNTIME_DIR")) {
 +                log_error("Trying to run as user instance, but $XDG_RUNTIME_DIR is not set.");
 +                goto finish;
 +        }
 +

This is good, hopefully it will help folks debug user session usage better.

Auke


Re: [systemd-devel] [PATCH 3/4] cgroups: support for MemorySoftLimit= setting

2013-10-10 Thread David Strauss
Didn't we recently drop this option?


Re: [systemd-devel] Early review request: socket activation bridge

2013-10-10 Thread Lennart Poettering
On Thu, 10.10.13 13:12, David Strauss (da...@davidstrauss.net) wrote:

 I was actually planning to rewrite on top of libuv today, but I'm
 happy to port to the new, native event library.
 
 Is there any best-practice for using it with multiple threads?

We are pretty conservative about threads so far, but I guess in this case
it makes some sense to distribute work across CPUs. Here's how I would do it:

You start with one thread first (the main thread, that is). You run an
event loop, and add all listening sockets to it. When a connection
comes in you process it as usual. As soon as you notice you are
processing more than, let's say, 5 connections at the same time, you spawn
a new thread and disable the listening socket watches (use
sd_event_source_set_enabled(source, SD_EVENT_OFF) for this). That new thread
then also runs an event loop of its own, completely independent of the
original one, and also adds the listening sockets to it; it basically
takes over from the original main thread.

Eventually this second thread will also reach its limit of 5
connections. Now, we could just fork off yet another thread, and again
pass control of the listening socket to it and so on, but we cannot do
this unbounded, and we should try to give work back to the older threads
that have become idle again.

To do this, we keep a (mutex-protected) global list of thread
information structs; each structure contains two things: a counter of how
many connections that thread currently processes, and an fd referring to
a per-thread eventfd(). The eventfd() is hooked into the thread's event
loop, and we use it to pass control of the listening socket from one
thread to another.

So with this in place we can now alter our thread allocation scheme:
instead of stupidly forking off a new thread from a thread that reached
its connection limit, we simply sweep through the thread info struct
array and look for the thread with the least number of connections, then
trigger its eventfd. When that thread gets this in its event loop it
will re-enable listening on the fds, and go on until it reaches
the limit again, at which point it will try to find another thread to
take control of the listening socket. When during the sweep a thread
recognizes that all threads are at their limits it forks off a new one,
as described above. If the max number of threads is reached (which we
should put at 2x or 3x the number of CPUs in the CPU affinity set of
the process), the thread in control of the listening socket will simply
turn off the poll flags for the listening socket, stop processing it
for one event loop iteration, and then try to pass it on to somebody
else on the next iteration.
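
A rough sketch of those mechanics (the struct and helpers are invented
for illustration; eventfd(2) and the sd-event calls are the only real
APIs used):

#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

#include "sd-event.h"

struct thread_info {
        unsigned n_connections;    /* guarded by the global mutex */
        int wake_fd;               /* eventfd(), registered in this thread's
                                      loop via sd_event_add_io(..., on_wake, ...) */
        sd_event_source *listener; /* this thread's watch on the listening fd */
};

/* Runs in the woken thread: re-enable its listener watch so that it
 * takes over accepting new connections. */
static int on_wake(sd_event_source *s, int fd, uint32_t revents, void *userdata) {
        struct thread_info *t = userdata;
        uint64_t counter;

        (void) read(fd, &counter, sizeof(counter));  /* drain the eventfd */
        sd_event_source_set_enabled(t->listener, SD_EVENT_ON);
        return 0;
}

/* Called by a thread that hit its connection limit: stop listening
 * itself and poke the least-loaded thread's eventfd. */
static void hand_off(struct thread_info *self, struct thread_info *least_loaded) {
        uint64_t one = 1;

        sd_event_source_set_enabled(self->listener, SD_EVENT_OFF);
        (void) write(least_loaded->wake_fd, &one, sizeof(one));
}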

With this scheme you should get pretty good distribution of things if a
large number of long-running TCP connections are made. It will not be as
good if a lot of short ones are made.

That all said, I am not convinced this is really something to
necessarily implement in the service itself. Instead we could also beef
up support for the new SO_REUSEPORT socket option in systemd. For
example, we could add a new option in .socket files:
Distribute=$NUMBER. If set to some number, systemd will create that many
socket fds and bind them all to the same configured address with
SO_REUSEPORT. Then, when a connection comes in on any of these, we'd
instantiate a new service instance for each and pass that one listening
socket to it, which that daemon instance would then process. The daemon
would invoke accept() on the fd a couple of times, and process
everything it finds there. After it became idle for a while it would
exit.

With the SO_REUSEPORT scheme your daemon can stay single threaded
(making things much simpler), and you'd get much better performance
too... (Oh, and of course, with that work, we'd have something powerful
for other use cases too.) All load balancing would be done by the kernel,
and that's kinda cool, because they actually are good at these things...
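
The kernel-side part of this is small; a hedged sketch of one such
listener (Distribute= itself is only the proposal above, so only the
socket setup is real):

#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Create one of N listeners on the same port; the kernel load-balances
 * incoming connections across all sockets bound with SO_REUSEPORT. */
static int make_reuseport_listener(uint16_t port) {
        struct sockaddr_in sa;
        int fd, one = 1;

        fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
                return -1;

        memset(&sa, 0, sizeof(sa));
        sa.sin_family = AF_INET;
        sa.sin_port = htons(port);

        if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)) < 0 ||
            bind(fd, (struct sockaddr *) &sa, sizeof(sa)) < 0 ||
            listen(fd, SOMAXCONN) < 0) {
                close(fd);
                return -1;
        }

        return fd;
}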

So, if you ask me, I vote for the SO_REUSEPORT logic.

For more information on SO_REUSEPORT:

https://lwn.net/Articles/542629/

Lennart

-- 
Lennart Poettering - Red Hat, Inc.


Re: [systemd-devel] Early review request: socket activation bridge

2013-10-10 Thread David Strauss
On Thu, Oct 10, 2013 at 7:07 PM, Lennart Poettering
lenn...@poettering.net wrote:
 All load balancing would be done by the kernel,
 and that's kinda cool, because they actually are good at these things...

This is essentially what I was advocating a while back for other
event-oriented frameworks like Node and Twisted. Both support socket
activation these days, but they have no reliable mechanism for
distributing the load across multiple processes.

So, a big +1 to generic support for pools of socket-activated
processes that still run accept() on their own.

-- 
David Strauss
   | da...@davidstrauss.net
   | +1 512 577 5827 [mobile]


Re: [systemd-devel] Early review request: socket activation bridge

2013-10-10 Thread David Strauss
Here's a first take on having sabridge use the systemd-native event
library. The current, full diff is also visible on GitHub [1].

Obviously, this work still needs considerable cleanup and tightening.
I like that we're currently hammering out the basics, like the event
library to use and where the multiprocess/multithreaded logic should
go in the longer run.

I'm open to better ideas for the data structures. Right now, the
priority is to hammer everything into symmetric structures so the
bi-directionality of the proxy gets abstracted away from the transfer
function. This is useful for ensuring we have consistent support for
server-first (MySQL) and client-first (HTTP) protocols.

[1] https://github.com/systemd/systemd/pull/5/files
/*-*- Mode: C; c-basic-offset: 8; indent-tabs-mode: nil -*-*/

/***
  This file is part of systemd.

  Copyright 2013 David Strauss

  systemd is free software; you can redistribute it and/or modify it
  under the terms of the GNU Lesser General Public License as published by
  the Free Software Foundation; either version 2.1 of the License, or
  (at your option) any later version.

  systemd is distributed in the hope that it will be useful, but
  WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
  Lesser General Public License for more details.

  You should have received a copy of the GNU Lesser General Public License
  along with systemd; If not, see <http://www.gnu.org/licenses/>.
 ***/

#define __STDC_FORMAT_MACROS
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <netdb.h>
#include <sys/epoll.h>
#include <sys/fcntl.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#include "log.h"
#include "sd-daemon.h"
#include "sd-event.h"

#define BUFFER_SIZE 1024

unsigned int total_clients = 0;

struct proxy {
        int listen_fd;
        bool remote_is_inet;
        const char *remote_host;
        const char *remote_service;
};

struct connection {
        int origin_fd;
        int destination_fd;
        sd_event_source *w_destination;
        struct connection *c_destination;
};

static int transfer_data_cb(sd_event_source *s, int fd, uint32_t revents, void *userdata) {
        struct connection *connection = (struct connection *) userdata;

        char *buffer = malloc(BUFFER_SIZE);
        ssize_t buffer_len;

        assert(revents & EPOLLIN);
        assert(fd == connection->origin_fd);

        log_info("About to transfer up to %u bytes from %d to %d.", BUFFER_SIZE, connection->origin_fd, connection->destination_fd);

        buffer_len = recv(connection->origin_fd, buffer, BUFFER_SIZE, 0);
        if (buffer_len == 0) {
                log_info("Clean disconnection.");
                sd_event_source_unref(connection->w_destination);
                sd_event_source_unref(s);
                close(connection->origin_fd);
                close(connection->destination_fd);
                free(connection->c_destination);
                free(connection);
                goto finish;
        }
        else if (buffer_len == -1) {
                log_error("Error %d in recv from fd=%d: %s", errno, connection->origin_fd, strerror(errno));
                exit(EXIT_FAILURE);
        }

        if (send(connection->destination_fd, buffer, buffer_len, 0) < 0) {
                log_error("Error %d in send to fd=%d: %s", errno, connection->destination_fd, strerror(errno));
                exit(EXIT_FAILURE);
        }

finish:
        free(buffer);
        return 0;
}

static int connected_to_server_cb(sd_event_source *s, int fd, uint32_t revents, void *userdata) {
        struct connection *c_server_to_client = (struct connection *) userdata;
        struct sd_event *e = sd_event_get(s);

        log_info("Connected to server. Initializing watchers for sending data.");

        // Start listening for data sent by the client.
        sd_event_add_io(e, c_server_to_client->destination_fd, EPOLLIN, transfer_data_cb, c_server_to_client->c_destination, &c_server_to_client->w_destination);

        // Cancel the write watcher for the server.
        sd_event_source_unref(s);

        // Start listening for data sent by the server.
        sd_event_add_io(e, c_server_to_client->origin_fd, EPOLLIN, transfer_data_cb, c_server_to_client, &c_server_to_client->c_destination->w_destination);

        return 0;
}


static int set_nonblock(int fd) {
        int flags;
        flags = fcntl(fd, F_GETFL);
        flags |= O_NONBLOCK;
        return fcntl(fd, F_SETFL, flags);
}

static int get_server_connection_fd(const struct proxy *proxy) {
        int server_fd;
        int len;

        if (proxy->remote_is_inet) {
                struct addrinfo hints;
                struct addrinfo *result;
                int s;

                memset(&hints, 0, sizeof(struct addrinfo));
                hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6 */
                hints.ai_socktype = SOCK_STREAM;  /* TCP */
                hints.ai_flags = AI_PASSIVE;      /* Any IP address */

                //log_error("Looking up address info for %s:%s", proxy->remote_host, proxy->remote_service);
                s = getaddrinfo(proxy->remote_host, proxy->remote_service, &hints, &result);
                if (s != 0) {
                        log_error("getaddrinfo error (%d): %s", s, gai_strerror(s));

Re: [systemd-devel] Early review request: socket activation bridge

2013-10-10 Thread Tollef Fog Heen
]] Lennart Poettering 

 On Thu, 10.10.13 13:12, David Strauss (da...@davidstrauss.net) wrote:
 
  I was actually planning to rewrite on top of libuv today, but I'm
  happy to port to the new, native event library.
  
  Is there any best-practice for using it with multiple threads?
 
 We are pretty conservative on threads so far, but I guess in this case
 it makes some sense to distribute work on CPUs. Here's how I would do it:

[snip long description]

fwiw, if you want really high performance, this is not at all how I'd do
it.  Spawning threads while under load is a recipe for disaster, for a
start.  I'd go with something like how it's done in Varnish: have one (or n)
acceptor threads that schedule work to a pool of worker threads.  That
scheduler should be careful about such things as treating the worker
threads as LIFO (to preserve CPU cache).  The advice about only 2-3
threads per CPU core looks excessively conservative.  We're usually, and
quite happily, running with a few thousand threads, no matter the number
of cores.

Using REUSEPORT might make sense in cases where you're happy to throw
away performance for simplicity.  That's a completely valid tradeoff.

-- 
Tollef Fog Heen
UNIX is user friendly, it's just picky about who its friends are