Re: [systemd-devel] [PATCH] bus: fix duplicate comparisons
On Thu, Oct 10, 2013 at 5:14 AM, Tero Roponen tero.ropo...@gmail.com wrote: Testing for (y > x) is the same as testing for (x < y). -if (y > x) +if (x > y) <snip> I think you forgot to change the signs ;) ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] [PATCH] bus: fix duplicate comparisons
On 10/10/13 12:38, Carlos Silva wrote: On Thu, Oct 10, 2013 at 5:14 AM, Tero Roponen tero.ropo...@gmail.com wrote: Testing for (y > x) is the same as testing for (x < y). -if (y > x) +if (x > y) <snip> I think you forgot to change the signs ;) No, I believe that was the point of the patch. The two tests were the same, first testing (x < y), and then (y > x). Now it properly tests for (x > y). -j
Re: [systemd-devel] [PATCH] bus: fix duplicate comparisons
On Thu, Oct 10, 2013 at 11:21 AM, Olivier Brunel j...@jjacky.com wrote: No, I believe that was the point of the patch. The two tests were the same, first testing (x < y), and then (y > x). Now it properly tests for (x > y). Totally didn't read the context of the code, just the changes and the patch comment. Sorry about that :-/
Re: [systemd-devel] [PATCH] make fsck fix mode a kernel command line option
On Monday 2013-10-07 14:25, Karel Zak wrote: On Tue, Sep 10, 2013 at 04:55:19PM +0100, Colin Guthrie wrote: 'Twas brillig, and Tom Gundersen at 10/09/13 13:45 did gyre and gimble: On Tue, Sep 10, 2013 at 2:31 PM, Jan Engelhardt jeng...@inai.de wrote: On Tuesday 2013-09-10 13:52, Dave Reisner wrote: the FUSE program knows nothing about the systemd-specific nofail or x-*. Note that mount(8) does not strip nofail when calling mount.<type> helpers. And that, I would feel, is a problem, because it would require that every mount helper out there be updated every time some special option is added. nofail is something that mount should strip, because it is IMHO specific to mount and/or the system boot, and not the helper.
[systemd-devel] [udev] Wrong PID used in netlink socket
Hi List, I was debugging a problem in my own program which sometimes received 'Address already in use' during creation of the netlink socket. It turned out that udevd has the same bug. What actually happens is that udevd opens the netlink socket, and forks afterwards. That doesn't sound bad at all, but the PID of udevd is stored inside the nl_sockaddr structure. So after udevd has forked, the PID stored in the kernel no longer exists. If another process is now started that wants to do netlink communication with the kernel and has (by coincidence) the same PID, it will fail. Example from my system running udev: pid of udevd is 18921:

# pidof udevd
18921

get the netlink socket for pid 18921:

$ lsof -np 18921
COMMAND   PID USER  FD    TYPE     DEVICE SIZE/OFF    NODE NAME
udevd   18921 root cwd     DIR      253,1     4096       2 /
udevd   18921 root rtd     DIR      253,1     4096       2 /
udevd   18921 root txt     REG      253,1   161776 7823451 /sbin/udevd
udevd   18921 root mem     REG      253,1    47080 2122850 /lib/i386-linux-gnu/i686/cmov/libnss_files-2.17.so
udevd   18921 root mem     REG      253,1    42668 2122852 /lib/i386-linux-gnu/i686/cmov/libnss_nis-2.17.so
udevd   18921 root mem     REG      253,1    13856 2122844 /lib/i386-linux-gnu/i686/cmov/libdl-2.17.so
udevd   18921 root mem     REG      253,1   125258 2122837 /lib/i386-linux-gnu/i686/cmov/libpthread-2.17.so
udevd   18921 root mem     REG      253,1   255908  688195 /lib/i386-linux-gnu/libpcre.so.3.13.1
udevd   18921 root mem     REG      253,1  1759012 2122841 /lib/i386-linux-gnu/i686/cmov/libc-2.17.so
udevd   18921 root mem     REG      253,1    30696 2122856 /lib/i386-linux-gnu/i686/cmov/librt-2.17.so
udevd   18921 root mem     REG      253,1   133088  658519 /lib/i386-linux-gnu/libselinux.so.1
udevd   18921 root mem     REG      253,1    87940 2122847 /lib/i386-linux-gnu/i686/cmov/libnsl-2.17.so
udevd   18921 root mem     REG      253,1    30560 2122848 /lib/i386-linux-gnu/i686/cmov/libnss_compat-2.17.so
udevd   18921 root mem     REG      253,1   134376 8478759 /lib/i386-linux-gnu/ld-2.17.so
udevd   18921 root   0u    CHR        1,3      0t0    1029 /dev/null
udevd   18921 root   1u    CHR        1,3      0t0    1029 /dev/null
udevd   18921 root   2u    CHR        1,3      0t0    1029 /dev/null
udevd   18921 root   3u   unix 0xc019b940      0t0  784351 /run/udev/control
udevd   18921 root   4u netlink                 0t0  784352 KOBJECT_UEVENT
udevd   18921 root   5u    REG       0,13        8  784354 /run/udev/queue.bin
udevd   18921 root   6r                0,9        0    4048 anon_inode
udevd   18921 root   7u                0,9        0    4048 anon_inode
udevd   18921 root   8u   unix 0xdf574040      0t0  788161 socket
udevd   18921 root   9u   unix 0xe3ddb4c0      0t0  788162 socket
udevd   18921 root  10u               0,9        0    4048 anon_inode
udevd   18921 root  11u   unix 0xe3ddb940      0t0  788165 socket

-> 784352

check PID with /proc/net/netlink:

$ grep 784352 /proc/net/netlink
e70ad800 15 18920 0001 000 20 784352

It tells 18920, which is the PID before the daemonize fork. I'm using the following diff (fork before opening the netlink socket):

$ git diff
diff --git a/src/udev/udevd.c b/src/udev/udevd.c
index 7c6c5d6..4e0a789 100644
--- a/src/udev/udevd.c
+++ b/src/udev/udevd.c
@@ -1003,6 +1003,7 @@ int main(int argc, char *argv[])
         /* before opening new files, make sure std{in,out,err} fds are in a sane state */
         if (daemonize) {
                 int fd;
+                pid_t pid;

                 fd = open("/dev/null", O_RDWR);
                 if (fd >= 0) {
@@ -1016,6 +1017,23 @@ int main(int argc, char *argv[])
                         fprintf(stderr, "cannot open /dev/null\n");
                         log_error("cannot open /dev/null\n");
                 }
+
+                pid = fork();
+                switch (pid) {
+                case 0:
+                        break;
+                case -1:
+                        log_error("fork of daemon failed: %m\n");
+                        rc = 4;
+                        goto exit;
+                default:
+                        rc = EXIT_SUCCESS;
+                        goto exit_daemonize;
+                }
+
+                setsid();
+
+                write_string_file("/proc/self/oom_score_adj", "-1000");
         }

         if (systemd_fds(udev, &fd_ctrl, &fd_netlink) >= 0) {
@@ -1081,28 +1099,8 @@ int main(int argc, char *argv[])
                 goto exit;
         }

-        if (daemonize) {
-                pid_t pid;
-
-                pid = fork();
-                switch (pid) {
-                case 0:
-                        break;
-                case -1:
-                        log_error("fork of daemon failed: %m\n");
-                        rc = 4;
-                        goto
Re: [systemd-devel] [PATCH 1/4] cgroups: support for MemoryAndSwapLimit= setting
On Thu, 10.10.13 02:18, Mika Eloranta (m...@ohmu.fi) wrote:

Mika, so before we add properties for these settings we need to make sure they actually have a future in the kernel and are attributes that are going to stay supported. For example MemorySoftLimit is something we supported previously, but which I recently removed because Tejun Heo (the kernel cgroup maintainer, added to CC) suggested that the attribute wouldn't continue to exist on the kernel side, or at least not in this form.

Tejun, Mika sent patches to wrap memory.memsw.limit_in_bytes, memory.kmem.limit_in_bytes, memory.soft_limit_in_bytes, memory.kmem.tcp.limit_in_bytes in high-level systemd attributes. Could you comment on the future of these attributes in the kernel? Should we expose them in systemd? At the systemd hack fest in New Orleans we already discussed memory.soft_limit_in_bytes and memory.memsw.limit_in_bytes and you suggested not to expose them. What about the other two?

(I have the suspicion though that if we want to expose something we probably want to expose a single knob that puts a limit on all kinds of memory, regardless of RAM, swap, kernel or tcp...)

Thanks, Lennart

Add a MemoryAndSwapLimit setting that behaves the same way as MemoryLimit, except that it controls the memory.memsw.limit_in_bytes cgroup attribute.
---
 man/systemd.resource-control.xml      |  9 +++--
 src/core/cgroup.c                     | 21 ++---
 src/core/cgroup.h                     |  1 +
 src/core/dbus-cgroup.c                |  1 +
 src/core/load-fragment-gperf.gperf.m4 |  1 +
 src/core/load-fragment.c              | 34 ++
 src/core/load-fragment.h              |  1 +
 src/systemctl/systemctl.c             |  2 +-
 8 files changed, 64 insertions(+), 6 deletions(-)

diff --git a/man/systemd.resource-control.xml b/man/systemd.resource-control.xml
index 8688905..606e078 100644
--- a/man/systemd.resource-control.xml
+++ b/man/systemd.resource-control.xml
@@ -138,7 +138,7 @@ along with systemd; If not, see <http://www.gnu.org/licenses/>.
       </varlistentry>

       <varlistentry>
-        <term><varname>MemoryLimit=<replaceable>bytes</replaceable></varname></term>
+        <term><varname>MemoryLimit=, MemoryAndSwapLimit=<replaceable>bytes</replaceable></varname></term>

         <listitem>
           <para>Specify the limit on maximum memory usage of the
@@ -149,7 +149,12 @@ along with systemd; If not, see <http://www.gnu.org/licenses/>.
           Megabytes, Gigabytes, or Terabytes (with the base 1024),
           respectively. This controls the
           <literal>memory.limit_in_bytes</literal> control group
-          attribute. For details about this control group attribute,
+          attribute.
+          <literal>MemoryAndSwapLimit</literal> controls the
+          <literal>memory.memsw.limit_in_bytes</literal> control group
+          attribute, which sets the limit for the sum of the used
+          memory and used swap space.
+          For details about these control group attributes,
           see <ulink
           url="https://www.kernel.org/doc/Documentation/cgroups/memory.txt">memory.txt</ulink>.</para>

diff --git a/src/core/cgroup.c b/src/core/cgroup.c
index 8bf4d89..3b465cc 100644
--- a/src/core/cgroup.c
+++ b/src/core/cgroup.c
@@ -34,6 +34,7 @@ void cgroup_context_init(CGroupContext *c) {
         c->cpu_shares = 1024;
         c->memory_limit = (uint64_t) -1;
+        c->memory_and_swap_limit = (uint64_t) -1;
         c->blockio_weight = 1000;
 }
@@ -94,6 +95,7 @@ void cgroup_context_dump(CGroupContext *c, FILE* f, const char *prefix) {
                 "%sCPUShares=%lu\n"
                 "%sBlockIOWeight=%lu\n"
                 "%sMemoryLimit=%" PRIu64 "\n"
+                "%sMemoryAndSwapLimit=%" PRIu64 "\n"
                 "%sDevicePolicy=%s\n",
                 prefix, yes_no(c->cpu_accounting),
                 prefix, yes_no(c->blockio_accounting),
@@ -101,6 +103,7 @@
                 prefix, c->cpu_shares,
                 prefix, c->blockio_weight,
                 prefix, c->memory_limit,
+                prefix, c->memory_and_swap_limit,
                 prefix, cgroup_device_policy_to_string(c->device_policy));

         LIST_FOREACH(device_allow, a, c->device_allow)
@@ -254,9 +257,8 @@
         if (mask & CGROUP_MEMORY) {
+                char buf[DECIMAL_STR_MAX(uint64_t) + 1];
                 if (c->memory_limit != (uint64_t) -1) {
-                        char buf[DECIMAL_STR_MAX(uint64_t) + 1];
-
                         sprintf(buf, "%" PRIu64 "\n", c->memory_limit);
                         r = cg_set_attribute("memory", path, "memory.limit_in_bytes", buf);
                 } else
@@ -264,6 +266,18 @@ void cgroup_context_apply(CGroupContext *c, CGroupControllerMask mask,
Re: [systemd-devel] [PATCH 1/4] cgroups: support for MemoryAndSwapLimit= setting
Hello, On Thu, Oct 10, 2013 at 04:03:20PM +0200, Lennart Poettering wrote: For example MemorySoftLimit is something we supported previously, but which I recently removed because Tejun Heo (the kernel cgroup maintainer, added to CC) suggested that the attribute wouldn't continue to exist on the kernel side or at least not in this form. The problem with the current softlimit is that we currently aren't sure what it means. Its semantics is defined only by its implementation details with all its quirks and different parties interpret and use it differently. memcg people are trying to clear that up so I think it'd be worthwhile to wait to see what happens there. Tejun, Mika sent patches to wrap memory.memsw.limit_in_bytes, memory.kmem.limit_in_bytes, memory.soft_limit_in_bytes, memory.kmem.tcp.limit_in_bytes in high-level systemd attributes. Could you comment on the future of these attributes in the kernel? Should we expose them in systemd? At the systemd hack fest in New Orleans we already discussed memory.soft_limit_in_bytes and memory.memsw.limit_in_bytes and you suggested not to expose them. What about the other two? Except for soft_limit_in_bytes, at least the meanings of the knobs are well-defined and stable, so I think it should be at least safe to expose those. (I have the suspicion though that if we want to expose something we probably want to expose a single knob that puts a limit on all kinds of memory, regardless of RAM, swap, kernel or tcp...) Yeah, the different knobs grew organically to cover more stuff which wasn't covered before, so, yeah, when viewed together, they don't really make a cohesive sense. Another problem is that, enabling kmem knobs would involve noticeable amount of extra overhead. kmem also has restrictions on when it can be enabled - it can't be enabled on a populated cgroup. Maybe an approach which makes sense is where one sets the amount of memory which can be used and toggle which types of memory should be included in the accounting. 
Setting kmem limit equal to that of limit_in_bytes makes limit_in_bytes applied to both kernel and user memories. I'll ask memcg people and find out how viable such approach is. Thanks! -- tejun
Re: [systemd-devel] [PATCH 1/4] cgroups: support for MemoryAndSwapLimit= setting
Hi, Thanks guys for the feedback. I'm interested in using these controls in dense environments (read: overcommitted memory), where striking a good balance requires individual tuning of these settings. I'm actually also eyeballing swappiness, pressure_level notifications and oom_control (each currently not supported directly by systemd's cgroup settings). Do you think adding those would be feasible? I'm a bit new to systemd's code, but willing to spend some time getting it done properly... Another option could be just to provide a proxy interface for setting any user-defined cgroup attributes, which would shift the responsibility of using it correctly to the user. Something like: CGroupAttributes=memory.kmem.limit_in_bytes=1024,memory.kmem.tcp.limit_in_bytes=102400 Some of these (excl. kmem) could be tuned outside systemd, but they'd be much nicer to use directly from systemd's standard configuration. Cheers, - Mika On Oct 10, 2013, at 17:28, Tejun Heo wrote: Hello, On Thu, Oct 10, 2013 at 04:03:20PM +0200, Lennart Poettering wrote: For example MemorySoftLimit is something we supported previously, but which I recently removed because Tejun Heo (the kernel cgroup maintainer, added to CC) suggested that the attribute wouldn't continue to exist on the kernel side or at least not in this form. The problem with the current softlimit is that we currently aren't sure what it means. Its semantics is defined only by its implementation details with all its quirks and different parties interpret and use it differently. memcg people are trying to clear that up so I think it'd be worthwhile to wait to see what happens there. Tejun, Mika sent patches to wrap memory.memsw.limit_in_bytes, memory.kmem.limit_in_bytes, memory.soft_limit_in_bytes, memory.kmem.tcp.limit_in_bytes in high-level systemd attributes. Could you comment on the future of these attributes in the kernel? Should we expose them in systemd? 
At the systemd hack fest in New Orleans we already discussed memory.soft_limit_in_bytes and memory.memsw.limit_in_bytes and you suggested not to expose them. What about the other two? Except for soft_limit_in_bytes, at least the meanings of the knobs are well-defined and stable, so I think it should be at least safe to expose those. (I have the suspicion though that if we want to expose something we probably want to expose a single knob that puts a limit on all kinds of memory, regardless of RAM, swap, kernel or tcp...) Yeah, the different knobs grew organically to cover more stuff which wasn't covered before, so, yeah, when viewed together, they don't really make a cohesive sense. Another problem is that, enabling kmem knobs would involve noticeable amount of extra overhead. kmem also has restrictions on when it can be enabled - it can't be enabled on a populated cgroup. Maybe an approach which makes sense is where one sets the amount of memory which can be used and toggle which types of memory should be included in the accounting. Setting kmem limit equal to that of limit_in_bytes makes limit_in_bytes applied to both kernel and user memories. I'll ask memcg people and find out how viable such approach is. Thanks! -- tejun
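For illustration, Mika's proposed pass-through could look like this in a unit file (hypothetical syntax; CGroupAttributes= is only a proposal in this thread, not a supported systemd option):

```ini
# my-application.service -- hypothetical unit using the proposed
# pass-through option from the thread above. The attribute names are
# raw cgroup (v1) memory controller files; systemd would just write
# the values through without interpreting them.
[Service]
ExecStart=/usr/bin/my-application
CGroupAttributes=memory.kmem.limit_in_bytes=1024,memory.kmem.tcp.limit_in_bytes=102400
```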
Re: [systemd-devel] Early review request: socket activation bridge
On Tue, 08.10.13 02:07, David Strauss (da...@davidstrauss.net) wrote: I've attached the initial implementation -- not yet ready to merge -- for an event-oriented socket activation bridge. It performs well under load. I haven't tied up all potential leaks yet, but the normal execution paths seem to be clean. I also need to use proper shell option management. The bridge adds about 0.569ms to an average request, which is the same overhead I see from a normal, local-network Ethernet hop. This is with it wrapping nginx using Fedora's default nginx configuration and default homepage: Hmm, so I have serious reservations about using libev. Quite frankly, I find its code horrible... So far we used low-level epoll directly everywhere, though it certainly isn't particularly fun to use and is very limited. In New Orleans Marcel suggested we should add some kind of event loop abstraction to systemd that makes working with epoll nicer, and maybe one day even export that as an API, similar to libevent or libev, but less crazy. And so I sat down yesterday and wrote some code for this. It's a thin layer around epoll that makes it easier to use, makes working with timer events more scalable (i.e. doesn't require one timerfd per timer event), and adds event prioritization. I tried hard to make it easy to use; you can find the result here: http://cgit.freedesktop.org/systemd/systemd/tree/src/systemd/sd-event.h This should be useful for your specific purpose, but also and especially for writing bus services. This is going to be part of libsystemd-bus but usable outside of the immediate bus context, too. I am planning to port PID 1 and all the auxiliary daemons to it. struct proxy_t { We usually use the _t suffix to indicate typedef'ed types that are used like a value, rather than an object. (libc does something similar, but not quite the same way...) Also, OOM... Lennart -- Lennart Poettering - Red Hat, Inc.
Re: [systemd-devel] Early review request: socket activation bridge
On Tue, 08.10.13 13:12, Zbigniew Jędrzejewski-Szmek (zbys...@in.waw.pl) wrote: On Tue, Oct 08, 2013 at 02:07:27AM -0700, David Strauss wrote: I've attached the initial implementation -- not yet ready to merge -- for an event-oriented socket activation bridge. It performs well under load. I haven't tied up all potential leaks yet, but the normal execution paths seem to be clean. I also need to use proper shell option management. Hi David, how do you intend the target service to be started? I understand that the intended use case is for non-socket-activatable services, so they should be started synchronously in the background. In case of local services normal systemd management (over dbus) would work. In case of remote systems, maybe too, if dbus over the network was properly authorized. Do you have any plans here? For local systems it should be sufficient to simply invoke the backend service as a dependency of the proxy instance, i.e. no need to involve D-Bus, just activate the backend service at the same time as the socket-activated proxy service. Or am I missing something? Lennart -- Lennart Poettering - Red Hat, Inc.
Re: [systemd-devel] Early review request: socket activation bridge
On Thu, Oct 10, 2013 at 05:08:26PM +0200, Lennart Poettering wrote: On Tue, 08.10.13 13:12, Zbigniew Jędrzejewski-Szmek (zbys...@in.waw.pl) wrote: On Tue, Oct 08, 2013 at 02:07:27AM -0700, David Strauss wrote: I've attached the initial implementation -- not yet ready to merge -- for an event-oriented socket activation bridge. It performs well under load. I haven't tied up all potential leaks yet, but the normal execution paths seem to be clean. I also need to use proper shell option management. Hi David, how do you intend the target service to be started? I understand that the intended use case is for non-socket-activatable services, so they should be started synchronously in the background. In case of local services normal systemd management (over dbus) would work. In case of remote systems, maybe too, if dbus over the network was properly authorized. Do you have any plans here? For local systems it should be sufficient to simply invoke the backend service as a dependency of the proxy instance, i.e. no need to involve D-Bus, just activate the backend service at the same time as the socket-activated proxy service. Or am I missing something? Yeah, that would be enough. I was confused by the idea that we want to delay the starting of the target service. But we don't have to do that, because the proxy service is itself socket-activated and started when we actually have a connection. If the target process is managed by systemd, the target service should be bound to be started and stopped together with the proxy service. If systemd.unit(5) is correct, this could be expressed as a combination of BindsTo=proxy.service and PartOf=proxy.service. One thing which we can't make work currently is having the target service managed by systemd, but running with PrivateNetwork=yes. In this case, the bridge process must be inside of the target service and start the target binary itself. But maybe that's not so bad, since the proxy can be introduced by adding one word to ExecStart=.
Zbyszek
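The BindsTo=/PartOf= coupling Zbyszek describes could be sketched as follows (the unit and binary names are illustrative, not from the thread):

```ini
# backend.service -- hypothetical target service tied to its proxy:
# BindsTo= stops it when proxy.service stops or disappears, PartOf=
# additionally propagates restarts, After= orders startup.
[Unit]
Description=Backend started and stopped together with its proxy
BindsTo=proxy.service
PartOf=proxy.service
After=proxy.service

[Service]
ExecStart=/usr/bin/backend-daemon
```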
Re: [systemd-devel] [PATCH 1/4] cgroups: support for MemoryAndSwapLimit= setting
(cc'ing Johannes and quoting the whole body for context) Hey, guys. On Thu, Oct 10, 2013 at 10:28:16AM -0400, Tejun Heo wrote: Hello, On Thu, Oct 10, 2013 at 04:03:20PM +0200, Lennart Poettering wrote: For example MemorySoftLimit is something we supported previously, but which I recently removed because Tejun Heo (the kernel cgroup maintainer, added to CC) suggested that the attribute wouldn't continue to exist on the kernel side or at least not in this form. The problem with the current softlimit is that we currently aren't sure what it means. Its semantics is defined only by its implementation details with all its quirks and different parties interpret and use it differently. memcg people are trying to clear that up so I think it'd be worthwhile to wait to see what happens there. Tejun, Mika sent patches to wrap memory.memsw.limit_in_bytes, memory.kmem.limit_in_bytes, memory.soft_limit_in_bytes, memory.kmem.tcp.limit_in_bytes in high-level systemd attributes. Could you comment on the future of these attributes in the kernel? Should we expose them in systemd? At the systemd hack fest in New Orleans we already discussed memory.soft_limit_in_bytes and memory.memsw.limit_in_bytes and you suggested not to expose them. What about the other two? Except for soft_limit_in_bytes, at least the meanings of the knobs are well-defined and stable, so I think it should be at least safe to expose those. (I have the suspicion though that if we want to expose something we probably want to expose a single knob that puts a limit on all kinds of memory, regardless of RAM, swap, kernel or tcp...) Yeah, the different knobs grew organically to cover more stuff which wasn't covered before, so, yeah, when viewed together, they don't really make a cohesive sense. Another problem is that, enabling kmem knobs would involve noticeable amount of extra overhead. kmem also has restrictions on when it can be enabled - it can't be enabled on a populated cgroup. 
Maybe an approach which makes sense is where one sets the amount of memory which can be used and toggle which types of memory should be included in the accounting. Setting kmem limit equal to that of limit_in_bytes makes limit_in_bytes applied to both kernel and user memories. I'll ask memcg people and find out how viable such approach is. I talked with Johannes about the knobs and think something like the following could be useful.

* A swap knob, which, when set, configures memsw.limit_in_bytes to memory.limit_in_bytes + the set value.
* A switch to enable kmem. When enabled, kmem.limit_in_bytes tracks memory.limit_in_bytes. ie. kmem is accounted and both kernel and user memory live under the same memory limit.
* A kmem knob which can be optionally configured to a lower value than memory.limit_in_bytes. This is useful for overcommit scenarios as explained in Documentation/cgroups/memory.txt::2.7.3.
* tcp knobs are currently completely separate from other memory limits. This should probably be included in memory.limit_in_bytes. I think it probably is a better idea to hold off on this one.
* What softlimit means is still very unclear. We might end up with explicit guarantee knob and keep softlimit as it is, whatever it currently means.

Caveats:

* This setup doesn't allow setting (memory + swap) limit without setting memory limit.
* The overcommit scenario described in memory.txt::2.7.3 is somewhat bogus because not all userland memory is reclaimable and not all kernel memory is unreclaimable.

Oh well... Thanks. -- tejun
Re: [systemd-devel] Early review request: socket activation bridge
On Thu, 10.10.13 17:39, Zbigniew Jędrzejewski-Szmek (zbys...@in.waw.pl) wrote: On Thu, Oct 10, 2013 at 05:08:26PM +0200, Lennart Poettering wrote: On Tue, 08.10.13 13:12, Zbigniew Jędrzejewski-Szmek (zbys...@in.waw.pl) wrote: On Tue, Oct 08, 2013 at 02:07:27AM -0700, David Strauss wrote: I've attached the initial implementation -- not yet ready to merge -- for an event-oriented socket activation bridge. It performs well under load. I haven't tied up all potential leaks yet, but the normal execution paths seem to be clean. I also need to use proper shell option management. Hi David, how do you intend the target service to be started? I understand that the intended use case is for non-socket-activatable services, so they should be started synchronously in the background. In case of local services normal systemd management (over dbus) would work. In case of remote systems, maybe too, if dbus over the network was properly authorized. Do you have any plans here? For local systems it should be sufficient to simply invoke the backend service as a dependency of the proxy instance, i.e. no need to involve D-Bus, just activate the backend service at the same time as the socket-activated proxy service. Or am I missing something? Yeah, that would be enough. I was confused by the idea that we want to delay the starting of the target service. But we don't have to do that, because the proxy service is itself socket-activated and started when we actually have a connection. If the target process is managed by systemd, the target service should be bound to be started and stopped together with the proxy service. If systemd.unit(5) is correct, this could be expressed as a combination of BindsTo=proxy.service and PartOf=proxy.service. One thing which we can't make work currently is having the target service managed by systemd, but running with PrivateNetwork=yes. In this case, the bridge process must be inside of the target service and start the target binary itself.
But maybe that's not so bad, since the proxy can be introduced by adding one word to ExecStart=. Hmm, that's actually a good idea. The tool should have a mode where you can prefix the command line of another daemon with an invocation of this tool. It would then fork the proxy bit into the background, and use PR_SET_PDEATHSIG to make sure it will die along with the process it is the proxy for. Lennart -- Lennart Poettering - Red Hat, Inc.
Re: [systemd-devel] [PATCH] bus: fix duplicate comparisons
On Thu, 10.10.13 08:14, Tero Roponen (tero.ropo...@gmail.com) wrote: Testing for (y > x) is the same as testing for (x < y). Thanks! Applied! Lennart -- Lennart Poettering - Red Hat, Inc.
Re: [systemd-devel] dbus API for unit state change?
On Sun, 06.10.13 21:11, Brandon Philips (bran...@ifup.co) wrote: On Sun, Oct 6, 2013 at 3:10 PM, Lennart Poettering lenn...@poettering.net wrote: So, yeah, if you respond to each UnitNew signal you get with a property Get/GetAll call, then this will result in endless ping pong, which is certainly not a good idea. What are you trying to do? Write some tool that tracks all units that are loaded? Yes, I want to register services into a networked service registry. An example use case would be an HTTP load balancer that is service-registry aware and adds machines to the load balancer based on certain unit files appearing/leaving. An alternative solution is making a user explicitly add a service-registry-notifier@.service to my-application.service.wants, but I wanted to avoid making registration a special case. For example: https://gist.github.com/philips/6710008 Maybe there is a middle-ground solution? Does it make sense to send LoadState with UnitNew? I will have to look tomorrow because I think without that trying to do other things gets racy with transient units. Hmm, so I thought a bit about the issue. If I got this right, then you get the UnitNew, immediately issue a Get/GetAll, then you get a UnitRemoved, then you get another UnitNew, and then the response to Get/GetAll, right? If so, it would work to simply ignore all UnitNew signals between the request and the response, no? Lennart -- Lennart Poettering - Red Hat, Inc.
Re: [systemd-devel] [PATCH 2/2] Run with a custom SMACK domain (label).
On Tue, 08.10.13 22:29, Schaufler, Casey (casey.schauf...@intel.com) wrote: On Mon, 07.10.13 10:30, Kok, Auke-jan H (auke-jan.h@intel.com) wrote: Hi, the patches look OK. I don't have a system with smack support at hand, but I tested them on Fedora, and didn't notice any adverse effects. If you've tested them with smack, then they should be applied, imo. Thanks, I just applied them myself - I just wanted to give folks a bit of time to read and test - so thanks for doing so! Hmm, the patches as they are merged now try to mount the SMACK version of /run and /dev/shm also in containers. Will this work? So long as the cgroup filesystem propagates the xattrs to and from the real filesystem it won't be a problem. If the cgroup filesystem is not doing that there will be a problem. I can't parse this. So far (at least for SELinux) we tried to turn off all security layers in containers, since the policies are not virtualized. I don't know what you mean by virtualized in this context. Well, unlike for example the PID namespace stuff where the PIDs are virtualized, there is no scheme where the SMACK enforcement could be virtualized, so that an OS container could install its own SMACK policy, and so that SMACK labels from the container are different things even though they share the same name with labels from the host. (I mean, I am not saying this would be even desirable...) Lennart -- Lennart Poettering - Red Hat, Inc.
Re: [systemd-devel] [PATCH 2/2] Run with a custom SMACK domain (label).
-Original Message- From: Lennart Poettering [mailto:lenn...@poettering.net] Sent: Thursday, October 10, 2013 9:51 AM To: Schaufler, Casey Cc: Kok, Auke-jan H; Zbigniew Jędrzejewski-Szmek; systemd-devel Subject: Re: [systemd-devel] [PATCH 2/2] Run with a custom SMACK domain (label). On Tue, 08.10.13 22:29, Schaufler, Casey (casey.schauf...@intel.com) wrote: On Mon, 07.10.13 10:30, Kok, Auke-jan H (auke-jan.h@intel.com) wrote: Hi, the patches look OK. I don't have a system with smack support at hand, but I tested them on Fedora, and didn't notice any adverse effects. If you've tested them with smack, then they should be applied, imo. Thanks, I just applied them myself - I just wanted to give folks a bit of time to read and test - so thanks for doing so! Hmm, the patches as they are merged now try to mount the SMACK version of /run and /dev/shm also in containers. Will this work? So long as the cgroup filesystem propagates the xattrs to and from the real filesystem it won't be a problem. If the cgroup filesystem is not doing that there will be a problem. I can't parse this. That's because it doesn't make sense. I had been under the impression that cgroupfs was something other than what it is. Now that I understand better I see that this is a nonsensical statement. Read it as everything is OK. So far (at least for SELinux) we tried to turn off all security layers in containers, since the policies are not virtualized. I don't know what you mean by virtualized in this context. Well, unlike for example the PID namespace stuff where the PIDs are virtualized, there is no scheme where the SMACK enforcement could be virtualized, so that an OS container could install its own SMACK policy, and so that SMACK labels from the container are different things even though they share the same name with labels from the host. (I mean, I am not saying this would be even desirable...) OK. We've identified how we could do Smack namespaces if we wanted to.
I am pretty sure that we don't want to at this point, and that we probably won't in the near future. Lennart -- Lennart Poettering - Red Hat, Inc. ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Early review request: socket activation bridge
On Thu, Oct 10, 2013 at 8:39 AM, Zbigniew Jędrzejewski-Szmek zbys...@in.waw.pl wrote: One thing we can't currently make work is having the target service managed by systemd, but running with PrivateNetwork=yes. In this case, the bridge process must be inside of the target service and start the target binary itself. But maybe that's not so bad, since the proxy can be introduced by adding one word to ExecStart=. This is why I opted for the tiny script to start nginx and then the bridge in my proof-of-concept implementation. -- David Strauss | da...@davidstrauss.net | +1 512 577 5827 [mobile]
Re: [systemd-devel] Early review request: socket activation bridge
I was actually planning to rewrite on top of libuv today, but I'm happy to port to the new, native event library. Is there any best-practice for using it with multiple threads?
Re: [systemd-devel] Early review request: socket activation bridge
On Thu, Oct 10, 2013 at 01:12:26PM -0700, David Strauss wrote: I was actually planning to rewrite on top of libuv today, but I'm happy to port to the new, native event library. Is there any best-practice for using it with multiple threads? Best-practice is using just one thread :) Zbyszek
Re: [systemd-devel] Early review request: socket activation bridge
On Thu, Oct 10, 2013 at 1:20 PM, Zbigniew Jędrzejewski-Szmek zbys...@in.waw.pl wrote: Best-practice is using just one thread :) That depends on whether you need to scale up to multiple cores. -- David Strauss | da...@davidstrauss.net | +1 512 577 5827 [mobile]
Re: [systemd-devel] [PATCH 2/2] core: require $XDG_RUNTIME_DIR to be set for user instances
On Wed, Oct 9, 2013 at 4:57 AM, Mantas Mikulėnas graw...@gmail.com wrote: It seems that some places use /run otherwise, which isn't going to work. --- src/core/main.c | 6 ++++++ 1 file changed, 6 insertions(+)

diff --git a/src/core/main.c b/src/core/main.c
index fe291f8..36543c6 100644
--- a/src/core/main.c
+++ b/src/core/main.c
@@ -1404,6 +1404,12 @@ int main(int argc, char *argv[]) {
                 goto finish;
         }
 
+        if (arg_running_as == SYSTEMD_USER &&
+            !getenv("XDG_RUNTIME_DIR")) {
+                log_error("Trying to run as user instance, but $XDG_RUNTIME_DIR is not set.");
+                goto finish;
+        }
+

This is good, hopefully it will help folks debug user session usage better. Auke
Re: [systemd-devel] [PATCH 3/4] cgroups: support for MemorySoftLimit= setting
Didn't we recently drop this option?
Re: [systemd-devel] Early review request: socket activation bridge
On Thu, 10.10.13 13:12, David Strauss (da...@davidstrauss.net) wrote: I was actually planning to rewrite on top of libuv today, but I'm happy to port to the new, native event library. Is there any best-practice for using it with multiple threads? We are pretty conservative on threads so far, but I guess in this case it makes some sense to distribute work on CPUs. Here's how I would do it: You start with one thread first (the main thread, that is). You run an event queue, and add all listening sockets to it. When a connection comes in you process it as usual. As soon as you notice you are processing more than let's say 5 connections at the same time, you spawn a new thread and disable the listening sockets' watches (use sd_event_source_set_enabled() with SD_EVENT_OFF for this). That new thread then also runs an event loop of its own, completely independent of the original one, and also adds the listening sockets to it; it basically takes over from the original main thread. Eventually this second thread will also reach its limit of 5 connections. Now, we could just fork off yet another thread, and again pass control of the listening socket to it and so on, but we cannot do this unbounded, and we should try to give work back to the older threads that have become idle again. To do this, we keep a (mutex-protected) global list of thread information structs; each structure contains two things: a counter of how many connections that thread currently processes, and an fd referring to a per-thread eventfd(). The eventfd() is hooked into the thread's event loop, and we use this to pass control of the listening socket from one thread to another. So with this in place we can now alter our thread allocation scheme: instead of stupidly forking off a new thread from a thread that reached its connection limit, we simply sweep through the thread info struct array and look for the thread with the least number of connections, then trigger its eventfd.
When that thread gets this in its event loop it will re-enable listening on the fds and go on, until it again reaches the limit, at which point it will try to find another thread to take control of the listening socket. When during the sweep a thread recognizes that all threads are at their limits, it forks off a new one, as described above. If the max number of threads is reached (which we should put at 2x or 3x the number of CPUs in the CPU affinity set of the process), the thread in control of the listening socket will simply turn off the poll flags for the listening socket, stop processing it for one event loop iteration, and then try to pass it on to somebody else on the next iteration. With this scheme you should get pretty good distribution of things if a large number of long-running TCP connections are made. It will not be as good if a lot of short ones are made. That all said, I am not convinced this is really something to necessarily implement in the service itself. Instead we could also beef up support for the new SO_REUSEPORT socket option in systemd. For example, we could add a new option in .socket files: Distribute=$NUMBER. If set to some number, systemd will create that many socket fds and bind them all to the same configured address with SO_REUSEPORT. Then, when a connection comes in on any of these, we'd instantiate a new service instance for each and pass that one listening socket to it, which that daemon instance would then process. The daemon would invoke accept() on the fd a couple of times, and process everything it finds there. After it became idle for a while it would exit. With the SO_REUSEPORT scheme your daemon can stay single-threaded (making things much simpler), and you'd get much better performance too... (Oh, and of course, with that work, we'd have something powerful for other use cases too.) All load balancing would be done by the kernel, and that's kinda cool, because they actually are good at these things...
So, if you ask me, I vote for the SO_REUSEPORT logic. For more information on SO_REUSEPORT: https://lwn.net/Articles/542629/ Lennart -- Lennart Poettering - Red Hat, Inc.
Re: [systemd-devel] Early review request: socket activation bridge
On Thu, Oct 10, 2013 at 7:07 PM, Lennart Poettering lenn...@poettering.net wrote: All load balancing would be done by the kernel, and that's kinda cool, because they actually are good at these things... This is essentially what I was advocating a while back for other event-oriented frameworks like Node and Twisted. Both support socket activation these days, but they have no reliable mechanism for distributing the load across multiple processes. So, a big +1 to generic support for pools of socket-activated processes that still run accept() on their own. -- David Strauss | da...@davidstrauss.net | +1 512 577 5827 [mobile]
Re: [systemd-devel] Early review request: socket activation bridge
Here's a first take on having sabridge use the systemd-native event library. The current, full diff is also visible on GitHub [1]. Obviously, this work still needs considerable cleanup and tightening. I like how we're currently hammering out the basics, like the event library to use and where the multiprocess/multithreaded logic should go in the longer run. I'm open to better ideas for the data structures. Right now, the priority is to hammer everything into symmetric structures so the bi-directionality of the proxy gets abstracted away from the transfer function. This is useful for ensuring we have consistent support for server-first (MySQL) and client-first (HTTP) protocols. [1] https://github.com/systemd/systemd/pull/5/files

/*-*- Mode: C; c-basic-offset: 8; indent-tabs-mode: nil -*-*/

/***
  This file is part of systemd.

  Copyright 2013 David Strauss

  systemd is free software; you can redistribute it and/or modify it
  under the terms of the GNU Lesser General Public License as published by
  the Free Software Foundation; either version 2.1 of the License, or
  (at your option) any later version.

  systemd is distributed in the hope that it will be useful, but
  WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
  Lesser General Public License for more details.

  You should have received a copy of the GNU Lesser General Public License
  along with systemd; If not, see <http://www.gnu.org/licenses/>.
***/

#define __STDC_FORMAT_MACROS

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <netdb.h>
#include <sys/fcntl.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#include "log.h"
#include "sd-daemon.h"
#include "sd-event.h"

#define BUFFER_SIZE 1024

unsigned int total_clients = 0;

struct proxy {
        int listen_fd;
        bool remote_is_inet;
        const char *remote_host;
        const char *remote_service;
};

struct connection {
        int origin_fd;
        int destination_fd;
        sd_event_source *w_destination;
        struct connection *c_destination;
};

static int transfer_data_cb(sd_event_source *s, int fd, uint32_t revents, void *userdata) {
        struct connection *connection = (struct connection *) userdata;
        char *buffer = malloc(BUFFER_SIZE);
        ssize_t buffer_len;

        assert(revents & EPOLLIN);
        assert(fd == connection->origin_fd);

        log_info("About to transfer up to %u bytes from %d to %d.", BUFFER_SIZE, connection->origin_fd, connection->destination_fd);

        buffer_len = recv(connection->origin_fd, buffer, BUFFER_SIZE, 0);
        if (buffer_len == 0) {
                log_info("Clean disconnection.");
                sd_event_source_unref(connection->w_destination);
                sd_event_source_unref(s);
                close(connection->origin_fd);
                close(connection->destination_fd);
                free(connection->c_destination);
                free(connection);
                goto finish;
        } else if (buffer_len == -1) {
                log_error("Error %d in recv from fd=%d: %s", errno, connection->origin_fd, strerror(errno));
                exit(EXIT_FAILURE);
        }

        if (send(connection->destination_fd, buffer, buffer_len, 0) < 0) {
                log_error("Error %d in send to fd=%d: %s", errno, connection->destination_fd, strerror(errno));
                exit(EXIT_FAILURE);
        }

finish:
        free(buffer);
        return 0;
}

static int connected_to_server_cb(sd_event_source *s, int fd, uint32_t revents, void *userdata) {
        struct connection *c_server_to_client = (struct connection *) userdata;
        struct sd_event *e = sd_event_get(s);

        log_info("Connected to server. Initializing watchers for sending data.");

        // Start listening for data sent by the client.
        sd_event_add_io(e, c_server_to_client->destination_fd, EPOLLIN, transfer_data_cb, c_server_to_client->c_destination, &c_server_to_client->w_destination);

        // Cancel the write watcher for the server.
        sd_event_source_unref(s);

        // Start listening for data sent by the server.
        sd_event_add_io(e, c_server_to_client->origin_fd, EPOLLIN, transfer_data_cb, c_server_to_client, &c_server_to_client->c_destination->w_destination);

        return 0;
}

static int set_nonblock(int fd) {
        int flags;

        flags = fcntl(fd, F_GETFL);
        flags |= O_NONBLOCK;
        return fcntl(fd, F_SETFL, flags);
}

static int get_server_connection_fd(const struct proxy *proxy) {
        int server_fd;
        int len;

        if (proxy->remote_is_inet) {
                struct addrinfo hints;
                struct addrinfo *result;
                int s;

                memset(&hints, 0, sizeof(struct addrinfo));
                hints.ai_family = AF_UNSPEC;     /* IPv4 or IPv6 */
                hints.ai_socktype = SOCK_STREAM; /* TCP */
                hints.ai_flags = AI_PASSIVE;     /* Any IP address */

                //log_error("Looking up address info for %s:%s", proxy->remote_host, proxy->remote_service);
                s = getaddrinfo(proxy->remote_host, proxy->remote_service, &hints, &result);
                if (s != 0) {
                        log_error("getaddrinfo error (%d): %s", s, gai_strerror(s));
Re: [systemd-devel] Early review request: socket activation bridge
]] Lennart Poettering On Thu, 10.10.13 13:12, David Strauss (da...@davidstrauss.net) wrote: I was actually planning to rewrite on top of libuv today, but I'm happy to port to the new, native event library. Is there any best-practice for using it with multiple threads? We are pretty conservative on threads so far, but I guess in this case it makes some sense to distribute work on CPUs. Here's how I would do it: [snip long description] FWIW, if you want really high performance, this is not at all how I'd do it. Spawning threads while under load is a recipe for disaster, for a start. I'd go with something like how it's done in Varnish: have one (or n) acceptor threads that schedule work to a pool of worker threads. That scheduler should be careful about such things as treating the worker threads as LIFO (to preserve CPU cache). The advice about only 2-3 threads per CPU core looks excessively conservative. We're usually, and quite happily, running with a few thousand threads, no matter the number of cores. Using REUSEPORT might make sense in cases where you're happy to throw away performance for simplicity. That's a completely valid tradeoff. -- Tollef Fog Heen UNIX is user friendly, it's just picky about who its friends are