Re: [systemd-devel] udevd: increase maximum number of children
Hi Robert,

On Fri, Nov 28, 2014 at 8:57 AM, Robert Milasan rmila...@suse.com wrote:
> Hello,
> since a while back, there was a commit (haven't found it)

commit 8cc3f8c0bcd23bb68166cb197a4c541d7621b19c
Author: Harald Hoyer har...@redhat.com
Date:   Mon Mar 25 13:02:05 2013 +0100

    udevd.c: set udev children_max according to CPU count

    Setting children_max according to RAM leads to too much
    concurrent I/O.

> which limits the number of children/workers to 8 + num_cpu * 2, which
> in a normal case like a 4 core/cpu machine is 16 children/workers.
>
> This limit is way too low even for a single core VM, that actually
> means 10 children/workers.
>
> I've attached the patch which increased this number to
> 8 + num_cpu * 256, which is 1032 for a 4 core/cpu machine and 264 for
> a single core/cpu machine.

The reason we have to limit the number of workers is IO, but we used to
limit based on RAM, and now we limit based on number of CPUs. Could we
not use some scheduler/cgroup tweaks to achieve the correct result
instead? Harald, do you have some input on this?

> Also the patch changes the logging level of 'maximum number of
> children reached' to an error, this should be visible as an error
> when the number has been reached.

I don't think an error is appropriate here (as nothing actually fails).
At most a warning I would say.

Cheers,

Tom
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel
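For reference, the default discussed here depends on the CPUs in the process's affinity mask, not on the number of CPUs installed. Below is a minimal sketch of that calculation, assuming Linux with glibc. The helper names `children_max_from_set` and `children_max_default` are made up for illustration and are not udev functions.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for cpu_set_t and the CPU_* macros */
#endif
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical helper: 8 workers plus 2 per CPU present in the mask,
 * i.e. the 8 + num_cpu * 2 formula quoted in this thread. */
static unsigned children_max_from_set(const cpu_set_t *set) {
        assert(set != NULL);
        return 8u + (unsigned) CPU_COUNT(set) * 2u;
}

/* Hypothetical helper: same limit for the calling process. Falls back
 * to the base value of 8 if the affinity mask cannot be read. */
static unsigned children_max_default(void) {
        cpu_set_t set;

        CPU_ZERO(&set);
        if (sched_getaffinity(0, sizeof(set), &set) != 0)
                return 8u;
        return children_max_from_set(&set);
}
```

Because the count comes from the affinity mask, a daemon pinned to a single CPU would get 10 workers under this formula even on a machine with many cores.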
Re: [systemd-devel] udevd: increase maximum number of children
On Fri, 28 Nov 2014 13:51:04 +0100 Tom Gundersen t...@jklm.no wrote:
> > Also the patch changes the logging level of 'maximum number of
> > children reached' to an error, this should be visible as an error
> > when the number has been reached.
>
> I don't think an error is appropriate here (as nothing actually
> fails). At most a warning I would say.
>
> Cheers,
>
> Tom

This is an error, because the behavior makes the system not work
correctly. I don't care about a warning that much, but in this case and
in the reference bug, we see a bug and/or an error which is caused by
the small number of children, or by the inability of the udev daemon to
create new children/workers, which stops the queue processing until the
number of children is lower than children_max.

Anyway, please do as you wish as long as it gets fixed.

-- 
Robert Milasan
L3 Support Engineer
SUSE Linux (http://www.suse.com)
email: rmila...@suse.com
GPG fingerprint: B6FE F4A8 0FA3 3040 3402 6FE7 2F64 167C 1909 6D1A
Re: [systemd-devel] udevd: increase maximum number of children
On 28.11.2014 13:59, Robert Milasan wrote:
> On Fri, 28 Nov 2014 13:51:04 +0100 Tom Gundersen t...@jklm.no wrote:
>>> Also the patch changes the logging level of 'maximum number of
>>> children reached' to an error, this should be visible as an error
>>> when the number has been reached.
>>
>> I don't think an error is appropriate here (as nothing actually
>> fails). At most a warning I would say.
>>
>> Cheers,
>>
>> Tom
>
> This is an error, because the behavior makes the system not work
> correctly. I don't care about a warning that much, but in this case
> and in the reference bug, we see a bug and/or an error which is caused
> by the small number of children, or by the inability of the udev
> daemon to create new children/workers, which stops the queue
> processing until the number of children is lower than children_max.
>
> Anyway, please do as you wish as long as it gets fixed.

This is not true. It only defers the uevent until a worker is
available. So logging it as an error is incorrect. It's a debug
message.

We don't have unlimited resources. You don't do make -j with unlimited
make jobs either for a kernel build to get the minimum build time.

Having 1024 concurrent blkid's running will slow down a machine
significantly, because concurrent I/O over a single path is never good.
So, what we see is a lot of I/O errors because of timeouts on huge
systems, if the bottleneck is saturated.

Ideally we would limit the concurrent I/O, but we can't know (in a
simple way) where the bottleneck is. It might be a SAN behind FCoE, or
iSCSI, where the network is the bottleneck.

That being said, I don't have a sane solution to satisfy everybody.
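The "defer, not fail" behavior described above can be illustrated with a toy model. This is only a sketch: `struct pool`, `try_start_event`, `event_finish` and `queue_start` are hypothetical names, and the real udevd bookkeeping (worker processes, event dependencies, timeouts) is not modeled.

```c
#include <assert.h>
#include <stddef.h>

enum event_state { EVENT_QUEUED, EVENT_RUNNING, EVENT_DONE };

struct pool {
        unsigned children;      /* workers currently busy */
        unsigned children_max;  /* upper bound on busy workers */
};

/* Try to start one event. Returns 1 if a worker picked it up, 0 if it
 * was not started. When the limit is reached the event simply stays
 * EVENT_QUEUED for a later pass; nothing is dropped. */
static int try_start_event(struct pool *p, enum event_state *ev) {
        assert(p != NULL && ev != NULL);
        if (*ev != EVENT_QUEUED)
                return 0;
        if (p->children >= p->children_max)
                return 0;       /* deferred, not failed */
        p->children++;
        *ev = EVENT_RUNNING;
        return 1;
}

/* A worker finished: free its slot so a queued event can run. */
static void event_finish(struct pool *p, enum event_state *ev) {
        assert(p != NULL && ev != NULL && *ev == EVENT_RUNNING);
        p->children--;
        *ev = EVENT_DONE;
}

/* One pass over the queue; returns how many events were started. */
static unsigned queue_start(struct pool *p, enum event_state *evs, size_t n) {
        unsigned started = 0;

        for (size_t i = 0; i < n; i++)
                started += (unsigned) try_start_event(p, &evs[i]);
        return started;
}
```

With `children_max` set to 2 and three queued events, the first pass starts two and leaves the third queued. Once one worker finishes, the next pass starts the remaining event.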
Re: [systemd-devel] udevd: increase maximum number of children
On 28.11.2014 08:57, Robert Milasan wrote:
> Hello,
> since a while back, there was a commit (haven't found it) which limits
> the number of children/workers to 8 + num_cpu * 2, which in a normal
> case like a 4 core/cpu machine is 16 children/workers.
>
> This limit is way too low even for a single core VM, that actually
> means 10 children/workers.
>
> I've attached the patch which increased this number to
> 8 + num_cpu * 256, which is 1032 for a 4 core/cpu machine and 264 for
> a single core/cpu machine.
>
> Also the patch changes the logging level of 'maximum number of
> children reached' to an error, this should be visible as an error
> when the number has been reached.
>
> Reference bug: http://bugzilla.opensuse.org/show_bug.cgi?id=907393

I think what we are seeing here is that module loading saturates the
udev workers. So there are at least 16 modprobes (kmod) running and
this hinders further processing of the uevents.

In theory we could increase arg_children_max before builtin_kmod() and
decrease it again afterwards.

CC'ing Kay
Re: [systemd-devel] udevd: increase maximum number of children
On Fri, 28 Nov 2014 14:28:31 +0100 Harald Hoyer harald.ho...@gmail.com wrote:
> I think what we are seeing here is, that module loading saturates the
> udev workers here. So there are at least 16 modprobes (kmod) running
> and this hinders further processing of the uevents.
>
> In theory we could increase arg_children_max before builtin_kmod()
> and decrease it again afterwards.
>
> CC'ing Kay

Ok, I've decreased the arg_children_max number from CPU_COUNT * 256 to
CPU_COUNT * 64. You were right, it was a bit too much.

-- 
Robert Milasan
L3 Support Engineer
SUSE Linux (http://www.suse.com)
email: rmila...@suse.com
GPG fingerprint: B6FE F4A8 0FA3 3040 3402 6FE7 2F64 167C 1909 6D1A
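To compare the three per-CPU multipliers mentioned in this thread (2, 64 and 256), here is a small sketch of the arithmetic. It assumes the base of 8 stays unchanged in the revised patch, which the thread does not state explicitly. `children_max` and `limits_for` are illustrative names only.

```c
#include <assert.h>

/* Multipliers discussed in the thread: current value, revised patch,
 * original patch. */
static const unsigned multipliers[3] = { 2, 64, 256 };

/* children_max = 8 + ncpu * multiplier (base of 8 assumed unchanged). */
static unsigned children_max(unsigned ncpu, unsigned multiplier) {
        assert(ncpu > 0 && multiplier > 0);
        return 8u + ncpu * multiplier;
}

/* Fill out[] with the limit for each multiplier, in the order above. */
static void limits_for(unsigned ncpu, unsigned out[3]) {
        for (int i = 0; i < 3; i++)
                out[i] = children_max(ncpu, multipliers[i]);
}
```

Under that assumption, a 4-CPU machine gets 16, 264 and 1032 workers for the three multipliers, and a single-CPU machine gets 10, 72 and 264.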
Re: [systemd-devel] udevd: increase maximum number of children
On 28.11.2014 14:09, Harald Hoyer wrote:
> On 28.11.2014 13:59, Robert Milasan wrote:
>> On Fri, 28 Nov 2014 13:51:04 +0100 Tom Gundersen t...@jklm.no wrote:
>>>> Also the patch changes the logging level of 'maximum number of
>>>> children reached' to an error, this should be visible as an error
>>>> when the number has been reached.
>>>
>>> I don't think an error is appropriate here (as nothing actually
>>> fails). At most a warning I would say.
>>>
>>> Cheers,
>>>
>>> Tom
>>
>> This is an error, because the behavior makes the system not work
>> correctly. I don't care about a warning that much, but in this case
>> and in the reference bug, we see a bug and/or an error which is
>> caused by the small number of children, or by the inability of the
>> udev daemon to create new children/workers, which stops the queue
>> processing until the number of children is lower than children_max.
>>
>> Anyway, please do as you wish as long as it gets fixed.
>
> This is not true. It only defers the uevent until a worker is
> available. So logging it as an error is incorrect. It's a debug
> message.
>
> We don't have unlimited resources. You don't do make -j with
> unlimited make jobs either for a kernel build to get the minimum
> build time.
>
> Having 1024 concurrent blkid's running will slow down a machine
> significantly, because concurrent I/O over a single path is never
> good. So, what we see is a lot of I/O errors because of timeouts on
> huge systems, if the bottleneck is saturated.
>
> Ideally we would limit the concurrent I/O, but we can't know (in a
> simple way) where the bottleneck is. It might be a SAN behind FCoE,
> or iSCSI, where the network is the bottleneck.
>
> That being said, I don't have a sane solution to satisfy everybody.

Also interesting: udev workers and the time to bring up 2000 LVM LVs on
2 disks.

https://plus.google.com/117537647502636167748/posts/eRJFhjLbpta
[systemd-devel] udevd: increase maximum number of children
Hello,
a while back there was a commit (I haven't found it) which limits the
number of children/workers to 8 + num_cpu * 2, which in a normal case
like a 4 core/cpu machine is 16 children/workers.

This limit is way too low even for a single core VM, where it means
only 10 children/workers.

I've attached a patch which increases this number to
8 + num_cpu * 256, which is 1032 for a 4 core/cpu machine and 264 for a
single core/cpu machine.

The patch also changes the logging level of 'maximum number of children
reached' to an error, so that it is visible as an error when the limit
has been reached.

Reference bug: http://bugzilla.opensuse.org/show_bug.cgi?id=907393

-- 
Robert Milasan
L3 Support Engineer
SUSE Linux (http://www.suse.com)
email: rmila...@suse.com
GPG fingerprint: B6FE F4A8 0FA3 3040 3402 6FE7 2F64 167C 1909 6D1A

From 4fb50b784c7aa6f5ea9cdf2e5a65ba48a2b9ce1b Mon Sep 17 00:00:00 2001
From: Robert Milasan rmila...@suse.com
Date: Fri, 28 Nov 2014 08:49:38 +0100
Subject: [PATCH] udevd: increase maximum number of children

Signed-off-by: Robert Milasan rmila...@suse.com
---
 src/udev/udevd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/udev/udevd.c b/src/udev/udevd.c
index 3c3de76..c327c38 100644
--- a/src/udev/udevd.c
+++ b/src/udev/udevd.c
@@ -445,7 +445,7 @@ static void event_run(struct event *event) {
         if (children >= arg_children_max) {
                 if (arg_children_max > 1)
-                        log_debug("maximum number (%i) of children reached", children);
+                        log_error("maximum number (%i) of children reached", children);
                 return;
         }
@@ -1270,7 +1270,7 @@ int main(int argc, char *argv[]) {
                 arg_children_max = 8;
                 if (sched_getaffinity(0, sizeof (cpu_set), &cpu_set) == 0) {
-                        arg_children_max += CPU_COUNT(&cpu_set) * 2;
+                        arg_children_max += CPU_COUNT(&cpu_set) * 256;
                 }
         }
         log_debug("set children_max to %u", arg_children_max);
-- 
1.8.4.5