Re: [systemd-devel] question on special configuration case
Thanks for this and the other suggestions! So for starters we'll disable logind and dbus, increase WatchdogSec= and see where the footprint is - before disabling journald, if necessary, in a next step.

Regards
Michael

From: Mantas Mikulėnas [mailto:graw...@gmail.com]
Sent: Wednesday, June 08, 2016 11:35 AM
To: Hebenstreit, Michael
Cc: Systemd
Subject: Re: [systemd-devel] question on special configuration case

This sounds like you could start by unsetting WatchdogSec= for those daemons. Other than the watchdog, they shouldn't be using any CPU unless explicitly contacted.

On Wed, Jun 8, 2016, 02:50 Hebenstreit, Michael <michael.hebenstr...@intel.com> wrote:

> The base system is actually pretty large (currently 1200 packages) - I hate that myself. Still, performance-wise the packages are not the issue. The SSDs used can easily handle that, and library loads only happen once at startup (where the difference can be measured, but if the runtime is 24 h, a startup time of 1 s is not an issue). The kernel is tweaked, but those changes are relatively small.
>
> The single biggest problem is OS noise, i.e. every cycle that the CPU(s) spend on anything but the application. This is caused by a combination of "large number of nodes" and "tightly coupled job processes". Our current (RH6-based) system runs with a minimal number of daemons, none of them taking up any CPU time unless they are used. The systemd processes are not so well behaved: after a few hours of running they are already at a few seconds. On a single system - or on systems working independently, like server farms - that is not an issue. On our systems each second lost is multiplied by the number of nodes in the job (let's say 200, but it could also be up to 1 or more on large installations) due to tight coupling.
>
> If 3 daemons use 1 s a day each (and this is realistic on Xeon Phi Knights Landing systems), that's slowing down performance by almost 1% (3 * 200 / 86400 = 0.7% to be exact). And - we do not gain anything from those daemons after initial startup! My worst experience with such issues was on a cluster that lost 20% application performance due to a badly configured crond daemon. Now I do not expect systemd to have such a negative impact, but even 1%, or even 0.5%, of expected loss is too much in our case.

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel
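For anyone wanting to try the plan stated above ("disable logind and dbus, adjust WatchdogSec="), a minimal sketch of what that could look like on a RHEL7-era system. The drop-in file name is arbitrary, and per the rest of this thread none of this is a supported configuration:

```shell
# Sketch only (unsupported territory, see the rest of the thread).

# Silence the journald watchdog ping: a drop-in overriding WatchdogSec=
# (0 disables the watchdog entirely; a large value just makes wakeups rarer).
mkdir -p /etc/systemd/system/systemd-journald.service.d
cat > /etc/systemd/system/systemd-journald.service.d/watchdog.conf <<'EOF'
[Service]
WatchdogSec=0
EOF
systemctl daemon-reload

# Keep logind from ever starting (it is normally bus-activated):
systemctl mask systemd-logind.service

# dbus can be masked too, but much userspace tooling assumes it exists:
systemctl mask dbus.service dbus.socket
```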
Re: [systemd-devel] question on special configuration case
> Really? No journal messages are getting created at all? No users logging in/out? What does strace show on those processes?

Yes, messages are created - but I'm not interested in them. Maybe a user logs in for a 6 h job - that's already tracked by the cluster software. There are virtually no daemons running and no changes to the hardware - so all those daemons are doing is looking out for themselves. Not really productive.

> So you are hurting all 253 cores because you can't spare 1?

The situation is a bit more complex. I have 64 physical cores, each with 4 units for integer operations and 2 floating point units. So if I reserve one integer unit for the OS, due to cache hierarchies and other oddities I essentially take down 4 cores. The applications typically scale best if they run on a power-of-2 number of cores.

> Again, that's not the issue, you can't see the time the kernel is using to do its work, but it is there (interrupts, scheduling, housekeeping, etc.)

Shouldn't that show up in the time for worker threads? And I'm not arguing you are wrong. We should minimize that and, if possible, keep all OS work on an extra core. That does not make my argument invalid: those daemons are doing nothing more than housekeeping themselves in a very complicated fashion, and they are wasting resources.

-----Original Message-----
From: Greg KH [mailto:gre...@linuxfoundation.org]
Sent: Wednesday, June 08, 2016 11:01 AM
To: Hebenstreit, Michael
Cc: Jóhann B. Guðmundsson; Lennart Poettering; systemd-devel@lists.freedesktop.org
Subject: Re: [systemd-devel] question on special configuration case

On Wed, Jun 08, 2016 at 02:04:48AM +0000, Hebenstreit, Michael wrote:
> > What processes are showing up in your count? Perhaps it's just a bug that needs to be fixed.
>
> /bin/dbus-daemon
> /usr/lib/systemd/systemd-journald
> /usr/lib/systemd/systemd-logind
>
> I understand from the previous mails those are necessary to make systemd work - but here they are doing nothing more than talking to each other!

Really? No journal messages are getting created at all? No users logging in/out? What does strace show on those processes?

> > That's what "most" other system designers in your situation do :)
>
> Unfortunately I cannot reserve a CPU for the OS - I'd like to, but the app developers insist on using all 254 cores available.

So you are hurting all 253 cores because you can't spare 1? If you do the math I think you will find you will get increased throughput. But what do I know... :)

> > Your kernel is eating more CPU time than those 1s numbers, why you aren't complaining about that seems strange to me :)
>
> I also checked the kernel - last time I looked, on RH6 all kernel threads taking up clock ticks were actually doing work ^^ No time yet to do the same on the RH7 kernel.

Again, that's not the issue, you can't see the time the kernel is using to do its work, but it is there (interrupts, scheduling, housekeeping, etc.) So get it out of the way entirely and see how much faster your application runs without it even present on those cpus, if you really have cpu bound processes. That's what the feature was made for, people in your situation; to ignore it and try to go after something else seems very strange to me.

greg k-h
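To put numbers on "what processes are showing up in your count", the cumulative CPU time of a daemon can be read straight from `/proc` - the same counters `ps -o time` and top report. A minimal sketch (Linux-only; demonstrated here on the current process, substitute the daemon's PID):

```python
# Read cumulative CPU time (user + system) for a process from
# /proc/<pid>/stat. Field numbers follow proc(5); comm (field 2) can
# contain spaces, so split after its closing parenthesis.
import os

def cpu_seconds(pid):
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # Everything after the last ')' is space-separated; utime and stime
    # are fields 14 and 15 overall, i.e. indices 11 and 12 here.
    fields = data.rsplit(")", 1)[1].split()
    utime, stime = int(fields[11]), int(fields[12])
    # Tick counts -> seconds via the kernel clock-tick rate.
    return (utime + stime) / os.sysconf("SC_CLK_TCK")

print(f"{cpu_seconds(os.getpid()):.2f} s")
```

Sampling this once an hour per daemon would show directly whether journald, logind and dbus-daemon are really accumulating "a few seconds" over a job's lifetime.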
Re: [systemd-devel] question on special configuration case
> That's not the issue here though.

Nope, but an example of how bad things can get.

> What processes are showing up in your count? Perhaps it's just a bug that needs to be fixed.

/bin/dbus-daemon
/usr/lib/systemd/systemd-journald
/usr/lib/systemd/systemd-logind

I understand from the previous mails those are necessary to make systemd work - but here they are doing nothing more than talking to each other!

> That's what "most" other system designers in your situation do :)

Unfortunately I cannot reserve a CPU for the OS - I'd like to, but the app developers insist on using all 254 cores available.

> Your kernel is eating more CPU time than those 1s numbers, why you aren't complaining about that seems strange to me :)

I also checked the kernel - last time I looked, on RH6 all kernel threads taking up clock ticks were actually doing work ^^ No time yet to do the same on the RH7 kernel.

-----Original Message-----
From: Greg KH [mailto:gre...@linuxfoundation.org]
Sent: Wednesday, June 08, 2016 8:54 AM
To: Hebenstreit, Michael
Cc: Jóhann B. Guðmundsson; Lennart Poettering; systemd-devel@lists.freedesktop.org
Subject: Re: [systemd-devel] question on special configuration case

On Tue, Jun 07, 2016 at 11:50:36PM +0000, Hebenstreit, Michael wrote:
> The base system is actually pretty large (currently 1200 packages) - I hate that myself. Still, performance-wise the packages are not the issue. The SSDs used can easily handle that, and library loads only happen once at startup (where the difference can be measured, but if the runtime is 24 h, a startup time of 1 s is not an issue). The kernel is tweaked, but those changes are relatively small.
>
> The single biggest problem is OS noise, i.e. every cycle that the CPU(s) spend on anything but the application. This is caused by a combination of "large number of nodes" and "tightly coupled job processes".

Then bind your applications to the cpus and don't let anything else run on them, including the kernel. That way you will not get any jitter or latencies and can use the CPUs to their max, without having to worry about anything. Leave one CPU alone to have the kernel be able to manage its housekeeping tasks (you seem to be ignoring that issue when looking at systemd, which is odd to me as it's more noise than anything else), and also let everything else run there as well. That's what "most" other system designers in your situation do :)

> Our current (RH6-based) system runs with a minimal number of daemons, none of them taking up any CPU time unless they are used. The systemd processes are not so well behaved. After a few hours of running they are already at a few seconds.

What processes are showing up in your count? Perhaps it's just a bug that needs to be fixed.

> On a single system - or systems working independently, like server farms - that is not an issue. On our systems each second lost is multiplied by the number of nodes in the job (let's say 200, but it could also be up to 1 or more on large installations) due to tight coupling. If 3 daemons use 1 s a day each (and this is realistic on Xeon Phi Knights Landing systems), that's slowing down performance by almost 1% (3 * 200 / 86400 = 0.7% to be exact). And - we do not gain anything from those daemons after initial startup!

Your kernel is eating more CPU time than those 1s numbers, why you aren't complaining about that seems strange to me :)

> My worst experience with such issues was on a cluster that lost 20% application performance due to a badly configured crond daemon.

That's not the issue here though. Again, what tasks are causing cpu time for "no good reason", let's see if we can just fix them.

thanks,

greg k-h
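Greg's "bind your applications to the cpus" advice is usually implemented with a mix of kernel command-line isolation and explicit affinity. A rough sketch, assuming CPU 0 is sacrificed for housekeeping and 1-254 go to the job; the core ranges and the `mpirun ./app` launcher are placeholders, not part of the thread:

```shell
# Sketch only (core numbers are placeholders for a 255-thread node).

# 1. Keep the general scheduler off the application cores - kernel cmdline:
#      isolcpus=1-254 nohz_full=1-254 rcu_nocbs=1-254
#    (nohz_full/rcu_nocbs reduce timer-tick and RCU noise on those cores,
#    but require a kernel built with the matching config options)

# 2. Pin PID 1 and every daemon it spawns to the housekeeping CPU,
#    via /etc/systemd/system.conf:
#      [Manager]
#      CPUAffinity=0

# 3. Start the job explicitly on the isolated cores:
taskset -c 1-254 mpirun ./app
```

With this in place, daemon wakeups still cost cycles, but only on CPU 0, which is the point Greg is making about kernel housekeeping as well.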
Re: [systemd-devel] question on special configuration case
The base system is actually pretty large (currently 1200 packages) - I hate that myself. Still, performance-wise the packages are not the issue. The SSDs used can easily handle that, and library loads only happen once at startup (where the difference can be measured, but if the runtime is 24 h, a startup time of 1 s is not an issue). The kernel is tweaked, but those changes are relatively small.

The single biggest problem is OS noise, i.e. every cycle that the CPU(s) spend on anything but the application. This is caused by a combination of "large number of nodes" and "tightly coupled job processes". Our current (RH6-based) system runs with a minimal number of daemons, none of them taking up any CPU time unless they are used. The systemd processes are not so well behaved: after a few hours of running they are already at a few seconds. On a single system - or on systems working independently, like server farms - that is not an issue. On our systems each second lost is multiplied by the number of nodes in the job (let's say 200, but it could also be up to 1 or more on large installations) due to tight coupling. If 3 daemons use 1 s a day each (and this is realistic on Xeon Phi Knights Landing systems), that's slowing down performance by almost 1% (3 * 200 / 86400 = 0.7% to be exact). And - we do not gain anything from those daemons after initial startup!

My worst experience with such issues was on a cluster that lost 20% application performance due to a badly configured crond daemon. Now I do not expect systemd to have such a negative impact, but even 1%, or even 0.5%, of expected loss is too much in our case.

-----Original Message-----
From: Jóhann B. Guðmundsson [mailto:johan...@gmail.com]
Sent: Wednesday, June 08, 2016 6:10 AM
To: Hebenstreit, Michael; Lennart Poettering
Cc: systemd-devel@lists.freedesktop.org
Subject: Re: [systemd-devel] question on special configuration case

On 06/07/2016 10:17 PM, Hebenstreit, Michael wrote:
> I understand this usage model cannot be compared to laptops or web servers. But basically you are saying systemd is not usable for our High Performance Computing use case and I might be better off replacing it with SysV init. I was hoping for some simpler solution, but if it's not possible then that's life. Will certainly make an interesting topic at HPC conferences :P

I personally would be interested in comparing your legacy SysV init setup to a systemd one, since systemd is widely deployed on embedded devices with a minimal build (systemd, udevd and journald) where systemd's footprint and resource usage have been significantly reduced.

Given that I have pretty much crawled through the entire mud bath that makes up the core/baseOS layer in Fedora (which RHEL and its clones derive from) when I was working on integrating systemd into the distribution, I'm also interested in how you plan on making a minimal, targeted base image which installs and uses just what you need from that (dependency) mess without having to rebuild those components first. (I would think systemd "tweaking" came after you had solved that problem, along with rebuilding the kernel, if your plan is to use just what you need.)

JBG
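The scaling arithmetic that recurs through this thread (3 daemons, 1 s/day each, multiplied across 200 tightly coupled nodes) can be checked in a few lines:

```python
# Check of the overhead claim from the thread: with tight coupling,
# a second of daemon CPU time on one node stalls every node in the job.
daemons = 3               # idle daemons, each burning ~1 s of CPU per day
seconds_per_daemon = 1.0
nodes = 200               # nodes in a tightly coupled job
day = 86_400.0            # seconds per day

lost = daemons * seconds_per_daemon * nodes   # 600 s of aggregate delay
overhead = lost / day
print(f"{overhead:.2%}")  # -> 0.69%, the "almost 1%" from the thread
```

The same formula shows why the problem grows linearly with installation size: at 1000 nodes the identical daemons would cost about 3.5%.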
Re: [systemd-devel] question on special configuration case
Thanks for the answers.

> Well, there's no tracking of sessions anymore, i.e. polkit and all that stuff won't work anymore reasonably, and everything else that involves anything graphical and so on.

Nothing listed is in any way used on our system, as already laid out in the original mail. Your answer implies, though, that there is no real security issue (like sshd not working, or being exploitable to gain access to other accounts) - is this correct?

> If I were you I'd actually look what wakes up the system IRL instead of just trying to blanket remove everything.

> Can you clarify how dbus-daemon, systemd-journald, systemd-logind, systemd-udevd are causing issues/impacting the above setup, as something more than "I don't think we need it hence we want to disable it".

The approach "if you do not need it, do not run it" works pretty well for this case. The systemd daemons take up cycles without doing anything useful for us. We do not do any logging, and we do not change the hardware during runtime - so no matter how little time those units consume, it impacts scalability. As explained, this is not acceptable in our environment.

> If you need to perform benchmarks on Red Hat and its derivatives/clones, then disabling this would skew the benchmark output on those, would it not?

Not if you have some "easy" steps to duplicate the environment.

> If you need absolute bare minimum systemd [¹] then you need to create/maintain your entire distribution for that

I would not call it a distribution - but yes, building/configuring a new OS out of the basic components supplied by RH/CentOS is similar to a new distro.

I understand this usage model cannot be compared to laptops or web servers. But basically you are saying systemd is not usable for our High Performance Computing use case and I might be better off replacing it with SysV init. I was hoping for some simpler solution, but if it's not possible then that's life. Will certainly make an interesting topic at HPC conferences :P

Regards
Michael

-----Original Message-----
From: Lennart Poettering [mailto:lenn...@poettering.net]
Sent: Tuesday, June 07, 2016 11:23 PM
To: Hebenstreit, Michael
Cc: systemd-devel@lists.freedesktop.org
Subject: Re: [systemd-devel] question on special configuration case

On Tue, 07.06.16 15:13, Hebenstreit, Michael (michael.hebenstr...@intel.com) wrote:

> Sorry for directing this question here, but I did not find any mailing list that would be a better fit.
>
> Problem: I'm running an HPC benchmarking cluster. We are evaluating RH7/CentOS7/OL7 and have a problem with system noise generated by the systemd components (v 219-19.0.2, see below).
>
> Background: All cores of the CPU (up to 288) are utilized 99.99% by the application. Because of the tight coupling node to node (of programs running on 200+ nodes), every time an OS process wakes up it automatically delays EVERY process on EVERY node. As those small interruptions are not synchronized over the cluster, the overall effect on the effective performance is "time of the single delay" times "number of nodes in the job". Therefore we need to keep the OS of our systems stripped down to an absolute bare minimum.
>
> a) we have no use for any type of logging. The only log we have is kernel dmesg
> b) there is only a single user at any time on the system (logging in via ssh).
> c) The only daemons running are those necessary for NFS, ntp and sshd.
> d) we do not run Gnome or a similar desktop.
>
> Goal: For these reasons we want to shut down dbus-daemon, systemd-journald, systemd-logind and, after startup, also systemd-udevd. In our special case they do not serve any purpose. Unfortunately the basic configuration options do not allow this.

This is simply not supported on systemd. Systems without journald and udevd are explicitly not supported, and systems without dbus-daemon are only really supported for early-boot schemes. You can of course ignore what we support and what not, but of course, then you really should know what you do, and you are basically on your own.

Note that you can connect the journal to kmsg, if you like, and turn off local storage, via ForwardToKMsg= and Storage= in journald.conf.

> Questions:
> Can you provide any guidance?
> Will PID 1 (systemd) continue to do its work (first tests were already successful)?

No, it will not. The only daemon of those listed you can realistically do without is logind, and if you do that, then you basically roll your own distro.

> What are the security implications when shutting down systemd-logind?

Well, there's no tracking of sessions anymore, i.e. polkit and all that stuff won't work anymore reasonably, and everything else that involves anything graphical and so on.
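Lennart's pointers (ForwardToKMsg= and Storage= in journald.conf) would look roughly like this; a sketch, not a tested configuration:

```shell
# Sketch of Lennart's suggestion: forward to kmsg, keep no local storage.
# Put this in /etc/systemd/journald.conf (or a journald.conf.d/ drop-in):
#
#   [Journal]
#   Storage=none        # journald keeps nothing on disk or in tmpfs
#   ForwardToKMsg=yes   # mirror log messages into the kernel ring buffer
#
# then restart the daemon:
systemctl restart systemd-journald
```

This matches the poster's "the only log we have is kernel dmesg" requirement: messages still land in the ring buffer, while journald stops maintaining its own storage.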
[systemd-devel] question on special configuration case
Sorry for directing this question here, but I did not find any mailing list that would be a better fit.

Problem: I'm running an HPC benchmarking cluster. We are evaluating RH7/CentOS7/OL7 and have a problem with system noise generated by the systemd components (v 219-19.0.2, see below).

Background: All cores of the CPU (up to 288) are utilized 99.99% by the application. Because of the tight coupling node to node (of programs running on 200+ nodes), every time an OS process wakes up it automatically delays EVERY process on EVERY node. As those small interruptions are not synchronized over the cluster, the overall effect on the effective performance is "time of the single delay" times "number of nodes in the job". Therefore we need to keep the OS of our systems stripped down to an absolute bare minimum.

a) we have no use for any type of logging. The only log we have is kernel dmesg
b) there is only a single user at any time on the system (logging in via ssh).
c) The only daemons running are those necessary for NFS, ntp and sshd.
d) we do not run Gnome or a similar desktop.

Goal: For these reasons we want to shut down dbus-daemon, systemd-journald, systemd-logind and, after startup, also systemd-udevd. In our special case they do not serve any purpose. Unfortunately the basic configuration options do not allow this.

Questions:
Can you provide any guidance?
Will PID 1 (systemd) continue to do its work (first tests were already successful)?
What are the security implications when shutting down systemd-logind?
Is there any mailing list better suited you can point me to?

Thanks for any help you can provide
Michael

Installed:
systemd-networkd-219-19.0.2.el7_2.9.x86_64
systemd-219-19.0.2.el7_2.9.x86_64
systemd-devel-219-19.0.2.el7_2.9.x86_64
systemd-sysv-219-19.0.2.el7_2.9.x86_64
systemd-libs-219-19.0.2.el7_2.9.x86_64
systemd-python-219-19.0.2.el7_2.9.x86_64
systemd-resolved-219-19.0.2.el7_2.9.x86_64

Michael Hebenstreit
Senior Cluster Architect
Intel Corporation, MS: RR1-105/H14
Software and Services Group/DCE
4100 Sara Road
Rio Rancho, NM 87124, UNITED STATES
Tel.: +1 505-794-3144
E-mail: michael.hebenstr...@intel.com
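The per-node "OS noise" this thread is about can be observed crudely from user space by timing a tight loop and counting gaps where the scheduler took the CPU away. A hypothetical probe (the threshold and duration are arbitrary choices, not values from the thread):

```python
# Hypothetical user-space noise probe: spin on the clock and count gaps
# where this process lost the CPU (or was otherwise delayed) for longer
# than a threshold. Pin it to a core with taskset for meaningful numbers.
import time

def measure_noise(duration_s=0.2, threshold_ns=100_000):
    deadline = time.perf_counter_ns() + int(duration_s * 1e9)
    prev = time.perf_counter_ns()
    interruptions = 0
    worst_gap = 0
    while prev < deadline:
        now = time.perf_counter_ns()
        gap = now - prev
        if gap > threshold_ns:
            interruptions += 1
            worst_gap = max(worst_gap, gap)
        prev = now
    return interruptions, worst_gap

interruptions, worst_gap = measure_noise()
print(f"{interruptions} gaps > 100 us, worst {worst_gap / 1e6:.3f} ms")
```

Running one such probe per core, before and after disabling a daemon, would show whether that daemon actually contributes to the jitter rather than arguing from `ps` output alone.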