Re: [systemd-devel] question on special configuration case
Thanks for this and the other suggestions! So for starters we'll disable logind and dbus, increase WatchdogSec= and see where the footprint is - before disabling journald, if necessary, in a next step.

Regards
Michael

From: Mantas Mikulėnas [mailto:graw...@gmail.com]
Sent: Wednesday, June 08, 2016 11:35 AM
To: Hebenstreit, Michael
Cc: Systemd
Subject: Re: [systemd-devel] question on special configuration case

This sounds like you could start by unsetting WatchdogSec= for those daemons. Other than the watchdog, they shouldn't be using any CPU unless explicitly contacted.

On Wed, Jun 8, 2016, 02:50 Hebenstreit, Michael <michael.hebenstr...@intel.com> wrote:

> The base system is actually pretty large (currently 1200 packages) - I hate that myself. Still, performance-wise the packages are not the issue. The SSDs used can easily handle that, and library loads only happen once at startup (where the difference can be measured, but if the runtime is 24 h, a startup time of 1 s is not an issue). The kernel is tweaked, but those changes are relatively small.
>
> The single biggest problem is OS noise, i.e. every cycle that the CPU(s) spend on anything but the application. This is caused by a combination of "large number of nodes" and "tightly coupled job processes". Our current (RH6-based) system runs with a minimal number of daemons, none of them taking up any CPU time unless they are used. The systemd processes are not so well behaved: after a few hours of running they are already at a few seconds. On a single system - or on systems working independently, like server farms - that is not an issue. On our systems each second lost is multiplied by the number of nodes in the job (let's say 200, but it could also be up to 1 or more on large installations) due to tight coupling.
>
> If 3 daemons use 1 s a day each (and this is realistic on Xeon Phi Knights Landing systems), that's slowing down performance by almost 1% (3 * 200 / 86400 = 0.7% to be exact). And - we do not gain anything from those daemons after initial startup! My worst experience with such issues was on a cluster that lost 20% application performance due to a badly configured crond daemon. Now I do not expect systemd to have such a negative impact, but even 1%, or even 0.5%, of expected loss is too much in our case.

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel
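For anyone wanting to try the plan stated above ("disable logind and dbus, adjust WatchdogSec="), a minimal sketch of what that could look like on a RHEL7-era system. The drop-in file name is arbitrary, and per the rest of this thread none of this is a supported configuration:

```shell
# Sketch only (unsupported territory, see the rest of the thread).

# Silence the journald watchdog ping: a drop-in overriding WatchdogSec=
# (0 disables the watchdog entirely; a large value just makes wakeups rarer).
mkdir -p /etc/systemd/system/systemd-journald.service.d
cat > /etc/systemd/system/systemd-journald.service.d/watchdog.conf <<'EOF'
[Service]
WatchdogSec=0
EOF
systemctl daemon-reload

# Keep logind from ever starting (it is normally bus-activated):
systemctl mask systemd-logind.service

# dbus can be masked too, but much userspace tooling assumes it exists:
systemctl mask dbus.service dbus.socket
```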
Re: [systemd-devel] question on special configuration case
> Really? No journal messages are getting created at all? No users logging in/out? What does strace show on those processes?

Yes, messages are created - but I'm not interested in them. Maybe a user logs in for a 6 h job - that's already tracked by the cluster software. There are virtually no daemons running and no changes to the hardware - so all those daemons are doing is looking out for themselves. Not really productive.

> So you are hurting all 253 cores because you can't spare 1?

The situation is a bit more complex. I have 64 physical cores, each with 4 units for integer operations and 2 floating point units. So if I reserve one integer unit for the OS, due to cache hierarchies and other oddities I essentially take down 4 cores. The applications typically scale best if they run on a power-of-2 number of cores.

> Again, that's not the issue, you can't see the time the kernel is using to do its work, but it is there (interrupts, scheduling, housekeeping, etc.)

Shouldn't that show up in the time for worker threads? And I'm not arguing you are wrong. We should minimize that and, if possible, keep all OS work on an extra core. That does not make my argument invalid: those daemons are doing nothing more than housekeeping themselves in a very complicated fashion, and they are wasting resources.

-----Original Message-----
From: Greg KH [mailto:gre...@linuxfoundation.org]
Sent: Wednesday, June 08, 2016 11:01 AM
To: Hebenstreit, Michael
Cc: Jóhann B. Guðmundsson; Lennart Poettering; systemd-devel@lists.freedesktop.org
Subject: Re: [systemd-devel] question on special configuration case

On Wed, Jun 08, 2016 at 02:04:48AM +0000, Hebenstreit, Michael wrote:
> > What processes are showing up in your count? Perhaps it's just a bug that needs to be fixed.
>
> /bin/dbus-daemon
> /usr/lib/systemd/systemd-journald
> /usr/lib/systemd/systemd-logind
>
> I understand from the previous mails those are necessary to make systemd work - but here they are doing nothing more than talking to each other!

Really? No journal messages are getting created at all? No users logging in/out? What does strace show on those processes?

> > That's what "most" other system designers in your situation do :)
>
> Unfortunately I cannot reserve a CPU for the OS - I'd like to, but the app developers insist on using all 254 cores available.

So you are hurting all 253 cores because you can't spare 1? If you do the math I think you will find you will get increased throughput. But what do I know... :)

> > Your kernel is eating more CPU time than those 1s numbers, why you aren't complaining about that seems strange to me :)
>
> I also checked the kernel - last time I looked, on RH6 all kernel threads taking up clock ticks were actually doing work ^^ No time yet to do the same on the RH7 kernel.

Again, that's not the issue, you can't see the time the kernel is using to do its work, but it is there (interrupts, scheduling, housekeeping, etc.) So get it out of the way entirely and see how much faster your application runs without it even present on those cpus, if you really have cpu bound processes. That's what the feature was made for, people in your situation; to ignore it and try to go after something else seems very strange to me.

greg k-h
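To put numbers on "what processes are showing up in your count", the cumulative CPU time of a daemon can be read straight from `/proc` - the same counters `ps -o time` and top report. A minimal sketch (Linux-only; demonstrated here on the current process, substitute the daemon's PID):

```python
# Read cumulative CPU time (user + system) for a process from
# /proc/<pid>/stat. Field numbers follow proc(5); comm (field 2) can
# contain spaces, so split after its closing parenthesis.
import os

def cpu_seconds(pid):
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # Everything after the last ')' is space-separated; utime and stime
    # are fields 14 and 15 overall, i.e. indices 11 and 12 here.
    fields = data.rsplit(")", 1)[1].split()
    utime, stime = int(fields[11]), int(fields[12])
    # Tick counts -> seconds via the kernel clock-tick rate.
    return (utime + stime) / os.sysconf("SC_CLK_TCK")

print(f"{cpu_seconds(os.getpid()):.2f} s")
```

Sampling this once an hour per daemon would show directly whether journald, logind and dbus-daemon are really accumulating "a few seconds" over a job's lifetime.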
Re: [systemd-devel] question on special configuration case
> That's not the issue here though.

Nope, but an example of how bad things can get.

> What processes are showing up in your count? Perhaps it's just a bug that needs to be fixed.

/bin/dbus-daemon
/usr/lib/systemd/systemd-journald
/usr/lib/systemd/systemd-logind

I understand from the previous mails those are necessary to make systemd work - but here they are doing nothing more than talking to each other!

> That's what "most" other system designers in your situation do :)

Unfortunately I cannot reserve a CPU for the OS - I'd like to, but the app developers insist on using all 254 cores available.

> Your kernel is eating more CPU time than those 1s numbers, why you aren't complaining about that seems strange to me :)

I also checked the kernel - last time I looked, on RH6 all kernel threads taking up clock ticks were actually doing work ^^ No time yet to do the same on the RH7 kernel.

-----Original Message-----
From: Greg KH [mailto:gre...@linuxfoundation.org]
Sent: Wednesday, June 08, 2016 8:54 AM
To: Hebenstreit, Michael
Cc: Jóhann B. Guðmundsson; Lennart Poettering; systemd-devel@lists.freedesktop.org
Subject: Re: [systemd-devel] question on special configuration case

On Tue, Jun 07, 2016 at 11:50:36PM +0000, Hebenstreit, Michael wrote:
> The base system is actually pretty large (currently 1200 packages) - I hate that myself. Still, performance-wise the packages are not the issue. The SSDs used can easily handle that, and library loads only happen once at startup (where the difference can be measured, but if the runtime is 24 h, a startup time of 1 s is not an issue). The kernel is tweaked, but those changes are relatively small.
>
> The single biggest problem is OS noise, i.e. every cycle that the CPU(s) spend on anything but the application. This is caused by a combination of "large number of nodes" and "tightly coupled job processes".

Then bind your applications to the cpus and don't let anything else run on them, including the kernel. That way you will not get any jitter or latencies and can use the CPUs to their max, without having to worry about anything. Leave one CPU alone to have the kernel be able to manage its housekeeping tasks (you seem to be ignoring that issue when looking at systemd, which is odd to me as it's more noise than anything else), and also let everything else run there as well. That's what "most" other system designers in your situation do :)

> Our current (RH6-based) system runs with a minimal number of daemons, none of them taking up any CPU time unless they are used. The systemd processes are not so well behaved. After a few hours of running they are already at a few seconds.

What processes are showing up in your count? Perhaps it's just a bug that needs to be fixed.

> On a single system - or systems working independently, like server farms - that is not an issue. On our systems each second lost is multiplied by the number of nodes in the job (let's say 200, but it could also be up to 1 or more on large installations) due to tight coupling. If 3 daemons use 1 s a day each (and this is realistic on Xeon Phi Knights Landing systems), that's slowing down performance by almost 1% (3 * 200 / 86400 = 0.7% to be exact). And - we do not gain anything from those daemons after initial startup!

Your kernel is eating more CPU time than those 1s numbers, why you aren't complaining about that seems strange to me :)

> My worst experience with such issues was on a cluster that lost 20% application performance due to a badly configured crond daemon.

That's not the issue here though. Again, what tasks are causing cpu time for "no good reason", let's see if we can just fix them.

thanks,

greg k-h
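Greg's "bind your applications to the cpus" advice is usually implemented with a mix of kernel command-line isolation and explicit affinity. A rough sketch, assuming CPU 0 is sacrificed for housekeeping and 1-254 go to the job; the core ranges and the `mpirun ./app` launcher are placeholders, not part of the thread:

```shell
# Sketch only (core numbers are placeholders for a 255-thread node).

# 1. Keep the general scheduler off the application cores - kernel cmdline:
#      isolcpus=1-254 nohz_full=1-254 rcu_nocbs=1-254
#    (nohz_full/rcu_nocbs reduce timer-tick and RCU noise on those cores,
#    but require a kernel built with the matching config options)

# 2. Pin PID 1 and every daemon it spawns to the housekeeping CPU,
#    via /etc/systemd/system.conf:
#      [Manager]
#      CPUAffinity=0

# 3. Start the job explicitly on the isolated cores:
taskset -c 1-254 mpirun ./app
```

With this in place, daemon wakeups still cost cycles, but only on CPU 0, which is the point Greg is making about kernel housekeeping as well.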
Re: [systemd-devel] question on special configuration case
The base system is actually pretty large (currently 1200 packages) - I hate that myself. Still, performance-wise the packages are not the issue. The SSDs used can easily handle that, and library loads only happen once at startup (where the difference can be measured, but if the runtime is 24 h, a startup time of 1 s is not an issue). The kernel is tweaked, but those changes are relatively small.

The single biggest problem is OS noise, i.e. every cycle that the CPU(s) spend on anything but the application. This is caused by a combination of "large number of nodes" and "tightly coupled job processes". Our current (RH6-based) system runs with a minimal number of daemons, none of them taking up any CPU time unless they are used. The systemd processes are not so well behaved: after a few hours of running they are already at a few seconds. On a single system - or on systems working independently, like server farms - that is not an issue. On our systems each second lost is multiplied by the number of nodes in the job (let's say 200, but it could also be up to 1 or more on large installations) due to tight coupling. If 3 daemons use 1 s a day each (and this is realistic on Xeon Phi Knights Landing systems), that's slowing down performance by almost 1% (3 * 200 / 86400 = 0.7% to be exact). And - we do not gain anything from those daemons after initial startup!

My worst experience with such issues was on a cluster that lost 20% application performance due to a badly configured crond daemon. Now I do not expect systemd to have such a negative impact, but even 1%, or even 0.5%, of expected loss is too much in our case.

-----Original Message-----
From: Jóhann B. Guðmundsson [mailto:johan...@gmail.com]
Sent: Wednesday, June 08, 2016 6:10 AM
To: Hebenstreit, Michael; Lennart Poettering
Cc: systemd-devel@lists.freedesktop.org
Subject: Re: [systemd-devel] question on special configuration case

On 06/07/2016 10:17 PM, Hebenstreit, Michael wrote:
> I understand this usage model cannot be compared to laptops or web servers. But basically you are saying systemd is not usable for our High Performance Computing use case and I might be better off replacing it with SysV init. I was hoping for some simpler solution, but if it's not possible then that's life. Will certainly make an interesting topic at HPC conferences :P

I personally would be interested in comparing your legacy SysV init setup to a systemd one, since systemd is widely deployed on embedded devices with a minimal build (systemd, udevd and journald) where systemd's footprint and resource usage have been significantly reduced.

Given that I have pretty much crawled through the entire mud bath that makes up the core/baseOS layer in Fedora (which RHEL and its clones derive from) when I was working on integrating systemd into the distribution, I'm also interested in how you plan on making a minimal, targeted base image which installs and uses just what you need from that (dependency) mess without having to rebuild those components first. (I would think systemd "tweaking" came after you had solved that problem, along with rebuilding the kernel, if your plan is to use just what you need.)

JBG
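The scaling arithmetic that recurs through this thread (3 daemons, 1 s/day each, multiplied across 200 tightly coupled nodes) can be checked in a few lines:

```python
# Check of the overhead claim from the thread: with tight coupling,
# a second of daemon CPU time on one node stalls every node in the job.
daemons = 3               # idle daemons, each burning ~1 s of CPU per day
seconds_per_daemon = 1.0
nodes = 200               # nodes in a tightly coupled job
day = 86_400.0            # seconds per day

lost = daemons * seconds_per_daemon * nodes   # 600 s of aggregate delay
overhead = lost / day
print(f"{overhead:.2%}")  # -> 0.69%, the "almost 1%" from the thread
```

The same formula shows why the problem grows linearly with installation size: at 1000 nodes the identical daemons would cost about 3.5%.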
Re: [systemd-devel] question on special configuration case
Thanks for the answers.

> Well, there's no tracking of sessions anymore, i.e. polkit and all that stuff won't work anymore reasonably, and everything else that involves anything graphical and so on.

Nothing listed is in any way used on our system, as already laid out in the original mail. Your answer implies, though, that there is no real security issue (like sshd not working, or being exploitable to gain access to other accounts) - is this correct?

> If I were you I'd actually look what wakes up the system IRL instead of just trying to blanket remove everything.

> Can you clarify how dbus-daemon, systemd-journald, systemd-logind, systemd-udevd are causing issues/impacting the above setup, as something more than "I don't think we need it hence we want to disable it".

The approach "if you do not need it, do not run it" works pretty well for this case. The systemd daemons take up cycles without doing anything useful for us. We do not do any logging, and we do not change the hardware during runtime - so no matter how little time those units consume, it impacts scalability. As explained, this is not acceptable in our environment.

> If you need to perform benchmarks on Red Hat and its derivatives/clones, then disabling this would skew the benchmark output on those, would it not?

Not if you have some "easy" steps to duplicate the environment.

> If you need absolute bare minimum systemd [¹] then you need to create/maintain your entire distribution for that

I would not call it a distribution - but yes, building/configuring a new OS out of the basic components supplied by RH/CentOS is similar to a new distro.

I understand this usage model cannot be compared to laptops or web servers. But basically you are saying systemd is not usable for our High Performance Computing use case and I might be better off replacing it with SysV init. I was hoping for some simpler solution, but if it's not possible then that's life. Will certainly make an interesting topic at HPC conferences :P

Regards
Michael

-----Original Message-----
From: Lennart Poettering [mailto:lenn...@poettering.net]
Sent: Tuesday, June 07, 2016 11:23 PM
To: Hebenstreit, Michael
Cc: systemd-devel@lists.freedesktop.org
Subject: Re: [systemd-devel] question on special configuration case

On Tue, 07.06.16 15:13, Hebenstreit, Michael (michael.hebenstr...@intel.com) wrote:

> Sorry for directing this question here, but I did not find any mailing list that would be a better fit.
>
> Problem: I'm running an HPC benchmarking cluster. We are evaluating RH7/CentOS7/OL7 and have a problem with system noise generated by the systemd components (v 219-19.0.2, see below).
>
> Background: All cores of the CPU (up to 288) are utilized 99.99% by the application. Because of the tight coupling node to node (of programs running on 200+ nodes), every time an OS process wakes up it automatically delays EVERY process on EVERY node. As those small interruptions are not synchronized over the cluster, the overall effect on the effective performance is "time of the single delay" times "number of nodes in the job". Therefore we need to keep the OS of our systems stripped down to an absolute bare minimum.
>
> a) we have no use for any type of logging. The only log we have is kernel dmesg
> b) there is only a single user at any time on the system (logging in via ssh).
> c) The only daemons running are those necessary for NFS, ntp and sshd.
> d) we do not run Gnome or a similar desktop.
>
> Goal: For these reasons we want to shut down dbus-daemon, systemd-journald, systemd-logind and, after startup, also systemd-udevd. In our special case they do not serve any purpose. Unfortunately the basic configuration options do not allow this.

This is simply not supported on systemd. Systems without journald and udevd are explicitly not supported, and systems without dbus-daemon are only really supported for early-boot schemes. You can of course ignore what we support and what not, but of course, then you really should know what you do, and you are basically on your own.

Note that you can connect the journal to kmsg, if you like, and turn off local storage, via ForwardToKMsg= and Storage= in journald.conf.

> Questions:
> Can you provide any guidance?
> Will PID 1 (systemd) continue to do its work (first tests were already successful)?

No, it will not. The only daemon of those listed you can realistically do without is logind, and if you do that, then you basically roll your own distro.

> What are the security implications when shutting down systemd-logind?

Well, there's no tracking of sessions anymore, i.e. polkit and all that stuff won't work anymore reasonably, and everything else that involves anything graphical and so on.
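Lennart's pointers (ForwardToKMsg= and Storage= in journald.conf) would look roughly like this; a sketch, not a tested configuration:

```shell
# Sketch of Lennart's suggestion: forward to kmsg, keep no local storage.
# Put this in /etc/systemd/journald.conf (or a journald.conf.d/ drop-in):
#
#   [Journal]
#   Storage=none        # journald keeps nothing on disk or in tmpfs
#   ForwardToKMsg=yes   # mirror log messages into the kernel ring buffer
#
# then restart the daemon:
systemctl restart systemd-journald
```

This matches the poster's "the only log we have is kernel dmesg" requirement: messages still land in the ring buffer, while journald stops maintaining its own storage.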
[systemd-devel] question on special configuration case
Sorry for directing this question here, but I did not find any mailing list that would be a better fit.

Problem: I'm running an HPC benchmarking cluster. We are evaluating RH7/CentOS7/OL7 and have a problem with system noise generated by the systemd components (v 219-19.0.2, see below).

Background: All cores of the CPU (up to 288) are utilized 99.99% by the application. Because of the tight coupling node to node (of programs running on 200+ nodes), every time an OS process wakes up it automatically delays EVERY process on EVERY node. As those small interruptions are not synchronized over the cluster, the overall effect on the effective performance is "time of the single delay" times "number of nodes in the job". Therefore we need to keep the OS of our systems stripped down to an absolute bare minimum.

a) we have no use for any type of logging. The only log we have is kernel dmesg
b) there is only a single user at any time on the system (logging in via ssh).
c) The only daemons running are those necessary for NFS, ntp and sshd.
d) we do not run Gnome or a similar desktop.

Goal: For these reasons we want to shut down dbus-daemon, systemd-journald, systemd-logind and, after startup, also systemd-udevd. In our special case they do not serve any purpose. Unfortunately the basic configuration options do not allow this.

Questions:
Can you provide any guidance?
Will PID 1 (systemd) continue to do its work (first tests were already successful)?
What are the security implications when shutting down systemd-logind?
Is there any mailing list better suited you can point me to?

Thanks for any help you can provide
Michael

Installed:
systemd-networkd-219-19.0.2.el7_2.9.x86_64
systemd-219-19.0.2.el7_2.9.x86_64
systemd-devel-219-19.0.2.el7_2.9.x86_64
systemd-sysv-219-19.0.2.el7_2.9.x86_64
systemd-libs-219-19.0.2.el7_2.9.x86_64
systemd-python-219-19.0.2.el7_2.9.x86_64
systemd-resolved-219-19.0.2.el7_2.9.x86_64

Michael Hebenstreit
Senior Cluster Architect
Intel Corporation, MS: RR1-105/H14
Software and Services Group/DCE
4100 Sara Road
Rio Rancho, NM 87124, UNITED STATES
Tel.: +1 505-794-3144
E-mail: michael.hebenstr...@intel.com
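The per-node "OS noise" this thread is about can be observed crudely from user space by timing a tight loop and counting gaps where the scheduler took the CPU away. A hypothetical probe (the threshold and duration are arbitrary choices, not values from the thread):

```python
# Hypothetical user-space noise probe: spin on the clock and count gaps
# where this process lost the CPU (or was otherwise delayed) for longer
# than a threshold. Pin it to a core with taskset for meaningful numbers.
import time

def measure_noise(duration_s=0.2, threshold_ns=100_000):
    deadline = time.perf_counter_ns() + int(duration_s * 1e9)
    prev = time.perf_counter_ns()
    interruptions = 0
    worst_gap = 0
    while prev < deadline:
        now = time.perf_counter_ns()
        gap = now - prev
        if gap > threshold_ns:
            interruptions += 1
            worst_gap = max(worst_gap, gap)
        prev = now
    return interruptions, worst_gap

interruptions, worst_gap = measure_noise()
print(f"{interruptions} gaps > 100 us, worst {worst_gap / 1e6:.3f} ms")
```

Running one such probe per core, before and after disabling a daemon, would show whether that daemon actually contributes to the jitter rather than arguing from `ps` output alone.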