Re: [systemd-devel] question on special configuration case

2016-06-08 Thread Greg KH
On Wed, Jun 08, 2016 at 06:43:04AM +, Hebenstreit, Michael wrote:
> > Really?  No journal messages are getting created at all?  No users logging 
> > in/out?  What does strace show on those processes?
> 
> Yes, messages are created - but I'm not interested in them. Maybe a
> user logs in for a 6h job - that's already tracked by the cluster
> software. There are virtually no daemons running, no changes to the
> hardware - so all those daemons are doing is looking after
> themselves. Not really productive.

If messages are created, they have to go somewhere; to think that they
would be "free" is crazy :)

> > So you are hurting all 253 cores because you can't spare 1?
> 
> Situation is a bit more complex. I have 64 physical cores, with 4
> units each for integer operations and 2 floating point units. So
> essentially if I reserve one integer unit for the OS, due to cache
> hierarchies and other oddities, I essentially take down 4 cores. The
> applications typically scale best if they run on a power-of-two
> number of cores. 

You can still run the applications on the "non-reserved" core, it's just
that the kernel can't get access to any of the others.  So you only take
the hit of any potential wakeups and other kernel housekeeping on that
one core.

Again, try it, you might be pleasantly surprised, as your workload is
_exactly_ what that feature was created for.  To ignore it without
testing seems bizarre to me.  If it doesn't work for you, then either
that kernel feature needs to be fixed, or maybe we can just rip it out -
either way, you need to tell the kernel developers about it.

> > Again, that's not the issue, you can't see the time the kernel is using to 
> > do its work, but it is there (interrupts, scheduling, housekeeping,
> > etc.)
> 
> Shouldn't that show up in the time for worker threads?

How do you account for interrupts, I/O, scheduler processing time, etc?
:)

> And I'm not arguing you are wrong. We should minimize that and if
> possible keep all OS work on an extra core. That does not make my
> argument invalid: those daemons are doing nothing more than
> housekeeping for themselves in a very complicated fashion and they are
> wasting resources. 

Again, I think you are wasting more resources than you realize just
because you can't see it :)

And as others have pointed out, turn off watchdogs and you should be
fine from a systemd point of view.

thanks,

greg k-h
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] question on special configuration case

2016-06-08 Thread Simon McVittie
On 08/06/16 03:04, Hebenstreit, Michael wrote:
>> What processes are showing up in your count?  Perhaps it's just a bug that 
>> needs to be fixed.
> /bin/dbus-daemon
> /usr/lib/systemd/systemd-journald
> /usr/lib/systemd/systemd-logind

dbus-daemon will wake up when there are D-Bus messages to be delivered,
or when D-Bus-related data in /usr/share/dbus-1/ changes. If there is
nothing emitting D-Bus messages then it shouldn't normally wake up.

In dbus >= 1.10 you can run "dbus-monitor --system" as root, and you'll
see any D-Bus message that goes past. Unfortunately this use-case for
monitoring didn't really work in previous versions.

If you want it to stay off the majority of your CPU cores, Greg's
recommendation to set up CPU affinity seems wise. dbus-daemon is
single-threaded (or 2-threaded if SELinux and the audit subsystem are
active), so it will normally only run on one CPU at a time anyway.
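
A hypothetical sketch of that affinity setup as a systemd drop-in (the
file name is illustrative; CPUAffinity= is the standard systemd.exec
setting, and putting CPUAffinity= in /etc/systemd/system.conf instead
would pin PID 1 and, by default, everything it forks):

# /etc/systemd/system/dbus.service.d/affinity.conf
[Service]
CPUAffinity=0

systemctl daemon-reload
systemctl restart dbus.service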

-- 
Simon McVittie
Collabora Ltd. 



Re: [systemd-devel] question on special configuration case

2016-06-08 Thread Jóhann B. Guðmundsson

On 06/08/2016 06:51 AM, Hebenstreit, Michael wrote:

> Thanks for this and the other suggestions!
>
> So for starters we'll disable logind and dbus, increase WatchdogSec= 
> and see where the footprint is - before disabling journald in a next 
> step if necessary.

You cannot disable the journal, but you can reduce it; the following 
should give the least amount of logging in all potential scenarios and 
usage ;)


Just create "/etc/systemd/journald.conf.d/10-hpc-tweaks.conf"which contains

[Journal]
Storage=none
MaxLevelStore=emerg
MaxLevelSyslog=emerg
MaxLevelKMsg=emerg
MaxLevelConsole=emerg
MaxLevelWall=emerg
TTYPath=/dev/null

Then restart the journal ( systemctl restart systemd-journald )
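
A quick, illustrative way to check the drop-in took effect - with
Storage=none and MaxLevelStore=emerg the test message should not show up:

logger "hpc journald tweak test"
journalctl -n 5 --no-pager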

JBG


Re: [systemd-devel] question on special configuration case

2016-06-08 Thread Lennart Poettering
On Tue, 07.06.16 21:00, Greg KH (gre...@linuxfoundation.org) wrote:

> On Wed, Jun 08, 2016 at 02:04:48AM +, Hebenstreit, Michael wrote:
> > > What processes are showing up in your count?  Perhaps it's just a
> > > bug that needs to be fixed.
> > /bin/dbus-daemon
> > /usr/lib/systemd/systemd-journald
> > /usr/lib/systemd/systemd-logind
> > 
> > I understand from the previous mails those are necessary to make
> > systemd work - but here they are doing nothing more than talking to
> > each other!
> 
> Really?  No journal messages are getting created at all?  No users
> logging in/out?  What does strace show on those processes?

It's the "watchdog" logic most likely. i.e. systemd has a per-service
setting WatchdogSec=. If that's set the daemons have to ping back PID
1 in regular intervals, or otherwise are assumed hanging.

On top of that PID 1 actually talks to hw watchdogs, if there are any,
by default.

If both of those are turned off, then there should really be zero
wakeups...
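
A hypothetical sketch of turning both off (file names illustrative; which
services carry WatchdogSec= varies by distro):

# per-service watchdog, e.g.
# /etc/systemd/system/systemd-logind.service.d/no-watchdog.conf
# (likewise for systemd-journald.service, systemd-udevd.service, ...)
[Service]
WatchdogSec=0

# PID 1's hardware watchdog, in /etc/systemd/system.conf (0 = disabled)
[Manager]
RuntimeWatchdogSec=0
ShutdownWatchdogSec=0

followed by "systemctl daemon-reload" and a restart of the affected
services.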

Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] question on special configuration case

2016-06-08 Thread Lennart Poettering
On Tue, 07.06.16 22:17, Hebenstreit, Michael (michael.hebenstr...@intel.com) 
wrote:

> Thanks for the answers 
> 
> > Well, there's no tracking of sessions anymore, i.e. polkit and all that 
> > stuff won't work anymore reasonably, and everything else that involves 
> > anything graphical and so on.
> 
> Nothing listed is in any way used on our system, as already laid out
> in the original mail. Your answer implies though that there is no
> real security issue (like sshd not working or being exploitable to
> gain access to other accounts) - is this correct?

Yes, that's correct.

> > If I were you I'd actually look at what wakes up the system IRL
> > instead of just trying to blanket remove everything.  Can you
> > clarify how dbus-daemon, systemd-journald, systemd-logind,
> > systemd-udevd are causing issues/impacting the above setup,
> > something more than "I don't think we need it hence we want to
> > disable it".
> 
> The approach "if you do not need it, do not run it" works for this
> case pretty well. Systemd daemons take up cycles without doing
> anything useful for us. We do not do any logging, we do not change
> the hardware during runtime - so no matter how little time those
> units consume, it impacts scalability. As explained, this is not
> acceptable in our environment.

Well, they really shouldn't take up cycles when idle, except for the
watchdog stuff, which is easy to disable... It sounds like a much
better idea to track this down and fix it in each individual case.

Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] question on special configuration case

2016-06-08 Thread Hebenstreit, Michael
Thanks for this and the other suggestions!

So for starters we'll disable logind and dbus, increase WatchdogSec= and see 
where the footprint is - before disabling journald in a next step if necessary.

Regards
Michael


From: Mantas Mikulėnas [mailto:graw...@gmail.com]
Sent: Wednesday, June 08, 2016 11:35 AM
To: Hebenstreit, Michael
Cc: Systemd
Subject: Re: [systemd-devel] question on special configuration case

This sounds like you could start by unsetting WatchdogSec= for those daemons. 
Other than the watchdog, they shouldn't be using any CPU unless explicitly 
contacted.
On Wed, Jun 8, 2016, 02:50 Hebenstreit, Michael 
<michael.hebenstr...@intel.com> wrote:
The base system is actually pretty large (currently 1200 packages) - I hate 
that myself. Still, performance-wise the packages are not the issue. The SSDs 
used can easily handle that, and library loads are only happening once at 
startup (where the difference can be measured, but if the runtime is 24h, a 
startup time of 1s is not an issue). Kernel is tweaked, but those changes are 
relatively small.

The single biggest problem is OS noise, i.e. every cycle that the CPU(s) 
are working on anything but the application. This is caused by a combination 
of "large number of nodes" and "tightly coupled job processes".

Our current (RH6) based system runs with a minimal number of daemons, none of 
them taking up any CPU time unless they are used. Systemd processes are not so 
well behaved. After a few hours of running they are already at a few seconds. 
On a single system - or systems working independently, like server farms - that 
is not an issue. On our systems each second lost is multiplied by the number of 
nodes in the job (let's say 200, but it could also be up to 1 or more on 
large installations) due to tight coupling. If 3 daemons use 1s a day each (and 
this is realistic on Xeon Phi Knights Landing systems), that's slowing down the 
performance by almost 1% (3 * 200 / 86400 = 0.7% to be exact). And - we do not 
gain anything from those daemons after initial startup!

My worst experience with such issues was on a cluster that lost 20% application 
performance due to a badly configured crond daemon. Now I do not expect systemd 
to have such a negative impact, but even 1%, or even 0.5% of expected loss is 
too much in our case.



Re: [systemd-devel] question on special configuration case

2016-06-08 Thread Hebenstreit, Michael
> Really?  No journal messages are getting created at all?  No users logging 
> in/out?  What does strace show on those processes?

Yes, messages are created - but I'm not interested in them. Maybe a user logs 
in for a 6h job - that's already tracked by the cluster software. There are 
virtually no daemons running, no changes to the hardware - so all those daemons 
are doing is looking after themselves. Not really productive.

> So you are hurting all 253 cores because you can't spare 1?

Situation is a bit more complex. I have 64 physical cores, with 4 units each 
for integer operations and 2 floating point units. So essentially if I reserve 
one integer unit for the OS, due to cache hierarchies and other oddities, I 
essentially take down 4 cores. The applications typically scale best if they 
run on a power-of-two number of cores. 

> Again, that's not the issue, you can't see the time the kernel is using to do 
> its work, but it is there (interrupts, scheduling, housekeeping,
> etc.)

Shouldn't that show up in the time for worker threads? And I'm not arguing you 
are wrong. We should minimize that and if possible keep all OS work on an extra 
core. That does not make my argument invalid: those daemons are doing nothing 
more than housekeeping for themselves in a very complicated fashion and they 
are wasting resources. 



-Original Message-
From: Greg KH [mailto:gre...@linuxfoundation.org] 
Sent: Wednesday, June 08, 2016 11:01 AM
To: Hebenstreit, Michael
Cc: Jóhann B. Guðmundsson; Lennart Poettering; 
systemd-devel@lists.freedesktop.org
Subject: Re: [systemd-devel] question on special configuration case

On Wed, Jun 08, 2016 at 02:04:48AM +, Hebenstreit, Michael wrote:
> > What processes are showing up in your count?  Perhaps it's just a 
> > bug that needs to be fixed.
> /bin/dbus-daemon
> /usr/lib/systemd/systemd-journald
> /usr/lib/systemd/systemd-logind
> 
> I understand from the previous mails those are necessary to make 
> systemd work - but here they are doing nothing more than talking to 
> each other!

Really?  No journal messages are getting created at all?  No users logging 
in/out?  What does strace show on those processes?

> > That's what "most" other system designers in your situation do :)
> Unfortunately I cannot reserve a CPU for the OS - I'd like to, but the app 
> developers insist on using all 254 cores available

So you are hurting all 253 cores because you can't spare 1?  If you do the math 
I think you will find you will get increased throughput.  But what do I know... 
:)

> > Your kernel is eating more CPU time than those 1s numbers, why you 
> > aren't complaining about that seems strange to me :)
> I also checked the kernel - last time I looked, on RH6 all kernel threads 
> taking up clock ticks were actually doing work ^^ No time yet to do 
> the same on the RH7 kernel

Again, that's not the issue, you can't see the time the kernel is using to do 
its work, but it is there (interrupts, scheduling, housekeeping,
etc.)  So get it out of the way entirely and see how much faster your 
application runs without it even present on those CPUs, if you really have 
CPU-bound processes.  That's what the feature was made for - people in your 
situation; ignoring it and trying to go after something else seems very strange 
to me.

greg k-h


Re: [systemd-devel] question on special configuration case

2016-06-07 Thread Mantas Mikulėnas
This sounds like you could start by unsetting WatchdogSec= for those
daemons. Other than the watchdog, they shouldn't be using any CPU unless
explicitly contacted.
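
A hypothetical check of which of the suspect services actually carry a
watchdog interval (the runtime property is WatchdogUSec; 0 means the
watchdog is disabled):

for u in dbus.service systemd-journald.service systemd-logind.service; do
    systemctl show -p WatchdogUSec "$u"
done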

On Wed, Jun 8, 2016, 02:50 Hebenstreit, Michael <
michael.hebenstr...@intel.com> wrote:

> The base system is actually pretty large (currently 1200 packages) - I
> hate that myself. Still, performance-wise the packages are not the issue.
> The SSDs used can easily handle that, and library loads are only happening
> once at startup (where the difference can be measured, but if the runtime
> is 24h, a startup time of 1s is not an issue). Kernel is tweaked, but those
> changes are relatively small.
>
> The single biggest problem is OS noise, i.e. every cycle that the
> CPU(s) are working on anything but the application. This is caused by a
> combination of "large number of nodes" and "tightly coupled job processes".
>
> Our current (RH6) based system runs with a minimal number of daemons, none
> of them taking up any CPU time unless they are used. Systemd processes are
> not so well behaved. After a few hours of running they are already at a few
> seconds. On a single system - or systems working independently, like server
> farms - that is not an issue. On our systems each second lost is multiplied
> by the number of nodes in the job (let's say 200, but it could also be up
> to 1 or more on large installations) due to tight coupling. If 3 daemons
> use 1s a day each (and this is realistic on Xeon Phi Knights Landing
> systems), that's slowing down the performance by almost 1% (3 * 200 / 86400
> = 0.7% to be exact). And - we do not gain anything from those daemons after
> initial startup!
>
> My worst experience with such issues was on a cluster that lost 20%
> application performance due to a badly configured crond daemon. Now I do not
> expect systemd to have such a negative impact, but even 1%, or even 0.5% of
> expected loss is too much in our case.
>
>
> -Original Message-
> From: Jóhann B. Guðmundsson [mailto:johan...@gmail.com]
> Sent: Wednesday, June 08, 2016 6:10 AM
> To: Hebenstreit, Michael; Lennart Poettering
> Cc: systemd-devel@lists.freedesktop.org
> Subject: Re: [systemd-devel] question on special configuration case
>
> On 06/07/2016 10:17 PM, Hebenstreit, Michael wrote:
>
> > I understand this usage model cannot be compared to laptops or web
> > servers. But basically you are saying systemd is not usable for our
> > High Performance Computing usage case and I might be better off
> > replacing it with SysV init. I was hoping for some simpler solution,
> > but if it's not possible then that's life. Will certainly make an
> > interesting topic at HPC conferences :P
>
> I personally would be interested in comparing your legacy sysv init setup to
> a systemd one, since systemd is widely deployed on embedded devices with
> minimal builds ( systemd, udevd and journald ) where systemd footprint and
> resource usage have been significantly reduced.
>
> Given that I have pretty much crawled through the entire mud bath that makes
> up the core/baseOS layer in Fedora ( which RHEL and its clones derive from )
> when I was working on integrating systemd in the distribution, I'm also
> interested in how you plan on making a minimal targeted base image which
> installs and uses just what you need from that ( dependency ) mess without
> having to rebuild those components first. ( I would think systemd
> "tweaking" came after you had solved that problem first, along with
> rebuilding the kernel, if your plan is to use just what you need. )
>
> JBG


Re: [systemd-devel] question on special configuration case

2016-06-07 Thread Andrew Thompson
On Tue, Jun 7, 2016 at 7:04 PM, Hebenstreit, Michael
 wrote:
>> That what "most" other system designers in your situation do :)
> Unfortunately I cannot reserve a CPU for OS - I'd like to, but the app 
> developers insist to use all 254 cores available

Tough situation. Use Forth, it's the only way, my friend.


Re: [systemd-devel] question on special configuration case

2016-06-07 Thread Greg KH
On Wed, Jun 08, 2016 at 02:04:48AM +, Hebenstreit, Michael wrote:
> > What processes are showing up in your count?  Perhaps it's just a
> > bug that needs to be fixed.
> /bin/dbus-daemon
> /usr/lib/systemd/systemd-journald
> /usr/lib/systemd/systemd-logind
> 
> I understand from the previous mails those are necessary to make
> systemd work - but here they are doing nothing more than talking to
> each other!

Really?  No journal messages are getting created at all?  No users
logging in/out?  What does strace show on those processes?
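
Two illustrative ways to answer that: attach strace for a while and
interrupt it with Ctrl-C to get a syscall summary, or sample per-process
CPU usage with sysstat's pidstat:

strace -c -p $(pidof systemd-logind)
pidstat -u -p $(pidof systemd-journald) 10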

> > That's what "most" other system designers in your situation do :)
> Unfortunately I cannot reserve a CPU for the OS - I'd like to, but the app
> developers insist on using all 254 cores available

So you are hurting all 253 cores because you can't spare 1?  If you do
the math I think you will find you will get increased throughput.  But
what do I know... :)

> > Your kernel is eating more CPU time than those 1s numbers, why you
> > aren't complaining about that seems strange to me :)
> I also checked the kernel - last time I looked, on RH6 all kernel threads
> taking up clock ticks were actually doing work ^^ No time yet to do
> the same on the RH7 kernel

Again, that's not the issue, you can't see the time the kernel is using
to do its work, but it is there (interrupts, scheduling, housekeeping,
etc.)  So get it out of the way entirely and see how much faster your
application runs without it even present on those CPUs, if you really
have CPU-bound processes.  That's what the feature was made for - people
in your situation; ignoring it and trying to go after something else seems
very strange to me.

greg k-h


Re: [systemd-devel] question on special configuration case

2016-06-07 Thread Hebenstreit, Michael
> That's not the issue here though.
Nope, but an example of how bad things can get.


> What processes are showing up in your count?  Perhaps it's just a bug that 
> needs to be fixed.
/bin/dbus-daemon
/usr/lib/systemd/systemd-journald
/usr/lib/systemd/systemd-logind

I understand from the previous mails those are necessary to make systemd work - 
but here they are doing nothing more than talking to each other!

> That's what "most" other system designers in your situation do :)
Unfortunately I cannot reserve a CPU for the OS - I'd like to, but the app 
developers insist on using all 254 cores available

> Your kernel is eating more CPU time than those 1s numbers, why you aren't 
> complaining about that seems strange to me :)
I also checked the kernel - last time I looked, on RH6 all kernel threads 
taking up clock ticks were actually doing work ^^
No time yet to do the same on the RH7 kernel


-Original Message-
From: Greg KH [mailto:gre...@linuxfoundation.org] 
Sent: Wednesday, June 08, 2016 8:54 AM
To: Hebenstreit, Michael
Cc: Jóhann B. Guðmundsson; Lennart Poettering; 
systemd-devel@lists.freedesktop.org
Subject: Re: [systemd-devel] question on special configuration case

On Tue, Jun 07, 2016 at 11:50:36PM +, Hebenstreit, Michael wrote:
> The base system is actually pretty large (currently 1200 packages) - I 
> hate that myself. Still, performance-wise the packages are not the 
> issue. The SSDs used can easily handle that, and library loads are 
> only happening once at startup (where the difference can be measured, 
> but if the runtime is 24h, a startup time of 1s is not an issue). Kernel 
> is tweaked, but those changes are relatively small.
> 
> The single biggest problem is OS noise, i.e. every cycle that 
> the CPU(s) are working on anything but the application. This is caused 
> by a combination of "large number of nodes" and "tightly coupled job 
> processes".

Then bind your applications to the CPUs and don't let anything else run on 
them, including the kernel.  That way you will not get any jitter or latencies 
and can use the CPUs to their max, without having to worry about anything.  
Leave one CPU alone so the kernel is able to manage its housekeeping tasks 
(you seem to be ignoring that issue when looking at systemd, which is odd to me 
as it's more noise than anything else), and also let everything else run there 
as well.

That's what "most" other system designers in your situation do :)

> Our current (RH6) based system runs with a minimal number of daemons, 
> none of them taking up any CPU time unless they are used. Systemd 
> processes are not so well behaved. After a few hours of running they are 
> already at a few seconds.

What processes are showing up in your count?  Perhaps it's just a bug that 
needs to be fixed.

> On a single system - or systems working independently, like server farms
> - that is not an issue. On our systems each second lost is multiplied 
> by the number of nodes in the job (let's say 200, but it could also 
> be up to 1 or more on large installations) due to tight coupling.
> If 3 daemons use 1s a day each (and this is realistic on Xeon Phi 
> Knights Landing systems), that's slowing down the performance by 
> almost 1% (3 * 200 / 86400 = 0.7% to be exact). And - we do not gain 
> anything from those daemons after initial startup!

Your kernel is eating more CPU time than those 1s numbers, why you aren't 
complaining about that seems strange to me :)

> My worst experience with such issues was on a cluster that lost 20% 
> application performance due to a badly configured crond daemon.

That's not the issue here though.

Again, what tasks are causing cpu time for "no good reason", let's see if we 
can just fix them.

thanks,

greg k-h


Re: [systemd-devel] question on special configuration case

2016-06-07 Thread Greg KH
On Tue, Jun 07, 2016 at 11:50:36PM +, Hebenstreit, Michael wrote:
> The base system is actually pretty large (currently 1200 packages) - I
> hate that myself. Still, performance-wise the packages are not the
> issue. The SSDs used can easily handle that, and library loads are
> only happening once at startup (where the difference can be measured,
> but if the runtime is 24h, a startup time of 1s is not an issue). Kernel
> is tweaked, but those changes are relatively small.
> 
> The single biggest problem is OS noise, i.e. every cycle that
> the CPU(s) are working on anything but the application. This is caused
> by a combination of "large number of nodes" and "tightly coupled job
> processes". 

Then bind your applications to the CPUs and don't let anything else run
on them, including the kernel.  That way you will not get any jitter or
latencies and can use the CPUs to their max, without having to worry
about anything.  Leave one CPU alone so the kernel is able to
manage its housekeeping tasks (you seem to be ignoring that issue when
looking at systemd, which is odd to me as it's more noise than anything
else), and also let everything else run there as well.

That's what "most" other system designers in your situation do :)
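
The userspace half of that advice is plain CPU pinning; a hypothetical
launch leaving core 0 to the kernel and the daemons (the kernel half is
the isolcpus=/nohz_full= boot parameters sketched elsewhere in this
thread):

taskset -c 1-63 ./hpc_app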

> Our current (RH6) based system runs with a minimal number of daemons,
> none of them taking up any CPU time unless they are used. Systemd
> processes are not so well behaved. After a few hours of running they are
> already at a few seconds.

What processes are showing up in your count?  Perhaps it's just a bug
that needs to be fixed.

> On a single system - or systems working independently, like server farms
> - that is not an issue. On our systems each second lost is multiplied
> by the number of nodes in the job (let's say 200, but it could also
> be up to 1 or more on large installations) due to tight coupling.
> If 3 daemons use 1s a day each (and this is realistic on Xeon Phi
> Knights Landing systems), that's slowing down the performance by
> almost 1% (3 * 200 / 86400 = 0.7% to be exact). And - we do not gain
> anything from those daemons after initial startup! 

Your kernel is eating more CPU time than those 1s numbers, why you
aren't complaining about that seems strange to me :)

> My worst experience with such issues was on a cluster that lost 20%
> application performance due to a badly configured crond daemon.

That's not the issue here though.

Again, what tasks are causing cpu time for "no good reason", let's see
if we can just fix them.

thanks,

greg k-h


Re: [systemd-devel] question on special configuration case

2016-06-07 Thread Hebenstreit, Michael
The base system is actually pretty large (currently 1200 packages) - I hate 
that myself. Still, performance-wise the packages are not the issue. The SSDs 
used can easily handle that, and library loads are only happening once at 
startup (where the difference can be measured, but if the runtime is 24h, a 
startup time of 1s is not an issue). Kernel is tweaked, but those changes are 
relatively small.

The single biggest problem is OS noise, i.e. every cycle that the CPU(s) 
are working on anything but the application. This is caused by a combination 
of "large number of nodes" and "tightly coupled job processes". 

Our current (RH6) based system runs with a minimal number of daemons, none of 
them taking up any CPU time unless they are used. Systemd processes are not so 
well behaved. After a few hours of running they are already at a few seconds. 
On a single system - or systems working independently, like server farms - that 
is not an issue. On our systems each second lost is multiplied by the number of 
nodes in the job (let's say 200, but it could also be up to 1 or more on 
large installations) due to tight coupling. If 3 daemons use 1s a day each (and 
this is realistic on Xeon Phi Knights Landing systems), that's slowing down the 
performance by almost 1% (3 * 200 / 86400 = 0.7% to be exact). And - we do not 
gain anything from those daemons after initial startup! 

My worst experience with such issues was on a cluster that lost 20% application 
performance due to a badly configured crond daemon. Now I do not expect systemd 
to have such a negative impact, but even 1%, or even 0.5% of expected loss is 
too much in our case. 


-Original Message-
From: Jóhann B. Guðmundsson [mailto:johan...@gmail.com] 
Sent: Wednesday, June 08, 2016 6:10 AM
To: Hebenstreit, Michael; Lennart Poettering
Cc: systemd-devel@lists.freedesktop.org
Subject: Re: [systemd-devel] question on special configuration case

On 06/07/2016 10:17 PM, Hebenstreit, Michael wrote:

> I understand this usage model cannot be compared to laptops or web 
> servers. But basically you are saying systemd is not usable for our 
> High Performance Computing usage case and I might be better off 
> replacing it with SysV init. I was hoping for some simpler solution, 
> but if it's not possible then that's life. Will certainly make an 
> interesting topic at HPC conferences :P

I personally would be interested in comparing your legacy sysv init setup to a 
systemd one, since systemd is widely deployed on embedded devices with minimal 
builds ( systemd, udevd and journald ) where systemd footprint and resource 
usage have been significantly reduced.

Given that I have pretty much crawled through the entire mud bath that makes up 
the core/baseOS layer in Fedora ( which RHEL and its clones derive from ) when 
I was working on integrating systemd in the distribution, I'm also interested 
in how you plan on making a minimal targeted base image which installs and uses 
just what you need from that ( dependency ) mess without having to rebuild 
those components first. ( I would think systemd "tweaking" came after you had 
solved that problem first, along with rebuilding the kernel, if your plan is to 
use just what you need. )

JBG


Re: [systemd-devel] question on special configuration case

2016-06-07 Thread Jóhann B. Guðmundsson

On 06/07/2016 10:17 PM, Hebenstreit, Michael wrote:

> I understand this usage model cannot be compared to laptops or web servers. 
> But basically you are saying systemd is not usable for our High Performance 
> Computing usage case and I might be better off replacing it with SysV init. 
> I was hoping for some simpler solution, but if it's not possible then that's 
> life. Will certainly make an interesting topic at HPC conferences :P


I personally would be interested in comparing your legacy sysv init setup 
to a systemd one, since systemd is widely deployed on embedded devices 
with minimal builds ( systemd, udevd and journald ) where systemd 
footprint and resource usage have been significantly reduced.

Given that I have pretty much crawled through the entire mud bath that makes 
up the core/baseOS layer in Fedora ( which RHEL and its clones derive 
from ) when I was working on integrating systemd in the distribution, I'm 
also interested in how you plan on making a minimal targeted base image 
which installs and uses just what you need from that ( dependency ) mess 
without having to rebuild those components first. ( I would think 
systemd "tweaking" came after you had solved that problem first, along 
with rebuilding the kernel, if your plan is to use just what you need. )


JBG


Re: [systemd-devel] question on special configuration case

2016-06-07 Thread Hebenstreit, Michael
Thanks for the answers 

> Well, there's no tracking of sessions anymore, i.e. polkit and all that stuff 
> won't work anymore reasonably, and everything else that involves anything 
> graphical and so on.

Nothing listed is in any way used on our system, as already laid out in the 
original mail. Your answer implies though that there is no real security issue 
(like sshd not working or being exploitable to gain access to other 
accounts) - is this correct?


> If I were you I'd actually look at what wakes up the system IRL instead of 
> just trying to blanket remove everything.
> Can you clarify how dbus-daemon, systemd-journald, systemd-logind, 
> systemd-udevd are causing issues/impacting the above setup, something more 
> than "I don't think we need it hence we want to disable it".

The approach "if you do not need it, do not run it" works for this case pretty 
well. Systemd daemons take up cycles without doing anything useful for us. We 
do not do any logging, we do not change the hardware during runtime - so no 
matter how little time those units consume, it impacts scalability. As 
explained, this is not acceptable in our environment. 


> If you need to perform benchmarks on Red Hat and its derivatives/clones then 
> disabling this would skew the benchmark output on those, would it not?
Not if you have some "easy" steps to duplicate the environment.


> If you need an absolute bare minimum systemd [¹] then you need to 
> create/maintain your entire distribution for that
I would not call it a distribution - but yes, building/configuring a new OS out 
of the basic components supplied by RH/CentOS is similar to a new distro.


I understand this usage model cannot be compared to laptops or web servers. But 
basically you are saying systemd is not usable for our High Performance 
Computing usage case and I might be better off replacing it with SysV init. I 
was hoping for some simpler solution, but if it's not possible then that's 
life. Will certainly make an interesting topic at HPC conferences :P

Regards
Michael

-Original Message-
From: Lennart Poettering [mailto:lenn...@poettering.net] 
Sent: Tuesday, June 07, 2016 11:23 PM
To: Hebenstreit, Michael
Cc: systemd-devel@lists.freedesktop.org
Subject: Re: [systemd-devel] question on special configuration case

On Tue, 07.06.16 15:13, Hebenstreit, Michael (michael.hebenstr...@intel.com) 
wrote:

> Sorry for directing this question here, but I did not find any mailing 
> list that would be a better fit.
> 
> Problem: I'm running an HPC benchmarking cluster. We are evaluating
> RH7/CentOS7/OL7 and have a problem with system noise generated by the 
> systemd components (v 219-19.0.2, see below).
> 
> Background: All cores of the CPU (up to 288) are utilized 99.99% by 
> the application. Because of the tight coupling node to node (of 
> programs running on 200+ nodes) every time an OS process wakes up this 
> automatically delays EVERY process on EVERY node. As those small 
> interruptions are not synchronized over the cluster, the overall 
> effect on the effective performance is "time of the single delay" 
> times "number of nodes in the job". Therefore we need to keep the OS 
> of our systems are stripped down to an absolute bare minimum.
> 
> a) we have no use for any type of logging. The only log we have is
>    kernel dmesg
> b) there is only a single user at any time on the system (logging in via ssh).
> c) The only daemons running are those necessary for NFS, ntp and sshd. 
> d) we do not run Gnome or similar desktop.
> 
> Goal: For these reasons we want to shut down dbus-daemon, 
> systemd-journald, systemd-logind and after startup also systemd-udevd. 
> In our special case they do not serve any purpose. Unfortunately the 
> basic configuration options do not allow this.

This is simply not supported on systemd. Systems without journald and udevd are 
explicitly not supported, and systems without dbus-daemon are only really 
supported for early boot schemes.

You can of course ignore what we support and what not, but then you really 
should know what you are doing, and you are basically on your own.

Note that you can connect the journal to kmsg, if you like, and turn off local 
storage, via ForwardToKMsg= and Storage= in journald.conf.
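
A minimal sketch of that suggestion, e.g. as
/etc/systemd/journald.conf.d/kmsg-only.conf (file name illustrative;
messages stay visible in dmesg, nothing is stored locally):

[Journal]
Storage=none
ForwardToKMsg=yes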

> Questions: 
>   Can you provide any guidance?
>   Will PID 1 (systemd) continue to do its work (first tests were
>   already successful)?

No, it will not. The only daemon of those listed you can realistically do 
without is logind, and if you do that, then you basically roll your own distro.

>   What are security implications when shutting down
>   systemd-logind?

Well, there's no tracking of sessions anymore, i.e. polkit and all that stuff 
won't work anymore reasonably, and everything else that involves anything 
graphical and so on.

Re: [systemd-devel] question on special configuration case

2016-06-07 Thread Jóhann B. Guðmundsson

On 06/07/2016 03:13 PM, Hebenstreit, Michael wrote:

> we need to keep the OS of our systems stripped down to an absolute bare 
> minimum.


If you need an absolute bare minimum systemd [¹] then you need to 
create/maintain your entire distribution for that ( for example you 
would build systemd with just what you need and use systemd's 
built-in networkd instead of NetworkManager, timesyncd instead of ntp, 
etc., change sshd to be socket-activated, only install components 
necessary for the application to run, kernel mods and so forth ).
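
The sshd socket-activation piece, for instance, is just the following
(assuming the distro ships an sshd.socket unit, as RHEL7 does; note it
spawns one service instance per connection):

systemctl disable sshd.service
systemctl enable sshd.socket
systemctl start sshd.socket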


If you need to perform benchmarks on Red Hat and its derivatives/clones 
then disabling this would skew the benchmark output on those, would it not?


Can you clarify how dbus-daemon, systemd-journald, systemd-logind, 
systemd-udevd are causing issues/impacting the above setup, something 
more than "I don't think we need it hence we want to disable it".


JBG

1. https://freedesktop.org/wiki/Software/systemd/MinimalBuilds/


[systemd-devel] question on special configuration case

2016-06-07 Thread Hebenstreit, Michael
Sorry for directing this question here, but I did not find any mailing list 
that would be a better fit.

Problem: I'm running an HPC benchmarking cluster. We are evaluating 
RH7/CentOS7/OL7 and have a problem with system noise generated by the systemd 
components (v 219-19.0.2, see below). 

Background: All cores of the CPU (up to 288) are utilized 99.99% by the 
application. Because of the tight coupling node to node (of programs running on 
200+ nodes) every time an OS process wakes up this automatically delays EVERY 
process on EVERY node. As those small interruptions are not synchronized over 
the cluster, the overall effect on the effective performance is "time of the 
single delay" times "number of nodes in the job". Therefore we need to keep the 
OS of our systems are stripped down to an absolute bare minimum.

a) we have no use for any type of logging. The only log we have is kernel dmesg
b) there is only a single user at any time on the system (logging in via ssh).
c) The only daemons running are those necessary for NFS, ntp and sshd. 
d) we do not run Gnome or similar desktop.

Goal: For these reasons we want to shut down dbus-daemon, systemd-journald, 
systemd-logind and after startup also systemd-udevd. In our special case they 
do not serve any purpose. Unfortunately the basic configuration options do not 
allow this.

Questions: 
Can you provide any guidance?
Will PID 1 (systemd) continue to do its work (first tests were already 
successful)?
What are security implications when shutting down systemd-logind?
Is there any mailing list better suited that you can point me to?


Thanks for any help you can provide
Michael





Installed:
systemd-networkd-219-19.0.2.el7_2.9.x86_64
systemd-219-19.0.2.el7_2.9.x86_64
systemd-devel-219-19.0.2.el7_2.9.x86_64
systemd-sysv-219-19.0.2.el7_2.9.x86_64
systemd-libs-219-19.0.2.el7_2.9.x86_64
systemd-python-219-19.0.2.el7_2.9.x86_64
systemd-resolved-219-19.0.2.el7_2.9.x86_64


Michael Hebenstreit Senior Cluster Architect
Intel Corporation, MS: RR1-105/H14  Software and Services Group/DCE
4100 Sara Road  Tel.:   +1 505-794-3144 
Rio Rancho, NM 87124
UNITED STATES   E-mail: michael.hebenstr...@intel.com
