Bug#929652: systemd: sshd processes are not put into the correct slice/scope

2019-08-05 Thread Michael Biebl
On Mon, 27 May 2019 21:03:03 + Simon Beirnaert wrote:
> Package: systemd
> Version: 232-25+deb9u11
> Severity: normal
> 
> Dear Maintainer,
> 
> 
> Systemd does not seem to consistently put sshd processes under the
> relevant user's slice. I have a user with 10925 sshd processes related
> to its sessions. 7552 of those are put under session scopes in the
> user's slice. 3372 are put under system.slice (ssh.service). The ones
> under the user slice are neatly grouped into session scopes whereas
> the ones under the system slice are not.
> 
> This is making it impossible to put accurate limits on the sshd
> processes or on the user's processes. I can set TasksMax, but that is
> only useful if the tasks are counted correctly.
> 
> If there's any more information needed to debug this, please let me
> know.

Thanks for filing this issue in the upstream bug tracker.
It would be great if you could follow up there. AFAICS Lennart had some
questions.

Regards,
Michael

-- 
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth?





Bug#929652: systemd: sshd processes are not put into the correct slice/scope

2019-06-24 Thread Michal Koutný
Hi.

Failure to create the session is almost certainly the reason why the
processes end up in the wrong slice.

Note that the PID shown in journald is the PID of the process that
connected to journald, obtained via getpeercred. If that process forks
and the fd is passed to another process, the logs will misreport the old
PID. (Given the different PIDs in very short intervals, I don't think
it's the PID of the listening sshd process, but I'm not sure whether
sshd children fork again. I'd recommend also checking neighbouring PIDs
found in the logs.)

To your problem, I guess you're hitting some limits of dbus-daemon. I'd
try increasing the connection limit (which defaults to 256) for your
system dbus-daemon instance.
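Assuming the limit in question is dbus-daemon's per-user connection cap
(the element name below is a guess at what the original snippet referred
to; the stock system.conf ships it at 256), a local override could look
like:

```xml
<!-- /etc/dbus-1/system-local.conf, read by the system bus if present -->
<!DOCTYPE busconfig PUBLIC "-//freedesktop//DTD D-Bus Bus Configuration 1.0//EN"
 "http://www.freedesktop.org/standards/dbus/1.0/busconfig.dtd">
<busconfig>
  <limit name="max_connections_per_user">2048</limit>
</busconfig>
```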

The second suggestion^W workaround would be to tweak
CPUWeight=/CPUShares= of dbus.service and sshd.service, giving
dbus.service ~2 times more than sshd.service. This will likely cause
slightly more latency for clients, but it can lower the strain on
dbus-daemon so that it manages to broker all pam_systemd requests.
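With systemd 232 (cgroup v1) that weighting could be applied roughly
like this; the 2:1 values follow the suggestion above and are otherwise
arbitrary:

```shell
# Give dbus.service about twice the CPU weight of the SSH service.
# CPUShares= is the cgroup-v1 knob used by systemd 232;
# on cgroup v2 the equivalent is CPUWeight=.
systemctl set-property dbus.service CPUShares=2048
systemctl set-property ssh.service CPUShares=1024   # unit is "sshd.service" on some distributions
```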

HTH,
Michal




Bug#929652: systemd: sshd processes are not put into the correct slice/scope

2019-06-24 Thread Simon Beirnaert
I opened a ticket upstream. You can find it here: 
https://github.com/systemd/systemd/issues/12867



Thank you for your help!



Bug#929652: systemd: sshd processes are not put into the correct slice/scope

2019-06-05 Thread Michael Biebl
On 04.06.19 at 14:25, Simon Beirnaert wrote:
> Sorry for the late reply here. Got caught up in other stuff. I did 
> some digging around. Set the loglevel with systemd-analyze to debug 
> and also added the debug flag to pam_systemd.so.
> 
> What I noticed is that on boot, when 5000+ machines try to 
> authenticate at once, pam_systemd seems to fall on its ass and fail 
> due to resource exhaustion. An excerpt from journalctl is attached 
> below.
> 
> I thought this might've been the reason behind sshd processes not 
> being assigned to the correct slice, but the processes for which 
> these log entries are generated are not available on the system 
> anymore, which I take as meaning that the sshd process exited 
> because it couldn't open a session.
> 
> I tried to go about it the other way around and search for logs 
> generated by any sshd process which is under the system.slice. I
> used this oneliner to do so:
> 
> for i in $(systemctl status ssh | grep 'sshd: ' | sed -E 
> 's/[^0-9]*([0-9]*)[^0-9]*/\1/'); do echo "==> logs for process $i"; 
> journalctl | grep '\[$i\]'; done | less
> 
> The search came up empty. None of the currently 7000+ sshd processes
> that aren't in the correct user slice have any logs in journalctl. I
> verified that all logs since boot are still in the journal; it hasn't
> rotated yet.
> 

If you have 5000 users authenticating at once, I can imagine that
libpam-systemd/systemd/dbus run into some limits.
Would you mind testing with v241 (either from backports or by setting up
a buster system) and, if the problem is still reproducible, filing it
upstream at https://github.com/systemd/systemd/issues?

Thanks,
Michael







Bug#929652: systemd: sshd processes are not put into the correct slice/scope

2019-06-04 Thread Simon Beirnaert

Sorry for the late reply here. Got caught up in other stuff. I did
some digging around. Set the loglevel with systemd-analyze to debug
and also added the debug flag to pam_systemd.so.

What I noticed is that on boot, when 5000+ machines try to
authenticate at once, pam_systemd seems to fall on its ass and fail
due to resource exhaustion. An excerpt from journalctl is attached
below.

I thought this might've been the reason behind sshd processes not
being assigned to the correct slice, but the processes for which
these log entries are generated are not available on the system
anymore, which I take as meaning that the sshd process exited
because it couldn't open a session.

I tried to go about it the other way around and search for logs
generated by any sshd process which is under the system.slice. I
used this oneliner to do so:

for i in $(systemctl status ssh | grep 'sshd: ' | sed -E 
's/[^0-9]*([0-9]*)[^0-9]*/\1/'); do echo "==> logs for process $i"; journalctl | grep 
'\[$i\]'; done | less

The search came up empty. None of the currently 7000+ sshd processes
that aren't in the correct user slice have any logs in journalctl. I
verified that all logs since boot are still in the journal; it hasn't
rotated yet.



Jun 04 09:50:17  sshd[8415]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[8691]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[8717]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[28538]: pam_systemd(sshd:session): Failed to 
connect to system bus: Resource temporarily unavailable
Jun 04 09:50:17  sshd[8652]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[8721]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[8716]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[28121]: pam_systemd(sshd:session): Failed to 
connect to system bus: Resource temporarily unavailable
Jun 04 09:50:17  sshd[28551]: pam_systemd(sshd:session): Failed to 
connect to system bus: Resource temporarily unavailable
Jun 04 09:50:17  sshd[8682]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[8396]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[28524]: pam_systemd(sshd:session): Failed to 
connect to system bus: Resource temporarily unavailable
Jun 04 09:50:17  sshd[8719]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[28546]: pam_systemd(sshd:session): Failed to 
connect to system bus: Resource temporarily unavailable
Jun 04 09:50:17  sshd[8690]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[8697]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[7350]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[8522]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[8406]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[28636]: pam_systemd(sshd:session): Failed to 
connect to system bus: Resource temporarily unavailable
Jun 04 09:50:17  sshd[28620]: pam_systemd(sshd:session): Failed to 
connect to system bus: Resource temporarily unavailable
Jun 04 09:50:17  sshd[28669]: pam_systemd(sshd:session): Failed to 
connect to system bus: Resource temporarily unavailable
Jun 04 09:50:17  sshd[8693]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[8723]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[28633]: pam_systemd(sshd:session): Failed to 
connect to system bus: Resource temporarily unavailable
Jun 04 09:50:17  sshd[8401]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[8692]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[28670]: pam_systemd(sshd:session): Failed to 
connect to system bus: Resource temporarily unavailable
Jun 04 09:50:17  sshd[8543]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[8234]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[8654]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[28545]: pam_systemd(sshd:session): Failed to 
connect to system bus: Resource temporarily unavailable
Jun 04 09:50:17  sshd[8791]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[8840]: pam_systemd(sshd:session): Failed to 
create session: 

Bug#929652: systemd: sshd processes are not put into the correct slice/scope

2019-05-27 Thread Michael Biebl
Control: tags -1 + moreinfo

On 27.05.19 at 23:03, Simon Beirnaert wrote:
> Package: systemd
> Version: 232-25+deb9u11
> Severity: normal
> 
> Dear Maintainer,
> 
> 
> Systemd does not seem to consistently put sshd processes under the
> relevant user's slice. I have a user with 10925 sshd processes related
> to its sessions. 7552 of those are put under session scopes in the
> user's slice. 3372 are put under system.slice (ssh.service). The ones
> under the user slice are neatly grouped into session scopes whereas
> the ones under the system slice are not.
> 
> This is making it impossible to put accurate limits on the sshd
> processes or on the user's processes. I can set TasksMax, but that is
> only useful if the tasks are counted correctly.
> 
> If there's any more information needed to debug this, please let me
> know.

You need to have libpam-systemd enabled and PAM support in sshd as well
for processes spawned from an SSH session to be put in the user slice.
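On a stock Debian stretch setup those two prerequisites look roughly
like this (the paths are the standard Debian locations, shown for
reference):

```
# /etc/ssh/sshd_config -- sshd must run the PAM stack:
UsePAM yes

# /etc/pam.d/common-session (pulled in by /etc/pam.d/sshd):
session optional pam_systemd.so
```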

~11000 sshd processes for one user seems unusual. What kind of setup is
this? Are you sure all those sshd processes were going through the PAM
stack?
You might add the "debug" flag to the pam_systemd.so config to get more
information.
What do you get from pam_systemd.so for an exemplary sshd process which
is not put into the user slice? Do you get anything in the journal from
systemd-logind for this process?
You can increase what's being logged by systemd-logind with
"systemd-analyze log-level debug".
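Collected in one place, those debug steps would be (the pam.d path is
the usual Debian one):

```shell
# raise systemd's log level (reverts at reboot):
systemd-analyze log-level debug

# add the debug flag to pam_systemd in /etc/pam.d/common-session:
#   session optional pam_systemd.so debug

# then watch logind while a session is opened:
journalctl -u systemd-logind -f
```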






Bug#929652: systemd: sshd processes are not put into the correct slice/scope

2019-05-27 Thread Simon Beirnaert
Package: systemd
Version: 232-25+deb9u11
Severity: normal

Dear Maintainer,


Systemd does not seem to consistently put sshd processes under the
relevant user's slice. I have a user with 10925 sshd processes related
to its sessions. 7552 of those are put under session scopes in the
user's slice. 3372 are put under system.slice (ssh.service). The ones
under the user slice are neatly grouped into session scopes whereas
the ones under the system slice are not.

This is making it impossible to put accurate limits on the sshd
processes or on the user's processes. I can set TasksMax, but that is
only useful if the tasks are counted correctly.
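For reference, once sessions land in the right place a task limit can be
attached to the user's slice rather than to ssh.service; a sketch with a
hypothetical UID and limit:

```shell
# limit tasks for one user's slice (values are illustrative):
systemctl set-property user-1000.slice TasksMax=8192

# or persistently, via /etc/systemd/system/user-1000.slice.d/limits.conf:
#   [Slice]
#   TasksMax=8192
```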

If there's any more information needed to debug this, please let me
know.


Thanks
Simon

-- Package-specific info:

-- System Information:
Debian Release: 9.5
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.9.0-8-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=UTF-8 (charmap=ANSI_X3.4-1968),
  LANGUAGE=en_US.UTF-8 (charmap=ANSI_X3.4-1968)
(locale: Cannot set LC_CTYPE to default locale: No such file or directory
 locale: Cannot set LC_ALL to default locale: No such file or directory)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages systemd depends on:
ii  adduser 3.115
ii  libacl1 2.2.52-3+b1
ii  libapparmor12.11.0-3+deb9u2
ii  libaudit1   1:2.6.7-2
ii  libblkid1   2.29.2-1+deb9u1
ii  libc6   2.24-11+deb9u3
ii  libcap2 1:2.25-1
ii  libcryptsetup4  2:1.7.3-4
ii  libgcrypt20 1.7.6-2+deb9u3
ii  libgpg-error0   1.26-2
ii  libidn111.33-1
ii  libip4tc0   1.6.0+snapshot20161117-6
ii  libkmod223-2
ii  liblz4-10.0~r131-2+b1
ii  liblzma55.2.2-1.2+b1
ii  libmount1   2.29.2-1+deb9u1
ii  libpam0g1.1.8-3.6
ii  libseccomp2 2.3.1-2.1
ii  libselinux1 2.6-3+b3
ii  libsystemd0 232-25+deb9u11
ii  mount   2.29.2-1+deb9u1
ii  procps  2:3.3.12-3+deb9u1
ii  util-linux  2.29.2-1+deb9u1

Versions of packages systemd recommends:
ii  dbus1.10.26-0+deb9u1
ii  libpam-systemd  232-25+deb9u11

Versions of packages systemd suggests:
pn  policykit-1
pn  systemd-container  
pn  systemd-ui 

Versions of packages systemd is related to:
pn  dracut   
ii  initramfs-tools  0.130
ii  udev 232-25+deb9u11

-- debconf information excluded