Bug#929652: systemd: sshd processes are not put into the correct slice/scope

2019-06-24 Thread Simon Beirnaert
I opened a ticket upstream. You can find it here: 
https://github.com/systemd/systemd/issues/12867



Thank you for your help!



Bug#929652: systemd: sshd processes are not put into the correct slice/scope

2019-06-04 Thread Simon Beirnaert

Sorry for the late reply here. Got caught up in other stuff. I did
some digging around. Set the loglevel with systemd-analyze to debug
and also added the debug flag to pam_systemd.so.

What I noticed is that on boot, when 5000+ machines try to
authenticate at once, pam_systemd seems to fall on its ass and fail
due to resource exhaustion. An excerpt from journalctl is attached
below.

I thought this might've been the reason behind sshd processes not
being assigned to the correct slice, but the processes for which
these log entries are generated are not available on the system
anymore, which I take as meaning that the sshd process exited
because it couldn't open a session.

I tried to go about it the other way around and search for logs
generated by any sshd process which is under the system.slice. I
used this oneliner to do so:

for i in $(systemctl status ssh | grep 'sshd: ' | sed -E 
's/[^0-9]*([0-9]*)[^0-9]*/\1/'); do echo "==> logs for process $i"; journalctl | grep 
'\[$i\]'; done | less

The search came up empty. For none of the currently 7000+ sshd
processes which aren't in the correct user slice, there are any logs
in journalctl. I verified and all logs since boot time are currently
still in the journal, it hasn't rotated yet.



Jun 04 09:50:17  sshd[8415]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[8691]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[8717]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[28538]: pam_systemd(sshd:session): Failed to 
connect to system bus: Resource temporarily unavailable
Jun 04 09:50:17  sshd[8652]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[8721]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[8716]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[28121]: pam_systemd(sshd:session): Failed to 
connect to system bus: Resource temporarily unavailable
Jun 04 09:50:17  sshd[28551]: pam_systemd(sshd:session): Failed to 
connect to system bus: Resource temporarily unavailable
Jun 04 09:50:17  sshd[8682]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[8396]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[28524]: pam_systemd(sshd:session): Failed to 
connect to system bus: Resource temporarily unavailable
Jun 04 09:50:17  sshd[8719]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[28546]: pam_systemd(sshd:session): Failed to 
connect to system bus: Resource temporarily unavailable
Jun 04 09:50:17  sshd[8690]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[8697]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[7350]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[8522]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[8406]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[28636]: pam_systemd(sshd:session): Failed to 
connect to system bus: Resource temporarily unavailable
Jun 04 09:50:17  sshd[28620]: pam_systemd(sshd:session): Failed to 
connect to system bus: Resource temporarily unavailable
Jun 04 09:50:17  sshd[28669]: pam_systemd(sshd:session): Failed to 
connect to system bus: Resource temporarily unavailable
Jun 04 09:50:17  sshd[8693]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[8723]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[28633]: pam_systemd(sshd:session): Failed to 
connect to system bus: Resource temporarily unavailable
Jun 04 09:50:17  sshd[8401]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[8692]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[28670]: pam_systemd(sshd:session): Failed to 
connect to system bus: Resource temporarily unavailable
Jun 04 09:50:17  sshd[8543]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[8234]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[8654]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[28545]: pam_systemd(sshd:session): Failed to 
connect to system bus: Resource temporarily unavailable
Jun 04 09:50:17  sshd[8791]: pam_systemd(sshd:session): Failed to 
create session: Connection timed out
Jun 04 09:50:17  sshd[8840]: pam_systemd(sshd:session): Failed to 
create session: 

Bug#929652: systemd: sshd processes are not put into the correct slice/scope

2019-05-27 Thread Simon Beirnaert
Package: systemd
Version: 232-25+deb9u11
Severity: normal

Dear Maintainer,


Systemd does not seem to consistently put sshd processes under the
relevant user's slice. I have a user with 10925 sshd processes related
to its sessions. 7552 of those are put under session scopes in the
user's slice. 3372 are put under system.slice (ssh.service). The ones
under the user slice are neatly grouped into session scopes whereas
the ones under the system slice are not.

This is making it impossible to put accurate limits to the sshd
processes, or the user's processes. I can set TasksMax, but that only
has a value if the tasks are counted correctly.

If there's any more information needed to debug this, please let me
know.


Thanks
Simon

-- Package-specific info:

-- System Information:
Debian Release: 9.5
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.9.0-8-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=UTF-8 (charmap=locale: Cannot set LC_CTYPE 
to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
ANSI_X3.4-1968), LANGUAGE=en_US.UTF-8 (charmap=locale: Cannot set LC_CTYPE to 
default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
ANSI_X3.4-1968)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages systemd depends on:
ii  adduser 3.115
ii  libacl1 2.2.52-3+b1
ii  libapparmor12.11.0-3+deb9u2
ii  libaudit1   1:2.6.7-2
ii  libblkid1   2.29.2-1+deb9u1
ii  libc6   2.24-11+deb9u3
ii  libcap2 1:2.25-1
ii  libcryptsetup4  2:1.7.3-4
ii  libgcrypt20 1.7.6-2+deb9u3
ii  libgpg-error0   1.26-2
ii  libidn111.33-1
ii  libip4tc0   1.6.0+snapshot20161117-6
ii  libkmod223-2
ii  liblz4-10.0~r131-2+b1
ii  liblzma55.2.2-1.2+b1
ii  libmount1   2.29.2-1+deb9u1
ii  libpam0g1.1.8-3.6
ii  libseccomp2 2.3.1-2.1
ii  libselinux1 2.6-3+b3
ii  libsystemd0 232-25+deb9u11
ii  mount   2.29.2-1+deb9u1
ii  procps  2:3.3.12-3+deb9u1
ii  util-linux  2.29.2-1+deb9u1

Versions of packages systemd recommends:
ii  dbus1.10.26-0+deb9u1
ii  libpam-systemd  232-25+deb9u11

Versions of packages systemd suggests:
pn  policykit-1
pn  systemd-container  
pn  systemd-ui 

Versions of packages systemd is related to:
pn  dracut   
ii  initramfs-tools  0.130
ii  udev 232-25+deb9u11

-- debconf information excluded