Bug#929652: systemd: sshd processes are not put into the correct slice/scope
I opened a ticket upstream. You can find it here: https://github.com/systemd/systemd/issues/12867 Thank you for your help!
Bug#929652: systemd: sshd processes are not put into the correct slice/scope
Sorry for the late reply here. Got caught up in other stuff. I did some digging around. Set the loglevel with systemd-analyze to debug and also added the debug flag to pam_systemd.so. What I noticed is that on boot, when 5000+ machines try to authenticate at once, pam_systemd seems to fall on its ass and fail due to resource exhaustion. An excerpt from journalctl is attached below. I thought this might've been the reason behind sshd processes not being assigned to the correct slice, but the processes for which these log entries are generated are not available on the system anymore, which I take as meaning that the sshd process exited because it couldn't open a session. I tried to go about it the other way around and search for logs generated by any sshd process which is under the system.slice. I used this oneliner to do so: for i in $(systemctl status ssh | grep 'sshd: ' | sed -E 's/[^0-9]*([0-9]*)[^0-9]*/\1/'); do echo "==> logs for process $i"; journalctl | grep '\[$i\]'; done | less The search came up empty. For none of the currently 7000+ sshd processes which aren't in the correct user slice, there are any logs in journalctl. I verified and all logs since boot time are currently still in the journal, it hasn't rotated yet. Jun 04 09:50:17 sshd[8415]: pam_systemd(sshd:session): Failed to create session: Connection timed out Jun 04 09:50:17 sshd[8691]: pam_systemd(sshd:session): Failed to create session: Connection timed out Jun 04 09:50:17 sshd[8717]: pam_systemd(sshd:session): Failed to create session: Connection timed out Jun 04 09:50:17 sshd[28538]: pam_systemd(sshd:session): Failed to connect to system bus: Resource temporarily unavailable Jun 04 09:50:17 sshd[8652]: pam_systemd(sshd:session): Failed to create session: Connection timed out Jun 04 09:50:17 sshd[8721]: pam_systemd(sshd:session): Failed to create session: Connection timed out Jun 04 09:50:17 sshd[8716]: pam_systemd(sshd:session): Failed to create session: Connection timed out Jun 04 09:50:17 sshd[28121]: pam_systemd(sshd:session): Failed to connect to system bus: Resource temporarily unavailable Jun 04 09:50:17 sshd[28551]: pam_systemd(sshd:session): Failed to connect to system bus: Resource temporarily unavailable Jun 04 09:50:17 sshd[8682]: pam_systemd(sshd:session): Failed to create session: Connection timed out Jun 04 09:50:17 sshd[8396]: pam_systemd(sshd:session): Failed to create session: Connection timed out Jun 04 09:50:17 sshd[28524]: pam_systemd(sshd:session): Failed to connect to system bus: Resource temporarily unavailable Jun 04 09:50:17 sshd[8719]: pam_systemd(sshd:session): Failed to create session: Connection timed out Jun 04 09:50:17 sshd[28546]: pam_systemd(sshd:session): Failed to connect to system bus: Resource temporarily unavailable Jun 04 09:50:17 sshd[8690]: pam_systemd(sshd:session): Failed to create session: Connection timed out Jun 04 09:50:17 sshd[8697]: pam_systemd(sshd:session): Failed to create session: Connection timed out Jun 04 09:50:17 sshd[7350]: pam_systemd(sshd:session): Failed to create session: Connection timed out Jun 04 09:50:17 sshd[8522]: pam_systemd(sshd:session): Failed to create session: Connection timed out Jun 04 09:50:17 sshd[8406]: pam_systemd(sshd:session): Failed to create session: Connection timed out Jun 04 09:50:17 sshd[28636]: pam_systemd(sshd:session): Failed to connect to system bus: Resource temporarily unavailable Jun 04 09:50:17 sshd[28620]: pam_systemd(sshd:session): Failed to connect to system bus: Resource temporarily unavailable Jun 04 09:50:17 sshd[28669]: pam_systemd(sshd:session): Failed to connect to system bus: Resource temporarily unavailable Jun 04 09:50:17 sshd[8693]: pam_systemd(sshd:session): Failed to create session: Connection timed out Jun 04 09:50:17 sshd[8723]: pam_systemd(sshd:session): Failed to create session: Connection timed out Jun 04 09:50:17 sshd[28633]: pam_systemd(sshd:session): Failed to connect to system bus: Resource temporarily unavailable Jun 04 09:50:17 sshd[8401]: pam_systemd(sshd:session): Failed to create session: Connection timed out Jun 04 09:50:17 sshd[8692]: pam_systemd(sshd:session): Failed to create session: Connection timed out Jun 04 09:50:17 sshd[28670]: pam_systemd(sshd:session): Failed to connect to system bus: Resource temporarily unavailable Jun 04 09:50:17 sshd[8543]: pam_systemd(sshd:session): Failed to create session: Connection timed out Jun 04 09:50:17 sshd[8234]: pam_systemd(sshd:session): Failed to create session: Connection timed out Jun 04 09:50:17 sshd[8654]: pam_systemd(sshd:session): Failed to create session: Connection timed out Jun 04 09:50:17 sshd[28545]: pam_systemd(sshd:session): Failed to connect to system bus: Resource temporarily unavailable Jun 04 09:50:17 sshd[8791]: pam_systemd(sshd:session): Failed to create session: Connection timed out Jun 04 09:50:17 sshd[8840]: pam_systemd(sshd:session): Failed to create session:
Bug#929652: systemd: sshd processes are not put into the correct slice/scope
Package: systemd Version: 232-25+deb9u11 Severity: normal Dear Maintainer, Systemd does not seem to consistently put sshd processes under the relevant user's slice. I have a user with 10925 sshd processes related to its sessions. 7552 of those are put under session scopes in the user's slice. 3372 are put under system.slice (ssh.service). The ones under the user slice are neatly grouped into session scopes whereas the ones under the system slice are not. This is making it impossible to put accurate limits to the sshd processes, or the user's processes. I can set TasksMax, but that only has a value if the tasks are counted correctly. If there's any more information needed to debug this, please let me know. Thanks Simon -- Package-specific info: -- System Information: Debian Release: 9.5 APT prefers stable-updates APT policy: (500, 'stable-updates'), (500, 'stable') Architecture: amd64 (x86_64) Kernel: Linux 4.9.0-8-amd64 (SMP w/4 CPU cores) Locale: LANG=en_US.UTF-8, LC_CTYPE=UTF-8 (charmap=locale: Cannot set LC_CTYPE to default locale: No such file or directory locale: Cannot set LC_ALL to default locale: No such file or directory ANSI_X3.4-1968), LANGUAGE=en_US.UTF-8 (charmap=locale: Cannot set LC_CTYPE to default locale: No such file or directory locale: Cannot set LC_ALL to default locale: No such file or directory ANSI_X3.4-1968) Shell: /bin/sh linked to /bin/dash Init: systemd (via /run/systemd/system) Versions of packages systemd depends on: ii adduser 3.115 ii libacl1 2.2.52-3+b1 ii libapparmor12.11.0-3+deb9u2 ii libaudit1 1:2.6.7-2 ii libblkid1 2.29.2-1+deb9u1 ii libc6 2.24-11+deb9u3 ii libcap2 1:2.25-1 ii libcryptsetup4 2:1.7.3-4 ii libgcrypt20 1.7.6-2+deb9u3 ii libgpg-error0 1.26-2 ii libidn111.33-1 ii libip4tc0 1.6.0+snapshot20161117-6 ii libkmod223-2 ii liblz4-10.0~r131-2+b1 ii liblzma55.2.2-1.2+b1 ii libmount1 2.29.2-1+deb9u1 ii libpam0g1.1.8-3.6 ii libseccomp2 2.3.1-2.1 ii libselinux1 2.6-3+b3 ii libsystemd0 232-25+deb9u11 ii mount 2.29.2-1+deb9u1 ii procps 2:3.3.12-3+deb9u1 ii util-linux 2.29.2-1+deb9u1 Versions of packages systemd recommends: ii dbus1.10.26-0+deb9u1 ii libpam-systemd 232-25+deb9u11 Versions of packages systemd suggests: pn policykit-1 pn systemd-container pn systemd-ui Versions of packages systemd is related to: pn dracut ii initramfs-tools 0.130 ii udev 232-25+deb9u11 -- debconf information excluded