Bug#929652: systemd: sshd processes are not put into the correct slice/scope
On Mon, 27 May 2019 21:03:03 + Simon Beirnaert wrote:
> Package: systemd
> Version: 232-25+deb9u11
> Severity: normal
>
> Dear Maintainer,
>
> Systemd does not seem to consistently put sshd processes under the
> relevant user's slice. I have a user with 10925 sshd processes related
> to its sessions. 7552 of those are put under session scopes in the
> user's slice. 3372 are put under system.slice (ssh.service). The ones
> under the user slice are neatly grouped into session scopes whereas
> the ones under the system slice are not.
>
> This is making it impossible to put accurate limits on the sshd
> processes, or the user's processes. I can set TasksMax, but that only
> has value if the tasks are counted correctly.
>
> If there's any more information needed to debug this, please let me
> know.

Thanks for filing this issue in the upstream bug tracker. It would be great if you could follow up there. As far as I can see, Lennart had some questions.

Regards,
Michael

--
Why is it that all of the instruments seeking intelligent life in the universe are pointed away from Earth?
Bug#929652: systemd: sshd processes are not put into the correct slice/scope
Hi.

Failure to create the session is almost certainly the reason why the processes end up in the wrong slice.

Note that the PID shown in journald is the PID of the process that connected to journald, obtained via getpeercred. If that process forks and the fd is passed to another process, the logs will misreport the old PID. (Given the different PIDs in very short intervals, I don't think it's the PID of the listening sshd process, but I'm not sure whether sshd children fork again. I'd recommend also checking neighboring PIDs found in the logs.)

As for your problem, I guess you're hitting some limit of dbus-daemon; I'd try increasing the limit of 256 for your system dbus-daemon instance.

The second suggestion^W workaround would be to tweak CPUWeight=/CPUShares= of dbus.service and sshd.service, giving dbus.service roughly twice the weight of sshd.service. This will likely cause slightly more latency at clients, but it can lower the strain on dbus-daemon so that it manages to broker all pam_systemd requests.

HTH,
Michal
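The two workarounds above could be sketched roughly as follows. Everything here is an assumption on my side: the dbus limit name (max_connections_per_user) and its new value, the file locations, and the weight values; CPUShares= is the cgroup-v1 setting available in v232, CPUWeight= its unified-hierarchy counterpart.

```shell
# Workaround 1 (assumed limit name and value): raise dbus's per-user
# connection cap via a local override that system.conf includes.
cat > /etc/dbus-1/system-local.conf <<'EOF'
<!DOCTYPE busconfig PUBLIC "-//freedesktop//DTD D-Bus Bus Configuration 1.0//EN"
 "http://www.freedesktop.org/standards/dbus/1.0/busconfig.dtd">
<busconfig>
  <limit name="max_connections_per_user">2048</limit>
</busconfig>
EOF

# Workaround 2: give dbus.service roughly twice the CPU weight of
# sshd.service (Debian's unit is ssh.service) via drop-ins.
mkdir -p /etc/systemd/system/dbus.service.d /etc/systemd/system/ssh.service.d
printf '[Service]\nCPUShares=2048\n' > /etc/systemd/system/dbus.service.d/cpu.conf
printf '[Service]\nCPUShares=1024\n' > /etc/systemd/system/ssh.service.d/cpu.conf
systemctl daemon-reload
```

A restart of dbus.service and ssh.service (or a reboot) would be needed for the limits to take effect.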
Bug#929652: systemd: sshd processes are not put into the correct slice/scope
I opened a ticket upstream. You can find it here: https://github.com/systemd/systemd/issues/12867 Thank you for your help!
Bug#929652: systemd: sshd processes are not put into the correct slice/scope
On 04.06.19 14:25, Simon Beirnaert wrote:
> Sorry for the late reply here; I got caught up in other stuff. I did
> some digging around: I set the log level to debug with systemd-analyze
> and also added the debug flag to pam_systemd.so.
>
> What I noticed is that on boot, when 5000+ machines try to
> authenticate at once, pam_systemd seems to fall on its ass and fail
> due to resource exhaustion. An excerpt from journalctl is attached
> below.
>
> I thought this might have been the reason behind sshd processes not
> being assigned to the correct slice, but the processes for which these
> log entries were generated are no longer present on the system, which
> I take to mean that the sshd process exited because it couldn't open a
> session.
>
> I tried to go about it the other way around and search for logs
> generated by any sshd process which is under the system.slice. I used
> this one-liner to do so:
>
> for i in $(systemctl status ssh | grep 'sshd: ' | sed -E 's/[^0-9]*([0-9]*)[^0-9]*/\1/'); do echo "==> logs for process $i"; journalctl | grep "\[$i\]"; done | less
>
> The search came up empty. For none of the currently 7000+ sshd
> processes which aren't in the correct user slice are there any logs in
> journalctl. I verified that all logs since boot time are currently
> still in the journal; it hasn't rotated yet.

If you have 5000 users authenticating at once, I can imagine that libpam-systemd/systemd/dbus run into some limits.

Would you mind testing with v241 (either from backports or by setting up a buster system) and, if the problem is still reproducible, filing it upstream at https://github.com/systemd/systemd/issues

Thanks,
Michael
Bug#929652: systemd: sshd processes are not put into the correct slice/scope
Sorry for the late reply here; I got caught up in other stuff. I did some digging around: I set the log level to debug with systemd-analyze and also added the debug flag to pam_systemd.so.

What I noticed is that on boot, when 5000+ machines try to authenticate at once, pam_systemd seems to fall on its ass and fail due to resource exhaustion. An excerpt from journalctl is attached below.

I thought this might have been the reason behind sshd processes not being assigned to the correct slice, but the processes for which these log entries were generated are no longer present on the system, which I take to mean that the sshd process exited because it couldn't open a session.

I tried to go about it the other way around and search for logs generated by any sshd process which is under the system.slice. I used this one-liner to do so:

for i in $(systemctl status ssh | grep 'sshd: ' | sed -E 's/[^0-9]*([0-9]*)[^0-9]*/\1/'); do echo "==> logs for process $i"; journalctl | grep "\[$i\]"; done | less

The search came up empty. For none of the currently 7000+ sshd processes which aren't in the correct user slice are there any logs in journalctl. I verified that all logs since boot time are currently still in the journal; it hasn't rotated yet.
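The one-liner above relies on grepping the journal's message text for the PID; journalctl can instead match its own _PID metadata field directly, which is more reliable. A minimal sketch of the extraction step, run against a made-up sample of the CGroup listing (the PIDs and user name are hypothetical, and the journalctl call is left commented since it only makes sense on the affected system):

```shell
# Hypothetical sample of the 'systemctl status ssh' CGroup output:
sample='  |-8415 sshd: deploy [priv]
  |-8691 sshd: deploy [priv]'

# Extract the PIDs and, on a real system, query the journal per PID:
for i in $(printf '%s\n' "$sample" | grep 'sshd: ' | sed -E 's/[^0-9]*([0-9]+).*/\1/'); do
    echo "==> logs for process $i"
    # journalctl _PID="$i" --no-pager   # matches the journal's PID field
done
```

Matching on _PID also avoids any shell-quoting pitfalls around interpolating $i into a grep pattern.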
Jun 04 09:50:17 sshd[8415]: pam_systemd(sshd:session): Failed to create session: Connection timed out
Jun 04 09:50:17 sshd[8691]: pam_systemd(sshd:session): Failed to create session: Connection timed out
Jun 04 09:50:17 sshd[8717]: pam_systemd(sshd:session): Failed to create session: Connection timed out
Jun 04 09:50:17 sshd[28538]: pam_systemd(sshd:session): Failed to connect to system bus: Resource temporarily unavailable
Jun 04 09:50:17 sshd[8652]: pam_systemd(sshd:session): Failed to create session: Connection timed out
Jun 04 09:50:17 sshd[8721]: pam_systemd(sshd:session): Failed to create session: Connection timed out
Jun 04 09:50:17 sshd[8716]: pam_systemd(sshd:session): Failed to create session: Connection timed out
Jun 04 09:50:17 sshd[28121]: pam_systemd(sshd:session): Failed to connect to system bus: Resource temporarily unavailable
Jun 04 09:50:17 sshd[28551]: pam_systemd(sshd:session): Failed to connect to system bus: Resource temporarily unavailable
Jun 04 09:50:17 sshd[8682]: pam_systemd(sshd:session): Failed to create session: Connection timed out
Jun 04 09:50:17 sshd[8396]: pam_systemd(sshd:session): Failed to create session: Connection timed out
Jun 04 09:50:17 sshd[28524]: pam_systemd(sshd:session): Failed to connect to system bus: Resource temporarily unavailable
Jun 04 09:50:17 sshd[8719]: pam_systemd(sshd:session): Failed to create session: Connection timed out
Jun 04 09:50:17 sshd[28546]: pam_systemd(sshd:session): Failed to connect to system bus: Resource temporarily unavailable
Jun 04 09:50:17 sshd[8690]: pam_systemd(sshd:session): Failed to create session: Connection timed out
Jun 04 09:50:17 sshd[8697]: pam_systemd(sshd:session): Failed to create session: Connection timed out
Jun 04 09:50:17 sshd[7350]: pam_systemd(sshd:session): Failed to create session: Connection timed out
Jun 04 09:50:17 sshd[8522]: pam_systemd(sshd:session): Failed to create session: Connection timed out
Jun 04 09:50:17 sshd[8406]: pam_systemd(sshd:session): Failed to create session: Connection timed out
Jun 04 09:50:17 sshd[28636]: pam_systemd(sshd:session): Failed to connect to system bus: Resource temporarily unavailable
Jun 04 09:50:17 sshd[28620]: pam_systemd(sshd:session): Failed to connect to system bus: Resource temporarily unavailable
Jun 04 09:50:17 sshd[28669]: pam_systemd(sshd:session): Failed to connect to system bus: Resource temporarily unavailable
Jun 04 09:50:17 sshd[8693]: pam_systemd(sshd:session): Failed to create session: Connection timed out
Jun 04 09:50:17 sshd[8723]: pam_systemd(sshd:session): Failed to create session: Connection timed out
Jun 04 09:50:17 sshd[28633]: pam_systemd(sshd:session): Failed to connect to system bus: Resource temporarily unavailable
Jun 04 09:50:17 sshd[8401]: pam_systemd(sshd:session): Failed to create session: Connection timed out
Jun 04 09:50:17 sshd[8692]: pam_systemd(sshd:session): Failed to create session: Connection timed out
Jun 04 09:50:17 sshd[28670]: pam_systemd(sshd:session): Failed to connect to system bus: Resource temporarily unavailable
Jun 04 09:50:17 sshd[8543]: pam_systemd(sshd:session): Failed to create session: Connection timed out
Jun 04 09:50:17 sshd[8234]: pam_systemd(sshd:session): Failed to create session: Connection timed out
Jun 04 09:50:17 sshd[8654]: pam_systemd(sshd:session): Failed to create session: Connection timed out
Jun 04 09:50:17 sshd[28545]: pam_systemd(sshd:session): Failed to connect to system bus: Resource temporarily unavailable
Jun 04 09:50:17 sshd[8791]: pam_systemd(sshd:session): Failed to create session: Connection timed out
Jun 04 09:50:17 sshd[8840]: pam_systemd(sshd:session): Failed to create session:
Bug#929652: systemd: sshd processes are not put into the correct slice/scope
Control: tags -1 + moreinfo

On 27.05.19 23:03, Simon Beirnaert wrote:
> Package: systemd
> Version: 232-25+deb9u11
> Severity: normal
>
> Dear Maintainer,
>
> Systemd does not seem to consistently put sshd processes under the
> relevant user's slice. I have a user with 10925 sshd processes related
> to its sessions. 7552 of those are put under session scopes in the
> user's slice. 3372 are put under system.slice (ssh.service). The ones
> under the user slice are neatly grouped into session scopes whereas
> the ones under the system slice are not.
>
> This is making it impossible to put accurate limits on the sshd
> processes, or the user's processes. I can set TasksMax, but that only
> has value if the tasks are counted correctly.
>
> If there's any more information needed to debug this, please let me
> know.

You need to have libpam-systemd installed and PAM support enabled in sshd for processes spawned from an SSH session to be put into the user slice.

~11000 sshd processes for one user seems unusual. What kind of setup is this? Are you sure all those sshd processes went through the PAM stack?

You might add the "debug" flag to the pam_systemd.so config to get more information. What do you get from pam_systemd.so for an exemplary sshd process which is not put into the user slice? Do you get anything in the journal from systemd-logind for this process? You can increase what is being logged by systemd-logind with "systemd-analyze log-level".
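The debugging suggestions above amount to two small changes; as a sketch, with the caveats that the pam file location varies between setups and that the systemd-analyze verb for setting the log level has changed names across versions ("set-log-level" is the v232-era spelling):

```shell
# Enable pam_systemd debug output by appending "debug" to its line
# (on Debian usually in /etc/pam.d/common-session):
#
#   session optional pam_systemd.so debug
#
# Raise the log level of the manager (and thus logind's surroundings)
# at runtime:
systemd-analyze set-log-level debug
```

The extra debug output then lands in the journal alongside the sshd messages, which is where the excerpts in this bug were gathered.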
Bug#929652: systemd: sshd processes are not put into the correct slice/scope
Package: systemd
Version: 232-25+deb9u11
Severity: normal

Dear Maintainer,

Systemd does not seem to consistently put sshd processes under the relevant user's slice. I have a user with 10925 sshd processes related to its sessions. 7552 of those are put under session scopes in the user's slice. 3372 are put under system.slice (ssh.service). The ones under the user slice are neatly grouped into session scopes whereas the ones under the system slice are not.

This is making it impossible to put accurate limits on the sshd processes, or the user's processes. I can set TasksMax, but that only has value if the tasks are counted correctly.

If there's any more information needed to debug this, please let me know.

Thanks
Simon

-- Package-specific info:

-- System Information:
Debian Release: 9.5
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.9.0-8-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=UTF-8 (charmap=ANSI_X3.4-1968), LANGUAGE=en_US.UTF-8 (charmap=ANSI_X3.4-1968)
(locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages systemd depends on:
ii  adduser          3.115
ii  libacl1          2.2.52-3+b1
ii  libapparmor1     2.11.0-3+deb9u2
ii  libaudit1        1:2.6.7-2
ii  libblkid1        2.29.2-1+deb9u1
ii  libc6            2.24-11+deb9u3
ii  libcap2          1:2.25-1
ii  libcryptsetup4   2:1.7.3-4
ii  libgcrypt20      1.7.6-2+deb9u3
ii  libgpg-error0    1.26-2
ii  libidn11         1.33-1
ii  libip4tc0        1.6.0+snapshot20161117-6
ii  libkmod2         23-2
ii  liblz4-1         0.0~r131-2+b1
ii  liblzma5         5.2.2-1.2+b1
ii  libmount1        2.29.2-1+deb9u1
ii  libpam0g         1.1.8-3.6
ii  libseccomp2      2.3.1-2.1
ii  libselinux1      2.6-3+b3
ii  libsystemd0      232-25+deb9u11
ii  mount            2.29.2-1+deb9u1
ii  procps           2:3.3.12-3+deb9u1
ii  util-linux       2.29.2-1+deb9u1

Versions of packages systemd recommends:
ii  dbus            1.10.26-0+deb9u1
ii  libpam-systemd  232-25+deb9u11

Versions of packages systemd suggests:
pn  policykit-1
pn  systemd-container
pn  systemd-ui

Versions of packages systemd is related to:
pn  dracut
ii  initramfs-tools  0.130
ii  udev             232-25+deb9u11

-- debconf information excluded
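The TasksMax idea mentioned in the report can be applied to a single user's slice with a drop-in, so that all session scopes of that user count against one limit. A hedged sketch (UID 1000 and the value 8192 are placeholders, not anything from this bug):

```shell
# Hypothetical drop-in capping the task count for the slice of UID 1000.
mkdir -p /etc/systemd/system/user-1000.slice.d
cat > /etc/systemd/system/user-1000.slice.d/tasks.conf <<'EOF'
[Slice]
TasksMax=8192
EOF
systemctl daemon-reload
```

As the report notes, this only behaves as expected if the sshd processes actually land in the user's session scopes rather than in system.slice.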