Peace,

On 28/02/2017 16:00, Lennart Poettering wrote:
> On Tue, 28.02.17 13:26, Pascal kolijn (p.kol...@vu.nl) wrote:
>
>> Hi List,
>>
>> I've subscribed to this list to ask for help in debugging a problem we
>> seem to have with the socket activated telnetd on a rhel7 system.
>>
>> A default install of telnetd collects data from some small boxes
>> deployed in the field. It works for a long time and then suddenly:
>>
>> Feb 26 17:46:53 bibr systemd: Created slice user-6006.slice.
>> Feb 26 17:46:53 bibr systemd: Starting user-6006.slice.
>> Feb 26 17:46:53 bibr systemd: Started Session 2223341 of user <USER>.
>> Feb 26 17:46:53 bibr systemd-logind: New session 2223341 of user <USER>.
>> Feb 26 17:46:53 bibr systemd: Starting Session 2223341 of user <USER>.
>> Feb 26 17:46:53 bibr systemd: Started Telnet Server (<IP>:28830).
>> Feb 26 17:46:53 pbibr001 systemd: Starting Telnet Server (<IP>:28830)...
>> Feb 26 17:46:57 bibr systemd: Failed to fork: Cannot allocate memory
>
> Hmm, Linux fork() returns ENOMEM if the maximum number of tasks on the
> system is hit (yes this is a bit misleading, but that's how it is).
> That max number of tasks is limited for example by the max number of
> assignable pids as configured in /proc/sys/kernel/pid_max? Maybe you
> hit that limit? Maybe something is leaking pids on your system? not
> reaping zombies properly?

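Both suspicions above (getting close to pid_max, zombies that are never
reaped) can be checked directly from /proc. Below is a minimal sketch in
Python, assuming the standard Linux /proc/<pid>/stat layout; the names
parse_stat and task_summary are made up for illustration and are not
part of systemd or of any tool mentioned in this thread. Threads are
counted as well, because thread IDs are allocated from the same number
space that pid_max bounds.

```python
import os


def parse_stat(stat_text):
    """Return (state, ppid, num_threads) from the text of /proc/<pid>/stat."""
    # The command name sits in parentheses and may itself contain spaces
    # or ')', so split on the last ')' before reading the other fields.
    fields = stat_text.rsplit(")", 1)[1].split()
    # After the command name: field 3 is state, 4 is ppid, 20 is num_threads.
    return fields[0], int(fields[1]), int(fields[17])


def task_summary(proc_root="/proc"):
    """Count processes, threads and zombies found under proc_root."""
    processes = 0
    threads = 0
    zombies = []  # (pid, ppid) pairs; the parent is the one not reaping
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        stat_path = os.path.join(proc_root, entry, "stat")
        try:
            with open(stat_path, errors="replace") as f:
                state, ppid, num_threads = parse_stat(f.read())
        except (OSError, IndexError, ValueError):
            continue  # process exited, or entry unreadable, while scanning
        processes += 1
        threads += num_threads
        if state == "Z":
            zombies.append((int(entry), ppid))
    with open(os.path.join(proc_root, "sys", "kernel", "pid_max")) as f:
        pid_max = int(f.read())
    return {
        "processes": processes,
        "threads": threads,
        "zombies": zombies,
        "pid_max": pid_max,
    }
```

Calling task_summary() periodically (for example from cron) and logging
the result would show whether the zombie list or the thread count creeps
up toward pid_max in the days before a failure, and the ppid of each
zombie points at the process that is not reaping it.
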
As far as I can determine, running out of pids is not the issue: I can
see pids being reused within a day. That does not rule out that some
still go missing over time, but how do I determine whether that is the
case...?

What I do see is that the RSS of the systemd process is slowly growing
over time in the production environment.

I have not (yet) been able to reproduce the situation in a test
environment, which is a pity. I think I can simulate the telnet connects
more accurately after I speak with the developer of said boxes, and then
see whether I can create a reproducible situation.

>> Feb 26 17:46:57 bibr systemd: Assertion 'pid >= 1' failed at
>> src/core/unit.c:1996, function unit_watch_pid(). Aborting.
>> Feb 26 17:46:57 bibr001 systemd: Caught <ABRT>, cannot fork for core
>> dump: Cannot allocate memory
>> Feb 26 17:46:57 bibr systemd: Freezing execution.
>
> So this is definitely a bug. If the limit is hit, we should certainly
> not hit an assert. I tried to figure out how this could ever happen,
> but afaics this should not be possible on current git at least. Any
> chance you can try to reproduce this issue with something more recent
> than a rhel7 box?

Hmmm, the version we currently use in production is:

# rpm -qa | grep systemd
systemd-libs-219-19.el7_2.13.x86_64
systemd-219-19.el7_2.13.x86_64
systemd-sysv-219-19.el7_2.13.x86_64

I think I can update it to the current state in 7.3 on the production
machine, but I would be reluctant to go for a more recent version...
Maybe in the test env, if I can reproduce it there.

> Either way it appears that there's both a bug on your setup and in
> systemd: something leaks processes (which is bug #1, in your setup)
> and then systemd doesn't deal properly with that (which is bug #2, in
> systemd upstream)...
>
> Lennart
>

Pascal Kolijn
Vrije Universiteit Amsterdam
_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel