Re: [systemd-devel] systemd debug out of memory

2017-03-05 Thread Pascal Kolijn
Peace,

On 28/02/2017 16:00, Lennart Poettering wrote:
> On Tue, 28.02.17 13:26, Pascal kolijn (p.kol...@vu.nl) wrote:
> 
>> Hi List,
>>
>> I've subscribed to this list to ask for help in debugging a problem we
>> seem to have with the socket activated telnetd on a rhel7 system.
>>
>> A default install of telnetd collects data from some small boxes
>> deployed in the field. It works for a long time and then suddenly:
>>
>> Feb 26 17:46:53 bibr systemd: Created slice user-6006.slice.
>> Feb 26 17:46:53 bibr systemd: Starting user-6006.slice.
>> Feb 26 17:46:53 bibr systemd: Started Session 2223341 of user .
>> Feb 26 17:46:53 bibr systemd-logind: New session 2223341 of user .
>> Feb 26 17:46:53 bibr systemd: Starting Session 2223341 of user .
>> Feb 26 17:46:53 bibr systemd: Started Telnet Server (:28830).
>> Feb 26 17:46:53 pbibr001 systemd: Starting Telnet Server (:28830)...
>> Feb 26 17:46:57 bibr systemd: Failed to fork: Cannot allocate memory
> 
> Hmm, Linux fork() returns ENOMEM if the maximum number of tasks on the
> system is hit (yes this is a bit misleading, but that's how it is).
> That max number of tasks is limited for example by the max number of
> assignable pids as configured in /proc/sys/kernel/pid_max? Maybe you
> hit that limit? Maybe something is leaking pids on your system? not
> reaping zombies properly?

As far as I can determine, running out of pids is not the issue: I can
see pids being reused within a day. That does not rule out that some
still go missing over time, but how do I determine whether that is the
case?
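
One way to look for leaked tasks, as a minimal sketch rather than a
definitive diagnostic: walk /proc, count processes and threads, and group
zombie processes by parent pid, then compare the totals with
/proc/sys/kernel/pid_max. The function names are made up for this example,
and the check against /proc/sys/kernel/threads-max is an extra assumption
that was not mentioned in the thread.

```python
#!/usr/bin/env python3
"""Count tasks and zombies under /proc and compare against kernel limits."""
import os
from collections import Counter


def read_int(path):
    """Return the first integer stored in a file, or None if unreadable."""
    try:
        with open(path) as fh:
            return int(fh.read().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def scan_tasks(proc_root="/proc"):
    """Return (process_count, thread_count, Counter of zombie parent pids)."""
    processes = 0
    threads = 0
    zombie_parents = Counter()
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return processes, threads, zombie_parents
    for entry in entries:
        if not entry.isdigit():
            continue  # only numeric directories are processes
        fields = {}
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                for line in fh:
                    key, _, value = line.partition(":")
                    fields[key] = value.strip()
        except OSError:
            continue  # the process exited while we were scanning
        processes += 1
        try:
            threads += int(fields.get("Threads", "1"))
        except ValueError:
            threads += 1
        # A zombie shows up as "State: Z (zombie)" until its parent reaps it.
        if fields.get("State", "").startswith("Z"):
            zombie_parents[fields.get("PPid", "?")] += 1
    return processes, threads, zombie_parents


if __name__ == "__main__":
    procs, threads, zombies = scan_tasks()
    print("processes:", procs, "threads:", threads)
    print("pid_max:", read_int("/proc/sys/kernel/pid_max"))
    # Assumption: threads-max is a second limit worth comparing against.
    print("threads-max:", read_int("/proc/sys/kernel/threads-max"))
    for ppid, count in zombies.most_common(5):
        print("parent pid", ppid, "has", count, "zombie children")
```

Run periodically (from cron, for example), a steadily rising zombie count
under a single parent pid would suggest that parent is not reaping its
children.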

What I do see is that the RSS of the systemd process is slowly growing
over time in the production environment. I have not yet been able to
reproduce the situation in a test environment, which is a pity. I think
I can simulate the telnet connects more accurately after I speak with
the developer of the boxes, and then see if I can create a reproducible
situation.
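
To put numbers on that growth, a small sampler along these lines could
log VmRSS from /proc/1/status at a fixed interval and estimate the average
growth rate. This is only a sketch: the function names, the 10 minute
interval and the sample count are arbitrary choices for illustration, and
a growing RSS on its own does not prove a leak.

```python
#!/usr/bin/env python3
"""Log the resident set size of a process at a fixed interval."""
import time


def parse_vmrss_kib(status_text):
    """Extract VmRSS in kB from the text of /proc/<pid>/status, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            parts = line.split()
            # Expected form: "VmRSS:     12345 kB"
            if len(parts) >= 2 and parts[1].isdigit():
                return int(parts[1])
    return None


def growth_per_hour(samples):
    """Average growth in kB/hour between the first and last sample.

    Each sample is a (epoch_seconds, rss_kb) tuple.
    """
    if len(samples) < 2:
        return 0.0
    (t0, r0), (t1, r1) = samples[0], samples[-1]
    if t1 <= t0:
        return 0.0
    return (r1 - r0) * 3600.0 / (t1 - t0)


def sample(pid=1, interval=600, count=6):
    """Read VmRSS of `pid` `count` times, `interval` seconds apart."""
    samples = []
    for i in range(count):
        with open("/proc/%d/status" % pid) as fh:
            rss = parse_vmrss_kib(fh.read())
        if rss is not None:
            samples.append((time.time(), rss))
            print(time.strftime("%Y-%m-%d %H:%M:%S"), "pid", pid, "VmRSS", rss, "kB")
        if i + 1 < count:
            time.sleep(interval)
    return samples
```

Calling growth_per_hour(sample(pid=1)) would then give a rough rate that
can be compared between the production and test machines.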

>> Feb 26 17:46:57 bibr systemd: Assertion 'pid >= 1' failed at
>> src/core/unit.c:1996, function unit_watch_pid(). Aborting.
>> Feb 26 17:46:57 bibr001 systemd: Caught <ABRT>, cannot fork for core
>> dump: Cannot allocate memory
>> Feb 26 17:46:57 bibr systemd: Freezing execution.
> 
> So this is definitely a bug. If the limit is hit, we should certainly
> not hit an assert. I tried to figure out how this could ever happen,
> but afaics this should not be possible on current git at least. Any
> chance you can try to reproduce this issue with something more recent
> than a rhel7 box?

Hmmm, the version we currently use in production is:

# rpm -qa | grep systemd
systemd-libs-219-19.el7_2.13.x86_64
systemd-219-19.el7_2.13.x86_64
systemd-sysv-219-19.el7_2.13.x86_64

I think I can update the production machine to the current state of 7.3,
but I would be reluctant to go to a more recent version there.

Maybe in the test environment, if I can reproduce it there.

> Either way it appears that there's both a bug on your setup and in
> systemd: something leaks processes (which is bug #1, in your setup)
> and then systemd doesn't deal properly with that (which is bug #2, in
> systemd upstream)...
> 
> Lennart
> 

Pascal Kolijn
Vrije Universiteit Amsterdam
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] systemd debug out of memory

2017-02-28 Thread Pascal kolijn
Hi List,

I've subscribed to this list to ask for help in debugging a problem we
seem to have with the socket activated telnetd on a rhel7 system.

A default install of telnetd collects data from some small boxes
deployed in the field. It works for a long time and then suddenly:

Feb 26 17:46:53 bibr systemd: Created slice user-6006.slice.
Feb 26 17:46:53 bibr systemd: Starting user-6006.slice.
Feb 26 17:46:53 bibr systemd: Started Session 2223341 of user .
Feb 26 17:46:53 bibr systemd-logind: New session 2223341 of user .
Feb 26 17:46:53 bibr systemd: Starting Session 2223341 of user .
Feb 26 17:46:53 bibr systemd: Started Telnet Server (:28830).
Feb 26 17:46:53 pbibr001 systemd: Starting Telnet Server (:28830)...
Feb 26 17:46:57 bibr systemd: Failed to fork: Cannot allocate memory
Feb 26 17:46:57 bibr systemd: Assertion 'pid >= 1' failed at
src/core/unit.c:1996, function unit_watch_pid(). Aborting.
Feb 26 17:46:57 bibr001 systemd: Caught <ABRT>, cannot fork for core
dump: Cannot allocate memory
Feb 26 17:46:57 bibr systemd: Freezing execution.
Feb 26 17:47:22 bibr dbus[768]: [system] Failed to activate service
'org.freedesktop.systemd1': timed out
Feb 26 17:47:22 bibr dbus-daemon: dbus[768]: [system] Failed to activate
service 'org.freedesktop.systemd1': timed out
Feb 26 17:47:22 bibr systemd-logind: Failed to start session scope
session-2223342.scope: Activation of org.freedesktop.systemd1 timed out
org.freedesktop.DBus.Error.TimedOut
Feb 26 17:47:47 bibr dbus[768]: [system] Failed to activate service
'org.freedesktop.systemd1': timed out
Feb 26 17:47:47 bibr systemd-logind: Failed to abandon session scope:
Connection timed out
Feb 26 17:47:47 bibr dbus-daemon: dbus[768]: [system] Failed to activate
service 'org.freedesktop.systemd1': timed out

After the "systemd: Freezing execution." message, systemctl is no longer
able to communicate with systemd. It has all the looks of a memory leak
somewhere (systemd, systemd-logind or telnetd), but how can I debug this?

Pascal Kolijn

Vrije Universiteit - Amsterdam