Re: [systemd-devel] Looking for known memory leaks triggered by stress testing add/remove/up/down interfaces

2021-02-22 Thread Cristian Rodríguez
On Mon, Feb 22, 2021 at 8:22 AM Robert P. J. Day 
wrote:

> On Thu, 18 Feb 2021, Lennart Poettering wrote:
>
> > On Do, 18.02.21 11:48, Robert P. J. Day (rpj...@crashcourse.ca) wrote:
> >
> > >   A colleague has reported the following apparent issue in a fairly
> > > old (v230) version of systemd -- this is in a Yocto Project Wind River
> > > Linux 9 build, hence the age of the package.
> > >
> > >   As reported to me (and I'm gathering more info), the system was
> > > being put through some "longevity testing" by repeatedly adding,
> > > removing, activating and de-activating network interfaces. According
> > > to the report, the result was heap space slowly but inexorably being
> > > consumed.
> > >
> > >   While waiting for more info, I'm going to examine the commit log for
> > > systemd from v230 moving forward to collect any commits that address
> > > memory leaks, then peruse them more carefully to see if they might
> > > resolve the problem.
> > >
> > >   I realize it's asking a bit for folks here to remember that far
> > > back, but does this issue sound at all familiar? Any pointers that
> > > might save me some time? Thanks.
> >
> > Note that our hash tables operate with an allocation cache: when
> > adding entries to them and then removing them again the memory
> > required for that is not returned to the OS but added to a local
> > cache. When the next entry is then added again, we recycle the
> > cached entry instead of asking for new memory again. This allocation
> > cache is a bit quicker than going to malloc() all the time, but
> > means if you just watch the heap you'll assume there's a leak even
> > though there isn't really, the memory is not lost after all, and
> > will be reused eventually if we need it.
> >
> > You may use the env var SYSTEMD_MEMPOOL=0 to turn this logic off,
> > but not sure v230 already knew that env var.
>
>   well, we seem to have isolated the issue, here it is in a nutshell
> based on a condensed note i got from someone who tracked it down this
> weekend. the memory leak is triggered by:
>
>   $ ssh root@ -p 830 -s netconf   [830 = netconf over SSH]
>
> long story short, according to jemalloc profiling, there is a massive
> memory leak in DBUS code,


Ok, give that data to whoever supports your system; you are not giving us
anything useful here.
Now, can it have memory leaks? Yeah, it could. However, I have reviewed
the code (admittedly a long time ago), and leaks in the systemd binaries are
usually limited to error paths that are not exercised often and, frankly,
are on the path to extinction.

What you see most of the time are not leaks in the code, but leaks in the
understanding of memory management techniques.
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Looking for known memory leaks triggered by stress testing add/remove/up/down interfaces

2021-02-22 Thread Lennart Poettering
On Mo, 22.02.21 06:22, Robert P. J. Day (rpj...@crashcourse.ca) wrote:

>   well, we seem to have isolated the issue, here it is in a nutshell
> based on a condensed note i got from someone who tracked it down this
> weekend. the memory leak is triggered by:
>
>   $ ssh root@ -p 830 -s netconf   [830 = netconf over SSH]
>
> long story short, according to jemalloc profiling, there is a massive
> memory leak in DBUS code, to the tune of about 500M/day on a running
> system. i'm perusing the profiling output now, but does any of this
> sound even remotely familiar to anyone? i realize that's just a
> summary, but does anyone remember seeing something related to this
> once upon a time? [heavily-patched systemd_230 from wind river linux
> 9].

On really old kernels we didn't get reliable notification of cgroup
empty events, hence .scope units would stick around on login sessions
under some conditions. We added workarounds later on for that. But
seriously, this is so long ago... no idea.
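
A quick way to check whether that kind of thing is happening, i.e. whether
stale session scopes keep piling up (plain systemctl/systemd-cgls queries,
nothing version-specific assumed):

  $ systemctl list-units --type=scope --all | grep -c session
  $ systemd-cgls /user.slice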

Lennart

--
Lennart Poettering, Berlin
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Looking for known memory leaks triggered by stress testing add/remove/up/down interfaces

2021-02-22 Thread Robert P. J. Day
On Mon, 22 Feb 2021, Greg KH wrote:

> On Mon, Feb 22, 2021 at 06:22:44AM -0500, Robert P. J. Day wrote:
> > On Thu, 18 Feb 2021, Lennart Poettering wrote:
> >
> > > On Do, 18.02.21 11:48, Robert P. J. Day (rpj...@crashcourse.ca) wrote:
> > >
> > > >   A colleague has reported the following apparent issue in a fairly
> > > > old (v230) version of systemd -- this is in a Yocto Project Wind River
> > > > Linux 9 build, hence the age of the package.
> > > >
> > > >   As reported to me (and I'm gathering more info), the system was
> > > > being put through some "longevity testing" by repeatedly adding,
> > > > removing, activating and de-activating network interfaces. According
> > > > to the report, the result was heap space slowly but inexorably being
> > > > consumed.
> > > >
> > > >   While waiting for more info, I'm going to examine the commit log for
> > > > systemd from v230 moving forward to collect any commits that address
> > > > memory leaks, then peruse them more carefully to see if they might
> > > > resolve the problem.
> > > >
> > > >   I realize it's asking a bit for folks here to remember that far
> > > > back, but does this issue sound at all familiar? Any pointers that
> > > > might save me some time? Thanks.
> > >
> > > Note that our hash tables operate with an allocation cache: when
> > > adding entries to them and then removing them again the memory
> > > required for that is not returned to the OS but added to a local
> > > cache. When the next entry is then added again, we recycle the
> > > cached entry instead of asking for new memory again. This allocation
> > > cache is a bit quicker than going to malloc() all the time, but
> > > means if you just watch the heap you'll assume there's a leak even
> > > though there isn't really, the memory is not lost after all, and
> > > will be reused eventually if we need it.
> > >
> > > You may use the env var SYSTEMD_MEMPOOL=0 to turn this logic off,
> > > but not sure v230 already knew that env var.
> >
> >   well, we seem to have isolated the issue, here it is in a nutshell
> > based on a condensed note i got from someone who tracked it down this
> > weekend. the memory leak is triggered by:
> >
> >   $ ssh root@ -p 830 -s netconf   [830 = netconf over SSH]
> >
> > long story short, according to jemalloc profiling, there is a massive
> > memory leak in DBUS code, to the tune of about 500M/day on a running
> > system. i'm perusing the profiling output now, but does any of this
> > sound even remotely familiar to anyone? i realize that's just a
> > summary, but does anyone remember seeing something related to this
> > once upon a time? [heavily-patched systemd_230 from wind river linux
> > 9].
>
> Given that this is a heavily patched system, please get support from
> the vendor that provided this as you are paying for this.  Don't ask
> the community to try to remember what happened with an old obsolete
> version of software, that's crazy...

  that's already in the pipeline, i was simply asking if anyone had
ever *seen* this before, just so we might be able to say, "hey, we're
not the first this has happened to."

  also, on the off-chance that anyone else is using a similarly-dated
version of systemd, they might say, "hm, that sounds suspiciously
like what's happening with *us*."

  just trying to be helpful.

rday
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Looking for known memory leaks triggered by stress testing add/remove/up/down interfaces

2021-02-22 Thread Greg KH
On Mon, Feb 22, 2021 at 06:22:44AM -0500, Robert P. J. Day wrote:
> On Thu, 18 Feb 2021, Lennart Poettering wrote:
> 
> > On Do, 18.02.21 11:48, Robert P. J. Day (rpj...@crashcourse.ca) wrote:
> >
> > >   A colleague has reported the following apparent issue in a fairly
> > > old (v230) version of systemd -- this is in a Yocto Project Wind River
> > > Linux 9 build, hence the age of the package.
> > >
> > >   As reported to me (and I'm gathering more info), the system was
> > > being put through some "longevity testing" by repeatedly adding,
> > > removing, activating and de-activating network interfaces. According
> > > to the report, the result was heap space slowly but inexorably being
> > > consumed.
> > >
> > >   While waiting for more info, I'm going to examine the commit log for
> > > systemd from v230 moving forward to collect any commits that address
> > > memory leaks, then peruse them more carefully to see if they might
> > > resolve the problem.
> > >
> > >   I realize it's asking a bit for folks here to remember that far
> > > back, but does this issue sound at all familiar? Any pointers that
> > > might save me some time? Thanks.
> >
> > Note that our hash tables operate with an allocation cache: when
> > adding entries to them and then removing them again the memory
> > required for that is not returned to the OS but added to a local
> > cache. When the next entry is then added again, we recycle the
> > cached entry instead of asking for new memory again. This allocation
> > cache is a bit quicker than going to malloc() all the time, but
> > means if you just watch the heap you'll assume there's a leak even
> > though there isn't really, the memory is not lost after all, and
> > will be reused eventually if we need it.
> >
> > You may use the env var SYSTEMD_MEMPOOL=0 to turn this logic off,
> > but not sure v230 already knew that env var.
> 
>   well, we seem to have isolated the issue, here it is in a nutshell
> based on a condensed note i got from someone who tracked it down this
> weekend. the memory leak is triggered by:
> 
>   $ ssh root@ -p 830 -s netconf   [830 = netconf over SSH]
> 
> long story short, according to jemalloc profiling, there is a massive
> memory leak in DBUS code, to the tune of about 500M/day on a running
> system. i'm perusing the profiling output now, but does any of this
> sound even remotely familiar to anyone? i realize that's just a
> summary, but does anyone remember seeing something related to this
> once upon a time? [heavily-patched systemd_230 from wind river linux
> 9].

Given that this is a heavily patched system, please get support from the
vendor that provided this as you are paying for this.  Don't ask the
community to try to remember what happened with an old obsolete version
of software, that's crazy...

good luck!

greg k-h
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Looking for known memory leaks triggered by stress testing add/remove/up/down interfaces

2021-02-22 Thread Robert P. J. Day
On Thu, 18 Feb 2021, Lennart Poettering wrote:

> On Do, 18.02.21 11:48, Robert P. J. Day (rpj...@crashcourse.ca) wrote:
>
> >   A colleague has reported the following apparent issue in a fairly
> > old (v230) version of systemd -- this is in a Yocto Project Wind River
> > Linux 9 build, hence the age of the package.
> >
> >   As reported to me (and I'm gathering more info), the system was
> > being put through some "longevity testing" by repeatedly adding,
> > removing, activating and de-activating network interfaces. According
> > to the report, the result was heap space slowly but inexorably being
> > consumed.
> >
> >   While waiting for more info, I'm going to examine the commit log for
> > systemd from v230 moving forward to collect any commits that address
> > memory leaks, then peruse them more carefully to see if they might
> > resolve the problem.
> >
> >   I realize it's asking a bit for folks here to remember that far
> > back, but does this issue sound at all familiar? Any pointers that
> > might save me some time? Thanks.
>
> Note that our hash tables operate with an allocation cache: when
> adding entries to them and then removing them again the memory
> required for that is not returned to the OS but added to a local
> cache. When the next entry is then added again, we recycle the
> cached entry instead of asking for new memory again. This allocation
> cache is a bit quicker than going to malloc() all the time, but
> means if you just watch the heap you'll assume there's a leak even
> though there isn't really, the memory is not lost after all, and
> will be reused eventually if we need it.
>
> You may use the env var SYSTEMD_MEMPOOL=0 to turn this logic off,
> but not sure v230 already knew that env var.

  well, we seem to have isolated the issue, here it is in a nutshell
based on a condensed note i got from someone who tracked it down this
weekend. the memory leak is triggered by:

  $ ssh root@ -p 830 -s netconf   [830 = netconf over SSH]

long story short, according to jemalloc profiling, there is a massive
memory leak in DBUS code, to the tune of about 500M/day on a running
system. i'm perusing the profiling output now, but does any of this
sound even remotely familiar to anyone? i realize that's just a
summary, but does anyone remember seeing something related to this
once upon a time? [heavily-patched systemd_230 from wind river linux
9].
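
for anyone wanting to reproduce that kind of profiling, the usual jemalloc
recipe is roughly the following (library path, daemon path and dump prefix
are placeholders, not the exact setup used here, and jemalloc has to be
built with profiling enabled):

  $ LD_PRELOAD=/usr/lib/libjemalloc.so.2 \
    MALLOC_CONF=prof:true,prof_prefix:/tmp/jeprof,lg_prof_interval:30 \
    /path/to/daemon
  $ jeprof --text /path/to/daemon /tmp/jeprof.*.heap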

rday
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Looking for known memory leaks triggered by stress testing add/remove/up/down interfaces

2021-02-19 Thread Greg KH
On Fri, Feb 19, 2021 at 06:09:16PM +0200, Mantas Mikulėnas wrote:
> On Fri, Feb 19, 2021 at 4:49 PM Lennart Poettering 
> wrote:
> 
> > On Fr, 19.02.21 09:28, Robert P. J. Day (rpj...@crashcourse.ca) wrote:
> >
> > > i guess i expected that the CVE identifier would be in the commit
> > > message. anyway, time to examine ...
> >
> > CVEs are assigned/published long after the commits to fix the issues
> > are made. We cannot retroactively change git commits, that's just not
> > how this works.
> >
> 
> This *could* work with git notes, it seems --grep searches them as well.

Git notes do not work for anything but a local repo, sorry.

If people really care about CVEs, they know how to use them. But
really, they are mostly useless:

https://kernel-recipes.org/en/2019/talks/cves-are-dead-long-live-the-cve/

greg k-h
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Looking for known memory leaks triggered by stress testing add/remove/up/down interfaces

2021-02-19 Thread Robert P. J. Day
On Fri, 19 Feb 2021, Lennart Poettering wrote:

> On Fr, 19.02.21 18:09, Mantas Mikulėnas (graw...@gmail.com) wrote:
>
> > On Fri, Feb 19, 2021 at 4:49 PM Lennart Poettering 
> > wrote:
> >
> > > On Fr, 19.02.21 09:28, Robert P. J. Day (rpj...@crashcourse.ca) wrote:
> > >
> > > > i guess i expected that the CVE identifier would be in the commit
> > > > message. anyway, time to examine ...
> > >
> > > CVEs are assigned/published long after the commits to fix the issues
> > > are made. We cannot retroactively change git commits, that's just not
> > > how this works.
> > >
> >
> > This *could* work with git notes, it seems --grep searches them as well.
>
> We used to attach security + backport info via git notes onto our
> commits, but GitHub doesn't show or support them. They were
> basically invisible; no one knew they were there. Thus we eventually
> stopped doing them.
>
> If GitHub integrated git notes into their UI somehow, that would be
> grand.

  i suspect that won't happen any time soon:

https://www.quora.com/Why-does-GitHub-no-longer-support-git-notes

rday
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Looking for known memory leaks triggered by stress testing add/remove/up/down interfaces

2021-02-19 Thread Lennart Poettering
On Fr, 19.02.21 18:09, Mantas Mikulėnas (graw...@gmail.com) wrote:

> On Fri, Feb 19, 2021 at 4:49 PM Lennart Poettering 
> wrote:
>
> > On Fr, 19.02.21 09:28, Robert P. J. Day (rpj...@crashcourse.ca) wrote:
> >
> > > i guess i expected that the CVE identifier would be in the commit
> > > message. anyway, time to examine ...
> >
> > CVEs are assigned/published long after the commits to fix the issues
> > are made. We cannot retroactively change git commits, that's just not
> > how this works.
> >
>
> This *could* work with git notes, it seems --grep searches them as well.

We used to attach security + backport info via git notes onto our
commits, but GitHub doesn't show or support them. They were
basically invisible; no one knew they were there. Thus we eventually
stopped doing them.

If GitHub integrated git notes into their UI somehow, that would be
grand.

Lennart

--
Lennart Poettering, Berlin
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Looking for known memory leaks triggered by stress testing add/remove/up/down interfaces

2021-02-19 Thread Mantas Mikulėnas
On Fri, Feb 19, 2021 at 4:49 PM Lennart Poettering 
wrote:

> On Fr, 19.02.21 09:28, Robert P. J. Day (rpj...@crashcourse.ca) wrote:
>
> > i guess i expected that the CVE identifier would be in the commit
> > message. anyway, time to examine ...
>
> CVEs are assigned/published long after the commits to fix the issues
> are made. We cannot retroactively change git commits, that's just not
> how this works.
>

This *could* work with git notes, it seems --grep searches them as well.
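
Roughly (hypothetical note text, reusing the journald fix commit mentioned
elsewhere in this thread):

  $ git notes add -m "CVE-2019-3815" eb1ec489eef8
  $ git log --notes --grep="CVE-2019-3815"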

-- 
Mantas Mikulėnas
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Looking for known memory leaks triggered by stress testing add/remove/up/down interfaces

2021-02-19 Thread Robert P. J. Day
On Fri, 19 Feb 2021, Lennart Poettering wrote:

> On Fr, 19.02.21 09:28, Robert P. J. Day (rpj...@crashcourse.ca) wrote:
>
> > i guess i expected that the CVE identifier would be in the commit
> > message. anyway, time to examine ...
>
> CVEs are assigned/published long after the commits to fix the issues
> are made. We cannot retroactively change git commits, that's just not
> how this works.

  oh, i get that. i have my commit so it's time to see if it fixes the
problem. thanks all for the assistance.

rday
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Looking for known memory leaks triggered by stress testing add/remove/up/down interfaces

2021-02-19 Thread Lennart Poettering
On Fr, 19.02.21 09:28, Robert P. J. Day (rpj...@crashcourse.ca) wrote:

> i guess i expected that the CVE identifier would be in the commit
> message. anyway, time to examine ...

CVEs are assigned/published long after the commits to fix the issues
are made. We cannot retroactively change git commits, that's just not
how this works.

Lennart

--
Lennart Poettering, Berlin
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Looking for known memory leaks triggered by stress testing add/remove/up/down interfaces

2021-02-19 Thread Robert P. J. Day
On Fri, 19 Feb 2021, Reindl Harald wrote:

>
>
> On 19.02.21 11:28, Robert P. J. Day wrote:
> >I *may* have found the problem ... as one can read here:
> >
> > https://access.redhat.com/solutions/3840481
> >
> > "CVE-2019-3815 systemd: memory leak in journald-server.c introduced by
> > fix for CVE-2018-16864"
> >
> > So as I interpret that, the fix for that earlier CVE introduced a memory
> > leak, which was then tracked and fixed as the later CVE. I checked the state of
> > systemd_230 as shipped by WRL9, and it comes with an extensive set of
> > patches, which includes the earlier CVE, but *not* the later one.
> >
> >Hmmm ...
>
> that one should have been fixed long ago
> https://bugzilla.redhat.com/show_bug.cgi?id=1665931

  while i'm jumping back into this, a possibly silly question ...
where in the systemd commit log can i find the commit corresponding to
the "fix" associated with CVE-2019-3815?

  i wanted to see the original commit that introduced the bug, so i
naturally used:

  $ git log --grep="CVE-2018-16864"

and there it was. but i get nothing from:

  $ git log --grep="CVE-2019-3815"

after a bit of reading, i landed on:

  $ git log -p --grep="don't use overly large buffer"

  commit eb1ec489eef8a32918bbfc56a268c9d10464584d
  Author: Michal Sekletár 
  Date:   Tue Jan 22 14:29:50 2019 +0100

process-util: don't use overly large buffer to store process command line

Allocate new string as a return value and free our "scratch pad"
buffer that is potentially much larger than needed (up to
_SC_ARG_MAX).

Fixes #11502

... etc etc ...

i guess i expected that the CVE identifier would be in the commit
message. anyway, time to examine ...

rday

p.s. my first concern is whether this is a standalone patch that i can
shoehorn into systemd_230, or whether i can just drag the entire
version forward to the point where it incorporates that fix. that
might take a bit of work given the distance involved.
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Looking for known memory leaks triggered by stress testing add/remove/up/down interfaces

2021-02-19 Thread Robert P. J. Day
On Fri, 19 Feb 2021, Reindl Harald wrote:

>
>
> On 19.02.21 11:28, Robert P. J. Day wrote:
> >I *may* have found the problem ... as one can read here:
> >
> > https://access.redhat.com/solutions/3840481
> >
> > "CVE-2019-3815 systemd: memory leak in journald-server.c introduced by
> > fix for CVE-2018-16864"
> >
> > So as I interpret that, the fix for that earlier CVE introduced a memory
> > leak, which was then tracked and fixed as the later CVE. I checked the state of
> > systemd_230 as shipped by WRL9, and it comes with an extensive set of
> > patches, which includes the earlier CVE, but *not* the later one.
> >
> >Hmmm ...
>
> that one should have been fixed long ago
> https://bugzilla.redhat.com/show_bug.cgi?id=1665931

  yes, that fix is from a while ago, but the issue here is that it
wasn't incorporated in the patch set for wind river linux 9, which is
a few years old, so it's not at all surprising that WRL9 is not
keeping up with current patches.

rday
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Looking for known memory leaks triggered by stress testing add/remove/up/down interfaces

2021-02-19 Thread Reindl Harald




On 19.02.21 11:28, Robert P. J. Day wrote:

   I *may* have found the problem ... as one can read here:

https://access.redhat.com/solutions/3840481

"CVE-2019-3815 systemd: memory leak in journald-server.c introduced by
fix for CVE-2018-16864"

   So as I interpret that, the fix for that earlier CVE introduced a memory
leak, which was then tracked and fixed as the later CVE. I checked the state of
systemd_230 as shipped by WRL9, and it comes with an extensive set of
patches, which includes the earlier CVE, but *not* the later one.

   Hmmm ...


that one should have been fixed long ago
https://bugzilla.redhat.com/show_bug.cgi?id=1665931


but your original mail didn't talk about journald at all


 Forwarded Message 
Subject: [systemd-devel] Looking for known memory leaks triggered by 
stress testing add/remove/up/down interfaces

Date: Thu, 18 Feb 2021 11:48:58 -0500 (EST)
From: Robert P. J. Day 
To: Systemd mailing list 


  A colleague has reported the following apparent issue in a fairly
old (v230) version of systemd -- this is in a Yocto Project Wind River
Linux 9 build, hence the age of the package.

  As reported to me (and I'm gathering more info), the system was
being put through some "longevity testing" by repeatedly adding,
removing, activating and de-activating network interfaces. According
to the report, the result was heap space slowly but inexorably being
consumed.

  While waiting for more info, I'm going to examine the commit log for
systemd from v230 moving forward to collect any commits that address
memory leaks, then peruse them more carefully to see if they might
resolve the problem.

  I realize it's asking a bit for folks here to remember that far
back, but does this issue sound at all familiar? Any pointers that
might save me some time? Thanks.
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Looking for known memory leaks triggered by stress testing add/remove/up/down interfaces

2021-02-19 Thread Robert P. J. Day
On Thu, 18 Feb 2021, Lennart Poettering wrote:

> On Do, 18.02.21 11:48, Robert P. J. Day (rpj...@crashcourse.ca) wrote:
>
> >   A colleague has reported the following apparent issue in a fairly
> > old (v230) version of systemd -- this is in a Yocto Project Wind River
> > Linux 9 build, hence the age of the package.
> >
> >   As reported to me (and I'm gathering more info), the system was
> > being put through some "longevity testing" by repeatedly adding,
> > removing, activating and de-activating network interfaces. According
> > to the report, the result was heap space slowly but inexorably being
> > consumed.
> >
> >   While waiting for more info, I'm going to examine the commit log for
> > systemd from v230 moving forward to collect any commits that address
> > memory leaks, then peruse them more carefully to see if they might
> > resolve the problem.
> >
> >   I realize it's asking a bit for folks here to remember that far
> > back, but does this issue sound at all familiar? Any pointers that
> > might save me some time? Thanks.
>
> Note that our hash tables operate with an allocation cache: when
> adding entries to them and then removing them again the memory
> required for that is not returned to the OS but added to a local
> cache. When the next entry is then added again, we recycle the cached
> entry instead of asking for new memory again. This allocation cache is
> a bit quicker than going to malloc() all the time, but means if you
> just watch the heap you'll assume there's a leak even though there
> isn't really, the memory is not lost after all, and will be reused
> eventually if we need it.
>
> You may use the env var SYSTEMD_MEMPOOL=0 to turn this logic off, but
> not sure v230 already knew that env var.

  I *may* have found the problem ... as one can read here:

https://access.redhat.com/solutions/3840481

"CVE-2019-3815 systemd: memory leak in journald-server.c introduced by
fix for CVE-2018-16864"

  So as I interpret that, the fix for that earlier CVE introduced a memory
leak, which was then tracked and fixed as the later CVE. I checked the state of
systemd_230 as shipped by WRL9, and it comes with an extensive set of
patches, which includes the earlier CVE, but *not* the later one.

  Hmmm ...

rday
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Looking for known memory leaks triggered by stress testing add/remove/up/down interfaces

2021-02-19 Thread Robert P. J. Day
On Thu, 18 Feb 2021, Lennart Poettering wrote:

> On Do, 18.02.21 11:48, Robert P. J. Day (rpj...@crashcourse.ca) wrote:
>
> >   A colleague has reported the following apparent issue in a fairly
> > old (v230) version of systemd -- this is in a Yocto Project Wind River
> > Linux 9 build, hence the age of the package.
> >
> >   As reported to me (and I'm gathering more info), the system was
> > being put through some "longevity testing" by repeatedly adding,
> > removing, activating and de-activating network interfaces. According
> > to the report, the result was heap space slowly but inexorably being
> > consumed.
> >
> >   While waiting for more info, I'm going to examine the commit log for
> > systemd from v230 moving forward to collect any commits that address
> > memory leaks, then peruse them more carefully to see if they might
> > resolve the problem.
> >
> >   I realize it's asking a bit for folks here to remember that far
> > back, but does this issue sound at all familiar? Any pointers that
> > might save me some time? Thanks.
>
> Note that our hash tables operate with an allocation cache: when
> adding entries to them and then removing them again the memory
> required for that is not returned to the OS but added to a local
> cache. When the next entry is then added again, we recycle the cached
> entry instead of asking for new memory again. This allocation cache is
> a bit quicker than going to malloc() all the time, but means if you
> just watch the heap you'll assume there's a leak even though there
> isn't really, the memory is not lost after all, and will be reused
> eventually if we need it.
>
> You may use the env var SYSTEMD_MEMPOOL=0 to turn this logic off, but
> not sure v230 already knew that env var.

  One more observation before I dive head-first into debugging this --
I just logged into the embedded system in question after several
hours, and "top" shows systemd, RES = 2.706g ... 2.707g ... 2.708g,
bumping up every 30 seconds or so, so something is definitely eating
memory.
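
A crude way to log that growth without sitting in top (standard /proc
interface, nothing systemd-specific):

  $ while sleep 30; do date; grep VmRSS /proc/1/status; done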

rday
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Looking for known memory leaks triggered by stress testing add/remove/up/down interfaces

2021-02-18 Thread Robert P. J. Day
On Thu, 18 Feb 2021, Lennart Poettering wrote:

> On Do, 18.02.21 11:48, Robert P. J. Day (rpj...@crashcourse.ca) wrote:
>
> >   A colleague has reported the following apparent issue in a fairly
> > old (v230) version of systemd -- this is in a Yocto Project Wind River
> > Linux 9 build, hence the age of the package.
> >
> >   As reported to me (and I'm gathering more info), the system was
> > being put through some "longevity testing" by repeatedly adding,
> > removing, activating and de-activating network interfaces. According
> > to the report, the result was heap space slowly but inexorably being
> > consumed.
> >
> >   While waiting for more info, I'm going to examine the commit log for
> > systemd from v230 moving forward to collect any commits that address
> > memory leaks, then peruse them more carefully to see if they might
> > resolve the problem.
> >
> >   I realize it's asking a bit for folks here to remember that far
> > back, but does this issue sound at all familiar? Any pointers that
> > might save me some time? Thanks.
>
> Note that our hash tables operate with an allocation cache: when
> adding entries to them and then removing them again the memory
> required for that is not returned to the OS but added to a local
> cache. When the next entry is then added again, we recycle the cached
> entry instead of asking for new memory again. This allocation cache is
> a bit quicker than going to malloc() all the time, but means if you
> just watch the heap you'll assume there's a leak even though there
> isn't really, the memory is not lost after all, and will be reused
> eventually if we need it.
>
> You may use the env var SYSTEMD_MEMPOOL=0 to turn this logic off, but
> not sure v230 already knew that env var.

  i don't think that's it, as i was told that, eventually, the system
crashes due to lack of memory. here's a snippet from "top" from about
an hour ago:

 PID USER  PR  NI    VIRT    RES    SHR S  %CPU %MEM TIME+ COMMAND

... snip ...

    1 root  20   0 1807772 1.699g   3572 S   0.0  2.7   3:05.78 systemd

as you can see, systemd is apparently already sucking up 1.7G, and i
was also told that this eventually climbs into the terabyte range before
the system falls over. so that doesn't sound like just the allocation cache.
i'm just about to start perusing the commit log since v230 to see if
anything looks appropriate.
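
a rough way to do that kind of search (the upper bound is just whatever is
current at the time):

  $ git log --oneline -i --grep=leak v230..HEAD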

rday
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Looking for known memory leaks triggered by stress testing add/remove/up/down interfaces

2021-02-18 Thread Lennart Poettering
On Do, 18.02.21 11:48, Robert P. J. Day (rpj...@crashcourse.ca) wrote:

>   A colleague has reported the following apparent issue in a fairly
> old (v230) version of systemd -- this is in a Yocto Project Wind River
> Linux 9 build, hence the age of the package.
>
>   As reported to me (and I'm gathering more info), the system was
> being put through some "longevity testing" by repeatedly adding,
> removing, activating and de-activating network interfaces. According
> to the report, the result was heap space slowly but inexorably being
> consumed.
>
>   While waiting for more info, I'm going to examine the commit log for
> systemd from v230 moving forward to collect any commits that address
> memory leaks, then peruse them more carefully to see if they might
> resolve the problem.
>
>   I realize it's asking a bit for folks here to remember that far
> back, but does this issue sound at all familiar? Any pointers that
> might save me some time? Thanks.

Note that our hash tables operate with an allocation cache: when
adding entries to them and then removing them again the memory
required for that is not returned to the OS but added to a local
cache. When the next entry is then added again, we recycle the cached
entry instead of asking for new memory again. This allocation cache is
a bit quicker than going to malloc() all the time, but means if you
just watch the heap you'll assume there's a leak even though there
isn't really, the memory is not lost after all, and will be reused
eventually if we need it.

You may use the env var SYSTEMD_MEMPOOL=0 to turn this logic off, but
not sure v230 already knew that env var.
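
A rough way to test whether observed growth is just this cache, assuming
the version in use already honours the variable (the networkd drop-in below
is only an illustrative unit; for PID 1 itself, an unrecognised VAR=value
word on the kernel command line ends up in init's environment):

  # PID 1: add to the kernel command line and reboot
  SYSTEMD_MEMPOOL=0

  # any other systemd daemon, e.g. systemd-networkd, via a drop-in:
  $ mkdir -p /etc/systemd/system/systemd-networkd.service.d
  $ printf '[Service]\nEnvironment=SYSTEMD_MEMPOOL=0\n' \
      > /etc/systemd/system/systemd-networkd.service.d/mempool.conf
  $ systemctl daemon-reload && systemctl restart systemd-networkd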

Lennart

--
Lennart Poettering, Berlin
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] Looking for known memory leaks triggered by stress testing add/remove/up/down interfaces

2021-02-18 Thread Robert P. J. Day


  A colleague has reported the following apparent issue in a fairly
old (v230) version of systemd -- this is in a Yocto Project Wind River
Linux 9 build, hence the age of the package.

  As reported to me (and I'm gathering more info), the system was
being put through some "longevity testing" by repeatedly adding,
removing, activating and de-activating network interfaces. According
to the report, the result was heap space slowly but inexorably being
consumed.

  While waiting for more info, I'm going to examine the commit log for
systemd from v230 moving forward to collect any commits that address
memory leaks, then peruse them more carefully to see if they might
resolve the problem.

  I realize it's asking a bit for folks here to remember that far
back, but does this issue sound at all familiar? Any pointers that
might save me some time? Thanks.

rday
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel