Re: [systemd-devel] Looking for known memory leaks triggered by stress testing add/remove/up/down interfaces
On Mon, Feb 22, 2021 at 8:22 AM Robert P. J. Day wrote:
> well, we seem to have isolated the issue [...] long story short,
> according to jemalloc profiling, there is a massive memory leak in
> DBUS code,

OK, give that data to whoever supports your system; you are not giving us anything useful.

Now, can it have memory leaks? Yeah, it could. However, I have reviewed the code (admittedly a long time ago), and leaks in the systemd binaries are usually limited to error paths that are not exercised often, and are frankly on the path to extinction. What you see most of the time are not leaks in the code, but leaks in understanding of memory-management techniques.

___ systemd-devel mailing list systemd-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Looking for known memory leaks triggered by stress testing add/remove/up/down interfaces
On Mo, 22.02.21 06:22, Robert P. J. Day (rpj...@crashcourse.ca) wrote:
> well, we seem to have isolated the issue [...] long story short,
> according to jemalloc profiling, there is a massive memory leak in
> DBUS code, to the tune of about 500M/day on a running system. [...]
> [heavily-patched systemd_230 from wind river linux 9]

On really old kernels we didn't get reliable notification of cgroup empty events, hence .scope units would stick around on login sessions under some conditions. We added workarounds later on for that. But seriously, this is so long ago... no idea.

Lennart

--
Lennart Poettering, Berlin
Re: [systemd-devel] Looking for known memory leaks triggered by stress testing add/remove/up/down interfaces
On Mon, 22 Feb 2021, Greg KH wrote:
> On Mon, Feb 22, 2021 at 06:22:44AM -0500, Robert P. J. Day wrote:
> > [...]
>
> Given that this is a heavily patched system, please get support from
> the vendor that provided this as you are paying for this. Don't ask
> the community to try to remember what happened with an old obsolete
> version of software, that's crazy...

that's already in the pipeline, i was simply asking if anyone had ever *seen* this before, just so we might be able to say, "hey, we're not the first this has happened to." also, on the off-chance that anyone else is using a similarly-dated version of systemd, they might say, "hm, that sounds suspiciously like what's happening with *us*." just trying to be helpful.

rday
Re: [systemd-devel] Looking for known memory leaks triggered by stress testing add/remove/up/down interfaces
On Mon, Feb 22, 2021 at 06:22:44AM -0500, Robert P. J. Day wrote:
> [...] long story short, according to jemalloc profiling, there is a
> massive memory leak in DBUS code, to the tune of about 500M/day on a
> running system. [...] [heavily-patched systemd_230 from wind river
> linux 9]

Given that this is a heavily patched system, please get support from the vendor that provided this as you are paying for this. Don't ask the community to try to remember what happened with an old obsolete version of software, that's crazy...

good luck!

greg k-h
Re: [systemd-devel] Looking for known memory leaks triggered by stress testing add/remove/up/down interfaces
On Thu, 18 Feb 2021, Lennart Poettering wrote:
> Note that our hash tables operate with an allocation cache [...]
>
> You may use the env var SYSTEMD_MEMPOOL=0 to turn this logic off,
> but not sure v230 already knew that env var.

well, we seem to have isolated the issue, here it is in a nutshell based on a condensed note i got from someone who tracked it down this weekend.
the memory leak is triggered by:

$ ssh root@ -p 830 -s netconf [830 = netconf over SSH]

long story short, according to jemalloc profiling, there is a massive memory leak in DBUS code, to the tune of about 500M/day on a running system. i'm perusing the profiling output now, but does any of this sound even remotely familiar to anyone? i realize that's just a summary, but does anyone remember seeing something related to this once upon a time? [heavily-patched systemd_230 from wind river linux 9].

rday
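For readers who want to reproduce this kind of measurement, here is a rough sketch of how heap profiling like the above is typically enabled with jemalloc. The option names are jemalloc's standard profiling knobs (they require a jemalloc built with --enable-prof); the library path, daemon name, and dump prefix are purely illustrative:

```shell
# Sketch: enabling jemalloc heap profiling for a process under test.
# Library path, daemon name, and dump prefix below are illustrative,
# not taken from the system discussed in this thread.
#
#   export LD_PRELOAD=/usr/lib/libjemalloc.so.2
#   export MALLOC_CONF="prof:true,lg_prof_interval:30,prof_prefix:/tmp/jeprof"
#   /usr/bin/some-daemon
#
# prof:true turns the profiler on, lg_prof_interval:30 dumps a profile
# roughly every 2^30 bytes allocated, and prof_prefix sets the dump
# path. Successive dumps can then be compared to see which call sites
# keep growing:
#
#   jeprof --text /usr/bin/some-daemon /tmp/jeprof.*.heap
```

Comparing two dumps taken hours apart is what turns "the heap grew" into "this call site grew", which is presumably how the DBUS attribution above was made.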
Re: [systemd-devel] Looking for known memory leaks triggered by stress testing add/remove/up/down interfaces
On Fri, Feb 19, 2021 at 06:09:16PM +0200, Mantas Mikulėnas wrote:
> On Fri, Feb 19, 2021 at 4:49 PM Lennart Poettering wrote:
> > CVEs are assigned/published long after the commits to fix the
> > issues are made. We cannot retroactively change git commits, that's
> > just not how this works.
>
> This *could* work with git notes, it seems --grep searches them as
> well.

Git notes do not work for anything but a local repo, sorry.

If people really care about CVEs, they know how to use them. But really, they are mostly useless: https://kernel-recipes.org/en/2019/talks/cves-are-dead-long-live-the-cve/

greg k-h
Re: [systemd-devel] Looking for known memory leaks triggered by stress testing add/remove/up/down interfaces
On Fri, 19 Feb 2021, Lennart Poettering wrote:
> We used to attach security + backport info via git notes onto our
> commits, but github doesn't show them/support them. They were
> basically invisible, no one knew they were there. Thus we eventually
> stopped doing them.
>
> If github would integrate git notes into their UI somehow this would
> be grand.

i suspect that won't happen any time soon:

https://www.quora.com/Why-does-GitHub-no-longer-support-git-notes

rday
Re: [systemd-devel] Looking for known memory leaks triggered by stress testing add/remove/up/down interfaces
On Fr, 19.02.21 18:09, Mantas Mikulėnas (graw...@gmail.com) wrote:
> This *could* work with git notes, it seems --grep searches them as
> well.

We used to attach security + backport info via git notes onto our commits, but github doesn't show them/support them. They were basically invisible, no one knew they were there. Thus we eventually stopped doing them.

If github would integrate git notes into their UI somehow this would be grand.

Lennart

--
Lennart Poettering, Berlin
Re: [systemd-devel] Looking for known memory leaks triggered by stress testing add/remove/up/down interfaces
On Fri, Feb 19, 2021 at 4:49 PM Lennart Poettering wrote:
> CVEs are assigned/published long after the commits to fix the issues
> are made. We cannot retroactively change git commits, that's just not
> how this works.

This *could* work with git notes, it seems --grep searches them as well.

--
Mantas Mikulėnas
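Mantas's suggestion is easy to try out. A small self-contained demo (throwaway repo, illustrative commit message and CVE id) showing a note being attached to an existing commit after the fact, with no history rewrite:

```shell
# Create a throwaway repo with one commit, then attach a CVE id to it
# via git notes. The commit message and CVE id are illustrative.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "you"
git commit -q --allow-empty -m "journal: fix buffer handling"
git notes add -m "CVE-2019-3815"

# The note is displayed alongside the commit message:
hits=$(git log --notes | grep -c "CVE-2019-3815")
echo "note matches: $hits"
```

Whether `git log --grep` also matches the notes text depends on the git version and options in use; displaying notes with `--notes` as above is the reliably documented behavior, and Greg's caveat stands: notes live only in the local repo unless the notes refs are explicitly pushed and fetched.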
Re: [systemd-devel] Looking for known memory leaks triggered by stress testing add/remove/up/down interfaces
On Fri, 19 Feb 2021, Lennart Poettering wrote:
> CVEs are assigned/published long after the commits to fix the issues
> are made. We cannot retroactively change git commits, that's just not
> how this works.

oh, i get that. i have my commit so it's time to see if it fixes the problem. thanks all for the assistance.

rday
Re: [systemd-devel] Looking for known memory leaks triggered by stress testing add/remove/up/down interfaces
On Fr, 19.02.21 09:28, Robert P. J. Day (rpj...@crashcourse.ca) wrote:
> i guess i expected that the CVE identifier would be in the commit
> message. anyway, time to examine ...

CVEs are assigned/published long after the commits to fix the issues are made. We cannot retroactively change git commits, that's just not how this works.

Lennart

--
Lennart Poettering, Berlin
Re: [systemd-devel] Looking for known memory leaks triggered by stress testing add/remove/up/down interfaces
On Fri, 19 Feb 2021, Reindl Harald wrote:
> Am 19.02.21 um 11:28 schrieb Robert P. J. Day:
> > I *may* have found the problem ... [...]
>
> that one should have been fixed long ago
> https://bugzilla.redhat.com/show_bug.cgi?id=1665931

while i'm jumping back into this, a possibly silly question ... where in the systemd commit log can i find the commit corresponding to the "fix" associated with CVE-2019-3815? i wanted to see the original commit that introduced the bug, so i naturally used:

$ git log --grep="CVE-2018-16864"

and there it was. but i get nothing from:

$ git log --grep="CVE-2019-3815"

after a bit of reading, i landed on:

$ git log -p --grep="don't use overly large buffer"

commit eb1ec489eef8a32918bbfc56a268c9d10464584d
Author: Michal Sekletár
Date:   Tue Jan 22 14:29:50 2019 +0100

    process-util: don't use overly large buffer to store process command line

    Allocate new string as a return value and free our "scratch pad"
    buffer that is potentially much larger than needed (up to _SC_ARG_MAX).

    Fixes #11502

... etc etc ...

i guess i expected that the CVE identifier would be in the commit message. anyway, time to examine ...

rday

p.s. my first concern is whether this is a standalone patch that i can shoehorn into systemd_230, or whether i can just drag the entire version forward to the point where it incorporates that fix. that might take a bit of work given the distance involved.
Re: [systemd-devel] Looking for known memory leaks triggered by stress testing add/remove/up/down interfaces
On Fri, 19 Feb 2021, Reindl Harald wrote:
> Am 19.02.21 um 11:28 schrieb Robert P. J. Day:
> > I *may* have found the problem ... [...]
>
> that one should have been fixed long ago
> https://bugzilla.redhat.com/show_bug.cgi?id=1665931

yes, that fix is from a while ago, but the issue here is that it wasn't incorporated in the patch set for wind river linux 9, which is a few years old, so it's not at all surprising that WRL9 is not keeping up with current patches.

rday
Re: [systemd-devel] Looking for known memory leaks triggered by stress testing add/remove/up/down interfaces
On 19.02.21 at 11:28, Robert P. J. Day wrote:
> I *may* have found the problem ... as one can read here:
>
> https://access.redhat.com/solutions/3840481
>
> "CVE-2019-3815 systemd: memory leak in journald-server.c introduced
> by fix for CVE-2018-16864"
>
> [...]

that one should have been fixed long ago: https://bugzilla.redhat.com/show_bug.cgi?id=1665931

but your original mail didn't talk about journald at all

Forwarded message
Subject: [systemd-devel] Looking for known memory leaks triggered by stress testing add/remove/up/down interfaces
Date: Thu, 18 Feb 2021 11:48:58 -0500 (EST)
From: Robert P. J. Day
To: Systemd mailing list

[...]
Re: [systemd-devel] Looking for known memory leaks triggered by stress testing add/remove/up/down interfaces
On Thu, 18 Feb 2021, Lennart Poettering wrote:
> Note that our hash tables operate with an allocation cache [...]
>
> You may use the env var SYSTEMD_MEMPOOL=0 to turn this logic off, but
> not sure v230 already knew that env var.

I *may* have found the problem ... as one can read here:

https://access.redhat.com/solutions/3840481

"CVE-2019-3815 systemd: memory leak in journald-server.c introduced by fix for CVE-2018-16864"

So as I interpret that, a memory leak introduced by the fix for the earlier CVE had to be corrected by the fix for the later one. I checked the state of systemd_230 as shipped by WRL9, and it comes with an extensive set of patches, which includes the fix for the earlier CVE, but *not* the fix for the later one.

Hmmm ...

rday
Re: [systemd-devel] Looking for known memory leaks triggered by stress testing add/remove/up/down interfaces
On Thu, 18 Feb 2021, Lennart Poettering wrote:
> Note that our hash tables operate with an allocation cache [...]
>
> You may use the env var SYSTEMD_MEMPOOL=0 to turn this logic off, but
> not sure v230 already knew that env var.

One more observation before I dive head-first into debugging this -- I just logged into the embedded system in question after several hours, and "top" shows systemd, RES = 2.706g ... 2.707g ... 2.708g, bumping up every 30 seconds or so, so something is definitely eating memory.

rday
Re: [systemd-devel] Looking for known memory leaks triggered by stress testing add/remove/up/down interfaces
On Thu, 18 Feb 2021, Lennart Poettering wrote:
> Note that our hash tables operate with an allocation cache [...]
>
> You may use the env var SYSTEMD_MEMPOOL=0 to turn this logic off, but
> not sure v230 already knew that env var.

i don't think that's it, as i was told that, eventually, the system crashes due to lack of memory. here's a snippet from "top" from about an hour ago:

  PID USER PR NI    VIRT    RES  SHR S %CPU %MEM   TIME+ COMMAND
  ... snip ...
    1 root 20  0 1807772 1.699g 3572 S  0.0  2.7 3:05.78 systemd

as you can see, systemd is apparently already sucking up 1.7G, and i was also told that this eventually gets into the terabyte range before the system falls over. so that doesn't sound like it. i'm just about to start perusing the commit log since v230 to see if anything looks appropriate.

rday
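A lightweight way to confirm growth like this without sitting in "top" is to sample the process's VmRSS from /proc directly. A sketch (PID 1 matches the systemd case above; the 30-second interval is arbitrary):

```shell
# Print the resident set size (in kB) of a process, read from /proc.
rss_kib() {
    awk '/^VmRSS:/ {print $2}' "/proc/$1/status"
}

# Sample the current shell once, as a demonstration:
rss_kib $$

# To watch PID 1 over time, something like:
#   while sleep 30; do echo "$(date +%T) $(rss_kib 1) kB"; done
```

Logging samples this way gives a growth rate in kB per interval, which is easier to compare against the "500M/day" figure quoted earlier in the thread than eyeballing "top".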
Re: [systemd-devel] Looking for known memory leaks triggered by stress testing add/remove/up/down interfaces
On Do, 18.02.21 11:48, Robert P. J. Day (rpj...@crashcourse.ca) wrote:
> A colleague has reported the following apparent issue in a fairly
> old (v230) version of systemd [...]
>
> I realize it's asking a bit for folks here to remember that far
> back, but does this issue sound at all familiar? Any pointers that
> might save me some time? Thanks.

Note that our hash tables operate with an allocation cache: when adding entries to them and then removing them again, the memory required for that is not returned to the OS but added to a local cache. When the next entry is then added, we recycle the cached entry instead of asking for new memory again. This allocation cache is a bit quicker than going to malloc() all the time, but it means that if you just watch the heap you'll assume there's a leak even though there isn't really; the memory is not lost after all, and will be reused eventually if we need it.

You may use the env var SYSTEMD_MEMPOOL=0 to turn this logic off, but not sure v230 already knew that env var.

Lennart

--
Lennart Poettering, Berlin
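For anyone wanting to try this: since the variable has to be in PID 1's own environment to affect PID 1, setting it in a login shell is not enough. A sketch of how it is usually applied (whether the kernel-command-line route reaches PID 1's own environment on a given systemd version is an assumption worth verifying):

```shell
# On the kernel command line (verify this reaches PID 1's own
# environment on your systemd version before relying on it):
#
#   systemd.setenv=SYSTEMD_MEMPOOL=0
#
# For systemd's non-PID-1 programs, exporting the variable before
# starting them is enough. A trivial check that a child process sees it:
out=$(env SYSTEMD_MEMPOOL=0 sh -c 'echo "$SYSTEMD_MEMPOOL"')
echo "child sees SYSTEMD_MEMPOOL=$out"
```

With the mempool disabled, heap measurements from "top" or /proc reflect real usage, which is exactly the ambiguity Lennart describes above.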
[systemd-devel] Looking for known memory leaks triggered by stress testing add/remove/up/down interfaces
A colleague has reported the following apparent issue in a fairly old (v230) version of systemd -- this is in a Yocto Project Wind River Linux 9 build, hence the age of the package.

As reported to me (and I'm gathering more info), the system was being put through some "longevity testing" by repeatedly adding, removing, activating and de-activating network interfaces. According to the report, the result was heap space slowly but inexorably being consumed.

While waiting for more info, I'm going to examine the commit log for systemd from v230 moving forward to collect any commits that address memory leaks, then peruse them more carefully to see if they might resolve the problem.

I realize it's asking a bit for folks here to remember that far back, but does this issue sound at all familiar? Any pointers that might save me some time? Thanks.

rday
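The commit-log sweep described above can be scripted. The search pattern and version range below are illustrative (the real command would be run inside a systemd checkout); the throwaway-repo portion just demonstrates that the `--grep` filter behaves as expected:

```shell
# In a systemd checkout, something like this collects candidate commits:
#
#   git log --oneline -i --extended-regexp --grep='memory leak|memleak' v230..HEAD

# Self-contained demonstration of the same filter in a throwaway repo,
# with illustrative commit messages:
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "you"
git commit -q --allow-empty -m "network: fix memory leak on interface removal"
git commit -q --allow-empty -m "docs: typo fixes"
hits=$(git log --oneline -i --grep='memory leak' | wc -l)
echo "candidate commits: $hits"
```

As the later messages in the thread note, `--grep` matches only the commit message body, so fixes whose CVE ids were assigned after the fact will not turn up this way.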