[Bug 1926081] Re: nr_writeback memory leak in kernel 4.15.0-137+

2021-05-04 Thread Stefan Bader
*** This bug is a duplicate of bug 1926808 ***
https://bugs.launchpad.net/bugs/1926808

All 4 patches were already applied via "Bionic update: upstream stable
patchset 2021-04-30" (bug #1926808):


e1e50ec65274 mm: memcontrol: fix NR_WRITEBACK leak in memcg and system stats
73b14d6718b0 mm: memcg: make sure memory.events is uptodate when waking pollers
03c3e163ebe9 mem_cgroup: make sure moving_account, move_lock_task and stat_cpu in the same cacheline
09939b6b2c01 mm: fix oom_kill event handling

** This bug has been marked a duplicate of bug 1926808
   Bionic update: upstream stable patchset 2021-04-30

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1926081

Title:
  nr_writeback memory leak in kernel 4.15.0-137+

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1926081/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1926081] Re: nr_writeback memory leak in kernel 4.15.0-137+

2021-04-29 Thread Tim Gardner
Patches submitted: https://lists.ubuntu.com/archives/kernel-team/2021-April/119661.html

** Changed in: linux (Ubuntu Bionic)
 Assignee: (unassigned) => Tim Gardner (timg-tpi)

[Bug 1926081] Re: nr_writeback memory leak in kernel 4.15.0-137+

2021-04-29 Thread Tim Gardner
Andrew - thanks for the testing and results. I'll get these patches
submitted for review.

[Bug 1926081] Re: nr_writeback memory leak in kernel 4.15.0-137+

2021-04-29 Thread Andrew Taylor
Installed and booted into the newly supplied kernel:

root@jenkins-lon02-02-general-swarm-node-03:~# dpkg -l | grep linux | grep 144
ii  linux-image-unsigned-4.15.0-144-generic  4.15.0-144.148~LP1926081.2  amd64  Linux kernel image for version 4.15.0 on 64 bit x86 SMP
ii  linux-modules-4.15.0-144-generic         4.15.0-144.148~LP1926081.2  amd64  Linux kernel extra modules for version 4.15.0 on 64 bit x86 SMP

root@jenkins-lon02-02-general-swarm-node-03:~# uname -r
4.15.0-144-generic

root@jenkins-lon02-02-general-swarm-node-03:~# uptime
 08:06:22 up 19 min,  1 user,  load average: 0.74, 0.31, 0.21

root@jenkins-lon02-02-general-swarm-node-03:~# ls -l /boot/
total 201933
-rw-r--r-- 1 root root  1537161 Jul 17  2018 abi-4.15.0-29-generic
-rw-r--r-- 1 root root   217414 Mar 24 12:47 config-4.15.0-141-generic
-rw-r--r-- 1 root root   217426 Apr 28 09:46 config-4.15.0-144-generic
-rw-r--r-- 1 root root   216807 Jul 17  2018 config-4.15.0-29-generic
-rw-r--r-- 1 root root   237757 Apr 12 17:02 config-5.4.0-72-generic
drwxr-xr-x 5 root root     1024 Apr 29 04:06 grub
-rw-r--r-- 1 root root 43066106 Apr 15 09:40 initrd.img-4.15.0-141-generic
-rw-r--r-- 1 root root 24589989 Apr 29 04:06 initrd.img-4.15.0-144-generic
-rw-r--r-- 1 root root 39975786 Apr 15 08:14 initrd.img-4.15.0-29-generic
-rw-r--r-- 1 root root 44490420 Apr 27 04:50 initrd.img-5.4.0-72-generic
drwx------ 2 root root    12288 Apr 15 07:55 lost+found
-rw-r--r-- 1 root root        0 Jul 17  2018 retpoline-4.15.0-29-generic
-rw------- 1 root root  4081420 Mar 24 12:47 System.map-4.15.0-141-generic
-rw------- 1 root root  4081469 Apr 28 09:46 System.map-4.15.0-144-generic
-rw------- 1 root root  4040379 Jul 17  2018 System.map-4.15.0-29-generic
-rw------- 1 root root  4585968 Apr 12 17:02 System.map-5.4.0-72-generic
-rw------- 1 root root  8445600 Mar 24 11:00 vmlinuz-4.15.0-141-generic
-rw------- 1 root root  8447776 Apr 28 09:46 vmlinuz-4.15.0-144-generic
-rw------- 1 root root  8257272 Jul 17  2018 vmlinuz-4.15.0-29-generic
-rw------- 1 root root  9445632 Apr 12 17:15 vmlinuz-5.4.0-72-generic

I've run my test through 5 times, and there has been no observed leak in
the writeback queue/memory used, so everything still looks good from my
side.

[Bug 1926081] Re: nr_writeback memory leak in kernel 4.15.0-137+

2021-04-29 Thread Andrew Taylor
Tim,

I'll cordon off the machine we've been using again and run some tests
with this new kernel; hopefully I'll be able to do it today.

[Bug 1926081] Re: nr_writeback memory leak in kernel 4.15.0-137+

2021-04-28 Thread Tim Gardner
Andrew - It was suggested that there are a couple more patches relevant
to this issue:

mm: fix oom_kill event handling
mem_cgroup: make sure moving_account, move_lock_task and stat_cpu in the same cacheline
mm: memcg: make sure memory.events is uptodate when waking pollers
mm: memcontrol: fix NR_WRITEBACK leak in memcg and system stats

Please test the kernel at
https://kernel.ubuntu.com/~rtg/LP%231926081/4.15.0-144.148~LP1926081.2/amd64/
to make sure it still solves your issue.

[Bug 1926081] Re: nr_writeback memory leak in kernel 4.15.0-137+

2021-04-28 Thread Tim Gardner
** Also affects: linux (Ubuntu Bionic)
   Importance: Undecided
   Status: New

** Changed in: linux (Ubuntu Bionic)
   Status: New => In Progress

** Changed in: linux (Ubuntu Bionic)
   Importance: Undecided => Medium

[Bug 1926081] Re: nr_writeback memory leak in kernel 4.15.0-137+

2021-04-28 Thread Tim Gardner
Patches sent: https://lists.ubuntu.com/archives/kernel-team/2021-April/119627.html

[Bug 1926081] Re: nr_writeback memory leak in kernel 4.15.0-137+

2021-04-28 Thread Tim Gardner
** Description changed:

+ SRU Justification
  
- Ubuntu 18.04.5 4.15.0 LTS kernels at version 4.15.0-137 and above contain a 
memory leak due to the inclusion of patch from the upstream kernel, but not the 
fix for that patch which was released later.
+ [Impact]
+ 
+ Ubuntu 18.04.5 4.15.0 LTS kernels at version 4.15.0-137 and above
+ contain a memory leak due to the inclusion of a patch from the upstream
+ kernel, but not the fix for that patch which was released later.
+ 
+ Bad patch in bionic:linux 2c17fa778db85644458b52a7df8eacc402cbc1ef mm:
+ memcontrol: fix excessive complexity in memory.stat reporting
  
  This issue manifests itself as an increasing amount of memory used by
  the writeback queue, which never returns to zero. This can be seen
  either as the value of `nr_writeback` in /proc/vmstat, or the value of
  `Writeback` in /proc/meminfo.
  
  Ordinarily these values should be at or around zero, but on our servers
  we observe the `nr_writeback` value increasing to over 8 million, (32GB
  of memory), at which point it isn't long before the system IO slows to a
  crawl (tens of Kb/s). Our servers have 256GB of memory, and are
  performing many CI related activities - this issue appears to be related
  to concurrent writing to disk, and can be demonstrated with a simple
  testcase (see later).
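To see where a system stands, the counter can be read directly and converted from pages to memory. This is a minimal sketch that is not part of the original report: `vmstat_field` and `pages_to_mib` are hypothetical helper names, and the page size is taken from `getconf PAGESIZE` (4096 bytes on x86_64). With 4 KiB pages, 8,000,000 pages × 4096 bytes = 32,768,000,000 bytes (31,250 MiB), which is roughly the 32GB quoted above.

```shell
#!/usr/bin/env bash
# Sketch: read nr_writeback and express it in MiB.
# Helper names are hypothetical; nothing here comes from the bug report.

# Print the value of counter NAME from a vmstat-style file
# (default /proc/vmstat). Exit status is 1 if the counter is absent.
vmstat_field() {
  local name=$1 file=${2:-/proc/vmstat}
  awk -v key="$name" '$1 == key { print $2; found = 1 } END { exit !found }' "$file"
}

# Convert a page count to whole MiB (rounded down) for a page size in bytes.
pages_to_mib() {
  local pages=$1 page_size=${2:-$(getconf PAGESIZE)}
  echo $(( pages * page_size / 1048576 ))
}

# Example on a live system:
#   pages=$(vmstat_field nr_writeback)
#   echo "nr_writeback: ${pages} pages, $(pages_to_mib "$pages") MiB"
#   grep '^Writeback:' /proc/meminfo   # same quantity, reported in kB
```

For the 2641 pages reported further down in this thread, `pages_to_mib 2641 4096` prints 10, i.e. about 10 MiB stuck after a single run.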
  
  On our heavily used systems this memory leak can result in an unstable
  server after 2-3 days, requiring a reboot to fix it.
  
  After much investigation the issue appears to be because the patch "mm:
  memcontrol: fix excessive complexity in memory.stat reporting" was
  brought in to the 4.15.0-137 Ubuntu kernel (see
  https://launchpad.net/ubuntu/+source/linux/4.15.0-137.141) as part of
  "Bionic update: upstream stable patchset 2021-01-25 (LP: #1913214)",
  however in the mainline kernel there was a follow up patch because this
  initial patch introduced concurrency issues. The patch "mm: memcontrol:
  fix NR_WRITEBACK leak in memcg and system stats" is required, and should
  be brought into the Ubuntu packaged kernel to fix the issues reported.
  
  The required patch is here:
  
https://github.com/torvalds/linux/commit/c3cc39118c3610eb6ab4711bc624af7fc48a35fe
  and was committed a few weeks after the original (broken) patch:
  
https://github.com/torvalds/linux/commit/a983b5ebee57209c99f68c8327072f25e0e6e3da
  
  I have checked the release notes for Ubuntu versions -137 to -143, and
  none include this second patch that should fix the issue. (I checked
  https://people.canonical.com/~kernel/info/kernel-version-map.html for
  all the kernel versions, and then visited each changelog page in turn,
  e.g. https://launchpad.net/ubuntu/+source/linux/4.15.0-143.147 looking
  for "mm: memcontrol: fix NR_WRITEBACK leak in memcg and system stats").
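The same check can be scripted. This is a rough sketch under stated assumptions: the changelog text is fetched with `apt changelog` (which needs network access), the `linux-modules-<release>` package is assumed to carry the full kernel changelog, and release strings are assumed to follow the `4.15.0-141-generic` pattern. All function names are hypothetical. `in_reported_range` only encodes the range given in this report (4.15.0, ABI 137 or later), so on its own it cannot tell whether a later build already carries the fix; the changelog search covers that.

```shell
#!/usr/bin/env bash
# Sketch: is this kernel in the reported range, and does its changelog
# mention the fix? Function names are hypothetical.

FIX_TITLE='mm: memcontrol: fix NR_WRITEBACK leak in memcg and system stats'

# "4.15.0-141-generic" -> "4.15.0"
kernel_base() {
  local release=${1:-$(uname -r)}
  echo "${release%%-*}"
}

# "4.15.0-141-generic" -> "141"
kernel_abi() {
  local release=${1:-$(uname -r)}
  local rest=${release#*-}
  echo "${rest%%-*}"
}

# Succeeds for 4.15.0 kernels at ABI 137 or later (the range in this report).
in_reported_range() {
  local release=${1:-$(uname -r)}
  [ "$(kernel_base "$release")" = "4.15.0" ] &&
    [ "$(kernel_abi "$release")" -ge 137 ]
}

# Succeeds if the changelog text on stdin mentions the fix by its title.
has_writeback_fix() {
  grep -qF -- "$FIX_TITLE"
}

# Example (assumes network access for `apt changelog`):
#   r=$(uname -r)
#   if in_reported_range "$r" &&
#      ! apt changelog "linux-modules-$r" 2>/dev/null | has_writeback_fix; then
#     echo "$r may be affected by LP: #1926081"
#   fi
```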
  
  We do not observe this on the 5.4.0 kernel (supported HWE kernel on
  18.04.5), which includes this second patch. That kernel may also include
  other patches, so we do not know if any other fixes are also required,
  but the one we have linked above seems to definitely be needed, and
  seems to match our symptoms.
  
+ [Test Plan]
+ 
  Testcase:
  
  The following is enough to permanently increase the value of
  `nr_writeback` on our systems (by about 2000 during most executions):
  
  ```
  date
  grep nr_writeback /proc/vmstat
  mkdir -p /docker/testfiles/{1..5}
  
  seq -w 1 10 | xargs -n1 -I% sh -c 'dd if=/dev/urandom of=/docker/testfiles/1/file.% bs=4k count=10 status=none' &
  seq -w 1 10 | xargs -n1 -I% sh -c 'dd if=/dev/urandom of=/docker/testfiles/2/file.% bs=4k count=10 status=none' &
  seq -w 1 10 | xargs -n1 -I% sh -c 'dd if=/dev/urandom of=/docker/testfiles/3/file.% bs=4k count=10 status=none' &
  seq -w 1 10 | xargs -n1 -I% sh -c 'dd if=/dev/urandom of=/docker/testfiles/4/file.% bs=4k count=10 status=none' &
  seq -w 1 10 | xargs -n1 -I% sh -c 'dd if=/dev/urandom of=/docker/testfiles/5/file.% bs=4k count=10 status=none' &
  
  wait $(jobs -p)
  grep nr_writeback /proc/vmstat
  date
  ```
  
  Subsequent iterations of the test raise it further, and on a system
  doing a lot of writing from a lot of different processes, it can rise
  quickly.
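To put a number on "subsequent iterations raise it further", one option is to record nr_writeback once per run (after the writes have finished) and look at the net change. This is a minimal sketch: `writeback_growth` and its SLACK argument are hypothetical, and SLACK exists because the value is expected to be "at or around zero" rather than exactly zero on a busy machine.

```shell
#!/usr/bin/env bash
# Sketch: summarise nr_writeback samples, one per line on stdin,
# e.g. one reading taken after each run of the testcase.
# Exit status: 0 = no net growth beyond SLACK pages, 1 = grew, 2 = no samples.
writeback_growth() {
  local slack=${1:-0}
  awk -v slack="$slack" '
    NR == 1 { first = $1 }
    { last = $1 }
    END {
      if (NR == 0) exit 2
      delta = last - first
      printf "samples=%d first=%d last=%d delta=%d\n", NR, first, last, delta
      if (delta > slack + 0) exit 1
      exit 0
    }'
}

# Example: run the testcase five times, sampling after each run.
#   for i in 1 2 3 4 5; do
#     ./testcase.sh >/dev/null   # hypothetical file holding the testcase above
#     sync; sleep 5
#     awk '$1 == "nr_writeback" { print $2 }' /proc/vmstat
#   done | writeback_growth 100
```

Comparing only the first and last samples keeps the check simple; a leak of "about 2000 per execution" shows up clearly after a handful of runs, while short-lived spikes between samples are ignored.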
  
  System details:
  
  lsb_release -rd
  Description:  Ubuntu 18.04.5 LTS
  Release:  18.04
  
  Affected kernel: 4.15.0-137 onwards (current latest version tried was
  4.15.0-142)
  
  e.g.
  
  apt-cache policy linux-image-4.15.0-141-generic
  linux-image-4.15.0-141-generic:
-   Installed: 4.15.0-141.145
-   Candidate: 4.15.0-141.145
-   Version table:
-  *** 4.15.0-141.145 500
- 500 http://mirrors.service.networklayer.com/ubuntu bionic-updates/main amd64 Packages
- 500 http://mirrors.service.networklayer.com/ubuntu bionic-security/main amd64 Packages
- 100 /var/lib/dpkg/status
+   Installed: 4.15.0-141.145
+   Candidate: 4.15.0-141.145
+   Version table:
+  *** 4.15.0-141.145 500
+ 500 http://mirrors.service.networklayer.com/ubuntu 

[Bug 1926081] Re: nr_writeback memory leak in kernel 4.15.0-137+

2021-04-27 Thread Andrew Taylor
Awesome, thank you!

[Bug 1926081] Re: nr_writeback memory leak in kernel 4.15.0-137+

2021-04-27 Thread Tim Gardner
Andrew - thanks for the testing feedback. We'll get these patches
included in the next SRU cycle.

[Bug 1926081] Re: nr_writeback memory leak in kernel 4.15.0-137+

2021-04-27 Thread Andrew Taylor
Good news on the custom -144 kernel you provided:

root@jenkins-lon02-02-general-swarm-node-03:~# uname -r
4.15.0-144-generic

root@jenkins-lon02-02-general-swarm-node-03:~# dpkg -l | grep 144
ii  linux-image-unsigned-4.15.0-144-generic  4.15.0-144.148~LP1926081.1  amd64  Linux kernel image for version 4.15.0 on 64 bit x86 SMP
ii  linux-modules-4.15.0-144-generic         4.15.0-144.148~LP1926081.1  amd64  Linux kernel extra modules for version 4.15.0 on 64 bit x86 SMP

root@jenkins-lon02-02-general-swarm-node-03:~# uptime
 08:53:37 up 48 min,  1 user,  load average: 1.22, 3.54, 3.05

root@jenkins-lon02-02-general-swarm-node-03:~# grep "writeback" /proc/vmstat
nr_writeback 0
nr_writeback_temp 0

I've run through my testcase 3 times on this new kernel and I've not
seen any permanent increase in the `nr_writeback` value, it always
returns to zero, which is great.
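The "always returns to zero" check can also be automated: after the testcase, flush dirty data with `sync` and poll nr_writeback until it drains or a timeout expires. This is a sketch under stated assumptions (a hypothetical `wait_for_writeback_drain`, a default 30 second timeout and a threshold of 0 pages); on an affected kernel the counter stays above the threshold, as with the 2641 reading reported further down.

```shell
#!/usr/bin/env bash
# Sketch: after `sync`, wait for nr_writeback to drop to THRESHOLD or below.
# Usage: wait_for_writeback_drain [TIMEOUT_SECONDS] [THRESHOLD] [VMSTAT_FILE]
# Exit status: 0 = drained, 1 = still above THRESHOLD at timeout, 2 = unreadable.

# Print counter $1 from vmstat-style file $2; fail if it is missing.
vmstat_field() {
  awk -v key="$1" '$1 == key { print $2; found = 1 } END { exit !found }' "$2"
}

wait_for_writeback_drain() {
  local timeout=${1:-30} threshold=${2:-0} file=${3:-/proc/vmstat}
  local value
  sync                                   # push dirty pages to disk first
  while :; do
    value=$(vmstat_field nr_writeback "$file") || return 2
    if [ "$value" -le "$threshold" ]; then
      return 0
    fi
    if [ "$timeout" -le 0 ]; then
      echo "nr_writeback still at $value pages" >&2
      return 1
    fi
    sleep 1
    timeout=$((timeout - 1))
  done
}
```

Running it right after each iteration, e.g. `wait_for_writeback_drain 60 && echo drained`, gives a simple pass/fail signal per run; a small nonzero threshold may be more practical on a machine with other writers.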

So from my end, this looks really good - is there anything else you'd
like me to do or test, or is this good enough to pull them in for the
-144 update?

[Bug 1926081] Re: nr_writeback memory leak in kernel 4.15.0-137+

2021-04-27 Thread Andrew Taylor
Thank you both,

I'll give that kernel a go now.

[Bug 1926081] Re: nr_writeback memory leak in kernel 4.15.0-137+

2021-04-27 Thread Guilherme G. Piccoli
That's awesome Andrew, thanks for being so proactive on this issue. For
your first LP bug report, you're doing an amazing job!

Tim posted a kernel with both fixes above; if you can try it, that'd be good.
And I want to apologize, I gave wrong information above: the fixes ARE in
linux-stable upstream, I just missed them on a first look. My colleague
Krzysztof pointed out to me that they reached stable in v4.14.229, so
eventually they'd reach Ubuntu kernels. But your LP bug will hopefully
speed things up.
Cheers!

[Bug 1926081] Re: nr_writeback memory leak in kernel 4.15.0-137+

2021-04-27 Thread Tim Gardner
Andrew - please find a test kernel at
https://kernel.ubuntu.com/~rtg/LP%231926081.

wget https://kernel.ubuntu.com/~rtg/LP%231926081/amd64/linux-image-unsigned-4.15.0-144-generic_4.15.0-144.148~LP1926081.1_amd64.deb
wget https://kernel.ubuntu.com/~rtg/LP%231926081/amd64/linux-modules-4.15.0-144-generic_4.15.0-144.148~LP1926081.1_amd64.deb
sudo dpkg -i *.deb
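After rebooting, one way to confirm the machine is really running the test build (rather than an older kernel left as the GRUB default) is to compare the installed package version for `uname -r` against the test-build suffix. This is a sketch that assumes the `linux-image-unsigned-<release>` package name used by the test kernel above; the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check that the running kernel comes from the LP1926081 test build.

# Succeeds if a Debian version string carries the test-build suffix,
# e.g. 4.15.0-144.148~LP1926081.1
is_test_build() {
  case $1 in
    *'~LP1926081.'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Version of the unsigned image package matching the running kernel.
running_image_version() {
  dpkg-query -W -f='${Version}\n' "linux-image-unsigned-$(uname -r)"
}

# Example:
#   v=$(running_image_version) && is_test_build "$v" &&
#     echo "running test kernel $(uname -r) ($v)"
```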

[Bug 1926081] Re: nr_writeback memory leak in kernel 4.15.0-137+

2021-04-27 Thread Andrew Taylor
So, good news.

I pulled in the Ubuntu kernel sources and applied the main patch I had
previously identified
(https://github.com/torvalds/linux/commit/c3cc39118c3610eb6ab4711bc624af7fc48a35fe).
The other patch (https://github.com/torvalds/linux/commit/e27be240df53)
seems to relate to the accuracy of the returned statistics, which is not
something my testcase would reveal.
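For anyone repeating this, one way to bring an upstream commit into a local Ubuntu kernel checkout is to fetch it as a patch and apply it with `git am`. This is only a sketch: it relies on GitHub serving a commit as a mailbox-format patch when `.patch` is appended to the commit URL, assumes `curl` is installed, and assumes the patch applies to the tree (a backport may need manual conflict resolution). The function names are hypothetical; building the resulting tree is covered by https://wiki.ubuntu.com/Kernel/BuildYourOwnKernel.

```shell
#!/usr/bin/env bash
# Sketch: apply an upstream commit to the current kernel git checkout.

# GitHub returns a commit as an mbox-style patch when ".patch" is appended.
upstream_patch_url() {
  echo "https://github.com/torvalds/linux/commit/${1}.patch"
}

# Download the patch and apply it as a new commit; -3 lets git fall back
# to a three-way merge if the surrounding context differs from upstream.
apply_upstream_commit() {
  curl -fsSL "$(upstream_patch_url "$1")" | git am -3
}

# Example, run from the top of the Ubuntu bionic kernel tree:
#   apply_upstream_commit c3cc39118c3610eb6ab4711bc624af7fc48a35fe
```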

I built a version of the 4.15.0-142 kernel with the patch, and booted back
into it, and with two iterations of my testcase my `nr_writeback` value
has ended up on zero - which is excellent news. From previous runs it
has never failed to leak, so this looks good to me. I will run it some
more, but it is safe to say I am pretty happy at this point.

To that end, I believe that
https://github.com/torvalds/linux/commit/c3cc39118c3610eb6ab4711bc624af7fc48a35fe
"mm: memcontrol: fix NR_WRITEBACK leak in memcg and system stats" does
indeed fix this issue and the symptoms I have reported above.

[Bug 1926081] Re: nr_writeback memory leak in kernel 4.15.0-137+

2021-04-27 Thread Andrew Taylor
Guilherme, thank you for your kind words :)

I have been trying to reproduce this bug on several other systems that I
have access to in our cloud account, but I have been unable to reproduce
it on a VM (either with SAN or local SSD storage). The main set of
servers where this has been seen by us are bare-metal servers with a
RAID card backed by SSDs - it's possible that a combination of the
resources available on the machine (CPU, RAM, disk IO) causes this bug to
be more reproducible with my basic testcase.

I have taken a server out of production and rebooted it into the 4.15
kernel (4.15.0-141-generic), where the issue can be seen.

I confirmed my testcase still reproduces the issue here, and it does -
nr_writeback is currently stuck at 2641 after one iteration.

I have supplied the apport collected information from that server, which
is now attached to this issue.


This is my first bug report on Launchpad, so I am as yet unfamiliar with the 
process of testing the potential patches I need. Are you suggesting that I 
follow the process to rebuild the kernel 
(https://wiki.ubuntu.com/Kernel/BuildYourOwnKernel) including the patches you 
have mentioned?

Assuming that is the correct course of action I'll attempt to follow the
instructions and do that, and report back.

[Bug 1926081] Re: nr_writeback memory leak in kernel 4.15.0-137+

2021-04-27 Thread Andrew Taylor
apport information

** Tags added: apport-collected

** Description changed:

  
  Ubuntu 18.04.5 4.15.0 LTS kernels at version 4.15.0-137 and above contain
  a memory leak due to the inclusion of a patch from the upstream kernel,
  but not the fix for that patch which was released later.
  
  This issue manifests itself as an increasing amount of memory used by
  the writeback queue, which never returns to zero. This can be seen
  either as the value of `nr_writeback` in /proc/vmstat, or the value of
  `Writeback` in /proc/meminfo.
  
  Ordinarily these values should be at or around zero, but on our servers
  we observe the `nr_writeback` value increasing to over 8 million, (32GB
  of memory), at which point it isn't long before the system IO slows to a
  crawl (tens of Kb/s). Our servers have 256GB of memory, and are
  performing many CI related activities - this issue appears to be related
  to concurrent writing to disk, and can be demonstrated with a simple
  testcase (see later).
  
  On our heavily used systems this memory leak can result in an unstable
  server after 2-3 days, requiring a reboot to fix it.
  
  After much investigation the issue appears to be because the patch "mm:
  memcontrol: fix excessive complexity in memory.stat reporting" was
  brought in to the 4.15.0-137 Ubuntu kernel (see
  https://launchpad.net/ubuntu/+source/linux/4.15.0-137.141) as part of
  "Bionic update: upstream stable patchset 2021-01-25 (LP: #1913214)",
  however in the mainline kernel there was a follow up patch because this
  initial patch introduced concurrency issues. The patch "mm: memcontrol:
  fix NR_WRITEBACK leak in memcg and system stats" is required, and should
  be brought into the Ubuntu packaged kernel to fix the issues reported.
  
  The required patch is here:
  
https://github.com/torvalds/linux/commit/c3cc39118c3610eb6ab4711bc624af7fc48a35fe
  and was committed a few weeks after the original (broken) patch:
  
https://github.com/torvalds/linux/commit/a983b5ebee57209c99f68c8327072f25e0e6e3da
  
  I have checked the release notes for Ubuntu versions -137 to -143, and
  none include this second patch that should fix the issue. (I checked
  https://people.canonical.com/~kernel/info/kernel-version-map.html for
  all the kernel versions, and then visited each changelog page in turn,
  e.g. https://launchpad.net/ubuntu/+source/linux/4.15.0-143.147 looking
  for "mm: memcontrol: fix NR_WRITEBACK leak in memcg and system stats").
  
  We do not observe this on the 5.4.0 kernel (supported HWE kernel on
  18.04.5), which includes this second patch. That kernel may also include
  other patches, so we do not know if any other fixes are also required,
  but the one we have linked above seems to definitely be needed, and
  seems to match our symptoms.
  
  Testcase:
  
  The following is enough to permanently increase the value of
  `nr_writeback` on our systems (by about 2000 during most executions):
  
  ```
  date
  grep nr_writeback /proc/vmstat
  mkdir -p /docker/testfiles/{1..5}
  
  seq -w 1 10 | xargs -n1 -I% sh -c 'dd if=/dev/urandom of=/docker/testfiles/1/file.% bs=4k count=10 status=none' &
  seq -w 1 10 | xargs -n1 -I% sh -c 'dd if=/dev/urandom of=/docker/testfiles/2/file.% bs=4k count=10 status=none' &
  seq -w 1 10 | xargs -n1 -I% sh -c 'dd if=/dev/urandom of=/docker/testfiles/3/file.% bs=4k count=10 status=none' &
  seq -w 1 10 | xargs -n1 -I% sh -c 'dd if=/dev/urandom of=/docker/testfiles/4/file.% bs=4k count=10 status=none' &
  seq -w 1 10 | xargs -n1 -I% sh -c 'dd if=/dev/urandom of=/docker/testfiles/5/file.% bs=4k count=10 status=none' &
  
  wait $(jobs -p)
  grep nr_writeback /proc/vmstat
  date
  ```
  
  Subsequent iterations of the test raise it further, and on a system
  doing a lot of writing from a lot of different processes, it can rise
  quickly.
  
  System details:
  
  lsb_release -rd
  Description:  Ubuntu 18.04.5 LTS
  Release:  18.04
  
  Affected kernel: 4.15.0-137 onwards (current latest version tried was
  4.15.0-142)
  
  e.g.
  
  apt-cache policy linux-image-4.15.0-141-generic
  linux-image-4.15.0-141-generic:
Installed: 4.15.0-141.145
Candidate: 4.15.0-141.145
Version table:
   *** 4.15.0-141.145 500
  500 http://mirrors.service.networklayer.com/ubuntu bionic-updates/main amd64 Packages
  500 http://mirrors.service.networklayer.com/ubuntu bionic-security/main amd64 Packages
  100 /var/lib/dpkg/status
  
  According to https://wiki.ubuntu.com/KernelTeam/KernelTeamBugPolicies I
  should include additional information from the server, but at this stage
  we have upgraded all our affected systems to 5.4.0, and therefore the
  kernel versions do not match those with this issue.
  
  We likely have other servers used in other services that are not as
  heavily loaded that have not been as affected by this issue - and
  therefore I may be able to get the equivalent diagnostics from there
  after confirming 

[Bug 1926081] Re: nr_writeback memory leak in kernel 4.15.0-137+

2021-04-26 Thread Guilherme G. Piccoli
Hi Andrew, thanks a lot for the very thorough bug report! This is far
better than apport logs: you basically explained everything that is
going on plus what is needed to fix it, much appreciated!

Based on upstream tree, I see the following 2 commits that carry Fixes
tag to the offender you pointed:

e27be240df53 ("mm: memcg: make sure memory.events is uptodate when waking pollers")
c3cc39118c36 ("mm: memcontrol: fix NR_WRITEBACK leak in memcg and system stats")

Did you manage to test with both commits? They don't appear to be present in 
linux-stable, which is unfortunate, especially since the offender *is* present 
in v4.14.y.
I'll try to cook a kernel with both fixes and submit them to the ML.
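The commits above were identified by their `Fixes:` trailers pointing at the offending commit. This is a sketch of doing the same lookup in an upstream clone; `commits_fixing` and `fixes_targets` are hypothetical helpers, and the search assumes the trailer quotes the short hash of the offender (a983b5ebee57 here), which is the usual kernel convention.

```shell
#!/usr/bin/env bash
# Sketch: work with "Fixes:" trailers in kernel commit messages.

# List commits reachable from REV (default HEAD) whose message contains
# "Fixes: <HASH>". Run inside a clone of the upstream tree.
commits_fixing() {
  git log --oneline --grep="Fixes: $1" "${2:-HEAD}"
}

# Print the hashes named in "Fixes:" lines of a commit message on stdin.
fixes_targets() {
  sed -n 's/^Fixes:[[:space:]]*\([0-9a-fA-F]\{7,40\}\).*/\1/p'
}

# Examples:
#   commits_fixing a983b5ebee57
#   git log -1 --format=%B c3cc39118c36 | fixes_targets
```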

Cheers,


Guilherme

** Changed in: linux (Ubuntu)
   Status: Confirmed => In Progress

[Bug 1926081] Re: nr_writeback memory leak in kernel 4.15.0-137+

2021-04-25 Thread Andrew Taylor
I did not initially run the `apport-collect` command as the servers on
which I observed this bug have been upgraded to use the 5.4.0 kernel to
mitigate the issue (as mentioned in the initial report), therefore the
kernel related information may be misleading.

I will endeavour to find a server that is exhibiting this issue that
remains on an impacted kernel, however I cannot do so today, and given
that I have identified a required upstream patch I believe some progress
can be made with this bug without this information.

If an `apport-collect` command on a server that has been upgraded to the newer 
kernel is still useful, please let me know and I can see if I can generate the 
information there.
I will attempt to find a server that has not been upgraded, replicate the 
issue, and then will upload the required logs from that server.

** Changed in: linux (Ubuntu)
   Status: Incomplete => Confirmed
