Public bug reported:
==Case Study==
I was trying to figure out why Ubuntu was killing my program even though
real memory pressure on the system was not that high, although virtual memory
usage seemed abnormally high compared to other platforms (which, as it turns
out, simply do opportunistic page reclaim). The cause turned out to be a quirk
of the lazy allocator: when real memory runs short, the kernel reaches for the
OOMKiller instead of first trying to solve the problem by reclaiming pages. I
understand that the memory pressure watermark exists so that heavy scans are
not running all the time, but when the system encounters a potential OOM there
should be a policy setting to attempt a heavy scan before going straight to
OOM killing.
==Summary==
The kernel will leave memory mapped into a program's address space as part of
lazy allocation, and will only free pages when a request would both push total
free memory (real + swap) below `vm.min_free_kbytes` and still be satisfiable
from the memory available.
Because of this, any long-lived process accumulates a huge pool of
over-committed virtual memory.
If the amount of memory available is above the watermark, and any program on
the system then makes an allocation request that exceeds the total free and
available memory, the OOMKiller is launched to kill programs.
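The effect described above can be observed from userspace. The sketch below (Linux-only, since it reads `/proc/self/status`; the 1 GiB size and the `status_kb` helper are just illustrative choices) maps a large anonymous region and shows that virtual size (VmSize) jumps immediately while resident size (VmRSS) barely moves until pages are actually written:

```python
# Minimal userspace sketch of lazy allocation: reserve a large anonymous
# mapping and compare virtual size to resident size before touching it.
import mmap

def status_kb(field):
    """Read a field such as 'VmSize' (reported in kB) from /proc/self/status."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith(field + ":"):
                return int(line.split()[1])
    return -1

# Reserve 1 GiB of address space; the kernel maps it lazily and commits
# no physical pages until they are written.
region = mmap.mmap(-1, 1 << 30, flags=mmap.MAP_PRIVATE)
vm_size = status_kb("VmSize")   # includes the whole 1 GiB mapping
vm_rss = status_kb("VmRSS")     # only the pages actually resident
region[0] = 1                   # touching a page commits real memory
region.close()
```

After the `mmap` call, VmSize exceeds 1 GiB while VmRSS stays near the interpreter's baseline, which is exactly the virtual/real gap described above.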
==The ideal case==
Introduce a `vm.overcommit_memory` policy that attempts to reclaim and
relocate memory before treating the system as OOM.
If reclaim/relocation takes longer than a timeout, or if after compacting
there is still not enough memory (or a quick preflight sum shows there could
not feasibly be), then treat the system as OOM, instead of immediately killing
the most memory-hungry program.
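To make the proposal concrete, here is a rough userspace sketch of the decision logic (every name and number here is invented for illustration; real kernel reclaim operates on zones and LRU lists, not a simple counter):

```python
import time

def try_reclaim_then_oom(request_kb, free_kb, reclaimable_kb,
                         reclaim_step_kb, timeout_s):
    """Hypothetical policy: bounded reclaim before declaring OOM."""
    # Quick preflight sum: if even full reclaim could not satisfy the
    # request, there is no point trying -- treat the system as OOM now.
    if free_kb + reclaimable_kb < request_kb:
        return "OOM"
    deadline = time.monotonic() + timeout_s
    while free_kb < request_kb:
        if time.monotonic() > deadline:
            return "OOM"  # reclaim exceeded its time budget
        # Reclaim one batch of pages and account for it.
        step = min(reclaim_step_kb, reclaimable_kb)
        free_kb += step
        reclaimable_kb -= step
    return "GRANT"
```

With this shape, a request that the preflight sum shows to be infeasible fails fast, while a feasible one is granted after a few reclaim steps instead of triggering the OOMKiller.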
==The workaround==
Setting
```
sysctl vm.overcommit_memory=1                        # always grant memory
sysctl vm.min_free_kbytes=$A_LARGE_NUMBER_SAY_A_GIG  # value is in kB
```
allows the system to recover by sidestepping the issue, but the default
`vm.min_free_kbytes` in Ubuntu is too low for a memory-hungry program and
something that wants periodic large allocations to run at once.
In particular, I was running a background scientific job and tried to watch a
YouTube video in Firefox. Either the database server wanted a large allocation
for a transaction or Firefox wanted a large allocation for a new window or
video buffer, but one of those allocations was too large and prompted the
OOMKiller to fire, even though by all accounts the amount of real memory in
use was small, and neither task was actually using anywhere close to all of
the memory that had been mapped to it, because it had freed that memory and
had just been running for a while.
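If the workaround is adopted, it can be made persistent across reboots with a sysctl drop-in (the filename and the 1 GiB value below are just examples):

```
# /etc/sysctl.d/99-overcommit-workaround.conf (illustrative name)
vm.overcommit_memory = 1
# vm.min_free_kbytes is in kB; 1048576 kB = 1 GiB
vm.min_free_kbytes = 1048576
```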
ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: linux-image-5.8.0-53-generic 5.8.0-53.60~20.04.1
ProcVersionSignature: Ubuntu 5.8.0-53.60~20.04.1-generic 5.8.18
Uname: Linux 5.8.0-53-generic x86_64
NonfreeKernelModules: nvidia_modeset nvidia
ApportVersion: 2.20.11-0ubuntu27.17
Architecture: amd64
CasperMD5CheckResult: skip
CurrentDesktop: ubuntu:GNOME
Date: Tue May 25 13:12:15 2021
InstallationDate: Installed on 2021-03-12 (74 days ago)
InstallationMedia: Ubuntu 20.04.2.0 LTS "Focal Fossa" - Release amd64
(20210209.1)
SourcePackage: linux-signed-hwe-5.8
UpgradeStatus: No upgrade log present (probably fresh install)
** Affects: linux-signed-hwe-5.8 (Ubuntu)
Importance: Undecided
Status: New
** Tags: amd64 apport-bug focal
https://bugs.launchpad.net/bugs/1929612
Title:
OOMKiller dispatched when memory free but not reclaimed