You have been subscribed to a public bug:

Description
===========
When hugepages are enabled on the host it's possible to schedule VMs using more RAM than available.

On the node with memory usage presented below it was possible to schedule 6 instances using a total of 140G of memory and a non-hugepages-enabled flavor. The same machine has 188G of memory in total, of which 64G were reserved for hugepages. An additional ~4G was used for housekeeping, the OpenStack control plane, etc. This resulted in overcommitment of roughly 20G (140G scheduled against about 120G of memory not backed by hugepages).

After running memory-intensive operations on the VMs, some of them got OOM killed.

$ cat /proc/meminfo | egrep "^(Mem|Huge)" # on the compute node
MemTotal:       197784792 kB
MemFree:        115005288 kB
MemAvailable:   116745612 kB
HugePages_Total:      64
HugePages_Free:       64
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:    1048576 kB
Hugetlb:        67108864 kB

$ os hypervisor show compute1 -c memory_mb -c memory_mb_used -c free_ram_mb
+----------------+--------+
| Field          | Value  |
+----------------+--------+
| free_ram_mb    | 29309  |
| memory_mb      | 193149 |
| memory_mb_used | 163840 |
+----------------+--------+

$ os host show compute1
+----------+----------------------------------+-----+-----------+---------+
| Host     | Project                          | CPU | Memory MB | Disk GB |
+----------+----------------------------------+-----+-----------+---------+
| compute1 | (total)                          | 0   | 193149    | 893     |
| compute1 | (used_now)                       | 72  | 163840    | 460     |
| compute1 | (used_max)                       | 72  | 147456    | 460     |
| compute1 | some_project_id_was_here         | 2   | 4096      | 40      |
| compute1 | another_anonymized_id_here       | 70  | 143360    | 420     |
+----------+----------------------------------+-----+-----------+---------+

$ os resource provider inventory list uuid_of_compute1_node
+----------------+------------------+----------+----------+----------+-----------+--------+
| resource_class | allocation_ratio | min_unit | max_unit | reserved | step_size | total  |
+----------------+------------------+----------+----------+----------+-----------+--------+
| MEMORY_MB      | 1.0              | 1        | 193149   | 16384    | 1         | 193149 |
| DISK_GB        | 1.0              | 1        | 893      | 0        | 1         | 893    |
| PCPU           | 1.0              | 1        | 72       | 0        | 1         | 72     |
+----------------+------------------+----------+----------+----------+-----------+--------+

Steps to reproduce
==================
1. Reserve a large part of memory for hugepages on the hypervisor.
2. Create VMs using a flavor that uses a lot of memory that isn't backed by hugepages.
3. Start memory-intensive operations on the VMs, e.g.:
   stress-ng --vm-bytes $(awk '/MemAvailable/{printf "%d", $2 * 0.98;}' < /proc/meminfo)k --vm-keep -m 1

Expected result
===============
Nova should not allow overcommitment and should be able to differentiate between hugepages and "normal" memory.

Actual result
=============
Overcommitment resulting in OOM kills.

Environment
===========
nova-api-metadata     2:21.2.1-0ubuntu1~cloud0
nova-common           2:21.2.1-0ubuntu1~cloud0
nova-compute          2:21.2.1-0ubuntu1~cloud0
nova-compute-kvm      2:21.2.1-0ubuntu1~cloud0
nova-compute-libvirt  2:21.2.1-0ubuntu1~cloud0
python3-nova          2:21.2.1-0ubuntu1~cloud0
python3-novaclient    2:17.0.0-0ubuntu1~cloud0

OS: Ubuntu 18.04.5 LTS
Hypervisor: libvirt + KVM

** Affects: nova
     Importance: Undecided
         Status: Confirmed

** Tags: sts

--
Nova doesn't account for hugepages when scheduling VMs
https://bugs.launchpad.net/bugs/1950186
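
The overcommit estimate in the description can be reproduced from the numbers in the command output. The sketch below is only an illustration, not Nova or Placement code: the helper names are hypothetical, the 4 GiB housekeeping figure is the report's rough estimate, and the capacity formula (total - reserved) * allocation_ratio is the one Placement documents for inventories. It compares what the MEMORY_MB inventory permits with the memory that is actually left once the hugepage pool is set aside.

```python
# All values are copied from the command output in the report (in MB unless noted).
MEMORY_MB_TOTAL = 193149          # MEMORY_MB inventory "total"
MEMORY_MB_RESERVED = 16384        # MEMORY_MB inventory "reserved"
ALLOCATION_RATIO = 1.0            # MEMORY_MB inventory "allocation_ratio"
HUGEPAGES_TOTAL = 64              # HugePages_Total from /proc/meminfo
HUGEPAGE_SIZE_KB = 1048576        # Hugepagesize from /proc/meminfo
HOUSEKEEPING_MB = 4 * 1024        # assumption: the report's "~4G" estimate
SCHEDULED_MB = 140 * 1024         # the six instances, 140G in total


def placement_capacity_mb(total_mb, reserved_mb, allocation_ratio):
    """Capacity the MEMORY_MB inventory allows to be allocated."""
    return int((total_mb - reserved_mb) * allocation_ratio)


def hugepage_pool_mb(pages, page_size_kb):
    """Memory pinned by the hugepage pool, converted from kB to MB."""
    return pages * page_size_kb // 1024


def usable_small_page_mb(total_mb, hugepages_mb, housekeeping_mb):
    """Memory left for guests that are not backed by hugepages."""
    return total_mb - hugepages_mb - housekeeping_mb


capacity = placement_capacity_mb(MEMORY_MB_TOTAL, MEMORY_MB_RESERVED, ALLOCATION_RATIO)
pool = hugepage_pool_mb(HUGEPAGES_TOTAL, HUGEPAGE_SIZE_KB)
usable = usable_small_page_mb(MEMORY_MB_TOTAL, pool, HOUSEKEEPING_MB)
overcommit = SCHEDULED_MB - usable

print(f"capacity allowed by the inventory: {capacity} MB")
print(f"hugepage pool:                     {pool} MB")
print(f"usable non-hugepage memory:        {usable} MB")
print(f"overcommit for 140G of instances:  {overcommit} MB")
```

With these inputs the inventory still admits the 140G of instances, while the memory outside the hugepage pool falls short by a little under 20 GiB, which matches the "roughly 20G" in the description.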