[Yahoo-eng-team] [Bug 1950186] Re: Nova doesn't account for hugepages when scheduling VMs
This is not a bug, it is user error. When using hugepages, if you want to have non-hugepage guests on the same host, then you must set hw:mem_page_size=small or hw:mem_page_size=4k for all non-hugepage guests. We do not support memory oversubscription when using hw:mem_page_size, and setting it also gives the guest one implicit NUMA node.

We intentionally do not support mixing NUMA and non-NUMA guests on the same host, which is what happens if you do not use hw:mem_page_size=small. When hw:mem_page_size is not set, we do not do page-size/NUMA-node-aware scheduling. The reason you are having the current issue is that you are mixing NUMA and non-NUMA instances on the same host, which has never been supported in Nova. We may eventually support this in the distant future, but we have no plans to support it in Zed, and no one has proposed a way to support it upstream yet. It is a very non-trivial feature and would require us to effectively make all instances NUMA instances.

We cannot support mixing floating instances and NUMA-affined instances on the same host today due to how we do NUMA affinity and how that interacts with the kernel OOM reaper. Basically, the OOM reaper operates per NUMA node, not globally: if the kernel needs memory on NUMA node 0 and cannot free it there, it will kill processes to free memory on node 0 even if there is free memory on another NUMA node. That will often result in a NUMA-affined non-hugepage guest being killed when a floating guest is spawned and triggers an OOM event. That is not something we can allow to happen, as it is a multi-tenant issue, so we cannot support mixing NUMA and non-NUMA instances on the same host.

The workaround to use hugepage and non-hugepage guests on the same host is therefore to give all guests NUMA affinity by setting hw:mem_page_size.
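As a concrete illustration of the workaround described above, the extra spec can be set on every flavor that may land on the host (the flavor names here are placeholders, not names from this bug):

```shell
# Hugepage-backed guests: request large pages explicitly.
openstack flavor set hugepage.flavor --property hw:mem_page_size=large

# All other guests on the same host: pin them to small (4k) pages so
# they also get an implicit NUMA topology and page-size-aware scheduling.
openstack flavor set small.page.flavor --property hw:mem_page_size=small
```

With hw:mem_page_size set on every flavor that can land on the host, all guests become NUMA-affined and Nova accounts for page sizes when placing them.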
This is a well-known limitation and not a bug, so I'm closing this as Won't Fix.

** Changed in: nova
   Status: Confirmed => Won't Fix

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1950186

Title:
  Nova doesn't account for hugepages when scheduling VMs

Status in OpenStack Compute (nova):
  Won't Fix
[Yahoo-eng-team] [Bug 1950186] Re: Nova doesn't account for hugepages when scheduling VMs
** Package changed: nova (Ubuntu) => nova

Title:
  Nova doesn't account for hugepages when scheduling VMs

Status in OpenStack Compute (nova):
  Confirmed

Bug description:

  Description
  ===========
  When hugepages are enabled on the host it's possible to schedule VMs using more RAM than is available. On the node with the memory usage presented below, it was possible to schedule 6 instances using a total of 140G of memory with a non-hugepages-enabled flavor. The same machine has 188G of memory in total, of which 64G were reserved for hugepages. An additional ~4G were used for housekeeping, the OpenStack control plane, etc. This resulted in an overcommitment of roughly 20G. After running memory-intensive operations on the VMs, some of them were OOM-killed.

  $ cat /proc/meminfo | egrep "^(Mem|Huge)"   # on the compute node
  MemTotal:       197784792 kB
  MemFree:        115005288 kB
  MemAvailable:   116745612 kB
  HugePages_Total:      64
  HugePages_Free:       64
  HugePages_Rsvd:        0
  HugePages_Surp:        0
  Hugepagesize:    1048576 kB
  Hugetlb:        67108864 kB

  $ os hypervisor show compute1 -c memory_mb -c memory_mb_used -c free_ram_mb
  +----------------+--------+
  | Field          | Value  |
  +----------------+--------+
  | free_ram_mb    | 29309  |
  | memory_mb      | 193149 |
  | memory_mb_used | 163840 |
  +----------------+--------+

  $ os host show compute1
  +----------+----------------------------+-----+-----------+---------+
  | Host     | Project                    | CPU | Memory MB | Disk GB |
  +----------+----------------------------+-----+-----------+---------+
  | compute1 | (total)                    |   0 |    193149 |     893 |
  | compute1 | (used_now)                 |  72 |    163840 |     460 |
  | compute1 | (used_max)                 |  72 |    147456 |     460 |
  | compute1 | some_project_id_was_here   |   2 |      4096 |      40 |
  | compute1 | another_anonymized_id_here |  70 |    143360 |     420 |
  +----------+----------------------------+-----+-----------+---------+

  $ os resource provider inventory list uuid_of_compute1_node
  +----------------+------------------+----------+----------+----------+-----------+--------+
  | resource_class | allocation_ratio | min_unit | max_unit | reserved | step_size | total  |
  +----------------+------------------+----------+----------+----------+-----------+--------+
  | MEMORY_MB      |              1.0 |        1 |   193149 |    16384 |         1 | 193149 |
  | DISK_GB        |              1.0 |        1 |      893 |        0 |         1 |    893 |
  | PCPU           |              1.0 |        1 |       72 |        0 |         1 |     72 |
  +----------------+------------------+----------+----------+----------+-----------+--------+

  Steps to reproduce
  ==================
  1. Reserve a large part of memory for hugepages on the hypervisor.
  2. Create VMs using a flavor that uses a lot of memory that isn't backed by hugepages.
  3. Start memory-intensive operations on the VMs, e.g.:
     stress-ng --vm-bytes $(awk '/MemAvailable/{printf "%d", $2 * 0.98;}' < /proc/meminfo)k --vm-keep -m 1

  Expected result
  ===============
  Nova should not allow overcommitment and should be able to differentiate between hugepages and "normal" memory.

  Actual result
  =============
  Overcommitment resulting in OOM kills.

  Environment
  ===========
  nova-api-metadata     2:21.2.1-0ubuntu1~cloud0
  nova-common           2:21.2.1-0ubuntu1~cloud0
  nova-compute          2:21.2.1-0ubuntu1~cloud0
  nova-compute-kvm      2:21.2.1-0ubuntu1~cloud0
  nova-compute-libvirt  2:21.2.1-0ubuntu1~cloud0
  python3-nova          2:21.2.1-0ubuntu1~cloud0
  python3-novaclient    2:17.0.0-0ubuntu1~cloud0
  OS: Ubuntu 18.04.5 LTS
  Hypervisor: libvirt + KVM

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1950186/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
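To make the accounting gap in the report concrete, here is a small sketch (not Nova code; the 16384 MB reserved value is taken from the inventory shown above) of why the MEMORY_MB inventory over-offers memory to non-hugepage guests when hugepage-reserved memory is not excluded from it:

```python
# Sketch of the accounting gap: the host's MemTotal includes memory set
# aside for hugepages, so an inventory derived from MemTotal still
# offers that memory to 4k-page guests unless it is explicitly reserved.

MEMINFO = """\
MemTotal:       197784792 kB
HugePages_Total:      64
Hugepagesize:    1048576 kB
"""

def parse_meminfo(text):
    """Parse 'Key:  value [kB]' lines into a dict of ints (kB or counts)."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields[key.strip()] = int(rest.split()[0])
    return fields

info = parse_meminfo(MEMINFO)
total_mb = info["MemTotal"] // 1024                                   # 193149 MB
hugepage_mb = info["HugePages_Total"] * info["Hugepagesize"] // 1024  # 65536 MB

# What the inventory above offers (reserved = 16384 MB) versus what is
# really left for non-hugepage guests once the 64 x 1G pages are set aside:
schedulable = total_mb - 16384
really_available = total_mb - 16384 - hugepage_mb
print(schedulable - really_available)  # 65536 MB offered but not usable
```

Under these numbers the inventory offers 65536 MB more than non-hugepage guests can actually use, which matches the ~20G overcommit observed once other consumers are counted in.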
[Yahoo-eng-team] [Bug 1950186] Re: Nova doesn't account for hugepages when scheduling VMs
This can be reproduced on Focal/ussuri:

Computes:

$ os resource provider list
+--------------------------------------+-----------------------------------------------------+------------+--------------------------------------+----------------------+
| uuid                                 | name                                                | generation | root_provider_uuid                   | parent_provider_uuid |
+--------------------------------------+-----------------------------------------------------+------------+--------------------------------------+----------------------+
| ca3fa736-7e60-4365-9cc8-7afc78b53005 | juju-98fb61-zaza-d6f2c7825043-9.project.serverstack |          5 | ca3fa736-7e60-4365-9cc8-7afc78b53005 | None                 |
| 0605bd29-71d5-40ed-ab8f-eceeaaac59b5 | juju-98fb61-zaza-d6f2c7825043-8.project.serverstack |          4 | 0605bd29-71d5-40ed-ab8f-eceeaaac59b5 | None                 |
+--------------------------------------+-----------------------------------------------------+------------+--------------------------------------+----------------------+

Memory allocation ratio is 1:

$ openstack resource provider inventory list ca3fa736-7e60-4365-9cc8-7afc78b53005
+----------------+------------------+----------+----------+----------+-----------+-------+-------+
| resource_class | allocation_ratio | min_unit | max_unit | reserved | step_size | total | used  |
+----------------+------------------+----------+----------+----------+-----------+-------+-------+
| VCPU           |             16.0 |        1 |        8 |        0 |         1 |     8 |     2 |
| MEMORY_MB      |              1.0 |        1 |    16008 |     2048 |         1 | 16008 | 13960 |
| DISK_GB        |              1.0 |        1 |       77 |        0 |         1 |    77 |    20 |
+----------------+------------------+----------+----------+----------+-----------+-------+-------+

$ openstack resource provider inventory list 0605bd29-71d5-40ed-ab8f-eceeaaac59b5
+----------------+------------------+----------+----------+----------+-----------+-------+------+
| resource_class | allocation_ratio | min_unit | max_unit | reserved | step_size | total | used |
+----------------+------------------+----------+----------+----------+-----------+-------+------+
| VCPU           |             16.0 |        1 |        8 |        0 |         1 |     8 |    0 |
| MEMORY_MB      |              1.0 |        1 |    16008 |     2048 |         1 | 16008 |    0 |
| DISK_GB        |              1.0 |        1 |       77 |        0 |         1 |    77 |    0 |
+----------------+------------------+----------+----------+----------+-----------+-------+------+

Hugepages: 1000 * 2M

root@juju-98fb61-zaza-d6f2c7825043-9:~# cat /proc/meminfo | grep -i huge
AnonHugePages:    622592 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:    1000
HugePages_Free:     1000
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:         2048000 kB

root@juju-98fb61-zaza-d6f2c7825043-9:~# free -mh
              total   used   free   shared   buff/cache   available
Mem:           15Gi  3.5Gi   11Gi    1.0Mi        713Mi        11Gi
Swap:            0B     0B     0B

Host reserved memory is 2G:

$ juju config nova-compute reserved-host-memory
2048

Available memory for general (non-hugepage) use: I expect the memory available for VMs to be 16008 (total) - 2048 (reserved) - 2048 (hugepages) = 11912.

Flavor with 13960 MB of RAM (more than the expected available 11912):
$ os flavor show 14g-mem
+----------------------------+--------------------------------------+
| Field                      | Value                                |
+----------------------------+--------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                |
| OS-FLV-EXT-DATA:ephemeral  | 0                                    |
| access_project_ids         | None                                 |
| description                | None                                 |
| disk                       | 20                                   |
| id                         | 377de58b-7aa2-499d-9940-abf98aaa5a8a |
| name                       | 14g-mem                              |
| os-flavor-access:is_public | True                                 |
| properties                 |                                      |
| ram                        | 13960                                |
| rxtx_factor                | 1.0                                  |
| swap                       |                                      |
| vcpus                      | 2                                    |
+----------------------------+--------------------------------------+

## VM with flavor 14g-mem is scheduled correctly (expected: No valid host)

$ os server list -c ID -c Name -c Status -c "Flavor"