On 27.09.2017 10:14, Stephen Finucane wrote:
> On Mon, 2017-09-25 at 17:36 +0200, Jakub Jursa wrote:
>> Hello everyone,
>>
>> We're experiencing issues with running large instances (~60GB RAM) on
>> fairly large NUMA nodes (4 CPUs, 256GB RAM) while using CPU pinning. The
>> problem is that, in some extreme cases, qemu/KVM can have significant
>> memory overhead (10-15%?) which the nova-compute service doesn't take
>> into account when launching VMs. Using our configuration as an example:
>> imagine running two VMs with 30GB RAM each on one NUMA node (because we
>> use CPU pinning), i.e. using 60GB out of the 64GB of that NUMA node.
>> When both VMs consume their entire memory, the ~10% KVM overhead pushes
>> the node over its capacity and the OOM killer takes action, despite
>> there being plenty of free RAM on the other NUMA nodes. (The numbers are
>> arbitrary; the point is that nova-scheduler places the instance on the
>> node because the memory seems 'free enough', but the specific NUMA node
>> can lack the memory reserve.)
>>
>> Our initial solution was to use ram_allocation_ratio < 1 to keep some
>> memory in reserve - this didn't work. Upon studying the nova source, it
>> turns out that ram_allocation_ratio is ignored when using CPU pinning
>> (see
>> https://github.com/openstack/nova/blob/mitaka-eol/nova/virt/hardware.py#L859
>> and
>> https://github.com/openstack/nova/blob/mitaka-eol/nova/virt/hardware.py#L821).
>> We're running Mitaka, but this piece of code is implemented the same way
>> in Ocata. We're considering creating a patch to take ram_allocation_ratio
>> into account.
>>
>> My question is: is ram_allocation_ratio ignored on purpose when using
>> CPU pinning? If yes, what is the reasoning behind it? And what would be
>> the right way to ensure reserved RAM on the NUMA nodes?
>
> Both 'ram_allocation_ratio' and 'cpu_allocation_ratio' are ignored when
> using pinned CPUs because they don't make much sense: you want a
> high-performance VM and have assigned dedicated cores to the instance for
> this purpose, yet you're telling nova to over-schedule and place multiple
> instances on some of those same cores.
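To make sure we're reading the code the same way, this is roughly how I
understand the pinned path in nova/virt/hardware.py - a simplified
paraphrase only; the function and variable names below are made up and do
not match the real code:

    # Simplified paraphrase (illustrative only) of the memory check nova
    # applies when fitting an instance to a NUMA cell.
    def fits_on_numa_cell(cell_free_mb, instance_mb, ram_ratio,
                          pinning_requested):
        if pinning_requested:
            # Pinned path: the allocation ratio is never consulted; the
            # instance only has to fit into the cell's nominal free memory.
            return instance_mb <= cell_free_mb
        # Non-pinned path: the ratio scales the usable memory up or down.
        return instance_mb <= cell_free_mb * ram_ratio
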
What I wanted was to use 'ram_allocation_ratio' with a value of, say, 0.8 to
deliberately under-schedule the host and so keep a memory reserve on it.

> What you're probably looking for is the 'reserved_host_memory_mb' option.
> This defaults to 512 (at least in the latest master) so if you up this to
> 4192 or similar you should resolve the issue.

I'm afraid that this won't help, as this option doesn't take NUMA nodes into
account (e.g. there may be 'reserved_host_memory_mb' of free memory on the
physical host, but not in each of its NUMA nodes).

> Hope this helps,
> Stephen

Regards,
Jakub
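P.S. To make the per-NUMA-node point concrete, a back-of-the-envelope sketch
using the (arbitrary) numbers from my first mail; the ~10% overhead is just
what we observed, not a guaranteed figure:

    # Host: 4 NUMA nodes x 64 GB = 256 GB total.
    # A host-wide reserve (e.g. reserved_host_memory_mb = 4096) keeps 4 GB
    # free somewhere on the host, but nothing ties that reserve to the
    # particular node that hosts the pinned guests.
    node_mb = 64 * 1024          # memory of one NUMA node
    guests_mb = 2 * 30 * 1024    # two pinned 30 GB guests on that node
    overhead = 0.10              # qemu/KVM overhead we observed (~10%)

    used_mb = guests_mb * (1 + overhead)
    print(used_mb > node_mb)     # True -> this node can still hit the OOM killer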
