On Mon, 2017-09-25 at 17:36 +0200, Jakub Jursa wrote:
> Hello everyone,
>
> We're experiencing issues with running large instances (~60GB RAM) on
> fairly large NUMA nodes (4 CPUs, 256GB RAM) while using CPU pinning. The
> problem is that in some extreme cases qemu/KVM can have significant
> memory overhead (10-15%?) which the nova-compute service doesn't take
> into account when launching VMs. Using our configuration as an example:
> imagine running two VMs with 30GB RAM each on one NUMA node (because we
> use CPU pinning), therefore using 60GB out of the 64GB in that NUMA
> domain. When both VMs consume their entire memory, the ~10% KVM overhead
> means the OOM killer takes action (despite there being plenty of free
> RAM on other NUMA nodes). The numbers are arbitrary; the point is that
> nova-scheduler schedules the instance onto the node because the memory
> seems 'free enough', while the specific NUMA node can lack the memory
> reserve.
>
> Our initial solution was to use ram_allocation_ratio < 1 to ensure
> having some reserved memory - this didn't work. Upon studying the nova
> source, it turns out that ram_allocation_ratio is ignored when using CPU
> pinning (see
> https://github.com/openstack/nova/blob/mitaka-eol/nova/virt/hardware.py#L859
> and
> https://github.com/openstack/nova/blob/mitaka-eol/nova/virt/hardware.py#L821
> ). We're running Mitaka, but this piece of code is implemented the same
> way in Ocata. We're considering creating a patch to take
> ram_allocation_ratio into account.
>
> My question is: is ram_allocation_ratio ignored on purpose when using
> CPU pinning? If yes, what is the reasoning behind it? And what would be
> the right solution to ensure having reserved RAM on the NUMA nodes?
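Roughly speaking, the fitting code at those links checks a pinned instance's
memory directly against the NUMA cell's physical memory, so the ratio never
enters the comparison. A simplified sketch of that behaviour (illustrative
only, not the actual nova code, and the function name is made up):

    # Illustrative sketch only, NOT the real nova/virt/hardware.py logic.
    def cell_fits_instance(cell_memory_mb, cell_memory_used_mb,
                           instance_memory_mb, cpu_pinning_requested,
                           ram_allocation_ratio=1.0):
        if cpu_pinning_requested:
            # Pinned instances are checked against physical cell memory;
            # ram_allocation_ratio is never applied.
            memory_limit_mb = cell_memory_mb
        else:
            # Unpinned instances get the overcommit ratio applied.
            memory_limit_mb = cell_memory_mb * ram_allocation_ratio
        return cell_memory_used_mb + instance_memory_mb <= memory_limit_mb

    # With the numbers from your example (64GB cell, two 30GB pinned VMs),
    # the second VM still "fits" (30GB used + 30GB <= 64GB), even though
    # qemu/KVM overhead can push real usage past the cell's capacity and
    # trigger the OOM killer.
    print(cell_fits_instance(65536, 30720, 30720, True))   # True
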
Both 'ram_allocation_ratio' and 'cpu_allocation_ratio' are ignored when using
pinned CPUs because they don't make much sense there: you've asked for a
high-performance VM and assigned it dedicated cores for that purpose, yet an
allocation ratio would tell nova to oversubscribe and schedule multiple
instances onto some of those same cores.

What you're probably looking for is the 'reserved_host_memory_mb' option.
This defaults to 512 (at least on current master), so if you bump it up to
4192 or similar you should resolve the issue.

Hope this helps,
Stephen
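
P.S. For reference, 'reserved_host_memory_mb' is set in nova.conf on the
compute node. A minimal example (4096 is just a placeholder value; size it
to cover the qemu/KVM overhead you actually observe):

    [DEFAULT]
    # Memory (in MB) held back for the host and qemu/KVM overhead;
    # the resource tracker subtracts it from what is offered to instances.
    reserved_host_memory_mb = 4096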
