On Mon, 2017-09-25 at 17:36 +0200, Jakub Jursa wrote:
> Hello everyone,
> 
> We're experiencing issues with running large instances (~60GB RAM) on
> fairly large NUMA hosts (4 CPUs, 256GB RAM) while using CPU pinning. The
> problem is that in some extreme cases qemu/KVM can have significant
> memory overhead (10-15%?) which the nova-compute service doesn't take
> into account when launching VMs. Using our configuration as an example:
> imagine running two VMs with 30GB RAM each on one NUMA node (because we
> use CPU pinning), therefore using 60GB out of 64GB for the given NUMA
> domain. When both VMs consume their entire memory (given ~10% KVM
> overhead), the OOM killer takes action, despite plenty of free RAM on
> other NUMA nodes. (The numbers are arbitrary; the point is that
> nova-scheduler places the instance on the host because memory seems
> 'free enough', while the specific NUMA node can lack the needed memory
> reserve.)
> 
> Our initial solution was to use ram_allocation_ratio < 1 to ensure some
> reserved memory - this didn't work. Upon studying the nova source, it
> turns out that ram_allocation_ratio is ignored when using CPU pinning
> (see
> https://github.com/openstack/nova/blob/mitaka-eol/nova/virt/hardware.py#L859
> and
> https://github.com/openstack/nova/blob/mitaka-eol/nova/virt/hardware.py#L821
> ). We're running Mitaka, but this piece of code is implemented in the
> same way in Ocata.
> We're considering creating a patch to take ram_allocation_ratio into
> account.
> 
> My question is - is ram_allocation_ratio ignored on purpose when using
> CPU pinning? If so, what is the reasoning behind it? And what would be
> the right way to ensure some RAM stays reserved on each NUMA node?
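
To put concrete numbers on the quoted scenario, here is a minimal sketch of
the per-NUMA-node arithmetic. The ~10% QEMU/KVM overhead is Jakub's estimate
rather than a measured value, and the helper below is purely illustrative -
it is not Nova code:

    # Illustrative only: per-NUMA-node memory accounting with pinned VMs.
    MB_PER_GB = 1024                          # work in MiB throughout

    numa_node_capacity_mb = 64 * MB_PER_GB    # one NUMA node of a 4-node, 256GB host
    guest_ram_mb = 30 * MB_PER_GB             # flavor RAM per instance
    qemu_overhead_ratio = 0.10                # assumed QEMU/KVM overhead (~10%)

    def worst_case_usage_mb(num_guests):
        """Worst-case resident memory if every guest touches all of its RAM."""
        per_guest_mb = guest_ram_mb * (1 + qemu_overhead_ratio)
        return num_guests * per_guest_mb

    usage_mb = worst_case_usage_mb(2)          # two pinned 30GB guests on one node
    print(usage_mb, numa_node_capacity_mb)     # 67584.0 vs 65536
    print(usage_mb > numa_node_capacity_mb)    # True -> that one node can hit the OOM killer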

Both 'ram_allocation_ratio' and 'cpu_allocation_ratio' are ignored when using
pinned CPUs because they don't make much sense there: you want a high
performance VM and have assigned dedicated cores to the instance for that
purpose, yet the allocation ratios would tell nova to overcommit and schedule
multiple instances onto some of those same cores.
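
For reference, this is the code path you hit when the flavor requests
dedicated CPUs, e.g. with something along these lines (the flavor name is
just an example):

    openstack flavor set m1.pinned --property hw:cpu_policy=dedicated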

What you're probably looking for is the 'reserved_host_memory_mb' option. This
defaults to 512 (at least on current master), so if you raise it to 4192 or
similar you should resolve the issue.
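
Assuming your deployment tooling doesn't already manage it, that would be a
small nova.conf change on each compute node (the option lives in the [DEFAULT]
section), followed by a restart of nova-compute:

    [DEFAULT]
    # Keep ~4GB of host RAM out of the scheduler's hands, for QEMU overhead,
    # the host OS, etc.
    reserved_host_memory_mb = 4192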

Hope this helps,
Stephen

