Reviewed:  https://review.openstack.org/629281
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b24ad3780bc872d1a17907909cd6bcbea7e804b3
Submitter: Zuul
Branch:    master
commit b24ad3780bc872d1a17907909cd6bcbea7e804b3
Author: Stephen Finucane <[email protected]>
Date:   Tue Jan 8 17:01:41 2019 +0000

    Fix overcommit for NUMA-based instances

    Change I5f5c621f2f0fa1bc18ee9a97d17085107a5dee53 modified how we
    evaluated available memory for instances with a NUMA topology.
    Previously, we used a non-pagesize-aware check unless the user had
    explicitly requested a specific pagesize. This meant that for
    instances without pagesize requests, nova considered hugepages as
    available memory when deciding whether a host had enough memory for
    the instance.

    The aforementioned change modified this so that all NUMA-based
    instances, whether they requested hugepages or not, would use the
    pagesize-aware check. Unfortunately, the functionality it reused had
    previously been used only for hugepages. Hugepages cannot be
    oversubscribed, so that code did not take oversubscription into
    account: it compared the request against available memory on the
    host (i.e. memory not consumed by other instances) rather than total
    memory. This is correct for hugepages but not for small pages, where
    overcommit is allowed. Given that overcommit is already handled
    elsewhere in the code, we simply modify the non-hugepage code path
    to check the available memory of the lowest pagesize against total
    memory.

    Change-Id: I890b2c81cd49c1c601e9baee6a249709d0f6810e
    Signed-off-by: Stephen Finucane <[email protected]>
    Closes-Bug: #1810977

** Changed in: nova
       Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1810977

Title:
  Oversubscription broken for instances with NUMA topologies

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  As described in [1], the fix to [2] appears to have inadvertently
  broken oversubscription of memory for instances with a NUMA topology
  but no hugepages.

  Steps to reproduce:

  1. Create a flavor that will consume > 50% of the available memory on
     your host(s) and specify an explicit NUMA topology. For example,
     on my all-in-one deployment where the host has 32 GB RAM, we will
     request a 20 GB instance:

     $ openstack flavor create --vcpu 2 --disk 0 --ram 20480 test.numa
     $ openstack flavor set test.numa --property hw:numa_nodes=2

  2. Boot an instance using this flavor:

     $ openstack server create --flavor test.numa \
         --image cirros-0.3.6-x86_64-disk --wait test

  3. Boot another instance using this flavor:

     $ openstack server create --flavor test.numa \
         --image cirros-0.3.6-x86_64-disk --wait test2

  # Expected result:

  The second instance should boot.

  # Actual result:

  The second instance fails to boot. We see the following error message
  in the logs:

  nova-scheduler[18295]: DEBUG nova.virt.hardware [None req-f7a6594b-8d25-424c-9c6e-8522f66ffd22 demo admin] No specific pagesize requested for instance, selected pagesize: 4 {{(pid=18318) _numa_fit_instance_cell /opt/stack/nova/nova/virt/hardware.py:1045}}
  nova-scheduler[18295]: DEBUG nova.virt.hardware [None req-f7a6594b-8d25-424c-9c6e-8522f66ffd22 demo admin] Not enough available memory to schedule instance with pagesize 4. Required: 10240, available: 5676, total: 15916. {{(pid=18318) _numa_fit_instance_cell /opt/stack/nova/nova/virt/hardware.py:1055}}

  If we revert the patch that addressed the bug [3], we return to the
  correct behaviour and the instance boots. With that, though, we
  obviously lose whatever benefits the change gave us.

  [1] http://lists.openstack.org/pipermail/openstack-discuss/2019-January/001459.html
  [2] https://bugs.launchpad.net/nova/+bug/1734204
  [3] https://review.openstack.org/#/c/532168

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1810977/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
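The distinction the commit message draws can be sketched as follows. This is a minimal illustration, not nova's actual `_numa_fit_instance_cell` code; the function name `can_fit_memory` and its signature are invented for the example. The numbers in the usage comments are the per-NUMA-cell figures from the scheduler log above (the 20480 MB flavor is split across two cells by `hw:numa_nodes=2`).

```python
def can_fit_memory(requested_mb, host_available_mb, host_total_mb,
                   uses_hugepages):
    """Illustrative memory-fit check for one host NUMA cell.

    Hugepages cannot be oversubscribed, so a hugepage-backed request
    must fit within memory not already consumed by other instances.
    Small (default) pages may be oversubscribed, and overcommit limits
    are enforced elsewhere, so the fix compares against total memory
    instead.
    """
    if uses_hugepages:
        return requested_mb <= host_available_mb
    return requested_mb <= host_total_mb


# Per-cell values from the log: Required: 10240, available: 5676,
# total: 15916. The buggy path applied the hugepage-style check to a
# small-page instance and rejected it; the fixed path compares against
# total memory and lets the existing overcommit logic police the rest.
print(can_fit_memory(10240, 5676, 15916, uses_hugepages=True))   # False (buggy rejection)
print(can_fit_memory(10240, 5676, 15916, uses_hugepages=False))  # True (fixed behaviour)
```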

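As a worked example of why the second instance in the reproduction steps should be schedulable: assuming nova's default `ram_allocation_ratio` of 1.5 (an assumption; deployments may override it), a 32 GB host can accept two 20 GB instances under overcommit.

```python
# Oversubscription arithmetic for the reproduction scenario:
# a 32 GB host and two 20480 MB test.numa instances.
ram_allocation_ratio = 1.5  # assumed nova default; deployment-specific
host_ram_mb = 32 * 1024
overcommit_limit_mb = host_ram_mb * ram_allocation_ratio  # 49152.0 MB

flavor_ram_mb = 20480
requested_mb = 2 * flavor_ram_mb  # both instances: 40960 MB

# 40960 <= 49152, so with overcommit both instances fit.
print(requested_mb <= overcommit_limit_mb)  # True
```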
