Reviewed:  https://review.openstack.org/520024
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c9b74bcfa09d11c2046ce1bfb6dd8463b3a2f3b0
Submitter: Zuul
Branch:    master
commit c9b74bcfa09d11c2046ce1bfb6dd8463b3a2f3b0
Author: Maciej Józefczyk <[email protected]>
Date:   Thu Nov 16 14:49:42 2017 +0100

    Update resources once in update_available_resource

    This change ensures that resources are updated only once per
    update_available_resource() call.

    Compute resources were previously updated both during host object
    initialization and at the end of update_available_resource(). This
    could cause inconsistencies in resource tracking between the compute
    host and the DB for a couple of seconds, until the final _update() at
    the end of update_available_resource() was called.

    For example, nova-api shows that the host uses 10GB of RAM, but in fact
    it uses 12GB, because the DB doesn't include the resources that belong
    to shut-down instances. Because of this, nova-scheduler
    (CachingScheduler) could choose, based on incomplete information, a
    host which is already full.

    For more information please see the related bug: #1729621

    Change-Id: I120a98cc4c11772f24099081ef3ac44a50daf71d
    Closes-Bug: #1729621

** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1729621

Title:
  Inconsistent value for vcpu_used

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) ocata series:
  New
Status in OpenStack Compute (nova) pike series:
  New

Bug description:

Description
===========
Nova updates hypervisor resources using the function
./nova/compute/resource_tracker.py:update_available_resource(). For
*shut-down* instances this can produce inconsistent values for resources
such as vcpu_used.
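The commit message describes two writes of the compute node per periodic run: one during compute node initialization and one at the end of update_available_resource(). The small simulation below illustrates why that exposes a stale value. It is a sketch, not Nova code. FakeComputeNodeRow, periodic_update_old and periodic_update_new are hypothetical names, and the 117/120 figures are borrowed from the vcpus_used values in the reproduction log.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeComputeNodeRow:
    """Hypothetical stand-in for a compute_nodes DB row; records every write."""
    vcpus_used: int = 0
    history: List[int] = field(default_factory=list)

    def save(self, vcpus_used: int) -> None:
        # Each save() is a moment at which nova-api or the scheduler could read the row.
        self.vcpus_used = vcpus_used
        self.history.append(vcpus_used)


def periodic_update_old(row: FakeComputeNodeRow, driver_vcpus_used: int,
                        tracked_vcpus_used: int) -> None:
    # Pre-fix flow (simplified): the driver's figure, which omits shut-down
    # instances, is written while initializing the existing compute node ...
    row.save(driver_vcpus_used)
    # ... and the tracker's complete figure is written only at the end.
    row.save(tracked_vcpus_used)


def periodic_update_new(row: FakeComputeNodeRow, tracked_vcpus_used: int) -> None:
    # Post-fix flow (simplified): a single write of the final value.
    row.save(tracked_vcpus_used)


if __name__ == "__main__":
    old_row = FakeComputeNodeRow()
    periodic_update_old(old_row, driver_vcpus_used=117, tracked_vcpus_used=120)
    print(old_row.history)  # [117, 120]: 117 is briefly visible to readers

    new_row = FakeComputeNodeRow()
    periodic_update_new(new_row, tracked_vcpus_used=120)
    print(new_row.history)  # [120]
```

The point of the simulation is the history list. Before the fix, any reader that sampled the row between the two writes saw the intermediate value. After the fix, only the final value is ever persisted.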
Resources are taken from the function self.driver.get_available_resource():
https://github.com/openstack/nova/blob/f974e3c3566f379211d7fdc790d07b5680925584/nova/compute/resource_tracker.py#L617
https://github.com/openstack/nova/blob/f974e3c3566f379211d7fdc790d07b5680925584/nova/virt/libvirt/driver.py#L5766

This function calculates allocated vCPUs based on the function _get_vcpu_total():
https://github.com/openstack/nova/blob/f974e3c3566f379211d7fdc790d07b5680925584/nova/virt/libvirt/driver.py#L5352

As we can see, _get_vcpu_total() calls *self._host.list_guests()* without the "only_running=False" parameter, so it does not take shut-down instances into account.

At the end of the resource update process, the function _update_available_resource() is called:

> /opt/stack/nova/nova/compute/resource_tracker.py(733)
677     @utils.synchronized(COMPUTE_RESOURCE_SEMAPHORE)
678     def _update_available_resource(self, context, resources):
679
681         # initialize the compute node object, creating it
682         # if it does not already exist.
683         self._init_compute_node(context, resources)

It initializes the compute node object with resources that were calculated without the shut-down instances. If the compute node object already exists, it *UPDATES* its fields, so *for a while nova-api reports resource values that differ from the real ones.*

731         # update the compute_node
732         self._update(context, cn)

The inconsistency is fixed automatically later, during other code execution:
https://github.com/openstack/nova/blob/f974e3c3566f379211d7fdc790d07b5680925584/nova/compute/resource_tracker.py#L709

However, on heavily loaded hypervisors (e.g. 100 active instances and 30 shut-down instances) it leaves wrong information in the nova database for about 4-5 seconds (in my use case). This can lead to other issues, such as spawning on an already full hypervisor, because the scheduler has wrong information about hypervisor usage.
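The undercount itself comes from list_guests() being called without only_running=False. The following sketch models only that filtering. Guest, list_guests and count_vcpus_used here are hypothetical simplified stand-ins that do not call libvirt. The scenario of 120 single-vCPU guests with 3 of them stopped is an assumption chosen to match the 120/117 values in the log.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Guest:
    """Hypothetical simplified guest record (not nova's libvirt Guest class)."""
    name: str
    vcpus: int
    running: bool


def list_guests(guests: List[Guest], only_running: bool = True) -> List[Guest]:
    # Models the filtering described in the report: unless only_running=False
    # is passed, stopped (shut-down) guests are skipped.
    return [g for g in guests if g.running or not only_running]


def count_vcpus_used(guests: List[Guest], only_running: bool = True) -> int:
    # Sums vCPUs over whichever guests list_guests() returns.
    return sum(g.vcpus for g in list_guests(guests, only_running=only_running))


# Assumed scenario: 120 single-vCPU guests, the first 3 of them stopped.
guests = [Guest(name=f"vm{i}", vcpus=1, running=(i >= 3)) for i in range(120)]

print(count_vcpus_used(guests))                      # 117: stopped guests ignored
print(count_vcpus_used(guests, only_running=False))  # 120: all guests counted
```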
Steps to reproduce
==================
1) Start devstack
2) Create 120 instances
3) Stop some instances
4) Watch the values blink in nova hypervisor-show:
   nova hypervisor-show e6dfc16b-7914-48fb-a235-6fe3a41bb6db

Expected result
===============
The returned values should stay the same throughout the test.

Actual result
=============
while true; do
    echo -n "$(date) "
    echo "select hypervisor_hostname, vcpus_used from compute_nodes where hypervisor_hostname='example.compute.node.com';" | mysql nova_cell1
    sleep 0.3
done

Thu Nov 2 14:50:09 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:10 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:10 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:10 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:11 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:11 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:11 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:11 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:12 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:12 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:12 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:13 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:13 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:13 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:14 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:14 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:14 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:15 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:15 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:15 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:16 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:16 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:16 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:17 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:17 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:17 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:17 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:18 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:18 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:18 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:19 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:19 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:19 UTC 2017 example.compute.node.com 120

Bad values were stored in the nova DB for about 5 seconds. During this time nova-scheduler could pick this host.

Environment
===========
Devstack master (f974e3c3566f379211d7fdc790d07b5680925584). At least all releases back to Newton are affected.
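The length of the bad window can be read directly off the polling output. The helper below is a hypothetical sketch (deviation_windows is not part of Nova or any tool). It finds each consecutive run of samples whose vcpus_used differs from the expected value. The samples are a subset of the rows in the log above: one per second, plus the two rows at 14:50:17 where the value flips back.

```python
from datetime import datetime
from typing import List, Tuple

Sample = Tuple[str, int]         # ("HH:MM:SS", vcpus_used)
Window = Tuple[str, str, float]  # (first bad sample, last bad sample, seconds)


def _window(start: str, end: str) -> Window:
    # Timestamps only have 1-second resolution, so the duration is approximate.
    t0 = datetime.strptime(start, "%H:%M:%S")
    t1 = datetime.strptime(end, "%H:%M:%S")
    return (start, end, (t1 - t0).total_seconds())


def deviation_windows(samples: List[Sample], expected: int) -> List[Window]:
    """Return each consecutive run of samples whose value differs from `expected`."""
    windows: List[Window] = []
    start = end = None
    for ts, value in samples:
        if value != expected:
            if start is None:
                start = ts
            end = ts
        elif start is not None:
            windows.append(_window(start, end))
            start = end = None
    if start is not None:  # a run still open at the end of the samples
        windows.append(_window(start, end))
    return windows


samples = [
    ("14:50:10", 120), ("14:50:11", 120),
    ("14:50:12", 117), ("14:50:13", 117), ("14:50:14", 117),
    ("14:50:15", 117), ("14:50:16", 117), ("14:50:17", 117),
    ("14:50:17", 120), ("14:50:18", 120),
]
print(deviation_windows(samples, expected=120))  # [('14:50:12', '14:50:17', 5.0)]
```

Because the loop polls every 0.3 s but date prints whole seconds, the result is consistent with the report's "about 5 seconds" rather than an exact measurement.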

