Public bug reported:

Description
===========
Nova updates hypervisor resources using the function
./nova/compute/resource_tracker.py:update_available_resource().
When some instances are *shut down*, this can produce inconsistent values
for resources such as vcpu_used.

Resources are taken from the function self.driver.get_available_resource():
https://github.com/openstack/nova/blob/f974e3c3566f379211d7fdc790d07b5680925584/nova/compute/resource_tracker.py#L617
https://github.com/openstack/nova/blob/f974e3c3566f379211d7fdc790d07b5680925584/nova/virt/libvirt/driver.py#L5766

This function calculates the allocated vCPUs using _get_vcpu_total():
https://github.com/openstack/nova/blob/f974e3c3566f379211d7fdc790d07b5680925584/nova/virt/libvirt/driver.py#L5352

As we can see, _get_vcpu_total() calls *self._host.list_guests()* without
the "only_running=False" parameter, so it does not take shut-down
instances into account.

At the end of the resource update process, the function
_update_available_resource() is called:

> /opt/stack/nova/nova/compute/resource_tracker.py(733)

 677     @utils.synchronized(COMPUTE_RESOURCE_SEMAPHORE)
 678     def _update_available_resource(self, context, resources):
 679
 681         # initialize the compute node object, creating it
 682         # if it does not already exist.
 683         self._init_compute_node(context, resources)

It initializes the compute node object with resources that were calculated
without the shut-down instances. If the compute node object already exists,
it *UPDATES* its fields - *for a while nova-api reports resource values
that differ from the real ones.*

 731         # update the compute_node
 732         self._update(context, cn)

The inconsistency is fixed automatically by code that runs later:
https://github.com/openstack/nova/blob/f974e3c3566f379211d7fdc790d07b5680925584/nova/compute/resource_tracker.py#L709

But on heavily loaded hypervisors (e.g. 100 active instances and 30
shut-down instances) it leaves wrong information in the nova database for
about 4-5 seconds (in my use case). This could cause other problems, such
as spawning on an already full hypervisor, because the scheduler has wrong
information about hypervisor usage.

Steps to reproduce
==================
1) Start devstack
2) Create 120 instances
3) Stop some instances
4) Watch the values flapping in nova hypervisor-show:

nova hypervisor-show e6dfc16b-7914-48fb-a235-6fe3a41bb6db

Expected result
===============
Returned values should stay the same throughout the test.

Actual result
=============
while true; do echo -n "$(date) "; echo "select hypervisor_hostname, vcpus_used from compute_nodes where hypervisor_hostname='example.compute.node.com';" | mysql nova_cell1; sleep 0.3; done

Thu Nov 2 14:50:09 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:10 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:10 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:10 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:11 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:11 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:11 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:11 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:12 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:12 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:12 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:13 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:13 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:13 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:14 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:14 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:14 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:15 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:15 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:15 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:16 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:16 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:16 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:17 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:17 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:17 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:17 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:18 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:18 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:18 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:19 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:19 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:19 UTC 2017 example.compute.node.com 120

Bad values were stored in the nova DB for about 5 seconds. During this
time nova-scheduler could pick this host.

Environment
===========
Devstack master (f974e3c3566f379211d7fdc790d07b5680925584).
Releases down to Newton are certainly affected as well.

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1729621

Title:
  Inconsistent value for vcpu_used

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1729621/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
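
The mismatch described in the report can be illustrated with a small, self-contained model. This is only a sketch, not nova code: `Guest`, `vcpus_from_driver` and `vcpus_from_tracker` are hypothetical names, and the scenario (120 single-vCPU instances, 3 of them stopped) is an assumption chosen to match the 117/120 values seen in the polling output. The idea that the later correction counts every instance on the host, running or not, is inferred from the report rather than taken from the nova source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guest:
    """Hypothetical stand-in for an instance on the hypervisor."""
    name: str
    vcpus: int
    running: bool


def vcpus_from_driver(guests, only_running=True):
    """Sum vCPUs the way the report describes the driver-side count:
    with only_running=True, shut-down guests are skipped."""
    return sum(g.vcpus for g in guests if g.running or not only_running)


def vcpus_from_tracker(guests):
    """Sum vCPUs of every instance on the host, running or not
    (assumed behaviour of the later correction step)."""
    return sum(g.vcpus for g in guests)


# Assumed scenario: 120 single-vCPU instances, the first 3 are stopped.
guests = [Guest(f"vm-{i}", vcpus=1, running=(i >= 3)) for i in range(120)]

transient = vcpus_from_driver(guests)   # value written first
corrected = vcpus_from_tracker(guests)  # value written by the later step
print(transient, corrected)  # prints: 117 120
```

In this model, any reader that samples the stored value between the two writes sees 117 instead of 120, which is the window the report is concerned about.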

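
To quantify how long the wrong value stays visible, the polling output in the "Actual result" section can be post-processed. The following is a sketch under assumptions: the helper names `parse` and `dip_window` are hypothetical, and the input is a hand-copied subset of the samples from the report (the lines around each value change), not a new measurement. Because the timestamps have one-second resolution, the result is approximate.

```python
from datetime import datetime

# Matches the `date` output used in the report, e.g.
# "Thu Nov 2 14:50:11 UTC 2017" (assumes an English locale).
FMT = "%a %b %d %H:%M:%S UTC %Y"


def parse(line):
    """Split one polling line into (timestamp, vcpus_used)."""
    stamp, _host, value = line.rsplit(" ", 2)
    return datetime.strptime(stamp, FMT), int(value)


def dip_window(lines, expected):
    """Return the seconds between the first sample that deviates from
    `expected` and the first later sample that matches it again,
    or None if no complete dip is present."""
    start = None
    for line in lines:
        when, value = parse(line)
        if value != expected and start is None:
            start = when
        elif value == expected and start is not None:
            return (when - start).total_seconds()
    return None


samples = [
    "Thu Nov 2 14:50:11 UTC 2017 example.compute.node.com 120",
    "Thu Nov 2 14:50:12 UTC 2017 example.compute.node.com 117",
    "Thu Nov 2 14:50:17 UTC 2017 example.compute.node.com 117",
    "Thu Nov 2 14:50:17 UTC 2017 example.compute.node.com 120",
]
print(dip_window(samples, expected=120))  # prints: 5.0
```

The 5.0 seconds computed here agrees with the "about 5 seconds" stated in the report.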
