Reviewed:  https://review.opendev.org/685391
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=6198f317be549e6d2bd324a48f226b379556e945
Submitter: Zuul
Branch:    master

commit 6198f317be549e6d2bd324a48f226b379556e945
Author: Matthew Booth <[email protected]>
Date:   Fri Sep 27 16:51:02 2019 +0100

    libvirt: Ignore DiskNotFound during update_available_resource

    There was a previous attempt to fix this in change
    Id687e11e235fd6c2f99bb647184310dfdce9a08d. However, there were 2
    problems with the previous fix:

    1. The handling of missing volumes and disks, while typically having
       the same cause, was inconsistent.
    2. It failed to consider the very wide race opportunity in
       _get_disk_over_committed_size_total between initially fetching the
       instance list from the DB and later getting disk sizes.

    Because _get_disk_over_committed_size_total() can be a very long
    operation, we found that we were reliably hitting this race in CI.

    It might be possible to fix the race, but this would add unnecessary
    complication to code which isn't critical. It's far more robust just
    to log it and ignore it, which is also consistent with the handling
    of missing volumes.

    Closes-Bug: #1774249
    Change-Id: I48719c02713113a41176b8f5cc3c5831f1284a39

** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1774249

Title:
  update_available_resource will raise DiskNotFound after resize but
  before confirm

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) ocata series:
  Triaged
Status in OpenStack Compute (nova) pike series:
  Triaged
Status in OpenStack Compute (nova) queens series:
  In Progress
Status in OpenStack Compute (nova) rocky series:
  In Progress
Status in OpenStack Compute (nova) stein series:
  In Progress

Bug description:
  Original reported in RH Bugzilla:
  https://bugzilla.redhat.com/show_bug.cgi?id=1584315

  Tested on OSP12 (Pike), but appears to be still present on master.
  Should only occur if nova compute is configured to use local file
  instance storage.

  Create instance A on compute X

  Resize instance A to compute Y
    Domain is powered off
    /var/lib/nova/instances/<uuid A> renamed to <uuid A>_resize on X
    Domain is *not* undefined

  On compute X:
    update_available_resource runs as a periodic task
    First action is to update self
    rt calls driver.get_available_resource()
    ...calls _get_disk_over_committed_size_total
    ...iterates over all defined domains, including the ones whose disks we renamed
    ...fails because a referenced disk no longer exists

  Results in errors in nova-compute.log:

  2018-05-30 02:17:08.647 1 ERROR nova.compute.manager [req-bd52371f-c6ec-4a83-9584-c00c5377acd8 - - - - -] Error updating resources for node compute-0.localdomain.: DiskNotFound: No disk at /var/lib/nova/instances/f3ed9015-3984-43f4-b4a5-c2898052b47d/disk
  2018-05-30 02:17:08.647 1 ERROR nova.compute.manager Traceback (most recent call last):
  2018-05-30 02:17:08.647 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6695, in update_available_resource_for_node
  2018-05-30 02:17:08.647 1 ERROR nova.compute.manager     rt.update_available_resource(context, nodename)
  2018-05-30 02:17:08.647 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 641, in update_available_resource
  2018-05-30 02:17:08.647 1 ERROR nova.compute.manager     resources = self.driver.get_available_resource(nodename)
  2018-05-30 02:17:08.647 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5892, in get_available_resource
  2018-05-30 02:17:08.647 1 ERROR nova.compute.manager     disk_over_committed = self._get_disk_over_committed_size_total()
  2018-05-30 02:17:08.647 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 7393, in _get_disk_over_committed_size_total
  2018-05-30 02:17:08.647 1 ERROR nova.compute.manager     config, block_device_info)
  2018-05-30 02:17:08.647 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 7301, in _get_instance_disk_info_from_config
  2018-05-30 02:17:08.647 1 ERROR nova.compute.manager     dk_size = disk_api.get_allocated_disk_size(path)
  2018-05-30 02:17:08.647 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/virt/disk/api.py", line 156, in get_allocated_disk_size
  2018-05-30 02:17:08.647 1 ERROR nova.compute.manager     return images.qemu_img_info(path).disk_size
  2018-05-30 02:17:08.647 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/virt/images.py", line 57, in qemu_img_info
  2018-05-30 02:17:08.647 1 ERROR nova.compute.manager     raise exception.DiskNotFound(location=path)
  2018-05-30 02:17:08.647 1 ERROR nova.compute.manager DiskNotFound: No disk at /var/lib/nova/instances/f3ed9015-3984-43f4-b4a5-c2898052b47d/disk

  And resource tracker is no longer updated. We can find lots of these
  in the gate.

  Note that change Icec2769bf42455853cbe686fb30fda73df791b25 nearly
  mitigates this, but doesn't because task_state is not set while the
  instance is awaiting confirm.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1774249/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
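
For readers who want to see the "log it and ignore it" approach from the
commit message in concrete form, here is a minimal Python sketch. It is
not the nova code: DiskNotFound, get_disk_size and total_disk_size are
hypothetical stand-ins defined only for this illustration, and
os.path.getsize replaces the qemu_img_info call seen in the traceback.
The point is only that a disk which disappears between listing and
measuring is logged and skipped, so one missing disk does not abort the
whole periodic update.

```python
import logging
import os

LOG = logging.getLogger(__name__)


class DiskNotFound(Exception):
    """Hypothetical stand-in for nova's DiskNotFound exception."""


def get_disk_size(path):
    # Hypothetical helper: uses the plain file size instead of qemu-img.
    try:
        return os.path.getsize(path)
    except FileNotFoundError:
        raise DiskNotFound("No disk at %s" % path)


def total_disk_size(disk_paths):
    """Sum disk sizes, logging and skipping disks that have vanished."""
    total = 0
    for path in disk_paths:
        try:
            total += get_disk_size(path)
        except DiskNotFound as exc:
            # The disk may have been renamed by a resize, or removed after
            # the list of paths was built. Skip it instead of failing.
            LOG.warning("Ignoring missing disk: %s", exc)
    return total
```

With this shape, a path such as <uuid A>/disk that was renamed to
<uuid A>_resize contributes nothing to the total and produces a warning,
while the remaining disks are still counted.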

