Reviewed:  https://review.openstack.org/378746
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=0f4bd241665c287e49f2d30ca79be96298217b7e
Submitter: Jenkins
Branch:    master
commit 0f4bd241665c287e49f2d30ca79be96298217b7e
Author: Matthew Booth <[email protected]>
Date:   Wed Sep 28 16:44:41 2016 +0100

    libvirt: Fix BlockDevice.wait_for_job when qemu reports no job

    We were misinterpreting the return value of blockJobInfo. Most
    immediately, we were expecting it to return an integer, which has
    never been the case. blockJobInfo also raises an exception on error.

    Note that the implementation of abort_on_error has always expected an
    integer return value, and exceptions have never been handled, which
    means that abort_on_error has always been a no-op, and exceptions have
    never been swallowed. As this is also the most intuitive behaviour, we
    make it explicit by removing abort_on_error. Any exception raised by
    blockJobInfo will continue to propagate unhandled.

    We were obfuscating the return value indicating that the job did not
    exist, {}, by populating a BlockDeviceJobInfo with fake values. We
    de-obfuscate this by returning None instead, which is unambiguous.

    wait_for_job() was misnamed, as it does not wait. This is renamed to
    is_job_complete() to be less confusing. Note that the logic is
    reversed.

    After discussion with Eric Blake of the libvirt team (see review
    comments: https://review.openstack.org/#/c/375652/), we are now
    confident asserting that if no job exists then it has completed
    (although we are still not sure that it succeeded). Consequently we
    remove the wait_for_job_clean parameter, and always assume that no job
    means it has completed. Previously this was implicit, because no job
    meant a defaulted BlockDeviceJobInfo.job value of 0.

    Co-authored-by: Sławek Kapłoński <[email protected]>
    Closes-Bug: #1627134
    Change-Id: I2d0daa32b1d37fa60412ad7a374ee38cebdeb579

** Changed in: nova
       Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
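The semantics the commit describes can be sketched as follows. This is a minimal illustration, not nova's actual implementation: FakeDomain is a hypothetical stand-in for libvirt.virDomain, assuming only that blockJobInfo() returns {} when qemu reports no job and a populated dict (cur/end/type/bandwidth) while a job is running, and raises on error.

```python
class FakeDomain:
    """Hypothetical stand-in for libvirt.virDomain, for illustration only."""

    def __init__(self, job_info):
        self._job_info = job_info

    def blockJobInfo(self, disk, flags=0):
        # libvirt-python returns {} when qemu reports no job at all;
        # on error it raises libvirtError rather than returning a code.
        return self._job_info


def is_job_complete(domain, disk):
    """Return True if the block job on `disk` has finished.

    An empty dict means qemu reports no job, which (per the libvirt
    team) means the job has completed, though not necessarily that it
    succeeded. Any exception from blockJobInfo propagates unhandled.
    """
    status = domain.blockJobInfo(disk, flags=0)
    if status == {}:
        return True  # no job: treat as complete
    return status.get('cur') == status.get('end')


running = FakeDomain({'type': 1, 'bandwidth': 0, 'cur': 100, 'end': 1000})
gone = FakeDomain({})
print(is_job_complete(running, 'vda'))  # False: job still copying
print(is_job_complete(gone, 'vda'))     # True: no job means complete
```

Note how the {} case is answered directly instead of being wrapped in a fake zero-filled BlockDeviceJobInfo, which is what made the old wait_for_job() ambiguous.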
https://bugs.launchpad.net/bugs/1627134

Title:
  libvirt driver stuck deleting online snapshot

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) newton series:
  Confirmed

Bug description:
  There is a problem in the nova code in nova/virt/libvirt/driver.py:

      dev = guest.get_block_device(rebase_disk)
      if guest.is_active():
          result = dev.rebase(rebase_base, relative=relative)
          if result == 0:
              LOG.debug('blockRebase started successfully',
                        instance=instance)

          while dev.wait_for_job(abort_on_error=True):
              LOG.debug('waiting for blockRebase job completion',
                        instance=instance)
              time.sleep(0.5)

  This code expects the libvirt block job to remain for some period in a
  'cur == end' state with end != 0 (the wait_for_job logic). In fact, at
  least with libvirt 1.3.3.2 and libvirt-python-1.2.17, we are not
  guaranteed to catch a job in such a state before it disappears and the
  libvirt call returns an empty result, which get_job_info() represents
  as BlockDeviceJobInfo(job=0, bandwidth=0, cur=0, end=0). Such a result
  never matches the wait_for_job finish criteria (effective since
  I45ac06eae0b1949f746dae305469718649bfcf23 was merged), so the loop
  spins forever.
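The fix inverts the polling condition: loop until the job is reported complete, so a job that vanishes between polls ends the loop instead of hanging it. A minimal sketch, assuming only an object exposing an is_job_complete(disk) method; the names wait_until_job_complete and FakeDev are illustrative, not the exact nova API, and the timeout is an addition for the sketch's own safety.

```python
import time


def wait_until_job_complete(dev, disk, interval=0.5, timeout=30.0):
    """Poll dev.is_job_complete(disk) until it returns True.

    Raises TimeoutError if the job is still running after `timeout`
    seconds. A job that no longer exists counts as complete, so the
    loop cannot hang on a vanished job.
    """
    deadline = time.monotonic() + timeout
    while not dev.is_job_complete(disk):
        if time.monotonic() > deadline:
            raise TimeoutError('block job on %s did not complete' % disk)
        time.sleep(interval)


class FakeDev:
    """Reports the job complete after a fixed number of polls."""

    def __init__(self, polls_until_done):
        self._left = polls_until_done

    def is_job_complete(self, disk):
        self._left -= 1
        return self._left <= 0


wait_until_job_complete(FakeDev(3), 'vda', interval=0.01)
print('job completed')  # reached once polling observes completion
```

Compare this with the buggy loop above: `while dev.wait_for_job(...)` keeps looping as long as the job looks *unfinished*, and the zero-filled BlockDeviceJobInfo for a vanished job always looked unfinished.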
This bug started to occur in our third-party CI:
http://openstack-3rd-party-storage-ci-logs.virtuozzo.com/28/314928/13/check/dsvm-tempest-kvm/5aae7aa

n-cpu.log:

2016-08-17 15:47:04.856 42835 DEBUG nova.virt.libvirt.driver [req-81ae5279-0750-4745-839f-6d92f9ab3dc9 nova service] [instance: 018e566a-916b-4b76-9971-b4d4c12ea0b6] volume_snapshot_delete: delete_info: {u'type': u'qcow2', u'merge_target_file': None, u'file_to_merge': None, u'volume_id': u'3e64cef0-03e3-407e-b6c5-fac873a7c98a'} _volume_snapshot_delete /opt/stack/new/nova/nova/virt/libvirt/driver.py:2054
2016-08-17 15:47:04.864 42835 DEBUG nova.virt.libvirt.driver [req-81ae5279-0750-4745-839f-6d92f9ab3dc9 nova service] [instance: 018e566a-916b-4b76-9971-b4d4c12ea0b6] found device at vda _volume_snapshot_delete /opt/stack/new/nova/nova/virt/libvirt/driver.py:2098
2016-08-17 15:47:04.864 42835 DEBUG nova.virt.libvirt.driver [req-81ae5279-0750-4745-839f-6d92f9ab3dc9 nova service] [instance: 018e566a-916b-4b76-9971-b4d4c12ea0b6] disk: vda, base: None, bw: 0, relative: False _volume_snapshot_delete /opt/stack/new/nova/nova/virt/libvirt/driver.py:2171
2016-08-17 15:47:04.868 42835 DEBUG nova.virt.libvirt.driver [req-81ae5279-0750-4745-839f-6d92f9ab3dc9 nova service] [instance: 018e566a-916b-4b76-9971-b4d4c12ea0b6] blockRebase started successfully _volume_snapshot_delete /opt/stack/new/nova/nova/virt/libvirt/driver.py:2178
2016-08-17 15:47:04.889 42835 DEBUG nova.virt.libvirt.driver [req-81ae5279-0750-4745-839f-6d92f9ab3dc9 nova service] [instance: 018e566a-916b-4b76-9971-b4d4c12ea0b6] waiting for blockRebase job completion _volume_snapshot_delete /opt/stack/new/nova/nova/virt/libvirt/driver.py:2182
2016-08-17 15:47:05.396 42835 DEBUG nova.virt.libvirt.driver [req-81ae5279-0750-4745-839f-6d92f9ab3dc9 nova service] [instance: 018e566a-916b-4b76-9971-b4d4c12ea0b6] waiting for blockRebase job completion _volume_snapshot_delete /opt/stack/new/nova/nova/virt/libvirt/driver.py:2182
2016-08-17 15:47:05.951 42835 DEBUG nova.virt.libvirt.driver [req-81ae5279-0750-4745-839f-6d92f9ab3dc9 nova service] [instance: 018e566a-916b-4b76-9971-b4d4c12ea0b6] waiting for blockRebase job completion _volume_snapshot_delete /opt/stack/new/nova/nova/virt/libvirt/driver.py:2182
2016-08-17 15:47:06.456 42835 DEBUG nova.virt.libvirt.driver [req-81ae5279-0750-4745-839f-6d92f9ab3dc9 nova service] [instance: 018e566a-916b-4b76-9971-b4d4c12ea0b6] waiting for blockRebase job completion _volume_snapshot_delete /opt/stack/new/nova/nova/virt/libvirt/driver.py:2182
2016-08-17 15:47:06.968 42835 DEBUG nova.virt.libvirt.driver [req-81ae5279-0750-4745-839f-6d92f9ab3dc9 nova service] [instance: 018e566a-916b-4b76-9971-b4d4c12ea0b6] waiting for blockRebase job completion _volume_snapshot_delete /opt/stack/new/nova/nova/virt/libvirt/driver.py:2182
2016-08-17 15:47:07.594 42835 DEBUG nova.virt.libvirt.driver [req-81ae5279-0750-4745-839f-6d92f9ab3dc9 nova service] [instance: 018e566a-916b-4b76-9971-b4d4c12ea0b6] waiting for blockRebase job completion _volume_snapshot_delete /opt/stack/new/nova/nova/virt/libvirt/driver.py:2182

By the way, I didn't find any tests checking this in the gate.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1627134/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp

