Reviewed: https://review.openstack.org/401375 Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d079f37f7ccb8da9e74416cedce4f194b66ee1d0 Submitter: Jenkins Branch: master
commit d079f37f7ccb8da9e74416cedce4f194b66ee1d0 Author: int32bit <[email protected]> Date: Wed Nov 23 18:21:16 2016 +0800 Fix wait for detach code to handle 'disk not found error' Currently, the code does an initial detach from the persistent and transient domains in one call, then follows up with a call of the retryable function, which first checks the domain xml before retrying the transient domain detach. If the transient domain detach (which is asynchronous) completes after we checked the domain xml, trying to detach it again will raise a LibvirtError exception. Then, our loop code in `_do_wait_and_retry_detach` doesn't catch it and will respond with error. We should be handling the `disk not found error` from libvirt and consider the job complete. Closes-Bug: #1642689 Change-Id: I131aaf28d2f5d5d964d4045e3d7d62207079cfb0 ** Changed in: nova Status: In Progress => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1642689 Title: ceph: volume detach fails with "libvirtError: operation failed: disk vdb not found" Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) newton series: In Progress Bug description: Seeing this failure in this job: tempest.api.volume.test_volumes_snapshots.VolumesV1SnapshotTestJSON.test_snapshot_create_with_volume_in_use[compute ,id-b467b54c-07a4-446d-a1cf-651dedcc3ff1] While detaching a volume on cleanup it times out waiting for the volume status to go from 'in-use' back to 'available': 2016-11-17 11:58:59.110301 | Captured traceback: 2016-11-17 11:58:59.110310 | ~~~~~~~~~~~~~~~~~~~ 2016-11-17 11:58:59.110322 | Traceback (most recent call last): 2016-11-17 11:58:59.110342 | File "tempest/common/waiters.py", line 189, in wait_for_volume_status 2016-11-17 11:58:59.110356 | raise lib_exc.TimeoutException(message) 2016-11-17 11:58:59.110373 | tempest.lib.exceptions.TimeoutException: Request timed out 2016-11-17 11:58:59.110405 | Details: Volume db12eda4-4ce6-4f00-a4e0-9df115f230e5 failed to reach available status (current in-use) within the required time (196 s). The volume detach request is here: 2016-11-17 11:58:59.031058 | 2016-11-17 11:38:55,018 8316 INFO [tempest.lib.common.rest_client] Request (VolumesV1SnapshotTestJSON:_run_cleanups): 202 DELETE http://127.0.0.1:8774/v2.1/servers/584a65b5-07fa-4994-a2d5-1676d0e13a8c/os-volume_attachments/db12eda4-4ce6-4f00-a4e0-9df115f230e5 0.277s 2016-11-17 11:58:59.031103 | 2016-11-17 11:38:55,018 8316 DEBUG [tempest.lib.common.rest_client] Request - Headers: {'X-Auth-Token': '<omitted>', 'Accept': 'application/json', 'Content-Type': 'application/json'} 2016-11-17 11:58:59.031113 | Body: None 2016-11-17 11:58:59.031235 | Response - Headers: {'content-location': 'http://127.0.0.1:8774/v2.1/servers/584a65b5-07fa-4994-a2d5-1676d0e13a8c/os-volume_attachments/db12eda4-4ce6-4f00-a4e0-9df115f230e5', 'content-type': 'application/json', 'x-openstack-nova-api-version': '2.1', 'date': 'Thu, 17 Nov 2016 11:38:55 GMT', 'content-length': '0', 'status': '202', 'connection': 'close', 'x-compute-request-id': 'req-9f0541d3-6eec-4793-8852-7bd01708932e', 'openstack-api-version': 'compute 2.1', 'vary': 'X-OpenStack-Nova-API-Version'} 2016-11-17 11:58:59.031248 | Body: Following the req-9f0541d3-6eec-4793-8852-7bd01708932e request ID to the compute logs we see this detach failure: http://logs.openstack.org/00/398800/1/gate/gate-tempest-dsvm-full- devstack-plugin-ceph-ubuntu- xenial/a387fb0/logs/screen-n-cpu.txt.gz?level=TRACE#_2016-11-17_11_39_00_649 2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [req-9f0541d3-6eec-4793-8852-7bd01708932e tempest-VolumesV1SnapshotTestJSON-1819335716 tempest-VolumesV1SnapshotTestJSON-1819335716] [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] Failed to detach volume db12eda4-4ce6-4f00-a4e0-9df115f230e5 from /dev/vdb 2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] Traceback (most recent call last): 2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] File "/opt/stack/new/nova/nova/compute/manager.py", line 4757, in _driver_detach_volume 2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] encryption=encryption) 2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 1307, in detach_volume 2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] wait_for_detach() 2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] File "/usr/local/lib/python2.7/dist-packages/oslo_service/loopingcall.py", line 385, in func 2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] return evt.wait() 2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] File "/usr/local/lib/python2.7/dist-packages/eventlet/event.py", line 121, in wait 2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] return hubs.get_hub().switch() 2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 294, in switch 2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] return self.greenlet.switch() 2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] File "/usr/local/lib/python2.7/dist-packages/oslo_service/loopingcall.py", line 136, in _run_loop 2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] result = func(*self.args, **self.kw) 2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] File "/usr/local/lib/python2.7/dist-packages/oslo_service/loopingcall.py", line 356, in _func 2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] result = f(*args, **kwargs) 2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] File "/opt/stack/new/nova/nova/virt/libvirt/guest.py", line 380, in _do_wait_and_retry_detach 2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] self.detach_device(config, persistent=False, live=live) 2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] File "/opt/stack/new/nova/nova/virt/libvirt/guest.py", line 401, in detach_device 2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] self._domain.detachDeviceFlags(device_xml, flags=flags) 2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 186, in doit 2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] result = proxy_call(self._autowrap, f, *args, **kwargs) 2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 144, in proxy_call 2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] rv = execute(f, *args, **kwargs) 2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 125, in execute 2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] six.reraise(c, e, tb) 2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 83, in tworker 2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] rv = meth(*args, **kwargs) 2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] File "/usr/local/lib/python2.7/dist-packages/libvirt.py", line 1190, in detachDeviceFlags 2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] if ret == -1: raise libvirtError ('virDomainDetachDeviceFlags() failed', dom=self) 2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] libvirtError: operation failed: disk vdb not found 2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Failed%20to%20detach%20volume%5C%22%20AND%20message%3A%5C%22_do_wait_and_retry_detach%5C%22%20AND%20message%3A%5C%22libvirtError%3A%20operation%20failed%3A%20disk%5C%22%20AND%20message%3A%5C%22not%20found%5C%22%20AND%20tags%3A%5C%22screen-n-cpu.txt%5C%22&from=7d 21 hits in the last 7 days, check and gate, all failures, 90% in ceph jobs. Looks like that wait for detach code should be handling the not found from libvirt and considering the job complete. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1642689/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : [email protected] Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp

