Reviewed:  https://review.openstack.org/587071
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=41452a5c6adb8cae54eef24803f4adc468131b34
Submitter: Zuul
Branch:    master

commit 41452a5c6adb8cae54eef24803f4adc468131b34
Author: Lee Yarwood <[email protected]>
Date:   Mon Jul 30 13:41:35 2018 +0100

    conductor: Recreate volume attachments during a reschedule

    When an instance with attached volumes fails to spawn, cleanup code
    within the compute manager (_shutdown_instance called from
    _build_resources) will delete the volume attachments referenced by
    the bdms in Cinder. As a result, we should check and, if necessary,
    recreate these volume attachments when rescheduling an instance.

    Note that there are a few different ways to fix this bug by making
    changes to the compute manager code, either by not deleting the
    volume attachment on failure before rescheduling [1] or by
    performing the get/create check during each build after the
    reschedule [2].

    The problem with *not* cleaning up the attachments is that if we
    don't reschedule, we've left orphaned "reserved" volumes in Cinder
    (or we have to add special logic to tell compute when to clean up
    attachments).

    The problem with checking the existence of the attachment on every
    new host we build on is that we'd be needlessly checking it for
    initial creates even if we never need to reschedule, unless again we
    add special logic to avoid that (like checking to see if we've
    rescheduled at all).

    Also, either approach requires changes to the compute service, which
    means older computes might not have the fix.

    So ultimately it seems that the best way to handle this is:

    1. Only deal with this on reschedules.
    2. Let the cell conductor orchestrate it, since it's already dealing
       with the reschedule. Then the compute logic doesn't need to
       change.

    [1] https://review.openstack.org/#/c/587071/3/nova/compute/manager.py@1631
    [2] https://review.openstack.org/#/c/587071/4/nova/compute/manager.py@1667

    Change-Id: I739c06bd02336bf720cddacb21f48e7857378487
    Closes-bug: #1784353

** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1784353

Title: Rescheduled boot from volume instances fail due to the premature removal of their attachments

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) queens series: In Progress
Status in OpenStack Compute (nova) rocky series: In Progress

Bug description:

Description
===========

This is caused by the cleanup code within the compute layer
(_shutdown_instance) removing all volume attachments associated with an
instance, with no attempt being made to recreate them before the
instance is rescheduled.

Steps to reproduce
==================

- Attempt to boot an instance with volumes attached.
- Ensure spawn() fails, for example by stopping the L2 network agent
  services on the compute host.

Expected result
===============

The instance is rescheduled to another compute host and boots correctly.

Actual result
=============

The instance fails to boot on every host it is rescheduled to, due to a
missing volume attachment.

Environment
===========

1. Exact version of OpenStack you are running. See the following list
   for all releases: http://docs.openstack.org/releases/

   bf497cc47497d3a5603bf60de652054ac5ae1993

2. Which hypervisor did you use?
   (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
   What's the version of that?

   Libvirt + KVM, however this shouldn't matter.

3. Which storage type did you use?
   (For example: Ceph, LVM, GPFS, ...)
   What's the version of that?

   N/A

4. Which networking type did you use?
   (For example: nova-network, Neutron with OpenVSwitch, ...)
   N/A

Logs & Configs
==============

2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] Traceback (most recent call last):
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1579, in _prep_block_device
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     wait_func=self._await_block_device_map_created)
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 837, in attach_block_devices
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     _log_and_attach(device)
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 834, in _log_and_attach
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     bdm.attach(*attach_args, **attach_kwargs)
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 46, in wrapped
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     ret_val = method(obj, context, *args, **kwargs)
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 617, in attach
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     virt_driver, do_driver_attach)
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 274, in inner
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     return f(*args, **kwargs)
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 614, in _do_locked_attach
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     self._do_attach(*args, **_kwargs)
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 599, in _do_attach
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     do_driver_attach)
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 513, in _volume_attach
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     self['mount_device'])['connection_info']
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/nova/volume/cinder.py", line 379, in wrapper
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     res = method(self, ctx, *args, **kwargs)
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/nova/volume/cinder.py", line 418, in wrapper
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     attachment_id=attachment_id))
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/nova/volume/cinder.py", line 450, in _reraise
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     six.reraise(type(desired_exc), desired_exc, sys.exc_info()[2])
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/nova/volume/cinder.py", line 415, in wrapper
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     res = method(self, ctx, attachment_id, *args, **kwargs)
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/nova/volume/cinder.py", line 824, in attachment_update
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     'code': getattr(ex, 'code', None)})
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     self.force_reraise()
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     six.reraise(self.type_, self.value, self.tb)
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/nova/volume/cinder.py", line 814, in attachment_update
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     attachment_id, _connector)
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/cinderclient/v3/attachments.py", line 67, in update
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     resp = self._update('/attachments/%s' % id, body)
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/cinderclient/base.py", line 344, in _update
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     resp, body = self.api.client.put(url, body=body, **kwargs)
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/cinderclient/client.py", line 206, in put
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     return self._cs_request(url, 'PUT', **kwargs)
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/cinderclient/client.py", line 191, in _cs_request
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     return self.request(url, method, **kwargs)
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/cinderclient/client.py", line 177, in request
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     raise exceptions.from_response(resp, body)
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] VolumeAttachmentNotFound: Volume attachment 11d518a9-16d4-4ccb-9487-ec2b35834945 could not be found.
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]
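
The conductor-side "check and, if necessary, recreate" step described in the commit message can be sketched as follows. This is a minimal, self-contained illustration, not nova's actual code. It assumes a hypothetical in-memory stand-in for the Cinder attachments API (FakeVolumeAPI), a trimmed-down hypothetical BlockDeviceMapping, and a hypothetical helper recreate_missing_attachments. VolumeAttachmentNotFound is a local class that only mirrors the error name in the log above. The sketch reproduces the failure mode: cleanup on the first host deletes the attachment, so the next host's attachment update fails. It then shows that recreating missing attachments at reschedule time lets the update succeed.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List, Optional


class VolumeAttachmentNotFound(Exception):
    """Local stand-in mirroring the error name seen in the traceback."""


@dataclass
class BlockDeviceMapping:
    # Hypothetical, trimmed-down BDM: only the fields this sketch needs.
    volume_id: Optional[str]
    attachment_id: Optional[str] = None

    @property
    def is_volume(self) -> bool:
        return self.volume_id is not None


class FakeVolumeAPI:
    """Hypothetical in-memory stand-in for the Cinder attachments API."""

    def __init__(self) -> None:
        self.attachments: Dict[str, dict] = {}

    def attachment_create(self, volume_id: str, instance_uuid: str) -> dict:
        att = {"id": str(uuid.uuid4()), "volume_id": volume_id,
               "instance_uuid": instance_uuid}
        self.attachments[att["id"]] = att
        return att

    def attachment_get(self, attachment_id: str) -> dict:
        try:
            return self.attachments[attachment_id]
        except KeyError:
            raise VolumeAttachmentNotFound(attachment_id) from None

    def attachment_update(self, attachment_id: str) -> dict:
        # Fails like the traceback above when the attachment is gone.
        return self.attachment_get(attachment_id)

    def attachment_delete(self, attachment_id: str) -> None:
        self.attachments.pop(attachment_id, None)


def recreate_missing_attachments(volume_api: FakeVolumeAPI,
                                 instance_uuid: str,
                                 bdms: List[BlockDeviceMapping]) -> List[str]:
    """Reschedule-time check: recreate attachments the old host deleted.

    Returns the volume ids whose attachments had to be recreated.
    """
    recreated = []
    for bdm in bdms:
        if not (bdm.is_volume and bdm.attachment_id):
            continue  # not a volume, or no attachment to validate
        try:
            volume_api.attachment_get(bdm.attachment_id)
        except VolumeAttachmentNotFound:
            att = volume_api.attachment_create(bdm.volume_id, instance_uuid)
            bdm.attachment_id = att["id"]  # point the BDM at the new one
            recreated.append(bdm.volume_id)
    return recreated


if __name__ == "__main__":
    api = FakeVolumeAPI()
    inst = "d48c9894-2ba2-4752-bae5-36c437933ff1"
    bdm = BlockDeviceMapping(volume_id="vol-1")
    bdm.attachment_id = api.attachment_create("vol-1", inst)["id"]

    # First host: spawn() fails and cleanup deletes the attachment.
    api.attachment_delete(bdm.attachment_id)
    try:
        api.attachment_update(bdm.attachment_id)
    except VolumeAttachmentNotFound:
        print("without the check, the next host fails: attachment not found")

    # Reschedule: recreate the attachment before picking the next host.
    print(recreate_missing_attachments(api, inst, [bdm]))  # ['vol-1']
    print(api.attachment_update(bdm.attachment_id)["volume_id"])  # vol-1
```

Running the check only on the reschedule path, and not on every build, matches points 1 and 2 of the commit message. Initial creates pay no extra lookup, and compute hosts need no changes.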

