Reviewed:  https://review.opendev.org/c/openstack/nova/+/883411
Committed: https://opendev.org/openstack/nova/commit/6ebd8a56d1c79fc15f385f232ce0e22937b2fdac
Submitter: "Zuul (22348)"
Branch:    master
commit 6ebd8a56d1c79fc15f385f232ce0e22937b2fdac
Author: Mark Goddard <[email protected]>
Date:   Wed May 17 14:20:54 2023 +0100

    ironic: Fix ConflictException when deleting server

    When unprovisioning works via Ironic, all operations in
    _cleanup_deploy have already been completed. Prior to this patch we
    continued attempting all of the clean-up steps; these eventually
    error out with BadRequest or similar, and we complete the delete.
    However, if cleaning has started, we hit a conflict exception
    instead, so we never reach the expected error above.

    Prior to moving to the SDK, which landed in Caracal, we would retry
    on conflict errors; you could tweak the config to keep retrying for
    the length of time cleaning usually takes in your environment.
    Since that patch (Ieda5636a5e80ea4af25db2e90be241869759e30c) we now
    hard fail with this error:

        openstack.exceptions.ConflictException: Failed to detach VIF xxx
        from bare metal node yyy ... Node yyy is locked by host zzz,
        please retry after the current operation is completed.

    This change simply skips calling the operations that will always
    error out, avoiding the need to wait for cleaning to complete before
    getting the expected error message.

    Closes-Bug: #2019977
    Related-Bug: #1815799
    Change-Id: I60971b58cf1f24bdb5d40f668e380ebfee2ac495

** Changed in: nova
       Status: In Progress => Fix Released
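To make the fix concrete, here is a minimal sketch of the idea in the commit
message: skip the cleanup calls that can only race with Ironic's automated
cleaning. The names below (delete_instance, UNPROVISION_STATES and the two
callables) are hypothetical stand-ins, not the nova driver's real API; the
actual change lives in the driver's _cleanup_deploy handling.

    # Hedged illustration only -- not the actual nova change. Once Ironic has
    # unprovisioned a node, tear down has already removed the VIF attachments
    # (Ironic does this since Rocky) and automated cleaning may hold the node
    # lock, so nova skips the explicit VIF detach instead of retrying it into
    # a Conflict (HTTP 409).

    UNPROVISION_STATES = {"active", "deploy failed", "error"}  # assumed set


    def delete_instance(provision_state, unprovision, unplug_vifs):
        """Delete a bare metal instance without racing Ironic cleaning.

        provision_state: the node's provision state when the delete starts.
        unprovision: callable that triggers Ironic tear down (removes VIFs).
        unplug_vifs: callable that detaches VIFs through the Ironic API.
        """
        if provision_state in UNPROVISION_STATES:
            # Tear down covers VIF removal, so skip vif_detach and avoid
            # "Node ... is locked by host ..." once cleaning starts.
            unprovision()
        else:
            # spawn() failed before Ironic started provisioning, so nova
            # still has to detach the VIFs it plugged earlier.
            unplug_vifs()

The only substantive part is the provision-state check: when the node has
been (or is being) deprovisioned, the explicit VIF unplug is redundant and is
skipped rather than retried.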
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2019977

Title:
  Ironic VIF unplug in destroy may fail due to node locked by cleaning

Status in Ironic:
  New
Status in OpenStack Compute (nova):
  Fix Released

Bug description:

This bug describes a race condition during deletion of a bare metal (Ironic)
instance when automated cleaning is enabled in Ironic (which is the default).

# Steps to reproduce

As a race condition, this one is not easy to reproduce, although it has been
seen in the wild on several occasions. The easiest way to reproduce it is in
a development environment (e.g. devstack), tweaking some configuration and
adding some sleeps to particular operations in nova and ironic.

nova.conf configuration:

Reduce retries for ironic API polling. Needs to be long enough to allow
operations to complete.

[ironic]
api_max_retries = 10

sleep #1: nova-compute waits for the node to move to cleaning
(nova/virt/ironic/driver.py):

    try:
        if node.provision_state in _UNPROVISION_STATES:
            self._unprovision(instance, node)
        else:
            # NOTE(hshiina): if spawn() fails before ironic starts
            # provisioning, instance information should be
            # removed from ironic node.
            self._remove_instance_info_from_node(node, instance)
    finally:
        ##### HERE #######
        time.sleep(20)
        ##### HERE #######
        # NOTE(mgoddard): We don't need to remove instance info at this
        # point since we will have already done it. The destroy will only
        # succeed if this method returns without error, so we will end up
        # removing the instance info eventually.
        self._cleanup_deploy(node, instance, network_info,
                             remove_instance_info=False)

sleep #2: ironic conductor holds onto the node lock:

    @task_manager.require_exclusive_lock
    def do_node_clean(task, clean_steps=None, disable_ramdisk=False):
        """Internal RPC method to perform cleaning of a node.

        :param task: a TaskManager instance with an exclusive lock on its node
        :param clean_steps: For a manual clean, the list of clean steps to
            perform. Is None for automated cleaning (default). For more
            information, see the clean_steps parameter of
            :func:`ConductorManager.do_node_clean`.
        :param disable_ramdisk: Whether to skip booting ramdisk for cleaning.
        """
        ##### HERE #####
        import time
        time.sleep(120)
        ##### HERE #####
        node = task.node
        manual_clean = clean_steps is not None

Next, create, then destroy a bare metal instance.

# Expected results

The node is deleted.

# Actual results

The node moves to an ERROR state.

Logs from nova-compute:

2023-05-17 14:30:30.493 7 ERROR ironicclient.common.http [req-c8d567dc-7088-4bad-9a09-6ba33b6c7b4d 5ef38591b78a42928e8dfbdbaa90acd9 60d7786e349549eeb5a309b96251c789 - default default] Error contacting Ironic server: Node 2666c347-ff75-4c04-9059-9b890ad14b08 is locked by host mark-ironic.novalocal, please retry after the current operation is completed. (HTTP 409). Attempt 6 of 6: ironicclient.common.apiclient.exceptions.Conflict: Node 2666c347-ff75-4c04-9059-9b890ad14b08 is locked by host mark-ironic.novalocal, please retry after the current operation is completed. (HTTP 409)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server [req-c8d567dc-7088-4bad-9a09-6ba33b6c7b4d 5ef38591b78a42928e8dfbdbaa90acd9 60d7786e349549eeb5a309b96251c789 - default default] Exception during message handling: ironicclient.common.apiclient.exceptions.Conflict: Node 2666c347-ff75-4c04-9059-9b890ad14b08 is locked by host mark-ironic.novalocal, please retry after the current operation is completed. (HTTP 409)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 309, in dispatch
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 229, in _do_dispatch
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/exception_wrapper.py", line 72, in wrapped
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server     context, exc, binary)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.6/site-packages/oslo_utils/excutils.py", line 227, in __exit__
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server     self.force_reraise()
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.6/site-packages/oslo_utils/excutils.py", line 200, in force_reraise
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server     raise self.value
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/exception_wrapper.py", line 63, in wrapped
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server     return f(self, context, *args, **kw)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/compute/manager.py", line 10693, in external_instance_event
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server     event.tag)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/compute/manager.py", line 10498, in _process_instance_vif_deleted_event
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server     self.driver.detach_interface(context, instance, vif)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1630, in detach_interface
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server     self.unplug_vifs(instance, [vif])
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1591, in unplug_vifs
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server     self._unplug_vifs(node, instance, network_info)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1558, in _unplug_vifs
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server     port_id)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/ironic/client_wrapper.py", line 180, in call
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server     return self._multi_getattr(client, method)(*args, **kwargs)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.6/site-packages/ironicclient/v1/node.py", line 512, in vif_detach
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server     global_request_id=global_request_id)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.6/site-packages/ironicclient/v1/node.py", line 408, in delete
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server     global_request_id=global_request_id)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.6/site-packages/ironicclient/common/base.py", line 292, in _delete
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server     headers=headers)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.6/site-packages/ironicclient/common/http.py", line 426, in raw_request
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server     return self._http_request(url, method, **kwargs)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.6/site-packages/ironicclient/common/http.py", line 287, in wrapper
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server     return func(self, url, method, **kwargs)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.6/site-packages/ironicclient/common/http.py", line 387, in _http_request
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server     error_json.get('debuginfo'), method, url)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server ironicclient.common.apiclient.exceptions.Conflict: Node 2666c347-ff75-4c04-9059-9b890ad14b08 is locked by host mark-ironic.novalocal, please retry after the current operation is completed. (HTTP 409)

# Analysis

This issue is similar to https://bugs.launchpad.net/nova/+bug/1815799, where
instance_uuid and instance_info were being cleaned up. Ironic deletes VIFs
during tear down since Rocky [1], so we should also be safe to skip this in
the case where the node has been deprovisioned.
Debug logs confirm that the VIF cleanup is a noop:

2023-05-17 14:24:11.791 7 DEBUG nova.virt.ironic.driver [req-e419f330-fe14-4c4b-8d5a-b7a40bb77651 5ef38591b78a42928e8dfbdbaa90acd9 60d7786e349549eeb5a309b96251c789 - default default] VIF a5c1622b-eb13-43d3-a46d-aa5781a85cdb isn't attached to Ironic node 3152ff2d-a0cb-4e7a-bcc7-81266106fef2 _unplug_vifs /var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/ironic/driver.py:1561

[1] https://review.opendev.org/q/I8d683d2d506c97535b5a8f9a5de4c070c7e887df

To manage notifications about this bug go to:
https://bugs.launchpad.net/ironic/+bug/2019977/+subscriptions

