Reviewed:  https://review.openstack.org/626228
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=1def76a1c49032d93ab6c7ee61dbbfe8e29cafca
Submitter: Zuul
Branch:    master
commit 1def76a1c49032d93ab6c7ee61dbbfe8e29cafca
Author: Stephen Finucane <[email protected]>
Date:   Wed Dec 19 16:03:22 2018 +0000

    Handle unbound vif plug errors on compute restart

    As with change Ia963a093a1b26d90b4de2e8fc623031cf175aece, we can
    sometimes cache failed port binding information which we'll see on
    startup. Long term, the fix for both issues is to figure out how this
    is being cached and stop that happening, but for now we simply need
    to allow the service to start up. To this end, we copy the approach
    in the aforementioned change and implement a translation function in
    os_vif_util for unbound, which will make the plug_vifs code raise
    VirtualInterfacePlugException, which is what the _init_instance code
    in ComputeManager is already handling. This has the same caveats as
    that change, namely that there may be smarter ways to do this that we
    should explore. However, that change also included a note which goes
    some way to explaining this.

    Change-Id: Iaec1f6fd12dba8b11991b7a7595593d5c8b1db50
    Signed-off-by: Stephen Finucane <[email protected]>
    Related-bug: #1784579
    Closes-bug: #1809136

** Changed in: nova
       Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1809136

Title:
  Unsupported VIF type unbound convert '_nova_to_osvif_vif_unbound' on
  compute restart

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  This is a variant of an existing bug:

  - https://bugs.launchpad.net/nova/+bug/1738373 tracks a similar
    exception ('_nova_to_osvif_vif_binding_failed') on compute startup.
  There are also two other closely related bugs:

  - https://bugs.launchpad.net/nova/+bug/1783917 tracks this same
    exception ('_nova_to_osvif_vif_unbound') but for live migrations
  - https://bugs.launchpad.net/nova/+bug/1784579 tracks a similar
    exception ('_nova_to_osvif_vif_binding_failed') but for live
    migrations

  In addition, there is a bug which is likely the root cause of all of
  the above issues (and this one) in the first place:

  - https://bugs.launchpad.net/nova/+bug/1751923

  In this instance, as with bug 1738373, we are unable to start the
  nova-compute service on the compute node due to an os-vif invoked
  error. nova-compute.log on the compute node shows:

  2018-05-12 16:42:47.323 305978 INFO os_vif [req-0a72cdea-843a-4932-b8a0-bc24c2f21d9f - - - - -] Successfully plugged vif VIFBridge(active=True,address=fa:16:3e:41:a9:2c,bridge_name='qbr8d027ff4-23',has_traffic_filtering=True,id=8d027ff4-2328-47df-9f9a-2c1a9914a83b,network=Network(9a98b244-b1d2-46b3-ab0e-be8456e3a984),plugin='ovs',port_profile=VIFPortProfileBase,preserve_on_delete=False,vif_name='tap8d027ff4-23')
  2018-05-12 16:42:47.369 305978 ERROR oslo_service.service [req-0a72cdea-843a-4932-b8a0-bc24c2f21d9f - - - - -] Error starting thread.
  2018-05-12 16:42:47.369 305978 ERROR oslo_service.service Traceback (most recent call last):
  2018-05-12 16:42:47.369 305978 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 708, in run_service
  2018-05-12 16:42:47.369 305978 ERROR oslo_service.service     service.start()
  2018-05-12 16:42:47.369 305978 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/nova/service.py", line 117, in start
  2018-05-12 16:42:47.369 305978 ERROR oslo_service.service     self.manager.init_host()
  2018-05-12 16:42:47.369 305978 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1154, in init_host
  2018-05-12 16:42:47.369 305978 ERROR oslo_service.service     self._init_instance(context, instance)
  2018-05-12 16:42:47.369 305978 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 957, in _init_instance
  2018-05-12 16:42:47.369 305978 ERROR oslo_service.service     self.driver.plug_vifs(instance, net_info)
  2018-05-12 16:42:47.369 305978 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 703, in plug_vifs
  2018-05-12 16:42:47.369 305978 ERROR oslo_service.service     self.vif_driver.plug(instance, vif)
  2018-05-12 16:42:47.369 305978 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/vif.py", line 771, in plug
  2018-05-12 16:42:47.369 305978 ERROR oslo_service.service     vif_obj = os_vif_util.nova_to_osvif_vif(vif)
  2018-05-12 16:42:47.369 305978 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/nova/network/os_vif_util.py", line 408, in nova_to_osvif_vif
  2018-05-12 16:42:47.369 305978 ERROR oslo_service.service     {'type': vif['type'], 'func': funcname})
  2018-05-12 16:42:47.369 305978 ERROR oslo_service.service NovaException: Unsupported VIF type unbound convert '_nova_to_osvif_vif_unbound'

  Inspecting the available ports shows the port does exist, so this
  looks like a caching issue.

  [stack@director:~]$ neutron port-list | grep fa:16:3e:41:a9:2c
  | 8d027ff4-2328-47df-9f9a-2c1a9914a83b | | fa:16:3e:41:a9:2c | {"subnet_id": "1f5ed9bc-aa7d-49bd-ac48-23b430fc0eb4", "ip_address": "172.19.9.17"} |

  [stack@director:~]$ neutron port-show 8d027ff4-2328-47df-9f9a-2c1a9914a83b
  +-----------------------+------------------------------------------------------------------------------------+
  | Field                 | Value                                                                              |
  +-----------------------+------------------------------------------------------------------------------------+
  | admin_state_up        | True                                                                               |
  | allowed_address_pairs |                                                                                    |
  | binding:host_id       | overcloud-compute-7.localdomain                                                    |
  | binding:profile       | {}                                                                                 |
  | binding:vif_details   | {"port_filter": true, "ovs_hybrid_plug": true}                                     |
  | binding:vif_type      | ovs                                                                                |
  | binding:vnic_type     | normal                                                                             |
  | created_at            | 2017-10-31T12:31:45Z                                                               |
  | description           |                                                                                    |
  | device_id             | b4ef4d0b-9e39-4741-a2dd-7fd7c066d13b                                               |
  | device_owner          | compute:nova                                                                       |
  | extra_dhcp_opts       |                                                                                    |
  | fixed_ips             | {"subnet_id": "1f5ed9bc-aa7d-49bd-ac48-23b430fc0eb4", "ip_address": "172.19.9.17"} |
  | id                    | 8d027ff4-2328-47df-9f9a-2c1a9914a83b                                               |
  | mac_address           | fa:16:3e:41:a9:2c                                                                  |
  | name                  |                                                                                    |
  | network_id            | 9a98b244-b1d2-46b3-ab0e-be8456e3a984                                               |
  | port_security_enabled | True                                                                               |
  | project_id            | 3b2049626c954cdc9147beee2d34b441                                                   |
  | qos_policy_id         |                                                                                    |
  | revision_number       | 184                                                                                |
  | security_groups       | 97aa0764-c0b5-47d1-88b2-285673d46a31                                               |
  |                       | c7addc13-5a77-4322-953a-9d89d42468e6                                               |
  |                       | cecdad42-7c78-45e7-9ec2-fef1086dbb7e                                               |
  |                       | de0a6da8-c44e-475f-90fd-1fb625840c52                                               |
  | status                | ACTIVE                                                                             |
  | tenant_id             | 3b2049626c954cdc9147beee2d34b441                                                   |
  | updated_at            | 2018-05-12T15:37:46Z                                                               |
  +-----------------------+------------------------------------------------------------------------------------+

  We should figure out why the invalid cache is getting saved, but
  we're going to track that effort separately.
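  To make the failure mode and the fix concrete: nova_to_osvif_vif in
  nova/network/os_vif_util.py dispatches on the VIF type by looking up a
  function named '_nova_to_osvif_vif_<type>', which is why the traceback
  above ends in "Unsupported VIF type unbound convert
  '_nova_to_osvif_vif_unbound'". The sketch below is a simplified
  stand-in, not nova's actual code (the exception classes and function
  bodies are illustrative only): it shows the name-based dispatch, the
  translation function for 'unbound' that the fix adds, and a rough
  analogue of the _init_instance error handling that turns the plug
  failure into an errored instance instead of a dead service.

  ```python
  # Simplified stand-ins for nova's exception classes (illustrative only).
  class NovaException(Exception):
      pass


  class VirtualInterfacePlugException(NovaException):
      pass


  def _nova_to_osvif_vif_ovs(vif):
      # Placeholder for a supported type's real translation logic.
      return {'plugin': 'ovs', 'id': vif['id']}


  def _nova_to_osvif_vif_unbound(vif):
      # The fix: instead of leaving 'unbound' unhandled, raise the
      # exception the compute manager's startup path already handles.
      raise VirtualInterfacePlugException(
          "vif_type=unbound: port binding for VIF %s was cached before "
          "binding completed" % vif['id'])


  def nova_to_osvif_vif(vif):
      # Name-based dispatch: a type with no matching translation function
      # produces the "Unsupported VIF type ... convert ..." error seen in
      # the traceback above.
      funcname = '_nova_to_osvif_vif_' + vif['type']
      func = globals().get(funcname)
      if func is None:
          raise NovaException("Unsupported VIF type %s convert '%s'"
                              % (vif['type'], funcname))
      return func(vif)


  def init_instance(vif):
      # Rough analogue of ComputeManager._init_instance: catch the plug
      # failure and mark the instance errored rather than crashing the
      # whole nova-compute service during init_host.
      try:
          nova_to_osvif_vif(vif)
          return 'ACTIVE'
      except VirtualInterfacePlugException:
          return 'ERROR'
  ```

  With this in place, a cached VIF dict like {'type': 'unbound', 'id':
  '8d027ff4-...'} no longer escapes as an unhandled NovaException during
  startup; init_instance returns 'ERROR' for that instance and the
  service can continue bringing up the rest.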
  For now, we should just focus on letting the service start, putting
  instances with errors like this into error state.

  This was originally reported here:
  https://bugzilla.redhat.com/show_bug.cgi?id=1578028

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1809136/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp

