Reviewed:  https://review.openstack.org/413469
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=2207dcf560413b213a8fb3737bb4b0923dcd96e0
Submitter: Jenkins
Branch:    master

commit 2207dcf560413b213a8fb3737bb4b0923dcd96e0
Author: Huan Xie <[email protected]>
Date:   Tue Dec 20 23:26:49 2016 -0800

    XenAPI: Fix vif plug problem during VM rescue/unrescue

    During VM rescue tests, we found that the nova xenserver driver
    failed because it timed out waiting for the vif-plug event from
    neutron. Checking the nova and neutron logs, we found several
    mistakes in the nova driver:

    (1) After several rounds of rescuing/unrescuing, it waits for the
    vif-plug event, although it shouldn't wait for such an event.
    (2) The neutron log shows that the port status sometimes changes
    during rescuing/unrescuing, which also shouldn't happen.
    (3) The related nova code deletes and re-creates the tap device
    (which is used by the neutron security group) each time a VM is
    booted. This delete/re-create action causes a port status change
    that shouldn't occur.
    (4) Adding or deleting security groups on a VM's port triggers a
    port status change, e.g. from ACTIVE to BUILDING. In the rescue
    scenario, relying on the VIF's status to decide whether to wait
    for the vif-plug event is therefore not appropriate.

    This patch fixes the above problems. Another patch enables the
    excluded rescue tests to test this fix:
    https://review.openstack.org/#/c/416197/

    Closes-Bug: #1651650

    Change-Id: I32c66733330bc9877caea7e2a2290c02b3906708

** Changed in: nova
       Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
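
The decision described in points (1) and (4) of the commit message above can be sketched roughly as follows. This is only an illustration, not the code from the patch: the function name `events_to_wait_for`, the `rescue` flag, and the dict-shaped VIF entries with `id` and `active` keys are all assumptions made for the example.

```python
def events_to_wait_for(network_info, rescue=False):
    """Return the (event_name, vif_id) pairs a driver would wait on.

    Hypothetical helper: `network_info` is assumed to be a list of
    dicts with an "id" key and an optional boolean "active" key.
    """
    if rescue:
        # During rescue/unrescue the port is expected to be active
        # already, and its reported status may flap, so do not wait.
        return []
    # Otherwise wait only for VIFs that are explicitly not active yet.
    return [
        ("network-vif-plugged", vif["id"])
        for vif in network_info
        if vif.get("active", True) is False
    ]
```

With this shape, a normal boot waits only for ports that neutron has not yet marked active, while a rescue skips the wait regardless of the momentary port status.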
https://bugs.launchpad.net/bugs/1651650

Title:
  XenAPI: server rescue test sometime failed with timeout waiting for
  vif plugging

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  Observed several failures in the Citrix XenServer CI for this test
  case:
    tempest.api.compute.servers.test_server_rescue

  There are timeouts waiting for the vif:

  $ grep 'Timeout waiting for vif plugging callbac' screen-n-cpu.txt.gz
  2016-12-20 10:58:52.036 4528 WARNING nova.virt.xenapi.vmops [req-ff027cef-59be-4326-95e1-065f68077d63 tempest-ServerRescueTestJSON-1293983176 tempest-ServerRescueTestJSON-1293983176] [instance: 28b094ee-c571-4083-b72b-5ea78f1f4291] Timeout waiting for vif plugging callback

  For rescue, it seems nova shouldn't wait for this event, as the port
  should already be active when the rescue starts. But we observed that
  the neutron service reported a second network-vif-plugged event:

  2016-12-20 10:52:31.689 712 DEBUG neutron.notifiers.nova [-] Sending events: [{'status': 'completed', 'tag': u'52d79a78-7205-4e69-8005-76a3cebbf267', 'name': 'network-vif-plugged', 'server_uuid': u'28b094ee-c571-4083-b72b-5ea78f1f4291'}] send_events /opt/stack/new/neutron/neutron/notifiers/nova.py:248
  2016-12-20 10:53:45.179 712 DEBUG neutron.notifiers.nova [-] Sending events: [{'status': 'completed', 'tag': u'52d79a78-7205-4e69-8005-76a3cebbf267', 'name': 'network-vif-plugged', 'server_uuid': u'28b094ee-c571-4083-b72b-5ea78f1f4291'}] send_events /opt/stack/new/neutron/neutron/notifiers/nova.py:248

  Nova only starts waiting for this event after the second event has
  already been sent, so it never catches the second event:

  2016-12-20 10:53:46.326 4528 DEBUG nova.virt.xenapi.vmops [req-ff027cef-59be-4326-95e1-065f68077d63 tempest-ServerRescueTestJSON-1293983176 tempest-ServerRescueTestJSON-1293983176] wait for instance event:[('network-vif-plugged', u'52d79a78-7205-4e69-8005-76a3cebbf267')] _spawn /opt/stack/new/nova/nova/virt/xenapi/vmops.py:599

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1651650/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
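
The ordering problem in the bug description (the event is sent at 10:53:45 and nova starts waiting at 10:53:46) can be reproduced in miniature. The sketch below is a toy model, not nova's event handling: `EventRegistry`, `prepare`, `deliver`, and the two helper functions are hypothetical names, and it assumes, as the logs suggest, that an event arriving with no registered waiter is simply dropped.

```python
import threading

# Event key taken from the log lines in the bug description.
VIF_EVENT = ("network-vif-plugged", "52d79a78-7205-4e69-8005-76a3cebbf267")


class EventRegistry:
    """Toy registry: an event with no registered waiter is dropped."""

    def __init__(self):
        self._lock = threading.Lock()
        self._waiters = {}

    def prepare(self, key):
        # Register interest in `key` and return an object to wait on.
        waiter = threading.Event()
        with self._lock:
            self._waiters[key] = waiter
        return waiter

    def deliver(self, key):
        # Wake the waiter for `key`; return False if nobody registered.
        with self._lock:
            waiter = self._waiters.pop(key, None)
        if waiter is None:
            return False
        waiter.set()
        return True


def wait_after_send(registry, timeout=0.05):
    # Order seen in the logs: the event is sent, then the wait begins.
    registry.deliver(VIF_EVENT)
    waiter = registry.prepare(VIF_EVENT)
    return waiter.wait(timeout)


def wait_before_send(registry, timeout=0.05):
    # Safe order: register first, then let the event arrive.
    waiter = registry.prepare(VIF_EVENT)
    registry.deliver(VIF_EVENT)
    return waiter.wait(timeout)
```

In this model `wait_after_send` times out while `wait_before_send` succeeds, which matches the symptom reported: a waiter registered after the notification can only time out.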

