Reviewed:  https://review.opendev.org/c/openstack/neutron/+/855257
Committed: https://opendev.org/openstack/neutron/commit/91f0864dc0ccf0f67be7162f011706dbc6383cb3
Submitter: "Zuul (22348)"
Branch:    master

commit 91f0864dc0ccf0f67be7162f011706dbc6383cb3
Author: Rodolfo Alonso Hernandez <[email protected]>
Date:   Tue Aug 30 18:09:34 2022 +0200

    Add an active wait during the port provisioning event

    In ML2/OVN, during a live-migration process, it could happen that
    the port provisioning event is received before the port binding has
    been updated. That means the port has been created in the destination
    host and the event has been received (this event will remove any
    pending provisioning block). But the Nova port binding request, which
    updates the port binding registers, has not arrived yet. Because the
    port is considered "not bound" (yet), the port provisioning doesn't
    set the port status to ACTIVE.

    This patch creates an active wait during the port provisioning event
    method. If the port binding is still "unbound", the method retries
    the port retrieval several times, giving some time to the port
    binding request from Nova to arrive.

    Closes-Bug: #1988199
    Change-Id: I50091c84e67c172c94ce9140f23235421599185c

** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1988199

Title:
  [OVN][live-migration] Nova port binding request and
  "LogicalSwitchPortUpdateUpEvent" race condition

Status in neutron:
  Fix Released

Bug description:
  Related Bugzilla:
  https://bugzilla.redhat.com/show_bug.cgi?id=2120409

  Summary: after a live-migration, the VM port status is DOWN.

  During a live-migration, the following events happen in the Neutron
  server:

  1) We receive a port update. Because the "migrating_to" field is in
  the port binding, the OVN mech driver forces a port update from DOWN
  to UP. This (1) sets the port status to UP and (2) sends the
  vif-plugged event to Nova. That will trigger the port creation
  (layer 1) in the destination node.

  2) Then we receive the "LogicalSwitchPortUpdateDownEvent", because
  the source port was deleted. That sets the port status to DOWN.

  3) At the same time we receive the "LogicalSwitchPortUpdateUpEvent",
  because the port in the destination host has been created. This last
  event won't manually set the port status to UP. Instead it will
  remove any port provisioning block [1].

  3.1) If the port provisioning is considered complete
  ("provisioning_complete" event), this is processed in
  "Ml2Plugin._port_provisioned". The problem we are hitting here is
  that the port has no host (the port is still not bound):

  2022-08-26 10:08:23.373 17 DEBUG neutron.plugins.ml2.plugin [req-2b13d263-5748-46e2-9fdf-33df50634607 - - - - -] Port 943db0db-773f-45e9-8b68-0ebcc1840207 cannot update to ACTIVE because it is not bound. _port_provisioned /usr/lib/python3.9/site-packages/neutron/plugins/ml2/plugin.py:339

  4) Right after that, the Nova port binding request is received and
  the port is bound:
  https://paste.opendev.org/show/bIUoJkiStCIe8TBb0573/

  This is basically the issue we have here: there is a race condition
  between (1) the Nova port binding request and (2) the
  "LogicalSwitchPortUpdateUpEvent" that is received when the OVS port
  is created on a chassis.

  Just for testing, if I add a 1 second sleep at the very first line of
  "_port_provisioned", allowing the Nova port binding request (which
  binds the port to a host) to be received first, the port provisioning
  succeeds and the port is set to UP.

  I'll find a way to fix that in the Ml2Plugin code.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1988199/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
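
Illustrative note on the active wait described in the commit message
above: the idea is to re-read the port a bounded number of times and
stop as soon as it has a binding host, so that the Nova port binding
request has a chance to arrive. The following is only a minimal sketch
of that retry pattern, not the code merged in Neutron. The function
name "wait_for_bound_port", the "get_port" callable, the "host" key and
the attempt/interval values are all hypothetical stand-ins chosen for
the example.

```python
import time


def wait_for_bound_port(get_port, port_id, attempts=5, interval=0.2,
                        sleep=time.sleep):
    """Retry the port retrieval until the port is bound.

    Hypothetical helper, not Neutron's API. ``get_port`` is any callable
    returning a dict-like port (or None); a port counts as "bound" here
    when its "host" value is non-empty. Returns the last port retrieved,
    bound or not, so the caller decides what to do on timeout.
    """
    port = None
    for attempt in range(attempts):
        port = get_port(port_id)
        if port is not None and port.get("host"):
            return port  # binding request has arrived
        if attempt < attempts - 1:
            sleep(interval)  # give the binding request time to arrive
    return port


def is_bound(port):
    """Return True if the (hypothetical) port dict has a binding host."""
    return bool(port) and bool(port.get("host"))


if __name__ == "__main__":
    # Simulate the race: the binding only shows up on the third read.
    reads = {"count": 0}

    def fake_get_port(port_id):
        reads["count"] += 1
        host = "destination-host" if reads["count"] >= 3 else ""
        return {"id": port_id, "host": host}

    port = wait_for_bound_port(fake_get_port, "example-port", interval=0.01)
    print("bound" if is_bound(port) else "still unbound", port)
```

A bounded retry keeps the worst-case delay of the event handler
predictable (at most attempts - 1 sleeps), which is the main reason to
prefer it over the unconditional 1 second sleep used for testing in the
bug description.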

