Reviewed:  https://review.opendev.org/c/openstack/nova/+/815324
Committed: https://opendev.org/openstack/nova/commit/63ffba7496182f6f6f49a380f3c639fc3ded9772
Submitter: "Zuul (22348)"
Branch:    master

commit 63ffba7496182f6f6f49a380f3c639fc3ded9772
Author: Erlon R. Cruz <[email protected]>
Date:   Tue Dec 7 17:39:58 2021 -0300

    Fix pre_live_migration rollback

    During the pre live migration process, Nova performs most of the
    tasks related to the creation and operation of the VM in the
    destination host. That is done without interrupting any of the
    hardware in the source host. If the pre_live_migration fails, those
    same operations should be rolled back.

    Currently nova is sharing the _rollback_live_migration for both live
    and pre_live migration rollbacks, and that is causing the source host
    to try to re-attach network interfaces on the source host where they
    weren't actually de-attached.

    This patch fixes that by adding a conditional to allow nova to do
    different paths for migration and pre_live_migration rollbacks.

    Closes-bug: #1944619
    Change-Id: I784190ac356695dd508e0ad8ec31d8eaa3ebee56

** Changed in: nova
       Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1944619

Title:
  Instances with hardware offloaded ovs ports lose access after failed
  live migrations

Status in neutron:
  Incomplete
Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  If for some reason a live migration fails for an instance with an
  SRIOV port during the '_pre_live_migration' hook, the instance will
  lose access to the network and leave behind duplicated port bindings
  in the database.

  The instance regains connectivity on the source host after a reboot
  (don't know if there's another way to restore connectivity).

  As a side effect of this behavior, the pre-live migration cleanup hook
  also fails with:

  PCI device 0000:3b:10.0 is in use by driver QEMU

  [How to reproduce]

  - Create an environment with SRIOV (our case uses switchdev[1])
  - Create 1 VM
  - Provoke a failure in the _pre_live_migration process (for example,
    by creating a directory /var/lib/nova/instances/<instance id>)
  - Check the VM's connectivity
  - Check the logs for:
    libvirt.libvirtError: Requested operation is not valid: PCI device
    0000:03:04.1 is in use by driver QEMU, domain instance-00000001

  Full stack trace[2]

  [Expected]

  VM connectivity is restored even if it gets a brief disconnection.
  As happens for non-SRIOV scenarios, after a failure no leftovers
  remain (port bindings and instance path files).

  [Observed]

  VM loses connectivity, which is only restored after the VM status is
  set to ERROR and the VM is power cycled.
  Port bindings are not removed.

  [Environment]

  Focal Ussuri with Mellanox Connect5 cards

  [1] https://paste.ubuntu.com/p/PzBM7y6Dbr/
  [2] https://paste.ubuntu.com/p/ThQmDYtdSS/

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1944619/+subscriptions
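
For readers who want a picture of the conditional the commit message describes, here is a minimal sketch of the idea. It is not Nova's code: the function name, the `pre_live_migration` flag, the step names, and the `RollbackLog` helper are all hypothetical, and the assumption that destination cleanup runs on both paths is inferred from the [Expected] section rather than taken from the patch.

```python
from dataclasses import dataclass, field


@dataclass
class RollbackLog:
    """Records which cleanup steps ran, so the two paths can be compared."""
    steps: list = field(default_factory=list)


def rollback_live_migration(log, pre_live_migration=False):
    """Toy rollback; every name here is hypothetical, not Nova's API."""
    if not pre_live_migration:
        # The migration itself had started, so the source host's
        # interfaces may have been detached and must be plugged back in.
        log.steps.append("reattach_source_interfaces")
    # Assumption: destination-side leftovers (port bindings, instance
    # path files) are cleaned up on both rollback paths.
    log.steps.append("delete_destination_port_bindings")
    log.steps.append("cleanup_destination_instance_path")
    return log.steps


if __name__ == "__main__":
    # A failure during pre-live-migration skips the source re-attach.
    print(rollback_live_migration(RollbackLog(), pre_live_migration=True))
    # A failure during the migration proper still re-attaches on the source.
    print(rollback_live_migration(RollbackLog()))
```

The point of the flag is that a failure in the pre-live-migration phase never touched the source host's interfaces, so trying to re-attach them there is what triggered the "PCI device ... is in use by driver QEMU" error quoted above.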

