Reviewed:  https://review.opendev.org/c/openstack/nova/+/852002
Committed: https://opendev.org/openstack/nova/commit/9fea934c71d3c2fa7fdd80c67d94e18466c5cf9a
Submitter: "Zuul (22348)"
Branch:    master
commit 9fea934c71d3c2fa7fdd80c67d94e18466c5cf9a
Author: Brett Milford <[email protected]>
Date:   Thu Aug 4 16:52:33 2022 +1000

    Handle "no RAM info was set" migration case

    This handles the case where the live migration monitoring thread may
    race and call jobStats() after the migration has completed, resulting
    in the following error:

        libvirt.libvirtError: internal error: migration was active,
        but no RAM info was set

    Closes-Bug: #1982284
    Change-Id: I77fdfa9cffbd44b2889f49f266b2582bcc6a4267

** Changed in: nova
   Status: In Progress => Fix Released

--
https://bugs.launchpad.net/bugs/1982284

Title:
  libvirt live migration sometimes fails with "libvirt.libvirtError:
  internal error: migration was active, but no RAM info was set"

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  We have seen this downstream where live migration randomly fails with
  the following error [1]:

      libvirt.libvirtError: internal error: migration was active,
      but no RAM info was set

  Discussion on [1] gravitated toward a possible race condition in qemu
  around the query-migrate command [2]. The query-migrate command is used
  (indirectly) by the libvirt driver during monitoring of live migrations
  [3][4][5].

  While searching for information about this error, I found an older
  thread on libvir-list [6] where someone else encountered the same
  error; for them it happened when query-migrate was called *after* a
  live migration had completed. Based on this, it seemed possible that
  our live migration monitoring thread sometimes races and calls
  jobStats() after the migration has completed, resulting in this error
  being raised and the migration being considered failed when it was
  actually complete. A patch has since been proposed and committed [7]
  to address the possible issue.

  Meanwhile, on our side in nova, we can mitigate this problematic
  behavior by catching the specific error from libvirt and ignoring it,
  so that a live migration in this situation is considered completed by
  the libvirt driver. Doing this would improve the experience for users
  who are hitting this error and getting erroneous live migration
  failures (a minimal sketch of this handling follows the references
  below).

  [1] https://bugzilla.redhat.com/show_bug.cgi?id=2074205
  [2] https://qemu.readthedocs.io/en/latest/interop/qemu-qmp-ref.html#qapidoc-1848
  [3] https://github.com/openstack/nova/blob/bcb96f362ab12e297f125daa5189fb66345b4976/nova/virt/libvirt/driver.py#L10123
  [4] https://github.com/openstack/nova/blob/bcb96f362ab12e297f125daa5189fb66345b4976/nova/virt/libvirt/guest.py#L655
  [5] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainGetJobStats
  [6] https://listman.redhat.com/archives/libvir-list/2021-January/213631.html
  [7] https://github.com/qemu/qemu/commit/552de79bfdd5e9e53847eb3c6d6e4cd898a4370e
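The following is a minimal sketch of the mitigation described above, written
against the python-libvirt bindings. It is not the actual nova patch: the
helper name get_migration_job and its structure are illustrative assumptions,
but the error-code/message check mirrors the error quoted in this bug, and
nova's real monitoring path goes through the guest wrapper referenced in [4].

    import libvirt

    # Error text reported by libvirt/QEMU when the post-migration race is hit.
    _NO_RAM_INFO_MSG = 'migration was active, but no RAM info was set'


    def get_migration_job(domain):
        """Return (job_type, stats) for ``domain``, tolerating the race.

        ``domain`` is a ``libvirt.virDomain``. If jobStats() is called just
        after the migration finished and libvirt raises the "no RAM info was
        set" internal error, report the job as completed instead of failing.
        """
        try:
            stats = domain.jobStats()
            return stats.get('type', libvirt.VIR_DOMAIN_JOB_NONE), stats
        except libvirt.libvirtError as ex:
            if (ex.get_error_code() == libvirt.VIR_ERR_INTERNAL_ERROR
                    and _NO_RAM_INFO_MSG in (ex.get_error_message() or '')):
                # The monitoring thread lost the race: the migration already
                # finished on the libvirt/QEMU side, so treat it as completed
                # rather than surfacing a spurious failure.
                return libvirt.VIR_DOMAIN_JOB_COMPLETED, {}
            raise

A monitoring loop using such a helper would simply stop polling once it sees
VIR_DOMAIN_JOB_COMPLETED, rather than marking the migration as failed.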

