** Description changed:

  [ Impact ]

   * If a race condition occurs on libvirtd shutdown,
     a QEMU domain status XML (/run/libvirt/qemu/*.xml)
     might lose the QEMU-driver specific information,
     such as '<monitor path=.../>'.
     (The race condition details are in [Other Info].)

   * On the next libvirtd startup, the parsing of that
     QEMU domain's status XML fails as '<monitor path='
     is not found:

     $ journalctl -b -u libvirtd.service | tail
     ...
     ... libvirtd[2789]: internal error: no monitor path
     ... libvirtd[2789]: Failed to load config for domain 'test-vm'

   * As a result, the domain is not listed in `virsh list`,
     and `virsh` commands to it fail.

     $ virsh list
      Id   Name   State
     --------------------

   * The domain is still running, but libvirt considers
     it as shutdown, which might cause conflicts/issues
     with higher-level tools (e.g., openstack nova).

     $ virsh list --all
      Id   Name      State
     --------------------------
      -    test-vm   shut off

     $ pgrep -af qemu-system-x86_64 | cut -d, -f1
     2638 /usr/bin/qemu-system-x86_64 -name guest=test-vm,

  [ Test Plan ]

   * (Focal/Jammy) shutdown-on-runtime:
     Synthetic reproducer/verification with GDB
     in comments #1 and #2 (Jammy) and #12 and #14 (Focal).

   * (Focal-only) shutdown-on-init:
     Synthetic reproducer/verification with GDB
     in comments #13 and #15.

   * On failure, the XML is saved *without* '<monitor path='
     and libvirt fails to parse the domain on startup.
     The domain is *not* listed in `virsh list`.

   * On success, the XML is saved *with* '<monitor path='
     and libvirt correctly parses the domain on startup.
     The domain is listed in `virsh list`.

   * Normal 'restart' testing in comment #5.

   * Test packages built successfully in all architectures
     with -proposed enabled in Launchpad PPA mfo/lp2059272 [0]

     [0] https://launchpad.net/~mfo/+archive/ubuntu/lp2059272

  [ Regression Potential ]

   * One patch changes *where* in the libvirt qemu driver's
     shutdown path the worker thread pool is stopped/freed:
     from _after_ releasing other data to _before_ doing so.

-  * The other patch (Focal-only) introduces a bounded wait
-    (with configurable timeout via an environment variable)
-    in the (same) libvirt qemu driver's shutdown path.
-
-    By default, this waits for qemuProcessReconnect threads
-    for up to 30 seconds (expected to finish in less than
-    1 second, in practice), and gives up / continues with
-    shutdown anyway so not to introduce a behavior change
-    on this path (prevents impact in case of regressions).
+  * The other patch (Focal-only) skips the update of the
+    QEMU domain status XML file during initialization if
+    libvirt is shutting down. (This is OK since the file
+    is not going to be used anyway in the current run as
+    it is shutting down, and it will be updated again in
+    the next run anyway.)

   * Therefore, the potential for regression is limited to
     the libvirt qemu driver's shutdown path, and would be
     observed when stopping/restarting libvirtd.service.

   * The behavior during normal operation is not affected.

  [Other Info]

   * In Focal, race windows exist if libvirtd shuts down
     _after_ initialization and _during_ initialization
     (which is unlikely in practice, but it's possible.)
     Say, 'shutdown-on-runtime' and 'shutdown-on-init'.

   * In Jammy, only 'shutdown-on-runtime' might happen,
     due to the introduction of the '.stateShutdownWait'
     driver callback (not available in Focal), which
     indirectly prevents the 'shutdown-on-init' race
     due to additional synchronization with locking.

   * For 'shutdown-on-runtime': use upstream commit [1].
     It's needed in Focal and Jammy (included in Mantic).

   * For 'shutdown-on-init' (Focal-only), we should use a
-    downstream-only patch (with configurable behavior),
+    downstream-only patch (with conservative behavior),
     since upstream addressed this issue indirectly with
     the '.stateShutdownWait' callbacks and other changes
     (which are not SRU material, ~10 patches, redesign [2])
-    in 6.8.0.
+    in 6.8.0.

  [1] https://gitlab.com/libvirt/libvirt/-/commit/152770333449cd3b78b4f5a9f1148fc1f482d842

    $ git describe --contains 152770333449cd3b78b4f5a9f1148fc1f482d842
    v9.3.0-rc1~90

    $ rmadison -a source libvirt | sed -n '/focal/,$p'
     libvirt | 6.0.0-0ubuntu8    | focal          | source
     libvirt | 6.0.0-0ubuntu8.16 | focal-security | source
     libvirt | 6.0.0-0ubuntu8.16 | focal-updates  | source
     libvirt | 6.0.0-0ubuntu8.17 | focal-proposed | source
     libvirt | 8.0.0-1ubuntu7    | jammy          | source
     libvirt | 8.0.0-1ubuntu7.5  | jammy-security | source
     libvirt | 8.0.0-1ubuntu7.8  | jammy-updates  | source
     libvirt | 9.6.0-1ubuntu1    | mantic         | source
     libvirt | 10.0.0-2ubuntu1   | noble          | source
     libvirt | 10.0.0-2ubuntu5   | noble-proposed | source

  [2] https://listman.redhat.com/archives/libvir-list/2020-July/205291.html
      [PATCH 00/10] resolve hangs/crashes on libvirtd shutdown

    commit 94e45d1042e21e03a15ce993f90fbef626f1ae41
    Author: Nikolay Shirokovskiy <[email protected]>
    Date:   Thu Jul 23 09:53:04 2020 +0300

        rpc: finish all threads before exiting main loop

    $ git describe --contains 94e45d1042e21e03a15ce993f90fbef626f1ae41
    v6.8.0-rc1~279

  [Original Description]

  There's a race condition on libvirtd shutdown that might cause
  the domain status XML file(s) to lose the '<monitor path=.../>'
  tag/field. This causes an error on libvirtd startup, and the
  domain is not listed/managed, even though it is still running.

    $ virsh list
     Id   Name      State
    -------------------------
     1    test-vm   running

    $ sudo systemctl restart libvirtd.service

    $ journalctl -b -u libvirtd.service | tail
    ...
    ... libvirtd[2789]: internal error: no monitor path
    ... libvirtd[2789]: Failed to load config for domain 'test-vm'

    $ virsh list
     Id   Name   State
    --------------------

    $ virsh list --all
     Id   Name      State
    --------------------------
     -    test-vm   shut off

    $ pgrep -af qemu-system-x86_64 | cut -d, -f1
    2638 /usr/bin/qemu-system-x86_64 -name guest=test-vm,
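
  The failure signature described in [ Impact ] and [ Test Plan ]
  (a status XML saved *without* '<monitor path=') can be checked
  for directly, before libvirtd is started again. The following is
  only a minimal sketch: it assumes the status files live in
  /run/libvirt/qemu/*.xml as stated above, and that a plain text
  match on '<monitor path=' is sufficient. The function name
  find_status_xml_without_monitor is hypothetical and not part of
  libvirt. Unreadable files are skipped, so it likely needs to be
  run as root to be meaningful.

```shell
#!/bin/sh
# Hypothetical helper (not part of libvirt): print each domain status XML
# in a directory that does not contain the '<monitor path=' attribute.
# The default directory is taken from the bug description.
find_status_xml_without_monitor() {
    dir="${1:-/run/libvirt/qemu}"
    for f in "$dir"/*.xml; do
        # Skip unreadable files and the unexpanded glob (no *.xml present).
        [ -r "$f" ] || continue
        # Plain text match; this is an assumption, not an XML parse.
        grep -q '<monitor path=' "$f" || printf '%s\n' "$f"
    done
}
```

  With libvirtd.service stopped, empty output would suggest that
  no status XML has lost its monitor path; any file printed is a
  candidate for the "no monitor path" error on the next startup.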

-- 
https://bugs.launchpad.net/bugs/2059272

Title:
  libvirt domain is not listed/managed after libvirt restart with
  messages "internal error: no monitor path" and "Failed to load config
  for domain"
