** Description changed:

  [ Impact ]
  
   * If a race condition occurs on libvirtd shutdown,
     a QEMU domain status XML (/run/libvirt/qemu/*.xml)
     might lose the QEMU-driver specific information,
     such as '<monitor path=.../>'.
     (The race condition details are in [Other Info].)
  
   * On the next libvirtd startup, the parsing of that
     QEMU domain's status XML fails as '<monitor path='
     is not found:
  
    $ journalctl -b -u libvirtd.service | tail
    ...
    ... libvirtd[2789]: internal error: no monitor path
    ... libvirtd[2789]: Failed to load config for domain 'test-vm'
  
   * As a result, the domain is not listed in `virsh list`,
     and `virsh` commands to it fail.
  
    $ virsh list
     Id   Name   State
    --------------------
  
   * The domain is still running, but libvirt considers
     it shut off, which might cause conflicts/issues
     with higher-level tools (e.g., OpenStack Nova).
  
    $ virsh list --all
     Id   Name      State
    --------------------------
     -    test-vm   shut off
  
    $ pgrep -af qemu-system-x86_64 | cut -d, -f1
    2638 /usr/bin/qemu-system-x86_64 -name guest=test-vm,
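
The symptom above can also be checked directly on the status files. The following is a minimal sketch, not part of libvirt: it assumes the status XML carries '<monitor path=.../>' as a direct child of the root element, and the helper names and the sample socket path used for testing are hypothetical. Reading /run/libvirt/qemu usually requires root.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List


def status_xml_missing_monitor(xml_text: str) -> bool:
    """Return True if a status document has no <monitor path=...> element."""
    root = ET.fromstring(xml_text)
    # Assumption: <monitor> is a direct child of the status root element.
    monitor = root.find("monitor")
    return monitor is None or not monitor.get("path")


def find_affected(status_dir: str = "/run/libvirt/qemu") -> List[str]:
    """List status files that lack the monitor path or cannot be parsed."""
    affected = []
    for path in sorted(Path(status_dir).glob("*.xml")):
        try:
            if status_xml_missing_monitor(path.read_text()):
                affected.append(path.name)
        except (OSError, ET.ParseError):
            # Unreadable or malformed files are reported too, since
            # libvirtd would not be able to load them either.
            affected.append(path.name)
    return affected


if __name__ == "__main__":
    for name in find_affected():
        print(name)
```

Running it before restarting libvirtd.service would show which domains are at risk of not being loaded on the next startup.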
  
  [ Test Plan ]
  
   * (Focal/Jammy) shutdown-on-runtime:
     Synthetic reproducer/verification with GDB in comments
     #1 and #2 (Jammy) and #12 and #14 (Focal).
  
   * (Focal-only) shutdown-on-init:
     Synthetic reproducer/verification with GDB in comments #13 and #15.
  
   * On failure, the XML is saved *without* '<monitor path='
     and libvirt fails to parse the domain on startup.
     The domain is *not* listed in `virsh list`.
  
   * On success, the XML is saved *with* '<monitor path='
     and libvirt correctly parses the domain on startup.
     The domain is listed in `virsh list`.
  
   * Normal 'restart' testing in comment #5.
  
   * Test packages built successfully on all architectures
     with -proposed enabled in Launchpad PPA mfo/lp2059272 [0]
  
  [0] https://launchpad.net/~mfo/+archive/ubuntu/lp2059272
  
  [ Regression Potential ]
  
   * One patch changes *where* in the libvirt qemu driver's
     shutdown path the worker thread pool is stopped/freed:
     from _after_ releasing other data to _before_ doing so.
  
-  * The other patch (Focal-only) introduces a bounded wait
-    (with configurable timeout via an environment variable)
-    in the (same) libvirt qemu driver's shutdown path.
- 
-    By default, this waits for qemuProcessReconnect threads
-    for up to 30 seconds (expected to finish in less than
-    1 second, in practice), and gives up / continues with
-    shutdown anyway so not to introduce a behavior change
-    on this path (prevents impact in case of regressions).
+  * The other patch (Focal-only) skips the update of the
+    QEMU domain status XML file during initialization if
+    libvirt is shutting down. (This is OK since the file
+    is not going to be used anyway in the current run as
+    it is shutting down, and it will be updated again in
+    the next run anyway.)
  
   * Therefore, the potential for regression is limited to
     the libvirt qemu driver's shutdown path, and would be
     observed when stopping/restarting libvirtd.service.
  
   * The behavior during normal operation is not affected.
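
The effect of the reordering in the first patch can be illustrated with a toy model. This is only a sketch under stated assumptions, not libvirt code: ToyDriver, its methods, and the socket path are hypothetical, and the interleaving is forced with an event so the outcome is deterministic, whereas the real race depends on timing.

```python
import threading


class ToyDriver:
    """Toy stand-in for a driver with one worker and driver-specific data."""

    def __init__(self):
        # Hypothetical driver-specific data needed to format the status XML.
        self.private_data = {"monitor_path": "/hypothetical/monitor.sock"}
        self.saved = None
        self._job_ready = threading.Event()
        self._worker = threading.Thread(target=self._run)
        self._worker.start()

    def _run(self):
        # The worker has one pending job: save the domain status.
        self._job_ready.wait()
        self.save_status()

    def save_status(self):
        data = self.private_data
        if data is None:
            # Driver-specific data already released: the element is lost.
            self.saved = "<domstatus/>"
        else:
            self.saved = (
                "<domstatus><monitor path='%s'/></domstatus>"
                % data["monitor_path"]
            )

    def stop_workers(self):
        # Let the pending job run, then wait for the worker to finish.
        self._job_ready.set()
        self._worker.join()

    def release_data(self):
        self.private_data = None


def shutdown(driver, stop_workers_first):
    """Run the two shutdown steps in the requested order."""
    if stop_workers_first:
        driver.stop_workers()   # order after the patch
        driver.release_data()
    else:
        driver.release_data()   # order before the patch
        driver.stop_workers()
    return driver.saved


if __name__ == "__main__":
    print("workers stopped first:", shutdown(ToyDriver(), True))
    print("data released first:  ", shutdown(ToyDriver(), False))
```

In this model, stopping the worker first means no job can run once the driver-specific data is gone, which is the property the reordering is meant to provide.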
  
  [Other Info]
  
   * In Focal, race windows exist if libvirtd shuts down
     _after_ initialization and _during_ initialization
     (which is unlikely in practice, but it's possible.)
  
     Say, 'shutdown-on-runtime' and 'shutdown-on-init'.
  
   * In Jammy, only 'shutdown-on-runtime' might happen,
     due to the introduction of the '.stateShutdownWait'
     driver callback (not available in Focal), which
     indirectly prevents the 'shutdown-on-init' race
     due to additional synchronization with locking.
  
   * For 'shutdown-on-runtime': use upstream commit [1].
     It's needed in Focal and Jammy (included in Mantic).
  
   * For 'shutdown-on-init' (Focal-only), we should use a
-    downstream-only patch (with configurable behavior),
+    downstream-only patch (with conservative behavior),
     since upstream addressed this issue indirectly with
     the '.stateShutdownWait' callbacks and other changes
     (which are not SRU material, ~10 patches, redesign [2])
-    in 6.8.0.
+    in 6.8.0.
  
  [1] https://gitlab.com/libvirt/libvirt/-/commit/152770333449cd3b78b4f5a9f1148fc1f482d842
  
   $ git describe --contains 152770333449cd3b78b4f5a9f1148fc1f482d842
   v9.3.0-rc1~90
  
   $ rmadison -a source libvirt | sed -n '/focal/,$p'
    libvirt | 6.0.0-0ubuntu8       | focal           | source
    libvirt | 6.0.0-0ubuntu8.16    | focal-security  | source
    libvirt | 6.0.0-0ubuntu8.16    | focal-updates   | source
    libvirt | 6.0.0-0ubuntu8.17    | focal-proposed  | source
    libvirt | 8.0.0-1ubuntu7       | jammy           | source
    libvirt | 8.0.0-1ubuntu7.5     | jammy-security  | source
    libvirt | 8.0.0-1ubuntu7.8     | jammy-updates   | source
    libvirt | 9.6.0-1ubuntu1       | mantic          | source
    libvirt | 10.0.0-2ubuntu1      | noble           | source
    libvirt | 10.0.0-2ubuntu5      | noble-proposed  | source
  
  [2] https://listman.redhat.com/archives/libvir-list/2020-July/205291.html
  [PATCH 00/10] resolve hangs/crashes on libvirtd shutdown
  
  commit 94e45d1042e21e03a15ce993f90fbef626f1ae41
  Author: Nikolay Shirokovskiy <[email protected]>
  Date: Thu Jul 23 09:53:04 2020 +0300
  
  rpc: finish all threads before exiting main loop
  
  $ git describe --contains 94e45d1042e21e03a15ce993f90fbef626f1ae41
  v6.8.0-rc1~279
  
  [Original Description]
  
  There's a race condition on libvirtd shutdown
  that might cause the domain status XML file(s)
  to lose the '<monitor path=.../>' tag/field.
  
  This causes an error on libvirtd startup, and
  the domain is not listed/managed, even though
  it is still running.
  
   $ virsh list
    Id   Name      State
   -------------------------
    1    test-vm   running
  
   $ sudo systemctl restart libvirtd.service
  
   $ journalctl -b -u libvirtd.service | tail
   ...
   ... libvirtd[2789]: internal error: no monitor path
   ... libvirtd[2789]: Failed to load config for domain 'test-vm'
  
   $ virsh list
    Id   Name   State
   --------------------
  
   $ virsh list --all
    Id   Name      State
   --------------------------
    -    test-vm   shut off
  
   $ pgrep -af qemu-system-x86_64 | cut -d, -f1
   2638 /usr/bin/qemu-system-x86_64 -name guest=test-vm,

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2059272

Title:
  libvirt domain is not listed/managed after libvirt restart with
  messages "internal error: no monitor path" and "Failed to load config
  for domain"

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/2059272/+subscriptions


-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
