Steps to reproduce on Jammy
---

Stop libvirt systemd units

        sudo systemctl stop 'libvirtd*'

Start libvirt in GDB

        sudo gdb \
          -iex 'set confirm off' \
          -iex 'set pagination off' \
          -iex 'set debuginfod enabled on' \
          -iex 'set debuginfod urls https://debuginfod.ubuntu.com' \
          -ex 'set non-stop on' \
          -ex 'handle SIGTERM nostop noprint pass' \
          -ex 'add-symbol-file /usr/sbin/libvirtd' \
          -ex 'add-symbol-file /usr/lib/x86_64-linux-gnu/libvirt.so.0' \
          -ex 'add-symbol-file /usr/lib/x86_64-linux-gnu/libvirt-qemu.so.0' \
          -ex 'add-symbol-file /usr/lib/x86_64-linux-gnu/libvirt/connection-driver/libvirt_driver_qemu.so' \
          /usr/sbin/libvirtd

Add breakpoints for qemu driver cleanup and device deleted event

        b qemuStateCleanup
        b processDeviceDeletedEvent
        run

Start test VM with a USB mouse device

        cat <<-EOF >test-vm.xml
        <domain type='qemu'>
          <name>test-vm</name>
          <os>
            <type>hvm</type>
          </os>
          <memory unit='MiB'>32</memory>
          <vcpu>1</vcpu>
          <devices>
            <input type='mouse' bus='usb'/>
          </devices>
        </domain>
        EOF

        virsh define test-vm.xml
        virsh start test-vm

        $ virsh list
         Id   Name      State
        -------------------------
         1    test-vm   running

Delete the USB mouse device

        DEVICE_ID=$(virsh qemu-monitor-command test-vm --hmp 'info qtree' | grep 'dev: usb-mouse' | cut -d'"' -f2)
        virsh qemu-monitor-command test-vm --hmp "device_del $DEVICE_ID"

Back to GDB

        Thread 25 "qemu-event" hit Breakpoint 2, 0x00007f6179ed20a7 in processDeviceDeletedEvent (devAlias=<optimized out>, vm=0x7f61842f1020, driver=0x7f6184035e20) at ../../src/qemu/qemu_driver.c:3536

Add a breakpoint on the domain status XML save, and continue the thread above

        b virDomainObjSave
        t 25
        c

        Thread 25 "qemu-event" hit Breakpoint 3, virDomainObjSave (obj=0x7f61842f1020, xmlopt=0x7f6184028010, statusDir=0x7f6184035460 "/run/libvirt/qemu") at ../../src/conf/domain_conf.c:28879

Check the backtrace of the domain status XML save function, called from
the device deleted event handler

        (gdb) bt
        #0  virDomainObjSave (obj=0x7f61842f1020, xmlopt=0x7f6184028010, statusDir=0x7f6184035460 "/run/libvirt/qemu") at ../../src/conf/domain_conf.c:28879
        #1  0x00007f6179eb68c3 in qemuDomainObjSaveStatus (driver=0x7f6184035e20, obj=0x7f61842f1020) at ../../src/qemu/qemu_domain.c:5801
        #2  0x00007f6179ed2159 in processDeviceDeletedEvent (devAlias=0x7f617c0073e0 "input0", vm=0x7f61842f1020, driver=0x7f6184035e20) at ../../src/qemu/qemu_driver.c:3557
        #3  qemuProcessEventHandler (data=0x7f617c0072b0, opaque=0x7f6184035e20) at ../../src/qemu/qemu_driver.c:4184
        #4  0x00007f61974fc983 in virThreadPoolWorker (opaque=<optimized out>) at ../../src/util/virthreadpool.c:164
        #5  0x00007f61974fb4d9 in virThreadHelper (data=<optimized out>) at ../../src/util/virthread.c:241
        #6  0x00007f6196e64ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
        #7  0x00007f6196ef6850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Leave this thread stopped at this point

Let's trigger the shutdown path

First, increase the shutdown timer (30 seconds is too fast for me; use
30 minutes)

        (gdb) b virEventAddTimeout

        $ sudo kill $(pidof libvirtd)

        Thread 1 "libvirtd" hit Breakpoint 4, virEventAddTimeout (timeout=30000, cb=0x7f61975bbbc0 <virNetDaemonFinishTimer>, opaque=0x55aec684a020, ff=0x0) at ../../src/util/virevent.c:148

        t 1
        set $rdi = 30 * 60 * 1000

        (gdb) i r $rdi
        rdi            0x1b7740            1800000
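The timeout is in milliseconds (the original call passed `timeout=30000`, i.e. 30 seconds). Overwriting `$rdi` is meant to change the `timeout` argument, since on x86_64 (System V ABI) the first integer argument is passed in that register; this assumes the breakpoint stops before the function has copied it elsewhere. A quick shell check that the new value matches what GDB printed:

```shell
# Shutdown timeout passed to virEventAddTimeout(), in milliseconds:
# 30 minutes instead of the 30 seconds (30000 ms) seen at the breakpoint.
TIMEOUT_MS=$((30 * 60 * 1000))

# Print it in the decimal and hex forms that "i r $rdi" shows.
printf '%d 0x%x\n' "$TIMEOUT_MS" "$TIMEOUT_MS"
```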

Now, skip the qemu driver shutdown wait path (return early from
qemuStateShutdownWait), to force the (unexpected) scenario that allows
the race condition:

        b qemuStateShutdownWait
        c

        Thread 26 "daemon-shutdown" hit Breakpoint 5, qemuStateShutdownWait () at ../../src/qemu/qemu_driver.c:1055

        t 26
        ret
        c
        
        Thread 1 "libvirtd" hit Breakpoint 1, qemuStateCleanup () at ../../src/qemu/qemu_driver.c:1070

Check that there are 2 threads stopped: cleanup (thread 1) and domain
status XML save (thread 25)

        (gdb) i th
          Id   Target Id                                     Frame
          1    Thread 0x7f6193934ac0 (LWP 2544) "libvirtd"   qemuStateCleanup () at ../../src/qemu/qemu_driver.c:1070
          18   Thread 0x7f616a7fc640 (LWP 2563) "gmain"      (running)
          19   Thread 0x7f6169ffb640 (LWP 2564) "gdbus"      (running)
          20   Thread 0x7f61697fa640 (LWP 2565) "udev-event" (running)
          24   Thread 0x7f616affd640 (LWP 2641) "vm-test-vm" (running)
          25   Thread 0x7f61687f8640 (LWP 2660) "qemu-event" virDomainObjSave (obj=0x7f61842f1020, xmlopt=0x7f6184028010, statusDir=0x7f6184035460 "/run/libvirt/qemu") at ../../src/conf/domain_conf.c:28879

Confirm that the qemu driver's domain XML options object (xmlopt) is
still referenced, and its private data formatter is set:

        t 25

        (gdb) p xmlopt.privateData.format
        $1 = (virDomainXMLPrivateDataFormatFunc) 0x7f6179eb1da0 <qemuDomainObjPrivateXMLFormat>

        (gdb) p xmlopt.parent.parent_instance
        $2 = {g_type_instance = {g_class = 0x7f6184053290}, ref_count = 1, qdata = 0x0}

Let the cleanup function and shutdown path finish

        t 1
        c &
        
Check the formatter and options object again; the object is *NO* longer
referenced (ref_count = 0):

        (gdb) p xmlopt.privateData.format
        $3 = (virDomainXMLPrivateDataFormatFunc) 0x7f6179eb1da0 <qemuDomainObjPrivateXMLFormat>

        (gdb) p xmlopt.parent.parent_instance
        $4 = {g_type_instance = {g_class = 0x0}, ref_count = 0, qdata = 0x0}

Unlike in Focal, the object data is _not_ zeroed anymore on the last
unreference in Jammy, so the formatter pointer above is still intact.
But that might still happen, as this really is a use-after-free:
another thread might get and use that memory.

So, let's simulate that.

        set xmlopt.privateData.format = 0

        (gdb) p xmlopt.privateData.format
        $5 = (virDomainXMLPrivateDataFormatFunc) 0x0

Check the VM status XML *before* the save function finishes:

        $ sudo grep -e '<domstatus' -e '<domain' -e 'monitor path' /run/libvirt/qemu/test-vm.xml
        <domstatus state='running' reason='booted' pid='2638'>
          <monitor path='/var/lib/libvirt/qemu/domain-1-test-vm/monitor.sock' type='unix'/>
          <domain type='qemu' id='1'>
          <domain type='qemu' id='1'>

Let the save function continue, and libvirt finish shutting down:

        (gdb) c
        Continuing.
        ...
        [Inferior 1 (process 2544) exited normally]

Check the VM status XML *after*:

        $ sudo grep -e '<domstatus' -e '<domain' -e 'monitor path' /run/libvirt/qemu/test-vm.xml
        <domstatus state='running' reason='booted' pid='2638'>
          <domain type='qemu' id='1'>

It no longer has the 'monitor path' tag/field.
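To look for status files in this state without inspecting each one by hand, a small shell sketch can scan the status directory. The function name `find_missing_monitor` is hypothetical (not part of libvirt), and it assumes every `*.xml` file in `/run/libvirt/qemu` is the status file of a running domain, which is expected to carry a 'monitor path' element:

```shell
# Hypothetical helper (not part of libvirt): print the status XML files
# in a directory that lack a '<monitor path=' element.
# Usage: find_missing_monitor [STATUS_DIR]   (default: /run/libvirt/qemu)
find_missing_monitor() {
    dir=${1:-/run/libvirt/qemu}
    for f in "$dir"/*.xml; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$f" ] || continue
        if ! grep -q '<monitor path=' "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

Run it as root, as with the `sudo grep` above; on this reproducer it should list `/run/libvirt/qemu/test-vm.xml`.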

Now, the next time libvirtd starts, it fails to parse that XML:

        $ sudo systemctl start libvirtd.service

        $ journalctl -b -u libvirtd.service | tail
        ...
        ... libvirtd[2789]: internal error: no monitor path
        ... libvirtd[2789]: Failed to load config for domain 'test-vm'

And libvirt no longer knows the domain is running, so it cannot manage it:

        $ virsh list
         Id   Name   State
        --------------------

        $ virsh list --all
         Id   Name      State
        --------------------------
         -    test-vm   shut off

Even though it is still running:

        $ pgrep -af qemu-system-x86_64 | cut -d, -f1
        2638 /usr/bin/qemu-system-x86_64 -name guest=test-vm,

-- 
https://bugs.launchpad.net/bugs/2059272

Title:
  libvirt domain is not listed/managed after libvirt restart with
  messages "internal error: no monitor path" and "Failed to load config
  for domain"
