Public bug reported:

## Package

qemu-system-x86 (Ubuntu noble)

## Affects

qemu (Ubuntu)

## Related bugs

- LP #1847361 (Upgrade of qemu binaries causes running instances to be unable 
to hot-attach)
- LP #1913421 (module retention improvements)

## Description

### Summary

After upgrading QEMU packages on a compute node (e.g. from
`1:8.2.2+ds-0ubuntu1.12` to `0ubuntu1.13`), long-running VM instances
started with the older build can no longer hot-attach Ceph RBD volumes —
even though `/run/qemu/` contains the retained modules for the old
build.

The first attach attempt fails with "Unknown driver 'rbd'". A second
attempt crashes QEMU with an assertion failure.

This is a regression in the module-retention mechanism introduced for LP
#1847361.

### Root cause

Two bugs in `util/module.c` (confirmed identical on current QEMU master
as of 2026-03-26):

**Bug A — module_load() does not fall back on build mismatch:**

The directory search loop (lines 282–303) only continues to the next
directory when the module file is not found (`ENOENT`). When the file
exists but `module_load_dso()` fails (build mismatch), the loop hits
`goto out` immediately — never reaching `/run/qemu/<version>/`.

`CONFIG_MODULE_UPGRADES` is enabled in the Ubuntu noble build
(`debian/rules`: `$(if ${enable-system},--enable-module-upgrades)`), so
the `/run/qemu/<version>/` path is added to the search list. It is never
reached, however, because the system path
(`/usr/lib/x86_64-linux-gnu/qemu/`) contains the new build's modules,
which exist but fail the stamp check.
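
The control flow described for Bug A can be illustrated with a toy model. This is not QEMU source: `search_dirs`, `try_load`, `RUNNING_STAMP`, and the stamp strings are hypothetical stand-ins for the directory list and the build-stamp check. The sketch only contrasts giving up at the first existing-but-mismatched file with continuing to the next directory.

```python
# Toy model of the directory search; NOT QEMU source.
# Each "directory" maps a module name to the build stamp of the file found there.
RUNNING_STAMP = "build-1.12"  # hypothetical stamp of the running binary

search_dirs = [
    ("/usr/lib/x86_64-linux-gnu/qemu", {"block-rbd": "build-1.13"}),
    ("/run/qemu/Debian_1_8.2.2+ds-0ubuntu1.12", {"block-rbd": "build-1.12"}),
]


def try_load(stamp):
    """Stand-in for the stamp check: succeed only for a matching build."""
    return stamp == RUNNING_STAMP


def module_load(name, fall_back_on_mismatch):
    """Return the directory the module was loaded from, or None."""
    for path, modules in search_dirs:
        if name not in modules:
            continue  # file not found: both variants try the next directory
        if try_load(modules[name]):
            return path
        if not fall_back_on_mismatch:
            return None  # reported behaviour: stop at the first mismatch
        # proposed behaviour: ignore the mismatch and keep searching
    return None


print(module_load("block-rbd", fall_back_on_mismatch=False))
print(module_load("block-rbd", fall_back_on_mismatch=True))
```

In this model the first call never looks at the `/run/qemu/` entry, while the second call finds the retained module there.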

**Bug B — module_load_dso() leaks dso_init_list on failure:**

When `g_module_open()` loads a `.so`, its constructors populate
`dso_init_list`. On build mismatch, `g_module_close()` is called but
`dso_init_list` is not drained. On the next module load attempt,
`assert(QTAILQ_EMPTY(&dso_init_list))` fires and QEMU aborts.
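
The leak-then-assert sequence can be modelled the same way. Again, this is a simplified sketch and not QEMU source: the Python list `dso_init_list` stands in for the C list of the same name, and `open_module` and the `drain_on_failure` switch are hypothetical names used only to contrast the reported and the proposed behaviour.

```python
# Toy model of the pending-initializer list; NOT QEMU source.
dso_init_list = []  # stands in for the global list filled by constructors


def open_module(name):
    """Model of opening a .so: its constructors queue init entries."""
    dso_init_list.append(f"{name}:init")


def module_load_dso(name, stamp_ok, drain_on_failure):
    # Mirrors the precondition named in the reported assertion message.
    if dso_init_list:
        raise AssertionError("QTAILQ_EMPTY(&dso_init_list)")
    open_module(name)
    if not stamp_ok:
        if drain_on_failure:
            dso_init_list.clear()  # proposed: discard entries before closing
        return False  # reported behaviour leaves the entries queued
    dso_init_list.clear()  # success path: entries are consumed
    return True


# First attempt fails on the stamp check and leaves a stale entry behind.
module_load_dso("block-rbd", stamp_ok=False, drain_on_failure=False)
try:
    # Second attempt trips over the stale entry.
    module_load_dso("block-rbd", stamp_ok=False, drain_on_failure=False)
except AssertionError as exc:
    print("second attempt aborted:", exc)
```

With `drain_on_failure=True`, repeated failed attempts return `False` each time instead of raising.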

### Environment

- Ubuntu 24.04 (noble), OpenStack compute nodes (Nova Victoria, libvirt/kvm, 
Cinder/Ceph RBD)
- Kernel: `6.14.0-37-generic #37~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC x86_64`
- QEMU: `qemu-system-x86 1:8.2.2+ds-0ubuntu1.13`
- libvirt: 10.0.0-2ubuntu8.12
- AppArmor: enabled, no DENIED entries for `/run/qemu` or `block-rbd.so`
- `/run/qemu` mounted as tmpfs (rw, no noexec)

### Observed symptoms

**Instance started with QEMU 0ubuntu1.11, host upgraded to
0ubuntu1.13:**

Instance log (first attach attempt):
```
failed to initialize module: /usr/lib/x86_64-linux-gnu/qemu/block-rbd.so
Only modules from the same build can be loaded.
```

libvirt:
```
internal error: unable to execute QEMU command 'blockdev-add': Unknown driver 'rbd'
```

The VM keeps running, but the attach fails. `/proc/$PID/maps` shows no
mapping of `block-rbd.so`.

Second attempt — instance log:
```
qemu-system-x86_64: util/module.c:165: module_load_dso: Assertion `QTAILQ_EMPTY(&dso_init_list)' failed.
```

QEMU exits (`reason=crashed`), VM ends up in SHUTOFF state.

At the time of the failure, the retained modules exist:
```
/run/qemu/Debian_1_8.2.2+ds-0ubuntu1.12/block-rbd.so   (40312 bytes, readable)
/run/qemu/Debian_1_8.2.2+ds-0ubuntu1.11/block-rbd.so   (40312 bytes, readable)
```

This has been reproduced across multiple minor build upgrades
(0ubuntu1.11→12 and 0ubuntu1.12→13).

### Steps to reproduce

1. Start an OpenStack instance on a compute node running QEMU 
`1:8.2.2+ds-0ubuntu1.X`. The instance must not use RBD at boot.
2. Upgrade QEMU on the host to `0ubuntu1.(X+1)` while the instance keeps 
running.
3. Verify `/run/qemu/Debian_1_8.2.2+ds-0ubuntu1.X/block-rbd.so` exists.
4. Hot-attach a Cinder/Ceph RBD volume (`openstack server add volume`).
5. First attempt: "Unknown driver 'rbd'".
6. Second attempt: QEMU assertion crash.
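
For step 3, a small helper can compute the path to check. This is a sketch under an assumption: the directory names listed in this report (`Debian_1_8.2.2+ds-0ubuntu1.12`) suggest that the package string `Debian 1:8.2.2+ds-0ubuntu1.12` is sanitized by replacing every character outside `A-Z a-z 0-9 + - . ~` with `_`. The function name `retained_module_path` and its defaults are hypothetical.

```python
import re
from pathlib import PurePosixPath


def retained_module_path(pkgversion, module="block-rbd.so", base="/run/qemu"):
    """Build the expected retained-module path for a package string."""
    # Assumption inferred from the directory names in this report:
    # characters outside [A-Za-z0-9+.~-] are replaced with "_".
    version_dir = re.sub(r"[^A-Za-z0-9+.~-]", "_", pkgversion)
    return str(PurePosixPath(base) / version_dir / module)


print(retained_module_path("Debian 1:8.2.2+ds-0ubuntu1.12"))
```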

### Impact

- Long-running VMs that predate a QEMU package upgrade cannot hot-attach RBD 
volumes (or any other module-backed driver not already loaded).
- Second attempt crashes the VM, causing unplanned downtime.
- Defeats the purpose of the `/run/qemu/` module-retention mechanism (LP 
#1847361, LP #1913421).

### Proposed fix

See the upstream QEMU GitLab issue
(https://gitlab.com/qemu-project/qemu/-/work_items/3354) for detailed
code analysis and patch proposals. Summary:

- **Bug A:** On `module_load_dso()` failure, clear the error and `continue` to 
the next directory instead of `goto out`.
- **Bug B:** In `module_load_dso()`, drain `dso_init_list` before 
`g_module_close()` when the stamp check fails.

Both fixes are against upstream `util/module.c` — the code is identical
on current QEMU master.

### Current workaround

Proactively reboot or live-migrate any instance whose running QEMU
version (via QMP `query-version`) does not match the installed package
version, before hot-attaching RBD volumes.
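
The version comparison in this workaround could be scripted roughly as follows. This is a sketch under assumptions, not a tested tool: it assumes the `package` field of the `query-version` reply looks like `Debian 1:8.2.2+ds-0ubuntu1.11`, and that the installed version string is obtained separately (for example from `dpkg-query -W -f='${Version}' qemu-system-x86`). The function names are hypothetical.

```python
def package_version(qmp_reply):
    """Extract the package version from a QMP `query-version` return value."""
    package = qmp_reply.get("package", "").strip().strip("()")
    # Assumption: the field is "<vendor> <version>"; keep the last word.
    return package.split()[-1] if package else ""


def needs_restart(qmp_reply, installed_version):
    """True when the running QEMU build differs from the installed package.

    An empty or missing package field is treated as a mismatch, to stay
    on the safe side.
    """
    return package_version(qmp_reply) != installed_version


reply = {
    "qemu": {"major": 8, "minor": 2, "micro": 2},
    "package": "Debian 1:8.2.2+ds-0ubuntu1.11",
}
print(needs_restart(reply, "1:8.2.2+ds-0ubuntu1.13"))
```

Instances flagged this way would be the candidates for a reboot or live migration before any RBD hot-attach.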

** Affects: qemu (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2146445

Title:
  qemu-system-x86: module upgrade fallback in /run/qemu/ broken —
  "Unknown driver 'rbd'" + crash on retry

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/2146445/+subscriptions

