** Description changed:
[ Impact ]
Virtualization users who:
- Have a Noble VM on top of an Intel bare metal machine, and
- Create a nested VM (guest) inside this Noble VM (host), and
- Try to migrate this nested VM (guest) to another Noble VM (host) running on
the same bare metal machine or a similar one, and
- Use a "migratable XML" (as generated by "virsh dumpxml --migratable") as
virsh's "--xml" and "--persistent-xml" arguments
might encounter issues that prevent the migration from starting. These
issues are related to CPU feature checks performed by libvirt, more
specifically to the "vmx*" features, which unfortunately have been a
known source of problems in migration scenarios under libvirt.
This bug also affects users who created the migratable XML file under
Noble and are now trying to use it with the libvirt shipped in Oracular.
[ Test Plan ]
- Even though this problem happens only when using nested VMs with Intel
- CPUs, it is still recommended to perform the following tests on a bare
- metal machine also with an Intel CPU. In theory it should be possible
- to reproduce this on a host using an AMD CPU, but you'd have to
- explicitly tell LXD to create VMs with Intel CPUs.
+ This particular VMX problem only occurs when using nested VMs with
+ Intel CPUs, so it is recommended to perform the following tests on a
+ bare metal machine that also has an Intel CPU. In theory it could be
+ possible to reproduce this on a host with an AMD CPU, but you would
+ have to explicitly tell LXD to create VMs with virtual Intel CPUs,
+ and those would likely not expose all the Intel virtualization
+ features needed.
Credits to Guillaume Boutry for providing scripts automating most of the
reproduction steps.
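Before creating the VMs, it is worth confirming that the bare metal host
actually exposes Intel VT-x and nested virtualization. A quick check
(sketch; the sysfs path assumes the kvm_intel module is in use):

```shell
# Does the CPU advertise VT-x (the "vmx" flag)?
grep -qw vmx /proc/cpuinfo && echo "VT-x available" || echo "no VT-x"
# Is nested virtualization enabled in kvm_intel (prints Y or 1 when on)?
cat /sys/module/kvm_intel/parameters/nested 2>/dev/null \
  || echo "kvm_intel not loaded"
```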
Let's create two Noble VMs using LXD:
$ lxc launch ubuntu:noble --vm --config limits.cpu=4 \
    --config limits.memory=8GiB -d root,size=80GiB libvirt-1
$ lxc launch ubuntu:noble --vm --config limits.cpu=4 \
    --config limits.memory=8GiB -d root,size=80GiB libvirt-2
You will need to generate an SSH keypair for the "ubuntu" user on
libvirt-1 and install the public key on libvirt-2 so that "ssh
libvirt-2.lxd" works. The rest of this test plan assumes you have done
that.
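The keypair setup itself is left to the reader above; here is a minimal
sketch (the /tmp key path and the lxc one-liner are illustrative
assumptions, not the only way to do this):

```shell
# Inside libvirt-1, as the ubuntu user: generate a passphrase-less
# keypair (written under /tmp only to keep the sketch self-contained;
# normally you would use ~/.ssh/id_ed25519).
rm -f /tmp/migtest_key /tmp/migtest_key.pub
ssh-keygen -t ed25519 -N '' -f /tmp/migtest_key -q
# Install the public key on libvirt-2 (hypothetical one-liner run from
# the bare metal host; any method of installing the key works):
#   lxc exec libvirt-2 -- su - ubuntu -c \
#     'cat >> .ssh/authorized_keys' < /tmp/migtest_key.pub
# Then confirm from libvirt-1 that "ssh libvirt-2.lxd" succeeds.
head -c 11 /tmp/migtest_key.pub   # prints "ssh-ed25519"
```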
Inside libvirt-1:
# apt update
# apt install -y libvirt-daemon-system uuid
# echo "host_uuid = \"00000000-0000-0000-0000-$(printf "%012x" "${RANDOM}")\"" >> /etc/libvirt/libvirtd.conf
# systemctl restart libvirtd.service
# su - ubuntu
$ cd /tmp
$ wget http://cloud-images.ubuntu.com/noble/current/noble-server-cloudimg-amd64.img
$ sudo chown libvirt-qemu:kvm noble-server-cloudimg-amd64.img
$ cd
$ cat > domain.xml << _EOF_
<domain type="kvm">
<uuid>$(uuidgen)</uuid>
<name>test-domain</name>
<memory>1048576</memory>
<vcpu>2</vcpu>
<os>
<type arch="x86_64" machine="pc">hvm</type>
<boot dev="hd"/>
</os>
<features>
<acpi/>
<apic/>
<vmcoreinfo/>
</features>
<clock offset="utc">
<timer name="pit" tickpolicy="delay"/>
<timer name="rtc" tickpolicy="catchup"/>
<timer name="hpet" present="no"/>
</clock>
<cpu mode="host-model" match="exact">
<topology sockets="2" cores="1" threads="1"/>
</cpu>
<devices>
<disk type="file" device="disk">
<driver name="qemu" type="qcow2" cache="none"/>
<source file="/tmp/noble-server-cloudimg-amd64.img"/>
<target dev="vda" bus="virtio"/>
</disk>
<video>
<model type="qxl"/>
</video>
<rng model="virtio">
<backend model="random">/dev/urandom</backend>
</rng>
<controller type="usb" index="0" model="none"/>
<memballoon model="virtio">
<stats period="10"/>
</memballoon>
</devices>
</domain>
_EOF_
$ virsh define domain.xml
$ virsh start test-domain
$ virsh dumpxml --migratable test-domain > migratable.xml
Inside libvirt-2:
# apt update
# apt install -y libvirt-daemon-system
# cd /tmp
# wget http://cloud-images.ubuntu.com/noble/current/noble-server-cloudimg-amd64.img
# chown libvirt-qemu:kvm noble-server-cloudimg-amd64.img
# cd
Now, back to libvirt-1, we are ready to test the migration:
$ virsh migrate test-domain qemu+ssh://libvirt-2.lxd/system --live \
    --persistent --undefinesource --copy-storage-inc --migrate-disks vda \
    --persistent-xml migratable.xml --xml migratable.xml
On Noble, you should see the following error:
error: unsupported configuration: Target CPU feature count 28 does not
match source 96
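The mismatched counts in that error come from the per-feature
<feature policy="require"> entries in the domains' CPU definitions, most
of them vmx*. A quick way to count them (sketch using a synthetic
snippet; a real migratable.xml from the steps above contains many more
entries):

```shell
# Synthetic stand-in for the <cpu> section of a migratable XML.
cat > /tmp/cpu-snippet.xml << '_EOF_'
<cpu mode="custom" match="exact">
  <feature policy="require" name="ss"/>
  <feature policy="require" name="vmx-ept"/>
  <feature policy="require" name="vmx-invept"/>
</cpu>
_EOF_
# Count the VMX-related required features:
grep -c 'name="vmx-' /tmp/cpu-snippet.xml   # prints 2
```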
[ Where problems could occur ]
As described below (in the "Other information" section), this SRU is
different between Noble and Oracular.
For Noble, the chances of regression are higher because it involves
updating a sizeable patch series before actually backporting the patches
to fix the bug. Feature-wise, this update should not change anything,
and a review has been performed to make sure that, to the best of our
knowledge, no user-facing changes are introduced.
For Oracular, all that was needed was to backport the patches that fix
the issue.
The patches themselves are not complex and have been part of RHEL's
libvirt for a while now, without any regressions. There is always the
possibility that some unwanted regression is introduced, but our
internal migration testsuite has not caught any problems.
[ Other information ]
For Noble, this SRU involves two steps:
1) Updating an existing patch series (which was backported in order to
fix bug #2051754). This is needed because the patch series was
backported directly from the patches posted on upstream's mailing list.
The series has since been accepted and pushed to the upstream git
repository, and although it is exactly the same feature-wise, some
minor cosmetic changes were made to function names which can affect
future backports that touch the same code (as is the case here).
2) Actually backporting the patches that fix the issue.
Oracular was simpler because the patchset from step (1) was already
present in the release.
[ Original Description ]
This issue is reproduced consistently with the snap-openstack-hypervisor
built from
https://git.launchpad.net/ubuntu/+source/libvirt@ubuntu/noble-updates
(with patches applied).
When creating a Nova instance, live migration between two hosts always
fails with:
error: unsupported configuration: Target CPU feature count 44 does not match
source 109
Command that reproduces a Nova migration using the libvirt client (and
reproduces the same error):
virsh migrate instance-00000002 qemu+tls://juju-596fd1-1.lxd/system \
    --live --p2p --persistent --undefinesource --copy-storage-inc \
    --migrate-disks vda --xml migratable.xml \
    --persistent-xml migratable.xml --bandwidth 0
Attached to this bug you will find:
- instance.xml: domain dumped through virsh
- migratable.xml: domain dumped through virsh using --migratable (same
flags as the Nova-updated XML)
- libvirtd.log: libvirt daemon debug logs showing why it refused to
migrate
As you can see in the logs from libvirtd.log, the method
virDomainDefCheckABIStabilityFlags fails because the source has 65
additional VMX features that are not found on the destination.
(Both hypervisors are hosted in LXD VMs on the same physical machine,
i.e. the same CPU flags.)
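That check effectively compares the required-feature lists of the two
CPU definitions. The features present only on the source side can be
listed with a simple diff (sketch using synthetic snippets standing in
for the attached instance.xml/migratable.xml dumps):

```shell
# Synthetic source and destination CPU definitions.
cat > /tmp/src-cpu.xml << '_EOF_'
<cpu>
  <feature policy="require" name="vmx-ept"/>
  <feature policy="require" name="vmx-invept"/>
</cpu>
_EOF_
cat > /tmp/dst-cpu.xml << '_EOF_'
<cpu>
  <feature policy="require" name="vmx-ept"/>
</cpu>
_EOF_
# Extract and sort the feature names from each definition.
grep -o 'name="[^"]*"' /tmp/src-cpu.xml | sort > /tmp/src-feats
grep -o 'name="[^"]*"' /tmp/dst-cpu.xml | sort > /tmp/dst-feats
# Features required by the source but absent on the destination:
comm -23 /tmp/src-feats /tmp/dst-feats   # prints: name="vmx-invept"
```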
https://bugs.launchpad.net/bugs/2083986
Title:
Live migration fails because VMX features are missing on target cpu
definition