Hi,
I was unsure whether this would be better sent to libvirt or qemu - the
issue lies somewhere between libvirt modelling CPUs and qemu 10.1
behaving differently. I did not want to double post, and luckily most of
the people are on both lists. Since the problem appears and disappears
with the switch between qemu 10.0 and 10.1, let me start here. I beg
your pardon for not yet having all the answers. I'm sure I could find
more with debugging, but I also wanted to report early for your
awareness while we are still in the RC phase.


# Problem

What I found when testing migrations in Ubuntu with qemu 10.1-rc1 was:
  error: operation failed: guest CPU doesn't match specification: missing features: pdcm

This behaves the same with libvirt 11.4 and the more recent 11.6, but
switching back to qemu 10.0 confirmed that the behavior is new with
qemu 10.1-rc.

To allow you to have a look I isolated it from the test automation and
simplified it to use save/restore which allows you to see it on just
one machine.


# Steps to reproduce

$ cat testguest.xml
<domain type='kvm'>
<name>testguest</name>
<memory unit='KiB'>524288</memory>
<currentMemory unit='KiB'>524288</currentMemory>
<os>
<type arch='x86_64' machine='pc-q35-10.0'>hvm</type>
</os>
<cpu mode='host-model' check='partial'/>
<devices>
<emulator>/usr/bin/qemu-system-x86_64</emulator>
</devices>
</domain>

$ virsh define testguest.xml
Domain 'testguest' defined from testguest.xml

$ virsh start testguest
Domain 'testguest' started

$ virsh managedsave testguest
Domain 'testguest' state saved by libvirt

$ virsh start testguest
error: Failed to start domain 'testguest'
error: operation failed: guest CPU doesn't match specification: missing features: pdcm

Without yet having any hard evidence against them, I found a few
pdcm-related commits between 10.0 and 10.1-rc1:
  7ff24fb65 i386/tdx: Don't mask off CPUID_EXT_PDCM
  00268e000 i386/cpu: Warn about why CPUID_EXT_PDCM is not available
  e68ec2980 i386/cpu: Move adjustment of CPUID_EXT_PDCM before feature_dependencies[] check
  0ba06e46d i386/tdx: Add TDX fixed1 bits to supported CPUIDs
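
For anyone who wants to regenerate or widen such a list, a minimal
sketch follows. It is not necessarily how the list above was produced.
The helper name commits_mentioning is made up for illustration, and the
tag names in the example comment assume qemu's usual vX.Y.Z naming. It
only searches commit messages, so a change that touches PDCM handling
without naming it would not show up.

```shell
# Hypothetical helper: list commits in a revision range whose commit
# message mentions a keyword, case-insensitively.
# Meant to be run inside a qemu git checkout.
commits_mentioning() {
    keyword=$1
    range=$2
    git log --oneline -i --grep="$keyword" "$range"
}

# Example (tag names are an assumption):
#   commits_mentioning pdcm v10.0.0..v10.1.0-rc1
```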


# Caveat

My test environment runs in LXD system containers, which causes issues
in the power management detection:
  libvirtd[406]: error from service: GDBus.Error:System.Error.EROFS: Read-only file system
  libvirtd[406]: Failed to get host power management capabilities

And the resulting host-model on a rather old test server therefore contains:
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>Haswell-noTSX-IBRS</model>
    <vendor>Intel</vendor>
    <feature policy='require' name='vmx'/>
    <feature policy='disable' name='pdcm'/>
     ...

But that was fine in the past; save/restore and migrations only started
to break with the new qemu 10.1-rc.
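
To check how pdcm is set in a given CPU definition, a small filter like
the one below may help. This is only a sketch: feature_policy is a
made-up helper name, and it assumes each <feature .../> element sits on
its own line with single-quoted attributes, as in the fragment above.

```shell
# Hypothetical helper: read libvirt domain XML on stdin and print the
# policy ('require', 'disable', ...) recorded for one CPU feature.
# Prints nothing if the feature is not listed explicitly.
feature_policy() {
    name=$1
    grep "<feature " | grep "name='$name'" |
        sed -n "s/.*policy='\([a-z]*\)'.*/\1/p"
}

# Possible uses with the reproducer guest:
#   virsh dumpxml testguest | feature_policy pdcm             # while running
#   virsh managedsave-dumpxml testguest | feature_policy pdcm # saved image
```

Comparing the output for the running guest and for the managed save
image under qemu 10.0 and 10.1-rc1 might show where the two diverge.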


# Next steps

I'll soon be tied up in meetings for the rest of the day, but I'd be
curious whether anyone has a suggestion about what to look at next for
debugging, or a theory about what might be going wrong. If nothing else
comes up I'll try to set up a bisect run tomorrow.
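
A rough sketch of the check such a bisect run could use is below. It
reuses the save/restore reproducer from above and maps the outcome to
the exit codes "git bisect run" expects (0 good, 1 bad, 125 skip). The
function name repro_verdict is made up, and the sketch assumes the qemu
build under test is already installed where libvirt picks it up. The
build and install step is deliberately left out.

```shell
# Hypothetical check for "git bisect run": reproduce the save/restore
# failure for one domain and translate the result into an exit code.
repro_verdict() {
    dom=${1:-testguest}

    # Start from a clean state; failures here just mean nothing to clean.
    virsh destroy "$dom" >/dev/null 2>&1 || true
    virsh managedsave-remove "$dom" >/dev/null 2>&1 || true

    # If the guest cannot be started or saved at all, skip this commit.
    virsh start "$dom" >/dev/null 2>&1 || return 125
    virsh managedsave "$dom" >/dev/null 2>&1 || return 125

    # Restore from the managed save image and inspect the outcome.
    if out=$(virsh start "$dom" 2>&1); then
        return 0                            # restore worked: good
    fi
    case $out in
        *"missing features"*) return 1 ;;   # the reported failure: bad
        *) return 125 ;;                    # some other failure: skip
    esac
}

# Inside a qemu checkout (tag names are an assumption):
#   git bisect start v10.1.0-rc1 v10.0.0
# followed by "git bisect run" with a wrapper script that builds and
# installs qemu and then exits with the status of repro_verdict.
```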

-- 
Christian Ehrhardt
Director of Engineering, Ubuntu Server
Canonical Ltd
