On 8/7/2025 3:18 AM, Daniel P. Berrangé wrote:
On Wed, Aug 06, 2025 at 07:57:34PM +0200, Christian Ehrhardt wrote:
On Wed, Aug 6, 2025 at 2:00 PM Daniel P. Berrangé <berra...@redhat.com> wrote:

On Wed, Aug 06, 2025 at 01:52:17PM +0200, Christian Ehrhardt wrote:
Hi,
I was unsure if this would be better sent to libvirt or qemu - the
issue is somewhere between libvirt modelling CPUs and qemu 10.1
behaving differently. I did not want to double post, and luckily most of
the people are on both lists. Since the switch in/out of the problem
is qemu 10.0 <-> 10.1, let me start here. I beg your pardon for not yet
having all the answers, I'm sure I could find more with debugging, but
I also wanted to report early for your awareness while we are still in
the RC phase.


# Problem

What I found when testing migrations in Ubuntu with qemu 10.1-rc1 was:
   error: operation failed: guest CPU doesn't match specification: missing features: pdcm

This is behaving the same with libvirt 11.4 or the more recent 11.6.
But switching back to qemu 10.0 confirmed that this behavior is new
with qemu 10.1-rc.


Without yet having any hard evidence against them I found a few pdcm
related commits between 10.0 and 10.1-rc1:
   7ff24fb65 i386/tdx: Don't mask off CPUID_EXT_PDCM
   00268e000 i386/cpu: Warn about why CPUID_EXT_PDCM is not available
   e68ec2980 i386/cpu: Move adjustment of CPUID_EXT_PDCM before feature_dependencies[] check
   0ba06e46d i386/tdx: Add TDX fixed1 bits to supported CPUIDs


# Caveat

My test environment is in LXD system containers, that gives me issues
in the power management detection
   libvirtd[406]: error from service: GDBus.Error:System.Error.EROFS: Read-only file system
   libvirtd[406]: Failed to get host power management capabilities

That's harmless.

Yeah, it always was for me - thanks for confirming.

And the resulting host-model on a rather old test server will therefore have:
   <cpu mode='custom' match='exact' check='full'>
     <model fallback='forbid'>Haswell-noTSX-IBRS</model>
     <vendor>Intel</vendor>
     <feature policy='require' name='vmx'/>
     <feature policy='disable' name='pdcm'/>
      ...

But that was fine in the past, and the behavior started to break
save/restore or migrations just now with the new qemu 10.1-rc.

# Next steps

I'm soon overwhelmed by meetings for the rest of the day, but would be
curious if one has a suggestion about what to look at next for
debugging or a theory about what might go wrong. If nothing else comes
up I'll try to set up a bisect run tomorrow.

Yeah, git bisect is what I'd start with.

Bisect complete, identified this commit

commit 00268e00027459abede448662f8794d78eb4b0a4
Author: Xiaoyao Li <xiaoyao...@intel.com>
Date:   Tue Mar 4 00:24:50 2025 -0500

     i386/cpu: Warn about why CPUID_EXT_PDCM is not available

     When user requests PDCM explicitly via "+pdcm" without PMU enabled, emit
     a warning to inform the user.

     Signed-off-by: Xiaoyao Li <xiaoyao...@intel.com>
     Reviewed-by: Zhao Liu <zhao1....@intel.com>
     Link: https://lore.kernel.org/r/20250304052450.465445-3-xiaoyao...@intel.com
     Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>

  target/i386/cpu.c | 3 +++
  1 file changed, 3 insertions(+)



Which is odd, as it should only add a warning, right?

No, that commit message is misleading.

IIUC mark_unavailable_features() actively blocks usage of the feature,
so it is a functional change, not merely emitting a warning.

It makes me wonder whether that commit was actually intended to block
the feature, or merely to warn? CC'ing those involved in the
commit.

The intention was to print a warning telling users that PDCM cannot be enabled if the PMU is not enabled. However, mark_unavailable_features() does have the side effect of setting the bit in cpu->filtered_features[].

But the feature is masked off anyway even without the mark_unavailable_features():

    env->features[FEAT_1_ECX] &= ~CPUID_EXT_PDCM;

So is it that PDCM is set in cpu->filtered_features[] causing the problem?
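
To make the question concrete, here is a minimal, self-contained sketch of the two behaviours being compared. It is a toy model, not QEMU code: ToyCPU, mask_only(), mask_and_mark() and has_missing_features() are hypothetical names, and the final check is only an assumption about how a consumer might treat filtered bits. Only the bit value of CPUID_EXT_PDCM (CPUID.01H:ECX bit 15) and the two array names come from the discussion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for QEMU's feature words; only one word is modelled. */
enum { FEAT_1_ECX, FEATURE_WORDS };

#define CPUID_EXT_PDCM (1U << 15) /* CPUID.01H:ECX bit 15 */

/* Hypothetical, simplified CPU state (not the real X86CPU/CPUX86State). */
typedef struct {
    uint32_t features[FEATURE_WORDS];          /* bits exposed to the guest */
    uint32_t filtered_features[FEATURE_WORDS]; /* bits recorded as dropped */
} ToyCPU;

/* Variant A: only clear the bit, as in the plain "&= ~CPUID_EXT_PDCM" line. */
static void mask_only(ToyCPU *cpu, int word, uint32_t mask)
{
    assert(word >= 0 && word < FEATURE_WORDS);
    cpu->features[word] &= ~mask;
}

/* Variant B: clear the bit AND record it as filtered, which is the side
 * effect attributed to mark_unavailable_features() in this thread. */
static void mask_and_mark(ToyCPU *cpu, int word, uint32_t mask)
{
    assert(word >= 0 && word < FEATURE_WORDS);
    cpu->features[word] &= ~mask;
    cpu->filtered_features[word] |= mask;
}

/* Assumed consumer-side check: any filtered bit counts as "missing". */
static bool has_missing_features(const ToyCPU *cpu)
{
    for (int w = 0; w < FEATURE_WORDS; w++) {
        if (cpu->filtered_features[w] != 0) {
            return true;
        }
    }
    return false;
}
```

In this model both variants leave the guest-visible features[] word identical; the only observable difference is the filtered_features[] bit, which is what a check like has_missing_features() would trip over.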

With regards,
Daniel

