On Tue, Aug 19, 2025 at 4:51 PM Paolo Bonzini <pbonz...@redhat.com> wrote:
>
> On 8/6/25 21:18, Daniel P. Berrangé wrote:
> > On Wed, Aug 06, 2025 at 07:57:34PM +0200, Christian Ehrhardt wrote:
> >> On Wed, Aug 6, 2025 at 2:00 PM Daniel P. Berrangé <berra...@redhat.com> wrote:
> >>>
> >>> On Wed, Aug 06, 2025 at 01:52:17PM +0200, Christian Ehrhardt wrote:
> >>>> Hi,
> >>>> I was unsure if this would be better sent to libvirt or qemu - the
> >>>> issue is somewhere between libvirt modelling CPUs and qemu 10.1
> >>>> behaving differently. I did not want to double post and gladly most of
> >>>> the people are on both lists - since the switch in/out of the problem
> >>>> is qemu 10.0 <-> 10.1 let me start here. I beg your pardon for not yet
> >>>> having all the answers, I'm sure I could find more with debugging, but
> >>>> I also wanted to report early for your awareness while we are still in
> >>>> the RC phase.
> >>>>
> >>>>
> >>>> # Problem
> >>>>
> >>>> What I found when testing migrations in Ubuntu with qemu 10.1-rc1 was:
> >>>>    error: operation failed: guest CPU doesn't match specification: missing features: pdcm
> >>>>
> >>>> This is behaving the same with libvirt 11.4 or the more recent 11.6.
> >>>> But switching back to qemu 10.0 confirmed that this behavior is new
> >>>> with qemu 10.1-rc.
> >>>
> >>>
> >>>> Without yet having any hard evidence against them I found a few pdcm
> >>>> related commits between 10.0 and 10.1-rc1:
> >>>>    7ff24fb65 i386/tdx: Don't mask off CPUID_EXT_PDCM
> >>>>    00268e000 i386/cpu: Warn about why CPUID_EXT_PDCM is not available
> >>>>    e68ec2980 i386/cpu: Move adjustment of CPUID_EXT_PDCM before feature_dependencies[] check
> >>>>    0ba06e46d i386/tdx: Add TDX fixed1 bits to supported CPUIDs
> >>>>
> >>>>
> >>>> # Caveat
> >>>>
> >>>> My test environment is in LXD system containers, that gives me issues
> >>>> in the power management detection
> >>>>    libvirtd[406]: error from service: GDBus.Error:System.Error.EROFS: Read-only file system
> >>>>    libvirtd[406]: Failed to get host power management capabilities
> >>>
> >>> That's harmless.
> >>
> >> Yeah, it always was for me - thanks for confirming.
> >>
> >>>> And the resulting host-model on a rather old test server will
> >>>> therefore have:
> >>>>    <cpu mode='custom' match='exact' check='full'>
> >>>>      <model fallback='forbid'>Haswell-noTSX-IBRS</model>
> >>>>      <vendor>Intel</vendor>
> >>>>      <feature policy='require' name='vmx'/>
> >>>>      <feature policy='disable' name='pdcm'/>
> >>>>       ...
> >>>>
> >>>> But that was fine in the past, and the behavior started to break
> >>>> save/restore or migrations just now with the new qemu 10.1-rc.
> >>>>
> >>>> # Next steps
> >>>>
> >>>> I'm soon overwhelmed by meetings for the rest of the day, but would be
> >>>> curious if one has a suggestion about what to look at next for
> >>>> debugging or a theory about what might go wrong. If nothing else comes
> >>>> up I'll try to set up a bisect run tomorrow.
> >>>
> >>> Yeah, git bisect is what I'd start with.
> >>
> >> Bisect complete, identified this commit
> >>
> >> commit 00268e00027459abede448662f8794d78eb4b0a4
> >> Author: Xiaoyao Li <xiaoyao...@intel.com>
> >> Date:   Tue Mar 4 00:24:50 2025 -0500
> >>
> >>      i386/cpu: Warn about why CPUID_EXT_PDCM is not available
> >>
> >>      When user requests PDCM explicitly via "+pdcm" without PMU
> >>      enabled, emit a warning to inform the user.
> >>
> >>      Signed-off-by: Xiaoyao Li <xiaoyao...@intel.com>
> >>      Reviewed-by: Zhao Liu <zhao1....@intel.com>
> >>      Link: https://lore.kernel.org/r/20250304052450.465445-3-xiaoyao...@intel.com
> >>      Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
> >>
> >>   target/i386/cpu.c | 3 +++
> >>   1 file changed, 3 insertions(+)
> >>
> >>
> >>
> >> Which is odd as it should only add a warning right?
> >
> > No, that commit message is misleading.
> >
> > IIUC mark_unavailable_features() actively blocks usage of the feature,
> > so it is a functional change, not merely emitting a warning.
> >
> > It makes me wonder if that commit was actually intended to block the
> > feature or not, vs merely warning ?  CC'ing those involved in the
> > commit.
> We can revert the commit.  I'll send the revert to Stefan and let him
> decide whether to include it in 10.1-rc4 or delay to 10.2 and 10.1.1.

Thanks Paolo for considering that.

My steps to reproduce seemed clear and are 100% reproducible for me,
but so far no one has said "yes, I see it too", so I am unsure whether
nobody else has tried them or whether there is more to it than we know
yet.
Further, I tested more with the commit reverted and found that at
least cross-version migrations (9.2 -> 10.1) still have issues that
seem related: they complain about pdcm as a missing feature.
But that was in a log of a test system that has since gone away, and
you know how these things can be, so that new result is not yet very
reliable.

I intended to check the following matrix more deeply again with and
without the reverted change and then come back to this thread:

#1 Compare platforms
- Migrating between non-containerized hosts to verify whether they are
affected as well
- Power management explicitly switched off/on (vs the auto-detection of
host-model) in the guest XML
#2 Retest the different use cases where I've seen this pop up
- 10.1 managed save (broken unless the identified commit is reverted)
- 9.2 -> 10.1 migration (seems broken even with the revert)
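
To keep track of the combinations, the matrix above could be
enumerated with a small script. This is only a sketch: treating the
four axes as a full cross product is an assumption, and the labels are
descriptive strings taken from the list above, not libvirt or qemu
options.

```python
from itertools import product

# Illustrative axis labels only; none of these are real libvirt/qemu flags.
HOSTS = ["lxd-container", "non-containerized"]
POWER_MGMT = ["auto (host-model)", "explicit on", "explicit off"]
USE_CASES = ["10.1 managed save", "9.2 -> 10.1 migration"]
COMMIT_STATE = ["00268e000 applied", "00268e000 reverted"]


def build_matrix():
    """Return every combination of the four axes as a list of dicts."""
    return [
        {"host": h, "power_mgmt": p, "use_case": u, "commit": c}
        for h, p, u, c in product(HOSTS, POWER_MGMT, USE_CASES, COMMIT_STATE)
    ]


if __name__ == "__main__":
    # Print a numbered checklist, one line per test run.
    for i, run in enumerate(build_matrix(), start=1):
        print(f"{i:2d}. {run['use_case']:<22} | {run['host']:<18} | "
              f"{run['power_mgmt']:<18} | {run['commit']}")
```

Not every combination may be worth running; the list is meant as a
checklist to tick off, not as a required test plan.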

The hope was that these would help to further identify what is going
on, but despite the urgency with the release being imminent I have not
managed to find the time in the last two days :-/

> Sorry for the delay in answering (and thanks Daniel for bringing this to
> my attention).
>
> Thanks,
>
> Paolo
>


-- 
Christian Ehrhardt
Director of Engineering, Ubuntu Server
Canonical Ltd
