On Mon, 3 Jun 2024 14:29:50 -0700
"Chen, Zide" <zide.c...@intel.com> wrote:

> On 6/3/2024 2:30 AM, Igor Mammedov wrote:
> > On Sat, 1 Jun 2024 23:26:55 +0800
> > Zhao Liu <zhao1....@intel.com> wrote:
> >   
> >> On Fri, May 31, 2024 at 10:13:47AM -0700, Chen, Zide wrote:  
> >>> Date: Fri, 31 May 2024 10:13:47 -0700
> >>> From: "Chen, Zide" <zide.c...@intel.com>
> >>> Subject: Re: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before
> >>>  x86_cpu_filter_features
> >>>
> >>> On 5/30/2024 11:30 PM, Zhao Liu wrote:    
> >>>> Hi Zide,
> >>>>
> >>>> On Fri, May 24, 2024 at 01:00:16PM -0700, Zide Chen wrote:    
> >>>>> Date: Fri, 24 May 2024 13:00:16 -0700
> >>>>> From: Zide Chen <zide.c...@intel.com>
> >>>>> Subject: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before
> >>>>>  x86_cpu_filter_features
> >>>>> X-Mailer: git-send-email 2.34.1
> >>>>>
> >>>>> cpu_exec_realizefn which calls the accel-specific realizefn may expand
> >>>>> features.  e.g., some accel-specific options may require extra features
> >>>>> to be enabled, and it's appropriate to expand these features in accel-
> >>>>> specific realizefn.
> >>>>>
> >>>>> One such example is the cpu-pm option, which may add CPUID_EXT_MONITOR.
> >>>>>
> >>>>> Thus, call cpu_exec_realizefn before x86_cpu_filter_features to ensure
> >>>>> that it won't expose features not supported by the host.
> >>>>>
> >>>>> Fixes: 662175b91ff2 ("i386: reorder call to cpu_exec_realizefn")
> >>>>> Suggested-by: Xiaoyao Li <xiaoyao...@intel.com>
> >>>>> Signed-off-by: Zide Chen <zide.c...@intel.com>
> >>>>> ---
> >>>>>  target/i386/cpu.c         | 24 ++++++++++++------------
> >>>>>  target/i386/kvm/kvm-cpu.c |  1 -
> >>>>>  2 files changed, 12 insertions(+), 13 deletions(-)
> >>>>>
> >>>>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> >>>>> index bc2dceb647fa..a1c1c785bd2f 100644
> >>>>> --- a/target/i386/cpu.c
> >>>>> +++ b/target/i386/cpu.c
> >>>>> @@ -7604,6 +7604,18 @@ static void x86_cpu_realizefn(DeviceState *dev, 
> >>>>> Error **errp)
> >>>>>          }
> >>>>>      }
> >>>>>  
> >>>>> +    /*
> >>>>> +     * note: the call to the framework needs to happen after feature 
> >>>>> expansion,
> >>>>> +     * but before the checks/modifications to ucode_rev, mwait, 
> >>>>> phys_bits.
> >>>>> +     * These may be set by the accel-specific code,
> >>>>> +     * and the results are subsequently checked / assumed in this 
> >>>>> function.
> >>>>> +     */
> >>>>> +    cpu_exec_realizefn(cs, &local_err);
> >>>>> +    if (local_err != NULL) {
> >>>>> +        error_propagate(errp, local_err);
> >>>>> +        return;
> >>>>> +    }
> >>>>> +
> >>>>>      x86_cpu_filter_features(cpu, cpu->check_cpuid || 
> >>>>> cpu->enforce_cpuid);    
> >>>>
> >>>> For your case, which sets cpu-pm=on via overcommit, then
> >>>> x86_cpu_filter_features() will complain that mwait is not supported.
> >>>>
> >>>> Such warning is not necessary, because the purpose of overcommit (from
> >>>> code) is only to support mwait when possible, not to commit to support
> >>>> mwait in Guest.
> >>>>
> >>>> Additionally, I understand x86_cpu_filter_features() is primarily
> >>>> intended to filter features configured by the user,     
> >>>
> >>> Yes, that's why this patches intends to let x86_cpu_filter_features()
> >>> filter out the MWAIT bit which is set from the overcommit option.    
> >>
> >> HMM, but in fact x86_cpu_filter_features() has already checked the MWAIT
> >> bit set by "-overcommit cpu-pm=on". ;-)
> >>
> >> (Pls correct me if I'm wrong) Revisiting what cpu-pm did to MWAIT:
> >> * Firstly, it set MWAIT bit in x86_cpu_expand_features():
> >>   x86_cpu_expand_features()  
> >>      -> x86_cpu_get_supported_feature_word()
> >>         -> kvm_arch_get_supported_cpuid()    
> >>  This MWAIT is based on Host's MWAIT capability. This MWAIT enablement
> >>  is fine for next x86_cpu_filter_features() and x86_cpu_filter_features()
> >>  is working correctly here!
> >>
> >> * Then, MWAIT was secondly set in host_cpu_enable_cpu_pm() regardless
> >>   neither Host's support or previous MWAIT enablement result. This is
> >>   the root cause of your issue.
> >>
> >> Therefore, we should make cpu-pm honor his first MWAIT enablement result
> >> instead of repeatly and unconditionally setting the MWAIT bit again in
> >> host_cpu_enable_cpu_pm().
> >>
> >> Additionally, I think the code in x86_cpu_realizefn():
> >>   cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE;
> >> has the similar issue because it also should check MWAIT feature bit.
> >>
> >> Further, it may be possible to remove cpu->mwait: just check the MWAIT
> >> bit in leaf 5 of cpu_x86_cpuid(), and if MWAIT is present, use host's
> >> mwait info plus CPUID_MWAIT_EMX | CPUID_MWAIT_IBE.  
> > 
> > Agreed with above analysis,
> > we shouldn't have host_cpu_enable_cpu_pm() as kvm_arch_get_supported_cpuid
> > gets us MWAIT already.  
> 
> Yes, I agree don't need to set CPUID_EXT_MONITOR besides
> kvm_arch_get_supported_cpuid().
> 
> > 
> > filling in cpu->mwait.ecx is benign mistake which likely doesn't
> > trigger anything if CPUID_EXT_MONITOR is not present.
> > But for clarity it's better to add an explicit check there as well.  
> 
> Yes, I agree without MWAIT available and advertised, it's meaningless to
> set the EMX and IBE bits. Seems to me it's cleaner to remove cpu->mwait
> all together, and in cpu_x86_cpuid(), just read from host_cpuid(5) if
> MWAIT is available to the guest. But I don't understand the history of
> why QEMU unconditionally advertises these two bits, and don't know it it
> could break some thing if these two bits are removed.
> 
> Even if we want to fix these two bits, we can do it in another separate
> patch.
> 
> e737b32a36 (" Core 2 Duo specification (Alexander Graf).")
> unconditionally adds "CPUID_MWAIT_EMX | CPUID_MWAIT_IBE" to CPUID.5.ECX
> with further explanation.
> 
> 2266d44311 ("i386/cpu: make -cpu host support monitor/mwait") adds
> comment "We always wake on interrupt even if host does not have the
> capability" to CPUID_MWAIT_IBE.
> 
> 
> >   
> >>  
> >>>> and the changes of
> >>>> CPUID after x86_cpu_filter_features() should by default be regarded like
> >>>> "QEMU knows what it is doing".    
> >>>
> >>> Sure, we can add feature bits after x86_cpu_filter_features(), but I
> >>> think moving cpu_exec_realizefn() before x86_cpu_filter_features() is
> >>> more generic, and actually this is what QEMU did before commit 
> >>> 662175b91ff2.
> >>>
> >>> - Less redundant code. Specifically, no need to call
> >>> x86_cpu_get_supported_feature_word() again.
> >>> - Potentially there could be other features could be added from the
> >>> accel-specific realizefn, kvm_cpu_realizefn() for example.  And these
> >>> features need to be checked against the host availability.    
> >>
> >> Mainly I don't think this reorder is a direct fix for the problem (I
> >> just analyse it above), also in your case x86_cpu_filter_features() will
> >> print a WARNING when QEMU boots, which I don't think is cpu-pm's 
> >> intention.  
> > 
> > There is no problem with warning, I'd even say it's a good thing.  
> 
> I agree it's good to have the warning as well.
> 
> > But you are right reordering just masks the issue.
> > 
> > As for expected behavior, if user asked for "-overcommit cpu-pm=on"
> > there are 2 options:
> >    * it's working as expected (mwait exiting is enabled successfully with 
> > CPUID MONITOR bit set)
> >    * QEMU shall fail to start.  
> 
> I like the idea that QEMU refuses to launch the guest if the asked CPU
> features are not available, which is more user friendly.  But the
> problem is, "-overcommit cpu-pm=on" is an umbrella which intends to
> enable all the following CPUIDs and KVM features if it's possible.  So,
> if QEMU fails the guest in this case, then it needs to fail the WAITPKG
> feature as well. Additionally, it may need to provide individual options
> to enable these individual features, which I doubt could be too complicated.

how about

kvm_arch_init()
    ...
    if (enable_cpu_pm) {                                                        
 
        int disable_exits = kvm_check_extension(s, KVM_CAP_X86_DISABLE_EXITS);  
 

/* Work around for kernel header with a typo. TODO: fix header and drop. */     
 
#if defined(KVM_X86_DISABLE_EXITS_HTL) && !defined(KVM_X86_DISABLE_EXITS_HLT)   
 
#define KVM_X86_DISABLE_EXITS_HLT KVM_X86_DISABLE_EXITS_HTL                     
 
#endif                                   
                                        
above comment probably needs to be cleaned up

        if (disable_exits) {                                                    
 
            disable_exits &= (KVM_X86_DISABLE_EXITS_MWAIT |                     
 
                              KVM_X86_DISABLE_EXITS_HLT |                       
 
                              KVM_X86_DISABLE_EXITS_PAUSE |                     
 
                              KVM_X86_DISABLE_EXITS_CSTATE);                    
 
        }  

fail here if filtered disable_exits is an empty set?                            
                                    
                                                                                
 
        ret = kvm_vm_enable_cap(s, KVM_CAP_X86_DISABLE_EXITS, 0,                
 
                                disable_exits);                                 
 
        if (ret < 0) {                                                          
 
            error_report("kvm: guest stopping CPU not supported: %s",           
 
                         strerror(-ret));                                       
 
        }                                                                       
 
    }                                                                           
 
      


> KVM_X86_DISABLE_EXITS_MWAI
> KVM_X86_DISABLE_EXITS_HLTKVM_X86_DISABLE_EXITS_PAUSE
> KVM_X86_DISABLE_EXITS_CSTATE
> CPUID.7.0:ECX.WAITPKG
> CPUID.1.ECX.MWAIT
> 


Reply via email to