On Mon, 3 Jun 2024 14:29:50 -0700 "Chen, Zide" <zide.c...@intel.com> wrote:
> On 6/3/2024 2:30 AM, Igor Mammedov wrote: > > On Sat, 1 Jun 2024 23:26:55 +0800 > > Zhao Liu <zhao1....@intel.com> wrote: > > > >> On Fri, May 31, 2024 at 10:13:47AM -0700, Chen, Zide wrote: > >>> Date: Fri, 31 May 2024 10:13:47 -0700 > >>> From: "Chen, Zide" <zide.c...@intel.com> > >>> Subject: Re: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before > >>> x86_cpu_filter_features > >>> > >>> On 5/30/2024 11:30 PM, Zhao Liu wrote: > >>>> Hi Zide, > >>>> > >>>> On Fri, May 24, 2024 at 01:00:16PM -0700, Zide Chen wrote: > >>>>> Date: Fri, 24 May 2024 13:00:16 -0700 > >>>>> From: Zide Chen <zide.c...@intel.com> > >>>>> Subject: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before > >>>>> x86_cpu_filter_features > >>>>> X-Mailer: git-send-email 2.34.1 > >>>>> > >>>>> cpu_exec_realizefn which calls the accel-specific realizefn may expand > >>>>> features. e.g., some accel-specific options may require extra features > >>>>> to be enabled, and it's appropriate to expand these features in accel- > >>>>> specific realizefn. > >>>>> > >>>>> One such example is the cpu-pm option, which may add CPUID_EXT_MONITOR. > >>>>> > >>>>> Thus, call cpu_exec_realizefn before x86_cpu_filter_features to ensure > >>>>> that it won't expose features not supported by the host. > >>>>> > >>>>> Fixes: 662175b91ff2 ("i386: reorder call to cpu_exec_realizefn") > >>>>> Suggested-by: Xiaoyao Li <xiaoyao...@intel.com> > >>>>> Signed-off-by: Zide Chen <zide.c...@intel.com> > >>>>> --- > >>>>> target/i386/cpu.c | 24 ++++++++++++------------ > >>>>> target/i386/kvm/kvm-cpu.c | 1 - > >>>>> 2 files changed, 12 insertions(+), 13 deletions(-) > >>>>> > >>>>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c > >>>>> index bc2dceb647fa..a1c1c785bd2f 100644 > >>>>> --- a/target/i386/cpu.c > >>>>> +++ b/target/i386/cpu.c > >>>>> @@ -7604,6 +7604,18 @@ static void x86_cpu_realizefn(DeviceState *dev, > >>>>> Error **errp) > >>>>> } > >>>>> } > >>>>> > >>>>> + /* > >>>>> + * note: the call to the framework needs to happen after feature > >>>>> expansion, > >>>>> + * but before the checks/modifications to ucode_rev, mwait, > >>>>> phys_bits. > >>>>> + * These may be set by the accel-specific code, > >>>>> + * and the results are subsequently checked / assumed in this > >>>>> function. > >>>>> + */ > >>>>> + cpu_exec_realizefn(cs, &local_err); > >>>>> + if (local_err != NULL) { > >>>>> + error_propagate(errp, local_err); > >>>>> + return; > >>>>> + } > >>>>> + > >>>>> x86_cpu_filter_features(cpu, cpu->check_cpuid || > >>>>> cpu->enforce_cpuid); > >>>> > >>>> For your case, which sets cpu-pm=on via overcommit, then > >>>> x86_cpu_filter_features() will complain that mwait is not supported. > >>>> > >>>> Such warning is not necessary, because the purpose of overcommit (from > >>>> code) is only to support mwait when possible, not to commit to support > >>>> mwait in Guest. > >>>> > >>>> Additionally, I understand x86_cpu_filter_features() is primarily > >>>> intended to filter features configured by the user, > >>> > >>> Yes, that's why this patches intends to let x86_cpu_filter_features() > >>> filter out the MWAIT bit which is set from the overcommit option. > >> > >> HMM, but in fact x86_cpu_filter_features() has already checked the MWAIT > >> bit set by "-overcommit cpu-pm=on". ;-) > >> > >> (Pls correct me if I'm wrong) Revisiting what cpu-pm did to MWAIT: > >> * Firstly, it set MWAIT bit in x86_cpu_expand_features(): > >> x86_cpu_expand_features() > >> -> x86_cpu_get_supported_feature_word() > >> -> kvm_arch_get_supported_cpuid() > >> This MWAIT is based on Host's MWAIT capability. This MWAIT enablement > >> is fine for next x86_cpu_filter_features() and x86_cpu_filter_features() > >> is working correctly here! > >> > >> * Then, MWAIT was secondly set in host_cpu_enable_cpu_pm() regardless > >> neither Host's support or previous MWAIT enablement result. This is > >> the root cause of your issue. > >> > >> Therefore, we should make cpu-pm honor his first MWAIT enablement result > >> instead of repeatly and unconditionally setting the MWAIT bit again in > >> host_cpu_enable_cpu_pm(). > >> > >> Additionally, I think the code in x86_cpu_realizefn(): > >> cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE; > >> has the similar issue because it also should check MWAIT feature bit. > >> > >> Further, it may be possible to remove cpu->mwait: just check the MWAIT > >> bit in leaf 5 of cpu_x86_cpuid(), and if MWAIT is present, use host's > >> mwait info plus CPUID_MWAIT_EMX | CPUID_MWAIT_IBE. > > > > Agreed with above analysis, > > we shouldn't have host_cpu_enable_cpu_pm() as kvm_arch_get_supported_cpuid > > gets us MWAIT already. > > Yes, I agree don't need to set CPUID_EXT_MONITOR besides > kvm_arch_get_supported_cpuid(). > > > > > filling in cpu->mwait.ecx is benign mistake which likely doesn't > > trigger anything if CPUID_EXT_MONITOR is not present. > > But for clarity it's better to add an explicit check there as well. > > Yes, I agree without MWAIT available and advertised, it's meaningless to > set the EMX and IBE bits. Seems to me it's cleaner to remove cpu->mwait > all together, and in cpu_x86_cpuid(), just read from host_cpuid(5) if > MWAIT is available to the guest. But I don't understand the history of > why QEMU unconditionally advertises these two bits, and don't know it it > could break some thing if these two bits are removed. > > Even if we want to fix these two bits, we can do it in another separate > patch. > > e737b32a36 (" Core 2 Duo specification (Alexander Graf).") > unconditionally adds "CPUID_MWAIT_EMX | CPUID_MWAIT_IBE" to CPUID.5.ECX > with further explanation. > > 2266d44311 ("i386/cpu: make -cpu host support monitor/mwait") adds > comment "We always wake on interrupt even if host does not have the > capability" to CPUID_MWAIT_IBE. > > > > > >> > >>>> and the changes of > >>>> CPUID after x86_cpu_filter_features() should by default be regarded like > >>>> "QEMU knows what it is doing". > >>> > >>> Sure, we can add feature bits after x86_cpu_filter_features(), but I > >>> think moving cpu_exec_realizefn() before x86_cpu_filter_features() is > >>> more generic, and actually this is what QEMU did before commit > >>> 662175b91ff2. > >>> > >>> - Less redundant code. Specifically, no need to call > >>> x86_cpu_get_supported_feature_word() again. > >>> - Potentially there could be other features could be added from the > >>> accel-specific realizefn, kvm_cpu_realizefn() for example. And these > >>> features need to be checked against the host availability. > >> > >> Mainly I don't think this reorder is a direct fix for the problem (I > >> just analyse it above), also in your case x86_cpu_filter_features() will > >> print a WARNING when QEMU boots, which I don't think is cpu-pm's > >> intention. > > > > There is no problem with warning, I'd even say it's a good thing. > > I agree it's good to have the warning as well. > > > But you are right reordering just masks the issue. > > > > As for expected behavior, if user asked for "-overcommit cpu-pm=on" > > there are 2 options: > > * it's working as expected (mwait exiting is enabled successfully with > > CPUID MONITOR bit set) > > * QEMU shall fail to start. > > I like the idea that QEMU refuses to launch the guest if the asked CPU > features are not available, which is more user friendly. But the > problem is, "-overcommit cpu-pm=on" is an umbrella which intends to > enable all the following CPUIDs and KVM features if it's possible. So, > if QEMU fails the guest in this case, then it needs to fail the WAITPKG > feature as well. Additionally, it may need to provide individual options > to enable these individual features, which I doubt could be too complicated. how about kvm_arch_init() ... if (enable_cpu_pm) { int disable_exits = kvm_check_extension(s, KVM_CAP_X86_DISABLE_EXITS); /* Work around for kernel header with a typo. TODO: fix header and drop. */ #if defined(KVM_X86_DISABLE_EXITS_HTL) && !defined(KVM_X86_DISABLE_EXITS_HLT) #define KVM_X86_DISABLE_EXITS_HLT KVM_X86_DISABLE_EXITS_HTL #endif above comment probably needs to be cleaned up if (disable_exits) { disable_exits &= (KVM_X86_DISABLE_EXITS_MWAIT | KVM_X86_DISABLE_EXITS_HLT | KVM_X86_DISABLE_EXITS_PAUSE | KVM_X86_DISABLE_EXITS_CSTATE); } fail here if filtered disable_exits is an empty set? ret = kvm_vm_enable_cap(s, KVM_CAP_X86_DISABLE_EXITS, 0, disable_exits); if (ret < 0) { error_report("kvm: guest stopping CPU not supported: %s", strerror(-ret)); } } > KVM_X86_DISABLE_EXITS_MWAI > KVM_X86_DISABLE_EXITS_HLTKVM_X86_DISABLE_EXITS_PAUSE > KVM_X86_DISABLE_EXITS_CSTATE > CPUID.7.0:ECX.WAITPKG > CPUID.1.ECX.MWAIT >