Re: [PATCH RESEND v2 2/2] target/i386/kvm: get and put AMD pmu registers
On Wed, Jun 21, 2023 at 9:39 AM Dongli Zhang wrote: > > The QEMU side calls kvm_get_msrs() to save the pmu registers from the KVM > side to QEMU, and calls kvm_put_msrs() to store the pmu registers back to > the KVM side. > > However, only the Intel gp/fixed/global pmu registers are involved. There > is not any implementation for AMD pmu registers. The > 'has_architectural_pmu_version' and 'num_architectural_pmu_gp_counters' are > calculated at kvm_arch_init_vcpu() via cpuid(0xa). This does not work for > AMD. Before AMD PerfMonV2, the number of gp registers is decided based on > the CPU version. Updating the relevant documentation to clarify this part of the deficiency would be a good first step. > > This patch is to add the support for AMD version=1 pmu, to get and put AMD > pmu registers. Otherwise, there will be a bug: AMD version=1 ? AMD does not have version 1, just directly has 2, perhaps because of x86 compatibility. AMD also does not have the so-called architectural pmu. Maybe need to rename has_architectural_pmu_version for AMD. It might be more helpful to add similar support for AMD PerfMonV2. > > 1. The VM resets (e.g., via QEMU system_reset or VM kdump/kexec) while it > is running "perf top". The pmu registers are not disabled gracefully. > > 2. Although the x86_cpu_reset() resets many registers to zero, the > kvm_put_msrs() does not puts AMD pmu registers to KVM side. As a result, > some pmu events are still enabled at the KVM side. I agree that we should have done that, especially if guest pmu is enabled on the AMD platforms. > > 3. The KVM pmc_speculative_in_use() always returns true so that the events > will not be reclaimed. The kvm_pmc->perf_event is still active. > > 4. After the reboot, the VM kernel reports below error: > > [0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS > detected, complain to your hardware vendor. > [0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR > c0010200 is 530076) > > 5. 
In a worse case, the active kvm_pmc->perf_event is still able to > inject unknown NMIs randomly to the VM kernel. > > [...] Uhhuh. NMI received for unknown reason 30 on CPU 0. > > The patch is to fix the issue by resetting AMD pmu registers during the > reset. I'm not sure if the qemu_reset or VM kexec will necessarily trigger kvm::amd_pmu_reset(). > > Cc: Joe Jin > Cc: Like Xu > Signed-off-by: Dongli Zhang > --- > target/i386/cpu.h | 5 +++ > target/i386/kvm/kvm.c | 83 +-- > 2 files changed, 86 insertions(+), 2 deletions(-) > > diff --git a/target/i386/cpu.h b/target/i386/cpu.h > index cd047e0410..b8ba72e87a 100644 > --- a/target/i386/cpu.h > +++ b/target/i386/cpu.h > @@ -471,6 +471,11 @@ typedef enum X86Seg { > #define MSR_CORE_PERF_GLOBAL_CTRL 0x38f > #define MSR_CORE_PERF_GLOBAL_OVF_CTRL 0x390 > > +#define MSR_K7_EVNTSEL0 0xc001 > +#define MSR_K7_PERFCTR0 0xc0010004 > +#define MSR_F15H_PERF_CTL0 0xc0010200 > +#define MSR_F15H_PERF_CTR0 0xc0010201 > + > #define MSR_MC0_CTL 0x400 > #define MSR_MC0_STATUS 0x401 > #define MSR_MC0_ADDR0x402 > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c > index bf4136fa1b..a0f7273dad 100644 > --- a/target/i386/kvm/kvm.c > +++ b/target/i386/kvm/kvm.c > @@ -2084,6 +2084,32 @@ int kvm_arch_init_vcpu(CPUState *cs) > } > } > > +/* > + * If KVM_CAP_PMU_CAPABILITY is not supported, there is no way to > + * disable the AMD pmu virtualization. > + * > + * If KVM_CAP_PMU_CAPABILITY is supported, kvm_state->pmu_cap_disabled > + * indicates the KVM side has already disabled the pmu virtualization. 
> + */ > +if (IS_AMD_CPU(env) && !cs->kvm_state->pmu_cap_disabled) { > +int64_t family; > + > +family = (env->cpuid_version >> 8) & 0xf; > +if (family == 0xf) { > +family += (env->cpuid_version >> 20) & 0xff; > +} > + > +if (family >= 6) { > +has_architectural_pmu_version = 1; > + > +if (env->features[FEAT_8000_0001_ECX] & CPUID_EXT3_PERFCORE) { > +num_architectural_pmu_gp_counters = 6; Please make the code a little more readable with some macro definitions. #define AMD64_NUM_COUNTERS 4 #define AMD64_NUM_COUNTERS_CORE 6 > +} else { > +num_architectural_pmu_gp_counters = 4; > +} > +} > +} > + > cpu_x86_cpuid(env, 0x8000, 0, &limit, &unu
Re: [PATCH RESEND v2 1/2] target/i386/kvm: introduce 'pmu-cap-disabled' to set KVM_PMU_CAP_DISABLE
On Wed, Jun 21, 2023 at 9:39 AM Dongli Zhang wrote: > > The "perf stat" at the VM side still works even we set "-cpu host,-pmu" in > the QEMU command line. That is, neither "-cpu host,-pmu" nor "-cpu EPYC" > could disable the pmu virtualization in an AMD environment. > > We still see below at VM kernel side ... > > [0.510611] Performance Events: Fam17h+ core perfctr, AMD PMU driver. > > ... although we expect something like below. > > [0.596381] Performance Events: PMU not available due to virtualization, > using software events only. > [0.600972] NMI watchdog: Perf NMI watchdog permanently disabled > > This is because the AMD pmu (v1) does not rely on cpuid to decide if the > pmu virtualization is supported. > > We introduce a new property 'pmu-cap-disabled' for KVM accel to set > KVM_PMU_CAP_DISABLE if KVM_CAP_PMU_CAPABILITY is supported. Only x86 host > is supported because currently KVM uses KVM_CAP_PMU_CAPABILITY only for > x86. We may check cpu->enable_pmu when creating the first CPU or a BSP one (before it gets running) and then choose whether to disable guest pmu using vm ioctl KVM_CAP_PMU_CAPABILITY. Introducing a new property is not too acceptable if there are other options. > > Cc: Joe Jin > Cc: Like Xu > Signed-off-by: Dongli Zhang > --- > Changed since v1: > - In version 1 we did not introduce the new property. We ioctl > KVM_PMU_CAP_DISABLE only before the creation of the 1st vcpu. We had > introduced a helpfer function to do this job before creating the 1st > KVM vcpu in v1. 
> > accel/kvm/kvm-all.c | 1 + > include/sysemu/kvm_int.h | 1 + > qemu-options.hx | 7 ++ > target/i386/kvm/kvm.c| 46 > 4 files changed, 55 insertions(+) > > diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c > index 7679f397ae..238098e991 100644 > --- a/accel/kvm/kvm-all.c > +++ b/accel/kvm/kvm-all.c > @@ -3763,6 +3763,7 @@ static void kvm_accel_instance_init(Object *obj) > s->xen_version = 0; > s->xen_gnttab_max_frames = 64; > s->xen_evtchn_max_pirq = 256; > +s->pmu_cap_disabled = false; > } > > /** > diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h > index 511b42bde5..cbbe08ec54 100644 > --- a/include/sysemu/kvm_int.h > +++ b/include/sysemu/kvm_int.h > @@ -123,6 +123,7 @@ struct KVMState > uint32_t xen_caps; > uint16_t xen_gnttab_max_frames; > uint16_t xen_evtchn_max_pirq; > +bool pmu_cap_disabled; > }; > > void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml, > diff --git a/qemu-options.hx b/qemu-options.hx > index b57489d7ca..1976c0ca3e 100644 > --- a/qemu-options.hx > +++ b/qemu-options.hx > @@ -187,6 +187,7 @@ DEF("accel", HAS_ARG, QEMU_OPTION_accel, > "tb-size=n (TCG translation block cache size)\n" > "dirty-ring-size=n (KVM dirty ring GFN count, default > 0)\n" > " > notify-vmexit=run|internal-error|disable,notify-window=n (enable notify VM > exit and set notify window, x86 only)\n" > +"pmu-cap-disabled=true|false (disable > KVM_CAP_PMU_CAPABILITY, x86 only, default false)\n" > "thread=single|multi (enable multi-threaded TCG)\n", > QEMU_ARCH_ALL) > SRST > ``-accel name[,prop=value[,...]]`` > @@ -254,6 +255,12 @@ SRST > open up for a specified of time (i.e. notify-window). > Default: notify-vmexit=run,notify-window=0. > > +``pmu-cap-disabled=true|false`` > +When the KVM accelerator is used, it controls whether to disable the > +KVM_CAP_PMU_CAPABILITY via KVM_PMU_CAP_DISABLE. When disabled, the > +PMU virtualization is disabled at the KVM module side. This is for > +x86 host only. 
> + > ERST > > DEF("smp", HAS_ARG, QEMU_OPTION_smp, > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c > index de531842f6..bf4136fa1b 100644 > --- a/target/i386/kvm/kvm.c > +++ b/target/i386/kvm/kvm.c > @@ -129,6 +129,7 @@ static bool has_msr_ucode_rev; > static bool has_msr_vmx_procbased_ctls2; > static bool has_msr_perf_capabs; > static bool has_msr_pkrs; > +static bool has_pmu_cap; > > static uint32_t has_architectural_pmu_version; > static uint32_t num_architectural_pmu_gp_counters; > @@ -2767,6 +2768,23 @@ int kvm_arch_init(MachineState *ms, KVMState *s) > } > } > > +has_pmu_cap = kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY); > + > +if (s->pmu_cap_disabled) { > +if (has_pmu_cap) { >
Re: [PATCH v2 0/2] target/i386/kvm: fix two svm pmu virtualization bugs
I think we've been stuck here too long. Sorry Dongli.

+zhenyu, could you get someone to follow up on this, or I will start working on that.

On 9/1/2023 9:19 am, Dongli Zhang wrote:
> Ping?
>
> About [PATCH v2 2/2], the bad thing is that the customer will not be able to notice the issue, that is, the "Broken BIOS detected" in dmesg, immediately. As a result, the customer VM may panic randomly anytime in the future (once the issue is encountered) if "/proc/sys/kernel/unknown_nmi_panic" is enabled.
>
> Thank you very much!
>
> Dongli Zhang
>
> On 12/19/22 06:45, Dongli Zhang wrote:
> > Can I get feedback for this patchset, especially the [PATCH v2 2/2]?
> >
> > About the [PATCH v2 2/2], currently the issue impacts the usage of PMUs on AMD VM, especially the below case:
> >
> > 1. Enable panic on nmi.
> > 2. Use perf to monitor the performance of VM. Although without a test, I think the nmi watchdog has the same effect.
> > 3. A sudden system reset, or a kernel panic (kdump/kexec).
> > 4. After reboot, there will be random unknown NMI.
> > 5. Unfortunately, the "panic on nmi" may panic the VM randomly at any time.
> >
> > Thank you very much!
> >
> > Dongli Zhang
> >
> > On 12/1/22 16:22, Dongli Zhang wrote:
> > > This patchset is to fix two svm pmu virtualization bugs, x86 only.
> > >
> > > version 1:
> > > https://lore.kernel.org/all/20221119122901.2469-1-dongli.zh...@oracle.com/
> > >
> > > 1. The 1st bug is that "-cpu,-pmu" cannot disable svm pmu virtualization.
> > >
> > > To use "-cpu EPYC" or "-cpu host,-pmu" cannot disable the pmu virtualization. There is still below at the VM linux side ...
> > >
> > > [0.510611] Performance Events: Fam17h+ core perfctr, AMD PMU driver.
> > >
> > > ... although we expect something like below.
> > >
> > > [0.596381] Performance Events: PMU not available due to virtualization, using software events only.
> > > [0.600972] NMI watchdog: Perf NMI watchdog permanently disabled
> > >
> > > The 1st patch has introduced a new x86 only accel/kvm property "pmu-cap-disabled=true" to disable the pmu virtualization via KVM_PMU_CAP_DISABLE.
> > >
> > > I considered 'KVM_X86_SET_MSR_FILTER' initially before patchset v1. Since both KVM_X86_SET_MSR_FILTER and KVM_PMU_CAP_DISABLE are VM ioctls, I finally used the latter because it is easier to use.
> > >
> > > 2. The 2nd bug is that un-reclaimed perf events (after QEMU system_reset) at the KVM side may inject random unwanted/unknown NMIs to the VM.
> > >
> > > The svm pmu registers are not reset during QEMU system_reset.
> > >
> > > (1). The VM resets (e.g., via QEMU system_reset or VM kdump/kexec) while it is running "perf top". The pmu registers are not disabled gracefully.
> > >
> > > (2). Although the x86_cpu_reset() resets many registers to zero, the kvm_put_msrs() does not put AMD pmu registers to KVM side. As a result, some pmu events are still enabled at the KVM side.
> > >
> > > (3). The KVM pmc_speculative_in_use() always returns true so that the events will not be reclaimed. The kvm_pmc->perf_event is still active.
> > >
> > > (4). After the reboot, the VM kernel reports below error:
> > >
> > > [0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected, complain to your hardware vendor.
> > > [0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 530076)
> > >
> > > (5). In a worse case, the active kvm_pmc->perf_event is still able to inject unknown NMIs randomly to the VM kernel.
> > >
> > > [...] Uhhuh. NMI received for unknown reason 30 on CPU 0.
> > >
> > > The 2nd patch is to fix the issue by resetting AMD pmu registers as well as Intel registers.
> > >
> > > This patchset does not cover PerfMonV2, until the below patchset is merged into the KVM side.
> > >
> > > [PATCH v3 0/8] KVM: x86: Add AMD Guest PerfMonV2 PMU support
> > > https://lore.kernel.org/all/2022102645.82001-1-lik...@tencent.com/
> > >
> > > Dongli Zhang (2):
> > >   target/i386/kvm: introduce 'pmu-cap-disabled' to set KVM_PMU_CAP_DISABLE
> > >   target/i386/kvm: get and put AMD pmu registers
> > >
> > >  accel/kvm/kvm-all.c      |   1 +
> > >  include/sysemu/kvm_int.h |   1 +
> > >  qemu-options.hx          |   7 +++
> > >  target/i386/cpu.h        |   5 ++
> > >  target/i386/kvm/kvm.c    | 129 +-
> > >  5 files changed, 141 insertions(+), 2 deletions(-)
> > >
> > > Thank you very much!
> > >
> > > Dongli Zhang
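For readers following the second bug, here is a minimal sketch of which AMD PMU MSRs a get/put pass would have to cover. It is not the code from the patch. The MSR values are the ones Linux defines in msr-index.h, the counter-count macros are the names suggested in the review of patch 2/2, and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* MSR indices as defined in Linux's msr-index.h (not taken from the patch). */
#define MSR_K7_EVNTSEL0     0xc0010000u
#define MSR_K7_PERFCTR0     0xc0010004u
#define MSR_F15H_PERF_CTL0  0xc0010200u
#define MSR_F15H_PERF_CTR0  0xc0010201u

/* Enable bit of an event-select register (bit 22). */
#define EVNTSEL_ENABLE      (1ULL << 22)

/* Macro names suggested in the review; the helpers below are hypothetical. */
#define AMD64_NUM_COUNTERS       4
#define AMD64_NUM_COUNTERS_CORE  6

/* Number of general-purpose counters whose MSRs need get/put. */
static unsigned amd_num_gp_counters(bool has_perfctr_core)
{
    return has_perfctr_core ? AMD64_NUM_COUNTERS_CORE : AMD64_NUM_COUNTERS;
}

/* Event-select MSR for counter i. */
static uint32_t amd_pmu_ctl_msr(unsigned i, bool has_perfctr_core)
{
    /* Core counters interleave CTL/CTR pairs; legacy K7 ones are contiguous. */
    return has_perfctr_core ? MSR_F15H_PERF_CTL0 + 2 * i
                            : MSR_K7_EVNTSEL0 + i;
}

/* Counter MSR for counter i. */
static uint32_t amd_pmu_ctr_msr(unsigned i, bool has_perfctr_core)
{
    return has_perfctr_core ? MSR_F15H_PERF_CTR0 + 2 * i
                            : MSR_K7_PERFCTR0 + i;
}

/* True if the event-select value still has its enable bit set. */
static bool evntsel_is_enabled(uint64_t evntsel)
{
    return (evntsel & EVNTSEL_ENABLE) != 0;
}
```

With these helpers, the value in the "Broken BIOS" log above (MSR c0010200 is 530076) decodes as core counter 0 with its enable bit still set, which is the state a reset is expected to clear.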
Re: [PATCH 0/3] kvm: fix two svm pmu virtualization bugs
On 19/11/2022 8:28 pm, Dongli Zhang wrote: This patchset is to fix two svm pmu virtualization bugs. 1. The 1st bug is that "-cpu,-pmu" cannot disable svm pmu virtualization. To use "-cpu EPYC" or "-cpu host,-pmu" cannot disable the pmu virtualization. There is still below at the VM linux side ... Many QEMU vendor forks already have similar fixes, and thanks for bringing this issue back to the mainline. [0.510611] Performance Events: Fam17h+ core perfctr, AMD PMU driver. ... although we expect something like below. [0.596381] Performance Events: PMU not available due to virtualization, using software events only. [0.600972] NMI watchdog: Perf NMI watchdog permanently disabled The patch 1-2 is to disable the pmu virtualization via KVM_PMU_CAP_DISABLE if the per-vcpu "pmu" property is disabled. I considered 'KVM_X86_SET_MSR_FILTER' initially. Since both KVM_X86_SET_MSR_FILTER and KVM_PMU_CAP_DISABLE are VM ioctl. I finally used the latter because it is easier to use. 2. The 2nd bug is that un-reclaimed perf events (after QEMU system_reset) at the KVM side may inject random unwanted/unknown NMIs to the VM. The svm pmu registers are not reset during QEMU system_reset. (1). The VM resets (e.g., via QEMU system_reset or VM kdump/kexec) while it is running "perf top". The pmu registers are not disabled gracefully. (2). Although the x86_cpu_reset() resets many registers to zero, the kvm_put_msrs() does not puts AMD pmu registers to KVM side. As a result, some pmu events are still enabled at the KVM side. (3). The KVM pmc_speculative_in_use() always returns true so that the events will not be reclaimed. The kvm_pmc->perf_event is still active. 
I'm not sure whether you're saying KVM is doing something wrong. I don't think so, because KVM doesn't sense the system_reset defined by QEMU or other user space. AMD's vPMC will continue to be enabled (if it was enabled before), generating PMI injection into the guest, and the newly started guest doesn't realize the counter is still enabled and blows up the error log.

(4). After the reboot, the VM kernel reports below error:

[0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected, complain to your hardware vendor.
[0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 530076)

(5). In a worse case, the active kvm_pmc->perf_event is still able to inject unknown NMIs randomly to the VM kernel.

[...] Uhhuh. NMI received for unknown reason 30 on CPU 0.

The patch 3 is to fix the issue by resetting AMD pmu registers as well as Intel registers.

This fix idea looks good; it does require syncing the newly changed device state of QEMU to KVM.

This patchset does not cover PerfMonV2, until the below patchset is merged into the KVM side.

[PATCH v3 0/8] KVM: x86: Add AMD Guest PerfMonV2 PMU support
https://lore.kernel.org/all/2022102645.82001-1-lik...@tencent.com/

Dongli Zhang (3):
  kvm: introduce a helper before creating the 1st vcpu
  i386: kvm: disable KVM_CAP_PMU_CAPABILITY if "pmu" is disabled
  target/i386/kvm: get and put AMD pmu registers

 accel/kvm/kvm-all.c    |   7 ++-
 include/sysemu/kvm.h   |   2 +
 target/arm/kvm64.c     |   4 ++
 target/i386/cpu.h      |   5 +++
 target/i386/kvm/kvm.c  | 104 +++-
 target/mips/kvm.c      |   4 ++
 target/ppc/kvm.c       |   4 ++
 target/riscv/kvm.c     |   4 ++
 target/s390x/kvm/kvm.c |   4 ++
 9 files changed, 134 insertions(+), 4 deletions(-)

Thank you very much!

Dongli Zhang
Re: [PATCH] i386: Disable BTS and PEBS
On 20/7/2022 2:53 am, Sean Christopherson wrote:

On Tue, Jul 19, 2022, Paolo Bonzini wrote:

On 7/18/22 22:12, Sean Christopherson wrote:

On Mon, Jul 18, 2022, Paolo Bonzini wrote:

This needs to be fixed in the kernel because old QEMU/new KVM is supported.

I can't object to adding a quirk for this since KVM is breaking userspace, but on the KVM side we really need to stop "sanitizing" userspace inputs unless it puts the host at risk, because inevitably it leads to needing a quirk.

The problem is not the sanitizing, it's that userspace literally cannot know that this needs to be done because the feature bits are "backwards" (1 = unavailable).

Yes, the bits being inverted contributed to KVM not providing a way for userspace to enumerate PEBS and BTS support, but lack of enumeration is a separate issue. If KVM had simply ignored invalid guest state from the get go, then userspace would never have gained a dependency on KVM sanitizing guest state. The fact that KVM didn't enumerate support in any way is an orthogonal problem. To play nice with older userspace, KVM will need to add a quirk to restore the sanitizing code, but that doesn't solve the enumeration issue. And vice versa, solving the enumeration problem doesn't magically fix old userspace.

The right way to fix it is probably to use feature MSRs and, by default, leave the features marked as unavailable. I'll think it through and post a patch tomorrow for both KVM and QEMU (to enable PEBS).

Try to help: KVM already has MSR_IA32_PERF_CAPABILITIES as a feature msr (to enable LBR/PEBS), and KVM_CAP_PMU_CAPABILITY as vm ioctl extension for model specific crappiness.

Yeah, lack of CPUID bits is annoying.
Re: [PATCH] i386: Disable BTS and PEBS
On 18/7/2022 11:22 am, Zhenzhong Duan wrote: Since below KVM commit, KVM hided BTS as it's not supported yet. b9181c8ef356 ("KVM: x86/pmu: Avoid exposing Intel BTS feature") After below KVM commit, it gave control of MSR_IA32_MISC_ENABLES to userspace. 9fc222967a39 ("KVM: x86: Give host userspace full control of MSR_IA32_MISC_ENABLES") So qemu takes the responsibility to hide BTS. Without fix, we get below warning in guest kernel: [] unchecked MSR access error: WRMSR to 0x1d9 (tried to write 0x01c0) at rIP: 0xaa070644 (native_write_msr+0x4/0x20) [] Call Trace: [] [] intel_pmu_enable_bts+0x5d/0x70 [] bts_event_add+0x77/0x90 [] event_sched_in.isra.135+0x99/0x1e0 Tested-by: Xiangfei Ma Signed-off-by: Zhenzhong Duan --- target/i386/cpu.h | 6 -- target/i386/kvm/kvm.c | 4 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/target/i386/cpu.h b/target/i386/cpu.h index 82004b65b944..8a83d0995c66 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -434,8 +434,10 @@ typedef enum X86Seg { #define MSR_IA32_MISC_ENABLE0x1a0 /* Indicates good rep/movs microcode on some processors: */ -#define MSR_IA32_MISC_ENABLE_DEFAULT1 -#define MSR_IA32_MISC_ENABLE_MWAIT (1ULL << 18) +#define MSR_IA32_MISC_ENABLE_DEFAULT 1 +#define MSR_IA32_MISC_ENABLE_BTS_UNAVAIL (1ULL << 11) +#define MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL (1ULL << 12) +#define MSR_IA32_MISC_ENABLE_MWAIT(1ULL << 18) #define MSR_MTRRphysBase(reg) (0x200 + 2 * (reg)) #define MSR_MTRRphysMask(reg) (0x200 + 2 * (reg) + 1) diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index f148a6d52fa4..002e0520dd76 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -2180,6 +2180,10 @@ void kvm_arch_reset_vcpu(X86CPU *cpu) { CPUX86State *env = &cpu->env; +/* Disable BTS feature which is unsupported on KVM */ +env->msr_ia32_misc_enable |= MSR_IA32_MISC_ENABLE_BTS_UNAVAIL; +env->msr_ia32_misc_enable |= MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL; Would it be more readable to group msr_ia32_misc_enable code into this 
function: static void x86_cpu_reset(DeviceState *dev)? And why disable PEBS (we need it at least for "-cpu host,migratable=no")? Also, the behavior of MISC_ENABLE_EMON is inconsistent with "pmu=off". + env->xcr0 = 1; if (kvm_irqchip_in_kernel()) { env->mp_state = cpu_is_bsp(cpu) ? KVM_MP_STATE_RUNNABLE :
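To make the review comments concrete, here is a rough sketch of grouping the MSR_IA32_MISC_ENABLE reset policy in one helper. It is only an illustration, not the patch: the BTS/PEBS bit positions are the ones the patch defines, the EMON bit (bit 7, "Performance Monitoring Available" in the Intel SDM) is not part of the patch, and the helper name and its two parameters are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_IA32_MISC_ENABLE_DEFAULT       1
#define MSR_IA32_MISC_ENABLE_EMON          (1ULL << 7)   /* not in the patch */
#define MSR_IA32_MISC_ENABLE_BTS_UNAVAIL   (1ULL << 11)
#define MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL  (1ULL << 12)

/*
 * Hypothetical helper: compute the reset value of MSR_IA32_MISC_ENABLE.
 * Note the inverted sense of the BTS/PEBS bits: 1 means "unavailable".
 */
static uint64_t misc_enable_reset_value(bool pmu_enabled, bool pebs_supported)
{
    uint64_t val = MSR_IA32_MISC_ENABLE_DEFAULT;

    /* BTS is not supported by KVM, so always mark it unavailable. */
    val |= MSR_IA32_MISC_ENABLE_BTS_UNAVAIL;

    /* Only hide PEBS when the configuration cannot provide it. */
    if (!pebs_supported) {
        val |= MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL;
    }

    /* Advertise performance monitoring only when the vPMU is on. */
    if (pmu_enabled) {
        val |= MSR_IA32_MISC_ENABLE_EMON;
    }

    return val;
}
```

Keeping the policy in one function makes the inverted "unavailable" bits harder to misread. Whether PEBS can actually be left available depends on KVM support, which this sketch takes as an input rather than probing.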
Re: [PATCH v5 0/2] Enable legacy LBR support for guest
Hi Weijiang,

On 23/1/2022 12:11 am, Yang Weijiang wrote:
> KVM legacy LBR patches have been merged in kernel 5.12, this patchset is to expose the feature to guest from the perf capability MSR. Qemu can add LBR format in cpu option to achieve it, e.g., -cpu host,lbr-fmt=0x5,

Some older Intel CPUs may have lbr-fmt=LBR_FORMAT_32 (which is 0), would you help verify that KVM is supported on these platforms? If so, how do we enable guest LBR from the QEMU side, w/ -cpu host,lbr-fmt=0x0?

> the format should match host value in IA32_PERF_CAPABILITIES. Note, KVM legacy LBR solution accelerates guest perf performance by LBR MSR passthrough so it requires guest cpu model matches that of host's, i.e.,

Would you help add live migration support across host/guest CPU models when hosts at both ends have the same number of LBR entries and the same lbr-fmt?

Thanks,
Like Xu

> only -cpu host is supported.
>
> Change in v5:
> 1. This patchset is rebased on tip : 6621441db5
> 2. No functional change since v4.
[PATCH] target/i386/cpu: Use the KVM reported value for the number of ASIDs
From: Like Xu

If KVM is enabled, use the number of address space identifiers (ASIDs) reported by CPUID Fn8000_000A_EBX instead of hard-coding it to 0x10.

Signed-off-by: Like Xu
---
 target/i386/cpu.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 48b55ebd0a..959c4425a4 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -5523,7 +5523,13 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
     case 0x8000000A:
         if (env->features[FEAT_8000_0001_ECX] & CPUID_EXT3_SVM) {
             *eax = 0x00000001; /* SVM Revision */
-            *ebx = 0x00000010; /* nr of ASIDs */
+            /* nr of ASIDs */
+            if (kvm_enabled()) {
+                *ebx = kvm_arch_get_supported_cpuid(cs->kvm_state,
+                                                    0x8000000A, 0, R_EBX);
+            } else {
+                *ebx = 0x00000010;
+            }
             *ecx = 0;
             *edx = env->features[FEAT_SVM]; /* optional features */
         } else {
-- 
2.32.0
[PATCH v3 2/2] target/i386: add "-cpu, lbr-fmt=*" support to enable guest LBR
The last branch recording (LBR) is a performance monitor unit (PMU) feature on Intel processors that records a running trace of the most recent branches taken by the processor in the LBR stack. The QEMU could configure whether it's enabled or not for each guest via CLI.

The LBR feature would be enabled on the guest if:
- the KVM is enabled and the PMU is enabled and,
- the msr-based-feature IA32_PERF_CAPABILITIES is supported on KVM and,
- the supported returned value for lbr_fmt from this msr is not zero and,
- the requested guest vcpu model does support FEAT_1_ECX.CPUID_EXT_PDCM,
- the user-provided lbr-fmt value should not violate its bitmask (0x3f) and it should be the same as the host lbr_fmt value or just use the QEMU option "-cpu host,migratable=no" to enable guest LBR.

Signed-off-by: Like Xu
---
v2-v3 Changelog:
- Add a new generic property macro to validate its bitmask;
- Differentiate "lbr-fmt=0" from "lbr-fmt not set";
- Do what the user asked for whenever possible;
- Treat mismatch or violation as an error rather than warning;

Testcases for a lbr-fmt=5 host:
"-cpu host" --> "Disable LBR"
"-cpu host,lbr-fmt=0" --> "Disable LBR"
"-cpu host,lbr-fmt=5" --> "Enable LBR"
"-cpu host,lbr-fmt=6" --> "Error out, lbr mismatch"
"-cpu host,lbr-fmt=0xff" --> "Error out, bitmask violation"
"-cpu host,migratable=no" --> "Enable LBR"
"-cpu host,migratable=no,lbr-fmt=0" --> "Disable LBR"
"-cpu host,migratable=no,lbr-fmt=5" --> "Enable LBR"
"-cpu host,migratable=no,lbr-fmt=6" --> "Error out, lbr mismatch"
"-cpu host,migratable=no,lbr-fmt=0xff" --> "Error out, bitmask violation"

 target/i386/cpu.c | 39 +++
 target/i386/cpu.h | 10 ++
 2 files changed, 49 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c index ad99cad0e7..d03306179a 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -6748,6 +6748,41 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp) goto out; } +/* + * Override env->features[FEAT_PERF_CAPABILITIES] + * with explicit
user-provided settings. + */ +if (cpu->lbr_fmt != ~PERF_CAP_LBR_FMT) { +if ((cpu->lbr_fmt & PERF_CAP_LBR_FMT) != cpu->lbr_fmt) { +error_setg(errp, "invalid lbr-fmt"); +return; +} +env->features[FEAT_PERF_CAPABILITIES] &= ~PERF_CAP_LBR_FMT; +env->features[FEAT_PERF_CAPABILITIES] |= cpu->lbr_fmt; +} + +/* + * We can always validate env->features[FEAT_PERF_CAPABILITIES], + * no matter how it was initialized: + */ +uint64_t requested_lbr_fmt = +env->features[FEAT_PERF_CAPABILITIES] & PERF_CAP_LBR_FMT; +if (requested_lbr_fmt && kvm_enabled()) { +uint64_t host_perf_cap = +x86_cpu_get_supported_feature_word(FEAT_PERF_CAPABILITIES, false); +uint64_t host_lbr_fmt = host_perf_cap & PERF_CAP_LBR_FMT; +if (!cpu->enable_pmu) { +error_setg(errp, "vPMU: LBR is unsupported without pmu=on"); +return; +} +if (requested_lbr_fmt != host_lbr_fmt) { +error_setg(errp, "vPMU: the lbr-fmt value (0x%lx) mismatches " +"the host supported value (0x%lx).", +requested_lbr_fmt, host_lbr_fmt); +return; +} +} + x86_cpu_filter_features(cpu, cpu->check_cpuid || cpu->enforce_cpuid); if (cpu->enforce_cpuid && x86_cpu_have_filtered_features(cpu)) { @@ -7150,6 +7185,9 @@ static void x86_cpu_initfn(Object *obj) object_property_add_alias(obj, "sse4_1", obj, "sse4.1"); object_property_add_alias(obj, "sse4_2", obj, "sse4.2"); +cpu->lbr_fmt = ~PERF_CAP_LBR_FMT; +object_property_add_alias(obj, "lbr_fmt", obj, "lbr-fmt"); + if (xcc->model) { x86_cpu_load_model(cpu, xcc->model); } @@ -7300,6 +7338,7 @@ static Property x86_cpu_properties[] = { #endif DEFINE_PROP_INT32("node-id", X86CPU, node_id, CPU_UNSET_NUMA_NODE_ID), DEFINE_PROP_BOOL("pmu", X86CPU, enable_pmu, false), +DEFINE_PROP_BITMASK_UINT64("lbr-fmt", X86CPU, lbr_fmt, PERF_CAP_LBR_FMT), DEFINE_PROP_UINT32("hv-spinlocks", X86CPU, hyperv_spinlock_attempts, HYPERV_SPINLOCK_NEVER_NOTIFY), diff --git a/target/i386/cpu.h b/target/i386/cpu.h index 1bc300ce85..bab394e18e 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -354,6 +354,7 @@ typedef enum 
X86Seg { #define ARCH_CAP_TSX_CTRL_MSR (1<<7
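The validation rules in this patch can be restated as a small standalone sketch, which may help when checking them against the testcase list. This is a simplified model, not the QEMU code: it assumes KVM is enabled, the enum and function names are hypothetical, and "model_perf_cap" stands in for the value env->features[FEAT_PERF_CAPABILITIES] has before the user setting is applied.

```c
#include <stdbool.h>
#include <stdint.h>

#define PERF_CAP_LBR_FMT  0x3fULL
/* Sentinel for "lbr-fmt not set", as used by x86_cpu_initfn() in the patch. */
#define LBR_FMT_UNSET     (~PERF_CAP_LBR_FMT)

/* Hypothetical result codes for this sketch. */
enum lbr_result {
    LBR_DISABLED,
    LBR_ENABLED,
    LBR_ERR_BITMASK,
    LBR_ERR_NO_PMU,
    LBR_ERR_MISMATCH,
};

static enum lbr_result check_lbr_fmt(uint64_t user_lbr_fmt,
                                     uint64_t model_perf_cap,
                                     uint64_t host_perf_cap,
                                     bool pmu_enabled)
{
    uint64_t perf_cap = model_perf_cap;
    uint64_t requested;

    /* An explicit user setting overrides the model's LBR format bits. */
    if (user_lbr_fmt != LBR_FMT_UNSET) {
        if ((user_lbr_fmt & PERF_CAP_LBR_FMT) != user_lbr_fmt) {
            return LBR_ERR_BITMASK;
        }
        perf_cap = (perf_cap & ~PERF_CAP_LBR_FMT) | user_lbr_fmt;
    }

    /* Validate the final value, no matter how it was initialized. */
    requested = perf_cap & PERF_CAP_LBR_FMT;
    if (!requested) {
        return LBR_DISABLED;
    }
    if (!pmu_enabled) {
        return LBR_ERR_NO_PMU;
    }
    if (requested != (host_perf_cap & PERF_CAP_LBR_FMT)) {
        return LBR_ERR_MISMATCH;
    }
    return LBR_ENABLED;
}
```

In this model, "-cpu host" on a lbr-fmt=5 host corresponds to an unset user value with a model value of 0, and "-cpu host,migratable=no" to an unset user value with the model value equal to the host's.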
[PATCH v3 1/2] qdev-properties: Add a new macro to validate bitmask for setter
The new generic DEFINE_PROP_BITMASK_UINT64 could be used to ensure that a user-provided property value complies with its bitmask rule and the default value is recommended to be set in instance_init(). Signed-off-by: Like Xu --- hw/core/qdev-properties.c| 19 +++ include/hw/qdev-properties.h | 12 include/qapi/qmp/qerror.h| 3 +++ 3 files changed, 34 insertions(+) diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c index 50f40949f5..3784d3b30d 100644 --- a/hw/core/qdev-properties.c +++ b/hw/core/qdev-properties.c @@ -428,6 +428,25 @@ const PropertyInfo qdev_prop_int64 = { .set_default_value = qdev_propinfo_set_default_value_int, }; +static void set_bitmask_uint64(Object *obj, Visitor *v, const char *name, + void *opaque, Error **errp) +{ +Property *prop = opaque; +uint64_t *ptr = object_field_prop_ptr(obj, prop); + +visit_type_uint64(v, name, ptr, errp); + +if (*ptr & ~prop->bitmask) { +error_setg(errp, QERR_INVALID_BITMASK_VALUE, name, prop->bitmask); +} +} + +const PropertyInfo qdev_prop_bitmask_uint64 = { +.name = "int64", +.get = get_uint64, +.set = set_bitmask_uint64, +}; + /* --- string --- */ static void release_string(Object *obj, const char *name, void *opaque) diff --git a/include/hw/qdev-properties.h b/include/hw/qdev-properties.h index 0ef97d60ce..42f0112e14 100644 --- a/include/hw/qdev-properties.h +++ b/include/hw/qdev-properties.h @@ -17,6 +17,7 @@ struct Property { const PropertyInfo *info; ptrdiff_toffset; uint8_t bitnr; +uint64_t bitmask; bool set_default; union { int64_t i; @@ -53,6 +54,7 @@ extern const PropertyInfo qdev_prop_uint16; extern const PropertyInfo qdev_prop_uint32; extern const PropertyInfo qdev_prop_int32; extern const PropertyInfo qdev_prop_uint64; +extern const PropertyInfo qdev_prop_bitmask_uint64; extern const PropertyInfo qdev_prop_int64; extern const PropertyInfo qdev_prop_size; extern const PropertyInfo qdev_prop_string; @@ -102,6 +104,16 @@ extern const PropertyInfo qdev_prop_link; .set_default = true, \ 
.defval.u= (bool)_defval) +/** + * The DEFINE_PROP_BITMASK_UINT64 could be used to ensure that + * a user-provided value complies with certain bitmask rule and + * the default value is recommended to be set in instance_init(). + */ +#define DEFINE_PROP_BITMASK_UINT64(_name, _state, _field, _bitmask) \ +DEFINE_PROP(_name, _state, _field, qdev_prop_bitmask_uint64, uint64_t, \ +.bitmask= (_bitmask), \ +.set_default = false) + #define PROP_ARRAY_LEN_PREFIX "len-" /** diff --git a/include/qapi/qmp/qerror.h b/include/qapi/qmp/qerror.h index 596fce0c54..aab7902760 100644 --- a/include/qapi/qmp/qerror.h +++ b/include/qapi/qmp/qerror.h @@ -68,4 +68,7 @@ #define QERR_UNSUPPORTED \ "this feature or command is not currently supported" +#define QERR_INVALID_BITMASK_VALUE \ +"the requested value for '%s' violates its bitmask '0x%lx'" + #endif /* QERROR_H */ -- 2.30.2
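As an aside, the bitmask check itself is easy to try outside of QEMU. The sketch below is a standalone stand-in for the setter, not the qdev code: the function signature and the error buffer are hypothetical. Unlike the setter in the patch, it validates before storing, so a rejected value never reaches the field, and it prints the mask with PRIx64 rather than "%lx" so the format also matches uint64_t on 32-bit hosts.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for a bitmask-validating property setter.
 * Returns true and stores the value if it fits the bitmask; otherwise
 * leaves *field untouched and writes a message into err.
 */
static bool set_bitmask_uint64(uint64_t *field, uint64_t value,
                               uint64_t bitmask, const char *name,
                               char *err, size_t errlen)
{
    if (value & ~bitmask) {
        snprintf(err, errlen,
                 "the requested value for '%s' violates its bitmask '0x%"
                 PRIx64 "'", name, bitmask);
        return false;
    }
    *field = value;
    return true;
}
```

For a "lbr-fmt" property with mask 0x3f, a request of 0xff is rejected and the previous field value is kept, while 0x5 is accepted.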
Re: [PATCH v2] target/i386: add "-cpu, lbr-fmt=*" support to enable guest LBR
Hi Eduardo, Thanks for your detailed comments. On 2021/4/29 5:19, Eduardo Habkost wrote: On Tue, Apr 27, 2021 at 04:09:48PM +0800, Like Xu wrote: The last branch recording (LBR) is a performance monitor unit (PMU) feature on Intel processors that records a running trace of the most recent branches taken by the processor in the LBR stack. The QEMU could configure whether it's enabled or not for each guest via CLI. The LBR feature would be enabled on the guest if: - the KVM is enabled and the PMU is enabled and, - the msr-based-feature IA32_PERF_CAPABILITIES is supported on KVM and, - the supported returned value for lbr_fmt from this msr is not zero and, - the requested guest vcpu model does support FEAT_1_ECX.CPUID_EXT_PDCM, - the configured lbr-fmt value is the same as the host lbr_fmt value OR use the QEMU option "-cpu host,migratable=no". I don't understand why "migratable" matters here. "migratable" is just a convenience property to get better defaults when using "-cpu host". I don't know why it would change the lbr-fmt validation rules. Your comments below help me understand why we introduced "migratable" and I'll follow it. Signed-off-by: Like Xu --- A changelog explaining what you changed since v1 would have been useful here. Sorry for the inconvenience. target/i386/cpu.c | 34 ++ target/i386/cpu.h | 10 ++ target/i386/kvm/kvm.c | 10 -- 3 files changed, 52 insertions(+), 2 deletions(-) diff --git a/target/i386/cpu.c b/target/i386/cpu.c index ad99cad0e7..9c8e54aa6f 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -6623,6 +6623,10 @@ static void x86_cpu_filter_features(X86CPU *cpu, bool verbose) } for (w = 0; w < FEATURE_WORDS; w++) { +if (w == FEAT_PERF_CAPABILITIES) { +continue; +} + Why exactly is this necessary? I expected it to be completely OK to call mark_unavailable_features() multiple times for the same FeatureWord. OK. If there's a reason why this is necessary, I suggest adding a comment explaining why.
uint64_t host_feat = x86_cpu_get_supported_feature_word(w, false); uint64_t requested_features = env->features[w]; @@ -6630,6 +6634,27 @@ static void x86_cpu_filter_features(X86CPU *cpu, bool verbose) mark_unavailable_features(cpu, w, unavailable_features, prefix); } +uint64_t host_perf_cap = +x86_cpu_get_supported_feature_word(FEAT_PERF_CAPABILITIES, false); +if (!cpu->lbr_fmt && !cpu->migratable) { +cpu->lbr_fmt = host_perf_cap & PERF_CAP_LBR_FMT; "migratable=no" is not a request to override explicit user settings. This is why we have the ~env->user_features masking inside x86_cpu_expand_features() when initializing env->features[]. In either case, I don't understand why you need the lines above. "migratable=no" should already trigger the x86_cpu_get_supported_feature_word() loop inside x86_cpu_expand_features(), and it should initialize env->features[FEAT_PERF_CAPABILITIES] with the host value. Isn't that code working for you? +if (cpu->lbr_fmt) { +info_report("vPMU: The value of lbr-fmt has been adjusted " +"to 0x%lx and guest LBR is enabled.", +host_perf_cap & PERF_CAP_LBR_FMT); From your other message: (I'm assuming your examples are for a lbr-fmt=5 host) "-cpu host,migratable=no" --> "Enable guest LBR and show warning" Enabling guest LBR in this case is 100% OK, isn't it? I don't think you need to show a warning. "-cpu host,migratable=no,lbr-fmt=0" --> "Enable guest LBR and show warning" Why? In this case, we should do what the user asked for whenever possible, and the user is explicitly asking lbr-fmt to be 0. "-cpu host,migratable=no,lbr-fmt=5" --> "Enable guest LBR" Looks OK. "-cpu host,migratable=no,lbr-fmt=6" --> "Disable guest LBR and show warning" Makes sense to me[1]. +} +} else { +uint64_t requested_lbr_fmt = cpu->lbr_fmt & PERF_CAP_LBR_FMT; +if (requested_lbr_fmt && kvm_enabled()) { From your other message: "-cpu host,lbr-fmt=0" --> "Disable guest LBR" Makes sense to me. 
I understand this as a confirmation that it's OK to have a guest/host mismatch if guest LBR=0. "-cpu host,lbr-fmt=5" --> "Enable guest LBR" Makes sense to me. "-cpu host,lbr-fmt=6" --> "Disable guest LBR and show warning" Makes sense to me[1]. [1] As long as "show warning" becomes "fatal error" if enforce=1. mark_unavailable_features() should make s
Re: [PATCH RESEND 1/2] target/i386: add "-cpu, lbr-fmt=*" support to enable guest LBR
Hi Eduardo, On 2021/4/24 5:20, Eduardo Habkost wrote: Hi, Sorry for missing the previous submission of this series, and thanks for resubmitting. Long time no see and thanks for your comments. On Fri, Apr 23, 2021 at 10:20:36AM +0800, Like Xu wrote: The last branch recording (LBR) is a performance monitor unit (PMU) feature on Intel processors that records a running trace of the most recent branches taken by the processor in the LBR stack. The QEMU could configure whether it's enabled or not for each guest via CLI. The LBR feature would be enabled on the guest if: - the KVM is enabled and the PMU is enabled and, - the msr-based-feature IA32_PERF_CAPABILITIES is supported on KVM and, - the supported returned value for lbr_fmt from this msr is not zero and, - the requested guest vcpu model does support FEAT_1_ECX.CPUID_EXT_PDCM, - the configured lbr-fmt value is the same as the host lbr_fmt value or use the QEMU option "-cpu host,migratable=no". Cc: Eduardo Habkost Cc: Paolo Bonzini Signed-off-by: Like Xu --- target/i386/cpu.c | 16 target/i386/cpu.h | 10 ++ target/i386/kvm/kvm.c | 5 +++-- 3 files changed, 29 insertions(+), 2 deletions(-) diff --git a/target/i386/cpu.c b/target/i386/cpu.c index ad99cad0e7..eee6da3ad8 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -6627,6 +6627,13 @@ static void x86_cpu_filter_features(X86CPU *cpu, bool verbose) x86_cpu_get_supported_feature_word(w, false); uint64_t requested_features = env->features[w]; uint64_t unavailable_features = requested_features & ~host_feat; +if (kvm_enabled() && w == FEAT_PERF_CAPABILITIES && If this block of code should run only once, why is this inside the loop in the first place? I suggest following the same pattern used for intel-pt flags and moving it outside the loop. Sure, the mark_unavailable_features() will skip the check for feature_word(FEAT_PERF_CAPABILITIES) and avoid double checking.
+(requested_features & PERF_CAP_LBR_FMT)) { What exactly is supposed to happen if the VCPU is configured with LBR_FMT=0 and the host has LBR_FMT != 0 ? If the VCPU is configured with LBR_FMT=0 and the host has LBR_FMT != 0, the guest LBR will be enabled if "migratable=no" and will be disabled if "migratable=yes" by default. Some test cases and expected results can be listed as: "-cpu host,lbr-fmt=0" --> "Disable guest LBR" "-cpu host,lbr-fmt=5" --> "Enable guest LBR" "-cpu host,lbr-fmt=6" --> "Disable guest LBR and show warning" "-cpu host,migratable=no" --> "Enable guest LBR and show warning" "-cpu host,migratable=no,lbr-fmt=0" --> "Enable guest LBR and show warning" "-cpu host,migratable=no,lbr-fmt=5" --> "Enable guest LBR" "-cpu host,migratable=no,lbr-fmt=6" --> "Disable guest LBR and show warning" If it shouldn't be an error, then the new kvm_exact_match_flags field added in patch 2/2 becomes hard to reuse, and easy to misuse (there's no code documentation indicating that a mismatch is allowed if the requested bits are all zero). In that case, maybe patch 2/2 could be dropped by now. Let us drop the patch 2/2 and please help review the new version: https://lore.kernel.org/qemu-devel/20210427080948.439432-1-like...@linux.intel.com/ If it should be an error, this patch and 2/2 don't seem correct. If correcting that, I also suggest reversing the patch order in the series, so this whole block of code doesn't even need to be added in the first place. 
+if ((host_feat & PERF_CAP_LBR_FMT) != +(requested_features & PERF_CAP_LBR_FMT)) { +unavailable_features |= PERF_CAP_LBR_FMT; +} +} mark_unavailable_features(cpu, w, unavailable_features, prefix); } @@ -6734,6 +6741,14 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp) } } +if (cpu->lbr_fmt) { +if (!cpu->enable_pmu) { +error_setg(errp, "LBR is unsupported since guest PMU is disabled."); +return; +} +env->features[FEAT_PERF_CAPABILITIES] |= cpu->lbr_fmt; +} + /* mwait extended info: needed for Core compatibility */ /* We always wake on interrupt even if host does not have the capability */ cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE; @@ -7300,6 +7315,7 @@ static Property x86_cpu_properties[] = { #endif DEFINE_PROP_INT32("node-id", X86CPU, node_id, CPU_UNSET_NUMA_NODE_ID), DEFINE_PROP_BOOL("pmu", X86CPU, enable_pmu, false), +DEFINE_PROP_UINT8("lbr-fmt", X86CPU, lbr_fmt, 0), DEFINE_PROP_UINT32("hv-spinlocks"
[PATCH v2] target/i386: add "-cpu, lbr-fmt=*" support to enable guest LBR
The last branch recording (LBR) is a performance monitor unit (PMU) feature on Intel processors that records a running trace of the most recent branches taken by the processor in the LBR stack. The QEMU could configure whether it's enabled or not for each guest via CLI. The LBR feature would be enabled on the guest if: - the KVM is enabled and the PMU is enabled and, - the msr-based-feature IA32_PERF_CAPABILITIES is supported on KVM and, - the supported returned value for lbr_fmt from this msr is not zero and, - the requested guest vcpu model does support FEAT_1_ECX.CPUID_EXT_PDCM, - the configured lbr-fmt value is the same as the host lbr_fmt value OR use the QEMU option "-cpu host,migratable=no". Signed-off-by: Like Xu --- target/i386/cpu.c | 34 ++ target/i386/cpu.h | 10 ++ target/i386/kvm/kvm.c | 10 -- 3 files changed, 52 insertions(+), 2 deletions(-) diff --git a/target/i386/cpu.c b/target/i386/cpu.c index ad99cad0e7..9c8e54aa6f 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -6623,6 +6623,10 @@ static void x86_cpu_filter_features(X86CPU *cpu, bool verbose) } for (w = 0; w < FEATURE_WORDS; w++) { +if (w == FEAT_PERF_CAPABILITIES) { +continue; +} + uint64_t host_feat = x86_cpu_get_supported_feature_word(w, false); uint64_t requested_features = env->features[w]; @@ -6630,6 +6634,27 @@ static void x86_cpu_filter_features(X86CPU *cpu, bool verbose) mark_unavailable_features(cpu, w, unavailable_features, prefix); } +uint64_t host_perf_cap = +x86_cpu_get_supported_feature_word(FEAT_PERF_CAPABILITIES, false); +if (!cpu->lbr_fmt && !cpu->migratable) { +cpu->lbr_fmt = host_perf_cap & PERF_CAP_LBR_FMT; +if (cpu->lbr_fmt) { +info_report("vPMU: The value of lbr-fmt has been adjusted " +"to 0x%lx and guest LBR is enabled.", +host_perf_cap & PERF_CAP_LBR_FMT); +} +} else { +uint64_t requested_lbr_fmt = cpu->lbr_fmt & PERF_CAP_LBR_FMT; +if (requested_lbr_fmt && kvm_enabled()) { +if (requested_lbr_fmt != (host_perf_cap & PERF_CAP_LBR_FMT)) { +cpu->lbr_fmt = 0;
+warn_report("vPMU: The supported lbr-fmt value on the host " +"is 0x%lx and guest LBR is disabled.", +host_perf_cap & PERF_CAP_LBR_FMT); +} +} +} + if ((env->features[FEAT_7_0_EBX] & CPUID_7_0_EBX_INTEL_PT) && kvm_enabled()) { KVMState *s = CPU(cpu)->kvm_state; @@ -6734,6 +6759,14 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp) } } +if (cpu->lbr_fmt) { +if (!cpu->enable_pmu) { +error_setg(errp, "LBR is unsupported since guest PMU is disabled."); +return; +} +env->features[FEAT_PERF_CAPABILITIES] |= cpu->lbr_fmt; +} + /* mwait extended info: needed for Core compatibility */ /* We always wake on interrupt even if host does not have the capability */ cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE; @@ -7300,6 +7333,7 @@ static Property x86_cpu_properties[] = { #endif DEFINE_PROP_INT32("node-id", X86CPU, node_id, CPU_UNSET_NUMA_NODE_ID), DEFINE_PROP_BOOL("pmu", X86CPU, enable_pmu, false), +DEFINE_PROP_UINT8("lbr-fmt", X86CPU, lbr_fmt, 0), DEFINE_PROP_UINT32("hv-spinlocks", X86CPU, hyperv_spinlock_attempts, HYPERV_SPINLOCK_NEVER_NOTIFY), diff --git a/target/i386/cpu.h b/target/i386/cpu.h index 570f916878..b12c879fc4 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -354,6 +354,7 @@ typedef enum X86Seg { #define ARCH_CAP_TSX_CTRL_MSR (1<<7) #define MSR_IA32_PERF_CAPABILITIES 0x345 +#define PERF_CAP_LBR_FMT 0x3f #define MSR_IA32_TSX_CTRL 0x122 #define MSR_IA32_TSCDEADLINE0x6e0 @@ -1726,6 +1727,15 @@ struct X86CPU { */ bool enable_pmu; +/* + * Configure LBR_FMT bits on IA32_PERF_CAPABILITIES MSR. + * This can't be enabled by default yet because it doesn't have + * ABI stability guarantees, as it is only allowed to pass all + * LBR_FMT bits returned by kvm_arch_get_supported_msr_feature() + * (that depends on host CPU and kernel capabilities) to the guest. + */ +uint8_t lbr_fmt; + /* LMCE support can be enabled/disabled via cpu option 'lmce=on/off'. 
It is * disabled by default to avoid breaking migration between QEMU with * different LMCE configurations. diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index 7fe9f52710..aa926984ae 1
Re: [PATCH v2] hw/i386: Expand the range of CPU topologies between smp and maxcpus
On 2021/4/26 21:30, Daniel P. Berrangé wrote: On Mon, Apr 26, 2021 at 10:08:52AM +0800, caodon...@kingsoft.com wrote: Change the criteria for the initial CPU topology and maxcpus, so that users can have more settings. Can you provide a better explanation of why this is needed? What valid usage scenario is blocked by the current check? AFAICT, it partially reverts an intentional change done several years ago in: commit bc1fb850a31468ac4976f3895f01a6d981e06d0a Author: Igor Mammedov Date: Thu Sep 13 13:06:01 2018 +0200 vl.c deprecate incorrect CPUs topology -smp [cpus],sockets/cores/threads[,maxcpus] should describe topology so that total number of logical CPUs [sockets * cores * threads] would be equal to [maxcpus], however historically we didn't have such check in QEMU and it is possible to start VM with an invalid topology. Deprecate invalid options combination so we can make sure that the topology VM started with is always correct in the future. Users with an invalid sockets/cores/threads/maxcpus values should fix their CLI to make sure that [sockets * cores * threads] == [maxcpus] Another helpful commit would be: commit c4332cd1dcf2964c23893ab4c0bf8d774e42a3cf Author: Igor Mammedov Date: Fri Sep 11 09:32:02 2020 -0400 smp: drop support for deprecated (invalid topologies) it's was deprecated since 3.1 Support for invalid topologies is removed, the user must ensure that topologies described with -smp include all possible cpus, i.e. (sockets * cores * threads) == maxcpus or QEMU will exit with error. So is the following statement correct? When we explicitly set the topology, we must ensure that the combination (sockets/dies/cores/threads/maxcpus) is always valid. If we need hot plug testing, we can only use something like "-smp 1,maxcpus=4" since 3.1.
Signed-off-by: Dongli Cao --- hw/i386/pc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 8a84b25..ef2e819 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -751,7 +751,7 @@ void pc_smp_parse(MachineState *ms, QemuOpts *opts) exit(1); } -if (sockets * dies * cores * threads != ms->smp.max_cpus) { +if (sockets * dies * cores * threads > ms->smp.max_cpus) { error_report("Invalid CPU topology deprecated: " "sockets (%u) * dies (%u) * cores (%u) * threads (%u) " "!= maxcpus (%u)", This is -- 1.8.3.1 caodon...@kingsoft.com Regards, Daniel
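The rule quoted from the two commits above can be sketched as a small standalone check. This is only an illustration: `topology_is_valid` is a hypothetical helper name, not a QEMU function, and the inclusion of `dies` follows the expression in the pc_smp_parse() hunk rather than the older commit messages.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not QEMU code): returns true when an explicit
 * topology accounts for every possible CPU, i.e.
 * sockets * dies * cores * threads == maxcpus.
 * The product is widened to 64 bits so that ordinary 32-bit inputs
 * do not wrap before the comparison.
 */
bool topology_is_valid(unsigned sockets, unsigned dies,
                       unsigned cores, unsigned threads,
                       unsigned max_cpus)
{
    unsigned long long total =
        (unsigned long long)sockets * dies * cores * threads;

    return total == max_cpus;
}
```

Under this sketch, a topology of 1 socket, 1 die, 2 cores and 2 threads with maxcpus=8 is rejected, while the proposed `>` comparison would have accepted it.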
[PATCH RESEND 1/2] target/i386: add "-cpu, lbr-fmt=*" support to enable guest LBR
The last branch recording (LBR) is a performance monitor unit (PMU) feature on Intel processors that records a running trace of the most recent branches taken by the processor in the LBR stack. The QEMU could configure whether it's enabled or not for each guest via CLI. The LBR feature would be enabled on the guest if: - the KVM is enabled and the PMU is enabled and, - the msr-based-feature IA32_PERF_CAPABILITIES is supported on KVM and, - the supported returned value for lbr_fmt from this msr is not zero and, - the requested guest vcpu model does support FEAT_1_ECX.CPUID_EXT_PDCM, - the configured lbr-fmt value is the same as the host lbr_fmt value or use the QEMU option "-cpu host,migratable=no". Cc: Eduardo Habkost Cc: Paolo Bonzini Signed-off-by: Like Xu --- target/i386/cpu.c | 16 target/i386/cpu.h | 10 ++ target/i386/kvm/kvm.c | 5 +++-- 3 files changed, 29 insertions(+), 2 deletions(-) diff --git a/target/i386/cpu.c b/target/i386/cpu.c index ad99cad0e7..eee6da3ad8 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -6627,6 +6627,13 @@ static void x86_cpu_filter_features(X86CPU *cpu, bool verbose) x86_cpu_get_supported_feature_word(w, false); uint64_t requested_features = env->features[w]; uint64_t unavailable_features = requested_features & ~host_feat; +if (kvm_enabled() && w == FEAT_PERF_CAPABILITIES && +(requested_features & PERF_CAP_LBR_FMT)) { +if ((host_feat & PERF_CAP_LBR_FMT) != +(requested_features & PERF_CAP_LBR_FMT)) { +unavailable_features |= PERF_CAP_LBR_FMT; +} +} mark_unavailable_features(cpu, w, unavailable_features, prefix); } @@ -6734,6 +6741,14 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp) } } +if (cpu->lbr_fmt) { +if (!cpu->enable_pmu) { +error_setg(errp, "LBR is unsupported since guest PMU is disabled."); +return; +} +env->features[FEAT_PERF_CAPABILITIES] |= cpu->lbr_fmt; +} + /* mwait extended info: needed for Core compatibility */ /* We always wake on interrupt even if host does not have the capability */
cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE; @@ -7300,6 +7315,7 @@ static Property x86_cpu_properties[] = { #endif DEFINE_PROP_INT32("node-id", X86CPU, node_id, CPU_UNSET_NUMA_NODE_ID), DEFINE_PROP_BOOL("pmu", X86CPU, enable_pmu, false), +DEFINE_PROP_UINT8("lbr-fmt", X86CPU, lbr_fmt, 0), DEFINE_PROP_UINT32("hv-spinlocks", X86CPU, hyperv_spinlock_attempts, HYPERV_SPINLOCK_NEVER_NOTIFY), diff --git a/target/i386/cpu.h b/target/i386/cpu.h index 570f916878..b12c879fc4 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -354,6 +354,7 @@ typedef enum X86Seg { #define ARCH_CAP_TSX_CTRL_MSR (1<<7) #define MSR_IA32_PERF_CAPABILITIES 0x345 +#define PERF_CAP_LBR_FMT 0x3f #define MSR_IA32_TSX_CTRL 0x122 #define MSR_IA32_TSCDEADLINE0x6e0 @@ -1726,6 +1727,15 @@ struct X86CPU { */ bool enable_pmu; +/* + * Configure LBR_FMT bits on IA32_PERF_CAPABILITIES MSR. + * This can't be enabled by default yet because it doesn't have + * ABI stability guarantees, as it is only allowed to pass all + * LBR_FMT bits returned by kvm_arch_get_supported_msr_feature() + * (that depends on host CPU and kernel capabilities) to the guest. + */ +uint8_t lbr_fmt; + /* LMCE support can be enabled/disabled via cpu option 'lmce=on/off'. It is * disabled by default to avoid breaking migration between QEMU with * different LMCE configurations. diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index 7fe9f52710..4d842d32a6 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -2732,8 +2732,9 @@ static void kvm_msr_entry_add_perf(X86CPU *cpu, FeatureWordArray f) MSR_IA32_PERF_CAPABILITIES); if (kvm_perf_cap) { -kvm_msr_entry_add(cpu, MSR_IA32_PERF_CAPABILITIES, -kvm_perf_cap & f[FEAT_PERF_CAPABILITIES]); +kvm_perf_cap = cpu->migratable ? +(kvm_perf_cap & f[FEAT_PERF_CAPABILITIES]) : kvm_perf_cap; +kvm_msr_entry_add(cpu, MSR_IA32_PERF_CAPABILITIES, kvm_perf_cap); } } -- 2.30.2
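The enabling conditions listed in the commit message of this patch can be read as a single predicate. The sketch below is a simplified model of that list only; `struct lbr_request` and `guest_lbr_enabled` are hypothetical names, and it does not reflect the later review discussion about whether "migratable" should affect validation.

```c
#include <stdbool.h>
#include <stdint.h>

#define PERF_CAP_LBR_FMT 0x3f   /* same mask value as in the patch */

/* Hypothetical container for the inputs named in the commit message. */
struct lbr_request {
    bool kvm_enabled;        /* KVM is in use */
    bool pmu_enabled;        /* guest PMU is enabled */
    bool pdcm;               /* CPU model has FEAT_1_ECX.CPUID_EXT_PDCM */
    bool migratable;         /* "-cpu host,migratable=..." */
    uint64_t host_perf_cap;  /* IA32_PERF_CAPABILITIES as reported by KVM */
    uint8_t lbr_fmt;         /* value of the "lbr-fmt" property */
};

/* Returns true when every condition from the commit message holds. */
bool guest_lbr_enabled(const struct lbr_request *r)
{
    uint64_t host_fmt = r->host_perf_cap & PERF_CAP_LBR_FMT;

    if (!r->kvm_enabled || !r->pmu_enabled || !r->pdcm) {
        return false;
    }
    if (host_fmt == 0) {
        return false;        /* host reports no LBR format */
    }
    if (!r->migratable) {
        return true;         /* the "migratable=no" alternative */
    }
    return (r->lbr_fmt & PERF_CAP_LBR_FMT) == host_fmt;
}
```

On a host whose LBR format is 5, this model enables guest LBR for lbr-fmt=5 and disables it for lbr-fmt=6 unless migratable=no is given.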
[PATCH RESEND 2/2] target/i386: add kvm_exact_match_flags to FeatureWordInfo
Instead of hardcoding the PERF_CAPABILITIES rules in this loop, this could become a FeatureWordInfo field. It would be very useful for other features like intel-pt, where we need some bits to match the host bits too. Suggested-by: Eduardo Habkost Signed-off-by: Like Xu --- target/i386/cpu.c | 21 +++-- 1 file changed, 15 insertions(+), 6 deletions(-) diff --git a/target/i386/cpu.c b/target/i386/cpu.c index eee6da3ad8..56a486b498 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -708,6 +708,8 @@ typedef struct FeatureWordInfo { uint64_t migratable_flags; /* Feature flags known to be migratable */ /* Features that shouldn't be auto-enabled by "-cpu host" */ uint64_t no_autoenable_flags; +/* Bits that must match host exactly when using KVM */ +uint64_t kvm_exact_match_flags; } FeatureWordInfo; static FeatureWordInfo feature_word_info[FEATURE_WORDS] = { @@ -1147,6 +1149,11 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = { .msr = { .index = MSR_IA32_PERF_CAPABILITIES, }, +/* + * KVM is not able to emulate a VCPU with LBR_FMT different + * from the host, so LBR_FMT must match the host exactly. 
+ */ +.kvm_exact_match_flags = PERF_CAP_LBR_FMT, }, [FEAT_VMX_PROCBASED_CTLS] = { @@ -6623,16 +6630,18 @@ static void x86_cpu_filter_features(X86CPU *cpu, bool verbose) } for (w = 0; w < FEATURE_WORDS; w++) { +FeatureWordInfo *fi = &feature_word_info[w]; +uint64_t match_flags = fi->kvm_exact_match_flags; uint64_t host_feat = x86_cpu_get_supported_feature_word(w, false); uint64_t requested_features = env->features[w]; uint64_t unavailable_features = requested_features & ~host_feat; -if (kvm_enabled() && w == FEAT_PERF_CAPABILITIES && -(requested_features & PERF_CAP_LBR_FMT)) { -if ((host_feat & PERF_CAP_LBR_FMT) != -(requested_features & PERF_CAP_LBR_FMT)) { -unavailable_features |= PERF_CAP_LBR_FMT; -} +if (kvm_enabled() && match_flags) { +uint64_t mismatches = (requested_features & match_flags) && +(requested_features ^ host_feat) & match_flags; +mark_unavailable_features(cpu, w, +mismatches, "feature doesn't match host"); +unavailable_features &= ~match_flags; } mark_unavailable_features(cpu, w, unavailable_features, prefix); } -- 2.30.2
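As an illustration of what an "exact match" check is meant to produce, here is a minimal standalone sketch that returns the mismatching bits as a mask. The helper name `exact_match_mismatches` is hypothetical. Note that in C the `&&` operator evaluates to 0 or 1, so this sketch uses an explicit branch to keep the result a bit mask; whether that matches the intent of the hunk above is an assumption.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not QEMU code): returns the bits inside
 * match_flags that were requested as non-zero but differ from the host.
 * A request of all-zero bits inside match_flags is tolerated and
 * reported as "no mismatch".
 */
uint64_t exact_match_mismatches(uint64_t requested, uint64_t host,
                                uint64_t match_flags)
{
    if ((requested & match_flags) == 0) {
        return 0;
    }
    return (requested ^ host) & match_flags;
}
```

With match_flags set to 0x3f, a request of 6 against a host value of 5 yields 0x3, the two low bits that differ, which could then be handed to a reporting routine.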
Re: [RESEND][BUG FIX HELP] QEMU main thread endlessly hangs in __ppoll()
Hi John, Thanks for your comment. On 2021/3/5 7:53, John Snow wrote: On 2/28/21 9:39 PM, Like Xu wrote: Hi Genius, I am a user of QEMU v4.2.0 and stuck on an interesting bug, which may still exist in the mainline. Thanks in advance to heroes who can take a look and share understanding. Do you have a test case that reproduces on 5.2? It'd be nice to know if it was still a problem in the latest source tree or not. We narrowed down the source of the bug, which basically came from the following qmp usage: {'execute': 'human-monitor-command', 'arguments':{ 'command-line': 'drive_del replication0' } } One of the test cases is the COLO usage (docs/colo-proxy.txt). This issue is sporadic; the probability may be 1/15 for an I/O-heavy guest. I believe it's reproducible on 5.2 and the latest tree. --js The qemu main thread endlessly hangs while handling the qmp statement: {'execute': 'human-monitor-command', 'arguments':{ 'command-line': 'drive_del replication0' } } and the call trace looks like: #0 0x7f3c22045bf6 in __ppoll (fds=0x555611328410, nfds=1, timeout=, timeout@entry=0x7ffc56c66db0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44 #1 0x55561021f415 in ppoll (__ss=0x0, __timeout=0x7ffc56c66db0, __nfds=, __fds=) at /usr/include/x86_64-linux-gnu/bits/poll2.h:77 #2 qemu_poll_ns (fds=, nfds=, timeout=) at util/qemu-timer.c:348 #3 0x555610221430 in aio_poll (ctx=ctx@entry=0x5556113010f0, blocking=blocking@entry=true) at util/aio-posix.c:669 #4 0x55561019268d in bdrv_do_drained_begin (poll=true, ignore_bds_parents=false, parent=0x0, recursive=false, bs=0x55561138b0a0) at block/io.c:430 #5 bdrv_do_drained_begin (bs=0x55561138b0a0, recursive=, parent=0x0, ignore_bds_parents=, poll=) at block/io.c:396 #6 0x55561017b60b in quorum_del_child (bs=0x55561138b0a0, child=0x7f36dc0ce380, errp=) at block/quorum.c:1063 #7 0x55560ff5836b in qmp_x_blockdev_change (parent=0x555612373120 "colo-disk0", has_child=, child=0x5556112df3e0 "children.1", has_node=,
node=0x0, errp=0x7ffc56c66f98) at blockdev.c:4494 #8 0x5556100f8f57 in qmp_marshal_x_blockdev_change (args=<optimized out>, ret=, errp=0x7ffc56c67018) at qapi/qapi-commands-block-core.c:1538 #9 0x5556101d8290 in do_qmp_dispatch (errp=0x7ffc56c67010, allow_oob=, request=, cmds=0x5556109c69a0 ) at qapi/qmp-dispatch.c:132 #10 qmp_dispatch (cmds=0x5556109c69a0 , request=<optimized out>, allow_oob=) at qapi/qmp-dispatch.c:175 #11 0x5556100d4c4d in monitor_qmp_dispatch (mon=0x5556113a6f40, req=) at monitor/qmp.c:145 #12 0x5556100d5437 in monitor_qmp_bh_dispatcher (data=<optimized out>) at monitor/qmp.c:234 #13 0x55561021dbec in aio_bh_call (bh=0x5556112164b0) at util/async.c:117 #14 aio_bh_poll (ctx=ctx@entry=0x5556112151b0) at util/async.c:117 #15 0x5556102212c4 in aio_dispatch (ctx=0x5556112151b0) at util/aio-posix.c:459 #16 0x55561021dab2 in aio_ctx_dispatch (source=, callback=, user_data=) at util/async.c:260 #17 0x7f3c22302fbd in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0 #18 0x555610220358 in glib_pollfds_poll () at util/main-loop.c:219 #19 os_host_main_loop_wait (timeout=) at util/main-loop.c:242 #20 main_loop_wait (nonblocking=) at util/main-loop.c:518 #21 0x55560ff600fe in main_loop () at vl.c:1814 #22 0x55560fddbce9 in main (argc=, argv=<optimized out>, envp=) at vl.c:4503 We found that we loop endlessly at this line in block/io.c:bdrv_do_drained_begin(): BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, recursive, parent)); and it turns out that bdrv_drain_poll() always gets true from: - bdrv_parent_drained_poll(bs, ignore_parent, ignore_bds_parents) - AND atomic_read(&bs->in_flight) I personally think this is a deadlock issue in the QEMU block layer (as we know, we have some #FIXME comments in the related code, such as the block permission update). Any comments are welcome and appreciated. --- thx,likexu
Re: [PATCH v2 1/2] target/i386: add "-cpu, lbr-fmt=*" support to enable guest LBR
Hi Paolo & Eduardo, Do you have any comments on the QEMU LBR enabling patches? https://lore.kernel.org/qemu-devel/20210201045453.240258-1-like...@linux.intel.com/ On 2021/2/1 12:54, Like Xu wrote: The last branch recording (LBR) is a performance monitor unit (PMU) feature on Intel processors that records a running trace of the most recent branches taken by the processor in the LBR stack. The QEMU could configure whether it's enabled or not for each guest via CLI. The LBR feature would be enabled on the guest if: - the KVM is enabled and the PMU is enabled and, - the msr-based-feature IA32_PERF_CAPABILITIES is supported on KVM and, - the supported returned value for lbr_fmt from this msr is not zero and, - the requested guest vcpu model does support FEAT_1_ECX.CPUID_EXT_PDCM, - the configured lbr-fmt value is the same as the host lbr_fmt value or use the QEMU option "-cpu host,migratable=no". Cc: Eduardo Habkost Cc: Paolo Bonzini Signed-off-by: Like Xu --- target/i386/cpu.c | 16 target/i386/cpu.h | 10 ++ target/i386/kvm/kvm.c | 5 +++-- 3 files changed, 29 insertions(+), 2 deletions(-) diff --git a/target/i386/cpu.c b/target/i386/cpu.c index ae89024d36..80a5d3f0c2 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -6504,6 +6504,13 @@ static void x86_cpu_filter_features(X86CPU *cpu, bool verbose) x86_cpu_get_supported_feature_word(w, false); uint64_t requested_features = env->features[w]; uint64_t unavailable_features = requested_features & ~host_feat; +if (kvm_enabled() && w == FEAT_PERF_CAPABILITIES && +(requested_features & PERF_CAP_LBR_FMT)) { +if ((host_feat & PERF_CAP_LBR_FMT) != +(requested_features & PERF_CAP_LBR_FMT)) { +unavailable_features |= PERF_CAP_LBR_FMT; +} +} mark_unavailable_features(cpu, w, unavailable_features, prefix); } @@ -6611,6 +6618,14 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp) } } +if (cpu->lbr_fmt) { +if (!cpu->enable_pmu) { +error_setg(errp, "LBR is unsupported since guest PMU is disabled."); +return; +}
+env->features[FEAT_PERF_CAPABILITIES] |= cpu->lbr_fmt; +} + /* mwait extended info: needed for Core compatibility */ /* We always wake on interrupt even if host does not have the capability */ cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE; @@ -7184,6 +7199,7 @@ static Property x86_cpu_properties[] = { #endif DEFINE_PROP_INT32("node-id", X86CPU, node_id, CPU_UNSET_NUMA_NODE_ID), DEFINE_PROP_BOOL("pmu", X86CPU, enable_pmu, false), +DEFINE_PROP_UINT8("lbr-fmt", X86CPU, lbr_fmt, 0), DEFINE_PROP_UINT32("hv-spinlocks", X86CPU, hyperv_spinlock_attempts, HYPERV_SPINLOCK_NEVER_NOTIFY), diff --git a/target/i386/cpu.h b/target/i386/cpu.h index d23a5b340a..64320bced2 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -354,6 +354,7 @@ typedef enum X86Seg { #define ARCH_CAP_TSX_CTRL_MSR (1<<7) #define MSR_IA32_PERF_CAPABILITIES 0x345 +#define PERF_CAP_LBR_FMT 0x3f #define MSR_IA32_TSX_CTRL 0x122 #define MSR_IA32_TSCDEADLINE0x6e0 @@ -1709,6 +1710,15 @@ struct X86CPU { */ bool enable_pmu; +/* + * Configure LBR_FMT bits on IA32_PERF_CAPABILITIES MSR. + * This can't be enabled by default yet because it doesn't have + * ABI stability guarantees, as it is only allowed to pass all + * LBR_FMT bits returned by kvm_arch_get_supported_msr_feature() + * (that depends on host CPU and kernel capabilities) to the guest. + */ +uint8_t lbr_fmt; + /* LMCE support can be enabled/disabled via cpu option 'lmce=on/off'. It is * disabled by default to avoid breaking migration between QEMU with * different LMCE configurations. diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index 6dc1ee052d..49745efb78 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -2705,8 +2705,9 @@ static void kvm_msr_entry_add_perf(X86CPU *cpu, FeatureWordArray f) MSR_IA32_PERF_CAPABILITIES); if (kvm_perf_cap) { -kvm_msr_entry_add(cpu, MSR_IA32_PERF_CAPABILITIES, -kvm_perf_cap & f[FEAT_PERF_CAPABILITIES]); +kvm_perf_cap = cpu->migratable ? 
+(kvm_perf_cap & f[FEAT_PERF_CAPABILITIES]) : kvm_perf_cap; +kvm_msr_entry_add(cpu, MSR_IA32_PERF_CAPABILITIES, kvm_perf_cap); } }
[RESEND][BUG FIX HELP] QEMU main thread endlessly hangs in __ppoll()
Hi Genius, I am a user of QEMU v4.2.0 and stuck on an interesting bug, which may still exist in the mainline. Thanks in advance to heroes who can take a look and share understanding. The qemu main thread endlessly hangs while handling the qmp statement: {'execute': 'human-monitor-command', 'arguments':{ 'command-line': 'drive_del replication0' } } and the call trace looks like: #0 0x7f3c22045bf6 in __ppoll (fds=0x555611328410, nfds=1, timeout=, timeout@entry=0x7ffc56c66db0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44 #1 0x55561021f415 in ppoll (__ss=0x0, __timeout=0x7ffc56c66db0, __nfds=, __fds=) at /usr/include/x86_64-linux-gnu/bits/poll2.h:77 #2 qemu_poll_ns (fds=, nfds=, timeout=) at util/qemu-timer.c:348 #3 0x555610221430 in aio_poll (ctx=ctx@entry=0x5556113010f0, blocking=blocking@entry=true) at util/aio-posix.c:669 #4 0x55561019268d in bdrv_do_drained_begin (poll=true, ignore_bds_parents=false, parent=0x0, recursive=false, bs=0x55561138b0a0) at block/io.c:430 #5 bdrv_do_drained_begin (bs=0x55561138b0a0, recursive=, parent=0x0, ignore_bds_parents=, poll=) at block/io.c:396 #6 0x55561017b60b in quorum_del_child (bs=0x55561138b0a0, child=0x7f36dc0ce380, errp=) at block/quorum.c:1063 #7 0x55560ff5836b in qmp_x_blockdev_change (parent=0x555612373120 "colo-disk0", has_child=, child=0x5556112df3e0 "children.1", has_node=, node=0x0, errp=0x7ffc56c66f98) at blockdev.c:4494 #8 0x5556100f8f57 in qmp_marshal_x_blockdev_change (args=<optimized out>, ret=, errp=0x7ffc56c67018) at qapi/qapi-commands-block-core.c:1538 #9 0x5556101d8290 in do_qmp_dispatch (errp=0x7ffc56c67010, allow_oob=, request=, cmds=0x5556109c69a0 ) at qapi/qmp-dispatch.c:132 #10 qmp_dispatch (cmds=0x5556109c69a0 , request=<optimized out>, allow_oob=) at qapi/qmp-dispatch.c:175 #11 0x5556100d4c4d in monitor_qmp_dispatch (mon=0x5556113a6f40, req=) at monitor/qmp.c:145 #12 0x5556100d5437 in monitor_qmp_bh_dispatcher (data=) at monitor/qmp.c:234 #13 0x55561021dbec in aio_bh_call
(bh=0x5556112164b0) at util/async.c:117 #14 aio_bh_poll (ctx=ctx@entry=0x5556112151b0) at util/async.c:117 #15 0x5556102212c4 in aio_dispatch (ctx=0x5556112151b0) at util/aio-posix.c:459 #16 0x55561021dab2 in aio_ctx_dispatch (source=, callback=, user_data=) at util/async.c:260 #17 0x7f3c22302fbd in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0 #18 0x555610220358 in glib_pollfds_poll () at util/main-loop.c:219 #19 os_host_main_loop_wait (timeout=) at util/main-loop.c:242 #20 main_loop_wait (nonblocking=) at util/main-loop.c:518 #21 0x55560ff600fe in main_loop () at vl.c:1814 #22 0x55560fddbce9 in main (argc=, argv=, envp=) at vl.c:4503 We found that we loop endlessly at this line in block/io.c:bdrv_do_drained_begin(): BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, recursive, parent)); and it turns out that bdrv_drain_poll() always gets true from: - bdrv_parent_drained_poll(bs, ignore_parent, ignore_bds_parents) - AND atomic_read(&bs->in_flight) I personally think this is a deadlock issue in the QEMU block layer (as we know, we have some #FIXME comments in the related code, such as the block permission update). Any comments are welcome and appreciated. --- thx,likexu
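To make the reported symptom easier to reason about, the polling loop can be modelled with a toy example. Everything below is hypothetical and heavily simplified: `struct toy_bs`, `toy_drain_poll` and `toy_drained_begin` are not QEMU APIs, the two poll conditions are combined with a logical OR as an assumption, and an iteration cap stands in for what would otherwise be an endless wait.

```c
#include <stdbool.h>

/* Toy stand-in for a block node; not the real BlockDriverState. */
struct toy_bs {
    unsigned in_flight;   /* requests that have not completed yet */
    bool parent_busy;     /* a parent still reports pending activity */
};

/* Assumed shape of the poll condition: keep waiting while anything is busy. */
static bool toy_drain_poll(const struct toy_bs *bs)
{
    return bs->parent_busy || bs->in_flight > 0;
}

/*
 * Polls until the node is quiescent or max_iters is reached.
 * When requests_can_complete is false, nothing ever decrements
 * in_flight, which models the hang described in the report.
 * Returns true if the drain finished.
 */
bool toy_drained_begin(struct toy_bs *bs, bool requests_can_complete,
                       unsigned max_iters)
{
    for (unsigned i = 0; i < max_iters; i++) {
        if (!toy_drain_poll(bs)) {
            return true;
        }
        if (requests_can_complete && bs->in_flight > 0) {
            bs->in_flight--;   /* one request completes per iteration */
        }
    }
    return !toy_drain_poll(bs);
}
```

In this model the drain only terminates if some other code path is able to complete the outstanding requests while the loop is polling, which is the property the report suspects is violated.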
[PATCH v2 1/2] target/i386: add "-cpu, lbr-fmt=*" support to enable guest LBR
The last branch recording (LBR) is a performance monitor unit (PMU) feature on Intel processors that records a running trace of the most recent branches taken by the processor in the LBR stack. The QEMU could configure whether it's enabled or not for each guest via CLI. The LBR feature would be enabled on the guest if: - the KVM is enabled and the PMU is enabled and, - the msr-based-feature IA32_PERF_CAPABILITIES is supported on KVM and, - the supported returned value for lbr_fmt from this msr is not zero and, - the requested guest vcpu model does support FEAT_1_ECX.CPUID_EXT_PDCM, - the configured lbr-fmt value is the same as the host lbr_fmt value or use the QEMU option "-cpu host,migratable=no". Cc: Eduardo Habkost Cc: Paolo Bonzini Signed-off-by: Like Xu --- target/i386/cpu.c | 16 target/i386/cpu.h | 10 ++ target/i386/kvm/kvm.c | 5 +++-- 3 files changed, 29 insertions(+), 2 deletions(-) diff --git a/target/i386/cpu.c b/target/i386/cpu.c index ae89024d36..80a5d3f0c2 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -6504,6 +6504,13 @@ static void x86_cpu_filter_features(X86CPU *cpu, bool verbose) x86_cpu_get_supported_feature_word(w, false); uint64_t requested_features = env->features[w]; uint64_t unavailable_features = requested_features & ~host_feat; +if (kvm_enabled() && w == FEAT_PERF_CAPABILITIES && +(requested_features & PERF_CAP_LBR_FMT)) { +if ((host_feat & PERF_CAP_LBR_FMT) != +(requested_features & PERF_CAP_LBR_FMT)) { +unavailable_features |= PERF_CAP_LBR_FMT; +} +} mark_unavailable_features(cpu, w, unavailable_features, prefix); } @@ -6611,6 +6618,14 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp) } } +if (cpu->lbr_fmt) { +if (!cpu->enable_pmu) { +error_setg(errp, "LBR is unsupported since guest PMU is disabled."); +return; +} +env->features[FEAT_PERF_CAPABILITIES] |= cpu->lbr_fmt; +} + /* mwait extended info: needed for Core compatibility */ /* We always wake on interrupt even if host does not have the capability */
cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE; @@ -7184,6 +7199,7 @@ static Property x86_cpu_properties[] = { #endif DEFINE_PROP_INT32("node-id", X86CPU, node_id, CPU_UNSET_NUMA_NODE_ID), DEFINE_PROP_BOOL("pmu", X86CPU, enable_pmu, false), +DEFINE_PROP_UINT8("lbr-fmt", X86CPU, lbr_fmt, 0), DEFINE_PROP_UINT32("hv-spinlocks", X86CPU, hyperv_spinlock_attempts, HYPERV_SPINLOCK_NEVER_NOTIFY), diff --git a/target/i386/cpu.h b/target/i386/cpu.h index d23a5b340a..64320bced2 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -354,6 +354,7 @@ typedef enum X86Seg { #define ARCH_CAP_TSX_CTRL_MSR (1<<7) #define MSR_IA32_PERF_CAPABILITIES 0x345 +#define PERF_CAP_LBR_FMT 0x3f #define MSR_IA32_TSX_CTRL 0x122 #define MSR_IA32_TSCDEADLINE0x6e0 @@ -1709,6 +1710,15 @@ struct X86CPU { */ bool enable_pmu; +/* + * Configure LBR_FMT bits on IA32_PERF_CAPABILITIES MSR. + * This can't be enabled by default yet because it doesn't have + * ABI stability guarantees, as it is only allowed to pass all + * LBR_FMT bits returned by kvm_arch_get_supported_msr_feature() + * (that depends on host CPU and kernel capabilities) to the guest. + */ +uint8_t lbr_fmt; + /* LMCE support can be enabled/disabled via cpu option 'lmce=on/off'. It is * disabled by default to avoid breaking migration between QEMU with * different LMCE configurations. diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index 6dc1ee052d..49745efb78 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -2705,8 +2705,9 @@ static void kvm_msr_entry_add_perf(X86CPU *cpu, FeatureWordArray f) MSR_IA32_PERF_CAPABILITIES); if (kvm_perf_cap) { -kvm_msr_entry_add(cpu, MSR_IA32_PERF_CAPABILITIES, -kvm_perf_cap & f[FEAT_PERF_CAPABILITIES]); +kvm_perf_cap = cpu->migratable ? +(kvm_perf_cap & f[FEAT_PERF_CAPABILITIES]) : kvm_perf_cap; +kvm_msr_entry_add(cpu, MSR_IA32_PERF_CAPABILITIES, kvm_perf_cap); } } -- 2.29.2
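The kvm.c hunk at the end of this patch changes which value is written to MSR_IA32_PERF_CAPABILITIES. A minimal sketch of that selection, pulled out into a pure function, might look as follows; `perf_cap_for_guest` is a hypothetical name, and returning 0 for "no MSR entry is added" is a simplification made for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the ternary in the hunk above:
 * - if KVM reports no capabilities, nothing is written (modelled as 0);
 * - if the CPU is migratable, only the requested bits that KVM also
 *   supports are passed to the guest;
 * - otherwise the full KVM-supported value is passed through.
 */
uint64_t perf_cap_for_guest(uint64_t kvm_perf_cap, uint64_t requested,
                            bool migratable)
{
    if (kvm_perf_cap == 0) {
        return 0;
    }
    return migratable ? (kvm_perf_cap & requested) : kvm_perf_cap;
}
```

For example, with a KVM value of 0x45 and a requested value of 0x05, the migratable case yields 0x05 and the non-migratable case yields 0x45.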
[PATCH v2 2/2] target/i386: add kvm_exact_match_flags to FeatureWordInfo
Eduardo has a suggestion: instead of hardcoding the PERF_CAPABILITIES
rules in this loop, this could become a FeatureWordInfo field. It would
be very useful for other features like intel-pt, where we need some bits
to match the host too.

Suggested-by: Eduardo Habkost
Signed-off-by: Like Xu
---
 target/i386/cpu.c | 21 +++--
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 80a5d3f0c2..8eaa5879ea 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -708,6 +708,8 @@ typedef struct FeatureWordInfo {
     uint64_t migratable_flags; /* Feature flags known to be migratable */
     /* Features that shouldn't be auto-enabled by "-cpu host" */
     uint64_t no_autoenable_flags;
+    /* Bits that must match host exactly when using KVM */
+    uint64_t kvm_exact_match_flags;
 } FeatureWordInfo;
 static FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
@@ -1147,6 +1149,11 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
         .msr = {
             .index = MSR_IA32_PERF_CAPABILITIES,
         },
+        /*
+         * KVM is not able to emulate a VCPU with LBR_FMT different
+         * from the host, so LBR_FMT must match the host exactly.
+         */
+        .kvm_exact_match_flags = PERF_CAP_LBR_FMT,
     },
     [FEAT_VMX_PROCBASED_CTLS] = {
@@ -6500,16 +6507,18 @@ static void x86_cpu_filter_features(X86CPU *cpu, bool verbose)
     }
     for (w = 0; w < FEATURE_WORDS; w++) {
+        FeatureWordInfo *fi = &feature_word_info[w];
+        uint64_t match_flags = fi->kvm_exact_match_flags;
         uint64_t host_feat =
             x86_cpu_get_supported_feature_word(w, false);
         uint64_t requested_features = env->features[w];
         uint64_t unavailable_features = requested_features & ~host_feat;
-        if (kvm_enabled() && w == FEAT_PERF_CAPABILITIES &&
-            (requested_features & PERF_CAP_LBR_FMT)) {
-            if ((host_feat & PERF_CAP_LBR_FMT) !=
-                (requested_features & PERF_CAP_LBR_FMT)) {
-                unavailable_features |= PERF_CAP_LBR_FMT;
-            }
+        if (kvm_enabled() && match_flags) {
+            uint64_t mismatches = (requested_features & match_flags) &&
+                (requested_features ^ host_feat) & match_flags;
+            mark_unavailable_features(cpu, w,
+                mismatches, "feature doesn't match host");
+            unavailable_features &= ~match_flags;
         }
         mark_unavailable_features(cpu, w, unavailable_features, prefix);
     }
--
2.29.2
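One detail in the loop above may deserve a second look: in C, `&&` evaluates to 0 or 1, so an expression of the form `(a & m) && (b & m)` produces a truth value and not a bit mask. The following is only a sketch of how a mismatch mask could be computed so that the full set of differing bits is kept. The function name exact_match_mismatches is hypothetical and does not exist in QEMU.

```c
#include <stdint.h>

/*
 * Hypothetical helper: return the bits inside match_flags where the
 * requested value differs from the host value. If the guest requested
 * nothing inside match_flags, no mismatch is reported.
 */
static uint64_t exact_match_mismatches(uint64_t requested,
                                       uint64_t host,
                                       uint64_t match_flags)
{
    if ((requested & match_flags) == 0) {
        return 0;
    }
    /* XOR marks differing bits; the mask limits them to the field. */
    return (requested ^ host) & match_flags;
}
```

For example, with match_flags 0x3f, a requested value of 0x05 and a host value of 0x04, this returns 0x01, and a requested value equal to the host value returns 0.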
Re: [PATCH 4/5 v4] KVM: VMX: Fill in conforming vmx_x86_ops via macro
Hi Krish,

On 2020/11/10 9:23, Krish Sadhukhan wrote:

@@ -1192,7 +1192,7 @@ void vmx_set_host_fs_gs(struct vmcs_host_state *host, u16 fs_sel, u16 gs_sel,
         }
 }
-void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
+void vmx_prepare_guest_switch(struct kvm_vcpu *vcpu)

What do you think of renaming it to

void vmx_prepare_switch_for_guest(struct kvm_vcpu *vcpu);

?

Thanks,
Like Xu

 {
         struct vcpu_vmx *vmx = to_vmx(vcpu);
         struct vmcs_host_state *host_state;
@@ -311,7 +311,7 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
 int allocate_vpid(void);
 void free_vpid(int vpid);
 void vmx_set_constant_host_state(struct vcpu_vmx *vmx);
-void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu);
+void vmx_prepare_guest_switch(struct kvm_vcpu *vcpu);
 void vmx_set_host_fs_gs(struct vmcs_host_state *host, u16 fs_sel, u16 gs_sel,
                         unsigned long fs_base, unsigned long gs_base);
 int vmx_get_cpl(struct kvm_vcpu *vcpu);
Re: [Qemu-devel PATCH v2] target/i386: add "-cpu,lbr-fmt=*" support to enable guest LBR
Hi Eduardo, On 2020/9/30 1:38, Eduardo Habkost wrote: (CCing the people from the thread, as kvm_exact_match_flags would be useful for INTEL_PT_IP_LIP) On Tue, Sep 29, 2020 at 02:12:17PM +0800, Like Xu wrote: The last branch recording (LBR) is a performance monitor unit (PMU) feature on Intel processors that records a running trace of the most recent branches taken by the processor in the LBR stack. The QEMU could configure whether it's enabled or not for each guest via CLI. The LBR feature would be enabled on the guest if: - the KVM is enabled and the PMU is enabled and, - the msr-based-feature IA32_PERF_CAPABILITIES is supporterd on KVM and, - the supported returned value for lbr_fmt from this msr is not zero and, - the requested guest vcpu model does support FEAT_1_ECX.CPUID_EXT_PDCM, - the configured lbr-fmt value is the same as the host lbr_fmt value. Cc: Eduardo Habkost Cc: Paolo Bonzini Signed-off-by: Like Xu The approach below looks better, thanks! Only one problem below, with a few suggestions and questions: --- target/i386/cpu.c | 16 target/i386/cpu.h | 10 ++ 2 files changed, 26 insertions(+) diff --git a/target/i386/cpu.c b/target/i386/cpu.c index 3ffd877dd5..b10344be01 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -6461,6 +6461,13 @@ static void x86_cpu_filter_features(X86CPU *cpu, bool verbose) x86_cpu_get_supported_feature_word(w, false); uint64_t requested_features = env->features[w]; uint64_t unavailable_features = requested_features & ~host_feat; +if (w == FEAT_PERF_CAPABILITIES && +(requested_features & PERF_CAP_LBR_FMT)) { +if ((host_feat & PERF_CAP_LBR_FMT) != +(requested_features & PERF_CAP_LBR_FMT)) { +unavailable_features |= PERF_CAP_LBR_FMT; +} +} This looks correct, but needs to be conditional on kvm_enabled(). I also have a suggestion: instead of hardcoding the PERF_CAPABILITIES rules in this loop, this could become a FeatureWordInfo field. 
It would be very useful for other features like intel-pt, where we need some bits to match the host too. The idea looks good to me. Could you please check if the following patch works? Signed-off-by: Eduardo Habkost --- diff --git a/target/i386/cpu.c b/target/i386/cpu.c index b10344be010..d4107dcc026 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -704,6 +704,8 @@ typedef struct FeatureWordInfo { uint64_t migratable_flags; /* Feature flags known to be migratable */ /* Features that shouldn't be auto-enabled by "-cpu host" */ uint64_t no_autoenable_flags; +/* Bits that must match host exactly when using KVM */ +uint64_t kvm_exact_match_flags; } FeatureWordInfo; static FeatureWordInfo feature_word_info[FEATURE_WORDS] = { @@ -1143,6 +1145,11 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = { .msr = { .index = MSR_IA32_PERF_CAPABILITIES, }, +/* + * KVM is not able to emulate a VCPU with LBR_FMT different + * from the host, so LBR_FMT must match the host exactly. + */ +.kvm_exact_match_flags = PERF_CAP_LBR_FMT, }, [FEAT_VMX_PROCBASED_CTLS] = { @@ -6457,16 +6464,15 @@ static void x86_cpu_filter_features(X86CPU *cpu, bool verbose) } for (w = 0; w < FEATURE_WORDS; w++) { +FeatureWordInfo *fi = &feature_word_info[w]; uint64_t host_feat = x86_cpu_get_supported_feature_word(w, false); uint64_t requested_features = env->features[w]; uint64_t unavailable_features = requested_features & ~host_feat; -if (w == FEAT_PERF_CAPABILITIES && -(requested_features & PERF_CAP_LBR_FMT)) { -if ((host_feat & PERF_CAP_LBR_FMT) != -(requested_features & PERF_CAP_LBR_FMT)) { -unavailable_features |= PERF_CAP_LBR_FMT; -} +if (kvm_enabled()) { +uint64_t mismatches = (requested_features ^ host_feat) & + fi->kvm_exact_match_flags; +mark_unavailable_features(cpu, w, mismatches, "feature doesn't match host"); } mark_unavailable_features(cpu, w, unavailable_features, prefix); } --- I may refine this part in this way: for (w = 0; w < FEATURE_WORDS; w++) { FeatureWordInfo *fi = 
&feature_word_info[w]; uint64_t match_flags = fi->kvm_exact_match_flags; uint64_t host_feat = x86_cpu_get_supported_feature_word(w, false); uint64_t requested_features = env->features[w]; uint64_t unavailable_features = requested_features & ~host_feat; if (kvm_enabled() && match_flags) {
[Qemu-devel PATCH v2] target/i386: add "-cpu, lbr-fmt=*" support to enable guest LBR
The last branch recording (LBR) is a performance monitoring unit (PMU)
feature on Intel processors that records a running trace of the most
recent branches taken by the processor in the LBR stack. QEMU can
configure whether it is enabled for each guest via the command line.

The LBR feature is enabled on the guest if:
- KVM is enabled and the PMU is enabled, and
- the MSR-based feature IA32_PERF_CAPABILITIES is supported by KVM, and
- the lbr_fmt value reported as supported by this MSR is not zero, and
- the requested guest vCPU model supports FEAT_1_ECX.CPUID_EXT_PDCM, and
- the configured lbr-fmt value is the same as the host lbr_fmt value.

Cc: Eduardo Habkost
Cc: Paolo Bonzini
Signed-off-by: Like Xu
---
 target/i386/cpu.c | 16
 target/i386/cpu.h | 10 ++
 2 files changed, 26 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 3ffd877dd5..b10344be01 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6461,6 +6461,13 @@ static void x86_cpu_filter_features(X86CPU *cpu, bool verbose)
             x86_cpu_get_supported_feature_word(w, false);
         uint64_t requested_features = env->features[w];
         uint64_t unavailable_features = requested_features & ~host_feat;
+        if (w == FEAT_PERF_CAPABILITIES &&
+            (requested_features & PERF_CAP_LBR_FMT)) {
+            if ((host_feat & PERF_CAP_LBR_FMT) !=
+                (requested_features & PERF_CAP_LBR_FMT)) {
+                unavailable_features |= PERF_CAP_LBR_FMT;
+            }
+        }
         mark_unavailable_features(cpu, w, unavailable_features, prefix);
     }
@@ -6533,6 +6540,14 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
         }
     }
+    if (cpu->lbr_fmt) {
+        if (!cpu->enable_pmu) {
+            error_setg(errp, "LBR is unsupported since guest PMU is disabled.");
+            return;
+        }
+        env->features[FEAT_PERF_CAPABILITIES] |= cpu->lbr_fmt;
+    }
+
     /* mwait extended info: needed for Core compatibility */
     /* We always wake on interrupt even if host does not have the capability */
     cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE;
@@ -7157,6 +7172,7 @@ static Property x86_cpu_properties[] = {
 #endif
     DEFINE_PROP_INT32("node-id", X86CPU, node_id, CPU_UNSET_NUMA_NODE_ID),
     DEFINE_PROP_BOOL("pmu", X86CPU, enable_pmu, false),
+    DEFINE_PROP_UINT8("lbr-fmt", X86CPU, lbr_fmt, 0),
     DEFINE_PROP_UINT32("hv-spinlocks", X86CPU, hyperv_spinlock_attempts,
                        HYPERV_SPINLOCK_NEVER_NOTIFY),
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index f519d2bfd4..c1cf8b7160 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -357,6 +357,7 @@ typedef enum X86Seg {
 #define ARCH_CAP_TSX_CTRL_MSR (1<<7)
 #define MSR_IA32_PERF_CAPABILITIES 0x345
+#define PERF_CAP_LBR_FMT 0x3f
 #define MSR_IA32_TSX_CTRL 0x122
 #define MSR_IA32_TSCDEADLINE 0x6e0
@@ -1701,6 +1702,15 @@ struct X86CPU {
      */
     bool enable_pmu;
+    /*
+     * Configure LBR_FMT bits on IA32_PERF_CAPABILITIES MSR.
+     * This can't be enabled by default yet because it doesn't have
+     * ABI stability guarantees, as it is only allowed to pass all
+     * LBR_FMT bits returned by kvm_arch_get_supported_msr_feature()
+     * (that depends on host CPU and kernel capabilities) to the guest.
+     */
+    uint8_t lbr_fmt;
+
     /* LMCE support can be enabled/disabled via cpu option 'lmce=on/off'. It is
      * disabled by default to avoid breaking migration between QEMU with
      * different LMCE configurations.
--
2.21.3
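The enabling conditions listed in the commit message can be read as a single predicate. The sketch below restates them in stand-alone C under stated assumptions: struct lbr_request and lbr_would_be_enabled are hypothetical names introduced only for illustration, and only the PERF_CAP_LBR_FMT mask value (0x3f) is taken from the patch. It is not how QEMU structures the checks, which are spread across feature filtering and realize.

```c
#include <stdbool.h>
#include <stdint.h>

#define PERF_CAP_LBR_FMT 0x3f  /* LBR format mask, value as in the patch */

/* Hypothetical bundle of the inputs named in the commit message. */
struct lbr_request {
    bool kvm_enabled;        /* guest runs under KVM */
    bool pmu_enabled;        /* "pmu" property is on */
    bool guest_has_pdcm;     /* vCPU model exposes CPUID.1:ECX.PDCM */
    uint64_t host_perf_cap;  /* IA32_PERF_CAPABILITIES reported by KVM */
    uint8_t lbr_fmt;         /* value of the "lbr-fmt" property */
};

static bool lbr_would_be_enabled(const struct lbr_request *r)
{
    uint64_t host_fmt = r->host_perf_cap & PERF_CAP_LBR_FMT;

    if (!r->kvm_enabled || !r->pmu_enabled || !r->guest_has_pdcm) {
        return false;
    }
    if (host_fmt == 0) {
        /* MSR unsupported, or no LBR format reported by the host. */
        return false;
    }
    /* The configured format must match the host format exactly. */
    return (r->lbr_fmt & PERF_CAP_LBR_FMT) == host_fmt;
}
```

A request with host_perf_cap 0x2005 and lbr_fmt 0x05 passes; changing lbr_fmt to 0x04 or turning the PMU off makes the predicate false.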
[PATCH] target/i386: add -cpu,lbr=true support to enable guest LBR
The LBR feature would be enabled on the guest if: - the KVM is enabled and the PMU is enabled and, - the msr-based-feature IA32_PERF_CAPABILITIES is supporterd and, - the supported returned value for lbr_fmt from this msr is not zero. The LBR feature would be disabled on the guest if: - the msr-based-feature IA32_PERF_CAPABILITIES is unsupporterd OR, - qemu set the IA32_PERF_CAPABILITIES msr feature without lbr_fmt values OR, - the requested guest vcpu model doesn't support PDCM. Cc: Paolo Bonzini Cc: Richard Henderson Cc: Eduardo Habkost Cc: "Michael S. Tsirkin" Cc: Marcel Apfelbaum Cc: Marcelo Tosatti Cc: qemu-devel@nongnu.org Signed-off-by: Like Xu --- hw/i386/pc.c | 1 + target/i386/cpu.c | 24 ++-- target/i386/cpu.h | 2 ++ target/i386/kvm.c | 7 ++- 4 files changed, 31 insertions(+), 3 deletions(-) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 3d419d5991..857aff75bb 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -318,6 +318,7 @@ GlobalProperty pc_compat_1_5[] = { { "Nehalem-" TYPE_X86_CPU, "min-level", "2" }, { "virtio-net-pci", "any_layout", "off" }, { TYPE_X86_CPU, "pmu", "on" }, +{ TYPE_X86_CPU, "lbr", "on" }, { "i440FX-pcihost", "short_root_bus", "0" }, { "q35-pcihost", "short_root_bus", "0" }, }; diff --git a/target/i386/cpu.c b/target/i386/cpu.c index 588f32e136..c803994887 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -1142,8 +1142,8 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = { [FEAT_PERF_CAPABILITIES] = { .type = MSR_FEATURE_WORD, .feat_names = { -NULL, NULL, NULL, NULL, -NULL, NULL, NULL, NULL, +"lbr-fmt-bit-0", "lbr-fmt-bit-1", "lbr-fmt-bit-2", "lbr-fmt-bit-3", +"lbr-fmt-bit-4", "lbr-fmt-bit-5", NULL, NULL, NULL, NULL, NULL, NULL, NULL, "full-width-write", NULL, NULL, NULL, NULL, NULL, NULL, @@ -4224,6 +4224,12 @@ static bool lmce_supported(void) return !!(mce_cap & MCG_LMCE_P); } +static inline bool lbr_supported(void) +{ +return kvm_enabled() && (kvm_arch_get_supported_msr_feature(kvm_state, +MSR_IA32_PERF_CAPABILITIES) 
& PERF_CAP_LBR_FMT); +} + #define CPUID_MODEL_ID_SZ 48 /** @@ -4327,6 +4333,9 @@ static void max_x86_cpu_initfn(Object *obj) } object_property_set_bool(OBJECT(cpu), "pmu", true, &error_abort); +if (lbr_supported()) { +object_property_set_bool(OBJECT(cpu), "lbr", true, &error_abort); +} } static const TypeInfo max_x86_cpu_type_info = { @@ -5535,6 +5544,10 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count, } if (!cpu->enable_pmu) { *ecx &= ~CPUID_EXT_PDCM; +if (cpu->enable_lbr) { +warn_report("LBR is unsupported since guest PMU is disabled."); +exit(1); +} } break; case 2: @@ -6553,6 +6566,12 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp) } } +if (!cpu->max_features && cpu->enable_lbr && +!(env->features[FEAT_1_ECX] & CPUID_EXT_PDCM)) { +warn_report("requested vcpu model doesn't support PDCM for LBR."); +exit(1); +} + if (cpu->ucode_rev == 0) { /* The default is the same as KVM's. */ if (IS_AMD_CPU(env)) { @@ -7187,6 +7206,7 @@ static Property x86_cpu_properties[] = { #endif DEFINE_PROP_INT32("node-id", X86CPU, node_id, CPU_UNSET_NUMA_NODE_ID), DEFINE_PROP_BOOL("pmu", X86CPU, enable_pmu, false), +DEFINE_PROP_BOOL("lbr", X86CPU, enable_lbr, false), DEFINE_PROP_UINT32("hv-spinlocks", X86CPU, hyperv_spinlock_attempts, HYPERV_SPINLOCK_NEVER_RETRY), diff --git a/target/i386/cpu.h b/target/i386/cpu.h index e1a5c174dc..a059913e26 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -357,6 +357,7 @@ typedef enum X86Seg { #define ARCH_CAP_TSX_CTRL_MSR (1<<7) #define MSR_IA32_PERF_CAPABILITIES 0x345 +#define PERF_CAP_LBR_FMT 0x3f #define MSR_IA32_TSX_CTRL 0x122 #define MSR_IA32_TSCDEADLINE0x6e0 @@ -1702,6 +1703,7 @@ struct X86CPU { * capabilities) directly to the guest. */ bool enable_pmu; +bool enable_lbr; /* LMCE support can be enabled/disabled via cpu option 'lmce=on/off'. It is * disabled by default to avoid breaking migration between QEMU with diff --git a/target/i386/kvm.c b/target/i386/kvm.c index b8455c89ed..feb3
Re: [PATCH 1/2] migration/colo: fix typo in the COLO Framework module
On 2020/6/15 9:36, Zhanghailiang wrote: Hi Like, Please check this patch, It seems that you didn't use git format-patch command to generate this patch, It is in wrong format. I rebase the patch on the top commit of 7d3660e79830a069f1848bb4fa1cdf8f666424fb, and hope it helps you. Thanks, Hailiang From 15c19be9be07598d4264a4a84b85d4efa79bff9d Mon Sep 17 00:00:00 2001 From: Like Xu Date: Mon, 15 Jun 2020 10:10:57 +0800 Subject: [PATCH 1/2] migration/colo: fix typo in the COLO Framework module Cc: Hailiang Zhang Signed-off-by: Like Xu --- docs/COLO-FT.txt | 8 migration/colo.c | 2 +- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/COLO-FT.txt b/docs/COLO-FT.txt index c8e1740935..fdc0207cff 100644 --- a/docs/COLO-FT.txt +++ b/docs/COLO-FT.txt @@ -10,7 +10,7 @@ See the COPYING file in the top-level directory. This document gives an overview of COLO's design and how to use it. == Background == -Virtual machine (VM) replication is a well known technique for providing +Virtual machine (VM) replication is a well-known technique for providing application-agnostic software-implemented hardware fault tolerance, also known as "non-stop service". @@ -103,7 +103,7 @@ Primary side. COLO Proxy: Delivers packets to Primary and Secondary, and then compare the responses from -both side. Then decide whether to start a checkpoint according to some rules. +both sides. Then decide whether to start a checkpoint according to some rules. Please refer to docs/colo-proxy.txt for more information. Note: @@ -146,12 +146,12 @@ in test procedure. == Test procedure == Note: Here we are running both instances on the same host for testing, -change the IP Addresses if you want to run it on two hosts. Initally +change the IP Addresses if you want to run it on two hosts. Initially 127.0.0.1 is the Primary Host and 127.0.0.2 is the Secondary Host. == Startup qemu == 1. Primary: -Note: Initally, $imagefolder/primary.qcow2 needs to be copied to all hosts. 
+Note: Initially, $imagefolder/primary.qcow2 needs to be copied to all hosts. You don't need to change any IP's here, because 0.0.0.0 listens on any interface. The chardev's with 127.0.0.1 IP's loopback to the local qemu instance. diff --git a/migration/colo.c b/migration/colo.c index ea7d1e9d4e..80788d46b5 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -632,7 +632,7 @@ out: /* * It is safe to unregister notifier after failover finished. * Besides, colo_delay_timer and colo_checkpoint_sem can't be - * released befor unregister notifier, or there will be use-after-free + * released before unregister notifier, or there will be use-after-free * error. */ colo_compare_unregister_notifier(&packets_compare_notifier); -- 2.21.3
[PATCH 2/2] migration/colo/net: fix typo in the COLO Proxy module
Cc: Zhang Chen
Cc: Li Zhijian
Signed-off-by: Like Xu
---
 docs/colo-proxy.txt | 4 ++--
 net/colo-compare.c  | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/colo-proxy.txt b/docs/colo-proxy.txt
index fa1cef0278..0bbd6f720a 100644
--- a/docs/colo-proxy.txt
+++ b/docs/colo-proxy.txt
@@ -21,7 +21,7 @@ and filter-rewriter compose the COLO-proxy.
 == Architecture ==
 COLO-Proxy is based on qemu netfilter and it's a plugin for qemu netfilter
-(except colo-compare). It keep Secondary VM connect normally to
+(except colo-compare). It keeps Secondary VM connect normally to
 client and compare packets sent by PVM with sent by SVM.
 If the packet difference, notify COLO-frame to do checkpoint and send
 all primary packet has queued. Otherwise just send the queued primary
@@ -94,7 +94,7 @@ Redirect Server Filter --> COLO-Compare
 COLO-compare receive primary guest packet then
 waiting secondary redirect packet to compare it.
 If packet same,send queued primary packet and clear
-queued secondary packet, Otherwise send primary packet
+queued secondary packet, otherwise send primary packet
 and do checkpoint.
 COLO-Compare --> Another Redirector Filter
diff --git a/net/colo-compare.c b/net/colo-compare.c
index c07e7c1c09..3efc61c777 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -658,7 +658,7 @@ static void colo_compare_packet(CompareState *s, Connection *conn,
             g_queue_remove(&conn->secondary_list, result->data);
         } else {
             /*
-             * If one packet arrive late, the secondary_list or
+             * If one packet arrives late, the secondary_list or
              * primary_list will be empty, so we can't compare it
              * until next comparison. If the packets in the list are
              * timeout, it will trigger a checkpoint request.
@@ -1296,7 +1296,7 @@ static void colo_compare_finalize(Object *obj)
         }
     }
-    /* Release all unhandled packets after compare thead exited */
+    /* Release all unhandled packets after compare thread exited */
     g_queue_foreach(&s->conn_list, colo_flush_packets, s);
     g_queue_clear(&s->conn_list);
--
2.21.3
[PATCH 1/2] migration/colo: fix typo in the COLO Framework module
Cc: Hailiang Zhang Signed-off-by: Like Xu --- docs/COLO-FT.txt | 8 migration/colo.c | 2 +- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/COLO-FT.txt b/docs/COLO-FT.txt index c8e1740935..fdc0207cff 100644 --- a/docs/COLO-FT.txt +++ b/docs/COLO-FT.txt @@ -10,7 +10,7 @@ See the COPYING file in the top-level directory. This document gives an overview of COLO's design and how to use it. == Background == -Virtual machine (VM) replication is a well known technique for providing +Virtual machine (VM) replication is a well-known technique for providing application-agnostic software-implemented hardware fault tolerance, also known as "non-stop service". @@ -103,7 +103,7 @@ Primary side. COLO Proxy: Delivers packets to Primary and Secondary, and then compare the responses from -both side. Then decide whether to start a checkpoint according to some rules. +both sides. Then decide whether to start a checkpoint according to some rules. Please refer to docs/colo-proxy.txt for more information. Note: @@ -146,12 +146,12 @@ in test procedure. == Test procedure == Note: Here we are running both instances on the same host for testing, -change the IP Addresses if you want to run it on two hosts. Initally +change the IP Addresses if you want to run it on two hosts. Initially 127.0.0.1 is the Primary Host and 127.0.0.2 is the Secondary Host. == Startup qemu == 1. Primary: -Note: Initally, $imagefolder/primary.qcow2 needs to be copied to all hosts. +Note: Initially, $imagefolder/primary.qcow2 needs to be copied to all hosts. You don't need to change any IP's here, because 0.0.0.0 listens on any interface. The chardev's with 127.0.0.1 IP's loopback to the local qemu instance. diff --git a/migration/colo.c b/migration/colo.c index ea7d1e9d4e..80788d46b5 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -632,7 +632,7 @@ out: /* * It is safe to unregister notifier after failover finished. 
* Besides, colo_delay_timer and colo_checkpoint_sem can't be - * released befor unregister notifier, or there will be use-after-free + * released before unregister notifier, or there will be use-after-free * error. */ colo_compare_unregister_notifier(&packets_compare_notifier); -- 2.21.3
[Qemu-devel] [PATCH 2/2] target/i386: add -cpu, lbr=true support to enable guest LBR
The LBR feature would be enabled on the guest if: - the KVM is enabled and the PMU is enabled and, - the msr-based-feature IA32_PERF_CAPABILITIES is supporterd and, - the supported returned value for lbr_fmt from this msr is not zero. The LBR feature would be disabled on the guest if: - the msr-based-feature IA32_PERF_CAPABILITIES is unsupporterd OR, - qemu set the IA32_PERF_CAPABILITIES msr feature without lbr_fmt values OR, - the requested guest vcpu model doesn't support PDCM. Cc: Paolo Bonzini Cc: Richard Henderson Cc: Eduardo Habkost Cc: "Michael S. Tsirkin" Cc: Marcel Apfelbaum Cc: Marcelo Tosatti Cc: qemu-devel@nongnu.org Signed-off-by: Like Xu --- hw/i386/pc.c | 1 + target/i386/cpu.c | 25 +++-- target/i386/cpu.h | 2 ++ target/i386/kvm.c | 7 ++- 4 files changed, 32 insertions(+), 3 deletions(-) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 2128f3d6fe..8d8d42a8ea 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -316,6 +316,7 @@ GlobalProperty pc_compat_1_5[] = { { "Nehalem-" TYPE_X86_CPU, "min-level", "2" }, { "virtio-net-pci", "any_layout", "off" }, { TYPE_X86_CPU, "pmu", "on" }, +{ TYPE_X86_CPU, "lbr", "on" }, { "i440FX-pcihost", "short_root_bus", "0" }, { "q35-pcihost", "short_root_bus", "0" }, }; diff --git a/target/i386/cpu.c b/target/i386/cpu.c index e47c9d1604..262a2595fa 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -1142,8 +1142,8 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = { [FEAT_PERF_CAPABILITIES] = { .type = MSR_FEATURE_WORD, .feat_names = { -NULL, NULL, NULL, NULL, -NULL, NULL, NULL, NULL, +"lbr-fmt-bit-0", "lbr-fmt-bit-1", "lbr-fmt-bit-2", "lbr-fmt-bit-3", +"lbr-fmt-bit-4", "lbr-fmt-bit-5", NULL, NULL, NULL, NULL, NULL, NULL, NULL, "full-width-write", NULL, NULL, NULL, NULL, NULL, NULL, @@ -4187,6 +4187,13 @@ static bool lmce_supported(void) return !!(mce_cap & MCG_LMCE_P); } +static inline bool lbr_supported(void) +{ +return kvm_enabled() && (PERF_CAP_LBR_FMT & +kvm_arch_get_supported_msr_feature(kvm_state, + 
MSR_IA32_PERF_CAPABILITIES)); +} + #define CPUID_MODEL_ID_SZ 48 /** @@ -4290,6 +4297,9 @@ static void max_x86_cpu_initfn(Object *obj) } object_property_set_bool(OBJECT(cpu), true, "pmu", &error_abort); +if (lbr_supported()) { +object_property_set_bool(OBJECT(cpu), true, "lbr", &error_abort); +} } static const TypeInfo max_x86_cpu_type_info = { @@ -5510,6 +5520,10 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count, } if (!cpu->enable_pmu) { *ecx &= ~CPUID_EXT_PDCM; +if (cpu->enable_lbr) { +warn_report("LBR is unsupported since guest PMU is disabled."); +exit(1); +} } break; case 2: @@ -6528,6 +6542,12 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp) } } +if (!cpu->max_features && cpu->enable_lbr && +!(env->features[FEAT_1_ECX] & CPUID_EXT_PDCM)) { +warn_report("requested vcpu model doesn't support PDCM for LBR."); +exit(1); +} + if (cpu->ucode_rev == 0) { /* The default is the same as KVM's. */ if (IS_AMD_CPU(env)) { @@ -7165,6 +7185,7 @@ static Property x86_cpu_properties[] = { #endif DEFINE_PROP_INT32("node-id", X86CPU, node_id, CPU_UNSET_NUMA_NODE_ID), DEFINE_PROP_BOOL("pmu", X86CPU, enable_pmu, false), +DEFINE_PROP_BOOL("lbr", X86CPU, enable_lbr, false), DEFINE_PROP_UINT32("hv-spinlocks", X86CPU, hyperv_spinlock_attempts, HYPERV_SPINLOCK_NEVER_RETRY), diff --git a/target/i386/cpu.h b/target/i386/cpu.h index fad2f874bd..e5f65e9b0c 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -357,6 +357,7 @@ typedef enum X86Seg { #define ARCH_CAP_TSX_CTRL_MSR (1<<7) #define MSR_IA32_PERF_CAPABILITIES 0x345 +#define PERF_CAP_LBR_FMT 0x3f #define MSR_IA32_TSX_CTRL 0x122 #define MSR_IA32_TSCDEADLINE0x6e0 @@ -1686,6 +1687,7 @@ struct X86CPU { * capabilities) directly to the guest. */ bool enable_pmu; +bool enable_lbr; /* LMCE support can be enabled/disabled via cpu option 'lmce=on/off'. It is * disabled by default to avoid breaking migration between QEMU with diff --git a/target/i386/kvm.c b/target/i386/kvm.c i
[Qemu-devel] [PATCH 1/2] target/i386: define a new MSR based feature word - FEAT_PERF_CAPABILITIES
The Perfmon and Debug Capability MSR, IA32_PERF_CAPABILITIES, is a
feature-enumerating MSR. For now, only the full-width write feature
(bit 13) is enumerated; it indicates that the processor supports the
IA32_A_PMCx interface for updating bits 32 and above of IA32_PMCx.

The existence of MSR IA32_PERF_CAPABILITIES is enumerated by
CPUID.1:ECX[15].

Cc: Paolo Bonzini
Cc: Richard Henderson
Cc: Eduardo Habkost
Cc: Marcelo Tosatti
Cc: qemu-devel@nongnu.org
Signed-off-by: Like Xu
Message-Id: <20200529074347.124619-5-like...@linux.intel.com>
Signed-off-by: Paolo Bonzini
---
 target/i386/cpu.c | 23 +++
 target/i386/cpu.h |  3 +++
 target/i386/kvm.c | 20
 3 files changed, 46 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 02065e35d4..e47c9d1604 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1139,6 +1139,22 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
             .index = MSR_IA32_CORE_CAPABILITY,
         },
     },
+    [FEAT_PERF_CAPABILITIES] = {
+        .type = MSR_FEATURE_WORD,
+        .feat_names = {
+            NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL,
+            NULL, "full-width-write", NULL, NULL,
+            NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL,
+        },
+        .msr = {
+            .index = MSR_IA32_PERF_CAPABILITIES,
+        },
+    },
     [FEAT_VMX_PROCBASED_CTLS] = {
         .type = MSR_FEATURE_WORD,
@@ -1316,6 +1332,10 @@ static FeatureDep feature_dependencies[] = {
         .from = { FEAT_7_0_EDX, CPUID_7_0_EDX_CORE_CAPABILITY },
         .to = { FEAT_CORE_CAPABILITY, ~0ull },
     },
+    {
+        .from = { FEAT_1_ECX, CPUID_EXT_PDCM },
+        .to = { FEAT_PERF_CAPABILITIES, ~0ull },
+    },
     {
         .from = { FEAT_1_ECX, CPUID_EXT_VMX },
         .to = { FEAT_VMX_PROCBASED_CTLS,~0ull },
@@ -5488,6 +5508,9 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
             *ebx |= (cs->nr_cores * cs->nr_threads) << 16;
             *edx |= CPUID_HT;
         }
+        if (!cpu->enable_pmu) {
+            *ecx &= ~CPUID_EXT_PDCM;
+        }
         break;
     case 2:
         /* cache info: needed for Pentium Pro compatibility */
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 408392dbf6..fad2f874bd 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -356,6 +356,8 @@ typedef enum X86Seg {
 #define MSR_IA32_ARCH_CAPABILITIES 0x10a
 #define ARCH_CAP_TSX_CTRL_MSR (1<<7)
+#define MSR_IA32_PERF_CAPABILITIES 0x345
+
 #define MSR_IA32_TSX_CTRL 0x122
 #define MSR_IA32_TSCDEADLINE 0x6e0
@@ -529,6 +531,7 @@ typedef enum FeatureWord {
     FEAT_XSAVE_COMP_HI, /* CPUID[EAX=0xd,ECX=0].EDX */
     FEAT_ARCH_CAPABILITIES,
     FEAT_CORE_CAPABILITY,
+    FEAT_PERF_CAPABILITIES,
     FEAT_VMX_PROCBASED_CTLS,
     FEAT_VMX_SECONDARY_CTLS,
     FEAT_VMX_PINBASED_CTLS,
diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index 34f838728d..9be6f76b2c 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -106,6 +106,7 @@ static bool has_msr_core_capabs;
 static bool has_msr_vmx_vmfunc;
 static bool has_msr_ucode_rev;
 static bool has_msr_vmx_procbased_ctls2;
+static bool has_msr_perf_capabs;
 static uint32_t has_architectural_pmu_version;
 static uint32_t num_architectural_pmu_gp_counters;
@@ -2027,6 +2028,9 @@ static int kvm_get_supported_msrs(KVMState *s)
             case MSR_IA32_CORE_CAPABILITY:
                 has_msr_core_capabs = true;
                 break;
+            case MSR_IA32_PERF_CAPABILITIES:
+                has_msr_perf_capabs = true;
+                break;
             case MSR_IA32_VMX_VMFUNC:
                 has_msr_vmx_vmfunc = true;
                 break;
@@ -2643,6 +2647,18 @@ static void kvm_msr_entry_add_vmx(X86CPU *cpu, FeatureWordArray f)
                       VMCS12_MAX_FIELD_INDEX << 1);
 }
+static void kvm_msr_entry_add_perf(X86CPU *cpu, FeatureWordArray f)
+{
+    uint64_t kvm_perf_cap =
+        kvm_arch_get_supported_msr_feature(kvm_state,
+                                           MSR_IA32_PERF_CAPABILITIES);
+
+    if (kvm_perf_cap) {
+        kvm_msr_entry_add(cpu, MSR_IA32_PERF_CAPABILITIES,
+                          kvm_perf_cap & f[FEAT_PERF_CAPABILITIES]);
+    }
+}
+
 static int kvm_buf_set_msrs(X86CPU *cpu)
 {
     int ret = kvm_vcpu_ioctl(CPU(cpu), KVM_SET_MSRS, cpu->kvm_msr_buf);
@@ -2675,6 +2691,10 @@ static void kvm_init_msrs(X86CPU *cpu)
                           env->features[FEAT_CORE_CAPABILITY]);
     }
+    if (has_msr_perf_capabs && cpu->enable_pmu) {
+        kvm_msr_entry_add_perf(cpu, env->features);
+    }
+
     if (has_msr_ucode_rev) {
         kvm_msr_entry_add(cpu, MSR_IA32_UCODE_REV, cpu->ucode_rev);
     }
--
2.21.3
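The commit message names two bit positions: CPUID.1:ECX[15] announces that IA32_PERF_CAPABILITIES exists, and bit 13 of that MSR announces full-width writes. A small sketch of how a consumer might combine the two checks is shown below. The macro and function names are hypothetical and chosen for this example only (the patch itself uses CPUID_EXT_PDCM for the CPUID bit); only the bit positions come from the text above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the two bits the commit message mentions. */
#define CPUID_1_ECX_PDCM_BIT       (1u << 15)    /* CPUID.1:ECX[15] */
#define PERF_CAP_FULL_WIDTH_WRITE  (1ull << 13)  /* IA32_PERF_CAPABILITIES[13] */

/*
 * Full-width writes are only meaningful when the MSR itself is
 * enumerated, so the CPUID bit is checked before the MSR bit.
 */
static bool has_full_width_write(uint32_t cpuid_1_ecx, uint64_t perf_cap)
{
    if (!(cpuid_1_ecx & CPUID_1_ECX_PDCM_BIT)) {
        return false;
    }
    return (perf_cap & PERF_CAP_FULL_WIDTH_WRITE) != 0;
}
```

This ordering mirrors the feature dependency added in the patch, where clearing PDCM in FEAT_1_ECX removes every bit of FEAT_PERF_CAPABILITIES.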
[Qemu-devel PATCH] target/i386: define a new MSR based feature word - FEAT_PERF_CAPABILITIES
The Perfmon and Debug Capability MSR named IA32_PERF_CAPABILITIES is a feature-enumerating MSR, which only enumerates the feature full-width write (via bit 13) by now which indicates the processor supports IA32_A_PMCx interface for updating bits 32 and above of IA32_PMCx. The existence of MSR IA32_PERF_CAPABILITIES is enumerated by CPUID.1:ECX[15]. Cc: Paolo Bonzini Cc: Richard Henderson Cc: Eduardo Habkost Cc: Marcelo Tosatti Cc: qemu-devel@nongnu.org Signed-off-by: Like Xu --- target/i386/cpu.c | 29 + target/i386/cpu.h | 3 +++ target/i386/kvm.c | 20 3 files changed, 52 insertions(+) diff --git a/target/i386/cpu.c b/target/i386/cpu.c index 3733d9a279..be56966bb0 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -1139,6 +1139,22 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = { .index = MSR_IA32_CORE_CAPABILITY, }, }, +[FEAT_PERF_CAPABILITIES] = { +.type = MSR_FEATURE_WORD, +.feat_names = { +NULL, NULL, NULL, NULL, +NULL, NULL, NULL, NULL, +NULL, NULL, NULL, NULL, +NULL, "full-width-write", NULL, NULL, +NULL, NULL, NULL, NULL, +NULL, NULL, NULL, NULL, +NULL, NULL, NULL, NULL, +NULL, NULL, NULL, NULL, +}, +.msr = { +.index = MSR_IA32_PERF_CAPABILITIES, +}, +}, [FEAT_VMX_PROCBASED_CTLS] = { .type = MSR_FEATURE_WORD, @@ -1316,6 +1332,10 @@ static FeatureDep feature_dependencies[] = { .from = { FEAT_7_0_EDX, CPUID_7_0_EDX_CORE_CAPABILITY }, .to = { FEAT_CORE_CAPABILITY, ~0ull }, }, +{ +.from = { FEAT_1_ECX, CPUID_EXT_PDCM }, +.to = { FEAT_PERF_CAPABILITIES, ~0ull }, +}, { .from = { FEAT_1_ECX, CPUID_EXT_VMX }, .to = { FEAT_VMX_PROCBASED_CTLS,~0ull }, @@ -5488,6 +5508,9 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count, *ebx |= (cs->nr_cores * cs->nr_threads) << 16; *edx |= CPUID_HT; } +if (!cpu->enable_pmu) { +*ecx &= ~CPUID_EXT_PDCM; +} break; case 2: /* cache info: needed for Pentium Pro compatibility */ @@ -6505,6 +6528,12 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp) } } +if (kvm_enabled() && 
cpu->enable_pmu && +(kvm_arch_get_supported_cpuid(kvm_state, 1, 0, R_ECX) & + CPUID_EXT_PDCM)) { +env->features[FEAT_1_ECX] |= CPUID_EXT_PDCM; +} + if (cpu->ucode_rev == 0) { /* The default is the same as KVM's. */ if (IS_AMD_CPU(env)) { diff --git a/target/i386/cpu.h b/target/i386/cpu.h index 408392dbf6..fad2f874bd 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -356,6 +356,8 @@ typedef enum X86Seg { #define MSR_IA32_ARCH_CAPABILITIES 0x10a #define ARCH_CAP_TSX_CTRL_MSR (1<<7) +#define MSR_IA32_PERF_CAPABILITIES 0x345 + #define MSR_IA32_TSX_CTRL 0x122 #define MSR_IA32_TSCDEADLINE0x6e0 @@ -529,6 +531,7 @@ typedef enum FeatureWord { FEAT_XSAVE_COMP_HI, /* CPUID[EAX=0xd,ECX=0].EDX */ FEAT_ARCH_CAPABILITIES, FEAT_CORE_CAPABILITY, +FEAT_PERF_CAPABILITIES, FEAT_VMX_PROCBASED_CTLS, FEAT_VMX_SECONDARY_CTLS, FEAT_VMX_PINBASED_CTLS, diff --git a/target/i386/kvm.c b/target/i386/kvm.c index 34f838728d..9be6f76b2c 100644 --- a/target/i386/kvm.c +++ b/target/i386/kvm.c @@ -106,6 +106,7 @@ static bool has_msr_core_capabs; static bool has_msr_vmx_vmfunc; static bool has_msr_ucode_rev; static bool has_msr_vmx_procbased_ctls2; +static bool has_msr_perf_capabs; static uint32_t has_architectural_pmu_version; static uint32_t num_architectural_pmu_gp_counters; @@ -2027,6 +2028,9 @@ static int kvm_get_supported_msrs(KVMState *s) case MSR_IA32_CORE_CAPABILITY: has_msr_core_capabs = true; break; +case MSR_IA32_PERF_CAPABILITIES: +has_msr_perf_capabs = true; +break; case MSR_IA32_VMX_VMFUNC: has_msr_vmx_vmfunc = true; break; @@ -2643,6 +2647,18 @@ static void kvm_msr_entry_add_vmx(X86CPU *cpu, FeatureWordArray f) VMCS12_MAX_FIELD_INDEX << 1); } +static void kvm_msr_entry_add_perf(X86CPU *cpu, FeatureWordArray f) +{ +uint64_t kvm_perf_cap = +kvm_arch_get_supported_msr_feature(kvm_state, + MSR_IA32_PERF_CAPABILITIES); + +if (kvm_perf_cap) { +kvm_msr_entry_add(cpu, MSR_IA32_PERF_CAPABILITIES, +kvm_perf_cap & f[FEAT_PERF_CAPABILITIES]); +} +} + static int kvm_buf_set_msrs(X86CPU 
*cpu) { int ret = kvm_vcpu_ioct
Re: [PATCH] i386/cpu: Expand MAX_FIXED_COUNTERS from 3 to 4 for Icelake
On 2020/3/27 2:48, Paolo Bonzini wrote:
> On 17/03/20 06:54, Like Xu wrote:
> > In the Intel SDM, "Table 18-2. Association of Fixed-Function
> > Performance Counters with Architectural Performance Events",
> > we may have a new fixed counter 'TOPDOWN.SLOTS' (since Icelake),
> > which counts the number of available slots for an unhalted
> > logical processor.
> >
> > Check commit 6017608936 in the kernel tree.
> >
> > Signed-off-by: Like Xu
> > ---
> >  target/i386/cpu.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> > index 576f309bbf..ec2b67d425 100644
> > --- a/target/i386/cpu.h
> > +++ b/target/i386/cpu.h
> > @@ -1185,7 +1185,7 @@ typedef struct {
> >  #define CPU_NB_REGS CPU_NB_REGS32
> >  #endif
> >
> > -#define MAX_FIXED_COUNTERS 3
> > +#define MAX_FIXED_COUNTERS 4
> >  #define MAX_GP_COUNTERS    (MSR_IA32_PERF_STATUS - MSR_P6_EVNTSEL0)
> >
> >  #define TARGET_INSN_START_EXTRA_WORDS 1
>
> Hi Like,
>
> the problem with this patch is that it breaks live migration; the
> vmstate_msr_architectural_pmu record hardcodes MAX_FIXED_COUNTERS as
> the number of registers.
>
> So it's more complicated, you need to add a new subsection (following
> vmstate_msr_architectural_pmu) and transmit it only if the 4th counter
> is nonzero (instead of the more complicated check in
> pmu_enable_needed). Just to be safe, I'd make the new subsection hold
> 16 counters and bump MAX_FIXED_COUNTERS to 16.

The new MAX_FIXED_COUNTERS looks good to me, and let me follow up on
this live migration issue.

Thanks,
Like Xu

> Thanks,
>
> Paolo
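The condition Paolo describes can be sketched outside QEMU as follows. This is a minimal, self-contained illustration and not QEMU code: `PmuState`, `LEGACY_FIXED_COUNTERS` and `extra_fixed_counters_needed()` are hypothetical names, and the check is generalized from "the 4th counter is nonzero" to "any counter beyond the legacy three is nonzero", which is an assumption rather than something stated in the review.

```c
#include <stdbool.h>
#include <stdint.h>

/* Number of fixed counters the existing migration record hardcodes. */
#define LEGACY_FIXED_COUNTERS 3
/* Headroom suggested in the review for the new subsection. */
#define MAX_FIXED_COUNTERS    16

/* Hypothetical stand-in for the PMU part of the vCPU state. */
typedef struct {
    uint64_t fixed_counters[MAX_FIXED_COUNTERS];
} PmuState;

/*
 * Decide whether the extra subsection has to be sent: only when a
 * counter that the legacy record cannot carry holds a nonzero value.
 * A guest that never touched those counters migrates exactly as before.
 */
static bool extra_fixed_counters_needed(const PmuState *s)
{
    for (int i = LEGACY_FIXED_COUNTERS; i < MAX_FIXED_COUNTERS; i++) {
        if (s->fixed_counters[i] != 0) {
            return true;
        }
    }
    return false;
}
```

In QEMU a predicate of this kind would presumably be wired up as the `.needed` callback of the new vmstate subsection, so that streams from guests using only the first three counters stay compatible with older versions.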
Re: [PATCH] i386/cpu: Expand MAX_FIXED_COUNTERS from 3 to 4 for Icelake
Could anyone help review this change?

Thanks,
Like Xu

On 2020/3/17 13:54, Like Xu wrote:
> In the Intel SDM, "Table 18-2. Association of Fixed-Function
> Performance Counters with Architectural Performance Events",
> we may have a new fixed counter 'TOPDOWN.SLOTS' (since Icelake),
> which counts the number of available slots for an unhalted
> logical processor.
>
> Check commit 6017608936 in the kernel tree.
>
> Signed-off-by: Like Xu
> ---
>  target/i386/cpu.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index 576f309bbf..ec2b67d425 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -1185,7 +1185,7 @@ typedef struct {
>  #define CPU_NB_REGS CPU_NB_REGS32
>  #endif
>
> -#define MAX_FIXED_COUNTERS 3
> +#define MAX_FIXED_COUNTERS 4
>  #define MAX_GP_COUNTERS    (MSR_IA32_PERF_STATUS - MSR_P6_EVNTSEL0)
>
>  #define TARGET_INSN_START_EXTRA_WORDS 1
[PATCH] i386/cpu: Expand MAX_FIXED_COUNTERS from 3 to 4 for Icelake
In the Intel SDM, "Table 18-2. Association of Fixed-Function
Performance Counters with Architectural Performance Events",
we may have a new fixed counter 'TOPDOWN.SLOTS' (since Icelake),
which counts the number of available slots for an unhalted
logical processor.

Check commit 6017608936 in the kernel tree.

Signed-off-by: Like Xu
---
 target/i386/cpu.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 576f309bbf..ec2b67d425 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1185,7 +1185,7 @@ typedef struct {
 #define CPU_NB_REGS CPU_NB_REGS32
 #endif

-#define MAX_FIXED_COUNTERS 3
+#define MAX_FIXED_COUNTERS 4
 #define MAX_GP_COUNTERS    (MSR_IA32_PERF_STATUS - MSR_P6_EVNTSEL0)

 #define TARGET_INSN_START_EXTRA_WORDS 1
--
2.21.1
Re: Difference between 'current_machine' vs MACHINE(qdev_get_machine())
On 2020/1/9 20:01, Paolo Bonzini wrote:
> On 09/01/20 12:23, Philippe Mathieu-Daudé wrote:
> >     current_machine =
> >         MACHINE(object_new_with_class(OBJECT_CLASS(machine_class)));
> >     object_property_add_child(object_get_root(), "machine",
> >                               OBJECT(current_machine), &error_abort);
> >
> > The bigger user of 'current_machine' is the accel/KVM code.
> >
> > Recently in a0628599f..cc7d44c2e0 "Replace global smp variables with
> > machine smp properties" we started to use MACHINE(qdev_get_machine()).
> >
> > qdev_get_machine() resolves the machine in the QOM composition tree.
> > I am confused by this comment:
> >
> >     /* qdev_get_machine() can return something that's not TYPE_MACHINE
> >      * if this is one of the user-only emulators; in that case there's
> >      * no need to check the ignore_memory_transaction_failures board flag.
> >      */
> >
> > Following a0628599f..cc7d44c2e0, a5e0b33119 uses 'current_machine' again.
> >
> > What are the differences between both forms, when should we use one or
> > another (or can we use a single one?). Can this break user-only mode?
>
> I would always use MACHINE(qdev_get_machine()), especially outside
> vl.c. Ideally, current_machine would be static within vl.c or even
> unused outside the object_property_add_child() that you quote above.
>
> Most of the time, I noticed from a quick grep, we actually want to
> access the accelerator, not the machine, so we could add a
> qemu_get_accelerator() wrapper that does
> MACHINE(qdev_get_machine())->accelerator.
>
> Paolo

I prefer to use MACHINE(qdev_get_machine()) wherever possible.

However, qdev_get_machine() returns an object that is not TYPE_MACHINE if:

- it is called before vl.c has executed
  "object_property_add_child(object_get_root(), "machine",
  OBJECT(current_machine), &error_abort);";
- or it is called in a '#ifdef CONFIG_USER_ONLY' context.

Thanks,
Like Xu
Re: [Qemu-devel] [PATCH 1/3] pc: Fix error message on die-id validation
On 2019/8/16 21:49, Eduardo Habkost wrote:
> On Fri, Aug 16, 2019 at 09:04:16AM +0800, Like Xu wrote:
> > Hi,
> >
> > On 2019/8/16 2:38, Eduardo Habkost wrote:
> > > The error message for die-id range validation is incorrect. Example:
> > >
> > >   $ qemu-system-x86_64 -smp 1,sockets=6,maxcpus=6 \
> > >     -device qemu64-x86_64-cpu,socket-id=1,die-id=1,core-id=0,thread-id=0
> > >   qemu-system-x86_64: -device qemu64-x86_64-cpu,socket-id=1,die-id=1,core-id=0,thread-id=0: \
> > >     Invalid CPU die-id: 1 must be in range 0:5
> > >
> > > The actual range for die-id in this example is 0:0.
> >
> > There is one die per socket by default. The case sockets=6 means there
> > are 6 dies by default and the range for die-id is 0:5.
>
> I don't understand why you say that. die-id is supposed to identify a
> die inside a socket. The code below is already checking for
> (cpu->die_id > pcms->smp_dies - 1) because of that.

You're right about this. Sorry for the mess I made while adding die
topology support.

> > > Fix the error message to use smp_dies and print the correct range.
> > >
> > > Signed-off-by: Eduardo Habkost
> > > ---
> > >  hw/i386/pc.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > > index 549c437050..24b03bb49c 100644
> > > --- a/hw/i386/pc.c
> > > +++ b/hw/i386/pc.c
> > > @@ -2412,7 +2412,7 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
> > >              return;
> > >          } else if (cpu->die_id > pcms->smp_dies - 1) {
> > >              error_setg(errp, "Invalid CPU die-id: %u must be in range 0:%u",
> > > -                       cpu->die_id, max_socket);
> > > +                       cpu->die_id, pcms->smp_dies - 1);
> > >              return;
> > >          }
> > >          if (cpu->core_id < 0) {
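The point settled in this exchange, that die-id is bounded by the number of dies per socket rather than by the number of sockets, can be illustrated with a small standalone check. This is a sketch under stated assumptions, not the QEMU function: `die_id_is_valid()` and its parameters are hypothetical, and `dies_per_socket` is assumed to be at least 1. Only the message format is taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: validate a die-id against the number of dies
 * inside one socket. The upper bound reported to the user is the same
 * value the comparison uses, which is exactly what the patch fixes.
 * Assumes dies_per_socket >= 1.
 */
static bool die_id_is_valid(unsigned die_id, unsigned dies_per_socket,
                            char *err, size_t err_len)
{
    unsigned max_die_id = dies_per_socket - 1;

    if (die_id > max_die_id) {
        snprintf(err, err_len,
                 "Invalid CPU die-id: %u must be in range 0:%u",
                 die_id, max_die_id);
        return false;
    }
    return true;
}
```

With one die per socket (the default in the example above), die-id 1 is rejected and the message reports the range 0:0, independent of how many sockets were configured.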
Re: [Qemu-devel] [PATCH 1/3] pc: Fix error message on die-id validation
Hi,

On 2019/8/16 2:38, Eduardo Habkost wrote:
> The error message for die-id range validation is incorrect. Example:
>
>   $ qemu-system-x86_64 -smp 1,sockets=6,maxcpus=6 \
>     -device qemu64-x86_64-cpu,socket-id=1,die-id=1,core-id=0,thread-id=0
>   qemu-system-x86_64: -device qemu64-x86_64-cpu,socket-id=1,die-id=1,core-id=0,thread-id=0: \
>     Invalid CPU die-id: 1 must be in range 0:5
>
> The actual range for die-id in this example is 0:0.

There is one die per socket by default. The case sockets=6 means there
are 6 dies by default and the range for die-id is 0:5.

> Fix the error message to use smp_dies and print the correct range.
>
> Signed-off-by: Eduardo Habkost
> ---
>  hw/i386/pc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 549c437050..24b03bb49c 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -2412,7 +2412,7 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
>              return;
>          } else if (cpu->die_id > pcms->smp_dies - 1) {
>              error_setg(errp, "Invalid CPU die-id: %u must be in range 0:%u",
> -                       cpu->die_id, max_socket);
> +                       cpu->die_id, pcms->smp_dies - 1);
>              return;
>          }
>          if (cpu->core_id < 0) {
Re: [Qemu-devel] [PATCH for 4.1?] includes: remove stale [smp|max]_cpus externs
On 2019/7/11 21:05, Alex Bennée wrote:
> Commit a5e0b3311 removed these in favour of querying machine
> properties. Remove the extern declarations as well.
>
> Signed-off-by: Alex Bennée
> Cc: Like Xu

Reviewed-by: Like Xu

> ---
>  include/sysemu/sysemu.h | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> index 984c439ac9..e70edf7c1c 100644
> --- a/include/sysemu/sysemu.h
> +++ b/include/sysemu/sysemu.h
> @@ -103,8 +103,6 @@ extern const char *keyboard_layout;
>  extern int win2k_install_hack;
>  extern int alt_grab;
>  extern int ctrl_grab;
> -extern int smp_cpus;
> -extern unsigned int max_cpus;
>  extern int cursor_hide;
>  extern int graphic_rotate;
>  extern int no_quit;
Re: [Qemu-devel] [PATCH v3 05/10] hw/riscv: Replace global smp variables with machine smp properties
On 2019/6/20 22:52, Eduardo Habkost wrote:
> On Sun, May 19, 2019 at 04:54:23AM +0800, Like Xu wrote:
> > The global smp variables in riscv are replaced with smp machine
> > properties.
> >
> > Where a variable is used widely within a function, a local variable
> > of the same name is introduced at the declarations; where it is used
> > only once, it is replaced on the spot. No semantic changes.
> >
> > Signed-off-by: Like Xu
> > ---
> >  hw/riscv/sifive_e.c    | 6 ++++--
> >  hw/riscv/sifive_plic.c | 3 +++
> >  hw/riscv/sifive_u.c    | 6 ++++--
> >  hw/riscv/spike.c       | 2 ++
> >  hw/riscv/virt.c        | 1 +
> >  5 files changed, 14 insertions(+), 4 deletions(-)
>
> This was incomplete, I had to apply the following fixup.
>
> Signed-off-by: Eduardo Habkost

Reviewed-by: Like Xu

> ---
>  hw/riscv/spike.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/hw/riscv/spike.c b/hw/riscv/spike.c
> index 9e95f2c13c..d91d49dcae 100644
> --- a/hw/riscv/spike.c
> +++ b/hw/riscv/spike.c
> @@ -172,6 +172,7 @@ static void spike_board_init(MachineState *machine)
>      MemoryRegion *main_mem = g_new(MemoryRegion, 1);
>      MemoryRegion *mask_rom = g_new(MemoryRegion, 1);
>      int i;
> +    unsigned int smp_cpus = machine->smp.cpus;
>
>      /* Initialize SOC */
>      object_initialize_child(OBJECT(machine), "soc", &s->soc, sizeof(s->soc),
[Qemu-devel] [PATCH v4 1/3] target/i386: Add CPUID.1F generation support for multi-dies PCMachine
CPUID.1F, Intel's V2 Extended Topology Enumeration Leaf, is exposed when
the guest is configured with more than one software-visible die per
package. Per Intel's SDM, leaf 0x1f is a superset of leaf 0xb, so it can
be generated by almost the same code as 0xb, apart from the die_offset
setting.

If the number of dies per package is greater than 1, cpuid_min_level is
raised to 0x1f regardless of whether the host supports CPUID.1F.
Conversely, CPUID.1F is not exposed if env->nr_dies < 2.

Suggested-by: Eduardo Habkost
Signed-off-by: Like Xu
---
 target/i386/cpu.c | 41 +++++++++++++++++++++++++++++++++++++++++
 target/i386/cpu.h |  1 +
 target/i386/kvm.c | 12 ++++++++++++
 3 files changed, 54 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 88908a6373..efcbe6a2b2 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -4439,6 +4439,42 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
             *ecx |= CPUID_TOPOLOGY_LEVEL_INVALID;
         }

+        assert(!(*eax & ~0x1f));
+        *ebx &= 0xffff; /* The count doesn't need to be reliable. */
+        break;
+    case 0x1F:
+        /* V2 Extended Topology Enumeration Leaf */
+        if (env->nr_dies < 2) {
+            *eax = *ebx = *ecx = *edx = 0;
+            break;
+        }
+
+        *ecx = count & 0xff;
+        *edx = cpu->apic_id;
+        switch (count) {
+        case 0:
+            *eax = apicid_core_offset(env->nr_dies, cs->nr_cores,
+                                      cs->nr_threads);
+            *ebx = cs->nr_threads;
+            *ecx |= CPUID_TOPOLOGY_LEVEL_SMT;
+            break;
+        case 1:
+            *eax = apicid_die_offset(env->nr_dies, cs->nr_cores,
+                                     cs->nr_threads);
+            *ebx = cs->nr_cores * cs->nr_threads;
+            *ecx |= CPUID_TOPOLOGY_LEVEL_CORE;
+            break;
+        case 2:
+            *eax = apicid_pkg_offset(env->nr_dies, cs->nr_cores,
+                                     cs->nr_threads);
+            *ebx = env->nr_dies * cs->nr_cores * cs->nr_threads;
+            *ecx |= CPUID_TOPOLOGY_LEVEL_DIE;
+            break;
+        default:
+            *eax = 0;
+            *ebx = 0;
+            *ecx |= CPUID_TOPOLOGY_LEVEL_INVALID;
+        }
         assert(!(*eax & ~0x1f));
         *ebx &= 0xffff; /* The count doesn't need to be reliable. */
         break;
@@ -5116,6 +5152,11 @@ static void x86_cpu_expand_features(X86CPU *cpu, Error **errp)
             x86_cpu_adjust_level(cpu, &cpu->env.cpuid_min_level, 0x14);
         }

+        /* CPU topology with multi-dies support requires CPUID[0x1F] */
+        if (env->nr_dies > 1) {
+            x86_cpu_adjust_level(cpu, &env->cpuid_min_level, 0x1F);
+        }
+
         /* SVM requires CPUID[0x8000000A] */
         if (env->features[FEAT_8000_0001_ECX] & CPUID_EXT3_SVM) {
             x86_cpu_adjust_level(cpu, &env->cpuid_min_xlevel, 0x8000000A);
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 46dd81f6b7..eec6e4b7b7 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -726,6 +726,7 @@ typedef uint32_t FeatureWordArray[FEATURE_WORDS];
 #define CPUID_TOPOLOGY_LEVEL_INVALID  (0U << 8)
 #define CPUID_TOPOLOGY_LEVEL_SMT      (1U << 8)
 #define CPUID_TOPOLOGY_LEVEL_CORE     (2U << 8)
+#define CPUID_TOPOLOGY_LEVEL_DIE      (5U << 8)

 /* MSR Feature Bits */
 #define MSR_ARCH_CAP_RDCL_NO    (1U << 0)
diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index 6899061b4e..5deb4248ac 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -1080,6 +1080,10 @@ int kvm_arch_init_vcpu(CPUState *cs)
             }
             break;
         }
+        case 0x1f:
+            if (env->nr_dies < 2) {
+                break;
+            }
         case 4:
         case 0xb:
         case 0xd:
@@ -1087,6 +1091,11 @@ int kvm_arch_init_vcpu(CPUState *cs)
                 if (i == 0xd && j == 64) {
                     break;
                 }
+
+                if (i == 0x1f && j == 64) {
+                    break;
+                }
+
                 c->function = i;
                 c->flags = KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
                 c->index = j;
@@ -1098,6 +1107,9 @@ int kvm_arch_init_vcpu(CPUState *cs)
                 if (i == 0xb && !(c->ecx & 0xff00)) {
                     break;
                 }
+                if (i == 0x1f && !(c->ecx & 0xff00)) {
+                    break;
+                }
                 if (i == 0xd && c->eax == 0) {
                     continue;
                 }
--
2.21.0
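To make the sub-leaf layout of the new `case 0x1F` branch easier to follow, here is a standalone sketch of the same computation. It is an illustration under assumptions, not the QEMU code: `Leaf1F`, `bits_for_count()` and `cpuid_1f()` are hypothetical names, the offsets are assumed to be plain sums of the bit widths needed for the thread, core and die counts (all assumed to be at least 1), and EDX (the APIC ID) is left out.

```c
#include <stdint.h>

/* Level-type encodings in ECX[15:8], as defined by the patch. */
#define LEVEL_INVALID (0u << 8)
#define LEVEL_SMT     (1u << 8)
#define LEVEL_CORE    (2u << 8)
#define LEVEL_DIE     (5u << 8)

typedef struct {
    uint32_t eax;  /* bits to shift the APIC ID right to reach the next level */
    uint32_t ebx;  /* logical processors at this level */
    uint32_t ecx;  /* sub-leaf number | level type */
} Leaf1F;

/* Number of APIC ID bits needed to number `count` items (count >= 1). */
static unsigned bits_for_count(unsigned count)
{
    unsigned bits = 0;

    for (count -= 1; count != 0; count >>= 1) {
        bits++;
    }
    return bits;
}

/* Hypothetical re-implementation of the sub-leaf selection for leaf 0x1F. */
static Leaf1F cpuid_1f(unsigned dies, unsigned cores, unsigned threads,
                       uint32_t subleaf)
{
    Leaf1F r = { 0, 0, 0 };
    unsigned smt_w = bits_for_count(threads);
    unsigned core_w = bits_for_count(cores);
    unsigned die_w = bits_for_count(dies);

    if (dies < 2) {
        return r;                       /* leaf is not exposed at all */
    }

    r.ecx = subleaf & 0xff;
    switch (subleaf) {
    case 0:                             /* SMT level */
        r.eax = smt_w;
        r.ebx = threads;
        r.ecx |= LEVEL_SMT;
        break;
    case 1:                             /* core level */
        r.eax = smt_w + core_w;
        r.ebx = cores * threads;
        r.ecx |= LEVEL_CORE;
        break;
    case 2:                             /* die level */
        r.eax = smt_w + core_w + die_w;
        r.ebx = dies * cores * threads;
        r.ecx |= LEVEL_DIE;
        break;
    default:                            /* no further levels */
        r.ecx |= LEVEL_INVALID;
        break;
    }
    r.ebx &= 0xffff;
    return r;
}
```

For a guest with 2 dies, 2 cores per die and 1 thread per core, this sketch yields shift widths 0, 1 and 2 and processor counts 1, 2 and 4 for sub-leaves 0 to 2, and an invalid level for sub-leaf 3.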
[Qemu-devel] [PATCH v4 3/3] vl.c: Add -smp, dies=* command line support and update doc
For PC target, users could configure the number of dies per one package via command line with this patch, such as "-smp dies=2,cores=4". The parsing rules of new cpu-topology model obey the same restrictions/logic as the legacy socket/core/thread model especially on missing values computing. Signed-off-by: Like Xu --- hw/i386/pc.c| 30 +- qemu-options.hx | 17 + vl.c| 3 +++ 3 files changed, 29 insertions(+), 21 deletions(-) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 092bd10d4d..2ed1b3f8de 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -1548,9 +1548,12 @@ static void pc_new_cpu(PCMachineState *pcms, int64_t apic_id, Error **errp) */ void pc_smp_parse(MachineState *ms, QemuOpts *opts) { +PCMachineState *pcms = PC_MACHINE(ms); + if (opts) { unsigned cpus= qemu_opt_get_number(opts, "cpus", 0); unsigned sockets = qemu_opt_get_number(opts, "sockets", 0); +unsigned dies = qemu_opt_get_number(opts, "dies", 1); unsigned cores = qemu_opt_get_number(opts, "cores", 0); unsigned threads = qemu_opt_get_number(opts, "threads", 0); @@ -1560,24 +1563,24 @@ void pc_smp_parse(MachineState *ms, QemuOpts *opts) threads = threads > 0 ? threads : 1; if (cpus == 0) { sockets = sockets > 0 ? sockets : 1; -cpus = cores * threads * sockets; +cpus = cores * threads * dies * sockets; } else { ms->smp.max_cpus = qemu_opt_get_number(opts, "maxcpus", cpus); -sockets = ms->smp.max_cpus / (cores * threads); +sockets = ms->smp.max_cpus / (cores * threads * dies); } } else if (cores == 0) { threads = threads > 0 ? threads : 1; -cores = cpus / (sockets * threads); +cores = cpus / (sockets * dies * threads); cores = cores > 0 ? cores : 1; } else if (threads == 0) { -threads = cpus / (cores * sockets); +threads = cpus / (cores * dies * sockets); threads = threads > 0 ? 
threads : 1; -} else if (sockets * cores * threads < cpus) { +} else if (sockets * dies * cores * threads < cpus) { error_report("cpu topology: " - "sockets (%u) * cores (%u) * threads (%u) < " + "sockets (%u) * dies (%u) * cores (%u) * threads (%u) < " "smp_cpus (%u)", - sockets, cores, threads, cpus); + sockets, dies, cores, threads, cpus); exit(1); } @@ -1589,26 +1592,27 @@ void pc_smp_parse(MachineState *ms, QemuOpts *opts) exit(1); } -if (sockets * cores * threads > ms->smp.max_cpus) { +if (sockets * dies * cores * threads > ms->smp.max_cpus) { error_report("cpu topology: " - "sockets (%u) * cores (%u) * threads (%u) > " + "sockets (%u) * dies (%u) * cores (%u) * threads (%u) > " "maxcpus (%u)", - sockets, cores, threads, + sockets, dies, cores, threads, ms->smp.max_cpus); exit(1); } -if (sockets * cores * threads != ms->smp.max_cpus) { +if (sockets * dies * cores * threads != ms->smp.max_cpus) { warn_report("Invalid CPU topology deprecated: " -"sockets (%u) * cores (%u) * threads (%u) " +"sockets (%u) * dies (%u) * cores (%u) * threads (%u) " "!= maxcpus (%u)", -sockets, cores, threads, +sockets, dies, cores, threads, ms->smp.max_cpus); } ms->smp.cpus = cpus; ms->smp.cores = cores; ms->smp.threads = threads; +pcms->smp_dies = dies; } if (ms->smp.cpus > 1) { diff --git a/qemu-options.hx b/qemu-options.hx index 0d8beb4afd..a5b314a448 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -138,25 +138,26 @@ no incompatible TCG features have been enabled (e.g. icount/replay). ETEXI DEF("smp", HAS_ARG, QEMU_OPTION_smp, -"-smp [cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads][,sockets=sockets]\n" +"-smp [cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads][,dies=dies][,sockets=sockets]\n" "set the number of CPUs to 'n' [default=1]\n" "max
[Qemu-devel] [PATCH v4 0/3] Introduce cpu die topology and enable CPUID.1F for i386
This patch series introduces a new CPU topology level, 'die', for
PCMachine. It extends the virtual CPU topology to the
socket/die/core/thread model and allows the number of dies per socket
to be set via the -smp QEMU command-line option.

For i386, it updates the APIC ID generation and parsing functions and
exposes a new leaf, CPUID.1F, which is a preferred superset of leaf 0BH.
The CPUID.1F specification is in the latest Intel SDM, Vol. 2A,
page 3-190.

The guest can discover the multi-die/package topology through CPUID.1F;
the benefit is primarily for _reporting_ of the guest CPU topology.
Multi-die/package support in the guest kernel has no impact on its cache
topology, NUMA topology, Linux scheduler, or system performance.

==changelog==

v4:

- base commit: 22fa84da on github.com/ehabkost/qemu.git:machine-next
- refine comments for pc_smp_parse()
- remove the use of cpu->enable_cpuid_0x1f
- apply new logic for cpuid_min_level adjustment and drop the legacy one
- refine the way of MachineState casting in pc_smp_parse()
- [QUEUED] move test_topo_bits to the previous patch for bisectability

v3: https://patchwork.kernel.org/cover/10989013/

- add a MachineClass::smp_parse function pointer
- place the PC-specific function inside hw/i386/pc.c
- introduce die_id in a separate patch with default value 0
- set env->nr_dies in pc_new_cpu() and pc_cpu_pre_plug()
- fix a circular dependency between target/i386/cpu.c and hw/i386/pc.c
- fix cpu->die_id check in pc_cpu_pre_plug()
- Based on "[PATCH v3 00/10] Refactor cpu topo into machine properties"
- Rebase to commit 219dca61ebf41625831d4f96a720852baf44b762

v2: https://patchwork.kernel.org/cover/10953191/

- Enable cpu die-level topology only for PCMachine and X86CPU
- Minimize cpuid.0.eax to the setting value actually used by guest
- Update cmd line -smp docs for die-level configurations
- Refactoring topo-bit tests for x86_apicid_from_cpu_idx() with nr_dies
- Based on "[PATCH v3 00/10] Refactor cpu topo into machine properties"
- Rebase to commit 2259637b95bef3116cc262459271de08e038cc66

v1: https://patchwork.kernel.org/cover/10876667/

Like Xu (3):
  target/i386: Add CPUID.1F generation support for multi-dies PCMachine
  machine: Refactor smp_parse() in vl.c as MachineClass::smp_parse()
  vl.c: Add -smp, dies=* command line support and update doc

 hw/core/machine.c    | 76
 hw/i386/pc.c         | 83
 include/hw/boards.h  |  5 +++
 include/hw/i386/pc.h |  1 +
 qemu-options.hx      | 17 -
 target/i386/cpu.c    | 41 ++
 target/i386/cpu.h    |  1 +
 target/i386/kvm.c    | 12 +++
 vl.c                 | 78 +++--
 9 files changed, 233 insertions(+), 81 deletions(-)

--
2.21.0
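As a rough illustration of how the -smp values interact once `dies` is part of the topology, the following standalone sketch mirrors the missing-value rules that the series applies in pc_smp_parse(). It is not the QEMU function: `SmpConfig` and `smp_fill_missing()` are hypothetical, a value of 0 stands for "not given on the command line", errors are reduced to a return code, and a `dies` value of 0 is treated as 1 (in the series the option simply defaults to 1).

```c
/* Hypothetical container for the parsed -smp options; 0 means "not given". */
typedef struct {
    unsigned cpus, sockets, dies, cores, threads, maxcpus;
} SmpConfig;

/*
 * Fill in missing topology values, preferring sockets over cores over
 * threads. Returns 0 on success and -1 if the topology is inconsistent.
 */
static int smp_fill_missing(SmpConfig *c)
{
    unsigned cpus = c->cpus, sockets = c->sockets;
    unsigned dies = c->dies > 0 ? c->dies : 1;   /* assumption: 0 means 1 */
    unsigned cores = c->cores, threads = c->threads;
    unsigned maxcpus;

    if (cpus == 0 || sockets == 0) {
        cores = cores > 0 ? cores : 1;
        threads = threads > 0 ? threads : 1;
        if (cpus == 0) {
            sockets = sockets > 0 ? sockets : 1;
            cpus = cores * threads * dies * sockets;
        } else {
            maxcpus = c->maxcpus > 0 ? c->maxcpus : cpus;
            sockets = maxcpus / (cores * threads * dies);
        }
    } else if (cores == 0) {
        threads = threads > 0 ? threads : 1;
        cores = cpus / (sockets * dies * threads);
        cores = cores > 0 ? cores : 1;
    } else if (threads == 0) {
        threads = cpus / (cores * dies * sockets);
        threads = threads > 0 ? threads : 1;
    } else if (sockets * dies * cores * threads < cpus) {
        return -1;                      /* topology smaller than cpus */
    }

    maxcpus = c->maxcpus > 0 ? c->maxcpus : cpus;
    if (maxcpus < cpus) {
        return -1;                      /* maxcpus must be >= cpus */
    }
    if (sockets * dies * cores * threads > maxcpus) {
        return -1;                      /* topology larger than maxcpus */
    }

    c->cpus = cpus;
    c->sockets = sockets;
    c->dies = dies;
    c->cores = cores;
    c->threads = threads;
    c->maxcpus = maxcpus;
    return 0;
}
```

With this sketch, the equivalent of "-smp 4,dies=2,cores=2,threads=1" resolves to a single socket, because the four CPUs are already accounted for by 2 dies of 2 cores each.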
[Qemu-devel] [PATCH v4 2/3] machine: Refactor smp_parse() in vl.c as MachineClass::smp_parse()
To make smp_parse() more flexible and expansive, a smp_parse function pointer is added to MachineClass that machine types could override. The generic smp_parse() code in vl.c is moved to hw/core/machine.c, and become the default implementation of MachineClass::smp_parse. A PC-specific function called pc_smp_parse() has been added to hw/i386/pc.c, which in this patch changes nothing against the default one . Suggested-by: Eduardo Habkost Signed-off-by: Like Xu Reviewed-by: Eduardo Habkost --- hw/core/machine.c| 76 ++ hw/i386/pc.c | 79 include/hw/boards.h | 5 +++ include/hw/i386/pc.h | 1 + vl.c | 75 ++--- 5 files changed, 163 insertions(+), 73 deletions(-) diff --git a/hw/core/machine.c b/hw/core/machine.c index 8b8d263afe..36a838f1cb 100644 --- a/hw/core/machine.c +++ b/hw/core/machine.c @@ -11,6 +11,9 @@ */ #include "qemu/osdep.h" +#include "qemu/option.h" +#include "qapi/qmp/qerror.h" +#include "sysemu/replay.h" #include "qemu/units.h" #include "hw/boards.h" #include "qapi/error.h" @@ -728,6 +731,78 @@ void machine_set_cpu_numa_node(MachineState *machine, } } +static void smp_parse(MachineState *ms, QemuOpts *opts) +{ +if (opts) { +unsigned cpus= qemu_opt_get_number(opts, "cpus", 0); +unsigned sockets = qemu_opt_get_number(opts, "sockets", 0); +unsigned cores = qemu_opt_get_number(opts, "cores", 0); +unsigned threads = qemu_opt_get_number(opts, "threads", 0); + +/* compute missing values, prefer sockets over cores over threads */ +if (cpus == 0 || sockets == 0) { +cores = cores > 0 ? cores : 1; +threads = threads > 0 ? threads : 1; +if (cpus == 0) { +sockets = sockets > 0 ? sockets : 1; +cpus = cores * threads * sockets; +} else { +ms->smp.max_cpus = +qemu_opt_get_number(opts, "maxcpus", cpus); +sockets = ms->smp.max_cpus / (cores * threads); +} +} else if (cores == 0) { +threads = threads > 0 ? threads : 1; +cores = cpus / (sockets * threads); +cores = cores > 0 ? cores : 1; +} else if (threads == 0) { +threads = cpus / (cores * sockets); +threads = threads > 0 ? 
threads : 1; +} else if (sockets * cores * threads < cpus) { +error_report("cpu topology: " + "sockets (%u) * cores (%u) * threads (%u) < " + "smp_cpus (%u)", + sockets, cores, threads, cpus); +exit(1); +} + +ms->smp.max_cpus = +qemu_opt_get_number(opts, "maxcpus", cpus); + +if (ms->smp.max_cpus < cpus) { +error_report("maxcpus must be equal to or greater than smp"); +exit(1); +} + +if (sockets * cores * threads > ms->smp.max_cpus) { +error_report("cpu topology: " + "sockets (%u) * cores (%u) * threads (%u) > " + "maxcpus (%u)", + sockets, cores, threads, + ms->smp.max_cpus); +exit(1); +} + +if (sockets * cores * threads != ms->smp.max_cpus) { +warn_report("Invalid CPU topology deprecated: " +"sockets (%u) * cores (%u) * threads (%u) " +"!= maxcpus (%u)", +sockets, cores, threads, +ms->smp.max_cpus); +} + +ms->smp.cpus = cpus; +ms->smp.cores = cores; +ms->smp.threads = threads; +} + +if (ms->smp.cpus > 1) { +Error *blocker = NULL; +error_setg(&blocker, QERR_REPLAY_NOT_SUPPORTED, "smp"); +replay_add_blocker(blocker); +} +} + static void machine_class_init(ObjectClass *oc, void *data) { MachineClass *mc = MACHINE_CLASS(oc); @@ -735,6 +810,7 @@ static void machine_class_init(ObjectClass *oc, void *data) /* Default 128 MB as guest ram size */ mc->default_ram_size = 128 * MiB; mc->rom_file_has_mr = true; +mc->smp_parse = smp_parse; /* numa node memory size aligned on 8MB by default. * On Linux, each node's border has to be 8MB aligned diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 7b8c9caed6..092bd10d4d 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -79,6 +79,8 @@ #include "hw/i386/intel_iommu.h" #include "hw/net/ne2000-isa.h&q
Re: [Qemu-devel] [PATCH v3 7/9] target/i386: Support multi-dies when host doesn't support CPUID.1F
On 2019/6/20 7:36, Eduardo Habkost wrote:
> On Wed, Jun 19, 2019 at 04:15:46PM -0300, Eduardo Habkost wrote:
> > On Wed, Jun 12, 2019 at 04:41:02PM +0800, Like Xu wrote:
> > > In guest CPUID generation process, the cpuid_min_level would be
> > > adjusted to the maximum passed value for basic CPUID configuration
> > > and it should not be restricted by the limited value returned from
> > > cpu_x86_cpuid(). After the basic cpu_x86_cpuid() loop is finished,
> > > the cpuid_0_entry.eax needs to be configured again by the last
> > > adjusted cpuid_min_level value.
> > >
> > > If a user wants to expose CPUID.1F by passing dies > 1 for any
> > > reason without host support, a per-cpu smp topology warning will
> > > appear but it's not blocked.
> > >
> > > Signed-off-by: Like Xu
> >
> > This code doesn't look at host CPUID at all, as far as I can see.
> > Isn't it simpler to just make cpuid_x86_cpuid() return the correct
> > data?
>
> I suggest the following change instead.
>
> Signed-off-by: Eduardo Habkost

Hi Eduardo,

Your code is more reasonable and concise than mine on this, so let's not
break cpuid_x86_cpuid().

I'll remove the use of enable_cpuid_0x1f in the next version. Should I
also resend the patch series "Refactor cpu topo into machine
properties", since the rebase fixes may distract you?

> ---
>  target/i386/cpu.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 6db38e145b..d05a224092 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -5152,6 +5152,10 @@ static void x86_cpu_expand_features(X86CPU *cpu, Error **errp)
>              x86_cpu_adjust_level(cpu, &cpu->env.cpuid_min_level, 0x14);
>          }
>
> +        if (env->nr_dies > 1) {
> +            x86_cpu_adjust_level(cpu, &env->cpuid_min_level, 0x1F);
> +        }
> +
>          /* SVM requires CPUID[0x8000000A] */
>          if (env->features[FEAT_8000_0001_ECX] & CPUID_EXT3_SVM) {
>              x86_cpu_adjust_level(cpu, &env->cpuid_min_xlevel, 0x8000000A);
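The effect of the suggested hunk can be shown with a few lines of standalone C. This is a sketch only: `adjust_level()` is assumed to behave like a "raise to at least this value" helper, and `min_level_for_topology()` is a hypothetical wrapper that does not exist in QEMU.

```c
#include <stdint.h>

/* Raise *min to `value` if it is currently lower; never lower it. */
static void adjust_level(uint32_t *min, uint32_t value)
{
    if (*min < value) {
        *min = value;
    }
}

/*
 * Hypothetical wrapper: the minimum basic CPUID level is raised to 0x1F
 * only when the guest has more than one die per package, so single-die
 * guests keep the level they had before.
 */
static uint32_t min_level_for_topology(uint32_t cpuid_min_level,
                                       unsigned nr_dies)
{
    if (nr_dies > 1) {
        adjust_level(&cpuid_min_level, 0x1F);
    }
    return cpuid_min_level;
}
```

Because the helper only ever raises the level, a CPU model that already advertises a higher basic level is unaffected, and the decision depends only on the configured topology, not on the host CPUID.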
Re: [Qemu-devel] [PATCH v3 0/9] Introduce cpu die topology and enable CPUID.1F for i386
Ping for timely review.

On 2019/6/12 16:40, Like Xu wrote:
> Multi-chip packaging technology allows integration of multiple cores in
> one die and multiple dies in one single package, for example Intel
> CLX-AP or AMD EPYC.
>
> This patch series extends the CPU topology to the
> socket/die/core/thread model and allows the number of dies per socket
> to be set via the -smp QEMU command-line option.
>
> For i386, it updates the APIC ID generation and parsing functions and
> exposes a new leaf, CPUID.1F, which is a preferred superset of leaf
> 0BH. The CPUID.1F spec is on
> https://software.intel.com/en-us/articles/intel-sdm, 3-190 Vol 2A.
>
> For example, running a multi-die guest with
> -smp 4,dies=2,cores=2,threads=1 and dumping the raw CPUID data in the
> guest is expected to give the following:
>
> 0x0000001f 0x00: eax=0x00000000 ebx=0x00000001 ecx=0x00000100 edx=0x00000002
> 0x0000001f 0x01: eax=0x00000001 ebx=0x00000002 ecx=0x00000201 edx=0x00000001
> 0x0000001f 0x02: eax=0x00000002 ebx=0x00000004 ecx=0x00000502 edx=0x00000003
> 0x0000001f 0x03: eax=0x00000000 ebx=0x00000000 ecx=0x00000003 edx=0x00000001
>
> The guest can discover the multi-die/package topology through
> CPUID.1F; the benefit is primarily for _reporting_ of the (virtual)
> CPU topology. Multi-die/package support in the guest kernel has no
> impact on its cache topology, NUMA topology, Linux scheduler, or
> system performance.
>
> ==changelog==
>
> v3:
>
> - add a MachineClass::smp_parse function pointer
> - place the PC-specific function inside hw/i386/pc.c
> - introduce die_id in a separate patch with default value 0
> - set env->nr_dies in pc_new_cpu() and pc_cpu_pre_plug()
> - fix a circular dependency between target/i386/cpu.c and hw/i386/pc.c
> - fix cpu->die_id check in pc_cpu_pre_plug()
> - Based on "[PATCH v3 00/10] Refactor cpu topo into machine properties"
> - Rebase to commit 219dca61ebf41625831d4f96a720852baf44b762
>
> v2: https://patchwork.kernel.org/cover/10953191/
>
> - Enable cpu die-level topology only for PCMachine and X86CPU
> - Minimize cpuid.0.eax to the setting value actually used by guest
> - Update cmd line -smp docs for die-level configurations
> - Refactoring topo-bit tests for x86_apicid_from_cpu_idx() with nr_dies
> - Based on "[PATCH v3 00/10] Refactor cpu topo into machine properties"
> - Rebase to commit 2259637b95bef3116cc262459271de08e038cc66
>
> v1: https://patchwork.kernel.org/cover/10876667/
>
> Like Xu (9):
>   i386: Add die-level cpu topology to x86CPU on PCMachine
>   hw/i386: Adjust nr_dies with configured smp_dies for PCMachine
>   i386/cpu: Consolidate die-id validity in smp context
>   i386: Update new x86_apicid parsing rules with die_offset support
>   tests/x86-cpuid: Update testcases in test_topo_bits() with multiple dies
>   i386/cpu: Add CPUID.1F generation support for multi-dies PCMachine
>   target/i386: Support multi-dies when host doesn't support CPUID.1F
>   machine: Refactor smp_parse() in vl.c as MachineClass::smp_parse()
>   vl.c: Add -smp, dies=* command line support and update doc
>
>  hmp.c                      |   3 +
>  hw/core/machine.c          |  89 ++
>  hw/i386/pc.c               | 148 -
>  include/hw/boards.h        |   5 ++
>  include/hw/i386/pc.h       |   3 +
>  include/hw/i386/topology.h |  76 +--
>  qapi/misc.json             |   6 +-
>  qemu-options.hx            |  17 +++--
>  target/i386/cpu.c          |  53 +++--
>  target/i386/cpu.h          |   7 ++
>  target/i386/kvm.c          |  36 -
>  tests/test-x86-cpuid.c     |  84 +++--
>  vl.c                       |  78 ++-
>  13 files changed, 438 insertions(+), 167 deletions(-)
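The APIC ID generation and parsing that the cover letter refers to amounts to packing the four topology IDs into bit fields and unpacking them again. The sketch below shows one way to do that with a die field; it is an assumption-laden illustration, not the code in include/hw/i386/topology.h: `TopoIds`, `field_width()`, `apicid_from_topo()` and `topo_from_apicid()` are hypothetical names, all counts are assumed to be at least 1, and each field is assumed to be exactly as wide as needed for its count.

```c
#include <stdint.h>

/* Hypothetical topology tuple: package (socket), die, core, thread. */
typedef struct {
    unsigned pkg_id, die_id, core_id, smt_id;
} TopoIds;

/* Bits needed to number `count` items (count >= 1). */
static unsigned field_width(unsigned count)
{
    unsigned bits = 0;

    for (count -= 1; count != 0; count >>= 1) {
        bits++;
    }
    return bits;
}

/* Pack the IDs as [pkg | die | core | smt], lowest field is the thread. */
static uint32_t apicid_from_topo(unsigned dies, unsigned cores,
                                 unsigned threads, const TopoIds *t)
{
    unsigned smt_w = field_width(threads);
    unsigned core_w = field_width(cores);
    unsigned die_w = field_width(dies);

    return (t->pkg_id << (smt_w + core_w + die_w)) |
           (t->die_id << (smt_w + core_w)) |
           (t->core_id << smt_w) |
           t->smt_id;
}

/* Inverse of apicid_from_topo(): split an APIC ID back into its fields. */
static TopoIds topo_from_apicid(unsigned dies, unsigned cores,
                                unsigned threads, uint32_t apicid)
{
    unsigned smt_w = field_width(threads);
    unsigned core_w = field_width(cores);
    unsigned die_w = field_width(dies);
    TopoIds t;

    t.smt_id = apicid & ((1u << smt_w) - 1);
    t.core_id = (apicid >> smt_w) & ((1u << core_w) - 1);
    t.die_id = (apicid >> (smt_w + core_w)) & ((1u << die_w) - 1);
    t.pkg_id = apicid >> (smt_w + core_w + die_w);
    return t;
}
```

Under these assumptions, with 2 dies, 2 cores per die and 1 thread per core, the CPU on die 1, core 1 of package 0 gets APIC ID 3, and unpacking 3 returns the same tuple.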
[Qemu-devel] [PATCH v3 4/9] i386: Update new x86_apicid parsing rules with die_offset support
In new sockets/dies/cores/threads model, the apicid of logical cpu could imply die level info of guest cpu topology thus x86_apicid_from_cpu_idx() need to be refactored with #dies value, so does apicid_*_offset(). To keep semantic compatibility, the legacy pkg_offset which helps to generate CPUIDs such as 0x3 for L3 cache should be mapping to die_offset. Signed-off-by: Like Xu --- hw/i386/pc.c | 29 ++- include/hw/i386/topology.h | 76 +++--- target/i386/cpu.c | 13 --- 3 files changed, 81 insertions(+), 37 deletions(-) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 6e774c6c8e..b4dbd1064d 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -933,10 +933,11 @@ void enable_compat_apic_id_mode(void) static uint32_t x86_cpu_apic_id_from_index(MachineState *ms, unsigned int cpu_index) { +PCMachineState *pcms = PC_MACHINE(ms); uint32_t correct_id; static bool warned; -correct_id = x86_apicid_from_cpu_idx(ms->smp.cores, +correct_id = x86_apicid_from_cpu_idx(pcms->smp_dies, ms->smp.cores, ms->smp.threads, cpu_index); if (compat_apic_id_mode) { if (cpu_index != correct_id && !warned && !qtest_enabled()) { @@ -2355,18 +2356,21 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev, topo.die_id = cpu->die_id; topo.core_id = cpu->core_id; topo.smt_id = cpu->thread_id; -cpu->apic_id = apicid_from_topo_ids(smp_cores, smp_threads, &topo); +cpu->apic_id = apicid_from_topo_ids(pcms->smp_dies, smp_cores, +smp_threads, &topo); } cpu_slot = pc_find_cpu_slot(MACHINE(pcms), cpu->apic_id, &idx); if (!cpu_slot) { MachineState *ms = MACHINE(pcms); -x86_topo_ids_from_apicid(cpu->apic_id, smp_cores, smp_threads, &topo); -error_setg(errp, "Invalid CPU [socket: %u, core: %u, thread: %u] with" - " APIC ID %" PRIu32 ", valid index range 0:%d", - topo.pkg_id, topo.core_id, topo.smt_id, cpu->apic_id, - ms->possible_cpus->len - 1); +x86_topo_ids_from_apicid(cpu->apic_id, pcms->smp_dies, + smp_cores, smp_threads, &topo); +error_setg(errp, +"Invalid CPU [socket: %u, die: %u, core: %u, thread: %u] with" 
+" APIC ID %" PRIu32 ", valid index range 0:%d", +topo.pkg_id, topo.die_id, topo.core_id, topo.smt_id, +cpu->apic_id, ms->possible_cpus->len - 1); return; } @@ -2382,7 +2386,8 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev, /* TODO: move socket_id/core_id/thread_id checks into x86_cpu_realizefn() * once -smp refactoring is complete and there will be CPU private * CPUState::nr_cores and CPUState::nr_threads fields instead of globals */ -x86_topo_ids_from_apicid(cpu->apic_id, smp_cores, smp_threads, &topo); +x86_topo_ids_from_apicid(cpu->apic_id, pcms->smp_dies, + smp_cores, smp_threads, &topo); if (cpu->socket_id != -1 && cpu->socket_id != topo.pkg_id) { error_setg(errp, "property socket-id: %u doesn't match set apic-id:" " 0x%x (socket-id: %u)", cpu->socket_id, cpu->apic_id, topo.pkg_id); @@ -2679,10 +2684,12 @@ pc_cpu_index_to_props(MachineState *ms, unsigned cpu_index) static int64_t pc_get_default_cpu_node_id(const MachineState *ms, int idx) { X86CPUTopoInfo topo; + PCMachineState *pcms = PC_MACHINE(ms); assert(idx < ms->possible_cpus->len); x86_topo_ids_from_apicid(ms->possible_cpus->cpus[idx].arch_id, -ms->smp.cores, ms->smp.threads, &topo); +pcms->smp_dies, ms->smp.cores, +ms->smp.threads, &topo); return topo.pkg_id % nb_numa_nodes; } @@ -2690,6 +2697,7 @@ static const CPUArchIdList *pc_possible_cpu_arch_ids(MachineState *ms) { int i; unsigned int max_cpus = ms->smp.max_cpus; +PCMachineState *pcms = PC_MACHINE(ms); if (ms->possible_cpus) { /* @@ -2710,7 +2718,8 @@ static const CPUArchIdList *pc_possible_cpu_arch_ids(MachineState *ms) ms->possible_cpus->cpus[i].vcpus_count = 1; ms->possible_cpus->cpus[i].arch_id = x86_cpu_apic_id_from_index(ms, i); x86_topo_ids_from_apicid(ms->possible_cpus->cpus[i].arch_id, - ms->smp.cores, ms->smp.threads, &topo); + pcms->smp_dies, ms->smp.cores, + ms->smp.threads
[Qemu-devel] [PATCH v3 9/9] vl.c: Add -smp, dies=* command line support and update doc
For PC target, users could configure the number of dies per one package via command line with this patch, such as "-smp dies=2,cores=4". The parsing rules of new cpu-topology model obey the same restrictions/logic as the legacy socket/core/thread model especially on missing values computing. Signed-off-by: Like Xu --- hw/i386/pc.c| 32 ++-- qemu-options.hx | 17 + vl.c| 3 +++ 3 files changed, 30 insertions(+), 22 deletions(-) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 63b44bd2bd..8a5da4f0c1 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -1543,10 +1543,13 @@ static void pc_new_cpu(PCMachineState *pcms, int64_t apic_id, Error **errp) void pc_smp_parse(MachineState *ms, QemuOpts *opts) { -/* copy it from legacy smp_parse() in vl.c */ +PCMachineState *pcms = (PCMachineState *) +object_dynamic_cast(OBJECT(ms), TYPE_PC_MACHINE); + if (opts) { unsigned cpus= qemu_opt_get_number(opts, "cpus", 0); unsigned sockets = qemu_opt_get_number(opts, "sockets", 0); +unsigned dies = qemu_opt_get_number(opts, "dies", 1); unsigned cores = qemu_opt_get_number(opts, "cores", 0); unsigned threads = qemu_opt_get_number(opts, "threads", 0); @@ -1556,24 +1559,24 @@ void pc_smp_parse(MachineState *ms, QemuOpts *opts) threads = threads > 0 ? threads : 1; if (cpus == 0) { sockets = sockets > 0 ? sockets : 1; -cpus = cores * threads * sockets; +cpus = cores * threads * dies * sockets; } else { ms->smp.max_cpus = qemu_opt_get_number(opts, "maxcpus", cpus); -sockets = ms->smp.max_cpus / (cores * threads); +sockets = ms->smp.max_cpus / (cores * threads * dies); } } else if (cores == 0) { threads = threads > 0 ? threads : 1; -cores = cpus / (sockets * threads); +cores = cpus / (sockets * dies * threads); cores = cores > 0 ? cores : 1; } else if (threads == 0) { -threads = cpus / (cores * sockets); +threads = cpus / (cores * dies * sockets); threads = threads > 0 ? 
threads : 1; -} else if (sockets * cores * threads < cpus) { +} else if (sockets * dies * cores * threads < cpus) { error_report("cpu topology: " - "sockets (%u) * cores (%u) * threads (%u) < " + "sockets (%u) * dies (%u) * cores (%u) * threads (%u) < " "smp_cpus (%u)", - sockets, cores, threads, cpus); + sockets, dies, cores, threads, cpus); exit(1); } @@ -1585,26 +1588,27 @@ void pc_smp_parse(MachineState *ms, QemuOpts *opts) exit(1); } -if (sockets * cores * threads > ms->smp.max_cpus) { +if (sockets * dies * cores * threads > ms->smp.max_cpus) { error_report("cpu topology: " - "sockets (%u) * cores (%u) * threads (%u) > " + "sockets (%u) * dies (%u) * cores (%u) * threads (%u) > " "maxcpus (%u)", - sockets, cores, threads, + sockets, dies, cores, threads, ms->smp.max_cpus); exit(1); } -if (sockets * cores * threads != ms->smp.max_cpus) { +if (sockets * dies * cores * threads != ms->smp.max_cpus) { warn_report("Invalid CPU topology deprecated: " -"sockets (%u) * cores (%u) * threads (%u) " +"sockets (%u) * dies (%u) * cores (%u) * threads (%u) " "!= maxcpus (%u)", -sockets, cores, threads, +sockets, dies, cores, threads, ms->smp.max_cpus); } ms->smp.cpus = cpus; ms->smp.cores = cores; ms->smp.threads = threads; +pcms->smp_dies = dies; } if (ms->smp.cpus > 1) { diff --git a/qemu-options.hx b/qemu-options.hx index 0d8beb4afd..a5b314a448 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -138,25 +138,26 @@ no incompatible TCG features have been enabled (e.g. icount/replay). ETEXI DEF("smp", HAS_ARG, QEMU_OPTION_smp, -"-smp [cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads][,sockets=sockets]\n" +"-smp [cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads][,dies=dies][,sockets=sockets]\n" "
[Qemu-devel] [PATCH v3 6/9] i386/cpu: Add CPUID.1F generation support for multi-dies PCMachine
CPUID.1F, the Intel V2 Extended Topology Enumeration Leaf, is exposed
when a guest wants to emulate multiple software-visible dies within
each package. Per Intel's SDM, 0x1f is a superset of 0xb, so both leaves
can be generated by almost the same code; only the die_offset setting
differs.

If the number of dies per package is less than 2, QEMU will not expose
CPUID.1F, regardless of whether the host supports CPUID.1F.

Signed-off-by: Like Xu
---
 target/i386/cpu.c | 37 +
 target/i386/cpu.h |  4
 target/i386/kvm.c | 12
 3 files changed, 53 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 09e20a2c3b..127aff74a6 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -4437,6 +4437,42 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
             *ecx |= CPUID_TOPOLOGY_LEVEL_INVALID;
         }
 
+        assert(!(*eax & ~0x1f));
+        *ebx &= 0xffff; /* The count doesn't need to be reliable. */
+        break;
+    case 0x1F:
+        /* V2 Extended Topology Enumeration Leaf */
+        if (env->nr_dies < 2 || !cpu->enable_cpuid_0x1f) {
+            *eax = *ebx = *ecx = *edx = 0;
+            break;
+        }
+
+        *ecx = count & 0xff;
+        *edx = cpu->apic_id;
+        switch (count) {
+        case 0:
+            *eax = apicid_core_offset(env->nr_dies, cs->nr_cores,
+                                      cs->nr_threads);
+            *ebx = cs->nr_threads;
+            *ecx |= CPUID_TOPOLOGY_LEVEL_SMT;
+            break;
+        case 1:
+            *eax = apicid_die_offset(env->nr_dies, cs->nr_cores,
+                                     cs->nr_threads);
+            *ebx = cs->nr_cores * cs->nr_threads;
+            *ecx |= CPUID_TOPOLOGY_LEVEL_CORE;
+            break;
+        case 2:
+            *eax = apicid_pkg_offset(env->nr_dies, cs->nr_cores,
+                                     cs->nr_threads);
+            *ebx = env->nr_dies * cs->nr_cores * cs->nr_threads;
+            *ecx |= CPUID_TOPOLOGY_LEVEL_DIE;
+            break;
+        default:
+            *eax = 0;
+            *ebx = 0;
+            *ecx |= CPUID_TOPOLOGY_LEVEL_INVALID;
+        }
         assert(!(*eax & ~0x1f));
         *ebx &= 0xffff; /* The count doesn't need to be reliable. */
         break;
@@ -5890,6 +5926,7 @@ static Property x86_cpu_properties[] = {
     DEFINE_PROP_BOOL("full-cpuid-auto-level", X86CPU, full_cpuid_auto_level, true),
     DEFINE_PROP_STRING("hv-vendor-id", X86CPU, hyperv_vendor_id),
     DEFINE_PROP_BOOL("cpuid-0xb", X86CPU, enable_cpuid_0xb, true),
+    DEFINE_PROP_BOOL("cpuid-0x1f", X86CPU, enable_cpuid_0x1f, true),
     DEFINE_PROP_BOOL("lmce", X86CPU, enable_lmce, false),
     DEFINE_PROP_BOOL("l3-cache", X86CPU, enable_l3_cache, true),
     DEFINE_PROP_BOOL("kvm-no-smi-migration", X86CPU, kvm_no_smi_migration,
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 69495f0a8a..0434dfb62a 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -726,6 +726,7 @@ typedef uint32_t FeatureWordArray[FEATURE_WORDS];
 #define CPUID_TOPOLOGY_LEVEL_INVALID  (0U << 8)
 #define CPUID_TOPOLOGY_LEVEL_SMT      (1U << 8)
 #define CPUID_TOPOLOGY_LEVEL_CORE     (2U << 8)
+#define CPUID_TOPOLOGY_LEVEL_DIE      (5U << 8)
 
 /* MSR Feature Bits */
 #define MSR_ARCH_CAP_RDCL_NO    (1U << 0)
@@ -1444,6 +1445,9 @@ struct X86CPU {
     /* Compatibility bits for old machine types: */
     bool enable_cpuid_0xb;
 
+    /* V2 Compatibility bits for old machine types: */
+    bool enable_cpuid_0x1f;
+
     /* Enable auto level-increase for all CPUID leaves */
     bool full_cpuid_auto_level;
 
diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index 3b29ce5c0d..9b4da9b265 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -1081,6 +1081,10 @@ int kvm_arch_init_vcpu(CPUState *cs)
             }
             break;
         }
+        case 0x1f:
+            if (env->nr_dies < 2 || !cpu->enable_cpuid_0x1f) {
+                break;
+            }
         case 4:
         case 0xb:
         case 0xd:
@@ -1088,6 +1092,11 @@ int kvm_arch_init_vcpu(CPUState *cs)
                 if (i == 0xd && j == 64) {
                     break;
                 }
+
+                if (i == 0x1f && j == 64) {
+                    break;
+                }
+
                 c->function = i;
                 c->flags = KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
                 c->index = j;
@@ -1099,6 +1108,9 @@ int kvm_arch_init_vcpu(CPUState *cs)
                 if (i == 0xb && !(c->ecx & 0xff00)) {
                     break;
                 }
+                if (i == 0x1f && !(c->ecx & 0xff00)) {
+                    break;
+                }
                 if (i == 0xd && c->eax == 0) {
                     continue;
                 }
-- 
2.21.0
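The per-subleaf register values that the 0x1F case produces can be reproduced outside QEMU. The following is a rough sketch, not the QEMU code: cpuid_regs, width_for_count() and cpuid_1f() are hypothetical names, the offsets are recomputed locally instead of calling apicid_*_offset(), and the enable_cpuid_0x1f property is assumed to be on.

```c
#include <assert.h>
#include <stdint.h>

/* Level-type encodings in ECX[15:8], as defined in the patch. */
#define LEVEL_INVALID (0u << 8)
#define LEVEL_SMT     (1u << 8)
#define LEVEL_CORE    (2u << 8)
#define LEVEL_DIE     (5u << 8)

/* Hypothetical register bundle for one CPUID result. */
typedef struct {
    uint32_t eax;
    uint32_t ebx;
    uint32_t ecx;
    uint32_t edx;
} cpuid_regs;

/* Bits needed to encode IDs 0..count-1; count must be at least 1. */
unsigned width_for_count(unsigned count)
{
    unsigned width = 0;

    assert(count >= 1);
    for (count -= 1; count != 0; count >>= 1) {
        width++;
    }
    return width;
}

/* Values of CPUID.1F for one subleaf, given the guest topology. */
cpuid_regs cpuid_1f(unsigned dies, unsigned cores, unsigned threads,
                    uint32_t apic_id, uint32_t subleaf)
{
    cpuid_regs r = { 0, 0, 0, 0 };
    unsigned core_offset = width_for_count(threads);
    unsigned die_offset = core_offset + width_for_count(cores);
    unsigned pkg_offset = die_offset + width_for_count(dies);

    /* The leaf stays hidden unless there are at least two dies. */
    if (dies < 2) {
        return r;
    }

    r.ecx = subleaf & 0xff;
    r.edx = apic_id;
    switch (subleaf) {
    case 0: /* SMT level: shift to reach the core ID */
        r.eax = core_offset;
        r.ebx = threads;
        r.ecx |= LEVEL_SMT;
        break;
    case 1: /* core level: shift to reach the die ID */
        r.eax = die_offset;
        r.ebx = cores * threads;
        r.ecx |= LEVEL_CORE;
        break;
    case 2: /* die level: shift to reach the package ID */
        r.eax = pkg_offset;
        r.ebx = dies * cores * threads;
        r.ecx |= LEVEL_DIE;
        break;
    default:
        r.ecx |= LEVEL_INVALID;
        break;
    }
    r.ebx &= 0xffff; /* only the low 16 bits carry the count */
    return r;
}
```

With dies=2, cores=2, threads=1 and APIC ID 3, subleaf 2 yields eax=2, ebx=4, ecx=0x502, edx=3 in this sketch.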
[Qemu-devel] [PATCH v3 5/9] tests/x86-cpuid: Update testcases in test_topo_bits() with multiple dies
The corresponding topo_bits tests are updated to support die configurations. Signed-off-by: Like Xu --- tests/test-x86-cpuid.c | 84 ++ 1 file changed, 45 insertions(+), 39 deletions(-) diff --git a/tests/test-x86-cpuid.c b/tests/test-x86-cpuid.c index ff225006e4..1942287f33 100644 --- a/tests/test-x86-cpuid.c +++ b/tests/test-x86-cpuid.c @@ -28,74 +28,80 @@ static void test_topo_bits(void) { -/* simple tests for 1 thread per core, 1 core per socket */ -g_assert_cmpuint(apicid_smt_width(1, 1), ==, 0); -g_assert_cmpuint(apicid_core_width(1, 1), ==, 0); +/* simple tests for 1 thread per core, 1 core per die, 1 die per package */ +g_assert_cmpuint(apicid_smt_width(1, 1, 1), ==, 0); +g_assert_cmpuint(apicid_core_width(1, 1, 1), ==, 0); +g_assert_cmpuint(apicid_die_width(1, 1, 1), ==, 0); -g_assert_cmpuint(x86_apicid_from_cpu_idx(1, 1, 0), ==, 0); -g_assert_cmpuint(x86_apicid_from_cpu_idx(1, 1, 1), ==, 1); -g_assert_cmpuint(x86_apicid_from_cpu_idx(1, 1, 2), ==, 2); -g_assert_cmpuint(x86_apicid_from_cpu_idx(1, 1, 3), ==, 3); +g_assert_cmpuint(x86_apicid_from_cpu_idx(1, 1, 1, 0), ==, 0); +g_assert_cmpuint(x86_apicid_from_cpu_idx(1, 1, 1, 1), ==, 1); +g_assert_cmpuint(x86_apicid_from_cpu_idx(1, 1, 1, 2), ==, 2); +g_assert_cmpuint(x86_apicid_from_cpu_idx(1, 1, 1, 3), ==, 3); /* Test field width calculation for multiple values */ -g_assert_cmpuint(apicid_smt_width(1, 2), ==, 1); -g_assert_cmpuint(apicid_smt_width(1, 3), ==, 2); -g_assert_cmpuint(apicid_smt_width(1, 4), ==, 2); +g_assert_cmpuint(apicid_smt_width(1, 1, 2), ==, 1); +g_assert_cmpuint(apicid_smt_width(1, 1, 3), ==, 2); +g_assert_cmpuint(apicid_smt_width(1, 1, 4), ==, 2); -g_assert_cmpuint(apicid_smt_width(1, 14), ==, 4); -g_assert_cmpuint(apicid_smt_width(1, 15), ==, 4); -g_assert_cmpuint(apicid_smt_width(1, 16), ==, 4); -g_assert_cmpuint(apicid_smt_width(1, 17), ==, 5); +g_assert_cmpuint(apicid_smt_width(1, 1, 14), ==, 4); +g_assert_cmpuint(apicid_smt_width(1, 1, 15), ==, 4); +g_assert_cmpuint(apicid_smt_width(1, 
1, 16), ==, 4); +g_assert_cmpuint(apicid_smt_width(1, 1, 17), ==, 5); -g_assert_cmpuint(apicid_core_width(30, 2), ==, 5); -g_assert_cmpuint(apicid_core_width(31, 2), ==, 5); -g_assert_cmpuint(apicid_core_width(32, 2), ==, 5); -g_assert_cmpuint(apicid_core_width(33, 2), ==, 6); +g_assert_cmpuint(apicid_core_width(1, 30, 2), ==, 5); +g_assert_cmpuint(apicid_core_width(1, 31, 2), ==, 5); +g_assert_cmpuint(apicid_core_width(1, 32, 2), ==, 5); +g_assert_cmpuint(apicid_core_width(1, 33, 2), ==, 6); +g_assert_cmpuint(apicid_die_width(1, 30, 2), ==, 0); +g_assert_cmpuint(apicid_die_width(2, 30, 2), ==, 1); +g_assert_cmpuint(apicid_die_width(3, 30, 2), ==, 2); +g_assert_cmpuint(apicid_die_width(4, 30, 2), ==, 2); /* build a weird topology and see if IDs are calculated correctly */ /* This will use 2 bits for thread ID and 3 bits for core ID */ -g_assert_cmpuint(apicid_smt_width(6, 3), ==, 2); -g_assert_cmpuint(apicid_core_width(6, 3), ==, 3); -g_assert_cmpuint(apicid_pkg_offset(6, 3), ==, 5); +g_assert_cmpuint(apicid_smt_width(1, 6, 3), ==, 2); +g_assert_cmpuint(apicid_core_offset(1, 6, 3), ==, 2); +g_assert_cmpuint(apicid_die_offset(1, 6, 3), ==, 5); +g_assert_cmpuint(apicid_pkg_offset(1, 6, 3), ==, 5); -g_assert_cmpuint(x86_apicid_from_cpu_idx(6, 3, 0), ==, 0); -g_assert_cmpuint(x86_apicid_from_cpu_idx(6, 3, 1), ==, 1); -g_assert_cmpuint(x86_apicid_from_cpu_idx(6, 3, 2), ==, 2); +g_assert_cmpuint(x86_apicid_from_cpu_idx(1, 6, 3, 0), ==, 0); +g_assert_cmpuint(x86_apicid_from_cpu_idx(1, 6, 3, 1), ==, 1); +g_assert_cmpuint(x86_apicid_from_cpu_idx(1, 6, 3, 2), ==, 2); -g_assert_cmpuint(x86_apicid_from_cpu_idx(6, 3, 1 * 3 + 0), ==, +g_assert_cmpuint(x86_apicid_from_cpu_idx(1, 6, 3, 1 * 3 + 0), ==, (1 << 2) | 0); -g_assert_cmpuint(x86_apicid_from_cpu_idx(6, 3, 1 * 3 + 1), ==, +g_assert_cmpuint(x86_apicid_from_cpu_idx(1, 6, 3, 1 * 3 + 1), ==, (1 << 2) | 1); -g_assert_cmpuint(x86_apicid_from_cpu_idx(6, 3, 1 * 3 + 2), ==, +g_assert_cmpuint(x86_apicid_from_cpu_idx(1, 6, 3, 1 * 3 + 
2), ==, (1 << 2) | 2); -g_assert_cmpuint(x86_apicid_from_cpu_idx(6, 3, 2 * 3 + 0), ==, +g_assert_cmpuint(x86_apicid_from_cpu_idx(1, 6, 3, 2 * 3 + 0), ==, (2 << 2) | 0); -g_assert_cmpuint(x86_apicid_from_cpu_idx(6, 3, 2 * 3 + 1), ==, +g_assert_cmpuint(x86_apicid_from_cpu_idx(1, 6, 3, 2 * 3 + 1), ==, (2 << 2) | 1); -g_assert_cmpuint(x86_apicid_from_cpu_idx(6, 3, 2 * 3 + 2), ==, +g_assert_cmpuint(x86_apicid_from_cpu_idx(1, 6, 3, 2 * 3 + 2), ==, (2 <&
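The widths and offsets asserted in test_topo_bits() follow from a plain bit layout: each level gets just enough bits for its count, and the fields are stacked as package, die, core, thread from high to low. A minimal sketch of that layout, assuming hypothetical names (topo_ids, width_for_count(), topo_from_cpu_index(), apicid_pack()) rather than the helpers in include/hw/i386/topology.h:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the four IDs (not QEMU's X86CPUTopoInfo). */
typedef struct {
    unsigned pkg_id;
    unsigned die_id;
    unsigned core_id;
    unsigned smt_id;
} topo_ids;

/* Bits needed to encode IDs 0..count-1; count must be at least 1. */
unsigned width_for_count(unsigned count)
{
    unsigned width = 0;

    assert(count >= 1);
    for (count -= 1; count != 0; count >>= 1) {
        width++;
    }
    return width;
}

/* Split a linear CPU index into per-level IDs, threads varying fastest. */
topo_ids topo_from_cpu_index(unsigned dies, unsigned cores, unsigned threads,
                             unsigned cpu_index)
{
    topo_ids t;

    assert(dies >= 1 && cores >= 1 && threads >= 1);
    t.smt_id = cpu_index % threads;
    t.core_id = cpu_index / threads % cores;
    t.die_id = cpu_index / (cores * threads) % dies;
    t.pkg_id = cpu_index / (dies * cores * threads);
    return t;
}

/* Pack the IDs as [ pkg | die | core | smt ], thread ID in the low bits. */
uint32_t apicid_pack(unsigned dies, unsigned cores, unsigned threads,
                     topo_ids t)
{
    unsigned core_offset = width_for_count(threads);
    unsigned die_offset = core_offset + width_for_count(cores);
    unsigned pkg_offset = die_offset + width_for_count(dies);

    return ((uint32_t)t.pkg_id << pkg_offset) |
           ((uint32_t)t.die_id << die_offset) |
           ((uint32_t)t.core_id << core_offset) |
           (uint32_t)t.smt_id;
}
```

With one die, the die field is zero bits wide, so the die and package offsets coincide; that is why the test expects apicid_die_offset(1, 6, 3) and apicid_pkg_offset(1, 6, 3) to both be 5.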
[Qemu-devel] [PATCH v3 0/9] Introduce cpu die topology and enable CPUID.1F for i386
Multi-chip packaging technology allows integration of multiple cores in
one die and multiple dies in a single package, for example Intel CLX-AP
or AMD EPYC. This patch series extends the CPU topology to the
socket/die/core/thread model, allowing the number of dies per socket to
be set on the -smp qemu command line.

For i386, it upgrades the APIC ID generation and reversion functions and
exposes a new leaf called CPUID.1F, which is a preferred superset of
leaf 0BH. The CPUID.1F spec is on
https://software.intel.com/en-us/articles/intel-sdm, 3-190 Vol 2A.

E.g. we use -smp 4,dies=2,cores=2,threads=1 to run a multi-die guest and
check the raw cpuid data; the expected output from the guest is the
following:

0x0000001f 0x00: eax=0x00000000 ebx=0x00000001 ecx=0x00000100 edx=0x00000002
0x0000001f 0x01: eax=0x00000001 ebx=0x00000002 ecx=0x00000201 edx=0x00000001
0x0000001f 0x02: eax=0x00000002 ebx=0x00000004 ecx=0x00000502 edx=0x00000003
0x0000001f 0x03: eax=0x00000000 ebx=0x00000000 ecx=0x00000003 edx=0x00000001

The guest system can discover the multi-die/package topology through
CPUID.1F, and its benefit is primarily for _reporting_ of the (virtual)
CPU topology. Multi-die/package support in the guest kernel has no
impact on its cache topology, NUMA topology, Linux scheduler, or system
performance.

==changelog==

v3:

- add a MachineClass::smp_parse function pointer
- place the PC-specific function inside hw/i386/pc.c
- introduce die_id in a separate patch with default value 0
- set env->nr_dies in pc_new_cpu() and pc_cpu_pre_plug()
- fix a circular dependency between target/i386/cpu.c and hw/i386/pc.c
- fix cpu->die_id check in pc_cpu_pre_plug()
- Based on "[PATCH v3 00/10] Refactor cpu topo into machine properties"
- Rebase to commit 219dca61ebf41625831d4f96a720852baf44b762

v2: https://patchwork.kernel.org/cover/10953191/

- Enable cpu die-level topology only for PCMachine and X86CPU
- Minimize cpuid.0.eax to the setting value actually used by guest
- Update cmd line -smp docs for die-level configurations
- Refactoring topo-bit tests for x86_apicid_from_cpu_idx() with nr_dies
- Based on "[PATCH v3 00/10] Refactor cpu topo into machine properties"
- Rebase to commit 2259637b95bef3116cc262459271de08e038cc66

v1: https://patchwork.kernel.org/cover/10876667/

Like Xu (9):
  i386: Add die-level cpu topology to x86CPU on PCMachine
  hw/i386: Adjust nr_dies with configured smp_dies for PCMachine
  i386/cpu: Consolidate die-id validity in smp context
  i386: Update new x86_apicid parsing rules with die_offset support
  tests/x86-cpuid: Update testcases in test_topo_bits() with multiple dies
  i386/cpu: Add CPUID.1F generation support for multi-dies PCMachine
  target/i386: Support multi-dies when host doesn't support CPUID.1F
  machine: Refactor smp_parse() in vl.c as MachineClass::smp_parse()
  vl.c: Add -smp, dies=* command line support and update doc

 hmp.c                      |   3 +
 hw/core/machine.c          |  89 ++
 hw/i386/pc.c               | 148 -
 include/hw/boards.h        |   5 ++
 include/hw/i386/pc.h       |   3 +
 include/hw/i386/topology.h |  76 +--
 qapi/misc.json             |   6 +-
 qemu-options.hx            |  17 +++--
 target/i386/cpu.c          |  53 +++--
 target/i386/cpu.h          |   7 ++
 target/i386/kvm.c          |  36 -
 tests/test-x86-cpuid.c     |  84 +++--
 vl.c                       |  78 ++-
 13 files changed, 438 insertions(+), 167 deletions(-)

-- 
2.21.0
[Qemu-devel] [PATCH v3 1/9] i386: Add die-level cpu topology to x86CPU on PCMachine
The die level, as the first PC-specific cpu topology level, is added to
the legacy cpu topology model, which implicitly has one die per package
and in which only the numbers of sockets/cores/threads are configurable.

In the new model with die-level support, the total number of logical
processors (including offline) on board will be calculated as:

     #cpus = #sockets * #dies * #cores * #threads

and, for compatibility, the default value for #dies is initialized to
one in x86_cpu_initfn() and pc_machine_initfn().

Signed-off-by: Like Xu
---
 hw/i386/pc.c         | 9 +++--
 include/hw/i386/pc.h | 2 ++
 target/i386/cpu.c    | 1 +
 target/i386/cpu.h    | 2 ++
 4 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 12c1e08b85..9e9a42f007 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -2308,9 +2308,13 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
         return;
     }
 
-    /* if APIC ID is not set, set it based on socket/core/thread properties */
+    /*
+     * If APIC ID is not set,
+     * set it based on socket/die/core/thread properties.
+     */
     if (cpu->apic_id == UNASSIGNED_APIC_ID) {
-        int max_socket = (ms->smp.max_cpus - 1) / smp_threads / smp_cores;
+        int max_socket = (ms->smp.max_cpus - 1) /
+                                smp_threads / smp_cores / pcms->smp_dies;
 
         if (cpu->socket_id < 0) {
             error_setg(errp, "CPU socket-id is not set");
@@ -2620,6 +2624,7 @@ static void pc_machine_initfn(Object *obj)
     pcms->smbus_enabled = true;
     pcms->sata_enabled = true;
     pcms->pit_enabled = true;
+    pcms->smp_dies = 1;
 
     pc_system_flash_create(pcms);
 }
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index b260262640..fae9217e34 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -24,6 +24,7 @@
  * PCMachineState:
  * @acpi_dev: link to ACPI PM device that performs ACPI hotplug handling
  * @boot_cpus: number of present VCPUs
+ * @smp_dies: number of dies per one package
  */
 struct PCMachineState {
     /*< private >*/
@@ -59,6 +60,7 @@ struct PCMachineState {
     bool apic_xrupt_override;
     unsigned apic_id_limit;
     uint16_t boot_cpus;
+    unsigned smp_dies;
 
     /* NUMA information: */
     uint64_t numa_nodes;
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 23119699de..a16be205fe 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -5619,6 +5619,7 @@ static void x86_cpu_initfn(Object *obj)
     CPUX86State *env = &cpu->env;
     FeatureWord w;
 
+    env->nr_dies = 1;
     cpu_set_cpustate_pointers(cpu);
 
     object_property_add(obj, "family", "int",
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index edad6e1efb..5daa2eeafa 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1349,6 +1349,8 @@ typedef struct CPUX86State {
     uint64_t xss;
 
     TPRAccess tpr_access_type;
+
+    unsigned nr_dies;
 } CPUX86State;
 
 struct kvm_msrs;
-- 
2.21.0
[Qemu-devel] [PATCH v3 2/9] hw/i386: Adjust nr_dies with configured smp_dies for PCMachine
To support multi-die configurations on PCMachine, the best places to set
CPUX86State->nr_dies from the requested PCMachineState->smp_dies are
pc_new_cpu() and pc_cpu_pre_plug(). pc_new_cpu() is refactored and its
redundant parameter "const char *typename" is removed.

Suggested-by: Eduardo Habkost
Signed-off-by: Like Xu
---
 hw/i386/pc.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 9e9a42f007..af2e95a1b9 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1520,12 +1520,16 @@ void pc_acpi_smi_interrupt(void *opaque, int irq, int level)
     }
 }
 
-static void pc_new_cpu(const char *typename, int64_t apic_id, Error **errp)
+static void pc_new_cpu(PCMachineState *pcms, int64_t apic_id, Error **errp)
 {
     Object *cpu = NULL;
     Error *local_err = NULL;
+    CPUX86State *env = NULL;
 
-    cpu = object_new(typename);
+    cpu = object_new(MACHINE(pcms)->cpu_type);
+
+    env = &X86_CPU(cpu)->env;
+    env->nr_dies = pcms->smp_dies;
 
     object_property_set_uint(cpu, apic_id, "apic-id", &local_err);
     object_property_set_bool(cpu, true, "realized", &local_err);
@@ -1551,7 +1555,7 @@ void pc_hot_add_cpu(MachineState *ms, const int64_t id, Error **errp)
         return;
     }
 
-    pc_new_cpu(ms->cpu_type, apic_id, &local_err);
+    pc_new_cpu(PC_MACHINE(ms), apic_id, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         return;
@@ -1576,8 +1580,7 @@ void pc_cpus_init(PCMachineState *pcms)
                                                      ms->smp.max_cpus - 1) + 1;
     possible_cpus = mc->possible_cpu_arch_ids(ms);
     for (i = 0; i < ms->smp.cpus; i++) {
-        pc_new_cpu(possible_cpus->cpus[i].type, possible_cpus->cpus[i].arch_id,
-                   &error_fatal);
+        pc_new_cpu(pcms, possible_cpus->cpus[i].arch_id, &error_fatal);
     }
 }
 
@@ -2297,6 +2300,7 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
     CPUArchId *cpu_slot;
     X86CPUTopoInfo topo;
     X86CPU *cpu = X86_CPU(dev);
+    CPUX86State *env = &cpu->env;
     MachineState *ms = MACHINE(hotplug_dev);
     PCMachineState *pcms = PC_MACHINE(hotplug_dev);
     unsigned int smp_cores = ms->smp.cores;
@@ -2308,6 +2312,8 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
         return;
     }
 
+    env->nr_dies = pcms->smp_dies;
+
     /*
      * If APIC ID is not set,
      * set it based on socket/die/core/thread properties.
-- 
2.21.0
[Qemu-devel] [PATCH v3 8/9] machine: Refactor smp_parse() in vl.c as MachineClass::smp_parse()
To make smp_parse() more flexible and expansive, a smp_parse function pointer is added to MachineClass that machine types could override. The generic smp_parse() code in vl.c is moved to hw/core/machine.c, and become the default implementation of MachineClass::smp_parse. A PC-specific function called pc_smp_parse() has been added to hw/i386/pc.c, which in this patch changes nothing against the default one . Suggested-by: Eduardo Habkost Signed-off-by: Like Xu --- hw/core/machine.c| 77 hw/i386/pc.c | 76 +++ include/hw/boards.h | 5 +++ include/hw/i386/pc.h | 1 + vl.c | 75 ++ 5 files changed, 161 insertions(+), 73 deletions(-) diff --git a/hw/core/machine.c b/hw/core/machine.c index 9eeba448ed..d58a684abf 100644 --- a/hw/core/machine.c +++ b/hw/core/machine.c @@ -11,6 +11,9 @@ */ #include "qemu/osdep.h" +#include "qemu/option.h" +#include "qapi/qmp/qerror.h" +#include "sysemu/replay.h" #include "qemu/units.h" #include "hw/boards.h" #include "qapi/error.h" @@ -722,6 +725,79 @@ void machine_set_cpu_numa_node(MachineState *machine, } } +static void smp_parse(MachineState *ms, QemuOpts *opts) +{ +/* copy it from legacy smp_parse() in vl.c */ +if (opts) { +unsigned cpus= qemu_opt_get_number(opts, "cpus", 0); +unsigned sockets = qemu_opt_get_number(opts, "sockets", 0); +unsigned cores = qemu_opt_get_number(opts, "cores", 0); +unsigned threads = qemu_opt_get_number(opts, "threads", 0); + +/* compute missing values, prefer sockets over cores over threads */ +if (cpus == 0 || sockets == 0) { +cores = cores > 0 ? cores : 1; +threads = threads > 0 ? threads : 1; +if (cpus == 0) { +sockets = sockets > 0 ? sockets : 1; +cpus = cores * threads * sockets; +} else { +ms->smp.max_cpus = +qemu_opt_get_number(opts, "maxcpus", cpus); +sockets = ms->smp.max_cpus / (cores * threads); +} +} else if (cores == 0) { +threads = threads > 0 ? threads : 1; +cores = cpus / (sockets * threads); +cores = cores > 0 ? 
cores : 1; +} else if (threads == 0) { +threads = cpus / (cores * sockets); +threads = threads > 0 ? threads : 1; +} else if (sockets * cores * threads < cpus) { +error_report("cpu topology: " + "sockets (%u) * cores (%u) * threads (%u) < " + "smp_cpus (%u)", + sockets, cores, threads, cpus); +exit(1); +} + +ms->smp.max_cpus = +qemu_opt_get_number(opts, "maxcpus", cpus); + +if (ms->smp.max_cpus < cpus) { +error_report("maxcpus must be equal to or greater than smp"); +exit(1); +} + +if (sockets * cores * threads > ms->smp.max_cpus) { +error_report("cpu topology: " + "sockets (%u) * cores (%u) * threads (%u) > " + "maxcpus (%u)", + sockets, cores, threads, + ms->smp.max_cpus); +exit(1); +} + +if (sockets * cores * threads != ms->smp.max_cpus) { +warn_report("Invalid CPU topology deprecated: " +"sockets (%u) * cores (%u) * threads (%u) " +"!= maxcpus (%u)", +sockets, cores, threads, +ms->smp.max_cpus); +} + +ms->smp.cpus = cpus; +ms->smp.cores = cores; +ms->smp.threads = threads; +} + +if (ms->smp.cpus > 1) { +Error *blocker = NULL; +error_setg(&blocker, QERR_REPLAY_NOT_SUPPORTED, "smp"); +replay_add_blocker(blocker); +} +} + static void machine_class_init(ObjectClass *oc, void *data) { MachineClass *mc = MACHINE_CLASS(oc); @@ -729,6 +805,7 @@ static void machine_class_init(ObjectClass *oc, void *data) /* Default 128 MB as guest ram size */ mc->default_ram_size = 128 * MiB; mc->rom_file_has_mr = true; +mc->smp_parse = smp_parse; /* numa node memory size aligned on 8MB by default. * On Linux, each node's border has to be 8MB aligned diff --git a/hw/i386/pc.c b/hw/i386/pc.c index b4dbd1064d..63b44bd2bd 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -78,6 +78,8 @@ #include "hw/i386/intel_iommu.h" #include "
[Qemu-devel] [PATCH v3 3/9] i386/cpu: Consolidate die-id validity in smp context
The field die_id (default as 0) and has_die_id are introduced to X86CPU. Following the legacy smp check rules, the die_id validity is added to the same contexts as leagcy smp variables such as hmp_hotpluggable_cpus(), machine_set_cpu_numa_node(), cpu_slot_to_string() and pc_cpu_pre_plug(). Acked-by: Dr. David Alan Gilbert Signed-off-by: Like Xu --- hmp.c | 3 +++ hw/core/machine.c | 12 hw/i386/pc.c | 14 ++ include/hw/i386/topology.h | 2 ++ qapi/misc.json | 6 -- target/i386/cpu.c | 2 ++ target/i386/cpu.h | 1 + 7 files changed, 38 insertions(+), 2 deletions(-) diff --git a/hmp.c b/hmp.c index be5e345c6f..b567c86628 100644 --- a/hmp.c +++ b/hmp.c @@ -3113,6 +3113,9 @@ void hmp_hotpluggable_cpus(Monitor *mon, const QDict *qdict) if (c->has_socket_id) { monitor_printf(mon, "socket-id: \"%" PRIu64 "\"\n", c->socket_id); } +if (c->has_die_id) { +monitor_printf(mon, "die-id: \"%" PRIu64 "\"\n", c->die_id); +} if (c->has_core_id) { monitor_printf(mon, "core-id: \"%" PRIu64 "\"\n", c->core_id); } diff --git a/hw/core/machine.c b/hw/core/machine.c index f1a0f45f9c..9eeba448ed 100644 --- a/hw/core/machine.c +++ b/hw/core/machine.c @@ -679,6 +679,11 @@ void machine_set_cpu_numa_node(MachineState *machine, return; } +if (props->has_die_id && !slot->props.has_die_id) { +error_setg(errp, "die-id is not supported"); +return; +} + /* skip slots with explicit mismatch */ if (props->has_thread_id && props->thread_id != slot->props.thread_id) { continue; @@ -688,6 +693,10 @@ void machine_set_cpu_numa_node(MachineState *machine, continue; } +if (props->has_die_id && props->die_id != slot->props.die_id) { +continue; +} + if (props->has_socket_id && props->socket_id != slot->props.socket_id) { continue; } @@ -945,6 +954,9 @@ static char *cpu_slot_to_string(const CPUArchId *cpu) if (cpu->props.has_socket_id) { g_string_append_printf(s, "socket-id: %"PRId64, cpu->props.socket_id); } +if (cpu->props.has_die_id) { +g_string_append_printf(s, "die-id: %"PRId64, cpu->props.die_id); +} if 
(cpu->props.has_core_id) { if (s->len) { g_string_append_printf(s, ", "); diff --git a/hw/i386/pc.c b/hw/i386/pc.c index af2e95a1b9..6e774c6c8e 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -2329,6 +2329,10 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev, error_setg(errp, "Invalid CPU socket-id: %u must be in range 0:%u", cpu->socket_id, max_socket); return; +} else if (cpu->die_id > pcms->smp_dies - 1) { +error_setg(errp, "Invalid CPU die-id: %u must be in range 0:%u", + cpu->die_id, max_socket); +return; } if (cpu->core_id < 0) { error_setg(errp, "CPU core-id is not set"); @@ -2348,6 +2352,7 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev, } topo.pkg_id = cpu->socket_id; +topo.die_id = cpu->die_id; topo.core_id = cpu->core_id; topo.smt_id = cpu->thread_id; cpu->apic_id = apicid_from_topo_ids(smp_cores, smp_threads, &topo); @@ -2385,6 +2390,13 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev, } cpu->socket_id = topo.pkg_id; +if (cpu->die_id != -1 && cpu->die_id != topo.die_id) { +error_setg(errp, "property die-id: %u doesn't match set apic-id:" +" 0x%x (die-id: %u)", cpu->die_id, cpu->apic_id, topo.die_id); +return; +} +cpu->die_id = topo.die_id; + if (cpu->core_id != -1 && cpu->core_id != topo.core_id) { error_setg(errp, "property core-id: %u doesn't match set apic-id:" " 0x%x (core-id: %u)", cpu->core_id, cpu->apic_id, topo.core_id); @@ -2701,6 +2713,8 @@ static const CPUArchIdList *pc_possible_cpu_arch_ids(MachineState *ms) ms->smp.cores, ms->smp.threads, &topo); ms->possible_cpus->cpus[i].props.has_socket_id = true; ms->possible_cpus->cpus[i].props.socket_id = topo.pkg_id; +ms->possible_cpus->cpus[i].props.has_die_id = true; +ms->possible_cpus->cpus[i].props.die_id = topo.die
[Qemu-devel] [PATCH v3 7/9] target/i386: Support multi-dies when host doesn't support CPUID.1F
In guest CPUID generation process, the cpuid_min_level would be adjusted to the maximum passed value for basic CPUID configuration and it should not be restricted by the limited value returned from cpu_x86_cpuid(). After the basic cpu_x86_cpuid() loop is finished, the cpuid_0_entry.eax needs to be configured again by the last adjusted cpuid_min_level value. If a user wants to expose CPUID.1F by passing dies > 1 for any reason without host support, a per-cpu smp topology warning will appear but it's not blocked. Signed-off-by: Like Xu --- target/i386/kvm.c | 24 ++-- 1 file changed, 22 insertions(+), 2 deletions(-) diff --git a/target/i386/kvm.c b/target/i386/kvm.c index 9b4da9b265..8bf1604d2b 100644 --- a/target/i386/kvm.c +++ b/target/i386/kvm.c @@ -931,12 +931,12 @@ int kvm_arch_init_vcpu(CPUState *cs) struct kvm_cpuid_entry2 *c; uint32_t signature[3]; int kvm_base = KVM_CPUID_SIGNATURE; -int r; +int r, cpuid_0_entry, cpuid_min_level; Error *local_err = NULL; memset(&cpuid_data, 0, sizeof(cpuid_data)); -cpuid_i = 0; +cpuid_i = cpuid_0_entry = cpuid_min_level = 0; r = kvm_arch_set_tsc_khz(cs); if (r < 0) { @@ -1050,6 +1050,12 @@ int kvm_arch_init_vcpu(CPUState *cs) cpu_x86_cpuid(env, 0, 0, &limit, &unused, &unused, &unused); +/* Allow 0x1f setting regardless of kvm support if nr_dies > 1 */ +if (limit < 0x1f && env->nr_dies > 1 && cpu->enable_cpuid_0x1f) { +limit = env->cpuid_level = env->cpuid_min_level = 0x1f; +warn_report("CPU topology: the CPUID.1F isn't supported on the host."); +} + for (i = 0; i <= limit; i++) { if (cpuid_i == KVM_MAX_CPUID_ENTRIES) { fprintf(stderr, "unsupported level value: 0x%x\n", limit); @@ -1151,8 +1157,22 @@ int kvm_arch_init_vcpu(CPUState *cs) cpu_x86_cpuid(env, i, 0, &c->eax, &c->ebx, &c->ecx, &c->edx); break; } + +/* Remember the index of cpuid.0 leaf for reconfiguration. */ +cpuid_0_entry = (i == 0) ? (cpuid_i - 1) : cpuid_0_entry; + +/* Adjust cpuid_min_level to the maximum index of valid basic cpuids. 
*/ +cpuid_min_level = +((c->eax | c->ebx | c->ecx | c->edx | c->flags | c->index) && +(i > cpuid_min_level)) ? i : cpuid_min_level; } +env->cpuid_level = env->cpuid_min_level = cpuid_min_level; + +/* Reconfigure cpuid_0_eax value to follow CPUID.0 instruction spec.*/ +c = &cpuid_data.entries[cpuid_0_entry]; +cpu_x86_cpuid(env, 0, 0, &c->eax, &c->ebx, &c->ecx, &c->edx); + if (limit >= 0x0a) { uint32_t eax, edx; -- 2.21.0
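The cpuid_min_level adjustment in the loop above amounts to remembering the highest basic leaf whose entry carries any non-zero data. A minimal sketch of that scan, assuming a hypothetical cpuid_entry struct that only mirrors the fields the patch looks at (it is not struct kvm_cpuid_entry2) and a hypothetical function name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields of a CPUID entry used here. */
typedef struct {
    uint32_t function;
    uint32_t index;
    uint32_t flags;
    uint32_t eax;
    uint32_t ebx;
    uint32_t ecx;
    uint32_t edx;
} cpuid_entry;

/*
 * Return the highest leaf number among entries that hold any non-zero
 * register, flag or index value. The caller is expected to pass basic
 * leaves only, as the loop in the patch does.
 */
uint32_t highest_valid_basic_leaf(const cpuid_entry *entries, size_t n)
{
    uint32_t level = 0;
    size_t i;

    assert(entries != NULL || n == 0);
    for (i = 0; i < n; i++) {
        const cpuid_entry *c = &entries[i];

        if ((c->eax | c->ebx | c->ecx | c->edx | c->flags | c->index) &&
            c->function > level) {
            level = c->function;
        }
    }
    return level;
}
```

The returned value corresponds to what the patch writes back into env->cpuid_level and env->cpuid_min_level before regenerating the CPUID.0 entry.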
Re: [Qemu-devel] [PATCH v2 1/5] target/i386: Add cpu die-level topology support for X86CPU
On 2019/6/6 11:32, Eduardo Habkost wrote: On Tue, May 21, 2019 at 12:50:52AM +0800, Like Xu wrote: The die-level as the first PC-specific cpu topology is added to the leagcy cpu topology model which only covers sockets/cores/threads. In the new model with die-level support, the total number of logical processors (including offline) on board will be calculated as: #cpus = #sockets * #dies * #cores * #threads and considering compatibility, the default value for #dies is 1. A new set of die-related variables are added in smp context and the CPUX86State.nr_dies is assigned in x86_cpu_initfn() from PCMachineState. Signed-off-by: Like Xu --- hw/i386/pc.c | 3 +++ include/hw/i386/pc.h | 2 ++ include/hw/i386/topology.h | 2 ++ qapi/misc.json | 6 -- target/i386/cpu.c | 9 + target/i386/cpu.h | 3 +++ 6 files changed, 23 insertions(+), 2 deletions(-) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 896c22e32e..83ab53c814 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -2341,6 +2341,7 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev, topo.pkg_id = cpu->socket_id; topo.core_id = cpu->core_id; +topo.die_id = cpu->die_id; topo.smt_id = cpu->thread_id; cpu->apic_id = apicid_from_topo_ids(smp_cores, smp_threads, &topo); } @@ -2692,6 +2693,8 @@ static const CPUArchIdList *pc_possible_cpu_arch_ids(MachineState *ms) ms->smp.cores, ms->smp.threads, &topo); ms->possible_cpus->cpus[i].props.has_socket_id = true; ms->possible_cpus->cpus[i].props.socket_id = topo.pkg_id; +ms->possible_cpus->cpus[i].props.has_die_id = true; +ms->possible_cpus->cpus[i].props.die_id = topo.die_id; ms->possible_cpus->cpus[i].props.has_core_id = true; ms->possible_cpus->cpus[i].props.core_id = topo.core_id; ms->possible_cpus->cpus[i].props.has_thread_id = true; diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h index ce3c22951e..b5faf2ede9 100644 --- a/include/hw/i386/pc.h +++ b/include/hw/i386/pc.h @@ -24,6 +24,7 @@ * PCMachineState: * @acpi_dev: link to ACPI PM device that performs ACPI hotplug 
handling * @boot_cpus: number of present VCPUs + * @smp_dies: number of dies per one package */ struct PCMachineState { /*< private >*/ @@ -59,6 +60,7 @@ struct PCMachineState { bool apic_xrupt_override; unsigned apic_id_limit; uint16_t boot_cpus; +unsigned smp_dies; /* NUMA information: */ uint64_t numa_nodes; diff --git a/include/hw/i386/topology.h b/include/hw/i386/topology.h index 1ebaee0f76..7f80498eb3 100644 --- a/include/hw/i386/topology.h +++ b/include/hw/i386/topology.h @@ -47,6 +47,7 @@ typedef uint32_t apic_id_t; typedef struct X86CPUTopoInfo { unsigned pkg_id; +unsigned die_id; Isn't it better to add this field only on patch 4/5? unsigned core_id; unsigned smt_id; } X86CPUTopoInfo; @@ -130,6 +131,7 @@ static inline void x86_topo_ids_from_apicid(apic_id_t apicid, topo->core_id = (apicid >> apicid_core_offset(nr_cores, nr_threads)) & ~(0xUL << apicid_core_width(nr_cores, nr_threads)); topo->pkg_id = apicid >> apicid_pkg_offset(nr_cores, nr_threads); +topo->die_id = -1; Why are you setting die_id = -1 here? Hi Eduardo,thanks for your comments and support. Would it be a better way to introduce all die related variables including has_die_id/nr_dies/cpu->die_id/topo.die_id/smp_dies in one patch for consistency check and backport convenient? In this case the default value for topo->die_id would be 0 (for sure, one die per package) with has_die_id = false. Is that acceptable to you? If die_id isn't valid yet, isn't it better to keep has_die_id = false at pc_possible_cpu_arch_ids() above, and set has_die_id = true only on patch 4/5? 
} /* Make APIC ID for the CPU 'cpu_index' diff --git a/qapi/misc.json b/qapi/misc.json index 8b3ca4fdd3..cd236c89b3 100644 --- a/qapi/misc.json +++ b/qapi/misc.json @@ -2924,10 +2924,11 @@ # # @node-id: NUMA node ID the CPU belongs to # @socket-id: socket number within node/board the CPU belongs to -# @core-id: core number within socket the CPU belongs to +# @die-id: die number within node/board the CPU belongs to (Since 4.1) +# @core-id: core number within die the CPU belongs to # @thread-id: thread number within core the CPU belongs to # -# Note: currently there are 4 properties that could be present +# Note: currently there are 5 properties that could be present # but management should be prepared to pass through other # properties with device_add command to allow for future # interface extension. This also requires the filed names to be kept in @@ -2938
Re: [Qemu-devel] [Qemu-ppc] [PATCH v3 04/10] hw/ppc: Replace global smp variables with machine smp properties
On 2019/6/6 16:20, Greg Kurz wrote: On Thu, 6 Jun 2019 13:07:32 +1000 David Gibson wrote: On Wed, Jun 05, 2019 at 11:54:56PM -0300, Eduardo Habkost wrote: On Wed, Jun 05, 2019 at 11:52:41PM -0300, Eduardo Habkost wrote: On Sun, May 19, 2019 at 04:54:22AM +0800, Like Xu wrote: The global smp variables in ppc are replaced with smp machine properties. A local variable of the same name would be introduced in the declaration phase if it's used widely in the context OR replace it on the spot if it's only used once. No semantic changes. Signed-off-by: Like Xu Any objections from the ppc maintainers to queueing this through the Machine Core tree? Oops, CCing the ppc maintainers. No objection here. Acked-by: David Gibson Just one nit... [...] diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c index ee24212765..c9ffe9786c 100644 --- a/hw/ppc/spapr_rtas.c +++ b/hw/ppc/spapr_rtas.c @@ -231,6 +231,8 @@ static void rtas_ibm_get_system_parameter(PowerPCCPU *cpu, target_ulong args, uint32_t nret, target_ulong rets) { +MachineState *ms = MACHINE(qdev_get_machine()); rtas_ibm_get_system_parameter() has a SpaprMachineState *spapr argument, no need to rely on qdev_get_machine(). I will fix it in the next (rebased) version. Thank you, Greg. But this can be fixed in a followup patch I guess. Not worth holding the patchset because of that. +unsigned int max_cpus = ms->smp.max_cpus; target_ulong parameter = rtas_ld(args, 0); target_ulong buffer = rtas_ld(args, 1); target_ulong length = rtas_ld(args, 2); @@ -244,7 +246,7 @@ static void rtas_ibm_get_system_parameter(PowerPCCPU *cpu, "MaxPlatProcs=%d", max_cpus, current_machine->ram_size / MiB, - smp_cpus, + ms->smp.cpus, max_cpus); ret = sysparm_st(buffer, length, param_val, strlen(param_val) + 1); g_free(param_val);
[Qemu-devel] [QUESTION] How to reduce network latency to improve netperf TCP_RR drastically?
Hi Michael,

At https://www.linux-kvm.org/page/NetworkingTodo, there is an entry for
network latency saying:

---
reduce networking latency:
  allow handling short packets from softirq or VCPU context
  Plan:
    We are going through the scheduler 3 times
    (could be up to 5 if softirqd is involved)
    Consider RX: host irq -> io thread -> VCPU thread ->
    guest irq -> guest thread.
    This adds a lot of latency.
    We can cut it by some 1.5x if we do a bit of work
    either in the VCPU or softirq context.
  Testing: netperf TCP RR - should be improved drastically
           netperf TCP STREAM guest to host - no regression
  Contact: MST
---

I would like to contribute to improving netperf TCP_RR.
Could you please share more ideas, plans or implementation details
to make it happen?

Thanks,
Like Xu
Re: [Qemu-devel] [PATCH v3 00/10] Refactor cpu topo into machine properties
Ping for [PATCH v3 00/10] Refactor cpu topo into machine properties.

On 2019/5/26 21:51, Like Xu wrote:
> On 2019/5/19 4:54, Like Xu wrote:
>> This patch series turns the existing cores/threads/sockets into machine
>> properties and gets rid of the global smp_* variables they currently use.
>>
>> The purpose of getting rid of the globals is to disentangle layer
>> violations. Let's do it one step at a time by replacing smp_foo with
>> as few qdev_get_machine() calls as possible, and defer other related
>> refactoring efforts.
>
> Hi Eduardo & Igor,
>
> Do you have any comments on this new version of the CpuTopology refactoring?
>
> With this patch series, we may move forward to review
> [Qemu-devel] [PATCH v2 0/5] Introduce cpu die topology and enable CPUID.1F for i386.
>
> Thanks,
> Like Xu
>
>> ==changelog==
>>
>> v3:
>>
>> - rephrase commit messages
>> - s/of/of present/ for CpuTopology comment
>> - drop redundant arguments such as cpu_type
>> - use ms instead of macs in migration context
>> - rebase to commit 1b46b4daa6
Re: [Qemu-devel] [PATCH v2 2/5] i386/cpu: Consolidate die-id validity in smp context
On 2019/5/22 1:12, Dr. David Alan Gilbert wrote:
> * Like Xu (like...@linux.intel.com) wrote:
>> Following the legacy smp check rules, the die_id validity is added to
>> the same contexts as legacy smp variables such as hmp_hotpluggable_cpus(),
>> machine_set_cpu_numa_node(), cpu_slot_to_string() and pc_cpu_pre_plug().
>>
>> Signed-off-by: Like Xu
>> ---
>>  hmp.c             |  3 +++
>>  hw/core/machine.c | 12 ++++++++++++
>>  hw/i386/pc.c      | 11 +++++++++++
>>  3 files changed, 26 insertions(+)
>>
>> diff --git a/hmp.c b/hmp.c
>> index 56a3ed7375..7deb7b7226 100644
>> --- a/hmp.c
>> +++ b/hmp.c
>> @@ -3112,6 +3112,9 @@ void hmp_hotpluggable_cpus(Monitor *mon, const QDict *qdict)
>>          if (c->has_socket_id) {
>>              monitor_printf(mon, "socket-id: \"%" PRIu64 "\"\n", c->socket_id);
>>          }
>> +        if (c->has_die_id) {
>> +            monitor_printf(mon, "die-id: \"%" PRIu64 "\"\n", c->die_id);
>> +        }
>>          if (c->has_core_id) {
>>              monitor_printf(mon, "core-id: \"%" PRIu64 "\"\n", c->core_id);
>>          }
>> diff --git a/hw/core/machine.c b/hw/core/machine.c
>> index 5d046a43e3..5116429732 100644
>> --- a/hw/core/machine.c
>> +++ b/hw/core/machine.c
>> @@ -659,6 +659,11 @@ void machine_set_cpu_numa_node(MachineState *machine,
>>              return;
>>          }
>>
>> +        if (props->has_die_id && !slot->props.has_die_id) {
>> +            error_setg(errp, "die-id is not supported");
>> +            return;
>> +        }
>> +
>>          /* skip slots with explicit mismatch */
>>          if (props->has_thread_id && props->thread_id != slot->props.thread_id) {
>>              continue;
>> @@ -668,6 +673,10 @@ void machine_set_cpu_numa_node(MachineState *machine,
>>              continue;
>>          }
>>
>> +        if (props->has_die_id && props->die_id != slot->props.die_id) {
>> +            continue;
>> +        }
>> +
>>          if (props->has_socket_id && props->socket_id != slot->props.socket_id) {
>>              continue;
>>          }
>> @@ -925,6 +934,9 @@ static char *cpu_slot_to_string(const CPUArchId *cpu)
>>      if (cpu->props.has_socket_id) {
>>          g_string_append_printf(s, "socket-id: %"PRId64, cpu->props.socket_id);
>>      }
>> +    if (cpu->props.has_die_id) {
>> +        g_string_append_printf(s, "die-id: %"PRId64, cpu->props.die_id);
>> +    }
>>      if (cpu->props.has_core_id) {
>>          if (s->len) {
>>              g_string_append_printf(s, ", ");
>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>> index 83ab53c814..00be2463af 100644
>> --- a/hw/i386/pc.c
>> +++ b/hw/i386/pc.c
>> @@ -2321,6 +2321,10 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
>>              error_setg(errp, "Invalid CPU socket-id: %u must be in range 0:%u",
>>                         cpu->socket_id, max_socket);
>>              return;
>> +        } else if (cpu->die_id > max_socket) {
>> +            error_setg(errp, "Invalid CPU die-id: %u must be in range 0:%u",
>> +                       cpu->die_id, max_socket);
>> +            return;
>
> Can you explain why the die_id is related to max_socket?
> I'd assumed you could have a 2 socket system where each socket has 4
> dies.

Dr. David, thanks for your comments and sorry for the slow reply.

You're right about this, and the check rule for cpu->die_id in
pc_cpu_pre_plug() should be:

"else if (cpu->die_id > (pcms->smp_dies - 1))"

> However, for the HMP side of it:
>
> Acked-by: Dr. David Alan Gilbert
>
>>          }
>>          if (cpu->core_id < 0) {
>>              error_setg(errp, "CPU core-id is not set");
>> @@ -2378,6 +2382,13 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
>>      }
>>      cpu->socket_id = topo.pkg_id;
>>
>> +    if (cpu->die_id != -1 && cpu->die_id != topo.die_id) {
>> +        error_setg(errp, "property die-id: %u doesn't match set apic-id:"
>> +            " 0x%x (die-id: %u)", cpu->die_id, cpu->apic_id, topo.die_id);
>> +        return;
>> +    }
>> +    cpu->die_id = topo.die_id;
>> +
>>      if (cpu->core_id != -1 && cpu->core_id != topo.core_id) {
>>          error_setg(errp, "property core-id: %u doesn't match set apic-id:"
>>              " 0x%x (core-id: %u)", cpu->core_id, cpu->apic_id, topo.core_id);
>> --
>> 2.21.0
>>
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
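To make the point of this exchange concrete, here is a small standalone sketch of the range check being discussed, under stated assumptions. The helper names die_id_in_range() and v2_check_rejects() are hypothetical and do not exist in QEMU. The first mirrors the rule proposed in the reply (a die-id is bounded by the number of dies per package), the second mirrors the comparison as posted in v2 (die-id compared against the socket bound).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: a die-id is valid when it indexes one of the
 * smp_dies dies inside a package, i.e. 0 <= die_id <= smp_dies - 1.
 */
static bool die_id_in_range(int die_id, unsigned smp_dies)
{
    assert(smp_dies >= 1);
    return die_id >= 0 && (unsigned)die_id <= smp_dies - 1;
}

/*
 * Hypothetical mirror of the comparison posted in v2, which bounds the
 * die-id by the highest socket index instead of the per-package die count.
 */
static bool v2_check_rejects(int die_id, int max_socket)
{
    return die_id > max_socket;
}
```

With the configuration from the review comment (2 sockets, 4 dies per socket), the highest socket index is 1 and valid die-ids are 0 to 3. The v2 comparison would reject die-id 3 although it is valid, while the per-package bound accepts 0 to 3 and rejects 4.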
Re: [Qemu-devel] [PATCH v3 00/10] Refactor cpu topo into machine properties
On 2019/5/19 4:54, Like Xu wrote:
> This patch series turns the existing cores/threads/sockets into machine
> properties and gets rid of the global smp_* variables they currently use.
>
> The purpose of getting rid of the globals is to disentangle layer
> violations. Let's do it one step at a time by replacing smp_foo with
> as few qdev_get_machine() calls as possible, and defer other related
> refactoring efforts.

Hi Eduardo & Igor,

Do you have any comments on this new version of the CpuTopology refactoring?

With this patch series, we may move forward to review
[Qemu-devel] [PATCH v2 0/5] Introduce cpu die topology and enable CPUID.1F for i386.

Thanks,
Like Xu

> ==changelog==
>
> v3:
>
> - rephrase commit messages
> - s/of/of present/ for CpuTopology comment
> - drop redundant arguments such as cpu_type
> - use ms instead of macs in migration context
> - rebase to commit 1b46b4daa6
[Qemu-devel] [PATCH v2 4/5] i386/cpu: Update apicid parsing rules and topo-bit tests for dies
On Intel MCP (Multi-chip packaging) platforms, the apicid of logical cpu would imply die level info of cpu topology thus x86_apicid_from_cpu_idx() should be refactored with virtual nr_dies, so does apicid_*_offset(). To maintain semantic consistency, the pkg_offset which helps to generate CPUIDs such as 0x3 for L3 cache is mapping to die_offset from this commit. The corresponding topo_bits tests are updated to test die configurations. Signed-off-by: Like Xu --- hw/i386/pc.c | 38 +++-- include/hw/i386/topology.h | 76 -- target/i386/cpu.c | 13 +++--- tests/test-x86-cpuid.c | 84 -- 4 files changed, 133 insertions(+), 78 deletions(-) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 00be2463af..e498334cbc 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -935,10 +935,11 @@ void enable_compat_apic_id_mode(void) static uint32_t x86_cpu_apic_id_from_index(MachineState *ms, unsigned int cpu_index) { +PCMachineState *pcms = PC_MACHINE(ms); uint32_t correct_id; static bool warned; -correct_id = x86_apicid_from_cpu_idx(ms->smp.cores, +correct_id = x86_apicid_from_cpu_idx(pcms->smp_dies, ms->smp.cores, ms->smp.threads, cpu_index); if (compat_apic_id_mode) { if (cpu_index != correct_id && !warned && !qtest_enabled()) { @@ -2303,6 +2304,7 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev, PCMachineState *pcms = PC_MACHINE(hotplug_dev); unsigned int smp_cores = ms->smp.cores; unsigned int smp_threads = ms->smp.threads; +unsigned int smp_dies = pcms->smp_dies; if(!object_dynamic_cast(OBJECT(cpu), ms->cpu_type)) { error_setg(errp, "Invalid CPU type, expected cpu type: '%s'", @@ -2310,9 +2312,13 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev, return; } -/* if APIC ID is not set, set it based on socket/core/thread properties */ +/* + * If APIC ID is not set, + * set it based on socket/die/core/thread properties. 
+ */ if (cpu->apic_id == UNASSIGNED_APIC_ID) { -int max_socket = (ms->smp.max_cpus - 1) / smp_threads / smp_cores; +int max_socket = (ms->smp.max_cpus - 1) / +smp_threads / smp_cores / pcms->smp_dies; if (cpu->socket_id < 0) { error_setg(errp, "CPU socket-id is not set"); @@ -2347,18 +2353,21 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev, topo.core_id = cpu->core_id; topo.die_id = cpu->die_id; topo.smt_id = cpu->thread_id; -cpu->apic_id = apicid_from_topo_ids(smp_cores, smp_threads, &topo); +cpu->apic_id = apicid_from_topo_ids(smp_dies, smp_cores, +smp_threads, &topo); } cpu_slot = pc_find_cpu_slot(MACHINE(pcms), cpu->apic_id, &idx); if (!cpu_slot) { MachineState *ms = MACHINE(pcms); -x86_topo_ids_from_apicid(cpu->apic_id, smp_cores, smp_threads, &topo); -error_setg(errp, "Invalid CPU [socket: %u, core: %u, thread: %u] with" - " APIC ID %" PRIu32 ", valid index range 0:%d", - topo.pkg_id, topo.core_id, topo.smt_id, cpu->apic_id, - ms->possible_cpus->len - 1); +x86_topo_ids_from_apicid(cpu->apic_id, smp_dies, + smp_cores, smp_threads, &topo); +error_setg(errp, +"Invalid CPU [socket: %u, die: %u, core: %u, thread: %u] with" +" APIC ID %" PRIu32 ", valid index range 0:%d", +topo.pkg_id, topo.die_id, topo.core_id, topo.smt_id, +cpu->apic_id, ms->possible_cpus->len - 1); return; } @@ -2374,7 +2383,8 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev, /* TODO: move socket_id/core_id/thread_id checks into x86_cpu_realizefn() * once -smp refactoring is complete and there will be CPU private * CPUState::nr_cores and CPUState::nr_threads fields instead of globals */ -x86_topo_ids_from_apicid(cpu->apic_id, smp_cores, smp_threads, &topo); +x86_topo_ids_from_apicid(cpu->apic_id, smp_dies, + smp_cores, smp_threads, &topo); if (cpu->socket_id != -1 && cpu->socket_id != topo.pkg_id) { error_setg(errp, "property socket-id: %u doesn't match set apic-id:" " 0x%x (socket-id: %u)", cpu->socket_id, cpu->apic_id, topo.pkg_id); @@ -2670,10 +2680,12 @@ 
pc_cpu_index_to_props(MachineState *ms, unsigned cpu_index) static int64_t pc_get_default_cpu_node_id(const Machin
[Qemu-devel] [PATCH v2 5/5] target/i386: Add CPUID.1F generation support for multi-die PCMachine
The CPUID.1F as Intel V2 Extended Topology Enumeration Leaf would be exposed if guests want to emulate multiple software-visible die within each package. Per Intel's SDM, the 0x1f is a superset of 0xb, thus they can be generated by almost same code as 0xb except die_offset setting. If the number of dies per package is less than 2, the qemu will not expose CPUID.1F regardless of whether the host supports CPUID.1F, and in any case, cpuid.0.eax would store the maximum input value for **guest** basic CPUID. If users do want to expose CPUID.1F by passing dies > 1 for simulation without host support, there will be a smp topology warning but it is not blocking. Signed-off-by: Like Xu --- target/i386/cpu.c | 37 + target/i386/cpu.h | 4 target/i386/kvm.c | 30 -- 3 files changed, 69 insertions(+), 2 deletions(-) diff --git a/target/i386/cpu.c b/target/i386/cpu.c index 3222bd3254..cd6c9933c3 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -4417,6 +4417,42 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count, *ecx |= CPUID_TOPOLOGY_LEVEL_INVALID; } +assert(!(*eax & ~0x1f)); +*ebx &= 0x; /* The count doesn't need to be reliable. 
*/ +break; +case 0x1F: +/* V2 Extended Topology Enumeration Leaf */ +if (env->nr_dies < 2 || !cpu->enable_cpuid_0x1f) { +*eax = *ebx = *ecx = *edx = 0; +break; +} + +*ecx = count & 0xff; +*edx = cpu->apic_id; +switch (count) { +case 0: +*eax = apicid_core_offset(env->nr_dies, cs->nr_cores, +cs->nr_threads); +*ebx = cs->nr_threads; +*ecx |= CPUID_TOPOLOGY_LEVEL_SMT; +break; +case 1: +*eax = apicid_die_offset(env->nr_dies, cs->nr_cores, + cs->nr_threads); +*ebx = cs->nr_cores * cs->nr_threads; +*ecx |= CPUID_TOPOLOGY_LEVEL_CORE; +break; +case 2: +*eax = apicid_pkg_offset(env->nr_dies, cs->nr_cores, + cs->nr_threads); +*ebx = env->nr_dies * cs->nr_cores * cs->nr_threads; +*ecx |= CPUID_TOPOLOGY_LEVEL_DIE; +break; +default: +*eax = 0; +*ebx = 0; +*ecx |= CPUID_TOPOLOGY_LEVEL_INVALID; +} assert(!(*eax & ~0x1f)); *ebx &= 0x; /* The count doesn't need to be reliable. */ break; @@ -5864,6 +5900,7 @@ static Property x86_cpu_properties[] = { DEFINE_PROP_BOOL("full-cpuid-auto-level", X86CPU, full_cpuid_auto_level, true), DEFINE_PROP_STRING("hv-vendor-id", X86CPU, hyperv_vendor_id), DEFINE_PROP_BOOL("cpuid-0xb", X86CPU, enable_cpuid_0xb, true), +DEFINE_PROP_BOOL("cpuid-0x1f", X86CPU, enable_cpuid_0x1f, true), DEFINE_PROP_BOOL("lmce", X86CPU, enable_lmce, false), DEFINE_PROP_BOOL("l3-cache", X86CPU, enable_l3_cache, true), DEFINE_PROP_BOOL("kvm-no-smi-migration", X86CPU, kvm_no_smi_migration, diff --git a/target/i386/cpu.h b/target/i386/cpu.h index d5f2a60ff5..9b54c646e7 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -735,6 +735,7 @@ typedef uint32_t FeatureWordArray[FEATURE_WORDS]; #define CPUID_TOPOLOGY_LEVEL_INVALID (0U << 8) #define CPUID_TOPOLOGY_LEVEL_SMT (1U << 8) #define CPUID_TOPOLOGY_LEVEL_CORE (2U << 8) +#define CPUID_TOPOLOGY_LEVEL_DIE (5U << 8) /* MSR Feature Bits */ #define MSR_ARCH_CAP_RDCL_NO(1U << 0) @@ -1455,6 +1456,9 @@ struct X86CPU { /* Compatibility bits for old machine types: */ bool enable_cpuid_0xb; +/* V2 Compatibility bits for old machine 
types: */ +bool enable_cpuid_0x1f; + /* Enable auto level-increase for all CPUID leaves */ bool full_cpuid_auto_level; diff --git a/target/i386/kvm.c b/target/i386/kvm.c index 3b29ce5c0d..d8b8bd5c9e 100644 --- a/target/i386/kvm.c +++ b/target/i386/kvm.c @@ -931,12 +931,12 @@ int kvm_arch_init_vcpu(CPUState *cs) struct kvm_cpuid_entry2 *c; uint32_t signature[3]; int kvm_base = KVM_CPUID_SIGNATURE; -int r; +int r, cpuid_0_entry, cpuid_min_level; Error *local_err = NULL; memset(&cpuid_data, 0, sizeof(cpuid_data)); -cpuid_i = 0; +cpuid_i = cpuid_0_entry = cpuid_min_level = 0; r = kvm_arch_set_tsc_khz(cs); if (r < 0) { @@ -1050,6 +1050,11 @@ int kvm_arch_init_vcpu(CPUState *cs) cpu_x86_cpuid(env, 0, 0, &limit, &unused, &unused, &unused); +if (limit < 0x1f && env->nr_dies > 1 && cpu->enable_cpui
[Qemu-devel] [PATCH v2 0/5] Introduce cpu die topology and enable CPUID.1F for i386
Multi-chip packaging technology allows integrating multiple cores in one die
and multiple dies in a single package, for example Intel CLX-AP or AMD EPYC.
This kind of integration is enabled by high-performance, heterogeneous,
multi-die interconnect technology and is more cost-effective. QEMU and
guests may take advantage of a multi-die host for purposes such as guest
placement or energy efficiency management.

This patch series extends the CPU topology to the socket/die/core/thread
model and allows setting the number of dies per socket on the -smp QEMU
command line. For i386, it updates the APIC ID generation and parsing
functions and exposes a new leaf, CPUID.1F, which is a preferred superset
of leaf 0BH. The CPUID.1F spec is at
https://software.intel.com/en-us/articles/intel-sdm, 3-190 Vol 2A.

E.g. we use -smp 4,dies=2,cores=2,threads=1 to run an MCP kvm-guest and
check the raw cpuid data; the expected output from the guest is as follows:

0x0000001f 0x00: eax=0x00000000 ebx=0x00000001 ecx=0x00000100 edx=0x00000002
0x0000001f 0x01: eax=0x00000001 ebx=0x00000002 ecx=0x00000201 edx=0x00000001
0x0000001f 0x02: eax=0x00000002 ebx=0x00000004 ecx=0x00000502 edx=0x00000003
0x0000001f 0x03: eax=0x00000000 ebx=0x00000000 ecx=0x00000003 edx=0x00000001

==changelog==

v2:

- Enable cpu die-level topology only for PCMachine and X86CPU
- Minimize cpuid.0.eax to the setting value actually used by guest
- Update cmd line -smp docs for die-level configurations
- Refactoring topo-bit tests for x86_apicid_from_cpu_idx() with nr_dies
- Based on "[PATCH v3 00/10] Refactor cpu topo into machine properties"
- Rebase to commit 2259637b95bef3116cc262459271de08e038cc66

v1: https://patchwork.kernel.org/cover/10876667/

Like Xu (5):
  target/i386: Add cpu die-level topology support for X86CPU
  i386/cpu: Consolidate die-id validity in smp context
  vl.c: Add -smp, dies=* command line support and update -smp doc
  i386/cpu: Update apicid parsing rules and topo-bit tests for dies
  target/i386: Add CPUID.1F generation support for multi-die PCMachine

 hmp.c                      |  3
 hw/core/machine.c          | 12
 hw/i386/pc.c               | 52
 include/hw/i386/pc.h       |  2
 include/hw/i386/topology.h | 76
 qapi/misc.json             |  6
 qemu-options.hx            | 17
 target/i386/cpu.c          | 59
 target/i386/cpu.h          |  7
 target/i386/kvm.c          | 30
 tests/test-x86-cpuid.c     | 84
 vl.c                       | 89
 12 files changed, 347 insertions(+), 90 deletions(-)

-- 
2.21.0
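As a cross-check of the CPUID.1F output quoted in this cover letter, the following standalone sketch recomputes the bit offsets for -smp 4,dies=2,cores=2,threads=1. It is an illustration under assumptions, not the series' code: struct topo and all function names here are made up for the example, and the layout assumed is the one the series describes (SMT bits lowest, then core, then die, then package), with each field as wide as needed to hold count - 1.

```c
#include <assert.h>

/* Bits needed to encode IDs 0..count-1 (0 bits when count == 1). */
static unsigned bitwidth_for_count(unsigned count)
{
    unsigned width = 0;

    assert(count >= 1);
    for (count -= 1; count != 0; count >>= 1) {
        width++;
    }
    return width;
}

/* Hypothetical per-package topology description for this example. */
struct topo {
    unsigned dies;     /* dies per package */
    unsigned cores;    /* cores per die */
    unsigned threads;  /* threads per core */
};

/* Bit position where the core ID starts (shift reported at the SMT level). */
static unsigned core_offset(struct topo t)
{
    return bitwidth_for_count(t.threads);
}

/* Bit position where the die ID starts (shift reported at the core level). */
static unsigned die_offset(struct topo t)
{
    return core_offset(t) + bitwidth_for_count(t.cores);
}

/* Bit position where the package ID starts (shift reported at the die level). */
static unsigned pkg_offset(struct topo t)
{
    return die_offset(t) + bitwidth_for_count(t.dies);
}

/* Pack per-level IDs into an APIC ID, lowest level in the lowest bits. */
static unsigned apicid_from_ids(struct topo t, unsigned pkg, unsigned die,
                                unsigned core, unsigned smt)
{
    return (pkg << pkg_offset(t)) | (die << die_offset(t)) |
           (core << core_offset(t)) | smt;
}
```

For dies=2, cores=2, threads=1 the three offsets come out as 0, 1 and 2, which matches the eax values shown for sub-leaves 0, 1 and 2, and the CPU on die 1, core 1 gets APIC ID 3, which matches the edx value shown for sub-leaf 2. The logical processor counts in ebx (1, 2, 4) are threads, cores * threads, and dies * cores * threads respectively.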
[Qemu-devel] [PATCH v2 3/5] vl.c: Add -smp, dies=* command line support and update -smp doc
For PC target, users could configure the number of dies per one package via command line with this patch, such as "-smp dies=2,cores=4". A new pc-specified pc_smp_parse() is introduced and to keep the interface consistent, refactoring legacy smp_parse() to __smp_parse() is necessary. The parsing rules of new cpu-topology model obey the same restrictions/logic as the legacy socket/core/thread model especially on missing values computing. Signed-off-by: Like Xu --- qemu-options.hx | 17 +- vl.c| 89 - 2 files changed, 97 insertions(+), 9 deletions(-) diff --git a/qemu-options.hx b/qemu-options.hx index 5daa5a8fb0..7fad5b50ff 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -138,25 +138,26 @@ no incompatible TCG features have been enabled (e.g. icount/replay). ETEXI DEF("smp", HAS_ARG, QEMU_OPTION_smp, -"-smp [cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads][,sockets=sockets]\n" +"-smp [cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads][,dies=dies][,sockets=sockets]\n" "set the number of CPUs to 'n' [default=1]\n" "maxcpus= maximum number of total cpus, including\n" "offline CPUs for hotplug, etc\n" -"cores= number of CPU cores on one socket\n" +"cores= number of CPU cores on one socket (for PC, it's on one die)\n" "threads= number of threads on one CPU core\n" +"dies= number of CPU dies on one socket (for PC only)\n" "sockets= number of discrete sockets in the system\n", QEMU_ARCH_ALL) STEXI -@item -smp [cpus=]@var{n}[,cores=@var{cores}][,threads=@var{threads}][,sockets=@var{sockets}][,maxcpus=@var{maxcpus}] +@item -smp [cpus=]@var{n}[,cores=@var{cores}][,threads=@var{threads}][,dies=dies][,sockets=@var{sockets}][,maxcpus=@var{maxcpus}] @findex -smp Simulate an SMP system with @var{n} CPUs. On the PC target, up to 255 CPUs are supported. On Sparc32 target, Linux limits the number of usable CPUs to 4. -For the PC target, the number of @var{cores} per socket, the number -of @var{threads} per cores and the total number of @var{sockets} can be -specified. 
Missing values will be computed. If any on the three values is -given, the total number of CPUs @var{n} can be omitted. @var{maxcpus} -specifies the maximum number of hotpluggable CPUs. +For the PC target, the number of @var{cores} per die, the number of @var{threads} +per cores, the number of @var{dies} per packages and the total number of +@var{sockets} can be specified. Missing values will be computed. +If any on the three values is given, the total number of CPUs @var{n} can be omitted. +@var{maxcpus} specifies the maximum number of hotpluggable CPUs. ETEXI DEF("numa", HAS_ARG, QEMU_OPTION_numa, diff --git a/vl.c b/vl.c index 8d92e2d209..66b577f447 100644 --- a/vl.c +++ b/vl.c @@ -63,6 +63,7 @@ int main(int argc, char **argv) #include "sysemu/watchdog.h" #include "hw/firmware/smbios.h" #include "hw/acpi/acpi.h" +#include "hw/i386/pc.h" #include "hw/xen/xen.h" #include "hw/qdev.h" #include "hw/loader.h" @@ -1248,6 +1249,9 @@ static QemuOptsList qemu_smp_opts = { }, { .name = "sockets", .type = QEMU_OPT_NUMBER, +}, { +.name = "dies", +.type = QEMU_OPT_NUMBER, }, { .name = "cores", .type = QEMU_OPT_NUMBER, @@ -1262,7 +1266,7 @@ static QemuOptsList qemu_smp_opts = { }, }; -static void smp_parse(QemuOpts *opts) +static void __smp_parse(QemuOpts *opts) { if (opts) { unsigned cpus= qemu_opt_get_number(opts, "cpus", 0); @@ -1334,6 +1338,89 @@ static void smp_parse(QemuOpts *opts) } } +static void pc_smp_parse(QemuOpts *opts) +{ +PCMachineState *pcms = (PCMachineState *) +object_dynamic_cast(OBJECT(current_machine), TYPE_PC_MACHINE); + +unsigned cpus= qemu_opt_get_number(opts, "cpus", 0); +unsigned sockets = qemu_opt_get_number(opts, "sockets", 0); +unsigned dies = qemu_opt_get_number(opts, "dies", 1); +unsigned cores = qemu_opt_get_number(opts, "cores", 0); +unsigned threads = qemu_opt_get_number(opts, "threads", 0); + +/* compute missing values, prefer sockets over cores over threads */ +if (cpus == 0 || sockets == 0) { +cores = cores > 0 ? 
cores : 1; +threads = threads > 0 ? threads : 1; +if (cpus == 0) { +sockets = sockets > 0 ? sockets : 1; +cpus = cores * threads * dies * sockets; +} else { +current_machine-&
[Qemu-devel] [PATCH v2 1/5] target/i386: Add cpu die-level topology support for X86CPU
The die-level as the first PC-specific cpu topology is added to the leagcy cpu topology model which only covers sockets/cores/threads. In the new model with die-level support, the total number of logical processors (including offline) on board will be calculated as: #cpus = #sockets * #dies * #cores * #threads and considering compatibility, the default value for #dies is 1. A new set of die-related variables are added in smp context and the CPUX86State.nr_dies is assigned in x86_cpu_initfn() from PCMachineState. Signed-off-by: Like Xu --- hw/i386/pc.c | 3 +++ include/hw/i386/pc.h | 2 ++ include/hw/i386/topology.h | 2 ++ qapi/misc.json | 6 -- target/i386/cpu.c | 9 + target/i386/cpu.h | 3 +++ 6 files changed, 23 insertions(+), 2 deletions(-) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 896c22e32e..83ab53c814 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -2341,6 +2341,7 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev, topo.pkg_id = cpu->socket_id; topo.core_id = cpu->core_id; +topo.die_id = cpu->die_id; topo.smt_id = cpu->thread_id; cpu->apic_id = apicid_from_topo_ids(smp_cores, smp_threads, &topo); } @@ -2692,6 +2693,8 @@ static const CPUArchIdList *pc_possible_cpu_arch_ids(MachineState *ms) ms->smp.cores, ms->smp.threads, &topo); ms->possible_cpus->cpus[i].props.has_socket_id = true; ms->possible_cpus->cpus[i].props.socket_id = topo.pkg_id; +ms->possible_cpus->cpus[i].props.has_die_id = true; +ms->possible_cpus->cpus[i].props.die_id = topo.die_id; ms->possible_cpus->cpus[i].props.has_core_id = true; ms->possible_cpus->cpus[i].props.core_id = topo.core_id; ms->possible_cpus->cpus[i].props.has_thread_id = true; diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h index ce3c22951e..b5faf2ede9 100644 --- a/include/hw/i386/pc.h +++ b/include/hw/i386/pc.h @@ -24,6 +24,7 @@ * PCMachineState: * @acpi_dev: link to ACPI PM device that performs ACPI hotplug handling * @boot_cpus: number of present VCPUs + * @smp_dies: number of dies per one package */ struct 
PCMachineState { /*< private >*/ @@ -59,6 +60,7 @@ struct PCMachineState { bool apic_xrupt_override; unsigned apic_id_limit; uint16_t boot_cpus; +unsigned smp_dies; /* NUMA information: */ uint64_t numa_nodes; diff --git a/include/hw/i386/topology.h b/include/hw/i386/topology.h index 1ebaee0f76..7f80498eb3 100644 --- a/include/hw/i386/topology.h +++ b/include/hw/i386/topology.h @@ -47,6 +47,7 @@ typedef uint32_t apic_id_t; typedef struct X86CPUTopoInfo { unsigned pkg_id; +unsigned die_id; unsigned core_id; unsigned smt_id; } X86CPUTopoInfo; @@ -130,6 +131,7 @@ static inline void x86_topo_ids_from_apicid(apic_id_t apicid, topo->core_id = (apicid >> apicid_core_offset(nr_cores, nr_threads)) & ~(0xUL << apicid_core_width(nr_cores, nr_threads)); topo->pkg_id = apicid >> apicid_pkg_offset(nr_cores, nr_threads); +topo->die_id = -1; } /* Make APIC ID for the CPU 'cpu_index' diff --git a/qapi/misc.json b/qapi/misc.json index 8b3ca4fdd3..cd236c89b3 100644 --- a/qapi/misc.json +++ b/qapi/misc.json @@ -2924,10 +2924,11 @@ # # @node-id: NUMA node ID the CPU belongs to # @socket-id: socket number within node/board the CPU belongs to -# @core-id: core number within socket the CPU belongs to +# @die-id: die number within node/board the CPU belongs to (Since 4.1) +# @core-id: core number within die the CPU belongs to # @thread-id: thread number within core the CPU belongs to # -# Note: currently there are 4 properties that could be present +# Note: currently there are 5 properties that could be present # but management should be prepared to pass through other # properties with device_add command to allow for future # interface extension. 
This also requires the filed names to be kept in @@ -2938,6 +2939,7 @@ { 'struct': 'CpuInstanceProperties', 'data': { '*node-id': 'int', '*socket-id': 'int', +'*die-id': 'int', '*core-id': 'int', '*thread-id': 'int' } diff --git a/target/i386/cpu.c b/target/i386/cpu.c index 9a93dd8be7..9bd35b4965 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -55,6 +55,7 @@ #include "hw/xen/xen.h" #include "hw/i386/apic_internal.h" #include "hw/boards.h" +#include "hw/i386/pc.h" #endif #include "disas/capstone.h" @@ -5595,7 +5596,13 @@ static void x86_cpu_initfn(Object *obj) X86CPUClass *xcc = X86_CPU_GET_CLASS(obj); CPU
[Qemu-devel] [PATCH v2 2/5] i386/cpu: Consolidate die-id validity in smp context
Following the legacy smp check rules, die_id validity checks are added to
the same contexts as the legacy smp variables, such as
hmp_hotpluggable_cpus(), machine_set_cpu_numa_node(), cpu_slot_to_string()
and pc_cpu_pre_plug().

Signed-off-by: Like Xu
---
 hmp.c             |  3 +++
 hw/core/machine.c | 12
 hw/i386/pc.c      | 11 +++
 3 files changed, 26 insertions(+)

diff --git a/hmp.c b/hmp.c
index 56a3ed7375..7deb7b7226 100644
--- a/hmp.c
+++ b/hmp.c
@@ -3112,6 +3112,9 @@ void hmp_hotpluggable_cpus(Monitor *mon, const QDict *qdict)
         if (c->has_socket_id) {
             monitor_printf(mon, "socket-id: \"%" PRIu64 "\"\n", c->socket_id);
         }
+        if (c->has_die_id) {
+            monitor_printf(mon, "die-id: \"%" PRIu64 "\"\n", c->die_id);
+        }
         if (c->has_core_id) {
             monitor_printf(mon, "core-id: \"%" PRIu64 "\"\n", c->core_id);
         }
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 5d046a43e3..5116429732 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -659,6 +659,11 @@ void machine_set_cpu_numa_node(MachineState *machine,
         return;
     }
 
+    if (props->has_die_id && !slot->props.has_die_id) {
+        error_setg(errp, "die-id is not supported");
+        return;
+    }
+
     /* skip slots with explicit mismatch */
     if (props->has_thread_id && props->thread_id != slot->props.thread_id) {
         continue;
@@ -668,6 +673,10 @@ void machine_set_cpu_numa_node(MachineState *machine,
         continue;
     }
 
+    if (props->has_die_id && props->die_id != slot->props.die_id) {
+        continue;
+    }
+
     if (props->has_socket_id && props->socket_id != slot->props.socket_id) {
         continue;
     }
@@ -925,6 +934,9 @@ static char *cpu_slot_to_string(const CPUArchId *cpu)
     if (cpu->props.has_socket_id) {
         g_string_append_printf(s, "socket-id: %"PRId64, cpu->props.socket_id);
     }
+    if (cpu->props.has_die_id) {
+        g_string_append_printf(s, "die-id: %"PRId64, cpu->props.die_id);
+    }
     if (cpu->props.has_core_id) {
         if (s->len) {
             g_string_append_printf(s, ", ");
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 83ab53c814..00be2463af 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -2321,6 +2321,10 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
             error_setg(errp, "Invalid CPU socket-id: %u must be in range 0:%u",
                        cpu->socket_id, max_socket);
             return;
+        } else if (cpu->die_id > max_socket) {
+            error_setg(errp, "Invalid CPU die-id: %u must be in range 0:%u",
+                       cpu->die_id, max_socket);
+            return;
         }
         if (cpu->core_id < 0) {
             error_setg(errp, "CPU core-id is not set");
@@ -2378,6 +2382,13 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
     }
     cpu->socket_id = topo.pkg_id;
 
+    if (cpu->die_id != -1 && cpu->die_id != topo.die_id) {
+        error_setg(errp, "property die-id: %u doesn't match set apic-id:"
+            " 0x%x (die-id: %u)", cpu->die_id, cpu->apic_id, topo.die_id);
+        return;
+    }
+    cpu->die_id = topo.die_id;
+
     if (cpu->core_id != -1 && cpu->core_id != topo.core_id) {
         error_setg(errp, "property core-id: %u doesn't match set apic-id:"
             " 0x%x (core-id: %u)", cpu->core_id, cpu->apic_id, topo.core_id);
-- 
2.21.0
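For readers following the pc_cpu_pre_plug() hunks, the two die-id checks (range check, then consistency with the ID decoded from the APIC ID) can be sketched in isolation. This is a minimal standalone illustration, not QEMU code: TopoIds, check_die_id() and the max_die bound are hypothetical names, the two checks are merged into one helper for brevity, and -1 stands for "property not set" as in the patch. Note that the sketch bounds die-id by its own max_die, whereas the hunk above reuses max_socket.

```c
#include <stdio.h>

/* Hypothetical stand-in for the topology IDs decoded from an APIC ID. */
typedef struct {
    int pkg_id;
    int die_id;
    int core_id;
} TopoIds;

/*
 * Validate a user-supplied die-id (-1 means "not set").
 * Returns 0 on success and stores the effective die-id in *die_id;
 * returns -1 and prints a message if the value is out of range or
 * disagrees with the topology.
 */
static int check_die_id(int *die_id, int max_die, const TopoIds *topo)
{
    if (*die_id > max_die) {
        fprintf(stderr, "Invalid CPU die-id: %d must be in range 0:%d\n",
                *die_id, max_die);
        return -1;
    }
    if (*die_id != -1 && *die_id != topo->die_id) {
        fprintf(stderr, "property die-id: %d doesn't match topology "
                "(die-id: %d)\n", *die_id, topo->die_id);
        return -1;
    }
    /* Unset or matching: adopt the value derived from the topology. */
    *die_id = topo->die_id;
    return 0;
}
```

An unset die-id is filled in from the topology, so later queries see a concrete value, while an explicit but conflicting value is rejected instead of being silently overwritten.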
[Qemu-devel] [PATCH v3 08/10] hw/arm: Replace global smp variables with machine smp properties
The global smp variables in arm are replaced with smp machine properties. The init_cpus() and *_create_rpu() are refactored to pass MachineState. A local variable of the same name would be introduced in the declaration phase if it's used widely in the context OR replace it on the spot if it's only used once. No semantic changes. Signed-off-by: Like Xu Reviewed-by: Alistair Francis --- hw/arm/fsl-imx6.c | 6 +- hw/arm/fsl-imx6ul.c| 6 +- hw/arm/fsl-imx7.c | 7 +-- hw/arm/highbank.c | 1 + hw/arm/mcimx6ul-evk.c | 2 +- hw/arm/mcimx7d-sabre.c | 2 +- hw/arm/raspi.c | 4 ++-- hw/arm/realview.c | 1 + hw/arm/sabrelite.c | 2 +- hw/arm/vexpress.c | 16 ++-- hw/arm/virt.c | 8 +++- hw/arm/xlnx-zynqmp.c | 16 ++-- target/arm/cpu.c | 8 +++- 13 files changed, 56 insertions(+), 23 deletions(-) diff --git a/hw/arm/fsl-imx6.c b/hw/arm/fsl-imx6.c index 7b7b97f74c..ed772d5bd9 100644 --- a/hw/arm/fsl-imx6.c +++ b/hw/arm/fsl-imx6.c @@ -23,6 +23,7 @@ #include "qapi/error.h" #include "qemu-common.h" #include "hw/arm/fsl-imx6.h" +#include "hw/boards.h" #include "sysemu/sysemu.h" #include "chardev/char.h" #include "qemu/error-report.h" @@ -33,11 +34,12 @@ static void fsl_imx6_init(Object *obj) { +MachineState *ms = MACHINE(qdev_get_machine()); FslIMX6State *s = FSL_IMX6(obj); char name[NAME_SIZE]; int i; -for (i = 0; i < MIN(smp_cpus, FSL_IMX6_NUM_CPUS); i++) { +for (i = 0; i < MIN(ms->smp.cpus, FSL_IMX6_NUM_CPUS); i++) { snprintf(name, NAME_SIZE, "cpu%d", i); object_initialize_child(obj, name, &s->cpu[i], sizeof(s->cpu[i]), "cortex-a9-" TYPE_ARM_CPU, &error_abort, NULL); @@ -93,9 +95,11 @@ static void fsl_imx6_init(Object *obj) static void fsl_imx6_realize(DeviceState *dev, Error **errp) { +MachineState *ms = MACHINE(qdev_get_machine()); FslIMX6State *s = FSL_IMX6(dev); uint16_t i; Error *err = NULL; +unsigned int smp_cpus = ms->smp.cpus; if (smp_cpus > FSL_IMX6_NUM_CPUS) { error_setg(errp, "%s: Only %d CPUs are supported (%d requested)", diff --git a/hw/arm/fsl-imx6ul.c b/hw/arm/fsl-imx6ul.c 
index 4b56bfa8d1..74b8ecbbb6 100644 --- a/hw/arm/fsl-imx6ul.c +++ b/hw/arm/fsl-imx6ul.c @@ -21,6 +21,7 @@ #include "qemu-common.h" #include "hw/arm/fsl-imx6ul.h" #include "hw/misc/unimp.h" +#include "hw/boards.h" #include "sysemu/sysemu.h" #include "qemu/error-report.h" @@ -28,11 +29,12 @@ static void fsl_imx6ul_init(Object *obj) { +MachineState *ms = MACHINE(qdev_get_machine()); FslIMX6ULState *s = FSL_IMX6UL(obj); char name[NAME_SIZE]; int i; -for (i = 0; i < MIN(smp_cpus, FSL_IMX6UL_NUM_CPUS); i++) { +for (i = 0; i < MIN(ms->smp.cpus, FSL_IMX6UL_NUM_CPUS); i++) { snprintf(name, NAME_SIZE, "cpu%d", i); object_initialize_child(obj, name, &s->cpu[i], sizeof(s->cpu[i]), "cortex-a7-" TYPE_ARM_CPU, &error_abort, NULL); @@ -156,10 +158,12 @@ static void fsl_imx6ul_init(Object *obj) static void fsl_imx6ul_realize(DeviceState *dev, Error **errp) { +MachineState *ms = MACHINE(qdev_get_machine()); FslIMX6ULState *s = FSL_IMX6UL(dev); int i; qemu_irq irq; char name[NAME_SIZE]; +unsigned int smp_cpus = ms->smp.cpus; if (smp_cpus > FSL_IMX6UL_NUM_CPUS) { error_setg(errp, "%s: Only %d CPUs are supported (%d requested)", diff --git a/hw/arm/fsl-imx7.c b/hw/arm/fsl-imx7.c index 7663ad6861..71cc414de6 100644 --- a/hw/arm/fsl-imx7.c +++ b/hw/arm/fsl-imx7.c @@ -23,6 +23,7 @@ #include "qemu-common.h" #include "hw/arm/fsl-imx7.h" #include "hw/misc/unimp.h" +#include "hw/boards.h" #include "sysemu/sysemu.h" #include "qemu/error-report.h" @@ -30,12 +31,12 @@ static void fsl_imx7_init(Object *obj) { +MachineState *ms = MACHINE(qdev_get_machine()); FslIMX7State *s = FSL_IMX7(obj); char name[NAME_SIZE]; int i; - -for (i = 0; i < MIN(smp_cpus, FSL_IMX7_NUM_CPUS); i++) { +for (i = 0; i < MIN(ms->smp.cpus, FSL_IMX7_NUM_CPUS); i++) { snprintf(name, NAME_SIZE, "cpu%d", i); object_initialize_child(obj, name, &s->cpu[i], sizeof(s->cpu[i]), ARM_CPU_TYPE_NAME("cortex-a7"), &error_abort, @@ -155,11 +156,13 @@ static void fsl_imx7_init(Object *obj) static void fsl_imx7_realize(DeviceState *dev, 
Error **errp) { +MachineState *ms = MACHINE(qdev_get_machine()); FslIMX7State *s = FSL_IMX7(dev); Object *o; int i; qe
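The replacement rule stated in the commit message (a same-named local when the value is used widely, an on-the-spot substitution when it is used once) can be sketched outside QEMU roughly as follows. The types here are trimmed-down, hypothetical stand-ins for MachineState and its smp member, and both helper functions are invented purely for illustration.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the QEMU types. */
typedef struct {
    unsigned int cpus;
    unsigned int max_cpus;
} CpuTopology;

typedef struct {
    CpuTopology smp;
} MachineState;

/* Value used once: replace the former global on the spot. */
static unsigned int highest_cpu_index(const MachineState *ms)
{
    return ms->smp.max_cpus - 1;
}

/* Value used repeatedly: introduce a local with the old global's name. */
static size_t count_in_range(const MachineState *ms,
                             const unsigned int *core_ids, size_t n)
{
    unsigned int max_cpus = ms->smp.max_cpus;
    size_t ok = 0;

    for (size_t i = 0; i < n; i++) {
        if (core_ids[i] < max_cpus) {
            ok++;
        }
    }
    return ok;
}
```

Keeping the old name as a local leaves the rest of the function body textually unchanged, which is presumably why the series can describe itself as having no semantic changes.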
[Qemu-devel] [PATCH v3 02/10] machine: Refactor smp-related call chains to pass MachineState
To get rid of the global smp_* variables we're currently using, it's recommended to pass MachineState in the list of incoming parameters for functions that use global smp variables, thus some redundant parameters are dropped. It's applied for legacy smbios_*(), *_machine_reset(), hot_add_cpu() and mips *_create_cpu(). Suggested-by: Igor Mammedov Signed-off-by: Like Xu Reviewed-by: Alistair Francis --- hw/arm/virt.c| 2 +- hw/hppa/machine.c| 2 +- hw/i386/acpi-build.c | 2 +- hw/i386/pc.c | 9 - hw/mips/mips_malta.c | 22 +++--- hw/ppc/pnv.c | 3 +-- hw/ppc/spapr.c | 3 +-- hw/s390x/s390-virtio-ccw.c | 6 +++--- hw/smbios/smbios.c | 26 +++--- include/hw/boards.h | 4 ++-- include/hw/firmware/smbios.h | 5 +++-- include/hw/i386/pc.h | 2 +- qmp.c| 2 +- vl.c | 2 +- 14 files changed, 46 insertions(+), 44 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 5331ab71e2..6b2f2e96d3 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1342,7 +1342,7 @@ static void virt_build_smbios(VirtMachineState *vms) vmc->smbios_old_sys_ver ? 
"1.0" : mc->name, false, true, SMBIOS_ENTRY_POINT_30); -smbios_get_tables(NULL, 0, &smbios_tables, &smbios_tables_len, +smbios_get_tables(MACHINE(vms), NULL, 0, &smbios_tables, &smbios_tables_len, &smbios_anchor, &smbios_anchor_len); if (smbios_anchor) { diff --git a/hw/hppa/machine.c b/hw/hppa/machine.c index d1b1d3caa4..416e67bab1 100644 --- a/hw/hppa/machine.c +++ b/hw/hppa/machine.c @@ -240,7 +240,7 @@ static void machine_hppa_init(MachineState *machine) cpu[0]->env.gr[21] = smp_cpus; } -static void hppa_machine_reset(void) +static void hppa_machine_reset(MachineState *ms) { int i; diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index b4ec14e349..c8e47e5713 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -187,7 +187,7 @@ static void acpi_get_pm_info(AcpiPmInfo *pm) pm->pcihp_io_len = 0; assert(obj); -init_common_fadt_data(obj, &pm->fadt); +init_common_fadt_data(machine, obj, &pm->fadt); if (piix) { /* w2k requires FADT(rev1) or it won't boot, keep PC compatible */ pm->fadt.rev = 1; diff --git a/hw/i386/pc.c b/hw/i386/pc.c index d98b737b8f..9bcd867ea3 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -962,7 +962,7 @@ static void pc_build_smbios(PCMachineState *pcms) /* tell smbios about cpuid version and features */ smbios_set_cpuid(cpu->env.cpuid_version, cpu->env.features[FEAT_1_EDX]); -smbios_tables = smbios_get_table_legacy(&smbios_tables_len); +smbios_tables = smbios_get_table_legacy(ms, &smbios_tables_len); if (smbios_tables) { fw_cfg_add_bytes(pcms->fw_cfg, FW_CFG_SMBIOS_ENTRIES, smbios_tables, smbios_tables_len); @@ -979,7 +979,7 @@ static void pc_build_smbios(PCMachineState *pcms) array_count++; } } -smbios_get_tables(mem_array, array_count, +smbios_get_tables(ms, mem_array, array_count, &smbios_tables, &smbios_tables_len, &smbios_anchor, &smbios_anchor_len); g_free(mem_array); @@ -1534,9 +1534,8 @@ static void pc_new_cpu(const char *typename, int64_t apic_id, Error **errp) error_propagate(errp, local_err); } -void 
pc_hot_add_cpu(const int64_t id, Error **errp) +void pc_hot_add_cpu(MachineState *ms, const int64_t id, Error **errp) { -MachineState *ms = MACHINE(qdev_get_machine()); int64_t apic_id = x86_cpu_apic_id_from_index(id); Error *local_err = NULL; @@ -2622,7 +2621,7 @@ static void pc_machine_initfn(Object *obj) pc_system_flash_create(pcms); } -static void pc_machine_reset(void) +static void pc_machine_reset(MachineState *machine) { CPUState *cs; X86CPU *cpu; diff --git a/hw/mips/mips_malta.c b/hw/mips/mips_malta.c index 439665ab45..5fe9512c24 100644 --- a/hw/mips/mips_malta.c +++ b/hw/mips/mips_malta.c @@ -1124,15 +1124,15 @@ static void main_cpu_reset(void *opaque) } } -static void create_cpu_without_cps(const char *cpu_type, +static void create_cpu_without_cps(MachineState *ms, qemu_irq *cbus_irq, qemu_irq *i8259_irq) { CPUMIPSState *env; MIPSCPU *cpu; int i; -for (i = 0; i < smp_cpus; i++) { -cpu = MIPS_CPU(cpu_create(cpu_type)); +for (i = 0; i < ms->smp.cpus; i++) { +cpu = MIPS_CPU(cpu_create(ms->cpu_type)); /* Init internal devices */ cpu_mips_irq_init_cpu(cpu); @@ -1146,7 +1146,7 @@ static void create_cpu_without_cps(const char *c
[Qemu-devel] [PATCH v3 10/10] vl.c: Replace smp global variables with smp machine properties
The global smp variables in vl.c are completely replaced with machine
properties. From this commit on, smp_cpus/smp_cores/smp_threads/max_cpus
are deprecated, and only the machine properties within MachineState are
applied and enabled.

Signed-off-by: Like Xu
Reviewed-by: Alistair Francis
---
 vl.c | 53 ++---
 1 file changed, 26 insertions(+), 27 deletions(-)

diff --git a/vl.c b/vl.c
index 15d519e371..a700c93c77 100644
--- a/vl.c
+++ b/vl.c
@@ -162,10 +162,6 @@ static Chardev **serial_hds;
 Chardev *parallel_hds[MAX_PARALLEL_PORTS];
 int win2k_install_hack = 0;
 int singlestep = 0;
-int smp_cpus;
-unsigned int max_cpus;
-int smp_cores = 1;
-int smp_threads = 1;
 int acpi_enabled = 1;
 int no_hpet = 0;
 int fd_bootchk = 1;
@@ -1282,8 +1278,9 @@ static void smp_parse(QemuOpts *opts)
                 sockets = sockets > 0 ? sockets : 1;
                 cpus = cores * threads * sockets;
             } else {
-                max_cpus = qemu_opt_get_number(opts, "maxcpus", cpus);
-                sockets = max_cpus / (cores * threads);
+                current_machine->smp.max_cpus =
+                        qemu_opt_get_number(opts, "maxcpus", cpus);
+                sockets = current_machine->smp.max_cpus / (cores * threads);
             }
         } else if (cores == 0) {
             threads = threads > 0 ? threads : 1;
@@ -1300,34 +1297,37 @@ static void smp_parse(QemuOpts *opts)
             exit(1);
         }
 
-        max_cpus = qemu_opt_get_number(opts, "maxcpus", cpus);
+        current_machine->smp.max_cpus =
+                qemu_opt_get_number(opts, "maxcpus", cpus);
 
-        if (max_cpus < cpus) {
+        if (current_machine->smp.max_cpus < cpus) {
             error_report("maxcpus must be equal to or greater than smp");
             exit(1);
         }
 
-        if (sockets * cores * threads > max_cpus) {
+        if (sockets * cores * threads > current_machine->smp.max_cpus) {
             error_report("cpu topology: "
                          "sockets (%u) * cores (%u) * threads (%u) > "
                          "maxcpus (%u)",
-                         sockets, cores, threads, max_cpus);
+                         sockets, cores, threads,
+                         current_machine->smp.max_cpus);
             exit(1);
         }
 
-        if (sockets * cores * threads != max_cpus) {
+        if (sockets * cores * threads != current_machine->smp.max_cpus) {
             warn_report("Invalid CPU topology deprecated: "
                         "sockets (%u) * cores (%u) * threads (%u) "
                         "!= maxcpus (%u)",
-                        sockets, cores, threads, max_cpus);
+                        sockets, cores, threads,
+                        current_machine->smp.max_cpus);
         }
 
-        smp_cpus = cpus;
-        smp_cores = cores;
-        smp_threads = threads;
+        current_machine->smp.cpus = cpus;
+        current_machine->smp.cores = cores;
+        current_machine->smp.threads = threads;
     }
 
-    if (smp_cpus > 1) {
+    if (current_machine->smp.cpus > 1) {
         Error *blocker = NULL;
         error_setg(&blocker, QERR_REPLAY_NOT_SUPPORTED, "smp");
         replay_add_blocker(blocker);
@@ -4128,26 +4128,25 @@ int main(int argc, char **argv, char **envp)
     machine_class->default_cpus = machine_class->default_cpus ?: 1;
 
     /* default to machine_class->default_cpus */
-    smp_cpus = machine_class->default_cpus;
-    max_cpus = machine_class->default_cpus;
+    current_machine->smp.cpus = machine_class->default_cpus;
+    current_machine->smp.max_cpus = machine_class->default_cpus;
+    current_machine->smp.cores = 1;
+    current_machine->smp.threads = 1;
 
     smp_parse(qemu_opts_find(qemu_find_opts("smp-opts"), NULL));
 
-    current_machine->smp.cpus = smp_cpus;
-    current_machine->smp.max_cpus = max_cpus;
-    current_machine->smp.cores = smp_cores;
-    current_machine->smp.threads = smp_threads;
-
     /* sanity-check smp_cpus and max_cpus against machine_class */
-    if (smp_cpus < machine_class->min_cpus) {
+    if (current_machine->smp.cpus < machine_class->min_cpus) {
         error_report("Invalid SMP CPUs %d. The min CPUs "
-                     "supported by machine '%s' is %d", smp_cpus,
+                     "supported by machine '%s' is %d",
+                     current_machine->smp.cpus,
                      machine_class->name, machine_class->min_cpus);
         exit(1);
     }
-    if (max_cpus > machine_class->max_cpus) {
+    if (current_machine->smp.max_cpus > machine_class->max_cpus) {
         error_report("Invalid SMP CPUs %d. The max CPUs "
-                     "supported by machine '%s' is %d", max_cpus,
+                     "supported by machine '%s' is %d",
+                     current_machine->smp.max_cpus,
                      machine_class->name, machine_class->max_cpus);
         exit(1);
     }
-- 
2.21.0
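To make the control flow of the smp_parse() hunks easier to follow, here is a minimal standalone sketch of the same idea: missing topology values are derived, the result is validated against maxcpus, and everything lands in a per-machine structure rather than in globals. SmpConfig and smp_fill() are hypothetical names, 0 is used to mean "option not given", and the branches marked as assumptions are not visible in the diff above. Unlike the patch, the sketch writes to the structure only after validation succeeds.

```c
#include <stdio.h>

/* Hypothetical mirror of the smp fields kept in MachineState. */
typedef struct {
    unsigned cpus;
    unsigned cores;
    unsigned threads;
    unsigned max_cpus;
} SmpConfig;

/*
 * Derive missing values (0 means "not given"), validate them, and store
 * the result in *smp. maxcpus == 0 defaults to cpus. Returns 0 on
 * success, -1 on an inconsistent topology (*smp is left untouched).
 */
static int smp_fill(SmpConfig *smp, unsigned cpus, unsigned sockets,
                    unsigned cores, unsigned threads, unsigned maxcpus)
{
    unsigned limit;

    if (cpus == 0 || sockets == 0) {
        cores = cores > 0 ? cores : 1;
        threads = threads > 0 ? threads : 1;
        if (cpus == 0) {
            sockets = sockets > 0 ? sockets : 1;
            cpus = cores * threads * sockets;
        } else {
            limit = maxcpus > 0 ? maxcpus : cpus;
            sockets = limit / (cores * threads);
        }
    } else if (cores == 0) {
        threads = threads > 0 ? threads : 1;
        /* Assumption: derive cores from the remaining values. */
        cores = cpus / (sockets * threads);
        cores = cores > 0 ? cores : 1;
    } else if (threads == 0) {
        /* Assumption: symmetric handling for a missing thread count. */
        threads = cpus / (cores * sockets);
        threads = threads > 0 ? threads : 1;
    }

    limit = maxcpus > 0 ? maxcpus : cpus;

    if (limit < cpus) {
        fprintf(stderr, "maxcpus must be equal to or greater than smp\n");
        return -1;
    }
    if (sockets * cores * threads > limit) {
        fprintf(stderr, "cpu topology: sockets (%u) * cores (%u) * "
                "threads (%u) > maxcpus (%u)\n",
                sockets, cores, threads, limit);
        return -1;
    }

    /* Commit the result to the per-machine structure, not to globals. */
    smp->cpus = cpus;
    smp->cores = cores;
    smp->threads = threads;
    smp->max_cpus = limit;
    return 0;
}
```

Passing the destination structure explicitly is the design point the series is working toward: callers no longer depend on file-scope variables being set before they run.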
[Qemu-devel] [PATCH v3 03/10] general: Replace global smp variables with smp machine properties
Basically, the context could get the MachineState reference via call chains or unrecommended qdev_get_machine() in !CONFIG_USER_ONLY mode. A local variable of the same name would be introduced in the declaration phase out of less effort OR replace it on the spot if it's only used once in the context. No semantic changes. Signed-off-by: Like Xu Reviewed-by: Alistair Francis --- accel/kvm/kvm-all.c | 4 ++-- backends/hostmem.c | 6 -- cpus.c | 6 -- exec.c | 3 ++- gdbstub.c| 4 hw/cpu/core.c| 4 +++- migration/postcopy-ram.c | 8 +++- numa.c | 1 + target/openrisc/sys_helper.c | 6 +- tcg/tcg.c| 13 - 10 files changed, 44 insertions(+), 11 deletions(-) diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index 524c4ddfbd..f8ef39d845 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -1533,8 +1533,8 @@ static int kvm_init(MachineState *ms) const char *name; int num; } num_cpus[] = { -{ "SMP", smp_cpus }, -{ "hotpluggable", max_cpus }, +{ "SMP", ms->smp.cpus }, +{ "hotpluggable", ms->smp.max_cpus }, { NULL, } }, *nc = num_cpus; int soft_vcpus_limit, hard_vcpus_limit; diff --git a/backends/hostmem.c b/backends/hostmem.c index 04baf479a1..463102aa15 100644 --- a/backends/hostmem.c +++ b/backends/hostmem.c @@ -222,6 +222,7 @@ static void host_memory_backend_set_prealloc(Object *obj, bool value, { Error *local_err = NULL; HostMemoryBackend *backend = MEMORY_BACKEND(obj); +MachineState *ms = MACHINE(qdev_get_machine()); if (backend->force_prealloc) { if (value) { @@ -241,7 +242,7 @@ static void host_memory_backend_set_prealloc(Object *obj, bool value, void *ptr = memory_region_get_ram_ptr(&backend->mr); uint64_t sz = memory_region_size(&backend->mr); -os_mem_prealloc(fd, ptr, sz, smp_cpus, &local_err); +os_mem_prealloc(fd, ptr, sz, ms->smp.cpus, &local_err); if (local_err) { error_propagate(errp, local_err); return; @@ -311,6 +312,7 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp) { HostMemoryBackend *backend = MEMORY_BACKEND(uc); 
HostMemoryBackendClass *bc = MEMORY_BACKEND_GET_CLASS(uc); +MachineState *ms = MACHINE(qdev_get_machine()); Error *local_err = NULL; void *ptr; uint64_t sz; @@ -375,7 +377,7 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp) */ if (backend->prealloc) { os_mem_prealloc(memory_region_get_fd(&backend->mr), ptr, sz, -smp_cpus, &local_err); +ms->smp.cpus, &local_err); if (local_err) { goto out; } diff --git a/cpus.c b/cpus.c index e58e7ab0f6..b49db3604a 100644 --- a/cpus.c +++ b/cpus.c @@ -2068,8 +2068,10 @@ static void qemu_dummy_start_vcpu(CPUState *cpu) void qemu_init_vcpu(CPUState *cpu) { -cpu->nr_cores = smp_cores; -cpu->nr_threads = smp_threads; +MachineState *ms = MACHINE(qdev_get_machine()); + +cpu->nr_cores = ms->smp.cores; +cpu->nr_threads = ms->smp.threads; cpu->stopped = true; if (!cpu->as) { diff --git a/exec.c b/exec.c index 4e734770c2..2744df648c 100644 --- a/exec.c +++ b/exec.c @@ -1871,6 +1871,7 @@ static void *file_ram_alloc(RAMBlock *block, bool truncate, Error **errp) { +MachineState *ms = MACHINE(qdev_get_machine()); void *area; block->page_size = qemu_fd_getpagesize(fd); @@ -1927,7 +1928,7 @@ static void *file_ram_alloc(RAMBlock *block, } if (mem_prealloc) { -os_mem_prealloc(fd, area, memory, smp_cpus, errp); +os_mem_prealloc(fd, area, memory, ms->smp.cpus, errp); if (errp && *errp) { qemu_ram_munmap(fd, area, memory); return NULL; diff --git a/gdbstub.c b/gdbstub.c index d54abd17cc..dba37df2e9 100644 --- a/gdbstub.c +++ b/gdbstub.c @@ -30,6 +30,7 @@ #include "sysemu/sysemu.h" #include "exec/gdbstub.h" #include "hw/cpu/cluster.h" +#include "hw/boards.h" #endif #define MAX_PACKET_LENGTH 4096 @@ -1159,6 +1160,9 @@ static int gdb_handle_vcont(GDBState *s, const char *p) CPU_FOREACH(cpu) { max_cpus = max_cpus <= cpu->cpu_index ? 
cpu->cpu_index + 1 : max_cpus; } +#else +MachineState *ms = MACHINE(qdev_get_machine()); +unsigned int max_cpus = ms->smp.max_cpus; #endif /* uninitialised CPUs stay 0 */ newstates = g_new0(char, max_cpus); diff --git a/hw/cpu/core.c b/hw/cpu/co
[Qemu-devel] [PATCH v3 05/10] hw/riscv: Replace global smp variables with machine smp properties
The global smp variables in riscv are replaced with smp machine properties. A local variable of the same name would be introduced in the declaration phase if it's used widely in the context OR replace it on the spot if it's only used once. No semantic changes. Signed-off-by: Like Xu --- hw/riscv/sifive_e.c| 6 -- hw/riscv/sifive_plic.c | 3 +++ hw/riscv/sifive_u.c| 6 -- hw/riscv/spike.c | 2 ++ hw/riscv/virt.c| 1 + 5 files changed, 14 insertions(+), 4 deletions(-) diff --git a/hw/riscv/sifive_e.c b/hw/riscv/sifive_e.c index b1cd11363c..ae86a63c04 100644 --- a/hw/riscv/sifive_e.c +++ b/hw/riscv/sifive_e.c @@ -137,6 +137,7 @@ static void riscv_sifive_e_init(MachineState *machine) static void riscv_sifive_e_soc_init(Object *obj) { +MachineState *ms = MACHINE(qdev_get_machine()); SiFiveESoCState *s = RISCV_E_SOC(obj); object_initialize_child(obj, "cpus", &s->cpus, @@ -144,12 +145,13 @@ static void riscv_sifive_e_soc_init(Object *obj) &error_abort, NULL); object_property_set_str(OBJECT(&s->cpus), SIFIVE_E_CPU, "cpu-type", &error_abort); -object_property_set_int(OBJECT(&s->cpus), smp_cpus, "num-harts", +object_property_set_int(OBJECT(&s->cpus), ms->smp.cpus, "num-harts", &error_abort); } static void riscv_sifive_e_soc_realize(DeviceState *dev, Error **errp) { +MachineState *ms = MACHINE(qdev_get_machine()); const struct MemmapEntry *memmap = sifive_e_memmap; SiFiveESoCState *s = RISCV_E_SOC(dev); @@ -179,7 +181,7 @@ static void riscv_sifive_e_soc_realize(DeviceState *dev, Error **errp) SIFIVE_E_PLIC_CONTEXT_STRIDE, memmap[SIFIVE_E_PLIC].size); sifive_clint_create(memmap[SIFIVE_E_CLINT].base, -memmap[SIFIVE_E_CLINT].size, smp_cpus, +memmap[SIFIVE_E_CLINT].size, ms->smp.cpus, SIFIVE_SIP_BASE, SIFIVE_TIMECMP_BASE, SIFIVE_TIME_BASE); sifive_mmio_emulate(sys_mem, "riscv.sifive.e.aon", memmap[SIFIVE_E_AON].base, memmap[SIFIVE_E_AON].size); diff --git a/hw/riscv/sifive_plic.c b/hw/riscv/sifive_plic.c index 07a032d93d..d4010a1f39 100644 --- a/hw/riscv/sifive_plic.c +++ 
b/hw/riscv/sifive_plic.c @@ -23,6 +23,7 @@ #include "qemu/error-report.h" #include "hw/sysbus.h" #include "hw/pci/msi.h" +#include "hw/boards.h" #include "target/riscv/cpu.h" #include "sysemu/sysemu.h" #include "hw/riscv/sifive_plic.h" @@ -438,6 +439,8 @@ static void sifive_plic_irq_request(void *opaque, int irq, int level) static void sifive_plic_realize(DeviceState *dev, Error **errp) { +MachineState *ms = MACHINE(qdev_get_machine()); +unsigned int smp_cpus = ms->smp.cpus; SiFivePLICState *plic = SIFIVE_PLIC(dev); int i; diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c index 5ecc47cea3..43bf256946 100644 --- a/hw/riscv/sifive_u.c +++ b/hw/riscv/sifive_u.c @@ -321,13 +321,14 @@ static void riscv_sifive_u_init(MachineState *machine) static void riscv_sifive_u_soc_init(Object *obj) { +MachineState *ms = MACHINE(qdev_get_machine()); SiFiveUSoCState *s = RISCV_U_SOC(obj); object_initialize_child(obj, "cpus", &s->cpus, sizeof(s->cpus), TYPE_RISCV_HART_ARRAY, &error_abort, NULL); object_property_set_str(OBJECT(&s->cpus), SIFIVE_U_CPU, "cpu-type", &error_abort); -object_property_set_int(OBJECT(&s->cpus), smp_cpus, "num-harts", +object_property_set_int(OBJECT(&s->cpus), ms->smp.cpus, "num-harts", &error_abort); sysbus_init_child_obj(obj, "gem", &s->gem, sizeof(s->gem), @@ -336,6 +337,7 @@ static void riscv_sifive_u_soc_init(Object *obj) static void riscv_sifive_u_soc_realize(DeviceState *dev, Error **errp) { +MachineState *ms = MACHINE(qdev_get_machine()); SiFiveUSoCState *s = RISCV_U_SOC(dev); const struct MemmapEntry *memmap = sifive_u_memmap; MemoryRegion *system_memory = get_system_memory(); @@ -371,7 +373,7 @@ static void riscv_sifive_u_soc_realize(DeviceState *dev, Error **errp) sifive_uart_create(system_memory, memmap[SIFIVE_U_UART1].base, serial_hd(1), qdev_get_gpio_in(DEVICE(s->plic), SIFIVE_U_UART1_IRQ)); sifive_clint_create(memmap[SIFIVE_U_CLINT].base, -memmap[SIFIVE_U_CLINT].size, smp_cpus, +memmap[SIFIVE_U_CLINT].size, ms->smp.cpus, SIFIVE_SIP_BASE, 
SIFIVE_TIMECMP_BASE, SIFIVE_TIME_BASE); for (i = 0; i < SIFIVE_U_PLIC_NUM_SOURCES; i++) { diff --git a/hw/riscv/spike.c b/hw/riscv/spike.c index 2a000a5800..6a747ff22e 100644 --- a/hw/riscv/spike.c +++ b/hw/riscv/spike.c @@ -171,6 +171,7
[Qemu-devel] [PATCH v3 06/10] hw/s390x: Replace global smp variables with machine smp properties
The global smp variables in s390x are replaced with smp machine properties. A local variable of the same name would be introduced in the declaration phase if it's used widely in the context OR replace it on the spot if it's only used once. No semantic changes. Signed-off-by: Like Xu --- hw/s390x/s390-virtio-ccw.c | 3 ++- hw/s390x/sclp.c| 2 +- target/s390x/cpu.c | 3 +++ target/s390x/excp_helper.c | 5 + 4 files changed, 11 insertions(+), 2 deletions(-) diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c index ed1fe7a93e..692ad6e372 100644 --- a/hw/s390x/s390-virtio-ccw.c +++ b/hw/s390x/s390-virtio-ccw.c @@ -83,7 +83,7 @@ static void s390_init_cpus(MachineState *machine) /* initialize possible_cpus */ mc->possible_cpu_arch_ids(machine); -for (i = 0; i < smp_cpus; i++) { +for (i = 0; i < machine->smp.cpus; i++) { s390x_new_cpu(machine->cpu_type, i, &error_fatal); } } @@ -410,6 +410,7 @@ static CpuInstanceProperties s390_cpu_index_to_props(MachineState *ms, static const CPUArchIdList *s390_possible_cpu_arch_ids(MachineState *ms) { int i; +unsigned int max_cpus = ms->smp.max_cpus; if (ms->possible_cpus) { g_assert(ms->possible_cpus && ms->possible_cpus->len == max_cpus); diff --git a/hw/s390x/sclp.c b/hw/s390x/sclp.c index 4510a800cb..fac7c3bb6c 100644 --- a/hw/s390x/sclp.c +++ b/hw/s390x/sclp.c @@ -64,7 +64,7 @@ static void read_SCP_info(SCLPDevice *sclp, SCCB *sccb) prepare_cpu_entries(sclp, read_info->entries, &cpu_count); read_info->entries_cpu = cpu_to_be16(cpu_count); read_info->offset_cpu = cpu_to_be16(offsetof(ReadInfo, entries)); -read_info->highest_cpu = cpu_to_be16(max_cpus - 1); +read_info->highest_cpu = cpu_to_be16(machine->smp.max_cpus - 1); read_info->ibc_val = cpu_to_be32(s390_get_ibc_val()); diff --git a/target/s390x/cpu.c b/target/s390x/cpu.c index b1df63d82c..f1e5c0d9c3 100644 --- a/target/s390x/cpu.c +++ b/target/s390x/cpu.c @@ -37,6 +37,7 @@ #include "hw/qdev-properties.h" #ifndef CONFIG_USER_ONLY #include "hw/hw.h" +#include 
"hw/boards.h" #include "sysemu/arch_init.h" #include "sysemu/sysemu.h" #endif @@ -193,6 +194,8 @@ static void s390_cpu_realizefn(DeviceState *dev, Error **errp) } #if !defined(CONFIG_USER_ONLY) +MachineState *ms = MACHINE(qdev_get_machine()); +unsigned int max_cpus = ms->smp.max_cpus; if (cpu->env.core_id >= max_cpus) { error_setg(&err, "Unable to add CPU with core-id: %" PRIu32 ", maximum core-id: %d", cpu->env.core_id, diff --git a/target/s390x/excp_helper.c b/target/s390x/excp_helper.c index 3a467b72c5..1c6938effc 100644 --- a/target/s390x/excp_helper.c +++ b/target/s390x/excp_helper.c @@ -31,6 +31,7 @@ #ifndef CONFIG_USER_ONLY #include "sysemu/sysemu.h" #include "hw/s390x/s390_flic.h" +#include "hw/boards.h" #endif void QEMU_NORETURN tcg_s390_program_interrupt(CPUS390XState *env, uint32_t code, @@ -300,6 +301,10 @@ static void do_ext_interrupt(CPUS390XState *env) g_assert(cpu_addr < S390_MAX_CPUS); lowcore->cpu_addr = cpu_to_be16(cpu_addr); clear_bit(cpu_addr, env->emergency_signals); +#ifndef CONFIG_USER_ONLY +MachineState *ms = MACHINE(qdev_get_machine()); +unsigned int max_cpus = ms->smp.max_cpus; +#endif if (bitmap_empty(env->emergency_signals, max_cpus)) { env->pending_int &= ~INTERRUPT_EMERGENCY_SIGNAL; } -- 2.21.0
[Qemu-devel] [PATCH v3 04/10] hw/ppc: Replace global smp variables with machine smp properties
The global smp variables in ppc are replaced with smp machine properties. A local variable of the same name would be introduced in the declaration phase if it's used widely in the context OR replace it on the spot if it's only used once. No semantic changes. Signed-off-by: Like Xu --- hw/ppc/e500.c | 3 +++ hw/ppc/mac_newworld.c | 3 ++- hw/ppc/mac_oldworld.c | 3 ++- hw/ppc/pnv.c | 6 -- hw/ppc/prep.c | 4 ++-- hw/ppc/spapr.c| 34 ++ hw/ppc/spapr_rtas.c | 4 +++- 7 files changed, 42 insertions(+), 15 deletions(-) diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c index beb2efd694..5e42e5a059 100644 --- a/hw/ppc/e500.c +++ b/hw/ppc/e500.c @@ -307,6 +307,7 @@ static int ppce500_load_device_tree(PPCE500MachineState *pms, bool dry_run) { MachineState *machine = MACHINE(pms); +unsigned int smp_cpus = machine->smp.cpus; const PPCE500MachineClass *pmc = PPCE500_MACHINE_GET_CLASS(pms); CPUPPCState *env = first_cpu->env_ptr; int ret = -1; @@ -734,6 +735,7 @@ static DeviceState *ppce500_init_mpic_qemu(PPCE500MachineState *pms, SysBusDevice *s; int i, j, k; MachineState *machine = MACHINE(pms); +unsigned int smp_cpus = machine->smp.cpus; const PPCE500MachineClass *pmc = PPCE500_MACHINE_GET_CLASS(pms); dev = qdev_create(NULL, TYPE_OPENPIC); @@ -846,6 +848,7 @@ void ppce500_init(MachineState *machine) struct boot_info *boot_info; int dt_size; int i; +unsigned int smp_cpus = machine->smp.cpus; /* irq num for pin INTA, INTB, INTC and INTD is 1, 2, 3 and * 4 respectively */ unsigned int pci_irq_nrs[PCI_NUM_PINS] = {1, 2, 3, 4}; diff --git a/hw/ppc/mac_newworld.c b/hw/ppc/mac_newworld.c index 02d8559621..257b26ee24 100644 --- a/hw/ppc/mac_newworld.c +++ b/hw/ppc/mac_newworld.c @@ -135,6 +135,7 @@ static void ppc_core99_init(MachineState *machine) DeviceState *dev, *pic_dev; hwaddr nvram_addr = 0xFFF04000; uint64_t tbfreq; +unsigned int smp_cpus = machine->smp.cpus; linux_boot = (kernel_filename != NULL); @@ -464,7 +465,7 @@ static void ppc_core99_init(MachineState *machine) sysbus_mmio_map(s, 1, 
CFG_ADDR + 2); fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)smp_cpus); -fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)max_cpus); +fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)machine->smp.max_cpus); fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size); fw_cfg_add_i16(fw_cfg, FW_CFG_MACHINE_ID, machine_arch); fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, kernel_base); diff --git a/hw/ppc/mac_oldworld.c b/hw/ppc/mac_oldworld.c index 460cbc7923..1968f05a6c 100644 --- a/hw/ppc/mac_oldworld.c +++ b/hw/ppc/mac_oldworld.c @@ -99,6 +99,7 @@ static void ppc_heathrow_init(MachineState *machine) DeviceState *dev, *pic_dev; BusState *adb_bus; int bios_size; +unsigned int smp_cpus = machine->smp.cpus; uint16_t ppc_boot_device; DriveInfo *hd[MAX_IDE_BUS * MAX_IDE_DEVS]; void *fw_cfg; @@ -322,7 +323,7 @@ static void ppc_heathrow_init(MachineState *machine) sysbus_mmio_map(s, 1, CFG_ADDR + 2); fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)smp_cpus); -fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)max_cpus); +fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)machine->smp.max_cpus); fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size); fw_cfg_add_i16(fw_cfg, FW_CFG_MACHINE_ID, ARCH_HEATHROW); fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, kernel_base); diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c index 1e8c505936..3bb15338de 100644 --- a/hw/ppc/pnv.c +++ b/hw/ppc/pnv.c @@ -678,7 +678,8 @@ static void pnv_init(MachineState *machine) object_property_add_child(OBJECT(pnv), chip_name, chip, &error_fatal); object_property_set_int(chip, PNV_CHIP_HWID(i), "chip-id", &error_fatal); -object_property_set_int(chip, smp_cores, "nr-cores", &error_fatal); +object_property_set_int(chip, machine->smp.cores, +"nr-cores", &error_fatal); object_property_set_bool(chip, true, "realized", &error_fatal); } g_free(chip_typename); @@ -1134,6 +1135,7 @@ static void pnv_chip_instance_init(Object *obj) static void pnv_chip_core_realize(PnvChip *chip, Error **errp) { +MachineState *ms = 
MACHINE(qdev_get_machine()); Error *error = NULL; PnvChipClass *pcc = PNV_CHIP_GET_CLASS(chip); const char *typename = pnv_chip_core_typename(chip); @@ -1168,7 +1170,7 @@ static void pnv_chip_core_realize(PnvChip *chip, Error **errp) snprintf(core_name, sizeof(core_name), "core[%d]", core_hwid); object_property_add_child(OBJECT(chip), core_name, OBJECT(pnv_core),
[Qemu-devel] [PATCH v3 07/10] hw/i386: Replace global smp variables with machine smp properties
The global smp variables in i386 are replaced with smp machine properties. To avoid calling qdev_get_machine() as much as possible, some related functions for ACPI data generation are refactored. A local variable of the same name would be introduced in the declaration phase if it's used widely in the context OR replace it on the spot if it's only used once. No semantic changes. Signed-off-by: Like Xu --- hw/i386/acpi-build.c | 11 +++ hw/i386/kvmvapic.c| 7 +-- hw/i386/pc.c | 24 +++- hw/i386/xen/xen-hvm.c | 4 target/i386/cpu.c | 4 +++- 5 files changed, 34 insertions(+), 16 deletions(-) diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index c8e47e5713..eb41af04ce 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -45,6 +45,7 @@ #include "sysemu/tpm.h" #include "hw/acpi/tpm.h" #include "hw/acpi/vmgenid.h" +#include "hw/boards.h" #include "sysemu/tpm_backend.h" #include "hw/timer/mc146818rtc_regs.h" #include "hw/mem/memory-device.h" @@ -126,7 +127,8 @@ typedef struct FwCfgTPMConfig { uint8_t tpmppi_version; } QEMU_PACKED FwCfgTPMConfig; -static void init_common_fadt_data(Object *o, AcpiFadtData *data) +static void init_common_fadt_data(MachineState *ms, Object *o, + AcpiFadtData *data) { uint32_t io = object_property_get_uint(o, ACPI_PM_PROP_PM_IO_BASE, NULL); AmlAddressSpace as = AML_AS_SYSTEM_IO; @@ -142,7 +144,8 @@ static void init_common_fadt_data(Object *o, AcpiFadtData *data) * CPUs for more than 8 CPUs, "Clustered Logical" mode has to be * used */ -((max_cpus > 8) ? (1 << ACPI_FADT_F_FORCE_APIC_CLUSTER_MODEL) : 0), +((ms->smp.max_cpus > 8) ?
+(1 << ACPI_FADT_F_FORCE_APIC_CLUSTER_MODEL) : 0), .int_model = 1 /* Multiple APIC */, .rtc_century = RTC_CENTURY, .plvl2_lat = 0xfff /* C2 state not supported */, @@ -176,7 +179,7 @@ static Object *object_resolve_type_unambiguous(const char *typename) return o; } -static void acpi_get_pm_info(AcpiPmInfo *pm) +static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm) { Object *piix = object_resolve_type_unambiguous(TYPE_PIIX4_PM); Object *lpc = object_resolve_type_unambiguous(TYPE_ICH9_LPC_DEVICE); @@ -2629,7 +2632,7 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine) AcpiSlicOem slic_oem = { .id = NULL, .table_id = NULL }; Object *vmgenid_dev; -acpi_get_pm_info(&pm); +acpi_get_pm_info(machine, &pm); acpi_get_misc_info(&misc); acpi_get_pci_holes(&pci_hole, &pci_hole64); acpi_get_slic_oem(&slic_oem); diff --git a/hw/i386/kvmvapic.c b/hw/i386/kvmvapic.c index 70f6f26a94..3fce704613 100644 --- a/hw/i386/kvmvapic.c +++ b/hw/i386/kvmvapic.c @@ -17,6 +17,7 @@ #include "sysemu/kvm.h" #include "hw/i386/apic_internal.h" #include "hw/sysbus.h" +#include "hw/boards.h" #include "tcg/tcg.h" #define VAPIC_IO_PORT 0x7e @@ -441,11 +442,12 @@ static void do_patch_instruction(CPUState *cs, run_on_cpu_data data) static void patch_instruction(VAPICROMState *s, X86CPU *cpu, target_ulong ip) { +MachineState *ms = MACHINE(qdev_get_machine()); CPUState *cs = CPU(cpu); VAPICHandlers *handlers; PatchInfo *info; -if (smp_cpus == 1) { +if (ms->smp.cpus == 1) { handlers = &s->rom_state.up; } else { handlers = &s->rom_state.mp; @@ -746,6 +748,7 @@ static void do_vapic_enable(CPUState *cs, run_on_cpu_data data) static void kvmvapic_vm_state_change(void *opaque, int running, RunState state) { +MachineState *ms = MACHINE(qdev_get_machine()); VAPICROMState *s = opaque; uint8_t *zero; @@ -754,7 +757,7 @@ static void kvmvapic_vm_state_change(void *opaque, int running, } if (s->state == VAPIC_ACTIVE) { -if (smp_cpus == 1) { +if (ms->smp.cpus == 1) { run_on_cpu(first_cpu, 
do_vapic_enable, RUN_ON_CPU_HOST_PTR(s)); } else { zero = g_malloc0(s->rom_state.vapic_size); diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 9bcd867ea3..896c22e32e 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -932,12 +932,14 @@ void enable_compat_apic_id_mode(void) * no concept of "CPU index", and the NUMA tables on fw_cfg need the APIC ID of * all CPUs up to max_cpus. */ -static uint32_t x86_cpu_apic_id_from_index(unsigned int cpu_index) +static uint32_t x86_cpu_apic_id_from_index(MachineState *ms, + unsigned int cpu_index) { uint32_t correct_id; static bool warned; -correct_id = x86_apicid
[Qemu-devel] [PATCH v3 09/10] hw: Replace global smp variables with MachineState for all remaining archs
The global smp variables in the alpha/hppa/mips/openrisc/sparc*/xtensa code are replaced with smp properties from MachineState.

A local variable of the same name is introduced in the declaration phase if the global is used widely in the context; otherwise the global is replaced on the spot if it's only used once. No semantic changes.

Signed-off-by: Like Xu
Reviewed-by: Alistair Francis
---
 hw/alpha/dp264.c | 1 +
 hw/hppa/machine.c | 2 ++
 hw/mips/boston.c | 2 +-
 hw/mips/mips_malta.c | 2 ++
 hw/openrisc/openrisc_sim.c | 1 +
 hw/sparc/sun4m.c | 2 ++
 hw/sparc64/sun4u.c | 4 ++--
 hw/xtensa/sim.c | 2 +-
 hw/xtensa/xtfpga.c | 1 +
 9 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/hw/alpha/dp264.c b/hw/alpha/dp264.c
index 0347eb897c..9dfb835013 100644
--- a/hw/alpha/dp264.c
+++ b/hw/alpha/dp264.c
@@ -63,6 +63,7 @@ static void clipper_init(MachineState *machine)
 char *palcode_filename;
 uint64_t palcode_entry, palcode_low, palcode_high;
 uint64_t kernel_entry, kernel_low, kernel_high;
+unsigned int smp_cpus = machine->smp.cpus;
 /* Create up to 4 cpus. */
 memset(cpus, 0, sizeof(cpus));
diff --git a/hw/hppa/machine.c b/hw/hppa/machine.c
index 416e67bab1..662838d83b 100644
--- a/hw/hppa/machine.c
+++ b/hw/hppa/machine.c
@@ -72,6 +72,7 @@ static void machine_hppa_init(MachineState *machine)
 MemoryRegion *ram_region;
 MemoryRegion *cpu_region;
 long i;
+unsigned int smp_cpus = machine->smp.cpus;
 ram_size = machine->ram_size;
@@ -242,6 +243,7 @@ static void machine_hppa_init(MachineState *machine)
 static void hppa_machine_reset(MachineState *ms)
 {
+unsigned int smp_cpus = ms->smp.cpus;
 int i;
 qemu_devices_reset();
diff --git a/hw/mips/boston.c b/hw/mips/boston.c
index a8b29f62f5..ccbfac54ef 100644
--- a/hw/mips/boston.c
+++ b/hw/mips/boston.c
@@ -460,7 +460,7 @@ static void boston_mach_init(MachineState *machine)
 object_property_set_str(OBJECT(s->cps), machine->cpu_type, "cpu-type",
 &err);
-object_property_set_int(OBJECT(s->cps), smp_cpus, "num-vp", &err);
+object_property_set_int(OBJECT(s->cps), machine->smp.cpus, "num-vp", &err);
 object_property_set_bool(OBJECT(s->cps), true, "realized", &err);
 if (err != NULL) {
diff --git a/hw/mips/mips_malta.c b/hw/mips/mips_malta.c
index 5fe9512c24..ead5976d1a 100644
--- a/hw/mips/mips_malta.c
+++ b/hw/mips/mips_malta.c
@@ -1095,6 +1095,8 @@ static int64_t load_kernel (void)
 static void malta_mips_config(MIPSCPU *cpu)
 {
+MachineState *ms = MACHINE(qdev_get_machine());
+unsigned int smp_cpus = ms->smp.cpus;
 CPUMIPSState *env = &cpu->env;
 CPUState *cs = CPU(cpu);
diff --git a/hw/openrisc/openrisc_sim.c b/hw/openrisc/openrisc_sim.c
index 0a906d815e..8d828e78ee 100644
--- a/hw/openrisc/openrisc_sim.c
+++ b/hw/openrisc/openrisc_sim.c
@@ -131,6 +131,7 @@ static void openrisc_sim_init(MachineState *machine)
 qemu_irq *cpu_irqs[2];
 qemu_irq serial_irq;
 int n;
+unsigned int smp_cpus = machine->smp.cpus;
 for (n = 0; n < smp_cpus; n++) {
 cpu = OPENRISC_CPU(cpu_create(machine->cpu_type));
diff --git a/hw/sparc/sun4m.c b/hw/sparc/sun4m.c
index 07d126aea8..5c3739f2ef 100644
--- a/hw/sparc/sun4m.c
+++ b/hw/sparc/sun4m.c
@@ -852,6 +852,8 @@ static void sun4m_hw_init(const struct sun4m_hwdef *hwdef,
 FWCfgState *fw_cfg;
 DeviceState *dev;
 SysBusDevice *s;
+unsigned int smp_cpus = machine->smp.cpus;
+unsigned int max_cpus = machine->smp.max_cpus;
 /* init CPUs */
 for(i = 0; i < smp_cpus; i++) {
diff --git a/hw/sparc64/sun4u.c b/hw/sparc64/sun4u.c
index 399f2d73c8..0807f274bf 100644
--- a/hw/sparc64/sun4u.c
+++ b/hw/sparc64/sun4u.c
@@ -678,8 +678,8 @@ static void sun4uv_init(MemoryRegion *address_space_mem,
 &FW_CFG_IO(dev)->comb_iomem);
 fw_cfg = FW_CFG(dev);
-fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)smp_cpus);
-fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)max_cpus);
+fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)machine->smp.cpus);
+fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)machine->smp.max_cpus);
 fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size);
 fw_cfg_add_i16(fw_cfg, FW_CFG_MACHINE_ID, hwdef->machine_id);
 fw_cfg_add_i64(fw_cfg, FW_CFG_KERNEL_ADDR, kernel_entry);
diff --git a/hw/xtensa/sim.c b/hw/xtensa/sim.c
index 12c7437398..a4eef76fbc 100644
--- a/hw/xtensa/sim.c
+++ b/hw/xtensa/sim.c
@@ -60,7 +60,7 @@ static void xtensa_sim_init(MachineState *machine)
 const char *kernel_filename = machine->kernel_filename;
 int n;
-for (n = 0; n < smp_cpus; n++) {
+for (n = 0; n < machine->smp.cpus; n++) {
 cpu = XTENSA_CPU(cpu_create(machine->
[Qemu-devel] [PATCH v3 00/10] Refactor cpu topo into machine properties
This patch series makes the existing cores/threads/sockets into machine properties and gets rid of the global smp_* variables they currently use.

The purpose of getting rid of the globals is to disentangle layer violations. Let's do it one step at a time, replacing smp_foo with as few qdev_get_machine() calls as possible and delaying other related refactoring efforts.

==changelog==

v3:
- rephrase commit messages
- s/of/of present/ for CpuTopology comment
- drop redundant arguments such as cpu_type
- use ms instead of macs in migration context
- rebase to commit 1b46b4daa6

v2: https://patchwork.ozlabs.org/cover/1095727/
- pass MachineState via call chain with trivial fixups
- replace smp_cpus directly at places if it's only used once
- s/topo/smp/ and drop smp_ prefix inside CpuTopology structure
- expand commit messages to explain what each patch does
- fix Patchew build failure for xen usage
- use macs rather than ms in migration context for MigrationState
- clean up unrelated and redundant changes
- split OpenRISC and RISC-V related patches

v1: https://patchwork.kernel.org/cover/10876667/

Like Xu (10):
  hw/boards: Add struct CpuTopology to MachineState
  machine: Refactor smp-related call chains to pass MachineState
  general: Replace global smp variables with smp machine properties
  hw/ppc: Replace global smp variables with machine smp properties
  hw/riscv: Replace global smp variables with machine smp properties
  hw/s390x: Replace global smp variables with machine smp properties
  hw/i386: Replace global smp variables with machine smp properties
  hw/arm: Replace global smp variables with machine smp properties
  hw: Replace global smp variables with MachineState for all remaining archs
  vl.c: Replace smp global variables with smp machine properties

 accel/kvm/kvm-all.c | 4 +--
 backends/hostmem.c | 6 +++--
 cpus.c | 6 +++--
 exec.c | 3 ++-
 gdbstub.c | 4 +++
 hw/alpha/dp264.c | 1 +
 hw/arm/fsl-imx6.c | 6 -
 hw/arm/fsl-imx6ul.c | 6 -
 hw/arm/fsl-imx7.c | 7 +++--
 hw/arm/highbank.c | 1 +
 hw/arm/mcimx6ul-evk.c | 2 +-
 hw/arm/mcimx7d-sabre.c | 2 +-
 hw/arm/raspi.c | 4 +--
 hw/arm/realview.c | 1 +
 hw/arm/sabrelite.c | 2 +-
 hw/arm/vexpress.c | 16 +++-
 hw/arm/virt.c | 10 ++--
 hw/arm/xlnx-zynqmp.c | 16 +++-
 hw/cpu/core.c | 4 ++-
 hw/hppa/machine.c | 4 ++-
 hw/i386/acpi-build.c | 13 ++
 hw/i386/kvmvapic.c | 7 +++--
 hw/i386/pc.c | 33 ++--
 hw/i386/xen/xen-hvm.c | 4 +++
 hw/mips/boston.c | 2 +-
 hw/mips/mips_malta.c | 24 +
 hw/openrisc/openrisc_sim.c | 1 +
 hw/ppc/e500.c | 3 +++
 hw/ppc/mac_newworld.c | 3 ++-
 hw/ppc/mac_oldworld.c | 3 ++-
 hw/ppc/pnv.c | 9 ---
 hw/ppc/prep.c | 4 +--
 hw/ppc/spapr.c | 37 ++
 hw/ppc/spapr_rtas.c | 4 ++-
 hw/riscv/sifive_e.c | 6 +++--
 hw/riscv/sifive_plic.c | 3 +++
 hw/riscv/sifive_u.c | 6 +++--
 hw/riscv/spike.c | 2 ++
 hw/riscv/virt.c | 1 +
 hw/s390x/s390-virtio-ccw.c | 9 ---
 hw/s390x/sclp.c | 2 +-
 hw/smbios/smbios.c | 26 +++
 hw/sparc/sun4m.c | 2 ++
 hw/sparc64/sun4u.c | 4 +--
 hw/xtensa/sim.c | 2 +-
 hw/xtensa/xtfpga.c | 1 +
 include/hw/boards.h | 19 --
 include/hw/firmware/smbios.h | 5 ++--
 include/hw/i386/pc.h | 2 +-
 migration/postcopy-ram.c | 8 +-
 numa.c | 1 +
 qmp.c | 2 +-
 target/arm/cpu.c | 8 +-
 target/i386/cpu.c | 4 ++-
 target/openrisc/sys_helper.c | 6 -
 target/s390x/cpu.c | 3 +++
 target/s390x/excp_helper.c | 5
 tcg/tcg.c | 13 +-
 vl.c | 50 +++-
 59 files changed, 301 insertions(+), 141 deletions(-)

--
2.21.0
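The replacement rule that the series applies can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the series: CpuTopology and MachineState are trimmed stand-ins for the QEMU types, and count_boot_cpus() and needs_cluster_mode() are hypothetical helpers invented for this example.

```c
/* Trimmed stand-ins for the QEMU types; the real MachineState has many
 * more members. */
typedef struct CpuTopology {
    unsigned int cpus;
    unsigned int cores;
    unsigned int threads;
    unsigned int max_cpus;
} CpuTopology;

typedef struct MachineState {
    CpuTopology smp;
} MachineState;

/*
 * Rule 1 (hypothetical helper): the value is used several times in the
 * function, so a local variable with the old global's name is declared once
 * and the rest of the body stays untouched.
 */
static unsigned int count_boot_cpus(const MachineState *machine,
                                    unsigned int board_limit)
{
    unsigned int smp_cpus = machine->smp.cpus;
    unsigned int n, created = 0;

    for (n = 0; n < smp_cpus && n < board_limit; n++) {
        created++;
    }
    return created;
}

/*
 * Rule 2 (hypothetical helper): the value is used exactly once, so the
 * global is replaced on the spot with the MachineState member.
 */
static int needs_cluster_mode(const MachineState *ms)
{
    return ms->smp.max_cpus > 8;
}
```

In both cases the MachineState pointer arrives through the call chain; qdev_get_machine() is only the fallback when no caller can pass it down.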
[Qemu-devel] [PATCH v3 01/10] hw/boards: Add struct CpuTopology to MachineState
The cpu topology property CpuTopology is added to the MachineState and its members are initialized with the legacy global smp variables. From this commit on, code in system emulation mode is supposed to use the cpu topology variables from MachineState instead of the global ones defined in vl.c. There is no semantic change.

Suggested-by: Igor Mammedov
Suggested-by: Eduardo Habkost
Signed-off-by: Like Xu
Reviewed-by: Alistair Francis
---
 include/hw/boards.h | 15 +++
 vl.c | 5 +
 2 files changed, 20 insertions(+)

diff --git a/include/hw/boards.h b/include/hw/boards.h
index 6f7916f88f..bc23b5db1d 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -230,6 +230,20 @@ typedef struct DeviceMemoryState {
 MemoryRegion mr;
 } DeviceMemoryState;
+/**
+ * CpuTopology:
+ * @cpus: the number of present logical processors on the machine
+ * @cores: the number of cores in one package
+ * @threads: the number of threads in one core
+ * @max_cpus: the maximum number of logical processors on the machine
+ */
+typedef struct CpuTopology {
+unsigned int cpus;
+unsigned int cores;
+unsigned int threads;
+unsigned int max_cpus;
+} CpuTopology;
+
 /**
 * MachineState:
 */
@@ -272,6 +286,7 @@ struct MachineState {
 const char *cpu_type;
 AccelState *accelerator;
 CPUArchIdList *possible_cpus;
+CpuTopology smp;
 struct NVDIMMState *nvdimms_state;
 };
diff --git a/vl.c b/vl.c
index c8ca9ff6ff..40b006577b 100644
--- a/vl.c
+++ b/vl.c
@@ -4133,6 +4133,11 @@ int main(int argc, char **argv, char **envp)
 smp_parse(qemu_opts_find(qemu_find_opts("smp-opts"), NULL));
+current_machine->smp.cpus = smp_cpus;
+current_machine->smp.max_cpus = max_cpus;
+current_machine->smp.cores = smp_cores;
+current_machine->smp.threads = smp_threads;
+
 /* sanity-check smp_cpus and max_cpus against machine_class */
 if (smp_cpus < machine_class->min_cpus) {
 error_report("Invalid SMP CPUs %d. The min CPUs "
--
2.21.0
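The transitional state introduced by this patch, where the legacy globals still exist and are mirrored into the machine once after option parsing, might be pictured like this. It is a standalone sketch, not the vl.c code itself: MachineState is reduced to the one new member, the globals are redeclared locally, and machine_copy_legacy_smp() is a hypothetical name for the four assignments the patch adds inline in main().

```c
/* Same layout as the CpuTopology added to include/hw/boards.h. */
typedef struct CpuTopology {
    unsigned int cpus;      /* present logical processors */
    unsigned int cores;     /* cores in one package */
    unsigned int threads;   /* threads in one core */
    unsigned int max_cpus;  /* maximum number of logical processors */
} CpuTopology;

/* Reduced stand-in: only the member this patch adds. */
typedef struct MachineState {
    CpuTopology smp;
} MachineState;

/* Legacy globals, redeclared here with the types they have in vl.c. */
static int smp_cpus = 1;
static unsigned int max_cpus = 1;
static int smp_cores = 1;
static int smp_threads = 1;

/* Hypothetical wrapper around the assignments the patch places after
 * smp_parse(): copy the parsed globals into the machine. */
static void machine_copy_legacy_smp(MachineState *ms)
{
    ms->smp.cpus = smp_cpus;
    ms->smp.max_cpus = max_cpus;
    ms->smp.cores = smp_cores;
    ms->smp.threads = smp_threads;
}
```

Because the copy happens once, later patches can switch readers over to the machine's smp member file by file while the globals stay valid until the last patch removes them.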
Re: [Qemu-devel] [PATCH v2 00/10] refactor cpu topo into machine properties
On 2019/5/6 16:33, Like Xu wrote:

This patch series makes the existing cores/threads/sockets into machine properties and gets rid of the global smp_* variables they currently use.

The purpose of getting rid of the globals is to disentangle layer violations. Let's do it one step at a time, replacing smp_foo with as few qdev_get_machine() calls as possible and delaying other related refactoring efforts.

It looks like the changelog is missing and here it is:

==changelog==

v2:
- pass MachineState via call chain with trivial fixups
- replace smp_cpus directly at places if it's only used once
- s/topo/smp/ and drop smp_ prefix inside CpuTopology structure
- expand commit messages to explain what each patch does
- fix Patchew build failure for xen usage
- use macs rather than ms in migration context for MigrationState
- clean up unrelated and redundant changes
- split OpenRISC and RISC-V related patches

v1: https://patchwork.kernel.org/cover/10876667/

Like Xu (10):
  hw/boards: add struct CpuTopology to MachineState
  cpu/topology: related call chains refactoring to pass MachineState
  cpu/topology: replace global smp variables by MachineState in general path
  cpu/topology: add uncommon arch support for smp machine properties
  cpu/topology: add hw/ppc support for smp machine properties
  cpu/topology: add hw/riscv support for smp machine properties
  cpu/topology: add hw/s390x support for smp machine properties
  cpu/topology: add hw/i386 support for smp machine properties
  cpu/topology: add hw/arm support for smp machine properties
  cpu/topology: replace smp global variables with smp machine properties

 accel/kvm/kvm-all.c | 4 ++--
 backends/hostmem.c | 6 --
 cpus.c | 6 --
 exec.c | 3 ++-
 gdbstub.c | 4
 hw/alpha/dp264.c | 1 +
 hw/arm/fsl-imx6.c | 6 +-
 hw/arm/fsl-imx6ul.c | 6 +-
 hw/arm/fsl-imx7.c | 7 +--
 hw/arm/highbank.c | 1 +
 hw/arm/mcimx6ul-evk.c | 2 +-
 hw/arm/mcimx7d-sabre.c | 2 +-
 hw/arm/raspi.c | 4 ++--
 hw/arm/realview.c | 1 +
 hw/arm/sabrelite.c | 2 +-
 hw/arm/vexpress.c | 16 --
 hw/arm/virt.c | 10 +++--
 hw/arm/xlnx-zynqmp.c | 16 --
 hw/cpu/core.c | 4 +++-
 hw/hppa/machine.c | 4 +++-
 hw/i386/acpi-build.c | 13 +++-
 hw/i386/kvmvapic.c | 7 +--
 hw/i386/pc.c | 33 -
 hw/i386/xen/xen-hvm.c | 4
 hw/mips/boston.c | 2 +-
 hw/mips/mips_malta.c | 23 +++-
 hw/openrisc/openrisc_sim.c | 1 +
 hw/ppc/e500.c | 3 +++
 hw/ppc/mac_newworld.c | 3 ++-
 hw/ppc/mac_oldworld.c | 3 ++-
 hw/ppc/pnv.c | 9
 hw/ppc/prep.c | 4 ++--
 hw/ppc/spapr.c | 37 +++-
 hw/ppc/spapr_rtas.c | 4 +++-
 hw/riscv/sifive_e.c | 6 --
 hw/riscv/sifive_plic.c | 3 +++
 hw/riscv/sifive_u.c | 6 --
 hw/riscv/spike.c | 2 ++
 hw/riscv/virt.c | 1 +
 hw/s390x/s390-virtio-ccw.c | 9
 hw/s390x/sclp.c | 2 +-
 hw/smbios/smbios.c | 26 +--
 hw/sparc/sun4m.c | 2 ++
 hw/sparc64/sun4u.c | 4 ++--
 hw/xtensa/sim.c | 2 +-
 hw/xtensa/xtfpga.c | 1 +
 include/hw/boards.h | 19 +++--
 include/hw/firmware/smbios.h | 5 +++--
 include/hw/i386/pc.h | 2 +-
 migration/postcopy-ram.c | 8 ++-
 numa.c | 1 +
 qmp.c | 2 +-
 target/arm/cpu.c | 8 ++-
 target/i386/cpu.c | 4 +++-
 target/openrisc/sys_helper.c | 6 +-
 target/s390x/cpu.c | 3 +++
 target/s390x/excp_helper.c | 5 +
 tcg/tcg.c | 13 +++-
 vl.c | 50
 59 files changed, 301 insertions(+), 140 deletions(-)
[Qemu-devel] [PATCH v2 10/10] cpu/topology: replace smp global variables with smp machine properties
At the end of this smp refactoring series, the global smp variables are removed and only the smp machine properties remain in use.

Signed-off-by: Like Xu
---
 vl.c | 53 ++---
 1 file changed, 26 insertions(+), 27 deletions(-)

diff --git a/vl.c b/vl.c
index 34f05b2..a3e426c 100644
--- a/vl.c
+++ b/vl.c
@@ -162,10 +162,6 @@ static Chardev **serial_hds;
 Chardev *parallel_hds[MAX_PARALLEL_PORTS];
 int win2k_install_hack = 0;
 int singlestep = 0;
-int smp_cpus;
-unsigned int max_cpus;
-int smp_cores = 1;
-int smp_threads = 1;
 int acpi_enabled = 1;
 int no_hpet = 0;
 int fd_bootchk = 1;
@@ -1282,8 +1278,9 @@ static void smp_parse(QemuOpts *opts)
 sockets = sockets > 0 ? sockets : 1;
 cpus = cores * threads * sockets;
 } else {
-max_cpus = qemu_opt_get_number(opts, "maxcpus", cpus);
-sockets = max_cpus / (cores * threads);
+current_machine->smp.max_cpus =
+qemu_opt_get_number(opts, "maxcpus", cpus);
+sockets = current_machine->smp.max_cpus / (cores * threads);
 }
 } else if (cores == 0) {
 threads = threads > 0 ? threads : 1;
@@ -1300,34 +1297,37 @@ static void smp_parse(QemuOpts *opts)
 exit(1);
 }
-max_cpus = qemu_opt_get_number(opts, "maxcpus", cpus);
+current_machine->smp.max_cpus =
+qemu_opt_get_number(opts, "maxcpus", cpus);
-if (max_cpus < cpus) {
+if (current_machine->smp.max_cpus < cpus) {
 error_report("maxcpus must be equal to or greater than smp");
 exit(1);
 }
-if (sockets * cores * threads > max_cpus) {
+if (sockets * cores * threads > current_machine->smp.max_cpus) {
 error_report("cpu topology: "
 "sockets (%u) * cores (%u) * threads (%u) > "
 "maxcpus (%u)",
-sockets, cores, threads, max_cpus);
+sockets, cores, threads,
+current_machine->smp.max_cpus);
 exit(1);
 }
-if (sockets * cores * threads != max_cpus) {
+if (sockets * cores * threads != current_machine->smp.max_cpus) {
 warn_report("Invalid CPU topology deprecated: "
 "sockets (%u) * cores (%u) * threads (%u) "
 "!= maxcpus (%u)",
-sockets, cores, threads, max_cpus);
+sockets, cores, threads,
+current_machine->smp.max_cpus);
 }
-smp_cpus = cpus;
-smp_cores = cores;
-smp_threads = threads;
+current_machine->smp.cpus = cpus;
+current_machine->smp.cores = cores;
+current_machine->smp.threads = threads;
 }
-if (smp_cpus > 1) {
+if (current_machine->smp.cpus > 1) {
 Error *blocker = NULL;
 error_setg(&blocker, QERR_REPLAY_NOT_SUPPORTED, "smp");
 replay_add_blocker(blocker);
@@ -4094,26 +4094,25 @@ int main(int argc, char **argv, char **envp)
 machine_class->default_cpus = machine_class->default_cpus ?: 1;
 /* default to machine_class->default_cpus */
-smp_cpus = machine_class->default_cpus;
-max_cpus = machine_class->default_cpus;
+current_machine->smp.cpus = machine_class->default_cpus;
+current_machine->smp.max_cpus = machine_class->default_cpus;
+current_machine->smp.cores = 1;
+current_machine->smp.threads = 1;
 smp_parse(qemu_opts_find(qemu_find_opts("smp-opts"), NULL));
-current_machine->smp.cpus = smp_cpus;
-current_machine->smp.max_cpus = max_cpus;
-current_machine->smp.cores = smp_cores;
-current_machine->smp.threads = smp_threads;
-
 /* sanity-check smp_cpus and max_cpus against machine_class */
-if (smp_cpus < machine_class->min_cpus) {
+if (current_machine->smp.cpus < machine_class->min_cpus) {
 error_report("Invalid SMP CPUs %d. The min CPUs "
-"supported by machine '%s' is %d", smp_cpus,
+"supported by machine '%s' is %d",
+current_machine->smp.cpus,
 machine_class->name, machine_class->min_cpus);
 exit(1);
 }
-if (max_cpus > machine_class->max_cpus) {
+if (current_machine->smp.max_cpus > machine_class->max_cpus) {
 error_report("Invalid SMP CPUs %d. The max CPUs "
-"supported by machine '%s' is %d", max_cpus,
+"supported by machine '%s' is %d",
+current_machine->smp.max_cpus,
 machine_class->name, machine_class->max_cpus);
 exit(1);
 }
--
1.8.3.1
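The order of the consistency checks that smp_parse() performs in the hunks above can be summarised in a small standalone sketch. This is a simplification under stated assumptions: the TopoCheck enum and check_topology() are hypothetical names, and the real code calls error_report()/warn_report() and exit(1) rather than returning a status.

```c
/* Hypothetical result codes; the real smp_parse() reports and exits. */
typedef enum {
    TOPO_OK,
    TOPO_WARN_MISMATCH,          /* sockets * cores * threads != maxcpus */
    TOPO_ERR_MAXCPUS_TOO_SMALL,  /* maxcpus < cpus */
    TOPO_ERR_TOPOLOGY_TOO_LARGE  /* sockets * cores * threads > maxcpus */
} TopoCheck;

/* Applies the three checks in the same order as the diff. */
static TopoCheck check_topology(unsigned int sockets, unsigned int cores,
                                unsigned int threads, unsigned int cpus,
                                unsigned int max_cpus)
{
    unsigned int total = sockets * cores * threads;

    if (max_cpus < cpus) {
        return TOPO_ERR_MAXCPUS_TOO_SMALL;
    }
    if (total > max_cpus) {
        return TOPO_ERR_TOPOLOGY_TOO_LARGE;
    }
    if (total != max_cpus) {
        /* Only a deprecation warning in the patch, not a hard error. */
        return TOPO_WARN_MISMATCH;
    }
    return TOPO_OK;
}
```

For instance, 2 sockets with 2 cores and 2 threads each pass cleanly against maxcpus=8, whereas the same topology against maxcpus=16 only triggers the warning path.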
[Qemu-devel] [PATCH v2 08/10] cpu/topology: add hw/i386 support for smp machine properties
Following the replacement rules, the global smp variables in i386 are replaced with smp machine properties. To avoid calling qdev_get_machine() as much as possible, the related functions, including those for ACPI data generation and init_cpus(), are refactored to take a MachineState argument. No semantic changes.

Signed-off-by: Like Xu
---
 hw/arm/vexpress.c | 4 ++--
 hw/i386/acpi-build.c | 13 -
 hw/i386/kvmvapic.c | 7 +--
 hw/i386/pc.c | 24 +++-
 hw/i386/xen/xen-hvm.c | 4
 target/i386/cpu.c | 4 +++-
 6 files changed, 37 insertions(+), 19 deletions(-)

diff --git a/hw/arm/vexpress.c b/hw/arm/vexpress.c
index d8634f3..19273a2 100644
--- a/hw/arm/vexpress.c
+++ b/hw/arm/vexpress.c
@@ -377,8 +377,8 @@ static void a15_daughterboard_init(const VexpressMachineState *vms,
 memory_region_add_subregion(sysmem, 0x8000, ram);
 /* 0x2c00 A15MPCore private memory region (GIC) */
-init_cpus(cpu_type, TYPE_A15MPCORE_PRIV, 0x2c00, pic, vms->secure,
-vms->virt);
+init_cpus(machine, cpu_type, TYPE_A15MPCORE_PRIV,
+0x2c00, pic, vms->secure, vms->virt);
 /* A15 daughterboard peripherals: */
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 416da31..29adc17 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -44,6 +44,7 @@
 #include "sysemu/tpm.h"
 #include "hw/acpi/tpm.h"
 #include "hw/acpi/vmgenid.h"
+#include "hw/boards.h"
 #include "sysemu/tpm_backend.h"
 #include "hw/timer/mc146818rtc_regs.h"
 #include "hw/mem/memory-device.h"
@@ -125,7 +126,8 @@ typedef struct FwCfgTPMConfig {
 uint8_t tpmppi_version;
 } QEMU_PACKED FwCfgTPMConfig;
-static void init_common_fadt_data(Object *o, AcpiFadtData *data)
+static void init_common_fadt_data(MachineState *ms, Object *o,
+ AcpiFadtData *data)
 {
 uint32_t io = object_property_get_uint(o, ACPI_PM_PROP_PM_IO_BASE, NULL);
 AmlAddressSpace as = AML_AS_SYSTEM_IO;
@@ -141,7 +143,8 @@ static void init_common_fadt_data(Object *o, AcpiFadtData *data)
 * CPUs for more than 8 CPUs, "Clustered Logical" mode has to be
 * used
 */
-((max_cpus > 8) ? (1 << ACPI_FADT_F_FORCE_APIC_CLUSTER_MODEL) : 0),
+((ms->smp.max_cpus > 8) ?
+(1 << ACPI_FADT_F_FORCE_APIC_CLUSTER_MODEL) : 0),
 .int_model = 1 /* Multiple APIC */,
 .rtc_century = RTC_CENTURY,
 .plvl2_lat = 0xfff /* C2 state not supported */,
@@ -164,7 +167,7 @@ static void init_common_fadt_data(Object *o, AcpiFadtData *data)
 *data = fadt;
 }
-static void acpi_get_pm_info(AcpiPmInfo *pm)
+static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
 {
 Object *piix = piix4_pm_find();
 Object *lpc = ich9_lpc_find();
@@ -174,7 +177,7 @@ static void acpi_get_pm_info(AcpiPmInfo *pm)
 pm->pcihp_io_base = 0;
 pm->pcihp_io_len = 0;
-init_common_fadt_data(obj, &pm->fadt);
+init_common_fadt_data(machine, obj, &pm->fadt);
 if (piix) {
 /* w2k requires FADT(rev1) or it won't boot, keep PC compatible */
 pm->fadt.rev = 1;
@@ -2617,7 +2620,7 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
 AcpiSlicOem slic_oem = { .id = NULL, .table_id = NULL };
 Object *vmgenid_dev;
-acpi_get_pm_info(&pm);
+acpi_get_pm_info(machine, &pm);
 acpi_get_misc_info(&misc);
 acpi_get_pci_holes(&pci_hole, &pci_hole64);
 acpi_get_slic_oem(&slic_oem);
diff --git a/hw/i386/kvmvapic.c b/hw/i386/kvmvapic.c
index 70f6f26..3fce704 100644
--- a/hw/i386/kvmvapic.c
+++ b/hw/i386/kvmvapic.c
@@ -17,6 +17,7 @@
 #include "sysemu/kvm.h"
 #include "hw/i386/apic_internal.h"
 #include "hw/sysbus.h"
+#include "hw/boards.h"
 #include "tcg/tcg.h"
 #define VAPIC_IO_PORT 0x7e
@@ -441,11 +442,12 @@ static void do_patch_instruction(CPUState *cs, run_on_cpu_data data)
 static void patch_instruction(VAPICROMState *s, X86CPU *cpu, target_ulong ip)
 {
+MachineState *ms = MACHINE(qdev_get_machine());
 CPUState *cs = CPU(cpu);
 VAPICHandlers *handlers;
 PatchInfo *info;
-if (smp_cpus == 1) {
+if (ms->smp.cpus == 1) {
 handlers = &s->rom_state.up;
 } else {
 handlers = &s->rom_state.mp;
@@ -746,6 +748,7 @@ static void do_vapic_enable(CPUState *cs, run_on_cpu_data data)
 static void kvmvapic_vm_state_change(void *opaque, int running,
 RunState state)
 {
+MachineState *ms = MACHINE(qdev_get_machine());
 VAPICROMState *s = opaque;
 uint8_t *zero;
@@ -754,7 +757,7 @@ static void kvmvapic_vm_state_change(void *opaque, int running,
 }
 if (s->state == VAPIC_ACTIVE) {
-
[Qemu-devel] [PATCH v2 00/10] refactor cpu topo into machine properties
This patch series makes the existing cores/threads/sockets into machine properties and gets rid of the global smp_* variables they currently use.

The purpose of getting rid of the globals is to disentangle layer violations. Let's do it one step at a time, replacing smp_foo with as few qdev_get_machine() calls as possible and delaying other related refactoring efforts.

Like Xu (10):
  hw/boards: add struct CpuTopology to MachineState
  cpu/topology: related call chains refactoring to pass MachineState
  cpu/topology: replace global smp variables by MachineState in general path
  cpu/topology: add uncommon arch support for smp machine properties
  cpu/topology: add hw/ppc support for smp machine properties
  cpu/topology: add hw/riscv support for smp machine properties
  cpu/topology: add hw/s390x support for smp machine properties
  cpu/topology: add hw/i386 support for smp machine properties
  cpu/topology: add hw/arm support for smp machine properties
  cpu/topology: replace smp global variables with smp machine properties

 accel/kvm/kvm-all.c | 4 ++--
 backends/hostmem.c | 6 --
 cpus.c | 6 --
 exec.c | 3 ++-
 gdbstub.c | 4
 hw/alpha/dp264.c | 1 +
 hw/arm/fsl-imx6.c | 6 +-
 hw/arm/fsl-imx6ul.c | 6 +-
 hw/arm/fsl-imx7.c | 7 +--
 hw/arm/highbank.c | 1 +
 hw/arm/mcimx6ul-evk.c | 2 +-
 hw/arm/mcimx7d-sabre.c | 2 +-
 hw/arm/raspi.c | 4 ++--
 hw/arm/realview.c | 1 +
 hw/arm/sabrelite.c | 2 +-
 hw/arm/vexpress.c | 16 --
 hw/arm/virt.c | 10 +++--
 hw/arm/xlnx-zynqmp.c | 16 --
 hw/cpu/core.c | 4 +++-
 hw/hppa/machine.c | 4 +++-
 hw/i386/acpi-build.c | 13 +++-
 hw/i386/kvmvapic.c | 7 +--
 hw/i386/pc.c | 33 -
 hw/i386/xen/xen-hvm.c | 4
 hw/mips/boston.c | 2 +-
 hw/mips/mips_malta.c | 23 +++-
 hw/openrisc/openrisc_sim.c | 1 +
 hw/ppc/e500.c | 3 +++
 hw/ppc/mac_newworld.c | 3 ++-
 hw/ppc/mac_oldworld.c | 3 ++-
 hw/ppc/pnv.c | 9
 hw/ppc/prep.c | 4 ++--
 hw/ppc/spapr.c | 37 +++-
 hw/ppc/spapr_rtas.c | 4 +++-
 hw/riscv/sifive_e.c | 6 --
 hw/riscv/sifive_plic.c | 3 +++
 hw/riscv/sifive_u.c | 6 --
 hw/riscv/spike.c | 2 ++
 hw/riscv/virt.c | 1 +
 hw/s390x/s390-virtio-ccw.c | 9
 hw/s390x/sclp.c | 2 +-
 hw/smbios/smbios.c | 26 +--
 hw/sparc/sun4m.c | 2 ++
 hw/sparc64/sun4u.c | 4 ++--
 hw/xtensa/sim.c | 2 +-
 hw/xtensa/xtfpga.c | 1 +
 include/hw/boards.h | 19 +++--
 include/hw/firmware/smbios.h | 5 +++--
 include/hw/i386/pc.h | 2 +-
 migration/postcopy-ram.c | 8 ++-
 numa.c | 1 +
 qmp.c | 2 +-
 target/arm/cpu.c | 8 ++-
 target/i386/cpu.c | 4 +++-
 target/openrisc/sys_helper.c | 6 +-
 target/s390x/cpu.c | 3 +++
 target/s390x/excp_helper.c | 5 +
 tcg/tcg.c | 13 +++-
 vl.c | 50
 59 files changed, 301 insertions(+), 140 deletions(-)

--
1.8.3.1
[Qemu-devel] [PATCH v2 05/10] cpu/topology: add hw/ppc support for smp machine properties
Following the replacement rules, the global smp variables in ppc are replaced with smp machine properties. No semantic changes.

Signed-off-by: Like Xu
---
 hw/ppc/e500.c | 3 +++
 hw/ppc/mac_newworld.c | 3 ++-
 hw/ppc/mac_oldworld.c | 3 ++-
 hw/ppc/pnv.c | 6 --
 hw/ppc/prep.c | 4 ++--
 hw/ppc/spapr.c | 34 ++
 hw/ppc/spapr_rtas.c | 4 +++-
 7 files changed, 42 insertions(+), 15 deletions(-)

diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c
index beb2efd..5e42e5a 100644
--- a/hw/ppc/e500.c
+++ b/hw/ppc/e500.c
@@ -307,6 +307,7 @@ static int ppce500_load_device_tree(PPCE500MachineState *pms,
 bool dry_run)
 {
 MachineState *machine = MACHINE(pms);
+unsigned int smp_cpus = machine->smp.cpus;
 const PPCE500MachineClass *pmc = PPCE500_MACHINE_GET_CLASS(pms);
 CPUPPCState *env = first_cpu->env_ptr;
 int ret = -1;
@@ -734,6 +735,7 @@ static DeviceState *ppce500_init_mpic_qemu(PPCE500MachineState *pms,
 SysBusDevice *s;
 int i, j, k;
 MachineState *machine = MACHINE(pms);
+unsigned int smp_cpus = machine->smp.cpus;
 const PPCE500MachineClass *pmc = PPCE500_MACHINE_GET_CLASS(pms);
 dev = qdev_create(NULL, TYPE_OPENPIC);
@@ -846,6 +848,7 @@ void ppce500_init(MachineState *machine)
 struct boot_info *boot_info;
 int dt_size;
 int i;
+unsigned int smp_cpus = machine->smp.cpus;
 /* irq num for pin INTA, INTB, INTC and INTD is 1, 2, 3 and
 * 4 respectively */
 unsigned int pci_irq_nrs[PCI_NUM_PINS] = {1, 2, 3, 4};
diff --git a/hw/ppc/mac_newworld.c b/hw/ppc/mac_newworld.c
index 02d8559..257b26e 100644
--- a/hw/ppc/mac_newworld.c
+++ b/hw/ppc/mac_newworld.c
@@ -135,6 +135,7 @@ static void ppc_core99_init(MachineState *machine)
 DeviceState *dev, *pic_dev;
 hwaddr nvram_addr = 0xFFF04000;
 uint64_t tbfreq;
+unsigned int smp_cpus = machine->smp.cpus;
 linux_boot = (kernel_filename != NULL);
@@ -464,7 +465,7 @@ static void ppc_core99_init(MachineState *machine)
 sysbus_mmio_map(s, 1, CFG_ADDR + 2);
 fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)smp_cpus);
-fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)max_cpus);
+fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)machine->smp.max_cpus);
 fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size);
 fw_cfg_add_i16(fw_cfg, FW_CFG_MACHINE_ID, machine_arch);
 fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, kernel_base);
diff --git a/hw/ppc/mac_oldworld.c b/hw/ppc/mac_oldworld.c
index 460cbc7..1968f05 100644
--- a/hw/ppc/mac_oldworld.c
+++ b/hw/ppc/mac_oldworld.c
@@ -99,6 +99,7 @@ static void ppc_heathrow_init(MachineState *machine)
 DeviceState *dev, *pic_dev;
 BusState *adb_bus;
 int bios_size;
+unsigned int smp_cpus = machine->smp.cpus;
 uint16_t ppc_boot_device;
 DriveInfo *hd[MAX_IDE_BUS * MAX_IDE_DEVS];
 void *fw_cfg;
@@ -322,7 +323,7 @@ static void ppc_heathrow_init(MachineState *machine)
 sysbus_mmio_map(s, 1, CFG_ADDR + 2);
 fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)smp_cpus);
-fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)max_cpus);
+fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)machine->smp.max_cpus);
 fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size);
 fw_cfg_add_i16(fw_cfg, FW_CFG_MACHINE_ID, ARCH_HEATHROW);
 fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, kernel_base);
diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 1e8c505..3bb1533 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -678,7 +678,8 @@ static void pnv_init(MachineState *machine)
 object_property_add_child(OBJECT(pnv), chip_name, chip, &error_fatal);
 object_property_set_int(chip, PNV_CHIP_HWID(i), "chip-id", &error_fatal);
-object_property_set_int(chip, smp_cores, "nr-cores", &error_fatal);
+object_property_set_int(chip, machine->smp.cores,
+"nr-cores", &error_fatal);
 object_property_set_bool(chip, true, "realized", &error_fatal);
 }
 g_free(chip_typename);
@@ -1134,6 +1135,7 @@ static void pnv_chip_instance_init(Object *obj)
 static void pnv_chip_core_realize(PnvChip *chip, Error **errp)
 {
+MachineState *ms = MACHINE(qdev_get_machine());
 Error *error = NULL;
 PnvChipClass *pcc = PNV_CHIP_GET_CLASS(chip);
 const char *typename = pnv_chip_core_typename(chip);
@@ -1168,7 +1170,7 @@ static void pnv_chip_core_realize(PnvChip *chip, Error **errp)
 snprintf(core_name, sizeof(core_name), "core[%d]", core_hwid);
 object_property_add_child(OBJECT(chip), core_name, OBJECT(pnv_core), &error_fatal);
-object_property_set_int(OBJECT(pnv_core), smp_threads, "nr-threads",
+object_p
[Qemu-devel] [PATCH v2 03/10] cpu/topology: replace global smp variables by MachineState in general path
Basically, the context can get the MachineState reference through the call chain or, where that is not possible, through the discouraged qdev_get_machine() in !CONFIG_USER_ONLY mode.

A new variable of the same name is introduced in the declaration phase where that takes less effort; otherwise the global is replaced on the spot if it's only used once in the context. No semantic changes.

Signed-off-by: Like Xu
---
 accel/kvm/kvm-all.c | 4 ++--
 backends/hostmem.c | 6 --
 cpus.c | 6 --
 exec.c | 3 ++-
 gdbstub.c | 4
 hw/cpu/core.c | 4 +++-
 migration/postcopy-ram.c | 8 +++-
 numa.c | 1 +
 target/openrisc/sys_helper.c | 6 +-
 tcg/tcg.c | 13 -
 10 files changed, 44 insertions(+), 11 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 524c4dd..f8ef39d 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1533,8 +1533,8 @@ static int kvm_init(MachineState *ms)
 const char *name;
 int num;
 } num_cpus[] = {
-{ "SMP", smp_cpus },
-{ "hotpluggable", max_cpus },
+{ "SMP", ms->smp.cpus },
+{ "hotpluggable", ms->smp.max_cpus },
 { NULL, }
 }, *nc = num_cpus;
 int soft_vcpus_limit, hard_vcpus_limit;
diff --git a/backends/hostmem.c b/backends/hostmem.c
index 04baf47..463102a 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -222,6 +222,7 @@ static void host_memory_backend_set_prealloc(Object *obj, bool value,
 {
 Error *local_err = NULL;
 HostMemoryBackend *backend = MEMORY_BACKEND(obj);
+MachineState *ms = MACHINE(qdev_get_machine());
 if (backend->force_prealloc) {
 if (value) {
@@ -241,7 +242,7 @@ static void host_memory_backend_set_prealloc(Object *obj, bool value,
 void *ptr = memory_region_get_ram_ptr(&backend->mr);
 uint64_t sz = memory_region_size(&backend->mr);
-os_mem_prealloc(fd, ptr, sz, smp_cpus, &local_err);
+os_mem_prealloc(fd, ptr, sz, ms->smp.cpus, &local_err);
 if (local_err) {
 error_propagate(errp, local_err);
 return;
@@ -311,6 +312,7 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
 {
 HostMemoryBackend *backend = MEMORY_BACKEND(uc);
 HostMemoryBackendClass *bc = MEMORY_BACKEND_GET_CLASS(uc);
+MachineState *ms = MACHINE(qdev_get_machine());
 Error *local_err = NULL;
 void *ptr;
 uint64_t sz;
@@ -375,7 +377,7 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
 */
 if (backend->prealloc) {
 os_mem_prealloc(memory_region_get_fd(&backend->mr), ptr, sz,
-smp_cpus, &local_err);
+ms->smp.cpus, &local_err);
 if (local_err) {
 goto out;
 }
diff --git a/cpus.c b/cpus.c
index e58e7ab..b49db36 100644
--- a/cpus.c
+++ b/cpus.c
@@ -2068,8 +2068,10 @@ static void qemu_dummy_start_vcpu(CPUState *cpu)
 void qemu_init_vcpu(CPUState *cpu)
 {
-cpu->nr_cores = smp_cores;
-cpu->nr_threads = smp_threads;
+MachineState *ms = MACHINE(qdev_get_machine());
+
+cpu->nr_cores = ms->smp.cores;
+cpu->nr_threads = ms->smp.threads;
 cpu->stopped = true;
 if (!cpu->as) {
diff --git a/exec.c b/exec.c
index 4e73477..2744df6 100644
--- a/exec.c
+++ b/exec.c
@@ -1871,6 +1871,7 @@ static void *file_ram_alloc(RAMBlock *block,
 bool truncate,
 Error **errp)
 {
+MachineState *ms = MACHINE(qdev_get_machine());
 void *area;
 block->page_size = qemu_fd_getpagesize(fd);
@@ -1927,7 +1928,7 @@ static void *file_ram_alloc(RAMBlock *block,
 }
 if (mem_prealloc) {
-os_mem_prealloc(fd, area, memory, smp_cpus, errp);
+os_mem_prealloc(fd, area, memory, ms->smp.cpus, errp);
 if (errp && *errp) {
 qemu_ram_munmap(fd, area, memory);
 return NULL;
diff --git a/gdbstub.c b/gdbstub.c
index d54abd1..dba37df 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -30,6 +30,7 @@
 #include "sysemu/sysemu.h"
 #include "exec/gdbstub.h"
 #include "hw/cpu/cluster.h"
+#include "hw/boards.h"
 #endif
 #define MAX_PACKET_LENGTH 4096
@@ -1159,6 +1160,9 @@ static int gdb_handle_vcont(GDBState *s, const char *p)
 CPU_FOREACH(cpu) {
 max_cpus = max_cpus <= cpu->cpu_index ? cpu->cpu_index + 1 : max_cpus;
 }
+#else
+MachineState *ms = MACHINE(qdev_get_machine());
+unsigned int max_cpus = ms->smp.max_cpus;
 #endif
 /* uninitialised CPUs stay 0 */
 newstates = g_new0(char, max_cpus);
diff --git a/hw/cpu/core.c b/hw/cpu/core.c
index 7e42e2c..be2c7e1 100644
--- a/hw/cpu/core.c
+++ b/hw/c
[Qemu-devel] [PATCH v2 09/10] cpu/topology: add hw/arm support for smp machine properties
Following the replace rules, the global smp variables in arm are replaced with smp machine properties. The init_cpus() and xlnx_zynqmp_create_rpu() are refactored to pass MachineState. No semantic changes. Signed-off-by: Like Xu --- hw/arm/fsl-imx6.c | 6 +- hw/arm/fsl-imx6ul.c| 6 +- hw/arm/fsl-imx7.c | 7 +-- hw/arm/highbank.c | 1 + hw/arm/mcimx6ul-evk.c | 2 +- hw/arm/mcimx7d-sabre.c | 2 +- hw/arm/raspi.c | 4 ++-- hw/arm/realview.c | 1 + hw/arm/sabrelite.c | 2 +- hw/arm/vexpress.c | 12 hw/arm/virt.c | 8 +++- hw/arm/xlnx-zynqmp.c | 16 ++-- target/arm/cpu.c | 8 +++- 13 files changed, 54 insertions(+), 21 deletions(-) diff --git a/hw/arm/fsl-imx6.c b/hw/arm/fsl-imx6.c index 7b7b97f..ed772d5 100644 --- a/hw/arm/fsl-imx6.c +++ b/hw/arm/fsl-imx6.c @@ -23,6 +23,7 @@ #include "qapi/error.h" #include "qemu-common.h" #include "hw/arm/fsl-imx6.h" +#include "hw/boards.h" #include "sysemu/sysemu.h" #include "chardev/char.h" #include "qemu/error-report.h" @@ -33,11 +34,12 @@ static void fsl_imx6_init(Object *obj) { +MachineState *ms = MACHINE(qdev_get_machine()); FslIMX6State *s = FSL_IMX6(obj); char name[NAME_SIZE]; int i; -for (i = 0; i < MIN(smp_cpus, FSL_IMX6_NUM_CPUS); i++) { +for (i = 0; i < MIN(ms->smp.cpus, FSL_IMX6_NUM_CPUS); i++) { snprintf(name, NAME_SIZE, "cpu%d", i); object_initialize_child(obj, name, &s->cpu[i], sizeof(s->cpu[i]), "cortex-a9-" TYPE_ARM_CPU, &error_abort, NULL); @@ -93,9 +95,11 @@ static void fsl_imx6_init(Object *obj) static void fsl_imx6_realize(DeviceState *dev, Error **errp) { +MachineState *ms = MACHINE(qdev_get_machine()); FslIMX6State *s = FSL_IMX6(dev); uint16_t i; Error *err = NULL; +unsigned int smp_cpus = ms->smp.cpus; if (smp_cpus > FSL_IMX6_NUM_CPUS) { error_setg(errp, "%s: Only %d CPUs are supported (%d requested)", diff --git a/hw/arm/fsl-imx6ul.c b/hw/arm/fsl-imx6ul.c index 4b56bfa..74b8ecb 100644 --- a/hw/arm/fsl-imx6ul.c +++ b/hw/arm/fsl-imx6ul.c @@ -21,6 +21,7 @@ #include "qemu-common.h" #include "hw/arm/fsl-imx6ul.h" #include 
"hw/misc/unimp.h" +#include "hw/boards.h" #include "sysemu/sysemu.h" #include "qemu/error-report.h" @@ -28,11 +29,12 @@ static void fsl_imx6ul_init(Object *obj) { +MachineState *ms = MACHINE(qdev_get_machine()); FslIMX6ULState *s = FSL_IMX6UL(obj); char name[NAME_SIZE]; int i; -for (i = 0; i < MIN(smp_cpus, FSL_IMX6UL_NUM_CPUS); i++) { +for (i = 0; i < MIN(ms->smp.cpus, FSL_IMX6UL_NUM_CPUS); i++) { snprintf(name, NAME_SIZE, "cpu%d", i); object_initialize_child(obj, name, &s->cpu[i], sizeof(s->cpu[i]), "cortex-a7-" TYPE_ARM_CPU, &error_abort, NULL); @@ -156,10 +158,12 @@ static void fsl_imx6ul_init(Object *obj) static void fsl_imx6ul_realize(DeviceState *dev, Error **errp) { +MachineState *ms = MACHINE(qdev_get_machine()); FslIMX6ULState *s = FSL_IMX6UL(dev); int i; qemu_irq irq; char name[NAME_SIZE]; +unsigned int smp_cpus = ms->smp.cpus; if (smp_cpus > FSL_IMX6UL_NUM_CPUS) { error_setg(errp, "%s: Only %d CPUs are supported (%d requested)", diff --git a/hw/arm/fsl-imx7.c b/hw/arm/fsl-imx7.c index 7663ad6..71cc414 100644 --- a/hw/arm/fsl-imx7.c +++ b/hw/arm/fsl-imx7.c @@ -23,6 +23,7 @@ #include "qemu-common.h" #include "hw/arm/fsl-imx7.h" #include "hw/misc/unimp.h" +#include "hw/boards.h" #include "sysemu/sysemu.h" #include "qemu/error-report.h" @@ -30,12 +31,12 @@ static void fsl_imx7_init(Object *obj) { +MachineState *ms = MACHINE(qdev_get_machine()); FslIMX7State *s = FSL_IMX7(obj); char name[NAME_SIZE]; int i; - -for (i = 0; i < MIN(smp_cpus, FSL_IMX7_NUM_CPUS); i++) { +for (i = 0; i < MIN(ms->smp.cpus, FSL_IMX7_NUM_CPUS); i++) { snprintf(name, NAME_SIZE, "cpu%d", i); object_initialize_child(obj, name, &s->cpu[i], sizeof(s->cpu[i]), ARM_CPU_TYPE_NAME("cortex-a7"), &error_abort, @@ -155,11 +156,13 @@ static void fsl_imx7_init(Object *obj) static void fsl_imx7_realize(DeviceState *dev, Error **errp) { +MachineState *ms = MACHINE(qdev_get_machine()); FslIMX7State *s = FSL_IMX7(dev); Object *o; int i; qemu_irq irq; char name[NAME_SIZE]; +unsigned int smp_cpus = 
ms->smp.cpus; if (smp_cpus > FSL_IMX7_NUM_CPUS) { error_setg(errp, "%s: Only %d CPUs are
[Qemu-devel] [PATCH v2 02/10] cpu/topology: related call chains refactoring to pass MachineState
It's recommended to access smp variables via MachineState as an incoming parameter. This approach applies on legacy smbios_*_tables*(), *_machine_reset(), *__hot_add_cpu() and related *_create_cpu() for later smp variables usages. Suggested-by: Igor Mammedov Signed-off-by: Like Xu --- hw/arm/virt.c| 2 +- hw/hppa/machine.c| 2 +- hw/i386/pc.c | 9 - hw/mips/mips_malta.c | 21 +++-- hw/ppc/pnv.c | 3 +-- hw/ppc/spapr.c | 3 +-- hw/s390x/s390-virtio-ccw.c | 6 +++--- hw/smbios/smbios.c | 26 +++--- include/hw/boards.h | 4 ++-- include/hw/firmware/smbios.h | 5 +++-- include/hw/i386/pc.h | 2 +- qmp.c| 2 +- vl.c | 2 +- 13 files changed, 45 insertions(+), 42 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 16ba67f..1b02ba4 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1290,7 +1290,7 @@ static void virt_build_smbios(VirtMachineState *vms) vmc->smbios_old_sys_ver ? "1.0" : mc->name, false, true, SMBIOS_ENTRY_POINT_30); -smbios_get_tables(NULL, 0, &smbios_tables, &smbios_tables_len, +smbios_get_tables(MACHINE(vms), NULL, 0, &smbios_tables, &smbios_tables_len, &smbios_anchor, &smbios_anchor_len); if (smbios_anchor) { diff --git a/hw/hppa/machine.c b/hw/hppa/machine.c index d1b1d3c..416e67b 100644 --- a/hw/hppa/machine.c +++ b/hw/hppa/machine.c @@ -240,7 +240,7 @@ static void machine_hppa_init(MachineState *machine) cpu[0]->env.gr[21] = smp_cpus; } -static void hppa_machine_reset(void) +static void hppa_machine_reset(MachineState *ms) { int i; diff --git a/hw/i386/pc.c b/hw/i386/pc.c index d98b737..9bcd867 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -962,7 +962,7 @@ static void pc_build_smbios(PCMachineState *pcms) /* tell smbios about cpuid version and features */ smbios_set_cpuid(cpu->env.cpuid_version, cpu->env.features[FEAT_1_EDX]); -smbios_tables = smbios_get_table_legacy(&smbios_tables_len); +smbios_tables = smbios_get_table_legacy(ms, &smbios_tables_len); if (smbios_tables) { fw_cfg_add_bytes(pcms->fw_cfg, FW_CFG_SMBIOS_ENTRIES, smbios_tables, 
smbios_tables_len); @@ -979,7 +979,7 @@ static void pc_build_smbios(PCMachineState *pcms) array_count++; } } -smbios_get_tables(mem_array, array_count, +smbios_get_tables(ms, mem_array, array_count, &smbios_tables, &smbios_tables_len, &smbios_anchor, &smbios_anchor_len); g_free(mem_array); @@ -1534,9 +1534,8 @@ static void pc_new_cpu(const char *typename, int64_t apic_id, Error **errp) error_propagate(errp, local_err); } -void pc_hot_add_cpu(const int64_t id, Error **errp) +void pc_hot_add_cpu(MachineState *ms, const int64_t id, Error **errp) { -MachineState *ms = MACHINE(qdev_get_machine()); int64_t apic_id = x86_cpu_apic_id_from_index(id); Error *local_err = NULL; @@ -2622,7 +2621,7 @@ static void pc_machine_initfn(Object *obj) pc_system_flash_create(pcms); } -static void pc_machine_reset(void) +static void pc_machine_reset(MachineState *machine) { CPUState *cs; X86CPU *cpu; diff --git a/hw/mips/mips_malta.c b/hw/mips/mips_malta.c index 439665a..534e705 100644 --- a/hw/mips/mips_malta.c +++ b/hw/mips/mips_malta.c @@ -1124,14 +1124,14 @@ static void main_cpu_reset(void *opaque) } } -static void create_cpu_without_cps(const char *cpu_type, +static void create_cpu_without_cps(MachineState *ms, const char *cpu_type, qemu_irq *cbus_irq, qemu_irq *i8259_irq) { CPUMIPSState *env; MIPSCPU *cpu; int i; -for (i = 0; i < smp_cpus; i++) { +for (i = 0; i < ms->smp.cpus; i++) { cpu = MIPS_CPU(cpu_create(cpu_type)); /* Init internal devices */ @@ -1146,7 +1146,7 @@ static void create_cpu_without_cps(const char *cpu_type, *cbus_irq = env->irq[4]; } -static void create_cps(MaltaState *s, const char *cpu_type, +static void create_cps(MachineState *ms, MaltaState *s, const char *cpu_type, qemu_irq *cbus_irq, qemu_irq *i8259_irq) { Error *err = NULL; @@ -1155,7 +1155,7 @@ static void create_cps(MaltaState *s, const char *cpu_type, qdev_set_parent_bus(DEVICE(s->cps), sysbus_get_default()); object_property_set_str(OBJECT(s->cps), cpu_type, "cpu-type", &err); 
-object_property_set_int(OBJECT(s->cps), smp_cpus, "num-vp", &err); +object_property_set_int(OBJECT(s->cps), ms->smp.cpus, "num-vp", &err); object_property_set_bool(OBJECT(s->cps), true, "rea
[Qemu-devel] [PATCH v2 04/10] cpu/topology: add uncommon arch support for smp machine properties
Following the replace rules, the global smp variables in hppa/mips/openrisc /sparc*/xtensa are replaced with smp machine properties. No semantic changes. Signed-off-by: Like Xu --- hw/alpha/dp264.c | 1 + hw/hppa/machine.c | 2 ++ hw/mips/boston.c | 2 +- hw/mips/mips_malta.c | 2 ++ hw/openrisc/openrisc_sim.c | 1 + hw/sparc/sun4m.c | 2 ++ hw/sparc64/sun4u.c | 4 ++-- hw/xtensa/sim.c| 2 +- hw/xtensa/xtfpga.c | 1 + 9 files changed, 13 insertions(+), 4 deletions(-) diff --git a/hw/alpha/dp264.c b/hw/alpha/dp264.c index 0347eb8..9dfb835 100644 --- a/hw/alpha/dp264.c +++ b/hw/alpha/dp264.c @@ -63,6 +63,7 @@ static void clipper_init(MachineState *machine) char *palcode_filename; uint64_t palcode_entry, palcode_low, palcode_high; uint64_t kernel_entry, kernel_low, kernel_high; +unsigned int smp_cpus = machine->smp.cpus; /* Create up to 4 cpus. */ memset(cpus, 0, sizeof(cpus)); diff --git a/hw/hppa/machine.c b/hw/hppa/machine.c index 416e67b..662838d 100644 --- a/hw/hppa/machine.c +++ b/hw/hppa/machine.c @@ -72,6 +72,7 @@ static void machine_hppa_init(MachineState *machine) MemoryRegion *ram_region; MemoryRegion *cpu_region; long i; +unsigned int smp_cpus = machine->smp.cpus; ram_size = machine->ram_size; @@ -242,6 +243,7 @@ static void machine_hppa_init(MachineState *machine) static void hppa_machine_reset(MachineState *ms) { +unsigned int smp_cpus = ms->smp.cpus; int i; qemu_devices_reset(); diff --git a/hw/mips/boston.c b/hw/mips/boston.c index a8b29f6..ccbfac5 100644 --- a/hw/mips/boston.c +++ b/hw/mips/boston.c @@ -460,7 +460,7 @@ static void boston_mach_init(MachineState *machine) object_property_set_str(OBJECT(s->cps), machine->cpu_type, "cpu-type", &err); -object_property_set_int(OBJECT(s->cps), smp_cpus, "num-vp", &err); +object_property_set_int(OBJECT(s->cps), machine->smp.cpus, "num-vp", &err); object_property_set_bool(OBJECT(s->cps), true, "realized", &err); if (err != NULL) { diff --git a/hw/mips/mips_malta.c b/hw/mips/mips_malta.c index 534e705..70ff98b 100644 
--- a/hw/mips/mips_malta.c +++ b/hw/mips/mips_malta.c @@ -1095,6 +1095,8 @@ static int64_t load_kernel (void) static void malta_mips_config(MIPSCPU *cpu) { +MachineState *ms = MACHINE(qdev_get_machine()); +unsigned int smp_cpus = ms->smp.cpus; CPUMIPSState *env = &cpu->env; CPUState *cs = CPU(cpu); diff --git a/hw/openrisc/openrisc_sim.c b/hw/openrisc/openrisc_sim.c index 7d3b734..c84b9af 100644 --- a/hw/openrisc/openrisc_sim.c +++ b/hw/openrisc/openrisc_sim.c @@ -131,6 +131,7 @@ static void openrisc_sim_init(MachineState *machine) qemu_irq *cpu_irqs[2]; qemu_irq serial_irq; int n; +unsigned int smp_cpus = machine->smp.cpus; for (n = 0; n < smp_cpus; n++) { cpu = OPENRISC_CPU(cpu_create(machine->cpu_type)); diff --git a/hw/sparc/sun4m.c b/hw/sparc/sun4m.c index ca1e382..43e3434 100644 --- a/hw/sparc/sun4m.c +++ b/hw/sparc/sun4m.c @@ -853,6 +853,8 @@ static void sun4m_hw_init(const struct sun4m_hwdef *hwdef, unsigned int num_vsimms; DeviceState *dev; SysBusDevice *s; +unsigned int smp_cpus = machine->smp.cpus; +unsigned int max_cpus = machine->smp.max_cpus; /* init CPUs */ for(i = 0; i < smp_cpus; i++) { diff --git a/hw/sparc64/sun4u.c b/hw/sparc64/sun4u.c index 399f2d7..0807f27 100644 --- a/hw/sparc64/sun4u.c +++ b/hw/sparc64/sun4u.c @@ -678,8 +678,8 @@ static void sun4uv_init(MemoryRegion *address_space_mem, &FW_CFG_IO(dev)->comb_iomem); fw_cfg = FW_CFG(dev); -fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)smp_cpus); -fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)max_cpus); +fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)machine->smp.cpus); +fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)machine->smp.max_cpus); fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size); fw_cfg_add_i16(fw_cfg, FW_CFG_MACHINE_ID, hwdef->machine_id); fw_cfg_add_i64(fw_cfg, FW_CFG_KERNEL_ADDR, kernel_entry); diff --git a/hw/xtensa/sim.c b/hw/xtensa/sim.c index 12c7437..a4eef76 100644 --- a/hw/xtensa/sim.c +++ b/hw/xtensa/sim.c @@ -60,7 +60,7 @@ static void 
xtensa_sim_init(MachineState *machine) const char *kernel_filename = machine->kernel_filename; int n; -for (n = 0; n < smp_cpus; n++) { +for (n = 0; n < machine->smp.cpus; n++) { cpu = XTENSA_CPU(cpu_create(machine->cpu_type)); env = &cpu->env; diff --git a/hw/xtensa/xtfpga.c b/hw/xtensa/xtfpga.c index e05ef75..f7f3e11 100644 --- a/hw/xtensa/xtfpga.c +++ b/hw/xtensa/xtfpga.c @@ -238,6 +238,7 @@ static void xtfpga_init(const Xtfp
[Qemu-devel] [PATCH v2 06/10] cpu/topology: add hw/riscv support for smp machine properties
Following the replace rules, the global smp variables in riscv are replaced with smp machine properties. No semantic changes. Signed-off-by: Like Xu --- hw/riscv/sifive_e.c| 6 -- hw/riscv/sifive_plic.c | 3 +++ hw/riscv/sifive_u.c| 6 -- hw/riscv/spike.c | 2 ++ hw/riscv/virt.c| 1 + 5 files changed, 14 insertions(+), 4 deletions(-) diff --git a/hw/riscv/sifive_e.c b/hw/riscv/sifive_e.c index b1cd113..ae86a63 100644 --- a/hw/riscv/sifive_e.c +++ b/hw/riscv/sifive_e.c @@ -137,6 +137,7 @@ static void riscv_sifive_e_init(MachineState *machine) static void riscv_sifive_e_soc_init(Object *obj) { +MachineState *ms = MACHINE(qdev_get_machine()); SiFiveESoCState *s = RISCV_E_SOC(obj); object_initialize_child(obj, "cpus", &s->cpus, @@ -144,12 +145,13 @@ static void riscv_sifive_e_soc_init(Object *obj) &error_abort, NULL); object_property_set_str(OBJECT(&s->cpus), SIFIVE_E_CPU, "cpu-type", &error_abort); -object_property_set_int(OBJECT(&s->cpus), smp_cpus, "num-harts", +object_property_set_int(OBJECT(&s->cpus), ms->smp.cpus, "num-harts", &error_abort); } static void riscv_sifive_e_soc_realize(DeviceState *dev, Error **errp) { +MachineState *ms = MACHINE(qdev_get_machine()); const struct MemmapEntry *memmap = sifive_e_memmap; SiFiveESoCState *s = RISCV_E_SOC(dev); @@ -179,7 +181,7 @@ static void riscv_sifive_e_soc_realize(DeviceState *dev, Error **errp) SIFIVE_E_PLIC_CONTEXT_STRIDE, memmap[SIFIVE_E_PLIC].size); sifive_clint_create(memmap[SIFIVE_E_CLINT].base, -memmap[SIFIVE_E_CLINT].size, smp_cpus, +memmap[SIFIVE_E_CLINT].size, ms->smp.cpus, SIFIVE_SIP_BASE, SIFIVE_TIMECMP_BASE, SIFIVE_TIME_BASE); sifive_mmio_emulate(sys_mem, "riscv.sifive.e.aon", memmap[SIFIVE_E_AON].base, memmap[SIFIVE_E_AON].size); diff --git a/hw/riscv/sifive_plic.c b/hw/riscv/sifive_plic.c index 07a032d..d4010a1 100644 --- a/hw/riscv/sifive_plic.c +++ b/hw/riscv/sifive_plic.c @@ -23,6 +23,7 @@ #include "qemu/error-report.h" #include "hw/sysbus.h" #include "hw/pci/msi.h" +#include "hw/boards.h" #include 
"target/riscv/cpu.h" #include "sysemu/sysemu.h" #include "hw/riscv/sifive_plic.h" @@ -438,6 +439,8 @@ static void sifive_plic_irq_request(void *opaque, int irq, int level) static void sifive_plic_realize(DeviceState *dev, Error **errp) { +MachineState *ms = MACHINE(qdev_get_machine()); +unsigned int smp_cpus = ms->smp.cpus; SiFivePLICState *plic = SIFIVE_PLIC(dev); int i; diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c index 5ecc47c..43bf256 100644 --- a/hw/riscv/sifive_u.c +++ b/hw/riscv/sifive_u.c @@ -321,13 +321,14 @@ static void riscv_sifive_u_init(MachineState *machine) static void riscv_sifive_u_soc_init(Object *obj) { +MachineState *ms = MACHINE(qdev_get_machine()); SiFiveUSoCState *s = RISCV_U_SOC(obj); object_initialize_child(obj, "cpus", &s->cpus, sizeof(s->cpus), TYPE_RISCV_HART_ARRAY, &error_abort, NULL); object_property_set_str(OBJECT(&s->cpus), SIFIVE_U_CPU, "cpu-type", &error_abort); -object_property_set_int(OBJECT(&s->cpus), smp_cpus, "num-harts", +object_property_set_int(OBJECT(&s->cpus), ms->smp.cpus, "num-harts", &error_abort); sysbus_init_child_obj(obj, "gem", &s->gem, sizeof(s->gem), @@ -336,6 +337,7 @@ static void riscv_sifive_u_soc_init(Object *obj) static void riscv_sifive_u_soc_realize(DeviceState *dev, Error **errp) { +MachineState *ms = MACHINE(qdev_get_machine()); SiFiveUSoCState *s = RISCV_U_SOC(dev); const struct MemmapEntry *memmap = sifive_u_memmap; MemoryRegion *system_memory = get_system_memory(); @@ -371,7 +373,7 @@ static void riscv_sifive_u_soc_realize(DeviceState *dev, Error **errp) sifive_uart_create(system_memory, memmap[SIFIVE_U_UART1].base, serial_hd(1), qdev_get_gpio_in(DEVICE(s->plic), SIFIVE_U_UART1_IRQ)); sifive_clint_create(memmap[SIFIVE_U_CLINT].base, -memmap[SIFIVE_U_CLINT].size, smp_cpus, +memmap[SIFIVE_U_CLINT].size, ms->smp.cpus, SIFIVE_SIP_BASE, SIFIVE_TIMECMP_BASE, SIFIVE_TIME_BASE); for (i = 0; i < SIFIVE_U_PLIC_NUM_SOURCES; i++) { diff --git a/hw/riscv/spike.c b/hw/riscv/spike.c index 2a000a5..6a747ff 
100644 --- a/hw/riscv/spike.c +++ b/hw/riscv/spike.c @@ -171,6 +171,7 @@ static void spike_v1_10_0_board_init(MachineState *machine) MemoryRegion *main_mem = g_new(MemoryRegion, 1); MemoryRegion *mask_rom = g_new(MemoryRegion, 1
[Qemu-devel] [PATCH v2 07/10] cpu/topology: add hw/s390x support for smp machine properties
Following the replace rules, the global smp variables in s390x are replaced with smp machine properties. No semantic changes. Signed-off-by: Like Xu --- hw/s390x/s390-virtio-ccw.c | 3 ++- hw/s390x/sclp.c| 2 +- target/s390x/cpu.c | 3 +++ target/s390x/excp_helper.c | 5 + 4 files changed, 11 insertions(+), 2 deletions(-) diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c index ed1fe7a..692ad6e 100644 --- a/hw/s390x/s390-virtio-ccw.c +++ b/hw/s390x/s390-virtio-ccw.c @@ -83,7 +83,7 @@ static void s390_init_cpus(MachineState *machine) /* initialize possible_cpus */ mc->possible_cpu_arch_ids(machine); -for (i = 0; i < smp_cpus; i++) { +for (i = 0; i < machine->smp.cpus; i++) { s390x_new_cpu(machine->cpu_type, i, &error_fatal); } } @@ -410,6 +410,7 @@ static CpuInstanceProperties s390_cpu_index_to_props(MachineState *ms, static const CPUArchIdList *s390_possible_cpu_arch_ids(MachineState *ms) { int i; +unsigned int max_cpus = ms->smp.max_cpus; if (ms->possible_cpus) { g_assert(ms->possible_cpus && ms->possible_cpus->len == max_cpus); diff --git a/hw/s390x/sclp.c b/hw/s390x/sclp.c index 4510a80..fac7c3b 100644 --- a/hw/s390x/sclp.c +++ b/hw/s390x/sclp.c @@ -64,7 +64,7 @@ static void read_SCP_info(SCLPDevice *sclp, SCCB *sccb) prepare_cpu_entries(sclp, read_info->entries, &cpu_count); read_info->entries_cpu = cpu_to_be16(cpu_count); read_info->offset_cpu = cpu_to_be16(offsetof(ReadInfo, entries)); -read_info->highest_cpu = cpu_to_be16(max_cpus - 1); +read_info->highest_cpu = cpu_to_be16(machine->smp.max_cpus - 1); read_info->ibc_val = cpu_to_be32(s390_get_ibc_val()); diff --git a/target/s390x/cpu.c b/target/s390x/cpu.c index b58ef0a..0601c2e 100644 --- a/target/s390x/cpu.c +++ b/target/s390x/cpu.c @@ -37,6 +37,7 @@ #include "hw/qdev-properties.h" #ifndef CONFIG_USER_ONLY #include "hw/hw.h" +#include "hw/boards.h" #include "sysemu/arch_init.h" #include "sysemu/sysemu.h" #endif @@ -193,6 +194,8 @@ static void s390_cpu_realizefn(DeviceState *dev, Error **errp) 
} #if !defined(CONFIG_USER_ONLY) +MachineState *ms = MACHINE(qdev_get_machine()); +unsigned int max_cpus = ms->smp.max_cpus; if (cpu->env.core_id >= max_cpus) { error_setg(&err, "Unable to add CPU with core-id: %" PRIu32 ", maximum core-id: %d", cpu->env.core_id, diff --git a/target/s390x/excp_helper.c b/target/s390x/excp_helper.c index f84bfb1..77833e9 100644 --- a/target/s390x/excp_helper.c +++ b/target/s390x/excp_helper.c @@ -31,6 +31,7 @@ #ifndef CONFIG_USER_ONLY #include "sysemu/sysemu.h" #include "hw/s390x/s390_flic.h" +#include "hw/boards.h" #endif void QEMU_NORETURN tcg_s390_program_interrupt(CPUS390XState *env, uint32_t code, @@ -279,6 +280,10 @@ static void do_ext_interrupt(CPUS390XState *env) g_assert(cpu_addr < S390_MAX_CPUS); lowcore->cpu_addr = cpu_to_be16(cpu_addr); clear_bit(cpu_addr, env->emergency_signals); +#ifndef CONFIG_USER_ONLY +MachineState *ms = MACHINE(qdev_get_machine()); +unsigned int max_cpus = ms->smp.max_cpus; +#endif if (bitmap_empty(env->emergency_signals, max_cpus)) { env->pending_int &= ~INTERRUPT_EMERGENCY_SIGNAL; } -- 1.8.3.1
[Qemu-devel] [PATCH v2 01/10] hw/boards: add struct CpuTopology to MachineState
To remove usages of the global smp variables arch by arch, struct CpuTopology is introduced in a bisect-friendly way and initialized with copies of the existing global values; no semantic changes.

Suggested-by: Igor Mammedov
Suggested-by: Eduardo Habkost
Signed-off-by: Like Xu
---
 include/hw/boards.h | 15 +++
 vl.c                |  5 +
 2 files changed, 20 insertions(+)

diff --git a/include/hw/boards.h b/include/hw/boards.h
index 6f7916f..dc89c6d 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -231,6 +231,20 @@ typedef struct DeviceMemoryState {
 } DeviceMemoryState;
 
 /**
+ * CpuTopology:
+ * @cpus: the number of logical processors on the machine
+ * @cores: the number of cores in one package
+ * @threads: the number of threads in one core
+ * @max_cpus: the maximum number of logical processors on the machine
+ */
+typedef struct CpuTopology {
+    unsigned int cpus;
+    unsigned int cores;
+    unsigned int threads;
+    unsigned int max_cpus;
+} CpuTopology;
+
+/**
  * MachineState:
  */
 struct MachineState {
@@ -272,6 +286,7 @@ struct MachineState {
     const char *cpu_type;
     AccelState *accelerator;
     CPUArchIdList *possible_cpus;
+    CpuTopology smp;
     struct NVDIMMState *nvdimms_state;
 };
 
diff --git a/vl.c b/vl.c
index d9fea0a..43fd247 100644
--- a/vl.c
+++ b/vl.c
@@ -4099,6 +4099,11 @@ int main(int argc, char **argv, char **envp)
 
     smp_parse(qemu_opts_find(qemu_find_opts("smp-opts"), NULL));
 
+    current_machine->smp.cpus = smp_cpus;
+    current_machine->smp.max_cpus = max_cpus;
+    current_machine->smp.cores = smp_cores;
+    current_machine->smp.threads = smp_threads;
+
     /* sanity-check smp_cpus and max_cpus against machine_class */
     if (smp_cpus < machine_class->min_cpus) {
         error_report("Invalid SMP CPUs %d. The min CPUs "
-- 
1.8.3.1
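The mirroring step in this patch can be illustrated outside of QEMU. The following is a minimal sketch, not QEMU code: ToyMachineState, toy_smp_parse(), toy_machine_sync_topology() and toy_threads_per_package() are hypothetical names, and the parsed values are made up. Only the CpuTopology fields are taken from the patch above.

```c
/* Same fields as the CpuTopology struct added by the patch. */
typedef struct CpuTopology {
    unsigned int cpus;
    unsigned int cores;
    unsigned int threads;
    unsigned int max_cpus;
} CpuTopology;

/* Hypothetical, heavily reduced stand-in for MachineState. */
typedef struct ToyMachineState {
    const char *cpu_type;
    CpuTopology smp;
} ToyMachineState;

/* Stand-ins for the legacy globals that the option parser fills in. */
static int smp_cpus = 1;
static int smp_cores = 1;
static int smp_threads = 1;
static int max_cpus = 1;

/* Hypothetical parser result; the numbers are illustrative only. */
static void toy_smp_parse(void)
{
    smp_cpus = 4;
    smp_cores = 2;
    smp_threads = 2;
    max_cpus = 8;
}

/*
 * Mirror step: copy the globals into the per-machine struct right after
 * parsing, so old (global) and new (ms->smp) readers see the same values
 * while call sites are converted one by one.
 */
static void toy_machine_sync_topology(ToyMachineState *ms)
{
    ms->smp.cpus = (unsigned int)smp_cpus;
    ms->smp.max_cpus = (unsigned int)max_cpus;
    ms->smp.cores = (unsigned int)smp_cores;
    ms->smp.threads = (unsigned int)smp_threads;
}

/* Example of a converted consumer that no longer touches the globals. */
static unsigned int toy_threads_per_package(const ToyMachineState *ms)
{
    return ms->smp.cores * ms->smp.threads;
}
```

Because both copies hold the same values, each later patch in the series can switch a single file from the globals to ms->smp without changing behaviour, which is what keeps the series bisectable.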
[Qemu-devel] [PATCH] hw/arm/fsl-imx: move cpus initialization to realize time after smp_cpus check
If "smp_cpus> FSL_IMX6_NUM_CPUS" fails in *_realize(), there is no need to initialize the CPUs in *_init(). So it could be better to create all cpus after the validity in *_realize(). On the other hand, it makes the usages of global variable smp_cpus more centrally for maintenance. Suggested-by: Igor Mammedov Signed-off-by: Like Xu --- hw/arm/fsl-imx6.c | 13 +++-- hw/arm/fsl-imx6ul.c | 12 ++-- hw/arm/fsl-imx7.c | 15 +++ 3 files changed, 20 insertions(+), 20 deletions(-) diff --git a/hw/arm/fsl-imx6.c b/hw/arm/fsl-imx6.c index 7b7b97f..14015a1 100644 --- a/hw/arm/fsl-imx6.c +++ b/hw/arm/fsl-imx6.c @@ -37,12 +37,6 @@ static void fsl_imx6_init(Object *obj) char name[NAME_SIZE]; int i; -for (i = 0; i < MIN(smp_cpus, FSL_IMX6_NUM_CPUS); i++) { -snprintf(name, NAME_SIZE, "cpu%d", i); -object_initialize_child(obj, name, &s->cpu[i], sizeof(s->cpu[i]), -"cortex-a9-" TYPE_ARM_CPU, &error_abort, NULL); -} - sysbus_init_child_obj(obj, "a9mpcore", &s->a9mpcore, sizeof(s->a9mpcore), TYPE_A9MPCORE_PRIV); @@ -95,6 +89,7 @@ static void fsl_imx6_realize(DeviceState *dev, Error **errp) { FslIMX6State *s = FSL_IMX6(dev); uint16_t i; +char name[NAME_SIZE]; Error *err = NULL; if (smp_cpus > FSL_IMX6_NUM_CPUS) { @@ -103,6 +98,12 @@ static void fsl_imx6_realize(DeviceState *dev, Error **errp) return; } +for (i = 0; i < MIN(smp_cpus, FSL_IMX6_NUM_CPUS); i++) { +snprintf(name, NAME_SIZE, "cpu%d", i); +object_initialize_child(OBJECT(dev), name, &s->cpu[i], +sizeof(s->cpu[i]), "cortex-a9-" TYPE_ARM_CPU, &error_abort, NULL); +} + for (i = 0; i < smp_cpus; i++) { /* On uniprocessor, the CBAR is set to 0 */ diff --git a/hw/arm/fsl-imx6ul.c b/hw/arm/fsl-imx6ul.c index 4b56bfa..7f30eb7 100644 --- a/hw/arm/fsl-imx6ul.c +++ b/hw/arm/fsl-imx6ul.c @@ -32,12 +32,6 @@ static void fsl_imx6ul_init(Object *obj) char name[NAME_SIZE]; int i; -for (i = 0; i < MIN(smp_cpus, FSL_IMX6UL_NUM_CPUS); i++) { -snprintf(name, NAME_SIZE, "cpu%d", i); -object_initialize_child(obj, name, &s->cpu[i], sizeof(s->cpu[i]), 
-"cortex-a7-" TYPE_ARM_CPU, &error_abort, NULL); -} - /* * A7MPCORE */ @@ -167,6 +161,12 @@ static void fsl_imx6ul_realize(DeviceState *dev, Error **errp) return; } +for (i = 0; i < MIN(smp_cpus, FSL_IMX6UL_NUM_CPUS); i++) { +snprintf(name, NAME_SIZE, "cpu%d", i); +object_initialize_child(OBJECT(dev), name, &s->cpu[i], +sizeof(s->cpu[i]), "cortex-a7-" TYPE_ARM_CPU, &error_abort, NULL); +} + for (i = 0; i < smp_cpus; i++) { Object *o = OBJECT(&s->cpu[i]); diff --git a/hw/arm/fsl-imx7.c b/hw/arm/fsl-imx7.c index 7663ad6..2580348 100644 --- a/hw/arm/fsl-imx7.c +++ b/hw/arm/fsl-imx7.c @@ -34,14 +34,6 @@ static void fsl_imx7_init(Object *obj) char name[NAME_SIZE]; int i; - -for (i = 0; i < MIN(smp_cpus, FSL_IMX7_NUM_CPUS); i++) { -snprintf(name, NAME_SIZE, "cpu%d", i); -object_initialize_child(obj, name, &s->cpu[i], sizeof(s->cpu[i]), -ARM_CPU_TYPE_NAME("cortex-a7"), &error_abort, -NULL); -} - /* * A7MPCORE */ @@ -167,6 +159,13 @@ static void fsl_imx7_realize(DeviceState *dev, Error **errp) return; } +for (i = 0; i < MIN(smp_cpus, FSL_IMX7_NUM_CPUS); i++) { +snprintf(name, NAME_SIZE, "cpu%d", i); +object_initialize_child(OBJECT(dev), name, &s->cpu[i], +sizeof(s->cpu[i]), ARM_CPU_TYPE_NAME("cortex-a7"), +&error_abort, NULL); +} + for (i = 0; i < smp_cpus; i++) { o = OBJECT(&s->cpu[i]); -- 1.8.3.1
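The ordering argued for in this patch (check the requested CPU count first, create CPU objects only afterwards) can be sketched in isolation. This is an illustrative toy, not the fsl-imx code: ToySoC, TOY_SOC_NUM_CPUS and toy_soc_realize() are hypothetical, and the error message merely imitates the format seen in the diffs.

```c
#include <stdio.h>

#define TOY_SOC_NUM_CPUS 4

/* Hypothetical SoC state: one flag per CPU slot instead of real CPU objects. */
typedef struct ToySoC {
    int cpu_initialized[TOY_SOC_NUM_CPUS];
    int nr_cpus;
} ToySoC;

/*
 * Validate first, create afterwards. On failure nothing has been
 * initialized, so there is nothing to tear down.
 * Returns 0 on success, -1 on error with a message in errbuf.
 */
static int toy_soc_realize(ToySoC *s, unsigned int requested,
                           char *errbuf, size_t errlen)
{
    if (requested > TOY_SOC_NUM_CPUS) {
        snprintf(errbuf, errlen, "Only %d CPUs are supported (%u requested)",
                 TOY_SOC_NUM_CPUS, requested);
        return -1;
    }

    /* The bound is already checked, so no MIN() clamp is needed here. */
    for (unsigned int i = 0; i < requested; i++) {
        s->cpu_initialized[i] = 1;
    }
    s->nr_cpus = (int)requested;
    return 0;
}
```

In this sketch a rejected request leaves the state untouched, which is the property the commit message is after when it moves CPU creation out of the init function.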
Re: [Qemu-devel] [PATCH 2/9] cpu/topology: add general support for machine properties
On 2019/4/4 22:25, Igor Mammedov wrote:
On Fri, 29 Mar 2019 16:48:38 +0800 Like Xu wrote:

diff --git a/cpus.c b/cpus.c
index e83f72b..834a697 100644
--- a/cpus.c
+++ b/cpus.c
@@ -2067,6 +2067,10 @@ static void qemu_dummy_start_vcpu(CPUState *cpu)
 void qemu_init_vcpu(CPUState *cpu)
 {
+    MachineState *ms = MACHINE(qdev_get_machine());
+    unsigned int smp_cores = ms->topo.smp_cores;
+    unsigned int smp_threads = ms->topo.smp_threads;

(***) For one, this will probably crash *-user builds. Secondly, the purpose of getting rid of the smp_foo globals is to disentangle layer violations, not to replace them with another global (qdev_get_machine()).

I am happy to follow this rule in the cpu-topo refactoring work, but sometimes calling qdev_get_machine() is inevitable.

What should be done is to make nr_cores/nr_threads properties and set them from the parent object that creates the CPUs. The point is that a CPU shouldn't reach outside itself to fish out the data bits it needs; it is the creator's responsibility to feed the CPU being created the properties it needs. This kind of refactoring probably deserves its own series and should precede the -smp refactoring, as it doesn't depend on CpuTopology at all.

The division of responsibility for this case (refactoring qemu_init_vcpu) seems to be a poisoned apple. The prerequisite for setting cpu->nr_cores / nr_threads from the parent is that the CPU has already been created, so if any step during initialization needs this topology information, it will use the default values from cpu_common_initfn() instead of the user-configured parameters. We may not want to repeat those assignments with the new values. What do you think, Igor?
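Both positions in this exchange can be seen in a small model. The sketch below is hypothetical and does not use the QOM API: ToyCPU, ToyBoard and the toy_* functions are invented for illustration. The creator pushes the topology into the CPU before realize, and the defaults set at init time are what any code running between init and the property assignment would observe.

```c
#include <stdbool.h>

/* Hypothetical CPU object with the two topology fields under discussion. */
typedef struct ToyCPU {
    int nr_cores;
    int nr_threads;
    bool realized;
} ToyCPU;

/* Instance init: only defaults are available at this point. */
static void toy_cpu_init(ToyCPU *cpu)
{
    cpu->nr_cores = 1;
    cpu->nr_threads = 1;
    cpu->realized = false;
}

/* Property-style setter: accepted only before realize. */
static bool toy_cpu_set_topology(ToyCPU *cpu, int cores, int threads)
{
    if (cpu->realized || cores < 1 || threads < 1) {
        return false;
    }
    cpu->nr_cores = cores;
    cpu->nr_threads = threads;
    return true;
}

static void toy_cpu_realize(ToyCPU *cpu)
{
    cpu->realized = true;
}

/* Hypothetical creator (board/machine) that owns the configuration. */
typedef struct ToyBoard {
    int cores;
    int threads;
} ToyBoard;

/*
 * The creator feeds the CPU its properties; the CPU never looks up a
 * global. Between toy_cpu_init() and toy_cpu_set_topology() the CPU
 * still carries the defaults, which is the window the reply worries about.
 */
static void toy_board_create_cpu(const ToyBoard *board, ToyCPU *cpu)
{
    toy_cpu_init(cpu);
    toy_cpu_set_topology(cpu, board->cores, board->threads);
    toy_cpu_realize(cpu);
}
```

Under these assumptions the concern about default values only matters for code that reads nr_cores or nr_threads before the creator has set them; anything that runs at or after realize sees the configured values.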
Re: [Qemu-devel] [PATCH 0/4] Remove some qdev_get_machine() calls from CONFIG_USER_ONLY
On 2019/4/26 4:00, Eduardo Habkost wrote:

This series moves some qdev code outside qdev.o, so it can be compiled only in CONFIG_SOFTMMU. The code being moved includes two qdev_get_machine() calls, so this will make it easier to move qdev_get_machine() to CONFIG_SOFTMMU later.

After this series, there's one remaining qdev_get_machine() call that seems more difficult to remove:

static void device_set_realized(Object *obj, bool value, Error **errp)
{
    /* [...] */
    if (!obj->parent) {
        gchar *name = g_strdup_printf("device[%d]", unattached_count++);
        object_property_add_child(container_get(qdev_get_machine(),
                                                "/unattached"),
                                  name, obj, &error_abort);
        unattached_parent = true;
        g_free(name);
    }
    /* [...] */
}

I may have an experimental patch to fix the device_set_realized() issue:

1. In qdev_get_machine(), replace
       dev = container_get(object_get_root(), "/machine");
   with
       dev = object_resolve_path("/machine", NULL);

2. In device_set_realized(), use
       Object *container = qdev_get_machine() ? qdev_get_machine() : object_get_root();
   and pass it to
       object_property_add_child(container_get(container, "/unattached"), name, obj, &error_abort);

With this fix, we could say that qdev_get_machine() returns the "/machine" object (or NULL) rather than a confusing "/container". We could continue to use qdev_get_machine() in system emulation mode while getting rid of its surprising side effect, as Markus said. In user-only mode, the object used in place of qdev_get_machine() is the one returned by object_get_root(), so there are no semantic changes.

This one is tricky because on system emulation mode it needs "/machine" to already exist, but in user-only mode it needs to implicitly create a "/machine" container.

Eduardo Habkost (4):
  machine: Move gpio code to hw/core/gpio.c
  move qdev hotplug code to qdev-hotplug.c
  qdev: Don't compile hotplug code in user-mode emulation
  qdev-hotplug: Don't check type of qdev_get_machine()

 hw/core/bus.c                |  11 --
 hw/core/gpio.c               | 206
 hw/core/qdev-hotplug-stubs.c |  44 +++
 hw/core/qdev-hotplug.c       |  64 ++
 hw/core/qdev.c               | 219 ---
 hw/core/Makefile.objs        |   5 +-
 tests/Makefile.include       |   3 +-
 7 files changed, 320 insertions(+), 232 deletions(-)
 create mode 100644 hw/core/gpio.c
 create mode 100644 hw/core/qdev-hotplug-stubs.c
 create mode 100644 hw/core/qdev-hotplug.c
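The difference between a lookup that creates on a miss and one that only resolves, which is the core of the two-step proposal in this reply, can be modelled with a toy object tree. Everything here is hypothetical (ToyObject and the toy_* functions); it does not call container_get() or object_resolve_path() and only imitates the behaviour described in the thread.

```c
#include <stddef.h>
#include <string.h>

#define TOY_MAX_CHILDREN 4

/* Hypothetical object with a small fixed-size child table. */
typedef struct ToyObject {
    const char *name;
    struct ToyObject *children[TOY_MAX_CHILDREN];
    int nr_children;
} ToyObject;

/* Pure lookup: returns NULL on a miss and never modifies the tree. */
static ToyObject *toy_resolve_child(ToyObject *parent, const char *name)
{
    for (int i = 0; i < parent->nr_children; i++) {
        if (strcmp(parent->children[i]->name, name) == 0) {
            return parent->children[i];
        }
    }
    return NULL;
}

/*
 * Get-or-create lookup: on a miss it attaches a new child (using
 * caller-provided storage), which is the side effect the thread objects to.
 */
static ToyObject *toy_container_get(ToyObject *parent, const char *name,
                                    ToyObject *storage)
{
    ToyObject *child = toy_resolve_child(parent, name);

    if (child) {
        return child;
    }
    if (parent->nr_children >= TOY_MAX_CHILDREN) {
        return NULL;
    }
    storage->name = name;
    storage->nr_children = 0;
    parent->children[parent->nr_children++] = storage;
    return storage;
}

/* Fallback rule from the proposal: the machine if present, else the root. */
static ToyObject *toy_unattached_parent(ToyObject *root)
{
    ToyObject *machine = toy_resolve_child(root, "machine");

    return machine ? machine : root;
}
```

With the pure lookup, asking for the machine before it exists leaves the tree unchanged and the caller decides explicitly what to fall back to.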
Re: [Qemu-devel] [PATCH v3 2/2] core/qdev: refactor qdev_get_machine() with type assertion
On 2019/4/25 1:21, Eduardo Habkost wrote:
On Tue, Apr 23, 2019 at 03:59:31PM +0800, Like Xu wrote:
On 2019/4/18 1:10, Eduardo Habkost wrote:
On Wed, Apr 17, 2019 at 07:14:10AM +0200, Markus Armbruster wrote:
Eduardo Habkost writes:
On Mon, Apr 15, 2019 at 03:59:45PM +0800, Like Xu wrote:

To avoid the misuse of qdev_get_machine() if machine hasn't been created yet, this patch uses qdev_get_machine_uncheck() for obj-common (share with user-only mode) and adds type assertion to qdev_get_machine() in system-emulation mode.

Suggested-by: Igor Mammedov
Signed-off-by: Like Xu

Reviewed-by: Eduardo Habkost

I'm queueing the series on machine-next, thanks!

Hold your horses, please.

I dislike the name qdev_get_machine_uncheck(). I could live with qdev_get_machine_unchecked().

However, I doubt this is the right approach.

The issue at hand is undisciplined creation of QOM object /machine. This patch adds an assertion "undisciplined creation of /machine didn't create crap", but only in some places.

I think we should never create /machine as a (surprising!) side effect of qdev_get_machine(). Create it explicitly instead, and have qdev_get_machine() use object_resolve_path("/machine", NULL) to get it. Look ma, no side effects.

OK, I'm dropping this one while we discuss it.

I really miss a good explanation why qdev_get_machine_unchecked() needs to exist. When exactly do we want /machine to exist but not be TYPE_MACHINE? Why?

AFAICT, there is no such "/machine" that is not of type TYPE_MACHINE.

The original qdev_get_machine() would always return a "/container" object in user-only mode, and there is no TYPE_MACHINE object.

I'm confused. Both qdev_get_machine() and qdev_get_machine_unchecked() still return the object at "/machine". On softmmu, /machine will be of type TYPE_MACHINE. On user-only, /machine will be of type "container".

In system emulation mode, it returns the same "/container" object at the beginning, until we initialize and add a TYPE_MACHINE object to the "/container" as a child; after that it returns OBJECT(current_machine) for later usages.

The starting point is to avoid using the legacy qdev_get_machine() in system emulation mode when we haven't added the "/machine" object. As a result, we introduced type checking assertions to avoid premature invocations.

I believe Markus is suggesting that avoiding unwanted side effects is even better than doing type checking after it's already too late. In other words, it doesn't make sense to call container_get("/machine") on system emulation mode.

I agree.

In this proposal, the qdev_get_machine_unchecked() is only used in user-only mode, part of which is shared with system emulation mode (such as device_set_realized, cpu_common_realizefn).

The new qdev_get_machine() is only used in system emulation mode, and the type checking assertion does reduce the irrational use of this function (if any in the future).

This part confuses me as well. qdev_get_machine_unchecked() is used in both user-only and softmmu, isn't it? Thus we can't say it is only used in user-only mode.

You're right about this.

I think we all agree that qdev_get_machine() should eventually be available in softmmu only.

I think we need to make it happen to avoid calling qdev_get_machine() in user-only mode.

But I don't think we agree when it would be appropriate to call qdev_get_machine_unchecked() instead of qdev_get_machine(). In both examples in your patch, the code checks for TYPE_MACHINE immediately after calling qdev_get_machine_unchecked(). If that code is only useful in softmmu mode, when would anybody want to call qdev_get_machine_unchecked() in user-only mode?

We all agree to use this qdev_get_machine() as little as possible, and this patch could make future cleanup work easier.

Once the expectations and use cases are explained, we can choose a better name for qdev_get_machine_unchecked() and document it properly.
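For readers following the thread, here is a minimal sketch of the approach Markus suggests above: the object is created explicitly in one place, and the getter is a pure lookup that never creates anything. It is only a toy model in plain C. ToyObject, toy_create(), toy_resolve_path() and toy_get_machine() are hypothetical names invented for illustration and are not QEMU APIs; the real discussion concerns object_resolve_path() and container_get().

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a QOM object; not a QEMU type. */
typedef struct ToyObject {
    const char *path;   /* canonical path, e.g. "/machine" */
    const char *type;   /* type name, e.g. "machine" or "container" */
} ToyObject;

#define TOY_MAX_OBJECTS 8
static ToyObject toy_tree[TOY_MAX_OBJECTS];
static size_t toy_count;

/* Explicit creation: the only way an object enters the tree. */
ToyObject *toy_create(const char *path, const char *type)
{
    if (toy_count == TOY_MAX_OBJECTS) {
        return NULL;
    }
    ToyObject *obj = &toy_tree[toy_count++];
    obj->path = path;
    obj->type = type;
    return obj;
}

/* Pure lookup: returns NULL when nothing is registered at the path. */
ToyObject *toy_resolve_path(const char *path)
{
    for (size_t i = 0; i < toy_count; i++) {
        if (strcmp(toy_tree[i].path, path) == 0) {
            return &toy_tree[i];
        }
    }
    return NULL;
}

/*
 * Side-effect-free getter: it never creates "/machine", and it returns
 * NULL if the object is missing or is not of the "machine" type.
 */
ToyObject *toy_get_machine(void)
{
    ToyObject *obj = toy_resolve_path("/machine");

    if (obj != NULL && strcmp(obj->type, "machine") != 0) {
        return NULL;
    }
    return obj;
}
```

In this model a premature call simply yields NULL and leaves the tree untouched, so the caller can tell "not created yet" apart from "created", which is the property the type assertion in the patch tries to approximate after the fact.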
Re: [Qemu-devel] [PATCH v3 2/2] core/qdev: refactor qdev_get_machine() with type assertion
On 2019/4/18 1:10, Eduardo Habkost wrote:
On Wed, Apr 17, 2019 at 07:14:10AM +0200, Markus Armbruster wrote:
Eduardo Habkost writes:
On Mon, Apr 15, 2019 at 03:59:45PM +0800, Like Xu wrote:

To avoid the misuse of qdev_get_machine() if machine hasn't been created yet, this patch uses qdev_get_machine_uncheck() for obj-common (share with user-only mode) and adds type assertion to qdev_get_machine() in system-emulation mode.

Suggested-by: Igor Mammedov
Signed-off-by: Like Xu

Reviewed-by: Eduardo Habkost

I'm queueing the series on machine-next, thanks!

Hold your horses, please.

I dislike the name qdev_get_machine_uncheck(). I could live with qdev_get_machine_unchecked().

However, I doubt this is the right approach.

The issue at hand is undisciplined creation of QOM object /machine. This patch adds an assertion "undisciplined creation of /machine didn't create crap", but only in some places.

I think we should never create /machine as a (surprising!) side effect of qdev_get_machine(). Create it explicitly instead, and have qdev_get_machine() use object_resolve_path("/machine", NULL) to get it. Look ma, no side effects.

OK, I'm dropping this one while we discuss it.

I really miss a good explanation why qdev_get_machine_unchecked() needs to exist. When exactly do we want /machine to exist but not be TYPE_MACHINE? Why?

AFAICT, there is no such "/machine" that is not of type TYPE_MACHINE.

The original qdev_get_machine() would always return a "/container" object in user-only mode, and there is no TYPE_MACHINE object.

In system emulation mode, it returns the same "/container" object at the beginning, until we initialize and add a TYPE_MACHINE object to the "/container" as a child; after that it returns OBJECT(current_machine) for later usages.

The starting point is to avoid using the legacy qdev_get_machine() in system emulation mode when we haven't added the "/machine" object. As a result, we introduced type checking assertions to avoid premature invocations.

In this proposal, the qdev_get_machine_unchecked() is only used in user-only mode, part of which is shared with system emulation mode (such as device_set_realized, cpu_common_realizefn).

The new qdev_get_machine() is only used in system emulation mode, and the type checking assertion does reduce the irrational use of this function (if any in the future).

We all agree to use this qdev_get_machine() as little as possible, and this patch could make future cleanup work easier.

Once the expectations and use cases are explained, we can choose a better name for qdev_get_machine_unchecked() and document it properly.
Re: [Qemu-devel] [PATCH 3/9] cpu/topology: add uncommon arch support for smp machine properties
On 2019/4/8 20:54, Igor Mammedov wrote:
On Fri, 29 Mar 2019 16:48:39 +0800 Like Xu wrote:

here should be a commit message explaining what patch does in more detail.

Signed-off-by: Like Xu

Generic note, try not to call qdev_get_machine() every time you replace smp_cpus or other variables. It's often possible to pass MachineState via call chain with trivial fixups.

Hi Igor,

I have some doubts about this comment after some attempts.

I'm not sure if this idea could apply to all qdev_get_machine() usages in the tree or just to this smp-touch-only patch. It takes a lot of effort on hook overrides when we undo calls to qdev_get_machine() by modifying incoming parameters.

The implementation of qdev_get_machine() could hardly be simpler, and it doesn't seem to bring much overhead compared with passing parameters on the stack.

---
 hw/alpha/dp264.c     | 1 +
 hw/hppa/machine.c    | 4 ++++
 hw/mips/boston.c     | 1 +
 hw/mips/mips_malta.c | 9 +++++++++
 hw/sparc/sun4m.c     | 2 ++
 hw/sparc64/sun4u.c   | 2 ++
 hw/xtensa/sim.c      | 1 +
 hw/xtensa/xtfpga.c   | 1 +
 8 files changed, 21 insertions(+)

diff --git a/hw/alpha/dp264.c b/hw/alpha/dp264.c
index 0347eb8..ee5d432 100644
--- a/hw/alpha/dp264.c
+++ b/hw/alpha/dp264.c
@@ -63,6 +63,7 @@ static void clipper_init(MachineState *machine)
     char *palcode_filename;
     uint64_t palcode_entry, palcode_low, palcode_high;
     uint64_t kernel_entry, kernel_low, kernel_high;
+    unsigned int smp_cpus = machine->topo.smp_cpus;
     /* Create up to 4 cpus. */
     memset(cpus, 0, sizeof(cpus));
diff --git a/hw/hppa/machine.c b/hw/hppa/machine.c
index d1b1d3c..f652891 100644
--- a/hw/hppa/machine.c
+++ b/hw/hppa/machine.c
@@ -16,6 +16,7 @@
 #include "hw/ide.h"
 #include "hw/timer/i8254.h"
 #include "hw/char/serial.h"
+#include "hw/boards.h"
 #include "hppa_sys.h"
 #include "qemu/units.h"
 #include "qapi/error.h"
@@ -72,6 +73,7 @@ static void machine_hppa_init(MachineState *machine)
     MemoryRegion *ram_region;
     MemoryRegion *cpu_region;
     long i;
+    unsigned int smp_cpus = machine->topo.smp_cpus;

I'd prefer to replace smp_cpus with machine->topo.smp_cpus directly at the places it's used, as it makes affected sites more visible in the patch. And use a local smp_cpus only in places where using machine->topo.smp_cpus makes the code less readable. (but it's just personal preference so I don't insist on it)

     ram_size = machine->ram_size;
@@ -242,7 +244,9 @@ static void machine_hppa_init(MachineState *machine)
 static void hppa_machine_reset(void)
 {
+    MachineState *ms = MACHINE(qdev_get_machine());
     int i;
+    unsigned int smp_cpus = ms->topo.smp_cpus;

***) It would be better to pass MachineState as an argument to hppa_machine_reset(); a patch to do so should go before this one. A quick look shows only 3 overrides (hppa, pc, pnv) and one caller, so I'd rather fix it than call qdev_get_machine() unnecessarily.

     qemu_devices_reset();
diff --git a/hw/mips/boston.c b/hw/mips/boston.c
index e5bab3c..7752c10 100644
--- a/hw/mips/boston.c
+++ b/hw/mips/boston.c
@@ -434,6 +434,7 @@ static void boston_mach_init(MachineState *machine)
     DriveInfo *hd[6];
     Chardev *chr;
     int fw_size, fit_err;
+    unsigned int smp_cpus = machine->topo.smp_cpus;
     bool is_64b;
     if ((machine->ram_size % GiB) ||
diff --git a/hw/mips/mips_malta.c b/hw/mips/mips_malta.c
index 439665a..d595375 100644
--- a/hw/mips/mips_malta.c
+++ b/hw/mips/mips_malta.c
@@ -1095,6 +1095,8 @@ static int64_t load_kernel (void)
 static void malta_mips_config(MIPSCPU *cpu)
 {
+    MachineState *ms = MACHINE(qdev_get_machine());
+    unsigned int smp_cpus = ms->topo.smp_cpus;
     CPUMIPSState *env = &cpu->env;
     CPUState *cs = CPU(cpu);

this one is also called from reset, so the same [***] applies here too.

@@ -1127,9 +1129,11 @@ static void main_cpu_reset(void *opaque)
 static void create_cpu_without_cps(const char *cpu_type,
                                    qemu_irq *cbus_irq, qemu_irq *i8259_irq)
 {
+    MachineState *ms = MACHINE(qdev_get_machine());

the caller has access to MachineState, so pass it down the call chain all the way

     CPUMIPSState *env;
     MIPSCPU *cpu;
     int i;
+    unsigned int smp_cpus = ms->topo.smp_cpus;
     for (i = 0; i < smp_cpus; i++) {
         cpu = MIPS_CPU(cpu_create(cpu_type));
@@ -1149,7 +1153,9 @@ static void create_cpu_without_cps(const char *cpu_type,
 static void create_cps(MaltaState *s, const char *cpu_type,
                        qemu_irq *cbus_irq, qemu_irq *i8259_irq)
 {
+    MachineState *ms = MACHINE(qdev_get_machine());

ditto

     Error *err = NULL;
+    unsigned int smp_cpus = ms->topo.smp_cpus;
     s->cps = MIPS_CPS(object_new(TYPE_MIPS_CPS));
     qdev_set_parent_bus(DEVI
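Igor's recurring review point in this message (pass MachineState down the call chain instead of calling qdev_get_machine() inside helpers) can be sketched in isolation. The code below is a toy model only: ToyMachine, ToyCpu, toy_create_cpus() and toy_board_init() are hypothetical names, and only the smp_cpus field from the patch is modelled. It is not the code the series ended up with.

```c
#include <stddef.h>

/* Hypothetical miniature of MachineState with only the discussed field. */
typedef struct ToyTopology {
    unsigned int smp_cpus;
} ToyTopology;

typedef struct ToyMachine {
    ToyTopology topo;
} ToyMachine;

typedef struct ToyCpu {
    int index;
    int started;   /* only CPU 0 starts running in this toy model */
} ToyCpu;

/*
 * The helper receives the machine as a parameter, so it needs no global
 * lookup and its dependency is visible in the signature.
 */
unsigned int toy_create_cpus(const ToyMachine *machine,
                             ToyCpu *cpus, size_t capacity)
{
    unsigned int created = 0;

    for (unsigned int i = 0; i < machine->topo.smp_cpus && i < capacity; i++) {
        cpus[i].index = (int)i;
        cpus[i].started = (i == 0);
        created++;
    }
    return created;
}

/* The board init function already holds the machine, so it forwards it. */
unsigned int toy_board_init(const ToyMachine *machine,
                            ToyCpu *cpus, size_t capacity)
{
    return toy_create_cpus(machine, cpus, capacity);
}
```

The cost Like Xu raises is also visible here: every function between the board init and the helper has to grow a parameter, which is cheap for a direct call chain but requires changing hook signatures for callbacks such as the reset handlers.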
[Qemu-devel] [PATCH v3 1/2] vl.c: refactor current_machine as non-global variable
This patch makes the remaining dozen or so uses of the global current_machine outside vl.c use qdev_get_machine() instead, and then makes current_machine local to vl.c instead of global.

Suggested-by: Peter Maydell
Signed-off-by: Like Xu
---
 accel/kvm/kvm-all.c | 6 ++++--
 device-hotplug.c    | 3 ++-
 device_tree.c       | 3 ++-
 exec.c              | 6 ++++--
 hw/ppc/spapr_rtas.c | 3 ++-
 include/hw/boards.h | 1 -
 migration/savevm.c  | 9 ++++++---
 qmp.c               | 3 ++-
 target/i386/kvm.c   | 3 ++-
 target/ppc/kvm.c    | 3 ++-
 vl.c                | 4 ++--
 11 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 241db49..d103de2 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -140,7 +140,8 @@ static const KVMCapabilityInfo kvm_required_capabilites[] = {
 int kvm_get_max_memslots(void)
 {
-    KVMState *s = KVM_STATE(current_machine->accelerator);
+    MachineState *ms = MACHINE(qdev_get_machine());
+    KVMState *s = KVM_STATE(ms->accelerator);
     return s->nr_slots;
 }
@@ -1519,7 +1520,8 @@ static int kvm_max_vcpu_id(KVMState *s)
 bool kvm_vcpu_id_is_valid(int vcpu_id)
 {
-    KVMState *s = KVM_STATE(current_machine->accelerator);
+    MachineState *ms = MACHINE(qdev_get_machine());
+    KVMState *s = KVM_STATE(ms->accelerator);
     return vcpu_id >= 0 && vcpu_id < kvm_max_vcpu_id(s);
 }
diff --git a/device-hotplug.c b/device-hotplug.c
index 6153259..d31c1f8 100644
--- a/device-hotplug.c
+++ b/device-hotplug.c
@@ -37,6 +37,7 @@
 static DriveInfo *add_init_drive(const char *optstr)
 {
+    MachineState *ms = MACHINE(qdev_get_machine());
     Error *err = NULL;
     DriveInfo *dinfo;
     QemuOpts *opts;
@@ -46,7 +47,7 @@ static DriveInfo *add_init_drive(const char *optstr)
     if (!opts)
         return NULL;
-    mc = MACHINE_GET_CLASS(current_machine);
+    mc = MACHINE_GET_CLASS(ms);
     dinfo = drive_new(opts, mc->block_default_type, &err);
     if (err) {
         error_report_err(err);
diff --git a/device_tree.c b/device_tree.c
index f8b46b3..3294ef6 100644
--- a/device_tree.c
+++ b/device_tree.c
@@ -459,6 +459,7 @@ int qemu_fdt_setprop_phandle(void *fdt, const char *node_path,
 uint32_t qemu_fdt_alloc_phandle(void *fdt)
 {
+    MachineState *ms = MACHINE(qdev_get_machine());
     static int phandle = 0x0;
     /*
@@ -466,7 +467,7 @@ uint32_t qemu_fdt_alloc_phandle(void *fdt)
      * which phandle id to start allocating phandles.
      */
     if (!phandle) {
-        phandle = machine_phandle_start(current_machine);
+        phandle = machine_phandle_start(ms);
     }
     if (!phandle) {
diff --git a/exec.c b/exec.c
index 6ab62f4..15ff2b1 100644
--- a/exec.c
+++ b/exec.c
@@ -1969,10 +1969,11 @@ static unsigned long last_ram_page(void)
 static void qemu_ram_setup_dump(void *addr, ram_addr_t size)
 {
+    MachineState *ms = MACHINE(qdev_get_machine());
     int ret;
     /* Use MADV_DONTDUMP, if user doesn't want the guest memory in the core */
-    if (!machine_dump_guest_core(current_machine)) {
+    if (!machine_dump_guest_core(ms)) {
         ret = qemu_madvise(addr, size, QEMU_MADV_DONTDUMP);
         if (ret) {
             perror("qemu_madvise");
@@ -2094,7 +2095,8 @@ size_t qemu_ram_pagesize_largest(void)
 static int memory_try_enable_merging(void *addr, size_t len)
 {
-    if (!machine_mem_merge(current_machine)) {
+    MachineState *ms = MACHINE(qdev_get_machine());
+    if (!machine_mem_merge(ms)) {
         /* disabled by the user */
         return 0;
     }
diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index 24c45b1..51e320d 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -231,6 +231,7 @@ static void rtas_ibm_get_system_parameter(PowerPCCPU *cpu,
                                           target_ulong args,
                                           uint32_t nret, target_ulong rets)
 {
+    MachineState *ms = MACHINE(spapr);
     target_ulong parameter = rtas_ld(args, 0);
     target_ulong buffer = rtas_ld(args, 1);
     target_ulong length = rtas_ld(args, 2);
@@ -243,7 +244,7 @@ static void rtas_ibm_get_system_parameter(PowerPCCPU *cpu,
                                           "DesProcs=%d,"
                                           "MaxPlatProcs=%d",
                                           max_cpus,
-                                          current_machine->ram_size / MiB,
+                                          ms->ram_size / MiB,
                                           smp_cpus,
                                           max_cpus);
         ret = sysparm_st(buffer, length, param_val, strlen(param_val) + 1);
diff --git a/include/hw/boards.h b/include/hw/boards.h
index e231860..1d598c8 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -58,7 +58,6 @@ void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner,
 OBJECT_CLASS_CHE
[Qemu-devel] [PATCH v3 2/2] core/qdev: refactor qdev_get_machine() with type assertion
To avoid the misuse of qdev_get_machine() if machine hasn't been created yet, this patch uses qdev_get_machine_uncheck() for obj-common (share with user-only mode) and adds type assertion to qdev_get_machine() in system-emulation mode.

Suggested-by: Igor Mammedov
Signed-off-by: Like Xu
---
 hw/core/qdev.c         | 16 +++++++++++++---
 include/hw/qdev-core.h |  1 +
 qom/cpu.c              |  5 +++--
 3 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index f9b6efe..8232216 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -223,7 +223,7 @@ HotplugHandler *qdev_get_machine_hotplug_handler(DeviceState *dev)
 {
     MachineState *machine;
     MachineClass *mc;
-    Object *m_obj = qdev_get_machine();
+    Object *m_obj = qdev_get_machine_uncheck();
     if (object_dynamic_cast(m_obj, TYPE_MACHINE)) {
         machine = MACHINE(m_obj);
@@ -815,7 +815,7 @@ static void device_set_realized(Object *obj, bool value, Error **errp)
         if (!obj->parent) {
             gchar *name = g_strdup_printf("device[%d]", unattached_count++);
-            object_property_add_child(container_get(qdev_get_machine(),
+            object_property_add_child(container_get(qdev_get_machine_uncheck(),
                                                     "/unattached"),
                                       name, obj, &error_abort);
             unattached_parent = true;
@@ -1095,7 +1095,7 @@ void device_reset(DeviceState *dev)
     }
 }
-Object *qdev_get_machine(void)
+Object *qdev_get_machine_uncheck(void)
 {
     static Object *dev;
@@ -1106,6 +1106,16 @@ Object *qdev_get_machine(void)
     return dev;
 }
+Object *qdev_get_machine(void)
+{
+    static Object *dev;
+
+    dev = qdev_get_machine_uncheck();
+    assert(object_dynamic_cast(dev, TYPE_MACHINE) != NULL);
+
+    return dev;
+}
+
 static const TypeInfo device_type_info = {
     .name = TYPE_DEVICE,
     .parent = TYPE_OBJECT,
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 33ed3b8..e7c6a5a 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -429,6 +429,7 @@ const struct VMStateDescription *qdev_get_vmsd(DeviceState *dev);
 const char *qdev_fw_name(DeviceState *dev);
+Object *qdev_get_machine_uncheck(void);
 Object *qdev_get_machine(void);
 /* FIXME: make this a link<> */
diff --git a/qom/cpu.c b/qom/cpu.c
index a8d2958..bb877d5 100644
--- a/qom/cpu.c
+++ b/qom/cpu.c
@@ -325,9 +325,10 @@ static void cpu_common_parse_features(const char *typename, char *features,
 static void cpu_common_realizefn(DeviceState *dev, Error **errp)
 {
     CPUState *cpu = CPU(dev);
-    Object *machine = qdev_get_machine();
+    Object *machine = qdev_get_machine_uncheck();
-    /* qdev_get_machine() can return something that's not TYPE_MACHINE
+    /*
+     * qdev_get_machine_uncheck() can return something that's not TYPE_MACHINE
      * if this is one of the user-only emulators; in that case there's
      * no need to check the ignore_memory_transaction_failures board flag.
      */
--
1.8.3.1
[Qemu-devel] [PATCH v3 0/2] vl.c: make current_machine as non-global variable
This patch makes the remaining dozen or so uses of the global current_machine outside vl.c use qdev_get_machine() instead, and then makes current_machine local to vl.c instead of global.

With the type assertion in qdev_get_machine(), it will be hard to misuse this function if the machine hasn't been created yet. For obj-common cases, qdev_get_machine_uncheck() is applied without semantic change.

---
Changes in v3:
- add TYPE_MACHINE assertion for qdev_get_machine() usage
- apply qdev_get_machine_uncheck() for obj-common usage

Changes in v2:
- make the variable current_machine "static" (Thomas Huth)

Like Xu (2):
  vl.c: refactor current_machine as non-global variable
  core/qdev: refactor qdev_get_machine() with type assertion

 accel/kvm/kvm-all.c    |  6 ++++--
 device-hotplug.c       |  3 ++-
 device_tree.c          |  3 ++-
 exec.c                 |  6 ++++--
 hw/core/qdev.c         | 16 +++++++++++++---
 hw/ppc/spapr_rtas.c    |  3 ++-
 include/hw/boards.h    |  1 -
 include/hw/qdev-core.h |  1 +
 migration/savevm.c     |  9 ++++++---
 qmp.c                  |  3 ++-
 qom/cpu.c              |  5 +++--
 target/i386/kvm.c      |  3 ++-
 target/ppc/kvm.c       |  3 ++-
 vl.c                   |  4 ++--
 14 files changed, 45 insertions(+), 21 deletions(-)

--
1.8.3.1
Re: [Qemu-devel] [PATCH 0/9] refactor cpu topo into machine properties
On 2019/4/8 21:26, Igor Mammedov wrote:
On Thu, 4 Apr 2019 11:26:09 +0800 Like Xu wrote:
On 2019/3/29 18:21, Igor Mammedov wrote:
On Fri, 29 Mar 2019 16:48:36 +0800 Like Xu wrote:

This patch series makes the existing cores/threads/sockets into machine properties and gets rid of the global variables they currently use.

Thanks for looking into it! It's a long overdue and rather desired conversion (albeit a naive one, but this series is a good starting point). I'll go over your patches next week with comments and concrete suggestions on how to implement particular things.

Hi Igor,

any comments and suggestions on smp machine properties in this patch, considering we may add die topology for PCMachine as an extension?

I've looked at several patches and that's it for this series. Most of the comments apply to the patches I've not reviewed as well.

Hi Igor,

thanks for your comments, time and patience. I'll try to fix them in the next version ASAP.

Like Xu (9):
  cpu/topology: add struct CpuTopology to MachineState
  cpu/topology: add general support for machine properties
  cpu/topology: add uncommon arch support for smp machine properties
  cpu/topology: add ARM support for smp machine properties
  cpu/topology: add i386 support for smp machine properties
  cpu/topology: add PPC support for smp machine properties
  cpu/topology: add riscv support for smp machine properties
  cpu/topology: add s390x support for smp machine properties
  cpu/topology: replace smp global variables with machine propertie

 accel/kvm/kvm-all.c          |  3 +++
 backends/hostmem.c           |  4
 cpus.c                       |  4
 exec.c                       |  2 ++
 gdbstub.c                    |  7 ++-
 hw/alpha/dp264.c             |  1 +
 hw/arm/fsl-imx6.c            |  5 +
 hw/arm/fsl-imx6ul.c          |  5 +
 hw/arm/fsl-imx7.c            |  5 +
 hw/arm/highbank.c            |  1 +
 hw/arm/mcimx6ul-evk.c        |  1 +
 hw/arm/mcimx7d-sabre.c       |  3 +++
 hw/arm/raspi.c               |  2 ++
 hw/arm/realview.c            |  1 +
 hw/arm/sabrelite.c           |  1 +
 hw/arm/vexpress.c            |  3 +++
 hw/arm/virt.c                |  7 +++
 hw/arm/xlnx-zynqmp.c         |  7 +++
 hw/cpu/core.c                |  3 +++
 hw/hppa/machine.c            |  4
 hw/i386/acpi-build.c         |  3 +++
 hw/i386/kvmvapic.c           |  5 +
 hw/i386/pc.c                 | 12 +++
 hw/mips/boston.c             |  1 +
 hw/mips/mips_malta.c         |  9 +
 hw/openrisc/openrisc_sim.c   |  1 +
 hw/ppc/e500.c                |  3 +++
 hw/ppc/mac_newworld.c        |  2 ++
 hw/ppc/mac_oldworld.c        |  2 ++
 hw/ppc/pnv.c                 |  3 +++
 hw/ppc/prep.c                |  2 ++
 hw/ppc/spapr.c               | 29 ++
 hw/ppc/spapr_rtas.c          |  3 +++
 hw/riscv/sifive_e.c          |  4
 hw/riscv/sifive_plic.c       |  3 +++
 hw/riscv/sifive_u.c          |  4
 hw/riscv/spike.c             |  2 ++
 hw/riscv/virt.c              |  1 +
 hw/s390x/s390-virtio-ccw.c   |  2 ++
 hw/s390x/sclp.c              |  1 +
 hw/smbios/smbios.c           | 11 ++
 hw/sparc/sun4m.c             |  2 ++
 hw/sparc64/sun4u.c           |  2 ++
 hw/xtensa/sim.c              |  1 +
 hw/xtensa/xtfpga.c           |  1 +
 include/hw/arm/virt.h        |  2 +-
 include/hw/boards.h          |  8
 include/sysemu/sysemu.h      |  2 +-
 migration/postcopy-ram.c     |  7 +++
 numa.c                       |  1 +
 target/arm/cpu.c             |  7 +++
 target/i386/cpu.c            |  4
 target/openrisc/sys_helper.c |  5 +
 target/s390x/cpu.c           |  3 +++
 target/s390x/excp_helper.c   |  6 ++
 tcg/tcg.c                    | 15 ++
 vl.c                         | 48
 57 files changed, 261 insertions(+), 25 deletions(-)