Hi Zhao,

On 8/1/23 05:35, Zhao Liu wrote:
> From: Zhuocheng Ding <zhuocheng.d...@intel.com>
> 
> From CPUState.nr_cores' comment, it represents "number of cores within
> this CPU package".
> 
> After 003f230e37d7 ("machine: Tweak the order of topology members in
> struct CpuTopology"), the meaning of smp.cores changed to "the number of
> cores in one die", but this commit missed to change CPUState.nr_cores'
> caculation, so that CPUState.nr_cores became wrong and now it
> misses to consider numbers of clusters and dies.
> 
> At present, only i386 is using CPUState.nr_cores.
> 
> But as for i386, which supports die level, the uses of CPUState.nr_cores
> are very confusing:
> 
> Early uses are based on the meaning of "cores per package" (before die
> is introduced into i386), and later uses are based on "cores per die"
> (after die's introduction).
> 
> This difference is due to that commit a94e1428991f ("target/i386: Add
> CPUID.1F generation support for multi-dies PCMachine") misunderstood
> that CPUState.nr_cores means "cores per die" when caculated
> CPUID.1FH.01H:EBX. After that, the changes in i386 all followed this
> wrong understanding.
> 
> With the influence of 003f230e37d7 and a94e1428991f, for i386 currently
> the result of CPUState.nr_cores is "cores per die", thus the original
> uses of CPUState.cores based on the meaning of "cores per package" are
> wrong when mutiple dies exist:
> 1. In cpu_x86_cpuid() of target/i386/cpu.c, CPUID.01H:EBX[bits 23:16] is
>    incorrect because it expects "cpus per package" but now the
>    result is "cpus per die".
> 2. In cpu_x86_cpuid() of target/i386/cpu.c, for all leaves of CPUID.04H:
>    EAX[bits 31:26] is incorrect because they expect "cpus per package"
>    but now the result is "cpus per die". The error not only impacts the
>    EAX caculation in cache_info_passthrough case, but also impacts other
>    cases of setting cache topology for Intel CPU according to cpu
>    topology (specifically, the incoming parameter "num_cores" expects
>    "cores per package" in encode_cache_cpuid4()).
> 3. In cpu_x86_cpuid() of target/i386/cpu.c, CPUID.0BH.01H:EBX[bits
>    15:00] is incorrect because the EBX of 0BH.01H (core level) expects
>    "cpus per package", which may be different with 1FH.01H (The reason
>    is 1FH can support more levels. For QEMU, 1FH also supports die,
>    1FH.01H:EBX[bits 15:00] expects "cpus per die").
> 4. In cpu_x86_cpuid() of target/i386/cpu.c, when CPUID.80000001H is
>    caculated, here "cpus per package" is expected to be checked, but in
>    fact, now it checks "cpus per die". Though "cpus per die" also works
>    for this code logic, this isn't consistent with AMD's APM.
> 5. In cpu_x86_cpuid() of target/i386/cpu.c, CPUID.80000008H:ECX expects
>    "cpus per package" but it obtains "cpus per die".
> 6. In simulate_rdmsr() of target/i386/hvf/x86_emu.c, in
>    kvm_rdmsr_core_thread_count() of target/i386/kvm/kvm.c, and in
>    helper_rdmsr() of target/i386/tcg/sysemu/misc_helper.c,
>    MSR_CORE_THREAD_COUNT expects "cpus per package" and "cores per
>    package", but in these functions, it obtains "cpus per die" and
>    "cores per die".
> 
> On the other hand, these uses are correct now (they are added in/after
> a94e1428991f):
> 1. In cpu_x86_cpuid() of target/i386/cpu.c, topo_info.cores_per_die
>    meets the actual meaning of CPUState.nr_cores ("cores per die").
> 2. In cpu_x86_cpuid() of target/i386/cpu.c, vcpus_per_socket (in CPUID.
>    04H's caculation) considers number of dies, so it's correct.
> 3. In cpu_x86_cpuid() of target/i386/cpu.c, CPUID.1FH.01H:EBX[bits
>    15:00] needs "cpus per die" and it gets the correct result, and
>    CPUID.1FH.02H:EBX[bits 15:00] gets correct "cpus per package".
> 
> When CPUState.nr_cores is correctly changed to "cores per package" again
> , the above errors will be fixed without extra work, but the "currently"
> correct cases will go wrong and need special handling to pass correct
> "cpus/cores per die" they want.
> 
> Thus in this patch, we fix CPUState.nr_cores' caculation to fit the

s/Thus in this patch, we fix CPUState.nr_cores' caculation/Fix
CPUState.nr_cores' calculation/


Describe your changes in imperative mood also spell check.
Thanks
Babu


> original meaning "cores per package", as well as changing calculation of
> topo_info.cores_per_die, vcpus_per_socket and CPUID.1FH.
> 
> In addition, in the nr_threads' comment, specify it represents the
> number of threads in the "core" to avoid confusion, and also add comment
> for nr_dies in CPUX86State.
> 
> Fixes: a94e1428991f ("target/i386: Add CPUID.1F generation support for 
> multi-dies PCMachine")
> Fixes: 003f230e37d7 ("machine: Tweak the order of topology members in struct 
> CpuTopology")
> Signed-off-by: Zhuocheng Ding <zhuocheng.d...@intel.com>
> Co-developed-by: Zhao Liu <zhao1....@intel.com>
> Signed-off-by: Zhao Liu <zhao1....@intel.com>
> ---
> Changes since v2:
>  * Use wrapped helper to get cores per socket in qemu_init_vcpu().
> Changes since v1:
>  * Add comment for nr_dies in CPUX86State. (Yanan)
> ---
>  include/hw/core/cpu.h | 2 +-
>  softmmu/cpus.c        | 2 +-
>  target/i386/cpu.c     | 9 ++++-----
>  target/i386/cpu.h     | 1 +
>  4 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
> index fdcbe8735258..57f4d50ace72 100644
> --- a/include/hw/core/cpu.h
> +++ b/include/hw/core/cpu.h
> @@ -277,7 +277,7 @@ struct qemu_work_item;
>   *   See TranslationBlock::TCG CF_CLUSTER_MASK.
>   * @tcg_cflags: Pre-computed cflags for this cpu.
>   * @nr_cores: Number of cores within this CPU package.
> - * @nr_threads: Number of threads within this CPU.
> + * @nr_threads: Number of threads within this CPU core.
>   * @running: #true if CPU is currently running (lockless).
>   * @has_waiter: #true if a CPU is currently waiting for the cpu_exec_end;
>   * valid under cpu_list_lock.
> diff --git a/softmmu/cpus.c b/softmmu/cpus.c
> index fed20ffb5dd2..984558d7b245 100644
> --- a/softmmu/cpus.c
> +++ b/softmmu/cpus.c
> @@ -630,7 +630,7 @@ void qemu_init_vcpu(CPUState *cpu)
>  {
>      MachineState *ms = MACHINE(qdev_get_machine());
>  
> -    cpu->nr_cores = ms->smp.cores;
> +    cpu->nr_cores = machine_topo_get_cores_per_socket(ms);
>      cpu->nr_threads =  ms->smp.threads;
>      cpu->stopped = true;
>      cpu->random_seed = qemu_guest_random_seed_thread_part1();
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 97ad229d8ba3..50613cd04612 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -6011,7 +6011,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
> uint32_t count,
>      X86CPUTopoInfo topo_info;
>  
>      topo_info.dies_per_pkg = env->nr_dies;
> -    topo_info.cores_per_die = cs->nr_cores;
> +    topo_info.cores_per_die = cs->nr_cores / env->nr_dies;
>      topo_info.threads_per_core = cs->nr_threads;
>  
>      /* Calculate & apply limits for different index ranges */
> @@ -6087,8 +6087,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
> uint32_t count,
>               */
>              if (*eax & 31) {
>                  int host_vcpus_per_cache = 1 + ((*eax & 0x3FFC000) >> 14);
> -                int vcpus_per_socket = env->nr_dies * cs->nr_cores *
> -                                       cs->nr_threads;
> +                int vcpus_per_socket = cs->nr_cores * cs->nr_threads;
>                  if (cs->nr_cores > 1) {
>                      *eax &= ~0xFC000000;
>                      *eax |= (pow2ceil(cs->nr_cores) - 1) << 26;
> @@ -6266,12 +6265,12 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
> uint32_t count,
>              break;
>          case 1:
>              *eax = apicid_die_offset(&topo_info);
> -            *ebx = cs->nr_cores * cs->nr_threads;
> +            *ebx = topo_info.cores_per_die * topo_info.threads_per_core;
>              *ecx |= CPUID_TOPOLOGY_LEVEL_CORE;
>              break;
>          case 2:
>              *eax = apicid_pkg_offset(&topo_info);
> -            *ebx = env->nr_dies * cs->nr_cores * cs->nr_threads;
> +            *ebx = cs->nr_cores * cs->nr_threads;
>              *ecx |= CPUID_TOPOLOGY_LEVEL_DIE;
>              break;
>          default:
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index e0771a10433b..7638128d59cc 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -1878,6 +1878,7 @@ typedef struct CPUArchState {
>  
>      TPRAccess tpr_access_type;
>  
> +    /* Number of dies within this CPU package. */
>      unsigned nr_dies;
>  } CPUX86State;
>  

-- 
Thanks
Babu Moger

Reply via email to