Re: [Qemu-devel] [PATCH 1/5] spapr: use spapr->vsmt to compute VCPU ids

2018-02-14 Thread David Gibson
On Wed, Feb 14, 2018 at 08:40:26PM +0100, Greg Kurz wrote:
> Since the introduction of VSMT in 2.11, the spacing of VCPU ids
> between cores is controllable through a machine property instead
> of being only dictated by the SMT mode of the host:
> 
> cpu->vcpu_id = (cc->core_id * spapr->vsmt / smp_threads) + i
> 
> Until recently, the machine code would try to change the SMT mode
> of the host to be equal to VSMT or exit. This allowed the rest of
> the code to assume that kvmppc_smt_threads() == spapr->vsmt is
> always true.
> 
> Recent commit "8904e5a75005 spapr: Adjust default VSMT value for
> better migration compatibility" relaxed the rule. If the VSMT
> mode cannot be set in KVM for some reasons, but the requested
> CPU topology is compatible with the current SMT mode, then we
> let the guest run with  kvmppc_smt_threads() != spapr->vsmt.
> 
> This breaks quite a few places in the code, in particular when
> calculating DRC indexes.
> 
> This is what happens on a POWER host with subcores-per-core=2 (ie,
> supports up to SMT4) when passing the following topology:
> 
> -smp threads=4,maxcpus=16 \
> -device host-spapr-cpu-core,core-id=4,id=core1 \
> -device host-spapr-cpu-core,core-id=8,id=core2
> 
> qemu-system-ppc64: warning: Failed to set KVM's VSMT mode to 8 (errno -22)
> 
> This is expected since KVM is limited to SMT4, but the guest is started
> anyway because this topology can run on SMT4 even with a VSMT8 spacing.
> 
> But when we look at the DT, things get nastier:
> 
> cpus {
> ...
> ibm,drc-indexes = <0x4 0x1000 0x1004 0x1008 0x100c>;
> 
> This means that we have the following association:
> 
>  CPU core device | DRC| VCPU id
> -++-
>boot core | 0x1000 | 0
>core1 | 0x1004 | 4
>core2 | 0x1008 | 8
>core3 | 0x100c | 12
> 
> But since the spacing of VCPU ids is 8, the DRC for core1 points to a
> VCPU that doesn't exist, the DRC for core2 points to the first VCPU of
> core1 and and so on...
> 
> ...
> 
> PowerPC,POWER8@0 {
> ...
> ibm,my-drc-index = <0x1000>;
> ...
> };
> 
> PowerPC,POWER8@8 {
> ...
> ibm,my-drc-index = <0x1008>;
> ...
> };
> 
> PowerPC,POWER8@10 {
> ...
> 
> No ibm,my-drc-index property for this core since 0x1010 doesn't
> exist in ibm,drc-indexes above.
> 
> ...
> };
> };
> 
> ...
> 
> interrupt-controller {
> ...
> ibm,interrupt-server-ranges = <0x0 0x10>;
> 
> With a spacing of 8, the highest VCPU id for the given topology should be:
> 16 * 8 / 4 = 32 and not 16
> 
> ...
> linux,phandle = <0x7e7323b8>;
> interrupt-controller;
> };
> 
> And CPU hot-plug/unplug is broken:
> 
> (qemu) device_del core1
> pseries-hotplug-cpu: Cannot find CPU (drc index 1004) to remove
> 
> (qemu) device_del core2
> cpu 4 (hwid 8) Ready to die...
> cpu 5 (hwid 9) Ready to die...
> cpu 6 (hwid 10) Ready to die...
> cpu 7 (hwid 11) Ready to die...
> 
> These are the VCPU ids of core1 actually
> 
> (qemu) device_add host-spapr-cpu-core,core-id=12,id=core3
> (qemu) device_del core3
> pseries-hotplug-cpu: Cannot find CPU (drc index 100c) to remove
> 
> This patches all the code in hw/ppc/spapr.c to assume the VSMT
> spacing when manipulating VCPU ids.
> 
> Fixes: 8904e5a75005
> Signed-off-by: Greg Kurz 

Ouch, good catch.  That's a lot of nasty bugs I hadn't realised were
there.  Applied, thanks.

> ---
>  hw/ppc/spapr.c |   24 ++--
>  1 file changed, 10 insertions(+), 14 deletions(-)



> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 9f29434819bd..ea7429c92a97 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -160,9 +160,9 @@ static void pre_2_10_vmstate_unregister_dummy_icp(int i)
> (void *)(uintptr_t) i);
>  }
>  
> -static inline int xics_max_server_number(void)
> +static int xics_max_server_number(sPAPRMachineState *spapr)
>  {
> -return DIV_ROUND_UP(max_cpus * kvmppc_smt_threads(), smp_threads);
> +return DIV_ROUND_UP(max_cpus * spapr->vsmt, smp_threads);
>  }
>  
>  static void xics_system_init(MachineState *machine, int nr_irqs, Error 
> **errp)
> @@ -194,7 +194,7 @@ static void xics_system_init(MachineState *machine, int 
> nr_irqs, Error **errp)
>  if (smc->pre_2_10_has_unused_icps) {
>  int i;
>  
> -for (i = 0; i < xics_max_server_number(); i++) {
> +for (i = 0; i < xics_max_server_number(spapr); i++) {
>  /* Dummy entries get deregistered when real ICPState objects
>   * are registered during CPU core hotplug.
>   */
> @@ -337,7 +337,6 @@ static int spapr_fixup_cpu_dt(void *fdt, 
> sPAPRMachineState *spapr)
>  int ret = 0, offset, cpus_offset;
> 

[Qemu-devel] [PATCH 1/5] spapr: use spapr->vsmt to compute VCPU ids

2018-02-14 Thread Greg Kurz
Since the introduction of VSMT in 2.11, the spacing of VCPU ids
between cores is controllable through a machine property instead
of being only dictated by the SMT mode of the host:

cpu->vcpu_id = (cc->core_id * spapr->vsmt / smp_threads) + i

Until recently, the machine code would try to change the SMT mode
of the host to be equal to VSMT or exit. This allowed the rest of
the code to assume that kvmppc_smt_threads() == spapr->vsmt is
always true.

Recent commit "8904e5a75005 spapr: Adjust default VSMT value for
better migration compatibility" relaxed the rule. If the VSMT
mode cannot be set in KVM for some reasons, but the requested
CPU topology is compatible with the current SMT mode, then we
let the guest run with  kvmppc_smt_threads() != spapr->vsmt.

This breaks quite a few places in the code, in particular when
calculating DRC indexes.

This is what happens on a POWER host with subcores-per-core=2 (ie,
supports up to SMT4) when passing the following topology:

-smp threads=4,maxcpus=16 \
-device host-spapr-cpu-core,core-id=4,id=core1 \
-device host-spapr-cpu-core,core-id=8,id=core2

qemu-system-ppc64: warning: Failed to set KVM's VSMT mode to 8 (errno -22)

This is expected since KVM is limited to SMT4, but the guest is started
anyway because this topology can run on SMT4 even with a VSMT8 spacing.

But when we look at the DT, things get nastier:

cpus {
...
ibm,drc-indexes = <0x4 0x1000 0x1004 0x1008 0x100c>;

This means that we have the following association:

 CPU core device | DRC| VCPU id
-++-
   boot core | 0x1000 | 0
   core1 | 0x1004 | 4
   core2 | 0x1008 | 8
   core3 | 0x100c | 12

But since the spacing of VCPU ids is 8, the DRC for core1 points to a
VCPU that doesn't exist, the DRC for core2 points to the first VCPU of
core1 and and so on...

...

PowerPC,POWER8@0 {
...
ibm,my-drc-index = <0x1000>;
...
};

PowerPC,POWER8@8 {
...
ibm,my-drc-index = <0x1008>;
...
};

PowerPC,POWER8@10 {
...

No ibm,my-drc-index property for this core since 0x1010 doesn't
exist in ibm,drc-indexes above.

...
};
};

...

interrupt-controller {
...
ibm,interrupt-server-ranges = <0x0 0x10>;

With a spacing of 8, the highest VCPU id for the given topology should be:
16 * 8 / 4 = 32 and not 16

...
linux,phandle = <0x7e7323b8>;
interrupt-controller;
};

And CPU hot-plug/unplug is broken:

(qemu) device_del core1
pseries-hotplug-cpu: Cannot find CPU (drc index 1004) to remove

(qemu) device_del core2
cpu 4 (hwid 8) Ready to die...
cpu 5 (hwid 9) Ready to die...
cpu 6 (hwid 10) Ready to die...
cpu 7 (hwid 11) Ready to die...

These are the VCPU ids of core1 actually

(qemu) device_add host-spapr-cpu-core,core-id=12,id=core3
(qemu) device_del core3
pseries-hotplug-cpu: Cannot find CPU (drc index 100c) to remove

This patches all the code in hw/ppc/spapr.c to assume the VSMT
spacing when manipulating VCPU ids.

Fixes: 8904e5a75005
Signed-off-by: Greg Kurz 
---
 hw/ppc/spapr.c |   24 ++--
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 9f29434819bd..ea7429c92a97 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -160,9 +160,9 @@ static void pre_2_10_vmstate_unregister_dummy_icp(int i)
(void *)(uintptr_t) i);
 }
 
-static inline int xics_max_server_number(void)
+static int xics_max_server_number(sPAPRMachineState *spapr)
 {
-return DIV_ROUND_UP(max_cpus * kvmppc_smt_threads(), smp_threads);
+return DIV_ROUND_UP(max_cpus * spapr->vsmt, smp_threads);
 }
 
 static void xics_system_init(MachineState *machine, int nr_irqs, Error **errp)
@@ -194,7 +194,7 @@ static void xics_system_init(MachineState *machine, int 
nr_irqs, Error **errp)
 if (smc->pre_2_10_has_unused_icps) {
 int i;
 
-for (i = 0; i < xics_max_server_number(); i++) {
+for (i = 0; i < xics_max_server_number(spapr); i++) {
 /* Dummy entries get deregistered when real ICPState objects
  * are registered during CPU core hotplug.
  */
@@ -337,7 +337,6 @@ static int spapr_fixup_cpu_dt(void *fdt, sPAPRMachineState 
*spapr)
 int ret = 0, offset, cpus_offset;
 CPUState *cs;
 char cpu_model[32];
-int smt = kvmppc_smt_threads();
 uint32_t pft_size_prop[] = {0, cpu_to_be32(spapr->htab_shift)};
 
 CPU_FOREACH(cs) {
@@ -346,7 +345,7 @@ static int spapr_fixup_cpu_dt(void *fdt, sPAPRMachineState 
*spapr)
 int index = spapr_vcpu_id(cpu);
 int compat_smt = MIN(smp_threads, ppc_compat_max_vthreads(cpu));
 
-if ((index % smt) != 0) {
+if (index % spapr->vsmt !=