Re: [PATCH 1/3] powerpc/smp: Reintroduce cpu_core_mask

2021-04-18 Thread David Gibson
On Fri, Apr 16, 2021 at 11:15:49AM +0530, Srikar Dronamraju wrote:
> * David Gibson  [2021-04-16 13:21:34]:
> 
> Thanks for having a look at the patches.
> 
> > On Thu, Apr 15, 2021 at 05:39:32PM +0530, Srikar Dronamraju wrote:
> > > Daniel reported that with commit 4ca234a9cbd7 ("powerpc/smp: Stop
> > > updating cpu_core_mask") QEMU was unable to set single NUMA node SMP
> > > topologies such as:
> > >  -smp 8,maxcpus=8,cores=2,threads=2,sockets=2
> > >  i.e. he expected 2 sockets in one NUMA node.
> > 
> > Well, strictly speaking, you can still set that topology in qemu but a
> > PAPR guest with that commit will show as having 1 socket in lscpu and
> > similar things.
> > 
> 
> Right, I did mention the output of lscpu in QEMU with the said commit and
> with the new patches in the cover letter. Somehow I goofed up the cc
> list for the cover letter.
> 
> Reference for the cover letter:
> https://lore.kernel.org/linuxppc-dev/20210415120934.232271-1-sri...@linux.vnet.ibm.com/t/#u
> 
> > Basically, this is because PAPR has no meaningful distinction between
> > cores and sockets.  So it's kind of a cosmetic problem, but it is a
> > user-unexpected behaviour that it would be nice to avoid if it's not
> > excessively difficult.
> > 
> > > The above commit helped reduce boot time on large systems, for
> > > example a 4096 vCPU single-socket QEMU instance. PAPR is silent on
> > > having more than one socket within a NUMA node.
> > > 
> > > cpu_core_mask and cpu_cpu_mask for any CPU would be the same unless the
> > > number of sockets is different from the number of NUMA nodes.
> > 
> > Number of sockets being different from number of NUMA nodes is routine
> > in qemu, and I don't think it's something we should enforce.
> > 
> > > One option is to reintroduce cpu_core_mask but use a slightly
> > > different method to arrive at it. Previously each CPU's chip-id
> > > would be compared with every other CPU's chip-id to verify whether
> > > the two CPUs were related at the chip level. Now if a CPU 'A' is
> > > found related (or unrelated) to another CPU 'B', all the thread
> > > siblings of 'A' and of 'B' are automatically marked as related
> > > (or unrelated) as well.
> > > 
> > > Also, if a platform doesn't support the ibm,chip-id property, i.e.
> > > its cpu_to_chip_id returns -1, cpu_core_map holds a copy of
> > > cpu_cpu_mask().
> > 
> > Yeah, the other weirdness here is that ibm,chip-id isn't a PAPR
> > property at all - it was added for powernv.  We then added it to qemu
> > for PAPR guests because that was the way at the time to get the guest
> > to advertise the expected number of sockets.  It therefore basically
> > *only* exists on PAPR/qemu for that purpose, so if it's not serving
> > that purpose we need to come up with something else.
> > 
> 
> Do you have ideas on what that something could be like?

Not really, sorry.

> If that's
> more beneficial, then we could move over to that scheme. Also, apart
> from ibm,chip-id not being a PAPR property, do you have any other
> concerns with it?

I think if we can keep ibm,chip-id doing this job, that would be
simplest - as long as our PAPR usage isn't implying semantics which
contradict what it does on powernv.  AIUI Cédric thought it did that,
but with further discussion it seems like that might have been a
misunderstanding incorrectly conflating chip-id with NUMA nodes.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH 1/3] powerpc/smp: Reintroduce cpu_core_mask

2021-04-15 Thread Srikar Dronamraju
* David Gibson  [2021-04-16 13:21:34]:

Thanks for having a look at the patches.

> On Thu, Apr 15, 2021 at 05:39:32PM +0530, Srikar Dronamraju wrote:
> > Daniel reported that with commit 4ca234a9cbd7 ("powerpc/smp: Stop
> > updating cpu_core_mask") QEMU was unable to set single NUMA node SMP
> > topologies such as:
> >  -smp 8,maxcpus=8,cores=2,threads=2,sockets=2
> >  i.e. he expected 2 sockets in one NUMA node.
> 
> Well, strictly speaking, you can still set that topology in qemu but a
> PAPR guest with that commit will show as having 1 socket in lscpu and
> similar things.
> 

Right, I did mention the output of lscpu in QEMU with the said commit and
with the new patches in the cover letter. Somehow I goofed up the cc
list for the cover letter.

Reference for the cover letter:
https://lore.kernel.org/linuxppc-dev/20210415120934.232271-1-sri...@linux.vnet.ibm.com/t/#u

> Basically, this is because PAPR has no meaningful distinction between
> cores and sockets.  So it's kind of a cosmetic problem, but it is a
> user-unexpected behaviour that it would be nice to avoid if it's not
> excessively difficult.
> 
> > The above commit helped reduce boot time on large systems, for
> > example a 4096 vCPU single-socket QEMU instance. PAPR is silent on
> > having more than one socket within a NUMA node.
> > 
> > cpu_core_mask and cpu_cpu_mask for any CPU would be the same unless the
> > number of sockets is different from the number of NUMA nodes.
> 
> Number of sockets being different from number of NUMA nodes is routine
> in qemu, and I don't think it's something we should enforce.
> 
> > One option is to reintroduce cpu_core_mask but use a slightly
> > different method to arrive at it. Previously each CPU's chip-id
> > would be compared with every other CPU's chip-id to verify whether
> > the two CPUs were related at the chip level. Now if a CPU 'A' is
> > found related (or unrelated) to another CPU 'B', all the thread
> > siblings of 'A' and of 'B' are automatically marked as related
> > (or unrelated) as well.
> > 
> > Also, if a platform doesn't support the ibm,chip-id property, i.e.
> > its cpu_to_chip_id returns -1, cpu_core_map holds a copy of
> > cpu_cpu_mask().
> 
> Yeah, the other weirdness here is that ibm,chip-id isn't a PAPR
> property at all - it was added for powernv.  We then added it to qemu
> for PAPR guests because that was the way at the time to get the guest
> to advertise the expected number of sockets.  It therefore basically
> *only* exists on PAPR/qemu for that purpose, so if it's not serving
> that purpose we need to come up with something else.
> 

Do you have ideas on what that something could be like? If that's
more beneficial, then we could move over to that scheme. Also, apart
from ibm,chip-id not being a PAPR property, do you have any other
concerns with it?


-- 
Thanks and Regards
Srikar Dronamraju


Re: [PATCH 1/3] powerpc/smp: Reintroduce cpu_core_mask

2021-04-15 Thread David Gibson
On Thu, Apr 15, 2021 at 05:39:32PM +0530, Srikar Dronamraju wrote:
> Daniel reported that with commit 4ca234a9cbd7 ("powerpc/smp: Stop
> updating cpu_core_mask") QEMU was unable to set single NUMA node SMP
> topologies such as:
>  -smp 8,maxcpus=8,cores=2,threads=2,sockets=2
>  i.e. he expected 2 sockets in one NUMA node.

Well, strictly speaking, you can still set that topology in qemu but a
PAPR guest with that commit will show as having 1 socket in lscpu and
similar things.

Basically, this is because PAPR has no meaningful distinction between
cores and sockets.  So it's kind of a cosmetic problem, but it is a
user-unexpected behaviour that it would be nice to avoid if it's not
excessively difficult.
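
For illustration, with the -smp line above the difference would show up
in lscpu roughly as follows (assumed output, derived from the topology
rather than captured):

  # with commit 4ca234a9cbd7         # what Daniel expected
  Thread(s) per core:    2           Thread(s) per core:    2
  Core(s) per socket:    4           Core(s) per socket:    2
  Socket(s):             1           Socket(s):             2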

> The above commit helped reduce boot time on large systems, for
> example a 4096 vCPU single-socket QEMU instance. PAPR is silent on
> having more than one socket within a NUMA node.
> 
> cpu_core_mask and cpu_cpu_mask for any CPU would be the same unless the
> number of sockets is different from the number of NUMA nodes.

Number of sockets being different from number of NUMA nodes is routine
in qemu, and I don't think it's something we should enforce.
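
For example, something like the following invocation (illustrative; the
legacy -numa node,cpus= placement syntax is assumed here) spreads four
sockets across two NUMA nodes:

  -smp 16,maxcpus=16,cores=2,threads=2,sockets=4 \
  -numa node,cpus=0-7 -numa node,cpus=8-15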

> One option is to reintroduce cpu_core_mask but use a slightly
> different method to arrive at it. Previously each CPU's chip-id
> would be compared with every other CPU's chip-id to verify whether
> the two CPUs were related at the chip level. Now if a CPU 'A' is
> found related (or unrelated) to another CPU 'B', all the thread
> siblings of 'A' and of 'B' are automatically marked as related
> (or unrelated) as well.
> 
> Also, if a platform doesn't support the ibm,chip-id property, i.e.
> its cpu_to_chip_id returns -1, cpu_core_map holds a copy of
> cpu_cpu_mask().

Yeah, the other weirdness here is that ibm,chip-id isn't a PAPR
property at all - it was added for powernv.  We then added it to qemu
for PAPR guests because that was the way at the time to get the guest
to advertise the expected number of sockets.  It therefore basically
*only* exists on PAPR/qemu for that purpose, so if it's not serving
that purpose we need to come up with something else.
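
For reference, resolving the property boils down to a single device-tree
read, roughly like this (a minimal sketch with a made-up name, assuming a
CPU device node is already in hand; not the kernel's exact
cpu_to_chip_id()):

  #include <linux/of.h>

  /*
   * Sketch: return the chip id for a CPU node, or -1 when the
   * (non-PAPR) ibm,chip-id property is absent, which is the case
   * the fallback to cpu_cpu_mask() handles.
   */
  static int example_chip_id(struct device_node *cpu_node)
  {
          u32 chip_id;

          if (of_property_read_u32(cpu_node, "ibm,chip-id", &chip_id))
                  return -1;      /* property missing or unreadable */
          return (int)chip_id;
  }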

> 
> Fixes: 4ca234a9cbd7 ("powerpc/smp: Stop updating cpu_core_mask")
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: qemu-...@nongnu.org
> Cc: Cedric Le Goater 
> Cc: David Gibson 
> Cc: Nathan Lynch 
> Cc: Michael Ellerman 
> Cc: Ingo Molnar 
> Cc: Peter Zijlstra 
> Cc: Valentin Schneider 
> Cc: Gautham R Shenoy 
> Reported-by: Daniel Henrique Barboza 
> Signed-off-by: Srikar Dronamraju 
> ---
>  arch/powerpc/include/asm/smp.h |  5 +++++
>  arch/powerpc/kernel/smp.c      | 39 ++++++++++++++++++++++++++++++++-------
>  2 files changed, 37 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
> index 7a13bc20f0a0..47081a9e13ca 100644
> --- a/arch/powerpc/include/asm/smp.h
> +++ b/arch/powerpc/include/asm/smp.h
> @@ -121,6 +121,11 @@ static inline struct cpumask *cpu_sibling_mask(int cpu)
>   return per_cpu(cpu_sibling_map, cpu);
>  }
>  
> +static inline struct cpumask *cpu_core_mask(int cpu)
> +{
> + return per_cpu(cpu_core_map, cpu);
> +}
> +
>  static inline struct cpumask *cpu_l2_cache_mask(int cpu)
>  {
>   return per_cpu(cpu_l2_cache_map, cpu);
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 5a4d59a1070d..5c7ce1d50631 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -1057,17 +1057,12 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
>   local_memory_node(numa_cpu_lookup_table[cpu]));
>   }
>  #endif
> - /*
> -  * cpu_core_map is now more updated and exists only since
> -  * its been exported for long. It only will have a snapshot
> -  * of cpu_cpu_mask.
> -  */
> - cpumask_copy(per_cpu(cpu_core_map, cpu), cpu_cpu_mask(cpu));
>   }
>  
>   /* Init the cpumasks so the boot CPU is related to itself */
>   cpumask_set_cpu(boot_cpuid, cpu_sibling_mask(boot_cpuid));
>   cpumask_set_cpu(boot_cpuid, cpu_l2_cache_mask(boot_cpuid));
> + cpumask_set_cpu(boot_cpuid, cpu_core_mask(boot_cpuid));
>  
>   if (has_coregroup_support())
>   cpumask_set_cpu(boot_cpuid, cpu_coregroup_mask(boot_cpuid));
> @@ -1408,6 +1403,9 @@ static void remove_cpu_from_masks(int cpu)
>   set_cpus_unrelated(cpu, i, cpu_smallcore_mask);
>   }
>  
> + for_each_cpu(i, cpu_core_mask(cpu))
> + set_cpus_unrelated(cpu, i, cpu_core_mask);
> +
>   if (has_coregroup_support()) {
>   for_each_cpu(i, cpu_coregroup_mask(cpu))
>   set_cpus_unrelated(cpu, i, cpu_coregroup_mask);
> @@ -1468,8 +1466,11 @@ static void update_coregroup_mask(int cpu, cpumask_var_t *mask)
>  
>  static void add_cpu_to_masks(int cpu)
>  {
> + struct cpumask *(*submask_fn)(int) = cpu_sibling_mask;
>   int first_thread = cpu_first_thread_sibling(cpu);

Re: [PATCH 1/3] powerpc/smp: Reintroduce cpu_core_mask

2021-04-15 Thread Srikar Dronamraju
* Gautham R Shenoy  [2021-04-15 22:41:34]:

> Hi Srikar,
> 
> 

Thanks for taking a look.

> > @@ -1485,12 +1486,36 @@ static void add_cpu_to_masks(int cpu)
> > add_cpu_to_smallcore_masks(cpu);
> > 
> > /* In CPU-hotplug path, hence use GFP_ATOMIC */
> > -   alloc_cpumask_var_node(&mask, GFP_ATOMIC, cpu_to_node(cpu));
> > +   ret = alloc_cpumask_var_node(&mask, GFP_ATOMIC, cpu_to_node(cpu));
> > update_mask_by_l2(cpu, &mask);
> > 
> > if (has_coregroup_support())
> > update_coregroup_mask(cpu, &mask);
> > 
> > +   if (chip_id == -1 || !ret) {
> > +   cpumask_copy(per_cpu(cpu_core_map, cpu), cpu_cpu_mask(cpu));
> > +   goto out;
> > +   }
> > +
> > +   if (shared_caches)
> > +   submask_fn = cpu_l2_cache_mask;
> > +
> > +   /* Update core_mask with all the CPUs that are part of submask */
> > +   or_cpumasks_related(cpu, cpu, submask_fn, cpu_core_mask);
> >
> 
> If coregroups exist, we can add the cpus of the coregroup to the
> cpu_core_mask, thereby reducing the scope of the for_each_cpu() search
> below. This will still cut down the time on baremetal systems
> supporting coregroups.
> 

Yes, once we upstream coregroup support for baremetal, we should look
at adding it. Also do note that the number of CPUs we support on
baremetal is comparatively lower than on PowerVM + QEMU, and more
importantly the number of cores per coregroup is also very low, so the
optimization may not yield much of a benefit.

It's only in the QEMU case, where we end up with many cores in the
same chip, that we see a drastic increase in the boot-up time.

-- 
Thanks and Regards
Srikar Dronamraju


Re: [PATCH 1/3] powerpc/smp: Reintroduce cpu_core_mask

2021-04-15 Thread Gautham R Shenoy
Hi Srikar,

On Thu, Apr 15, 2021 at 05:39:32PM +0530, Srikar Dronamraju wrote:
 [..snip..]



> @@ -1485,12 +1486,36 @@ static void add_cpu_to_masks(int cpu)
>   add_cpu_to_smallcore_masks(cpu);
> 
>   /* In CPU-hotplug path, hence use GFP_ATOMIC */
> - alloc_cpumask_var_node(&mask, GFP_ATOMIC, cpu_to_node(cpu));
> + ret = alloc_cpumask_var_node(&mask, GFP_ATOMIC, cpu_to_node(cpu));
>   update_mask_by_l2(cpu, &mask);
> 
>   if (has_coregroup_support())
>   update_coregroup_mask(cpu, &mask);
> 
> + if (chip_id == -1 || !ret) {
> + cpumask_copy(per_cpu(cpu_core_map, cpu), cpu_cpu_mask(cpu));
> + goto out;
> + }
> +
> + if (shared_caches)
> + submask_fn = cpu_l2_cache_mask;
> +
> + /* Update core_mask with all the CPUs that are part of submask */
> + or_cpumasks_related(cpu, cpu, submask_fn, cpu_core_mask);
>

If coregroups exist, we can add the cpus of the coregroup to the
cpu_core_mask, thereby reducing the scope of the for_each_cpu() search
below. This will still cut down the time on baremetal systems
supporting coregroups.
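
Something along these lines ahead of the cpumask_andnot() below, perhaps
(an untested sketch reusing the helpers this patch already uses):

        /* Untested: fold the whole coregroup into cpu_core_mask up
         * front, so fewer CPUs survive into the chip-id loop below. */
        if (has_coregroup_support())
                or_cpumasks_related(cpu, cpu, cpu_coregroup_mask,
                                    cpu_core_mask);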


> + /* Skip all CPUs already part of current CPU core mask */
> + cpumask_andnot(mask, cpu_online_mask, cpu_core_mask(cpu));
> +
> + for_each_cpu(i, mask) {
> + if (chip_id == cpu_to_chip_id(i)) {
> + or_cpumasks_related(cpu, i, submask_fn, cpu_core_mask);
> + cpumask_andnot(mask, mask, submask_fn(i));
> + } else {
> + cpumask_andnot(mask, mask, cpu_core_mask(i));
> + }
> + }
> +
> +out:
>   free_cpumask_var(mask);
>  }
> 
> -- 
> 2.25.1
> 


[PATCH 1/3] powerpc/smp: Reintroduce cpu_core_mask

2021-04-15 Thread Srikar Dronamraju
Daniel reported that with commit 4ca234a9cbd7 ("powerpc/smp: Stop
updating cpu_core_mask") QEMU was unable to set single NUMA node SMP
topologies such as:
 -smp 8,maxcpus=8,cores=2,threads=2,sockets=2
 i.e. he expected 2 sockets in one NUMA node.

The above commit helped reduce boot time on large systems, for
example a 4096 vCPU single-socket QEMU instance. PAPR is silent on
having more than one socket within a NUMA node.

cpu_core_mask and cpu_cpu_mask for any CPU would be the same unless the
number of sockets is different from the number of NUMA nodes.

One option is to reintroduce cpu_core_mask but use a slightly
different method to arrive at it. Previously each CPU's chip-id
would be compared with every other CPU's chip-id to verify whether
the two CPUs were related at the chip level. Now if a CPU 'A' is
found related (or unrelated) to another CPU 'B', all the thread
siblings of 'A' and of 'B' are automatically marked as related
(or unrelated) as well.

Also, if a platform doesn't support the ibm,chip-id property, i.e.
its cpu_to_chip_id returns -1, cpu_core_map holds a copy of
cpu_cpu_mask().
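
To illustrate the marking step, a single chip-id match now settles two
whole cores at once, roughly like this (a simplified sketch with a
made-up name; the patch itself folds this into or_cpumasks_related()):

  #include <linux/cpumask.h>

  /*
   * Illustrative only: once CPUs 'a' and 'b' are known to share a
   * chip, fold the thread siblings of each into the core mask of
   * every thread of the other, with no further chip-id lookups.
   */
  static void mark_cores_related(int a, int b)
  {
          int k;

          for_each_cpu(k, cpu_sibling_mask(a))
                  cpumask_or(cpu_core_mask(k), cpu_core_mask(k),
                             cpu_sibling_mask(b));
          for_each_cpu(k, cpu_sibling_mask(b))
                  cpumask_or(cpu_core_mask(k), cpu_core_mask(k),
                             cpu_sibling_mask(a));
  }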

Fixes: 4ca234a9cbd7 ("powerpc/smp: Stop updating cpu_core_mask")
Cc: linuxppc-dev@lists.ozlabs.org
Cc: qemu-...@nongnu.org
Cc: Cedric Le Goater 
Cc: David Gibson 
Cc: Nathan Lynch 
Cc: Michael Ellerman 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Valentin Schneider 
Cc: Gautham R Shenoy 
Reported-by: Daniel Henrique Barboza 
Signed-off-by: Srikar Dronamraju 
---
 arch/powerpc/include/asm/smp.h |  5 +++++
 arch/powerpc/kernel/smp.c      | 39 ++++++++++++++++++++++++++++++++-------
 2 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 7a13bc20f0a0..47081a9e13ca 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -121,6 +121,11 @@ static inline struct cpumask *cpu_sibling_mask(int cpu)
return per_cpu(cpu_sibling_map, cpu);
 }
 
+static inline struct cpumask *cpu_core_mask(int cpu)
+{
+   return per_cpu(cpu_core_map, cpu);
+}
+
 static inline struct cpumask *cpu_l2_cache_mask(int cpu)
 {
return per_cpu(cpu_l2_cache_map, cpu);
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 5a4d59a1070d..5c7ce1d50631 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1057,17 +1057,12 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
local_memory_node(numa_cpu_lookup_table[cpu]));
}
 #endif
-   /*
-* cpu_core_map is now more updated and exists only since
-* its been exported for long. It only will have a snapshot
-* of cpu_cpu_mask.
-*/
-   cpumask_copy(per_cpu(cpu_core_map, cpu), cpu_cpu_mask(cpu));
}
 
/* Init the cpumasks so the boot CPU is related to itself */
cpumask_set_cpu(boot_cpuid, cpu_sibling_mask(boot_cpuid));
cpumask_set_cpu(boot_cpuid, cpu_l2_cache_mask(boot_cpuid));
+   cpumask_set_cpu(boot_cpuid, cpu_core_mask(boot_cpuid));
 
if (has_coregroup_support())
cpumask_set_cpu(boot_cpuid, cpu_coregroup_mask(boot_cpuid));
@@ -1408,6 +1403,9 @@ static void remove_cpu_from_masks(int cpu)
set_cpus_unrelated(cpu, i, cpu_smallcore_mask);
}
 
+   for_each_cpu(i, cpu_core_mask(cpu))
+   set_cpus_unrelated(cpu, i, cpu_core_mask);
+
if (has_coregroup_support()) {
for_each_cpu(i, cpu_coregroup_mask(cpu))
set_cpus_unrelated(cpu, i, cpu_coregroup_mask);
@@ -1468,8 +1466,11 @@ static void update_coregroup_mask(int cpu, cpumask_var_t *mask)
 
 static void add_cpu_to_masks(int cpu)
 {
+   struct cpumask *(*submask_fn)(int) = cpu_sibling_mask;
int first_thread = cpu_first_thread_sibling(cpu);
+   int chip_id = cpu_to_chip_id(cpu);
cpumask_var_t mask;
+   bool ret;
int i;
 
/*
@@ -1485,12 +1486,36 @@ static void add_cpu_to_masks(int cpu)
add_cpu_to_smallcore_masks(cpu);
 
/* In CPU-hotplug path, hence use GFP_ATOMIC */
-   alloc_cpumask_var_node(&mask, GFP_ATOMIC, cpu_to_node(cpu));
+   ret = alloc_cpumask_var_node(&mask, GFP_ATOMIC, cpu_to_node(cpu));
update_mask_by_l2(cpu, &mask);
 
if (has_coregroup_support())
update_coregroup_mask(cpu, &mask);
 
+   if (chip_id == -1 || !ret) {
+   cpumask_copy(per_cpu(cpu_core_map, cpu), cpu_cpu_mask(cpu));
+   goto out;
+   }
+
+   if (shared_caches)
+   submask_fn = cpu_l2_cache_mask;
+
+   /* Update core_mask with all the CPUs that are part of submask */
+   or_cpumasks_related(cpu, cpu, submask_fn, cpu_core_mask);
+
+   /* Skip all CPUs already part of current CPU core mask */
+   cpumask_andnot(mask, cpu_online_mask, cpu_core_mask(cpu));
+
+   for_each_cpu(i, mask) {
+