Re: [PATCH v2 06/10] powerpc/smp: Generalize 2nd sched domain

2020-07-22 Thread peterz
On Wed, Jul 22, 2020 at 10:48:54AM +0200, Peter Zijlstra wrote:
> But reading your explanation, it looks like the Linux topology setup
> could actually unravel the fused-faux-SMT8 into two independent SMT4
> parts, negating that firmware option.

Ah, it looks like that's exactly what you're doing. Nice!


Re: [PATCH v2 06/10] powerpc/smp: Generalize 2nd sched domain

2020-07-22 Thread Peter Zijlstra
On Wed, Jul 22, 2020 at 01:48:22PM +0530, Srikar Dronamraju wrote:
> * pet...@infradead.org  [2020-07-22 09:46:24]:
> 
> > On Tue, Jul 21, 2020 at 05:08:10PM +0530, Srikar Dronamraju wrote:
> > > Currently "CACHE" domain happens to be the 2nd sched domain as per
> > > powerpc_topology. This domain will collapse if cpumask of l2-cache is
> > > same as SMT domain. However we could generalize this domain such that it
> > > could mean either be a "CACHE" domain or a "BIGCORE" domain.
> > 
> > What's the whole smallcore vs bigcore thing?
> > 
> > Would it make sense to have an actual overview of the various topology
> > layers somewhere? Because I'm getting lost and can't really make sense
> > of the code due to that.
> 
> To quote with an example: using (Power 9)
> 
> Power 9 is an SMT 8 core by design. However this 8 thread core can run as 2
> independent core with threads 0,2,4 and 6 acting like a core, while threads
> 1,3,5,7 acting as another core.  
> 
> The firmware can decide to showcase them as 2 independent small cores or as
> one big core. However the LLC (i.e last level of cache) is shared between
> all the threads of the SMT 8 core. Future power chips, LLC might change, it
> may be expanded to share with another SMT 8 core or it could be made
> specific to SMT 4 core.
> 
> The smt 8 core is also known as fused core/ Big core.
> The smaller smt 4 core is known as small core.
> 
> So on a Power9 Baremetal, the firmware would show up as SMT4 core.
> and we have a CACHE level at SMT 8. CACHE level would be very very similar
> to MC domain in X86.
> 
> Hope this is clear.

Ooh, that thing. I thought P9 was in actual fact an SMT4 hardware part,
but to be compatible with P8 it has this fused option where two cores
act like a single SMT8 part.

The interleaving enumeration is done due to P7 asymmetric cores,
resuting in schedulers having the preference to use the lower threads.

Combined that results in a P9-fused configuration using two independent
full cores when there's only 2 runnable threads.

Which is a subtly different story from yours.

But reading your explanation, it looks like the Linux topology setup
could actually unravel the fused-faux-SMT8 into two independent SMT4
parts, negating that firmware option.

Anyway, a few comments just around there might be helpfull.


Thanks!


Re: [PATCH v2 06/10] powerpc/smp: Generalize 2nd sched domain

2020-07-22 Thread Srikar Dronamraju
* pet...@infradead.org  [2020-07-22 09:46:24]:

> On Tue, Jul 21, 2020 at 05:08:10PM +0530, Srikar Dronamraju wrote:
> > Currently "CACHE" domain happens to be the 2nd sched domain as per
> > powerpc_topology. This domain will collapse if cpumask of l2-cache is
> > same as SMT domain. However we could generalize this domain such that it
> > could mean either be a "CACHE" domain or a "BIGCORE" domain.
> 
> What's the whole smallcore vs bigcore thing?
> 
> Would it make sense to have an actual overview of the various topology
> layers somewhere? Because I'm getting lost and can't really make sense
> of the code due to that.

To quote with an example: using (Power 9)

Power 9 is an SMT 8 core by design. However this 8 thread core can run as 2
independent core with threads 0,2,4 and 6 acting like a core, while threads
1,3,5,7 acting as another core.  

The firmware can decide to showcase them as 2 independent small cores or as
one big core. However the LLC (i.e last level of cache) is shared between
all the threads of the SMT 8 core. Future power chips, LLC might change, it
may be expanded to share with another SMT 8 core or it could be made
specific to SMT 4 core.

The smt 8 core is also known as fused core/ Big core.
The smaller smt 4 core is known as small core.

So on a Power9 Baremetal, the firmware would show up as SMT4 core.
and we have a CACHE level at SMT 8. CACHE level would be very very similar
to MC domain in X86.

Hope this is clear.

-- 
Thanks and Regards
Srikar Dronamraju


Re: [PATCH v2 06/10] powerpc/smp: Generalize 2nd sched domain

2020-07-22 Thread peterz
On Tue, Jul 21, 2020 at 05:08:10PM +0530, Srikar Dronamraju wrote:
> Currently "CACHE" domain happens to be the 2nd sched domain as per
> powerpc_topology. This domain will collapse if cpumask of l2-cache is
> same as SMT domain. However we could generalize this domain such that it
> could mean either be a "CACHE" domain or a "BIGCORE" domain.

What's the whole smallcore vs bigcore thing?

Would it make sense to have an actual overview of the various topology
layers somewhere? Because I'm getting lost and can't really make sense
of the code due to that.


Re: [PATCH v2 06/10] powerpc/smp: Generalize 2nd sched domain

2020-07-22 Thread Srikar Dronamraju
* Gautham R Shenoy  [2020-07-22 12:26:40]:

> Hello Srikar,
> 
> On Tue, Jul 21, 2020 at 05:08:10PM +0530, Srikar Dronamraju wrote:
> > Currently "CACHE" domain happens to be the 2nd sched domain as per
> > powerpc_topology. This domain will collapse if cpumask of l2-cache is
> > same as SMT domain. However we could generalize this domain such that it
> > could mean either be a "CACHE" domain or a "BIGCORE" domain.
> > 
> > While setting up the "CACHE" domain, check if shared_cache is already
> > set.
> > @@ -1339,14 +1345,20 @@ void start_secondary(void *unused)
> > /* Update topology CPU masks */
> > add_cpu_to_masks(cpu);
> > 
> > -   if (has_big_cores)
> > -   sibling_mask = cpu_smallcore_mask;
> > /*
> >  * Check for any shared caches. Note that this must be done on a
> >  * per-core basis because one core in the pair might be disabled.
> >  */
> > -   if (!cpumask_equal(cpu_l2_cache_mask(cpu), sibling_mask(cpu)))
> > -   shared_caches = true;
> > +   if (!shared_caches) {
> > +   struct cpumask *(*sibling_mask)(int) = cpu_sibling_mask;
> > +   struct cpumask *mask = cpu_l2_cache_mask(cpu);
> > +
> > +   if (has_big_cores)
> > +   sibling_mask = cpu_smallcore_mask;
> > +
> > +   if (cpumask_weight(mask) > cpumask_weight(sibling_mask(cpu)))
> > +   shared_caches = true;
> 
> At the risk of repeating my comment to the v1 version of the patch, we
> have shared caches only l2_cache_mask(cpu) is a strict superset of
> sibling_mask(cpu).
> 
> "cpumask_weight(mask) > cpumask_weight(sibling_mask(cpu))" does not
> capture this.

Why would it not? We are setting shared_caches if and only if l2_cache_mask
is a strict superset of sibling/smallcore mask.

> Could we please use
> 
>   if (!cpumask_equal(sibling_mask(cpu), mask) &&
> cpumask_subset(sibling_mask(cpu), mask) {
>   }
> 

Scheduler mandates that two cpumasks for the same CPU would either have to
be equal or one of them has to be a strict superset of the other. If not the
scheduler would mark our domains as broken. That being the case,
cpumask_weight will correctly capture what we are looking for. That said
your condition is also correct but slightly heavier and doesn't provide us
with any more information or correctness.

> 
> Otherwise the patch looks good to me.
> 
> --
> Thanks and Regards
> gautham.

-- 
Thanks and Regards
Srikar Dronamraju


Re: [PATCH v2 06/10] powerpc/smp: Generalize 2nd sched domain

2020-07-22 Thread Gautham R Shenoy
Hello Srikar,

On Tue, Jul 21, 2020 at 05:08:10PM +0530, Srikar Dronamraju wrote:
> Currently "CACHE" domain happens to be the 2nd sched domain as per
> powerpc_topology. This domain will collapse if cpumask of l2-cache is
> same as SMT domain. However we could generalize this domain such that it
> could mean either be a "CACHE" domain or a "BIGCORE" domain.
> 
> While setting up the "CACHE" domain, check if shared_cache is already
> set.
> 
> Cc: linuxppc-dev 
> Cc: LKML 
> Cc: Michael Ellerman 
> Cc: Ingo Molnar 
> Cc: Peter Zijlstra 
> Cc: Valentin Schneider 
> Cc: Nick Piggin 
> Cc: Oliver OHalloran 
> Cc: Nathan Lynch 
> Cc: Michael Neuling 
> Cc: Anton Blanchard 
> Cc: Gautham R Shenoy 
> Cc: Vaidyanathan Srinivasan 
> Cc: Jordan Niethe 
> Signed-off-by: Srikar Dronamraju 
> ---
> Changelog v1 -> v2:
> powerpc/smp: Generalize 2nd sched domain
>   Moved shared_cache topology fixup to fixup_topology (Gautham)
>

Just one comment below.

>  arch/powerpc/kernel/smp.c | 49 ---
>  1 file changed, 35 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 57468877499a..933ebdf97432 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -85,6 +85,14 @@ EXPORT_PER_CPU_SYMBOL(cpu_l2_cache_map);
>  EXPORT_PER_CPU_SYMBOL(cpu_core_map);
>  EXPORT_SYMBOL_GPL(has_big_cores);
> 
> +enum {
> +#ifdef CONFIG_SCHED_SMT
> + smt_idx,
> +#endif
> + bigcore_idx,
> + die_idx,
> +};
> +


[..snip..]

> @@ -1339,14 +1345,20 @@ void start_secondary(void *unused)
>   /* Update topology CPU masks */
>   add_cpu_to_masks(cpu);
> 
> - if (has_big_cores)
> - sibling_mask = cpu_smallcore_mask;
>   /*
>* Check for any shared caches. Note that this must be done on a
>* per-core basis because one core in the pair might be disabled.
>*/
> - if (!cpumask_equal(cpu_l2_cache_mask(cpu), sibling_mask(cpu)))
> - shared_caches = true;
> + if (!shared_caches) {
> + struct cpumask *(*sibling_mask)(int) = cpu_sibling_mask;
> + struct cpumask *mask = cpu_l2_cache_mask(cpu);
> +
> + if (has_big_cores)
> + sibling_mask = cpu_smallcore_mask;
> +
> + if (cpumask_weight(mask) > cpumask_weight(sibling_mask(cpu)))
> + shared_caches = true;

At the risk of repeating my comment to the v1 version of the patch, we
have shared caches only l2_cache_mask(cpu) is a strict superset of
sibling_mask(cpu).

"cpumask_weight(mask) > cpumask_weight(sibling_mask(cpu))" does not
capture this.

Could we please use

  if (!cpumask_equal(sibling_mask(cpu), mask) &&
  cpumask_subset(sibling_mask(cpu), mask) {
  }

?


> + }
> 
>   set_numa_node(numa_cpu_lookup_table[cpu]);
>   set_numa_mem(local_memory_node(numa_cpu_lookup_table[cpu]));
> @@ -1374,10 +1386,19 @@ int setup_profiling_timer(unsigned int multiplier)
> 
>  static void fixup_topology(void)
>  {
> + if (shared_caches) {
> + pr_info("Using shared cache scheduler topology\n");
> + powerpc_topology[bigcore_idx].mask = shared_cache_mask;
> +#ifdef CONFIG_SCHED_DEBUG
> + powerpc_topology[bigcore_idx].name = "CACHE";
> +#endif
> + powerpc_topology[bigcore_idx].sd_flags = 
> powerpc_shared_cache_flags;
> + }
> +
>  #ifdef CONFIG_SCHED_SMT
>   if (has_big_cores) {
>   pr_info("Big cores detected but using small core scheduling\n");
> - powerpc_topology[0].mask = smallcore_smt_mask;
> + powerpc_topology[smt_idx].mask = smallcore_smt_mask;
>   }
>  #endif


Otherwise the patch looks good to me.

--
Thanks and Regards
gautham.