Re: [PATCH v2 06/10] powerpc/smp: Generalize 2nd sched domain
On Wed, Jul 22, 2020 at 10:48:54AM +0200, Peter Zijlstra wrote: > But reading your explanation, it looks like the Linux topology setup > could actually unravel the fused-faux-SMT8 into two independent SMT4 > parts, negating that firmware option. Ah, it looks like that's exactly what you're doing. Nice!
Re: [PATCH v2 06/10] powerpc/smp: Generalize 2nd sched domain
On Wed, Jul 22, 2020 at 01:48:22PM +0530, Srikar Dronamraju wrote: > * pet...@infradead.org [2020-07-22 09:46:24]: > > > On Tue, Jul 21, 2020 at 05:08:10PM +0530, Srikar Dronamraju wrote: > > > Currently "CACHE" domain happens to be the 2nd sched domain as per > > > powerpc_topology. This domain will collapse if cpumask of l2-cache is > > > same as SMT domain. However we could generalize this domain such that it > > > could mean either be a "CACHE" domain or a "BIGCORE" domain. > > > > What's the whole smallcore vs bigcore thing? > > > > Would it make sense to have an actual overview of the various topology > > layers somewhere? Because I'm getting lost and can't really make sense > > of the code due to that. > > To quote with an example: using (Power 9) > > Power 9 is an SMT 8 core by design. However this 8 thread core can run as 2 > independent core with threads 0,2,4 and 6 acting like a core, while threads > 1,3,5,7 acting as another core. > > The firmware can decide to showcase them as 2 independent small cores or as > one big core. However the LLC (i.e last level of cache) is shared between > all the threads of the SMT 8 core. Future power chips, LLC might change, it > may be expanded to share with another SMT 8 core or it could be made > specific to SMT 4 core. > > The smt 8 core is also known as fused core/ Big core. > The smaller smt 4 core is known as small core. > > So on a Power9 Baremetal, the firmware would show up as SMT4 core. > and we have a CACHE level at SMT 8. CACHE level would be very very similar > to MC domain in X86. > > Hope this is clear. Ooh, that thing. I thought P9 was in actual fact an SMT4 hardware part, but to be compatible with P8 it has this fused option where two cores act like a single SMT8 part. The interleaving enumeration is done due to P7 asymmetric cores, resuting in schedulers having the preference to use the lower threads. Combined that results in a P9-fused configuration using two independent full cores when there's only 2 runnable threads. Which is a subtly different story from yours. But reading your explanation, it looks like the Linux topology setup could actually unravel the fused-faux-SMT8 into two independent SMT4 parts, negating that firmware option. Anyway, a few comments just around there might be helpfull. Thanks!
Re: [PATCH v2 06/10] powerpc/smp: Generalize 2nd sched domain
* pet...@infradead.org [2020-07-22 09:46:24]: > On Tue, Jul 21, 2020 at 05:08:10PM +0530, Srikar Dronamraju wrote: > > Currently "CACHE" domain happens to be the 2nd sched domain as per > > powerpc_topology. This domain will collapse if cpumask of l2-cache is > > same as SMT domain. However we could generalize this domain such that it > > could mean either be a "CACHE" domain or a "BIGCORE" domain. > > What's the whole smallcore vs bigcore thing? > > Would it make sense to have an actual overview of the various topology > layers somewhere? Because I'm getting lost and can't really make sense > of the code due to that. To quote with an example: using (Power 9) Power 9 is an SMT 8 core by design. However this 8 thread core can run as 2 independent core with threads 0,2,4 and 6 acting like a core, while threads 1,3,5,7 acting as another core. The firmware can decide to showcase them as 2 independent small cores or as one big core. However the LLC (i.e last level of cache) is shared between all the threads of the SMT 8 core. Future power chips, LLC might change, it may be expanded to share with another SMT 8 core or it could be made specific to SMT 4 core. The smt 8 core is also known as fused core/ Big core. The smaller smt 4 core is known as small core. So on a Power9 Baremetal, the firmware would show up as SMT4 core. and we have a CACHE level at SMT 8. CACHE level would be very very similar to MC domain in X86. Hope this is clear. -- Thanks and Regards Srikar Dronamraju
Re: [PATCH v2 06/10] powerpc/smp: Generalize 2nd sched domain
On Tue, Jul 21, 2020 at 05:08:10PM +0530, Srikar Dronamraju wrote: > Currently "CACHE" domain happens to be the 2nd sched domain as per > powerpc_topology. This domain will collapse if cpumask of l2-cache is > same as SMT domain. However we could generalize this domain such that it > could mean either be a "CACHE" domain or a "BIGCORE" domain. What's the whole smallcore vs bigcore thing? Would it make sense to have an actual overview of the various topology layers somewhere? Because I'm getting lost and can't really make sense of the code due to that.
Re: [PATCH v2 06/10] powerpc/smp: Generalize 2nd sched domain
* Gautham R Shenoy [2020-07-22 12:26:40]: > Hello Srikar, > > On Tue, Jul 21, 2020 at 05:08:10PM +0530, Srikar Dronamraju wrote: > > Currently "CACHE" domain happens to be the 2nd sched domain as per > > powerpc_topology. This domain will collapse if cpumask of l2-cache is > > same as SMT domain. However we could generalize this domain such that it > > could mean either be a "CACHE" domain or a "BIGCORE" domain. > > > > While setting up the "CACHE" domain, check if shared_cache is already > > set. > > @@ -1339,14 +1345,20 @@ void start_secondary(void *unused) > > /* Update topology CPU masks */ > > add_cpu_to_masks(cpu); > > > > - if (has_big_cores) > > - sibling_mask = cpu_smallcore_mask; > > /* > > * Check for any shared caches. Note that this must be done on a > > * per-core basis because one core in the pair might be disabled. > > */ > > - if (!cpumask_equal(cpu_l2_cache_mask(cpu), sibling_mask(cpu))) > > - shared_caches = true; > > + if (!shared_caches) { > > + struct cpumask *(*sibling_mask)(int) = cpu_sibling_mask; > > + struct cpumask *mask = cpu_l2_cache_mask(cpu); > > + > > + if (has_big_cores) > > + sibling_mask = cpu_smallcore_mask; > > + > > + if (cpumask_weight(mask) > cpumask_weight(sibling_mask(cpu))) > > + shared_caches = true; > > At the risk of repeating my comment to the v1 version of the patch, we > have shared caches only l2_cache_mask(cpu) is a strict superset of > sibling_mask(cpu). > > "cpumask_weight(mask) > cpumask_weight(sibling_mask(cpu))" does not > capture this. Why would it not? We are setting shared_caches if and only if l2_cache_mask is a strict superset of sibling/smallcore mask. > Could we please use > > if (!cpumask_equal(sibling_mask(cpu), mask) && > cpumask_subset(sibling_mask(cpu), mask) { > } > Scheduler mandates that two cpumasks for the same CPU would either have to be equal or one of them has to be a strict superset of the other. If not the scheduler would mark our domains as broken. That being the case, cpumask_weight will correctly capture what we are looking for. That said your condition is also correct but slightly heavier and doesn't provide us with any more information or correctness. > > Otherwise the patch looks good to me. > > -- > Thanks and Regards > gautham. -- Thanks and Regards Srikar Dronamraju
Re: [PATCH v2 06/10] powerpc/smp: Generalize 2nd sched domain
Hello Srikar, On Tue, Jul 21, 2020 at 05:08:10PM +0530, Srikar Dronamraju wrote: > Currently "CACHE" domain happens to be the 2nd sched domain as per > powerpc_topology. This domain will collapse if cpumask of l2-cache is > same as SMT domain. However we could generalize this domain such that it > could mean either be a "CACHE" domain or a "BIGCORE" domain. > > While setting up the "CACHE" domain, check if shared_cache is already > set. > > Cc: linuxppc-dev > Cc: LKML > Cc: Michael Ellerman > Cc: Ingo Molnar > Cc: Peter Zijlstra > Cc: Valentin Schneider > Cc: Nick Piggin > Cc: Oliver OHalloran > Cc: Nathan Lynch > Cc: Michael Neuling > Cc: Anton Blanchard > Cc: Gautham R Shenoy > Cc: Vaidyanathan Srinivasan > Cc: Jordan Niethe > Signed-off-by: Srikar Dronamraju > --- > Changelog v1 -> v2: > powerpc/smp: Generalize 2nd sched domain > Moved shared_cache topology fixup to fixup_topology (Gautham) > Just one comment below. > arch/powerpc/kernel/smp.c | 49 --- > 1 file changed, 35 insertions(+), 14 deletions(-) > > diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c > index 57468877499a..933ebdf97432 100644 > --- a/arch/powerpc/kernel/smp.c > +++ b/arch/powerpc/kernel/smp.c > @@ -85,6 +85,14 @@ EXPORT_PER_CPU_SYMBOL(cpu_l2_cache_map); > EXPORT_PER_CPU_SYMBOL(cpu_core_map); > EXPORT_SYMBOL_GPL(has_big_cores); > > +enum { > +#ifdef CONFIG_SCHED_SMT > + smt_idx, > +#endif > + bigcore_idx, > + die_idx, > +}; > + [..snip..] > @@ -1339,14 +1345,20 @@ void start_secondary(void *unused) > /* Update topology CPU masks */ > add_cpu_to_masks(cpu); > > - if (has_big_cores) > - sibling_mask = cpu_smallcore_mask; > /* >* Check for any shared caches. Note that this must be done on a >* per-core basis because one core in the pair might be disabled. >*/ > - if (!cpumask_equal(cpu_l2_cache_mask(cpu), sibling_mask(cpu))) > - shared_caches = true; > + if (!shared_caches) { > + struct cpumask *(*sibling_mask)(int) = cpu_sibling_mask; > + struct cpumask *mask = cpu_l2_cache_mask(cpu); > + > + if (has_big_cores) > + sibling_mask = cpu_smallcore_mask; > + > + if (cpumask_weight(mask) > cpumask_weight(sibling_mask(cpu))) > + shared_caches = true; At the risk of repeating my comment to the v1 version of the patch, we have shared caches only l2_cache_mask(cpu) is a strict superset of sibling_mask(cpu). "cpumask_weight(mask) > cpumask_weight(sibling_mask(cpu))" does not capture this. Could we please use if (!cpumask_equal(sibling_mask(cpu), mask) && cpumask_subset(sibling_mask(cpu), mask) { } ? > + } > > set_numa_node(numa_cpu_lookup_table[cpu]); > set_numa_mem(local_memory_node(numa_cpu_lookup_table[cpu])); > @@ -1374,10 +1386,19 @@ int setup_profiling_timer(unsigned int multiplier) > > static void fixup_topology(void) > { > + if (shared_caches) { > + pr_info("Using shared cache scheduler topology\n"); > + powerpc_topology[bigcore_idx].mask = shared_cache_mask; > +#ifdef CONFIG_SCHED_DEBUG > + powerpc_topology[bigcore_idx].name = "CACHE"; > +#endif > + powerpc_topology[bigcore_idx].sd_flags = > powerpc_shared_cache_flags; > + } > + > #ifdef CONFIG_SCHED_SMT > if (has_big_cores) { > pr_info("Big cores detected but using small core scheduling\n"); > - powerpc_topology[0].mask = smallcore_smt_mask; > + powerpc_topology[smt_idx].mask = smallcore_smt_mask; > } > #endif Otherwise the patch looks good to me. -- Thanks and Regards gautham.