RE: [RFC PATCH v5 4/4] scheduler: Add cluster scheduler level for x86

2021-04-20 Thread Song Bao Hua (Barry Song)


> -Original Message-
> From: Tim Chen [mailto:tim.c.c...@linux.intel.com]
> Sent: Wednesday, April 21, 2021 6:32 AM
> To: Song Bao Hua (Barry Song) ;
> catalin.mari...@arm.com; w...@kernel.org; r...@rjwysocki.net;
> vincent.guit...@linaro.org; b...@alien8.de; t...@linutronix.de;
> mi...@redhat.com; l...@kernel.org; pet...@infradead.org;
> dietmar.eggem...@arm.com; rost...@goodmis.org; bseg...@google.com;
> mgor...@suse.de
> Cc: msys.miz...@gmail.com; valentin.schnei...@arm.com;
> gre...@linuxfoundation.org; Jonathan Cameron ;
> juri.le...@redhat.com; mark.rutl...@arm.com; sudeep.ho...@arm.com;
> aubrey...@linux.intel.com; linux-arm-ker...@lists.infradead.org;
> linux-kernel@vger.kernel.org; linux-a...@vger.kernel.org; x...@kernel.org;
> xuwei (O) ; Zengtao (B) ;
> guodong...@linaro.org; yangyicong ; Liguozhu (Kenneth)
> ; linux...@openeuler.org; h...@zytor.com
> Subject: Re: [RFC PATCH v5 4/4] scheduler: Add cluster scheduler level for x86
> 
> 
> 
> On 3/23/21 4:21 PM, Song Bao Hua (Barry Song) wrote:
> 
> >>
> >> On 3/18/21 9:16 PM, Barry Song wrote:
> >>> From: Tim Chen 
> >>>
> >>> There are x86 CPU architectures (e.g. Jacobsville) where L2 cache
> >>> is shared among a cluster of cores instead of being exclusive
> >>> to one single core.
> >>>
> >>> To prevent oversubscription of L2 cache, load should be
> >>> balanced between such L2 clusters, especially for tasks with
> >>> no shared data.
> >>>
> >>> Also with cluster scheduling policy where tasks are woken up
> >>> in the same L2 cluster, we will benefit from keeping tasks
> >>> related to each other and likely sharing data in the same L2
> >>> cluster.
> >>>
> >>> Add CPU masks of CPUs sharing the L2 cache so we can build such
> >>> L2 cluster scheduler domain.
> >>>
> >>> Signed-off-by: Tim Chen 
> >>> Signed-off-by: Barry Song 
> >>
> >>
> >> Barry,
> >>
> >> Can you also add this chunk to the patch.
> >> Thanks.
> >
> > Sure, Tim, Thanks. I'll put that into patch 4/4 in v6.
> >
> 
> Barry,
> 
> This chunk will also need to be added to return cluster id for x86.
> Please add it in your next rev.

Yes. Thanks. I'll put this in either RFC v7 or Patch v1.

The spreading path is much easier, though the packing path is quite
tricky. But RFC v6 seems quite close to what we want to achieve: it
packs related tasks by scanning the cluster for wakeups within the
same NUMA node:
https://lore.kernel.org/lkml/20210420001844.9116-1-song.bao@hisilicon.com/

If a pair of related tasks is already in the same LLC (NUMA node), scanning
clusters will gather them further. If they are running in different NUMA
nodes, the original LLC scanning will move them to the same node; after
that, cluster scanning may put them closer to each other.

This seems to be the kind of two-level packing Dietmar has suggested.
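
To make the idea concrete, below is a minimal user-space sketch of that
two-level search (plain C with made-up idle states and cluster sizes; it
illustrates the policy only, not the actual select_idle_cpu() changes in
RFC v6): for a waking task we first look for an idle CPU inside the
target's cluster, and only then fall back to the rest of the LLC.

#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS     8
#define CLUSTER_SZ  4   /* CPUs per L2 cluster (assumption for this sketch) */

static bool cpu_idle[NR_CPUS] = {
	false, false, true, false,   /* cluster 0: CPUs 0-3 */
	true,  true,  true, true,    /* cluster 1: CPUs 4-7 */
};

static int cluster_of(int cpu) { return cpu / CLUSTER_SZ; }

/*
 * Two-level packing: prefer an idle CPU in the target's cluster,
 * then fall back to any idle CPU in the rest of the (whole-LLC) system.
 */
static int select_idle_cpu_two_level(int target)
{
	int first = cluster_of(target) * CLUSTER_SZ;
	int cpu;

	for (cpu = first; cpu < first + CLUSTER_SZ; cpu++)
		if (cpu_idle[cpu])
			return cpu;	/* stay close: same L2 cluster */

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpu_idle[cpu])
			return cpu;	/* spill into the rest of the LLC */

	return target;			/* nothing idle: keep the target */
}

int main(void)
{
	/* Waker runs on CPU 1; CPU 2 (same cluster) wins over CPUs 4-7. */
	printf("picked CPU %d\n", select_idle_cpu_two_level(1));
	return 0;
}

The real kernel path has extra concerns this toy ignores (scan-cost
limits, SMT siblings, affinity masks), which is where most of the
packing-path trickiness lives.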

So perhaps we won't have RFC v7, I will probably send patch v1 afterwards.

> 
> Thanks.
> 
> Tim
> 
> ---
> 
> diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
> index 800fa48c9fcd..2548d824f103 100644
> --- a/arch/x86/include/asm/topology.h
> +++ b/arch/x86/include/asm/topology.h
> @@ -109,6 +109,7 @@ extern const struct cpumask *cpu_clustergroup_mask(int cpu);
>  #define topology_physical_package_id(cpu)  (cpu_data(cpu).phys_proc_id)
>  #define topology_logical_die_id(cpu)   (cpu_data(cpu).logical_die_id)
>  #define topology_die_id(cpu)   (cpu_data(cpu).cpu_die_id)
> +#define topology_cluster_id(cpu)   (per_cpu(cpu_l2c_id, cpu))
>  #define topology_core_id(cpu)  (cpu_data(cpu).cpu_core_id)
> 
>  extern unsigned int __max_die_per_package;

Thanks
Barry



Re: [RFC PATCH v5 4/4] scheduler: Add cluster scheduler level for x86

2021-04-20 Thread Tim Chen



On 3/23/21 4:21 PM, Song Bao Hua (Barry Song) wrote:

>>
>> On 3/18/21 9:16 PM, Barry Song wrote:
>>> From: Tim Chen 
>>>
>>> There are x86 CPU architectures (e.g. Jacobsville) where L2 cache
>>> is shared among a cluster of cores instead of being exclusive
>>> to one single core.
>>>
>>> To prevent oversubscription of L2 cache, load should be
>>> balanced between such L2 clusters, especially for tasks with
>>> no shared data.
>>>
>>> Also with cluster scheduling policy where tasks are woken up
>>> in the same L2 cluster, we will benefit from keeping tasks
>>> related to each other and likely sharing data in the same L2
>>> cluster.
>>>
>>> Add CPU masks of CPUs sharing the L2 cache so we can build such
>>> L2 cluster scheduler domain.
>>>
>>> Signed-off-by: Tim Chen 
>>> Signed-off-by: Barry Song 
>>
>>
>> Barry,
>>
>> Can you also add this chunk to the patch.
>> Thanks.
> 
> Sure, Tim, Thanks. I'll put that into patch 4/4 in v6.
> 

Barry,

This chunk will also need to be added to return cluster id for x86.
Please add it in your next rev.

Thanks.

Tim

---

diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 800fa48c9fcd..2548d824f103 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -109,6 +109,7 @@ extern const struct cpumask *cpu_clustergroup_mask(int cpu);
 #define topology_physical_package_id(cpu)  (cpu_data(cpu).phys_proc_id)
 #define topology_logical_die_id(cpu)   (cpu_data(cpu).logical_die_id)
 #define topology_die_id(cpu)   (cpu_data(cpu).cpu_die_id)
+#define topology_cluster_id(cpu)   (per_cpu(cpu_l2c_id, cpu))
 #define topology_core_id(cpu)  (cpu_data(cpu).cpu_core_id)
 
 extern unsigned int __max_die_per_package;
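
For readers of the archive: topology_cluster_id() gives the generic
topology code a per-CPU cluster identifier. The small user-space sketch
below reads such an id via sysfs; the attribute name and path are
assumptions that depend on the generic-topology patches elsewhere in
this series, not on the x86 chunk above.

#include <stdio.h>

/*
 * Hedged sketch: print the cluster id of the first few CPUs, assuming the
 * series ends up exposing it as
 * /sys/devices/system/cpu/cpuN/topology/cluster_id (path is an assumption).
 */
int main(void)
{
	char path[128];
	int cpu, id;

	for (cpu = 0; cpu < 8; cpu++) {
		FILE *f;

		snprintf(path, sizeof(path),
			 "/sys/devices/system/cpu/cpu%d/topology/cluster_id", cpu);
		f = fopen(path, "r");
		if (!f)
			break;	/* CPU absent or attribute not exposed */
		if (fscanf(f, "%d", &id) == 1)
			printf("cpu%d: cluster_id %d\n", cpu, id);
		fclose(f);
	}
	return 0;
}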


RE: [RFC PATCH v5 4/4] scheduler: Add cluster scheduler level for x86

2021-03-31 Thread Song Bao Hua (Barry Song)


> -Original Message-
> From: Song Bao Hua (Barry Song)
> Sent: Wednesday, March 24, 2021 12:15 PM
> To: 'Tim Chen' ; catalin.mari...@arm.com;
> w...@kernel.org; r...@rjwysocki.net; vincent.guit...@linaro.org; 
> b...@alien8.de;
> t...@linutronix.de; mi...@redhat.com; l...@kernel.org; pet...@infradead.org;
> dietmar.eggem...@arm.com; rost...@goodmis.org; bseg...@google.com;
> mgor...@suse.de
> Cc: msys.miz...@gmail.com; valentin.schnei...@arm.com;
> gre...@linuxfoundation.org; Jonathan Cameron ;
> juri.le...@redhat.com; mark.rutl...@arm.com; sudeep.ho...@arm.com;
> aubrey...@linux.intel.com; linux-arm-ker...@lists.infradead.org;
> linux-kernel@vger.kernel.org; linux-a...@vger.kernel.org; x...@kernel.org;
> xuwei (O) ; Zengtao (B) ;
> guodong...@linaro.org; yangyicong ; Liguozhu (Kenneth)
> ; linux...@openeuler.org; h...@zytor.com
> Subject: RE: [RFC PATCH v5 4/4] scheduler: Add cluster scheduler level for x86
> 
> 
> 
> > -Original Message-
> > From: Tim Chen [mailto:tim.c.c...@linux.intel.com]
> > Sent: Wednesday, March 24, 2021 11:51 AM
> > To: Song Bao Hua (Barry Song) ;
> > catalin.mari...@arm.com; w...@kernel.org; r...@rjwysocki.net;
> > vincent.guit...@linaro.org; b...@alien8.de; t...@linutronix.de;
> > mi...@redhat.com; l...@kernel.org; pet...@infradead.org;
> > dietmar.eggem...@arm.com; rost...@goodmis.org; bseg...@google.com;
> > mgor...@suse.de
> > Cc: msys.miz...@gmail.com; valentin.schnei...@arm.com;
> > gre...@linuxfoundation.org; Jonathan Cameron ;
> > juri.le...@redhat.com; mark.rutl...@arm.com; sudeep.ho...@arm.com;
> > aubrey...@linux.intel.com; linux-arm-ker...@lists.infradead.org;
> > linux-kernel@vger.kernel.org; linux-a...@vger.kernel.org; x...@kernel.org;
> > xuwei (O) ; Zengtao (B) ;
> > guodong...@linaro.org; yangyicong ; Liguozhu
> (Kenneth)
> > ; linux...@openeuler.org; h...@zytor.com
> > Subject: Re: [RFC PATCH v5 4/4] scheduler: Add cluster scheduler level for
> x86
> >
> >
> >
> > On 3/18/21 9:16 PM, Barry Song wrote:
> > > From: Tim Chen 
> > >
> > > There are x86 CPU architectures (e.g. Jacobsville) where L2 cache
> > > is shared among a cluster of cores instead of being exclusive
> > > to one single core.
> > >
> > > To prevent oversubscription of L2 cache, load should be
> > > balanced between such L2 clusters, especially for tasks with
> > > no shared data.
> > >
> > > Also with cluster scheduling policy where tasks are woken up
> > > in the same L2 cluster, we will benefit from keeping tasks
> > > related to each other and likely sharing data in the same L2
> > > cluster.
> > >
> > > Add CPU masks of CPUs sharing the L2 cache so we can build such
> > > L2 cluster scheduler domain.
> > >
> > > Signed-off-by: Tim Chen 
> > > Signed-off-by: Barry Song 
> >
> >
> > Barry,
> >
> > Can you also add this chunk to the patch.
> > Thanks.
> 
> Sure, Tim, Thanks. I'll put that into patch 4/4 in v6.

Hi Tim,
You might want to take a look at this qemu patchset:
https://lore.kernel.org/qemu-devel/20210331095343.12172-1-wangyana...@huawei.com/T/#t

Someone is trying to leverage this cluster topology
to improve KVM virtual machine performance.

> 
> >
> > Tim
> >
> >
> > diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
> > index 2a11ccc14fb1..800fa48c9fcd 100644
> > --- a/arch/x86/include/asm/topology.h
> > +++ b/arch/x86/include/asm/topology.h
> > @@ -115,6 +115,7 @@ extern unsigned int __max_die_per_package;
> >
> >  #ifdef CONFIG_SMP
> >  #define topology_die_cpumask(cpu)  (per_cpu(cpu_die_map, cpu))
> > +#define topology_cluster_cpumask(cpu)  (cpu_clustergroup_mask(cpu))
> >  #define topology_core_cpumask(cpu) (per_cpu(cpu_core_map, cpu))
> >  #define topology_sibling_cpumask(cpu)  (per_cpu(cpu_sibling_map, cpu))
> >
> 

Thanks
Barry


RE: [RFC PATCH v5 4/4] scheduler: Add cluster scheduler level for x86

2021-03-23 Thread Song Bao Hua (Barry Song)


> -Original Message-
> From: Tim Chen [mailto:tim.c.c...@linux.intel.com]
> Sent: Wednesday, March 24, 2021 11:51 AM
> To: Song Bao Hua (Barry Song) ;
> catalin.mari...@arm.com; w...@kernel.org; r...@rjwysocki.net;
> vincent.guit...@linaro.org; b...@alien8.de; t...@linutronix.de;
> mi...@redhat.com; l...@kernel.org; pet...@infradead.org;
> dietmar.eggem...@arm.com; rost...@goodmis.org; bseg...@google.com;
> mgor...@suse.de
> Cc: msys.miz...@gmail.com; valentin.schnei...@arm.com;
> gre...@linuxfoundation.org; Jonathan Cameron ;
> juri.le...@redhat.com; mark.rutl...@arm.com; sudeep.ho...@arm.com;
> aubrey...@linux.intel.com; linux-arm-ker...@lists.infradead.org;
> linux-kernel@vger.kernel.org; linux-a...@vger.kernel.org; x...@kernel.org;
> xuwei (O) ; Zengtao (B) ;
> guodong...@linaro.org; yangyicong ; Liguozhu (Kenneth)
> ; linux...@openeuler.org; h...@zytor.com
> Subject: Re: [RFC PATCH v5 4/4] scheduler: Add cluster scheduler level for x86
> 
> 
> 
> On 3/18/21 9:16 PM, Barry Song wrote:
> > From: Tim Chen 
> >
> > There are x86 CPU architectures (e.g. Jacobsville) where L2 cache
> > is shared among a cluster of cores instead of being exclusive
> > to one single core.
> >
> > To prevent oversubscription of L2 cache, load should be
> > balanced between such L2 clusters, especially for tasks with
> > no shared data.
> >
> > Also with cluster scheduling policy where tasks are woken up
> > in the same L2 cluster, we will benefit from keeping tasks
> > related to each other and likely sharing data in the same L2
> > cluster.
> >
> > Add CPU masks of CPUs sharing the L2 cache so we can build such
> > L2 cluster scheduler domain.
> >
> > Signed-off-by: Tim Chen 
> > Signed-off-by: Barry Song 
> 
> 
> Barry,
> 
> Can you also add this chunk to the patch.
> Thanks.

Sure, Tim, Thanks. I'll put that into patch 4/4 in v6.

> 
> Tim
> 
> 
> diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
> index 2a11ccc14fb1..800fa48c9fcd 100644
> --- a/arch/x86/include/asm/topology.h
> +++ b/arch/x86/include/asm/topology.h
> @@ -115,6 +115,7 @@ extern unsigned int __max_die_per_package;
> 
>  #ifdef CONFIG_SMP
>  #define topology_die_cpumask(cpu)  (per_cpu(cpu_die_map, cpu))
> +#define topology_cluster_cpumask(cpu)  (cpu_clustergroup_mask(cpu))
>  #define topology_core_cpumask(cpu) (per_cpu(cpu_core_map, cpu))
>  #define topology_sibling_cpumask(cpu)  (per_cpu(cpu_sibling_map, cpu))
> 

Thanks
Barry




Re: [RFC PATCH v5 4/4] scheduler: Add cluster scheduler level for x86

2021-03-23 Thread Tim Chen



On 3/18/21 9:16 PM, Barry Song wrote:
> From: Tim Chen 
> 
> There are x86 CPU architectures (e.g. Jacobsville) where L2 cache
> is shared among a cluster of cores instead of being exclusive
> to one single core.
> 
> To prevent oversubscription of L2 cache, load should be
> balanced between such L2 clusters, especially for tasks with
> no shared data.
> 
> Also with cluster scheduling policy where tasks are woken up
> in the same L2 cluster, we will benefit from keeping tasks
> related to each other and likely sharing data in the same L2
> cluster.
> 
> Add CPU masks of CPUs sharing the L2 cache so we can build such
> L2 cluster scheduler domain.
> 
> Signed-off-by: Tim Chen 
> Signed-off-by: Barry Song 


Barry,

Can you also add this chunk to the patch.
Thanks.

Tim


diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 2a11ccc14fb1..800fa48c9fcd 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -115,6 +115,7 @@ extern unsigned int __max_die_per_package;
 
 #ifdef CONFIG_SMP
 #define topology_die_cpumask(cpu)  (per_cpu(cpu_die_map, cpu))
+#define topology_cluster_cpumask(cpu)  (cpu_clustergroup_mask(cpu))
 #define topology_core_cpumask(cpu) (per_cpu(cpu_core_map, cpu))
 #define topology_sibling_cpumask(cpu)  (per_cpu(cpu_sibling_map, cpu))
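
In case it helps to see what the new mask means in practice:
topology_cluster_cpumask() resolves to cpu_clustergroup_mask(cpu), i.e.
the set of CPUs whose L2 id matches that of cpu. Below is a toy
user-space illustration of that grouping, using made-up ids and plain
bitmasks instead of the kernel's cpumask API; it is a sketch of the
concept, not the kernel implementation.

#include <stdint.h>
#include <stdio.h>

#define NR_CPUS 8

/*
 * Pretend per-CPU L2 ("l2c") ids, analogous to per_cpu(cpu_l2c_id, cpu).
 * The values are made up for this sketch: two clusters of four CPUs each.
 */
static const uint16_t cpu_l2c_id[NR_CPUS] = { 0, 0, 0, 0, 4, 4, 4, 4 };

/*
 * Toy equivalent of cpu_clustergroup_mask(): the set of CPUs sharing
 * the same L2 id as @cpu, returned as a plain bitmask.
 */
static uint32_t clustergroup_mask(int cpu)
{
	uint32_t mask = 0;
	int i;

	for (i = 0; i < NR_CPUS; i++)
		if (cpu_l2c_id[i] == cpu_l2c_id[cpu])
			mask |= 1u << i;
	return mask;
}

int main(void)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		printf("cpu%d: cluster mask 0x%02x\n", cpu, clustergroup_mask(cpu));
	return 0;
}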
 


[RFC PATCH v5 4/4] scheduler: Add cluster scheduler level for x86

2021-03-18 Thread Barry Song
From: Tim Chen 

There are x86 CPU architectures (e.g. Jacobsville) where L2 cache
is shared among a cluster of cores instead of being exclusive
to one single core.

To prevent oversubscription of L2 cache, load should be
balanced between such L2 clusters, especially for tasks with
no shared data.

Also with cluster scheduling policy where tasks are woken up
in the same L2 cluster, we will benefit from keeping tasks
related to each other and likely sharing data in the same L2
cluster.

Add CPU masks of CPUs sharing the L2 cache so we can build such
L2 cluster scheduler domain.

Signed-off-by: Tim Chen 
Signed-off-by: Barry Song 
---
 arch/x86/Kconfig                |  8 ++++++++
 arch/x86/include/asm/smp.h      |  7 +++++++
 arch/x86/include/asm/topology.h |  1 +
 arch/x86/kernel/cpu/cacheinfo.c |  1 +
 arch/x86/kernel/cpu/common.c    |  3 +++
 arch/x86/kernel/smpboot.c       | 43 ++++++++++++++++++++++++++++++++++++++++++-
 6 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2792879..d597de2 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1002,6 +1002,14 @@ config NR_CPUS
  This is purely to save memory: each supported CPU adds about 8KB
  to the kernel image.
 
+config SCHED_CLUSTER
+	bool "Cluster scheduler support"
+	default n
+	help
+	  Cluster scheduler support improves the CPU scheduler's decision
+	  making when dealing with machines that have clusters of CPUs
+	  sharing L2 cache. If unsure say N here.
+
 config SCHED_SMT
def_bool y if SMP
 
diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index c0538f8..9cbc4ae 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -16,7 +16,9 @@
 DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_die_map);
 /* cpus sharing the last level cache: */
 DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_llc_shared_map);
+DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_l2c_shared_map);
 DECLARE_PER_CPU_READ_MOSTLY(u16, cpu_llc_id);
+DECLARE_PER_CPU_READ_MOSTLY(u16, cpu_l2c_id);
 DECLARE_PER_CPU_READ_MOSTLY(int, cpu_number);
 
 static inline struct cpumask *cpu_llc_shared_mask(int cpu)
@@ -24,6 +26,11 @@ static inline struct cpumask *cpu_llc_shared_mask(int cpu)
return per_cpu(cpu_llc_shared_map, cpu);
 }
 
+static inline struct cpumask *cpu_l2c_shared_mask(int cpu)
+{
+   return per_cpu(cpu_l2c_shared_map, cpu);
+}
+
 DECLARE_EARLY_PER_CPU_READ_MOSTLY(u16, x86_cpu_to_apicid);
 DECLARE_EARLY_PER_CPU_READ_MOSTLY(u32, x86_cpu_to_acpiid);
 DECLARE_EARLY_PER_CPU_READ_MOSTLY(u16, x86_bios_cpu_apicid);
diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 9239399..2a11ccc 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -103,6 +103,7 @@ static inline void setup_node_to_cpumask_map(void) { }
 #include 
 
 extern const struct cpumask *cpu_coregroup_mask(int cpu);
+extern const struct cpumask *cpu_clustergroup_mask(int cpu);
 
 #define topology_logical_package_id(cpu)   (cpu_data(cpu).logical_proc_id)
 #define topology_physical_package_id(cpu)  (cpu_data(cpu).phys_proc_id)
diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index 3ca9be4..0d03a71 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -846,6 +846,7 @@ void init_intel_cacheinfo(struct cpuinfo_x86 *c)
l2 = new_l2;
 #ifdef CONFIG_SMP
per_cpu(cpu_llc_id, cpu) = l2_id;
+   per_cpu(cpu_l2c_id, cpu) = l2_id;
 #endif
}
 
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index ab640ab..0ba282d 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -78,6 +78,9 @@
 /* Last level cache ID of each logical CPU */
 DEFINE_PER_CPU_READ_MOSTLY(u16, cpu_llc_id) = BAD_APICID;
 
+/* L2 cache ID of each logical CPU */
+DEFINE_PER_CPU_READ_MOSTLY(u16, cpu_l2c_id) = BAD_APICID;
+
 /* correctly size the local cpu masks */
 void __init setup_cpu_local_masks(void)
 {
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 02813a7..c85ffa8 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -101,6 +101,8 @@
 
 DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_llc_shared_map);
 
+DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_l2c_shared_map);
+
 /* Per CPU bogomips and other parameters */
 DEFINE_PER_CPU_READ_MOSTLY(struct cpuinfo_x86, cpu_info);
 EXPORT_PER_CPU_SYMBOL(cpu_info);
@@ -501,6 +503,21 @@ static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
return topology_sane(c, o, "llc");
 }
 
+static bool match_l2c(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
+{
+   int cpu1 = c->cpu_index, cpu2 = o->cpu_index;
+
+   /* Do not match if we do not have a valid APICID for cpu: */
+   if (per_cpu(cpu_l2c_id, cpu1) == BAD_APICID)
+           return false;
+
+   /* Do not match if