Re: [External] Re: [PATCH v4 2/2] memory tier: create CPUless memory tiers after obtaining HMAT info

2024-03-26 Thread Ho-Ren (Jack) Chuang
On Mon, Mar 25, 2024 at 8:08 PM Huang, Ying  wrote:
>
> "Ho-Ren (Jack) Chuang"  writes:
>
> > On Fri, Mar 22, 2024 at 1:41 AM Huang, Ying  wrote:
> >>
> >> "Ho-Ren (Jack) Chuang"  writes:
> >>
> >> > The current implementation treats emulated memory devices, such as
> >> > CXL1.1 type3 memory, as normal DRAM when they are emulated as normal 
> >> > memory
> >> > (E820_TYPE_RAM). However, these emulated devices have different
> >> > characteristics than traditional DRAM, making it important to
> >> > distinguish them. Thus, we modify the tiered memory initialization 
> >> > process
> >> > to introduce a delay specifically for CPUless NUMA nodes. This delay
> >> > ensures that the memory tier initialization for these nodes is deferred
> >> > until HMAT information is obtained during the boot process. Finally,
> >> > demotion tables are recalculated at the end.
> >> >
> >> > * late_initcall(memory_tier_late_init);
> >> > Some device drivers may have initialized memory tiers between
> >> > `memory_tier_init()` and `memory_tier_late_init()`, potentially bringing
> >> > online memory nodes and configuring memory tiers. They should be excluded
> >> > in the late init.
> >> >
> >> > * Handle cases where there is no HMAT when creating memory tiers
> >> > There is a scenario where a CPUless node does not provide HMAT 
> >> > information.
> >> > If no HMAT is specified, it falls back to using the default DRAM tier.
> >> >
> >> > * Introduce another new lock `default_dram_perf_lock` for adist 
> >> > calculation
> >> > In the current implementation, iterating through CPUlist nodes requires
> >> > holding the `memory_tier_lock`. However, `mt_calc_adistance()` will end 
> >> > up
> >> > trying to acquire the same lock, leading to a potential deadlock.
> >> > Therefore, we propose introducing a standalone `default_dram_perf_lock` 
> >> > to
> >> > protect `default_dram_perf_*`. This approach not only avoids deadlock
> >> > but also prevents holding a large lock simultaneously.
> >> >
> >> > * Upgrade `set_node_memory_tier` to support additional cases, including
> >> >   default DRAM, late CPUless, and hot-plugged initializations.
> >> > To cover hot-plugged memory nodes, `mt_calc_adistance()` and
> >> > `mt_find_alloc_memory_type()` are moved into `set_node_memory_tier()` to
> >> > handle cases where memtype is not initialized and where HMAT information 
> >> > is
> >> > available.
> >> >
> >> > * Introduce `default_memory_types` for those memory types that are not
> >> >   initialized by device drivers.
> >> > Because late initialized memory and default DRAM memory need to be 
> >> > managed,
> >> > a default memory type is created for storing all memory types that are
> >> > not initialized by device drivers and as a fallback.
> >> >
> >> > Signed-off-by: Ho-Ren (Jack) Chuang 
> >> > Signed-off-by: Hao Xiang 
> >> > ---
> >> >  mm/memory-tiers.c | 73 ---
> >> >  1 file changed, 63 insertions(+), 10 deletions(-)
> >> >
> >> > diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
> >> > index 974af10cfdd8..9396330fa162 100644
> >> > --- a/mm/memory-tiers.c
> >> > +++ b/mm/memory-tiers.c
> >> > @@ -36,6 +36,11 @@ struct node_memory_type_map {
> >> >
> >> >  static DEFINE_MUTEX(memory_tier_lock);
> >> >  static LIST_HEAD(memory_tiers);
> >> > +/*
> >> > + * The list is used to store all memory types that are not created
> >> > + * by a device driver.
> >> > + */
> >> > +static LIST_HEAD(default_memory_types);
> >> >  static struct node_memory_type_map node_memory_types[MAX_NUMNODES];
> >> >  struct memory_dev_type *default_dram_type;
> >> >
> >> > @@ -108,6 +113,7 @@ static struct demotion_nodes *node_demotion 
> >> > __read_mostly;
> >> >
> >> >  static BLOCKING_NOTIFIER_HEAD(mt_adistance_algorithms);
> >> >
> >> > +static DEFINE_MUTEX(default_dram_perf_lock);
> >>
> >> Better to add comments about what is protected by this lock.
> >>
> >
> > Thank you. I will add a comment like this:
> > + /* The lock is used to protect `default_dram_perf*` info and nid. */
> > +static DEFINE_MUTEX(default_dram_perf_lock);
> >
> > I also found an error path was not handled and
> > found the lock could be put closer to what it protects.
> > I will have them fixed in V5.
> >
> >> >  static bool default_dram_perf_error;
> >> >  static struct access_coordinate default_dram_perf;
> >> >  static int default_dram_perf_ref_nid = NUMA_NO_NODE;
> >> > @@ -505,7 +511,8 @@ static inline void __init_node_memory_type(int node, 
> >> > struct memory_dev_type *mem
> >> >  static struct memory_tier *set_node_memory_tier(int node)
> >> >  {
> >> >   struct memory_tier *memtier;
> >> > - struct memory_dev_type *memtype;
> >> > + struct memory_dev_type *mtype;
> >>
> >> mtype may be referenced without initialization now below.
> >>
> >
> > Good catch! Thank you.
> >
> > Please check below.
> > I may found a potential NULL pointer dereference.
> >
> >> > + int adist = MEMTIER_ADISTANCE_DRAM;
> >> >

Re: [External] Re: [PATCH v4 2/2] memory tier: create CPUless memory tiers after obtaining HMAT info

2024-03-25 Thread Huang, Ying
"Ho-Ren (Jack) Chuang"  writes:

> On Fri, Mar 22, 2024 at 1:41 AM Huang, Ying  wrote:
>>
>> "Ho-Ren (Jack) Chuang"  writes:
>>
>> > The current implementation treats emulated memory devices, such as
>> > CXL1.1 type3 memory, as normal DRAM when they are emulated as normal memory
>> > (E820_TYPE_RAM). However, these emulated devices have different
>> > characteristics than traditional DRAM, making it important to
>> > distinguish them. Thus, we modify the tiered memory initialization process
>> > to introduce a delay specifically for CPUless NUMA nodes. This delay
>> > ensures that the memory tier initialization for these nodes is deferred
>> > until HMAT information is obtained during the boot process. Finally,
>> > demotion tables are recalculated at the end.
>> >
>> > * late_initcall(memory_tier_late_init);
>> > Some device drivers may have initialized memory tiers between
>> > `memory_tier_init()` and `memory_tier_late_init()`, potentially bringing
>> > online memory nodes and configuring memory tiers. They should be excluded
>> > in the late init.
>> >
>> > * Handle cases where there is no HMAT when creating memory tiers
>> > There is a scenario where a CPUless node does not provide HMAT information.
>> > If no HMAT is specified, it falls back to using the default DRAM tier.
>> >
>> > * Introduce another new lock `default_dram_perf_lock` for adist calculation
>> > In the current implementation, iterating through CPUlist nodes requires
>> > holding the `memory_tier_lock`. However, `mt_calc_adistance()` will end up
>> > trying to acquire the same lock, leading to a potential deadlock.
>> > Therefore, we propose introducing a standalone `default_dram_perf_lock` to
>> > protect `default_dram_perf_*`. This approach not only avoids deadlock
>> > but also prevents holding a large lock simultaneously.
>> >
>> > * Upgrade `set_node_memory_tier` to support additional cases, including
>> >   default DRAM, late CPUless, and hot-plugged initializations.
>> > To cover hot-plugged memory nodes, `mt_calc_adistance()` and
>> > `mt_find_alloc_memory_type()` are moved into `set_node_memory_tier()` to
>> > handle cases where memtype is not initialized and where HMAT information is
>> > available.
>> >
>> > * Introduce `default_memory_types` for those memory types that are not
>> >   initialized by device drivers.
>> > Because late initialized memory and default DRAM memory need to be managed,
>> > a default memory type is created for storing all memory types that are
>> > not initialized by device drivers and as a fallback.
>> >
>> > Signed-off-by: Ho-Ren (Jack) Chuang 
>> > Signed-off-by: Hao Xiang 
>> > ---
>> >  mm/memory-tiers.c | 73 ---
>> >  1 file changed, 63 insertions(+), 10 deletions(-)
>> >
>> > diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
>> > index 974af10cfdd8..9396330fa162 100644
>> > --- a/mm/memory-tiers.c
>> > +++ b/mm/memory-tiers.c
>> > @@ -36,6 +36,11 @@ struct node_memory_type_map {
>> >
>> >  static DEFINE_MUTEX(memory_tier_lock);
>> >  static LIST_HEAD(memory_tiers);
>> > +/*
>> > + * The list is used to store all memory types that are not created
>> > + * by a device driver.
>> > + */
>> > +static LIST_HEAD(default_memory_types);
>> >  static struct node_memory_type_map node_memory_types[MAX_NUMNODES];
>> >  struct memory_dev_type *default_dram_type;
>> >
>> > @@ -108,6 +113,7 @@ static struct demotion_nodes *node_demotion 
>> > __read_mostly;
>> >
>> >  static BLOCKING_NOTIFIER_HEAD(mt_adistance_algorithms);
>> >
>> > +static DEFINE_MUTEX(default_dram_perf_lock);
>>
>> Better to add comments about what is protected by this lock.
>>
>
> Thank you. I will add a comment like this:
> + /* The lock is used to protect `default_dram_perf*` info and nid. */
> +static DEFINE_MUTEX(default_dram_perf_lock);
>
> I also found an error path was not handled and
> found the lock could be put closer to what it protects.
> I will have them fixed in V5.
>
>> >  static bool default_dram_perf_error;
>> >  static struct access_coordinate default_dram_perf;
>> >  static int default_dram_perf_ref_nid = NUMA_NO_NODE;
>> > @@ -505,7 +511,8 @@ static inline void __init_node_memory_type(int node, 
>> > struct memory_dev_type *mem
>> >  static struct memory_tier *set_node_memory_tier(int node)
>> >  {
>> >   struct memory_tier *memtier;
>> > - struct memory_dev_type *memtype;
>> > + struct memory_dev_type *mtype;
>>
>> mtype may be referenced without initialization now below.
>>
>
> Good catch! Thank you.
>
> Please check below.
> I may found a potential NULL pointer dereference.
>
>> > + int adist = MEMTIER_ADISTANCE_DRAM;
>> >   pg_data_t *pgdat = NODE_DATA(node);
>> >
>> >
>> > @@ -514,11 +521,20 @@ static struct memory_tier *set_node_memory_tier(int 
>> > node)
>> >   if (!node_state(node, N_MEMORY))
>> >   return ERR_PTR(-EINVAL);
>> >
>> > - __init_node_memory_type(node, default_dram_type);
>> > + mt_calc_adistance(node, );

Re: [External] Re: [PATCH v4 2/2] memory tier: create CPUless memory tiers after obtaining HMAT info

2024-03-22 Thread Ho-Ren (Jack) Chuang
On Fri, Mar 22, 2024 at 1:41 AM Huang, Ying  wrote:
>
> "Ho-Ren (Jack) Chuang"  writes:
>
> > The current implementation treats emulated memory devices, such as
> > CXL1.1 type3 memory, as normal DRAM when they are emulated as normal memory
> > (E820_TYPE_RAM). However, these emulated devices have different
> > characteristics than traditional DRAM, making it important to
> > distinguish them. Thus, we modify the tiered memory initialization process
> > to introduce a delay specifically for CPUless NUMA nodes. This delay
> > ensures that the memory tier initialization for these nodes is deferred
> > until HMAT information is obtained during the boot process. Finally,
> > demotion tables are recalculated at the end.
> >
> > * late_initcall(memory_tier_late_init);
> > Some device drivers may have initialized memory tiers between
> > `memory_tier_init()` and `memory_tier_late_init()`, potentially bringing
> > online memory nodes and configuring memory tiers. They should be excluded
> > in the late init.
> >
> > * Handle cases where there is no HMAT when creating memory tiers
> > There is a scenario where a CPUless node does not provide HMAT information.
> > If no HMAT is specified, it falls back to using the default DRAM tier.
> >
> > * Introduce another new lock `default_dram_perf_lock` for adist calculation
> > In the current implementation, iterating through CPUlist nodes requires
> > holding the `memory_tier_lock`. However, `mt_calc_adistance()` will end up
> > trying to acquire the same lock, leading to a potential deadlock.
> > Therefore, we propose introducing a standalone `default_dram_perf_lock` to
> > protect `default_dram_perf_*`. This approach not only avoids deadlock
> > but also prevents holding a large lock simultaneously.
> >
> > * Upgrade `set_node_memory_tier` to support additional cases, including
> >   default DRAM, late CPUless, and hot-plugged initializations.
> > To cover hot-plugged memory nodes, `mt_calc_adistance()` and
> > `mt_find_alloc_memory_type()` are moved into `set_node_memory_tier()` to
> > handle cases where memtype is not initialized and where HMAT information is
> > available.
> >
> > * Introduce `default_memory_types` for those memory types that are not
> >   initialized by device drivers.
> > Because late initialized memory and default DRAM memory need to be managed,
> > a default memory type is created for storing all memory types that are
> > not initialized by device drivers and as a fallback.
> >
> > Signed-off-by: Ho-Ren (Jack) Chuang 
> > Signed-off-by: Hao Xiang 
> > ---
> >  mm/memory-tiers.c | 73 ---
> >  1 file changed, 63 insertions(+), 10 deletions(-)
> >
> > diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
> > index 974af10cfdd8..9396330fa162 100644
> > --- a/mm/memory-tiers.c
> > +++ b/mm/memory-tiers.c
> > @@ -36,6 +36,11 @@ struct node_memory_type_map {
> >
> >  static DEFINE_MUTEX(memory_tier_lock);
> >  static LIST_HEAD(memory_tiers);
> > +/*
> > + * The list is used to store all memory types that are not created
> > + * by a device driver.
> > + */
> > +static LIST_HEAD(default_memory_types);
> >  static struct node_memory_type_map node_memory_types[MAX_NUMNODES];
> >  struct memory_dev_type *default_dram_type;
> >
> > @@ -108,6 +113,7 @@ static struct demotion_nodes *node_demotion 
> > __read_mostly;
> >
> >  static BLOCKING_NOTIFIER_HEAD(mt_adistance_algorithms);
> >
> > +static DEFINE_MUTEX(default_dram_perf_lock);
>
> Better to add comments about what is protected by this lock.
>

Thank you. I will add a comment like this:
+ /* The lock is used to protect `default_dram_perf*` info and nid. */
+static DEFINE_MUTEX(default_dram_perf_lock);

I also found an error path was not handled and
found the lock could be put closer to what it protects.
I will have them fixed in V5.

> >  static bool default_dram_perf_error;
> >  static struct access_coordinate default_dram_perf;
> >  static int default_dram_perf_ref_nid = NUMA_NO_NODE;
> > @@ -505,7 +511,8 @@ static inline void __init_node_memory_type(int node, 
> > struct memory_dev_type *mem
> >  static struct memory_tier *set_node_memory_tier(int node)
> >  {
> >   struct memory_tier *memtier;
> > - struct memory_dev_type *memtype;
> > + struct memory_dev_type *mtype;
>
> mtype may be referenced without initialization now below.
>

Good catch! Thank you.

Please check below.
I may found a potential NULL pointer dereference.

> > + int adist = MEMTIER_ADISTANCE_DRAM;
> >   pg_data_t *pgdat = NODE_DATA(node);
> >
> >
> > @@ -514,11 +521,20 @@ static struct memory_tier *set_node_memory_tier(int 
> > node)
> >   if (!node_state(node, N_MEMORY))
> >   return ERR_PTR(-EINVAL);
> >
> > - __init_node_memory_type(node, default_dram_type);
> > + mt_calc_adistance(node, );
> > + if (node_memory_types[node].memtype == NULL) {
> > + mtype = mt_find_alloc_memory_type(adist, 
> > _memory_types);
> > + if