Re: [External] Re: [PATCH v4 2/2] memory tier: create CPUless memory tiers after obtaining HMAT info
On Mon, Mar 25, 2024 at 8:08 PM Huang, Ying wrote: > > "Ho-Ren (Jack) Chuang" writes: > > > On Fri, Mar 22, 2024 at 1:41 AM Huang, Ying wrote: > >> > >> "Ho-Ren (Jack) Chuang" writes: > >> > >> > The current implementation treats emulated memory devices, such as > >> > CXL1.1 type3 memory, as normal DRAM when they are emulated as normal > >> > memory > >> > (E820_TYPE_RAM). However, these emulated devices have different > >> > characteristics than traditional DRAM, making it important to > >> > distinguish them. Thus, we modify the tiered memory initialization > >> > process > >> > to introduce a delay specifically for CPUless NUMA nodes. This delay > >> > ensures that the memory tier initialization for these nodes is deferred > >> > until HMAT information is obtained during the boot process. Finally, > >> > demotion tables are recalculated at the end. > >> > > >> > * late_initcall(memory_tier_late_init); > >> > Some device drivers may have initialized memory tiers between > >> > `memory_tier_init()` and `memory_tier_late_init()`, potentially bringing > >> > online memory nodes and configuring memory tiers. They should be excluded > >> > in the late init. > >> > > >> > * Handle cases where there is no HMAT when creating memory tiers > >> > There is a scenario where a CPUless node does not provide HMAT > >> > information. > >> > If no HMAT is specified, it falls back to using the default DRAM tier. > >> > > >> > * Introduce another new lock `default_dram_perf_lock` for adist > >> > calculation > >> > In the current implementation, iterating through CPUlist nodes requires > >> > holding the `memory_tier_lock`. However, `mt_calc_adistance()` will end > >> > up > >> > trying to acquire the same lock, leading to a potential deadlock. > >> > Therefore, we propose introducing a standalone `default_dram_perf_lock` > >> > to > >> > protect `default_dram_perf_*`. This approach not only avoids deadlock > >> > but also prevents holding a large lock simultaneously. > >> > > >> > * Upgrade `set_node_memory_tier` to support additional cases, including > >> > default DRAM, late CPUless, and hot-plugged initializations. > >> > To cover hot-plugged memory nodes, `mt_calc_adistance()` and > >> > `mt_find_alloc_memory_type()` are moved into `set_node_memory_tier()` to > >> > handle cases where memtype is not initialized and where HMAT information > >> > is > >> > available. > >> > > >> > * Introduce `default_memory_types` for those memory types that are not > >> > initialized by device drivers. > >> > Because late initialized memory and default DRAM memory need to be > >> > managed, > >> > a default memory type is created for storing all memory types that are > >> > not initialized by device drivers and as a fallback. > >> > > >> > Signed-off-by: Ho-Ren (Jack) Chuang > >> > Signed-off-by: Hao Xiang > >> > --- > >> > mm/memory-tiers.c | 73 --- > >> > 1 file changed, 63 insertions(+), 10 deletions(-) > >> > > >> > diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c > >> > index 974af10cfdd8..9396330fa162 100644 > >> > --- a/mm/memory-tiers.c > >> > +++ b/mm/memory-tiers.c > >> > @@ -36,6 +36,11 @@ struct node_memory_type_map { > >> > > >> > static DEFINE_MUTEX(memory_tier_lock); > >> > static LIST_HEAD(memory_tiers); > >> > +/* > >> > + * The list is used to store all memory types that are not created > >> > + * by a device driver. > >> > + */ > >> > +static LIST_HEAD(default_memory_types); > >> > static struct node_memory_type_map node_memory_types[MAX_NUMNODES]; > >> > struct memory_dev_type *default_dram_type; > >> > > >> > @@ -108,6 +113,7 @@ static struct demotion_nodes *node_demotion > >> > __read_mostly; > >> > > >> > static BLOCKING_NOTIFIER_HEAD(mt_adistance_algorithms); > >> > > >> > +static DEFINE_MUTEX(default_dram_perf_lock); > >> > >> Better to add comments about what is protected by this lock. > >> > > > > Thank you. I will add a comment like this: > > + /* The lock is used to protect `default_dram_perf*` info and nid. */ > > +static DEFINE_MUTEX(default_dram_perf_lock); > > > > I also found an error path was not handled and > > found the lock could be put closer to what it protects. > > I will have them fixed in V5. > > > >> > static bool default_dram_perf_error; > >> > static struct access_coordinate default_dram_perf; > >> > static int default_dram_perf_ref_nid = NUMA_NO_NODE; > >> > @@ -505,7 +511,8 @@ static inline void __init_node_memory_type(int node, > >> > struct memory_dev_type *mem > >> > static struct memory_tier *set_node_memory_tier(int node) > >> > { > >> > struct memory_tier *memtier; > >> > - struct memory_dev_type *memtype; > >> > + struct memory_dev_type *mtype; > >> > >> mtype may be referenced without initialization now below. > >> > > > > Good catch! Thank you. > > > > Please check below. > > I may found a potential NULL pointer dereference. > > > >> > + int adist = MEMTIER_ADISTANCE_DRAM; > >> >
Re: [External] Re: [PATCH v4 2/2] memory tier: create CPUless memory tiers after obtaining HMAT info
"Ho-Ren (Jack) Chuang" writes: > On Fri, Mar 22, 2024 at 1:41 AM Huang, Ying wrote: >> >> "Ho-Ren (Jack) Chuang" writes: >> >> > The current implementation treats emulated memory devices, such as >> > CXL1.1 type3 memory, as normal DRAM when they are emulated as normal memory >> > (E820_TYPE_RAM). However, these emulated devices have different >> > characteristics than traditional DRAM, making it important to >> > distinguish them. Thus, we modify the tiered memory initialization process >> > to introduce a delay specifically for CPUless NUMA nodes. This delay >> > ensures that the memory tier initialization for these nodes is deferred >> > until HMAT information is obtained during the boot process. Finally, >> > demotion tables are recalculated at the end. >> > >> > * late_initcall(memory_tier_late_init); >> > Some device drivers may have initialized memory tiers between >> > `memory_tier_init()` and `memory_tier_late_init()`, potentially bringing >> > online memory nodes and configuring memory tiers. They should be excluded >> > in the late init. >> > >> > * Handle cases where there is no HMAT when creating memory tiers >> > There is a scenario where a CPUless node does not provide HMAT information. >> > If no HMAT is specified, it falls back to using the default DRAM tier. >> > >> > * Introduce another new lock `default_dram_perf_lock` for adist calculation >> > In the current implementation, iterating through CPUlist nodes requires >> > holding the `memory_tier_lock`. However, `mt_calc_adistance()` will end up >> > trying to acquire the same lock, leading to a potential deadlock. >> > Therefore, we propose introducing a standalone `default_dram_perf_lock` to >> > protect `default_dram_perf_*`. This approach not only avoids deadlock >> > but also prevents holding a large lock simultaneously. >> > >> > * Upgrade `set_node_memory_tier` to support additional cases, including >> > default DRAM, late CPUless, and hot-plugged initializations. >> > To cover hot-plugged memory nodes, `mt_calc_adistance()` and >> > `mt_find_alloc_memory_type()` are moved into `set_node_memory_tier()` to >> > handle cases where memtype is not initialized and where HMAT information is >> > available. >> > >> > * Introduce `default_memory_types` for those memory types that are not >> > initialized by device drivers. >> > Because late initialized memory and default DRAM memory need to be managed, >> > a default memory type is created for storing all memory types that are >> > not initialized by device drivers and as a fallback. >> > >> > Signed-off-by: Ho-Ren (Jack) Chuang >> > Signed-off-by: Hao Xiang >> > --- >> > mm/memory-tiers.c | 73 --- >> > 1 file changed, 63 insertions(+), 10 deletions(-) >> > >> > diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c >> > index 974af10cfdd8..9396330fa162 100644 >> > --- a/mm/memory-tiers.c >> > +++ b/mm/memory-tiers.c >> > @@ -36,6 +36,11 @@ struct node_memory_type_map { >> > >> > static DEFINE_MUTEX(memory_tier_lock); >> > static LIST_HEAD(memory_tiers); >> > +/* >> > + * The list is used to store all memory types that are not created >> > + * by a device driver. >> > + */ >> > +static LIST_HEAD(default_memory_types); >> > static struct node_memory_type_map node_memory_types[MAX_NUMNODES]; >> > struct memory_dev_type *default_dram_type; >> > >> > @@ -108,6 +113,7 @@ static struct demotion_nodes *node_demotion >> > __read_mostly; >> > >> > static BLOCKING_NOTIFIER_HEAD(mt_adistance_algorithms); >> > >> > +static DEFINE_MUTEX(default_dram_perf_lock); >> >> Better to add comments about what is protected by this lock. >> > > Thank you. I will add a comment like this: > + /* The lock is used to protect `default_dram_perf*` info and nid. */ > +static DEFINE_MUTEX(default_dram_perf_lock); > > I also found an error path was not handled and > found the lock could be put closer to what it protects. > I will have them fixed in V5. > >> > static bool default_dram_perf_error; >> > static struct access_coordinate default_dram_perf; >> > static int default_dram_perf_ref_nid = NUMA_NO_NODE; >> > @@ -505,7 +511,8 @@ static inline void __init_node_memory_type(int node, >> > struct memory_dev_type *mem >> > static struct memory_tier *set_node_memory_tier(int node) >> > { >> > struct memory_tier *memtier; >> > - struct memory_dev_type *memtype; >> > + struct memory_dev_type *mtype; >> >> mtype may be referenced without initialization now below. >> > > Good catch! Thank you. > > Please check below. > I may found a potential NULL pointer dereference. > >> > + int adist = MEMTIER_ADISTANCE_DRAM; >> > pg_data_t *pgdat = NODE_DATA(node); >> > >> > >> > @@ -514,11 +521,20 @@ static struct memory_tier *set_node_memory_tier(int >> > node) >> > if (!node_state(node, N_MEMORY)) >> > return ERR_PTR(-EINVAL); >> > >> > - __init_node_memory_type(node, default_dram_type); >> > + mt_calc_adistance(node, );
Re: [External] Re: [PATCH v4 2/2] memory tier: create CPUless memory tiers after obtaining HMAT info
On Fri, Mar 22, 2024 at 1:41 AM Huang, Ying wrote: > > "Ho-Ren (Jack) Chuang" writes: > > > The current implementation treats emulated memory devices, such as > > CXL1.1 type3 memory, as normal DRAM when they are emulated as normal memory > > (E820_TYPE_RAM). However, these emulated devices have different > > characteristics than traditional DRAM, making it important to > > distinguish them. Thus, we modify the tiered memory initialization process > > to introduce a delay specifically for CPUless NUMA nodes. This delay > > ensures that the memory tier initialization for these nodes is deferred > > until HMAT information is obtained during the boot process. Finally, > > demotion tables are recalculated at the end. > > > > * late_initcall(memory_tier_late_init); > > Some device drivers may have initialized memory tiers between > > `memory_tier_init()` and `memory_tier_late_init()`, potentially bringing > > online memory nodes and configuring memory tiers. They should be excluded > > in the late init. > > > > * Handle cases where there is no HMAT when creating memory tiers > > There is a scenario where a CPUless node does not provide HMAT information. > > If no HMAT is specified, it falls back to using the default DRAM tier. > > > > * Introduce another new lock `default_dram_perf_lock` for adist calculation > > In the current implementation, iterating through CPUlist nodes requires > > holding the `memory_tier_lock`. However, `mt_calc_adistance()` will end up > > trying to acquire the same lock, leading to a potential deadlock. > > Therefore, we propose introducing a standalone `default_dram_perf_lock` to > > protect `default_dram_perf_*`. This approach not only avoids deadlock > > but also prevents holding a large lock simultaneously. > > > > * Upgrade `set_node_memory_tier` to support additional cases, including > > default DRAM, late CPUless, and hot-plugged initializations. > > To cover hot-plugged memory nodes, `mt_calc_adistance()` and > > `mt_find_alloc_memory_type()` are moved into `set_node_memory_tier()` to > > handle cases where memtype is not initialized and where HMAT information is > > available. > > > > * Introduce `default_memory_types` for those memory types that are not > > initialized by device drivers. > > Because late initialized memory and default DRAM memory need to be managed, > > a default memory type is created for storing all memory types that are > > not initialized by device drivers and as a fallback. > > > > Signed-off-by: Ho-Ren (Jack) Chuang > > Signed-off-by: Hao Xiang > > --- > > mm/memory-tiers.c | 73 --- > > 1 file changed, 63 insertions(+), 10 deletions(-) > > > > diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c > > index 974af10cfdd8..9396330fa162 100644 > > --- a/mm/memory-tiers.c > > +++ b/mm/memory-tiers.c > > @@ -36,6 +36,11 @@ struct node_memory_type_map { > > > > static DEFINE_MUTEX(memory_tier_lock); > > static LIST_HEAD(memory_tiers); > > +/* > > + * The list is used to store all memory types that are not created > > + * by a device driver. > > + */ > > +static LIST_HEAD(default_memory_types); > > static struct node_memory_type_map node_memory_types[MAX_NUMNODES]; > > struct memory_dev_type *default_dram_type; > > > > @@ -108,6 +113,7 @@ static struct demotion_nodes *node_demotion > > __read_mostly; > > > > static BLOCKING_NOTIFIER_HEAD(mt_adistance_algorithms); > > > > +static DEFINE_MUTEX(default_dram_perf_lock); > > Better to add comments about what is protected by this lock. > Thank you. I will add a comment like this: + /* The lock is used to protect `default_dram_perf*` info and nid. */ +static DEFINE_MUTEX(default_dram_perf_lock); I also found an error path was not handled and found the lock could be put closer to what it protects. I will have them fixed in V5. > > static bool default_dram_perf_error; > > static struct access_coordinate default_dram_perf; > > static int default_dram_perf_ref_nid = NUMA_NO_NODE; > > @@ -505,7 +511,8 @@ static inline void __init_node_memory_type(int node, > > struct memory_dev_type *mem > > static struct memory_tier *set_node_memory_tier(int node) > > { > > struct memory_tier *memtier; > > - struct memory_dev_type *memtype; > > + struct memory_dev_type *mtype; > > mtype may be referenced without initialization now below. > Good catch! Thank you. Please check below. I may found a potential NULL pointer dereference. > > + int adist = MEMTIER_ADISTANCE_DRAM; > > pg_data_t *pgdat = NODE_DATA(node); > > > > > > @@ -514,11 +521,20 @@ static struct memory_tier *set_node_memory_tier(int > > node) > > if (!node_state(node, N_MEMORY)) > > return ERR_PTR(-EINVAL); > > > > - __init_node_memory_type(node, default_dram_type); > > + mt_calc_adistance(node, ); > > + if (node_memory_types[node].memtype == NULL) { > > + mtype = mt_find_alloc_memory_type(adist, > > _memory_types); > > + if