Re: [PATCH 0/3] mm: use memmap_on_memory semantics for dax/kmem
On Fri, 2023-06-16 at 09:44 +0200, David Hildenbrand wrote:
> On 16.06.23 00:00, Vishal Verma wrote:
> > The dax/kmem driver can potentially hot-add large amounts of memory
> > originating from CXL memory expanders, or NVDIMMs, or other 'device
> > memories'. There is a chance there isn't enough regular system memory
> > available to fit the memmap for this new memory. It's therefore
> > desirable, if all other conditions are met, for the kmem managed memory
> > to place its memmap on the newly added memory itself.
> >
> > Arrange for this by first allowing for a module parameter override for
> > the mhp_supports_memmap_on_memory() test using a flag, adjusting the
> > only other caller of this interface in drivers/acpi/acpi_memhotplug.c,
> > exporting the symbol so it can be called by kmem.c, and finally changing
> > the kmem driver to add_memory() in chunks of memory_block_size_bytes().
>
> 1) Why is the override a requirement here? Just let the admin configure
> it, then add conditional support for kmem.

Configure it in the current way, using the module parameter to
memory_hotplug? The whole module param check feels a bit awkward,
especially since memory_hotplug is builtin; the only way to supply the
param is on the kernel command line, as opposed to a modprobe config.
The goal with extending mhp_supports_memmap_on_memory() to check for
support with or without consideration for the module param is that it
makes it a bit more flexible for callers like kmem.

> 2) I recall that there are cases where we don't want the memmap to land
> on slow memory (which online_movable would achieve). Just imagine the
> slow PMEM case. So this might need another configuration knob on the
> kmem side.
>
> I recall some discussions on doing that chunk handling internally (so
> kmem can just perform one add_memory() and we'd split that up
> internally).
Another config knob isn't unreasonable - though the thinking in making
this behavior the new default policy was that with CXL based memory
expanders, the performance delta from main memory wouldn't be as big as
the pmem - main memory delta. With pmem devices being phased out, it's
not clear we'd need a knob, and it can always be added if it ends up
becoming necessary.

The other comments about doing the per-memblock loop internally, and
fixing up the removal paths, all sound good, and I've started reworking
those - thanks for taking a look!
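The per-memblock loop discussed above can be sketched in userspace. This is a hypothetical illustration, not the kernel API: `MEMORY_BLOCK_SIZE` stands in for `memory_block_size_bytes()` (128 MiB is a common x86-64 value), and `chunks_for_range()` just counts how many `add_memory()` calls a range would take when hot-added one memory block at a time:

```c
#include <assert.h>

/* Hypothetical stand-in for memory_block_size_bytes(); 128 MiB is a
 * common x86-64 value, used here purely for illustration. */
#define MEMORY_BLOCK_SIZE (128ULL << 20)

/* Count how many add_memory() calls a range of 'len' bytes would take
 * when hot-added one memory block at a time. */
static unsigned long long chunks_for_range(unsigned long long len)
{
	unsigned long long nblocks = 0;

	for (unsigned long long off = 0; off < len; off += MEMORY_BLOCK_SIZE)
		nblocks++;	/* one add_memory() per block */

	return nblocks;
}
```

Whether this loop lives in the kmem driver or inside the core hotplug code (so kmem performs a single add_memory() call) is exactly the open question in the thread.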
[PATCH v3] dax/kmem: Pass valid argument to memory_group_register_static
memory_group_register_static() takes the maximum number of pages as its
argument, while dev_dax_kmem_probe() passes total_len (in bytes).

IIUC, I don't see any crash/panic impact as such, since
memory_group_register_static() just sets the max_pages limit, which is
used in auto_movable_zone_for_pfn() to determine the zone. It might,
however, cause these conditions to behave differently:

This will always be true, so the jump to kernel_zone will happen:

	if (!auto_movable_can_online_movable(NUMA_NO_NODE, group, nr_pages))
		goto kernel_zone;
	...
kernel_zone:
	return default_kernel_zone_for_pfn(nid, pfn, nr_pages);

Here, in the below, the range zone_intersects() compares will be larger,
as nr_pages will be higher (derived from the total_len passed in
dev_dax_kmem_probe()):

static struct zone *default_kernel_zone_for_pfn(int nid,
						unsigned long start_pfn,
						unsigned long nr_pages)
{
	struct pglist_data *pgdat = NODE_DATA(nid);
	int zid;

	for (zid = 0; zid < ZONE_NORMAL; zid++) {
		struct zone *zone = &pgdat->node_zones[zid];

		if (zone_intersects(zone, start_pfn, nr_pages))
			return zone;
	}

	return &pgdat->node_zones[ZONE_NORMAL];
}

An incorrect zone could be returned here, which might cause a bigger
problem later.

Fixes: eedf634aac3b ("dax/kmem: use a single static memory group for a single probed unit")
Signed-off-by: Tarun Sahu
Reviewed-by: Vishal Verma
---
V3<-V2
1. Removed skip characters.
V2<-V1
1. Added more details to commit message

 drivers/dax/kmem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 7b36db6f1cbd..898ca9505754 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -99,7 +99,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
 	if (!data->res_name)
 		goto err_res_name;

-	rc = memory_group_register_static(numa_node, total_len);
+	rc = memory_group_register_static(numa_node, PFN_UP(total_len));
 	if (rc < 0)
 		goto err_reg_mgid;
 	data->mgid = rc;
--
2.31.1
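To see the magnitude of the bug: passing total_len in bytes inflates max_pages by a factor of PAGE_SIZE. A minimal userspace re-creation of PFN_UP() (assuming 4 KiB pages, as on most configurations) shows the intended bytes-to-pages conversion the patch applies:

```c
#include <assert.h>

/* Userspace re-creation of the kernel's PFN_UP(), for illustration
 * only; a 4 KiB page size is assumed. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Round a byte length up to a whole number of page frames. */
#define PFN_UP(x)  (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)
```

For a 16 GiB dax region, PFN_UP(total_len) yields 4194304 pages, whereas the unconverted byte count would set max_pages 4096 times too large.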
Re: [PATCH][next] fsdax: remove redundant variable 'error'
On Wed 21-06-23 14:02:56, Colin Ian King wrote:
> The variable 'error' is being assigned a value that is never read;
> the assignment and the variable are redundant and can be removed.
> Cleans up clang scan build warning:
>
> fs/dax.c:1880:10: warning: Although the value stored to 'error' is
> used in the enclosing expression, the value is never actually read
> from 'error' [deadcode.DeadStores]
>
> Signed-off-by: Colin Ian King

Yeah, good spotting. Feel free to add:

Reviewed-by: Jan Kara

								Honza

> ---
>  fs/dax.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index 2ababb89918d..cb36c6746fc4 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -1830,7 +1830,6 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
>  	vm_fault_t ret = VM_FAULT_FALLBACK;
>  	pgoff_t max_pgoff;
>  	void *entry;
> -	int error;
>
>  	if (vmf->flags & FAULT_FLAG_WRITE)
>  		iter.flags |= IOMAP_WRITE;
> @@ -1877,7 +1876,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
>  	}
>
>  	iter.pos = (loff_t)xas.xa_index << PAGE_SHIFT;
> -	while ((error = iomap_iter(&iter, ops)) > 0) {
> +	while (iomap_iter(&iter, ops) > 0) {
>  		if (iomap_length(&iter) < PMD_SIZE)
>  			continue; /* actually breaks out of the loop */
>
> --
> 2.39.2
>
--
Jan Kara
SUSE Labs, CR
[PATCH][next] fsdax: remove redundant variable 'error'
The variable 'error' is being assigned a value that is never read;
the assignment and the variable are redundant and can be removed.
Cleans up clang scan build warning:

fs/dax.c:1880:10: warning: Although the value stored to 'error' is
used in the enclosing expression, the value is never actually read
from 'error' [deadcode.DeadStores]

Signed-off-by: Colin Ian King
---
 fs/dax.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 2ababb89918d..cb36c6746fc4 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1830,7 +1830,6 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
 	vm_fault_t ret = VM_FAULT_FALLBACK;
 	pgoff_t max_pgoff;
 	void *entry;
-	int error;

 	if (vmf->flags & FAULT_FLAG_WRITE)
 		iter.flags |= IOMAP_WRITE;
@@ -1877,7 +1876,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
 	}

 	iter.pos = (loff_t)xas.xa_index << PAGE_SHIFT;
-	while ((error = iomap_iter(&iter, ops)) > 0) {
+	while (iomap_iter(&iter, ops) > 0) {
 		if (iomap_length(&iter) < PMD_SIZE)
 			continue; /* actually breaks out of the loop */

--
2.39.2
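The pattern clang's DeadStores checker flags here is worth isolating: a value is assigned inside a loop condition but never read afterwards, so the store is dead even though the comparison itself still uses the call's return value. A stand-alone sketch (hypothetical helper names, not the fs/dax.c code):

```c
#include <assert.h>

/* Hypothetical iterator: returns the current value, then counts down. */
static int next_value(int *state)
{
	return (*state)--;
}

/* The loop only needs the comparison; storing the result in a local
 * 'error' variable would be a dead store, which is exactly what the
 * patch removes from dax_iomap_pmd_fault(). */
static int count_positive(int start)
{
	int n = 0;
	int state = start;

	while (next_value(&state) > 0)	/* no 'error =' needed */
		n++;

	return n;
}
```

Dropping the dead store changes no behavior; it only removes a write the compiler would discard anyway and silences the analyzer.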
[PATCH v2] dax/kmem: Pass valid argument to memory_group_register_static
memory_group_register_static() takes the maximum number of pages as its
argument, while dev_dax_kmem_probe() passes total_len (in bytes).

IIUC, I don't see any crash/panic impact as such, since
memory_group_register_static() just sets the max_pages limit, which is
used in auto_movable_zone_for_pfn() to determine the zone. It might,
however, cause these conditions to behave differently:

This will always be true, so the jump to kernel_zone will happen:

	if (!auto_movable_can_online_movable(NUMA_NO_NODE, group, nr_pages))
		goto kernel_zone;
---
kernel_zone:
	return default_kernel_zone_for_pfn(nid, pfn, nr_pages);

---

Here, in the below, the range zone_intersects() compares will be larger,
as nr_pages will be higher (derived from the total_len passed in
dev_dax_kmem_probe()):

static struct zone *default_kernel_zone_for_pfn(int nid,
						unsigned long start_pfn,
						unsigned long nr_pages)
{
	struct pglist_data *pgdat = NODE_DATA(nid);
	int zid;

	for (zid = 0; zid < ZONE_NORMAL; zid++) {
		struct zone *zone = &pgdat->node_zones[zid];

		if (zone_intersects(zone, start_pfn, nr_pages))
			return zone;
	}

	return &pgdat->node_zones[ZONE_NORMAL];
}

An incorrect zone could be returned here, which might cause a bigger
problem later.

Fixes: eedf634aac3b ("dax/kmem: use a single static memory group for a single probed unit")
Signed-off-by: Tarun Sahu
Reviewed-by: Vishal Verma
---
 drivers/dax/kmem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 7b36db6f1cbd..898ca9505754 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -99,7 +99,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
 	if (!data->res_name)
 		goto err_res_name;

-	rc = memory_group_register_static(numa_node, total_len);
+	rc = memory_group_register_static(numa_node, PFN_UP(total_len));
 	if (rc < 0)
 		goto err_reg_mgid;
 	data->mgid = rc;
--
2.31.1
Re: [PATCH] dax/kmem: Pass valid argument to memory_group_register_static
On Wed, 2023-06-21 at 11:36 +0530, Tarun Sahu wrote:
> Hi Alison,
>
> Alison Schofield writes:
>
> > On Tue, Jun 20, 2023 at 07:33:32PM +0530, Tarun Sahu wrote:
> > > memory_group_register_static takes the maximum number of pages as the
> > > argument while dev_dax_kmem_probe passes total_len (in bytes) as the
> > > argument.
> >
> > This sounds like a fix. An explanation of the impact and a fixes tag
> > may be needed. Also, wondering how you found it.
> >
> Yes, it is a fix; I found it during a dry code walk-through.
> There is not any impact as such, as memory_group_register_static just
> sets the max_pages limit, which is used in auto_movable_zone_for_pfn
> to determine the zone.
>
> It might cause these conditions to behave differently:
>
> This will always be true, so the jump to kernel_zone will happen:
> 	if (!auto_movable_can_online_movable(NUMA_NO_NODE, group, nr_pages))
> 		goto kernel_zone;
> ---
> kernel_zone:
> 	return default_kernel_zone_for_pfn(nid, pfn, nr_pages);
>
> ---
>
> Here, in the below, the range zone_intersects compares will be larger,
> as nr_pages will be higher (derived from the total_len passed in
> dev_dax_kmem_probe).
>
> static struct zone *default_kernel_zone_for_pfn(int nid,
> 						unsigned long start_pfn,
> 						unsigned long nr_pages)
> {
> 	struct pglist_data *pgdat = NODE_DATA(nid);
> 	int zid;
>
> 	for (zid = 0; zid < ZONE_NORMAL; zid++) {
> 		struct zone *zone = &pgdat->node_zones[zid];
>
> 		if (zone_intersects(zone, start_pfn, nr_pages))
> 			return zone;
> 	}
>
> 	return &pgdat->node_zones[ZONE_NORMAL];
> }
>
> In most cases, ZONE_NORMAL will be returned. But there are no
> crash/panic issues involved here; only the decision on selecting a
> zone is affected.
>
Hi Tarun,

Good find! With a Fixes tag, and perhaps inclusion of a bit more of this
detail in the commit message, feel free to add:

Reviewed-by: Vishal Verma
Re: [PATCH] dax/kmem: Pass valid argument to memory_group_register_static
Hi Alison,

Alison Schofield writes:

> On Tue, Jun 20, 2023 at 07:33:32PM +0530, Tarun Sahu wrote:
>> memory_group_register_static takes the maximum number of pages as the
>> argument while dev_dax_kmem_probe passes total_len (in bytes) as the
>> argument.
>
> This sounds like a fix. An explanation of the impact and a fixes tag
> may be needed. Also, wondering how you found it.
>
Yes, it is a fix; I found it during a dry code walk-through.
There is not any impact as such, as memory_group_register_static just
sets the max_pages limit, which is used in auto_movable_zone_for_pfn to
determine the zone.

It might cause these conditions to behave differently:

This will always be true, so the jump to kernel_zone will happen:
	if (!auto_movable_can_online_movable(NUMA_NO_NODE, group, nr_pages))
		goto kernel_zone;
---
kernel_zone:
	return default_kernel_zone_for_pfn(nid, pfn, nr_pages);

---

Here, in the below, the range zone_intersects compares will be larger,
as nr_pages will be higher (derived from the total_len passed in
dev_dax_kmem_probe).

static struct zone *default_kernel_zone_for_pfn(int nid,
						unsigned long start_pfn,
						unsigned long nr_pages)
{
	struct pglist_data *pgdat = NODE_DATA(nid);
	int zid;

	for (zid = 0; zid < ZONE_NORMAL; zid++) {
		struct zone *zone = &pgdat->node_zones[zid];

		if (zone_intersects(zone, start_pfn, nr_pages))
			return zone;
	}

	return &pgdat->node_zones[ZONE_NORMAL];
}

In most cases, ZONE_NORMAL will be returned. But there are no
crash/panic issues involved here; only the decision on selecting a zone
is affected.
> Alison
>
>>
>> Signed-off-by: Tarun Sahu
>> ---
>>  drivers/dax/kmem.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
>> index 7b36db6f1cbd..898ca9505754 100644
>> --- a/drivers/dax/kmem.c
>> +++ b/drivers/dax/kmem.c
>> @@ -99,7 +99,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
>>  	if (!data->res_name)
>>  		goto err_res_name;
>>
>> -	rc = memory_group_register_static(numa_node, total_len);
>> +	rc = memory_group_register_static(numa_node, PFN_UP(total_len));
>>  	if (rc < 0)
>>  		goto err_reg_mgid;
>>  	data->mgid = rc;
>> --
>> 2.31.1
>>
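The zone-selection effect Tarun describes can be modeled in a few lines. This is a simplified stand-in for the kernel's zone_intersects() (a half-open interval overlap test, re-created here purely for illustration); it shows how an nr_pages value inflated by PAGE_SIZE widens the compared range and can make a zone beyond the hot-added region appear to intersect it:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of the kernel's zone_intersects(): does the range
 * [start_pfn, start_pfn + nr_pages) overlap the zone span [zs, ze)? */
static bool range_intersects(unsigned long zs, unsigned long ze,
			     unsigned long start_pfn, unsigned long nr_pages)
{
	return start_pfn < ze && start_pfn + nr_pages > zs;
}
```

With the correct page count, a region at PFN 0x2000 spanning 0x100 pages stops well short of a zone at [0x3000, 0x4000); multiply nr_pages by a 4096-byte page size (the bytes-for-pages bug) and the same test reports an intersection, steering default_kernel_zone_for_pfn() to the wrong zone.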