Re: [PATCH 0/3] mm: use memmap_on_memory semantics for dax/kmem

2023-06-21 Thread Verma, Vishal L
On Fri, 2023-06-16 at 09:44 +0200, David Hildenbrand wrote:
> On 16.06.23 00:00, Vishal Verma wrote:
> > The dax/kmem driver can potentially hot-add large amounts of memory
> > originating from CXL memory expanders, or NVDIMMs, or other 'device
> > memories'. There is a chance there isn't enough regular system memory
> > available to fit the memmap for this new memory. It's therefore
> > desirable, if all other conditions are met, for the kmem managed memory
> > to place its memmap on the newly added memory itself.
> > 
> > Arrange for this by first allowing for a module parameter override for
> > the mhp_supports_memmap_on_memory() test using a flag, adjusting the
> > only other caller of this interface in drivers/acpi/acpi_memhotplug.c,
> > exporting the symbol so it can be called by kmem.c, and finally changing
> > the kmem driver to add_memory() in chunks of memory_block_size_bytes().
> 
> 1) Why is the override a requirement here? Just let the admin configure
> it, then add conditional support for kmem.

Configure it the current way, using the module parameter to
memory_hotplug? The whole module param check feels a bit awkward,
especially since memory_hotplug is built-in: the only way to supply the
param is on the kernel command line, as opposed to a modprobe config.
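(That is, something like the following on the kernel command line, since
a modprobe.d entry won't apply to a built-in; memmap_on_memory is the
existing boolean module parameter of memory_hotplug:)

	memory_hotplug.memmap_on_memory=1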

The goal with extending mhp_supports_memmap_on_memory() to check for
support with or without consideration for the module param is that it
makes it a bit more flexible for callers like kmem.
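A minimal sketch of the shape this could take (the extra parameter and
the flag check here are illustrative, not the actual interface in the
series):

	/*
	 * Illustrative sketch only -- the second parameter and the flag
	 * check are hypothetical, not the actual patch.
	 */
	bool mhp_supports_memmap_on_memory(unsigned long size, mhp_t mhp_flags)
	{
		/* caller opt-in (e.g. kmem) or the module parameter default */
		bool want = memmap_on_memory ||
			    (mhp_flags & MHP_MEMMAP_ON_MEMORY);

		/* structural checks still apply regardless of the override */
		return want && IS_ENABLED(CONFIG_MHP_MEMMAP_ON_MEMORY) &&
		       size == memory_block_size_bytes();
	}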

> 2) I recall that there are cases where we don't want the memmap to land 
> on slow memory (which online_movable would achieve). Just imagine the
> slow PMEM case. So this might need another configuration knob on the 
> kmem side.
> 
> I recall some discussions on doing that chunk handling internally (so
> kmem can just perform one add_memory() and we'd split that up internally).
> 
Another config knob isn't unreasonable - though the thinking in making
this behavior the new default policy was that with CXL-based memory
expanders, the performance delta from main memory wouldn't be as big as
the pmem-to-main-memory delta. With pmem devices being phased out, it's
not clear we'd need a knob, and one can always be added if it becomes
necessary.

The other comments about doing the per-memblock loop internally, and
fixing up the removal paths all sound good, and I've started reworking
those - thanks for taking a look!
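For reference, the per-memblock loop on the kmem side looks roughly like
the sketch below (names approximate the upstream driver and are not the
final patch; the variant David mentions would move this loop inside
add_memory() itself):

	/*
	 * Rough sketch, not the actual series: hot-add the dax range one
	 * memory block at a time, so each block can carry its own memmap
	 * via MHP_MEMMAP_ON_MEMORY.
	 */
	unsigned long block_size = memory_block_size_bytes();
	u64 offset;
	int rc;

	for (offset = 0; offset < total_len; offset += block_size) {
		rc = add_memory_driver_managed(numa_node,
					       range.start + offset,
					       block_size, kmem_name,
					       MHP_MEMMAP_ON_MEMORY);
		if (rc)
			break;
	}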


[PATCH v3] dax/kmem: Pass valid argument to memory_group_register_static

2023-06-21 Thread Tarun Sahu
memory_group_register_static takes the maximum number of pages as its
argument, while dev_dax_kmem_probe passes total_len (in bytes).

IIUC, there is no crash/panic impact as such, since
memory_group_register_static just sets the max_pages limit, which is
used in auto_movable_zone_for_pfn to determine the zone.

This might cause the following conditions to behave differently.

This will always be true, so the jump to kernel_zone will happen:
...
if (!auto_movable_can_online_movable(NUMA_NO_NODE, group, nr_pages))
	goto kernel_zone;

...
kernel_zone:
	return default_kernel_zone_for_pfn(nid, pfn, nr_pages);

Here, in the code below, the range compared by zone_intersects will be
larger, since nr_pages will be higher (derived from the total_len passed
in dev_dax_kmem_probe):

...
static struct zone *default_kernel_zone_for_pfn(int nid, unsigned long start_pfn,
		unsigned long nr_pages)
{
	struct pglist_data *pgdat = NODE_DATA(nid);
	int zid;

	for (zid = 0; zid < ZONE_NORMAL; zid++) {
		struct zone *zone = &pgdat->node_zones[zid];

		if (zone_intersects(zone, start_pfn, nr_pages))
			return zone;
	}

	return &pgdat->node_zones[ZONE_NORMAL];
}

An incorrect zone may be returned here, which might cause bigger
problems later.
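For concreteness (assuming 4 KiB pages for the arithmetic below),
passing bytes where pages are expected overstates max_pages by a factor
of PAGE_SIZE:

	/*
	 * Illustrative arithmetic, assuming PAGE_SIZE == 4096: a 16 GiB
	 * dax region has total_len = 16 << 30 = 17179869184 bytes.
	 *
	 * Passed as-is:  max_pages = 17179869184  (PAGE_SIZE times too large)
	 * With the fix:  max_pages = PFN_UP(17179869184) = 4194304 pages
	 *
	 * PFN_UP() per include/linux/pfn.h:
	 *	#define PFN_UP(x)  (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)
	 */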

Fixes: eedf634aac3b ("dax/kmem: use a single static memory group for a single probed unit")
Signed-off-by: Tarun Sahu 
Reviewed-by: Vishal Verma 
---
V3<-V2
1. Removed skip characters.

V2<-V1
1. Added more details to commit message


 drivers/dax/kmem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 7b36db6f1cbd..898ca9505754 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -99,7 +99,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
if (!data->res_name)
goto err_res_name;
 
-   rc = memory_group_register_static(numa_node, total_len);
+   rc = memory_group_register_static(numa_node, PFN_UP(total_len));
if (rc < 0)
goto err_reg_mgid;
data->mgid = rc;
-- 
2.31.1




Re: [PATCH][next] fsdax: remove redundant variable 'error'

2023-06-21 Thread Jan Kara
On Wed 21-06-23 14:02:56, Colin Ian King wrote:
> The variable 'error' is being assigned a value that is never read,
> the assignment and the variable are redundant and can be removed.
> Cleans up clang scan build warning:
> 
> fs/dax.c:1880:10: warning: Although the value stored to 'error' is
> used in the enclosing expression, the value is never actually read
> from 'error' [deadcode.DeadStores]
> 
> Signed-off-by: Colin Ian King 

Yeah, good spotting. Feel free to add:

Reviewed-by: Jan Kara 

Honza

> ---
>  fs/dax.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index 2ababb89918d..cb36c6746fc4 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -1830,7 +1830,6 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
>   vm_fault_t ret = VM_FAULT_FALLBACK;
>   pgoff_t max_pgoff;
>   void *entry;
> - int error;
>  
>   if (vmf->flags & FAULT_FLAG_WRITE)
>   iter.flags |= IOMAP_WRITE;
> @@ -1877,7 +1876,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
>   }
>  
>   iter.pos = (loff_t)xas.xa_index << PAGE_SHIFT;
> - while ((error = iomap_iter(&iter, ops)) > 0) {
> + while (iomap_iter(&iter, ops) > 0) {
>   if (iomap_length() < PMD_SIZE)
>   continue; /* actually breaks out of the loop */
>  
> -- 
> 2.39.2
> 
-- 
Jan Kara 
SUSE Labs, CR
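For context, iomap_iter() returns a positive value to ask the caller to
process the current mapping and iterate again, 0 when the walk is
complete, and a negative errno on failure. A minimal sketch of the
canonical loop shape (simplified; process_one() is a placeholder):

	/*
	 * Sketch of the usual iomap_iter() loop.  In the PMD fault path
	 * the return value only drives loop control, so storing it in a
	 * local that is never read afterwards was dead code.
	 */
	int status;

	while ((status = iomap_iter(&iter, ops)) > 0)
		iter.processed = process_one(&iter);
	/* status is 0 on completion or a negative errno */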



[PATCH][next] fsdax: remove redundant variable 'error'

2023-06-21 Thread Colin Ian King
The variable 'error' is being assigned a value that is never read,
the assignment and the variable are redundant and can be removed.
Cleans up clang scan build warning:

fs/dax.c:1880:10: warning: Although the value stored to 'error' is
used in the enclosing expression, the value is never actually read
from 'error' [deadcode.DeadStores]

Signed-off-by: Colin Ian King 
---
 fs/dax.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 2ababb89918d..cb36c6746fc4 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1830,7 +1830,6 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
vm_fault_t ret = VM_FAULT_FALLBACK;
pgoff_t max_pgoff;
void *entry;
-   int error;
 
if (vmf->flags & FAULT_FLAG_WRITE)
iter.flags |= IOMAP_WRITE;
@@ -1877,7 +1876,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
}
 
iter.pos = (loff_t)xas.xa_index << PAGE_SHIFT;
-   while ((error = iomap_iter(&iter, ops)) > 0) {
+   while (iomap_iter(&iter, ops) > 0) {
if (iomap_length() < PMD_SIZE)
continue; /* actually breaks out of the loop */
 
-- 
2.39.2




[PATCH v2] dax/kmem: Pass valid argument to memory_group_register_static

2023-06-21 Thread Tarun Sahu
memory_group_register_static takes the maximum number of pages as its
argument, while dev_dax_kmem_probe passes total_len (in bytes).

IIUC, there is no crash/panic impact as such, since
memory_group_register_static just sets the max_pages limit, which is
used in auto_movable_zone_for_pfn to determine the zone.

This might cause the following conditions to behave differently.

This will always be true, so the jump to kernel_zone will happen:
if (!auto_movable_can_online_movable(NUMA_NO_NODE, group, nr_pages))
	goto kernel_zone;
---
kernel_zone:
	return default_kernel_zone_for_pfn(nid, pfn, nr_pages);

---

Here, in the code below, the range compared by zone_intersects will be
larger, since nr_pages will be higher (derived from the total_len passed
in dev_dax_kmem_probe):

static struct zone *default_kernel_zone_for_pfn(int nid, unsigned long start_pfn,
		unsigned long nr_pages)
{
	struct pglist_data *pgdat = NODE_DATA(nid);
	int zid;

	for (zid = 0; zid < ZONE_NORMAL; zid++) {
		struct zone *zone = &pgdat->node_zones[zid];

		if (zone_intersects(zone, start_pfn, nr_pages))
			return zone;
	}

	return &pgdat->node_zones[ZONE_NORMAL];
}

An incorrect zone may be returned here, which might cause bigger
problems later.
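For reference, zone_intersects() treats nr_pages as a pfn count, so an
inflated value makes spurious overlap far more likely. A sketch of its
check, following the include/linux/mmzone.h semantics:

	/*
	 * True if [start_pfn, start_pfn + nr_pages) overlaps the zone's
	 * spanned range.  With bytes passed where pages are expected,
	 * the tested range is PAGE_SIZE times too large.
	 */
	static inline bool zone_intersects(struct zone *zone,
			unsigned long start_pfn, unsigned long nr_pages)
	{
		if (zone_is_empty(zone))
			return false;
		if (start_pfn >= zone_end_pfn(zone) ||
		    start_pfn + nr_pages <= zone->zone_start_pfn)
			return false;

		return true;
	}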

Fixes: eedf634aac3b ("dax/kmem: use a single static memory group for a single probed unit")
Signed-off-by: Tarun Sahu 
Reviewed-by: Vishal Verma 
---
 drivers/dax/kmem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 7b36db6f1cbd..898ca9505754 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -99,7 +99,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
if (!data->res_name)
goto err_res_name;
 
-   rc = memory_group_register_static(numa_node, total_len);
+   rc = memory_group_register_static(numa_node, PFN_UP(total_len));
if (rc < 0)
goto err_reg_mgid;
data->mgid = rc;
-- 
2.31.1




Re: [PATCH] dax/kmem: Pass valid argument to memory_group_register_static

2023-06-21 Thread Verma, Vishal L
On Wed, 2023-06-21 at 11:36 +0530, Tarun Sahu wrote:
> Hi Alison,
> 
> Alison Schofield  writes:
> 
> > On Tue, Jun 20, 2023 at 07:33:32PM +0530, Tarun Sahu wrote:
> > > memory_group_register_static takes the maximum number of pages as its
> > > argument, while dev_dax_kmem_probe passes total_len (in bytes).
> > 
> > This sounds like a fix. An explanation of the impact and a fixes tag
> > may be needed. Also, wondering how you found it.
> > 
> Yes, it is a fix; I found it during a dry code walk-through.
> There is no impact as such, since memory_group_register_static just
> sets the max_pages limit, which is used in auto_movable_zone_for_pfn
> to determine the zone.
> 
> This might cause the following conditions to behave differently.
> 
> This will always be true, so the jump to kernel_zone will happen:
> if (!auto_movable_can_online_movable(NUMA_NO_NODE, group, nr_pages))
> 	goto kernel_zone;
> ---
> kernel_zone:
> 	return default_kernel_zone_for_pfn(nid, pfn, nr_pages);
> 
> ---
> 
> Here, in the code below, the range compared by zone_intersects will be
> larger, since nr_pages will be higher (derived from the total_len passed
> in dev_dax_kmem_probe):
> 
> static struct zone *default_kernel_zone_for_pfn(int nid, unsigned long start_pfn,
> 		unsigned long nr_pages)
> {
> 	struct pglist_data *pgdat = NODE_DATA(nid);
> 	int zid;
> 
> 	for (zid = 0; zid < ZONE_NORMAL; zid++) {
> 		struct zone *zone = &pgdat->node_zones[zid];
> 
> 		if (zone_intersects(zone, start_pfn, nr_pages))
> 			return zone;
> 	}
> 
> 	return &pgdat->node_zones[ZONE_NORMAL];
> }
> 
> In most cases, ZONE_NORMAL will be returned. But there are no
> crash/panic issues involved here; only the decision on zone selection
> is affected.
> 

Hi Tarun,

Good find! With a Fixes tag, and perhaps inclusion of a bit more of
this detail described in the commit message, feel free to add:

Reviewed-by: Vishal Verma 


Re: [PATCH] dax/kmem: Pass valid argument to memory_group_register_static

2023-06-21 Thread Tarun Sahu
Hi Alison,

Alison Schofield  writes:

> On Tue, Jun 20, 2023 at 07:33:32PM +0530, Tarun Sahu wrote:
>> memory_group_register_static takes the maximum number of pages as its
>> argument, while dev_dax_kmem_probe passes total_len (in bytes).
>
> This sounds like a fix. An explanation of the impact and a fixes tag
> may be needed. Also, wondering how you found it.
>
Yes, it is a fix; I found it during a dry code walk-through.
There is no impact as such, since memory_group_register_static just
sets the max_pages limit, which is used in auto_movable_zone_for_pfn
to determine the zone.

This might cause the following conditions to behave differently.

This will always be true, so the jump to kernel_zone will happen:
if (!auto_movable_can_online_movable(NUMA_NO_NODE, group, nr_pages))
	goto kernel_zone;
---
kernel_zone:
	return default_kernel_zone_for_pfn(nid, pfn, nr_pages);

---

Here, in the code below, the range compared by zone_intersects will be
larger, since nr_pages will be higher (derived from the total_len passed
in dev_dax_kmem_probe):

static struct zone *default_kernel_zone_for_pfn(int nid, unsigned long start_pfn,
		unsigned long nr_pages)
{
	struct pglist_data *pgdat = NODE_DATA(nid);
	int zid;

	for (zid = 0; zid < ZONE_NORMAL; zid++) {
		struct zone *zone = &pgdat->node_zones[zid];

		if (zone_intersects(zone, start_pfn, nr_pages))
			return zone;
	}

	return &pgdat->node_zones[ZONE_NORMAL];
}

In most cases, ZONE_NORMAL will be returned. But there are no
crash/panic issues involved here; only the decision on zone selection
is affected.



> Alison
>
>> 
>> Signed-off-by: Tarun Sahu 
>> ---
>>  drivers/dax/kmem.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
>> index 7b36db6f1cbd..898ca9505754 100644
>> --- a/drivers/dax/kmem.c
>> +++ b/drivers/dax/kmem.c
>> @@ -99,7 +99,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
>>  if (!data->res_name)
>>  goto err_res_name;
>>  
>> -rc = memory_group_register_static(numa_node, total_len);
>> +rc = memory_group_register_static(numa_node, PFN_UP(total_len));
>>  if (rc < 0)
>>  goto err_reg_mgid;
>>  data->mgid = rc;
>> -- 
>> 2.31.1
>>