Re: [PATCH v2] device-dax: use fallback nid when numa node is invalid

2021-09-10 Thread Dan Williams
On Fri, Sep 10, 2021 at 5:46 AM Jia He  wrote:
>
> Previously, numa_off was set unconditionally in dummy_numa_init()
> even with a fake numa node. Then ACPI sets node id as NUMA_NO_NODE(-1)
> after acpi_map_pxm_to_node() because it regards numa_off as turning
> off the numa node. Hence dev_dax->target_node is NUMA_NO_NODE on
> arm64 with fake numa case.
>
> Without this patch, pmem can't be probed as RAM devices on arm64 if
> SRAT table isn't present:
>   $ndctl create-namespace -fe namespace0.0 --mode=devdax --map=dev -s 1g -a 64K
>   kmem dax0.0: rejecting DAX region [mem 0x24040-0x2bfff] with invalid node: -1
>   kmem: probe of dax0.0 failed with error -22
>
> This fixes it by using fallback memory_add_physaddr_to_nid() as nid.
>
> Suggested-by: David Hildenbrand 
> Signed-off-by: Jia He 
> ---
> v2: - rebase it based on David's "memory group" patch.
> - drop the changes in dev_dax_kmem_remove() since nid had been
>   removed in remove_memory().
>  drivers/dax/kmem.c | 31 +--
>  1 file changed, 17 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> index a37622060fff..e4836eb7539e 100644
> --- a/drivers/dax/kmem.c
> +++ b/drivers/dax/kmem.c
> @@ -47,20 +47,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
> unsigned long total_len = 0;
> struct dax_kmem_data *data;
> int i, rc, mapped = 0;
> -   int numa_node;
> -
> -   /*
> -* Ensure good NUMA information for the persistent memory.
> -* Without this check, there is a risk that slow memory
> -* could be mixed in a node with faster memory, causing
> -* unavoidable performance issues.
> -*/
> -   numa_node = dev_dax->target_node;
> -   if (numa_node < 0) {
> -   dev_warn(dev, "rejecting DAX region with invalid node: %d\n",
> -   numa_node);
> -   return -EINVAL;
> -   }
> +   int numa_node = dev_dax->target_node;
>
> for (i = 0; i < dev_dax->nr_range; i++) {
> struct range range;
> @@ -71,6 +58,22 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
> i, range.start, range.end);
> continue;
> }
> +
> +   /*
> +* Ensure good NUMA information for the persistent memory.
> +* Without this check, there is a risk but not fatal that slow
> +* memory could be mixed in a node with faster memory, causing
> +* unavoidable performance issues. Warn this and use fallback
> +* node id.
> +*/
> +   if (numa_node < 0) {
> +   int new_node = memory_add_physaddr_to_nid(range.start);
> +
> +   dev_info(dev, "changing nid from %d to %d for DAX region [%#llx-%#llx]\n",
> +numa_node, new_node, range.start, range.end);
> +   numa_node = new_node;
> +   }
> +
> total_len += range_len(&range);

This fallback change belongs where the parent region for the namespace
adopts its target_node, because it's not clear
memory_add_physaddr_to_nid() is the right fallback in all situations.
Here is where this setting is happening currently:

drivers/acpi/nfit/core.c:3004:  ndr_desc->target_node = pxm_to_node(spa->proximity_domain);
drivers/acpi/nfit/core.c:3007:  ndr_desc->target_node = NUMA_NO_NODE;
drivers/nvdimm/e820.c:29:   ndr_desc.target_node = nid;
drivers/nvdimm/of_pmem.c:58:ndr_desc.target_node = ndr_desc.numa_node;
drivers/nvdimm/region_devs.c:1127:  nd_region->target_node = ndr_desc->target_node;

...where is this pmem region originating on this arm64 platform?
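
To make the suggestion above concrete, here is a minimal userspace sketch
(not kernel code) of resolving the node once, at the point where the region
adopts its target_node, so that consumers such as kmem simply inherit the
result. Everything in it is hypothetical and for illustration only:
struct region_desc_model, nid_fallback_fn, fallback_node0() and
region_adopt_target_node() do not exist in the kernel, and fallback_node0()
merely stands in for a provider-chosen fallback such as
memory_add_physaddr_to_nid().

```c
#include <assert.h>
#include <stdint.h>

#define NUMA_NO_NODE (-1)

/* Hypothetical stand-in for a region descriptor: only the fields discussed. */
struct region_desc_model {
	uint64_t start;   /* physical start address of the region */
	int target_node;  /* node reported by firmware, or NUMA_NO_NODE */
};

/* Hypothetical per-provider fallback policy (ACPI NFIT, e820, OF, ...). */
typedef int (*nid_fallback_fn)(uint64_t start);

/* Illustrative fallback only: pretend every address belongs to node 0. */
static int fallback_node0(uint64_t start)
{
	(void)start;
	return 0;
}

/*
 * Resolve the node once, where the region is described. A provider that has
 * no sensible fallback passes a null function pointer and keeps NUMA_NO_NODE.
 */
static int region_adopt_target_node(const struct region_desc_model *desc,
				    nid_fallback_fn fallback)
{
	assert(desc);

	if (desc->target_node != NUMA_NO_NODE)
		return desc->target_node;
	if (fallback)
		return fallback(desc->start);
	return NUMA_NO_NODE;
}
```

The point of the function-pointer argument is that each provider decides
whether a physical-address lookup is an acceptable fallback, instead of the
dax driver assuming so for every platform.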



[PATCH v2] device-dax: use fallback nid when numa node is invalid

2021-09-10 Thread Jia He
Previously, numa_off was set unconditionally in dummy_numa_init()
even with a fake numa node. Then ACPI sets node id as NUMA_NO_NODE(-1)
after acpi_map_pxm_to_node() because it regards numa_off as turning
off the numa node. Hence dev_dax->target_node is NUMA_NO_NODE on
arm64 with fake numa case.

Without this patch, pmem can't be probed as RAM devices on arm64 if
SRAT table isn't present:
  $ndctl create-namespace -fe namespace0.0 --mode=devdax --map=dev -s 1g -a 64K
  kmem dax0.0: rejecting DAX region [mem 0x24040-0x2bfff] with invalid node: -1
  kmem: probe of dax0.0 failed with error -22

This fixes it by using fallback memory_add_physaddr_to_nid() as nid.

Suggested-by: David Hildenbrand 
Signed-off-by: Jia He 
---
v2: - rebase it based on David's "memory group" patch.
- drop the changes in dev_dax_kmem_remove() since nid had been 
  removed in remove_memory().
 drivers/dax/kmem.c | 31 +--
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index a37622060fff..e4836eb7539e 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -47,20 +47,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
unsigned long total_len = 0;
struct dax_kmem_data *data;
int i, rc, mapped = 0;
-   int numa_node;
-
-   /*
-* Ensure good NUMA information for the persistent memory.
-* Without this check, there is a risk that slow memory
-* could be mixed in a node with faster memory, causing
-* unavoidable performance issues.
-*/
-   numa_node = dev_dax->target_node;
-   if (numa_node < 0) {
-   dev_warn(dev, "rejecting DAX region with invalid node: %d\n",
-   numa_node);
-   return -EINVAL;
-   }
+   int numa_node = dev_dax->target_node;
 
for (i = 0; i < dev_dax->nr_range; i++) {
struct range range;
@@ -71,6 +58,22 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
i, range.start, range.end);
continue;
}
+
+   /*
+* Ensure good NUMA information for the persistent memory.
+* Without this check, there is a risk but not fatal that slow
+* memory could be mixed in a node with faster memory, causing
+* unavoidable performance issues. Warn this and use fallback
+* node id.
+*/
+   if (numa_node < 0) {
+   int new_node = memory_add_physaddr_to_nid(range.start);
+
+   dev_info(dev, "changing nid from %d to %d for DAX region [%#llx-%#llx]\n",
+numa_node, new_node, range.start, range.end);
+   numa_node = new_node;
+   }
+
total_len += range_len(&range);
}
 
-- 
2.17.1
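
As a reading aid for the second hunk, the following is a small userspace
model (not kernel code) of the loop as posted. It only illustrates one
property visible in the diff: numa_node is overwritten the first time the
fallback fires, so later ranges keep that node and the lookup is not repeated
for them. struct range_model, model_physaddr_to_nid() and model_probe_nodes()
are hypothetical names, and the 4 GiB split is an invented mapping standing
in for memory_add_physaddr_to_nid().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUMA_NO_NODE (-1)

/* Hypothetical stand-in for the kernel's struct range. */
struct range_model {
	uint64_t start;
	uint64_t end;
};

/* Invented mapping for illustration: below 4 GiB is node 0, the rest node 1. */
static int model_physaddr_to_nid(uint64_t start)
{
	return start < (1ULL << 32) ? 0 : 1;
}

/*
 * Mirrors the shape of the patched loop: the fallback runs only while
 * numa_node is still negative, then the chosen node sticks.
 * nodes_out[i] records the node in effect when range i is processed.
 */
static void model_probe_nodes(int target_node, const struct range_model *ranges,
			      size_t nr_range, int *nodes_out)
{
	int numa_node = target_node;
	size_t i;

	assert(ranges && nodes_out);

	for (i = 0; i < nr_range; i++) {
		if (numa_node < 0)
			numa_node = model_physaddr_to_nid(ranges[i].start);
		nodes_out[i] = numa_node;
	}
}
```

With two ranges that the invented mapping would place on different nodes and
a target_node of NUMA_NO_NODE, both end up on the node chosen for the first
range. A valid target_node is passed through unchanged.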