Re: [PATCH v2] device-dax: use fallback nid when numa node is invalid

2021-09-15 Thread Dan Williams
On Tue, Sep 14, 2021 at 11:51 PM Justin He wrote:
[..]
> > > diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
> > > index fb775b967c52..d3a0cec635b1 100644
> > > --- a/drivers/acpi/nfit/core.c
> > > +++ b/drivers/acpi/nfit/core.c
> > > @@ -3005,15 +3005,8 @@ static int acpi_nfit_register_region(struct
> > > acpi_nfit_desc *acpi_desc,
> > > ndr_desc->res = &res;
> > > ndr_desc->provider_data = nfit_spa;
> > > ndr_desc->attr_groups = acpi_nfit_region_attribute_groups;
> > > -   if (spa->flags & ACPI_NFIT_PROXIMITY_VALID) {
> > > -   ndr_desc->numa_node = acpi_map_pxm_to_online_node(
> > > -   spa->proximity_domain);
> > > -   ndr_desc->target_node = acpi_map_pxm_to_node(
> > > -   spa->proximity_domain);
> > > -   } else {
> > > -   ndr_desc->numa_node = NUMA_NO_NODE;
> > > -   ndr_desc->target_node = NUMA_NO_NODE;
> > > -   }
> > > +   ndr_desc->numa_node = memory_add_physaddr_to_nid(spa->address);
> > > +   ndr_desc->target_node = phys_to_target_node(spa->address);
> > >
> > > /*
> > >  * Persistence domain bits are hierarchical, if
> > > ===
> > >
> > > Do you still suggest fixing like this?
> >
> > Are you saying that ACPI_NFIT_PROXIMITY_VALID is not set on your
> > platform, or that pxm_to_node() returns NUMA_NO_NODE?
> >
> The latter: ACPI_NFIT_PROXIMITY_VALID is *set* in my case.
>
> > I would expect something like this:
> >
> > diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
> > index a3ef6cce644c..95de7dc18ed8 100644
> > --- a/drivers/acpi/nfit/core.c
> > +++ b/drivers/acpi/nfit/core.c
> > @@ -3007,6 +3007,15 @@ static int acpi_nfit_register_region(struct
> > acpi_nfit_desc *acpi_desc,
> > ndr_desc->target_node = NUMA_NO_NODE;
> > }
> >
> > +   /*
> > +* Fallback to address based numa information if node lookup
> > +* failed
> > +*/
> > +   if (ndr_desc->numa_node == NUMA_NO_NODE)
> > +           ndr_desc->numa_node = memory_add_physaddr_to_nid(spa->address);
> > +   if (ndr_desc->target_node == NUMA_NO_NODE)
> > +           ndr_desc->target_node = phys_to_target_node(spa->address);
> > +
>
> Would it be better to add a dev_info() here to report this node id change?

Yes, given all the possibilities here, a dev_info() reporting the
final result of the node mapping is justifiable.
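
A rough user-space sketch of the fallback plus the report agreed on above. The kernel helpers memory_add_physaddr_to_nid() and phys_to_target_node() are replaced by hypothetical mock_* stand-ins that map every address to node 0, struct region_desc is a hypothetical two-field stand-in for the region descriptor, and printf() stands in for dev_info(). This illustrates the idea only and is not the code that was eventually posted or merged.

```c
#include <stdio.h>

#define NUMA_NO_NODE (-1)

/* Hypothetical stand-in for the two node fields of the region descriptor. */
struct region_desc {
	int numa_node;
	int target_node;
};

/* Assumption: mock of memory_add_physaddr_to_nid(); every address maps to node 0. */
static int mock_physaddr_to_nid(unsigned long long addr)
{
	(void)addr;
	return 0;
}

/* Assumption: mock of phys_to_target_node(); every address maps to node 0. */
static int mock_phys_to_target_node(unsigned long long addr)
{
	(void)addr;
	return 0;
}

/*
 * Fill in any node that the firmware lookup left as NUMA_NO_NODE and
 * report the final mapping. Returns how many fields were changed.
 */
static int apply_node_fallback(struct region_desc *desc, unsigned long long addr)
{
	int changed = 0;

	if (desc->numa_node == NUMA_NO_NODE) {
		desc->numa_node = mock_physaddr_to_nid(addr);
		changed++;
	}
	if (desc->target_node == NUMA_NO_NODE) {
		desc->target_node = mock_phys_to_target_node(addr);
		changed++;
	}

	/* printf() stands in for dev_info() in this user-space sketch. */
	printf("region at %#llx: numa_node %d, target_node %d (%d fallback%s)\n",
	       addr, desc->numa_node, desc->target_node, changed,
	       changed == 1 ? "" : "s");
	return changed;
}
```

Reporting the final values of both fields, rather than only the ones that changed, keeps the log line useful whichever of the two lookups failed.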



RE: [PATCH v2] device-dax: use fallback nid when numa node is invalid

2021-09-15 Thread Justin He


Re: [PATCH v2] device-dax: use fallback nid when numa node is invalid

2021-09-14 Thread Dan Williams
On Mon, Sep 13, 2021 at 7:06 PM Justin He wrote:
>
> Hi Dan,
>
> > -Original Message-
> > From: Dan Williams 
> > Sent: Friday, September 10, 2021 11:42 PM
> > To: Justin He 
> > Cc: Vishal Verma; Dave Jiang; David Hildenbrand; Linux NVDIMM;
> > Linux Kernel Mailing List
> > Subject: Re: [PATCH v2] device-dax: use fallback nid when numa node is
> > invalid
> >
> > > On Fri, Sep 10, 2021 at 5:46 AM Jia He wrote:
> > >
> > > Previously, numa_off was set unconditionally in dummy_numa_init()
> > > even with a fake numa node. Then ACPI sets node id as NUMA_NO_NODE(-1)
> > > after acpi_map_pxm_to_node() because it regards numa_off as turning
> > > off the numa node. Hence dev_dax->target_node is NUMA_NO_NODE on
> > > arm64 with fake numa case.
> > >
> > > Without this patch, pmem can't be probed as RAM devices on arm64 if
> > > SRAT table isn't present:
> > > >   $ ndctl create-namespace -fe namespace0.0 --mode=devdax --map=dev -s 1g -a 64K
> > > >   kmem dax0.0: rejecting DAX region [mem 0x24040-0x2bfff] with invalid node: -1
> > >   kmem: probe of dax0.0 failed with error -22
> > >
> > > This fixes it by using fallback memory_add_physaddr_to_nid() as nid.
> > >
> > > Suggested-by: David Hildenbrand 
> > > Signed-off-by: Jia He 
> > > ---
> > > v2: - rebase it based on David's "memory group" patch.
> > > - drop the changes in dev_dax_kmem_remove() since nid had been
> > >   removed in remove_memory().
> > >  drivers/dax/kmem.c | 31 +--
> > >  1 file changed, 17 insertions(+), 14 deletions(-)
> > >
> > > diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> > > index a37622060fff..e4836eb7539e 100644
> > > --- a/drivers/dax/kmem.c
> > > +++ b/drivers/dax/kmem.c
> > > @@ -47,20 +47,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
> > > unsigned long total_len = 0;
> > > struct dax_kmem_data *data;
> > > int i, rc, mapped = 0;
> > > -   int numa_node;
> > > -
> > > -   /*
> > > -* Ensure good NUMA information for the persistent memory.
> > > -* Without this check, there is a risk that slow memory
> > > -* could be mixed in a node with faster memory, causing
> > > -* unavoidable performance issues.
> > > -*/
> > > -   numa_node = dev_dax->target_node;
> > > -   if (numa_node < 0) {
> > > > -   dev_warn(dev, "rejecting DAX region with invalid node: %d\n",
> > > -   numa_node);
> > > -   return -EINVAL;
> > > -   }
> > > +   int numa_node = dev_dax->target_node;
> > >
> > > for (i = 0; i < dev_dax->nr_range; i++) {
> > > struct range range;
> > > @@ -71,6 +58,22 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
> > > i, range.start, range.end);
> > > continue;
> > > }
> > > +
> > > +   /*
> > > > +* Ensure good NUMA information for the persistent memory.
> > > > +* Without this check, there is a risk but not fatal that slow
> > > > +* memory could be mixed in a node with faster memory, causing
> > > > +* unavoidable performance issues. Warn this and use fallback
> > > > +* node id.
> > > > +*/
> > > > +   if (numa_node < 0) {
> > > > +   int new_node = memory_add_physaddr_to_nid(range.start);
> > > > +
> > > > +   dev_info(dev, "changing nid from %d to %d for DAX region [%#llx-%#llx]\n",
> > > > +numa_node, new_node, range.start, range.end);
> > > > +   numa_node = new_node;
> > > > +   }
> > > > +
> > > > total_len += range_len(&range);
> >
> > This fallback change belongs where the parent region for the namespace
> > adopts its target_node, because it's not clear
> > memory_add_physaddr_to_nid() is the right fallback in all situations.
> > Here is where this setting is happening currently

RE: [PATCH v2] device-dax: use fallback nid when numa node is invalid

2021-09-13 Thread Justin He
Hi Dan,

> -Original Message-
> From: Dan Williams 
> Sent: Friday, September 10, 2021 11:42 PM
> To: Justin He 
> Cc: Vishal Verma; Dave Jiang; David Hildenbrand; Linux NVDIMM;
> Linux Kernel Mailing List
> Subject: Re: [PATCH v2] device-dax: use fallback nid when numa node is
> invalid
> 
> On Fri, Sep 10, 2021 at 5:46 AM Jia He wrote:
> >
> > Previously, numa_off was set unconditionally in dummy_numa_init()
> > even with a fake numa node. Then ACPI sets node id as NUMA_NO_NODE(-1)
> > after acpi_map_pxm_to_node() because it regards numa_off as turning
> > off the numa node. Hence dev_dax->target_node is NUMA_NO_NODE on
> > arm64 with fake numa case.
> >
> > Without this patch, pmem can't be probed as RAM devices on arm64 if
> > SRAT table isn't present:
> >   $ ndctl create-namespace -fe namespace0.0 --mode=devdax --map=dev -s 1g -a 64K
> >   kmem dax0.0: rejecting DAX region [mem 0x24040-0x2bfff] with invalid node: -1
> >   kmem: probe of dax0.0 failed with error -22
> >
> > This fixes it by using fallback memory_add_physaddr_to_nid() as nid.
> >
> > Suggested-by: David Hildenbrand 
> > Signed-off-by: Jia He 
> > ---
> > v2: - rebase it based on David's "memory group" patch.
> > - drop the changes in dev_dax_kmem_remove() since nid had been
> >   removed in remove_memory().
> >  drivers/dax/kmem.c | 31 +--
> >  1 file changed, 17 insertions(+), 14 deletions(-)
> >
> > diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> > index a37622060fff..e4836eb7539e 100644
> > --- a/drivers/dax/kmem.c
> > +++ b/drivers/dax/kmem.c
> > @@ -47,20 +47,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
> > unsigned long total_len = 0;
> > struct dax_kmem_data *data;
> > int i, rc, mapped = 0;
> > -   int numa_node;
> > -
> > -   /*
> > -* Ensure good NUMA information for the persistent memory.
> > -* Without this check, there is a risk that slow memory
> > -* could be mixed in a node with faster memory, causing
> > -* unavoidable performance issues.
> > -*/
> > -   numa_node = dev_dax->target_node;
> > -   if (numa_node < 0) {
> > -   dev_warn(dev, "rejecting DAX region with invalid node: %d\n",
> > -   numa_node);
> > -   return -EINVAL;
> > -   }
> > +   int numa_node = dev_dax->target_node;
> >
> > for (i = 0; i < dev_dax->nr_range; i++) {
> > struct range range;
> > @@ -71,6 +58,22 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
> > i, range.start, range.end);
> > continue;
> > }
> > +
> > +   /*
> > +* Ensure good NUMA information for the persistent memory.
> > +* Without this check, there is a risk but not fatal that slow
> > +* memory could be mixed in a node with faster memory, causing
> > +* unavoidable performance issues. Warn this and use fallback
> > +* node id.
> > +*/
> > +   if (numa_node < 0) {
> > +   int new_node = memory_add_physaddr_to_nid(range.start);
> > +
> > +   dev_info(dev, "changing nid from %d to %d for DAX region [%#llx-%#llx]\n",
> > +numa_node, new_node, range.start, range.end);
> > +   numa_node = new_node;
> > +   }
> > +
> > total_len += range_len(&range);
> 
> This fallback change belongs where the parent region for the namespace
> adopts its target_node, because it's not clear
> memory_add_physaddr_to_nid() is the right fallback in all situations.
> Here is where this setting is happening currently:
> 
> drivers/acpi/nfit/core.c:3004:  ndr_desc->target_node = pxm_to_node(spa->proximity_domain);
On my local arm64 guest ('virt' machine type), target_node is set to -1
at this line. That is, the condition "spa->flags & ACPI_NFIT_PROXIMITY_VALID"
is true, yet the node lookup still yields NUMA_NO_NODE.
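
To make the two ways of ending up with NUMA_NO_NODE explicit, here is a small user-space sketch. Everything in it is illustrative: PROXIMITY_VALID is a stand-in bit for ACPI_NFIT_PROXIMITY_VALID, mock_pxm_to_node() is a hypothetical mock of pxm_to_node() that returns NUMA_NO_NODE when NUMA is off (as the patch description says happens on arm64 with a fake node), and the identity pxm-to-node mapping is an assumption made for brevity.

```c
#include <stdbool.h>

#define NUMA_NO_NODE (-1)
/* Illustrative bit standing in for ACPI_NFIT_PROXIMITY_VALID. */
#define PROXIMITY_VALID (1u << 1)

enum node_source {
	NODE_FROM_PXM,      /* flag set and lookup succeeded */
	NODE_PXM_UNMAPPED,  /* flag set, but lookup returned NUMA_NO_NODE */
	NODE_NO_PXM         /* flag not set, no proximity domain to look up */
};

/* Assumption: mock of pxm_to_node(); fails when NUMA is off, else identity. */
static int mock_pxm_to_node(int pxm, bool numa_off)
{
	if (pxm < 0 || numa_off)
		return NUMA_NO_NODE;
	return pxm;
}

/* Classify how the node was (or was not) resolved and store it in *node. */
static enum node_source classify_node(unsigned int flags, int pxm,
				      bool numa_off, int *node)
{
	if (!(flags & PROXIMITY_VALID)) {
		*node = NUMA_NO_NODE;
		return NODE_NO_PXM;
	}

	*node = mock_pxm_to_node(pxm, numa_off);
	return *node == NUMA_NO_NODE ? NODE_PXM_UNMAPPED : NODE_FROM_PXM;
}
```

The case reported here corresponds to NODE_PXM_UNMAPPED: the flag is set, but the lookup itself comes back empty.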

> drivers/acpi/nfit/core.c:3007:  ndr_desc->target_node = NUMA_NO_NODE;
> drivers/nvdimm/e820.c:29:   ndr_desc.target_node = nid;
> drivers/nvdimm/of_pmem.c:58:ndr_desc.target_node = ndr_desc.numa_node;
>

Re: [PATCH v2] device-dax: use fallback nid when numa node is invalid

2021-09-10 Thread Dan Williams
On Fri, Sep 10, 2021 at 5:46 AM Jia He wrote:
>
> Previously, numa_off was set unconditionally in dummy_numa_init()
> even with a fake numa node. Then ACPI sets node id as NUMA_NO_NODE(-1)
> after acpi_map_pxm_to_node() because it regards numa_off as turning
> off the numa node. Hence dev_dax->target_node is NUMA_NO_NODE on
> arm64 with fake numa case.
>
> Without this patch, pmem can't be probed as RAM devices on arm64 if
> SRAT table isn't present:
>   $ ndctl create-namespace -fe namespace0.0 --mode=devdax --map=dev -s 1g -a 64K
>   kmem dax0.0: rejecting DAX region [mem 0x24040-0x2bfff] with invalid node: -1
>   kmem: probe of dax0.0 failed with error -22
>
> This fixes it by using fallback memory_add_physaddr_to_nid() as nid.
>
> Suggested-by: David Hildenbrand 
> Signed-off-by: Jia He 
> ---
> v2: - rebase it based on David's "memory group" patch.
> - drop the changes in dev_dax_kmem_remove() since nid had been
>   removed in remove_memory().
>  drivers/dax/kmem.c | 31 +--
>  1 file changed, 17 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> index a37622060fff..e4836eb7539e 100644
> --- a/drivers/dax/kmem.c
> +++ b/drivers/dax/kmem.c
> @@ -47,20 +47,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
> unsigned long total_len = 0;
> struct dax_kmem_data *data;
> int i, rc, mapped = 0;
> -   int numa_node;
> -
> -   /*
> -* Ensure good NUMA information for the persistent memory.
> -* Without this check, there is a risk that slow memory
> -* could be mixed in a node with faster memory, causing
> -* unavoidable performance issues.
> -*/
> -   numa_node = dev_dax->target_node;
> -   if (numa_node < 0) {
> -   dev_warn(dev, "rejecting DAX region with invalid node: %d\n",
> -   numa_node);
> -   return -EINVAL;
> -   }
> +   int numa_node = dev_dax->target_node;
>
> for (i = 0; i < dev_dax->nr_range; i++) {
> struct range range;
> @@ -71,6 +58,22 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
> i, range.start, range.end);
> continue;
> }
> +
> +   /*
> +* Ensure good NUMA information for the persistent memory.
> +* Without this check, there is a risk but not fatal that slow
> +* memory could be mixed in a node with faster memory, causing
> +* unavoidable performance issues. Warn this and use fallback
> +* node id.
> +*/
> +   if (numa_node < 0) {
> +   int new_node = memory_add_physaddr_to_nid(range.start);
> +
> +   dev_info(dev, "changing nid from %d to %d for DAX region [%#llx-%#llx]\n",
> +numa_node, new_node, range.start, range.end);
> +   numa_node = new_node;
> +   }
> +
> total_len += range_len(&range);

This fallback change belongs where the parent region for the namespace
adopts its target_node, because it's not clear
memory_add_physaddr_to_nid() is the right fallback in all situations.
Here is where this setting is happening currently:

drivers/acpi/nfit/core.c:3004:  ndr_desc->target_node = pxm_to_node(spa->proximity_domain);
drivers/acpi/nfit/core.c:3007:  ndr_desc->target_node = NUMA_NO_NODE;
drivers/nvdimm/e820.c:29:   ndr_desc.target_node = nid;
drivers/nvdimm/of_pmem.c:58:ndr_desc.target_node = ndr_desc.numa_node;
drivers/nvdimm/region_devs.c:1127:  nd_region->target_node = ndr_desc->target_node;

...where is this pmem region originating on this arm64 platform?
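
The layering argument can be sketched in user-space C as follows. This is only an illustration under stated assumptions: struct region, region_adopt_target_node(), consumer_probe() and fallback_node0() are hypothetical names, the fallback callback models "each provider picks the fallback that is right for it", and the consumer keeps the strict check that kmem has today.

```c
#include <errno.h>

#define NUMA_NO_NODE (-1)

/* Hypothetical stand-in for the region that namespaces inherit from. */
struct region {
	int target_node;
};

/* Provider-specific fallback: maps a physical address to a node id. */
typedef int (*node_fallback_fn)(unsigned long long addr);

/* Hypothetical provider fallback: pretend every address is on node 0. */
static int fallback_node0(unsigned long long addr)
{
	(void)addr;
	return 0;
}

/*
 * Resolve the node once, where the region adopts it. The provider may
 * pass no fallback at all if none is appropriate for its platform.
 */
static void region_adopt_target_node(struct region *region, int firmware_node,
				     unsigned long long addr,
				     node_fallback_fn fallback)
{
	region->target_node = firmware_node;
	if (region->target_node == NUMA_NO_NODE && fallback)
		region->target_node = fallback(addr);
}

/* The consumer stays strict: an unresolved node is still rejected. */
static int consumer_probe(const struct region *region)
{
	if (region->target_node < 0)
		return -EINVAL;
	return 0;
}
```

With this split, a provider that knows an address-based lookup is valid supplies it, while a consumer never has to guess on the provider's behalf.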