[RFC PATCH] drivers/dax: Allow to include DEV_DAX_PMEM as builtin

2019-03-29 Thread Aneesh Kumar K.V
This moves the dependency to DEV_DAX_PMEM_COMPAT, so that the compat support
is only allowed when DEV_DAX_PMEM is built as a module.

This allows the new code to be tested easily in an emulation setup, where we
often build things without module support.

Signed-off-by: Aneesh Kumar K.V 
---
 drivers/dax/Kconfig | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
index 5ef624fe3934..e582e088b48c 100644
--- a/drivers/dax/Kconfig
+++ b/drivers/dax/Kconfig
@@ -23,7 +23,6 @@ config DEV_DAX
 config DEV_DAX_PMEM
tristate "PMEM DAX: direct access to persistent memory"
depends on LIBNVDIMM && NVDIMM_DAX && DEV_DAX
-   depends on m # until we can kill DEV_DAX_PMEM_COMPAT
default DEV_DAX
help
  Support raw access to persistent memory.  Note that this
@@ -50,7 +49,7 @@ config DEV_DAX_KMEM
 
 config DEV_DAX_PMEM_COMPAT
tristate "PMEM DAX: support the deprecated /sys/class/dax interface"
-   depends on DEV_DAX_PMEM
+   depends on DEV_DAX_PMEM=m
default DEV_DAX_PMEM
help
  Older versions of the libdaxctl library expect to find all
-- 
2.20.1



[PATCH] mm: Fix modifying of page protection by insert_pfn_pmd()

2019-03-29 Thread Aneesh Kumar K.V
With some architectures, like ppc64, set_pmd_at() cannot cope with a
situation where there is already some (different) valid entry present.

Use pmdp_set_access_flags() instead to modify the pfn; it is built to
deal with modifying existing PMD entries.

This is similar to
commit cae85cb8add3 ("mm/memory.c: fix modifying of page protection by
insert_pfn()")

We also make a similar update for insert_pfn_pud(), even though ppc64
does not support PUD pfn entries at present.

CC: sta...@vger.kernel.org
Signed-off-by: Aneesh Kumar K.V 
---
 mm/huge_memory.c | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 404acdcd0455..f7dca413c4b2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -755,6 +755,20 @@ static void insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
spinlock_t *ptl;
 
ptl = pmd_lock(mm, pmd);
+   if (!pmd_none(*pmd)) {
+   if (write) {
+   if (pmd_pfn(*pmd) != pfn_t_to_pfn(pfn)) {
+   WARN_ON_ONCE(!is_huge_zero_pmd(*pmd));
+   goto out_unlock;
+   }
+   entry = pmd_mkyoung(*pmd);
+   entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
+   if (pmdp_set_access_flags(vma, addr, pmd, entry, 1))
+   update_mmu_cache_pmd(vma, addr, pmd);
+   }
+   goto out_unlock;
+   }
+
entry = pmd_mkhuge(pfn_t_pmd(pfn, prot));
if (pfn_t_devmap(pfn))
entry = pmd_mkdevmap(entry);
@@ -770,6 +784,7 @@ static void insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
 
set_pmd_at(mm, addr, pmd, entry);
update_mmu_cache_pmd(vma, addr, pmd);
+out_unlock:
spin_unlock(ptl);
 }
 
@@ -821,6 +836,20 @@ static void insert_pfn_pud(struct vm_area_struct *vma, unsigned long addr,
spinlock_t *ptl;
 
ptl = pud_lock(mm, pud);
+   if (!pud_none(*pud)) {
+   if (write) {
+   if (pud_pfn(*pud) != pfn_t_to_pfn(pfn)) {
+   WARN_ON_ONCE(!is_huge_zero_pud(*pud));
+   goto out_unlock;
+   }
+   entry = pud_mkyoung(*pud);
+   entry = maybe_pud_mkwrite(pud_mkdirty(entry), vma);
+   if (pudp_set_access_flags(vma, addr, pud, entry, 1))
+   update_mmu_cache_pud(vma, addr, pud);
+   }
+   goto out_unlock;
+   }
+
entry = pud_mkhuge(pfn_t_pud(pfn, prot));
if (pfn_t_devmap(pfn))
entry = pud_mkdevmap(entry);
@@ -830,6 +859,8 @@ static void insert_pfn_pud(struct vm_area_struct *vma, unsigned long addr,
}
set_pud_at(mm, addr, pud, entry);
update_mmu_cache_pud(vma, addr, pud);
+
+out_unlock:
spin_unlock(ptl);
 }
 
-- 
2.20.1



Re: [PATCH 5/6] pci/p2pdma: Track pgmap references per resource, not globally

2019-03-29 Thread Dan Williams
On Fri, Mar 29, 2019 at 10:50 AM Logan Gunthorpe  wrote:
>
> Thanks Dan, this is great. I think the changes in this series are
> cleaner and more understandable than the patch set I had sent earlier.
>
> However, I found a couple minor issues with this patch:
>
> On 2019-03-29 9:27 a.m., Dan Williams wrote:
> >  static void pci_p2pdma_release(void *data)
> >  {
> >   struct pci_dev *pdev = data;
> > @@ -103,12 +110,12 @@ static void pci_p2pdma_release(void *data)
> >   if (!pdev->p2pdma)
> >   return;
> >
> > - wait_for_completion(>p2pdma->devmap_ref_done);
> > - percpu_ref_exit(>p2pdma->devmap_ref);
> > + /* Flush and disable pci_alloc_p2p_mem() */
> > + pdev->p2pdma = NULL;
> > + synchronize_rcu();
> >
> >   gen_pool_destroy(pdev->p2pdma->pool);
>
> I missed this on my initial review, but it became obvious when I tried
> to test the series: this is a NULL dereference seeing pdev->p2pdma was
> set to NULL a few lines up.

Ah, yup.

> When I fix this by storing p2pdma in a local variable, the patch set
> works and never seems to crash when I hot remove p2pdma memory.

Great!

>
> >  void *pci_alloc_p2pmem(struct pci_dev *pdev, size_t size)
> >  {
> > - void *ret;
> > + void *ret = NULL;
> > + struct percpu_ref *ref;
> >
> > + rcu_read_lock();
> >   if (unlikely(!pdev->p2pdma))
> > - return NULL;
>
> Using RCU here makes sense to me, however I expect we should be using
> the proper rcu_assign_pointer(), rcu_dereference() and __rcu tag with
> pdev->p2pdma. If only to better document what's being protected with the
> new RCU calls.

I think just add a comment because those helpers are for cases where
the rcu protected pointer is allowed to race the teardown. In this
case we're using rcu just as a barrier to force the NULL check to
resolve.
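
To make that concrete, here is a minimal sketch of the pattern being
described, folding in Logan's fix of reading pdev->p2pdma into a local
variable before the NULL check. Illustrative only; it simplifies away the
per-chunk reference handling added later in the series:

void *pci_alloc_p2pmem(struct pci_dev *pdev, size_t size)
{
	struct pci_p2pdma *p2pdma;
	void *ret = NULL;

	rcu_read_lock();
	/*
	 * rcu is used purely as a barrier here: pci_p2pdma_release() sets
	 * pdev->p2pdma to NULL and then calls synchronize_rcu(), so after
	 * that returns no allocation can still see a stale non-NULL pointer.
	 */
	p2pdma = pdev->p2pdma;
	if (unlikely(!p2pdma))
		goto out;

	ret = (void *)gen_pool_alloc(p2pdma->pool, size);
out:
	rcu_read_unlock();
	return ret;
}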


Re: [PATCH 5/6] pci/p2pdma: Track pgmap references per resource, not globally

2019-03-29 Thread Logan Gunthorpe
Thanks Dan, this is great. I think the changes in this series are
cleaner and more understandable than the patch set I had sent earlier.

However, I found a couple minor issues with this patch:

On 2019-03-29 9:27 a.m., Dan Williams wrote:
>  static void pci_p2pdma_release(void *data)
>  {
>   struct pci_dev *pdev = data;
> @@ -103,12 +110,12 @@ static void pci_p2pdma_release(void *data)
>   if (!pdev->p2pdma)
>   return;
>  
> - wait_for_completion(>p2pdma->devmap_ref_done);
> - percpu_ref_exit(>p2pdma->devmap_ref);
> + /* Flush and disable pci_alloc_p2p_mem() */
> + pdev->p2pdma = NULL;
> + synchronize_rcu();
>  
>   gen_pool_destroy(pdev->p2pdma->pool);

I missed this on my initial review, but it became obvious when I tried
to test the series: this is a NULL dereference, since pdev->p2pdma was
set to NULL a few lines up.

When I fix this by storing p2pdma in a local variable, the patch set
works and never seems to crash when I hot remove p2pdma memory.

>  void *pci_alloc_p2pmem(struct pci_dev *pdev, size_t size)
>  {
> - void *ret;
> + void *ret = NULL;
> + struct percpu_ref *ref;
>  
> + rcu_read_lock();
>   if (unlikely(!pdev->p2pdma))
> - return NULL;

Using RCU here makes sense to me, however I expect we should be using
the proper rcu_assign_pointer(), rcu_dereference() and __rcu tag with
pdev->p2pdma. If only to better document what's being protected with the
new RCU calls.
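
For reference, a rough sketch of what that annotation could look like
(purely illustrative, not code from the posted series):

/* in struct pci_dev */
struct pci_p2pdma __rcu *p2pdma;

/* publish, once the pool and sysfs group are set up */
rcu_assign_pointer(pdev->p2pdma, p2p);

/* teardown in pci_p2pdma_release() */
p2pdma = rcu_dereference_protected(pdev->p2pdma, 1);
RCU_INIT_POINTER(pdev->p2pdma, NULL);
synchronize_rcu();
gen_pool_destroy(p2pdma->pool);

/* readers such as pci_alloc_p2pmem() */
rcu_read_lock();
p2pdma = rcu_dereference(pdev->p2pdma);
if (p2pdma)
	addr = gen_pool_alloc(p2pdma->pool, size);
rcu_read_unlock();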

Logan


Re: [PATCH 6/6] mm/devm_memremap_pages: Fix final page put race

2019-03-29 Thread Ira Weiny
On Fri, Mar 29, 2019 at 08:27:55AM -0700, Dan Williams wrote:
> Logan noticed that devm_memremap_pages_release() kills the percpu_ref
> drops all the page references that were acquired at init and then
> immediately proceeds to unplug, arch_remove_memory(), the backing pages
> for the pagemap. If for some reason device shutdown actually collides
> with a busy / elevated-ref-count page then arch_remove_memory() should
> be deferred until after that reference is dropped.
> 
> As it stands the "wait for last page ref drop" happens *after*
> devm_memremap_pages_release() returns, which is obviously too late and
> can lead to crashes.
> 
> Fix this situation by assigning the responsibility to wait for the
> percpu_ref to go idle to devm_memremap_pages() with a new ->cleanup()
> callback. Implement the new cleanup callback for all
> devm_memremap_pages() users: pmem, devdax, hmm, and p2pdma.
> 
> Reported-by: Logan Gunthorpe 
> Fixes: 41e94a851304 ("add devm_memremap_pages")
> Cc: Ira Weiny 
> Cc: Bjorn Helgaas 
> Cc: "Jérôme Glisse" 
> Cc: Christoph Hellwig 
> Signed-off-by: Dan Williams 

For the series:

Reviewed-by: Ira Weiny 

> ---
>  drivers/dax/device.c  |   13 +++--
>  drivers/nvdimm/pmem.c |   17 +
>  drivers/pci/p2pdma.c  |   17 +++--
>  include/linux/memremap.h  |2 ++
>  kernel/memremap.c |   17 -
>  mm/hmm.c  |   14 +++---
>  tools/testing/nvdimm/test/iomap.c |2 ++
>  7 files changed, 38 insertions(+), 44 deletions(-)
> 
> diff --git a/drivers/dax/device.c b/drivers/dax/device.c
> index e428468ab661..e3aa78dd1bb0 100644
> --- a/drivers/dax/device.c
> +++ b/drivers/dax/device.c
> @@ -27,9 +27,8 @@ static void dev_dax_percpu_release(struct percpu_ref *ref)
>   complete(_dax->cmp);
>  }
>  
> -static void dev_dax_percpu_exit(void *data)
> +static void dev_dax_percpu_exit(struct percpu_ref *ref)
>  {
> - struct percpu_ref *ref = data;
>   struct dev_dax *dev_dax = ref_to_dev_dax(ref);
>  
>   dev_dbg(_dax->dev, "%s\n", __func__);
> @@ -468,18 +467,12 @@ int dev_dax_probe(struct device *dev)
>   if (rc)
>   return rc;
>  
> - rc = devm_add_action_or_reset(dev, dev_dax_percpu_exit, _dax->ref);
> - if (rc)
> - return rc;
> -
>   dev_dax->pgmap.ref = _dax->ref;
>   dev_dax->pgmap.kill = dev_dax_percpu_kill;
> + dev_dax->pgmap.cleanup = dev_dax_percpu_exit;
>   addr = devm_memremap_pages(dev, _dax->pgmap);
> - if (IS_ERR(addr)) {
> - devm_remove_action(dev, dev_dax_percpu_exit, _dax->ref);
> - percpu_ref_exit(_dax->ref);
> + if (IS_ERR(addr))
>   return PTR_ERR(addr);
> - }
>  
>   inode = dax_inode(dax_dev);
>   cdev = inode->i_cdev;
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index bc2f700feef8..507b9eda42aa 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -304,11 +304,19 @@ static const struct attribute_group 
> *pmem_attribute_groups[] = {
>   NULL,
>  };
>  
> -static void pmem_release_queue(void *q)
> +static void __pmem_release_queue(struct percpu_ref *ref)
>  {
> + struct request_queue *q;
> +
> + q = container_of(ref, typeof(*q), q_usage_counter);
>   blk_cleanup_queue(q);
>  }
>  
> +static void pmem_release_queue(void *ref)
> +{
> + __pmem_release_queue(ref);
> +}
> +
>  static void pmem_freeze_queue(struct percpu_ref *ref)
>  {
>   struct request_queue *q;
> @@ -400,12 +408,10 @@ static int pmem_attach_disk(struct device *dev,
>   if (!q)
>   return -ENOMEM;
>  
> - if (devm_add_action_or_reset(dev, pmem_release_queue, q))
> - return -ENOMEM;
> -
>   pmem->pfn_flags = PFN_DEV;
>   pmem->pgmap.ref = >q_usage_counter;
>   pmem->pgmap.kill = pmem_freeze_queue;
> + pmem->pgmap.cleanup = __pmem_release_queue;
>   if (is_nd_pfn(dev)) {
>   if (setup_pagemap_fsdax(dev, >pgmap))
>   return -ENOMEM;
> @@ -426,6 +432,9 @@ static int pmem_attach_disk(struct device *dev,
>   pmem->pfn_flags |= PFN_MAP;
>   memcpy(_res, >pgmap.res, sizeof(bb_res));
>   } else {
> + if (devm_add_action_or_reset(dev, pmem_release_queue,
> + >q_usage_counter))
> + return -ENOMEM;
>   addr = devm_memremap(dev, pmem->phys_addr,
>   pmem->size, ARCH_MEMREMAP_PMEM);
>   memcpy(_res, >res, sizeof(bb_res));
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index 1b96c1688715..7ff5b8067670 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -95,7 +95,7 @@ static void pci_p2pdma_percpu_kill(struct percpu_ref *ref)
>   percpu_ref_kill(ref);
>  }
>  
> -static void pci_p2pdma_percpu_cleanup(void *ref)
> +static void 

Re: [ndctl PATCH] monitor: removing the requirement of default config

2019-03-29 Thread Verma, Vishal L


On Fri, 2019-03-29 at 20:01 +0900, QI Fuli wrote:
> Removing the requirement of default configuration file.
> If "-c, --config-file" option is not specified, default configuration
> file should be dispensable.
> 
> Link: https://github.com/pmem/ndctl/issues/83
> Signed-off-by: QI Fuli 
> 
> ---
>  ndctl/monitor.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/ndctl/monitor.c b/ndctl/monitor.c
> index 346a6df..6829a6b 100644
> --- a/ndctl/monitor.c
> +++ b/ndctl/monitor.c
> @@ -484,8 +484,11 @@ static int read_config_file(struct ndctl_ctx *ctx, 
> struct monitor *_monitor,
>  
>   f = fopen(config_file, "r");
>   if (!f) {
> - err(, "config-file: %s cannot be opened\n", 
> config_file);
> - rc = -errno;
> + if (_monitor->config_file) {
> + err(, "config-file: %s cannot be opened\n",
> + config_file);
> + rc = -errno;
> + }
>   goto out;
>   }
>  
Hi Qi,

Thanks for the quick patch. However, this makes it so that the only way
in which a config file gets used is by explicitly specifying it with a
-c option, and that kind of makes a 'default' config file pointless.

The ideal scenario would be (rough sketch below):
1. Check if the default config exists
   a. if it does, use options from there
   b. if any cli options are provided, use those to override the ones in
the config file (if any)

2. If the default config file is missing, only use cli options

3. If -c <config-file> is provided, use that, but perform the same
treatment as 1b above, i.e. any cli options override anything in the
config file from -c as well.
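
Roughly something like this, as a sketch only (assumes the usual
errno.h/unistd.h/stdbool.h includes; parse_config_file() and the default
path are hypothetical stand-ins for the existing monitor code):

static int load_config(struct monitor *monitor)
{
	const char *path = monitor->config_file;
	bool explicit = (path != NULL);

	if (!explicit)
		path = "/etc/ndctl/monitor.conf"; /* hypothetical default */

	if (access(path, R_OK) != 0) {
		/* a missing default config is fine; a missing -c file is not */
		return explicit ? -errno : 0;
	}

	/* options already set on the command line win over file values */
	return parse_config_file(monitor, path, false /* don't override cli */);
}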


Re: [PATCH 3/6] pci/p2pdma: Fix the gen_pool_add_virt() failure path

2019-03-29 Thread Bjorn Helgaas
On Fri, Mar 29, 2019 at 08:27:39AM -0700, Dan Williams wrote:
> The pci_p2pdma_add_resource() implementation immediately frees the pgmap
> if gen_pool_add_virt() fails. However, that means that when @dev
> triggers a devres release devm_memremap_pages_release() will crash
> trying to access the freed @pgmap.
> 
> Use the new devm_memunmap_pages() to manually free the mapping in the
> error path.
> 
> Fixes: 52916982af48 ("PCI/P2PDMA: Support peer-to-peer memory")
> Cc: Logan Gunthorpe 
> Cc: Ira Weiny 
> Cc: Bjorn Helgaas 
> Cc: Christoph Hellwig 
> Signed-off-by: Dan Williams 

Especially if you run "git log --oneline drivers/pci/p2pdma.c" and make
yours match :),

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/p2pdma.c |4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index c52298d76e64..595a534bd749 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -208,13 +208,15 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int 
> bar, size_t size,
>   pci_bus_address(pdev, bar) + offset,
>   resource_size(>res), dev_to_node(>dev));
>   if (error)
> - goto pgmap_free;
> + goto pages_free;
>  
>   pci_info(pdev, "added peer-to-peer DMA memory %pR\n",
>>res);
>  
>   return 0;
>  
> +pages_free:
> + devm_memunmap_pages(>dev, pgmap);
>  pgmap_free:
>   devm_kfree(>dev, pgmap);
>   return error;
> 


[PATCH 6/6] mm/devm_memremap_pages: Fix final page put race

2019-03-29 Thread Dan Williams
Logan noticed that devm_memremap_pages_release() kills the percpu_ref,
drops all the page references that were acquired at init, and then
immediately proceeds to unplug, via arch_remove_memory(), the backing
pages for the pagemap. If for some reason device shutdown actually
collides with a busy / elevated-ref-count page then arch_remove_memory()
should be deferred until after that reference is dropped.

As it stands the "wait for last page ref drop" happens *after*
devm_memremap_pages_release() returns, which is obviously too late and
can lead to crashes.

Fix this situation by assigning the responsibility to wait for the
percpu_ref to go idle to devm_memremap_pages() with a new ->cleanup()
callback. Implement the new cleanup callback for all
devm_memremap_pages() users: pmem, devdax, hmm, and p2pdma.

Reported-by: Logan Gunthorpe 
Fixes: 41e94a851304 ("add devm_memremap_pages")
Cc: Ira Weiny 
Cc: Bjorn Helgaas 
Cc: "Jérôme Glisse" 
Cc: Christoph Hellwig 
Signed-off-by: Dan Williams 
---
 drivers/dax/device.c  |   13 +++--
 drivers/nvdimm/pmem.c |   17 +
 drivers/pci/p2pdma.c  |   17 +++--
 include/linux/memremap.h  |2 ++
 kernel/memremap.c |   17 -
 mm/hmm.c  |   14 +++---
 tools/testing/nvdimm/test/iomap.c |2 ++
 7 files changed, 38 insertions(+), 44 deletions(-)

diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index e428468ab661..e3aa78dd1bb0 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -27,9 +27,8 @@ static void dev_dax_percpu_release(struct percpu_ref *ref)
complete(&dev_dax->cmp);
 }
 
-static void dev_dax_percpu_exit(void *data)
+static void dev_dax_percpu_exit(struct percpu_ref *ref)
 {
-   struct percpu_ref *ref = data;
struct dev_dax *dev_dax = ref_to_dev_dax(ref);
 
dev_dbg(&dev_dax->dev, "%s\n", __func__);
@@ -468,18 +467,12 @@ int dev_dax_probe(struct device *dev)
if (rc)
return rc;
 
-   rc = devm_add_action_or_reset(dev, dev_dax_percpu_exit, &dev_dax->ref);
-   if (rc)
-   return rc;
-
dev_dax->pgmap.ref = &dev_dax->ref;
dev_dax->pgmap.kill = dev_dax_percpu_kill;
+   dev_dax->pgmap.cleanup = dev_dax_percpu_exit;
addr = devm_memremap_pages(dev, &dev_dax->pgmap);
-   if (IS_ERR(addr)) {
-   devm_remove_action(dev, dev_dax_percpu_exit, &dev_dax->ref);
-   percpu_ref_exit(&dev_dax->ref);
+   if (IS_ERR(addr))
return PTR_ERR(addr);
-   }
 
inode = dax_inode(dax_dev);
cdev = inode->i_cdev;
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index bc2f700feef8..507b9eda42aa 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -304,11 +304,19 @@ static const struct attribute_group *pmem_attribute_groups[] = {
NULL,
 };
 
-static void pmem_release_queue(void *q)
+static void __pmem_release_queue(struct percpu_ref *ref)
 {
+   struct request_queue *q;
+
+   q = container_of(ref, typeof(*q), q_usage_counter);
blk_cleanup_queue(q);
 }
 
+static void pmem_release_queue(void *ref)
+{
+   __pmem_release_queue(ref);
+}
+
 static void pmem_freeze_queue(struct percpu_ref *ref)
 {
struct request_queue *q;
@@ -400,12 +408,10 @@ static int pmem_attach_disk(struct device *dev,
if (!q)
return -ENOMEM;
 
-   if (devm_add_action_or_reset(dev, pmem_release_queue, q))
-   return -ENOMEM;
-
pmem->pfn_flags = PFN_DEV;
pmem->pgmap.ref = &q->q_usage_counter;
pmem->pgmap.kill = pmem_freeze_queue;
+   pmem->pgmap.cleanup = __pmem_release_queue;
if (is_nd_pfn(dev)) {
if (setup_pagemap_fsdax(dev, &pmem->pgmap))
return -ENOMEM;
@@ -426,6 +432,9 @@ static int pmem_attach_disk(struct device *dev,
pmem->pfn_flags |= PFN_MAP;
memcpy(&bb_res, &pmem->pgmap.res, sizeof(bb_res));
} else {
+   if (devm_add_action_or_reset(dev, pmem_release_queue,
+   &q->q_usage_counter))
+   return -ENOMEM;
addr = devm_memremap(dev, pmem->phys_addr,
pmem->size, ARCH_MEMREMAP_PMEM);
memcpy(_res, >res, sizeof(bb_res));
diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 1b96c1688715..7ff5b8067670 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -95,7 +95,7 @@ static void pci_p2pdma_percpu_kill(struct percpu_ref *ref)
percpu_ref_kill(ref);
 }
 
-static void pci_p2pdma_percpu_cleanup(void *ref)
+static void pci_p2pdma_percpu_cleanup(struct percpu_ref *ref)
 {
struct p2pdma_pagemap *p2p_pgmap = to_p2p_pgmap(ref);
 
@@ -197,16 +197,6 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
if (error)
goto pgmap_free;
 
-   /*
-* FIXME: 

[PATCH 4/6] lib/genalloc: Introduce chunk owners

2019-03-29 Thread Dan Williams
The p2pdma facility enables a provider to publish a pool of dma
addresses for a consumer to allocate. A genpool is used internally by
p2pdma to collect dma resources, 'chunks', to be handed out to
consumers. Whenever a consumer allocates a resource it needs to pin the
'struct dev_pagemap' instance that backs the chunk selected by
pci_alloc_p2pmem().

Currently that reference is taken globally on the entire provider
device. That sets up a lifetime mismatch whereby the p2pdma core needs
to maintain hacks to make sure the percpu_ref is not released twice.

This lifetime mismatch also stands in the way of a fix to
devm_memremap_pages() whereby devm_memremap_pages_release() must wait
for the percpu_ref ->release() callback to complete before it can
proceed to teardown pages.

So, towards fixing this situation, introduce the ability to store a
'chunk owner' at gen_pool_add() time, and a facility to retrieve the
owner at gen_pool_{alloc,free}() time. For p2pdma this will be used to
store and recall individual dev_pagemap reference counter instances
per-chunk.
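
For illustration, p2pdma usage of the new calls would look roughly like
this; a sketch against the prototypes below, not a quote from the later
patch, where struct p2pdma_pagemap carries a per-chunk percpu_ref as
introduced in the next patch:

	/* provider side: remember which p2pdma_pagemap backs each chunk */
	error = gen_pool_add_owner(p2pdma->pool, (unsigned long)addr,
				   pci_bus_address(pdev, bar) + offset,
				   resource_size(&pgmap->res),
				   dev_to_node(&pdev->dev), p2p_pgmap);

	/* consumer side: the owner is handed back at allocation time... */
	vaddr = (void *)gen_pool_alloc_owner(p2pdma->pool, size, &owner);
	if (vaddr)
		percpu_ref_get(&((struct p2pdma_pagemap *)owner)->ref);

	/* ...and again at free time, so the matching reference can be put */
	gen_pool_free_owner(p2pdma->pool, (unsigned long)vaddr, size, &owner);
	percpu_ref_put(&((struct p2pdma_pagemap *)owner)->ref);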

Cc: Logan Gunthorpe 
Cc: Ira Weiny 
Cc: Bjorn Helgaas 
Cc: "Jérôme Glisse" 
Cc: Christoph Hellwig 
Signed-off-by: Dan Williams 
---
 include/linux/genalloc.h |   55 +-
 lib/genalloc.c   |   51 +--
 2 files changed, 74 insertions(+), 32 deletions(-)

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index dd0a452373e7..a337313e064f 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -75,6 +75,7 @@ struct gen_pool_chunk {
struct list_head next_chunk;/* next chunk in pool */
atomic_long_t avail;
phys_addr_t phys_addr;  /* physical starting address of memory chunk */
+   void *owner;/* private data to retrieve at alloc time */
unsigned long start_addr;   /* start address of memory chunk */
unsigned long end_addr; /* end address of memory chunk (inclusive) */
unsigned long bits[0];  /* bitmap for allocating memory chunk */
@@ -96,8 +97,15 @@ struct genpool_data_fixed {
 
 extern struct gen_pool *gen_pool_create(int, int);
 extern phys_addr_t gen_pool_virt_to_phys(struct gen_pool *pool, unsigned long);
-extern int gen_pool_add_virt(struct gen_pool *, unsigned long, phys_addr_t,
-size_t, int);
+extern int gen_pool_add_owner(struct gen_pool *, unsigned long, phys_addr_t,
+size_t, int, void *);
+
+static inline int gen_pool_add_virt(struct gen_pool *pool, unsigned long addr,
+   phys_addr_t phys, size_t size, int nid)
+{
+   return gen_pool_add_owner(pool, addr, phys, size, nid, NULL);
+}
+
 /**
  * gen_pool_add - add a new chunk of special memory to the pool
  * @pool: pool to add new memory chunk to
@@ -116,12 +124,47 @@ static inline int gen_pool_add(struct gen_pool *pool, unsigned long addr,
return gen_pool_add_virt(pool, addr, -1, size, nid);
 }
 extern void gen_pool_destroy(struct gen_pool *);
-extern unsigned long gen_pool_alloc(struct gen_pool *, size_t);
-extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t,
-   genpool_algo_t algo, void *data);
+unsigned long gen_pool_alloc_algo_owner(struct gen_pool *pool, size_t size,
+   genpool_algo_t algo, void *data, void **owner);
+
+static inline unsigned long gen_pool_alloc_owner(struct gen_pool *pool,
+   size_t size, void **owner)
+{
+   return gen_pool_alloc_algo_owner(pool, size, pool->algo, pool->data,
+   owner);
+}
+
+static inline unsigned long gen_pool_alloc_algo(struct gen_pool *pool,
+   size_t size, genpool_algo_t algo, void *data)
+{
+   return gen_pool_alloc_algo_owner(pool, size, algo, data, NULL);
+}
+
+/**
+ * gen_pool_alloc - allocate special memory from the pool
+ * @pool: pool to allocate from
+ * @size: number of bytes to allocate from the pool
+ *
+ * Allocate the requested number of bytes from the specified pool.
+ * Uses the pool allocation function (with first-fit algorithm by default).
+ * Can not be used in NMI handler on architectures without
+ * NMI-safe cmpxchg implementation.
+ */
+static inline unsigned long gen_pool_alloc(struct gen_pool *pool, size_t size)
+{
+   return gen_pool_alloc_algo(pool, size, pool->algo, pool->data);
+}
+
 extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
dma_addr_t *dma);
-extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
+extern void gen_pool_free_owner(struct gen_pool *pool, unsigned long addr,
+   size_t size, void **owner);
+static inline void gen_pool_free(struct gen_pool *pool, unsigned long addr,
+size_t size)
+{
+   gen_pool_free_owner(pool, addr, size, NULL);
+}
+
 extern void gen_pool_for_each_chunk(struct gen_pool *,
void (*)(struct gen_pool *, 

[PATCH 3/6] pci/p2pdma: Fix the gen_pool_add_virt() failure path

2019-03-29 Thread Dan Williams
The pci_p2pdma_add_resource() implementation immediately frees the pgmap
if gen_pool_add_virt() fails. However, that means that when @dev
triggers a devres release devm_memremap_pages_release() will crash
trying to access the freed @pgmap.

Use the new devm_memunmap_pages() to manually free the mapping in the
error path.

Fixes: 52916982af48 ("PCI/P2PDMA: Support peer-to-peer memory")
Cc: Logan Gunthorpe 
Cc: Ira Weiny 
Cc: Bjorn Helgaas 
Cc: Christoph Hellwig 
Signed-off-by: Dan Williams 
---
 drivers/pci/p2pdma.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index c52298d76e64..595a534bd749 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -208,13 +208,15 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
pci_bus_address(pdev, bar) + offset,
resource_size(&pgmap->res), dev_to_node(&pdev->dev));
if (error)
-   goto pgmap_free;
+   goto pages_free;
 
pci_info(pdev, "added peer-to-peer DMA memory %pR\n",
 &pgmap->res);
 
return 0;
 
+pages_free:
+   devm_memunmap_pages(&pdev->dev, pgmap);
 pgmap_free:
devm_kfree(>dev, pgmap);
return error;



[PATCH 5/6] pci/p2pdma: Track pgmap references per resource, not globally

2019-03-29 Thread Dan Williams
In preparation for fixing a race between devm_memremap_pages_release()
and the final put of a page from the device-page-map, allocate a
percpu-ref per p2pdma resource mapping.

Cc: Logan Gunthorpe 
Cc: Bjorn Helgaas 
Cc: Christoph Hellwig 
Signed-off-by: Dan Williams 
---
 drivers/pci/p2pdma.c |  114 --
 1 file changed, 73 insertions(+), 41 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 595a534bd749..1b96c1688715 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -20,12 +20,16 @@
 #include 
 
 struct pci_p2pdma {
-   struct percpu_ref devmap_ref;
-   struct completion devmap_ref_done;
struct gen_pool *pool;
bool p2pmem_published;
 };
 
+struct p2pdma_pagemap {
+   struct dev_pagemap pgmap;
+   struct percpu_ref ref;
+   struct completion ref_done;
+};
+
 static ssize_t size_show(struct device *dev, struct device_attribute *attr,
 char *buf)
 {
@@ -74,28 +78,31 @@ static const struct attribute_group p2pmem_group = {
.name = "p2pmem",
 };
 
+static struct p2pdma_pagemap *to_p2p_pgmap(struct percpu_ref *ref)
+{
+   return container_of(ref, struct p2pdma_pagemap, ref);
+}
+
 static void pci_p2pdma_percpu_release(struct percpu_ref *ref)
 {
-   struct pci_p2pdma *p2p =
-   container_of(ref, struct pci_p2pdma, devmap_ref);
+   struct p2pdma_pagemap *p2p_pgmap = to_p2p_pgmap(ref);
 
-   complete_all(&p2p->devmap_ref_done);
+   complete(&p2p_pgmap->ref_done);
 }
 
 static void pci_p2pdma_percpu_kill(struct percpu_ref *ref)
 {
-   /*
-* pci_p2pdma_add_resource() may be called multiple times
-* by a driver and may register the percpu_kill devm action multiple
-* times. We only want the first action to actually kill the
-* percpu_ref.
-*/
-   if (percpu_ref_is_dying(ref))
-   return;
-
percpu_ref_kill(ref);
 }
 
+static void pci_p2pdma_percpu_cleanup(void *ref)
+{
+   struct p2pdma_pagemap *p2p_pgmap = to_p2p_pgmap(ref);
+
+   wait_for_completion(&p2p_pgmap->ref_done);
+   percpu_ref_exit(&p2p_pgmap->ref);
+}
+
 static void pci_p2pdma_release(void *data)
 {
struct pci_dev *pdev = data;
@@ -103,12 +110,12 @@ static void pci_p2pdma_release(void *data)
if (!pdev->p2pdma)
return;
 
-   wait_for_completion(&pdev->p2pdma->devmap_ref_done);
-   percpu_ref_exit(&pdev->p2pdma->devmap_ref);
+   /* Flush and disable pci_alloc_p2p_mem() */
+   pdev->p2pdma = NULL;
+   synchronize_rcu();
 
gen_pool_destroy(pdev->p2pdma->pool);
sysfs_remove_group(&pdev->dev.kobj, &p2pmem_group);
-   pdev->p2pdma = NULL;
 }
 
 static int pci_p2pdma_setup(struct pci_dev *pdev)
@@ -124,12 +131,6 @@ static int pci_p2pdma_setup(struct pci_dev *pdev)
if (!p2p->pool)
goto out;
 
-   init_completion(&p2p->devmap_ref_done);
-   error = percpu_ref_init(&p2p->devmap_ref,
-   pci_p2pdma_percpu_release, 0, GFP_KERNEL);
-   if (error)
-   goto out_pool_destroy;
-
error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_release, pdev);
if (error)
goto out_pool_destroy;
@@ -163,6 +164,7 @@ static int pci_p2pdma_setup(struct pci_dev *pdev)
 int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
u64 offset)
 {
+   struct p2pdma_pagemap *p2p_pgmap;
struct dev_pagemap *pgmap;
void *addr;
int error;
@@ -185,14 +187,32 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
return error;
}
 
-   pgmap = devm_kzalloc(&pdev->dev, sizeof(*pgmap), GFP_KERNEL);
-   if (!pgmap)
+   p2p_pgmap = devm_kzalloc(&pdev->dev, sizeof(*p2p_pgmap), GFP_KERNEL);
+   if (!p2p_pgmap)
 return -ENOMEM;
 
+   init_completion(&p2p_pgmap->ref_done);
+   error = percpu_ref_init(&p2p_pgmap->ref,
+   pci_p2pdma_percpu_release, 0, GFP_KERNEL);
+   if (error)
+   goto pgmap_free;
+
+   /*
+* FIXME: the percpu_ref_exit needs to be coordinated internal
+* to devm_memremap_pages_release(). Duplicate the same ordering
+* as other devm_memremap_pages() users for now.
+*/
+   error = devm_add_action(&pdev->dev, pci_p2pdma_percpu_cleanup,
+   &p2p_pgmap->ref);
+   if (error)
+   goto ref_cleanup;
+
+   pgmap = &p2p_pgmap->pgmap;
+
pgmap->res.start = pci_resource_start(pdev, bar) + offset;
pgmap->res.end = pgmap->res.start + size - 1;
pgmap->res.flags = pci_resource_flags(pdev, bar);
-   pgmap->ref = &pdev->p2pdma->devmap_ref;
+   pgmap->ref = &p2p_pgmap->ref;
pgmap->type = MEMORY_DEVICE_PCI_P2PDMA;
pgmap->pci_p2pdma_bus_offset = pci_bus_address(pdev, bar) -
pci_resource_start(pdev, bar);
@@ -201,12 +221,13 @@ int 

[PATCH 0/6] mm/devm_memremap_pages: Fix page release race

2019-03-29 Thread Dan Williams
Logan audited the devm_memremap_pages() shutdown path and noticed that
it was possible to proceed to arch_remove_memory() before all
potential page references have been reaped.

Introduce a new ->cleanup() callback to do the work of waiting for any
straggling page references and then perform the percpu_ref_exit() in
devm_memremap_pages_release() context.
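
For a devm_memremap_pages() user the wiring ends up looking like the
device-dax conversion in patch 6/6; a condensed sketch of that hunk:

	dev_dax->pgmap.ref = &dev_dax->ref;
	dev_dax->pgmap.kill = dev_dax_percpu_kill;      /* stop taking new page references */
	dev_dax->pgmap.cleanup = dev_dax_percpu_exit;   /* wait for idle, then percpu_ref_exit() */
	addr = devm_memremap_pages(dev, &dev_dax->pgmap);
	if (IS_ERR(addr))
		return PTR_ERR(addr);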

For p2pdma this involves some deeper reworks to reference count
resources on a per-instance basis rather than a per pci-device basis. A
modified genalloc api is introduced to convey a driver-private pointer
through gen_pool_{alloc,free}() interfaces. Also, a
devm_memunmap_pages() api is introduced since p2pdma does not
auto-release resources on a setup failure.

The dax and pmem changes pass the nvdimm unit tests, but the hmm and
p2pdma changes are compile-tested only.

Thoughts on the api changes?

I'm targeting to land this through Andrew's tree. 0day has chewed on
this for a day and reported no issues so far.

---

Dan Williams (6):
  drivers/base/devres: Introduce devm_release_action()
  mm/devm_memremap_pages: Introduce devm_memunmap_pages
  pci/p2pdma: Fix the gen_pool_add_virt() failure path
  lib/genalloc: Introduce chunk owners
  pci/p2pdma: Track pgmap references per resource, not globally
  mm/devm_memremap_pages: Fix final page put race


 drivers/base/devres.c |   24 
 drivers/dax/device.c  |   13 +
 drivers/nvdimm/pmem.c |   17 +-
 drivers/pci/p2pdma.c  |  105 +++--
 include/linux/device.h|1 
 include/linux/genalloc.h  |   55 +--
 include/linux/memremap.h  |8 +++
 kernel/memremap.c |   23 ++--
 lib/genalloc.c|   51 +-
 mm/hmm.c  |   14 +
 tools/testing/nvdimm/test/iomap.c |2 +
 11 files changed, 209 insertions(+), 104 deletions(-)


[PATCH 2/6] mm/devm_memremap_pages: Introduce devm_memunmap_pages

2019-03-29 Thread Dan Williams
Use the new devm_release_action() facility to allow
devm_memremap_pages_release() to be manually triggered.

Cc: Logan Gunthorpe 
Cc: Ira Weiny 
Cc: Bjorn Helgaas 
Cc: Christoph Hellwig 
Signed-off-by: Dan Williams 
---
 include/linux/memremap.h |6 ++
 kernel/memremap.c|6 ++
 2 files changed, 12 insertions(+)

diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index f0628660d541..7601ee314c4a 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -100,6 +100,7 @@ struct dev_pagemap {
 
 #ifdef CONFIG_ZONE_DEVICE
 void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap);
+void devm_memunmap_pages(struct device *dev, struct dev_pagemap *pgmap);
 struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
struct dev_pagemap *pgmap);
 
@@ -118,6 +119,11 @@ static inline void *devm_memremap_pages(struct device *dev,
return ERR_PTR(-ENXIO);
 }
 
+static inline void devm_memunmap_pages(struct device *dev,
+   struct dev_pagemap *pgmap)
+{
+}
+
 static inline struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
struct dev_pagemap *pgmap)
 {
diff --git a/kernel/memremap.c b/kernel/memremap.c
index a856cb5ff192..65afbacab44e 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -266,6 +266,12 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 }
 EXPORT_SYMBOL_GPL(devm_memremap_pages);
 
+void devm_memunmap_pages(struct device *dev, struct dev_pagemap *pgmap)
+{
+   devm_release_action(dev, devm_memremap_pages_release, pgmap);
+}
+EXPORT_SYMBOL_GPL(devm_memunmap_pages);
+
 unsigned long vmem_altmap_offset(struct vmem_altmap *altmap)
 {
/* number of pfns from base where pfn_to_page() is valid */



[PATCH 1/6] drivers/base/devres: Introduce devm_release_action()

2019-03-29 Thread Dan Williams
The devm_add_action() facility allows a resource allocation routine to
add custom devm semantics. One such user is devm_memremap_pages().

There is now a need to manually trigger devm_memremap_pages_release().
Introduce devm_release_action() so the release action can be triggered
via a new devm_memunmap_pages() api in a follow-on change.
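
As an example of how it gets used, the follow-on devm_memunmap_pages()
(patch 2/6 in this series) is just a thin wrapper that fires the release
action registered by devm_memremap_pages():

void devm_memunmap_pages(struct device *dev, struct dev_pagemap *pgmap)
{
	/* runs devm_memremap_pages_release(pgmap) now, and drops it from the devres list */
	devm_release_action(dev, devm_memremap_pages_release, pgmap);
}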

Cc: Logan Gunthorpe 
Cc: Ira Weiny 
Cc: Bjorn Helgaas 
Cc: Christoph Hellwig 
Cc: Greg Kroah-Hartman 
Cc: "Rafael J. Wysocki" 
Signed-off-by: Dan Williams 
---
 drivers/base/devres.c  |   24 +++-
 include/linux/device.h |1 +
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/drivers/base/devres.c b/drivers/base/devres.c
index e038e2b3b7ea..0bbb328bd17f 100644
--- a/drivers/base/devres.c
+++ b/drivers/base/devres.c
@@ -755,10 +755,32 @@ void devm_remove_action(struct device *dev, void (*action)(void *), void *data)
 
WARN_ON(devres_destroy(dev, devm_action_release, devm_action_match,
   &devres));
-
 }
 EXPORT_SYMBOL_GPL(devm_remove_action);
 
+/**
+ * devm_release_action() - release previously added custom action
+ * @dev: Device that owns the action
+ * @action: Function implementing the action
+ * @data: Pointer to data passed to @action implementation
+ *
+ * Releases and removes instance of @action previously added by
+ * devm_add_action().  Both action and data should match one of the
+ * existing entries.
+ */
+void devm_release_action(struct device *dev, void (*action)(void *), void *data)
+{
+   struct action_devres devres = {
+   .data = data,
+   .action = action,
+   };
+
+   WARN_ON(devres_release(dev, devm_action_release, devm_action_match,
+  &devres));
+
+}
+EXPORT_SYMBOL_GPL(devm_release_action);
+
 /*
  * Managed kmalloc/kfree
  */
diff --git a/include/linux/device.h b/include/linux/device.h
index b425a7ee04ce..02a3e45de9af 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -715,6 +715,7 @@ void __iomem *devm_of_iomap(struct device *dev,
 /* allows to add/remove a custom action to devres stack */
 int devm_add_action(struct device *dev, void (*action)(void *), void *data);
void devm_remove_action(struct device *dev, void (*action)(void *), void *data);
+void devm_release_action(struct device *dev, void (*action)(void *), void *data);
 
 static inline int devm_add_action_or_reset(struct device *dev,
   void (*action)(void *), void *data)



[ndctl PATCH] monitor: removing the requirement of default config

2019-03-29 Thread QI Fuli
Remove the requirement of a default configuration file.
If the "-c, --config-file" option is not specified, the default
configuration file should be dispensable.

Link: https://github.com/pmem/ndctl/issues/83
Signed-off-by: QI Fuli 

---
 ndctl/monitor.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/ndctl/monitor.c b/ndctl/monitor.c
index 346a6df..6829a6b 100644
--- a/ndctl/monitor.c
+++ b/ndctl/monitor.c
@@ -484,8 +484,11 @@ static int read_config_file(struct ndctl_ctx *ctx, struct monitor *_monitor,
 
f = fopen(config_file, "r");
if (!f) {
-   err(, "config-file: %s cannot be opened\n", config_file);
-   rc = -errno;
+   if (_monitor->config_file) {
+   err(, "config-file: %s cannot be opened\n",
+   config_file);
+   rc = -errno;
+   }
goto out;
}
 
-- 
2.21.0.rc1
