Re: [Qemu-devel] [PATCH v8 3/6] libnvdimm: add dax_dev sync flag

2019-05-13 Thread Pankaj Gupta


> >
> >
> > Hi Dan,
> >
> > While testing device mapper with DAX, I faced a bug with the commit:
> >
> > commit ad428cdb525a97d15c0349fdc80f3d58befb50df
> > Author: Dan Williams 
> > Date:   Wed Feb 20 21:12:50 2019 -0800
> >
> > When I reverted the condition to old code[1] it worked for me. I
> > am thinking when we map two different devices (e.g with device mapper),
> > will
> > start & end pfn still point to same pgmap? Or there is something else which
> > I am missing here.
> >
> > Note: I tested only EXT4.
> >
> > [1]
> >
> > -   if (pgmap && pgmap->type == MEMORY_DEVICE_FS_DAX)
> > +   end_pgmap = get_dev_pagemap(pfn_t_to_pfn(end_pfn), NULL);
> > +   if (pgmap && pgmap == end_pgmap && pgmap->type ==
> > MEMORY_DEVICE_FS_DAX
> > +   && pfn_t_to_page(pfn)->pgmap == pgmap
> > +   && pfn_t_to_page(end_pfn)->pgmap == pgmap
> > +   && pfn_t_to_pfn(pfn) ==
> > PHYS_PFN(__pa(kaddr))
> > +   && pfn_t_to_pfn(end_pfn) ==
> > PHYS_PFN(__pa(end_kaddr)))
> 
> Ugh, yes, device-mapper continues to be an awkward fit for dax (or
> vice versa). We would either need a way to have a multi-level pfn to
> pagemap lookup for composite devices, or a way to discern that even
> though the pagemap is different that the result is still valid / not
> an indication that we have leaked into an unassociated address range.
> Perhaps a per-daxdev callback for ->dax_supported() so that
> device-mapper internals can be used for this validation.

Yes, Will look at it.

> 
> We need to get that fixed up, but I don't see it as a blocker /
> pre-requisite for virtio-pmem.

Agree. Will send virtio-pmem patch series.

Thank you,
Pankaj
> 
> 


Re: [Qemu-devel] [PATCH v8 3/6] libnvdimm: add dax_dev sync flag

2019-05-13 Thread Pankaj Gupta


Hi Dan,

While testing device mapper with DAX, I faced a bug with the commit:

commit ad428cdb525a97d15c0349fdc80f3d58befb50df
Author: Dan Williams 
Date:   Wed Feb 20 21:12:50 2019 -0800

When I reverted the condition to old code[1] it worked for me. I 
am thinking when we map two different devices (e.g with device mapper), will 
start & end pfn still point to same pgmap? Or there is something else which
I am missing here. 

Note: I tested only EXT4.

[1]

-   if (pgmap && pgmap->type == MEMORY_DEVICE_FS_DAX)
+   end_pgmap = get_dev_pagemap(pfn_t_to_pfn(end_pfn), NULL);
+   if (pgmap && pgmap == end_pgmap && pgmap->type == 
MEMORY_DEVICE_FS_DAX
+   && pfn_t_to_page(pfn)->pgmap == pgmap
+   && pfn_t_to_page(end_pfn)->pgmap == pgmap
+   && pfn_t_to_pfn(pfn) == PHYS_PFN(__pa(kaddr))
+   && pfn_t_to_pfn(end_pfn) == 
PHYS_PFN(__pa(end_kaddr)))
dax_enabled = true;
put_dev_pagemap(pgmap);

Thanks,
Pankaj






Re: [Qemu-devel] [PATCH v8 3/6] libnvdimm: add dax_dev sync flag

2019-05-10 Thread Pankaj Gupta


> > > >
> > > > This patch adds 'DAXDEV_SYNC' flag which is set
> > > > for nd_region doing synchronous flush. This later
> > > > is used to disable MAP_SYNC functionality for
> > > > ext4 & xfs filesystem for devices don't support
> > > > synchronous flush.
> > > >
> > > > Signed-off-by: Pankaj Gupta 
> > > > ---
> > > >  drivers/dax/bus.c|  2 +-
> > > >  drivers/dax/super.c  | 13 -
> > > >  drivers/md/dm.c  |  3 ++-
> > > >  drivers/nvdimm/pmem.c|  5 -
> > > >  drivers/nvdimm/region_devs.c |  7 +++
> > > >  include/linux/dax.h  |  8 ++--
> > > >  include/linux/libnvdimm.h|  1 +
> > > >  7 files changed, 33 insertions(+), 6 deletions(-)
> > > [..]
> > > > diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> > > > index 043f0761e4a0..ee007b75d9fd 100644
> > > > --- a/drivers/md/dm.c
> > > > +++ b/drivers/md/dm.c
> > > > @@ -1969,7 +1969,8 @@ static struct mapped_device *alloc_dev(int minor)
> > > > sprintf(md->disk->disk_name, "dm-%d", minor);
> > > >
> > > > if (IS_ENABLED(CONFIG_DAX_DRIVER)) {
> > > > -   dax_dev = alloc_dax(md, md->disk->disk_name,
> > > > _dax_ops);
> > > > +   dax_dev = alloc_dax(md, md->disk->disk_name,
> > > > _dax_ops,
> > > > +
> > > > DAXDEV_F_SYNC);
> > >
> > > Apologies for not realizing this until now, but this is broken.
> > > Imaging a device-mapper configuration composed of both 'async'
> > > virtio-pmem and 'sync' pmem. The 'sync' flag needs to be unified
> > > across all members. I would change this argument to '0' and then
> > > arrange for it to be set at dm_table_supports_dax() time after
> > > validating that all components support synchronous dax.
> >
> > o.k. Need to set 'DAXDEV_F_SYNC' flag after verifying all the target
> > components support synchronous DAX.
> >
> > Just a question, If device mapper configuration have composed of both
> > virtio-pmem or pmem devices, we want to configure device mapper for async
> > flush?
> 
> If it's composed of both then, yes, it needs to be async flush at the
> device-mapper level. Otherwise MAP_SYNC will succeed and fail to
> trigger fsync on the host file when necessary for the virtio-pmem
> backed portion of the device-mapper device.

o.k. Agree.

Thanks you,
Pankaj
> 
> 


Re: [PATCH v8 3/6] libnvdimm: add dax_dev sync flag

2019-05-10 Thread Dan Williams
On Fri, May 10, 2019 at 5:45 PM Pankaj Gupta  wrote:
>
>
>
> > >
> > > This patch adds 'DAXDEV_SYNC' flag which is set
> > > for nd_region doing synchronous flush. This later
> > > is used to disable MAP_SYNC functionality for
> > > ext4 & xfs filesystem for devices don't support
> > > synchronous flush.
> > >
> > > Signed-off-by: Pankaj Gupta 
> > > ---
> > >  drivers/dax/bus.c|  2 +-
> > >  drivers/dax/super.c  | 13 -
> > >  drivers/md/dm.c  |  3 ++-
> > >  drivers/nvdimm/pmem.c|  5 -
> > >  drivers/nvdimm/region_devs.c |  7 +++
> > >  include/linux/dax.h  |  8 ++--
> > >  include/linux/libnvdimm.h|  1 +
> > >  7 files changed, 33 insertions(+), 6 deletions(-)
> > [..]
> > > diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> > > index 043f0761e4a0..ee007b75d9fd 100644
> > > --- a/drivers/md/dm.c
> > > +++ b/drivers/md/dm.c
> > > @@ -1969,7 +1969,8 @@ static struct mapped_device *alloc_dev(int minor)
> > > sprintf(md->disk->disk_name, "dm-%d", minor);
> > >
> > > if (IS_ENABLED(CONFIG_DAX_DRIVER)) {
> > > -   dax_dev = alloc_dax(md, md->disk->disk_name, _dax_ops);
> > > +   dax_dev = alloc_dax(md, md->disk->disk_name, _dax_ops,
> > > +DAXDEV_F_SYNC);
> >
> > Apologies for not realizing this until now, but this is broken.
> > Imaging a device-mapper configuration composed of both 'async'
> > virtio-pmem and 'sync' pmem. The 'sync' flag needs to be unified
> > across all members. I would change this argument to '0' and then
> > arrange for it to be set at dm_table_supports_dax() time after
> > validating that all components support synchronous dax.
>
> o.k. Need to set 'DAXDEV_F_SYNC' flag after verifying all the target
> components support synchronous DAX.
>
> Just a question, If device mapper configuration have composed of both
> virtio-pmem or pmem devices, we want to configure device mapper for async 
> flush?

If it's composed of both then, yes, it needs to be async flush at the
device-mapper level. Otherwise MAP_SYNC will succeed and fail to
trigger fsync on the host file when necessary for the virtio-pmem
backed portion of the device-mapper device.


Re: [PATCH v8 3/6] libnvdimm: add dax_dev sync flag

2019-05-10 Thread Pankaj Gupta



> >
> > This patch adds 'DAXDEV_SYNC' flag which is set
> > for nd_region doing synchronous flush. This later
> > is used to disable MAP_SYNC functionality for
> > ext4 & xfs filesystem for devices don't support
> > synchronous flush.
> >
> > Signed-off-by: Pankaj Gupta 
> > ---
> >  drivers/dax/bus.c|  2 +-
> >  drivers/dax/super.c  | 13 -
> >  drivers/md/dm.c  |  3 ++-
> >  drivers/nvdimm/pmem.c|  5 -
> >  drivers/nvdimm/region_devs.c |  7 +++
> >  include/linux/dax.h  |  8 ++--
> >  include/linux/libnvdimm.h|  1 +
> >  7 files changed, 33 insertions(+), 6 deletions(-)
> [..]
> > diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> > index 043f0761e4a0..ee007b75d9fd 100644
> > --- a/drivers/md/dm.c
> > +++ b/drivers/md/dm.c
> > @@ -1969,7 +1969,8 @@ static struct mapped_device *alloc_dev(int minor)
> > sprintf(md->disk->disk_name, "dm-%d", minor);
> >
> > if (IS_ENABLED(CONFIG_DAX_DRIVER)) {
> > -   dax_dev = alloc_dax(md, md->disk->disk_name, _dax_ops);
> > +   dax_dev = alloc_dax(md, md->disk->disk_name, _dax_ops,
> > +DAXDEV_F_SYNC);
> 
> Apologies for not realizing this until now, but this is broken.
> Imaging a device-mapper configuration composed of both 'async'
> virtio-pmem and 'sync' pmem. The 'sync' flag needs to be unified
> across all members. I would change this argument to '0' and then
> arrange for it to be set at dm_table_supports_dax() time after
> validating that all components support synchronous dax.

o.k. Need to set 'DAXDEV_F_SYNC' flag after verifying all the target
components support synchronous DAX.

Just a question, If device mapper configuration have composed of both 
virtio-pmem or pmem devices, we want to configure device mapper for async flush?

Thank you,
Pankaj 
> 


[PATCH v8 3/6] libnvdimm: add dax_dev sync flag

2019-05-10 Thread Pankaj Gupta
This patch adds 'DAXDEV_SYNC' flag which is set
for nd_region doing synchronous flush. This later
is used to disable MAP_SYNC functionality for
ext4 & xfs filesystem for devices don't support
synchronous flush.

Signed-off-by: Pankaj Gupta 
---
 drivers/dax/bus.c|  2 +-
 drivers/dax/super.c  | 13 -
 drivers/md/dm.c  |  3 ++-
 drivers/nvdimm/pmem.c|  5 -
 drivers/nvdimm/region_devs.c |  7 +++
 include/linux/dax.h  |  8 ++--
 include/linux/libnvdimm.h|  1 +
 7 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 2109cfe80219..5f184e751c82 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -388,7 +388,7 @@ struct dev_dax *__devm_create_dev_dax(struct dax_region 
*dax_region, int id,
 * No 'host' or dax_operations since there is no access to this
 * device outside of mmap of the resulting character device.
 */
-   dax_dev = alloc_dax(dev_dax, NULL, NULL);
+   dax_dev = alloc_dax(dev_dax, NULL, NULL, DAXDEV_F_SYNC);
if (!dax_dev)
goto err;
 
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index bbd57ca0634a..b6c44b5062e9 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -186,6 +186,8 @@ enum dax_device_flags {
DAXDEV_ALIVE,
/* gate whether dax_flush() calls the low level flush routine */
DAXDEV_WRITE_CACHE,
+   /* flag to check if device supports synchronous flush */
+   DAXDEV_SYNC,
 };
 
 /**
@@ -354,6 +356,12 @@ bool dax_write_cache_enabled(struct dax_device *dax_dev)
 }
 EXPORT_SYMBOL_GPL(dax_write_cache_enabled);
 
+bool dax_synchronous(struct dax_device *dax_dev)
+{
+   return test_bit(DAXDEV_SYNC, _dev->flags);
+}
+EXPORT_SYMBOL_GPL(dax_synchronous);
+
 bool dax_alive(struct dax_device *dax_dev)
 {
lockdep_assert_held(_srcu);
@@ -508,7 +516,7 @@ static void dax_add_host(struct dax_device *dax_dev, const 
char *host)
 }
 
 struct dax_device *alloc_dax(void *private, const char *__host,
-   const struct dax_operations *ops)
+   const struct dax_operations *ops, unsigned long flags)
 {
struct dax_device *dax_dev;
const char *host;
@@ -531,6 +539,9 @@ struct dax_device *alloc_dax(void *private, const char 
*__host,
dax_add_host(dax_dev, host);
dax_dev->ops = ops;
dax_dev->private = private;
+   if (flags & DAXDEV_F_SYNC)
+   set_bit(DAXDEV_SYNC, _dev->flags);
+
return dax_dev;
 
  err_dev:
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 043f0761e4a0..ee007b75d9fd 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1969,7 +1969,8 @@ static struct mapped_device *alloc_dev(int minor)
sprintf(md->disk->disk_name, "dm-%d", minor);
 
if (IS_ENABLED(CONFIG_DAX_DRIVER)) {
-   dax_dev = alloc_dax(md, md->disk->disk_name, _dax_ops);
+   dax_dev = alloc_dax(md, md->disk->disk_name, _dax_ops,
+DAXDEV_F_SYNC);
if (!dax_dev)
goto bad;
}
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 0279eb1da3ef..bdbd2b414d3d 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -365,6 +365,7 @@ static int pmem_attach_disk(struct device *dev,
struct gendisk *disk;
void *addr;
int rc;
+   unsigned long flags = 0UL;
 
pmem = devm_kzalloc(dev, sizeof(*pmem), GFP_KERNEL);
if (!pmem)
@@ -462,7 +463,9 @@ static int pmem_attach_disk(struct device *dev,
nvdimm_badblocks_populate(nd_region, >bb, _res);
disk->bb = >bb;
 
-   dax_dev = alloc_dax(pmem, disk->disk_name, _dax_ops);
+   if (is_nvdimm_sync(nd_region))
+   flags = DAXDEV_F_SYNC;
+   dax_dev = alloc_dax(pmem, disk->disk_name, _dax_ops, flags);
if (!dax_dev) {
put_disk(disk);
return -ENOMEM;
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index b4ef7d9ff22e..f3ea5369d20a 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1197,6 +1197,13 @@ int nvdimm_has_cache(struct nd_region *nd_region)
 }
 EXPORT_SYMBOL_GPL(nvdimm_has_cache);
 
+bool is_nvdimm_sync(struct nd_region *nd_region)
+{
+   return is_nd_pmem(_region->dev) &&
+   !test_bit(ND_REGION_ASYNC, _region->flags);
+}
+EXPORT_SYMBOL_GPL(is_nvdimm_sync);
+
 struct conflict_context {
struct nd_region *nd_region;
resource_size_t start, size;
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 0dd316a74a29..ed75b7d9d178 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -7,6 +7,9 @@
 #include 
 #include 
 
+/* Flag for synchronous flush */
+#define DAXDEV_F_SYNC (1UL << 0)
+
 typedef unsigned long dax_entry_t;
 
 struct iomap_ops;
@@ -32,18 +35,19 @@ extern struct attribute_group