Re: [PATCH] dm: Call proper helper to determine dax support

2020-09-18 Thread Dan Williams
On Fri, Sep 18, 2020 at 5:41 AM Huang Adrian  wrote:
[..]
> Interestingly, the command `lvm2-testsuite --only pvmove-abort-all.sh`
> passed on vanilla 5.9-rc5 (on my two boxes).
> Is it the same symptom (call trace) with my reported one?
>
> Could you please run the above-mentioned command on your box (w/wo
> Jan's patch plus my fixup)?

Thanks for the exact reproduction details. I was hitting a different
hang, but I was able to reproduce this regression / latent bug. Fix is
here:

https://lore.kernel.org/linux-nvdimm/160045867590.25663.7548541079217827340.st...@dwillia2-desk3.amr.corp.intel.com/T/#u
___
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-le...@lists.01.org


Re: [PATCH] dm: Call proper helper to determine dax support

2020-09-18 Thread Huang Adrian
On Fri, Sep 18, 2020 at 8:41 PM Huang Adrian  wrote:
>
> On Fri, Sep 18, 2020 at 1:41 PM Dan Williams  wrote:
> >
> > On Thu, Sep 17, 2020 at 10:49 AM Dan Williams  
> > wrote:
> > >
> > > On Thu, Sep 17, 2020 at 7:58 AM Huang Adrian  
> > > wrote:
> > > >
> > > > On Thu, Sep 17, 2020 at 6:42 PM Jan Kara  wrote:
> > > > >
> > > > > On Thu 17-09-20 02:28:57, Dan Williams wrote:
> > > > > > On Wed, Sep 16, 2020 at 8:15 AM Jan Kara  wrote:
> > > > > > >
> > > > > > > DM was calling generic_fsdax_supported() to determine whether a 
> > > > > > > device
> > > > > > > referenced in the DM table supports DAX. However this is a helper 
> > > > > > > for "leaf" device drivers so that
> > > > > > > they don't have to duplicate common generic checks. High level 
> > > > > > > code
> > > > > > > should call dax_supported() helper which that calls into 
> > > > > > > appropriate
> > > > > > > helper for the particular device. This problem manifested itself 
> > > > > > > as
> > > > > > > kernel messages:
> > > > > > >
> > > > > > > dm-3: error: dax access failed (-95)
> > > > > > >
> > > > > > > when lvm2-testsuite run in cases where a DM device was stacked on 
> > > > > > > top of
> > > > > > > another DM device.
> > > > > > >
> > > > > > > Fixes: 7bf7eac8d648 ("dax: Arrange for dax_supported check to 
> > > > > > > span multiple devices")
> > > > > > > Tested-by: Adrian Huang 
> > > > > > > Signed-off-by: Jan Kara 
> > > > > > > ---
> > > > > > >  drivers/dax/super.c   |  4 
> > > > > > >  drivers/md/dm-table.c |  3 +--
> > > > > > >  include/linux/dax.h   | 11 +--
> > > > > > >  3 files changed, 14 insertions(+), 4 deletions(-)
> > > > > > >
> > > > > > > This patch should go in together with Adrian's
> > > > > > > https://lore.kernel.org/linux-nvdimm/20200916133923.31-1-adrianhuang0...@gmail.com
> > > > > > >
> > > > > > > diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> > > > > > > index e5767c83ea23..b6284c5cae0a 100644
> > > > > > > --- a/drivers/dax/super.c
> > > > > > > +++ b/drivers/dax/super.c
> > > > > > > @@ -325,11 +325,15 @@ EXPORT_SYMBOL_GPL(dax_direct_access);
> > > > > > >  bool dax_supported(struct dax_device *dax_dev, struct 
> > > > > > > block_device *bdev,
> > > > > > > int blocksize, sector_t start, sector_t len)
> > > > > > >  {
> > > > > > > +   if (!dax_dev)
> > > > > > > +   return false;
> > > > > > > +
> > > > > >
> > > > > > Hi Jan, Thanks for this.
> > > > > >
> > > > > > > if (!dax_alive(dax_dev))
> > > > > > > return false;
> > > > > >
> > > > > > One small fixup to quiet lockdep because dax_supported() calls
> > > > > > dax_alive() it expects that dax_read_lock() is held. So I'm testing
> > > > > > with this incremental change:
> > > > > >
> > > > > > diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> > > > > > index bed1ff0744ec..229f461e7def 100644
> > > > > > --- a/drivers/md/dm-table.c
> > > > > > +++ b/drivers/md/dm-table.c
> > > > > > @@ -860,9 +860,14 @@ EXPORT_SYMBOL_GPL(dm_table_set_type);
> > > > > >  int device_supports_dax(struct dm_target *ti, struct dm_dev *dev,
> > > > > > sector_t start, sector_t len, void *data)
> > > > > >  {
> > > > > > -   int blocksize = *(int *) data;
> > > > > > +   int blocksize = *(int *) data, id;
> > > > > > +   bool rc;
> > > > > >
> > > > > > -   return dax_supported(dev->dax_dev, dev->bdev, blocksize, 
> > > > > > start, len);
> > > > > > +   id = dax_read_lock();
> > > > > > +   rc = dax_supported(dev->dax_dev, dev->bdev, blocksize, 
> > > > > > start, len);
> > > > > > +   dax_read_unlock(id);
> > > > > > +
> > > > > > +   return rc;
> > > > > >  }
> > > > >
> > > > > Yeah, thanks for this! I was actually looking into this when writing 
> > > > > the
> > > > > patch and somehow convinced myself we will always be called through
> > > > > bdev_dax_supported() which does dax_read_lock() for us. But 
> > > > > apparently I
> > > > > was wrong...
> > > >
> > > > Hold on. This patch hit another regression when I ran the full test of
> > > > the lvm2-testsuite tool today.
> > >
> > > Are you sure it's this patch?
> > >
> > > The dax_read_lock() should have zero interaction with the
> > > synchronize_srcu() that __dm_suspend() performs. The too srcu domains
> > > should not conflict... I don't even see a dax_read_lock() in this
> > > path.
> >
> > Confirmed. I hit the hang in the lvm2-testsuite on vanilla 5.9-rc5.
> > So, I'm going to proceed with Jan's patch plus my fixup.
>
> Interestingly, the command `lvm2-testsuite --only pvmove-abort-all.sh`
> passed on vanilla 5.9-rc5 (on my two boxes).
> Is it the same symptom (call trace) with my reported one?
>
> Could you please run the above-mentioned command on your box (w/wo
> Jan's patch plus my fixup)?

Sorry, it should be "w/wo Jan's patch plus Dan's fixup" (I just copied
and pasted without modification).

-- Adrian

Re: [PATCH] dm: Call proper helper to determine dax support

2020-09-18 Thread Huang Adrian
On Fri, Sep 18, 2020 at 1:41 PM Dan Williams  wrote:
>
> On Thu, Sep 17, 2020 at 10:49 AM Dan Williams  
> wrote:
> >
> > On Thu, Sep 17, 2020 at 7:58 AM Huang Adrian  
> > wrote:
> > >
> > > On Thu, Sep 17, 2020 at 6:42 PM Jan Kara  wrote:
> > > >
> > > > On Thu 17-09-20 02:28:57, Dan Williams wrote:
> > > > > On Wed, Sep 16, 2020 at 8:15 AM Jan Kara  wrote:
> > > > > >
> > > > > > DM was calling generic_fsdax_supported() to determine whether a 
> > > > > > device
> > > > > > referenced in the DM table supports DAX. However this is a helper 
> > > > > > for "leaf" device drivers so that
> > > > > > they don't have to duplicate common generic checks. High level code
> > > > > > should call dax_supported() helper which that calls into appropriate
> > > > > > helper for the particular device. This problem manifested itself as
> > > > > > kernel messages:
> > > > > >
> > > > > > dm-3: error: dax access failed (-95)
> > > > > >
> > > > > > when lvm2-testsuite run in cases where a DM device was stacked on 
> > > > > > top of
> > > > > > another DM device.
> > > > > >
> > > > > > Fixes: 7bf7eac8d648 ("dax: Arrange for dax_supported check to span 
> > > > > > multiple devices")
> > > > > > Tested-by: Adrian Huang 
> > > > > > Signed-off-by: Jan Kara 
> > > > > > ---
> > > > > >  drivers/dax/super.c   |  4 
> > > > > >  drivers/md/dm-table.c |  3 +--
> > > > > >  include/linux/dax.h   | 11 +--
> > > > > >  3 files changed, 14 insertions(+), 4 deletions(-)
> > > > > >
> > > > > > This patch should go in together with Adrian's
> > > > > > https://lore.kernel.org/linux-nvdimm/20200916133923.31-1-adrianhuang0...@gmail.com
> > > > > >
> > > > > > diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> > > > > > index e5767c83ea23..b6284c5cae0a 100644
> > > > > > --- a/drivers/dax/super.c
> > > > > > +++ b/drivers/dax/super.c
> > > > > > @@ -325,11 +325,15 @@ EXPORT_SYMBOL_GPL(dax_direct_access);
> > > > > >  bool dax_supported(struct dax_device *dax_dev, struct block_device 
> > > > > > *bdev,
> > > > > > int blocksize, sector_t start, sector_t len)
> > > > > >  {
> > > > > > +   if (!dax_dev)
> > > > > > +   return false;
> > > > > > +
> > > > >
> > > > > Hi Jan, Thanks for this.
> > > > >
> > > > > > if (!dax_alive(dax_dev))
> > > > > > return false;
> > > > >
> > > > > One small fixup to quiet lockdep because dax_supported() calls
> > > > > dax_alive() it expects that dax_read_lock() is held. So I'm testing
> > > > > with this incremental change:
> > > > >
> > > > > diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> > > > > index bed1ff0744ec..229f461e7def 100644
> > > > > --- a/drivers/md/dm-table.c
> > > > > +++ b/drivers/md/dm-table.c
> > > > > @@ -860,9 +860,14 @@ EXPORT_SYMBOL_GPL(dm_table_set_type);
> > > > >  int device_supports_dax(struct dm_target *ti, struct dm_dev *dev,
> > > > > sector_t start, sector_t len, void *data)
> > > > >  {
> > > > > -   int blocksize = *(int *) data;
> > > > > +   int blocksize = *(int *) data, id;
> > > > > +   bool rc;
> > > > >
> > > > > -   return dax_supported(dev->dax_dev, dev->bdev, blocksize, 
> > > > > start, len);
> > > > > +   id = dax_read_lock();
> > > > > +   rc = dax_supported(dev->dax_dev, dev->bdev, blocksize, start, 
> > > > > len);
> > > > > +   dax_read_unlock(id);
> > > > > +
> > > > > +   return rc;
> > > > >  }
> > > >
> > > > Yeah, thanks for this! I was actually looking into this when writing the
> > > > patch and somehow convinced myself we will always be called through
> > > > bdev_dax_supported() which does dax_read_lock() for us. But apparently I
> > > > was wrong...
> > >
> > > Hold on. This patch hit another regression when I ran the full test of
> > > the lvm2-testsuite tool today.
> >
> > Are you sure it's this patch?
> >
> > The dax_read_lock() should have zero interaction with the
> > synchronize_srcu() that __dm_suspend() performs. The too srcu domains
> > should not conflict... I don't even see a dax_read_lock() in this
> > path.
>
> Confirmed. I hit the hang in the lvm2-testsuite on vanilla 5.9-rc5.
> So, I'm going to proceed with Jan's patch plus my fixup.

Interestingly, the command `lvm2-testsuite --only pvmove-abort-all.sh`
passed on vanilla 5.9-rc5 (on my two boxes).
Is it the same symptom (call trace) with my reported one?

Could you please run the above-mentioned command on your box (w/wo
Jan's patch plus my fixup)?

-- Adrian
___
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-le...@lists.01.org


Re: [PATCH] dm: Call proper helper to determine dax support

2020-09-18 Thread Huang Adrian
On Fri, Sep 18, 2020 at 1:49 AM Dan Williams  wrote:
>
> On Thu, Sep 17, 2020 at 7:58 AM Huang Adrian  
> wrote:
> >
> > On Thu, Sep 17, 2020 at 6:42 PM Jan Kara  wrote:
> > >
> > > On Thu 17-09-20 02:28:57, Dan Williams wrote:
> > > > On Wed, Sep 16, 2020 at 8:15 AM Jan Kara  wrote:
> > > > >
> > > > > DM was calling generic_fsdax_supported() to determine whether a device
> > > > > referenced in the DM table supports DAX. However this is a helper for 
> > > > > "leaf" device drivers so that
> > > > > they don't have to duplicate common generic checks. High level code
> > > > > should call dax_supported() helper which that calls into appropriate
> > > > > helper for the particular device. This problem manifested itself as
> > > > > kernel messages:
> > > > >
> > > > > dm-3: error: dax access failed (-95)
> > > > >
> > > > > when lvm2-testsuite run in cases where a DM device was stacked on top 
> > > > > of
> > > > > another DM device.
> > > > >
> > > > > Fixes: 7bf7eac8d648 ("dax: Arrange for dax_supported check to span 
> > > > > multiple devices")
> > > > > Tested-by: Adrian Huang 
> > > > > Signed-off-by: Jan Kara 
> > > > > ---
> > > > >  drivers/dax/super.c   |  4 
> > > > >  drivers/md/dm-table.c |  3 +--
> > > > >  include/linux/dax.h   | 11 +--
> > > > >  3 files changed, 14 insertions(+), 4 deletions(-)
> > > > >
> > > > > This patch should go in together with Adrian's
> > > > > https://lore.kernel.org/linux-nvdimm/20200916133923.31-1-adrianhuang0...@gmail.com
> > > > >
> > > > > diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> > > > > index e5767c83ea23..b6284c5cae0a 100644
> > > > > --- a/drivers/dax/super.c
> > > > > +++ b/drivers/dax/super.c
> > > > > @@ -325,11 +325,15 @@ EXPORT_SYMBOL_GPL(dax_direct_access);
> > > > >  bool dax_supported(struct dax_device *dax_dev, struct block_device 
> > > > > *bdev,
> > > > > int blocksize, sector_t start, sector_t len)
> > > > >  {
> > > > > +   if (!dax_dev)
> > > > > +   return false;
> > > > > +
> > > >
> > > > Hi Jan, Thanks for this.
> > > >
> > > > > if (!dax_alive(dax_dev))
> > > > > return false;
> > > >
> > > > One small fixup to quiet lockdep because dax_supported() calls
> > > > dax_alive() it expects that dax_read_lock() is held. So I'm testing
> > > > with this incremental change:
> > > >
> > > > diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> > > > index bed1ff0744ec..229f461e7def 100644
> > > > --- a/drivers/md/dm-table.c
> > > > +++ b/drivers/md/dm-table.c
> > > > @@ -860,9 +860,14 @@ EXPORT_SYMBOL_GPL(dm_table_set_type);
> > > >  int device_supports_dax(struct dm_target *ti, struct dm_dev *dev,
> > > > sector_t start, sector_t len, void *data)
> > > >  {
> > > > -   int blocksize = *(int *) data;
> > > > +   int blocksize = *(int *) data, id;
> > > > +   bool rc;
> > > >
> > > > -   return dax_supported(dev->dax_dev, dev->bdev, blocksize, start, 
> > > > len);
> > > > +   id = dax_read_lock();
> > > > +   rc = dax_supported(dev->dax_dev, dev->bdev, blocksize, start, 
> > > > len);
> > > > +   dax_read_unlock(id);
> > > > +
> > > > +   return rc;
> > > >  }
> > >
> > > Yeah, thanks for this! I was actually looking into this when writing the
> > > patch and somehow convinced myself we will always be called through
> > > bdev_dax_supported() which does dax_read_lock() for us. But apparently I
> > > was wrong...
> >
> > Hold on. This patch hit another regression when I ran the full test of
> > the lvm2-testsuite tool today.
>
> Are you sure it's this patch?

I'm pretty sure I applied this patch with your fixup. I tested it for
three times:
1. `lvm2-testsuite --only pvmove-abort-all.sh` is always passed
without the patch.
2. `lvm2-testsuite --only pvmove-abort-all.sh` is always blocked
with the patch.

> The dax_read_lock() should have zero interaction with the
> synchronize_srcu() that __dm_suspend() performs. The too srcu domains
> should not conflict... I don't even see a dax_read_lock() in this
> path.

Yup, I understand your observation. The call trace didn't show the
dax_read_lock().

-- Adrian
___
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-le...@lists.01.org


Re: [PATCH] dm: Call proper helper to determine dax support

2020-09-17 Thread Dan Williams
On Thu, Sep 17, 2020 at 10:49 AM Dan Williams  wrote:
>
> On Thu, Sep 17, 2020 at 7:58 AM Huang Adrian  
> wrote:
> >
> > On Thu, Sep 17, 2020 at 6:42 PM Jan Kara  wrote:
> > >
> > > On Thu 17-09-20 02:28:57, Dan Williams wrote:
> > > > On Wed, Sep 16, 2020 at 8:15 AM Jan Kara  wrote:
> > > > >
> > > > > DM was calling generic_fsdax_supported() to determine whether a device
> > > > > referenced in the DM table supports DAX. However this is a helper for 
> > > > > "leaf" device drivers so that
> > > > > they don't have to duplicate common generic checks. High level code
> > > > > should call dax_supported() helper which that calls into appropriate
> > > > > helper for the particular device. This problem manifested itself as
> > > > > kernel messages:
> > > > >
> > > > > dm-3: error: dax access failed (-95)
> > > > >
> > > > > when lvm2-testsuite run in cases where a DM device was stacked on top 
> > > > > of
> > > > > another DM device.
> > > > >
> > > > > Fixes: 7bf7eac8d648 ("dax: Arrange for dax_supported check to span 
> > > > > multiple devices")
> > > > > Tested-by: Adrian Huang 
> > > > > Signed-off-by: Jan Kara 
> > > > > ---
> > > > >  drivers/dax/super.c   |  4 
> > > > >  drivers/md/dm-table.c |  3 +--
> > > > >  include/linux/dax.h   | 11 +--
> > > > >  3 files changed, 14 insertions(+), 4 deletions(-)
> > > > >
> > > > > This patch should go in together with Adrian's
> > > > > https://lore.kernel.org/linux-nvdimm/20200916133923.31-1-adrianhuang0...@gmail.com
> > > > >
> > > > > diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> > > > > index e5767c83ea23..b6284c5cae0a 100644
> > > > > --- a/drivers/dax/super.c
> > > > > +++ b/drivers/dax/super.c
> > > > > @@ -325,11 +325,15 @@ EXPORT_SYMBOL_GPL(dax_direct_access);
> > > > >  bool dax_supported(struct dax_device *dax_dev, struct block_device 
> > > > > *bdev,
> > > > > int blocksize, sector_t start, sector_t len)
> > > > >  {
> > > > > +   if (!dax_dev)
> > > > > +   return false;
> > > > > +
> > > >
> > > > Hi Jan, Thanks for this.
> > > >
> > > > > if (!dax_alive(dax_dev))
> > > > > return false;
> > > >
> > > > One small fixup to quiet lockdep because dax_supported() calls
> > > > dax_alive() it expects that dax_read_lock() is held. So I'm testing
> > > > with this incremental change:
> > > >
> > > > diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> > > > index bed1ff0744ec..229f461e7def 100644
> > > > --- a/drivers/md/dm-table.c
> > > > +++ b/drivers/md/dm-table.c
> > > > @@ -860,9 +860,14 @@ EXPORT_SYMBOL_GPL(dm_table_set_type);
> > > >  int device_supports_dax(struct dm_target *ti, struct dm_dev *dev,
> > > > sector_t start, sector_t len, void *data)
> > > >  {
> > > > -   int blocksize = *(int *) data;
> > > > +   int blocksize = *(int *) data, id;
> > > > +   bool rc;
> > > >
> > > > -   return dax_supported(dev->dax_dev, dev->bdev, blocksize, start, 
> > > > len);
> > > > +   id = dax_read_lock();
> > > > +   rc = dax_supported(dev->dax_dev, dev->bdev, blocksize, start, 
> > > > len);
> > > > +   dax_read_unlock(id);
> > > > +
> > > > +   return rc;
> > > >  }
> > >
> > > Yeah, thanks for this! I was actually looking into this when writing the
> > > patch and somehow convinced myself we will always be called through
> > > bdev_dax_supported() which does dax_read_lock() for us. But apparently I
> > > was wrong...
> >
> > Hold on. This patch hit another regression when I ran the full test of
> > the lvm2-testsuite tool today.
>
> Are you sure it's this patch?
>
> The dax_read_lock() should have zero interaction with the
> synchronize_srcu() that __dm_suspend() performs. The too srcu domains
> should not conflict... I don't even see a dax_read_lock() in this
> path.

Confirmed. I hit the hang in the lvm2-testsuite on vanilla 5.9-rc5.
So, I'm going to proceed with Jan's patch plus my fixup.
___
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-le...@lists.01.org


Re: [PATCH] dm: Call proper helper to determine dax support

2020-09-17 Thread Dan Williams
On Thu, Sep 17, 2020 at 7:58 AM Huang Adrian  wrote:
>
> On Thu, Sep 17, 2020 at 6:42 PM Jan Kara  wrote:
> >
> > On Thu 17-09-20 02:28:57, Dan Williams wrote:
> > > On Wed, Sep 16, 2020 at 8:15 AM Jan Kara  wrote:
> > > >
> > > > DM was calling generic_fsdax_supported() to determine whether a device
> > > > referenced in the DM table supports DAX. However this is a helper for 
> > > > "leaf" device drivers so that
> > > > they don't have to duplicate common generic checks. High level code
> > > > should call dax_supported() helper which that calls into appropriate
> > > > helper for the particular device. This problem manifested itself as
> > > > kernel messages:
> > > >
> > > > dm-3: error: dax access failed (-95)
> > > >
> > > > when lvm2-testsuite run in cases where a DM device was stacked on top of
> > > > another DM device.
> > > >
> > > > Fixes: 7bf7eac8d648 ("dax: Arrange for dax_supported check to span 
> > > > multiple devices")
> > > > Tested-by: Adrian Huang 
> > > > Signed-off-by: Jan Kara 
> > > > ---
> > > >  drivers/dax/super.c   |  4 
> > > >  drivers/md/dm-table.c |  3 +--
> > > >  include/linux/dax.h   | 11 +--
> > > >  3 files changed, 14 insertions(+), 4 deletions(-)
> > > >
> > > > This patch should go in together with Adrian's
> > > > https://lore.kernel.org/linux-nvdimm/20200916133923.31-1-adrianhuang0...@gmail.com
> > > >
> > > > diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> > > > index e5767c83ea23..b6284c5cae0a 100644
> > > > --- a/drivers/dax/super.c
> > > > +++ b/drivers/dax/super.c
> > > > @@ -325,11 +325,15 @@ EXPORT_SYMBOL_GPL(dax_direct_access);
> > > >  bool dax_supported(struct dax_device *dax_dev, struct block_device 
> > > > *bdev,
> > > > int blocksize, sector_t start, sector_t len)
> > > >  {
> > > > +   if (!dax_dev)
> > > > +   return false;
> > > > +
> > >
> > > Hi Jan, Thanks for this.
> > >
> > > > if (!dax_alive(dax_dev))
> > > > return false;
> > >
> > > One small fixup to quiet lockdep because dax_supported() calls
> > > dax_alive() it expects that dax_read_lock() is held. So I'm testing
> > > with this incremental change:
> > >
> > > diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> > > index bed1ff0744ec..229f461e7def 100644
> > > --- a/drivers/md/dm-table.c
> > > +++ b/drivers/md/dm-table.c
> > > @@ -860,9 +860,14 @@ EXPORT_SYMBOL_GPL(dm_table_set_type);
> > >  int device_supports_dax(struct dm_target *ti, struct dm_dev *dev,
> > > sector_t start, sector_t len, void *data)
> > >  {
> > > -   int blocksize = *(int *) data;
> > > +   int blocksize = *(int *) data, id;
> > > +   bool rc;
> > >
> > > -   return dax_supported(dev->dax_dev, dev->bdev, blocksize, start, 
> > > len);
> > > +   id = dax_read_lock();
> > > +   rc = dax_supported(dev->dax_dev, dev->bdev, blocksize, start, 
> > > len);
> > > +   dax_read_unlock(id);
> > > +
> > > +   return rc;
> > >  }
> >
> > Yeah, thanks for this! I was actually looking into this when writing the
> > patch and somehow convinced myself we will always be called through
> > bdev_dax_supported() which does dax_read_lock() for us. But apparently I
> > was wrong...
>
> Hold on. This patch hit another regression when I ran the full test of
> the lvm2-testsuite tool today.

Are you sure it's this patch?

The dax_read_lock() should have zero interaction with the
synchronize_srcu() that __dm_suspend() performs. The too srcu domains
should not conflict... I don't even see a dax_read_lock() in this
path.
___
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-le...@lists.01.org


Re: [PATCH] dm: Call proper helper to determine dax support

2020-09-17 Thread Huang Adrian
On Thu, Sep 17, 2020 at 10:57 PM Huang Adrian  wrote:
>
> Note: Still no lock after applying Dan's fixup. It shows the same call
> trace with/without Dan's fixup.

Sorry, fat-finger. Should be 'Still no *luck*'

-- Adrian
___
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-le...@lists.01.org


Re: [PATCH] dm: Call proper helper to determine dax support

2020-09-17 Thread Huang Adrian
On Thu, Sep 17, 2020 at 6:42 PM Jan Kara  wrote:
>
> On Thu 17-09-20 02:28:57, Dan Williams wrote:
> > On Wed, Sep 16, 2020 at 8:15 AM Jan Kara  wrote:
> > >
> > > DM was calling generic_fsdax_supported() to determine whether a device
> > > referenced in the DM table supports DAX. However this is a helper for 
> > > "leaf" device drivers so that
> > > they don't have to duplicate common generic checks. High level code
> > > should call dax_supported() helper which that calls into appropriate
> > > helper for the particular device. This problem manifested itself as
> > > kernel messages:
> > >
> > > dm-3: error: dax access failed (-95)
> > >
> > > when lvm2-testsuite run in cases where a DM device was stacked on top of
> > > another DM device.
> > >
> > > Fixes: 7bf7eac8d648 ("dax: Arrange for dax_supported check to span 
> > > multiple devices")
> > > Tested-by: Adrian Huang 
> > > Signed-off-by: Jan Kara 
> > > ---
> > >  drivers/dax/super.c   |  4 
> > >  drivers/md/dm-table.c |  3 +--
> > >  include/linux/dax.h   | 11 +--
> > >  3 files changed, 14 insertions(+), 4 deletions(-)
> > >
> > > This patch should go in together with Adrian's
> > > https://lore.kernel.org/linux-nvdimm/20200916133923.31-1-adrianhuang0...@gmail.com
> > >
> > > diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> > > index e5767c83ea23..b6284c5cae0a 100644
> > > --- a/drivers/dax/super.c
> > > +++ b/drivers/dax/super.c
> > > @@ -325,11 +325,15 @@ EXPORT_SYMBOL_GPL(dax_direct_access);
> > >  bool dax_supported(struct dax_device *dax_dev, struct block_device *bdev,
> > > int blocksize, sector_t start, sector_t len)
> > >  {
> > > +   if (!dax_dev)
> > > +   return false;
> > > +
> >
> > Hi Jan, Thanks for this.
> >
> > > if (!dax_alive(dax_dev))
> > > return false;
> >
> > One small fixup to quiet lockdep because dax_supported() calls
> > dax_alive() it expects that dax_read_lock() is held. So I'm testing
> > with this incremental change:
> >
> > diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> > index bed1ff0744ec..229f461e7def 100644
> > --- a/drivers/md/dm-table.c
> > +++ b/drivers/md/dm-table.c
> > @@ -860,9 +860,14 @@ EXPORT_SYMBOL_GPL(dm_table_set_type);
> >  int device_supports_dax(struct dm_target *ti, struct dm_dev *dev,
> > sector_t start, sector_t len, void *data)
> >  {
> > -   int blocksize = *(int *) data;
> > +   int blocksize = *(int *) data, id;
> > +   bool rc;
> >
> > -   return dax_supported(dev->dax_dev, dev->bdev, blocksize, start, 
> > len);
> > +   id = dax_read_lock();
> > +   rc = dax_supported(dev->dax_dev, dev->bdev, blocksize, start, len);
> > +   dax_read_unlock(id);
> > +
> > +   return rc;
> >  }
>
> Yeah, thanks for this! I was actually looking into this when writing the
> patch and somehow convinced myself we will always be called through
> bdev_dax_supported() which does dax_read_lock() for us. But apparently I
> was wrong...

Hold on. This patch hit another regression when I ran the full test of
the lvm2-testsuite tool today.

After checking the test log, the task was blocked for more than 120
seconds when running the command 'pvmove --abort' of the script
'pvmove-abort-all.sh'.

Note: Still no lock after applying Dan's fixup. It shows the same call
trace with/without Dan's fixup.

[  375.623646] INFO: task lvm:12573 blocked for more than 122 seconds.
[  375.630685]   Not tainted 5.9.0-rc5+ #25
[  375.635467] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  375.644223] task:lvm state:D stack:0 pid:12573
ppid: 1 flags:0x0080
[  375.653561] Call Trace:
[  375.656342]  __schedule+0x278/0x740
[  375.660261]  ? ttwu_do_wakeup+0x19/0x150
[  375.664653]  schedule+0x40/0xb0
[  375.668173]  schedule_timeout+0x25f/0x300
[  375.672665]  ? __queue_work+0x13a/0x3e0
[  375.676950]  wait_for_completion+0x8d/0xf0
[  375.681538]  __synchronize_srcu.part.21+0x81/0xb0
[  375.686804]  ? __bpf_trace_rcu_utilization+0x10/0x10
[  375.692356]  ? synchronize_srcu+0x59/0xf0
[  375.696877]  __dm_suspend+0x56/0x1d0 [dm_mod]
[  375.701759]  ? table_load+0x2e0/0x2e0 [dm_mod]
[  375.706735]  dm_suspend+0xa5/0xd0 [dm_mod]
[  375.711324]  dev_suspend+0x14d/0x290 [dm_mod]
[  375.716202]  ctl_ioctl+0x1af/0x420 [dm_mod]
[  375.720892]  ? iomap_write_begin+0x4c0/0x4c0
[  375.725675]  dm_ctl_ioctl+0xa/0x10 [dm_mod]
[  375.730352]  __x64_sys_ioctl+0x84/0xb1
[  375.734551]  do_syscall_64+0x33/0x40
[  375.738557]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  375.744212] RIP: 0033:0x7f0106eee87b
[  375.748206] Code: Bad RIP value.
[  375.751822] RSP: 002b:7ffe10ff4498 EFLAGS: 0206 ORIG_RAX:
0010
[  375.760286] RAX: ffda RBX: 564c3aebb260 RCX: 7f0106eee87b
[  375.768262] RDX: 564c3c4220c0 RSI: c138fd06 RDI: 0004
[  375.776237] RBP: 564c3af694fe R08: 564c3c3461f0 R09: 

Re: [PATCH] dm: Call proper helper to determine dax support

2020-09-17 Thread Jan Kara
On Thu 17-09-20 02:28:57, Dan Williams wrote:
> On Wed, Sep 16, 2020 at 8:15 AM Jan Kara  wrote:
> >
> > DM was calling generic_fsdax_supported() to determine whether a device
> > referenced in the DM table supports DAX. However this is a helper for 
> > "leaf" device drivers so that
> > they don't have to duplicate common generic checks. High level code
> > should call dax_supported() helper which that calls into appropriate
> > helper for the particular device. This problem manifested itself as
> > kernel messages:
> >
> > dm-3: error: dax access failed (-95)
> >
> > when lvm2-testsuite run in cases where a DM device was stacked on top of
> > another DM device.
> >
> > Fixes: 7bf7eac8d648 ("dax: Arrange for dax_supported check to span multiple 
> > devices")
> > Tested-by: Adrian Huang 
> > Signed-off-by: Jan Kara 
> > ---
> >  drivers/dax/super.c   |  4 
> >  drivers/md/dm-table.c |  3 +--
> >  include/linux/dax.h   | 11 +--
> >  3 files changed, 14 insertions(+), 4 deletions(-)
> >
> > This patch should go in together with Adrian's
> > https://lore.kernel.org/linux-nvdimm/20200916133923.31-1-adrianhuang0...@gmail.com
> >
> > diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> > index e5767c83ea23..b6284c5cae0a 100644
> > --- a/drivers/dax/super.c
> > +++ b/drivers/dax/super.c
> > @@ -325,11 +325,15 @@ EXPORT_SYMBOL_GPL(dax_direct_access);
> >  bool dax_supported(struct dax_device *dax_dev, struct block_device *bdev,
> > int blocksize, sector_t start, sector_t len)
> >  {
> > +   if (!dax_dev)
> > +   return false;
> > +
> 
> Hi Jan, Thanks for this.
> 
> > if (!dax_alive(dax_dev))
> > return false;
> 
> One small fixup to quiet lockdep because dax_supported() calls
> dax_alive() it expects that dax_read_lock() is held. So I'm testing
> with this incremental change:
> 
> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index bed1ff0744ec..229f461e7def 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -860,9 +860,14 @@ EXPORT_SYMBOL_GPL(dm_table_set_type);
>  int device_supports_dax(struct dm_target *ti, struct dm_dev *dev,
> sector_t start, sector_t len, void *data)
>  {
> -   int blocksize = *(int *) data;
> +   int blocksize = *(int *) data, id;
> +   bool rc;
> 
> -   return dax_supported(dev->dax_dev, dev->bdev, blocksize, start, len);
> +   id = dax_read_lock();
> +   rc = dax_supported(dev->dax_dev, dev->bdev, blocksize, start, len);
> +   dax_read_unlock(id);
> +
> +   return rc;
>  }

Yeah, thanks for this! I was actually looking into this when writing the
patch and somehow convinced myself we will always be called through
bdev_dax_supported() which does dax_read_lock() for us. But apparently I
was wrong...

Honza
-- 
Jan Kara 
SUSE Labs, CR
___
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-le...@lists.01.org


Re: [PATCH] dm: Call proper helper to determine dax support

2020-09-17 Thread Dan Williams
On Wed, Sep 16, 2020 at 8:15 AM Jan Kara  wrote:
>
> DM was calling generic_fsdax_supported() to determine whether a device
> referenced in the DM table supports DAX. However this is a helper for "leaf" 
> device drivers so that
> they don't have to duplicate common generic checks. High level code
> should call dax_supported() helper which that calls into appropriate
> helper for the particular device. This problem manifested itself as
> kernel messages:
>
> dm-3: error: dax access failed (-95)
>
> when lvm2-testsuite run in cases where a DM device was stacked on top of
> another DM device.
>
> Fixes: 7bf7eac8d648 ("dax: Arrange for dax_supported check to span multiple 
> devices")
> Tested-by: Adrian Huang 
> Signed-off-by: Jan Kara 
> ---
>  drivers/dax/super.c   |  4 
>  drivers/md/dm-table.c |  3 +--
>  include/linux/dax.h   | 11 +--
>  3 files changed, 14 insertions(+), 4 deletions(-)
>
> This patch should go in together with Adrian's
> https://lore.kernel.org/linux-nvdimm/20200916133923.31-1-adrianhuang0...@gmail.com
>
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index e5767c83ea23..b6284c5cae0a 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -325,11 +325,15 @@ EXPORT_SYMBOL_GPL(dax_direct_access);
>  bool dax_supported(struct dax_device *dax_dev, struct block_device *bdev,
> int blocksize, sector_t start, sector_t len)
>  {
> +   if (!dax_dev)
> +   return false;
> +

Hi Jan, Thanks for this.

> if (!dax_alive(dax_dev))
> return false;

One small fixup to quiet lockdep because dax_supported() calls
dax_alive() it expects that dax_read_lock() is held. So I'm testing
with this incremental change:

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index bed1ff0744ec..229f461e7def 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -860,9 +860,14 @@ EXPORT_SYMBOL_GPL(dm_table_set_type);
 int device_supports_dax(struct dm_target *ti, struct dm_dev *dev,
sector_t start, sector_t len, void *data)
 {
-   int blocksize = *(int *) data;
+   int blocksize = *(int *) data, id;
+   bool rc;

-   return dax_supported(dev->dax_dev, dev->bdev, blocksize, start, len);
+   id = dax_read_lock();
+   rc = dax_supported(dev->dax_dev, dev->bdev, blocksize, start, len);
+   dax_read_unlock(id);
+
+   return rc;
 }
___
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-le...@lists.01.org