Re: [dm-devel] [PATCH 1/2] libmultipath: hwhandler auto-detection for ALUA

2018-04-03 Thread Benjamin Marzinski
On Tue, Apr 03, 2018 at 10:53:29PM +0200, Martin Wilck wrote:
> On Tue, 2018-04-03 at 15:31 -0500, Benjamin Marzinski wrote:
> > On Tue, Mar 27, 2018 at 11:50:52PM +0200, Martin Wilck wrote:
> > > If the hardware handler isn't explicitly set, infer ALUA support
> > > from the pp->tpgs attribute. Likewise, if ALUA is selected, but
> > > not supported by the hardware, fall back to no hardware handler.
> > 
> > Weren't you worried before about temporary ALUA failures? If you had a
> > temporary failure while configuring a device that you explicitly set to
> > be ALUA, then this would cause the device to be misconfigured?
> 
> I believe that if TPGS is 0, the device will never be able to support
> ALUA. The kernel also looks at the TPGS bits and won't try ALUA if they
> are unset. Once the device is configured and actual ALUA RTPG/STPG
> calls are performed, they may fail for a variety of temporary reasons -
> I wanted to avoid resetting the prio algorithm to "const" for such
> cases. That's my understanding, correct me if I'm wrong.

Devices that were not correctly supporting ALUA returned > 0 for
get_target_port_group_support, so detect_alua actually does all the work
necessary to verify that it can get a priority. Without doing this,
multiple devices that didn't support ALUA were being detected as
supporting ALUA.
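A minimal sketch of the two-step check described above, modeled on this description rather than on libmultipath's actual `detect_alua()`: a path is only treated as ALUA-capable when it reports TPGS > 0 *and* the follow-up target-port-group queries succeed. `struct path_probe` and `effective_tpgs()` are hypothetical names.

```c
/* Hypothetical probe results for one path. In libmultipath the real
 * values come from SCSI INQUIRY and REPORT TARGET PORT GROUPS commands. */
struct path_probe {
	int tpgs;         /* TPGS field from INQUIRY, 0..3 */
	int tpg_id;       /* target port group id, < 0 if the lookup failed */
	int access_state; /* ALUA access state, < 0 if the query failed */
};

/* Return the TPGS value to record for the path, or 0 (no ALUA) when the
 * device claims ALUA support but cannot actually report its state. */
static int effective_tpgs(const struct path_probe *p)
{
	if (p->tpgs <= 0)
		return 0; /* device does not claim ALUA support */
	if (p->tpg_id < 0 || p->access_state < 0)
		return 0; /* claims ALUA, but the follow-up queries fail */
	return p->tpgs;
}
```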

> 
> > If the hardware handler isn't set, inferring ALUA is fine. But what is
> > the case where we want to say that a device that is explicitly set to
> > ALUA shouldn't actually be ALUA? It seems like if there is some
> > uncertainty, we should just not set the hardware handler, and allow
> > multipath to infer it via the pp->tpgs value.
> > 
> > I'm not strongly against this patch. I just don't see the value in
> > overriding an explicit configuration, if we believe that temporary
> > failures are possible.
> 
> That would be fine if we didn't have any explicit "hardware_handler
> alua" settings in the hardcoded hwtable any more, or at least if we're 
> positive that those devices where we have "hardware_handler alua"
> really support it.
> 
> We can also adopt the philosophy of "detect_prio" and "detect_checker",
> add an additional config file option "detect_hwhandler", and look at
> tpgs only if the latter is set (which would be the default). Like
> detect_prio, users could then enforce their config file settings with
> "detect_hwhandler no".
> 
> I was hoping we could find a simpler approach, without yet another
> rarely-used config option.
> 
> Btw, at SUSE we solved our problem with the controller at hand by
> simply removing "hardware_handler alua" and "prio alua" from the IBM
> IPR entry. If the scsi_dh_alua module is loaded early (default on
> SUSE), this results in ALUA hwhandler and sysfs prio being used for IPR
> controllers that do support ALUA, and no hwhandler / const prio =
> PRIO_UNDEF for those that don't. I'm not sure if that simple solution
> suits upstream, because upstream doesn't enforce early loading of
> device handler modules.

By using retain_attached_hwhandler at all, we are implicitly requiring
the scsi_dh_alua module to be loaded before devices with indeterminate
configurations are discovered for them to work correctly, right? For
instance, commit 715c48d93dd00930534ce6a55d0e3705466df5d6 did this for
netapp devices, and that was in 2013. I don't see how this is different.

-Ben

> Regards,
> Martin
> 
> 
> > 
> > -Ben
> > 
> > > 
> > > Signed-off-by: Martin Wilck 
> > > ---
> > >  libmultipath/propsel.c | 19 +++++++++++++++++--
> > >  1 file changed, 17 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/libmultipath/propsel.c b/libmultipath/propsel.c
> > > index 93974a482336..dc24450eb775 100644
> > > --- a/libmultipath/propsel.c
> > > +++ b/libmultipath/propsel.c
> > > @@ -43,10 +43,13 @@ do {						\
> > >  		goto out;					\
> > >  	}							\
> > >  } while(0)
> > > +
> > > +static char default_origin[] = "(setting: multipath internal)";
> > > +
> > >  #define do_default(dest, value)					\
> > >  do {								\
> > >  	dest = value;						\
> > > -	origin = "(setting: multipath internal)";		\
> > > +	origin = default_origin;				\
> > >  } while(0)
> > >  
> > >  #define mp_set_mpe(var)						\
> > > @@ -373,16 +376,20 @@ static int get_dh_state(struct path *pp, char *value, size_t value_len)
> > >  
> > >  int select_hwhandler(struct config *conf, struct multipath *mp)
> > >  {
> > > - char *origin;
> > > + const char *origin;
> > >   struct path *pp;
> > >   /* dh_state is no longer 

[dm-devel] [dm:for-next 31/31] drivers/md/dm-zoned-target.c:954:20: error: initialization from incompatible pointer type

2018-04-03 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git for-next
head:   a39e613506eca803cede2cf555d4d1dc485352e7
commit: a39e613506eca803cede2cf555d4d1dc485352e7 [31/31] dm: remove fmode_t argument from .prepare_ioctl hook
config: i386-randconfig-x015-201813 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        git checkout a39e613506eca803cede2cf555d4d1dc485352e7
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

>> drivers/md/dm-zoned-target.c:954:20: error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types]
     .prepare_ioctl  = dmz_prepare_ioctl,
                       ^
   drivers/md/dm-zoned-target.c:954:20: note: (near initialization for 'dmz_type.prepare_ioctl')
   cc1: some warnings being treated as errors

vim +954 drivers/md/dm-zoned-target.c

3b1a94c8 Damien Le Moal 2017-06-07  943  
3b1a94c8 Damien Le Moal 2017-06-07  944  static struct target_type dmz_type = {
3b1a94c8 Damien Le Moal 2017-06-07  945  	.name            = "zoned",
3b1a94c8 Damien Le Moal 2017-06-07  946  	.version         = {1, 0, 0},
3b1a94c8 Damien Le Moal 2017-06-07  947  	.features        = DM_TARGET_SINGLETON | DM_TARGET_ZONED_HM,
3b1a94c8 Damien Le Moal 2017-06-07  948  	.module          = THIS_MODULE,
3b1a94c8 Damien Le Moal 2017-06-07  949  	.ctr             = dmz_ctr,
3b1a94c8 Damien Le Moal 2017-06-07  950  	.dtr             = dmz_dtr,
3b1a94c8 Damien Le Moal 2017-06-07  951  	.map             = dmz_map,
3b1a94c8 Damien Le Moal 2017-06-07  952  	.end_io          = dmz_end_io,
3b1a94c8 Damien Le Moal 2017-06-07  953  	.io_hints        = dmz_io_hints,
3b1a94c8 Damien Le Moal 2017-06-07 @954  	.prepare_ioctl   = dmz_prepare_ioctl,
3b1a94c8 Damien Le Moal 2017-06-07  955  	.postsuspend     = dmz_suspend,
3b1a94c8 Damien Le Moal 2017-06-07  956  	.resume          = dmz_resume,
3b1a94c8 Damien Le Moal 2017-06-07  957  	.iterate_devices = dmz_iterate_devices,
3b1a94c8 Damien Le Moal 2017-06-07  958  };
3b1a94c8 Damien Le Moal 2017-06-07  959  

:: The code at line 954 was first introduced by commit
:: 3b1a94c88b798d4f3bd1a5b61f5c8fb9d987c242 dm zoned: drive-managed zoned block device target
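The error means `dmz_prepare_ioctl` still has the pre-change signature while the `.prepare_ioctl` member now expects the new one. A standalone sketch of the same situation, assuming (as the commit title suggests) that the only change is dropping the `fmode_t *` argument; `struct target_ops`, the stand-in types, and both function names are hypothetical, not the device-mapper API.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-ins for the kernel types involved. */
struct target;
struct bdev;

/* The hook type after the change: no mode argument any more. */
struct target_ops {
	int (*prepare_ioctl)(struct target *ti, struct bdev **bdev);
};

/*
 * An implementation still written against the old hook, e.g.
 *
 *   static int old_prepare_ioctl(struct target *ti, struct bdev **bdev,
 *                                unsigned int *mode);  // fmode_t * stand-in
 *
 * would make ".prepare_ioctl = old_prepare_ioctl" fail to build under
 * -Werror=incompatible-pointer-types. The fix is to drop the argument.
 */
static int new_prepare_ioctl(struct target *ti, struct bdev **bdev)
{
	(void)ti;
	*bdev = NULL; /* a real target would hand back its underlying device */
	return 0;
}

static const struct target_ops ops = {
	.prepare_ioctl = new_prepare_ioctl,
};
```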

:: TO: Damien Le Moal 
:: CC: Mike Snitzer 

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

Re: [dm-devel] [PATCH v9] dax, dm: introduce ->fs_{claim, release}() dax_device infrastructure

2018-04-03 Thread Mike Snitzer
On Tue, Apr 03 2018 at  4:36pm -0400,
Dan Williams  wrote:

> In preparation for allowing filesystems to augment the dev_pagemap
> associated with a dax_device, add an ->fs_claim() callback. The
> ->fs_claim() callback is leveraged by the device-mapper dax
> implementation to iterate all member devices in the map and repeat the
> claim operation across the array.
> 
> In order to resolve collisions between filesystem operations and DMA to
> DAX mapped pages we need a callback when DMA completes. With a callback
> we can hold off filesystem operations while DMA is in-flight and then
> resume those operations when the last put_page() occurs on a DMA page.
> The ->fs_claim() operation arranges for this callback to be registered,
> although that implementation is saved for a later patch.
> 
> Cc: Alasdair Kergon 
> Cc: Mike Snitzer 
> Cc: Matthew Wilcox 
> Cc: Ross Zwisler 
> Cc: "Jérôme Glisse" 
> Cc: Christoph Hellwig 
> Cc: Jan Kara 
> Signed-off-by: Dan Williams 
> ---
> Changes since v8:
> * make __fs_dax_claim and __fs_dax_release private to
>   drivers/dax/super.c
> 
> * rename dm_dax_iterate to dm_dax_interate_devices (Mike)
> 
> * drop the return value from dm_dax_interate_devices since nothing uses
>   it (Mike)

Acked-by: Mike Snitzer 



Re: [dm-devel] [PATCH 1/2] libmultipath: hwhandler auto-detection for ALUA

2018-04-03 Thread Martin Wilck
On Tue, 2018-04-03 at 15:31 -0500, Benjamin Marzinski wrote:
> On Tue, Mar 27, 2018 at 11:50:52PM +0200, Martin Wilck wrote:
> > If the hardware handler isn't explicitly set, infer ALUA support
> > from the pp->tpgs attribute. Likewise, if ALUA is selected, but
> > not supported by the hardware, fall back to no hardware handler.
> 
> Weren't you worried before about temporary ALUA failures? If you had a
> temporary failure while configuring a device that you explicitly set to
> be ALUA, then this would cause the device to be misconfigured?

I believe that if TPGS is 0, the device will never be able to support
ALUA. The kernel also looks at the TPGS bits and won't try ALUA if they
are unset. Once the device is configured and actual ALUA RTPG/STPG
calls are performed, they may fail for a variety of temporary reasons -
I wanted to avoid resetting the prio algorithm to "const" for such
cases. That's my understanding, correct me if I'm wrong.
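For reference, the TPGS bits mentioned here are the two-bit TPGS field of the standard SCSI INQUIRY response (byte 5, bits 5:4, per SPC). A minimal decoding sketch; the enum and function names are this sketch's own.

```c
#include <stddef.h>

/* TPGS field values as defined by SPC for standard INQUIRY data. */
enum tpgs {
	TPGS_NONE = 0,     /* ALUA not supported */
	TPGS_IMPLICIT = 1, /* implicit ALUA only */
	TPGS_EXPLICIT = 2, /* explicit ALUA (SET TARGET PORT GROUPS) only */
	TPGS_BOTH = 3,     /* implicit and explicit ALUA */
};

/* Extract TPGS from standard INQUIRY data (byte 5, bits 5:4).
 * Returns -1 if the buffer is too short to contain byte 5. */
static int inquiry_tpgs(const unsigned char *inq, size_t len)
{
	if (len < 6)
		return -1;
	return (inq[5] >> 4) & 0x3;
}
```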

> If the hardware handler isn't set, inferring ALUA is fine. But what is
> the case where we want to say that a device that is explicitly set to
> ALUA shouldn't actually be ALUA? It seems like if there is some
> uncertainty, we should just not set the hardware handler, and allow
> multipath to infer it via the pp->tpgs value.
> 
> I'm not strongly against this patch. I just don't see the value in
> overriding an explicit configuration, if we believe that temporary
> failures are possible.

That would be fine if we didn't have any explicit "hardware_handler
alua" settings in the hardcoded hwtable any more, or at least if we're 
positive that those devices where we have "hardware_handler alua"
really support it.

We can also adopt the philosophy of "detect_prio" and "detect_checker",
add an additional config file option "detect_hwhandler", and look at
tpgs only if the latter is set (which would be the default). Like
detect_prio, users could then enforce their config file settings with
"detect_hwhandler no".

I was hoping we could find a simpler approach, without yet another
rarely-used config option.
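A minimal sketch of how the proposed option could gate the patch's logic, assuming `detect_hwhandler` defaults to on and, when off, leaves the configured value untouched. The option does not exist yet; `pick_hwhandler()` and `enum origin` are hypothetical simplifications of `select_hwhandler()`, with "0" meaning no handler and "1 alua" selecting scsi_dh_alua as in the patch.

```c
#include <stdbool.h>
#include <string.h>

/* Simplified origin of the hwhandler value before auto-detection runs. */
enum origin { ORIGIN_DEFAULT, ORIGIN_HWTABLE, ORIGIN_CONFIG };

/* Hypothetical rule for the proposed "detect_hwhandler" option. */
static const char *pick_hwhandler(const char *configured, enum origin origin,
				  bool all_paths_tpgs, bool detect_hwhandler)
{
	if (!detect_hwhandler)
		return configured; /* "detect_hwhandler no": enforce the setting */
	if (all_paths_tpgs && origin == ORIGIN_DEFAULT)
		return "1 alua";   /* nothing configured: infer ALUA from TPGS */
	if (!all_paths_tpgs && strcmp(configured, "1 alua") == 0)
		return "0";        /* ALUA configured, but some path has TPGS 0 */
	return configured;
}
```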

Btw, at SUSE we solved our problem with the controller at hand by
simply removing "hardware_handler alua" and "prio alua" from the IBM
IPR entry. If the scsi_dh_alua module is loaded early (default on
SUSE), this results in ALUA hwhandler and sysfs prio being used for IPR
controllers that do support ALUA, and no hwhandler / const prio =
PRIO_UNDEF for those that don't. I'm not sure if that simple solution
suits upstream, because upstream doesn't enforce early loading of
device handler modules.

Regards,
Martin


> 
> -Ben
> 
> > 
> > Signed-off-by: Martin Wilck 
> > ---
> >  libmultipath/propsel.c | 19 +++++++++++++++++--
> >  1 file changed, 17 insertions(+), 2 deletions(-)
> > 
> > diff --git a/libmultipath/propsel.c b/libmultipath/propsel.c
> > index 93974a482336..dc24450eb775 100644
> > --- a/libmultipath/propsel.c
> > +++ b/libmultipath/propsel.c
> > @@ -43,10 +43,13 @@ do {						\
> >  		goto out;					\
> >  	}							\
> >  } while(0)
> > +
> > +static char default_origin[] = "(setting: multipath internal)";
> > +
> >  #define do_default(dest, value)					\
> >  do {								\
> >  	dest = value;						\
> > -	origin = "(setting: multipath internal)";		\
> > +	origin = default_origin;				\
> >  } while(0)
> >  
> >  #define mp_set_mpe(var)						\
> > @@ -373,16 +376,20 @@ static int get_dh_state(struct path *pp, char *value, size_t value_len)
> >  
> >  int select_hwhandler(struct config *conf, struct multipath *mp)
> >  {
> > -	char *origin;
> > +	const char *origin;
> >  	struct path *pp;
> >  	/* dh_state is no longer than "detached" */
> >  	char handler[12];
> > +	static char alua_name[] = "1 alua";
> > +	static const char tpgs_origin[]= "(setting: autodetected from TPGS)";
> >  	char *dh_state;
> >  	int i;
> > +	bool all_tpgs = true;
> >  
> >  	dh_state = &handler[2];
> >  	if (mp->retain_hwhandler != RETAIN_HWHANDLER_OFF) {
> >  		vector_foreach_slot(mp->paths, pp, i) {
> > +			all_tpgs = all_tpgs && (pp->tpgs > 0);
> >  			if (get_dh_state(pp, dh_state, sizeof(handler) - 2) > 0
> >  			    && strcmp(dh_state, "detached")) {
> >  				memcpy(handler, "1 ", 2);
> > @@ -397,6 +404,14 @@ int select_hwhandler(struct config *conf, struct multipath *mp)
> >  	mp_set_conf(hwhandler);
> >  	mp_set_default(hwhandler, DEFAULT_HWHANDLER);
> >  out:
> > +	if (all_tpgs && 

[dm-devel] [PATCH v9] dax, dm: introduce ->fs_{claim, release}() dax_device infrastructure

2018-04-03 Thread Dan Williams
In preparation for allowing filesystems to augment the dev_pagemap
associated with a dax_device, add an ->fs_claim() callback. The
->fs_claim() callback is leveraged by the device-mapper dax
implementation to iterate all member devices in the map and repeat the
claim operation across the array.
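A single-threaded userspace model of the claim semantics in this patch, under stated simplifications: `owner` plays the role of `pgmap->data`, re-claiming by the same owner succeeds (as `__fs_dax_claim()` allows), and a different owner is refused. `claim_all()` mirrors repeating the claim across every member device; its rollback on failure is this sketch's own assumption, and the kernel's `devmap_lock` is omitted. All names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one member device; NULL owner means unclaimed. */
struct dev_model {
	void *owner;
};

/* Claim one device. Re-claiming by the same owner succeeds, since dm may
 * visit the same underlying device more than once. */
static bool claim_one(struct dev_model *d, void *owner)
{
	if (d->owner != NULL && d->owner != owner)
		return false;
	d->owner = owner;
	return true;
}

static void release_one(struct dev_model *d, void *owner)
{
	if (d->owner == owner)
		d->owner = NULL;
}

/* Repeat the claim across all members. Undoing earlier claims on failure
 * is an assumption of this sketch (it also assumes the caller held no
 * claims before the call), not something the patch shows. */
static bool claim_all(struct dev_model *devs, size_t n, void *owner)
{
	for (size_t i = 0; i < n; i++) {
		if (!claim_one(&devs[i], owner)) {
			while (i-- > 0)
				release_one(&devs[i], owner);
			return false;
		}
	}
	return true;
}
```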

In order to resolve collisions between filesystem operations and DMA to
DAX mapped pages we need a callback when DMA completes. With a callback
we can hold off filesystem operations while DMA is in-flight and then
resume those operations when the last put_page() occurs on a DMA page.
The ->fs_claim() operation arranges for this callback to be registered,
although that implementation is saved for a later patch.
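A minimal model of a callback on the last reference drop, as a sketch only: the patch itself leaves `generic_dax_pagefree()` as a TODO, and the real `dev_pagemap` reference accounting differs in detail (this model simply treats a count of zero as idle). `struct page_model` and its helpers are hypothetical.

```c
#include <stddef.h>

/* Hypothetical model of a DAX page; page_free/data mirror the dev_pagemap
 * fields the patch sets up. */
struct page_model {
	int refcount;
	void (*page_free)(struct page_model *pg, void *data);
	void *data;
};

static void get_page_model(struct page_model *pg)
{
	pg->refcount++;
}

/* Drop a reference; when the page becomes idle (zero in this model), fire
 * the callback so that held-off filesystem operations can resume. */
static void put_page_model(struct page_model *pg)
{
	if (--pg->refcount == 0 && pg->page_free != NULL)
		pg->page_free(pg, pg->data);
}

/* Example callback: count how often the page went idle. */
static void count_idle(struct page_model *pg, void *data)
{
	(void)pg;
	++*(int *)data;
}
```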

Cc: Alasdair Kergon 
Cc: Mike Snitzer 
Cc: Matthew Wilcox 
Cc: Ross Zwisler 
Cc: "Jérôme Glisse" 
Cc: Christoph Hellwig 
Cc: Jan Kara 
Signed-off-by: Dan Williams 
---
Changes since v8:
* make __fs_dax_claim and __fs_dax_release private to
  drivers/dax/super.c

* rename dm_dax_iterate to dm_dax_interate_devices (Mike)

* drop the return value from dm_dax_interate_devices since nothing uses
  it (Mike)

 drivers/dax/super.c      |   74 ++
 drivers/md/dm.c          |   57 +++
 include/linux/dax.h      |   48 ++
 include/linux/memremap.h |    8 +
 4 files changed, 187 insertions(+)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 2b2332b605e4..c45ded5e93e7 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -29,6 +29,7 @@ static struct vfsmount *dax_mnt;
 static DEFINE_IDA(dax_minor_ida);
 static struct kmem_cache *dax_cache __read_mostly;
 static struct super_block *dax_superblock __read_mostly;
+static DEFINE_MUTEX(devmap_lock);
 
 #define DAX_HASH_SIZE (PAGE_SIZE / sizeof(struct hlist_head))
 static struct hlist_head dax_host_list[DAX_HASH_SIZE];
@@ -169,9 +170,82 @@ struct dax_device {
 	const char *host;
 	void *private;
 	unsigned long flags;
+	struct dev_pagemap *pgmap;
 	const struct dax_operations *ops;
 };
 
+#if IS_ENABLED(CONFIG_FS_DAX)
+static void generic_dax_pagefree(struct page *page, void *data)
+{
+	/* TODO: wakeup page-idle waiters */
+}
+
+static struct dax_device *__fs_dax_claim(struct dax_device *dax_dev,
+		void *owner)
+{
+	struct dev_pagemap *pgmap;
+
+	if (!dax_dev->pgmap)
+		return dax_dev;
+	pgmap = dax_dev->pgmap;
+
+	mutex_lock(&devmap_lock);
+	if (pgmap->data && pgmap->data == owner) {
+		/* dm might try to claim the same device more than once... */
+		mutex_unlock(&devmap_lock);
+		return dax_dev;
+	} else if (pgmap->page_free || pgmap->page_fault
+			|| pgmap->type != MEMORY_DEVICE_HOST) {
+		put_dax(dax_dev);
+		mutex_unlock(&devmap_lock);
+		return NULL;
+	}
+
+	pgmap->type = MEMORY_DEVICE_FS_DAX;
+	pgmap->page_free = generic_dax_pagefree;
+	pgmap->data = owner;
+	mutex_unlock(&devmap_lock);
+
+	return dax_dev;
+}
+
+struct dax_device *fs_dax_claim(struct dax_device *dax_dev, void *owner)
+{
+	if (dax_dev->ops->fs_claim)
+		return dax_dev->ops->fs_claim(dax_dev, owner);
+	else
+		return __fs_dax_claim(dax_dev, owner);
+}
+EXPORT_SYMBOL_GPL(fs_dax_claim);
+
+static void __fs_dax_release(struct dax_device *dax_dev, void *owner)
+{
+	struct dev_pagemap *pgmap = dax_dev ? dax_dev->pgmap : NULL;
+
+	put_dax(dax_dev);
+	if (!pgmap)
+		return;
+	if (!pgmap->data)
+		return;
+
+	mutex_lock(&devmap_lock);
+	WARN_ON(pgmap->data != owner);
+	pgmap->type = MEMORY_DEVICE_HOST;
+	pgmap->page_free = NULL;
+	pgmap->data = NULL;
+	mutex_unlock(&devmap_lock);
+}
+
+void fs_dax_release(struct dax_device *dax_dev, void *owner)
+{
+	if (dax_dev->ops->fs_release)
+		dax_dev->ops->fs_release(dax_dev, owner);
+	else
+		__fs_dax_release(dax_dev, owner);
+}
+EXPORT_SYMBOL_GPL(fs_dax_release);
+#endif
+
 static ssize_t write_cache_show(struct device *dev,
struct device_attribute *attr, char *buf)
 {
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index ffc93aecc02a..cb8ddeb3373f 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1090,6 +1090,61 @@ static size_t dm_dax_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff,
 	return ret;
 }
 
+static int dm_dax_dev_claim(struct dm_target *ti, struct dm_dev *dev,
+		sector_t start, sector_t len, void *owner)
+{
+	if (fs_dax_claim(dev->dax_dev, owner))
+		return 0;
+	/*
+	 * Outside of a kernel bug there is no reason a dax_dev should
+	 * fail 

Re: [dm-devel] [PATCH 1/2] libmultipath: hwhandler auto-detection for ALUA

2018-04-03 Thread Benjamin Marzinski
On Tue, Mar 27, 2018 at 11:50:52PM +0200, Martin Wilck wrote:
> If the hardware handler isn't explicitly set, infer ALUA support
> from the pp->tpgs attribute. Likewise, if ALUA is selected, but
> not supported by the hardware, fall back to no hardware handler.

Weren't you worried before about temporary ALUA failures? If you had a
temporary failure while configuring a device that you explicitly set to
be ALUA, then this would cause the device to be misconfigured? If the
hardware handler isn't set, inferring ALUA is fine. But what is the case
where we want to say that a device that is explicitly set to ALUA
shouldn't actually be ALUA?  It seem like if there is some uncertaintly,
we should just not set the hardware handler, and allow multipath to
infer it via the pp->tpgs value.

I'm not strongly against this patch. I just don't see the value in
overriding an explicit configuration, if we believe that temporary
failures are possible.

-Ben

> 
> Signed-off-by: Martin Wilck 
> ---
>  libmultipath/propsel.c | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/libmultipath/propsel.c b/libmultipath/propsel.c
> index 93974a482336..dc24450eb775 100644
> --- a/libmultipath/propsel.c
> +++ b/libmultipath/propsel.c
> @@ -43,10 +43,13 @@ do {  \
>   goto out;   \
>   }   \
>  } while(0)
> +
> +static char default_origin[] = "(setting: multipath internal)";
> +
>  #define do_default(dest, value)  \
>  do { \
>   dest = value;   \
> - origin = "(setting: multipath internal)";   \
> + origin = default_origin;\
>  } while(0)
>  
>  #define mp_set_mpe(var)  \
> @@ -373,16 +376,20 @@ static int get_dh_state(struct path *pp, char *value, size_t value_len)
>  
>  int select_hwhandler(struct config *conf, struct multipath *mp)
>  {
> - char *origin;
> + const char *origin;
>   struct path *pp;
>   /* dh_state is no longer than "detached" */
>   char handler[12];
> + static char alua_name[] = "1 alua";
> + static const char tpgs_origin[]= "(setting: autodetected from TPGS)";
>   char *dh_state;
>   int i;
> + bool all_tpgs = true;
>  
>   dh_state = &handler[2];
>   if (mp->retain_hwhandler != RETAIN_HWHANDLER_OFF) {
>   vector_foreach_slot(mp->paths, pp, i) {
> + all_tpgs = all_tpgs && (pp->tpgs > 0);
>   if (get_dh_state(pp, dh_state, sizeof(handler) - 2) > 0
>   && strcmp(dh_state, "detached")) {
>   memcpy(handler, "1 ", 2);
> @@ -397,6 +404,14 @@ int select_hwhandler(struct config *conf, struct multipath *mp)
>   mp_set_conf(hwhandler);
>   mp_set_default(hwhandler, DEFAULT_HWHANDLER);
>  out:
> + if (all_tpgs && !strcmp(mp->hwhandler, DEFAULT_HWHANDLER) &&
> + origin == default_origin) {
> + mp->hwhandler = alua_name;
> + origin = tpgs_origin;
> + } else if (!all_tpgs && !strcmp(mp->hwhandler, alua_name)) {
> + mp->hwhandler = DEFAULT_HWHANDLER;
> + origin = tpgs_origin;
> + }
>   mp->hwhandler = STRDUP(mp->hwhandler);
>   condlog(3, "%s: hardware_handler = \"%s\" %s", mp->alias, mp->hwhandler,
>   origin);
> -- 
> 2.16.1



Re: [dm-devel] Recent kernels fail to boot on POWER8 with multipath SCSI

2018-04-03 Thread Michael Ellerman
Mike Snitzer  writes:
> On Fri, Mar 30 2018 at  5:04P -0400,
> Michael Ellerman  wrote:
...
>> Any prospect of getting that patch to Linus before the 4.16 release? Yes
>> I realise that's in ~36 hours :)
>
> Please, see upstream commit e457edf0b21c873be827b7c2f6b8e1545485c415

Sweet thanks!

cheers



Re: [dm-devel] Recent kernels fail to boot on POWER8 with multipath SCSI

2018-04-03 Thread Michael Ellerman
Hi Mike,

Paul's AFK so I tried the patch you sent.

Mike Snitzer  writes:
> On Thu, Mar 29 2018 at  4:39am -0400,
> Paul Mackerras  wrote:
>> Since commit 8d47e65948dd ("dm mpath: remove unnecessary NVMe
>> branching in favor of scsi_dh checks", 2018-03-05), upstream kernels
>> fail to boot on my POWER8 box which has multipath SCSI disks.  The
>> host adapters are IPR and the userspace is CentOS 7.
...
>
> Please try this patch, it'll likely fix your issues:
>
> diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> index dbddcdc5a4ec..746dd8a75b4a 100644
> --- a/drivers/md/dm-mpath.c
> +++ b/drivers/md/dm-mpath.c
> @@ -887,7 +887,7 @@ static struct pgpath *parse_path(struct dm_arg_set *as, struct path_selector *ps
>  
>   q = bdev_get_queue(p->path.dev->bdev);
>   attached_handler_name = scsi_dh_attached_handler_name(q, GFP_KERNEL);
> - if (attached_handler_name) {
> + if (attached_handler_name || m->hw_handler_name) {
>   INIT_DELAYED_WORK(&p->activate_path, activate_path_work);
>   r = setup_scsi_dh(p->path.dev->bdev, m, attached_handler_name, &ti->error);
>   if (r) {

And it does indeed fix the problem. The system boots happily with no warnings.

If you like here's a:

  Tested-by: Michael Ellerman 

Any prospect of getting that patch to Linus before the 4.16 release? Yes
I realise that's in ~36 hours :)

cheers



Re: [dm-devel] [BUG] dm-thin metadata operation failed due to -ENOSPC returned by dm_pool_alloc_data_block() after processing DISCARD bios

2018-04-03 Thread Zdenek Kabelac

On 3.4.2018 at 11:31, Zdenek Kabelac wrote:

On 3.4.2018 at 06:07, Dennis Yang wrote:

Hi,

Recently we came across an issue where the dm-thin pool is switched to
READ_ONLY mode because dm_pool_alloc_data_block() returns -ENOSPC.
AFAIK, this should not happen, since alloc_data_block() checks whether
there is any free space (and commits metadata if it first reports no
free space) before it allocates a pool block. In addition, the total
virtual space of all thin volumes is smaller than the pool's physical
space in my testing environment, which should make it impossible for
the pool to run out of space.
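A minimal model of the allocation path as described in this report, assuming blocks released by discards only become reusable after a metadata commit. `struct pool_model`, `alloc_block()` and `discard_block()` are hypothetical simplifications, not dm-thin's `alloc_data_block()`; the sketch shows the expected behavior, not the bug.

```c
#include <errno.h>

/* Hypothetical pool model: blocks released by discards in the open
 * transaction only become allocatable after a commit. */
struct pool_model {
	unsigned long free_blocks;  /* allocatable as of the last commit */
	unsigned long pending_free; /* released since the last commit */
};

static void commit(struct pool_model *p)
{
	p->free_blocks += p->pending_free;
	p->pending_free = 0;
}

/* As described in the report: if no free space is reported, commit once
 * and check again before failing with -ENOSPC. */
static int alloc_block(struct pool_model *p)
{
	if (p->free_blocks == 0) {
		commit(p);
		if (p->free_blocks == 0)
			return -ENOSPC;
	}
	p->free_blocks--;
	return 0;
}

/* A discard returns a mapped block to the pool, pending the next commit. */
static void discard_block(struct pool_model *p)
{
	p->pending_free++;
}
```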

Hi

Which kernel was used during testing? Was this upstream (4.16?)?

Ahh, fstrim had been applied wrongly; yes, it is reproducible even on this smaller data set.

I'll open a BZ case for this.


Regards


Zdenek



Re: [dm-devel] [BUG] dm-thin metadata operation failed due to -ENOSPC returned by dm_pool_alloc_data_block() after processing DISCARD bios

2018-04-03 Thread Zdenek Kabelac

On 3.4.2018 at 06:07, Dennis Yang wrote:

Hi,

Recently we came across an issue where the dm-thin pool is switched to
READ_ONLY mode because dm_pool_alloc_data_block() returns -ENOSPC.
AFAIK, this should not happen, since alloc_data_block() checks whether
there is any free space (and commits metadata if it first reports no
free space) before it allocates a pool block. In addition, the total
virtual space of all thin volumes is smaller than the pool's physical
space in my testing environment, which should make it impossible for
the pool to run out of space.

Hi

Which kernel was used during testing? Was this upstream (4.16?)?

This issue could be easily reproduced by the following steps.

1) Create a thin pool and a slightly smaller thin volume

sudo dmsetup create meta --table "0 4000 linear /dev/sdf 0"


The maximum metadata size is only ~16G (... 33161216 blocks or so).



sudo dmsetup create data --table "0 1024 linear /dev/md125 0"
sudo dd if=/dev/zero of=/dev/mapper/meta bs=1M count=1
sudo dmsetup create pool --table "0 1024 thin-pool /dev/mapper/meta /dev/mapper/data 1024 0 2 skip_block_zeroing error_if_no_space"
sudo dmsetup message pool 0 "create_thin 0"
sudo dmsetup create thin --table "0 10238976 thin /dev/mapper/pool 0"


I've tried to reproduce this with smaller LVs, but haven't managed to yet:

vg-LV1: 0 102384 thin 253:4 1
vg-pool: 0 102400 linear 253:4 0
vg-pool-tpool: 0 102400 thin-pool 253:2 253:3 128 0 2 skip_block_zeroing error_if_no_space
vg-pool_tdata: 0 102400 linear 253:0 10240
vg-pool_tmeta: 0 8192 linear 253:1 2048
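The table lengths above, and the thin-pool data block size (`128` in the `vg-pool-tpool` line), are in 512-byte sectors. A small arithmetic sketch (helper names are hypothetical) shows that the pool has 800 blocks of 64 KiB, and that the 102384-sector thin LV still needs all 800 of them once fully written, because its last partial block must be allocated in full.

```c
#include <stdint.h>

/* dmsetup table lengths and the thin-pool block size are 512-byte sectors. */
#define SECTOR_BYTES 512u

static uint64_t sectors_to_bytes(uint64_t sectors)
{
	return sectors * SECTOR_BYTES;
}

/* Pool blocks needed to fully back @sectors of a thin volume (round up). */
static uint64_t blocks_needed(uint64_t sectors, uint64_t block_sectors)
{
	return (sectors + block_sectors - 1) / block_sectors;
}
```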


Regards

Zdenek
