Re: [dm-devel] RFC: one more time: SCSI device identification

2021-04-30 Thread Ewan D. Milne
On Wed, 2021-04-28 at 10:09 +1000, Erwin van Londen wrote:
> 
> On Tue, 2021-04-27 at 16:41 -0400, Ewan D. Milne wrote:
> > On Tue, 2021-04-27 at 20:33 +, Martin Wilck wrote:
> > > On Tue, 2021-04-27 at 16:14 -0400, Ewan D. Milne wrote:
> > > > There's no way to do that, in principle.  Because there could
> > > > be
> > > > other I/Os in flight.  You might (somehow) avoid retrying an
> > > > I/O
> > > > that got a UA until you figured out if something changed, but
> > > > other
> > > > I/Os can already have been sent to the target, or issued before
> > > > you
> > > > get to look at the status.
> 
> If something happens on the storage side where a LUN gets its
> attributes changed (any of them, it doesn't matter which one), a UA
> should be sent. Also, all outstanding I/Os on that LUN should be
> returned with an abort, as the array can no longer warrant the
> validity of any I/O due to these changes, especially when parameters
> like reservations (PRs) are involved. If that does not happen from the
> array side, all bets are off, as the only way to get back in business
> is to start from scratch.

Perhaps an array might abort I/Os it has received in the Device Server
when something changes.  I have no idea if most or any arrays actually
do that.

But what about I/O that has already been queued from the host to the
host bus adapter?  I don't see how we can abort those I/Os properly.
Most high-performance HBAs have a queue of commands and a queue of
responses; there could be lots of commands queued before we manage to
notice an interesting status.  And AFAIK there is no conditional
mechanism that could hold them off (and they could be in flight on the
wire anyway).

I get what you are saying about what SAM describes, I just don't see
how we can guarantee we don't send any further commands after the
status with the UA is sent back, before we can understand what happened.
-Ewan
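
As an aside, the "blocking and quiescing the queue upon a UA" idea quoted
below could, in principle, be built on the standard blk-mq quiesce helpers.
A minimal sketch follows (hypothetical, not existing code; note it does
nothing for commands already handed to the HBA or in flight on the wire,
which is exactly the limitation above):

#include <linux/blk-mq.h>

/*
 * Hypothetical sketch only: quiesce the queue while the device identity
 * is re-checked after a Unit Attention.  Quiescing stops new dispatches;
 * it does not recall commands already queued to the HBA.
 */
static void example_react_to_wwid_ua(struct request_queue *q)
{
        blk_mq_quiesce_queue(q);        /* stop dispatching new requests */

        /*
         * ... re-read VPD page 0x83, compare it with the cached copy,
         * or wait for userspace to confirm the identity ...
         */

        blk_mq_unquiesce_queue(q);      /* resume once identity is confirmed */
}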
> > > 
> > > Right. But in practice, a WWID change will hardly happen under
> > > full
> > > IO
> > > load. The storage side will probably have to block IO while this
> > > happens, at least for a short time period. So blocking and
> > > quiescing
> > > the queue upon an UA might still work, most of the time. Even if
> > > we
> > > were too late already, the sooner we stop the queue, the better.
> 
> I think in most cases when something happens on the array side you
> will see I/Os being aborted. That might be a good time to start doing
> TURs, and if these come back OK, do a new INQUIRY. From the host side
> there is only so much you can do.
> 
> > > The current algorithm in multipath-tools needs to detect a path
> > > going
> > > down and being reinstated. The time interval during which a WWID
> > > change
> > > will go unnoticed is one or more path checker intervals,
> > > typically on
> > > the order of 5-30 seconds. If we could decrease this interval to
> > > a
> > > sub-
> > > second or even millisecond range by blocking the queue in the
> > > kernel
> > > quickly, we'd have made a big step forward.
> > 
> > Yes, and in many situations this may help.  But in the general case
> > we can't protect against a storage array misconfiguration,
> > where something like this can happen.  So I worry about people
> > believing the host software will protect them against a mistake,
> > when we can't really do that.
> 
> My thought exactly. 
> 
> > All it takes is one I/O (a discard) to make a thorough mess of the
> > LUN.
> > 
> > -Ewan
> > 
> > > Regards
> > > Martin
> > > 
> > 
> > --
> > dm-devel mailing list
> > dm-devel@redhat.com
> > https://listman.redhat.com/mailman/listinfo/dm-devel
> > 
> 
--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

Re: [dm-devel] RFC: one more time: SCSI device identification

2021-04-27 Thread Ewan D. Milne
On Tue, 2021-04-27 at 20:33 +, Martin Wilck wrote:
> On Tue, 2021-04-27 at 16:14 -0400, Ewan D. Milne wrote:
> > 
> > There's no way to do that, in principle.  Because there could be
> > other I/Os in flight.  You might (somehow) avoid retrying an I/O
> > that got a UA until you figured out if something changed, but other
> > I/Os can already have been sent to the target, or issued before you
> > get to look at the status.
> 
> Right. But in practice, a WWID change will hardly happen under full
> IO
> load. The storage side will probably have to block IO while this
> happens, at least for a short time period. So blocking and quiescing
> the queue upon an UA might still work, most of the time. Even if we
> were too late already, the sooner we stop the queue, the better.
> 
> The current algorithm in multipath-tools needs to detect a path going
> down and being reinstated. The time interval during which a WWID
> change
> will go unnoticed is one or more path checker intervals, typically on
> the order of 5-30 seconds. If we could decrease this interval to a
> sub-
> second or even millisecond range by blocking the queue in the kernel
> quickly, we'd have made a big step forward.

Yes, and in many situations this may help.  But in the general case
we can't protect against a storage array misconfiguration,
where something like this can happen.  So I worry about people
believing the host software will protect them against a mistake,
when we can't really do that.

All it takes is one I/O (a discard) to make a thorough mess of the LUN.

-Ewan

> 
> Regards
> Martin
> 

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] RFC: one more time: SCSI device identification

2021-04-27 Thread Ewan D. Milne
On Mon, 2021-04-26 at 13:16 +, Martin Wilck wrote:
> On Mon, 2021-04-26 at 13:14 +0200, Ulrich Windl wrote:
> > > > 
> > > 
> > > While we're at it, I'd like to mention another issue: WWID
> > > changes.
> > > 
> > > This is a big problem for multipathd. The gist is that the device
> > > identification attributes in sysfs only change after rescanning
> > > the
> > > device. Thus if a user changes LUN assignments on a storage
> > > system,
> > > it can happen that a direct INQUIRY returns a different WWID than
> > > the one in
> > > sysfs, which is fatal. If we plan to rely more on sysfs for
> > > device
> > > identification in the future, the problem gets worse. 
> > 
> > I think many devices rely on the fact that they are identified by
> > Vendor/model/serial_nr, because in most professional SAN storage
> > systems you
> > can pre-set the serial number to a custom value; so if you want a
> > new
> > disk
> > (maybe a snapshot) to be compatible with the old one, just assign
> > the
> > same
> > serial number. I guess that's the idea behind it.
> 
> What you are saying sounds dangerous to me. If a snapshot has the
> same
> WWID as the device it's a snapshot of, it must not be exposed to any
> host(s) at the same time with its origin, otherwise the host may
> happily combine it with the origin into one multipath map, and data
> corruption will almost certainly result. 
> 
> My argument is about how the host is supposed to deal with a WWID
> change if it happens. Here, "WWID change" means that a given H:C:T:L
> suddenly exposes different device designators than it used to, while
> this device is in use by a host. Here, too, data corruption is
> imminent, and can happen in a blink of an eye. To avoid this, several
> things are needed:
> 
>  1) the host needs to get notified about the change (likely by an UA
> of
> some sort)
>  2) the kernel needs to react to the notification immediately, e.g.
> by
> blocking IO to the device,

There's no way to do that, in principle.  Because there could be
other I/Os in flight.  You might (somehow) avoid retrying an I/O
that got a UA until you figured out if something changed, but other
I/Os can already have been sent to the target, or issued before you
get to look at the status.

-Ewan

>  3) userspace tooling such as udev or multipathd needs to figure out
> how to deal with the situation cleanly, and eventually unblock it.
> 
> Wrt 1), we can only hope that it's the case. But 2) and 3) need work,
> afaics.
> 
> Martin
> 

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] Antw: [EXT] Re: RFC: one more time: SCSI device identification

2021-04-27 Thread Ewan D. Milne
On Tue, 2021-04-27 at 12:52 +0200, Ulrich Windl wrote:
> > > > Hannes Reinecke  wrote on 27.04.2021 at 10:21
> > > > in message
> 
> <2a6903e4-ff2b-67d5-e772-6971db844...@suse.de>:
> > On 4/27/21 10:10 AM, Martin Wilck wrote:
> > > On Tue, 2021‑04‑27 at 13:48 +1000, Erwin van Londen wrote:
> > > > > 
> > > > > Wrt 1), we can only hope that it's the case. But 2) and 3)
> > > > > need work,
> > > > > afaics.
> > > > > 
> > > > 
> > > > In my view the WWID should never change. 
> > > 
> > > In an ideal world, perhaps not. But in the dm-multipath realm, we
> > > know that WWID changes can happen with certain storage arrays. See
> > > https://listman.redhat.com/archives/dm-devel/2021-February/msg00116.html
> > > and follow-ups, for example.
> > > 
> > 
> > And it's actually something which might happen quite easily.
> > The storage array can unmap a LUN, delete it, create a new one, and
> > map that one to the same LUN number as the old one.
> > If we didn't do I/O during that interval, then upon the next I/O we
> > will get the dreaded 'Power-On/Reset' sense code.
> > _And nothing else_, due to the arcane rules for sense code
> > generation in
> > SAM.
> > But we end up with a completely different device.
> > 
> > The only way out of it is to do a rescan for every POR sense code,
> > and
> > disable the device eg via DID_NO_CONNECT whenever we find that the
> > identification has changed. We already have a copy of the original
> > VPD
> > page 0x83 at hand, so that should be reasonably easy.
> 
> I don't know the depth of the SCSI or FC protocol, but storage
> systems
> typically signal such events, maybe either via some unit attention or
> some FC
> event. Older kernels logged that there was a change, but a manual
> SCSI bus scan
> is needed, while newer kernels find new devices "automagically" for
> some
> products. The HP EVA 6000 series worked that way, and a 3PAR StoreServ
> 8000 series also seems to work that way, but not a Pure Storage X70
> R3. For the latter you need something like an FC LIP to make the
> kernel detect the new devices (LUNs).
> I'm unsure where the problem is, but in principle the kernel can be
> notified...

There has to be some command on which the Unit Attention status
can be returned.  (In a multipath configuration, the path checker
commands may do this).  In absence of a command, there is no
asynchronous mechanism in SCSI to report the status.
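
To make that concrete, here is a minimal userspace sketch (illustrative
only; /dev/sdX is a placeholder) of the kind of path-checker command, a
TEST UNIT READY issued via SG_IO, on which a pending Unit Attention would
be returned in the sense data:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

int main(void)
{
        unsigned char cdb[6] = { 0x00 };        /* TEST UNIT READY */
        unsigned char sense[32] = { 0 };
        struct sg_io_hdr hdr;
        int fd = open("/dev/sdX", O_RDONLY | O_NONBLOCK);

        if (fd < 0)
                return 1;

        memset(&hdr, 0, sizeof(hdr));
        hdr.interface_id = 'S';
        hdr.cmd_len = sizeof(cdb);
        hdr.cmdp = cdb;
        hdr.dxfer_direction = SG_DXFER_NONE;
        hdr.sbp = sense;
        hdr.mx_sb_len = sizeof(sense);
        hdr.timeout = 10000;                    /* milliseconds */

        if (ioctl(fd, SG_IO, &hdr) == 0 && hdr.sb_len_wr > 13) {
                /* Fixed-format sense: key in byte 2, ASC/ASCQ in 12/13 */
                if ((sense[2] & 0x0f) == 0x06)  /* UNIT ATTENTION */
                        printf("UA: ASC 0x%02x ASCQ 0x%02x\n",
                               sense[12], sense[13]);
        }
        return 0;
}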

On FC, things related to finding a remote port will trigger a rescan.

-Ewan

> 
> > 
> > I had a rather lengthy discussion with Fred Knight @ NetApp about
> > Power-On/Reset handling, what with him complaining that we don't
> > handle it correctly. So this really is something we should be
> > looking into, even independently of multipathing.
> > 
> > But actually I like the idea from Martin Petersen to expose the
> > parsed
> > VPD identifiers to sysfs; that would allow us to drop sg_inq
> > completely
> > from the udev rules.
> 
> Talking of VPDs: somewhere in the last 12 years (within SLES 11) there
> was a kernel change regarding trailing blanks in VPD data. That change
> broke several configurations, which were then unable to re-recognize
> their devices. In one case the software had even bound a license to a
> specific device serial number, and that software found "new" devices
> while missing the "old" ones...
> 
> Regards,
> Ulrich
> 
> > 
> > Cheers,
> > 
> > Hannes
> > ‑‑ 
> > Dr. Hannes Reinecke Kernel Storage Architect
> > h...@suse.de   +49 911 74053 688
> > SUSE Software Solutions Germany GmbH, 90409 Nürnberg
> > GF: F. Imendörffer, HRB 36809 (AG Nürnberg)
> 
> 
> 

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

Re: [dm-devel] [PATCH v2 3/4] nvme: introduce FAILUP handling for REQ_FAILFAST_TRANSPORT

2021-04-16 Thread Ewan D. Milne
On Thu, 2021-04-15 at 19:15 -0400, Mike Snitzer wrote:
> If REQ_FAILFAST_TRANSPORT is set it means the driver should not retry
> IO that completed with transport errors. REQ_FAILFAST_TRANSPORT is
> set by multipathing software (e.g. dm-multipath) before it issues IO.
> 
> Update NVMe to allow failover of requests marked with either
> REQ_NVME_MPATH or REQ_FAILFAST_TRANSPORT. This allows such requests
> to be given a disposition of either FAILOVER or FAILUP respectively.
> FAILUP handling ensures a retryable error is returned up from NVMe.
> 
> Introduce nvme_failup_req() for use in nvme_complete_rq() if
> nvme_decide_disposition() returns FAILUP. nvme_failup_req() ensures
> the request is completed with a retryable IO error when appropriate.
> __nvme_end_req() was factored out for use by both nvme_end_req() and
> nvme_failup_req().
> 
> Signed-off-by: Mike Snitzer 
> ---
>  drivers/nvme/host/core.c | 31 ++-
>  1 file changed, 26 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 4134cf3c7e48..10375197dd53 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -299,6 +299,7 @@ enum nvme_disposition {
>   COMPLETE,
>   RETRY,
>   FAILOVER,
> + FAILUP,
>  };
>  
>  static inline enum nvme_disposition nvme_decide_disposition(struct
> request *req)
> @@ -318,10 +319,11 @@ static inline enum nvme_disposition nvme_decide_disposition(struct request *req)
>   nvme_req(req)->retries >= nvme_max_retries)
>   return COMPLETE;
>  
> - if (req->cmd_flags & REQ_NVME_MPATH) {
> + if (req->cmd_flags & (REQ_NVME_MPATH | REQ_FAILFAST_TRANSPORT)) {
>   if (nvme_is_path_error(nvme_req(req)->status) ||
>   blk_queue_dying(req->q))
> - return FAILOVER;
> + return (req->cmd_flags & REQ_NVME_MPATH) ?
> + FAILOVER : FAILUP;
>   } else {
>   if (blk_queue_dying(req->q))
>   return COMPLETE;
> @@ -330,10 +332,8 @@ static inline enum nvme_disposition nvme_decide_disposition(struct request *req)
>   return RETRY;
>  }
>  
> -static inline void nvme_end_req(struct request *req)
> +static inline void __nvme_end_req(struct request *req, blk_status_t status)
>  {
> - blk_status_t status = nvme_error_status(nvme_req(req)->status);
> -
>   if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) &&
>   req_op(req) == REQ_OP_ZONE_APPEND)
>   req->__sector = nvme_lba_to_sect(req->q->queuedata,
> @@ -343,6 +343,24 @@ static inline void nvme_end_req(struct request *req)
>   blk_mq_end_request(req, status);
>  }
>  
> +static inline void nvme_end_req(struct request *req)
> +{
> + __nvme_end_req(req, nvme_error_status(nvme_req(req)->status));
> +}
> +
> +static void nvme_failup_req(struct request *req)
> +{
> + blk_status_t status = nvme_error_status(nvme_req(req)->status);
> +
> + if (WARN_ON_ONCE(!blk_path_error(status))) {
> + pr_debug("Request meant for failover but blk_status_t (errno=%d) was not retryable.\n",
> +  blk_status_to_errno(status));
> + status = BLK_STS_IOERR;
> + }
> +
> + __nvme_end_req(req, status);

It seems like adding __nvme_end_req() was unnecessary, since
nvme_end_req() just calls it and does nothing else?

-Ewan

> +}
> +
>  void nvme_complete_rq(struct request *req)
>  {
>   trace_nvme_complete_rq(req);
> @@ -361,6 +379,9 @@ void nvme_complete_rq(struct request *req)
>   case FAILOVER:
>   nvme_failover_req(req);
>   return;
> + case FAILUP:
> + nvme_failup_req(req);
> + return;
>   }
>  }
>  EXPORT_SYMBOL_GPL(nvme_complete_rq);

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH v2 3/4] nvme: introduce FAILUP handling for REQ_FAILFAST_TRANSPORT

2021-04-16 Thread Ewan D. Milne
On Fri, 2021-04-16 at 16:07 +0200, Hannes Reinecke wrote:
> 
> Hmm. Quite convoluted, methinks.
> Shouldn't this achieve the same thing?
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index e89ec2522ca6..8c36a2196b66 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -303,8 +303,10 @@ static inline enum nvme_disposition nvme_decide_disposition(struct request *req)
> if (likely(nvme_req(req)->status == 0))
> return COMPLETE;
> 
> -   if (blk_noretry_request(req) ||
> -   (nvme_req(req)->status & NVME_SC_DNR) ||
> +   if (blk_noretry_request(req))
> +   nvme_req(req)->status |= NVME_SC_DNR;
> +
> +   if ((nvme_req(req)->status & NVME_SC_DNR) ||
> nvme_req(req)->retries >= nvme_max_retries)
> return COMPLETE;
> 

I am not in favor of altering ->status to set DNR just to
simplify the following conditional.
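
For reference, the check as it stood before this change (the lines removed
in the hunk above) already treats blk_noretry_request() and DNR the same
way without writing DNR back into the status:

        if (blk_noretry_request(req) ||
            (nvme_req(req)->status & NVME_SC_DNR) ||
            nvme_req(req)->retries >= nvme_max_retries)
                return COMPLETE;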

-Ewan

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] md/dm-mpath: check whether all pgpaths have same uuid in multipath_ctr()

2021-03-23 Thread Ewan D. Milne
On Tue, 2021-03-23 at 15:47 +0800, lixiaokeng wrote:
> 
> On 2021/3/22 22:22, Mike Snitzer wrote:
> > On Mon, Mar 22 2021 at  4:11am -0400,
> > Christoph Hellwig  wrote:
> > 
> > > On Sat, Mar 20, 2021 at 03:19:23PM +0800, Zhiqiang Liu wrote:
> > > > From: Zhiqiang Liu 
> > > > 
> > > > When we run an I/O stress test on a multipath device, a metadata
> > > > error can occur because of a wrong path. In the test, we
> > > > concurrently execute 'iscsi device login|logout' and
> > > > 'multipath -r' commands with I/O stress on the multipath device.
> > > > In some cases, systemd-udevd may not have had time to process the
> > > > uevents of the iSCSI device logout|login, and then the
> > > > 'multipath -r' command triggers the multipathd daemon to call
> > > > ioctl to load a table with incorrect old device info from
> > > > systemd-udevd. Then, one iSCSI path may be incorrectly attached
> > > > to another multipath device which has a different uuid. Finally,
> > > > the metadata error occurs when unmounting the filesystem to write
> > > > metadata to the iSCSI device, which is actually not owned by the
> > > > multipath device.
> > > > 
> > > > So we need to check whether all pgpaths of one multipath have
> > > > the same uuid; if not, we should throw an error.
> > > > 
> > > > Signed-off-by: Zhiqiang Liu 
> > > > Signed-off-by: lixiaokeng 
> > > > Signed-off-by: linfeilong 
> > > > Signed-off-by: Wubo 
> > > > ---
> > > >  drivers/md/dm-mpath.c   | 52 +
> > > >  drivers/scsi/scsi_lib.c |  1 +
> > > >  2 files changed, 53 insertions(+)
> > > > 
> > > > diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> > > > index bced42f082b0..f0b995784b53 100644
> > > > --- a/drivers/md/dm-mpath.c
> > > > +++ b/drivers/md/dm-mpath.c
> > > > @@ -24,6 +24,7 @@
> > > >  #include 
> > > >  #include 
> > > >  #include 
> > > > +#include 
> > > >  #include 
> > > >  #include 
> > > > 
> > > > @@ -1169,6 +1170,45 @@ static int parse_features(struct dm_arg_set *as, struct multipath *m)
> > > > return r;
> > > >  }
> > > > 
> > > > +#define SCSI_VPD_LUN_ID_PREFIX_LEN 4
> > > > +#define MPATH_UUID_PREFIX_LEN 7
> > > > +static int check_pg_uuid(struct priority_group *pg, char *md_uuid)
> > > > +{
> > > > +   char pgpath_uuid[DM_UUID_LEN] = {0};
> > > > +   struct request_queue *q;
> > > > +   struct pgpath *pgpath;
> > > > +   struct scsi_device *sdev;
> > > > +   ssize_t count;
> > > > +   int r = 0;
> > > > +
> > > > +   list_for_each_entry(pgpath, &pg->pgpaths, list) {
> > > > +   q = bdev_get_queue(pgpath->path.dev->bdev);
> > > > +   sdev = scsi_device_from_queue(q);
> > > 
> > > Common dm-multipath code should never poke into scsi
> > > internals.  This
> > > is something for the device handler to check.  It probably also
> > > won't
> > > work for all older devices.
> > 
> > Definitely.
> > 
> > But that aside, userspace (multipathd) _should_ be able to do extra
> > validation, _before_ pushing down a new table to the kernel, rather
> > than
> > forcing the kernel to do it.
> > 
> 
> Martin (committer of multipath-tools) said that:
> "Don't get me wrong, I don't argue against tough testing. But we
> should
> be aware that there are always time intervals during which
> multipathd's
> picture of the present devices is different from what the kernel
> sees."
> 
> It is difficult to solve this in multipathd.
> 
> Regards,
> Lixiaokeng
> 

I think the patch is no good.  There are plenty of devices that don't
support VPD page 83h:

int scsi_vpd_lun_id(struct scsi_device *sdev, char *id, size_t id_len)
{
        u8 cur_id_type = 0xff;
        u8 cur_id_size = 0;
        unsigned char *d, *cur_id_str;
        unsigned char __rcu *vpd_pg83;
        int id_size = -EINVAL;

        rcu_read_lock();
        vpd_pg83 = rcu_dereference(sdev->vpd_pg83);
        if (!vpd_pg83) {
                rcu_read_unlock();
                return -ENXIO;
        }

and the DM layer should not be looking at the properties of the
underlying devices in this way anyway.  It should be pushed down
to the table.
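
As a sketch of the userspace-side validation suggested above (multipathd
checking path identity before loading a table), something along these
lines could compare the kernel-reported designators of two paths.  This is
illustrative only, assumes the sysfs "wwid" attribute is present for the
devices, and is not multipath-tools code:

#include <stdio.h>
#include <string.h>

/* Read /sys/block/<dev>/device/wwid into buf; returns 0 on success. */
static int read_wwid(const char *dev, char *buf, size_t len)
{
        char path[256];
        FILE *f;

        snprintf(path, sizeof(path), "/sys/block/%s/device/wwid", dev);
        f = fopen(path, "r");
        if (!f)
                return -1;
        if (!fgets(buf, (int)len, f)) {
                fclose(f);
                return -1;
        }
        fclose(f);
        buf[strcspn(buf, "\n")] = '\0';
        return 0;
}

/* 1 if both paths report the same designator, 0 if they differ,
 * -1 if either has no usable designator (policy decision for the caller). */
int paths_share_wwid(const char *dev_a, const char *dev_b)
{
        char a[256], b[256];

        if (read_wwid(dev_a, a, sizeof(a)) || read_wwid(dev_b, b, sizeof(b)))
                return -1;
        return strcmp(a, b) == 0;
}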

-Ewan



--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH v2 09/11] block: Expose queue nr_zones in sysfs

2018-10-11 Thread Ewan D. Milne
So in v2 you moved the #ifdef CONFIG_BLK_DEV_ZONED
so that nr_zones is always present?  It was previously
changed to keep the request_queue size smaller.

Would it make more sense to make the nr_zones sysfs
code here conditional on CONFIG_BLK_DEV_ZONED?

-Ewan
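
A rough sketch of that alternative, keeping the sysfs pieces under the
config option instead of making nr_zones unconditional, might look like
this (not the posted patch; the default_attrs[] entry would need a
matching #ifdef or an is_visible() callback):

#ifdef CONFIG_BLK_DEV_ZONED
static ssize_t queue_nr_zones_show(struct request_queue *q, char *page)
{
        return queue_var_show(q->nr_zones, page);
}

static struct queue_sysfs_entry queue_nr_zones_entry = {
        .attr = {.name = "nr_zones", .mode = 0444 },
        .show = queue_nr_zones_show,
};
#endif /* CONFIG_BLK_DEV_ZONED */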


On Thu, 2018-10-11 at 16:09 +0900, Damien Le Moal wrote:
> Expose through sysfs the nr_zones field of a zoned block device request
> queue. This represents the total number of zones of the device
> calculated using the known disk capacity and zone size.
> 
> Exposing this value helps in debugging disk issues as well as
> facilitating scripts based use of the disk (e.g. blktests).
> 
> Signed-off-by: Damien Le Moal 
> ---
>  block/blk-sysfs.c  | 11 +++
>  include/linux/blkdev.h |  2 +-
>  2 files changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 3772671cf2bc..f7060a938bf9 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -300,6 +300,11 @@ static ssize_t queue_zoned_show(struct request_queue *q, char *page)
>   }
>  }
>  
> +static ssize_t queue_nr_zones_show(struct request_queue *q, char *page)
> +{
> + return queue_var_show(q->nr_zones, page);
> +}
> +
>  static ssize_t queue_nomerges_show(struct request_queue *q, char *page)
>  {
>   return queue_var_show((blk_queue_nomerges(q) << 1) |
> @@ -637,6 +642,11 @@ static struct queue_sysfs_entry queue_zoned_entry = {
>   .show = queue_zoned_show,
>  };
>  
> +static struct queue_sysfs_entry queue_nr_zones_entry = {
> + .attr = {.name = "nr_zones", .mode = 0444 },
> + .show = queue_nr_zones_show,
> +};
> +
>  static struct queue_sysfs_entry queue_nomerges_entry = {
>   .attr = {.name = "nomerges", .mode = 0644 },
>   .show = queue_nomerges_show,
> @@ -727,6 +737,7 @@ static struct attribute *default_attrs[] = {
>   &queue_write_zeroes_max_entry.attr,
>   &queue_nonrot_entry.attr,
>   &queue_zoned_entry.attr,
> + &queue_nr_zones_entry.attr,
>   &queue_nomerges_entry.attr,
>   &queue_rq_affinity_entry.attr,
>   &queue_iostats_entry.attr,
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index c24969b1741b..23ab53d2d4ca 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -593,7 +593,6 @@ struct request_queue {
>  
>   struct queue_limits limits;
>  
> -#ifdef CONFIG_BLK_DEV_ZONED
>   /*
>* Zoned block device information for request dispatch control.
>* nr_zones is the total number of zones of the device. This is always
> @@ -612,6 +611,7 @@ struct request_queue {
>* blk_mq_unfreeze_queue().
>*/
>   unsigned intnr_zones;
> +#ifdef CONFIG_BLK_DEV_ZONED
>   unsigned long   *seq_zones_bitmap;
>   unsigned long   *seq_zones_wlock;
>  #endif /* CONFIG_BLK_DEV_ZONED */


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [LSF/MM TOPIC] block, dm: restack queue_limits

2018-01-30 Thread Ewan D. Milne
On Tue, 2018-01-30 at 16:07 +0100, Hannes Reinecke wrote:
> On 01/29/2018 10:08 PM, Mike Snitzer wrote:
> > We currently don't restack the queue_limits if the lowest, or
> > intermediate, layer of an IO stack changes.
> > 
> > This is particularly unfortunate in the case of FLUSH/FUA which may
> > change if/when a HW controller's BBU fails; whereby requiring the device
> > advertise that it has a volatile write cache (WCE=1).
> > 
> Uh-oh. Device rescan.
> Would be a valid topic on its own...
> 
> > But in the context of DM, really it'd be best if the entire stack of
> > devices had their limits restacked if any underlying layer's limits
> > change.
> > 
> > In the past, Martin and I discussed that we should "just do it" but
> > never did.  Not sure we need a lengthy discussion but figured I'd put it
> > out there.
> > 
> > Maybe I'll find time, between now and April, to try implementing it.
> > 
> For SCSI the device capabilities are pretty much set in stone after the
> initial scan; there are hooks for rescanning, but they will only work
> half of the time.
> Plus we can't really change the device type on the fly (eg if the SCSI
> device type changes; if it moves away from '0' we would need to unbind
> the sd driver, and if it moves to '0' we'll need to rescan the sd
> device. None of this is happening right now.)
> 
> So I'd be glad to have a discussion around this.

At least one array vendor has also desired the ability to change various
attributes of the device after the initial scan, such as the model name.
Not sure what would break if we did this, since who knows what userspace
software might be caching this info, but...

-Ewan
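
A rough sketch of the kind of restacking Mike describes above, re-merging
the limits of the underlying devices and re-advertising the write cache
state, might look like the following (hypothetical helper, not dm-table
code, error handling omitted):

#include <linux/blkdev.h>

static void example_restack_limits(struct request_queue *top,
                                   struct block_device **bdevs, int count)
{
        struct queue_limits lim;
        bool wc = false, fua = false;
        int i;

        blk_set_stacking_limits(&lim);

        for (i = 0; i < count; i++) {
                struct request_queue *bq = bdev_get_queue(bdevs[i]);

                /* merge each underlying device's current limits */
                blk_stack_limits(&lim, &bq->limits, get_start_sect(bdevs[i]));
                if (test_bit(QUEUE_FLAG_WC, &bq->queue_flags))
                        wc = true;
                if (test_bit(QUEUE_FLAG_FUA, &bq->queue_flags))
                        fua = true;
        }

        top->limits = lim;
        /* re-advertise the volatile write cache (WCE) state upward */
        blk_queue_write_cache(top, wc, fua);
}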


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [for-4.14 RFC PATCH 1/2] dm rq: avoid deadlock if dm-mq is stacked on old .request_fn device(s)

2017-07-14 Thread Ewan D. Milne
On Fri, 2017-07-14 at 10:19 -0400, Mike Snitzer wrote:
> 
> Do you see a benefit to extracting that portion of your WIP patch
> (removing the ->complete handler entirely)?
> 
> Or leave well enough alone and just continue to disable dm-mq's ability
> to stack on .request_fn paths?
> 
> Given SCSI's switch to scsi-mq by default I cannot see value in propping
> up stacking on the old .request_fn devices.

So, the dm_mod.use_blk_mq flag is global, right?  I guess the question
is whether all of the block device types used on a system under DM are
supported under MQ.  If that is the case then we would be OK.

The other question is whether there are negative performance
consequences in any (corner?) cases with MQ that would result in it
being preferable to run in non-MQ mode (e.g. tag space with lpfc, did
we ever resolve that?). But the right approach there is to put the
effort into the MQ path going forward, as has been the case.

-Ewan




--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel