RE: [PATCH 1/1] scsi: ufs: make sure all interrupts are processed

2018-01-30 Thread Avri Altman
Hi,
Can you elaborate on how this can even happen?
Shouldn't the interrupt aggregation capability take care of those cases?

Thanks,
Avri

> -Original Message-
> From: linux-scsi-ow...@vger.kernel.org [mailto:linux-scsi-
> ow...@vger.kernel.org] On Behalf Of Asutosh Das
> Sent: Tuesday, January 30, 2018 6:54 AM
> To: subha...@codeaurora.org; c...@codeaurora.org;
> vivek.gau...@codeaurora.org; rna...@codeaurora.org;
> vinholika...@gmail.com; j...@linux.vnet.ibm.com;
> martin.peter...@oracle.com
> Cc: linux-scsi@vger.kernel.org; Venkat Gopalakrishnan
> ; Asutosh Das ; open
> list 
> Subject: [PATCH 1/1] scsi: ufs: make sure all interrupts are processed
> 
> From: Venkat Gopalakrishnan 
> 
> As multiple requests are submitted to the UFS host controller in
> parallel, there could be instances where the command completion
> interrupt arrives later for a request that was already processed
> earlier, because the corresponding doorbell was cleared when handling
> the previous interrupt. Read the interrupt status in a loop after
> processing the received interrupt to catch such interrupts and handle
> them.
> 
> Signed-off-by: Venkat Gopalakrishnan 
> Signed-off-by: Asutosh Das 
> ---
>  drivers/scsi/ufs/ufshcd.c | 27 +++
>  1 file changed, 19 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c index
> 8af2af3..58d81de 100644
> --- a/drivers/scsi/ufs/ufshcd.c
> +++ b/drivers/scsi/ufs/ufshcd.c
> @@ -5357,19 +5357,30 @@ static irqreturn_t ufshcd_intr(int irq, void *__hba)
>   u32 intr_status, enabled_intr_status;
>   irqreturn_t retval = IRQ_NONE;
>   struct ufs_hba *hba = __hba;
> + int retries = hba->nutrs;
> 
>   spin_lock(hba->host->host_lock);
>   intr_status = ufshcd_readl(hba, REG_INTERRUPT_STATUS);
> - enabled_intr_status =
> - intr_status & ufshcd_readl(hba, REG_INTERRUPT_ENABLE);
> 
> - if (intr_status)
> - ufshcd_writel(hba, intr_status, REG_INTERRUPT_STATUS);
> + /*
> +  * There could be max of hba->nutrs reqs in flight and in worst case
> +  * if the reqs get finished 1 by 1 after the interrupt status is
> +  * read, make sure we handle them by checking the interrupt status
> +  * again in a loop until we process all of the reqs before returning.
> +  */
> + do {
> + enabled_intr_status =
> + intr_status & ufshcd_readl(hba, REG_INTERRUPT_ENABLE);
> + if (intr_status)
> + ufshcd_writel(hba, intr_status, REG_INTERRUPT_STATUS);
> + if (enabled_intr_status) {
> + ufshcd_sl_intr(hba, enabled_intr_status);
> + retval = IRQ_HANDLED;
> + }
> +
> + intr_status = ufshcd_readl(hba, REG_INTERRUPT_STATUS);
> + } while (intr_status && --retries);
> 
> - if (enabled_intr_status) {
> - ufshcd_sl_intr(hba, enabled_intr_status);
> - retval = IRQ_HANDLED;
> - }
>   spin_unlock(hba->host->host_lock);
>   return retval;
>  }
> --
> Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center,
> Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
> Foundation Collaborative Project.



Re: [PATCH 0/6] Six qla2xxx and qla4xxx patches

2018-01-30 Thread Madhani, Himanshu
Hi Martin, 

> On Jan 30, 2018, at 7:04 PM, Martin K. Petersen  
> wrote:
> 
> 
> Bart,
> 
>> The patches in this series are what I came up with after having analyzed
>> the source code of the qla[24]xxx drivers with several source code analysis
>> tools (scripts/kernel-doc, gcc, sparse and smatch). None of the patches in
>> this series have been tested. Yet I'm asking you to consider at least the
>> first patch of this series for kernel v4.16 because it fixes a locking bug
>> in one of the SCSI patches queued for kernel v4.16.
> 
> Himanshu and team: Please review!
> 

Series looks good.

Acked-by: Himanshu Madhani 

> -- 
> Martin K. Petersen  Oracle Linux Engineering

Thanks,
- Himanshu



Re: [dm-devel] [PATCH v5] blk-mq: introduce BLK_STS_DEV_RESOURCE

2018-01-30 Thread Ming Lei
On Tue, Jan 30, 2018 at 08:22:27PM -0700, Jens Axboe wrote:
> On 1/30/18 8:21 PM, Bart Van Assche wrote:
> > On Tue, 2018-01-30 at 20:17 -0700, Jens Axboe wrote:
> >> BLK_STS_RESOURCE should always be safe to return, and it should work
> >> the same as STS_DEV_RESOURCE, except it may cause an extra queue
> >> run.
> >>
> >> Well written drivers should use STS_DEV_RESOURCE where it makes
> >> sense.
> > 
> > Hello Jens,
> > 
> > I would appreciate it if other names would be chosen than BLK_STS_RESOURCE
> > and BLK_STS_DEV_RESOURCE, e.g. names that directly refer to the fact that
> > one of the two status codes causes the queue to be rerun and the other not.
> > I'm afraid that the currently chosen names will cause confusion.
> 
> DEV_RESOURCE is pretty clear I think, but I agree that STS_RESOURCE
> could perhaps be better. STS_SYSTEM_RESOURCE? It makes the distinction

I guess it still can't cover everything; for example, .queue_rq() may not
submit the rq to hardware successfully because the transport is busy, such as with FC, ...

-- 
Ming


Re: [dm-devel] [PATCH v5] blk-mq: introduce BLK_STS_DEV_RESOURCE

2018-01-30 Thread Jens Axboe
On 1/30/18 8:27 PM, Bart Van Assche wrote:
> On Tue, 2018-01-30 at 20:22 -0700, Jens Axboe wrote:
>> On 1/30/18 8:21 PM, Bart Van Assche wrote:
>>> On Tue, 2018-01-30 at 20:17 -0700, Jens Axboe wrote:
 BLK_STS_RESOURCE should always be safe to return, and it should work
 the same as STS_DEV_RESOURCE, except it may cause an extra queue
 run.

 Well written drivers should use STS_DEV_RESOURCE where it makes
 sense.
>>>
>>> Hello Jens,
>>>
>>> I would appreciate it if other names would be chosen than BLK_STS_RESOURCE
>>> and BLK_STS_DEV_RESOURCE, e.g. names that directly refer to the fact that
>>> one of the two status codes causes the queue to be rerun and the other not.
>>> I'm afraid that the currently chosen names will cause confusion.
>>
>> DEV_RESOURCE is pretty clear I think, but I agree that STS_RESOURCE
>> could perhaps be better. STS_SYSTEM_RESOURCE? It makes the distinction
>> a bit clearer.
> 
> I'm concerned about both. A block driver can run out of device
> resources without receiving a notification when that resource becomes
> available again.

That is true, and that is why I don't want the driver to make guesses on
when to re-run. The saving grace here is that it should happen extremely
rarely. I went over this in a previous email. If it's not a rare
occurrence, then there's no way around it other than wiring up a
notification mechanism to kick the queue when the shortage clears.

One way to deal with that is making the assumption that other IO will
clear the issue. That means we can confine it to the blk-mq layer.
Similarly to how we currently sometimes have to track starvation across
hardware queues and restart queue X when Y completes IO, we may have to
have a broader scope of shortage. That would probably fix all but
pathological cases.

> In that case the appropriate return code is BLK_STS_RESOURCE. Hence my
> concern ...

But this isn't a new thing, the only real change here is making it so
that drivers don't have to think about that case.

-- 
Jens Axboe



Re: [dm-devel] [PATCH v5] blk-mq: introduce BLK_STS_DEV_RESOURCE

2018-01-30 Thread Bart Van Assche
On Tue, 2018-01-30 at 20:22 -0700, Jens Axboe wrote:
> On 1/30/18 8:21 PM, Bart Van Assche wrote:
> > On Tue, 2018-01-30 at 20:17 -0700, Jens Axboe wrote:
> > > BLK_STS_RESOURCE should always be safe to return, and it should work
> > > the same as STS_DEV_RESOURCE, except it may cause an extra queue
> > > run.
> > > 
> > > Well written drivers should use STS_DEV_RESOURCE where it makes
> > > sense.
> > 
> > Hello Jens,
> > 
> > I would appreciate it if other names would be chosen than BLK_STS_RESOURCE
> > and BLK_STS_DEV_RESOURCE, e.g. names that directly refer to the fact that
> > one of the two status codes causes the queue to be rerun and the other not.
> > I'm afraid that the currently chosen names will cause confusion.
> 
> DEV_RESOURCE is pretty clear I think, but I agree that STS_RESOURCE
> could perhaps be better. STS_SYSTEM_RESOURCE? It makes the distinction
> a bit clearer.

I'm concerned about both. A block driver can run out of device resources
without receiving a notification when that resource becomes available again. In
that case the appropriate return code is BLK_STS_RESOURCE. Hence my concern ...

Bart.

Re: [PATCH v3] scsi: sd: add new match array for cache_type

2018-01-30 Thread Martin K. Petersen

Weiping,

> Add user-friendly command strings (sd_wr_cache) to enable/disable the
> write cache. The user can enable both write and read cache with one of
> the following commands:

I remain unconvinced that introducing redundant option strings is a
benefit.

These shorthand forms may seem obvious to you, but not to everyone
else. Adding to the murkiness is the fact that write cache must be
*enabled* and read cache *disabled* in the protocol. The fact that the
w?rN and RCD columns below contain inverse values just emphasizes the
point that "0" and "1" are not a particularly good interface either.

It is also confusing that the existing longhand forms will be printed
when the user subsequently queries the state instead of what they used
to toggle it.


This, however, is pretty much what I was looking for:

> Encoding                          | WCE RCD | Write_cache Read_cache
> ----------------------------------+---------+-----------------------
> write through              / w0r1 | 0   0   | off         on
> none                       / w0r0 | 0   1   | off         off
> write back                 / w1r1 | 1   0   | on          on
> write back, no read (daft) / w1r0 | 1   1   | on          off

Please drop the w?r? options and submit a patch with this table and a
note about the "temporary" prefix. Put it in
Documentation/scsi/sd-parameters.txt or something to that effect.

Bonus points for also documenting the remaining sd sysfs attributes and
their options.
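For context, the longhand strings in the table are what the sd driver's existing sysfs attribute already accepts. A hedged example follows; the `0:0:0:0` device path is illustrative and varies per system:

```shell
# Read the current cache mode; the sd driver prints the longhand form.
cat /sys/class/scsi_disk/0:0:0:0/cache_type

# Set it using one of the longhand strings from the table above.
echo "write back" > /sys/class/scsi_disk/0:0:0:0/cache_type

# A "temporary " prefix applies the change without saving it to the
# drive's saved mode page, so it does not persist across power cycles.
echo "temporary write through" > /sys/class/scsi_disk/0:0:0:0/cache_type
```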

Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [dm-devel] [PATCH v5] blk-mq: introduce BLK_STS_DEV_RESOURCE

2018-01-30 Thread Jens Axboe
On 1/30/18 8:21 PM, Bart Van Assche wrote:
> On Tue, 2018-01-30 at 20:17 -0700, Jens Axboe wrote:
>> BLK_STS_RESOURCE should always be safe to return, and it should work
>> the same as STS_DEV_RESOURCE, except it may cause an extra queue
>> run.
>>
>> Well written drivers should use STS_DEV_RESOURCE where it makes
>> sense.
> 
> Hello Jens,
> 
> I would appreciate it if other names would be chosen than BLK_STS_RESOURCE
> and BLK_STS_DEV_RESOURCE, e.g. names that directly refer to the fact that
> one of the two status codes causes the queue to be rerun and the other not.
> I'm afraid that the currently chosen names will cause confusion.

DEV_RESOURCE is pretty clear I think, but I agree that STS_RESOURCE
could perhaps be better. STS_SYSTEM_RESOURCE? It makes the distinction
a bit clearer.

-- 
Jens Axboe



Re: [dm-devel] [PATCH v5] blk-mq: introduce BLK_STS_DEV_RESOURCE

2018-01-30 Thread Bart Van Assche
On Tue, 2018-01-30 at 20:17 -0700, Jens Axboe wrote:
> BLK_STS_RESOURCE should always be safe to return, and it should work
> the same as STS_DEV_RESOURCE, except it may cause an extra queue
> run.
> 
> Well written drivers should use STS_DEV_RESOURCE where it makes
> sense.

Hello Jens,

I would appreciate it if other names would be chosen than BLK_STS_RESOURCE
and BLK_STS_DEV_RESOURCE, e.g. names that directly refer to the fact that
one of the two status codes causes the queue to be rerun and the other not.
I'm afraid that the currently chosen names will cause confusion.

Thanks,

Bart.

Re: [PATCH v6] blk-mq: introduce BLK_STS_DEV_RESOURCE

2018-01-30 Thread Jens Axboe
On 1/30/18 8:04 PM, Mike Snitzer wrote:
> From: Ming Lei 
> 
> This status is returned from driver to block layer if device related
> resource is unavailable, but driver can guarantee that IO dispatch
> will be triggered in future when the resource is available.
> 
> Convert some drivers to return BLK_STS_DEV_RESOURCE.  Also, if driver
> returns BLK_STS_RESOURCE and SCHED_RESTART is set, rerun queue after
> a delay (BLK_MQ_DELAY_QUEUE) to avoid IO stalls.  BLK_MQ_DELAY_QUEUE is
> 3 ms because both scsi-mq and nvmefc are using that magic value.
> 
> If a driver can make sure there is in-flight IO, it is safe to return
> BLK_STS_DEV_RESOURCE because:
> 
> 1) If all in-flight IOs complete before examining SCHED_RESTART in
> blk_mq_dispatch_rq_list(), SCHED_RESTART must be cleared, so queue
> is run immediately in this case by blk_mq_dispatch_rq_list();
> 
> 2) if there is any in-flight IO after/when examining SCHED_RESTART
> in blk_mq_dispatch_rq_list():
> - if SCHED_RESTART isn't set, queue is run immediately as handled in 1)
> - otherwise, this request will be dispatched after any in-flight IO is
>   completed via blk_mq_sched_restart()
> 
> 3) if SCHED_RESTART is set concurrently in context because of
> BLK_STS_RESOURCE, blk_mq_delay_run_hw_queue() will cover the above two
> cases and make sure IO hang can be avoided.
> 
> One invariant is that queue will be rerun if SCHED_RESTART is set.

Applied, thanks.

-- 
Jens Axboe



Re: [dm-devel] [PATCH v5] blk-mq: introduce BLK_STS_DEV_RESOURCE

2018-01-30 Thread Jens Axboe
On 1/30/18 10:52 AM, Bart Van Assche wrote:
> On 01/30/18 06:24, Mike Snitzer wrote:
>> + *
>> + * If driver returns BLK_STS_RESOURCE and SCHED_RESTART
>> + * bit is set, run queue after a delay to avoid IO stalls
>> + * that could otherwise occur if the queue is idle.
>>   */
>> -if (!blk_mq_sched_needs_restart(hctx) ||
>> +needs_restart = blk_mq_sched_needs_restart(hctx);
>> +if (!needs_restart ||
>>  (no_tag && list_empty_careful(&hctx->dispatch_wait.entry)))
>>  blk_mq_run_hw_queue(hctx, true);
>> +else if (needs_restart && (ret == BLK_STS_RESOURCE))
>> +blk_mq_delay_run_hw_queue(hctx, BLK_MQ_QUEUE_DELAY);
>>  }
> 
> If a request completes concurrently with execution of the above code 
> then the request completion will trigger a call of 
> blk_mq_sched_restart_hctx() and that call will clear the 
> BLK_MQ_S_SCHED_RESTART bit. If that bit is cleared before the above code 
> tests it then the above code will schedule an asynchronous call of 
> __blk_mq_run_hw_queue(). If the .queue_rq() call triggered by the new 
> queue run returns again BLK_STS_RESOURCE then the above code will be 
> executed again. In other words, a loop occurs. That loop will repeat as 
> long as the described race occurs. The current (kernel v4.15) block 
> layer behavior is simpler: only block drivers call 
> blk_mq_delay_run_hw_queue() and the block layer core never calls that 
> function. Hence that loop cannot occur with the v4.15 block layer core 
> and block drivers. A motivation of why that loop is preferred compared 
> to the current behavior (no loop) is missing. Does this mean that that 
> loop is a needless complication of the block layer core?

The dispatch, and later restart check, is within the hctx lock. The
completions should be as well.

> Sorry but I still prefer the v4.15 block layer approach because this 
> patch has in my view the following disadvantages:
> - It involves a blk-mq API change. API changes are always risky and need
>some time to stabilize.
> - The delay after which to rerun the queue is moved from block layer
>drivers into the block layer core. I think that's wrong because only
>the block driver authors can make a good choice for this constant.

It's exactly the right place to put it, as drivers cannot make a good
decision on when to run the queue again. You get NULL when allocating
some piece of memory; when do you run it again? That's the last thing
I want driver writers to make a decision on, because a novice device
driver writer will just think that he should run the queue again asap.
In reality we are screwed. Decisions like that SHOULD be in shared
and generic code, not in driver private code.

> - This patch makes block drivers harder to understand. Anyone who sees
>return BLK_STS_RESOURCE / return BLK_STS_DEV_RESOURCE for the first
>time will have to look up the meaning of these constants. An explicit
>blk_mq_delay_run_hw_queue() call is easier to understand.

BLK_STS_RESOURCE should always be safe to return, and it should work
the same as STS_DEV_RESOURCE, except it may cause an extra queue
run.

Well written drivers should use STS_DEV_RESOURCE where it makes
sense.

> - This patch does not fix any bugs nor makes block drivers easier to
>read or to implement. So why is this patch considered useful?

It does fix the run bug on global resources. I'm assuming you mean
it doesn't fix any EXTRA bugs compared to just using the delayed
run?

-- 
Jens Axboe



Re: [PATCH v5] blk-mq: introduce BLK_STS_DEV_RESOURCE

2018-01-30 Thread Mike Snitzer
On Tue, Jan 30 2018 at  9:44P -0500,
Jens Axboe  wrote:

> On 1/30/18 7:24 AM, Mike Snitzer wrote:
> > From: Ming Lei 
> > 
> > This status is returned from driver to block layer if device related
> > resource is unavailable, but driver can guarantee that IO dispatch
> > will be triggered in future when the resource is available.
> > 
> > Convert some drivers to return BLK_STS_DEV_RESOURCE.  Also, if driver
> > returns BLK_STS_RESOURCE and SCHED_RESTART is set, rerun queue after
> > a delay (BLK_MQ_DELAY_QUEUE) to avoid IO stalls.  BLK_MQ_DELAY_QUEUE is
> > 3 ms because both scsi-mq and nvmefc are using that magic value.
> > 
> > If a driver can make sure there is in-flight IO, it is safe to return
> > BLK_STS_DEV_RESOURCE because:
> > 
> > 1) If all in-flight IOs complete before examining SCHED_RESTART in
> > blk_mq_dispatch_rq_list(), SCHED_RESTART must be cleared, so queue
> > is run immediately in this case by blk_mq_dispatch_rq_list();
> > 
> > 2) if there is any in-flight IO after/when examining SCHED_RESTART
> > in blk_mq_dispatch_rq_list():
> > - if SCHED_RESTART isn't set, queue is run immediately as handled in 1)
> > - otherwise, this request will be dispatched after any in-flight IO is
> >   completed via blk_mq_sched_restart()
> > 
> > 3) if SCHED_RESTART is set concurrently in context because of
> > BLK_STS_RESOURCE, blk_mq_delay_run_hw_queue() will cover the above two
> > cases and make sure IO hang can be avoided.
> > 
> > One invariant is that queue will be rerun if SCHED_RESTART is set.
> 
> This looks pretty good to me. I'm waffling a bit on whether to retain
> the current BLK_STS_RESOURCE behavior and name the new one something
> else, but I do like using the DEV name in there to signify the
> difference between a global and device resource.
> 
> Just a few small nits below - can you roll a v6 with the changes?

Folded in all your feedback and just replied with v6.

> > diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> > index 2d973ac54b09..f41d2057215f 100644
> > --- a/include/linux/blk_types.h
> > +++ b/include/linux/blk_types.h
> > @@ -39,6 +39,23 @@ typedef u8 __bitwise blk_status_t;
> >  
> >  #define BLK_STS_AGAIN  ((__force blk_status_t)12)
> >  
> > +/*
> > + * BLK_STS_DEV_RESOURCE is returned from driver to block layer if device
> > + * related resource is unavailable, but driver can guarantee that queue
> > + * will be rerun in future once the resource is available (whereby
> > + * dispatching requests).
> > + *
> > + * To safely return BLK_STS_DEV_RESOURCE, and allow forward progress, a
> > + * driver just needs to make sure there is in-flight IO.
> > + *
> > + * Difference with BLK_STS_RESOURCE:
> > + * If driver isn't sure if the queue will be rerun once device resource
> > + * is made available, please return BLK_STS_RESOURCE.  For example: when
> > + * memory allocation, DMA Mapping or other system resource allocation
> > + * fails and IO can't be submitted to device.
> > + */
> > +#define BLK_STS_DEV_RESOURCE   ((__force blk_status_t)13)
> 
> I'd rephrase that as:
> 
> BLK_STS_DEV_RESOURCE is returned from the driver to the block layer if
> device related resource are unavailable, but the driver can guarantee
> that the queue will be rerun in the future once resources become
> available again. This is typically the case for device specific
> resources that are consumed for IO. If the driver fails allocating these
> resources, we know that inflight (or pending) IO will free these
> resource upon completion.
> 
> This is different from BLK_STS_RESOURCE in that it explicitly references
> device specific resource. For resources of wider scope, allocation
> failure can happen without having pending IO. This means that we can't
> rely on request completions freeing these resources, as IO may not be in
> flight. Examples of that are kernel memory allocations, DMA mappings, or
> any other system wide resources.

Thanks for that, definitely clearer, nice job.

Mike


Re: [PATCH 1/1] scsi: storvsc: Increase cmd_per_lun for higher speed devices

2018-01-30 Thread Martin K. Petersen

> Increase cmd_per_lun to allow more I/Os in progress per device,
> particularly for NVMe's.  The Hyper-V host side can handle the
> higher count with no issues.

Long/KY/Cathy/Stephen: Please ack/review Michael's patches.

-- 
Martin K. Petersen  Oracle Linux Engineering


[PATCH v6] blk-mq: introduce BLK_STS_DEV_RESOURCE

2018-01-30 Thread Mike Snitzer
From: Ming Lei 

This status is returned from driver to block layer if device related
resource is unavailable, but driver can guarantee that IO dispatch
will be triggered in future when the resource is available.

Convert some drivers to return BLK_STS_DEV_RESOURCE.  Also, if driver
returns BLK_STS_RESOURCE and SCHED_RESTART is set, rerun queue after
a delay (BLK_MQ_DELAY_QUEUE) to avoid IO stalls.  BLK_MQ_DELAY_QUEUE is
3 ms because both scsi-mq and nvmefc are using that magic value.

If a driver can make sure there is in-flight IO, it is safe to return
BLK_STS_DEV_RESOURCE because:

1) If all in-flight IOs complete before examining SCHED_RESTART in
blk_mq_dispatch_rq_list(), SCHED_RESTART must be cleared, so queue
is run immediately in this case by blk_mq_dispatch_rq_list();

2) if there is any in-flight IO after/when examining SCHED_RESTART
in blk_mq_dispatch_rq_list():
- if SCHED_RESTART isn't set, queue is run immediately as handled in 1)
- otherwise, this request will be dispatched after any in-flight IO is
  completed via blk_mq_sched_restart()

3) if SCHED_RESTART is set concurrently in context because of
BLK_STS_RESOURCE, blk_mq_delay_run_hw_queue() will cover the above two
cases and make sure IO hang can be avoided.

One invariant is that queue will be rerun if SCHED_RESTART is set.

Suggested-by: Jens Axboe 
Tested-by: Laurence Oberman 
Signed-off-by: Ming Lei 
Signed-off-by: Mike Snitzer 
---
V6:
- use -EBUSY, instead of -ENOMEM, for BLK_STS_DEV_RESOURCE
- rename BLK_MQ_QUEUE_DELAY to BLK_MQ_RESOURCE_DELAY
- cleanup blk_types.h comment block above BLK_STS_DEV_RESOURCE
- all suggested by Jens
V5:
- fixed stale reference to 10ms delay in blk-mq.c comment
- revised commit header to include Ming's proof of how
  blk-mq drivers will make progress (serves to document how
  scsi_queue_rq now works)
V4:
- cleanup header and code comments
- rerun queue after BLK_MQ_QUEUE_DELAY (3ms) instead of 10ms
- eliminate nvmefc's queue rerun now that blk-mq does it
V3:
- fix typo and improve documentation
- add tested-by tag
V2:
- add comments on the new introduced status
- patch style fix
- both are suggested by Christoph

 block/blk-core.c |  1 +
 block/blk-mq.c   | 20 
 drivers/block/null_blk.c |  2 +-
 drivers/block/virtio_blk.c   |  2 +-
 drivers/block/xen-blkfront.c |  2 +-
 drivers/md/dm-rq.c   |  5 ++---
 drivers/nvme/host/fc.c   | 12 ++--
 drivers/scsi/scsi_lib.c  |  6 +++---
 include/linux/blk_types.h| 18 ++
 9 files changed, 45 insertions(+), 23 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index cdae69be68e9..79dfef84c66c 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -145,6 +145,7 @@ static const struct {
[BLK_STS_MEDIUM]= { -ENODATA,   "critical medium" },
[BLK_STS_PROTECTION]= { -EILSEQ,"protection" },
[BLK_STS_RESOURCE]  = { -ENOMEM,"kernel resource" },
+   [BLK_STS_DEV_RESOURCE]  = { -EBUSY, "device resource" },
[BLK_STS_AGAIN] = { -EAGAIN,"nonblocking retry" },
 
/* device mapper special case, should not leak out: */
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 43e7449723e0..52a206e3777f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1160,6 +1160,8 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx **hctx,
return true;
 }
 
+#define BLK_MQ_RESOURCE_DELAY  3   /* ms units */
+
 bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 bool got_budget)
 {
@@ -1167,6 +1169,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
struct request *rq, *nxt;
bool no_tag = false;
int errors, queued;
+   blk_status_t ret = BLK_STS_OK;
 
if (list_empty(list))
return false;
@@ -1179,7 +1182,6 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
errors = queued = 0;
do {
struct blk_mq_queue_data bd;
-   blk_status_t ret;
 
rq = list_first_entry(list, struct request, queuelist);
		if (!blk_mq_get_driver_tag(rq, &hctx, false)) {
@@ -1224,7 +1226,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
}
 
		ret = q->mq_ops->queue_rq(hctx, &bd);
-   if (ret == BLK_STS_RESOURCE) {
+   if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE) {
/*
 * If an I/O scheduler has been configured and we got a
 * driver tag for the next request already, free it
@@ -1255,6 +1257,8 @@ bool 

Re: [PATCH 0/6] Six qla2xxx and qla4xxx patches

2018-01-30 Thread Martin K. Petersen

Bart,

> The patches in this series are what I came up with after having analyzed
> the source code of the qla[24]xxx drivers with several source code analysis
> tools (scripts/kernel-doc, gcc, sparse and smatch). None of the patches in
> this series have been tested. Yet I'm asking you to consider at least the
> first patch of this series for kernel v4.16 because it fixes a locking bug
> in one of the SCSI patches queued for kernel v4.16.

Himanshu and team: Please review!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] scsi_dh: Document alua_rtpg_queue() arguments

2018-01-30 Thread Martin K. Petersen

Bart,

> Since commit 3a025e1d1c2e ("Add optional check for bad kernel-doc
> comments") building with W=1 causes warnings to appear for issues
> in kernel-doc headers. This patch avoids that the following warnings
> are reported when building with W=1:

Applied to 4.16/scsi-fixes, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] scsi: remove dead makefile about oktagon files

2018-01-30 Thread Martin K. Petersen

Corentin,

> Remove a line referencing nonexistent files, which were removed in
> commit 642978beb483 ("[SCSI] remove m68k NCR53C9x based drivers")

Applied to 4.16/scsi-fixes, thank you.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] scsi: aic7xxx: remove really aiclib.c

2018-01-30 Thread Martin K. Petersen

Corentin,

> aiclib.c has been unused (and contains no code) since commit 1ff927306e08
> ("[SCSI] aic7xxx: remove aiclib.c").
> 13 years later, finish the cleanup by removing it from the tree.

Applied to 4.16/scsi-fixes, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] mptfusion: Add bounds check in mptctl_hp_targetinfo()

2018-01-30 Thread Martin K. Petersen

Dan,

> My static checker complains about an out of bounds read:
>
> drivers/message/fusion/mptctl.c:2786 mptctl_hp_targetinfo()
> error: buffer overflow 'hd->sel_timeout' 255 <= u32max.
>
> It's true that we probably should have a bounds check here.

Applied to 4.16/scsi-fixes, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v5] blk-mq: introduce BLK_STS_DEV_RESOURCE

2018-01-30 Thread Jens Axboe
On 1/30/18 7:24 AM, Mike Snitzer wrote:
> From: Ming Lei 
> 
> This status is returned from driver to block layer if device related
> resource is unavailable, but driver can guarantee that IO dispatch
> will be triggered in future when the resource is available.
> 
> Convert some drivers to return BLK_STS_DEV_RESOURCE.  Also, if driver
> returns BLK_STS_RESOURCE and SCHED_RESTART is set, rerun queue after
> a delay (BLK_MQ_DELAY_QUEUE) to avoid IO stalls.  BLK_MQ_DELAY_QUEUE is
> 3 ms because both scsi-mq and nvmefc are using that magic value.
> 
> If a driver can make sure there is in-flight IO, it is safe to return
> BLK_STS_DEV_RESOURCE because:
> 
> 1) If all in-flight IOs complete before examining SCHED_RESTART in
> blk_mq_dispatch_rq_list(), SCHED_RESTART must be cleared, so queue
> is run immediately in this case by blk_mq_dispatch_rq_list();
> 
> 2) if there is any in-flight IO after/when examining SCHED_RESTART
> in blk_mq_dispatch_rq_list():
> - if SCHED_RESTART isn't set, queue is run immediately as handled in 1)
> - otherwise, this request will be dispatched after any in-flight IO is
>   completed via blk_mq_sched_restart()
> 
> 3) if SCHED_RESTART is set concurrently in context because of
> BLK_STS_RESOURCE, blk_mq_delay_run_hw_queue() will cover the above two
> cases and make sure IO hang can be avoided.
> 
> One invariant is that queue will be rerun if SCHED_RESTART is set.

This looks pretty good to me. I'm waffling a bit on whether to retain
the current BLK_STS_RESOURCE behavior and name the new one something
else, but I do like using the DEV name in there to signify the
difference between a global and device resource.

Just a few small nits below - can you roll a v6 with the changes?

> diff --git a/block/blk-core.c b/block/blk-core.c
> index cdae69be68e9..38279d4ae08b 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -145,6 +145,7 @@ static const struct {
>   [BLK_STS_MEDIUM]= { -ENODATA,   "critical medium" },
>   [BLK_STS_PROTECTION]= { -EILSEQ,"protection" },
>   [BLK_STS_RESOURCE]  = { -ENOMEM,"kernel resource" },
> + [BLK_STS_DEV_RESOURCE]  = { -ENOMEM,"device resource" },
>   [BLK_STS_AGAIN] = { -EAGAIN,"nonblocking retry" },

I think we should make BLK_STS_DEV_RESOURCE be -EBUSY, that more closely
matches the result and makes it distinctly different than the global
shortage that is STS_RESOURCE/ENOMEM.

> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 43e7449723e0..e39b4e2a63db 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1160,6 +1160,8 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx **hctx,
>   return true;
>  }
>  
> +#define BLK_MQ_QUEUE_DELAY   3   /* ms units */

Make that BLK_MQ_RESOURCE_DELAY or something like that. The name is too
generic right now.

> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> index 2d973ac54b09..f41d2057215f 100644
> --- a/include/linux/blk_types.h
> +++ b/include/linux/blk_types.h
> @@ -39,6 +39,23 @@ typedef u8 __bitwise blk_status_t;
>  
>  #define BLK_STS_AGAIN((__force blk_status_t)12)
>  
> +/*
> + * BLK_STS_DEV_RESOURCE is returned from driver to block layer if device
> + * related resource is unavailable, but driver can guarantee that queue
> + * will be rerun in future once the resource is available (whereby
> + * dispatching requests).
> + *
> + * To safely return BLK_STS_DEV_RESOURCE, and allow forward progress, a
> + * driver just needs to make sure there is in-flight IO.
> + *
> + * Difference with BLK_STS_RESOURCE:
> + * If driver isn't sure if the queue will be rerun once device resource
> + * is made available, please return BLK_STS_RESOURCE.  For example: when
> + * memory allocation, DMA Mapping or other system resource allocation
> + * fails and IO can't be submitted to device.
> + */
> +#define BLK_STS_DEV_RESOURCE ((__force blk_status_t)13)

I'd rephrase that as:

BLK_STS_DEV_RESOURCE is returned from the driver to the block layer if
device related resource are unavailable, but the driver can guarantee
that the queue will be rerun in the future once resources become
available again. This is typically the case for device specific
resources that are consumed for IO. If the driver fails allocating these
resources, we know that inflight (or pending) IO will free these
resource upon completion.

This is different from BLK_STS_RESOURCE in that it explicitly references
device specific resources. For resources of wider scope, allocation
failure can happen without there being pending IO. This means that we can't
rely on request completions freeing these resources, as IO may not be in
flight. Examples of that are kernel memory allocations, DMA mappings, or
any other system wide resources.
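The distinction can be sketched as a toy model (not kernel code; `toy_queue_rq`, `toy_dev`, and the status enum are illustrative stand-ins for a driver's .queue_rq and blk_status_t):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy status codes mirroring the blk_status_t semantics described
 * above; names are illustrative, not the kernel's. */
enum toy_status { TOY_STS_OK, TOY_STS_RESOURCE, TOY_STS_DEV_RESOURCE };

/* A driver-owned tag pool: a device specific resource consumed per IO. */
struct toy_dev {
    int tags_in_use;
    int tags_max;
};

/* If a device tag is unavailable, some IO must already be in flight
 * (tags_in_use == tags_max > 0), so its completion is guaranteed to
 * rerun the queue: DEV_RESOURCE is safe.  A failed system-wide
 * allocation (e.g. memory) carries no such guarantee, so it must
 * return plain RESOURCE. */
enum toy_status toy_queue_rq(struct toy_dev *dev, bool mem_alloc_ok)
{
    if (!mem_alloc_ok)
        return TOY_STS_RESOURCE;      /* system resource: no inflight-IO guarantee */
    if (dev->tags_in_use >= dev->tags_max)
        return TOY_STS_DEV_RESOURCE;  /* device resource: inflight IO exists */
    dev->tags_in_use++;               /* tag consumed; freed on completion */
    return TOY_STS_OK;
}
```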

-- 
Jens Axboe



Re: [PATCH] qla2xxx: Avoid triggering undefined behavior in qla2x00_mbx_completion()

2018-01-30 Thread Martin K. Petersen

Bart,

> A left shift must shift less than the bit width of the left argument.
> Avoid triggering undefined behavior if ha->mbx_count == 32.
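For context, shifting a 32-bit value by 32 is undefined behavior in C, so a mask built as `(1 << count) - 1` breaks when `count` equals the type width. A generic sketch of the guard (illustrative, not the actual qla2xxx fix):

```c
#include <assert.h>
#include <stdint.h>

/* Build a mask of the low mbx_count bits without ever shifting a
 * 32-bit value by 32 (which would be undefined behavior). */
static uint32_t mbx_mask(unsigned int mbx_count)
{
    if (mbx_count >= 32)
        return 0xffffffffu;           /* all 32 mailboxes */
    return (1u << mbx_count) - 1;     /* shift is always < 32 here */
}
```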

Applied to 4.16/scsi-fixes. Thank you!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] [SCSI] sym53c8xx_2: iterator underflow in sym_getsync()

2018-01-30 Thread Martin K. Petersen

Dan,

> We wanted to exit the loop with "div" set to zero, but instead, if we
> don't hit the break then "div" is -1 when we finish the loop.  It
> leads to an array underflow a few lines later.
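The bug pattern can be sketched generically (illustrative code, not the sym53c8xx source): a countdown search loop whose fall-through value is -1 rather than the intended 0, so a later array access underflows.

```c
#include <assert.h>

/* Buggy shape: when no entry matches, the loop falls through with
 * div == -1, and tab[div] later underflows the array. */
static int find_div_buggy(const int *tab, int n, int want)
{
    int div;
    for (div = n - 1; div >= 0; div--)
        if (tab[div] == want)
            break;
    return div;                       /* -1 when not found: unsafe index */
}

/* Fixed shape: stop at index 0 so the fallback is a valid entry. */
static int find_div_fixed(const int *tab, int n, int want)
{
    int div;
    for (div = n - 1; div > 0; div--)
        if (tab[div] == want)
            break;
    return div;                       /* 0 when not found: safe index */
}
```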

Applied to 4.16/scsi-fixes, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] bnx2fc: Fix check in SCSI completion handler when request has already timed out.

2018-01-30 Thread Martin K. Petersen

Chad,

> When a request times out we set the io_req flag BNX2FC_FLAG_IO_COMPL so
> that if a subsequent completion comes in on that task ID we will ignore
> it.  The issue is that in the check for this flag there is a missing
> return so we will continue to process a request which may have already
> been returned to the ownership of the SCSI layer.  This can cause
> unpredictable results.
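The shape of the fix is a classic early return on a completion flag; a minimal sketch (names are illustrative, not bnx2fc's):

```c
#include <assert.h>

#define TOY_FLAG_IO_COMPL 0x1u

struct toy_io_req {
    unsigned int flags;
    int completions_processed;
};

/* Once the request has been flagged complete (e.g. it timed out and
 * was handed back to the SCSI layer), a late completion on the same
 * task ID must be dropped -- the missing "return" was the bug. */
static void toy_process_compl(struct toy_io_req *req)
{
    if (req->flags & TOY_FLAG_IO_COMPL)
        return;                       /* already completed: ignore */
    req->completions_processed++;     /* normal completion path */
}
```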

Applied to 4.16/scsi-fixes. Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] scsi: csiostor: remove redundant assignment to pointer 'ln'

2018-01-30 Thread Martin K. Petersen

Colin,

> The pointer ln is assigned a value that is never read, it is re-assigned
> a new value in the list_for_each loop hence the initialization is
> redundant and can be removed.

Applied to 4.16/scsi-fixes. Thank you.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH 1/1] scsi: ufs: Enable quirk to ignore sending WRITE_SAME command

2018-01-30 Thread Martin K. Petersen

Asutosh,

>   /* WRITE_SAME command is not supported*/
>   sdev->no_write_same = 1;
>  
> + /* WRITE_SAME command is not supported*/
> + sdev->no_write_same = 1;
> +

Uhm...

I applied this change to 4.16/scsi-fixes by hand.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] ibmvfc: fix misdefined reserved field in ibmvfc_fcp_rsp_info

2018-01-30 Thread Martin K. Petersen

Tyrel,

> The fcp_rsp_info structure as defined in the FC spec has an initial 3 bytes
> reserved field. The ibmvfc driver mistakenly defined this field as 4 bytes
> resulting in the rsp_code field being defined in what should be the start of
> the second reserved field and thus always being reported as zero by the
> driver.
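The offset consequence can be shown with two packed layouts (field names are illustrative, not the driver's): with a 4-byte reserved prefix, rsp_code lands at offset 4, inside what the spec defines as the trailing reserved area, so it always reads zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mistaken layout: rsp_code at offset 4. */
struct fcp_rsp_info_wrong {
    uint8_t reserved[4];
    uint8_t rsp_code;
    uint8_t reserved2[3];
} __attribute__((packed));

/* Spec layout: 3 reserved bytes, rsp_code at offset 3, then 4 more
 * reserved bytes (8 bytes total). */
struct fcp_rsp_info_right {
    uint8_t reserved[3];
    uint8_t rsp_code;
    uint8_t reserved2[4];
} __attribute__((packed));
```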

Applied to 4.16/scsi-fixes, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] qla2xxx: Fix memory corruption during hba reset test

2018-01-30 Thread Martin K. Petersen

Himanshu,

> This patch fixes memory corruption while performing
> an HBA reset test.

Applied to 4.16/scsi-fixes. Thank you!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [dm-devel] [PATCH v5] blk-mq: introduce BLK_STS_DEV_RESOURCE

2018-01-30 Thread Ming Lei
On Tue, Jan 30, 2018 at 09:52:31AM -0800, Bart Van Assche wrote:
> On 01/30/18 06:24, Mike Snitzer wrote:
> > +*
> > +* If driver returns BLK_STS_RESOURCE and SCHED_RESTART
> > +* bit is set, run queue after a delay to avoid IO stalls
> > +* that could otherwise occur if the queue is idle.
> >  */
> > -   if (!blk_mq_sched_needs_restart(hctx) ||
> > +   needs_restart = blk_mq_sched_needs_restart(hctx);
> > +   if (!needs_restart ||
> > (no_tag && list_empty_careful(&hctx->dispatch_wait.entry)))
> > blk_mq_run_hw_queue(hctx, true);
> > +   else if (needs_restart && (ret == BLK_STS_RESOURCE))
> > +   blk_mq_delay_run_hw_queue(hctx, BLK_MQ_QUEUE_DELAY);
> > }
> 
> If a request completes concurrently with execution of the above code then
> the request completion will trigger a call of blk_mq_sched_restart_hctx()
> and that call will clear the BLK_MQ_S_SCHED_RESTART bit. If that bit is
> cleared before the above code tests it then the above code will schedule an
> asynchronous call of __blk_mq_run_hw_queue(). If the .queue_rq() call

Right.

> triggered by the new queue run returns again BLK_STS_RESOURCE then the above
> code will be executed again. In other words, a loop occurs. That loop will

This patch doesn't change anything about that. When BLK_STS_RESOURCE is
returned, the request is added to hctx->dispatch; next time, before this
request is dispatched, SCHED_RESTART is set and the loop is cut.

> repeat as long as the described race occurs. The current (kernel v4.15)
> block layer behavior is simpler: only block drivers call
> blk_mq_delay_run_hw_queue() and the block layer core never calls that
> function. Hence that loop cannot occur with the v4.15 block layer core and

That way isn't safe, as I have explained to you in the following link:

https://marc.info/?l=dm-devel&m=151727454815018&w=2

> block drivers. A motivation of why that loop is preferred compared to the
> current behavior (no loop) is missing. Does this mean that that loop is a
> needless complication of the block layer core?

No such loop as I explained above.

> 
> Sorry but I still prefer the v4.15 block layer approach because this patch
> has in my view the following disadvantages:
> - It involves a blk-mq API change. API changes are always risky and need
>   some time to stabilize.
> - The delay after which to rerun the queue is moved from block layer
>   drivers into the block layer core. I think that's wrong because only
>   the block driver authors can make a good choice for this constant.
> - This patch makes block drivers harder to understand. Anyone who sees
>   return BLK_STS_RESOURCE / return BLK_STS_DEV_RESOURCE for the first
>   time will have to look up the meaning of these constants. An explicit
>   blk_mq_delay_run_hw_queue() call is easier to understand.
> - This patch makes the blk-mq core harder to understand because of the
>   loop mentioned above.
> - This patch does not fix any bugs nor makes block drivers easier to
>   read or to implement. So why is this patch considered useful?

It does avoid the race I mentioned in the following link:

https://marc.info/?l=dm-devel&m=151727454815018&w=2

More importantly, every driver needs this change. If you have a better idea
to fix them all, please post a patch; then we can compare both.

Thanks,
Ming


Re: [PATCH] mpt3sas: fix an out of bound write

2018-01-30 Thread Martin K. Petersen

Tomas,

> cpu_msix_table is allocated to store online cpus, but
> pci_irq_get_affinity may return cpu_possible_mask, which is then used
> to access cpu_msix_table. That causes bad user experience.  The fix
> limits access to only online cpus; I've also added an additional test
> to protect from an unlikely change in cpu_online_mask.
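The shape of such a fix is a bounds check before indexing the per-CPU table; a minimal sketch (names illustrative, not mpt3sas's):

```c
#include <assert.h>

/* The lookup table is sized for online CPUs only, but the affinity
 * mask may name any possible CPU, so clamp before indexing. */
static int msix_index_for_cpu(const int *cpu_msix_table,
                              int nr_online_cpus, int cpu)
{
    if (cpu < 0 || cpu >= nr_online_cpus)
        return 0;                     /* fall back to the first vector */
    return cpu_msix_table[cpu];
}
```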

Applied to 4.16/scsi-fixes. Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


[PATCH v3 09/19] lpfc: Allow set of maximum outstanding SCSI cmd limit for a target parameter

2018-01-30 Thread James Smart
Make the attribute writeable.
Remove the ramp-up logic as it's unnecessary; simply set the depth.
Add a debug message if the depth is changed, possibly reducing the limit
while our outstanding count has yet to catch up with it.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
Reviewed-by: Hannes Reinecke 
---
 drivers/scsi/lpfc/lpfc_attr.c |  4 ++--
 drivers/scsi/lpfc/lpfc_scsi.c | 39 ---
 2 files changed, 30 insertions(+), 13 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
index 91df2b4e9fce..d8064e3ea0ba 100644
--- a/drivers/scsi/lpfc/lpfc_attr.c
+++ b/drivers/scsi/lpfc/lpfc_attr.c
@@ -3471,8 +3471,8 @@ LPFC_VPORT_ATTR_R(lun_queue_depth, 30, 1, 512,
 # tgt_queue_depth:  This parameter is used to limit the number of outstanding
 # commands per target port. Value range is [10,65535]. Default value is 65535.
 */
-LPFC_VPORT_ATTR_R(tgt_queue_depth, 65535, 10, 65535,
- "Max number of FCP commands we can queue to a specific target 
port");
+LPFC_VPORT_ATTR_RW(tgt_queue_depth, 65535, 10, 65535,
+  "Max number of FCP commands we can queue to a specific 
target port");
 
 /*
 # hba_queue_depth:  This parameter is used to limit the number of outstanding
diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
index c0cdaef4db24..dcc86936e6fc 100644
--- a/drivers/scsi/lpfc/lpfc_scsi.c
+++ b/drivers/scsi/lpfc/lpfc_scsi.c
@@ -3926,7 +3926,6 @@ lpfc_scsi_cmd_iocb_cmpl(struct lpfc_hba *phba, struct 
lpfc_iocbq *pIocbIn,
struct lpfc_rport_data *rdata = lpfc_cmd->rdata;
struct lpfc_nodelist *pnode = rdata->pnode;
struct scsi_cmnd *cmd;
-   int depth;
unsigned long flags;
struct lpfc_fast_path_event *fast_path_evt;
struct Scsi_Host *shost;
@@ -4132,16 +4131,11 @@ lpfc_scsi_cmd_iocb_cmpl(struct lpfc_hba *phba, struct 
lpfc_iocbq *pIocbIn,
}
spin_unlock_irqrestore(shost->host_lock, flags);
} else if (pnode && NLP_CHK_NODE_ACT(pnode)) {
-   if ((pnode->cmd_qdepth < vport->cfg_tgt_queue_depth) &&
-  time_after(jiffies, pnode->last_change_time +
+   if ((pnode->cmd_qdepth != vport->cfg_tgt_queue_depth) &&
+   time_after(jiffies, pnode->last_change_time +
  msecs_to_jiffies(LPFC_TGTQ_INTERVAL))) {
spin_lock_irqsave(shost->host_lock, flags);
-   depth = pnode->cmd_qdepth * LPFC_TGTQ_RAMPUP_PCENT
-   / 100;
-   depth = depth ? depth : 1;
-   pnode->cmd_qdepth += depth;
-   if (pnode->cmd_qdepth > vport->cfg_tgt_queue_depth)
-   pnode->cmd_qdepth = vport->cfg_tgt_queue_depth;
+   pnode->cmd_qdepth = vport->cfg_tgt_queue_depth;
pnode->last_change_time = jiffies;
spin_unlock_irqrestore(shost->host_lock, flags);
}
@@ -4564,9 +4558,32 @@ lpfc_queuecommand(struct Scsi_Host *shost, struct 
scsi_cmnd *cmnd)
 */
if (!ndlp || !NLP_CHK_NODE_ACT(ndlp))
goto out_tgt_busy;
-   if (atomic_read(&ndlp->cmd_pending) >= ndlp->cmd_qdepth)
+   if (atomic_read(&ndlp->cmd_pending) >= ndlp->cmd_qdepth) {
+   lpfc_printf_vlog(vport, KERN_INFO, LOG_FCP_ERROR,
+"3377 Target Queue Full, scsi Id:%d Qdepth:%d"
+" Pending command:%d"
+" 
WWNN:%02x:%02x:%02x:%02x:%02x:%02x:%02x:%02x, "
+" 
WWPN:%02x:%02x:%02x:%02x:%02x:%02x:%02x:%02x",
+ndlp->nlp_sid, ndlp->cmd_qdepth,
+atomic_read(&ndlp->cmd_pending),
+ndlp->nlp_nodename.u.wwn[0],
+ndlp->nlp_nodename.u.wwn[1],
+ndlp->nlp_nodename.u.wwn[2],
+ndlp->nlp_nodename.u.wwn[3],
+ndlp->nlp_nodename.u.wwn[4],
+ndlp->nlp_nodename.u.wwn[5],
+ndlp->nlp_nodename.u.wwn[6],
+ndlp->nlp_nodename.u.wwn[7],
+ndlp->nlp_portname.u.wwn[0],
+ndlp->nlp_portname.u.wwn[1],
+ndlp->nlp_portname.u.wwn[2],
+ndlp->nlp_portname.u.wwn[3],
+ndlp->nlp_portname.u.wwn[4],
+ndlp->nlp_portname.u.wwn[5],
+ndlp->nlp_portname.u.wwn[6],
+ndlp->nlp_portname.u.wwn[7]);
goto out_tgt_busy;
-
+   }
lpfc_cmd = 

[PATCH v3 11/19] lpfc: Fix issue_lip if link is disabled

2018-01-30 Thread James Smart
The driver ignored checks on whether the link should be
kept administratively down after a link bounce. Correct the
checks.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
Reviewed-by: Hannes Reinecke 
---
 drivers/scsi/lpfc/lpfc_attr.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
index d8064e3ea0ba..b79dad7b8278 100644
--- a/drivers/scsi/lpfc/lpfc_attr.c
+++ b/drivers/scsi/lpfc/lpfc_attr.c
@@ -911,7 +911,12 @@ lpfc_issue_lip(struct Scsi_Host *shost)
LPFC_MBOXQ_t *pmboxq;
int mbxstatus = MBXERR_ERROR;
 
+   /*
+* If the link is offline, disabled or BLOCK_MGMT_IO
+* it doesn't make any sense to allow issue_lip
+*/
if ((vport->fc_flag & FC_OFFLINE_MODE) ||
+   (phba->hba_flag & LINK_DISABLED) ||
(phba->sli.sli_flag & LPFC_BLOCK_MGMT_IO))
return -EPERM;
 
-- 
2.13.1



[PATCH v3 19/19] lpfc: Update 11.4.0.7 modified files for 2018 Copyright

2018-01-30 Thread James Smart
Updated copyright in files modified for 11.4.0.7.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 

---
v3:
  Revised LPFC_COPYRIGHT string for 2018 as well.
---
 drivers/scsi/lpfc/lpfc.h   | 2 +-
 drivers/scsi/lpfc/lpfc_attr.c  | 2 +-
 drivers/scsi/lpfc/lpfc_crtn.h  | 2 +-
 drivers/scsi/lpfc/lpfc_els.c   | 2 +-
 drivers/scsi/lpfc/lpfc_hbadisc.c   | 2 +-
 drivers/scsi/lpfc/lpfc_hw4.h   | 2 +-
 drivers/scsi/lpfc/lpfc_init.c  | 2 +-
 drivers/scsi/lpfc/lpfc_mbox.c  | 2 +-
 drivers/scsi/lpfc/lpfc_mem.c   | 2 +-
 drivers/scsi/lpfc/lpfc_nportdisc.c | 4 ++--
 drivers/scsi/lpfc/lpfc_nvme.c  | 2 +-
 drivers/scsi/lpfc/lpfc_nvmet.c | 2 +-
 drivers/scsi/lpfc/lpfc_nvmet.h | 2 +-
 drivers/scsi/lpfc/lpfc_scsi.c  | 2 +-
 drivers/scsi/lpfc/lpfc_sli.c   | 3 +--
 drivers/scsi/lpfc/lpfc_sli4.h  | 2 +-
 drivers/scsi/lpfc/lpfc_version.h   | 6 +++---
 17 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc.h b/drivers/scsi/lpfc/lpfc.h
index d042f9118e3b..9698b9635058 100644
--- a/drivers/scsi/lpfc/lpfc.h
+++ b/drivers/scsi/lpfc/lpfc.h
@@ -1,7 +1,7 @@
 /***
  * This file is part of the Emulex Linux Device Driver for *
  * Fibre Channel Host Bus Adapters.*
- * Copyright (C) 2017 Broadcom. All Rights Reserved. The term  *
+ * Copyright (C) 2017-2018 Broadcom. All Rights Reserved. The term *
  * “Broadcom” refers to Broadcom Limited and/or its subsidiaries.  *
  * Copyright (C) 2004-2016 Emulex.  All rights reserved.   *
  * EMULEX and SLI are trademarks of Emulex.*
diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
index b79dad7b8278..70bd25666243 100644
--- a/drivers/scsi/lpfc/lpfc_attr.c
+++ b/drivers/scsi/lpfc/lpfc_attr.c
@@ -1,7 +1,7 @@
 /***
  * This file is part of the Emulex Linux Device Driver for *
  * Fibre Channel Host Bus Adapters.*
- * Copyright (C) 2017 Broadcom. All Rights Reserved. The term  *
+ * Copyright (C) 2017-2018 Broadcom. All Rights Reserved. The term *
  * “Broadcom” refers to Broadcom Limited and/or its subsidiaries.  *
  * Copyright (C) 2004-2016 Emulex.  All rights reserved.   *
  * EMULEX and SLI are trademarks of Emulex.*
diff --git a/drivers/scsi/lpfc/lpfc_crtn.h b/drivers/scsi/lpfc/lpfc_crtn.h
index 3ecf50df93f4..14a86b5b51e4 100644
--- a/drivers/scsi/lpfc/lpfc_crtn.h
+++ b/drivers/scsi/lpfc/lpfc_crtn.h
@@ -1,7 +1,7 @@
 /***
  * This file is part of the Emulex Linux Device Driver for *
  * Fibre Channel Host Bus Adapters.*
- * Copyright (C) 2017 Broadcom. All Rights Reserved. The term  *
+ * Copyright (C) 2017-2018 Broadcom. All Rights Reserved. The term *
  * “Broadcom” refers to Broadcom Limited and/or its subsidiaries.  *
  * Copyright (C) 2004-2016 Emulex.  All rights reserved.   *
  * EMULEX and SLI are trademarks of Emulex.*
diff --git a/drivers/scsi/lpfc/lpfc_els.c b/drivers/scsi/lpfc/lpfc_els.c
index 404e1af5e2ab..ba896554a14f 100644
--- a/drivers/scsi/lpfc/lpfc_els.c
+++ b/drivers/scsi/lpfc/lpfc_els.c
@@ -1,7 +1,7 @@
 /***
  * This file is part of the Emulex Linux Device Driver for *
  * Fibre Channel Host Bus Adapters.*
- * Copyright (C) 2017 Broadcom. All Rights Reserved. The term  *
+ * Copyright (C) 2017-2018 Broadcom. All Rights Reserved. The term *
  * “Broadcom” refers to Broadcom Limited and/or its subsidiaries.  *
  * Copyright (C) 2004-2016 Emulex.  All rights reserved.   *
  * EMULEX and SLI are trademarks of Emulex.*
diff --git a/drivers/scsi/lpfc/lpfc_hbadisc.c b/drivers/scsi/lpfc/lpfc_hbadisc.c
index 9265906d956e..f5bbac3cadbb 100644
--- a/drivers/scsi/lpfc/lpfc_hbadisc.c
+++ b/drivers/scsi/lpfc/lpfc_hbadisc.c
@@ -1,7 +1,7 @@
 /***
  * This file is part of the Emulex Linux Device Driver for *
  * Fibre Channel Host Bus Adapters.*
- * Copyright (C) 2017 Broadcom. All Rights Reserved. The term  *
+ * Copyright (C) 2017-2018 Broadcom. All Rights Reserved. The term *
  * “Broadcom” refers to Broadcom Limited and/or its subsidiaries.  *
  * Copyright (C) 2004-2016 Emulex.  All rights reserved.   *
  * EMULEX and SLI are trademarks of Emulex.*
diff --git a/drivers/scsi/lpfc/lpfc_hw4.h b/drivers/scsi/lpfc/lpfc_hw4.h
index 52fe28ae50fa..8685d26e6929 100644
--- a/drivers/scsi/lpfc/lpfc_hw4.h
+++ b/drivers/scsi/lpfc/lpfc_hw4.h
@@ 

[PATCH v3 10/19] lpfc: Fix soft lockup in lpfc worker thread during LIP testing

2018-01-30 Thread James Smart
During link bounce testing in a point-to-point topology, the
host may enter a soft lockup on the lpfc_worker thread:
Call Trace:
 lpfc_work_done+0x1f3/0x1390 [lpfc]
 lpfc_do_work+0x16f/0x180 [lpfc]
 kthread+0xc7/0xe0
 ret_from_fork+0x3f/0x70

The driver was simultaneously setting a combination of flags
that caused lpfc_do_work() to effectively spin between slow path
work and new event data, causing the lockup.

Ensure in the typical wq completions, that new event data flags
are set if the slow path flag is running. The slow path will
eventually reschedule the wq handling.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
Reviewed-by: Hannes Reinecke 
---
 drivers/scsi/lpfc/lpfc_hbadisc.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_hbadisc.c b/drivers/scsi/lpfc/lpfc_hbadisc.c
index b159a5c4e388..9265906d956e 100644
--- a/drivers/scsi/lpfc/lpfc_hbadisc.c
+++ b/drivers/scsi/lpfc/lpfc_hbadisc.c
@@ -696,8 +696,9 @@ lpfc_work_done(struct lpfc_hba *phba)
  phba->hba_flag & HBA_SP_QUEUE_EVT)) {
if (pring->flag & LPFC_STOP_IOCB_EVENT) {
pring->flag |= LPFC_DEFERRED_RING_EVENT;
-   /* Set the lpfc data pending flag */
-   set_bit(LPFC_DATA_READY, &phba->data_flags);
+   /* Preserve legacy behavior. */
+   if (!(phba->hba_flag & HBA_SP_QUEUE_EVT))
+   set_bit(LPFC_DATA_READY, &phba->data_flags);
} else {
if (phba->link_state >= LPFC_LINK_UP ||
phba->link_flag & LS_MDS_LOOPBACK) {
-- 
2.13.1



[PATCH v3 06/19] lpfc: Fix PRLI handling when topology type changes

2018-01-30 Thread James Smart
The lpfc driver does not discover a target when the topology
changes from switched-fabric to direct-connect. The target
rejects the PRLI from the initiator in direct-connect as the
driver is using the old S_ID from the switched topology.

The driver was inappropriately clearing the VP bit to register the
VPI, which is what is associated with the S_ID.

Fix by leaving the VP bit set (it was set earlier) and as the VFI
is being re-registered, set the UPDT bit.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
Reviewed-by: Hannes Reinecke 
---
 drivers/scsi/lpfc/lpfc_mbox.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_mbox.c b/drivers/scsi/lpfc/lpfc_mbox.c
index 81fb92967b11..c32d4a323db2 100644
--- a/drivers/scsi/lpfc/lpfc_mbox.c
+++ b/drivers/scsi/lpfc/lpfc_mbox.c
@@ -2170,10 +2170,8 @@ lpfc_reg_vfi(struct lpfcMboxq *mbox, struct lpfc_vport 
*vport, dma_addr_t phys)
/* Only FC supports upd bit */
if ((phba->sli4_hba.lnk_info.lnk_tp == LPFC_LNK_TYPE_FC) &&
(vport->fc_flag & FC_VFI_REGISTERED) &&
-   (!phba->fc_topology_changed)) {
-   bf_set(lpfc_reg_vfi_vp, reg_vfi, 0);
+   (!phba->fc_topology_changed))
bf_set(lpfc_reg_vfi_upd, reg_vfi, 1);
-   }
 
bf_set(lpfc_reg_vfi_bbcr, reg_vfi, 0);
bf_set(lpfc_reg_vfi_bbscn, reg_vfi, 0);
-- 
2.13.1



[PATCH v3 17/19] lpfc: Fix nonrecovery of NVME controller after cable swap.

2018-01-30 Thread James Smart
In a test that is doing large numbers of cable swaps on the target,
the nvme controllers wouldn't reconnect.

During the cable swaps, the targets n_port_id would change. This
information was passed to the nvme-fc transport, in the new remoteport
registration. However, the nvme-fc transport didn't update the n_port_id
value in the remoteport struct when it reused an existing structure.
Later, when a new association was attempted on the remoteport, the
driver's NVME LS routine would use the stale n_port_id from the remoteport
struct to address the LS. As the device is no longer at that address,
the LS would go into never never land.

Separately, the nvme-fc transport will be corrected to update the
n_port_id value on a re-registration.

However, for now, there's no reason to use the transport's values.
The private pointer points to the driver's node structure and the
node structure is up to date. Therefore, revise the LS routine to
use the driver's data structures for the LS. Augment the debug
message for better debugging in the future.

Also remove a duplicate if check that seems to have slipped in.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
Reviewed-by: Hannes Reinecke 
---
 drivers/scsi/lpfc/lpfc_nvme.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_nvme.c b/drivers/scsi/lpfc/lpfc_nvme.c
index c6e5b9972585..6327f858c4c8 100644
--- a/drivers/scsi/lpfc/lpfc_nvme.c
+++ b/drivers/scsi/lpfc/lpfc_nvme.c
@@ -241,10 +241,11 @@ lpfc_nvme_cmpl_gen_req(struct lpfc_hba *phba, struct 
lpfc_iocbq *cmdwqe,
ndlp = (struct lpfc_nodelist *)cmdwqe->context1;
lpfc_printf_vlog(vport, KERN_INFO, LOG_NVME_DISC,
 "6047 nvme cmpl Enter "
-"Data %p DID %x Xri: %x status %x cmd:%p lsreg:%p "
-"bmp:%p ndlp:%p\n",
+"Data %p DID %x Xri: %x status %x reason x%x cmd:%p "
+"lsreg:%p bmp:%p ndlp:%p\n",
 pnvme_lsreq, ndlp ? ndlp->nlp_DID : 0,
 cmdwqe->sli4_xritag, status,
+(wcqe->parameter & 0x),
 cmdwqe, pnvme_lsreq, cmdwqe->context3, ndlp);
 
lpfc_nvmeio_data(phba, "NVME LS  CMPL: xri x%x stat x%x parm x%x\n",
@@ -419,6 +420,7 @@ lpfc_nvme_ls_req(struct nvme_fc_local_port *pnvme_lport,
 {
int ret = 0;
struct lpfc_nvme_lport *lport;
+   struct lpfc_nvme_rport *rport;
struct lpfc_vport *vport;
struct lpfc_nodelist *ndlp;
struct ulp_bde64 *bpl;
@@ -437,19 +439,18 @@ lpfc_nvme_ls_req(struct nvme_fc_local_port *pnvme_lport,
 */
 
lport = (struct lpfc_nvme_lport *)pnvme_lport->private;
+   rport = (struct lpfc_nvme_rport *)pnvme_rport->private;
vport = lport->vport;
 
if (vport->load_flag & FC_UNLOADING)
return -ENODEV;
 
-   if (vport->load_flag & FC_UNLOADING)
-   return -ENODEV;
-
-   ndlp = lpfc_findnode_did(vport, pnvme_rport->port_id);
+   /* Need the ndlp.  It is stored in the driver's rport. */
+   ndlp = rport->ndlp;
if (!ndlp || !NLP_CHK_NODE_ACT(ndlp)) {
lpfc_printf_vlog(vport, KERN_ERR, LOG_NODE | LOG_NVME_IOERR,
-"6051 DID x%06x not an active rport.\n",
-pnvme_rport->port_id);
+"6051 Remoteport %p, rport has invalid ndlp. "
+"Failing LS Req\n", pnvme_rport);
return -ENODEV;
}
 
@@ -500,8 +501,9 @@ lpfc_nvme_ls_req(struct nvme_fc_local_port *pnvme_lport,
 
/* Expand print to include key fields. */
lpfc_printf_vlog(vport, KERN_INFO, LOG_NVME_DISC,
-"6149 ENTER.  lport %p, rport %p lsreq%p rqstlen:%d "
-"rsplen:%d %pad %pad\n",
+"6149 Issue LS Req to DID 0x%06x lport %p, rport %p "
+"lsreq%p rqstlen:%d rsplen:%d %pad %pad\n",
+ndlp->nlp_DID,
 pnvme_lport, pnvme_rport,
 pnvme_lsreq, pnvme_lsreq->rqstlen,
pnvme_lsreq->rsplen, &pnvme_lsreq->rqstdma,
@@ -517,7 +519,7 @@ lpfc_nvme_ls_req(struct nvme_fc_local_port *pnvme_lport,
ndlp, 2, 30, 0);
if (ret != WQE_SUCCESS) {
atomic_inc(&lport->xmt_ls_err);
-   lpfc_printf_vlog(vport, KERN_INFO, LOG_NVME_DISC,
+   lpfc_printf_vlog(vport, KERN_ERR, LOG_NVME_DISC,
 "6052 EXIT. issue ls wqe failed lport %p, "
 "rport %p lsreq%p Status %x DID %x\n",
 pnvme_lport, pnvme_rport, pnvme_lsreq,
-- 
2.13.1



[PATCH v3 14/19] lpfc: Validate adapter support for SRIU option

2018-01-30 Thread James Smart
When using the special option to suppress the response IU, ensure
the adapter fully supports the feature by checking feature flags
from the adapter and validating the support when formatting the
WQE.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
Reviewed-by: Hannes Reinecke 

---
v2:
  Add missing snippet that changes check logic before enabling
  suppression in lpfc_nvmet_prep_fcp_wqe()
---
 drivers/scsi/lpfc/lpfc_hw4.h   |  3 +++
 drivers/scsi/lpfc/lpfc_init.c  | 13 -
 drivers/scsi/lpfc/lpfc_nvmet.c |  7 ---
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_hw4.h b/drivers/scsi/lpfc/lpfc_hw4.h
index 7c3afc3d3121..52fe28ae50fa 100644
--- a/drivers/scsi/lpfc/lpfc_hw4.h
+++ b/drivers/scsi/lpfc/lpfc_hw4.h
@@ -3293,6 +3293,9 @@ struct lpfc_sli4_parameters {
 #define cfg_eqdr_SHIFT 8
 #define cfg_eqdr_MASK  0x0001
 #define cfg_eqdr_WORD  word19
+#define cfg_nosr_SHIFT 9
+#define cfg_nosr_MASK  0x0001
+#define cfg_nosr_WORD  word19
 #define LPFC_NODELAY_MAX_IO32
 };
 
diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index aa7872a7b493..f2d2faef8710 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -10473,8 +10473,19 @@ lpfc_get_sli4_parameters(struct lpfc_hba *phba, 
LPFC_MBOXQ_t *mboxq)
phba->cfg_enable_fc4_type = LPFC_ENABLE_FCP;
}
 
-   if (bf_get(cfg_xib, mbx_sli4_parameters) && phba->cfg_suppress_rsp)
+   /*
+* To support Suppress Response feature we must satisfy 3 conditions.
+* lpfc_suppress_rsp module parameter must be set (default).
+* In SLI4-Parameters Descriptor:
+* Extended Inline Buffers (XIB) must be supported.
+* Suppress Response IU Not Supported (SRIUNS) must NOT be supported
+* (double negative).
+*/
+   if (phba->cfg_suppress_rsp && bf_get(cfg_xib, mbx_sli4_parameters) &&
+   !(bf_get(cfg_nosr, mbx_sli4_parameters)))
phba->sli.sli_flag |= LPFC_SLI_SUPPRESS_RSP;
+   else
+   phba->cfg_suppress_rsp = 0;
 
if (bf_get(cfg_eqdr, mbx_sli4_parameters))
phba->sli.sli_flag |= LPFC_SLI_USE_EQDR;
diff --git a/drivers/scsi/lpfc/lpfc_nvmet.c b/drivers/scsi/lpfc/lpfc_nvmet.c
index 0539585d32d4..6dd8535918f6 100644
--- a/drivers/scsi/lpfc/lpfc_nvmet.c
+++ b/drivers/scsi/lpfc/lpfc_nvmet.c
@@ -2290,9 +2290,10 @@ lpfc_nvmet_prep_fcp_wqe(struct lpfc_hba *phba,
if (rsp->op == NVMET_FCOP_READDATA_RSP) {
atomic_inc(&tgtp->xmt_fcp_read_rsp);
bf_set(wqe_ar, &wqe->fcp_tsend.wqe_com, 1);
-   if ((ndlp->nlp_flag & NLP_SUPPRESS_RSP) &&
-   (rsp->rsplen == 12)) {
-   bf_set(wqe_sup, &wqe->fcp_tsend.wqe_com, 1);
+   if (rsp->rsplen == LPFC_NVMET_SUCCESS_LEN) {
+   if (ndlp->nlp_flag & NLP_SUPPRESS_RSP)
+   bf_set(wqe_sup,
+  &wqe->fcp_tsend.wqe_com, 1);
bf_set(wqe_wqes, &wqe->fcp_tsend.wqe_com, 0);
bf_set(wqe_irsp, &wqe->fcp_tsend.wqe_com, 0);
bf_set(wqe_irsplen, &wqe->fcp_tsend.wqe_com, 0);
-- 
2.13.1



[PATCH v3 18/19] lpfc: update driver version to 11.4.0.7

2018-01-30 Thread James Smart
Update the driver version to 11.4.0.7

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
Reviewed-by: Hannes Reinecke 
---
 drivers/scsi/lpfc/lpfc_version.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/lpfc/lpfc_version.h b/drivers/scsi/lpfc/lpfc_version.h
index c232bf0e8998..6f4092cb903e 100644
--- a/drivers/scsi/lpfc/lpfc_version.h
+++ b/drivers/scsi/lpfc/lpfc_version.h
@@ -20,7 +20,7 @@
  * included with this package. *
  ***/
 
-#define LPFC_DRIVER_VERSION "11.4.0.6"
+#define LPFC_DRIVER_VERSION "11.4.0.7"
 #define LPFC_DRIVER_NAME   "lpfc"
 
 /* Used for SLI 2/3 */
-- 
2.13.1



[PATCH v3 08/19] lpfc: Fix RQ empty firmware trap

2018-01-30 Thread James Smart
When the nvme target deferred receive logic waits for exchange
resources, the corresponding receive buffer is not replenished to
the hardware. This can result in a lack of asynchronous
receive buffer resources in the hardware, resulting in a
"2885 Port Status Event: ... error 1=0x52004a01 ..." message.

Correct by replenishing the buffer whenever the deferred
logic kicks in.  Update the corresponding debug messages and
statistics as well.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
Reviewed-by: Hannes Reinecke 
---
 drivers/scsi/lpfc/lpfc_attr.c  |  6 ++
 drivers/scsi/lpfc/lpfc_mem.c   |  8 ++--
 drivers/scsi/lpfc/lpfc_nvmet.c | 31 +--
 drivers/scsi/lpfc/lpfc_nvmet.h |  7 +--
 drivers/scsi/lpfc/lpfc_sli.c   | 12 
 5 files changed, 50 insertions(+), 14 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
index d188fb565a32..91df2b4e9fce 100644
--- a/drivers/scsi/lpfc/lpfc_attr.c
+++ b/drivers/scsi/lpfc/lpfc_attr.c
@@ -259,6 +259,12 @@ lpfc_nvme_info_show(struct device *dev, struct 
device_attribute *attr,
atomic_read(&tgtp->xmt_abort_rsp),
atomic_read(&tgtp->xmt_abort_rsp_error));
 
+   len += snprintf(buf + len, PAGE_SIZE - len,
+   "DELAY: ctx %08x  fod %08x wqfull %08x\n",
+   atomic_read(&tgtp->defer_ctx),
+   atomic_read(&tgtp->defer_fod),
+   atomic_read(&tgtp->defer_wqfull));
+
/* Calculate outstanding IOs */
tot = atomic_read(&tgtp->rcv_fcp_cmd_drop);
tot += atomic_read(&tgtp->xmt_fcp_release);
diff --git a/drivers/scsi/lpfc/lpfc_mem.c b/drivers/scsi/lpfc/lpfc_mem.c
index 56faeb049b4a..60078e61da5e 100644
--- a/drivers/scsi/lpfc/lpfc_mem.c
+++ b/drivers/scsi/lpfc/lpfc_mem.c
@@ -755,10 +755,14 @@ lpfc_rq_buf_free(struct lpfc_hba *phba, struct 
lpfc_dmabuf *mp)
if (rc < 0) {
(rqbp->rqb_free_buffer)(phba, rqb_entry);
lpfc_printf_log(phba, KERN_ERR, LOG_INIT,
-   "6409 Cannot post to RQ %d: %x %x\n",
+   "6409 Cannot post to HRQ %d: %x %x %x "
+   "DRQ %x %x\n",
rqb_entry->hrq->queue_id,
rqb_entry->hrq->host_index,
-   rqb_entry->hrq->hba_index);
+   rqb_entry->hrq->hba_index,
+   rqb_entry->hrq->entry_count,
+   rqb_entry->drq->host_index,
+   rqb_entry->drq->hba_index);
} else {
list_add_tail(&rqb_entry->hbuf.list, &rqbp->rqb_buffer_list);
rqbp->buffer_count++;
diff --git a/drivers/scsi/lpfc/lpfc_nvmet.c b/drivers/scsi/lpfc/lpfc_nvmet.c
index 9c2acf90212c..0539585d32d4 100644
--- a/drivers/scsi/lpfc/lpfc_nvmet.c
+++ b/drivers/scsi/lpfc/lpfc_nvmet.c
@@ -270,8 +270,6 @@ lpfc_nvmet_ctxbuf_post(struct lpfc_hba *phba, struct 
lpfc_nvmet_ctxbuf *ctx_buf)
 "NVMET RCV BUSY: xri x%x sz %d "
 "from %06x\n",
 oxid, size, sid);
-   /* defer repost rcv buffer till .defer_rcv callback */
-   ctxp->flag &= ~LPFC_NVMET_DEFER_RCV_REPOST;
atomic_inc(&tgtp->rcv_fcp_cmd_out);
return;
}
@@ -837,6 +835,7 @@ lpfc_nvmet_xmt_fcp_op(struct nvmet_fc_target_port *tgtport,
list_add_tail(&nvmewqeq->list, &wq->wqfull_list);
wq->q_flag |= HBA_NVMET_WQFULL;
spin_unlock_irqrestore(&pring->ring_lock, iflags);
+   atomic_inc(&lpfc_nvmep->defer_wqfull);
return 0;
}
 
@@ -975,11 +974,9 @@ lpfc_nvmet_defer_rcv(struct nvmet_fc_target_port *tgtport,
 
tgtp = phba->targetport->private;
atomic_inc(&tgtp->rcv_fcp_cmd_defer);
-   if (ctxp->flag & LPFC_NVMET_DEFER_RCV_REPOST)
-   lpfc_rq_buf_free(phba, &nvmebuf->hbuf); /* repost */
-   else
-   nvmebuf->hrq->rqbp->rqb_free_buffer(phba, nvmebuf);
-   ctxp->flag &= ~LPFC_NVMET_DEFER_RCV_REPOST;
+
+   /* Free the nvmebuf since a new buffer already replaced it */
+   nvmebuf->hrq->rqbp->rqb_free_buffer(phba, nvmebuf);
 }
 
 static struct nvmet_fc_target_template lpfc_tgttemplate = {
@@ -1309,6 +1306,9 @@ lpfc_nvmet_create_targetport(struct lpfc_hba *phba)
atomic_set(&tgtp->xmt_abort_sol, 0);
atomic_set(&tgtp->xmt_abort_rsp, 0);
atomic_set(&tgtp->xmt_abort_rsp_error, 0);
+   atomic_set(&tgtp->defer_ctx, 0);
+   atomic_set(&tgtp->defer_fod, 0);
+   atomic_set(&tgtp->defer_wqfull, 0);
}
return error;
 }
@@ -1810,6 +1810,8 

[PATCH v3 16/19] lpfc: Treat SCSI Write operation Underruns as an error

2018-01-30 Thread James Smart
Currently, write underruns (mismatch of amount transferred vs scsi
status and its residual) detected by the adapter are not being
flagged as an error. It's expected that the target controls the data
transfer and would appropriately set the RSP values.  Only read
underruns are treated as errors.

Revise the SCSI error handling to treat write underruns as an
error as well.
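The check being generalized can be sketched as comparing the two residual reports regardless of transfer direction (illustrative, not the driver's code):

```c
#include <assert.h>
#include <stdbool.h>

/* An underrun is an error when the HBA's observed transfer residual
 * (fcpi_parm analogue) disagrees with the residual the target
 * reported in the FCP_RSP -- a sign of a dropped frame.  With the
 * fix this applies to writes as well as reads. */
static bool underrun_is_error(unsigned int rsp_resid,
                              unsigned int hba_resid)
{
    return hba_resid != 0 && rsp_resid != hba_resid;
}
```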

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
Reviewed-by: Hannes Reinecke 
---
 drivers/scsi/lpfc/lpfc_scsi.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
index dcc86936e6fc..10c2dc0cf1fa 100644
--- a/drivers/scsi/lpfc/lpfc_scsi.c
+++ b/drivers/scsi/lpfc/lpfc_scsi.c
@@ -3772,20 +3772,18 @@ lpfc_handle_fcp_err(struct lpfc_vport *vport, struct 
lpfc_scsi_buf *lpfc_cmd,
scsi_set_resid(cmnd, be32_to_cpu(fcprsp->rspResId));
 
lpfc_printf_vlog(vport, KERN_INFO, LOG_FCP_UNDER,
-"9025 FCP Read Underrun, expected %d, "
+"9025 FCP Underrun, expected %d, "
 "residual %d Data: x%x x%x x%x\n",
 fcpDl,
 scsi_get_resid(cmnd), fcpi_parm, cmnd->cmnd[0],
 cmnd->underflow);
 
/*
-* If there is an under run check if under run reported by
+* If there is an under run, check if under run reported by
 * storage array is same as the under run reported by HBA.
 * If this is not same, there is a dropped frame.
 */
-   if ((cmnd->sc_data_direction == DMA_FROM_DEVICE) &&
-   fcpi_parm &&
-   (scsi_get_resid(cmnd) != fcpi_parm)) {
+   if (fcpi_parm && (scsi_get_resid(cmnd) != fcpi_parm)) {
lpfc_printf_vlog(vport, KERN_WARNING,
 LOG_FCP | LOG_FCP_ERROR,
 "9026 FCP Read Check Error "
-- 
2.13.1



[PATCH v3 13/19] lpfc: Fix SCSI io host reset causing kernel crash

2018-01-30 Thread James Smart
During SCSI error handling escalation to host reset, the SCSI I/Os
were moved off the txcmplq, but each io's ON_CMPLQ flag wasn't
cleared. Thus, a background thread saw the io and attempted to
access it as if it were still on the txcmplq.

Clear the flag upon removal.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
Reviewed-by: Hannes Reinecke 
---
 drivers/scsi/lpfc/lpfc_init.c |  4 
 drivers/scsi/lpfc/lpfc_sli.c  | 13 -
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index bff5c95cf5df..aa7872a7b493 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -958,6 +958,7 @@ lpfc_hba_clean_txcmplq(struct lpfc_hba *phba)
struct lpfc_sli_ring *pring;
LIST_HEAD(completions);
int i;
+   struct lpfc_iocbq *piocb, *next_iocb;
 
if (phba->sli_rev != LPFC_SLI_REV4) {
for (i = 0; i < psli->num_rings; i++) {
@@ -983,6 +984,9 @@ lpfc_hba_clean_txcmplq(struct lpfc_hba *phba)
if (!pring)
continue;
spin_lock_irq(&pring->ring_lock);
+   list_for_each_entry_safe(piocb, next_iocb,
+    &pring->txcmplq, list)
+   piocb->iocb_flag &= ~LPFC_IO_ON_TXCMPLQ;
list_splice_init(&pring->txcmplq, &completions);
pring->txcmplq_cnt = 0;
spin_unlock_irq(&pring->ring_lock);
diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index 8b2919a553d6..d597e15a1974 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -3778,6 +3778,7 @@ lpfc_sli_flush_fcp_rings(struct lpfc_hba *phba)
struct lpfc_sli *psli = &phba->sli;
struct lpfc_sli_ring  *pring;
uint32_t i;
+   struct lpfc_iocbq *piocb, *next_iocb;
 
spin_lock_irq(&phba->hbalock);
/* Indicate the I/O queues are flushed */
@@ -3792,6 +3793,9 @@ lpfc_sli_flush_fcp_rings(struct lpfc_hba *phba)
spin_lock_irq(&pring->ring_lock);
/* Retrieve everything on txq */
list_splice_init(&pring->txq, &txq);
+   list_for_each_entry_safe(piocb, next_iocb,
+    &pring->txcmplq, list)
+   piocb->iocb_flag &= ~LPFC_IO_ON_TXCMPLQ;
/* Retrieve everything on the txcmplq */
list_splice_init(&pring->txcmplq, &txcmplq);
pring->txq_cnt = 0;
@@ -3813,6 +3817,9 @@ lpfc_sli_flush_fcp_rings(struct lpfc_hba *phba)
spin_lock_irq(&phba->hbalock);
/* Retrieve everything on txq */
list_splice_init(&pring->txq, &txq);
+   list_for_each_entry_safe(piocb, next_iocb,
+    &pring->txcmplq, list)
+   piocb->iocb_flag &= ~LPFC_IO_ON_TXCMPLQ;
/* Retrieve everything on the txcmplq */
list_splice_init(&pring->txcmplq, &txcmplq);
pring->txq_cnt = 0;
@@ -3844,6 +3851,7 @@ lpfc_sli_flush_nvme_rings(struct lpfc_hba *phba)
LIST_HEAD(txcmplq);
struct lpfc_sli_ring  *pring;
uint32_t i;
+   struct lpfc_iocbq *piocb, *next_iocb;
 
if (phba->sli_rev < LPFC_SLI_REV4)
return;
@@ -3860,8 +3868,11 @@ lpfc_sli_flush_nvme_rings(struct lpfc_hba *phba)
for (i = 0; i < phba->cfg_nvme_io_channel; i++) {
pring = phba->sli4_hba.nvme_wq[i]->pring;
 
-   /* Retrieve everything on the txcmplq */
spin_lock_irq(&pring->ring_lock);
+   list_for_each_entry_safe(piocb, next_iocb,
+    &pring->txcmplq, list)
+   piocb->iocb_flag &= ~LPFC_IO_ON_TXCMPLQ;
+   /* Retrieve everything on the txcmplq */
list_splice_init(&pring->txcmplq, &txcmplq);
pring->txcmplq_cnt = 0;
spin_unlock_irq(>ring_lock);
-- 
2.13.1



[PATCH v3 07/19] lpfc: Fix IO failure during hba reset testing with nvme io.

2018-01-30 Thread James Smart
A stress test that repeatedly reset the adapter while performing
I/O would eventually report I/O failures and missing nvme namespaces.

The driver was setting the nvmefc_fcp_req->private pointer to NULL
during the IO completion routine before upcalling done().
If the transport was also running an abort for that IO, the driver
would fail the abort with message 6140. Failing the abort is not
allowed by the nvme-fc transport, as it mandates that the io must be
returned back to the transport. As that does not happen, the transport
controller delete has an outstanding reference and can't complete
teardown.

The NULL-ing of the private pointer should be done only when the io
is considered complete. It's complete when the adapter returns the
exchange with the "exchange busy" flag clear.

Move the NULL-ing of the structure to the done case. This leaves the
io contexts set while the exchange is busy, until the subsequent
XRI_ABORTED completion that returns the exchange is received.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 

---
v3:
  Address review comment that preferred to continue to NULL if the
  io did complete in order to protect stale conditions. After review,
  we agreed, thus the above change.
---
 drivers/scsi/lpfc/lpfc_nvme.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_nvme.c b/drivers/scsi/lpfc/lpfc_nvme.c
index 81e3a4f10c3c..c6e5b9972585 100644
--- a/drivers/scsi/lpfc/lpfc_nvme.c
+++ b/drivers/scsi/lpfc/lpfc_nvme.c
@@ -980,14 +980,14 @@ lpfc_nvme_io_cmd_wqe_cmpl(struct lpfc_hba *phba, struct 
lpfc_iocbq *pwqeIn,
phba->cpucheck_cmpl_io[lpfc_ncmd->cpu]++;
}
 #endif
-   freqpriv = nCmd->private;
-   freqpriv->nvme_buf = NULL;
 
/* NVME targets need completion held off until the abort exchange
 * completes unless the NVME Rport is getting unregistered.
 */
 
if (!(lpfc_ncmd->flags & LPFC_SBUF_XBUSY)) {
+   freqpriv = nCmd->private;
+   freqpriv->nvme_buf = NULL;
nCmd->done(nCmd);
lpfc_ncmd->nvmeCmd = NULL;
}
-- 
2.13.1



[PATCH v3 05/19] lpfc: Add WQ Full Logic for NVME Target

2018-01-30 Thread James Smart
I/O conditions on the nvme target may have the driver submitting
to a full hardware WQ. The hardware WQ is a shared resource among
all nvme controllers. When the driver hits a full WQ, it fails the
io posting back to the nvme-fc transport, which then escalates it
into errors.

Correct this by maintaining a sideband queue within the driver that
is added to when the WQ-full condition is hit, and drained from as
soon as new WQ space opens up.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
Reviewed-by: Hannes Reinecke 
---
 drivers/scsi/lpfc/lpfc_crtn.h  |   1 +
 drivers/scsi/lpfc/lpfc_nvmet.c | 116 +
 drivers/scsi/lpfc/lpfc_nvmet.h |   1 +
 drivers/scsi/lpfc/lpfc_sli.c   |   3 ++
 drivers/scsi/lpfc/lpfc_sli4.h  |   5 +-
 5 files changed, 125 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/lpfc/lpfc_crtn.h b/drivers/scsi/lpfc/lpfc_crtn.h
index 559f9aa0ed08..3ecf50df93f4 100644
--- a/drivers/scsi/lpfc/lpfc_crtn.h
+++ b/drivers/scsi/lpfc/lpfc_crtn.h
@@ -254,6 +254,7 @@ void lpfc_nvmet_ctxbuf_post(struct lpfc_hba *phba,
struct lpfc_nvmet_ctxbuf *ctxp);
 int lpfc_nvmet_rcv_unsol_abort(struct lpfc_vport *vport,
   struct fc_frame_header *fc_hdr);
+void lpfc_nvmet_wqfull_process(struct lpfc_hba *phba, struct lpfc_queue *wq);
 void lpfc_sli_flush_nvme_rings(struct lpfc_hba *phba);
 void lpfc_nvme_wait_for_io_drain(struct lpfc_hba *phba);
 void lpfc_sli4_build_dflt_fcf_record(struct lpfc_hba *, struct fcf_record *,
diff --git a/drivers/scsi/lpfc/lpfc_nvmet.c b/drivers/scsi/lpfc/lpfc_nvmet.c
index 7927ac46d345..9c2acf90212c 100644
--- a/drivers/scsi/lpfc/lpfc_nvmet.c
+++ b/drivers/scsi/lpfc/lpfc_nvmet.c
@@ -71,6 +71,8 @@ static int lpfc_nvmet_unsol_fcp_issue_abort(struct lpfc_hba *,
 static int lpfc_nvmet_unsol_ls_issue_abort(struct lpfc_hba *,
   struct lpfc_nvmet_rcv_ctx *,
   uint32_t, uint16_t);
+static void lpfc_nvmet_wqfull_flush(struct lpfc_hba *, struct lpfc_queue *,
+   struct lpfc_nvmet_rcv_ctx *);
 
 void
 lpfc_nvmet_defer_release(struct lpfc_hba *phba, struct lpfc_nvmet_rcv_ctx 
*ctxp)
@@ -741,7 +743,10 @@ lpfc_nvmet_xmt_fcp_op(struct nvmet_fc_target_port *tgtport,
struct lpfc_nvmet_rcv_ctx *ctxp =
container_of(rsp, struct lpfc_nvmet_rcv_ctx, ctx.fcp_req);
struct lpfc_hba *phba = ctxp->phba;
+   struct lpfc_queue *wq;
struct lpfc_iocbq *nvmewqeq;
+   struct lpfc_sli_ring *pring;
+   unsigned long iflags;
int rc;
 
if (phba->pport->load_flag & FC_UNLOADING) {
@@ -820,6 +825,21 @@ lpfc_nvmet_xmt_fcp_op(struct nvmet_fc_target_port *tgtport,
return 0;
}
 
+   if (rc == -EBUSY) {
+   /*
+* WQ was full, so queue nvmewqeq to be sent after
+* WQE release CQE
+*/
+   ctxp->flag |= LPFC_NVMET_DEFER_WQFULL;
+   wq = phba->sli4_hba.nvme_wq[rsp->hwqid];
+   pring = wq->pring;
+   spin_lock_irqsave(&pring->ring_lock, iflags);
+   list_add_tail(&nvmewqeq->list, &wq->wqfull_list);
+   wq->q_flag |= HBA_NVMET_WQFULL;
+   spin_unlock_irqrestore(&pring->ring_lock, iflags);
+   return 0;
+   }
+
/* Give back resources */
atomic_inc(&lpfc_nvmep->xmt_fcp_drop);
lpfc_printf_log(phba, KERN_ERR, LOG_NVME_IOERR,
@@ -851,6 +871,7 @@ lpfc_nvmet_xmt_fcp_abort(struct nvmet_fc_target_port 
*tgtport,
struct lpfc_nvmet_rcv_ctx *ctxp =
container_of(req, struct lpfc_nvmet_rcv_ctx, ctx.fcp_req);
struct lpfc_hba *phba = ctxp->phba;
+   struct lpfc_queue *wq;
unsigned long flags;
 
if (phba->pport->load_flag & FC_UNLOADING)
@@ -880,6 +901,14 @@ lpfc_nvmet_xmt_fcp_abort(struct nvmet_fc_target_port 
*tgtport,
}
ctxp->flag |= LPFC_NVMET_ABORT_OP;
 
+   if (ctxp->flag & LPFC_NVMET_DEFER_WQFULL) {
+   lpfc_nvmet_unsol_fcp_issue_abort(phba, ctxp, ctxp->sid,
+ctxp->oxid);
+   wq = phba->sli4_hba.nvme_wq[ctxp->wqeq->hba_wqidx];
+   lpfc_nvmet_wqfull_flush(phba, wq, ctxp);
+   return;
+   }
+
/* An state of LPFC_NVMET_STE_RCV means we have just received
 * the NVME command and have not started processing it.
 * (by issuing any IO WQEs on this exchange yet)
@@ -1435,16 +1464,103 @@ lpfc_nvmet_rcv_unsol_abort(struct lpfc_vport *vport,
return 0;
 }
 
+static void
+lpfc_nvmet_wqfull_flush(struct lpfc_hba *phba, struct lpfc_queue *wq,
+   struct lpfc_nvmet_rcv_ctx *ctxp)
+{
+   struct lpfc_sli_ring *pring;
+   struct lpfc_iocbq *nvmewqeq;
+   struct lpfc_iocbq *next_nvmewqeq;
+   unsigned long 

[PATCH v3 12/19] lpfc: Indicate CONF support in NVMe PRLI

2018-01-30 Thread James Smart
Revise the NVME PRLI to indicate CONF support.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
Reviewed-by: Hannes Reinecke 
---
 drivers/scsi/lpfc/lpfc_els.c   | 3 ++-
 drivers/scsi/lpfc/lpfc_hw4.h   | 6 +++---
 drivers/scsi/lpfc/lpfc_nportdisc.c | 3 ---
 3 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_els.c b/drivers/scsi/lpfc/lpfc_els.c
index 234c7c015982..404e1af5e2ab 100644
--- a/drivers/scsi/lpfc/lpfc_els.c
+++ b/drivers/scsi/lpfc/lpfc_els.c
@@ -2293,10 +2293,11 @@ lpfc_issue_els_prli(struct lpfc_vport *vport, struct 
lpfc_nodelist *ndlp,
if (phba->nvmet_support) {
bf_set(prli_tgt, npr_nvme, 1);
bf_set(prli_disc, npr_nvme, 1);
-
} else {
bf_set(prli_init, npr_nvme, 1);
+   bf_set(prli_conf, npr_nvme, 1);
}
+
npr_nvme->word1 = cpu_to_be32(npr_nvme->word1);
npr_nvme->word4 = cpu_to_be32(npr_nvme->word4);
elsiocb->iocb_flag |= LPFC_PRLI_NVME_REQ;
diff --git a/drivers/scsi/lpfc/lpfc_hw4.h b/drivers/scsi/lpfc/lpfc_hw4.h
index ef469129fb71..7c3afc3d3121 100644
--- a/drivers/scsi/lpfc/lpfc_hw4.h
+++ b/drivers/scsi/lpfc/lpfc_hw4.h
@@ -4346,9 +4346,9 @@ struct lpfc_nvme_prli {
 #define prli_init_SHIFT 5
 #define prli_init_MASK  0x0001
 #define prli_init_WORD  word4
-#define prli_recov_SHIFT8
-#define prli_recov_MASK 0x0001
-#define prli_recov_WORD word4
+#define prli_conf_SHIFT 7
+#define prli_conf_MASK  0x0001
+#define prli_conf_WORD  word4
uint32_t word5;
 #define prli_fb_sz_SHIFT0
 #define prli_fb_sz_MASK 0x
diff --git a/drivers/scsi/lpfc/lpfc_nportdisc.c 
b/drivers/scsi/lpfc/lpfc_nportdisc.c
index d841aa42f607..bbf1e1342b09 100644
--- a/drivers/scsi/lpfc/lpfc_nportdisc.c
+++ b/drivers/scsi/lpfc/lpfc_nportdisc.c
@@ -2011,9 +2011,6 @@ lpfc_cmpl_prli_prli_issue(struct lpfc_vport *vport, 
struct lpfc_nodelist *ndlp,
}
}
 
-   if (bf_get_be32(prli_recov, nvpr))
-   ndlp->nlp_fcp_info |= NLP_FCP_2_DEVICE;
-
lpfc_printf_vlog(vport, KERN_INFO, LOG_NVME_DISC,
 "6029 NVME PRLI Cmpl w1 x%08x "
 "w4 x%08x w5 x%08x flag x%x, "
-- 
2.13.1



[PATCH v3 15/19] lpfc: Fix header inclusion in lpfc_nvmet

2018-01-30 Thread James Smart
The driver was inappropriately pulling in the nvme host's
nvme.h header. What it really needed was the standard
 header.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
Reviewed-by: Hannes Reinecke 
---
 drivers/scsi/lpfc/lpfc_nvmet.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/lpfc/lpfc_nvmet.c b/drivers/scsi/lpfc/lpfc_nvmet.c
index 6dd8535918f6..823b6df0aec7 100644
--- a/drivers/scsi/lpfc/lpfc_nvmet.c
+++ b/drivers/scsi/lpfc/lpfc_nvmet.c
@@ -36,7 +36,7 @@
 #include 
 #include 
 
-#include <../drivers/nvme/host/nvme.h>
+#include 
 #include 
 #include 
 
-- 
2.13.1



[PATCH v3 04/19] lpfc: correct debug counters for abort

2018-01-30 Thread James Smart
Existing code was using the wrong field for the completion status
when deciding whether to increment the abort statistics.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
Reviewed-by: Hannes Reinecke 
---
 drivers/scsi/lpfc/lpfc_nvmet.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_nvmet.c b/drivers/scsi/lpfc/lpfc_nvmet.c
index 8dbf5c9d51aa..7927ac46d345 100644
--- a/drivers/scsi/lpfc/lpfc_nvmet.c
+++ b/drivers/scsi/lpfc/lpfc_nvmet.c
@@ -130,7 +130,7 @@ lpfc_nvmet_xmt_ls_rsp_cmp(struct lpfc_hba *phba, struct 
lpfc_iocbq *cmdwqe,
if (tgtp) {
if (status) {
atomic_inc(&tgtp->xmt_ls_rsp_error);
-   if (status == IOERR_ABORT_REQUESTED)
+   if (result == IOERR_ABORT_REQUESTED)
atomic_inc(&tgtp->xmt_ls_rsp_aborted);
if (bf_get(lpfc_wcqe_c_xb, wcqe))
atomic_inc(&tgtp->xmt_ls_rsp_xb_set);
@@ -541,7 +541,7 @@ lpfc_nvmet_xmt_fcp_op_cmp(struct lpfc_hba *phba, struct 
lpfc_iocbq *cmdwqe,
rsp->transferred_length = 0;
if (tgtp) {
atomic_inc(&tgtp->xmt_fcp_rsp_error);
-   if (status == IOERR_ABORT_REQUESTED)
+   if (result == IOERR_ABORT_REQUESTED)
atomic_inc(&tgtp->xmt_fcp_rsp_aborted);
}
 
-- 
2.13.1



[PATCH v3 02/19] lpfc: Increase CQ and WQ sizes for SCSI

2018-01-30 Thread James Smart
Increased CQ and WQ sizes for SCSI FCP, matching those used
for NVMe development.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 

---
v3:
  lpfc_init.c:
Corrected indentation of comments after patching
Removed unnecessary braces
Revised comment on checking embedded cdb support
---
 drivers/scsi/lpfc/lpfc.h  |  1 +
 drivers/scsi/lpfc/lpfc_hw4.h  |  3 +++
 drivers/scsi/lpfc/lpfc_init.c | 36 +---
 drivers/scsi/lpfc/lpfc_sli.c  |  3 ++-
 drivers/scsi/lpfc/lpfc_sli4.h |  5 +
 5 files changed, 36 insertions(+), 12 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc.h b/drivers/scsi/lpfc/lpfc.h
index 61fb46da05d4..d042f9118e3b 100644
--- a/drivers/scsi/lpfc/lpfc.h
+++ b/drivers/scsi/lpfc/lpfc.h
@@ -760,6 +760,7 @@ struct lpfc_hba {
uint8_t  mds_diags_support;
uint32_t initial_imax;
uint8_t  bbcredit_support;
+   uint8_t  enab_exp_wqcq_pages;
 
/* HBA Config Parameters */
uint32_t cfg_ack0;
diff --git a/drivers/scsi/lpfc/lpfc_hw4.h b/drivers/scsi/lpfc/lpfc_hw4.h
index 73c2f6971d2b..ef469129fb71 100644
--- a/drivers/scsi/lpfc/lpfc_hw4.h
+++ b/drivers/scsi/lpfc/lpfc_hw4.h
@@ -3212,6 +3212,9 @@ struct lpfc_sli4_parameters {
 #define cfg_cqv_SHIFT  14
 #define cfg_cqv_MASK   0x0003
 #define cfg_cqv_WORD   word4
+#define cfg_cqpsize_SHIFT  16
+#define cfg_cqpsize_MASK   0x00ff
+#define cfg_cqpsize_WORD   word4
uint32_t word5;
uint32_t word6;
 #define cfg_mqv_SHIFT  14
diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index f539c554588c..40ffa042 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -8011,9 +8011,10 @@ static int
 lpfc_alloc_fcp_wq_cq(struct lpfc_hba *phba, int wqidx)
 {
struct lpfc_queue *qdesc;
+   uint32_t wqesize;
 
/* Create Fast Path FCP CQs */
-   if (phba->fcp_embed_io)
+   if (phba->enab_exp_wqcq_pages)
/* Increase the CQ size when WQEs contain an embedded cdb */
qdesc = lpfc_sli4_queue_alloc(phba, LPFC_EXPANDED_PAGE_SIZE,
  phba->sli4_hba.cq_esize,
@@ -8031,15 +8032,18 @@ lpfc_alloc_fcp_wq_cq(struct lpfc_hba *phba, int wqidx)
phba->sli4_hba.fcp_cq[wqidx] = qdesc;
 
/* Create Fast Path FCP WQs */
-   if (phba->fcp_embed_io)
+   if (phba->enab_exp_wqcq_pages) {
/* Increase the WQ size when WQEs contain an embedded cdb */
+   wqesize = (phba->fcp_embed_io) ?
+   LPFC_WQE128_SIZE : phba->sli4_hba.wq_esize;
qdesc = lpfc_sli4_queue_alloc(phba, LPFC_EXPANDED_PAGE_SIZE,
- LPFC_WQE128_SIZE,
+ wqesize,
  LPFC_WQE_EXP_COUNT);
-   else
+   } else
qdesc = lpfc_sli4_queue_alloc(phba, LPFC_DEFAULT_PAGE_SIZE,
  phba->sli4_hba.wq_esize,
  phba->sli4_hba.wq_ecount);
+
if (!qdesc) {
lpfc_printf_log(phba, KERN_ERR, LOG_INIT,
"0503 Failed allocate fast-path FCP WQ (%d)\n",
@@ -10476,15 +10480,21 @@ lpfc_get_sli4_parameters(struct lpfc_hba *phba, 
LPFC_MBOXQ_t *mboxq)
sli4_params->sge_supp_len = LPFC_MAX_SGE_SIZE;
 
/*
-* Issue IOs with CDB embedded in WQE to minimized the number
-* of DMAs the firmware has to do. Setting this to 1 also forces
-* the driver to use 128 bytes WQEs for FCP IOs.
+* Check whether the adapter supports an embedded copy of the
+* FCP CMD IU within the WQE for FCP_Ixxx commands. In order
+* to use this option, 128-byte WQEs must be used.
 */
if (bf_get(cfg_ext_embed_cb, mbx_sli4_parameters))
phba->fcp_embed_io = 1;
else
phba->fcp_embed_io = 0;
 
+   if ((bf_get(cfg_cqpsize, mbx_sli4_parameters) & LPFC_CQ_16K_PAGE_SZ) &&
+   (bf_get(cfg_wqpsize, mbx_sli4_parameters) & LPFC_WQ_16K_PAGE_SZ) &&
+   (sli4_params->wqsize & LPFC_WQ_SZ128_SUPPORT))
+   phba->enab_exp_wqcq_pages = 1;
+   else
+   phba->enab_exp_wqcq_pages = 0;
/*
 * Check if the SLI port supports MDS Diagnostics
 */
@@ -12227,6 +12237,7 @@ int
 lpfc_fof_queue_create(struct lpfc_hba *phba)
 {
struct lpfc_queue *qdesc;
+   uint32_t wqesize;
 
/* Create FOF EQ */
qdesc = lpfc_sli4_queue_alloc(phba, LPFC_DEFAULT_PAGE_SIZE,
@@ -12240,7 +12251,7 @@ lpfc_fof_queue_create(struct lpfc_hba *phba)
if (phba->cfg_fof) {
 
/* 

[PATCH v3 01/19] lpfc: Fix frequency of Release WQE CQEs

2018-01-30 Thread James Smart
The driver controls when the hardware sends completions that
communicate consumption of elements from the WQ. This is done by
setting a WQEC bit on a WQE.

The current driver sets it on every Nth WQE posting. However, the
driver isn't clearing the bit if the WQE is reused. Thus, if the
queue depth isn't evenly divisible by N, with enough time, it can
be set on every element, creating a lot of overhead and risking
CQ full conditions.

Correct by clearing the bit when not setting it on an Nth element.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
Reviewed-by: Hannes Reinecke 
---
 drivers/scsi/lpfc/lpfc_sli.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index 5f5528a12308..149f21f53b13 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -129,6 +129,8 @@ lpfc_sli4_wq_put(struct lpfc_queue *q, union lpfc_wqe *wqe)
/* set consumption flag every once in a while */
if (!((q->host_index + 1) % q->entry_repost))
bf_set(wqe_wqec, &wqe->generic.wqe_com, 1);
+   else
+   bf_set(wqe_wqec, &wqe->generic.wqe_com, 0);
if (q->phba->sli3_options & LPFC_SLI4_PHWQ_ENABLED)
bf_set(wqe_wqid, &wqe->generic.wqe_com, q->queue_id);
lpfc_sli_pcimem_bcopy(wqe, temp_wqe, q->entry_size);
-- 
2.13.1



[PATCH v3 03/19] lpfc: move placement of target destroy on driver detach

2018-01-30 Thread James Smart
Ensure nvme localports/targetports are torn down before
dismantling the adapter sli interface on driver detachment.
This keeps the interfaces live while nvme may still be making
callbacks to abort io.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
Reviewed-by: Hannes Reinecke 
---
 drivers/scsi/lpfc/lpfc_init.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index 40ffa042..bff5c95cf5df 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -11503,13 +11503,6 @@ lpfc_pci_remove_one_s4(struct pci_dev *pdev)
/* Remove FC host and then SCSI host with the physical port */
fc_remove_host(shost);
scsi_remove_host(shost);
-   /*
-* Bring down the SLI Layer. This step disables all interrupts,
-* clears the rings, discards all mailbox commands, and resets
-* the HBA FCoE function.
-*/
-   lpfc_debugfs_terminate(vport);
-   lpfc_sli4_hba_unset(phba);
 
/* Perform ndlp cleanup on the physical port.  The nvme and nvmet
 * localports are destroyed after to cleanup all transport memory.
@@ -11518,6 +11511,13 @@ lpfc_pci_remove_one_s4(struct pci_dev *pdev)
lpfc_nvmet_destroy_targetport(phba);
lpfc_nvme_destroy_localport(vport);
 
+   /*
+* Bring down the SLI Layer. This step disables all interrupts,
+* clears the rings, discards all mailbox commands, and resets
+* the HBA FCoE function.
+*/
+   lpfc_debugfs_terminate(vport);
+   lpfc_sli4_hba_unset(phba);
 
lpfc_stop_hba_timers(phba);
spin_lock_irq(&phba->hbalock);
-- 
2.13.1



[PATCH v3 00/19] lpfc updates for 11.4.0.7

2018-01-30 Thread James Smart
This patch set provides a number of fixes for the driver.

The patches were cut against Martin's 4.16/scsi-queue tree.
There are no outside dependencies and they are expected to be pulled
via Martin's tree.


v2:
  respin patch 14 "lpfc: Validate adapter support for SRIU option"
for a snippet that was missing

v3:
  Responded to review comments from Hannes
  Updated Copyright string used in load banner


James Smart (19):
  lpfc: Fix frequency of Release WQE CQEs
  lpfc: Increase CQ and WQ sizes for SCSI
  lpfc: move placement of target destroy on driver detach
  lpfc: correct debug counters for abort
  lpfc: Add WQ Full Logic for NVME Target
  lpfc: Fix PRLI handling when topology type changes
  lpfc: Fix IO failure during hba reset testing with nvme io.
  lpfc: Fix RQ empty firmware trap
  lpfc: Allow set of maximum outstanding SCSI cmd limit for a target
parameter
  lpfc: Fix soft lockup in lpfc worker thread during LIP testing
  lpfc: Fix issue_lip if link is disabled
  lpfc: Indicate CONF support in NVMe PRLI
  lpfc: Fix SCSI io host reset causing kernel crash
  lpfc: Validate adapter support for SRIU option
  lpfc: Fix header inclusion in lpfc_nvmet
  lpfc: Treat SCSI Write operation Underruns as an error
  lpfc: Fix nonrecovery of NVME controller after cable swap.
  lpfc: update driver version to 11.4.0.7
  lpfc: Update 11.4.0.7 modified files for 2018 Copyright

 drivers/scsi/lpfc/lpfc.h   |   3 +-
 drivers/scsi/lpfc/lpfc_attr.c  |  17 +++-
 drivers/scsi/lpfc/lpfc_crtn.h  |   3 +-
 drivers/scsi/lpfc/lpfc_els.c   |   5 +-
 drivers/scsi/lpfc/lpfc_hbadisc.c   |   7 +-
 drivers/scsi/lpfc/lpfc_hw4.h   |  14 +++-
 drivers/scsi/lpfc/lpfc_init.c  |  69 +++-
 drivers/scsi/lpfc/lpfc_mbox.c  |   6 +-
 drivers/scsi/lpfc/lpfc_mem.c   |  10 ++-
 drivers/scsi/lpfc/lpfc_nportdisc.c |   7 +-
 drivers/scsi/lpfc/lpfc_nvme.c  |  30 +++
 drivers/scsi/lpfc/lpfc_nvmet.c | 162 +
 drivers/scsi/lpfc/lpfc_nvmet.h |  10 ++-
 drivers/scsi/lpfc/lpfc_scsi.c  |  49 +++
 drivers/scsi/lpfc/lpfc_sli.c   |  36 -
 drivers/scsi/lpfc/lpfc_sli4.h  |  12 ++-
 drivers/scsi/lpfc/lpfc_version.h   |   8 +-
 17 files changed, 341 insertions(+), 107 deletions(-)

-- 
2.13.1



Re: [PATCH v5] blk-mq: introduce BLK_STS_DEV_RESOURCE

2018-01-30 Thread Mike Snitzer
On Tue, Jan 30 2018 at  2:42pm -0500,
Bart Van Assche  wrote:

> On Tue, 2018-01-30 at 14:33 -0500, Mike Snitzer wrote:
> > On Tue, Jan 30 2018 at 12:52pm -0500,
> > Bart Van Assche  wrote:
> > 
> > > - This patch does not fix any bugs nor makes block drivers easier to
> > >   read or to implement. So why is this patch considered useful?
> > 
> > It enables blk-mq core to own the problem that individual drivers should
> > have no business needing to worry about.  Period.
> 
> Thanks for having confirmed that this patch is an API-change only and does
> not fix any bugs.

No, it is an API change that enables blk-mq drivers to make forward
progress without compromising the potential benefits associated with
blk-mq's SCHED_RESTART functionality.

(If we're going to beat this horse to death it might as well be with
precision)


Re: [PATCH v5] blk-mq: introduce BLK_STS_DEV_RESOURCE

2018-01-30 Thread Bart Van Assche
On Tue, 2018-01-30 at 14:33 -0500, Mike Snitzer wrote:
> On Tue, Jan 30 2018 at 12:52pm -0500,
> Bart Van Assche  wrote:
> 
> > - This patch does not fix any bugs nor makes block drivers easier to
> >   read or to implement. So why is this patch considered useful?
> 
> It enables blk-mq core to own the problem that individual drivers should
> have no business needing to worry about.  Period.

Thanks for having confirmed that this patch is an API-change only and does
not fix any bugs.

Bart.

Re: [PATCH v5] blk-mq: introduce BLK_STS_DEV_RESOURCE

2018-01-30 Thread Mike Snitzer
On Tue, Jan 30 2018 at 12:52pm -0500,
Bart Van Assche  wrote:

> On 01/30/18 06:24, Mike Snitzer wrote:
> >+ *
> >+ * If driver returns BLK_STS_RESOURCE and SCHED_RESTART
> >+ * bit is set, run queue after a delay to avoid IO stalls
> >+ * that could otherwise occur if the queue is idle.
> >  */
> >-if (!blk_mq_sched_needs_restart(hctx) ||
> >+needs_restart = blk_mq_sched_needs_restart(hctx);
> >+if (!needs_restart ||
> > (no_tag && list_empty_careful(&hctx->dispatch_wait.entry)))
> > blk_mq_run_hw_queue(hctx, true);
> >+else if (needs_restart && (ret == BLK_STS_RESOURCE))
> >+blk_mq_delay_run_hw_queue(hctx, BLK_MQ_QUEUE_DELAY);
> > }
> 
> If a request completes concurrently with execution of the above code
> then the request completion will trigger a call of
> blk_mq_sched_restart_hctx() and that call will clear the
> BLK_MQ_S_SCHED_RESTART bit. If that bit is cleared before the above
> code tests it then the above code will schedule an asynchronous call
> of __blk_mq_run_hw_queue(). If the .queue_rq() call triggered by the
> new queue run returns again BLK_STS_RESOURCE then the above code
> will be executed again. In other words, a loop occurs. That loop
> will repeat as long as the described race occurs. The current
> (kernel v4.15) block layer behavior is simpler: only block drivers
> call blk_mq_delay_run_hw_queue() and the block layer core never
> calls that function. Hence that loop cannot occur with the v4.15
> block layer core and block drivers. A motivation of why that loop is
> preferred compared to the current behavior (no loop) is missing.
> Does this mean that that loop is a needless complication of the
> block layer core?

No it means the loop is an internal blk-mq concern.  And that drivers
needn't worry about kicking the queue.  And it affords blk-mq latitude
to change how it responds to BLK_STS_RESOURCE in the future (without
needing to change every driver).

But even v4.15 has a similar loop.  It just happens to extend into the
.queue_rq() where the driver is completely blind to SCHED_RESTART.  And
the driver may just repeatedly kick the queue after a delay (via
blk_mq_delay_run_hw_queue).

This cycle should be a very rare occurrence regardless of which approach
is taken (V5 vs 4.15).  The fact that you engineered your SRP initiator
and target code to pathologically trigger this worst case (via
target_can_queue) doesn't mean it is the fast path for a properly
configured and functioning storage network.

> Sorry but I still prefer the v4.15 block layer approach because this
> patch has in my view the following disadvantages:
> - It involves a blk-mq API change. API changes are always risky and need
>   some time to stabilize.

The number of blk-mq API changes that have occurred since blk-mq was
introduced is a very long list.  Seems contrived to make this the one
that is breaking the camel's back.

> - The delay after which to rerun the queue is moved from block layer
>   drivers into the block layer core. I think that's wrong because only
>   the block driver authors can make a good choice for this constant.

Unsubstantiated.  3ms (scsi-mq, nvmefc) vs 100ms (dm-mq mpath): which is
better?  Pretty sure if the underlying storage network is 1) performant
2) properly configured -- then a shorter delay is preferable.

> - This patch makes block drivers harder to understand. Anyone who sees
>   return BLK_STS_RESOURCE / return BLK_STS_DEV_RESOURCE for the first
>   time will have to look up the meaning of these constants. An explicit
>   blk_mq_delay_run_hw_queue() call is easier to understand.

No it doesn't make blk-mq harder to understand.  But even if it did it
actually acknowledges that there is existing blk-mq SCHED_RESTART
heuristic for how to deal with the need for blk-mq to back-off in the
face of BLK_STS_RESOURCE returns.  By just having each blk-mq driver
blindly kick the queue you're actively ignoring, and defeating, that
entire design element of blk-mq (SCHED_RESTART).

It is an instance where blk-mq can and does know better.  Acknowledging
that fact is moving blk-mq in a better direction.

> - This patch makes the blk-mq core harder to understand because of the
>   loop mentioned above.

You've said your piece.  But you've taken on this campaign to undermine
this line of development with such passion that we're now in a place
where Jens is shell-shocked by all the repeat campaigning and noise.

Bart you keep saying the same things over and over.  Yet you cannot show
the patch to actively be a problem with testing-based detail.

Seems you'd rather refuse to even test it than open yourself up to the
possibility that this concern of yours has been making a mountain out of
a mole hill.

> - This patch does not fix any bugs nor makes block drivers easier to
>   read or to implement. So why is this 

Re: [dm-devel] [LSF/MM TOPIC] block, dm: restack queue_limits

2018-01-30 Thread Ewan D. Milne
On Tue, 2018-01-30 at 16:07 +0100, Hannes Reinecke wrote:
> On 01/29/2018 10:08 PM, Mike Snitzer wrote:
> > We currently don't restack the queue_limits if the lowest, or
> > intermediate, layer of an IO stack changes.
> > 
> > This is particularly unfortunate in the case of FLUSH/FUA which may
> > change if/when a HW controller's BBU fails; whereby requiring the device
> > advertise that it has a volatile write cache (WCE=1).
> > 
> Uh-oh. Device rescan.
> Would be a valid topic on its own...
> 
> > But in the context of DM, really it'd be best if the entire stack of
> > devices had their limits restacked if any underlying layer's limits
> > change.
> > 
> > In the past, Martin and I discussed that we should "just do it" but
> > never did.  Not sure we need a lengthy discussion but figured I'd put it
> > out there.
> > 
> > Maybe I'll find time, between now and April, to try implementing it.
> > 
> For SCSI the device capabilities are pretty much set in stone after the
> initial scan; there are hooks for rescanning, but they will only work
> half of the time.
> Plus we can't really change the device type on the fly (eg if the SCSI
> device type changes; if it moves away from '0' we would need to unbind
> the sd driver, and if it moves to '0' we'll need to rescan the sd
> device. None of this is happening right now.)
> 
> So I'd be glad to have a discussion around this.

At least one array vendor has also desired the ability to change various
attributes of the device after the initial scan, such as the model name.
Not sure what would break if we did this, since who knows what userspace
software might be caching this info, but...

-Ewan




Re: [dm-devel] [PATCH v5] blk-mq: introduce BLK_STS_DEV_RESOURCE

2018-01-30 Thread Laurence Oberman
On Tue, 2018-01-30 at 09:52 -0800, Bart Van Assche wrote:
> On 01/30/18 06:24, Mike Snitzer wrote:
> > +    *
> > +    * If driver returns BLK_STS_RESOURCE and
> > SCHED_RESTART
> > +    * bit is set, run queue after a delay to avoid IO
> > stalls
> > +    * that could otherwise occur if the queue is
> > idle.
> >      */
> > -   if (!blk_mq_sched_needs_restart(hctx) ||
> > +   needs_restart = blk_mq_sched_needs_restart(hctx);
> > +   if (!needs_restart ||
> >     (no_tag && list_empty_careful(&hctx->dispatch_wait.entry)))
> >     blk_mq_run_hw_queue(hctx, true);
> > +   else if (needs_restart && (ret == BLK_STS_RESOURCE))
> > +   blk_mq_delay_run_hw_queue(hctx, BLK_MQ_QUEUE_DELAY);
> >     }
> 
> If a request completes concurrently with execution of the above code
> then the request completion will trigger a call of
> blk_mq_sched_restart_hctx() and that call will clear the
> BLK_MQ_S_SCHED_RESTART bit. If that bit is cleared before the above code
> tests it then the above code will schedule an asynchronous call of
> __blk_mq_run_hw_queue(). If the .queue_rq() call triggered by the new
> queue run returns again BLK_STS_RESOURCE then the above code will be
> executed again. In other words, a loop occurs. That loop will repeat as
> long as the described race occurs. The current (kernel v4.15) block
> layer behavior is simpler: only block drivers call
> blk_mq_delay_run_hw_queue() and the block layer core never calls that
> function. Hence that loop cannot occur with the v4.15 block layer core
> and block drivers. A motivation of why that loop is preferred compared
> to the current behavior (no loop) is missing. Does this mean that that
> loop is a needless complication of the block layer core?
> 
> Sorry but I still prefer the v4.15 block layer approach because this
> patch has in my view the following disadvantages:
> - It involves a blk-mq API change. API changes are always risky and need
>    some time to stabilize.
> - The delay after which to rerun the queue is moved from block layer
>    drivers into the block layer core. I think that's wrong because only
>    the block driver authors can make a good choice for this constant.
> - This patch makes block drivers harder to understand. Anyone who sees
>    return BLK_STS_RESOURCE / return BLK_STS_DEV_RESOURCE for the first
>    time will have to look up the meaning of these constants. An explicit
>    blk_mq_delay_run_hw_queue() call is easier to understand.
> - This patch makes the blk-mq core harder to understand because of the
>    loop mentioned above.
> - This patch does not fix any bugs nor makes block drivers easier to
>    read or to implement. So why is this patch considered useful?
> 
> Thanks,
> 
> Bart.

Hello Bart

What is the performance implication of your method versus the patch
above?
Is there more of a delay in your approach or in Ming's, from your own
testing?
It seems we will never get consensus on this, so is it time to
take a vote?

I respect and trust your inputs, you know that, but are you perhaps
prepared to accept the approach above, on the understanding that if it
turns out to expose more issues they can be addressed later?

Is that not a better way to progress this because to me it looks like
you and Ming will continue to disagree on which is the better approach.

With much respect
Laurence


Re: [dm-devel] [PATCH v5] blk-mq: introduce BLK_STS_DEV_RESOURCE

2018-01-30 Thread Bart Van Assche

On 01/30/18 06:24, Mike Snitzer wrote:

+*
+* If driver returns BLK_STS_RESOURCE and SCHED_RESTART
+* bit is set, run queue after a delay to avoid IO stalls
+* that could otherwise occur if the queue is idle.
 */
-   if (!blk_mq_sched_needs_restart(hctx) ||
+   needs_restart = blk_mq_sched_needs_restart(hctx);
+   if (!needs_restart ||
(no_tag && list_empty_careful(&hctx->dispatch_wait.entry)))
blk_mq_run_hw_queue(hctx, true);
+   else if (needs_restart && (ret == BLK_STS_RESOURCE))
+   blk_mq_delay_run_hw_queue(hctx, BLK_MQ_QUEUE_DELAY);
}


If a request completes concurrently with execution of the above code 
then the request completion will trigger a call of 
blk_mq_sched_restart_hctx() and that call will clear the 
BLK_MQ_S_SCHED_RESTART bit. If that bit is cleared before the above code 
tests it then the above code will schedule an asynchronous call of 
__blk_mq_run_hw_queue(). If the .queue_rq() call triggered by the new 
queue run returns again BLK_STS_RESOURCE then the above code will be 
executed again. In other words, a loop occurs. That loop will repeat as 
long as the described race occurs. The current (kernel v4.15) block 
layer behavior is simpler: only block drivers call 
blk_mq_delay_run_hw_queue() and the block layer core never calls that 
function. Hence that loop cannot occur with the v4.15 block layer core 
and block drivers. A motivation of why that loop is preferred compared 
to the current behavior (no loop) is missing. Does this mean that that 
loop is a needless complication of the block layer core?


Sorry but I still prefer the v4.15 block layer approach because this 
patch has in my view the following disadvantages:

- It involves a blk-mq API change. API changes are always risky and need
  some time to stabilize.
- The delay after which to rerun the queue is moved from block layer
  drivers into the block layer core. I think that's wrong because only
  the block driver authors can make a good choice for this constant.
- This patch makes block drivers harder to understand. Anyone who sees
  return BLK_STS_RESOURCE / return BLK_STS_DEV_RESOURCE for the first
  time will have to look up the meaning of these constants. An explicit
  blk_mq_delay_run_hw_queue() call is easier to understand.
- This patch makes the blk-mq core harder to understand because of the
  loop mentioned above.
- This patch does not fix any bugs nor makes block drivers easier to
  read or to implement. So why is this patch considered useful?

Thanks,

Bart.
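The race Bart describes can be made concrete with a small standalone model of the dispatch tail that the patch adds. This is a sketch only: the function and parameter names are illustrative, and it captures just the control flow, not the kernel code.

```c
#include <stdbool.h>

/* Possible outcomes of the dispatch tail in the patch under discussion. */
enum action {
	RUN_NOW,          /* blk_mq_run_hw_queue(hctx, true) */
	RUN_DELAYED,      /* blk_mq_delay_run_hw_queue(hctx, BLK_MQ_QUEUE_DELAY) */
	WAIT_FOR_RESTART, /* rely on blk_mq_sched_restart() at completion time */
};

/* Models the branch added at the end of blk_mq_dispatch_rq_list();
 * no_tag_and_empty_wait folds the tag-wait condition into one flag. */
enum action dispatch_tail(bool needs_restart, bool no_tag_and_empty_wait,
			  bool ret_is_blk_sts_resource)
{
	if (!needs_restart || no_tag_and_empty_wait)
		return RUN_NOW;
	if (ret_is_blk_sts_resource)
		return RUN_DELAYED;
	return WAIT_FOR_RESTART;
}
```

In the race above, a completion that slips in between .queue_rq() and this test clears SCHED_RESTART, so needs_restart evaluates false and RUN_NOW is chosen; if the rerun's .queue_rq() again returns BLK_STS_RESOURCE, the sequence repeats, which is the loop being objected to.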


Re: [dm-devel] [LSF/MM TOPIC] block, dm: restack queue_limits

2018-01-30 Thread Hannes Reinecke
On 01/29/2018 10:08 PM, Mike Snitzer wrote:
> We currently don't restack the queue_limits if the lowest, or
> intermediate, layer of an IO stack changes.
> 
> This is particularly unfortunate in the case of FLUSH/FUA, which may
> change if/when a HW controller's BBU fails, thereby requiring the device
> to advertise that it has a volatile write cache (WCE=1).
> 
Uh-oh. Device rescan.
Would be a valid topic on its own...

> But in the context of DM, really it'd be best if the entire stack of
> devices had their limits restacked if any underlying layer's limits
> change.
> 
> In the past, Martin and I discussed that we should "just do it" but
> never did.  Not sure we need a lengthy discussion but figured I'd put it
> out there.
> 
> Maybe I'll find time, between now and April, to try implementing it.
> 
For SCSI the device capabilities are pretty much set in stone after the
initial scan; there are hooks for rescanning, but they will only work
half of the time.
Plus we can't really change the device type on the fly (eg if the SCSI
device type changes; if it moves away from '0' we would need to unbind
the sd driver, and if it moves to '0' we'll need to rescan the sd
device. None of this is happening right now.)

So I'd be glad to have a discussion around this.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


[PATCH v5] blk-mq: introduce BLK_STS_DEV_RESOURCE

2018-01-30 Thread Mike Snitzer
From: Ming Lei 

This status is returned from driver to block layer if device related
resource is unavailable, but driver can guarantee that IO dispatch
will be triggered in future when the resource is available.

Convert some drivers to return BLK_STS_DEV_RESOURCE.  Also, if driver
returns BLK_STS_RESOURCE and SCHED_RESTART is set, rerun queue after
a delay (BLK_MQ_QUEUE_DELAY) to avoid IO stalls.  BLK_MQ_QUEUE_DELAY is
3 ms because both scsi-mq and nvmefc are using that magic value.

If a driver can make sure there is in-flight IO, it is safe to return
BLK_STS_DEV_RESOURCE because:

1) If all in-flight IOs complete before examining SCHED_RESTART in
blk_mq_dispatch_rq_list(), SCHED_RESTART must be cleared, so queue
is run immediately in this case by blk_mq_dispatch_rq_list();

2) if there is any in-flight IO after/when examining SCHED_RESTART
in blk_mq_dispatch_rq_list():
- if SCHED_RESTART isn't set, queue is run immediately as handled in 1)
- otherwise, this request will be dispatched after any in-flight IO is
  completed via blk_mq_sched_restart()

3) if SCHED_RESTART is set concurrently in the current context because of
BLK_STS_RESOURCE, blk_mq_delay_run_hw_queue() will cover the above two
cases and make sure an IO hang can be avoided.

One invariant is that queue will be rerun if SCHED_RESTART is set.

Suggested-by: Jens Axboe 
Tested-by: Laurence Oberman 
Signed-off-by: Ming Lei 
Signed-off-by: Mike Snitzer 
---
V5:
- fixed stale reference to 10ms delay in blk-mq.c comment
- revised commit header to include Ming's proof of how
  blk-mq drivers will make progress (serves to document how
  scsi_queue_rq now works)
V4:
- cleanup header and code comments
- rerun queue after BLK_MQ_QUEUE_DELAY (3ms) instead of 10ms
- eliminate nvmefc's queue rerun now that blk-mq does it
V3:
- fix typo, and improvement document
- add tested-by tag
V2:
- add comments on the new introduced status
- patch style fix
- both are suggested by Christoph

 block/blk-core.c |  1 +
 block/blk-mq.c   | 20 
 drivers/block/null_blk.c |  2 +-
 drivers/block/virtio_blk.c   |  2 +-
 drivers/block/xen-blkfront.c |  2 +-
 drivers/md/dm-rq.c   |  5 ++---
 drivers/nvme/host/fc.c   | 12 ++--
 drivers/scsi/scsi_lib.c  |  6 +++---
 include/linux/blk_types.h| 17 +
 9 files changed, 44 insertions(+), 23 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index cdae69be68e9..38279d4ae08b 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -145,6 +145,7 @@ static const struct {
[BLK_STS_MEDIUM]= { -ENODATA,   "critical medium" },
[BLK_STS_PROTECTION]= { -EILSEQ,"protection" },
[BLK_STS_RESOURCE]  = { -ENOMEM,"kernel resource" },
+   [BLK_STS_DEV_RESOURCE]  = { -ENOMEM,"device resource" },
[BLK_STS_AGAIN] = { -EAGAIN,"nonblocking retry" },
 
/* device mapper special case, should not leak out: */
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 43e7449723e0..e39b4e2a63db 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1160,6 +1160,8 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx **hctx,
return true;
 }
 
+#define BLK_MQ_QUEUE_DELAY 3   /* ms units */
+
 bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 bool got_budget)
 {
@@ -1167,6 +1169,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
struct request *rq, *nxt;
bool no_tag = false;
int errors, queued;
+   blk_status_t ret = BLK_STS_OK;
 
if (list_empty(list))
return false;
@@ -1179,7 +1182,6 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
errors = queued = 0;
do {
struct blk_mq_queue_data bd;
-   blk_status_t ret;
 
rq = list_first_entry(list, struct request, queuelist);
if (!blk_mq_get_driver_tag(rq, &hctx, false)) {
@@ -1224,7 +1226,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
}
 
ret = q->mq_ops->queue_rq(hctx, &bd);
-   if (ret == BLK_STS_RESOURCE) {
+   if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE) {
/*
 * If an I/O scheduler has been configured and we got a
 * driver tag for the next request already, free it
@@ -1255,6 +1257,8 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 * that is where we will continue on next queue run.
 */
if (!list_empty(list)) {
+   bool needs_restart;
+

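The three-case argument in the commit message above can be illustrated with a minimal, self-contained sketch of the driver-side decision. Everything here is hypothetical: pick_busy_status() is not a kernel function, and the MODEL_* enumerators merely stand in for the real BLK_STS_* constants.

```c
/* Stand-ins for the real blk_status_t values under discussion. */
enum blk_status_model {
	MODEL_STS_OK = 0,
	MODEL_STS_RESOURCE,     /* no progress guarantee: core reruns after 3 ms */
	MODEL_STS_DEV_RESOURCE, /* in-flight IO exists: completion reruns queue */
};

/* Hypothetical helper: which "busy" status may a driver safely return?
 * Per the commit message, BLK_STS_DEV_RESOURCE is only safe when the
 * driver can guarantee there is in-flight IO whose completion path will
 * rerun the queue via blk_mq_sched_restart(). */
enum blk_status_model pick_busy_status(int device_in_flight)
{
	/* Cases 1) and 2): with in-flight IO, either SCHED_RESTART is
	 * already clear (queue rerun immediately) or the completion calls
	 * blk_mq_sched_restart(), so the 3 ms delayed rerun is not needed. */
	if (device_in_flight > 0)
		return MODEL_STS_DEV_RESOURCE;
	/* Nothing in flight: nobody will restart the queue for us, so the
	 * core must fall back to blk_mq_delay_run_hw_queue(). */
	return MODEL_STS_RESOURCE;
}
```

This is the essence of the scsi_queue_rq conversion in the patch: out-of-tag/busy-device conditions with outstanding commands return the new status, while genuine "no progress guarantee" conditions keep the old one.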
Re: [LSF/MM TOPIC] Two blk-mq related topics

2018-01-30 Thread Mel Gorman
On Tue, Jan 30, 2018 at 11:08:28AM +0100, Johannes Thumshirn wrote:
> [+Cc Mel]
> Jens Axboe  writes:
> > On 1/29/18 1:56 PM, James Bottomley wrote:
> >> On Mon, 2018-01-29 at 23:46 +0800, Ming Lei wrote:
> >> [...]
> >>> 2. When to enable SCSI_MQ at default again?
> >> 
> >> I'm not sure there's much to discuss ... I think the basic answer is as
> >> soon as Christoph wants to try it again.
> >
> > FWIW, internally I've been running various IO intensive workloads on
> > what is essentially 4.12 upstream with scsi-mq the default (with
> > mq-deadline as the scheduler) and comparing IO workloads with a
> > previous 4.6 kernel (without scsi-mq), and things are looking
> > great.
> >
> > We're never going to iron out the last kinks with it being off
> > by default, I think we should attempt to flip the switch again
> > for 4.16.
> 
> The 4.12 result sounds interesting. I remember Mel ran some tests with 4.12 as
> we were considering flipping the config option for SLES, and it showed
> several road blocks.
> 

Mostly due to slow storage and BFQ where mq-deadline was not a universal
win as an alternative default. I don't have current data and I archived
what I had, but it was based on 4.13-rc7 at the time and BFQ has changed
a lot since so it would need to be redone.

> I'm not sure whether he re-evaluated 4.13/4.14 on his grid though.
> 

No, it hasn't. Grid time for performance testing has been tight during
the last few months to say the least.

> But I'm definitively interested in this discussion and can even possibly
> share some benchmark results we did in our FC Lab.
> 

If you remind me, I may be able to re-execute the tests in a 4.16-rcX
before LSF/MM so you have other data to work with.  Unfortunately, I'll
not be able to make LSF/MM this time due to personal commitments that
conflict and are unmovable.

-- 
Mel Gorman
SUSE Labs


RE: A qla2xxx commit cause Linux no response, has not fixed in lastest version 4.15-rc6

2018-01-30 Thread Changlimin
Hi Himanshu,
  Today I tried several times and have some news.
  If I shut down and then restart the FC switch port connected to the HBA card 
before I insmod qla2xxx.ko, the driver works well.
  It seems the issue is related to the FC switch port. Maybe some stale state 
on the port causes it.
  The FC switch model is H3C S5820V2.

Regards
Chang Limin


-Original Message-
From: Madhani, Himanshu [mailto:himanshu.madh...@cavium.com] 
Sent: Tuesday, January 30, 2018 7:40 AM
To: changlimin (Cloud)
Cc: Nicholas A. Bellinger; Tran, Quinn; jifuliang (Cloud); zhangguanghui 
(Cloud); zhangzijian (Cloud); target-devel; linux-scsi
Subject: Re: A qla2xxx commit cause Linux no response, has not fixed in lastest 
version 4.15-rc6

Hi Chang, 

> On Jan 18, 2018, at 4:51 AM, Changlimin  wrote:
> 
> Hi Himanshu,
>  Today I reproduced the issue in my server.
>  First, I compiled kernel 4.15-rc6 (make localmodconfig; make; make 
> modules_install; make install), then start the kernel with parameter 
> modprobe.blacklist=qla2xxx.
>  Second,  tail -f /var/log/syslog
>  Third,  modprobe qla2xxx ql2xextended_error_logging=0x1e40 , the log is 
> syslog-1e40.txt
>  The syslog-7fff is got when modprobe qla2xxx 
> ql2xextended_error_logging=0x7fff
> 
>  BTW, I haven't loaded the driver from 4.9.x into kernel 4.15-rc6. 
>  When I checkout kernel commit 726b85487067d7f5b23495bc33c484b8517c4074, all 
> kernel code is 4.9.x.
> 

Sorry for the extended delay in the response. From the syslog that you sent me, 
I do see driver version 10.00.00.02-k, which is from 4.15.0-rc6, so at least you 
are using the correct driver. (In your email earlier you mentioned 8.07.xx, 
which was confusing.)

Jan 18 20:30:23 cvknode25 kernel: [  100.991309] qla2xxx [:00:00.0]-0005: : 
QLogic Fibre Channel HBA Driver: 10.00.00.02-k-debug.
Jan 18 20:30:23 cvknode25 kernel: [  100.991486] qla2xxx [:0a:00.0]-001d: : 
Found an ISP2532 irq 16 iobase 0x67aad9fd.
Jan 18 20:30:23 cvknode25 kernel: [  101.651676] qla2xxx [:0a:00.0]-4800:1: 
DPC handler sleeping.
Jan 18 20:30:23 cvknode25 kernel: [  101.651677] scsi host1: qla2xxx

Also I do see  

Jan 18 20:30:24 cvknode25 kernel: [  102.624987] qla2xxx [:0a:00.0]-500a:1: 
LOOP UP detected (8 Gbps).

i.e. driver was able to bring up 8G link 

So having said that, I still do not have a clear picture from the logs provided 
of why you are encountering the issue. 

Can you please share your configuration details? I would like to see how your 
system is set up and see if I can replicate the issue in our lab here. 

> Regards
> Chang Limin
> 
> -Original Message-
> From: Madhani, Himanshu [mailto:himanshu.madh...@cavium.com]
> Sent: Thursday, January 18, 2018 2:26 AM
> To: changlimin (Cloud)
> Cc: Nicholas A. Bellinger; Tran, Quinn; jifuliang (Cloud); zhangguanghui 
> (Cloud); zhangzijian (Cloud); target-devel; linux-scsi
> Subject: Re: A qla2xxx commit cause Linux no response, has not fixed in 
> lastest version 4.15-rc6
> 
> Hi Chang, 
> 
>> On Jan 15, 2018, at 10:49 PM, Changlimin  wrote:
>> 
>> Hi Himanshu,
>>  This is my progress.
>>  First, I compiled 4.15-rc6, I found linux hang when booting, the stack 
>> showed something wrong in qla2xxx driver.
> 
> Can you provide me the detailed steps of how you compiled 4.15-rc6? Also provide 
> details of how you are loading the driver, and a complete log file.
> 
> I do not see how you would be able to load a driver from 4.9.x when you 
> compile a fresh 4.15.0-rc6. 
> 
> Just FYI, I build test system with 8G/16G/32G adapter with 4.15.0-rc6 kernel 
> and I am not able to see hang that you are describing. 
> 
> # uname -r
> 4.15.0-rc6+
> 
> # modprobe qla2xxx
> 
> # fcc.sh
> FC HBAs:
> HBA   Port NamePort ID   State Device
> host3 21:00:00:24:ff:7e:f5:80  01:0d:00  OnlineQLE2742 FW:v8.05.63 
> DVR:v10.00.00.04-k
> host4 21:00:00:24:ff:7e:f5:81  01:0e:00  OnlineQLE2742 FW:v8.05.63 
> DVR:v10.00.00.04-k
> host5 21:00:00:0e:1e:12:e9:a0  01:06:00  OnlineQLE8362 FW:v8.03.06 
> DVR:v10.00.00.04-k
> host6 21:00:00:0e:1e:12:e9:a1  01:14:00  OnlineQLE8362 FW:v8.03.06 
> DVR:v10.00.00.04-k
> host7 21:00:00:24:ff:46:0a:5c  01:0d:00  OnlineQLE2562 FW:v8.03.00 
> DVR:v10.00.00.04-k
> host8 21:00:00:24:ff:46:0a:5d  01:15:00  OnlineQLE2562 FW:v8.03.00 
> DVR:v10.00.00.04-k
> 
> # modinfo qla2xxx | more
> 
> filename:   
> /lib/modules/4.15.0-rc6+/kernel/drivers/scsi/qla2xxx/qla2xxx.ko
> firmware:   ql2500_fw.bin
> firmware:   ql2400_fw.bin
> firmware:   ql2322_fw.bin
> firmware:   ql2300_fw.bin
> firmware:   ql2200_fw.bin
> firmware:   ql2100_fw.bin
> version:10.00.00.04-k
> license:GPL
> description:QLogic Fibre Channel HBA Driver
> author: QLogic Corporation
> srcversion: 6CBCF1372A7756690E83CC3
> 
> 
>>  Second, I want to find which commit introduced the issue. So I tried many 
>> times 

Re: [LSF/MM TOPIC] Two blk-mq related topics

2018-01-30 Thread John Garry

On 30/01/2018 01:24, Ming Lei wrote:

On Mon, Jan 29, 2018 at 12:56:30PM -0800, James Bottomley wrote:

On Mon, 2018-01-29 at 23:46 +0800, Ming Lei wrote:
[...]

2. When to enable SCSI_MQ at default again?


I'm not sure there's much to discuss ... I think the basic answer is as
soon as Christoph wants to try it again.


I guess Christoph still needs to evaluate whether there are existing issues or
blockers before trying it again. And more input may be gathered from F2F
discussion, IMHO.




SCSI_MQ was first introduced in v3.17, but disabled by default. In
v4.13-rc1 it was enabled by default, but the patch was later reverted
in v4.13-rc7, so it became disabled by default again.

Now both the original reported PM issue(actually SCSI quiesce) and
the sequential IO performance issue have been addressed.


Is the blocker bug just not closed because no-one thought to do it:

https://bugzilla.kernel.org/show_bug.cgi?id=178381

(we have confirmed that this issue is now fixed with the original
reporter?)



From a developer's view, this issue is fixed by the following commit:

3a0a52997 ("block, scsi: Make SCSI quiesce and resume work reliably"),
and the fix was verified by the reporter on the kernel list.



And did the Huawei guy (Jonathan Cameron) confirm his performance issue
was fixed (I don't think I saw email that he did)?


Last time I talked with John Garry about the issue, the merged .get_budget
based patch improved the IO performance a lot, but there is still a bit of a
gap compared with the legacy path. It seems to be a driver-specific issue; I
remember that removing one of the driver's locks improved performance
considerably.

Garry, could you provide a further update on this issue?


Hi Ming,

From our testing with experimental changes to our driver to support 
SCSI MQ, we were getting almost on-par performance with the legacy path. But 
without those changes MQ was hurting performance (and I would not necessarily 
say it was a driver issue).


We can retest from today's mainline and see where we are.

BTW, have you got performance figures for other single-queue HBAs 
with and without CONFIG_SCSI_MQ_DEFAULT=Y?


Thanks,
John



Thanks,
Ming







[PATCH v2] scsi: handle special return codes for ABORTED COMMAND

2018-01-30 Thread Martin Wilck
Introduce a new blist flag that indicates the device may return certain
sense code/ASC/ASCQ combinations that require different treatment than
normal. In particular, some devices need unconditional retry (aka
ADD_TO_MLQUEUE) under certain conditions; otherwise path failures may be
falsely detected in the "maybe_retry" case.

Because different devices use different sense codes to indicate this
condition, a single bit is not enough. But we also can't use a bit for
every possible status code. Therefore the new flag BLIST_ABORTED_CMD_QUIRK
just indicates that this is a device for which the return code should be
checked. An extra "blacklist" in scsi_aborted_cmd_quirk() then checks for
known ASC/ASCQ combinations, and handles them.

When such a combination is encountered for the first time, a message is
printed. In systems that have several different peripherals using this
flag, that might lead to a wrong match without a warning. This small risk
is a compromise between exactness and the excessive resources that would be
required to check for matching device vendor and model name every time.
Also, if there were different devices (identified by vendor/model) using
the same ASC/ASCQ, the code might print stray warnings. But the additional
effort required to handle this 100% correctly is hardly justified with
the current minimal "blacklist".

I chose to recycle devinfo flag (1<<11) (former BLIST_INQUIRY_58, free
since 496c91bbc910) for this purpose rather than extending blist_flags_t to
64 bit. This could be easily changed of course.

This patch replaces the previously submitted patches
 "scsi: handle ABORTED_COMMAND on Fujitsu ETERNUS" and
 "scsi: Always retry internal target error" (Hannes Reinecke)

Changes in v2:
 - use ARRAY_SIZE (Bart van Assche)
 - make blist array static const and use separate bitmask for warned flag
   (Bart van Assche)
 - Fix string comparison for SCSI vendor and model
 - Print warning at scsi_logging_level 0, and improve message formatting

Signed-off-by: Martin Wilck 
---
 drivers/scsi/scsi_devinfo.c |   4 +-
 drivers/scsi/scsi_error.c   | 111 
 include/scsi/scsi_devinfo.h |   6 +++
 3 files changed, 120 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_devinfo.c b/drivers/scsi/scsi_devinfo.c
index dfb8da83fa50..39455734d934 100644
--- a/drivers/scsi/scsi_devinfo.c
+++ b/drivers/scsi/scsi_devinfo.c
@@ -161,12 +161,14 @@ static struct {
{"DGC", "RAID", NULL, BLIST_SPARSELUN}, /* Dell PV 650F, storage on LUN 0 */
{"DGC", "DISK", NULL, BLIST_SPARSELUN}, /* Dell PV 650F, no storage on LUN 0 */
{"EMC",  "Invista", "*", BLIST_SPARSELUN | BLIST_LARGELUN},
-   {"EMC", "SYMMETRIX", NULL, BLIST_SPARSELUN | BLIST_LARGELUN | BLIST_REPORTLUN2},
+   {"EMC", "SYMMETRIX", NULL, BLIST_SPARSELUN | BLIST_LARGELUN
+| BLIST_REPORTLUN2 | BLIST_ABORTED_CMD_QUIRK},
{"EMULEX", "MD21/S2 ESDI", NULL, BLIST_SINGLELUN},
{"easyRAID", "16P", NULL, BLIST_NOREPORTLUN},
{"easyRAID", "X6P", NULL, BLIST_NOREPORTLUN},
{"easyRAID", "F8", NULL, BLIST_NOREPORTLUN},
{"FSC", "CentricStor", "*", BLIST_SPARSELUN | BLIST_LARGELUN},
+   {"FUJITSU", "ETERNUS_DXM", "*", BLIST_ABORTED_CMD_QUIRK},
{"Generic", "USB SD Reader", "1.00", BLIST_FORCELUN | BLIST_INQUIRY_36},
{"Generic", "USB Storage-SMC", "0180", BLIST_FORCELUN | BLIST_INQUIRY_36},
{"Generic", "USB Storage-SMC", "0207", BLIST_FORCELUN | BLIST_INQUIRY_36},
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 62b56de38ae8..d60568f71047 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -39,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "scsi_priv.h"
 #include "scsi_logging.h"
@@ -432,6 +434,112 @@ static void scsi_report_sense(struct scsi_device *sdev,
}
 }
 
+struct aborted_cmd_blist {
+   u8 asc;
+   u8 ascq;
+   int retval;
+   const char *vendor;
+   const char *model;
+};
+
+/**
+ * scsi_strcmp - Compare space-padded string with reference string
+ * @device_str:vendor or model field of struct scsi_device,
+ * possibly space-padded
+ * @ref_str:   reference string to compare with
+ * @len:   max size of device_str: 8 for vendor, 16 for model
+ *
+ * Return value:
+ * -1, 0, or 1, like strcmp().
+ */
+static int scsi_strcmp(const char *device_str, const char *ref_str, int len)
+{
+   int ref_len = strlen(ref_str);
+   int r, i;
+
+   WARN_ON(ref_len > len);
+   r = strncmp(device_str, ref_str, min(ref_len, len));
+   if (r != 0)
+   return r;
+
+   for (i = ref_len; i < strnlen(device_str, len); i++)
+   if (device_str[i] != ' ')
+   return 1;
+   return 0;
+}
+
+/**
+ * scsi_aborted_cmd_quirk - Handle special 
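The space-padded comparison above can be exercised standalone. The following is the patch's scsi_strcmp() lifted out of the kernel for illustration, with kernel min() and WARN_ON() replaced by plain-C equivalents, so its behavior on padded INQUIRY vendor/model fields can be checked:

```c
#include <string.h>

static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/* Length of s capped at max, without relying on POSIX strnlen(). */
static int bounded_len(const char *s, int max)
{
	int n = 0;

	while (n < max && s[n] != '\0')
		n++;
	return n;
}

/* Compare a fixed-width, possibly space-padded SCSI INQUIRY field
 * (vendor: 8 bytes, model: 16 bytes) against a NUL-terminated
 * reference string; returns <0, 0 or >0 like strcmp(). */
int scsi_strcmp(const char *device_str, const char *ref_str, int len)
{
	int ref_len = (int)strlen(ref_str);
	int r, i;

	r = strncmp(device_str, ref_str, min_int(ref_len, len));
	if (r != 0)
		return r;

	/* Prefix matched: any non-space after it means device_str is
	 * longer than the reference, i.e. a different model/vendor. */
	for (i = ref_len; i < bounded_len(device_str, len); i++)
		if (device_str[i] != ' ')
			return 1;
	return 0;
}
```

With this shape, "ETERNUS_DXM" matches the padded 16-byte model field "ETERNUS_DXM     " but not "ETERNUS_DXM2    ", which is exactly the trailing-space handling the v2 changelog calls out as the string-comparison fix.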

Re: [LSF/MM TOPIC] Two blk-mq related topics

2018-01-30 Thread Johannes Thumshirn
[+Cc Mel]
Jens Axboe  writes:
> On 1/29/18 1:56 PM, James Bottomley wrote:
>> On Mon, 2018-01-29 at 23:46 +0800, Ming Lei wrote:
>> [...]
>>> 2. When to enable SCSI_MQ at default again?
>> 
>> I'm not sure there's much to discuss ... I think the basic answer is as
>> soon as Christoph wants to try it again.
>
> FWIW, internally I've been running various IO intensive workloads on
> what is essentially 4.12 upstream with scsi-mq the default (with
> mq-deadline as the scheduler) and comparing IO workloads with a
> previous 4.6 kernel (without scsi-mq), and things are looking
> great.
>
> We're never going to iron out the last kinks with it being off
> by default, I think we should attempt to flip the switch again
> for 4.16.

The 4.12 result sounds interesting. I remember Mel ran some tests with 4.12 as
we were considering flipping the config option for SLES, and it showed
several road blocks.

I'm not sure whether he re-evaluated 4.13/4.14 on his grid though.

But I'm definitively interested in this discussion and can even possibly
share some benchmark results we did in our FC Lab.

Byte,
Johannes
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


Re: [LSF/MM TOPIC] Two blk-mq related topics

2018-01-30 Thread Martin Steigerwald
Ming Lei - 30.01.18, 02:24:
> > > SCSI_MQ is enabled on V3.17 firstly, but disabled at default. In
> > > V4.13-rc1, it is enabled at default, but later the patch is reverted
> > > in V4.13-rc7, and becomes disabled at default too.
> > > 
> > > Now both the original reported PM issue(actually SCSI quiesce) and
> > > the sequential IO performance issue have been addressed.
> > 
> > Is the blocker bug just not closed because no-one thought to do it:
> > 
> > https://bugzilla.kernel.org/show_bug.cgi?id=178381
> > 
> > (we have confirmed that this issue is now fixed with the original
> > reporter?)
> 
> From a developer view, this issue is fixed by the following commit:
> 3a0a52997(block, scsi: Make SCSI quiesce and resume work reliably),
> and it is verified by kernel list reporter.

I have never seen any suspend/hibernate related issues with blk-mq + BFQ since 
then, using a heavily utilized BTRFS dual-SSD RAID 1.

% egrep "MQ|BFQ" /boot/config-4.15.0-tp520-btrfstrim+
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_BLK_WBT_MQ=y
CONFIG_BLK_MQ_PCI=y
CONFIG_BLK_MQ_VIRTIO=y
CONFIG_MQ_IOSCHED_DEADLINE=m
CONFIG_MQ_IOSCHED_KYBER=m
CONFIG_IOSCHED_BFQ=m
CONFIG_BFQ_GROUP_IOSCHED=y
CONFIG_NET_SCH_MQPRIO=m
# CONFIG_SCSI_MQ_DEFAULT is not set
# CONFIG_DM_MQ_DEFAULT is not set
CONFIG_DM_CACHE_SMQ=m

% cat /proc/cmdline 
BOOT_IMAGE=/vmlinuz-4.15.0-tp520-btrfstrim+ root=UUID=[…] ro 
rootflags=subvol=debian resume=/dev/mapper/sata-swap init=/bin/systemd 
thinkpad_acpi.fan_control=1 systemd.restore_state=0 scsi_mod.use_blk_mq=1

% cat /sys/block/sda/queue/scheduler 
[bfq] none

Thanks,
-- 
Martin