RE: [PATCH 1/1] scsi: ufs: make sure all interrupts are processed
Hi, Can you elaborate on how this can even happen? Shouldn't the interrupt aggregation capability handle those cases? Thanks, Avri > -----Original Message----- > From: linux-scsi-ow...@vger.kernel.org [mailto:linux-scsi- > ow...@vger.kernel.org] On Behalf Of Asutosh Das > Sent: Tuesday, January 30, 2018 6:54 AM > To: subha...@codeaurora.org; c...@codeaurora.org; > vivek.gau...@codeaurora.org; rna...@codeaurora.org; > vinholika...@gmail.com; j...@linux.vnet.ibm.com; > martin.peter...@oracle.com > Cc: linux-scsi@vger.kernel.org; Venkat Gopalakrishnan >; Asutosh Das ; open > list > Subject: [PATCH 1/1] scsi: ufs: make sure all interrupts are processed > > From: Venkat Gopalakrishnan > > As multiple requests are submitted to the UFS host controller in parallel, > there could be instances where the command completion interrupt arrives later for a > request that was already processed earlier, because the corresponding doorbell was > cleared when handling the previous interrupt. Read the interrupt status in a > loop after processing the received interrupt to catch such interrupts and handle > them. 
> > Signed-off-by: Venkat Gopalakrishnan > Signed-off-by: Asutosh Das > --- > drivers/scsi/ufs/ufshcd.c | 27 +++ > 1 file changed, 19 insertions(+), 8 deletions(-) > > diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c index > 8af2af3..58d81de 100644 > --- a/drivers/scsi/ufs/ufshcd.c > +++ b/drivers/scsi/ufs/ufshcd.c > @@ -5357,19 +5357,30 @@ static irqreturn_t ufshcd_intr(int irq, void *__hba) > u32 intr_status, enabled_intr_status; > irqreturn_t retval = IRQ_NONE; > struct ufs_hba *hba = __hba; > + int retries = hba->nutrs; > > spin_lock(hba->host->host_lock); > intr_status = ufshcd_readl(hba, REG_INTERRUPT_STATUS); > - enabled_intr_status = > - intr_status & ufshcd_readl(hba, REG_INTERRUPT_ENABLE); > > - if (intr_status) > - ufshcd_writel(hba, intr_status, REG_INTERRUPT_STATUS); > + /* > + * There could be max of hba->nutrs reqs in flight and in worst case > + * if the reqs get finished 1 by 1 after the interrupt status is > + * read, make sure we handle them by checking the interrupt status > + * again in a loop until we process all of the reqs before returning. > + */ > + do { > + enabled_intr_status = > + intr_status & ufshcd_readl(hba, > REG_INTERRUPT_ENABLE); > + if (intr_status) > + ufshcd_writel(hba, intr_status, > REG_INTERRUPT_STATUS); > + if (enabled_intr_status) { > + ufshcd_sl_intr(hba, enabled_intr_status); > + retval = IRQ_HANDLED; > + } > + > + intr_status = ufshcd_readl(hba, REG_INTERRUPT_STATUS); > + } while (intr_status && --retries); > > - if (enabled_intr_status) { > - ufshcd_sl_intr(hba, enabled_intr_status); > - retval = IRQ_HANDLED; > - } > spin_unlock(hba->host->host_lock); > return retval; > } > -- > Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, > Inc. > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux > Foundation Collaborative Project.
Re: [PATCH 0/6] Six qla2xxx and qla4xxx patches
Hi Martin, > On Jan 30, 2018, at 7:04 PM, Martin K. Petersen wrote: > > > Bart, > >> The patches in this series are what I came up with after having analyzed >> the source code of the qla[24]xxx drivers with several source code analysis >> tools (scripts/kernel-doc, gcc, sparse and smatch). None of the patches in >> this series have been tested. Yet I'm asking you to consider at least the >> first patch of this series for kernel v4.16 because it fixes a locking bug >> in one of the SCSI patches queued for kernel v4.16. > > Himanshu and team: Please review! > Series looks good. Acked-by: Himanshu Madhani > -- > Martin K. Petersen > Oracle Linux Engineering Thanks, - Himanshu
Re: [dm-devel] [PATCH v5] blk-mq: introduce BLK_STS_DEV_RESOURCE
On Tue, Jan 30, 2018 at 08:22:27PM -0700, Jens Axboe wrote: > On 1/30/18 8:21 PM, Bart Van Assche wrote: > > On Tue, 2018-01-30 at 20:17 -0700, Jens Axboe wrote: > >> BLK_STS_RESOURCE should always be safe to return, and it should work > >> the same as STS_DEV_RESOURCE, except it may cause an extra queue > >> run. > >> > >> Well written drivers should use STS_DEV_RESOURCE where it makes > >> sense. > > > > Hello Jens, > > > > I would appreciate it if other names would be chosen than BLK_STS_RESOURCE > > and BLK_STS_DEV_RESOURCE, e.g. names that directly refer to the fact that > > one of the two status codes causes the queue to be rerun and the other not. > > I'm afraid that the currently chosen names will cause confusion. > > DEV_RESOURCE is pretty clear I think, but I agree that STS_RESOURCE > could perhaps be better. STS_SYSTEM_RESOURCE? It makes the distinction > a bit clearer. I guess it still can't cover all cases; for example, .queue_rq() may not submit the rq to hardware successfully because the transport is busy, such as FC, ... -- Ming
Re: [dm-devel] [PATCH v5] blk-mq: introduce BLK_STS_DEV_RESOURCE
On 1/30/18 8:27 PM, Bart Van Assche wrote: > On Tue, 2018-01-30 at 20:22 -0700, Jens Axboe wrote: >> On 1/30/18 8:21 PM, Bart Van Assche wrote: >>> On Tue, 2018-01-30 at 20:17 -0700, Jens Axboe wrote: >>>> BLK_STS_RESOURCE should always be safe to return, and it should work >>>> the same as STS_DEV_RESOURCE, except it may cause an extra queue >>>> run. >>>> >>>> Well written drivers should use STS_DEV_RESOURCE where it makes >>>> sense. >>> >>> Hello Jens, >>> >>> I would appreciate it if other names would be chosen than BLK_STS_RESOURCE >>> and BLK_STS_DEV_RESOURCE, e.g. names that directly refer to the fact that >>> one of the two status codes causes the queue to be rerun and the other not. >>> I'm afraid that the currently chosen names will cause confusion. >> >> DEV_RESOURCE is pretty clear I think, but I agree that STS_RESOURCE >> could perhaps be better. STS_SYSTEM_RESOURCE? It makes the distinction >> a bit clearer. > > I'm concerned about both. A block driver can, after all, run out of device > resources without receiving a notification when that resource becomes > available again. That is true, and that is why I don't want the driver to make guesses on when to re-run. The saving grace here is that it should happen extremely rarely. I went over this in a previous email. If it's not a rare occurrence, then there's no other way around it than wiring up a notification mechanism to kick the queue when the shortage clears. One way to deal with that is making the assumption that other IO will clear the issue. That means we can confine it to the blk-mq layer. Similarly to how we currently sometimes have to track starvation across hardware queues and restart queue X when Y completes IO, we may have to have a broader scope of shortage. That would probably fix all but pathological cases. > In that case the appropriate return code is BLK_STS_RESOURCE. Hence my > concern ... But this isn't a new thing, the only real change here is making it so that drivers don't have to think about that case. -- Jens Axboe
Re: [dm-devel] [PATCH v5] blk-mq: introduce BLK_STS_DEV_RESOURCE
On Tue, 2018-01-30 at 20:22 -0700, Jens Axboe wrote: > On 1/30/18 8:21 PM, Bart Van Assche wrote: > > On Tue, 2018-01-30 at 20:17 -0700, Jens Axboe wrote: > > > BLK_STS_RESOURCE should always be safe to return, and it should work > > > the same as STS_DEV_RESOURCE, except it may cause an extra queue > > > run. > > > > > > Well written drivers should use STS_DEV_RESOURCE where it makes > > > sense. > > > > Hello Jens, > > > > I would appreciate it if other names would be chosen than BLK_STS_RESOURCE > > and BLK_STS_DEV_RESOURCE, e.g. names that directly refer to the fact that > > one of the two status codes causes the queue to be rerun and the other not. > > I'm afraid that the currently chosen names will cause confusion. > > DEV_RESOURCE is pretty clear I think, but I agree that STS_RESOURCE > could perhaps be better. STS_SYSTEM_RESOURCE? It makes the distinction > a bit clearer. I'm concerned about both. A block driver can, after all, run out of device resources without receiving a notification when that resource becomes available again. In that case the appropriate return code is BLK_STS_RESOURCE. Hence my concern ... Bart.
Re: [PATCH v3] scsi: sd: add new match array for cache_type
Weiping, > add user friendly command strings sd_wr_cache to enable/disable > write cache. user can enable both write and read cache by one of > the following commands: I remain unconvinced that introducing redundant option strings is a benefit. These shorthand forms may seem obvious to you, but not to everyone else. Adding to the murkiness is the fact that write cache must be *enabled* and read cache *disabled* in the protocol. The fact that the w?rN and RCD columns below contain inverse values just emphasizes the point that "0" and "1" are not a particularly good interface either. It is also confusing that the existing longhand forms will be printed when the user subsequently queries the state instead of what they used to toggle it. This, however, is pretty much what I was looking for:

> Encoding                          | WCE RCD | Write_cache Read_cache
> ----------------------------------+---------+-----------------------
> write through / w0r1              |  0   0  |    off         on
> none / w0r0                       |  0   1  |    off         off
> write back / w1r1                 |  1   0  |    on          on
> write back, no read (daft) / w1r0 |  1   1  |    on          off

Please drop the w?r? options and submit a patch with this table and a note about the "temporary" prefix. Put it in Documentation/scsi/sd-parameters.txt or something to that effect. Bonus points for also documenting the remaining sd sysfs attributes and their options. Thanks! -- Martin K. Petersen Oracle Linux Engineering
Re: [dm-devel] [PATCH v5] blk-mq: introduce BLK_STS_DEV_RESOURCE
On 1/30/18 8:21 PM, Bart Van Assche wrote: > On Tue, 2018-01-30 at 20:17 -0700, Jens Axboe wrote: >> BLK_STS_RESOURCE should always be safe to return, and it should work >> the same as STS_DEV_RESOURCE, except it may cause an extra queue >> run. >> >> Well written drivers should use STS_DEV_RESOURCE where it makes >> sense. > > Hello Jens, > > I would appreciate it if other names would be chosen than BLK_STS_RESOURCE > and BLK_STS_DEV_RESOURCE, e.g. names that directly refer to the fact that > one of the two status codes causes the queue to be rerun and the other not. > I'm afraid that the currently chosen names will cause confusion. DEV_RESOURCE is pretty clear I think, but I agree that STS_RESOURCE could perhaps be better. STS_SYSTEM_RESOURCE? It makes the distinction a bit clearer. -- Jens Axboe
Re: [dm-devel] [PATCH v5] blk-mq: introduce BLK_STS_DEV_RESOURCE
On Tue, 2018-01-30 at 20:17 -0700, Jens Axboe wrote: > BLK_STS_RESOURCE should always be safe to return, and it should work > the same as STS_DEV_RESOURCE, except it may cause an extra queue > run. > > Well written drivers should use STS_DEV_RESOURCE where it makes > sense. Hello Jens, I would appreciate it if other names would be chosen than BLK_STS_RESOURCE and BLK_STS_DEV_RESOURCE, e.g. names that directly refer to the fact that one of the two status codes causes the queue to be rerun and the other not. I'm afraid that the currently chosen names will cause confusion. Thanks, Bart.
Re: [PATCH v6] blk-mq: introduce BLK_STS_DEV_RESOURCE
On 1/30/18 8:04 PM, Mike Snitzer wrote: > From: Ming Lei > > This status is returned from driver to block layer if device related > resource is unavailable, but driver can guarantee that IO dispatch > will be triggered in future when the resource is available. > > Convert some drivers to return BLK_STS_DEV_RESOURCE. Also, if driver > returns BLK_STS_RESOURCE and SCHED_RESTART is set, rerun queue after > a delay (BLK_MQ_DELAY_QUEUE) to avoid IO stalls. BLK_MQ_DELAY_QUEUE is > 3 ms because both scsi-mq and nvmefc are using that magic value. > > If a driver can make sure there is in-flight IO, it is safe to return > BLK_STS_DEV_RESOURCE because: > > 1) If all in-flight IOs complete before examining SCHED_RESTART in > blk_mq_dispatch_rq_list(), SCHED_RESTART must be cleared, so queue > is run immediately in this case by blk_mq_dispatch_rq_list(); > > 2) if there is any in-flight IO after/when examining SCHED_RESTART > in blk_mq_dispatch_rq_list(): > - if SCHED_RESTART isn't set, queue is run immediately as handled in 1) > - otherwise, this request will be dispatched after any in-flight IO is > completed via blk_mq_sched_restart() > > 3) if SCHED_RESTART is set concurrently in context because of > BLK_STS_RESOURCE, blk_mq_delay_run_hw_queue() will cover the above two > cases and make sure IO hang can be avoided. > > One invariant is that queue will be rerun if SCHED_RESTART is set. Applied, thanks. -- Jens Axboe
Re: [dm-devel] [PATCH v5] blk-mq: introduce BLK_STS_DEV_RESOURCE
On 1/30/18 10:52 AM, Bart Van Assche wrote: > On 01/30/18 06:24, Mike Snitzer wrote: >> + * >> + * If driver returns BLK_STS_RESOURCE and SCHED_RESTART >> + * bit is set, run queue after a delay to avoid IO stalls >> + * that could otherwise occur if the queue is idle. >> */ >> -if (!blk_mq_sched_needs_restart(hctx) || >> +needs_restart = blk_mq_sched_needs_restart(hctx); >> +if (!needs_restart || >> (no_tag && list_empty_careful(&hctx->dispatch_wait.entry))) >> blk_mq_run_hw_queue(hctx, true); >> +else if (needs_restart && (ret == BLK_STS_RESOURCE)) >> +blk_mq_delay_run_hw_queue(hctx, BLK_MQ_QUEUE_DELAY); >> } > > If a request completes concurrently with execution of the above code > then the request completion will trigger a call of > blk_mq_sched_restart_hctx() and that call will clear the > BLK_MQ_S_SCHED_RESTART bit. If that bit is cleared before the above code > tests it then the above code will schedule an asynchronous call of > __blk_mq_run_hw_queue(). If the .queue_rq() call triggered by the new > queue run returns again BLK_STS_RESOURCE then the above code will be > executed again. In other words, a loop occurs. That loop will repeat as > long as the described race occurs. The current (kernel v4.15) block > layer behavior is simpler: only block drivers call > blk_mq_delay_run_hw_queue() and the block layer core never calls that > function. Hence that loop cannot occur with the v4.15 block layer core > and block drivers. A motivation of why that loop is preferred compared > to the current behavior (no loop) is missing. Does this mean that that > loop is a needless complication of the block layer core? The dispatch, and later restart check, is within the hctx lock. The completions should be as well. > Sorry but I still prefer the v4.15 block layer approach because this > patch has in my view the following disadvantages: > - It involves a blk-mq API change. API changes are always risky and need > some time to stabilize. 
> - The delay after which to rerun the queue is moved from block layer > drivers into the block layer core. I think that's wrong because only > the block driver authors can make a good choice for this constant. It's exactly the right place to put it, as drivers cannot make a good decision for when to run the queue again. You get NULL on allocating some piece of memory; when do you run it again? That's the last thing I want driver writers to make a decision on, because a novice device driver writer will just think that he should run the queue again asap. In reality we are screwed. Decisions like that SHOULD be in shared and generic code, not in driver private code. > - This patch makes block drivers harder to understand. Anyone who sees > return BLK_STS_RESOURCE / return BLK_STS_DEV_RESOURCE for the first > time will have to look up the meaning of these constants. An explicit > blk_mq_delay_run_hw_queue() call is easier to understand. BLK_STS_RESOURCE should always be safe to return, and it should work the same as STS_DEV_RESOURCE, except it may cause an extra queue run. Well written drivers should use STS_DEV_RESOURCE where it makes sense. > - This patch does not fix any bugs nor makes block drivers easier to > read or to implement. So why is this patch considered useful? It does fix the run bug on global resources, I'm assuming you mean it doesn't fix any EXTRA bugs compared to just using the delayed run? -- Jens Axboe
Re: [PATCH v5] blk-mq: introduce BLK_STS_DEV_RESOURCE
On Tue, Jan 30 2018 at 9:44P -0500, Jens Axboe wrote: > On 1/30/18 7:24 AM, Mike Snitzer wrote: > > From: Ming Lei > > > > This status is returned from driver to block layer if device related > > resource is unavailable, but driver can guarantee that IO dispatch > > will be triggered in future when the resource is available. > > > > Convert some drivers to return BLK_STS_DEV_RESOURCE. Also, if driver > > returns BLK_STS_RESOURCE and SCHED_RESTART is set, rerun queue after > > a delay (BLK_MQ_DELAY_QUEUE) to avoid IO stalls. BLK_MQ_DELAY_QUEUE is > > 3 ms because both scsi-mq and nvmefc are using that magic value. > > > > If a driver can make sure there is in-flight IO, it is safe to return > > BLK_STS_DEV_RESOURCE because: > > > > 1) If all in-flight IOs complete before examining SCHED_RESTART in > > blk_mq_dispatch_rq_list(), SCHED_RESTART must be cleared, so queue > > is run immediately in this case by blk_mq_dispatch_rq_list(); > > > > 2) if there is any in-flight IO after/when examining SCHED_RESTART > > in blk_mq_dispatch_rq_list(): > > - if SCHED_RESTART isn't set, queue is run immediately as handled in 1) > > - otherwise, this request will be dispatched after any in-flight IO is > > completed via blk_mq_sched_restart() > > > > 3) if SCHED_RESTART is set concurrently in context because of > > BLK_STS_RESOURCE, blk_mq_delay_run_hw_queue() will cover the above two > > cases and make sure IO hang can be avoided. > > > > One invariant is that queue will be rerun if SCHED_RESTART is set. > > This looks pretty good to me. I'm waffling a bit on whether to retain > the current BLK_STS_RESOURCE behavior and name the new one something > else, but I do like using the DEV name in there to signify the > difference between a global and device resource. > > Just a few small nits below - can you roll a v6 with the changes? Folded in all your feedback and just replied with v6. 
> > diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h > > index 2d973ac54b09..f41d2057215f 100644 > > --- a/include/linux/blk_types.h > > +++ b/include/linux/blk_types.h > > @@ -39,6 +39,23 @@ typedef u8 __bitwise blk_status_t; > > > > #define BLK_STS_AGAIN ((__force blk_status_t)12) > > > > +/* > > + * BLK_STS_DEV_RESOURCE is returned from driver to block layer if device > > + * related resource is unavailable, but driver can guarantee that queue > > + * will be rerun in future once the resource is available (whereby > > + * dispatching requests). > > + * > > + * To safely return BLK_STS_DEV_RESOURCE, and allow forward progress, a > > + * driver just needs to make sure there is in-flight IO. > > + * > > + * Difference with BLK_STS_RESOURCE: > > + * If driver isn't sure if the queue will be rerun once device resource > > + * is made available, please return BLK_STS_RESOURCE. For example: when > > + * memory allocation, DMA Mapping or other system resource allocation > > + * fails and IO can't be submitted to device. > > + */ > > +#define BLK_STS_DEV_RESOURCE ((__force blk_status_t)13) > > I'd rephrase that as: > > BLK_STS_DEV_RESOURCE is returned from the driver to the block layer if > device related resource are unavailable, but the driver can guarantee > that the queue will be rerun in the future once resources become > available again. This is typically the case for device specific > resources that are consumed for IO. If the driver fails allocating these > resources, we know that inflight (or pending) IO will free these > resource upon completion. > > This is different from BLK_STS_RESOURCE in that it explicitly references > device specific resource. For resources of wider scope, allocation > failure can happen without having pending IO. This means that we can't > rely on request completions freeing these resources, as IO may not be in > flight. Examples of that are kernel memory allocations, DMA mappings, or > any other system wide resources. 
Thanks for that, definitely clearer, nice job. Mike
Re: [PATCH 1/1] scsi: storvsc: Increase cmd_per_lun for higher speed devices
> Increase cmd_per_lun to allow more I/Os in progress per device, > particularly for NVMe's. The Hyper-V host side can handle the > higher count with no issues. Long/KY/Cathy/Stephen: Please ack/review Michael's patches. -- Martin K. Petersen Oracle Linux Engineering
[PATCH v6] blk-mq: introduce BLK_STS_DEV_RESOURCE
From: Ming Lei

This status is returned from driver to block layer if device related resource is unavailable, but driver can guarantee that IO dispatch will be triggered in future when the resource is available. Convert some drivers to return BLK_STS_DEV_RESOURCE. Also, if driver returns BLK_STS_RESOURCE and SCHED_RESTART is set, rerun queue after a delay (BLK_MQ_DELAY_QUEUE) to avoid IO stalls. BLK_MQ_DELAY_QUEUE is 3 ms because both scsi-mq and nvmefc are using that magic value. If a driver can make sure there is in-flight IO, it is safe to return BLK_STS_DEV_RESOURCE because: 1) If all in-flight IOs complete before examining SCHED_RESTART in blk_mq_dispatch_rq_list(), SCHED_RESTART must be cleared, so queue is run immediately in this case by blk_mq_dispatch_rq_list(); 2) if there is any in-flight IO after/when examining SCHED_RESTART in blk_mq_dispatch_rq_list(): - if SCHED_RESTART isn't set, queue is run immediately as handled in 1) - otherwise, this request will be dispatched after any in-flight IO is completed via blk_mq_sched_restart() 3) if SCHED_RESTART is set concurrently in context because of BLK_STS_RESOURCE, blk_mq_delay_run_hw_queue() will cover the above two cases and make sure IO hang can be avoided. One invariant is that queue will be rerun if SCHED_RESTART is set. 
Suggested-by: Jens Axboe Tested-by: Laurence Oberman Signed-off-by: Ming Lei Signed-off-by: Mike Snitzer --- V6: - use -EBUSY, instead of -ENOMEM, for BLK_STS_DEV_RESOURCE - rename BLK_MQ_QUEUE_DELAY to BLK_MQ_RESOURCE_DELAY - cleanup blk_types.h comment block above BLK_STS_DEV_RESOURCE - all suggested by Jens V5: - fixed stale reference to 10ms delay in blk-mq.c comment - revised commit header to include Ming's proof of how blk-mq drivers will make progress (serves to document how scsi_queue_rq now works) V4: - cleanup header and code comments - rerun queue after BLK_MQ_QUEUE_DELAY (3ms) instead of 10ms - eliminate nvmefc's queue rerun now that blk-mq does it V3: - fix typo, and improvement document - add tested-by tag V2: - add comments on the new introduced status - patch style fix - both are suggested by Christoph block/blk-core.c | 1 + block/blk-mq.c | 20 drivers/block/null_blk.c | 2 +- drivers/block/virtio_blk.c | 2 +- drivers/block/xen-blkfront.c | 2 +- drivers/md/dm-rq.c | 5 ++--- drivers/nvme/host/fc.c | 12 ++-- drivers/scsi/scsi_lib.c | 6 +++--- include/linux/blk_types.h| 18 ++ 9 files changed, 45 insertions(+), 23 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index cdae69be68e9..79dfef84c66c 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -145,6 +145,7 @@ static const struct { [BLK_STS_MEDIUM]= { -ENODATA, "critical medium" }, [BLK_STS_PROTECTION]= { -EILSEQ,"protection" }, [BLK_STS_RESOURCE] = { -ENOMEM,"kernel resource" }, + [BLK_STS_DEV_RESOURCE] = { -EBUSY, "device resource" }, [BLK_STS_AGAIN] = { -EAGAIN,"nonblocking retry" }, /* device mapper special case, should not leak out: */ diff --git a/block/blk-mq.c b/block/blk-mq.c index 43e7449723e0..52a206e3777f 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -1160,6 +1160,8 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx **hctx, return true; } +#define BLK_MQ_RESOURCE_DELAY 3 /* ms units */ + bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head 
*list, bool got_budget) { @@ -1167,6 +1169,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list, struct request *rq, *nxt; bool no_tag = false; int errors, queued; + blk_status_t ret = BLK_STS_OK; if (list_empty(list)) return false; @@ -1179,7 +1182,6 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list, errors = queued = 0; do { struct blk_mq_queue_data bd; - blk_status_t ret; rq = list_first_entry(list, struct request, queuelist); if (!blk_mq_get_driver_tag(rq, , false)) { @@ -1224,7 +1226,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list, } ret = q->mq_ops->queue_rq(hctx, ); - if (ret == BLK_STS_RESOURCE) { + if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE) { /* * If an I/O scheduler has been configured and we got a * driver tag for the next request already, free it @@ -1255,6 +1257,8 @@ bool
Re: [PATCH 0/6] Six qla2xxx and qla4xxx patches
Bart, > The patches in this series are what I came up with after having analyzed > the source code of the qla[24]xxx drivers with several source code analysis > tools (scripts/kernel-doc, gcc, sparse and smatch). None of the patches in > this series have been tested. Yet I'm asking you to consider at least the > first patch of this series for kernel v4.16 because it fixes a locking bug > in one of the SCSI patches queued for kernel v4.16. Himanshu and team: Please review! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] scsi_dh: Document alua_rtpg_queue() arguments
Bart, > Since commit 3a025e1d1c2e ("Add optional check for bad kernel-doc > comments") building with W=1 causes warnings to appear for issues > in kernel-doc headers. This patch avoids that the following warnings > are reported when building with W=1: Applied to 4.16/scsi-fixes, thanks! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] scsi: remove dead makefile about oktagon files
Corentin, > Remove a line referencing nonexistent files which were removed in > commit 642978beb483 ("[SCSI] remove m68k NCR53C9x based drivers") Applied to 4.16/scsi-fixes, thank you. -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] scsi: aic7xxx: remove really aiclib.c
Corentin, > aiclib.c is unused (and contains no code) since commit 1ff927306e08 > ("[SCSI] aic7xxx: remove aiclib.c") > 13 years later, finish the cleaning by removing it from the tree. Applied to 4.16/scsi-fixes, thanks! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] mptfusion: Add bounds check in mptctl_hp_targetinfo()
Dan, > My static checker complains about an out of bounds read: > > drivers/message/fusion/mptctl.c:2786 mptctl_hp_targetinfo() > error: buffer overflow 'hd->sel_timeout' 255 <= u32max. > > It's true that we probably should have a bounds check here. Applied to 4.16/scsi-fixes, thanks! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH v5] blk-mq: introduce BLK_STS_DEV_RESOURCE
On 1/30/18 7:24 AM, Mike Snitzer wrote: > From: Ming Lei > > This status is returned from driver to block layer if device related > resource is unavailable, but driver can guarantee that IO dispatch > will be triggered in future when the resource is available. > > Convert some drivers to return BLK_STS_DEV_RESOURCE. Also, if driver > returns BLK_STS_RESOURCE and SCHED_RESTART is set, rerun queue after > a delay (BLK_MQ_DELAY_QUEUE) to avoid IO stalls. BLK_MQ_DELAY_QUEUE is > 3 ms because both scsi-mq and nvmefc are using that magic value. > > If a driver can make sure there is in-flight IO, it is safe to return > BLK_STS_DEV_RESOURCE because: > > 1) If all in-flight IOs complete before examining SCHED_RESTART in > blk_mq_dispatch_rq_list(), SCHED_RESTART must be cleared, so queue > is run immediately in this case by blk_mq_dispatch_rq_list(); > > 2) if there is any in-flight IO after/when examining SCHED_RESTART > in blk_mq_dispatch_rq_list(): > - if SCHED_RESTART isn't set, queue is run immediately as handled in 1) > - otherwise, this request will be dispatched after any in-flight IO is > completed via blk_mq_sched_restart() > > 3) if SCHED_RESTART is set concurrently in context because of > BLK_STS_RESOURCE, blk_mq_delay_run_hw_queue() will cover the above two > cases and make sure IO hang can be avoided. > > One invariant is that queue will be rerun if SCHED_RESTART is set. This looks pretty good to me. I'm waffling a bit on whether to retain the current BLK_STS_RESOURCE behavior and name the new one something else, but I do like using the DEV name in there to signify the difference between a global and device resource. Just a few small nits below - can you roll a v6 with the changes? 
> diff --git a/block/blk-core.c b/block/blk-core.c > index cdae69be68e9..38279d4ae08b 100644 > --- a/block/blk-core.c > +++ b/block/blk-core.c > @@ -145,6 +145,7 @@ static const struct { > [BLK_STS_MEDIUM]= { -ENODATA, "critical medium" }, > [BLK_STS_PROTECTION]= { -EILSEQ,"protection" }, > [BLK_STS_RESOURCE] = { -ENOMEM,"kernel resource" }, > + [BLK_STS_DEV_RESOURCE] = { -ENOMEM,"device resource" }, > [BLK_STS_AGAIN] = { -EAGAIN,"nonblocking retry" }, I think we should make BLK_STS_DEV_RESOURCE be -EBUSY, that more closely matches the result and makes it distinctly different than the global shortage that is STS_RESOURCE/ENOMEM. > diff --git a/block/blk-mq.c b/block/blk-mq.c > index 43e7449723e0..e39b4e2a63db 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -1160,6 +1160,8 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx > **hctx, > return true; > } > > +#define BLK_MQ_QUEUE_DELAY 3 /* ms units */ Make that BLK_MQ_RESOURCE_DELAY or something like that. The name is too generic right now. > diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h > index 2d973ac54b09..f41d2057215f 100644 > --- a/include/linux/blk_types.h > +++ b/include/linux/blk_types.h > @@ -39,6 +39,23 @@ typedef u8 __bitwise blk_status_t; > > #define BLK_STS_AGAIN((__force blk_status_t)12) > > +/* > + * BLK_STS_DEV_RESOURCE is returned from driver to block layer if device > + * related resource is unavailable, but driver can guarantee that queue > + * will be rerun in future once the resource is available (whereby > + * dispatching requests). > + * > + * To safely return BLK_STS_DEV_RESOURCE, and allow forward progress, a > + * driver just needs to make sure there is in-flight IO. > + * > + * Difference with BLK_STS_RESOURCE: > + * If driver isn't sure if the queue will be rerun once device resource > + * is made available, please return BLK_STS_RESOURCE. 
For example: when > + * memory allocation, DMA Mapping or other system resource allocation > + * fails and IO can't be submitted to device. > + */ > +#define BLK_STS_DEV_RESOURCE ((__force blk_status_t)13) I'd rephrase that as: BLK_STS_DEV_RESOURCE is returned from the driver to the block layer if device related resource are unavailable, but the driver can guarantee that the queue will be rerun in the future once resources become available again. This is typically the case for device specific resources that are consumed for IO. If the driver fails allocating these resources, we know that inflight (or pending) IO will free these resource upon completion. This is different from BLK_STS_RESOURCE in that it explicitly references device specific resource. For resources of wider scope, allocation failure can happen without having pending IO. This means that we can't rely on request completions freeing these resources, as IO may not be in flight. Examples of that are kernel memory allocations, DMA mappings, or any other system wide resources. -- Jens Axboe
Re: [PATCH] qla2xxx: Avoid triggering undefined behavior in qla2x00_mbx_completion()
Bart, > A left shift must shift less than the bit width of the left argument. > Avoid triggering undefined behavior if ha->mbx_count == 32. Applied to 4.16/scsi-fixes. Thank you! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] [SCSI] sym53c8xx_2: iterator underflow in sym_getsync()
Dan, > We wanted to exit the loop with "div" set to zero, but instead, if we > don't hit the break then "div" is -1 when we finish the loop. It > leads to an array underflow a few lines later. Applied to 4.16/scsi-fixes, thanks! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] bnx2fc: Fix check in SCSI completion handler when request has already timed out.
Chad, > When a request times out we set the io_req flag BNX2FC_FLAG_IO_COMPL so > that if a subsequent completion comes in on that task ID we will ignore > it. The issue is that in the check for this flag there is a missing > return so we will continue to process a request which may have already > been returned to the ownership of the SCSI layer. This can cause > unpredictable results. Applied to 4.16/scsi-fixes. Thanks! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] scsi: csiostor: remove redundant assignment to pointer 'ln'
Colin, > The pointer ln is assigned a value that is never read, it is re-assigned > a new value in the list_for_each loop hence the initialization is > redundant and can be removed. Applied to 4.16/scsi-fixes. Thank you. -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH 1/1] scsi: ufs: Enable quirk to ignore sending WRITE_SAME command
Asutosh, > /* WRITE_SAME command is not supported*/ > sdev->no_write_same = 1; > > + /* WRITE_SAME command is not supported*/ > + sdev->no_write_same = 1; > + Uhm... I applied this change to 4.16/scsi-fixes by hand. -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] ibmvfc: fix misdefined reserved field in ibmvfc_fcp_rsp_info
Tyrel, > The fcp_rsp_info structure as defined in the FC spec has an initial 3 bytes > reserved field. The ibmvfc driver mistakenly defined this field as 4 bytes > resulting in the rsp_code field being defined in what should be the start of > the second reserved field and thus always being reported as zero by the > driver. Applied to 4.16/scsi-fixes, thanks! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] qla2xxx: Fix memory corruption during hba reset test
Himanshu, > This patch fixes memory corruption while performing > HBA Reset test. Applied to 4.16/scsi-fixes. Thank you! -- Martin K. Petersen Oracle Linux Engineering
Re: [dm-devel] [PATCH v5] blk-mq: introduce BLK_STS_DEV_RESOURCE
On Tue, Jan 30, 2018 at 09:52:31AM -0800, Bart Van Assche wrote:
> On 01/30/18 06:24, Mike Snitzer wrote:
> > +	 *
> > +	 * If driver returns BLK_STS_RESOURCE and SCHED_RESTART
> > +	 * bit is set, run queue after a delay to avoid IO stalls
> > +	 * that could otherwise occur if the queue is idle.
> >	 */
> > -	if (!blk_mq_sched_needs_restart(hctx) ||
> > +	needs_restart = blk_mq_sched_needs_restart(hctx);
> > +	if (!needs_restart ||
> >	    (no_tag && list_empty_careful(&hctx->dispatch_wait.entry)))
> >		blk_mq_run_hw_queue(hctx, true);
> > +	else if (needs_restart && (ret == BLK_STS_RESOURCE))
> > +		blk_mq_delay_run_hw_queue(hctx, BLK_MQ_QUEUE_DELAY);
> > }
>
> If a request completes concurrently with execution of the above code then
> the request completion will trigger a call of blk_mq_sched_restart_hctx()
> and that call will clear the BLK_MQ_S_SCHED_RESTART bit. If that bit is
> cleared before the above code tests it then the above code will schedule an
> asynchronous call of __blk_mq_run_hw_queue(). If the .queue_rq() call

Right.

> triggered by the new queue run returns again BLK_STS_RESOURCE then the above
> code will be executed again. In other words, a loop occurs. That loop will

This patch doesn't change anything about that. When BLK_STS_RESOURCE is
returned, this request is added to hctx->dispatch; next time, before
dispatching this request, SCHED_RESTART is set and the loop is cut.

> repeat as long as the described race occurs. The current (kernel v4.15)
> block layer behavior is simpler: only block drivers call
> blk_mq_delay_run_hw_queue() and the block layer core never calls that
> function. Hence that loop cannot occur with the v4.15 block layer core and

That way isn't safe, I have explained to you in the following link:

https://marc.info/?l=dm-devel=151727454815018=2

> block drivers. A motivation of why that loop is preferred compared to the
> current behavior (no loop) is missing. Does this mean that that loop is a
> needless complication of the block layer core?
No such loop as I explained above.

> Sorry but I still prefer the v4.15 block layer approach because this patch
> has in my view the following disadvantages:
> - It involves a blk-mq API change. API changes are always risky and need
>   some time to stabilize.
> - The delay after which to rerun the queue is moved from block layer
>   drivers into the block layer core. I think that's wrong because only
>   the block driver authors can make a good choice for this constant.
> - This patch makes block drivers harder to understand. Anyone who sees
>   return BLK_STS_RESOURCE / return BLK_STS_DEV_RESOURCE for the first
>   time will have to look up the meaning of these constants. An explicit
>   blk_mq_delay_run_hw_queue() call is easier to understand.
> - This patch makes the blk-mq core harder to understand because of the
>   loop mentioned above.
> - This patch does not fix any bugs nor makes block drivers easier to
>   read or to implement. So why is this patch considered useful?

It does avoid the race I mentioned in the following link:

https://marc.info/?l=dm-devel=151727454815018=2

More importantly, every driver needs this change. If you have a better idea
to fix them all, please post a patch, then we can compare both.

Thanks,
Ming
Re: [PATCH] mpt3sas: fix an out of bound write
Tomas, > cpu_msix_table is allocated to store online cpus, but > pci_irq_get_affinity may return cpu_possible_mask which is then used > to access cpu_msix_table. That causes bad user experience. Fix > limits access to only online cpus, I've also added an additional test > to protect from an unlikely change in cpu_online_mask. Applied to 4.16/scsi-fixes. Thanks! -- Martin K. Petersen Oracle Linux Engineering
[PATCH v3 09/19] lpfc: Allow set of maximum outstanding SCSI cmd limit for a target parameter
Make the attribute writeable. Remove the ramp up to logic as its unnecessary, simply set depth. Add debug message if depth changed, possibly reducing limit, yet our outstanding count has yet to catch up with it. Signed-off-by: Dick KennedySigned-off-by: James Smart Reviewed-by: Hannes Reinecke --- drivers/scsi/lpfc/lpfc_attr.c | 4 ++-- drivers/scsi/lpfc/lpfc_scsi.c | 39 --- 2 files changed, 30 insertions(+), 13 deletions(-) diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c index 91df2b4e9fce..d8064e3ea0ba 100644 --- a/drivers/scsi/lpfc/lpfc_attr.c +++ b/drivers/scsi/lpfc/lpfc_attr.c @@ -3471,8 +3471,8 @@ LPFC_VPORT_ATTR_R(lun_queue_depth, 30, 1, 512, # tgt_queue_depth: This parameter is used to limit the number of outstanding # commands per target port. Value range is [10,65535]. Default value is 65535. */ -LPFC_VPORT_ATTR_R(tgt_queue_depth, 65535, 10, 65535, - "Max number of FCP commands we can queue to a specific target port"); +LPFC_VPORT_ATTR_RW(tgt_queue_depth, 65535, 10, 65535, + "Max number of FCP commands we can queue to a specific target port"); /* # hba_queue_depth: This parameter is used to limit the number of outstanding diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c index c0cdaef4db24..dcc86936e6fc 100644 --- a/drivers/scsi/lpfc/lpfc_scsi.c +++ b/drivers/scsi/lpfc/lpfc_scsi.c @@ -3926,7 +3926,6 @@ lpfc_scsi_cmd_iocb_cmpl(struct lpfc_hba *phba, struct lpfc_iocbq *pIocbIn, struct lpfc_rport_data *rdata = lpfc_cmd->rdata; struct lpfc_nodelist *pnode = rdata->pnode; struct scsi_cmnd *cmd; - int depth; unsigned long flags; struct lpfc_fast_path_event *fast_path_evt; struct Scsi_Host *shost; @@ -4132,16 +4131,11 @@ lpfc_scsi_cmd_iocb_cmpl(struct lpfc_hba *phba, struct lpfc_iocbq *pIocbIn, } spin_unlock_irqrestore(shost->host_lock, flags); } else if (pnode && NLP_CHK_NODE_ACT(pnode)) { - if ((pnode->cmd_qdepth < vport->cfg_tgt_queue_depth) && - time_after(jiffies, pnode->last_change_time + + if 
((pnode->cmd_qdepth != vport->cfg_tgt_queue_depth) && + time_after(jiffies, pnode->last_change_time + msecs_to_jiffies(LPFC_TGTQ_INTERVAL))) { spin_lock_irqsave(shost->host_lock, flags); - depth = pnode->cmd_qdepth * LPFC_TGTQ_RAMPUP_PCENT - / 100; - depth = depth ? depth : 1; - pnode->cmd_qdepth += depth; - if (pnode->cmd_qdepth > vport->cfg_tgt_queue_depth) - pnode->cmd_qdepth = vport->cfg_tgt_queue_depth; + pnode->cmd_qdepth = vport->cfg_tgt_queue_depth; pnode->last_change_time = jiffies; spin_unlock_irqrestore(shost->host_lock, flags); } @@ -4564,9 +4558,32 @@ lpfc_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *cmnd) */ if (!ndlp || !NLP_CHK_NODE_ACT(ndlp)) goto out_tgt_busy; - if (atomic_read(>cmd_pending) >= ndlp->cmd_qdepth) + if (atomic_read(>cmd_pending) >= ndlp->cmd_qdepth) { + lpfc_printf_vlog(vport, KERN_INFO, LOG_FCP_ERROR, +"3377 Target Queue Full, scsi Id:%d Qdepth:%d" +" Pending command:%d" +" WWNN:%02x:%02x:%02x:%02x:%02x:%02x:%02x:%02x, " +" WWPN:%02x:%02x:%02x:%02x:%02x:%02x:%02x:%02x", +ndlp->nlp_sid, ndlp->cmd_qdepth, +atomic_read(>cmd_pending), +ndlp->nlp_nodename.u.wwn[0], +ndlp->nlp_nodename.u.wwn[1], +ndlp->nlp_nodename.u.wwn[2], +ndlp->nlp_nodename.u.wwn[3], +ndlp->nlp_nodename.u.wwn[4], +ndlp->nlp_nodename.u.wwn[5], +ndlp->nlp_nodename.u.wwn[6], +ndlp->nlp_nodename.u.wwn[7], +ndlp->nlp_portname.u.wwn[0], +ndlp->nlp_portname.u.wwn[1], +ndlp->nlp_portname.u.wwn[2], +ndlp->nlp_portname.u.wwn[3], +ndlp->nlp_portname.u.wwn[4], +ndlp->nlp_portname.u.wwn[5], +ndlp->nlp_portname.u.wwn[6], +ndlp->nlp_portname.u.wwn[7]); goto out_tgt_busy; - + } lpfc_cmd =
[PATCH v3 11/19] lpfc: Fix issue_lip if link is disabled
The driver ignored checks on whether the link should be kept administratively down after a link bounce. Correct the checks.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
Reviewed-by: Hannes Reinecke
---
 drivers/scsi/lpfc/lpfc_attr.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
index d8064e3ea0ba..b79dad7b8278 100644
--- a/drivers/scsi/lpfc/lpfc_attr.c
+++ b/drivers/scsi/lpfc/lpfc_attr.c
@@ -911,7 +911,12 @@ lpfc_issue_lip(struct Scsi_Host *shost)
 	LPFC_MBOXQ_t *pmboxq;
 	int mbxstatus = MBXERR_ERROR;
 
+	/*
+	 * If the link is offline, disabled or BLOCK_MGMT_IO
+	 * it doesn't make any sense to allow issue_lip
+	 */
 	if ((vport->fc_flag & FC_OFFLINE_MODE) ||
+	    (phba->hba_flag & LINK_DISABLED) ||
 	    (phba->sli.sli_flag & LPFC_BLOCK_MGMT_IO))
 		return -EPERM;
-- 
2.13.1
[PATCH v3 19/19] lpfc: Update 11.4.0.7 modified files for 2018 Copyright
Updated Copyright in files updated 11.4.0.7 Signed-off-by: Dick KennedySigned-off-by: James Smart --- v3: Revised LPFC_COPYRIGHT string for 2018 as well. --- drivers/scsi/lpfc/lpfc.h | 2 +- drivers/scsi/lpfc/lpfc_attr.c | 2 +- drivers/scsi/lpfc/lpfc_crtn.h | 2 +- drivers/scsi/lpfc/lpfc_els.c | 2 +- drivers/scsi/lpfc/lpfc_hbadisc.c | 2 +- drivers/scsi/lpfc/lpfc_hw4.h | 2 +- drivers/scsi/lpfc/lpfc_init.c | 2 +- drivers/scsi/lpfc/lpfc_mbox.c | 2 +- drivers/scsi/lpfc/lpfc_mem.c | 2 +- drivers/scsi/lpfc/lpfc_nportdisc.c | 4 ++-- drivers/scsi/lpfc/lpfc_nvme.c | 2 +- drivers/scsi/lpfc/lpfc_nvmet.c | 2 +- drivers/scsi/lpfc/lpfc_nvmet.h | 2 +- drivers/scsi/lpfc/lpfc_scsi.c | 2 +- drivers/scsi/lpfc/lpfc_sli.c | 3 +-- drivers/scsi/lpfc/lpfc_sli4.h | 2 +- drivers/scsi/lpfc/lpfc_version.h | 6 +++--- 17 files changed, 20 insertions(+), 21 deletions(-) diff --git a/drivers/scsi/lpfc/lpfc.h b/drivers/scsi/lpfc/lpfc.h index d042f9118e3b..9698b9635058 100644 --- a/drivers/scsi/lpfc/lpfc.h +++ b/drivers/scsi/lpfc/lpfc.h @@ -1,7 +1,7 @@ /*** * This file is part of the Emulex Linux Device Driver for * * Fibre Channel Host Bus Adapters.* - * Copyright (C) 2017 Broadcom. All Rights Reserved. The term * + * Copyright (C) 2017-2018 Broadcom. All Rights Reserved. The term * * “Broadcom” refers to Broadcom Limited and/or its subsidiaries. * * Copyright (C) 2004-2016 Emulex. All rights reserved. * * EMULEX and SLI are trademarks of Emulex.* diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c index b79dad7b8278..70bd25666243 100644 --- a/drivers/scsi/lpfc/lpfc_attr.c +++ b/drivers/scsi/lpfc/lpfc_attr.c @@ -1,7 +1,7 @@ /*** * This file is part of the Emulex Linux Device Driver for * * Fibre Channel Host Bus Adapters.* - * Copyright (C) 2017 Broadcom. All Rights Reserved. The term * + * Copyright (C) 2017-2018 Broadcom. All Rights Reserved. The term * * “Broadcom” refers to Broadcom Limited and/or its subsidiaries. * * Copyright (C) 2004-2016 Emulex. All rights reserved. 
* * EMULEX and SLI are trademarks of Emulex.* diff --git a/drivers/scsi/lpfc/lpfc_crtn.h b/drivers/scsi/lpfc/lpfc_crtn.h index 3ecf50df93f4..14a86b5b51e4 100644 --- a/drivers/scsi/lpfc/lpfc_crtn.h +++ b/drivers/scsi/lpfc/lpfc_crtn.h @@ -1,7 +1,7 @@ /*** * This file is part of the Emulex Linux Device Driver for * * Fibre Channel Host Bus Adapters.* - * Copyright (C) 2017 Broadcom. All Rights Reserved. The term * + * Copyright (C) 2017-2018 Broadcom. All Rights Reserved. The term * * “Broadcom” refers to Broadcom Limited and/or its subsidiaries. * * Copyright (C) 2004-2016 Emulex. All rights reserved. * * EMULEX and SLI are trademarks of Emulex.* diff --git a/drivers/scsi/lpfc/lpfc_els.c b/drivers/scsi/lpfc/lpfc_els.c index 404e1af5e2ab..ba896554a14f 100644 --- a/drivers/scsi/lpfc/lpfc_els.c +++ b/drivers/scsi/lpfc/lpfc_els.c @@ -1,7 +1,7 @@ /*** * This file is part of the Emulex Linux Device Driver for * * Fibre Channel Host Bus Adapters.* - * Copyright (C) 2017 Broadcom. All Rights Reserved. The term * + * Copyright (C) 2017-2018 Broadcom. All Rights Reserved. The term * * “Broadcom” refers to Broadcom Limited and/or its subsidiaries. * * Copyright (C) 2004-2016 Emulex. All rights reserved. * * EMULEX and SLI are trademarks of Emulex.* diff --git a/drivers/scsi/lpfc/lpfc_hbadisc.c b/drivers/scsi/lpfc/lpfc_hbadisc.c index 9265906d956e..f5bbac3cadbb 100644 --- a/drivers/scsi/lpfc/lpfc_hbadisc.c +++ b/drivers/scsi/lpfc/lpfc_hbadisc.c @@ -1,7 +1,7 @@ /*** * This file is part of the Emulex Linux Device Driver for * * Fibre Channel Host Bus Adapters.* - * Copyright (C) 2017 Broadcom. All Rights Reserved. The term * + * Copyright (C) 2017-2018 Broadcom. All Rights Reserved. The term * * “Broadcom” refers to Broadcom Limited and/or its subsidiaries. * * Copyright (C) 2004-2016 Emulex. All rights reserved. 
* * EMULEX and SLI are trademarks of Emulex.* diff --git a/drivers/scsi/lpfc/lpfc_hw4.h b/drivers/scsi/lpfc/lpfc_hw4.h index 52fe28ae50fa..8685d26e6929 100644 --- a/drivers/scsi/lpfc/lpfc_hw4.h +++ b/drivers/scsi/lpfc/lpfc_hw4.h @@
[PATCH v3 10/19] lpfc: Fix soft lockup in lpfc worker thread during LIP testing
During link bounce testing in a point-to-point topology, the host may enter a soft lockup on the lpfc_worker thread:

Call Trace:
 lpfc_work_done+0x1f3/0x1390 [lpfc]
 lpfc_do_work+0x16f/0x180 [lpfc]
 kthread+0xc7/0xe0
 ret_from_fork+0x3f/0x70

The driver was simultaneously setting a combination of flags that caused lpfc_do_work() to effectively spin between slow path work and new event data, causing the lockup.

Ensure in the typical wq completions, that new event data flags are set if the slow path flag is running. The slow path will eventually reschedule the wq handling.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
Reviewed-by: Hannes Reinecke
---
 drivers/scsi/lpfc/lpfc_hbadisc.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_hbadisc.c b/drivers/scsi/lpfc/lpfc_hbadisc.c
index b159a5c4e388..9265906d956e 100644
--- a/drivers/scsi/lpfc/lpfc_hbadisc.c
+++ b/drivers/scsi/lpfc/lpfc_hbadisc.c
@@ -696,8 +696,9 @@ lpfc_work_done(struct lpfc_hba *phba)
 			   phba->hba_flag & HBA_SP_QUEUE_EVT)) {
 		if (pring->flag & LPFC_STOP_IOCB_EVENT) {
 			pring->flag |= LPFC_DEFERRED_RING_EVENT;
-			/* Set the lpfc data pending flag */
-			set_bit(LPFC_DATA_READY, &phba->data_flags);
+			/* Preserve legacy behavior. */
+			if (!(phba->hba_flag & HBA_SP_QUEUE_EVT))
+				set_bit(LPFC_DATA_READY, &phba->data_flags);
 		} else {
 			if (phba->link_state >= LPFC_LINK_UP ||
 			    phba->link_flag & LS_MDS_LOOPBACK) {
-- 
2.13.1
[PATCH v3 06/19] lpfc: Fix PRLI handling when topology type changes
The lpfc driver does not discover a target when the topology changes from switched-fabric to direct-connect. The target rejects the PRLI from the initiator in direct-connect as the driver is using the old S_ID from the switched topology. The driver was inappropriately clearing the VP bit to register the VPI, which is what is associated with the S_ID.

Fix by leaving the VP bit set (it was set earlier) and, as the VFI is being re-registered, set the UPDT bit.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
Reviewed-by: Hannes Reinecke
---
 drivers/scsi/lpfc/lpfc_mbox.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_mbox.c b/drivers/scsi/lpfc/lpfc_mbox.c
index 81fb92967b11..c32d4a323db2 100644
--- a/drivers/scsi/lpfc/lpfc_mbox.c
+++ b/drivers/scsi/lpfc/lpfc_mbox.c
@@ -2170,10 +2170,8 @@ lpfc_reg_vfi(struct lpfcMboxq *mbox, struct lpfc_vport *vport, dma_addr_t phys)
 	/* Only FC supports upd bit */
 	if ((phba->sli4_hba.lnk_info.lnk_tp == LPFC_LNK_TYPE_FC) &&
 	    (vport->fc_flag & FC_VFI_REGISTERED) &&
-	    (!phba->fc_topology_changed)) {
-		bf_set(lpfc_reg_vfi_vp, reg_vfi, 0);
+	    (!phba->fc_topology_changed))
 		bf_set(lpfc_reg_vfi_upd, reg_vfi, 1);
-	}
 
 	bf_set(lpfc_reg_vfi_bbcr, reg_vfi, 0);
 	bf_set(lpfc_reg_vfi_bbscn, reg_vfi, 0);
-- 
2.13.1
[PATCH v3 17/19] lpfc: Fix nonrecovery of NVME controller after cable swap.
In a test that is doing large numbers of cable swaps on the target, the nvme controllers wouldn't reconnect. During the cable swaps, the targets n_port_id would change. This information was passed to the nvme-fc transport, in the new remoteport registration. However, the nvme-fc transport didn't update the n_port_id value in the remoteport struct when it reused an existing structure. Later, when a new association was attempted on the remoteport, the driver's NVME LS routine would use the stale n_port_id from the remoteport struct to address the LS. As the device is no longer at that address, the LS would go into never never land. Separately, the nvme-fc transport will be corrected to update the n_port_id value on a re-registration. However, for now, there's no reason to use the transports values. The private pointer points to the drivers node structure and the node structure is up to date. Therefore, revise the LS routine to use the drivers data structures for the LS. Augmented the debug message for better debugging in the future. Also removed a duplicate if check that seems to have slipped in. Signed-off-by: Dick KennedySigned-off-by: James Smart Reviewed-by: Hannes Reinecke --- drivers/scsi/lpfc/lpfc_nvme.c | 24 +--- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/drivers/scsi/lpfc/lpfc_nvme.c b/drivers/scsi/lpfc/lpfc_nvme.c index c6e5b9972585..6327f858c4c8 100644 --- a/drivers/scsi/lpfc/lpfc_nvme.c +++ b/drivers/scsi/lpfc/lpfc_nvme.c @@ -241,10 +241,11 @@ lpfc_nvme_cmpl_gen_req(struct lpfc_hba *phba, struct lpfc_iocbq *cmdwqe, ndlp = (struct lpfc_nodelist *)cmdwqe->context1; lpfc_printf_vlog(vport, KERN_INFO, LOG_NVME_DISC, "6047 nvme cmpl Enter " -"Data %p DID %x Xri: %x status %x cmd:%p lsreg:%p " -"bmp:%p ndlp:%p\n", +"Data %p DID %x Xri: %x status %x reason x%x cmd:%p " +"lsreg:%p bmp:%p ndlp:%p\n", pnvme_lsreq, ndlp ? 
ndlp->nlp_DID : 0, cmdwqe->sli4_xritag, status, +(wcqe->parameter & 0x), cmdwqe, pnvme_lsreq, cmdwqe->context3, ndlp); lpfc_nvmeio_data(phba, "NVME LS CMPL: xri x%x stat x%x parm x%x\n", @@ -419,6 +420,7 @@ lpfc_nvme_ls_req(struct nvme_fc_local_port *pnvme_lport, { int ret = 0; struct lpfc_nvme_lport *lport; + struct lpfc_nvme_rport *rport; struct lpfc_vport *vport; struct lpfc_nodelist *ndlp; struct ulp_bde64 *bpl; @@ -437,19 +439,18 @@ lpfc_nvme_ls_req(struct nvme_fc_local_port *pnvme_lport, */ lport = (struct lpfc_nvme_lport *)pnvme_lport->private; + rport = (struct lpfc_nvme_rport *)pnvme_rport->private; vport = lport->vport; if (vport->load_flag & FC_UNLOADING) return -ENODEV; - if (vport->load_flag & FC_UNLOADING) - return -ENODEV; - - ndlp = lpfc_findnode_did(vport, pnvme_rport->port_id); + /* Need the ndlp. It is stored in the driver's rport. */ + ndlp = rport->ndlp; if (!ndlp || !NLP_CHK_NODE_ACT(ndlp)) { lpfc_printf_vlog(vport, KERN_ERR, LOG_NODE | LOG_NVME_IOERR, -"6051 DID x%06x not an active rport.\n", -pnvme_rport->port_id); +"6051 Remoteport %p, rport has invalid ndlp. " +"Failing LS Req\n", pnvme_rport); return -ENODEV; } @@ -500,8 +501,9 @@ lpfc_nvme_ls_req(struct nvme_fc_local_port *pnvme_lport, /* Expand print to include key fields. */ lpfc_printf_vlog(vport, KERN_INFO, LOG_NVME_DISC, -"6149 ENTER. lport %p, rport %p lsreq%p rqstlen:%d " -"rsplen:%d %pad %pad\n", +"6149 Issue LS Req to DID 0x%06x lport %p, rport %p " +"lsreq%p rqstlen:%d rsplen:%d %pad %pad\n", +ndlp->nlp_DID, pnvme_lport, pnvme_rport, pnvme_lsreq, pnvme_lsreq->rqstlen, pnvme_lsreq->rsplen, _lsreq->rqstdma, @@ -517,7 +519,7 @@ lpfc_nvme_ls_req(struct nvme_fc_local_port *pnvme_lport, ndlp, 2, 30, 0); if (ret != WQE_SUCCESS) { atomic_inc(>xmt_ls_err); - lpfc_printf_vlog(vport, KERN_INFO, LOG_NVME_DISC, + lpfc_printf_vlog(vport, KERN_ERR, LOG_NVME_DISC, "6052 EXIT. issue ls wqe failed lport %p, " "rport %p lsreq%p Status %x DID %x\n", pnvme_lport, pnvme_rport, pnvme_lsreq, -- 2.13.1
[PATCH v3 14/19] lpfc: Validate adapter support for SRIU option
When using the special option to suppress the response iu, ensure the adapter fully supports the feature by checking feature flags from the adapter and validating the support when formatting the WQE.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
Reviewed-by: Hannes Reinecke
---
v2: Add missing snippet that changes check logic before enabling suppression in lpfc_nvmet_prep_fcp_wqe()
---
 drivers/scsi/lpfc/lpfc_hw4.h   |  3 +++
 drivers/scsi/lpfc/lpfc_init.c  | 13 ++++++++++++-
 drivers/scsi/lpfc/lpfc_nvmet.c |  7 ++++---
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_hw4.h b/drivers/scsi/lpfc/lpfc_hw4.h
index 7c3afc3d3121..52fe28ae50fa 100644
--- a/drivers/scsi/lpfc/lpfc_hw4.h
+++ b/drivers/scsi/lpfc/lpfc_hw4.h
@@ -3293,6 +3293,9 @@ struct lpfc_sli4_parameters {
 #define cfg_eqdr_SHIFT		8
 #define cfg_eqdr_MASK		0x00000001
 #define cfg_eqdr_WORD		word19
+#define cfg_nosr_SHIFT		9
+#define cfg_nosr_MASK		0x00000001
+#define cfg_nosr_WORD		word19
 #define LPFC_NODELAY_MAX_IO	32
 };

diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index aa7872a7b493..f2d2faef8710 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -10473,8 +10473,19 @@ lpfc_get_sli4_parameters(struct lpfc_hba *phba, LPFC_MBOXQ_t *mboxq)
 		phba->cfg_enable_fc4_type = LPFC_ENABLE_FCP;
 	}
 
-	if (bf_get(cfg_xib, mbx_sli4_parameters) && phba->cfg_suppress_rsp)
+	/*
+	 * To support Suppress Response feature we must satisfy 3 conditions.
+	 * lpfc_suppress_rsp module parameter must be set (default).
+	 * In SLI4-Parameters Descriptor:
+	 * Extended Inline Buffers (XIB) must be supported.
+	 * Suppress Response IU Not Supported (SRIUNS) must NOT be supported
+	 * (double negative).
+	 */
+	if (phba->cfg_suppress_rsp && bf_get(cfg_xib, mbx_sli4_parameters) &&
+	    !(bf_get(cfg_nosr, mbx_sli4_parameters)))
 		phba->sli.sli_flag |= LPFC_SLI_SUPPRESS_RSP;
+	else
+		phba->cfg_suppress_rsp = 0;
 
 	if (bf_get(cfg_eqdr, mbx_sli4_parameters))
 		phba->sli.sli_flag |= LPFC_SLI_USE_EQDR;

diff --git a/drivers/scsi/lpfc/lpfc_nvmet.c b/drivers/scsi/lpfc/lpfc_nvmet.c
index 0539585d32d4..6dd8535918f6 100644
--- a/drivers/scsi/lpfc/lpfc_nvmet.c
+++ b/drivers/scsi/lpfc/lpfc_nvmet.c
@@ -2290,9 +2290,10 @@ lpfc_nvmet_prep_fcp_wqe(struct lpfc_hba *phba,
 	if (rsp->op == NVMET_FCOP_READDATA_RSP) {
 		atomic_inc(&tgtp->xmt_fcp_read_rsp);
 		bf_set(wqe_ar, &wqe->fcp_tsend.wqe_com, 1);
-		if ((ndlp->nlp_flag & NLP_SUPPRESS_RSP) &&
-		    (rsp->rsplen == 12)) {
-			bf_set(wqe_sup, &wqe->fcp_tsend.wqe_com, 1);
+		if (rsp->rsplen == LPFC_NVMET_SUCCESS_LEN) {
+			if (ndlp->nlp_flag & NLP_SUPPRESS_RSP)
+				bf_set(wqe_sup,
+				       &wqe->fcp_tsend.wqe_com, 1);
 			bf_set(wqe_wqes, &wqe->fcp_tsend.wqe_com, 0);
 			bf_set(wqe_irsp, &wqe->fcp_tsend.wqe_com, 0);
 			bf_set(wqe_irsplen, &wqe->fcp_tsend.wqe_com, 0);
-- 
2.13.1
[PATCH v3 18/19] lpfc: update driver version to 11.4.0.7
Update the driver version to 11.4.0.7

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
Reviewed-by: Hannes Reinecke
---
 drivers/scsi/lpfc/lpfc_version.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/lpfc/lpfc_version.h b/drivers/scsi/lpfc/lpfc_version.h
index c232bf0e8998..6f4092cb903e 100644
--- a/drivers/scsi/lpfc/lpfc_version.h
+++ b/drivers/scsi/lpfc/lpfc_version.h
@@ -20,7 +20,7 @@
  * included with this package.                                     *
  *******************************************************************/
 
-#define LPFC_DRIVER_VERSION "11.4.0.6"
+#define LPFC_DRIVER_VERSION "11.4.0.7"
 #define LPFC_DRIVER_NAME		"lpfc"
 
 /* Used for SLI 2/3 */
-- 
2.13.1
[PATCH v3 08/19] lpfc: Fix RQ empty firmware trap
When nvme target deferred receive logic waits for exchange resources, the corresponding receive buffer is not replenished with the hardware. This can result in a lack of asynchronous receive buffer resources in the hardware, resulting in a "2885 Port Status Event: ... error 1=0x52004a01 ..." message. Correct by replenishing the buffer whenenver the deferred logic kicks in. Update corresponding debug messages and statistics as well. Signed-off-by: Dick KennedySigned-off-by: James Smart Reviewed-by: Hannes Reinecke --- drivers/scsi/lpfc/lpfc_attr.c | 6 ++ drivers/scsi/lpfc/lpfc_mem.c | 8 ++-- drivers/scsi/lpfc/lpfc_nvmet.c | 31 +-- drivers/scsi/lpfc/lpfc_nvmet.h | 7 +-- drivers/scsi/lpfc/lpfc_sli.c | 12 5 files changed, 50 insertions(+), 14 deletions(-) diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c index d188fb565a32..91df2b4e9fce 100644 --- a/drivers/scsi/lpfc/lpfc_attr.c +++ b/drivers/scsi/lpfc/lpfc_attr.c @@ -259,6 +259,12 @@ lpfc_nvme_info_show(struct device *dev, struct device_attribute *attr, atomic_read(>xmt_abort_rsp), atomic_read(>xmt_abort_rsp_error)); + len += snprintf(buf + len, PAGE_SIZE - len, + "DELAY: ctx %08x fod %08x wqfull %08x\n", + atomic_read(>defer_ctx), + atomic_read(>defer_fod), + atomic_read(>defer_wqfull)); + /* Calculate outstanding IOs */ tot = atomic_read(>rcv_fcp_cmd_drop); tot += atomic_read(>xmt_fcp_release); diff --git a/drivers/scsi/lpfc/lpfc_mem.c b/drivers/scsi/lpfc/lpfc_mem.c index 56faeb049b4a..60078e61da5e 100644 --- a/drivers/scsi/lpfc/lpfc_mem.c +++ b/drivers/scsi/lpfc/lpfc_mem.c @@ -755,10 +755,14 @@ lpfc_rq_buf_free(struct lpfc_hba *phba, struct lpfc_dmabuf *mp) if (rc < 0) { (rqbp->rqb_free_buffer)(phba, rqb_entry); lpfc_printf_log(phba, KERN_ERR, LOG_INIT, - "6409 Cannot post to RQ %d: %x %x\n", + "6409 Cannot post to HRQ %d: %x %x %x " + "DRQ %x %x\n", rqb_entry->hrq->queue_id, rqb_entry->hrq->host_index, - rqb_entry->hrq->hba_index); + rqb_entry->hrq->hba_index, + 
rqb_entry->hrq->entry_count, + rqb_entry->drq->host_index, + rqb_entry->drq->hba_index); } else { list_add_tail(_entry->hbuf.list, >rqb_buffer_list); rqbp->buffer_count++; diff --git a/drivers/scsi/lpfc/lpfc_nvmet.c b/drivers/scsi/lpfc/lpfc_nvmet.c index 9c2acf90212c..0539585d32d4 100644 --- a/drivers/scsi/lpfc/lpfc_nvmet.c +++ b/drivers/scsi/lpfc/lpfc_nvmet.c @@ -270,8 +270,6 @@ lpfc_nvmet_ctxbuf_post(struct lpfc_hba *phba, struct lpfc_nvmet_ctxbuf *ctx_buf) "NVMET RCV BUSY: xri x%x sz %d " "from %06x\n", oxid, size, sid); - /* defer repost rcv buffer till .defer_rcv callback */ - ctxp->flag &= ~LPFC_NVMET_DEFER_RCV_REPOST; atomic_inc(>rcv_fcp_cmd_out); return; } @@ -837,6 +835,7 @@ lpfc_nvmet_xmt_fcp_op(struct nvmet_fc_target_port *tgtport, list_add_tail(>list, >wqfull_list); wq->q_flag |= HBA_NVMET_WQFULL; spin_unlock_irqrestore(>ring_lock, iflags); + atomic_inc(_nvmep->defer_wqfull); return 0; } @@ -975,11 +974,9 @@ lpfc_nvmet_defer_rcv(struct nvmet_fc_target_port *tgtport, tgtp = phba->targetport->private; atomic_inc(>rcv_fcp_cmd_defer); - if (ctxp->flag & LPFC_NVMET_DEFER_RCV_REPOST) - lpfc_rq_buf_free(phba, >hbuf); /* repost */ - else - nvmebuf->hrq->rqbp->rqb_free_buffer(phba, nvmebuf); - ctxp->flag &= ~LPFC_NVMET_DEFER_RCV_REPOST; + + /* Free the nvmebuf since a new buffer already replaced it */ + nvmebuf->hrq->rqbp->rqb_free_buffer(phba, nvmebuf); } static struct nvmet_fc_target_template lpfc_tgttemplate = { @@ -1309,6 +1306,9 @@ lpfc_nvmet_create_targetport(struct lpfc_hba *phba) atomic_set(>xmt_abort_sol, 0); atomic_set(>xmt_abort_rsp, 0); atomic_set(>xmt_abort_rsp_error, 0); + atomic_set(>defer_ctx, 0); + atomic_set(>defer_fod, 0); + atomic_set(>defer_wqfull, 0); } return error; } @@ -1810,6 +1810,8
[PATCH v3 16/19] lpfc: Treat SCSI Write operation Underruns as an error
Currently, write underruns (mismatch of amount transferred vs SCSI status and its residual) detected by the adapter are not being flagged as an error. It's expected the target controls the data transfer and would appropriately set the RSP values. Only read underruns are treated as errors.

Revise the SCSI error handling to treat write underruns as an error as well.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
Reviewed-by: Hannes Reinecke
---
 drivers/scsi/lpfc/lpfc_scsi.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
index dcc86936e6fc..10c2dc0cf1fa 100644
--- a/drivers/scsi/lpfc/lpfc_scsi.c
+++ b/drivers/scsi/lpfc/lpfc_scsi.c
@@ -3772,20 +3772,18 @@ lpfc_handle_fcp_err(struct lpfc_vport *vport, struct lpfc_scsi_buf *lpfc_cmd,
 		scsi_set_resid(cmnd, be32_to_cpu(fcprsp->rspResId));
 
 		lpfc_printf_vlog(vport, KERN_INFO, LOG_FCP_UNDER,
-				 "9025 FCP Read Underrun, expected %d, "
+				 "9025 FCP Underrun, expected %d, "
 				 "residual %d Data: x%x x%x x%x\n",
 				 fcpDl, scsi_get_resid(cmnd), fcpi_parm,
 				 cmnd->cmnd[0], cmnd->underflow);
 
 		/*
-		 * If there is an under run check if under run reported by
+		 * If there is an under run, check if under run reported by
 		 * storage array is same as the under run reported by HBA.
 		 * If this is not same, there is a dropped frame.
 		 */
-		if ((cmnd->sc_data_direction == DMA_FROM_DEVICE) &&
-		    fcpi_parm &&
-		    (scsi_get_resid(cmnd) != fcpi_parm)) {
+		if (fcpi_parm && (scsi_get_resid(cmnd) != fcpi_parm)) {
 			lpfc_printf_vlog(vport, KERN_WARNING,
 					 LOG_FCP | LOG_FCP_ERROR,
 					 "9026 FCP Read Check Error "
-- 
2.13.1
[PATCH v3 13/19] lpfc: Fix SCSI io host reset causing kernel crash
During SCSI error handling escalation to host reset, the SCSI io routines were moved off the txcmplq, but the individual io's ON_CMPLQ flag wasn't cleared. Thus, a background thread saw the io and attempted to access it as if on the txcmplq.

Clear the flag upon removal.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
Reviewed-by: Hannes Reinecke
---
 drivers/scsi/lpfc/lpfc_init.c |  4 ++++
 drivers/scsi/lpfc/lpfc_sli.c  | 13 ++++++++++++-
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index bff5c95cf5df..aa7872a7b493 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -958,6 +958,7 @@ lpfc_hba_clean_txcmplq(struct lpfc_hba *phba)
 	struct lpfc_sli_ring *pring;
 	LIST_HEAD(completions);
 	int i;
+	struct lpfc_iocbq *piocb, *next_iocb;
 
 	if (phba->sli_rev != LPFC_SLI_REV4) {
 		for (i = 0; i < psli->num_rings; i++) {
@@ -983,6 +984,9 @@ lpfc_hba_clean_txcmplq(struct lpfc_hba *phba)
 		if (!pring)
 			continue;
 		spin_lock_irq(&pring->ring_lock);
+		list_for_each_entry_safe(piocb, next_iocb,
+					 &pring->txcmplq, list)
+			piocb->iocb_flag &= ~LPFC_IO_ON_TXCMPLQ;
 		list_splice_init(&pring->txcmplq, &completions);
 		pring->txcmplq_cnt = 0;
 		spin_unlock_irq(&pring->ring_lock);

diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index 8b2919a553d6..d597e15a1974 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -3778,6 +3778,7 @@ lpfc_sli_flush_fcp_rings(struct lpfc_hba *phba)
 	struct lpfc_sli *psli = &phba->sli;
 	struct lpfc_sli_ring  *pring;
 	uint32_t i;
+	struct lpfc_iocbq *piocb, *next_iocb;
 
 	spin_lock_irq(&phba->hbalock);
 	/* Indicate the I/O queues are flushed */
@@ -3792,6 +3793,9 @@ lpfc_sli_flush_fcp_rings(struct lpfc_hba *phba)
 			spin_lock_irq(&pring->ring_lock);
 			/* Retrieve everything on txq */
 			list_splice_init(&pring->txq, &txq);
+			list_for_each_entry_safe(piocb, next_iocb,
+						 &pring->txcmplq, list)
+				piocb->iocb_flag &= ~LPFC_IO_ON_TXCMPLQ;
 			/* Retrieve everything on the txcmplq */
 			list_splice_init(&pring->txcmplq, &txcmplq);
 			pring->txq_cnt = 0;
@@ -3813,6 +3817,9 @@ lpfc_sli_flush_fcp_rings(struct lpfc_hba *phba)
 		spin_lock_irq(&phba->hbalock);
 		/* Retrieve everything on txq */
 		list_splice_init(&pring->txq, &txq);
+		list_for_each_entry_safe(piocb, next_iocb,
+					 &pring->txcmplq, list)
+			piocb->iocb_flag &= ~LPFC_IO_ON_TXCMPLQ;
 		/* Retrieve everything on the txcmplq */
 		list_splice_init(&pring->txcmplq, &txcmplq);
 		pring->txq_cnt = 0;
@@ -3844,6 +3851,7 @@ lpfc_sli_flush_nvme_rings(struct lpfc_hba *phba)
 	LIST_HEAD(txcmplq);
 	struct lpfc_sli_ring  *pring;
 	uint32_t i;
+	struct lpfc_iocbq *piocb, *next_iocb;
 
 	if (phba->sli_rev < LPFC_SLI_REV4)
 		return;
@@ -3860,8 +3868,11 @@ lpfc_sli_flush_nvme_rings(struct lpfc_hba *phba)
 	for (i = 0; i < phba->cfg_nvme_io_channel; i++) {
 		pring = phba->sli4_hba.nvme_wq[i]->pring;
 
-		/* Retrieve everything on the txcmplq */
 		spin_lock_irq(&pring->ring_lock);
+		list_for_each_entry_safe(piocb, next_iocb,
+					 &pring->txcmplq, list)
+			piocb->iocb_flag &= ~LPFC_IO_ON_TXCMPLQ;
+		/* Retrieve everything on the txcmplq */
 		list_splice_init(&pring->txcmplq, &txcmplq);
 		pring->txcmplq_cnt = 0;
 		spin_unlock_irq(&pring->ring_lock);
-- 
2.13.1
[PATCH v3 07/19] lpfc: Fix IO failure during hba reset testing with nvme io.
A stress test repeatedly resetting the adapter while performing io would
eventually report I/O failures and missing nvme namespaces.

The driver was setting the nvmefc_fcp_req->private pointer to NULL
during the IO completion routine before upcalling done(). If the
transport was also running an abort for that IO, the driver would fail
the abort with message 6140. Failing the abort is not allowed by the
nvme-fc transport, as it mandates that the io must be returned back to
the transport. As that does not happen, the transport controller delete
has an outstanding reference and can't complete teardown.

The NULL-ing of the private pointer should be done only when the io is
considered complete. It's complete when the adapter returns the exchange
with the "exchange busy" flag clear. Move the NULL-ing of the structure
to the done case. This leaves the io contexts set while it is busy and
until the subsequent XRI_ABORTED completion which returns the exchange
is received.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
---
v3: Address review comment that preferred to continue to NULL if the io
    did complete in order to protect stale conditions. After review, we
    agreed, thus the above change.
---
 drivers/scsi/lpfc/lpfc_nvme.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_nvme.c b/drivers/scsi/lpfc/lpfc_nvme.c
index 81e3a4f10c3c..c6e5b9972585 100644
--- a/drivers/scsi/lpfc/lpfc_nvme.c
+++ b/drivers/scsi/lpfc/lpfc_nvme.c
@@ -980,14 +980,14 @@ lpfc_nvme_io_cmd_wqe_cmpl(struct lpfc_hba *phba, struct lpfc_iocbq *pwqeIn,
 		phba->cpucheck_cmpl_io[lpfc_ncmd->cpu]++;
 	}
 #endif
-	freqpriv = nCmd->private;
-	freqpriv->nvme_buf = NULL;
 
 	/* NVME targets need completion held off until the abort exchange
 	 * completes unless the NVME Rport is getting unregistered.
 	 */
 
 	if (!(lpfc_ncmd->flags & LPFC_SBUF_XBUSY)) {
+		freqpriv = nCmd->private;
+		freqpriv->nvme_buf = NULL;
 		nCmd->done(nCmd);
 		lpfc_ncmd->nvmeCmd = NULL;
 	}
-- 
2.13.1
[PATCH v3 05/19] lpfc: Add WQ Full Logic for NVME Target
I/O conditions on the nvme target may have the driver submitting to a
full hardware wq. The hardware wq is a shared resource among all nvme
controllers. When the driver hit a full wq, it failed the io posting
back to the nvme-fc transport, which then escalated it into errors.

Correct by maintaining a sideband queue within the driver that is added
to when the WQ full condition is hit, and drained from as soon as new
WQ space opens up.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
Reviewed-by: Hannes Reinecke
---
 drivers/scsi/lpfc/lpfc_crtn.h  |   1 +
 drivers/scsi/lpfc/lpfc_nvmet.c | 116 +
 drivers/scsi/lpfc/lpfc_nvmet.h |   1 +
 drivers/scsi/lpfc/lpfc_sli.c   |   3 ++
 drivers/scsi/lpfc/lpfc_sli4.h  |   5 +-
 5 files changed, 125 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/lpfc/lpfc_crtn.h b/drivers/scsi/lpfc/lpfc_crtn.h
index 559f9aa0ed08..3ecf50df93f4 100644
--- a/drivers/scsi/lpfc/lpfc_crtn.h
+++ b/drivers/scsi/lpfc/lpfc_crtn.h
@@ -254,6 +254,7 @@ void lpfc_nvmet_ctxbuf_post(struct lpfc_hba *phba,
 			struct lpfc_nvmet_ctxbuf *ctxp);
 int lpfc_nvmet_rcv_unsol_abort(struct lpfc_vport *vport,
 			struct fc_frame_header *fc_hdr);
+void lpfc_nvmet_wqfull_process(struct lpfc_hba *phba, struct lpfc_queue *wq);
 void lpfc_sli_flush_nvme_rings(struct lpfc_hba *phba);
 void lpfc_nvme_wait_for_io_drain(struct lpfc_hba *phba);
 void lpfc_sli4_build_dflt_fcf_record(struct lpfc_hba *, struct fcf_record *,
diff --git a/drivers/scsi/lpfc/lpfc_nvmet.c b/drivers/scsi/lpfc/lpfc_nvmet.c
index 7927ac46d345..9c2acf90212c 100644
--- a/drivers/scsi/lpfc/lpfc_nvmet.c
+++ b/drivers/scsi/lpfc/lpfc_nvmet.c
@@ -71,6 +71,8 @@ static int lpfc_nvmet_unsol_fcp_issue_abort(struct lpfc_hba *,
 static int lpfc_nvmet_unsol_ls_issue_abort(struct lpfc_hba *,
 					   struct lpfc_nvmet_rcv_ctx *,
 					   uint32_t, uint16_t);
+static void lpfc_nvmet_wqfull_flush(struct lpfc_hba *, struct lpfc_queue *,
+				    struct lpfc_nvmet_rcv_ctx *);
 
 void
 lpfc_nvmet_defer_release(struct lpfc_hba *phba, struct lpfc_nvmet_rcv_ctx *ctxp)
@@ -741,7 +743,10 @@ lpfc_nvmet_xmt_fcp_op(struct nvmet_fc_target_port *tgtport,
 	struct lpfc_nvmet_rcv_ctx *ctxp =
 		container_of(rsp, struct lpfc_nvmet_rcv_ctx, ctx.fcp_req);
 	struct lpfc_hba *phba = ctxp->phba;
+	struct lpfc_queue *wq;
 	struct lpfc_iocbq *nvmewqeq;
+	struct lpfc_sli_ring *pring;
+	unsigned long iflags;
 	int rc;
 
 	if (phba->pport->load_flag & FC_UNLOADING) {
@@ -820,6 +825,21 @@ lpfc_nvmet_xmt_fcp_op(struct nvmet_fc_target_port *tgtport,
 		return 0;
 	}
 
+	if (rc == -EBUSY) {
+		/*
+		 * WQ was full, so queue nvmewqeq to be sent after
+		 * WQE release CQE
+		 */
+		ctxp->flag |= LPFC_NVMET_DEFER_WQFULL;
+		wq = phba->sli4_hba.nvme_wq[rsp->hwqid];
+		pring = wq->pring;
+		spin_lock_irqsave(&pring->ring_lock, iflags);
+		list_add_tail(&nvmewqeq->list, &wq->wqfull_list);
+		wq->q_flag |= HBA_NVMET_WQFULL;
+		spin_unlock_irqrestore(&pring->ring_lock, iflags);
+		return 0;
+	}
+
 	/* Give back resources */
 	atomic_inc(&lpfc_nvmep->xmt_fcp_drop);
 	lpfc_printf_log(phba, KERN_ERR, LOG_NVME_IOERR,
@@ -851,6 +871,7 @@ lpfc_nvmet_xmt_fcp_abort(struct nvmet_fc_target_port *tgtport,
 	struct lpfc_nvmet_rcv_ctx *ctxp =
 		container_of(req, struct lpfc_nvmet_rcv_ctx, ctx.fcp_req);
 	struct lpfc_hba *phba = ctxp->phba;
+	struct lpfc_queue *wq;
 	unsigned long flags;
 
 	if (phba->pport->load_flag & FC_UNLOADING)
@@ -880,6 +901,14 @@ lpfc_nvmet_xmt_fcp_abort(struct nvmet_fc_target_port *tgtport,
 	}
 
 	ctxp->flag |= LPFC_NVMET_ABORT_OP;
+	if (ctxp->flag & LPFC_NVMET_DEFER_WQFULL) {
+		lpfc_nvmet_unsol_fcp_issue_abort(phba, ctxp, ctxp->sid,
+						 ctxp->oxid);
+		wq = phba->sli4_hba.nvme_wq[ctxp->wqeq->hba_wqidx];
+		lpfc_nvmet_wqfull_flush(phba, wq, ctxp);
+		return;
+	}
+
 	/* An state of LPFC_NVMET_STE_RCV means we have just received
 	 * the NVME command and have not started processing it.
 	 * (by issuing any IO WQEs on this exchange yet)
@@ -1435,16 +1464,103 @@ lpfc_nvmet_rcv_unsol_abort(struct lpfc_vport *vport,
 	return 0;
 }
 
+static void
+lpfc_nvmet_wqfull_flush(struct lpfc_hba *phba, struct lpfc_queue *wq,
+			struct lpfc_nvmet_rcv_ctx *ctxp)
+{
+	struct lpfc_sli_ring *pring;
+	struct lpfc_iocbq *nvmewqeq;
+	struct lpfc_iocbq *next_nvmewqeq;
+	unsigned long
[PATCH v3 12/19] lpfc: Indicate CONF support in NVMe PRLI
Revise the NVME PRLI to indicate CONF support.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
Reviewed-by: Hannes Reinecke
---
 drivers/scsi/lpfc/lpfc_els.c       | 3 ++-
 drivers/scsi/lpfc/lpfc_hw4.h       | 6 +++---
 drivers/scsi/lpfc/lpfc_nportdisc.c | 3 ---
 3 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_els.c b/drivers/scsi/lpfc/lpfc_els.c
index 234c7c015982..404e1af5e2ab 100644
--- a/drivers/scsi/lpfc/lpfc_els.c
+++ b/drivers/scsi/lpfc/lpfc_els.c
@@ -2293,10 +2293,11 @@ lpfc_issue_els_prli(struct lpfc_vport *vport, struct lpfc_nodelist *ndlp,
 		if (phba->nvmet_support) {
 			bf_set(prli_tgt, npr_nvme, 1);
 			bf_set(prli_disc, npr_nvme, 1);
-
 		} else {
 			bf_set(prli_init, npr_nvme, 1);
+			bf_set(prli_conf, npr_nvme, 1);
 		}
+		npr_nvme->word1 = cpu_to_be32(npr_nvme->word1);
 		npr_nvme->word4 = cpu_to_be32(npr_nvme->word4);
 		elsiocb->iocb_flag |= LPFC_PRLI_NVME_REQ;
diff --git a/drivers/scsi/lpfc/lpfc_hw4.h b/drivers/scsi/lpfc/lpfc_hw4.h
index ef469129fb71..7c3afc3d3121 100644
--- a/drivers/scsi/lpfc/lpfc_hw4.h
+++ b/drivers/scsi/lpfc/lpfc_hw4.h
@@ -4346,9 +4346,9 @@ struct lpfc_nvme_prli {
 #define prli_init_SHIFT			5
 #define prli_init_MASK			0x0001
 #define prli_init_WORD			word4
-#define prli_recov_SHIFT		8
-#define prli_recov_MASK			0x0001
-#define prli_recov_WORD			word4
+#define prli_conf_SHIFT			7
+#define prli_conf_MASK			0x0001
+#define prli_conf_WORD			word4
 	uint32_t word5;
 #define prli_fb_sz_SHIFT		0
 #define prli_fb_sz_MASK			0x
diff --git a/drivers/scsi/lpfc/lpfc_nportdisc.c b/drivers/scsi/lpfc/lpfc_nportdisc.c
index d841aa42f607..bbf1e1342b09 100644
--- a/drivers/scsi/lpfc/lpfc_nportdisc.c
+++ b/drivers/scsi/lpfc/lpfc_nportdisc.c
@@ -2011,9 +2011,6 @@ lpfc_cmpl_prli_prli_issue(struct lpfc_vport *vport, struct lpfc_nodelist *ndlp,
 		}
 	}
 
-	if (bf_get_be32(prli_recov, nvpr))
-		ndlp->nlp_fcp_info |= NLP_FCP_2_DEVICE;
-
 	lpfc_printf_vlog(vport, KERN_INFO, LOG_NVME_DISC,
 			 "6029 NVME PRLI Cmpl w1 x%08x "
 			 "w4 x%08x w5 x%08x flag x%x, "
-- 
2.13.1
[PATCH v3 15/19] lpfc: Fix header inclusion in lpfc_nvmet
The driver was inappropriately pulling in the nvme host's nvme.h header.
What it really needed was the standard header.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
Reviewed-by: Hannes Reinecke
---
 drivers/scsi/lpfc/lpfc_nvmet.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/lpfc/lpfc_nvmet.c b/drivers/scsi/lpfc/lpfc_nvmet.c
index 6dd8535918f6..823b6df0aec7 100644
--- a/drivers/scsi/lpfc/lpfc_nvmet.c
+++ b/drivers/scsi/lpfc/lpfc_nvmet.c
@@ -36,7 +36,7 @@
 #include
 #include
 
-#include <../drivers/nvme/host/nvme.h>
+#include
 
 #include
 #include
-- 
2.13.1
[PATCH v3 04/19] lpfc: correct debug counters for abort
Existing code was using the wrong field for the completion status when
comparing whether to increment abort statistics.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
Reviewed-by: Hannes Reinecke
---
 drivers/scsi/lpfc/lpfc_nvmet.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_nvmet.c b/drivers/scsi/lpfc/lpfc_nvmet.c
index 8dbf5c9d51aa..7927ac46d345 100644
--- a/drivers/scsi/lpfc/lpfc_nvmet.c
+++ b/drivers/scsi/lpfc/lpfc_nvmet.c
@@ -130,7 +130,7 @@ lpfc_nvmet_xmt_ls_rsp_cmp(struct lpfc_hba *phba, struct lpfc_iocbq *cmdwqe,
 	if (tgtp) {
 		if (status) {
 			atomic_inc(&tgtp->xmt_ls_rsp_error);
-			if (status == IOERR_ABORT_REQUESTED)
+			if (result == IOERR_ABORT_REQUESTED)
 				atomic_inc(&tgtp->xmt_ls_rsp_aborted);
 			if (bf_get(lpfc_wcqe_c_xb, wcqe))
 				atomic_inc(&tgtp->xmt_ls_rsp_xb_set);
@@ -541,7 +541,7 @@ lpfc_nvmet_xmt_fcp_op_cmp(struct lpfc_hba *phba, struct lpfc_iocbq *cmdwqe,
 		rsp->transferred_length = 0;
 		if (tgtp) {
 			atomic_inc(&tgtp->xmt_fcp_rsp_error);
-			if (status == IOERR_ABORT_REQUESTED)
+			if (result == IOERR_ABORT_REQUESTED)
 				atomic_inc(&tgtp->xmt_fcp_rsp_aborted);
 		}
-- 
2.13.1
[PATCH v3 02/19] lpfc: Increase CQ and WQ sizes for SCSI
Increased CQ and WQ sizes for SCSI FCP, matching those used for NVMe
development.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
---
v3: lpfc_init.c:
    Corrected indentation of comments after patching
    Removed unnecessary braces
    Revised comment on checking embedded cdb support
---
 drivers/scsi/lpfc/lpfc.h      |  1 +
 drivers/scsi/lpfc/lpfc_hw4.h  |  3 +++
 drivers/scsi/lpfc/lpfc_init.c | 36 +---
 drivers/scsi/lpfc/lpfc_sli.c  |  3 ++-
 drivers/scsi/lpfc/lpfc_sli4.h |  5 +
 5 files changed, 36 insertions(+), 12 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc.h b/drivers/scsi/lpfc/lpfc.h
index 61fb46da05d4..d042f9118e3b 100644
--- a/drivers/scsi/lpfc/lpfc.h
+++ b/drivers/scsi/lpfc/lpfc.h
@@ -760,6 +760,7 @@ struct lpfc_hba {
 	uint8_t  mds_diags_support;
 	uint32_t initial_imax;
 	uint8_t  bbcredit_support;
+	uint8_t  enab_exp_wqcq_pages;
 
 	/* HBA Config Parameters */
 	uint32_t cfg_ack0;
diff --git a/drivers/scsi/lpfc/lpfc_hw4.h b/drivers/scsi/lpfc/lpfc_hw4.h
index 73c2f6971d2b..ef469129fb71 100644
--- a/drivers/scsi/lpfc/lpfc_hw4.h
+++ b/drivers/scsi/lpfc/lpfc_hw4.h
@@ -3212,6 +3212,9 @@ struct lpfc_sli4_parameters {
 #define cfg_cqv_SHIFT			14
 #define cfg_cqv_MASK			0x0003
 #define cfg_cqv_WORD			word4
+#define cfg_cqpsize_SHIFT		16
+#define cfg_cqpsize_MASK		0x00ff
+#define cfg_cqpsize_WORD		word4
 	uint32_t word5;
 	uint32_t word6;
 #define cfg_mqv_SHIFT			14
diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index f539c554588c..40ffa042 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -8011,9 +8011,10 @@ static int
 lpfc_alloc_fcp_wq_cq(struct lpfc_hba *phba, int wqidx)
 {
 	struct lpfc_queue *qdesc;
+	uint32_t wqesize;
 
 	/* Create Fast Path FCP CQs */
-	if (phba->fcp_embed_io)
+	if (phba->enab_exp_wqcq_pages)
 		/* Increase the CQ size when WQEs contain an embedded cdb */
 		qdesc = lpfc_sli4_queue_alloc(phba, LPFC_EXPANDED_PAGE_SIZE,
 					      phba->sli4_hba.cq_esize,
@@ -8031,15 +8032,18 @@ lpfc_alloc_fcp_wq_cq(struct lpfc_hba *phba, int wqidx)
 	phba->sli4_hba.fcp_cq[wqidx] = qdesc;
 
 	/* Create Fast Path FCP WQs */
-	if (phba->fcp_embed_io)
+	if (phba->enab_exp_wqcq_pages) {
 		/* Increase the WQ size when WQEs contain an embedded cdb */
+		wqesize = (phba->fcp_embed_io) ?
+			LPFC_WQE128_SIZE : phba->sli4_hba.wq_esize;
 		qdesc = lpfc_sli4_queue_alloc(phba, LPFC_EXPANDED_PAGE_SIZE,
-					      LPFC_WQE128_SIZE,
+					      wqesize,
 					      LPFC_WQE_EXP_COUNT);
-	else
+	} else
 		qdesc = lpfc_sli4_queue_alloc(phba, LPFC_DEFAULT_PAGE_SIZE,
 					      phba->sli4_hba.wq_esize,
 					      phba->sli4_hba.wq_ecount);
+
 	if (!qdesc) {
 		lpfc_printf_log(phba, KERN_ERR, LOG_INIT,
 				"0503 Failed allocate fast-path FCP WQ (%d)\n",
@@ -10476,15 +10480,21 @@ lpfc_get_sli4_parameters(struct lpfc_hba *phba, LPFC_MBOXQ_t *mboxq)
 		sli4_params->sge_supp_len = LPFC_MAX_SGE_SIZE;
 
 	/*
-	 * Issue IOs with CDB embedded in WQE to minimized the number
-	 * of DMAs the firmware has to do. Setting this to 1 also forces
-	 * the driver to use 128 bytes WQEs for FCP IOs.
+	 * Check whether the adapter supports an embedded copy of the
+	 * FCP CMD IU within the WQE for FCP_Ixxx commands. In order
+	 * to use this option, 128-byte WQEs must be used.
 	 */
 	if (bf_get(cfg_ext_embed_cb, mbx_sli4_parameters))
 		phba->fcp_embed_io = 1;
 	else
 		phba->fcp_embed_io = 0;
 
+	if ((bf_get(cfg_cqpsize, mbx_sli4_parameters) & LPFC_CQ_16K_PAGE_SZ) &&
+	    (bf_get(cfg_wqpsize, mbx_sli4_parameters) & LPFC_WQ_16K_PAGE_SZ) &&
+	    (sli4_params->wqsize & LPFC_WQ_SZ128_SUPPORT))
+		phba->enab_exp_wqcq_pages = 1;
+	else
+		phba->enab_exp_wqcq_pages = 0;
+
 	/*
	 * Check if the SLI port supports MDS Diagnostics
	 */
@@ -12227,6 +12237,7 @@ int
 lpfc_fof_queue_create(struct lpfc_hba *phba)
 {
 	struct lpfc_queue *qdesc;
+	uint32_t wqesize;
 
 	/* Create FOF EQ */
 	qdesc = lpfc_sli4_queue_alloc(phba, LPFC_DEFAULT_PAGE_SIZE,
@@ -12240,7 +12251,7 @@ lpfc_fof_queue_create(struct lpfc_hba *phba)
 
 	if (phba->cfg_fof) {
 		/*
[PATCH v3 01/19] lpfc: Fix frequency of Release WQE CQEs
The driver controls when the hardware sends completions that communicate
consumption of elements from the WQ. This is done by setting a WQEC bit
on a WQE.

The current driver sets it on every Nth WQE posting. However, the driver
isn't clearing the bit if the WQE is reused. Thus, if the queue depth
isn't evenly divisible by N, with enough time, it can be set on every
element, creating a lot of overhead and risking CQ full conditions.

Correct by clearing the bit when not setting it on an Nth element.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
Reviewed-by: Hannes Reinecke
---
 drivers/scsi/lpfc/lpfc_sli.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index 5f5528a12308..149f21f53b13 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -129,6 +129,8 @@ lpfc_sli4_wq_put(struct lpfc_queue *q, union lpfc_wqe *wqe)
 	/* set consumption flag every once in a while */
 	if (!((q->host_index + 1) % q->entry_repost))
 		bf_set(wqe_wqec, &wqe->generic.wqe_com, 1);
+	else
+		bf_set(wqe_wqec, &wqe->generic.wqe_com, 0);
 	if (q->phba->sli3_options & LPFC_SLI4_PHWQ_ENABLED)
 		bf_set(wqe_wqid, &wqe->generic.wqe_com, q->queue_id);
 	lpfc_sli_pcimem_bcopy(wqe, temp_wqe, q->entry_size);
-- 
2.13.1
[PATCH v3 03/19] lpfc: move placement of target destroy on driver detach
Ensure nvme localports/targetports are torn down before dismantling the
adapter sli interface on driver detachment. This aids leaving interfaces
live while nvme may be making callbacks to abort it.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
Reviewed-by: Hannes Reinecke
---
 drivers/scsi/lpfc/lpfc_init.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index 40ffa042..bff5c95cf5df 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -11503,13 +11503,6 @@ lpfc_pci_remove_one_s4(struct pci_dev *pdev)
 	/* Remove FC host and then SCSI host with the physical port */
 	fc_remove_host(shost);
 	scsi_remove_host(shost);
-	/*
-	 * Bring down the SLI Layer. This step disables all interrupts,
-	 * clears the rings, discards all mailbox commands, and resets
-	 * the HBA FCoE function.
-	 */
-	lpfc_debugfs_terminate(vport);
-	lpfc_sli4_hba_unset(phba);
 
 	/* Perform ndlp cleanup on the physical port. The nvme and nvmet
 	 * localports are destroyed after to cleanup all transport memory.
@@ -11518,6 +11511,13 @@ lpfc_pci_remove_one_s4(struct pci_dev *pdev)
 	lpfc_nvmet_destroy_targetport(phba);
 	lpfc_nvme_destroy_localport(vport);
 
+	/*
+	 * Bring down the SLI Layer. This step disables all interrupts,
+	 * clears the rings, discards all mailbox commands, and resets
+	 * the HBA FCoE function.
+	 */
+	lpfc_debugfs_terminate(vport);
+	lpfc_sli4_hba_unset(phba);
 	lpfc_stop_hba_timers(phba);
 	spin_lock_irq(&phba->hbalock);
-- 
2.13.1
[PATCH v3 00/19] lpfc updates for 11.4.0.7
This patch set provides a number of fixes for the driver.

The patches were cut against Martin's 4.16/scsi-queue tree. There are no
outside dependencies and the patches are expected to be pulled via
Martin's tree.

v2: respin patch 14 "lpfc: Validate adapter support for SRIU option"
    for a snippet that was missing

v3: Responded to review comments from Hannes
    Updated Copyright string used in load banner

James Smart (19):
  lpfc: Fix frequency of Release WQE CQEs
  lpfc: Increase CQ and WQ sizes for SCSI
  lpfc: move placement of target destroy on driver detach
  lpfc: correct debug counters for abort
  lpfc: Add WQ Full Logic for NVME Target
  lpfc: Fix PRLI handling when topology type changes
  lpfc: Fix IO failure during hba reset testing with nvme io.
  lpfc: Fix RQ empty firmware trap
  lpfc: Allow set of maximum outstanding SCSI cmd limit for a target
    parameter
  lpfc: Fix soft lockup in lpfc worker thread during LIP testing
  lpfc: Fix issue_lip if link is disabled
  lpfc: Indicate CONF support in NVMe PRLI
  lpfc: Fix SCSI io host reset causing kernel crash
  lpfc: Validate adapter support for SRIU option
  lpfc: Fix header inclusion in lpfc_nvmet
  lpfc: Treat SCSI Write operation Underruns as an error
  lpfc: Fix nonrecovery of NVME controller after cable swap.
  lpfc: update driver version to 11.4.0.7
  lpfc: Update 11.4.0.7 modified files for 2018 Copyright

 drivers/scsi/lpfc/lpfc.h           |   3 +-
 drivers/scsi/lpfc/lpfc_attr.c      |  17 +++-
 drivers/scsi/lpfc/lpfc_crtn.h      |   3 +-
 drivers/scsi/lpfc/lpfc_els.c       |   5 +-
 drivers/scsi/lpfc/lpfc_hbadisc.c   |   7 +-
 drivers/scsi/lpfc/lpfc_hw4.h       |  14 +++-
 drivers/scsi/lpfc/lpfc_init.c      |  69 +++-
 drivers/scsi/lpfc/lpfc_mbox.c      |   6 +-
 drivers/scsi/lpfc/lpfc_mem.c       |  10 ++-
 drivers/scsi/lpfc/lpfc_nportdisc.c |   7 +-
 drivers/scsi/lpfc/lpfc_nvme.c      |  30 +++---
 drivers/scsi/lpfc/lpfc_nvmet.c     | 162 +
 drivers/scsi/lpfc/lpfc_nvmet.h     |  10 ++-
 drivers/scsi/lpfc/lpfc_scsi.c      |  49 +++
 drivers/scsi/lpfc/lpfc_sli.c       |  36 -
 drivers/scsi/lpfc/lpfc_sli4.h      |  12 ++-
 drivers/scsi/lpfc/lpfc_version.h   |   8 +-
 17 files changed, 341 insertions(+), 107 deletions(-)

-- 
2.13.1
Re: [PATCH v5] blk-mq: introduce BLK_STS_DEV_RESOURCE
On Tue, Jan 30 2018 at 2:42pm -0500,
Bart Van Assche wrote:

> On Tue, 2018-01-30 at 14:33 -0500, Mike Snitzer wrote:
> > On Tue, Jan 30 2018 at 12:52pm -0500,
> > Bart Van Assche wrote:
> >
> > > - This patch does not fix any bugs nor makes block drivers easier to
> > >   read or to implement. So why is this patch considered useful?
> >
> > It enables blk-mq core to own the problem that individual drivers should
> > have no business needing to worry about. Period.
>
> Thanks for having confirmed that this patch is an API-change only and does
> not fix any bugs.

No, it is an API change that enables blk-mq drivers to make forward
progress without compromising the potential benefits associated with
blk-mq's SCHED_RESTART functionality.

(If we're going to beat this horse to death it might as well be with
precision)
Re: [PATCH v5] blk-mq: introduce BLK_STS_DEV_RESOURCE
On Tue, 2018-01-30 at 14:33 -0500, Mike Snitzer wrote:
> On Tue, Jan 30 2018 at 12:52pm -0500,
> Bart Van Assche wrote:
>
> > - This patch does not fix any bugs nor makes block drivers easier to
> >   read or to implement. So why is this patch considered useful?
>
> It enables blk-mq core to own the problem that individual drivers should
> have no business needing to worry about. Period.

Thanks for having confirmed that this patch is an API-change only and does
not fix any bugs.

Bart.
Re: [PATCH v5] blk-mq: introduce BLK_STS_DEV_RESOURCE
On Tue, Jan 30 2018 at 12:52pm -0500,
Bart Van Assche wrote:

> On 01/30/18 06:24, Mike Snitzer wrote:
> > +	 *
> > +	 * If driver returns BLK_STS_RESOURCE and SCHED_RESTART
> > +	 * bit is set, run queue after a delay to avoid IO stalls
> > +	 * that could otherwise occur if the queue is idle.
> >  	 */
> > -	if (!blk_mq_sched_needs_restart(hctx) ||
> > +	needs_restart = blk_mq_sched_needs_restart(hctx);
> > +	if (!needs_restart ||
> >  	    (no_tag && list_empty_careful(&hctx->dispatch_wait.entry)))
> >  		blk_mq_run_hw_queue(hctx, true);
> > +	else if (needs_restart && (ret == BLK_STS_RESOURCE))
> > +		blk_mq_delay_run_hw_queue(hctx, BLK_MQ_QUEUE_DELAY);
> >  }
>
> If a request completes concurrently with execution of the above code
> then the request completion will trigger a call of
> blk_mq_sched_restart_hctx() and that call will clear the
> BLK_MQ_S_SCHED_RESTART bit. If that bit is cleared before the above
> code tests it then the above code will schedule an asynchronous call
> of __blk_mq_run_hw_queue(). If the .queue_rq() call triggered by the
> new queue run returns again BLK_STS_RESOURCE then the above code will
> be executed again. In other words, a loop occurs. That loop will
> repeat as long as the described race occurs. The current (kernel
> v4.15) block layer behavior is simpler: only block drivers call
> blk_mq_delay_run_hw_queue() and the block layer core never calls that
> function. Hence that loop cannot occur with the v4.15 block layer
> core and block drivers. A motivation of why that loop is preferred
> compared to the current behavior (no loop) is missing. Does this mean
> that that loop is a needless complication of the block layer core?

No, it means the loop is an internal blk-mq concern. And that drivers
needn't worry about kicking the queue. And it affords blk-mq latitude
to change how it responds to BLK_STS_RESOURCE in the future (without
needing to change every driver).

But even v4.15 has a similar loop. It just happens to extend into the
.queue_rq() where the driver is completely blind to SCHED_RESTART. And
the driver may just repeatedly kick the queue after a delay (via
blk_mq_delay_run_hw_queue).

This cycle should be a very rare occurrence regardless of which
approach is taken (V5 vs 4.15). The fact that you engineered your SRP
initiator and target code to pathologically trigger this worst case
(via target_can_queue) doesn't mean it is the fast path for a properly
configured and functioning storage network.

> Sorry but I still prefer the v4.15 block layer approach because this
> patch has in my view the following disadvantages:
> - It involves a blk-mq API change. API changes are always risky and
>   need some time to stabilize.

The number of blk-mq API changes that have occurred since blk-mq was
introduced is a very long list. Seems contrived to make this the one
that is breaking the camel's back.

> - The delay after which to rerun the queue is moved from block layer
>   drivers into the block layer core. I think that's wrong because
>   only the block driver authors can make a good choice for this
>   constant.

Unsubstantiated. 3ms (scsi-mq, nvmefc) vs 100ms (dm-mq mpath): which is
better? Pretty sure if the underlying storage network is 1) performant
2) properly configured -- then a shorter delay is preferable.

> - This patch makes block drivers harder to understand. Anyone who
>   sees return BLK_STS_RESOURCE / return BLK_STS_DEV_RESOURCE for the
>   first time will have to look up the meaning of these constants. An
>   explicit blk_mq_delay_run_hw_queue() call is easier to understand.

No, it doesn't make blk-mq harder to understand. But even if it did, it
actually acknowledges that there is an existing blk-mq SCHED_RESTART
heuristic for how to deal with the need for blk-mq to back off in the
face of BLK_STS_RESOURCE returns. By just having each blk-mq driver
blindly kick the queue you're actively ignoring, and defeating, that
entire design element of blk-mq (SCHED_RESTART).
It is an instance where blk-mq can and does know better. Acknowledging
that fact is moving blk-mq in a better direction.

> - This patch makes the blk-mq core harder to understand because of
>   the loop mentioned above.

You've said your piece. But you've taken on this campaign to undermine
this line of development with such passion that we're now in a place
where Jens is shell-shocked by all the repeat campaigning and noise.

Bart, you keep saying the same things over and over. Yet you cannot
show the patch to actively be a problem with testing-based detail.
Seems you'd rather refuse to even test it than open yourself up to the
possibility that this concern of yours has been making a mountain out
of a mole hill.

> - This patch does not fix any bugs nor makes block drivers easier to
>   read or to implement. So why is this
Re: [dm-devel] [LSF/MM TOPIC] block, dm: restack queue_limits
On Tue, 2018-01-30 at 16:07 +0100, Hannes Reinecke wrote:
> On 01/29/2018 10:08 PM, Mike Snitzer wrote:
> > We currently don't restack the queue_limits if the lowest, or
> > intermediate, layer of an IO stack changes.
> >
> > This is particularly unfortunate in the case of FLUSH/FUA which may
> > change if/when a HW controller's BBU fails; whereby requiring the
> > device advertise that it has a volatile write cache (WCE=1).
>
> Uh-oh. Device rescan.
> Would be a valid topic on its own...
>
> > But in the context of DM, really it'd be best if the entire stack of
> > devices had their limits restacked if any underlying layer's limits
> > change.
> >
> > In the past, Martin and I discussed that we should "just do it" but
> > never did. Not sure we need a lengthy discussion but figured I'd put
> > it out there.
> >
> > Maybe I'll find time, between now and April, to try implementing it.
>
> For SCSI the device capabilities are pretty much set in stone after the
> initial scan; there are hooks for rescanning, but they will only work
> half of the time.
> Plus we can't really change the device type on the fly (eg if the SCSI
> device type changes; if it moves away from '0' we would need to unbind
> the sd driver, and if it moves to '0' we'll need to rescan the sd
> device. None of this is happening right now.)
>
> So I'd be glad to have a discussion around this.

At least one array vendor has also desired the ability to change various
attributes of the device after the initial scan, such as the model name.
Not sure what would break if we did this, since who knows what userspace
software might be caching this info, but...

-Ewan
Re: [dm-devel] [PATCH v5] blk-mq: introduce BLK_STS_DEV_RESOURCE
On Tue, 2018-01-30 at 09:52 -0800, Bart Van Assche wrote:
> On 01/30/18 06:24, Mike Snitzer wrote:
> > +	 *
> > +	 * If driver returns BLK_STS_RESOURCE and SCHED_RESTART
> > +	 * bit is set, run queue after a delay to avoid IO stalls
> > +	 * that could otherwise occur if the queue is idle.
> >  	 */
> > -	if (!blk_mq_sched_needs_restart(hctx) ||
> > +	needs_restart = blk_mq_sched_needs_restart(hctx);
> > +	if (!needs_restart ||
> >  	    (no_tag && list_empty_careful(&hctx->dispatch_wait.entry)))
> >  		blk_mq_run_hw_queue(hctx, true);
> > +	else if (needs_restart && (ret == BLK_STS_RESOURCE))
> > +		blk_mq_delay_run_hw_queue(hctx, BLK_MQ_QUEUE_DELAY);
> >  }
>
> If a request completes concurrently with execution of the above code
> then the request completion will trigger a call of
> blk_mq_sched_restart_hctx() and that call will clear the
> BLK_MQ_S_SCHED_RESTART bit. If that bit is cleared before the above
> code tests it then the above code will schedule an asynchronous call
> of __blk_mq_run_hw_queue(). If the .queue_rq() call triggered by the
> new queue run returns again BLK_STS_RESOURCE then the above code will
> be executed again. In other words, a loop occurs. That loop will
> repeat as long as the described race occurs. The current (kernel
> v4.15) block layer behavior is simpler: only block drivers call
> blk_mq_delay_run_hw_queue() and the block layer core never calls that
> function. Hence that loop cannot occur with the v4.15 block layer
> core and block drivers. A motivation of why that loop is preferred
> compared to the current behavior (no loop) is missing. Does this mean
> that that loop is a needless complication of the block layer core?
>
> Sorry but I still prefer the v4.15 block layer approach because this
> patch has in my view the following disadvantages:
> - It involves a blk-mq API change. API changes are always risky and
>   need some time to stabilize.
> - The delay after which to rerun the queue is moved from block layer
>   drivers into the block layer core. I think that's wrong because
>   only the block driver authors can make a good choice for this
>   constant.
> - This patch makes block drivers harder to understand. Anyone who
>   sees return BLK_STS_RESOURCE / return BLK_STS_DEV_RESOURCE for the
>   first time will have to look up the meaning of these constants. An
>   explicit blk_mq_delay_run_hw_queue() call is easier to understand.
> - This patch makes the blk-mq core harder to understand because of
>   the loop mentioned above.
> - This patch does not fix any bugs nor makes block drivers easier to
>   read or to implement. So why is this patch considered useful?
>
> Thanks,
>
> Bart.

Hello Bart

What is the performance implication of your method versus this patch
above? Is there more of a delay in your approach or in Ming's approach
from your own testing?

I guess it seems we will never get consensus on this, so is it time to
take a vote?

I respect and trust your inputs, you know that, but are you perhaps
prepared to accept the approach above and agree to it, and if it turns
out to expose more issues it can be addressed later? Is that not a
better way to progress this? Because to me it looks like you and Ming
will continue to disagree on which is the better approach.

With much respect
Laurence
Re: [dm-devel] [PATCH v5] blk-mq: introduce BLK_STS_DEV_RESOURCE
On 01/30/18 06:24, Mike Snitzer wrote:
> +	 *
> +	 * If driver returns BLK_STS_RESOURCE and SCHED_RESTART
> +	 * bit is set, run queue after a delay to avoid IO stalls
> +	 * that could otherwise occur if the queue is idle.
>  	 */
> -	if (!blk_mq_sched_needs_restart(hctx) ||
> +	needs_restart = blk_mq_sched_needs_restart(hctx);
> +	if (!needs_restart ||
>  	    (no_tag && list_empty_careful(&hctx->dispatch_wait.entry)))
>  		blk_mq_run_hw_queue(hctx, true);
> +	else if (needs_restart && (ret == BLK_STS_RESOURCE))
> +		blk_mq_delay_run_hw_queue(hctx, BLK_MQ_QUEUE_DELAY);
>  }

If a request completes concurrently with execution of the above code
then the request completion will trigger a call of
blk_mq_sched_restart_hctx() and that call will clear the
BLK_MQ_S_SCHED_RESTART bit. If that bit is cleared before the above code
tests it then the above code will schedule an asynchronous call of
__blk_mq_run_hw_queue(). If the .queue_rq() call triggered by the new
queue run returns again BLK_STS_RESOURCE then the above code will be
executed again. In other words, a loop occurs. That loop will repeat as
long as the described race occurs. The current (kernel v4.15) block
layer behavior is simpler: only block drivers call
blk_mq_delay_run_hw_queue() and the block layer core never calls that
function. Hence that loop cannot occur with the v4.15 block layer core
and block drivers. A motivation of why that loop is preferred compared
to the current behavior (no loop) is missing. Does this mean that that
loop is a needless complication of the block layer core?

Sorry but I still prefer the v4.15 block layer approach because this
patch has in my view the following disadvantages:
- It involves a blk-mq API change. API changes are always risky and need
  some time to stabilize.
- The delay after which to rerun the queue is moved from block layer
  drivers into the block layer core. I think that's wrong because only
  the block driver authors can make a good choice for this constant.
- This patch makes block drivers harder to understand. Anyone who sees
  return BLK_STS_RESOURCE / return BLK_STS_DEV_RESOURCE for the first
  time will have to look up the meaning of these constants. An explicit
  blk_mq_delay_run_hw_queue() call is easier to understand.
- This patch makes the blk-mq core harder to understand because of the
  loop mentioned above.
- This patch does not fix any bugs nor makes block drivers easier to
  read or to implement. So why is this patch considered useful?

Thanks,

Bart.
Re: [dm-devel] [LSF/MM TOPIC] block, dm: restack queue_limits
On 01/29/2018 10:08 PM, Mike Snitzer wrote:
> We currently don't restack the queue_limits if the lowest, or
> intermediate, layer of an IO stack changes.
>
> This is particularly unfortunate in the case of FLUSH/FUA which may
> change if/when a HW controller's BBU fails; whereby requiring the device
> advertise that it has a volatile write cache (WCE=1).
>
Uh-oh. Device rescan. Would be a valid topic on its own ...

> But in the context of DM, really it'd be best if the entire stack of
> devices had their limits restacked if any underlying layer's limits
> change.
>
> In the past, Martin and I discussed that we should "just do it" but
> never did. Not sure we need a lengthy discussion but figured I'd put it
> out there.
>
> Maybe I'll find time, between now and April, to try implementing it.
>
For SCSI the device capabilities are pretty much set in stone after the initial scan; there are hooks for rescanning, but they will only work half of the time. Plus we can't really change the device type on the fly (e.g. if the SCSI device type changes: if it moves away from '0' we would need to unbind the sd driver, and if it moves to '0' we'll need to rescan the sd device. None of this is happening right now.)

So I'd be glad to have a discussion around this.

Cheers,

Hannes
--
Dr. Hannes Reinecke		   Teamlead Storage & Networking
h...@suse.de			   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
[PATCH v5] blk-mq: introduce BLK_STS_DEV_RESOURCE
From: Ming Lei

This status is returned from driver to block layer if device related resource is unavailable, but driver can guarantee that IO dispatch will be triggered in future when the resource is available. Convert some drivers to return BLK_STS_DEV_RESOURCE.

Also, if driver returns BLK_STS_RESOURCE and SCHED_RESTART is set, rerun queue after a delay (BLK_MQ_QUEUE_DELAY) to avoid IO stalls. BLK_MQ_QUEUE_DELAY is 3 ms because both scsi-mq and nvmefc are using that magic value.

If a driver can make sure there is in-flight IO, it is safe to return BLK_STS_DEV_RESOURCE because:

1) If all in-flight IOs complete before examining SCHED_RESTART in blk_mq_dispatch_rq_list(), SCHED_RESTART must be cleared, so queue is run immediately in this case by blk_mq_dispatch_rq_list();

2) if there is any in-flight IO after/when examining SCHED_RESTART in blk_mq_dispatch_rq_list():
- if SCHED_RESTART isn't set, queue is run immediately as handled in 1)
- otherwise, this request will be dispatched after any in-flight IO is completed via blk_mq_sched_restart()

3) if SCHED_RESTART is set concurrently in context because of BLK_STS_RESOURCE, blk_mq_delay_run_hw_queue() will cover the above two cases and make sure IO hang can be avoided.

One invariant is that the queue will be rerun if SCHED_RESTART is set.
Suggested-by: Jens Axboe
Tested-by: Laurence Oberman
Signed-off-by: Ming Lei
Signed-off-by: Mike Snitzer
---
V5:
 - fixed stale reference to 10ms delay in blk-mq.c comment
 - revised commit header to include Ming's proof of how blk-mq drivers
   will make progress (serves to document how scsi_queue_rq now works)
V4:
 - cleanup header and code comments
 - rerun queue after BLK_MQ_QUEUE_DELAY (3ms) instead of 10ms
 - eliminate nvmefc's queue rerun now that blk-mq does it
V3:
 - fix typo and improve documentation
 - add tested-by tag
V2:
 - add comments on the newly introduced status
 - patch style fix
 - both are suggested by Christoph

 block/blk-core.c             |  1 +
 block/blk-mq.c               | 20
 drivers/block/null_blk.c     |  2 +-
 drivers/block/virtio_blk.c   |  2 +-
 drivers/block/xen-blkfront.c |  2 +-
 drivers/md/dm-rq.c           |  5 ++---
 drivers/nvme/host/fc.c       | 12 ++--
 drivers/scsi/scsi_lib.c      |  6 +++---
 include/linux/blk_types.h    | 17 +
 9 files changed, 44 insertions(+), 23 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index cdae69be68e9..38279d4ae08b 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -145,6 +145,7 @@ static const struct {
 	[BLK_STS_MEDIUM]	= { -ENODATA,	"critical medium" },
 	[BLK_STS_PROTECTION]	= { -EILSEQ,	"protection" },
 	[BLK_STS_RESOURCE]	= { -ENOMEM,	"kernel resource" },
+	[BLK_STS_DEV_RESOURCE]	= { -ENOMEM,	"device resource" },
 	[BLK_STS_AGAIN]		= { -EAGAIN,	"nonblocking retry" },

 	/* device mapper special case, should not leak out: */
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 43e7449723e0..e39b4e2a63db 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1160,6 +1160,8 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx **hctx,
 	return true;
 }

+#define BLK_MQ_QUEUE_DELAY	3		/* ms units */
+
 bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 			     bool got_budget)
 {
@@ -1167,6 +1169,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 	struct request *rq, *nxt;
 	bool no_tag = false;
 	int errors, queued;
+	blk_status_t ret = BLK_STS_OK;

 	if (list_empty(list))
 		return false;
@@ -1179,7 +1182,6 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 	errors = queued = 0;
 	do {
 		struct blk_mq_queue_data bd;
-		blk_status_t ret;

 		rq = list_first_entry(list, struct request, queuelist);
 		if (!blk_mq_get_driver_tag(rq, &hctx, false)) {
@@ -1224,7 +1226,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 		}

 		ret = q->mq_ops->queue_rq(hctx, &bd);
-		if (ret == BLK_STS_RESOURCE) {
+		if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE) {
 			/*
 			 * If an I/O scheduler has been configured and we got a
 			 * driver tag for the next request already, free it
@@ -1255,6 +1257,8 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 	 * that is where we will continue on next queue run.
 	 */
 	if (!list_empty(list)) {
+		bool needs_restart;
+
Re: [LSF/MM TOPIC] Two blk-mq related topics
On Tue, Jan 30, 2018 at 11:08:28AM +0100, Johannes Thumshirn wrote:
> [+Cc Mel]
> Jens Axboe writes:
> > On 1/29/18 1:56 PM, James Bottomley wrote:
> >> On Mon, 2018-01-29 at 23:46 +0800, Ming Lei wrote:
> >> [...]
> >>> 2. When to enable SCSI_MQ at default again?
> >>
> >> I'm not sure there's much to discuss ... I think the basic answer is as
> >> soon as Christoph wants to try it again.
> >
> > FWIW, internally I've been running various IO intensive workloads on
> > what is essentially 4.12 upstream with scsi-mq the default (with
> > mq-deadline as the scheduler) and comparing IO workloads with a
> > previous 4.6 kernel (without scsi-mq), and things are looking
> > great.
> >
> > We're never going to iron out the last kinks with it being off
> > by default, I think we should attempt to flip the switch again
> > for 4.16.
>
> The 4.12 sounds interesting. I remember Mel ran some tests with 4.12 as
> we were considering to flip the config option for SLES and it showed
> several road blocks.
>

Mostly due to slow storage and BFQ, where mq-deadline was not a universal win as an alternative default. I don't have current data and I archived what I had, but it was based on 4.13-rc7 at the time and BFQ has changed a lot since, so it would need to be redone.

> I'm not sure whether he re-evaluated 4.13/4.14 on his grid though.
>

No, it hasn't. Grid time for performance testing has been tight during the last few months, to say the least.

> But I'm definitively interested in this discussion and can even possibly
> share some benchmark results we did in our FC Lab.
>

If you remind me, I may be able to re-execute the tests in a 4.16-rcX before LSF/MM so you have other data to work with. Unfortunately, I'll not be able to make LSF/MM this time due to personal commitments that conflict and are unmovable.

--
Mel Gorman
SUSE Labs
RE: A qla2xxx commit cause Linux no response, has not fixed in lastest version 4.15-rc6
Hi Himanshu,

Today I tried several times and have some news. Before I insmod the qla2xxx.ko, if I shut down and then start the FC switch port connected to the HBA card, the qla2xxx.ko works well. It seems that the issue is related to the FC switch port. Maybe some old state causes the issue. The FC switch model is H3C S5820V2.

Regards
Chang Limin

-----Original Message-----
From: Madhani, Himanshu [mailto:himanshu.madh...@cavium.com]
Sent: Tuesday, January 30, 2018 7:40 AM
To: changlimin (Cloud)
Cc: Nicholas A. Bellinger; Tran, Quinn; jifuliang (Cloud); zhangguanghui (Cloud); zhangzijian (Cloud); target-devel; linux-scsi
Subject: Re: A qla2xxx commit cause Linux no response, has not fixed in lastest version 4.15-rc6

Hi Chang,

> On Jan 18, 2018, at 4:51 AM, Changlimin wrote:
>
> Hi Himanshu,
> Today I reproduced the issue in my server.
> First, I compiled kernel 4.15-rc6 (make localmodconfig; make; make
> modules_install; make install), then start the kernel with parameter
> modprobe.blacklist=qla2xxx.
> Second, tail -f /var/log/syslog
> Third, modprobe qla2xxx ql2xextended_error_logging=0x1e40, the log is
> syslog-1e40.txt
> The syslog-7fff is got when modprobe qla2xxx
> ql2xextended_error_logging=0x7fff
>
> BTW, I haven't load driver from 4.9.x to kernel 4.15-rc6.
> When I checkout kernel commit 726b85487067d7f5b23495bc33c484b8517c4074, all
> kernel code is 4.9.x.
>

Sorry for the extended delay in the response. From the syslog that you sent me, I do see driver version 10.00.00.02-k which is from 4.15.0-rc6, so at least you are using the correct driver. (In your email earlier you mentioned 8.07.xx, which was confusing.)

Jan 18 20:30:23 cvknode25 kernel: [  100.991309] qla2xxx [:00:00.0]-0005: : QLogic Fibre Channel HBA Driver: 10.00.00.02-k-debug.
Jan 18 20:30:23 cvknode25 kernel: [  100.991486] qla2xxx [:0a:00.0]-001d: : Found an ISP2532 irq 16 iobase 0x67aad9fd.
Jan 18 20:30:23 cvknode25 kernel: [  101.651676] qla2xxx [:0a:00.0]-4800:1: DPC handler sleeping.
Jan 18 20:30:23 cvknode25 kernel: [  101.651677] scsi host1: qla2xxx

Also I do see

Jan 18 20:30:24 cvknode25 kernel: [  102.624987] qla2xxx [:0a:00.0]-500a:1: LOOP UP detected (8 Gbps).

i.e. the driver was able to bring up an 8G link.

So having said that, I still do not have a clear picture from the logs provided of why you are encountering the issue. Can you please share your configuration details? I would like to see how your system is set up and see if I can replicate it in our lab here.

> Regards
> Chang Limin
>
> -----Original Message-----
> From: Madhani, Himanshu [mailto:himanshu.madh...@cavium.com]
> Sent: Thursday, January 18, 2018 2:26 AM
> To: changlimin (Cloud)
> Cc: Nicholas A. Bellinger; Tran, Quinn; jifuliang (Cloud); zhangguanghui
> (Cloud); zhangzijian (Cloud); target-devel; linux-scsi
> Subject: Re: A qla2xxx commit cause Linux no response, has not fixed in
> lastest version 4.15-rc6
>
> Hi Chang,
>
>> On Jan 15, 2018, at 10:49 PM, Changlimin wrote:
>>
>> Hi Himanshu,
>> This is my progress.
>> First, I compiled 4.15-rc6, I found linux hang when booting, the stack
>> showed something wrong in qla2xxx driver.
>
> Can you provide me detailed steps of how you compiled 4.15-rc6. Also provide me
> details of how you are loading the driver, and also provide the complete log file.
>
> I do not see how you would be able to load a driver which is from 4.9.x when you
> compile a fresh 4.15.0-rc6.
>
> Just FYI, I built a test system with 8G/16G/32G adapters with the 4.15.0-rc6 kernel
> and I am not able to see the hang that you are describing.
>
> # uname -r
> 4.15.0-rc6+
>
> # modprobe qla2xxx
>
> # fcc.sh
> FC HBAs:
> HBA   Port Name                Port ID   State   Device
> host3 21:00:00:24:ff:7e:f5:80  01:0d:00  Online  QLE2742 FW:v8.05.63 DVR:v10.00.00.04-k
> host4 21:00:00:24:ff:7e:f5:81  01:0e:00  Online  QLE2742 FW:v8.05.63 DVR:v10.00.00.04-k
> host5 21:00:00:0e:1e:12:e9:a0  01:06:00  Online  QLE8362 FW:v8.03.06 DVR:v10.00.00.04-k
> host6 21:00:00:0e:1e:12:e9:a1  01:14:00  Online  QLE8362 FW:v8.03.06 DVR:v10.00.00.04-k
> host7 21:00:00:24:ff:46:0a:5c  01:0d:00  Online  QLE2562 FW:v8.03.00 DVR:v10.00.00.04-k
> host8 21:00:00:24:ff:46:0a:5d  01:15:00  Online  QLE2562 FW:v8.03.00 DVR:v10.00.00.04-k
>
> # modinfo qla2xxx | more
>
> filename:       /lib/modules/4.15.0-rc6+/kernel/drivers/scsi/qla2xxx/qla2xxx.ko
> firmware:       ql2500_fw.bin
> firmware:       ql2400_fw.bin
> firmware:       ql2322_fw.bin
> firmware:       ql2300_fw.bin
> firmware:       ql2200_fw.bin
> firmware:       ql2100_fw.bin
> version:        10.00.00.04-k
> license:        GPL
> description:    QLogic Fibre Channel HBA Driver
> author:         QLogic Corporation
> srcversion:     6CBCF1372A7756690E83CC3
>
>
>> Second, I want to find which commit introduced the issue. So I tried many times
Re: [LSF/MM TOPIC] Two blk-mq related topics
On 30/01/2018 01:24, Ming Lei wrote:
> On Mon, Jan 29, 2018 at 12:56:30PM -0800, James Bottomley wrote:
>> On Mon, 2018-01-29 at 23:46 +0800, Ming Lei wrote:
>> [...]
>>> 2. When to enable SCSI_MQ at default again?
>>
>> I'm not sure there's much to discuss ... I think the basic answer is as
>> soon as Christoph wants to try it again.
>
> I guess Christoph still needs to evaluate if there are existing issues or
> blockers before trying it again. And more input may be got from F2F
> discussion, IMHO.
>
> SCSI_MQ is enabled on V3.17 firstly, but disabled at default. In
> V4.13-rc1, it is enabled at default, but later the patch is reverted
> in V4.13-rc7, and becomes disabled at default too.
>
> Now both the original reported PM issue (actually SCSI quiesce) and
> the sequential IO performance issue have been addressed.
>
>> Is the blocker bug just not closed because no-one thought to do it:
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=178381
>>
>> (we have confirmed that this issue is now fixed with the original
>> reporter?)
>
> From a developer view, this issue is fixed by the following commit:
> 3a0a52997 ("block, scsi: Make SCSI quiesce and resume work reliably"),
> and it is verified by the kernel list reporter.
>
>> And did the Huawei guy (Jonathan Cameron) confirm his performance issue
>> was fixed (I don't think I saw email that he did)?
>
> Last time I talked with John Garry about the issue, and the merged
> .get_budget based patch improves much on the IO performance, but there
> is still a bit of a gap compared with the legacy path. Seems a driver
> specific issue; remembered that removing a driver's lock can improve
> performance much.
>
> Garry, could you provide further update on this issue?

Hi Ming,

From our testing with experimental changes to our driver to support SCSI mq, we were almost getting on-par performance with the legacy path. But without these, MQ was hitting performance (and I would not necessarily say it was a driver issue).

We can retest from today's mainline and see where we are.

BTW, have you got performance figures for many other single-queue HBAs with and without CONFIG_SCSI_MQ_DEFAULT=Y?

Thanks,
John

> Thanks,
> Ming
> .
[PATCH v2] scsi: handle special return codes for ABORTED COMMAND
Introduce a new blist flag that indicates the device may return certain sense code/ASC/ASCQ combinations that require different treatment than normal. In particular, some devices need unconditional retry (aka ADD_TO_MLQUEUE) under certain conditions; otherwise path failures may be falsely detected in the "maybe_retry" case.

Because different devices use different sense codes to indicate this condition, a single bit is not enough. But we also can't use a bit for every possible status code. Therefore the new flag BLIST_ABORTED_CMD_QUIRK just indicates that this is a device for which the return code should be checked. An extra "blacklist" in scsi_aborted_cmd_quirk() then checks for known ASC/ASCQ combinations, and handles them.

When such a combination is encountered for the first time, a message is printed. In systems that have several different peripherals using this flag, that might lead to a wrong match without a warning. This small risk is a compromise between exactness and the excessive resources that would be required to check for a matching device vendor and model name every time. Also, if there were different devices (identified by vendor/model) using the same ASC/ASCQ, the code might print stray warnings. But the additional effort required to handle this 100% correctly is hardly justified with the current minimal "blacklist".

I chose to recycle the devinfo flag (1<<11) (former BLIST_INQUIRY_58, free since 496c91bbc910) for this purpose rather than extending blist_flags_t to 64 bit. This could be easily changed of course.
This patch replaces the previously submitted patches "scsi: handle ABORTED_COMMAND on Fujitsu ETERNUS" and "scsi: Always retry internal target error" (Hannes Reinecke).

Changes in v2:
 - use ARRAY_SIZE (Bart van Assche)
 - make blist array static const and use separate bitmask for warned flag
   (Bart van Assche)
 - Fix string comparison for SCSI vendor and model
 - Print warning at scsi_logging_level 0, and improve message formatting

Signed-off-by: Martin Wilck
---
 drivers/scsi/scsi_devinfo.c |   4 +-
 drivers/scsi/scsi_error.c   | 111
 include/scsi/scsi_devinfo.h |   6 +++
 3 files changed, 120 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_devinfo.c b/drivers/scsi/scsi_devinfo.c
index dfb8da83fa50..39455734d934 100644
--- a/drivers/scsi/scsi_devinfo.c
+++ b/drivers/scsi/scsi_devinfo.c
@@ -161,12 +161,14 @@ static struct {
 	{"DGC", "RAID", NULL, BLIST_SPARSELUN},	/* Dell PV 650F, storage on LUN 0 */
 	{"DGC", "DISK", NULL, BLIST_SPARSELUN},	/* Dell PV 650F, no storage on LUN 0 */
 	{"EMC", "Invista", "*", BLIST_SPARSELUN | BLIST_LARGELUN},
-	{"EMC", "SYMMETRIX", NULL, BLIST_SPARSELUN | BLIST_LARGELUN | BLIST_REPORTLUN2},
+	{"EMC", "SYMMETRIX", NULL, BLIST_SPARSELUN | BLIST_LARGELUN
+	 | BLIST_REPORTLUN2 | BLIST_ABORTED_CMD_QUIRK},
 	{"EMULEX", "MD21/S2 ESDI", NULL, BLIST_SINGLELUN},
 	{"easyRAID", "16P", NULL, BLIST_NOREPORTLUN},
 	{"easyRAID", "X6P", NULL, BLIST_NOREPORTLUN},
 	{"easyRAID", "F8", NULL, BLIST_NOREPORTLUN},
 	{"FSC", "CentricStor", "*", BLIST_SPARSELUN | BLIST_LARGELUN},
+	{"FUJITSU", "ETERNUS_DXM", "*", BLIST_ABORTED_CMD_QUIRK},
 	{"Generic", "USB SD Reader", "1.00", BLIST_FORCELUN | BLIST_INQUIRY_36},
 	{"Generic", "USB Storage-SMC", "0180", BLIST_FORCELUN | BLIST_INQUIRY_36},
 	{"Generic", "USB Storage-SMC", "0207", BLIST_FORCELUN | BLIST_INQUIRY_36},
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 62b56de38ae8..d60568f71047 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -26,6 +26,7 @@
 #include
 #include
 #include
+#include

 #include
 #include
@@ -39,6 +40,7 @@
 #include
 #include
 #include
+#include

 #include "scsi_priv.h"
 #include "scsi_logging.h"
@@ -432,6 +434,112 @@ static void scsi_report_sense(struct scsi_device *sdev,
 	}
 }

+struct aborted_cmd_blist {
+	u8 asc;
+	u8 ascq;
+	int retval;
+	const char *vendor;
+	const char *model;
+};
+
+/**
+ * scsi_strcmp - Compare space-padded string with reference string
+ * @device_str:	vendor or model field of struct scsi_device,
+ *		possibly space-padded
+ * @ref_str:	reference string to compare with
+ * @len:	max size of device_str: 8 for vendor, 16 for model
+ *
+ * Return value:
+ * -1, 0, or 1, like strcmp().
+ */
+static int scsi_strcmp(const char *device_str, const char *ref_str, int len)
+{
+	int ref_len = strlen(ref_str);
+	int r, i;
+
+	WARN_ON(ref_len > len);
+	r = strncmp(device_str, ref_str, min(ref_len, len));
+	if (r != 0)
+		return r;
+
+	for (i = ref_len; i < strnlen(device_str, len); i++)
+		if (device_str[i] != ' ')
+			return 1;
+	return 0;
+}
+
+/**
+ * scsi_aborted_cmd_quirk - Handle special
Re: [LSF/MM TOPIC] Two blk-mq related topics
[+Cc Mel]

Jens Axboe writes:
> On 1/29/18 1:56 PM, James Bottomley wrote:
>> On Mon, 2018-01-29 at 23:46 +0800, Ming Lei wrote:
>> [...]
>>> 2. When to enable SCSI_MQ at default again?
>>
>> I'm not sure there's much to discuss ... I think the basic answer is as
>> soon as Christoph wants to try it again.
>
> FWIW, internally I've been running various IO intensive workloads on
> what is essentially 4.12 upstream with scsi-mq the default (with
> mq-deadline as the scheduler) and comparing IO workloads with a
> previous 4.6 kernel (without scsi-mq), and things are looking
> great.
>
> We're never going to iron out the last kinks with it being off
> by default, I think we should attempt to flip the switch again
> for 4.16.

The 4.12 sounds interesting. I remember Mel ran some tests with 4.12 as we were considering to flip the config option for SLES and it showed several road blocks.

I'm not sure whether he re-evaluated 4.13/4.14 on his grid though.

But I'm definitively interested in this discussion and can even possibly share some benchmark results we did in our FC Lab.

Byte,
	Johannes
--
Johannes Thumshirn					    Storage
jthumsh...@suse.de				+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
Re: [LSF/MM TOPIC] Two blk-mq related topics
Ming Lei - 30.01.18, 02:24:
> > > SCSI_MQ is enabled on V3.17 firstly, but disabled at default. In
> > > V4.13-rc1, it is enabled at default, but later the patch is reverted
> > > in V4.13-rc7, and becomes disabled at default too.
> > >
> > > Now both the original reported PM issue (actually SCSI quiesce) and
> > > the sequential IO performance issue have been addressed.
> >
> > Is the blocker bug just not closed because no-one thought to do it:
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=178381
> >
> > (we have confirmed that this issue is now fixed with the original
> > reporter?)
>
> From a developer view, this issue is fixed by the following commit:
> 3a0a52997 ("block, scsi: Make SCSI quiesce and resume work reliably"),
> and it is verified by the kernel list reporter.

I have never seen any suspend/hibernate related issues with blk-mq + bfq since then, using a heavily utilized BTRFS dual-SSD RAID 1.

% egrep "MQ|BFQ" /boot/config-4.15.0-tp520-btrfstrim+
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_BLK_WBT_MQ=y
CONFIG_BLK_MQ_PCI=y
CONFIG_BLK_MQ_VIRTIO=y
CONFIG_MQ_IOSCHED_DEADLINE=m
CONFIG_MQ_IOSCHED_KYBER=m
CONFIG_IOSCHED_BFQ=m
CONFIG_BFQ_GROUP_IOSCHED=y
CONFIG_NET_SCH_MQPRIO=m
# CONFIG_SCSI_MQ_DEFAULT is not set
# CONFIG_DM_MQ_DEFAULT is not set
CONFIG_DM_CACHE_SMQ=m

% cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-4.15.0-tp520-btrfstrim+ root=UUID=[…] ro rootflags=subvol=debian resume=/dev/mapper/sata-swap init=/bin/systemd thinkpad_acpi.fan_control=1 systemd.restore_state=0 scsi_mod.use_blk_mq=1

% cat /sys/block/sda/queue/scheduler
[bfq] none

Thanks,
--
Martin