Re: [PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance via .host_tagset

2018-02-28 Thread Ming Lei
On Thu, Mar 01, 2018 at 10:54:17AM +0530, Kashyap Desai wrote:
> > -Original Message-
> > From: Laurence Oberman [mailto:lober...@redhat.com]
> > Sent: Wednesday, February 28, 2018 9:52 PM
> > To: Ming Lei; Kashyap Desai
> > Cc: Jens Axboe; linux-bl...@vger.kernel.org; Christoph Hellwig; Mike
> > Snitzer;
> > linux-scsi@vger.kernel.org; Hannes Reinecke; Arun Easi; Omar Sandoval;
> > Martin K . Petersen; James Bottomley; Christoph Hellwig; Don Brace; Peter
> > Rivera
> > Subject: Re: [PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance
> > via
> > .host_tagset
> >
> > On Wed, 2018-02-28 at 23:21 +0800, Ming Lei wrote:
> > > On Wed, Feb 28, 2018 at 08:28:48PM +0530, Kashyap Desai wrote:
> > > > Ming -
> > > >
> > > > Quick testing on my setup -  Performance slightly degraded (4-5%
> > > > drop)for megaraid_sas driver with this patch. (From 1610K IOPS it
> > > > goes to
> > > > 1544K)
> > > > I confirm that after applying this patch, we have #queue = #numa
> > > > node.
> > > >
> > > > ls -l
> > > > /sys/devices/pci:80/:80:02.0/:83:00.0/host10/target10:2
> > > > :23/10:
> > > > 2:23:0/block/sdy/mq
> > > > total 0
> > > > drwxr-xr-x. 18 root root 0 Feb 28 09:53 0 drwxr-xr-x. 18 root root 0
> > > > Feb 28 09:53 1
> > >
> > > OK, thanks for your test.
> > >
> > > As I mentioned to you, this patch should have improved performance on
> > > megaraid_sas, but the current slight degrade might be caused by
> > > scsi_host_queue_ready() in scsi_queue_rq(), I guess.
> > >
> > > With .host_tagset enabled and use per-numa-node hw queue, request can
> > > be queued to lld more frequently/quick than single queue, then the
> > > cost of
> > > atomic_inc_return(&host->host_busy) may be increased much meantime,
> > > think about millions of such operations, and finally slight IOPS drop
> > > is observed when the hw queue depth becomes half of .can_queue.
> > >
> > > >
> > > >
> > > > I would suggest to skip megaraid_sas driver changes using
> > > > shared_tagset until and unless there is obvious gain. If overall
> > > > interface of using shared_tagset is commit in kernel tree, we will
> > > > investigate (megaraid_sas
> > > > driver) in future about real benefit of using it.
> > >
> > > I'd suggest to not merge it until it is proved that performance can be
> > > improved in real device.
> 
> Noted.
> 
> > >
> > > I will try to work to remove the expensive atomic_inc_return(&host-
> > > >host_busy)
> > > from scsi_queue_rq(), since it isn't needed for SCSI_MQ, once it is
> > > done, will ask you to test again.
> 
> Ming - do you mean that after removing the host_busy accounting from
> scsi_queue_rq(), host_busy will still hold a correct value whenever an IO
> reaches the LLD?

The host queue depth is already respected by blk-mq before scsi_queue_rq()
is called, so there is no need to enforce it again in scsi_queue_rq(). The
counter is still needed by the error handler, though, so we have to figure
out a way to remove it without breaking error handling.

Also, the megaraid_sas driver needs to be checked for any host-wide lock
used in .queuecommand or in the completion path.
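
For context, the accounting in question lives in scsi_host_queue_ready(), which
scsi_queue_rq() calls for every request. A simplified sketch of the hot-path
part (abridged, not the exact mainline code; blocked/starved handling omitted):

#include <scsi/scsi_host.h>

/*
 * Simplified sketch of the per-host accounting done by scsi_host_queue_ready().
 * Every submission bounces the shared shost->host_busy cache line, which is
 * the cost referred to above once several per-NUMA-node hw queues submit in
 * parallel.
 */
static bool scsi_host_queue_ready_sketch(struct Scsi_Host *shost)
{
	unsigned int busy = atomic_inc_return(&shost->host_busy) - 1;

	if (shost->can_queue > 0 && busy >= shost->can_queue) {
		/* over the host depth: undo the increment and back off */
		atomic_dec(&shost->host_busy);
		return false;
	}
	return true;
}

With .host_tagset the shared tag map already bounds in-flight commands at
.can_queue before scsi_queue_rq() runs, so the increment mostly duplicates that
check on the submission path; the error handler, however, still reads host_busy,
which is the part that needs a replacement.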

Thanks,
Ming


Re: [PATCH] scsi: lpfc: Switch memcpy_fromio() to __ioread32_copy()

2018-02-28 Thread 陈华才
Ping?
 
-- Original --
From:  "Huacai Chen";
Date:  Fri, Jan 26, 2018 08:53 PM
To:  "James Smart"; "Dick 
Kennedy";
Cc:  "James E . J . Bottomley"; "Martin K . 
Petersen"; "Fuxin Zhang"; 
"linux-scsi"; "Huacai Chen"; 
"stable";
Subject: [PATCH] scsi: lpfc: Switch memcpy_fromio() to __ioread32_copy()
 
In commit bc73905abf770192 ("[SCSI] lpfc 8.3.16: SLI Additions, updates,
and code cleanup"), lpfc_memcpy_to_slim() was switched from memcpy_toio()
to __iowrite32_copy() in order to prevent unaligned 64 bit copies. Recently
we found that lpfc_memcpy_from_slim() has a similar issue, so switch it
from memcpy_fromio() to __ioread32_copy() as well.

Cc: sta...@vger.kernel.org
Signed-off-by: Huacai Chen 
---
 drivers/scsi/lpfc/lpfc_compat.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_compat.h b/drivers/scsi/lpfc/lpfc_compat.h
index 6b32b0a..47d4fad 100644
--- a/drivers/scsi/lpfc/lpfc_compat.h
+++ b/drivers/scsi/lpfc/lpfc_compat.h
@@ -91,8 +91,8 @@ lpfc_memcpy_to_slim( void __iomem *dest, void *src, unsigned int bytes)
 static inline void
 lpfc_memcpy_from_slim( void *dest, void __iomem *src, unsigned int bytes)
 {
-   /* actually returns 1 byte past dest */
-   memcpy_fromio( dest, src, bytes);
+   /* convert bytes in argument list to word count for copy function */
+   __ioread32_copy(dest, src, bytes / sizeof(uint32_t));
 }
 
 #endif /* __BIG_ENDIAN */
-- 
2.7.0
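
For readers unfamiliar with the helper: __ioread32_copy() lives in
lib/iomap_copy.c and copies 'count' 32-bit words out of MMIO space using 32-bit
reads only, which is why the byte count above is divided by sizeof(uint32_t).
A rough behavioural sketch (not the exact library code):

#include <linux/io.h>
#include <linux/types.h>

/*
 * Behavioural sketch of __ioread32_copy(): copy 'count' 32-bit words from
 * MMIO 'from' into regular memory 'to' using 32-bit accesses only, so the
 * SLIM region is never touched with 64-bit (or unaligned) operations.
 */
static void ioread32_copy_sketch(void *to, const void __iomem *from,
				 size_t count)
{
	u32 *dst = to;
	const u32 __iomem *src = from;
	size_t i;

	for (i = 0; i < count; i++)
		dst[i] = __raw_readl(src + i);
}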

RE: [PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance via .host_tagset

2018-02-28 Thread Kashyap Desai
> -Original Message-
> From: Laurence Oberman [mailto:lober...@redhat.com]
> Sent: Wednesday, February 28, 2018 9:52 PM
> To: Ming Lei; Kashyap Desai
> Cc: Jens Axboe; linux-bl...@vger.kernel.org; Christoph Hellwig; Mike
> Snitzer;
> linux-scsi@vger.kernel.org; Hannes Reinecke; Arun Easi; Omar Sandoval;
> Martin K . Petersen; James Bottomley; Christoph Hellwig; Don Brace; Peter
> Rivera
> Subject: Re: [PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance
> via
> .host_tagset
>
> On Wed, 2018-02-28 at 23:21 +0800, Ming Lei wrote:
> > On Wed, Feb 28, 2018 at 08:28:48PM +0530, Kashyap Desai wrote:
> > > Ming -
> > >
> > > Quick testing on my setup -  Performance slightly degraded (4-5%
> > > drop)for megaraid_sas driver with this patch. (From 1610K IOPS it
> > > goes to
> > > 1544K)
> > > I confirm that after applying this patch, we have #queue = #numa
> > > node.
> > >
> > > ls -l
> > > /sys/devices/pci:80/:80:02.0/:83:00.0/host10/target10:2
> > > :23/10:
> > > 2:23:0/block/sdy/mq
> > > total 0
> > > drwxr-xr-x. 18 root root 0 Feb 28 09:53 0 drwxr-xr-x. 18 root root 0
> > > Feb 28 09:53 1
> >
> > OK, thanks for your test.
> >
> > As I mentioned to you, this patch should have improved performance on
> > megaraid_sas, but the current slight degrade might be caused by
> > scsi_host_queue_ready() in scsi_queue_rq(), I guess.
> >
> > With .host_tagset enabled and use per-numa-node hw queue, request can
> > be queued to lld more frequently/quick than single queue, then the
> > cost of
> > atomic_inc_return(&host->host_busy) may be increased much meantime,
> > think about millions of such operations, and finally slight IOPS drop
> > is observed when the hw queue depth becomes half of .can_queue.
> >
> > >
> > >
> > > I would suggest to skip megaraid_sas driver changes using
> > > shared_tagset until and unless there is obvious gain. If overall
> > > interface of using shared_tagset is commit in kernel tree, we will
> > > investigate (megaraid_sas
> > > driver) in future about real benefit of using it.
> >
> > I'd suggest to not merge it until it is proved that performance can be
> > improved in real device.

Noted.

> >
> > I will try to work to remove the expensive atomic_inc_return(&host-
> > >host_busy)
> > from scsi_queue_rq(), since it isn't needed for SCSI_MQ, once it is
> > done, will ask you to test again.

Ming - do you mean that after removing the host_busy accounting from
scsi_queue_rq(), host_busy will still hold a correct value whenever an IO
reaches the LLD?

> >
> >
> > Thanks,
> > Ming
>
> I will test this here as well
> I just put the Megaraid card in to my system here
>
> Kashyap, do you have ssd's on the back-end and are you you using jbods or
> virtual devices. Let me have your config.
> I only have 6G sas shelves though.

Laurence -
I am using 12 SSD drives in JBOD mode or single-drive R0 mode. A single SSD
is capable of ~138K IOPS (4K RR).
With all 12 SSDs, performance scales linearly and goes up to ~1610K IOPS.

I think with 6G SAS fully loaded you may need more drives to reach 1600K
IOPS (a sequential load with nomerges=2 is required on HDDs to avoid IO
merging at the block layer).

The SSD model I am using is HGST "HUSMH8020BSS200".
Here is the lscpu output of my setup:

lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
Stepping:              1
CPU MHz:               1726.217
BogoMIPS:              4199.37
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              20480K
NUMA node0 CPU(s):     0-7,16-23
NUMA node1 CPU(s):     8-15,24-31

>
> Regards
> Laurence


Re: [PATCH V2] scsi: core: use blk_mq_requeue_request in __scsi_queue_insert

2018-02-28 Thread jianchao.wang
Hi Bart

Thank you for taking the time to review this and for the kind, detailed response.

On 03/01/2018 01:52 AM, Bart Van Assche wrote:
> On Wed, 2018-02-28 at 16:55 +0800, Jianchao Wang wrote:
>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>> index a86df9c..6fa7b0c 100644
>> --- a/drivers/scsi/scsi_lib.c
>> +++ b/drivers/scsi/scsi_lib.c
>> @@ -191,7 +191,8 @@ static void __scsi_queue_insert(struct scsi_cmnd *cmd, 
>> int reason, bool unbusy)
>>   */
>>  cmd->result = 0;
>>  if (q->mq_ops) {
>> -scsi_mq_requeue_cmd(cmd);
>> +blk_mq_requeue_request(cmd->request, true);
>> +put_device(&device->sdev_gendev);
>>  return;
>>  }
>>  spin_lock_irqsave(q->queue_lock, flags);
> 
> Anyone who sees the put_device() call that follows the 
> blk_mq_requeue_request()
> call will wonder why that call occurs there. So I think we need a comment 
> above
> that call that explains where the matching get_device() call is.

Yes, I will add this.

> For the legacy code path, there is a get_device() call in scsi_prep_fn() but 
> no
> put_device() call in scsi_unprep_fn() - the matching put_device() calls occur 
> in
> scsi_end_request() and after blk_requeue_request().
> 
> For scsi-mq however there is a get_device() call in scsi_mq_get_budget() and a
> put_device() call in scsi_mq_put_budget(). So why do we need the put_device()
> calls after blk_mq_requeue_request() and in the mq path for 
> scsi_end_request()?
> 

From the source code, we know that scsi_mq_get_budget() is invoked every time
a request is issued, but scsi_mq_put_budget() is only called on failure paths.

scsi_queue_rq // if any error
  -> scsi_mq_put_budget

blk_mq_dispatch_rq_list // if no driver tags
  -> blk_mq_put_dispatch_budget
    -> scsi_mq_put_budget

blk_mq_do_dispatch_sched/blk_mq_do_dispatch_ctx // if no requests
  -> blk_mq_put_dispatch_budget
    -> scsi_mq_put_budget

So we have to add the put_device() after blk_mq_requeue_request() and in
scsi_end_request() to match the get_device() taken in scsi_mq_get_budget().
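
To make that pairing explicit at the call site, the hunk could carry a comment
along these lines (only a sketch of what a V3 might look like, not the final
wording):

	cmd->result = 0;
	if (q->mq_ops) {
		/*
		 * Pairs with the get_device(&sdev->sdev_gendev) taken in
		 * scsi_mq_get_budget(): requests that are requeued here (or
		 * completed via scsi_end_request()) never reach
		 * scsi_mq_put_budget(), so drop the reference explicitly.
		 */
		blk_mq_requeue_request(cmd->request, true);
		put_device(&device->sdev_gendev);
		return;
	}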

Thanks
Jianchao


Re: [PATCH V2] scsi: core: use blk_mq_requeue_request in __scsi_queue_insert

2018-02-28 Thread Bart Van Assche
On Wed, 2018-02-28 at 16:55 +0800, Jianchao Wang wrote:
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index a86df9c..6fa7b0c 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -191,7 +191,8 @@ static void __scsi_queue_insert(struct scsi_cmnd *cmd, 
> int reason, bool unbusy)
>*/
>   cmd->result = 0;
>   if (q->mq_ops) {
> - scsi_mq_requeue_cmd(cmd);
> + blk_mq_requeue_request(cmd->request, true);
> + put_device(&device->sdev_gendev);
>   return;
>   }
>   spin_lock_irqsave(q->queue_lock, flags);

Anyone who sees the put_device() call that follows the blk_mq_requeue_request()
call will wonder why that call occurs there. So I think we need a comment above
that call that explains where the matching get_device() call is.

For the legacy code path, there is a get_device() call in scsi_prep_fn() but no
put_device() call in scsi_unprep_fn() - the matching put_device() calls occur in
scsi_end_request() and after blk_requeue_request().

For scsi-mq however there is a get_device() call in scsi_mq_get_budget() and a
put_device() call in scsi_mq_put_budget(). So why do we need the put_device()
calls after blk_mq_requeue_request() and in the mq path for scsi_end_request()?

Thanks,

Bart.

Re: [PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance via .host_tagset

2018-02-28 Thread Laurence Oberman
On Wed, 2018-02-28 at 23:21 +0800, Ming Lei wrote:
> On Wed, Feb 28, 2018 at 08:28:48PM +0530, Kashyap Desai wrote:
> > Ming -
> > 
> > Quick testing on my setup -  Performance slightly degraded (4-5%
> > drop)for
> > megaraid_sas driver with this patch. (From 1610K IOPS it goes to
> > 1544K)
> > I confirm that after applying this patch, we have #queue = #numa
> > node.
> > 
> > ls -l
> > /sys/devices/pci:80/:80:02.0/:83:00.0/host10/target10:2
> > :23/10:
> > 2:23:0/block/sdy/mq
> > total 0
> > drwxr-xr-x. 18 root root 0 Feb 28 09:53 0
> > drwxr-xr-x. 18 root root 0 Feb 28 09:53 1
> 
> OK, thanks for your test.
> 
> As I mentioned to you, this patch should have improved performance on
> megaraid_sas, but the current slight degrade might be caused by
> scsi_host_queue_ready() in scsi_queue_rq(), I guess.
> 
> With .host_tagset enabled and use per-numa-node hw queue, request can
> be
> queued to lld more frequently/quick than single queue, then the cost
> of
> atomic_inc_return(&host->host_busy) may be increased much meantime,
> think about millions of such operations, and finally slight IOPS drop
> is observed when the hw queue depth becomes half of .can_queue.
> 
> > 
> > 
> > I would suggest to skip megaraid_sas driver changes using
> > shared_tagset
> > until and unless there is obvious gain. If overall interface of
> > using
> > shared_tagset is commit in kernel tree, we will investigate
> > (megaraid_sas
> > driver) in future about real benefit of using it.
> 
> I'd suggest to not merge it until it is proved that performance can
> be
> improved in real device.
> 
> I will try to work to remove the expensive atomic_inc_return(&host-
> >host_busy)
> from scsi_queue_rq(), since it isn't needed for SCSI_MQ, once it is
> done, will
> ask you to test again.
> 
> 
> Thanks,
> Ming

I will test this here as well.
I just put the MegaRAID card into my system here.

Kashyap, do you have SSDs on the back end, and are you using JBODs
or virtual devices? Let me have your config.
I only have 6G SAS shelves though.

Regards
Laurence


Re: [PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance via .host_tagset

2018-02-28 Thread Ming Lei
On Wed, Feb 28, 2018 at 08:28:48PM +0530, Kashyap Desai wrote:
> Ming -
> 
> Quick testing on my setup -  Performance slightly degraded (4-5% drop)for
> megaraid_sas driver with this patch. (From 1610K IOPS it goes to 1544K)
> I confirm that after applying this patch, we have #queue = #numa node.
> 
> ls -l
> /sys/devices/pci:80/:80:02.0/:83:00.0/host10/target10:2:23/10:
> 2:23:0/block/sdy/mq
> total 0
> drwxr-xr-x. 18 root root 0 Feb 28 09:53 0
> drwxr-xr-x. 18 root root 0 Feb 28 09:53 1

OK, thanks for your test.

As I mentioned to you, this patch should have improved performance on
megaraid_sas, but I guess the current slight degradation might be caused by
scsi_host_queue_ready() in scsi_queue_rq().

With .host_tagset enabled and one hw queue per NUMA node, requests can be
queued to the LLD more frequently/quickly than with a single queue, so the
cost of atomic_inc_return(&host->host_busy) may increase considerably.
Multiply that by millions of such operations and a slight IOPS drop is
observed once the hw queue depth becomes half of .can_queue.

> 
> 
> I would suggest to skip megaraid_sas driver changes using shared_tagset
> until and unless there is obvious gain. If overall interface of using
> shared_tagset is commit in kernel tree, we will investigate (megaraid_sas
> driver) in future about real benefit of using it.

I'd suggest not merging it until it is proven that performance can be
improved on a real device.

I will try to remove the expensive atomic_inc_return(&host->host_busy)
from scsi_queue_rq(), since it isn't needed for SCSI_MQ; once that is done,
I will ask you to test again.


Thanks,
Ming


RE: [PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance via .host_tagset

2018-02-28 Thread Kashyap Desai
Ming -

Quick testing on my setup - performance slightly degraded (4-5% drop) for
the megaraid_sas driver with this patch (from 1610K IOPS it drops to 1544K).
I confirm that after applying this patch, we have #queues = #numa nodes.

ls -l
/sys/devices/pci:80/:80:02.0/:83:00.0/host10/target10:2:23/10:
2:23:0/block/sdy/mq
total 0
drwxr-xr-x. 18 root root 0 Feb 28 09:53 0
drwxr-xr-x. 18 root root 0 Feb 28 09:53 1


I would suggest skipping the megaraid_sas driver changes that use
shared_tagset unless and until there is an obvious gain. If the overall
shared_tagset interface is committed to the kernel tree, we will investigate
the real benefit of using it for the megaraid_sas driver in the future.

Without patch -

  4.64%  [megaraid_sas]  [k] complete_cmd_fusion
  3.23%  [kernel]        [k] irq_entries_start
  3.18%  [kernel]        [k] _raw_spin_lock
  3.06%  [kernel]        [k] syscall_return_via_sysret
  2.74%  [kernel]        [k] bt_iter
  2.55%  [kernel]        [k] scsi_queue_rq
  2.21%  [megaraid_sas]  [k] megasas_build_io_fusion
  1.80%  [megaraid_sas]  [k] megasas_queue_command
  1.59%  [kernel]        [k] __audit_syscall_exit
  1.55%  [kernel]        [k] _raw_spin_lock_irqsave
  1.38%  [megaraid_sas]  [k] megasas_build_and_issue_cmd_fusion
  1.34%  [kernel]        [k] do_io_submit
  1.33%  [kernel]        [k] gup_pgd_range
  1.26%  [kernel]        [k] scsi_softirq_done
  1.20%  fio             [.] __fio_gettime
  1.20%  [kernel]        [k] switch_mm_irqs_off
  1.00%  [megaraid_sas]  [k] megasas_build_ldio_fusion
  0.97%  fio             [.] get_io_u
  0.89%  [kernel]        [k] lookup_ioctx
  0.80%  [kernel]        [k] scsi_dec_host_busy
  0.78%  [kernel]        [k] blkdev_direct_IO
  0.78%  [megaraid_sas]  [k] MR_GetPhyParams
  0.73%  [kernel]        [k] aio_read_events
  0.70%  [megaraid_sas]  [k] MR_BuildRaidContext
  0.64%  [kernel]        [k] blk_mq_complete_request
  0.64%  fio             [.] thread_main
  0.63%  [kernel]        [k] blk_queue_split
  0.63%  [kernel]        [k] blk_mq_get_request
  0.61%  [kernel]        [k] read_tsc
  0.59%  [kernel]        [k] kmem_cache_a


With patch -

  4.36%  [megaraid_sas]  [k] complete_cmd_fusion
  3.24%  [kernel]        [k] irq_entries_start
  3.00%  [kernel]        [k] syscall_return_via_sysret
  2.41%  [kernel]        [k] scsi_queue_rq
  2.41%  [kernel]        [k] _raw_spin_lock
  2.22%  [megaraid_sas]  [k] megasas_build_io_fusion
  1.92%  [kernel]        [k] bt_iter
  1.74%  [megaraid_sas]  [k] megasas_queue_command
  1.48%  [kernel]        [k] gup_pgd_range
  1.44%  [kernel]        [k] __audit_syscall_exit
  1.33%  [megaraid_sas]  [k] megasas_build_and_issue_cmd_fusion
  1.29%  [kernel]        [k] _raw_spin_lock_irqsave
  1.25%  fio             [.] get_io_u
  1.24%  fio             [.] __fio_gettime
  1.22%  [kernel]        [k] do_io_submit
  1.18%  [megaraid_sas]  [k] megasas_build_ldio_fusion
  1.02%  [kernel]        [k] blk_mq_get_request
  0.91%  [kernel]        [k] lookup_ioctx
  0.91%  [kernel]        [k] scsi_softirq_done
  0.88%  [kernel]        [k] scsi_dec_host_busy
  0.87%  [kernel]        [k] blkdev_direct_IO
  0.77%  [megaraid_sas]  [k] MR_BuildRaidContext
  0.76%  [megaraid_sas]  [k] MR_GetPhyParams
  0.73%  [kernel]        [k] __fget
  0.70%  [kernel]        [k] switch_mm_irqs_off
  0.70%  fio             [.] thread_main
  0.69%  [kernel]        [k] aio_read_events
  0.68%  [kernel]        [k] note_interrupt
  0.65%  [kernel]        [k] do_syscal

Kashyap

> -Original Message-
> From: Ming Lei [mailto:ming@redhat.com]
> Sent: Tuesday, February 27, 2018 3:38 PM
> To: Jens Axboe; linux-bl...@vger.kernel.org; Christoph Hellwig; Mike
Snitzer
> Cc: linux-scsi@vger.kernel.org; Hannes Reinecke; Arun Easi; Omar
Sandoval;
> Martin K . Petersen; James Bottomley; Christoph Hellwig; Don Brace;
Kashyap
> Desai; Peter Rivera; Laurence Oberman; Ming Lei
> Subject: [PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance via
> .host_tagset
>
> It is observed on null_blk that IOPS can be improved much by simply
making
> hw queue per NUMA node, so this patch applies the introduced
.host_tagset
> for improving performance.
>
> In reality, .can_queue is quite big, and NUMA node number is often
small, so
> each hw queue's depth should be high enough to saturate device.
>
> Cc: Arun Easi 
> Cc: Omar Sandoval ,
> Cc: "Martin K. Petersen" ,
> Cc: James Bottomley ,
> Cc: Christoph 
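
For readers skimming the quoted description above: the opt-in the series
proposes amounts, roughly, to marking the host template and sizing nr_hw_queues
by the NUMA node count. A hypothetical sketch (the .host_tagset flag and the
depth value follow the series' description and are assumptions, not code taken
from the actual megaraid_sas patch):

#include <linux/nodemask.h>
#include <scsi/scsi_host.h>

/*
 * Hypothetical sketch of a driver opting in to the proposed .host_tagset
 * interface: one hw queue per NUMA node, with all hw queues sharing the
 * single host-wide tag space of .can_queue tags.  Names and values are
 * illustrative only.
 */
static struct scsi_host_template example_sht = {
	.name		= "example_host_tagset",
	.can_queue	= 4096,		/* example host-wide depth */
	.host_tagset	= 1,		/* flag proposed by this series */
};

static void example_setup_hw_queues(struct Scsi_Host *shost)
{
	/* one hw queue per NUMA node, as described above */
	shost->nr_hw_queues = num_online_nodes();
}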

RE: [PATCH] mpt3sas: Do not mark fw_event workqueue as WQ_MEM_RECLAIM

2018-02-28 Thread Sreekanth Reddy
-Original Message-
From: Hannes Reinecke [mailto:h...@suse.de]
Sent: Monday, February 26, 2018 7:56 PM
To: Martin K. Petersen
Cc: Christoph Hellwig; James Bottomley; linux-scsi@vger.kernel.org; Hannes
Reinecke; Sreekanth Reddy; Suganath Prabu Subramani; Hannes Reinecke
Subject: [PATCH] mpt3sas: Do not mark fw_event workqueue as WQ_MEM_RECLAIM

The firmware event workqueue should not be marked as WQ_MEM_RECLAIM, as it
doesn't need to make forward progress under memory pressure.
In the current state it will result in a deadlock if the device has been
forcefully removed.

Cc: Sreekanth Reddy 
Cc: Suganath Prabu Subramani 
Signed-off-by: Hannes Reinecke 
---
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 74fca184dba9..b1d3218a7ad7 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -10547,7 +10547,7 @@ _scsih_probe(struct pci_dev *pdev, const struct pci_device_id *id)
snprintf(ioc->firmware_event_name,
sizeof(ioc->firmware_event_name),
"fw_event_%s%d", ioc->driver_name, ioc->id);
ioc->firmware_event_thread = alloc_ordered_workqueue(
-   ioc->firmware_event_name, WQ_MEM_RECLAIM);
+   ioc->firmware_event_name, 0);
if (!ioc->firmware_event_thread) {
pr_err(MPT3SAS_FMT "failure at %s:%d/%s()!\n",
ioc->name, __FILE__, __LINE__, __func__);
--
2.12.3

Acked-by: Sreekanth Reddy 

Thanks,
Sreekanth


[PATCH V2] scsi: core: use blk_mq_requeue_request in __scsi_queue_insert

2018-02-28 Thread Jianchao Wang
In the SCSI core, __scsi_queue_insert() should just put the request back on
the queue and retry using the same command as before. However, for blk-mq,
scsi_mq_requeue_cmd() is employed here, which unprepares the request. To
align with the semantics of __scsi_queue_insert(), use blk_mq_requeue_request()
with kick_requeue_list == true and drop the reference to the scsi_device.

V1 -> V2:
 - add put_device on scsi_device->sdev_gendev

Cc: Christoph Hellwig 
Signed-off-by: Jianchao Wang 
---
 drivers/scsi/scsi_lib.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index a86df9c..6fa7b0c 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -191,7 +191,8 @@ static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, bool unbusy)
 */
cmd->result = 0;
if (q->mq_ops) {
-   scsi_mq_requeue_cmd(cmd);
+   blk_mq_requeue_request(cmd->request, true);
+   put_device(&device->sdev_gendev);
return;
}
spin_lock_irqsave(q->queue_lock, flags);
-- 
2.7.4



Re: [PATCH] scsi: libsas: defer ata device eh commands to libata

2018-02-28 Thread Jason Yan


On 2018/2/28 16:09, John Garry wrote:

On 28/02/2018 04:14, Jason Yan wrote:



On 2018/2/27 23:00, Jack Wang wrote:

2018-02-27 12:50 GMT+01:00 John Garry :

On 27/02/2018 06:59, Jason Yan wrote:


When ata device doing EH, some commands still attached with tasks are not
passed to libata when abort failed or recover failed, so libata did not
handle these commands. After these commands done, sas task is freed, but
ata qc is not freed. This will cause ata qc leak and trigger a warning
like below:



It seems like a bug that we're just killing the ATA command in libsas
error handling and not deferring them to ATA EH also.

And this WARN, below, in ata_eh_finish() is a longterm issue I see (but
maybe because of other issue also).

As mentioned to Jason privately, I wonder why Dan's patch excluded the
change here:
commit 3944f50995f947558c35fb16ae0288354756762c
Author: Dan Williams 
Date:   Tue Nov 29 12:08:50 2011 -0800

 [SCSI] libsas: let libata handle command timeouts

 libsas-eh if it successfully aborts an ata command will hide the timeout
 condition (AC_ERR_TIMEOUT) from libata.  The command likely completes
 with the all-zero task->task_status it started with.  Instead, interpret
 a TMF_RESP_FUNC_COMPLETE as the end of the sas_task but keep the scmd
 around for libata-eh to handle.

 Tested-by: Andrzej Jakowski 
 Signed-off-by: Dan Williams 
 Signed-off-by: James Bottomley 




WARNING: CPU: 0 PID: 28512 at drivers/ata/libata-eh.c:4037
ata_eh_finish+0xb4/0xcc
CPU: 0 PID: 28512 Comm: kworker/u32:2 Tainted: G W  OE 4.14.0#1
..
Call trace:
[] ata_eh_finish+0xb4/0xcc
[] ata_do_eh+0xc4/0xd8
[] ata_std_error_handler+0x44/0x8c
[] ata_scsi_port_error_handler+0x480/0x694
[] async_sas_ata_eh+0x4c/0x80
[] async_run_entry_fn+0x4c/0x170
[] process_one_work+0x144/0x390
[] worker_thread+0x144/0x418
[] kthread+0x10c/0x138
[] ret_from_fork+0x10/0x18

Hi John, hi Jason,

We've seen this warning once on pm80xx with sata SSD in production (on
3.12 kernel), but failed to see the root cause.
In my case, it's a chain sequence, one SSD failed, lead to error
handle  & IO stuck.

Do you have reproducer?



I have found this warning several times when our test team running a
very large test suite. Sorry to say that I do not have a simple
reproducer yet.



Typically we would see this when commands timeout after we unplug a SATA
disk with IO in flight, right?



Yes, that's true.


John


Your change looks good to me, but would be good to hear from Dan &
James.

Thanks,
Jack


Re: [PATCH] scsi: libsas: defer ata device eh commands to libata

2018-02-28 Thread John Garry

On 28/02/2018 04:14, Jason Yan wrote:



On 2018/2/27 23:00, Jack Wang wrote:

2018-02-27 12:50 GMT+01:00 John Garry :

On 27/02/2018 06:59, Jason Yan wrote:


When ata device doing EH, some commands still attached with tasks are not
passed to libata when abort failed or recover failed, so libata did not
handle these commands. After these commands done, sas task is freed, but
ata qc is not freed. This will cause ata qc leak and trigger a warning
like below:



It seems like a bug that we're just killing the ATA command in libsas
error handling and not deferring them to ATA EH also.

And this WARN, below, in ata_eh_finish() is a longterm issue I see (but
maybe because of other issue also).

As mentioned to Jason privately, I wonder why Dan's patch excluded the
change here:
commit 3944f50995f947558c35fb16ae0288354756762c
Author: Dan Williams 
Date:   Tue Nov 29 12:08:50 2011 -0800

 [SCSI] libsas: let libata handle command timeouts

 libsas-eh if it successfully aborts an ata command will hide the timeout
 condition (AC_ERR_TIMEOUT) from libata.  The command likely completes
 with the all-zero task->task_status it started with.  Instead, interpret
 a TMF_RESP_FUNC_COMPLETE as the end of the sas_task but keep the scmd
 around for libata-eh to handle.

 Tested-by: Andrzej Jakowski 
 Signed-off-by: Dan Williams 
 Signed-off-by: James Bottomley 




WARNING: CPU: 0 PID: 28512 at drivers/ata/libata-eh.c:4037
ata_eh_finish+0xb4/0xcc
CPU: 0 PID: 28512 Comm: kworker/u32:2 Tainted: G W  OE 4.14.0#1
..
Call trace:
[] ata_eh_finish+0xb4/0xcc
[] ata_do_eh+0xc4/0xd8
[] ata_std_error_handler+0x44/0x8c
[] ata_scsi_port_error_handler+0x480/0x694
[] async_sas_ata_eh+0x4c/0x80
[] async_run_entry_fn+0x4c/0x170
[] process_one_work+0x144/0x390
[] worker_thread+0x144/0x418
[] kthread+0x10c/0x138
[] ret_from_fork+0x10/0x18

Hi John, hi Jason,

We've seen this warning once on pm80xx with sata SSD in production (on
3.12 kernel), but failed to see the root cause.
In my case, it's a chain sequence, one SSD failed, lead to error
handle  & IO stuck.

Do you have reproducer?



I have found this warning several times when our test team running a
very large test suite. Sorry to say that I do not have a simple
reproducer yet.



Typically we would see this when commands timeout after we unplug a SATA 
disk with IO in flight, right?


John


Your change looks good to me, but would be good to hear from Dan & James.

Thanks,
Jack