Re: [PATCH 13/14] scsi: add support for a blk-mq based I/O path.

2014-08-19 Thread Kashyap Desai
On Tue, Aug 19, 2014 at 3:51 AM, Kashyap Desai
kashyap.de...@avagotech.com wrote:

  -Original Message-
  From: linux-scsi-ow...@vger.kernel.org [mailto:linux-scsi-
  ow...@vger.kernel.org] On Behalf Of Christoph Hellwig
  Sent: Friday, July 18, 2014 3:43 PM
  To: James Bottomley; linux-scsi@vger.kernel.org
  Cc: Jens Axboe; Bart Van Assche; Mike Christie; Martin K. Petersen;
 Robert
  Elliott; Webb Scales; linux-ker...@vger.kernel.org
  Subject: [PATCH 13/14] scsi: add support for a blk-mq based I/O path.
 
  This patch adds support for an alternate I/O path in the scsi midlayer
 which
  uses the blk-mq infrastructure instead of the legacy request code.
 
  Use of blk-mq is fully transparent to drivers, although for now a host
  template field is provided to opt out of blk-mq usage in case any
 unforseen
  incompatibilities arise.
 
  In general replacing the legacy request code with blk-mq is a simple and
  mostly mechanical transformation.  The biggest exception is the new code
  that deals with the fact the I/O submissions in blk-mq must happen from
  process context, which slightly complicates the I/O completion handler.
  The second biggest differences is that blk-mq is build around the
 concept of
  preallocated requests that also include driver specific data, which in
 SCSI
  context means the scsi_cmnd structure.  This completely avoids dynamic
  memory allocations for the fast path through I/O submission.
 
  Due the preallocated requests the MQ code path exclusively uses the
 host-
  wide shared tag allocator instead of a per-LUN one.  This only affects
 drivers
  actually using the block layer provided tag allocator instead of their
 own.
  Unlike the old path blk-mq always provides a tag, although drivers don't
 have
  to use it.
 
  For now the blk-mq path is disable by defauly and must be enabled using
 the
  use_blk_mq module parameter.  Once the remaining work in the block
  layer to make blk-mq more suitable for slow devices is complete I hope
 to
  make it the default and eventually even remove the old code path.
 
  Based on the earlier scsi-mq prototype by Nicholas Bellinger.
 
  Thanks to Bart Van Assche and Robert Elliot for testing, benchmarking
 and
  various sugestions and code contributions.
 
  Signed-off-by: Christoph Hellwig h...@lst.de
  Reviewed-by: Hannes Reinecke h...@suse.de
  Reviewed-by: Webb Scales web...@hp.com
  Acked-by: Jens Axboe ax...@kernel.dk
  Tested-by: Bart Van Assche bvanass...@acm.org
  Tested-by: Robert Elliott elli...@hp.com
  ---
   drivers/scsi/hosts.c  |  35 +++-
   drivers/scsi/scsi.c   |   5 +-
   drivers/scsi/scsi_lib.c   | 464
  --
   drivers/scsi/scsi_priv.h  |   3 +
   drivers/scsi/scsi_scan.c  |   5 +-
   drivers/scsi/scsi_sysfs.c |   2 +
   include/scsi/scsi_host.h  |  18 +-
   include/scsi/scsi_tcq.h   |  28 ++-
   8 files changed, 488 insertions(+), 72 deletions(-)
 
  diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c index
 0632eee..6de80e3
  100644
  --- a/drivers/scsi/hosts.c
  +++ b/drivers/scsi/hosts.c
  @@ -213,9 +213,24 @@ int scsi_add_host_with_dma(struct Scsi_Host
  *shost, struct device *dev,
goto fail;
}
 
  + if (shost_use_blk_mq(shost)) {
  + error = scsi_mq_setup_tags(shost);
  + if (error)
  + goto fail;
  + }
  +
  + /*
  +  * Note that we allocate the freelist even for the MQ case for
 now,
  +  * as we need a command set aside for scsi_reset_provider.  Having
  +  * the full host freelist and one command available for that is a
  +  * little heavy-handed, but avoids introducing a special allocator
  +  * just for this.  Eventually the structure of scsi_reset_provider
  +  * will need a major overhaul.
  +  */
error = scsi_setup_command_freelist(shost);
if (error)
  - goto fail;
  + goto out_destroy_tags;
  +
 
if (!shost-shost_gendev.parent)
shost-shost_gendev.parent = dev ? dev : platform_bus;
  @@ -226,7 +241,7 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost,
  struct device *dev,
 
error = device_add(shost-shost_gendev);
if (error)
  - goto out;
  + goto out_destroy_freelist;
 
pm_runtime_set_active(shost-shost_gendev);
pm_runtime_enable(shost-shost_gendev);
  @@ -279,8 +294,11 @@ int scsi_add_host_with_dma(struct Scsi_Host
  *shost, struct device *dev,
device_del(shost-shost_dev);
out_del_gendev:
device_del(shost-shost_gendev);
  - out:
  + out_destroy_freelist:
scsi_destroy_command_freelist(shost);
  + out_destroy_tags:
  + if (shost_use_blk_mq(shost))
  + scsi_mq_destroy_tags(shost);
fail:
return error;
   }
  @@ -309,8 +327,13 @@ static void scsi_host_dev_release(struct device
  *dev)
}
 
scsi_destroy_command_freelist(shost);
  - if (shost-bqt)
  

Re: [PATCH 13/14] scsi: add support for a blk-mq based I/O path.

2014-08-19 Thread Christoph Hellwig
On Tue, Aug 19, 2014 at 03:51:42AM +0530, Kashyap Desai wrote:
 I read  this comment and find that very few drivers are using this
 cmd_list.  I think if we remove this cmd_list, performance will scale as I
 am seeing major contention in this lock.
 Just thought to ping you to see if this is known limitation for now or any
 plan to change this lock in near future ?

Removing the lock entirely and pushing the list into the two drivers
using it is on my TODO list.  Bart actually suggested keeping the code in the
SCSI core and having a flag to enabled.  Given that I'm too busy to get my full
version done in time, it might be a good idea if someone picks up Barts
idea.  Can you send me a patch to add a enable_cmd_list flag to the host
template and only enable it for aacraid and dpt_i2o?

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 13/14] scsi: add support for a blk-mq based I/O path.

2014-08-19 Thread Kashyap Desai
On Tue, Aug 19, 2014 at 9:36 PM, Christoph Hellwig h...@lst.de wrote:
 On Tue, Aug 19, 2014 at 03:51:42AM +0530, Kashyap Desai wrote:
 I read  this comment and find that very few drivers are using this
 cmd_list.  I think if we remove this cmd_list, performance will scale as I
 am seeing major contention in this lock.
 Just thought to ping you to see if this is known limitation for now or any
 plan to change this lock in near future ?

 Removing the lock entirely and pushing the list into the two drivers
 using it is on my TODO list.  Bart actually suggested keeping the code in the
 SCSI core and having a flag to enabled.  Given that I'm too busy to get my 
 full
 version done in time, it might be a good idea if someone picks up Barts
 idea.  Can you send me a patch to add a enable_cmd_list flag to the host
 template and only enable it for aacraid and dpt_i2o?


Sure. I will work on relevant code change and will post patch for review.

-- 
Device Driver Developer @ Avagotech
Kashyap D. Desai
Note - my new email address
kashyap.de...@avagotech.com
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 13/14] scsi: add support for a blk-mq based I/O path.

2014-08-18 Thread Kashyap Desai
 -Original Message-
 From: linux-scsi-ow...@vger.kernel.org [mailto:linux-scsi-
 ow...@vger.kernel.org] On Behalf Of Christoph Hellwig
 Sent: Friday, July 18, 2014 3:43 PM
 To: James Bottomley; linux-scsi@vger.kernel.org
 Cc: Jens Axboe; Bart Van Assche; Mike Christie; Martin K. Petersen;
Robert
 Elliott; Webb Scales; linux-ker...@vger.kernel.org
 Subject: [PATCH 13/14] scsi: add support for a blk-mq based I/O path.

 This patch adds support for an alternate I/O path in the scsi midlayer
which
 uses the blk-mq infrastructure instead of the legacy request code.

 Use of blk-mq is fully transparent to drivers, although for now a host
 template field is provided to opt out of blk-mq usage in case any
unforseen
 incompatibilities arise.

 In general replacing the legacy request code with blk-mq is a simple and
 mostly mechanical transformation.  The biggest exception is the new code
 that deals with the fact the I/O submissions in blk-mq must happen from
 process context, which slightly complicates the I/O completion handler.
 The second biggest differences is that blk-mq is build around the
concept of
 preallocated requests that also include driver specific data, which in
SCSI
 context means the scsi_cmnd structure.  This completely avoids dynamic
 memory allocations for the fast path through I/O submission.

 Due the preallocated requests the MQ code path exclusively uses the
host-
 wide shared tag allocator instead of a per-LUN one.  This only affects
drivers
 actually using the block layer provided tag allocator instead of their
own.
 Unlike the old path blk-mq always provides a tag, although drivers don't
have
 to use it.

 For now the blk-mq path is disable by defauly and must be enabled using
the
 use_blk_mq module parameter.  Once the remaining work in the block
 layer to make blk-mq more suitable for slow devices is complete I hope
to
 make it the default and eventually even remove the old code path.

 Based on the earlier scsi-mq prototype by Nicholas Bellinger.

 Thanks to Bart Van Assche and Robert Elliot for testing, benchmarking
and
 various sugestions and code contributions.

 Signed-off-by: Christoph Hellwig h...@lst.de
 Reviewed-by: Hannes Reinecke h...@suse.de
 Reviewed-by: Webb Scales web...@hp.com
 Acked-by: Jens Axboe ax...@kernel.dk
 Tested-by: Bart Van Assche bvanass...@acm.org
 Tested-by: Robert Elliott elli...@hp.com
 ---
  drivers/scsi/hosts.c  |  35 +++-
  drivers/scsi/scsi.c   |   5 +-
  drivers/scsi/scsi_lib.c   | 464
 --
  drivers/scsi/scsi_priv.h  |   3 +
  drivers/scsi/scsi_scan.c  |   5 +-
  drivers/scsi/scsi_sysfs.c |   2 +
  include/scsi/scsi_host.h  |  18 +-
  include/scsi/scsi_tcq.h   |  28 ++-
  8 files changed, 488 insertions(+), 72 deletions(-)

 diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c index
0632eee..6de80e3
 100644
 --- a/drivers/scsi/hosts.c
 +++ b/drivers/scsi/hosts.c
 @@ -213,9 +213,24 @@ int scsi_add_host_with_dma(struct Scsi_Host
 *shost, struct device *dev,
   goto fail;
   }

 + if (shost_use_blk_mq(shost)) {
 + error = scsi_mq_setup_tags(shost);
 + if (error)
 + goto fail;
 + }
 +
 + /*
 +  * Note that we allocate the freelist even for the MQ case for
now,
 +  * as we need a command set aside for scsi_reset_provider.  Having
 +  * the full host freelist and one command available for that is a
 +  * little heavy-handed, but avoids introducing a special allocator
 +  * just for this.  Eventually the structure of scsi_reset_provider
 +  * will need a major overhaul.
 +  */
   error = scsi_setup_command_freelist(shost);
   if (error)
 - goto fail;
 + goto out_destroy_tags;
 +

   if (!shost-shost_gendev.parent)
   shost-shost_gendev.parent = dev ? dev : platform_bus;
 @@ -226,7 +241,7 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost,
 struct device *dev,

   error = device_add(shost-shost_gendev);
   if (error)
 - goto out;
 + goto out_destroy_freelist;

   pm_runtime_set_active(shost-shost_gendev);
   pm_runtime_enable(shost-shost_gendev);
 @@ -279,8 +294,11 @@ int scsi_add_host_with_dma(struct Scsi_Host
 *shost, struct device *dev,
   device_del(shost-shost_dev);
   out_del_gendev:
   device_del(shost-shost_gendev);
 - out:
 + out_destroy_freelist:
   scsi_destroy_command_freelist(shost);
 + out_destroy_tags:
 + if (shost_use_blk_mq(shost))
 + scsi_mq_destroy_tags(shost);
   fail:
   return error;
  }
 @@ -309,8 +327,13 @@ static void scsi_host_dev_release(struct device
 *dev)
   }

   scsi_destroy_command_freelist(shost);
 - if (shost-bqt)
 - blk_free_tags(shost-bqt);
 + if (shost_use_blk_mq(shost)) {
 + if (shost-tag_set.tags)
 + scsi_mq_destroy_tags(shost);
 + } else {
 + if (shost-bqt)
 + 

Re: [PATCH 13/14] scsi: add support for a blk-mq based I/O path.

2014-07-25 Thread Martin K. Petersen
 Christoph == Christoph Hellwig h...@lst.de writes:

Christoph This patch adds support for an alternate I/O path in the scsi
Christoph midlayer which uses the blk-mq infrastructure instead of the
Christoph legacy request code.

Reviewed-by: Martin K. Petersen martin.peter...@oracle.com

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 13/14] scsi: add support for a blk-mq based I/O path.

2014-07-16 Thread Mike Christie
On 06/25/2014 11:52 AM, Christoph Hellwig wrote:
 +static int scsi_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req)
 +{
 + struct request_queue *q = req-q;
 + struct scsi_device *sdev = q-queuedata;
 + struct Scsi_Host *shost = sdev-host;
 + struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(req);
 + int ret;
 + int reason;
 +
 + ret = prep_to_mq(scsi_prep_state_check(sdev, req));
 + if (ret)
 + goto out;
 +
 + ret = BLK_MQ_RQ_QUEUE_BUSY;
 + if (!get_device(sdev-sdev_gendev))
 + goto out;
 +
 + if (!scsi_dev_queue_ready(q, sdev))
 + goto out_put_device;
 + if (!scsi_target_queue_ready(shost, sdev))
 + goto out_dec_device_busy;
 + if (!scsi_host_queue_ready(q, shost, sdev))
 + goto out_dec_target_busy;
 +
 + if (!(req-cmd_flags  REQ_DONTPREP)) {
 + ret = prep_to_mq(scsi_mq_prep_fn(req));
 + if (ret)
 + goto out_dec_host_busy;
 + req-cmd_flags |= REQ_DONTPREP;
 + }
 +
 + scsi_init_cmd_errh(cmd);
 + cmd-scsi_done = scsi_mq_done;
 +
 + reason = scsi_dispatch_cmd(cmd);
 + if (reason) {
 + scsi_set_blocked(cmd, reason);
 + ret = BLK_MQ_RQ_QUEUE_BUSY;
 + goto out_dec_host_busy;
 + }
 +
 + return BLK_MQ_RQ_QUEUE_OK;
 +
 +out_dec_host_busy:
 + cancel_delayed_work(cmd-abort_work);

Hey Christoph,

I see the request timer is started before calling queue_rq, but I could
not figure out what the cancel_delayed_work here is for exactly. It
seems if the request were to time out and the eh started while queue_rq
was running we could end up some nasty bugs like the requested requeued
twice.

Is the cancel_delayed_work call just to be safe or is supposed to be
handling a case where the abort_work could be queued at this time up due
to a request timing out while queue_rq is running? Is this case mq specific?
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 13/14] scsi: add support for a blk-mq based I/O path.

2014-07-16 Thread Christoph Hellwig
On Wed, Jul 16, 2014 at 06:13:21AM -0500, Mike Christie wrote:
 I see the request timer is started before calling queue_rq, but I could
 not figure out what the cancel_delayed_work here is for exactly. It
 seems if the request were to time out and the eh started while queue_rq
 was running we could end up some nasty bugs like the requested requeued
 twice.
 
 Is the cancel_delayed_work call just to be safe or is supposed to be
 handling a case where the abort_work could be queued at this time up due
 to a request timing out while queue_rq is running? Is this case mq specific?

It was cargo cult copy  paste from the old path.  I've merged a patch
from Bart to remove it from the old code, so it should go away here as well,
thanks for the reminder.
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 13/14] scsi: add support for a blk-mq based I/O path.

2014-07-09 Thread Hannes Reinecke

On 06/25/2014 06:52 PM, Christoph Hellwig wrote:

This patch adds support for an alternate I/O path in the scsi midlayer
which uses the blk-mq infrastructure instead of the legacy request code.

Use of blk-mq is fully transparent to drivers, although for now a host
template field is provided to opt out of blk-mq usage in case any unforseen
incompatibilities arise.

In general replacing the legacy request code with blk-mq is a simple and
mostly mechanical transformation.  The biggest exception is the new code
that deals with the fact the I/O submissions in blk-mq must happen from
process context, which slightly complicates the I/O completion handler.
The second biggest differences is that blk-mq is build around the concept
of preallocated requests that also include driver specific data, which
in SCSI context means the scsi_cmnd structure.  This completely avoids
dynamic memory allocations for the fast path through I/O submission.

Due the preallocated requests the MQ code path exclusively uses the
host-wide shared tag allocator instead of a per-LUN one.  This only
affects drivers actually using the block layer provided tag allocator
instead of their own.  Unlike the old path blk-mq always provides a tag,
although drivers don't have to use it.

For now the blk-mq path is disable by defauly and must be enabled using
the use_blk_mq module parameter.  Once the remaining work in the block
layer to make blk-mq more suitable for slow devices is complete I hope
to make it the default and eventually even remove the old code path.

Based on the earlier scsi-mq prototype by Nicholas Bellinger.

Thanks to Bart Van Assche and Robert Elliot for testing, benchmarking and
various sugestions and code contributions.

Signed-off-by: Christoph Hellwig h...@lst.de
---
  drivers/scsi/hosts.c  |   30 ++-
  drivers/scsi/scsi.c   |5 +-
  drivers/scsi/scsi_lib.c   |  475 +++--
  drivers/scsi/scsi_priv.h  |3 +
  drivers/scsi/scsi_scan.c  |5 +-
  drivers/scsi/scsi_sysfs.c |2 +
  include/scsi/scsi_host.h  |   18 +-
  include/scsi/scsi_tcq.h   |   28 ++-
  8 files changed, 494 insertions(+), 72 deletions(-)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 0632eee..6322e6c 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -213,9 +213,24 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct 
device *dev,
goto fail;
}

+   if (shost_use_blk_mq(shost)) {
+   error = scsi_mq_setup_tags(shost);
+   if (error)
+   goto fail;
+   }
+
+   /*
+* Note that we allocate the freelist even for the MQ case for now,
+* as we need a command set aside for scsi_reset_provider.  Having
+* the full host freelist and one command available for that is a
+* little heavy-handed, but avoids introducing a special allocator
+* just for this.  Eventually the structure of scsi_reset_provider
+* will need a major overhaul.
+*/
error = scsi_setup_command_freelist(shost);
if (error)
-   goto fail;
+   goto out_destroy_tags;
+

if (!shost-shost_gendev.parent)
shost-shost_gendev.parent = dev ? dev : platform_bus;
@@ -226,7 +241,7 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct 
device *dev,

error = device_add(shost-shost_gendev);
if (error)
-   goto out;
+   goto out_destroy_freelist;

pm_runtime_set_active(shost-shost_gendev);
pm_runtime_enable(shost-shost_gendev);
@@ -279,8 +294,11 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct 
device *dev,
device_del(shost-shost_dev);
   out_del_gendev:
device_del(shost-shost_gendev);
- out:
+ out_destroy_freelist:
scsi_destroy_command_freelist(shost);
+ out_destroy_tags:
+   if (shost_use_blk_mq(shost))
+   scsi_mq_destroy_tags(shost);
   fail:
return error;
  }
@@ -309,7 +327,9 @@ static void scsi_host_dev_release(struct device *dev)
}

scsi_destroy_command_freelist(shost);
-   if (shost-bqt)
+   if (shost_use_blk_mq(shost)  shost-tag_set.tags)
+   scsi_mq_destroy_tags(shost);
+   else if (shost-bqt)
blk_free_tags(shost-bqt);

kfree(shost-shost_data);
@@ -436,6 +456,8 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template 
*sht, int privsize)
else
shost-dma_boundary = 0x;

+   shost-use_blk_mq = scsi_use_blk_mq  !shost-hostt-disable_blk_mq;
+
device_initialize(shost-shost_gendev);
dev_set_name(shost-shost_gendev, host%d, shost-host_no);
shost-shost_gendev.bus = scsi_bus_type;
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index b362058..c089812 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -809,7 +809,7 @@ void scsi_adjust_queue_depth(struct