Re: [PATCH] megaraid_sas: Enable shared host tag map

2014-11-25 Thread Christoph Hellwig
On Mon, Nov 24, 2014 at 04:51:14PM +0100, Hannes Reinecke wrote:
 It is useful as is, as we'll be getting prefixed logging output :-)

Use the blk-mq code path if you care :)

 Which I didn't do yet as the driver is using a larger tag map than
 that one announced to the block layer.
 This is to facilitate internal command submission, which should
 always work independent on any tag starvation issues from the
 upper layers.

This is an issue for a lot of drivers.  blk-mq provides a reserved_tags
pool for that, which reserves a number of tags for internal use, those
must be allocated using blk_mq_alloc_request with the reserved argument
set to true.

The lockless hpsa patches expose this to SCSI, which I'm generally
fine with, but we need to find a way to transparently make this work
for the old code path, too.  This might be as simple as embedding a
second blk_queue_tag structure into the Scsi_Host, adding a constant
prefix to the tag and providing some wrappes in scsi that allow
allocating a struct request (or rather scsi_cmnd) for internal use.

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] megaraid_sas: Enable shared host tag map

2014-11-25 Thread Hannes Reinecke
On 11/25/2014 03:31 PM, Christoph Hellwig wrote:
 On Mon, Nov 24, 2014 at 04:51:14PM +0100, Hannes Reinecke wrote:
 It is useful as is, as we'll be getting prefixed logging output :-)
 
 Use the blk-mq code path if you care :)
 
 Which I didn't do yet as the driver is using a larger tag map than
 that one announced to the block layer.
 This is to facilitate internal command submission, which should
 always work independent on any tag starvation issues from the
 upper layers.
 
 This is an issue for a lot of drivers.  blk-mq provides a reserved_tags
 pool for that, which reserves a number of tags for internal use, those
 must be allocated using blk_mq_alloc_request with the reserved argument
 set to true.
 
 The lockless hpsa patches expose this to SCSI, which I'm generally
 fine with, but we need to find a way to transparently make this work
 for the old code path, too.  This might be as simple as embedding a
 second blk_queue_tag structure into the Scsi_Host, adding a constant
 prefix to the tag and providing some wrappes in scsi that allow
 allocating a struct request (or rather scsi_cmnd) for internal use.
 
I'd rather have a single map to get request/tags from; otherwise
we'd be arbitrarily starving internal requests even though the
'main' tag map is empty.
My plan was more to mark a certain range of tags as 'reserved',
and add another helper/argument to allow to dip into the reserved
pool, too.

A tentative patch is attached.
Idea is to call blk_queue_init_tags() with the actual tag size and
then blk_resize_tags() to limit the number of tags for the request
queue.
The driver can then use 'blk_allocate_tag' with the appropriate max
depth to get tags from the range [max_depth:real_max_depth].

Cheers,

Hannes

-- 
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 21284 (AG Nürnberg)
From e872bb0c4c2b1f72982f4d31925d3b2f65317ffd Mon Sep 17 00:00:00 2001
From: Hannes Reinecke h...@suse.de
Date: Tue, 25 Nov 2014 15:43:07 +0100
Subject: [PATCH] blk-tag: separate out blk_(allocate|release)_tag

Separate out helper functions blk_allocate_tag() and
blk_release_tag().

Signed-off-by: Hannes Reinecke h...@suse.de
---
 block/blk-tag.c | 74 ++---
 1 file changed, 55 insertions(+), 19 deletions(-)

diff --git a/block/blk-tag.c b/block/blk-tag.c
index a185b86..6526fff 100644
--- a/block/blk-tag.c
+++ b/block/blk-tag.c
@@ -244,6 +244,24 @@ int blk_queue_resize_tags(struct request_queue *q, int new_depth)
 }
 EXPORT_SYMBOL(blk_queue_resize_tags);
 
+void blk_release_tag(struct blk_queue_tag *bqt, int tag)
+{
+	if (unlikely(!bqt))
+		return;
+
+	if (unlikely(!test_bit(tag, bqt-tag_map))) {
+		printk(KERN_ERR %s: attempt to clear non-busy tag (%d)\n,
+		   __func__, tag);
+		return;
+	}
+	/*
+	 * The tag_map bit acts as a lock for tag_index[bit], so we need
+	 * unlock memory barrier semantics.
+	 */
+	clear_bit_unlock(tag, bqt-tag_map);
+}
+EXPORT_SYMBOL(blk_release_tag);
+
 /**
  * blk_queue_end_tag - end tag operations for a request
  * @q:  the request queue for the device
@@ -275,18 +293,43 @@ void blk_queue_end_tag(struct request_queue *q, struct request *rq)
 
 	bqt-tag_index[tag] = NULL;
 
-	if (unlikely(!test_bit(tag, bqt-tag_map))) {
-		printk(KERN_ERR %s: attempt to clear non-busy tag (%d)\n,
-		   __func__, tag);
-		return;
-	}
+	blk_release_tag(bqt, tag);
+}
+EXPORT_SYMBOL(blk_queue_end_tag);
+
+/**
+ * blk_reserve_tag - lock and return the next free tag
+ * @bqt:  tag map
+ * @max_depth: max tag depth to use
+ *
+ * Description:
+ *   Lock and return the next free tag.
+ *   The tag needs to be freed up after usage with
+ *   blk_release_tag()
+ *
+ **/
+int blk_reserve_tag(struct blk_queue_tag *bqt, int max_depth)
+{
+	int tag = -1;
+
+	if (!bqt)
+		return tag;
+	if (max_depth = bqt-real_max_depth)
+		return -1;
+
+	do {
+		tag = find_first_zero_bit(bqt-tag_map, max_depth);
+		if (tag = max_depth)
+			return tag;
+
+	} while (test_and_set_bit_lock(tag, bqt-tag_map));
 	/*
-	 * The tag_map bit acts as a lock for tag_index[bit], so we need
-	 * unlock memory barrier semantics.
+	 * We need lock ordering semantics given by test_and_set_bit_lock.
+	 * See blk_queue_end_tag for details.
 	 */
-	clear_bit_unlock(tag, bqt-tag_map);
+	return tag;
 }
-EXPORT_SYMBOL(blk_queue_end_tag);
+EXPORT_SYMBOL(blk_reserve_tag);
 
 /**
  * blk_queue_start_tag - find a free tag and assign it
@@ -343,16 +386,9 @@ int blk_queue_start_tag(struct request_queue *q, struct request *rq)
 			return 1;
 	}
 
-	do {
-		tag = find_first_zero_bit(bqt-tag_map, max_depth);
-		if (tag = max_depth)
-			return 1;
-
-	} while (test_and_set_bit_lock(tag, bqt-tag_map));
-	/*
-	 * We need lock ordering semantics given by test_and_set_bit_lock.
-	 * See blk_queue_end_tag for details.
-	 */
+	tag = blk_reserve_tag(bqt, max_depth);
+	if (tag 

RE: [PATCH] megaraid_sas: Enable shared host tag map

2014-11-25 Thread Kashyap Desai
 -Original Message-
 From: Christoph Hellwig [mailto:h...@lst.de]
 Sent: Tuesday, November 25, 2014 8:02 PM
 To: Hannes Reinecke
 Cc: James Bottomley; Sumit Saxena; Kashyap Desai; linux-
 s...@vger.kernel.org; Webb Scales; Don Brace; Jens Axboe
 Subject: Re: [PATCH] megaraid_sas: Enable shared host tag map

 On Mon, Nov 24, 2014 at 04:51:14PM +0100, Hannes Reinecke wrote:
  It is useful as is, as we'll be getting prefixed logging output :-)

 Use the blk-mq code path if you care :)

  Which I didn't do yet as the driver is using a larger tag map than
  that one announced to the block layer.
  This is to facilitate internal command submission, which should always
  work independent on any tag starvation issues from the upper layers.

 This is an issue for a lot of drivers.  blk-mq provides a
reserved_tags pool
 for that, which reserves a number of tags for internal use, those must
be
 allocated using blk_mq_alloc_request with the reserved argument set to
 true.

 The lockless hpsa patches expose this to SCSI, which I'm generally fine
with,
 but we need to find a way to transparently make this work for the old
code
 path, too.  This might be as simple as embedding a second blk_queue_tag
 structure into the Scsi_Host, adding a constant prefix to the tag and
providing
 some wrappes in scsi that allow allocating a struct request (or rather
 scsi_cmnd) for internal use.

Just trying to understand your above comment. I see host template has new
field called
reserved_tags which is normally used by mid layer to reserved block tag
pool for driver's internal use.
What if driver is OK to manage internal pool without any blk tag
dependency and do not expose actual max can queue of what FW can support.
For example, in megaraid_sas driver FW expose max_fw_cmd = 1024 and driver
keep some reserve (let's say 24 commands for internal use) and it just
expose 1000 command for blk tags.  This will work for blk-mq and legacy
blk driver. Let driver to manage whatever internal reserved it kept for
and do not add any dependency with blk tag from above layer for those
reserved pool.

` Kashyap
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] megaraid_sas: Enable shared host tag map

2014-11-25 Thread Christoph Hellwig
On Tue, Nov 25, 2014 at 03:47:40PM +0100, Hannes Reinecke wrote:
 I'd rather have a single map to get request/tags from; otherwise
 we'd be arbitrarily starving internal requests even though the
 'main' tag map is empty.

At least in blk-mq the assumption is that a driver needs very few
internal tags, and it might need access to them in emergency
situations like resets or aborts.

 My plan was more to mark a certain range of tags as 'reserved',
 and add another helper/argument to allow to dip into the reserved
 pool, too.

We can add this as optional behavior, but I think the existing blk-mq
behavior is a good default.

 A tentative patch is attached.
 Idea is to call blk_queue_init_tags() with the actual tag size and
 then blk_resize_tags() to limit the number of tags for the request
 queue.
 The driver can then use 'blk_allocate_tag' with the appropriate max
 depth to get tags from the range [max_depth:real_max_depth].

I'd much prefer the blk-mq approach with two maps.  Either way please
make sure whatever you come up is compatible with blk-mq.
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] megaraid_sas: Enable shared host tag map

2014-11-25 Thread Christoph Hellwig
On Tue, Nov 25, 2014 at 08:33:28PM +0530, Kashyap Desai wrote:
 Just trying to understand your above comment. I see host template has new
 field called
 reserved_tags which is normally used by mid layer to reserved block tag
 pool for driver's internal use.
 What if driver is OK to manage internal pool without any blk tag
 dependency and do not expose actual max can queue of what FW can support.
 For example, in megaraid_sas driver FW expose max_fw_cmd = 1024 and driver
 keep some reserve (let's say 24 commands for internal use) and it just
 expose 1000 command for blk tags.  This will work for blk-mq and legacy
 blk driver. Let driver to manage whatever internal reserved it kept for
 and do not add any dependency with blk tag from above layer for those
 reserved pool.

The driver can of coure always just manage its internal tags, but that
means we need to duplicate a tag allocator in every driver.  In short
it works ok for now, but I'd rather solve the problem in a single place.
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] megaraid_sas: Enable shared host tag map

2014-11-24 Thread Christoph Hellwig
On Mon, Nov 24, 2014 at 04:33:55PM +0100, Hannes Reinecke wrote:
 The megaraid SAS driver uses a shared host tag map internally,
 so we should be telling the block layer about it.

But it doesn't make use of request-tag yet.  This would only be
useful if you got rid of the internal tag allocator.

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] megaraid_sas: Enable shared host tag map

2014-11-24 Thread Hannes Reinecke
On 11/24/2014 04:35 PM, Christoph Hellwig wrote:
 On Mon, Nov 24, 2014 at 04:33:55PM +0100, Hannes Reinecke wrote:
 The megaraid SAS driver uses a shared host tag map internally,
 so we should be telling the block layer about it.
 
 But it doesn't make use of request-tag yet.  This would only be
 useful if you got rid of the internal tag allocator.
 
It is useful as is, as we'll be getting prefixed logging output :-)

But yeah, it would be good to get rid of the internal tag allocator.

Which I didn't do yet as the driver is using a larger tag map than
that one announced to the block layer.
This is to facilitate internal command submission, which should
always work independent on any tag starvation issues from the
upper layers.

So when moving to the generic tag allocation code I'd need:
- an block-layer helper for allocating a tag without a request
  (basically separating the existing blk-tag functionality, easy)
- introducing 'emergency pools' for the tag map, allowing to
  allocate a tag even under I/O pressure.

What I did was to:
- Split off blk_reserve_tag() functionality from blk_start_tag()
  (and similar with blk_end_tag)
- registered the overall size of the shared tag map
- called blk_resize_tags() to shrink it to -can_queue
- Implemented a 'force' attribute to blk_reserve_tag() to
  allow it to dip into the excess size.

But then I didn't really like this mis-use of the blk_resize_tag()
operation; I'd rather have it marked explicitly.

However, I'm sure megasas isn't the only driver requiring such a
feature (libata eg would benefit from this, too).

Ideas?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] megaraid_sas: Enable shared host tag map

2014-11-24 Thread Kashyap Desai
 -Original Message-
 From: Christoph Hellwig [mailto:h...@lst.de]
 Sent: Monday, November 24, 2014 9:06 PM
 To: Hannes Reinecke
 Cc: James Bottomley; Sumit Saxena; Kashyap Desai; Christoph Hellwig;
linux-
 s...@vger.kernel.org
 Subject: Re: [PATCH] megaraid_sas: Enable shared host tag map

 On Mon, Nov 24, 2014 at 04:33:55PM +0100, Hannes Reinecke wrote:
  The megaraid SAS driver uses a shared host tag map internally, so we
  should be telling the block layer about it.

 But it doesn't make use of request-tag yet.  This would only be useful
if you
 got rid of the internal tag allocator.

I have patch for code changes in this particular area and that is
currently under internal test cycle.

Next patch series will start using shared blk tag for internal MPT frame
allocation. It will get-rid-off  MPT frame pool link list completely and
also does lots of refactor w.r.t MFI/MPT frame corruption issue. Using
shared blk tag gave good performance improvement as driver completely
removed mpt frame spin lock used in megasas_get_cmd_fusion/
megasas_return_cmd_fusion.

~ Kashyap
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] megaraid_sas: Enable shared host tag map

2014-11-24 Thread Kashyap Desai
 -Original Message-
 From: Hannes Reinecke [mailto:h...@suse.de]
 Sent: Monday, November 24, 2014 9:21 PM
 To: Christoph Hellwig
 Cc: James Bottomley; Sumit Saxena; Kashyap Desai; linux-
 s...@vger.kernel.org
 Subject: Re: [PATCH] megaraid_sas: Enable shared host tag map

 On 11/24/2014 04:35 PM, Christoph Hellwig wrote:
  On Mon, Nov 24, 2014 at 04:33:55PM +0100, Hannes Reinecke wrote:
  The megaraid SAS driver uses a shared host tag map internally, so we
  should be telling the block layer about it.
 
  But it doesn't make use of request-tag yet.  This would only be
  useful if you got rid of the internal tag allocator.
 
 It is useful as is, as we'll be getting prefixed logging output :-)

 But yeah, it would be good to get rid of the internal tag allocator.

 Which I didn't do yet as the driver is using a larger tag map than that
one
 announced to the block layer.
 This is to facilitate internal command submission, which should always
work
 independent on any tag starvation issues from the upper layers.

 So when moving to the generic tag allocation code I'd need:
 - an block-layer helper for allocating a tag without a request
   (basically separating the existing blk-tag functionality, easy)
 - introducing 'emergency pools' for the tag map, allowing to
   allocate a tag even under I/O pressure.

 What I did was to:
 - Split off blk_reserve_tag() functionality from blk_start_tag()
   (and similar with blk_end_tag)
 - registered the overall size of the shared tag map
 - called blk_resize_tags() to shrink it to -can_queue
 - Implemented a 'force' attribute to blk_reserve_tag() to
   allow it to dip into the excess size.

Hannes,

I just received your reply... My response crossed with yours. I have
attached preview patch for your reference which I refer in my earlier
response.
Megaraid_sas driver will expose only actual IO command queue depth to the
SML and it will manage all internal used commands without taking any help
from SML.

Driver will now expose max_scsi_cmds to the upper layer which will be
lower than actually FW can support max command.
 E.a
+ instance-max_scsi_cmds = instance-max_fw_cmds -
+   (MEGASAS_FUSION_INTERNAL_CMDS +
+   MEGASAS_FUSION_IOCTL_CMDS);

We have to do this to make use of existing OS distribution which really
does not support blk_reserve_tag().

~ Kashyap

 But then I didn't really like this mis-use of the blk_resize_tag()
operation; I'd
 rather have it marked explicitly.

 However, I'm sure megasas isn't the only driver requiring such a feature
 (libata eg would benefit from this, too).

 Ideas?

 Cheers,

 Hannes
 --
 Dr. Hannes Reinecke zSeries  Storage
 h...@suse.de+49 911 74053 688
 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
 GF: J. Hawn, J. Guild, F. Imendörffer, HRB 21284 (AG Nürnberg)


blk_shared.patch
Description: Binary data