Re: [PATCH] megaraid_sas: Enable shared host tag map
On Mon, Nov 24, 2014 at 04:51:14PM +0100, Hannes Reinecke wrote: It is useful as is, as we'll be getting prefixed logging output :-) Use the blk-mq code path if you care :) Which I didn't do yet as the driver is using a larger tag map than that one announced to the block layer. This is to facilitate internal command submission, which should always work independent on any tag starvation issues from the upper layers. This is an issue for a lot of drivers. blk-mq provides a reserved_tags pool for that, which reserves a number of tags for internal use, those must be allocated using blk_mq_alloc_request with the reserved argument set to true. The lockless hpsa patches expose this to SCSI, which I'm generally fine with, but we need to find a way to transparently make this work for the old code path, too. This might be as simple as embedding a second blk_queue_tag structure into the Scsi_Host, adding a constant prefix to the tag and providing some wrappes in scsi that allow allocating a struct request (or rather scsi_cmnd) for internal use. -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] megaraid_sas: Enable shared host tag map
On 11/25/2014 03:31 PM, Christoph Hellwig wrote: On Mon, Nov 24, 2014 at 04:51:14PM +0100, Hannes Reinecke wrote: It is useful as is, as we'll be getting prefixed logging output :-) Use the blk-mq code path if you care :) Which I didn't do yet as the driver is using a larger tag map than that one announced to the block layer. This is to facilitate internal command submission, which should always work independent on any tag starvation issues from the upper layers. This is an issue for a lot of drivers. blk-mq provides a reserved_tags pool for that, which reserves a number of tags for internal use, those must be allocated using blk_mq_alloc_request with the reserved argument set to true. The lockless hpsa patches expose this to SCSI, which I'm generally fine with, but we need to find a way to transparently make this work for the old code path, too. This might be as simple as embedding a second blk_queue_tag structure into the Scsi_Host, adding a constant prefix to the tag and providing some wrappes in scsi that allow allocating a struct request (or rather scsi_cmnd) for internal use. I'd rather have a single map to get request/tags from; otherwise we'd be arbitrarily starving internal requests even though the 'main' tag map is empty. My plan was more to mark a certain range of tags as 'reserved', and add another helper/argument to allow to dip into the reserved pool, too. A tentative patch is attached. Idea is to call blk_queue_init_tags() with the actual tag size and then blk_resize_tags() to limit the number of tags for the request queue. The driver can then use 'blk_allocate_tag' with the appropriate max depth to get tags from the range [max_depth:real_max_depth]. Cheers, Hannes -- Dr. Hannes Reinecke zSeries Storage h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 21284 (AG Nürnberg) From e872bb0c4c2b1f72982f4d31925d3b2f65317ffd Mon Sep 17 00:00:00 2001 From: Hannes Reinecke h...@suse.de Date: Tue, 25 Nov 2014 15:43:07 +0100 Subject: [PATCH] blk-tag: separate out blk_(allocate|release)_tag Separate out helper functions blk_allocate_tag() and blk_release_tag(). Signed-off-by: Hannes Reinecke h...@suse.de --- block/blk-tag.c | 74 ++--- 1 file changed, 55 insertions(+), 19 deletions(-) diff --git a/block/blk-tag.c b/block/blk-tag.c index a185b86..6526fff 100644 --- a/block/blk-tag.c +++ b/block/blk-tag.c @@ -244,6 +244,24 @@ int blk_queue_resize_tags(struct request_queue *q, int new_depth) } EXPORT_SYMBOL(blk_queue_resize_tags); +void blk_release_tag(struct blk_queue_tag *bqt, int tag) +{ + if (unlikely(!bqt)) + return; + + if (unlikely(!test_bit(tag, bqt-tag_map))) { + printk(KERN_ERR %s: attempt to clear non-busy tag (%d)\n, + __func__, tag); + return; + } + /* + * The tag_map bit acts as a lock for tag_index[bit], so we need + * unlock memory barrier semantics. + */ + clear_bit_unlock(tag, bqt-tag_map); +} +EXPORT_SYMBOL(blk_release_tag); + /** * blk_queue_end_tag - end tag operations for a request * @q: the request queue for the device @@ -275,18 +293,43 @@ void blk_queue_end_tag(struct request_queue *q, struct request *rq) bqt-tag_index[tag] = NULL; - if (unlikely(!test_bit(tag, bqt-tag_map))) { - printk(KERN_ERR %s: attempt to clear non-busy tag (%d)\n, - __func__, tag); - return; - } + blk_release_tag(bqt, tag); +} +EXPORT_SYMBOL(blk_queue_end_tag); + +/** + * blk_reserve_tag - lock and return the next free tag + * @bqt: tag map + * @max_depth: max tag depth to use + * + * Description: + * Lock and return the next free tag. + * The tag needs to be freed up after usage with + * blk_release_tag() + * + **/ +int blk_reserve_tag(struct blk_queue_tag *bqt, int max_depth) +{ + int tag = -1; + + if (!bqt) + return tag; + if (max_depth = bqt-real_max_depth) + return -1; + + do { + tag = find_first_zero_bit(bqt-tag_map, max_depth); + if (tag = max_depth) + return tag; + + } while (test_and_set_bit_lock(tag, bqt-tag_map)); /* - * The tag_map bit acts as a lock for tag_index[bit], so we need - * unlock memory barrier semantics. + * We need lock ordering semantics given by test_and_set_bit_lock. + * See blk_queue_end_tag for details. */ - clear_bit_unlock(tag, bqt-tag_map); + return tag; } -EXPORT_SYMBOL(blk_queue_end_tag); +EXPORT_SYMBOL(blk_reserve_tag); /** * blk_queue_start_tag - find a free tag and assign it @@ -343,16 +386,9 @@ int blk_queue_start_tag(struct request_queue *q, struct request *rq) return 1; } - do { - tag = find_first_zero_bit(bqt-tag_map, max_depth); - if (tag = max_depth) - return 1; - - } while (test_and_set_bit_lock(tag, bqt-tag_map)); - /* - * We need lock ordering semantics given by test_and_set_bit_lock. - * See blk_queue_end_tag for details. - */ + tag = blk_reserve_tag(bqt, max_depth); + if (tag
RE: [PATCH] megaraid_sas: Enable shared host tag map
-Original Message- From: Christoph Hellwig [mailto:h...@lst.de] Sent: Tuesday, November 25, 2014 8:02 PM To: Hannes Reinecke Cc: James Bottomley; Sumit Saxena; Kashyap Desai; linux- s...@vger.kernel.org; Webb Scales; Don Brace; Jens Axboe Subject: Re: [PATCH] megaraid_sas: Enable shared host tag map On Mon, Nov 24, 2014 at 04:51:14PM +0100, Hannes Reinecke wrote: It is useful as is, as we'll be getting prefixed logging output :-) Use the blk-mq code path if you care :) Which I didn't do yet as the driver is using a larger tag map than that one announced to the block layer. This is to facilitate internal command submission, which should always work independent on any tag starvation issues from the upper layers. This is an issue for a lot of drivers. blk-mq provides a reserved_tags pool for that, which reserves a number of tags for internal use, those must be allocated using blk_mq_alloc_request with the reserved argument set to true. The lockless hpsa patches expose this to SCSI, which I'm generally fine with, but we need to find a way to transparently make this work for the old code path, too. This might be as simple as embedding a second blk_queue_tag structure into the Scsi_Host, adding a constant prefix to the tag and providing some wrappes in scsi that allow allocating a struct request (or rather scsi_cmnd) for internal use. Just trying to understand your above comment. I see host template has new field called reserved_tags which is normally used by mid layer to reserved block tag pool for driver's internal use. What if driver is OK to manage internal pool without any blk tag dependency and do not expose actual max can queue of what FW can support. For example, in megaraid_sas driver FW expose max_fw_cmd = 1024 and driver keep some reserve (let's say 24 commands for internal use) and it just expose 1000 command for blk tags. This will work for blk-mq and legacy blk driver. Let driver to manage whatever internal reserved it kept for and do not add any dependency with blk tag from above layer for those reserved pool. ` Kashyap -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] megaraid_sas: Enable shared host tag map
On Tue, Nov 25, 2014 at 03:47:40PM +0100, Hannes Reinecke wrote: I'd rather have a single map to get request/tags from; otherwise we'd be arbitrarily starving internal requests even though the 'main' tag map is empty. At least in blk-mq the assumption is that a driver needs very few internal tags, and it might need access to them in emergency situations like resets or aborts. My plan was more to mark a certain range of tags as 'reserved', and add another helper/argument to allow to dip into the reserved pool, too. We can add this as optional behavior, but I think the existing blk-mq behavior is a good default. A tentative patch is attached. Idea is to call blk_queue_init_tags() with the actual tag size and then blk_resize_tags() to limit the number of tags for the request queue. The driver can then use 'blk_allocate_tag' with the appropriate max depth to get tags from the range [max_depth:real_max_depth]. I'd much prefer the blk-mq approach with two maps. Either way please make sure whatever you come up is compatible with blk-mq. -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] megaraid_sas: Enable shared host tag map
On Tue, Nov 25, 2014 at 08:33:28PM +0530, Kashyap Desai wrote: Just trying to understand your above comment. I see host template has new field called reserved_tags which is normally used by mid layer to reserved block tag pool for driver's internal use. What if driver is OK to manage internal pool without any blk tag dependency and do not expose actual max can queue of what FW can support. For example, in megaraid_sas driver FW expose max_fw_cmd = 1024 and driver keep some reserve (let's say 24 commands for internal use) and it just expose 1000 command for blk tags. This will work for blk-mq and legacy blk driver. Let driver to manage whatever internal reserved it kept for and do not add any dependency with blk tag from above layer for those reserved pool. The driver can of coure always just manage its internal tags, but that means we need to duplicate a tag allocator in every driver. In short it works ok for now, but I'd rather solve the problem in a single place. -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] megaraid_sas: Enable shared host tag map
On Mon, Nov 24, 2014 at 04:33:55PM +0100, Hannes Reinecke wrote: The megaraid SAS driver uses a shared host tag map internally, so we should be telling the block layer about it. But it doesn't make use of request-tag yet. This would only be useful if you got rid of the internal tag allocator. -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] megaraid_sas: Enable shared host tag map
On 11/24/2014 04:35 PM, Christoph Hellwig wrote: On Mon, Nov 24, 2014 at 04:33:55PM +0100, Hannes Reinecke wrote: The megaraid SAS driver uses a shared host tag map internally, so we should be telling the block layer about it. But it doesn't make use of request-tag yet. This would only be useful if you got rid of the internal tag allocator. It is useful as is, as we'll be getting prefixed logging output :-) But yeah, it would be good to get rid of the internal tag allocator. Which I didn't do yet as the driver is using a larger tag map than that one announced to the block layer. This is to facilitate internal command submission, which should always work independent on any tag starvation issues from the upper layers. So when moving to the generic tag allocation code I'd need: - an block-layer helper for allocating a tag without a request (basically separating the existing blk-tag functionality, easy) - introducing 'emergency pools' for the tag map, allowing to allocate a tag even under I/O pressure. What I did was to: - Split off blk_reserve_tag() functionality from blk_start_tag() (and similar with blk_end_tag) - registered the overall size of the shared tag map - called blk_resize_tags() to shrink it to -can_queue - Implemented a 'force' attribute to blk_reserve_tag() to allow it to dip into the excess size. But then I didn't really like this mis-use of the blk_resize_tag() operation; I'd rather have it marked explicitly. However, I'm sure megasas isn't the only driver requiring such a feature (libata eg would benefit from this, too). Ideas? Cheers, Hannes -- Dr. Hannes Reinecke zSeries Storage h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 21284 (AG Nürnberg) -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH] megaraid_sas: Enable shared host tag map
-Original Message- From: Christoph Hellwig [mailto:h...@lst.de] Sent: Monday, November 24, 2014 9:06 PM To: Hannes Reinecke Cc: James Bottomley; Sumit Saxena; Kashyap Desai; Christoph Hellwig; linux- s...@vger.kernel.org Subject: Re: [PATCH] megaraid_sas: Enable shared host tag map On Mon, Nov 24, 2014 at 04:33:55PM +0100, Hannes Reinecke wrote: The megaraid SAS driver uses a shared host tag map internally, so we should be telling the block layer about it. But it doesn't make use of request-tag yet. This would only be useful if you got rid of the internal tag allocator. I have patch for code changes in this particular area and that is currently under internal test cycle. Next patch series will start using shared blk tag for internal MPT frame allocation. It will get-rid-off MPT frame pool link list completely and also does lots of refactor w.r.t MFI/MPT frame corruption issue. Using shared blk tag gave good performance improvement as driver completely removed mpt frame spin lock used in megasas_get_cmd_fusion/ megasas_return_cmd_fusion. ~ Kashyap -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH] megaraid_sas: Enable shared host tag map
-Original Message- From: Hannes Reinecke [mailto:h...@suse.de] Sent: Monday, November 24, 2014 9:21 PM To: Christoph Hellwig Cc: James Bottomley; Sumit Saxena; Kashyap Desai; linux- s...@vger.kernel.org Subject: Re: [PATCH] megaraid_sas: Enable shared host tag map On 11/24/2014 04:35 PM, Christoph Hellwig wrote: On Mon, Nov 24, 2014 at 04:33:55PM +0100, Hannes Reinecke wrote: The megaraid SAS driver uses a shared host tag map internally, so we should be telling the block layer about it. But it doesn't make use of request-tag yet. This would only be useful if you got rid of the internal tag allocator. It is useful as is, as we'll be getting prefixed logging output :-) But yeah, it would be good to get rid of the internal tag allocator. Which I didn't do yet as the driver is using a larger tag map than that one announced to the block layer. This is to facilitate internal command submission, which should always work independent on any tag starvation issues from the upper layers. So when moving to the generic tag allocation code I'd need: - an block-layer helper for allocating a tag without a request (basically separating the existing blk-tag functionality, easy) - introducing 'emergency pools' for the tag map, allowing to allocate a tag even under I/O pressure. What I did was to: - Split off blk_reserve_tag() functionality from blk_start_tag() (and similar with blk_end_tag) - registered the overall size of the shared tag map - called blk_resize_tags() to shrink it to -can_queue - Implemented a 'force' attribute to blk_reserve_tag() to allow it to dip into the excess size. Hannes, I just received your reply... My response crossed with yours. I have attached preview patch for your reference which I refer in my earlier response. Megaraid_sas driver will expose only actual IO command queue depth to the SML and it will manage all internal used commands without taking any help from SML. Driver will now expose max_scsi_cmds to the upper layer which will be lower than actually FW can support max command. E.a + instance-max_scsi_cmds = instance-max_fw_cmds - + (MEGASAS_FUSION_INTERNAL_CMDS + + MEGASAS_FUSION_IOCTL_CMDS); We have to do this to make use of existing OS distribution which really does not support blk_reserve_tag(). ~ Kashyap But then I didn't really like this mis-use of the blk_resize_tag() operation; I'd rather have it marked explicitly. However, I'm sure megasas isn't the only driver requiring such a feature (libata eg would benefit from this, too). Ideas? Cheers, Hannes -- Dr. Hannes Reinecke zSeries Storage h...@suse.de+49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 21284 (AG Nürnberg) blk_shared.patch Description: Binary data