Re: [PATCH V2 2/8] blk-mq: introduce BLK_MQ_F_GLOBAL_TAGS
On Mon, 2018-02-05 at 23:20 +0800, Ming Lei wrote:
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index 55c0a745b427..385bbec73804 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -81,6 +81,17 @@ static bool blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx *hctx)
>  	} else
>  		clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
>  
> +	/* need to restart all hw queues for global tags */
> +	if (hctx->flags & BLK_MQ_F_GLOBAL_TAGS) {
> +		struct blk_mq_hw_ctx *hctx2;
> +		int i;
> +
> +		queue_for_each_hw_ctx(hctx->queue, hctx2, i)
> +			if (blk_mq_run_hw_queue(hctx2, true))
> +				return true;
> +		return false;
> +	}
> +
>  	return blk_mq_run_hw_queue(hctx, true);
>  }

It seems odd to me that, no matter which hardware queue a restart is
requested for (the hctx argument), the above loop always starts by examining
the hardware queue with index 0. Will this cause fairness and/or cache line
bouncing problems?

Thanks,

Bart.
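For illustration only, the fairness concern can be sketched in user-space C. This is not kernel code: struct demo_hctx, demo_run_hw_queue() and demo_restart_from() are hypothetical stand-ins, and starting the walk at the requesting queue's index is just one possible way to avoid always favouring index 0; the thread does not propose it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct blk_mq_hw_ctx; not the kernel type. */
struct demo_hctx {
	unsigned int queue_num;	/* index of this hw queue */
	bool has_work;		/* whether running it would dispatch anything */
	unsigned int run_count;	/* how often it was run, for inspection */
};

/* Stand-in for blk_mq_run_hw_queue(): returns true if the queue did work. */
static bool demo_run_hw_queue(struct demo_hctx *hctx)
{
	hctx->run_count++;
	return hctx->has_work;
}

/*
 * Walk all nr hw queues, starting at index 'start' and wrapping around,
 * and stop at the first queue that does work. Returns that queue's index,
 * or -1 if no queue had work.
 */
static int demo_restart_from(struct demo_hctx *queues, unsigned int nr,
			     unsigned int start)
{
	unsigned int n;

	assert(nr > 0 && start < nr);
	for (n = 0; n < nr; n++) {
		unsigned int idx = (start + n) % nr;

		if (demo_run_hw_queue(&queues[idx]))
			return (int)idx;
	}
	return -1;
}
```

With a walk that always begins at index 0, queue 0 is examined on every restart; with the rotated walk, the queues nearest the requester are examined first.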
Re: [PATCH V2 2/8] blk-mq: introduce BLK_MQ_F_GLOBAL_TAGS
On Mon, 2018-02-05 at 23:20 +0800, Ming Lei wrote:
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index 55c0a745b427..385bbec73804 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -81,6 +81,17 @@ static bool blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx *hctx)
>  	} else
>  		clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
>  
> +	/* need to restart all hw queues for global tags */
> +	if (hctx->flags & BLK_MQ_F_GLOBAL_TAGS) {
> +		struct blk_mq_hw_ctx *hctx2;
> +		int i;
> +
> +		queue_for_each_hw_ctx(hctx->queue, hctx2, i)
> +			if (blk_mq_run_hw_queue(hctx2, true))
> +				return true;
> +		return false;
> +	}
> +
>  	return blk_mq_run_hw_queue(hctx, true);
>  }

This new loop looks misplaced to me. If both the BLK_MQ_F_GLOBAL_TAGS and the
BLK_MQ_F_TAG_SHARED flags are set, then the outer loop in
blk_mq_sched_restart() and the inner loop in blk_mq_sched_restart_hctx() will
cause more calls of blk_mq_run_hw_queue() than necessary. Have you considered
merging the above loop into blk_mq_sched_restart()?

Thanks,

Bart.
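A toy user-space model of the redundant-call concern, for illustration only. All demo_* names are hypothetical. It assumes the worst case in which no hw queue has work (so neither loop exits early) and in which every hw queue is marked for restart; it models only the nesting, not the real tag_list walk in blk_mq_sched_restart().

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NR_HW_QUEUES 4u

/* Counts calls to the stand-in for blk_mq_run_hw_queue(). */
static unsigned int demo_run_calls;

static bool demo_run_hw_queue(unsigned int idx)
{
	assert(idx < DEMO_NR_HW_QUEUES);
	demo_run_calls++;
	return false;	/* worst case: no queue dispatches anything */
}

/* Models the patched per-hctx helper: it loops over every hw queue. */
static bool demo_restart_hctx_global(void)
{
	unsigned int i;

	for (i = 0; i < DEMO_NR_HW_QUEUES; i++)
		if (demo_run_hw_queue(i))
			return true;
	return false;
}

/* Nested form: an outer loop over hw queues calls the looping helper. */
static unsigned int demo_nested_restart(void)
{
	unsigned int i;

	demo_run_calls = 0;
	for (i = 0; i < DEMO_NR_HW_QUEUES; i++)
		if (demo_restart_hctx_global())
			break;
	return demo_run_calls;
}

/* Merged form: a single loop runs each hw queue at most once. */
static unsigned int demo_merged_restart(void)
{
	unsigned int i;

	demo_run_calls = 0;
	for (i = 0; i < DEMO_NR_HW_QUEUES; i++)
		if (demo_run_hw_queue(i))
			break;
	return demo_run_calls;
}
```

Under these assumptions the nested form issues one run per (outer, inner) pair, while the merged form issues one run per hw queue.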
Re: [PATCH V2 2/8] blk-mq: introduce BLK_MQ_F_GLOBAL_TAGS
On 02/06/18 15:18, Jens Axboe wrote:
> GLOBAL implies that it's, strangely enough, global. That isn't really the
> case. Why not call this BLK_MQ_F_HOST_TAGS or something like that? I'd
> welcome better names, but global doesn't seem to be a great choice.
>
> BLK_MQ_F_SET_TAGS?

I like the name BLK_MQ_F_HOST_TAGS because it refers to how these tags will
be used by SCSI LLDs, namely as a host-wide tag set.

Thanks,

Bart.
Re: [PATCH V2 2/8] blk-mq: introduce BLK_MQ_F_GLOBAL_TAGS
On Tue, Feb 06, 2018 at 04:18:20PM -0700, Jens Axboe wrote:
> On 2/5/18 8:20 AM, Ming Lei wrote:

...

>
> GLOBAL implies that it's, strangely enough, global. That isn't really the
> case. Why not call this BLK_MQ_F_HOST_TAGS or something like that? I'd
> welcome better names, but global doesn't seem to be a great choice.
>
> BLK_MQ_F_SET_TAGS?

Good point, I am fine with either BLK_MQ_F_HOST_TAGS or BLK_MQ_F_SET_TAGS,
will update in V3.

Thanks,
Ming
Re: [PATCH V2 2/8] blk-mq: introduce BLK_MQ_F_GLOBAL_TAGS
On Tue, Feb 06, 2018 at 12:33:36PM -0800, Omar Sandoval wrote:
> On Mon, Feb 05, 2018 at 11:20:29PM +0800, Ming Lei wrote:

..

> >
> > +	/* need to restart all hw queues for global tags */
> > +	if (hctx->flags & BLK_MQ_F_GLOBAL_TAGS) {
> > +		struct blk_mq_hw_ctx *hctx2;
> > +		int i;
> > +
> > +		queue_for_each_hw_ctx(hctx->queue, hctx2, i)
> > +			if (blk_mq_run_hw_queue(hctx2, true))
> > +				return true;
>
> Is it intentional that we stop after the first hw queue does work? That
> seems fine but it's a little confusing because the comment claims we
> restart everything.

Good catch, will update comment in V3.

Thanks,
Ming
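To make the distinction in the question concrete, here is a small user-space sketch (not kernel code; struct demo_hctx and the demo_* functions are hypothetical stand-ins) contrasting the posted behaviour, which stops at the first hw queue that does work, with a literal reading of the comment, which would run every hw queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a hw queue; not the kernel type. */
struct demo_hctx {
	bool has_work;		/* whether running it would dispatch anything */
	unsigned int run_count;	/* how often it was run, for inspection */
};

/* Stand-in for blk_mq_run_hw_queue(): returns true if the queue did work. */
static bool demo_run_hw_queue(struct demo_hctx *hctx)
{
	hctx->run_count++;
	return hctx->has_work;
}

/* Behaviour of the posted loop: stop once a queue reports it did work. */
static bool demo_restart_until_first(struct demo_hctx *queues, unsigned int nr)
{
	unsigned int i;

	assert(queues);
	for (i = 0; i < nr; i++)
		if (demo_run_hw_queue(&queues[i]))
			return true;
	return false;
}

/* Literal reading of the comment: run every queue, report if any worked. */
static bool demo_restart_all(struct demo_hctx *queues, unsigned int nr)
{
	bool did_work = false;
	unsigned int i;

	assert(queues);
	for (i = 0; i < nr; i++)
		if (demo_run_hw_queue(&queues[i]))
			did_work = true;
	return did_work;
}
```

Both variants return the same boolean; they differ only in how many queues get run after the first one that does work.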
Re: [PATCH V2 2/8] blk-mq: introduce BLK_MQ_F_GLOBAL_TAGS
On 2/5/18 8:20 AM, Ming Lei wrote:
> Quite a few HBAs(such as HPSA, megaraid, mpt3sas, ..) support multiple
> reply queues, but tags is often HBA wide.
>
> These HBAs have switched to use pci_alloc_irq_vectors(PCI_IRQ_AFFINITY)
> for automatic affinity assignment.
>
> Now 84676c1f21e8ff5(genirq/affinity: assign vectors to all possible CPUs)
> has been merged to V4.16-rc, and it is easy to allocate all offline CPUs
> for some irq vectors, this can't be avoided even though the allocation
> is improved.
>
> So all these drivers have to avoid to ask HBA to complete request in
> reply queue which hasn't online CPUs assigned, and HPSA has been broken
> with v4.15+:
>
> 	https://marc.info/?l=linux-kernel&m=151748144730409&w=2
>
> This issue can be solved generically and easily via blk_mq(scsi_mq) multiple
> hw queue by mapping each reply queue into hctx, but one tricky thing is
> the HBA wide(instead of hw queue wide) tags.
>
> This patch is based on the following Hannes's patch:
>
> 	https://marc.info/?l=linux-block&m=149132580511346&w=2
>
> One big difference with Hannes's is that this patch only makes the tags
> sbitmap and active_queues data structure HBA wide, and others are kept as
> NUMA locality, such as request, hctx, tags, ...
>
> The following patch will support global tags on null_blk, also the performance
> data is provided, no obvious performance loss is observed when the whole
> hw queue depth is same.

GLOBAL implies that it's, strangely enough, global. That isn't really the
case. Why not call this BLK_MQ_F_HOST_TAGS or something like that? I'd
welcome better names, but global doesn't seem to be a great choice.

BLK_MQ_F_SET_TAGS?

-- 
Jens Axboe
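As a rough illustration of what "host-wide" tags mean in the quoted description, the user-space sketch below has several hw queues draw tags from a single shared bitmap while keeping their own per-queue state. It is not the kernel's sbitmap implementation; struct demo_tag_pool, struct demo_hw_queue and the demo_* helpers are hypothetical, and the pool depth is assumed to be at most 32.

```c
#include <assert.h>
#include <stdint.h>

/* One pool per host (HBA); hypothetical, depth must be <= 32. */
struct demo_tag_pool {
	uint32_t bitmap;	/* bit n set means tag n is in use */
	unsigned int depth;	/* host-wide queue depth */
};

/* Each hw queue keeps its own state but points at the shared pool. */
struct demo_hw_queue {
	struct demo_tag_pool *pool;	/* shared by all hw queues of the host */
	unsigned int in_flight;		/* per-queue accounting stays local */
};

/* Allocate the lowest free tag from the shared pool, or -1 if exhausted. */
static int demo_get_tag(struct demo_hw_queue *hq)
{
	struct demo_tag_pool *pool = hq->pool;
	unsigned int tag;

	for (tag = 0; tag < pool->depth; tag++) {
		if (!(pool->bitmap & (UINT32_C(1) << tag))) {
			pool->bitmap |= UINT32_C(1) << tag;
			hq->in_flight++;
			return (int)tag;
		}
	}
	return -1;
}

/* Release a tag; it becomes available to every hw queue of the host. */
static void demo_put_tag(struct demo_hw_queue *hq, int tag)
{
	assert(tag >= 0 && (unsigned int)tag < hq->pool->depth);
	hq->pool->bitmap &= ~(UINT32_C(1) << tag);
	hq->in_flight--;
}
```

The point of the sketch is that a tag taken through one hw queue reduces what every other hw queue of the same host can allocate, so the total in flight never exceeds the host depth.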
Re: [PATCH V2 2/8] blk-mq: introduce BLK_MQ_F_GLOBAL_TAGS
On Mon, Feb 05, 2018 at 11:20:29PM +0800, Ming Lei wrote:
> Quite a few HBAs(such as HPSA, megaraid, mpt3sas, ..) support multiple
> reply queues, but tags is often HBA wide.
>
> These HBAs have switched to use pci_alloc_irq_vectors(PCI_IRQ_AFFINITY)
> for automatic affinity assignment.
>
> Now 84676c1f21e8ff5(genirq/affinity: assign vectors to all possible CPUs)
> has been merged to V4.16-rc, and it is easy to allocate all offline CPUs
> for some irq vectors, this can't be avoided even though the allocation
> is improved.
>
> So all these drivers have to avoid to ask HBA to complete request in
> reply queue which hasn't online CPUs assigned, and HPSA has been broken
> with v4.15+:
>
> 	https://marc.info/?l=linux-kernel&m=151748144730409&w=2
>
> This issue can be solved generically and easily via blk_mq(scsi_mq) multiple
> hw queue by mapping each reply queue into hctx, but one tricky thing is
> the HBA wide(instead of hw queue wide) tags.
>
> This patch is based on the following Hannes's patch:
>
> 	https://marc.info/?l=linux-block&m=149132580511346&w=2
>
> One big difference with Hannes's is that this patch only makes the tags
> sbitmap and active_queues data structure HBA wide, and others are kept as
> NUMA locality, such as request, hctx, tags, ...
>
> The following patch will support global tags on null_blk, also the performance
> data is provided, no obvious performance loss is observed when the whole
> hw queue depth is same.
>
> Cc: Hannes Reinecke
> Cc: Arun Easi
> Cc: Omar Sandoval
> Cc: "Martin K. Petersen"
> Cc: James Bottomley
> Cc: Christoph Hellwig
> Cc: Don Brace
> Cc: Kashyap Desai
> Cc: Peter Rivera
> Cc: Mike Snitzer
> Tested-by: Laurence Oberman
> Signed-off-by: Ming Lei
> ---
>  block/blk-mq-debugfs.c |  1 +
>  block/blk-mq-sched.c   | 13 -
>  block/blk-mq-tag.c     | 23 ++-
>  block/blk-mq-tag.h     |  5 -
>  block/blk-mq.c         | 29 -
>  block/blk-mq.h         |  3 ++-
>  include/linux/blk-mq.h |  2 ++
>  7 files changed, 63 insertions(+), 13 deletions(-)
>
> diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
> index 0dfafa4b655a..0f0fafe03f5d 100644
> --- a/block/blk-mq-debugfs.c
> +++ b/block/blk-mq-debugfs.c
> @@ -206,6 +206,7 @@ static const char *const hctx_flag_name[] = {
>  	HCTX_FLAG_NAME(SHOULD_MERGE),
>  	HCTX_FLAG_NAME(TAG_SHARED),
>  	HCTX_FLAG_NAME(SG_MERGE),
> +	HCTX_FLAG_NAME(GLOBAL_TAGS),
>  	HCTX_FLAG_NAME(BLOCKING),
>  	HCTX_FLAG_NAME(NO_SCHED),
>  };
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index 55c0a745b427..385bbec73804 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -81,6 +81,17 @@ static bool blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx *hctx)
>  	} else
>  		clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
>  
> +	/* need to restart all hw queues for global tags */
> +	if (hctx->flags & BLK_MQ_F_GLOBAL_TAGS) {
> +		struct blk_mq_hw_ctx *hctx2;
> +		int i;
> +
> +		queue_for_each_hw_ctx(hctx->queue, hctx2, i)
> +			if (blk_mq_run_hw_queue(hctx2, true))
> +				return true;

Is it intentional that we stop after the first hw queue does work? That
seems fine but it's a little confusing because the comment claims we
restart everything.

> +		return false;
> +	}
> +
[PATCH V2 2/8] blk-mq: introduce BLK_MQ_F_GLOBAL_TAGS
Quite a few HBAs (such as HPSA, megaraid, mpt3sas, ...) support multiple
reply queues, but tags are often HBA-wide.

These HBAs have switched to pci_alloc_irq_vectors(PCI_IRQ_AFFINITY)
for automatic affinity assignment.

Now that 84676c1f21e8ff5 (genirq/affinity: assign vectors to all possible
CPUs) has been merged into v4.16-rc, it is easy for some irq vectors to be
assigned only offline CPUs; this can't be avoided even if the allocation
is improved.

So all these drivers have to avoid asking the HBA to complete a request in
a reply queue that has no online CPUs assigned, and HPSA has been broken
with v4.15+:

	https://marc.info/?l=linux-kernel&m=151748144730409&w=2

This issue can be solved generically and easily via blk_mq (scsi_mq) multiple
hw queues by mapping each reply queue to an hctx, but one tricky thing is
the HBA-wide (instead of hw-queue-wide) tags.

This patch is based on the following patch from Hannes:

	https://marc.info/?l=linux-block&m=149132580511346&w=2

One big difference from Hannes's patch is that this one makes only the tags
sbitmap and the active_queues data structure HBA-wide; the others, such as
request, hctx, tags, ..., keep their NUMA locality.

The following patch adds global tags support to null_blk and also provides
performance data; no obvious performance loss is observed when the total
hw queue depth is the same.

Cc: Hannes Reinecke
Cc: Arun Easi
Cc: Omar Sandoval
Cc: "Martin K. Petersen"
Cc: James Bottomley
Cc: Christoph Hellwig
Cc: Don Brace
Cc: Kashyap Desai
Cc: Peter Rivera
Cc: Mike Snitzer
Tested-by: Laurence Oberman
Signed-off-by: Ming Lei
---
 block/blk-mq-debugfs.c |  1 +
 block/blk-mq-sched.c   | 13 -
 block/blk-mq-tag.c     | 23 ++-
 block/blk-mq-tag.h     |  5 -
 block/blk-mq.c         | 29 -
 block/blk-mq.h         |  3 ++-
 include/linux/blk-mq.h |  2 ++
 7 files changed, 63 insertions(+), 13 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 0dfafa4b655a..0f0fafe03f5d 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -206,6 +206,7 @@ static const char *const hctx_flag_name[] = {
 	HCTX_FLAG_NAME(SHOULD_MERGE),
 	HCTX_FLAG_NAME(TAG_SHARED),
 	HCTX_FLAG_NAME(SG_MERGE),
+	HCTX_FLAG_NAME(GLOBAL_TAGS),
 	HCTX_FLAG_NAME(BLOCKING),
 	HCTX_FLAG_NAME(NO_SCHED),
 };
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 55c0a745b427..385bbec73804 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -81,6 +81,17 @@ static bool blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx *hctx)
 	} else
 		clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
 
+	/* need to restart all hw queues for global tags */
+	if (hctx->flags & BLK_MQ_F_GLOBAL_TAGS) {
+		struct blk_mq_hw_ctx *hctx2;
+		int i;
+
+		queue_for_each_hw_ctx(hctx->queue, hctx2, i)
+			if (blk_mq_run_hw_queue(hctx2, true))
+				return true;
+		return false;
+	}
+
 	return blk_mq_run_hw_queue(hctx, true);
 }
 
@@ -495,7 +506,7 @@ static int blk_mq_sched_alloc_tags(struct request_queue *q,
 	int ret;
 
 	hctx->sched_tags = blk_mq_alloc_rq_map(set, hctx_idx, q->nr_requests,
-					       set->reserved_tags);
+					       set->reserved_tags, false);
 	if (!hctx->sched_tags)
 		return -ENOMEM;
 
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 571797dc36cb..66377d09eaeb 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -379,9 +379,11 @@ static struct blk_mq_tags *blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
 	return NULL;
 }
 
-struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
+struct blk_mq_tags *blk_mq_init_tags(struct blk_mq_tag_set *set,
+				     unsigned int total_tags,
 				     unsigned int reserved_tags,
-				     int node, int alloc_policy)
+				     int node, int alloc_policy,
+				     bool global_tag)
 {
 	struct blk_mq_tags *tags;
 
@@ -397,6 +399,14 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
 	tags->nr_tags = total_tags;
 	tags->nr_reserved_tags = reserved_tags;
 
+	WARN_ON(global_tag && !set->global_tags);
+	if (global_tag && set->global_tags) {
+		tags->bitmap_tags = set->global_tags->bitmap_tags;
+		tags->breserved_tags = set->global_tags->breserved_tags;
+		tags->active_queues =