Re: [PATCH V3 06/14] blk-mq-sched: don't dequeue request until all in ->dispatch are flushed

2017-09-01 Thread Bart Van Assche
On Fri, 2017-09-01 at 11:02 +0800, Ming Lei wrote:
> That is a good question, but it is answered by the comment placed
> before the DISPATCH_BUSY check in blk_mq_sched_dispatch_requests():
> 
>   /*
>    * If DISPATCH_BUSY is set, that means hw queue is busy
>    * and requests in the list of hctx->dispatch need to
>    * be flushed first, so return early.
>    *
>    * Whenever DISPATCH_BUSY is set, blk_mq_run_hw_queue()
>    * will be run to try to make progress, so it is always
>    * safe to check the state here.
>    */
> 
> Suppose the two writes are reordered: sooner or later,
> list_empty_careful(&hctx->dispatch) will observe the insertion,
> and requests will be dispatched from this hw queue after
> '->dispatch' is flushed.
> 
> Since setting DISPATCH_BUSY now requires holding hctx->lock,
> we should avoid adding a barrier there.
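> 
> To spell out the argument: the writer publishes both the request and
> the flag under hctx->lock, and the reader tolerates either store order
> because a queue re-run follows. A minimal sketch of the idea, using
> the same helpers as the patch (not the exact patch code):
> 
> 	/* writer: insert into ->dispatch and mark the hw queue busy */
> 	spin_lock(&hctx->lock);
> 	list_add(&rq->queuelist, &hctx->dispatch);
> 	set_bit(BLK_MQ_S_DISPATCH_BUSY, &hctx->state);
> 	spin_unlock(&hctx->lock);
> 
> 	/* reader: even if the two stores become visible out of order,
> 	 * blk_mq_run_hw_queue() is re-run whenever DISPATCH_BUSY is set,
> 	 * so a later run observes the insertion via
> 	 * list_empty_careful(&hctx->dispatch) and flushes it */
> 	if (test_bit(BLK_MQ_S_DISPATCH_BUSY, &hctx->state))
> 		return;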

Although it is not clear to me how anyone who has not followed this discussion
is supposed to figure out all these subtleties, if the other comments get
addressed:

Reviewed-by: Bart Van Assche 



Re: [PATCH V3 06/14] blk-mq-sched: don't dequeue request until all in ->dispatch are flushed

2017-08-31 Thread Bart Van Assche
On Thu, 2017-08-31 at 12:01 +0800, Ming Lei wrote:
> On Wed, Aug 30, 2017 at 05:11:00PM +, Bart Van Assche wrote:
> > On Sun, 2017-08-27 at 00:33 +0800, Ming Lei wrote:
> > [ ... ]
> > Shouldn't blk_mq_sched_dispatch_requests() set BLK_MQ_S_DISPATCH_BUSY just
> > after the following statement, because this statement makes the dispatch
> > list empty?
> 
> Actually that is what I did in V1.
> 
> I changed to this way because setting the BUSY flag there would widen
> the race window a bit: for example, if one request is added to ->dispatch
> just after it is flushed, the check on the BUSY bit won't catch this
> case. Setting the flag from the beginning also lets us avoid checking
> both the bit and list_empty_careful(&hctx->dispatch) in
> blk_mq_sched_dispatch_requests(), so the code becomes simpler and more
> readable.
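> 
> To make the difference concrete, the dispatch-side checks of the two
> variants would look roughly like this (only a sketch using the names
> from the patch, not code from either version):
> 
> 	/* V3: the flag is set at every insertion into ->dispatch,
> 	 * so the bit test alone suffices */
> 	if (test_bit(BLK_MQ_S_DISPATCH_BUSY, &hctx->state))
> 		return;
> 
> 	/* V1 style: the flag is set only after the splice, so a request
> 	 * inserted just after the flush isn't covered by the bit and
> 	 * the list has to be re-checked too */
> 	if (test_bit(BLK_MQ_S_DISPATCH_BUSY, &hctx->state) ||
> 	    !list_empty_careful(&hctx->dispatch))
> 		return;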

Hello Ming,

My understanding is that blk_mq_sched_dispatch_requests() will only work
correctly if the code that sets the DISPATCH_BUSY flag does so after having
inserted one or more elements in the dispatch list. Although x86 CPUs do not
reorder store operations, I think the functions that set the DISPATCH_BUSY
flag need a memory barrier between these two store operations. I'm referring
to the blk_mq_sched_bypass_insert(), blk_mq_dispatch_wait_add() and
blk_mq_hctx_notify_dead() functions.
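
To illustrate, one way to order the two stores for lockless readers
would be the standard kernel barrier before an atomic bitop; this is
only a sketch of the placement I have in mind, not a concrete patch:

	list_add(&rq->queuelist, &hctx->dispatch);
	/* make the list insertion visible before the flag store;
	 * set_bit() is an atomic bitop, hence smp_mb__before_atomic() */
	smp_mb__before_atomic();
	set_bit(BLK_MQ_S_DISPATCH_BUSY, &hctx->state);

The reader side would then need a matching barrier between testing the
flag and inspecting the list.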

> > > +  * too small, no need to worry about performance
> > 
> >^^^
> > The word "too" seems extraneous to me in this sentence.
> 
> Maybe 'extremely' is better, or just remove it?

If the word "too" were removed, I think the comment would still be clear.

Thanks,

Bart.

Re: [PATCH V3 06/14] blk-mq-sched: don't dequeue request until all in ->dispatch are flushed

2017-08-30 Thread Bart Van Assche
On Sun, 2017-08-27 at 00:33 +0800, Ming Lei wrote:
> During dispatching, we moved all requests from hctx->dispatch to
> one temporary list, then dispatch them one by one from this list.
> Unfortunately duirng this period, run queue from other contexts
^^
during?
> may think the queue is idle, then start to dequeue from sw/scheduler
> queue and still try to dispatch because ->dispatch is empty. This way
> hurts sequential I/O performance because requests are dequeued when
> lld queue is busy.
> [ ... ]
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index 735e432294ab..4d7bea8c2594 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -146,7 +146,6 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
>   struct request_queue *q = hctx->queue;
>   struct elevator_queue *e = q->elevator;
>   const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request;
> - bool do_sched_dispatch = true;
>   LIST_HEAD(rq_list);
>  
>   /* RCU or SRCU read lock is needed before checking quiesced flag */

Shouldn't blk_mq_sched_dispatch_requests() set BLK_MQ_S_DISPATCH_BUSY just after
the following statement because this statement makes the dispatch list empty?

	list_splice_init(&hctx->dispatch, &rq_list);
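
In other words, something like the following (a hypothetical variant
for discussion, not code from the patch):

	spin_lock(&hctx->lock);
	list_splice_init(&hctx->dispatch, &rq_list);
	/* the list is empty now, but its requests are still pending in
	 * rq_list, so mark the hw queue busy here */
	set_bit(BLK_MQ_S_DISPATCH_BUSY, &hctx->state);
	spin_unlock(&hctx->lock);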

> @@ -177,8 +176,33 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
>*/
>   if (!list_empty(&rq_list)) {
>   blk_mq_sched_mark_restart_hctx(hctx);
> - do_sched_dispatch = blk_mq_dispatch_rq_list(q, &rq_list);
> - } else if (!has_sched_dispatch && !q->queue_depth) {
> + blk_mq_dispatch_rq_list(q, &rq_list);
> +
> + /*
> +  * We may clear DISPATCH_BUSY just after it
> +  * is set from another context, the only cost
> +  * is that one request is dequeued a bit early,
> +  * we can survive that. Given the window is
> +  * too small, no need to worry about performance
   ^^^
The word "too" seems extraneous to me in this sentence.

>  bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
> @@ -330,6 +353,7 @@ static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
>    */
>   spin_lock(&hctx->lock);
>   list_add(&rq->queuelist, &hctx->dispatch);
> + set_bit(BLK_MQ_S_DISPATCH_BUSY, &hctx->state);
>   spin_unlock(&hctx->lock);
>   return true;
>  }

Is it necessary to make blk_mq_sched_bypass_insert() set BLK_MQ_S_DISPATCH_BUSY?
My understanding is that only code that makes the dispatch list empty should
set BLK_MQ_S_DISPATCH_BUSY. However, blk_mq_sched_bypass_insert() adds an
element to the dispatch list, which guarantees that the list is not empty.

> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index f063dd0f197f..6af56a71c1cd 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1140,6 +1140,11 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list)
>  
>   spin_lock(&hctx->lock);
>   list_splice_init(list, &hctx->dispatch);
> + /*
> +  * DISPATCH_BUSY won't be cleared until all requests
> +  * in hctx->dispatch are dispatched successfully
> +  */
> + set_bit(BLK_MQ_S_DISPATCH_BUSY, &hctx->state);
>   spin_unlock(&hctx->lock);

Same comment here - since this code adds one or more requests to the dispatch
list, is it really needed to set the DISPATCH_BUSY flag?

Bart.

[PATCH V3 06/14] blk-mq-sched: don't dequeue request until all in ->dispatch are flushed

2017-08-26 Thread Ming Lei
During dispatching, we moved all requests from hctx->dispatch to
one temporary list, then dispatch them one by one from this list.
Unfortunately duirng this period, run queue from other contexts
may think the queue is idle, then start to dequeue from sw/scheduler
queue and still try to dispatch because ->dispatch is empty. This way
hurts sequential I/O performance because requests are dequeued when
lld queue is busy.

This patch introduces the state BLK_MQ_S_DISPATCH_BUSY to
make sure that requests aren't dequeued until ->dispatch is
flushed.
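
In outline, the flag protocol added by this patch works as follows (a
condensed sketch of the diff below, with the control flow simplified):

	void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
	{
		LIST_HEAD(rq_list);

		/* flush requests parked in hctx->dispatch first */
		spin_lock(&hctx->lock);
		list_splice_init(&hctx->dispatch, &rq_list);
		spin_unlock(&hctx->lock);

		if (!list_empty(&rq_list)) {
			blk_mq_dispatch_rq_list(hctx->queue, &rq_list);
			/* clear the flag only once ->dispatch is drained */
			if (list_empty_careful(&hctx->dispatch))
				clear_bit(BLK_MQ_S_DISPATCH_BUSY,
					  &hctx->state);
		}

		/* don't dequeue from sw/scheduler queues while the hw
		 * queue is still busy flushing ->dispatch */
		if (test_bit(BLK_MQ_S_DISPATCH_BUSY, &hctx->state))
			return;

		/* ... dequeue from the sw or scheduler queue as before ... */
	}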

Signed-off-by: Ming Lei 
---
 block/blk-mq-debugfs.c |  1 +
 block/blk-mq-sched.c   | 58 +++---
 block/blk-mq.c |  6 ++
 include/linux/blk-mq.h |  1 +
 4 files changed, 49 insertions(+), 17 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index e53b6129ca5a..64a6b34b402c 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -180,6 +180,7 @@ static const char *const hctx_state_name[] = {
HCTX_STATE_NAME(SCHED_RESTART),
HCTX_STATE_NAME(TAG_WAITING),
HCTX_STATE_NAME(START_ON_RUN),
+   HCTX_STATE_NAME(DISPATCH_BUSY),
 };
 #undef HCTX_STATE_NAME
 
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 735e432294ab..4d7bea8c2594 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -146,7 +146,6 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
struct request_queue *q = hctx->queue;
struct elevator_queue *e = q->elevator;
const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request;
-   bool do_sched_dispatch = true;
LIST_HEAD(rq_list);
 
/* RCU or SRCU read lock is needed before checking quiesced flag */
@@ -177,8 +176,33 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 */
 	if (!list_empty(&rq_list)) {
 		blk_mq_sched_mark_restart_hctx(hctx);
-		do_sched_dispatch = blk_mq_dispatch_rq_list(q, &rq_list);
-	} else if (!has_sched_dispatch && !q->queue_depth) {
+		blk_mq_dispatch_rq_list(q, &rq_list);
+
+		/*
+		 * We may clear DISPATCH_BUSY just after it
+		 * is set from another context, the only cost
+		 * is that one request is dequeued a bit early,
+		 * we can survive that. Given the window is
+		 * too small, no need to worry about performance
+		 * effect.
+		 */
+		if (list_empty_careful(&hctx->dispatch))
+			clear_bit(BLK_MQ_S_DISPATCH_BUSY, &hctx->state);
+   }
+
+	/*
+	 * If DISPATCH_BUSY is set, that means hw queue is busy
+	 * and requests in the list of hctx->dispatch need to
+	 * be flushed first, so return early.
+	 *
+	 * Whenever DISPATCH_BUSY is set, blk_mq_run_hw_queue()
+	 * will be run to try to make progress, so it is always
+	 * safe to check the state here.
+	 */
+	if (test_bit(BLK_MQ_S_DISPATCH_BUSY, &hctx->state))
+   return;
+
+   if (!has_sched_dispatch) {
/*
 * If there is no per-request_queue depth, we
 * flush all requests in this hw queue, otherwise
@@ -187,22 +211,21 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 * is busy, which can be triggered easily by
 * per-request_queue queue depth
 */
-	blk_mq_flush_busy_ctxs(hctx, &rq_list);
-	blk_mq_dispatch_rq_list(q, &rq_list);
-   }
-
-   if (!do_sched_dispatch)
-   return;
+   if (!q->queue_depth) {
+		blk_mq_flush_busy_ctxs(hctx, &rq_list);
+		blk_mq_dispatch_rq_list(q, &rq_list);
+   } else {
+   blk_mq_do_dispatch_ctx(q, hctx);
+   }
+   } else {
 
-   /*
-* We want to dispatch from the scheduler if we had no work left
-* on the dispatch list, OR if we did have work but weren't able
-* to make progress.
-*/
-   if (has_sched_dispatch)
+   /*
+* We want to dispatch from the scheduler if we had no work left
+* on the dispatch list, OR if we did have work but weren't able
+* to make progress.
+*/
blk_mq_do_dispatch_sched(q, e, hctx);
-   else
-   blk_mq_do_dispatch_ctx(q, hctx);
+   }
 }
 
 bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
@@ -330,6 +353,7 @@ static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
 */
 	spin_lock(&hctx->lock);
 	list_add(&rq->queuelist, &hctx->dispatch);
+	set_bit(BLK_MQ_S_DISPATCH_BUSY, &hctx->state);
 	spin_unlock(&hctx->lock);
return true;
 }
diff --git a/block/blk-mq.c b/block/blk-mq.c
index f063dd0f197f..6af56a71c1cd 100644