Re: [PATCH v2 4/8] blk-mq: fix blk_mq_quiesce_queue

2017-05-30 Thread Ming Lei
On Tue, May 30, 2017 at 04:54:47PM +0000, Bart Van Assche wrote:
> On Tue, 2017-05-30 at 08:22 +0800, Ming Lei wrote:
> > On Sun, May 28, 2017 at 04:10:09PM +0000, Bart Van Assche wrote:
> > > I really would like to see the blk_queue_quiesced() tests as close as
> > > possible to the blk_mq_hctx_stopped() tests. But I agree that we need
> > > a way to document and/or
> > 
> > Could you explain why we have to do that? Checking the stopped state
> > doesn't require holding the RCU/SRCU read lock, and the two states are
> > really different.
> 
> I'm really surprised that you ask me why ... It's because the purpose of
> the "stopped" and "quiesced" flags is similar, namely to prevent requests
> from being dispatched.

Actually they are not, and you can see that in patch 7.

But I will move the check into the dispatch function, and add a comment
about the RCU/SRCU read lock requirement.
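
A minimal sketch of that plan (illustrative only; the placement and the
comment wording are assumptions, not the posted patch):

void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
{
	/*
	 * Callers must hold the RCU read lock (or hctx->queue_rq_srcu
	 * for BLK_MQ_F_BLOCKING queues) so that blk_mq_quiesce_queue()
	 * can drain us via synchronize_rcu()/synchronize_srcu().
	 */
	if (unlikely(blk_mq_hctx_stopped(hctx) ||
		     blk_queue_quiesced(hctx->queue)))
		return;

	/* ... existing dispatch logic unchanged ... */
}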

> It doesn't matter that, with your patches applied, it is no longer
> necessary to hold an RCU / SRCU lock when testing the stopped flag.
> 
> > > verify that these tests occur with an RCU read-side lock held. Have
> > > you considered using rcu_read_lock_held() to document this?
> > 
> > Then we would need to check whether it is RCU or SRCU, which makes the
> > code as ugly as the current check on BLOCKING.
> 
> How about introducing a macro or inline function in the block layer that tests
> whether either the RCU read lock or an SRCU read lock is held depending on the
> value of the BLK_MQ_F_BLOCKING flag?

That might make the code cleaner, but it would need to check the condition
twice.

Thanks,
Ming


Re: [PATCH v2 4/8] blk-mq: fix blk_mq_quiesce_queue

2017-05-30 Thread Bart Van Assche
On Sun, 2017-05-28 at 18:44 +0800, Ming Lei wrote:
> First of all it really is a fix, and only then an improvement, so could
> you tell me what is wrong with the title and the description?

Hello Ming,

Can you explain to me why you want to keep the blk_mq_stop_hw_queues() call in
nvme_suspend_queue()? Since immediately after that call the NVMe driver starts
freeing resources, shouldn't that call be changed into a call to
blk_mq_quiesce_queue()? I think the same comment also applies to the
blk_mq_stop_hw_queues() calls in nvme_rdma_error_recovery_work() and
nvme_rdma_shutdown_ctrl().
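
As an illustration only (a sketch of the suggestion, not a posted patch;
'q' is a hypothetical stand-in for the request queue being torn down):

	/*
	 * Quiesce rather than stop before freeing resources, so that
	 * blk_mq_quiesce_queue() waits for all in-flight ->queue_rq()
	 * calls to drain instead of merely preventing future queue runs.
	 */
	blk_mq_quiesce_queue(q);	/* instead of blk_mq_stop_hw_queues(q) */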

Bart.

Re: [PATCH v2 4/8] blk-mq: fix blk_mq_quiesce_queue

2017-05-30 Thread Bart Van Assche
On Tue, 2017-05-30 at 08:22 +0800, Ming Lei wrote:
> On Sun, May 28, 2017 at 04:10:09PM +0000, Bart Van Assche wrote:
> > I really would like to see the blk_queue_quiesced() tests as close as
> > possible to the blk_mq_hctx_stopped() tests. But I agree that we need
> > a way to document and/or
> 
> Could you explain why we have to do that? Checking the stopped state
> doesn't require holding the RCU/SRCU read lock, and the two states are
> really different.

I'm really surprised that you ask me why ... It's because the purpose of the
"stopped" and "quiesced" flags is similar, namely to prevent requests from
being dispatched. It doesn't matter that, with your patches applied, it is no
longer necessary to hold an RCU / SRCU lock when testing the stopped flag.

> > verify that these tests occur with an RCU read-side lock held. Have you
> > considered using rcu_read_lock_held() to document this?
> 
> Then we would need to check whether it is RCU or SRCU, which makes the
> code as ugly as the current check on BLOCKING.

How about introducing a macro or inline function in the block layer that tests
whether either the RCU read lock or an SRCU read lock is held depending on the
value of the BLK_MQ_F_BLOCKING flag?
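
A minimal sketch of such a helper (the name is hypothetical, and it assumes
the per-hctx queue_rq_srcu used elsewhere in this series):

#include <linux/blk-mq.h>
#include <linux/rcupdate.h>
#include <linux/srcu.h>

/*
 * Report whether the read-side lock matching this hctx's dispatch path
 * is held; intended for lockdep-style documentation, e.g.
 * WARN_ON_ONCE(!blk_mq_dispatch_lock_held(hctx)).
 */
static inline bool blk_mq_dispatch_lock_held(struct blk_mq_hw_ctx *hctx)
{
	if (hctx->flags & BLK_MQ_F_BLOCKING)
		return srcu_read_lock_held(&hctx->queue_rq_srcu);
	return rcu_read_lock_held();
}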

Bart.

Re: [PATCH v2 4/8] blk-mq: fix blk_mq_quiesce_queue

2017-05-29 Thread Ming Lei
On Sun, May 28, 2017 at 04:10:09PM +0000, Bart Van Assche wrote:
> On Sun, 2017-05-28 at 18:44 +0800, Ming Lei wrote:
> > On Sat, May 27, 2017 at 09:46:45PM +0000, Bart Van Assche wrote:
> > > On Sat, 2017-05-27 at 22:21 +0800, Ming Lei wrote:
> > > >  bool blk_mq_can_queue(struct blk_mq_hw_ctx *hctx)
> > > > @@ -1108,13 +1119,15 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
> > > >  
> > > >  	if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
> > > >  		rcu_read_lock();
> > > > -		blk_mq_sched_dispatch_requests(hctx);
> > > > +		if (!blk_queue_quiesced(hctx->queue))
> > > > +			blk_mq_sched_dispatch_requests(hctx);
> > > >  		rcu_read_unlock();
> > > >  	} else {
> > > >  		might_sleep();
> > > >  
> > > >  		srcu_idx = srcu_read_lock(&hctx->queue_rq_srcu);
> > > > -		blk_mq_sched_dispatch_requests(hctx);
> > > > +		if (!blk_queue_quiesced(hctx->queue))
> > > > +			blk_mq_sched_dispatch_requests(hctx);
> > > >  		srcu_read_unlock(&hctx->queue_rq_srcu, srcu_idx);
> > > >  	}
> > > >  }
> > > 
> > > Sorry, but I don't like these changes. Why have the blk_queue_quiesced()
> > > calls been added at other code locations than the blk_mq_hctx_stopped()
> > > calls? This will make the block layer unnecessarily hard to maintain.
> > > Please consider changing the blk_mq_hctx_stopped(hctx) calls in
> > > blk_mq_sched_dispatch_requests() and *blk_mq_*run_hw_queue*() into
> > > blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q).
> > 
> > One benefit is that it makes explicit that the flag has to be checked
> > inside the RCU read-side critical section. If the check is put elsewhere,
> > someone may move it out of the read-side critical section in the future.
> 
> Hello Ming,
> 
> I really would like to see the blk_queue_quiesced() tests as close as
> possible to the blk_mq_hctx_stopped() tests. But I agree that we need
> a way to document and/or

Could you explain why we have to do that? Checking the stopped state
doesn't require holding the RCU/SRCU read lock, and the two states are
really different.

> verify that these tests occur with an RCU read-side lock held. Have you
> considered using rcu_read_lock_held() to document this?

Then we would need to check whether it is RCU or SRCU, which makes the
code as ugly as the current check on BLOCKING.

Thanks,
Ming


Re: [PATCH v2 4/8] blk-mq: fix blk_mq_quiesce_queue

2017-05-28 Thread Bart Van Assche
On Sun, 2017-05-28 at 18:44 +0800, Ming Lei wrote:
> On Sat, May 27, 2017 at 09:46:45PM +0000, Bart Van Assche wrote:
> > On Sat, 2017-05-27 at 22:21 +0800, Ming Lei wrote:
> > >  bool blk_mq_can_queue(struct blk_mq_hw_ctx *hctx)
> > > @@ -1108,13 +1119,15 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
> > >  
> > >  	if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
> > >  		rcu_read_lock();
> > > -		blk_mq_sched_dispatch_requests(hctx);
> > > +		if (!blk_queue_quiesced(hctx->queue))
> > > +			blk_mq_sched_dispatch_requests(hctx);
> > >  		rcu_read_unlock();
> > >  	} else {
> > >  		might_sleep();
> > >  
> > >  		srcu_idx = srcu_read_lock(&hctx->queue_rq_srcu);
> > > -		blk_mq_sched_dispatch_requests(hctx);
> > > +		if (!blk_queue_quiesced(hctx->queue))
> > > +			blk_mq_sched_dispatch_requests(hctx);
> > >  		srcu_read_unlock(&hctx->queue_rq_srcu, srcu_idx);
> > >  	}
> > >  }
> > 
> > Sorry, but I don't like these changes. Why have the blk_queue_quiesced()
> > calls been added at other code locations than the blk_mq_hctx_stopped()
> > calls? This will make the block layer unnecessarily hard to maintain.
> > Please consider changing the blk_mq_hctx_stopped(hctx) calls in
> > blk_mq_sched_dispatch_requests() and *blk_mq_*run_hw_queue*() into
> > blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q).
> 
> One benefit is that it makes explicit that the flag has to be checked
> inside the RCU read-side critical section. If the check is put elsewhere,
> someone may move it out of the read-side critical section in the future.

Hello Ming,

I really would like to see the blk_queue_quiesced() tests as close as
possible to the blk_mq_hctx_stopped() tests. But I agree that we need a way
to document and/or verify that these tests occur with an RCU read-side lock
held. Have you considered using rcu_read_lock_held() to document this?

Thanks,

Bart.

Re: [PATCH v2 4/8] blk-mq: fix blk_mq_quiesce_queue

2017-05-28 Thread Ming Lei

On Sat, May 27, 2017 at 09:46:45PM +0000, Bart Van Assche wrote:
> On Sat, 2017-05-27 at 22:21 +0800, Ming Lei wrote:
> > It is required that no dispatch can happen any more once
> > blk_mq_quiesce_queue() returns, and we have no such requirement on the
> > queue-stopping APIs.
> > 
> > But blk_mq_quiesce_queue() still may not block/drain dispatch in the
> > following cases:
> > 
> > - direct issue or BLK_MQ_S_START_ON_RUN
> > - in theory, new RCU read-side critical sections may begin while
> >   synchronize_rcu() is waiting and end after synchronize_rcu() returns;
> >   during that period dispatch may still happen
> 
> Hello Ming,

Hello Bart,

> 
> I think the title and the description of this patch are wrong. Since
> the current queue quiescing mechanism works fine for drivers that do
> not stop and restart a queue (e.g. SCSI and dm-core), please change the

I have already described the issues in the current quiesce mechanism; here
they are again:

But blk_mq_quiesce_queue() still may not block/drain dispatch in the
following cases:

- direct issue or BLK_MQ_S_START_ON_RUN
- in theory, new RCU read-side critical sections may begin while
  synchronize_rcu() is waiting and end after synchronize_rcu() returns;
  during that period dispatch may still happen

Unlike stopping a queue, all dispatching has to be drained/blocked by the
time synchronize_rcu() returns; otherwise a double free or use-after-free
can be triggered, which has already been observed with NVMe.
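
For reference, a simplified sketch of the quiesce pattern at issue (not the
actual block/blk-mq.c code; blk_mq_set_queue_quiesced() is a hypothetical
stand-in for the flag-setting step): the writer publishes the quiesced flag
and then waits out every read-side section, so any dispatcher that tests the
flag inside its RCU/SRCU read-side section has drained by the time this
returns.

void blk_mq_quiesce_queue(struct request_queue *q)
{
	struct blk_mq_hw_ctx *hctx;
	unsigned int i;

	blk_mq_set_queue_quiesced(q);	/* hypothetical flag-setting helper */

	/*
	 * Wait for every dispatch that may have started before the flag
	 * became visible; later dispatches see the flag and bail out
	 * before calling into the driver.
	 */
	queue_for_each_hw_ctx(q, hctx, i) {
		if (hctx->flags & BLK_MQ_F_BLOCKING)
			synchronize_srcu(&hctx->queue_rq_srcu);
		else
			synchronize_rcu();
	}
}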

> title and description to reflect that the purpose of this patch is
> to allow drivers that use the quiesce mechanism to restart a queue
> without unquiescing it.

First of all it really is a fix, and only then an improvement, so could you
tell me what is wrong with the title and the description?

> 
> > @@ -209,6 +217,9 @@ void blk_mq_wake_waiters(struct request_queue *q)
> >  	 * the queue are notified as well.
> >  	 */
> >  	wake_up_all(&q->mq_freeze_wq);
> > +
> > +	/* Forcibly unquiesce the queue to avoid having stuck requests */
> > +	blk_mq_unquiesce_queue(q);
> >  }
> 
> Should the block layer unquiesce a queue if a block driver hasn't
> done that before queue removal starts, or should the block driver
> itself do that?

Some drivers, such as NVMe, might quiesce a queue and never unquiesce
it.

OK, I will consider fixing the drivers first.

> The block layer doesn't restart stopped queues from
> inside blk_set_queue_dying(), so why should it unquiesce a quiesced
> queue?

If a quiesced queue isn't unquiesced, it may cause an I/O hang, since any
I/O in the sw queue/scheduler queue can't be completed at all.

OK, I will fix the drivers in the next post.

Actually the queue has to be restarted after blk_set_queue_dying();
otherwise it can cause an I/O hang too, since there can be lots of
writeback I/O in the following del_gendisk(). We have done that in NVMe
already, see nvme_kill_queues().

Maybe in the future we should consider doing all of that in the block
layer, as sketched below.
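
A sketch of what such a block-layer helper might look like, in the spirit of
nvme_kill_queues() (the name and exact steps are hypothetical;
blk_mq_unquiesce_queue() is the API proposed by this series):

void blk_mq_kill_queue(struct request_queue *q)
{
	blk_set_queue_dying(q);

	/*
	 * Force the queue running so queued and writeback I/O can be
	 * failed/completed instead of being left stuck forever.
	 */
	blk_mq_unquiesce_queue(q);
	blk_mq_start_hw_queues(q);
}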

>  
> >  bool blk_mq_can_queue(struct blk_mq_hw_ctx *hctx)
> > @@ -1108,13 +1119,15 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
> >  
> >  	if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
> >  		rcu_read_lock();
> > -		blk_mq_sched_dispatch_requests(hctx);
> > +		if (!blk_queue_quiesced(hctx->queue))
> > +			blk_mq_sched_dispatch_requests(hctx);
> >  		rcu_read_unlock();
> >  	} else {
> >  		might_sleep();
> >  
> >  		srcu_idx = srcu_read_lock(&hctx->queue_rq_srcu);
> > -		blk_mq_sched_dispatch_requests(hctx);
> > +		if (!blk_queue_quiesced(hctx->queue))
> > +			blk_mq_sched_dispatch_requests(hctx);
> >  		srcu_read_unlock(&hctx->queue_rq_srcu, srcu_idx);
> >  	}
> >  }
> 
> Sorry, but I don't like these changes. Why have the blk_queue_quiesced()
> calls been added at other code locations than the blk_mq_hctx_stopped()
> calls? This will make the block layer unnecessarily hard to maintain.
> Please consider changing the blk_mq_hctx_stopped(hctx) calls in
> blk_mq_sched_dispatch_requests() and *blk_mq_*run_hw_queue*() into
> blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q).

One benefit is that it makes explicit that the flag has to be checked
inside the RCU read-side critical section. If the check is put elsewhere,
someone may move it out of the read-side critical section in the future.


Thanks,
Ming


Re: [PATCH v2 4/8] blk-mq: fix blk_mq_quiesce_queue

2017-05-27 Thread Bart Van Assche
On Sat, 2017-05-27 at 22:21 +0800, Ming Lei wrote:
> It is required that no dispatch can happen any more once
> blk_mq_quiesce_queue() returns, and we have no such requirement on the
> queue-stopping APIs.
> 
> But blk_mq_quiesce_queue() still may not block/drain dispatch in the
> following cases:
> 
> - direct issue or BLK_MQ_S_START_ON_RUN
> - in theory, new RCU read-side critical sections may begin while
>   synchronize_rcu() is waiting and end after synchronize_rcu() returns;
>   during that period dispatch may still happen

Hello Ming,

I think the title and the description of this patch are wrong. Since
the current queue quiescing mechanism works fine for drivers that do
not stop and restart a queue (e.g. SCSI and dm-core), please change the
title and description to reflect that the purpose of this patch is
to allow drivers that use the quiesce mechanism to restart a queue
without unquiescing it.

> @@ -209,6 +217,9 @@ void blk_mq_wake_waiters(struct request_queue *q)
>  	 * the queue are notified as well.
>  	 */
>  	wake_up_all(&q->mq_freeze_wq);
> +
> +	/* Forcibly unquiesce the queue to avoid having stuck requests */
> +	blk_mq_unquiesce_queue(q);
>  }

Should the block layer unquiesce a queue if a block driver hasn't
done that before queue removal starts, or should the block driver
itself do that? The block layer doesn't restart stopped queues from
inside blk_set_queue_dying(), so why should it unquiesce a quiesced
queue?
 
>  bool blk_mq_can_queue(struct blk_mq_hw_ctx *hctx)
> @@ -1108,13 +1119,15 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
>  
>  	if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
>  		rcu_read_lock();
> -		blk_mq_sched_dispatch_requests(hctx);
> +		if (!blk_queue_quiesced(hctx->queue))
> +			blk_mq_sched_dispatch_requests(hctx);
>  		rcu_read_unlock();
>  	} else {
>  		might_sleep();
>  
>  		srcu_idx = srcu_read_lock(&hctx->queue_rq_srcu);
> -		blk_mq_sched_dispatch_requests(hctx);
> +		if (!blk_queue_quiesced(hctx->queue))
> +			blk_mq_sched_dispatch_requests(hctx);
>  		srcu_read_unlock(&hctx->queue_rq_srcu, srcu_idx);
>  	}
>  }

Sorry, but I don't like these changes. Why have the blk_queue_quiesced()
calls been added at other code locations than the blk_mq_hctx_stopped()
calls? This will make the block layer unnecessarily hard to maintain.
Please consider changing the blk_mq_hctx_stopped(hctx) calls in
blk_mq_sched_dispatch_requests() and *blk_mq_*run_hw_queue*() into
blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q).
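
A minimal sketch of that combined test (the wrapper name is hypothetical;
blk_mq_hctx_stopped() exists today and blk_queue_quiesced() is introduced
by this series):

/*
 * Combine both conditions at the existing stopped-test sites instead of
 * adding separate quiesced-only tests. Note that, unlike the stopped
 * test, the quiesced test is only meaningful when evaluated inside the
 * RCU/SRCU read-side section that blk_mq_quiesce_queue() waits on.
 */
static inline bool blk_mq_hctx_stopped_or_quiesced(struct blk_mq_hw_ctx *hctx)
{
	return blk_mq_hctx_stopped(hctx) ||
	       blk_queue_quiesced(hctx->queue);
}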

Thanks,

Bart.