Re: [PATCH v2 4/8] blk-mq: fix blk_mq_quiesce_queue
On Tue, May 30, 2017 at 04:54:47PM +0000, Bart Van Assche wrote:
> On Tue, 2017-05-30 at 08:22 +0800, Ming Lei wrote:
> > On Sun, May 28, 2017 at 04:10:09PM +0000, Bart Van Assche wrote:
> > > I really would like to see the blk_queue_quiesced() tests as close as
> > > possible to the blk_mq_hctx_stopped() tests. But I agree that we need
> > > a way to document and/or
> >
> > Could you explain why we have to do that? Checking the stopped state
> > doesn't need to hold the RCU/SRCU read lock, and the two states are
> > really different.
>
> I'm really surprised that you ask me why ... It's because the purpose of
> the "stopped" and "quiesced" flags is similar, namely preventing the
> dispatching of

Actually they are not the same, and you can find the difference in patch 7.
But I will move the check into the dispatch function, and add a comment
about the RCU/SRCU read lock requirement.

> requests. It doesn't matter that with your patches applied it is no
> longer needed to hold an RCU / SRCU lock when testing the stopped flag.
>
> > > verify that these tests occur with an RCU read-side lock held. Have
> > > you considered to use rcu_read_lock_held() to document this?
> >
> > Then we need to check if it is RCU or SRCU, and that makes the code as
> > ugly as the current check on BLOCKING.
>
> How about introducing a macro or inline function in the block layer that
> tests whether either the RCU read lock or an SRCU read lock is held
> depending on the value of the BLK_MQ_F_BLOCKING flag?

That might make the code clean, but it needs to check the condition twice.

Thanks,
Ming
Re: [PATCH v2 4/8] blk-mq: fix blk_mq_quiesce_queue
On Sun, 2017-05-28 at 18:44 +0800, Ming Lei wrote:
> First it is really a fix, and then an improvement, so could you tell me
> what is wrong with the title and the description?

Hello Ming,

Can you explain to me why you want to keep the blk_mq_stop_hw_queues() call
in nvme_suspend_queue()? Since immediately after that call the NVMe driver
starts freeing resources, shouldn't that call be changed into a call of
blk_mq_quiesce_queue()? I think the same comment also applies to the
blk_mq_stop_hw_queues() calls in nvme_rdma_error_recovery_work() and
nvme_rdma_shutdown_ctrl().

Bart.
Re: [PATCH v2 4/8] blk-mq: fix blk_mq_quiesce_queue
On Tue, 2017-05-30 at 08:22 +0800, Ming Lei wrote:
> On Sun, May 28, 2017 at 04:10:09PM +0000, Bart Van Assche wrote:
> > I really would like to see the blk_queue_quiesced() tests as close as
> > possible to the blk_mq_hctx_stopped() tests. But I agree that we need a
> > way to document and/or
>
> Could you explain why we have to do that? Checking the stopped state
> doesn't need to hold the RCU/SRCU read lock, and the two states are
> really different.

I'm really surprised that you ask me why ... It's because the purpose of the
"stopped" and "quiesced" flags is similar, namely preventing the dispatching
of requests. It doesn't matter that with your patches applied it is no
longer needed to hold an RCU / SRCU lock when testing the stopped flag.

> > verify that these tests occur with an RCU read-side lock held. Have you
> > considered to use rcu_read_lock_held() to document this?
>
> Then we need to check if it is RCU or SRCU, and that makes the code as
> ugly as the current check on BLOCKING.

How about introducing a macro or inline function in the block layer that
tests whether either the RCU read lock or an SRCU read lock is held
depending on the value of the BLK_MQ_F_BLOCKING flag?

Bart.
Re: [PATCH v2 4/8] blk-mq: fix blk_mq_quiesce_queue
On Sun, May 28, 2017 at 04:10:09PM +0000, Bart Van Assche wrote:
> On Sun, 2017-05-28 at 18:44 +0800, Ming Lei wrote:
> > On Sat, May 27, 2017 at 09:46:45PM +0000, Bart Van Assche wrote:
> > > On Sat, 2017-05-27 at 22:21 +0800, Ming Lei wrote:
> > > >  bool blk_mq_can_queue(struct blk_mq_hw_ctx *hctx)
> > > > @@ -1108,13 +1119,15 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
> > > >  
> > > >  	if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
> > > >  		rcu_read_lock();
> > > > -		blk_mq_sched_dispatch_requests(hctx);
> > > > +		if (!blk_queue_quiesced(hctx->queue))
> > > > +			blk_mq_sched_dispatch_requests(hctx);
> > > >  		rcu_read_unlock();
> > > >  	} else {
> > > >  		might_sleep();
> > > >  
> > > >  		srcu_idx = srcu_read_lock(&hctx->queue_rq_srcu);
> > > > -		blk_mq_sched_dispatch_requests(hctx);
> > > > +		if (!blk_queue_quiesced(hctx->queue))
> > > > +			blk_mq_sched_dispatch_requests(hctx);
> > > > 		srcu_read_unlock(&hctx->queue_rq_srcu, srcu_idx);
> > > >  	}
> > > >  }
> > >
> > > Sorry but I don't like these changes. Why have the blk_queue_quiesced()
> > > calls been added at other code locations than the blk_mq_hctx_stopped()
> > > calls? This will make the block layer unnecessarily hard to maintain.
> > > Please consider changing the blk_mq_hctx_stopped(hctx) calls in
> > > blk_mq_sched_dispatch_requests() and *blk_mq_*run_hw_queue*() into
> > > blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q).
> >
> > One benefit is that we make it explicit that the flag has to be checked
> > inside the RCU read-side critical sections. If you put it somewhere else,
> > someone may move it out of the read-side critical sections in the future.
>
> Hello Ming,

Hello Bart,

> I really would like to see the blk_queue_quiesced() tests as close as
> possible to the blk_mq_hctx_stopped() tests. But I agree that we need a way
> to document and/or

Could you explain why we have to do that? Checking the stopped state doesn't
need to hold the RCU/SRCU read lock, and the two states are really different.

> verify that these tests occur with an RCU read-side lock held. Have you
> considered to use rcu_read_lock_held() to document this?

Then we need to check if it is RCU or SRCU, and that makes the code as ugly
as the current check on BLOCKING.

Thanks,
Ming
Re: [PATCH v2 4/8] blk-mq: fix blk_mq_quiesce_queue
On Sun, 2017-05-28 at 18:44 +0800, Ming Lei wrote:
> On Sat, May 27, 2017 at 09:46:45PM +0000, Bart Van Assche wrote:
> > On Sat, 2017-05-27 at 22:21 +0800, Ming Lei wrote:
> > >  bool blk_mq_can_queue(struct blk_mq_hw_ctx *hctx)
> > > @@ -1108,13 +1119,15 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
> > >  
> > >  	if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
> > >  		rcu_read_lock();
> > > -		blk_mq_sched_dispatch_requests(hctx);
> > > +		if (!blk_queue_quiesced(hctx->queue))
> > > +			blk_mq_sched_dispatch_requests(hctx);
> > >  		rcu_read_unlock();
> > >  	} else {
> > >  		might_sleep();
> > >  
> > >  		srcu_idx = srcu_read_lock(&hctx->queue_rq_srcu);
> > > -		blk_mq_sched_dispatch_requests(hctx);
> > > +		if (!blk_queue_quiesced(hctx->queue))
> > > +			blk_mq_sched_dispatch_requests(hctx);
> > >  		srcu_read_unlock(&hctx->queue_rq_srcu, srcu_idx);
> > >  	}
> > >  }
> >
> > Sorry but I don't like these changes. Why have the blk_queue_quiesced()
> > calls been added at other code locations than the blk_mq_hctx_stopped()
> > calls? This will make the block layer unnecessarily hard to maintain.
> > Please consider changing the blk_mq_hctx_stopped(hctx) calls in
> > blk_mq_sched_dispatch_requests() and *blk_mq_*run_hw_queue*() into
> > blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q).
>
> One benefit is that we make it explicit that the flag has to be checked
> inside the RCU read-side critical sections. If you put it somewhere else,
> someone may move it out of the read-side critical sections in the future.

Hello Ming,

I really would like to see the blk_queue_quiesced() tests as close as
possible to the blk_mq_hctx_stopped() tests. But I agree that we need a way
to document and/or verify that these tests occur with an RCU read-side lock
held. Have you considered to use rcu_read_lock_held() to document this?

Thanks,

Bart.
Re: [PATCH v2 4/8] blk-mq: fix blk_mq_quiesce_queue
On Sat, May 27, 2017 at 09:46:45PM +0000, Bart Van Assche wrote:
> On Sat, 2017-05-27 at 22:21 +0800, Ming Lei wrote:
> > It is required that no dispatch can happen any more once
> > blk_mq_quiesce_queue() returns, and we don't have such a requirement
> > on the APIs for stopping a queue.
> >
> > But blk_mq_quiesce_queue() still may not block/drain dispatch in the
> > following cases:
> >
> > - direct issue or BLK_MQ_S_START_ON_RUN
> > - in theory, new RCU read-side critical sections may begin while
> >   synchronize_rcu() was waiting, and end after synchronize_rcu()
> >   returns, during which period dispatch still may happen
>
> Hello Ming,

Hello Bart,

> I think the title and the description of this patch are wrong. Since
> the current queue quiescing mechanism works fine for drivers that do
> not stop and restart a queue (e.g. SCSI and dm-core), please change the

I have described the issues in the current quiesce mechanism; now I post
them again:

But blk_mq_quiesce_queue() still may not block/drain dispatch in the
following cases:

- direct issue or BLK_MQ_S_START_ON_RUN
- in theory, new RCU read-side critical sections may begin while
  synchronize_rcu() was waiting, and end after synchronize_rcu()
  returns, during which period dispatch still may happen

Unlike stopping a queue, any dispatch has to be drained/blocked once
synchronize_rcu() returns, otherwise a double free or use-after-free can be
triggered, which has already been observed on NVMe.

> title and description to reflect that the purpose of this patch is
> to allow drivers that use the quiesce mechanism to restart a queue
> without unquiescing it.

First it is really a fix, and then an improvement, so could you tell me
what is wrong with the title and the description?

> > @@ -209,6 +217,9 @@ void blk_mq_wake_waiters(struct request_queue *q)
> >  	 * the queue are notified as well.
> >  	 */
> >  	wake_up_all(&q->mq_freeze_wq);
> > +
> > +	/* Forcibly unquiesce the queue to avoid having stuck requests */
> > +	blk_mq_unquiesce_queue(q);
> >  }
>
> Should the block layer unquiesce a queue if a block driver hasn't
> done that before queue removal starts or should the block driver
> itself do that?

Some drivers might quiesce a queue and never unquiesce it, such as NVMe.
OK, I will consider fixing the drivers first.

> The block layer doesn't restart stopped queues from
> inside blk_set_queue_dying() so why should it unquiesce a quiesced
> queue?

If the quiesced queue isn't unquiesced, it may cause an I/O hang, since any
I/O in the sw queue/scheduler queue can't be completed at all. OK, I will
fix the driver in the next post.

Actually the queue has to be started after blk_set_queue_dying(), otherwise
it can cause an I/O hang too, since there can be lots of writeback I/O in
the following del_gendisk(). We have done that in NVMe already, see
nvme_kill_queues(). Maybe in future we should consider doing all of that in
the block layer.

> >  bool blk_mq_can_queue(struct blk_mq_hw_ctx *hctx)
> > @@ -1108,13 +1119,15 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
> >  
> >  	if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
> >  		rcu_read_lock();
> > -		blk_mq_sched_dispatch_requests(hctx);
> > +		if (!blk_queue_quiesced(hctx->queue))
> > +			blk_mq_sched_dispatch_requests(hctx);
> >  		rcu_read_unlock();
> >  	} else {
> >  		might_sleep();
> >  
> >  		srcu_idx = srcu_read_lock(&hctx->queue_rq_srcu);
> > -		blk_mq_sched_dispatch_requests(hctx);
> > +		if (!blk_queue_quiesced(hctx->queue))
> > +			blk_mq_sched_dispatch_requests(hctx);
> >  		srcu_read_unlock(&hctx->queue_rq_srcu, srcu_idx);
> >  	}
> >  }
>
> Sorry but I don't like these changes. Why have the blk_queue_quiesced()
> calls been added at other code locations than the blk_mq_hctx_stopped()
> calls? This will make the block layer unnecessarily hard to maintain.
> Please consider changing the blk_mq_hctx_stopped(hctx) calls in
> blk_mq_sched_dispatch_requests() and *blk_mq_*run_hw_queue*() into
> blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q).

One benefit is that we make it explicit that the flag has to be checked
inside the RCU read-side critical sections. If you put it somewhere else,
someone may move it out of the read-side critical sections in the future.

Thanks,
Ming
Re: [PATCH v2 4/8] blk-mq: fix blk_mq_quiesce_queue
On Sat, 2017-05-27 at 22:21 +0800, Ming Lei wrote:
> It is required that no dispatch can happen any more once
> blk_mq_quiesce_queue() returns, and we don't have such a requirement
> on the APIs for stopping a queue.
>
> But blk_mq_quiesce_queue() still may not block/drain dispatch in the
> following cases:
>
> - direct issue or BLK_MQ_S_START_ON_RUN
> - in theory, new RCU read-side critical sections may begin while
>   synchronize_rcu() was waiting, and end after synchronize_rcu()
>   returns, during which period dispatch still may happen

Hello Ming,

I think the title and the description of this patch are wrong. Since
the current queue quiescing mechanism works fine for drivers that do
not stop and restart a queue (e.g. SCSI and dm-core), please change the
title and description to reflect that the purpose of this patch is
to allow drivers that use the quiesce mechanism to restart a queue
without unquiescing it.

> @@ -209,6 +217,9 @@ void blk_mq_wake_waiters(struct request_queue *q)
>  	 * the queue are notified as well.
>  	 */
>  	wake_up_all(&q->mq_freeze_wq);
> +
> +	/* Forcibly unquiesce the queue to avoid having stuck requests */
> +	blk_mq_unquiesce_queue(q);
>  }

Should the block layer unquiesce a queue if a block driver hasn't
done that before queue removal starts or should the block driver
itself do that? The block layer doesn't restart stopped queues from
inside blk_set_queue_dying() so why should it unquiesce a quiesced
queue?

>  bool blk_mq_can_queue(struct blk_mq_hw_ctx *hctx)
> @@ -1108,13 +1119,15 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
>  
>  	if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
>  		rcu_read_lock();
> -		blk_mq_sched_dispatch_requests(hctx);
> +		if (!blk_queue_quiesced(hctx->queue))
> +			blk_mq_sched_dispatch_requests(hctx);
>  		rcu_read_unlock();
>  	} else {
>  		might_sleep();
>  
>  		srcu_idx = srcu_read_lock(&hctx->queue_rq_srcu);
> -		blk_mq_sched_dispatch_requests(hctx);
> +		if (!blk_queue_quiesced(hctx->queue))
> +			blk_mq_sched_dispatch_requests(hctx);
>  		srcu_read_unlock(&hctx->queue_rq_srcu, srcu_idx);
>  	}
>  }

Sorry but I don't like these changes. Why have the blk_queue_quiesced()
calls been added at other code locations than the blk_mq_hctx_stopped()
calls? This will make the block layer unnecessarily hard to maintain.
Please consider changing the blk_mq_hctx_stopped(hctx) calls in
blk_mq_sched_dispatch_requests() and *blk_mq_*run_hw_queue*() into
blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q).

Thanks,

Bart.