Re: [PATCH v2 6/8] blk-mq: don't stop queue for quiescing
On Sat, 2017-05-27 at 22:21 +0800, Ming Lei wrote:
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 032045841856..84cce67caee3 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -168,8 +168,6 @@ void blk_mq_quiesce_queue(struct request_queue *q)
> 	unsigned int i;
> 	bool rcu = false;
> 
> -	__blk_mq_stop_hw_queues(q, true);
> -
> 	spin_lock_irq(q->queue_lock);
> 	queue_flag_set(QUEUE_FLAG_QUIESCED, q);
> 	spin_unlock_irq(q->queue_lock);
> @@ -198,7 +196,12 @@ void blk_mq_unquiesce_queue(struct request_queue *q)
> 	queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
> 	spin_unlock_irq(q->queue_lock);
> 
> -	blk_mq_start_stopped_hw_queues(q, true);
> +	/*
> +	 * During quiescing, requests can be inserted
> +	 * to scheduler's queue or sw queue, so we run
> +	 * queues for dispatching these requests.
> +	 */
> +	blk_mq_start_hw_queues(q);
> }
> EXPORT_SYMBOL_GPL(blk_mq_unquiesce_queue);

Hello Ming,

This looks really weird to me. If blk_mq_quiesce_queue() doesn't stop any
hardware queues, why should blk_mq_unquiesce_queue() start any hardware
queues?

Bart.
Re: [PATCH v2 4/8] blk-mq: fix blk_mq_quiesce_queue
On Sat, 2017-05-27 at 22:21 +0800, Ming Lei wrote:
> It is required that no dispatch can happen any more once
> blk_mq_quiesce_queue() returns, and we don't have such requirement
> on APIs of stopping queue.
> 
> But blk_mq_quiesce_queue() still may not block/drain dispatch in the
> following cases:
> 
> - direct issue or BLK_MQ_S_START_ON_RUN
> - in theory, new RCU read-side critical sections may begin while
>   synchronize_rcu() was waiting, and end after synchronize_rcu()
>   returns, during the period dispatch still may happen

Hello Ming,

I think the title and the description of this patch are wrong. Since the
current queue quiescing mechanism works fine for drivers that do not stop
and restart a queue (e.g. SCSI and dm-core), please change the title and
description to reflect that the purpose of this patch is to allow drivers
that use the quiesce mechanism to restart a queue without unquiescing it.

> @@ -209,6 +217,9 @@ void blk_mq_wake_waiters(struct request_queue *q)
> 	 * the queue are notified as well.
> 	 */
> 	wake_up_all(&q->mq_freeze_wq);
> +
> +	/* Forcibly unquiesce the queue to avoid having stuck requests */
> +	blk_mq_unquiesce_queue(q);
>  }

Should the block layer unquiesce a queue if a block driver hasn't done that
before queue removal starts, or should the block driver itself do that? The
block layer doesn't restart stopped queues from inside
blk_set_queue_dying(), so why should it unquiesce a quiesced queue?
>  bool blk_mq_can_queue(struct blk_mq_hw_ctx *hctx)
> @@ -1108,13 +1119,15 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
> 
> 	if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
> 		rcu_read_lock();
> -		blk_mq_sched_dispatch_requests(hctx);
> +		if (!blk_queue_quiesced(hctx->queue))
> +			blk_mq_sched_dispatch_requests(hctx);
> 		rcu_read_unlock();
> 	} else {
> 		might_sleep();
> 
> 		srcu_idx = srcu_read_lock(&hctx->queue_rq_srcu);
> -		blk_mq_sched_dispatch_requests(hctx);
> +		if (!blk_queue_quiesced(hctx->queue))
> +			blk_mq_sched_dispatch_requests(hctx);
> 		srcu_read_unlock(&hctx->queue_rq_srcu, srcu_idx);
> 	}
> }

Sorry, but I don't like these changes. Why have the blk_queue_quiesced()
calls been added at other code locations than the blk_mq_hctx_stopped()
calls? This will make the block layer unnecessarily hard to maintain.
Please consider changing the blk_mq_hctx_stopped(hctx) calls in
blk_mq_sched_dispatch_requests() and *blk_mq_*run_hw_queue*() into
blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q).

Thanks,

Bart.
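[Editor's note] The combined condition Bart suggests can be sketched as a small userspace model. The struct layouts and the `dispatch_suppressed()` helper below are hypothetical; only the predicate names mirror the kernel's `blk_mq_hctx_stopped()`/`blk_queue_quiesced()` helpers:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct request_queue { bool quiesced; };
struct blk_mq_hw_ctx { bool stopped; struct request_queue *queue; };

static bool blk_mq_hctx_stopped(const struct blk_mq_hw_ctx *hctx)
{
	return hctx->stopped;
}

static bool blk_queue_quiesced(const struct request_queue *q)
{
	return q->quiesced;
}

/*
 * One combined predicate keeps both conditions at a single code
 * location, instead of sprinkling separate quiesced checks across
 * the dispatch paths.
 */
static bool dispatch_suppressed(const struct blk_mq_hw_ctx *hctx)
{
	return blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(hctx->queue);
}
```

Either a stopped hctx or a quiesced queue suppresses dispatch, and a future reader finds both conditions in one place.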
Re: [PATCH v2 0/8] blk-mq: fix & improve queue quiescing
On Sat, 2017-05-27 at 22:21 +0800, Ming Lei wrote:
> There are some issues in current blk_mq_quiesce_queue():
> 
> - in case of direct issue or BLK_MQ_S_START_ON_RUN, dispatch won't
>   be prevented after blk_mq_quiesce_queue() is returned.
> - in theory, new RCU read-side critical sections may begin while
>   synchronize_rcu() was waiting, and end after returning from
>   synchronize_rcu(), then dispatch still may be run after
>   synchronize_rcu() returns

Hello Ming,

I disagree with the second part of your statement. blk_mq_quiesce_queue()
stops all hardware queues before calling synchronize_*rcu().
blk_mq_sched_dispatch_requests() is called with an RCU lock held and checks
whether a hctx has been stopped before calling .queue_rq(). That is
sufficient to prevent blk_mq_try_issue_directly() from triggering a
.queue_rq() call.

Bart.
[PATCH v2 5/8] blk-mq: update comments on blk_mq_quiesce_queue()
Actually what we want to get from blk_mq_quiesce_queue() isn't only to wait
for completion of all ongoing .queue_rq().

In the typical context of canceling requests, we need to make sure that the
following is done in the dispatch path before starting to cancel requests:

	- a failed dispatched request is finished
	- a busy dispatched request is requeued, and its STARTED flag is cleared

So update the comment to keep the code, the documentation and our
expectation consistent.

Signed-off-by: Ming Lei
---
 block/blk-mq.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 470ee5514ea9..032045841856 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -155,12 +155,12 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
 EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
 
 /**
- * blk_mq_quiesce_queue() - wait until all ongoing queue_rq calls have finished
+ * blk_mq_quiesce_queue() - wait until all ongoing dispatching have finished
  * @q: request queue.
  *
  * Note: this function does not prevent that the struct request end_io()
- * callback function is invoked. Additionally, it is not prevented that
- * new queue_rq() calls occur unless the queue has been stopped first.
+ * callback function is invoked. Once this function is returned, we make
+ * sure no dispatching can happen.
  */
 void blk_mq_quiesce_queue(struct request_queue *q)
 {
-- 
2.9.4
[PATCH v2 7/8] blk-mq: clarify dispatch may not be drained/blocked by stopping queue
BLK_MQ_S_STOPPED may not be observed in other concurrent I/O paths, so we
can't guarantee that dispatching won't happen after returning from the APIs
for stopping the queue.

So clarify this fact and avoid potential misuse.

Signed-off-by: Ming Lei
---
 block/blk-mq.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 84cce67caee3..73e07bc074a6 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1240,6 +1240,11 @@ static void __blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx, bool sync)
 	set_bit(BLK_MQ_S_STOPPED, &hctx->state);
 }
 
+/*
+ * We do not guarantee that dispatch can be drained or blocked
+ * after blk_mq_stop_hw_queue() returns. Please use
+ * blk_mq_quiesce_queue() for that requirement.
+ */
 void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx)
 {
 	__blk_mq_stop_hw_queue(hctx, false);
@@ -1255,6 +1260,11 @@ static void __blk_mq_stop_hw_queues(struct request_queue *q, bool sync)
 		__blk_mq_stop_hw_queue(hctx, sync);
 }
 
+/*
+ * We do not guarantee that dispatch can be drained or blocked
+ * after blk_mq_stop_hw_queues() returns. Please use
+ * blk_mq_quiesce_queue() for that requirement.
+ */
 void blk_mq_stop_hw_queues(struct request_queue *q)
 {
 	__blk_mq_stop_hw_queues(q, false);
-- 
2.9.4
[PATCH v2 8/8] Revert "blk-mq: don't use sync workqueue flushing from drivers"
This patch reverts commit 2719aa217e0d02 ("blk-mq: don't use sync workqueue
flushing from drivers") because only blk_mq_quiesce_queue() needs the sync
flush, and now we don't need to stop the queue any more, so revert it.

Also switch to cancel_delayed_work() in blk_mq_stop_hw_queue().

Signed-off-by: Ming Lei
---
 block/blk-mq.c | 30 ++++++++----------------------
 1 file changed, 8 insertions(+), 22 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 73e07bc074a6..f38226bd651d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -42,7 +42,6 @@ static LIST_HEAD(all_q_list);
 
 static void blk_mq_poll_stats_start(struct request_queue *q);
 static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
-static void __blk_mq_stop_hw_queues(struct request_queue *q, bool sync);
 
 static int blk_mq_poll_stats_bkt(const struct request *rq)
 {
@@ -1230,16 +1229,6 @@ bool blk_mq_queue_stopped(struct request_queue *q)
 }
 EXPORT_SYMBOL(blk_mq_queue_stopped);
 
-static void __blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx, bool sync)
-{
-	if (sync)
-		cancel_delayed_work_sync(&hctx->run_work);
-	else
-		cancel_delayed_work(&hctx->run_work);
-
-	set_bit(BLK_MQ_S_STOPPED, &hctx->state);
-}
-
 /*
  * We do not guarantee that dispatch can be drained or blocked
  * after blk_mq_stop_hw_queue() returns. Please use
@@ -1247,18 +1236,11 @@ static void __blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx, bool sync)
  */
 void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx)
 {
-	__blk_mq_stop_hw_queue(hctx, false);
-}
-EXPORT_SYMBOL(blk_mq_stop_hw_queue);
+	cancel_delayed_work(&hctx->run_work);
 
-static void __blk_mq_stop_hw_queues(struct request_queue *q, bool sync)
-{
-	struct blk_mq_hw_ctx *hctx;
-	int i;
-
-	queue_for_each_hw_ctx(q, hctx, i)
-		__blk_mq_stop_hw_queue(hctx, sync);
+	set_bit(BLK_MQ_S_STOPPED, &hctx->state);
 }
+EXPORT_SYMBOL(blk_mq_stop_hw_queue);
 
 /*
  * We do not guarantee that dispatch can be drained or blocked
@@ -1267,7 +1249,11 @@ static void __blk_mq_stop_hw_queues(struct request_queue *q, bool sync)
  */
 void blk_mq_stop_hw_queues(struct request_queue *q)
 {
-	__blk_mq_stop_hw_queues(q, false);
+	struct blk_mq_hw_ctx *hctx;
+	int i;
+
+	queue_for_each_hw_ctx(q, hctx, i)
+		blk_mq_stop_hw_queue(hctx);
 }
 EXPORT_SYMBOL(blk_mq_stop_hw_queues);
-- 
2.9.4
[PATCH v2 6/8] blk-mq: don't stop queue for quiescing
A queue can be started by other blk-mq APIs and can be used in different
cases; this limits the uses of blk_mq_quiesce_queue() when it is based on
stopping queues and makes its usage very difficult, since users have to use
the stop-queue APIs carefully to avoid breaking blk_mq_quiesce_queue().

We have applied the QUIESCED flag for draining and blocking dispatch, so it
isn't necessary to stop the queue any more.

After stopping the queue is removed, blk_mq_quiesce_queue() can be used
safely and easily, and users no longer have to worry about the queue
restarting during quiescing.

Signed-off-by: Ming Lei
---
 block/blk-mq.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 032045841856..84cce67caee3 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -168,8 +168,6 @@ void blk_mq_quiesce_queue(struct request_queue *q)
 	unsigned int i;
 	bool rcu = false;
 
-	__blk_mq_stop_hw_queues(q, true);
-
 	spin_lock_irq(q->queue_lock);
 	queue_flag_set(QUEUE_FLAG_QUIESCED, q);
 	spin_unlock_irq(q->queue_lock);
@@ -198,7 +196,12 @@ void blk_mq_unquiesce_queue(struct request_queue *q)
 	queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
 	spin_unlock_irq(q->queue_lock);
 
-	blk_mq_start_stopped_hw_queues(q, true);
+	/*
+	 * During quiescing, requests can be inserted
+	 * to scheduler's queue or sw queue, so we run
+	 * queues for dispatching these requests.
+	 */
+	blk_mq_start_hw_queues(q);
 }
 EXPORT_SYMBOL_GPL(blk_mq_unquiesce_queue);
-- 
2.9.4
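[Editor's note] The flag-based scheme this patch converges on can be illustrated with a minimal single-threaded userspace model. The "grace period" is modelled by draining a reader count, standing in for synchronize_rcu(); all names and types are hypothetical, not the kernel's:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a queue with flag-based quiescing. */
struct queue {
	atomic_bool quiesced;   /* models QUEUE_FLAG_QUIESCED */
	atomic_int  readers;    /* readers inside a "read-side section" */
	atomic_int  dispatched; /* requests handed to the "driver" */
};

/*
 * Dispatch path: the quiesced check happens inside the read-side
 * section, mirroring where __blk_mq_run_hw_queue() tests the flag
 * after this series.
 */
static void run_hw_queue(struct queue *q)
{
	atomic_fetch_add(&q->readers, 1);          /* ~ rcu_read_lock() */
	if (!atomic_load(&q->quiesced))
		atomic_fetch_add(&q->dispatched, 1); /* ~ .queue_rq() */
	atomic_fetch_sub(&q->readers, 1);          /* ~ rcu_read_unlock() */
}

/*
 * Quiesce: set the flag first, then wait out every section that might
 * still have observed the old value; no dispatch happens after return.
 */
static void quiesce_queue(struct queue *q)
{
	atomic_store(&q->quiesced, true);
	while (atomic_load(&q->readers) > 0)       /* ~ synchronize_rcu() */
		;
}

/*
 * Unquiesce, then re-run the queue: requests inserted into the sw or
 * scheduler queues while quiesced still need to be dispatched.
 */
static void unquiesce_queue(struct queue *q)
{
	atomic_store(&q->quiesced, false);
	run_hw_queue(q);
}
```

Note that nothing here touches a "stopped" state: quiesce and stop stay independent, which is exactly why a driver restarting a stopped queue can no longer break quiescing.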
[PATCH v2 4/8] blk-mq: fix blk_mq_quiesce_queue
It is required that no dispatch can happen any more once
blk_mq_quiesce_queue() returns, and we don't have such a requirement on the
APIs for stopping the queue.

But blk_mq_quiesce_queue() still may not block/drain dispatch in the
following cases:

- direct issue or BLK_MQ_S_START_ON_RUN
- in theory, new RCU read-side critical sections may begin while
  synchronize_rcu() was waiting, and end after synchronize_rcu()
  returns; during that period dispatch still may happen

So use the new flag QUEUE_FLAG_QUIESCED and evaluate it inside RCU
read-side critical sections to fix the above issues.

This patch fixes request use-after-free[1][2] during canceling requests of
NVMe in nvme_dev_disable(), which can be triggered easily during NVMe reset
& remove test.

[1] oops kernel log when CONFIG_BLK_DEV_INTEGRITY is on

[  103.412969] BUG: unable to handle kernel NULL pointer dereference at 000a
[  103.412980] IP: bio_integrity_advance+0x48/0xf0
[  103.412981] PGD 275a88067
[  103.412981] P4D 275a88067
[  103.412982] PUD 276c43067
[  103.412983] PMD 0
[  103.412984]
[  103.412986] Oops: [#1] SMP
[  103.412989] Modules linked in: vfat fat intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd cryptd ipmi_ssif iTCO_wdt iTCO_vendor_support mxm_wmi glue_helper dcdbas ipmi_si mei_me pcspkr mei sg ipmi_devintf lpc_ich ipmi_msghandler shpchp acpi_power_meter wmi nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel nvme ahci nvme_core libahci libata tg3 i2c_core megaraid_sas ptp pps_core dm_mirror dm_region_hash dm_log dm_mod
[  103.413035] CPU: 0 PID: 102 Comm: kworker/0:2 Not tainted 4.11.0+ #1
[  103.413036] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.2.5 09/06/2016
[  103.413041] Workqueue: events nvme_remove_dead_ctrl_work [nvme]
[  103.413043] task: 9cc8775c8000 task.stack: c033c252c000
[  103.413045] RIP: 0010:bio_integrity_advance+0x48/0xf0
[  103.413046] RSP: 0018:c033c252fc10 EFLAGS: 00010202
[  103.413048] RAX: RBX: 9cc8720a8cc0 RCX: 9cca72958240
[  103.413049] RDX: 9cca72958000 RSI: 0008 RDI: 9cc872537f00
[  103.413049] RBP: c033c252fc28 R08: R09: b963a0d5
[  103.413050] R10: 063e R11: R12: 9cc8720a8d18
[  103.413051] R13: 1000 R14: 9cc872682e00 R15: fffb
[  103.413053] FS: () GS:9cc877c0() knlGS:
[  103.413054] CS: 0010 DS: ES: CR0: 80050033
[  103.413055] CR2: 000a CR3: 000276c41000 CR4: 001406f0
[  103.413056] Call Trace:
[  103.413063]  bio_advance+0x2a/0xe0
[  103.413067]  blk_update_request+0x76/0x330
[  103.413072]  blk_mq_end_request+0x1a/0x70
[  103.413074]  blk_mq_dispatch_rq_list+0x370/0x410
[  103.413076]  ? blk_mq_flush_busy_ctxs+0x94/0xe0
[  103.413080]  blk_mq_sched_dispatch_requests+0x173/0x1a0
[  103.413083]  __blk_mq_run_hw_queue+0x8e/0xa0
[  103.413085]  __blk_mq_delay_run_hw_queue+0x9d/0xa0
[  103.413088]  blk_mq_start_hw_queue+0x17/0x20
[  103.413090]  blk_mq_start_hw_queues+0x32/0x50
[  103.413095]  nvme_kill_queues+0x54/0x80 [nvme_core]
[  103.413097]  nvme_remove_dead_ctrl_work+0x1f/0x40 [nvme]
[  103.413103]  process_one_work+0x149/0x360
[  103.413105]  worker_thread+0x4d/0x3c0
[  103.413109]  kthread+0x109/0x140
[  103.413111]  ? rescuer_thread+0x380/0x380
[  103.413113]  ? kthread_park+0x60/0x60
[  103.413120]  ret_from_fork+0x2c/0x40
[  103.413121] Code: 08 4c 8b 63 50 48 8b 80 80 00 00 00 48 8b 90 d0 03 00 00 31 c0 48 83 ba 40 02 00 00 00 48 8d 8a 40 02 00 00 48 0f 45 c1 c1 ee 09 <0f> b6 48 0a 0f b6 40 09 41 89 f5 83 e9 09 41 d3 ed 44 0f af e8
[  103.413145] RIP: bio_integrity_advance+0x48/0xf0 RSP: c033c252fc10
[  103.413146] CR2: 000a
[  103.413157] ---[ end trace cd6875d16eb5a11e ]---
[  103.455368] Kernel panic - not syncing: Fatal exception
[  103.459826] Kernel Offset: 0x3760 from 0x8100 (relocation range: 0x8000-0xbfff)
[  103.850916] ---[ end Kernel panic - not syncing: Fatal exception
[  103.857637] sched: Unexpected reschedule of offline CPU#1!
[  103.863762] [ cut here ]

[2] kernel hang in blk_mq_freeze_queue_wait() when CONFIG_BLK_DEV_INTEGRITY is off

[  247.129825] INFO: task nvme-test:1772 blocked for more than 120 seconds.
[  247.137311]       Not tainted 4.12.0-rc2.upstream+ #4
[  247.142954] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  247.151704] Call Trace:
[  247.154445]  __schedule+0x28a/0x880
[  247.158341]  schedule+0x36/0x80
[  247.161850]  blk_mq_freeze_queue_wait+0x4b/0xb0
[  247.166913]  ? remove_wait_queue+0x60/0x60
[  247.171485]
[PATCH v2 3/8] blk-mq: use the introduced blk_mq_unquiesce_queue()
blk_mq_unquiesce_queue() is used for unquiescing the queue explicitly, so
replace blk_mq_start_stopped_hw_queues() with it.

Cc: linux-n...@lists.infradead.org
Cc: linux-s...@vger.kernel.org
Cc: dm-de...@redhat.com
Signed-off-by: Ming Lei
---
 drivers/md/dm-rq.c       | 2 +-
 drivers/nvme/host/core.c | 2 +-
 drivers/scsi/scsi_lib.c  | 5 ++++-
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 2af27026aa2e..673fcf075077 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -71,7 +71,7 @@ static void dm_old_start_queue(struct request_queue *q)
 
 static void dm_mq_start_queue(struct request_queue *q)
 {
-	blk_mq_start_stopped_hw_queues(q, true);
+	blk_mq_unquiesce_queue(q);
 	blk_mq_kick_requeue_list(q);
 }
 
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 04e115834702..231d36028afc 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2514,7 +2514,7 @@ void nvme_start_queues(struct nvme_ctrl *ctrl)
 
 	mutex_lock(&ctrl->namespaces_mutex);
 	list_for_each_entry(ns, &ctrl->namespaces, list) {
-		blk_mq_start_stopped_hw_queues(ns->queue, true);
+		blk_mq_unquiesce_queue(ns->queue);
 		blk_mq_kick_requeue_list(ns->queue);
 	}
 	mutex_unlock(&ctrl->namespaces_mutex);
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 814a4bd8405d..72b11f75719c 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -3030,7 +3030,10 @@ scsi_internal_device_unblock(struct scsi_device *sdev,
 		return -EINVAL;
 
 	if (q->mq_ops) {
-		blk_mq_start_stopped_hw_queues(q, false);
+		if (blk_queue_quiesced(q))
+			blk_mq_unquiesce_queue(q);
+		else
+			blk_mq_start_stopped_hw_queues(q, false);
 	} else {
 		spin_lock_irqsave(q->queue_lock, flags);
 		blk_start_queue(q);
-- 
2.9.4
[PATCH v2 2/8] block: introduce flag of QUEUE_FLAG_QUIESCED
This flag is introduced for fixing & improving the quiescing code.

Signed-off-by: Ming Lei
---
 include/linux/blkdev.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 41291be82ac4..60967797f4f6 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -618,6 +618,7 @@ struct request_queue {
 #define QUEUE_FLAG_STATS       27	/* track rq completion times */
 #define QUEUE_FLAG_POLL_STATS  28	/* collecting stats for hybrid polling */
 #define QUEUE_FLAG_REGISTERED  29	/* queue has been registered to a disk */
+#define QUEUE_FLAG_QUIESCED    30	/* queue has been quiesced */
 
 #define QUEUE_FLAG_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
				 (1 << QUEUE_FLAG_STACKABLE)	|	\
@@ -712,6 +713,7 @@ static inline void queue_flag_clear(unsigned int flag, struct request_queue *q)
 #define blk_noretry_request(rq) \
	((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \
			     REQ_FAILFAST_DRIVER))
+#define blk_queue_quiesced(q)	test_bit(QUEUE_FLAG_QUIESCED, &(q)->queue_flags)
 
 static inline bool blk_account_rq(struct request *rq)
 {
-- 
2.9.4
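[Editor's note] The queue-flag pattern this patch extends (one bit per flag in `queue_flags`, tested with `test_bit()`) can be modelled in userspace with C11 atomics. The helper names deliberately mirror the kernel's, but this is a sketch, not the kernel implementation (which also offers lock-protected non-atomic variants):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define QUEUE_FLAG_QUIESCED 30	/* bit number, as in blkdev.h */

/* Hypothetical stand-in for the kernel's request_queue. */
struct request_queue {
	_Atomic unsigned long queue_flags;
};

static void queue_flag_set(unsigned int flag, struct request_queue *q)
{
	/* atomic read-modify-write, like the kernel's set_bit() */
	atomic_fetch_or(&q->queue_flags, 1UL << flag);
}

static void queue_flag_clear(unsigned int flag, struct request_queue *q)
{
	atomic_fetch_and(&q->queue_flags, ~(1UL << flag));
}

static bool blk_queue_quiesced(struct request_queue *q)
{
	/* like test_bit(QUEUE_FLAG_QUIESCED, &q->queue_flags) */
	return atomic_load(&q->queue_flags) & (1UL << QUEUE_FLAG_QUIESCED);
}
```

Packing flags into one word is what makes a new queue state this cheap to add: one `#define` for the bit number and one predicate macro, with no new field in the structure.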
[PATCH v2 1/8] blk-mq: introduce blk_mq_unquiesce_queue
We use blk_mq_start_stopped_hw_queues() implicitly as the pair of
blk_mq_quiesce_queue() for unquiescing the queue, so introduce
blk_mq_unquiesce_queue() and make it the explicit pair of
blk_mq_quiesce_queue().

This function is for fixing the current quiescing mechanism in the
following patches.

Signed-off-by: Ming Lei
---
 block/blk-mq.c         | 13 +++++++++++++
 include/linux/blkdev.h |  1 +
 2 files changed, 14 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index f2224ffd225d..8ecbbb718946 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -181,6 +181,19 @@ void blk_mq_quiesce_queue(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
 
+/*
+ * blk_mq_unquiesce_queue() - pair of blk_mq_quiesce_queue()
+ * @q: request queue.
+ *
+ * This function recovers queue into the state before quiescing
+ * which is done by blk_mq_quiesce_queue.
+ */
+void blk_mq_unquiesce_queue(struct request_queue *q)
+{
+	blk_mq_start_stopped_hw_queues(q, true);
+}
+EXPORT_SYMBOL_GPL(blk_mq_unquiesce_queue);
+
 void blk_mq_wake_waiters(struct request_queue *q)
 {
 	struct blk_mq_hw_ctx *hctx;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index ab92c4ea138b..41291be82ac4 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -964,6 +964,7 @@ extern void __blk_run_queue_uncond(struct request_queue *q);
 extern void blk_run_queue(struct request_queue *);
 extern void blk_run_queue_async(struct request_queue *q);
 extern void blk_mq_quiesce_queue(struct request_queue *q);
+extern void blk_mq_unquiesce_queue(struct request_queue *q);
 extern int blk_rq_map_user(struct request_queue *, struct request *,
			   struct rq_map_data *, void __user *, unsigned long,
			   gfp_t);
-- 
2.9.4
[PATCH v2 0/8] blk-mq: fix & improve queue quiescing
Hi,

There are some issues in the current blk_mq_quiesce_queue():

- in case of direct issue or BLK_MQ_S_START_ON_RUN, dispatch won't
  be prevented after blk_mq_quiesce_queue() returns.
- in theory, new RCU read-side critical sections may begin while
  synchronize_rcu() was waiting, and end after returning from
  synchronize_rcu(); dispatch may then still run after
  synchronize_rcu() returns.

It is observed that request double-free/use-after-free can be triggered
easily when canceling NVMe requests via
blk_mq_tagset_busy_iter(...nvme_cancel_request) in nvme_dev_disable(). The
reason is that blk_mq_quiesce_queue() can't prevent dispatching from being
run during that period.

Actually we have to quiesce the queue for canceling dispatched requests via
blk_mq_tagset_busy_iter(), otherwise use-after-free can be triggered
easily. This way of canceling dispatched requests has been used in several
drivers; only NVMe uses blk_mq_quiesce_queue() to avoid the issue, and the
others need to be fixed too. It should become a common way of handling a
dead controller.

blk_mq_quiesce_queue() is implemented via stopping the queue, which limits
its uses and easily causes races, because any queue restart in other paths
may break blk_mq_quiesce_queue(). For example, we sometimes stop a queue
when the hardware can't handle too many ongoing requests and restart the
queue after requests are completed. Meanwhile, when we want to cancel
requests because the hardware is dead, quiescing has to be run first, and
then the restart in the completion path can break the quiescing.

This patchset improves the interface by removing the queue stopping, so it
becomes easier to use.

V2:
	- split the patch "blk-mq: fix blk_mq_quiesce_queue" into two, and fix
	  one build issue when only applying the first two patches
	- add kernel oops and hang log into the commit log
	- add 'Revert "blk-mq: don't use sync workqueue flushing from drivers"'

Ming Lei (8):
  blk-mq: introduce blk_mq_unquiesce_queue
  block: introduce flag of QUEUE_FLAG_QUIESCED
  blk-mq: use the introduced blk_mq_unquiesce_queue()
  blk-mq: fix blk_mq_quiesce_queue
  blk-mq: update comments on blk_mq_quiesce_queue()
  blk-mq: don't stop queue for quiescing
  blk-mq: clarify dispatch may not be drained/blocked by stopping queue
  Revert "blk-mq: don't use sync workqueue flushing from drivers"

 block/blk-mq.c           | 89 +++++++++++++++++++++++++++++++++++----------
 drivers/md/dm-rq.c       |  2 +-
 drivers/nvme/host/core.c |  2 +-
 drivers/scsi/scsi_lib.c  |  5 ++-
 include/linux/blkdev.h   |  3 ++
 5 files changed, 71 insertions(+), 30 deletions(-)

-- 
2.9.4