Re: [PATCH v2 6/8] blk-mq: don't stop queue for quiescing

2017-05-27 Thread Bart Van Assche
On Sat, 2017-05-27 at 22:21 +0800, Ming Lei wrote:
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 032045841856..84cce67caee3 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -168,8 +168,6 @@ void blk_mq_quiesce_queue(struct request_queue *q)
>   unsigned int i;
>   bool rcu = false;
>  
> - __blk_mq_stop_hw_queues(q, true);
> -
>   spin_lock_irq(q->queue_lock);
>   queue_flag_set(QUEUE_FLAG_QUIESCED, q);
>   spin_unlock_irq(q->queue_lock);
> @@ -198,7 +196,12 @@ void blk_mq_unquiesce_queue(struct request_queue *q)
>   queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
>   spin_unlock_irq(q->queue_lock);
>  
> - blk_mq_start_stopped_hw_queues(q, true);
> + /*
> +  * During quiescing, requests can be inserted into the
> +  * scheduler queue or the sw queues, so run the queues
> +  * to dispatch those requests.
> +  */
> + blk_mq_start_hw_queues(q);
>  }
>  EXPORT_SYMBOL_GPL(blk_mq_unquiesce_queue);

Hello Ming,

This looks really weird to me. If blk_mq_quiesce_queue() doesn't stop any
hardware queues, why should blk_mq_unquiesce_queue() start any hardware
queues?

Bart.

Re: [PATCH v2 4/8] blk-mq: fix blk_mq_quiesce_queue

2017-05-27 Thread Bart Van Assche
On Sat, 2017-05-27 at 22:21 +0800, Ming Lei wrote:
> It is required that no dispatch can happen any more once
> blk_mq_quiesce_queue() returns; we don't have such a requirement
> on the queue-stopping APIs.
> 
> But blk_mq_quiesce_queue() may still fail to block/drain dispatch in
> the following cases:
> 
> - direct issue or BLK_MQ_S_START_ON_RUN
> - in theory, a new RCU read-side critical section may begin while
> synchronize_rcu() is waiting and end after synchronize_rcu()
> returns; during that period dispatch may still happen

Hello Ming,

I think the title and the description of this patch are wrong. Since
the current queue quiescing mechanism works fine for drivers that do
not stop and restart a queue (e.g. SCSI and dm-core), please change the
title and description to reflect that the purpose of this patch is
to allow drivers that use the quiesce mechanism to restart a queue
without unquiescing it.

> @@ -209,6 +217,9 @@ void blk_mq_wake_waiters(struct request_queue *q)
>* the queue are notified as well.
>*/
>   wake_up_all(&q->mq_freeze_wq);
> +
> + /* Forcibly unquiesce the queue to avoid having stuck requests */
> + blk_mq_unquiesce_queue(q);
>  }

Should the block layer unquiesce a queue if a block driver hasn't
done that before queue removal starts, or should the block driver
itself do that? The block layer doesn't restart stopped queues from
inside blk_set_queue_dying(), so why should it unquiesce a quiesced
queue?
 
>  bool blk_mq_can_queue(struct blk_mq_hw_ctx *hctx)
> @@ -1108,13 +1119,15 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
>  
>   if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
>   rcu_read_lock();
> - blk_mq_sched_dispatch_requests(hctx);
> + if (!blk_queue_quiesced(hctx->queue))
> + blk_mq_sched_dispatch_requests(hctx);
>   rcu_read_unlock();
>   } else {
>   might_sleep();
>  
> - srcu_idx = srcu_read_lock(&hctx->queue_rq_srcu);
> - blk_mq_sched_dispatch_requests(hctx);
> + if (!blk_queue_quiesced(hctx->queue))
> + blk_mq_sched_dispatch_requests(hctx);
> - srcu_read_unlock(&hctx->queue_rq_srcu, srcu_idx);
>   }
>  }

Sorry but I don't like these changes. Why have the blk_queue_quiesced()
checks been added at different code locations than the blk_mq_hctx_stopped()
checks? This will make the block layer unnecessarily hard to maintain.
Please consider changing the blk_mq_hctx_stopped(hctx) checks in
blk_mq_sched_dispatch_requests() and *blk_mq_*run_hw_queue*() into
blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q).

Thanks,

Bart.

Re: [PATCH v2 0/8] blk-mq: fix & improve queue quiescing

2017-05-27 Thread Bart Van Assche
On Sat, 2017-05-27 at 22:21 +0800, Ming Lei wrote:
> There are some issues in current blk_mq_quiesce_queue():
> 
> - in case of direct issue or BLK_MQ_S_START_ON_RUN, dispatch won't
> be prevented after blk_mq_quiesce_queue() returns.
> - in theory, a new RCU read-side critical section may begin while
> synchronize_rcu() is waiting and end after synchronize_rcu()
> returns; dispatch may still run after synchronize_rcu() returns

Hello Ming,

I disagree with the second part of your statement. blk_mq_quiesce_queue()
stops all hardware queues before calling synchronize_*rcu().
blk_mq_sched_dispatch_requests() is called with an RCU lock held and checks
whether a hctx has been stopped before calling .queue_rq(). That is
sufficient to prevent blk_mq_try_issue_directly() from triggering a
.queue_rq() call.

Bart.

[PATCH v2 5/8] blk-mq: update comments on blk_mq_quiesce_queue()

2017-05-27 Thread Ming Lei
What we want from blk_mq_quiesce_queue() isn't only to wait for all
ongoing .queue_rq() calls to finish.

In the typical context of canceling requests, we need to make sure
that the following is done in the dispatch path before starting to
cancel requests:

- any failed dispatched request is finished
- any busy dispatched request is requeued, and its STARTED flag is
cleared

So update the comment to keep the code, documentation, and our
expectation consistent.

Signed-off-by: Ming Lei 
---
 block/blk-mq.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 470ee5514ea9..032045841856 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -155,12 +155,12 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
 EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
 
 /**
- * blk_mq_quiesce_queue() - wait until all ongoing queue_rq calls have finished
+ * blk_mq_quiesce_queue() - wait until all ongoing dispatches have finished
  * @q: request queue.
  *
  * Note: this function does not prevent that the struct request end_io()
- * callback function is invoked. Additionally, it is not prevented that
- * new queue_rq() calls occur unless the queue has been stopped first.
+ * callback function is invoked. Once this function returns, it is
+ * guaranteed that no dispatch can happen.
  */
 void blk_mq_quiesce_queue(struct request_queue *q)
 {
-- 
2.9.4



[PATCH v2 7/8] blk-mq: clarify dispatch may not be drained/blocked by stopping queue

2017-05-27 Thread Ming Lei
BLK_MQ_S_STOPPED may not be observed by other concurrent I/O paths,
so we can't guarantee that dispatch won't happen after the
queue-stopping APIs return.

Clarify that fact to avoid potential misuse.

Signed-off-by: Ming Lei 
---
 block/blk-mq.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 84cce67caee3..73e07bc074a6 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1240,6 +1240,11 @@ static void __blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx, bool sync)
 	set_bit(BLK_MQ_S_STOPPED, &hctx->state);
 }
 
+/*
+ * We do not guarantee that dispatch can be drained or blocked
+ * after blk_mq_stop_hw_queue() returns. Please use
+ * blk_mq_quiesce_queue() for that requirement.
+ */
 void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx)
 {
__blk_mq_stop_hw_queue(hctx, false);
@@ -1255,6 +1260,11 @@ static void __blk_mq_stop_hw_queues(struct request_queue *q, bool sync)
 		__blk_mq_stop_hw_queue(hctx, sync);
 }
 
+/*
+ * We do not guarantee that dispatch can be drained or blocked
+ * after blk_mq_stop_hw_queues() returns. Please use
+ * blk_mq_quiesce_queue() for that requirement.
+ */
 void blk_mq_stop_hw_queues(struct request_queue *q)
 {
__blk_mq_stop_hw_queues(q, false);
-- 
2.9.4



[PATCH v2 8/8] Revert "blk-mq: don't use sync workqueue flushing from drivers"

2017-05-27 Thread Ming Lei
This patch reverts commit 2719aa217e0d02 ("blk-mq: don't use sync
workqueue flushing from drivers"): only blk_mq_quiesce_queue() needed
the sync flush, and now that we no longer stop the queue there, the
revert is safe.

Also switch blk_mq_stop_hw_queue() to cancel_delayed_work().

Signed-off-by: Ming Lei 
---
 block/blk-mq.c | 30 --
 1 file changed, 8 insertions(+), 22 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 73e07bc074a6..f38226bd651d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -42,7 +42,6 @@ static LIST_HEAD(all_q_list);
 
 static void blk_mq_poll_stats_start(struct request_queue *q);
 static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
-static void __blk_mq_stop_hw_queues(struct request_queue *q, bool sync);
 
 static int blk_mq_poll_stats_bkt(const struct request *rq)
 {
@@ -1230,16 +1229,6 @@ bool blk_mq_queue_stopped(struct request_queue *q)
 }
 EXPORT_SYMBOL(blk_mq_queue_stopped);
 
-static void __blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx, bool sync)
-{
-	if (sync)
-		cancel_delayed_work_sync(&hctx->run_work);
-	else
-		cancel_delayed_work(&hctx->run_work);
-
-	set_bit(BLK_MQ_S_STOPPED, &hctx->state);
-}
-
 /*
  * We do not guarantee that dispatch can be drained or blocked
  * after blk_mq_stop_hw_queue() returns. Please use
@@ -1247,18 +1236,11 @@ static void __blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx, bool sync)
  */
 void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx)
 {
-   __blk_mq_stop_hw_queue(hctx, false);
-}
-EXPORT_SYMBOL(blk_mq_stop_hw_queue);
+	cancel_delayed_work(&hctx->run_work);
 
-static void __blk_mq_stop_hw_queues(struct request_queue *q, bool sync)
-{
-   struct blk_mq_hw_ctx *hctx;
-   int i;
-
-   queue_for_each_hw_ctx(q, hctx, i)
-   __blk_mq_stop_hw_queue(hctx, sync);
+	set_bit(BLK_MQ_S_STOPPED, &hctx->state);
 }
+EXPORT_SYMBOL(blk_mq_stop_hw_queue);
 
 /*
  * We do not guarantee that dispatch can be drained or blocked
@@ -1267,7 +1249,11 @@ static void __blk_mq_stop_hw_queues(struct request_queue *q, bool sync)
  */
 void blk_mq_stop_hw_queues(struct request_queue *q)
 {
-   __blk_mq_stop_hw_queues(q, false);
+   struct blk_mq_hw_ctx *hctx;
+   int i;
+
+   queue_for_each_hw_ctx(q, hctx, i)
+   blk_mq_stop_hw_queue(hctx);
 }
 EXPORT_SYMBOL(blk_mq_stop_hw_queues);
 
-- 
2.9.4



[PATCH v2 6/8] blk-mq: don't stop queue for quiescing

2017-05-27 Thread Ming Lei
A queue can be started by other blk-mq APIs and is used in different
contexts, which limits the use of blk_mq_quiesce_queue() if it is
based on stopping the queue, and makes it hard to use: users have to
apply the stop-queue APIs carefully to avoid breaking
blk_mq_quiesce_queue().

We have applied the QUIESCED flag for draining and blocking dispatch,
so stopping the queue isn't necessary any more.

With the queue stopping removed, blk_mq_quiesce_queue() can be used
safely and easily, and users no longer have to worry about the queue
being restarted during quiescing.

Signed-off-by: Ming Lei 
---
 block/blk-mq.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 032045841856..84cce67caee3 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -168,8 +168,6 @@ void blk_mq_quiesce_queue(struct request_queue *q)
unsigned int i;
bool rcu = false;
 
-   __blk_mq_stop_hw_queues(q, true);
-
spin_lock_irq(q->queue_lock);
queue_flag_set(QUEUE_FLAG_QUIESCED, q);
spin_unlock_irq(q->queue_lock);
@@ -198,7 +196,12 @@ void blk_mq_unquiesce_queue(struct request_queue *q)
queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
spin_unlock_irq(q->queue_lock);
 
-   blk_mq_start_stopped_hw_queues(q, true);
+	/*
+	 * During quiescing, requests can be inserted into the
+	 * scheduler queue or the sw queues, so run the queues
+	 * to dispatch those requests.
+	 */
+   blk_mq_start_hw_queues(q);
 }
 EXPORT_SYMBOL_GPL(blk_mq_unquiesce_queue);
 
-- 
2.9.4



[PATCH v2 4/8] blk-mq: fix blk_mq_quiesce_queue

2017-05-27 Thread Ming Lei
It is required that no dispatch can happen any more once
blk_mq_quiesce_queue() returns; we don't have such a requirement
on the queue-stopping APIs.

But blk_mq_quiesce_queue() may still fail to block/drain dispatch in
the following cases:

- direct issue or BLK_MQ_S_START_ON_RUN
- in theory, a new RCU read-side critical section may begin while
synchronize_rcu() is waiting and end after synchronize_rcu()
returns; during that period dispatch may still happen

So use the new QUEUE_FLAG_QUIESCED flag and evaluate it inside the
RCU read-side critical sections to fix the above issues.

This patch fixes the request use-after-free [1][2] seen while
canceling requests of NVMe in nvme_dev_disable(), which can be
triggered easily during NVMe reset & remove tests.

[1] oops kernel log when CONFIG_BLK_DEV_INTEGRITY is on
[  103.412969] BUG: unable to handle kernel NULL pointer dereference at 000a
[  103.412980] IP: bio_integrity_advance+0x48/0xf0
[  103.412981] PGD 275a88067
[  103.412981] P4D 275a88067
[  103.412982] PUD 276c43067
[  103.412983] PMD 0
[  103.412984]
[  103.412986] Oops:  [#1] SMP
[  103.412989] Modules linked in: vfat fat intel_rapl sb_edac 
x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass 
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd 
cryptd ipmi_ssif iTCO_wdt iTCO_vendor_support mxm_wmi glue_helper dcdbas 
ipmi_si mei_me pcspkr mei sg ipmi_devintf lpc_ich ipmi_msghandler shpchp 
acpi_power_meter wmi nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs 
libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect 
sysimgblt fb_sys_fops ttm drm crc32c_intel nvme ahci nvme_core libahci libata 
tg3 i2c_core megaraid_sas ptp pps_core dm_mirror dm_region_hash dm_log dm_mod
[  103.413035] CPU: 0 PID: 102 Comm: kworker/0:2 Not tainted 4.11.0+ #1
[  103.413036] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.2.5 09/06/2016
[  103.413041] Workqueue: events nvme_remove_dead_ctrl_work [nvme]
[  103.413043] task: 9cc8775c8000 task.stack: c033c252c000
[  103.413045] RIP: 0010:bio_integrity_advance+0x48/0xf0
[  103.413046] RSP: 0018:c033c252fc10 EFLAGS: 00010202
[  103.413048] RAX:  RBX: 9cc8720a8cc0 RCX: 9cca72958240
[  103.413049] RDX: 9cca72958000 RSI: 0008 RDI: 9cc872537f00
[  103.413049] RBP: c033c252fc28 R08:  R09: b963a0d5
[  103.413050] R10: 063e R11:  R12: 9cc8720a8d18
[  103.413051] R13: 1000 R14: 9cc872682e00 R15: fffb
[  103.413053] FS:  () GS:9cc877c0() knlGS:
[  103.413054] CS:  0010 DS:  ES:  CR0: 80050033
[  103.413055] CR2: 000a CR3: 000276c41000 CR4: 001406f0
[  103.413056] Call Trace:
[  103.413063]  bio_advance+0x2a/0xe0
[  103.413067]  blk_update_request+0x76/0x330
[  103.413072]  blk_mq_end_request+0x1a/0x70
[  103.413074]  blk_mq_dispatch_rq_list+0x370/0x410
[  103.413076]  ? blk_mq_flush_busy_ctxs+0x94/0xe0
[  103.413080]  blk_mq_sched_dispatch_requests+0x173/0x1a0
[  103.413083]  __blk_mq_run_hw_queue+0x8e/0xa0
[  103.413085]  __blk_mq_delay_run_hw_queue+0x9d/0xa0
[  103.413088]  blk_mq_start_hw_queue+0x17/0x20
[  103.413090]  blk_mq_start_hw_queues+0x32/0x50
[  103.413095]  nvme_kill_queues+0x54/0x80 [nvme_core]
[  103.413097]  nvme_remove_dead_ctrl_work+0x1f/0x40 [nvme]
[  103.413103]  process_one_work+0x149/0x360
[  103.413105]  worker_thread+0x4d/0x3c0
[  103.413109]  kthread+0x109/0x140
[  103.413111]  ? rescuer_thread+0x380/0x380
[  103.413113]  ? kthread_park+0x60/0x60
[  103.413120]  ret_from_fork+0x2c/0x40
[  103.413121] Code: 08 4c 8b 63 50 48 8b 80 80 00 00 00 48 8b 90 d0 03 00 00 31 c0 48 83 ba 40 02 00 00 00 48 8d 8a 40 02 00 00 48 0f 45 c1 c1 ee 09 <0f> b6 48 0a 0f b6 40 09 41 89 f5 83 e9 09 41 d3 ed 44 0f af e8
[  103.413145] RIP: bio_integrity_advance+0x48/0xf0 RSP: c033c252fc10
[  103.413146] CR2: 000a
[  103.413157] ---[ end trace cd6875d16eb5a11e ]---
[  103.455368] Kernel panic - not syncing: Fatal exception
[  103.459826] Kernel Offset: 0x3760 from 0x8100 (relocation range: 0x8000-0xbfff)
[  103.850916] ---[ end Kernel panic - not syncing: Fatal exception
[  103.857637] sched: Unexpected reschedule of offline CPU#1!
[  103.863762] [ cut here ]

[2] kernel hang in blk_mq_freeze_queue_wait() when CONFIG_BLK_DEV_INTEGRITY is off
[  247.129825] INFO: task nvme-test:1772 blocked for more than 120 seconds.
[  247.137311]   Not tainted 4.12.0-rc2.upstream+ #4
[  247.142954] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  247.151704] Call Trace:
[  247.154445]  __schedule+0x28a/0x880
[  247.158341]  schedule+0x36/0x80
[  247.161850]  blk_mq_freeze_queue_wait+0x4b/0xb0
[  247.166913]  ? remove_wait_queue+0x60/0x60
[  247.171485]  

[PATCH v2 3/8] blk-mq: use the introduced blk_mq_unquiesce_queue()

2017-05-27 Thread Ming Lei
blk_mq_unquiesce_queue() is used for unquiescing the
queue explicitly, so replace blk_mq_start_stopped_hw_queues()
with it.

Cc: linux-n...@lists.infradead.org
Cc: linux-s...@vger.kernel.org
Cc: dm-de...@redhat.com
Signed-off-by: Ming Lei 
---
 drivers/md/dm-rq.c   | 2 +-
 drivers/nvme/host/core.c | 2 +-
 drivers/scsi/scsi_lib.c  | 5 -
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 2af27026aa2e..673fcf075077 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -71,7 +71,7 @@ static void dm_old_start_queue(struct request_queue *q)
 
 static void dm_mq_start_queue(struct request_queue *q)
 {
-   blk_mq_start_stopped_hw_queues(q, true);
+   blk_mq_unquiesce_queue(q);
blk_mq_kick_requeue_list(q);
 }
 
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 04e115834702..231d36028afc 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2514,7 +2514,7 @@ void nvme_start_queues(struct nvme_ctrl *ctrl)
 
	mutex_lock(&ctrl->namespaces_mutex);
	list_for_each_entry(ns, &ctrl->namespaces, list) {
-		blk_mq_start_stopped_hw_queues(ns->queue, true);
+		blk_mq_unquiesce_queue(ns->queue);
		blk_mq_kick_requeue_list(ns->queue);
	}
	mutex_unlock(&ctrl->namespaces_mutex);
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 814a4bd8405d..72b11f75719c 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -3030,7 +3030,10 @@ scsi_internal_device_unblock(struct scsi_device *sdev,
return -EINVAL;
 
if (q->mq_ops) {
-   blk_mq_start_stopped_hw_queues(q, false);
+   if (blk_queue_quiesced(q))
+   blk_mq_unquiesce_queue(q);
+   else
+   blk_mq_start_stopped_hw_queues(q, false);
} else {
spin_lock_irqsave(q->queue_lock, flags);
blk_start_queue(q);
-- 
2.9.4



[PATCH v2 2/8] block: introduce flag of QUEUE_FLAG_QUIESCED

2017-05-27 Thread Ming Lei
This flag is introduced for fixing & improving the quiescing code.

Signed-off-by: Ming Lei 
---
 include/linux/blkdev.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 41291be82ac4..60967797f4f6 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -618,6 +618,7 @@ struct request_queue {
 #define QUEUE_FLAG_STATS   27  /* track rq completion times */
 #define QUEUE_FLAG_POLL_STATS  28  /* collecting stats for hybrid polling */
 #define QUEUE_FLAG_REGISTERED  29  /* queue has been registered to a disk */
+#define QUEUE_FLAG_QUIESCED    30  /* queue has been quiesced */
 
 #define QUEUE_FLAG_DEFAULT ((1 << QUEUE_FLAG_IO_STAT) |\
 (1 << QUEUE_FLAG_STACKABLE)|   \
@@ -712,6 +713,7 @@ static inline void queue_flag_clear(unsigned int flag, struct request_queue *q)
 #define blk_noretry_request(rq) \
((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \
 REQ_FAILFAST_DRIVER))
+#define blk_queue_quiesced(q)  test_bit(QUEUE_FLAG_QUIESCED, &(q)->queue_flags)
 
 static inline bool blk_account_rq(struct request *rq)
 {
-- 
2.9.4



[PATCH v2 1/8] blk-mq: introduce blk_mq_unquiesce_queue

2017-05-27 Thread Ming Lei
We implicitly use blk_mq_start_stopped_hw_queues() as the pair of
blk_mq_quiesce_queue() for unquiescing the queue, so introduce
blk_mq_unquiesce_queue() as the explicit pair of
blk_mq_quiesce_queue().

This function is used to fix the current quiescing mechanism in the
following patches.

Signed-off-by: Ming Lei 
---
 block/blk-mq.c | 13 +
 include/linux/blkdev.h |  1 +
 2 files changed, 14 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index f2224ffd225d..8ecbbb718946 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -181,6 +181,19 @@ void blk_mq_quiesce_queue(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
 
+/*
+ * blk_mq_unquiesce_queue() - counterpart of blk_mq_quiesce_queue()
+ * @q: request queue.
+ *
+ * This function restores the queue to the state it was in before
+ * blk_mq_quiesce_queue() was called.
+ */
+void blk_mq_unquiesce_queue(struct request_queue *q)
+{
+   blk_mq_start_stopped_hw_queues(q, true);
+}
+EXPORT_SYMBOL_GPL(blk_mq_unquiesce_queue);
+
 void blk_mq_wake_waiters(struct request_queue *q)
 {
struct blk_mq_hw_ctx *hctx;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index ab92c4ea138b..41291be82ac4 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -964,6 +964,7 @@ extern void __blk_run_queue_uncond(struct request_queue *q);
 extern void blk_run_queue(struct request_queue *);
 extern void blk_run_queue_async(struct request_queue *q);
 extern void blk_mq_quiesce_queue(struct request_queue *q);
+extern void blk_mq_unquiesce_queue(struct request_queue *q);
 extern int blk_rq_map_user(struct request_queue *, struct request *,
   struct rq_map_data *, void __user *, unsigned long,
   gfp_t);
-- 
2.9.4



[PATCH v2 0/8] blk-mq: fix & improve queue quiescing

2017-05-27 Thread Ming Lei
Hi,

There are some issues in current blk_mq_quiesce_queue():

- in case of direct issue or BLK_MQ_S_START_ON_RUN, dispatch won't
be prevented after blk_mq_quiesce_queue() returns.
- in theory, a new RCU read-side critical section may begin while
synchronize_rcu() is waiting and end after synchronize_rcu()
returns; dispatch may still run after synchronize_rcu() returns

It is observed that a request double-free/use-after-free can be
triggered easily when canceling NVMe requests via
blk_mq_tagset_busy_iter(...nvme_cancel_request) in nvme_dev_disable().
The reason is that blk_mq_quiesce_queue() can't prevent dispatch from
running during that period.

We actually have to quiesce the queue before canceling dispatched
requests via blk_mq_tagset_busy_iter(); otherwise a use-after-free is
easy to trigger. This way of canceling dispatched requests is used in
several drivers, but only NVMe uses blk_mq_quiesce_queue() to avoid
the issue; the others need to be fixed too. It should become the
common way of handling a dead controller.

blk_mq_quiesce_queue() is implemented by stopping the queue, which
limits its uses and easily causes races, because any queue restart in
another path may break blk_mq_quiesce_queue(). For example, we
sometimes stop a queue when the hardware can't handle more ongoing
requests and restart it after requests complete. Meanwhile, when we
want to cancel requests because the hardware is dead, quiescing has to
run first, and the restart in the completion path can then break the
quiescing. This patchset improves the interface by removing the queue
stopping, making it easier to use.

V2:
- split patch "blk-mq: fix blk_mq_quiesce_queue" into two, and fix a
build issue seen when only the first two patches are applied
- add the kernel oops and hang logs to the commit messages
- add 'Revert "blk-mq: don't use sync workqueue flushing from drivers"'

 
Ming Lei (8):
  blk-mq: introduce blk_mq_unquiesce_queue
  block: introduce flag of QUEUE_FLAG_QUIESCED
  blk-mq: use the introduced blk_mq_unquiesce_queue()
  blk-mq: fix blk_mq_quiesce_queue
  blk-mq: update comments on blk_mq_quiesce_queue()
  blk-mq: don't stop queue for quiescing
  blk-mq: clarify dispatch may not be drained/blocked by stopping queue
  Revert "blk-mq: don't use sync workqueue flushing from drivers"

 block/blk-mq.c   | 89 +---
 drivers/md/dm-rq.c   |  2 +-
 drivers/nvme/host/core.c |  2 +-
 drivers/scsi/scsi_lib.c  |  5 ++-
 include/linux/blkdev.h   |  3 ++
 5 files changed, 71 insertions(+), 30 deletions(-)

-- 
2.9.4