Re: [PATCH V2 6/8] block: allow to allocate req with REQF_PREEMPT when queue is frozen

2017-09-01 Thread Ming Lei
On Fri, Sep 01, 2017 at 08:07:04PM +, Bart Van Assche wrote:
> On Sat, 2017-09-02 at 02:49 +0800, Ming Lei wrote:
> > +   if ((flags & BLK_MQ_REQ_PREEMPT) && blk_queue_is_freezing(q))
> > +   blk_queue_enter_live(q);
> > +   else
> > +   ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));
> 
> Why did you repost this patch series without trying to reach an agreement
> about the approach?
> 
> Anyway, because of the unsafe blk_queue_enter_live() call introduced by this
> patch, please add the following to the description of this patch whenever you
> repost it:
> 
> NAK-ed-by: Bart Van Assche 

Did you take a look at patch 7 or the cover letter? In that patch,
if a preempt freeze is in progress, no other freeze can be
started at all, so it is definitely safe to use blk_queue_enter_live()
here, isn't it?

-- 
Ming


Re: [PATCH V2 6/8] block: allow to allocate req with REQF_PREEMPT when queue is frozen

2017-09-01 Thread Bart Van Assche
On Sat, 2017-09-02 at 02:49 +0800, Ming Lei wrote:
> + if ((flags & BLK_MQ_REQ_PREEMPT) && blk_queue_is_freezing(q))
> + blk_queue_enter_live(q);
> + else
> + ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));

Why did you repost this patch series without trying to reach an agreement
about the approach?

Anyway, because of the unsafe blk_queue_enter_live() call introduced by this
patch, please add the following to the description of this patch whenever you
repost it:

NAK-ed-by: Bart Van Assche 



Re: [PATCH V2 1/2] block/loop: fix use after free

2017-09-01 Thread Jens Axboe
On 09/01/2017 12:15 PM, Shaohua Li wrote:
> From: Shaohua Li 
> 
> lo_rw_aio->call_read_iter->
> 1   aops->direct_IO
> 2   iov_iter_revert
lo_rw_aio_complete could happen between 1 and 2; the bio and bvec could
be freed before 2, which accesses the bvec.

Applied 1-2, thanks.

-- 
Jens Axboe



[PATCH V2 8/8] SCSI: freeze block queue when SCSI device is put into quiesce

2017-09-01 Thread Ming Lei
Simply quiescing the SCSI device and waiting for completion of the I/O
dispatched to the SCSI queue isn't safe; it is easy to use up
requests because none of the allocated requests can be dispatched
while the device is in QUIESCE. Then no request can be allocated
for RQF_PREEMPT, and the system may hang somewhere, such as
when sending sync_cache or start_stop commands during the
system suspend path.

Before quiescing the SCSI device, this patch freezes the block queue
first, so no new request can enter the queue any more, and all pending
requests are drained once blk_freeze_queue() returns.

This patch also uses __blk_get_request() for allocating the
request with RQF_PREEMPT, so that the allocation can
succeed even though the block queue is frozen.

Signed-off-by: Ming Lei 
---
 drivers/scsi/scsi_lib.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index f6097b89d5d3..a59544012b68 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -243,10 +243,12 @@ int scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
struct request *req;
struct scsi_request *rq;
int ret = DRIVER_ERROR << 24;
+   unsigned flag = sdev->sdev_state == SDEV_QUIESCE ? BLK_REQ_PREEMPT : 0;
 
-   req = blk_get_request(sdev->request_queue,
+   req = __blk_get_request(sdev->request_queue,
data_direction == DMA_TO_DEVICE ?
-   REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN, __GFP_RECLAIM);
+   REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN, __GFP_RECLAIM,
+   flag);
if (IS_ERR(req))
return ret;
rq = scsi_req(req);
@@ -2890,6 +2892,19 @@ scsi_device_quiesce(struct scsi_device *sdev)
 {
int err;
 
+   /*
+    * Simply quiescing the SCSI device isn't safe: it is easy
+    * to use up requests because none of the allocated requests
+    * can be dispatched while the device is put in QUIESCE.
+    * Then no request can be allocated and we may hang
+    * somewhere, such as during system suspend/resume.
+    *
+    * So we freeze the block queue first, no new request can
+    * enter the queue any more, and pending requests are drained
+    * once blk_freeze_queue() returns.
+    */
+   blk_freeze_queue_preempt(sdev->request_queue);
+
mutex_lock(&sdev->state_mutex);
err = scsi_device_set_state(sdev, SDEV_QUIESCE);
mutex_unlock(&sdev->state_mutex);
@@ -2926,6 +2941,8 @@ void scsi_device_resume(struct scsi_device *sdev)
scsi_device_set_state(sdev, SDEV_RUNNING) == 0)
scsi_run_queue(sdev->request_queue);
mutex_unlock(&sdev->state_mutex);
+
+   blk_unfreeze_queue_preempt(sdev->request_queue);
 }
 EXPORT_SYMBOL(scsi_device_resume);
 
-- 
2.9.5



[PATCH V2 7/8] block: introduce preempt version of blk_[freeze|unfreeze]_queue

2017-09-01 Thread Ming Lei
The two APIs are required to allow request allocation for
RQF_PREEMPT when the queue is frozen.

The following two points have to be guaranteed for one queue:

1) a preempt freeze can be started only after all pending
normal & preempt freezes have completed

2) a normal freeze can't be started if there is a pending
preempt freeze.

Point 2) is needed because, for a normal freeze, once
blk_mq_freeze_queue_wait() returns, we have to make sure no I/O is pending.

An rwsem would have been perfect for this kind of sync,
but lockdep will complain in case of nested normal freezes.

So a spin_lock together with the freezing status flags is used for the sync.
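
To make the two rules concrete, here is a simplified sketch (not from the
patch itself; the real locking is in the diff below) of the condition each
kind of freeze waits for before bumping the freeze depth, using the
normal_freezing/preempt_freezing flags added by this patch:

/* Sketch only: the rule that the wait_event_cmd() below implements. */
static bool can_start_freeze(struct request_queue *q, bool preempt)
{
        if (preempt)
                /* 1) wait until no freeze of either kind is in progress */
                return !q->normal_freezing && !q->preempt_freezing;
        /* 2) a normal freeze only has to wait for a preempt freeze */
        return !q->preempt_freezing;
}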

Signed-off-by: Ming Lei 
---
 block/blk-core.c   |  2 ++
 block/blk-mq.c | 72 +++---
 include/linux/blk-mq.h |  2 ++
 include/linux/blkdev.h |  3 +++
 4 files changed, 76 insertions(+), 3 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index c199910d4fe1..bbcea07f17da 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -899,6 +899,8 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
if (blkcg_init_queue(q))
goto fail_ref;
 
+   spin_lock_init(&q->freeze_lock);
+
return q;
 
 fail_ref:
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 695d2eeaf41a..bf8c057aa50f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -118,16 +118,48 @@ void blk_mq_in_flight(struct request_queue *q, struct hd_struct *part,
blk_mq_queue_tag_busy_iter(q, blk_mq_check_inflight, &mi);
 }
 
-void blk_freeze_queue_start(struct request_queue *q)
+static void __blk_freeze_queue_start(struct request_queue *q, bool preempt)
 {
int freeze_depth;
 
+   /*
+* Wait for completion of another kind of freezing.
+*
+* We have to sync between normal freeze and preempt
+* freeze. preempt freeze can only be started iff all
+* pending normal & preempt freezing are completed,
+* meantime normal freeze can be started only if there
+* isn't pending preempt freezing.
+*
+* rwsem should have been perfect for this kind of sync,
+* but lockdep will complain in case of nested normal freeze.
+*
+* So we have to use lock to do that manually.
+*/
+   spin_lock(&q->freeze_lock);
+   wait_event_cmd(q->mq_freeze_wq,
+  preempt ? !(q->normal_freezing + q->preempt_freezing) : !q->preempt_freezing,
+  spin_unlock(&q->freeze_lock),
+  spin_lock(&q->freeze_lock));
+
freeze_depth = atomic_inc_return(&q->mq_freeze_depth);
if (freeze_depth == 1) {
+   if (preempt)
+   q->preempt_freezing = 1;
+   else
+   q->normal_freezing = 1;
+   spin_unlock(&q->freeze_lock);
+
percpu_ref_kill(&q->q_usage_counter);
if (q->mq_ops)
blk_mq_run_hw_queues(q, false);
-   }
+   } else
+   spin_unlock(&q->freeze_lock);
+}
+
+void blk_freeze_queue_start(struct request_queue *q)
+{
+   __blk_freeze_queue_start(q, false);
 }
 EXPORT_SYMBOL_GPL(blk_freeze_queue_start);
 
@@ -166,20 +198,54 @@ void blk_freeze_queue(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_freeze_queue);
 
-void blk_unfreeze_queue(struct request_queue *q)
+static void __blk_unfreeze_queue(struct request_queue *q, bool preempt)
 {
int freeze_depth;
 
freeze_depth = atomic_dec_return(&q->mq_freeze_depth);
WARN_ON_ONCE(freeze_depth < 0);
if (!freeze_depth) {
+   spin_lock(&q->freeze_lock);
+   if (preempt)
+   q->preempt_freezing = 0;
+   else
+   q->normal_freezing = 0;
+   spin_unlock(&q->freeze_lock);
percpu_ref_reinit(&q->q_usage_counter);
wake_up_all(&q->mq_freeze_wq);
}
 }
+
+void blk_unfreeze_queue(struct request_queue *q)
+{
+   __blk_unfreeze_queue(q, false);
+}
 EXPORT_SYMBOL_GPL(blk_unfreeze_queue);
 
 /*
+ * Once this function returns, only allow requests to be allocated
+ * for a preempt purpose, such as RQF_PREEMPT.
+ *
+ */
+void blk_freeze_queue_preempt(struct request_queue *q)
+{
+   __blk_freeze_queue_start(q, true);
+   blk_freeze_queue_wait(q);
+}
+EXPORT_SYMBOL_GPL(blk_freeze_queue_preempt);
+
+/*
+ * It is the caller's responsibility to make sure no new
+ * request can be allocated before calling this function.
+ */
+void blk_unfreeze_queue_preempt(struct request_queue *q)
+{
+   blk_freeze_queue_wait(q);
+   __blk_unfreeze_queue(q, true);
+}
+EXPORT_SYMBOL_GPL(blk_unfreeze_queue_preempt);
+
+/*
  * FIXME: replace the scsi_internal_device_*block_nowait() calls in the
  * mpt3sas driver such that this function can be removed.
  */
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 0ba5cb043172..596f433eb54c 100644
--- a/include/linux/blk-mq.h

[PATCH V2 5/8] block: tracking request allocation with q_usage_counter

2017-09-01 Thread Ming Lei
This usage is basically the same as with blk-mq, so that we can
support freezing the queue easily.

Signed-off-by: Ming Lei 
---
 block/blk-core.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index ce2d3b6f6c62..85b15833a7a5 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1405,16 +1405,21 @@ static struct request *blk_old_get_request(struct request_queue *q,
   unsigned int op, gfp_t gfp_mask)
 {
struct request *rq;
+   int ret = 0;
 
WARN_ON_ONCE(q->mq_ops);
 
/* create ioc upfront */
create_io_context(gfp_mask, q->node);
 
+   ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));
+   if (ret)
+   return ERR_PTR(ret);
spin_lock_irq(q->queue_lock);
rq = get_request(q, op, NULL, gfp_mask);
if (IS_ERR(rq)) {
spin_unlock_irq(q->queue_lock);
+   blk_queue_exit(q);
return rq;
}
 
@@ -1586,6 +1591,7 @@ void __blk_put_request(struct request_queue *q, struct request *req)
blk_free_request(rl, req);
freed_request(rl, sync, rq_flags);
blk_put_rl(rl);
+   blk_queue_exit(q);
}
 }
 EXPORT_SYMBOL_GPL(__blk_put_request);
@@ -1867,8 +1873,10 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio)
 * Grab a free request. This is might sleep but can not fail.
 * Returns with the queue unlocked.
 */
+   blk_queue_enter_live(q);
req = get_request(q, bio->bi_opf, bio, GFP_NOIO);
if (IS_ERR(req)) {
+   blk_queue_exit(q);
__wbt_done(q->rq_wb, wb_acct);
if (PTR_ERR(req) == -ENOMEM)
bio->bi_status = BLK_STS_RESOURCE;
-- 
2.9.5



[PATCH V2 6/8] block: allow to allocate req with REQF_PREEMPT when queue is frozen

2017-09-01 Thread Ming Lei
RQF_PREEMPT is a bit special because such a request is required to be
dispatched to the LLD even when the SCSI device is quiesced.

So this patch introduces __blk_get_request() to allow the block
layer to allocate a request while the queue is frozen, since we
will freeze the queue before quiescing the SCSI device in the
following patch for supporting safe SCSI quiescing.
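
For illustration only, a hedged sketch of how a caller is expected to use the
new helper on a frozen queue (hypothetical code; the actual SCSI caller is
converted in patch 8/8):

/* Hypothetical caller: allocate a preempt request on a frozen queue. */
struct request *rq;

rq = __blk_get_request(q, REQ_OP_SCSI_IN, __GFP_RECLAIM,
                       BLK_MQ_REQ_PREEMPT);
if (IS_ERR(rq))
        return PTR_ERR(rq);
/* ... set up and issue the command, then blk_put_request(rq) ... */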

Signed-off-by: Ming Lei 
---
 block/blk-core.c   | 25 +
 block/blk-mq.c | 11 +--
 block/blk.h|  5 +
 include/linux/blk-mq.h |  7 ---
 include/linux/blkdev.h | 17 +++--
 5 files changed, 50 insertions(+), 15 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 85b15833a7a5..c199910d4fe1 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1402,7 +1402,8 @@ static struct request *get_request(struct request_queue *q, unsigned int op,
 }
 
 static struct request *blk_old_get_request(struct request_queue *q,
-  unsigned int op, gfp_t gfp_mask)
+  unsigned int op, gfp_t gfp_mask,
+  unsigned int flags)
 {
struct request *rq;
int ret = 0;
@@ -1412,9 +1413,17 @@ static struct request *blk_old_get_request(struct request_queue *q,
/* create ioc upfront */
create_io_context(gfp_mask, q->node);
 
-   ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));
+   /*
+* When queue is frozen, we still need to allocate req for
+* REQF_PREEMPT.
+*/
+   if ((flags & BLK_MQ_REQ_PREEMPT) && blk_queue_is_freezing(q))
+   blk_queue_enter_live(q);
+   else
+   ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));
if (ret)
return ERR_PTR(ret);
+
spin_lock_irq(q->queue_lock);
rq = get_request(q, op, NULL, gfp_mask);
if (IS_ERR(rq)) {
@@ -1430,26 +1439,26 @@ static struct request *blk_old_get_request(struct request_queue *q,
return rq;
 }
 
-struct request *blk_get_request(struct request_queue *q, unsigned int op,
-   gfp_t gfp_mask)
+struct request *__blk_get_request(struct request_queue *q, unsigned int op,
+ gfp_t gfp_mask, unsigned int flags)
 {
struct request *req;
 
if (q->mq_ops) {
req = blk_mq_alloc_request(q, op,
-   (gfp_mask & __GFP_DIRECT_RECLAIM) ?
-   0 : BLK_MQ_REQ_NOWAIT);
+   flags | ((gfp_mask & __GFP_DIRECT_RECLAIM) ?
+   0 : BLK_MQ_REQ_NOWAIT));
if (!IS_ERR(req) && q->mq_ops->initialize_rq_fn)
q->mq_ops->initialize_rq_fn(req);
} else {
-   req = blk_old_get_request(q, op, gfp_mask);
+   req = blk_old_get_request(q, op, gfp_mask, flags);
if (!IS_ERR(req) && q->initialize_rq_fn)
q->initialize_rq_fn(req);
}
 
return req;
 }
-EXPORT_SYMBOL(blk_get_request);
+EXPORT_SYMBOL(__blk_get_request);
 
 /**
  * blk_requeue_request - put a request back on queue
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 24de78afbe9a..695d2eeaf41a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -382,9 +382,16 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
 {
struct blk_mq_alloc_data alloc_data = { .flags = flags };
struct request *rq;
-   int ret;
+   int ret = 0;
 
-   ret = blk_queue_enter(q, flags & BLK_MQ_REQ_NOWAIT);
+   /*
+* When queue is frozen, we still need to allocate req for
+* REQF_PREEMPT.
+*/
+   if ((flags & BLK_MQ_REQ_PREEMPT) && blk_queue_is_freezing(q))
+   blk_queue_enter_live(q);
+   else
+   ret = blk_queue_enter(q, flags & BLK_MQ_REQ_NOWAIT);
if (ret)
return ERR_PTR(ret);
 
diff --git a/block/blk.h b/block/blk.h
index 242486e26a81..b71f8cc047aa 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -80,6 +80,11 @@ static inline void blk_queue_enter_live(struct request_queue *q)
percpu_ref_get(&q->q_usage_counter);
 }
 
+static inline bool blk_queue_is_freezing(struct request_queue *q)
+{
+   return percpu_ref_is_dying(&q->q_usage_counter);
+}
+
 #ifdef CONFIG_BLK_DEV_INTEGRITY
 void blk_flush_integrity(void);
 bool __bio_integrity_endio(struct bio *);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index f90d78eb85df..0ba5cb043172 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -200,9 +200,10 @@ void blk_mq_free_request(struct request *rq);
 bool blk_mq_can_queue(struct blk_mq_hw_ctx *);
 
 enum {
-   BLK_MQ_REQ_NOWAIT   = (1 << 0), /* return when out of requests */
-   BLK_MQ_REQ_RESERVED = (1 << 1), /* allocate from reserved pool */
-   BLK_MQ_REQ_INTERNAL = (1 << 2), /* 

[PATCH V2 4/8] blk-mq: rename blk_mq_freeze_queue_wait as blk_freeze_queue_wait

2017-09-01 Thread Ming Lei
The only change on the legacy path is that blk_drain_queue() is run
from blk_freeze_queue(), which is called in blk_cleanup_queue().

So this patch removes the explicit __blk_drain_queue() call in
blk_cleanup_queue().

Signed-off-by: Ming Lei 
---
 block/blk-core.c | 17 +++--
 block/blk-mq.c   |  8 +---
 block/blk.h  |  1 +
 drivers/nvme/host/core.c |  2 +-
 include/linux/blk-mq.h   |  2 +-
 5 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index d579501f24ba..ce2d3b6f6c62 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -530,6 +530,21 @@ static void __blk_drain_queue(struct request_queue *q, bool drain_all)
 }
 
 /**
+ * blk_drain_queue - drain requests from request_queue
+ * @q: queue to drain
+ *
+ * Drain requests from @q.  All pending requests are drained.
+ * The caller is responsible for ensuring that no new requests
+ * which need to be drained are queued.
+ */
+void blk_drain_queue(struct request_queue *q)
+{
+   spin_lock_irq(q->queue_lock);
+   __blk_drain_queue(q, true);
+   spin_unlock_irq(q->queue_lock);
+}
+
+/**
  * blk_queue_bypass_start - enter queue bypass mode
  * @q: queue of interest
  *
@@ -653,8 +668,6 @@ void blk_cleanup_queue(struct request_queue *q)
 */
blk_freeze_queue(q);
spin_lock_irq(lock);
-   if (!q->mq_ops)
-   __blk_drain_queue(q, true);
queue_flag_set(QUEUE_FLAG_DEAD, q);
spin_unlock_irq(lock);
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4c532d8612e1..24de78afbe9a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -131,11 +131,13 @@ void blk_freeze_queue_start(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_freeze_queue_start);
 
-void blk_mq_freeze_queue_wait(struct request_queue *q)
+void blk_freeze_queue_wait(struct request_queue *q)
 {
+   if (!q->mq_ops)
+   blk_drain_queue(q);
wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter));
 }
-EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait);
+EXPORT_SYMBOL_GPL(blk_freeze_queue_wait);
 
 int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
 unsigned long timeout)
@@ -160,7 +162,7 @@ void blk_freeze_queue(struct request_queue *q)
 * exported to drivers as the only user for unfreeze is blk_mq.
 */
blk_freeze_queue_start(q);
-   blk_mq_freeze_queue_wait(q);
+   blk_freeze_queue_wait(q);
 }
 EXPORT_SYMBOL_GPL(blk_freeze_queue);
 
diff --git a/block/blk.h b/block/blk.h
index 6847c5435cca..242486e26a81 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -64,6 +64,7 @@ void blk_rq_bio_prep(struct request_queue *q, struct request *rq,
struct bio *bio);
 void blk_queue_bypass_start(struct request_queue *q);
 void blk_queue_bypass_end(struct request_queue *q);
+void blk_drain_queue(struct request_queue *q);
 void blk_dequeue_request(struct request *rq);
 void __blk_queue_free_tags(struct request_queue *q);
 void blk_freeze_queue(struct request_queue *q);
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 986f2b4f9760..d34a9ffaa940 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2778,7 +2778,7 @@ void nvme_wait_freeze(struct nvme_ctrl *ctrl)
 
mutex_lock(&ctrl->namespaces_mutex);
list_for_each_entry(ns, &ctrl->namespaces, list)
-   blk_mq_freeze_queue_wait(ns->queue);
+   blk_freeze_queue_wait(ns->queue);
mutex_unlock(&ctrl->namespaces_mutex);
 }
 EXPORT_SYMBOL_GPL(nvme_wait_freeze);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 8ae77e088c01..f90d78eb85df 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -259,7 +259,7 @@ void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
 void blk_freeze_queue(struct request_queue *q);
 void blk_unfreeze_queue(struct request_queue *q);
 void blk_freeze_queue_start(struct request_queue *q);
-void blk_mq_freeze_queue_wait(struct request_queue *q);
+void blk_freeze_queue_wait(struct request_queue *q);
 int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
 unsigned long timeout);
 int blk_mq_reinit_tagset(struct blk_mq_tag_set *set,
-- 
2.9.5



[PATCH V2 3/8] blk-mq: only run hw queues for blk-mq

2017-09-01 Thread Ming Lei
This patch just makes that explicit.

Reviewed-by: Johannes Thumshirn 
Signed-off-by: Ming Lei 
---
 block/blk-mq.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 8cf1f7cbef2b..4c532d8612e1 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -125,7 +125,8 @@ void blk_freeze_queue_start(struct request_queue *q)
freeze_depth = atomic_inc_return(&q->mq_freeze_depth);
if (freeze_depth == 1) {
percpu_ref_kill(&q->q_usage_counter);
-   blk_mq_run_hw_queues(q, false);
+   if (q->mq_ops)
+   blk_mq_run_hw_queues(q, false);
}
 }
 EXPORT_SYMBOL_GPL(blk_freeze_queue_start);
-- 
2.9.5



[PATCH V2 2/8] blk-mq: rename blk_mq_freeze_queue as blk_freeze_queue

2017-09-01 Thread Ming Lei
This API will be used by the legacy path too.

Signed-off-by: Ming Lei 
---
 block/bfq-iosched.c  |  2 +-
 block/blk-cgroup.c   |  4 ++--
 block/blk-mq.c   | 17 -
 block/blk-mq.h   |  1 -
 block/elevator.c |  2 +-
 drivers/block/loop.c |  8 
 drivers/block/rbd.c  |  2 +-
 drivers/nvme/host/core.c |  2 +-
 include/linux/blk-mq.h   |  2 +-
 9 files changed, 15 insertions(+), 25 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 509f39998011..ce2b00e897e2 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -4757,7 +4757,7 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
 * The invocation of the next bfq_create_group_hierarchy
 * function is the head of a chain of function calls
 * (bfq_create_group_hierarchy->blkcg_activate_policy->
-* blk_mq_freeze_queue) that may lead to the invocation of the
+* blk_freeze_queue) that may lead to the invocation of the
 * has_work hook function. For this reason,
 * bfq_create_group_hierarchy is invoked only after all
 * scheduler data has been initialized, apart from the fields
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 02e8a47ac77c..87c15f3947d5 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1296,7 +1296,7 @@ int blkcg_activate_policy(struct request_queue *q,
return 0;
 
if (q->mq_ops)
-   blk_mq_freeze_queue(q);
+   blk_freeze_queue(q);
else
blk_queue_bypass_start(q);
 pd_prealloc:
@@ -1363,7 +1363,7 @@ void blkcg_deactivate_policy(struct request_queue *q,
return;
 
if (q->mq_ops)
-   blk_mq_freeze_queue(q);
+   blk_freeze_queue(q);
else
blk_queue_bypass_start(q);
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 82136e83951d..8cf1f7cbef2b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -161,16 +161,7 @@ void blk_freeze_queue(struct request_queue *q)
blk_freeze_queue_start(q);
blk_mq_freeze_queue_wait(q);
 }
-
-void blk_mq_freeze_queue(struct request_queue *q)
-{
-   /*
-* ...just an alias to keep freeze and unfreeze actions balanced
-* in the blk_mq_* namespace
-*/
-   blk_freeze_queue(q);
-}
-EXPORT_SYMBOL_GPL(blk_mq_freeze_queue);
+EXPORT_SYMBOL_GPL(blk_freeze_queue);
 
 void blk_unfreeze_queue(struct request_queue *q)
 {
@@ -2248,7 +2239,7 @@ static void blk_mq_update_tag_set_depth(struct blk_mq_tag_set *set,
lockdep_assert_held(&set->tag_list_lock);
 
list_for_each_entry(q, &set->tag_list, tag_set_list) {
-   blk_mq_freeze_queue(q);
+   blk_freeze_queue(q);
queue_set_hctx_shared(q, shared);
blk_unfreeze_queue(q);
}
@@ -2683,7 +2674,7 @@ static int __blk_mq_update_nr_requests(struct request_queue *q,
if (!set)
return -EINVAL;
 
-   blk_mq_freeze_queue(q);
+   blk_freeze_queue(q);
 
ret = 0;
queue_for_each_hw_ctx(q, hctx, i) {
@@ -2747,7 +2738,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
return;
 
list_for_each_entry(q, &set->tag_list, tag_set_list)
-   blk_mq_freeze_queue(q);
+   blk_freeze_queue(q);
 
set->nr_hw_queues = nr_hw_queues;
blk_mq_update_queue_map(set);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 1b9742eb7399..7ce29ef1e6f3 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -30,7 +30,6 @@ struct blk_mq_ctx {
 } cacheline_aligned_in_smp;
 
 void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
-void blk_mq_freeze_queue(struct request_queue *q);
 void blk_mq_free_queue(struct request_queue *q);
 int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr);
 void blk_mq_wake_waiters(struct request_queue *q);
diff --git a/block/elevator.c b/block/elevator.c
index 371c8165c9e8..1164c8a3720f 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -967,7 +967,7 @@ static int elevator_switch_mq(struct request_queue *q,
 {
int ret;
 
-   blk_mq_freeze_queue(q);
+   blk_freeze_queue(q);
 
if (q->elevator) {
if (q->elevator->registered)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 5c11ea44d470..b2e708b7e1e6 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -211,7 +211,7 @@ static void __loop_update_dio(struct loop_device *lo, bool dio)
 * LO_FLAGS_READ_ONLY, both are set from kernel, and losetup
 * will get updated by ioctl(LOOP_GET_STATUS)
 */
-   blk_mq_freeze_queue(lo->lo_queue);
+   blk_freeze_queue(lo->lo_queue);
lo->use_dio = use_dio;
if (use_dio)
lo->lo_flags |= LO_FLAGS_DIRECT_IO;
@@ -599,7 +599,7 @@ static int loop_switch(struct loop_device *lo, struct 

[PATCH V2 0/8] block/scsi: safe SCSI quiescing

2017-09-01 Thread Ming Lei
Hi,

The current SCSI quiesce isn't safe and can easily trigger I/O deadlock.

Once a SCSI device is put into QUIESCE, no new request except for RQF_PREEMPT
can be dispatched to SCSI successfully, and scsi_device_quiesce() simply
waits for completion of the I/Os already dispatched to the SCSI stack. That
isn't enough at all.

New requests can still be allocated, but none of the allocated
requests can be dispatched successfully, so the request pool can be
used up easily.

Then a request with RQF_PREEMPT can't be allocated, and the system may
hang forever, for example during system suspend or SCSI domain validation.

I/O hangs during both system suspend [1] and SCSI domain validation
have been reported before.

This patchset tries to solve the issue by freezing the block queue during
SCSI quiescing, and by allowing RQF_PREEMPT requests to be allocated
while the queue is frozen.

Both SCSI and SCSI_MQ have this I/O deadlock issue; this patchset fixes
them both by introducing preempt versions of blk_freeze_queue() and
blk_unfreeze_queue().
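
As an aside, a minimal sketch of the intended call sequence around a
quiesce-style transition (hypothetical caller code, not part of the diffs in
this series):

/* Hypothetical sketch of the new preempt freeze/unfreeze pairing. */
void example_quiesce_and_issue(struct request_queue *q)
{
        struct request *rq;

        /* Stop normal allocation and drain everything already queued. */
        blk_freeze_queue_preempt(q);

        /* Only preempt requests may still be allocated while frozen. */
        rq = __blk_get_request(q, REQ_OP_SCSI_OUT, __GFP_RECLAIM,
                               BLK_MQ_REQ_PREEMPT);
        if (!IS_ERR(rq)) {
                /* ... issue the PM / domain-validation command ... */
                blk_put_request(rq);
        }

        /* Allow normal allocation again once the device is resumed. */
        blk_unfreeze_queue_preempt(q);
}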

V2:
- drop the 1st patch in V1 because percpu_ref_is_dying() is
enough as pointed by Tejun

- introduce preempt version of blk_[freeze|unfreeze]_queue

- sync between preempt freeze and normal freeze

- fix warning from percpu-refcount as reported by Oleksandr


[1] https://marc.info/?t=150340250100013=3=2



Ming Lei (8):
  blk-mq: rename blk_mq_unfreeze_queue as blk_unfreeze_queue
  blk-mq: rename blk_mq_freeze_queue as blk_freeze_queue
  blk-mq: only run hw queues for blk-mq
  blk-mq: rename blk_mq_freeze_queue_wait as blk_freeze_queue_wait
  block: tracking request allocation with q_usage_counter
  block: allow to allocate req with REQF_PREEMPT when queue is frozen
  block: introduce preempt version of blk_[freeze|unfreeze]_queue
  SCSI: freeze block queue when SCSI device is put into quiesce

 block/bfq-iosched.c  |   2 +-
 block/blk-cgroup.c   |   8 ++--
 block/blk-core.c |  50 
 block/blk-mq.c   | 119 ---
 block/blk-mq.h   |   1 -
 block/blk.h  |   6 +++
 block/elevator.c |   4 +-
 drivers/block/loop.c |  16 +++
 drivers/block/rbd.c  |   2 +-
 drivers/nvme/host/core.c |   8 ++--
 drivers/scsi/scsi_lib.c  |  21 -
 include/linux/blk-mq.h   |  15 +++---
 include/linux/blkdev.h   |  20 +++-
 13 files changed, 206 insertions(+), 66 deletions(-)

-- 
2.9.5



[PATCH V2 1/8] blk-mq: rename blk_mq_unfreeze_queue as blk_unfreeze_queue

2017-09-01 Thread Ming Lei
We will support freezing the queue on the block legacy path too.

Signed-off-by: Ming Lei 
---
 block/blk-cgroup.c   |  4 ++--
 block/blk-mq.c   | 10 +-
 block/elevator.c |  2 +-
 drivers/block/loop.c |  8 
 drivers/nvme/host/core.c |  4 ++--
 include/linux/blk-mq.h   |  2 +-
 6 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 0480892e97e5..02e8a47ac77c 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1337,7 +1337,7 @@ int blkcg_activate_policy(struct request_queue *q,
spin_unlock_irq(q->queue_lock);
 out_bypass_end:
if (q->mq_ops)
-   blk_mq_unfreeze_queue(q);
+   blk_unfreeze_queue(q);
else
blk_queue_bypass_end(q);
if (pd_prealloc)
@@ -1388,7 +1388,7 @@ void blkcg_deactivate_policy(struct request_queue *q,
spin_unlock_irq(q->queue_lock);
 
if (q->mq_ops)
-   blk_mq_unfreeze_queue(q);
+   blk_unfreeze_queue(q);
else
blk_queue_bypass_end(q);
 }
diff --git a/block/blk-mq.c b/block/blk-mq.c
index d935f15c54da..82136e83951d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -172,7 +172,7 @@ void blk_mq_freeze_queue(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_mq_freeze_queue);
 
-void blk_mq_unfreeze_queue(struct request_queue *q)
+void blk_unfreeze_queue(struct request_queue *q)
 {
int freeze_depth;
 
@@ -183,7 +183,7 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
wake_up_all(&q->mq_freeze_wq);
}
 }
-EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
+EXPORT_SYMBOL_GPL(blk_unfreeze_queue);
 
 /*
  * FIXME: replace the scsi_internal_device_*block_nowait() calls in the
@@ -2250,7 +2250,7 @@ static void blk_mq_update_tag_set_depth(struct blk_mq_tag_set *set,
list_for_each_entry(q, &set->tag_list, tag_set_list) {
blk_mq_freeze_queue(q);
queue_set_hctx_shared(q, shared);
-   blk_mq_unfreeze_queue(q);
+   blk_unfreeze_queue(q);
}
 }
 
@@ -2708,7 +2708,7 @@ static int __blk_mq_update_nr_requests(struct request_queue *q,
if (!ret)
q->nr_requests = nr;
 
-   blk_mq_unfreeze_queue(q);
+   blk_unfreeze_queue(q);
 
return ret;
 }
@@ -2757,7 +2757,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
}
 
list_for_each_entry(q, &set->tag_list, tag_set_list)
-   blk_mq_unfreeze_queue(q);
+   blk_unfreeze_queue(q);
 }
 
 void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues)
diff --git a/block/elevator.c b/block/elevator.c
index 0e465809d3f3..371c8165c9e8 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -994,7 +994,7 @@ static int elevator_switch_mq(struct request_queue *q,
blk_add_trace_msg(q, "elv switch: none");
 
 out:
-   blk_mq_unfreeze_queue(q);
+   blk_unfreeze_queue(q);
return ret;
 }
 
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 2fbd4089c20e..5c11ea44d470 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -217,7 +217,7 @@ static void __loop_update_dio(struct loop_device *lo, bool dio)
lo->lo_flags |= LO_FLAGS_DIRECT_IO;
else
lo->lo_flags &= ~LO_FLAGS_DIRECT_IO;
-   blk_mq_unfreeze_queue(lo->lo_queue);
+   blk_unfreeze_queue(lo->lo_queue);
 }
 
 static int
@@ -605,7 +605,7 @@ static int loop_switch(struct loop_device *lo, struct file *file)
do_loop_switch(lo, &w);
 
/* unfreeze */
-   blk_mq_unfreeze_queue(lo->lo_queue);
+   blk_unfreeze_queue(lo->lo_queue);
 
return 0;
 }
@@ -1079,7 +1079,7 @@ static int loop_clr_fd(struct loop_device *lo)
lo->lo_state = Lo_unbound;
/* This is safe: open() is still holding a reference. */
module_put(THIS_MODULE);
-   blk_mq_unfreeze_queue(lo->lo_queue);
+   blk_unfreeze_queue(lo->lo_queue);
 
if (lo->lo_flags & LO_FLAGS_PARTSCAN && bdev)
loop_reread_partitions(lo, bdev);
@@ -1191,7 +1191,7 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
__loop_update_dio(lo, lo->use_dio);
 
  exit:
-   blk_mq_unfreeze_queue(lo->lo_queue);
+   blk_unfreeze_queue(lo->lo_queue);
 
if (!err && (info->lo_flags & LO_FLAGS_PARTSCAN) &&
 !(lo->lo_flags & LO_FLAGS_PARTSCAN)) {
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 37046ac2c441..5c76b0a96be2 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1226,7 +1226,7 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
 
if (ctrl->oncs & NVME_CTRL_ONCS_DSM)
nvme_config_discard(ns);
-   blk_mq_unfreeze_queue(disk->queue);
+   blk_unfreeze_queue(disk->queue);
 }
 
 static int nvme_revalidate_disk(struct gendisk 


Re: [PATCH V3 06/14] blk-mq-sched: don't dequeue request until all in ->dispatch are flushed

2017-09-01 Thread Bart Van Assche
On Fri, 2017-09-01 at 11:02 +0800, Ming Lei wrote:
> That is a good question, but it has been answered by the comment
> before checking the DISPATCH_BUSY state in blk_mq_sched_dispatch_requests():
> 
>   /*
>* If DISPATCH_BUSY is set, that means hw queue is busy
>* and requests in the list of hctx->dispatch need to
>* be flushed first, so return early.
>*
>* Wherever DISPATCH_BUSY is set, blk_mq_run_hw_queue()
>* will be run to try to make progress, so it is always
>* safe to check the state here.
>*/
> 
> Suppose the two writes are reordered, sooner or later, the
> list_empty_careful(>dispatch) will be observed, and
> will dispatch request from this hw queue after '->dispatch'
> is flushed.
> 
> Since now the setting of DISPATCH_BUSY requires to hold
> hctx->lock, we should avoid to add barrier there.

Although it is not clear to me how anyone who has not followed this discussion
is supposed to figure out all these subtleties, if the other comments get
addressed:

Reviewed-by: Bart Van Assche 



[PATCH V2 1/2] block/loop: fix use after free

2017-09-01 Thread Shaohua Li
From: Shaohua Li 

lo_rw_aio->call_read_iter->
1   aops->direct_IO
2   iov_iter_revert
lo_rw_aio_complete could happen between 1 and 2; the bio and bvec could
be freed before 2, which accesses the bvec.

Signed-off-by: Shaohua Li 
---
 drivers/block/loop.c | 16 +---
 drivers/block/loop.h |  5 -
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 3a35121..78c47c4 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -463,14 +463,21 @@ static void lo_complete_rq(struct request *rq)
blk_mq_end_request(rq, cmd->ret < 0 ? BLK_STS_IOERR : BLK_STS_OK);
 }
 
+static void lo_rw_aio_do_completion(struct loop_cmd *cmd)
+{
+   if (!atomic_dec_and_test(&cmd->ref))
+   return;
+   kfree(cmd->bvec);
+   cmd->bvec = NULL;
+   blk_mq_complete_request(cmd->rq);
+}
+
 static void lo_rw_aio_complete(struct kiocb *iocb, long ret, long ret2)
 {
struct loop_cmd *cmd = container_of(iocb, struct loop_cmd, iocb);
 
-   kfree(cmd->bvec);
-   cmd->bvec = NULL;
cmd->ret = ret;
-   blk_mq_complete_request(cmd->rq);
+   lo_rw_aio_do_completion(cmd);
 }
 
 static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
@@ -518,6 +525,7 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
bvec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
segments = bio_segments(bio);
}
+   atomic_set(&cmd->ref, 2);
 
iov_iter_bvec(&iter, ITER_BVEC | rw, bvec,
  segments, blk_rq_bytes(rq));
@@ -533,6 +541,8 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
else
ret = call_read_iter(file, &cmd->iocb, &iter);
 
+   lo_rw_aio_do_completion(cmd);
+
if (ret != -EIOCBQUEUED)
cmd->iocb.ki_complete(&cmd->iocb, ret, 0);
return 0;
diff --git a/drivers/block/loop.h b/drivers/block/loop.h
index 43d20d3..b0ba4a5 100644
--- a/drivers/block/loop.h
+++ b/drivers/block/loop.h
@@ -68,7 +68,10 @@ struct loop_cmd {
struct kthread_work work;
struct request *rq;
struct list_head list;
-   bool use_aio;   /* use AIO interface to handle I/O */
+   union {
+   bool use_aio; /* use AIO interface to handle I/O */
+   atomic_t ref; /* only for aio */
+   };
long ret;
struct kiocb iocb;
struct bio_vec *bvec;
-- 
2.9.5



[PATCH V2 2/2] block/loop: remove unused field

2017-09-01 Thread Shaohua Li
From: Shaohua Li 

nobody uses the list.

Signed-off-by: Shaohua Li 
---
 drivers/block/loop.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/block/loop.h b/drivers/block/loop.h
index b0ba4a5..f68c1d5 100644
--- a/drivers/block/loop.h
+++ b/drivers/block/loop.h
@@ -67,7 +67,6 @@ struct loop_device {
 struct loop_cmd {
struct kthread_work work;
struct request *rq;
-   struct list_head list;
union {
bool use_aio; /* use AIO interface to handle I/O */
atomic_t ref; /* only for aio */
-- 
2.9.5



Re: [PATCH 3/5] bfq: Check kstrtoul() return value

2017-09-01 Thread weiping zhang
> Sorry but I do not like this proposal because:
> * If invalid input is provided writing into a sysfs attribute should fail
>   instead of ignoring the invalid input silently.
> * simple_strtoul() is considered obsolete and must not be used in new code.
>   From include/linux/kernel.h:
>
> /* Obsolete, do not use.  Use kstrto instead */
>
> extern unsigned long simple_strtoul(const char *,char **,unsigned int);
> extern long simple_strtol(const char *,char **,unsigned int);
> extern unsigned long long simple_strtoull(const char *,char **,unsigned int);
> extern long long simple_strtoll(const char *,char **,unsigned int);
>

Hello Bart,

Agreed, it seems more reasonable to give an error message to the user.

Thanks


Re: [PATCH 3/5] bfq: Check kstrtoul() return value

2017-09-01 Thread Bart Van Assche
On Sat, 2017-09-02 at 01:37 +0800, weiping zhang wrote:
> 2017-09-02 1:14 GMT+08:00 Paolo Valente :
> > Il giorno 30 ago 2017, alle ore 20:42, Bart Van Assche 
> >  ha scritto:
> > > 
> > > Make sysfs writes fail for invalid numbers instead of storing
> > > uninitialized data copied from the stack. This patch removes
> > > all uninitialized_var() occurrences from the BFQ source code.
> > > 
> > > Signed-off-by: Bart Van Assche 
> > > Cc: Paolo Valente 
> > 
> > Acked-by: Paolo Valente 
> 
> how about using simple_strtoul  which was used in cfq/mq-iosched.c
> *var = simple_strtoul(p, &p, 10);
> 
> if invalid string came from sysfs, this function just return 0,
> and there are validations after every calling bfq_var_store.

Hello Weiping,

Sorry but I do not like this proposal because:
* If invalid input is provided writing into a sysfs attribute should fail
  instead of ignoring the invalid input silently.
* simple_strtoul() is considered obsolete and must not be used in new code.
  From include/linux/kernel.h:

/* Obsolete, do not use.  Use kstrto instead */

extern unsigned long simple_strtoul(const char *,char **,unsigned int);
extern long simple_strtol(const char *,char **,unsigned int);
extern unsigned long long simple_strtoull(const char *,char **,unsigned int);
extern long long simple_strtoll(const char *,char **,unsigned int);
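
As a hedged illustration (not taken from the patch itself): with kstrtoul()
the sysfs store callback can propagate the parse error back to the writer,
while simple_strtoul() silently turns garbage input into 0:

/* Sketch only: rejecting invalid sysfs input with kstrtoul(). */
static ssize_t example_store(struct elevator_queue *e, const char *page,
                             size_t count)
{
        unsigned long val;
        int ret = kstrtoul(page, 10, &val);

        if (ret)
                return ret;     /* e.g. -EINVAL, so the write(2) fails */
        /* ... clamp/range-check val and store it ... */
        return count;
}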

Bart.

Re: [PATCH 3/5] bfq: Check kstrtoul() return value

2017-09-01 Thread weiping zhang
2017-09-02 1:14 GMT+08:00 Paolo Valente :
>
>> Il giorno 30 ago 2017, alle ore 20:42, Bart Van Assche 
>>  ha scritto:
>>
>> Make sysfs writes fail for invalid numbers instead of storing
>> uninitialized data copied from the stack. This patch removes
>> all uninitialized_var() occurrences from the BFQ source code.
>>
>> Signed-off-by: Bart Van Assche 
>> Cc: Paolo Valente 
>
> Acked-by: Paolo Valente 
>

Hi Bart,

how about using simple_strtoul(), which was used in cfq/mq-iosched.c:
*var = simple_strtoul(p, &p, 10);

If an invalid string comes from sysfs, this function just returns 0,
and there are validations after every call to bfq_var_store().

Thanks


Re: [PATCH 8/8] scsi: Introduce ZBC disk I/O scheduler

2017-09-01 Thread Bart Van Assche
On Fri, 2017-09-01 at 20:36 +0900, Damien Le Moal wrote:
> +static struct scsi_disk *__zoned_scsi_disk(struct request_queue *q)
> +{
> +   struct scsi_device *sdp;
> +   struct scsi_disk *sdkp;
> +
> +   if (!blk_queue_is_zoned(q)) {
> +   pr_err("zoned: Not a zoned block device\n");
> +   return NULL;
> +   }
> +
> +   sdp = scsi_device_from_queue(q);
> +   if (!sdp) {
> +   pr_err("zoned: Not a SCSI device\n");
> +   return NULL;
> +   }
> +
sdkp = dev_get_drvdata(&sdp->sdev_gendev);
> +   if (WARN_ON(sdkp->disk->queue != q))
> +   return NULL;
> +
> +   return sdkp;
> +}

Hello Damien,

Can reading sdkp->disk->queue cause a kernel crash if sdp does not point at
a SCSI device that is associated with a SCSI disk? How about using something
like the code below to convert a request queue pointer into a SCSI disk
pointer?

static int lookup_disk(struct device *dev, void *data)
{
struct scsi_disk **sdkp = data;

if (!*sdkp && dev->class == &sd_disk_class)
*sdkp = to_scsi_disk(dev);

return 0;
}

static struct scsi_disk *q_to_sdkp(struct request_queue *q)
{
struct scsi_device *sdp = scsi_device_from_queue(q);
struct scsi_disk *sdkp = NULL;

if (sdp)
device_for_each_child(&sdp->sdev_gendev, &sdkp, lookup_disk);
return sdkp;
}

Thanks,

Bart.

Re: [PATCH 4/5] bfq: Suppress compiler warnings about comparisons

2017-09-01 Thread Paolo Valente

> Il giorno 30 ago 2017, alle ore 20:42, Bart Van Assche 
>  ha scritto:
> 
> This patch avoids that the following warnings are reported when
> building with W=1:
> 
> block/bfq-iosched.c: In function 'bfq_back_seek_max_store':
> block/bfq-iosched.c:4860:13: warning: comparison of unsigned expression < 0 
> is always false [-Wtype-limits]
>  if (__data < (MIN))  \
> ^
> block/bfq-iosched.c:4876:1: note: in expansion of macro 'STORE_FUNCTION'
> STORE_FUNCTION(bfq_back_seek_max_store, >bfq_back_max, 0, INT_MAX, 0);
> ^~
> block/bfq-iosched.c: In function 'bfq_slice_idle_store':
> block/bfq-iosched.c:4860:13: warning: comparison of unsigned expression < 0 
> is always false [-Wtype-limits]
>  if (__data < (MIN))  \
> ^
> block/bfq-iosched.c:4879:1: note: in expansion of macro 'STORE_FUNCTION'
> STORE_FUNCTION(bfq_slice_idle_store, >bfq_slice_idle, 0, INT_MAX, 2);
> ^~
> block/bfq-iosched.c: In function 'bfq_slice_idle_us_store':
> block/bfq-iosched.c:4892:13: warning: comparison of unsigned expression < 0 
> is always false [-Wtype-limits]
>  if (__data < (MIN))  \
> ^
> block/bfq-iosched.c:4899:1: note: in expansion of macro 'USEC_STORE_FUNCTION'
> USEC_STORE_FUNCTION(bfq_slice_idle_us_store, >bfq_slice_idle, 0,
> ^~~
> 
> Signed-off-by: Bart Van Assche 
> Cc: Paolo Valente 

Acked-by: Paolo Valente 

> ---
> block/bfq-iosched.c | 20 ++--
> 1 file changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index cf92f16eb5f2..d375b29ad085 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -4851,16 +4851,16 @@ static ssize_t
> \
> __FUNC(struct elevator_queue *e, const char *page, size_t count)  \
> { \
>   struct bfq_data *bfqd = e->elevator_data;   \
> - unsigned long __data;   \
> + unsigned long __data, __min = (MIN), __max = (MAX); \
>   int ret;\
>   \
>   ret = bfq_var_store(&__data, (page));   \
>   if (ret)\
>   return ret; \
> - if (__data < (MIN)) \
> - __data = (MIN); \
> - else if (__data > (MAX))\
> - __data = (MAX); \
> + if (__data < __min) \
> + __data = __min; \
> + else if (__data > __max)\
> + __data = __max; \
>   if (__CONV == 1)\
>   *(__PTR) = msecs_to_jiffies(__data);\
>   else if (__CONV == 2)   \
> @@ -4883,16 +4883,16 @@ STORE_FUNCTION(bfq_slice_idle_store, 
> >bfq_slice_idle, 0, INT_MAX, 2);
> static ssize_t __FUNC(struct elevator_queue *e, const char *page, size_t 
> count)\
> { \
>   struct bfq_data *bfqd = e->elevator_data;   \
> - unsigned long __data;   \
> + unsigned long __data, __min = (MIN), __max = (MAX); \
>   int ret;\
>   \
>   ret = bfq_var_store(&__data, (page));   \
>   if (ret)\
>   return ret; \
> - if (__data < (MIN)) \
> - __data = (MIN); \
> - else if (__data > (MAX))\
> - __data = (MAX); \
> + if (__data < __min) \
> + __data = __min; \
> + else if (__data > __max)\
> + __data = __max; \
>   *(__PTR) = (u64)__data * NSEC_PER_USEC; \
>   return count;

Re: [PATCH 5/5] bfq: Use icq_to_bic() consistently

2017-09-01 Thread Paolo Valente

> Il giorno 30 ago 2017, alle ore 20:42, Bart Van Assche 
>  ha scritto:
> 
> Some code uses icq_to_bic() to convert an io_cq pointer to a
> bfq_io_cq pointer while other code uses a direct cast. Convert
> the code that uses a direct cast such that it uses icq_to_bic().
> 
> Signed-off-by: Bart Van Assche 
> Cc: Paolo Valente 

Acked-by: Paolo Valente 

> ---
> block/bfq-iosched.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index d375b29ad085..7d1b85e393b2 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -239,7 +239,7 @@ static int T_slow[2];
> static int T_fast[2];
> static int device_speed_thresh[2];
> 
> -#define RQ_BIC(rq)   ((struct bfq_io_cq *) (rq)->elv.priv[0])
> +#define RQ_BIC(rq)   icq_to_bic((rq)->elv.priv[0])
> #define RQ_BFQQ(rq)   ((rq)->elv.priv[1])
> 
> struct bfq_queue *bic_to_bfqq(struct bfq_io_cq *bic, bool is_sync)
> -- 
> 2.14.1
> 



Re: [PATCH 2/5] bfq: Declare local functions static

2017-09-01 Thread Paolo Valente

> Il giorno 30 ago 2017, alle ore 20:42, Bart Van Assche 
>  ha scritto:
> 
> Signed-off-by: Bart Van Assche 
> Cc: Paolo Valente 


Acked-by: Paolo Valente 

> ---
> block/bfq-cgroup.c | 18 +-
> 1 file changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
> index 78b2e0db4fb2..ceefb9a706d6 100644
> --- a/block/bfq-cgroup.c
> +++ b/block/bfq-cgroup.c
> @@ -206,7 +206,7 @@ static void bfqg_get(struct bfq_group *bfqg)
>   bfqg->ref++;
> }
> 
> -void bfqg_put(struct bfq_group *bfqg)
> +static void bfqg_put(struct bfq_group *bfqg)
> {
>   bfqg->ref--;
> 
> @@ -385,7 +385,7 @@ static struct bfq_group_data *blkcg_to_bfqgd(struct blkcg 
> *blkcg)
>   return cpd_to_bfqgd(blkcg_to_cpd(blkcg, &blkcg_policy_bfq));
> }
> 
> -struct blkcg_policy_data *bfq_cpd_alloc(gfp_t gfp)
> +static struct blkcg_policy_data *bfq_cpd_alloc(gfp_t gfp)
> {
>   struct bfq_group_data *bgd;
> 
> @@ -395,7 +395,7 @@ struct blkcg_policy_data *bfq_cpd_alloc(gfp_t gfp)
>   return &bgd->pd;
> }
> 
> -void bfq_cpd_init(struct blkcg_policy_data *cpd)
> +static void bfq_cpd_init(struct blkcg_policy_data *cpd)
> {
>   struct bfq_group_data *d = cpd_to_bfqgd(cpd);
> 
> @@ -403,12 +403,12 @@ void bfq_cpd_init(struct blkcg_policy_data *cpd)
>   CGROUP_WEIGHT_DFL : BFQ_WEIGHT_LEGACY_DFL;
> }
> 
> -void bfq_cpd_free(struct blkcg_policy_data *cpd)
> +static void bfq_cpd_free(struct blkcg_policy_data *cpd)
> {
>   kfree(cpd_to_bfqgd(cpd));
> }
> 
> -struct blkg_policy_data *bfq_pd_alloc(gfp_t gfp, int node)
> +static struct blkg_policy_data *bfq_pd_alloc(gfp_t gfp, int node)
> {
>   struct bfq_group *bfqg;
> 
> @@ -426,7 +426,7 @@ struct blkg_policy_data *bfq_pd_alloc(gfp_t gfp, int node)
>   return &bfqg->pd;
> }
> 
> -void bfq_pd_init(struct blkg_policy_data *pd)
> +static void bfq_pd_init(struct blkg_policy_data *pd)
> {
>   struct blkcg_gq *blkg = pd_to_blkg(pd);
>   struct bfq_group *bfqg = blkg_to_bfqg(blkg);
> @@ -445,7 +445,7 @@ void bfq_pd_init(struct blkg_policy_data *pd)
>   bfqg->rq_pos_tree = RB_ROOT;
> }
> 
> -void bfq_pd_free(struct blkg_policy_data *pd)
> +static void bfq_pd_free(struct blkg_policy_data *pd)
> {
>   struct bfq_group *bfqg = pd_to_bfqg(pd);
> 
> @@ -453,7 +453,7 @@ void bfq_pd_free(struct blkg_policy_data *pd)
>   bfqg_put(bfqg);
> }
> 
> -void bfq_pd_reset_stats(struct blkg_policy_data *pd)
> +static void bfq_pd_reset_stats(struct blkg_policy_data *pd)
> {
>   struct bfq_group *bfqg = pd_to_bfqg(pd);
> 
> @@ -740,7 +740,7 @@ static void bfq_reparent_active_entities(struct bfq_data 
> *bfqd,
>  * blkio already grabs the queue_lock for us, so no need to use
>  * RCU-based magic
>  */
> -void bfq_pd_offline(struct blkg_policy_data *pd)
> +static void bfq_pd_offline(struct blkg_policy_data *pd)
> {
>   struct bfq_service_tree *st;
>   struct bfq_group *bfqg = pd_to_bfqg(pd);
> -- 
> 2.14.1
> 



Re: [PATCH 3/5] bfq: Check kstrtoul() return value

2017-09-01 Thread Paolo Valente

> Il giorno 30 ago 2017, alle ore 20:42, Bart Van Assche 
>  ha scritto:
> 
> Make sysfs writes fail for invalid numbers instead of storing
> uninitialized data copied from the stack. This patch removes
> all uninitialized_var() occurrences from the BFQ source code.
> 
> Signed-off-by: Bart Van Assche 
> Cc: Paolo Valente 

Acked-by: Paolo Valente 

> ---
> block/bfq-iosched.c | 52 +---
> 1 file changed, 37 insertions(+), 15 deletions(-)
> 
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index 8c11c2e827a5..cf92f16eb5f2 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -4802,13 +4802,15 @@ static ssize_t bfq_var_show(unsigned int var, char 
> *page)
>   return sprintf(page, "%u\n", var);
> }
> 
> -static void bfq_var_store(unsigned long *var, const char *page)
> +static int bfq_var_store(unsigned long *var, const char *page)
> {
>   unsigned long new_val;
>   int ret = kstrtoul(page, 10, &new_val);
> 
> - if (ret == 0)
> - *var = new_val;
> + if (ret)
> + return ret;
> + *var = new_val;
> + return 0;
> }
> 
> #define SHOW_FUNCTION(__FUNC, __VAR, __CONV)  \
> @@ -4849,8 +4851,12 @@ static ssize_t 
> \
> __FUNC(struct elevator_queue *e, const char *page, size_t count)  \
> { \
>   struct bfq_data *bfqd = e->elevator_data;   \
> - unsigned long uninitialized_var(__data);\
> - bfq_var_store(&__data, (page)); \
> + unsigned long __data;   \
> + int ret;\
> + \
> + ret = bfq_var_store(&__data, (page));   \
> + if (ret)\
> + return ret; \
>   if (__data < (MIN)) \
>   __data = (MIN); \
>   else if (__data > (MAX))\
> @@ -4877,8 +4883,12 @@ STORE_FUNCTION(bfq_slice_idle_store, 
> &bfqd->bfq_slice_idle, 0, INT_MAX, 2);
> static ssize_t __FUNC(struct elevator_queue *e, const char *page, size_t 
> count)\
> { \
>   struct bfq_data *bfqd = e->elevator_data;   \
> - unsigned long uninitialized_var(__data);\
> - bfq_var_store(&__data, (page)); \
> + unsigned long __data;   \
> + int ret;\
> + \
> + ret = bfq_var_store(&__data, (page));   \
> + if (ret)\
> + return ret; \
>   if (__data < (MIN)) \
>   __data = (MIN); \
>   else if (__data > (MAX))\
> @@ -4894,9 +4904,12 @@ static ssize_t bfq_max_budget_store(struct 
> elevator_queue *e,
>   const char *page, size_t count)
> {
>   struct bfq_data *bfqd = e->elevator_data;
> - unsigned long uninitialized_var(__data);
> + unsigned long __data;
> + int ret;
> 
> - bfq_var_store(&__data, (page));
> + ret = bfq_var_store(&__data, (page));
> + if (ret)
> + return ret;
> 
>   if (__data == 0)
>   bfqd->bfq_max_budget = bfq_calc_max_budget(bfqd);
> @@ -4919,9 +4932,12 @@ static ssize_t bfq_timeout_sync_store(struct 
> elevator_queue *e,
> const char *page, size_t count)
> {
>   struct bfq_data *bfqd = e->elevator_data;
> - unsigned long uninitialized_var(__data);
> + unsigned long __data;
> + int ret;
> 
> - bfq_var_store(&__data, (page));
> + ret = bfq_var_store(&__data, (page));
> + if (ret)
> + return ret;
> 
>   if (__data < 1)
>   __data = 1;
> @@ -4939,9 +4955,12 @@ static ssize_t bfq_strict_guarantees_store(struct 
> elevator_queue *e,
>const char *page, size_t count)
> {
>   struct bfq_data *bfqd = e->elevator_data;
> - unsigned long uninitialized_var(__data);
> + unsigned long __data;
> + int ret;
> 
> - bfq_var_store(&__data, 

Re: [PATCH 1/5] bfq: Annotate fall-through in a switch statement

2017-09-01 Thread Paolo Valente

> On 30 Aug 2017, at 20:42, Bart Van Assche 
>  wrote:
> 
> This patch avoids that gcc 7 issues a warning about fall-through
> when building with W=1.
> 
> Signed-off-by: Bart Van Assche 
> Cc: Paolo Valente 

Acked-by: Paolo Valente 

> ---
> block/bfq-iosched.c | 1 +
> 1 file changed, 1 insertion(+)
> 
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index 6a7a26b6cec1..8c11c2e827a5 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -3780,6 +3780,7 @@ bfq_set_next_ioprio_data(struct bfq_queue *bfqq, struct 
> bfq_io_cq *bic)
>   default:
>   dev_err(bfqq->bfqd->queue->backing_dev_info->dev,
>   "bfq: bad prio class %d\n", ioprio_class);
> + /* fall through */
>   case IOPRIO_CLASS_NONE:
>   /*
>* No prio set, inherit CPU scheduling settings.
> -- 
> 2.14.1
> 



Re: [PATCH 8/9] block: allow to allocate req with REQF_PREEMPT when queue is frozen

2017-09-01 Thread Ming Lei
On Fri, Sep 01, 2017 at 03:43:39PM +, Bart Van Assche wrote:
> On Fri, 2017-09-01 at 11:55 +0800, Ming Lei wrote:
> > On Thu, Aug 31, 2017 at 10:50:25PM +, Bart Van Assche wrote:
> > > On Fri, 2017-09-01 at 01:27 +0800, Ming Lei wrote:
> > > > @@ -1413,9 +1414,17 @@ static struct request 
> > > > *blk_old_get_request(struct request_queue *q,
> > > > /* create ioc upfront */
> > > > create_io_context(gfp_mask, q->node);
> > > >  
> > > > -   ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));
> > > > +   /*
> > > > +* When queue is frozen, we still need to allocate req for
> > > > +* REQF_PREEMPT.
> > > > +*/
> > > > +   if ((flags & BLK_MQ_REQ_PREEMPT) && blk_queue_is_frozen(q))
> > > > +   blk_queue_enter_live(q);
> > > > +   else
> > > > +   ret = blk_queue_enter(q, !(gfp_mask & 
> > > > __GFP_DIRECT_RECLAIM));
> > > 
> > > Sorry but I doubt that calling blk_queue_enter_live() from inside
> > > blk_old_get_request() is safe. Calling blk_queue_enter_live() is only safe
> > > before a request queue has been marked dead. What prevents a kernel thread
> > > that holds a reference on a request queue and that is running concurrently
> > > with blk_cleanup_queue() to call blk_old_get_request() after a queue has
> > > been marked dead?
> > 
> > I have sent one delta patch in list, which will only call
> > blk_queue_enter_live() iff SCSI device is in QUIESCE.
> 
> That wouldn't make this hack less ugly.
> 
> It is possible to trigger the running -> quiesce state transition through
> /sys/class/scsi_device/*/device/state and the quiesce -> removed transition
> through /sys/class/scsi_device/*/device/delete. An example:
> 
> modprobe scsi_debug delay=0 dev_size_mb=16
> lsscsi | grep scsi_debug
> cd /sys/class/scsi_device/8:0:0:0/device
> echo quiesce > state
> echo 1 > delete
> 
> So the code from your patch 8/9 can race against device removal.
> 
> I think we need a better approach than the REQ_PREEMPT hack. How about
> implementing resume as a callback by adding an additional function pointer
> to struct request_queue / struct blk_mq_ops instead of implementing it as
> a request? For SCSI devices races of resume against removal can e.g. be
> handled by checking the scsi_device_set_state() return value. That function
> namely does not allow the removing/removed to running state transition.

If there is a race between resume and removal, that has nothing to do with
this patchset.

We definitely need to prevent new requests from being allocated after the
SCSI device is put into quiesce. I don't see a better way than freezing the
queue, because it is the only available way to prevent new request
allocation.
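
To illustrate the ordering being argued for here, a rough sketch (assuming
freeze/unfreeze helpers that cover both the legacy and blk-mq paths, as this
series proposes; this is not the actual patch):

	blk_freeze_queue(sdev->request_queue);	/* drain, block new allocations */
	scsi_device_quiesce(sdev);		/* only RQF_PREEMPT is let through now */
	/* ... issue SYNCHRONIZE CACHE / START STOP etc. ... */
	scsi_device_resume(sdev);
	blk_unfreeze_queue(sdev->request_queue);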

Actually there is a race between normal freezing and the preempt freezing
in this patchset, and it will be fixed in V2.


-- 
Ming


Re: [PATCH 7/8] scsi: sd_zbc: Disable zone write locking with scsi-mq

2017-09-01 Thread Bart Van Assche
On Fri, 2017-09-01 at 20:36 +0900, Damien Le Moal wrote:
> In the case of a ZBC disk used with scsi-mq, zone write locking does
> not prevent write reordering in sequential zones. Unlike the legacy
> case, zone locking can only be done after the command request is
> removed from the scheduler dispatch queue. That is, at the time of
> zone locking, the write command may already be out of order.
> 
> Disable zone write locking in sd_zbc_write_lock_zone() if the disk is
> used with scsi-mq. Write order guarantees can be provided by an
> adapted I/O scheduler.

Reviewed-by: Bart Van Assche 



Re: [PATCH 6/8] scsi: sd_zbc: Limit zone write locking to sequential zones

2017-09-01 Thread Bart Van Assche
On Fri, 2017-09-01 at 20:36 +0900, Damien Le Moal wrote:
> Zoned block devices have no write constraints for conventional zones
> so write locking of conventional zones is not necessary and can hurt
> performance. To avoid this, introduce the seq_zones bitmap to indicate
> if a zone is a sequential one. Use this information to allow any write
> to be issued to conventional zones, locking only sequential zones.
> 
> As the zone bitmap allocation for seq_zones is identical to the zones
> write lock bitmap, introduce the helper sd_zbc_alloc_zone_bitmap().
> Using this helper, wait for the disk capacity and number of zones to
> stabilize on the second revalidation pass to allocate and initialize
> the bitmaps.

Reviewed-by: Bart Van Assche 

Re: [PATCH 5/8] scsi: sd_zbc: Fix sd_zbc_read_zoned_characteristics()

2017-09-01 Thread Bart Van Assche
On Fri, 2017-09-01 at 20:36 +0900, Damien Le Moal wrote:
> The three values starting at byte 8 of the Zoned Block Device
> Characteristics VPD page B6h are 32 bits values, not 64bits. So use
> get_unaligned_be32() to retrieve the values and not get_unaligned_be64()

Reviewed-by: Bart Van Assche 

Re: [PATCH 4/8] scsi: sd_zbc: Reorganize and cleanup

2017-09-01 Thread Bart Van Assche
On Fri, 2017-09-01 at 20:36 +0900, Damien Le Moal wrote:
> Introduce sd_zbc.h for zbc related declarations (avoid cluttering sd.h).
> 
> Fix comments style in sd_zbc.c (do not use documentation format) and
> add/fix comments to enhance explanations. Also remove a useless blank
> line in sd_zbc_read_zones().
> 
> Rearrange sd_zbc_setup() to include use_16_for_rw and use_10_for_rw
> assignment and simplify nr_zones calculation.
> 
> Finally, use the min() macro in sd_zbc_check_zone_size() to get the report
> zone reply size instead of the open-coded version.

Hello Damien,

Should this patch perhaps be split into multiple patches to make review
easier?

> +static inline void sd_zbc_remove(struct scsi_disk *sdkp) {}

Please use the same coding style as in other Linux kernel header files,
namely '{' and '}' both at the start of a line. See e.g. include/linux/sysfs.h.
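
I.e. something like:

	static inline void sd_zbc_remove(struct scsi_disk *sdkp)
	{
	}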

Thanks,

Bart.

Re: [PATCH 3/8] scsi: sd_zbc: Move ZBC declarations to scsi_proto.h

2017-09-01 Thread Bart Van Assche
On Fri, 2017-09-01 at 20:36 +0900, Damien Le Moal wrote:
> +/* Zone types of REPORT ZONES zone descriptors */
> +enum zbc_zone_type {
> + ZBC_ZONE_TYPE_CONV  = 0x1,
> + ZBC_ZONE_TYPE_SEQWRITE_REQ  = 0x2,
> + ZBC_ZONE_TYPE_SEQWRITE_PREF = 0x3,
> + /* 0x4 to 0xf are reserved */
> + ZBC_ZONE_TYPE_RESERVED  = 0x4,
> +};

Hello Damien,

Since no code is using ZBC_ZONE_TYPE_RESERVED, is the comment about reserved
values sufficient and can the definition of ZBC_ZONE_TYPE_RESERVED be left out?

Anyway:

Reviewed-by: Bart Van Assche 



Re: [PATCH 1/9] percpu-refcount: introduce percpu_ref_is_dead()

2017-09-01 Thread Ming Lei
On Fri, Sep 01, 2017 at 06:59:15AM -0700, Tejun Heo wrote:
> Hello, Ming.
> 
> > +/**
> > + * percpu_ref_is_dead - test whether a percpu refcount is killed
> > + * @ref: percpu_ref to test
> > + *
> > + * Returns %true if @ref is dead
> > + *
> > + * This function is safe to call as long as @ref is between init and exit.
> > + */
> > +static inline bool percpu_ref_is_dead(struct percpu_ref *ref)
> > +{
> > +   unsigned long __percpu *percpu_count;
> > +
> > +   if (__ref_is_percpu(ref, &percpu_count))
> > +   return false;
> > +   return ref->percpu_count_ptr & __PERCPU_REF_DEAD;
> > +}
> 
> Can you please explain why percpu_ref_is_dying() isn't enough for your
> use case?

Hi Tejun,

You are right, it looks like percpu_ref_is_dying() is enough;
percpu_ref_get_many() works fine whether or not the ref is dying.
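
For reference, the existing helper is roughly the following (paraphrased from
include/linux/percpu-refcount.h, so details may differ slightly):

	static inline bool percpu_ref_is_dying(struct percpu_ref *ref)
	{
		/* true as soon as percpu_ref_kill*() has been called */
		return ref->percpu_count_ptr & __PERCPU_REF_DEAD;
	}

i.e. it reports the ref as dying from the moment the kill is initiated, which
is all that is needed here, whereas the proposed percpu_ref_is_dead()
additionally required the switch to atomic mode to have completed.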

Thanks!

-- 
Ming


Re: [PATCH 2/8] block: Fix declaration of blk-mq scheduler functions

2017-09-01 Thread Bart Van Assche
On Fri, 2017-09-01 at 20:36 +0900, Damien Le Moal wrote:
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -284,6 +284,16 @@ void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set 
> *set, int nr_hw_queues);
>  void blk_mq_quiesce_queue_nowait(struct request_queue *q);
>  
>  /*
> + * Scheduler helper functions.
> + */
> +void blk_mq_sched_free_hctx_data(struct request_queue *q,
> +  void (*exit)(struct blk_mq_hw_ctx *));
> +bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
> + struct request **merged_request);
> +bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request 
> *rq);
> +void blk_mq_sched_request_inserted(struct request *rq);
> +
> +/*
>   * Driver command data is immediately after the request. So subtract request
>   * size to get back to the original request, add request size to get the PDU.
>   */

Hello Damien,

A similar comment as for patch 1/8 for this patch: should these declarations
perhaps be moved to a new header file, e.g. include/linux/blk-mq-sched.h
because these functions are only used by I/O schedulers and not in the
implementation of block drivers? That will help to avoid that block driver
authors start wondering whether or not they have to call use these functions
while they are reading include/linux/blk-mq.h.

Thanks,

Bart.

Re: [PATCH 1/8] block: Fix declaration of blk-mq debugfs functions

2017-09-01 Thread Bart Van Assche
On Fri, 2017-09-01 at 20:36 +0900, Damien Le Moal wrote:
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index 14542308d25b..a369174a9679 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -5,6 +5,21 @@
>  #include 
>  #include 
>  
> +#ifdef CONFIG_BLK_DEBUG_FS
> +
> +#include 
> +
> +struct blk_mq_debugfs_attr {
> + const char *name;
> + umode_t mode;
> + int (*show)(void *, struct seq_file *);
> + ssize_t (*write)(void *, const char __user *, size_t, loff_t *);
> + /* Set either .show or .seq_ops. */
> + const struct seq_operations *seq_ops;
> +};
> +
> +#endif
> +
>  struct blk_mq_tags;
>  struct blk_flush_queue;
>  
> @@ -289,4 +304,9 @@ static inline void *blk_mq_rq_to_pdu(struct request *rq)
>   for ((i) = 0; (i) < (hctx)->nr_ctx &&   \
>({ ctx = (hctx)->ctxs[(i)]; 1; }); (i)++)
>  
> +#ifdef CONFIG_BLK_DEBUG_FS
> +int __blk_mq_debugfs_rq_show(struct seq_file *m, struct request *rq);
> +int blk_mq_debugfs_rq_show(struct seq_file *m, void *v);
> +#endif
> +
>  #endif

Hello Damien,

Should these definitions perhaps be moved to a new header file, e.g.
include/linux/blk-mq-debugfs.h, such that the introduction of more #ifdefs
and #include  in include/linux/blk-mq.h can be avoided?
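
Something along these lines, where the file name and include guard are of
course just a sketch of the suggestion (the struct and declarations are the
ones from this patch):

	/* include/linux/blk-mq-debugfs.h */
	#ifndef BLK_MQ_DEBUGFS_H
	#define BLK_MQ_DEBUGFS_H

	#include <linux/seq_file.h>

	struct blk_mq_debugfs_attr {
		const char *name;
		umode_t mode;
		int (*show)(void *, struct seq_file *);
		ssize_t (*write)(void *, const char __user *, size_t, loff_t *);
		/* Set either .show or .seq_ops. */
		const struct seq_operations *seq_ops;
	};

	int __blk_mq_debugfs_rq_show(struct seq_file *m, struct request *rq);
	int blk_mq_debugfs_rq_show(struct seq_file *m, void *v);

	#endif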

Thanks,

Bart.

Re: blk_get_request sleeps in schedule

2017-09-01 Thread Bart Van Assche
On Fri, 2017-09-01 at 08:07 +0530, Suraj Choudhari wrote:
> On my test system, blk_get_request() continuously hangs in schedule().
> 
> schedule+0x35/0x80
> schedule_timeout+0x237/0x2d0
> io_schedule_timeout+0xa6/0x110
> get_request+0x258/0x760
> blk_get_request+0x7f/0x100
> 
> I suspect the problem might be due to __get_request returning ENOMEM,
> leading to sleep in schedule().

Hello Suraj,

In addition to what Jens wrote: please consider switching to blk-mq such
that you can use the information in /sys/kernel/debug/block to analyze
this further.

Bart.

Re: [PATCH 1/9] percpu-refcount: introduce percpu_ref_is_dead()

2017-09-01 Thread Bart Van Assche
On Fri, 2017-09-01 at 06:59 -0700, Tejun Heo wrote:
> > +/**
> > + * percpu_ref_is_dead - test whether a percpu refcount is killed
> > + * @ref: percpu_ref to test
> > + *
> > + * Returns %true if @ref is dead
> > + *
> > + * This function is safe to call as long as @ref is between init and exit.
> > + */
> > +static inline bool percpu_ref_is_dead(struct percpu_ref *ref)
> > +{
> > +   unsigned long __percpu *percpu_count;
> > +
> > +   if (__ref_is_percpu(ref, &percpu_count))
> > +   return false;
> > +   return ref->percpu_count_ptr & __PERCPU_REF_DEAD;
> > +}
> 
> Can you please explain why percpu_ref_is_dying() isn't enough for your
> use case?

Hello Tejun,

The approach of the whole series is wrong so I think that you can ignore this 
patch.

Bart.

Re: [PATCH 8/9] block: allow to allocate req with REQF_PREEMPT when queue is frozen

2017-09-01 Thread Bart Van Assche
On Fri, 2017-09-01 at 11:55 +0800, Ming Lei wrote:
> On Thu, Aug 31, 2017 at 10:50:25PM +, Bart Van Assche wrote:
> > On Fri, 2017-09-01 at 01:27 +0800, Ming Lei wrote:
> > > @@ -1413,9 +1414,17 @@ static struct request *blk_old_get_request(struct 
> > > request_queue *q,
> > >   /* create ioc upfront */
> > >   create_io_context(gfp_mask, q->node);
> > >  
> > > - ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));
> > > + /*
> > > +  * When queue is frozen, we still need to allocate req for
> > > +  * REQF_PREEMPT.
> > > +  */
> > > + if ((flags & BLK_MQ_REQ_PREEMPT) && blk_queue_is_frozen(q))
> > > + blk_queue_enter_live(q);
> > > + else
> > > + ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));
> > 
> > Sorry but I doubt that calling blk_queue_enter_live() from inside
> > blk_old_get_request() is safe. Calling blk_queue_enter_live() is only safe
> > before a request queue has been marked dead. What prevents a kernel thread
> > that holds a reference on a request queue and that is running concurrently
> > with blk_cleanup_queue() to call blk_old_get_request() after a queue has
> > been marked dead?
> 
> I have sent one delta patch in list, which will only call
> blk_queue_enter_live() iff SCSI device is in QUIESCE.

That wouldn't make this hack less ugly.

It is possible to trigger the running -> quiesce state transition through
/sys/class/scsi_device/*/device/state and the quiesce -> removed transition
through /sys/class/scsi_device/*/device/delete. An example:

modprobe scsi_debug delay=0 dev_size_mb=16
lsscsi | grep scsi_debug
cd /sys/class/scsi_device/8:0:0:0/device
echo quiesce > state
echo 1 > delete

So the code from your patch 8/9 can race against device removal.

I think we need a better approach than the REQ_PREEMPT hack. How about
implementing resume as a callback by adding an additional function pointer
to struct request_queue / struct blk_mq_ops instead of implementing it as
a request? For SCSI devices races of resume against removal can e.g. be
handled by checking the scsi_device_set_state() return value. That function
namely does not allow the removing/removed to running state transition.
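
Very roughly, the idea would look something like the sketch below. The member
name and the call site are invented purely for illustration; nothing like this
exists in the tree today:

	struct blk_mq_ops {
		/* ... existing callbacks ... */
		/* invoked on system resume instead of queuing a RQF_PREEMPT request */
		int (*resume_fn)(struct request_queue *q);
	};

	static int blk_queue_resume(struct request_queue *q)
	{
		if (q->mq_ops && q->mq_ops->resume_fn)
			return q->mq_ops->resume_fn(q);
		return 0;
	}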

Bart.

[PATCH V2] block: sed-opal: Set MBRDone on S3 resume path if TPER is MBREnabled

2017-09-01 Thread Scott Bauer
Users who are booting off their Opal-enabled drives are having
issues when they have a shadow MBR set up, after an S3/resume cycle.
When the drive has a shadow MBR set up, the MBRDone flag is set to
false upon power loss (S3/S4/S5). When the MBRDone flag is false,
I/O to LBA 0 -> LBA_END_MBR is remapped to the shadow MBR
of the drive. If the drive contains useful data in the 0 -> end_mbr
range, then upon S3 resume the user can never get to that data as the
drive will keep remapping it to the MBR. To fix this, when we unlock
on S3 resume we need to tell the drive that we're done with the
shadow MBR (even though we didn't use it) by setting MBRDone to true.
This way the drive will stop the remapping and the user can access
their data.

Signed-off-by: Scott Bauer 
---
 block/opal_proto.h |  1 +
 block/sed-opal.c   | 32 
 2 files changed, 33 insertions(+)

diff --git a/block/opal_proto.h b/block/opal_proto.h
index f40c9acf8895..e20be8258854 100644
--- a/block/opal_proto.h
+++ b/block/opal_proto.h
@@ -46,6 +46,7 @@ enum opal_response_token {
 #define GENERIC_HOST_SESSION_NUM 0x41
 
 #define TPER_SYNC_SUPPORTED 0x01
+#define MBR_ENABLED_MASK 0x10
 
 #define TINY_ATOM_DATA_MASK 0x3F
 #define TINY_ATOM_SIGNED 0x40
diff --git a/block/sed-opal.c b/block/sed-opal.c
index 9b30ae5ab843..9ed51d0c6b1d 100644
--- a/block/sed-opal.c
+++ b/block/sed-opal.c
@@ -80,6 +80,7 @@ struct parsed_resp {
 
 struct opal_dev {
bool supported;
+   bool mbr_enabled;
 
void *data;
sec_send_recv *send_recv;
@@ -283,6 +284,14 @@ static bool check_tper(const void *data)
return true;
 }
 
+static bool check_mbrenabled(const void *data)
+{
+   const struct d0_locking_features *lfeat = data;
+   u8 sup_feat = lfeat->supported_features;
+
+   return !!(sup_feat & MBR_ENABLED_MASK);
+}
+
 static bool check_sum(const void *data)
 {
const struct d0_single_user_mode *sum = data;
@@ -417,6 +426,7 @@ static int opal_discovery0_end(struct opal_dev *dev)
u32 hlen = be32_to_cpu(hdr->length);
 
print_buffer(dev->resp, hlen);
+   dev->mbr_enabled = false;
 
if (hlen > IO_BUFFER_LENGTH - sizeof(*hdr)) {
pr_debug("Discovery length overflows buffer (%zu+%u)/%u\n",
@@ -442,6 +452,8 @@ static int opal_discovery0_end(struct opal_dev *dev)
check_geometry(dev, body);
break;
case FC_LOCKING:
+   dev->mbr_enabled = check_mbrenabled(body->features);
+   break;
case FC_ENTERPRISE:
case FC_DATASTORE:
/* some ignored properties */
@@ -2190,6 +2202,21 @@ static int __opal_lock_unlock(struct opal_dev *dev,
return next(dev);
 }
 
+static int __opal_set_mbr_done(struct opal_dev *dev, struct opal_key *key)
+{
+   u8 mbr_done_tf = 1;
+   const struct opal_step mbrdone_step [] = {
+   { opal_discovery0, },
+   { start_admin1LSP_opal_session, key },
+   { set_mbr_done, &mbr_done_tf },
+   { end_opal_session, },
+   { NULL, }
+   };
+
+   dev->steps = mbrdone_step;
+   return next(dev);
+}
+
 static int opal_lock_unlock(struct opal_dev *dev,
struct opal_lock_unlock *lk_unlk)
 {
@@ -2345,6 +2372,11 @@ bool opal_unlock_from_suspend(struct opal_dev *dev)
 suspend->unlk.session.sum);
was_failure = true;
}
+   if (dev->mbr_enabled) {
+   ret = __opal_set_mbr_done(dev, 
&suspend->unlk.session.opal_key);
+   if (ret)
+   pr_debug("Failed to set MBR Done in S3 
resume\n");
+   }
}
mutex_unlock(&dev->dev_lock);
return was_failure;
-- 
2.11.0



Re: non-blocking buffered reads V5

2017-09-01 Thread Jens Axboe
On 09/01/2017 02:13 AM, Christoph Hellwig wrote:
> Given that we got all the reviews is there as chance to get
> this picked up for 4.14?  The last round of nowait bits went
> through the block tree, and this one has the block bits as well,
> so maybe we should go for that again?

If Al is happy with it, it's not a problem for me.

-- 
Jens Axboe



Re: blk_get_request sleeps in schedule

2017-09-01 Thread Jens Axboe
On 08/31/2017 08:37 PM, Suraj Choudhari wrote:
> Hello,
> 
> On my test system, blk_get_request() continuously hangs in schedule().
> 
> schedule+0x35/0x80
> schedule_timeout+0x237/0x2d0
> io_schedule_timeout+0xa6/0x110
> get_request+0x258/0x760
> blk_get_request+0x7f/0x100
> 
> I suspect the problem might be due to __get_request returning ENOMEM,
> leading to sleep in schedule().
> 
> static struct request *__get_request(struct request_list *rl, unsigned
> int op, struct bio *bio, gfp_t gfp_mask)   {
> ...
> /*
> * The queue is full and the allocating
> * process is not a "batcher", and not
> * exempted by the IO scheduler
> */
> return ERR_PTR(-ENOMEM);
> ...
> 
> if (rl->count[is_sync] >= (3 * q->nr_requests / 2))
> return ERR_PTR(-ENOMEM);
> ...
> 
> }
> 
> However, there is no significant memory pressure or IO load on my test
> system, I wonder why the __get_request might be failing leading to
> sleep.
> 
> Any possible cause?   Any suggestions are welcome!

You're missing a lot of detail here. What kernel are you using? What
storage device hw/driver are you seeing this on? What are you doing
to reproduce it?

Basically all the things that someone would need to know, to have
any sort of educated guess as to what is going on.

-- 
Jens Axboe



Re: [PATCH V3 0/2] block/loop: improve performance

2017-09-01 Thread Jens Axboe
On 08/31/2017 11:09 PM, Shaohua Li wrote:
> From: Shaohua Li 
> 
> two small patches to improve performance for loop in directio mode. The goal 
> is
> to increase IO size sending to underlayer disks.

Applied for 4.14, thanks.

-- 
Jens Axboe



Re: [PATCH] kernfs: checking for IS_ERR() instead of NULL

2017-09-01 Thread Jens Axboe
On 08/30/2017 08:04 AM, Dan Carpenter wrote:
> The kernfs_get_inode() returns NULL on error, it never returns error
> pointers.

Added for 4.14, with the ack from Tejun and Greg.

-- 
Jens Axboe



[PATCH v2 2/2] blkcg: add wrappers for struct blkcg_policy operations

2017-09-01 Thread weiping zhang
Some blkcg policies may not implement all operations in struct blkcg_policy,
so add wrappers for these pol->xxx_fn callbacks.
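
(The wrappers follow the usual NULL-check pattern; a representative sketch is
below, while the actual definitions live in the blk-cgroup.h hunk of this
patch:)

	static inline void blkg_pol_init_pd(struct blkcg_policy *pol,
					    struct blkg_policy_data *pd)
	{
		if (pol->pd_init_fn)
			pol->pd_init_fn(pd);
	}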

Signed-off-by: weiping zhang 
---
 block/blk-cgroup.c | 53 --
 include/linux/blk-cgroup.h | 72 ++
 2 files changed, 97 insertions(+), 28 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 0c45870..8588739 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -71,7 +71,7 @@ static void blkg_free(struct blkcg_gq *blkg)
 
for (i = 0; i < BLKCG_MAX_POLS; i++)
if (blkg->pd[i])
-   blkcg_policy[i]->pd_free_fn(blkg->pd[i]);
+   blkg_pol_free_pd(blkcg_policy[i], blkg->pd[i]);
 
if (blkg->blkcg != &blkcg_root)
blk_exit_rl(blkg->q, &blkg->rl);
@@ -124,7 +124,7 @@ static struct blkcg_gq *blkg_alloc(struct blkcg *blkcg, 
struct request_queue *q,
continue;
 
/* alloc per-policy data and attach it to blkg */
-   pd = pol->pd_alloc_fn(gfp_mask, q->node);
+   pd = blkg_pol_alloc_pd(pol, gfp_mask, q->node);
if (!pd)
goto err_free;
 
@@ -218,8 +218,8 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
for (i = 0; i < BLKCG_MAX_POLS; i++) {
struct blkcg_policy *pol = blkcg_policy[i];
 
-   if (blkg->pd[i] && pol->pd_init_fn)
-   pol->pd_init_fn(blkg->pd[i]);
+   if (blkg->pd[i])
+   blkg_pol_init_pd(pol, blkg->pd[i]);
}
 
/* insert */
@@ -232,8 +232,8 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
for (i = 0; i < BLKCG_MAX_POLS; i++) {
struct blkcg_policy *pol = blkcg_policy[i];
 
-   if (blkg->pd[i] && pol->pd_online_fn)
-   pol->pd_online_fn(blkg->pd[i]);
+   if (blkg->pd[i])
+   blkg_pol_online_pd(pol, blkg->pd[i]);
}
}
blkg->online = true;
@@ -323,8 +323,8 @@ static void blkg_destroy(struct blkcg_gq *blkg)
for (i = 0; i < BLKCG_MAX_POLS; i++) {
struct blkcg_policy *pol = blkcg_policy[i];
 
-   if (blkg->pd[i] && pol->pd_offline_fn)
-   pol->pd_offline_fn(blkg->pd[i]);
+   if (blkg->pd[i])
+   blkg_pol_offline_pd(pol, blkg->pd[i]);
}
 
if (parent) {
@@ -457,8 +457,8 @@ static int blkcg_reset_stats(struct cgroup_subsys_state 
*css,
for (i = 0; i < BLKCG_MAX_POLS; i++) {
struct blkcg_policy *pol = blkcg_policy[i];
 
-   if (blkg->pd[i] && pol->pd_reset_stats_fn)
-   pol->pd_reset_stats_fn(blkg->pd[i]);
+   if (blkg->pd[i])
+   blkg_pol_reset_pd_stats(pol, blkg->pd[i]);
}
}
 
@@ -1045,7 +1045,7 @@ static void blkcg_css_free(struct cgroup_subsys_state 
*css)
 
for (i = 0; i < BLKCG_MAX_POLS; i++)
if (blkcg->cpd[i])
-   blkcg_policy[i]->cpd_free_fn(blkcg->cpd[i]);
+   blkcg_pol_free_cpd(blkcg_policy[i], blkcg->cpd[i]);
 
mutex_unlock(&blkcg_pol_mutex);
 
@@ -1084,7 +1084,7 @@ blkcg_css_alloc(struct cgroup_subsys_state *parent_css)
if (!pol || !pol->cpd_alloc_fn)
continue;
 
-   cpd = pol->cpd_alloc_fn(GFP_KERNEL);
+   cpd = blkcg_pol_alloc_cpd(pol, GFP_KERNEL);
if (!cpd) {
ret = ERR_PTR(-ENOMEM);
goto free_pd_blkcg;
@@ -1092,8 +1092,7 @@ blkcg_css_alloc(struct cgroup_subsys_state *parent_css)
blkcg->cpd[i] = cpd;
cpd->blkcg = blkcg;
cpd->plid = i;
-   if (pol->cpd_init_fn)
-   pol->cpd_init_fn(cpd);
+   blkcg_pol_init_cpd(pol, cpd);
}
 
spin_lock_init(&blkcg->lock);
@@ -1110,7 +1109,7 @@ blkcg_css_alloc(struct cgroup_subsys_state *parent_css)
 free_pd_blkcg:
for (i--; i >= 0; i--)
if (blkcg->cpd[i])
-   blkcg_policy[i]->cpd_free_fn(blkcg->cpd[i]);
+   blkcg_pol_free_cpd(blkcg_policy[i], blkcg->cpd[i]);
 free_blkcg:
kfree(blkcg);
mutex_unlock(&blkcg_pol_mutex);
@@ -1244,7 +1243,7 @@ static void blkcg_bind(struct cgroup_subsys_state 
*root_css)
 
list_for_each_entry(blkcg, &all_blkcgs, all_blkcgs_node)
if (blkcg->cpd[pol->plid])
-   pol->cpd_bind_fn(blkcg->cpd[pol->plid]);
+   blkcg_pol_bind_cpd(pol, blkcg->cpd[pol->plid]);
}
mutex_unlock(&blkcg_pol_mutex);
 }
@@ -1301,7 +1300,7 @@ int 

[PATCH v2 1/2] blkcg: check pol->cpd_free_fn before free cpd

2017-09-01 Thread weiping zhang
Check pol->cpd_free_fn() instead of pol->cpd_alloc_fn() when freeing cpd.

Signed-off-by: weiping zhang 
---
 block/blk-cgroup.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 0480892..0c45870 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1450,7 +1450,7 @@ int blkcg_policy_register(struct blkcg_policy *pol)
return 0;
 
 err_free_cpds:
-   if (pol->cpd_alloc_fn) {
+   if (pol->cpd_free_fn) {
list_for_each_entry(blkcg, &all_blkcgs, all_blkcgs_node) {
if (blkcg->cpd[pol->plid]) {
pol->cpd_free_fn(blkcg->cpd[pol->plid]);
@@ -1490,7 +1490,7 @@ void blkcg_policy_unregister(struct blkcg_policy *pol)
/* remove cpds and unregister */
mutex_lock(&blkcg_pol_mutex);
 
-   if (pol->cpd_alloc_fn) {
+   if (pol->cpd_free_fn) {
list_for_each_entry(blkcg, &all_blkcgs, all_blkcgs_node) {
if (blkcg->cpd[pol->plid]) {
pol->cpd_free_fn(blkcg->cpd[pol->plid]);
-- 
2.9.4



[PATCH v2 0/2] Add wrappers for blkcg policy operations

2017-09-01 Thread weiping zhang
The first patch is the V2 of [PATCH] blkcg: check pol->cpd_free_fn
before free cpd; it fixes the check performed before freeing cpd.

The second patch adds some wrappers for struct blkcg_policy->xxx_fn, because not
every block cgroup policy implements all operations.

weiping zhang (2):
  blkcg: check pol->cpd_free_fn before free cpd
  blkcg: add wrappers for struct blkcg_policy operations

 block/blk-cgroup.c | 57 +---
 include/linux/blk-cgroup.h | 72 ++
 2 files changed, 99 insertions(+), 30 deletions(-)

-- 
2.9.4



Re: [PATCH 1/9] percpu-refcount: introduce percpu_ref_is_dead()

2017-09-01 Thread Tejun Heo
Hello, Ming.

> +/**
> + * percpu_ref_is_dead - test whether a percpu refcount is killed
> + * @ref: percpu_ref to test
> + *
> + * Returns %true if @ref is dead
> + *
> + * This function is safe to call as long as @ref is between init and exit.
> + */
> +static inline bool percpu_ref_is_dead(struct percpu_ref *ref)
> +{
> + unsigned long __percpu *percpu_count;
> +
> + if (__ref_is_percpu(ref, &percpu_count))
> + return false;
> + return ref->percpu_count_ptr & __PERCPU_REF_DEAD;
> +}

Can you please explain why percpu_ref_is_dying() isn't enough for your
use case?

Thanks.

-- 
tejun


Re: [PATCH V7 00/10] mmc: Add Command Queue support

2017-09-01 Thread Adrian Hunter
On 01/09/17 15:58, Ulf Hansson wrote:
> + Christoph
> 
> On 1 September 2017 at 13:42, Adrian Hunter  wrote:
>> On 31/08/17 14:56, Adrian Hunter wrote:
>>> Here is V7 of the hardware command queue patches without the software
>>> command queue patches, now using blk-mq.
>>>
>>> HW CMDQ offers 25% - 50% better random multi-threaded I/O.  I see a slight
>>> 2% drop in sequential read speed but no change to sequential write.
>>
>> Any comments?
> 
> A couple of overall comments, for now.
> 
> To make sure we don't overlook something when converting to mq, I
> would prefer that we first convert the existing mmc block code to mq,
> then we add CMDQ on top.

That doesn't make sense.  This patch set is not converting the legacy driver
to mq, therefore it cannot overlook anything when converting to mq.

> 
> Regarding patch1, in the long-term solution, we should really strive
> towards getting rid of the big mmc host lock, as it causes us to lose
> some of the benefits of the principles that mq is designed upon.

Lose what benefits?

> However, it may be feasible to use patch1 as an intermediate step, to
> be able to convert to mq short term. Then we can continue doing the
> re-factoring, to get rid of the big mmc host lock and solve other
> related issues/optimize things. What remains to be seen in this
> approach, is how well mq performs, as we would still be using the big
> mmc host lock. We may tolerate some minor regressions, but if they are
> too big, there is no other way than first removing the big mmc lock.

Unlike HDDs, eMMC must switch between internal partitions (LUNs) so there
must be a way to arbitrate between them.

> Of course, if Linus/Christoph don't think this intermediate approach
> is good enough, we will have to respect that as well.


Re: [PATCH V7 00/10] mmc: Add Command Queue support

2017-09-01 Thread Ulf Hansson
+ Christoph

On 1 September 2017 at 13:42, Adrian Hunter  wrote:
> On 31/08/17 14:56, Adrian Hunter wrote:
>> Here is V7 of the hardware command queue patches without the software
>> command queue patches, now using blk-mq.
>>
>> HW CMDQ offers 25% - 50% better random multi-threaded I/O.  I see a slight
>> 2% drop in sequential read speed but no change to sequential write.
>
> Any comments?

A couple of overall comments, for now.

To make sure we don't overlook something when converting to mq, I
would prefer that we first convert the existing mmc block code to mq,
then we add CMDQ on top.

Regarding patch1, in the long-term solution, we should really strive
towards getting rid of the big mmc host lock, as it causes us to lose
some of the benefits of the principles that mq is designed upon.

However, it may be feasible to use patch1 as an intermediate step, to
be able to convert to mq short term. Then we can continue doing the
re-factoring, to get rid of the big mmc host lock and solve other
related issues/optimize things. What remains to be seen in this
approach, is how well mq performs, as we would still be using the big
mmc host lock. We may tolerate some minor regressions, but if they are
too big, there is no other way than first removing the big mmc lock.

Of course, if Linus/Christoph don't think this intermediate approach
is good enough, we will have to respect that as well.

Kind regards
Uffe


Re: [PATCH 4/9] blk-mq: only run hw queues for blk-mq

2017-09-01 Thread Johannes Thumshirn
On Fri, Sep 01, 2017 at 08:33:40PM +0800, Ming Lei wrote:
> Given it is the only case that blk_mq_run_hw_queues() is run on
> !q->mq_ops now, I suggest to check q->mq_ops outside, otherwise
> it can be a bit overkill.

Fair enough,
Reviewed-by: Johannes Thumshirn 
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


Re: [PATCH 4/9] blk-mq: only run hw queues for blk-mq

2017-09-01 Thread Ming Lei
On Fri, Sep 01, 2017 at 10:16:34AM +0200, Johannes Thumshirn wrote:
> Hi Ming,
> 
> On Fri, Sep 01, 2017 at 01:27:23AM +0800, Ming Lei wrote:
> > -   blk_mq_run_hw_queues(q, false);
> > +   if (q->mq_ops)
> > +   blk_mq_run_hw_queues(q, false);
> 
> What speaks against putting the if (q->mq_ops) directly into
> blk_mq_run_hw_queues() so we can't accidentally call it from the sq path?
> 
> Just an idea, no hard preferences here.

Given it is the only case that blk_mq_run_hw_queues() is run on
!q->mq_ops now, I suggest to check q->mq_ops outside, otherwise
it can be a bit overkill.

-- 
Ming


Re: [PATCH V7 00/10] mmc: Add Command Queue support

2017-09-01 Thread Adrian Hunter
On 31/08/17 14:56, Adrian Hunter wrote:
> Here is V7 of the hardware command queue patches without the software
> command queue patches, now using blk-mq.
> 
> HW CMDQ offers 25% - 50% better random multi-threaded I/O.  I see a slight
> 2% drop in sequential read speed but no change to sequential write.

Any comments?


[PATCH 7/8] scsi: sd_zbc: Disable zone write locking with scsi-mq

2017-09-01 Thread Damien Le Moal
In the case of a ZBC disk used with scsi-mq, zone write locking does
not prevent write reordering in sequential zones. Unlike the legacy
case, zone locking can only be done after the command request is
removed from the scheduler dispatch queue. That is, at the time of
zone locking, the write command may already be out of order.

Disable zone write locking in sd_zbc_write_lock_zone() if the disk is
used with scsi-mq. Write order guarantees can be provided by an
adapted I/O scheduler.

Signed-off-by: Damien Le Moal 
---
 drivers/scsi/sd_zbc.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 3a9feadcc133..0f0a74fbd9c5 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -258,6 +258,10 @@ int sd_zbc_write_lock_zone(struct scsi_cmnd *cmd)
(sector & (zone_sectors - 1)) + blk_rq_sectors(rq) > zone_sectors)
return BLKPREP_KILL;
 
+   /* No write locking with scsi-mq */
+   if (rq->q->mq_ops)
+   return BLKPREP_OK;
+
/*
 * There is no write constraint on conventional zones, but do not issue
 * more than one write at a time per sequential zone. This avoids write
-- 
2.13.5



[PATCH 8/8] scsi: Introduce ZBC disk I/O scheduler

2017-09-01 Thread Damien Le Moal
The zoned I/O scheduler is mostly identical to mq-deadline and retains
the same configuration attributes. The main difference is that the
zoned scheduler will ensure that at any time at most only one write
request (command) per sequential zone is in flight (has been issued to
the disk) in order to protect against sequential write reordering
potentially resulting from the concurrent execution of request dispatch
by multiple contexts.

This is achieved similarly to the legacy SCSI I/O path by write locking
zones when a write request is issued. Subsequent writes to the same
zone have to wait for the completion of the previously issued write
before being in turn dispatched to the disk. This ensures that
sequential writes are processed in the correct order without needing
any modification to the execution model of blk-mq. In addition, this
also protects against write reordering at the HBA level (e.g. AHCI).
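
Conceptually, the dispatch-time gate is the following. This is an illustrative
sketch that reuses the fields introduced earlier in this series; it is not the
actual sd_zbc_iosched.c code:

	static bool zoned_can_dispatch(struct scsi_disk *sdkp, struct request *rq)
	{
		unsigned int zno = sd_zbc_zone_no(sdkp, blk_rq_pos(rq));

		/* reads and conventional zones are never held back */
		if (!op_is_write(req_op(rq)) || !test_bit(zno, sdkp->seq_zones))
			return true;

		/* allow at most one write in flight per sequential zone */
		return !test_and_set_bit(zno, sdkp->zones_wlock);
	}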

This zoned scheduler code is added under the drivers/scsi directory so
that information managed using the scsi_disk structure can be directly
referenced. Doing so avoids cluttering the block layer with device type
specific code.

Signed-off-by: Damien Le Moal 
---
 Documentation/block/zoned-iosched.txt |  48 ++
 block/Kconfig.iosched |  12 +
 drivers/scsi/Makefile |   1 +
 drivers/scsi/sd.h |   3 +-
 drivers/scsi/sd_zbc.c |  16 +-
 drivers/scsi/sd_zbc.h |  11 +-
 drivers/scsi/sd_zbc_iosched.c | 934 ++
 7 files changed, 1009 insertions(+), 16 deletions(-)
 create mode 100644 Documentation/block/zoned-iosched.txt
 create mode 100644 drivers/scsi/sd_zbc_iosched.c

diff --git a/Documentation/block/zoned-iosched.txt 
b/Documentation/block/zoned-iosched.txt
new file mode 100644
index ..b269125bdc61
--- /dev/null
+++ b/Documentation/block/zoned-iosched.txt
@@ -0,0 +1,48 @@
+Zoned I/O scheduler
+===
+
+The Zoned I/O scheduler solves zoned block devices write ordering problems
+inherent to the absence of a global request queue lock in the blk-mq
+infrastructure. Multiple contexts may try to dispatch simultaneously write
+requests to the same sequential zone of a zoned block device, doing so
+potentially breaking the sequential write order imposed by the device.
+
+The Zoned I/O scheduler is based on the mq-deadline scheduler. It shares the
+same tunables and behaves in a comparable manner. The main difference 
introduced
+with the zoned scheduler is handling of write batches. Whereas mq-deadline will
+keep dispatching write requests to the device as long as the batching size
+allows and reads are not starved, the zoned scheduler introduces additional
+constraints:
+1) At most only one write request can be issued to a sequential zone of the
+device. This ensures that no reordering of sequential writes to a sequential
+zone can happen once the write request leaves the scheduler internal queue (rb
+tree).
+2) The number of sequential zones that can be simultaneously written is limited
+to the device advertized maximum number of open zones. This additional 
condition
+avoids performance degradation due to excessive open zone resource use at the
+device level.
+
+These conditions do not apply to write requests targeting conventional zones.
+For these, the zoned scheduler behaves exactly like the mq-deadline scheduler.
+
+The zoned I/O scheduler cannot be used with regular block devices. It can only
+be used with host-managed or host-aware zoned block devices.
+Using the zoned I/O scheduler is mandatory with host-managed disks unless the
+disk user itself tightly controls write sequencing to sequential zones. The
+zoned scheduler will treat host-aware disks exactly the same way as 
host-managed
+devices. That is, even though host aware disks can be randomly written, the 
zoned
+scheduler will still impose the limit to one write per zone so that sequential
+writes sequences are preserved.
+
+For host-managed disks, automating the use of the zoned scheduler can be done
+simply with a udev rule. An example of such rule is shown below.
+
+# Set zoned scheduler for host-managed zoned block devices
+ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/zoned}=="host-managed", \
+   ATTR{queue/scheduler}="zoned"
+
+Zoned I/O scheduler tunables
+
+
+Tunables of the Zoned I/O scheduler are identical to those of the deadline
+I/O scheduler. See Documentation/block/deadline-iosched.txt for details.
diff --git a/block/Kconfig.iosched b/block/Kconfig.iosched
index fd2cefa47d35..b87c67dbf1f6 100644
--- a/block/Kconfig.iosched
+++ b/block/Kconfig.iosched
@@ -70,6 +70,18 @@ config MQ_IOSCHED_DEADLINE
---help---
  MQ version of the deadline IO scheduler.
 
+config MQ_IOSCHED_ZONED
+   tristate "Zoned I/O scheduler"
+   depends on BLK_DEV_ZONED
+   depends on SCSI_MOD
+   depends on BLK_DEV_SD
+  

[PATCH 6/8] scsi: sd_zbc: Limit zone write locking to sequential zones

2017-09-01 Thread Damien Le Moal
Zoned block devices have no write constraints for conventional zones
so write locking of conventional zones is not necessary and can hurt
performance. To avoid this, introduce the seq_zones bitmap to indicate
if a zone is a sequential one. Use this information to allow any write
to be issued to conventional zones, locking only sequential zones.

As the zone bitmap allocation for seq_zones is identical to the zones
write lock bitmap, introduce the helper sd_zbc_alloc_zone_bitmap().
Using this helper, wait for the disk capacity and number of zones to
stabilize on the second revalidation pass to allocate and initialize
the bitmaps.

Signed-off-by: Damien Le Moal 
---
 drivers/scsi/sd.h |   1 +
 drivers/scsi/sd_zbc.c | 113 ++
 drivers/scsi/sd_zbc.h |   9 
 3 files changed, 106 insertions(+), 17 deletions(-)

diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 5940390d30f3..761d3fbf72ef 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -77,6 +77,7 @@ struct scsi_disk {
unsigned intzone_blocks;
unsigned intzone_shift;
unsigned long   *zones_wlock;
+   unsigned long   *seq_zones;
unsigned intzones_optimal_open;
unsigned intzones_optimal_nonseq;
unsigned intzones_max_open;
diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 0e6b5f5d14e7..3a9feadcc133 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -246,7 +246,7 @@ int sd_zbc_write_lock_zone(struct scsi_cmnd *cmd)
struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
sector_t sector = blk_rq_pos(rq);
sector_t zone_sectors = sd_zbc_zone_sectors(sdkp);
-   unsigned int zno = sd_zbc_zone_no(sdkp, sector);
+   unsigned int zno;
 
/*
 * Note: Checks of the alignment of the write command on
@@ -254,21 +254,20 @@ int sd_zbc_write_lock_zone(struct scsi_cmnd *cmd)
 */
 
/* Do not allow zone boundaries crossing on host-managed drives */
-   if (blk_queue_zoned_model(sdkp->disk->queue) == BLK_ZONED_HM &&
+   if (blk_queue_zoned_model(rq->q) == BLK_ZONED_HM &&
(sector & (zone_sectors - 1)) + blk_rq_sectors(rq) > zone_sectors)
return BLKPREP_KILL;
 
/*
-* Do not issue more than one write at a time per
-* zone. This solves write ordering problems due to
-* the unlocking of the request queue in the dispatch
-* path in the non scsi-mq case. For scsi-mq, this
-* also avoids potential write reordering when multiple
-* threads running on different CPUs write to the same
-* zone (with a synchronized sequential pattern).
+* There is no write constraint on conventional zones, but do not issue
+* more than one write at a time per sequential zone. This avoids write
+* ordering problems due to the unlocking of the request queue in the
+* dispatch path of the non scsi-mq (legacy) case.
 */
-   if (sdkp->zones_wlock &&
-   test_and_set_bit(zno, sdkp->zones_wlock))
+   zno = sd_zbc_zone_no(sdkp, sector);
+   if (!test_bit(zno, sdkp->seq_zones))
+   return BLKPREP_OK;
+   if (test_and_set_bit(zno, sdkp->zones_wlock))
return BLKPREP_DEFER;
 
WARN_ON_ONCE(cmd->flags & SCMD_ZONE_WRITE_LOCK);
@@ -286,8 +285,9 @@ void sd_zbc_write_unlock_zone(struct scsi_cmnd *cmd)
struct request *rq = cmd->request;
struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
 
-   if (sdkp->zones_wlock && cmd->flags & SCMD_ZONE_WRITE_LOCK) {
+   if (cmd->flags & SCMD_ZONE_WRITE_LOCK) {
unsigned int zno = sd_zbc_zone_no(sdkp, blk_rq_pos(rq));
+
WARN_ON_ONCE(!test_bit(zno, sdkp->zones_wlock));
cmd->flags &= ~SCMD_ZONE_WRITE_LOCK;
clear_bit_unlock(zno, sdkp->zones_wlock);
@@ -510,8 +510,67 @@ static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
return 0;
 }
 
+/*
+ * Initialize the sequential zone bitmap to allow identifying sequential zones.
+ */
+static int sd_zbc_setup_seq_zones(struct scsi_disk *sdkp)
+{
+   unsigned long *seq_zones;
+   sector_t block = 0;
+   unsigned char *buf;
+   unsigned char *rec;
+   unsigned int buf_len;
+   unsigned int list_length;
+   unsigned int n = 0;
+   int ret = -ENOMEM;
+
+   /* Allocate zone type bitmap */
+   seq_zones = sd_zbc_alloc_zone_bitmap(sdkp);
+   if (!seq_zones)
+   return -ENOMEM;
+
+   buf = kmalloc(SD_ZBC_BUF_SIZE, GFP_KERNEL);
+   if (!buf)
+   goto out;
+
+   while (block < sdkp->capacity) {
+
+   ret = sd_zbc_report_zones(sdkp, buf, SD_ZBC_BUF_SIZE, block);
+   if (ret)
+   goto out;
+
+   /* Parse reported zone descriptors */
+   list_length = get_unaligned_be32(&buf[0]) + 64;
+   

[PATCH 5/8] scsi: sd_zbc: Fix sd_zbc_read_zoned_characteristics()

2017-09-01 Thread Damien Le Moal
The three values starting at byte 8 of the Zoned Block Device
Characteristics VPD page B6h are 32-bit values, not 64-bit values. So use
get_unaligned_be32() to retrieve the values and not get_unaligned_be64().

Fixes: 89d947561077 ("sd: Implement support for ZBC devices")
Cc: 

Signed-off-by: Damien Le Moal 
---
 drivers/scsi/sd_zbc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 810377007665..0e6b5f5d14e7 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -356,15 +356,15 @@ static int sd_zbc_read_zoned_characteristics(struct 
scsi_disk *sdkp,
if (sdkp->device->type != TYPE_ZBC) {
/* Host-aware */
sdkp->urswrz = 1;
-   sdkp->zones_optimal_open = get_unaligned_be64(&buf[8]);
-   sdkp->zones_optimal_nonseq = get_unaligned_be64(&buf[12]);
+   sdkp->zones_optimal_open = get_unaligned_be32(&buf[8]);
+   sdkp->zones_optimal_nonseq = get_unaligned_be32(&buf[12]);
sdkp->zones_max_open = 0;
} else {
/* Host-managed */
sdkp->urswrz = buf[4] & 1;
sdkp->zones_optimal_open = 0;
sdkp->zones_optimal_nonseq = 0;
-   sdkp->zones_max_open = get_unaligned_be64(&buf[16]);
+   sdkp->zones_max_open = get_unaligned_be32(&buf[16]);
}
 
return 0;
-- 
2.13.5



[PATCH 1/8] block: Fix declaration of blk-mq debugfs functions

2017-09-01 Thread Damien Le Moal
__blk_mq_debugfs_rq_show() and blk_mq_debugfs_rq_show() are exported
symbols but declared in the block internal file block/blk-mq-debugfs.h.
Move the declaration of these functions to the public linux/blk-mq.h
file to make these functions available to other modules.

While at it, also move the definition of the blk_mq_debugfs_attr
structure to allow scheduler modules outside of block to define debugfs
attributes.

Signed-off-by: Damien Le Moal 
---
 block/blk-mq-debugfs.h | 14 --
 include/linux/blk-mq.h | 20 
 2 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/block/blk-mq-debugfs.h b/block/blk-mq-debugfs.h
index a182e6f97565..85759aef53e1 100644
--- a/block/blk-mq-debugfs.h
+++ b/block/blk-mq-debugfs.h
@@ -3,20 +3,6 @@
 
 #ifdef CONFIG_BLK_DEBUG_FS
 
-#include 
-
-struct blk_mq_debugfs_attr {
-   const char *name;
-   umode_t mode;
-   int (*show)(void *, struct seq_file *);
-   ssize_t (*write)(void *, const char __user *, size_t, loff_t *);
-   /* Set either .show or .seq_ops. */
-   const struct seq_operations *seq_ops;
-};
-
-int __blk_mq_debugfs_rq_show(struct seq_file *m, struct request *rq);
-int blk_mq_debugfs_rq_show(struct seq_file *m, void *v);
-
 int blk_mq_debugfs_register(struct request_queue *q);
 void blk_mq_debugfs_unregister(struct request_queue *q);
 int blk_mq_debugfs_register_hctx(struct request_queue *q,
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 14542308d25b..a369174a9679 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -5,6 +5,21 @@
 #include 
 #include 
 
+#ifdef CONFIG_BLK_DEBUG_FS
+
+#include 
+
+struct blk_mq_debugfs_attr {
+   const char *name;
+   umode_t mode;
+   int (*show)(void *, struct seq_file *);
+   ssize_t (*write)(void *, const char __user *, size_t, loff_t *);
+   /* Set either .show or .seq_ops. */
+   const struct seq_operations *seq_ops;
+};
+
+#endif
+
 struct blk_mq_tags;
 struct blk_flush_queue;
 
@@ -289,4 +304,9 @@ static inline void *blk_mq_rq_to_pdu(struct request *rq)
for ((i) = 0; (i) < (hctx)->nr_ctx &&   \
 ({ ctx = (hctx)->ctxs[(i)]; 1; }); (i)++)
 
+#ifdef CONFIG_BLK_DEBUG_FS
+int __blk_mq_debugfs_rq_show(struct seq_file *m, struct request *rq);
+int blk_mq_debugfs_rq_show(struct seq_file *m, void *v);
+#endif
+
 #endif
-- 
2.13.5



[PATCH 4/8] scsi: sd_zbc: Reorganize and cleanup

2017-09-01 Thread Damien Le Moal
Introduce sd_zbc.h for zbc related declarations (avoid cluttering sd.h).

Fix comments style in sd_zbc.c (do not use documentation format) and
add/fix comments to enhance explanations. Also remove a useless blank
line in sd_zbc_read_zones().

Rearrange sd_zbc_setup() to include use_16_for_rw and use_10_for_rw
assignment and simplify nr_zones calculation.

Finally, use the min() macro in sd_zbc_check_zone_size() to get the report
zone reply size instead of the open-coded version.

Signed-off-by: Damien Le Moal 
---
 drivers/scsi/sd.c |  1 +
 drivers/scsi/sd.h | 53 
 drivers/scsi/sd_zbc.c | 85 +--
 drivers/scsi/sd_zbc.h | 75 +
 4 files changed, 117 insertions(+), 97 deletions(-)
 create mode 100644 drivers/scsi/sd_zbc.h

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 962cef13d406..6711d49fde79 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -68,6 +68,7 @@
 #include 
 
 #include "sd.h"
+#include "sd_zbc.h"
 #include "scsi_priv.h"
 #include "scsi_logging.h"
 
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 99c4dde9b6bf..5940390d30f3 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -272,57 +272,4 @@ static inline void sd_dif_complete(struct scsi_cmnd *cmd, 
unsigned int a)
 
 #endif /* CONFIG_BLK_DEV_INTEGRITY */
 
-static inline int sd_is_zoned(struct scsi_disk *sdkp)
-{
-   return sdkp->zoned == 1 || sdkp->device->type == TYPE_ZBC;
-}
-
-#ifdef CONFIG_BLK_DEV_ZONED
-
-extern int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buffer);
-extern void sd_zbc_remove(struct scsi_disk *sdkp);
-extern void sd_zbc_print_zones(struct scsi_disk *sdkp);
-extern int sd_zbc_write_lock_zone(struct scsi_cmnd *cmd);
-extern void sd_zbc_write_unlock_zone(struct scsi_cmnd *cmd);
-extern int sd_zbc_setup_report_cmnd(struct scsi_cmnd *cmd);
-extern int sd_zbc_setup_reset_cmnd(struct scsi_cmnd *cmd);
-extern void sd_zbc_complete(struct scsi_cmnd *cmd, unsigned int good_bytes,
-   struct scsi_sense_hdr *sshdr);
-
-#else /* CONFIG_BLK_DEV_ZONED */
-
-static inline int sd_zbc_read_zones(struct scsi_disk *sdkp,
-   unsigned char *buf)
-{
-   return 0;
-}
-
-static inline void sd_zbc_remove(struct scsi_disk *sdkp) {}
-
-static inline void sd_zbc_print_zones(struct scsi_disk *sdkp) {}
-
-static inline int sd_zbc_write_lock_zone(struct scsi_cmnd *cmd)
-{
-   /* Let the drive fail requests */
-   return BLKPREP_OK;
-}
-
-static inline void sd_zbc_write_unlock_zone(struct scsi_cmnd *cmd) {}
-
-static inline int sd_zbc_setup_report_cmnd(struct scsi_cmnd *cmd)
-{
-   return BLKPREP_INVALID;
-}
-
-static inline int sd_zbc_setup_reset_cmnd(struct scsi_cmnd *cmd)
-{
-   return BLKPREP_INVALID;
-}
-
-static inline void sd_zbc_complete(struct scsi_cmnd *cmd,
-  unsigned int good_bytes,
-  struct scsi_sense_hdr *sshdr) {}
-
-#endif /* CONFIG_BLK_DEV_ZONED */
-
 #endif /* _SCSI_DISK_H */
diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index d445a57f99bd..810377007665 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -26,22 +26,12 @@
 
 #include 
 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include "sd.h"
-#include "scsi_priv.h"
-
-/**
+#include "sd_zbc.h"
+
+/*
  * Convert a zone descriptor to a zone struct.
  */
-static void sd_zbc_parse_report(struct scsi_disk *sdkp,
-   u8 *buf,
+static void sd_zbc_parse_report(struct scsi_disk *sdkp, u8 *buf,
struct blk_zone *zone)
 {
struct scsi_device *sdp = sdkp->device;
@@ -63,8 +53,9 @@ static void sd_zbc_parse_report(struct scsi_disk *sdkp,
zone->wp = zone->start + zone->len;
 }
 
-/**
+/*
  * Issue a REPORT ZONES scsi command.
+ * For internal use.
  */
 static int sd_zbc_report_zones(struct scsi_disk *sdkp, unsigned char *buf,
   unsigned int buflen, sector_t lba)
@@ -105,6 +96,9 @@ static int sd_zbc_report_zones(struct scsi_disk *sdkp, 
unsigned char *buf,
return 0;
 }
 
+/*
+ * Prepare a REPORT ZONES scsi command for a REQ_OP_ZONE_REPORT request.
+ */
 int sd_zbc_setup_report_cmnd(struct scsi_cmnd *cmd)
 {
struct request *rq = cmd->request;
@@ -147,6 +141,10 @@ int sd_zbc_setup_report_cmnd(struct scsi_cmnd *cmd)
return BLKPREP_OK;
 }
 
+/*
+ * Process a REPORT ZONES scsi command reply, converting all reported
+ * zone descriptors to blk_zone structures.
+ */
 static void sd_zbc_report_zones_complete(struct scsi_cmnd *scmd,
 unsigned int good_bytes)
 {
@@ -202,17 +200,9 @@ static void sd_zbc_report_zones_complete(struct scsi_cmnd 
*scmd,
local_irq_restore(flags);
 }
 
-static inline sector_t sd_zbc_zone_sectors(struct 

[PATCH 3/8] scsi: sd_zbc: Move ZBC declarations to scsi_proto.h

2017-09-01 Thread Damien Le Moal
Move standard macro definitions for the zone types and zone conditions
to scsi_proto.h together with the definitions related to the
REPORT ZONES command.

While at it, define all values in the enums to be clear.

Signed-off-by: Damien Le Moal 
---
 drivers/scsi/sd_zbc.c | 18 --
 include/scsi/scsi_proto.h | 46 +++---
 2 files changed, 35 insertions(+), 29 deletions(-)

diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 8aa54779aac1..d445a57f99bd 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -37,24 +37,6 @@
 #include "sd.h"
 #include "scsi_priv.h"
 
-enum zbc_zone_type {
-   ZBC_ZONE_TYPE_CONV = 0x1,
-   ZBC_ZONE_TYPE_SEQWRITE_REQ,
-   ZBC_ZONE_TYPE_SEQWRITE_PREF,
-   ZBC_ZONE_TYPE_RESERVED,
-};
-
-enum zbc_zone_cond {
-   ZBC_ZONE_COND_NO_WP,
-   ZBC_ZONE_COND_EMPTY,
-   ZBC_ZONE_COND_IMP_OPEN,
-   ZBC_ZONE_COND_EXP_OPEN,
-   ZBC_ZONE_COND_CLOSED,
-   ZBC_ZONE_COND_READONLY = 0xd,
-   ZBC_ZONE_COND_FULL,
-   ZBC_ZONE_COND_OFFLINE,
-};
-
 /**
  * Convert a zone descriptor to a zone struct.
  */
diff --git a/include/scsi/scsi_proto.h b/include/scsi/scsi_proto.h
index 8c285d9a06d8..20d7b3fbc351 100644
--- a/include/scsi/scsi_proto.h
+++ b/include/scsi/scsi_proto.h
@@ -301,19 +301,43 @@ struct scsi_lun {
 
 /* Reporting options for REPORT ZONES */
 enum zbc_zone_reporting_options {
-   ZBC_ZONE_REPORTING_OPTION_ALL = 0,
-   ZBC_ZONE_REPORTING_OPTION_EMPTY,
-   ZBC_ZONE_REPORTING_OPTION_IMPLICIT_OPEN,
-   ZBC_ZONE_REPORTING_OPTION_EXPLICIT_OPEN,
-   ZBC_ZONE_REPORTING_OPTION_CLOSED,
-   ZBC_ZONE_REPORTING_OPTION_FULL,
-   ZBC_ZONE_REPORTING_OPTION_READONLY,
-   ZBC_ZONE_REPORTING_OPTION_OFFLINE,
-   ZBC_ZONE_REPORTING_OPTION_NEED_RESET_WP = 0x10,
-   ZBC_ZONE_REPORTING_OPTION_NON_SEQWRITE,
-   ZBC_ZONE_REPORTING_OPTION_NON_WP = 0x3f,
+   ZBC_ZONE_REPORTING_OPTION_ALL   = 0x00,
+   ZBC_ZONE_REPORTING_OPTION_EMPTY = 0x01,
+   ZBC_ZONE_REPORTING_OPTION_IMPLICIT_OPEN = 0x02,
+   ZBC_ZONE_REPORTING_OPTION_EXPLICIT_OPEN = 0x03,
+   ZBC_ZONE_REPORTING_OPTION_CLOSED= 0x04,
+   ZBC_ZONE_REPORTING_OPTION_FULL  = 0x05,
+   ZBC_ZONE_REPORTING_OPTION_READONLY  = 0x06,
+   ZBC_ZONE_REPORTING_OPTION_OFFLINE   = 0x07,
+   /* 0x08 to 0x0f are reserved */
+   ZBC_ZONE_REPORTING_OPTION_NEED_RESET_WP = 0x10,
+   ZBC_ZONE_REPORTING_OPTION_NON_SEQWRITE  = 0x11,
+   /* 0x12 to 0x3e are reserved */
+   ZBC_ZONE_REPORTING_OPTION_NON_WP= 0x3f,
 };
 
 #define ZBC_REPORT_ZONE_PARTIAL 0x80
 
+/* Zone types of REPORT ZONES zone descriptors */
+enum zbc_zone_type {
+   ZBC_ZONE_TYPE_CONV  = 0x1,
+   ZBC_ZONE_TYPE_SEQWRITE_REQ  = 0x2,
+   ZBC_ZONE_TYPE_SEQWRITE_PREF = 0x3,
+   /* 0x4 to 0xf are reserved */
+   ZBC_ZONE_TYPE_RESERVED  = 0x4,
+};
+
+/* Zone conditions of REPORT ZONES zone descriptors */
+enum zbc_zone_cond {
+   ZBC_ZONE_COND_NO_WP = 0x0,
+   ZBC_ZONE_COND_EMPTY = 0x1,
+   ZBC_ZONE_COND_IMP_OPEN  = 0x2,
+   ZBC_ZONE_COND_EXP_OPEN  = 0x3,
+   ZBC_ZONE_COND_CLOSED= 0x4,
+   /* 0x5 to 0xc are reserved */
+   ZBC_ZONE_COND_READONLY  = 0xd,
+   ZBC_ZONE_COND_FULL  = 0xe,
+   ZBC_ZONE_COND_OFFLINE   = 0xf,
+};
+
 #endif /* _SCSI_PROTO_H_ */
-- 
2.13.5



[PATCH 2/8] block: Fix declaration of blk-mq scheduler functions

2017-09-01 Thread Damien Le Moal
The functions blk_mq_sched_free_hctx_data(), blk_mq_sched_try_merge(),
blk_mq_sched_try_insert_merge() and blk_mq_sched_request_inserted() are
all exported symbols but declared only internally in
block/blk-mq-sched.h. Move their declaration to include/linux/blk-mq.h
to make them available to block external scheduler modules.

Signed-off-by: Damien Le Moal 
---
 block/blk-mq-sched.h   |  7 ---
 include/linux/blk-mq.h | 10 ++
 2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index 9267d0b7c197..3b725f8fbebe 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -4,16 +4,9 @@
 #include "blk-mq.h"
 #include "blk-mq-tag.h"
 
-void blk_mq_sched_free_hctx_data(struct request_queue *q,
-void (*exit)(struct blk_mq_hw_ctx *));
-
 void blk_mq_sched_assign_ioc(struct request *rq, struct bio *bio);
 
-void blk_mq_sched_request_inserted(struct request *rq);
-bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
-   struct request **merged_request);
 bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio);
-bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq);
 void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx);
 
 void blk_mq_sched_insert_request(struct request *rq, bool at_head,
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index a369174a9679..128742552549 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -284,6 +284,16 @@ void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues);
 void blk_mq_quiesce_queue_nowait(struct request_queue *q);
 
 /*
+ * Scheduler helper functions.
+ */
+void blk_mq_sched_free_hctx_data(struct request_queue *q,
+void (*exit)(struct blk_mq_hw_ctx *));
+bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
+   struct request **merged_request);
+bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq);
+void blk_mq_sched_request_inserted(struct request *rq);
+
+/*
  * Driver command data is immediately after the request. So subtract request
  * size to get back to the original request, add request size to get the PDU.
  */
-- 
2.13.5



Re: [PATCH 4/9] blk-mq: only run hw queues for blk-mq

2017-09-01 Thread Johannes Thumshirn
Hi Ming,

On Fri, Sep 01, 2017 at 01:27:23AM +0800, Ming Lei wrote:
> - blk_mq_run_hw_queues(q, false);
> + if (q->mq_ops)
> + blk_mq_run_hw_queues(q, false);

What speaks against putting the if (q->mq_ops) check directly into
blk_mq_run_hw_queues() so we can't accidentally call it from the sq path?
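
Concretely, I mean something like the following (just an untested sketch of
the idea, based on the current blk_mq_run_hw_queues(), not an actual patch):

void blk_mq_run_hw_queues(struct request_queue *q, bool async)
{
    struct blk_mq_hw_ctx *hctx;
    int i;

    /* Legacy (sq) queues have no hw queues to run, so bail out here
     * instead of making every caller test q->mq_ops first. */
    if (!q->mq_ops)
        return;

    queue_for_each_hw_ctx(q, hctx, i) {
        if (blk_mq_hctx_stopped(hctx))
            continue;
        blk_mq_run_hw_queue(hctx, async);
    }
}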

Just an idea, no hard preferences here.

Byte,
Johannes

-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


Re: non-blocking buffered reads V5

2017-09-01 Thread Christoph Hellwig
Given that we got all the reviews, is there a chance to get
this picked up for 4.14?  The last round of nowait bits went
through the block tree, and this one has the block bits as well,
so maybe we should go for that again?


Enabling skip_copy can cause a data integrity issue in some storage stacks

2017-09-01 Thread alexwu

Hi,

Recently a data integrity issue involving skip_copy was found. We are able
to reproduce it and have found the root cause. The issue can happen when
there are other block layers between the file system and raid5.

[How to Reproduce]

1. Create a raid5 array named md0 (with skip_copy enabled), and wait for the
   md0 resync to finish, which ensures that all data and parity are synchronized
2. Use the lvm tools to create a logical volume named lv-md0 over md0
3. Format an ext4 file system on lv-md0 and mount it on /mnt
4. Do some db operations (e.g. sqlite inserts) to write data through /mnt
5. When those db operations have finished, run
   "echo check > /sys/block/md0/md/sync_action" and check mismatch_cnt;
   it is very likely that mismatch_cnt != 0 when the check finishes

[Root Cause]

After tracing the code and running more experiments, it is more accurate to
say that this is a problem with backing_dev_info (bdi) rather than a bug in
skip_copy.

We notice that:
   1. skip_copy counts on BDI_CAP_STABLE_WRITES to ensure that a bio's pages
      will not be modified before raid5 completes the I/O, so raid5 can skip
      copying pages from the bio into the stripe cache
   2. The ext4 file system calls wait_for_stable_page() to ask whether the
      mapped bdi requires stable writes

The data integrity issue happens because:
   1. When raid5 enables skip_copy, it only sets BDI_CAP_STABLE_WRITES on its
      own bdi, and this information does not propagate to the other bdis
      between the file system and md
   2. When the ext4 file system checks the stable-writes requirement by
      calling wait_for_stable_page(), it can only see the capability of the
      bdi directly below it, not all of the related bdis (see the code
      sketch below)
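
To make this concrete, this is roughly what the two sides look like. It is
heavily simplified from my reading of raid5.c and mm/page-writeback.c, so
please take it as a sketch rather than the exact upstream code:

/* md/raid5: enabling skip_copy only marks md's own bdi */
if (skip_copy)
    mddev->queue->backing_dev_info->capabilities |=
        BDI_CAP_STABLE_WRITES;
else
    mddev->queue->backing_dev_info->capabilities &=
        ~BDI_CAP_STABLE_WRITES;

/* mm/page-writeback.c: the file system only consults the bdi it sits
 * directly on top of -- the dm/lvm bdi in the failing stack -- and that
 * bdi never inherits the flag from md's bdi */
void wait_for_stable_page(struct page *page)
{
    if (bdi_cap_stable_pages_required(inode_to_bdi(page->mapping->host)))
        wait_on_page_writeback(page);
}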

Thus, skip_copy works fine if we format the file system directly on md.
But the data integrity issue happens if there are other block layers
(e.g. dm) between the file system and md.

[Result]

We ran more tests on different storage stacks; here are the results.

The following settings can pass the test a thousand times:
   1. raid5 with skip_copy enable + ext4
   2. raid5 with skip_copy disable + ext4
   3. raid5 with skip_copy disable + lvm + ext4

The following setting fails the test within 10 rounds:
   1. raid5 with skip_copy enable + lvm + ext4

I think the solution might be to let the bdis of the different block layers
communicate with each other, so that the BDI_CAP_STABLE_WRITES information
can be passed up the stack when skip_copy is enabled.

But the current bdi structure does not allow us to do that.

What would you suggest if we want to make skip_copy more reliable?


Best Regards,
Alex



Re: [PATCH] block: sed-opal: Set MBRDone on S3 resume path if TPER is MBREnabled

2017-09-01 Thread Christoph Hellwig
On Thu, Aug 31, 2017 at 01:58:01PM -0600, Scott Bauer wrote:
> > > + { set_mbr_done, _done_tf },
> > Do you need to end_opal_session here?
> >
> Yep, sure do. I'll wait for Christoph to look at it tonight before spinning 
> another.

No additional comments from me; please just resend with that fix.


Re: [PATCH 0/9] block/scsi: safe SCSI quiescing

2017-09-01 Thread Ming Lei
On Fri, Sep 01, 2017 at 08:24:46AM +0200, oleksa...@natalenko.name wrote:
> Hi.
> 
> 01.09.2017 05:45, Ming Lei wrote:
> > Could you try the following patch against this patchset to see
> > if there is still the warning?
> 
> With this patch I wasn't able to trigger the per-CPU-related warning.

That is great!

> 
> Also, for the 8th patch you've written:
> 
> > I have sent one delta patch in list, which will only call
> > blk_queue_enter_live() iff SCSI device is in QUIESCE.
> 
> This refers to the patch I've just tested, right?

Yeah, you are right, thanks again for your test!

I will prepare V2 and send it out soon.

-- 
Ming


Re: [PATCH 0/9] block/scsi: safe SCSI quiescing

2017-09-01 Thread oleksandr

Hi.

01.09.2017 05:45, Ming Lei wrote:

> Could you try the following patch against this patchset to see
> if there is still the warning?


With this patch I wasn't able to trigger the per-CPU-related warning.

Also, for the 8th patch you've written:

> I have sent one delta patch in list, which will only call
> blk_queue_enter_live() iff SCSI device is in QUIESCE.


This refers to the patch I've just tested, right?