Re: [PATCH 05/12] blk-mq: Introduce blk_mq_quiesce_queue()

2016-10-27 Thread Bart Van Assche

On 10/26/2016 10:52 PM, Hannes Reinecke wrote:

Hmm. Can't say I like having to have two RCU structures for the same
purpose, but I guess that can't be helped.


Hello Hannes,

There are two RCU structures because of BLK_MQ_F_BLOCKING. Today only 
the nbd driver sets that flag. If the nbd driver were modified such 
that it no longer sets that flag, then both the BLK_MQ_F_BLOCKING flag 
and queue_rq_srcu could be eliminated from the blk-mq core.
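
As a minimal sketch of how a driver opts into the SRCU path (the driver
and function names below are hypothetical; BLK_MQ_F_BLOCKING,
BLK_MQ_F_SHOULD_MERGE and blk_mq_alloc_tag_set() are existing blk-mq
interfaces), a driver whose .queue_rq() may sleep would set the flag
when allocating its tag set:

#include <linux/blk-mq.h>
#include <linux/numa.h>

/* Hypothetical ops table; its .queue_rq() is allowed to sleep. */
static struct blk_mq_ops my_blocking_mq_ops;

/*
 * Sketch only: "set" is assumed to be zero-initialized by the caller.
 * Setting BLK_MQ_F_BLOCKING selects the SRCU-protected dispatch path;
 * drivers that never block in .queue_rq() omit the flag and stay on
 * the plain-RCU path.
 */
static int my_driver_setup_tag_set(struct blk_mq_tag_set *set)
{
	set->ops = &my_blocking_mq_ops;
	set->nr_hw_queues = 1;
	set->queue_depth = 64;
	set->numa_node = NUMA_NO_NODE;
	set->flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_BLOCKING;

	return blk_mq_alloc_tag_set(set);
}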


Bart.


Re: [PATCH 05/12] blk-mq: Introduce blk_mq_quiesce_queue()

2016-10-27 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH 05/12] blk-mq: Introduce blk_mq_quiesce_queue()

2016-10-26 Thread Hannes Reinecke
On 10/27/2016 12:53 AM, Bart Van Assche wrote:
> blk_mq_quiesce_queue() waits until ongoing .queue_rq() invocations
> have finished. This function does *not* wait until all outstanding
> requests have finished (this means invocation of request.end_io()).
> The algorithm used by blk_mq_quiesce_queue() is as follows:
> * Hold either an RCU read lock or an SRCU read lock around
>   .queue_rq() calls. The former is used if .queue_rq() does not
>   block and the latter if .queue_rq() may block.
> * blk_mq_quiesce_queue() calls synchronize_srcu() or
>   synchronize_rcu() to wait for .queue_rq() invocations that
>   started before blk_mq_quiesce_queue() was called.
> * The blk_mq_hctx_stopped() calls that control whether or not
>   .queue_rq() will be called are called with the (S)RCU read lock
>   held. This is necessary to avoid race conditions against
>   the "blk_mq_stop_hw_queues(q); blk_mq_quiesce_queue(q);"
>   sequence from another thread.
> 
> Signed-off-by: Bart Van Assche 
> Cc: Christoph Hellwig 
> Cc: Ming Lei 
> Cc: Hannes Reinecke 
> Cc: Johannes Thumshirn 
> ---
>  block/Kconfig  |  1 +
>  block/blk-mq.c | 69 +-
>  include/linux/blk-mq.h |  3 +++
>  include/linux/blkdev.h |  1 +
>  4 files changed, 67 insertions(+), 7 deletions(-)
> 
Hmm. Can't say I like having to have two RCU structures for the same
purpose, but I guess that can't be helped.

Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes Reinecke        Teamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH 05/12] blk-mq: Introduce blk_mq_quiesce_queue()

2016-10-26 Thread Bart Van Assche

On 10/26/16 18:30, Ming Lei wrote:

On Thu, Oct 27, 2016 at 6:53 AM, Bart Van Assche wrote:

blk_mq_quiesce_queue() waits until ongoing .queue_rq() invocations
have finished. This function does *not* wait until all outstanding
requests have finished (this means invocation of request.end_io()).
The algorithm used by blk_mq_quiesce_queue() is as follows:
* Hold either an RCU read lock or an SRCU read lock around
  .queue_rq() calls. The former is used if .queue_rq() does not
  block and the latter if .queue_rq() may block.
* blk_mq_quiesce_queue() calls synchronize_srcu() or
  synchronize_rcu() to wait for .queue_rq() invocations that
  started before blk_mq_quiesce_queue() was called.
* The blk_mq_hctx_stopped() calls that control whether or not
  .queue_rq() will be called are called with the (S)RCU read lock
  held. This is necessary to avoid race conditions against
  the "blk_mq_stop_hw_queues(q); blk_mq_quiesce_queue(q);"
  sequence from another thread.

Signed-off-by: Bart Van Assche 
Cc: Christoph Hellwig 
Cc: Ming Lei 
Cc: Hannes Reinecke 
Cc: Johannes Thumshirn 
---
 block/Kconfig  |  1 +
 block/blk-mq.c | 69 +-
 include/linux/blk-mq.h |  3 +++
 include/linux/blkdev.h |  1 +
 4 files changed, 67 insertions(+), 7 deletions(-)

diff --git a/block/Kconfig b/block/Kconfig
index 1d4d624..0562ef9 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -5,6 +5,7 @@ menuconfig BLOCK
bool "Enable the block layer" if EXPERT
default y
select SBITMAP
+   select SRCU
help
 Provide block layer support for the kernel.

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 0cf21c2..4945437 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -115,6 +115,31 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);

+/**
+ * blk_mq_quiesce_queue() - wait until all ongoing queue_rq calls have finished
+ * @q: request queue.
+ *
+ * Note: this function does not prevent that the struct request end_io()
+ * callback function is invoked. Additionally, it is not prevented that
+ * new queue_rq() calls occur unless the queue has been stopped first.
+ */
+void blk_mq_quiesce_queue(struct request_queue *q)
+{
+   struct blk_mq_hw_ctx *hctx;
+   unsigned int i;
+   bool rcu = false;


Before synchronizing SRCU/RCU, we have to set a per-hctx flag
or per-queue flag to block incoming .queue_rq() calls, as I mentioned
before:

   https://www.spinics.net/lists/linux-rdma/msg41389.html


Hello Ming,

Thanks for including a URL to an archived version of that 
discussion. What I remember about that discussion is that I proposed to 
use the existing flag BLK_MQ_S_STOPPED instead of introducing a
new QUEUE_FLAG_QUIESCING flag, and that you agreed with that proposal. 
See also https://www.spinics.net/lists/linux-rdma/msg41430.html.
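
In other words, a caller that wants to block new .queue_rq() invocations
before the grace period simply stops the hardware queues first, as in the
sequence from the patch description. A minimal caller-side sketch (the
helper name is hypothetical; blk_mq_stop_hw_queues() and
blk_mq_start_stopped_hw_queues() are existing blk-mq calls, and
blk_mq_quiesce_queue() is the function this patch introduces):

#include <linux/blk-mq.h>

/*
 * Hypothetical driver helper sketching the intended usage: stopping the
 * hw queues sets BLK_MQ_S_STOPPED on every hctx, so the
 * blk_mq_hctx_stopped() check performed under the (S)RCU read lock
 * prevents new .queue_rq() calls; the quiesce call then waits for the
 * .queue_rq() invocations that were already in progress.
 */
static void my_driver_drain_queue_rq(struct request_queue *q)
{
	blk_mq_stop_hw_queues(q);
	blk_mq_quiesce_queue(q);

	/* ... reconfigure the transport, handle errors, etc. ... */

	blk_mq_start_stopped_hw_queues(q, true);	/* resume dispatch */
}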


Bart.


Re: [PATCH 05/12] blk-mq: Introduce blk_mq_quiesce_queue()

2016-10-26 Thread Ming Lei
On Thu, Oct 27, 2016 at 6:53 AM, Bart Van Assche wrote:
> blk_mq_quiesce_queue() waits until ongoing .queue_rq() invocations
> have finished. This function does *not* wait until all outstanding
> requests have finished (this means invocation of request.end_io()).
> The algorithm used by blk_mq_quiesce_queue() is as follows:
> * Hold either an RCU read lock or an SRCU read lock around
>   .queue_rq() calls. The former is used if .queue_rq() does not
>   block and the latter if .queue_rq() may block.
> * blk_mq_quiesce_queue() calls synchronize_srcu() or
>   synchronize_rcu() to wait for .queue_rq() invocations that
>   started before blk_mq_quiesce_queue() was called.
> * The blk_mq_hctx_stopped() calls that control whether or not
>   .queue_rq() will be called are called with the (S)RCU read lock
>   held. This is necessary to avoid race conditions against
>   the "blk_mq_stop_hw_queues(q); blk_mq_quiesce_queue(q);"
>   sequence from another thread.
>
> Signed-off-by: Bart Van Assche 
> Cc: Christoph Hellwig 
> Cc: Ming Lei 
> Cc: Hannes Reinecke 
> Cc: Johannes Thumshirn 
> ---
>  block/Kconfig  |  1 +
>  block/blk-mq.c | 69 +-
>  include/linux/blk-mq.h |  3 +++
>  include/linux/blkdev.h |  1 +
>  4 files changed, 67 insertions(+), 7 deletions(-)
>
> diff --git a/block/Kconfig b/block/Kconfig
> index 1d4d624..0562ef9 100644
> --- a/block/Kconfig
> +++ b/block/Kconfig
> @@ -5,6 +5,7 @@ menuconfig BLOCK
> bool "Enable the block layer" if EXPERT
> default y
> select SBITMAP
> +   select SRCU
> help
>  Provide block layer support for the kernel.
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 0cf21c2..4945437 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -115,6 +115,31 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
>  }
>  EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
>
> +/**
> + * blk_mq_quiesce_queue() - wait until all ongoing queue_rq calls have finished
> + * @q: request queue.
> + *
> + * Note: this function does not prevent that the struct request end_io()
> + * callback function is invoked. Additionally, it is not prevented that
> + * new queue_rq() calls occur unless the queue has been stopped first.
> + */
> +void blk_mq_quiesce_queue(struct request_queue *q)
> +{
> +   struct blk_mq_hw_ctx *hctx;
> +   unsigned int i;
> +   bool rcu = false;

Before synchronizing SRCU/RCU, we have to set a per-hctx flag
or per-queue flag to block incoming .queue_rq() calls, as I mentioned
before:

   https://www.spinics.net/lists/linux-rdma/msg41389.html

> +
> +   queue_for_each_hw_ctx(q, hctx, i) {
> +   if (hctx->flags & BLK_MQ_F_BLOCKING)
> +   synchronize_srcu(&hctx->queue_rq_srcu);
> +   else
> +   rcu = true;
> +   }
> +   if (rcu)
> +   synchronize_rcu();
> +}
> +EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
> +
>  void blk_mq_wake_waiters(struct request_queue *q)
>  {
> struct blk_mq_hw_ctx *hctx;
> @@ -778,7 +803,7 @@ static inline unsigned int queued_to_index(unsigned int queued)
>   * of IO. In particular, we'd like FIFO behaviour on handling existing
>   * items on the hctx->dispatch list. Ignore that for now.
>   */
> -static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
> +static void blk_mq_process_rq_list(struct blk_mq_hw_ctx *hctx)
>  {
> struct request_queue *q = hctx->queue;
> struct request *rq;
> @@ -790,9 +815,6 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
> if (unlikely(blk_mq_hctx_stopped(hctx)))
> return;
>
> -   WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
> -   cpu_online(hctx->next_cpu));
> -
> hctx->run++;
>
> /*
> @@ -883,6 +905,24 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
> }
>  }
>
> +static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
> +{
> +   int srcu_idx;
> +
> +   WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
> +   cpu_online(hctx->next_cpu));
> +
> +   if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
> +   rcu_read_lock();
> +   blk_mq_process_rq_list(hctx);
> +   rcu_read_unlock();
> +   } else {
> +   srcu_idx = srcu_read_lock(&hctx->queue_rq_srcu);
> +   blk_mq_process_rq_list(hctx);
> +   srcu_read_unlock(&hctx->queue_rq_srcu, srcu_idx);
> +   }
> +}
> +
>  /*
>   * It'd be great if the workqueue API had a way to pass
>   * in a mask and had some smarts for more clever placement.
> @@ -1293,7 +1333,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
> const int is_flush_fua = bio->bi_opf & (REQ_PREFLUSH | REQ_FUA);

[PATCH 05/12] blk-mq: Introduce blk_mq_quiesce_queue()

2016-10-26 Thread Bart Van Assche
blk_mq_quiesce_queue() waits until ongoing .queue_rq() invocations
have finished. This function does *not* wait until all outstanding
requests have finished (this means invocation of request.end_io()).
The algorithm used by blk_mq_quiesce_queue() is as follows:
* Hold either an RCU read lock or an SRCU read lock around
  .queue_rq() calls. The former is used if .queue_rq() does not
  block and the latter if .queue_rq() may block.
* blk_mq_quiesce_queue() calls synchronize_srcu() or
  synchronize_rcu() to wait for .queue_rq() invocations that
  started before blk_mq_quiesce_queue() was called.
* The blk_mq_hctx_stopped() calls that control whether or not
  .queue_rq() will be called are called with the (S)RCU read lock
  held. This is necessary to avoid race conditions against
  the "blk_mq_stop_hw_queues(q); blk_mq_quiesce_queue(q);"
  sequence from another thread.

Signed-off-by: Bart Van Assche 
Cc: Christoph Hellwig 
Cc: Ming Lei 
Cc: Hannes Reinecke 
Cc: Johannes Thumshirn 
---
 block/Kconfig  |  1 +
 block/blk-mq.c | 69 +-
 include/linux/blk-mq.h |  3 +++
 include/linux/blkdev.h |  1 +
 4 files changed, 67 insertions(+), 7 deletions(-)

diff --git a/block/Kconfig b/block/Kconfig
index 1d4d624..0562ef9 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -5,6 +5,7 @@ menuconfig BLOCK
bool "Enable the block layer" if EXPERT
default y
select SBITMAP
+   select SRCU
help
 Provide block layer support for the kernel.
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 0cf21c2..4945437 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -115,6 +115,31 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
 
+/**
+ * blk_mq_quiesce_queue() - wait until all ongoing queue_rq calls have finished
+ * @q: request queue.
+ *
+ * Note: this function does not prevent that the struct request end_io()
+ * callback function is invoked. Additionally, it is not prevented that
+ * new queue_rq() calls occur unless the queue has been stopped first.
+ */
+void blk_mq_quiesce_queue(struct request_queue *q)
+{
+   struct blk_mq_hw_ctx *hctx;
+   unsigned int i;
+   bool rcu = false;
+
+   queue_for_each_hw_ctx(q, hctx, i) {
+   if (hctx->flags & BLK_MQ_F_BLOCKING)
+   synchronize_srcu(&hctx->queue_rq_srcu);
+   else
+   rcu = true;
+   }
+   if (rcu)
+   synchronize_rcu();
+}
+EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
+
 void blk_mq_wake_waiters(struct request_queue *q)
 {
struct blk_mq_hw_ctx *hctx;
@@ -778,7 +803,7 @@ static inline unsigned int queued_to_index(unsigned int queued)
  * of IO. In particular, we'd like FIFO behaviour on handling existing
  * items on the hctx->dispatch list. Ignore that for now.
  */
-static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
+static void blk_mq_process_rq_list(struct blk_mq_hw_ctx *hctx)
 {
struct request_queue *q = hctx->queue;
struct request *rq;
@@ -790,9 +815,6 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
if (unlikely(blk_mq_hctx_stopped(hctx)))
return;
 
-   WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
-   cpu_online(hctx->next_cpu));
-
hctx->run++;
 
/*
@@ -883,6 +905,24 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
}
 }
 
+static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
+{
+   int srcu_idx;
+
+   WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
+   cpu_online(hctx->next_cpu));
+
+   if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
+   rcu_read_lock();
+   blk_mq_process_rq_list(hctx);
+   rcu_read_unlock();
+   } else {
+   srcu_idx = srcu_read_lock(&hctx->queue_rq_srcu);
+   blk_mq_process_rq_list(hctx);
+   srcu_read_unlock(&hctx->queue_rq_srcu, srcu_idx);
+   }
+}
+
 /*
  * It'd be great if the workqueue API had a way to pass
  * in a mask and had some smarts for more clever placement.
@@ -1293,7 +1333,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
const int is_flush_fua = bio->bi_opf & (REQ_PREFLUSH | REQ_FUA);
struct blk_map_ctx data;
struct request *rq;
-   unsigned int request_count = 0;
+   unsigned int request_count = 0, srcu_idx;
struct blk_plug *plug;
struct request *same_queue_rq = NULL;
blk_qc_t cookie;
@@ -1336,7 +1376,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
blk_mq_bio_to_request(rq, bio);
 
/*
-* We do limited pluging. If the bio can be merged, do that.
+