Re: [PATCH 1/2] block: Zoned block device single-threaded submission

2017-08-07 Thread Damien Le Moal
Christoph,

On 8/5/17 20:34, Christoph Hellwig wrote:
> We'll need a blk-mq version as well, otherwise: NAK.

Not that I have not tried, but I do not see how this is possible without
in the end making blk-mq/scsi-mq for a ZBC disk work exactly like the sq
path, that is, adding locks/barriers in many places to prevent the three
different mq contexts (submission, run and requeue) from potentially
messing with the dispatch queue order. I do not see any solution simple
enough to be considered RC material.

This patch ensures that for 4.13 we at least have the legacy single
queue I/O path that is safe for zoned block devices, with the other
patch I sent (+ Bart's "always unprep" patch) ensuring that mq does not
deadlock (and only that: unaligned write errors can still happen with
ZBC drives).

Going forward, considering only blk-mq/scsi-mq (since the legacy path
will eventually go away), I think that trying to ensure per-zone
sequential writes at the SCSI layer is not a sustainable approach. It
will add too many constraints on the mq path/queue management, and will
only make the mq code more complex and any issue with sequential writes
very hard to debug.

I thought of another simpler and easier to maintain approach: extending
the writeback throttling code to implement an "only one write per
sequential zone" I/O pattern, which will always result in sequential
writes within a zone no matter what blk-mq, the mq schedulers or the
scsi dispatch code do. In effect, this is exactly the same as what the
zone locking does currently, but all the implementation would be limited
to the higher submit_bio() level. This would allow removing all the ZBC
specific code in the I/O path (single threaded dispatch, zone lock) and
would not require touching the mq I/O path. So overall, a much cleaner
and easier to maintain approach.
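
As an illustration only, the "one write per sequential zone" rule
described above can be sketched in userspace C. This is not kernel code
and not the proposed implementation: the zone size, the zone count, the
bitmap and all function names are assumptions made up for the example.
The point it shows is that a write is admitted only if no other write to
the same zone is in flight, so writes within a zone cannot be reordered
by anything below the gate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_ZONES      8u      /* hypothetical: fits in one 32-bit mask */
#define ZONE_SECTORS  1024u   /* hypothetical zone size, in sectors */

/* One bit per zone, set while a write to that zone is in flight. */
static uint32_t zone_write_busy;

static unsigned int sector_to_zone(uint64_t sector)
{
	unsigned int zone = (unsigned int)(sector / ZONE_SECTORS);

	assert(zone < NR_ZONES);
	return zone;
}

/*
 * Try to admit a write starting at @sector. Returns false if the target
 * zone already has a write in flight; the caller must hold the write
 * back and retry after zone_write_done().
 */
static bool zone_write_try_start(uint64_t sector)
{
	uint32_t bit = UINT32_C(1) << sector_to_zone(sector);

	if (zone_write_busy & bit)
		return false;
	zone_write_busy |= bit;
	return true;
}

/* Called when the write completes: the zone accepts a new write again. */
static void zone_write_done(uint64_t sector)
{
	zone_write_busy &= ~(UINT32_C(1) << sector_to_zone(sector));
}
```

With these assumptions, a second write to zone 0 is held back until the
first one completes, while a write to zone 1 is admitted immediately. A
real implementation would also need locking around the bitmap and a way
to wake up held-back writers, both of which are left out here.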

Of course, this kind of writeback throttling could be implemented in
each zoned block device user (currently only f2fs and dm-zoned, but
likely more coming). But that would lead to a lot of duplicated code. So
integrating that into submit_bio()/WBT makes sense to me.

What do you think?

Of course, I may be missing something really simple to solve the problem
in blk-mq. I would be happy to tackle the implementation & testing if
someone has an idea.

Best regards.

-- 
Damien Le Moal,
Western Digital


Re: [PATCH 1/2] block: Zoned block device single-threaded submission

2017-08-05 Thread Christoph Hellwig
We'll need a blk-mq version as well, otherwise: NAK.


Re: [PATCH 1/2] block: Zoned block device single-threaded submission

2017-08-04 Thread Bart Van Assche
On Fri, 2017-08-04 at 16:52 +0900, Damien Le Moal wrote:
> From: Hannes Reinecke 
> 
> The scsi_request_fn() dispatch function internally unlocks the request
> queue before submitting a request to the underlying LLD. This can
> potentially lead to write request reordering if the context executing
> scsi_request_fn() is preempted before the request is submitted to the
> LLD and another context starts executing the same function.
> 
> This is not a problem for regular disks but leads to write I/O errors
> on host managed zoned block devices and reduces the effectiveness of
> sequential write optimizations for host aware disks.
> (Note: the zone write lock in place in the scsi command init code will
> prevent multiple writes from being issued simultaneously to the same
> zone to avoid HBA level reordering issues, but this locking mechanism
> is ineffective to prevent reordering at the dispatch level)
> 
> Prevent this from happening by limiting the number of contexts that can
> simultaneously execute the queue request_fn() function to a single
> thread.
> 
> A similar patch was originally proposed by Hannes Reinecke in a first
> set of patches implementing ZBC support but ultimately not included in
> the final support implementation. See commit 92f5e2a295
> "block: add flag for single-threaded submission" in the tree
> https://git.kernel.org/pub/scm/linux/kernel/git/hare/scsi-devel.git/log/?h=zac.v3
> 
> Authorship thus goes to Hannes.
> 
> Signed-off-by: Hannes Reinecke 
> Signed-off-by: Damien Le Moal 
> ---
>  block/blk-core.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index dbecbf4a64e0..cf590cbddcfd 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -371,7 +371,14 @@ inline void __blk_run_queue_uncond(struct request_queue *q)
>  	 * running such a request function concurrently. Keep track of the
>  	 * number of active request_fn invocations such that blk_drain_queue()
>  	 * can wait until all these request_fn calls have finished.
> +	 *
> +	 * For zoned block devices, do not allow multiple threads to
> +	 * dequeue requests as this can lead to write request reordering
> +	 * during the time the queue is unlocked.
>  	 */
> +	if (blk_queue_is_zoned(q) && q->request_fn_active)
> +		return;
> +
>  	q->request_fn_active++;
>  	q->request_fn(q);
>  	q->request_fn_active--;

Hello Damien,

Since serialization of request queue processing is only needed for ZBC and
since all ZBC devices use the SCSI core, could this serialization have been
achieved by modifying the SCSI core, e.g. by adding the following before the
for-loop in scsi_request_fn():

	if (blk_queue_is_zoned(q) && q->request_fn_active > 1)
		return;

Thanks,

Bart.
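
The two checks in this exchange differ only in where they run relative
to the counter increment: the patch tests request_fn_active before
__blk_run_queue_uncond() increments it, while a check placed inside
scsi_request_fn() runs after the increment and therefore sees the caller
itself counted, hence "> 1". The following userspace model is a sketch
of that difference only. The struct and helper names are hypothetical,
and the model is sequential, whereas in the kernel the second context
gets in while the first has dropped the queue lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two request_queue fields that matter. */
struct queue_model {
	bool zoned;
	unsigned int request_fn_active;  /* mirrors q->request_fn_active */
};

/* Guard from the patch: tested before the counter is incremented. */
static bool enter_with_block_guard(struct queue_model *q)
{
	if (q->zoned && q->request_fn_active)
		return false;            /* another context is dispatching */
	q->request_fn_active++;
	return true;                     /* caller dispatches, then leave() */
}

/* Guard suggested for scsi_request_fn(): tested after the increment. */
static bool enter_with_scsi_guard(struct queue_model *q)
{
	q->request_fn_active++;          /* done by the block layer caller */
	if (q->zoned && q->request_fn_active > 1) {
		q->request_fn_active--;  /* request_fn returned early */
		return false;
	}
	return true;                     /* caller dispatches, then leave() */
}

/* The dispatching context is done with the queue. */
static void leave(struct queue_model *q)
{
	q->request_fn_active--;
}
```

In this model both guards let exactly one context dispatch on a zoned
queue and leave non-zoned queues unrestricted.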

[PATCH 1/2] block: Zoned block device single-threaded submission

2017-08-04 Thread Damien Le Moal
From: Hannes Reinecke 

The scsi_request_fn() dispatch function internally unlocks the request
queue before submitting a request to the underlying LLD. This can
potentially lead to write request reordering if the context executing
scsi_request_fn() is preempted before the request is submitted to the
LLD and another context starts executing the same function.

This is not a problem for regular disks but leads to write I/O errors
on host managed zoned block devices and reduces the effectiveness of
sequential write optimizations for host aware disks.
(Note: the zone write lock in place in the scsi command init code will
prevent multiple writes from being issued simultaneously to the same
zone to avoid HBA level reordering issues, but this locking mechanism
is ineffective to prevent reordering at the dispatch level)

Prevent this from happening by limiting the number of contexts that can
simultaneously execute the queue request_fn() function to a single
thread.

A similar patch was originally proposed by Hannes Reinecke in a first
set of patches implementing ZBC support but ultimately not included in
the final support implementation. See commit 92f5e2a295
"block: add flag for single-threaded submission" in the tree
https://git.kernel.org/pub/scm/linux/kernel/git/hare/scsi-devel.git/log/?h=zac.v3

Authorship thus goes to Hannes.

Signed-off-by: Hannes Reinecke 
Signed-off-by: Damien Le Moal 
---
 block/blk-core.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index dbecbf4a64e0..cf590cbddcfd 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -371,7 +371,14 @@ inline void __blk_run_queue_uncond(struct request_queue *q)
 	 * running such a request function concurrently. Keep track of the
 	 * number of active request_fn invocations such that blk_drain_queue()
 	 * can wait until all these request_fn calls have finished.
+	 *
+	 * For zoned block devices, do not allow multiple threads to
+	 * dequeue requests as this can lead to write request reordering
+	 * during the time the queue is unlocked.
 	 */
+	if (blk_queue_is_zoned(q) && q->request_fn_active)
+		return;
+
 	q->request_fn_active++;
 	q->request_fn(q);
 	q->request_fn_active--;
-- 
2.13.3
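
To make the failure mode in the commit message concrete, here is a
small userspace model of why reordering matters on a host managed
device: a sequential zone accepts a write only at its write pointer, so
two writes that are correct when issued in order fail when the dispatch
order is swapped. This is a sketch under assumptions, not driver code.
The struct, the function names and the sector values are all made up
for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one sequential write required zone. */
struct zone_model {
	uint64_t start;  /* first sector of the zone */
	uint64_t len;    /* zone length, in sectors */
	uint64_t wp;     /* write pointer: the only sector a write may start at */
};

/*
 * Model of the device-side check: a write that does not start at the
 * write pointer, or that would cross the end of the zone, is rejected
 * (the drive would report an unaligned write error).
 */
static bool zone_write(struct zone_model *z, uint64_t sector, uint64_t nr)
{
	if (sector != z->wp)
		return false;
	if (nr > z->start + z->len - z->wp)
		return false;
	z->wp += nr;
	return true;
}

/* Issue @n writes of @nr sectors each; return how many were rejected. */
static unsigned int submit_all(struct zone_model *z, const uint64_t *sectors,
			       unsigned int n, uint64_t nr)
{
	unsigned int i, failed = 0;

	for (i = 0; i < n; i++)
		if (!zone_write(z, sectors[i], nr))
			failed++;
	return failed;
}
```

Issuing 8-sector writes at sectors 0 and 8 in that order succeeds in
this model, while issuing the same two writes as 8 then 0 rejects the
first one because the write pointer is still at sector 0.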



[PATCH 1/2] block: Zoned block device single-threaded submission

2017-08-01 Thread Damien Le Moal
From: Hannes Reinecke 

The scsi_request_fn() dispatch function internally unlocks the request
queue before submitting a request to the underlying LLD. This can
potentially lead to write request reordering if the context executing
scsi_request_fn() is preempted before the request is submitted to the
LLD and another context starts executing the same function.

This is not a problem for regular disks but leads to write I/O errors
on host managed zoned block devices and reduces the effectiveness of
sequential write optimizations for host aware disks.
(Note: the zone write lock in place in the scsi command init code will
prevent multiple writes from being issued simultaneously to the same
zone to avoid HBA level reordering issues, but this locking mechanism
is ineffective to prevent reordering at this level)

Prevent this from happening by limiting the number of contexts that can
simultaneously execute the queue request_fn() function to a single
thread.

A similar patch was originally proposed by Hannes Reinecke in a first
set of patches implementing ZBC support but ultimately not included in
the final support implementation. See commit 92f5e2a295
"block: add flag for single-threaded submission" in the tree
https://git.kernel.org/pub/scm/linux/kernel/git/hare/scsi-devel.git/log/?h=zac.v3

Authorship thus goes to Hannes.

Signed-off-by: Hannes Reinecke 
Signed-off-by: Damien Le Moal 
---
 block/blk-core.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index dbecbf4a64e0..cf590cbddcfd 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -371,7 +371,14 @@ inline void __blk_run_queue_uncond(struct request_queue *q)
 	 * running such a request function concurrently. Keep track of the
 	 * number of active request_fn invocations such that blk_drain_queue()
 	 * can wait until all these request_fn calls have finished.
+	 *
+	 * For zoned block devices, do not allow multiple threads to
+	 * dequeue requests as this can lead to write request reordering
+	 * during the time the queue is unlocked.
 	 */
+	if (blk_queue_is_zoned(q) && q->request_fn_active)
+		return;
+
 	q->request_fn_active++;
 	q->request_fn(q);
 	q->request_fn_active--;
-- 
2.13.3