Re: block: fix direct dispatch issue failure for clones

2018-12-06 Thread Jens Axboe
On 12/6/18 7:16 PM, Jens Axboe wrote:
> On 12/6/18 7:06 PM, Jens Axboe wrote:
>> On 12/6/18 6:58 PM, Mike Snitzer wrote:
 There is another way to fix this - still do the direct dispatch, but have
 dm track if it failed and do bypass insert in that case. I didn't want to do
 that since it's more involved, but it's doable.

 Let me cook that up and test it... Don't like it, though.
>>>
>>> Not following how DM can track if issuing the request worked if it is
>>> always told it worked with BLK_STS_OK.  We care about feedback when the
>>> request is actually issued because of the elaborate way blk-mq elevators
>>> work.  DM is forced to worry about all these details, as covered some in
>>> the header for commit 396eaf21ee17c476e8f66249fb1f4a39003d0ab4, it is
>>> trying to have its cake and eat it too.  It just wants IO scheduling to
>>> work for request-based DM devices.  That's it.
>>
>> It needs the feedback, I don't disagree on that part at all. If we
>> always return OK, then that loop is broken. How about something like the
>> below? Totally untested right now...
>>
>> We track if a request ever saw BLK_STS_RESOURCE from direct dispatch,
>> and if it did, we store that information with RQF_DONTPREP. When we then
>> next go to insert a request, if it has RQF_DONTPREP set, then we
>> ask blk_insert_cloned_request() to bypass insert.
>>
>> I'll go test this now.
> 
> Passes the test case for me.

Here's one that doesn't re-arrange the return value check into a switch.
Turns out cleaner (and fewer LOC changed), and also doesn't fiddle with the
request after freeing it if we got an OK return...

Will give this a whirl too, just in case.

diff --git a/block/blk-core.c b/block/blk-core.c
index deb56932f8c4..cccda51e165f 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2617,7 +2617,8 @@ static int blk_cloned_rq_check_limits(struct request_queue *q,
  * @q:  the queue to submit the request
  * @rq: the request being queued
  */
-blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request *rq)
+blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request *rq,
+  bool force_insert)
 {
unsigned long flags;
int where = ELEVATOR_INSERT_BACK;
@@ -2637,7 +2638,11 @@ blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request *
 * bypass a potential scheduler on the bottom device for
 * insert.
 */
-   return blk_mq_request_issue_directly(rq);
+   if (force_insert) {
+   blk_mq_request_bypass_insert(rq, true);
+   return BLK_STS_OK;
+   } else
+   return blk_mq_request_issue_directly(rq);
}
 
spin_lock_irqsave(q->queue_lock, flags);
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 7cd36e4d1310..e497a2ab6766 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -299,16 +299,20 @@ static void end_clone_request(struct request *clone, blk_status_t error)
 
 static blk_status_t dm_dispatch_clone_request(struct request *clone, struct request *rq)
 {
+   bool was_busy = (rq->rq_flags & RQF_DONTPREP) != 0;
blk_status_t r;
 
if (blk_queue_io_stat(clone->q))
clone->rq_flags |= RQF_IO_STAT;
 
clone->start_time_ns = ktime_get_ns();
-   r = blk_insert_cloned_request(clone->q, clone);
-   if (r != BLK_STS_OK && r != BLK_STS_RESOURCE && r != BLK_STS_DEV_RESOURCE)
+   r = blk_insert_cloned_request(clone->q, clone, was_busy);
+   if (r == BLK_STS_RESOURCE || r == BLK_STS_DEV_RESOURCE)
+   rq->rq_flags |= RQF_DONTPREP;
+   else if (r != BLK_STS_OK)
/* must complete clone in terms of original request */
dm_complete_request(rq, r);
+
return r;
 }
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 4293dc1cd160..7cb84ee4c9f4 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -994,7 +994,7 @@ extern int blk_rq_prep_clone(struct request *rq, struct request *rq_src,
 void *data);
 extern void blk_rq_unprep_clone(struct request *rq);
 extern blk_status_t blk_insert_cloned_request(struct request_queue *q,
-struct request *rq);
+struct request *rq, bool force_insert);
 extern int blk_rq_append_bio(struct request *rq, struct bio **bio);
 extern void blk_delay_queue(struct request_queue *, unsigned long);
 extern void blk_queue_split(struct request_queue *, struct bio **);

-- 
Jens Axboe



Re: block: fix direct dispatch issue failure for clones

2018-12-06 Thread Jens Axboe
On 12/6/18 7:06 PM, Jens Axboe wrote:
> On 12/6/18 6:58 PM, Mike Snitzer wrote:
>>> There is another way to fix this - still do the direct dispatch, but have
>>> dm track if it failed and do bypass insert in that case. I didn't want to do
>>> that since it's more involved, but it's doable.
>>>
>>> Let me cook that up and test it... Don't like it, though.
>>
>> Not following how DM can track if issuing the request worked if it is
>> always told it worked with BLK_STS_OK.  We care about feedback when the
>> request is actually issued because of the elaborate way blk-mq elevators
>> work.  DM is forced to worry about all these details, as covered some in
>> the header for commit 396eaf21ee17c476e8f66249fb1f4a39003d0ab4, it is
>> trying to have its cake and eat it too.  It just wants IO scheduling to
>> work for request-based DM devices.  That's it.
> 
> It needs the feedback, I don't disagree on that part at all. If we
> always return OK, then that loop is broken. How about something like the
> below? Totally untested right now...
> 
> We track if a request ever saw BLK_STS_RESOURCE from direct dispatch,
> and if it did, we store that information with RQF_DONTPREP. When we then
> next go to insert a request, if it has RQF_DONTPREP set, then we
> ask blk_insert_cloned_request() to bypass insert.
> 
> I'll go test this now.

Passes the test case for me.

-- 
Jens Axboe



Re: block: fix direct dispatch issue failure for clones

2018-12-06 Thread Mike Snitzer
On Thu, Dec 06 2018 at  8:58pm -0500,
Mike Snitzer  wrote:

> DM is forced to worry about all these details, as covered some in
> the header for commit 396eaf21ee17c476e8f66249fb1f4a39003d0ab4, it is
> trying to have its cake and eat it too.

Gah, obviously meant: DM is _NOT_ trying to have its cake and eat it too.

> It just wants IO scheduling to work for request-based DM devices.


Re: block: fix direct dispatch issue failure for clones

2018-12-06 Thread Jens Axboe
On 12/6/18 6:58 PM, Mike Snitzer wrote:
>> There is another way to fix this - still do the direct dispatch, but have
>> dm track if it failed and do bypass insert in that case. I didn't want to do
>> that since it's more involved, but it's doable.
>>
>> Let me cook that up and test it... Don't like it, though.
> 
> Not following how DM can track if issuing the request worked if it is
> always told it worked with BLK_STS_OK.  We care about feedback when the
> request is actually issued because of the elaborate way blk-mq elevators
> work.  DM is forced to worry about all these details, as covered some in
> the header for commit 396eaf21ee17c476e8f66249fb1f4a39003d0ab4, it is
> trying to have its cake and eat it too.  It just wants IO scheduling to
> work for request-based DM devices.  That's it.

It needs the feedback, I don't disagree on that part at all. If we
always return OK, then that loop is broken. How about something like the
below? Totally untested right now...

We track if a request ever saw BLK_STS_RESOURCE from direct dispatch,
and if it did, we store that information with RQF_DONTPREP. When we then
next go to insert a request, if it has RQF_DONTPREP set, then we
ask blk_insert_cloned_request() to bypass insert.

I'll go test this now.

diff --git a/block/blk-core.c b/block/blk-core.c
index deb56932f8c4..cccda51e165f 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2617,7 +2617,8 @@ static int blk_cloned_rq_check_limits(struct request_queue *q,
  * @q:  the queue to submit the request
  * @rq: the request being queued
  */
-blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request *rq)
+blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request *rq,
+  bool force_insert)
 {
unsigned long flags;
int where = ELEVATOR_INSERT_BACK;
@@ -2637,7 +2638,11 @@ blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request *
 * bypass a potential scheduler on the bottom device for
 * insert.
 */
-   return blk_mq_request_issue_directly(rq);
+   if (force_insert) {
+   blk_mq_request_bypass_insert(rq, true);
+   return BLK_STS_OK;
+   } else
+   return blk_mq_request_issue_directly(rq);
}
 
spin_lock_irqsave(q->queue_lock, flags);
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 7cd36e4d1310..8d4c5020ccaa 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -299,16 +299,27 @@ static void end_clone_request(struct request *clone, blk_status_t error)
 
 static blk_status_t dm_dispatch_clone_request(struct request *clone, struct request *rq)
 {
+   bool was_busy = (rq->rq_flags & RQF_DONTPREP) != 0;
blk_status_t r;
 
if (blk_queue_io_stat(clone->q))
clone->rq_flags |= RQF_IO_STAT;
 
clone->start_time_ns = ktime_get_ns();
-   r = blk_insert_cloned_request(clone->q, clone);
-   if (r != BLK_STS_OK && r != BLK_STS_RESOURCE && r != 
BLK_STS_DEV_RESOURCE)
+
+   r = blk_insert_cloned_request(clone->q, clone, was_busy);
+   switch (r) {
+   default:
/* must complete clone in terms of original request */
dm_complete_request(rq, r);
+   /* fall through */
+   case BLK_STS_RESOURCE:
+   case BLK_STS_DEV_RESOURCE:
+   rq->rq_flags |= RQF_DONTPREP;
+   case BLK_STS_OK:
+   break;
+   }
+
return r;
 }
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 4293dc1cd160..7cb84ee4c9f4 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -994,7 +994,7 @@ extern int blk_rq_prep_clone(struct request *rq, struct request *rq_src,
 void *data);
 extern void blk_rq_unprep_clone(struct request *rq);
 extern blk_status_t blk_insert_cloned_request(struct request_queue *q,
-struct request *rq);
+struct request *rq, bool force_insert);
 extern int blk_rq_append_bio(struct request *rq, struct bio **bio);
 extern void blk_delay_queue(struct request_queue *, unsigned long);
 extern void blk_queue_split(struct request_queue *, struct bio **);

-- 
Jens Axboe



Re: block: fix direct dispatch issue failure for clones

2018-12-06 Thread Mike Snitzer
On Thu, Dec 06 2018 at  8:34pm -0500,
Jens Axboe  wrote:

> On 12/6/18 6:22 PM, jianchao.wang wrote:
> > 
> > 
> > On 12/7/18 9:13 AM, Jens Axboe wrote:
> >> On 12/6/18 6:04 PM, jianchao.wang wrote:
> >>>
> >>>
> >>> On 12/7/18 6:20 AM, Jens Axboe wrote:
>  After the direct dispatch corruption fix, we permanently disallow direct
>  dispatch of non read/write requests. This works fine off the normal IO
>  path, as they will be retried like any other failed direct dispatch
>  request. But for the blk_insert_cloned_request() that only DM uses to
>  bypass the bottom level scheduler, we always first attempt direct
>  dispatch. For some types of requests, that's now a permanent failure,
>  and no amount of retrying will make that succeed.
> 
>  Don't use direct dispatch off the cloned insert path, always just use
>  bypass inserts. This still bypasses the bottom level scheduler, which is
>  what DM wants.
> 
>  Fixes: ffe81d45322c ("blk-mq: fix corruption with direct issue")
>  Signed-off-by: Jens Axboe 
> 
>  ---
> 
>  diff --git a/block/blk-core.c b/block/blk-core.c
>  index deb56932f8c4..4c44e6fa0d08 100644
>  --- a/block/blk-core.c
>  +++ b/block/blk-core.c
>  @@ -2637,7 +2637,8 @@ blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request *
>    * bypass a potential scheduler on the bottom device for
>    * insert.
>    */
>  -return blk_mq_request_issue_directly(rq);
>  +blk_mq_request_bypass_insert(rq, true);
>  +return BLK_STS_OK;
>   }
>   
>   spin_lock_irqsave(q->queue_lock, flags);
> 
> >>> Not sure about this because it will break the merging promotion for 
> >>> request based DM
> >>> from Ming.
> >>> 396eaf21ee17c476e8f66249fb1f4a39003d0ab4
> >>> (blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request 
> >>> feedback)
> >>>
> >>> We could use some other way to fix this.
> >>
> >> That really shouldn't matter as this is the cloned insert, merging should
> >> have been done on the original request.
> >>
> >>
> > Just quote some comments from the patch.
> > 
> > "
> >But dm-rq currently can't get the underlying queue's
> > dispatch feedback at all.  Without knowing whether a request was issued
> > or not (e.g. due to underlying queue being busy) the dm-rq elevator will
> > not be able to provide effective IO merging (as a side-effect of dm-rq
> > currently blindly destaging a request from its elevator only to requeue
> > it after a delay, which kills any opportunity for merging).  This
> > obviously causes very bad sequential IO performance.
> > ...
> > With this, request-based DM's blk-mq sequential IO performance is vastly
> > improved (as much as 3X in mpath/virtio-scsi testing)
> > "
> > 
> > Using blk_mq_request_bypass_insert to replace the 
> > blk_mq_request_issue_directly
> > could be a fast method to fix the current issue. Maybe we could get the 
> > merging
> > promotion back after some time.
> 
> This really sucks, mostly because DM wants to have it both ways - not use
> the bottom level IO scheduler, but still actually use it if it makes sense.

Well no, that isn't what DM is doing.  DM does have an upper layer
scheduler that would like to be afforded the same capabilities that any
request-based driver is given.  Yes that comes with plumbing in safe
passage for upper layer requests dispatched from a stacked blk-mq IO
scheduler.
 
> There is another way to fix this - still do the direct dispatch, but have
> dm track if it failed and do bypass insert in that case. I didn't want to do
> that since it's more involved, but it's doable.
> 
> Let me cook that up and test it... Don't like it, though.

Not following how DM can track if issuing the request worked if it is
always told it worked with BLK_STS_OK.  We care about feedback when the
request is actually issued because of the elaborate way blk-mq elevators
work.  DM is forced to worry about all these details, as covered some in
the header for commit 396eaf21ee17c476e8f66249fb1f4a39003d0ab4, it is
trying to have its cake and eat it too.  It just wants IO scheduling to
work for request-based DM devices.  That's it.

Mike
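[Editor's note: to make the feedback argument above concrete, here is a hypothetical userspace toy model, added for illustration; none of the names or the merge model below are the kernel's actual API or behavior.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Two sequential bios arrive while the bottom queue is busy once.
 * If the DM elevator is told BUSY honestly, it keeps the request queued
 * and the second bio can merge into it. If it is always told OK, the
 * request is considered gone and the second bio becomes a new request.
 * Returns how many requests the bottom queue ultimately sees. */
static int simulate(bool honest_feedback)
{
    int busy_budget = 1;   /* bottom queue rejects the first dispatch */
    int requests_down = 0; /* requests the bottom queue finally sees */
    bool pending = false;  /* a request still sits in the DM elevator */
    int bio;

    for (bio = 0; bio < 2; bio++) {
        if (pending)
            continue; /* sequential bio merges into the pending request */
        /* new request: try to dispatch it */
        if (busy_budget > 0) {
            busy_budget--;
            if (honest_feedback) {
                pending = true; /* BUSY reported: keep it, allow merging */
                continue;
            }
            /* told OK although it was requeued below: merge chance lost */
        }
        requests_down++;
    }
    if (pending)
        requests_down++; /* the merged request is dispatched later */
    return requests_down;
}

int main(void)
{
    assert(simulate(true) == 1);  /* honest BUSY: the two bios merge */
    assert(simulate(false) == 2); /* always-OK: two separate requests */
    printf("ok\n");
    return 0;
}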


Re: block: fix direct dispatch issue failure for clones

2018-12-06 Thread Mike Snitzer
On Thu, Dec 06 2018 at  8:13pm -0500,
Jens Axboe  wrote:

> On 12/6/18 6:04 PM, jianchao.wang wrote:
> > 
> > 
> > On 12/7/18 6:20 AM, Jens Axboe wrote:
> >> After the direct dispatch corruption fix, we permanently disallow direct
> >> dispatch of non read/write requests. This works fine off the normal IO
> >> path, as they will be retried like any other failed direct dispatch
> >> request. But for the blk_insert_cloned_request() that only DM uses to
> >> bypass the bottom level scheduler, we always first attempt direct
> >> dispatch. For some types of requests, that's now a permanent failure,
> >> and no amount of retrying will make that succeed.
> >>
> >> Don't use direct dispatch off the cloned insert path, always just use
> >> bypass inserts. This still bypasses the bottom level scheduler, which is
> >> what DM wants.
> >>
> >> Fixes: ffe81d45322c ("blk-mq: fix corruption with direct issue")
> >> Signed-off-by: Jens Axboe 
> >>
> >> ---
> >>
> >> diff --git a/block/blk-core.c b/block/blk-core.c
> >> index deb56932f8c4..4c44e6fa0d08 100644
> >> --- a/block/blk-core.c
> >> +++ b/block/blk-core.c
> >> @@ -2637,7 +2637,8 @@ blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request *
> >> * bypass a potential scheduler on the bottom device for
> >> * insert.
> >> */
> >> -  return blk_mq_request_issue_directly(rq);
> >> +  blk_mq_request_bypass_insert(rq, true);
> >> +  return BLK_STS_OK;
> >>}
> >>  
> >>spin_lock_irqsave(q->queue_lock, flags);
> >>
> > Not sure about this because it will break the merging promotion for request 
> > based DM
> > from Ming.
> > 396eaf21ee17c476e8f66249fb1f4a39003d0ab4
> > (blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request 
> > feedback)
> > 
> > We could use some other way to fix this.
> 
> That really shouldn't matter as this is the cloned insert, merging should
> have been done on the original request.

Reading the header of 396eaf21ee17c476e8f66249fb1f4a39003d0ab4 brings me
back to relatively recent hell.

Thing is, dm-rq was the original justification and consumer of
blk_mq_request_issue_directly -- but Ming's later use of directly issuing
requests has forced fixes that didn't consider the original valid/safe
use of the interface that is now too rigid.  dm-rq needs the
functionality the blk_mq_request_issue_directly interface provides.

Sorry to say we cannot lose the sequential IO performance improvements
that the IO merging feedback loop gives us.


Re: block: fix direct dispatch issue failure for clones

2018-12-06 Thread Mike Snitzer
On Thu, Dec 06 2018 at  8:04pm -0500,
jianchao.wang  wrote:

> 
> 
> On 12/7/18 6:20 AM, Jens Axboe wrote:
> > After the direct dispatch corruption fix, we permanently disallow direct
> > dispatch of non read/write requests. This works fine off the normal IO
> > path, as they will be retried like any other failed direct dispatch
> > request. But for the blk_insert_cloned_request() that only DM uses to
> > bypass the bottom level scheduler, we always first attempt direct
> > dispatch. For some types of requests, that's now a permanent failure,
> > and no amount of retrying will make that succeed.
> > 
> > Don't use direct dispatch off the cloned insert path, always just use
> > bypass inserts. This still bypasses the bottom level scheduler, which is
> > what DM wants.
> > 
> > Fixes: ffe81d45322c ("blk-mq: fix corruption with direct issue")
> > Signed-off-by: Jens Axboe 
> > 
> > ---
> > 
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index deb56932f8c4..4c44e6fa0d08 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -2637,7 +2637,8 @@ blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request *
> >  * bypass a potential scheduler on the bottom device for
> >  * insert.
> >  */
> > -   return blk_mq_request_issue_directly(rq);
> > +   blk_mq_request_bypass_insert(rq, true);
> > +   return BLK_STS_OK;
> > }
> >  
> > spin_lock_irqsave(q->queue_lock, flags);
> > 
> Not sure about this because it will break the merging promotion for request 
> based DM
> from Ming.
> 396eaf21ee17c476e8f66249fb1f4a39003d0ab4
> (blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request 
> feedback)

Ngh.. yeah, forgot about that convoluted feedback loop.

> We could use some other way to fix this.

Yeah, afraid we have to.


Re: block: fix direct dispatch issue failure for clones

2018-12-06 Thread Jens Axboe
On 12/6/18 3:28 PM, Mike Snitzer wrote:
> On Thu, Dec 06 2018 at  5:20pm -0500,
> Jens Axboe  wrote:
> 
>> After the direct dispatch corruption fix, we permanently disallow direct
>> dispatch of non read/write requests. This works fine off the normal IO
>> path, as they will be retried like any other failed direct dispatch
>> request. But for the blk_insert_cloned_request() that only DM uses to
>> bypass the bottom level scheduler, we always first attempt direct
>> dispatch. For some types of requests, that's now a permanent failure,
>> and no amount of retrying will make that succeed.
>>
>> Don't use direct dispatch off the cloned insert path, always just use
>> bypass inserts. This still bypasses the bottom level scheduler, which is
>> what DM wants.
>>
>> Fixes: ffe81d45322c ("blk-mq: fix corruption with direct issue")
>> Signed-off-by: Jens Axboe 
>>
>> ---
>>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index deb56932f8c4..4c44e6fa0d08 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -2637,7 +2637,8 @@ blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request *
>>   * bypass a potential scheduler on the bottom device for
>>   * insert.
>>   */
>> -return blk_mq_request_issue_directly(rq);
>> +blk_mq_request_bypass_insert(rq, true);
>> +return BLK_STS_OK;
>>  }
>>  
>>  spin_lock_irqsave(q->queue_lock, flags);
> 
> Not sure what this trailing spin_lock_irqsave(q->queue_lock, flags) is
> about.. but this looks good.

It's because it's against current -git, that is gone from the 4.21 branch.


> I'll clean up dm-rq.c to do away with the
> extra STS_RESOURCE checks for its call to blk_insert_cloned_request()
> once this lands.

That is indeed a nice benefit; all resource-based failure cases can be
removed from the caller after this.

-- 
Jens Axboe



Re: block: fix direct dispatch issue failure for clones

2018-12-06 Thread Mike Snitzer
On Thu, Dec 06 2018 at  5:20pm -0500,
Jens Axboe  wrote:

> After the direct dispatch corruption fix, we permanently disallow direct
> dispatch of non read/write requests. This works fine off the normal IO
> path, as they will be retried like any other failed direct dispatch
> request. But for the blk_insert_cloned_request() that only DM uses to
> bypass the bottom level scheduler, we always first attempt direct
> dispatch. For some types of requests, that's now a permanent failure,
> and no amount of retrying will make that succeed.
> 
> Don't use direct dispatch off the cloned insert path, always just use
> bypass inserts. This still bypasses the bottom level scheduler, which is
> what DM wants.
> 
> Fixes: ffe81d45322c ("blk-mq: fix corruption with direct issue")
> Signed-off-by: Jens Axboe 
> 
> ---
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index deb56932f8c4..4c44e6fa0d08 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -2637,7 +2637,8 @@ blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request *
>* bypass a potential scheduler on the bottom device for
>* insert.
>*/
> - return blk_mq_request_issue_directly(rq);
> + blk_mq_request_bypass_insert(rq, true);
> + return BLK_STS_OK;
>   }
>  
>   spin_lock_irqsave(q->queue_lock, flags);

Not sure what this trailing spin_lock_irqsave(q->queue_lock, flags) is
about.. but this looks good.  I'll clean up dm-rq.c to do away with the
extra STS_RESOURCE checks for its call to blk_insert_cloned_request()
once this lands.

Acked-by: Mike Snitzer 

Thanks.