Re: [PATCH 6/8] blk-mq-sched: add framework for MQ capable IO schedulers

2017-01-17 Thread Jens Axboe
On 01/17/2017 02:13 AM, Paolo Valente wrote:
> 
>> On 17 Jan 2017, at 03:47, Jens Axboe wrote:
>>
>> On 12/22/2016 04:13 AM, Paolo Valente wrote:
>>>
 On 22 Dec 2016, at 10:59, Paolo Valente wrote:

>
> On 17 Dec 2016, at 01:12, Jens Axboe wrote:
>
> This adds a set of hooks that intercepts the blk-mq path of
> allocating/inserting/issuing/completing requests, allowing
> us to develop a scheduler within that framework.
>
> We reuse the existing elevator scheduler API on the registration
> side, but augment that with the scheduler flagging support for
> the blk-mq interface, and with a separate set of ops hooks for MQ
> devices.
>
> Schedulers can opt in to using shadow requests. Shadow requests
> are internal requests that the scheduler uses for the allocate
> and insert part, which are then mapped to a real driver request
> at dispatch time. This is needed to separate the device queue depth
> from the pool of requests that the scheduler has to work with.
>
> Signed-off-by: Jens Axboe 
>
 ...

> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> new file mode 100644
> index ..b7e1839d4785
> --- /dev/null
> +++ b/block/blk-mq-sched.c

> ...
> +static inline bool
> +blk_mq_sched_allow_merge(struct request_queue *q, struct request *rq,
> +  struct bio *bio)
> +{
> + struct elevator_queue *e = q->elevator;
> +
> + if (e && e->type->ops.mq.allow_merge)
> + return e->type->ops.mq.allow_merge(q, rq, bio);
> +
> + return true;
> +}
> +

 Something does not seem to add up here:
 e->type->ops.mq.allow_merge may be called only in
 blk_mq_sched_allow_merge, which, in its turn, may be called only in
 blk_mq_attempt_merge, which, finally, may be called only in
 blk_mq_merge_queue_io.  Yet the latter may be called only if there is
 no elevator (line 1399 and 1507 in blk-mq.c).

 Therefore, e->type->ops.mq.allow_merge can never be called, both if
 there is and if there is not an elevator.  Be patient if I'm missing
 something huge, but I thought it was worth reporting this.

>>>
>>> Just another detail: if e->type->ops.mq.allow_merge does get invoked
>>> from the above path, then it is invoked of course without the
>>> scheduler lock held.  In contrast, if this function gets invoked
>>> from dd_bio_merge, then the scheduler lock is held.
>>
>> But the scheduler controls that itself. So it'd be perfectly fine to
>> have a locked and unlocked variant. The way that's typically done is to
>> have function() grabbing the lock, and __function() is invoked with the
>> lock held.
>>
>>> To handle these opposite alternatives, I don't know whether checking if
>>> the lock is held (and possibly taking it) from inside
>>> e->type->ops.mq.allow_merge is a good solution.  In any case, before
>>> possibly trying it, I will wait for some feedback on the main problem,
>>> i.e., on the fact that e->type->ops.mq.allow_merge
>>> seems unreachable in the above path.
>>
>> Checking if a lock is held is NEVER a good idea, as it leads to both bad
>> and incorrect code. If you just check if a lock is held when being
>> called, you don't necessarily know if it was the caller that grabbed it
>> or it just happens to be held by someone else for unrelated reasons.
>>
>>
> 
> Thanks a lot for this and the above explanations.  Unfortunately, I
> still see the problem.  To hopefully make you waste less time, I have
> reported the problematic paths explicitly, so that you can quickly
> point me to my mistake.
> 
> The problem is caused by the existence of at least the following two
> alternative paths to e->type->ops.mq.allow_merge.
> 
> 1.  In mq-deadline.c (line 374): spin_lock(&dd->lock);
> blk_mq_sched_try_merge -> elv_merge -> elv_bio_merge_ok ->
> elv_iosched_allow_bio_merge -> e->type->ops.mq.allow_merge
> 
> 2. In blk-core.c (line 1660): spin_lock_irq(q->queue_lock);
> elv_merge -> elv_bio_merge_ok ->
> elv_iosched_allow_bio_merge -> e->type->ops.mq.allow_merge
> 
> In the first path, the scheduler lock is held, while in the second
> path, it is not.  This does not cause problems with mq-deadline,
> because the latter just has no allow_merge function.  Yet it does
> cause problems with the allow_merge implementation of bfq.  There was
> no issue in blk, as only the global queue lock was used.
> 
> Where am I wrong?

#2 can never happen for blk-mq, it's the old IO path. blk-mq is never
invoked with ->queue_lock held.

-- 
Jens Axboe



Re: [PATCH 6/8] blk-mq-sched: add framework for MQ capable IO schedulers

2017-01-17 Thread Paolo Valente

> On 17 Jan 2017, at 03:47, Jens Axboe wrote:
> 
> On 12/22/2016 04:13 AM, Paolo Valente wrote:
>> 
>>> On 22 Dec 2016, at 10:59, Paolo Valente wrote:
>>> 
 
 On 17 Dec 2016, at 01:12, Jens Axboe wrote:
 
 This adds a set of hooks that intercepts the blk-mq path of
 allocating/inserting/issuing/completing requests, allowing
 us to develop a scheduler within that framework.
 
 We reuse the existing elevator scheduler API on the registration
 side, but augment that with the scheduler flagging support for
 the blk-mq interface, and with a separate set of ops hooks for MQ
 devices.
 
 Schedulers can opt in to using shadow requests. Shadow requests
 are internal requests that the scheduler uses for the allocate
 and insert part, which are then mapped to a real driver request
 at dispatch time. This is needed to separate the device queue depth
 from the pool of requests that the scheduler has to work with.
 
 Signed-off-by: Jens Axboe 
 
>>> ...
>>> 
 diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
 new file mode 100644
 index ..b7e1839d4785
 --- /dev/null
 +++ b/block/blk-mq-sched.c
>>> 
 ...
 +static inline bool
 +blk_mq_sched_allow_merge(struct request_queue *q, struct request *rq,
 +   struct bio *bio)
 +{
 +  struct elevator_queue *e = q->elevator;
 +
 +  if (e && e->type->ops.mq.allow_merge)
 +  return e->type->ops.mq.allow_merge(q, rq, bio);
 +
 +  return true;
 +}
 +
>>> 
>>> Something does not seem to add up here:
>>> e->type->ops.mq.allow_merge may be called only in
>>> blk_mq_sched_allow_merge, which, in its turn, may be called only in
>>> blk_mq_attempt_merge, which, finally, may be called only in
>>> blk_mq_merge_queue_io.  Yet the latter may be called only if there is
>>> no elevator (line 1399 and 1507 in blk-mq.c).
>>> 
>>> Therefore, e->type->ops.mq.allow_merge can never be called, both if
>>> there is and if there is not an elevator.  Be patient if I'm missing
>>> something huge, but I thought it was worth reporting this.
>>> 
>> 
>> Just another detail: if e->type->ops.mq.allow_merge does get invoked
>> from the above path, then it is invoked of course without the
>> scheduler lock held.  In contrast, if this function gets invoked
>> from dd_bio_merge, then the scheduler lock is held.
> 
> But the scheduler controls that itself. So it'd be perfectly fine to
> have a locked and unlocked variant. The way that's typically done is to
> have function() grabbing the lock, and __function() is invoked with the
> lock held.
> 
>> To handle these opposite alternatives, I don't know whether checking if
>> the lock is held (and possibly taking it) from inside
>> e->type->ops.mq.allow_merge is a good solution.  In any case, before
>> possibly trying it, I will wait for some feedback on the main problem,
>> i.e., on the fact that e->type->ops.mq.allow_merge
>> seems unreachable in the above path.
> 
> Checking if a lock is held is NEVER a good idea, as it leads to both bad
> and incorrect code. If you just check if a lock is held when being
> called, you don't necessarily know if it was the caller that grabbed it
> or it just happens to be held by someone else for unrelated reasons.
> 
> 

Thanks a lot for this and the above explanations.  Unfortunately, I
still see the problem.  To hopefully make you waste less time, I have
reported the problematic paths explicitly, so that you can quickly
point me to my mistake.

The problem is caused by the existence of at least the following two
alternative paths to e->type->ops.mq.allow_merge.

1.  In mq-deadline.c (line 374): spin_lock(&dd->lock);
blk_mq_sched_try_merge -> elv_merge -> elv_bio_merge_ok ->
elv_iosched_allow_bio_merge -> e->type->ops.mq.allow_merge

2. In blk-core.c (line 1660): spin_lock_irq(q->queue_lock);
elv_merge -> elv_bio_merge_ok ->
elv_iosched_allow_bio_merge -> e->type->ops.mq.allow_merge

In the first path, the scheduler lock is held, while in the second
path, it is not.  This does not cause problems with mq-deadline,
because the latter just has no allow_merge function.  Yet it does
cause problems with the allow_merge implementation of bfq.  There was
no issue in blk, as only the global queue lock was used.
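
For concreteness, path 1 on the mq-deadline side looks roughly like the
sketch below.  I have reconstructed it from the hook signatures quoted in
this thread, so the exact body in your branch may differ:

static bool dd_bio_merge(struct blk_mq_hw_ctx *hctx, struct bio *bio)
{
	struct request_queue *q = hctx->queue;
	struct deadline_data *dd = q->elevator->elevator_data;
	bool ret;

	spin_lock(&dd->lock);			/* scheduler lock taken here ... */
	ret = blk_mq_sched_try_merge(q, bio);	/* ... and held across elv_merge() */
	spin_unlock(&dd->lock);

	return ret;
}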

Where am I wrong?

Thanks,
Paolo


> -- 
> Jens Axboe
> 



Re: [PATCH 6/8] blk-mq-sched: add framework for MQ capable IO schedulers

2017-01-17 Thread Paolo Valente

> On 17 Jan 2017, at 03:47, Jens Axboe wrote:
> 
> On 12/22/2016 02:59 AM, Paolo Valente wrote:
>> 
>>> On 17 Dec 2016, at 01:12, Jens Axboe wrote:
>>> 
>>> This adds a set of hooks that intercepts the blk-mq path of
>>> allocating/inserting/issuing/completing requests, allowing
>>> us to develop a scheduler within that framework.
>>> 
>>> We reuse the existing elevator scheduler API on the registration
>>> side, but augment that with the scheduler flagging support for
>>> the blk-mq interface, and with a separate set of ops hooks for MQ
>>> devices.
>>> 
>>> Schedulers can opt in to using shadow requests. Shadow requests
>>> are internal requests that the scheduler uses for the allocate
>>> and insert part, which are then mapped to a real driver request
>>> at dispatch time. This is needed to separate the device queue depth
>>> from the pool of requests that the scheduler has to work with.
>>> 
>>> Signed-off-by: Jens Axboe 
>>> 
>> ...
>> 
>>> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
>>> new file mode 100644
>>> index ..b7e1839d4785
>>> --- /dev/null
>>> +++ b/block/blk-mq-sched.c
>> 
>>> ...
>>> +static inline bool
>>> +blk_mq_sched_allow_merge(struct request_queue *q, struct request *rq,
>>> +struct bio *bio)
>>> +{
>>> +   struct elevator_queue *e = q->elevator;
>>> +
>>> +   if (e && e->type->ops.mq.allow_merge)
>>> +   return e->type->ops.mq.allow_merge(q, rq, bio);
>>> +
>>> +   return true;
>>> +}
>>> +
>> 
>> Something does not seem to add up here:
>> e->type->ops.mq.allow_merge may be called only in
>> blk_mq_sched_allow_merge, which, in its turn, may be called only in
>> blk_mq_attempt_merge, which, finally, may be called only in
>> blk_mq_merge_queue_io.  Yet the latter may be called only if there is
>> no elevator (line 1399 and 1507 in blk-mq.c).
>> 
>> Therefore, e->type->ops.mq.allow_merge can never be called, both if
>> there is and if there is not an elevator.  Be patient if I'm missing
>> something huge, but I thought it was worth reporting this.
> 
> I went through the current branch, and it seems mostly fine. There was
> a double call to allow_merge() that I killed in the plug path, and one
> set missing in blk_mq_sched_try_merge(). The rest looks OK.
> 

Yes, I missed a path, sorry.  I'm glad that your check has at least not
been a waste of time, since it caught other issues.

Thanks,
Paolo

> -- 
> Jens Axboe
> 



Re: [PATCH 6/8] blk-mq-sched: add framework for MQ capable IO schedulers

2017-01-16 Thread Jens Axboe
On 12/22/2016 04:13 AM, Paolo Valente wrote:
> 
>> On 22 Dec 2016, at 10:59, Paolo Valente wrote:
>>
>>>
>>> On 17 Dec 2016, at 01:12, Jens Axboe wrote:
>>>
>>> This adds a set of hooks that intercepts the blk-mq path of
>>> allocating/inserting/issuing/completing requests, allowing
>>> us to develop a scheduler within that framework.
>>>
>>> We reuse the existing elevator scheduler API on the registration
>>> side, but augment that with the scheduler flagging support for
>>> the blk-mq interface, and with a separate set of ops hooks for MQ
>>> devices.
>>>
>>> Schedulers can opt in to using shadow requests. Shadow requests
>>> are internal requests that the scheduler uses for the allocate
>>> and insert part, which are then mapped to a real driver request
>>> at dispatch time. This is needed to separate the device queue depth
>>> from the pool of requests that the scheduler has to work with.
>>>
>>> Signed-off-by: Jens Axboe 
>>>
>> ...
>>
>>> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
>>> new file mode 100644
>>> index ..b7e1839d4785
>>> --- /dev/null
>>> +++ b/block/blk-mq-sched.c
>>
>>> ...
>>> +static inline bool
>>> +blk_mq_sched_allow_merge(struct request_queue *q, struct request *rq,
>>> +struct bio *bio)
>>> +{
>>> +   struct elevator_queue *e = q->elevator;
>>> +
>>> +   if (e && e->type->ops.mq.allow_merge)
>>> +   return e->type->ops.mq.allow_merge(q, rq, bio);
>>> +
>>> +   return true;
>>> +}
>>> +
>>
>> Something does not seem to add up here:
>> e->type->ops.mq.allow_merge may be called only in
>> blk_mq_sched_allow_merge, which, in its turn, may be called only in
>> blk_mq_attempt_merge, which, finally, may be called only in
>> blk_mq_merge_queue_io.  Yet the latter may be called only if there is
>> no elevator (line 1399 and 1507 in blk-mq.c).
>>
>> Therefore, e->type->ops.mq.allow_merge can never be called, both if
>> there is and if there is not an elevator.  Be patient if I'm missing
>> something huge, but I thought it was worth reporting this.
>>
> 
> Just another detail: if e->type->ops.mq.allow_merge does get invoked
> from the above path, then it is invoked of course without the
> scheduler lock held.  In contrast, if this function gets invoked
> from dd_bio_merge, then the scheduler lock is held.

But the scheduler controls that itself. So it'd be perfectly fine to
have a locked and unlocked variant. The way that's typically done is to
have function() grabbing the lock, and __function() is invoked with the
lock held.
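
Something like this, purely illustrative (the struct and function names
below are made up, they are not from the patchset):

struct scheduler_data {
	spinlock_t lock;
	/* ... per-scheduler state ... */
};

/* __variant: caller must already hold sd->lock */
static void __scheduler_op(struct scheduler_data *sd)
{
	lockdep_assert_held(&sd->lock);
	/* ... do the actual work ... */
}

/* unlocked variant: grabs the lock, then calls the locked one */
static void scheduler_op(struct scheduler_data *sd)
{
	spin_lock(&sd->lock);
	__scheduler_op(sd);
	spin_unlock(&sd->lock);
}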

> To handle these opposite alternatives, I don't know whether checking if
> the lock is held (and possibly taking it) from inside
> e->type->ops.mq.allow_merge is a good solution.  In any case, before
> possibly trying it, I will wait for some feedback on the main problem,
> i.e., on the fact that e->type->ops.mq.allow_merge
> seems unreachable in the above path.

Checking if a lock is held is NEVER a good idea, as it leads to both bad
and incorrect code. If you just check if a lock is held when being
called, you don't necessarily know if it was the caller that grabbed it
or it just happens to be held by someone else for unrelated reasons.
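
Put differently, something like the fragment below is broken no matter
what (hypothetical code, reusing the made-up sd->lock from the sketch
above):

	/*
	 * BROKEN: spin_is_locked() only says that *some* context holds the
	 * lock, not that it was our caller who took it.
	 */
	if (!spin_is_locked(&sd->lock))
		spin_lock(&sd->lock);
	/* if another CPU held it, we now run the critical section unlocked */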


-- 
Jens Axboe



Re: [PATCH 6/8] blk-mq-sched: add framework for MQ capable IO schedulers

2017-01-16 Thread Jens Axboe
On 12/22/2016 02:59 AM, Paolo Valente wrote:
> 
>> On 17 Dec 2016, at 01:12, Jens Axboe wrote:
>>
>> This adds a set of hooks that intercepts the blk-mq path of
>> allocating/inserting/issuing/completing requests, allowing
>> us to develop a scheduler within that framework.
>>
>> We reuse the existing elevator scheduler API on the registration
>> side, but augment that with the scheduler flagging support for
>> the blk-mq interface, and with a separate set of ops hooks for MQ
>> devices.
>>
>> Schedulers can opt in to using shadow requests. Shadow requests
>> are internal requests that the scheduler uses for the allocate
>> and insert part, which are then mapped to a real driver request
>> at dispatch time. This is needed to separate the device queue depth
>> from the pool of requests that the scheduler has to work with.
>>
>> Signed-off-by: Jens Axboe 
>>
> ...
> 
>> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
>> new file mode 100644
>> index ..b7e1839d4785
>> --- /dev/null
>> +++ b/block/blk-mq-sched.c
> 
>> ...
>> +static inline bool
>> +blk_mq_sched_allow_merge(struct request_queue *q, struct request *rq,
>> + struct bio *bio)
>> +{
>> +struct elevator_queue *e = q->elevator;
>> +
>> +if (e && e->type->ops.mq.allow_merge)
>> +return e->type->ops.mq.allow_merge(q, rq, bio);
>> +
>> +return true;
>> +}
>> +
> 
> Something does not seem to add up here:
> e->type->ops.mq.allow_merge may be called only in
> blk_mq_sched_allow_merge, which, in its turn, may be called only in
> blk_mq_attempt_merge, which, finally, may be called only in
> blk_mq_merge_queue_io.  Yet the latter may be called only if there is
> no elevator (line 1399 and 1507 in blk-mq.c).
> 
> Therefore, e->type->ops.mq.allow_merge can never be called, both if
> there is and if there is not an elevator.  Be patient if I'm missing
> something huge, but I thought it was worth reporting this.

I went through the current branch, and it seems mostly fine. There was
a double call to allow_merge() that I killed in the plug path, and one
set missing in blk_mq_sched_try_merge(). The rest looks OK.
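
For the latter, the missing piece is essentially to gate the actual merge
attempts in blk_mq_sched_try_merge() on the allow_merge hook, along these
lines (a sketch of the idea, not the exact change):

	ret = elv_merge(q, &rq, bio);
	if (ret == ELEVATOR_BACK_MERGE) {
		if (blk_mq_sched_allow_merge(q, rq, bio) &&
		    bio_attempt_back_merge(q, rq, bio)) {
			if (!attempt_back_merge(q, rq))
				elv_merged_request(q, rq, ret);
			return true;
		}
	} else if (ret == ELEVATOR_FRONT_MERGE) {
		if (blk_mq_sched_allow_merge(q, rq, bio) &&
		    bio_attempt_front_merge(q, rq, bio)) {
			if (!attempt_front_merge(q, rq))
				elv_merged_request(q, rq, ret);
			return true;
		}
	}

	return false;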

-- 
Jens Axboe



Re: [PATCH 6/8] blk-mq-sched: add framework for MQ capable IO schedulers

2016-12-23 Thread Paolo Valente

> On 22 Dec 2016, at 10:59, Paolo Valente wrote:
> 
>> 
>> On 17 Dec 2016, at 01:12, Jens Axboe wrote:
>> 
>> This adds a set of hooks that intercepts the blk-mq path of
>> allocating/inserting/issuing/completing requests, allowing
>> us to develop a scheduler within that framework.
>> 
>> We reuse the existing elevator scheduler API on the registration
>> side, but augment that with the scheduler flagging support for
>> the blk-mq interface, and with a separate set of ops hooks for MQ
>> devices.
>> 
>> Schedulers can opt in to using shadow requests. Shadow requests
>> are internal requests that the scheduler uses for the allocate
>> and insert part, which are then mapped to a real driver request
>> at dispatch time. This is needed to separate the device queue depth
>> from the pool of requests that the scheduler has to work with.
>> 
>> Signed-off-by: Jens Axboe 
>> 
> ...
> 
>> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
>> new file mode 100644
>> index ..b7e1839d4785
>> --- /dev/null
>> +++ b/block/blk-mq-sched.c
> 
>> ...
>> +static inline bool
>> +blk_mq_sched_allow_merge(struct request_queue *q, struct request *rq,
>> + struct bio *bio)
>> +{
>> +struct elevator_queue *e = q->elevator;
>> +
>> +if (e && e->type->ops.mq.allow_merge)
>> +return e->type->ops.mq.allow_merge(q, rq, bio);
>> +
>> +return true;
>> +}
>> +
> 
> Something does not seem to add up here:
> e->type->ops.mq.allow_merge may be called only in
> blk_mq_sched_allow_merge, which, in its turn, may be called only in
> blk_mq_attempt_merge, which, finally, may be called only in
> blk_mq_merge_queue_io.  Yet the latter may be called only if there is
> no elevator (line 1399 and 1507 in blk-mq.c).
> 
> Therefore, e->type->ops.mq.allow_merge can never be called, both if
> there is and if there is not an elevator.  Be patient if I'm missing
> something huge, but I thought it was worth reporting this.
> 

Jens,
I forgot to add that I'm willing (and would be happy) to propose a fix
to this, and possibly the other problems too, on my own.  It's just that
I'm not yet expert enough to do it without having first received some
feedback or instructions from you.  In this specific case, I don't
even know yet whether this is really a bug.

Thanks, and merry Christmas if we don't get in touch before,
Paolo

> Paolo
> 



Re: [PATCH 6/8] blk-mq-sched: add framework for MQ capable IO schedulers

2016-12-22 Thread Paolo Valente

> On 21 Dec 2016, at 03:22, Jens Axboe wrote:
> 
> On Tue, Dec 20 2016, Paolo Valente wrote:
>>> +   else
>>> +   rq = __blk_mq_alloc_request(data, op);
>>> +
>>> +   if (rq) {
>>> +   rq->elv.icq = NULL;
>>> +   if (e && e->type->icq_cache)
>>> +   blk_mq_sched_assign_ioc(q, rq, bio);
>> 
>> bfq needs rq->elv.icq to be consistent in bfq_get_request, but the
>> needed initialization seems to occur only after mq.get_request is
>> invoked.
> 
> Can you do it from get/put_rq_priv?

Definitely, I just overlooked them, sorry :(

Thanks,
Paolo

> The icq is assigned there. If not,
> we can redo this part, not a big deal.
> 
> -- 
> Jens Axboe
> 



Re: [PATCH 6/8] blk-mq-sched: add framework for MQ capable IO schedulers

2016-12-22 Thread Paolo Valente

> On 22 Dec 2016, at 10:59, Paolo Valente wrote:
> 
>> 
>> On 17 Dec 2016, at 01:12, Jens Axboe wrote:
>> 
>> This adds a set of hooks that intercepts the blk-mq path of
>> allocating/inserting/issuing/completing requests, allowing
>> us to develop a scheduler within that framework.
>> 
>> We reuse the existing elevator scheduler API on the registration
>> side, but augment that with the scheduler flagging support for
>> the blk-mq interface, and with a separate set of ops hooks for MQ
>> devices.
>> 
>> Schedulers can opt in to using shadow requests. Shadow requests
>> are internal requests that the scheduler uses for the allocate
>> and insert part, which are then mapped to a real driver request
>> at dispatch time. This is needed to separate the device queue depth
>> from the pool of requests that the scheduler has to work with.
>> 
>> Signed-off-by: Jens Axboe 
>> 
> ...
> 
>> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
>> new file mode 100644
>> index ..b7e1839d4785
>> --- /dev/null
>> +++ b/block/blk-mq-sched.c
> 
>> ...
>> +static inline bool
>> +blk_mq_sched_allow_merge(struct request_queue *q, struct request *rq,
>> + struct bio *bio)
>> +{
>> +struct elevator_queue *e = q->elevator;
>> +
>> +if (e && e->type->ops.mq.allow_merge)
>> +return e->type->ops.mq.allow_merge(q, rq, bio);
>> +
>> +return true;
>> +}
>> +
> 
> Something does not seem to add up here:
> e->type->ops.mq.allow_merge may be called only in
> blk_mq_sched_allow_merge, which, in its turn, may be called only in
> blk_mq_attempt_merge, which, finally, may be called only in
> blk_mq_merge_queue_io.  Yet the latter may be called only if there is
> no elevator (line 1399 and 1507 in blk-mq.c).
> 
> Therefore, e->type->ops.mq.allow_merge can never be called, both if
> there is and if there is not an elevator.  Be patient if I'm missing
> something huge, but I thought it was worth reporting this.
> 

Just another detail: if e->type->ops.mq.allow_merge does get invoked
from the above path, then it is invoked of course without the
scheduler lock held.  In contrast, if this function gets invoked
from dd_bio_merge, then the scheduler lock is held.

To handle these opposite alternatives, I don't know whether checking if
the lock is held (and possibly taking it) from inside
e->type->ops.mq.allow_merge is a good solution.  In any case, before
possibly trying it, I will wait for some feedback on the main problem,
i.e., on the fact that e->type->ops.mq.allow_merge
seems unreachable in the above path.

Thanks,
Paolo

> Paolo
> 



Re: [PATCH 6/8] blk-mq-sched: add framework for MQ capable IO schedulers

2016-12-22 Thread Paolo Valente

> On 17 Dec 2016, at 01:12, Jens Axboe wrote:
> 
> This adds a set of hooks that intercepts the blk-mq path of
> allocating/inserting/issuing/completing requests, allowing
> us to develop a scheduler within that framework.
> 
> We reuse the existing elevator scheduler API on the registration
> side, but augment that with the scheduler flagging support for
> the blk-mq interface, and with a separate set of ops hooks for MQ
> devices.
> 
> Schedulers can opt in to using shadow requests. Shadow requests
> are internal requests that the scheduler uses for the allocate
> and insert part, which are then mapped to a real driver request
> at dispatch time. This is needed to separate the device queue depth
> from the pool of requests that the scheduler has to work with.
> 
> Signed-off-by: Jens Axboe 
> 
...

> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> new file mode 100644
> index ..b7e1839d4785
> --- /dev/null
> +++ b/block/blk-mq-sched.c

> ...
> +static inline bool
> +blk_mq_sched_allow_merge(struct request_queue *q, struct request *rq,
> +  struct bio *bio)
> +{
> + struct elevator_queue *e = q->elevator;
> +
> + if (e && e->type->ops.mq.allow_merge)
> + return e->type->ops.mq.allow_merge(q, rq, bio);
> +
> + return true;
> +}
> +

Something does not seem to add up here:
e->type->ops.mq.allow_merge may be called only in
blk_mq_sched_allow_merge, which, in its turn, may be called only in
blk_mq_attempt_merge, which, finally, may be called only in
blk_mq_merge_queue_io.  Yet the latter may be called only if there is
no elevator (line 1399 and 1507 in blk-mq.c).

Therefore, e->type->ops.mq.allow_merge can never be called, both if
there is and if there is not an elevator.  Be patient if I'm missing
something huge, but I thought it was worth reporting this.

Paolo



Re: [PATCH 6/8] blk-mq-sched: add framework for MQ capable IO schedulers

2016-12-20 Thread Jens Axboe
On Tue, Dec 20 2016, Paolo Valente wrote:
> > +   else
> > +   rq = __blk_mq_alloc_request(data, op);
> > +
> > +   if (rq) {
> > +   rq->elv.icq = NULL;
> > +   if (e && e->type->icq_cache)
> > +   blk_mq_sched_assign_ioc(q, rq, bio);
> 
> bfq needs rq->elv.icq to be consistent in bfq_get_request, but the
> needed initialization seems to occur only after mq.get_request is
> invoked.

Can you do it from get/put_rq_priv? The icq is assigned there. If not,
we can redo this part, not a big deal.
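
I.e., roughly something like this on the bfq side (hypothetical sketch;
the hook name is the one above, but the exact signature and return
convention may differ from what is in the branch right now):

static int bfq_get_rq_private(struct request_queue *q, struct request *rq)
{
	/* rq->elv.icq has already been assigned by the blk-mq-sched core */
	struct io_cq *icq = rq->elv.icq;

	if (!icq)
		return -ENOENT;

	/* ... do the icq-dependent part of bfq's request init here ... */
	return 0;
}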

-- 
Jens Axboe



Re: [PATCH 6/8] blk-mq-sched: add framework for MQ capable IO schedulers

2016-12-20 Thread Jens Axboe
On 12/20/2016 04:55 AM, Paolo Valente wrote:
>> +struct request *blk_mq_sched_get_request(struct request_queue *q,
>> + struct bio *bio,
>> + unsigned int op,
>> + struct blk_mq_alloc_data *data)
>> +{
>> +struct elevator_queue *e = q->elevator;
>> +struct blk_mq_hw_ctx *hctx;
>> +struct blk_mq_ctx *ctx;
>> +struct request *rq;
>> +
>> +blk_queue_enter_live(q);
>> +ctx = blk_mq_get_ctx(q);
>> +hctx = blk_mq_map_queue(q, ctx->cpu);
>> +
>> +blk_mq_set_alloc_data(data, q, 0, ctx, hctx);
>> +
>> +if (e && e->type->ops.mq.get_request)
>> +rq = e->type->ops.mq.get_request(q, op, data);
> 
> bio is not passed to the scheduler here.  Yet bfq uses bio to get the
> blkcg (invoking bio_blkcg).  I'm not finding any workaround.

One important note here - what I'm posting is a work in progress, it's
by no means set in stone. So when you find missing items like this, feel
free to fix them up and send a patch. I will then fold in that patch. Or
if you don't feel comfortable fixing it up, let me know, and I'll fix it
up next time I touch it.

>> +else
>> +rq = __blk_mq_alloc_request(data, op);
>> +
>> +if (rq) {
>> +rq->elv.icq = NULL;
>> +if (e && e->type->icq_cache)
>> +blk_mq_sched_assign_ioc(q, rq, bio);
> 
> bfq needs rq->elv.icq to be consistent in bfq_get_request, but the
> needed initialization seems to occur only after mq.get_request is
> invoked.
> 
> Note: to minimize latency, I'm reporting immediately each problem that
> apparently cannot be solved by just modifying bfq.  But, if the
> resulting higher number of micro-emails is annoying for you, I can
> buffer my questions, and send you cumulative emails less frequently.

That's perfectly fine, I prefer knowing earlier rather than later. But
do also remember that it's fine to send a patch to fix those things up,
you don't have to wait for me.

-- 
Jens Axboe



Re: [PATCH 6/8] blk-mq-sched: add framework for MQ capable IO schedulers

2016-12-20 Thread Paolo Valente

> On 17 Dec 2016, at 01:12, Jens Axboe wrote:
> 
> This adds a set of hooks that intercepts the blk-mq path of
> allocating/inserting/issuing/completing requests, allowing
> us to develop a scheduler within that framework.
> 
> We reuse the existing elevator scheduler API on the registration
> side, but augment that with the scheduler flagging support for
> the blk-mq interface, and with a separate set of ops hooks for MQ
> devices.
> 
> Schedulers can opt in to using shadow requests. Shadow requests
> are internal requests that the scheduler uses for the allocate
> and insert part, which are then mapped to a real driver request
> at dispatch time. This is needed to separate the device queue depth
> from the pool of requests that the scheduler has to work with.
> 
> Signed-off-by: Jens Axboe 

> ...
> 
> +struct request *blk_mq_sched_get_request(struct request_queue *q,
> +  struct bio *bio,
> +  unsigned int op,
> +  struct blk_mq_alloc_data *data)
> +{
> + struct elevator_queue *e = q->elevator;
> + struct blk_mq_hw_ctx *hctx;
> + struct blk_mq_ctx *ctx;
> + struct request *rq;
> +
> + blk_queue_enter_live(q);
> + ctx = blk_mq_get_ctx(q);
> + hctx = blk_mq_map_queue(q, ctx->cpu);
> +
> + blk_mq_set_alloc_data(data, q, 0, ctx, hctx);
> +
> + if (e && e->type->ops.mq.get_request)
> + rq = e->type->ops.mq.get_request(q, op, data);

bio is not passed to the scheduler here.  Yet bfq uses bio to get the
blkcg (invoking bio_blkcg).  I'm not finding any workaround.

> + else
> + rq = __blk_mq_alloc_request(data, op);
> +
> + if (rq) {
> + rq->elv.icq = NULL;
> + if (e && e->type->icq_cache)
> + blk_mq_sched_assign_ioc(q, rq, bio);

bfq needs rq->elv.icq to be consistent in bfq_get_request, but the
needed initialization seems to occur only after mq.get_request is
invoked.

Note: to minimize latency, I'm reporting immediately each problem that
apparently cannot be solved by just modifying bfq.  But, if the
resulting higher number of micro-emails is annoying for you, I can
buffer my questions, and send you cumulative emails less frequently.

Thanks,
Paolo

> + data->hctx->queued++;
> + return rq;
> + }
> +
> + blk_queue_exit(q);
> + return NULL;
> +}
> +
> +void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> +{
> + struct elevator_queue *e = hctx->queue->elevator;
> + LIST_HEAD(rq_list);
> +
> + if (unlikely(blk_mq_hctx_stopped(hctx)))
> + return;
> +
> + hctx->run++;
> +
> + /*
> +  * If we have previous entries on our dispatch list, grab them first for
> +  * more fair dispatch.
> +  */
> + if (!list_empty_careful(>dispatch)) {
> + spin_lock(>lock);
> + if (!list_empty(>dispatch))
> + list_splice_init(>dispatch, _list);
> + spin_unlock(>lock);
> + }
> +
> + /*
> +  * Only ask the scheduler for requests, if we didn't have residual
> +  * requests from the dispatch list. This is to avoid the case where
> +  * we only ever dispatch a fraction of the requests available because
> +  * of low device queue depth. Once we pull requests out of the IO
> +  * scheduler, we can no longer merge or sort them. So it's best to
> +  * leave them there for as long as we can. Mark the hw queue as
> +  * needing a restart in that case.
> +  */
> + if (list_empty(_list)) {
> + if (e && e->type->ops.mq.dispatch_requests)
> + e->type->ops.mq.dispatch_requests(hctx, _list);
> + else
> + blk_mq_flush_busy_ctxs(hctx, _list);
> + } else if (!test_bit(BLK_MQ_S_SCHED_RESTART, >state))
> + set_bit(BLK_MQ_S_SCHED_RESTART, >state);
> +
> + blk_mq_dispatch_rq_list(hctx, _list);
> +}
> +
> +bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio)
> +{
> + struct request *rq;
> + int ret;
> +
> + ret = elv_merge(q, , bio);
> + if (ret == ELEVATOR_BACK_MERGE) {
> + if (bio_attempt_back_merge(q, rq, bio)) {
> + if (!attempt_back_merge(q, rq))
> + elv_merged_request(q, rq, ret);
> + return true;
> + }
> + } else if (ret == ELEVATOR_FRONT_MERGE) {
> + if (bio_attempt_front_merge(q, rq, bio)) {
> + if (!attempt_front_merge(q, rq))
> + elv_merged_request(q, rq, ret);
> + return true;
> + }
> + }
> +
> + return false;
> +}
> +EXPORT_SYMBOL_GPL(blk_mq_sched_try_merge);
> +
> +bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio)
> +{
> + struct elevator_queue *e = 

Re: [PATCH 6/8] blk-mq-sched: add framework for MQ capable IO schedulers

2016-12-20 Thread Paolo Valente

> On 17 Dec 2016, at 01:12, Jens Axboe wrote:
> 
> This adds a set of hooks that intercepts the blk-mq path of
> allocating/inserting/issuing/completing requests, allowing
> us to develop a scheduler within that framework.
> 
> We reuse the existing elevator scheduler API on the registration
> side, but augment that with the scheduler flagging support for
> the blk-mq interface, and with a separate set of ops hooks for MQ
> devices.
> 
> Schedulers can opt in to using shadow requests. Shadow requests
> are internal requests that the scheduler uses for the allocate
> and insert part, which are then mapped to a real driver request
> at dispatch time. This is needed to separate the device queue depth
> from the pool of requests that the scheduler has to work with.
> 
> Signed-off-by: Jens Axboe 

> ...
> 
> +struct request *blk_mq_sched_get_request(struct request_queue *q,
> +  struct bio *bio,
> +  unsigned int op,
> +  struct blk_mq_alloc_data *data)
> +{
> + struct elevator_queue *e = q->elevator;
> + struct blk_mq_hw_ctx *hctx;
> + struct blk_mq_ctx *ctx;
> + struct request *rq;
> +
> + blk_queue_enter_live(q);
> + ctx = blk_mq_get_ctx(q);
> + hctx = blk_mq_map_queue(q, ctx->cpu);
> +
> + blk_mq_set_alloc_data(data, q, 0, ctx, hctx);
> +
> + if (e && e->type->ops.mq.get_request)
> + rq = e->type->ops.mq.get_request(q, op, data);

bio is not passed to the scheduler here.  Yet bfq uses bio to get the
blkcg (invoking bio_blkcg).  I'm not finding any workaround.
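
For concreteness, this is roughly what the scheduler would need if the hook
also received the bio (a sketch only; the signature, the example_* names and
the placement in block/ are assumptions, not code from this series):

#include <linux/blk-cgroup.h>
#include "blk-mq.h"	/* block/ internal header, for struct blk_mq_alloc_data */

/*
 * Hypothetical variant of the ->get_request hook that also gets the bio,
 * so the scheduler can resolve the owning blkcg at allocation time.
 * Only bio_blkcg() is an existing helper here.
 */
static struct request *example_get_request(struct request_queue *q,
					   struct bio *bio, unsigned int op,
					   struct blk_mq_alloc_data *data)
{
	struct blkcg *blkcg;

	rcu_read_lock();
	blkcg = bio_blkcg(bio);	/* group this request should be charged to */
	rcu_read_unlock();

	if (!blkcg)
		return NULL;	/* placeholder: real code would fall back to a default group */

	/* ... look up or create the per-group scheduler queue for blkcg ... */
	return NULL;
}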

> + else
> + rq = __blk_mq_alloc_request(data, op);
> +
> + if (rq) {
> + rq->elv.icq = NULL;
> + if (e && e->type->icq_cache)
> + blk_mq_sched_assign_ioc(q, rq, bio);

bfq needs rq->elv.icq to be consistent in bfq_get_request, but the
needed initialization seems to occur only after mq.get_request is
invoked.
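
A possible direction (just a sketch, with an assumed extra hook; not code
from this series) could be to let the core assign the icq first and only
then call back into the scheduler for icq-dependent setup:

/*
 * Hypothetical helper, called from blk_mq_sched_get_request() once a
 * request has been allocated: assign the icq first, then give the
 * scheduler a chance to do icq-dependent initialization. The
 * init_icq_request hook is a made-up name, not an op defined here.
 */
static void example_finish_rq_init(struct request_queue *q,
				   struct elevator_queue *e,
				   struct request *rq, struct bio *bio)
{
	rq->elv.icq = NULL;
	if (e && e->type->icq_cache)
		blk_mq_sched_assign_ioc(q, rq, bio);

	/* rq->elv.icq is now consistent for the scheduler to use */
	if (e && e->type->ops.mq.init_icq_request)
		e->type->ops.mq.init_icq_request(q, rq, bio);
}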

Note: to minimize latency, I'm reporting immediately each problem that
apparently cannot be solved by just modifying bfq.  But, if the
resulting higher number of micro-emails is annoying for you, I can
buffer my questions, and send you cumulative emails less frequently.

Thanks,
Paolo

> + data->hctx->queued++;
> + return rq;
> + }
> +
> + blk_queue_exit(q);
> + return NULL;
> +}
> +
> +void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> +{
> + struct elevator_queue *e = hctx->queue->elevator;
> + LIST_HEAD(rq_list);
> +
> + if (unlikely(blk_mq_hctx_stopped(hctx)))
> + return;
> +
> + hctx->run++;
> +
> + /*
> +  * If we have previous entries on our dispatch list, grab them first for
> +  * more fair dispatch.
> +  */
> + if (!list_empty_careful(&hctx->dispatch)) {
> + spin_lock(&hctx->lock);
> + if (!list_empty(&hctx->dispatch))
> + list_splice_init(&hctx->dispatch, &rq_list);
> + spin_unlock(&hctx->lock);
> + }
> +
> + /*
> +  * Only ask the scheduler for requests, if we didn't have residual
> +  * requests from the dispatch list. This is to avoid the case where
> +  * we only ever dispatch a fraction of the requests available because
> +  * of low device queue depth. Once we pull requests out of the IO
> +  * scheduler, we can no longer merge or sort them. So it's best to
> +  * leave them there for as long as we can. Mark the hw queue as
> +  * needing a restart in that case.
> +  */
> + if (list_empty(&rq_list)) {
> + if (e && e->type->ops.mq.dispatch_requests)
> + e->type->ops.mq.dispatch_requests(hctx, &rq_list);
> + else
> + blk_mq_flush_busy_ctxs(hctx, &rq_list);
> + } else if (!test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state))
> + set_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
> +
> + blk_mq_dispatch_rq_list(hctx, &rq_list);
> +}
> +
> +bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio)
> +{
> + struct request *rq;
> + int ret;
> +
> + ret = elv_merge(q, &rq, bio);
> + if (ret == ELEVATOR_BACK_MERGE) {
> + if (bio_attempt_back_merge(q, rq, bio)) {
> + if (!attempt_back_merge(q, rq))
> + elv_merged_request(q, rq, ret);
> + return true;
> + }
> + } else if (ret == ELEVATOR_FRONT_MERGE) {
> + if (bio_attempt_front_merge(q, rq, bio)) {
> + if (!attempt_front_merge(q, rq))
> + elv_merged_request(q, rq, ret);
> + return true;
> + }
> + }
> +
> + return false;
> +}
> +EXPORT_SYMBOL_GPL(blk_mq_sched_try_merge);
> +
> +bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio)
> +{
> + struct elevator_queue *e = q->elevator;
> +
> + if 

[PATCH 6/8] blk-mq-sched: add framework for MQ capable IO schedulers

2016-12-16 Thread Jens Axboe
This adds a set of hooks that intercepts the blk-mq path of
allocating/inserting/issuing/completing requests, allowing
us to develop a scheduler within that framework.

We reuse the existing elevator scheduler API on the registration
side, but augment that with the scheduler flagging support for
the blk-mq interface, and with a separate set of ops hooks for MQ
devices.
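
(To make the registration side concrete, here is a minimal no-op
passthrough scheduler sketch. Only the hook names used elsewhere in this
patch are taken from it; the example_mq_* names, the exact struct layout
and the uses_mq flag placement are assumptions.)

#include <linux/module.h>
#include <linux/elevator.h>
#include <linux/blk-mq.h>
#include "blk-mq.h"	/* block/ internal: __blk_mq_alloc_request, blk_mq_flush_busy_ctxs */

static struct request *example_mq_get_request(struct request_queue *q,
					      unsigned int op,
					      struct blk_mq_alloc_data *data)
{
	/* no internal (shadow) requests in this stub, allocate directly */
	return __blk_mq_alloc_request(data, op);
}

static void example_mq_dispatch_requests(struct blk_mq_hw_ctx *hctx,
					 struct list_head *rq_list)
{
	/* FIFO passthrough, same as the core's no-elevator branch */
	blk_mq_flush_busy_ctxs(hctx, rq_list);
}

static bool example_mq_allow_merge(struct request_queue *q,
				   struct request *rq, struct bio *bio)
{
	return true;
}

static struct elevator_type example_mq_sched = {
	.ops.mq = {
		.get_request		= example_mq_get_request,
		.dispatch_requests	= example_mq_dispatch_requests,
		.allow_merge		= example_mq_allow_merge,
	},
	.uses_mq	= true,		/* flag this elevator as blk-mq capable */
	.elevator_name	= "example-mq",
	.elevator_owner	= THIS_MODULE,
};

static int __init example_mq_init(void)
{
	return elv_register(&example_mq_sched);
}
module_init(example_mq_init);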

Schedulers can opt in to using shadow requests. Shadow requests
are internal requests that the scheduler uses for the allocate
and insert part, which are then mapped to a real driver request
at dispatch time. This is needed to separate the device queue depth
from the pool of requests that the scheduler has to work with.
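
(A conceptual sketch of the shadow-request idea, using assumed names
throughout; the helpers this patch actually adds differ.)

struct example_sched_data {
	struct list_head fifo;	/* shadow requests the scheduler still holds */
};

/* assumed helper: allocate a driver request and copy the shadow onto it */
static struct request *example_shadow_to_driver_rq(struct blk_mq_hw_ctx *hctx,
						   struct request *shadow);

/*
 * Sketch: the dispatch hook pops one internal "shadow" request and binds
 * it to a real, tag-holding driver request only now, so the scheduler's
 * pool of requests is not capped by the device queue depth.
 */
static bool example_dispatch_one(struct blk_mq_hw_ctx *hctx,
				 struct list_head *rq_list)
{
	struct example_sched_data *sd = hctx->queue->elevator->elevator_data;
	struct request *shadow, *rq;

	shadow = list_first_entry_or_null(&sd->fifo, struct request, queuelist);
	if (!shadow)
		return false;

	rq = example_shadow_to_driver_rq(hctx, shadow);	/* may fail if the device is full */
	if (!rq)
		return false;

	list_del_init(&shadow->queuelist);
	list_add_tail(&rq->queuelist, rq_list);
	return true;
}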

Signed-off-by: Jens Axboe 
---
 block/Makefile   |   2 +-
 block/blk-core.c |   3 +-
 block/blk-exec.c |   3 +-
 block/blk-flush.c|   7 +-
 block/blk-merge.c|   2 +-
 block/blk-mq-sched.c | 434 +++
 block/blk-mq-sched.h | 209 +++
 block/blk-mq.c   | 197 +
 block/blk-mq.h   |   6 +-
 block/elevator.c | 186 +++-
 include/linux/blk-mq.h   |   3 +-
 include/linux/elevator.h |  30 
 12 files changed, 914 insertions(+), 168 deletions(-)
 create mode 100644 block/blk-mq-sched.c
 create mode 100644 block/blk-mq-sched.h

diff --git a/block/Makefile b/block/Makefile
index a827f988c4e6..2eee9e1bb6db 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -6,7 +6,7 @@ obj-$(CONFIG_BLOCK) := bio.o elevator.o blk-core.o blk-tag.o blk-sysfs.o \
blk-flush.o blk-settings.o blk-ioc.o blk-map.o \
blk-exec.o blk-merge.o blk-softirq.o blk-timeout.o \
blk-lib.o blk-mq.o blk-mq-tag.o blk-stat.o \
-   blk-mq-sysfs.o blk-mq-cpumap.o ioctl.o \
+   blk-mq-sysfs.o blk-mq-cpumap.o blk-mq-sched.o ioctl.o \
genhd.o scsi_ioctl.o partition-generic.o ioprio.o \
badblocks.o partitions/
 
diff --git a/block/blk-core.c b/block/blk-core.c
index 92baea07acbc..ee3a6f340cb8 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -39,6 +39,7 @@
 
 #include "blk.h"
 #include "blk-mq.h"
+#include "blk-mq-sched.h"
 #include "blk-wbt.h"
 
 EXPORT_TRACEPOINT_SYMBOL_GPL(block_bio_remap);
@@ -2127,7 +2128,7 @@ int blk_insert_cloned_request(struct request_queue *q, struct request *rq)
if (q->mq_ops) {
if (blk_queue_io_stat(q))
blk_account_io_start(rq, true);
-   blk_mq_insert_request(rq, false, true, false);
+   blk_mq_sched_insert_request(rq, false, true, false);
return 0;
}
 
diff --git a/block/blk-exec.c b/block/blk-exec.c
index 3ecb00a6cf45..86656fdfa637 100644
--- a/block/blk-exec.c
+++ b/block/blk-exec.c
@@ -9,6 +9,7 @@
 #include 
 
 #include "blk.h"
+#include "blk-mq-sched.h"
 
 /*
  * for max sense size
@@ -65,7 +66,7 @@ void blk_execute_rq_nowait(struct request_queue *q, struct gendisk *bd_disk,
 * be reused after dying flag is set
 */
if (q->mq_ops) {
-   blk_mq_insert_request(rq, at_head, true, false);
+   blk_mq_sched_insert_request(rq, at_head, true, false);
return;
}
 
diff --git a/block/blk-flush.c b/block/blk-flush.c
index 20b7c7a02f1c..6a7c29d2eb3c 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -74,6 +74,7 @@
 #include "blk.h"
 #include "blk-mq.h"
 #include "blk-mq-tag.h"
+#include "blk-mq-sched.h"
 
 /* FLUSH/FUA sequences */
 enum {
@@ -453,9 +454,9 @@ void blk_insert_flush(struct request *rq)
 */
if ((policy & REQ_FSEQ_DATA) &&
!(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
-   if (q->mq_ops) {
-   blk_mq_insert_request(rq, false, true, false);
-   } else
+   if (q->mq_ops)
+   blk_mq_sched_insert_request(rq, false, true, false);
+   else
list_add_tail(&rq->queuelist, &q->queue_head);
return;
}
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 480570b691dc..6aa43dec5af4 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -763,7 +763,7 @@ int blk_attempt_req_merge(struct request_queue *q, struct request *rq,
 {
struct elevator_queue *e = q->elevator;
 
-   if (e->type->ops.sq.elevator_allow_rq_merge_fn)
+   if (!e->uses_mq && e->type->ops.sq.elevator_allow_rq_merge_fn)
if (!e->type->ops.sq.elevator_allow_rq_merge_fn(q, rq, next))
return 0;
 
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
new file mode 100644
index ..b7e1839d4785
--- /dev/null
+++ b/block/blk-mq-sched.c
@@ -0,0 +1,434 @@
+/*
+ * blk-mq scheduling framework
+ *
+ * Copyright (C) 2016 Jens Axboe
+ */
+#include 
+#include