Re: [PATCH V2] blk-mq: avoid to starve tag allocation after allocation process migrates

2018-05-23 Thread Ming Lei
On Thu, May 24, 2018 at 1:40 AM, Omar Sandoval  wrote:
> On Sat, May 19, 2018 at 03:44:06PM +0800, Ming Lei wrote:
>> When the allocation process is scheduled back and the mapped hw queue is
>> changed, do one extra wake up on the original queue to compensate for the
>> wake-up miss, so other allocations on the original queue won't be starved.
>>
>> This patch fixes a request allocation hang issue, which can be triggered
>> easily when nr_requests is very low.
>>
>> Cc: 
>> Cc: Omar Sandoval 
>> Signed-off-by: Ming Lei 
>> ---
>>
>> V2:
>>   fix build failure
>>
>>  block/blk-mq-tag.c  | 13 +
>>  include/linux/sbitmap.h |  7 +++
>>  lib/sbitmap.c   |  6 ++
>>  3 files changed, 26 insertions(+)
>>
>> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
>> index 336dde07b230..77607f89d205 100644
>> --- a/block/blk-mq-tag.c
>> +++ b/block/blk-mq-tag.c
>> @@ -134,6 +134,8 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
>>   ws = bt_wait_ptr(bt, data->hctx);
>>   drop_ctx = data->ctx == NULL;
>>   do {
>> + struct sbitmap_queue *bt_orig;
>> +
>>   /*
>>* We're out of tags on this hardware queue, kick any
>>* pending IO submits before going to sleep waiting for
>> @@ -159,6 +161,7 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
>>   if (data->ctx)
>>   blk_mq_put_ctx(data->ctx);
>>
>> + bt_orig = bt;
>>   io_schedule();
>>
>>   data->ctx = blk_mq_get_ctx(data->q);
>> @@ -170,6 +173,16 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
>>   bt = &tags->bitmap_tags;
>>
>>   finish_wait(&ws->wait, &wait);
>> +
>> + /*
>> +  * If destination hw queue is changed, wake up original
>> +  * queue one extra time for compensating the wake up
>> +  * miss, so other allocations on original queue won't
>> +  * be starved.
>> +  */
>> + if (bt != bt_orig)
>> + sbitmap_queue_wake_up(bt_orig);
>> +
>>   ws = bt_wait_ptr(bt, data->hctx);
>>   } while (1);
>>
>> diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h
>> index 841585f6e5f2..b23f50355281 100644
>> --- a/include/linux/sbitmap.h
>> +++ b/include/linux/sbitmap.h
>> @@ -484,6 +484,13 @@ static inline struct sbq_wait_state *sbq_wait_ptr(struct sbitmap_queue *sbq,
>>  void sbitmap_queue_wake_all(struct sbitmap_queue *sbq);
>>
>>  /**
>> + * sbitmap_queue_wake_up() - Do a regular wake up compensation if the queue
>> + * allocated from is changed after scheduling back.
>> + * @sbq: Bitmap queue to wake up.
>> + */
>> +void sbitmap_queue_wake_up(struct sbitmap_queue *sbq);
>> +
>> +/**
>>   * sbitmap_queue_show() - Dump &struct sbitmap_queue information to a &struct
>>   * seq_file.
>>   * @sbq: Bitmap queue to show.
>> diff --git a/lib/sbitmap.c b/lib/sbitmap.c
>> index e6a9c06ec70c..c6ae4206bcb1 100644
>> --- a/lib/sbitmap.c
>> +++ b/lib/sbitmap.c
>> @@ -466,6 +466,12 @@ static void sbq_wake_up(struct sbitmap_queue *sbq)
>>   }
>>  }
>>
>> +void sbitmap_queue_wake_up(struct sbitmap_queue *sbq)
>> +{
>> + sbq_wake_up(sbq);
>> +}
>> +EXPORT_SYMBOL_GPL(sbitmap_queue_wake_up);
>
> There's no reason to wrap an internal helper with a function taking the
> same arguments; just rename sbq_wake_up() and export it.

This way sbq_wake_up() can stay inlined into sbitmap_queue_clear(), so
we don't introduce any cost on the fast path.
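
In other words, the wrapper keeps roughly this shape (a sketch only, with
the helper body elided; not the exact lib/sbitmap.c code):

static void sbq_wake_up(struct sbitmap_queue *sbq)
{
        /* wake_batch accounting on the chosen sbq_wait_state lives here */
}

void sbitmap_queue_wake_up(struct sbitmap_queue *sbq)
{
        sbq_wake_up(sbq);
}
EXPORT_SYMBOL_GPL(sbitmap_queue_wake_up);

void sbitmap_queue_clear(struct sbitmap_queue *sbq, unsigned int nr,
                         unsigned int cpu)
{
        sbitmap_clear_bit(&sbq->sb, nr);
        sbq_wake_up(sbq);       /* static callee, so it can still be inlined here */
        /* (the per-cpu alloc_hint update is elided from this sketch) */
}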

Thanks,
Ming Lei


Re: [PATCH V2] blk-mq: avoid to starve tag allocation after allocation process migrates

2018-05-23 Thread Omar Sandoval
On Thu, May 24, 2018 at 06:09:51AM +0800, Ming Lei wrote:
> On Thu, May 24, 2018 at 1:48 AM, Omar Sandoval  wrote:
> > On Wed, May 23, 2018 at 05:32:31PM +0800, Ming Lei wrote:
> >> On Tue, May 22, 2018 at 09:59:17PM -0600, Jens Axboe wrote:
> >> > On 5/19/18 1:44 AM, Ming Lei wrote:
> >> > > When the allocation process is scheduled back and the mapped hw queue is
> >> > > changed, do one extra wake up on the original queue to compensate for the
> >> > > wake-up miss, so other allocations on the original queue won't be starved.
> >> > >
> >> > > This patch fixes a request allocation hang issue, which can be triggered
> >> > > easily when nr_requests is very low.
> >> >
> >> > Trying to think of better ways we can fix this, but I don't see
> >> > any right now. Getting rid of the wake_up_nr() kills us on tons
> >> > of tasks waiting.
> >>
> >> I am not sure I understand your point, but this issue isn't actually
> >> related to wake_up_nr(), and it can be reproduced after reverting
> >> 4e5dff41be7b5201c1c47c (blk-mq: improve heavily contended tag case).
> >>
> >> All tasks in the current sbq_wait_state may be scheduled to other CPUs,
> >> and there may still be tasks waiting for allocation from this
> >> sbitmap_queue; the root cause is the cross-queue allocation, and as you
> >> said, there are too many queues, :-)
> >
> > I don't follow. Your description of the problem was that we have two
> > waiters and only wake up one, which doesn't in turn allocate and free a
> > tag and wake up the second waiter. Changing it back to wake_up_nr()
> > eliminates that problem. And if waking up everything doesn't fix it, how
> > does your fix of waking up a few extra tasks fix it?
> 
> What matters is that this patch wakes up the previous sbq; let's look at it
> from another angle:
> 
> 1) still 2 hw queues, nr_requests is 2, and wake_batch is one
> 
> 2) there are 3 waiters on hw queue 0
> 
> 3) two in-flight requests in hw queue 0 are completed, and only two of the
> 3 waiters are woken up because of wake_batch, but both woken waiters can be
> scheduled to another CPU and end up switching to hw queue 1
> 
> 4) then the 3rd waiter will wait forever, since no in-flight request is left
> in hw queue 0 any more.
> 
> 5) this patch fixes it with the fake wakeup done when a waiter is scheduled
> to another hw queue
> 
> The issue can be understood a bit more easily if we just forget
> sbq_wait_state and focus on sbq, :-)

Okay, I see, although if I'm understanding correctly, it has everything
to do with sbq_wait_state. If we only had one waitqueue, then wake_up()
would always wake up all of the waiters, but because we have them spread
out over multiple waitqueues, we have to call
sbq_wake_up()/sbitmap_queue_wake_up() to do the wake up on the other
waitqueue.
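
For reference, that spreading comes from the per-hctx wait index; a
simplified sketch of the two helpers involved (details may differ slightly
from the exact kernel version):

/* include/linux/sbitmap.h: each sbitmap_queue has SBQ_WAIT_QUEUES waitqueues */
static inline struct sbq_wait_state *sbq_wait_ptr(struct sbitmap_queue *sbq,
                                                  atomic_t *wait_index)
{
        struct sbq_wait_state *ws;

        /* hand out the waitqueues in round-robin order */
        ws = &sbq->ws[atomic_read(wait_index)];
        sbq_index_atomic_inc(wait_index);
        return ws;
}

/* block/blk-mq-tag.c: waiters of one hctx end up spread over those waitqueues */
static struct sbq_wait_state *bt_wait_ptr(struct sbitmap_queue *bt,
                                          struct blk_mq_hw_ctx *hctx)
{
        if (!hctx)
                return &bt->ws[0];
        return sbq_wait_ptr(bt, &hctx->wait_index);
}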


Re: [PATCH V2] blk-mq: avoid to starve tag allocation after allocation process migrates

2018-05-23 Thread Jens Axboe
On 5/23/18 4:09 PM, Ming Lei wrote:
> On Thu, May 24, 2018 at 1:48 AM, Omar Sandoval  wrote:
>> On Wed, May 23, 2018 at 05:32:31PM +0800, Ming Lei wrote:
>>> On Tue, May 22, 2018 at 09:59:17PM -0600, Jens Axboe wrote:
>>>> On 5/19/18 1:44 AM, Ming Lei wrote:
>>>>> When the allocation process is scheduled back and the mapped hw queue is
>>>>> changed, do one extra wake up on the original queue to compensate for the
>>>>> wake-up miss, so other allocations on the original queue won't be starved.
>>>>>
>>>>> This patch fixes a request allocation hang issue, which can be triggered
>>>>> easily when nr_requests is very low.
>>>>
>>>> Trying to think of better ways we can fix this, but I don't see
>>>> any right now. Getting rid of the wake_up_nr() kills us on tons
>>>> of tasks waiting.
>>>
>>> I am not sure I understand your point, but this issue isn't actually
>>> related to wake_up_nr(), and it can be reproduced after reverting
>>> 4e5dff41be7b5201c1c47c (blk-mq: improve heavily contended tag case).
>>>
>>> All tasks in the current sbq_wait_state may be scheduled to other CPUs,
>>> and there may still be tasks waiting for allocation from this
>>> sbitmap_queue; the root cause is the cross-queue allocation, and as you
>>> said, there are too many queues, :-)
>>
>> I don't follow. Your description of the problem was that we have two
>> waiters and only wake up one, which doesn't in turn allocate and free a
>> tag and wake up the second waiter. Changing it back to wake_up_nr()
>> eliminates that problem. And if waking up everything doesn't fix it, how
>> does your fix of waking up a few extra tasks fix it?
> 
> What matters is that this patch wakes up the previous sbq; let's look at it
> from another angle:
> 
> 1) still 2 hw queues, nr_requests is 2, and wake_batch is one
> 
> 2) there are 3 waiters on hw queue 0
> 
> 3) two in-flight requests in hw queue 0 are completed, and only two of the
> 3 waiters are woken up because of wake_batch, but both woken waiters can be
> scheduled to another CPU and end up switching to hw queue 1
> 
> 4) then the 3rd waiter will wait forever, since no in-flight request is left
> in hw queue 0 any more.
> 
> 5) this patch fixes it with the fake wakeup done when a waiter is scheduled
> to another hw queue
> 
> The issue can be understood a bit more easily if we just forget
> sbq_wait_state and focus on sbq, :-)

It makes sense to me, and also explains why wake_up() vs wake_up_nr() doesn't
matter. Which is actually a relief. And the condition of moving AND having
a waiter should be rare enough that it'll work out fine in practice, I don't
see any performance implications from this. You're right that we already
abort early if we don't have pending waiters, so it's all good.

Can you respin with the comments from Omar and myself covered?

-- 
Jens Axboe



Re: [PATCH V2] blk-mq: avoid to starve tag allocation after allocation process migrates

2018-05-23 Thread Ming Lei
On Thu, May 24, 2018 at 1:48 AM, Omar Sandoval  wrote:
> On Wed, May 23, 2018 at 05:32:31PM +0800, Ming Lei wrote:
>> On Tue, May 22, 2018 at 09:59:17PM -0600, Jens Axboe wrote:
>> > On 5/19/18 1:44 AM, Ming Lei wrote:
>> > > When the allocation process is scheduled back and the mapped hw queue is
>> > > changed, do one extra wake up on the original queue to compensate for the
>> > > wake-up miss, so other allocations on the original queue won't be starved.
>> > >
>> > > This patch fixes a request allocation hang issue, which can be triggered
>> > > easily when nr_requests is very low.
>> >
>> > Trying to think of better ways we can fix this, but I don't see
>> > any right now. Getting rid of the wake_up_nr() kills us on tons
>> > of tasks waiting.
>>
>> I am not sure I understand your point, but this issue isn't actually
>> related to wake_up_nr(), and it can be reproduced after reverting
>> 4e5dff41be7b5201c1c47c (blk-mq: improve heavily contended tag case).
>>
>> All tasks in the current sbq_wait_state may be scheduled to other CPUs,
>> and there may still be tasks waiting for allocation from this
>> sbitmap_queue; the root cause is the cross-queue allocation, and as you
>> said, there are too many queues, :-)
>
> I don't follow. Your description of the problem was that we have two
> waiters and only wake up one, which doesn't in turn allocate and free a
> tag and wake up the second waiter. Changing it back to wake_up_nr()
> eliminates that problem. And if waking up everything doesn't fix it, how
> does your fix of waking up a few extra tasks fix it?

What matters is that this patch wakes up the previous sbq; let's look at it
from another angle:

1) still 2 hw queues, nr_requests is 2, and wake_batch is one

2) there are 3 waiters on hw queue 0

3) two in-flight requests in hw queue 0 are completed, and only two of the
3 waiters are woken up because of wake_batch, but both woken waiters can be
scheduled to another CPU and end up switching to hw queue 1

4) then the 3rd waiter will wait forever, since no in-flight request is left
in hw queue 0 any more.

5) this patch fixes it with the fake wakeup done when a waiter is scheduled
to another hw queue

The issue can be understood a bit more easily if we just forget
sbq_wait_state and focus on sbq, :-)
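
Put differently, with the patch applied the allocation slow path looks
roughly like this (a condensed sketch of blk_mq_get_tag(), using the
bt_prev name suggested elsewhere in the thread; reserved-tag handling,
ctx drop and the hw queue kick are elided):

        ws = bt_wait_ptr(bt, data->hctx);
        do {
                tag = __blk_mq_get_tag(data, bt);
                if (tag != -1)
                        break;

                prepare_to_wait_exclusive(&ws->wait, &wait,
                                          TASK_UNINTERRUPTIBLE);

                tag = __blk_mq_get_tag(data, bt);
                if (tag != -1)
                        break;

                bt_prev = bt;           /* the sbq we are about to sleep on */
                io_schedule();

                /* we may have been migrated: re-map ctx/hctx, pick the new sbq */
                data->ctx = blk_mq_get_ctx(data->q);
                data->hctx = blk_mq_map_queue(data->q, data->ctx->cpu);
                bt = &blk_mq_tags_from_data(data)->bitmap_tags;

                finish_wait(&ws->wait, &wait);

                /*
                 * Step 5) above: if the hw queue changed while sleeping, hand
                 * the wake-up we consumed back to the old sbq, so the waiter
                 * left behind there (step 4) is not starved.
                 */
                if (bt != bt_prev)
                        sbitmap_queue_wake_up(bt_prev);

                ws = bt_wait_ptr(bt, data->hctx);
        } while (1);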

Thanks,
Ming Lei


Re: [PATCH V2] blk-mq: avoid to starve tag allocation after allocation process migrates

2018-05-23 Thread Omar Sandoval
On Wed, May 23, 2018 at 05:32:31PM +0800, Ming Lei wrote:
> On Tue, May 22, 2018 at 09:59:17PM -0600, Jens Axboe wrote:
> > On 5/19/18 1:44 AM, Ming Lei wrote:
> > > When the allocation process is scheduled back and the mapped hw queue is
> > > changed, do one extra wake up on the original queue to compensate for the
> > > wake-up miss, so other allocations on the original queue won't be starved.
> > >
> > > This patch fixes a request allocation hang issue, which can be triggered
> > > easily when nr_requests is very low.
> > 
> > Trying to think of better ways we can fix this, but I don't see
> > any right now. Getting rid of the wake_up_nr() kills us on tons
> > of tasks waiting. 
> 
> I am not sure I understand your point, but this issue isn't actually
> related to wake_up_nr(), and it can be reproduced after reverting
> 4e5dff41be7b5201c1c47c (blk-mq: improve heavily contended tag case).
> 
> All tasks in the current sbq_wait_state may be scheduled to other CPUs,
> and there may still be tasks waiting for allocation from this
> sbitmap_queue; the root cause is the cross-queue allocation, and as you
> said, there are too many queues, :-)

I don't follow. Your description of the problem was that we have two
waiters and only wake up one, which doesn't in turn allocate and free a
tag and wake up the second waiter. Changing it back to wake_up_nr()
eliminates that problem. And if waking up everything doesn't fix it, how
does your fix of waking up a few extra tasks fix it?


Re: [PATCH V2] blk-mq: avoid to starve tag allocation after allocation process migrates

2018-05-23 Thread Omar Sandoval
On Sat, May 19, 2018 at 03:44:06PM +0800, Ming Lei wrote:
> When the allocation process is scheduled back and the mapped hw queue is
> changed, do one extra wake up on the original queue to compensate for the
> wake-up miss, so other allocations on the original queue won't be starved.
> 
> This patch fixes a request allocation hang issue, which can be triggered
> easily when nr_requests is very low.
> 
> Cc: 
> Cc: Omar Sandoval 
> Signed-off-by: Ming Lei 
> ---
> 
> V2:
>   fix build failure
> 
>  block/blk-mq-tag.c  | 13 +
>  include/linux/sbitmap.h |  7 +++
>  lib/sbitmap.c   |  6 ++
>  3 files changed, 26 insertions(+)
> 
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index 336dde07b230..77607f89d205 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -134,6 +134,8 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
>   ws = bt_wait_ptr(bt, data->hctx);
>   drop_ctx = data->ctx == NULL;
>   do {
> + struct sbitmap_queue *bt_orig;
> +
>   /*
>* We're out of tags on this hardware queue, kick any
>* pending IO submits before going to sleep waiting for
> @@ -159,6 +161,7 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
>   if (data->ctx)
>   blk_mq_put_ctx(data->ctx);
>  
> + bt_orig = bt;
>   io_schedule();
>  
>   data->ctx = blk_mq_get_ctx(data->q);
> @@ -170,6 +173,16 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
>   bt = &tags->bitmap_tags;
>  
>   finish_wait(&ws->wait, &wait);
> +
> + /*
> +  * If destination hw queue is changed, wake up original
> +  * queue one extra time for compensating the wake up
> +  * miss, so other allocations on original queue won't
> +  * be starved.
> +  */
> + if (bt != bt_orig)
> + sbitmap_queue_wake_up(bt_orig);
> +
>   ws = bt_wait_ptr(bt, data->hctx);
>   } while (1);
>  
> diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h
> index 841585f6e5f2..b23f50355281 100644
> --- a/include/linux/sbitmap.h
> +++ b/include/linux/sbitmap.h
> @@ -484,6 +484,13 @@ static inline struct sbq_wait_state *sbq_wait_ptr(struct sbitmap_queue *sbq,
>  void sbitmap_queue_wake_all(struct sbitmap_queue *sbq);
>  
>  /**
> + * sbitmap_queue_wake_up() - Do a regular wake up compensation if the queue
> + * allocated from is changed after scheduling back.
> + * @sbq: Bitmap queue to wake up.
> + */
> +void sbitmap_queue_wake_up(struct sbitmap_queue *sbq);
> +
> +/**
>   * sbitmap_queue_show() - Dump &struct sbitmap_queue information to a &struct
>   * seq_file.
>   * @sbq: Bitmap queue to show.
> diff --git a/lib/sbitmap.c b/lib/sbitmap.c
> index e6a9c06ec70c..c6ae4206bcb1 100644
> --- a/lib/sbitmap.c
> +++ b/lib/sbitmap.c
> @@ -466,6 +466,12 @@ static void sbq_wake_up(struct sbitmap_queue *sbq)
>   }
>  }
>  
> +void sbitmap_queue_wake_up(struct sbitmap_queue *sbq)
> +{
> + sbq_wake_up(sbq);
> +}
> +EXPORT_SYMBOL_GPL(sbitmap_queue_wake_up);

There's no reason to wrap an internal helper with a function taking the
same arguments; just rename sbq_wake_up() and export it.
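
I.e., roughly this shape instead of a pass-through wrapper (illustrative
diff only, function bodies elided):

-static void sbq_wake_up(struct sbitmap_queue *sbq)
+void sbitmap_queue_wake_up(struct sbitmap_queue *sbq)
 {
 	...
 }
+EXPORT_SYMBOL_GPL(sbitmap_queue_wake_up);
 
 void sbitmap_queue_clear(struct sbitmap_queue *sbq, unsigned int nr,
 			 unsigned int cpu)
 {
 	sbitmap_clear_bit(&sbq->sb, nr);
-	sbq_wake_up(sbq);
+	sbitmap_queue_wake_up(sbq);
 	...
 }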


Re: [PATCH V2] blk-mq: avoid to starve tag allocation after allocation process migrates

2018-05-23 Thread Ming Lei
On Tue, May 22, 2018 at 09:59:17PM -0600, Jens Axboe wrote:
> On 5/19/18 1:44 AM, Ming Lei wrote:
> > When the allocation process is scheduled back and the mapped hw queue is
> > changed, do one extra wake up on the original queue to compensate for the
> > wake-up miss, so other allocations on the original queue won't be starved.
> > 
> > This patch fixes a request allocation hang issue, which can be triggered
> > easily when nr_requests is very low.
> 
> Trying to think of better ways we can fix this, but I don't see
> any right now. Getting rid of the wake_up_nr() kills us on tons
> of tasks waiting. 

I am not sure I understand your point, but this issue isn't actually
related to wake_up_nr(), and it can be reproduced after reverting
4e5dff41be7b5201c1c47c (blk-mq: improve heavily contended tag case).

All tasks in the current sbq_wait_state may be scheduled to other CPUs,
and there may still be tasks waiting for allocation from this
sbitmap_queue; the root cause is the cross-queue allocation, and as you
said, there are too many queues, :-)

> Maybe it's possible to only go through the fake wakeup IFF we have a task
> waiting on the list; that'd spare us the atomic dec and cmpxchg for all
> cases except when we have a task (or more) waiting on the existing wait
> state.

sbq_wake_ptr() checks if there are waiting tasks, and it will do
nothing if there aren't any.
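
For reference, the check is roughly this (simplified from lib/sbitmap.c of
that time; details may differ by version):

static struct sbq_wait_state *sbq_wake_ptr(struct sbitmap_queue *sbq)
{
        int i, wake_index;

        wake_index = atomic_read(&sbq->wake_index);
        for (i = 0; i < SBQ_WAIT_QUEUES; i++) {
                struct sbq_wait_state *ws = &sbq->ws[wake_index];

                /* only a waitqueue with active waiters is returned... */
                if (waitqueue_active(&ws->wait)) {
                        int o = atomic_read(&sbq->wake_index);

                        if (wake_index != o)
                                atomic_cmpxchg(&sbq->wake_index, o, wake_index);
                        return ws;
                }

                wake_index = sbq_index_inc(wake_index);
        }

        /* ...so sbq_wake_up() bails out before the atomic dec and cmpxchg */
        return NULL;
}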

In theory, the fake wakeup is only needed when the current wakeup
is triggered by the last in-flight request, but it isn't cheap
to figure that out accurately.

Given it is quite unusual for a process to be scheduled from one CPU to
another, and the cost of process migration can be much bigger than the
wakeup, I guess we don't need to worry about the performance effect.

> 
> > diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> > index 336dde07b230..77607f89d205 100644
> > --- a/block/blk-mq-tag.c
> > +++ b/block/blk-mq-tag.c
> > @@ -134,6 +134,8 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
> > ws = bt_wait_ptr(bt, data->hctx);
> > drop_ctx = data->ctx == NULL;
> > do {
> > +   struct sbitmap_queue *bt_orig;
> 
> This should be called 'bt_prev'.

OK.

> 
> > diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h
> > index 841585f6e5f2..b23f50355281 100644
> > --- a/include/linux/sbitmap.h
> > +++ b/include/linux/sbitmap.h
> > @@ -484,6 +484,13 @@ static inline struct sbq_wait_state *sbq_wait_ptr(struct sbitmap_queue *sbq,
> >  void sbitmap_queue_wake_all(struct sbitmap_queue *sbq);
> >  
> >  /**
> > + * sbitmap_queue_wake_up() - Do a regular wake up compensation if the queue
> > + * allocated from is changed after scheduling back.
> > + * @sbq: Bitmap queue to wake up.
> > + */
> > +void sbitmap_queue_wake_up(struct sbitmap_queue *sbq);
> 
> The blk-mq issue is bleeding into sbitmap here. This should just detail
> that this issues a wakeup, similar to how freeing a tag would.

Right, will fix it in V3.

Thanks,
Ming


Re: [PATCH V2] blk-mq: avoid to starve tag allocation after allocation process migrates

2018-05-22 Thread Jens Axboe
On 5/19/18 1:44 AM, Ming Lei wrote:
> When the allocation process is scheduled back and the mapped hw queue is
> changed, do one extra wake up on the original queue to compensate for the
> wake-up miss, so other allocations on the original queue won't be starved.
> 
> This patch fixes a request allocation hang issue, which can be triggered
> easily when nr_requests is very low.

Trying to think of better ways we can fix this, but I don't see
any right now. Getting rid of the wake_up_nr() kills us on tons
of tasks waiting. Maybe it's possible to only go through the fake wakeup
IFF we have a task waiting on the list; that'd spare us the atomic dec and
cmpxchg for all cases except when we have a task (or more) waiting on the
existing wait state.

> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index 336dde07b230..77607f89d205 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -134,6 +134,8 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
>   ws = bt_wait_ptr(bt, data->hctx);
>   drop_ctx = data->ctx == NULL;
>   do {
> + struct sbitmap_queue *bt_orig;

This should be called 'bt_prev'.

> diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h
> index 841585f6e5f2..b23f50355281 100644
> --- a/include/linux/sbitmap.h
> +++ b/include/linux/sbitmap.h
> @@ -484,6 +484,13 @@ static inline struct sbq_wait_state *sbq_wait_ptr(struct sbitmap_queue *sbq,
>  void sbitmap_queue_wake_all(struct sbitmap_queue *sbq);
>  
>  /**
> + * sbitmap_queue_wake_up() - Do a regular wake up compensation if the queue
> + * allocated from is changed after scheduling back.
> + * @sbq: Bitmap queue to wake up.
> + */
> +void sbitmap_queue_wake_up(struct sbitmap_queue *sbq);

The blk-mq issue is bleeding into sbitmap here. This should just detail
that this issues a wakeup, similar to how freeing a tag would.
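
E.g. something closer to this (one possible wording only, not the final
text):

/**
 * sbitmap_queue_wake_up() - Wake up some of the waiters in one of the
 * waitqueues of @sbq, as freeing a bit (and hence a tag) would do.
 * @sbq: Bitmap queue to wake up.
 */
void sbitmap_queue_wake_up(struct sbitmap_queue *sbq);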

-- 
Jens Axboe



Re: [PATCH V2] blk-mq: avoid to starve tag allocation after allocation process migrates

2018-05-22 Thread Jens Axboe
On 5/22/18 6:22 PM, Ming Lei wrote:
> On Tue, May 22, 2018 at 02:20:37PM -0600, Jens Axboe wrote:
>> On 5/19/18 1:44 AM, Ming Lei wrote:
>>> When the allocation process is scheduled back and the mapped hw queue is
>>> changed, do one extra wake up on the original queue to compensate for the
>>> wake-up miss, so other allocations on the original queue won't be starved.
>>>
>>> This patch fixes a request allocation hang issue, which can be triggered
>>> easily when nr_requests is very low.
>>
>> Can you explain what the actual bug is? The commit message doesn't
>> really say, it just explains what the code change does. What wake up
>> miss?
> 
> For example:
> 
> 1) one request queue has 2 queues, and the queue depth is low, so wake_batch
> is set to 1 or very small.

We have too many 'queues' :)

> 2) There are 2 allocations blocked on hw queue 0; now the last request
> mapped to hw queue 0 is completed, and sbq_wake_up() wakes only one of the
> 2 blockers.
> 
> 3) the woken-up allocation process is migrated to another CPU, which is
> mapped to hw queue 1, so this allocation switches to hw queue 1
> 
> 4) so there isn't any chance for the other blocked allocation to move on,
> since all requests in hw queue 0 have been completed. That is why I call
> it a wake-up miss.

OK, that makes sense to me. With that, let me review your patch in
a different email.

-- 
Jens Axboe



Re: [PATCH V2] blk-mq: avoid to starve tag allocation after allocation process migrates

2018-05-22 Thread Ming Lei
On Tue, May 22, 2018 at 02:20:37PM -0600, Jens Axboe wrote:
> On 5/19/18 1:44 AM, Ming Lei wrote:
> > When the allocation process is scheduled back and the mapped hw queue is
> > changed, do one extra wake up on the original queue to compensate for the
> > wake-up miss, so other allocations on the original queue won't be starved.
> > 
> > This patch fixes a request allocation hang issue, which can be triggered
> > easily when nr_requests is very low.
> 
> Can you explain what the actual bug is? The commit message doesn't
> really say, it just explains what the code change does. What wake up
> miss?

For example:

1) one request queue has 2 queues, and the queue depth is low, so wake_batch
is set to 1 or very small.

2) There are 2 allocations blocked on hw queue 0; now the last request mapped
to hw queue 0 is completed, and sbq_wake_up() wakes only one of the 2 blockers.

3) the woken-up allocation process is migrated to another CPU, which is
mapped to hw queue 1, so this allocation switches to hw queue 1

4) so there isn't any chance for the other blocked allocation to move on,
since all requests in hw queue 0 have been completed. That is why I call
it a wake-up miss.

BTW, this issue can be reproduced reliably by the block/020 I posted
yesterday:

https://marc.info/?l=linux-block&m=152698758418536&w=2
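
As to why a low queue depth means a tiny wake_batch: the batch is derived
from the depth, roughly like this (gist of sbq_calc_wake_batch() in
lib/sbitmap.c; the exact helper differs a bit between versions):

static unsigned int sbq_calc_wake_batch(unsigned int depth)
{
        /*
         * One waitqueue is woken up per batch, and there are SBQ_WAIT_QUEUES
         * (8) waitqueues per sbitmap_queue, so the batch has to be small
         * enough that the full depth can wake up all of them.  With
         * nr_requests == 2 this clamps to 1, i.e. each completed request
         * wakes at most one waiter on one waitqueue.
         */
        return clamp_t(unsigned int, depth / SBQ_WAIT_QUEUES, 1, SBQ_WAKE_BATCH);
}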

Thanks,
Ming


Re: [PATCH V2] blk-mq: avoid to starve tag allocation after allocation process migrates

2018-05-22 Thread Jens Axboe
On 5/19/18 1:44 AM, Ming Lei wrote:
> When the allocation process is scheduled back and the mapped hw queue is
> changed, do one extra wake up on the original queue to compensate for the
> wake-up miss, so other allocations on the original queue won't be starved.
> 
> This patch fixes a request allocation hang issue, which can be triggered
> easily when nr_requests is very low.

Can you explain what the actual bug is? The commit message doesn't
really say, it just explains what the code change does. What wake up
miss?

-- 
Jens Axboe



[PATCH V2] blk-mq: avoid to starve tag allocation after allocation process migrates

2018-05-19 Thread Ming Lei
When the allocation process is scheduled back and the mapped hw queue is
changed, do one extra wake up on the original queue to compensate for the
wake-up miss, so other allocations on the original queue won't be starved.

This patch fixes a request allocation hang issue, which can be triggered
easily when nr_requests is very low.

Cc: 
Cc: Omar Sandoval 
Signed-off-by: Ming Lei 
---

V2:
fix build failure

 block/blk-mq-tag.c  | 13 +
 include/linux/sbitmap.h |  7 +++
 lib/sbitmap.c   |  6 ++
 3 files changed, 26 insertions(+)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 336dde07b230..77607f89d205 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -134,6 +134,8 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
ws = bt_wait_ptr(bt, data->hctx);
drop_ctx = data->ctx == NULL;
do {
+   struct sbitmap_queue *bt_orig;
+
/*
 * We're out of tags on this hardware queue, kick any
 * pending IO submits before going to sleep waiting for
@@ -159,6 +161,7 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
if (data->ctx)
blk_mq_put_ctx(data->ctx);
 
+   bt_orig = bt;
io_schedule();
 
data->ctx = blk_mq_get_ctx(data->q);
@@ -170,6 +173,16 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
bt = &tags->bitmap_tags;
 
finish_wait(&ws->wait, &wait);
+
+   /*
+* If destination hw queue is changed, wake up original
+* queue one extra time for compensating the wake up
+* miss, so other allocations on original queue won't
+* be starved.
+*/
+   if (bt != bt_orig)
+   sbitmap_queue_wake_up(bt_orig);
+
ws = bt_wait_ptr(bt, data->hctx);
} while (1);
 
diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h
index 841585f6e5f2..b23f50355281 100644
--- a/include/linux/sbitmap.h
+++ b/include/linux/sbitmap.h
@@ -484,6 +484,13 @@ static inline struct sbq_wait_state *sbq_wait_ptr(struct sbitmap_queue *sbq,
 void sbitmap_queue_wake_all(struct sbitmap_queue *sbq);
 
 /**
+ * sbitmap_queue_wake_up() - Do a regular wake up compensation if the queue
+ * allocated from is changed after scheduling back.
+ * @sbq: Bitmap queue to wake up.
+ */
+void sbitmap_queue_wake_up(struct sbitmap_queue *sbq);
+
+/**
  * sbitmap_queue_show() - Dump &struct sbitmap_queue information to a &struct
  * seq_file.
  * @sbq: Bitmap queue to show.
diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index e6a9c06ec70c..c6ae4206bcb1 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -466,6 +466,12 @@ static void sbq_wake_up(struct sbitmap_queue *sbq)
}
 }
 
+void sbitmap_queue_wake_up(struct sbitmap_queue *sbq)
+{
+   sbq_wake_up(sbq);
+}
+EXPORT_SYMBOL_GPL(sbitmap_queue_wake_up);
+
 void sbitmap_queue_clear(struct sbitmap_queue *sbq, unsigned int nr,
 unsigned int cpu)
 {
-- 
2.9.5