Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()

2018-04-06 Thread Ming Lei
On Fri, Apr 06, 2018 at 05:11:53PM +0200, Christian Borntraeger wrote:
> 
> 
> On 04/06/2018 04:58 PM, Ming Lei wrote:
> > On Fri, Apr 06, 2018 at 04:26:49PM +0200, Christian Borntraeger wrote:
> >>
> >>
> >> On 04/06/2018 03:41 PM, Ming Lei wrote:
> >>> On Fri, Apr 06, 2018 at 12:19:19PM +0200, Christian Borntraeger wrote:
> 
> 
>  On 04/06/2018 11:23 AM, Ming Lei wrote:
> > On Fri, Apr 06, 2018 at 10:51:28AM +0200, Christian Borntraeger wrote:
> >>
> >>
> >> On 04/06/2018 10:41 AM, Ming Lei wrote:
> >>> On Thu, Apr 05, 2018 at 07:39:56PM +0200, Christian Borntraeger wrote:
> 
> 
>  On 04/05/2018 06:11 PM, Ming Lei wrote:
> >>
> >> Could you please apply the following patch and provide the dmesg 
> >> boot log?
> >
> > And please post out the 'lscpu' log together from the test machine 
> > too.
> 
>  attached.
> 
>  As I said before, this seems to go away with CONFIG_NR_CPUS=64 or smaller.
>  We have 282 nr_cpu_ids here (max 141 CPUs on that z13 with SMT2) but only 8 Cores == 16 threads.
> >>>
> >>> OK, thanks!
> >>>
> >>> The most weird thing is that hctx->next_cpu is computed as 512 since
> >>> nr_cpu_id is 282, and hctx->next_cpu should have pointed to one of
> >>> possible CPU.
> >>>
> >>> Looks like it is a s390 specific issue, since I can setup one queue
> >>> which has same mapping with yours:
> >>>
> >>>   - nr_cpu_id is 282
> >>>   - CPU 0~15 is online
> >>>   - 64 queues null_blk
> >>>   - still run all hw queues in .complete handler
> >>>
> >>> But can't reproduce this issue at all.
> >>>
> >>> So please test the following patch, which may tell us why 
> >>> hctx->next_cpu
> >>> is computed wrong:
> >>
> >> I see things like
> >>
> >> [8.196907] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> >> [8.196910] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> >> [8.196912] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> >> [8.196913] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> >> [8.196914] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> >> [8.196915] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> >> [8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> >> [8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> >> [8.196917] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> >> [8.196918] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> >>
> >> which is exactly what happens if the find-and-AND operation fails
> >> (returns the size of the bitmap).
> >
> > Given both 'cpu_online_mask' and 'hctx->cpumask' are shown as correct
> > in your previous debug log, it means the following function returns
> > totally wrong result on S390.
> >
> > cpumask_first_and(hctx->cpumask, cpu_online_mask);
> >
> > The debugfs log shows that each hctx->cpumask includes one online
> > CPU(0~15).
> 
>  Really? The last log (with the latest patch applied) shows a lot of contexts
>  that do not have CPUs in 0-15:
> 
>  e.g. 
>  [4.049828] dump CPUs mapped to this hctx:
>  [4.049829] 18 
>  [4.049829] 82 
>  [4.049830] 146 
>  [4.049830] 210 
>  [4.049831] 274 
> >>>
> >>> That won't be an issue, since no IO can be submitted from these offline
> >>> CPUs, then these hctx shouldn't have been run at all.
> >>>
> >>> But hctx->next_cpu can be set as 512 for these inactive hctx in
> >>> blk_mq_map_swqueue(), then please test the attached patch, and if
> >>> hctx->next_cpu is still set as 512, something is still wrong.
> >>
> >>
> >> With this patch I no longer see the "run queue from wrong CPU x, hctx active"
> >> messages. Your debug code still triggers, though.
> >>
> >> wrong next_cpu 512, blk_mq_hctx_next_cpu, first_and
> >> wrong next_cpu 512, blk_mq_hctx_next_cpu, next_and
> >>
> >> If we removed the debug code, dmesg would be clean, it seems.
> > 
> > That is still a bit strange, since for any inactive hctx (without online
> > CPU mapped), blk_mq_run_hw_queue() will check blk_mq_hctx_has_pending()
> 
> I think for next_and it is reasonable to see this, as the next_and will return
> 512 after we have used the last one. In fact the code does call first_and in
> that case for a reason, no?

It is possible to see the 'first_and' dump when there aren't any online CPUs
mapped to this hctx.

But my point is that in this case there shouldn't be any IO queued for this
hctx, and blk_mq_hctx_has_pending() has been called to check exactly that, so
blk_mq_hctx_next_cpu() should only have been called when
blk_mq_hctx_has_pending() in blk_mq_run_hw_queue() is true.
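
For reference, here is a toy userspace model of that ordering. It is not the
kernel code (toy_hctx, toy_next_cpu and toy_run_hw_queue are made-up names);
it only illustrates why an hctx with no pending IO should never reach the CPU
selection:

#include <stdbool.h>
#include <stdio.h>

struct toy_hctx {
	bool has_pending;	/* models blk_mq_hctx_has_pending() */
	int next_cpu;		/* models hctx->next_cpu */
};

static int toy_next_cpu(struct toy_hctx *hctx)
{
	printf("picking a CPU for this hctx\n");
	return hctx->next_cpu;
}

static void toy_run_hw_queue(struct toy_hctx *hctx)
{
	if (!hctx->has_pending)	/* inactive hctx: we stop right here */
		return;
	printf("kick hctx work on CPU %d\n", toy_next_cpu(hctx));
}

int main(void)
{
	struct toy_hctx inactive = { .has_pending = false, .next_cpu = 512 };
	struct toy_hctx active = { .has_pending = true, .next_cpu = 5 };

	toy_run_hw_queue(&inactive);	/* prints nothing at all */
	toy_run_hw_queue(&active);	/* picks CPU 5 and kicks the work */
	return 0;
}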


Thanks,
Ming


Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()

2018-04-06 Thread Christian Borntraeger


On 04/06/2018 04:58 PM, Ming Lei wrote:
> On Fri, Apr 06, 2018 at 04:26:49PM +0200, Christian Borntraeger wrote:
>>
>>
>> On 04/06/2018 03:41 PM, Ming Lei wrote:
>>> On Fri, Apr 06, 2018 at 12:19:19PM +0200, Christian Borntraeger wrote:


 On 04/06/2018 11:23 AM, Ming Lei wrote:
> On Fri, Apr 06, 2018 at 10:51:28AM +0200, Christian Borntraeger wrote:
>>
>>
>> On 04/06/2018 10:41 AM, Ming Lei wrote:
>>> On Thu, Apr 05, 2018 at 07:39:56PM +0200, Christian Borntraeger wrote:


 On 04/05/2018 06:11 PM, Ming Lei wrote:
>>
>> Could you please apply the following patch and provide the dmesg 
>> boot log?
>
> And please post out the 'lscpu' log together from the test machine 
> too.

 attached.

 As I said before, this seems to go away with CONFIG_NR_CPUS=64 or smaller.
 We have 282 nr_cpu_ids here (max 141 CPUs on that z13 with SMT2) but only 8 Cores == 16 threads.
>>>
>>> OK, thanks!
>>>
>>> The most weird thing is that hctx->next_cpu is computed as 512 since
>>> nr_cpu_id is 282, and hctx->next_cpu should have pointed to one of
>>> possible CPU.
>>>
>>> Looks like it is a s390 specific issue, since I can setup one queue
>>> which has same mapping with yours:
>>>
>>> - nr_cpu_id is 282
>>> - CPU 0~15 is online
>>> - 64 queues null_blk
>>> - still run all hw queues in .complete handler
>>>
>>> But can't reproduce this issue at all.
>>>
>>> So please test the following patch, which may tell us why hctx->next_cpu
>>> is computed wrong:
>>
>> I see things like
>>
>> [8.196907] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [8.196910] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [8.196912] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [8.196913] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [8.196914] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [8.196915] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [8.196917] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [8.196918] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>>
>> which is exactly what happens if the find-and-AND operation fails
>> (returns the size of the bitmap).
>
> Given both 'cpu_online_mask' and 'hctx->cpumask' are shown as correct
> in your previous debug log, it means the following function returns
> totally wrong result on S390.
>
>   cpumask_first_and(hctx->cpumask, cpu_online_mask);
>
> The debugfs log shows that each hctx->cpumask includes one online
> CPU(0~15).

 Really? The last log (with the latest patch applied) shows a lot of contexts
 that do not have CPUs in 0-15:

 e.g. 
 [4.049828] dump CPUs mapped to this hctx:
 [4.049829] 18 
 [4.049829] 82 
 [4.049830] 146 
 [4.049830] 210 
 [4.049831] 274 
>>>
>>> That won't be an issue, since no IO can be submitted from these offline
>>> CPUs, then these hctx shouldn't have been run at all.
>>>
>>> But hctx->next_cpu can be set as 512 for these inactive hctx in
>>> blk_mq_map_swqueue(), then please test the attached patch, and if
>>> hctx->next_cpu is still set as 512, something is still wrong.
>>
>>
>> With this patch I no longer see the "run queue from wrong CPU x, hctx active"
>> messages. Your debug code still triggers, though.
>>
>> wrong next_cpu 512, blk_mq_hctx_next_cpu, first_and
>> wrong next_cpu 512, blk_mq_hctx_next_cpu, next_and
>>
>> If we removed the debug code, dmesg would be clean, it seems.
> 
> That is still a bit strange, since for any inactive hctx (without online
> CPU mapped), blk_mq_run_hw_queue() will check blk_mq_hctx_has_pending()

I think for next_and it is reasonable to see this, as the next_and will return
512 after we have used the last one. In fact the code does call first_and in
that case for a reason, no?


> first. And there shouldn't be any pending IO for all inactive hctx
> in your case, so looks blk_mq_hctx_next_cpu() shouldn't be called for
> inactive hctx.
> 
> I will prepare one patchset and post out soon, and hope all these issues
> can be covered.
> 
> Thanks,
> Ming
> 



Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()

2018-04-06 Thread Ming Lei
On Fri, Apr 06, 2018 at 04:26:49PM +0200, Christian Borntraeger wrote:
> 
> 
> On 04/06/2018 03:41 PM, Ming Lei wrote:
> > On Fri, Apr 06, 2018 at 12:19:19PM +0200, Christian Borntraeger wrote:
> >>
> >>
> >> On 04/06/2018 11:23 AM, Ming Lei wrote:
> >>> On Fri, Apr 06, 2018 at 10:51:28AM +0200, Christian Borntraeger wrote:
> 
> 
>  On 04/06/2018 10:41 AM, Ming Lei wrote:
> > On Thu, Apr 05, 2018 at 07:39:56PM +0200, Christian Borntraeger wrote:
> >>
> >>
> >> On 04/05/2018 06:11 PM, Ming Lei wrote:
> 
>  Could you please apply the following patch and provide the dmesg 
>  boot log?
> >>>
> >>> And please post out the 'lscpu' log together from the test machine 
> >>> too.
> >>
> >> attached.
> >>
> >> As I said before, this seems to go away with CONFIG_NR_CPUS=64 or smaller.
> >> We have 282 nr_cpu_ids here (max 141 CPUs on that z13 with SMT2) but only 8 Cores == 16 threads.
> >
> > OK, thanks!
> >
> > The most weird thing is that hctx->next_cpu is computed as 512 since
> > nr_cpu_id is 282, and hctx->next_cpu should have pointed to one of
> > possible CPU.
> >
> > Looks like it is a s390 specific issue, since I can setup one queue
> > which has same mapping with yours:
> >
> > - nr_cpu_id is 282
> > - CPU 0~15 is online
> > - 64 queues null_blk
> > - still run all hw queues in .complete handler
> >
> > But can't reproduce this issue at all.
> >
> > So please test the following patch, which may tell us why hctx->next_cpu
> > is computed wrong:
> 
>  I see things like
> 
>  [8.196907] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>  [8.196910] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>  [8.196912] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>  [8.196913] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>  [8.196914] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>  [8.196915] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>  [8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>  [8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>  [8.196917] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>  [8.196918] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> 
 which is exactly what happens if the find-and-AND operation fails
 (returns the size of the bitmap).
> >>>
> >>> Given both 'cpu_online_mask' and 'hctx->cpumask' are shown as correct
> >>> in your previous debug log, it means the following function returns
> >>> totally wrong result on S390.
> >>>
> >>>   cpumask_first_and(hctx->cpumask, cpu_online_mask);
> >>>
> >>> The debugfs log shows that each hctx->cpumask includes one online
> >>> CPU(0~15).
> >>
> >> Really? The last log (with the latest patch applied) shows a lot of contexts
> >> that do not have CPUs in 0-15:
> >>
> >> e.g. 
> >> [4.049828] dump CPUs mapped to this hctx:
> >> [4.049829] 18 
> >> [4.049829] 82 
> >> [4.049830] 146 
> >> [4.049830] 210 
> >> [4.049831] 274 
> > 
> > That won't be an issue, since no IO can be submitted from these offline
> > CPUs, then these hctx shouldn't have been run at all.
> > 
> > But hctx->next_cpu can be set as 512 for these inactive hctx in
> > blk_mq_map_swqueue(), then please test the attached patch, and if
> > hctx->next_cpu is still set as 512, something is still wrong.
> 
> 
> With this patch I no longer see the "run queue from wrong CPU x, hctx active"
> messages. Your debug code still triggers, though.
> 
> wrong next_cpu 512, blk_mq_hctx_next_cpu, first_and
> wrong next_cpu 512, blk_mq_hctx_next_cpu, next_and
> 
> If we removed the debug code, dmesg would be clean, it seems.

That is still a bit strange, since for any inactive hctx (without an online
CPU mapped), blk_mq_run_hw_queue() will check blk_mq_hctx_has_pending()
first. And there shouldn't be any pending IO for any inactive hctx
in your case, so it looks like blk_mq_hctx_next_cpu() shouldn't be called
for an inactive hctx at all.

I will prepare a patchset and post it out soon, and hope it covers all of
these issues.

Thanks,
Ming


Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()

2018-04-06 Thread Christian Borntraeger


On 04/06/2018 03:41 PM, Ming Lei wrote:
> On Fri, Apr 06, 2018 at 12:19:19PM +0200, Christian Borntraeger wrote:
>>
>>
>> On 04/06/2018 11:23 AM, Ming Lei wrote:
>>> On Fri, Apr 06, 2018 at 10:51:28AM +0200, Christian Borntraeger wrote:


 On 04/06/2018 10:41 AM, Ming Lei wrote:
> On Thu, Apr 05, 2018 at 07:39:56PM +0200, Christian Borntraeger wrote:
>>
>>
>> On 04/05/2018 06:11 PM, Ming Lei wrote:

 Could you please apply the following patch and provide the dmesg boot 
 log?
>>>
>>> And please post out the 'lscpu' log together from the test machine too.
>>
>> attached.
>>
>> As I said before, this seems to go away with CONFIG_NR_CPUS=64 or smaller.
>> We have 282 nr_cpu_ids here (max 141 CPUs on that z13 with SMT2) but only 8 Cores == 16 threads.
>
> OK, thanks!
>
> The most weird thing is that hctx->next_cpu is computed as 512 since
> nr_cpu_id is 282, and hctx->next_cpu should have pointed to one of
> possible CPU.
>
> Looks like it is a s390 specific issue, since I can setup one queue
> which has same mapping with yours:
>
>   - nr_cpu_id is 282
>   - CPU 0~15 is online
>   - 64 queues null_blk
>   - still run all hw queues in .complete handler
>
> But can't reproduce this issue at all.
>
> So please test the following patch, which may tell us why hctx->next_cpu
> is computed wrong:

 I see things like

 [8.196907] wrong next_cpu 512, blk_mq_map_swqueue, first_and
 [8.196910] wrong next_cpu 512, blk_mq_map_swqueue, first_and
 [8.196912] wrong next_cpu 512, blk_mq_map_swqueue, first_and
 [8.196913] wrong next_cpu 512, blk_mq_map_swqueue, first_and
 [8.196914] wrong next_cpu 512, blk_mq_map_swqueue, first_and
 [8.196915] wrong next_cpu 512, blk_mq_map_swqueue, first_and
 [8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and
 [8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and
 [8.196917] wrong next_cpu 512, blk_mq_map_swqueue, first_and
 [8.196918] wrong next_cpu 512, blk_mq_map_swqueue, first_and

 which is exactly what happens if the find-and-AND operation fails
 (returns the size of the bitmap).
>>>
>>> Given both 'cpu_online_mask' and 'hctx->cpumask' are shown as correct
>>> in your previous debug log, it means the following function returns
>>> totally wrong result on S390.
>>>
>>> cpumask_first_and(hctx->cpumask, cpu_online_mask);
>>>
>>> The debugfs log shows that each hctx->cpumask includes one online
>>> CPU(0~15).
>>
>> Really? The last log (with the latest patch applied) shows a lot of contexts
>> that do not have CPUs in 0-15:
>>
>> e.g. 
>> [4.049828] dump CPUs mapped to this hctx:
>> [4.049829] 18 
>> [4.049829] 82 
>> [4.049830] 146 
>> [4.049830] 210 
>> [4.049831] 274 
> 
> That won't be an issue, since no IO can be submitted from these offline
> CPUs, then these hctx shouldn't have been run at all.
> 
> But hctx->next_cpu can be set as 512 for these inactive hctx in
> blk_mq_map_swqueue(), then please test the attached patch, and if
> hctx->next_cpu is still set as 512, something is still wrong.


With this patch I no longer see the "run queue from wrong CPU x, hctx active"
messages. Your debug code still triggers, though.

wrong next_cpu 512, blk_mq_hctx_next_cpu, first_and
wrong next_cpu 512, blk_mq_hctx_next_cpu, next_and

If we removed the debug code, dmesg would be clean, it seems.


> ---
> 
> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
> index 9f8cffc8a701..638ab5c11b3c 100644
> --- a/block/blk-mq-cpumap.c
> +++ b/block/blk-mq-cpumap.c
> @@ -14,13 +14,12 @@
>  #include "blk.h"
>  #include "blk-mq.h"
> 
> +/*
> + * Given there isn't CPU hotplug handler in blk-mq, map all CPUs to
> + * queues even it isn't present yet.
> + */
>  static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
>  {
> - /*
> -  * Non present CPU will be mapped to queue index 0.
> -  */
> - if (!cpu_present(cpu))
> - return 0;
>   return cpu % nr_queues;
>  }
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 90838e998f66..1a834d96a718 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1343,6 +1343,13 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx 
> *hctx)
>   hctx_unlock(hctx, srcu_idx);
>  }
> 
> +static void check_next_cpu(int next_cpu, const char *str1, const char *str2)
> +{
> + if (next_cpu > nr_cpu_ids)
> + printk_ratelimited("wrong next_cpu %d, %s, %s\n",
> + next_cpu, str1, str2);
> +}
> +
>  /*
>   * It'd be great if the workqueue API had a way to pass
>   * in a mask and had some smarts for more clever placement.
> @@ -1352,26 +1359,29 @@ static void __blk_mq_run_hw_queue(struct 
> blk_mq_hw_ctx *hctx)
>  static int blk_mq_hctx_next_cpu(struct 

Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()

2018-04-06 Thread Ming Lei
On Fri, Apr 06, 2018 at 12:19:19PM +0200, Christian Borntraeger wrote:
> 
> 
> On 04/06/2018 11:23 AM, Ming Lei wrote:
> > On Fri, Apr 06, 2018 at 10:51:28AM +0200, Christian Borntraeger wrote:
> >>
> >>
> >> On 04/06/2018 10:41 AM, Ming Lei wrote:
> >>> On Thu, Apr 05, 2018 at 07:39:56PM +0200, Christian Borntraeger wrote:
> 
> 
>  On 04/05/2018 06:11 PM, Ming Lei wrote:
> >>
> >> Could you please apply the following patch and provide the dmesg boot 
> >> log?
> >
> > And please post out the 'lscpu' log together from the test machine too.
> 
>  attached.
> 
>  As I said before, this seems to go away with CONFIG_NR_CPUS=64 or smaller.
>  We have 282 nr_cpu_ids here (max 141 CPUs on that z13 with SMT2) but only 8 Cores == 16 threads.
> >>>
> >>> OK, thanks!
> >>>
> >>> The most weird thing is that hctx->next_cpu is computed as 512 since
> >>> nr_cpu_id is 282, and hctx->next_cpu should have pointed to one of
> >>> possible CPU.
> >>>
> >>> Looks like it is a s390 specific issue, since I can setup one queue
> >>> which has same mapping with yours:
> >>>
> >>>   - nr_cpu_id is 282
> >>>   - CPU 0~15 is online
> >>>   - 64 queues null_blk
> >>>   - still run all hw queues in .complete handler
> >>>
> >>> But can't reproduce this issue at all.
> >>>
> >>> So please test the following patch, which may tell us why hctx->next_cpu
> >>> is computed wrong:
> >>
> >> I see things like
> >>
> >> [8.196907] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> >> [8.196910] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> >> [8.196912] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> >> [8.196913] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> >> [8.196914] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> >> [8.196915] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> >> [8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> >> [8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> >> [8.196917] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> >> [8.196918] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> >>
> >> which is exactly what happens if the find-and-AND operation fails
> >> (returns the size of the bitmap).
> > 
> > Given both 'cpu_online_mask' and 'hctx->cpumask' are shown as correct
> > in your previous debug log, it means the following function returns
> > totally wrong result on S390.
> > 
> > cpumask_first_and(hctx->cpumask, cpu_online_mask);
> > 
> > The debugfs log shows that each hctx->cpumask includes one online
> > CPU(0~15).
> 
> Really? The last log (with the latest patch applied) shows a lot of contexts
> that do not have CPUs in 0-15:
> 
> e.g. 
> [4.049828] dump CPUs mapped to this hctx:
> [4.049829] 18 
> [4.049829] 82 
> [4.049830] 146 
> [4.049830] 210 
> [4.049831] 274 

That won't be an issue: since no IO can be submitted from these offline
CPUs, these hctx shouldn't have been run at all.

But hctx->next_cpu can still be set to 512 for these inactive hctx in
blk_mq_map_swqueue(), so please test the attached patch; if
hctx->next_cpu is still set to 512, something is still wrong.

---

diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 9f8cffc8a701..638ab5c11b3c 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -14,13 +14,12 @@
 #include "blk.h"
 #include "blk-mq.h"
 
+/*
+ * Given there isn't CPU hotplug handler in blk-mq, map all CPUs to
+ * queues even it isn't present yet.
+ */
 static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
 {
-   /*
-* Non present CPU will be mapped to queue index 0.
-*/
-   if (!cpu_present(cpu))
-   return 0;
return cpu % nr_queues;
 }
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 90838e998f66..1a834d96a718 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1343,6 +1343,13 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
hctx_unlock(hctx, srcu_idx);
 }
 
+static void check_next_cpu(int next_cpu, const char *str1, const char *str2)
+{
+   if (next_cpu > nr_cpu_ids)
+   printk_ratelimited("wrong next_cpu %d, %s, %s\n",
+   next_cpu, str1, str2);
+}
+
 /*
  * It'd be great if the workqueue API had a way to pass
  * in a mask and had some smarts for more clever placement.
@@ -1352,26 +1359,29 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
 {
bool tried = false;
+   int next_cpu = hctx->next_cpu;
 
if (hctx->queue->nr_hw_queues == 1)
return WORK_CPU_UNBOUND;
 
if (--hctx->next_cpu_batch <= 0) {
-   int next_cpu;
 select_cpu:
-   next_cpu = cpumask_next_and(hctx->next_cpu, hctx->cpumask,
+   next_cpu = cpumask_next_and(next_cpu, hctx->cpumask,

Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()

2018-04-06 Thread Christian Borntraeger


On 04/06/2018 11:23 AM, Ming Lei wrote:
> On Fri, Apr 06, 2018 at 10:51:28AM +0200, Christian Borntraeger wrote:
>>
>>
>> On 04/06/2018 10:41 AM, Ming Lei wrote:
>>> On Thu, Apr 05, 2018 at 07:39:56PM +0200, Christian Borntraeger wrote:


 On 04/05/2018 06:11 PM, Ming Lei wrote:
>>
>> Could you please apply the following patch and provide the dmesg boot 
>> log?
>
> And please post out the 'lscpu' log together from the test machine too.

 attached.

 As I said before, this seems to go away with CONFIG_NR_CPUS=64 or smaller.
 We have 282 nr_cpu_ids here (max 141 CPUs on that z13 with SMT2) but only 8 Cores == 16 threads.
>>>
>>> OK, thanks!
>>>
>>> The most weird thing is that hctx->next_cpu is computed as 512 since
>>> nr_cpu_id is 282, and hctx->next_cpu should have pointed to one of
>>> possible CPU.
>>>
>>> Looks like it is a s390 specific issue, since I can setup one queue
>>> which has same mapping with yours:
>>>
>>> - nr_cpu_id is 282
>>> - CPU 0~15 is online
>>> - 64 queues null_blk
>>> - still run all hw queues in .complete handler
>>>
>>> But can't reproduce this issue at all.
>>>
>>> So please test the following patch, which may tell us why hctx->next_cpu
>>> is computed wrong:
>>
>> I see things like
>>
>> [8.196907] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [8.196910] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [8.196912] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [8.196913] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [8.196914] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [8.196915] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [8.196917] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [8.196918] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>>
>> which is exactly what happens if the find-and-AND operation fails
>> (returns the size of the bitmap).
> 
> Given both 'cpu_online_mask' and 'hctx->cpumask' are shown as correct
> in your previous debug log, it means the following function returns
> totally wrong result on S390.
> 
>   cpumask_first_and(hctx->cpumask, cpu_online_mask);
> 
> The debugfs log shows that each hctx->cpumask includes one online
> CPU(0~15).
> 
> So looks it isn't one issue in block MQ core.

So I checked further and printed the masks. I think I can ignore the next_and
cases: it is totally valid to get 512 here (as we might start with an offset
that is already the last CPU, and then we need to wrap with first_and).
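
A minimal userspace sketch of that wrap-around, using plain 64-bit masks
instead of the kernel cpumask API (next_and/first_and below are simplified
stand-ins, and NR_BITS stands in for the scanned bitmap size):

#include <stdint.h>
#include <stdio.h>

#define NR_BITS 64

/* next set bit in (a & b) strictly after 'start', or NR_BITS if none */
static unsigned int next_and(uint64_t a, uint64_t b, unsigned int start)
{
	for (unsigned int i = start + 1; i < NR_BITS; i++)
		if ((a & b) & (1ULL << i))
			return i;
	return NR_BITS;
}

/* first set bit in (a & b), or NR_BITS if the AND is empty */
static unsigned int first_and(uint64_t a, uint64_t b)
{
	for (unsigned int i = 0; i < NR_BITS; i++)
		if ((a & b) & (1ULL << i))
			return i;
	return NR_BITS;
}

int main(void)
{
	uint64_t hctx_mask = 1ULL << 15;	/* hctx mapped to CPU 15 only */
	uint64_t online = 0xffffULL;		/* CPUs 0-15 online */

	/* already sitting on the last matching CPU: "next" runs off the end */
	printf("next_and from 15 -> %u\n", next_and(hctx_mask, online, 15));
	/* the wrap via "first" is what recovers a valid CPU again */
	printf("first_and        -> %u\n", first_and(hctx_mask, online));
	return 0;
}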

So the first_and and the first cases are really the interesting ones. And I
think the code is perfectly right: there is no bit set after the AND for these
cases:


[3.220021] wrong next_cpu 512, blk_mq_map_swqueue, first_and
[3.220023] 1: 0001 0001 0001 
0001 0001   

[3.220025] 2:    
    

[3.220027] wrong next_cpu 512, blk_mq_map_swqueue, first_and
[3.220028] 1: 0002 0002 0002 
0002 0002   

[3.220030] wrong next_cpu 512, blk_mq_map_swqueue, first_and
[3.220032] 1: 0001 0001 0001 
0001 0001   

[3.220033] 2:    
    

[3.220035] wrong next_cpu 512, blk_mq_map_swqueue, first_and
[3.220036] 2:    
    

[3.220037] wrong next_cpu 512, blk_mq_map_swqueue, first_and
[3.220039] 1: 0004 0004 0004 
0004 0004   

[3.220040] 2:    
    

[3.220042] wrong next_cpu 512, blk_mq_map_swqueue, first_and
[3.220062] 1: 0002 0002 0002 
0002 0002   

[3.220063] 2:    
    

[3.220064] wrong next_cpu 512, blk_mq_map_swqueue, first_and
[3.220066] 1: 0008 0008 0008 
0008 0008  

Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()

2018-04-06 Thread Christian Borntraeger


On 04/06/2018 11:23 AM, Ming Lei wrote:
> On Fri, Apr 06, 2018 at 10:51:28AM +0200, Christian Borntraeger wrote:
>>
>>
>> On 04/06/2018 10:41 AM, Ming Lei wrote:
>>> On Thu, Apr 05, 2018 at 07:39:56PM +0200, Christian Borntraeger wrote:


 On 04/05/2018 06:11 PM, Ming Lei wrote:
>>
>> Could you please apply the following patch and provide the dmesg boot 
>> log?
>
> And please post out the 'lscpu' log together from the test machine too.

 attached.

 As I said before, this seems to go away with CONFIG_NR_CPUS=64 or smaller.
 We have 282 nr_cpu_ids here (max 141 CPUs on that z13 with SMT2) but only 8 Cores == 16 threads.
>>>
>>> OK, thanks!
>>>
>>> The most weird thing is that hctx->next_cpu is computed as 512 since
>>> nr_cpu_id is 282, and hctx->next_cpu should have pointed to one of
>>> possible CPU.
>>>
>>> Looks like it is a s390 specific issue, since I can setup one queue
>>> which has same mapping with yours:
>>>
>>> - nr_cpu_id is 282
>>> - CPU 0~15 is online
>>> - 64 queues null_blk
>>> - still run all hw queues in .complete handler
>>>
>>> But can't reproduce this issue at all.
>>>
>>> So please test the following patch, which may tell us why hctx->next_cpu
>>> is computed wrong:
>>
>> I see things like
>>
>> [8.196907] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [8.196910] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [8.196912] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [8.196913] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [8.196914] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [8.196915] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [8.196917] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [8.196918] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>>
>> which is exactly what happens if the find-and-AND operation fails
>> (returns the size of the bitmap).
> 
> Given both 'cpu_online_mask' and 'hctx->cpumask' are shown as correct
> in your previous debug log, it means the following function returns
> totally wrong result on S390.
> 
>   cpumask_first_and(hctx->cpumask, cpu_online_mask);
> 
> The debugfs log shows that each hctx->cpumask includes one online
> CPU(0~15).

Really? The last log (with the latest patch applied) shows a lot of contexts
that do not have CPUs in 0-15:

e.g. 
[4.049828] dump CPUs mapped to this hctx:
[4.049829] 18 
[4.049829] 82 
[4.049830] 146 
[4.049830] 210 
[4.049831] 274 

> 
> So looks it isn't one issue in block MQ core.
> 
> Thanks,
> Ming
> 



Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()

2018-04-06 Thread Ming Lei
On Fri, Apr 06, 2018 at 10:51:28AM +0200, Christian Borntraeger wrote:
> 
> 
> On 04/06/2018 10:41 AM, Ming Lei wrote:
> > On Thu, Apr 05, 2018 at 07:39:56PM +0200, Christian Borntraeger wrote:
> >>
> >>
> >> On 04/05/2018 06:11 PM, Ming Lei wrote:
> 
>  Could you please apply the following patch and provide the dmesg boot 
>  log?
> >>>
> >>> And please post out the 'lscpu' log together from the test machine too.
> >>
> >> attached.
> >>
> >> As I said before, this seems to go away with CONFIG_NR_CPUS=64 or smaller.
> >> We have 282 nr_cpu_ids here (max 141 CPUs on that z13 with SMT2) but only 8 Cores == 16 threads.
> > 
> > OK, thanks!
> > 
> > The most weird thing is that hctx->next_cpu is computed as 512 since
> > nr_cpu_id is 282, and hctx->next_cpu should have pointed to one of
> > possible CPU.
> > 
> > Looks like it is a s390 specific issue, since I can setup one queue
> > which has same mapping with yours:
> > 
> > - nr_cpu_id is 282
> > - CPU 0~15 is online
> > - 64 queues null_blk
> > - still run all hw queues in .complete handler
> > 
> > But can't reproduce this issue at all.
> > 
> > So please test the following patch, which may tell us why hctx->next_cpu
> > is computed wrong:
> 
> I see things like
> 
> [8.196907] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> [8.196910] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> [8.196912] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> [8.196913] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> [8.196914] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> [8.196915] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> [8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> [8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> [8.196917] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> [8.196918] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> 
> which is exactly what happens if the find-and-AND operation fails
> (returns the size of the bitmap).

Given that both 'cpu_online_mask' and 'hctx->cpumask' are shown as correct
in your previous debug log, it means the following function returns a
totally wrong result on S390:

	cpumask_first_and(hctx->cpumask, cpu_online_mask);

The debugfs log shows that each hctx->cpumask includes one online
CPU (0~15).

So it looks like it isn't an issue in the blk-mq core.

Thanks,
Ming


Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()

2018-04-06 Thread Christian Borntraeger


On 04/06/2018 10:51 AM, Christian Borntraeger wrote:
> 
> 
> On 04/06/2018 10:41 AM, Ming Lei wrote:
>> On Thu, Apr 05, 2018 at 07:39:56PM +0200, Christian Borntraeger wrote:
>>>
>>>
>>> On 04/05/2018 06:11 PM, Ming Lei wrote:
>
> Could you please apply the following patch and provide the dmesg boot log?

 And please post out the 'lscpu' log together from the test machine too.
>>>
>>> attached.
>>>
>>> As I said before, this seems to go away with CONFIG_NR_CPUS=64 or smaller.
>>> We have 282 nr_cpu_ids here (max 141 CPUs on that z13 with SMT2) but only 8 Cores == 16 threads.
>>
>> OK, thanks!
>>
>> The most weird thing is that hctx->next_cpu is computed as 512 since
>> nr_cpu_id is 282, and hctx->next_cpu should have pointed to one of
>> possible CPU.
>>
>> Looks like it is a s390 specific issue, since I can setup one queue
>> which has same mapping with yours:
>>
>>  - nr_cpu_id is 282
>>  - CPU 0~15 is online
>>  - 64 queues null_blk
>>  - still run all hw queues in .complete handler
>>
>> But can't reproduce this issue at all.
>>
>> So please test the following patch, which may tell us why hctx->next_cpu
>> is computed wrong:
> 
> I see things like
> 
> [8.196907] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> [8.196910] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> [8.196912] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> [8.196913] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> [8.196914] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> [8.196915] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> [8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> [8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> [8.196917] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> [8.196918] wrong next_cpu 512, blk_mq_map_swqueue, first_and

There are more

# dmesg  | grep "wrong next"  | cut -d "]" -f 2- | uniq -c
 10  wrong next_cpu 512, blk_mq_map_swqueue, first_and
 72  wrong next_cpu 512, blk_mq_hctx_next_cpu, next_and
  1  wrong next_cpu 512, blk_mq_hctx_next_cpu, first_and
  1  wrong next_cpu 512, blk_mq_hctx_next_cpu, next_and
  1  wrong next_cpu 512, blk_mq_hctx_next_cpu, first_and
  1  wrong next_cpu 512, blk_mq_hctx_next_cpu, next_and
  1  wrong next_cpu 512, blk_mq_hctx_next_cpu, first_and
  1  wrong next_cpu 512, blk_mq_hctx_next_cpu, next_and
  1  wrong next_cpu 512, blk_mq_hctx_next_cpu, first_and
  1  wrong next_cpu 512, blk_mq_hctx_next_cpu, next_and
  1  wrong next_cpu 512, blk_mq_hctx_next_cpu, first_and
  1  wrong next_cpu 512, blk_mq_hctx_next_cpu, next_and
  1  wrong next_cpu 512, blk_mq_hctx_next_cpu, first_and
  7  wrong next_cpu 512, blk_mq_hctx_next_cpu, next_and
  1  wrong next_cpu 512, blk_mq_hctx_next_cpu, first_and
  1  wrong next_cpu 512, blk_mq_hctx_next_cpu, next_and
  1  wrong next_cpu 512, blk_mq_hctx_next_cpu, first_and
 10  wrong next_cpu 512, blk_mq_hctx_next_cpu, next_and


> 
> which is exactly what happens if the find-and-AND operation fails
> (returns the size of the bitmap).
> 
> FWIW, I added a dump stack for the case when we run unbound before I tested 
> your patch:
> 
> Apr 06 10:47:41 s38lp39 kernel: CPU: 15 PID: 86 Comm: ksoftirqd/15 Not 
> tainted 4.16.0-07249-g864f9fc031e4-dirty #2
> Apr 06 10:47:41 s38lp39 kernel: Hardware name: IBM 2964 NC9 704 (LPAR)
> Apr 06 10:47:41 s38lp39 kernel: Call Trace:
> Apr 06 10:47:41 s38lp39 kernel: ([<00113946>] show_stack+0x56/0x80)
> Apr 06 10:47:41 s38lp39 kernel:  [<009d8132>] dump_stack+0x82/0xb0 
> Apr 06 10:47:41 s38lp39 kernel:  [<006a05de>] 
> blk_mq_hctx_next_cpu+0x12e/0x138 
> Apr 06 10:47:41 s38lp39 kernel:  [<006a084c>] 
> __blk_mq_delay_run_hw_queue+0x94/0xd8 
> Apr 06 10:47:41 s38lp39 kernel:  [<006a097a>] 
> blk_mq_run_hw_queue+0x82/0x180 
> Apr 06 10:47:41 s38lp39 kernel:  [<006a0ae0>] 
> blk_mq_run_hw_queues+0x68/0x88 
> Apr 06 10:47:41 s38lp39 kernel:  [<0069fc4e>] 
> __blk_mq_complete_request+0x11e/0x1d8 
> Apr 06 10:47:41 s38lp39 kernel:  [<0069fd94>] 
> blk_mq_complete_request+0x8c/0xc8 
> Apr 06 10:47:41 s38lp39 kernel:  [<00824c50>] 
> dasd_block_tasklet+0x158/0x490 
> Apr 06 10:47:41 s38lp39 kernel:  [<0014a952>] 
> tasklet_action_common.isra.5+0x7a/0x100 
> Apr 06 10:47:41 s38lp39 kernel:  [<009f8248>] __do_softirq+0x98/0x368 
> Apr 06 10:47:41 s38lp39 kernel:  [<0014a322>] run_ksoftirqd+0x4a/0x68 
> Apr 06 10:47:41 s38lp39 kernel:  [<0016dc20>] 
> smpboot_thread_fn+0x108/0x1b0 
> Apr 06 10:47:41 s38lp39 kernel:  [<00168e70>] kthread+0x148/0x160 
> Apr 06 10:47:41 s38lp39 kernel:  [<009f727a>] 
> kernel_thread_starter+0x6/0xc 
> Apr 06 10:47:41 s38lp39 kernel:  [<009f7274>] 
> kernel_thread_starter+0x0/0xc
> 



Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()

2018-04-06 Thread Christian Borntraeger


On 04/06/2018 10:41 AM, Ming Lei wrote:
> On Thu, Apr 05, 2018 at 07:39:56PM +0200, Christian Borntraeger wrote:
>>
>>
>> On 04/05/2018 06:11 PM, Ming Lei wrote:

 Could you please apply the following patch and provide the dmesg boot log?
>>>
>>> And please post out the 'lscpu' log together from the test machine too.
>>
>> attached.
>>
>> As I said before, this seems to go away with CONFIG_NR_CPUS=64 or smaller.
>> We have 282 nr_cpu_ids here (max 141 CPUs on that z13 with SMT2) but only 8 Cores == 16 threads.
> 
> OK, thanks!
> 
> The most weird thing is that hctx->next_cpu is computed as 512 since
> nr_cpu_id is 282, and hctx->next_cpu should have pointed to one of
> possible CPU.
> 
> Looks like it is a s390 specific issue, since I can setup one queue
> which has same mapping with yours:
> 
>   - nr_cpu_id is 282
>   - CPU 0~15 is online
>   - 64 queues null_blk
>   - still run all hw queues in .complete handler
> 
> But can't reproduce this issue at all.
> 
> So please test the following patch, which may tell us why hctx->next_cpu
> is computed wrong:

I see things like

[8.196907] wrong next_cpu 512, blk_mq_map_swqueue, first_and
[8.196910] wrong next_cpu 512, blk_mq_map_swqueue, first_and
[8.196912] wrong next_cpu 512, blk_mq_map_swqueue, first_and
[8.196913] wrong next_cpu 512, blk_mq_map_swqueue, first_and
[8.196914] wrong next_cpu 512, blk_mq_map_swqueue, first_and
[8.196915] wrong next_cpu 512, blk_mq_map_swqueue, first_and
[8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and
[8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and
[8.196917] wrong next_cpu 512, blk_mq_map_swqueue, first_and
[8.196918] wrong next_cpu 512, blk_mq_map_swqueue, first_and

which is exactly what happens if the find-and-AND operation fails
(returns the size of the bitmap).
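
A self-contained sketch of that failure mode. One assumption here that the
thread itself does not spell out: with CONFIG_NR_CPUS=512 and
CONFIG_CPUMASK_OFFSTACK disabled, the cpumask helpers scan NR_CPUS bits, so
an empty AND yields 512 rather than nr_cpu_ids (282):

#include <stdio.h>

#define NR_CPUS 512			/* assumed bitmap size being scanned */
static const unsigned int nr_cpu_ids = 282;

/* first set bit common to both masks, or NR_CPUS if the AND is empty */
static unsigned int first_and(const unsigned char *a, const unsigned char *b)
{
	for (unsigned int i = 0; i < NR_CPUS; i++)
		if ((a[i / 8] & b[i / 8]) & (1u << (i % 8)))
			return i;
	return NR_CPUS;
}

int main(void)
{
	unsigned char hctx_mask[NR_CPUS / 8] = { 0 };
	unsigned char online[NR_CPUS / 8] = { 0 };
	const unsigned int mapped[] = { 18, 82, 146, 210, 274 };

	/* hctx mapped only to possible-but-offline CPUs, as in the hctx dump
	 * reported earlier in the thread */
	for (unsigned int i = 0; i < 5; i++)
		hctx_mask[mapped[i] / 8] |= 1u << (mapped[i] % 8);

	/* only CPUs 0-15 are online */
	for (unsigned int cpu = 0; cpu < 16; cpu++)
		online[cpu / 8] |= 1u << (cpu % 8);

	/* prints: next_cpu = 512 (nr_cpu_ids = 282) */
	printf("next_cpu = %u (nr_cpu_ids = %u)\n",
	       first_and(hctx_mask, online), nr_cpu_ids);
	return 0;
}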

FWIW, I added a dump stack for the case when we run unbound before I tested 
your patch:

Apr 06 10:47:41 s38lp39 kernel: CPU: 15 PID: 86 Comm: ksoftirqd/15 Not tainted 
4.16.0-07249-g864f9fc031e4-dirty #2
Apr 06 10:47:41 s38lp39 kernel: Hardware name: IBM 2964 NC9 704 (LPAR)
Apr 06 10:47:41 s38lp39 kernel: Call Trace:
Apr 06 10:47:41 s38lp39 kernel: ([<00113946>] show_stack+0x56/0x80)
Apr 06 10:47:41 s38lp39 kernel:  [<009d8132>] dump_stack+0x82/0xb0 
Apr 06 10:47:41 s38lp39 kernel:  [<006a05de>] 
blk_mq_hctx_next_cpu+0x12e/0x138 
Apr 06 10:47:41 s38lp39 kernel:  [<006a084c>] 
__blk_mq_delay_run_hw_queue+0x94/0xd8 
Apr 06 10:47:41 s38lp39 kernel:  [<006a097a>] 
blk_mq_run_hw_queue+0x82/0x180 
Apr 06 10:47:41 s38lp39 kernel:  [<006a0ae0>] 
blk_mq_run_hw_queues+0x68/0x88 
Apr 06 10:47:41 s38lp39 kernel:  [<0069fc4e>] 
__blk_mq_complete_request+0x11e/0x1d8 
Apr 06 10:47:41 s38lp39 kernel:  [<0069fd94>] 
blk_mq_complete_request+0x8c/0xc8 
Apr 06 10:47:41 s38lp39 kernel:  [<00824c50>] 
dasd_block_tasklet+0x158/0x490 
Apr 06 10:47:41 s38lp39 kernel:  [<0014a952>] 
tasklet_action_common.isra.5+0x7a/0x100 
Apr 06 10:47:41 s38lp39 kernel:  [<009f8248>] __do_softirq+0x98/0x368 
Apr 06 10:47:41 s38lp39 kernel:  [<0014a322>] run_ksoftirqd+0x4a/0x68 
Apr 06 10:47:41 s38lp39 kernel:  [<0016dc20>] 
smpboot_thread_fn+0x108/0x1b0 
Apr 06 10:47:41 s38lp39 kernel:  [<00168e70>] kthread+0x148/0x160 
Apr 06 10:47:41 s38lp39 kernel:  [<009f727a>] 
kernel_thread_starter+0x6/0xc 
Apr 06 10:47:41 s38lp39 kernel:  [<009f7274>] 
kernel_thread_starter+0x0/0xc



Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()

2018-04-06 Thread Ming Lei
On Thu, Apr 05, 2018 at 07:39:56PM +0200, Christian Borntraeger wrote:
> 
> 
> On 04/05/2018 06:11 PM, Ming Lei wrote:
> >>
> >> Could you please apply the following patch and provide the dmesg boot log?
> > 
> > And please post out the 'lscpu' log together from the test machine too.
> 
> attached.
> 
> As I said before, this seems to go away with CONFIG_NR_CPUS=64 or smaller.
> We have 282 nr_cpu_ids here (max 141 CPUs on that z13 with SMT2) but only 8 Cores == 16 threads.

OK, thanks!

The weirdest thing is that hctx->next_cpu is computed as 512 even though
nr_cpu_ids is 282, and hctx->next_cpu should have pointed to one of the
possible CPUs.

Looks like it is an s390-specific issue, since I can set up one queue
which has the same mapping as yours:

- nr_cpu_ids is 282
- CPUs 0~15 are online
- 64 queues null_blk
- still run all hw queues in the .complete handler

But I can't reproduce this issue at all.

So please test the following patch, which may tell us why hctx->next_cpu
is computed wrongly:

---
diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 9f8cffc8a701..638ab5c11b3c 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -14,13 +14,12 @@
 #include "blk.h"
 #include "blk-mq.h"
 
+/*
+ * Given there isn't CPU hotplug handler in blk-mq, map all CPUs to
+ * queues even it isn't present yet.
+ */
 static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
 {
-   /*
-* Non present CPU will be mapped to queue index 0.
-*/
-   if (!cpu_present(cpu))
-   return 0;
return cpu % nr_queues;
 }
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 90838e998f66..9b130e4b87df 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1343,6 +1343,13 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
hctx_unlock(hctx, srcu_idx);
 }
 
+static void check_next_cpu(int next_cpu, const char *str1, const char *str2)
+{
+   if (next_cpu > nr_cpu_ids)
+   printk_ratelimited("wrong next_cpu %d, %s, %s\n",
+   next_cpu, str1, str2);
+}
+
 /*
  * It'd be great if the workqueue API had a way to pass
  * in a mask and had some smarts for more clever placement.
@@ -1352,26 +1359,29 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
 {
bool tried = false;
+   int next_cpu = hctx->next_cpu;
 
if (hctx->queue->nr_hw_queues == 1)
return WORK_CPU_UNBOUND;
 
if (--hctx->next_cpu_batch <= 0) {
-   int next_cpu;
 select_cpu:
-   next_cpu = cpumask_next_and(hctx->next_cpu, hctx->cpumask,
+   next_cpu = cpumask_next_and(next_cpu, hctx->cpumask,
cpu_online_mask);
-   if (next_cpu >= nr_cpu_ids)
+   check_next_cpu(next_cpu, __func__, "next_and");
+   if (next_cpu >= nr_cpu_ids) {
			next_cpu = cpumask_first_and(hctx->cpumask,cpu_online_mask);
+   check_next_cpu(next_cpu, __func__, "first_and");
+   }
 
/*
 * No online CPU is found, so have to make sure hctx->next_cpu
 * is set correctly for not breaking workqueue.
 */
-   if (next_cpu >= nr_cpu_ids)
-   hctx->next_cpu = cpumask_first(hctx->cpumask);
-   else
-   hctx->next_cpu = next_cpu;
+   if (next_cpu >= nr_cpu_ids) {
+   next_cpu = cpumask_first(hctx->cpumask);
+   check_next_cpu(next_cpu, __func__, "first");
+   }
hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;
}
 
@@ -1379,7 +1389,7 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
 * Do unbound schedule if we can't find a online CPU for this hctx,
 * and it should only happen in the path of handling CPU DEAD.
 */
-   if (!cpu_online(hctx->next_cpu)) {
+   if (!cpu_online(next_cpu)) {
if (!tried) {
tried = true;
goto select_cpu;
@@ -1392,7 +1402,9 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
hctx->next_cpu_batch = 1;
return WORK_CPU_UNBOUND;
}
-   return hctx->next_cpu;
+
+   hctx->next_cpu = next_cpu;
+   return next_cpu;
 }
 
 static void __blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async,
@@ -2408,6 +2420,8 @@ static void blk_mq_map_swqueue(struct request_queue *q)
	mutex_unlock(&q->sysfs_lock);
 
queue_for_each_hw_ctx(q, hctx, i) {
+   int next_cpu;
+
/*
 * If no software queues are mapped to this hardware queue,
 * disable it and free the request entries.
@@ -2437,8 +2451,10 @@ static void blk_mq_map_swqueue(struct request_queue 

Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()

2018-04-06 Thread Christian Borntraeger


On 04/05/2018 06:05 PM, Ming Lei wrote:
[...]
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 90838e998f66..996f8a963026 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1324,9 +1324,18 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx 
> *hctx)
>*/
>   if (!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
>   cpu_online(hctx->next_cpu)) {
> - printk(KERN_WARNING "run queue from wrong CPU %d, hctx %s\n",
> - raw_smp_processor_id(),
> + int cpu;
> + printk(KERN_WARNING "run queue from wrong CPU %d/%d, hctx-%d 
> %s\n",
> + raw_smp_processor_id(), hctx->next_cpu,
> + hctx->queue_num,
>   cpumask_empty(hctx->cpumask) ? "inactive": "active");
> + printk("dump CPUs mapped to this hctx:\n");
> + for_each_cpu(cpu, hctx->cpumask)
> + printk("%d ", cpu);
> + printk("\n");
> + printk("nr_cpu_ids is %d, and dump online cpus:\n", nr_cpu_ids);
> + for_each_cpu(cpu, cpu_online_mask)
> + printk("%d ", cpu);
>   dump_stack();
>   }
> 

FWIW, with things like

[4.049828] dump CPUs mapped to this hctx:
[4.049829] 18 
[4.049829] 82 
[4.049830] 146 
[4.049830] 210 
[4.049831] 274 

[4.049832] nr_cpu_ids is 282, and dump online cpus:
[4.049833] 0 
[4.049833] 1 
[4.049834] 2 
[4.049834] 3 
[4.049835] 4 
[4.049835] 5 
[4.049836] 6 
[4.049836] 7 
[4.049837] 8 
[4.049837] 9 
[4.049838] 10 
[4.049839] 11 
[4.049839] 12 
[4.049840] 13 
[4.049840] 14 
[4.049841] 15 

So the hctx has only "possible CPUs", but all are offline.

Doesn't that always make this run unbound? See blk_mq_hctx_next_cpu below.

/*
 * It'd be great if the workqueue API had a way to pass
 * in a mask and had some smarts for more clever placement.
 * For now we just round-robin here, switching for every
 * BLK_MQ_CPU_WORK_BATCH queued items.
 */
static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
{
bool tried = false;

if (hctx->queue->nr_hw_queues == 1)
return WORK_CPU_UNBOUND;

if (--hctx->next_cpu_batch <= 0) {
int next_cpu;
select_cpu: 
next_cpu = cpumask_next_and(hctx->next_cpu, hctx->cpumask,
cpu_online_mask);
if (next_cpu >= nr_cpu_ids)
			next_cpu = cpumask_first_and(hctx->cpumask,cpu_online_mask);

/*
 * No online CPU is found, so have to make sure hctx->next_cpu
 * is set correctly for not breaking workqueue.
 */
if (next_cpu >= nr_cpu_ids)
hctx->next_cpu = cpumask_first(hctx->cpumask);
else
hctx->next_cpu = next_cpu;
hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;
}

/*
 * Do unbound schedule if we can't find a online CPU for this hctx,
 * and it should only happen in the path of handling CPU DEAD.
 */
if (!cpu_online(hctx->next_cpu)) {
if (!tried) {
tried = true;
goto select_cpu;
}

/*
 * Make sure to re-select CPU next time once after CPUs
 * in hctx->cpumask become online again.
 */
hctx->next_cpu_batch = 1;
return WORK_CPU_UNBOUND;
}
return hctx->next_cpu;
}
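
A condensed userspace model of the selection logic quoted above, not the
kernel function itself (MODEL_UNBOUND and lookup_online_cpu are made-up
stand-ins), suggesting the answer is yes: with every CPU in hctx->cpumask
offline, both lookups fail, the single retry fails the same way, and the
work ends up unbound:

#include <stdbool.h>
#include <stdio.h>

#define NR_CPU_IDS	282
#define MODEL_UNBOUND	(-1)	/* stands in for WORK_CPU_UNBOUND */

/* returns a CPU < NR_CPU_IDS, or NR_CPU_IDS when mask & online is empty */
static int lookup_online_cpu(bool mask_has_online_cpu)
{
	return mask_has_online_cpu ? 0 : NR_CPU_IDS;
}

static int model_next_cpu(bool mask_has_online_cpu)
{
	bool tried = false;
	int next_cpu;

select_cpu:
	next_cpu = lookup_online_cpu(mask_has_online_cpu);

	if (next_cpu >= NR_CPU_IDS) {		/* no online CPU was found */
		if (!tried) {			/* re-select exactly once */
			tried = true;
			goto select_cpu;
		}
		return MODEL_UNBOUND;		/* so yes: always unbound */
	}
	return next_cpu;
}

int main(void)
{
	printf("all mapped CPUs offline -> %d\n", model_next_cpu(false));
	printf("an online CPU is mapped -> %d\n", model_next_cpu(true));
	return 0;
}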



Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()

2018-04-05 Thread Christian Borntraeger


On 04/05/2018 07:39 PM, Christian Borntraeger wrote:
> 
> 
> On 04/05/2018 06:11 PM, Ming Lei wrote:
>>>
>>> Could you please apply the following patch and provide the dmesg boot log?
>>
>> And please post out the 'lscpu' log together from the test machine too.
> 
> attached.
> 
> As I said before, this seems to go away with CONFIG_NR_CPUS=64 or smaller.
> We have 282 nr_cpu_ids here (max 141 CPUs on that z13 with SMT2) but only 8 Cores == 16 threads.

To say it differently: the whole system has up to 141 CPUs, but this LPAR has
only 8 CPUs assigned. So we have 16 CPUs (SMT), but this could become up to 282
if I were to do CPU hotplug (which is not used here).



Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()

2018-04-05 Thread Christian Borntraeger


On 04/05/2018 06:11 PM, Ming Lei wrote:
>>
>> Could you please apply the following patch and provide the dmesg boot log?
> 
> And please post out the 'lscpu' log together from the test machine too.

attached.

As I said before, this seems to go away with CONFIG_NR_CPUS=64 or smaller.
We have 282 nr_cpu_ids here (max 141 CPUs on that z13 with SMT2) but only 8 Cores == 16 threads.





dmesg.gz
Description: application/gzip
Architecture:        s390x
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Big Endian
CPU(s):              16
On-line CPU(s) list: 0-15
Thread(s) per core:  2
Core(s) per socket:  8
Socket(s) per book:  3
Book(s) per drawer:  2
Drawer(s):           4
NUMA node(s):        1
Vendor ID:           IBM/S390
Machine type:        2964
CPU dynamic MHz:     5000
CPU static MHz:      5000
BogoMIPS:            20325.00
Hypervisor:          PR/SM
Hypervisor vendor:   IBM
Virtualization type: full
Dispatching mode:    horizontal
L1d cache:           128K
L1i cache:           96K
L2d cache:           2048K
L2i cache:           2048K
L3 cache:            65536K
L4 cache:            491520K
NUMA node0 CPU(s):   0-15
Flags:               esan3 zarch stfle msa ldisp eimm dfp edat etf3eh highgprs te vx sie
CPU NODE DRAWER BOOK SOCKET CORE L1d:L1i:L2d:L2i ONLINE CONFIGURED POLARIZATION ADDRESS
0   0    0      0    0      0    0:0:0:0         yes    yes        horizontal   0
1   0    0      0    0      0    1:1:1:1         yes    yes        horizontal   1
2   0    0      0    0      1    2:2:2:2         yes    yes        horizontal   2
3   0    0      0    0      1    3:3:3:3         yes    yes        horizontal   3
4   0    0      0    0      2    4:4:4:4         yes    yes        horizontal   4
5   0    0      0    0      2    5:5:5:5         yes    yes        horizontal   5
6   0    0      0    0      3    6:6:6:6         yes    yes        horizontal   6
7   0    0      0    0      3    7:7:7:7         yes    yes        horizontal   7
8   0    0      0    1      4    8:8:8:8         yes    yes        horizontal   8
9   0    0      0    1      4    9:9:9:9         yes    yes        horizontal   9
10  0    0      0    1      5    10:10:10:10     yes    yes        horizontal   10
11  0    0      0    1      5    11:11:11:11     yes    yes        horizontal   11
12  0    0      0    1      6    12:12:12:12     yes    yes        horizontal   12
13  0    0      0    1      6    13:13:13:13     yes    yes        horizontal   13
14  0    0      0    1      7    14:14:14:14     yes    yes        horizontal   14
15  0    0      0    1      7    15:15:15:15     yes    yes        horizontal   15


Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()

2018-04-05 Thread Ming Lei
On Fri, Apr 06, 2018 at 12:05:03AM +0800, Ming Lei wrote:
> On Wed, Apr 04, 2018 at 10:18:13AM +0200, Christian Borntraeger wrote:
> > 
> > 
> > On 03/30/2018 04:53 AM, Ming Lei wrote:
> > > On Thu, Mar 29, 2018 at 01:49:29PM +0200, Christian Borntraeger wrote:
> > >>
> > >>
> > >> On 03/29/2018 01:43 PM, Ming Lei wrote:
> > >>> On Thu, Mar 29, 2018 at 12:49:55PM +0200, Christian Borntraeger wrote:
> > 
> > 
> >  On 03/29/2018 12:48 PM, Ming Lei wrote:
> > > On Thu, Mar 29, 2018 at 12:10:11PM +0200, Christian Borntraeger wrote:
> > >>
> > >>
> > >> On 03/29/2018 11:40 AM, Ming Lei wrote:
> > >>> On Thu, Mar 29, 2018 at 11:09:08AM +0200, Christian Borntraeger 
> > >>> wrote:
> > 
> > 
> >  On 03/29/2018 09:23 AM, Christian Borntraeger wrote:
> > >
> > >
> > > On 03/29/2018 04:00 AM, Ming Lei wrote:
> > >> On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger 
> > >> wrote:
> > >>>
> > >>>
> > >>> On 03/28/2018 05:26 PM, Ming Lei wrote:
> >  Hi Christian,
> > 
> >  On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian 
> >  Borntraeger wrote:
> > > FWIW, this patch does not fix the issue for me:
> > >
> > > ostname=? addr=? terminal=? res=success'
> > > [   21.454961] WARNING: CPU: 3 PID: 1882 at 
> > > block/blk-mq.c:1410 __blk_mq_delay_run_hw_queue+0xbe/0xd8
> > > [   21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc 
> > > scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath 
> > > dm_mod autofs4
> > > [   21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 
> > > 4.16.0-rc7+ #26
> > > [   21.454987] Hardware name: IBM 2964 NC9 704 (LPAR)
> > > [   21.454990] Krnl PSW : c0131ea3 3ea2f7bf 
> > > (__blk_mq_delay_run_hw_queue+0xbe/0xd8)
> > > [   21.454996]R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 
> > > AS:3 CC:0 PM:0 RI:0 EA:3
> > > [   21.455005] Krnl GPRS: 013abb69a000 013a 
> > > 013ac6c0dc00 0001
> > > [   21.455008] 013abb69a710 
> > > 013a 0001b691fd98
> > > [   21.455011]0001b691fd98 013ace4775c8 
> > > 0001 
> > > [   21.455014]013ac6c0dc00 00b47238 
> > > 0001b691fc08 0001b691fbd0
> > > [   21.455032] Krnl Code: 0069c596: ebaff0a4  
> > > lmg %r10,%r15,160(%r15)
> > >   0069c59c: c0f47a5e  
> > > brcl15,68ba58
> > >  #0069c5a2: a7f40001  
> > > brc 15,69c5a4
> > >  >0069c5a6: e340f0c4  
> > > lg  %r4,192(%r15)
> > >   0069c5ac: ebaff0a4  
> > > lmg %r10,%r15,160(%r15)
> > >   0069c5b2: 07f4  
> > > bcr 15,%r4
> > >   0069c5b4: c0e5feea  
> > > brasl   %r14,69c388
> > >   0069c5ba: a7f4fff6  
> > > brc 15,69c5a6
> > > [   21.455067] Call Trace:
> > > [   21.455072] ([<0001b691fd98>] 0x1b691fd98)
> > > [   21.455079]  [<0069c692>] 
> > > blk_mq_run_hw_queue+0xba/0x100 
> > > [   21.455083]  [<0069c740>] 
> > > blk_mq_run_hw_queues+0x68/0x88 
> > > [   21.455089]  [<0069b956>] 
> > > __blk_mq_complete_request+0x11e/0x1d8 
> > > [   21.455091]  [<0069ba9c>] 
> > > blk_mq_complete_request+0x8c/0xc8 
> > > [   21.455103]  [<008aa250>] 
> > > dasd_block_tasklet+0x158/0x490 
> > > [   21.455110]  [<0014c742>] 
> > > tasklet_hi_action+0x92/0x120 
> > > [   21.455118]  [<00a7cfc0>] __do_softirq+0x120/0x348 
> > > [   21.455122]  [<0014c212>] irq_exit+0xba/0xd0 
> > > [   21.455130]  [<0010bf92>] do_IRQ+0x8a/0xb8 
> > > [   21.455133]  [<00a7c298>] 
> > > io_int_handler+0x130/0x298 
> > > [   21.455136] Last Breaking-Event-Address:
> > > [   21.455138]  [<0069c5a2>] 
> > > __blk_mq_delay_run_hw_queue+0xba/0xd8
> > > [   21.455140] ---[ end trace be43f99a5d1e553e ]---
> > > [   

Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()

2018-04-05 Thread Ming Lei
On Wed, Apr 04, 2018 at 10:18:13AM +0200, Christian Borntraeger wrote:
> 
> 
> On 03/30/2018 04:53 AM, Ming Lei wrote:
> > On Thu, Mar 29, 2018 at 01:49:29PM +0200, Christian Borntraeger wrote:
> >>
> >>
> >> On 03/29/2018 01:43 PM, Ming Lei wrote:
> >>> On Thu, Mar 29, 2018 at 12:49:55PM +0200, Christian Borntraeger wrote:
> 
> 
>  On 03/29/2018 12:48 PM, Ming Lei wrote:
> > On Thu, Mar 29, 2018 at 12:10:11PM +0200, Christian Borntraeger wrote:
> >>
> >>
> >> On 03/29/2018 11:40 AM, Ming Lei wrote:
> >>> On Thu, Mar 29, 2018 at 11:09:08AM +0200, Christian Borntraeger wrote:
> 
> 
>  On 03/29/2018 09:23 AM, Christian Borntraeger wrote:
> >
> >
> > On 03/29/2018 04:00 AM, Ming Lei wrote:
> >> On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger 
> >> wrote:
> >>>
> >>>
> >>> On 03/28/2018 05:26 PM, Ming Lei wrote:
>  Hi Christian,
> 
>  On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger 
>  wrote:
> > FWIW, this patch does not fix the issue for me:
> >
> > ostname=? addr=? terminal=? res=success'
> > [   21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 
> > __blk_mq_delay_run_hw_queue+0xbe/0xd8
> > [   21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc 
> > scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath 
> > dm_mod autofs4
> > [   21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 
> > 4.16.0-rc7+ #26
> > [   21.454987] Hardware name: IBM 2964 NC9 704 (LPAR)
> > [   21.454990] Krnl PSW : c0131ea3 3ea2f7bf 
> > (__blk_mq_delay_run_hw_queue+0xbe/0xd8)
> > [   21.454996]R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 
> > AS:3 CC:0 PM:0 RI:0 EA:3
> > [   21.455005] Krnl GPRS: 013abb69a000 013a 
> > 013ac6c0dc00 0001
> > [   21.455008] 013abb69a710 
> > 013a 0001b691fd98
> > [   21.455011]0001b691fd98 013ace4775c8 
> > 0001 
> > [   21.455014]013ac6c0dc00 00b47238 
> > 0001b691fc08 0001b691fbd0
> > [   21.455032] Krnl Code: 0069c596: ebaff0a4
> > lmg %r10,%r15,160(%r15)
> >   0069c59c: c0f47a5e
> > brcl15,68ba58
> >  #0069c5a2: a7f40001
> > brc 15,69c5a4
> >  >0069c5a6: e340f0c4
> > lg  %r4,192(%r15)
> >   0069c5ac: ebaff0a4
> > lmg %r10,%r15,160(%r15)
> >   0069c5b2: 07f4
> > bcr 15,%r4
> >   0069c5b4: c0e5feea
> > brasl   %r14,69c388
> >   0069c5ba: a7f4fff6
> > brc 15,69c5a6
> > [   21.455067] Call Trace:
> > [   21.455072] ([<0001b691fd98>] 0x1b691fd98)
> > [   21.455079]  [<0069c692>] 
> > blk_mq_run_hw_queue+0xba/0x100 
> > [   21.455083]  [<0069c740>] 
> > blk_mq_run_hw_queues+0x68/0x88 
> > [   21.455089]  [<0069b956>] 
> > __blk_mq_complete_request+0x11e/0x1d8 
> > [   21.455091]  [<0069ba9c>] 
> > blk_mq_complete_request+0x8c/0xc8 
> > [   21.455103]  [<008aa250>] 
> > dasd_block_tasklet+0x158/0x490 
> > [   21.455110]  [<0014c742>] 
> > tasklet_hi_action+0x92/0x120 
> > [   21.455118]  [<00a7cfc0>] __do_softirq+0x120/0x348 
> > [   21.455122]  [<0014c212>] irq_exit+0xba/0xd0 
> > [   21.455130]  [<0010bf92>] do_IRQ+0x8a/0xb8 
> > [   21.455133]  [<00a7c298>] io_int_handler+0x130/0x298 
> > [   21.455136] Last Breaking-Event-Address:
> > [   21.455138]  [<0069c5a2>] 
> > __blk_mq_delay_run_hw_queue+0xba/0xd8
> > [   21.455140] ---[ end trace be43f99a5d1e553e ]---
> > [   21.510046] dasdconf.sh Warning: 0.0.241e is already online, 
> > not configuring
> 
>  Thinking about this issue further, I can't understand the root 
>  cause for
>  this issue.
> 
>  FWIW, 

Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()

2018-04-04 Thread Christian Borntraeger


On 03/30/2018 04:53 AM, Ming Lei wrote:
> On Thu, Mar 29, 2018 at 01:49:29PM +0200, Christian Borntraeger wrote:
>>
>>
>> On 03/29/2018 01:43 PM, Ming Lei wrote:
>>> On Thu, Mar 29, 2018 at 12:49:55PM +0200, Christian Borntraeger wrote:


 On 03/29/2018 12:48 PM, Ming Lei wrote:
> On Thu, Mar 29, 2018 at 12:10:11PM +0200, Christian Borntraeger wrote:
>>
>>
>> On 03/29/2018 11:40 AM, Ming Lei wrote:
>>> On Thu, Mar 29, 2018 at 11:09:08AM +0200, Christian Borntraeger wrote:


 On 03/29/2018 09:23 AM, Christian Borntraeger wrote:
>
>
> On 03/29/2018 04:00 AM, Ming Lei wrote:
>> On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger 
>> wrote:
>>>
>>>
>>> On 03/28/2018 05:26 PM, Ming Lei wrote:
 Hi Christian,

 On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger 
 wrote:
> FWIW, this patch does not fix the issue for me:
>
> ostname=? addr=? terminal=? res=success'
> [   21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 
> __blk_mq_delay_run_hw_queue+0xbe/0xd8
> [   21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc 
> scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod 
> autofs4
> [   21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 
> 4.16.0-rc7+ #26
> [   21.454987] Hardware name: IBM 2964 NC9 704 (LPAR)
> [   21.454990] Krnl PSW : c0131ea3 3ea2f7bf 
> (__blk_mq_delay_run_hw_queue+0xbe/0xd8)
> [   21.454996]R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 
> AS:3 CC:0 PM:0 RI:0 EA:3
> [   21.455005] Krnl GPRS: 013abb69a000 013a 
> 013ac6c0dc00 0001
> [   21.455008] 013abb69a710 
> 013a 0001b691fd98
> [   21.455011]0001b691fd98 013ace4775c8 
> 0001 
> [   21.455014]013ac6c0dc00 00b47238 
> 0001b691fc08 0001b691fbd0
> [   21.455032] Krnl Code: 0069c596: ebaff0a4  lmg 
> %r10,%r15,160(%r15)
>   0069c59c: c0f47a5e  brcl
> 15,68ba58
>  #0069c5a2: a7f40001  
> brc 15,69c5a4
>  >0069c5a6: e340f0c4  lg  
> %r4,192(%r15)
>   0069c5ac: ebaff0a4  lmg 
> %r10,%r15,160(%r15)
>   0069c5b2: 07f4  bcr 
> 15,%r4
>   0069c5b4: c0e5feea  brasl   
> %r14,69c388
>   0069c5ba: a7f4fff6  
> brc 15,69c5a6
> [   21.455067] Call Trace:
> [   21.455072] ([<0001b691fd98>] 0x1b691fd98)
> [   21.455079]  [<0069c692>] 
> blk_mq_run_hw_queue+0xba/0x100 
> [   21.455083]  [<0069c740>] 
> blk_mq_run_hw_queues+0x68/0x88 
> [   21.455089]  [<0069b956>] 
> __blk_mq_complete_request+0x11e/0x1d8 
> [   21.455091]  [<0069ba9c>] 
> blk_mq_complete_request+0x8c/0xc8 
> [   21.455103]  [<008aa250>] 
> dasd_block_tasklet+0x158/0x490 
> [   21.455110]  [<0014c742>] tasklet_hi_action+0x92/0x120 
> [   21.455118]  [<00a7cfc0>] __do_softirq+0x120/0x348 
> [   21.455122]  [<0014c212>] irq_exit+0xba/0xd0 
> [   21.455130]  [<0010bf92>] do_IRQ+0x8a/0xb8 
> [   21.455133]  [<00a7c298>] io_int_handler+0x130/0x298 
> [   21.455136] Last Breaking-Event-Address:
> [   21.455138]  [<0069c5a2>] 
> __blk_mq_delay_run_hw_queue+0xba/0xd8
> [   21.455140] ---[ end trace be43f99a5d1e553e ]---
> [   21.510046] dasdconf.sh Warning: 0.0.241e is already online, 
> not configuring

 Thinking about this issue further, I can't understand the root 
 cause for
 this issue.

 FWIW, Limiting CONFIG_NR_CPUS to 64 seems to make the problem go away.
>>>
>>> I think the following patch is needed, and this way aligns to the 
>>> mapping
>>> created via managed IRQ at least.
>>>
>>> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
>>> index 

Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()

2018-03-29 Thread Ming Lei
On Thu, Mar 29, 2018 at 01:49:29PM +0200, Christian Borntraeger wrote:
> 
> 
> On 03/29/2018 01:43 PM, Ming Lei wrote:
> > On Thu, Mar 29, 2018 at 12:49:55PM +0200, Christian Borntraeger wrote:
> >>
> >>
> >> On 03/29/2018 12:48 PM, Ming Lei wrote:
> >>> On Thu, Mar 29, 2018 at 12:10:11PM +0200, Christian Borntraeger wrote:
> 
> 
>  On 03/29/2018 11:40 AM, Ming Lei wrote:
> > On Thu, Mar 29, 2018 at 11:09:08AM +0200, Christian Borntraeger wrote:
> >>
> >>
> >> On 03/29/2018 09:23 AM, Christian Borntraeger wrote:
> >>>
> >>>
> >>> On 03/29/2018 04:00 AM, Ming Lei wrote:
>  On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger 
>  wrote:
> >
> >
> > On 03/28/2018 05:26 PM, Ming Lei wrote:
> >> Hi Christian,
> >>
> >> On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger 
> >> wrote:
> >>> FWIW, this patch does not fix the issue for me:
> >>>
> >>> ostname=? addr=? terminal=? res=success'
> >>> [   21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 
> >>> __blk_mq_delay_run_hw_queue+0xbe/0xd8
> >>> [   21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc 
> >>> scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod 
> >>> autofs4
> >>> [   21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 
> >>> 4.16.0-rc7+ #26
> >>> [   21.454987] Hardware name: IBM 2964 NC9 704 (LPAR)
> >>> [   21.454990] Krnl PSW : c0131ea3 3ea2f7bf 
> >>> (__blk_mq_delay_run_hw_queue+0xbe/0xd8)
> >>> [   21.454996]R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 
> >>> AS:3 CC:0 PM:0 RI:0 EA:3
> >>> [   21.455005] Krnl GPRS: 013abb69a000 013a 
> >>> 013ac6c0dc00 0001
> >>> [   21.455008] 013abb69a710 
> >>> 013a 0001b691fd98
> >>> [   21.455011]0001b691fd98 013ace4775c8 
> >>> 0001 
> >>> [   21.455014]013ac6c0dc00 00b47238 
> >>> 0001b691fc08 0001b691fbd0
> >>> [   21.455032] Krnl Code: 0069c596: ebaff0a4  lmg 
> >>> %r10,%r15,160(%r15)
> >>>   0069c59c: c0f47a5e  brcl
> >>> 15,68ba58
> >>>  #0069c5a2: a7f40001  
> >>> brc 15,69c5a4
> >>>  >0069c5a6: e340f0c4  lg  
> >>> %r4,192(%r15)
> >>>   0069c5ac: ebaff0a4  lmg 
> >>> %r10,%r15,160(%r15)
> >>>   0069c5b2: 07f4  bcr 
> >>> 15,%r4
> >>>   0069c5b4: c0e5feea  brasl   
> >>> %r14,69c388
> >>>   0069c5ba: a7f4fff6  
> >>> brc 15,69c5a6
> >>> [   21.455067] Call Trace:
> >>> [   21.455072] ([<0001b691fd98>] 0x1b691fd98)
> >>> [   21.455079]  [<0069c692>] 
> >>> blk_mq_run_hw_queue+0xba/0x100 
> >>> [   21.455083]  [<0069c740>] 
> >>> blk_mq_run_hw_queues+0x68/0x88 
> >>> [   21.455089]  [<0069b956>] 
> >>> __blk_mq_complete_request+0x11e/0x1d8 
> >>> [   21.455091]  [<0069ba9c>] 
> >>> blk_mq_complete_request+0x8c/0xc8 
> >>> [   21.455103]  [<008aa250>] 
> >>> dasd_block_tasklet+0x158/0x490 
> >>> [   21.455110]  [<0014c742>] tasklet_hi_action+0x92/0x120 
> >>> [   21.455118]  [<00a7cfc0>] __do_softirq+0x120/0x348 
> >>> [   21.455122]  [<0014c212>] irq_exit+0xba/0xd0 
> >>> [   21.455130]  [<0010bf92>] do_IRQ+0x8a/0xb8 
> >>> [   21.455133]  [<00a7c298>] io_int_handler+0x130/0x298 
> >>> [   21.455136] Last Breaking-Event-Address:
> >>> [   21.455138]  [<0069c5a2>] 
> >>> __blk_mq_delay_run_hw_queue+0xba/0xd8
> >>> [   21.455140] ---[ end trace be43f99a5d1e553e ]---
> >>> [   21.510046] dasdconf.sh Warning: 0.0.241e is already online, 
> >>> not configuring
> >>
> >> Thinking about this issue further, I can't understand the root 
> >> cause for
> >> this issue.
> >>
> >> FWIW, Limiting CONFIG_NR_CPUS to 64 seems to make the problem go away.
> >
> > I think the following patch is needed, and this way aligns to the 
> > mapping
> > created via managed IRQ at least.
> >
> > diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
> > index 9f8cffc8a701..638ab5c11b3c 100644
> > --- a/block/blk-mq-cpumap.c

Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()

2018-03-29 Thread Ming Lei
On Thu, Mar 29, 2018 at 12:49:55PM +0200, Christian Borntraeger wrote:
> 
> 
> On 03/29/2018 12:48 PM, Ming Lei wrote:
> > On Thu, Mar 29, 2018 at 12:10:11PM +0200, Christian Borntraeger wrote:
> >>
> >>
> >> On 03/29/2018 11:40 AM, Ming Lei wrote:
> >>> On Thu, Mar 29, 2018 at 11:09:08AM +0200, Christian Borntraeger wrote:
> 
> 
>  On 03/29/2018 09:23 AM, Christian Borntraeger wrote:
> >
> >
> > On 03/29/2018 04:00 AM, Ming Lei wrote:
> >> On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger wrote:
> >>>
> >>>
> >>> On 03/28/2018 05:26 PM, Ming Lei wrote:
>  Hi Christian,
> 
>  On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger 
>  wrote:
> > FWIW, this patch does not fix the issue for me:
> >
> > ostname=? addr=? terminal=? res=success'
> > [   21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 
> > __blk_mq_delay_run_hw_queue+0xbe/0xd8
> > [   21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc 
> > scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod 
> > autofs4
> > [   21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 
> > 4.16.0-rc7+ #26
> > [   21.454987] Hardware name: IBM 2964 NC9 704 (LPAR)
> > [   21.454990] Krnl PSW : c0131ea3 3ea2f7bf 
> > (__blk_mq_delay_run_hw_queue+0xbe/0xd8)
> > [   21.454996]R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 
> > CC:0 PM:0 RI:0 EA:3
> > [   21.455005] Krnl GPRS: 013abb69a000 013a 
> > 013ac6c0dc00 0001
> > [   21.455008] 013abb69a710 
> > 013a 0001b691fd98
> > [   21.455011]0001b691fd98 013ace4775c8 
> > 0001 
> > [   21.455014]013ac6c0dc00 00b47238 
> > 0001b691fc08 0001b691fbd0
> > [   21.455032] Krnl Code: 0069c596: ebaff0a4lmg 
> > %r10,%r15,160(%r15)
> >   0069c59c: c0f47a5ebrcl
> > 15,68ba58
> >  #0069c5a2: a7f40001
> > brc 15,69c5a4
> >  >0069c5a6: e340f0c4lg  
> > %r4,192(%r15)
> >   0069c5ac: ebaff0a4lmg 
> > %r10,%r15,160(%r15)
> >   0069c5b2: 07f4bcr 
> > 15,%r4
> >   0069c5b4: c0e5feeabrasl   
> > %r14,69c388
> >   0069c5ba: a7f4fff6
> > brc 15,69c5a6
> > [   21.455067] Call Trace:
> > [   21.455072] ([<0001b691fd98>] 0x1b691fd98)
> > [   21.455079]  [<0069c692>] blk_mq_run_hw_queue+0xba/0x100 
> > [   21.455083]  [<0069c740>] blk_mq_run_hw_queues+0x68/0x88 
> > [   21.455089]  [<0069b956>] 
> > __blk_mq_complete_request+0x11e/0x1d8 
> > [   21.455091]  [<0069ba9c>] 
> > blk_mq_complete_request+0x8c/0xc8 
> > [   21.455103]  [<008aa250>] dasd_block_tasklet+0x158/0x490 
> > [   21.455110]  [<0014c742>] tasklet_hi_action+0x92/0x120 
> > [   21.455118]  [<00a7cfc0>] __do_softirq+0x120/0x348 
> > [   21.455122]  [<0014c212>] irq_exit+0xba/0xd0 
> > [   21.455130]  [<0010bf92>] do_IRQ+0x8a/0xb8 
> > [   21.455133]  [<00a7c298>] io_int_handler+0x130/0x298 
> > [   21.455136] Last Breaking-Event-Address:
> > [   21.455138]  [<0069c5a2>] 
> > __blk_mq_delay_run_hw_queue+0xba/0xd8
> > [   21.455140] ---[ end trace be43f99a5d1e553e ]---
> > [   21.510046] dasdconf.sh Warning: 0.0.241e is already online, not 
> > configuring
> 
>  Thinking about this issue further, I can't understand the root cause 
>  for
>  this issue.
> 
>  FWIW, Limiting CONFIG_NR_CPUS to 64 seems to make the problem go away.
> >>>
> >>> I think the following patch is needed, and this way aligns to the mapping
> >>> created via managed IRQ at least.
> >>>
> >>> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
> >>> index 9f8cffc8a701..638ab5c11b3c 100644
> >>> --- a/block/blk-mq-cpumap.c
> >>> +++ b/block/blk-mq-cpumap.c
> >>> @@ -14,13 +14,12 @@
> >>>  #include "blk.h"
> >>>  #include "blk-mq.h"
> >>>
> >>> +/*
> >>> + * Given there isn't CPU hotplug handler in blk-mq, map all possible 
> >>> CPUs to
> >>> + * queues even it isn't present yet.
> >>> + */
> >>>  static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
> >>>  {
> 

Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()

2018-03-29 Thread Christian Borntraeger


On 03/29/2018 12:48 PM, Ming Lei wrote:
> On Thu, Mar 29, 2018 at 12:10:11PM +0200, Christian Borntraeger wrote:
>>
>>
>> On 03/29/2018 11:40 AM, Ming Lei wrote:
>>> On Thu, Mar 29, 2018 at 11:09:08AM +0200, Christian Borntraeger wrote:


 On 03/29/2018 09:23 AM, Christian Borntraeger wrote:
>
>
> On 03/29/2018 04:00 AM, Ming Lei wrote:
>> On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger wrote:
>>>
>>>
>>> On 03/28/2018 05:26 PM, Ming Lei wrote:
 Hi Christian,

 On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote:
> FWIW, this patch does not fix the issue for me:
>
> ostname=? addr=? terminal=? res=success'
> [   21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 
> __blk_mq_delay_run_hw_queue+0xbe/0xd8
> [   21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc 
> scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod 
> autofs4
> [   21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 
> 4.16.0-rc7+ #26
> [   21.454987] Hardware name: IBM 2964 NC9 704 (LPAR)
> [   21.454990] Krnl PSW : c0131ea3 3ea2f7bf 
> (__blk_mq_delay_run_hw_queue+0xbe/0xd8)
> [   21.454996]R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 
> CC:0 PM:0 RI:0 EA:3
> [   21.455005] Krnl GPRS: 013abb69a000 013a 
> 013ac6c0dc00 0001
> [   21.455008] 013abb69a710 
> 013a 0001b691fd98
> [   21.455011]0001b691fd98 013ace4775c8 
> 0001 
> [   21.455014]013ac6c0dc00 00b47238 
> 0001b691fc08 0001b691fbd0
> [   21.455032] Krnl Code: 0069c596: ebaff0a4  lmg 
> %r10,%r15,160(%r15)
>   0069c59c: c0f47a5e  brcl
> 15,68ba58
>  #0069c5a2: a7f40001  brc 
> 15,69c5a4
>  >0069c5a6: e340f0c4  lg  
> %r4,192(%r15)
>   0069c5ac: ebaff0a4  lmg 
> %r10,%r15,160(%r15)
>   0069c5b2: 07f4  bcr 
> 15,%r4
>   0069c5b4: c0e5feea  brasl   
> %r14,69c388
>   0069c5ba: a7f4fff6  brc 
> 15,69c5a6
> [   21.455067] Call Trace:
> [   21.455072] ([<0001b691fd98>] 0x1b691fd98)
> [   21.455079]  [<0069c692>] blk_mq_run_hw_queue+0xba/0x100 
> [   21.455083]  [<0069c740>] blk_mq_run_hw_queues+0x68/0x88 
> [   21.455089]  [<0069b956>] 
> __blk_mq_complete_request+0x11e/0x1d8 
> [   21.455091]  [<0069ba9c>] 
> blk_mq_complete_request+0x8c/0xc8 
> [   21.455103]  [<008aa250>] dasd_block_tasklet+0x158/0x490 
> [   21.455110]  [<0014c742>] tasklet_hi_action+0x92/0x120 
> [   21.455118]  [<00a7cfc0>] __do_softirq+0x120/0x348 
> [   21.455122]  [<0014c212>] irq_exit+0xba/0xd0 
> [   21.455130]  [<0010bf92>] do_IRQ+0x8a/0xb8 
> [   21.455133]  [<00a7c298>] io_int_handler+0x130/0x298 
> [   21.455136] Last Breaking-Event-Address:
> [   21.455138]  [<0069c5a2>] 
> __blk_mq_delay_run_hw_queue+0xba/0xd8
> [   21.455140] ---[ end trace be43f99a5d1e553e ]---
> [   21.510046] dasdconf.sh Warning: 0.0.241e is already online, not 
> configuring

 Thinking about this issue further, I can't understand the root cause 
 for
 this issue.

 FWIW, Limiting CONFIG_NR_CPUS to 64 seems to make the problem go away.
>>>
>>> I think the following patch is needed, and this way aligns to the mapping
>>> created via managed IRQ at least.
>>>
>>> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
>>> index 9f8cffc8a701..638ab5c11b3c 100644
>>> --- a/block/blk-mq-cpumap.c
>>> +++ b/block/blk-mq-cpumap.c
>>> @@ -14,13 +14,12 @@
>>>  #include "blk.h"
>>>  #include "blk-mq.h"
>>>
>>> +/*
>>> + * Given there isn't CPU hotplug handler in blk-mq, map all possible CPUs 
>>> to
>>> + * queues even it isn't present yet.
>>> + */
>>>  static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
>>>  {
>>> -   /*
>>> -* Non present CPU will be mapped to queue index 0.
>>> -*/
>>> -   if (!cpu_present(cpu))
>>> -   return 0;
>>> return cpu % nr_queues;
>>>  }
>>>
>>> Thanks,
>>> Ming
>>>
>>
>> With that I no longer see the WARN_ON but the other warning instead:
>>
>> [   

Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()

2018-03-29 Thread Ming Lei
On Thu, Mar 29, 2018 at 12:10:11PM +0200, Christian Borntraeger wrote:
> 
> 
> On 03/29/2018 11:40 AM, Ming Lei wrote:
> > On Thu, Mar 29, 2018 at 11:09:08AM +0200, Christian Borntraeger wrote:
> >>
> >>
> >> On 03/29/2018 09:23 AM, Christian Borntraeger wrote:
> >>>
> >>>
> >>> On 03/29/2018 04:00 AM, Ming Lei wrote:
>  On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger wrote:
> >
> >
> > On 03/28/2018 05:26 PM, Ming Lei wrote:
> >> Hi Christian,
> >>
> >> On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote:
> >>> FWIW, this patch does not fix the issue for me:
> >>>
> >>> ostname=? addr=? terminal=? res=success'
> >>> [   21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 
> >>> __blk_mq_delay_run_hw_queue+0xbe/0xd8
> >>> [   21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc 
> >>> scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod 
> >>> autofs4
> >>> [   21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 
> >>> 4.16.0-rc7+ #26
> >>> [   21.454987] Hardware name: IBM 2964 NC9 704 (LPAR)
> >>> [   21.454990] Krnl PSW : c0131ea3 3ea2f7bf 
> >>> (__blk_mq_delay_run_hw_queue+0xbe/0xd8)
> >>> [   21.454996]R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 
> >>> CC:0 PM:0 RI:0 EA:3
> >>> [   21.455005] Krnl GPRS: 013abb69a000 013a 
> >>> 013ac6c0dc00 0001
> >>> [   21.455008] 013abb69a710 
> >>> 013a 0001b691fd98
> >>> [   21.455011]0001b691fd98 013ace4775c8 
> >>> 0001 
> >>> [   21.455014]013ac6c0dc00 00b47238 
> >>> 0001b691fc08 0001b691fbd0
> >>> [   21.455032] Krnl Code: 0069c596: ebaff0a4  lmg 
> >>> %r10,%r15,160(%r15)
> >>>   0069c59c: c0f47a5e  brcl
> >>> 15,68ba58
> >>>  #0069c5a2: a7f40001  brc 
> >>> 15,69c5a4
> >>>  >0069c5a6: e340f0c4  lg  
> >>> %r4,192(%r15)
> >>>   0069c5ac: ebaff0a4  lmg 
> >>> %r10,%r15,160(%r15)
> >>>   0069c5b2: 07f4  bcr 
> >>> 15,%r4
> >>>   0069c5b4: c0e5feea  brasl   
> >>> %r14,69c388
> >>>   0069c5ba: a7f4fff6  brc 
> >>> 15,69c5a6
> >>> [   21.455067] Call Trace:
> >>> [   21.455072] ([<0001b691fd98>] 0x1b691fd98)
> >>> [   21.455079]  [<0069c692>] blk_mq_run_hw_queue+0xba/0x100 
> >>> [   21.455083]  [<0069c740>] blk_mq_run_hw_queues+0x68/0x88 
> >>> [   21.455089]  [<0069b956>] 
> >>> __blk_mq_complete_request+0x11e/0x1d8 
> >>> [   21.455091]  [<0069ba9c>] 
> >>> blk_mq_complete_request+0x8c/0xc8 
> >>> [   21.455103]  [<008aa250>] dasd_block_tasklet+0x158/0x490 
> >>> [   21.455110]  [<0014c742>] tasklet_hi_action+0x92/0x120 
> >>> [   21.455118]  [<00a7cfc0>] __do_softirq+0x120/0x348 
> >>> [   21.455122]  [<0014c212>] irq_exit+0xba/0xd0 
> >>> [   21.455130]  [<0010bf92>] do_IRQ+0x8a/0xb8 
> >>> [   21.455133]  [<00a7c298>] io_int_handler+0x130/0x298 
> >>> [   21.455136] Last Breaking-Event-Address:
> >>> [   21.455138]  [<0069c5a2>] 
> >>> __blk_mq_delay_run_hw_queue+0xba/0xd8
> >>> [   21.455140] ---[ end trace be43f99a5d1e553e ]---
> >>> [   21.510046] dasdconf.sh Warning: 0.0.241e is already online, not 
> >>> configuring
> >>
> >> Thinking about this issue further, I can't understand the root cause 
> >> for
> >> this issue.
> >>
> >> FWIW, Limiting CONFIG_NR_CPUS to 64 seems to make the problem go away.
> > 
> > I think the following patch is needed, and this way aligns to the mapping
> > created via managed IRQ at least.
> > 
> > diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
> > index 9f8cffc8a701..638ab5c11b3c 100644
> > --- a/block/blk-mq-cpumap.c
> > +++ b/block/blk-mq-cpumap.c
> > @@ -14,13 +14,12 @@
> >  #include "blk.h"
> >  #include "blk-mq.h"
> > 
> > +/*
> > + * Given there isn't CPU hotplug handler in blk-mq, map all possible CPUs 
> > to
> > + * queues even it isn't present yet.
> > + */
> >  static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
> >  {
> > -   /*
> > -* Non present CPU will be mapped to queue index 0.
> > -*/
> > -   if (!cpu_present(cpu))
> > -   return 0;
> > return cpu % nr_queues;
> >  }
> > 
> > Thanks,
> > Ming
> > 
> 
> With that I no longer see the WARN_ON but the other warning instead:
> 
> [   31.903096] audit: type=1130 

Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()

2018-03-29 Thread Ming Lei
On Thu, Mar 29, 2018 at 05:52:16PM +0800, Ming Lei wrote:
> On Thu, Mar 29, 2018 at 09:23:10AM +0200, Christian Borntraeger wrote:
> > 
> > 
> > On 03/29/2018 04:00 AM, Ming Lei wrote:
> > > On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger wrote:
> > >>
> > >>
> > >> On 03/28/2018 05:26 PM, Ming Lei wrote:
> > >>> Hi Christian,
> > >>>
> > >>> On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote:
> >  FWIW, this patch does not fix the issue for me:
> > 
> >  ostname=? addr=? terminal=? res=success'
> >  [   21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 
> >  __blk_mq_delay_run_hw_queue+0xbe/0xd8
> >  [   21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc 
> >  scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod 
> >  autofs4
> >  [   21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 
> >  4.16.0-rc7+ #26
> >  [   21.454987] Hardware name: IBM 2964 NC9 704 (LPAR)
> >  [   21.454990] Krnl PSW : c0131ea3 3ea2f7bf 
> >  (__blk_mq_delay_run_hw_queue+0xbe/0xd8)
> >  [   21.454996]R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 
> >  CC:0 PM:0 RI:0 EA:3
> >  [   21.455005] Krnl GPRS: 013abb69a000 013a 
> >  013ac6c0dc00 0001
> >  [   21.455008] 013abb69a710 
> >  013a 0001b691fd98
> >  [   21.455011]0001b691fd98 013ace4775c8 
> >  0001 
> >  [   21.455014]013ac6c0dc00 00b47238 
> >  0001b691fc08 0001b691fbd0
> >  [   21.455032] Krnl Code: 0069c596: ebaff0a4   lmg 
> >  %r10,%r15,160(%r15)
> >    0069c59c: c0f47a5e   brcl
> >  15,68ba58
> >   #0069c5a2: a7f40001   brc 
> >  15,69c5a4
> >   >0069c5a6: e340f0c4   lg  
> >  %r4,192(%r15)
> >    0069c5ac: ebaff0a4   lmg 
> >  %r10,%r15,160(%r15)
> >    0069c5b2: 07f4   bcr 
> >  15,%r4
> >    0069c5b4: c0e5feea   brasl   
> >  %r14,69c388
> >    0069c5ba: a7f4fff6   brc 
> >  15,69c5a6
> >  [   21.455067] Call Trace:
> >  [   21.455072] ([<0001b691fd98>] 0x1b691fd98)
> >  [   21.455079]  [<0069c692>] blk_mq_run_hw_queue+0xba/0x100 
> >  [   21.455083]  [<0069c740>] blk_mq_run_hw_queues+0x68/0x88 
> >  [   21.455089]  [<0069b956>] 
> >  __blk_mq_complete_request+0x11e/0x1d8 
> >  [   21.455091]  [<0069ba9c>] blk_mq_complete_request+0x8c/0xc8 
> >  [   21.455103]  [<008aa250>] dasd_block_tasklet+0x158/0x490 
> >  [   21.455110]  [<0014c742>] tasklet_hi_action+0x92/0x120 
> >  [   21.455118]  [<00a7cfc0>] __do_softirq+0x120/0x348 
> >  [   21.455122]  [<0014c212>] irq_exit+0xba/0xd0 
> >  [   21.455130]  [<0010bf92>] do_IRQ+0x8a/0xb8 
> >  [   21.455133]  [<00a7c298>] io_int_handler+0x130/0x298 
> >  [   21.455136] Last Breaking-Event-Address:
> >  [   21.455138]  [<0069c5a2>] 
> >  __blk_mq_delay_run_hw_queue+0xba/0xd8
> >  [   21.455140] ---[ end trace be43f99a5d1e553e ]---
> >  [   21.510046] dasdconf.sh Warning: 0.0.241e is already online, not 
> >  configuring
> > >>>
> > >>> Thinking about this issue further, I can't understand the root cause for
> > >>> this issue.
> > >>>
> > >>> After commit 20e4d813931961fe ("blk-mq: simplify queue mapping & 
> > >>> schedule with
> > >>> each possisble CPU"), each hw queue should be mapped to at least one 
> > >>> CPU, that
> > >>> means this issue shouldn't happen. Maybe blk_mq_map_queues() works 
> > >>> wrong?
> > >>>
> > >>> Could you dump 'lscpu' and provide blk-mq debugfs for your DASD via the
> > >>> following command?
> > >>
> > >> # lscpu
> > >> Architecture:s390x
> > >> CPU op-mode(s):  32-bit, 64-bit
> > >> Byte Order:  Big Endian
> > >> CPU(s):  16
> > >> On-line CPU(s) list: 0-15
> > >> Thread(s) per core:  2
> > >> Core(s) per socket:  8
> > >> Socket(s) per book:  3
> > >> Book(s) per drawer:  2
> > >> Drawer(s):   4
> > >> NUMA node(s):1
> > >> Vendor ID:   IBM/S390
> > >> Machine type:2964
> > >> CPU dynamic MHz: 5000
> > >> CPU static MHz:  5000
> > >> BogoMIPS:20325.00
> > >> Hypervisor:  PR/SM
> > >> Hypervisor vendor:   IBM
> > >> Virtualization type: full
> > >> Dispatching mode:horizontal
> > >> L1d cache:   128K
> > >> L1i cache:   96K
> > >> L2d cache:   2048K
> > >> L2i cache:   2048K
> > 

Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()

2018-03-29 Thread Christian Borntraeger


On 03/29/2018 12:11 PM, Christian Borntraeger wrote:
> 
> 
> On 03/29/2018 11:52 AM, Ming Lei wrote:
>> From the debugfs log, hctx0 is mapped to lots of CPU, so it shouldn't be
>> unmapped, could you check if it is hctx0 which is unmapped when the
>> warning is triggered? If not, what is the unmapped hctx? And you can do
>> that by adding one extra line:
>>
>>  printk("unmapped hctx %d", hctx->queue_num);
> 
> Where do you want that printk?

And do you want it with or without the other patch that you have just sent?



Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()

2018-03-29 Thread Christian Borntraeger


On 03/29/2018 11:52 AM, Ming Lei wrote:
> From the debugfs log, hctx0 is mapped to lots of CPU, so it shouldn't be
> unmapped, could you check if it is hctx0 which is unmapped when the
> warning is triggered? If not, what is the unmapped hctx? And you can do
> that by adding one extra line:
> 
>   printk("unmapped hctx %d", hctx->queue_num);

Where do you want that printk?
> 
> Thanks,
> Ming



Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()

2018-03-29 Thread Christian Borntraeger


On 03/29/2018 11:40 AM, Ming Lei wrote:
> On Thu, Mar 29, 2018 at 11:09:08AM +0200, Christian Borntraeger wrote:
>>
>>
>> On 03/29/2018 09:23 AM, Christian Borntraeger wrote:
>>>
>>>
>>> On 03/29/2018 04:00 AM, Ming Lei wrote:
 On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger wrote:
>
>
> On 03/28/2018 05:26 PM, Ming Lei wrote:
>> Hi Christian,
>>
>> On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote:
>>> FWIW, this patch does not fix the issue for me:
>>>
>>> ostname=? addr=? terminal=? res=success'
>>> [   21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 
>>> __blk_mq_delay_run_hw_queue+0xbe/0xd8
>>> [   21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua 
>>> dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4
>>> [   21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 
>>> 4.16.0-rc7+ #26
>>> [   21.454987] Hardware name: IBM 2964 NC9 704 (LPAR)
>>> [   21.454990] Krnl PSW : c0131ea3 3ea2f7bf 
>>> (__blk_mq_delay_run_hw_queue+0xbe/0xd8)
>>> [   21.454996]R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 
>>> PM:0 RI:0 EA:3
>>> [   21.455005] Krnl GPRS: 013abb69a000 013a 
>>> 013ac6c0dc00 0001
>>> [   21.455008] 013abb69a710 
>>> 013a 0001b691fd98
>>> [   21.455011]0001b691fd98 013ace4775c8 
>>> 0001 
>>> [   21.455014]013ac6c0dc00 00b47238 
>>> 0001b691fc08 0001b691fbd0
>>> [   21.455032] Krnl Code: 0069c596: ebaff0a4lmg 
>>> %r10,%r15,160(%r15)
>>>   0069c59c: c0f47a5ebrcl
>>> 15,68ba58
>>>  #0069c5a2: a7f40001brc 
>>> 15,69c5a4
>>>  >0069c5a6: e340f0c4lg  
>>> %r4,192(%r15)
>>>   0069c5ac: ebaff0a4lmg 
>>> %r10,%r15,160(%r15)
>>>   0069c5b2: 07f4bcr 
>>> 15,%r4
>>>   0069c5b4: c0e5feeabrasl   
>>> %r14,69c388
>>>   0069c5ba: a7f4fff6brc 
>>> 15,69c5a6
>>> [   21.455067] Call Trace:
>>> [   21.455072] ([<0001b691fd98>] 0x1b691fd98)
>>> [   21.455079]  [<0069c692>] blk_mq_run_hw_queue+0xba/0x100 
>>> [   21.455083]  [<0069c740>] blk_mq_run_hw_queues+0x68/0x88 
>>> [   21.455089]  [<0069b956>] 
>>> __blk_mq_complete_request+0x11e/0x1d8 
>>> [   21.455091]  [<0069ba9c>] blk_mq_complete_request+0x8c/0xc8 
>>> [   21.455103]  [<008aa250>] dasd_block_tasklet+0x158/0x490 
>>> [   21.455110]  [<0014c742>] tasklet_hi_action+0x92/0x120 
>>> [   21.455118]  [<00a7cfc0>] __do_softirq+0x120/0x348 
>>> [   21.455122]  [<0014c212>] irq_exit+0xba/0xd0 
>>> [   21.455130]  [<0010bf92>] do_IRQ+0x8a/0xb8 
>>> [   21.455133]  [<00a7c298>] io_int_handler+0x130/0x298 
>>> [   21.455136] Last Breaking-Event-Address:
>>> [   21.455138]  [<0069c5a2>] 
>>> __blk_mq_delay_run_hw_queue+0xba/0xd8
>>> [   21.455140] ---[ end trace be43f99a5d1e553e ]---
>>> [   21.510046] dasdconf.sh Warning: 0.0.241e is already online, not 
>>> configuring
>>
>> Thinking about this issue further, I can't understand the root cause for
>> this issue.
>>
>> FWIW, Limiting CONFIG_NR_CPUS to 64 seems to make the problem go away.
> 
> I think the following patch is needed, and this way aligns to the mapping
> created via managed IRQ at least.
> 
> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
> index 9f8cffc8a701..638ab5c11b3c 100644
> --- a/block/blk-mq-cpumap.c
> +++ b/block/blk-mq-cpumap.c
> @@ -14,13 +14,12 @@
>  #include "blk.h"
>  #include "blk-mq.h"
> 
> +/*
> + * Given there isn't CPU hotplug handler in blk-mq, map all possible CPUs to
> + * queues even it isn't present yet.
> + */
>  static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
>  {
> - /*
> -  * Non present CPU will be mapped to queue index 0.
> -  */
> - if (!cpu_present(cpu))
> - return 0;
>   return cpu % nr_queues;
>  }
> 
> Thanks,
> Ming
> 

With that I no longer see the WARN_ON but the other warning instead:

[   31.903096] audit: type=1130 audit(1522318064.439:41): pid=1 uid=0 
auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup-dev 
comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? 
res=success'
[   31.903100] audit: type=1131 audit(1522318064.439:42): pid=1 uid=0 
auid=4294967295 ses=4294967295 

Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()

2018-03-29 Thread Ming Lei
On Thu, Mar 29, 2018 at 09:23:10AM +0200, Christian Borntraeger wrote:
> 
> 
> On 03/29/2018 04:00 AM, Ming Lei wrote:
> > On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger wrote:
> >>
> >>
> >> On 03/28/2018 05:26 PM, Ming Lei wrote:
> >>> Hi Christian,
> >>>
> >>> On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote:
>  FWIW, this patch does not fix the issue for me:
> 
>  ostname=? addr=? terminal=? res=success'
>  [   21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 
>  __blk_mq_delay_run_hw_queue+0xbe/0xd8
>  [   21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua 
>  dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4
>  [   21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 
>  4.16.0-rc7+ #26
>  [   21.454987] Hardware name: IBM 2964 NC9 704 (LPAR)
>  [   21.454990] Krnl PSW : c0131ea3 3ea2f7bf 
>  (__blk_mq_delay_run_hw_queue+0xbe/0xd8)
>  [   21.454996]R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 
>  PM:0 RI:0 EA:3
>  [   21.455005] Krnl GPRS: 013abb69a000 013a 
>  013ac6c0dc00 0001
>  [   21.455008] 013abb69a710 
>  013a 0001b691fd98
>  [   21.455011]0001b691fd98 013ace4775c8 
>  0001 
>  [   21.455014]013ac6c0dc00 00b47238 
>  0001b691fc08 0001b691fbd0
>  [   21.455032] Krnl Code: 0069c596: ebaff0a4 lmg 
>  %r10,%r15,160(%r15)
>    0069c59c: c0f47a5e brcl
>  15,68ba58
>   #0069c5a2: a7f40001 brc 
>  15,69c5a4
>   >0069c5a6: e340f0c4 lg  
>  %r4,192(%r15)
>    0069c5ac: ebaff0a4 lmg 
>  %r10,%r15,160(%r15)
>    0069c5b2: 07f4 bcr 15,%r4
>    0069c5b4: c0e5feea brasl   
>  %r14,69c388
>    0069c5ba: a7f4fff6 brc 
>  15,69c5a6
>  [   21.455067] Call Trace:
>  [   21.455072] ([<0001b691fd98>] 0x1b691fd98)
>  [   21.455079]  [<0069c692>] blk_mq_run_hw_queue+0xba/0x100 
>  [   21.455083]  [<0069c740>] blk_mq_run_hw_queues+0x68/0x88 
>  [   21.455089]  [<0069b956>] 
>  __blk_mq_complete_request+0x11e/0x1d8 
>  [   21.455091]  [<0069ba9c>] blk_mq_complete_request+0x8c/0xc8 
>  [   21.455103]  [<008aa250>] dasd_block_tasklet+0x158/0x490 
>  [   21.455110]  [<0014c742>] tasklet_hi_action+0x92/0x120 
>  [   21.455118]  [<00a7cfc0>] __do_softirq+0x120/0x348 
>  [   21.455122]  [<0014c212>] irq_exit+0xba/0xd0 
>  [   21.455130]  [<0010bf92>] do_IRQ+0x8a/0xb8 
>  [   21.455133]  [<00a7c298>] io_int_handler+0x130/0x298 
>  [   21.455136] Last Breaking-Event-Address:
>  [   21.455138]  [<0069c5a2>] 
>  __blk_mq_delay_run_hw_queue+0xba/0xd8
>  [   21.455140] ---[ end trace be43f99a5d1e553e ]---
>  [   21.510046] dasdconf.sh Warning: 0.0.241e is already online, not 
>  configuring
> >>>
> >>> Thinking about this issue further, I can't understand the root cause for
> >>> this issue.
> >>>
> >>> After commit 20e4d813931961fe ("blk-mq: simplify queue mapping & schedule 
> >>> with
> >>> each possisble CPU"), each hw queue should be mapped to at least one CPU, 
> >>> that
> >>> means this issue shouldn't happen. Maybe blk_mq_map_queues() works wrong?
> >>>
> >>> Could you dump 'lscpu' and provide blk-mq debugfs for your DASD via the
> >>> following command?
> >>
> >> # lscpu
> >> Architecture:s390x
> >> CPU op-mode(s):  32-bit, 64-bit
> >> Byte Order:  Big Endian
> >> CPU(s):  16
> >> On-line CPU(s) list: 0-15
> >> Thread(s) per core:  2
> >> Core(s) per socket:  8
> >> Socket(s) per book:  3
> >> Book(s) per drawer:  2
> >> Drawer(s):   4
> >> NUMA node(s):1
> >> Vendor ID:   IBM/S390
> >> Machine type:2964
> >> CPU dynamic MHz: 5000
> >> CPU static MHz:  5000
> >> BogoMIPS:20325.00
> >> Hypervisor:  PR/SM
> >> Hypervisor vendor:   IBM
> >> Virtualization type: full
> >> Dispatching mode:horizontal
> >> L1d cache:   128K
> >> L1i cache:   96K
> >> L2d cache:   2048K
> >> L2i cache:   2048K
> >> L3 cache:65536K
> >> L4 cache:491520K
> >> NUMA node0 CPU(s):   0-15
> >> Flags:   esan3 zarch stfle msa ldisp eimm dfp edat etf3eh 
> >> highgprs te vx sie
> >>
> >> # lsdasd 
> >> Bus-ID Status  Name  Device  Type  BlkSz  Size  Blocks
> >> 

Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()

2018-03-29 Thread Ming Lei
On Thu, Mar 29, 2018 at 11:09:08AM +0200, Christian Borntraeger wrote:
> 
> 
> On 03/29/2018 09:23 AM, Christian Borntraeger wrote:
> > 
> > 
> > On 03/29/2018 04:00 AM, Ming Lei wrote:
> >> On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger wrote:
> >>>
> >>>
> >>> On 03/28/2018 05:26 PM, Ming Lei wrote:
>  Hi Christian,
> 
>  On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote:
> > FWIW, this patch does not fix the issue for me:
> >
> > ostname=? addr=? terminal=? res=success'
> > [   21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 
> > __blk_mq_delay_run_hw_queue+0xbe/0xd8
> > [   21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua 
> > dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4
> > [   21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 
> > 4.16.0-rc7+ #26
> > [   21.454987] Hardware name: IBM 2964 NC9 704 (LPAR)
> > [   21.454990] Krnl PSW : c0131ea3 3ea2f7bf 
> > (__blk_mq_delay_run_hw_queue+0xbe/0xd8)
> > [   21.454996]R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 
> > PM:0 RI:0 EA:3
> > [   21.455005] Krnl GPRS: 013abb69a000 013a 
> > 013ac6c0dc00 0001
> > [   21.455008] 013abb69a710 
> > 013a 0001b691fd98
> > [   21.455011]0001b691fd98 013ace4775c8 
> > 0001 
> > [   21.455014]013ac6c0dc00 00b47238 
> > 0001b691fc08 0001b691fbd0
> > [   21.455032] Krnl Code: 0069c596: ebaff0a4lmg 
> > %r10,%r15,160(%r15)
> >   0069c59c: c0f47a5ebrcl
> > 15,68ba58
> >  #0069c5a2: a7f40001brc 
> > 15,69c5a4
> >  >0069c5a6: e340f0c4lg  
> > %r4,192(%r15)
> >   0069c5ac: ebaff0a4lmg 
> > %r10,%r15,160(%r15)
> >   0069c5b2: 07f4bcr 
> > 15,%r4
> >   0069c5b4: c0e5feeabrasl   
> > %r14,69c388
> >   0069c5ba: a7f4fff6brc 
> > 15,69c5a6
> > [   21.455067] Call Trace:
> > [   21.455072] ([<0001b691fd98>] 0x1b691fd98)
> > [   21.455079]  [<0069c692>] blk_mq_run_hw_queue+0xba/0x100 
> > [   21.455083]  [<0069c740>] blk_mq_run_hw_queues+0x68/0x88 
> > [   21.455089]  [<0069b956>] 
> > __blk_mq_complete_request+0x11e/0x1d8 
> > [   21.455091]  [<0069ba9c>] blk_mq_complete_request+0x8c/0xc8 
> > [   21.455103]  [<008aa250>] dasd_block_tasklet+0x158/0x490 
> > [   21.455110]  [<0014c742>] tasklet_hi_action+0x92/0x120 
> > [   21.455118]  [<00a7cfc0>] __do_softirq+0x120/0x348 
> > [   21.455122]  [<0014c212>] irq_exit+0xba/0xd0 
> > [   21.455130]  [<0010bf92>] do_IRQ+0x8a/0xb8 
> > [   21.455133]  [<00a7c298>] io_int_handler+0x130/0x298 
> > [   21.455136] Last Breaking-Event-Address:
> > [   21.455138]  [<0069c5a2>] 
> > __blk_mq_delay_run_hw_queue+0xba/0xd8
> > [   21.455140] ---[ end trace be43f99a5d1e553e ]---
> > [   21.510046] dasdconf.sh Warning: 0.0.241e is already online, not 
> > configuring
> 
>  Thinking about this issue further, I can't understand the root cause for
>  this issue.
> 
> FWIW, Limiting CONFIG_NR_CPUS to 64 seems to make the problem go away.

I think the following patch is needed, and this way aligns to the mapping
created via managed IRQ at least.

diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 9f8cffc8a701..638ab5c11b3c 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -14,13 +14,12 @@
 #include "blk.h"
 #include "blk-mq.h"
 
+/*
+ * Given there isn't CPU hotplug handler in blk-mq, map all possible CPUs to
+ * queues even it isn't present yet.
+ */
 static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
 {
-   /*
-* Non present CPU will be mapped to queue index 0.
-*/
-   if (!cpu_present(cpu))
-   return 0;
return cpu % nr_queues;
 }

Thanks,
Ming
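
To make the effect of the cpu_to_queue_index() change above concrete, the
following user-space sketch contrasts the old and the new mapping. It is
illustration only, not kernel code: the constants (256 possible CPUs, CPUs
0-15 present, 64 hw queues) and the cpu_present_sim() helper are assumptions
standing in for nr_cpu_ids, cpu_present() and the driver's queue count.

#include <stdbool.h>
#include <stdio.h>

#define NR_QUEUES   64    /* assumed number of hw queues */
#define NR_CPU_IDS  256   /* assumed number of possible CPUs */
#define NR_PRESENT  16    /* CPUs 0-15 present, per the lscpu output below */

static bool cpu_present_sim(int cpu)
{
	return cpu < NR_PRESENT;          /* stand-in for cpu_present(cpu) */
}

/* old behaviour: every non-present CPU collapses onto queue 0 */
static int map_old(int cpu)
{
	if (!cpu_present_sim(cpu))
		return 0;
	return cpu % NR_QUEUES;
}

/* new behaviour: all possible CPUs are spread round-robin over the queues */
static int map_new(int cpu)
{
	return cpu % NR_QUEUES;
}

int main(void)
{
	for (int cpu = 0; cpu < NR_CPU_IDS; cpu++)
		printf("cpu %3d: old -> hctx %2d, new -> hctx %2d\n",
		       cpu, map_old(cpu), map_new(cpu));
	return 0;
}

With the old mapping, queue 0 covers CPU 0 plus all 240 non-present CPUs,
queues 1-15 each cover one present CPU, and queues 16-63 cover no CPU at all;
with the new mapping every queue covers exactly four of the possible CPUs,
which matches how managed IRQ affinity spreads vectors over possible CPUs.
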


Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()

2018-03-29 Thread Christian Borntraeger


On 03/29/2018 09:23 AM, Christian Borntraeger wrote:
> 
> 
> On 03/29/2018 04:00 AM, Ming Lei wrote:
>> On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger wrote:
>>>
>>>
>>> On 03/28/2018 05:26 PM, Ming Lei wrote:
 Hi Christian,

 On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote:
> FWIW, this patch does not fix the issue for me:
>
> ostname=? addr=? terminal=? res=success'
> [   21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 
> __blk_mq_delay_run_hw_queue+0xbe/0xd8
> [   21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua 
> dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4
> [   21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 4.16.0-rc7+ 
> #26
> [   21.454987] Hardware name: IBM 2964 NC9 704 (LPAR)
> [   21.454990] Krnl PSW : c0131ea3 3ea2f7bf 
> (__blk_mq_delay_run_hw_queue+0xbe/0xd8)
> [   21.454996]R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 
> PM:0 RI:0 EA:3
> [   21.455005] Krnl GPRS: 013abb69a000 013a 
> 013ac6c0dc00 0001
> [   21.455008] 013abb69a710 
> 013a 0001b691fd98
> [   21.455011]0001b691fd98 013ace4775c8 
> 0001 
> [   21.455014]013ac6c0dc00 00b47238 
> 0001b691fc08 0001b691fbd0
> [   21.455032] Krnl Code: 0069c596: ebaff0a4  lmg 
> %r10,%r15,160(%r15)
>   0069c59c: c0f47a5e  brcl
> 15,68ba58
>  #0069c5a2: a7f40001  brc 
> 15,69c5a4
>  >0069c5a6: e340f0c4  lg  
> %r4,192(%r15)
>   0069c5ac: ebaff0a4  lmg 
> %r10,%r15,160(%r15)
>   0069c5b2: 07f4  bcr 15,%r4
>   0069c5b4: c0e5feea  brasl   
> %r14,69c388
>   0069c5ba: a7f4fff6  brc 
> 15,69c5a6
> [   21.455067] Call Trace:
> [   21.455072] ([<0001b691fd98>] 0x1b691fd98)
> [   21.455079]  [<0069c692>] blk_mq_run_hw_queue+0xba/0x100 
> [   21.455083]  [<0069c740>] blk_mq_run_hw_queues+0x68/0x88 
> [   21.455089]  [<0069b956>] 
> __blk_mq_complete_request+0x11e/0x1d8 
> [   21.455091]  [<0069ba9c>] blk_mq_complete_request+0x8c/0xc8 
> [   21.455103]  [<008aa250>] dasd_block_tasklet+0x158/0x490 
> [   21.455110]  [<0014c742>] tasklet_hi_action+0x92/0x120 
> [   21.455118]  [<00a7cfc0>] __do_softirq+0x120/0x348 
> [   21.455122]  [<0014c212>] irq_exit+0xba/0xd0 
> [   21.455130]  [<0010bf92>] do_IRQ+0x8a/0xb8 
> [   21.455133]  [<00a7c298>] io_int_handler+0x130/0x298 
> [   21.455136] Last Breaking-Event-Address:
> [   21.455138]  [<0069c5a2>] __blk_mq_delay_run_hw_queue+0xba/0xd8
> [   21.455140] ---[ end trace be43f99a5d1e553e ]---
> [   21.510046] dasdconf.sh Warning: 0.0.241e is already online, not 
> configuring

 Thinking about this issue further, I can't understand the root cause for
 this issue.

FWIW, Limiting CONFIG_NR_CPUS to 64 seems to make the problem go away.



Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()

2018-03-28 Thread Christian Borntraeger
FWIW, these logs were from a different system (with fewer disks and CPUs).
the related log is

[4.114191] dasd-eckd.2aa01a: 0.0.3f77: New DASD 3390/0C (CU 3990/01) with 
30051 cylinders, 15 heads, 224 sectors
[4.114852] dasd-eckd.2aa01a: 0.0.3f74: New DASD 3390/0C (CU 3990/01) with 
30051 cylinders, 15 heads, 224 sectors
[4.122361] dasd-eckd.412b53: 0.0.3f77: DASD with 4 KB/block, 21636720 KB 
total size, 48 KB/track, compatible disk layout
[4.122811] dasd-eckd.412b53: 0.0.3f74: DASD with 4 KB/block, 21636720 KB 
total size, 48 KB/track, compatible disk layout
[4.123568]  dasdc:VOL1/  0X3F77: dasdc1
[4.124092]  dasdd:VOL1/  0X3F74: dasdd1
[4.286220] WARNING: CPU: 1 PID: 1262 at block/blk-mq.c:1402 
__blk_mq_delay_run_hw_queue+0xbe/0xd8
[4.286225] Modules linked in: autofs4
[4.286231] CPU: 1 PID: 1262 Comm: dasdconf.sh Not tainted 
4.16.0-20180323.rc6.git0.792f5024dd01.300.fc27.s390x #1
[4.286232] Hardware name: IBM 2964 NC9 704 (LPAR)
[4.286236] Krnl PSW : 53ccfc28 c4b59c51 
(__blk_mq_delay_run_hw_queue+0xbe/0xd8)
[4.286239]R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 
RI:0 EA:3
[4.286242] Krnl GPRS: 0003da4eb000 0003 0003dae67000 
0001
[4.286243] 0003da4eb710 0003 
00010dbafd98
[4.286245]00010dbafd98 0001 0001 

[4.286248]0003dae67000 00aaa750 00010dbafc00 
00010dbafbc8
[4.286256] Krnl Code: 00698e46: ebaff0a4lmg 
%r10,%r15,160(%r15)
  00698e4c: c0f47acabrcl
15,6883e0
 #00698e52: a7f40001brc 
15,698e54
 >00698e56: e340f0c4lg  
%r4,192(%r15)
  00698e5c: ebaff0a4lmg 
%r10,%r15,160(%r15)
  00698e62: 07f4bcr 15,%r4
  00698e64: c0e5ff02brasl   
%r14,698c68
  00698e6a: a7f4fff6brc 
15,698e56
[4.286301] Call Trace:
[4.286304] ([<00010dbafc08>] 0x10dbafc08)
[4.286306]  [<00698f5a>] blk_mq_run_hw_queue+0x82/0x180 
[4.286308]  [<006990c0>] blk_mq_run_hw_queues+0x68/0x88 
[4.286310]  [<006982de>] __blk_mq_complete_request+0x11e/0x1d8 
[4.286313]  [<00698424>] blk_mq_complete_request+0x8c/0xc8 
[4.286319]  [<0082c5d0>] dasd_block_tasklet+0x158/0x490 
[4.286325]  [<0014bc9a>] tasklet_hi_action+0x92/0x120 
[4.286329]  [<009feeb0>] __do_softirq+0x120/0x348 
[4.286331]  [<0014b76a>] irq_exit+0xba/0xd0 
[4.286335]  [<0010bf92>] do_IRQ+0x8a/0xb8 
[4.286337]  [<009fe180>] io_int_handler+0x130/0x298 
[4.286338] Last Breaking-Event-Address:
[4.286340]  [<00698e52>] __blk_mq_delay_run_hw_queue+0xba/0xd8
[4.286342] ---[ end trace 0d746eb6f9348354 ]---

On 03/28/2018 05:36 PM, Christian Borntraeger wrote:
> 
> 
> On 03/28/2018 05:26 PM, Ming Lei wrote:
>> Hi Christian,
>>
>> On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote:
>>> FWIW, this patch does not fix the issue for me:
>>>
>>> ostname=? addr=? terminal=? res=success'
>>> [   21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 
>>> __blk_mq_delay_run_hw_queue+0xbe/0xd8
>>> [   21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua 
>>> dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4
>>> [   21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 4.16.0-rc7+ 
>>> #26
>>> [   21.454987] Hardware name: IBM 2964 NC9 704 (LPAR)
>>> [   21.454990] Krnl PSW : c0131ea3 3ea2f7bf 
>>> (__blk_mq_delay_run_hw_queue+0xbe/0xd8)
>>> [   21.454996]R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 
>>> PM:0 RI:0 EA:3
>>> [   21.455005] Krnl GPRS: 013abb69a000 013a 
>>> 013ac6c0dc00 0001
>>> [   21.455008] 013abb69a710 
>>> 013a 0001b691fd98
>>> [   21.455011]0001b691fd98 013ace4775c8 
>>> 0001 
>>> [   21.455014]013ac6c0dc00 00b47238 
>>> 0001b691fc08 0001b691fbd0
>>> [   21.455032] Krnl Code: 0069c596: ebaff0a4lmg 
>>> %r10,%r15,160(%r15)
>>>   0069c59c: c0f47a5ebrcl
>>> 15,68ba58
>>>  #0069c5a2: a7f40001brc 
>>> 15,69c5a4
>>>  >0069c5a6: e340f0c4lg  
>>> %r4,192(%r15)
>>>   0069c5ac: ebaff0a4lmg 
>>> %r10,%r15,160(%r15)
>>>   0069c5b2: 07f4bcr 

Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()

2018-03-28 Thread Christian Borntraeger
With that patch I now get:

[   40.620619] virbr0: port 1(virbr0-nic) entered disabled state
[   47.418592] run queue from wrong CPU 3, hctx inactive
[   47.418602] CPU: 3 PID: 2153 Comm: kworker/3:1H Tainted: GW
4.16.0-rc7+ #27
[   47.418604] Hardware name: IBM 2964 NC9 704 (LPAR)
[   47.418613] Workqueue: kblockd blk_mq_run_work_fn
[   47.418615] Call Trace:
[   47.418621] ([<00113b86>] show_stack+0x56/0x80)
[   47.418626]  [<00a5cd9a>] dump_stack+0x82/0xb0 
[   47.418627]  [<0069c4be>] __blk_mq_run_hw_queue+0x136/0x160 
[   47.418631]  [<00163906>] process_one_work+0x1be/0x420 
[   47.418633]  [<00163bc0>] worker_thread+0x58/0x458 
[   47.418635]  [<0016a9d0>] kthread+0x148/0x160 
[   47.418639]  [<00a7bf3a>] kernel_thread_starter+0x6/0xc 
[   47.418640]  [<00a7bf34>] kernel_thread_starter+0x0/0xc 
[   77.670407] run queue from wrong CPU 4, hctx inactive
[   77.670416] CPU: 4 PID: 2155 Comm: kworker/4:1H Tainted: GW
4.16.0-rc7+ #27
[   77.670418] Hardware name: IBM 2964 NC9 704 (LPAR)
[   77.670428] Workqueue: kblockd blk_mq_run_work_fn
[   77.670430] Call Trace:
[   77.670436] ([<00113b86>] show_stack+0x56/0x80)
[   77.670441]  [<00a5cd9a>] dump_stack+0x82/0xb0 
[   77.670442]  [<0069c4be>] __blk_mq_run_hw_queue+0x136/0x160 
[   77.670446]  [<00163906>] process_one_work+0x1be/0x420 
[   77.670448]  [<00163bc0>] worker_thread+0x58/0x458 
[   77.670450]  [<0016a9d0>] kthread+0x148/0x160 
[   77.670454]  [<00a7bf3a>] kernel_thread_starter+0x6/0xc 
[   77.670455]  [<00a7bf34>] kernel_thread_starter+0x0/0xc 



On 03/28/2018 04:53 PM, Jens Axboe wrote:
> On 3/28/18 8:38 AM, Jens Axboe wrote:
>> On 3/28/18 1:45 AM, Christian Borntraeger wrote:
>>> FWIW, this patch does not fix the issue for me:
>>
>> Looks like I didn't do the delayed path. How about the below?
> 
> OK, final version... This is more in line with what I originally
> suggested.
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 16e83e6df404..c90016c36a70 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1306,6 +1306,10 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx 
> *hctx)
>  {
>   int srcu_idx;
> 
> + if (!blk_mq_hw_queue_mapped(hctx) &&
> + !WARN_ON_ONCE(blk_mq_hctx_has_pending(hctx)))
> + return;
> +
>   /*
>* We should be running this queue from one of the CPUs that
>* are mapped to it.
> @@ -1399,9 +1403,6 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx 
> *hctx)
>  static void __blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool 
> async,
>   unsigned long msecs)
>  {
> - if (WARN_ON_ONCE(!blk_mq_hw_queue_mapped(hctx)))
> - return;
> -
>   if (unlikely(blk_mq_hctx_stopped(hctx)))
>   return;
> 
> @@ -1586,9 +1587,6 @@ static void blk_mq_run_work_fn(struct work_struct *work)
> 
>  void blk_mq_delay_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs)
>  {
> - if (WARN_ON_ONCE(!blk_mq_hw_queue_mapped(hctx)))
> - return;
> -
>   /*
>* Stop the hw queue, then modify currently delayed work.
>* This should prevent us from running the queue prematurely.
> 



Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()

2018-03-28 Thread Christian Borntraeger


On 03/28/2018 05:26 PM, Ming Lei wrote:
> Hi Christian,
> 
> On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote:
>> FWIW, this patch does not fix the issue for me:
>>
>> ostname=? addr=? terminal=? res=success'
>> [   21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 
>> __blk_mq_delay_run_hw_queue+0xbe/0xd8
>> [   21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua 
>> dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4
>> [   21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 4.16.0-rc7+ #26
>> [   21.454987] Hardware name: IBM 2964 NC9 704 (LPAR)
>> [   21.454990] Krnl PSW : c0131ea3 3ea2f7bf 
>> (__blk_mq_delay_run_hw_queue+0xbe/0xd8)
>> [   21.454996]R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 
>> RI:0 EA:3
>> [   21.455005] Krnl GPRS: 013abb69a000 013a 013ac6c0dc00 
>> 0001
>> [   21.455008] 013abb69a710 013a 
>> 0001b691fd98
>> [   21.455011]0001b691fd98 013ace4775c8 0001 
>> 
>> [   21.455014]013ac6c0dc00 00b47238 0001b691fc08 
>> 0001b691fbd0
>> [   21.455032] Krnl Code: 0069c596: ebaff0a4 lmg 
>> %r10,%r15,160(%r15)
>>   0069c59c: c0f47a5e brcl
>> 15,68ba58
>>  #0069c5a2: a7f40001 brc 
>> 15,69c5a4
>>  >0069c5a6: e340f0c4 lg  
>> %r4,192(%r15)
>>   0069c5ac: ebaff0a4 lmg 
>> %r10,%r15,160(%r15)
>>   0069c5b2: 07f4 bcr 15,%r4
>>   0069c5b4: c0e5feea brasl   
>> %r14,69c388
>>   0069c5ba: a7f4fff6 brc 
>> 15,69c5a6
>> [   21.455067] Call Trace:
>> [   21.455072] ([<0001b691fd98>] 0x1b691fd98)
>> [   21.455079]  [<0069c692>] blk_mq_run_hw_queue+0xba/0x100 
>> [   21.455083]  [<0069c740>] blk_mq_run_hw_queues+0x68/0x88 
>> [   21.455089]  [<0069b956>] __blk_mq_complete_request+0x11e/0x1d8 
>> [   21.455091]  [<0069ba9c>] blk_mq_complete_request+0x8c/0xc8 
>> [   21.455103]  [<008aa250>] dasd_block_tasklet+0x158/0x490 
>> [   21.455110]  [<0014c742>] tasklet_hi_action+0x92/0x120 
>> [   21.455118]  [<00a7cfc0>] __do_softirq+0x120/0x348 
>> [   21.455122]  [<0014c212>] irq_exit+0xba/0xd0 
>> [   21.455130]  [<0010bf92>] do_IRQ+0x8a/0xb8 
>> [   21.455133]  [<00a7c298>] io_int_handler+0x130/0x298 
>> [   21.455136] Last Breaking-Event-Address:
>> [   21.455138]  [<0069c5a2>] __blk_mq_delay_run_hw_queue+0xba/0xd8
>> [   21.455140] ---[ end trace be43f99a5d1e553e ]---
>> [   21.510046] dasdconf.sh Warning: 0.0.241e is already online, not 
>> configuring
> 
> Thinking about this issue further, I can't understand the root cause for
> this issue.
> 
> After commit 20e4d813931961fe ("blk-mq: simplify queue mapping & schedule with
> each possisble CPU"), each hw queue should be mapped to at least one CPU, that
> means this issue shouldn't happen. Maybe blk_mq_map_queues() works wrong?
> 
> Could you dump 'lscpu' and provide blk-mq debugfs for your DASD via the
> following command?

# lscpu
Architecture:s390x
CPU op-mode(s):  32-bit, 64-bit
Byte Order:  Big Endian
CPU(s):  16
On-line CPU(s) list: 0-15
Thread(s) per core:  2
Core(s) per socket:  8
Socket(s) per book:  3
Book(s) per drawer:  2
Drawer(s):   4
NUMA node(s):1
Vendor ID:   IBM/S390
Machine type:2964
CPU dynamic MHz: 5000
CPU static MHz:  5000
BogoMIPS:20325.00
Hypervisor:  PR/SM
Hypervisor vendor:   IBM
Virtualization type: full
Dispatching mode:horizontal
L1d cache:   128K
L1i cache:   96K
L2d cache:   2048K
L2i cache:   2048K
L3 cache:65536K
L4 cache:491520K
NUMA node0 CPU(s):   0-15
Flags:   esan3 zarch stfle msa ldisp eimm dfp edat etf3eh highgprs 
te vx sie

# lsdasd 
Bus-ID Status  Name  Device  Type  BlkSz  Size  Blocks
==
0.0.3f75   active  dasda 94:0ECKD  4096   21129MB   5409180
0.0.3f76   active  dasdb 94:4ECKD  4096   21129MB   5409180
0.0.3f77   active  dasdc 94:8ECKD  4096   21129MB   5409180
0.0.3f74   active  dasdd 94:12   ECKD  4096   21129MB   5409180

> 
> (cd /sys/kernel/debug/block/$DASD && find . -type f -exec grep -aH . {} \;)


see attachment:
dasda/range:4
dasda/capability:10
dasda/inflight:   00
dasda/ext_range:4
dasda/power/runtime_suspended_time:0
dasda/power/runtime_active_time:0
dasda/power/control:auto
dasda/power/runtime_status:unsupported

Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()

2018-03-28 Thread Ming Lei
Hi Christian,

On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote:
> FWIW, this patch does not fix the issue for me:
> 
> ostname=? addr=? terminal=? res=success'
> [   21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 
> __blk_mq_delay_run_hw_queue+0xbe/0xd8
> [   21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua 
> dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4
> [   21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 4.16.0-rc7+ #26
> [   21.454987] Hardware name: IBM 2964 NC9 704 (LPAR)
> [   21.454990] Krnl PSW : c0131ea3 3ea2f7bf 
> (__blk_mq_delay_run_hw_queue+0xbe/0xd8)
> [   21.454996]R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 
> RI:0 EA:3
> [   21.455005] Krnl GPRS: 013abb69a000 013a 013ac6c0dc00 
> 0001
> [   21.455008] 013abb69a710 013a 
> 0001b691fd98
> [   21.455011]0001b691fd98 013ace4775c8 0001 
> 
> [   21.455014]013ac6c0dc00 00b47238 0001b691fc08 
> 0001b691fbd0
> [   21.455032] Krnl Code: 0069c596: ebaff0a4  lmg 
> %r10,%r15,160(%r15)
>   0069c59c: c0f47a5e  brcl
> 15,68ba58
>  #0069c5a2: a7f40001  brc 
> 15,69c5a4
>  >0069c5a6: e340f0c4  lg  
> %r4,192(%r15)
>   0069c5ac: ebaff0a4  lmg 
> %r10,%r15,160(%r15)
>   0069c5b2: 07f4  bcr 15,%r4
>   0069c5b4: c0e5feea  brasl   
> %r14,69c388
>   0069c5ba: a7f4fff6  brc 
> 15,69c5a6
> [   21.455067] Call Trace:
> [   21.455072] ([<0001b691fd98>] 0x1b691fd98)
> [   21.455079]  [<0069c692>] blk_mq_run_hw_queue+0xba/0x100 
> [   21.455083]  [<0069c740>] blk_mq_run_hw_queues+0x68/0x88 
> [   21.455089]  [<0069b956>] __blk_mq_complete_request+0x11e/0x1d8 
> [   21.455091]  [<0069ba9c>] blk_mq_complete_request+0x8c/0xc8 
> [   21.455103]  [<008aa250>] dasd_block_tasklet+0x158/0x490 
> [   21.455110]  [<0014c742>] tasklet_hi_action+0x92/0x120 
> [   21.455118]  [<00a7cfc0>] __do_softirq+0x120/0x348 
> [   21.455122]  [<0014c212>] irq_exit+0xba/0xd0 
> [   21.455130]  [<0010bf92>] do_IRQ+0x8a/0xb8 
> [   21.455133]  [<00a7c298>] io_int_handler+0x130/0x298 
> [   21.455136] Last Breaking-Event-Address:
> [   21.455138]  [<0069c5a2>] __blk_mq_delay_run_hw_queue+0xba/0xd8
> [   21.455140] ---[ end trace be43f99a5d1e553e ]---
> [   21.510046] dasdconf.sh Warning: 0.0.241e is already online, not 
> configuring

Thinking about this issue further, I can't understand the root cause for
this issue.

After commit 20e4d813931961fe ("blk-mq: simplify queue mapping & schedule with
each possisble CPU"), each hw queue should be mapped to at least one CPU, that
means this issue shouldn't happen. Maybe blk_mq_map_queues() works wrong?

Could you dump 'lscpu' and provide blk-mq debugfs for your DASD via the
following command?

(cd /sys/kernel/debug/block/$DASD && find . -type f -exec grep -aH . {} \;)

Thanks,
Ming


Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()

2018-03-28 Thread Jens Axboe
On 3/28/18 8:38 AM, Jens Axboe wrote:
> On 3/28/18 1:45 AM, Christian Borntraeger wrote:
>> FWIW, this patch does not fix the issue for me:
> 
> Looks like I didn't do the delayed path. How about the below?

OK, final version... This is more in line with what I originally
suggested.

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 16e83e6df404..c90016c36a70 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1306,6 +1306,10 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx 
*hctx)
 {
int srcu_idx;
 
+   if (!blk_mq_hw_queue_mapped(hctx) &&
+   !WARN_ON_ONCE(blk_mq_hctx_has_pending(hctx)))
+   return;
+
/*
 * We should be running this queue from one of the CPUs that
 * are mapped to it.
@@ -1399,9 +1403,6 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx 
*hctx)
 static void __blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async,
unsigned long msecs)
 {
-   if (WARN_ON_ONCE(!blk_mq_hw_queue_mapped(hctx)))
-   return;
-
if (unlikely(blk_mq_hctx_stopped(hctx)))
return;
 
@@ -1586,9 +1587,6 @@ static void blk_mq_run_work_fn(struct work_struct *work)
 
 void blk_mq_delay_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs)
 {
-   if (WARN_ON_ONCE(!blk_mq_hw_queue_mapped(hctx)))
-   return;
-
/*
 * Stop the hw queue, then modify currently delayed work.
 * This should prevent us from running the queue prematurely.

-- 
Jens Axboe
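
As a quick illustration of how the check added at the top of
__blk_mq_run_hw_queue() above resolves, here is a small user-space sketch.
It is not kernel code: should_bail() and warn_once() are made-up stand-ins
that only mirror the quoted condition built from blk_mq_hw_queue_mapped()
and blk_mq_hctx_has_pending().

#include <stdbool.h>
#include <stdio.h>

/* stand-in for WARN_ON_ONCE(): report only the first occurrence */
static void warn_once(void)
{
	static bool warned;

	if (!warned) {
		warned = true;
		fprintf(stderr, "WARN: unmapped hctx has pending IO\n");
	}
}

/*
 * Mirrors:
 *	if (!blk_mq_hw_queue_mapped(hctx) &&
 *	    !WARN_ON_ONCE(blk_mq_hctx_has_pending(hctx)))
 *		return;
 */
static bool should_bail(bool mapped, bool has_pending)
{
	if (mapped)
		return false;	/* mapped queue: run it as usual */
	if (has_pending) {
		warn_once();	/* unmapped but busy: complain once ... */
		return false;	/* ... and still attempt to run the queue */
	}
	return true;		/* unmapped and idle: nothing to do, bail quietly */
}

int main(void)
{
	printf("mapped, idle    -> bail=%d\n", should_bail(true, false));
	printf("unmapped, idle  -> bail=%d\n", should_bail(false, false));
	printf("unmapped, busy  -> bail=%d\n", should_bail(false, true));
	return 0;
}

So an unmapped hw queue is skipped silently only when it has no pending IO;
if it does have pending IO the warning fires once and the queue is still run,
whereas the WARN_ON_ONCE() removed from __blk_mq_delay_run_hw_queue() fired
for any unmapped queue, pending IO or not.
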



Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()

2018-03-28 Thread Jens Axboe
On 3/28/18 1:45 AM, Christian Borntraeger wrote:
> FWIW, this patch does not fix the issue for me:

Looks like I didn't do the delayed path. How about the below?


diff --git a/block/blk-mq.c b/block/blk-mq.c
index 16e83e6df404..fd663ae1094c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1302,10 +1302,23 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, 
struct list_head *list,
return (queued + errors) != 0;
 }
 
+static bool blk_mq_bail_unmapped(struct blk_mq_hw_ctx *hctx)
+{
+   /*
+* Warn if the queue isn't mapped AND we have pending IO. Not being
+* mapped isn't necessarily a huge issue, if we don't have pending IO.
+*/
+   return !blk_mq_hw_queue_mapped(hctx) &&
+   !WARN_ON_ONCE(blk_mq_hctx_has_pending(hctx));
+}
+
 static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 {
int srcu_idx;
 
+   if (blk_mq_bail_unmapped(hctx))
+   return;
+
/*
 * We should be running this queue from one of the CPUs that
 * are mapped to it.
@@ -1399,9 +1412,8 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx 
*hctx)
 static void __blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async,
unsigned long msecs)
 {
-   if (WARN_ON_ONCE(!blk_mq_hw_queue_mapped(hctx)))
+   if (blk_mq_bail_unmapped(hctx))
return;
-
if (unlikely(blk_mq_hctx_stopped(hctx)))
return;
 

-- 
Jens Axboe



Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()

2018-03-28 Thread Christian Borntraeger
FWIW, this patch does not fix the issue for me:

ostname=? addr=? terminal=? res=success'
[   21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 __blk_mq_delay_run_hw_queue+0xbe/0xd8
[   21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4
[   21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 4.16.0-rc7+ #26
[   21.454987] Hardware name: IBM 2964 NC9 704 (LPAR)
[   21.454990] Krnl PSW : c0131ea3 3ea2f7bf (__blk_mq_delay_run_hw_queue+0xbe/0xd8)
[   21.454996]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[   21.455005] Krnl GPRS: 013abb69a000 013a 013ac6c0dc00 0001
[   21.455008]            013abb69a710 013a 0001b691fd98
[   21.455011]            0001b691fd98 013ace4775c8 0001
[   21.455014]            013ac6c0dc00 00b47238 0001b691fc08 0001b691fbd0
[   21.455032] Krnl Code: 0069c596: ebaff0a4  lmg     %r10,%r15,160(%r15)
                          0069c59c: c0f47a5e  brcl    15,68ba58
                         #0069c5a2: a7f40001  brc     15,69c5a4
                         >0069c5a6: e340f0c4  lg      %r4,192(%r15)
                          0069c5ac: ebaff0a4  lmg     %r10,%r15,160(%r15)
                          0069c5b2: 07f4      bcr     15,%r4
                          0069c5b4: c0e5feea  brasl   %r14,69c388
                          0069c5ba: a7f4fff6  brc     15,69c5a6
[   21.455067] Call Trace:
[   21.455072] ([<0001b691fd98>] 0x1b691fd98)
[   21.455079]  [<0069c692>] blk_mq_run_hw_queue+0xba/0x100 
[   21.455083]  [<0069c740>] blk_mq_run_hw_queues+0x68/0x88 
[   21.455089]  [<0069b956>] __blk_mq_complete_request+0x11e/0x1d8 
[   21.455091]  [<0069ba9c>] blk_mq_complete_request+0x8c/0xc8 
[   21.455103]  [<008aa250>] dasd_block_tasklet+0x158/0x490 
[   21.455110]  [<0014c742>] tasklet_hi_action+0x92/0x120 
[   21.455118]  [<00a7cfc0>] __do_softirq+0x120/0x348 
[   21.455122]  [<0014c212>] irq_exit+0xba/0xd0 
[   21.455130]  [<0010bf92>] do_IRQ+0x8a/0xb8 
[   21.455133]  [<00a7c298>] io_int_handler+0x130/0x298 
[   21.455136] Last Breaking-Event-Address:
[   21.455138]  [<0069c5a2>] __blk_mq_delay_run_hw_queue+0xba/0xd8
[   21.455140] ---[ end trace be43f99a5d1e553e ]---
[   21.510046] dasdconf.sh Warning: 0.0.241e is already online, not configuring
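
The warning at block/blk-mq.c:1410 in this trace presumably corresponds to the
mapped-queue check in __blk_mq_delay_run_hw_queue() that the patches in this
thread rework, i.e. roughly:

static void __blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async,
					unsigned long msecs)
{
	/* Trips when a completion path runs an hctx with no mapped CPUs. */
	if (WARN_ON_ONCE(!blk_mq_hw_queue_mapped(hctx)))
		return;

	/* rest of the function elided */
}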


On 03/28/2018 05:22 AM, Jens Axboe wrote:
> On 3/27/18 7:20 PM, Ming Lei wrote:
>> Since commit 20e4d813931961fe ("blk-mq: simplify queue mapping & schedule
>> with each possisble CPU"), it has become easier to end up with an unmapped
>> hctx in some CPU topologies, i.e. an hctx that is not mapped to any CPU.
>>
>> This patch avoids the warning in __blk_mq_delay_run_hw_queue() by checking
>> whether the hctx is mapped in blk_mq_run_hw_queues().
>>
>> blk_mq_run_hw_queues() is often called from SCSI's or another driver's
>> completion path, so this warning has to be addressed.
> 
> I don't like this very much. You're catching just one particular case,
> and if the hw queue has pending IO (for instance), then it's just wrong.
> 
> How about something like the below? Totally untested...
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 16e83e6df404..4c04ac124e5d 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1307,6 +1307,14 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
>   int srcu_idx;
> 
>   /*
> +  * Warn if the queue isn't mapped AND we have pending IO. Not being
> +  * mapped isn't necessarily a huge issue, if we don't have pending IO.
> +  */
> + if (!blk_mq_hw_queue_mapped(hctx) &&
> + !WARN_ON_ONCE(blk_mq_hctx_has_pending(hctx)))
> + return;
> +
> + /*
>* We should be running this queue from one of the CPUs that
>* are mapped to it.
>*
> 



Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()

2018-03-27 Thread Jens Axboe
On 3/27/18 7:20 PM, Ming Lei wrote:
> Since commit 20e4d813931961fe ("blk-mq: simplify queue mapping & schedule
> with each possisble CPU"), it has become easier to end up with an unmapped
> hctx in some CPU topologies, i.e. an hctx that is not mapped to any CPU.
> 
> This patch avoids the warning in __blk_mq_delay_run_hw_queue() by checking
> whether the hctx is mapped in blk_mq_run_hw_queues().
> 
> blk_mq_run_hw_queues() is often called from SCSI's or another driver's
> completion path, so this warning has to be addressed.

I don't like this very much. You're catching just one particular case,
and if the hw queue has pending IO (for instance), then it's just wrong.

How about something like the below? Totally untested...

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 16e83e6df404..4c04ac124e5d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1307,6 +1307,14 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
int srcu_idx;
 
/*
+* Warn if the queue isn't mapped AND we have pending IO. Not being
+* mapped isn't necessarily a huge issue, if we don't have pending IO.
+*/
+   if (!blk_mq_hw_queue_mapped(hctx) &&
+   !WARN_ON_ONCE(blk_mq_hctx_has_pending(hctx)))
+   return;
+
+   /*
 * We should be running this queue from one of the CPUs that
 * are mapped to it.
 *

-- 
Jens Axboe
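
The patch being replied to is not reproduced in this part of the thread. As its
commit message describes, the idea was to filter unmapped hctxs in
blk_mq_run_hw_queues() itself, roughly along the lines of the sketch below;
this illustrates the idea only and is not the posted diff.

void blk_mq_run_hw_queues(struct request_queue *q, bool async)
{
	struct blk_mq_hw_ctx *hctx;
	int i;

	queue_for_each_hw_ctx(q, hctx, i) {
		/*
		 * Skip hctxs that no CPU is mapped to, on the premise that
		 * nothing should have queued IO on them from an online CPU.
		 */
		if (!blk_mq_hw_queue_mapped(hctx) ||
		    blk_mq_hctx_stopped(hctx))
			continue;

		blk_mq_run_hw_queue(hctx, async);
	}
}

Jens's objection above is that such an hctx could still have pending IO, which
is why his version only skips the run when blk_mq_hctx_has_pending() is false
and warns otherwise.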