Re: 4.16-RC7 WARNING: CPU: 2 PID: 0 at block/blk-mq.c:1400 __blk_mq_delay_run_hw_queue

2018-03-27 Thread Christian Borntraeger


On 03/27/2018 02:01 PM, Ming Lei wrote:
> Hi Stefan,
> 
> On Tue, Mar 27, 2018 at 12:04:20PM +0200, Stefan Haberland wrote:
>> Hi,
>>
>> I get the following warning in __blk_mq_delay_run_hw_queue when the
>> scheduler is set to mq-deadline for DASD devices on s390.
>>
>> What I see is that for whatever reason there is a hctx nr 0 which has no
>> hctx->tags pointer set.
>> From my observation it is always hctx nr 0 which has a tags NULL pointer in
>> it and I see other hctx which have the hctx->tags pointer set correctly.
>>
>>
>> [    2.169986] WARNING: CPU: 0 PID: 0 at block/blk-mq.c:1402
>> __blk_mq_delay_run_hw_queue+0xe8/0x118
>> [    2.170007] Modules linked in:
>> [    2.170014] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
>> 4.16.0-rc7-04107-g91a05d9e1d6b-dirty #147
>> [    2.170019] Hardware name: IBM 2964 N96 702 (z/VM 6.4.0)
>> [    2.170024] Krnl PSW : 76fd6c7f c244c24d
>> (__blk_mq_delay_run_hw_queue+0xe8/0x118)
>> [    2.170035]    R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0
>> RI:0 EA:3
>> [    2.170041] Krnl GPRS: 599ec58a 02a94000 02a94000
>> 0001
>> [    2.170047]     6e761e98 02a96980
>> 02a96800
>> [    2.170052]    02d87ce0 737fbda8 
>> 0001
>> [    2.170058]    0001 00aedd10 737fbc38
>> 737fbc00
>> [    2.170069] Krnl Code: 006ea3c8: ebaff0a4 lmg
>> %r10,%r15,160(%r15)
>>   006ea3ce: c0f45e0d brcl    15,6d5fe8
>>  #006ea3d4: a7f40001 brc 15,6ea3d6
>>  >006ea3d8: e340f0c4    lg 
>> %r4,192(%r15)
>>   006ea3de: ebaff0a4 lmg
>> %r10,%r15,160(%r15)
>>   006ea3e4: 07f4 bcr 15,%r4
>>   006ea3e6: 41b01100 la  %r11,256(%r1)
>>   006ea3ea: 182a lr  %r2,%r10
>> [    2.170158] Call Trace:
>> [    2.170205] ([<02a96800>] 0x2a96800)
>> [    2.170248]  [<006ea4c0>] blk_mq_run_hw_queue+0xa0/0x100
>> [    2.170262]  [<006ea59c>] blk_mq_run_hw_queues+0x7c/0x98
>> [    2.170295]  [<006e88f6>] __blk_mq_complete_request+0x10e/0x1e0
>> [    2.170300]  [<006e9e30>] blk_mq_complete_request+0x80/0xa0
>> [    2.170307]  [<0087fad0>] dasd_block_tasklet+0x218/0x480
>> [    2.170415]  [<0017c3f8>] tasklet_hi_action+0xa0/0x138
>> [    2.170434]  [<00a91c10>] __do_softirq+0xc8/0x540
>> [    2.170471]  [<0017bd4e>] irq_exit+0x136/0x140
>> [    2.170478]  [<0010c912>] do_IRQ+0x8a/0xb8
>> [    2.170518]  [<00a90ee0>] io_int_handler+0x138/0x2e0
>> [    2.170524]  [<00102cd0>] enabled_wait+0x58/0x128
>> [    2.170562] ([<00102cb8>] enabled_wait+0x40/0x128)
>> [    2.170577]  [<0010319a>] arch_cpu_idle+0x32/0x48
>> [    2.170604]  [<00a8f636>] default_idle_call+0x3e/0x58
>> [    2.170613]  [<001cd5d2>] do_idle+0xda/0x190
>> [    2.170621]  [<001cd93e>] cpu_startup_entry+0x3e/0x48
>> [    2.170633]  [<00e5ebf4>] start_kernel+0x47c/0x490
>> [    2.170641]  [<00100020>] _stext+0x20/0x80
>> [    2.170650] 2 locks held by swapper/0/0:
>> [    2.170658]  #0:  (&(>lock)->rlock){..-.}, at: []
>> dasd_block_tasklet+0x1cc/0x480
>> [    2.170676]  #1:  (rcu_read_lock){}, at: []
>> hctx_lock+0x34/0x110
>> [    2.170750] Last Breaking-Event-Address:
>> [    2.170758]  [<006ea3d4>] __blk_mq_delay_run_hw_queue+0xe4/0x118
>> [    2.170803] ---[ end trace 1073cf0de1fd32d0 ]---
> 
> This warning is harmless, please try the following patch:
> 
> --
> From 7b2b5139bfef80f44d1b1424e09ab35b715fbfdb Mon Sep 17 00:00:00 2001
> From: Ming Lei 
> Date: Tue, 27 Mar 2018 19:54:23 +0800
> Subject: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
> 
> From commit 20e4d813931961fe ("blk-mq: simplify queue mapping & schedule
> with each possisble CPU") on, it should be easier to see unmapped hctx
> in some CPU topo, such as, hctx may not be mapped to any CPU.
> 
> This patch avoids the warning in __blk_mq_delay_run_hw_queue() by
> checking if the hctx is mapped in blk_mq_run_hw_queues().
> 
> Reported-by: Stefan Haberland 
> Cc: Christoph Hellwig 
> Fixes: 20e4d813931961fe ("blk-mq: simplify queue mapping & schedule with each 
> possisble CPU")
> Signed-off-by: Ming Lei 

Can we get this into 4.16?

> ---
>  block/blk-mq.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 16e83e6df404..48f25a63833b 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1459,7 +1459,7 @@ void blk_mq_run_hw_queues(struct 

Re: 4.16-RC7 WARNING: CPU: 2 PID: 0 at block/blk-mq.c:1400 __blk_mq_delay_run_hw_queue

2018-03-27 Thread Stefan Haberland



This warning is harmless, please try the following patch:

--
 From 7b2b5139bfef80f44d1b1424e09ab35b715fbfdb Mon Sep 17 00:00:00 2001
From: Ming Lei 
Date: Tue, 27 Mar 2018 19:54:23 +0800
Subject: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()

 From commit 20e4d813931961fe ("blk-mq: simplify queue mapping & schedule
with each possisble CPU") on, it should be easier to see unmapped hctx
in some CPU topo, such as, hctx may not be mapped to any CPU.

This patch avoids the warning in __blk_mq_delay_run_hw_queue() by
checking if the hctx is mapped in blk_mq_run_hw_queues().

Reported-by: Stefan Haberland 
Cc: Christoph Hellwig 
Fixes: 20e4d813931961fe ("blk-mq: simplify queue mapping & schedule with each 
possisble CPU")
Signed-off-by: Ming Lei 
---


Hi Ming,

thanks a lot for the patch. I tried it on top of our 4.16-rc7 branch and 
it works well.
I ran a small regression test set against it and I did not notice any 
warnings or other obvious impacts.


Regards,
Stefan




Re: 4.16-RC7 WARNING: CPU: 2 PID: 0 at block/blk-mq.c:1400 __blk_mq_delay_run_hw_queue

2018-03-27 Thread Ming Lei
Hi Stefan,

On Tue, Mar 27, 2018 at 12:04:20PM +0200, Stefan Haberland wrote:
> Hi,
> 
> I get the following warning in __blk_mq_delay_run_hw_queue when the
> scheduler is set to mq-deadline for DASD devices on s390.
> 
> What I see is that for whatever reason there is a hctx nr 0 which has no
> hctx->tags pointer set.
> From my observation it is always hctx nr 0 which has a tags NULL pointer in
> it and I see other hctx which have the hctx->tags pointer set correctly.
> 
> 
> [    2.169986] WARNING: CPU: 0 PID: 0 at block/blk-mq.c:1402
> __blk_mq_delay_run_hw_queue+0xe8/0x118
> [    2.170007] Modules linked in:
> [    2.170014] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
> 4.16.0-rc7-04107-g91a05d9e1d6b-dirty #147
> [    2.170019] Hardware name: IBM 2964 N96 702 (z/VM 6.4.0)
> [    2.170024] Krnl PSW : 76fd6c7f c244c24d
> (__blk_mq_delay_run_hw_queue+0xe8/0x118)
> [    2.170035]    R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0
> RI:0 EA:3
> [    2.170041] Krnl GPRS: 599ec58a 02a94000 02a94000
> 0001
> [    2.170047]     6e761e98 02a96980
> 02a96800
> [    2.170052]    02d87ce0 737fbda8 
> 0001
> [    2.170058]    0001 00aedd10 737fbc38
> 737fbc00
> [    2.170069] Krnl Code: 006ea3c8: ebaff0a4 lmg
> %r10,%r15,160(%r15)
>   006ea3ce: c0f45e0d brcl    15,6d5fe8
>  #006ea3d4: a7f40001 brc 15,6ea3d6
>  >006ea3d8: e340f0c4    lg 
> %r4,192(%r15)
>   006ea3de: ebaff0a4 lmg
> %r10,%r15,160(%r15)
>   006ea3e4: 07f4 bcr 15,%r4
>   006ea3e6: 41b01100 la  %r11,256(%r1)
>   006ea3ea: 182a lr  %r2,%r10
> [    2.170158] Call Trace:
> [    2.170205] ([<02a96800>] 0x2a96800)
> [    2.170248]  [<006ea4c0>] blk_mq_run_hw_queue+0xa0/0x100
> [    2.170262]  [<006ea59c>] blk_mq_run_hw_queues+0x7c/0x98
> [    2.170295]  [<006e88f6>] __blk_mq_complete_request+0x10e/0x1e0
> [    2.170300]  [<006e9e30>] blk_mq_complete_request+0x80/0xa0
> [    2.170307]  [<0087fad0>] dasd_block_tasklet+0x218/0x480
> [    2.170415]  [<0017c3f8>] tasklet_hi_action+0xa0/0x138
> [    2.170434]  [<00a91c10>] __do_softirq+0xc8/0x540
> [    2.170471]  [<0017bd4e>] irq_exit+0x136/0x140
> [    2.170478]  [<0010c912>] do_IRQ+0x8a/0xb8
> [    2.170518]  [<00a90ee0>] io_int_handler+0x138/0x2e0
> [    2.170524]  [<00102cd0>] enabled_wait+0x58/0x128
> [    2.170562] ([<00102cb8>] enabled_wait+0x40/0x128)
> [    2.170577]  [<0010319a>] arch_cpu_idle+0x32/0x48
> [    2.170604]  [<00a8f636>] default_idle_call+0x3e/0x58
> [    2.170613]  [<001cd5d2>] do_idle+0xda/0x190
> [    2.170621]  [<001cd93e>] cpu_startup_entry+0x3e/0x48
> [    2.170633]  [<00e5ebf4>] start_kernel+0x47c/0x490
> [    2.170641]  [<00100020>] _stext+0x20/0x80
> [    2.170650] 2 locks held by swapper/0/0:
> [    2.170658]  #0:  (&(>lock)->rlock){..-.}, at: []
> dasd_block_tasklet+0x1cc/0x480
> [    2.170676]  #1:  (rcu_read_lock){}, at: []
> hctx_lock+0x34/0x110
> [    2.170750] Last Breaking-Event-Address:
> [    2.170758]  [<006ea3d4>] __blk_mq_delay_run_hw_queue+0xe4/0x118
> [    2.170803] ---[ end trace 1073cf0de1fd32d0 ]---

This warning is harmless, please try the following patch:

--
>From 7b2b5139bfef80f44d1b1424e09ab35b715fbfdb Mon Sep 17 00:00:00 2001
From: Ming Lei 
Date: Tue, 27 Mar 2018 19:54:23 +0800
Subject: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()

>From commit 20e4d813931961fe ("blk-mq: simplify queue mapping & schedule
with each possisble CPU") on, it should be easier to see unmapped hctx
in some CPU topo, such as, hctx may not be mapped to any CPU.

This patch avoids the warning in __blk_mq_delay_run_hw_queue() by
checking if the hctx is mapped in blk_mq_run_hw_queues().

Reported-by: Stefan Haberland 
Cc: Christoph Hellwig 
Fixes: 20e4d813931961fe ("blk-mq: simplify queue mapping & schedule with each 
possisble CPU")
Signed-off-by: Ming Lei 
---
 block/blk-mq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 16e83e6df404..48f25a63833b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1459,7 +1459,7 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool 
async)
int i;
 
queue_for_each_hw_ctx(q, hctx, i) {
-   if (blk_mq_hctx_stopped(hctx))
+   if (blk_mq_hctx_stopped(hctx) || 

4.16-RC7 WARNING: CPU: 2 PID: 0 at block/blk-mq.c:1400 __blk_mq_delay_run_hw_queue

2018-03-27 Thread Stefan Haberland

Hi,

I get the following warning in __blk_mq_delay_run_hw_queue when the 
scheduler is set to mq-deadline for DASD devices on s390.


What I see is that for whatever reason there is a hctx nr 0 which has no 
hctx->tags pointer set.
From my observation it is always hctx nr 0 which has a tags NULL 
pointer in it and I see other hctx which have the hctx->tags pointer set 
correctly.



[    2.169986] WARNING: CPU: 0 PID: 0 at block/blk-mq.c:1402 
__blk_mq_delay_run_hw_queue+0xe8/0x118

[    2.170007] Modules linked in:
[    2.170014] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
4.16.0-rc7-04107-g91a05d9e1d6b-dirty #147

[    2.170019] Hardware name: IBM 2964 N96 702 (z/VM 6.4.0)
[    2.170024] Krnl PSW : 76fd6c7f c244c24d 
(__blk_mq_delay_run_hw_queue+0xe8/0x118)
[    2.170035]    R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 
PM:0 RI:0 EA:3
[    2.170041] Krnl GPRS: 599ec58a 02a94000 
02a94000 0001
[    2.170047]     6e761e98 
02a96980 02a96800
[    2.170052]    02d87ce0 737fbda8 
 0001
[    2.170058]    0001 00aedd10 
737fbc38 737fbc00
[    2.170069] Krnl Code: 006ea3c8: ebaff0a4 lmg 
%r10,%r15,160(%r15)

  006ea3ce: c0f45e0d brcl    15,6d5fe8
 #006ea3d4: a7f40001 brc 15,6ea3d6
 >006ea3d8: e340f0c4    lg  
%r4,192(%r15)
  006ea3de: ebaff0a4 lmg 
%r10,%r15,160(%r15)

  006ea3e4: 07f4 bcr 15,%r4
  006ea3e6: 41b01100 la  %r11,256(%r1)
  006ea3ea: 182a lr  %r2,%r10
[    2.170158] Call Trace:
[    2.170205] ([<02a96800>] 0x2a96800)
[    2.170248]  [<006ea4c0>] blk_mq_run_hw_queue+0xa0/0x100
[    2.170262]  [<006ea59c>] blk_mq_run_hw_queues+0x7c/0x98
[    2.170295]  [<006e88f6>] __blk_mq_complete_request+0x10e/0x1e0
[    2.170300]  [<006e9e30>] blk_mq_complete_request+0x80/0xa0
[    2.170307]  [<0087fad0>] dasd_block_tasklet+0x218/0x480
[    2.170415]  [<0017c3f8>] tasklet_hi_action+0xa0/0x138
[    2.170434]  [<00a91c10>] __do_softirq+0xc8/0x540
[    2.170471]  [<0017bd4e>] irq_exit+0x136/0x140
[    2.170478]  [<0010c912>] do_IRQ+0x8a/0xb8
[    2.170518]  [<00a90ee0>] io_int_handler+0x138/0x2e0
[    2.170524]  [<00102cd0>] enabled_wait+0x58/0x128
[    2.170562] ([<00102cb8>] enabled_wait+0x40/0x128)
[    2.170577]  [<0010319a>] arch_cpu_idle+0x32/0x48
[    2.170604]  [<00a8f636>] default_idle_call+0x3e/0x58
[    2.170613]  [<001cd5d2>] do_idle+0xda/0x190
[    2.170621]  [<001cd93e>] cpu_startup_entry+0x3e/0x48
[    2.170633]  [<00e5ebf4>] start_kernel+0x47c/0x490
[    2.170641]  [<00100020>] _stext+0x20/0x80
[    2.170650] 2 locks held by swapper/0/0:
[    2.170658]  #0:  (&(>lock)->rlock){..-.}, at: 
[] dasd_block_tasklet+0x1cc/0x480
[    2.170676]  #1:  (rcu_read_lock){}, at: [] 
hctx_lock+0x34/0x110

[    2.170750] Last Breaking-Event-Address:
[    2.170758]  [<006ea3d4>] __blk_mq_delay_run_hw_queue+0xe4/0x118
[    2.170803] ---[ end trace 1073cf0de1fd32d0 ]---



Regards,
Stefan