Re: [Xen-devel] 4.9.52: INFO: task XXX blocked for more than 300 seconds.

2017-10-05 Thread Ankur Arora

On 2017-10-05 06:20 AM, Konrad Rzeszutek Wilk wrote:

On Wed, Oct 04, 2017 at 08:26:27PM +0200, Philipp Hahn wrote:

Hello,



Adding Ankur to this as I think he saw something similar.

But in the meantime - do you see this with the latest version of Linux?

with linux-4.9.52 running on Debian-Wheezy with Xen-4.1 I observed
several stuck processes: Here is one (truncated) dump of the Linux
kernel messages:


  [] ? __schedule+0x23d/0x6d0
  [] ? bit_wait_timeout+0x90/0x90
  [] ? schedule+0x32/0x80
  [] ? schedule_timeout+0x1ec/0x360
  [] ? __blk_mq_run_hw_queue+0x327/0x3e0* see below
  [] ? xen_clocksource_get_cycles+0x11/0x20
  [] ? bit_wait_timeout+0x90/0x90
  [] ? io_schedule_timeout+0xb4/0x130
  [] ? prepare_to_wait+0x57/0x80
  [] ? bit_wait_io+0x17/0x60
  [] ? __wait_on_bit+0x5c/0x90
  [] ? bit_wait_timeout+0x90/0x90
  [] ? out_of_line_wait_on_bit+0x7e/0xa0
  [] ? autoremove_wake_function+0x40/0x40
  [] ? jbd2_journal_commit_transaction+0xd48/0x17e0 [jbd2]
  [] ? __switch_to+0x2c9/0x720
  [] ? try_to_del_timer_sync+0x4d/0x80
  [] ? kjournald2+0xdd/0x280 [jbd2]
  [] ? wake_up_atomic_t+0x30/0x30
  [] ? commit_timeout+0x10/0x10 [jbd2]
  [] ? kthread+0xf0/0x110
  [] ? __switch_to+0x2c9/0x720
  [] ? kthread_park+0x60/0x60
  [] ? ret_from_fork+0x25/0x30

This looks like this race: https://patchwork.kernel.org/patch/9853443/

Can you dump the output of: cat /sys/block/$xen-frontend-device/mq/*/tags

If you've hit this bug, one or more of the MQs would be wedged and
the nr_free in one or more of the queues would be 0 and will not
change.

Ankur


NMI backtrace for cpu 2
CPU: 2 PID: 35 Comm: khungtaskd Not tainted 4.9.0-ucs105-amd64 #1 Debian 
4.9.30-2A~4.2.0.201709271649
   81331935  0002
  81335e60 0002 8104cb70 8801f0c90e80
  81335f6a 8801f0c90e80 003fffbc 81128048
Call Trace:
  [] ? dump_stack+0x5c/0x77
  [] ? nmi_cpu_backtrace+0x90/0xa0
  [] ? irq_force_complete_move+0x140/0x140
  [] ? nmi_trigger_cpumask_backtrace+0xfa/0x130
  [] ? watchdog+0x2b8/0x330
  [] ? reset_hung_task_detector+0x10/0x10
  [] ? kthread+0xf0/0x110
  [] ? __switch_to+0x2c9/0x720
  [] ? kthread_park+0x60/0x60
  [] ? ret_from_fork+0x25/0x30
Sending NMI from CPU 2 to CPUs 0-1,3:
NMI backtrace for cpu 1
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.9.0-ucs105-amd64 #1 Debian 
4.9.30-2A~4.2.0.201709271649
task: 8801f4a02ec0 task.stack: c90040ca4000
RIP: e030:[]  [] 
xen_hypercall_sched_op+0xa/0x20
RSP: e02b:c90040ca7ed0  EFLAGS: 0246
RAX:  RBX: 8801f4a02ec0 RCX: 810013aa
RDX: 81c4ba70 RSI:  RDI: 0001
RBP: 0001 R08:  R09: 
R10: 7ff0 R11: 0246 R12: 
R13:  R14: 8801f4a02ec0 R15: 8801f4a02ec0
FS:  7f23ac595700() GS:8801f5a8() knlGS:
CS:  e033 DS: 002b ES: 002b CR0: 80050033
CR2: 7f52537d6d46 CR3: 0001bba23000 CR4: 2660
Stack:
  8801bb832201 0001 8101b55c 81611ec8
  8801f4a02ec0 0001 810bc280 8801f4a02ec0
  8801f4a02ec0 c0a995961d41240f addce6dcadd009c9 
Call Trace:
  [] ? xen_safe_halt+0xc/0x20
  [] ? default_idle+0x18/0xd0
  [] ? cpu_startup_entry+0x1f0/0x260
Code: cc 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc 
cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc 
cc cc cc cc cc cc cc cc cc cc cc cc
NMI backtrace for cpu 3
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.9.0-ucs105-amd64 #1 Debian 
4.9.30-2A~4.2.0.201709271649
task: 8801f4a24f00 task.stack: c90040cb4000
RIP: e030:[]  [] 
xen_hypercall_sched_op+0xa/0x20
RSP: e02b:c90040cb7ed0  EFLAGS: 0246
RAX:  RBX: 8801f4a24f00 RCX: 810013aa
RDX: 81c4ba70 RSI:  RDI: 0001
RBP: 0003 R08:  R09: 
R10: 7ff0 R11: 0246 R12: 
R13:  R14: 8801f4a24f00 R15: 8801f4a24f00
FS:  7f1a2af19700() GS:8801f5b8() knlGS:
CS:  e033 DS: 002b ES: 002b CR0: 80050033
CR2: 7f4a5416b000 CR3: 0001d83ec000 CR4: 2660
Stack:
  0001 0001 8101b55c 81611ec8
  8801f4a24f00 0003 810bc280 8801f4a24f00
  8801f4a24f00 77816deb133b9979 addce6dcadd009c9 
Call Trace:
  [] ? xen_safe_halt+0xc/0x20
  [] ? default_idle+0x18/0xd0
  [] ? cpu_startup_entry+0x1f0/0x260
Code: cc 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc 
cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc 
cc cc cc cc cc cc cc cc cc cc cc cc
NMI backtrace for cpu 0
CPU: 0 PID: 0 Comm: swapper/0 

Re: [Xen-devel] 4.9.52: INFO: task XXX blocked for more than 300 seconds.

2017-10-05 Thread Konrad Rzeszutek Wilk
On Wed, Oct 04, 2017 at 08:26:27PM +0200, Philipp Hahn wrote:
> Hello,
> 

Adding Ankur to this as I think he saw something similar. 

But in the meantime - do you see this with the latest version of Linux?
> with linux-4.9.52 running on Debian-Wheezy with Xen-4.1 I observed
> several stuck processes: Here is one (truncated) dump of the Linux
> kernel messages:
> 
> >  [] ? __schedule+0x23d/0x6d0
> >  [] ? bit_wait_timeout+0x90/0x90
> >  [] ? schedule+0x32/0x80
> >  [] ? schedule_timeout+0x1ec/0x360
> >  [] ? __blk_mq_run_hw_queue+0x327/0x3e0* see below
> >  [] ? xen_clocksource_get_cycles+0x11/0x20
> >  [] ? bit_wait_timeout+0x90/0x90
> >  [] ? io_schedule_timeout+0xb4/0x130
> >  [] ? prepare_to_wait+0x57/0x80
> >  [] ? bit_wait_io+0x17/0x60
> >  [] ? __wait_on_bit+0x5c/0x90
> >  [] ? bit_wait_timeout+0x90/0x90
> >  [] ? out_of_line_wait_on_bit+0x7e/0xa0
> >  [] ? autoremove_wake_function+0x40/0x40
> >  [] ? jbd2_journal_commit_transaction+0xd48/0x17e0 [jbd2]
> >  [] ? __switch_to+0x2c9/0x720
> >  [] ? try_to_del_timer_sync+0x4d/0x80
> >  [] ? kjournald2+0xdd/0x280 [jbd2]
> >  [] ? wake_up_atomic_t+0x30/0x30
> >  [] ? commit_timeout+0x10/0x10 [jbd2]
> >  [] ? kthread+0xf0/0x110
> >  [] ? __switch_to+0x2c9/0x720
> >  [] ? kthread_park+0x60/0x60
> >  [] ? ret_from_fork+0x25/0x30
> > NMI backtrace for cpu 2
> > CPU: 2 PID: 35 Comm: khungtaskd Not tainted 4.9.0-ucs105-amd64 #1 Debian 
> > 4.9.30-2A~4.2.0.201709271649
> >   81331935  0002
> >  81335e60 0002 8104cb70 8801f0c90e80
> >  81335f6a 8801f0c90e80 003fffbc 81128048
> > Call Trace:
> >  [] ? dump_stack+0x5c/0x77
> >  [] ? nmi_cpu_backtrace+0x90/0xa0
> >  [] ? irq_force_complete_move+0x140/0x140
> >  [] ? nmi_trigger_cpumask_backtrace+0xfa/0x130
> >  [] ? watchdog+0x2b8/0x330
> >  [] ? reset_hung_task_detector+0x10/0x10
> >  [] ? kthread+0xf0/0x110
> >  [] ? __switch_to+0x2c9/0x720
> >  [] ? kthread_park+0x60/0x60
> >  [] ? ret_from_fork+0x25/0x30
> > Sending NMI from CPU 2 to CPUs 0-1,3:
> > NMI backtrace for cpu 1
> > CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.9.0-ucs105-amd64 #1 Debian 
> > 4.9.30-2A~4.2.0.201709271649
> > task: 8801f4a02ec0 task.stack: c90040ca4000
> > RIP: e030:[]  [] 
> > xen_hypercall_sched_op+0xa/0x20
> > RSP: e02b:c90040ca7ed0  EFLAGS: 0246
> > RAX:  RBX: 8801f4a02ec0 RCX: 810013aa
> > RDX: 81c4ba70 RSI:  RDI: 0001
> > RBP: 0001 R08:  R09: 
> > R10: 7ff0 R11: 0246 R12: 
> > R13:  R14: 8801f4a02ec0 R15: 8801f4a02ec0
> > FS:  7f23ac595700() GS:8801f5a8() knlGS:
> > CS:  e033 DS: 002b ES: 002b CR0: 80050033
> > CR2: 7f52537d6d46 CR3: 0001bba23000 CR4: 2660
> > Stack:
> >  8801bb832201 0001 8101b55c 81611ec8
> >  8801f4a02ec0 0001 810bc280 8801f4a02ec0
> >  8801f4a02ec0 c0a995961d41240f addce6dcadd009c9 
> > Call Trace:
> >  [] ? xen_safe_halt+0xc/0x20
> >  [] ? default_idle+0x18/0xd0
> >  [] ? cpu_startup_entry+0x1f0/0x260
> > Code: cc 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc 
> > cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 
> > cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 
> > NMI backtrace for cpu 3
> > CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.9.0-ucs105-amd64 #1 Debian 
> > 4.9.30-2A~4.2.0.201709271649
> > task: 8801f4a24f00 task.stack: c90040cb4000
> > RIP: e030:[]  [] 
> > xen_hypercall_sched_op+0xa/0x20
> > RSP: e02b:c90040cb7ed0  EFLAGS: 0246
> > RAX:  RBX: 8801f4a24f00 RCX: 810013aa
> > RDX: 81c4ba70 RSI:  RDI: 0001
> > RBP: 0003 R08:  R09: 
> > R10: 7ff0 R11: 0246 R12: 
> > R13:  R14: 8801f4a24f00 R15: 8801f4a24f00
> > FS:  7f1a2af19700() GS:8801f5b8() knlGS:
> > CS:  e033 DS: 002b ES: 002b CR0: 80050033
> > CR2: 7f4a5416b000 CR3: 0001d83ec000 CR4: 2660
> > Stack:
> >  0001 0001 8101b55c 81611ec8
> >  8801f4a24f00 0003 810bc280 8801f4a24f00
> >  8801f4a24f00 77816deb133b9979 addce6dcadd009c9 
> > Call Trace:
> >  [] ? xen_safe_halt+0xc/0x20
> >  [] ? default_idle+0x18/0xd0
> >  [] ? cpu_startup_entry+0x1f0/0x260
> > Code: cc 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc 
> > cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 
> > cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 
> > NMI backtrace for cpu 0
> > CPU: 0 PID: 0 Comm: swapper/0 Not 

Re: [Xen-devel] 4.9.52: INFO: task XXX blocked for more than 300 seconds.

2017-10-05 Thread Philipp Hahn
Hello Jan,

thank you for you answer.

Am 05.10.2017 um 12:12 schrieb Jan Beulich:
 On 04.10.17 at 20:26,  wrote:
>> with linux-4.9.52 running on Debian-Wheezy with Xen-4.1 I observed
>> several stuck processes: Here is one (truncated) dump of the Linux
>> kernel messages:
>>
>>>  [] ? __schedule+0x23d/0x6d0
>>>  [] ? bit_wait_timeout+0x90/0x90
>>>  [] ? schedule+0x32/0x80
>>>  [] ? schedule_timeout+0x1ec/0x360
>>>  [] ? __blk_mq_run_hw_queue+0x327/0x3e0* see below
>>>  [] ? xen_clocksource_get_cycles+0x11/0x20
>>>  [] ? bit_wait_timeout+0x90/0x90
>>>  [] ? io_schedule_timeout+0xb4/0x130
>>>  [] ? prepare_to_wait+0x57/0x80
>>>  [] ? bit_wait_io+0x17/0x60
>>>  [] ? __wait_on_bit+0x5c/0x90
>>>  [] ? bit_wait_timeout+0x90/0x90
>>>  [] ? out_of_line_wait_on_bit+0x7e/0xa0
>>>  [] ? autoremove_wake_function+0x40/0x40
>>>  [] ? jbd2_journal_commit_transaction+0xd48/0x17e0 [jbd2]
>>>  [] ? __switch_to+0x2c9/0x720
>>>  [] ? try_to_del_timer_sync+0x4d/0x80
>>>  [] ? kjournald2+0xdd/0x280 [jbd2]
>>>  [] ? wake_up_atomic_t+0x30/0x30
>>>  [] ? commit_timeout+0x10/0x10 [jbd2]
>>>  [] ? kthread+0xf0/0x110
>>>  [] ? __switch_to+0x2c9/0x720
>>>  [] ? kthread_park+0x60/0x60
>>>  [] ? ret_from_fork+0x25/0x30
>>> NMI backtrace for cpu 2
>>> CPU: 2 PID: 35 Comm: khungtaskd Not tainted 4.9.0-ucs105-amd64 #1 Debian 
>>> 4.9.30-2A~4.2.0.201709271649
>>>   81331935  0002
>>>  81335e60 0002 8104cb70 8801f0c90e80
>>>  81335f6a 8801f0c90e80 003fffbc 81128048
>>> Call Trace:
>>>  [] ? dump_stack+0x5c/0x77
>>>  [] ? nmi_cpu_backtrace+0x90/0xa0
>>>  [] ? irq_force_complete_move+0x140/0x140
>>>  [] ? nmi_trigger_cpumask_backtrace+0xfa/0x130
>>>  [] ? watchdog+0x2b8/0x330
>>>  [] ? reset_hung_task_detector+0x10/0x10
>>>  [] ? kthread+0xf0/0x110
>>>  [] ? __switch_to+0x2c9/0x720
>>>  [] ? kthread_park+0x60/0x60
>>>  [] ? ret_from_fork+0x25/0x30
...
>> Looking at the dis-assembly of xen_clocksource_get_cycles() in
>> arch/x86/xen/time.c I see no path how that should call
>> __blk_mq_run_hw_queue():
> 
> Hence the question marks ahead of the stack entries: What you see
> there are likely leftovers from prior call trees. It just so happens
> that the old return address slots haven't got overwritten yet. You
> need to first sanitize the stack trace e.g. by having the kernel
> dump more of the stack in raw hex form, and then looking at the
> disassembly to figure out how large each stack frame is, starting
> at the top-most address (i.e. the one in RIP).

That explains the strange call trace for me, thank you for the
enlightenment.

> On Wed, Oct 04, 2017 at 06:26:27PM +, Philipp Hahn wrote:
>>> INFO: task btrfs-transacti:522 blocked for more than 300 seconds.
> [...] 
>> And another one:
>>> INFO: task smbd:20101 blocked for more than 300 seconds.
> [...] 
>> This does not look normal to me or did I miss something?
> 
> So I see that both of the stuck processes listed above (smbd and
> btrfs-*) are disk related processes. Might I ask how many disk/nics
> (PV) do you have attached to this DomU, and how many queues does each
> have?

Nothing special configured, how would I best fetch that info?


Which leads me back to my original problem: How can I diagnose *why* the
task is blocked for that time? From my understanding this can happen if
IO is too slow and task just have to wait for too long. ¹
Even if IO is slow the system should stabilize itself when no new IO is
generated and the old one has been processed, right? So looking at
`vmstat` or `blktrace` should tell me, that Xen/Linux/whatever is busy
with IO and it is simply not fast enough to keep up with the load.

Thanks again, but any hint how to diagnose this does help.

Philipp

¹


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] 4.9.52: INFO: task XXX blocked for more than 300 seconds.

2017-10-05 Thread Roger Pau Monné
On Wed, Oct 04, 2017 at 06:26:27PM +, Philipp Hahn wrote:
> > INFO: task btrfs-transacti:522 blocked for more than 300 seconds.
[...] 
> And another one:
> > INFO: task smbd:20101 blocked for more than 300 seconds.
[...] 
> This does not look normal to me or did I miss something?

So I see that both of the stuck processes listed above (smbd and
btrfs-*) are disk related processes. Might I ask how many disk/nics
(PV) do you have attached to this DomU, and how many queues does each
have?

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] 4.9.52: INFO: task XXX blocked for more than 300 seconds.

2017-10-05 Thread Jan Beulich
>>> On 04.10.17 at 20:26,  wrote:
> Hello,
> 
> with linux-4.9.52 running on Debian-Wheezy with Xen-4.1 I observed
> several stuck processes: Here is one (truncated) dump of the Linux
> kernel messages:
> 
>>  [] ? __schedule+0x23d/0x6d0
>>  [] ? bit_wait_timeout+0x90/0x90
>>  [] ? schedule+0x32/0x80
>>  [] ? schedule_timeout+0x1ec/0x360
>>  [] ? __blk_mq_run_hw_queue+0x327/0x3e0* see below
>>  [] ? xen_clocksource_get_cycles+0x11/0x20
>>  [] ? bit_wait_timeout+0x90/0x90
>>  [] ? io_schedule_timeout+0xb4/0x130
>>  [] ? prepare_to_wait+0x57/0x80
>>  [] ? bit_wait_io+0x17/0x60
>>  [] ? __wait_on_bit+0x5c/0x90
>>  [] ? bit_wait_timeout+0x90/0x90
>>  [] ? out_of_line_wait_on_bit+0x7e/0xa0
>>  [] ? autoremove_wake_function+0x40/0x40
>>  [] ? jbd2_journal_commit_transaction+0xd48/0x17e0 [jbd2]
>>  [] ? __switch_to+0x2c9/0x720
>>  [] ? try_to_del_timer_sync+0x4d/0x80
>>  [] ? kjournald2+0xdd/0x280 [jbd2]
>>  [] ? wake_up_atomic_t+0x30/0x30
>>  [] ? commit_timeout+0x10/0x10 [jbd2]
>>  [] ? kthread+0xf0/0x110
>>  [] ? __switch_to+0x2c9/0x720
>>  [] ? kthread_park+0x60/0x60
>>  [] ? ret_from_fork+0x25/0x30
>> NMI backtrace for cpu 2
>> CPU: 2 PID: 35 Comm: khungtaskd Not tainted 4.9.0-ucs105-amd64 #1 Debian 
>> 4.9.30-2A~4.2.0.201709271649
>>   81331935  0002
>>  81335e60 0002 8104cb70 8801f0c90e80
>>  81335f6a 8801f0c90e80 003fffbc 81128048
>> Call Trace:
>>  [] ? dump_stack+0x5c/0x77
>>  [] ? nmi_cpu_backtrace+0x90/0xa0
>>  [] ? irq_force_complete_move+0x140/0x140
>>  [] ? nmi_trigger_cpumask_backtrace+0xfa/0x130
>>  [] ? watchdog+0x2b8/0x330
>>  [] ? reset_hung_task_detector+0x10/0x10
>>  [] ? kthread+0xf0/0x110
>>  [] ? __switch_to+0x2c9/0x720
>>  [] ? kthread_park+0x60/0x60
>>  [] ? ret_from_fork+0x25/0x30
>> Sending NMI from CPU 2 to CPUs 0-1,3:
>> NMI backtrace for cpu 1
>> CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.9.0-ucs105-amd64 #1 Debian 
> 4.9.30-2A~4.2.0.201709271649
>> task: 8801f4a02ec0 task.stack: c90040ca4000
>> RIP: e030:[]  [] 
>> xen_hypercall_sched_op+0xa/0x20
>> RSP: e02b:c90040ca7ed0  EFLAGS: 0246
>> RAX:  RBX: 8801f4a02ec0 RCX: 810013aa
>> RDX: 81c4ba70 RSI:  RDI: 0001
>> RBP: 0001 R08:  R09: 
>> R10: 7ff0 R11: 0246 R12: 
>> R13:  R14: 8801f4a02ec0 R15: 8801f4a02ec0
>> FS:  7f23ac595700() GS:8801f5a8() knlGS:
>> CS:  e033 DS: 002b ES: 002b CR0: 80050033
>> CR2: 7f52537d6d46 CR3: 0001bba23000 CR4: 2660
>> Stack:
>>  8801bb832201 0001 8101b55c 81611ec8
>>  8801f4a02ec0 0001 810bc280 8801f4a02ec0
>>  8801f4a02ec0 c0a995961d41240f addce6dcadd009c9 
>> Call Trace:
>>  [] ? xen_safe_halt+0xc/0x20
>>  [] ? default_idle+0x18/0xd0
>>  [] ? cpu_startup_entry+0x1f0/0x260
>> Code: cc 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc 
>> cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc 
>> cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 
>> NMI backtrace for cpu 3
>> CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.9.0-ucs105-amd64 #1 Debian 
>> 4.9.30-2A~4.2.0.201709271649
>> task: 8801f4a24f00 task.stack: c90040cb4000
>> RIP: e030:[]  [] 
>> xen_hypercall_sched_op+0xa/0x20
>> RSP: e02b:c90040cb7ed0  EFLAGS: 0246
>> RAX:  RBX: 8801f4a24f00 RCX: 810013aa
>> RDX: 81c4ba70 RSI:  RDI: 0001
>> RBP: 0003 R08:  R09: 
>> R10: 7ff0 R11: 0246 R12: 
>> R13:  R14: 8801f4a24f00 R15: 8801f4a24f00
>> FS:  7f1a2af19700() GS:8801f5b8() knlGS:
>> CS:  e033 DS: 002b ES: 002b CR0: 80050033
>> CR2: 7f4a5416b000 CR3: 0001d83ec000 CR4: 2660
>> Stack:
>>  0001 0001 8101b55c 81611ec8
>>  8801f4a24f00 0003 810bc280 8801f4a24f00
>>  8801f4a24f00 77816deb133b9979 addce6dcadd009c9 
>> Call Trace:
>>  [] ? xen_safe_halt+0xc/0x20
>>  [] ? default_idle+0x18/0xd0
>>  [] ? cpu_startup_entry+0x1f0/0x260
>> Code: cc 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc 
>> cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc 
>> cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 
>> NMI backtrace for cpu 0
>> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-ucs105-amd64 #1 Debian 
>> 4.9.30-2A~4.2.0.201709271649
>> task: 81c0e540 task.stack: 81c0
>> RIP: e030:[]  [] 
>> xen_hypercall_sched_op+0xa/0x20
>> RSP: e02b:81c03e90  EFLAGS: 0246
>> RAX: 

[Xen-devel] 4.9.52: INFO: task XXX blocked for more than 300 seconds.

2017-10-05 Thread Philipp Hahn
Hello,

with linux-4.9.52 running on Debian-Wheezy with Xen-4.1 I observed
several stuck processes: Here is one (truncated) dump of the Linux
kernel messages:

>  [] ? __schedule+0x23d/0x6d0
>  [] ? bit_wait_timeout+0x90/0x90
>  [] ? schedule+0x32/0x80
>  [] ? schedule_timeout+0x1ec/0x360
>  [] ? __blk_mq_run_hw_queue+0x327/0x3e0* see below
>  [] ? xen_clocksource_get_cycles+0x11/0x20
>  [] ? bit_wait_timeout+0x90/0x90
>  [] ? io_schedule_timeout+0xb4/0x130
>  [] ? prepare_to_wait+0x57/0x80
>  [] ? bit_wait_io+0x17/0x60
>  [] ? __wait_on_bit+0x5c/0x90
>  [] ? bit_wait_timeout+0x90/0x90
>  [] ? out_of_line_wait_on_bit+0x7e/0xa0
>  [] ? autoremove_wake_function+0x40/0x40
>  [] ? jbd2_journal_commit_transaction+0xd48/0x17e0 [jbd2]
>  [] ? __switch_to+0x2c9/0x720
>  [] ? try_to_del_timer_sync+0x4d/0x80
>  [] ? kjournald2+0xdd/0x280 [jbd2]
>  [] ? wake_up_atomic_t+0x30/0x30
>  [] ? commit_timeout+0x10/0x10 [jbd2]
>  [] ? kthread+0xf0/0x110
>  [] ? __switch_to+0x2c9/0x720
>  [] ? kthread_park+0x60/0x60
>  [] ? ret_from_fork+0x25/0x30
> NMI backtrace for cpu 2
> CPU: 2 PID: 35 Comm: khungtaskd Not tainted 4.9.0-ucs105-amd64 #1 Debian 
> 4.9.30-2A~4.2.0.201709271649
>   81331935  0002
>  81335e60 0002 8104cb70 8801f0c90e80
>  81335f6a 8801f0c90e80 003fffbc 81128048
> Call Trace:
>  [] ? dump_stack+0x5c/0x77
>  [] ? nmi_cpu_backtrace+0x90/0xa0
>  [] ? irq_force_complete_move+0x140/0x140
>  [] ? nmi_trigger_cpumask_backtrace+0xfa/0x130
>  [] ? watchdog+0x2b8/0x330
>  [] ? reset_hung_task_detector+0x10/0x10
>  [] ? kthread+0xf0/0x110
>  [] ? __switch_to+0x2c9/0x720
>  [] ? kthread_park+0x60/0x60
>  [] ? ret_from_fork+0x25/0x30
> Sending NMI from CPU 2 to CPUs 0-1,3:
> NMI backtrace for cpu 1
> CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.9.0-ucs105-amd64 #1 Debian 
> 4.9.30-2A~4.2.0.201709271649
> task: 8801f4a02ec0 task.stack: c90040ca4000
> RIP: e030:[]  [] 
> xen_hypercall_sched_op+0xa/0x20
> RSP: e02b:c90040ca7ed0  EFLAGS: 0246
> RAX:  RBX: 8801f4a02ec0 RCX: 810013aa
> RDX: 81c4ba70 RSI:  RDI: 0001
> RBP: 0001 R08:  R09: 
> R10: 7ff0 R11: 0246 R12: 
> R13:  R14: 8801f4a02ec0 R15: 8801f4a02ec0
> FS:  7f23ac595700() GS:8801f5a8() knlGS:
> CS:  e033 DS: 002b ES: 002b CR0: 80050033
> CR2: 7f52537d6d46 CR3: 0001bba23000 CR4: 2660
> Stack:
>  8801bb832201 0001 8101b55c 81611ec8
>  8801f4a02ec0 0001 810bc280 8801f4a02ec0
>  8801f4a02ec0 c0a995961d41240f addce6dcadd009c9 
> Call Trace:
>  [] ? xen_safe_halt+0xc/0x20
>  [] ? default_idle+0x18/0xd0
>  [] ? cpu_startup_entry+0x1f0/0x260
> Code: cc 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc 
> cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc 
> cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 
> NMI backtrace for cpu 3
> CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.9.0-ucs105-amd64 #1 Debian 
> 4.9.30-2A~4.2.0.201709271649
> task: 8801f4a24f00 task.stack: c90040cb4000
> RIP: e030:[]  [] 
> xen_hypercall_sched_op+0xa/0x20
> RSP: e02b:c90040cb7ed0  EFLAGS: 0246
> RAX:  RBX: 8801f4a24f00 RCX: 810013aa
> RDX: 81c4ba70 RSI:  RDI: 0001
> RBP: 0003 R08:  R09: 
> R10: 7ff0 R11: 0246 R12: 
> R13:  R14: 8801f4a24f00 R15: 8801f4a24f00
> FS:  7f1a2af19700() GS:8801f5b8() knlGS:
> CS:  e033 DS: 002b ES: 002b CR0: 80050033
> CR2: 7f4a5416b000 CR3: 0001d83ec000 CR4: 2660
> Stack:
>  0001 0001 8101b55c 81611ec8
>  8801f4a24f00 0003 810bc280 8801f4a24f00
>  8801f4a24f00 77816deb133b9979 addce6dcadd009c9 
> Call Trace:
>  [] ? xen_safe_halt+0xc/0x20
>  [] ? default_idle+0x18/0xd0
>  [] ? cpu_startup_entry+0x1f0/0x260
> Code: cc 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc 
> cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc 
> cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 
> NMI backtrace for cpu 0
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-ucs105-amd64 #1 Debian 
> 4.9.30-2A~4.2.0.201709271649
> task: 81c0e540 task.stack: 81c0
> RIP: e030:[]  [] 
> xen_hypercall_sched_op+0xa/0x20
> RSP: e02b:81c03e90  EFLAGS: 0246
> RAX:  RBX: 81c0e540 RCX: 810013aa
> RDX: 81c4ba70 RSI:  RDI: 0001
> RBP:  R08: