Re: [linux-next] cpus stalls detected few hours after booting next kernel

2017-07-24 Thread Abdul Haleem
On Fri, 2017-06-30 at 17:28 +1000, Nicholas Piggin wrote:
> On Fri, 30 Jun 2017 10:52:18 +0530
> Abdul Haleem  wrote:
> 
> > On Fri, 2017-06-30 at 00:45 +1000, Nicholas Piggin wrote:
> > > On Thu, 29 Jun 2017 20:23:05 +1000
> > > Nicholas Piggin  wrote:
> > > 
> > > > On Thu, 29 Jun 2017 19:36:14 +1000
> > > > Nicholas Piggin  wrote:
> > > 
> > > > > I don't *think* the replay-wakeup-interrupt patch is directly 
> > > > > involved, but
> > > > > it's likely to be one of the idle patches.  
> > > 
> > > Okay this turned out to be misconfigured sleep states I added for the
> > > simulator, sorry for the false alarm.
> > > 
> > > > Although you have this in the backtrace. I wonder if that's a stuck
> > > > lock in rcu_process_callbacks?
> > > 
> > > So this spinlock becomes top of the list of suspects. Can you try
> > > enabling lockdep and try to reproduce it?
> > 
> > Yes, recreated again with CONFIG_LOCKDEP=y & CONFIG_DEBUG_LOCKDEP=y set.
> > I do not see any difference in trace messages with and without LOCKDEP
> > enabled.
> > 
> > Please find the attached log file.
> 
> Can you get an rcu_invoke_callback event trace that Paul suggested?

Yes, I have collected the perf report.
> 
> Does this bug show up with just the powerpc next branch?

I have now started seeing the call trace on mainline too (4.13.0-rc2).

> 
> Thanks,
> Nick
> 


-- 
Regards,

Abdul Haleem
IBM Linux Technology Centre





Re: [linux-next] cpus stalls detected few hours after booting next kernel

2017-07-07 Thread Abdul Haleem
On Fri, 2017-06-30 at 17:28 +1000, Nicholas Piggin wrote:
> On Fri, 30 Jun 2017 10:52:18 +0530
> Abdul Haleem  wrote:
> 
> > On Fri, 2017-06-30 at 00:45 +1000, Nicholas Piggin wrote:
> > > On Thu, 29 Jun 2017 20:23:05 +1000
> > > Nicholas Piggin  wrote:
> > > 
> > > > On Thu, 29 Jun 2017 19:36:14 +1000
> > > > Nicholas Piggin  wrote:
> > > 
> > > > > I don't *think* the replay-wakeup-interrupt patch is directly 
> > > > > involved, but
> > > > > it's likely to be one of the idle patches.  
> > > 
> > > Okay this turned out to be misconfigured sleep states I added for the
> > > simulator, sorry for the false alarm.
> > > 
> > > > Although you have this in the backtrace. I wonder if that's a stuck
> > > > lock in rcu_process_callbacks?
> > > 
> > > So this spinlock becomes top of the list of suspects. Can you try
> > > enabling lockdep and try to reproduce it?
> > 
> > Yes, recreated again with CONFIG_LOCKDEP=y & CONFIG_DEBUG_LOCKDEP=y set.
> > I do not see any difference in trace messages with and without LOCKDEP
> > enabled.
> > 
> > Please find the attached log file.
> 
> Can you get an rcu_invoke_callback event trace that Paul suggested?

Yes, I was able to collect the perf data for the rcu_invoke_callback event
on a recent next kernel (4.12.0-next-20170705). The issue is rare to hit.

After booting the next kernel, I started the command 'perf record -e
rcu:rcu_invoke_callback -a -g -- cat' and waited for 30 minutes.

Five minutes after seeing the stall messages, I pressed CTRL-C to end the
perf command.

@Nicholas: the perf.data file is too large to attach here; shall I send
you its internal location via Slack or mail? The machine is also still in
the same state if you want to use it.

> 
> Does this bug show up with just the powerpc next branch?
> 
> Thanks,
> Nick
> 


-- 
Regards,

Abdul Haleem
IBM Linux Technology Centre





Re: [linux-next] cpus stalls detected few hours after booting next kernel

2017-07-04 Thread Paul E. McKenney
On Tue, Jul 04, 2017 at 07:44:58PM +0530, Abdul Haleem wrote:
> On Fri, 2017-06-30 at 17:28 +1000, Nicholas Piggin wrote:
> > On Fri, 30 Jun 2017 10:52:18 +0530
> > Abdul Haleem  wrote:
> > 
> > > On Fri, 2017-06-30 at 00:45 +1000, Nicholas Piggin wrote:
> > > > On Thu, 29 Jun 2017 20:23:05 +1000
> > > > Nicholas Piggin  wrote:
> > > > 
> > > > > On Thu, 29 Jun 2017 19:36:14 +1000
> > > > > Nicholas Piggin  wrote:
> > > > 
> > > > > > I don't *think* the replay-wakeup-interrupt patch is directly 
> > > > > > involved, but
> > > > > > it's likely to be one of the idle patches.  
> > > > 
> > > > Okay this turned out to be misconfigured sleep states I added for the
> > > > simulator, sorry for the false alarm.
> > > > 
> > > > > Although you have this in the backtrace. I wonder if that's a stuck
> > > > > lock in rcu_process_callbacks?
> > > > 
> > > > So this spinlock becomes top of the list of suspects. Can you try
> > > > enabling lockdep and try to reproduce it?
> > > 
> > > Yes, recreated again with CONFIG_LOCKDEP=y & CONFIG_DEBUG_LOCKDEP=y set.
> > > I do not see any difference in trace messages with and without LOCKDEP
> > > enabled.
> > > 
> > > Please find the attached log file.
> > 
> > Can you get an rcu_invoke_callback event trace that Paul suggested?
> 
> I could not reproduce the issue with the latest next kernel
> (4.12.0-rc7-next-20170703).
> 
> > Does this bug show up with just the powerpc next branch?
> 
> Perhaps; a stress-ng test on today's mainline kernel (4.12.0) triggered a
> different trace message, and this is not reproducible all the time.
> 
> 
> stress-ng --io 100  --vm 10 --vm-bytes 100g --timeout 1h --oomable
> 
> INFO: rcu_sched self-detected stall on CPU
> 8-...: (2099 ticks this GP) idle=37e/141/0 softirq=1404131/1404131 fqs=932 
>  (t=2100 jiffies g=394894 c=394893 q=21997)
> Task dump for CPU 8:
> kworker/u162:2  R  running task    11168 28468      2 0x0884
> Workqueue: writeback wb_workfn (flush-253:1)
> Call Trace:
> [c0036e3eb340] [c0114480] sched_show_task+0xf0/0x160 (unreliable)
> [c0036e3eb3b0] [c09adc2c] rcu_dump_cpu_stacks+0xd0/0x134
> [c0036e3eb400] [c015e4d0] rcu_check_callbacks+0x8f0/0xaf0
> [c0036e3eb530] [c0165e2c] update_process_times+0x3c/0x90
> [c0036e3eb560] [c017b73c] tick_sched_handle.isra.13+0x2c/0xc0
> [c0036e3eb590] [c017b828] tick_sched_timer+0x58/0xb0
> [c0036e3eb5d0] [c01669e8] __hrtimer_run_queues+0xf8/0x330
> [c0036e3eb650] [c0167744] hrtimer_interrupt+0xe4/0x280
> [c0036e3eb710] [c0022620] __timer_interrupt+0x90/0x270
> [c0036e3eb760] [c0022cf0] timer_interrupt+0xa0/0xe0
> [c0036e3eb790] [c00091e8] decrementer_common+0x158/0x160
> --- interrupt: 901 at move_expired_inodes+0x30/0x200  LR = queue_io+0x8c/0x190
> [c0036e3eba80] [c0036e3ebac0] 0xc0036e3ebac0 (unreliable)
> [c0036e3ebac0] [c0319b24] wb_writeback+0x2b4/0x420

The usual assumption would be that wb_writeback() is looping or...

> [c0036e3ebb90] [c031a980] wb_workfn+0xf0/0x4b0
> [c0036e3ebca0] [c00fa160] process_one_work+0x180/0x470
> [c0036e3ebd30] [c00fa6d4] worker_thread+0x284/0x500

...there is so much work being scheduled that worker_thread() cannot
keep up.  If tracing locates such a loop, the usual trick is to
place a cond_resched_rcu_qs() somewhere in it (but of course not
where interrupts or preemption are disabled).

Tracing can help detect this.  Or printk()s, for that matter.  ;-)
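
(For illustration, a minimal sketch of that trick; the list walk, the
item type, and do_one_item() below are hypothetical stand-ins, not the
actual writeback code:)

    #include <linux/list.h>
    #include <linux/rcupdate.h>

    struct item {
        struct list_head node;
    };

    static void do_one_item(struct item *it)
    {
        /* ... per-item work, standing in for the writeback pass ... */
    }

    /* A long-running loop that would otherwise hold off RCU. */
    static void process_items(struct list_head *head)
    {
        struct item *it;

        list_for_each_entry(it, head, node) {
            do_one_item(it);
            /*
             * Report a quiescent state on each pass so the grace
             * period can advance; must not be called with interrupts
             * or preemption disabled.
             */
            cond_resched_rcu_qs();
        }
    }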

Thanx, Paul

> [c0036e3ebdc0] [c0101fc0] kthread+0x160/0x1a0
> [c0036e3ebe30] [c000bb60] ret_from_kernel_thread+0x5c/0x7c
> INFO: rcu_sched self-detected stall on CPU
> 51-...: (2099 ticks this GP) idle=7c2/141/0 softirq=1272749/1272749 fqs=995 
>  (t=2100 jiffies g=394900 c=394899 q=32186)
> 
> stress-ng : http://kernel.ubuntu.com/git/cking/stress-ng.git
> 
> > 
> > Thanks,
> > Nick
> > 
> 
> 
> -- 
> Regards,
> 
> Abdul Haleem
> IBM Linux Technology Centre
> 
> 
> 



Re: [linux-next] cpus stalls detected few hours after booting next kernel

2017-07-04 Thread Abdul Haleem
On Fri, 2017-06-30 at 17:28 +1000, Nicholas Piggin wrote:
> On Fri, 30 Jun 2017 10:52:18 +0530
> Abdul Haleem  wrote:
> 
> > On Fri, 2017-06-30 at 00:45 +1000, Nicholas Piggin wrote:
> > > On Thu, 29 Jun 2017 20:23:05 +1000
> > > Nicholas Piggin  wrote:
> > > 
> > > > On Thu, 29 Jun 2017 19:36:14 +1000
> > > > Nicholas Piggin  wrote:
> > > 
> > > > > I don't *think* the replay-wakeup-interrupt patch is directly 
> > > > > involved, but
> > > > > it's likely to be one of the idle patches.  
> > > 
> > > Okay this turned out to be misconfigured sleep states I added for the
> > > simulator, sorry for the false alarm.
> > > 
> > > > Although you have this in the backtrace. I wonder if that's a stuck
> > > > lock in rcu_process_callbacks?
> > > 
> > > So this spinlock becomes top of the list of suspects. Can you try
> > > enabling lockdep and try to reproduce it?
> > 
> > Yes, recreated again with CONFIG_LOCKDEP=y & CONFIG_DEBUG_LOCKDEP=y set.
> > I do not see any difference in trace messages with and without LOCKDEP
> > enabled.
> > 
> > Please find the attached log file.
> 
> Can you get an rcu_invoke_callback event trace that Paul suggested?

I could not reproduce the issue with the latest next kernel
(4.12.0-rc7-next-20170703).

> Does this bug show up with just the powerpc next branch?

Perhaps; a stress-ng test on today's mainline kernel (4.12.0) triggered a
different trace message, and this is not reproducible all the time.


stress-ng --io 100  --vm 10 --vm-bytes 100g --timeout 1h --oomable

INFO: rcu_sched self-detected stall on CPU
8-...: (2099 ticks this GP) idle=37e/141/0 softirq=1404131/1404131 fqs=932 
 (t=2100 jiffies g=394894 c=394893 q=21997)
Task dump for CPU 8:
kworker/u162:2  R  running task    11168 28468      2 0x0884
Workqueue: writeback wb_workfn (flush-253:1)
Call Trace:
[c0036e3eb340] [c0114480] sched_show_task+0xf0/0x160 (unreliable)
[c0036e3eb3b0] [c09adc2c] rcu_dump_cpu_stacks+0xd0/0x134
[c0036e3eb400] [c015e4d0] rcu_check_callbacks+0x8f0/0xaf0
[c0036e3eb530] [c0165e2c] update_process_times+0x3c/0x90
[c0036e3eb560] [c017b73c] tick_sched_handle.isra.13+0x2c/0xc0
[c0036e3eb590] [c017b828] tick_sched_timer+0x58/0xb0
[c0036e3eb5d0] [c01669e8] __hrtimer_run_queues+0xf8/0x330
[c0036e3eb650] [c0167744] hrtimer_interrupt+0xe4/0x280
[c0036e3eb710] [c0022620] __timer_interrupt+0x90/0x270
[c0036e3eb760] [c0022cf0] timer_interrupt+0xa0/0xe0
[c0036e3eb790] [c00091e8] decrementer_common+0x158/0x160
--- interrupt: 901 at move_expired_inodes+0x30/0x200  LR = queue_io+0x8c/0x190
[c0036e3eba80] [c0036e3ebac0] 0xc0036e3ebac0 (unreliable)
[c0036e3ebac0] [c0319b24] wb_writeback+0x2b4/0x420
[c0036e3ebb90] [c031a980] wb_workfn+0xf0/0x4b0
[c0036e3ebca0] [c00fa160] process_one_work+0x180/0x470
[c0036e3ebd30] [c00fa6d4] worker_thread+0x284/0x500
[c0036e3ebdc0] [c0101fc0] kthread+0x160/0x1a0
[c0036e3ebe30] [c000bb60] ret_from_kernel_thread+0x5c/0x7c
INFO: rcu_sched self-detected stall on CPU
51-...: (2099 ticks this GP) idle=7c2/141/0 softirq=1272749/1272749 fqs=995 
 (t=2100 jiffies g=394900 c=394899 q=32186)

stress-ng : http://kernel.ubuntu.com/git/cking/stress-ng.git

> 
> Thanks,
> Nick
> 


-- 
Regards,

Abdul Haleem
IBM Linux Technology Centre





Re: [linux-next] cpus stalls detected few hours after booting next kernel

2017-06-30 Thread Paul E. McKenney
On Fri, Jun 30, 2017 at 05:28:02PM +1000, Nicholas Piggin wrote:
> On Fri, 30 Jun 2017 10:52:18 +0530
> Abdul Haleem  wrote:
> 
> > On Fri, 2017-06-30 at 00:45 +1000, Nicholas Piggin wrote:
> > > On Thu, 29 Jun 2017 20:23:05 +1000
> > > Nicholas Piggin  wrote:
> > > 
> > > > On Thu, 29 Jun 2017 19:36:14 +1000
> > > > Nicholas Piggin  wrote:
> > > 
> > > > > I don't *think* the replay-wakeup-interrupt patch is directly 
> > > > > involved, but
> > > > > it's likely to be one of the idle patches.  
> > > 
> > > Okay this turned out to be misconfigured sleep states I added for the
> > > simulator, sorry for the false alarm.
> > > 
> > > > Although you have this in the backtrace. I wonder if that's a stuck
> > > > lock in rcu_process_callbacks?
> > > 
> > > So this spinlock becomes top of the list of suspects. Can you try
> > > enabling lockdep and try to reproduce it?
> > 
> > Yes, recreated again with CONFIG_LOCKDEP=y & CONFIG_DEBUG_LOCKDEP=y set.
> > I do not see any difference in trace messages with and without LOCKDEP
> > enabled.
> > 
> > Please find the attached log file.
> 
> Can you get an rcu_invoke_callback event trace that Paul suggested?
> 
> Does this bug show up with just the powerpc next branch?

And I must say that those RCU CPU stall warnings are spectacular!

Did you by chance boot with a small subset of the CPUs online, and
bring the rest online later on?

Thanx, Paul



Re: [linux-next] cpus stalls detected few hours after booting next kernel

2017-06-30 Thread Nicholas Piggin
On Fri, 30 Jun 2017 10:52:18 +0530
Abdul Haleem  wrote:

> On Fri, 2017-06-30 at 00:45 +1000, Nicholas Piggin wrote:
> > On Thu, 29 Jun 2017 20:23:05 +1000
> > Nicholas Piggin  wrote:
> > 
> > > On Thu, 29 Jun 2017 19:36:14 +1000
> > > Nicholas Piggin  wrote:
> > 
> > > > I don't *think* the replay-wakeup-interrupt patch is directly involved, 
> > > > but
> > > > it's likely to be one of the idle patches.  
> > 
> > Okay this turned out to be misconfigured sleep states I added for the
> > simulator, sorry for the false alarm.
> > 
> > > Although you have this in the backtrace. I wonder if that's a stuck
> > > lock in rcu_process_callbacks?
> > 
> > So this spinlock becomes top of the list of suspects. Can you try
> > enabling lockdep and try to reproduce it?
> 
> Yes, recreated again with CONFIG_LOCKDEP=y & CONFIG_DEBUG_LOCKDEP=y set.
> I do not see any difference in trace messages with and without LOCKDEP
> enabled.
> 
> Please find the attached log file.

Can you get an rcu_invoke_callback event trace that Paul suggested?

Does this bug show up with just the powerpc next branch?

Thanks,
Nick


Re: [linux-next] cpus stalls detected few hours after booting next kernel

2017-06-29 Thread Abdul Haleem
On Fri, 2017-06-30 at 00:45 +1000, Nicholas Piggin wrote:
> On Thu, 29 Jun 2017 20:23:05 +1000
> Nicholas Piggin  wrote:
> 
> > On Thu, 29 Jun 2017 19:36:14 +1000
> > Nicholas Piggin  wrote:
> 
> > > I don't *think* the replay-wakeup-interrupt patch is directly involved, 
> > > but
> > > it's likely to be one of the idle patches.  
> 
> Okay this turned out to be misconfigured sleep states I added for the
> simulator, sorry for the false alarm.
> 
> > Although you have this in the backtrace. I wonder if that's a stuck
> > lock in rcu_process_callbacks?
> 
> So this spinlock becomes top of the list of suspects. Can you try
> enabling lockdep and try to reproduce it?

Yes, recreated again with CONFIG_LOCKDEP=y & CONFIG_DEBUG_LOCKDEP=y set.
I do not see any difference in trace messages with and without LOCKDEP
enabled.

Please find the attached log file.
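
(For reference, a minimal fragment of the debug configuration: the two
options named above plus CONFIG_PROVE_LOCKING, which is assumed here to
have also been set, since it is what enables the actual lock dependency
checking:)

    CONFIG_PROVE_LOCKING=y
    CONFIG_LOCKDEP=y
    CONFIG_DEBUG_LOCKDEP=y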

> 
> > [ 5948.345539] CPU: 63 PID: 7360 Comm: rs:main Q:Reg Not tainted 4.12.0-rc7-next-20170628 #2
> > [ 5948.345612] task: c00f1c14e600 task.stack: c00f1c1e8000
> > [ 5948.345672] NIP: c09c7a10 LR: c09c7a08 CTR: c015eda0
> > [ 5948.358553] REGS: c00f1c1eb150 TRAP: 0501   Not tainted  (4.12.0-rc7-next-20170628)
> > [ 5948.358623] MSR: 90009033  [ 5948.358626]   CR: 28082884  XER: 2000
> > [ 5948.358706] CFAR: c09c7a28 SOFTE: 1 
> > GPR00: c015f14c c00f1c1eb3d0 c1062b00 0001 
> > GPR04: c00fff6e6180  0001 00cc 
> > GPR08: 0001 804a   
> > GPR12: c015eda0 cfd55a80 
> > [ 5948.358986] NIP [c09c7a10] _raw_spin_lock_irqsave+0x90/0x100
> > [ 5948.359043] LR [c09c7a08] _raw_spin_lock_irqsave+0x88/0x100
> > [ 5948.359099] Call Trace:
> > [ 5948.359123] [c00f1c1eb3d0] [0001] 0x1 (unreliable)
> > [ 5948.359182] [c00f1c1eb410] [c015f14c] rcu_process_callbacks+0x3ac/0x620
> > [ 5948.359252] [c00f1c1eb4c0] [c00e1e0c] __do_softirq+0x14c/0x3a0
> > [ 5948.365958] [c00f1c1eb5b0] [c00e2448] irq_exit+0x108/0x120
> > [ 5948.366016] [c00f1c1eb5d0] [c00232b4] timer_interrupt+0xa4/0xe0
> > [ 5948.366075] [c00f1c1eb600] [c0009208] decrementer_common+0x158/0x160
> > [ 5948.366149] --- interrupt: 901 at start_this_handle+0xd0/0x4b0  LR = jbd2__journal_start+0x17c/0x2b0
> > [ 5948.366242] [c00f1c1eb8f0] [c00f20a5cb00] 0xc00f20a5cb00 (unreliable)
> > [ 5948.366314] [c00f1c1eba00] [c040717c] jbd2__journal_start+0x17c/0x2b0
> > [ 5948.366388] [c00f1c1eba70] [c038edf4] __ext4_journal_start_sb+0x84/0x180
> > [ 5948.366459] [c00f1c1ebad0] [c03b47dc] ext4_da_write_begin+0x17c/0x520
> > [ 5948.366532] [c00f1c1ebb90] [c021f9c8] generic_perform_write+0xe8/0x250
> > [ 5948.366604] [c00f1c1ebc20] [c0220d20] __generic_file_write_iter+0x200/0x240
> > [ 5948.366677] [c00f1c1ebc80] [c039d614] ext4_file_write_iter+0x2e4/0x4d0
> > [ 5948.373255] [c00f1c1ebd00] [c02e13c0] __vfs_write+0x120/0x200
> > [ 5948.373313] [c00f1c1ebd90] [c02e2c48] vfs_write+0xc8/0x240
> > [ 5948.373371] [c00f1c1ebde0] [c02e4940] SyS_write+0x60/0x110
> > [ 5948.373430] [c00f1c1ebe30] [c000b8e0] system_call+0x38/0xdc
> > [ 5948.373486] Instruction dump:
> > [ 5948.373521] 7fe3fb78 e8010010 eba1ffe8 ebc1fff0 ebe1fff8 7c0803a6 4e800020 8bad028a 
> > [ 5948.373592] 7fe3fb78 4b64db15 6000 7c210b78  89290009 792affe3 40820048 
> > [ 5948.374515] Sending NMI from CPU 74 to CPUs
> 


-- 
Regards,

Abdul Haleem
IBM Linux Technology Centre


[ 9296.490900] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 9296.491039]  8-...: (18788 GPs behind) idle=0cc/0/0 softirq=653/653 fqs=0 
[ 9296.491081]  9-...: (18788 GPs behind) idle=860/0/0 softirq=702/702 fqs=0 
[ 9296.491114]  10-...: (47558 GPs behind) idle=f90/0/0 softirq=493/493 fqs=0 
[ 9296.491147]  14-...: (18791 GPs behind) idle=840/0/0 softirq=816/816 fqs=0 
[ 9296.491180]  17-...: (1976 GPs behind) idle=0ec/0/0 softirq=1147/1147 fqs=0 
[ 9296.491221]  19-...: (60 GPs behind) idle=6a8/0/0 softirq=1551/1551 fqs=0 
[ 9296.491256]  23-...: (965 GPs behind) idle=f2c/0/0 softirq=1130/1131 fqs=0 
[ 9296.491289]  25-...: (37469 GPs behind) idle=170/0/0 softirq=599/599 fqs=0 
[ 9296.491444]  27-...: (47543 GPs behind) idle=3b0/0/0 softirq=606/606 fqs=0 
[ 9296.491479]  28-...: (47541 GPs behind) idle=5d4/0/0 softirq=503/504 fqs=0 
[ 9296.491514]  29-...: (47541 GPs behind) idle=8b4/0/0 softirq=470/470 fqs=0 
[ 9296.491547]  32-...: (2324 GPs behind) idle=7c4/0/0 softirq=946/946 fqs=0 
[ 9296.491580]  33-...: (37473 GPs behind) idle=5a0/0/0 softirq=618/618 fqs=0 
[ 9296.491616]  37-...: (9317 GPs behind) idle=c98/0/0 softirq=679/679 

Re: [linux-next] cpus stalls detected few hours after booting next kernel

2017-06-29 Thread Nicholas Piggin
On Thu, 29 Jun 2017 20:23:05 +1000
Nicholas Piggin  wrote:

> On Thu, 29 Jun 2017 19:36:14 +1000
> Nicholas Piggin  wrote:

> > I don't *think* the replay-wakeup-interrupt patch is directly involved, but
> > it's likely to be one of the idle patches.  

Okay this turned out to be misconfigured sleep states I added for the
simulator, sorry for the false alarm.

> Although you have this in the backtrace. I wonder if that's a stuck
> lock in rcu_process_callbacks?

So this spinlock becomes top of the list of suspects. Can you try
enabling lockdep and try to reproduce it?
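
(To make "stuck lock" concrete, a minimal sketch of the suspected
pattern; suspect_lock and the handler below are stand-ins, not the
actual RCU internals:)

    #include <linux/spinlock.h>

    /* Stand-in for the lock rcu_process_callbacks() contends on. */
    static DEFINE_RAW_SPINLOCK(suspect_lock);

    static void softirq_path_sketch(void)
    {
        unsigned long flags;

        /*
         * If another CPU never releases suspect_lock, this acquisition
         * spins forever with interrupts disabled, and an NMI backtrace
         * lands in _raw_spin_lock_irqsave(), as in the trace below.
         */
        raw_spin_lock_irqsave(&suspect_lock, flags);
        /* ... callback processing ... */
        raw_spin_unlock_irqrestore(&suspect_lock, flags);
    }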

> [ 5948.345539] CPU: 63 PID: 7360 Comm: rs:main Q:Reg Not tainted 4.12.0-rc7-next-20170628 #2
> [ 5948.345612] task: c00f1c14e600 task.stack: c00f1c1e8000
> [ 5948.345672] NIP: c09c7a10 LR: c09c7a08 CTR: c015eda0
> [ 5948.358553] REGS: c00f1c1eb150 TRAP: 0501   Not tainted  (4.12.0-rc7-next-20170628)
> [ 5948.358623] MSR: 90009033  [ 5948.358626]   CR: 28082884  XER: 2000
> [ 5948.358706] CFAR: c09c7a28 SOFTE: 1 
> GPR00: c015f14c c00f1c1eb3d0 c1062b00 0001 
> GPR04: c00fff6e6180  0001 00cc 
> GPR08: 0001 804a   
> GPR12: c015eda0 cfd55a80 
> [ 5948.358986] NIP [c09c7a10] _raw_spin_lock_irqsave+0x90/0x100
> [ 5948.359043] LR [c09c7a08] _raw_spin_lock_irqsave+0x88/0x100
> [ 5948.359099] Call Trace:
> [ 5948.359123] [c00f1c1eb3d0] [0001] 0x1 (unreliable)
> [ 5948.359182] [c00f1c1eb410] [c015f14c] rcu_process_callbacks+0x3ac/0x620
> [ 5948.359252] [c00f1c1eb4c0] [c00e1e0c] __do_softirq+0x14c/0x3a0
> [ 5948.365958] [c00f1c1eb5b0] [c00e2448] irq_exit+0x108/0x120
> [ 5948.366016] [c00f1c1eb5d0] [c00232b4] timer_interrupt+0xa4/0xe0
> [ 5948.366075] [c00f1c1eb600] [c0009208] decrementer_common+0x158/0x160
> [ 5948.366149] --- interrupt: 901 at start_this_handle+0xd0/0x4b0  LR = jbd2__journal_start+0x17c/0x2b0
> [ 5948.366242] [c00f1c1eb8f0] [c00f20a5cb00] 0xc00f20a5cb00 (unreliable)
> [ 5948.366314] [c00f1c1eba00] [c040717c] jbd2__journal_start+0x17c/0x2b0
> [ 5948.366388] [c00f1c1eba70] [c038edf4] __ext4_journal_start_sb+0x84/0x180
> [ 5948.366459] [c00f1c1ebad0] [c03b47dc] ext4_da_write_begin+0x17c/0x520
> [ 5948.366532] [c00f1c1ebb90] [c021f9c8] generic_perform_write+0xe8/0x250
> [ 5948.366604] [c00f1c1ebc20] [c0220d20] __generic_file_write_iter+0x200/0x240
> [ 5948.366677] [c00f1c1ebc80] [c039d614] ext4_file_write_iter+0x2e4/0x4d0
> [ 5948.373255] [c00f1c1ebd00] [c02e13c0] __vfs_write+0x120/0x200
> [ 5948.373313] [c00f1c1ebd90] [c02e2c48] vfs_write+0xc8/0x240
> [ 5948.373371] [c00f1c1ebde0] [c02e4940] SyS_write+0x60/0x110
> [ 5948.373430] [c00f1c1ebe30] [c000b8e0] system_call+0x38/0xdc
> [ 5948.373486] Instruction dump:
> [ 5948.373521] 7fe3fb78 e8010010 eba1ffe8 ebc1fff0 ebe1fff8 7c0803a6 4e800020 8bad028a 
> [ 5948.373592] 7fe3fb78 4b64db15 6000 7c210b78  89290009 792affe3 40820048 
> [ 5948.374515] Sending NMI from CPU 74 to CPUs



Re: [linux-next] cpus stalls detected few hours after booting next kernel

2017-06-29 Thread Nicholas Piggin
On Thu, 29 Jun 2017 19:36:14 +1000
Nicholas Piggin  wrote:

> I've seen this as well (or something like it) in mambo at boot, but
> it's pretty rare to hit. I'm trying to debug it.
> 
> I'm guessing possibly an idle vs interrupt race.
> 
> > [ 4255.151192] Sending NMI from CPU 5 to CPUs 0:
> > [ 4255.151246] NMI backtrace for cpu 0
> > [ 4255.151287] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.12.0-rc7-next-20170628 #2
> > [ 4255.151363] task: c007f8495600 task.stack: c007f842c000
> > [ 4255.151428] NIP: c000adb4 LR: c0015584 CTR: c082f4b0
> > [ 4255.151504] REGS: c007f842fb60 TRAP: 0e81   Not tainted  (4.12.0-rc7-next-20170628)
> > [ 4255.151578] MSR: 90009033 
> > [ 4255.151586]   CR: 22004884  XER: 
> > [ 4255.151675] CFAR: c062c108 SOFTE: 1 
> > [ 4255.151675] GPR00: c082d6c8 c007f842fde0 c1062b00 2800 
> > [ 4255.151675] GPR04: 0003 c0089830 3aa8056bc35f 0001 
> > [ 4255.151675] GPR08: 0002 c0d52d80 0007fe7d 90001003 
> > [ 4255.151675] GPR12: c082a0c0 cfd4 
> > [ 4255.152217] NIP [c000adb4] .L__replay_interrupt_return+0x0/0x4
> > [ 4255.152334] LR [c0015584] arch_local_irq_restore+0x74/0x90
> > [ 4255.152447] Call Trace:
> > [ 4255.152499] [c007f842fde0] [c017cec0] tick_broadcast_oneshot_control+0x40/0x60 (unreliable)
> > [ 4255.152662] [c007f842fe00] [c082d6c8] cpuidle_enter_state+0x108/0x3d0
> > [ 4255.152803] [c007f842fe60] [c0133e94] call_cpuidle+0x44/0x80
> > [ 4255.152921] [c007f842fe80] [c0134240] do_idle+0x290/0x2f0
> > [ 4255.153037] [c007f842fef0] [c0134474] cpu_startup_entry+0x34/0x40
> > [ 4255.153176] [c007f842ff20] [c0041944] start_secondary+0x304/0x360
> > [ 4255.153316] [c007f842ff90] [c000b16c] start_secondary_prolog+0x10/0x14
> > [ 4255.153455] Instruction dump:
> > [ 4255.153527] 7d200026 618c8000 2c030900 4182e320 2c030500 4182dd68 2c030e80 4182ffa4 
> > [ 4255.153668] 2c030ea0 4182f078 2c030e60 4182edb0 <4e800020> 7c781b78 480003c9 480003e1 
> 
> What I think you're seeing here is just that the lockup IPI causes the CPU
> to wake from idle, and so the backtrace gets recorded at the point where
> the interrupt gets replayed, when the cpuidle code returns from the
> low-level idle call and re-enables interrupts.
> 
> I don't *think* the replay-wakeup-interrupt patch is directly involved, but
> it's likely to be one of the idle patches.

Although you have this in the backtrace. I wonder if that's a stuck
lock in rcu_process_callbacks?


[ 5948.345539] CPU: 63 PID: 7360 Comm: rs:main Q:Reg Not tainted 4.12.0-rc7-next-20170628 #2
[ 5948.345612] task: c00f1c14e600 task.stack: c00f1c1e8000
[ 5948.345672] NIP: c09c7a10 LR: c09c7a08 CTR: c015eda0
[ 5948.358553] REGS: c00f1c1eb150 TRAP: 0501   Not tainted  (4.12.0-rc7-next-20170628)
[ 5948.358623] MSR: 90009033  [ 5948.358626]   CR: 28082884  XER: 2000
[ 5948.358706] CFAR: c09c7a28 SOFTE: 1 
GPR00: c015f14c c00f1c1eb3d0 c1062b00 0001 
GPR04: c00fff6e6180  0001 00cc 
GPR08: 0001 804a   
GPR12: c015eda0 cfd55a80 
[ 5948.358986] NIP [c09c7a10] _raw_spin_lock_irqsave+0x90/0x100
[ 5948.359043] LR [c09c7a08] _raw_spin_lock_irqsave+0x88/0x100
[ 5948.359099] Call Trace:
[ 5948.359123] [c00f1c1eb3d0] [0001] 0x1 (unreliable)
[ 5948.359182] [c00f1c1eb410] [c015f14c] rcu_process_callbacks+0x3ac/0x620
[ 5948.359252] [c00f1c1eb4c0] [c00e1e0c] __do_softirq+0x14c/0x3a0
[ 5948.365958] [c00f1c1eb5b0] [c00e2448] irq_exit+0x108/0x120
[ 5948.366016] [c00f1c1eb5d0] [c00232b4] timer_interrupt+0xa4/0xe0
[ 5948.366075] [c00f1c1eb600] [c0009208] decrementer_common+0x158/0x160
[ 5948.366149] --- interrupt: 901 at start_this_handle+0xd0/0x4b0  LR = jbd2__journal_start+0x17c/0x2b0
[ 5948.366242] [c00f1c1eb8f0] [c00f20a5cb00] 0xc00f20a5cb00 (unreliable)
[ 5948.366314] [c00f1c1eba00] [c040717c] jbd2__journal_start+0x17c/0x2b0
[ 5948.366388] [c00f1c1eba70] [c038edf4] __ext4_journal_start_sb+0x84/0x180
[ 5948.366459] [c00f1c1ebad0] [c03b47dc] ext4_da_write_begin+0x17c/0x520
[ 5948.366532] [c00f1c1ebb90] [c021f9c8] generic_perform_write+0xe8/0x250
[ 5948.366604] [c00f1c1ebc20] [c0220d20] __generic_file_write_iter+0x200/0x240
[ 5948.366677] [c00f1c1ebc80] [c039d614] ext4_file_write_iter+0x2e4/0x4d0
[ 5948.373255] [c00f1c1ebd00] [c02e13c0] __vfs_write+0x120/0x200
[ 5948.373313] [c00f1c1ebd90] [c02e2c48] vfs_write+0xc8/0x240

Re: [linux-next] cpus stalls detected few hours after booting next kernel

2017-06-29 Thread Nicholas Piggin
I've seen this as well (or something like it) in mambo at boot, but
it's pretty rare to hit. I'm trying to debug it.

I'm guessing possibly an idle vs interrupt race.

> [ 4255.151192] Sending NMI from CPU 5 to CPUs 0:
> [ 4255.151246] NMI backtrace for cpu 0
> [ 4255.151287] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.12.0-rc7-next-20170628 #2
> [ 4255.151363] task: c007f8495600 task.stack: c007f842c000
> [ 4255.151428] NIP: c000adb4 LR: c0015584 CTR: c082f4b0
> [ 4255.151504] REGS: c007f842fb60 TRAP: 0e81   Not tainted  (4.12.0-rc7-next-20170628)
> [ 4255.151578] MSR: 90009033 
> [ 4255.151586]   CR: 22004884  XER: 
> [ 4255.151675] CFAR: c062c108 SOFTE: 1 
> [ 4255.151675] GPR00: c082d6c8 c007f842fde0 c1062b00 2800 
> [ 4255.151675] GPR04: 0003 c0089830 3aa8056bc35f 0001 
> [ 4255.151675] GPR08: 0002 c0d52d80 0007fe7d 90001003 
> [ 4255.151675] GPR12: c082a0c0 cfd4 
> [ 4255.152217] NIP [c000adb4] .L__replay_interrupt_return+0x0/0x4
> [ 4255.152334] LR [c0015584] arch_local_irq_restore+0x74/0x90
> [ 4255.152447] Call Trace:
> [ 4255.152499] [c007f842fde0] [c017cec0] tick_broadcast_oneshot_control+0x40/0x60 (unreliable)
> [ 4255.152662] [c007f842fe00] [c082d6c8] cpuidle_enter_state+0x108/0x3d0
> [ 4255.152803] [c007f842fe60] [c0133e94] call_cpuidle+0x44/0x80
> [ 4255.152921] [c007f842fe80] [c0134240] do_idle+0x290/0x2f0
> [ 4255.153037] [c007f842fef0] [c0134474] cpu_startup_entry+0x34/0x40
> [ 4255.153176] [c007f842ff20] [c0041944] start_secondary+0x304/0x360
> [ 4255.153316] [c007f842ff90] [c000b16c] start_secondary_prolog+0x10/0x14
> [ 4255.153455] Instruction dump:
> [ 4255.153527] 7d200026 618c8000 2c030900 4182e320 2c030500 4182dd68 2c030e80 4182ffa4 
> [ 4255.153668] 2c030ea0 4182f078 2c030e60 4182edb0 <4e800020> 7c781b78 480003c9 480003e1 

What I think you're seeing here is just that the lockup IPI causes the CPU
to wake from idle, and so the backtrace gets recorded at the point where
the interrupt gets replayed, when the cpuidle code returns from the
low-level idle call and re-enables interrupts.

I don't *think* the replay-wakeup-interrupt patch is directly involved, but
it's likely to be one of the idle patches.
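
(A rough sketch of that sequence; enter_low_power_state() is a
hypothetical stand-in for the real powerpc idle entry, not the actual
code:)

    #include <linux/irqflags.h>

    static void enter_low_power_state(void);  /* hypothetical idle entry */

    static void idle_sketch(void)
    {
        local_irq_disable();        /* idle entered with IRQs disabled */
        enter_low_power_state();    /* the lockup IPI wakes the CPU; its
                                       interrupt remains pending */
        local_irq_enable();         /* the pending interrupt is replayed
                                       here, so the backtrace points at
                                       arch_local_irq_restore() */
    }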

Thanks,
Nick

