RE: [EXT] Re: stress-ng --hrtimers hangs system

2020-06-21 Thread Jiafei Pan
Thanks, Kurt, for your confirmation.

Best Regards,
Jiafei.

-Original Message-
From: Kurt Kanzenbach  
Sent: Saturday, June 20, 2020 4:08 PM
To: Vladimir Oltean ; Jiafei Pan 
Cc: linux-rt-us...@vger.kernel.org; lkml ; 
r...@vger.kernel.org; Colin King ; Jiafei Pan 

Subject: Re: [EXT] Re: stress-ng --hrtimers hangs system

Hi Jiafei,

On Friday, June 12, 2020 2:49:17 AM CEST Jiafei Pan wrote:
> Hi, Kurt,
>
> May I know whether you used the "root" user to run stress-ng? Using "root"
> will change the scheduler to "SCHED_RR", so would you please share the test
> results for both root and non-root users? Thanks.

Performed the test now as root and non-root user. Same result: No problem.

Thanks,
Kurt





Re: [EXT] Re: stress-ng --hrtimers hangs system

2020-06-20 Thread Kurt Kanzenbach
Hi Jiafei,

On Friday, June 12, 2020 2:49:17 AM CEST Jiafei Pan wrote:
> Hi, Kurt,
> 
> May I know whether you used the "root" user to run stress-ng? Using "root"
> will change the scheduler to "SCHED_RR", so would you please share the test
> results for both root and non-root users? Thanks.

Performed the test now as root and non-root user. Same result: No 
problem.

Thanks,
Kurt





RE: [EXT] Re: stress-ng --hrtimers hangs system

2020-06-11 Thread Jiafei Pan
Hi, Kurt,

May I know whether you used the "root" user to run stress-ng? Using "root" 
will change the scheduler to "SCHED_RR", so would you please share the test 
results for both root and non-root users? Thanks.
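
(For what it's worth, a quick way to confirm which scheduling policy the
stressor threads actually end up with, assuming procps "ps" and util-linux
"chrt" are available on the target; the "cls" column shows TS for
SCHED_OTHER, RR for SCHED_RR and FF for SCHED_FIFO:

# ps -eLo pid,tid,cls,rtprio,comm | grep stress-ng
# for pid in $(pgrep stress-ng); do chrt -p "$pid"; done)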

Best Regards,
Jiafei.

-Original Message-
From: linux-rt-users-ow...@vger.kernel.org 
 On Behalf Of Kurt Kanzenbach
Sent: Wednesday, June 10, 2020 1:58 PM
To: Vladimir Oltean 
Cc: linux-rt-us...@vger.kernel.org; lkml ; 
r...@vger.kernel.org; Colin King 
Subject: [EXT] Re: stress-ng --hrtimers hangs system

Hi Vladimir,

On Tue Jun 09 2020, Vladimir Oltean wrote:
> Just out of curiosity, what and how many CPU cores does your ARM64 box 
> have, and what frequency are you running them at?
> Mine is a dual-core A72 machine running at 1500 MHz.

That particular machine has a dual-core Cortex-A53 running at 1 GHz.

Thanks,
Kurt


Re: stress-ng --hrtimers hangs system

2020-06-10 Thread Vladimir Oltean
Hi Kurt,

On Wed, 10 Jun 2020 at 08:58, Kurt Kanzenbach
 wrote:
>
> Hi Vladimir,
>
> On Tue Jun 09 2020, Vladimir Oltean wrote:
> > Just out of curiosity, what and how many CPU cores does your ARM64 box
> > have, and what frequency are you running them at?
> > Mine is a dual-core A72 machine running at 1500 MHz.
>
> That particular machine has a dual-core Cortex-A53 running at 1 GHz.
>
> Thanks,
> Kurt

Ok, I just noticed that you said "v5.6*-rt*".
So I re-tested the downstream 5.4 kernel that hung before, this time with
the -rt1 patchset applied on top. And it's actually working fine now: no
hangs, no RCU stalls, no need to reduce sched_rt_runtime_us.
So maybe it's something related to the config?
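
(If it does come down to the config, the kernel tree's scripts/diffconfig is
a quick way to compare the two; the paths below are just placeholders:

# ./scripts/diffconfig /path/to/config-5.4-downstream /path/to/config-5.6-rt)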

Thanks,
-Vladimir


Re: stress-ng --hrtimers hangs system

2020-06-09 Thread Kurt Kanzenbach
Hi Vladimir,

On Tue Jun 09 2020, Vladimir Oltean wrote:
> Just out of curiosity, what and how many CPU cores does your ARM64 box
> have, and what frequency are you running them at?
> Mine is a dual-core A72 machine running at 1500 MHz.

That particular machine has a dual-core Cortex-A53 running at 1 GHz.

Thanks,
Kurt




Re: stress-ng --hrtimers hangs system

2020-06-09 Thread Vladimir Oltean
Hi Kurt,

On Fri, 5 Jun 2020 at 16:19, Vladimir Oltean  wrote:
>
> Hi Kurt,
>
> On Fri, 5 Jun 2020 at 15:57, Kurt Kanzenbach
>  wrote:
> >
> > Hi Vladimir,
> >
> > On Fri Jun 05 2020, Vladimir Oltean wrote:
> > > Hi,
> > >
> > > I was testing stress-ng on an ARM64 box and I found that it can be killed 
> > > instantaneously with a --hrtimers 1 test:
> > > https://github.com/ColinIanKing/stress-ng/blob/master/stress-hrtimers.c
> > > The console shell locks up immediately after starting the process, and I 
> > > get this rcu_preempt splat after 21 seconds,
> > > letting me know that the grace-periods kernel thread could not run:
> >
> > Interesting. Just tested this on an ARM64 box with v5.6-rt and the
> > stress-ng hrtimer test works fine. No lockups, cyclictest results are
> > looking good. So maybe this is v5.7 related.
> >
> > Thanks,
> > Kurt
>
> This is not actually v5.7 related. I could also reproduce it on a
> downstream 5.4 kernel, which is how I originally saw it.
>
> Thanks,
> -Vladimir

Just out of curiosity, what and how many CPU cores does your ARM64 box
have, and what frequency are you running them at?
Mine is a dual-core A72 machine running at 1500 MHz.

Thanks,
-Vladimir


Re: stress-ng --hrtimers hangs system

2020-06-05 Thread Vladimir Oltean
Hi Kurt,

On Fri, 5 Jun 2020 at 15:57, Kurt Kanzenbach
 wrote:
>
> Hi Vladimir,
>
> On Fri Jun 05 2020, Vladimir Oltean wrote:
> > Hi,
> >
> > I was testing stress-ng on an ARM64 box and I found that it can be killed 
> > instantaneously with a --hrtimers 1 test:
> > https://github.com/ColinIanKing/stress-ng/blob/master/stress-hrtimers.c
> > The console shell locks up immediately after starting the process, and I 
> > get this rcu_preempt splat after 21 seconds,
> > letting me know that the grace-periods kernel thread could not run:
>
> Interesting. Just tested this on an ARM64 box with v5.6-rt and the
> stress-ng hrtimer test works fine. No lockups, cyclictest results are
> looking good. So maybe this is v5.7 related.
>
> Thanks,
> Kurt

This is not actually v5.7 related. I could also reproduce it on a
downstream 5.4 kernel, which is how I originally saw it.

Thanks,
-Vladimir


Re: stress-ng --hrtimers hangs system

2020-06-05 Thread Kurt Kanzenbach
Hi Vladimir,

On Fri Jun 05 2020, Vladimir Oltean wrote:
> Hi,
>
> I was testing stress-ng on an ARM64 box and I found that it can be killed 
> instantaneously with a --hrtimers 1 test:
> https://github.com/ColinIanKing/stress-ng/blob/master/stress-hrtimers.c
> The console shell locks up immediately after starting the process, and I get 
> this rcu_preempt splat after 21 seconds,
> letting me know that the grace-periods kernel thread could not run:

Interesting. Just tested this on an ARM64 box with v5.6-rt and the
stress-ng hrtimer test works fine. No lockups, cyclictest results are
looking good. So maybe this is v5.7 related.

Thanks,
Kurt




Re: stress-ng --hrtimers hangs system

2020-06-05 Thread Vladimir Oltean
Hi Paul,

On Fri, 5 Jun 2020 at 06:20, Paul E. McKenney  wrote:
>

>
> That agrees with my interpretation of the stall warning message.
>
> On the rest, I must defer to others.
>
> Thanx, Paul
>

Thanks, always good to have a confirmation from you!
FWIW, the system has survived an overnight run with
sched_rt_runtime_us set to 80%; I don't know if there's more to it.
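
(Assuming "80%" here means sched_rt_runtime_us at 80% of the default
1,000,000 us sched_rt_period_us, i.e. lowered from the usual 95% default,
the tweak would look like:

# cat /proc/sys/kernel/sched_rt_period_us
1000000
# echo 800000 > /proc/sys/kernel/sched_rt_runtime_us)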

Regards,
-Vladimir


Re: stress-ng --hrtimers hangs system

2020-06-04 Thread Paul E. McKenney
On Fri, Jun 05, 2020 at 04:47:51AM +0300, Vladimir Oltean wrote:
> Hi,
> 
> I was testing stress-ng on an ARM64 box and I found that it can be killed 
> instantaneously with a --hrtimers 1 test:
> https://github.com/ColinIanKing/stress-ng/blob/master/stress-hrtimers.c
> The console shell locks up immediately after starting the process, and I get 
> this rcu_preempt splat after 21 seconds,
> letting me know that the grace-periods kernel thread could not run:
> 
> # stress-ng --hrtimers 1
> stress-ng: info:  [320] defaulting to a 86400 second (1 day, 0.00 secs) run 
> per stressor
> stress-ng: info:  [320] dispatching hogs: 1 hrtimers
> stress-ng: info:  [320] cache allocate: using defaults, can't determine cache 
> details from sysfs
> [   85.827528] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [   85.833656]  (detected by 1, t=6502 jiffies, g=1789, q=12)
> [   85.839163] rcu: All QSes seen, last rcu_preempt kthread activity 6502 
> (4294913720-4294907218), jiffies_till_next_fqs=1, root ->qsmask 0x0
> [   85.851647] rcu: rcu_preempt kthread starved for 6502 jiffies! g1789 f0x2 
> RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
> [   85.861859] rcu: Unless rcu_preempt kthread gets sufficient CPU time, 
> OOM is now expected behavior.
> [   85.871025] rcu: RCU grace-period kthread stack dump:
> [   85.876094] rcu_preempt R  running task010  2 
> 0x0028
> [   85.883173] Call trace:
> [   85.885636]  __switch_to+0xf8/0x148
> [   85.889137]  __schedule+0x2d8/0x808
> [   85.892636]  schedule+0x48/0x100
> [   85.895875]  schedule_timeout+0x1c8/0x420
> [   85.899900]  rcu_gp_kthread+0x738/0x1b78
> [   85.903836]  kthread+0x158/0x168
> [   85.907075]  ret_from_fork+0x10/0x18
> [   93.283548] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 
> stuck for 33s!
> [   93.291569] BUG: workqueue lockup - pool cpus=0-1 flags=0x4 nice=0 stuck 
> for 33s!
> [   93.299105] Showing busy workqueues and worker pools:
> [   93.304189] workqueue events: flags=0x0
> [   93.308116]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> [   93.315031] pending: vmstat_shepherd
> [   93.318990] workqueue events_unbound: flags=0x2
> [   93.323577]   pwq 4: cpus=0-1 flags=0x4 nice=0 active=1/512 refcnt=3
> [   93.330309] pending: flush_to_ldisc
> [   93.334161]
> [   93.334164] ==
> [   93.334165] WARNING: possible circular locking dependency detected
> [   93.334167] 5.7.0-08604-g7dcedf8c04c0 #118 Not tainted
> [   93.334168] --
> [   93.334169] stress-ng-hrtim/326 is trying to acquire lock:
> [   93.334171] b236f6aee1a0 (console_owner){-.-.}-{0:0}, at: 
> console_unlock+0x21c/0x678
> [   93.334176]
> [   93.334177] but task is already holding lock:
> [   93.334178] 00207ac0d018 (&pool->lock/1){-.-.}-{2:2}, at: 
> show_workqueue_state+0x288/0x3a8
> [   93.334184]
> [   93.334186] which lock already depends on the new lock.
> [   93.334187]
> [   93.334188]
> [   93.334189] the existing dependency chain (in reverse order) is:
> [   93.334190]
> [   93.334191] -> #2 (&pool->lock/1){-.-.}-{2:2}:
> [   93.334197]_raw_spin_lock+0x5c/0x78
> [   93.334198]__queue_work+0x124/0x7c8
> [   93.334199]queue_work_on+0xd0/0xf0
> [   93.334200]tty_flip_buffer_push+0x3c/0x48
> [   93.334202]serial8250_rx_chars+0x74/0x88
> [   93.334203]fsl8250_handle_irq+0x15c/0x1a0
> [   93.334204]serial8250_interrupt+0x70/0xb8
> [   93.334206]__handle_irq_event_percpu+0xe0/0x478
> [   93.334207]handle_irq_event_percpu+0x40/0x98
> [   93.334208]handle_irq_event+0x4c/0xd0
> [   93.334209]handle_fasteoi_irq+0xb4/0x158
> [   93.334211]generic_handle_irq+0x3c/0x58
> [   93.334212]__handle_domain_irq+0x68/0xc0
> [   93.334213]gic_handle_irq+0x6c/0x160
> [   93.334214]el1_irq+0xbc/0x180
> [   93.334216]cpuidle_enter_state+0xb4/0x4f8
> [   93.334217]cpuidle_enter+0x3c/0x50
> [   93.334218]call_cpuidle+0x44/0x78
> [   93.334219]do_idle+0x228/0x2c8
> [   93.334220]cpu_startup_entry+0x2c/0x48
> [   93.334222]rest_init+0x1ac/0x280
> [   93.334223]arch_call_rest_init+0x14/0x1c
> [   93.334224]start_kernel+0x4ec/0x524
> [   93.334225]
> [   93.334226] -> #1 (&port->lock#2){-.-.}-{2:2}:
> [   93.334232]_raw_spin_lock_irqsave+0x78/0xa0
> [   93.334233]serial8250_console_write+0x1f4/0x348
> [   93.334234]univ8250_console_write+0x44/0x58
> [   93.334235]console_unlock+0x480/0x678
> [   93.334237]vprintk_emit+0x188/0x370
> [   93.334238]vprintk_default+0x48/0x58
> [   93.334239]vprintk_func+0xf0/0x238
> [   93.334240]printk+0x74/0x94
> [   93.334241]register_console+0x1a0/0x300
> [   93.334243]uart_add_one_port+0x4a0/0x4e0
> [   93.334244]serial8250_regi

stress-ng --hrtimers hangs system

2020-06-04 Thread Vladimir Oltean
Hi,

I was testing stress-ng on an ARM64 box and I found that it can be killed 
instantaneously with a --hrtimers 1 test:
https://github.com/ColinIanKing/stress-ng/blob/master/stress-hrtimers.c
The console shell locks up immediately after starting the process, and I get 
this rcu_preempt splat after 21 seconds,
letting me know that the grace-periods kernel thread could not run:

# stress-ng --hrtimers 1
stress-ng: info:  [320] defaulting to a 86400 second (1 day, 0.00 secs) run per 
stressor
stress-ng: info:  [320] dispatching hogs: 1 hrtimers
stress-ng: info:  [320] cache allocate: using defaults, can't determine cache 
details from sysfs
[   85.827528] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[   85.833656]  (detected by 1, t=6502 jiffies, g=1789, q=12)
[   85.839163] rcu: All QSes seen, last rcu_preempt kthread activity 6502 
(4294913720-4294907218), jiffies_till_next_fqs=1, root ->qsmask 0x0
[   85.851647] rcu: rcu_preempt kthread starved for 6502 jiffies! g1789 f0x2 
RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
[   85.861859] rcu: Unless rcu_preempt kthread gets sufficient CPU time, 
OOM is now expected behavior.
[   85.871025] rcu: RCU grace-period kthread stack dump:
[   85.876094] rcu_preempt R  running task010  2 0x0028
[   85.883173] Call trace:
[   85.885636]  __switch_to+0xf8/0x148
[   85.889137]  __schedule+0x2d8/0x808
[   85.892636]  schedule+0x48/0x100
[   85.895875]  schedule_timeout+0x1c8/0x420
[   85.899900]  rcu_gp_kthread+0x738/0x1b78
[   85.903836]  kthread+0x158/0x168
[   85.907075]  ret_from_fork+0x10/0x18
[   93.283548] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 
stuck for 33s!
[   93.291569] BUG: workqueue lockup - pool cpus=0-1 flags=0x4 nice=0 stuck for 
33s!
[   93.299105] Showing busy workqueues and worker pools:
[   93.304189] workqueue events: flags=0x0
[   93.308116]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[   93.315031] pending: vmstat_shepherd
[   93.318990] workqueue events_unbound: flags=0x2
[   93.323577]   pwq 4: cpus=0-1 flags=0x4 nice=0 active=1/512 refcnt=3
[   93.330309] pending: flush_to_ldisc
[   93.334161]
[   93.334164] ==
[   93.334165] WARNING: possible circular locking dependency detected
[   93.334167] 5.7.0-08604-g7dcedf8c04c0 #118 Not tainted
[   93.334168] --
[   93.334169] stress-ng-hrtim/326 is trying to acquire lock:
[   93.334171] b236f6aee1a0 (console_owner){-.-.}-{0:0}, at: 
console_unlock+0x21c/0x678
[   93.334176]
[   93.334177] but task is already holding lock:
[   93.334178] 00207ac0d018 (&pool->lock/1){-.-.}-{2:2}, at: 
show_workqueue_state+0x288/0x3a8
[   93.334184]
[   93.334186] which lock already depends on the new lock.
[   93.334187]
[   93.334188]
[   93.334189] the existing dependency chain (in reverse order) is:
[   93.334190]
[   93.334191] -> #2 (&pool->lock/1){-.-.}-{2:2}:
[   93.334197]_raw_spin_lock+0x5c/0x78
[   93.334198]__queue_work+0x124/0x7c8
[   93.334199]queue_work_on+0xd0/0xf0
[   93.334200]tty_flip_buffer_push+0x3c/0x48
[   93.334202]serial8250_rx_chars+0x74/0x88
[   93.334203]fsl8250_handle_irq+0x15c/0x1a0
[   93.334204]serial8250_interrupt+0x70/0xb8
[   93.334206]__handle_irq_event_percpu+0xe0/0x478
[   93.334207]handle_irq_event_percpu+0x40/0x98
[   93.334208]handle_irq_event+0x4c/0xd0
[   93.334209]handle_fasteoi_irq+0xb4/0x158
[   93.334211]generic_handle_irq+0x3c/0x58
[   93.334212]__handle_domain_irq+0x68/0xc0
[   93.334213]gic_handle_irq+0x6c/0x160
[   93.334214]el1_irq+0xbc/0x180
[   93.334216]cpuidle_enter_state+0xb4/0x4f8
[   93.334217]cpuidle_enter+0x3c/0x50
[   93.334218]call_cpuidle+0x44/0x78
[   93.334219]do_idle+0x228/0x2c8
[   93.334220]cpu_startup_entry+0x2c/0x48
[   93.334222]rest_init+0x1ac/0x280
[   93.334223]arch_call_rest_init+0x14/0x1c
[   93.334224]start_kernel+0x4ec/0x524
[   93.334225]
[   93.334226] -> #1 (&port->lock#2){-.-.}-{2:2}:
[   93.334232]_raw_spin_lock_irqsave+0x78/0xa0
[   93.334233]serial8250_console_write+0x1f4/0x348
[   93.334234]univ8250_console_write+0x44/0x58
[   93.334235]console_unlock+0x480/0x678
[   93.334237]vprintk_emit+0x188/0x370
[   93.334238]vprintk_default+0x48/0x58
[   93.334239]vprintk_func+0xf0/0x238
[   93.334240]printk+0x74/0x94
[   93.334241]register_console+0x1a0/0x300
[   93.334243]uart_add_one_port+0x4a0/0x4e0
[   93.334244]serial8250_register_8250_port+0x2d0/0x4b0
[   93.334245]of_platform_serial_probe+0x318/0x650
[   93.334246]platform_drv_probe+0x58/0xa8
[   93.334248]really_probe+0x10c/0x358
[   93.334249]driver_probe_device+0x5c/0x100
[   93.334250]device_