RE: [EXT] Re: stress-ng --hrtimers hangs system
Thanks, Kurt, for your confirmation.

Best Regards,
Jiafei.

-----Original Message-----
From: Kurt Kanzenbach
Sent: Saturday, June 20, 2020 4:08 PM
To: Vladimir Oltean ; Jiafei Pan
Cc: linux-rt-us...@vger.kernel.org; lkml ; r...@vger.kernel.org; Colin King ; Jiafei Pan
Subject: Re: [EXT] Re: stress-ng --hrtimers hangs system

Caution: EXT Email

Hi Jiafei,

On Friday, June 12, 2020 2:49:17 AM CEST Jiafei Pan wrote:
> Hi, Kurt,
>
> May I know whether you ran stress-ng as the "root" user? Running as
> "root" changes the scheduler to SCHED_RR, so would you please share
> test results for both root and non-root users? Thanks.

Performed the test now as both root and a non-root user. Same result: no problem.

Thanks,
Kurt
Re: [EXT] Re: stress-ng --hrtimers hangs system
Hi Jiafei,

On Friday, June 12, 2020 2:49:17 AM CEST Jiafei Pan wrote:
> Hi, Kurt,
>
> May I know whether you ran stress-ng as the "root" user? Running as
> "root" changes the scheduler to SCHED_RR, so would you please share
> test results for both root and non-root users? Thanks.

Performed the test now as both root and a non-root user. Same result: no problem.

Thanks,
Kurt
RE: [EXT] Re: stress-ng --hrtimers hangs system
Hi, Kurt,

May I know whether you ran stress-ng as the "root" user? Running as "root"
changes the scheduler to SCHED_RR, so would you please share test results
for both root and non-root users? Thanks.

Best Regards,
Jiafei.

-----Original Message-----
From: linux-rt-users-ow...@vger.kernel.org On Behalf Of Kurt Kanzenbach
Sent: Wednesday, June 10, 2020 1:58 PM
To: Vladimir Oltean
Cc: linux-rt-us...@vger.kernel.org; lkml ; r...@vger.kernel.org; Colin King
Subject: [EXT] Re: stress-ng --hrtimers hangs system

Hi Vladimir,

On Tue Jun 09 2020, Vladimir Oltean wrote:
> Just out of curiosity, what and how many CPU cores does your ARM64 box
> have, and what frequency are you running them at?
> Mine is a dual-core A72 machine running at 1500 MHz.

That particular machine has a dual-core Cortex A53 running at 1 GHz.

Thanks,
Kurt
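For reference, a real-time policy such as SCHED_RR is requested via sched_setscheduler(2), which requires root (or CAP_SYS_NICE). A minimal sketch of the pattern Jiafei describes; the priority value and the explicit root check here are illustrative, not taken from stress-ng's actual source:

#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* Illustrative RT priority; stress-ng's actual choice may differ. */
	struct sched_param param = { .sched_priority = 10 };

	/* Requesting an RT policy needs root or CAP_SYS_NICE, so a
	 * non-root run would simply stay on SCHED_OTHER. */
	if (geteuid() == 0 &&
	    sched_setscheduler(0, SCHED_RR, &param) != 0)
		perror("sched_setscheduler");

	/* ... the timer stress loop would run here under the chosen policy ... */
	return 0;
}

This is why the root/non-root comparison matters: under SCHED_RR the stressor can preempt SCHED_OTHER housekeeping threads indefinitely, subject only to RT throttling.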
Re: stress-ng --hrtimers hangs system
Hi Kurt,

On Wed, 10 Jun 2020 at 08:58, Kurt Kanzenbach wrote:
>
> Hi Vladimir,
>
> On Tue Jun 09 2020, Vladimir Oltean wrote:
> > Just out of curiosity, what and how many CPU cores does your ARM64 box
> > have, and what frequency are you running them at?
> > Mine is a dual-core A72 machine running at 1500 MHz.
>
> That particular machine has a dual-core Cortex A53 running at 1 GHz.
>
> Thanks,
> Kurt

Ok, I just noticed that you said "v5.6*-rt*". So I re-tested the downstream
5.4 kernel that hung before, this time with the -rt1 patchset applied on top.
And it's actually working fine now: no hangs, no RCU stalls, no need to
reduce sched_rt_runtime_us. So maybe it's something related to the config?

Thanks,
-Vladimir
Re: stress-ng --hrtimers hangs system
Hi Vladimir,

On Tue Jun 09 2020, Vladimir Oltean wrote:
> Just out of curiosity, what and how many CPU cores does your ARM64 box
> have, and what frequency are you running them at?
> Mine is a dual-core A72 machine running at 1500 MHz.

That particular machine has a dual-core Cortex A53 running at 1 GHz.

Thanks,
Kurt
Re: stress-ng --hrtimers hangs system
Hi Kurt,

On Fri, 5 Jun 2020 at 16:19, Vladimir Oltean wrote:
>
> Hi Kurt,
>
> On Fri, 5 Jun 2020 at 15:57, Kurt Kanzenbach wrote:
> >
> > Hi Vladimir,
> >
> > On Fri Jun 05 2020, Vladimir Oltean wrote:
> > > Hi,
> > >
> > > I was testing stress-ng on an ARM64 box and I found that it can be killed
> > > instantaneously with a --hrtimers 1 test:
> > > https://github.com/ColinIanKing/stress-ng/blob/master/stress-hrtimers.c
> > > The console shell locks up immediately after starting the process, and I
> > > get this rcu_preempt splat after 21 seconds,
> > > letting me know that the grace-period kernel thread could not run:
> >
> > Interesting. Just tested this on an ARM64 box with v5.6-rt and the
> > stress-ng hrtimer test works fine. No lockups, cyclictest results are
> > looking good. So maybe this is v5.7 related.
> >
> > Thanks,
> > Kurt
>
> This is not actually v5.7 related. I could also reproduce it on a
> downstream 5.4 kernel, which is how I originally saw it.
>
> Thanks,
> -Vladimir

Just out of curiosity, what and how many CPU cores does your ARM64 box
have, and what frequency are you running them at?
Mine is a dual-core A72 machine running at 1500 MHz.

Thanks,
-Vladimir
Re: stress-ng --hrtimers hangs system
Hi Kurt,

On Fri, 5 Jun 2020 at 15:57, Kurt Kanzenbach wrote:
>
> Hi Vladimir,
>
> On Fri Jun 05 2020, Vladimir Oltean wrote:
> > Hi,
> >
> > I was testing stress-ng on an ARM64 box and I found that it can be killed
> > instantaneously with a --hrtimers 1 test:
> > https://github.com/ColinIanKing/stress-ng/blob/master/stress-hrtimers.c
> > The console shell locks up immediately after starting the process, and I
> > get this rcu_preempt splat after 21 seconds,
> > letting me know that the grace-period kernel thread could not run:
>
> Interesting. Just tested this on an ARM64 box with v5.6-rt and the
> stress-ng hrtimer test works fine. No lockups, cyclictest results are
> looking good. So maybe this is v5.7 related.
>
> Thanks,
> Kurt

This is not actually v5.7 related. I could also reproduce it on a
downstream 5.4 kernel, which is how I originally saw it.

Thanks,
-Vladimir
Re: stress-ng --hrtimers hangs system
Hi Vladimir,

On Fri Jun 05 2020, Vladimir Oltean wrote:
> Hi,
>
> I was testing stress-ng on an ARM64 box and I found that it can be killed
> instantaneously with a --hrtimers 1 test:
> https://github.com/ColinIanKing/stress-ng/blob/master/stress-hrtimers.c
> The console shell locks up immediately after starting the process, and I
> get this rcu_preempt splat after 21 seconds,
> letting me know that the grace-period kernel thread could not run:

Interesting. Just tested this on an ARM64 box with v5.6-rt and the
stress-ng hrtimer test works fine. No lockups, cyclictest results are
looking good. So maybe this is v5.7 related.

Thanks,
Kurt
Re: stress-ng --hrtimers hangs system
Hi Paul,

On Fri, 5 Jun 2020 at 06:20, Paul E. McKenney wrote:
>
> That agrees with my interpretation of the stall warning message.
>
> On the rest, I must defer to others.
>
> Thanx, Paul

Thanks, it's always good to have a confirmation from you! FWIW, the system
has survived an overnight run with sched_rt_runtime_us set to 80%; I don't
know if there's more to it.

Regards,
-Vladimir
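For context, the 80% refers to the RT throttling budget: sched_rt_runtime_us microseconds of real-time execution are allowed per sched_rt_period_us (default 1000000 us), and the stock default of 950000 reserves 5% of each period for non-RT tasks such as the rcu_preempt kthread. A minimal sketch, assuming the default 1 s period, of lowering the budget to 80% from C:

#include <stdio.h>

int main(void)
{
	/* 800000 us of RT runtime per 1000000 us period = 80%.
	 * The default is 950000 (95%); writing -1 disables throttling. */
	FILE *f = fopen("/proc/sys/kernel/sched_rt_runtime_us", "w");

	if (!f) {
		perror("fopen"); /* requires root */
		return 1;
	}
	fprintf(f, "800000\n");
	return fclose(f) ? 1 : 0;
}

Lowering the budget gives starved SCHED_OTHER threads a guaranteed 20% of each period, which matches the observation that the overnight run survived.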
Re: stress-ng --hrtimers hangs system
On Fri, Jun 05, 2020 at 04:47:51AM +0300, Vladimir Oltean wrote:
> Hi,
>
> I was testing stress-ng on an ARM64 box and I found that it can be killed
> instantaneously with a --hrtimers 1 test:
> https://github.com/ColinIanKing/stress-ng/blob/master/stress-hrtimers.c
> The console shell locks up immediately after starting the process, and I
> get this rcu_preempt splat after 21 seconds,
> letting me know that the grace-period kernel thread could not run:
>
> # stress-ng --hrtimers 1
> stress-ng: info: [320] defaulting to a 86400 second (1 day, 0.00 secs) run per stressor
> stress-ng: info: [320] dispatching hogs: 1 hrtimers
> stress-ng: info: [320] cache allocate: using defaults, can't determine cache details from sysfs
> [ 85.827528] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 85.833656] (detected by 1, t=6502 jiffies, g=1789, q=12)
> [ 85.839163] rcu: All QSes seen, last rcu_preempt kthread activity 6502 (4294913720-4294907218), jiffies_till_next_fqs=1, root ->qsmask 0x0
> [ 85.851647] rcu: rcu_preempt kthread starved for 6502 jiffies! g1789 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
> [ 85.861859] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> [ 85.871025] rcu: RCU grace-period kthread stack dump:
> [ 85.876094] rcu_preempt R running task010 2 0x0028
> [ 85.883173] Call trace:
> [ 85.885636]  __switch_to+0xf8/0x148
> [ 85.889137]  __schedule+0x2d8/0x808
> [ 85.892636]  schedule+0x48/0x100
> [ 85.895875]  schedule_timeout+0x1c8/0x420
> [ 85.899900]  rcu_gp_kthread+0x738/0x1b78
> [ 85.903836]  kthread+0x158/0x168
> [ 85.907075]  ret_from_fork+0x10/0x18
> [ 93.283548] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 33s!
> [ 93.291569] BUG: workqueue lockup - pool cpus=0-1 flags=0x4 nice=0 stuck for 33s!
> [ 93.299105] Showing busy workqueues and worker pools:
> [ 93.304189] workqueue events: flags=0x0
> [ 93.308116]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> [ 93.315031]     pending: vmstat_shepherd
> [ 93.318990] workqueue events_unbound: flags=0x2
> [ 93.323577]   pwq 4: cpus=0-1 flags=0x4 nice=0 active=1/512 refcnt=3
> [ 93.330309]     pending: flush_to_ldisc
> [ 93.334161]
> [ 93.334164] ======================================================
> [ 93.334165] WARNING: possible circular locking dependency detected
> [ 93.334167] 5.7.0-08604-g7dcedf8c04c0 #118 Not tainted
> [ 93.334168] ------------------------------------------------------
> [ 93.334169] stress-ng-hrtim/326 is trying to acquire lock:
> [ 93.334171] b236f6aee1a0 (console_owner){-.-.}-{0:0}, at: console_unlock+0x21c/0x678
> [ 93.334176]
> [ 93.334177] but task is already holding lock:
> [ 93.334178] 00207ac0d018 (&pool->lock/1){-.-.}-{2:2}, at: show_workqueue_state+0x288/0x3a8
> [ 93.334184]
> [ 93.334186] which lock already depends on the new lock.
> [ 93.334187]
> [ 93.334188]
> [ 93.334189] the existing dependency chain (in reverse order) is:
> [ 93.334190]
> [ 93.334191] -> #2 (&pool->lock/1){-.-.}-{2:2}:
> [ 93.334197]        _raw_spin_lock+0x5c/0x78
> [ 93.334198]        __queue_work+0x124/0x7c8
> [ 93.334199]        queue_work_on+0xd0/0xf0
> [ 93.334200]        tty_flip_buffer_push+0x3c/0x48
> [ 93.334202]        serial8250_rx_chars+0x74/0x88
> [ 93.334203]        fsl8250_handle_irq+0x15c/0x1a0
> [ 93.334204]        serial8250_interrupt+0x70/0xb8
> [ 93.334206]        __handle_irq_event_percpu+0xe0/0x478
> [ 93.334207]        handle_irq_event_percpu+0x40/0x98
> [ 93.334208]        handle_irq_event+0x4c/0xd0
> [ 93.334209]        handle_fasteoi_irq+0xb4/0x158
> [ 93.334211]        generic_handle_irq+0x3c/0x58
> [ 93.334212]        __handle_domain_irq+0x68/0xc0
> [ 93.334213]        gic_handle_irq+0x6c/0x160
> [ 93.334214]        el1_irq+0xbc/0x180
> [ 93.334216]        cpuidle_enter_state+0xb4/0x4f8
> [ 93.334217]        cpuidle_enter+0x3c/0x50
> [ 93.334218]        call_cpuidle+0x44/0x78
> [ 93.334219]        do_idle+0x228/0x2c8
> [ 93.334220]        cpu_startup_entry+0x2c/0x48
> [ 93.334222]        rest_init+0x1ac/0x280
> [ 93.334223]        arch_call_rest_init+0x14/0x1c
> [ 93.334224]        start_kernel+0x4ec/0x524
> [ 93.334225]
> [ 93.334226] -> #1 (&port->lock#2){-.-.}-{2:2}:
> [ 93.334232]        _raw_spin_lock_irqsave+0x78/0xa0
> [ 93.334233]        serial8250_console_write+0x1f4/0x348
> [ 93.334234]        univ8250_console_write+0x44/0x58
> [ 93.334235]        console_unlock+0x480/0x678
> [ 93.334237]        vprintk_emit+0x188/0x370
> [ 93.334238]        vprintk_default+0x48/0x58
> [ 93.334239]        vprintk_func+0xf0/0x238
> [ 93.334240]        printk+0x74/0x94
> [ 93.334241]        register_console+0x1a0/0x300
> [ 93.334243]        uart_add_one_port+0x4a0/0x4e0
> [ 93.334244]        serial8250_regi
stress-ng --hrtimers hangs system
Hi,

I was testing stress-ng on an ARM64 box and I found that it can be killed
instantaneously with a --hrtimers 1 test:
https://github.com/ColinIanKing/stress-ng/blob/master/stress-hrtimers.c
The console shell locks up immediately after starting the process, and I
get this rcu_preempt splat after 21 seconds,
letting me know that the grace-period kernel thread could not run:

# stress-ng --hrtimers 1
stress-ng: info: [320] defaulting to a 86400 second (1 day, 0.00 secs) run per stressor
stress-ng: info: [320] dispatching hogs: 1 hrtimers
stress-ng: info: [320] cache allocate: using defaults, can't determine cache details from sysfs
[ 85.827528] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 85.833656] (detected by 1, t=6502 jiffies, g=1789, q=12)
[ 85.839163] rcu: All QSes seen, last rcu_preempt kthread activity 6502 (4294913720-4294907218), jiffies_till_next_fqs=1, root ->qsmask 0x0
[ 85.851647] rcu: rcu_preempt kthread starved for 6502 jiffies! g1789 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
[ 85.861859] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[ 85.871025] rcu: RCU grace-period kthread stack dump:
[ 85.876094] rcu_preempt R running task010 2 0x0028
[ 85.883173] Call trace:
[ 85.885636]  __switch_to+0xf8/0x148
[ 85.889137]  __schedule+0x2d8/0x808
[ 85.892636]  schedule+0x48/0x100
[ 85.895875]  schedule_timeout+0x1c8/0x420
[ 85.899900]  rcu_gp_kthread+0x738/0x1b78
[ 85.903836]  kthread+0x158/0x168
[ 85.907075]  ret_from_fork+0x10/0x18
[ 93.283548] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 33s!
[ 93.291569] BUG: workqueue lockup - pool cpus=0-1 flags=0x4 nice=0 stuck for 33s!
[ 93.299105] Showing busy workqueues and worker pools:
[ 93.304189] workqueue events: flags=0x0
[ 93.308116]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[ 93.315031]     pending: vmstat_shepherd
[ 93.318990] workqueue events_unbound: flags=0x2
[ 93.323577]   pwq 4: cpus=0-1 flags=0x4 nice=0 active=1/512 refcnt=3
[ 93.330309]     pending: flush_to_ldisc
[ 93.334161]
[ 93.334164] ======================================================
[ 93.334165] WARNING: possible circular locking dependency detected
[ 93.334167] 5.7.0-08604-g7dcedf8c04c0 #118 Not tainted
[ 93.334168] ------------------------------------------------------
[ 93.334169] stress-ng-hrtim/326 is trying to acquire lock:
[ 93.334171] b236f6aee1a0 (console_owner){-.-.}-{0:0}, at: console_unlock+0x21c/0x678
[ 93.334176]
[ 93.334177] but task is already holding lock:
[ 93.334178] 00207ac0d018 (&pool->lock/1){-.-.}-{2:2}, at: show_workqueue_state+0x288/0x3a8
[ 93.334184]
[ 93.334186] which lock already depends on the new lock.
[ 93.334187]
[ 93.334188]
[ 93.334189] the existing dependency chain (in reverse order) is:
[ 93.334190]
[ 93.334191] -> #2 (&pool->lock/1){-.-.}-{2:2}:
[ 93.334197]        _raw_spin_lock+0x5c/0x78
[ 93.334198]        __queue_work+0x124/0x7c8
[ 93.334199]        queue_work_on+0xd0/0xf0
[ 93.334200]        tty_flip_buffer_push+0x3c/0x48
[ 93.334202]        serial8250_rx_chars+0x74/0x88
[ 93.334203]        fsl8250_handle_irq+0x15c/0x1a0
[ 93.334204]        serial8250_interrupt+0x70/0xb8
[ 93.334206]        __handle_irq_event_percpu+0xe0/0x478
[ 93.334207]        handle_irq_event_percpu+0x40/0x98
[ 93.334208]        handle_irq_event+0x4c/0xd0
[ 93.334209]        handle_fasteoi_irq+0xb4/0x158
[ 93.334211]        generic_handle_irq+0x3c/0x58
[ 93.334212]        __handle_domain_irq+0x68/0xc0
[ 93.334213]        gic_handle_irq+0x6c/0x160
[ 93.334214]        el1_irq+0xbc/0x180
[ 93.334216]        cpuidle_enter_state+0xb4/0x4f8
[ 93.334217]        cpuidle_enter+0x3c/0x50
[ 93.334218]        call_cpuidle+0x44/0x78
[ 93.334219]        do_idle+0x228/0x2c8
[ 93.334220]        cpu_startup_entry+0x2c/0x48
[ 93.334222]        rest_init+0x1ac/0x280
[ 93.334223]        arch_call_rest_init+0x14/0x1c
[ 93.334224]        start_kernel+0x4ec/0x524
[ 93.334225]
[ 93.334226] -> #1 (&port->lock#2){-.-.}-{2:2}:
[ 93.334232]        _raw_spin_lock_irqsave+0x78/0xa0
[ 93.334233]        serial8250_console_write+0x1f4/0x348
[ 93.334234]        univ8250_console_write+0x44/0x58
[ 93.334235]        console_unlock+0x480/0x678
[ 93.334237]        vprintk_emit+0x188/0x370
[ 93.334238]        vprintk_default+0x48/0x58
[ 93.334239]        vprintk_func+0xf0/0x238
[ 93.334240]        printk+0x74/0x94
[ 93.334241]        register_console+0x1a0/0x300
[ 93.334243]        uart_add_one_port+0x4a0/0x4e0
[ 93.334244]        serial8250_register_8250_port+0x2d0/0x4b0
[ 93.334245]        of_platform_serial_probe+0x318/0x650
[ 93.334246]        platform_drv_probe+0x58/0xa8
[ 93.334248]        really_probe+0x10c/0x358
[ 93.334249]        driver_probe_device+0x5c/0x100
[ 93.334250]        device_
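For readers without the linked stress-hrtimers.c handy: the stressor's core pattern is, roughly, re-arming a POSIX timer with a very short expiry from inside its own expiry handler, which keeps the hrtimer machinery busy almost continuously and starves other threads. A heavily condensed sketch of that pattern; the 50 us interval and the choice of SIGRTMIN here are illustrative, not copied from stress-ng (which randomizes the expiry, among other things):

/* Build with: cc -o hrtimer-stress hrtimer-stress.c -lrt (older glibc) */
#include <signal.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

static timer_t timerid;

/* Re-arm the timer with a tiny expiry on every tick, so the CPU
 * spends nearly all of its time servicing hrtimer expirations. */
static void handler(int sig)
{
	struct itimerspec its = {
		.it_value.tv_nsec = 50000, /* illustrative 50 us; stress-ng randomizes this */
	};

	(void)sig;
	timer_settime(timerid, 0, &its, NULL); /* async-signal-safe per POSIX */
}

int main(void)
{
	struct sigevent sev = {
		.sigev_notify = SIGEV_SIGNAL,
		.sigev_signo  = SIGRTMIN,
	};
	struct sigaction sa = { .sa_handler = handler };

	sigaction(SIGRTMIN, &sa, NULL);
	if (timer_create(CLOCK_REALTIME, &sev, &timerid) != 0)
		exit(EXIT_FAILURE);

	handler(0); /* arm the first expiry */
	for (;;)
		pause(); /* each expiry interrupts pause() and re-arms the timer */
}

Run as root under an RT policy, a loop like this can monopolize a CPU between throttling windows, which is consistent with the rcu_preempt kthread starvation and workqueue lockups reported above.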