SRU request submitted:
https://lists.ubuntu.com/archives/kernel-team/2018-May/092540.html
** Description changed:
+ == SRU Justification ==
+ IBM reports that this bug occurs with stop4 alone, resulting in soft lockups
and RCU stalls.
+ This is a kernel synchronization issue leading to a deadlock.
+
+ This bug was introduced by commit 7bc54b652f13 in v4.8-rc1. This
+ regression is fixed by mainline commit c0f7f5b6c6910.
+
+ == Fix ==
+ c0f7f5b6c6910 ("cpufreq: powernv: Fix hardlockup due to synchronous smp_call
in timer interrupt")
+
+ == Regression Potential ==
+ Low. Fixes a current regression. Cc'd to upstream stable, so it has had
+ additional upstream review.
+
+ == Test Case ==
+ A test kernel was built with this patch and tested by the original bug
reporter.
+ The bug reporter states the test kernel resolved the bug.
+
+
Recently we discovered that this bug occurs with stop4 alone, resulting in
soft lockups and RCU stalls.
```
root@ltc-boston125:~# [15523.619395] systemd[1]: systemd-journald.service:
Processes still around after final SIGKILL. Entering failed mode.
[15523.619508] systemd[1]: systemd-journald.service: Failed with result
'timeout'.
[15523.619769] systemd[1]: Failed to start Journal Service.
[15523.620618] systemd[1]: systemd-journald.service: Service has no hold-off
time, scheduling restart.
[15523.620774] systemd[1]: systemd-journald.service: Scheduled restart job,
restart counter is at 21.
[15523.621462] systemd[1]: Stopped Journal Service.
[15523.621635] systemd[1]: systemd-journald.service: Found left-over process
1561 (systemd-journal) in control group while starting unit. Ignoring.
[15523.621756] systemd[1]: This usually indicates unclean termination of a
previous run, or service implementation deficiencies.
[15523.621888] systemd[1]: systemd-journald.service: Found left-over process
69060 (systemd-journal) in control group while starting unit. Ignoring.
[15523.622029] systemd[1]: This usually indica[15541.629904] INFO: rcu_sched
self-detected stall on CPU
[15541.629958] 60-....: (2 GPs behind) idle=146/140000000000002/0
softirq=300022/300022 fqs=999069
[15541.630046] (t=2415546 jiffies g=184827 c=184826 q=57111)
[15541.630101] NMI backtrace for cpu 60
[15541.630135] CPU: 60 PID: 4810 Comm: tlbie_test Tainted: G L
4.15.0-15-generic #16-Ubuntu
[15541.630207] Call Trace:
[15541.630232] [c000201a1da96b00] [c000000000ceb35c] dump_stack+0xb0/0xf4
(unreliable)
[15541.630298] [c000201a1da96b40] [c000000000cf4d48]
nmi_cpu_backtrace+0x1f8/0x200
[15541.630363] [c000201a1da96bd0] [c000000000cf4ee8]
nmi_trigger_cpumask_backtrace+0x198/0x1f0
[15541.630429] [c000201a1da96c60] [c00000000002f2d8]
arch_trigger_cpumask_backtrace+0x28/0x40
[15541.630495] [c000201a1da96c80] [c0000000001a913c]
rcu_dump_cpu_stacks+0xf4/0x158
[15541.630560] [c000201a1da96cd0] [c0000000001a81e8]
rcu_check_callbacks+0x8e8/0xb40
[15541.630625] [c000201a1da96e00] [c0000000001b64a8]
update_process_times+0x48/0x90
[15541.630689] [c000201a1da96e30] [c0000000001ce1f4]
tick_sched_handle.isra.5+0x34/0xd0
[15541.630753] [c000201a1da96e60] [c0000000001ce2f0]
tick_sched_timer+0x60/0xe0
[15541.630818] [c000201a1da96ea0] [c0000000001b7054]
__hrtimer_run_queues+0x144/0x370
[15541.630883] [c000201a1da96f20] [c0000000001b7fac]
hrtimer_interrupt+0xfc/0x350
[15541.630948] [c000201a1da96ff0] [c0000000000248f0]
__timer_interrupt+0x90/0x260
[15541.631013] [c000201a1da97040] [c000000000024d08] timer_interrupt+0x98/0xe0
[15541.631069] [c000201a1da97070] [c000000000009014]
decrementer_common+0x114/0x120
[15541.631135] --- interrupt: 901 at smp_call_function_single+0x134/0x180
[15541.631135] LR = smp_call_function_single+0x110/0x180
[15541.631230] [c000201a1da973d0] [c0000000001d55e0]
smp_call_function_any+0x180/0x250
[15541.631294] [c000201a1da97430] [c000000000acd3e8]
gpstate_timer_handler+0x1e8/0x580
[15541.631359] [c000201a1da974e0] [c0000000001b46b0] call_timer_fn+0x50/0x1c0
[15541.631433] [c000201a1da97560] [c0000000001b4958] expire_timers+0x138/0x1f0
[15541.631488] [c000201a1da975d0] [c0000000001b4bf8]
run_timer_softirq+0x1e8/0x270
[15541.631553] [c000201a1da97670] [c000000000d0d6c8] __do_softirq+0x158/0x3e4
[15541.631608] [c000201a1da97750] [c000000000114be8] irq_exit+0xe8/0x120
[15541.631663] [c000201a1da97770] [c000000000024d0c] timer_interrupt+0x9c/0xe0
[15541.631718] [c000201a1da977a0] [c000000000009014]
decrementer_common+0x114/0x120
[15541.631784] --- interrupt: 901 at smp_call_function_many+0x330/0x450
[15541.631784] LR = smp_call_function_many+0x324/0x450
[15541.631879] [c000201a1da97b00] [c000000000075f18] pmdp_invalidate+0x98/0xe0
[15541.631935] [c000201a1da97b30] [c0000000003a1120]
change_huge_pmd+0xe0/0x270
[15541.632000] [c000201a1da97ba0] [c000000000349278]
change_protection_range+0xb88/0xe40
[15541.632065] [c000201a1da97cf0] [c0000000003496c0]
mprotect_fixup+0x140/0x340
[15541.632129] [c000201a1da97db0] [c000000000349a74] SyS_mprotect+0x1b4/0x350
[15541.632185] [c000201a1da97e30] [c00000000000b184] system_call+0x58/0x6c
[15579.001651] watchdog: BUG: soft lockup - CPU#52 stuck for 23s! [grep:69263]
[15579.001738] Modules linked in: vhost_net vhost tap xt_CHECKSUM
iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4
nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT
nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter
ip6_tables iptable_filter devlink input_leds joydev mac_hid idt_89hpesx ofpart
cmdlinepart powernv_flash ipmi_powernv ipmi_devintf opal_prd mtd
ipmi_msghandler ibmpowernv at24 uio_pdrv_genirq uio vmx_crypto kvm_hv kvm
sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp
libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress
raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor
raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure ast
[15579.002363] i2c_algo_bit hid_generic ttm drm_kms_helper mpt3sas
syscopyarea sysfillrect usbhid sysimgblt fb_sys_fops hid raid_class
crct10dif_vpmsum crc32c_vpmsum drm i40e aacraid scsi_transport_sas
[15579.002524] CPU: 52 PID: 69263 Comm: grep Tainted: G L
4.15.0-15-generic #16-Ubuntu
[15579.002598] NIP: c0000000001d5368 LR: c0000000001d5340 CTR:
c000000000acc7f0
[15579.002664] REGS: c000003e84eff7e0 TRAP: 0901 Tainted: G L
(4.15.0-15-generic)
[15579.002735] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 48044222
XER: 00000000
[15579.002810] CFAR: c01721ed8
[15579.002810] GPR08: c000000001721ed8 0000000000000001 c009e006592e0960
0000000000000000
[15579.002810] GPR12: c000000000acc7f0 c00000000faa3c00
[15579.003084] NIP [c0000000001d5368] smp_call_function_single+0x138/0x180
[15579.003139] LR [c0000000001d5340] smp_call_function_single+0x110/0x180
[15579.003191] Call Trace:
[15579.003217] [c000003e84effa60] [c0000000001d5340]
smp_call_function_single+0x110/0x180 (unreliable)
[15579.003298] [c000003e84effad0] [c0000000001d55e0]
smp_call_function_any+0x180/0x250
[15579.003381] [c000003e84effb30] [c000000000acc840]
powernv_cpufreq_get+0x50/0x70
[15579.003447] [c000003e84effb60] [c000000000ac2b8c] __cpufreq_get+0x5c/0x140
[15579.003503] [c000003e84effba0] [c000000000ac2d18] cpufreq_get+0xa8/0xb0
[15579.003560] [c000003e84effbe0] [c00000000009da50]
pnv_get_proc_freq+0x20/0x50
[15579.003625] [c000003e84effc00] [c0000000000283bc] show_cpuinfo+0x11c/0x400
[15579.003680] [c000003e84effca0] [c00000000040c738] seq_read+0x138/0x610
[15579.003737] [c000003e84effd40] [c00000000047fa38] proc_reg_read+0x88/0xd0
[15579.003794] [c000003e84effd70] [c0000000003d293c] __vfs_read+0x3c/0x70
[15579.003849] [c000003e84effd90] [c0000000003d2a2c] vfs_read+0xbc/0x1b0
[15579.003905] [c000003e84effde0] [c0000000003d3028] SyS_read+0x68/0x110
[15579.003962] [c000003e84effe30] [c00000000000b184] system_call+0x58/0x6c
[15579.004016] Instruction dump:
[15579.004051] 7fe4fb78 4bfffd4d 813f0018 71290001 4182002c 48000014 60000000
60000000
[15579.004121] 60000000 60420000 7c210b78 7c421378 <813f0018> 71290001
4082fff0 7c2004ac
[15604.648202] INFO: rcu_sched self-detected stall on CPU
[15604.648260] 60-....: (2 GPs behind) idle=146/140000000000002/0
softirq=300022/300022 fqs=1005652
[15604.648332] (t=2431300 jiffies g=184827 c=184826 q=57308)
[15604.648385] NMI backtrace for cpu 60
[15604.648419] CPU: 60 PID: 4810 Comm: tlbie_test Tainted: G L
4.15.0-15-generic #16-Ubuntu
[15604.648491] Call Trace:
[15604.648515] [c000201a1da96b00] [c000000000ceb35c] dump_stack+0xb0/0xf4
(unreliable)
[15604.648581] [c000201a1da96b40] [c000000000cf4d48]
nmi_cpu_backtrace+0x1f8/0x200
[15604.648647] [c000201a1da96bd0] [c000000000cf4ee8]
nmi_trigger_cpumask_backtrace+0x198/0x1f0
[15604.648728] [c000201a1da96c60] [c00000000002f2d8]
arch_trigger_cpumask_backtrace+0x28/0x40
[15604.648793] [c000201a1da96c80] [c0000000001a913c]
rcu_dump_cpu_stacks+0xf4/0x158
[15604.648858] [c000201a1da96cd0] [c0000000001a81e8]
rcu_check_callbacks+0x8e8/0xb40
[15604.648924] [c000201a1da96e00] [c0000000001b64a8]
update_process_times+0x48/0x90
[15604.648988] [c000201a1da96e30] [c0000000001ce1f4]
tick_sched_handle.isra.5+0x34/0xd0
[15604.649052] [c000201a1da96e60] [c0000000001ce2f0]
tick_sched_timer+0x60/0xe0
[15604.649118] [c000201a1da96ea0] [c0000000001b7054]
__hrtimer_run_queues+0x144/0x370
[15604.649183] [c000201a1da96f20] [c0000000001b7fac]
hrtimer_interrupt+0xfc/0x350
[15604.649248] [c000201a1da96ff0] [c0000000000248f0]
__timer_interrupt+0x90/0x260
[15604.649313] [c000201a1da97040] [c000000000024d08] timer_interrupt+0x98/0xe0
[15604.649369] [c000201a1da97070] [c000000000009014]
decrementer_common+0x114/0x120
[15604.649435] --- interrupt: 901 at smp_call_function_single+0x138/0x180
[15604.649435] LR = smp_call_function_single+0x110/0x180
[15604.649530] [c000201a1da973d0] [c0000000001d55e0]
smp_call_function_any+0x180/0x250
[15604.649595] [c000201a1da97430] [c000000000acd3e8]
gpstate_timer_handler+0x1e8/0x580
[15604.649660] [c000201a1da974e0] [c0000000001b46b0] call_timer_fn+0x50/0x1c0
[15604.649715] [c000201a1da97560] [c0000000001b4958] expire_timers+0x138/0x1f0
[15604.649770] [c000201a1da975d0] [c0000000001b4bf8]
run_timer_softirq+0x1e8/0x270
[15604.649835] [c000201a1da97670] [c000000000d0d6c8] __do_softirq+0x158/0x3e4
[15604.649891] [c000201a1da97750] [c000000000114be8] irq_exit+0xe8/0x120
[15604.649946] [c000201a1da97770] [c000000000024d0c] timer_interrupt+0x9c/0xe0
[15604.650002] [c000201a1da977a0] [c000000000009014]
decrementer_common+0x114/0x120
[15604.650084] --- interrupt: 901 at smp_call_function_many+0x330/0x450
[15604.650084] LR = smp_call_function_many+0x324/0x450
[15604.650179] [c000201a1da97b00] [c000000000075f18] pmdp_invalidate+0x98/0xe0
[15604.650235] [c000201a1da97b30] [c0000000003a1120]
change_huge_pmd+0xe0/0x270
[15604.650301] [c000201a1da97ba0] [c000000000349278]
change_protection_range+0xb88/0xe40
[15604.650366] [c000201a1da97cf0] [c0000000003496c0]
mprotect_fixup+0x140/0x340
[15604.650430] [c000201a1da97db0] [c000000000349a74] SyS_mprotect+0x1b4/0x350
[15604.650486] [c000201a1da97e30] [c00000000000b184] system_call+0x58/0x6c
[15667.666494] INFO: rcu_sched self-detected stall on CPU
[15667.666550] 60-....: (2 GPs behind) idle=146/140000000000002/0
softirq=300022/300022 fqs=1012258
[15667.666622] (t=2447054 jiffies g=184827 c=184826 q=57457)
[15667.666675] NMI backtrace for cpu 60
[15667.666709] CPU: 60 PID: 4810 Comm: tlbie_test Tainted: G L
4.15.0-15-generic #16-Ubuntu
[15667.666781] Call Trace:
[15667.666805] [c000201a1da96b00] [c000000000ceb35c] dump_stack+0xb0/0xf4
(unreliable)
[15667.666871] [c000201a1da96b40] [c000000000cf4d48]
nmi_cpu_backtrace+0x1f8/0x200
[15667.666937] [c000201a1da96bd0] [c000000000cf4ee8]
nmi_trigger_cpumask_backtrace+0x198/0x1f0
[15667.667002] [c000201a1da96c60] [c00000000002f2d8]
arch_trigger_cpumask_backtrace+0x28/0x40
[15667.667086] [c000201a1da96c80] [c0000000001a913c]
rcu_dump_cpu_stacks+0xf4/0x158
[15667.667151] [c000201a1da96cd0] [c0000000001a81e8]
rcu_check_callbacks+0x8e8/0xb40
[15667.667216] [c000201a1da96e00] [c0000000001b64a8]
update_process_times+0x48/0x90
[15667.667280] [c000201a1da96e30] [c0000000001ce1f4]
tick_sched_handle.isra.5+0x34/0xd0
[15667.667344] [c000201a1da96e60] [c0000000001ce2f0]
tick_sched_timer+0x60/0xe0
[15667.667409] [c000201a1da96ea0] [c0000000001b7054]
__hrtimer_run_queues+0x144/0x370
[15667.667474] [c000201a1da96f20] [c0000000001b7fac]
hrtimer_interrupt+0xfc/0x350
[15667.667539] [c000201a1da96ff0] [c0000000000248f0]
__timer_interrupt+0x90/0x260
[15667.667604] [c000201a1da97040] [c000000000024d08] timer_interrupt+0x98/0xe0
[15667.667660] [c000201a1da97070] [c000000000009014]
decrementer_common+0x114/0x120
[15667.667727] --- interrupt: 901 at smp_call_function_single+0x130/0x180
[15667.667727] LR = smp_call_function_single+0x110/0x180
[15667.667821] [c000201a1da973d0] [c0000000001d55e0]
smp_call_function_any+0x180/0x250
[15667.667886] [c000201a1da97430] [c000000000acd3e8]
gpstate_timer_handler+0x1e8/0x580
[15667.667951] [c000201a1da974e0] [c0000000001b46b0] call_timer_fn+0x50/0x1c0
[15667.668006] [c000201a1da97560] [c0000000001b4958] expire_timers+0x138/0x1f0
[15667.668061] [c000201a1da975d0] [c0000000001b4bf8]
run_timer_softirq+0x1e8/0x270
[15667.668126] [c000201a1da97670] [c000000000d0d6c8] __do_softirq+0x158/0x3e4
[15667.668181] [c000201a1da97750] [c000000000114be8] irq_exit+0xe8/0x120
[15667.668236] [c000201a1da97770] [c000000000024d0c] timer_interrupt+0x9c/0xe0
[15667.668292] [c000201a1da977a0] [c000000000009014]
decrementer_common+0x114/0x120
[15667.668358] --- interrupt: 901 at smp_call_function_many+0x330/0x450
[15667.668358] LR = smp_call_function_many+0x324/0x450
[15667.668469] [c000201a1da97b00] [c000000000075f18] pmdp_invalidate+0x98/0xe0
[15667.668524] [c000201a1da97b30] [c0000000003a1120]
change_huge_pmd+0xe0/0x270
[15667.668589] [c000201a1da97ba0] [c000000000349278]
change_protection_range+0xb88/0xe40
[15667.668654] [c000201a1da97cf0] [c0000000003496c0]
mprotect_fixup+0x140/0x340
[15667.668719] [c000201a1da97db0] [c000000000349a74] SyS_mprotect+0x1b4/0x350
[15667.668775] [c000201a1da97e30] [c00000000000b184] system_call+0x58/0x6c
```
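The mail archive hard-wraps the traces above, which makes the call chain hard to follow. As a reading aid only, here is a minimal Python sketch that collapses a wrapped trace into its symbol chain, innermost frame first. The names `FRAME_RE`, `summarize_trace` and `SAMPLE` are hypothetical helpers, not part of any kernel tooling, and the sample text is copied from the CPU 60 trace above.

```python
import re

# Matches either a stack frame "[sp] [pc] symbol+0xoff/0xsize" or an
# "--- interrupt: <trap> at symbol+0xoff/0xsize" marker. \s+ also spans the
# hard line wraps inserted by the mail archive. The "[seconds.micros]"
# timestamp never matches because it is not 16 hex digits.
FRAME_RE = re.compile(
    r"\[[0-9a-f]{16}\]\s+\[[0-9a-f]{16}\]\s+"
    r"(?P<frame>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"|--- interrupt: (?P<trap>[0-9a-f]+) at "
    r"(?P<at>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def summarize_trace(text):
    """Return the symbols of a wrapped call trace, innermost frame first."""
    chain = []
    for m in FRAME_RE.finditer(text):
        if m.group("frame"):
            chain.append(m.group("frame"))
        else:
            chain.append("<interrupt %s at %s>" % (m.group("trap"), m.group("at")))
    return chain


# Excerpt from the CPU 60 trace, including the archive's line wraps.
SAMPLE = """\
[15541.631069] [c000201a1da97070] [c000000000009014]
decrementer_common+0x114/0x120
[15541.631135] --- interrupt: 901 at smp_call_function_single+0x134/0x180
[15541.631135] LR = smp_call_function_single+0x110/0x180
[15541.631230] [c000201a1da973d0] [c0000000001d55e0]
smp_call_function_any+0x180/0x250
[15541.631294] [c000201a1da97430] [c000000000acd3e8]
gpstate_timer_handler+0x1e8/0x580
[15541.631359] [c000201a1da974e0] [c0000000001b46b0] call_timer_fn+0x50/0x1c0
"""

if __name__ == "__main__":
    for symbol in summarize_trace(SAMPLE):
        print(symbol)
```

Run on the full CPU 60 trace, this shows the chain that matters here: `gpstate_timer_handler` is running from the timer softirq and is itself sitting in `smp_call_function_single`, on top of an interrupted `smp_call_function_many` from `pmdp_invalidate`.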
Per feedback from Vaidy, this does not currently appear to be a firmware
problem. It seems to be a kernel synchronization issue leading to a
deadlock.
-------
Fix identified by Shilpa as per Nick Piggin's recommendation. Kernel fix is
currently being tested.
-------
Fix upstream in 4.17-rc3
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v4.17-rc3&id=c0f7f5b6c69107ca92909512533e70258ee19188
cpufreq: powernv: Fix hardlockup due to synchronous smp_call in timer
interrupt
Posted to stable as well.
Mirroring to Launchpad for Canonical to pull in commit.
--
https://bugs.launchpad.net/bugs/1768898
Title:
smp_call_function_single/many core hangs with stop4 alone