[Kernel-packages] [Bug 1768898] Re: smp_call_function_single/many core hangs with stop4 alone

2018-06-18 Thread Manoj Iyer
** Changed in: ubuntu-power-systems
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1768898

Title:
  smp_call_function_single/many core hangs with stop4 alone

Status in The Ubuntu-power-systems project:
  Fix Released
Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Bionic:
  Fix Released

Bug description:
  == SRU Justification ==
  IBM reports that this bug occurs with stop4, which results in soft
  lockups/RCU stalls.
  This is a kernel synchronization issue leading to a deadlock.

  This bug was introduced by commit 7bc54b652f13 in v4.8-rc1.  This
  regression is fixed by mainline commit c0f7f5b6c6910.

  == Fix ==
  c0f7f5b6c6910 ("cpufreq: powernv: Fix hardlockup due to synchronous
  smp_call in timer interrupt")
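
The deadlock named in the justification can be illustrated with a toy model. This is only a sketch, not kernel code: it assumes, based on the commit title and the call trace in this report, that a CPU issuing a synchronous cross-CPU call cannot service incoming calls until its own call completes, so two CPUs that target each other wait forever. The function `find_wait_cycle`, the `waits_on` mapping, and CPU 12 are hypothetical; only CPU 60 comes from the trace.

```python
def find_wait_cycle(waits_on):
    """Return the CPUs that form a wait cycle, or None if there is none.

    waits_on maps a CPU id to the CPU it is synchronously waiting on.
    Assumption of this model: a waiting CPU cannot answer incoming
    calls, so any cycle in the mapping is a deadlock.
    """
    for start in waits_on:
        path = []
        seen = set()
        cpu = start
        # Follow the chain of waits until it ends or revisits a CPU.
        while cpu in waits_on and cpu not in seen:
            seen.add(cpu)
            path.append(cpu)
            cpu = waits_on[cpu]
        if cpu in seen:
            # Drop any lead-in so only the cycle itself is returned.
            return path[path.index(cpu):]
    return None


# CPU 60 is the stalled CPU in the trace; CPU 12 is a made-up peer.
print(find_wait_cycle({60: 12, 12: 60}))  # → [60, 12]
print(find_wait_cycle({60: 12}))          # → None
```

In this model a one-way wait completes, while a mutual wait never does, which matches the symptom of a CPU spinning in smp_call_function_single from timer context.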

  == Regression Potential ==
  Low. Fixes current regression.  Cc'd to upstream stable, so it has had
  additional upstream review.

  == Test Case ==
  A test kernel was built with this patch and tested by the original bug
  reporter.
  The bug reporter states the test kernel resolved the bug.


  Recently we discovered that this bug occurs with stop4 alone, which
  results in soft lockups/RCU stalls.

  ```
  root@ltc-boston125:~# [15523.619395] systemd[1]: systemd-journald.service: Processes still around after final SIGKILL. Entering failed mode.
  [15523.619508] systemd[1]: systemd-journald.service: Failed with result 'timeout'.
  [15523.619769] systemd[1]: Failed to start Journal Service.
  [15523.620618] systemd[1]: systemd-journald.service: Service has no hold-off time, scheduling restart.
  [15523.620774] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 21.
  [15523.621462] systemd[1]: Stopped Journal Service.
  [15523.621635] systemd[1]: systemd-journald.service: Found left-over process 1561 (systemd-journal) in control group while starting unit. Ignoring.
  [15523.621756] systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
  [15523.621888] systemd[1]: systemd-journald.service: Found left-over process 69060 (systemd-journal) in control group while starting unit. Ignoring.
  [15523.622029] systemd[1]: This usually indica
  [15541.629904] INFO: rcu_sched self-detected stall on CPU
  [15541.629958]60-: (2 GPs behind) idle=146/142/0 softirq=300022/300022 fqs=999069
  [15541.630046] (t=2415546 jiffies g=184827 c=184826 q=57111)
  [15541.630101] NMI backtrace for cpu 60
  [15541.630135] CPU: 60 PID: 4810 Comm: tlbie_test Tainted: G L   4.15.0-15-generic #16-Ubuntu
  [15541.630207] Call Trace:
  [15541.630232] [c000201a1da96b00] [c0ceb35c] dump_stack+0xb0/0xf4 (unreliable)
  [15541.630298] [c000201a1da96b40] [c0cf4d48] nmi_cpu_backtrace+0x1f8/0x200
  [15541.630363] [c000201a1da96bd0] [c0cf4ee8] nmi_trigger_cpumask_backtrace+0x198/0x1f0
  [15541.630429] [c000201a1da96c60] [c002f2d8] arch_trigger_cpumask_backtrace+0x28/0x40
  [15541.630495] [c000201a1da96c80] [c01a913c] rcu_dump_cpu_stacks+0xf4/0x158
  [15541.630560] [c000201a1da96cd0] [c01a81e8] rcu_check_callbacks+0x8e8/0xb40
  [15541.630625] [c000201a1da96e00] [c01b64a8] update_process_times+0x48/0x90
  [15541.630689] [c000201a1da96e30] [c01ce1f4] tick_sched_handle.isra.5+0x34/0xd0
  [15541.630753] [c000201a1da96e60] [c01ce2f0] tick_sched_timer+0x60/0xe0
  [15541.630818] [c000201a1da96ea0] [c01b7054] __hrtimer_run_queues+0x144/0x370
  [15541.630883] [c000201a1da96f20] [c01b7fac] hrtimer_interrupt+0xfc/0x350
  [15541.630948] [c000201a1da96ff0] [c00248f0] __timer_interrupt+0x90/0x260
  [15541.631013] [c000201a1da97040] [c0024d08] timer_interrupt+0x98/0xe0
  [15541.631069] [c000201a1da97070] [c0009014] decrementer_common+0x114/0x120
  [15541.631135] --- interrupt: 901 at smp_call_function_single+0x134/0x180
  [15541.631135] LR = smp_call_function_single+0x110/0x180
  [15541.631230] [c000201a1da973d0] [c01d55e0] smp_call_function_any+0x180/0x250
  [15541.631294] [c000201a1da97430] [c0acd3e8] gpstate_timer_handler+0x1e8/0x580
  [15541.631359] [c000201a1da974e0] [c01b46b0] call_timer_fn+0x50/0x1c0
  [15541.631433] [c000201a1da97560] [c01b4958] expire_timers+0x138/0x1f0
  [15541.631488] [c000201a1da975d0] [c01b4bf8] run_timer_softirq+0x1e8/0x270
  [15541.631553] [c000201a1da97670] [c0d0d6c8] __do_softirq+0x158/0x3e4
  [15541.631608] [c000201a1da97750] [c0114be8] irq_exit+0xe8/0x120
  [15541.631663] [c000201a1da97770] [c0024d0c] timer_interrupt+0x9c/0xe0
  [15541.631718] [c000201a1da977a0] [c0009014] decrementer_common+0x114/0x120
  [15541.631784] --- interrupt: 901 at smp_call_function_many+0x330/0x450
  ```

[Kernel-packages] [Bug 1768898] Re: smp_call_function_single/many core hangs with stop4 alone

2018-06-14 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 4.15.0-23.25

---
linux (4.15.0-23.25) bionic; urgency=medium

  * linux: 4.15.0-23.25 -proposed tracker (LP: #1772927)

  * arm64 SDEI support needs trampoline code for KPTI (LP: #1768630)
    - arm64: mmu: add the entry trampolines start/end section markers into
      sections.h
    - arm64: sdei: Add trampoline code for remapping the kernel

  * Some PCIe errors not surfaced through rasdaemon (LP: #1769730)
    - ACPI: APEI: handle PCIe AER errors in separate function
    - ACPI: APEI: call into AER handling regardless of severity

  * qla2xxx: Fix page fault at kmem_cache_alloc_node() (LP: #1770003)
    - scsi: qla2xxx: Fix session cleanup for N2N
    - scsi: qla2xxx: Remove unused argument from
      qlt_schedule_sess_for_deletion()
    - scsi: qla2xxx: Serialize session deletion by using work_lock
    - scsi: qla2xxx: Serialize session free in qlt_free_session_done
    - scsi: qla2xxx: Don't call dma_free_coherent with IRQ disabled.
    - scsi: qla2xxx: Fix warning in qla2x00_async_iocb_timeout()
    - scsi: qla2xxx: Prevent relogin trigger from sending too many commands
    - scsi: qla2xxx: Fix double free bug after firmware timeout
    - scsi: qla2xxx: Fixup locking for session deletion

  * Several hisi_sas bug fixes (LP: #1768974)
    - scsi: hisi_sas: dt-bindings: add an property of signal attenuation
    - scsi: hisi_sas: support the property of signal attenuation for v2 hw
    - scsi: hisi_sas: fix the issue of link rate inconsistency
    - scsi: hisi_sas: fix the issue of setting linkrate register
    - scsi: hisi_sas: increase timer expire of internal abort task
    - scsi: hisi_sas: remove unused variable hisi_sas_devices.running_req
    - scsi: hisi_sas: fix return value of hisi_sas_task_prep()
    - scsi: hisi_sas: Code cleanup and minor bug fixes

  * [bionic] machine stuck and bonding not working well when nvmet_rdma module
    is loaded (LP: #1764982)
    - nvmet-rdma: Don't flush system_wq by default during remove_one
    - nvme-rdma: Don't flush delete_wq by default during remove_one

  * Warnings/hang during error handling of SATA disks on SAS controller
    (LP: #1768971)
    - scsi: libsas: defer ata device eh commands to libata

  * Hotplugging a SATA disk into a SAS controller may cause crash (LP: #1768948)
    - ata: do not schedule hot plug if it is a sas host

  * ISST-LTE:pKVM:Ubuntu1804: rcu_sched self-detected stall on CPU follow by CPU
    ATTEMPT TO RE-ENTER FIRMWARE! (LP: #1767927)
    - powerpc/powernv: Handle unknown OPAL errors in opal_nvram_write()
    - powerpc/64s: return more carefully from sreset NMI
    - powerpc/64s: sreset panic if there is no debugger or crash dump handlers

  * fsnotify: Fix fsnotify_mark_connector race (LP: #1765564)
    - fsnotify: Fix fsnotify_mark_connector race

  * Hang on network interface removal in Xen virtual machine (LP: #1771620)
    - xen-netfront: Fix hang on device removal

  * HiSilicon HNS NIC names are truncated in /proc/interrupts (LP: #1765977)
    - net: hns: Avoid action name truncation

  * Ubuntu 18.04 kernel crashed while in degraded mode (LP: #1770849)
    - SAUCE: powerpc/perf: Fix memory allocation for core-imc based on
      num_possible_cpus()

  * Switch Build-Depends: transfig to fig2dev (LP: #1770770)
    - [Config] update Build-Depends: transfig to fig2dev

  * smp_call_function_single/many core hangs with stop4 alone (LP: #1768898)
    - cpufreq: powernv: Fix hardlockup due to synchronous smp_call in timer
      interrupt

  * Add d-i support for Huawei NICs (LP: #1767490)
    - d-i: add hinic to nic-modules udeb

  * unregister_netdevice: waiting for eth0 to become free. Usage count = 5
    (LP: #1746474)
    - xfrm: reuse uncached_list to track xdsts

  * Include nfp driver in linux-modules (LP: #1768526)
    - [Config] Add nfp.ko to generic inclusion list

  * Kernel panic on boot (m1.small in cn-north-1) (LP: #1771679)
    - x86/xen: Reset VCPU0 info pointer after shared_info remap

  * CVE-2018-3639 (x86)
    - x86/bugs: Fix the parameters alignment and missing void
    - KVM: SVM: Move spec control call after restore of GS
    - x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP
    - x86/cpufeatures: Disentangle MSR_SPEC_CTRL enumeration from IBRS
    - x86/cpufeatures: Disentangle SSBD enumeration
    - x86/cpufeatures: Add FEATURE_ZEN
    - x86/speculation: Handle HT correctly on AMD
    - x86/bugs, KVM: Extend speculation control for VIRT_SPEC_CTRL
    - x86/speculation: Add virtualized speculative store bypass disable support
    - x86/speculation: Rework speculative_store_bypass_update()
    - x86/bugs: Unify x86_spec_ctrl_{set_guest,restore_host}
    - x86/bugs: Expose x86_spec_ctrl_base directly
    - x86/bugs: Remove x86_spec_ctrl_set()
    - x86/bugs: Rework spec_ctrl base and mask logic
    - x86/speculation, KVM: Implement support for VIRT_SPEC_CTRL/LS_CFG
    - KVM: SVM: Implement VIRT_SPEC_CTRL support for SSBD
    - x86/bugs: 

[Kernel-packages] [Bug 1768898] Re: smp_call_function_single/many core hangs with stop4 alone

2018-06-11 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 4.15.0-23.25

[Kernel-packages] [Bug 1768898] Re: smp_call_function_single/many core hangs with stop4 alone

2018-06-11 Thread Manoj Iyer
** Changed in: ubuntu-power-systems
   Status: In Progress => Fix Committed

** Changed in: linux (Ubuntu)
   Status: In Progress => Fix Committed

Status in The Ubuntu-power-systems project:
  Fix Committed
Status in linux package in Ubuntu:
  Fix Committed
Status in linux source package in Bionic:
  Fix Committed

[Kernel-packages] [Bug 1768898] Re: smp_call_function_single/many core hangs with stop4 alone

2018-05-26 Thread bugproxy
--- Comment From ppaid...@in.ibm.com 2018-05-26 03:34 EDT---
Linux ltc-boston125 4.15.0-23-generic #25-Ubuntu SMP Wed May 23 17:59:00 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux

Installed and tested the proposed kernel above; tests ran fine for
more than 12 hours with no issues seen. Thanks.
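
When verifying, it helps to confirm that the running kernel is the package named in the changelog. The sketch below derives the package version from a `uname -a` line like the one in this comment. It assumes the usual Ubuntu layout, where the release string carries the ABI number and the `#NN-Ubuntu` field carries the upload number; the function name `package_version_from_uname` is hypothetical, and other version formats simply return None.

```python
import re


def package_version_from_uname(uname_line):
    """Map e.g. '4.15.0-23-generic #25-Ubuntu' to '4.15.0-23.25'.

    Returns None when the line does not match the assumed layout.
    """
    m = re.search(r"(\d+\.\d+\.\d+-\d+)-\S+ #(\d+)-Ubuntu", uname_line)
    if m is None:
        return None
    # group(1) is version plus ABI, group(2) is the upload number.
    return f"{m.group(1)}.{m.group(2)}"


line = ("Linux ltc-boston125 4.15.0-23-generic #25-Ubuntu SMP "
        "Wed May 23 17:59:00 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux")
print(package_version_from_uname(line))  # → 4.15.0-23.25
```

The result matches the fixed package version 4.15.0-23.25 reported elsewhere in this thread.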

** Tags removed: verification-needed-bionic
** Tags added: verification-done-bionic

Status in The Ubuntu-power-systems project:
  In Progress
Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Bionic:
  Fix Committed

[Kernel-packages] [Bug 1768898] Re: smp_call_function_single/many core hangs with stop4 alone

2018-05-24 Thread Brad Figg
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag
'verification-needed-bionic' to 'verification-done-bionic'. If the
problem still exists, change the tag 'verification-needed-bionic' to
'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: verification-needed-bionic

Status in The Ubuntu-power-systems project:
  In Progress
Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Bionic:
  Fix Committed

[Kernel-packages] [Bug 1768898] Re: smp_call_function_single/many core hangs with stop4 alone

2018-05-23 Thread Stefan Bader
** Changed in: linux (Ubuntu Bionic)
   Status: In Progress => Fix Committed

Status in The Ubuntu-power-systems project:
  In Progress
Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Bionic:
  Fix Committed

  [15541.631718] [c000201a1da977a0] [c0009014] 
decrementer_common+0x114/0x120
  [15541.631784] --- interrupt: 901 at smp_call_function_many+0x330/0x450
  [15541.631784] 

[Kernel-packages] [Bug 1768898] Re: smp_call_function_single/many core hangs with stop4 alone

2018-05-18 Thread Joseph Salisbury
SRU request submitted:
https://lists.ubuntu.com/archives/kernel-team/2018-May/092540.html

** Description changed:

+ == SRU Justification ==
+ IBM reports that this bug occurs with stop4 and results in soft lockups/RCU stalls.
+ This is a kernel synchronization issue leading to a deadlock.
+ 
+ This bug was introduced by commit 7bc54b652f13 in v4.8-rc1.  This
+ regression is fixed by mainline commit c0f7f5b6c6910.
+ 
+ == Fix ==
+ c0f7f5b6c6910 ("cpufreq: powernv: Fix hardlockup due to synchronous smp_call in timer interrupt")
+ 
+ == Regression Potential ==
+ Low. This fixes a current regression. The patch was Cc'd to upstream
+ stable, so it has had additional upstream review.
+ 
+ == Test Case ==
+ A test kernel was built with this patch and tested by the original bug reporter.
+ The bug reporter states the test kernel resolved the bug.
+ 
+ 
  Recently we discovered this bug occurs just alone with stop4 which
  results in soft lockups/rcu stalls.
  
  ```
  root@ltc-boston125:~# [15523.619395] systemd[1]: systemd-journald.service: 
Processes still around after final SIGKILL. Entering failed mode.
  [15523.619508] systemd[1]: systemd-journald.service: Failed with result 
'timeout'.
  [15523.619769] systemd[1]: Failed to start Journal Service.
  [15523.620618] systemd[1]: systemd-journald.service: Service has no hold-off 
time, scheduling restart.
  [15523.620774] systemd[1]: systemd-journald.service: Scheduled restart job, 
restart counter is at 21.
  [15523.621462] systemd[1]: Stopped Journal Service.
  [15523.621635] systemd[1]: systemd-journald.service: Found left-over process 
1561 (systemd-journal) in control group while starting unit. Ignoring.
  [15523.621756] systemd[1]: This usually indicates unclean termination of a 
previous run, or service implementation deficiencies.
  [15523.621888] systemd[1]: systemd-journald.service: Found left-over process 
69060 (systemd-journal) in control group while starting unit. Ignoring.
  [15523.622029] systemd[1]: This usually indica[15541.629904] INFO: rcu_sched 
self-detected stall on CPU
- [15541.629958]60-: (2 GPs behind) idle=146/142/0 
softirq=300022/300022 fqs=999069 
+ [15541.629958]60-: (2 GPs behind) idle=146/142/0 
softirq=300022/300022 fqs=999069
  [15541.630046] (t=2415546 jiffies g=184827 c=184826 q=57111)
  [15541.630101] NMI backtrace for cpu 60
  [15541.630135] CPU: 60 PID: 4810 Comm: tlbie_test Tainted: G L   
4.15.0-15-generic #16-Ubuntu
  [15541.630207] Call Trace:
  [15541.630232] [c000201a1da96b00] [c0ceb35c] dump_stack+0xb0/0xf4 
(unreliable)
  [15541.630298] [c000201a1da96b40] [c0cf4d48] 
nmi_cpu_backtrace+0x1f8/0x200
  [15541.630363] [c000201a1da96bd0] [c0cf4ee8] 
nmi_trigger_cpumask_backtrace+0x198/0x1f0
  [15541.630429] [c000201a1da96c60] [c002f2d8] 
arch_trigger_cpumask_backtrace+0x28/0x40
  [15541.630495] [c000201a1da96c80] [c01a913c] 
rcu_dump_cpu_stacks+0xf4/0x158
  [15541.630560] [c000201a1da96cd0] [c01a81e8] 
rcu_check_callbacks+0x8e8/0xb40
  [15541.630625] [c000201a1da96e00] [c01b64a8] 
update_process_times+0x48/0x90
  [15541.630689] [c000201a1da96e30] [c01ce1f4] 
tick_sched_handle.isra.5+0x34/0xd0
  [15541.630753] [c000201a1da96e60] [c01ce2f0] 
tick_sched_timer+0x60/0xe0
  [15541.630818] [c000201a1da96ea0] [c01b7054] 
__hrtimer_run_queues+0x144/0x370
  [15541.630883] [c000201a1da96f20] [c01b7fac] 
hrtimer_interrupt+0xfc/0x350
  [15541.630948] [c000201a1da96ff0] [c00248f0] 
__timer_interrupt+0x90/0x260
  [15541.631013] [c000201a1da97040] [c0024d08] timer_interrupt+0x98/0xe0
  [15541.631069] [c000201a1da97070] [c0009014] 
decrementer_common+0x114/0x120
  [15541.631135] --- interrupt: 901 at smp_call_function_single+0x134/0x180
  [15541.631135] LR = smp_call_function_single+0x110/0x180
  [15541.631230] [c000201a1da973d0] [c01d55e0] 
smp_call_function_any+0x180/0x250
  [15541.631294] [c000201a1da97430] [c0acd3e8] 
gpstate_timer_handler+0x1e8/0x580
  [15541.631359] [c000201a1da974e0] [c01b46b0] call_timer_fn+0x50/0x1c0
  [15541.631433] [c000201a1da97560] [c01b4958] expire_timers+0x138/0x1f0
  [15541.631488] [c000201a1da975d0] [c01b4bf8] 
run_timer_softirq+0x1e8/0x270
  [15541.631553] [c000201a1da97670] [c0d0d6c8] __do_softirq+0x158/0x3e4
  [15541.631608] [c000201a1da97750] [c0114be8] irq_exit+0xe8/0x120
  [15541.631663] [c000201a1da97770] [c0024d0c] timer_interrupt+0x9c/0xe0
  [15541.631718] [c000201a1da977a0] [c0009014] 
decrementer_common+0x114/0x120
  [15541.631784] --- interrupt: 901 at smp_call_function_many+0x330/0x450
  [15541.631784] LR = smp_call_function_many+0x324/0x450
  [15541.631879] [c000201a1da97b00] [c0075f18] pmdp_invalidate+0x98/0xe0
  [15541.631935] [c000201a1da97b30] [c03a1120] 
change_huge_pmd+0xe0/0x270
  [15541.632000] [c000201a1da97ba0] 

[Kernel-packages] [Bug 1768898] Re: smp_call_function_single/many core hangs with stop4 alone

2018-05-14 Thread Frank Heimes
** Changed in: ubuntu-power-systems
   Status: Triaged => In Progress

Title:
  smp_call_function_single/many core hangs with stop4 alone

Status in The Ubuntu-power-systems project:
  In Progress
Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Bionic:
  In Progress

Bug description:
  Recently we discovered that this bug occurs with stop4 alone, which
  results in soft lockups/RCU stalls.

  ```
  root@ltc-boston125:~# [15523.619395] systemd[1]: systemd-journald.service: Processes still around after final SIGKILL. Entering failed mode.
  [15523.619508] systemd[1]: systemd-journald.service: Failed with result 'timeout'.
  [15523.619769] systemd[1]: Failed to start Journal Service.
  [15523.620618] systemd[1]: systemd-journald.service: Service has no hold-off time, scheduling restart.
  [15523.620774] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 21.
  [15523.621462] systemd[1]: Stopped Journal Service.
  [15523.621635] systemd[1]: systemd-journald.service: Found left-over process 1561 (systemd-journal) in control group while starting unit. Ignoring.
  [15523.621756] systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
  [15523.621888] systemd[1]: systemd-journald.service: Found left-over process 69060 (systemd-journal) in control group while starting unit. Ignoring.
  [15523.622029] systemd[1]: This usually indica[15541.629904] INFO: rcu_sched self-detected stall on CPU
  [15541.629958]60-: (2 GPs behind) idle=146/142/0 softirq=300022/300022 fqs=999069
  [15541.630046] (t=2415546 jiffies g=184827 c=184826 q=57111)
  [15541.630101] NMI backtrace for cpu 60
  [15541.630135] CPU: 60 PID: 4810 Comm: tlbie_test Tainted: G L   4.15.0-15-generic #16-Ubuntu
  [15541.630207] Call Trace:
  [15541.630232] [c000201a1da96b00] [c0ceb35c] dump_stack+0xb0/0xf4 (unreliable)
  [15541.630298] [c000201a1da96b40] [c0cf4d48] nmi_cpu_backtrace+0x1f8/0x200
  [15541.630363] [c000201a1da96bd0] [c0cf4ee8] nmi_trigger_cpumask_backtrace+0x198/0x1f0
  [15541.630429] [c000201a1da96c60] [c002f2d8] arch_trigger_cpumask_backtrace+0x28/0x40
  [15541.630495] [c000201a1da96c80] [c01a913c] rcu_dump_cpu_stacks+0xf4/0x158
  [15541.630560] [c000201a1da96cd0] [c01a81e8] rcu_check_callbacks+0x8e8/0xb40
  [15541.630625] [c000201a1da96e00] [c01b64a8] update_process_times+0x48/0x90
  [15541.630689] [c000201a1da96e30] [c01ce1f4] tick_sched_handle.isra.5+0x34/0xd0
  [15541.630753] [c000201a1da96e60] [c01ce2f0] tick_sched_timer+0x60/0xe0
  [15541.630818] [c000201a1da96ea0] [c01b7054] __hrtimer_run_queues+0x144/0x370
  [15541.630883] [c000201a1da96f20] [c01b7fac] hrtimer_interrupt+0xfc/0x350
  [15541.630948] [c000201a1da96ff0] [c00248f0] __timer_interrupt+0x90/0x260
  [15541.631013] [c000201a1da97040] [c0024d08] timer_interrupt+0x98/0xe0
  [15541.631069] [c000201a1da97070] [c0009014] decrementer_common+0x114/0x120
  [15541.631135] --- interrupt: 901 at smp_call_function_single+0x134/0x180
  [15541.631135] LR = smp_call_function_single+0x110/0x180
  [15541.631230] [c000201a1da973d0] [c01d55e0] smp_call_function_any+0x180/0x250
  [15541.631294] [c000201a1da97430] [c0acd3e8] gpstate_timer_handler+0x1e8/0x580
  [15541.631359] [c000201a1da974e0] [c01b46b0] call_timer_fn+0x50/0x1c0
  [15541.631433] [c000201a1da97560] [c01b4958] expire_timers+0x138/0x1f0
  [15541.631488] [c000201a1da975d0] [c01b4bf8] run_timer_softirq+0x1e8/0x270
  [15541.631553] [c000201a1da97670] [c0d0d6c8] __do_softirq+0x158/0x3e4
  [15541.631608] [c000201a1da97750] [c0114be8] irq_exit+0xe8/0x120
  [15541.631663] [c000201a1da97770] [c0024d0c] timer_interrupt+0x9c/0xe0
  [15541.631718] [c000201a1da977a0] [c0009014] decrementer_common+0x114/0x120
  [15541.631784] --- interrupt: 901 at smp_call_function_many+0x330/0x450
  [15541.631784] LR = smp_call_function_many+0x324/0x450
  [15541.631879] [c000201a1da97b00] [c0075f18] pmdp_invalidate+0x98/0xe0
  [15541.631935] [c000201a1da97b30] [c03a1120] change_huge_pmd+0xe0/0x270
  [15541.632000] [c000201a1da97ba0] [c0349278] change_protection_range+0xb88/0xe40
  [15541.632065] [c000201a1da97cf0] [c03496c0] mprotect_fixup+0x140/0x340
  [15541.632129] [c000201a1da97db0] [c0349a74] SyS_mprotect+0x1b4/0x350
  [15541.632185] [c000201a1da97e30] [c000b184] system_call+0x58/0x6c
  [15579.001651] watchdog: BUG: soft lockup - CPU#52 stuck for 23s! [grep:69263]
  [15579.001738] Modules linked in: vhost_net vhost tap xt_CHECKSUM iptable_mangle ipt_MASQUERADE 

[Kernel-packages] [Bug 1768898] Re: smp_call_function_single/many core hangs with stop4 alone

2018-05-14 Thread Joseph Salisbury
I built one more test kernel that can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1768898

Can you test this kernel and see if it resolves this bug?

Note about installing test kernels:
• If the test kernel is older than 4.15 (Bionic), install the linux-image and
  linux-image-extra .deb packages.
• If the test kernel is 4.15 (Bionic) or newer, install the linux-image-unsigned,
  linux-modules and linux-modules-extra .deb packages.
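
The rule above can be sketched as a small Python helper. This is only an illustration: `parse_version` and `packages_for` are hypothetical names, and the returned strings are the package name prefixes from the note, not the full .deb file names (which also carry the kernel version).

```python
def parse_version(version):
    """Return (major, minor) from a kernel version string such as '4.15.0-15-generic'."""
    parts = version.split("-")[0].split(".")
    return int(parts[0]), int(parts[1])


def packages_for(version):
    """Return the package name prefixes to install for a given test kernel version."""
    if parse_version(version) < (4, 15):
        # Kernels older than 4.15 (Bionic)
        return ["linux-image", "linux-image-extra"]
    # 4.15 (Bionic) or newer
    return ["linux-image-unsigned", "linux-modules", "linux-modules-extra"]


if __name__ == "__main__":
    for v in ("4.13.0-21-generic", "4.15.0-15-generic"):
        print(v, "->", ", ".join(packages_for(v)))
```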

Thanks in advance!

Title:
  smp_call_function_single/many core hangs with stop4 alone

Status in The Ubuntu-power-systems project:
  In Progress
Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Bionic:
  In Progress


[Kernel-packages] [Bug 1768898] Re: smp_call_function_single/many core hangs with stop4 alone

2018-05-14 Thread Joseph Salisbury
** Changed in: linux (Ubuntu Bionic)
   Status: Triaged => In Progress

** Changed in: linux (Ubuntu)
   Status: Triaged => In Progress

Title:
  smp_call_function_single/many core hangs with stop4 alone

Status in The Ubuntu-power-systems project:
  Triaged
Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Bionic:
  In Progress


[Kernel-packages] [Bug 1768898] Re: smp_call_function_single/many core hangs with stop4 alone

2018-05-07 Thread Joseph Salisbury
I built a test kernel with commit c0f7f5b6c69107ca92909512533e70258ee19188.  
The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1768898

Can you test this kernel and see if it resolves this bug?

Note about installing test kernels:
• If the test kernel is older than 4.15 (Bionic), install the linux-image and
  linux-image-extra .deb packages.
• If the test kernel is 4.15 (Bionic) or newer, install the linux-image-unsigned,
  linux-modules and linux-modules-extra .deb packages.

Thanks in advance!

Title:
  smp_call_function_single/many core hangs with stop4 alone

Status in The Ubuntu-power-systems project:
  Triaged
Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Bionic:
  Triaged


[Kernel-packages] [Bug 1768898] Re: smp_call_function_single/many core hangs with stop4 alone

2018-05-07 Thread Joseph Salisbury
** Changed in: linux (Ubuntu)
   Importance: Undecided => Critical

** Changed in: linux (Ubuntu)
   Status: New => Triaged

** Also affects: linux (Ubuntu Bionic)
   Importance: Undecided
   Status: New

** Changed in: linux (Ubuntu Bionic)
   Importance: Undecided => Critical

** Changed in: linux (Ubuntu Bionic)
   Status: New => Triaged

** Changed in: linux (Ubuntu Bionic)
 Assignee: (unassigned) => Joseph Salisbury (jsalisbury)

** Changed in: linux (Ubuntu)
 Assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) => 
Joseph Salisbury (jsalisbury)

Title:
  smp_call_function_single/many core hangs with stop4 alone

Status in The Ubuntu-power-systems project:
  Triaged
Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Bionic:
  Triaged


[Kernel-packages] [Bug 1768898] Re: smp_call_function_single/many core hangs with stop4 alone

2018-05-03 Thread Frank Heimes
** Also affects: ubuntu-power-systems
   Importance: Undecided
   Status: New

** Changed in: ubuntu-power-systems
   Status: New => Triaged

** Changed in: ubuntu-power-systems
   Importance: Undecided => Critical

** Changed in: ubuntu-power-systems
 Assignee: (unassigned) => Canonical Kernel Team (canonical-kernel-team)

** Tags added: triage-g

Title:
  smp_call_function_single/many core hangs with stop4 alone

Status in The Ubuntu-power-systems project:
  Triaged
Status in linux package in Ubuntu:
  New

Bug description:
  We recently discovered that this bug occurs with stop4 alone, which
  results in soft lockups/RCU stalls.

  ```
  root@ltc-boston125:~# [15523.619395] systemd[1]: systemd-journald.service: Processes still around after final SIGKILL. Entering failed mode.
  [15523.619508] systemd[1]: systemd-journald.service: Failed with result 'timeout'.
  [15523.619769] systemd[1]: Failed to start Journal Service.
  [15523.620618] systemd[1]: systemd-journald.service: Service has no hold-off time, scheduling restart.
  [15523.620774] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 21.
  [15523.621462] systemd[1]: Stopped Journal Service.
  [15523.621635] systemd[1]: systemd-journald.service: Found left-over process 1561 (systemd-journal) in control group while starting unit. Ignoring.
  [15523.621756] systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
  [15523.621888] systemd[1]: systemd-journald.service: Found left-over process 69060 (systemd-journal) in control group while starting unit. Ignoring.
  [15523.622029] systemd[1]: This usually indica[15541.629904] INFO: rcu_sched self-detected stall on CPU
  [15541.629958]60-: (2 GPs behind) idle=146/142/0 softirq=300022/300022 fqs=999069 
  [15541.630046] (t=2415546 jiffies g=184827 c=184826 q=57111)
  [15541.630101] NMI backtrace for cpu 60
  [15541.630135] CPU: 60 PID: 4810 Comm: tlbie_test Tainted: G L   4.15.0-15-generic #16-Ubuntu
  [15541.630207] Call Trace:
  [15541.630232] [c000201a1da96b00] [c0ceb35c] dump_stack+0xb0/0xf4 (unreliable)
  [15541.630298] [c000201a1da96b40] [c0cf4d48] nmi_cpu_backtrace+0x1f8/0x200
  [15541.630363] [c000201a1da96bd0] [c0cf4ee8] nmi_trigger_cpumask_backtrace+0x198/0x1f0
  [15541.630429] [c000201a1da96c60] [c002f2d8] arch_trigger_cpumask_backtrace+0x28/0x40
  [15541.630495] [c000201a1da96c80] [c01a913c] rcu_dump_cpu_stacks+0xf4/0x158
  [15541.630560] [c000201a1da96cd0] [c01a81e8] rcu_check_callbacks+0x8e8/0xb40
  [15541.630625] [c000201a1da96e00] [c01b64a8] update_process_times+0x48/0x90
  [15541.630689] [c000201a1da96e30] [c01ce1f4] tick_sched_handle.isra.5+0x34/0xd0
  [15541.630753] [c000201a1da96e60] [c01ce2f0] tick_sched_timer+0x60/0xe0
  [15541.630818] [c000201a1da96ea0] [c01b7054] __hrtimer_run_queues+0x144/0x370
  [15541.630883] [c000201a1da96f20] [c01b7fac] hrtimer_interrupt+0xfc/0x350
  [15541.630948] [c000201a1da96ff0] [c00248f0] __timer_interrupt+0x90/0x260
  [15541.631013] [c000201a1da97040] [c0024d08] timer_interrupt+0x98/0xe0
  [15541.631069] [c000201a1da97070] [c0009014] decrementer_common+0x114/0x120
  [15541.631135] --- interrupt: 901 at smp_call_function_single+0x134/0x180
  [15541.631135] LR = smp_call_function_single+0x110/0x180
  [15541.631230] [c000201a1da973d0] [c01d55e0] smp_call_function_any+0x180/0x250
  [15541.631294] [c000201a1da97430] [c0acd3e8] gpstate_timer_handler+0x1e8/0x580
  [15541.631359] [c000201a1da974e0] [c01b46b0] call_timer_fn+0x50/0x1c0
  [15541.631433] [c000201a1da97560] [c01b4958] expire_timers+0x138/0x1f0
  [15541.631488] [c000201a1da975d0] [c01b4bf8] run_timer_softirq+0x1e8/0x270
  [15541.631553] [c000201a1da97670] [c0d0d6c8] __do_softirq+0x158/0x3e4
  [15541.631608] [c000201a1da97750] [c0114be8] irq_exit+0xe8/0x120
  [15541.631663] [c000201a1da97770] [c0024d0c] timer_interrupt+0x9c/0xe0
  [15541.631718] [c000201a1da977a0] [c0009014] decrementer_common+0x114/0x120
  [15541.631784] --- interrupt: 901 at smp_call_function_many+0x330/0x450
  [15541.631784] LR = smp_call_function_many+0x324/0x450
  [15541.631879] [c000201a1da97b00] [c0075f18] pmdp_invalidate+0x98/0xe0
  [15541.631935] [c000201a1da97b30] [c03a1120] change_huge_pmd+0xe0/0x270
  [15541.632000] [c000201a1da97ba0] [c0349278] change_protection_range+0xb88/0xe40
  [15541.632065] [c000201a1da97cf0] [c03496c0] mprotect_fixup+0x140/0x340
  [15541.632129] [c000201a1da97db0] [c0349a74] SyS_mprotect+0x1b4/0x350
  [15541.632185] [c000201a1da97e30]