SRU request submitted:
https://lists.ubuntu.com/archives/kernel-team/2018-June/093308.html
** Description changed:
+ == SRU Justification ==
+ IBM has requested these three commits in Artful. In Artful, kdump fails to
+ capture a dump when SMT is set to 2 or off.
+
+ Including these three commits allows kdump to work properly.
+
+ == Fixes ==
+ 4388c9b3a6ee ("powerpc: Do not send system reset request through the oops
path")
+ 04b9c96eae72 ("powerpc/crash: Remove the test for cpu_online in the IPI
callback")
+ 4552d128c26e ("powerpc: System reset avoid interleaving oops using die
synchronisation")
+
+ == Regression Potential ==
+ Low. Fixes are limited to powerpc.
+
+ == Test Case ==
+ A test kernel was built with these patches and tested by the original bug
reporter.
+ The bug reporter states the test kernel resolved the bug.
+
+
+
--Problem Description---
kdump fails to take dump with smt set to 2, hmc dumpstart
---Issue observed---
[ 0.004111] Oops: Exception in kernel mode, sig: 4 [#1]
- [ 0.004118] SMP NR_CPUS=2048
- [ 0.004120] NUMA
+ [ 0.004118] SMP NR_CPUS=2048
+ [ 0.004120] NUMA
[ 0.004125] pSeries
[ 0.004132] Modules linked in:
[ 0.004142] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.13.0-12-generic
#13-Ubuntu
[ 0.004153] task: c000000046715900 task.stack: c000000046134000
[ 0.004162] NIP: c000000000006468 LR: c00000000801764c CTR:
00000000006cdc70
[ 0.004173] REGS: c000000047fe3ce0 TRAP: 0700 Not tainted
(4.13.0-12-generic)
[ 0.004181] MSR: 8000000000081031 <SF,ME,IR,DR,LE>
[ 0.004193] CR: 88042222 XER: 20000003
- [ 0.004204] CFAR: c000000000006454 SOFTE: 0
- [ 0.004204] GPR00: c00000000801764c c000000047fe3f60 c0000000095e3000
0000000000000000
- [ 0.004204] GPR04: 0000000000000001 0000000000000002 ffffffffffffffff
ffffffffffffffdf
- [ 0.004204] GPR08: 0000000000000000 0000000028042222 0000000000000002
0000000000000002
- [ 0.004204] GPR12: 0000000000000000 c00000000fff0000 c000000046137f90
000000000b5452d8
- [ 0.004204] GPR16: fffffffffffffffd 00000000089ffd10 0000000001360000
000000000b55d378
- [ 0.004204] GPR20: 0000000000000060 000000001eca0000 000000000a6c0000
0000000000000007
- [ 0.004204] GPR24: 0000000000000000 0000000000000000 c000000009621ed0
0000000000000000
- [ 0.004204] GPR28: 0000000000000000 c000000046134000 c000000046137c80
c000000009105df8
+ [ 0.004204] CFAR: c000000000006454 SOFTE: 0
+ [ 0.004204] GPR00: c00000000801764c c000000047fe3f60 c0000000095e3000
0000000000000000
+ [ 0.004204] GPR04: 0000000000000001 0000000000000002 ffffffffffffffff
ffffffffffffffdf
+ [ 0.004204] GPR08: 0000000000000000 0000000028042222 0000000000000002
0000000000000002
+ [ 0.004204] GPR12: 0000000000000000 c00000000fff0000 c000000046137f90
000000000b5452d8
+ [ 0.004204] GPR16: fffffffffffffffd 00000000089ffd10 0000000001360000
000000000b55d378
+ [ 0.004204] GPR20: 0000000000000060 000000001eca0000 000000000a6c0000
0000000000000007
+ [ 0.004204] GPR24: 0000000000000000 0000000000000000 c000000009621ed0
0000000000000000
+ [ 0.004204] GPR28: 0000000000000000 c000000046134000 c000000046137c80
c000000009105df8
[ 0.004328] NIP [c000000000006468] 0xc000000000006468
[ 0.004338] LR [c00000000801764c] __do_irq+0x4c/0x1c0
[ 0.004345] Call Trace:
[ 0.004354] [c000000047fe3f60] [c00000000801764c] __do_irq+0x4c/0x1c0
(unreliable)
[ 0.004368] [c000000047fe3f90] [c00000000802ab70] call_do_irq+0x14/0x24
[ 0.004380] [c000000046137bc0] [c00000000801785c] do_IRQ+0x9c/0x130
[ 0.004393] [c000000046137c10] [c000000008008ac4]
hardware_interrupt_common+0x114/0x120
[ 0.004409] --- interrupt: 501 at arch_local_irq_restore+0x5c/0x90
[ 0.004409] LR = arch_local_irq_restore+0x40/0x90
[ 0.004423] [c000000046137f00] [0000000000000005] 0x5 (unreliable)
[ 0.004436] [c000000046137f20] [c000000008049824]
start_secondary+0x324/0x350
[ 0.004450] [c000000046137f90] [c00000000800aa6c]
start_secondary_prolog+0x10/0x14
[ 0.004460] Instruction dump:
- [ 0.004467] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX
- [ 0.004484] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX
+ [ 0.004467] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX
+ [ 0.004484] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX
[ 0.004506] ---[ end trace 3e5a2a9047ef3cd0 ]---
- [ 0.004512]
+ [ 0.004512]
[ 0.004518] Oops: Exception in kernel mode, sig: 4 [#2]
- [ 0.004525] SMP NR_CPUS=2048
- [ 0.004526] NUMA
+ [ 0.004525] SMP NR_CPUS=2048
+ [ 0.004526] NUMA
[ 0.004532] pSeries
[ 0.004540] Modules linked in:
[ 0.004550] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G D
4.13.0-12-generic #13-Ubuntu
[ 0.004561] task: c000000009579f00 task.stack: c0000000095dc000
[ 0.004569] NIP: c000000000006460 LR: c0000000080b6e80 CTR:
0000000000000000
[ 0.004580] REGS: c0000000095dfb20 TRAP: 0700 Tainted: G D
(4.13.0-12-generic)
[ 0.004589] MSR: 8000000000081031 <SF,ME,IR,DR,LE>
[ 0.004599] CR: 22002228 XER: 20000004
- [ 0.004611] CFAR: c00000000000493c SOFTE: 0
- [ 0.004611] GPR00: 0000000000000000 c0000000095dfda0 c0000000095e3000
0000000000000000
- [ 0.004611] GPR04: 0000000000000000 0000000000000000 0000000000000000
0000000000000000
- [ 0.004611] GPR08: 0000000000000000 0000000022002228 000000007fffffff
0000000000000008
- [ 0.004611] GPR12: 000000000000ffff c00000000fff0a80 c000000c7e137f90
0000000009980600
- [ 0.004611] GPR16: 000000001ec70000 0000000000000001 0000000000000000
0000000000000000
- [ 0.004611] GPR20: 0000000000000000 0000000000000000 0000000000000000
0000000000000007
- [ 0.004611] GPR24: 0000000000000008 c000000008000000 0000000008000000
0000000000000000
- [ 0.004611] GPR28: 0000000000000000 0000000000000008 c000000009621ed0
c000000009622354
+ [ 0.004611] CFAR: c00000000000493c SOFTE: 0
+ [ 0.004611] GPR00: 0000000000000000 c0000000095dfda0 c0000000095e3000
0000000000000000
+ [ 0.004611] GPR04: 0000000000000000 0000000000000000 0000000000000000
0000000000000000
+ [ 0.004611] GPR08: 0000000000000000 0000000022002228 000000007fffffff
0000000000000008
+ [ 0.004611] GPR12: 000000000000ffff c00000000fff0a80 c000000c7e137f90
0000000009980600
+ [ 0.004611] GPR16: 000000001ec70000 0000000000000001 0000000000000000
0000000000000000
+ [ 0.004611] GPR20: 0000000000000000 0000000000000000 0000000000000000
0000000000000007
+ [ 0.004611] GPR24: 0000000000000008 c000000008000000 0000000008000000
0000000000000000
+ [ 0.004611] GPR28: 0000000000000000 0000000000000008 c000000009621ed0
c000000009622354
[ 0.004729] NIP [c000000000006460] 0xc000000000006460
[ 0.004739] LR [c0000000080b6e80] pseries_lpar_idle+0x30/0x50
[ 0.004746] Call Trace:
[ 0.004756] [c0000000095dfda0] [c0000000095dfe90]
init_thread_union+0x3e90/0x4000 (unreliable)
[ 0.004771] [c0000000095dfe00] [c00000000801e314] arch_cpu_idle+0x54/0x160
[ 0.004784] [c0000000095dfe30] [c000000008c6b92c]
default_idle_call+0x4c/0x7c
[ 0.004798] [c0000000095dfe50] [c00000000815da14] do_idle+0x244/0x320
[ 0.004810] [c0000000095dfea0] [c00000000815dd28]
cpu_startup_entry+0x38/0x50
[ 0.004823] [c0000000095dfed0] [c00000000800d2dc] rest_init+0xec/0x110
[ 0.004835] [c0000000095dff00] [c000000008fe40fc] start_kernel+0x584/0x5a4
[ 0.004848] [c0000000095dff90] [c00000000800ab7c]
start_here_common+0x1c/0x520
[ 0.004857] Instruction dump:
- [ 0.004864] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX
- [ 0.004881] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX
+ [ 0.004864] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX
+ [ 0.004881] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX
[ 0.004899] ---[ end trace 3e5a2a9047ef3cd1 ]---
- [ 0.004906]
+ [ 0.004906]
[ 3.949808] Kernel panic - not syncing: Fatal exception in interrupt
[ 4.179808] ---[ end Kernel panic - not syncing: Fatal exception in
interrupt
-
When tried with maxcpus=1, the following is observed:
[ 3992.056997] Modules linked in: async_tx raid6_pq raid1 raid0 multipath
linear ibmvscsi(+) crc32c_vpmsum
[ 3992.136992] CPU: 1 PID: 207 Comm: modprobe Not tainted 4.13.0-12-generic
#13-Ubuntu
[ 3992.166991] task: c000000043719e00 task.stack: c0000000437c8000
[ 3992.206994] NIP: c0000000086d2530 LR: c0000000086d46f0 CTR:
0000000000000013
[ 3992.246996] REGS: c0000000437cb260 TRAP: 0901 Not tainted
(4.13.0-12-generic)
[ 3992.276994] MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>
[ 3992.306995] CR: 24844442 XER: 20000000
- [ 3992.366993] CFAR: c0000000086d2570 SOFTE: 1
- [ 3992.366993] GPR00: ffffffffffffff68 c0000000437cb4e0 c0000000095e3000
c000000043c67e80
- [ 3992.366993] GPR04: c000000043c67e80 c000000043c6bc00 ffffffffffffffed
39077b9925c55abe
- [ 3992.366993] GPR08: 0000000000000000 0000000000000000 0000000000000000
0000000000000060
- [ 3992.366993] GPR12: ffffffffffffff00 c00000000fac0a80
+ [ 3992.366993] CFAR: c0000000086d2570 SOFTE: 1
+ [ 3992.366993] GPR00: ffffffffffffff68 c0000000437cb4e0 c0000000095e3000
c000000043c67e80
+ [ 3992.366993] GPR04: c000000043c67e80 c000000043c6bc00 ffffffffffffffed
39077b9925c55abe
+ [ 3992.366993] GPR08: 0000000000000000 0000000000000000 0000000000000000
0000000000000060
+ [ 3992.366993] GPR12: ffffffffffffff00 c00000000fac0a80
[ 3992.546994] NIP [c0000000086d2530] mpihelp_add_n+0x30/0x80
[ 3992.586990] LR [c0000000086d46f0] mpih_sqr_n+0x230/0x460
[ 3992.606991] Call Trace:
[ 3992.617082] [c0000000437cb4e0] [c0000000086d48c4] mpih_sqr_n+0x404/0x460
(unreliable)
[ 3992.636996] [c0000000437cb560] [c0000000086d4844] mpih_sqr_n+0x384/0x460
[ 3992.676996] [c0000000437cb5e0] [c0000000086d5778] mpi_powm+0x678/0xe50
[ 3992.716992] [c0000000437cb720] [c000000008619d40] _rsa_dec.isra.1+0x80/0xc0
[ 3992.746996] [c0000000437cb760] [c00000000861a094] rsa_verify+0x94/0x140
[ 3992.786994] [c0000000437cb7c0] [c00000000861af44]
pkcs1pad_verify+0xd4/0x160
[ 3992.856995] [c0000000437cb800] [c000000008631510]
public_key_verify_signature+0x240/0x4b0
[ 3992.896992] [c0000000437cb9a0] [c0000000086311d4]
verify_signature+0x64/0x90
[ 3992.926997] [c0000000437cb9c0] [c000000008634690]
pkcs7_validate_trust+0x190/0x2c0
[ 3992.976992] [c0000000437cba20] [c0000000082b2e30]
verify_pkcs7_signature+0xc0/0x1f0
[ 3993.036993] [c0000000437cbad0] [c0000000081c8414] mod_verify_sig+0x94/0x100
[ 3993.076996] [c0000000437cbb40] [c0000000081c5054] load_module+0x264/0x1fc0
[ 3993.116992] [c0000000437cbd30] [c0000000081c70b4]
SyS_finit_module+0xc4/0x130
[ 3993.176992] [c0000000437cbe30] [c00000000800b184] system_call+0x58/0x6c
[ 3993.226990] Instruction dump:
- [ 3993.237018] 39400000 7cc600d0 7cc607b4 7cc930f8 78c01f24 79290020 7c0c0378
39290001
- [ 3993.336994] 7d2903a6 60000000 60000000 60420000 <7d6c0050> 38c60001
7cc607b4 7d25582a
+ [ 3993.237018] 39400000 7cc600d0 7cc607b4 7cc930f8 78c01f24 79290020 7c0c0378
39290001
+ [ 3993.336994] 7d2903a6 60000000 60000000 60420000 <7d6c0050> 38c60001
7cc607b4 7d25582a
[ 4028.156997] xor: measuring software checksum speed
[ 4029.376998] 8regs : 16.000 MB/sec
[ 4030.676992] 8regs_prefetch: 16.000 MB/sec
[ 4031.716993] 32regs : 16.000 MB/sec
[ 4032.886994] 32regs_prefetch: 16.000 MB/sec
[ 4034.256993] altivec : 16.000 MB/sec
[ 4034.316994] xor: using function: altivec (16.000 MB/sec)
[ 4076.016995] watchdog: BUG: soft lockup - CPU#1 stuck for 22s!
[modprobe:207]
[ 4076.046994] Modules linked in: xor async_tx raid6_pq raid1 raid0 multipath
linear ibmvscsi(+) crc32c_vpmsum
[ 4076.126994] CPU: 1 PID: 207 Comm: modprobe Tainted: G L
4.13.0-12-generic #13-Ubuntu
[ 4076.186995] task: c000000043719e00 task.stack: c0000000437c8000
[ 4076.226993] NIP: c0000000086d224c LR: c0000000086d4404 CTR:
0000000000000008
[ 4076.256991] REGS: c0000000437cb190 TRAP: 0901 Tainted: G L
(4.13.0-12-generic)
[ 4076.286994] MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>
[ 4076.326993] CR: 24884444 XER: 20000000
- [ 4076.356998] CFAR: c0000000086d4400 SOFTE: 1
- [ 4076.356998] GPR00: 5ebfd337ad53c297 c0000000437cb410 c0000000095e3000
c000000043c62910
- [ 4076.356998] GPR04: c000000043c62800 fffffffffffffff8 00000000c68de1f2
0000000000000000
- [ 4076.356998] GPR08: 761ab85da0153bf8 0000000000000008 0000000063cfb2b3
026231001e934591
- [ 4076.356998] GPR12: 0000000000000038 c00000000fac0a80
+ [ 4076.356998] CFAR: c0000000086d4400 SOFTE: 1
+ [ 4076.356998] GPR00: 5ebfd337ad53c297 c0000000437cb410 c0000000095e3000
c000000043c62910
+ [ 4076.356998] GPR04: c000000043c62800 fffffffffffffff8 00000000c68de1f2
0000000000000000
+ [ 4076.356998] GPR08: 761ab85da0153bf8 0000000000000008 0000000063cfb2b3
026231001e934591
+ [ 4076.356998] GPR12: 0000000000000038 c00000000fac0a80
[ 4076.556992] NIP [c0000000086d224c] mpihelp_addmul_1+0x4c/0xf0
[ 4076.596990] LR [c0000000086d4404] mpih_sqr_n_basecase+0xd4/0x190
[ 4076.607012] Call Trace:
[ 4076.636994] [c0000000437cb410] [0000000000000901] 0x901 (unreliable)
[ 4076.676992] [c0000000437cb460] [c0000000086d4644] mpih_sqr_n+0x184/0x460
[ 4076.736992] [c0000000437cb4e0] [c0000000086d4890] mpih_sqr_n+0x3d0/0x460
[ 4076.756995] [c0000000437cb560] [c0000000086d4844] mpih_sqr_n+0x384/0x460
[ 4076.816995] [c0000000437cb5e0] [c0000000086d5778] mpi_powm+0x678/0xe50
[ 4076.846996] [c0000000437cb720] [c000000008619d40] _rsa_dec.isra.1+0x80/0xc0
[ 4076.896992] [c0000000437cb760] [c00000000861a094] rsa_verify+0x94/0x140
[ 4076.946997] [c0000000437cb7c0] [c00000000861af44]
pkcs1pad_verify+0xd4/0x160
[ 4076.976996] [c0000000437cb800] [c000000008631510]
public_key_verify_signature+0x240/0x4b0
[ 4077.016993] [c0000000437cb9a0] [c0000000086311d4]
verify_signature+0x64/0x90
[ 4077.046995] [c0000000437cb9c0] [c000000008634690]
pkcs7_validate_trust+0x190/0x2c0
[ 4077.086997] [c0000000437cba20] [c0000000082b2e30]
verify_pkcs7_signature+0xc0/0x1f0
[ 4077.136995] [c0000000437cbad0] [c0000000081c8414] mod_verify_sig+0x94/0x100
[ 4077.196996] [c0000000437cbb40] [c0000000081c5054] load_module+0x264/0x1fc0
[ 4077.236996] [c0000000437cbd30] [c0000000081c70b4]
SyS_finit_module+0xc4/0x130
[ 4077.286997] [c0000000437cbe30] [c00000000800b184] system_call+0x58/0x6c
[ 4077.337015] Instruction dump:
- [ 4077.366995] 7ca507b4 78c60020 7ca928f8 78bf1f24 79290020 7ffdfb78 39290001
38e00000
- [ 4077.426994] 7d2903a6 7b9c83e4 60000000 60000000 <60420000> 7d9df850
38a50001 7ca507b4
+ [ 4077.366995] 7ca507b4 78c60020 7ca928f8 78bf1f24 79290020 7ffdfb78 39290001
38e00000
+ [ 4077.426994] 7d2903a6 7b9c83e4 60000000 60000000 <60420000> 7d9df850
38a50001 7ca507b4
Contact Information = [email protected]
-
+
---uname output---
Linux ltcalpine-lp9 4.13.0-12-generic #13-Ubuntu SMP Fri Sep 22 20:52:52 UTC
2017 ppc64le ppc64le ppc64le GNU/Linux
-
Machine Type/Model = Power 8 pVM/8408-E8E
----Additional Info-----
# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinux-4.13.0-12-generic
root=UUID=861097e8-43d3-4335-83d3-6db421e20564 ro
crashkernel=2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M
-
+
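For reference, the crashkernel= value above uses the range syntax
(start-end:size, with an open-ended last range), and the kernel reserves the
size of the first range that contains the system RAM. The sketch below shows
how that selection works. The function names to_mib and crashkernel_size are
hypothetical helpers written for this illustration, not part of kdump-tools,
and an optional @offset suffix is not handled.

```shell
#!/bin/sh
# to_mib VALUE: convert a size such as 320M or 2G to MiB.
# Only K, M, G and T suffixes are handled; a bare number is treated as bytes.
to_mib() {
    n=${1%[KMGT]}
    case $1 in
        *K) echo $((n / 1024)) ;;
        *M) echo "$n" ;;
        *G) echo $((n * 1024)) ;;
        *T) echo $((n * 1024 * 1024)) ;;
        *)  echo $((n / 1048576)) ;;
    esac
}

# crashkernel_size SPEC RAM_MIB: print the reservation chosen for a machine
# with RAM_MIB MiB of memory. A range matches when start <= RAM < end; an
# empty end means "no upper limit". Returns 1 if no range matches.
crashkernel_size() {
    spec=$1
    ram=$2
    old_ifs=$IFS
    IFS=,
    for entry in $spec; do
        range=${entry%%:*}
        size=${entry#*:}
        start=$(to_mib "${range%%-*}")
        end=${range#*-}
        if [ "$ram" -ge "$start" ] &&
           { [ -z "$end" ] || [ "$ram" -lt "$(to_mib "$end")" ]; }; then
            IFS=$old_ifs
            echo "$size"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}
```

For example, crashkernel_size with the value above and 8192 (an 8G LPAR)
selects the 4G-32G range, while a machine with less than 2G of RAM matches no
range and gets no reservation.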
---Steps to Reproduce---
1. Install linux-crashdump and the debug kernel.
2. Edit the crashkernel command line in kdump-tools.cfg to the value above.
3. update-grub
4. Reboot once.
5. Make sure kdump is enabled.
6. ppc64_cpu --smt=2
- 7. Login to hmc and trigger dumpstart.
+ 7. Login to hmc and trigger dumpstart.
chsysstate -r lpar -m <Server-name> -n <lpar-name> -o dumprestart
A soft lockup is observed when maxcpus=1 is used in kdump instead of
nr_cpus=1. The dump is not taken and the kernel boot stops.
The full log is attached.
Expected:
The dump is taken and the system boots back to the host kernel.
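Before triggering the dump, the "make sure kdump is enabled" step can be
checked roughly as sketched below. This is a minimal sketch, not part of the
original report: the function names are hypothetical, and the optional file
argument exists only so the checks can be exercised against sample files. The
defaults /proc/cmdline and /sys/kernel/kexec_crash_loaded are the standard
kernel interfaces.

```shell
#!/bin/sh
# has_crashkernel [FILE]: succeed if the kernel command line in FILE
# (default /proc/cmdline) carries a crashkernel= parameter.
has_crashkernel() {
    grep -qE '(^| )crashkernel=' "${1:-/proc/cmdline}"
}

# crash_kernel_loaded [FILE]: succeed if a crash kernel has been loaded,
# i.e. FILE (default /sys/kernel/kexec_crash_loaded) contains 1.
crash_kernel_loaded() {
    [ "$(cat "${1:-/sys/kernel/kexec_crash_loaded}" 2>/dev/null)" = "1" ]
}
```

Running "has_crashkernel && crash_kernel_loaded" on the LPAR should succeed
after the reboot in step 4; if it does not, the dumprestart would not be
expected to produce a dump regardless of the SMT setting.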
== Comment: #4 - Hari Krishna Bathini <[email protected]> - 2018-06-11
06:22:57 ==
The below upstream patches should resolve this issue:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=04b9c96eae72
("powerpc/crash: Remove the test for cpu_online in the IPI callback")
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4388c9b3a6ee
("powerpc: Do not send system reset request through the oops path")
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4552d128c26e
("powerpc: System reset avoid interleaving oops using die synchronisation")
Thanks
Hari
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1776211
Title:
kdump fails to take dump with smt set to 2, hmc dumpstart