------- Comment From vaish...@in.ibm.com 2018-06-12 11:38 EDT-------
FYI, this fix is already included in Bionic via Launchpad bug 1758206.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to makedumpfile in Ubuntu.
https://bugs.launchpad.net/bugs/1776211

Title:
  kdump fails to take dump with smt set to 2, hmc dumpstart

Status in The Ubuntu-power-systems project:
  In Progress
Status in linux package in Ubuntu:
  Invalid
Status in makedumpfile package in Ubuntu:
  New
Status in linux source package in Artful:
  In Progress
Status in makedumpfile source package in Artful:
  New

Bug description:
  --Problem Description---
  kdump fails to take dump with smt set to 2, hmc dumpstart

  ---Issue observed---
  [    0.004111] Oops: Exception in kernel mode, sig: 4 [#1]
  [    0.004118] SMP NR_CPUS=2048 
  [    0.004120] NUMA 
  [    0.004125] pSeries
  [    0.004132] Modules linked in:
  [    0.004142] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.13.0-12-generic #13-Ubuntu
  [    0.004153] task: c000000046715900 task.stack: c000000046134000
  [    0.004162] NIP: c000000000006468 LR: c00000000801764c CTR: 00000000006cdc70
  [    0.004173] REGS: c000000047fe3ce0 TRAP: 0700   Not tainted  (4.13.0-12-generic)
  [    0.004181] MSR: 8000000000081031 <SF,ME,IR,DR,LE>
  [    0.004193]   CR: 88042222  XER: 20000003
  [    0.004204] CFAR: c000000000006454 SOFTE: 0 
  [    0.004204] GPR00: c00000000801764c c000000047fe3f60 c0000000095e3000 0000000000000000
  [    0.004204] GPR04: 0000000000000001 0000000000000002 ffffffffffffffff ffffffffffffffdf
  [    0.004204] GPR08: 0000000000000000 0000000028042222 0000000000000002 0000000000000002
  [    0.004204] GPR12: 0000000000000000 c00000000fff0000 c000000046137f90 000000000b5452d8
  [    0.004204] GPR16: fffffffffffffffd 00000000089ffd10 0000000001360000 000000000b55d378
  [    0.004204] GPR20: 0000000000000060 000000001eca0000 000000000a6c0000 0000000000000007
  [    0.004204] GPR24: 0000000000000000 0000000000000000 c000000009621ed0 0000000000000000
  [    0.004204] GPR28: 0000000000000000 c000000046134000 c000000046137c80 c000000009105df8
  [    0.004328] NIP [c000000000006468] 0xc000000000006468
  [    0.004338] LR [c00000000801764c] __do_irq+0x4c/0x1c0
  [    0.004345] Call Trace:
  [    0.004354] [c000000047fe3f60] [c00000000801764c] __do_irq+0x4c/0x1c0 (unreliable)
  [    0.004368] [c000000047fe3f90] [c00000000802ab70] call_do_irq+0x14/0x24
  [    0.004380] [c000000046137bc0] [c00000000801785c] do_IRQ+0x9c/0x130
  [    0.004393] [c000000046137c10] [c000000008008ac4] hardware_interrupt_common+0x114/0x120
  [    0.004409] --- interrupt: 501 at arch_local_irq_restore+0x5c/0x90
  [    0.004409]     LR = arch_local_irq_restore+0x40/0x90
  [    0.004423] [c000000046137f00] [0000000000000005] 0x5 (unreliable)
  [    0.004436] [c000000046137f20] [c000000008049824] start_secondary+0x324/0x350
  [    0.004450] [c000000046137f90] [c00000000800aa6c] start_secondary_prolog+0x10/0x14
  [    0.004460] Instruction dump:
  [    0.004467] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
  [    0.004484] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
  [    0.004506] ---[ end trace 3e5a2a9047ef3cd0 ]---
  [    0.004512] 
  [    0.004518] Oops: Exception in kernel mode, sig: 4 [#2]
  [    0.004525] SMP NR_CPUS=2048 
  [    0.004526] NUMA 
  [    0.004532] pSeries
  [    0.004540] Modules linked in:
  [    0.004550] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G      D         4.13.0-12-generic #13-Ubuntu
  [    0.004561] task: c000000009579f00 task.stack: c0000000095dc000
  [    0.004569] NIP: c000000000006460 LR: c0000000080b6e80 CTR: 0000000000000000
  [    0.004580] REGS: c0000000095dfb20 TRAP: 0700   Tainted: G      D          (4.13.0-12-generic)
  [    0.004589] MSR: 8000000000081031 <SF,ME,IR,DR,LE>
  [    0.004599]   CR: 22002228  XER: 20000004
  [    0.004611] CFAR: c00000000000493c SOFTE: 0 
  [    0.004611] GPR00: 0000000000000000 c0000000095dfda0 c0000000095e3000 0000000000000000
  [    0.004611] GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  [    0.004611] GPR08: 0000000000000000 0000000022002228 000000007fffffff 0000000000000008
  [    0.004611] GPR12: 000000000000ffff c00000000fff0a80 c000000c7e137f90 0000000009980600
  [    0.004611] GPR16: 000000001ec70000 0000000000000001 0000000000000000 0000000000000000
  [    0.004611] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000007
  [    0.004611] GPR24: 0000000000000008 c000000008000000 0000000008000000 0000000000000000
  [    0.004611] GPR28: 0000000000000000 0000000000000008 c000000009621ed0 c000000009622354
  [    0.004729] NIP [c000000000006460] 0xc000000000006460
  [    0.004739] LR [c0000000080b6e80] pseries_lpar_idle+0x30/0x50
  [    0.004746] Call Trace:
  [    0.004756] [c0000000095dfda0] [c0000000095dfe90] init_thread_union+0x3e90/0x4000 (unreliable)
  [    0.004771] [c0000000095dfe00] [c00000000801e314] arch_cpu_idle+0x54/0x160
  [    0.004784] [c0000000095dfe30] [c000000008c6b92c] default_idle_call+0x4c/0x7c
  [    0.004798] [c0000000095dfe50] [c00000000815da14] do_idle+0x244/0x320
  [    0.004810] [c0000000095dfea0] [c00000000815dd28] cpu_startup_entry+0x38/0x50
  [    0.004823] [c0000000095dfed0] [c00000000800d2dc] rest_init+0xec/0x110
  [    0.004835] [c0000000095dff00] [c000000008fe40fc] start_kernel+0x584/0x5a4
  [    0.004848] [c0000000095dff90] [c00000000800ab7c] start_here_common+0x1c/0x520
  [    0.004857] Instruction dump:
  [    0.004864] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
  [    0.004881] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
  [    0.004899] ---[ end trace 3e5a2a9047ef3cd1 ]---
  [    0.004906] 
  [    3.949808] Kernel panic - not syncing: Fatal exception in interrupt
  [    4.179808] ---[ end Kernel panic - not syncing: Fatal exception in interrupt

  When the kdump kernel is booted with maxcpus=1, the following is observed:

  [ 3992.056997] Modules linked in: async_tx raid6_pq raid1 raid0 multipath linear ibmvscsi(+) crc32c_vpmsum
  [ 3992.136992] CPU: 1 PID: 207 Comm: modprobe Not tainted 4.13.0-12-generic #13-Ubuntu
  [ 3992.166991] task: c000000043719e00 task.stack: c0000000437c8000
  [ 3992.206994] NIP: c0000000086d2530 LR: c0000000086d46f0 CTR: 0000000000000013
  [ 3992.246996] REGS: c0000000437cb260 TRAP: 0901   Not tainted  (4.13.0-12-generic)
  [ 3992.276994] MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>
  [ 3992.306995]   CR: 24844442  XER: 20000000
  [ 3992.366993] CFAR: c0000000086d2570 SOFTE: 1 
  [ 3992.366993] GPR00: ffffffffffffff68 c0000000437cb4e0 c0000000095e3000 c000000043c67e80
  [ 3992.366993] GPR04: c000000043c67e80 c000000043c6bc00 ffffffffffffffed 39077b9925c55abe
  [ 3992.366993] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000060
  [ 3992.366993] GPR12: ffffffffffffff00 c00000000fac0a80 
  [ 3992.546994] NIP [c0000000086d2530] mpihelp_add_n+0x30/0x80
  [ 3992.586990] LR [c0000000086d46f0] mpih_sqr_n+0x230/0x460
  [ 3992.606991] Call Trace:
  [ 3992.617082] [c0000000437cb4e0] [c0000000086d48c4] mpih_sqr_n+0x404/0x460 (unreliable)
  [ 3992.636996] [c0000000437cb560] [c0000000086d4844] mpih_sqr_n+0x384/0x460
  [ 3992.676996] [c0000000437cb5e0] [c0000000086d5778] mpi_powm+0x678/0xe50
  [ 3992.716992] [c0000000437cb720] [c000000008619d40] _rsa_dec.isra.1+0x80/0xc0
  [ 3992.746996] [c0000000437cb760] [c00000000861a094] rsa_verify+0x94/0x140
  [ 3992.786994] [c0000000437cb7c0] [c00000000861af44] pkcs1pad_verify+0xd4/0x160
  [ 3992.856995] [c0000000437cb800] [c000000008631510] public_key_verify_signature+0x240/0x4b0
  [ 3992.896992] [c0000000437cb9a0] [c0000000086311d4] verify_signature+0x64/0x90
  [ 3992.926997] [c0000000437cb9c0] [c000000008634690] pkcs7_validate_trust+0x190/0x2c0
  [ 3992.976992] [c0000000437cba20] [c0000000082b2e30] verify_pkcs7_signature+0xc0/0x1f0
  [ 3993.036993] [c0000000437cbad0] [c0000000081c8414] mod_verify_sig+0x94/0x100
  [ 3993.076996] [c0000000437cbb40] [c0000000081c5054] load_module+0x264/0x1fc0
  [ 3993.116992] [c0000000437cbd30] [c0000000081c70b4] SyS_finit_module+0xc4/0x130
  [ 3993.176992] [c0000000437cbe30] [c00000000800b184] system_call+0x58/0x6c
  [ 3993.226990] Instruction dump:
  [ 3993.237018] 39400000 7cc600d0 7cc607b4 7cc930f8 78c01f24 79290020 7c0c0378 39290001
  [ 3993.336994] 7d2903a6 60000000 60000000 60420000 <7d6c0050> 38c60001 7cc607b4 7d25582a
  [ 4028.156997] xor: measuring software checksum speed
  [ 4029.376998]    8regs     :    16.000 MB/sec
  [ 4030.676992]    8regs_prefetch:    16.000 MB/sec
  [ 4031.716993]    32regs    :    16.000 MB/sec
  [ 4032.886994]    32regs_prefetch:    16.000 MB/sec
  [ 4034.256993]    altivec   :    16.000 MB/sec
  [ 4034.316994] xor: using function: altivec (16.000 MB/sec)
  [ 4076.016995] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [modprobe:207]
  [ 4076.046994] Modules linked in: xor async_tx raid6_pq raid1 raid0 multipath linear ibmvscsi(+) crc32c_vpmsum
  [ 4076.126994] CPU: 1 PID: 207 Comm: modprobe Tainted: G             L  4.13.0-12-generic #13-Ubuntu
  [ 4076.186995] task: c000000043719e00 task.stack: c0000000437c8000
  [ 4076.226993] NIP: c0000000086d224c LR: c0000000086d4404 CTR: 0000000000000008
  [ 4076.256991] REGS: c0000000437cb190 TRAP: 0901   Tainted: G             L   (4.13.0-12-generic)
  [ 4076.286994] MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>
  [ 4076.326993]   CR: 24884444  XER: 20000000
  [ 4076.356998] CFAR: c0000000086d4400 SOFTE: 1 
  [ 4076.356998] GPR00: 5ebfd337ad53c297 c0000000437cb410 c0000000095e3000 c000000043c62910
  [ 4076.356998] GPR04: c000000043c62800 fffffffffffffff8 00000000c68de1f2 0000000000000000
  [ 4076.356998] GPR08: 761ab85da0153bf8 0000000000000008 0000000063cfb2b3 026231001e934591
  [ 4076.356998] GPR12: 0000000000000038 c00000000fac0a80 
  [ 4076.556992] NIP [c0000000086d224c] mpihelp_addmul_1+0x4c/0xf0
  [ 4076.596990] LR [c0000000086d4404] mpih_sqr_n_basecase+0xd4/0x190
  [ 4076.607012] Call Trace:
  [ 4076.636994] [c0000000437cb410] [0000000000000901] 0x901 (unreliable)
  [ 4076.676992] [c0000000437cb460] [c0000000086d4644] mpih_sqr_n+0x184/0x460
  [ 4076.736992] [c0000000437cb4e0] [c0000000086d4890] mpih_sqr_n+0x3d0/0x460
  [ 4076.756995] [c0000000437cb560] [c0000000086d4844] mpih_sqr_n+0x384/0x460
  [ 4076.816995] [c0000000437cb5e0] [c0000000086d5778] mpi_powm+0x678/0xe50
  [ 4076.846996] [c0000000437cb720] [c000000008619d40] _rsa_dec.isra.1+0x80/0xc0
  [ 4076.896992] [c0000000437cb760] [c00000000861a094] rsa_verify+0x94/0x140
  [ 4076.946997] [c0000000437cb7c0] [c00000000861af44] pkcs1pad_verify+0xd4/0x160
  [ 4076.976996] [c0000000437cb800] [c000000008631510] public_key_verify_signature+0x240/0x4b0
  [ 4077.016993] [c0000000437cb9a0] [c0000000086311d4] verify_signature+0x64/0x90
  [ 4077.046995] [c0000000437cb9c0] [c000000008634690] pkcs7_validate_trust+0x190/0x2c0
  [ 4077.086997] [c0000000437cba20] [c0000000082b2e30] verify_pkcs7_signature+0xc0/0x1f0
  [ 4077.136995] [c0000000437cbad0] [c0000000081c8414] mod_verify_sig+0x94/0x100
  [ 4077.196996] [c0000000437cbb40] [c0000000081c5054] load_module+0x264/0x1fc0
  [ 4077.236996] [c0000000437cbd30] [c0000000081c70b4] SyS_finit_module+0xc4/0x130
  [ 4077.286997] [c0000000437cbe30] [c00000000800b184] system_call+0x58/0x6c
  [ 4077.337015] Instruction dump:
  [ 4077.366995] 7ca507b4 78c60020 7ca928f8 78bf1f24 79290020 7ffdfb78 39290001 38e00000
  [ 4077.426994] 7d2903a6 7b9c83e4 60000000 60000000 <60420000> 7d9df850 38a50001 7ca507b4

  Contact Information = hasri...@in.ibm.com
   
  ---uname output---
  Linux ltcalpine-lp9 4.13.0-12-generic #13-Ubuntu SMP Fri Sep 22 20:52:52 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux

   
  Machine Type/Model = Power 8 pVM/8408-E8E

  ----Additional Info-----
  # cat /proc/cmdline
  BOOT_IMAGE=/boot/vmlinux-4.13.0-12-generic root=UUID=861097e8-43d3-4335-83d3-6db421e20564 ro crashkernel=2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M
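
  The crashkernel= value above uses the kernel's range syntax, <start>-[<end>]:<size>, with comma-separated entries. As described in the kernel's kdump documentation, an entry applies when total system RAM is at least <start> and below <end> (an empty <end> means no upper bound), and nothing is reserved if no entry matches. The Python sketch below is illustrative only: the helper name crashkernel_size is hypothetical, it handles just K/M/G suffixes, and it ignores any @offset. It shows which reservation this command line would select for a few memory sizes.

```python
import re

# Hypothetical helper, not part of kdump-tools or the kernel: it mimics the
# "range:size" form of crashkernel= to show which reservation would apply.
# Only K/M/G suffixes are handled; an optional "@offset" is dropped.

_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def _to_bytes(text):
    """Convert a size such as '320M' or '4G' to bytes."""
    m = re.fullmatch(r"(\d+)([KMG]?)", text.strip().upper())
    if m is None:
        raise ValueError(f"bad size: {text!r}")
    return int(m.group(1)) * _UNITS[m.group(2)]


def crashkernel_size(spec, system_ram):
    """Return the reservation in bytes for system_ram bytes of RAM."""
    spec = spec.split("@", 1)[0]  # ignore an optional @offset
    for entry in spec.split(","):
        rng, size = entry.split(":")
        start, end = rng.split("-")
        lo = _to_bytes(start)
        hi = _to_bytes(end) if end else None  # "128G-" has no upper bound
        # The start of a range is inclusive, the end is exclusive.
        if system_ram >= lo and (hi is None or system_ram < hi):
            return _to_bytes(size)
    return 0  # no range matched, so nothing is reserved


SPEC = "2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M"

if __name__ == "__main__":
    for ram_gib in (1, 2, 16, 32, 100, 256):
        size = crashkernel_size(SPEC, ram_gib << 30)
        print(f"{ram_gib:>4} GiB RAM -> {size >> 20} MiB reserved")
```

  For example, an LPAR with exactly 32 GiB of RAM falls in the 32G-64G entry (1024M), not 4G-32G, because each range's upper bound is exclusive.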
   
  ---Steps to Reproduce---
  1. Install linux-crashdump and the debug kernel.
  2. Edit the crashkernel= value in kdump-tools.cfg to match the command line above.
  3. Run update-grub.
  4. Reboot once.
  5. Make sure kdump is enabled.
  6. Set SMT to 2: ppc64_cpu --smt=2
  7. Log in to the HMC and trigger a dump restart:
     chsysstate -r lpar -m <Server-name> -n <lpar-name> -o dumprestart

  A soft lockup is observed when maxcpus=1 is used in the kdump kernel
  instead of nr_cpus=1. No dump is taken and the kernel boot stops.

  The full log is attached.

  Expected:
  The dump is taken and the system boots back into the host kernel.

  == Comment: #4 - Hari Krishna Bathini <hbath...@in.ibm.com> - 2018-06-11 06:22:57 ==
  The following upstream patches should resolve this issue:

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=04b9c96eae72
  ("powerpc/crash: Remove the test for cpu_online in the IPI callback")

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4388c9b3a6ee
  ("powerpc: Do not send system reset request through the oops path")

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4552d128c26e
  ("powerpc: System reset avoid interleaving oops using die synchronisation")

  Thanks
  Hari
