[Bug 1758206] Comment bridged from LTC Bugzilla
--- Comment From pavra...@in.ibm.com 2018-04-05 07:34 EDT---
The issue is resolved in the 4.15.0-15-generic kernel.

root@ltc-wspoon4:~# ppc64_cpu --smt
SMT is off

Starting Kernel crash dump capture service...
[ 11.747657] kdump-tools[952]: Starting kdump-tools: * running makedumpfile -c -d 31 /proc/vmcore /var/crash/201804050626/dump-incomplete
Copying data : [100.0 %] \ eta: 0s
[ 27.390223] kdump-tools[952]: The kernel version is not supported.
[ 27.390438] kdump-tools[952]: The makedumpfile operation may be incomplete.
[ 27.390563] kdump-tools[952]: The dumpfile is saved to /var/crash/201804050626/dump-incomplete.
[ 27.390726] kdump-tools[952]: makedumpfile Completed.
[ 27.405543] kdump-tools[952]: * kdump-tools: saved vmcore in /var/crash/201804050626
[ 30.762418] kdump-tools[952]: * running makedumpfile --dump-dmesg /proc/vmcore /var/crash/201804050626/dmesg.201804050626
[ 30.802776] kdump-tools[952]: The kernel version is not supported.
[ 30.802923] kdump-tools[952]: The makedumpfile operation may be incomplete.
[ 30.803025] kdump-tools[952]: The dmesg log is saved to /var/crash/201804050626/dmesg.201804050626.
[ 30.803145] kdump-tools[952]: makedumpfile Completed.
[ 30.803263] kdump-tools[952]: * kdump-tools: saved dmesg content in /var/crash/201804050626
[ 30.888353] kdump-tools[952]: Thu, 05 Apr 2018 06:26:24 -0500
[ 31.035631] kdump-tools[952]: Rebooting.
[ 31.126613] reboot: Restarting system
[ 1577.265030518,5] OPAL: Reboot request...

root@ltc-wspoon4:~# ppc64_cpu --smt
SMT=2

Starting Kernel crash dump capture service...
[ 13.378626] kdump-tools[952]: Starting kdump-tools: * running makedumpfile -c -d 31 /proc/vmcore /var/crash/201804050631/dump-incomplete
Copying data : [100.0 %] | eta: 0s
[ 27.102530] kdump-tools[952]: The kernel version is not supported.
[ 27.102659] kdump-tools[952]: The makedumpfile operation may be incomplete.
[ 27.102787] kdump-tools[952]: The dumpfile is saved to /var/crash/201804050631/dump-incomplete.
[ 27.102910] kdump-tools[952]: makedumpfile Completed.
[ 27.112064] kdump-tools[952]: * kdump-tools: saved vmcore in /var/crash/201804050631
[ 29.632162] kdump-tools[952]: * running makedumpfile --dump-dmesg /proc/vmcore /var/crash/201804050631/dmesg.201804050631
[ 29.672730] kdump-tools[952]: The kernel version is not supported.
[ 29.672890] kdump-tools[952]: The makedumpfile operation may be incomplete.
[ 29.672997] kdump-tools[952]: The dmesg log is saved to /var/crash/201804050631/dmesg.201804050631.
[ 29.673111] kdump-tools[952]: makedumpfile Completed.
[ 29.673249] kdump-tools[952]: * kdump-tools: saved dmesg content in /var/crash/201804050631
[ 29.774672] kdump-tools[952]: Thu, 05 Apr 2018 06:31:40 -0500
[ 29.913780] kdump-tools[952]: Rebooting.

--
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1758206

Title:
  Ubuntu 18.04 [ WSP DD2.2 with stop4 and stop5 enabled ]: kdump fails to capture dump when smt=2 or off.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1758206/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
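
The kdump-tools messages in the comment above can also be checked mechanically. The following is only a sketch: `saved_vmcore_dirs` is a hypothetical helper, not part of kdump-tools, and it assumes the console output was captured to a file with one message per line in the format shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of kdump-tools): list the crash
# directories that a captured console log reports as holding a vmcore.
# Assumes one log message per line, in the format quoted in this thread.
saved_vmcore_dirs() {
    # $1: path to a file with the captured console output (assumed)
    sed -n 's/.*kdump-tools: saved vmcore in \(.*\)$/\1/p' "$1"
}
```

Run against a capture of the two boots above, it would be expected to print one /var/crash/... directory per successful dump; an empty result would suggest that no vmcore was saved.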
[Bug 1758206] Comment bridged from LTC Bugzilla
--- Comment From pavra...@in.ibm.com 2018-03-30 01:14 EDT---
Tested again with the given kernel; dump capture is successful with smt=2 and smt=off. Sorry for the wrong update in the previous comment; I am not sure what I missed yesterday.

root@ltc-wspoon4:~# uname -a
Linux ltc-wspoon4 4.15.0-12-generic #13~lp1758206 SMP Tue Mar 27 15:20:59 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux
root@ltc-wspoon4:~# ppc64_cpu --smt=off
root@ltc-wspoon4:~#
root@ltc-wspoon4:~# echo 1 > /proc/sys/kernel/sysrq
root@ltc-wspoon4:~# echo "c" > /proc/sysrq-trigger
[ 1424.806117] sysrq: SysRq : Trigger a crash
[ 1424.806163] Unable to handle kernel paging request for data at address 0x
[ 1424.806267] Faulting instruction address: 0xc07ec768
[ 1424.806352] Oops: Kernel access of bad area, sig: 11 [#1]
[ 1424.806424] LE SMP NR_CPUS=2048 NUMA PowerNV
[ 1424.806483] Modules linked in: idt_89hpesx(E) at24 ofpart uio_pdrv_genirq cmdlinepart powernv_flash uio mtd opal_prd ipmi_powernv ipmi_devintf ibmpowernv vmx_crypto ipmi_msghandler crct10dif_vpmsum sch_fq_codel ip_tables x_tables autofs4 ast i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ahci crc32c_vpmsum drm tg3 libahci
[ 1424.806828] CPU: 0 PID: 3110 Comm: bash Tainted: GE 4.15.0-12-generic #13~lp1758206
[ 1424.806963] NIP: c07ec768 LR: c07ed6a8 CTR: c07ec740
[ 1424.807075] REGS: c01fce3d39f0 TRAP: 0300 Tainted: GE (4.15.0-12-generic)
[ 1424.807211] MSR: 90009033CR: 2822 XER: 2004
[ 1424.807325] CFAR: c07ed6a4 DAR: DSISR: 4200 SOFTE: 1
[ 1424.807325] GPR00: c07ed6a8 c01fce3d3c70 c16eaf00 0063
[ 1424.807325] GPR04: c01ff6fbce18 c01ff6fd4368 90009033 000a
[ 1424.807325] GPR08: 0007 0001 90001003
[ 1424.807325] GPR12: c07ec740 c7a2 06127f00ae48
[ 1424.807325] GPR16: 06124f78e9f0 06124f821998 06124f8219d0 06124f858204
[ 1424.807325] GPR20: 0001 7fffd6e57524
[ 1424.807325] GPR24: 7fffd6e57520 06124f85afc4 c15e9968 0002
[ 1424.807325] GPR28: 0063 0004 c1572a9c c15e9d08
[ 1424.808272] NIP [c07ec768] sysrq_handle_crash+0x28/0x30
[ 1424.808364] LR [c07ed6a8] __handle_sysrq+0xf8/0x2c0
[ 1424.808417] Call Trace:
[ 1424.808468] [c01fce3d3c70] [c07ed688] __handle_sysrq+0xd8/0x2c0 (unreliable)
[ 1424.808582] [c01fce3d3d10] [c07edeb4] write_sysrq_trigger+0x64/0x90
[ 1424.808690] [c01fce3d3d40] [c047dfe8] proc_reg_write+0x88/0xd0
[ 1424.808782] [c01fce3d3d70] [c03d131c] __vfs_write+0x3c/0x70
[ 1424.808875] [c01fce3d3d90] [c03d1578] vfs_write+0xd8/0x220
[ 1424.808957] [c01fce3d3de0] [c03d1898] SyS_write+0x68/0x110
[ 1424.809038] [c01fce3d3e30] [c000b184] system_call+0x58/0x6c
[ 1424.809139] Instruction dump:
[ 1424.809191] 4bfff9f1 4bfffe50 3c4c00f0 3842e7c0 7c0802a6 6000 3921 3d42001c
[ 1424.809294] 394a6db0 912a 7c0004ac 3940 <992a> 4e800020 3c4c00f0 3842e790
[ 1424.809399] ---[ end trace a6b92894072107e0 ]---
[ 1425.814557]
[ 1425.814659] Sending IPI to other CPUs
[ 1427[ 1827.188061287,5] OPAL: Switch to big-endian OS
.111853] IPI complete
[ 1428[ 1830.496187306,5] OPAL: Switch to little-endian OS
[ 1832.313865861,3] PHB#[0:0]: CRESET: Unexpected slot state 0102, resetting...
[ 1840.498727171,3] PHB#0003[0:3]: CRESET: Unexpected slot state 0102, resetting...
[ 1849.245109062,3] PHB#0030[8:0]: CRESET: Unexpected slot state 0102, resetting...
[ 1851.209060452,3] PHB#0033[8:3]: CRESET: Unexpected slot state 0102, resetting...
[ 1853.170614858,3] PHB#0034[8:4]: CRESET: Unexpected slot state 0102, resetting...
.808156] kexec: Starting switchover sequence.
[1.199857] integrity: Unable to open file: /etc/keys/x509_ima.der (-2)
[1.199861] integrity: Unable to open file: /etc/keys/x509_evm.der (-2)
[1.286500] vio vio: uevent: failed to send synthetic uevent
/dev/sdb2: recovering journal
/dev/sdb2: clean, 163655/61054976 files, 17123931/244188416 blocks
[6.018312] vio vio: uevent: failed to send synthetic uevent
[ OK ] Started Show Plymouth Boot Screen. plymouth-start.service
[ OK ] Started Forward Password Requests to Plymouth Directory Watch.
[ OK ] Reached target Local Encrypted Volumes.
[ OK ] Started Network Service. systemd-networkd.service
Starting Wait for Network to be Configured...
[ OK ] Reached target Network.
[7.934300] PKCS#7 signature not signed with a trusted key
[7.934373] PKCS#7 signature not signed with a trusted key
[7.935026] PKCS#7 signature not signed with a trusted key
[7.935470] PKCS#7 signature not signed
[Bug 1758206] Comment bridged from LTC Bugzilla
--- Comment From pavra...@in.ibm.com 2018-03-29 11:31 EDT---
(In reply to comment #10)
> I built a Bionic test kernel with the three commits mentioned in the bug
> description. The test kernel can be downloaded from:
> http://kernel.ubuntu.com/~jsalisbury/lp1758206
>
> Can you test this kernel and see if it resolves this bug?
>
> Note, to test this kernel, you need to install both the linux-image and
> linux-image-extra .deb packages.
>
> Thanks in advance!

Tried with the given kernel; kexec still failed. Please find the logs below.

root@ltc-wspoon4:~# ppc64_cpu --smt
SMT is off
root@ltc-wspoon4:~# kdump-config show
DUMP_MODE:kdump
USE_KDUMP:1
KDUMP_SYSCTL: kernel.panic_on_oops=1
KDUMP_COREDIR:/var/crash
crashkernel addr:
/var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinux-4.15.0-12-generic
kdump initrd:
/var/lib/kdump/initrd.img: symbolic link to /var/lib/kdump/initrd.img-4.15.0-12-generic
current state:ready to kdump

kexec command:
/sbin/kexec -p --command-line="root=UUID=0266024d-8ea3-4132-ad62-b49befd6f8d9 ro quiet splash nr_cpus=1 systemd.unit=kdump-tools.service irqpoll noirqdistrib nousb" --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz

root@ltc-wspoon4:~# echo "c" > /proc/sysrq-trigger
[ 951.567597] sysrq: SysRq : This sysrq operation is disabled.
root@ltc-wspoon4:~# echo 1 > /proc/sys/kernel/sysrq
root@ltc-wspoon4:~# echo "c" > /proc/sysrq-trigger
[ 968.396522] sysrq: SysRq : Trigger a crash
[ 968.396558] Unable to handle kernel paging request for data at address 0x
[ 968.396602] Faulting instruction address: 0xc07ec768
[ 968.396640] Oops: Kernel access of bad area, sig: 11 [#1]
[ 968.396670] LE SMP NR_CPUS=2048 NUMA PowerNV
[ 968.396703] Modules linked in: idt_89hpesx(E) at24 uio_pdrv_genirq ofpart cmdlinepart powernv_flash mtd uio ibmpowernv ipmi_powernv vmx_crypto ipmi_devintf ipmi_msghandler opal_prd crct10dif_vpmsum sch_fq_codel ip_tables x_tables autofs4 ast i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ahci crc32c_vpmsum drm tg3 libahci
[ 968.396893] CPU: 28 PID: 3086 Comm: bash Tainted: GE 4.15.0-12-generic #13~lp1758206
[ 968.396944] NIP: c07ec768 LR: c07ed6a8 CTR: c07ec740
[ 968.396989] REGS: c54fb9f0 TRAP: 0300 Tainted: GE (4.15.0-12-generic)
[ 968.397040] MSR: 90009033CR: 2822 XER: 2004
[ 968.397090] CFAR: c07ed6a4 DAR: DSISR: 4200 SOFTE: 1
[ 968.397090] GPR00: c07ed6a8 c54fbc70 c16eaf00 0063
[ 968.397090] GPR04: c01ff76bce18 c01ff76d4368 90009033 000a
[ 968.397090] GPR08: 0007 0001 90001003
[ 968.397090] GPR12: c07ec740 c7a33400 0a463c88ae48
[ 968.397090] GPR16: 0a462439e9f0 0a4624431998 0a46244319d0 0a4624468204
[ 968.397090] GPR20: 0001 79ecd164
[ 968.397090] GPR24: 79ecd160 0a462446afc4 c15e9968 0002
[ 968.397090] GPR28: 0063 0007 c1572a9c c15e9d08
[ 968.397486] NIP [c07ec768] sysrq_handle_crash+0x28/0x30
[ 968.397524] LR [c07ed6a8] __handle_sysrq+0xf8/0x2c0
[ 968.397554] Call Trace:
[ 968.397571] [c54fbc70] [c07ed688] __handle_sysrq+0xd8/0x2c0 (unreliable)
[ 968.397618] [c54fbd10] [c07edeb4] write_sysrq_trigger+0x64/0x90
[ 968.397664] [c54fbd40] [c047dfe8] proc_reg_write+0x88/0xd0
[ 968.397703] [c54fbd70] [c03d131c] __vfs_write+0x3c/0x70
[ 968.397742] [c54fbd90] [c03d1578] vfs_write+0xd8/0x220
[ 968.397781] [c54fbde0] [c03d1898] SyS_write+0x68/0x110
[ 968.397821] [c54fbe30] [c000b184] system_call+0x58/0x6c
[ 968.397857] Instruction dump:
[ 968.397881] 4bfff9f1 4bfffe50 3c4c00f0 3842e7c0 7c0802a6 6000 3921 3d42001c
[ 968.397929] 394a6db0 912a 7c0004ac 3940 <992a> 4e800020 3c4c00f0 3842e790
[ 968.397979] ---[ end trace 42b5936ebd77f0df ]---
[ 969.403420]
[ 969.403499] Sending IPI to other CPUs
[ 970.[ 9304.282854548,5] OPAL: Switch to big-endian OS
699527] IPI c[ 9308.106771743,5] OPAL: Switch to little-endian OS
[ 9309.438684420,3] PHB#[0:0]: CRESET: Unexpected slot state 0102, resetting...
[ 9310.039758053,3] PHB#[0:0]: Timeout waiting for DLP PG reset !
[ 9310.039836165,3] PHB#[0:0]: Initialization failed
[ 9312.102310864,3] PHB#0001[0:1]: Timeout waiting for DLP PG reset !
[ 9312.102386083,3] PHB#0001[0:1]: Initialization failed
[ 9314.164868252,3] PHB#0002[0:2]: Timeout waiting for DLP PG reset !
[ 9314.165418307,3] PHB#0002[0:2]: Initialization failed
[ 9316.116455526,3] PHB#0003[0:3]: CRESET: Unexpected slot state 0102, resetting...
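
The test sequence used in these comments (set the SMT mode, check the kdump configuration, enable SysRq, trigger a crash) can be collected into a small script. This is a sketch only: `run`, `reproduce_kdump` and `DRY_RUN` are hypothetical names introduced here for illustration, while the commands themselves are the ones typed in the thread. By default the script only prints the commands, because the last step deliberately crashes the running kernel.

```shell
#!/bin/sh
# Sketch of the reproduction steps quoted in this thread.
# DRY_RUN, run() and reproduce_kdump() are hypothetical conveniences.
# With DRY_RUN=1 (the default) the commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        sh -c "$*"
    fi
}

reproduce_kdump() {
    # $1: SMT mode to test; the thread uses "off" and 2
    run "ppc64_cpu --smt=$1"
    # Confirm that the crash kernel is loaded ("ready to kdump")
    run "kdump-config show"
    # SysRq was disabled by default on the test machine
    run "echo 1 > /proc/sys/kernel/sysrq"
    # Crashes the running kernel on purpose; needs root
    run "echo c > /proc/sysrq-trigger"
}
```

Calling `reproduce_kdump off` or `reproduce_kdump 2` prints the four steps; setting `DRY_RUN=0` as root would actually run them and crash the machine.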
[Bug 1758206] Comment bridged from LTC Bugzilla
--- Comment From kalsh...@in.ibm.com 2018-03-25 21:18 EDT---
Can we get a patched test kernel to try this out?