[Kernel-packages] [Bug 1603449] Comment bridged from LTC Bugzilla
--- Comment From gws...@au1.ibm.com 2016-07-18 21:21 EDT---

Yeah, only one patch needs to be backported, and it should fix the kernel crash. The patch has been backported to Ubuntu-4.4.0-31.50 and is attached. Note that I checked out the base kernel code from the following git repo:

git://kernel.ubuntu.com/ubuntu/ubuntu-xenial.git (branch: master)

Another patch (linked below) can't be backported to Ubuntu 4.4.0 yet, because the fix depends on EEH support for SR-IOV, which isn't there. Let's backport it when needed.

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=cca0e542e02e48cce541a49c4046ec094ec27c1e
("powerpc/eeh: Fix wrong argument passed to eeh_rmv_device()")

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1603449

Title:
  [LTCTest][Opal][OP820] Machine crashed with Oops: Kernel access of bad area, sig: 11 [#1] while executing Froze PE Error injection

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Xenial:
  In Progress

Bug description:
  == Comment: #0 - PAVAMAN SUBRAMANIYAM - 2016-07-13 01:28:56 ==

  ---Problem Description---
  Machine crashed with Oops: Kernel access of bad area, sig: 11 [#1]

  ---uname output---
  Linux ltc-garri2 4.4.0-30-generic #49-Ubuntu SMP Fri Jul 1 10:00:36 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux

  ---Additional Hardware Info---
  root@ltc-garri2:~# lspci
  :00:00.0 PCI bridge: IBM Device 03dc
  :01:00.0 Infiniband controller: Mellanox Technologies MT27600 [Connect-IB]
  0001:00:00.0 PCI bridge: IBM Device 03dc
  0002:00:00.0 PCI bridge: IBM Device 03dc
  0002:01:00.0 3D controller: NVIDIA Corporation Device 15fe (rev a1)
  0003:00:00.0 PCI bridge: IBM Device 03dc
  0004:00:00.0 PCI bridge: IBM Device 03dc
  0005:00:00.0 PCI bridge: IBM Device 03dc
  0005:01:00.0 PCI bridge: PLX Technology, Inc. PEX 8718 16-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ab)
  0005:02:01.0 PCI bridge: PLX Technology, Inc. PEX 8718 16-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ab)
  0005:02:02.0 PCI bridge: PLX Technology, Inc. PEX 8718 16-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ab)
  0005:02:03.0 PCI bridge: PLX Technology, Inc. PEX 8718 16-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ab)
  0005:02:04.0 PCI bridge: PLX Technology, Inc. PEX 8718 16-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ab)
  0005:03:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
  0005:04:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9235 PCIe 2.0 x2 4-port SATA 6 Gb/s Controller (rev 11)
  0005:05:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 03)
  0005:06:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 30)
  0005:07:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5718 Gigabit Ethernet PCIe (rev 10)
  0005:07:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5718 Gigabit Ethernet PCIe (rev 10)
  0006:00:00.0 PCI bridge: IBM Device 03dc
  0006:01:00.0 3D controller: NVIDIA Corporation Device 15fe (rev a1)
  0007:00:00.0 PCI bridge: IBM Device 03dc
  0008:00:00.0 Bridge: IBM Device 04ea
  0008:00:00.1 Bridge: IBM Device 04ea
  0008:00:01.0 Bridge: IBM Device 04ea
  0008:00:01.1 Bridge: IBM Device 04ea
  0009:00:00.0 Bridge: IBM Device 04ea
  0009:00:00.1 Bridge: IBM Device 04ea
  0009:00:01.0 Bridge: IBM Device 04ea
  0009:00:01.1 Bridge: IBM Device 04ea

  Machine Type = P8

  ---Debugger---
  A debugger is not configured

  ---Steps to Reproduce---
  Install Ubuntu 16.04.1 on P8 Open Power 8335-GTB hardware. Then execute the Frozen PE error injection tests as shown below:

  root@ltc-garri2:~# lspci | grep -i 0004:00:00.0
  0004:00:00.0 PCI bridge: IBM Device 03dc
  root@ltc-garri2:~# cat /proc/powerpc/eeh | tail -n 1
  eeh_slot_resets=0
  root@ltc-garri2:~# lspci | grep -i 0004:00:00.0
  0004:00:00.0 PCI bridge: IBM Device 03dc
  root@ltc-garri2:~# cat /proc/powerpc/eeh | tail -n 1
  eeh_slot_resets=0
  root@ltc-garri2:~# echo 0:0:4:0:0 > /sys/kernel/debug/powerpc/PCI0004/err_injct && lspci -ns 0004:00:00.0; echo $?
  0004:00:00.0 0604: 1014:03dc
  0

  Immediately the kernel crashes with an Oops message.

  Contact Information = pavsu...@in.ibm.com

  Stack trace output:
  [ 289.297946] Call Trace:
  [ 289.297969] [c00feeb8b9e0] [c0083c78] pnv_eeh_reset+0x58/0x170 (unreliable)
  [ 289.298042] [c00feeb8ba60] [c0038250] eeh_reset_pe+0xb0/0x1c0
  [ 289.298105] [c00feeb8bb00] [c0af444c] eeh_reset_device+0xd8/0x228
  [ 289.298165] [c00feeb8bba0] [c003c520] eeh_handle_normal_event+0x390/0x440
  [ 289.298234] [c00feeb8bc20] [c003c9c4] eeh_handle_event+0x184/0x370
  [ 289.298304] [c00feeb8bcd0] [c003cd88] eeh_e
[Kernel-packages] [Bug 1603449] Comment bridged from LTC Bugzilla
--- Comment From gmgay...@us.ibm.com 2016-07-18 09:33 EDT---

Hello Canonical,

Could you also please target this bug to 16.10 in addition to 16.04.1?

Thanks,
Gary

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1603449

Title:
  [LTCTest][Opal][OP820] Machine crashed with Oops: Kernel access of bad area, sig: 11 [#1] while executing Froze PE Error injection

Status in linux package in Ubuntu:
  Triaged

Bug description:
  == Comment: #0 - PAVAMAN SUBRAMANIYAM - 2016-07-13 01:28:56 ==

  ---Problem Description---
  Machine crashed with Oops: Kernel access of bad area, sig: 11 [#1]

  ---uname output---
  Linux ltc-garri2 4.4.0-30-generic #49-Ubuntu SMP Fri Jul 1 10:00:36 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux

  ---Additional Hardware Info---
  root@ltc-garri2:~# lspci
  :00:00.0 PCI bridge: IBM Device 03dc
  :01:00.0 Infiniband controller: Mellanox Technologies MT27600 [Connect-IB]
  0001:00:00.0 PCI bridge: IBM Device 03dc
  0002:00:00.0 PCI bridge: IBM Device 03dc
  0002:01:00.0 3D controller: NVIDIA Corporation Device 15fe (rev a1)
  0003:00:00.0 PCI bridge: IBM Device 03dc
  0004:00:00.0 PCI bridge: IBM Device 03dc
  0005:00:00.0 PCI bridge: IBM Device 03dc
  0005:01:00.0 PCI bridge: PLX Technology, Inc. PEX 8718 16-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ab)
  0005:02:01.0 PCI bridge: PLX Technology, Inc. PEX 8718 16-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ab)
  0005:02:02.0 PCI bridge: PLX Technology, Inc. PEX 8718 16-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ab)
  0005:02:03.0 PCI bridge: PLX Technology, Inc. PEX 8718 16-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ab)
  0005:02:04.0 PCI bridge: PLX Technology, Inc. PEX 8718 16-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ab)
  0005:03:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
  0005:04:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9235 PCIe 2.0 x2 4-port SATA 6 Gb/s Controller (rev 11)
  0005:05:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 03)
  0005:06:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 30)
  0005:07:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5718 Gigabit Ethernet PCIe (rev 10)
  0005:07:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5718 Gigabit Ethernet PCIe (rev 10)
  0006:00:00.0 PCI bridge: IBM Device 03dc
  0006:01:00.0 3D controller: NVIDIA Corporation Device 15fe (rev a1)
  0007:00:00.0 PCI bridge: IBM Device 03dc
  0008:00:00.0 Bridge: IBM Device 04ea
  0008:00:00.1 Bridge: IBM Device 04ea
  0008:00:01.0 Bridge: IBM Device 04ea
  0008:00:01.1 Bridge: IBM Device 04ea
  0009:00:00.0 Bridge: IBM Device 04ea
  0009:00:00.1 Bridge: IBM Device 04ea
  0009:00:01.0 Bridge: IBM Device 04ea
  0009:00:01.1 Bridge: IBM Device 04ea

  Machine Type = P8

  ---Debugger---
  A debugger is not configured

  ---Steps to Reproduce---
  Install Ubuntu 16.04.1 on P8 Open Power 8335-GTB hardware. Then execute the Frozen PE error injection tests as shown below:

  root@ltc-garri2:~# lspci | grep -i 0004:00:00.0
  0004:00:00.0 PCI bridge: IBM Device 03dc
  root@ltc-garri2:~# cat /proc/powerpc/eeh | tail -n 1
  eeh_slot_resets=0
  root@ltc-garri2:~# lspci | grep -i 0004:00:00.0
  0004:00:00.0 PCI bridge: IBM Device 03dc
  root@ltc-garri2:~# cat /proc/powerpc/eeh | tail -n 1
  eeh_slot_resets=0
  root@ltc-garri2:~# echo 0:0:4:0:0 > /sys/kernel/debug/powerpc/PCI0004/err_injct && lspci -ns 0004:00:00.0; echo $?
  0004:00:00.0 0604: 1014:03dc
  0

  Immediately the kernel crashes with an Oops message.

  Contact Information = pavsu...@in.ibm.com

  Stack trace output:
  [ 289.297946] Call Trace:
  [ 289.297969] [c00feeb8b9e0] [c0083c78] pnv_eeh_reset+0x58/0x170 (unreliable)
  [ 289.298042] [c00feeb8ba60] [c0038250] eeh_reset_pe+0xb0/0x1c0
  [ 289.298105] [c00feeb8bb00] [c0af444c] eeh_reset_device+0xd8/0x228
  [ 289.298165] [c00feeb8bba0] [c003c520] eeh_handle_normal_event+0x390/0x440
  [ 289.298234] [c00feeb8bc20] [c003c9c4] eeh_handle_event+0x184/0x370
  [ 289.298304] [c00feeb8bcd0] [c003cd88] eeh_event_handler+0x1d8/0x1e0
  [ 289.298374] [c00feeb8bd80] [c00e6420] kthread+0x110/0x130
  [ 289.298434] [c00feeb8be30] [c0009538] ret_from_kernel_thread+0x5c/0xa4
  [ 289.298501] Instruction dump:
  [ 289.298531] 6000 813f ebdf0010 792affe3 408200d4 e95e0250 812a000c 2f890002
  [ 289.298630] 419e0054 7fe3fb78 4bfb70c5 6000 2fa9 419e00dc e9290010

  Oops output:
  [ 289.294622] EEH: Frozen PE#0 on PHB#4 detected
  [ 289.294785] EEH: PE location: N/A, PHB location: N/A
  [ 289.295598] EEH: This PCI device has failed 1 times in the l
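
The reproduction steps in the bug description read the eeh_slot_resets counter from /proc/powerpc/eeh by hand around the error injection. A minimal sketch of how that check could be scripted in POSIX shell follows. The helper names (eeh_slot_resets, slot_resets_increased) and the optional file argument are hypothetical and not part of the report; the sketch assumes the counter appears as a line of the form "eeh_slot_resets=N", as the transcript shows.

```shell
#!/bin/sh
# Sketch only: helper names and the optional file argument are hypothetical.

# Print the value of the eeh_slot_resets counter from an EEH statistics file.
# Defaults to /proc/powerpc/eeh, the file read in the report.
eeh_slot_resets() {
    stats_file="${1:-/proc/powerpc/eeh}"
    # Assumes a line of the form "eeh_slot_resets=0", as seen in the transcript.
    sed -n 's/^[[:space:]]*eeh_slot_resets=\([0-9][0-9]*\).*$/\1/p' "$stats_file" | tail -n 1
}

# Succeed if the second reading is greater than the first.
slot_resets_increased() {
    [ "$2" -gt "$1" ]
}

# Possible use around the injection command from the report (run as root,
# on a test machine only, since the unpatched kernel crashes at this point):
#
#   before=$(eeh_slot_resets)
#   echo 0:0:4:0:0 > /sys/kernel/debug/powerpc/PCI0004/err_injct
#   lspci -ns 0004:00:00.0
#   after=$(eeh_slot_resets)
#   slot_resets_increased "$before" "$after" && echo "PE was reset"
```

On an affected kernel the second reading is presumably never reached, because the machine oopses right after the config-space read; on a kernel with the backported fix, comparing the two readings may be one way to confirm that recovery ran.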