[Kernel-packages] [Bug 1603449] Comment bridged from LTC Bugzilla

2016-07-18 Thread bugproxy
--- Comment From gws...@au1.ibm.com 2016-07-18 21:21 EDT---
Yes, only one patch needs to be backported, and it should fix the kernel
crash. The patch has been backported to Ubuntu-4.4.0-31.50 and is attached.
Note that I checked out the base kernel code from the git repo below:

git://kernel.ubuntu.com/ubuntu/ubuntu-xenial.git (branch: master)

Another patch (linked below) cannot be backported to Ubuntu 4.4.0 yet,
because the fix depends on EEH support for SR-IOV, which isn't there. Let's
backport it when needed.

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=cca0e542e02e48cce541a49c4046ec094ec27c1e
("powerpc/eeh: Fix wrong argument passed to eeh_rmv_device()")
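
For anyone wanting to recreate the backport base described above, here is a
minimal sketch. It assumes an existing clone of the ubuntu-xenial repository,
that the base is available there as the tag Ubuntu-4.4.0-31.50, and that the
attached patch was saved in mbox format. The function name, the branch name
and the patch path are hypothetical and not taken from the bug report.

```shell
#!/bin/sh
# Sketch only: apply a backported patch on top of a tagged Ubuntu kernel base.
# Assumptions (not from the bug report):
#   $1 is an existing clone of git://kernel.ubuntu.com/ubuntu/ubuntu-xenial.git
#   $2 is a tag in that clone, e.g. Ubuntu-4.4.0-31.50
#   $3 is a patch file in mbox format (as produced by git format-patch)
# The branch name "eeh-backport" is hypothetical.
apply_backport() {
    repo="$1"
    tag="$2"
    patch="$3"
    # Create a working branch at the tagged base, then apply the patch on top.
    git -C "$repo" checkout -q -b eeh-backport "$tag" &&
    git -C "$repo" am -q "$patch"
}
```

It could be called as
apply_backport ~/ubuntu-xenial Ubuntu-4.4.0-31.50 /path/to/fix.patch
If the attachment is a plain diff rather than an mbox file, git apply would be
the tool to use instead of git am.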

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1603449

Title:
  [LTCTest][Opal][OP820] Machine crashed with Oops: Kernel access of bad
  area, sig: 11 [#1] while executing Froze PE Error injection

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Xenial:
  In Progress

Bug description:
  == Comment: #0 - PAVAMAN SUBRAMANIYAM - 2016-07-13 01:28:56 ==
  ---Problem Description---
  Machine crashed with Oops: Kernel access of bad area, sig: 11 [#1]
   
  ---uname output---
  Linux ltc-garri2 4.4.0-30-generic #49-Ubuntu SMP Fri Jul 1 10:00:36 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux
   
  ---Additional Hardware Info---
  root@ltc-garri2:~# lspci
  0000:00:00.0 PCI bridge: IBM Device 03dc
  0000:01:00.0 Infiniband controller: Mellanox Technologies MT27600 [Connect-IB]
  0001:00:00.0 PCI bridge: IBM Device 03dc
  0002:00:00.0 PCI bridge: IBM Device 03dc
  0002:01:00.0 3D controller: NVIDIA Corporation Device 15fe (rev a1)
  0003:00:00.0 PCI bridge: IBM Device 03dc
  0004:00:00.0 PCI bridge: IBM Device 03dc
  0005:00:00.0 PCI bridge: IBM Device 03dc
  0005:01:00.0 PCI bridge: PLX Technology, Inc. PEX 8718 16-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ab)
  0005:02:01.0 PCI bridge: PLX Technology, Inc. PEX 8718 16-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ab)
  0005:02:02.0 PCI bridge: PLX Technology, Inc. PEX 8718 16-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ab)
  0005:02:03.0 PCI bridge: PLX Technology, Inc. PEX 8718 16-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ab)
  0005:02:04.0 PCI bridge: PLX Technology, Inc. PEX 8718 16-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ab)
  0005:03:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
  0005:04:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9235 PCIe 2.0 x2 4-port SATA 6 Gb/s Controller (rev 11)
  0005:05:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 03)
  0005:06:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 30)
  0005:07:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5718 Gigabit Ethernet PCIe (rev 10)
  0005:07:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5718 Gigabit Ethernet PCIe (rev 10)
  0006:00:00.0 PCI bridge: IBM Device 03dc
  0006:01:00.0 3D controller: NVIDIA Corporation Device 15fe (rev a1)
  0007:00:00.0 PCI bridge: IBM Device 03dc
  0008:00:00.0 Bridge: IBM Device 04ea
  0008:00:00.1 Bridge: IBM Device 04ea
  0008:00:01.0 Bridge: IBM Device 04ea
  0008:00:01.1 Bridge: IBM Device 04ea
  0009:00:00.0 Bridge: IBM Device 04ea
  0009:00:00.1 Bridge: IBM Device 04ea
  0009:00:01.0 Bridge: IBM Device 04ea
  0009:00:01.1 Bridge: IBM Device 04ea
   

   
  Machine Type = P8 
   
  ---Debugger---
  A debugger is not configured
   
  ---Steps to Reproduce---
  Install Ubuntu 16.04.1 on P8 OpenPOWER 8335-GTB hardware.
  Then execute the Frozen PE error injection tests as shown below:

  root@ltc-garri2:~# lspci | grep -i 0004:00:00.0
  0004:00:00.0 PCI bridge: IBM Device 03dc
  root@ltc-garri2:~# cat /proc/powerpc/eeh | tail -n 1
  eeh_slot_resets=0

  
  root@ltc-garri2:~# lspci | grep -i 0004:00:00.0
  0004:00:00.0 PCI bridge: IBM Device 03dc
  root@ltc-garri2:~# cat /proc/powerpc/eeh | tail -n 1
  eeh_slot_resets=0
  root@ltc-garri2:~# echo 0:0:4:0:0 > /sys/kernel/debug/powerpc/PCI0004/err_injct && lspci -ns 0004:00:00.0; echo $?
  0004:00:00.0 0604: 1014:03dc
  0

  The kernel immediately crashes with an Oops message.
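
  The reproduction steps above could be wrapped in two small shell functions,
  as sketched below. This is only an illustration: the function names and the
  optional path arguments are hypothetical, and the injection string is simply
  the one used in this report. On an affected kernel, the second function is
  expected to crash the machine, so it should only be run on a test system.

```shell
#!/bin/sh
# Sketch only: the helper names and optional path arguments are hypothetical.

# Print the counter from the "eeh_slot_resets=N" line of an EEH statistics
# file. $1 defaults to /proc/powerpc/eeh, the file read in the report.
eeh_slot_resets() {
    sed -n 's/^eeh_slot_resets=//p' "${1:-/proc/powerpc/eeh}"
}

# Write the injection string from the report to a PHB's err_injct file, then
# read the root bridge's config space with lspci to trigger error detection.
# $1 = four-digit PHB domain (for example 0004)
# $2 = debugfs directory, defaults to /sys/kernel/debug/powerpc
# WARNING: on an affected kernel this is expected to crash the machine.
inject_frozen_pe() {
    phb="$1"
    root="${2:-/sys/kernel/debug/powerpc}"
    echo 0:0:4:0:0 > "$root/PCI$phb/err_injct" && lspci -ns "$phb:00:00.0"
}
```

  Calling eeh_slot_resets before and after inject_frozen_pe 0004 on a fixed
  kernel gives a quick way to see whether the counter changed after the
  injection.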
   
  Contact Information = pavsu...@in.ibm.com 
   
  Stack trace output:
  [  289.297946] Call Trace:
  [  289.297969] [c00feeb8b9e0] [c0083c78] pnv_eeh_reset+0x58/0x170 (unreliable)
  [  289.298042] [c00feeb8ba60] [c0038250] eeh_reset_pe+0xb0/0x1c0
  [  289.298105] [c00feeb8bb00] [c0af444c] eeh_reset_device+0xd8/0x228
  [  289.298165] [c00feeb8bba0] [c003c520] eeh_handle_normal_event+0x390/0x440
  [  289.298234] [c00feeb8bc20] [c003c9c4] eeh_handle_event+0x184/0x370
  [  289.298304] [c00feeb8bcd0] [c003cd88] eeh_e

[Kernel-packages] [Bug 1603449] Comment bridged from LTC Bugzilla

2016-07-18 Thread bugproxy
--- Comment From gmgay...@us.ibm.com 2016-07-18 09:33 EDT---
Hello Canonical,
Could you also please target this bug to 16.10 in addition to 16.04.1?
Thanks, Gary

Title:
  [LTCTest][Opal][OP820] Machine crashed with Oops: Kernel access of bad
  area, sig: 11 [#1] while executing Froze PE Error injection

Status in linux package in Ubuntu:
  Triaged

Bug description:
  == Comment: #0 - PAVAMAN SUBRAMANIYAM - 2016-07-13 01:28:56 ==
  ---Problem Description---
  Machine crashed with Oops: Kernel access of bad area, sig: 11 [#1]
   
  ---uname output---
  Linux ltc-garri2 4.4.0-30-generic #49-Ubuntu SMP Fri Jul 1 10:00:36 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux
   
  ---Additional Hardware Info---
  root@ltc-garri2:~# lspci
  0000:00:00.0 PCI bridge: IBM Device 03dc
  0000:01:00.0 Infiniband controller: Mellanox Technologies MT27600 [Connect-IB]
  0001:00:00.0 PCI bridge: IBM Device 03dc
  0002:00:00.0 PCI bridge: IBM Device 03dc
  0002:01:00.0 3D controller: NVIDIA Corporation Device 15fe (rev a1)
  0003:00:00.0 PCI bridge: IBM Device 03dc
  0004:00:00.0 PCI bridge: IBM Device 03dc
  0005:00:00.0 PCI bridge: IBM Device 03dc
  0005:01:00.0 PCI bridge: PLX Technology, Inc. PEX 8718 16-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ab)
  0005:02:01.0 PCI bridge: PLX Technology, Inc. PEX 8718 16-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ab)
  0005:02:02.0 PCI bridge: PLX Technology, Inc. PEX 8718 16-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ab)
  0005:02:03.0 PCI bridge: PLX Technology, Inc. PEX 8718 16-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ab)
  0005:02:04.0 PCI bridge: PLX Technology, Inc. PEX 8718 16-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ab)
  0005:03:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
  0005:04:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9235 PCIe 2.0 x2 4-port SATA 6 Gb/s Controller (rev 11)
  0005:05:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 03)
  0005:06:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 30)
  0005:07:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5718 Gigabit Ethernet PCIe (rev 10)
  0005:07:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5718 Gigabit Ethernet PCIe (rev 10)
  0006:00:00.0 PCI bridge: IBM Device 03dc
  0006:01:00.0 3D controller: NVIDIA Corporation Device 15fe (rev a1)
  0007:00:00.0 PCI bridge: IBM Device 03dc
  0008:00:00.0 Bridge: IBM Device 04ea
  0008:00:00.1 Bridge: IBM Device 04ea
  0008:00:01.0 Bridge: IBM Device 04ea
  0008:00:01.1 Bridge: IBM Device 04ea
  0009:00:00.0 Bridge: IBM Device 04ea
  0009:00:00.1 Bridge: IBM Device 04ea
  0009:00:01.0 Bridge: IBM Device 04ea
  0009:00:01.1 Bridge: IBM Device 04ea
   

   
  Machine Type = P8 
   
  ---Debugger---
  A debugger is not configured
   
  ---Steps to Reproduce---
  Install Ubuntu 16.04.1 on P8 OpenPOWER 8335-GTB hardware.
  Then execute the Frozen PE error injection tests as shown below:

  root@ltc-garri2:~# lspci | grep -i 0004:00:00.0
  0004:00:00.0 PCI bridge: IBM Device 03dc
  root@ltc-garri2:~# cat /proc/powerpc/eeh | tail -n 1
  eeh_slot_resets=0

  
  root@ltc-garri2:~# lspci | grep -i 0004:00:00.0
  0004:00:00.0 PCI bridge: IBM Device 03dc
  root@ltc-garri2:~# cat /proc/powerpc/eeh | tail -n 1
  eeh_slot_resets=0
  root@ltc-garri2:~# echo 0:0:4:0:0 > /sys/kernel/debug/powerpc/PCI0004/err_injct && lspci -ns 0004:00:00.0; echo $?
  0004:00:00.0 0604: 1014:03dc
  0

  The kernel immediately crashes with an Oops message.
   
  Contact Information = pavsu...@in.ibm.com 
   
  Stack trace output:
  [  289.297946] Call Trace:
  [  289.297969] [c00feeb8b9e0] [c0083c78] pnv_eeh_reset+0x58/0x170 (unreliable)
  [  289.298042] [c00feeb8ba60] [c0038250] eeh_reset_pe+0xb0/0x1c0
  [  289.298105] [c00feeb8bb00] [c0af444c] eeh_reset_device+0xd8/0x228
  [  289.298165] [c00feeb8bba0] [c003c520] eeh_handle_normal_event+0x390/0x440
  [  289.298234] [c00feeb8bc20] [c003c9c4] eeh_handle_event+0x184/0x370
  [  289.298304] [c00feeb8bcd0] [c003cd88] eeh_event_handler+0x1d8/0x1e0
  [  289.298374] [c00feeb8bd80] [c00e6420] kthread+0x110/0x130
  [  289.298434] [c00feeb8be30] [c0009538] ret_from_kernel_thread+0x5c/0xa4
  [  289.298501] Instruction dump:
  [  289.298531] 6000 813f ebdf0010 792affe3 408200d4 e95e0250 812a000c 2f890002
  [  289.298630] 419e0054 7fe3fb78 4bfb70c5 6000  2fa9 419e00dc e9290010

   
  Oops output:
   [  289.294622] EEH: Frozen PE#0 on PHB#4 detected
  [  289.294785] EEH: PE location: N/A, PHB location: N/A
  [  289.295598] EEH: This PCI device has failed 1 times in the l