[Kernel-packages] [Bug 1429959] Re: Auto Error Recovery is failing after error injected for sailfish card in Ubuntu 14.10 [PowerNV]

2015-09-21 Thread Mauricio Faria de Oliveira
For documentation purposes, that commit eventually made it into mainline
(a30c2a3bf8571c6748dd16edc10b32d45ed71a72).
Note that the issue could not be reproduced with 14.04.3 anyway.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1429959

Title:
  Auto Error Recovery is failing after error injected for sailfish card
  in Ubuntu 14.10 [PowerNV]

Status in linux package in Ubuntu:
  Fix Released

Bug description:
  ---Problem Description---

  PowerNV/Ubuntu 14.10 Auto Error Recovery is failing after error injected for sailfish
   
  ---uname output---
  Linux powerio-le21 3.16.0-23-generic #31-Ubuntu SMP Tue Oct 21 17:55:08 UTC 2014 ppc64le ppc64le ppc64le GNU/Linux
   
  Machine Type = 8286-42A 

  ---Steps to Reproduce---
   
  There are 2 LUNs coming across 3 different paths and multipath is configured.

  1. Run I/O activity by running HTX load on the multipath devices.
  2. Verify I/O activity on the multipath devices with the iostat command.
  3. Inject an error with the following commands:

  echo 0x8000 > /sys/kernel/debug/powerpc/PCI0001/err_injct_inboundA;
  sleep 1;
  echo 0x0 > /sys/kernel/debug/powerpc/PCI0001/err_injct_inboundA

  4. The error injection happened and the I/O activity was suspended, as confirmed by iostat.
  5. Error recovery of the PCI devices did not happen and the devices remained inaccessible.
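
  The injection commands from the steps above could be wrapped in a small
  shell function, sketched here for illustration only. The debugfs path and
  the values 0x8000 and 0x0 come from the report; the function name
  inject_eeh_error and its optional arguments are hypothetical.

```shell
# Sketch only: the function name and both arguments are hypothetical.
# inject_eeh_error [FILE] [DELAY]
#   FILE  - err_injct file to write to (defaults to the path in the report)
#   DELAY - seconds to wait before writing 0x0 again (default 1, as reported)
inject_eeh_error() {
    inject_file=${1:-/sys/kernel/debug/powerpc/PCI0001/err_injct_inboundA}
    delay=${2:-1}

    # Refuse to continue if the target cannot be written to.
    if [ ! -w "$inject_file" ]; then
        echo "cannot write to $inject_file" >&2
        return 1
    fi

    echo 0x8000 > "$inject_file" || return 1   # value used in the report
    sleep "$delay"
    echo 0x0 > "$inject_file"                  # write 0x0 back, as in the report
}
```

  Called with no arguments, the function writes to the path quoted in the
  report; this most likely needs root privileges, since the file lives under
  /sys/kernel/debug.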

  The dmesg during the event is as follows

  [  376.148715] systemd-logind[7123]: New session 6 of user root.
  [  497.572751] EEH: Frozen PHB#1-PE#8 detected
  [  497.572799] EEH: PE location: U78C9.001.WZS006T-P1-C12 , PHB location: U78C9.001.WZS006T-P1-C32
  [  497.572890] CPU: 32 PID: 0 Comm: swapper/32 Tainted: G   OE 3.16.0-23-generic #31-Ubuntu
  [  497.572892] Call Trace:
  [  497.572898] [c03fffe97b90] [c0017390] show_stack+0x170/0x290 (unreliable)
  [  497.572902] [c03fffe97c70] [c0a05fc0] dump_stack+0x90/0xbc
  [  497.572906] [c03fffe97ca0] [c0038010] eeh_dev_check_failure+0x560/0x580
  [  497.572908] [c03fffe97d40] [c00380b8] eeh_check_failure+0x88/0xe0
  [  497.572933] [c03fffe97d80] [d0001cb247a8] qla24xx_msix_rsp_q+0x108/0x200 [qla2xxx]
  [  497.572936] [c03fffe97e10] [c01319b0] handle_irq_event_percpu+0x90/0x2b0
  [  497.572938] [c03fffe97ed0] [c0131c38] handle_irq_event+0x68/0xd0
  [  497.572940] [c03fffe97f00] [c0136f80] handle_fasteoi_irq+0xe0/0x2a0
  [  497.572942] [c03fffe97f30] [c0130ca8] generic_handle_irq+0x58/0x90
  [  497.572943] [c03fffe97f60] [c00119c0] __do_irq+0x80/0x190
  [  497.572945] [c03fffe97f90] [c00253d0] call_do_irq+0x14/0x24
  [  497.572946] [c02fe83abab0] [c0011b68] do_IRQ+0x98/0x140
  [  497.572948] [c02fe83abb00] [c0002794] hardware_interrupt_common+0x114/0x180
  [  497.572952] --- Exception: 501 at snooze_loop+0xd8/0x170
  LR = snooze_loop+0x90/0x170
  [  497.572955] [c02fe83abdf0] [c0a33680] cpu_online_mask+0x0/0x8 (unreliable)
  [  497.572957] [c02fe83abe30] [c08405bc] cpuidle_enter_state+0x6c/0x140
  [  497.572960] [c02fe83abe80] [c0113938] cpu_startup_entry+0x318/0x4c0
  [  497.572962] [c02fe83abf20] [c0043844] start_secondary+0x324/0x350
  [  497.572964] [c02fe83abf90] [c0009a6c] start_secondary_prolog+0x10/0x14
  [  497.572973] EEH: Detected PCI bus error on PHB#1-PE#8
  [  497.572978] EEH: This PCI device has failed 1 times in the last hour
  [  497.572979] EEH: Notify device drivers to shutdown
  [  497.573000] qla2xxx [0001:07:00.0]-015b:2: Disabling adapter.
  [  497.573071] sd 2:0:1:1: [sdd] Unhandled error code
  [  497.573072] sd 2:0:1:1: [sdd] Unhandled error code
  [  497.573075] sd 2:0:1:0: [sdc] Unhandled error code
  [  497.573076] sd 2:0:1:1: [sdd] Unhandled error code
  [  497.573077] sd 2:0:1:1: [sdd] Unhandled error code
  [  497.573078] sd 2:0:1:1: [sdd]  
  [  497.573079] sd 2:0:1:0: [sdc] Unhandled error code
  [  497.573080] sd 2:0:1:0: [sdc] Unhandled error code
  [  497.573081] sd 2:0:1:1: [sdd]  
  [  497.573082] sd 2:0:1:1: [sdd] Unhandled error code
  [  497.573084] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [  497.573085] sd 2:0:1:0: [sdc] Unhandled error code
  [  497.573086] sd 2:0:1:1: [sdd] CDB: 
  [  497.573087] sd 2:0:1:1: [sdd]  
  [  497.573088] sd 2:0:1:0: [sdc]  
  [  497.573088] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [  497.573089] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [  497.573090] sd 2:0:1:1: [sdd] CDB: 
  [  497.573091] sd 2:0:1:1: [sdd]  
  [  497.573095] Read(10)
  [  497.573095] sd 2:0:1:0: [sdc]  
  [  497.573096] sd 2:0:1:0: [sdc]  
  [  497.573097] :
  [  497.573097] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [  497.573099] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [  497.573100] Read(10)
  [  497.573100] sd 2:0:1:1: [sdd] CDB: 
  [  497.573101] sd 2:0:1:0: [sdc]

[Kernel-packages] [Bug 1429959] Re: Auto Error Recovery is failing after error injected for sailfish card in Ubuntu 14.10 [PowerNV]

2015-07-28 Thread Mauricio Faria de Oliveira
** Changed in: linux (Ubuntu)
   Status: Confirmed => Fix Released

Status in linux package in Ubuntu:
  Fix Released

[Kernel-packages] [Bug 1429959] Re: Auto Error Recovery is failing after error injected for sailfish card in Ubuntu 14.10 [PowerNV]

2015-07-27 Thread bugproxy
--- Comment From cha...@us.ibm.com 2015-07-27 19:43 EDT---
In Ubuntu 14.04.3 with kernel 3.19.0-22, this issue is not seen.
EEH happens as expected, and adapter recovery works fine up to the 5th time.
On the 6th time, it is also recovered after a reboot.

# uname -a
Linux powerio-le21 3.19.0-22-generic #22~14.04.1-Ubuntu SMP Wed Jun 17 10:03:39 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux
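
When repeating the injection as described above, the failure count that EEH
reports can be read back from the kernel log. The following is a minimal
sketch, assuming the kernel prints the "EEH: This PCI device has failed N
times in the last hour" line quoted in the bug description; the function name
eeh_failure_count is hypothetical.

```shell
# Sketch only: the function name is hypothetical.
# eeh_failure_count: read kernel log text on stdin and print the highest
# "failed N times in the last hour" count reported by EEH, or 0 if the
# log contains no such line.
eeh_failure_count() {
    sed -n 's/.*EEH: This PCI device has failed \([0-9][0-9]*\) times in the last hour.*/\1/p' |
        sort -n | tail -n 1 | grep . || echo 0
}
```

For example, `dmesg | eeh_failure_count` prints the most recent count after
each injection, which makes it easy to see when the 5th and 6th events have
been registered.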

Closing...

** Tags removed: targetmilestone-inin1504
** Tags added: targetmilestone-inin14043

Status in linux package in Ubuntu:
  Confirmed

[Kernel-packages] [Bug 1429959] Re: Auto Error Recovery is failing after error injected for sailfish card in Ubuntu 14.10 [PowerNV]

2015-06-26 Thread Mauricio Faria de Oliveira
@arges
I'll check whether QLogic can review/reply to get some activity going again.
I'm not experienced with the review/commit process for this subsystem, so if
someone else should reply instead, please let me know.
Thanks for your attention on this one.

Status in linux package in Ubuntu:
  Confirmed

[Kernel-packages] [Bug 1429959] Re: Auto Error Recovery is failing after error injected for sailfish card in Ubuntu 14.10 [PowerNV]

2015-06-26 Thread Chris J Arges
I haven't seen upstream take the patch yet; does it need to be resent?
Thanks

Status in linux package in Ubuntu:
  Confirmed

[Kernel-packages] [Bug 1429959] Re: Auto Error Recovery is failing after error injected for sailfish card in Ubuntu 14.10 [PowerNV]

2015-06-10 Thread Chris J Arges
** Changed in: linux (Ubuntu)
   Status: New => Confirmed

** Changed in: linux (Ubuntu)
   Importance: Undecided => Medium

Status in linux package in Ubuntu:
  Confirmed

[Kernel-packages] [Bug 1429959] Re: Auto Error Recovery is failing after error injected for sailfish card in Ubuntu 14.10 [PowerNV]

2015-04-07 Thread bugproxy
** Tags removed: targetmilestone-inin1410
** Tags added: targetmilestone-inin1504

Status in linux package in Ubuntu:
  New

Bug description:
  ---Problem Description---

  PowerNV/Ubuntu 14.10 Auto Error Recovery is failing after error injected for 
sailfish
   
  ---uname output---
  Linux powerio-le21 3.16.0-23-generic #31-Ubuntu SMP Tue Oct 21 17:55:08 UTC 
2014 ppc64le ppc64le ppc64le GNU/Linux
   
  Machine Type = 8286-42A 

  ---Steps to Reproduce---
   
  There are 2 LUNs coming across 3 different paths and multipath is configured.

  1. Run I/O activity by running HTX load on the multipath devices.
  2. Verify I/O activity on the multipath devices by iostat command
  2. Injected error by the following command in 
  echo 0x8000 > 
/sys/kernel/debug/powerpc/PCI0001/err_injct_inboundA;

  sleep 1;
  echo 0x0 > /sys/kernel/debug/powerpc/PCI0001/err_injct_inboundA

  3. The error injection happened and the I/O activity was suspended as 
confirmed by iostat.
  4. Error recovery of the PCI devices did not happen and the devices remained 
inaccessible.

  The dmesg during the event is as follows

  [  376.148715] systemd-logind[7123]: New session 6 of user root.
  [  497.572751] EEH: Frozen PHB#1-PE#8 detected
  [  497.572799] EEH: PE location: U78C9.001.WZS006T-P1-C12 , PHB location: U78C9.001.WZS006T-P1-C32
  [  497.572890] CPU: 32 PID: 0 Comm: swapper/32 Tainted: G   OE 3.16.0-23-generic #31-Ubuntu
  [  497.572892] Call Trace:
  [  497.572898] [c03fffe97b90] [c0017390] show_stack+0x170/0x290 (unreliable)
  [  497.572902] [c03fffe97c70] [c0a05fc0] dump_stack+0x90/0xbc
  [  497.572906] [c03fffe97ca0] [c0038010] eeh_dev_check_failure+0x560/0x580
  [  497.572908] [c03fffe97d40] [c00380b8] eeh_check_failure+0x88/0xe0
  [  497.572933] [c03fffe97d80] [d0001cb247a8] qla24xx_msix_rsp_q+0x108/0x200 [qla2xxx]
  [  497.572936] [c03fffe97e10] [c01319b0] handle_irq_event_percpu+0x90/0x2b0
  [  497.572938] [c03fffe97ed0] [c0131c38] handle_irq_event+0x68/0xd0
  [  497.572940] [c03fffe97f00] [c0136f80] handle_fasteoi_irq+0xe0/0x2a0
  [  497.572942] [c03fffe97f30] [c0130ca8] generic_handle_irq+0x58/0x90
  [  497.572943] [c03fffe97f60] [c00119c0] __do_irq+0x80/0x190
  [  497.572945] [c03fffe97f90] [c00253d0] call_do_irq+0x14/0x24
  [  497.572946] [c02fe83abab0] [c0011b68] do_IRQ+0x98/0x140
  [  497.572948] [c02fe83abb00] [c0002794] hardware_interrupt_common+0x114/0x180
  [  497.572952] --- Exception: 501 at snooze_loop+0xd8/0x170
  LR = snooze_loop+0x90/0x170
  [  497.572955] [c02fe83abdf0] [c0a33680] cpu_online_mask+0x0/0x8 (unreliable)
  [  497.572957] [c02fe83abe30] [c08405bc] cpuidle_enter_state+0x6c/0x140
  [  497.572960] [c02fe83abe80] [c0113938] cpu_startup_entry+0x318/0x4c0
  [  497.572962] [c02fe83abf20] [c0043844] start_secondary+0x324/0x350
  [  497.572964] [c02fe83abf90] [c0009a6c] start_secondary_prolog+0x10/0x14
  [  497.572973] EEH: Detected PCI bus error on PHB#1-PE#8
  [  497.572978] EEH: This PCI device has failed 1 times in the last hour
  [  497.572979] EEH: Notify device drivers to shutdown
  [  497.573000] qla2xxx [0001:07:00.0]-015b:2: Disabling adapter.
  [  497.573071] sd 2:0:1:1: [sdd] Unhandled error code
  [  497.573072] sd 2:0:1:1: [sdd] Unhandled error code
  [  497.573075] sd 2:0:1:0: [sdc] Unhandled error code
  [  497.573076] sd 2:0:1:1: [sdd] Unhandled error code
  [  497.573077] sd 2:0:1:1: [sdd] Unhandled error code
  [  497.573078] sd 2:0:1:1: [sdd]  
  [  497.573079] sd 2:0:1:0: [sdc] Unhandled error code
  [  497.573080] sd 2:0:1:0: [sdc] Unhandled error code
  [  497.573081] sd 2:0:1:1: [sdd]  
  [  497.573082] sd 2:0:1:1: [sdd] Unhandled error code
  [  497.573084] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [  497.573085] sd 2:0:1:0: [sdc] Unhandled error code
  [  497.573086] sd 2:0:1:1: [sdd] CDB: 
  [  497.573087] sd 2:0:1:1: [sdd]  
  [  497.573088] sd 2:0:1:0: [sdc]  
  [  497.573088] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [  497.573089] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [  497.573090] sd 2:0:1:1: [sdd] CDB: 
  [  497.573091] sd 2:0:1:1: [sdd]  
  [  497.573095] Read(10)
  [  497.573095] sd 2:0:1:0: [sdc]  
  [  497.573096] sd 2:0:1:0: [sdc]  
  [  497.573097] :
  [  497.573097] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [  497.573099] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [  497.573100] Read(10)
  [  497.573100] sd 2:0:1:1: [sdd] CDB: 
  [  497.573101] sd 2:0:1:0: [sd

[Kernel-packages] [Bug 1429959] Re: Auto Error Recovery is failing after error injected for sailfish card in Ubuntu 14.10 [PowerNV]

2015-03-10 Thread Joseph Salisbury
** Tags added: kernel-da-key

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1429959

Title:
  Auto Error Recovery is failing after error injected for sailfish card
  in Ubuntu 14.10 [PowerNV]

Status in linux package in Ubuntu:
  New


[Kernel-packages] [Bug 1429959] Re: Auto Error Recovery is failing after error injected for sailfish card in Ubuntu 14.10 [PowerNV]

2015-03-09 Thread Luciano Chavez
** Package changed: ubuntu => linux (Ubuntu)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1429959

Title:
  Auto Error Recovery is failing after error injected for sailfish card
  in Ubuntu 14.10 [PowerNV]

Status in linux package in Ubuntu:
  New
