[Kernel-packages] [Bug 1502982] Comment bridged from LTC Bugzilla

2015-10-22 Thread bugproxy
--- Comment From cdead...@us.ibm.com 2015-10-22 20:35 EDT---
 State: Verify by: lieder on 22 October 2015 15:30:54 

#=#=# 2015-10-22 15:30:42 (CDT) #=#=#
New Fix_Potential = [P810.00D]
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1502982

Title:
  STCOP810:Firestone: frsfp6 EEH on Bluefin does not recover with Ubuntu

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Vivid:
  Fix Committed
Status in linux source package in Wily:
  Fix Released

Bug description:
  Problem:
  ==
  Test Case Execution Record:

  95613: EEH_Firestone_Ubuntu 14.04.03_Bluefin_Standalone on frsfp6

  Error Injection Method: err_injct_inboundA

  Step 1. Start HTX (I used mdt.hdbuster and ran HTX only on the bluefin disks)
  Step 2. Inject EEH error

  bluefin is in slot P1-C4 (PCI0004)

    echo 0x8000 > /sys/kernel/debug/powerpc/PCI0004/err_injct_inboundA
    sleep 1
    echo 0x0 > /sys/kernel/debug/powerpc/PCI0004/err_injct_inboundA

  Expected Result: Adapter/SAN disks recover and HTX keeps running

  Actual Result: Adapter did not recover; EEH errors continued until the
  limit of 6 failures in 1 hour was reached
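
The injection in Step 2 writes 0x8000 to the err_injct_inboundA debugfs file of PHB PCI0004, waits one second, then writes 0x0 (presumably clearing the injection). A minimal Python sketch of the same sequence, assuming it runs as root on a PowerNV host with debugfs mounted at /sys/kernel/debug; the helper name inject_inbound_a and the try/finally that always writes 0x0 are illustrative choices, not part of the original test case:

```python
import time
from pathlib import Path

# Path from the test case above: PHB PCI0004 holds the bluefin adapter
# in slot P1-C4. Adjust it for the PHB under test.
DEFAULT_PATH = "/sys/kernel/debug/powerpc/PCI0004/err_injct_inboundA"


def inject_inbound_a(path: str = DEFAULT_PATH, value: str = "0x8000",
                     hold_seconds: float = 1.0) -> None:
    """Hypothetical helper mirroring the shell sequence:

        echo 0x8000 > FILE; sleep 1; echo 0x0 > FILE
    """
    target = Path(path)
    target.write_text(value + "\n")   # same bytes `echo 0x8000` writes
    try:
        time.sleep(hold_seconds)      # window in which HTX traffic runs
    finally:
        target.write_text("0x0\n")    # write 0x0 even if interrupted
```

On the test system this would be called as inject_inbound_a() after HTX has been started in Step 1.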

  There are two patches: one for the skiboot firmware, and one Linux
  patch that is already upstream but was missed in the Ubuntu distro (at
  least in 15.04). The skiboot patch has been merged upstream.

  c7192a4 PHB3: Fix wrong PE number in error injection (skiboot)
  2aa5cf9 powerpc/eeh: Fix missed PE#0 on P7IOC (linux)

  If I'm correct, this bug needs to be mirrored so that the Linux patch
  (commit 2aa5cf9) can be backported to the Ubuntu distro. With the patch
  backported to Ubuntu 15.04, EEH works fine on a Broadcom adapter (not
  exactly the adapter on which the bug was originally reported):

  root@fstn2-p1:/# dmesg | grep EEH
  [    0.216919] EEH: PowerNV platform initialized
  [    0.570606] EEH: devices created
  [    1.302482] EEH: PCI Enhanced I/O Error Handling Enabled
  [   90.566761] EEH: PHB location: Slot1
  [   90.567503] EEH: Frozen PHB#4-PE#0 detected
  [   90.567673] EEH: PE location: Slot1, PHB location: Slot1
  [   90.567930] EEH: Detected PCI bus error on PHB#4-PE#0
  [   90.567935] EEH: This PCI device has failed 1 times in the last hour
  [   90.567937] EEH: Notify device drivers to shutdown
  [   90.567985] EEH: Collect temporary log
  [   90.568971] EEH: Reset without hotplug activity
  [   94.585540] EEH: Notify device drivers the completion of reset
  [   94.585934] EEH: Notify device driver to resume
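
In the log above, a successful recovery runs from the "Frozen PHB#4-PE#0 detected" line to "Notify device driver to resume". A small Python sketch for checking a captured `dmesg | grep EEH` log for those markers; it only matches the message formats shown in this log, and treating the resume message as "recovered" is a heuristic assumption, not a kernel guarantee:

```python
import re

# Patterns copied from the message formats in the log above.
FROZEN_RE = re.compile(r"EEH: Frozen PHB#(\d+)-PE#(\d+) detected")
FAILS_RE = re.compile(r"EEH: This PCI device has failed (\d+) times in the last hour")
RESUME_MSG = "EEH: Notify device driver to resume"


def summarize_eeh(log: str) -> dict:
    """Summarize EEH events in captured `dmesg | grep EEH` output."""
    frozen = [(int(phb), int(pe)) for phb, pe in FROZEN_RE.findall(log)]
    fails = [int(n) for n in FAILS_RE.findall(log)]
    return {
        "frozen_pes": frozen,                  # (PHB, PE) per freeze event
        "max_failures_last_hour": max(fails, default=0),
        "resumed": RESUME_MSG in log,          # heuristic: driver resume seen
    }
```

For the failing case in the Actual Result, one would expect frozen_pes to repeat and max_failures_last_hour to climb toward the limit of 6.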

  The story behind this bug: without commit 2aa5cf9 ("powerpc/eeh: Fix
  missed PE#0 on P7IOC"), PE#0 is regarded as an invalid PE. When the
  kernel sees the frozen PE#0, it clears the frozen state, dumps the PHB
  diag-data, and then tries to recover the PE. While the PE is being
  reset, the driver, which error_detected() did not completely stop,
  accesses the MMIO space and causes another (recursive) EEH error, so
  EEH recovery eventually fails. During the PE reset, the I/O path for
  the PE should stay frozen so that MMIO accesses in that window are
  dropped, avoiding the recursive EEH error.
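
The failure mode described above can be illustrated with a toy model. The sketch below is hypothetical and greatly simplified (ToyPE, recover and recover_until_limit are invented names, not kernel functions): an MMIO access that reaches a PE while it is being reset raises a new EEH error, while an access to a frozen PE is dropped. The limit of 6 comes from the Actual Result above, not from any kernel default.

```python
from dataclasses import dataclass


@dataclass
class ToyPE:
    """Hypothetical model of one PE; not kernel code."""
    frozen: bool = True      # starts frozen by the injected error
    in_reset: bool = False
    errors: int = 0

    def mmio_access(self) -> None:
        if self.frozen:
            return               # frozen I/O path: the access is dropped
        if self.in_reset:
            self.errors += 1     # access hits the PE mid-reset: new EEH error
            self.frozen = True


def recover(pe: ToyPE, clear_freeze_first: bool) -> bool:
    """One recovery pass; returns True if no new error was raised."""
    before = pe.errors
    if clear_freeze_first:
        pe.frozen = False        # buggy path: frozen state cleared up front
    pe.in_reset = True
    pe.mmio_access()             # driver not fully stopped by error_detected()
    pe.in_reset = False
    if pe.errors == before:
        pe.frozen = False        # clean reset: unfreeze and resume I/O
        return True
    return False                 # recursive error: PE is frozen again


def recover_until_limit(clear_freeze_first: bool, limit: int = 6) -> tuple:
    """Retry recovery until it succeeds or `limit` failures accumulate."""
    pe = ToyPE()
    failures = 1                 # count the injected error itself
    while failures < limit:
        if recover(pe, clear_freeze_first):
            return True, failures
        failures += 1
    return False, failures
```

With the freeze cleared up front, every pass raises a new error until the limit is reached, as in the report; keeping the PE frozen through the reset lets the first pass succeed.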

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1502982/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 1502982] Comment bridged from LTC Bugzilla

2015-10-22 Thread bugproxy
--- Comment From cdead...@us.ibm.com 2015-10-22 14:54 EDT---
 State: Verify by: anitrap on 22 October 2015 09:38:53 

Gavin installed a patched kernel on my system and EEH on bluefin worked.
I'm waiting for the official fix before verifying.


[Kernel-packages] [Bug 1502982] Comment bridged from LTC Bugzilla

2015-10-14 Thread bugproxy
--- Comment From cdead...@us.ibm.com 2015-10-14 17:35 EDT---
 State: Verify by: trantow on 14 October 2015 12:26:47 

Need to understand Seq 19 & 20:
commit 2aa5cf9 ("powerpc/eeh: Fix missed PE#0 on P7IOC")
commit 433185d2 ("powerpc/eeh: Fix PE#0 check in eeh_add_to_parent_pe()")

Are both of these skiboot fixes that need to be in the 1539H driver,
or were we supposed to already have one or more of them in 1539G?

Are we right in assuming these are NOT Ubuntu fixes?
Is there also a set of Ubuntu 14.04.3 3.19.x kernel updates, and are these
already planned for the 10/17 release?


[Kernel-packages] [Bug 1502982] Comment bridged from LTC Bugzilla

2015-10-14 Thread bugproxy
--- Comment From gws...@au1.ibm.com 2015-10-14 21:34 EDT---
Yes, all 3 patches above are required. Otherwise, error recovery cannot work
on PE#0, and error injection to PE#0 simply fails. I think this particular
bug is just tracking the backport of those 2 Linux patches from upstream to
the Ubuntu distro.


[Kernel-packages] [Bug 1502982] Comment bridged from LTC Bugzilla

2015-10-14 Thread bugproxy
--- Comment From cdead...@us.ibm.com 2015-10-14 21:15 EDT---
 State: Verify by: trantow on 14 October 2015 16:08:58 

Are three patches required?
One for the skiboot firmware (in OPAL) ... delivered in 5.1.5 -> op810 1539G?
Two Linux ones, and when will they be available for the 14.04.3 3.19 kernel?

1) c7192a4 PHB3: Fix wrong PE number in error injection (skiboot)
2) 2aa5cf9 powerpc/eeh: Fix missed PE#0 on P7IOC (Linux)
 added later:
3) 433185d2 ("powerpc/eeh: Fix PE#0 check in eeh_add_to_parent_pe()")   (Linux)


[Kernel-packages] [Bug 1502982] Comment bridged from LTC Bugzilla

2015-10-14 Thread bugproxy
--- Comment From gws...@au1.ibm.com 2015-10-15 04:58 EDT---
Hi Canonical, those two patches are critical for EEH to work properly. If
possible, could you please include them in the next release cycle, which I
was told is 10/17?

If any assistance is needed, please let me know.


[Kernel-packages] [Bug 1502982] Comment bridged from LTC Bugzilla

2015-10-12 Thread bugproxy
--- Comment From gws...@au1.ibm.com 2015-10-12 13:36 EDT---
There is another related fix that needs backporting as well, so there are
two patches in total:

commit 2aa5cf9 ("powerpc/eeh: Fix missed PE#0 on P7IOC")
commit 433185d2 ("powerpc/eeh: Fix PE#0 check in eeh_add_to_parent_pe()")


Re: [Kernel-packages] [Bug 1502982] Comment bridged from LTC Bugzilla

2015-10-09 Thread Steve Langasek
On Fri, Oct 09, 2015 at 02:01:18PM -, bugproxy wrote:
> --- Comment From cdead...@us.ibm.com 2015-10-09 13:55 EDT---
>  State: Verify by: anitrap on 09 October 2015 08:45:31 

> Sorry about password confusion...So unless something has changed, I
> thought we were not supposed to put lab passwords in defect.

Your comments here are being mirrored to Ubuntu's public bug tracker, so
yes, please don't send us any lab passwords.


[Kernel-packages] [Bug 1502982] Comment bridged from LTC Bugzilla

2015-10-09 Thread bugproxy
--- Comment From cdead...@us.ibm.com 2015-10-09 13:55 EDT---
 State: Verify by: anitrap on 09 October 2015 08:45:31 

Sorry about the password confusion. Unless something has changed, I
thought we were not supposed to put lab passwords in the defect.

You can request access to the passwords from
https://pcajet.austin.ibm.com and then follow the directions to see the
current password (I put that in seq 1, which is why I never put the
password in the defect).

Directions to check passwords (from https://pcajet.austin.ibm.com) :
The Lab Test Passwords are now accessible only through the auto or manual 
install web apps. For example, from the manual install web app, enter your 
email address, check the Lab Passwords checkbox and then click on Submit.

Also, you can always go to the SOL console.

I'll send out a note with the password. Thanks for the help; I will keep
the system available.



[Kernel-packages] [Bug 1502982] Comment bridged from LTC Bugzilla

2015-10-07 Thread bugproxy
--- Comment From cha...@us.ibm.com 2015-10-07 22:20 EDT---
Hello Tim,

This may have been asked before, but since the patch has been committed
to Vivid, does that also mean it will automatically be picked up by an
upcoming linux-lts-vivid kernel for Trusty (14.04)?



[Kernel-packages] [Bug 1502982] Comment bridged from LTC Bugzilla

2015-10-07 Thread bugproxy
--- Comment From gws...@au1.ibm.com 2015-10-07 23:51 EDT---
*** Bug 128309 has been marked as a duplicate of this bug. ***
