[Kernel-packages] [Bug 1873537] Re: PCIe AER device recovery failed due to logic flaw

2020-08-18 Thread Brian Murray
The Eoan Ermine has reached end of life, so this bug will not be fixed
for that release.

** Changed in: linux (Ubuntu Eoan)
   Status: Fix Committed => Won't Fix

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1873537

Title:
  PCIe AER device recovery failed due to logic flaw

Status in linux package in Ubuntu:
  Incomplete
Status in linux source package in Eoan:
  Won't Fix
Status in linux source package in Focal:
  Fix Committed

Bug description:
  SRU Justification

  Impact:

  During PCI Express Downstream Port Containment (DPC) recovery,
  certain types of failures do not recover due to a logic flaw
  in pcie_do_recovery().

  The upstream git commit log explains the change:

  PCI/ERR: Update error status after reset_link()
  Commit bdb5ac85777d ("PCI/ERR: Handle fatal error recovery") uses
  reset_link() to recover from fatal errors.  But during fatal error
  recovery, if the initial value of error status is PCI_ERS_RESULT_DISCONNECT
  or PCI_ERS_RESULT_NO_AER_DRIVER then even after successful recovery (using
  reset_link()) pcie_do_recovery() will report the recovery result as
  failure.  Update the status of error after reset_link().

  You can reproduce this issue by triggering a SW DPC using "DPC Software
  Trigger" bit in "DPC Control Register".  You should see recovery failed
  dmesg log as below:

  pcieport :00:16.0: DPC: containment event, status:0x1f27 source:0x
  pcieport :00:16.0: DPC: software trigger detected
  pci :04:00.0: AER: can't recover (no error_detected callback)
  pcieport :00:16.0: AER: device recovery failed

  Fixes: bdb5ac85777d ("PCI/ERR: Handle fatal error recovery")
  Link: https://lore.kernel.org/r/a255fcb3a3fdebcd90f84e08b555f1786eb8eba2.158584.git.sathyanarayanan.kuppusw...@linux.intel.com
  [bhelgaas: split pci_channel_io_frozen simplification to separate patch]
  Signed-off-by: Kuppuswamy Sathyanarayanan
  Signed-off-by: Bjorn Helgaas
  Acked-by: Keith Busch
  Cc: Ashok Raj
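
  The flaw described in the commit log above can be illustrated with a
  simplified Python model.  This is only a sketch based on the commit
  description, not the kernel code: the Result enum, recover_fatal(), and
  the update_after_reset flag are hypothetical names, and the real
  pcie_do_recovery() performs further steps (driver callbacks, bus walks)
  that are omitted here.

```python
from enum import Enum


class Result(Enum):
    """Hypothetical stand-ins for the PCI_ERS_RESULT_* values named above."""
    CAN_RECOVER = "can_recover"
    NO_AER_DRIVER = "no_aer_driver"
    DISCONNECT = "disconnect"
    RECOVERED = "recovered"


def recover_fatal(initial_status, reset_link, update_after_reset):
    """Model fatal-error recovery; return True if recovery is reported as successful.

    initial_status     -- status collected before the link reset
    reset_link         -- callable returning a Result for the link reset
    update_after_reset -- False models the flawed path, True the fixed one
    """
    status = initial_status

    link_result = reset_link()
    if link_result is not Result.RECOVERED:
        return False  # the link reset itself failed

    if update_after_reset:
        # The fix: the outcome of the reset replaces the stale status.
        status = link_result

    if status is Result.CAN_RECOVER:
        status = Result.RECOVERED

    # Without the update, a stale NO_AER_DRIVER or DISCONNECT value
    # reaches this check and the recovery is reported as failed.
    return status is Result.RECOVERED
```

  In this model, an initial status of NO_AER_DRIVER or DISCONNECT followed
  by a successful reset is reported as a failure unless update_after_reset
  is set, which mirrors the behaviour the commit log describes.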

  Note that a second, prerequisite patch is required as well.  This
  patch,

  commit b5dfbeacf74865a8d62a4f70f501cdc61510f8e0
  Author: Kuppuswamy Sathyanarayanan
  Date:   Fri Mar 27 17:33:24 2020 -0500

  PCI/ERR: Combine pci_channel_io_frozen cases

  is a code readability change and makes no functional changes.

  
  Testcase:

  On a system with DPC enabled, setpci may be used to set the DPC Software
  Trigger bit (bit 6, value 0x40) in the DPC Control register of a suitable
  PCIe device (a PCIe bridge, for example).
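
  The value to write can be derived from the current register contents, as
  in the Python sketch below.  It only performs the bit arithmetic and does
  not touch hardware; the function name is hypothetical, and treating the
  DPC Control register as 16 bits wide is an assumption not stated in this
  report.

```python
# Bit 6 (value 0x40) is the DPC Software Trigger bit, as described above.
DPC_SOFTWARE_TRIGGER = 1 << 6


def with_software_trigger(dpc_control: int) -> int:
    """Return the DPC Control value with the Software Trigger bit set.

    All other bits are preserved; the result is masked to 16 bits
    (assumed register width).
    """
    return (dpc_control | DPC_SOFTWARE_TRIGGER) & 0xFFFF
```

  Preserving the other bits matters because the same register also holds
  the settings that enable DPC, so writing 0x40 alone could change them.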

  On a system lacking the fix, the output will be as shown above (i.e.,
  culminating in the "device recovery failed" message).  With the fix
  applied, the device successfully recovers, resulting in a message of the
  form

  pcieport :d9:01.0: AER: Device recovery successful
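
  When running the test case repeatedly, the two outcomes can be told apart
  by scanning the kernel log for the messages quoted in this report.  The
  helper below is a minimal sketch; recovery_outcome() is a hypothetical
  name, and it assumes the log lines have already been collected as strings
  (for example from dmesg output).

```python
def recovery_outcome(dmesg_lines):
    """Return "recovered", "failed", or None for the last AER recovery message.

    Matching is case-insensitive because the two messages quoted in this
    report differ in capitalisation.
    """
    outcome = None
    for line in dmesg_lines:
        lowered = line.lower()
        if "aer: device recovery successful" in lowered:
            outcome = "recovered"
        elif "aer: device recovery failed" in lowered:
            outcome = "failed"
    return outcome
```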

  
  Regression Potential:

  The risk of regression is low, as (a) the path in question currently does
  not work, and (b) the changes are minimal, comprising only a housekeeping
  change and the logically correct updating of a status variable that did
  not previously occur.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1873537/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 1873537] Re: PCIe AER device recovery failed due to logic flaw

2020-04-24 Thread Kelsey Margarete Skunberg
** Also affects: linux (Ubuntu Focal)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Eoan)
   Importance: Undecided
   Status: New

** Changed in: linux (Ubuntu Eoan)
   Status: New => Fix Committed

** Changed in: linux (Ubuntu Focal)
   Status: New => Fix Committed

Title:
  PCIe AER device recovery failed due to logic flaw

Status in linux package in Ubuntu:
  Incomplete
Status in linux source package in Eoan:
  Fix Committed
Status in linux source package in Focal:
  Fix Committed
