Re: Questions: Should kernel panic when PCIe fatal error occurs?

2023-09-26 Thread Oliver O'Halloran
On Wed, Sep 27, 2023 at 9:03 AM Bjorn Helgaas wrote: > > On Fri, Sep 22, 2023 at 10:46:36AM +0800, Shuai Xue wrote: > > ... > > > Actually, this is a question from my colleague from firmware team. > > The original question is that: > > > > "Should I set CPER_SEV_FATAL for Generic Error Status

Re: Questions: Should kernel panic when PCIe fatal error occurs?

2023-09-26 Thread Shuai Xue
On 2023/9/27 07:02, Bjorn Helgaas wrote: > On Fri, Sep 22, 2023 at 10:46:36AM +0800, Shuai Xue wrote: >> ... > >> Actually, this is a question from my colleague from firmware team. >> The original question is that: >> >> "Should I set CPER_SEV_FATAL for Generic Error Status Block when a >>

Re: Questions: Should kernel panic when PCIe fatal error occurs?

2023-09-26 Thread Bjorn Helgaas
On Fri, Sep 22, 2023 at 10:46:36AM +0800, Shuai Xue wrote: > ... > Actually, this is a question from my colleague from firmware team. > The original question is that: > > "Should I set CPER_SEV_FATAL for Generic Error Status Block when a > PCIe fatal error is detected? If set, kernel

RE: Questions: Should kernel panic when PCIe fatal error occurs?

2023-09-25 Thread David Laight
From: Shuai Xue > Sent: 25 September 2023 02:44 > > On 2023/9/21 21:20, David Laight wrote: > > ... > > I've got a target to generate AER errors by generating read cycles > > that are inside the address range that the bridge forwards but > > outside of any BAR because there are 2 different sized

Re: Questions: Should kernel panic when PCIe fatal error occurs?

2023-09-24 Thread Oliver O'Halloran
On Fri, Sep 22, 2023 at 8:23 AM David Laight wrote: > > > It would be nice if they worked the same, but I suspect that vendors > > may rely on the fact that CPER_SEV_FATAL forces a restart/panic as > > part of their system integrity story. > > The file system errors created by a panic (especially

Re: Questions: Should kernel panic when PCIe fatal error occurs?

2023-09-24 Thread Shuai Xue
On 2023/9/21 21:20, David Laight wrote: > ... > I've got a target to generate AER errors by generating read cycles > that are inside the address range that the bridge forwards but > outside of any BAR because there are 2 different sized BARs. > (Pretty easy to setup.) > On the system I was

Re: Questions: Should kernel panic when PCIe fatal error occurs?

2023-09-21 Thread Shuai Xue
+ @Rafael for the APEI/GHES part. On 2023/9/22 05:52, Bjorn Helgaas wrote: > On Thu, Sep 21, 2023 at 08:10:19PM +0800, Shuai Xue wrote: >> On 2023/9/21 07:02, Bjorn Helgaas wrote: >>> On Mon, Sep 18, 2023 at 05:39:58PM +0800, Shuai Xue wrote: >> ... > >>> I guess your point is that for

RE: Questions: Should kernel panic when PCIe fatal error occurs?

2023-09-21 Thread David Laight
> It would be nice if they worked the same, but I suspect that vendors > may rely on the fact that CPER_SEV_FATAL forces a restart/panic as > part of their system integrity story. The file system errors created by a panic (especially an NMI panic) could easily be more problematic than a failed

Re: Questions: Should kernel panic when PCIe fatal error occurs?

2023-09-21 Thread Bjorn Helgaas
On Thu, Sep 21, 2023 at 08:10:19PM +0800, Shuai Xue wrote: > On 2023/9/21 07:02, Bjorn Helgaas wrote: > > On Mon, Sep 18, 2023 at 05:39:58PM +0800, Shuai Xue wrote: > ... > > I guess your point is that for CPER_SEV_FATAL errors, the APEI/GHES > > path always panics but the native path never does,

RE: Questions: Should kernel panic when PCIe fatal error occurs?

2023-09-21 Thread David Laight
... I've got a target to generate AER errors by generating read cycles that are inside the address range that the bridge forwards but outside of any BAR because there are 2 different sized BARs. (Pretty easy to setup.) On the system I was using they didn't get propagated all the way to the root

Re: Questions: Should kernel panic when PCIe fatal error occurs?

2023-09-21 Thread Shuai Xue
On 2023/9/21 07:02, Bjorn Helgaas wrote: > On Mon, Sep 18, 2023 at 05:39:58PM +0800, Shuai Xue wrote: >> Hi, all folks, >> >> Error reporting and recovery are one of the important features of PCIe, and >> the kernel has been supporting them since version 2.6, 17 years ago. >> I am very curious

Re: Questions: Should kernel panic when PCIe fatal error occurs?

2023-09-20 Thread Bjorn Helgaas
On Mon, Sep 18, 2023 at 05:39:58PM +0800, Shuai Xue wrote: > Hi, all folks, > > Error reporting and recovery are one of the important features of PCIe, and > the kernel has been supporting them since version 2.6, 17 years ago. > I am very curious about the expected behavior of the software. > I

Questions: Should kernel panic when PCIe fatal error occurs?

2023-09-18 Thread Shuai Xue
Hi, all folks, Error reporting and recovery are one of the important features of PCIe, and the kernel has been supporting them since version 2.6, 17 years ago. I am very curious about the expected behavior of the software. I first recap the error classification and then list my questions below