Re: Questions: Should kernel panic when PCIe fatal error occurs?

2023-09-26 Thread Oliver O'Halloran
On Wed, Sep 27, 2023 at 9:03 AM Bjorn Helgaas wrote:
>
> On Fri, Sep 22, 2023 at 10:46:36AM +0800, Shuai Xue wrote:
> > ...
>
> > Actually, this is a question from my colleague from firmware team.
> > The original question is that:
> >
> > "Should I set CPER_SEV_FATAL for Generic Error Status

Re: Questions: Should kernel panic when PCIe fatal error occurs?

2023-09-26 Thread Shuai Xue
On 2023/9/27 07:02, Bjorn Helgaas wrote:
> On Fri, Sep 22, 2023 at 10:46:36AM +0800, Shuai Xue wrote:
>> ...
>
>> Actually, this is a question from my colleague from firmware team.
>> The original question is that:
>>
>> "Should I set CPER_SEV_FATAL for Generic Error Status Block when a
>>

Re: Questions: Should kernel panic when PCIe fatal error occurs?

2023-09-26 Thread Bjorn Helgaas
On Fri, Sep 22, 2023 at 10:46:36AM +0800, Shuai Xue wrote:
> ...
> Actually, this is a question from my colleague from firmware team.
> The original question is that:
>
> "Should I set CPER_SEV_FATAL for Generic Error Status Block when a
> PCIe fatal error is detected? If set, kernel

RE: Questions: Should kernel panic when PCIe fatal error occurs?

2023-09-25 Thread David Laight
From: Shuai Xue
> Sent: 25 September 2023 02:44
>
> On 2023/9/21 21:20, David Laight wrote:
> > ...
> > I've got a target to generate AER errors by generating read cycles
> > that are inside the address range that the bridge forwards but
> > outside of any BAR because there are 2 different sized

Re: Questions: Should kernel panic when PCIe fatal error occurs?

2023-09-24 Thread Oliver O'Halloran
On Fri, Sep 22, 2023 at 8:23 AM David Laight wrote:
>
> > It would be nice if they worked the same, but I suspect that vendors
> > may rely on the fact that CPER_SEV_FATAL forces a restart/panic as
> > part of their system integrity story.
>
> The file system errors created by a panic (especially

Re: Questions: Should kernel panic when PCIe fatal error occurs?

2023-09-24 Thread Shuai Xue
On 2023/9/21 21:20, David Laight wrote:
> ...
> I've got a target to generate AER errors by generating read cycles
> that are inside the address range that the bridge forwards but
> outside of any BAR because there are 2 different sized BARs.
> (Pretty easy to setup.)
> On the system I was

Re: Questions: Should kernel panic when PCIe fatal error occurs?

2023-09-21 Thread Shuai Xue
+ @Rafael for the APEI/GHES part.
On 2023/9/22 05:52, Bjorn Helgaas wrote:
> On Thu, Sep 21, 2023 at 08:10:19PM +0800, Shuai Xue wrote:
>> On 2023/9/21 07:02, Bjorn Helgaas wrote:
>>> On Mon, Sep 18, 2023 at 05:39:58PM +0800, Shuai Xue wrote:
>> ...
>
>>> I guess your point is that for

RE: Questions: Should kernel panic when PCIe fatal error occurs?

2023-09-21 Thread David Laight
> It would be nice if they worked the same, but I suspect that vendors
> may rely on the fact that CPER_SEV_FATAL forces a restart/panic as
> part of their system integrity story.
The file system errors created by a panic (especially an NMI panic) could easily be more problematic than a failed

Re: Questions: Should kernel panic when PCIe fatal error occurs?

2023-09-21 Thread Bjorn Helgaas
On Thu, Sep 21, 2023 at 08:10:19PM +0800, Shuai Xue wrote:
> On 2023/9/21 07:02, Bjorn Helgaas wrote:
> > On Mon, Sep 18, 2023 at 05:39:58PM +0800, Shuai Xue wrote:
> ...
> > I guess your point is that for CPER_SEV_FATAL errors, the APEI/GHES
> > path always panics but the native path never does,

RE: Questions: Should kernel panic when PCIe fatal error occurs?

2023-09-21 Thread David Laight
... I've got a target to generate AER errors by generating read cycles that are inside the address range that the bridge forwards but outside of any BAR because there are 2 different sized BARs. (Pretty easy to setup.) On the system I was using they didn't get propagated all the way to the root

Re: Questions: Should kernel panic when PCIe fatal error occurs?

2023-09-21 Thread Shuai Xue
On 2023/9/21 07:02, Bjorn Helgaas wrote:
> On Mon, Sep 18, 2023 at 05:39:58PM +0800, Shuai Xue wrote:
>> Hi, all folks,
>>
>> Error reporting and recovery are one of the important features of PCIe, and
>> the kernel has been supporting them since version 2.6, 17 years ago.
>> I am very curious

Re: Questions: Should kernel panic when PCIe fatal error occurs?

2023-09-20 Thread Bjorn Helgaas
On Mon, Sep 18, 2023 at 05:39:58PM +0800, Shuai Xue wrote:
> Hi, all folks,
>
> Error reporting and recovery are one of the important features of PCIe, and
> the kernel has been supporting them since version 2.6, 17 years ago.
> I am very curious about the expected behavior of the software.
> I