Re: [PATCH v10 13/15] PCI/AER: Forward RCH downstream port-detected errors to the CXL.mem dev handler
Terry Bowman wrote: > Hi Dan, > > On 8/31/23 15:35, Dan Williams wrote: > > Terry Bowman wrote: > >> From: Robert Richter > >> > >> In Restricted CXL Device (RCD) mode a CXL device is exposed as an > >> RCiEP, but CXL downstream and upstream ports are not enumerated and > >> not visible in the PCIe hierarchy. [1] Protocol and link errors from > >> these non-enumerated ports are signaled as internal AER errors, either > >> Uncorrectable Internal Error (UIE) or Corrected Internal Errors (CIE) > >> via an RCEC. > >> > >> Restricted CXL host (RCH) downstream port-detected errors have the > >> Requester ID of the RCEC set in the RCEC's AER Error Source ID > >> register. A CXL handler must then inspect the error status in various > >> CXL registers residing in the dport's component register space (CXL > >> RAS capability) or the dport's RCRB (PCIe AER extended > >> capability). [2] > >> > >> Errors showing up in the RCEC's error handler must be handled and > >> connected to the CXL subsystem. Implement this by forwarding the error > >> to all CXL devices below the RCEC. Since the entire CXL device is > >> controlled only using PCIe Configuration Space of device 0, function > >> 0, only pass it there [3]. The error handling is limited to currently > >> supported devices with the Memory Device class code set (CXL Type 3 > >> Device, PCI_CLASS_MEMORY_CXL, 502h), handle downstream port errors in > >> the device's cxl_pci driver. Support for other CXL Device Types > >> (e.g. a CXL.cache Device) can be added later. > >> > >> To handle downstream port errors in addition to errors directed to the > >> CXL endpoint device, a handler must also inspect the CXL RAS and PCIe > >> AER capabilities of the CXL downstream port the device is connected > >> to. > >> > >> Since CXL downstream port errors are signaled using internal errors, > >> the handler requires those errors to be unmasked. This is subject of a > >> follow-on patch. > >> > >> The reason for choosing this implementation is that the AER service > >> driver claims the RCEC device, but does not allow it to register a > >> custom specific handler to support CXL. Connecting the RCEC hard-wired > >> with a CXL handler does not work, as the CXL subsystem might not be > >> present all the time. The alternative to add an implementation to the > >> portdrv to allow the registration of a custom RCEC error handler isn't > >> worth doing it as CXL would be its only user. Instead, just check for > >> an CXL RCEC and pass it down to the connected CXL device's error > >> handler. With this approach the code can entirely be implemented in > >> the PCIe AER driver and is independent of the CXL subsystem. The CXL > >> driver only provides the handler. > >> > >> [1] CXL 3.0 spec: 9.11.8 CXL Devices Attached to an RCH > >> [2] CXL 3.0 spec, 12.2.1.1 RCH Downstream Port-detected Errors > >> [3] CXL 3.0 spec, 8.1.3 PCIe DVSEC for CXL Devices > >> > >> Co-developed-by: Terry Bowman > >> Signed-off-by: Terry Bowman > >> Signed-off-by: Robert Richter > >> Cc: "Oliver O'Halloran" > >> Cc: Bjorn Helgaas > >> Cc: linuxppc-dev@lists.ozlabs.org > >> Cc: linux-...@vger.kernel.org > >> Acked-by: Bjorn Helgaas > >> Reviewed-by: Jonathan Cameron > >> Reviewed-by: Dave Jiang > >> --- > >> drivers/pci/pcie/Kconfig | 12 + > >> drivers/pci/pcie/aer.c | 96 +++- > >> 2 files changed, 106 insertions(+), 2 deletions(-) > >> > >> diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig > >> index 228652a59f27..4f0e70fafe2d 100644 > >> --- a/drivers/pci/pcie/Kconfig > >> +++ b/drivers/pci/pcie/Kconfig > >> @@ -49,6 +49,18 @@ config PCIEAER_INJECT > >> gotten from: > >> > >> https://git.kernel.org/cgit/linux/kernel/git/gong.chen/aer-inject.git/ > >> > >> +config PCIEAER_CXL > >> + bool "PCI Express CXL RAS support for Restricted Hosts (RCH)" > > > > Why the "for Restricted Hosts (RCH)" clarification? I am seeing nothing > > that prevents this from working with RCECs on VH topologies. > > > > The same option can be used in VH mode. Will remove the RCH reference. > > > > >> + default y > > > > Minor, but I think "default PCIEAER" makes it slightly clearer that CXL > > error handling comes along for the ride with PCIE AER. > > > > We found Kconfig entries do not typically list a dependancy and the default > to be the same. We prefer to leave as 'default y'. If you want we can make > your requested change. The tie breaker would be to follow whatever the local precedent is. Indeed, it looks like "depends with default y" is consistent with other drivers/pci/pcie/Kconfig entries. So I retract my comment. > > >> + depends on PCIEAER && CXL_PCI > >> + help > >> +Enables error handling of downstream ports of a CXL host > >> +that is operating in RCD mode (Restricted CXL Host, RCH). > >> +The downstream port reports AER errors to a given RCEC. > >> +Errors are handled by the CXL memory device driver. > >> +
Re: [PATCH v10 13/15] PCI/AER: Forward RCH downstream port-detected errors to the CXL.mem dev handler
Hi Dan, I adde danothe comment below. On 9/19/23 15:58, Terry Bowman wrote: > Hi Dan, > > On 8/31/23 15:35, Dan Williams wrote: >> Terry Bowman wrote: >>> From: Robert Richter >>> >>> In Restricted CXL Device (RCD) mode a CXL device is exposed as an >>> RCiEP, but CXL downstream and upstream ports are not enumerated and >>> not visible in the PCIe hierarchy. [1] Protocol and link errors from >>> these non-enumerated ports are signaled as internal AER errors, either >>> Uncorrectable Internal Error (UIE) or Corrected Internal Errors (CIE) >>> via an RCEC. >>> >>> Restricted CXL host (RCH) downstream port-detected errors have the >>> Requester ID of the RCEC set in the RCEC's AER Error Source ID >>> register. A CXL handler must then inspect the error status in various >>> CXL registers residing in the dport's component register space (CXL >>> RAS capability) or the dport's RCRB (PCIe AER extended >>> capability). [2] >>> >>> Errors showing up in the RCEC's error handler must be handled and >>> connected to the CXL subsystem. Implement this by forwarding the error >>> to all CXL devices below the RCEC. Since the entire CXL device is >>> controlled only using PCIe Configuration Space of device 0, function >>> 0, only pass it there [3]. The error handling is limited to currently >>> supported devices with the Memory Device class code set (CXL Type 3 >>> Device, PCI_CLASS_MEMORY_CXL, 502h), handle downstream port errors in >>> the device's cxl_pci driver. Support for other CXL Device Types >>> (e.g. a CXL.cache Device) can be added later. >>> >>> To handle downstream port errors in addition to errors directed to the >>> CXL endpoint device, a handler must also inspect the CXL RAS and PCIe >>> AER capabilities of the CXL downstream port the device is connected >>> to. >>> >>> Since CXL downstream port errors are signaled using internal errors, >>> the handler requires those errors to be unmasked. This is subject of a >>> follow-on patch. >>> >>> The reason for choosing this implementation is that the AER service >>> driver claims the RCEC device, but does not allow it to register a >>> custom specific handler to support CXL. Connecting the RCEC hard-wired >>> with a CXL handler does not work, as the CXL subsystem might not be >>> present all the time. The alternative to add an implementation to the >>> portdrv to allow the registration of a custom RCEC error handler isn't >>> worth doing it as CXL would be its only user. Instead, just check for >>> an CXL RCEC and pass it down to the connected CXL device's error >>> handler. With this approach the code can entirely be implemented in >>> the PCIe AER driver and is independent of the CXL subsystem. The CXL >>> driver only provides the handler. >>> >>> [1] CXL 3.0 spec: 9.11.8 CXL Devices Attached to an RCH >>> [2] CXL 3.0 spec, 12.2.1.1 RCH Downstream Port-detected Errors >>> [3] CXL 3.0 spec, 8.1.3 PCIe DVSEC for CXL Devices >>> >>> Co-developed-by: Terry Bowman >>> Signed-off-by: Terry Bowman >>> Signed-off-by: Robert Richter >>> Cc: "Oliver O'Halloran" >>> Cc: Bjorn Helgaas >>> Cc: linuxppc-dev@lists.ozlabs.org >>> Cc: linux-...@vger.kernel.org >>> Acked-by: Bjorn Helgaas >>> Reviewed-by: Jonathan Cameron >>> Reviewed-by: Dave Jiang >>> --- >>> drivers/pci/pcie/Kconfig | 12 + >>> drivers/pci/pcie/aer.c | 96 +++- >>> 2 files changed, 106 insertions(+), 2 deletions(-) >>> >>> diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig >>> index 228652a59f27..4f0e70fafe2d 100644 >>> --- a/drivers/pci/pcie/Kconfig >>> +++ b/drivers/pci/pcie/Kconfig >>> @@ -49,6 +49,18 @@ config PCIEAER_INJECT >>> gotten from: >>> >>> https://git.kernel.org/cgit/linux/kernel/git/gong.chen/aer-inject.git/ >>> >>> +config PCIEAER_CXL >>> + bool "PCI Express CXL RAS support for Restricted Hosts (RCH)" >> >> Why the "for Restricted Hosts (RCH)" clarification? I am seeing nothing >> that prevents this from working with RCECs on VH topologies. >> > > The same option can be used in VH mode. Will remove the RCH reference. > >> >>> + default y >> >> Minor, but I think "default PCIEAER" makes it slightly clearer that CXL >> error handling comes along for the ride with PCIE AER. >> > > We found Kconfig entries do not typically list a dependancy and the default > to be the same. We prefer to leave as 'default y'. If you want we can make > your requested change. > >>> + depends on PCIEAER && CXL_PCI >>> + help >>> + Enables error handling of downstream ports of a CXL host >>> + that is operating in RCD mode (Restricted CXL Host, RCH). >>> + The downstream port reports AER errors to a given RCEC. >>> + Errors are handled by the CXL memory device driver. >>> + >>> + If unsure, say Y. >>> + >>> # >>> # PCI Express ECRC >>> # >>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c >>> index d3344fcf1f79..c354ca5e8f2b 100644 >>> --- a/drivers/pci/pcie/aer.c >>> +++
Re: [PATCH v10 13/15] PCI/AER: Forward RCH downstream port-detected errors to the CXL.mem dev handler
Hi Dan, On 8/31/23 15:35, Dan Williams wrote: > Terry Bowman wrote: >> From: Robert Richter >> >> In Restricted CXL Device (RCD) mode a CXL device is exposed as an >> RCiEP, but CXL downstream and upstream ports are not enumerated and >> not visible in the PCIe hierarchy. [1] Protocol and link errors from >> these non-enumerated ports are signaled as internal AER errors, either >> Uncorrectable Internal Error (UIE) or Corrected Internal Errors (CIE) >> via an RCEC. >> >> Restricted CXL host (RCH) downstream port-detected errors have the >> Requester ID of the RCEC set in the RCEC's AER Error Source ID >> register. A CXL handler must then inspect the error status in various >> CXL registers residing in the dport's component register space (CXL >> RAS capability) or the dport's RCRB (PCIe AER extended >> capability). [2] >> >> Errors showing up in the RCEC's error handler must be handled and >> connected to the CXL subsystem. Implement this by forwarding the error >> to all CXL devices below the RCEC. Since the entire CXL device is >> controlled only using PCIe Configuration Space of device 0, function >> 0, only pass it there [3]. The error handling is limited to currently >> supported devices with the Memory Device class code set (CXL Type 3 >> Device, PCI_CLASS_MEMORY_CXL, 502h), handle downstream port errors in >> the device's cxl_pci driver. Support for other CXL Device Types >> (e.g. a CXL.cache Device) can be added later. >> >> To handle downstream port errors in addition to errors directed to the >> CXL endpoint device, a handler must also inspect the CXL RAS and PCIe >> AER capabilities of the CXL downstream port the device is connected >> to. >> >> Since CXL downstream port errors are signaled using internal errors, >> the handler requires those errors to be unmasked. This is subject of a >> follow-on patch. >> >> The reason for choosing this implementation is that the AER service >> driver claims the RCEC device, but does not allow it to register a >> custom specific handler to support CXL. Connecting the RCEC hard-wired >> with a CXL handler does not work, as the CXL subsystem might not be >> present all the time. The alternative to add an implementation to the >> portdrv to allow the registration of a custom RCEC error handler isn't >> worth doing it as CXL would be its only user. Instead, just check for >> an CXL RCEC and pass it down to the connected CXL device's error >> handler. With this approach the code can entirely be implemented in >> the PCIe AER driver and is independent of the CXL subsystem. The CXL >> driver only provides the handler. >> >> [1] CXL 3.0 spec: 9.11.8 CXL Devices Attached to an RCH >> [2] CXL 3.0 spec, 12.2.1.1 RCH Downstream Port-detected Errors >> [3] CXL 3.0 spec, 8.1.3 PCIe DVSEC for CXL Devices >> >> Co-developed-by: Terry Bowman >> Signed-off-by: Terry Bowman >> Signed-off-by: Robert Richter >> Cc: "Oliver O'Halloran" >> Cc: Bjorn Helgaas >> Cc: linuxppc-dev@lists.ozlabs.org >> Cc: linux-...@vger.kernel.org >> Acked-by: Bjorn Helgaas >> Reviewed-by: Jonathan Cameron >> Reviewed-by: Dave Jiang >> --- >> drivers/pci/pcie/Kconfig | 12 + >> drivers/pci/pcie/aer.c | 96 +++- >> 2 files changed, 106 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig >> index 228652a59f27..4f0e70fafe2d 100644 >> --- a/drivers/pci/pcie/Kconfig >> +++ b/drivers/pci/pcie/Kconfig >> @@ -49,6 +49,18 @@ config PCIEAER_INJECT >>gotten from: >> >> https://git.kernel.org/cgit/linux/kernel/git/gong.chen/aer-inject.git/ >> >> +config PCIEAER_CXL >> +bool "PCI Express CXL RAS support for Restricted Hosts (RCH)" > > Why the "for Restricted Hosts (RCH)" clarification? I am seeing nothing > that prevents this from working with RCECs on VH topologies. > The same option can be used in VH mode. Will remove the RCH reference. > >> +default y > > Minor, but I think "default PCIEAER" makes it slightly clearer that CXL > error handling comes along for the ride with PCIE AER. > We found Kconfig entries do not typically list a dependancy and the default to be the same. We prefer to leave as 'default y'. If you want we can make your requested change. >> +depends on PCIEAER && CXL_PCI >> +help >> + Enables error handling of downstream ports of a CXL host >> + that is operating in RCD mode (Restricted CXL Host, RCH). >> + The downstream port reports AER errors to a given RCEC. >> + Errors are handled by the CXL memory device driver. >> + >> + If unsure, say Y. >> + >> # >> # PCI Express ECRC >> # >> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c >> index d3344fcf1f79..c354ca5e8f2b 100644 >> --- a/drivers/pci/pcie/aer.c >> +++ b/drivers/pci/pcie/aer.c >> @@ -946,14 +946,100 @@ static bool find_source_device(struct pci_dev *parent, >> return true; >> } >> >> +#ifdef CONFIG_PCIEAER_CXL >> + >> +static bool
RE: [PATCH v10 13/15] PCI/AER: Forward RCH downstream port-detected errors to the CXL.mem dev handler
Terry Bowman wrote: > From: Robert Richter > > In Restricted CXL Device (RCD) mode a CXL device is exposed as an > RCiEP, but CXL downstream and upstream ports are not enumerated and > not visible in the PCIe hierarchy. [1] Protocol and link errors from > these non-enumerated ports are signaled as internal AER errors, either > Uncorrectable Internal Error (UIE) or Corrected Internal Errors (CIE) > via an RCEC. > > Restricted CXL host (RCH) downstream port-detected errors have the > Requester ID of the RCEC set in the RCEC's AER Error Source ID > register. A CXL handler must then inspect the error status in various > CXL registers residing in the dport's component register space (CXL > RAS capability) or the dport's RCRB (PCIe AER extended > capability). [2] > > Errors showing up in the RCEC's error handler must be handled and > connected to the CXL subsystem. Implement this by forwarding the error > to all CXL devices below the RCEC. Since the entire CXL device is > controlled only using PCIe Configuration Space of device 0, function > 0, only pass it there [3]. The error handling is limited to currently > supported devices with the Memory Device class code set (CXL Type 3 > Device, PCI_CLASS_MEMORY_CXL, 502h), handle downstream port errors in > the device's cxl_pci driver. Support for other CXL Device Types > (e.g. a CXL.cache Device) can be added later. > > To handle downstream port errors in addition to errors directed to the > CXL endpoint device, a handler must also inspect the CXL RAS and PCIe > AER capabilities of the CXL downstream port the device is connected > to. > > Since CXL downstream port errors are signaled using internal errors, > the handler requires those errors to be unmasked. This is subject of a > follow-on patch. > > The reason for choosing this implementation is that the AER service > driver claims the RCEC device, but does not allow it to register a > custom specific handler to support CXL. Connecting the RCEC hard-wired > with a CXL handler does not work, as the CXL subsystem might not be > present all the time. The alternative to add an implementation to the > portdrv to allow the registration of a custom RCEC error handler isn't > worth doing it as CXL would be its only user. Instead, just check for > an CXL RCEC and pass it down to the connected CXL device's error > handler. With this approach the code can entirely be implemented in > the PCIe AER driver and is independent of the CXL subsystem. The CXL > driver only provides the handler. > > [1] CXL 3.0 spec: 9.11.8 CXL Devices Attached to an RCH > [2] CXL 3.0 spec, 12.2.1.1 RCH Downstream Port-detected Errors > [3] CXL 3.0 spec, 8.1.3 PCIe DVSEC for CXL Devices > > Co-developed-by: Terry Bowman > Signed-off-by: Terry Bowman > Signed-off-by: Robert Richter > Cc: "Oliver O'Halloran" > Cc: Bjorn Helgaas > Cc: linuxppc-dev@lists.ozlabs.org > Cc: linux-...@vger.kernel.org > Acked-by: Bjorn Helgaas > Reviewed-by: Jonathan Cameron > Reviewed-by: Dave Jiang > --- > drivers/pci/pcie/Kconfig | 12 + > drivers/pci/pcie/aer.c | 96 +++- > 2 files changed, 106 insertions(+), 2 deletions(-) > > diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig > index 228652a59f27..4f0e70fafe2d 100644 > --- a/drivers/pci/pcie/Kconfig > +++ b/drivers/pci/pcie/Kconfig > @@ -49,6 +49,18 @@ config PCIEAER_INJECT > gotten from: > > https://git.kernel.org/cgit/linux/kernel/git/gong.chen/aer-inject.git/ > > +config PCIEAER_CXL > + bool "PCI Express CXL RAS support for Restricted Hosts (RCH)" Why the "for Restricted Hosts (RCH)" clarification? I am seeing nothing that prevents this from working with RCECs on VH topologies. > + default y Minor, but I think "default PCIEAER" makes it slightly clearer that CXL error handling comes along for the ride with PCIE AER. > + depends on PCIEAER && CXL_PCI > + help > + Enables error handling of downstream ports of a CXL host > + that is operating in RCD mode (Restricted CXL Host, RCH). > + The downstream port reports AER errors to a given RCEC. > + Errors are handled by the CXL memory device driver. > + > + If unsure, say Y. > + > # > # PCI Express ECRC > # > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > index d3344fcf1f79..c354ca5e8f2b 100644 > --- a/drivers/pci/pcie/aer.c > +++ b/drivers/pci/pcie/aer.c > @@ -946,14 +946,100 @@ static bool find_source_device(struct pci_dev *parent, > return true; > } > > +#ifdef CONFIG_PCIEAER_CXL > + > +static bool is_cxl_mem_dev(struct pci_dev *dev) > +{ > + /* > + * The capability, status, and control fields in Device 0, > + * Function 0 DVSEC control the CXL functionality of the > + * entire device (CXL 3.0, 8.1.3). > + */ > + if (dev->devfn != PCI_DEVFN(0, 0)) > + return false; > + > + /* > + * CXL Memory Devices must have the 502h class code set (CXL > + *