Re: [PATCH v8 2/3] PCI/AER: Disable AER service on suspend
On 4/15/24 9:32 PM, Kai-Heng Feng wrote: > When the power rail gets cut off, the hardware can create some electric > noise on the link that triggers AER. If IRQ is shared between AER with > PME, such AER noise will cause a spurious wakeup on system suspend. > > When the power rail gets back, the firmware of the device resets itself > and can create unexpected behavior like sending PTM messages. For this > case, the driver will always be too late to toggle off features should > be disabled. > > As Per PCIe Base Spec 5.0, section 5.2, titled "Link State Power > Management", TLP and DLLP transmission are disabled for a Link in L2/L3 > Ready (D3hot), L2 (D3cold with aux power) and L3 (D3cold) states. So if > the power will be turned off during suspend process, disable AER service > and re-enable it during the resume process. This should not affect the > basic functionality. > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=209149 > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218090 > Signed-off-by: Kai-Heng Feng > --- Looks good to me. Reviewed-by: Kuppuswamy Sathyanarayanan > v8: > - Add more bug reports. > > v7: > - Wording > - Disable AER completely (again) if power will be turned off > > v6: > v5: > - Wording. > > v4: > v3: > - No change. > > v2: > - Only disable AER IRQ. > - No more check on PME IRQ#. > - Use helper. > > drivers/pci/pcie/aer.c | 25 + > 1 file changed, 25 insertions(+) > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > index ac6293c24976..bea7818c2d1b 100644 > --- a/drivers/pci/pcie/aer.c > +++ b/drivers/pci/pcie/aer.c > @@ -28,6 +28,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -1497,6 +1498,28 @@ static int aer_probe(struct pcie_device *dev) > return 0; > } > > +static int aer_suspend(struct pcie_device *dev) > +{ > + struct aer_rpc *rpc = get_service_data(dev); > + struct pci_dev *pdev = rpc->rpd; > + > + if (pci_ancestor_pr3_present(pdev) || pm_suspend_via_firmware()) > + aer_disable_rootport(rpc); > + > + return 0; > +} > + > +static int aer_resume(struct pcie_device *dev) > +{ > + struct aer_rpc *rpc = get_service_data(dev); > + struct pci_dev *pdev = rpc->rpd; > + > + if (pci_ancestor_pr3_present(pdev) || pm_resume_via_firmware()) > + aer_enable_rootport(rpc); > + > + return 0; > +} > + > /** > * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP > * @dev: pointer to Root Port, RCEC, or RCiEP > @@ -1561,6 +1584,8 @@ static struct pcie_port_service_driver aerdriver = { > .service= PCIE_PORT_SERVICE_AER, > > .probe = aer_probe, > + .suspend= aer_suspend, > + .resume = aer_resume, > .remove = aer_remove, > }; > -- Sathyanarayanan Kuppuswamy Linux Kernel Developer
Re: [PATCH v8 1/3] PCI: Add helper to check if any of ancestor device support D3cold
On 4/15/24 9:32 PM, Kai-Heng Feng wrote: > In addition to nearest upstream bridge, driver may want to know if the > entire hierarchy can be powered off to perform different action. > > So walk higher up the hierarchy to find out if any device has valid > _PR3. > > The user will be introduced in next patch. > > Signed-off-by: Kai-Heng Feng > --- Since it has been a while, I was not sure what this series is about. IMO, it is better to include a cover letter with the summary of your changes. > v8: > - No change. > > drivers/pci/pci.c | 16 > include/linux/pci.h | 2 ++ > 2 files changed, 18 insertions(+) > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index e5f243dd4288..7a5662f116b8 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -6225,6 +6225,22 @@ bool pci_pr3_present(struct pci_dev *pdev) > acpi_has_method(adev->handle, "_PR3"); > } > EXPORT_SYMBOL_GPL(pci_pr3_present); > + > +bool pci_ancestor_pr3_present(struct pci_dev *pdev) > +{ > + struct pci_dev *parent = pdev; > + > + if (acpi_disabled) > + return false; > + > + while ((parent = pci_upstream_bridge(parent))) { > + if (pci_pr3_present(pdev)) I think it should be "parent" here? > + return true; > + } > + > + return false; > +} > +EXPORT_SYMBOL_GPL(pci_ancestor_pr3_present); > #endif > > /** > diff --git a/include/linux/pci.h b/include/linux/pci.h > index 16493426a04f..cd71ebfd0f89 100644 > --- a/include/linux/pci.h > +++ b/include/linux/pci.h > @@ -2620,10 +2620,12 @@ struct irq_domain > *pci_host_bridge_acpi_msi_domain(struct pci_bus *bus); > void > pci_msi_register_fwnode_provider(struct fwnode_handle *(*fn)(struct device > *)); > bool pci_pr3_present(struct pci_dev *pdev); > +bool pci_ancestor_pr3_present(struct pci_dev *pdev); > #else > static inline struct irq_domain * > pci_host_bridge_acpi_msi_domain(struct pci_bus *bus) { return NULL; } > static inline bool pci_pr3_present(struct pci_dev *pdev) { return false; } > +static inline bool pci_ancestor_pr3_present(struct pci_dev *pdev) { return > false; } > #endif > > #ifdef CONFIG_EEH -- Sathyanarayanan Kuppuswamy Linux Kernel Developer
Re: [PATCH v2 1/4] PCI/AER: Store more information in aer_err_info
On 1/24/24 10:27 PM, Wang, Qingshun wrote: > When Advisory Non-Fatal errors are raised, both correctable and Maybe you can start with same info about what Advisory Non-FataL errors are and the specification reference. I know that you included it in cover letter. But it is good to include it in commit log. > uncorrectable error statuses will be set. The current kernel code cannot > store both statuses at the same time, thus failing to handle ANFE properly. > In addition, to avoid clearing UEs that are not ANFE by accident, UE Please add some details about the impact of not clearing them. > severity and Device Status also need to be recorded: any fatal UE cannot > be ANFE, and if Fatal/Non-Fatal Error Detected is set in Device Status, do > not take any assumption and let UE handler to clear UE status. > > Store status and mask of both correctable and uncorrectable errors in > aer_err_info. The severity of UEs and the values of the Device Status > register are also recorded, which will be used to determine UEs that should > be handled by the ANFE handler. Refactor the rest of the code to use > cor/uncor_status and cor/uncor_mask fields instead of status and mask > fields. > > Signed-off-by: "Wang, Qingshun" > --- > drivers/acpi/apei/ghes.c | 10 - > drivers/cxl/core/pci.c | 6 ++- > drivers/pci/pci.h| 8 +++- > drivers/pci/pcie/aer.c | 93 ++-- > include/linux/aer.h | 4 +- > include/linux/pci.h | 27 > 6 files changed, 111 insertions(+), 37 deletions(-) > > diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c > index 7b7c605166e0..6034039d5cff 100644 > --- a/drivers/acpi/apei/ghes.c > +++ b/drivers/acpi/apei/ghes.c > @@ -593,6 +593,8 @@ static void ghes_handle_aer(struct acpi_hest_generic_data > *gdata) > > if (pcie_err->validation_bits & CPER_PCIE_VALID_DEVICE_ID && > pcie_err->validation_bits & CPER_PCIE_VALID_AER_INFO) { > + struct pcie_capability_regs *pcie_caps; > + u16 device_status = 0; > unsigned int devfn; > int aer_severity; > u8 *aer_info; > @@ -615,11 +617,17 @@ static void ghes_handle_aer(struct > acpi_hest_generic_data *gdata) > return; > memcpy(aer_info, pcie_err->aer_info, sizeof(struct > aer_capability_regs)); > > + if (pcie_err->validation_bits & CPER_PCIE_VALID_CAPABILITY) { > + pcie_caps = (struct pcie_capability_regs > *)pcie_err->capability; > + device_status = pcie_caps->device_status; > + } > + > aer_recover_queue(pcie_err->device_id.segment, > pcie_err->device_id.bus, > devfn, aer_severity, > (struct aer_capability_regs *) > - aer_info); > + aer_info, > + device_status); > } > #endif > } > diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c > index 6c9c8d92f8f7..9111a4415a63 100644 > --- a/drivers/cxl/core/pci.c > +++ b/drivers/cxl/core/pci.c > @@ -903,6 +903,7 @@ static void cxl_handle_rdport_errors(struct cxl_dev_state > *cxlds) > struct aer_capability_regs aer_regs; > struct cxl_dport *dport; > struct cxl_port *port; > + u16 device_status; > int severity; > > port = cxl_pci_find_port(pdev, ); > @@ -917,7 +918,10 @@ static void cxl_handle_rdport_errors(struct > cxl_dev_state *cxlds) > if (!cxl_rch_get_aer_severity(_regs, )) > return; > > - pci_print_aer(pdev, severity, _regs); > + if (pcie_capability_read_word(pdev, PCI_EXP_DEVSTA, _status)) > + return; > + > + pci_print_aer(pdev, severity, _regs, device_status); > > if (severity == AER_CORRECTABLE) > cxl_handle_rdport_cor_ras(cxlds, dport); > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h > index 2336a8d1edab..f881a1b42f14 100644 > --- a/drivers/pci/pci.h > +++ b/drivers/pci/pci.h > @@ -407,8 +407,12 @@ struct aer_err_info { > unsigned int __pad2:2; > unsigned int tlp_header_valid:1; > > - unsigned int status;/* COR/UNCOR Error Status */ > - unsigned int mask; /* COR/UNCOR Error Mask */ > + u32 cor_mask; /* COR Error Mask */ > + u32 cor_status; /* COR Error Status */ > + u32 uncor_mask; /* UNCOR Error Mask */ > + u32 uncor_status; /* UNCOR Error Status */ > + u32 uncor_severity; /* UNCOR Error Severity */ > + u16 device_status; > struct aer_header_log_regs tlp; /* TLP Header */ > }; > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > index 05fc30bb5134..6583dcf50977 100644 > --- a/drivers/pci/pcie/aer.c > +++ b/drivers/pci/pcie/aer.c > @@ -615,7 +615,7 @@ const struct attribute_group
Re: [PATCH 1/3] PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors
On 12/6/2023 2:42 PM, Bjorn Helgaas wrote: > From: Bjorn Helgaas > > The PCIe spec classifies errors as either "Correctable" or "Uncorrectable". > Previously we printed these as "Corrected" or "Uncorrected". To avoid > confusion, use the same terms as the spec. > > One confusing situation is when one agent detects an error, but another > agent is responsible for recovery, e.g., by re-attempting the operation. > The first agent may log a "correctable" error but it has not yet been > corrected. The recovery agent must report an uncorrectable error if it is > unable to recover. If we print the first agent's error as "Corrected", it > gives the false impression that it has already been resolved. > > Sample message change: > > - pcieport :00:1c.5: AER: Corrected error received: :00:1c.5 > + pcieport :00:1c.5: AER: Correctable error received: 0000:00:1c.5 > > Signed-off-by: Bjorn Helgaas > --- Looks good to me. Reviewed-by: Kuppuswamy Sathyanarayanan > drivers/pci/pcie/aer.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > index 42a3bd35a3e1..20db80018b5d 100644 > --- a/drivers/pci/pcie/aer.c > +++ b/drivers/pci/pcie/aer.c > @@ -436,9 +436,9 @@ void pci_aer_exit(struct pci_dev *dev) > * AER error strings > */ > static const char *aer_error_severity_string[] = { > - "Uncorrected (Non-Fatal)", > - "Uncorrected (Fatal)", > - "Corrected" > + "Uncorrectable (Non-Fatal)", > + "Uncorrectable (Fatal)", > + "Correctable" > }; > > static const char *aer_error_layer[] = { -- Sathyanarayanan Kuppuswamy Linux Kernel Developer
Re: [PATCH 2/3] PCI/AER: Decode Requester ID when no error info found
On 12/6/2023 2:42 PM, Bjorn Helgaas wrote: > From: Bjorn Helgaas > > When a device with AER detects an error, it logs error information in its > own AER Error Status registers. It may send an Error Message to the Root > Port (RCEC in the case of an RCiEP), which logs the fact that an Error > Message was received (Root Error Status) and the Requester ID of the > message source (Error Source Identification). > > aer_print_port_info() prints the Requester ID from the Root Port Error > Source in the usual Linux "bb:dd.f" format, but when find_source_device() > finds no error details in the hierarchy below the Root Port, it printed the > raw Requester ID without decoding it. > > Decode the Requester ID in the usual Linux format so it matches other > messages. > > Sample message changes: > > - pcieport :00:1c.5: AER: Correctable error received: :00:1c.5 > - pcieport :00:1c.5: AER: can't find device of ID00e5 > + pcieport :00:1c.5: AER: Correctable error message received from > :00:1c.5 > + pcieport :00:1c.5: AER: found no error details for :00:1c.5 > > Signed-off-by: Bjorn Helgaas Except for the suggestion given below, it looks good to me. Reviewed-by: Kuppuswamy Sathyanarayanan > --- > drivers/pci/pcie/aer.c | 9 +++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > index 20db80018b5d..2ff6bac9979f 100644 > --- a/drivers/pci/pcie/aer.c > +++ b/drivers/pci/pcie/aer.c > @@ -740,7 +740,7 @@ static void aer_print_port_info(struct pci_dev *dev, > struct aer_err_info *info) > u8 bus = info->id >> 8; > u8 devfn = info->id & 0xff; > > - pci_info(dev, "%s%s error received: %04x:%02x:%02x.%d\n", > + pci_info(dev, "%s%s error message received from %04x:%02x:%02x.%d\n", >info->multi_error_valid ? "Multiple " : "", >aer_error_severity_string[info->severity], >pci_domain_nr(dev->bus), bus, PCI_SLOT(devfn), > @@ -929,7 +929,12 @@ static bool find_source_device(struct pci_dev *parent, > pci_walk_bus(parent->subordinate, find_device_iter, e_info); > > if (!e_info->error_dev_num) { > - pci_info(parent, "can't find device of ID%04x\n", e_info->id); > + u8 bus = e_info->id >> 8; > + u8 devfn = e_info->id & 0xff; You can use PCI_BUS_NUM(e_info->id) for getting bus number. Since you are extracting this info in more than one place, maybe you can also define a macro PCI_DEVFN(id) (following PCI_BUS_NUM()). > + > + pci_info(parent, "found no error details for > %04x:%02x:%02x.%d\n", > + pci_domain_nr(parent->bus), bus, PCI_SLOT(devfn), > + PCI_FUNC(devfn)); > return false; > } > return true; -- Sathyanarayanan Kuppuswamy Linux Kernel Developer
[PATCH v3] PCI/AER: Handle Multi UnCorrectable/Correctable errors properly
Currently the aer_irq() handler returns IRQ_NONE for cases without bits PCI_ERR_ROOT_UNCOR_RCV or PCI_ERR_ROOT_COR_RCV are set. But this assumption is incorrect. Consider a scenario where aer_irq() is triggered for a correctable error, and while we process the error and before we clear the error status in "Root Error Status" register, if the same kind of error is triggered again, since aer_irq() only clears events it saw, the multi-bit error is left in tact. This will cause the interrupt to fire again, resulting in entering aer_irq() with just the multi-bit error logged in the "Root Error Status" register. Repeated AER recovery test has revealed this condition does happen and this prevents any new interrupt from being triggered. Allow to process interrupt even if only multi-correctable (BIT 1) or multi-uncorrectable bit (BIT 3) is set. Also note that, for cases with only multi-bit error is set, since this is not the first occurrence of the error, PCI_ERR_ROOT_ERR_SRC may have zero or some junk value. So we cannot cleanly process this error information using aer_isr_one_error(). All we are attempting with this fix is to make sure error interrupt processing can continue in this scenario. This error can be reproduced by making following changes to the aer_irq() function and by executing the given test commands. static irqreturn_t aer_irq(int irq, void *context) struct aer_err_source e_src = {}; pci_read_config_dword(rp, aer + PCI_ERR_ROOT_STATUS, _src.status); + pci_dbg(pdev->port, "Root Error Status: %04x\n", + e_src.status); if (!(e_src.status & AER_ERR_STATUS_MASK)) return IRQ_NONE; + mdelay(5000); # Prep injection data for a correctable error. $ cd /sys/kernel/debug/apei/einj $ echo 0x0040 > error_type $ echo 0x4 > flags $ echo 0x891000 > param4 # Root Error Status is initially clear $ setpci -s ECAP0001+0x30.w # Inject one error $ echo 1 > error_inject # Interrupt received pcieport : AER: Root Error Status 0001 # Inject another error (within 5 seconds) $ echo 1 > error_inject # You will get a new IRQ with only multiple ERR_COR bit set pcieport : AER: Root Error Status 0002 Currently, the above issue has been only reproduced in the ICL server platform. [Eric: proposed reproducing steps] Fixes: 4696b828ca37 ("PCI/AER: Hoist aerdrv.c, aer_inject.c up to drivers/pci/pcie/") Reported-by: Eric Badger Reviewed-by: Ashok Raj Signed-off-by: Kuppuswamy Sathyanarayanan --- Changes since v2: * Added more details to the commit log. * Rebased on v5.18-rc1. Changes since v1: * Added Fixes tag. * Included reproducing steps proposed by Eric. drivers/pci/pcie/aer.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index 9fa1f97e5b27..7952e5efd6cf 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -101,6 +101,11 @@ struct aer_stats { #define ERR_COR_ID(d) (d & 0x) #define ERR_UNCOR_ID(d)(d >> 16) +#define AER_ERR_STATUS_MASK(PCI_ERR_ROOT_UNCOR_RCV | \ + PCI_ERR_ROOT_COR_RCV | \ + PCI_ERR_ROOT_MULTI_COR_RCV |\ + PCI_ERR_ROOT_MULTI_UNCOR_RCV) + static int pcie_aer_disable; static pci_ers_result_t aer_root_reset(struct pci_dev *dev); @@ -1196,7 +1201,7 @@ static irqreturn_t aer_irq(int irq, void *context) struct aer_err_source e_src = {}; pci_read_config_dword(rp, aer + PCI_ERR_ROOT_STATUS, _src.status); - if (!(e_src.status & (PCI_ERR_ROOT_UNCOR_RCV|PCI_ERR_ROOT_COR_RCV))) + if (!(e_src.status & AER_ERR_STATUS_MASK)) return IRQ_NONE; pci_read_config_dword(rp, aer + PCI_ERR_ROOT_ERR_SRC, _src.id); -- 2.25.1
[PATCH v2] PCI/AER: Handle Multi UnCorrectable/Correctable errors properly
Currently the aer_irq() handler returns IRQ_NONE for cases without bits PCI_ERR_ROOT_UNCOR_RCV or PCI_ERR_ROOT_COR_RCV are set. But this assumption is incorrect. Consider a scenario where aer_irq() is triggered for a correctable error, and while we process the error and before we clear the error status in "Root Error Status" register, if the same kind of error is triggered again, since aer_irq() only clears events it saw, the multi-bit error is left in tact. This will cause the interrupt to fire again, resulting in entering aer_irq() with just the multi-bit error logged in the "Root Error Status" register. Repeated AER recovery test has revealed this condition does happen and this prevents any new interrupt from being triggered. Allow to process interrupt even if only multi-correctable (BIT 1) or multi-uncorrectable bit (BIT 3) is set. This error can be reproduced by making following changes to the aer_irq() function and by executing the given test commands. static irqreturn_t aer_irq(int irq, void *context) struct aer_err_source e_src = {}; pci_read_config_dword(rp, aer + PCI_ERR_ROOT_STATUS, _src.status); + pci_dbg(pdev->port, "Root Error Status: %04x\n", + e_src.status); if (!(e_src.status & AER_ERR_STATUS_MASK)) return IRQ_NONE; + mdelay(5000); # Prep injection data for a correctable error. $ cd /sys/kernel/debug/apei/einj $ echo 0x0040 > error_type $ echo 0x4 > flags $ echo 0x891000 > param4 # Root Error Status is initially clear $ setpci -s ECAP0001+0x30.w # Inject one error $ echo 1 > error_inject # Interrupt received pcieport : AER: Root Error Status 0001 # Inject another error (within 5 seconds) $ echo 1 > error_inject # No interrupt received, but "multiple ERR_COR" is now set $ setpci -s ECAP0001+0x30.w 0003 # Wait for a while, then clear ERR_COR. A new interrupt immediately fires. $ setpci -s ECAP0001+0x30.w=0x1 pcieport : AER: Root Error Status 0002 Currently, the above issue has been only reproduced in the ICL server platform. [Eric: proposed reproducing steps] Fixes: 4696b828ca37 ("PCI/AER: Hoist aerdrv.c, aer_inject.c up to drivers/pci/pcie/") Reported-by: Eric Badger Reviewed-by: Ashok Raj Signed-off-by: Kuppuswamy Sathyanarayanan --- Changes since v1: * Added Fixes tag. * Included reproducing steps proposed by Eric. drivers/pci/pcie/aer.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index 9fa1f97e5b27..7952e5efd6cf 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -101,6 +101,11 @@ struct aer_stats { #define ERR_COR_ID(d) (d & 0x) #define ERR_UNCOR_ID(d)(d >> 16) +#define AER_ERR_STATUS_MASK(PCI_ERR_ROOT_UNCOR_RCV | \ + PCI_ERR_ROOT_COR_RCV | \ + PCI_ERR_ROOT_MULTI_COR_RCV |\ + PCI_ERR_ROOT_MULTI_UNCOR_RCV) + static int pcie_aer_disable; static pci_ers_result_t aer_root_reset(struct pci_dev *dev); @@ -1196,7 +1201,7 @@ static irqreturn_t aer_irq(int irq, void *context) struct aer_err_source e_src = {}; pci_read_config_dword(rp, aer + PCI_ERR_ROOT_STATUS, _src.status); - if (!(e_src.status & (PCI_ERR_ROOT_UNCOR_RCV|PCI_ERR_ROOT_COR_RCV))) + if (!(e_src.status & AER_ERR_STATUS_MASK)) return IRQ_NONE; pci_read_config_dword(rp, aer + PCI_ERR_ROOT_ERR_SRC, _src.id); -- 2.25.1
[PATCH v1] PCI/AER: Handle Multi UnCorrectable/Correctable errors properly
Currently the aer_irq() handler returns IRQ_NONE for cases without bits PCI_ERR_ROOT_UNCOR_RCV or PCI_ERR_ROOT_COR_RCV are set. But this assumption is incorrect. Consider a scenario where aer_irq() is triggered for a correctable error, and while we process the error and before we clear the error status in "Root Error Status" register, if the same kind of error is triggered again, since aer_irq() only clears events it saw, the multi-bit error is left in tact. This will cause the interrupt to fire again, resulting in entering aer_irq() with just the multi-bit error logged in the "Root Error Status" register. Repeated AER recovery test has revealed this condition does happen and this prevents any new interrupt from being triggered. Allow to process interrupt even if only multi-correctable (BIT 1) or multi-uncorrectable bit (BIT 3) is set. Reported-by: Eric Badger Reviewed-by: Ashok Raj Signed-off-by: Kuppuswamy Sathyanarayanan --- drivers/pci/pcie/aer.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index 9fa1f97e5b27..7952e5efd6cf 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -101,6 +101,11 @@ struct aer_stats { #define ERR_COR_ID(d) (d & 0x) #define ERR_UNCOR_ID(d)(d >> 16) +#define AER_ERR_STATUS_MASK(PCI_ERR_ROOT_UNCOR_RCV | \ + PCI_ERR_ROOT_COR_RCV | \ + PCI_ERR_ROOT_MULTI_COR_RCV |\ + PCI_ERR_ROOT_MULTI_UNCOR_RCV) + static int pcie_aer_disable; static pci_ers_result_t aer_root_reset(struct pci_dev *dev); @@ -1196,7 +1201,7 @@ static irqreturn_t aer_irq(int irq, void *context) struct aer_err_source e_src = {}; pci_read_config_dword(rp, aer + PCI_ERR_ROOT_STATUS, _src.status); - if (!(e_src.status & (PCI_ERR_ROOT_UNCOR_RCV|PCI_ERR_ROOT_COR_RCV))) + if (!(e_src.status & AER_ERR_STATUS_MASK)) return IRQ_NONE; pci_read_config_dword(rp, aer + PCI_ERR_ROOT_ERR_SRC, _src.id); -- 2.25.1
Re: [PATCH v4 0/8] Implement generic cc_platform_has() helper function
On 9/28/21 1:58 PM, Borislav Petkov wrote: On Tue, Sep 28, 2021 at 01:48:46PM -0700, Kuppuswamy, Sathyanarayanan wrote: Just read it. If you want to use cpuid_has_tdx_guest() directly in cc_platform_has(), then you want to rename intel_cc_platform_has() to tdx_cc_platform_has()? Why? You simply do: if (cpuid_has_tdx_guest()) intel_cc_platform_has(...); and lemme paste from that mail: " ...you should use cpuid_has_tdx_guest() instead but cache its result so that you don't call CPUID each time the kernel executes cc_platform_has()." Makes sense? Yes. But, since the check is related to TDX, I just want to confirm whether you are fine with naming the function as intel_*(). Since this patch is going to have dependency on TDX code, I will include this patch in TDX patch set. -- Sathyanarayanan Kuppuswamy Linux Kernel Developer
Re: [PATCH v4 0/8] Implement generic cc_platform_has() helper function
On 9/28/21 1:31 PM, Borislav Petkov wrote: On Tue, Sep 28, 2021 at 12:19:49PM -0700, Kuppuswamy, Sathyanarayanan wrote: Intel CC support patch is not included in this series. You want me to address the issue raised by Joerg before merging it? Did you not see my email to you today: https://lkml.kernel.org/r/yvl4zughfsh1q...@zn.tnic Just read it. If you want to use cpuid_has_tdx_guest() directly in cc_platform_has(), then you want to rename intel_cc_platform_has() to tdx_cc_platform_has()? ? -- Sathyanarayanan Kuppuswamy Linux Kernel Developer
Re: [PATCH v4 0/8] Implement generic cc_platform_has() helper function
On 9/28/21 12:10 PM, Borislav Petkov wrote: From: Borislav Petkov Hi all, here's v4 of the cc_platform_has() patchset with feedback incorporated. I'm going to route this through tip if there are no objections. Intel CC support patch is not included in this series. You want me to address the issue raised by Joerg before merging it? Thx. Tom Lendacky (8): x86/ioremap: Selectively build arch override encryption functions arch/cc: Introduce a function to check for confidential computing features x86/sev: Add an x86 version of cc_platform_has() powerpc/pseries/svm: Add a powerpc version of cc_platform_has() x86/sme: Replace occurrences of sme_active() with cc_platform_has() x86/sev: Replace occurrences of sev_active() with cc_platform_has() x86/sev: Replace occurrences of sev_es_active() with cc_platform_has() treewide: Replace the use of mem_encrypt_active() with cc_platform_has() arch/Kconfig | 3 + arch/powerpc/include/asm/mem_encrypt.h | 5 -- arch/powerpc/platforms/pseries/Kconfig | 1 + arch/powerpc/platforms/pseries/Makefile | 2 + arch/powerpc/platforms/pseries/cc_platform.c | 26 ++ arch/powerpc/platforms/pseries/svm.c | 5 +- arch/s390/include/asm/mem_encrypt.h | 2 - arch/x86/Kconfig | 1 + arch/x86/include/asm/io.h| 8 ++ arch/x86/include/asm/kexec.h | 2 +- arch/x86/include/asm/mem_encrypt.h | 12 +-- arch/x86/kernel/Makefile | 6 ++ arch/x86/kernel/cc_platform.c| 69 +++ arch/x86/kernel/crash_dump_64.c | 4 +- arch/x86/kernel/head64.c | 9 +- arch/x86/kernel/kvm.c| 3 +- arch/x86/kernel/kvmclock.c | 4 +- arch/x86/kernel/machine_kexec_64.c | 19 +++-- arch/x86/kernel/pci-swiotlb.c| 9 +- arch/x86/kernel/relocate_kernel_64.S | 2 +- arch/x86/kernel/sev.c| 6 +- arch/x86/kvm/svm/svm.c | 3 +- arch/x86/mm/ioremap.c| 18 ++-- arch/x86/mm/mem_encrypt.c| 55 arch/x86/mm/mem_encrypt_identity.c | 9 +- arch/x86/mm/pat/set_memory.c | 3 +- arch/x86/platform/efi/efi_64.c | 9 +- arch/x86/realmode/init.c | 8 +- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 4 +- drivers/gpu/drm/drm_cache.c | 4 +- drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 4 +- drivers/gpu/drm/vmwgfx/vmwgfx_msg.c | 6 +- drivers/iommu/amd/init.c | 7 +- drivers/iommu/amd/iommu.c| 3 +- drivers/iommu/amd/iommu_v2.c | 3 +- drivers/iommu/iommu.c| 3 +- fs/proc/vmcore.c | 6 +- include/linux/cc_platform.h | 88 include/linux/mem_encrypt.h | 4 - kernel/dma/swiotlb.c | 4 +- 40 files changed, 310 insertions(+), 129 deletions(-) create mode 100644 arch/powerpc/platforms/pseries/cc_platform.c create mode 100644 arch/x86/kernel/cc_platform.c create mode 100644 include/linux/cc_platform.h -- Sathyanarayanan Kuppuswamy Linux Kernel Developer
Re: [PATCH v3 0/8] Implement generic cc_platform_has() helper function
On 9/16/21 8:02 AM, Borislav Petkov wrote: On Wed, Sep 15, 2021 at 10:26:06AM -0700, Kuppuswamy, Sathyanarayanan wrote: I have a Intel variant patch (please check following patch). But it includes TDX changes as well. Shall I move TDX changes to different patch and just create a separate patch for adding intel_cc_platform_has()? Yes, please, so that I can expedite that stuff separately and so that it can go in early in order for future work to be based ontop. Sent it part of TDX patch series. Please check and cherry pick it. https://lore.kernel.org/lkml/20210916183550.15349-2-sathyanarayanan.kuppusw...@linux.intel.com/ Thx. -- Sathyanarayanan Kuppuswamy Linux Kernel Developer
Re: [PATCH v3 0/8] Implement generic cc_platform_has() helper function
On 9/15/21 9:46 AM, Borislav Petkov wrote: Sathya, if you want to prepare the Intel variant intel_cc_platform_has() ontop of those and send it to me, that would be good because then I can integrate it all in one branch which can be used to base future work ontop. I have a Intel variant patch (please check following patch). But it includes TDX changes as well. Shall I move TDX changes to different patch and just create a separate patch for adding intel_cc_platform_has()? commit fc5f98a0ed94629d903827c5b44ee9295f835831 Author: Kuppuswamy Sathyanarayanan Date: Wed May 12 11:35:13 2021 -0700 x86/tdx: Add confidential guest support for TDX guest TDX architecture provides a way for VM guests to be highly secure and isolated (from untrusted VMM). To achieve this requirement, any data coming from VMM cannot be completely trusted. TDX guest fixes this issue by hardening the IO drivers against the attack from the VMM. So, when adding hardening fixes to the generic drivers, to protect custom fixes use cc_platform_has() API. Also add TDX guest support to cc_platform_has() API to protect the TDX specific fixes. Signed-off-by: Kuppuswamy Sathyanarayanan diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index a5b14de03458..2e78358923a1 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -871,6 +871,7 @@ config INTEL_TDX_GUEST depends on SECURITY select X86_X2APIC select SECURITY_LOCKDOWN_LSM + select ARCH_HAS_CC_PLATFORM help Provide support for running in a trusted domain on Intel processors equipped with Trusted Domain eXtensions. TDX is a new Intel diff --git a/arch/x86/include/asm/intel_cc_platform.h b/arch/x86/include/asm/intel_cc_platform.h new file mode 100644 index ..472c3174beac --- /dev/null +++ b/arch/x86/include/asm/intel_cc_platform.h @@ -0,0 +1,13 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Copyright (C) 2021 Intel Corporation */ +#ifndef _ASM_X86_INTEL_CC_PLATFORM_H +#define _ASM_X86_INTEL_CC_PLATFORM_H + +#if defined(CONFIG_CPU_SUP_INTEL) && defined(CONFIG_ARCH_HAS_CC_PLATFORM) +bool intel_cc_platform_has(unsigned int flag); +#else +static inline bool intel_cc_platform_has(unsigned int flag) { return false; } +#endif + +#endif /* _ASM_X86_INTEL_CC_PLATFORM_H */ + diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c index 3c9bacd3c3f3..e83bc2f48efe 100644 --- a/arch/x86/kernel/cc_platform.c +++ b/arch/x86/kernel/cc_platform.c @@ -10,11 +10,16 @@ #include #include #include +#include + +#include bool cc_platform_has(enum cc_attr attr) { if (sme_me_mask) return amd_cc_platform_has(attr); + else if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) + return intel_cc_platform_has(attr); return false; } diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index 8321c43554a1..ab486a3b1eb0 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -11,6 +11,7 @@ #include #include #include +#include #include #include @@ -60,6 +61,21 @@ static u64 msr_test_ctrl_cache __ro_after_init; */ static bool cpu_model_supports_sld __ro_after_init; +#ifdef CONFIG_ARCH_HAS_CC_PLATFORM +bool intel_cc_platform_has(enum cc_attr attr) +{ + switch (attr) { + case CC_ATTR_GUEST_TDX: + return cpu_feature_enabled(X86_FEATURE_TDX_GUEST); + default: + return false; + } + + return false; +} +EXPORT_SYMBOL_GPL(intel_cc_platform_has); +#endif + /* * Processors which have self-snooping capability can handle conflicting * memory type across CPUs by snooping its own cache. However, there exists diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h index 253f3ea66cd8..e38430e6e396 100644 --- a/include/linux/cc_platform.h +++ b/include/linux/cc_platform.h @@ -61,6 +61,15 @@ enum cc_attr { * Examples include SEV-ES. */ CC_ATTR_GUEST_STATE_ENCRYPT, + + /** +* @CC_ATTR_GUEST_TDX: Trusted Domain Extension Support +* +* The platform/OS is running as a TDX guest/virtual machine. +* +* Examples include SEV-ES. +*/ + CC_ATTR_GUEST_TDX, }; #ifdef CONFIG_ARCH_HAS_CC_PLATFORM -- Sathyanarayanan Kuppuswamy Linux Kernel Developer
Re: [PATCH v2 03/12] x86/sev: Add an x86 version of prot_guest_has()
On 8/19/21 11:33 AM, Tom Lendacky wrote: There was some talk about this on the mailing list where TDX and SEV may need to be differentiated, so we wanted to reserve a range of values per technology. I guess I can remove them until they are actually needed. In TDX also we have similar requirements and we need some flags for TDX specific checks. So I think it is fine to leave some space for vendor flags. -- Sathyanarayanan Kuppuswamy Linux Kernel Developer
Re: [PATCH v2 02/12] mm: Introduce a function to check for virtualization protection features
On 8/13/21 9:59 AM, Tom Lendacky wrote: In prep for other protected virtualization technologies, introduce a generic helper function, prot_guest_has(), that can be used to check for specific protection attributes, like memory encryption. This is intended to eliminate having to add multiple technology-specific checks to the code (e.g. if (sev_active() || tdx_active())). Reviewed-by: Joerg Roedel Co-developed-by: Andi Kleen Signed-off-by: Andi Kleen Co-developed-by: Kuppuswamy Sathyanarayanan Signed-off-by: Kuppuswamy Sathyanarayanan Signed-off-by: Tom Lendacky --- arch/Kconfig| 3 +++ include/linux/protected_guest.h | 35 + 2 files changed, 38 insertions(+) create mode 100644 include/linux/protected_guest.h Reviewed-by: Kuppuswamy Sathyanarayanan -- Sathyanarayanan Kuppuswamy Linux Kernel Developer
Re: [PATCH 01/11] mm: Introduce a function to check for virtualization protection features
On 7/27/21 3:26 PM, Tom Lendacky wrote: diff --git a/include/linux/protected_guest.h b/include/linux/protected_guest.h new file mode 100644 index ..f8ed7b72967b --- /dev/null +++ b/include/linux/protected_guest.h @@ -0,0 +1,32 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Protected Guest (and Host) Capability checks + * + * Copyright (C) 2021 Advanced Micro Devices, Inc. + * + * Author: Tom Lendacky + */ + +#ifndef _PROTECTED_GUEST_H +#define _PROTECTED_GUEST_H + +#ifndef __ASSEMBLY__ Can you include headers for bool type and false definition? --- a/include/linux/protected_guest.h +++ b/include/linux/protected_guest.h @@ -12,6 +12,9 @@ #ifndef __ASSEMBLY__ +#include +#include Otherwise, I see following errors in multi-config auto testing. include/linux/protected_guest.h:40:15: error: unknown type name 'bool' include/linux/protected_guest.h:40:63: error: 'false' undeclared (first use in this functi + +#define PATTR_MEM_ENCRYPT 0 /* Encrypted memory */ +#define PATTR_HOST_MEM_ENCRYPT 1 /* Host encrypted memory */ +#define PATTR_GUEST_MEM_ENCRYPT2 /* Guest encrypted memory */ +#define PATTR_GUEST_PROT_STATE 3 /* Guest encrypted state */ + +#ifdef CONFIG_ARCH_HAS_PROTECTED_GUEST + +#include + +#else /* !CONFIG_ARCH_HAS_PROTECTED_GUEST */ + +static inline bool prot_guest_has(unsigned int attr) { return false; } + +#endif /* CONFIG_ARCH_HAS_PROTECTED_GUEST */ + +#endif /* __ASSEMBLY__ */ + +#endif /* _PROTECTED_GUEST_H */ -- Sathyanarayanan Kuppuswamy Linux Kernel Developer
Re: [PATCH 07/11] treewide: Replace the use of mem_encrypt_active() with prot_guest_has()
On 8/10/21 12:48 PM, Tom Lendacky wrote: On 8/10/21 1:45 PM, Kuppuswamy, Sathyanarayanan wrote: On 7/27/21 3:26 PM, Tom Lendacky wrote: diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c index de01903c3735..cafed6456d45 100644 --- a/arch/x86/kernel/head64.c +++ b/arch/x86/kernel/head64.c @@ -19,7 +19,7 @@ #include #include #include -#include +#include #include #include @@ -285,7 +285,7 @@ unsigned long __head __startup_64(unsigned long physaddr, * there is no need to zero it after changing the memory encryption * attribute. */ - if (mem_encrypt_active()) { + if (prot_guest_has(PATTR_MEM_ENCRYPT)) { vaddr = (unsigned long)__start_bss_decrypted; vaddr_end = (unsigned long)__end_bss_decrypted; Since this change is specific to AMD, can you replace PATTR_MEM_ENCRYPT with prot_guest_has(PATTR_SME) || prot_guest_has(PATTR_SEV). It is not used in TDX. This is a direct replacement for now. I think the change you're requesting should be done as part of the TDX support patches so it's clear why it is being changed. Ok. I will include it part of TDX changes. But, wouldn't TDX still need to do something with this shared/unencrypted area, though? Or since it is shared, there's actually nothing you need to do (the bss decrpyted section exists even if CONFIG_AMD_MEM_ENCRYPT is not configured)? Kirill had a requirement to turn on CONFIG_AMD_MEM_ENCRYPT for adding lazy accept support in TDX guest kernel. Kirill, can you add details here? Thanks, Tom -- Sathyanarayanan Kuppuswamy Linux Kernel Developer
Re: [PATCH 07/11] treewide: Replace the use of mem_encrypt_active() with prot_guest_has()
On 7/27/21 3:26 PM, Tom Lendacky wrote: diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c index de01903c3735..cafed6456d45 100644 --- a/arch/x86/kernel/head64.c +++ b/arch/x86/kernel/head64.c @@ -19,7 +19,7 @@ #include #include #include -#include +#include #include #include @@ -285,7 +285,7 @@ unsigned long __head __startup_64(unsigned long physaddr, * there is no need to zero it after changing the memory encryption * attribute. */ - if (mem_encrypt_active()) { + if (prot_guest_has(PATTR_MEM_ENCRYPT)) { vaddr = (unsigned long)__start_bss_decrypted; vaddr_end = (unsigned long)__end_bss_decrypted; Since this change is specific to AMD, can you replace PATTR_MEM_ENCRYPT with prot_guest_has(PATTR_SME) || prot_guest_has(PATTR_SEV). It is not used in TDX. -- Sathyanarayanan Kuppuswamy Linux Kernel Developer
Re: [PATCH 06/11] x86/sev: Replace occurrences of sev_es_active() with prot_guest_has()
On 8/9/21 2:59 PM, Tom Lendacky wrote: Not sure how TDX will handle AP booting, are you sure it needs this special setup as well? Otherwise a check for SEV-ES would be better instead of the generic PATTR_GUEST_PROT_STATE. Yes, I'm not sure either. I figure that change can be made, if needed, as part of the TDX support. We don't plan to set PROT_STATE. So it does not affect TDX. For SMP, we use MADT ACPI table for AP booting. -- Sathyanarayanan Kuppuswamy Linux Kernel Developer
Re: [PATCH 00/11] Implement generic prot_guest_has() helper function
Hi Tom, On 7/27/21 3:26 PM, Tom Lendacky wrote: This patch series provides a generic helper function, prot_guest_has(), to replace the sme_active(), sev_active(), sev_es_active() and mem_encrypt_active() functions. It is expected that as new protected virtualization technologies are added to the kernel, they can all be covered by a single function call instead of a collection of specific function calls all called from the same locations. The powerpc and s390 patches have been compile tested only. Can the folks copied on this series verify that nothing breaks for them. With this patch set, select ARCH_HAS_PROTECTED_GUEST and set CONFIG_AMD_MEM_ENCRYPT=n, creates following error. ld: arch/x86/mm/ioremap.o: in function `early_memremap_is_setup_data': arch/x86/mm/ioremap.c:672: undefined reference to `early_memremap_decrypted' It looks like early_memremap_is_setup_data() is not protected with appropriate config. Cc: Andi Kleen Cc: Andy Lutomirski Cc: Ard Biesheuvel Cc: Baoquan He Cc: Benjamin Herrenschmidt Cc: Borislav Petkov Cc: Christian Borntraeger Cc: Daniel Vetter Cc: Dave Hansen Cc: Dave Young Cc: David Airlie Cc: Heiko Carstens Cc: Ingo Molnar Cc: Joerg Roedel Cc: Maarten Lankhorst Cc: Maxime Ripard Cc: Michael Ellerman Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Thomas Zimmermann Cc: Vasily Gorbik Cc: VMware Graphics Cc: Will Deacon --- Patches based on: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git master commit 79e920060fa7 ("Merge branch 'WIP/fixes'") Tom Lendacky (11): mm: Introduce a function to check for virtualization protection features x86/sev: Add an x86 version of prot_guest_has() powerpc/pseries/svm: Add a powerpc version of prot_guest_has() x86/sme: Replace occurrences of sme_active() with prot_guest_has() x86/sev: Replace occurrences of sev_active() with prot_guest_has() x86/sev: Replace occurrences of sev_es_active() with prot_guest_has() treewide: Replace the use of mem_encrypt_active() with prot_guest_has() mm: Remove the now unused mem_encrypt_active() function x86/sev: Remove the now unused mem_encrypt_active() function powerpc/pseries/svm: Remove the now unused mem_encrypt_active() function s390/mm: Remove the now unused mem_encrypt_active() function arch/Kconfig | 3 ++ arch/powerpc/include/asm/mem_encrypt.h | 5 -- arch/powerpc/include/asm/protected_guest.h | 30 +++ arch/powerpc/platforms/pseries/Kconfig | 1 + arch/s390/include/asm/mem_encrypt.h| 2 - arch/x86/Kconfig | 1 + arch/x86/include/asm/kexec.h | 2 +- arch/x86/include/asm/mem_encrypt.h | 13 + arch/x86/include/asm/protected_guest.h | 27 ++ arch/x86/kernel/crash_dump_64.c| 4 +- arch/x86/kernel/head64.c | 4 +- arch/x86/kernel/kvm.c | 3 +- arch/x86/kernel/kvmclock.c | 4 +- arch/x86/kernel/machine_kexec_64.c | 19 +++ arch/x86/kernel/pci-swiotlb.c | 9 ++-- arch/x86/kernel/relocate_kernel_64.S | 2 +- arch/x86/kernel/sev.c | 6 +-- arch/x86/kvm/svm/svm.c | 3 +- arch/x86/mm/ioremap.c | 16 +++--- arch/x86/mm/mem_encrypt.c | 60 +++--- arch/x86/mm/mem_encrypt_identity.c | 3 +- arch/x86/mm/pat/set_memory.c | 3 +- arch/x86/platform/efi/efi_64.c | 9 ++-- arch/x86/realmode/init.c | 8 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c| 4 +- drivers/gpu/drm/drm_cache.c| 4 +- drivers/gpu/drm/vmwgfx/vmwgfx_drv.c| 4 +- drivers/gpu/drm/vmwgfx/vmwgfx_msg.c| 6 +-- drivers/iommu/amd/init.c | 7 +-- drivers/iommu/amd/iommu.c | 3 +- drivers/iommu/amd/iommu_v2.c | 3 +- drivers/iommu/iommu.c | 3 +- fs/proc/vmcore.c | 6 +-- include/linux/mem_encrypt.h| 4 -- include/linux/protected_guest.h| 37 + kernel/dma/swiotlb.c | 4 +- 36 files changed, 218 insertions(+), 104 deletions(-) create mode 100644 arch/powerpc/include/asm/protected_guest.h create mode 100644 arch/x86/include/asm/protected_guest.h create mode 100644 include/linux/protected_guest.h -- Sathyanarayanan Kuppuswamy Linux Kernel Developer
Re: [PATCH v3 0/2] PCI/ERR: Allow Native AER/DPC using _OSC
Hi Bjorn, Derrick, On 5/22/20 12:46 PM, Bjorn Helgaas wrote: On Fri, May 22, 2020 at 05:23:31PM +, Derrick, Jonathan wrote: On Fri, 2020-05-01 at 11:35 -0600, Jonathan Derrick wrote: On Fri, 2020-05-01 at 12:16 -0500, Bjorn Helgaas wrote: On Thu, Apr 30, 2020 at 12:46:07PM -0600, Jon Derrick wrote: Hi Bjorn & Kuppuswamy, I see a problem in the DPC ECN [1] to _OSC in that it doesn't give us a way to determine if firmware supports _OSC DPC negotation, and therefore how to handle DPC. Here is the wording of the ECN that implies that Firmware without _OSC DPC negotiation support should have the OSPM rely on _OSC AER negotiation when determining DPC control: PCIe Base Specification suggests that Downstream Port Containment may be controlled either by the Firmware or the Operating System. It also suggests that the Firmware retain ownership of Downstream Port Containment if it also owns AER. When the Firmware owns Downstream Port Containment, it is expected to use the new "Error Disconnect Recover" notification to alert OSPM of a Downstream Port Containment event. In legacy platforms, as bits in _OSC are reserved prior to implementation, ACPI Root Bus enumeration will mark these Host Bridges as without Native DPC support, even though the specification implies it's expected that AER _OSC negotiation determines DPC control for these platforms. There seems to be a need for a way to determine if the DPC control bit in _OSC is supported and fallback on AER otherwise. Currently portdrv assumes DPC control if the port has Native AER services: static int get_port_device_capability(struct pci_dev *dev) ... if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) && pci_aer_available() && (pcie_ports_dpc_native || (services & PCIE_PORT_SERVICE_AER))) services |= PCIE_PORT_SERVICE_DPC; Newer firmware may not grant OSPM DPC control, if for instance, it expects to use Error Disconnect Recovery. However it looks like ACPI will use DPC services via the EDR driver, without binding the full DPC port service driver. If we change portdrv to probe based on host->native_dpc and not AER, then we break instances with legacy firmware where OSPM will clear host->native_dpc solely due to _OSC bits being reserved: struct pci_bus *acpi_pci_root_create(struct acpi_pci_root *root, ... if (!(root->osc_control_set & OSC_PCI_EXPRESS_DPC_CONTROL)) host_bridge->native_dpc = 0; So my assumption instead is that host->native_dpc can be 0 and expect Native DPC services if AER is used. In other words, if and only if DPC probe is invoked from portdrv, then it needs to rely on the AER dependency. Otherwise it should be assumed that ACPI set up DPC via EDR. This covers legacy firmware. However it seems like that could be trouble with newer firmware that might give OSPM control of AER but not DPC, and would result in both Native DPC and EDR being in effect. Anyways here are two patches that give control of AER and DPC on the results of _OSC. They don't mess with the HEST parser as I expect those to be removed at some point. I need these for VMD support which doesn't even rely on _OSC, but I suspect this won't be the last effort as we detangle Firmware First. [1] https://members.pcisig.com/wg/PCI-SIG/document/12888 Hi Jon, I think we need to sort out the _OSC/FIRMWARE_FIRST patches from Alex and Sathy first, then see what needs to be done on top of those, so I'm going to push these off for a few days and they'll probably need a refresh. Bjorn Agreed, no need to merge now. Just wanted to bring up the DPC ambiguity, which I think was first addressed by dpc-native: commit 35a0b2378c199d4f26e458b2ca38ea56aaf2d9b8 Author: Olof Johansson Date: Wed Oct 23 12:22:05 2019 -0700 PCI/DPC: Add "pcie_ports=dpc-native" to allow DPC without AER control Prior to eed85ff4c0da7 ("PCI/DPC: Enable DPC only if AER is available"), Linux handled DPC events regardless of whether firmware had granted it ownership of AER or DPC, e.g., via _OSC. PCIe r5.0, sec 6.2.10, recommends that the OS link control of DPC to control of AER, so after eed85ff4c0da7, Linux handles DPC events only if it has control of AER. On platforms that do not grant OS control of AER via _OSC, Linux DPC handling worked before eed85ff4c0da7 but not after. To make Linux DPC handling work on those platforms the same way they did before, add a "pcie_ports=dpc-native" kernel parameter that makes Linux handle DPC events regardless of whether it has control of AER. [bhelgaas: commit log, move pcie_ports_dpc_native to drivers/pci/] Link: https://lore.kernel.org/r/20191023192205.97024-1-o...@lixom.net Signed-off-by: Olof Johansson Signed-off-by: Bjorn Helgaas Are you still thinking about removing the HEST parser? For VMD we still need the ability to bind DPC if
Re: [PATCH v2 2/2] PCI/DPC: Allow Native DPC Host Bridges to use DPC
On 4/27/20 8:15 AM, Derrick, Jonathan wrote: Hi Sathyanarayanan, On Sat, 2020-04-25 at 13:46 -0700, Kuppuswamy, Sathyanarayanan wrote: On 4/23/20 8:11 AM, Derrick, Jonathan wrote: Hi Sathyanarayanan, On Wed, 2020-04-22 at 15:50 -0700, Kuppuswamy, Sathyanarayanan wrote: On 4/20/20 2:37 PM, Jon Derrick wrote: The existing portdrv model prevents DPC services without either OS control (_OSC) granted to AER services, a Host Bridge requesting Native AER, or using one of the 'pcie_ports=' parameters of 'native' or 'dpc-native'. The DPC port service driver itself will also fail to probe if the kernel assumes the port is using Firmware-First AER. It's a reasonable expectation that a port using Firmware-First AER will also be using Firmware-First DPC, however if a Host Bridge requests Native DPC, the DPC driver should allow it and not fail to bind due to AER capability settings. Host Bridges which request Native DPC port services will also likely request Native AER, however it shouldn't be a requirement. This patch allows ports on those Host Bridges to have DPC port services. This will avoid the unlikely situation where the port is Firmware-First AER and Native DPC, and a BIOS or switch firmware preconfiguration of the DPC trigger could result in unhandled DPC events. Signed-off-by: Jon Derrick --- drivers/pci/pcie/dpc.c | 3 ++- drivers/pci/pcie/portdrv_core.c | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c index 7621704..3f3106f 100644 --- a/drivers/pci/pcie/dpc.c +++ b/drivers/pci/pcie/dpc.c @@ -284,7 +284,8 @@ static int dpc_probe(struct pcie_device *dev) int status; u16 ctl, cap; - if (pcie_aer_get_firmware_first(pdev) && !pcie_ports_dpc_native) + if (pcie_aer_get_firmware_first(pdev) && !pcie_ports_dpc_native && + !pci_find_host_bridge(pdev->bus)->native_dpc) Why do it in probe as well ? if host->native_dpc is not set then the device DPC probe it self won't happen right ? Portdrv only enables the interrupt and allows the probe to occur. Please check the following snippet of code (from portdrv_core.c). IIUC, pcie_device_init() will not be called if PCIE_PORT_SERVICE_DPC is not set in capabilities. Your change in portdrv_core.c already selectively enables the PCIE_PORT_SERVICE_DPC service based on native_dpc value. That's right. So pcie_device_init registers the port service driver allowing the services enumeration to occur. So IMO, adding native_dpc check in dpc_probe() is redundant. int pcie_port_device_register(struct pci_dev *dev) /* Allocate child services if any */ status = -ENODEV; nr_service = 0; for (i = 0; i < PCIE_PORT_DEVICE_MAXSERVICES; i++) { int service = 1 << i; if (!(capabilities & service)) continue; if (!pcie_device_init(dev, service, irqs[i])) nr_service++; } This is the tricky part There's still a check in dpc_probe for AER FFS or pcie_ports=dpc- native: if (pcie_aer_get_firmware_first(pdev) && !pcie_ports_dpc_native) return -ENOTSUPP; One option is to move that to get_port_device_capability and remove the dpc_probe check Yes, its better to group them together in get_port_device_capability(). But it should be done in a separate patch. The probe itself will still fail if there's a mixed-mode _OSC negotiated AER & DPC, due to pcie_aer_get_firmware_first returning 1 for AER and no check for DPC. I don't know if such a platform will exist, but the kernel is already wired for 'dpc-native' so it makes sense to extend it for this.. This transform might be more readable: if (pcie_aer_get_firmware_first(pdev) && !(pcie_ports_dpc_native || hb->native_dpc)) return -ENOTSUPP; status = devm_request_threaded_irq(device, dev->irq, dpc_irq, diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c index 50a9522..f2139a1 100644 --- a/drivers/pci/pcie/portdrv_core.c +++ b/drivers/pci/pcie/portdrv_core.c @@ -256,7 +256,8 @@ static int get_port_device_capability(struct pci_dev *dev) */ if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) && pci_aer_available() && - (pcie_ports_dpc_native || (services & PCIE_PORT_SERVICE_AER))) + (pcie_ports_dpc_native || host->native_dpc || +(services & PCIE_PORT_SERVICE_AER))) services |= PCIE_PORT_SERVICE_DPC; if (pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
Re: [PATCH v2 2/2] PCI/DPC: Allow Native DPC Host Bridges to use DPC
On 4/23/20 8:11 AM, Derrick, Jonathan wrote: Hi Sathyanarayanan, On Wed, 2020-04-22 at 15:50 -0700, Kuppuswamy, Sathyanarayanan wrote: On 4/20/20 2:37 PM, Jon Derrick wrote: The existing portdrv model prevents DPC services without either OS control (_OSC) granted to AER services, a Host Bridge requesting Native AER, or using one of the 'pcie_ports=' parameters of 'native' or 'dpc-native'. The DPC port service driver itself will also fail to probe if the kernel assumes the port is using Firmware-First AER. It's a reasonable expectation that a port using Firmware-First AER will also be using Firmware-First DPC, however if a Host Bridge requests Native DPC, the DPC driver should allow it and not fail to bind due to AER capability settings. Host Bridges which request Native DPC port services will also likely request Native AER, however it shouldn't be a requirement. This patch allows ports on those Host Bridges to have DPC port services. This will avoid the unlikely situation where the port is Firmware-First AER and Native DPC, and a BIOS or switch firmware preconfiguration of the DPC trigger could result in unhandled DPC events. Signed-off-by: Jon Derrick --- drivers/pci/pcie/dpc.c | 3 ++- drivers/pci/pcie/portdrv_core.c | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c index 7621704..3f3106f 100644 --- a/drivers/pci/pcie/dpc.c +++ b/drivers/pci/pcie/dpc.c @@ -284,7 +284,8 @@ static int dpc_probe(struct pcie_device *dev) int status; u16 ctl, cap; - if (pcie_aer_get_firmware_first(pdev) && !pcie_ports_dpc_native) + if (pcie_aer_get_firmware_first(pdev) && !pcie_ports_dpc_native && + !pci_find_host_bridge(pdev->bus)->native_dpc) Why do it in probe as well ? if host->native_dpc is not set then the device DPC probe it self won't happen right ? Portdrv only enables the interrupt and allows the probe to occur. Please check the following snippet of code (from portdrv_core.c). IIUC, pcie_device_init() will not be called if PCIE_PORT_SERVICE_DPC is not set in capabilities. Your change in portdrv_core.c already selectively enables the PCIE_PORT_SERVICE_DPC service based on native_dpc value. So IMO, adding native_dpc check in dpc_probe() is redundant. int pcie_port_device_register(struct pci_dev *dev) /* Allocate child services if any */ status = -ENODEV; nr_service = 0; for (i = 0; i < PCIE_PORT_DEVICE_MAXSERVICES; i++) { int service = 1 << i; if (!(capabilities & service)) continue; if (!pcie_device_init(dev, service, irqs[i])) nr_service++; } The probe itself will still fail if there's a mixed-mode _OSC negotiated AER & DPC, due to pcie_aer_get_firmware_first returning 1 for AER and no check for DPC. I don't know if such a platform will exist, but the kernel is already wired for 'dpc-native' so it makes sense to extend it for this.. This transform might be more readable: if (pcie_aer_get_firmware_first(pdev) && !(pcie_ports_dpc_native || hb->native_dpc)) return -ENOTSUPP; status = devm_request_threaded_irq(device, dev->irq, dpc_irq, diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c index 50a9522..f2139a1 100644 --- a/drivers/pci/pcie/portdrv_core.c +++ b/drivers/pci/pcie/portdrv_core.c @@ -256,7 +256,8 @@ static int get_port_device_capability(struct pci_dev *dev) */ if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) && pci_aer_available() && - (pcie_ports_dpc_native || (services & PCIE_PORT_SERVICE_AER))) + (pcie_ports_dpc_native || host->native_dpc || +(services & PCIE_PORT_SERVICE_AER))) services |= PCIE_PORT_SERVICE_DPC; if (pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
Re: [PATCH v2 2/2] PCI/DPC: Allow Native DPC Host Bridges to use DPC
On 4/20/20 2:37 PM, Jon Derrick wrote: The existing portdrv model prevents DPC services without either OS control (_OSC) granted to AER services, a Host Bridge requesting Native AER, or using one of the 'pcie_ports=' parameters of 'native' or 'dpc-native'. The DPC port service driver itself will also fail to probe if the kernel assumes the port is using Firmware-First AER. It's a reasonable expectation that a port using Firmware-First AER will also be using Firmware-First DPC, however if a Host Bridge requests Native DPC, the DPC driver should allow it and not fail to bind due to AER capability settings. Host Bridges which request Native DPC port services will also likely request Native AER, however it shouldn't be a requirement. This patch allows ports on those Host Bridges to have DPC port services. This will avoid the unlikely situation where the port is Firmware-First AER and Native DPC, and a BIOS or switch firmware preconfiguration of the DPC trigger could result in unhandled DPC events. Signed-off-by: Jon Derrick --- drivers/pci/pcie/dpc.c | 3 ++- drivers/pci/pcie/portdrv_core.c | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c index 7621704..3f3106f 100644 --- a/drivers/pci/pcie/dpc.c +++ b/drivers/pci/pcie/dpc.c @@ -284,7 +284,8 @@ static int dpc_probe(struct pcie_device *dev) int status; u16 ctl, cap; - if (pcie_aer_get_firmware_first(pdev) && !pcie_ports_dpc_native) + if (pcie_aer_get_firmware_first(pdev) && !pcie_ports_dpc_native && + !pci_find_host_bridge(pdev->bus)->native_dpc) Why do it in probe as well ? if host->native_dpc is not set then the device DPC probe it self won't happen right ? return -ENOTSUPP; status = devm_request_threaded_irq(device, dev->irq, dpc_irq, diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c index 50a9522..f2139a1 100644 --- a/drivers/pci/pcie/portdrv_core.c +++ b/drivers/pci/pcie/portdrv_core.c @@ -256,7 +256,8 @@ static int get_port_device_capability(struct pci_dev *dev) */ if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) && pci_aer_available() && - (pcie_ports_dpc_native || (services & PCIE_PORT_SERVICE_AER))) + (pcie_ports_dpc_native || host->native_dpc || +(services & PCIE_PORT_SERVICE_AER))) services |= PCIE_PORT_SERVICE_DPC; if (pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
Re: [PATCH v2 1/2] PCI/AER: Allow Native AER Host Bridges to use AER
On 4/20/20 2:37 PM, Jon Derrick wrote: Some platforms have a mix of ports whose capabilities can be negotiated by _OSC, and some ports which are not described by ACPI and instead managed by Native drivers. The existing Firmware-First HEST model can incorrectly tag these Native, Non-ACPI ports as Firmware-First managed ports by advertising the HEST Global Flag and matching the type and class of the port (aer_hest_parse). Is there a real use case for mixed mode (one host bridge in FF mode and another in native)? If the port requests Native AER through the Host Bridge's capability settings, the AER driver should honor those settings and allow the port to bind. This patch changes the definition of Firmware-First to exclude ports whose Host Bridges request Native AER. Signed-off-by: Jon Derrick --- drivers/pci/pcie/aer.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index f4274d3..30fbd1f 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -314,6 +314,9 @@ int pcie_aer_get_firmware_first(struct pci_dev *dev) if (pcie_ports_native) return 0; + if (pci_find_host_bridge(dev->bus)->native_aer) + return 0; + if (!dev->__aer_firmware_first_valid) aer_set_firmware_first(dev); return dev->__aer_firmware_first;
Re: [PATCH] PCI/DPC: Allow Non-ACPI Native ports to use DPC
Hi, On 4/16/20 12:59 PM, Jon Derrick wrote: Some platforms have a mix of ports whose capabilities can be negotiated by _OSC, and some ports which are not described by ACPI and instead managed by Native drivers. The existing Firmware-First HEST model can incorrectly tag these Native, Non-ACPI ports as Firmware-First capable ports by advertising the HEST Global flag and specifying the type and class (aer_hest_parse). This ultimately can lead to bad situations if the BIOS or port firmware leaves DPC preconfigured and the Linux DPC driver is unable to bind to the port to handle DPC events. This patch adds the check for Native DPC in the port's host bridge in order to allow DPC services to bind to the port. Signed-off-by: Jon Derrick --- drivers/pci/pcie/dpc.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c index 7621704..a1e355d 100644 --- a/drivers/pci/pcie/dpc.c +++ b/drivers/pci/pcie/dpc.c @@ -281,10 +281,12 @@ static int dpc_probe(struct pcie_device *dev) { struct pci_dev *pdev = dev->port; struct device *device = >device; + struct pci_host_bridge *host = pci_find_host_bridge(pdev->bus); int status; u16 ctl, cap; - if (pcie_aer_get_firmware_first(pdev) && !pcie_ports_dpc_native) + if (pcie_aer_get_firmware_first(pdev) && !pcie_ports_dpc_native && For other PCIe services, this check is added in get_port_device_capability(). Why not add it there for DPC as well ? + !host->native_dpc) return -ENOTSUPP; status = devm_request_threaded_irq(device, dev->irq, dpc_irq,
Re: [PATCH v1 2/2] PCI/AER: Update parameter descriptions to satisfy kernel-doc validator
On 8/27/19 8:18 AM, Andy Shevchenko wrote: Kernel-doc validator complains: aer.c:207: warning: Function parameter or member 'str' not described in 'pcie_ecrc_get_policy' aer.c:1209: warning: Function parameter or member 'irq' not described in 'aer_isr' aer.c:1209: warning: Function parameter or member 'context' not described in 'aer_isr' aer.c:1209: warning: Excess function parameter 'work' description in 'aer_isr' Fix the above accordingly. Signed-off-by: Andy Shevchenko Reviewed-by: Kuppuswamy Sathyanarayanan --- drivers/pci/pcie/aer.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index f883f81d759a..6073eaab099d 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -202,6 +202,7 @@ void pcie_set_ecrc_checking(struct pci_dev *dev) /** * pcie_ecrc_get_policy - parse kernel command-line ecrc option + * @str: ECRC policy from kernel command line to use */ void pcie_ecrc_get_policy(char *str) { @@ -1201,7 +1202,8 @@ static void aer_isr_one_error(struct aer_rpc *rpc, /** * aer_isr - consume errors detected by root port - * @work: definition of this work item + * @irq: IRQ assigned to Root Port + * @context: pointer to Root Port data structure * * Invoked, as DPC, when root port records new detected error */ -- Sathyanarayanan Kuppuswamy Linux kernel developer
Re: [PATCH v1 1/2] PCI/AER: Use for_each_set_bit()
On 8/27/19 8:18 AM, Andy Shevchenko wrote: This simplifies and standardizes slot manipulation code by using for_each_set_bit() library function. Signed-off-by: Andy Shevchenko Reviewed-by: Kuppuswamy Sathyanarayanan --- drivers/pci/pcie/aer.c | 19 --- 1 file changed, 8 insertions(+), 11 deletions(-) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index b45bc47d04fe..f883f81d759a 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -15,6 +15,7 @@ #define pr_fmt(fmt) "AER: " fmt #define dev_fmt pr_fmt +#include #include #include #include @@ -657,7 +658,8 @@ const struct attribute_group aer_stats_attr_group = { static void pci_dev_aer_stats_incr(struct pci_dev *pdev, struct aer_err_info *info) { - int status, i, max = -1; + unsigned long status = info->status & ~info->mask; + int i, max = -1; u64 *counter = NULL; struct aer_stats *aer_stats = pdev->aer_stats; @@ -682,10 +684,8 @@ static void pci_dev_aer_stats_incr(struct pci_dev *pdev, break; } - status = (info->status & ~info->mask); - for (i = 0; i < max; i++) - if (status & (1 << i)) - counter[i]++; + for_each_set_bit(i, , max) + counter[i]++; } static void pci_rootport_aer_stats_incr(struct pci_dev *pdev, @@ -717,14 +717,11 @@ static void __print_tlp_header(struct pci_dev *dev, static void __aer_print_error(struct pci_dev *dev, struct aer_err_info *info) { - int i, status; + unsigned long status = info->status & ~info->mask; const char *errmsg = NULL; - status = (info->status & ~info->mask); - - for (i = 0; i < 32; i++) { - if (!(status & (1 << i))) - continue; + int i; + for_each_set_bit(i, , 32) { if (info->severity == AER_CORRECTABLE) errmsg = i < ARRAY_SIZE(aer_correctable_error_string) ? aer_correctable_error_string[i] : NULL; -- Sathyanarayanan Kuppuswamy Linux kernel developer