Re: [PATCH v3 0/9] PCI/AER: Fix and optimize usage of status clearing api

2022-11-24 Thread Zhuo Chen



Ping. Gentle reminder


On 11/5/22 1:20 AM, Zhuo Chen wrote:

Hi Bjorn, a gentle reminder.

Thanks and regards.

On 9/28/22 6:59 PM, Zhuo Chen wrote:

Hello.

Here comes patch v3, which contains some fixes and optimizations of
aer api usage. The v1 and v2 can be found on the mailing list.

v3:
- Modifications to comments proposed by Sathyanarayanan. Remove
   pci_aer_clear_nonfatal_status() call in NTB and improve commit log.

v2:
- Modifications to comments proposed by Bjorn. Split patch into more
   obvious parts.

Zhuo Chen (9):
   PCI/AER: Add pci_aer_clear_uncorrect_error_status() to PCI core
   PCI/DPC: Use pci_aer_clear_uncorrect_error_status() to clear
 uncorrectable error status
   NTB: Remove pci_aer_clear_nonfatal_status() call
   scsi: lpfc: Change to use pci_aer_clear_uncorrect_error_status()
   PCI/AER: Unexport pci_aer_clear_nonfatal_status()
   PCI/AER: Move check inside pcie_clear_device_status().
   PCI/AER: Use pcie_aer_is_native() to judge whether OS owns AER
   PCI/ERR: Clear fatal error status when pci_channel_io_frozen
   PCI/AER: Refine status clearing process with api

  drivers/ntb/hw/idt/ntb_hw_idt.c |  2 --
  drivers/pci/pci.c   |  7 +++--
  drivers/pci/pci.h   |  2 ++
  drivers/pci/pcie/aer.c  | 45 +++--
  drivers/pci/pcie/dpc.c  |  3 +--
  drivers/pci/pcie/err.c  | 15 ---
  drivers/pci/pcie/portdrv_core.c |  3 +--
  drivers/scsi/lpfc/lpfc_attr.c   |  4 +--
  include/linux/aer.h |  4 +--
  9 files changed, 44 insertions(+), 41 deletions(-)





--
Zhuo Chen


Re: [PATCH v3 0/9] PCI/AER: Fix and optimize usage of status clearing api

2022-11-04 Thread Zhuo Chen

Hi Bjorn, a gentle reminder.

Thanks and regards.

On 9/28/22 6:59 PM, Zhuo Chen wrote:

Hello.

Here comes patch v3, which contains some fixes and optimizations of
aer api usage. The v1 and v2 can be found on the mailing list.

v3:
- Modifications to comments proposed by Sathyanarayanan. Remove
   pci_aer_clear_nonfatal_status() call in NTB and improve commit log.

v2:
- Modifications to comments proposed by Bjorn. Split patch into more
   obvious parts.

Zhuo Chen (9):
   PCI/AER: Add pci_aer_clear_uncorrect_error_status() to PCI core
   PCI/DPC: Use pci_aer_clear_uncorrect_error_status() to clear
 uncorrectable error status
   NTB: Remove pci_aer_clear_nonfatal_status() call
   scsi: lpfc: Change to use pci_aer_clear_uncorrect_error_status()
   PCI/AER: Unexport pci_aer_clear_nonfatal_status()
   PCI/AER: Move check inside pcie_clear_device_status().
   PCI/AER: Use pcie_aer_is_native() to judge whether OS owns AER
   PCI/ERR: Clear fatal error status when pci_channel_io_frozen
   PCI/AER: Refine status clearing process with api

  drivers/ntb/hw/idt/ntb_hw_idt.c |  2 --
  drivers/pci/pci.c   |  7 +++--
  drivers/pci/pci.h   |  2 ++
  drivers/pci/pcie/aer.c  | 45 +++--
  drivers/pci/pcie/dpc.c  |  3 +--
  drivers/pci/pcie/err.c  | 15 ---
  drivers/pci/pcie/portdrv_core.c |  3 +--
  drivers/scsi/lpfc/lpfc_attr.c   |  4 +--
  include/linux/aer.h |  4 +--
  9 files changed, 44 insertions(+), 41 deletions(-)



--
Zhuo Chen


Re: [External] Re: [PATCH v3 0/9] PCI/AER: Fix and optimize usage of status clearing api

2022-09-28 Thread Zhuo Chen




On 9/28/22 7:06 PM, Serge Semin wrote:

On Wed, Sep 28, 2022 at 06:59:37PM +0800, Zhuo Chen wrote:

Hello.

Here comes patch v3, which contains some fixes and optimizations of
aer api usage. The v1 and v2 can be found on the mailing list.

v3:
- Modifications to comments proposed by Sathyanarayanan.



Remove
   pci_aer_clear_nonfatal_status() call in NTB and improve commit log.


Failed to see who has requested that...

-Sergey


Hi, Sergey

Currently other vendor drivers do not clear error status in their own 
init code, I don't exactly know what is special reason for clearing 
error status during init code in ntb driver.


An evidence is in pci_aer_init(), PCI core driver has do 
pci_aer_clear_status() and pci_enable_pcie_error_reporting() in common 
process. So vendor drivers don't need to do again.


But I don't know the reason why many vendor drivers reserve 
pci_enable_pcie_error_reporting() after commit f26e58bf6f54 ("PCI/AER: 
Enable error reporting when AER is native"). Do they need to be removed?

Could Bjorn and Sathyanarayanan help look into it, thanks a lot.

Thanks.


v2:
- Modifications to comments proposed by Bjorn. Split patch into more
   obvious parts.

Zhuo Chen (9):
   PCI/AER: Add pci_aer_clear_uncorrect_error_status() to PCI core
   PCI/DPC: Use pci_aer_clear_uncorrect_error_status() to clear
 uncorrectable error status
   NTB: Remove pci_aer_clear_nonfatal_status() call
   scsi: lpfc: Change to use pci_aer_clear_uncorrect_error_status()
   PCI/AER: Unexport pci_aer_clear_nonfatal_status()
   PCI/AER: Move check inside pcie_clear_device_status().
   PCI/AER: Use pcie_aer_is_native() to judge whether OS owns AER
   PCI/ERR: Clear fatal error status when pci_channel_io_frozen
   PCI/AER: Refine status clearing process with api

  drivers/ntb/hw/idt/ntb_hw_idt.c |  2 --
  drivers/pci/pci.c   |  7 +++--
  drivers/pci/pci.h   |  2 ++
  drivers/pci/pcie/aer.c  | 45 +++--
  drivers/pci/pcie/dpc.c  |  3 +--
  drivers/pci/pcie/err.c  | 15 ---
  drivers/pci/pcie/portdrv_core.c |  3 +--
  drivers/scsi/lpfc/lpfc_attr.c   |  4 +--
  include/linux/aer.h |  4 +--
  9 files changed, 44 insertions(+), 41 deletions(-)

--
2.30.1 (Apple Git-130)



--
Zhuo Chen


[PATCH v3 9/9] PCI/AER: Refine status clearing process with api

2022-09-28 Thread Zhuo Chen
Statements clearing status in aer_enable_rootport() is functionally
equivalent with pcie_clear_device_status() and pci_aer_clear_status().
So replace them, which has no functional changes.

After commit 20e15e673b05 ("PCI/AER: Add pci_aer_raw_clear_status()
to unconditionally clear Error Status"), pci_aer_raw_clear_status()
is only used by the EDR path, so we add note in function comment.

Signed-off-by: Zhuo Chen 
---
 drivers/pci/pcie/aer.c | 13 -
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index a6d29269ccf2..bd5ecfa4860f 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -306,6 +306,8 @@ EXPORT_SYMBOL_GPL(pci_aer_clear_uncorrect_error_status);
  * Clearing AER error status registers unconditionally, regardless of
  * whether they're owned by firmware or the OS.
  *
+ * Used only by the EDR path. Other paths should use pci_aer_clear_status().
+ *
  * Returns 0 on success, or negative on failure.
  */
 int pci_aer_raw_clear_status(struct pci_dev *dev)
@@ -1277,24 +1279,17 @@ static void aer_enable_rootport(struct aer_rpc *rpc)
 {
struct pci_dev *pdev = rpc->rpd;
int aer = pdev->aer_cap;
-   u16 reg16;
u32 reg32;
 
/* Clear PCIe Capability's Device Status */
-   pcie_capability_read_word(pdev, PCI_EXP_DEVSTA, );
-   pcie_capability_write_word(pdev, PCI_EXP_DEVSTA, reg16);
+   pcie_clear_device_status(pdev);
 
/* Disable system error generation in response to error messages */
pcie_capability_clear_word(pdev, PCI_EXP_RTCTL,
   SYSTEM_ERROR_INTR_ON_MESG_MASK);
 
/* Clear error status */
-   pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, );
-   pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, reg32);
-   pci_read_config_dword(pdev, aer + PCI_ERR_COR_STATUS, );
-   pci_write_config_dword(pdev, aer + PCI_ERR_COR_STATUS, reg32);
-   pci_read_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, );
-   pci_write_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, reg32);
+   pci_aer_clear_status(pdev);
 
/*
 * Enable error reporting for the root port device and downstream port
-- 
2.30.1 (Apple Git-130)



[PATCH v3 8/9] PCI/ERR: Clear fatal error status when pci_channel_io_frozen

2022-09-28 Thread Zhuo Chen
When state is pci_channel_io_frozen in pcie_do_recovery(), the
severity is fatal and fatal error status should be cleared.
So add pci_aer_clear_fatal_status().

Signed-off-by: Zhuo Chen 
---
 drivers/pci/pcie/err.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index f80b21244ef1..b46f1d36c090 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -241,7 +241,10 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
pci_walk_bridge(bridge, report_resume, );
 
pcie_clear_device_status(dev);
-   pci_aer_clear_nonfatal_status(dev);
+   if (state == pci_channel_io_frozen)
+   pci_aer_clear_fatal_status(dev);
+   else
+   pci_aer_clear_nonfatal_status(dev);
 
pci_info(bridge, "device recovery successful\n");
return status;
-- 
2.30.1 (Apple Git-130)



[PATCH v3 7/9] PCI/AER: Use pcie_aer_is_native() to judge whether OS owns AER

2022-09-28 Thread Zhuo Chen
Use pcie_aer_is_native() in place of "host->native_aer ||
pcie_ports_native" to judge whether OS owns AER in aer_root_reset().

Replace "dev->aer_cap && (pcie_ports_native || host->native_aer)" in
get_port_device_capability() with pcie_aer_is_native(), which has no
functional changes.

Signed-off-by: Zhuo Chen 
---
 drivers/pci/pcie/aer.c  | 5 ++---
 drivers/pci/pcie/portdrv_core.c | 3 +--
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index e2320ab27a31..a6d29269ccf2 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1403,7 +1403,6 @@ static pci_ers_result_t aer_root_reset(struct pci_dev 
*dev)
int type = pci_pcie_type(dev);
struct pci_dev *root;
int aer;
-   struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
u32 reg32;
int rc;
 
@@ -1424,7 +1423,7 @@ static pci_ers_result_t aer_root_reset(struct pci_dev 
*dev)
 */
aer = root ? root->aer_cap : 0;
 
-   if ((host->native_aer || pcie_ports_native) && aer) {
+   if (aer && pcie_aer_is_native(root)) {
/* Disable Root's interrupt in response to error messages */
pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, );
reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
@@ -1443,7 +1442,7 @@ static pci_ers_result_t aer_root_reset(struct pci_dev 
*dev)
pci_is_root_bus(dev->bus) ? "Root" : "Downstream", rc);
}
 
-   if ((host->native_aer || pcie_ports_native) && aer) {
+   if (aer && pcie_aer_is_native(root)) {
/* Clear Root Error Status */
pci_read_config_dword(root, aer + PCI_ERR_ROOT_STATUS, );
pci_write_config_dword(root, aer + PCI_ERR_ROOT_STATUS, reg32);
diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index 1ac7fec47d6f..844297c0c85e 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -221,8 +221,7 @@ static int get_port_device_capability(struct pci_dev *dev)
}
 
 #ifdef CONFIG_PCIEAER
-   if (dev->aer_cap && pci_aer_available() &&
-   (pcie_ports_native || host->native_aer))
+   if (pcie_aer_is_native(dev) && pci_aer_available())
services |= PCIE_PORT_SERVICE_AER;
 #endif
 
-- 
2.30.1 (Apple Git-130)



[PATCH v3 6/9] PCI/AER: Move check inside pcie_clear_device_status().

2022-09-28 Thread Zhuo Chen
pcie_clear_device_status() doesn't check for pcie_aer_is_native()
internally, but after commit 068c29a248b6 ("PCI/ERR: Clear PCIe Device
Status errors only if OS owns AER") and commit aa344bc8b727 ("PCI/ERR:
Clear AER status only when we control AER"), both callers check before
calling it. So move the check inside pcie_clear_device_status().

pcie_clear_device_status() and pci_aer_clear_nonfatal_status() both
have check internally, so remove check when callers calling them.

Signed-off-by: Zhuo Chen 
---
 drivers/pci/pci.c  |  7 +--
 drivers/pci/pcie/aer.c |  4 ++--
 drivers/pci/pcie/err.c | 14 +++---
 3 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 95bc329e74c0..8caf4a5529a1 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2282,9 +2282,12 @@ EXPORT_SYMBOL_GPL(pci_set_pcie_reset_state);
 void pcie_clear_device_status(struct pci_dev *dev)
 {
u16 sta;
+   struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
 
-   pcie_capability_read_word(dev, PCI_EXP_DEVSTA, );
-   pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
+   if (host->native_aer || pcie_ports_native) {
+   pcie_capability_read_word(dev, PCI_EXP_DEVSTA, );
+   pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
+   }
 }
 #endif
 
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index e2ebd108339d..e2320ab27a31 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -971,11 +971,11 @@ static void handle_error_source(struct pci_dev *dev, 
struct aer_err_info *info)
 * Correctable error does not need software intervention.
 * No need to go through error recovery process.
 */
-   if (aer)
+   if (aer) {
pci_write_config_dword(dev, aer + PCI_ERR_COR_STATUS,
info->status);
-   if (pcie_aer_is_native(dev))
pcie_clear_device_status(dev);
+   }
} else if (info->severity == AER_NONFATAL)
pcie_do_recovery(dev, pci_channel_io_normal, aer_root_reset);
else if (info->severity == AER_FATAL)
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 59c90d04a609..f80b21244ef1 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -188,7 +188,6 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
int type = pci_pcie_type(dev);
struct pci_dev *bridge;
pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
-   struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
 
/*
 * If the error was detected by a Root Port, Downstream Port, RCEC,
@@ -241,16 +240,9 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
pci_dbg(bridge, "broadcast resume message\n");
pci_walk_bridge(bridge, report_resume, );
 
-   /*
-* If we have native control of AER, clear error status in the device
-* that detected the error.  If the platform retained control of AER,
-* it is responsible for clearing this status.  In that case, the
-* signaling device may not even be visible to the OS.
-*/
-   if (host->native_aer || pcie_ports_native) {
-   pcie_clear_device_status(dev);
-   pci_aer_clear_nonfatal_status(dev);
-   }
+   pcie_clear_device_status(dev);
+   pci_aer_clear_nonfatal_status(dev);
+
pci_info(bridge, "device recovery successful\n");
return status;
 
-- 
2.30.1 (Apple Git-130)



[PATCH v3 5/9] PCI/AER: Unexport pci_aer_clear_nonfatal_status()

2022-09-28 Thread Zhuo Chen
Since pci_aer_clear_nonfatal_status() is used only internally, move
its declaration to the PCI internal header file. Also, no one cares
about return value of pci_aer_clear_nonfatal_status(), so make it void.

Signed-off-by: Zhuo Chen 
---
 drivers/pci/pci.h  | 2 ++
 drivers/pci/pcie/aer.c | 7 ++-
 include/linux/aer.h| 5 -
 3 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 785f31086313..a114175d08e4 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -684,6 +684,7 @@ void pci_aer_init(struct pci_dev *dev);
 void pci_aer_exit(struct pci_dev *dev);
 extern const struct attribute_group aer_stats_attr_group;
 void pci_aer_clear_fatal_status(struct pci_dev *dev);
+void pci_aer_clear_nonfatal_status(struct pci_dev *dev);
 int pci_aer_clear_status(struct pci_dev *dev);
 int pci_aer_raw_clear_status(struct pci_dev *dev);
 #else
@@ -691,6 +692,7 @@ static inline void pci_no_aer(void) { }
 static inline void pci_aer_init(struct pci_dev *d) { }
 static inline void pci_aer_exit(struct pci_dev *d) { }
 static inline void pci_aer_clear_fatal_status(struct pci_dev *dev) { }
+static inline void pci_aer_clear_nonfatal_status(struct pci_dev *dev) { }
 static inline int pci_aer_clear_status(struct pci_dev *dev) { return -EINVAL; }
 static inline int pci_aer_raw_clear_status(struct pci_dev *dev) { return 
-EINVAL; }
 #endif
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 4e637121be23..e2ebd108339d 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -251,13 +251,13 @@ int pci_disable_pcie_error_reporting(struct pci_dev *dev)
 }
 EXPORT_SYMBOL_GPL(pci_disable_pcie_error_reporting);
 
-int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
+void pci_aer_clear_nonfatal_status(struct pci_dev *dev)
 {
int aer = dev->aer_cap;
u32 status, sev;
 
if (!pcie_aer_is_native(dev))
-   return -EIO;
+   return;
 
/* Clear status bits for ERR_NONFATAL errors only */
pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, );
@@ -265,10 +265,7 @@ int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
status &= ~sev;
if (status)
pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
-
-   return 0;
 }
-EXPORT_SYMBOL_GPL(pci_aer_clear_nonfatal_status);
 
 void pci_aer_clear_fatal_status(struct pci_dev *dev)
 {
diff --git a/include/linux/aer.h b/include/linux/aer.h
index 154690c278cb..f638ad955deb 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -44,7 +44,6 @@ struct aer_capability_regs {
 /* PCIe port driver needs this function to enable AER */
 int pci_enable_pcie_error_reporting(struct pci_dev *dev);
 int pci_disable_pcie_error_reporting(struct pci_dev *dev);
-int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
 int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev);
 void pci_save_aer_state(struct pci_dev *dev);
 void pci_restore_aer_state(struct pci_dev *dev);
@@ -57,10 +56,6 @@ static inline int pci_disable_pcie_error_reporting(struct 
pci_dev *dev)
 {
return -EINVAL;
 }
-static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
-{
-   return -EINVAL;
-}
 static inline int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev)
 {
return -EINVAL;
-- 
2.30.1 (Apple Git-130)



[PATCH v3 4/9] scsi: lpfc: Change to use pci_aer_clear_uncorrect_error_status()

2022-09-28 Thread Zhuo Chen
lpfc_aer_cleanup_state() requires clearing both fatal and non-fatal
uncorrectable error status. But using pci_aer_clear_nonfatal_status()
will only clear non-fatal error status. To clear both fatal and
non-fatal error status, use pci_aer_clear_uncorrect_error_status().

Signed-off-by: Zhuo Chen 
---
 drivers/scsi/lpfc/lpfc_attr.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
index 09cf2cd0ae60..d835cc0ba153 100644
--- a/drivers/scsi/lpfc/lpfc_attr.c
+++ b/drivers/scsi/lpfc/lpfc_attr.c
@@ -4689,7 +4689,7 @@ static DEVICE_ATTR_RW(lpfc_aer_support);
  * Description:
  * If the @buf contains 1 and the device currently has the AER support
  * enabled, then invokes the kernel AER helper routine
- * pci_aer_clear_nonfatal_status() to clean up the uncorrectable
+ * pci_aer_clear_uncorrect_error_status() to clean up the uncorrectable
  * error status register.
  *
  * Notes:
@@ -4715,7 +4715,7 @@ lpfc_aer_cleanup_state(struct device *dev, struct 
device_attribute *attr,
return -EINVAL;
 
if (phba->hba_flag & HBA_AER_ENABLED)
-   rc = pci_aer_clear_nonfatal_status(phba->pcidev);
+   rc = pci_aer_clear_uncorrect_error_status(phba->pcidev);
 
if (rc == 0)
return strlen(buf);
-- 
2.30.1 (Apple Git-130)



[PATCH v3 3/9] NTB: Remove pci_aer_clear_nonfatal_status() call

2022-09-28 Thread Zhuo Chen
There is no need to clear error status during init code, so remove it.

Signed-off-by: Zhuo Chen 
---
 drivers/ntb/hw/idt/ntb_hw_idt.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
index 0ed6f809ff2e..fed03217289d 100644
--- a/drivers/ntb/hw/idt/ntb_hw_idt.c
+++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
@@ -2657,8 +2657,6 @@ static int idt_init_pci(struct idt_ntb_dev *ndev)
ret = pci_enable_pcie_error_reporting(pdev);
if (ret != 0)
dev_warn(>dev, "PCIe AER capability disabled\n");
-   else /* Cleanup nonfatal error status before getting to init */
-   pci_aer_clear_nonfatal_status(pdev);
 
/* First enable the PCI device */
ret = pcim_enable_device(pdev);
-- 
2.30.1 (Apple Git-130)



[PATCH v3 2/9] PCI/DPC: Use pci_aer_clear_uncorrect_error_status() to clear uncorrectable error status

2022-09-28 Thread Zhuo Chen
pci_aer_clear_uncorrect_error_status() clears both fatal and non-fatal
errors. So use it in place of pci_aer_clear_nonfatal_status()
and pci_aer_clear_fatal_status().

Signed-off-by: Zhuo Chen 
---
 drivers/pci/pcie/dpc.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index 3e9afee02e8d..7942073fbb34 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -288,8 +288,7 @@ void dpc_process_error(struct pci_dev *pdev)
 dpc_get_aer_uncorrect_severity(pdev, ) &&
 aer_get_device_error_info(pdev, )) {
aer_print_error(pdev, );
-   pci_aer_clear_nonfatal_status(pdev);
-   pci_aer_clear_fatal_status(pdev);
+   pci_aer_clear_uncorrect_error_status(pdev);
}
 }
 
-- 
2.30.1 (Apple Git-130)



[PATCH v3 1/9] PCI/AER: Add pci_aer_clear_uncorrect_error_status() to PCI core

2022-09-28 Thread Zhuo Chen
In lpfc_aer_cleanup_state(), uncorrectable error status needs to be
cleared, which can be done by calling pci_aer_clear_nonfatal_status()
and pci_aer_clear_fatal_status(). Meanwhile they can be combined in
one function (the same in dpc_process_error). So add
pci_aer_clear_uncorrect_error_status() function to PCI core and
export symbol to other modules which wants to use it.

Signed-off-by: Zhuo Chen 
---
 drivers/pci/pcie/aer.c | 16 
 include/linux/aer.h|  5 +
 2 files changed, 21 insertions(+)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index e2d8a74f83c3..4e637121be23 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -286,6 +286,22 @@ void pci_aer_clear_fatal_status(struct pci_dev *dev)
pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
 }
 
+int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev)
+{
+   int aer = dev->aer_cap;
+   u32 status;
+
+   if (!pcie_aer_is_native(dev))
+   return -EIO;
+
+   pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, );
+   if (status)
+   pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(pci_aer_clear_uncorrect_error_status);
+
 /**
  * pci_aer_raw_clear_status - Clear AER error registers.
  * @dev: the PCI device
diff --git a/include/linux/aer.h b/include/linux/aer.h
index 97f64ba1b34a..154690c278cb 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -45,6 +45,7 @@ struct aer_capability_regs {
 int pci_enable_pcie_error_reporting(struct pci_dev *dev);
 int pci_disable_pcie_error_reporting(struct pci_dev *dev);
 int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
+int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev);
 void pci_save_aer_state(struct pci_dev *dev);
 void pci_restore_aer_state(struct pci_dev *dev);
 #else
@@ -60,6 +61,10 @@ static inline int pci_aer_clear_nonfatal_status(struct 
pci_dev *dev)
 {
return -EINVAL;
 }
+static inline int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev)
+{
+   return -EINVAL;
+}
 static inline void pci_save_aer_state(struct pci_dev *dev) {}
 static inline void pci_restore_aer_state(struct pci_dev *dev) {}
 #endif
-- 
2.30.1 (Apple Git-130)



[PATCH v3 0/9] PCI/AER: Fix and optimize usage of status clearing api

2022-09-28 Thread Zhuo Chen
Hello.

Here comes patch v3, which contains some fixes and optimizations of
aer api usage. The v1 and v2 can be found on the mailing list.

v3:
- Modifications to comments proposed by Sathyanarayanan. Remove
  pci_aer_clear_nonfatal_status() call in NTB and improve commit log. 

v2:
- Modifications to comments proposed by Bjorn. Split patch into more
  obvious parts.

Zhuo Chen (9):
  PCI/AER: Add pci_aer_clear_uncorrect_error_status() to PCI core
  PCI/DPC: Use pci_aer_clear_uncorrect_error_status() to clear
uncorrectable error status
  NTB: Remove pci_aer_clear_nonfatal_status() call
  scsi: lpfc: Change to use pci_aer_clear_uncorrect_error_status()
  PCI/AER: Unexport pci_aer_clear_nonfatal_status()
  PCI/AER: Move check inside pcie_clear_device_status().
  PCI/AER: Use pcie_aer_is_native() to judge whether OS owns AER
  PCI/ERR: Clear fatal error status when pci_channel_io_frozen
  PCI/AER: Refine status clearing process with api

 drivers/ntb/hw/idt/ntb_hw_idt.c |  2 --
 drivers/pci/pci.c   |  7 +++--
 drivers/pci/pci.h   |  2 ++
 drivers/pci/pcie/aer.c  | 45 +++--
 drivers/pci/pcie/dpc.c  |  3 +--
 drivers/pci/pcie/err.c  | 15 ---
 drivers/pci/pcie/portdrv_core.c |  3 +--
 drivers/scsi/lpfc/lpfc_attr.c   |  4 +--
 include/linux/aer.h |  4 +--
 9 files changed, 44 insertions(+), 41 deletions(-)

-- 
2.30.1 (Apple Git-130)



Re: [PATCH v2 1/9] PCI/AER: Add pci_aer_clear_uncorrect_error_status() to PCI core

2022-09-28 Thread Zhuo Chen




On 9/28/22 3:31 AM, Sathyanarayanan Kuppuswamy wrote:

Hi,

On 9/27/22 8:35 AM, Zhuo Chen wrote:

Sometimes we need to clear aer uncorrectable error status, so we add


Adding n actual use case will help.


pci_aer_clear_uncorrect_error_status() to PCI core.


If possible, try to avoid "we" usage in commit log. Just say "so add
pci_aer_clear_uncorrect_error_status() function"



Signed-off-by: Zhuo Chen 
---
  drivers/pci/pcie/aer.c | 16 
  include/linux/aer.h|  5 +
  2 files changed, 21 insertions(+)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index e2d8a74f83c3..4e637121be23 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -286,6 +286,22 @@ void pci_aer_clear_fatal_status(struct pci_dev *dev)
pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
  }
  
+int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev)

+{
+   int aer = dev->aer_cap;
+   u32 status;
+
+   if (!pcie_aer_is_native(dev))
+   return -EIO;
+
+   pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, );
+   if (status)
+   pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);


Why not just write all '1' and clear it? Why read and write?


Hi Sathyanarayanan,

The current implementation and the previous implementation are all first 
read and then write to clear the status. Currently just be consistent 
with them:

 - aer_enable_rootport()
 - pci_aer_raw_clear_status()
 - pcibios_plat_dev_init() in arch/mips/pci/pci-octeon.c
 - commit fd3362cb73de ("PCI/AER: Squash aerdrv_core.c into aerdrv.c")
   pci_cleanup_aer_uncorrect_error_status



+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(pci_aer_clear_uncorrect_error_status);


Add details about why you want to export in commit log.


Thanks,

I will add details and improve commit log in next version.



+
  /**
   * pci_aer_raw_clear_status - Clear AER error registers.
   * @dev: the PCI device
diff --git a/include/linux/aer.h b/include/linux/aer.h
index 97f64ba1b34a..154690c278cb 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -45,6 +45,7 @@ struct aer_capability_regs {
  int pci_enable_pcie_error_reporting(struct pci_dev *dev);
  int pci_disable_pcie_error_reporting(struct pci_dev *dev);
  int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
+int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev);
  void pci_save_aer_state(struct pci_dev *dev);
  void pci_restore_aer_state(struct pci_dev *dev);
  #else
@@ -60,6 +61,10 @@ static inline int pci_aer_clear_nonfatal_status(struct 
pci_dev *dev)
  {
return -EINVAL;
  }
+static inline int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev)
+{
+   return -EINVAL;
+}
  static inline void pci_save_aer_state(struct pci_dev *dev) {}
  static inline void pci_restore_aer_state(struct pci_dev *dev) {}
  #endif




--
Thanks,
Zhuo Chen


Re: [External] Re: [PATCH v2 3/9] NTB: Change to use pci_aer_clear_uncorrect_error_status()

2022-09-27 Thread Zhuo Chen




On 9/28/22 3:39 AM, Sathyanarayanan Kuppuswamy wrote:



On 9/27/22 8:35 AM, Zhuo Chen wrote:

Status bits for ERR_NONFATAL errors only are cleared in
pci_aer_clear_nonfatal_status(), but we want clear uncorrectable
error status in idt_init_pci(), so we change to use
pci_aer_clear_uncorrect_error_status().


You mean currently driver does not clear fatal errors now, and it is
a problem? Any error reported?


Hi Sathyanarayanan,

No error reports yet, I just changes the behavior back to what it was 
before commit e7b0b847de6d ("PCI/AER: Clear only ERR_NONFATAL bits 
during non-fatal recovery"), because this commit change the original 
function in commit bf2a952d31d2 ("NTB: Add IDT 89HPESxNTx PCIe-switches 
support").



Also, I am wondering why is it required to clear errors during init
code. Is it a norm?


I think there is no need to clear errors during init code.


Signed-off-by: Zhuo Chen 
---
  drivers/ntb/hw/idt/ntb_hw_idt.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
index 0ed6f809ff2e..d5f0aa87f817 100644
--- a/drivers/ntb/hw/idt/ntb_hw_idt.c
+++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
@@ -2657,8 +2657,8 @@ static int idt_init_pci(struct idt_ntb_dev *ndev)
ret = pci_enable_pcie_error_reporting(pdev);
if (ret != 0)
dev_warn(>dev, "PCIe AER capability disabled\n");
-   else /* Cleanup nonfatal error status before getting to init */
-   pci_aer_clear_nonfatal_status(pdev);
+   else /* Cleanup uncorrectable error status before getting to init */
+   pci_aer_clear_uncorrect_error_status(pdev);
  
  	/* First enable the PCI device */

ret = pcim_enable_device(pdev);




--
Thanks,
Zhuo Chen


[PATCH v2 9/9] PCI/AER: Refine status clearing process with api

2022-09-27 Thread Zhuo Chen
Statements clearing status in aer_enable_rootport() is functionally
equivalent with pcie_clear_device_status() and pci_aer_clear_status().
So we replace them, which has no functional changes.

After commit 20e15e673b05 ("PCI/AER: Add pci_aer_raw_clear_status()
to unconditionally clear Error Status"), pci_aer_raw_clear_status()
is only used by the EDR path, so we add note in function comment.

Signed-off-by: Zhuo Chen 
---
 drivers/pci/pcie/aer.c | 13 -
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index a6d29269ccf2..bd5ecfa4860f 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -306,6 +306,8 @@ EXPORT_SYMBOL_GPL(pci_aer_clear_uncorrect_error_status);
  * Clearing AER error status registers unconditionally, regardless of
  * whether they're owned by firmware or the OS.
  *
+ * Used only by the EDR path. Other paths should use pci_aer_clear_status().
+ *
  * Returns 0 on success, or negative on failure.
  */
 int pci_aer_raw_clear_status(struct pci_dev *dev)
@@ -1277,24 +1279,17 @@ static void aer_enable_rootport(struct aer_rpc *rpc)
 {
struct pci_dev *pdev = rpc->rpd;
int aer = pdev->aer_cap;
-   u16 reg16;
u32 reg32;
 
/* Clear PCIe Capability's Device Status */
-   pcie_capability_read_word(pdev, PCI_EXP_DEVSTA, );
-   pcie_capability_write_word(pdev, PCI_EXP_DEVSTA, reg16);
+   pcie_clear_device_status(pdev);
 
/* Disable system error generation in response to error messages */
pcie_capability_clear_word(pdev, PCI_EXP_RTCTL,
   SYSTEM_ERROR_INTR_ON_MESG_MASK);
 
/* Clear error status */
-   pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, );
-   pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, reg32);
-   pci_read_config_dword(pdev, aer + PCI_ERR_COR_STATUS, );
-   pci_write_config_dword(pdev, aer + PCI_ERR_COR_STATUS, reg32);
-   pci_read_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, );
-   pci_write_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, reg32);
+   pci_aer_clear_status(pdev);
 
/*
 * Enable error reporting for the root port device and downstream port
-- 
2.30.1 (Apple Git-130)



[PATCH v2 8/9] PCI/ERR: Clear fatal status when pci_channel_io_frozen

2022-09-27 Thread Zhuo Chen
When state is pci_channel_io_frozen in pcie_do_recovery(),
the severity is fatal and fatal status should be cleared.
So we add pci_aer_clear_fatal_status().

Signed-off-by: Zhuo Chen 
---
 drivers/pci/pcie/err.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index f80b21244ef1..b46f1d36c090 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -241,7 +241,10 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
pci_walk_bridge(bridge, report_resume, );
 
pcie_clear_device_status(dev);
-   pci_aer_clear_nonfatal_status(dev);
+   if (state == pci_channel_io_frozen)
+   pci_aer_clear_fatal_status(dev);
+   else
+   pci_aer_clear_nonfatal_status(dev);
 
pci_info(bridge, "device recovery successful\n");
return status;
-- 
2.30.1 (Apple Git-130)



[PATCH v2 7/9] PCI/AER: Use pcie_aer_is_native() to judge whether OS owns AER

2022-09-27 Thread Zhuo Chen
Use pcie_aer_is_native() in place of "host->native_aer ||
pcie_ports_native" to judge whether OS owns AER in aer_root_reset().

Replace "dev->aer_cap && (pcie_ports_native || host->native_aer)" in
get_port_device_capability() with pcie_aer_is_native(), which has no
functional changes.

Signed-off-by: Zhuo Chen 
---
 drivers/pci/pcie/aer.c  | 5 ++---
 drivers/pci/pcie/portdrv_core.c | 3 +--
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index e2320ab27a31..a6d29269ccf2 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1403,7 +1403,6 @@ static pci_ers_result_t aer_root_reset(struct pci_dev 
*dev)
int type = pci_pcie_type(dev);
struct pci_dev *root;
int aer;
-   struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
u32 reg32;
int rc;
 
@@ -1424,7 +1423,7 @@ static pci_ers_result_t aer_root_reset(struct pci_dev 
*dev)
 */
aer = root ? root->aer_cap : 0;
 
-   if ((host->native_aer || pcie_ports_native) && aer) {
+   if (aer && pcie_aer_is_native(root)) {
/* Disable Root's interrupt in response to error messages */
pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, );
reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
@@ -1443,7 +1442,7 @@ static pci_ers_result_t aer_root_reset(struct pci_dev 
*dev)
pci_is_root_bus(dev->bus) ? "Root" : "Downstream", rc);
}
 
-   if ((host->native_aer || pcie_ports_native) && aer) {
+   if (aer && pcie_aer_is_native(root)) {
/* Clear Root Error Status */
pci_read_config_dword(root, aer + PCI_ERR_ROOT_STATUS, );
pci_write_config_dword(root, aer + PCI_ERR_ROOT_STATUS, reg32);
diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index 1ac7fec47d6f..844297c0c85e 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -221,8 +221,7 @@ static int get_port_device_capability(struct pci_dev *dev)
}
 
 #ifdef CONFIG_PCIEAER
-   if (dev->aer_cap && pci_aer_available() &&
-   (pcie_ports_native || host->native_aer))
+   if (pcie_aer_is_native(dev) && pci_aer_available())
services |= PCIE_PORT_SERVICE_AER;
 #endif
 
-- 
2.30.1 (Apple Git-130)



[PATCH v2 6/9] PCI/AER: Move check inside pcie_clear_device_status().

2022-09-27 Thread Zhuo Chen
pcie_clear_device_status() doesn't check for pcie_aer_is_native()
internally, but after commit 068c29a248b6 ("PCI/ERR: Clear PCIe Device
Status errors only if OS owns AER") and commit aa344bc8b727 ("PCI/ERR:
Clear AER status only when we control AER"), both callers check before
calling it. So we move the check inside pcie_clear_device_status().

Signed-off-by: Zhuo Chen 
---
 drivers/pci/pci.c  |  7 +--
 drivers/pci/pcie/aer.c |  4 ++--
 drivers/pci/pcie/err.c | 14 +++---
 3 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 95bc329e74c0..8caf4a5529a1 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2282,9 +2282,12 @@ EXPORT_SYMBOL_GPL(pci_set_pcie_reset_state);
 void pcie_clear_device_status(struct pci_dev *dev)
 {
u16 sta;
+   struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
 
-   pcie_capability_read_word(dev, PCI_EXP_DEVSTA, );
-   pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
+   if (host->native_aer || pcie_ports_native) {
+   pcie_capability_read_word(dev, PCI_EXP_DEVSTA, );
+   pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
+   }
 }
 #endif
 
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index e2ebd108339d..e2320ab27a31 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -971,11 +971,11 @@ static void handle_error_source(struct pci_dev *dev, 
struct aer_err_info *info)
 * Correctable error does not need software intervention.
 * No need to go through error recovery process.
 */
-   if (aer)
+   if (aer) {
pci_write_config_dword(dev, aer + PCI_ERR_COR_STATUS,
info->status);
-   if (pcie_aer_is_native(dev))
pcie_clear_device_status(dev);
+   }
} else if (info->severity == AER_NONFATAL)
pcie_do_recovery(dev, pci_channel_io_normal, aer_root_reset);
else if (info->severity == AER_FATAL)
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 59c90d04a609..f80b21244ef1 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -188,7 +188,6 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
int type = pci_pcie_type(dev);
struct pci_dev *bridge;
pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
-   struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
 
/*
 * If the error was detected by a Root Port, Downstream Port, RCEC,
@@ -241,16 +240,9 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
pci_dbg(bridge, "broadcast resume message\n");
pci_walk_bridge(bridge, report_resume, );
 
-   /*
-* If we have native control of AER, clear error status in the device
-* that detected the error.  If the platform retained control of AER,
-* it is responsible for clearing this status.  In that case, the
-* signaling device may not even be visible to the OS.
-*/
-   if (host->native_aer || pcie_ports_native) {
-   pcie_clear_device_status(dev);
-   pci_aer_clear_nonfatal_status(dev);
-   }
+   pcie_clear_device_status(dev);
+   pci_aer_clear_nonfatal_status(dev);
+
pci_info(bridge, "device recovery successful\n");
return status;
 
-- 
2.30.1 (Apple Git-130)



[PATCH v2 5/9] PCI/AER: Unexport pci_aer_clear_nonfatal_status()

2022-09-27 Thread Zhuo Chen
Since pci_aer_clear_nonfatal_status() is used only internally, move
its declaration to the PCI internal header file. Also, no one cares
about return value of pci_aer_clear_nonfatal_status(), so make it void.

Signed-off-by: Zhuo Chen 
---
 drivers/pci/pci.h  | 2 ++
 drivers/pci/pcie/aer.c | 7 ++-
 include/linux/aer.h| 5 -
 3 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 785f31086313..a114175d08e4 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -684,6 +684,7 @@ void pci_aer_init(struct pci_dev *dev);
 void pci_aer_exit(struct pci_dev *dev);
 extern const struct attribute_group aer_stats_attr_group;
 void pci_aer_clear_fatal_status(struct pci_dev *dev);
+void pci_aer_clear_nonfatal_status(struct pci_dev *dev);
 int pci_aer_clear_status(struct pci_dev *dev);
 int pci_aer_raw_clear_status(struct pci_dev *dev);
 #else
@@ -691,6 +692,7 @@ static inline void pci_no_aer(void) { }
 static inline void pci_aer_init(struct pci_dev *d) { }
 static inline void pci_aer_exit(struct pci_dev *d) { }
 static inline void pci_aer_clear_fatal_status(struct pci_dev *dev) { }
+static inline void pci_aer_clear_nonfatal_status(struct pci_dev *dev) { }
 static inline int pci_aer_clear_status(struct pci_dev *dev) { return -EINVAL; }
 static inline int pci_aer_raw_clear_status(struct pci_dev *dev) { return 
-EINVAL; }
 #endif
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 4e637121be23..e2ebd108339d 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -251,13 +251,13 @@ int pci_disable_pcie_error_reporting(struct pci_dev *dev)
 }
 EXPORT_SYMBOL_GPL(pci_disable_pcie_error_reporting);
 
-int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
+void pci_aer_clear_nonfatal_status(struct pci_dev *dev)
 {
int aer = dev->aer_cap;
u32 status, sev;
 
if (!pcie_aer_is_native(dev))
-   return -EIO;
+   return;
 
/* Clear status bits for ERR_NONFATAL errors only */
pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, );
@@ -265,10 +265,7 @@ int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
status &= ~sev;
if (status)
pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
-
-   return 0;
 }
-EXPORT_SYMBOL_GPL(pci_aer_clear_nonfatal_status);
 
 void pci_aer_clear_fatal_status(struct pci_dev *dev)
 {
diff --git a/include/linux/aer.h b/include/linux/aer.h
index 154690c278cb..f638ad955deb 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -44,7 +44,6 @@ struct aer_capability_regs {
 /* PCIe port driver needs this function to enable AER */
 int pci_enable_pcie_error_reporting(struct pci_dev *dev);
 int pci_disable_pcie_error_reporting(struct pci_dev *dev);
-int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
 int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev);
 void pci_save_aer_state(struct pci_dev *dev);
 void pci_restore_aer_state(struct pci_dev *dev);
@@ -57,10 +56,6 @@ static inline int pci_disable_pcie_error_reporting(struct 
pci_dev *dev)
 {
return -EINVAL;
 }
-static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
-{
-   return -EINVAL;
-}
 static inline int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev)
 {
return -EINVAL;
-- 
2.30.1 (Apple Git-130)



[PATCH v2 4/9] scsi: lpfc: Change to use pci_aer_clear_uncorrect_error_status()

2022-09-27 Thread Zhuo Chen
Status bits for ERR_NONFATAL errors only are cleared in
pci_aer_clear_nonfatal_status(), but we want clear uncorrectable
error status in lpfc_aer_cleanup_state(), so we change to use
pci_aer_clear_uncorrect_error_status().

Signed-off-by: Zhuo Chen 
---
 drivers/scsi/lpfc/lpfc_attr.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
index 09cf2cd0ae60..d835cc0ba153 100644
--- a/drivers/scsi/lpfc/lpfc_attr.c
+++ b/drivers/scsi/lpfc/lpfc_attr.c
@@ -4689,7 +4689,7 @@ static DEVICE_ATTR_RW(lpfc_aer_support);
  * Description:
  * If the @buf contains 1 and the device currently has the AER support
  * enabled, then invokes the kernel AER helper routine
- * pci_aer_clear_nonfatal_status() to clean up the uncorrectable
+ * pci_aer_clear_uncorrect_error_status() to clean up the uncorrectable
  * error status register.
  *
  * Notes:
@@ -4715,7 +4715,7 @@ lpfc_aer_cleanup_state(struct device *dev, struct 
device_attribute *attr,
return -EINVAL;
 
if (phba->hba_flag & HBA_AER_ENABLED)
-   rc = pci_aer_clear_nonfatal_status(phba->pcidev);
+   rc = pci_aer_clear_uncorrect_error_status(phba->pcidev);
 
if (rc == 0)
return strlen(buf);
-- 
2.30.1 (Apple Git-130)



[PATCH v2 3/9] NTB: Change to use pci_aer_clear_uncorrect_error_status()

2022-09-27 Thread Zhuo Chen
Status bits for ERR_NONFATAL errors only are cleared in
pci_aer_clear_nonfatal_status(), but we want clear uncorrectable
error status in idt_init_pci(), so we change to use
pci_aer_clear_uncorrect_error_status().

Signed-off-by: Zhuo Chen 
---
 drivers/ntb/hw/idt/ntb_hw_idt.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
index 0ed6f809ff2e..d5f0aa87f817 100644
--- a/drivers/ntb/hw/idt/ntb_hw_idt.c
+++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
@@ -2657,8 +2657,8 @@ static int idt_init_pci(struct idt_ntb_dev *ndev)
ret = pci_enable_pcie_error_reporting(pdev);
if (ret != 0)
dev_warn(>dev, "PCIe AER capability disabled\n");
-   else /* Cleanup nonfatal error status before getting to init */
-   pci_aer_clear_nonfatal_status(pdev);
+   else /* Cleanup uncorrectable error status before getting to init */
+   pci_aer_clear_uncorrect_error_status(pdev);
 
/* First enable the PCI device */
ret = pcim_enable_device(pdev);
-- 
2.30.1 (Apple Git-130)



[PATCH v2 2/9] PCI/DPC: Use pci_aer_clear_uncorrect_error_status() to clear uncorrectable error status

2022-09-27 Thread Zhuo Chen
Use pci_aer_clear_nonfatal_status() in dpc_process_error(), which has
no functional changes.

Signed-off-by: Zhuo Chen 
---
 drivers/pci/pcie/dpc.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index 3e9afee02e8d..7942073fbb34 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -288,8 +288,7 @@ void dpc_process_error(struct pci_dev *pdev)
 dpc_get_aer_uncorrect_severity(pdev, ) &&
 aer_get_device_error_info(pdev, )) {
aer_print_error(pdev, );
-   pci_aer_clear_nonfatal_status(pdev);
-   pci_aer_clear_fatal_status(pdev);
+   pci_aer_clear_uncorrect_error_status(pdev);
}
 }
 
-- 
2.30.1 (Apple Git-130)



[PATCH v2 1/9] PCI/AER: Add pci_aer_clear_uncorrect_error_status() to PCI core

2022-09-27 Thread Zhuo Chen
Sometimes we need to clear aer uncorrectable error status, so we add
pci_aer_clear_uncorrect_error_status() to PCI core.

Signed-off-by: Zhuo Chen 
---
 drivers/pci/pcie/aer.c | 16 
 include/linux/aer.h|  5 +
 2 files changed, 21 insertions(+)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index e2d8a74f83c3..4e637121be23 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -286,6 +286,22 @@ void pci_aer_clear_fatal_status(struct pci_dev *dev)
pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
 }
 
+int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev)
+{
+   int aer = dev->aer_cap;
+   u32 status;
+
+   if (!pcie_aer_is_native(dev))
+   return -EIO;
+
+   pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, );
+   if (status)
+   pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(pci_aer_clear_uncorrect_error_status);
+
 /**
  * pci_aer_raw_clear_status - Clear AER error registers.
  * @dev: the PCI device
diff --git a/include/linux/aer.h b/include/linux/aer.h
index 97f64ba1b34a..154690c278cb 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -45,6 +45,7 @@ struct aer_capability_regs {
 int pci_enable_pcie_error_reporting(struct pci_dev *dev);
 int pci_disable_pcie_error_reporting(struct pci_dev *dev);
 int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
+int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev);
 void pci_save_aer_state(struct pci_dev *dev);
 void pci_restore_aer_state(struct pci_dev *dev);
 #else
@@ -60,6 +61,10 @@ static inline int pci_aer_clear_nonfatal_status(struct 
pci_dev *dev)
 {
return -EINVAL;
 }
+static inline int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev)
+{
+   return -EINVAL;
+}
 static inline void pci_save_aer_state(struct pci_dev *dev) {}
 static inline void pci_restore_aer_state(struct pci_dev *dev) {}
 #endif
-- 
2.30.1 (Apple Git-130)



[PATCH v2 0/9] PCI/AER: Fix and optimize usage of status clearing api

2022-09-27 Thread Zhuo Chen
Hello.

Here comes patch v2, which contains some fixes and optimizations of
aer api usage. The original version can be found on the mailing list.

Changes since v1:
- Modifications to comments proposed by Bjorn. Split patch into more
  obvious parts.

Zhuo Chen (9):
  PCI/AER: Add pci_aer_clear_uncorrect_error_status() to PCI core
  PCI/DPC: Use pci_aer_clear_uncorrect_error_status() to clear
uncorrectable error status
  NTB: Change to use pci_aer_clear_uncorrect_error_status()
  scsi: lpfc: Change to use pci_aer_clear_uncorrect_error_status()
  PCI/AER: Unexport pci_aer_clear_nonfatal_status()
  PCI/AER: Move check inside pcie_clear_device_status().
  PCI/AER: Use pcie_aer_is_native() to judge whether OS owns AER
  PCI/ERR: Clear fatal status when pci_channel_io_frozen
  PCI/AER: Refine status clearing process with api

 drivers/ntb/hw/idt/ntb_hw_idt.c |  4 +--
 drivers/pci/pci.c   |  7 +++--
 drivers/pci/pci.h   |  2 ++
 drivers/pci/pcie/aer.c  | 45 +++--
 drivers/pci/pcie/dpc.c  |  3 +--
 drivers/pci/pcie/err.c  | 15 ---
 drivers/pci/pcie/portdrv_core.c |  3 +--
 drivers/scsi/lpfc/lpfc_attr.c   |  4 +--
 include/linux/aer.h |  4 +--
 9 files changed, 46 insertions(+), 41 deletions(-)

-- 
2.30.1 (Apple Git-130)



Re: [External] Re: [PATCH 2/3] PCI/ERR: Clear fatal status in pcie_do_recovery()

2022-09-27 Thread Zhuo Chen




On 9/27/22 2:09 AM, Bjorn Helgaas wrote:

On Mon, Sep 26, 2022 at 10:01:55PM +0800, Zhuo Chen wrote:

On 9/23/22 5:08 AM, Bjorn Helgaas wrote:

On Fri, Sep 02, 2022 at 02:16:33AM +0800, Zhuo Chen wrote:

When state is pci_channel_io_frozen in pcie_do_recovery(),
the severity is fatal and fatal status should be cleared.
So we add pci_aer_clear_fatal_status().


Seems sensible to me.  Did you find this by code inspection or by
debugging a problem?  If the latter, it would be nice to mention the
symptoms of the problem in the commit log.


I found this by code inspection so I may not enumerate what kind of problems
this code will cause.



Since pcie_aer_is_native() in pci_aer_clear_fatal_status()
and pci_aer_clear_nonfatal_status() contains the function of
'if (host->native_aer || pcie_ports_native)', so we move them
out of it.


Wrap commit log to fill 75 columns.


Signed-off-by: Zhuo Chen 
---
   drivers/pci/pcie/err.c | 8 ++--
   1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 0c5a143025af..e0a8ade4c3fe 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -243,10 +243,14 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 * it is responsible for clearing this status.  In that case, the
 * signaling device may not even be visible to the OS.
 */
-   if (host->native_aer || pcie_ports_native) {
+   if (host->native_aer || pcie_ports_native)
pcie_clear_device_status(dev);


pcie_clear_device_status() doesn't check for pcie_aer_is_native()
internally, but after 068c29a248b6 ("PCI/ERR: Clear PCIe Device Status
errors only if OS owns AER") and aa344bc8b727 ("PCI/ERR: Clear AER
status only when we control AER"), both callers check before calling
it.

I think we should move the check inside pcie_clear_device_status().
That could be a separate preliminary patch.

There are a couple other places (aer_root_reset() and
get_port_device_capability()) that do the same check and could be
changed to use pcie_aer_is_native() instead.  That could be another
preliminary patch.


Good suggestion. But I have only one doubt. In aer_root_reset(), if we use
"if (pcie_aer_is_native(dev) && aer)", when dev->aer_cap
is NULL and root->aer_cap is not NULL, pcie_aer_is_native() will return
false. It's different from just using "(host->native_aer ||
pcie_ports_native)".
Or if we can use "if (pcie_aer_is_native(root))", at this time a NULL
pointer check should be added in pcie_aer_is_native() because root may be
NULL.


Good point.  In aer_root_reset(), we're updating Root Port registers,
so I think they should look like:

   if (pcie_aer_is_native(root) && aer) {
 ...
   }

Does that seem safe and equivalent to you?

Bjorn


I think ‘if (aer && pcie_aer_is_native(root))’ might be safer,
because when root is NULL, 'aer' will be NUll as well, and the
predicate will return false without entering pcie_aer_is_native(root).


--
Thanks,
Zhuo Chen


Re: [PATCH 3/3] PCI/AER: Use pci_aer_raw_clear_status() to clear root port's AER error status

2022-09-26 Thread Zhuo Chen




On 9/23/22 5:50 AM, Bjorn Helgaas wrote:

On Fri, Sep 02, 2022 at 02:16:34AM +0800, Zhuo Chen wrote:

Statements clearing AER error status in aer_enable_rootport() has the
same function as pci_aer_raw_clear_status(). So we replace them, which
has no functional changes.

Signed-off-by: Zhuo Chen 
---
  drivers/pci/pcie/aer.c | 7 +--
  1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index d2996afa80f6..eb0193f279f2 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1287,12 +1287,7 @@ static void aer_enable_rootport(struct aer_rpc *rpc)
   SYSTEM_ERROR_INTR_ON_MESG_MASK);
  
  	/* Clear error status */

-   pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, );
-   pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, reg32);
-   pci_read_config_dword(pdev, aer + PCI_ERR_COR_STATUS, );
-   pci_write_config_dword(pdev, aer + PCI_ERR_COR_STATUS, reg32);
-   pci_read_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, );
-   pci_write_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, reg32);
+   pci_aer_raw_clear_status(pdev);


It's true that this is functionally equivalent.

But 20e15e673b05 ("PCI/AER: Add pci_aer_raw_clear_status() to
unconditionally clear Error Status") says pci_aer_raw_clear_status()
is only for use in the EDR path (this should have been included in the
function comment), so I think we should preserve that property and use
pci_aer_clear_status() here.

pci_aer_raw_clear_status() is the same as pci_aer_clear_status()
except it doesn't check pcie_aer_is_native().  And I'm pretty sure we
can't get to aer_enable_rootport() *unless* pcie_aer_is_native(),
because get_port_device_capability() checks the same thing, so they
should be equivalent here.

Bjorn
Thanks Bjorn, this very detailed correction is helpful. By the way, 
'only for use in the EDR path' obviously written in the function 
comments may be better. So far only commit log has included these.


I will change to use pci_aer_clear_status() in next patch.

--
Thanks,
Zhuo Chen


Re: [PATCH 2/3] PCI/ERR: Clear fatal status in pcie_do_recovery()

2022-09-26 Thread Zhuo Chen




On 9/23/22 5:08 AM, Bjorn Helgaas wrote:

On Fri, Sep 02, 2022 at 02:16:33AM +0800, Zhuo Chen wrote:

When state is pci_channel_io_frozen in pcie_do_recovery(),
the severity is fatal and fatal status should be cleared.
So we add pci_aer_clear_fatal_status().


Seems sensible to me.  Did you find this by code inspection or by
debugging a problem?  If the latter, it would be nice to mention the
symptoms of the problem in the commit log.


I found this by code inspection so I may not enumerate what kind of 
problems this code will cause.



Since pcie_aer_is_native() in pci_aer_clear_fatal_status()
and pci_aer_clear_nonfatal_status() contains the function of
'if (host->native_aer || pcie_ports_native)', so we move them
out of it.


Wrap commit log to fill 75 columns.


Signed-off-by: Zhuo Chen 
---
  drivers/pci/pcie/err.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 0c5a143025af..e0a8ade4c3fe 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -243,10 +243,14 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 * it is responsible for clearing this status.  In that case, the
 * signaling device may not even be visible to the OS.
 */
-   if (host->native_aer || pcie_ports_native) {
+   if (host->native_aer || pcie_ports_native)
pcie_clear_device_status(dev);


pcie_clear_device_status() doesn't check for pcie_aer_is_native()
internally, but after 068c29a248b6 ("PCI/ERR: Clear PCIe Device Status
errors only if OS owns AER") and aa344bc8b727 ("PCI/ERR: Clear AER
status only when we control AER"), both callers check before calling
it.

I think we should move the check inside pcie_clear_device_status().
That could be a separate preliminary patch.

There are a couple other places (aer_root_reset() and
get_port_device_capability()) that do the same check and could be
changed to use pcie_aer_is_native() instead.  That could be another
preliminary patch.

Good suggestion. But I have only one doubt. In aer_root_reset(), if we 
use "if (pcie_aer_is_native(dev) && aer)", when dev->aer_cap
is NULL and root->aer_cap is not NULL, pcie_aer_is_native() will return 
false. It's different from just using "(host->native_aer ||

pcie_ports_native)".
Or if we can use "if (pcie_aer_is_native(root))", at this time a NULL 
pointer check should be added in pcie_aer_is_native() because root may 
be NULL.





+   if (state == pci_channel_io_frozen)
+   pci_aer_clear_fatal_status(dev);
+   else
pci_aer_clear_nonfatal_status(dev);
-   }
+
pci_info(bridge, "device recovery successful\n");
return status;
  
--

2.30.1 (Apple Git-130)



--
Thanks,
Zhuo Chen


Re: [PATCH 1/3] PCI/AER: Use pci_aer_clear_uncorrect_error_status() to clear uncorrectable error status

2022-09-26 Thread Zhuo Chen




On 9/23/22 4:02 AM, Bjorn Helgaas wrote:

On Mon, Sep 12, 2022 at 01:09:05AM +0800, Zhuo Chen wrote:

On 9/12/22 12:22 AM, Serge Semin wrote:

On Fri, Sep 02, 2022 at 02:16:32AM +0800, Zhuo Chen wrote:

Status bits for ERR_NONFATAL errors only are cleared in
pci_aer_clear_nonfatal_status(), but we want clear uncorrectable
error status in ntb_hw_idt.c and lpfc_attr.c. So we add
pci_aer_clear_uncorrect_error_status() and change to use it.


What about the next drivers

drivers/scsi/lpfc/lpfc_attr.c
drivers/crypto/hisilicon/qm.c
drivers/net/ethernet/intel/ice/ice_main.c

which call the pci_aer_clear_nonfatal_status() method too?


‘pci_aer_clear_nonfatal_status()’ in
drivers/net/ethernet/intel/ice/ice_main.c has already been removed and
merged in kernel in: 
https://github.com/torvalds/linux/commit/ca415ea1f03abf34fc8e4cc5fc30a00189b4e776


It's better if you can use kernel.org URLs that don't depend on
third parties like github, e.g.,

   https://git.kernel.org/linus/ca415ea1f03a


Good reminder, I'll pay attention next time.


‘pci_aer_clear_nonfatal_status()’ in drivers/crypto/hisilicon/qm.c will be
removed in the next kernel:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/drivers/crypto/hisilicon/qm.c?id=00278564a60e11df8bcca0ececd8b2f55434e406


This is a problem because 00278564a60e ("crypto: hisilicon - Remove
pci_aer_clear_nonfatal_status() call") is in Herbert's cryptodev tree,
and if I apply this series to the PCI tree and Linus merges it before
Herbert's cryptodev changes, it will break the build.

I think we need to split this patch up like this:

   - Add pci_aer_clear_uncorrect_error_status() to PCI core
   - Convert dpc to use pci_aer_clear_uncorrect_error_status()
 (I might end up squashing with above)
   - Convert lpfc to use pci_aer_clear_uncorrect_error_status()
   - Convert ntb_hw_idt to use pci_aer_clear_uncorrect_error_status()
   - Unexport pci_aer_clear_nonfatal_status()

Then I can apply all but the last patch safely.  If the crypto changes
are merged first, we can add the last one; otherwise we can do it for
the next cycle.


Good proposal. I will implement these in the next version.

Do I need to put pci related modifications (include patch 2/3 and 3/3) 
in a patch set or just single patches?



Uncorrectable error status register was intended to be cleared in
drivers/scsi/lpfc/lpfc_attr.c. But originally function was changed in 
https://github.com/torvalds/linux/commit/e7b0b847de6db161e3917732276e425bc92a2feb
and
https://github.com/torvalds/linux/commit/894020fdd88c1e9a74c60b67c0f19f1c7696ba2f


This will be a behavior change for lpfc and ntb_hw_idt.  It looks like
it changes the behavior back to what it was before e7b0b847de6d
("PCI/AER: Clear only ERR_NONFATAL bits during non-fatal recovery"),
so it might be OK, but splitting these out to their own patches will
make the change more obvious and we can make sure that's what we want.

Bjorn


Thanks Bjorn, I will put lpfc and ntb_hw_idt changes in single patchs.



Use pci_aer_clear_nonfatal_status() in dpc_process_error(), which has
no functional changes.

Since pci_aer_clear_nonfatal_status() is used only internally, move
its declaration to the PCI internal header file. Also, no one cares
about return value of pci_aer_clear_nonfatal_status(), so make it void.

Signed-off-by: Zhuo Chen 
---
   drivers/ntb/hw/idt/ntb_hw_idt.c |  4 ++--
   drivers/pci/pci.h   |  2 ++
   drivers/pci/pcie/aer.c  | 23 ++-
   drivers/pci/pcie/dpc.c  |  3 +--
   drivers/scsi/lpfc/lpfc_attr.c   |  4 ++--
   include/linux/aer.h |  4 ++--
   6 files changed, 27 insertions(+), 13 deletions(-)

diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
index 733557231ed0..de1dbbc5b9de 100644
--- a/drivers/ntb/hw/idt/ntb_hw_idt.c
+++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
@@ -2657,8 +2657,8 @@ static int idt_init_pci(struct idt_ntb_dev *ndev)
ret = pci_enable_pcie_error_reporting(pdev);
if (ret != 0)
dev_warn(>dev, "PCIe AER capability disabled\n");



-   else /* Cleanup nonfatal error status before getting to init */
-   pci_aer_clear_nonfatal_status(pdev);
+   else /* Cleanup uncorrectable error status before getting to init */
+   pci_aer_clear_uncorrect_error_status(pdev);


  From the IDT NTB PCIe initialization procedure point of view both of
these methods are equivalent. So for the IDT NTB part:


IDT NTB part is the same as drivers/scsi/lpfc/lpfc_attr.c. The original
function is clear uncorrectable error status register including fatal and
non-fatal error status bits.


Acked-by: Serge Semin 

-Sergey


/* First enable the PCI device */
ret = pcim_enable_device(pdev);
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index e10cdec6c56e..574176f43025 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -686,6 +686,7 @@ void

Re: [External] Re: [PATCH 1/3] PCI/AER: Use pci_aer_clear_uncorrect_error_status() to clear uncorrectable error status

2022-09-11 Thread Zhuo Chen




On 9/12/22 12:22 AM, Serge Semin wrote:

Hi Zhuo

On Fri, Sep 02, 2022 at 02:16:32AM +0800, Zhuo Chen wrote:

Status bits for ERR_NONFATAL errors only are cleared in
pci_aer_clear_nonfatal_status(), but we want clear uncorrectable
error status in ntb_hw_idt.c and lpfc_attr.c. So we add
pci_aer_clear_uncorrect_error_status() and change to use it.


What about the next drivers

drivers/scsi/lpfc/lpfc_attr.c
drivers/crypto/hisilicon/qm.c
drivers/net/ethernet/intel/ice/ice_main.c

which call the pci_aer_clear_nonfatal_status() method too?


‘pci_aer_clear_nonfatal_status()’ in 
drivers/net/ethernet/intel/ice/ice_main.c has already been removed and 
merged in kernel in: 
https://github.com/torvalds/linux/commit/ca415ea1f03abf34fc8e4cc5fc30a00189b4e776


‘pci_aer_clear_nonfatal_status()’ in drivers/crypto/hisilicon/qm.c will 
be removed in the next kernel:

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/drivers/crypto/hisilicon/qm.c?id=00278564a60e11df8bcca0ececd8b2f55434e406

Uncorrectable error status register was intended to be cleared in 
drivers/scsi/lpfc/lpfc_attr.c. But originally function was changed in 
https://github.com/torvalds/linux/commit/e7b0b847de6db161e3917732276e425bc92a2feb

and
https://github.com/torvalds/linux/commit/894020fdd88c1e9a74c60b67c0f19f1c7696ba2f




Use pci_aer_clear_nonfatal_status() in dpc_process_error(), which has
no functional changes.

Since pci_aer_clear_nonfatal_status() is used only internally, move
its declaration to the PCI internal header file. Also, no one cares
about return value of pci_aer_clear_nonfatal_status(), so make it void.

Signed-off-by: Zhuo Chen 
---
  drivers/ntb/hw/idt/ntb_hw_idt.c |  4 ++--
  drivers/pci/pci.h   |  2 ++
  drivers/pci/pcie/aer.c  | 23 ++-
  drivers/pci/pcie/dpc.c  |  3 +--
  drivers/scsi/lpfc/lpfc_attr.c   |  4 ++--
  include/linux/aer.h |  4 ++--
  6 files changed, 27 insertions(+), 13 deletions(-)

diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
index 733557231ed0..de1dbbc5b9de 100644
--- a/drivers/ntb/hw/idt/ntb_hw_idt.c
+++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
@@ -2657,8 +2657,8 @@ static int idt_init_pci(struct idt_ntb_dev *ndev)
ret = pci_enable_pcie_error_reporting(pdev);
if (ret != 0)
dev_warn(>dev, "PCIe AER capability disabled\n");



-   else /* Cleanup nonfatal error status before getting to init */
-   pci_aer_clear_nonfatal_status(pdev);
+   else /* Cleanup uncorrectable error status before getting to init */
+   pci_aer_clear_uncorrect_error_status(pdev);


 From the IDT NTB PCIe initialization procedure point of view both of
these methods are equivalent. So for the IDT NTB part:

IDT NTB part is the same as drivers/scsi/lpfc/lpfc_attr.c. The original 
function is clear uncorrectable error status register including fatal 
and non-fatal error status bits.



Acked-by: Serge Semin 

-Sergey

  
  	/* First enable the PCI device */

ret = pcim_enable_device(pdev);
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index e10cdec6c56e..574176f43025 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -686,6 +686,7 @@ void pci_aer_init(struct pci_dev *dev);
  void pci_aer_exit(struct pci_dev *dev);
  extern const struct attribute_group aer_stats_attr_group;
  void pci_aer_clear_fatal_status(struct pci_dev *dev);
+void pci_aer_clear_nonfatal_status(struct pci_dev *dev);
  int pci_aer_clear_status(struct pci_dev *dev);
  int pci_aer_raw_clear_status(struct pci_dev *dev);
  #else
@@ -693,6 +694,7 @@ static inline void pci_no_aer(void) { }
  static inline void pci_aer_init(struct pci_dev *d) { }
  static inline void pci_aer_exit(struct pci_dev *d) { }
  static inline void pci_aer_clear_fatal_status(struct pci_dev *dev) { }
+static inline void pci_aer_clear_nonfatal_status(struct pci_dev *dev) { }
  static inline int pci_aer_clear_status(struct pci_dev *dev) { return -EINVAL; 
}
  static inline int pci_aer_raw_clear_status(struct pci_dev *dev) { return 
-EINVAL; }
  #endif
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 7952e5efd6cf..d2996afa80f6 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -251,13 +251,13 @@ int pci_disable_pcie_error_reporting(struct pci_dev *dev)
  }
  EXPORT_SYMBOL_GPL(pci_disable_pcie_error_reporting);
  
-int pci_aer_clear_nonfatal_status(struct pci_dev *dev)

+void pci_aer_clear_nonfatal_status(struct pci_dev *dev)
  {
int aer = dev->aer_cap;
u32 status, sev;
  
  	if (!pcie_aer_is_native(dev))

-   return -EIO;
+   return;
  
  	/* Clear status bits for ERR_NONFATAL errors only */

pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, );
@@ -265,10 +265,7 @@ int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
status &= ~sev;
if (status)
pci_write_c

[PATCH 3/3] PCI/AER: Use pci_aer_raw_clear_status() to clear root port's AER error status

2022-09-01 Thread Zhuo Chen
Statements clearing AER error status in aer_enable_rootport() has the
same function as pci_aer_raw_clear_status(). So we replace them, which
has no functional changes.

Signed-off-by: Zhuo Chen 
---
 drivers/pci/pcie/aer.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index d2996afa80f6..eb0193f279f2 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1287,12 +1287,7 @@ static void aer_enable_rootport(struct aer_rpc *rpc)
   SYSTEM_ERROR_INTR_ON_MESG_MASK);
 
/* Clear error status */
-   pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, );
-   pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, reg32);
-   pci_read_config_dword(pdev, aer + PCI_ERR_COR_STATUS, );
-   pci_write_config_dword(pdev, aer + PCI_ERR_COR_STATUS, reg32);
-   pci_read_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, );
-   pci_write_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, reg32);
+   pci_aer_raw_clear_status(pdev);
 
/*
 * Enable error reporting for the root port device and downstream port
-- 
2.30.1 (Apple Git-130)



[PATCH 2/3] PCI/ERR: Clear fatal status in pcie_do_recovery()

2022-09-01 Thread Zhuo Chen
When state is pci_channel_io_frozen in pcie_do_recovery(),
the severity is fatal and fatal status should be cleared.
So we add pci_aer_clear_fatal_status().

Since pcie_aer_is_native() in pci_aer_clear_fatal_status()
and pci_aer_clear_nonfatal_status() contains the function of
'if (host->native_aer || pcie_ports_native)', so we move them
out of it.

Signed-off-by: Zhuo Chen 
---
 drivers/pci/pcie/err.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 0c5a143025af..e0a8ade4c3fe 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -243,10 +243,14 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 * it is responsible for clearing this status.  In that case, the
 * signaling device may not even be visible to the OS.
 */
-   if (host->native_aer || pcie_ports_native) {
+   if (host->native_aer || pcie_ports_native)
pcie_clear_device_status(dev);
+
+   if (state == pci_channel_io_frozen)
+   pci_aer_clear_fatal_status(dev);
+   else
pci_aer_clear_nonfatal_status(dev);
-   }
+
pci_info(bridge, "device recovery successful\n");
return status;
 
-- 
2.30.1 (Apple Git-130)



[PATCH 1/3] PCI/AER: Use pci_aer_clear_uncorrect_error_status() to clear uncorrectable error status

2022-09-01 Thread Zhuo Chen
Status bits for ERR_NONFATAL errors only are cleared in
pci_aer_clear_nonfatal_status(), but we want clear uncorrectable
error status in ntb_hw_idt.c and lpfc_attr.c. So we add
pci_aer_clear_uncorrect_error_status() and change to use it.

Use pci_aer_clear_nonfatal_status() in dpc_process_error(), which has
no functional changes.

Since pci_aer_clear_nonfatal_status() is used only internally, move
its declaration to the PCI internal header file. Also, no one cares
about return value of pci_aer_clear_nonfatal_status(), so make it void.

Signed-off-by: Zhuo Chen 
---
 drivers/ntb/hw/idt/ntb_hw_idt.c |  4 ++--
 drivers/pci/pci.h   |  2 ++
 drivers/pci/pcie/aer.c  | 23 ++-
 drivers/pci/pcie/dpc.c  |  3 +--
 drivers/scsi/lpfc/lpfc_attr.c   |  4 ++--
 include/linux/aer.h |  4 ++--
 6 files changed, 27 insertions(+), 13 deletions(-)

diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
index 733557231ed0..de1dbbc5b9de 100644
--- a/drivers/ntb/hw/idt/ntb_hw_idt.c
+++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
@@ -2657,8 +2657,8 @@ static int idt_init_pci(struct idt_ntb_dev *ndev)
ret = pci_enable_pcie_error_reporting(pdev);
if (ret != 0)
dev_warn(>dev, "PCIe AER capability disabled\n");
-   else /* Cleanup nonfatal error status before getting to init */
-   pci_aer_clear_nonfatal_status(pdev);
+   else /* Cleanup uncorrectable error status before getting to init */
+   pci_aer_clear_uncorrect_error_status(pdev);
 
/* First enable the PCI device */
ret = pcim_enable_device(pdev);
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index e10cdec6c56e..574176f43025 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -686,6 +686,7 @@ void pci_aer_init(struct pci_dev *dev);
 void pci_aer_exit(struct pci_dev *dev);
 extern const struct attribute_group aer_stats_attr_group;
 void pci_aer_clear_fatal_status(struct pci_dev *dev);
+void pci_aer_clear_nonfatal_status(struct pci_dev *dev);
 int pci_aer_clear_status(struct pci_dev *dev);
 int pci_aer_raw_clear_status(struct pci_dev *dev);
 #else
@@ -693,6 +694,7 @@ static inline void pci_no_aer(void) { }
 static inline void pci_aer_init(struct pci_dev *d) { }
 static inline void pci_aer_exit(struct pci_dev *d) { }
 static inline void pci_aer_clear_fatal_status(struct pci_dev *dev) { }
+static inline void pci_aer_clear_nonfatal_status(struct pci_dev *dev) { }
 static inline int pci_aer_clear_status(struct pci_dev *dev) { return -EINVAL; }
 static inline int pci_aer_raw_clear_status(struct pci_dev *dev) { return 
-EINVAL; }
 #endif
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 7952e5efd6cf..d2996afa80f6 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -251,13 +251,13 @@ int pci_disable_pcie_error_reporting(struct pci_dev *dev)
 }
 EXPORT_SYMBOL_GPL(pci_disable_pcie_error_reporting);
 
-int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
+void pci_aer_clear_nonfatal_status(struct pci_dev *dev)
 {
int aer = dev->aer_cap;
u32 status, sev;
 
if (!pcie_aer_is_native(dev))
-   return -EIO;
+   return;
 
/* Clear status bits for ERR_NONFATAL errors only */
pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, );
@@ -265,10 +265,7 @@ int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
status &= ~sev;
if (status)
pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
-
-   return 0;
 }
-EXPORT_SYMBOL_GPL(pci_aer_clear_nonfatal_status);
 
 void pci_aer_clear_fatal_status(struct pci_dev *dev)
 {
@@ -286,6 +283,22 @@ void pci_aer_clear_fatal_status(struct pci_dev *dev)
pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
 }
 
+int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev)
+{
+   int aer = dev->aer_cap;
+   u32 status;
+
+   if (!pcie_aer_is_native(dev))
+   return -EIO;
+
+   pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, );
+   if (status)
+   pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(pci_aer_clear_uncorrect_error_status);
+
 /**
  * pci_aer_raw_clear_status - Clear AER error registers.
  * @dev: the PCI device
diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index 3e9afee02e8d..7942073fbb34 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -288,8 +288,7 @@ void dpc_process_error(struct pci_dev *pdev)
 dpc_get_aer_uncorrect_severity(pdev, ) &&
 aer_get_device_error_info(pdev, )) {
aer_print_error(pdev, );
-   pci_aer_clear_nonfatal_status(pdev);
-   pci_aer_clear_fatal_status(pdev);
+   pci_aer_clear_uncorrect_error_status(pdev);
}
 }
 
diff --git a/drivers/scsi/lpfc/

[PATCH 0/3] PCI/AER: Fix and optimize usage of status clear api

2022-09-01 Thread Zhuo Chen
Hello,

This series contains some fixes and optimizations of aer api usage.
We add some process to clear uncorrectable error status, then add
distinction between fatal and nonfatal situations in pcie_do_recovery()
and reduce some redundant code. The series involves pci driver and
vendor driver.

Thanks,
Zhuo Chen

Zhuo Chen (3):
  PCI/AER: Use pci_aer_clear_uncorrect_error_status() to clear
uncorrectable error status
  PCI/ERR: Clear fatal status in pcie_do_recovery()
  PCI/AER: Use pci_aer_raw_clear_status() to clear root port's AER error
status

 drivers/ntb/hw/idt/ntb_hw_idt.c |  4 ++--
 drivers/pci/pci.h   |  2 ++
 drivers/pci/pcie/aer.c  | 30 +++---
 drivers/pci/pcie/dpc.c  |  3 +--
 drivers/pci/pcie/err.c  |  8 ++--
 drivers/scsi/lpfc/lpfc_attr.c   |  4 ++--
 include/linux/aer.h |  4 ++--
 7 files changed, 34 insertions(+), 21 deletions(-)

-- 
2.30.1 (Apple Git-130)



Re: [External] Re: [PATCH v3] PCI/ERR: Use pcie_aer_is_native() to judge whether OS owns AER

2022-08-12 Thread Zhuo Chen




On 8/3/22 6:18 AM, Sathyanarayanan Kuppuswamy wrote:



On 7/27/22 2:37 AM, Zhuo Chen wrote:



Do you mean changing "if ((host->native_aer || pcie_ports_native) && aer)" into "if 
(pcie_aer_is_native(dev) && aer)" ?
I thought changing into "if (pcie_aer_is_native(dev))" before.

One another doubt. Not every pci device support aer. When dev->aer_cap is NULL and 
root->aer_cap is not NULL in aer_root_reset(), pcie_aer_is_native() will return false and OS 
cannot operate root register. It's different from just using "(host->native_aer || 
pcie_ports_native)".

Or we can change "if ((host->native_aer || pcie_ports_native) && aer)" into "if 
(pcie_aer_is_native(root))". But in this way, argument NULL pointer check should be added in 
pcie_aer_is_native().


Looking into it again, I think it is better to leave it as it is. Please ignore 
my comment.



Thanks! Is there anything else to improve and what's next for
the patch v3 ?

--
Thanks,
Zhuo Chen


[PATCH v4] PCI/ERR: Use pcie_aer_is_native() to judge whether OS owns AER

2022-08-01 Thread Zhuo Chen
Use pcie_aer_is_native() in place of "host->native_aer ||
pcie_ports_native" to judge whether OS has native control of AER
in aer_root_reset() and pcie_do_recovery().

Replace "dev->aer_cap && (pcie_ports_native || host->native_aer)" in
get_port_device_capability() with pcie_aer_is_native(), which has no
functional changes.

Signed-off-by: Zhuo Chen 
---
Changelog:
v4:
- Use pcie_aer_is_native() instead in aer_root_reset().
v3:
- Simplify why we use pcie_aer_is_native().
- Revert modification of pci_aer_clear_nonfatal_status() and comments.
v2:
- Add details and note in commit log.
---
 drivers/pci/pcie/aer.c  | 5 ++---
 drivers/pci/pcie/err.c  | 3 +--
 drivers/pci/pcie/portdrv_core.c | 3 +--
 3 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 7952e5efd6cf..796810c49008 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1383,7 +1383,6 @@ static pci_ers_result_t aer_root_reset(struct pci_dev 
*dev)
int type = pci_pcie_type(dev);
struct pci_dev *root;
int aer;
-   struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
u32 reg32;
int rc;
 
@@ -1404,7 +1403,7 @@ static pci_ers_result_t aer_root_reset(struct pci_dev 
*dev)
 */
aer = root ? root->aer_cap : 0;
 
-   if ((host->native_aer || pcie_ports_native) && aer) {
+   if (pcie_aer_is_native(dev) && aer) {
/* Disable Root's interrupt in response to error messages */
pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, );
reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
@@ -1423,7 +1422,7 @@ static pci_ers_result_t aer_root_reset(struct pci_dev 
*dev)
pci_is_root_bus(dev->bus) ? "Root" : "Downstream", rc);
}
 
-   if ((host->native_aer || pcie_ports_native) && aer) {
+   if (pcie_aer_is_native(dev) && aer) {
/* Clear Root Error Status */
pci_read_config_dword(root, aer + PCI_ERR_ROOT_STATUS, );
pci_write_config_dword(root, aer + PCI_ERR_ROOT_STATUS, reg32);
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 0c5a143025af..121a53338e44 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -184,7 +184,6 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
int type = pci_pcie_type(dev);
struct pci_dev *bridge;
pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
-   struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
 
/*
 * If the error was detected by a Root Port, Downstream Port, RCEC,
@@ -243,7 +242,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 * it is responsible for clearing this status.  In that case, the
 * signaling device may not even be visible to the OS.
 */
-   if (host->native_aer || pcie_ports_native) {
+   if (pcie_aer_is_native(dev)) {
pcie_clear_device_status(dev);
pci_aer_clear_nonfatal_status(dev);
}
diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index 604feeb84ee4..98c18f4a01b2 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -221,8 +221,7 @@ static int get_port_device_capability(struct pci_dev *dev)
}
 
 #ifdef CONFIG_PCIEAER
-   if (dev->aer_cap && pci_aer_available() &&
-   (pcie_ports_native || host->native_aer)) {
+   if (pcie_aer_is_native(dev) && pci_aer_available()) {
services |= PCIE_PORT_SERVICE_AER;
 
/*
-- 
2.30.1 (Apple Git-130)



Re: [PATCH v3] PCI/ERR: Use pcie_aer_is_native() to judge whether OS owns AER

2022-07-27 Thread Zhuo Chen




On 7/26/22 1:35 PM, Zhuo Chen wrote:


On 7/26/22 9:02 PM, Sathyanarayanan Kuppuswamy wrote:



On 7/26/22 8:53 PM, Zhuo Chen wrote:

Use pcie_aer_is_native() in place of "host->native_aer ||
pcie_ports_native" to judge whether OS has native control of AER
in pcie_do_recovery().

Replace "dev->aer_cap && (pcie_ports_native || host->native_aer)" in
get_port_device_capability() with pcie_aer_is_native(), which has no
functional changes.

Signed-off-by: Zhuo Chen 
---


Patch looks better now. It looks like following two changes
can also be replaced with pcie_aer_is_native() check.

drivers/pci/pcie/aer.c:1407:    if ((host->native_aer || 
pcie_ports_native) && aer) {
drivers/pci/pcie/aer.c:1426:    if ((host->native_aer || 
pcie_ports_native) && aer) {


Good advice. But I wonder is there a scenario that dev->rcec ("root") is 
NULL meanwhile dev->aer_cap is not NULL? If so, replace 
"(host->native_aer || pcie_ports_native) && aer" with 
pcie_aer_is_native() will change original function.


Do you mean changing "if ((host->native_aer || pcie_ports_native) && 
aer)" into "if (pcie_aer_is_native(dev) && aer)" ?

I thought changing into "if (pcie_aer_is_native(dev))" before.

One another doubt. Not every pci device support aer. When dev->aer_cap 
is NULL and root->aer_cap is not NULL in aer_root_reset(), 
pcie_aer_is_native() will return false and OS cannot operate root 
register. It's different from just using "(host->native_aer || 
pcie_ports_native)".


Or we can change "if ((host->native_aer || pcie_ports_native) && aer)" 
into "if (pcie_aer_is_native(root))". But in this way, argument NULL 
pointer check should be added in pcie_aer_is_native().







Changelog:
v3:
- Simplify why we use pcie_aer_is_native().
- Revert modification of pci_aer_clear_nonfatal_status() and comments.
v2:
- Add details and note in commit log.
---
  drivers/pci/pcie/err.c  | 3 +--
  drivers/pci/pcie/portdrv_core.c | 3 +--
  2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 0c5a143025af..121a53338e44 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -184,7 +184,6 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev 
*dev,

  int type = pci_pcie_type(dev);
  struct pci_dev *bridge;
  pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
-    struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
  /*
   * If the error was detected by a Root Port, Downstream Port, 
RCEC,
@@ -243,7 +242,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev 
*dev,

   * it is responsible for clearing this status.  In that case, the
   * signaling device may not even be visible to the OS.
   */
-    if (host->native_aer || pcie_ports_native) {
+    if (pcie_aer_is_native(dev)) {
  pcie_clear_device_status(dev);
  pci_aer_clear_nonfatal_status(dev);
  }
diff --git a/drivers/pci/pcie/portdrv_core.c 
b/drivers/pci/pcie/portdrv_core.c

index 604feeb84ee4..98c18f4a01b2 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -221,8 +221,7 @@ static int get_port_device_capability(struct 
pci_dev *dev)

  }
  #ifdef CONFIG_PCIEAER
-    if (dev->aer_cap && pci_aer_available() &&
-    (pcie_ports_native || host->native_aer)) {
+    if (pcie_aer_is_native(dev) && pci_aer_available()) {
  services |= PCIE_PORT_SERVICE_AER;
  /*




Thanks,
Zhuo Chen


--
Thanks,
Zhuo Chen


[PATCH v3] PCI/ERR: Use pcie_aer_is_native() to judge whether OS owns AER

2022-07-26 Thread Zhuo Chen
Use pcie_aer_is_native() in place of "host->native_aer ||
pcie_ports_native" to judge whether OS has native control of AER
in pcie_do_recovery().

Replace "dev->aer_cap && (pcie_ports_native || host->native_aer)" in
get_port_device_capability() with pcie_aer_is_native(), which has no
functional changes.

Signed-off-by: Zhuo Chen 
---
Changelog:
v3:
- Simplify why we use pcie_aer_is_native().
- Revert modification of pci_aer_clear_nonfatal_status() and comments.
v2:
- Add details and note in commit log.
---
 drivers/pci/pcie/err.c  | 3 +--
 drivers/pci/pcie/portdrv_core.c | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 0c5a143025af..121a53338e44 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -184,7 +184,6 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
int type = pci_pcie_type(dev);
struct pci_dev *bridge;
pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
-   struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
 
/*
 * If the error was detected by a Root Port, Downstream Port, RCEC,
@@ -243,7 +242,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 * it is responsible for clearing this status.  In that case, the
 * signaling device may not even be visible to the OS.
 */
-   if (host->native_aer || pcie_ports_native) {
+   if (pcie_aer_is_native(dev)) {
pcie_clear_device_status(dev);
pci_aer_clear_nonfatal_status(dev);
}
diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index 604feeb84ee4..98c18f4a01b2 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -221,8 +221,7 @@ static int get_port_device_capability(struct pci_dev *dev)
}
 
 #ifdef CONFIG_PCIEAER
-   if (dev->aer_cap && pci_aer_available() &&
-   (pcie_ports_native || host->native_aer)) {
+   if (pcie_aer_is_native(dev) && pci_aer_available()) {
services |= PCIE_PORT_SERVICE_AER;
 
/*
-- 
2.30.1 (Apple Git-130)



[PATCH v2] PCI/ERR: Use pcie_aer_is_native() to judge whether OS owns AER

2022-07-25 Thread Zhuo Chen
The AER status of the device that reported the error rather than
the first downstream port is cleared after commit 7d7cbeaba5b7
("PCI/ERR: Clear status of the reporting device"). So "a bridge
may not exist" which commit aa344bc8b727 ("PCI/ERR: Clear AER
status only when we control AER") referring to is no longer
existent, and we just use pcie_aer_is_native() in stead of
"host->native_aer || pcie_ports_native".

pci_aer_clear_nonfatal_status() already has pcie_aer_is_native(),
so we move pci_aer_clear_nonfatal_status() out of
pcie_aer_is_native().

Replace statements that judge whether OS owns AER in
get_port_device_capability() with pcie_aer_is_native(), which has
no functional changes.

Signed-off-by: Zhuo Chen 
---
v2:
- Add details and note in commit log
---
 drivers/pci/pcie/err.c  | 12 ++--
 drivers/pci/pcie/portdrv_core.c |  3 +--
 2 files changed, 3 insertions(+), 12 deletions(-)

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 0c5a143025af..28339c741555 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -184,7 +184,6 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
int type = pci_pcie_type(dev);
struct pci_dev *bridge;
pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
-   struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
 
/*
 * If the error was detected by a Root Port, Downstream Port, RCEC,
@@ -237,16 +236,9 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
pci_dbg(bridge, "broadcast resume message\n");
pci_walk_bridge(bridge, report_resume, );
 
-   /*
-* If we have native control of AER, clear error status in the device
-* that detected the error.  If the platform retained control of AER,
-* it is responsible for clearing this status.  In that case, the
-* signaling device may not even be visible to the OS.
-*/
-   if (host->native_aer || pcie_ports_native) {
+   if (pcie_aer_is_native(dev))
pcie_clear_device_status(dev);
-   pci_aer_clear_nonfatal_status(dev);
-   }
+   pci_aer_clear_nonfatal_status(dev);
pci_info(bridge, "device recovery successful\n");
return status;
 
diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index 604feeb84ee4..98c18f4a01b2 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -221,8 +221,7 @@ static int get_port_device_capability(struct pci_dev *dev)
}
 
 #ifdef CONFIG_PCIEAER
-   if (dev->aer_cap && pci_aer_available() &&
-   (pcie_ports_native || host->native_aer)) {
+   if (pcie_aer_is_native(dev) && pci_aer_available()) {
services |= PCIE_PORT_SERVICE_AER;
 
/*
-- 
2.30.1 (Apple Git-130)



[PATCH] PCI/ERR: Use pcie_aer_is_native() to judge whether OS owns AER

2022-07-25 Thread Zhuo Chen
After commit 7d7cbeaba5b7 ("PCI/ERR: Clear status of the reporting
device"), the AER status of the device that reported the error
rather than the first downstream port is cleared. So the problem
in commit aa344bc8b727 ("PCI/ERR: Clear AER status only when we
control AER") is no longer existent, and we change to use
pcie_aer_is_native() here.

pci_aer_clear_nonfatal_status() already has pcie_aer_is_native(),
so we move pci_aer_clear_nonfatal_status() out of
pcie_aer_is_native().

Replace statements that judge whether OS owns AER in
get_port_device_capability() with pcie_aer_is_native().

Signed-off-by: Zhuo Chen 
---
 drivers/pci/pcie/err.c  | 12 ++--
 drivers/pci/pcie/portdrv_core.c |  3 +--
 2 files changed, 3 insertions(+), 12 deletions(-)

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 0c5a143025af..28339c741555 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -184,7 +184,6 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
int type = pci_pcie_type(dev);
struct pci_dev *bridge;
pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
-   struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
 
/*
 * If the error was detected by a Root Port, Downstream Port, RCEC,
@@ -237,16 +236,9 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
pci_dbg(bridge, "broadcast resume message\n");
pci_walk_bridge(bridge, report_resume, );
 
-   /*
-* If we have native control of AER, clear error status in the device
-* that detected the error.  If the platform retained control of AER,
-* it is responsible for clearing this status.  In that case, the
-* signaling device may not even be visible to the OS.
-*/
-   if (host->native_aer || pcie_ports_native) {
+   if (pcie_aer_is_native(dev))
pcie_clear_device_status(dev);
-   pci_aer_clear_nonfatal_status(dev);
-   }
+   pci_aer_clear_nonfatal_status(dev);
pci_info(bridge, "device recovery successful\n");
return status;
 
diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index 604feeb84ee4..98c18f4a01b2 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -221,8 +221,7 @@ static int get_port_device_capability(struct pci_dev *dev)
}
 
 #ifdef CONFIG_PCIEAER
-   if (dev->aer_cap && pci_aer_available() &&
-   (pcie_ports_native || host->native_aer)) {
+   if (pcie_aer_is_native(dev) && pci_aer_available()) {
services |= PCIE_PORT_SERVICE_AER;
 
/*
-- 
2.30.1 (Apple Git-130)