[PATCH v2 0/5] Expose PCIe AER stats via sysfs

2018-05-23 Thread Rajat Jain
This patchset exposes the AER stats via the sysfs attributes.

Patchset v2 has minor changes to v1 based on the review comments,
no functional change.
Primarily:
 * Fix license header
 * Use tabs instead of spaces
 * Remove use of unlikely() etc
 * Move documentation to Documentation/ABI/

Rajat Jain (5):
  PCI/AER: Define and allocate aer_stats structure for AER capable
devices
  PCI/AER: Add sysfs stats for AER capable devices
  PCI/AER: Add sysfs attributes to provide breakdown of AERs
  PCI/AER: Add sysfs attributes for rootport cumulative stats
  Documentation/ABI: Add details of PCI AER statistics

 .../testing/sysfs-bus-pci-devices-aer_stats   | 103 ++
 Documentation/PCI/pcieaer-howto.txt   |   5 +
 drivers/pci/pci-sysfs.c   |   3 +
 drivers/pci/pci.h |   4 +-
 drivers/pci/pcie/aer/Makefile |   2 +-
 drivers/pci/pcie/aer/aerdrv.h |  15 ++
 drivers/pci/pcie/aer/aerdrv_core.c|  11 +
 drivers/pci/pcie/aer/aerdrv_errprint.c|   7 +-
 drivers/pci/pcie/aer/aerdrv_stats.c   | 192 ++
 drivers/pci/probe.c   |   1 +
 include/linux/pci.h   |   3 +
 11 files changed, 342 insertions(+), 4 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats
 create mode 100644 drivers/pci/pcie/aer/aerdrv_stats.c

-- 
2.17.0.441.gb46fe60e1d-goog

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 4/5] PCI/AER: Add sysfs attributes for rootport cumulative stats

2018-05-23 Thread Rajat Jain
Add sysfs attributes for rootport statistics (that are cumulative
of all the ERR_* messages seen on this PCI hierarchy).

Signed-off-by: Rajat Jain 
---
v2: same as v1

 drivers/pci/pcie/aer/aerdrv.h   |  2 ++
 drivers/pci/pcie/aer/aerdrv_core.c  |  2 ++
 drivers/pci/pcie/aer/aerdrv_stats.c | 31 +
 3 files changed, 35 insertions(+)

diff --git a/drivers/pci/pcie/aer/aerdrv.h b/drivers/pci/pcie/aer/aerdrv.h
index 048fbd7c9633..77d831d9 100644
--- a/drivers/pci/pcie/aer/aerdrv.h
+++ b/drivers/pci/pcie/aer/aerdrv.h
@@ -88,6 +88,8 @@ irqreturn_t aer_irq(int irq, void *context);
 int pci_aer_stats_init(struct pci_dev *pdev);
 void pci_aer_stats_exit(struct pci_dev *pdev);
 void pci_dev_aer_stats_incr(struct pci_dev *pdev, struct aer_err_info *info);
+void pci_rootport_aer_stats_incr(struct pci_dev *pdev,
+struct aer_err_source *e_src);
 
 extern const char
 *aer_correctable_error_string[AER_MAX_TYPEOF_CORRECTABLE_ERRS];
diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
index 42a6f913069a..0f70e22563f3 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -424,6 +424,8 @@ static void aer_isr_one_error(struct pcie_device *p_device,
struct aer_rpc *rpc = get_service_data(p_device);
struct aer_err_info *e_info = &rpc->e_info;
 
+   pci_rootport_aer_stats_incr(p_device->port, e_src);
+
/*
 * There is a possibility that both correctable error and
 * uncorrectable error being logged. Report correctable error first.
diff --git a/drivers/pci/pcie/aer/aerdrv_stats.c b/drivers/pci/pcie/aer/aerdrv_stats.c
index e47321b267f6..898c9bc02ec2 100644
--- a/drivers/pci/pcie/aer/aerdrv_stats.c
+++ b/drivers/pci/pcie/aer/aerdrv_stats.c
@@ -57,6 +57,9 @@ static DEVICE_ATTR_RO(field)
 aer_stats_aggregate_attr(dev_total_cor_errs);
 aer_stats_aggregate_attr(dev_total_fatal_errs);
 aer_stats_aggregate_attr(dev_total_nonfatal_errs);
+aer_stats_aggregate_attr(rootport_total_cor_errs);
+aer_stats_aggregate_attr(rootport_total_fatal_errs);
+aer_stats_aggregate_attr(rootport_total_nonfatal_errs);
 
 #define aer_stats_breakdown_attr(field, stats_array, strings_array)\
static ssize_t  \
@@ -90,6 +93,9 @@ static struct attribute *aer_stats_attrs[] __ro_after_init = {
&dev_attr_dev_total_nonfatal_errs.attr,
&dev_attr_dev_breakdown_correctable.attr,
&dev_attr_dev_breakdown_uncorrectable.attr,
+   &dev_attr_rootport_total_cor_errs.attr,
+   &dev_attr_rootport_total_fatal_errs.attr,
+   &dev_attr_rootport_total_nonfatal_errs.attr,
NULL
 };
 
@@ -102,6 +108,12 @@ static umode_t aer_stats_attrs_are_visible(struct kobject *kobj,
if (!pdev->aer_stats)
return 0;
 
+   if ((a == &dev_attr_rootport_total_cor_errs.attr ||
+a == &dev_attr_rootport_total_fatal_errs.attr ||
+a == &dev_attr_rootport_total_nonfatal_errs.attr) &&
+   pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT)
+   return 0;
+
return a->mode;
 }
 
@@ -144,6 +156,25 @@ void pci_dev_aer_stats_incr(struct pci_dev *pdev, struct aer_err_info *info)
counter[i]++;
 }
 
+void pci_rootport_aer_stats_incr(struct pci_dev *pdev,
+struct aer_err_source *e_src)
+{
+   struct aer_stats *aer_stats = pdev->aer_stats;
+
+   if (!aer_stats)
+   return;
+
+   if (e_src->status & PCI_ERR_ROOT_COR_RCV)
+   aer_stats->rootport_total_cor_errs++;
+
+   if (e_src->status & PCI_ERR_ROOT_UNCOR_RCV) {
+   if (e_src->status & PCI_ERR_ROOT_FATAL_RCV)
+   aer_stats->rootport_total_fatal_errs++;
+   else
+   aer_stats->rootport_total_nonfatal_errs++;
+   }
+}
+
 int pci_aer_stats_init(struct pci_dev *pdev)
 {
pdev->aer_stats = kzalloc(sizeof(struct aer_stats), GFP_KERNEL);
-- 
2.17.0.441.gb46fe60e1d-goog



[PATCH v2 1/5] PCI/AER: Define and allocate aer_stats structure for AER capable devices

2018-05-23 Thread Rajat Jain
Define a structure to hold the AER statistics. There are 2 groups
of statistics: dev_* counters that are to be collected for all AER
capable devices and rootport_* counters that are collected for all
(AER capable) rootports only. Allocate and free this structure when
device is added or released (thus counters survive the lifetime of the
device).

Add a new file aerdrv_stats.c to hold the AER stats collection logic.

Signed-off-by: Rajat Jain 
---
v2: Fix the license header as per Greg's suggestions
(Since there is disagreement with using "//" vs "/* */" for license
 I decided to keep the one preferred by Linus, also used by others
 in this directory)

 drivers/pci/pcie/aer/Makefile   |  2 +-
 drivers/pci/pcie/aer/aerdrv.h   |  6 +++
 drivers/pci/pcie/aer/aerdrv_core.c  |  9 +
 drivers/pci/pcie/aer/aerdrv_stats.c | 61 +
 drivers/pci/probe.c |  1 +
 include/linux/pci.h |  3 ++
 6 files changed, 81 insertions(+), 1 deletion(-)
 create mode 100644 drivers/pci/pcie/aer/aerdrv_stats.c

diff --git a/drivers/pci/pcie/aer/Makefile b/drivers/pci/pcie/aer/Makefile
index 09bd890875a3..a06f9cc2bde5 100644
--- a/drivers/pci/pcie/aer/Makefile
+++ b/drivers/pci/pcie/aer/Makefile
@@ -7,7 +7,7 @@ obj-$(CONFIG_PCIEAER) += aerdriver.o
 
 obj-$(CONFIG_PCIE_ECRC)+= ecrc.o
 
-aerdriver-objs := aerdrv_errprint.o aerdrv_core.o aerdrv.o
+aerdriver-objs := aerdrv_errprint.o aerdrv_core.o aerdrv.o aerdrv_stats.o
 aerdriver-$(CONFIG_ACPI) += aerdrv_acpi.o
 
 obj-$(CONFIG_PCIEAER_INJECT) += aer_inject.o
diff --git a/drivers/pci/pcie/aer/aerdrv.h b/drivers/pci/pcie/aer/aerdrv.h
index b4c950683cc7..d8b9fba536ed 100644
--- a/drivers/pci/pcie/aer/aerdrv.h
+++ b/drivers/pci/pcie/aer/aerdrv.h
@@ -33,6 +33,10 @@
PCI_ERR_UNC_MALF_TLP)
 
 #define AER_MAX_MULTI_ERR_DEVICES  5   /* Not likely to have more */
+
+#define AER_MAX_TYPEOF_CORRECTABLE_ERRS 16 /* as per PCI_ERR_COR_STATUS */
+#define AER_MAX_TYPEOF_UNCORRECTABLE_ERRS 26   /* as per PCI_ERR_UNCOR_STATUS*/
+
 struct aer_err_info {
struct pci_dev *dev[AER_MAX_MULTI_ERR_DEVICES];
int error_dev_num;
@@ -81,6 +85,8 @@ void aer_isr(struct work_struct *work);
 void aer_print_error(struct pci_dev *dev, struct aer_err_info *info);
 void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info);
 irqreturn_t aer_irq(int irq, void *context);
+int pci_aer_stats_init(struct pci_dev *pdev);
+void pci_aer_stats_exit(struct pci_dev *pdev);
 
 #ifdef CONFIG_ACPI_APEI
 int pcie_aer_get_firmware_first(struct pci_dev *pci_dev);
diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
index 36e622d35c48..42a6f913069a 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -95,9 +95,18 @@ int pci_cleanup_aer_error_status_regs(struct pci_dev *dev)
 int pci_aer_init(struct pci_dev *dev)
 {
dev->aer_cap = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);
+
+   if (!dev->aer_cap || pci_aer_stats_init(dev))
+   return -EIO;
+
return pci_cleanup_aer_error_status_regs(dev);
 }
 
+void pci_aer_exit(struct pci_dev *dev)
+{
+   pci_aer_stats_exit(dev);
+}
+
 /**
  * add_error_device - list device to be handled
  * @e_info: pointer to error info
diff --git a/drivers/pci/pcie/aer/aerdrv_stats.c b/drivers/pci/pcie/aer/aerdrv_stats.c
new file mode 100644
index ..2f48d6bc81f1
--- /dev/null
+++ b/drivers/pci/pcie/aer/aerdrv_stats.c
@@ -0,0 +1,61 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2018 Google Inc, All Rights Reserved.
+ *
+ * Rajat Jain (raja...@google.com)
+ *
+ * AER Statistics - exposed to userspace via /sysfs attributes.
+ */
+
+#include 
+#include "aerdrv.h"
+
+/* AER stats for the device */
+struct aer_stats {
+
+   /*
+* Fields for all AER capable devices. They indicate the errors
+* "as seen by this device". Note that this may mean that if an
+* end point is causing problems, the AER counters may increment
+* at its link partner (e.g. root port) because the errors will be
+* "seen" by the link partner and not the problematic end point
+* itself (which may report all counters as 0 as it never saw any
+* problems).
+*/
+   /* Individual counters for different type of correctable errors */
+   u64 dev_cor_errs[AER_MAX_TYPEOF_CORRECTABLE_ERRS];
+   /* Individual counters for different type of uncorrectable errors */
+   u64 dev_uncor_errs[AER_MAX_TYPEOF_UNCORRECTABLE_ERRS];
+   /* Total number of correctable errors seen by this device */
+   u64 dev_total_cor_errs;
+   /* Total number of fatal uncorrectable errors seen by this device */
+   u64 dev_total_fatal_errs;
+   /* Total number of non-fatal uncorrectable errors seen by this device */
+   u6

[PATCH v2 3/5] PCI/AER: Add sysfs attributes to provide breakdown of AERs

2018-05-23 Thread Rajat Jain
Add sysfs attributes to provide a breakdown of the AERs seen,
into different types of correctable or uncorrectable errors:

dev_breakdown_correctable
dev_breakdown_uncorrectable

Signed-off-by: Rajat Jain 
---
v2: Use tabs instead of spaces, fix the subject, and print
all non zero counters.

 drivers/pci/pcie/aer/aerdrv.h  |  6 ++
 drivers/pci/pcie/aer/aerdrv_errprint.c |  6 --
 drivers/pci/pcie/aer/aerdrv_stats.c| 28 ++
 3 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pcie/aer/aerdrv.h b/drivers/pci/pcie/aer/aerdrv.h
index b5d5ad6f2c03..048fbd7c9633 100644
--- a/drivers/pci/pcie/aer/aerdrv.h
+++ b/drivers/pci/pcie/aer/aerdrv.h
@@ -89,6 +89,12 @@ int pci_aer_stats_init(struct pci_dev *pdev);
 void pci_aer_stats_exit(struct pci_dev *pdev);
 void pci_dev_aer_stats_incr(struct pci_dev *pdev, struct aer_err_info *info);
 
+extern const char
+*aer_correctable_error_string[AER_MAX_TYPEOF_CORRECTABLE_ERRS];
+
+extern const char
+*aer_uncorrectable_error_string[AER_MAX_TYPEOF_UNCORRECTABLE_ERRS];
+
 #ifdef CONFIG_ACPI_APEI
 int pcie_aer_get_firmware_first(struct pci_dev *pci_dev);
 #else
diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c b/drivers/pci/pcie/aer/aerdrv_errprint.c
index 5e8b98deda08..5585f309f1a8 100644
--- a/drivers/pci/pcie/aer/aerdrv_errprint.c
+++ b/drivers/pci/pcie/aer/aerdrv_errprint.c
@@ -68,7 +68,8 @@ static const char *aer_error_layer[] = {
"Transaction Layer"
 };
 
-static const char *aer_correctable_error_string[] = {
+const char
+*aer_correctable_error_string[AER_MAX_TYPEOF_CORRECTABLE_ERRS] = {
"Receiver Error",   /* Bit Position 0   */
NULL,
NULL,
@@ -87,7 +88,8 @@ static const char *aer_correctable_error_string[] = {
"Header Log Overflow",  /* Bit Position 15  */
 };
 
-static const char *aer_uncorrectable_error_string[] = {
+const char
+*aer_uncorrectable_error_string[AER_MAX_TYPEOF_UNCORRECTABLE_ERRS] = {
"Undefined",/* Bit Position 0   */
NULL,
NULL,
diff --git a/drivers/pci/pcie/aer/aerdrv_stats.c b/drivers/pci/pcie/aer/aerdrv_stats.c
index beffef2b..e47321b267f6 100644
--- a/drivers/pci/pcie/aer/aerdrv_stats.c
+++ b/drivers/pci/pcie/aer/aerdrv_stats.c
@@ -58,10 +58,38 @@ aer_stats_aggregate_attr(dev_total_cor_errs);
 aer_stats_aggregate_attr(dev_total_fatal_errs);
 aer_stats_aggregate_attr(dev_total_nonfatal_errs);
 
+#define aer_stats_breakdown_attr(field, stats_array, strings_array)\
+   static ssize_t  \
+   field##_show(struct device *dev, struct device_attribute *attr, \
+char *buf) \
+{  \
+   unsigned int i; \
+   char *str = buf;\
+   struct pci_dev *pdev = to_pci_dev(dev); \
+   u64 *stats = pdev->aer_stats->stats_array;  \
+   for (i = 0; i < ARRAY_SIZE(strings_array); i++) {   \
+   if (strings_array[i])   \
+   str += sprintf(str, "%s = 0x%llx\n",\
+  strings_array[i], stats[i]); \
+   else if (stats[i])  \
+   str += sprintf(str, #stats_array "bit[%d] = 0x%llx\n",\
+  i, stats[i]);\
+   }   \
+   return str-buf; \
+}  \
+static DEVICE_ATTR_RO(field)
+
+aer_stats_breakdown_attr(dev_breakdown_correctable, dev_cor_errs,
+aer_correctable_error_string);
+aer_stats_breakdown_attr(dev_breakdown_uncorrectable, dev_uncor_errs,
+aer_uncorrectable_error_string);
+
 static struct attribute *aer_stats_attrs[] __ro_after_init = {
&dev_attr_dev_total_cor_errs.attr,
&dev_attr_dev_total_fatal_errs.attr,
&dev_attr_dev_total_nonfatal_errs.attr,
+   &dev_attr_dev_breakdown_correctable.attr,
+   &dev_attr_dev_breakdown_uncorrectable.attr,
NULL
 };
 
-- 
2.17.0.441.gb46fe60e1d-goog



[PATCH v2 5/5] Documentation/ABI: Add details of PCI AER statistics

2018-05-23 Thread Rajat Jain
Add the PCI AER statistics details to
Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats
and provide a pointer to it in
Documentation/PCI/pcieaer-howto.txt

Signed-off-by: Rajat Jain 
---
v2: Move the documentation to Documentation/ABI/

 .../testing/sysfs-bus-pci-devices-aer_stats   | 103 ++
 Documentation/PCI/pcieaer-howto.txt   |   5 +
 2 files changed, 108 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats

diff --git a/Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats b/Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats
new file mode 100644
index ..f55c389290ac
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats
@@ -0,0 +1,103 @@
+==
+PCIe Device AER statistics
+==
+These attributes show up under all the devices that are AER capable. These
+statistical counters indicate the errors "as seen/reported by the device".
+Note that this may mean that if an end point is causing problems, the AER
+counters may increment at its link partner (e.g. root port) because the
+errors will be "seen" / reported by the link partner and not the
+problematic end point itself (which may report all counters as 0 as it never
+saw any problems).
+
+Where: /sys/bus/pci/devices//aer_stats/dev_total_cor_errs
+Date:  May 2018
+Kernel Version: 4.17.0
+Contact:   linux-...@vger.kernel.org, raja...@google.com
+Description:   Total number of correctable errors seen and reported by this
+   PCI device using ERR_COR.
+
+Where: /sys/bus/pci/devices//aer_stats/dev_total_fatal_errs
+Date:  May 2018
+Kernel Version: 4.17.0
+Contact:   linux-...@vger.kernel.org, raja...@google.com
+Description:   Total number of uncorrectable fatal errors seen and reported
+   by this PCI device using ERR_FATAL.
+
+Where: /sys/bus/pci/devices//aer_stats/dev_total_nonfatal_errs
+Date:  May 2018
+Kernel Version: 4.17.0
+Contact:   linux-...@vger.kernel.org, raja...@google.com
+Description:   Total number of uncorrectable non-fatal errors seen and reported
+   by this PCI device using ERR_NONFATAL.
+
+Where: /sys/bus/pci/devices//aer_stats/dev_breakdown_correctable
+Date:  May 2018
+Kernel Version: 4.17.0
+Contact:   linux-...@vger.kernel.org, raja...@google.com
+Description:   Breakdown of correctable errors seen and reported by this
+   PCI device using ERR_COR. A sample result looks like this:
+-
+Receiver Error = 0x174
+Bad TLP = 0x19
+Bad DLLP = 0x3
+RELAY_NUM Rollover = 0x0
+Replay Timer Timeout = 0x1
+Advisory Non-Fatal = 0x0
+Corrected Internal Error = 0x0
+Header Log Overflow = 0x0
+-
+
+Where: /sys/bus/pci/devices//aer_stats/dev_breakdown_uncorrectable
+Date:  May 2018
+Kernel Version: 4.17.0
+Contact:   linux-...@vger.kernel.org, raja...@google.com
+Description:   Breakdown of uncorrectable errors seen and reported by this
+   PCI device using ERR_FATAL or ERR_NONFATAL. A sample result
+   looks like this:
+-
+Undefined = 0x0
+Data Link Protocol = 0x0
+Surprise Down Error = 0x0
+Poisoned TLP = 0x0
+Flow Control Protocol = 0x0
+Completion Timeout = 0x0
+Completer Abort = 0x0
+Unexpected Completion = 0x0
+Receiver Overflow = 0x0
+Malformed TLP = 0x0
+ECRC = 0x0
+Unsupported Request = 0x0
+ACS Violation = 0x0
+Uncorrectable Internal Error = 0x0
+MC Blocked TLP = 0x0
+AtomicOp Egress Blocked = 0x0
+TLP Prefix Blocked Error = 0x0
+-
+
+
+PCIe Rootport AER statistics
+
+These attributes show up only under the rootports that are AER capable. These
+indicate the number of error messages as "reported to" the rootport. Please note
+that the rootports also transmit (internally) the ERR_* messages for errors seen
+by the internal rootport PCI device, so these counters include them and are
+thus cumulative of all the error messages on the PCI hierarchy originating
+at that root port.
+
+Where: /sys/bus/pci/devices//aer_stats/rootport_total_cor_errs
+Date:  May 2018
+Kernel Version: 4.17.0
+Contact:   linux-...@vger.kernel.org, raja...@google.com
+Description:   Total number of ERR_COR messages reported to rootport.
+
+Where: /sys/bus/pci/devices//aer_stats/rootport_total_fatal_errs
+Date:  May 2018
+Kernel Version: 4.17.0
+Contact:   linux-...@vger.kernel.org, raja...@google.com
+Description:   Total number of ERR_FATAL messages reported to rootport.
+
+Where: /sys/bus/pci/devices//aer_stats/rootport_total_nonfatal_errs
+Date:  May 2018
+Kernel Version: 4.17.0
+Contact:   linux-...@vger.kernel.org, raja...@google.co

[PATCH v2 2/5] PCI/AER: Add sysfs stats for AER capable devices

2018-05-23 Thread Rajat Jain
Add the following AER sysfs stats to represent the counters for each
kind of error as seen by the device:

dev_total_cor_errs
dev_total_fatal_errs
dev_total_nonfatal_errs

Signed-off-by: Rajat Jain 
---
v2: Use tabs instead of spaces at the end of macro lines, and remove
the use of unlikely() as per Greg's suggestion.

 drivers/pci/pci-sysfs.c|  3 ++
 drivers/pci/pci.h  |  4 +-
 drivers/pci/pcie/aer/aerdrv.h  |  1 +
 drivers/pci/pcie/aer/aerdrv_errprint.c |  1 +
 drivers/pci/pcie/aer/aerdrv_stats.c| 72 ++
 5 files changed, 80 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 366d93af051d..730f985a3dc9 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -1743,6 +1743,9 @@ static const struct attribute_group *pci_dev_attr_groups[] = {
 #endif
&pci_bridge_attr_group,
&pcie_dev_attr_group,
+#ifdef CONFIG_PCIEAER
+   &aer_stats_attr_group,
+#endif
NULL,
 };
 
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index c358e7a07f3f..9a28ec600225 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -181,7 +181,9 @@ extern const struct attribute_group *pci_dev_groups[];
 extern const struct attribute_group *pcibus_groups[];
 extern const struct device_type pci_dev_type;
 extern const struct attribute_group *pci_bus_groups[];
-
+#ifdef CONFIG_PCIEAER
+extern const struct attribute_group aer_stats_attr_group;
+#endif
 
 /**
  * pci_match_one_device - Tell if a PCI device structure has a matching
diff --git a/drivers/pci/pcie/aer/aerdrv.h b/drivers/pci/pcie/aer/aerdrv.h
index d8b9fba536ed..b5d5ad6f2c03 100644
--- a/drivers/pci/pcie/aer/aerdrv.h
+++ b/drivers/pci/pcie/aer/aerdrv.h
@@ -87,6 +87,7 @@ void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info);
 irqreturn_t aer_irq(int irq, void *context);
 int pci_aer_stats_init(struct pci_dev *pdev);
 void pci_aer_stats_exit(struct pci_dev *pdev);
+void pci_dev_aer_stats_incr(struct pci_dev *pdev, struct aer_err_info *info);
 
 #ifdef CONFIG_ACPI_APEI
 int pcie_aer_get_firmware_first(struct pci_dev *pci_dev);
diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c b/drivers/pci/pcie/aer/aerdrv_errprint.c
index 21ca5e1b0ded..5e8b98deda08 100644
--- a/drivers/pci/pcie/aer/aerdrv_errprint.c
+++ b/drivers/pci/pcie/aer/aerdrv_errprint.c
@@ -155,6 +155,7 @@ static void __aer_print_error(struct pci_dev *dev,
pci_err(dev, "   [%2d] Unknown Error Bit%s\n",
i, info->first_error == i ? " (First)" : "");
}
+   pci_dev_aer_stats_incr(dev, info);
 }
 
 void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
diff --git a/drivers/pci/pcie/aer/aerdrv_stats.c b/drivers/pci/pcie/aer/aerdrv_stats.c
index 2f48d6bc81f1..beffef2b 100644
--- a/drivers/pci/pcie/aer/aerdrv_stats.c
+++ b/drivers/pci/pcie/aer/aerdrv_stats.c
@@ -44,6 +44,78 @@ struct aer_stats {
u64 rootport_total_nonfatal_errs;
 };
 
+#define aer_stats_aggregate_attr(field)                                \
+   static ssize_t  \
+   field##_show(struct device *dev, struct device_attribute *attr, \
+char *buf) \
+{  \
+   struct pci_dev *pdev = to_pci_dev(dev); \
+   return sprintf(buf, "0x%llx\n", pdev->aer_stats->field);\
+}  \
+static DEVICE_ATTR_RO(field)
+
+aer_stats_aggregate_attr(dev_total_cor_errs);
+aer_stats_aggregate_attr(dev_total_fatal_errs);
+aer_stats_aggregate_attr(dev_total_nonfatal_errs);
+
+static struct attribute *aer_stats_attrs[] __ro_after_init = {
+   &dev_attr_dev_total_cor_errs.attr,
+   &dev_attr_dev_total_fatal_errs.attr,
+   &dev_attr_dev_total_nonfatal_errs.attr,
+   NULL
+};
+
+static umode_t aer_stats_attrs_are_visible(struct kobject *kobj,
+  struct attribute *a, int n)
+{
+   struct device *dev = kobj_to_dev(kobj);
+   struct pci_dev *pdev = to_pci_dev(dev);
+
+   if (!pdev->aer_stats)
+   return 0;
+
+   return a->mode;
+}
+
+const struct attribute_group aer_stats_attr_group = {
+   .name  = "aer_stats",
+   .attrs  = aer_stats_attrs,
+   .is_visible = aer_stats_attrs_are_visible,
+};
+
+void pci_dev_aer_stats_incr(struct pci_dev *pdev, struct aer_err_info *info)
+{
+   int status, i, max = -1;
+   u64 *counter = NULL;
+   struct aer_stats *aer_stats = pdev->aer_stats;
+
+   if (!aer_stats)
+   return;
+
+   switch (info->severity) {
+   case AER_CORRECTABLE:
+   aer_sta

Re: [PATCH 2/5] PCI/AER: Add sysfs stats for AER capable devices

2018-05-22 Thread Rajat Jain
On Tue, May 22, 2018 at 3:50 PM, Alex G.  wrote:
>
>
> On 05/22/2018 05:28 PM, Rajat Jain wrote:
>> Add the following AER sysfs stats to represent the counters for each
>> kind of error as seen by the device:
>>
>> dev_total_cor_errs
>> dev_total_fatal_errs
>> dev_total_nonfatal_errs
>>
>> Signed-off-by: Rajat Jain 
>> ---
>>  drivers/pci/pci-sysfs.c|  3 ++
>>  drivers/pci/pci.h  |  4 +-
>>  drivers/pci/pcie/aer/aerdrv.h  |  1 +
>>  drivers/pci/pcie/aer/aerdrv_errprint.c |  1 +
>>  drivers/pci/pcie/aer/aerdrv_stats.c| 72 ++
>>  5 files changed, 80 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
>> index 366d93af051d..730f985a3dc9 100644
>> --- a/drivers/pci/pci-sysfs.c
>> +++ b/drivers/pci/pci-sysfs.c
>> @@ -1743,6 +1743,9 @@ static const struct attribute_group *pci_dev_attr_groups[] = {
>>  #endif
>>   &pci_bridge_attr_group,
>>   &pcie_dev_attr_group,
>> +#ifdef CONFIG_PCIEAER
>> + &aer_stats_attr_group,
>> +#endif
>>   NULL,
>>  };
>
> So if the device is removed as part of recovery, then these get reset,
> right? So if the device fails intermittently, these counters would keep
> getting reset. Is this the intent?

Umm, kind of.
One argument is that if a PCI device is removed and then
re-enumerated, how do we know it is the same device and has not been
replaced by another device, for example? Note that the root port
counters, which are cumulative over all the errors seen, will still
have them logged in the situation you describe.

>
> (snip)
>
>>  /**
>>   * pci_match_one_device - Tell if a PCI device structure has a matching
>> diff --git a/drivers/pci/pcie/aer/aerdrv.h b/drivers/pci/pcie/aer/aerdrv.h
>> index d8b9fba536ed..b5d5ad6f2c03 100644
>> --- a/drivers/pci/pcie/aer/aerdrv.h
>> +++ b/drivers/pci/pcie/aer/aerdrv.h
>> @@ -87,6 +87,7 @@ void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info);
>>  irqreturn_t aer_irq(int irq, void *context);
>>  int pci_aer_stats_init(struct pci_dev *pdev);
>>  void pci_aer_stats_exit(struct pci_dev *pdev);
>> +void pci_dev_aer_stats_incr(struct pci_dev *pdev, struct aer_err_info *info);
>>
>>  #ifdef CONFIG_ACPI_APEI
>>  int pcie_aer_get_firmware_first(struct pci_dev *pci_dev);
>> diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c b/drivers/pci/pcie/aer/aerdrv_errprint.c
>> index 21ca5e1b0ded..5e8b98deda08 100644
>> --- a/drivers/pci/pcie/aer/aerdrv_errprint.c
>> +++ b/drivers/pci/pcie/aer/aerdrv_errprint.c
>> @@ -155,6 +155,7 @@ static void __aer_print_error(struct pci_dev *dev,
>>   pci_err(dev, "   [%2d] Unknown Error Bit%s\n",
>>   i, info->first_error == i ? " (First)" : "");
>>   }
>> + pci_dev_aer_stats_incr(dev, info);
>
> What about AER errors that are contained by DPC?

Thanks, you are right: this patch does not take care of DPC. I'll
try to read up on DPC and can integrate it if it turns out to be easy
enough.

Thanks,

Rajat

>
> Alex


Re: [PATCH 5/5] Documentation/PCI: Add details of PCI AER statistics

2018-05-22 Thread Rajat Jain
Hi,

On Tue, May 22, 2018 at 3:52 PM, Alex G.  wrote:
> On 05/22/2018 05:28 PM, Rajat Jain wrote:
>> Add the PCI AER statistics details to
>> Documentation/PCI/pcieaer-howto.txt
>>
>> Signed-off-by: Rajat Jain 
>> ---
>>  Documentation/PCI/pcieaer-howto.txt | 35 +
>>  1 file changed, 35 insertions(+)
>>
>> diff --git a/Documentation/PCI/pcieaer-howto.txt b/Documentation/PCI/pcieaer-howto.txt
>> index acd06bb8..86ee9f9ff5e1 100644
>> --- a/Documentation/PCI/pcieaer-howto.txt
>> +++ b/Documentation/PCI/pcieaer-howto.txt
>> @@ -73,6 +73,41 @@ In the example, 'Requester ID' means the ID of the device who sends
>>  the error message to root port. Pls. refer to pci express specs for
>>  other fields.
>>
>> +2.4 AER statistics
>> +
>> +When AER messages are captured, the statistics are exposed via the following
>> +sysfs attributes under the "aer_stats" folder for the device:
>> +
>> +2.4.1 Device sysfs Attributes
>> +
>> +These attributes show up under all the devices that are AER capable. These
>> +indicate the errors "as seen by the device". Note that this may mean that if
>> +an end point is causing problems, the AER counters may increment at its link
>> +partner (e.g. root port) because the errors will be "seen" by the link partner
>> +and not the problematic end point itself (which may report all counters
>> +as 0 as it never saw any problems).
>
> I was afraid of that. Is there a way to look at the requester ID to log
> AER errors to the correct device?

I do not think it is possible to pinpoint the source of the problem.
Errors may be caused by suboptimal link tuning, poor signal
integrity, or either of the link partners. Both link partners will
detect and report the errors that they "see".

The bits and errors defined by the PCIe spec follow the same semantics, i.e.
 => the spec defines the different error conditions "as
    seen/encountered by the device",
 => thus the device reports those errors to the root port,
 => which is what we are counting and reporting here.

IMHO, any interpretation / analysis of this error data / counters
should be left to the user, so that they can look at different devices
and the errors they see, and then conclude what the problem might
be.

Thanks,
Rajat

>
> Alex


[PATCH 1/5] PCI/AER: Define and allocate aer_stats structure for AER capable devices

2018-05-22 Thread Rajat Jain
Define a structure to hold the AER statistics. There are 2 groups
of statistics: dev_* counters that are to be collected for all AER
capable devices and rootport_* counters that are collected for all
(AER capable) rootports only. Allocate and free this structure when
device is added or released (thus counters survive the lifetime of the
device).

Add a new file aerdrv_stats.c to hold the AER stats collection logic.

Signed-off-by: Rajat Jain 
---
 drivers/pci/pcie/aer/Makefile   |  2 +-
 drivers/pci/pcie/aer/aerdrv.h   |  6 +++
 drivers/pci/pcie/aer/aerdrv_core.c  |  9 
 drivers/pci/pcie/aer/aerdrv_stats.c | 64 +
 drivers/pci/probe.c |  1 +
 include/linux/pci.h |  3 ++
 6 files changed, 84 insertions(+), 1 deletion(-)
 create mode 100644 drivers/pci/pcie/aer/aerdrv_stats.c

diff --git a/drivers/pci/pcie/aer/Makefile b/drivers/pci/pcie/aer/Makefile
index 09bd890875a3..a06f9cc2bde5 100644
--- a/drivers/pci/pcie/aer/Makefile
+++ b/drivers/pci/pcie/aer/Makefile
@@ -7,7 +7,7 @@ obj-$(CONFIG_PCIEAER) += aerdriver.o
 
 obj-$(CONFIG_PCIE_ECRC)+= ecrc.o
 
-aerdriver-objs := aerdrv_errprint.o aerdrv_core.o aerdrv.o
+aerdriver-objs := aerdrv_errprint.o aerdrv_core.o aerdrv.o aerdrv_stats.o
 aerdriver-$(CONFIG_ACPI) += aerdrv_acpi.o
 
 obj-$(CONFIG_PCIEAER_INJECT) += aer_inject.o
diff --git a/drivers/pci/pcie/aer/aerdrv.h b/drivers/pci/pcie/aer/aerdrv.h
index b4c950683cc7..d8b9fba536ed 100644
--- a/drivers/pci/pcie/aer/aerdrv.h
+++ b/drivers/pci/pcie/aer/aerdrv.h
@@ -33,6 +33,10 @@
PCI_ERR_UNC_MALF_TLP)
 
 #define AER_MAX_MULTI_ERR_DEVICES  5   /* Not likely to have more */
+
+#define AER_MAX_TYPEOF_CORRECTABLE_ERRS 16 /* as per PCI_ERR_COR_STATUS */
+#define AER_MAX_TYPEOF_UNCORRECTABLE_ERRS 26   /* as per PCI_ERR_UNCOR_STATUS*/
+
 struct aer_err_info {
struct pci_dev *dev[AER_MAX_MULTI_ERR_DEVICES];
int error_dev_num;
@@ -81,6 +85,8 @@ void aer_isr(struct work_struct *work);
 void aer_print_error(struct pci_dev *dev, struct aer_err_info *info);
 void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info);
 irqreturn_t aer_irq(int irq, void *context);
+int pci_aer_stats_init(struct pci_dev *pdev);
+void pci_aer_stats_exit(struct pci_dev *pdev);
 
 #ifdef CONFIG_ACPI_APEI
 int pcie_aer_get_firmware_first(struct pci_dev *pci_dev);
diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
index 36e622d35c48..42a6f913069a 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -95,9 +95,18 @@ int pci_cleanup_aer_error_status_regs(struct pci_dev *dev)
 int pci_aer_init(struct pci_dev *dev)
 {
dev->aer_cap = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);
+
+   if (!dev->aer_cap || pci_aer_stats_init(dev))
+   return -EIO;
+
return pci_cleanup_aer_error_status_regs(dev);
 }
 
+void pci_aer_exit(struct pci_dev *dev)
+{
+   pci_aer_stats_exit(dev);
+}
+
 /**
  * add_error_device - list device to be handled
  * @e_info: pointer to error info
diff --git a/drivers/pci/pcie/aer/aerdrv_stats.c b/drivers/pci/pcie/aer/aerdrv_stats.c
new file mode 100644
index ..b9f251992209
--- /dev/null
+++ b/drivers/pci/pcie/aer/aerdrv_stats.c
@@ -0,0 +1,64 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ *  Copyright (C) 2018 Google Inc, All Rights Reserved.
+ *  Rajat Jain (raja...@google.com)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * AER Statistics - exposed to userspace via sysfs attributes.
+ */
+
+#include 
+#include "aerdrv.h"
+
+/* AER stats for the device */
+struct aer_stats {
+
+   /*
+* Fields for all AER capable devices. They indicate the errors
+* "as seen by this device". Note that this may mean that if an
+* end point is causing problems, the AER counters may increment
+* at its link partner (e.g. root port) because the errors will be
+* "seen" by the link partner and not the problematic end point
+* itself (which may report all counters as 0 as it never saw any
+* problems).
+*/
+   /* Individual counters for different types of correctable errors */
+   u64 dev_cor_errs[AER_MAX_TYPEOF_CORRECTABLE_ERRS];
+   /* Individual counters for different types of uncorrectable errors */
+   u64 dev_uncor_errs[AER_MAX_TYPEOF_UNCORRECTABLE_ERRS];
+   /* Total number of correctable errors seen by this device */
+   u64 dev_total_cor_errs;
+   /* Total number of fatal uncorrectable errors seen by this device */
+   u64 dev_total_fatal_errs;
+   /* Total number of non-fatal uncorrectable errors seen by this device */
+   u64 dev_total_nonfatal_errs;
+
+   /*
+* Fields

[PATCH 3/5] PCI/AER: Add sysfs attributes to provide breakdown of AERs

2018-05-22 Thread Rajat Jain
Add sysfs attributes that break down the AERs seen into the
different types of correctable and uncorrectable errors:

dev_breakdown_correctable
dev_breakdown_uncorrectable

Signed-off-by: Rajat Jain 
---
 drivers/pci/pcie/aer/aerdrv.h  |  6 ++
 drivers/pci/pcie/aer/aerdrv_errprint.c |  6 --
 drivers/pci/pcie/aer/aerdrv_stats.c| 25 +
 3 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pcie/aer/aerdrv.h b/drivers/pci/pcie/aer/aerdrv.h
index b5d5ad6f2c03..048fbd7c9633 100644
--- a/drivers/pci/pcie/aer/aerdrv.h
+++ b/drivers/pci/pcie/aer/aerdrv.h
@@ -89,6 +89,12 @@ int pci_aer_stats_init(struct pci_dev *pdev);
 void pci_aer_stats_exit(struct pci_dev *pdev);
 void pci_dev_aer_stats_incr(struct pci_dev *pdev, struct aer_err_info *info);
 
+extern const char
+*aer_correctable_error_string[AER_MAX_TYPEOF_CORRECTABLE_ERRS];
+
+extern const char
+*aer_uncorrectable_error_string[AER_MAX_TYPEOF_UNCORRECTABLE_ERRS];
+
 #ifdef CONFIG_ACPI_APEI
 int pcie_aer_get_firmware_first(struct pci_dev *pci_dev);
 #else
diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c b/drivers/pci/pcie/aer/aerdrv_errprint.c
index 5e8b98deda08..5585f309f1a8 100644
--- a/drivers/pci/pcie/aer/aerdrv_errprint.c
+++ b/drivers/pci/pcie/aer/aerdrv_errprint.c
@@ -68,7 +68,8 @@ static const char *aer_error_layer[] = {
"Transaction Layer"
 };
 
-static const char *aer_correctable_error_string[] = {
+const char
+*aer_correctable_error_string[AER_MAX_TYPEOF_CORRECTABLE_ERRS] = {
"Receiver Error",   /* Bit Position 0   */
NULL,
NULL,
@@ -87,7 +88,8 @@ static const char *aer_correctable_error_string[] = {
"Header Log Overflow",  /* Bit Position 15  */
 };
 
-static const char *aer_uncorrectable_error_string[] = {
+const char
+*aer_uncorrectable_error_string[AER_MAX_TYPEOF_UNCORRECTABLE_ERRS] = {
"Undefined",/* Bit Position 0   */
NULL,
NULL,
diff --git a/drivers/pci/pcie/aer/aerdrv_stats.c b/drivers/pci/pcie/aer/aerdrv_stats.c
index 87b7119d0a86..5f0a6e144f56 100644
--- a/drivers/pci/pcie/aer/aerdrv_stats.c
+++ b/drivers/pci/pcie/aer/aerdrv_stats.c
@@ -61,10 +61,35 @@ aer_stats_aggregate_attr(dev_total_cor_errs);
 aer_stats_aggregate_attr(dev_total_fatal_errs);
 aer_stats_aggregate_attr(dev_total_nonfatal_errs);
 
+#define aer_stats_breakdown_attr(field, stats_array, strings_array) \
+   static ssize_t \
+   field##_show(struct device *dev, struct device_attribute *attr,\
+char *buf)\
+{ \
+   unsigned int i;\
+   char *str = buf;   \
+   struct pci_dev *pdev = to_pci_dev(dev);\
+   u64 *stats = pdev->aer_stats->stats_array; \
+   for (i = 0; i < ARRAY_SIZE(strings_array); i++) {  \
+   if (strings_array[i])  \
+   str += sprintf(str, "%s = 0x%llx\n",   \
+  strings_array[i], stats[i]);\
+   }  \
+   return str-buf;\
+} \
+static DEVICE_ATTR_RO(field)
+
+aer_stats_breakdown_attr(dev_breakdown_correctable, dev_cor_errs,
+aer_correctable_error_string);
+aer_stats_breakdown_attr(dev_breakdown_uncorrectable, dev_uncor_errs,
+aer_uncorrectable_error_string);
+
 static struct attribute *aer_stats_attrs[] __ro_after_init = {
&dev_attr_dev_total_cor_errs.attr,
&dev_attr_dev_total_fatal_errs.attr,
&dev_attr_dev_total_nonfatal_errs.attr,
+   &dev_attr_dev_breakdown_correctable.attr,
+   &dev_attr_dev_breakdown_uncorrectable.attr,
NULL
 };
 
-- 
2.17.0.441.gb46fe60e1d-goog
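
For readers who want to experiment with the breakdown logic outside the
kernel, the per-bit counting and the "name = 0xcount" formatting done by
aer_stats_breakdown_attr can be sketched in userspace C. This is only a
sketch under stated assumptions: aer_count_status_bits() and
aer_format_breakdown() are hypothetical names that do not appear in the
patch, and snprintf() with an explicit size stands in for the macro's
sprintf() into the sysfs buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: bump counters[i] for every bit i set in status. */
static void aer_count_status_bits(uint64_t *counters, size_t n,
                                  uint32_t status)
{
    size_t i;

    for (i = 0; i < n && i < 32; i++)
        if (status & (UINT32_C(1) << i))
            counters[i]++;
}

/*
 * Hypothetical helper: emit one "name = 0xcount" line per named bit,
 * skipping NULL entries (bits without a name), as the macro does.
 * Returns the number of bytes written, not counting the trailing NUL.
 * Lines that do not fit completely are dropped.
 */
static size_t aer_format_breakdown(char *buf, size_t size,
                                   const char *const *names,
                                   const uint64_t *counters, size_t n)
{
    size_t used = 0, i;

    if (size == 0)
        return 0;
    buf[0] = '\0';

    for (i = 0; i < n; i++) {
        int len;

        if (!names[i])
            continue;
        len = snprintf(buf + used, size - used, "%s = 0x%llx\n",
                       names[i], (unsigned long long)counters[i]);
        if (len < 0 || (size_t)len >= size - used) {
            buf[used] = '\0';   /* discard the partial line */
            break;
        }
        used += (size_t)len;
    }
    return used;
}
```

Passing the buffer size explicitly is a design choice of this sketch; the
patch relies on the sysfs buffer being large enough for all named bits.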



[PATCH 2/5] PCI/AER: Add sysfs stats for AER capable devices

2018-05-22 Thread Rajat Jain
Add the following AER sysfs stats to represent the counters for each
kind of error as seen by the device:

dev_total_cor_errs
dev_total_fatal_errs
dev_total_nonfatal_errs

Signed-off-by: Rajat Jain 
---
 drivers/pci/pci-sysfs.c|  3 ++
 drivers/pci/pci.h  |  4 +-
 drivers/pci/pcie/aer/aerdrv.h  |  1 +
 drivers/pci/pcie/aer/aerdrv_errprint.c |  1 +
 drivers/pci/pcie/aer/aerdrv_stats.c| 72 ++
 5 files changed, 80 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 366d93af051d..730f985a3dc9 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -1743,6 +1743,9 @@ static const struct attribute_group *pci_dev_attr_groups[] = {
 #endif
&pci_bridge_attr_group,
&pcie_dev_attr_group,
+#ifdef CONFIG_PCIEAER
+   &aer_stats_attr_group,
+#endif
NULL,
 };
 
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index c358e7a07f3f..9a28ec600225 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -181,7 +181,9 @@ extern const struct attribute_group *pci_dev_groups[];
 extern const struct attribute_group *pcibus_groups[];
 extern const struct device_type pci_dev_type;
 extern const struct attribute_group *pci_bus_groups[];
-
+#ifdef CONFIG_PCIEAER
+extern const struct attribute_group aer_stats_attr_group;
+#endif
 
 /**
  * pci_match_one_device - Tell if a PCI device structure has a matching
diff --git a/drivers/pci/pcie/aer/aerdrv.h b/drivers/pci/pcie/aer/aerdrv.h
index d8b9fba536ed..b5d5ad6f2c03 100644
--- a/drivers/pci/pcie/aer/aerdrv.h
+++ b/drivers/pci/pcie/aer/aerdrv.h
@@ -87,6 +87,7 @@ void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info);
 irqreturn_t aer_irq(int irq, void *context);
 int pci_aer_stats_init(struct pci_dev *pdev);
 void pci_aer_stats_exit(struct pci_dev *pdev);
+void pci_dev_aer_stats_incr(struct pci_dev *pdev, struct aer_err_info *info);
 
 #ifdef CONFIG_ACPI_APEI
 int pcie_aer_get_firmware_first(struct pci_dev *pci_dev);
diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c b/drivers/pci/pcie/aer/aerdrv_errprint.c
index 21ca5e1b0ded..5e8b98deda08 100644
--- a/drivers/pci/pcie/aer/aerdrv_errprint.c
+++ b/drivers/pci/pcie/aer/aerdrv_errprint.c
@@ -155,6 +155,7 @@ static void __aer_print_error(struct pci_dev *dev,
pci_err(dev, "   [%2d] Unknown Error Bit%s\n",
i, info->first_error == i ? " (First)" : "");
}
+   pci_dev_aer_stats_incr(dev, info);
 }
 
 void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
diff --git a/drivers/pci/pcie/aer/aerdrv_stats.c b/drivers/pci/pcie/aer/aerdrv_stats.c
index b9f251992209..87b7119d0a86 100644
--- a/drivers/pci/pcie/aer/aerdrv_stats.c
+++ b/drivers/pci/pcie/aer/aerdrv_stats.c
@@ -47,6 +47,78 @@ struct aer_stats {
u64 rootport_total_nonfatal_errs;
 };
 
+#define aer_stats_aggregate_attr(field) \
+   static ssize_t \
+   field##_show(struct device *dev, struct device_attribute *attr,\
+char *buf)\
+{ \
+   struct pci_dev *pdev = to_pci_dev(dev);\
+   return sprintf(buf, "0x%llx\n", pdev->aer_stats->field);   \
+} \
+static DEVICE_ATTR_RO(field)
+
+aer_stats_aggregate_attr(dev_total_cor_errs);
+aer_stats_aggregate_attr(dev_total_fatal_errs);
+aer_stats_aggregate_attr(dev_total_nonfatal_errs);
+
+static struct attribute *aer_stats_attrs[] __ro_after_init = {
+   &dev_attr_dev_total_cor_errs.attr,
+   &dev_attr_dev_total_fatal_errs.attr,
+   &dev_attr_dev_total_nonfatal_errs.attr,
+   NULL
+};
+
+static umode_t aer_stats_attrs_are_visible(struct kobject *kobj,
+  struct attribute *a, int n)
+{
+   struct device *dev = kobj_to_dev(kobj);
+   struct pci_dev *pdev = to_pci_dev(dev);
+
+   if (!pdev->aer_stats)
+   return 0;
+
+   return a->mode;
+}
+
+const struct attribute_group aer_stats_attr_group = {
+   .name  = "aer_stats",
+   .attrs  = aer_stats_attrs,
+   .is_visible = aer_stats_attrs_are_visible,
+};
+
+void pci_dev_aer_stats_incr(struct pci_dev *pdev, struct aer_err_info *info)
+{
+   int status, i, max = -1;
+   u64 *counter = NULL;
+   struct aer_stats *aer_stats = pdev->aer_stats;
+
+   if (unlikely(!aer_stats))
+   return;
+
+   switch (info->severity) {
+   case AER_CORRECTABLE:
+   aer_stats->dev_total_cor_errs++;
+   counter = &aer_stats->de

[PATCH 4/5] PCI/AER: Add sysfs attributes for rootport cumulative stats

2018-05-22 Thread Rajat Jain
Add sysfs attributes for rootport statistics (that are cumulative
of all the ERR_* messages seen on this PCI hierarchy).

Signed-off-by: Rajat Jain 
---
 drivers/pci/pcie/aer/aerdrv.h   |  2 ++
 drivers/pci/pcie/aer/aerdrv_core.c  |  2 ++
 drivers/pci/pcie/aer/aerdrv_stats.c | 31 +
 3 files changed, 35 insertions(+)

diff --git a/drivers/pci/pcie/aer/aerdrv.h b/drivers/pci/pcie/aer/aerdrv.h
index 048fbd7c9633..77d831d9 100644
--- a/drivers/pci/pcie/aer/aerdrv.h
+++ b/drivers/pci/pcie/aer/aerdrv.h
@@ -88,6 +88,8 @@ irqreturn_t aer_irq(int irq, void *context);
 int pci_aer_stats_init(struct pci_dev *pdev);
 void pci_aer_stats_exit(struct pci_dev *pdev);
 void pci_dev_aer_stats_incr(struct pci_dev *pdev, struct aer_err_info *info);
+void pci_rootport_aer_stats_incr(struct pci_dev *pdev,
+struct aer_err_source *e_src);
 
 extern const char
 *aer_correctable_error_string[AER_MAX_TYPEOF_CORRECTABLE_ERRS];
diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
index 42a6f913069a..0f70e22563f3 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -424,6 +424,8 @@ static void aer_isr_one_error(struct pcie_device *p_device,
struct aer_rpc *rpc = get_service_data(p_device);
struct aer_err_info *e_info = &rpc->e_info;
 
+   pci_rootport_aer_stats_incr(p_device->port, e_src);
+
/*
 * There is a possibility that both correctable error and
 * uncorrectable error being logged. Report correctable error first.
diff --git a/drivers/pci/pcie/aer/aerdrv_stats.c b/drivers/pci/pcie/aer/aerdrv_stats.c
index 5f0a6e144f56..a526e26c8683 100644
--- a/drivers/pci/pcie/aer/aerdrv_stats.c
+++ b/drivers/pci/pcie/aer/aerdrv_stats.c
@@ -60,6 +60,9 @@ static DEVICE_ATTR_RO(field)
 aer_stats_aggregate_attr(dev_total_cor_errs);
 aer_stats_aggregate_attr(dev_total_fatal_errs);
 aer_stats_aggregate_attr(dev_total_nonfatal_errs);
+aer_stats_aggregate_attr(rootport_total_cor_errs);
+aer_stats_aggregate_attr(rootport_total_fatal_errs);
+aer_stats_aggregate_attr(rootport_total_nonfatal_errs);
 
 #define aer_stats_breakdown_attr(field, stats_array, strings_array) \
static ssize_t \
@@ -90,6 +93,9 @@ static struct attribute *aer_stats_attrs[] __ro_after_init = {
&dev_attr_dev_total_nonfatal_errs.attr,
&dev_attr_dev_breakdown_correctable.attr,
&dev_attr_dev_breakdown_uncorrectable.attr,
+   &dev_attr_rootport_total_cor_errs.attr,
+   &dev_attr_rootport_total_fatal_errs.attr,
+   &dev_attr_rootport_total_nonfatal_errs.attr,
NULL
 };
 
@@ -102,6 +108,12 @@ static umode_t aer_stats_attrs_are_visible(struct kobject *kobj,
if (!pdev->aer_stats)
return 0;
 
+   if ((a == &dev_attr_rootport_total_cor_errs.attr ||
+a == &dev_attr_rootport_total_fatal_errs.attr ||
+a == &dev_attr_rootport_total_nonfatal_errs.attr) &&
+   pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT)
+   return 0;
+
return a->mode;
 }
 
@@ -144,6 +156,25 @@ void pci_dev_aer_stats_incr(struct pci_dev *pdev, struct aer_err_info *info)
counter[i]++;
 }
 
+void pci_rootport_aer_stats_incr(struct pci_dev *pdev,
+struct aer_err_source *e_src)
+{
+   struct aer_stats *aer_stats = pdev->aer_stats;
+
+   if (unlikely(!aer_stats))
+   return;
+
+   if (e_src->status & PCI_ERR_ROOT_COR_RCV)
+   aer_stats->rootport_total_cor_errs++;
+
+   if (e_src->status & PCI_ERR_ROOT_UNCOR_RCV) {
+   if (e_src->status & PCI_ERR_ROOT_FATAL_RCV)
+   aer_stats->rootport_total_fatal_errs++;
+   else
+   aer_stats->rootport_total_nonfatal_errs++;
+   }
+}
+
 int pci_aer_stats_init(struct pci_dev *pdev)
 {
pdev->aer_stats = kzalloc(sizeof(struct aer_stats), GFP_KERNEL);
-- 
2.17.0.441.gb46fe60e1d-goog
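
The classification done in pci_rootport_aer_stats_incr() and the extra
visibility check for the rootport_* attributes can be tried out in a
small userspace sketch. Everything below is an illustration, not kernel
code: struct rootport_totals, rootport_totals_incr() and
rootport_attr_visible() are hypothetical names, and the ROOT_* bit values
are assumed to mirror the PCI_ERR_ROOT_* definitions in the kernel's
pci_regs.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Root Error Status bits (assumed to match PCI_ERR_ROOT_* in pci_regs.h). */
#define ROOT_COR_RCV    0x00000001u /* ERR_COR received */
#define ROOT_UNCOR_RCV  0x00000004u /* ERR_FATAL or ERR_NONFATAL received */
#define ROOT_FATAL_RCV  0x00000040u /* a fatal error message was received */

/* Hypothetical stand-in for the three rootport_total_* counters. */
struct rootport_totals {
    uint64_t cor;
    uint64_t fatal;
    uint64_t nonfatal;
};

/* Count one Root Error Status snapshot, same branching as the patch. */
static void rootport_totals_incr(struct rootport_totals *t, uint32_t status)
{
    if (status & ROOT_COR_RCV)
        t->cor++;

    if (status & ROOT_UNCOR_RCV) {
        if (status & ROOT_FATAL_RCV)
            t->fatal++;
        else
            t->nonfatal++;
    }
}

/*
 * Visibility rule from aer_stats_attrs_are_visible(): nothing is shown
 * without an aer_stats structure, and rootport_* attributes are shown
 * only on root ports.
 */
static bool rootport_attr_visible(bool has_stats, bool is_rootport_attr,
                                  bool is_root_port)
{
    if (!has_stats)
        return false;
    if (is_rootport_attr && !is_root_port)
        return false;
    return true;
}
```

Note that a single status snapshot can bump both the correctable and one
of the uncorrectable counters, because the two checks are independent.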



[PATCH 5/5] Documentation/PCI: Add details of PCI AER statistics

2018-05-22 Thread Rajat Jain
Add the PCI AER statistics details to
Documentation/PCI/pcieaer-howto.txt

Signed-off-by: Rajat Jain 
---
 Documentation/PCI/pcieaer-howto.txt | 35 +
 1 file changed, 35 insertions(+)

diff --git a/Documentation/PCI/pcieaer-howto.txt b/Documentation/PCI/pcieaer-howto.txt
index acd06bb8..86ee9f9ff5e1 100644
--- a/Documentation/PCI/pcieaer-howto.txt
+++ b/Documentation/PCI/pcieaer-howto.txt
@@ -73,6 +73,41 @@ In the example, 'Requester ID' means the ID of the device who sends
 the error message to root port. Pls. refer to pci express specs for
 other fields.
 
+2.4 AER statistics
+
+When AER messages are captured, the statistics are exposed via the following
+sysfs attributes under the "aer_stats" folder for the device:
+
+2.4.1 Device sysfs Attributes
+
+These attributes show up under all the devices that are AER capable. These
+indicate the errors "as seen by the device". Note that this may mean that if
+an end point is causing problems, the AER counters may increment at its link
+partner (e.g. root port) because the errors will be "seen" by the link partner
+and not the problematic end point itself (which may report all counters
+as 0 as it never saw any problems).
+
+ * dev_total_cor_errs: number of correctable errors seen by the device.
+ * dev_total_fatal_errs: number of fatal uncorrectable errors seen by the device.
+ * dev_total_nonfatal_errs: number of non-fatal uncorrectable errors seen by the device.
+ * dev_breakdown_correctable: Provides a breakdown of different types of
+  correctable errors seen.
+ * dev_breakdown_uncorrectable: Provides a breakdown of different types of
+  uncorrectable errors seen.
+
+2.4.2 Rootport sysfs Attributes
+
+These attributes show up only under the rootports that are AER capable. They
+indicate the number of error messages as "reported to" the rootport. Please note
+that the rootports also transmit (internally) the ERR_* messages for errors seen
+by the internal rootport PCI device, so these counters include them and are
+thus cumulative of all the error messages on the PCI hierarchy originating
+at that root port.
+
+ * rootport_total_cor_errs: number of ERR_COR messages reported to rootport.
+ * rootport_total_fatal_errs: number of ERR_FATAL messages reported to rootport.
+ * rootport_total_nonfatal_errs: number of ERR_NONFATAL messages reported to
+ rootport.
 
 3. Developer Guide
 
-- 
2.17.0.441.gb46fe60e1d-goog
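
A userspace tool that reads the attributes described in this patch would
have to parse the text that the show functions in this series emit: a
single "0x..." hex value for the totals, and "name = 0x..." lines for the
breakdowns. A minimal sketch of that parsing follows, assuming those two
formats stay as posted. aer_parse_total() and aer_parse_breakdown_line()
are hypothetical helper names; reading the attribute file itself is left
to the caller.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parse a total such as "0x1f\n".
 * Returns 0 on success and -1 on malformed input.
 */
static int aer_parse_total(const char *text, unsigned long long *out)
{
    char *end;
    unsigned long long v;

    if (text[0] == '-')     /* strtoull() would silently negate */
        return -1;

    errno = 0;
    v = strtoull(text, &end, 16);   /* base 16 accepts an optional "0x" */
    if (end == text || errno == ERANGE)
        return -1;
    if (*end != '\n' && *end != '\0')
        return -1;

    *out = v;
    return 0;
}

/*
 * Hypothetical helper: split one breakdown line such as
 * "Receiver Error = 0x2\n" into its name and its count.
 * Returns 0 on success and -1 on malformed input or a too-small buffer.
 */
static int aer_parse_breakdown_line(const char *line, char *name,
                                    size_t name_size,
                                    unsigned long long *count)
{
    const char *sep = strstr(line, " = ");
    size_t len;

    if (!sep)
        return -1;

    len = (size_t)(sep - line);
    if (len == 0 || len >= name_size)
        return -1;

    memcpy(name, line, len);
    name[len] = '\0';

    return aer_parse_total(sep + 3, count);
}
```

A caller would read, for example, the dev_total_cor_errs file under the
device's aer_stats directory into a buffer and hand it to
aer_parse_total().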



[PATCH 0/5] Expose PCIe AER stats via sysfs

2018-05-22 Thread Rajat Jain
This patchset exposes the AER stats via the sysfs attributes.

Rajat Jain (5):
  PCI/AER: Define and allocate aer_stats structure for AER capable
devices
  PCI/AER: Add sysfs stats for AER capable devices
  PCI/AER: Add sysfs attributes to provide breakdown of AERs
  PCI/AER: Add sysfs attributes for rootport cumulative stats
  Documentation/PCI: Add details of PCI AER statistics

 Documentation/PCI/pcieaer-howto.txt|  35 +
 drivers/pci/pci-sysfs.c|   3 +
 drivers/pci/pci.h  |   4 +-
 drivers/pci/pcie/aer/Makefile  |   2 +-
 drivers/pci/pcie/aer/aerdrv.h  |  15 ++
 drivers/pci/pcie/aer/aerdrv_core.c |  11 ++
 drivers/pci/pcie/aer/aerdrv_errprint.c |   7 +-
 drivers/pci/pcie/aer/aerdrv_stats.c| 192 +
 drivers/pci/probe.c|   1 +
 include/linux/pci.h|   3 +
 10 files changed, 269 insertions(+), 4 deletions(-)
 create mode 100644 drivers/pci/pcie/aer/aerdrv_stats.c

-- 
2.17.0.441.gb46fe60e1d-goog
