Re: [PATCH 1/5] PCI/AER: Define and allocate aer_stats structure for AER capable devices
On 05/23/2018 09:32 AM, Jes Sorensen wrote:
> On 05/23/2018 10:26 AM, Matthew Wilcox wrote:
>> On Wed, May 23, 2018 at 10:20:10AM -0400, Jes Sorensen wrote:
>>>> +++ b/drivers/pci/pcie/aer/aerdrv_stats.c
>>>> @@ -0,0 +1,64 @@
>>>> +// SPDX-License-Identifier: GPL-2.0
>>>
>>> Fix the formatting please - that gross // gibberish doesn't belong there.
>>
>> Sorry, Jes. The Chief Penguin has Spoken, and that's the preferred
>> syntax:
>>
>> 2. Style:
>>
>>    The SPDX license identifier is added in form of a comment. The comment
>>    style depends on the file type::
>>
>>       C source: // SPDX-License-Identifier:
>>
>> (you can dig up the discussion around this on the mailing list if you
>> like. Linus actually thinks that C++ single-line comments are one of
>> the few things that language got right)
>
> Well I'll agree to disagree with Linus on this one. It's ugly as fsck
> and allows for ambiguous statements in the code.

You misspelled "fuck".

Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
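For readers wondering what "ambiguous statements" could refer to: one well-known case, offered only as an illustration and not necessarily the one Jes had in mind, is the token sequence `//*`. Strict C89 has no `//` comments, so it reads `/` as division followed by a block comment, while C99 and later read `//` as the start of a line comment. The function name below is hypothetical.

```c
/*
 * Illustration only: the initializer changes meaning with the language
 * standard. Under C99 and later, "//" starts a line comment, so the
 * initializer is just 4. Under strict C89, "/" is a division operator
 * followed by a block comment, so the initializer is 4 / 2.
 */
int slash_slash_value(void)
{
	int v = 4 //* comment or division? */ 2
		;
	return v;
}
```

A compiler in C99 or later mode returns 4 here; a strict C89 compiler would return 2.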
Re: [PATCH 1/5] PCI/AER: Define and allocate aer_stats structure for AER capable devices
On 05/23/2018 09:20 AM, Jes Sorensen wrote:
> On 05/22/2018 06:28 PM, Rajat Jain wrote:
>> new file mode 100644
>> index ..b9f251992209
>> --- /dev/null
>> +++ b/drivers/pci/pcie/aer/aerdrv_stats.c
>> @@ -0,0 +1,64 @@
>> +// SPDX-License-Identifier: GPL-2.0
>
> Fix the formatting please - that gross // gibberish doesn't belong there.

Deep breath in. Deep breath out.

git grep SPDX

Although I don't like it, this format is already too common.

Cheers,
Alex
Re: [PATCH 5/5] Documentation/PCI: Add details of PCI AER statistics
On 05/22/2018 05:28 PM, Rajat Jain wrote:
> Add the PCI AER statistics details to
> Documentation/PCI/pcieaer-howto.txt
>
> Signed-off-by: Rajat Jain
> ---
> Documentation/PCI/pcieaer-howto.txt | 35 +
> 1 file changed, 35 insertions(+)
>
> diff --git a/Documentation/PCI/pcieaer-howto.txt b/Documentation/PCI/pcieaer-howto.txt
> index acd06bb8..86ee9f9ff5e1 100644
> --- a/Documentation/PCI/pcieaer-howto.txt
> +++ b/Documentation/PCI/pcieaer-howto.txt
> @@ -73,6 +73,41 @@ In the example, 'Requester ID' means the ID of the device who sends
> the error message to root port. Pls. refer to pci express specs for
> other fields.
>
> +2.4 AER statistics
> +
> +When AER messages are captured, the statistics are exposed via the following
> +sysfs attributes under the "aer_stats" folder for the device:
> +
> +2.4.1 Device sysfs Attributes
> +
> +These attributes show up under all the devices that are AER capable. These
> +indicate the errors "as seen by the device". Note that this may mean that if
> +an end point is causing problems, the AER counters may increment at its link
> +partner (e.g. root port) because the errors will be "seen" by the link partner
> +and not the the problematic end point itself (which may report all counters
> +as 0 as it never saw any problems).

I was afraid of that. Is there a way to look at the requester ID to log AER
errors to the correct device?

Alex
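As background to the Requester ID question: a PCIe Requester ID is a 16-bit value carrying the bus number in bits 15:8, the device number in bits 7:3 and the function number in bits 2:0 (for non-ARI devices). The sketch below only shows that decoding in standalone C; the type and function names are hypothetical, it is not code from the patch series, and it does not answer whether the ID identifies the device that caused the error rather than the one that reported it.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper type: a decoded (non-ARI) PCIe Requester ID. */
struct requester_bdf {
	uint8_t bus;	/* bits 15:8 */
	uint8_t dev;	/* bits 7:3  */
	uint8_t func;	/* bits 2:0  */
};

/* Split a 16-bit Requester ID into bus, device and function numbers. */
struct requester_bdf decode_requester_id(uint16_t id)
{
	struct requester_bdf bdf;

	bdf.bus = (id >> 8) & 0xff;
	bdf.dev = (id >> 3) & 0x1f;
	bdf.func = id & 0x07;
	return bdf;
}

/* Format the ID as "bb:dd.f"; returns what snprintf() returns. */
int format_requester_id(uint16_t id, char *buf, size_t len)
{
	struct requester_bdf bdf = decode_requester_id(id);

	return snprintf(buf, len, "%02x:%02x.%x",
			(unsigned int)bdf.bus, (unsigned int)bdf.dev,
			(unsigned int)bdf.func);
}
```

For example, an ID of 0x0108 decodes to bus 01, device 01, function 0.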
Re: [PATCH 2/5] PCI/AER: Add sysfs stats for AER capable devices
On 05/22/2018 05:28 PM, Rajat Jain wrote:
> Add the following AER sysfs stats to represent the counters for each
> kind of error as seen by the device:
>
> dev_total_cor_errs
> dev_total_fatal_errs
> dev_total_nonfatal_errs
>
> Signed-off-by: Rajat Jain
> ---
> drivers/pci/pci-sysfs.c                |  3 ++
> drivers/pci/pci.h                      |  4 +-
> drivers/pci/pcie/aer/aerdrv.h          |  1 +
> drivers/pci/pcie/aer/aerdrv_errprint.c |  1 +
> drivers/pci/pcie/aer/aerdrv_stats.c    | 72 ++
> 5 files changed, 80 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> index 366d93af051d..730f985a3dc9 100644
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -1743,6 +1743,9 @@ static const struct attribute_group *pci_dev_attr_groups[] = {
> #endif
> &pci_bridge_attr_group,
> &pcie_dev_attr_group,
> +#ifdef CONFIG_PCIEAER
> + &aer_stats_attr_group,
> +#endif
> NULL,
> };

So if the device is removed as part of recovery, then these get reset, right?
So if the device fails intermittently, these counters would keep getting
reset. Is this the intent?

(snip)

> /**
> * pci_match_one_device - Tell if a PCI device structure has a matching
> diff --git a/drivers/pci/pcie/aer/aerdrv.h b/drivers/pci/pcie/aer/aerdrv.h
> index d8b9fba536ed..b5d5ad6f2c03 100644
> --- a/drivers/pci/pcie/aer/aerdrv.h
> +++ b/drivers/pci/pcie/aer/aerdrv.h
> @@ -87,6 +87,7 @@ void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info);
> irqreturn_t aer_irq(int irq, void *context);
> int pci_aer_stats_init(struct pci_dev *pdev);
> void pci_aer_stats_exit(struct pci_dev *pdev);
> +void pci_dev_aer_stats_incr(struct pci_dev *pdev, struct aer_err_info *info);
>
> #ifdef CONFIG_ACPI_APEI
> int pcie_aer_get_firmware_first(struct pci_dev *pci_dev);
> diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c b/drivers/pci/pcie/aer/aerdrv_errprint.c
> index 21ca5e1b0ded..5e8b98deda08 100644
> --- a/drivers/pci/pcie/aer/aerdrv_errprint.c
> +++ b/drivers/pci/pcie/aer/aerdrv_errprint.c
> @@ -155,6 +155,7 @@ static void __aer_print_error(struct pci_dev *dev,
> pci_err(dev, " [%2d] Unknown Error Bit%s\n",
> i, info->first_error == i ? " (First)" : "");
> }
> + pci_dev_aer_stats_incr(dev, info);

What about AER errors that are contained by DPC?

Alex
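To make the reset concern above concrete: if the per-device counters start again from zero whenever the device is removed and re-enumerated, anything that wants a long-running total has to notice the drop itself. The sketch below is a purely hypothetical userspace-side accumulator, not part of the patch; the struct and function names are invented for illustration. It treats any decrease in the observed value as a reset, so it would miss a reset after which the counter had already climbed back past the previous reading.

```c
/* Hypothetical tracker for one counter that may be reset to zero. */
struct counter_tracker {
	unsigned long long last;	/* most recent raw reading */
	unsigned long long total;	/* accumulated across resets */
};

/*
 * Fold a new raw reading into the running total.
 * A reading lower than the previous one is assumed to mean the counter
 * was reset, so the whole new value counts as fresh errors.
 */
void tracker_update(struct counter_tracker *t, unsigned long long reading)
{
	if (reading < t->last)
		t->total += reading;		/* counter restarted from zero */
	else
		t->total += reading - t->last;	/* normal monotonic growth */
	t->last = reading;
}
```

Feeding it the readings 3, 5, 1 yields a total of 6: five errors before the assumed reset and one after.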