Re: [PATCH] PCI/AER: correctable error message as KERN_INFO

2023-03-17 Thread Grant Grundler
On Tue, Mar 14, 2023 at 5:24 PM Grant Grundler  wrote:
>
> On Tue, Mar 14, 2023 at 12:38 PM Bjorn Helgaas  wrote:
> >
> > On Tue, Feb 28, 2023 at 10:04:53PM -0800, Grant Grundler wrote:
> > > Since correctable errors have been corrected (and counted), the dmesg 
> > > output
> > > should not be reported as a warning, but rather as "informational".
> > >
> > > Otherwise, using a certain well known vendor's PCIe parts in a USB4 
> > > docking
> > > station, the dmesg buffer can be spammed with correctable errors, 717 
> > > bytes
> > > per instance, potentially many MB per day.
> > >
> > > Given the "WARN" priority, these messages have already confused the 
> > > typical
> > > user that stumbles across them, support staff (triaging feedback reports),
> > > and more than a few linux kernel devs. Changing to INFO will hide these
> > > messages from most audiences.
> > >
> > > Signed-off-by: Grant Grundler 
> > > ---
> > > This patch will likely conflict with:
> > >   
> > > https://lore.kernel.org/all/20230103165548.570377-1-rajat.khandel...@linux.intel.com/
> > >
> > > which I'd also like to see upstream. Please let me know to resubmit
> > > mine if Rajat's patch lands first. Or feel free to fix up this one.
> >
> > Yes.  I think it makes sense to separate this into two patches:
> >
> >   1) Log correctable errors as KERN_INFO instead of KERN_WARNING, and
> >   2) Rate-limit correctable error logging.
>
> I'm going to look into your comment below. I'll port Rajat's patch on
> top of mine to follow the order you've listed above.
>
> > >  drivers/pci/pcie/aer.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > > index f6c24ded134c..e4cf3ec40d66 100644
> > > --- a/drivers/pci/pcie/aer.c
> > > +++ b/drivers/pci/pcie/aer.c
> > > @@ -692,7 +692,7 @@ static void __aer_print_error(struct pci_dev *dev,
> > >
> > >   if (info->severity == AER_CORRECTABLE) {
> > >   strings = aer_correctable_error_string;
> > > - level = KERN_WARNING;
> > > + level = KERN_INFO;
> > >   } else {
> > >   strings = aer_uncorrectable_error_string;
> > >   level = KERN_ERR;
> > > @@ -724,7 +724,7 @@ void aer_print_error(struct pci_dev *dev, struct 
> > > aer_err_info *info)
> > >   layer = AER_GET_LAYER_ERROR(info->severity, info->status);
> > >   agent = AER_GET_AGENT(info->severity, info->status);
> > >
> > > - level = (info->severity == AER_CORRECTABLE) ? KERN_WARNING : 
> > > KERN_ERR;
> > > + level = (info->severity == AER_CORRECTABLE) ? KERN_INFO : KERN_ERR;
> > >
> > >   pci_printk(level, dev, "PCIe Bus Error: severity=%s, type=%s, 
> > > (%s)\n",
> > >  aer_error_severity_string[info->severity],
> >
> > Shouldn't we do the same in the cper_print_aer() path?  That path
> > currently uses pci_err() and then calls __aer_print_error(), so the
> > initial message will always be KERN_ERR, and the decoding done by
> > __aer_print_error() will be KERN_INFO (for correctable) or KERN_ERR.
>
> I was completely unaware of this since it's not causing me any
> immediate problems. But I agree the message priority should be
> consistent for correctable errors.

I've just posted a V2 which I believe is against "pci-next":

grundler <1607>git remote -v show pci-next
* remote pci-next
  Fetch URL: git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git
  Push  URL: git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git
  HEAD branch: main
  Remote branches:
aer   tracked
controller/dt tracked
controller/kirin  tracked
controller/layerscape tracked
controller/rcar   tracked
for-linus tracked
main  tracked
next  tracked
  Local branch configured for 'git pull':
aer_correctable_info merges with remote next

Please let me know if this is the wrong git tree and branch to track.

> > Seems like a shame to do the same test in three places, but would
> > require a little more refactoring to avoid that.
>
> I don't mind doing the same test in multiple places. If refactoring
> this isn't straight forward, I'll leave the refactoring for someone
> more ambitious. :D

I've moved one of the pci_info lines from cper_print_aer()  to
__aer_print_info() since the status/mask are the same for both paths
that invoke __aer_print_info(). But that's as far as I understand what
each of the paths that calls __aer_print_info() do.  If this is not
OK, I can move it back.

cheers,
grant


Re: [PATCH] PCI/AER: correctable error message as KERN_INFO

2023-03-14 Thread Grant Grundler
On Tue, Mar 14, 2023 at 12:38 PM Bjorn Helgaas  wrote:
>
> On Tue, Feb 28, 2023 at 10:04:53PM -0800, Grant Grundler wrote:
> > Since correctable errors have been corrected (and counted), the dmesg output
> > should not be reported as a warning, but rather as "informational".
> >
> > Otherwise, using a certain well known vendor's PCIe parts in a USB4 docking
> > station, the dmesg buffer can be spammed with correctable errors, 717 bytes
> > per instance, potentially many MB per day.
> >
> > Given the "WARN" priority, these messages have already confused the typical
> > user that stumbles across them, support staff (triaging feedback reports),
> > and more than a few linux kernel devs. Changing to INFO will hide these
> > messages from most audiences.
> >
> > Signed-off-by: Grant Grundler 
> > ---
> > This patch will likely conflict with:
> >   
> > https://lore.kernel.org/all/20230103165548.570377-1-rajat.khandel...@linux.intel.com/
> >
> > which I'd also like to see upstream. Please let me know to resubmit
> > mine if Rajat's patch lands first. Or feel free to fix up this one.
>
> Yes.  I think it makes sense to separate this into two patches:
>
>   1) Log correctable errors as KERN_INFO instead of KERN_WARNING, and
>   2) Rate-limit correctable error logging.

I'm going to look into your comment below. I'll port Rajat's patch on
top of mine to follow the order you've listed above.

> >  drivers/pci/pcie/aer.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > index f6c24ded134c..e4cf3ec40d66 100644
> > --- a/drivers/pci/pcie/aer.c
> > +++ b/drivers/pci/pcie/aer.c
> > @@ -692,7 +692,7 @@ static void __aer_print_error(struct pci_dev *dev,
> >
> >   if (info->severity == AER_CORRECTABLE) {
> >   strings = aer_correctable_error_string;
> > - level = KERN_WARNING;
> > + level = KERN_INFO;
> >   } else {
> >   strings = aer_uncorrectable_error_string;
> >   level = KERN_ERR;
> > @@ -724,7 +724,7 @@ void aer_print_error(struct pci_dev *dev, struct 
> > aer_err_info *info)
> >   layer = AER_GET_LAYER_ERROR(info->severity, info->status);
> >   agent = AER_GET_AGENT(info->severity, info->status);
> >
> > - level = (info->severity == AER_CORRECTABLE) ? KERN_WARNING : KERN_ERR;
> > + level = (info->severity == AER_CORRECTABLE) ? KERN_INFO : KERN_ERR;
> >
> >   pci_printk(level, dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n",
> >  aer_error_severity_string[info->severity],
>
> Shouldn't we do the same in the cper_print_aer() path?  That path
> currently uses pci_err() and then calls __aer_print_error(), so the
> initial message will always be KERN_ERR, and the decoding done by
> __aer_print_error() will be KERN_INFO (for correctable) or KERN_ERR.

I was completely unaware of this since it's not causing me any
immediate problems. But I agree the message priority should be
consistent for correctable errors.

> Seems like a shame to do the same test in three places, but would
> require a little more refactoring to avoid that.

I don't mind doing the same test in multiple places. If refactoring
this isn't straight forward, I'll leave the refactoring for someone
more ambitious. :D

cheers,
grant

>
> Bjorn


Re: [PATCH] PCI/AER: correctable error message as KERN_INFO

2023-03-14 Thread Bjorn Helgaas
On Tue, Feb 28, 2023 at 10:04:53PM -0800, Grant Grundler wrote:
> Since correctable errors have been corrected (and counted), the dmesg output
> should not be reported as a warning, but rather as "informational".
> 
> Otherwise, using a certain well known vendor's PCIe parts in a USB4 docking
> station, the dmesg buffer can be spammed with correctable errors, 717 bytes
> per instance, potentially many MB per day.
> 
> Given the "WARN" priority, these messages have already confused the typical
> user that stumbles across them, support staff (triaging feedback reports),
> and more than a few linux kernel devs. Changing to INFO will hide these
> messages from most audiences.
> 
> Signed-off-by: Grant Grundler 
> ---
> This patch will likely conflict with:
>   
> https://lore.kernel.org/all/20230103165548.570377-1-rajat.khandel...@linux.intel.com/
> 
> which I'd also like to see upstream. Please let me know to resubmit
> mine if Rajat's patch lands first. Or feel free to fix up this one.

Yes.  I think it makes sense to separate this into two patches:

  1) Log correctable errors as KERN_INFO instead of KERN_WARNING, and

  2) Rate-limit correctable error logging.

>  drivers/pci/pcie/aer.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index f6c24ded134c..e4cf3ec40d66 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -692,7 +692,7 @@ static void __aer_print_error(struct pci_dev *dev,
>  
>   if (info->severity == AER_CORRECTABLE) {
>   strings = aer_correctable_error_string;
> - level = KERN_WARNING;
> + level = KERN_INFO;
>   } else {
>   strings = aer_uncorrectable_error_string;
>   level = KERN_ERR;
> @@ -724,7 +724,7 @@ void aer_print_error(struct pci_dev *dev, struct 
> aer_err_info *info)
>   layer = AER_GET_LAYER_ERROR(info->severity, info->status);
>   agent = AER_GET_AGENT(info->severity, info->status);
>  
> - level = (info->severity == AER_CORRECTABLE) ? KERN_WARNING : KERN_ERR;
> + level = (info->severity == AER_CORRECTABLE) ? KERN_INFO : KERN_ERR;
>  
>   pci_printk(level, dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n",
>  aer_error_severity_string[info->severity],

Shouldn't we do the same in the cper_print_aer() path?  That path
currently uses pci_err() and then calls __aer_print_error(), so the
initial message will always be KERN_ERR, and the decoding done by
__aer_print_error() will be KERN_INFO (for correctable) or KERN_ERR.

Seems like a shame to do the same test in three places, but would
require a little more refactoring to avoid that.

Bjorn


Re: [PATCH] PCI/AER: correctable error message as KERN_INFO

2023-03-08 Thread Grant Grundler
On Wed, Mar 8, 2023 at 12:18 PM Bjorn Helgaas  wrote:
>
> On Wed, Mar 08, 2023 at 12:00:48PM -0800, Grant Grundler wrote:
> > Ping? Did I miss an email or other work that this patch collides with?
>
> Nope, we typically make topic branches based on -rc1, so not much
> happens during the merge window.  -rc1 was tagged Sunday, so things
> will start appearing in -next soon.

Ah ok! Thanks for clarifying Bjorn!

cheers,
grant

>
> Bjorn
>
> > On Tue, Feb 28, 2023 at 10:05 PM Grant Grundler  
> > wrote:
> > >
> > > Since correctable errors have been corrected (and counted), the dmesg 
> > > output
> > > should not be reported as a warning, but rather as "informational".
> > >
> > > Otherwise, using a certain well known vendor's PCIe parts in a USB4 
> > > docking
> > > station, the dmesg buffer can be spammed with correctable errors, 717 
> > > bytes
> > > per instance, potentially many MB per day.
> > >
> > > Given the "WARN" priority, these messages have already confused the 
> > > typical
> > > user that stumbles across them, support staff (triaging feedback reports),
> > > and more than a few linux kernel devs. Changing to INFO will hide these
> > > messages from most audiences.
> > >
> > > Signed-off-by: Grant Grundler 
> > > ---
> > > This patch will likely conflict with:
> > >   
> > > https://lore.kernel.org/all/20230103165548.570377-1-rajat.khandel...@linux.intel.com/
> > >
> > > which I'd also like to see upstream. Please let me know to resubmit mine 
> > > if Rajat's patch lands first. Or feel free to fix up this one.
> > >
> > >  drivers/pci/pcie/aer.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > > index f6c24ded134c..e4cf3ec40d66 100644
> > > --- a/drivers/pci/pcie/aer.c
> > > +++ b/drivers/pci/pcie/aer.c
> > > @@ -692,7 +692,7 @@ static void __aer_print_error(struct pci_dev *dev,
> > >
> > > if (info->severity == AER_CORRECTABLE) {
> > > strings = aer_correctable_error_string;
> > > -   level = KERN_WARNING;
> > > +   level = KERN_INFO;
> > > } else {
> > > strings = aer_uncorrectable_error_string;
> > > level = KERN_ERR;
> > > @@ -724,7 +724,7 @@ void aer_print_error(struct pci_dev *dev, struct 
> > > aer_err_info *info)
> > > layer = AER_GET_LAYER_ERROR(info->severity, info->status);
> > > agent = AER_GET_AGENT(info->severity, info->status);
> > >
> > > -   level = (info->severity == AER_CORRECTABLE) ? KERN_WARNING : 
> > > KERN_ERR;
> > > +   level = (info->severity == AER_CORRECTABLE) ? KERN_INFO : 
> > > KERN_ERR;
> > >
> > > pci_printk(level, dev, "PCIe Bus Error: severity=%s, type=%s, 
> > > (%s)\n",
> > >aer_error_severity_string[info->severity],
> > > --
> > > 2.39.2.722.g9855ee24e9-goog
> > >


Re: [PATCH] PCI/AER: correctable error message as KERN_INFO

2023-03-08 Thread Bjorn Helgaas
On Wed, Mar 08, 2023 at 12:00:48PM -0800, Grant Grundler wrote:
> Ping? Did I miss an email or other work that this patch collides with?

Nope, we typically make topic branches based on -rc1, so not much
happens during the merge window.  -rc1 was tagged Sunday, so things
will start appearing in -next soon.

Bjorn

> On Tue, Feb 28, 2023 at 10:05 PM Grant Grundler  wrote:
> >
> > Since correctable errors have been corrected (and counted), the dmesg output
> > should not be reported as a warning, but rather as "informational".
> >
> > Otherwise, using a certain well known vendor's PCIe parts in a USB4 docking
> > station, the dmesg buffer can be spammed with correctable errors, 717 bytes
> > per instance, potentially many MB per day.
> >
> > Given the "WARN" priority, these messages have already confused the typical
> > user that stumbles across them, support staff (triaging feedback reports),
> > and more than a few linux kernel devs. Changing to INFO will hide these
> > messages from most audiences.
> >
> > Signed-off-by: Grant Grundler 
> > ---
> > This patch will likely conflict with:
> >   
> > https://lore.kernel.org/all/20230103165548.570377-1-rajat.khandel...@linux.intel.com/
> >
> > which I'd also like to see upstream. Please let me know to resubmit mine if 
> > Rajat's patch lands first. Or feel free to fix up this one.
> >
> >  drivers/pci/pcie/aer.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > index f6c24ded134c..e4cf3ec40d66 100644
> > --- a/drivers/pci/pcie/aer.c
> > +++ b/drivers/pci/pcie/aer.c
> > @@ -692,7 +692,7 @@ static void __aer_print_error(struct pci_dev *dev,
> >
> > if (info->severity == AER_CORRECTABLE) {
> > strings = aer_correctable_error_string;
> > -   level = KERN_WARNING;
> > +   level = KERN_INFO;
> > } else {
> > strings = aer_uncorrectable_error_string;
> > level = KERN_ERR;
> > @@ -724,7 +724,7 @@ void aer_print_error(struct pci_dev *dev, struct 
> > aer_err_info *info)
> > layer = AER_GET_LAYER_ERROR(info->severity, info->status);
> > agent = AER_GET_AGENT(info->severity, info->status);
> >
> > -   level = (info->severity == AER_CORRECTABLE) ? KERN_WARNING : 
> > KERN_ERR;
> > +   level = (info->severity == AER_CORRECTABLE) ? KERN_INFO : KERN_ERR;
> >
> > pci_printk(level, dev, "PCIe Bus Error: severity=%s, type=%s, 
> > (%s)\n",
> >aer_error_severity_string[info->severity],
> > --
> > 2.39.2.722.g9855ee24e9-goog
> >


Re: [PATCH] PCI/AER: correctable error message as KERN_INFO

2023-03-08 Thread Grant Grundler
Ping? Did I miss an email or other work that this patch collides with?

cheers,
grant

On Tue, Feb 28, 2023 at 10:05 PM Grant Grundler  wrote:
>
> Since correctable errors have been corrected (and counted), the dmesg output
> should not be reported as a warning, but rather as "informational".
>
> Otherwise, using a certain well known vendor's PCIe parts in a USB4 docking
> station, the dmesg buffer can be spammed with correctable errors, 717 bytes
> per instance, potentially many MB per day.
>
> Given the "WARN" priority, these messages have already confused the typical
> user that stumbles across them, support staff (triaging feedback reports),
> and more than a few linux kernel devs. Changing to INFO will hide these
> messages from most audiences.
>
> Signed-off-by: Grant Grundler 
> ---
> This patch will likely conflict with:
>   
> https://lore.kernel.org/all/20230103165548.570377-1-rajat.khandel...@linux.intel.com/
>
> which I'd also like to see upstream. Please let me know to resubmit mine if 
> Rajat's patch lands first. Or feel free to fix up this one.
>
>  drivers/pci/pcie/aer.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index f6c24ded134c..e4cf3ec40d66 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -692,7 +692,7 @@ static void __aer_print_error(struct pci_dev *dev,
>
> if (info->severity == AER_CORRECTABLE) {
> strings = aer_correctable_error_string;
> -   level = KERN_WARNING;
> +   level = KERN_INFO;
> } else {
> strings = aer_uncorrectable_error_string;
> level = KERN_ERR;
> @@ -724,7 +724,7 @@ void aer_print_error(struct pci_dev *dev, struct 
> aer_err_info *info)
> layer = AER_GET_LAYER_ERROR(info->severity, info->status);
> agent = AER_GET_AGENT(info->severity, info->status);
>
> -   level = (info->severity == AER_CORRECTABLE) ? KERN_WARNING : KERN_ERR;
> +   level = (info->severity == AER_CORRECTABLE) ? KERN_INFO : KERN_ERR;
>
> pci_printk(level, dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n",
>aer_error_severity_string[info->severity],
> --
> 2.39.2.722.g9855ee24e9-goog
>


[PATCH] PCI/AER: correctable error message as KERN_INFO

2023-03-01 Thread Grant Grundler
Since correctable errors have been corrected (and counted), the dmesg output
should not be reported as a warning, but rather as "informational".

Otherwise, using a certain well known vendor's PCIe parts in a USB4 docking
station, the dmesg buffer can be spammed with correctable errors, 717 bytes
per instance, potentially many MB per day.

Given the "WARN" priority, these messages have already confused the typical
user that stumbles across them, support staff (triaging feedback reports),
and more than a few linux kernel devs. Changing to INFO will hide these
messages from most audiences.

Signed-off-by: Grant Grundler 
---
This patch will likely conflict with:
  
https://lore.kernel.org/all/20230103165548.570377-1-rajat.khandel...@linux.intel.com/

which I'd also like to see upstream. Please let me know to resubmit mine if 
Rajat's patch lands first. Or feel free to fix up this one.

 drivers/pci/pcie/aer.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index f6c24ded134c..e4cf3ec40d66 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -692,7 +692,7 @@ static void __aer_print_error(struct pci_dev *dev,
 
if (info->severity == AER_CORRECTABLE) {
strings = aer_correctable_error_string;
-   level = KERN_WARNING;
+   level = KERN_INFO;
} else {
strings = aer_uncorrectable_error_string;
level = KERN_ERR;
@@ -724,7 +724,7 @@ void aer_print_error(struct pci_dev *dev, struct 
aer_err_info *info)
layer = AER_GET_LAYER_ERROR(info->severity, info->status);
agent = AER_GET_AGENT(info->severity, info->status);
 
-   level = (info->severity == AER_CORRECTABLE) ? KERN_WARNING : KERN_ERR;
+   level = (info->severity == AER_CORRECTABLE) ? KERN_INFO : KERN_ERR;
 
pci_printk(level, dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n",
   aer_error_severity_string[info->severity],
-- 
2.39.2.722.g9855ee24e9-goog