
Re: [PATCH 02/12] pci/p2pdma: Don't initialise page refcount to one

2024-09-11 Thread Bjorn Helgaas

On Wed, Sep 11, 2024 at 11:07:51AM +1000, Alistair Popple wrote:
> 
> >> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> >> index 4f47a13..210b9f4 100644
> >> --- a/drivers/pci/p2pdma.c
> >> +++ b/drivers/pci/p2pdma.c
> >> @@ -129,6 +129,12 @@ static int p2pmem_alloc_mmap(struct file *filp, struct kobject *kobj,
> >>}
> >>  
> >>/*
> >> +   * Initialise the refcount for the freshly allocated page. As we have
> >> +   * just allocated the page no one else should be using it.
> >> +   */
> >> +  set_page_count(virt_to_page(kaddr), 1);
> >
> > No doubt the subject line is true in some overall context, but it does
> > seem to say the opposite of what happens here.
> 
> Fair. It made sense to me from the mm context I was coming from (it was
> being initialised to 1 there) but not overall. Something like "move page
> refcount initialisation to p2pdma driver" would make more sense?

Definitely would, thanks.

Re: [PATCH 02/12] pci/p2pdma: Don't initialise page refcount to one

2024-09-10 Thread Bjorn Helgaas

In subject:

  PCI/P2PDMA: ...

would match previous history.

On Tue, Sep 10, 2024 at 02:14:27PM +1000, Alistair Popple wrote:
> The reference counts for ZONE_DEVICE private pages should be
> initialised by the driver when the page is actually allocated by the
> driver allocator, not when they are first created. This is currently
> the case for MEMORY_DEVICE_PRIVATE and MEMORY_DEVICE_COHERENT pages
> but not MEMORY_DEVICE_PCI_P2PDMA pages so fix that up.
> 
> Signed-off-by: Alistair Popple 
> ---
>  drivers/pci/p2pdma.c |  6 ++
>  mm/memremap.c| 17 +
>  mm/mm_init.c | 22 ++
>  3 files changed, 37 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index 4f47a13..210b9f4 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -129,6 +129,12 @@ static int p2pmem_alloc_mmap(struct file *filp, struct kobject *kobj,
>   }
>  
>   /*
> +  * Initialise the refcount for the freshly allocated page. As we have
> +  * just allocated the page no one else should be using it.
> +  */
> + set_page_count(virt_to_page(kaddr), 1);

No doubt the subject line is true in some overall context, but it does
seem to say the opposite of what happens here.

Bjorn
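
For readers following the refcount discussion, here is a minimal userspace sketch of the idea in the commit message: pages exist with a zero refcount until the driver allocator hands one out, and the allocator is the place that sets the count to one. Everything here (struct model_page, model_alloc_page(), model_put_page(), POOL_PAGES) is a hypothetical stand-in, not the p2pdma code.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_PAGES 4

/* Hypothetical stand-in for struct page: only the reference count matters. */
struct model_page {
	int refcount;
};

/* Pages are "created" up front with no references held (count is zero). */
static struct model_page pool[POOL_PAGES];

/* Driver allocator: hand out a free page and initialise its refcount to one. */
static struct model_page *model_alloc_page(void)
{
	for (size_t i = 0; i < POOL_PAGES; i++) {
		if (pool[i].refcount == 0) {
			/* Freshly allocated, so no one else can be using it. */
			pool[i].refcount = 1;
			return &pool[i];
		}
	}
	return NULL;	/* pool exhausted */
}

/* Drop one reference; the page is free again once the count reaches zero. */
static void model_put_page(struct model_page *page)
{
	assert(page->refcount > 0);
	page->refcount--;
}
```

In this model the count is set in exactly one place, the allocator, which is the behaviour a subject like "move page refcount initialisation to p2pdma driver" would describe.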

Re: [PATCH v5 7/7] PCI: Create helper to print TLP Header and Prefix Log

2024-09-02 Thread Bjorn Helgaas

On Mon, Sep 02, 2024 at 08:20:41PM +0300, Ilpo Järvinen wrote:
> On Fri, 30 Aug 2024, Bjorn Helgaas wrote:
> 
> > On Tue, May 14, 2024 at 02:31:09PM +0300, Ilpo Järvinen wrote:
> > > Add pcie_print_tlp_log() helper to print TLP Header and Prefix Log.
> > > Print End-End Prefixes only if they are non-zero.
> > > 
> > > Consolidate the few places which currently print TLP using custom
> > > formatting.
> > > 
> > > The first attempt used pr_cont() instead of building a string first but
> > > it turns out pr_cont() is not compatible with pci_err() and prints on a
> > > separate line. When I asked about this, Andy Shevchenko suggested
> > > pr_cont() should not be used in the first place (to eventually get rid
> > > of it) so pr_cont() is now replaced with building the string first.
> > > 
> > > Signed-off-by: Ilpo Järvinen 
> > > ---
> > >  drivers/pci/pci.h  |  2 ++
> > >  drivers/pci/pcie/aer.c | 10 ++
> > >  drivers/pci/pcie/dpc.c |  5 +
> > >  drivers/pci/pcie/tlp.c | 31 +++
> > >  4 files changed, 36 insertions(+), 12 deletions(-)
> > > 
> > > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > > index 7afdd71f9026..45083e62892c 100644
> > > --- a/drivers/pci/pci.h
> > > +++ b/drivers/pci/pci.h
> > > @@ -423,6 +423,8 @@ void aer_print_error(struct pci_dev *dev, struct 
> > > aer_err_info *info);
> > >  int pcie_read_tlp_log(struct pci_dev *dev, int where, int where2,
> > > unsigned int tlp_len, struct pcie_tlp_log *log);
> > >  unsigned int aer_tlp_log_len(struct pci_dev *dev);
> > > +void pcie_print_tlp_log(const struct pci_dev *dev,
> > > + const struct pcie_tlp_log *log, const char *pfx);
> > >  #endif   /* CONFIG_PCIEAER */
> > >  
> > >  #ifdef CONFIG_PCIEPORTBUS
> > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > > index ecc1dea5a208..efb9e728fe94 100644
> > > --- a/drivers/pci/pcie/aer.c
> > > +++ b/drivers/pci/pcie/aer.c
> > > @@ -664,12 +664,6 @@ static void pci_rootport_aer_stats_incr(struct 
> > > pci_dev *pdev,
> > >   }
> > >  }
> > >  
> > > > -static void __print_tlp_header(struct pci_dev *dev, struct pcie_tlp_log *t)
> > > -{
> > > - pci_err(dev, "  TLP Header: %08x %08x %08x %08x\n",
> > > - t->dw[0], t->dw[1], t->dw[2], t->dw[3]);
> > > -}
> > > -
> > >  static void __aer_print_error(struct pci_dev *dev,
> > > struct aer_err_info *info)
> > >  {
> > > @@ -724,7 +718,7 @@ void aer_print_error(struct pci_dev *dev, struct 
> > > aer_err_info *info)
> > >   __aer_print_error(dev, info);
> > >  
> > >   if (info->tlp_header_valid)
> > > - __print_tlp_header(dev, &info->tlp);
> > > + pcie_print_tlp_log(dev, &info->tlp, "  ");
> > 
> > I see you went to some trouble to preserve the previous output, down
> > to the number of spaces prefixing it.
> > 
> > > But more than the leading spaces, I think what people will notice is
> > > that previously AER and DPC dmesgs contained the "AER: " or "DPC: "
> > > prefixes implicitly added by the dev_fmt definitions [1], where now
> > > IIUC they won't.
> > 
> > I think adding dev_fmt("") here should take care of that, e.g.,
> > 
> >   pcie_print_tlp_log(dev, &info->tlp, dev_fmt(""));
> > 
> > [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1990272
> 
> Okay, I'll fix it and resend but looking into that output, it doesn't 
> look very consistent when it comes to prefixes as the lines in between do 
> not start with "AER: " either. Perhaps those lines should be changed as 
> well?

True.  Possibility for future patches.
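
A small userspace sketch of why passing dev_fmt("") as the prefix restores the tag: dev_fmt() pastes a string literal in front of its argument, so an empty argument leaves just the tag. The dev_fmt() definition below follows the usual per-file pattern; model_format_tlp_header() is a hypothetical stand-in for the helper under review, not its real body.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Per-file convention: dev_fmt() prepends a subsystem tag to a format string. */
#define dev_fmt(fmt) "AER: " fmt

/*
 * Hypothetical stand-in for the helper: format the four TLP header dwords
 * into buf, preceded by the caller-supplied prefix. Returns the number of
 * characters written (as snprintf() does).
 */
static int model_format_tlp_header(char *buf, size_t len, const char *pfx,
				   const unsigned int dw[4])
{
	assert(buf != NULL && pfx != NULL);
	return snprintf(buf, len, "%sTLP Header: %08x %08x %08x %08x",
			pfx, dw[0], dw[1], dw[2], dw[3]);
}
```

With a prefix of "  " the line starts with two spaces and no tag; with dev_fmt("") the same call produces a line starting with "AER: ", because the macro expands to the literal "AER: " "".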

Re: [PATCH v5 7/7] PCI: Create helper to print TLP Header and Prefix Log

2024-08-30 Thread Bjorn Helgaas

On Tue, May 14, 2024 at 02:31:09PM +0300, Ilpo Järvinen wrote:
> Add pcie_print_tlp_log() helper to print TLP Header and Prefix Log.
> Print End-End Prefixes only if they are non-zero.
> 
> Consolidate the few places which currently print TLP using custom
> formatting.
> 
> The first attempt used pr_cont() instead of building a string first but
> it turns out pr_cont() is not compatible with pci_err() and prints on a
> separate line. When I asked about this, Andy Shevchenko suggested
> pr_cont() should not be used in the first place (to eventually get rid
> of it) so pr_cont() is now replaced with building the string first.
> 
> Signed-off-by: Ilpo Järvinen 
> ---
>  drivers/pci/pci.h  |  2 ++
>  drivers/pci/pcie/aer.c | 10 ++
>  drivers/pci/pcie/dpc.c |  5 +
>  drivers/pci/pcie/tlp.c | 31 +++
>  4 files changed, 36 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 7afdd71f9026..45083e62892c 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -423,6 +423,8 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info);
>  int pcie_read_tlp_log(struct pci_dev *dev, int where, int where2,
> unsigned int tlp_len, struct pcie_tlp_log *log);
>  unsigned int aer_tlp_log_len(struct pci_dev *dev);
> +void pcie_print_tlp_log(const struct pci_dev *dev,
> + const struct pcie_tlp_log *log, const char *pfx);
>  #endif   /* CONFIG_PCIEAER */
>  
>  #ifdef CONFIG_PCIEPORTBUS
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index ecc1dea5a208..efb9e728fe94 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -664,12 +664,6 @@ static void pci_rootport_aer_stats_incr(struct pci_dev *pdev,
>   }
>  }
>  
> -static void __print_tlp_header(struct pci_dev *dev, struct pcie_tlp_log *t)
> -{
> - pci_err(dev, "  TLP Header: %08x %08x %08x %08x\n",
> - t->dw[0], t->dw[1], t->dw[2], t->dw[3]);
> -}
> -
>  static void __aer_print_error(struct pci_dev *dev,
> struct aer_err_info *info)
>  {
> @@ -724,7 +718,7 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
>   __aer_print_error(dev, info);
>  
>   if (info->tlp_header_valid)
> - __print_tlp_header(dev, &info->tlp);
> + pcie_print_tlp_log(dev, &info->tlp, "  ");

I see you went to some trouble to preserve the previous output, down
to the number of spaces prefixing it.

But more than the leading spaces, I think what people will notice is
that previously AER and DPC dmesgs contained the "AER: " or "DPC: "
prefixes implicitly added by the dev_fmt definitions [1], where now
IIUC they won't.

I think adding dev_fmt("") here should take care of that, e.g.,

  pcie_print_tlp_log(dev, &info->tlp, dev_fmt(""));

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1990272

>  out:
>   if (info->id && info->error_dev_num > 1 && info->id == id)
> @@ -796,7 +790,7 @@ void pci_print_aer(struct pci_dev *dev, int aer_severity,
>   aer->uncor_severity);
>  
>   if (tlp_header_valid)
> - __print_tlp_header(dev, &aer->header_log);
> + pcie_print_tlp_log(dev, &aer->header_log, "  ");
>  
>   trace_aer_event(dev_name(&dev->dev), (status & ~mask),
>   aer_severity, tlp_header_valid, &aer->header_log);
> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> index 5056cc6961ec..598f74384471 100644
> --- a/drivers/pci/pcie/dpc.c
> +++ b/drivers/pci/pcie/dpc.c
> @@ -220,10 +220,7 @@ static void dpc_process_rp_pio_error(struct pci_dev *pdev)
>   pcie_read_tlp_log(pdev, cap + PCI_EXP_DPC_RP_PIO_HEADER_LOG,
> cap + PCI_EXP_DPC_RP_PIO_TLPPREFIX_LOG,
> dpc_tlp_log_len(pdev), &tlp_log);
> - pci_err(pdev, "TLP Header: %#010x %#010x %#010x %#010x\n",
> - tlp_log.dw[0], tlp_log.dw[1], tlp_log.dw[2], tlp_log.dw[3]);
> - for (i = 0; i < pdev->dpc_rp_log_size - 5; i++)
> - pci_err(pdev, "TLP Prefix Header: dw%d, %#010x\n", i, tlp_log.prefix[i]);
> + pcie_print_tlp_log(pdev, &tlp_log, "");
>  
>   if (pdev->dpc_rp_log_size < 5)
>   goto clear_status;
> diff --git a/drivers/pci/pcie/tlp.c b/drivers/pci/pcie/tlp.c
> index def9dd7b73e8..097ac8514e96 100644
> --- a/drivers/pci/pcie/tlp.c
> +++ b/drivers/pci/pcie/tlp.c
> @@ -6,6 +6,7 @@
>   */
>  
>  #include 
> +#include 
>  #include 
>  #include 
>  
> @@ -76,3 +77,33 @@ int pcie_read_tlp_log(struct pci_dev *dev, int where, int where2,
>  
>   return 0;
>  }
> +
> +/**
> + * pcie_print_tlp_log - Print TLP Header / Prefix Log contents
> + * @dev: PCIe device
> + * @log: TLP Log structure
> + * @pfx: String prefix (for print out indentation)
> + *
> + * Prints TLP Header and Prefix Log information held by @log.
> + */
> +void pcie_print_tlp_log(const struct pci_dev *dev,
> +

Re: [PATCH 2/2] ACPI: extlog: Trace CPER PCI Express Error Section

2024-08-06 Thread Bjorn Helgaas

On Mon, May 27, 2024 at 04:43:41PM +0200, Fabio M. De Francesco wrote:
> Currently, extlog_print() (ELOG) only reports CPER PCIe section (UEFI
> v2.10, Appendix N.2.7) to the kernel log via print_extlog_rcd(). Instead,
> the similar ghes_do_proc() (GHES) prints to kernel log and calls
> pci_print_aer() to report via the ftrace infrastructure.
> 
> Add support to report the CPER PCIe Error section also via the ftrace
> infrastructure by calling pci_print_aer() to make ELOG act consistently
> with GHES.
> 
> Cc: Dan Williams 
> Signed-off-by: Fabio M. De Francesco 
> ---
>  drivers/acpi/acpi_extlog.c | 30 ++
>  drivers/pci/pcie/aer.c |  2 +-
>  include/linux/aer.h| 13 +++--
>  3 files changed, 42 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
> index e025ae390737..007ce96f8672 100644
> --- a/drivers/acpi/acpi_extlog.c
> +++ b/drivers/acpi/acpi_extlog.c
> @@ -131,6 +131,32 @@ static int print_extlog_rcd(const char *pfx,
>   return 1;
>  }
>  
> +static void extlog_print_pcie(struct cper_sec_pcie *pcie_err,
> +   int severity)
> +{
> + struct aer_capability_regs *aer;
> + struct pci_dev *pdev;
> + unsigned int devfn;
> + unsigned int bus;
> + int aer_severity;
> + int domain;
> +
> + if (pcie_err->validation_bits & CPER_PCIE_VALID_DEVICE_ID &&
> + pcie_err->validation_bits & CPER_PCIE_VALID_AER_INFO) {
> + aer_severity = cper_severity_to_aer(severity);
> + aer = (struct aer_capability_regs *)pcie_err->aer_info;
> + domain = pcie_err->device_id.segment;
> + bus = pcie_err->device_id.bus;
> + devfn = PCI_DEVFN(pcie_err->device_id.device,
> +   pcie_err->device_id.function);
> + pdev = pci_get_domain_bus_and_slot(domain, bus, devfn);
> + if (!pdev)
> + return;
> + pci_print_aer(pdev, aer_severity, aer);
> + pci_dev_put(pdev);
> + }

I'm 100% in favor of making error reporting work and look the same
across GHES and ELOG.  But I do have to gripe a bit...

It's already unfortunate that GHES and the native AER handling are
separate paths that lead to the same place (__aer_print_error()).

I'm sorry that we need to add a third path that again does
fundamentally the same thing.  The fact that they're separate means
all the design, reviewing, testing, and maintenance effort is diluted,
and error handling always gets too little love in the first place.
I think this is a recipe for confusion.

  ghes_do_proc                                    # GHES
    apei_estatus_for_each_section
      ...
      if (guid_equal(sec_type, &CPER_SEC_PCIE))
        ghes_handle_aer
          cper_severity_to_aer
          aer_recover_queue
            kfifo_in_spinlocked(&aer_recover_ring)  # add to queue
  aer_recover_work_func                           # another thread
    kfifo_get(&aer_recover_ring)                  # remove from queue
    pci_print_aer
      __aer_print_error                           <---

  aer_irq                                         # native AER
    kfifo_put(&aer_fifo)                          # add to queue
  aer_isr                                         # another thread
    kfifo_get(&aer_fifo)                          # remove from queue
    ...
    aer_isr_one_error
      aer_process_err_devices
        aer_print_error
          __aer_print_error                       <---

  extlog_print                                    # extlog (x86 only)
    apei_estatus_for_each_section
      ...
      if (guid_equal(sec_type, &CPER_SEC_PCIE))
        extlog_print_pcie
          cper_severity_to_aer
          pci_get_domain_bus_and_slot
          pci_print_aer
            __aer_print_error                     <---

And we also have CXL paths that lead to __aer_print_error(), although
it seems like they at least start in the native AER (and maybe GHES?)
path and branch out somewhere.  My head is spinning.

Do I *object* to this patch?  No, not really; it's a trivial change in
drivers/pci/, and Rafael can add my

  Acked-by: Bjorn Helgaas 

as needed.  But I am afraid we're making ourselves a maintenance
headache.
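
To make the shape of the three paths concrete, here is a toy single-threaded model: two paths enqueue a record and a worker later dequeues and prints it, while the third calls the printer directly. The ring is only loosely modelled on a kfifo, and every name (model_ring, model_worker(), model_extlog(), model_print_error()) is hypothetical; locking and threads are left out on purpose.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 8	/* power of two so the index mask works */

/* Hypothetical error record; the real paths carry far more state. */
struct model_err {
	int severity;
};

/* Minimal single-producer/single-consumer ring. */
struct model_ring {
	struct model_err slot[RING_SIZE];
	unsigned int head;	/* count of records ever written */
	unsigned int tail;	/* count of records ever read */
};

static int printed;	/* how many records reached the common printer */

/* Stand-in for the shared printing routine that all three paths end in. */
static void model_print_error(const struct model_err *err)
{
	(void)err;
	printed++;
}

static bool model_ring_put(struct model_ring *r, struct model_err err)
{
	if (r->head - r->tail == RING_SIZE)
		return false;	/* full: the record is dropped */
	r->slot[r->head++ & (RING_SIZE - 1)] = err;
	return true;
}

static bool model_ring_get(struct model_ring *r, struct model_err *err)
{
	assert(r->head - r->tail <= RING_SIZE);
	if (r->head == r->tail)
		return false;	/* empty */
	*err = r->slot[r->tail++ & (RING_SIZE - 1)];
	return true;
}

/* Queued paths (GHES, native AER): the handler enqueues, a worker prints. */
static void model_worker(struct model_ring *r)
{
	struct model_err err;

	while (model_ring_get(r, &err))
		model_print_error(&err);
}

/* Direct path (extlog in this patch): print from the notifier itself. */
static void model_extlog(struct model_err err)
{
	model_print_error(&err);
}
```

Even in this toy form there are two separate queues plus a direct caller feeding one printer, which is the duplication being grumbled about.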

> +}
> +
>  static int extlog_print(struct notifier_block *nb, unsigned long val,
>   void *data)
>  {
> @@ -179,6 +205,10 @@ static int extlog_print(struct notifier_block *nb, unsigned long val,
>   if (gdata->error_data_length >= sizeof(*mem))
>   trace_extlog_mem_event(mem, err_seq, fru_id, fru_text,
>

Re: [PATCH 1/2] ACPI: extlog: Trace CPER Non-standard Section Body

2024-08-06 Thread Bjorn Helgaas

On Mon, May 27, 2024 at 04:43:40PM +0200, Fabio M. De Francesco wrote:
> In extlog_print(), trace "Non-standard Section Body" reported by firmware
> to the OS via Common Platform Error Record (CPER) (UEFI v2.10 Appendix N
> 2.3) to add further debug information and so to make ELOG log
> consistently with ghes_do_proc() (GHES).
> 
> Cc: Dan Williams 
> Signed-off-by: Fabio M. De Francesco 
> ---
>  drivers/acpi/acpi_extlog.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
> index f055609d4b64..e025ae390737 100644
> --- a/drivers/acpi/acpi_extlog.c
> +++ b/drivers/acpi/acpi_extlog.c
> @@ -179,6 +179,12 @@ static int extlog_print(struct notifier_block *nb, unsigned long val,
>   if (gdata->error_data_length >= sizeof(*mem))
>   trace_extlog_mem_event(mem, err_seq, fru_id, fru_text,
>  (u8)gdata->error_severity);
> + } else {
> + void *err = acpi_hest_get_payload(gdata);
> +
> + trace_non_standard_event(sec_type, fru_id, fru_text,
> +  gdata->error_severity, err,
> +  gdata->error_data_length);

Kudos for making these two paths more similar.

Not specific to *this* patch, but it's annoying to try to find
tracepoint implementations.  I guess it's
TRACE_EVENT(non_standard_event, ...) in include/ras/ras_event.h.

This has the same prototype as log_non_standard_event(), so
could extlog_print() be made a little bit more like ghes_do_proc() by
using log_non_standard_event() instead of trace_non_standard_event()
directly?

Bjorn

Re: [PATCH v3] PCI: Fix crash during pci_dev hot-unplug on pseries KVM guest

2024-08-06 Thread Bjorn Helgaas

On Sat, Aug 03, 2024 at 12:03:25AM +0530, Amit Machhiwal wrote:
> With CONFIG_PCI_DYNAMIC_OF_NODES [1], a hot-plug and hot-unplug sequence
> of a PCI device attached to a PCI-bridge causes following kernel Oops on
> a pseries KVM guest:

What is unique about pseries here?  There's nothing specific to
pseries in the patch, so I would expect this to be a generic problem
on any arch.

>  RTAS: event: 2, Type: Hotplug Event (229), Severity: 1
>  Kernel attempted to read user page (10ec0048) - exploit attempt? (uid: 0)
>  BUG: Unable to handle kernel data access on read at 0x10ec0048

Weird address.  I would expect NULL or something.  Where did this
non-NULL pointer come from?

>  Faulting instruction address: 0xc12d8728
>  Oops: Kernel access of bad area, sig: 11 [#1]
>  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
> 
>  NIP [c12d8728] __of_changeset_entry_invert+0x10/0x1ac
>  LR [c12da7f0] __of_changeset_revert_entries+0x98/0x180
>  Call Trace:
>  [cbcc3970] [c12daa60] of_changeset_revert+0x58/0xd8
>  [cbcc39c0] [c0d0ed78] of_pci_remove_node+0x74/0xb0
>  [cbcc39f0] [c0cdcfe0] pci_stop_bus_device+0xf4/0x138
>  [cbcc3a30] [c0cdd140] pci_stop_and_remove_bus_device_locked+0x34/0x64
>  [cbcc3a60] [c0cf3780] remove_store+0xf0/0x108
>  [cbcc3ab0] [c0e89e04] dev_attr_store+0x34/0x78
>  [cbcc3ad0] [c07f8dd4] sysfs_kf_write+0x70/0xa4
>  [cbcc3af0] [c07f7248] kernfs_fop_write_iter+0x1d0/0x2e0
>  [cbcc3b40] [c06c9b08] vfs_write+0x27c/0x558
>  [cbcc3bf0] [c06ca168] ksys_write+0x90/0x170
>  [cbcc3c40] [c0033248] system_call_exception+0xf8/0x290
>  [cbcc3e50] [c000d05c] system_call_vectored_common+0x15c/0x2ec
> 
> 
> A git bisect pointed this regression to be introduced via [1] that added
> a mechanism to create device tree nodes for parent PCI bridges when a
> PCI device is hot-plugged.
> 
> The Oops occurs when `pci_stop_dev()` tries to remove a non-existent
> device-tree node associated with a pci_dev that was earlier
> hot-plugged under a PCI bridge. Because the device's header type
> `dev->hdr_type` is 0, the `pci_is_bridge()` check evaluates to false,
> so `of_pci_make_dev_node()` is never called to create a device node.
> Later, in the device node removal path, a memcpy is attempted in
> `__of_changeset_entry_invert()`; since the device node was never
> created, this results in an Oops due to a kernel read access to a bad
> address.

I'm sure this description is 100% correct, but it's at such a low
level that it doesn't really help understand the underlying design
problem.

Will need an ack from Rob.

> To fix this issue, the patch introduces a new flag OF_CREATE_WITH_CSET
> to keep track of device nodes created via `of_pci_make_dev_node()` and
> later attempt to destroy only such device nodes which have this flag
> set.
> 
> [1] commit 407d1a51921e ("PCI: Create device tree node for bridge")
> 
> Fixes: 407d1a51921e ("PCI: Create device tree node for bridge")
> Reported-by: Kowshik Jois B S 
> Signed-off-by: Lizhi Hou 
> Signed-off-by: Amit Machhiwal 
> ---
> Changes since v2:
> * Drop v2 changes and introduce a different approach from Lizhi discussed
>   over the v2 of this patch
> * V2: https://lore.kernel.org/all/20240715080726.2496198-1-amach...@linux.ibm.com/
> Changes since v1:
> * Included Lizhi's suggested changes on V1
> * Fixed below two warnings from Lizhi's changes and rearranged the cleanup
>   part a bit in `of_pci_make_dev_node`
>   drivers/pci/of.c:611:6: warning: no previous prototype for ‘of_pci_free_node’ [-Wmissing-prototypes]
> 611 | void of_pci_free_node(struct device_node *np)
> |  ^~~~
>   drivers/pci/of.c: In function ‘of_pci_make_dev_node’:
>   drivers/pci/of.c:696:1: warning: label ‘out_destroy_cset’ defined but not used [-Wunused-label]
> 696 | out_destroy_cset:
> | ^~~~
> * V1: https://lore.kernel.org/all/20240703141634.2974589-1-amach...@linux.ibm.com/
> 
>  drivers/pci/of.c   | 3 ++-
>  include/linux/of.h | 1 +
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/of.c b/drivers/pci/of.c
> index dacea3fc5128..bc455370143e 100644
> --- a/drivers/pci/of.c
> +++ b/drivers/pci/of.c
> @@ -653,7 +653,7 @@ void of_pci_remove_node(struct pci_dev *pdev)
>   struct device_node *np;
>  
>   np = pci_device_to_OF_node(pdev);
> - if (!np || !of_node_check_flag(np, OF_DYNAMIC))
> + if (!np || !of_node_check_flag(np, OF_CREATE_WITH_CSET))
>   return;
>   pdev->dev.of_node = NULL;

of_pci_remove_node() goes on to call of_changeset_revert() and
of_changeset_destroy().  of_pci_remove_node() has nothing

Re: PCI: Work around PCIe link training failures

2024-08-06 Thread Bjorn Helgaas

On Mon, Aug 05, 2024 at 06:06:59PM -0600, Matthew W Carlis wrote:
> Hello again. I just realized that my first response to this thread two weeks
> ago was not actually starting from the end of the discussion. I hope I found
> it now... Must say sorry for this I am still figuring out how to follow these
> threads.
> I need to ask if we can either revert this patch or modify the quirk to
> run only on the device in question (ASMedia ASM2824). We have now identified
> it as causing devices to get stuck at Gen1 in multiple generations of our
> hardware & across product lines on ports where hot-plug is common. To be a
> little more specific it includes Intel root ports and Broadcom PCIe switch
> ports and also Microchip PCIe switch ports.
> The most common place where we see our systems getting stuck at Gen1 is with
> device power cycling. If a device is powered on and then off quickly then the
> link will of course fail to train & the consequence here is that the port is
> forced to Gen1 forever. Does anybody know why the patch will only remove the
> forced Gen1 speed from the ASMedia device?

Thanks for keeping this thread alive.  I don't know the fix, but it
does seem like this series made the ASMedia ASM2824 work better but
caused regressions elsewhere, so maybe we just need to accept that
ASM2824 is slightly broken and doesn't work as well as it should.

Bjorn

Re: [PATCH v2] PCI: Fix crash during pci_dev hot-unplug on pseries KVM guest

2024-07-25 Thread Bjorn Helgaas

On Thu, Jul 25, 2024 at 11:15:39PM +0530, Amit Machhiwal wrote:
> ...
> The crash in question is a critical issue that we would want to have
> a fix for soon. And while this is still being figured out, is it
> okay to go with the fix I proposed in the V1 of this patch?

v6.10 has been released already, and it will be a couple months before
the v6.11 release.

It looks like the regression is 407d1a51921e, which appeared in v6.6,
almost a year ago, so it's fairly old.

What target are you thinking about for the V1 patch?  I guess if we
add it as a v6.11 post-merge window fix, it might get backported to
stable kernels before v6.11?  But if the plan is to merge the V1 patch
and then polish it again before v6.11 releases, I'm not sure it's
worth the churn.

Bjorn

Re: [PATCH v2] PCI: Fix crash during pci_dev hot-unplug on pseries KVM guest

2024-07-15 Thread Bjorn Helgaas

On Mon, Jul 15, 2024 at 09:20:01AM -0700, Lizhi Hou wrote:
> On 7/15/24 01:07, Amit Machhiwal wrote:
> > With CONFIG_PCI_DYNAMIC_OF_NODES [1], a hot-plug and hot-unplug sequence
> > of a PCI device attached to a PCI-bridge causes following kernel Oops on
> > a pseries KVM guest:
> > 
> >   RTAS: event: 2, Type: Hotplug Event (229), Severity: 1
> > >   Kernel attempted to read user page (10ec0048) - exploit attempt? (uid: 0)
> >   BUG: Unable to handle kernel data access on read at 0x10ec0048
> >   Faulting instruction address: 0xc12d8728
> >   Oops: Kernel access of bad area, sig: 11 [#1]
> >   LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
> > 
> >   NIP [c12d8728] __of_changeset_entry_invert+0x10/0x1ac
> >   LR [c12da7f0] __of_changeset_revert_entries+0x98/0x180
> >   Call Trace:
> >   [cbcc3970] [c12daa60] of_changeset_revert+0x58/0xd8
> >   [cbcc39c0] [c0d0ed78] of_pci_remove_node+0x74/0xb0
> >   [cbcc39f0] [c0cdcfe0] pci_stop_bus_device+0xf4/0x138
> > >   [cbcc3a30] [c0cdd140] pci_stop_and_remove_bus_device_locked+0x34/0x64
> >   [cbcc3a60] [c0cf3780] remove_store+0xf0/0x108
> >   [cbcc3ab0] [c0e89e04] dev_attr_store+0x34/0x78
> >   [cbcc3ad0] [c07f8dd4] sysfs_kf_write+0x70/0xa4
> >   [cbcc3af0] [c07f7248] kernfs_fop_write_iter+0x1d0/0x2e0
> >   [cbcc3b40] [c06c9b08] vfs_write+0x27c/0x558
> >   [cbcc3bf0] [c06ca168] ksys_write+0x90/0x170
> >   [cbcc3c40] [c0033248] system_call_exception+0xf8/0x290
> > >   [cbcc3e50] [c000d05c] system_call_vectored_common+0x15c/0x2ec
> > 
> > 
> > A git bisect pointed this regression to be introduced via [1] that added
> > a mechanism to create device tree nodes for parent PCI bridges when a
> > PCI device is hot-plugged.
> > 
> > > The Oops occurs when `pci_stop_dev()` tries to remove a non-existent
> > > device-tree node associated with a pci_dev that was earlier
> > > hot-plugged under a PCI bridge. Because the device's header type
> > > `dev->hdr_type` is 0, the `pci_is_bridge()` check evaluates to false,
> > > so `of_pci_make_dev_node()` is never called to create a device node.
> > > Later, in the device node removal path, a memcpy is attempted in
> > > `__of_changeset_entry_invert()`; since the device node was never
> > > created, this results in an Oops due to a kernel read access to a bad
> > > address.
> > 
> > To fix this issue, the patch updates `of_changeset_create_node()` to
> > allocate a new node only when the device node doesn't exist and init it
> > in case it does already. Also, introduce `of_pci_free_node()` to be
> > called to only revert and destroy the changeset device node that was
> > created via a call to `of_changeset_create_node()`.
> > 
> > [1] commit 407d1a51921e ("PCI: Create device tree node for bridge")
> > 
> > Fixes: 407d1a51921e ("PCI: Create device tree node for bridge")
> > Reported-by: Kowshik Jois B S 
> > Signed-off-by: Lizhi Hou 
> > Signed-off-by: Amit Machhiwal 
> > ---
> > Changes since v1:
> >  * Included Lizhi's suggested changes on V1
> > >  * Fixed below two warnings from Lizhi's changes and rearranged the cleanup
> > >    part a bit in `of_pci_make_dev_node`
> > > drivers/pci/of.c:611:6: warning: no previous prototype for ‘of_pci_free_node’ [-Wmissing-prototypes]
> > >   611 | void of_pci_free_node(struct device_node *np)
> > >   |  ^~~~
> > > drivers/pci/of.c: In function ‘of_pci_make_dev_node’:
> > > drivers/pci/of.c:696:1: warning: label ‘out_destroy_cset’ defined but not used [-Wunused-label]
> > >   696 | out_destroy_cset:
> > >   | ^~~~
> > >  * V1: https://lore.kernel.org/all/20240703141634.2974589-1-amach...@linux.ibm.com/
> > 
> >   drivers/of/dynamic.c  | 16 
> >   drivers/of/unittest.c |  2 +-
> >   drivers/pci/bus.c |  3 +--
> >   drivers/pci/of.c  | 39 ++-
> >   drivers/pci/pci.h |  2 ++
> >   include/linux/of.h|  1 +
> >   6 files changed, 43 insertions(+), 20 deletions(-)
> > 
> > diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
> > index dda6092e6d3a..9bba5e82a384 100644
> > --- a/drivers/of/dynamic.c
> > +++ b/drivers/of/dynamic.c
> > @@ -492,21 +492,29 @@ struct device_node *__of_node_dup(const struct 
> > device_node *np,
> >* a given changeset.
> >*
> >* @ocs: Pointer to changeset
> > > + * @np: Pointer to device node. If null, allocate a new node. If not, init an
> > + * existing one.
> >* @parent: Pointer to parent device node
> >* @full_name: Node full name
> >*
> >* Return: Pointer to the created device node or NULL in case of an error.
> >*/
> >   struct device_node *of_changeset_create_node(struct of_changeset *ocs,
>

Re: [PATCH] PCI: Fix crash during pci_dev hot-unplug on pseries KVM guest

2024-07-05 Thread Bjorn Helgaas

[+cc Lukas, FYI]

On Wed, Jul 03, 2024 at 07:46:34PM +0530, Amit Machhiwal wrote:
> With CONFIG_PCI_DYNAMIC_OF_NODES [1], a hot-plug and hot-unplug sequence
> of a PCI device attached to a PCI-bridge causes following kernel Oops on
> a pseries KVM guest:
> 
>  RTAS: event: 2, Type: Hotplug Event (229), Severity: 1
>  Kernel attempted to read user page (10ec0048) - exploit attempt? (uid: 0)
>  BUG: Unable to handle kernel data access on read at 0x10ec0048
>  Faulting instruction address: 0xc12d8728
>  Oops: Kernel access of bad area, sig: 11 [#1]
>  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
> 
>  NIP [c12d8728] __of_changeset_entry_invert+0x10/0x1ac
>  LR [c12da7f0] __of_changeset_revert_entries+0x98/0x180
>  Call Trace:
>  [cbcc3970] [c12daa60] of_changeset_revert+0x58/0xd8
>  [cbcc39c0] [c0d0ed78] of_pci_remove_node+0x74/0xb0
>  [cbcc39f0] [c0cdcfe0] pci_stop_bus_device+0xf4/0x138
>  [cbcc3a30] [c0cdd140] pci_stop_and_remove_bus_device_locked+0x34/0x64
>  [cbcc3a60] [c0cf3780] remove_store+0xf0/0x108
>  [cbcc3ab0] [c0e89e04] dev_attr_store+0x34/0x78
>  [cbcc3ad0] [c07f8dd4] sysfs_kf_write+0x70/0xa4
>  [cbcc3af0] [c07f7248] kernfs_fop_write_iter+0x1d0/0x2e0
>  [cbcc3b40] [c06c9b08] vfs_write+0x27c/0x558
>  [cbcc3bf0] [c06ca168] ksys_write+0x90/0x170
>  [cbcc3c40] [c0033248] system_call_exception+0xf8/0x290
>  [cbcc3e50] [c000d05c] system_call_vectored_common+0x15c/0x2ec
> 
> 
> A git bisect pointed this regression to be introduced via [1] that added
> a mechanism to create device tree nodes for parent PCI bridges when a
> PCI device is hot-plugged.
> 
> The Oops occurs when `pci_stop_dev()` tries to remove a non-existent
> device-tree node associated with a pci_dev that was earlier
> hot-plugged under a PCI bridge. Because the device's header type
> `dev->hdr_type` is 0, the `pci_is_bridge()` check evaluates to false,
> so `of_pci_make_dev_node()` is never called to create a device node.
> Later, in the device node removal path, a memcpy is attempted in
> `__of_changeset_entry_invert()`; since the device node was never
> created, this results in an Oops due to a kernel read access to a bad
> address.
> 
> To fix this issue the patch updates `pci_stop_dev()` to ensure that a
> call to `of_pci_remove_node()` is only made for pci-bridge devices.
> 
> [1] commit 407d1a51921e ("PCI: Create device tree node for bridge")
> 
> Fixes: 407d1a51921e ("PCI: Create device tree node for bridge")
> Reported-by: Kowshik Jois B S 
> Tested-by: Kowshik Jois B S 
> Signed-off-by: Amit Machhiwal 

Thanks for the patch and testing!  Would like a reviewed-by from
Lizhi.

> ---
>  drivers/pci/remove.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
> index d749ea8250d6..4e51c64af416 100644
> --- a/drivers/pci/remove.c
> +++ b/drivers/pci/remove.c
> @@ -22,7 +22,8 @@ static void pci_stop_dev(struct pci_dev *dev)
>   device_release_driver(&dev->dev);
>   pci_proc_detach_device(dev);
>   pci_remove_sysfs_dev_files(dev);
> - of_pci_remove_node(dev);
> + if (pci_is_bridge(dev))
> + of_pci_remove_node(dev);

IIUC, this basically undoes the work that was done by
of_pci_make_dev_node().

The call of of_pci_make_dev_node() from pci_bus_add_device() was added
by 407d1a51921e and is conditional on pci_is_bridge(), so it makes
sense to me that the remove needs a similar condition.
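
The symmetry argument can be shown with a small userspace model: if the add path creates state only when a predicate holds, the remove path should test the same predicate before tearing that state down. The types and functions below (model_dev, model_add_device(), model_remove_device()) are hypothetical stand-ins that keep only the fields the argument depends on; they are not the PCI or OF code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a dynamically created device tree node. */
struct model_node {
	int unused;
};

struct model_dev {
	bool is_bridge;			/* models pci_is_bridge(dev) */
	struct model_node *node;	/* set only if the add path created it */
	struct model_node storage;
};

/* Add path: a node is created only for bridges. */
static void model_add_device(struct model_dev *dev)
{
	if (dev->is_bridge)
		dev->node = &dev->storage;
}

/*
 * Remove path: apply the same condition as the add path, so teardown is
 * attempted only for state the add path could have created.
 * Returns true if a node was actually torn down.
 */
static bool model_remove_device(struct model_dev *dev)
{
	if (!dev->is_bridge)
		return false;

	assert(dev->node != NULL);	/* guaranteed by the matching add */
	dev->node = NULL;
	return true;
}
```

An endpoint never gets a node in this model, so the remove path returns early instead of reverting something that was never created.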

>   pci_dev_assign_added(dev, false);
>   }
> 
> base-commit: e9d22f7a6655941fc8b2b942ed354ec780936b3e
> -- 
> 2.45.2
>

Re: [PATCH 02/13] pci/p2pdma: Don't initialise page refcount to one

2024-06-29 Thread Bjorn Helgaas

On Thu, Jun 27, 2024 at 10:54:17AM +1000, Alistair Popple wrote:
> The reference counts for ZONE_DEVICE private pages should be
> initialised by the driver when the page is actually allocated by the
> driver allocator, not when they are first created. This is currently
> the case for MEMORY_DEVICE_PRIVATE and MEMORY_DEVICE_COHERENT pages
> but not MEMORY_DEVICE_PCI_P2PDMA pages so fix that up.

If you tag the subject line with PCI, please run "git log --oneline
drivers/pci/p2pdma.c" and make yours look like previous ones
("PCI/P2PDMA").

Also recast it to say something semantically useful about what it
*does*, not what it *doesn't* do.  Maybe something about initializing
the refcount where the page is allocated?  Especially since the only
p2pdma.c change here is to "set_page_count(..., 1)", which looks like
exactly the opposite of "don't initialize refcount to one".

> Signed-off-by: Alistair Popple 
> ---
>  drivers/pci/p2pdma.c | 2 ++
>  mm/memremap.c        | 8 ++++----
>  mm/mm_init.c         | 4 +++-
>  3 files changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index 4f47a13..1e9ea32 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -128,6 +128,8 @@ static int p2pmem_alloc_mmap(struct file *filp, struct kobject *kobj,
>   goto out;
>   }
>  
> + set_page_count(virt_to_page(kaddr), 1);
> +
>   /*
>* vm_insert_page() can sleep, so a reference is taken to mapping
>* such that rcu_read_unlock() can be done before inserting the
> diff --git a/mm/memremap.c b/mm/memremap.c
> index 40d4547..caccbd8 100644
> --- a/mm/memremap.c
> +++ b/mm/memremap.c
> @@ -488,15 +488,15 @@ void free_zone_device_folio(struct folio *folio)
>   folio->mapping = NULL;
>   folio->page.pgmap->ops->page_free(folio_page(folio, 0));
>  
> - if (folio->page.pgmap->type != MEMORY_DEVICE_PRIVATE &&
> - folio->page.pgmap->type != MEMORY_DEVICE_COHERENT)
> + if (folio->page.pgmap->type == MEMORY_DEVICE_PRIVATE ||
> + folio->page.pgmap->type == MEMORY_DEVICE_COHERENT)
> + put_dev_pagemap(folio->page.pgmap);
> + else if (folio->page.pgmap->type != MEMORY_DEVICE_PCI_P2PDMA)
>   /*
>* Reset the refcount to 1 to prepare for handing out the page
>* again.
>*/
>   folio_set_count(folio, 1);
> - else
> - put_dev_pagemap(folio->page.pgmap);
>  }
>  
>  void zone_device_page_init(struct page *page)
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index 3ec0493..b7e1599 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -6,6 +6,7 @@
>   * Author Mel Gorman 
>   *
>   */
> +#include "linux/memremap.h"
>  #include 
>  #include 
>  #include 
> @@ -1014,7 +1015,8 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
>* which will set the page count to 1 when allocating the page.
>*/
>   if (pgmap->type == MEMORY_DEVICE_PRIVATE ||
> - pgmap->type == MEMORY_DEVICE_COHERENT)
> + pgmap->type == MEMORY_DEVICE_COHERENT ||
> + pgmap->type == MEMORY_DEVICE_PCI_P2PDMA)
>   set_page_count(page, 0);
>  }
>  
> -- 
> git-series 0.9.1
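
The idea behind the patch, as I read it, is that pages are created with a refcount of zero and the driver allocator sets the count to one when it actually hands a page out. A minimal userspace sketch of that lifecycle follows; the toy_* names and the trivial pool allocator are assumptions made for illustration and do not correspond to the mm or p2pdma code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; not a kernel type. */
struct toy_page {
    int refcount;
    int in_use;
};

/* Creation time: the page exists but nobody owns it yet. */
static void toy_init_page(struct toy_page *p)
{
    p->refcount = 0;
    p->in_use = 0;
}

/* Driver allocator: the refcount becomes 1 at the point of allocation. */
static struct toy_page *toy_driver_alloc(struct toy_page *pool, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!pool[i].in_use) {
            /* A free page must not have any references. */
            assert(pool[i].refcount == 0);
            pool[i].in_use = 1;
            pool[i].refcount = 1;
            return &pool[i];
        }
    }
    return NULL; /* pool exhausted */
}

/* Dropping the last reference returns the page to the pool at count 0. */
static void toy_put_page(struct toy_page *p)
{
    assert(p->refcount > 0);
    if (--p->refcount == 0)
        p->in_use = 0;
}
```

The point of the sketch is only the ordering: the count is initialised where the page is allocated, which is also why a subject such as "initialise the refcount when the page is allocated" describes the change more directly.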

Re: [PATCH v3 1/2] pci/hotplug/pnv_php: Fix hotplug driver crash on Powernv

2024-06-26 Thread Bjorn Helgaas

I expect this series would go through the powerpc tree since that's
where most of the change is.

On Mon, Jun 24, 2024 at 05:39:27PM +0530, Krishna Kumar wrote:
> Description of the problem: The hotplug driver for powerpc
> (pci/hotplug/pnv_php.c) gives kernel crash when we try to
> hot-unplug/disable the PCIe switch/bridge from the PHB.
> 
> Root Cause of Crash: The crash is due to the reason that, though the msi
> data structure has been released during disable/hot-unplug path and it
> has been assigned with NULL, still during unregistartion the code was
> again trying to explicitly disable the msi which causes the Null pointer
> dereference and kernel crash.

s/unregistartion/unregistration/
s/Null/NULL/ to match previous use
s/msi/MSI/ to match spec usage

> Proposed Fix : The fix is to correct the check during unregistration path
> so that the code should not  try to invoke pci_disable_msi/msix() if its
> data structure is already freed.

s/Proposed Fix : The fix is to// ... Just say what the patch does.

If/when the powerpc folks like this, add my:

Acked-by: Bjorn Helgaas 

> Cc: Michael Ellerman 
> Cc: Nicholas Piggin 
> Cc: Christophe Leroy 
> Cc: "Aneesh Kumar K.V" 
> Cc: Bjorn Helgaas 
> Cc: Gaurav Batra 
> Cc: Nathan Lynch 
> Cc: Brian King 
> 
> Signed-off-by: Krishna Kumar 
> ---
>  drivers/pci/hotplug/pnv_php.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
> index 694349be9d0a..573a41869c15 100644
> --- a/drivers/pci/hotplug/pnv_php.c
> +++ b/drivers/pci/hotplug/pnv_php.c
> @@ -40,7 +40,6 @@ static void pnv_php_disable_irq(struct pnv_php_slot *php_slot,
>   bool disable_device)
>  {
>   struct pci_dev *pdev = php_slot->pdev;
> - int irq = php_slot->irq;
>   u16 ctrl;
>  
>   if (php_slot->irq > 0) {
> @@ -59,7 +58,7 @@ static void pnv_php_disable_irq(struct pnv_php_slot *php_slot,
>   php_slot->wq = NULL;
>   }
>  
> - if (disable_device || irq > 0) {
> + if (disable_device) {
>   if (pdev->msix_enabled)
>   pci_disable_msix(pdev);
>   else if (pdev->msi_enabled)
> -- 
> 2.45.0
>

[PATCH v9 2/2] PCI/DPC: Disable DPC service on suspend

2024-06-18 Thread Bjorn Helgaas

From: Kai-Heng Feng 

If the link is powered off during suspend, electrical noise may cause
errors that trigger DPC.  If the DPC interrupt is enabled and shares an IRQ
with PME, that causes a spurious wakeup during suspend.

Disable DPC triggering and the DPC interrupt during suspend to prevent
this.  Clear DPC interrupt status before re-enabling DPC interrupts during
resume so we don't get an interrupt for errors that occurred during the
suspend/resume process.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=209149
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218090
Link: https://lore.kernel.org/r/20240416043225.1462548-3-kai.heng.f...@canonical.com
Signed-off-by: Kai-Heng Feng 
[bhelgaas: clear status on resume, add comments, commit log]
Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/pcie/dpc.c | 60 +-
 1 file changed, 48 insertions(+), 12 deletions(-)

diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index a668820696dc..2b6ef7efa3c1 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -412,13 +412,44 @@ void pci_dpc_init(struct pci_dev *pdev)
}
 }
 
+static void dpc_enable(struct pcie_device *dev)
+{
+   struct pci_dev *pdev = dev->port;
+   int dpc = pdev->dpc_cap;
+   u16 ctl;
+
+   /*
+* Clear DPC Interrupt Status so we don't get an interrupt for an
+* old event when setting DPC Interrupt Enable.
+*/
+   pci_write_config_word(pdev, dpc + PCI_EXP_DPC_STATUS,
+ PCI_EXP_DPC_STATUS_INTERRUPT);
+
+   pci_read_config_word(pdev, dpc + PCI_EXP_DPC_CTL, &ctl);
+   ctl &= ~PCI_EXP_DPC_CTL_EN_MASK;
+   ctl |= PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN;
+   pci_write_config_word(pdev, dpc + PCI_EXP_DPC_CTL, ctl);
+}
+
+static void dpc_disable(struct pcie_device *dev)
+{
+   struct pci_dev *pdev = dev->port;
+   int dpc = pdev->dpc_cap;
+   u16 ctl;
+
+   /* Disable DPC triggering and DPC interrupts */
+   pci_read_config_word(pdev, dpc + PCI_EXP_DPC_CTL, &ctl);
+   ctl &= ~(PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN);
+   pci_write_config_word(pdev, dpc + PCI_EXP_DPC_CTL, ctl);
+}
+
 #define FLAG(x, y) (((x) & (y)) ? '+' : '-')
 static int dpc_probe(struct pcie_device *dev)
 {
struct pci_dev *pdev = dev->port;
struct device *device = &dev->device;
int status;
-   u16 ctl, cap;
+   u16 cap;
 
if (!pcie_aer_is_native(pdev) && !pcie_ports_dpc_native)
return -ENOTSUPP;
@@ -433,11 +464,7 @@ static int dpc_probe(struct pcie_device *dev)
}
 
pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CAP, &cap);
-
-   pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl);
-   ctl &= ~PCI_EXP_DPC_CTL_EN_MASK;
-   ctl |= PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN;
-   pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl);
+   dpc_enable(dev);
 
pci_info(pdev, "enabled with IRQ %d\n", dev->irq);
pci_info(pdev, "error containment capabilities: Int Msg #%d, RPExt%c PoisonedTLP%c SwTrigger%c RP PIO Log %d, DL_ActiveErr%c\n",
@@ -450,14 +477,21 @@ static int dpc_probe(struct pcie_device *dev)
return status;
 }
 
+static int dpc_suspend(struct pcie_device *dev)
+{
+   dpc_disable(dev);
+   return 0;
+}
+
+static int dpc_resume(struct pcie_device *dev)
+{
+   dpc_enable(dev);
+   return 0;
+}
+
 static void dpc_remove(struct pcie_device *dev)
 {
-   struct pci_dev *pdev = dev->port;
-   u16 ctl;
-
-   pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl);
-   ctl &= ~(PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN);
-   pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl);
+   dpc_disable(dev);
 }
 
 static struct pcie_port_service_driver dpcdriver = {
@@ -465,6 +499,8 @@ static struct pcie_port_service_driver dpcdriver = {
.port_type  = PCIE_ANY_PORT,
.service= PCIE_PORT_SERVICE_DPC,
.probe  = dpc_probe,
+   .suspend= dpc_suspend,
+   .resume = dpc_resume,
.remove = dpc_remove,
 };
 
-- 
2.34.1
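
The read-modify-write done by dpc_enable() and dpc_disable() in the patch above can be checked in isolation. The sketch below applies the same masks to a plain 16-bit value instead of config space. The TOY_* bit values are what I recall from include/uapi/linux/pci_regs.h and should be treated as assumptions; the toy_* helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed DPC Control register bits (verify against pci_regs.h). */
#define TOY_DPC_CTL_EN_FATAL    0x0001 /* trigger on ERR_FATAL */
#define TOY_DPC_CTL_EN_NONFATAL 0x0002 /* trigger on ERR_NONFATAL */
#define TOY_DPC_CTL_EN_MASK     0x0003 /* whole trigger-enable field */
#define TOY_DPC_CTL_INT_EN      0x0008 /* DPC Interrupt Enable */

/* Mirrors dpc_enable(): select the ERR_FATAL trigger, enable the interrupt. */
static uint16_t toy_dpc_ctl_enable(uint16_t ctl)
{
    ctl &= (uint16_t)~TOY_DPC_CTL_EN_MASK;
    ctl |= TOY_DPC_CTL_EN_FATAL | TOY_DPC_CTL_INT_EN;

    /* The trigger field now holds exactly EN_FATAL. */
    assert((ctl & TOY_DPC_CTL_EN_MASK) == TOY_DPC_CTL_EN_FATAL);
    return ctl;
}

/* Mirrors dpc_disable(): clear the ERR_FATAL trigger and the interrupt enable. */
static uint16_t toy_dpc_ctl_disable(uint16_t ctl)
{
    ctl &= (uint16_t)~(TOY_DPC_CTL_EN_FATAL | TOY_DPC_CTL_INT_EN);
    return ctl;
}
```

Note that the two helpers are not exact inverses: enable clears the whole trigger field before setting EN_FATAL, while disable clears only EN_FATAL and INT_EN, so a pre-existing EN_NONFATAL bit survives a disable but not an enable. Bits outside these fields are preserved by both.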

[PATCH v9 1/2] PCI/AER: Disable AER service on suspend

2024-06-18 Thread Bjorn Helgaas

From: Kai-Heng Feng 

If the link is powered off during suspend, electrical noise may cause
errors that are logged via AER.  If the AER interrupt is enabled and shares
an IRQ with PME, that causes a spurious wakeup during suspend.

Disable the AER interrupt during suspend to prevent this.  Clear error
status before re-enabling AER interrupts during resume so we don't get an
interrupt for errors that occurred during the suspend/resume process.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=209149
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218090
Link: https://lore.kernel.org/r/20240416043225.1462548-2-kai.heng.f...@canonical.com
Signed-off-by: Kai-Heng Feng 
[bhelgaas: drop pci_ancestor_pr3_present() etc, commit log]
Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/pcie/aer.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index ac6293c24976..13b8586924ea 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1497,6 +1497,22 @@ static int aer_probe(struct pcie_device *dev)
return 0;
 }
 
+static int aer_suspend(struct pcie_device *dev)
+{
+   struct aer_rpc *rpc = get_service_data(dev);
+
+   aer_disable_rootport(rpc);
+   return 0;
+}
+
+static int aer_resume(struct pcie_device *dev)
+{
+   struct aer_rpc *rpc = get_service_data(dev);
+
+   aer_enable_rootport(rpc);
+   return 0;
+}
+
 /**
  * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP
  * @dev: pointer to Root Port, RCEC, or RCiEP
@@ -1561,6 +1577,8 @@ static struct pcie_port_service_driver aerdriver = {
.service= PCIE_PORT_SERVICE_AER,
 
.probe  = aer_probe,
+   .suspend= aer_suspend,
+   .resume = aer_resume,
.remove = aer_remove,
 };
 
-- 
2.34.1

[PATCH v9 0/2] PCI: Disable AER & DPC on suspend

2024-06-18 Thread Bjorn Helgaas

From: Bjorn Helgaas 

This is an old series from Kai-Heng that I didn't handle soon enough.  The
intent is to fix several suspend/resume issues:

  - Spurious wakeup from s2idle
(https://bugzilla.kernel.org/show_bug.cgi?id=216295)

  - Steam Deck doesn't resume after suspend
(https://bugzilla.kernel.org/show_bug.cgi?id=218090)

  - Unexpected ACS error and DPC event when resuming after suspend
(https://bugzilla.kernel.org/show_bug.cgi?id=209149)

It seems that a glitch when the link is powered down during suspend causes
errors to be logged by AER.  When AER is enabled, this causes an AER
interrupt, and if that IRQ is shared with PME, it may cause a spurious
wakeup.

Also, errors logged during link power-down and power-up seem to cause
unwanted error reporting during resume.

This series disables AER interrupts, DPC triggering, and DPC interrupts
during suspend.  On resume, it clears AER and DPC error status before
re-enabling their interrupts.

I added a couple of cosmetic changes for v9, but this is essentially all
Kai-Heng's work.  I'm just posting it as a v9 because I failed to act on
this earlier.

Bjorn

v9:
 - Drop pci_ancestor_pr3_present() and pm_suspend_via_firmware; do it
   unconditionally
 - Clear DPC status before re-enabling DPC interrupt

v8: https://lore.kernel.org/r/20240416043225.1462548-1-kai.heng.f...@canonical.com
 - Wording.
 - Add more bug reports.

v7:
 - Wording.
 - Disable AER completely (again) if power will be turned off
 - Disable DPC completely (again) if power will be turned off

v6: https://lore.kernel.org/r/2023051214.118942-1-kai.heng.f...@canonical.com

v5: https://lore.kernel.org/r/20230511133610.99759-1-kai.heng.f...@canonical.com
 - Wording.

v4: https://lore.kernel.org/r/20230424055249.460381-1-kai.heng.f...@canonical.com
v3: https://lore.kernel.org/r/20230420125941.333675-1-kai.heng.f...@canonical.com
 - Correct subject.

v2: https://lore.kernel.org/r/20230420015830.309845-1-kai.heng.f...@canonical.com
 - Only disable AER IRQ.
 - No more AER check on PME IRQ#.
 - Use AER helper.
 - Only disable DPC IRQ.
 - No more DPC check on PME IRQ#.

v1: https://lore.kernel.org/r/20220727013255.269815-1-kai.heng.f...@canonical.com

Kai-Heng Feng (2):
  PCI/AER: Disable AER service on suspend
  PCI/DPC: Disable DPC service on suspend

 drivers/pci/pcie/aer.c | 18 +
 drivers/pci/pcie/dpc.c | 60 +-
 2 files changed, 66 insertions(+), 12 deletions(-)

-- 
2.34.1
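
The ordering described in this cover letter (disable the interrupt on suspend, then clear stale status before re-enabling it on resume) can be illustrated with a toy model. This is a sketch under stated assumptions: the status register is modelled as write-1-to-clear, and setting the enable bit while status is pending is assumed to raise an interrupt immediately. The toy_* names are hypothetical and are not the AER or DPC driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a port with an error status and an IRQ enable. */
struct toy_port {
    uint32_t status;    /* write-1-to-clear error status bits */
    bool irq_enabled;
    int irqs_delivered;
};

/* Hardware side: an error always sets status, but interrupts only if enabled. */
static void toy_log_error(struct toy_port *p, uint32_t bit)
{
    p->status |= bit;
    if (p->irq_enabled)
        p->irqs_delivered++;
}

/* RW1C semantics: writing a 1 clears the corresponding status bit. */
static void toy_write_status(struct toy_port *p, uint32_t val)
{
    p->status &= ~val;
}

/* Assumption: enabling with status still pending fires an interrupt. */
static void toy_enable_irq(struct toy_port *p)
{
    p->irq_enabled = true;
    if (p->status)
        p->irqs_delivered++;
}

static void toy_suspend(struct toy_port *p)
{
    p->irq_enabled = false;
}

/* Resume: clear whatever was logged while suspended, then re-enable. */
static void toy_resume(struct toy_port *p)
{
    toy_write_status(p, p->status);
    toy_enable_irq(p);
}
```

In this model an error logged while suspended leaves a status bit set but delivers nothing, and because resume clears status first, re-enabling does not replay the old event. Errors that occur after resume interrupt as usual.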

Re: [PATCH v8 2/3] PCI/AER: Disable AER service on suspend

2024-06-18 Thread Bjorn Helgaas

On Thu, Apr 25, 2024 at 03:33:01PM +0800, Kai-Heng Feng wrote:
> On Fri, Apr 19, 2024 at 4:35 AM Bjorn Helgaas  wrote:
> >
> > On Tue, Apr 16, 2024 at 12:32:24PM +0800, Kai-Heng Feng wrote:
> > > When the power rail gets cut off, the hardware can create some electric
> > > noise on the link that triggers AER. If IRQ is shared between AER with
> > > PME, such AER noise will cause a spurious wakeup on system suspend.
> > >
> > > When the power rail gets back, the firmware of the device resets itself
> > > and can create unexpected behavior like sending PTM messages. For this
> > > case, the driver will always be too late to toggle off features should
> > > be disabled.
> > >
> > > As Per PCIe Base Spec 5.0, section 5.2, titled "Link State Power
> > > Management", TLP and DLLP transmission are disabled for a Link in L2/L3
> > > Ready (D3hot), L2 (D3cold with aux power) and L3 (D3cold) states. So if
> > > the power will be turned off during suspend process, disable AER service
> > > and re-enable it during the resume process. This should not affect the
> > > basic functionality.
> > >
> > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=209149
> > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295
> > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218090
> > > Signed-off-by: Kai-Heng Feng 
> >
> > Thanks for reviving this series.  I tried to follow the history about
> > this, but there are at least two series that were very similar and I
> > can't put it all together.
> >
> > > ---
> > > v8:
> > >  - Add more bug reports.
> > >
> > > v7:
> > >  - Wording
> > >  - Disable AER completely (again) if power will be turned off
> > >
> > > v6:
> > > v5:
> > >  - Wording.
> > >
> > > v4:
> > > v3:
> > >  - No change.
> > >
> > > v2:
> > >  - Only disable AER IRQ.
> > >  - No more check on PME IRQ#.
> > >  - Use helper.
> > >
> > >  drivers/pci/pcie/aer.c | 25 +
> > >  1 file changed, 25 insertions(+)
> > >
> > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > > index ac6293c24976..bea7818c2d1b 100644
> > > --- a/drivers/pci/pcie/aer.c
> > > +++ b/drivers/pci/pcie/aer.c
> > > @@ -28,6 +28,7 @@
> > >  #include 
> > >  #include 
> > >  #include 
> > > +#include 
> > >  #include 
> > >  #include 
> > >  #include 
> > > @@ -1497,6 +1498,28 @@ static int aer_probe(struct pcie_device *dev)
> > >   return 0;
> > >  }
> > >
> > > +static int aer_suspend(struct pcie_device *dev)
> > > +{
> > > + struct aer_rpc *rpc = get_service_data(dev);
> > > + struct pci_dev *pdev = rpc->rpd;
> > > +
> > > + if (pci_ancestor_pr3_present(pdev) || pm_suspend_via_firmware())
> > > + aer_disable_rootport(rpc);
> >
> > Why do we check pci_ancestor_pr3_present(pdev) and
> > pm_suspend_via_firmware()?  I'm getting pretty convinced that we need
> > to disable AER interrupts on suspend in general.  I think it will be
> > better if we do that consistently on all platforms, not special cases
> > based on details of how we suspend.
> 
> Sure. Will change in next revision.
> 
> > Also, why do we use aer_disable_rootport() instead of just
> > aer_disable_irq()?  I think it's the interrupt that causes issues on
> > suspend.  I see that there *were* some versions that used
> > aer_disable_irq(), but I can't find the reason it changed.
> 
> Interrupt can cause system wakeup, if it's shared with PME.
> 
> The reason why aer_disable_rootport() is used over aer_disable_irq()
> is that when the latter is used the error still gets logged during
> sleep cycle. Once the pcieport driver resumes, it invokes
> aer_root_reset() to reset the hierarchy, while the hierarchy hasn't
> resumed yet.
> 
> So use aer_disable_rootport() to prevent such issue from happening.

I think the issue is more likely on the resume side.

aer_disable_rootport() disables AER interrupts, then clears
PCI_ERR_ROOT_STATUS, so the path looks like this:

  aer_suspend
aer_disable_rootport
  aer_disable_irq()
  pci_write_config_dword(PCI_ERR_ROOT_STATUS)# clear

This happens during suspend, so at this point I think the link is
still active and the spurious AER errors haven't happened yet and it
probably doesn't m

Re: [PATCH 2/5] PCI: endpoint: Introduce 'epc_deinit' event and notify the EPF drivers

2024-06-11 Thread Bjorn Helgaas

On Thu, Jun 06, 2024 at 12:56:35PM +0530, Manivannan Sadhasivam wrote:
> As like the 'epc_init' event, that is used to signal the EPF drivers about
> the EPC initialization, let's introduce 'epc_deinit' event that is used to
> signal EPC deinitialization.
> 
> The EPC deinitialization applies only when any sort of fundamental reset
> is supported by the endpoint controller as per the PCIe spec.
> 
> Reference: PCIe Base spec v5.0, sections 4.2.4.9.1 and 6.6.1.

PCIe r6.0, sec 4.2.5.9.1 and 6.6.1.

(Not 4.2.4.9.1, which no longer exists in r6.x)

> Currently, some EPC drivers like pcie-qcom-ep and pcie-tegra194 support
> PERST# as the fundamental reset. So the 'deinit' event will be notified to
> the EPF drivers when PERST# assert happens in the above mentioned EPC
> drivers.
> 
> The EPF drivers, on receiving the event through the epc_deinit() callback
> should reset the EPF state machine and also cleanup any configuration that
> got affected by the fundamental reset like BAR, DMA etc...
> 
> This change also warrants skipping the cleanups in unbind() if already done
> in epc_deinit().
> 
> Reviewed-by: Niklas Cassel 
> Signed-off-by: Manivannan Sadhasivam

Re: [PATCH v2] PCI/AER: Print error message as per the TODO

2024-06-05 Thread Bjorn Helgaas

On Wed, Jun 05, 2024 at 09:23:44PM +, Abhinav Jain wrote:
> Print the add device error in find_device_iter()
> 
> Signed-off-by: Abhinav Jain 
> 
> PATCH v1 link : 
> https://lore.kernel.org/all/20240415161055.8316-1-jain.abhinav...@gmail.com/
> 
> Changes since v1:
>  - Replaced pr_err() with pr_notice()
>  - Removed unnecessary whitespaces
> ---

Thanks for looking at this.

  - It doesn't apply to -rc1 (the TODO message is missing).  In PCI,
we normally apply patches on topic branches based on -rc1.

  - The subject should be more specific so it makes sense all by
itself, e.g., "Log note if we find too many devices with errors"

  - Add period at end of sentence in commit log.

  - Move historical notes (v1 URL, changes since v1) below the "---"
line so they don't get included in the commit log.

  - __func__ is not relevant here -- that's generally a debugging
thing.  We can find the function by searching for the message
text.  In cases like this, I'd rather have something that helps
identify a *device* that's related to the message, e.g., the
pci_dev in this case.  So I'd suggest pci_err(dev, "...") here.

  - I'd keep pci_err() instead of switching to pr_notice().  If we get
this message, we should re-think the way we collect this
information, so I want to hear about it.

>  drivers/pci/pcie/aer.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 0e1ad2998116..8b820a74dd6b 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -885,8 +885,8 @@ static int find_device_iter(struct pci_dev *dev, void *data)
>   /* List this device */
>   if (add_error_device(e_info, dev)) {
>   /* We cannot handle more... Stop iteration */
> - pr_err("find_device_iter: Cannot handle more devices.
> - Stopping iteration");
> + pr_notice("%s: Cannot handle more devices - iteration stopped\n",
> + __func__);
>   return 1;
>   }
>  
> -- 
> 2.34.1
>

Re: [PATCH v1 1/1] treewide: Align match_string() with sysfs_match_string()

2024-06-03 Thread Bjorn Helgaas

On Sun, Jun 02, 2024 at 06:57:12PM +0300, Andy Shevchenko wrote:
> Make two APIs look similar. Hence convert match_string() to be
> a 2-argument macro. In order to avoid unneeded churn, convert
> all users as well. There is no functional change intended.

Looks nice, thanks for doing this.

> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index ac6293c24976..2d317c7e1cea 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -210,7 +210,7 @@ void pcie_ecrc_get_policy(char *str)
>  {
>   int i;
>  
> - i = match_string(ecrc_policy_str, ARRAY_SIZE(ecrc_policy_str), str);
> + i = match_string(ecrc_policy_str, str);
>   if (i < 0)
>   return;
>  

Acked-by: Bjorn Helgaas # drivers/pci/

> +++ b/mm/vmpressure.c
> @@ -388,7 +388,7 @@ int vmpressure_register_event(struct mem_cgroup *memcg,
>  
>   /* Find required level */
>   token = strsep(&spec, ",");
> - ret = match_string(vmpressure_str_levels, VMPRESSURE_NUM_LEVELS, token);
> + ret = match_string(vmpressure_str_levels, token);

VMPRESSURE_NUM_LEVELS looks like it's no longer used?

>   if (ret < 0)
>   goto out;
>   level = ret;
> @@ -396,7 +396,7 @@ int vmpressure_register_event(struct mem_cgroup *memcg,
>   /* Find optional mode */
>   token = strsep(&spec, ",");
>   if (token) {
> - ret = match_string(vmpressure_str_modes, VMPRESSURE_NUM_MODES, token);
> + ret = match_string(vmpressure_str_modes, token);

Ditto.

>   if (ret < 0)
>   goto out;
>   mode = ret;
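
For readers who have not seen the full patch, the conversion to a 2-argument macro presumably works by having the macro supply ARRAY_SIZE() itself. The following userspace sketch shows that shape; the toy_* names are hypothetical, and the helper's behaviour (return the index on a match, stop at a NULL entry, return -EINVAL otherwise) is my assumption about what the kernel helper does rather than a copy of it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOY_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* 3-argument helper: the caller passes the number of entries explicitly. */
static int toy_match_string_n(const char * const *array, size_t n,
                              const char *string)
{
    assert(string != NULL);

    for (size_t i = 0; i < n; i++) {
        if (array[i] == NULL)
            break; /* a NULL entry terminates the table early */
        if (strcmp(array[i], string) == 0)
            return (int)i;
    }
    return -EINVAL;
}

/*
 * 2-argument wrapper: the size is derived from the array itself, so
 * callers can no longer pass a mismatched count. This only works when
 * _a is a true array, not a pointer.
 */
#define toy_match_string(_a, _s) \
    toy_match_string_n(_a, TOY_ARRAY_SIZE(_a), _s)

/* Example table, loosely modelled on the ECRC policy strings. */
static const char * const toy_policy_str[] = { "bios", "off", "on" };
```

A side effect, as noted in the review, is that separately maintained count constants such as VMPRESSURE_NUM_LEVELS may become unused once every caller derives the size from the array.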

Re: [PATCH v3 1/2] PCI: Add TLP Prefix reading into pcie_read_tlp_log()

2024-05-03 Thread Bjorn Helgaas

On Fri, Apr 12, 2024 at 04:36:34PM +0300, Ilpo Järvinen wrote:
> pcie_read_tlp_log() handles only 4 TLP Header Log DWORDs but TLP Prefix
> Log (PCIe r6.1 secs 7.8.4.12 & 7.9.14.13) may also be present.
> 
> Generalize pcie_read_tlp_log() and struct pcie_tlp_log to handle also
> TLP Prefix Log. The layout of relevant registers in AER and DPC
> Capability is not identical because the offsets of TLP Header Log and
> TLP Prefix Log vary so the callers must pass the offsets to
> pcie_read_tlp_log().

I think the layouts of the Header Log and the TLP Prefix Log *are*
identical, but they are at different offsets in the AER Capability vs
the DPC Capability.  Lukas and I have both stumbled over this.

Similar and more comments at:
https://lore.kernel.org/r/20240322193011.GA701027@bhelgaas

> Convert eetlp_prefix_path into integer called eetlp_prefix_max and
> make is available also when CONFIG_PCI_PASID is not configured to
> be able to determine the number of E-E Prefixes.

s/make is/make it/

I think this could be a separate patch.

> --- a/include/linux/aer.h
> +++ b/include/linux/aer.h
> @@ -20,6 +20,7 @@ struct pci_dev;
>  
>  struct pcie_tlp_log {
>   u32 dw[4];
> + u32 prefix[4];
>  };
>  
>  struct aer_capability_regs {
> @@ -37,7 +38,9 @@ struct aer_capability_regs {
>   u16 uncor_err_source;
>  };
>  
> -int pcie_read_tlp_log(struct pci_dev *dev, int where, struct pcie_tlp_log *log);
> +int pcie_read_tlp_log(struct pci_dev *dev, int where, int where2,
> +   unsigned int tlp_len, struct pcie_tlp_log *log);
> +unsigned int aer_tlp_log_len(struct pci_dev *dev);

I think it was a mistake to expose pcie_read_tlp_log() outside
drivers/pci, and I don't think we should expose aer_tlp_log_len()
either.

We might be stuck with exposing struct pcie_tlp_log since it looks
like ras_event.h uses it.

Bjorn
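
The point about layout versus offset can be made concrete with a toy reader: the logs are four DWORDs each in both capabilities, and only the starting offsets passed by the caller differ. The sketch below is an illustration under assumptions. Each capability is modelled as a DWORD array indexed from its own base, the TOY_* offsets are what I recall from include/uapi/linux/pci_regs.h and should be verified, and the toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_CFG_DWORDS 64

/* Assumed offsets relative to each capability's base (verify before use). */
#define TOY_AER_HEADER_LOG        0x1c
#define TOY_AER_PREFIX_LOG        0x38
#define TOY_DPC_RP_PIO_HEADER_LOG 0x20
#define TOY_DPC_RP_PIO_PREFIX_LOG 0x34

/* Same shape as the patched struct pcie_tlp_log in the diff above. */
struct toy_tlp_log {
    uint32_t dw[4];
    uint32_t prefix[4];
};

/* Read one DWORD from a capability modelled as an array of DWORDs. */
static uint32_t toy_read_dword(const uint32_t *cap, unsigned int off)
{
    assert(off % 4 == 0 && off / 4 < TOY_CFG_DWORDS);
    return cap[off / 4];
}

/*
 * One reader serves both capabilities: 'where' is the Header Log offset
 * and 'where2' the TLP Prefix Log offset, as in the proposed signature.
 */
static void toy_read_tlp_log(const uint32_t *cap, unsigned int where,
                             unsigned int where2, struct toy_tlp_log *log)
{
    for (unsigned int i = 0; i < 4; i++) {
        log->dw[i] = toy_read_dword(cap, where + 4 * i);
        log->prefix[i] = toy_read_dword(cap, where2 + 4 * i);
    }
}
```

Because the reader never hardcodes a capability-specific offset, the same loop handles AER and DPC, which is consistent with describing the layouts as identical and the offsets as different.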

Re: [PATCH v8 2/3] PCI/AER: Disable AER service on suspend

2024-04-18 Thread Bjorn Helgaas

On Tue, Apr 16, 2024 at 12:32:24PM +0800, Kai-Heng Feng wrote:
> When the power rail gets cut off, the hardware can create some electric
> noise on the link that triggers AER. If IRQ is shared between AER with
> PME, such AER noise will cause a spurious wakeup on system suspend.
> 
> When the power rail gets back, the firmware of the device resets itself
> and can create unexpected behavior like sending PTM messages. For this
> case, the driver will always be too late to toggle off features should
> be disabled.
> 
> As Per PCIe Base Spec 5.0, section 5.2, titled "Link State Power
> Management", TLP and DLLP transmission are disabled for a Link in L2/L3
> Ready (D3hot), L2 (D3cold with aux power) and L3 (D3cold) states. So if
> the power will be turned off during suspend process, disable AER service
> and re-enable it during the resume process. This should not affect the
> basic functionality.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=209149
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=218090
> Signed-off-by: Kai-Heng Feng 

Thanks for reviving this series.  I tried to follow the history about
this, but there are at least two series that were very similar and I
can't put it all together.

> ---
> v8:
>  - Add more bug reports.
> 
> v7:
>  - Wording
>  - Disable AER completely (again) if power will be turned off
> 
> v6:
> v5:
>  - Wording.
> 
> v4:
> v3:
>  - No change.
> 
> v2:
>  - Only disable AER IRQ.
>  - No more check on PME IRQ#.
>  - Use helper.
> 
>  drivers/pci/pcie/aer.c | 25 +
>  1 file changed, 25 insertions(+)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index ac6293c24976..bea7818c2d1b 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -28,6 +28,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -1497,6 +1498,28 @@ static int aer_probe(struct pcie_device *dev)
>   return 0;
>  }
>  
> +static int aer_suspend(struct pcie_device *dev)
> +{
> + struct aer_rpc *rpc = get_service_data(dev);
> + struct pci_dev *pdev = rpc->rpd;
> +
> + if (pci_ancestor_pr3_present(pdev) || pm_suspend_via_firmware())
> + aer_disable_rootport(rpc);

Why do we check pci_ancestor_pr3_present(pdev) and
pm_suspend_via_firmware()?  I'm getting pretty convinced that we need
to disable AER interrupts on suspend in general.  I think it will be
better if we do that consistently on all platforms, not special cases
based on details of how we suspend.

Also, why do we use aer_disable_rootport() instead of just
aer_disable_irq()?  I think it's the interrupt that causes issues on
suspend.  I see that there *were* some versions that used
aer_disable_irq(), but I can't find the reason it changed.

> +
> + return 0;
> +}
> +
> +static int aer_resume(struct pcie_device *dev)
> +{
> + struct aer_rpc *rpc = get_service_data(dev);
> + struct pci_dev *pdev = rpc->rpd;
> +
> + if (pci_ancestor_pr3_present(pdev) || pm_resume_via_firmware())
> + aer_enable_rootport(rpc);
> +
> + return 0;
> +}
> +
>  /**
>   * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP
>   * @dev: pointer to Root Port, RCEC, or RCiEP
> @@ -1561,6 +1584,8 @@ static struct pcie_port_service_driver aerdriver = {
>   .service= PCIE_PORT_SERVICE_AER,
>  
>   .probe  = aer_probe,
> + .suspend= aer_suspend,
> + .resume = aer_resume,
>   .remove = aer_remove,
>  };
>  
> -- 
> 2.34.1
>

Re: [PATCH v12 2/8] PCI: dwc: ep: Add Kernel-doc comments for APIs

2024-04-15 Thread Bjorn Helgaas

On Mon, Apr 15, 2024 at 07:30:15PM +0530, Manivannan Sadhasivam wrote:
> On Fri, Apr 12, 2024 at 02:58:36PM -0500, Bjorn Helgaas wrote:
> > On Wed, Mar 27, 2024 at 02:43:31PM +0530, Manivannan Sadhasivam wrote:
> > > All of the APIs are missing the Kernel-doc comments. Hence, add them.
> > 
> > > + * dw_pcie_ep_reset_bar - Reset endpoint BAR
> > 
> > Apparently this resets @bar for every function of the device, so it's
> > not just a single BAR?
> 
> Right. It should've been 'Reset endpoint BARs'. And the API name is also
> misleading, but that is not the scope of this series.

Maybe just this for now:

s/Reset endpoint BAR/Reset @bar of every function/

> > > + * dw_pcie_ep_raise_intx_irq - Raise INTx IRQ to the host
> > > + * @ep: DWC EP device
> > > + * @func_no: Function number of the endpoint
> > > + *
> > > + * Return: 0 if success, errono otherwise.
> > 
> > s/errono/errno/ (another instance below)
> 
> ah, thanks for spotting. Since this series is already merged, I hope
> Krzysztof can amend this.

Sounds good, thanks!

Bjorn

Re: [PATCH v12 8/8] PCI: endpoint: Remove "core_init_notifier" flag

2024-04-12 Thread Bjorn Helgaas

On Wed, Mar 27, 2024 at 02:43:37PM +0530, Manivannan Sadhasivam wrote:
> "core_init_notifier" flag is set by the glue drivers requiring refclk from
> the host to complete the DWC core initialization. Also, those drivers will
> send a notification to the EPF drivers once the initialization is fully
> completed using the pci_epc_init_notify() API. Only then, the EPF drivers
> will start functioning.
> 
> For the rest of the drivers generating refclk locally, EPF drivers will
> start functioning post binding with them. EPF drivers rely on the
> 'core_init_notifier' flag to differentiate between the drivers.
> Unfortunately, this creates two different flows for the EPF drivers.
> 
> So to avoid that, let's get rid of the "core_init_notifier" flag and follow
> a single initialization flow for the EPF drivers. This is done by calling
> the dw_pcie_ep_init_notify() from all glue drivers after the completion of
> dw_pcie_ep_init_registers() API. This will allow all the glue drivers to
> send the notification to the EPF drivers once the initialization is fully
> completed.

Thanks for doing this!  I think this is a significantly nicer
solution than core_init_notifier was.

One question: both qcom and tegra194 call dw_pcie_ep_init_registers()
from an interrupt handler, but they register that handler in a
different order with respect to dw_pcie_ep_init().

I don't know what actually starts the process that leads to the
interrupt, but if it's dw_pcie_ep_init(), then one of these (qcom, I
think) must be racy:

  qcom_pcie_ep_probe
dw_pcie_ep_init <- A
qcom_pcie_ep_enable_irq_resources
  devm_request_threaded_irq(qcom_pcie_ep_perst_irq_thread)  <- B

  qcom_pcie_ep_perst_irq_thread
qcom_pcie_perst_deassert
  dw_pcie_ep_init_registers

  tegra_pcie_dw_probe
tegra_pcie_config_ep
  devm_request_threaded_irq(tegra_pcie_ep_pex_rst_irq)  <- B
  dw_pcie_ep_init   <- A

  tegra_pcie_ep_pex_rst_irq
pex_ep_event_pex_rst_deassert
  dw_pcie_ep_init_registers

Whatever the right answer is, I think qcom and tegra194 should both
order dw_pcie_ep_init() and the devm_request_threaded_irq() the same
way.

Bjorn

Re: [PATCH v12 2/8] PCI: dwc: ep: Add Kernel-doc comments for APIs

2024-04-12 Thread Bjorn Helgaas

On Wed, Mar 27, 2024 at 02:43:31PM +0530, Manivannan Sadhasivam wrote:
> All of the APIs are missing the Kernel-doc comments. Hence, add them.

> + * dw_pcie_ep_reset_bar - Reset endpoint BAR

Apparently this resets @bar for every function of the device, so it's
not just a single BAR?

> + * dw_pcie_ep_raise_intx_irq - Raise INTx IRQ to the host
> + * @ep: DWC EP device
> + * @func_no: Function number of the endpoint
> + *
> + * Return: 0 if success, errono otherwise.

s/errono/errno/ (another instance below)

Bjorn

Re: [PATCHv3 pci-next 1/2] PCI/AER: correctable error message as KERN_INFO

2024-03-26 Thread Bjorn Helgaas

On Tue, Mar 26, 2024 at 09:39:54AM +0800, Ethan Zhao wrote:
> On 3/25/2024 6:15 PM, Xi Ruoyao wrote:
> > On Mon, 2024-03-25 at 16:45 +0800, Ethan Zhao wrote:
> > > On 3/25/2024 1:19 AM, Xi Ruoyao wrote:
> > > > On Mon, 2023-09-18 at 14:39 -0500, Bjorn Helgaas wrote:
> > > > > On Mon, Sep 18, 2023 at 07:42:30PM +0800, Xi Ruoyao wrote:
> > > > > > ...
> > > > > > > My workstation suffers from too much correctable AER reporting as well
> > > > > > > (related to Intel's errata "RPL013: Incorrectly Formed PCIe Packets May
> > > > > > > Generate Correctable Errors" and/or the motherboard design, I guess).
> > > > > We should rate-limit correctable error reporting so it's not
> > > > > overwhelming.
> > > > > 
> > > > > At the same time, I'm *also* interested in the cause of these errors,
> > > > > in case there's a Linux defect or a hardware erratum that we can work
> > > > > around.  Do you have a bug report with any more details, e.g., a dmesg
> > > > > log and "sudo lspci -vv" output?
> > > > Hi Bjorn,
> > > > 
> > > > Sorry for the *very* late reply (somehow I didn't see the reply at all
> > > > > before it was removed by my cron job, and now I just salvaged it from
> > > > lore.kernel.org...)
> > > > 
> > > > The dmesg is like:
> > > > 
> > > > [  882.456994] pcieport :00:1c.1: AER: Multiple Correctable error 
> > > > message received from :00:1c.1
> > > > [  882.457002] pcieport :00:1c.1: AER: found no error details for 
> > > > :00:1c.1
> > > > [  882.457003] pcieport :00:1c.1: AER: Multiple Correctable error 
> > > > message received from :06:00.0
> > > > [  883.545763] pcieport :00:1c.1: AER: Multiple Correctable error 
> > > > message received from :00:1c.1
> > > > [  883.545789] pcieport :00:1c.1: PCIe Bus Error: 
> > > > severity=Correctable, type=Physical Layer, (Receiver ID)
> > > > [  883.545790] pcieport :00:1c.1:   device [8086:7a39] error 
> > > > status/mask=0001/2000
> > > > [  883.545792] pcieport :00:1c.1:    [ 0] RxErr  
> > > > (First)
> > > > [  883.545794] pcieport :00:1c.1: AER:   Error of this Agent is 
> > > > reported first
> > > > [  883.545798] r8169 :06:00.0: PCIe Bus Error: 
> > > > severity=Correctable, type=Physical Layer, (Transmitter ID)
> > > > [  883.545799] r8169 :06:00.0:   device [10ec:8125] error 
> > > > status/mask=1101/e000
> > > > [  883.545800] r8169 :06:00.0:    [ 0] RxErr  
> > > > (First)
> > > > [  883.545801] r8169 :06:00.0:    [ 8] Rollover
> > > > [  883.545802] r8169 :06:00.0:    [12] Timeout
> > > > [  883.545815] pcieport :00:1c.1: AER: Correctable error message 
> > > > received from :00:1c.1
> > > > [  883.545823] pcieport :00:1c.1: AER: found no error details for 
> > > > :00:1c.1
> > > > [  883.545824] pcieport :00:1c.1: AER: Multiple Correctable error 
> > > > message received from :06:00.0
> > > > 
> > > > lspci output attached.
> > > > 
> > > > Intel has issued an errata "RPL013" saying:
> > > > 
> > > > "Under complex microarchitectural conditions, the PCIe controller may
> > > > transmit an incorrectly formed Transaction Layer Packet (TLP), which
> > > > will fail CRC checks. When this erratum occurs, the PCIe end point may
> > > > record correctable errors resulting in either a NAK or link recovery.
> > > > Intel® has not observed any functional impact due to this erratum."
> > > > 
> > > > But I'm really unsure if it describes my issue.
> > > > 
> > > > Do you think I have some broken hardware and I should replace the CPU
> > > > and/or the motherboard (where the r8169 is soldered)?  I've noticed that
> > > > my 13900K is almost impossible to overclock (despite it's a K), but I've
> > > > not encountered any issue other than these AER reporting so far after I
> > > > gave up overclocking.
> > > Seems there are two r8169 nics on your board, only :06:00.0 reports
> > > aer errors, how about another one the :07:00.0 nic ?
> > It never happens to :07:00.0, even if I plug the ethernet cable into
> > it instead of :06:00.0.
> 
> So something is wrong with the physical layer, I guess.
> 
> > Maybe I should just use :07:00.0 and blacklist :06:00.0 as I
> > don't need two NICs?
> 
> Yup,
> ratelimit the AER warning is another choice instead of change WARN to INFO.
> if corrected error flood happens, even the function is working, suggests
> something was already wrong, likely will be worse, that is the meaning of
> WARN I think.

We should fix this.  IMHO Correctable Errors should be "info" level,
non-alarming, and rate-limited.  They're basically hints about link
integrity.
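
To make "rate-limited" concrete, here is a minimal user-space sketch of an
interval/burst limiter.  It is only a model: struct ce_ratelimit and both
functions are hypothetical names invented for illustration, not the kernel's
ratelimit API, and the caller is assumed to pass a monotonic time in
milliseconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: allow at most `burst` messages per `interval_ms`. */
struct ce_ratelimit {
	uint64_t interval_ms;	/* length of one window */
	unsigned int burst;	/* messages allowed per window */
	uint64_t window_start;	/* start of the current window, in ms */
	unsigned int printed;	/* messages emitted in this window */
	unsigned int missed;	/* messages suppressed in this window */
};

static void ce_ratelimit_init(struct ce_ratelimit *rl, uint64_t interval_ms,
			      unsigned int burst)
{
	assert(burst > 0);
	rl->interval_ms = interval_ms;
	rl->burst = burst;
	rl->window_start = 0;
	rl->printed = 0;
	rl->missed = 0;
}

/* Return true if a Correctable Error message may be logged at now_ms. */
static bool ce_ratelimit_allow(struct ce_ratelimit *rl, uint64_t now_ms)
{
	/* Start a fresh window once the previous one has expired. */
	if (now_ms - rl->window_start >= rl->interval_ms) {
		rl->window_start = now_ms;
		rl->printed = 0;
		rl->missed = 0;
	}

	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}

	/* Over budget: count the suppressed message instead of logging it. */
	rl->missed++;
	return false;
}
```

With a 5000 ms window and a burst of 2, the first two errors in a window are
logged, later ones are only counted, and logging resumes when the next window
starts.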

Bjorn

Re: [PATCH 3/4] PCI: Add TLP Prefix reading into pcie_read_tlp_log()

2024-03-22 Thread Bjorn Helgaas

On Tue, Feb 06, 2024 at 03:57:16PM +0200, Ilpo Järvinen wrote:
> pcie_read_tlp_log() handles only 4 TLP Header Log DWORDs but TLP Prefix
> Log (PCIe r6.1 secs 7.8.4.12 & 7.9.14.13) may also be present.

s/TLP Header Log/Header Log/ to match spec terminology (also below)

> Generalize pcie_read_tlp_log() and struct pcie_tlp_log to handle also
> TLP Prefix Log. The layout of relevant registers in AER and DPC
> Capability is not identical but the offsets of TLP Header Log and TLP
> Prefix Log vary so the callers must pass the offsets to
> pcie_read_tlp_log().

s/is not identical but/is identical, but/ ?

The spec is a little obtuse about Header Log Size.

> Convert eetlp_prefix_path into integer called eetlp_prefix_max and
> make is available also when CONFIG_PCI_PASID is not configured to
> be able to determine the number of E-E Prefixes.

I think this eetlp_prefix_path piece is right, but would be nice in a
separate patch since it's a little bit different piece to review.

> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -11336,7 +11336,9 @@ static pci_ers_result_t ixgbe_io_error_detected(struct pci_dev *pdev,
>   if (!pos)
>   goto skip_bad_vf_detection;
>  
> - ret = pcie_read_tlp_log(pdev, pos + PCI_ERR_HEADER_LOG, &tlp_log);
> + ret = pcie_read_tlp_log(pdev, pos + PCI_ERR_HEADER_LOG,
> + pos + PCI_ERR_PREFIX_LOG,
> + aer_tlp_log_len(pdev), &tlp_log);
>   if (ret < 0) {
>   ixgbe_check_cfg_remove(hw, pdev);
>   goto skip_bad_vf_detection;

We applied the patch to export pcie_read_tlp_log(), but I'm having
second thoughts about it.   I don't think drivers really have any
business here, and I'd rather not expose either pcie_read_tlp_log() or
aer_tlp_log_len().

This part of ixgbe_io_error_detected() was added by 83c61fa97a7d
("ixgbe: Add protection from VF invalid target DMA"), and to me it
looks like debug code that probably doesn't need to be there as long
as the PCI core does the appropriate logging.

Bjorn

Re: [PATCH 2/4] PCI: Generalize TLP Header Log reading

2024-03-14 Thread Bjorn Helgaas

[+cc Greg, Jeff -- ancient history, I know, sorry!]

On Tue, Feb 06, 2024 at 03:57:15PM +0200, Ilpo Järvinen wrote:
> Both AER and DPC RP PIO provide TLP Header Log registers (PCIe r6.1
> secs 7.8.4 & 7.9.14) to convey error diagnostics but the struct is
> named after AER as the struct aer_header_log_regs. Also, not all places
> that handle TLP Header Log use the struct and the struct members are
> named individually.
> 
> Generalize the struct name and members, and use it consistently where
> TLP Header Log is being handled so that a pcie_read_tlp_log() helper
> can be easily added.
> 
> Signed-off-by: Ilpo Järvinen 

> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index bd541527c8c7..5fdf37968b2d 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -1,6 +1,7 @@
>  // SPDX-License-Identifier: GPL-2.0
>  /* Copyright(c) 1999 - 2018 Intel Corporation. */
>  
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -391,22 +392,6 @@ u16 ixgbe_read_pci_cfg_word(struct ixgbe_hw *hw, u32 reg)
>   return value;
>  }
>  
> -#ifdef CONFIG_PCI_IOV
> -static u32 ixgbe_read_pci_cfg_dword(struct ixgbe_hw *hw, u32 reg)
> -{
> - struct ixgbe_adapter *adapter = hw->back;
> - u32 value;
> -
> - if (ixgbe_removed(hw->hw_addr))
> - return IXGBE_FAILED_READ_CFG_DWORD;
> - pci_read_config_dword(adapter->pdev, reg, &value);
> - if (value == IXGBE_FAILED_READ_CFG_DWORD &&
> - ixgbe_check_cfg_remove(hw, adapter->pdev))
> - return IXGBE_FAILED_READ_CFG_DWORD;
> - return value;
> -}
> -#endif /* CONFIG_PCI_IOV */
> -
>  void ixgbe_write_pci_cfg_word(struct ixgbe_hw *hw, u32 reg, u16 value)
>  {
>   struct ixgbe_adapter *adapter = hw->back;
> @@ -11332,8 +11317,8 @@ static pci_ers_result_t ixgbe_io_error_detected(struct pci_dev *pdev,
>  #ifdef CONFIG_PCI_IOV
>   struct ixgbe_hw *hw = &adapter->hw;
>   struct pci_dev *bdev, *vfdev;
> - u32 dw0, dw1, dw2, dw3;
> - int vf, pos;
> + struct pcie_tlp_log tlp_log;
> + int vf, pos, ret;
>   u16 req_id, pf_func;
>  
>   if (adapter->hw.mac.type == ixgbe_mac_82598EB ||
> @@ -11351,14 +11336,13 @@ static pci_ers_result_t ixgbe_io_error_detected(struct pci_dev *pdev,
>   if (!pos)
>   goto skip_bad_vf_detection;
>  
> - dw0 = ixgbe_read_pci_cfg_dword(hw, pos + PCI_ERR_HEADER_LOG);
> - dw1 = ixgbe_read_pci_cfg_dword(hw, pos + PCI_ERR_HEADER_LOG + 4);
> - dw2 = ixgbe_read_pci_cfg_dword(hw, pos + PCI_ERR_HEADER_LOG + 8);
> - dw3 = ixgbe_read_pci_cfg_dword(hw, pos + PCI_ERR_HEADER_LOG + 12);
> - if (ixgbe_removed(hw->hw_addr))
> + ret = pcie_read_tlp_log(pdev, pos + PCI_ERR_HEADER_LOG, &tlp_log);
> + if (ret < 0) {
> + ixgbe_check_cfg_remove(hw, pdev);
>   goto skip_bad_vf_detection;
> + }
>  
> - req_id = dw1 >> 16;
> + req_id = tlp_log.dw[1] >> 16;
>   /* On the 82599 if bit 7 of the requestor ID is set then it's a VF */
>   if (!(req_id & 0x0080))
>   goto skip_bad_vf_detection;
> @@ -11369,9 +11353,8 @@ static pci_ers_result_t ixgbe_io_error_detected(struct pci_dev *pdev,
>  
>   vf = FIELD_GET(0x7F, req_id);
>   e_dev_err("VF %d has caused a PCIe error\n", vf);
> - e_dev_err("TLP: dw0: %8.8x\tdw1: %8.8x\tdw2: "
> - "%8.8x\tdw3: %8.8x\n",
> - dw0, dw1, dw2, dw3);
> + e_dev_err("TLP: dw0: %8.8x\tdw1: %8.8x\tdw2: %8.8x\tdw3: %8.8x\n",
> +   tlp_log.dw[0], tlp_log.dw[1], tlp_log.dw[2], tlp_log.dw[3]);
>   switch (adapter->hw.mac.type) {
>   case ixgbe_mac_82599EB:
>   device_id = IXGBE_82599_VF_DEVICE_ID;

The rest of this patch is headed for v6.10, but I dropped this ixgbe
change for now.

These TLP Log registers are generic, not device-specific, and if
there's something lacking in the PCI core that leads to ixgbe reading
and dumping them itself, I'd rather improve the PCI core so all
drivers will benefit without having to add code like this.

83c61fa97a7d ("ixgbe: Add protection from VF invalid target DMA") [1]
added the ixgbe TLP Log dumping way back in v3.2 (2012).  It does do
some device-specific VF checking and so on, but even back then, it
looks like the PCI core would have dumped the log itself [2], so I
don't know why we needed the extra dumping in ixgbe.

So what I'd really like is to remove the TLP Log reading and printing
from ixgbe completely, but keep the VF checking.
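
For reference, the VF check that would stay is just bit-twiddling on DW1 of
the logged header.  A minimal user-space sketch, following the quoted ixgbe
code (Requester ID in the upper 16 bits of DW1, bit 7 marks a VF on the
82599, low 7 bits give the VF number); struct tlp_log and the helper names
are made up for illustration and are not kernel interfaces:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the four Header Log DWORDs. */
struct tlp_log {
	uint32_t dw[4];
};

/* The Requester ID is the upper 16 bits of DW1, as in the quoted patch. */
static uint16_t tlp_log_requester_id(const struct tlp_log *log)
{
	return (uint16_t)(log->dw[1] >> 16);
}

/* On the 82599, bit 7 of the Requester ID set means the requester is a VF. */
static bool req_id_is_vf(uint16_t req_id)
{
	return (req_id & 0x0080) != 0;
}

/* VF number is the low 7 bits (the quoted code uses FIELD_GET(0x7F, ...)). */
static unsigned int req_id_vf_index(uint16_t req_id)
{
	assert(req_id_is_vf(req_id));
	return req_id & 0x7F;
}
```

Only tlp_log_requester_id() needs the log contents, so if the PCI core
handed the driver the logged header (or just the Requester ID), the driver
would not have to read config space itself.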

Bjorn

[1] https://git.kernel.org/linus/83c61fa97a7d
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/pcie/aer/aerdrv_errprint.c?id=83c61fa97a7d#n181

Re: [PATCH 0/4] PCI: Consolidate TLP Log reading and printing

2024-03-08 Thread Bjorn Helgaas

On Tue, Feb 06, 2024 at 03:57:13PM +0200, Ilpo Järvinen wrote:
> This series consolidates AER & DPC TLP Log handling code. Helpers are
> added for reading and printing the TLP Log and the format is made to
> include E-E Prefixes in both cases (previously only one DPC RP PIO
> displayed the E-E Prefixes).
> 
> I'd appreciate if people familiar with ixgbe could check the error
> handling conversion within the driver is correct.
> 
> Ilpo Järvinen (4):
>   PCI/AER: Cleanup register variable
>   PCI: Generalize TLP Header Log reading

I applied these first two to pci/aer for v6.9, thanks, these are all
nice improvements!

I postponed the ixgbe part for now because I think we should get an
ack from those maintainers or just send it to them since it subtly
changes the error and device removal checking there.

>   PCI: Add TLP Prefix reading into pcie_read_tlp_log()
>   PCI: Create helper to print TLP Header and Prefix Log

I'll respond to these with some minor comments.

>  drivers/firmware/efi/cper.c   |  4 +-
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 39 +++--
>  drivers/pci/ats.c |  2 +-
>  drivers/pci/pci.c | 79 +++
>  drivers/pci/pci.h |  2 +-
>  drivers/pci/pcie/aer.c| 28 ++-
>  drivers/pci/pcie/dpc.c| 31 
>  drivers/pci/probe.c   | 14 ++--
>  include/linux/aer.h   | 16 ++--
>  include/linux/pci.h   |  2 +-
>  include/ras/ras_event.h   | 10 +--
>  include/uapi/linux/pci_regs.h |  2 +
>  12 files changed, 145 insertions(+), 84 deletions(-)
> 
> -- 
> 2.39.2
>

Re: [PATCH 1/1] PCI/portdrv: Allow DPC if the OS controls AER natively.

2024-02-21 Thread Bjorn Helgaas

[+cc Mahesh, Oliver, linuxppc-dev, since I mentioned powerpc below.
Probably not of interest since this is about the ACPI EDR feature, but
just FYI]

On Wed, Feb 21, 2024 at 05:11:04PM -0600, Bjorn Helgaas wrote:
> On Tue, Jan 23, 2024 at 09:59:21AM -0600, Bjorn Helgaas wrote:
> > On Mon, Jan 22, 2024 at 06:37:48PM -0800, Kuppuswamy Sathyanarayanan wrote:
> > > On 1/22/24 11:32 AM, Bjorn Helgaas wrote:
> > > > On Mon, Jan 08, 2024 at 05:15:08PM -0700, Matthew W Carlis wrote:
> > > >> A small part is probably historical; we've been using DPC on PCIe
> > > >> switches since before there was any EDR support in the kernel. It
> > > >> looks like there was a PCIe DPC ECN as early as Feb 2012, but this
> > > >> EDR/DPC fw ECN didn't come in till Jan 2019 & kernel support for ECN
> > > >> was even later. It's not immediately clear I would want to use EDR in
> > > >> my newer architectures & then there are also the older architectures
> > > >> still requiring support. When I submitted this patch I came at it
> > > >> with the approach of trying to keep the old behavior & still support
> > > >> the newer EDR behavior. Bjorns patch from Dec 28 2023 would seem to
> > > >> change the behavior for both root ports & switch ports, requiring
> > > >> them to set _OSC Control Field bit 7 (DPC) and _OSC Support Field
> > > >> bit 7 (EDR) or a kernel command line value. I think no matter what,
> > > >> we want to ensure that PCIe Root Ports and PCIe switches arrive at
> > > >> the same policy here.
> > > > Is there an approved DPC ECN to the PCI Firmware spec that adds DPC
> > > > control negotiation, but does *not* add the EDR requirement?
> > > >
> > > > I'm looking at
> > > > https://members.pcisig.com/wg/PCI-SIG/document/previewpdf/12888, which
> > > > seems to be the final "Downstream Port Containment Related
> > > > Enhancements" ECN, which is dated 1/28/2019 and applies to the PCI
> > > > Firmware spec r3.2.
> > > >
> > > > It adds bit 7, "PCI Express Downstream Port Containment Configuration
> > > > control", to the passed-in _OSC Control field, which indicates that
> > > > the OS supports both "native OS control and firmware ownership models
> > > > (i.e. Error Disconnect Recover notification) of Downstream Port
> > > > Containment."
> > > >
> > > > It also adds the dependency that "If the OS sets bit 7 of the Control
> > > > field, it must set bit 7 of the Support field, indicating support for
> > > > the Error Disconnect Recover event."
> > > >
> > > > So I'm trying to figure out if the "support DPC but not EDR" situation
> > > > was ever a valid place to be.  Maybe it's a mistake to have separate
> > > > CONFIG_PCIE_DPC and CONFIG_PCIE_EDR options.
> > > 
> > > My understanding is also similar. I have raised the same point in
> > > https://lore.kernel.org/all/3c02a6d6-917e-486c-ad41-bdf176639...@linux.intel.com/
> > 
> > Ah, sorry, I missed that.
> > 
> > > IMO, we don't need a separate config for EDR. I don't think user can
> > > gain anything with disabling EDR and enabling DPC. As long as
> > > firmware does not user EDR support, just compiling the code should
> > > be harmless.
> > > 
> > > So we can either remove it, or select it by default if user selects
> > > DPC config.
> > > 
> > > > CONFIG_PCIE_EDR depends on CONFIG_ACPI, so the situation is a little
> > > > bit murky on non-ACPI systems that support DPC.
> > > 
> > > If we are going to remove the EDR config, it might need #ifdef
> > > CONFIG_ACPI changes in edr.c to not compile ACPI specific code.
> > > Alternative choice is to compile edr.c with CONFIG_ACPI.
> > 
> > Right.  I think we should probably remove CONFIG_PCIE_EDR completely
> > and make everything controlled by CONFIG_PCIE_DPC.
> 
> In the PCI Firmware spec, r3.3, sec 4.5.1, table 4-4, the description
> of "Error Disconnect Recover Supported" hints at the possibility for
> an OS to support EDR but not DPC:
> 
>   In the context of PCIe, support for Error Disconnect Recover implies
>   that the operating system will invalidate the software state
>   associated with child devices of the port without attempting to
>   access the child device hardware. *If* the operating s

Re: [PATCH v2 1/4] PCI/AER: Store more information in aer_err_info

2024-02-06 Thread Bjorn Helgaas

On Wed, Feb 07, 2024 at 12:41:41AM +0800, Wang, Qingshun wrote:
> On Mon, Feb 05, 2024 at 05:12:31PM -0600, Bjorn Helgaas wrote:
> > On Thu, Jan 25, 2024 at 02:27:59PM +0800, Wang, Qingshun wrote:
> > > When Advisory Non-Fatal errors are raised, both correctable and
> > > uncorrectable error statuses will be set. The current kernel code cannot
> > > store both statuses at the same time, thus failing to handle ANFE properly.
> > > In addition, to avoid clearing UEs that are not ANFE by accident, UE
> > > severity and Device Status also need to be recorded: any fatal UE cannot
> > > be ANFE, and if Fatal/Non-Fatal Error Detected is set in Device Status, do
> > > not take any assumption and let UE handler to clear UE status.
> > > 
> > > Store status and mask of both correctable and uncorrectable errors in
> > > aer_err_info. The severity of UEs and the values of the Device Status
> > > register are also recorded, which will be used to determine UEs that should
> > > be handled by the ANFE handler. Refactor the rest of the code to use
> > > cor/uncor_status and cor/uncor_mask fields instead of status and mask
> > > fields.
> > 
> > There's a lot going on in this patch.  Could it possibly be split up a
> > bit, e.g., first tease apart aer_err_info.status/.mask into
> > .cor_status/mask and .uncor_status/mask, then add .uncor_severity,
> > then add the device_status bit separately?  If it could be split up, I
> > think the ANFE case would be easier to see.
> 
> Thanks for the feedback! Will split it up into two patches in the next
> version.

Or even three:

  1) tease apart aer_err_info.status/.mask into .cor_status/mask and
 .uncor_status/mask

  2) add .uncor_severity

  3) add device_status

Looking at this again, I'm a little confused about 2) and 3).  I see
the new read of PCI_ERR_UNCOR_SEVER into .uncor_severity, but there's
no actual *use* of it.

Same for 3), I see the new read of PCI_EXP_DEVSTA, but AFAICS there's
no use of that value.

We should have the addition of these new values in the same patch
that *uses* them.

Bjorn

Re: [PATCH v2 2/4] PCI/AER: Handle Advisory Non-Fatal properly

2024-02-05 Thread Bjorn Helgaas

In the subject, "properly" really doesn't convey information.  I think
this patch does two things:

  - Prints error bits that might be ANFE 
  - Clears UNCOR_STATUS bits that were previously not cleared

Maybe the subject line could say something about those (clearing
UNCOR_STATUS might be more important, or maybe this could even be
split into two patches so we could see both).
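
As I read it, the selection of UNCOR_STATUS bits to print and clear boils
down to the mask arithmetic below.  This is a user-space model of the idea in
anfe_get_related_err() from the quoted patch, not the patch itself: struct
anfe_info and its field names are assumptions for illustration, and the
register bit values are copied from include/uapi/linux/pci_regs.h.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values as defined in include/uapi/linux/pci_regs.h */
#define PCI_EXP_DEVSTA_NFED	0x0002		/* Non-Fatal Error Detected */
#define PCI_EXP_DEVSTA_FED	0x0004		/* Fatal Error Detected */
#define PCI_ERR_UNC_POISON_TLP	0x00001000
#define PCI_ERR_UNC_COMP_TIME	0x00004000
#define PCI_ERR_UNC_COMP_ABORT	0x00008000
#define PCI_ERR_UNC_UNX_COMP	0x00010000
#define PCI_ERR_UNC_UNSUP	0x00100000

/* Uncorrectable errors that may be signaled as Advisory Non-Fatal */
#define ANFE_UNC_MASK	(PCI_ERR_UNC_POISON_TLP | PCI_ERR_UNC_COMP_TIME | \
			 PCI_ERR_UNC_COMP_ABORT | PCI_ERR_UNC_UNX_COMP |  \
			 PCI_ERR_UNC_UNSUP)

/* Hypothetical snapshot of the registers involved. */
struct anfe_info {
	uint16_t device_status;		/* PCIe Device Status */
	uint32_t uncor_status;		/* Uncorrectable Error Status */
	uint32_t uncor_mask;		/* Uncorrectable Error Mask */
	uint32_t uncor_severity;	/* Uncorrectable Error Severity, 1 = fatal */
};

/* Return the UNCOR_STATUS bits that can safely be attributed to an ANFE. */
static uint32_t anfe_related_errors(const struct anfe_info *info)
{
	/* A real Fatal/Non-Fatal error was detected: attribute nothing. */
	if (info->device_status & (PCI_EXP_DEVSTA_NFED | PCI_EXP_DEVSTA_FED))
		return 0;

	/* Set, unmasked, ANFE-capable, and configured as non-fatal. */
	return info->uncor_status & ~info->uncor_mask &
	       ANFE_UNC_MASK & ~info->uncor_severity;
}
```

A Poisoned TLP status bit with non-fatal severity and clean Device Status is
selected; the same bit is left alone as soon as NFED or FED is set, or when
its severity is fatal.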

On Thu, Jan 25, 2024 at 02:28:00PM +0800, Wang, Qingshun wrote:
> When processing an Advisory Non-Fatal error, ideally both correctable
> error status and uncorrectable error status should be cleared. However,
> there is no way to fully identify the UE associated with ANFE. Even
> worse, a Fatal/Non-Fatal error may set the same UE status bit as ANFE.
> Assuming an ANFE is FE/NFE is kind of bad, but assuming a FE/NFE is an
> ANFE is usually unacceptable. To avoid clearing UEs that are not ANFE by
> accident, the most conservative route is taken here: If any of the
> Fatal/Non-Fatal Error Detected bits is set in Device Status, do not
> touch UE status, they should be cleared later by the UE handler.
> Otherwise, a specific set of UEs that may be raised as ANFE according to
> the PCIe specification will be cleared if their corresponding severity
> is non-fatal. Additionally, log UEs that will be cleared.
> 
> For instance, previously when kernel receives an ANFE with Poisoned TLP
> in OS native AER mode, only status of CE will be reported and cleared:
> 
>   AER: Corrected error received: :b7:02.0
>   PCIe Bus Error: severity=Corrected, type=Transaction Layer, (Receiver ID)
> device [8086:0db0] error status/mask=2000/
>  [13] NonFatalErr
> 
> If the kernel receives a Malformed TLP after that, two UE will be
> reported, which is unexpected. Malformed TLP Header was lost since
> the previous ANF gated the TLP header logs:
> 
>   PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)
> device [8086:0db0] error status/mask=00041000/00180020
>  [12] TLP(First)
>  [18] MalfTLP
> 
> Now, in the same scenario, both CE status and related UE status will be
> reported and cleared after ANFE:
> 
>   AER: Corrected error received: :b7:02.0
>   PCIe Bus Error: severity=Corrected, type=Transaction Layer, (Receiver ID)
> device [8086:0db0] error status/mask=2000/
>  [13] NonFatalErr
> Uncorrectable errors that may cause Advisory Non-Fatal:
>  [18] TLP
> 
> Signed-off-by: "Wang, Qingshun" 
> ---
>  drivers/pci/pcie/aer.c | 61 +-
>  1 file changed, 60 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 6583dcf50977..713cbf625d3f 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -107,6 +107,12 @@ struct aer_stats {
>   PCI_ERR_ROOT_MULTI_COR_RCV |\
>   PCI_ERR_ROOT_MULTI_UNCOR_RCV)
>  
> +#define AER_ERR_ANFE_UNC_MASK (PCI_ERR_UNC_POISON_TLP | \
> +                               PCI_ERR_UNC_COMP_TIME | \
> +                               PCI_ERR_UNC_COMP_ABORT | \
> +                               PCI_ERR_UNC_UNX_COMP | \
> +                               PCI_ERR_UNC_UNSUP)
> +
>  static int pcie_aer_disable;
>  static pci_ers_result_t aer_root_reset(struct pci_dev *dev);
>  
> @@ -612,6 +618,32 @@ const struct attribute_group aer_stats_attr_group = {
>   .is_visible = aer_stats_attrs_are_visible,
>  };
>  
> +static int anfe_get_related_err(struct aer_err_info *info)
> +{
> + /*
> +  * Take the most conservative route here. If there are
> +  * Non-Fatal/Fatal errors detected, do not assume any
> +  * bit in uncor_status is set by ANFE.
> +  */
> + if (info->device_status & (PCI_EXP_DEVSTA_NFED | PCI_EXP_DEVSTA_FED))
> + return 0;
> + /*
> +  * According to PCIe Base Specification Revision 6.1,
> +  * Section 6.2.3.2.4, if an UNCOR error is rasied as
> +  * Advisory Non-Fatal error, it will match the following
> +  * conditions:
> +  *  a. The severity of the error is Non-Fatal.
> +  *  b. The error is one of the following:
> +  *  1. Poisoned TLP
> +  *  2. Completion Timeout
> +  *  3. Completer Abort
> +  *  4. Unexpected Completion
> +  *  5. Unsupported Request
> +  */
> + return info->uncor_status & ~info->uncor_mask
> + & AER_ERR_ANFE_UNC_MASK & ~info->severity;
> +}
> +
>  static void pci_dev_aer_stats_incr(struct pci_dev *pdev,
>  struct aer_err_info *info)
>  {
> @@ -678,6 +710,7 @@ static void __aer_print_error(struct pci_dev *dev,
> struct aer_err_info *info)
>  {
>   unsigned long status;
> + unsigned long anfe_status;
>

Re: [PATCH v2 1/4] PCI/AER: Store more information in aer_err_info

2024-02-05 Thread Bjorn Helgaas

On Thu, Jan 25, 2024 at 02:27:59PM +0800, Wang, Qingshun wrote:
> When Advisory Non-Fatal errors are raised, both correctable and
> uncorrectable error statuses will be set. The current kernel code cannot
> store both statuses at the same time, thus failing to handle ANFE properly.
> In addition, to avoid clearing UEs that are not ANFE by accident, UE
> severity and Device Status also need to be recorded: any fatal UE cannot
> be ANFE, and if Fatal/Non-Fatal Error Detected is set in Device Status, do
> not take any assumption and let UE handler to clear UE status.
> 
> Store status and mask of both correctable and uncorrectable errors in
> aer_err_info. The severity of UEs and the values of the Device Status
> register are also recorded, which will be used to determine UEs that should
> be handled by the ANFE handler. Refactor the rest of the code to use
> cor/uncor_status and cor/uncor_mask fields instead of status and mask
> fields.

There's a lot going on in this patch.  Could it possibly be split up a
bit, e.g., first tease apart aer_err_info.status/.mask into
.cor_status/mask and .uncor_status/mask, then add .uncor_severity,
then add the device_status bit separately?  If it could be split up, I
think the ANFE case would be easier to see.

Thanks a lot for working on this area!

Bjorn

Re: [PATCH 1/1] PCI/DPC: Fix TLP Prefix register reading offset

2024-01-22 Thread Bjorn Helgaas

On Thu, Jan 18, 2024 at 01:08:15PM +0200, Ilpo Järvinen wrote:
> The TLP Prefix Log Register consists of multiple DWORDs (PCIe r6.1 sec
> 7.9.14.13) but the loop in dpc_process_rp_pio_error() keeps reading
> from the first DWORD. Add the iteration count based offset calculation
> into the config read.
> 
> Fixes: f20c4ea49ec4 ("PCI/DPC: Add eDPC support")
> Signed-off-by: Ilpo Järvinen 

Applied to pci/dpc for v6.9 with commit log below, thanks!

PCI/DPC: Print all TLP Prefixes, not just the first

The TLP Prefix Log Register consists of multiple DWORDs (PCIe r6.1 sec
7.9.14.13) but the loop in dpc_process_rp_pio_error() keeps reading from
the first DWORD, so we print only the first PIO TLP Prefix (duplicated
several times), and we never print the second, third, etc., Prefixes.

Add the iteration count based offset calculation into the config read.

Fixes: f20c4ea49ec4 ("PCI/DPC: Add eDPC support")
Link: https://lore.kernel.org/r/20240118110815.3867-1-ilpo.jarvi...@linux.intel.com
Signed-off-by: Ilpo Järvinen 
[bhelgaas: add user-visible details to commit log]
Signed-off-by: Bjorn Helgaas 

> ---
>  drivers/pci/pcie/dpc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> index 94111e438241..e5d7c12854fa 100644
> --- a/drivers/pci/pcie/dpc.c
> +++ b/drivers/pci/pcie/dpc.c
> @@ -234,7 +234,7 @@ static void dpc_process_rp_pio_error(struct pci_dev *pdev)
>  
>   for (i = 0; i < pdev->dpc_rp_log_size - 5; i++) {
>   pci_read_config_dword(pdev,
> - cap + PCI_EXP_DPC_RP_PIO_TLPPREFIX_LOG, &prefix);
> + cap + PCI_EXP_DPC_RP_PIO_TLPPREFIX_LOG + i * 4, &prefix);
>   pci_err(pdev, "TLP Prefix Header: dw%d, %#010x\n", i, prefix);
>   }
>   clear_status:
> -- 
> 2.39.2
>

Re: [PATCH 1/1] PCI/DPC: Fix TLP Prefix register reading offset

2024-01-19 Thread Bjorn Helgaas

On Thu, Jan 18, 2024 at 01:08:15PM +0200, Ilpo Järvinen wrote:
> The TLP Prefix Log Register consists of multiple DWORDs (PCIe r6.1 sec
> 7.9.14.13) but the loop in dpc_process_rp_pio_error() keeps reading
> from the first DWORD. Add the iteration count based offset calculation
> into the config read.

So IIUC the user-visible bug is that we print only the first PIO TLP
Prefix (duplicated several times), and we never print the second,
third, etc Prefixes, right?

I wish we could print them all in a single pci_err(), as we do for the
TLP Header Log, instead of dribbling them out one by one.
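
Something along these lines is what I have in mind, shown as a user-space
sketch rather than a patch.  Config space is modelled as a byte array,
cfg_read_dword() is a stand-in for pci_read_config_dword(), and
format_tlp_prefixes() and its output format are hypothetical.  Note the
"base + i * 4" stride, which is the same offset calculation the fix adds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for a config read: little-endian DWORD from a byte array. */
static uint32_t cfg_read_dword(const uint8_t *cfg, size_t off)
{
	return (uint32_t)cfg[off] |
	       (uint32_t)cfg[off + 1] << 8 |
	       (uint32_t)cfg[off + 2] << 16 |
	       (uint32_t)cfg[off + 3] << 24;
}

/*
 * Format `count` TLP Prefix Log DWORDs starting at `base` into one line.
 * Returns the number of characters the full line needs (snprintf-style).
 */
static int format_tlp_prefixes(const uint8_t *cfg, size_t base,
			       unsigned int count, char *buf, size_t len)
{
	unsigned int i;
	int n;

	assert(buf != NULL && len > 0);

	n = snprintf(buf, len, "TLP Prefix Log:");
	for (i = 0; i < count && n >= 0 && (size_t)n < len; i++) {
		/* Each prefix is one DWORD further into the log. */
		uint32_t prefix = cfg_read_dword(cfg, base + (size_t)i * 4);

		n += snprintf(buf + n, len - (size_t)n, " %08x",
			      (unsigned int)prefix);
	}
	return n;
}
```

The caller would then emit the buffer with a single print, so all Prefixes
land on one log line.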

> Fixes: f20c4ea49ec4 ("PCI/DPC: Add eDPC support")
> Signed-off-by: Ilpo Järvinen 
> ---
>  drivers/pci/pcie/dpc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> index 94111e438241..e5d7c12854fa 100644
> --- a/drivers/pci/pcie/dpc.c
> +++ b/drivers/pci/pcie/dpc.c
> @@ -234,7 +234,7 @@ static void dpc_process_rp_pio_error(struct pci_dev *pdev)
>  
>   for (i = 0; i < pdev->dpc_rp_log_size - 5; i++) {
>   pci_read_config_dword(pdev,
> - cap + PCI_EXP_DPC_RP_PIO_TLPPREFIX_LOG, &prefix);
> + cap + PCI_EXP_DPC_RP_PIO_TLPPREFIX_LOG + i * 4, &prefix);
>   pci_err(pdev, "TLP Prefix Header: dw%d, %#010x\n", i, prefix);
>   }
>   clear_status:
> -- 
> 2.39.2
>

[PATCH 7/8] powerpc: Fix typos

2024-01-03 Thread Bjorn Helgaas

From: Bjorn Helgaas 

Fix typos, most reported by "codespell arch/powerpc".  Only touches
comments, no code changes.

Signed-off-by: Bjorn Helgaas 
Cc: Nicholas Piggin 
Cc: Christophe Leroy 
Cc: linuxppc-dev@lists.ozlabs.org
---
 arch/powerpc/boot/Makefile   |  4 ++--
 arch/powerpc/boot/dts/acadia.dts |  2 +-
 arch/powerpc/boot/main.c |  2 +-
 arch/powerpc/boot/ps3.c  |  2 +-
 arch/powerpc/include/asm/io.h|  2 +-
 arch/powerpc/include/asm/opal-api.h  |  4 ++--
 arch/powerpc/include/asm/pmac_feature.h  |  2 +-
 arch/powerpc/include/asm/uninorth.h  |  2 +-
 arch/powerpc/include/uapi/asm/bootx.h|  2 +-
 arch/powerpc/kernel/eeh_pe.c |  2 +-
 arch/powerpc/kernel/fadump.c |  2 +-
 arch/powerpc/kernel/misc_64.S|  4 ++--
 arch/powerpc/kernel/process.c| 12 ++--
 arch/powerpc/kernel/ptrace/ptrace-tm.c   |  2 +-
 arch/powerpc/kernel/smp.c|  2 +-
 arch/powerpc/kernel/sysfs.c  |  4 ++--
 arch/powerpc/kvm/book3s_xive.c   |  2 +-
 arch/powerpc/mm/cacheflush.c |  2 +-
 arch/powerpc/mm/nohash/kaslr_booke.c |  2 +-
 arch/powerpc/platforms/512x/mpc512x_shared.c |  2 +-
 arch/powerpc/platforms/cell/spufs/sched.c|  2 +-
 arch/powerpc/platforms/maple/pci.c   |  2 +-
 arch/powerpc/platforms/powermac/pic.c|  2 +-
 arch/powerpc/platforms/powermac/sleep.S  |  2 +-
 arch/powerpc/platforms/powernv/pci-sriov.c   |  4 ++--
 arch/powerpc/platforms/powernv/vas-window.c  |  2 +-
 arch/powerpc/platforms/pseries/vas.c |  2 +-
 arch/powerpc/sysdev/xive/common.c|  4 ++--
 arch/powerpc/sysdev/xive/native.c|  2 +-
 29 files changed, 40 insertions(+), 40 deletions(-)

diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index 968aee2025b8..9c2b6e527ed1 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -108,8 +108,8 @@ DTC_FLAGS   ?= -p 1024
 # these files into the build dir, fix up any includes and ensure that dependent
 # files are copied in the right order.
 
-# these need to be seperate variables because they are copied out of different
-# directories in the kernel tree. Sure you COULd merge them, but it's a
+# these need to be separate variables because they are copied out of different
+# directories in the kernel tree. Sure you COULD merge them, but it's a
 # cure-is-worse-than-disease situation.
 zlib-decomp-$(CONFIG_KERNEL_GZIP) := decompress_inflate.c
 zlib-$(CONFIG_KERNEL_GZIP) := inffast.c inflate.c inftrees.c
diff --git a/arch/powerpc/boot/dts/acadia.dts b/arch/powerpc/boot/dts/acadia.dts
index deb52e41ab84..5fedda811378 100644
--- a/arch/powerpc/boot/dts/acadia.dts
+++ b/arch/powerpc/boot/dts/acadia.dts
@@ -172,7 +172,7 @@ ieee1588@ef602800 {
reg = <0xef602800 0x60>;
interrupt-parent = <&UIC0>;
interrupts = <0x4 0x4>;
-   /* This thing is a bit weird.  It has it's own UIC
+   /* This thing is a bit weird.  It has its own UIC
     * that it uses to generate snapshot triggers.  We
     * don't really support this device yet, and it needs
     * work to figure this out.
diff --git a/arch/powerpc/boot/main.c b/arch/powerpc/boot/main.c
index cae31a6e8f02..2c0e2a1cab01 100644
--- a/arch/powerpc/boot/main.c
+++ b/arch/powerpc/boot/main.c
@@ -188,7 +188,7 @@ static inline void prep_esm_blob(struct addr_range vmlinux, void *chosen) { }
 
 /* A buffer that may be edited by tools operating on a zImage binary so as to
  * edit the command line passed to vmlinux (by setting /chosen/bootargs).
- * The buffer is put in it's own section so that tools may locate it easier.
+ * The buffer is put in its own section so that tools may locate it easier.
  */
 static char cmdline[BOOT_COMMAND_LINE_SIZE]
__attribute__((__section__("__builtin_cmdline")));
diff --git a/arch/powerpc/boot/ps3.c b/arch/powerpc/boot/ps3.c
index f157717ae814..89ff46b8b225 100644
--- a/arch/powerpc/boot/ps3.c
+++ b/arch/powerpc/boot/ps3.c
@@ -25,7 +25,7 @@ BSS_STACK(4096);
 
 /* A buffer that may be edited by tools operating on a zImage binary so as to
  * edit the command line passed to vmlinux (by setting /chosen/bootargs).
- * The buffer is put in it's own section so that tools may locate it easier.
+ * The buffer is put in its own section so that tools may locate it easier.
  */
 
 static char cmdline[BOOT_COMMAND_LINE_SIZE]
diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
index 5220274a6277..7fb001ab3109 100644
--- a/arch/powerpc/include/asm/io.h
+++ b/arch/powerpc/include/asm/io.

Re: [PATCH 2/3] PCI/AER: Decode Requester ID when no error info found

2024-01-02 Thread Bjorn Helgaas

On Tue, Jan 02, 2024 at 11:22:53AM -0800, Kuppuswamy Sathyanarayanan wrote:
> On 12/6/2023 2:42 PM, Bjorn Helgaas wrote:
> > From: Bjorn Helgaas 
> > 
> > When a device with AER detects an error, it logs error information in its
> > own AER Error Status registers.  It may send an Error Message to the Root
> > Port (RCEC in the case of an RCiEP), which logs the fact that an Error
> > Message was received (Root Error Status) and the Requester ID of the
> > message source (Error Source Identification).
> > 
> > aer_print_port_info() prints the Requester ID from the Root Port Error
> > Source in the usual Linux "bb:dd.f" format, but when find_source_device()
> > finds no error details in the hierarchy below the Root Port, it printed the
> > raw Requester ID without decoding it.
> > 
> > Decode the Requester ID in the usual Linux format so it matches other
> > messages.
> > 
> > Sample message changes:
> > 
> >   - pcieport :00:1c.5: AER: Correctable error received: :00:1c.5
> >   - pcieport :00:1c.5: AER: can't find device of ID00e5
> >   + pcieport :00:1c.5: AER: Correctable error message received from :00:1c.5
> >   + pcieport :00:1c.5: AER: found no error details for :00:1c.5
> > 
> > Signed-off-by: Bjorn Helgaas 
> 
> Except for the suggestion given below, it looks good to me.
> 
> Reviewed-by: Kuppuswamy Sathyanarayanan 
> 

Thanks for taking a look!

> > @@ -740,7 +740,7 @@ static void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info)
> > u8 bus = info->id >> 8;
> > u8 devfn = info->id & 0xff;
> >  
> > -   pci_info(dev, "%s%s error received: %04x:%02x:%02x.%d\n",
> > +   pci_info(dev, "%s%s error message received from %04x:%02x:%02x.%d\n",
> >  info->multi_error_valid ? "Multiple " : "",
> >  aer_error_severity_string[info->severity],
> >  pci_domain_nr(dev->bus), bus, PCI_SLOT(devfn),
> > @@ -929,7 +929,12 @@ static bool find_source_device(struct pci_dev *parent,
> > pci_walk_bus(parent->subordinate, find_device_iter, e_info);
> >  
> > if (!e_info->error_dev_num) {
> > -   pci_info(parent, "can't find device of ID%04x\n", e_info->id);
> > +   u8 bus = e_info->id >> 8;
> > +   u8 devfn = e_info->id & 0xff;
> 
> You can use PCI_BUS_NUM(e_info->id) for getting bus number.  Since
> you are extracting this info in more than one place, maybe you can
> also define a macro PCI_DEVFN(id) (following PCI_BUS_NUM()).

Thanks, both good ideas.

We already have a PCI_DEVFN() that *combines* slot + func into devfn,
so we'd have to come up with a different name.

I'll add a patch to use PCI_BUS_NUM() in the two places here and in
pme.c.

I think I'll wait with these until after the v6.7 release.
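
For reference, the decode under discussion can be sketched in plain C outside the kernel. This is only an illustration: the macros are local stand-ins modeled on PCI_BUS_NUM(), PCI_SLOT() and PCI_FUNC(), and RID_DEVFN() and rid_format() are hypothetical names, not existing kernel interfaces.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Local stand-ins modeled on the kernel's PCI_BUS_NUM/PCI_SLOT/PCI_FUNC. */
#define RID_BUS(id)		(((id) >> 8) & 0xff)
#define RID_DEVFN(id)		((id) & 0xff)	/* hypothetical name */
#define DEVFN_SLOT(devfn)	(((devfn) >> 3) & 0x1f)
#define DEVFN_FUNC(devfn)	((devfn) & 0x07)

/*
 * Format a 16-bit Requester ID as "bb:dd.f" (no domain).
 * Returns the number of characters snprintf() would have written.
 */
static int rid_format(uint16_t id, char *buf, size_t len)
{
	unsigned int devfn = RID_DEVFN(id);

	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%02x:%02x.%u",
			(unsigned int)RID_BUS(id),
			DEVFN_SLOT(devfn), DEVFN_FUNC(devfn));
}
```

With the raw ID 00e5 from the sample message, bus is 0x00 and devfn is 0xe5, so the formatted result is "00:1c.5", which matches the address printed by aer_print_port_info().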

> > +   pci_info(parent, "found no error details for %04x:%02x:%02x.%d\n",
> > +pci_domain_nr(parent->bus), bus, PCI_SLOT(devfn),
> > +PCI_FUNC(devfn));
> > return false;
> > }
> > return true;
> 
> -- 
> Sathyanarayanan Kuppuswamy
> Linux Kernel Developer

Re: [PATCH 1/3] PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors

2023-12-12 Thread Bjorn Helgaas

On Tue, Dec 12, 2023 at 09:00:24AM -0600, Terry Bowman wrote:
> Hi Bjorn,
> 
> Will help prevent confusion. LGTM. 

Thanks a lot for taking a look at these!  I'd like to give you credit
in the log, e.g., "Reviewed-by: Terry Bowman ",
but I'm OCD enough that I don't want to translate "LGTM" into that all
by myself.

If you want that credit (and, I guess, the privilege of being cc'd
when we find that these patches break something :)), just reply again
with that actual "Reviewed-by:" text in it.

Bjorn

Re: [PATCH 0/3] PCI/AER: Clean up logging

2023-12-08 Thread Bjorn Helgaas

[+cc Jonathan]

On Wed, Dec 06, 2023 at 04:42:28PM -0600, Bjorn Helgaas wrote:
> From: Bjorn Helgaas 
> 
> Clean up some minor AER logging issues:
> 
>   - Log as "Correctable errors", not "Corrected errors"
> 
>   - Decode the Requester ID when we couldn't find detail error info
> 
> Bjorn Helgaas (3):
>   PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors
>   PCI/AER: Decode Requester ID when no error info found
>   PCI/AER: Use explicit register sizes for struct members
> 
>  drivers/pci/pcie/aer.c | 19 ---
>  include/linux/aer.h|  8 
>  2 files changed, 16 insertions(+), 11 deletions(-)

Applied to pci/aer for v6.8.  Thanks, Jonathan, for your time in
taking a look.

[PATCH 3/3] PCI/AER: Use explicit register sizes for struct members

2023-12-06 Thread Bjorn Helgaas

From: Bjorn Helgaas 

aer_irq() reads the AER Root Error Status and Error Source Identification
(PCI_ERR_ROOT_STATUS and PCI_ERR_ROOT_ERR_SRC) registers directly into
struct aer_err_source.  Both registers are 32 bits, so declare the members
explicitly as "u32" instead of "unsigned int".

Similarly, aer_get_device_error_info() reads the AER Header Log
(PCI_ERR_HEADER_LOG) registers, which are also 32 bits, into struct
aer_header_log_regs.  Declare those members as "u32" as well.

No functional changes intended.
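
As a rough standalone illustration of why the explicit width matters (not part of this patch), the sketch below mirrors struct aer_err_source in userspace. The u32 typedef is a stand-in for the kernel type, and cor_source_id() and uncor_source_id() are hypothetical helper names; they assume the Error Source Identification layout with the ERR_COR source in the low 16 bits and the ERR_FATAL/NONFATAL source in the high 16 bits.

```c
#include <stdint.h>

typedef uint32_t u32;	/* userspace stand-in for the kernel type */

struct aer_err_source {
	u32 status;	/* PCI_ERR_ROOT_STATUS */
	u32 id;		/* PCI_ERR_ROOT_ERR_SRC */
};

/* Two 32-bit registers, so the struct should be exactly 8 bytes. */
_Static_assert(sizeof(struct aer_err_source) == 2 * sizeof(u32),
	       "aer_err_source must mirror two 32-bit registers");

/* Hypothetical accessor: ERR_COR source is the low 16 bits. */
static inline uint16_t cor_source_id(const struct aer_err_source *src)
{
	return src->id & 0xffff;
}

/* Hypothetical accessor: ERR_FATAL/NONFATAL source is the high 16 bits. */
static inline uint16_t uncor_source_id(const struct aer_err_source *src)
{
	return src->id >> 16;
}
```

The compile-time check would not hold for "unsigned int" on a platform where that type is not 32 bits wide, which is the reason for spelling the width out.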

Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/pcie/aer.c | 4 ++--
 include/linux/aer.h| 8 
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 2ff6bac9979f..60f84414ec2a 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -41,8 +41,8 @@
 #define AER_MAX_TYPEOF_UNCOR_ERRS  27  /* as per PCI_ERR_UNCOR_STATUS*/
 
 struct aer_err_source {
-   unsigned int status;
-   unsigned int id;
+   u32 status; /* PCI_ERR_ROOT_STATUS */
+   u32 id; /* PCI_ERR_ROOT_ERR_SRC */
 };
 
 struct aer_rpc {
diff --git a/include/linux/aer.h b/include/linux/aer.h
index f6ea2f57d808..ae0fae70d4bd 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -19,10 +19,10 @@
 struct pci_dev;
 
 struct aer_header_log_regs {
-   unsigned int dw0;
-   unsigned int dw1;
-   unsigned int dw2;
-   unsigned int dw3;
+   u32 dw0;
+   u32 dw1;
+   u32 dw2;
+   u32 dw3;
 };
 
 struct aer_capability_regs {
-- 
2.34.1

[PATCH 2/3] PCI/AER: Decode Requester ID when no error info found

2023-12-06 Thread Bjorn Helgaas

From: Bjorn Helgaas 

When a device with AER detects an error, it logs error information in its
own AER Error Status registers.  It may send an Error Message to the Root
Port (RCEC in the case of an RCiEP), which logs the fact that an Error
Message was received (Root Error Status) and the Requester ID of the
message source (Error Source Identification).

aer_print_port_info() prints the Requester ID from the Root Port Error
Source in the usual Linux "bb:dd.f" format, but when find_source_device()
finds no error details in the hierarchy below the Root Port, it printed the
raw Requester ID without decoding it.

Decode the Requester ID in the usual Linux format so it matches other
messages.

Sample message changes:

  - pcieport :00:1c.5: AER: Correctable error received: :00:1c.5
  - pcieport :00:1c.5: AER: can't find device of ID00e5
  + pcieport :00:1c.5: AER: Correctable error message received from :00:1c.5
  + pcieport :00:1c.5: AER: found no error details for :00:1c.5

Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/pcie/aer.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 20db80018b5d..2ff6bac9979f 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -740,7 +740,7 @@ static void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info)
u8 bus = info->id >> 8;
u8 devfn = info->id & 0xff;
 
-   pci_info(dev, "%s%s error received: %04x:%02x:%02x.%d\n",
+   pci_info(dev, "%s%s error message received from %04x:%02x:%02x.%d\n",
 info->multi_error_valid ? "Multiple " : "",
 aer_error_severity_string[info->severity],
 pci_domain_nr(dev->bus), bus, PCI_SLOT(devfn),
@@ -929,7 +929,12 @@ static bool find_source_device(struct pci_dev *parent,
pci_walk_bus(parent->subordinate, find_device_iter, e_info);
 
if (!e_info->error_dev_num) {
-   pci_info(parent, "can't find device of ID%04x\n", e_info->id);
+   u8 bus = e_info->id >> 8;
+   u8 devfn = e_info->id & 0xff;
+
+   pci_info(parent, "found no error details for %04x:%02x:%02x.%d\n",
+pci_domain_nr(parent->bus), bus, PCI_SLOT(devfn),
+PCI_FUNC(devfn));
return false;
}
return true;
-- 
2.34.1

[PATCH 1/3] PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors

2023-12-06 Thread Bjorn Helgaas

From: Bjorn Helgaas 

The PCIe spec classifies errors as either "Correctable" or "Uncorrectable".
Previously we printed these as "Corrected" or "Uncorrected".  To avoid
confusion, use the same terms as the spec.

One confusing situation is when one agent detects an error, but another
agent is responsible for recovery, e.g., by re-attempting the operation.
The first agent may log a "correctable" error but it has not yet been
corrected.  The recovery agent must report an uncorrectable error if it is
unable to recover.  If we print the first agent's error as "Corrected", it
gives the false impression that it has already been resolved.

Sample message change:

  - pcieport :00:1c.5: AER: Corrected error received: :00:1c.5
  + pcieport :00:1c.5: AER: Correctable error received: :00:1c.5

Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/pcie/aer.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 42a3bd35a3e1..20db80018b5d 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -436,9 +436,9 @@ void pci_aer_exit(struct pci_dev *dev)
  * AER error strings
  */
 static const char *aer_error_severity_string[] = {
-   "Uncorrected (Non-Fatal)",
-   "Uncorrected (Fatal)",
-   "Corrected"
+   "Uncorrectable (Non-Fatal)",
+   "Uncorrectable (Fatal)",
+   "Correctable"
 };
 
 static const char *aer_error_layer[] = {
-- 
2.34.1

[PATCH 0/3] PCI/AER: Clean up logging

2023-12-06 Thread Bjorn Helgaas

From: Bjorn Helgaas 

Clean up some minor AER logging issues:

  - Log as "Correctable errors", not "Corrected errors"

  - Decode the Requester ID when we couldn't find detail error info

Bjorn Helgaas (3):
  PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors
  PCI/AER: Decode Requester ID when no error info found
  PCI/AER: Use explicit register sizes for struct members

 drivers/pci/pcie/aer.c | 19 ---
 include/linux/aer.h|  8 
 2 files changed, 16 insertions(+), 11 deletions(-)

-- 
2.34.1

Re: [PATCH 1/6] x86: Use PCI_HEADER_TYPE_* instead of literals

2023-12-01 Thread Bjorn Helgaas

[+cc scsi, powerpc folks]

On Fri, Dec 01, 2023 at 02:44:47PM -0600, Bjorn Helgaas wrote:
> On Fri, Nov 24, 2023 at 11:09:13AM +0200, Ilpo Järvinen wrote:
> > Replace 0x7f and 0x80 literals with PCI_HEADER_TYPE_* defines.
> > 
> > Signed-off-by: Ilpo Järvinen 
> 
> Applied entire series on the PCI "enumeration" branch for v6.8,
> thanks!
> 
> If anybody wants to take pieces separately, let me know and I'll drop
> from PCI.

OK, b4 picked up the entire series but I was only cc'd on this first
patch, so I missed the responses about EDAC, xtensa, bcma already
being applied elsewhere.

So I kept these in the PCI tree:

  420ac76610d7 ("scsi: lpfc: Use PCI_HEADER_TYPE_MFD instead of literal")
  3773343dd890 ("powerpc/fsl-pci: Use PCI_HEADER_TYPE_MASK instead of literal")
  197e0da1f1a3 ("x86/pci: Use PCI_HEADER_TYPE_* instead of literals")

and dropped the others.

x86, SCSI, powerpc folks, if you want to take these instead, let me
know and I'll drop them.
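
For anyone skimming the series, the substitution it makes can be sketched in standalone C as below. The defines are repeated locally only so the snippet builds outside the kernel (their values are the literals the diffs replace), and header_layout(), is_multifunction() and is_bridge() are hypothetical helper names used for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Repeated locally for a standalone build; these are the replaced literals. */
#define PCI_HEADER_TYPE_MASK	0x7f
#define PCI_HEADER_TYPE_MFD	0x80
#define PCI_HEADER_TYPE_BRIDGE	1

/* Header layout field with the multi-function bit stripped. */
static inline uint8_t header_layout(uint8_t type)
{
	return type & PCI_HEADER_TYPE_MASK;
}

/* True if the Header Type register marks a multi-function device. */
static inline bool is_multifunction(uint8_t type)
{
	return (type & PCI_HEADER_TYPE_MFD) != 0;
}

/* True if the header layout is that of a PCI-to-PCI bridge. */
static inline bool is_bridge(uint8_t type)
{
	return header_layout(type) == PCI_HEADER_TYPE_BRIDGE;
}
```

A Header Type value of 0x81, for example, reads as a multi-function device with a bridge layout.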

> > ---
> >  arch/x86/kernel/aperture_64.c  | 3 +--
> >  arch/x86/kernel/early-quirks.c | 4 ++--
> >  2 files changed, 3 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
> > index 4feaa670d578..89c0c8a3fc7e 100644
> > --- a/arch/x86/kernel/aperture_64.c
> > +++ b/arch/x86/kernel/aperture_64.c
> > @@ -259,10 +259,9 @@ static u32 __init search_agp_bridge(u32 *order, int *valid_agp)
> > order);
> > }
> >  
> > -   /* No multi-function device? */
> > type = read_pci_config_byte(bus, slot, func,
> >PCI_HEADER_TYPE);
> > -   if (!(type & 0x80))
> > +   if (!(type & PCI_HEADER_TYPE_MFD))
> > break;
> > }
> > }
> > diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c
> > index a6c1867fc7aa..59f4aefc6bc1 100644
> > --- a/arch/x86/kernel/early-quirks.c
> > +++ b/arch/x86/kernel/early-quirks.c
> > @@ -779,13 +779,13 @@ static int __init check_dev_quirk(int num, int slot, int func)
> > type = read_pci_config_byte(num, slot, func,
> > PCI_HEADER_TYPE);
> >  
> > -   if ((type & 0x7f) == PCI_HEADER_TYPE_BRIDGE) {
> > +   if ((type & PCI_HEADER_TYPE_MASK) == PCI_HEADER_TYPE_BRIDGE) {
> > sec = read_pci_config_byte(num, slot, func, PCI_SECONDARY_BUS);
> > if (sec > num)
> > early_pci_scan_bus(sec);
> > }
> >  
> > -   if (!(type & 0x80))
> > +   if (!(type & PCI_HEADER_TYPE_MFD))
> > return -1;
> >  
> > return 0;
> > -- 
> > 2.30.2
> >

Re: [pci:controller/xilinx-xdma] BUILD REGRESSION 8d786149d78c7784144c7179e25134b6530b714b

2023-10-31 Thread Bjorn Helgaas

On Tue, Oct 31, 2023 at 09:59:29AM -0700, Nick Desaulniers wrote:
> On Tue, Oct 31, 2023 at 7:56 AM Bjorn Helgaas  wrote:
> > On Sat, Oct 28, 2023 at 08:22:54PM +0800, kernel test robot wrote:
> > > tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git controller/xilinx-xdma
> > > branch HEAD: 8d786149d78c7784144c7179e25134b6530b714b  PCI: xilinx-xdma: Add Xilinx XDMA Root Port driver
> > >
> > > Error/Warning ids grouped by kconfigs:
> > >
> > > clang_recent_errors
> > > `-- powerpc-pmac32_defconfig
> > > |-- arch-powerpc-sysdev-grackle.c:error:unused-function-grackle_set_stg-Werror-Wunused-function
> > > |-- arch-powerpc-xmon-xmon.c:error:unused-function-get_output_lock-Werror-Wunused-function
> > > `-- arch-powerpc-xmon-xmon.c:error:unused-function-release_output_lock-Werror-Wunused-function
> >
> > This report is close to useless.  It doesn't show the complete error
> > message, it doesn't show how to reproduce the issue, and the pci -next
> > branch (including controller/xilinx-xdma) doesn't reference any of
> > these functions:
> >
> >   $ git grep -E "grackle_set_stg|get_output_lock|release_output_lock" | cat
> >   arch/powerpc/sysdev/grackle.c:static inline void grackle_set_stg(struct pci_controller* bp, int enable)
> >   arch/powerpc/sysdev/grackle.c:grackle_set_stg(hose, 1);
> >   arch/powerpc/xmon/xmon.c:static void get_output_lock(void)
> >   arch/powerpc/xmon/xmon.c:static void release_output_lock(void)
> >   arch/powerpc/xmon/xmon.c:static inline void get_output_lock(void) {}
> >   arch/powerpc/xmon/xmon.c:static inline void release_output_lock(void) {}
> >   arch/powerpc/xmon/xmon.c: get_output_lock();
> >   arch/powerpc/xmon/xmon.c: release_output_lock();
> >   arch/powerpc/xmon/xmon.c: get_output_lock();
> >   arch/powerpc/xmon/xmon.c: release_output_lock();
> >   arch/powerpc/xmon/xmon.c: get_output_lock();
> >   arch/powerpc/xmon/xmon.c: release_output_lock();
> >   arch/powerpc/xmon/xmon.c: get_output_lock();
> >   arch/powerpc/xmon/xmon.c: release_output_lock();
> >
> > That said, the unused functions do look legit:
> >
> > grackle_set_stg() is a static function and the only call is under
> > "#if 0".
> 
> Time to remove it then? Or is it a bug that it's not called?
> Otherwise the definition should be behind the same preprocessor guards
> as the caller.  Same for the below.

I don't really care whether we keep the warning or not.

My real complaint is that the 0-day report fingered
pci/controller/xilinx-xdma, which is completely unrelated, which is a
waste of time.
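
For whoever does pick up the warnings, the "same preprocessor guards as the caller" idea quoted above could look roughly like the sketch below. It is not taken from xmon.c or grackle.c: EXAMPLE_NEEDS_LOCK is a hypothetical stand-in for the real condition, and emit_line() and output_lock_depth are made-up names.

```c
#include <stdio.h>

/*
 * EXAMPLE_NEEDS_LOCK is a hypothetical stand-in for the real condition
 * (CONFIG_SMP or CONFIG_DEBUG_FS in the xmon case).
 */
#ifdef EXAMPLE_NEEDS_LOCK
static int output_lock_depth;

static void get_output_lock(void)
{
	output_lock_depth++;
}

static void release_output_lock(void)
{
	output_lock_depth--;
}
#endif

/* Print one line; returns the number of characters written. */
static int emit_line(const char *msg)
{
	int n;

#ifdef EXAMPLE_NEEDS_LOCK
	get_output_lock();
#endif
	n = printf("%s\n", msg);
#ifdef EXAMPLE_NEEDS_LOCK
	release_output_lock();
#endif
	return n;
}
```

Because the definitions and the calls sit behind the same condition, neither configuration leaves a static function defined but unused, so -Wunused-function has nothing to report.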

> > Same with get_output_lock() and release_output_lock(): they're static
> > and always defined in xmon.c, but only called if either CONFIG_SMP or
> > CONFIG_DEBUG_FS.
> >
> > But they're certainly not related to controller/xilinx-xdma, so I'm
> > going to ignore them.
> >
> > Bjorn
> >
> > P.S. Nathan & Nick, I cc'd you because of this earlier report that
> > also mentioned grackle_set_stg():
> > https://lore.kernel.org/lkml/202308121120.u2d3ypvt-...@intel.com/
> 
> 
> 
> -- 
> Thanks,
> ~Nick Desaulniers

Re: [linux-next:master] BUILD REGRESSION c503e3eec382ac708ee7adf874add37b77c5d312

2023-10-31 Thread Bjorn Helgaas

On Tue, Oct 31, 2023 at 04:35:23AM +0800, kernel test robot wrote:
> tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> branch HEAD: c503e3eec382ac708ee7adf874add37b77c5d312  Add linux-next specific files for 20231030
> 
> Error/Warning reports:
> ... 
> https://lore.kernel.org/oe-kbuild-all/202310302206.pkr5ebdi-...@intel.com

> Error/Warning: (recently discovered and may have been fixed)
> 
> Warning: MAINTAINERS references a file that doesn't exist: Documentation/devicetree/bindings/iio/imu/bosch,bma400.yaml
> aarch64-linux-ld: drivers/cxl/core/pci.c:921:(.text+0xbbc): undefined reference to `pci_print_aer'
> ...
> arch/riscv/include/asm/mmio.h:67:(.text+0xd66): undefined reference to `pci_print_aer'
> csky-linux-ld: pci.c:(.text+0x6e8): undefined reference to `pci_print_aer'
> drivers/cxl/core/pci.c:921: undefined reference to `pci_print_aer'
> drivers/cxl/core/pci.c:921:(.text+0xbc0): undefined reference to `pci_print_aer'
> ...
> ld: drivers/cxl/core/pci.c:921: undefined reference to `pci_print_aer'
> loongarch64-linux-ld: drivers/cxl/core/pci.c:921:(.text+0xa38): undefined reference to `pci_print_aer'
> pci.c:(.text+0x662): undefined reference to `pci_print_aer'
> powerpc-linux-ld: pci.c:(.text+0xf10): undefined reference to `pci_print_aer'
> riscv64-linux-ld: pci.c:(.text+0x11ec): undefined reference to `pci_print_aer'

I have no idea about the above (and all the similar ones below); I
assume they all have to do with
https://lore.kernel.org/r/20231018171713.1883517-13-rrich...@amd.com

> Unverified Error/Warning (likely false positive, please contact us if interested):
> 
> drivers/pci/controller/dwc/pcie-rcar-gen4.c:439:15: warning: cast to smaller integer type 'enum dw_pcie_device_mode' from 'const void *' [-Wvoid-pointer-to-enum-cast]

Safe but annoying.  Yoshihiro, can you fix this by adding structs for
the of_device_id.data member instead of casting DW_PCIE_RC_TYPE and
DW_PCIE_EP_TYPE?  Examples here:

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/dwc/pci-dra7xx.c?id=v6.6#n557
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/dwc/pci-keystone.c?id=v6.6#n1069
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/dwc/pcie-artpec6.c?id=v6.6#n452
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/dwc/pcie-designware-plat.c?id=v6.6#n159
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/dwc/pcie-keembay.c?id=v6.6#n437
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/dwc/pcie-tegra194.c?id=v6.6#n2431
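
The pattern those drivers use can be sketched in standalone C roughly as follows. Everything here is a simplified stand-in so the snippet builds outside the kernel: the enum and struct of_device_id are cut-down local copies, and the example_* names and "vendor,example-pcie" compatible strings are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the kernel types; zero means "unknown". */
enum dw_pcie_device_mode {
	DW_PCIE_UNKNOWN_TYPE,
	DW_PCIE_EP_TYPE,
	DW_PCIE_RC_TYPE,
};

struct of_device_id {
	const char *compatible;
	const void *data;
};

/* Hypothetical match data: the mode lives in a struct, not in the pointer. */
struct example_pcie_of_data {
	enum dw_pcie_device_mode mode;
};

static const struct example_pcie_of_data example_rc_data = {
	.mode = DW_PCIE_RC_TYPE,
};

static const struct example_pcie_of_data example_ep_data = {
	.mode = DW_PCIE_EP_TYPE,
};

static const struct of_device_id example_of_match[] = {
	{ .compatible = "vendor,example-pcie",    .data = &example_rc_data },
	{ .compatible = "vendor,example-pcie-ep", .data = &example_ep_data },
	{ .compatible = NULL },	/* sentinel */
};

/* Look up the mode for a compatible string; no pointer-to-enum cast needed. */
static enum dw_pcie_device_mode example_mode_for(const char *compatible)
{
	const struct of_device_id *id;

	for (id = example_of_match; id->compatible; id++) {
		if (strcmp(id->compatible, compatible) == 0) {
			const struct example_pcie_of_data *data = id->data;

			return data->mode;
		}
	}
	return DW_PCIE_UNKNOWN_TYPE;
}
```

Since .data points at a real object, there is nothing for -Wvoid-pointer-to-enum-cast to complain about. The sketch also shows why a match-data struct that never sets .mode ends up with the zero value of the enum.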

Siddharth, since you're looking at keystone v3.65, it looks to me like
DW_PCIE_VER_365A is currently broken because ks_pcie_rc_of_data
doesn't set .mode, so it defaults to zero, and it looks like we should
end up at the INVALID device type case here:

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/dwc/pci-keystone.c?id=v6.6#n1285

> |-- arm64-buildonly-randconfig-r003-20220511
> |   `-- aarch64-linux-ld:drivers-cxl-core-pci.c:(.text):undefined-reference-to-pci_print_aer

> |-- csky-randconfig-001-20231030
> |   |-- csky-linux-ld:pci.c:(.text):undefined-reference-to-pci_print_aer
> |   `-- pci.c:(.text):undefined-reference-to-pci_print_aer

> |-- i386-randconfig-141-20231030
> |   |-- ld:drivers-cxl-core-pci.c:undefined-reference-to-pci_print_aer

> |-- loongarch-randconfig-r014-20230225
> |   `-- drivers-cxl-core-pci.c:(.text):undefined-reference-to-pci_print_aer
> |-- loongarch-randconfig-r032-20220926
> |   `-- loongarch64-linux-ld:drivers-cxl-core-pci.c:(.text):undefined-reference-to-pci_print_aer

> |-- powerpc-randconfig-003-20231016
> |   `-- powerpc-linux-ld:pci.c:(.text):undefined-reference-to-pci_print_aer

> |-- riscv-randconfig-r002-20220124
> |   `-- arch-riscv-include-asm-mmio.h:(.text):undefined-reference-to-pci_print_aer
> |-- riscv-randconfig-r011-20220606
> |   `-- riscv64-linux-ld:pci.c:(.text):undefined-reference-to-pci_print_aer

> |-- x86_64-randconfig-x052-20230810
> |   `-- drivers-cxl-core-pci.c:undefined-reference-to-pci_print_aer

Re: [pci:controller/xilinx-xdma] BUILD REGRESSION 8d786149d78c7784144c7179e25134b6530b714b

2023-10-31 Thread Bjorn Helgaas

[+cc powerpc, clang folks]

On Sat, Oct 28, 2023 at 08:22:54PM +0800, kernel test robot wrote:
> tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git controller/xilinx-xdma
> branch HEAD: 8d786149d78c7784144c7179e25134b6530b714b  PCI: xilinx-xdma: Add Xilinx XDMA Root Port driver
> 
> Error/Warning ids grouped by kconfigs:
> 
> clang_recent_errors
> `-- powerpc-pmac32_defconfig
> |-- arch-powerpc-sysdev-grackle.c:error:unused-function-grackle_set_stg-Werror-Wunused-function
> |-- arch-powerpc-xmon-xmon.c:error:unused-function-get_output_lock-Werror-Wunused-function
> `-- arch-powerpc-xmon-xmon.c:error:unused-function-release_output_lock-Werror-Wunused-function

This report is close to useless.  It doesn't show the complete error
message, it doesn't show how to reproduce the issue, and the pci -next
branch (including controller/xilinx-xdma) doesn't reference any of
these functions:

  $ git grep -E "grackle_set_stg|get_output_lock|release_output_lock" | cat
  arch/powerpc/sysdev/grackle.c:static inline void grackle_set_stg(struct pci_controller* bp, int enable)
  arch/powerpc/sysdev/grackle.c:grackle_set_stg(hose, 1);
  arch/powerpc/xmon/xmon.c:static void get_output_lock(void)
  arch/powerpc/xmon/xmon.c:static void release_output_lock(void)
  arch/powerpc/xmon/xmon.c:static inline void get_output_lock(void) {}
  arch/powerpc/xmon/xmon.c:static inline void release_output_lock(void) {}
  arch/powerpc/xmon/xmon.c: get_output_lock();
  arch/powerpc/xmon/xmon.c: release_output_lock();
  arch/powerpc/xmon/xmon.c: get_output_lock();
  arch/powerpc/xmon/xmon.c: release_output_lock();
  arch/powerpc/xmon/xmon.c: get_output_lock();
  arch/powerpc/xmon/xmon.c: release_output_lock();
  arch/powerpc/xmon/xmon.c: get_output_lock();
  arch/powerpc/xmon/xmon.c: release_output_lock();

That said, the unused functions do look legit:

grackle_set_stg() is a static function and the only call is under
"#if 0".

Same with get_output_lock() and release_output_lock(): they're static
and always defined in xmon.c, but only called if either CONFIG_SMP or
CONFIG_DEBUG_FS.

But they're certainly not related to controller/xilinx-xdma, so I'm
going to ignore them.

Bjorn

P.S. Nathan & Nick, I cc'd you because of this earlier report that
also mentioned grackle_set_stg():
https://lore.kernel.org/lkml/202308121120.u2d3ypvt-...@intel.com/

Re: [PATCH v6 1/3] PCI/AER: Factor out interrupt toggling into helpers

2023-10-25 Thread Bjorn Helgaas

On Fri, May 12, 2023 at 08:00:12AM +0800, Kai-Heng Feng wrote:
> There are many places that enable and disable AER interrupt, so move
> them into helpers.
> 
> Reviewed-by: Mika Westerberg 
> Reviewed-by: Kuppuswamy Sathyanarayanan 
> 
> Reviewed-by: Jonathan Cameron 
> Signed-off-by: Kai-Heng Feng 

I applied this patch (only 1/3) to pci/aer for v6.7.

I'm not clear on the others yet, so let's look at those again after
v6.7-rc1.  It seemed like there's still a question about disabling
interrupts when we're going to D3hot.

>  drivers/pci/pcie/aer.c | 45 +-
>  1 file changed, 27 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index f6c24ded134c..1420e1f27105 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1227,6 +1227,28 @@ static irqreturn_t aer_irq(int irq, void *context)
>   return IRQ_WAKE_THREAD;
>  }
>  
> +static void aer_enable_irq(struct pci_dev *pdev)
> +{
> + int aer = pdev->aer_cap;
> + u32 reg32;
> +
> + /* Enable Root Port's interrupt in response to error messages */
> + pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, &reg32);
> + reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
> + pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, reg32);
> +}
> +
> +static void aer_disable_irq(struct pci_dev *pdev)
> +{
> + int aer = pdev->aer_cap;
> + u32 reg32;
> +
> + /* Disable Root's interrupt in response to error messages */
> + pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, &reg32);
> + reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
> + pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, reg32);
> +}
> +
>  /**
>   * aer_enable_rootport - enable Root Port's interrupts when receiving messages
>   * @rpc: pointer to a Root Port data structure
> @@ -1256,10 +1278,7 @@ static void aer_enable_rootport(struct aer_rpc *rpc)
>   pci_read_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, &reg32);
>   pci_write_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, reg32);
>  
> - /* Enable Root Port's interrupt in response to error messages */
> - pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, &reg32);
> - reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
> - pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, reg32);
> + aer_enable_irq(pdev);
>  }
>  
>  /**
> @@ -1274,10 +1293,7 @@ static void aer_disable_rootport(struct aer_rpc *rpc)
>   int aer = pdev->aer_cap;
>   u32 reg32;
>  
> - /* Disable Root's interrupt in response to error messages */
> - pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, &reg32);
> - reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
> - pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, reg32);
> + aer_disable_irq(pdev);
>  
>   /* Clear Root's error status reg */
>   pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, &reg32);
> @@ -1372,12 +1388,8 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
>*/
>   aer = root ? root->aer_cap : 0;
>  
> - if ((host->native_aer || pcie_ports_native) && aer) {
> - /* Disable Root's interrupt in response to error messages */
> - pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, &reg32);
> - reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
> - pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, reg32);
> - }
> + if ((host->native_aer || pcie_ports_native) && aer)
> + aer_disable_irq(root);
>  
>   if (type == PCI_EXP_TYPE_RC_EC || type == PCI_EXP_TYPE_RC_END) {
>   rc = pcie_reset_flr(dev, PCI_RESET_DO_RESET);
> @@ -1396,10 +1408,7 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
>   pci_read_config_dword(root, aer + PCI_ERR_ROOT_STATUS, &reg32);
>   pci_write_config_dword(root, aer + PCI_ERR_ROOT_STATUS, reg32);
>  
> - /* Enable Root Port's interrupt in response to error messages */
> - pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, &reg32);
> - reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
> - pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, reg32);
> + aer_enable_irq(root);
>   }
>  
>   return rc ? PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_RECOVERED;
> -- 
> 2.34.1
>

Re: [PATCH 2/3] PCI: layerscape: add suspend/resume for ls1021a

2023-10-16 Thread Bjorn Helgaas

On Mon, Oct 16, 2023 at 12:11:04PM -0400, Frank Li wrote:
> On Mon, Oct 16, 2023 at 10:22:11AM -0500, Bjorn Helgaas wrote:

> > Obviously Lorenzo *could* edit all your subject lines on your behalf,
> > but it makes everybody's life easier if people look at the existing
> > code and follow the style when making changes.
> 
> Understood, but simply mark 'a' and 'A' for me. I will update the patches
> and take care next time, instead of searching the whole document to guess
> which part violated the guide. I know I made some mistakes here. But I am
> working on many different kernel subsystems; some require upper case, some
> require lower case, and some don't care.

Right, that's why I always suggest following the example of the
surrounding code and history.  English is the only language I know,
but I speculate that this typographical detail probably doesn't make
sense in languages that don't have a similar upper/lowercase
distinction.

Thanks for persevering; we'd be having a lot more trouble if I tried
to send emails in your native language ;)

Bjorn

Re: [PATCH 2/3] PCI: layerscape: add suspend/resume for ls1021a

2023-10-16 Thread Bjorn Helgaas

On Mon, Oct 16, 2023 at 10:45:25AM -0400, Frank Li wrote:
> On Tue, Oct 10, 2023 at 06:02:36PM +0200, Lorenzo Pieralisi wrote:
> > On Tue, Oct 10, 2023 at 10:20:12AM -0400, Frank Li wrote:

> > > Ping
> > 
> > Read and follow please (and then ping us):
> > https://lore.kernel.org/linux-pci/20171026223701.ga25...@bhelgaas-glaptop.roam.corp.google.com
> 
> Could you please help point out which specific item did not follow the above
> guide? Then I can update my code. I think that's an efficient way to
> communicate. I think I have read it several times, but I'm not sure which
> part violates the guide.
> 
> @Bjorn Helgaas, what do you think?

Since Lorenzo didn't point out anything specific in the patch itself,
I think he was probably referring to the subject line and this advice:

  - Follow the existing convention, i.e., run "git log --oneline
" and make yours match in format, capitalization, and
sentence structure.  For example, native host bridge driver patch
titles look like this:

  PCI: altera: Fix platform_get_irq() error handling
  PCI: vmd: Remove IRQ affinity so we can allocate more IRQs
  PCI: mediatek: Add MSI support for MT2712 and MT7622
  PCI: rockchip: Remove IRQ domain if probe fails

In this case, your subject line was:

  PCI: layerscape: add suspend/resume for ls1021a

The advice was to run this:

  $ git log --oneline drivers/pci/controller/dwc/pci-layerscape.c
  83c088148c8e PCI: Use PCI_HEADER_TYPE_* instead of literals
  9fda4d09905d PCI: layerscape: Add power management support for ls1028a
  277004d7a4a3 PCI: Remove unnecessary  includes
  60b3c27fb9b9 PCI: dwc: Rename struct pcie_port to dw_pcie_rp
  d23f0c11aca2 PCI: layerscape: Change to use the DWC common link-up check function
  7007b745a508 PCI: layerscape: Convert to builtin_platform_driver()
  60f5b73fa0f2 PCI: dwc: Remove unnecessary wrappers around dw_pcie_host_init()
  b9ac0f9dc8ea PCI: dwc: Move dw_pcie_setup_rc() to DWC common code
  f78f02638af5 PCI: dwc: Rework MSI initialization

Note that these summaries are all complete sentences that start with a
capital letter:

  Use PCI_HEADER_TYPE_* instead of literals
  Add power management support for ls1028a
  Remove unnecessary  includes
  ...

So yours could be this:

  PCI: layerscape: Add suspend/resume for ls1021a
   ^

This is trivial, obviously.  But the uppercase/lowercase distinction
carries information, and it's an unnecessary distraction to notice
that "oh, this is different from the rest; is the difference
important or should I ignore it?"
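
As a rough illustration of the capitalization check, here is a minimal
C sketch.  It assumes the summary is whatever follows the last ": " in
the subject line; subject_summary() and summary_is_capitalized() are
hypothetical helpers written for this note, not part of any kernel
tool.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return a pointer to the summary part of a subject line, i.e., the
 * text after the last "prefix: " component.  Splitting on ": " is an
 * assumption of this sketch.
 */
static const char *subject_summary(const char *subject)
{
	const char *summary = subject;
	const char *sep;

	while ((sep = strstr(summary, ": ")) != NULL)
		summary = sep + 2;

	return summary;
}

/* True if the summary starts with an uppercase letter. */
static bool summary_is_capitalized(const char *subject)
{
	const char *summary = subject_summary(subject);

	return isupper((unsigned char)summary[0]) != 0;
}
```

With this, "PCI: layerscape: add suspend/resume for ls1021a" is
rejected, and the same subject with "Add" passes.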

Obviously Lorenzo *could* edit all your subject lines on your behalf,
but it makes everybody's life easier if people look at the existing
code and follow the style when making changes.

E.g., write subject lines that are similar in style to previous ones,
name local variables similarly to other functions, use line lengths
consistent with the rest of the file, etc.  After applying a change,
the file should look like a coherent whole; we should not be able to
tell that this hunk was added later by somebody else.  This all helps
make the code (and the git history) more readable and maintainable.

Bjorn

Re: [PATCH 0/3] PCI: PCI_HEADER_TYPE bugfix & cleanups

2023-10-03 Thread Bjorn Helgaas

On Tue, Oct 03, 2023 at 03:52:57PM +0300, Ilpo Järvinen wrote:
> One bugfix and cleanups for PCI_HEADER_TYPE_* literals.
> 
> This series only covers what's within drivers/pci/. I'd have patches
> for other subsystems too but I decided to wait with them until
> PCI_HEADER_TYPE_MFD is in Linus' tree (to keep the series recipient
> count reasonable, the rest can IMO go through the subsystem specific
> trees once the define is there).
> 
> Ilpo Järvinen (3):
>   PCI: vmd: Correct PCI Header Type Register's MFD bit check
>   PCI: Add PCI_HEADER_TYPE_MFD pci_regs.h
>   PCI: Use PCI_HEADER_TYPE_* instead of literals
> 
>  drivers/pci/controller/dwc/pci-layerscape.c   |  2 +-
>  .../controller/mobiveil/pcie-mobiveil-host.c  |  2 +-
>  drivers/pci/controller/pcie-iproc.c   |  2 +-
>  drivers/pci/controller/pcie-rcar-ep.c |  2 +-
>  drivers/pci/controller/pcie-rcar-host.c   |  2 +-
>  drivers/pci/controller/vmd.c  |  5 ++---
>  drivers/pci/hotplug/cpqphp_ctrl.c |  6 ++---
>  drivers/pci/hotplug/cpqphp_pci.c  | 22 +--
>  drivers/pci/hotplug/ibmphp.h  |  5 +++--
>  drivers/pci/hotplug/ibmphp_pci.c  |  2 +-
>  drivers/pci/pci.c |  2 +-
>  drivers/pci/quirks.c  |  6 ++---
>  include/uapi/linux/pci_regs.h |  1 +
>  13 files changed, 30 insertions(+), 29 deletions(-)

Applied to pci/enumeration for v6.7, thanks!

Re: Questions: Should kernel panic when PCIe fatal error occurs?

2023-09-26 Thread Bjorn Helgaas

On Fri, Sep 22, 2023 at 10:46:36AM +0800, Shuai Xue wrote:
> ...

> Actually, this is a question from my colleague from firmware team.
> The original question is that:
> 
> "Should I set CPER_SEV_FATAL for Generic Error Status Block when a
> PCIe fatal error is detected? If set, kernel will always panic.
> Otherwise, kernel will always not panic."
> 
> So I pull a question about desired behavior of Linux kernel first :)
> From the perspective of the kernel, CPER_SEV_FATAL for Generic Error
> Status Block is not reasonable. The kernel will attempt to recover
> Fatal errors, although recovery may fail.

I don't know the semantics of CPER_SEV_FATAL or why it's there.
With CPER, we have *two* error severities: a "native" one defined by
the PCIe spec and another defined by the platform via CPER.

I speculate that the reason for the CPER severity could be to provide
a severity for error sources that don't have a "native" severity like
AER does, or for the vendor to force the OS to restart (for
CPER_SEV_FATAL, anyway) in cases where it might not otherwise.

In the native case, we only have the PCIe severity and don't have the
CPER severity at all, and I suspect that unless there's uncontained
data corruption, we would rather handle even the most severe PCIe
fatal error by disabling the specific device(s) instead of panicking
and restarting the whole machine.

So for PCIe errors, I'm not sure setting CPER_SEV_FATAL is beneficial
unless the platform wants to force the OS to panic, e.g., maybe the
platform knows about data corruption and/or the vendor wants the OS to
panic as part of a reliability story.

Presumably the platform has already logged the error, and I assume the
platform *could* restart without even returning to the OS, but maybe
it wants the OS to do a crashdump or shutdown in a more orderly way.

Bjorn

Re: Questions: Should kernel panic when PCIe fatal error occurs?

2023-09-21 Thread Bjorn Helgaas

On Thu, Sep 21, 2023 at 08:10:19PM +0800, Shuai Xue wrote:
> On 2023/9/21 07:02, Bjorn Helgaas wrote:
> > On Mon, Sep 18, 2023 at 05:39:58PM +0800, Shuai Xue wrote:
> ...

> > I guess your point is that for CPER_SEV_FATAL errors, the APEI/GHES
> > path always panics but the native path never does, and that maybe both
> > paths should work the same way?
> 
> Yes, exactly. Both OS native and APEI/GHES firmware first are notifications
> used to handle PCIe AER errors, and IMHO, they should ideally work in the
> same way.

I agree, that would be nice, but the whole point of the APEI/GHES
functionality is vendor value-add, so I'm not sure we can achieve that
ideal.

> ...
> As a result, AER driver only does recovery for non-fatal PCIe error.

This is only true for the APEI/GHES path, right?  For *native* AER
handling, we attempt recovery for both fatal and non-fatal errors.
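
The difference between the two paths, as described in this thread, can
be modeled in a few lines of C.  This is only a sketch of the behavior
being discussed, not kernel code: the enum names and
fatal_error_action() are hypothetical, and the only identifier taken
from the discussion is CPER_SEV_FATAL.

```c
#include <stdbool.h>

/* Hypothetical names for the two reporting paths discussed here. */
enum report_path { PATH_NATIVE_AER, PATH_APEI_GHES };
enum action { ACTION_RECOVER, ACTION_PANIC };

/*
 * Model of what happens on a PCIe fatal error:
 *  - APEI/GHES path with the status block marked CPER_SEV_FATAL:
 *    panic before AER recovery is attempted
 *  - everything else, including the native path: attempt recovery
 */
static enum action fatal_error_action(enum report_path path,
				      bool cper_sev_fatal)
{
	if (path == PATH_APEI_GHES && cper_sev_fatal)
		return ACTION_PANIC;

	return ACTION_RECOVER;
}
```

In this model the native path never reaches ACTION_PANIC, which is the
asymmetry the question is about.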

> > It doesn't seem like the native path should always panic.  If we can
> > tell that data was corrupted, we may want to panic, but otherwise I
> > don't think we should crash the entire system even if some device is
> > permanently broken.
> 
> Got it. But how can we tell if the data is corrupted with OS native?

I naively expect that by PCIe protocol, corrupted DLLPs or TLPs
detected by CRC, sequence number errors, etc, would be discarded
before corrupting memory, so I doubt we'd get an uncorrectable error
that means "sorry, I just corrupted your data."

But DPC is advertised as "avoiding the potential spread of any data
corruption," so there must be some mechanisms of corruption, and since
DPC is triggered by either ERR_FATAL or ERR_NONFATAL, I guess maybe
the errors could tell us something.  I'm going to quit speculating
because I obviously don't know enough about this area.

> >> However, I have changed my mind on this issue as I encountered a case where
> >> an error propagation is detected due to fatal DLLP (Data Link Protocol
> >> Error) error. A DLLP error occurred in the Compute node, causing the
> >> node to panic because `struct acpi_hest_generic_status::error_severity` was
> >> set as CPER_SEV_FATAL. However, data corruption was still detected in the
> >> storage node by CRC.
> > 
> > The only mention of Data Link Protocol Error that looks relevant is
> > PCIe r6.0, sec 3.6.2.2, which basically says a DLLP with an unexpected
> > Sequence Number should be discarded:
> > 
> >   For Ack and Nak DLLPs, the following steps are followed (see Figure
> >   3-21):
> > 
> >     - If the Sequence Number specified by the AckNak_Seq_Num does not
> >       correspond to an unacknowledged TLP, or to the value in
> >       ACKD_SEQ, the DLLP is discarded
> > 
> >       - This is a Data Link Protocol Error, which is a reported error
> >         associated with the Port (see Section 6.2).
> > 
> > So data from that DLLP should not have made it to memory, although of
> > course the DMA may not have been completed.  But it sounds like you
> > did see corrupted data written to memory?
> 
> The storage node uses RDMA to directly access the remote compute node,
> and an error was detected by CRC in the storage node. So I suspect yes.

When doing the CRC, can you distinguish between corrupted data and
data that was not written because a DMA was only partially completed?

> ...
> I tried to inject Data Link Protocol Error on some platform. The mechanism
> behind is that rootport controls the sequence number of the specific TLPs
> and ACK/NAK DLLPs. Data Link Protocol Error will be detected at the Rx side
> of ACK/NAK DLLPs.
> 
> In such case, NIC and NVMe recovered on fatal and non-fatal DLLP
> errors.

I'm guessing this error injection directly writes the AER status bit,
which would probably only test the reporting (sending an ERR_FATAL
message), AER interrupt generation, firmware or OS interrupt handling,
etc.

It probably would not actually generate a DLLP with a bad sequence
number, so it probably does not test the hardware behavior of
discarding the DLLP if the sequence number is bad.  Just my guess
though.

> ...
> My point is that how kernel could recover from non-fatal and fatal
> errors in firmware first without DPC? If CPER_SEV_FATAL is used to
> report fatal PCIe error, kernel will panic in APEI/GHES driver.

The platform decides whether to use CPER_SEV_FATAL, so we can't change
that.  We *could* change whether Linux panics when the platform says
an error is CPER_SEV_FATAL.  That happens in drivers/acpi, so it's
really up to Rafael.

Personally I would want to hear from vendors who use the APEI/GHES
path.  Poking around the web for logs that mention HEST and related
things, it looks like at least Dell, HP, and Lenovo use it.  And there
are drivers/acpi/apei commits from nxp.com, alibaba.com, amd.com,
arm.com, huawei.com, etc., so some of them probably care, too.

Bjorn

Re: Questions: Should kernel panic when PCIe fatal error occurs?

2023-09-20 Thread Bjorn Helgaas

On Mon, Sep 18, 2023 at 05:39:58PM +0800, Shuai Xue wrote:
> Hi, all folks,
> 
> Error reporting and recovery are one of the important features of PCIe, and
> the kernel has been supporting them since version 2.6, 17 years ago.
> I am very curious about the expected behavior of the software.
> I first recap the error classification and then list my questions below it.
> 
> ## Recap: Error classification
> 
> - Fatal Errors
> 
> Fatal errors are uncorrectable error conditions which render the particular
> Link and related hardware unreliable. For Fatal errors, a reset of the
> components on the Link may be required to return to reliable operation.
> Platform handling of Fatal errors, and any efforts to limit the effects of
> these errors, is platform implementation specific. (PCIe 6.0.1, sec
> 6.2.2.2.1 Fatal Errors).
> 
> - Non-Fatal Errors
> 
> Non-fatal errors are uncorrectable errors which cause a particular
> transaction to be unreliable but the Link is otherwise fully functional.
> Isolating Non-fatal from Fatal errors provides Requester/Receiver logic in
> a device or system management software the opportunity to recover from the
> error without resetting the components on the Link and disturbing other
> transactions in progress. Devices not associated with the transaction in
> error are not impacted by the error.  (PCIe 6.0.1, sec 6.2.2.2.2 Non-Fatal
> Errors).
> 
> ## What does the kernel do?
> 
> The Linux kernel supports both the OS native and firmware first modes in
> AER and DPC drivers. The error recovery API is defined in `struct
> pci_error_handlers`, and the recovery process is performed in several
> stages in pcie_do_recovery(). One main difference in handling PCIe errors
> is that the kernel only resets the link when a fatal error is detected.
> 
> ## Questions
> 
> 1. Should kernel panic when fatal errors occur without AER recovery?
> 
> IMHO, the answer is NO. The AER driver handles both fatal and
> non-fatal errors, and I have not found any panic changes in the
> recovery path in OS native mode.
> 
> As far as I know, on many X86 platforms, struct
> `acpi_hest_generic_status::error_severity` is set as CPER_SEV_FATAL
> in firmware first mode. As a result, kernel will panic immediately
> in ghes_proc() when fatal AER errors occur, and there is no chance
> to handle the error and perform recovery in AER driver.

UEFI r2.10, sec N.2.1, defines CPER_SEV_FATAL, and platform firmware
decides which Error Severity to put in the error record.  I don't see
anything in UEFI about how the OS should handle fatal errors.

ACPI r6.5, sec 18.1, says on fatal uncorrected error, the system
should be restarted to prevent propagation of the error.  For
CPER_SEV_FATAL errors, it looks like ghes_proc() panics even before
trying AER recovery.

I guess your point is that for CPER_SEV_FATAL errors, the APEI/GHES
path always panics but the native path never does, and that maybe both
paths should work the same way?

It would be nice if they worked the same, but I suspect that vendors
may rely on the fact that CPER_SEV_FATAL forces a restart/panic as
part of their system integrity story.

It doesn't seem like the native path should always panic.  If we can
tell that data was corrupted, we may want to panic, but otherwise I
don't think we should crash the entire system even if some device is
permanently broken.

> For fatal and non-fatal errors, struct
> `acpi_hest_generic_status::error_severity` should be set as
> CPER_SEV_RECOVERABLE, and struct
> `acpi_hest_generic_data::error_severity` should reflect its real
> severity. Then, the kernel is equivalent to handling PCIe errors in
> Firmware first mode as it does in OS native mode.  Please correct me
> if I am wrong.

I don't know enough to comment on how Error Severity should be used in
the Generic Error Status Block vs the Generic Error Data Entry.

> However, I have changed my mind on this issue as I encountered a case where
> an error propagation is detected due to fatal DLLP (Data Link Protocol
> Error) error. A DLLP error occurred in the Compute node, causing the
> node to panic because `struct acpi_hest_generic_status::error_severity` was
> set as CPER_SEV_FATAL. However, data corruption was still detected in the
> storage node by CRC.

The only mention of Data Link Protocol Error that looks relevant is
PCIe r6.0, sec 3.6.2.2, which basically says a DLLP with an unexpected
Sequence Number should be discarded:

  For Ack and Nak DLLPs, the following steps are followed (see Figure
  3-21):

    - If the Sequence Number specified by the AckNak_Seq_Num does not
      correspond to an unacknowledged TLP, or to the value in
      ACKD_SEQ, the DLLP is discarded

      - This is a Data Link Protocol Error, which is a reported error
        associated with the Port (see Section 6.2).

So data from that DLLP should not have made it to memory, although of
course the DMA may not have been completed.  But it sounds like you
did see corrupted data written to memory?
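
The discard rule quoted above can be approximated in C as follows.
This is a simplified model under stated assumptions, not the spec's
flowchart: it treats Sequence Numbers as 12-bit values, takes the
unacknowledged TLPs to be those numbered ACKD_SEQ + 1 through
NEXT_TRANSMIT_SEQ - 1 (modulo 4096), and uses a hypothetical helper
name, acknak_seq_is_valid().

```c
#include <stdbool.h>
#include <stdint.h>

#define SEQ_MOD 4096u	/* Sequence Numbers are 12 bits wide */

/*
 * Return true if an Ack/Nak DLLP carrying @acknak_seq refers either to
 * ACKD_SEQ itself or to a TLP that is still unacknowledged.  A false
 * result corresponds to the "DLLP is discarded" case, i.e., a Data
 * Link Protocol Error.
 */
static bool acknak_seq_is_valid(uint16_t acknak_seq, uint16_t ackd_seq,
				uint16_t next_transmit_seq)
{
	/* Number of TLPs sent but not yet acknowledged. */
	unsigned int outstanding =
		((unsigned int)next_transmit_seq - 1u - ackd_seq) % SEQ_MOD;
	/* How far the DLLP's number is ahead of the last acknowledged one. */
	unsigned int distance =
		((unsigned int)acknak_seq - ackd_seq) % SEQ_MOD;

	return distance <= outstanding;
}
```

For example, with ACKD_SEQ = 0xffe and NEXT_TRANSMIT_SEQ = 2, the
numbers 0xffe, 0xfff, 0 and 1 are accepted and anything else is
treated as an error.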

I assume it is

Re: [PATCHv3 pci-next 1/2] PCI/AER: correctable error message as KERN_INFO

2023-09-18 Thread Bjorn Helgaas

On Mon, Sep 18, 2023 at 07:42:30PM +0800, Xi Ruoyao wrote:
> ...

> My workstation suffers from too much correctable AER reporting as well
> (related to Intel's errata "RPL013: Incorrectly Formed PCIe Packets May
> Generate Correctable Errors" and/or the motherboard design, I guess).

We should rate-limit correctable error reporting so it's not
overwhelming.
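
To make "rate-limit" concrete, here is a minimal fixed-window limiter
in C.  It is a sketch in the spirit of an interval/burst limiter, not
the AER driver's code; struct ratelimit, ratelimit_allow() and the
caller-supplied millisecond clock are all assumptions made for
illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limiter state: allow @burst messages per @interval_ms. */
struct ratelimit {
	uint64_t interval_ms;	/* length of one window */
	unsigned int burst;	/* messages allowed per window */
	uint64_t window_start;	/* time the current window began */
	unsigned int printed;	/* messages emitted in this window */
	unsigned int missed;	/* messages suppressed so far */
};

/* Return true if a message may be emitted at time @now_ms. */
static bool ratelimit_allow(struct ratelimit *rl, uint64_t now_ms)
{
	/* Start a new window once the current one has expired. */
	if (now_ms - rl->window_start >= rl->interval_ms) {
		rl->window_start = now_ms;
		rl->printed = 0;
	}

	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}

	rl->missed++;
	return false;
}
```

Counting the suppressed messages keeps the total number of
correctable errors visible even when most of the log lines are
dropped.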

At the same time, I'm *also* interested in the cause of these errors,
in case there's a Linux defect or a hardware erratum that we can work
around.  Do you have a bug report with any more details, e.g., a dmesg
log and "sudo lspci -vv" output?

Bjorn

Re: [PATCH v6 2/3] PCI/AER: Disable AER interrupt on suspend

2023-08-10 Thread Bjorn Helgaas

On Thu, Aug 10, 2023 at 04:17:21PM +0800, Kai-Heng Feng wrote:
> On Thu, Aug 10, 2023 at 2:52 AM Bjorn Helgaas  wrote:
> > On Fri, Jul 21, 2023 at 11:58:24AM +0800, Kai-Heng Feng wrote:
> > > On Tue, Jul 18, 2023 at 7:17 PM Bjorn Helgaas  wrote:
> > > > On Fri, May 12, 2023 at 08:00:13AM +0800, Kai-Heng Feng wrote:
> > > > > PCIe services that share an IRQ with PME, such as AER or DPC,
> > > > > may cause a spurious wakeup on system suspend. To prevent this,
> > > > > disable the AER interrupt notification during the system suspend
> > > > > process.
> > > >
> > > > I see that in this particular BZ dmesg log, PME, AER, and DPC do share
> > > > the same IRQ, but I don't think this is true in general.
> > > >
> > > > Root Ports usually use MSI or MSI-X.  PME and hotplug events use the
> > > > Interrupt Message Number in the PCIe Capability, but AER uses the one
> > > > in the AER Root Error Status register, and DPC uses the one in the DPC
> > > > Capability register.  Those potentially correspond to three distinct
> > > > MSI/MSI-X vectors.
> > > >
> > > > I think this probably has nothing to do with the IRQ being *shared*,
> > > > but just that putting the downstream component into D3cold, where the
> > > > link state is L3, may cause the upstream component to log and signal a
> > > > link-related error as the link goes completely down.
> > >
> > > That's quite likely a better explanation than my wording.
> > > Assuming AER IRQ and PME IRQ are not shared, does system get woken up
> > > by AER IRQ?
> >
> > Rafael could answer this better than I can, but
> > Documentation/power/suspend-and-interrupts.rst says device interrupts
> > are generally disabled during suspend after the "late" phase of
> > suspending devices, i.e.,
> >
> >   dpm_suspend_noirq
> >     suspend_device_irqs   <-- disable non-wakeup IRQs
> >     dpm_noirq_suspend_devices
> >       ...
> >         pci_pm_suspend_noirq  # (I assume)
> >           pci_prepare_to_sleep
> >
> > I think the downstream component would be put in D3cold by
> > pci_prepare_to_sleep(), so non-wakeup interrupts should be disabled by
> > then.
> >
> > I assume PME would generally *not* be disabled since it's needed for
> > wakeup, so I think any interrupt that shares the PME IRQ and occurs
> > during suspend may cause a spurious wakeup.
> 
> Yes, that's the case here.
> 
> > If so, it's exactly as you said at the beginning: AER/DPC/etc sharing
> > the PME IRQ may cause spurious wakeups, and we would have to disable
> > those other interrupts at the source, e.g., by clearing
> > PCI_ERR_ROOT_CMD_FATAL_EN etc (exactly as your series does).
> 
> So is the series good to be merged now?

If we merge as-is, won't we disable AER & DPC interrupts unnecessarily
in the case where the link goes to D3hot?  In that case, there's no
reason to expect interrupts related to the link going down, but things
like PTM messages still work, and they may cause errors that we should
know about.

> > > > I don't think D0-D3hot should be relevant here because in all those
> > > > states, the link should be active because the downstream config space
> > > > remains accessible.  So I'm not sure if it's possible, but I wonder if
> > > > there's a more targeted place we could do this, e.g., in the path that
> > > > puts downstream devices in D3cold.
> > >
> > > Let me try to work on this.
> > >
> > > Kai-Heng
> > >
> > > >
> > > > > As Per PCIe Base Spec 5.0, section 5.2, titled "Link State Power Management",
> > > > > TLP and DLLP transmission are disabled for a Link in L2/L3 Ready (D3hot), L2
> > > > > (D3cold with aux power) and L3 (D3cold) states. So disabling the AER
> > > > > notification during suspend and re-enabling them during the resume process
> > > > > should not affect the basic functionality.
> > > > >
> > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295
> > > > > Reviewed-by: Mika Westerberg 
> > > > > Signed-off-by: Kai-Heng Feng 
> > > > > ---
> > > > > v6:
> > > > > v5:
> > > > >  - Wording.
> > > &

Re: [PATCH v6 2/3] PCI/AER: Disable AER interrupt on suspend

2023-08-09 Thread Bjorn Helgaas

On Fri, Jul 21, 2023 at 11:58:24AM +0800, Kai-Heng Feng wrote:
> On Tue, Jul 18, 2023 at 7:17 PM Bjorn Helgaas  wrote:
> > On Fri, May 12, 2023 at 08:00:13AM +0800, Kai-Heng Feng wrote:
> > > PCIe services that share an IRQ with PME, such as AER or DPC,
> > > may cause a spurious wakeup on system suspend. To prevent this,
> > > disable the AER interrupt notification during the system suspend
> > > process.
> >
> > I see that in this particular BZ dmesg log, PME, AER, and DPC do share
> > the same IRQ, but I don't think this is true in general.
> >
> > Root Ports usually use MSI or MSI-X.  PME and hotplug events use the
> > Interrupt Message Number in the PCIe Capability, but AER uses the one
> > in the AER Root Error Status register, and DPC uses the one in the DPC
> > Capability register.  Those potentially correspond to three distinct
> > MSI/MSI-X vectors.
> >
> > I think this probably has nothing to do with the IRQ being *shared*,
> > but just that putting the downstream component into D3cold, where the
> > link state is L3, may cause the upstream component to log and signal a
> > link-related error as the link goes completely down.
> 
> That's quite likely a better explanation than my wording.
> Assuming AER IRQ and PME IRQ are not shared, does system get woken up
> by AER IRQ?

Rafael could answer this better than I can, but
Documentation/power/suspend-and-interrupts.rst says device interrupts
are generally disabled during suspend after the "late" phase of
suspending devices, i.e.,

  dpm_suspend_noirq
    suspend_device_irqs   <-- disable non-wakeup IRQs
    dpm_noirq_suspend_devices
      ...
        pci_pm_suspend_noirq  # (I assume)
          pci_prepare_to_sleep

I think the downstream component would be put in D3cold by
pci_prepare_to_sleep(), so non-wakeup interrupts should be disabled by
then.

I assume PME would generally *not* be disabled since it's needed for
wakeup, so I think any interrupt that shares the PME IRQ and occurs
during suspend may cause a spurious wakeup.

If so, it's exactly as you said at the beginning: AER/DPC/etc sharing
the PME IRQ may cause spurious wakeups, and we would have to disable
those other interrupts at the source, e.g., by clearing
PCI_ERR_ROOT_CMD_FATAL_EN etc (exactly as your series does).
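
"Disabling at the source" amounts to clearing the three reporting
enable bits in the AER Root Error Command register.  The sketch below
shows only the bit manipulation on a register value; the bit
definitions follow include/uapi/linux/pci_regs.h, while AER_ROOT_IRQ_MASK
and the two helper functions are hypothetical names for this note and
do no config space access.

```c
#include <stdint.h>

/* Root Error Command bits, as defined in pci_regs.h. */
#define PCI_ERR_ROOT_CMD_COR_EN		0x00000001u
#define PCI_ERR_ROOT_CMD_NONFATAL_EN	0x00000002u
#define PCI_ERR_ROOT_CMD_FATAL_EN	0x00000004u

/* Hypothetical convenience mask covering all three enables. */
#define AER_ROOT_IRQ_MASK	(PCI_ERR_ROOT_CMD_COR_EN |	\
				 PCI_ERR_ROOT_CMD_NONFATAL_EN |	\
				 PCI_ERR_ROOT_CMD_FATAL_EN)

/* Value to write back on suspend: error interrupts off. */
static uint32_t root_cmd_disable_irq(uint32_t root_cmd)
{
	return root_cmd & ~AER_ROOT_IRQ_MASK;
}

/* Value to write back on resume: error interrupts on again. */
static uint32_t root_cmd_enable_irq(uint32_t root_cmd)
{
	return root_cmd | AER_ROOT_IRQ_MASK;
}
```

Both helpers leave the other bits of the register untouched, so a
read-modify-write on suspend and resume is symmetric.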

> > I don't think D0-D3hot should be relevant here because in all those
> > states, the link should be active because the downstream config space
> > remains accessible.  So I'm not sure if it's possible, but I wonder if
> > there's a more targeted place we could do this, e.g., in the path that
> > puts downstream devices in D3cold.
> 
> Let me try to work on this.
> 
> Kai-Heng
> 
> >
> > > As Per PCIe Base Spec 5.0, section 5.2, titled "Link State Power Management",
> > > TLP and DLLP transmission are disabled for a Link in L2/L3 Ready (D3hot), L2
> > > (D3cold with aux power) and L3 (D3cold) states. So disabling the AER
> > > notification during suspend and re-enabling them during the resume process
> > > should not affect the basic functionality.
> > >
> > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295
> > > Reviewed-by: Mika Westerberg 
> > > Signed-off-by: Kai-Heng Feng 
> > > ---
> > > v6:
> > > v5:
> > >  - Wording.
> > >
> > > v4:
> > > v3:
> > >  - No change.
> > >
> > > v2:
> > >  - Only disable AER IRQ.
> > >  - No more check on PME IRQ#.
> > >  - Use helper.
> > >
> > >  drivers/pci/pcie/aer.c | 22 ++
> > >  1 file changed, 22 insertions(+)
> > >
> > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > > index 1420e1f27105..9c07fdbeb52d 100644
> > > --- a/drivers/pci/pcie/aer.c
> > > +++ b/drivers/pci/pcie/aer.c
> > > @@ -1356,6 +1356,26 @@ static int aer_probe(struct pcie_device *dev)
> > >   return 0;
> > >  }
> > >
> > > +static int aer_suspend(struct pcie_device *dev)
> > > +{
> > > + struct aer_rpc *rpc = get_service_data(dev);
> > > + struct pci_dev *pdev = rpc->rpd;
> > > +
> > > + aer_disable_irq(pdev);
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static int aer_resume(struct pcie_device *dev)
> > > +{
> > > + struct aer_rpc *rpc = get_service_data(dev);
> > > + struct pci_dev *pdev = rpc->rpd;
> > > +
> > > + aer_enable_irq(pdev);
> > > +
> > > + return 0;
> > > +}
> > > +
> > >  /**
> > >   * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP
> > >   * @dev: pointer to Root Port, RCEC, or RCiEP
> > > @@ -1420,6 +1440,8 @@ static struct pcie_port_service_driver aerdriver = {
> > >   .service= PCIE_PORT_SERVICE_AER,
> > >
> > >   .probe  = aer_probe,
> > > + .suspend= aer_suspend,
> > > + .resume = aer_resume,
> > >   .remove = aer_remove,
> > >  };
> > >
> > > --
> > > 2.34.1
> > >

Re: [PATCH v2 1/2] PCI: Add pci_find_next_dvsec_capability to find next Designated VSEC

2023-08-08 Thread Bjorn Helgaas

Don't re-post just for this, but if you do repost, add "()" after the
function name in the subject line, as you did for the 2/2 patch.

On Tue, Aug 08, 2023 at 12:08:57PM +0800, Xiongfeng Wang wrote:
> Some devices may have several DVSEC (Designated Vendor-Specific Extended
> Capability) entries with the same DVSEC ID. Add
> pci_find_next_dvsec_capability() to find them all.
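
For readers who want to see the iteration pattern in isolation, here
is a self-contained C sketch that walks the extended capability list
of a mock config space held in a byte array.  It is modeled on the
logic in the patch but is not the kernel implementation: cfg_read16(),
cfg_read32(), cfg_write32(), put_dvsec(), find_next_ext_cap() and
find_next_dvsec() are hypothetical stand-ins, and the mock is assumed
to be a well-formed 4096-byte image.

```c
#include <stdint.h>

#define CFG_SPACE_SIZE		0x100	/* start of extended config space */
#define CFG_SPACE_EXP_SIZE	4096
#define EXT_CAP_ID_DVSEC	0x23
#define DVSEC_HEADER1		0x4	/* Vendor ID in bits 15:0 */
#define DVSEC_HEADER2		0x8	/* DVSEC ID in bits 15:0 */

/* Little-endian accessors for the mock config space. */
static uint16_t cfg_read16(const uint8_t *cfg, uint16_t pos)
{
	return (uint16_t)(cfg[pos] | (cfg[pos + 1] << 8));
}

static uint32_t cfg_read32(const uint8_t *cfg, uint16_t pos)
{
	return cfg_read16(cfg, pos) |
	       ((uint32_t)cfg_read16(cfg, (uint16_t)(pos + 2)) << 16);
}

static void cfg_write32(uint8_t *cfg, uint16_t pos, uint32_t val)
{
	for (int i = 0; i < 4; i++)
		cfg[pos + i] = (uint8_t)(val >> (8 * i));
}

/* Mock builder: place a DVSEC at @pos whose header points to @next. */
static void put_dvsec(uint8_t *cfg, uint16_t pos, uint16_t next,
		      uint16_t vendor, uint16_t id)
{
	cfg_write32(cfg, pos,
		    EXT_CAP_ID_DVSEC | (1u << 16) | ((uint32_t)next << 20));
	cfg_write32(cfg, (uint16_t)(pos + DVSEC_HEADER1), vendor);
	cfg_write32(cfg, (uint16_t)(pos + DVSEC_HEADER2), id);
}

/* Find the next capability @cap after @start (0 means start at 0x100). */
static uint16_t find_next_ext_cap(const uint8_t *cfg, uint16_t start, int cap)
{
	uint16_t pos = start ? start : CFG_SPACE_SIZE;
	int ttl = (CFG_SPACE_EXP_SIZE - CFG_SPACE_SIZE) / 8;
	uint32_t header = cfg_read32(cfg, pos);

	if (header == 0)
		return 0;

	while (ttl-- > 0) {
		if ((int)(header & 0xffff) == cap && pos != start)
			return pos;

		pos = (header >> 20) & 0xffc;	/* next capability offset */
		if (pos < CFG_SPACE_SIZE)
			break;

		header = cfg_read32(cfg, pos);
	}

	return 0;
}

/* Find the next DVSEC after @start matching @vendor and @dvsec. */
static uint16_t find_next_dvsec(const uint8_t *cfg, uint16_t start,
				uint16_t vendor, uint16_t dvsec)
{
	uint16_t pos = start;

	while ((pos = find_next_ext_cap(cfg, pos, EXT_CAP_ID_DVSEC))) {
		uint16_t v = cfg_read16(cfg, (uint16_t)(pos + DVSEC_HEADER1));
		uint16_t id = cfg_read16(cfg, (uint16_t)(pos + DVSEC_HEADER2));

		if (vendor == v && dvsec == id)
			return pos;
	}

	return 0;
}
```

A caller visits every matching entry by starting with pos = 0 and
looping while (pos = find_next_dvsec(cfg, pos, vendor, dvsec)) is
nonzero, which is the same shape a driver would use with the new
helper.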

Re: [PATCH 1/2] PCI: Add pci_find_next_dvsec_capability to find next designated VSEC

2023-08-07 Thread Bjorn Helgaas

[+cc David since drivers/platform/x86/intel/vsec.c does some similar
things, although it seems to iterate over all Intel DVSEC IDs at once]

In subject:

  PCI: Add pci_find_next_dvsec_capability() to find next Designated VSEC

On Mon, Aug 07, 2023 at 11:18:45AM +0800, Xiongfeng Wang wrote:
> Some devices may have several DVSEC(Designated Vendor-Specific Extended
> Capability) entries with the same DVSEC ID. Add
> pci_find_next_dvsec_capability() to find them all.

Add space between "DVSEC" and "(Designated ...)".

> Signed-off-by: Xiongfeng Wang 

Acked-by: Bjorn Helgaas 

so you can merge this along with the ocxl patch that uses it.

> ---
>  drivers/pci/pci.c   | 37 +
>  include/linux/pci.h |  2 ++
>  2 files changed, 27 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 60230da957e0..3455ca7306ae 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -749,35 +749,48 @@ u16 pci_find_vsec_capability(struct pci_dev *dev, u16 vendor, int cap)
>  EXPORT_SYMBOL_GPL(pci_find_vsec_capability);
>  
>  /**
> - * pci_find_dvsec_capability - Find DVSEC for vendor
> + * pci_find_next_dvsec_capability - Find next DVSEC for vendor
>   * @dev: PCI device to query
> + * @start: address at which to start looking (0 to start at beginning of list)

s/address/Address/ to match other parameters

>   * @vendor: Vendor ID to match for the DVSEC
>   * @dvsec: Designated Vendor-specific capability ID

There are a lot of IDs floating around here, so to better match the
spec language:

  @dvsec: Vendor-defined DVSEC ID

> - * If DVSEC has Vendor ID @vendor and DVSEC ID @dvsec return the capability
> - * offset in config space; otherwise return 0.
> + * Returns the address of the next DVSEC if the DVSEC has Vendor ID @vendor and
> + * DVSEC ID @dvsec; otherwise return 0. DVSEC can occur several times with the
> + * same DVSEC ID for some devices, and this provides a way to find them all.
>   */
> -u16 pci_find_dvsec_capability(struct pci_dev *dev, u16 vendor, u16 dvsec)
> +u16 pci_find_next_dvsec_capability(struct pci_dev *dev, u16 start, u16 vendor,
> +                                   u16 dvsec)
>  {
> - int pos;
> + u16 pos = start;
>  
> - pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DVSEC);
> - if (!pos)
> - return 0;
> -
> - while (pos) {
> + while ((pos = pci_find_next_ext_capability(dev, pos,
> +   PCI_EXT_CAP_ID_DVSEC))) {
>   u16 v, id;
>  
>   pci_read_config_word(dev, pos + PCI_DVSEC_HEADER1, &v);
>   pci_read_config_word(dev, pos + PCI_DVSEC_HEADER2, &id);
>   if (vendor == v && dvsec == id)
>   return pos;
> -
> - pos = pci_find_next_ext_capability(dev, pos, PCI_EXT_CAP_ID_DVSEC);
>   }
>  
>   return 0;
>  }
> +EXPORT_SYMBOL_GPL(pci_find_next_dvsec_capability);
> +
> +/**
> + * pci_find_dvsec_capability - Find DVSEC for vendor
> + * @dev: PCI device to query
> + * @vendor: Vendor ID to match for the DVSEC
> + * @dvsec: Designated Vendor-specific capability ID
> + *
> + * If DVSEC has Vendor ID @vendor and DVSEC ID @dvsec return the capability
> + * offset in config space; otherwise return 0.
> + */
> +u16 pci_find_dvsec_capability(struct pci_dev *dev, u16 vendor, u16 dvsec)
> +{
> + return pci_find_next_dvsec_capability(dev, 0, vendor, dvsec);
> +}
>  EXPORT_SYMBOL_GPL(pci_find_dvsec_capability);
>  
>  /**
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index c69a2cc1f412..82bb905daf72 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1168,6 +1168,8 @@ u16 pci_find_next_ext_capability(struct pci_dev *dev, u16 pos, int cap);
>  struct pci_bus *pci_find_next_bus(const struct pci_bus *from);
>  u16 pci_find_vsec_capability(struct pci_dev *dev, u16 vendor, int cap);
>  u16 pci_find_dvsec_capability(struct pci_dev *dev, u16 vendor, u16 dvsec);
> +u16 pci_find_next_dvsec_capability(struct pci_dev *dev, u16 start, u16 vendor,
> +                                   u16 dvsec);
>  
>  u64 pci_get_dsn(struct pci_dev *dev);
>  
> -- 
> 2.20.1
>

Re: [PATCH v7 2/2] PCI: rpaphp: Error out on busy status from get-sensor-state

2023-08-01 Thread Bjorn Helgaas

/eeh_driver.c, not the PCI core aer.c and err.c?

> Current implementation uses rtas_get_sensor() API which blocks the slot
> check state until RTAS call returns success. To avoid this, fix the PCI
> hotplug driver (rpaphp) to return an error (-EBUSY) if the slot presence
> state can not be detected immediately while PE is in EEH recovery state.
> Change rpaphp_get_sensor_state() to invoke rtas_call(get-sensor-state)
> directly only if the respective PE is in EEH recovery state, and take
> actions based on RTAS return status. This way EEH handler will not be
> blocked on rpaphp_get_sensor_state() and can immediately notify driver
> about the PCI error and stop any active operations.
> 
> In normal cases (non-EEH case) rpaphp_get_sensor_state() will continue to
> invoke rtas_get_sensor() as it was earlier with no change in existing
> behavior.
> 
> Signed-off-by: Mahesh Salgaonkar 
> Reviewed-by: Nathan Lynch 

Seems like maybe both patches could go via a ppc tree since they seem
very ppc-specific?  A couple minor comments below.

Acked-by: Bjorn Helgaas 

> + * get_adapter_status() can be called by the EEH handler during EEH recovery.
> + * On certain PHB failures, the RTAS call get-seHsor-state() returns extended

Looks like a typo in "get-seHsor-state"?

> +static int __rpaphp_get_sensor_state(struct slot *slot, int *state)
> +{
> +#ifdef CONFIG_EEH

Is this #ifdef redundant?  It looks like this file is only compiled
if CONFIG_EEH is set:

  config HOTPLUG_PCI_RPA
          tristate "RPA PCI Hotplug driver"
          depends on PPC_PSERIES && EEH

  obj-$(CONFIG_HOTPLUG_PCI_RPA)   += rpaphp.o

  rpaphp-objs :=  rpaphp_core.o   \
                  rpaphp_pci.o    \
                  rpaphp_slot.o

> + int rc;
> + int token = rtas_token("get-sensor-state");
> + struct pci_dn *pdn;
> + struct eeh_pe *pe;
> + struct pci_controller *phb = PCI_DN(slot->dn)->phb;
> +
> + if (token == RTAS_UNKNOWN_SERVICE)
> + return -ENOENT;
> +
> + /*
> +  * Fallback to existing method for empty slot or PE isn't in EEH
> +  * recovery.
> +  */
> + pdn = list_first_entry_or_null(&PCI_DN(phb->dn)->child_list,
> + struct pci_dn, list);
> + if (!pdn)
> + goto fallback;
> +
> + pe = eeh_dev_to_pe(pdn->edev);
> + if (pe && (pe->state & EEH_PE_RECOVERING)) {
> + rc = rtas_call(token, 2, 2, state, DR_ENTITY_SENSE,
> +slot->index);
> + return rtas_get_sensor_errno(rc);
> + }
> +fallback:
> +#endif
> + return rtas_get_sensor(DR_ENTITY_SENSE, slot->index, state);
> +}
> +
>  int rpaphp_get_sensor_state(struct slot *slot, int *state)
>  {
>   int rc;
>   int setlevel;
>  
> - rc = rtas_get_sensor(DR_ENTITY_SENSE, slot->index, state);
> + rc = __rpaphp_get_sensor_state(slot, state);
>  
>   if (rc < 0) {
>   if (rc == -EFAULT || rc == -EEXIST) {
> @@ -40,8 +117,7 @@ int rpaphp_get_sensor_state(struct slot *slot, int *state)
>   dbg("%s: power on slot[%s] failed rc=%d.\n",
>   __func__, slot->name, rc);
>   } else {
> - rc = rtas_get_sensor(DR_ENTITY_SENSE,
> -  slot->index, state);
> + rc = __rpaphp_get_sensor_state(slot, state);
>   }
>   } else if (rc == -ENODEV)
>   info("%s: slot is unusable\n", __func__);
> 
>

Re: [PATCH v6 2/3] PCI/AER: Disable AER interrupt on suspend

2023-07-18 Thread Bjorn Helgaas

[+cc Rafael]

On Fri, May 12, 2023 at 08:00:13AM +0800, Kai-Heng Feng wrote:
> PCIe services that share an IRQ with PME, such as AER or DPC, may cause a
> spurious wakeup on system suspend. To prevent this, disable the AER interrupt
> notification during the system suspend process.

I see that in this particular BZ dmesg log, PME, AER, and DPC do share
the same IRQ, but I don't think this is true in general.

Root Ports usually use MSI or MSI-X.  PME and hotplug events use the
Interrupt Message Number in the PCIe Capability, but AER uses the one
in the AER Root Error Status register, and DPC uses the one in the DPC
Capability register.  Those potentially correspond to three distinct
MSI/MSI-X vectors.
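
To make the register layout concrete, here is a minimal userspace sketch
(not kernel code) that decodes the three Interrupt Message Number fields.
The mask names and values are the ones I believe are defined in
include/uapi/linux/pci_regs.h; struct port_vectors, decode_vectors() and
aer_shares_vector_with_pme() are hypothetical helpers invented purely for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Field masks, named as in include/uapi/linux/pci_regs.h (assumed values). */
#define PCI_EXP_FLAGS_IRQ    0x3e00u     /* PCIe Capabilities reg, bits 13:9 */
#define PCI_ERR_ROOT_AER_IRQ 0xf8000000u /* AER Root Error Status, bits 31:27 */
#define PCI_EXP_DPC_IRQ      0x001fu     /* DPC Capability reg, bits 4:0 */

/* Hypothetical container for the decoded message numbers. */
struct port_vectors {
	unsigned int pme_hp; /* used by PME and hotplug */
	unsigned int aer;    /* used by AER */
	unsigned int dpc;    /* used by DPC */
};

/* Extract each service's Interrupt Message Number from raw register values. */
static struct port_vectors decode_vectors(uint16_t pcie_flags,
					  uint32_t root_err_status,
					  uint16_t dpc_cap)
{
	struct port_vectors v = {
		.pme_hp = (pcie_flags & PCI_EXP_FLAGS_IRQ) >> 9,
		.aer    = (root_err_status & PCI_ERR_ROOT_AER_IRQ) >> 27,
		.dpc    = dpc_cap & PCI_EXP_DPC_IRQ,
	};

	return v;
}

/* True only when AER and PME/hotplug were assigned the same message number. */
static bool aer_shares_vector_with_pme(const struct port_vectors *v)
{
	assert(v != NULL);
	return v->aer == v->pme_hp;
}
```

With MSI or MSI-X in use, a port whose three fields decode to different
values would deliver PME, AER, and DPC on different vectors, so sharing is
a property of one particular platform rather than of the architecture.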

I think this probably has nothing to do with the IRQ being *shared*,
but just that putting the downstream component into D3cold, where the
link state is L3, may cause the upstream component to log and signal a
link-related error as the link goes completely down.

I don't think D0-D3hot should be relevant here because in all those
states, the link should be active because the downstream config space
remains accessible.  So I'm not sure if it's possible, but I wonder if
there's a more targeted place we could do this, e.g., in the path that
puts downstream devices in D3cold.

> As Per PCIe Base Spec 5.0, section 5.2, titled "Link State Power Management",
> TLP and DLLP transmission are disabled for a Link in L2/L3 Ready (D3hot), L2
> (D3cold with aux power) and L3 (D3cold) states. So disabling the AER
> notification during suspend and re-enabling them during the resume process
> should not affect the basic functionality.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295
> Reviewed-by: Mika Westerberg 
> Signed-off-by: Kai-Heng Feng 
> ---
> v6:
> v5:
>  - Wording.
> 
> v4:
> v3:
>  - No change.
> 
> v2:
>  - Only disable AER IRQ.
>  - No more check on PME IRQ#.
>  - Use helper.
> 
>  drivers/pci/pcie/aer.c | 22 ++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 1420e1f27105..9c07fdbeb52d 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1356,6 +1356,26 @@ static int aer_probe(struct pcie_device *dev)
>   return 0;
>  }
>  
> +static int aer_suspend(struct pcie_device *dev)
> +{
> + struct aer_rpc *rpc = get_service_data(dev);
> + struct pci_dev *pdev = rpc->rpd;
> +
> + aer_disable_irq(pdev);
> +
> + return 0;
> +}
> +
> +static int aer_resume(struct pcie_device *dev)
> +{
> + struct aer_rpc *rpc = get_service_data(dev);
> + struct pci_dev *pdev = rpc->rpd;
> +
> + aer_enable_irq(pdev);
> +
> + return 0;
> +}
> +
>  /**
>   * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP
>   * @dev: pointer to Root Port, RCEC, or RCiEP
> @@ -1420,6 +1440,8 @@ static struct pcie_port_service_driver aerdriver = {
>   .service= PCIE_PORT_SERVICE_AER,
>  
>   .probe  = aer_probe,
> + .suspend= aer_suspend,
> + .resume = aer_resume,
>   .remove = aer_remove,
>  };
>  
> -- 
> 2.34.1
>

Re: [PATCH 0/2] PCI/AER: Remove/unexport error reporting enable/disable

2023-07-13 Thread Bjorn Helgaas

On Mon, Jul 10, 2023 at 06:21:34PM -0500, Bjorn Helgaas wrote:
> From: Bjorn Helgaas 
> 
> pci_disable_pcie_error_reporting() is unused; remove it.
> pci_enable_pcie_error_reporting() is used only inside aer.c; make it
> static.
> 
> Bjorn Helgaas (2):
>   PCI/AER: Drop unused pci_disable_pcie_error_reporting()
>   PCI/AER: Unexport pci_enable_pcie_error_reporting()
> 
>  drivers/pci/pcie/aer.c | 15 +--
>  include/linux/aer.h| 11 ---
>  2 files changed, 1 insertion(+), 25 deletions(-)

Applied to pci/aer for v6.6, thanks Christoph and Sathy!

[PATCH 2/2] PCI/AER: Unexport pci_enable_pcie_error_reporting()

2023-07-10 Thread Bjorn Helgaas

From: Bjorn Helgaas 

pci_enable_pcie_error_reporting() is used only inside aer.c.  Stop exposing
it outside the file.

Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/pcie/aer.c | 3 +--
 include/linux/aer.h| 6 --
 2 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index d4c948b7c449..645149608054 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -230,7 +230,7 @@ int pcie_aer_is_native(struct pci_dev *dev)
return pcie_ports_native || host->native_aer;
 }
 
-int pci_enable_pcie_error_reporting(struct pci_dev *dev)
+static int pci_enable_pcie_error_reporting(struct pci_dev *dev)
 {
int rc;
 
@@ -240,7 +240,6 @@ int pci_enable_pcie_error_reporting(struct pci_dev *dev)
rc = pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_AER_FLAGS);
return pcibios_err_to_errno(rc);
 }
-EXPORT_SYMBOL_GPL(pci_enable_pcie_error_reporting);
 
 int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
 {
diff --git a/include/linux/aer.h b/include/linux/aer.h
index aadc9242cb20..2dd175f5debd 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -41,14 +41,8 @@ struct aer_capability_regs {
 };
 
 #if defined(CONFIG_PCIEAER)
-/* PCIe port driver needs this function to enable AER */
-int pci_enable_pcie_error_reporting(struct pci_dev *dev);
 int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
 #else
-static inline int pci_enable_pcie_error_reporting(struct pci_dev *dev)
-{
-   return -EINVAL;
-}
 static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
 {
return -EINVAL;
-- 
2.34.1

[PATCH 1/2] PCI/AER: Drop unused pci_disable_pcie_error_reporting()

2023-07-10 Thread Bjorn Helgaas

From: Bjorn Helgaas 

pci_disable_pcie_error_reporting() has no callers.  Remove it.

Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/pcie/aer.c | 12 
 include/linux/aer.h|  5 -
 2 files changed, 17 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index f6c24ded134c..d4c948b7c449 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -242,18 +242,6 @@ int pci_enable_pcie_error_reporting(struct pci_dev *dev)
 }
 EXPORT_SYMBOL_GPL(pci_enable_pcie_error_reporting);
 
-int pci_disable_pcie_error_reporting(struct pci_dev *dev)
-{
-   int rc;
-
-   if (!pcie_aer_is_native(dev))
-   return -EIO;
-
-   rc = pcie_capability_clear_word(dev, PCI_EXP_DEVCTL, PCI_EXP_AER_FLAGS);
-   return pcibios_err_to_errno(rc);
-}
-EXPORT_SYMBOL_GPL(pci_disable_pcie_error_reporting);
-
 int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
 {
int aer = dev->aer_cap;
diff --git a/include/linux/aer.h b/include/linux/aer.h
index 3a3ab05e13fd..aadc9242cb20 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -43,17 +43,12 @@ struct aer_capability_regs {
 #if defined(CONFIG_PCIEAER)
 /* PCIe port driver needs this function to enable AER */
 int pci_enable_pcie_error_reporting(struct pci_dev *dev);
-int pci_disable_pcie_error_reporting(struct pci_dev *dev);
 int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
 #else
 static inline int pci_enable_pcie_error_reporting(struct pci_dev *dev)
 {
return -EINVAL;
 }
-static inline int pci_disable_pcie_error_reporting(struct pci_dev *dev)
-{
-   return -EINVAL;
-}
 static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
 {
return -EINVAL;
-- 
2.34.1

[PATCH 0/2] PCI/AER: Remove/unexport error reporting enable/disable

2023-07-10 Thread Bjorn Helgaas

From: Bjorn Helgaas 

pci_disable_pcie_error_reporting() is unused; remove it.
pci_enable_pcie_error_reporting() is used only inside aer.c; make it
static.

Bjorn Helgaas (2):
  PCI/AER: Drop unused pci_disable_pcie_error_reporting()
  PCI/AER: Unexport pci_enable_pcie_error_reporting()

 drivers/pci/pcie/aer.c | 15 +--
 include/linux/aer.h| 11 ---
 2 files changed, 1 insertion(+), 25 deletions(-)

-- 
2.34.1

Re: [PATCH v9 00/14] pci: Work around ASMedia ASM2824 PCIe link training failures

2023-06-16 Thread Bjorn Helgaas

On Fri, Jun 16, 2023 at 01:27:52PM +0100, Maciej W. Rozycki wrote:
> On Thu, 15 Jun 2023, Bjorn Helgaas wrote:

>  As per my earlier remark:
> 
> > I think making a system halfway-fixed would make little sense, but with
> > the actual fix actually made last as you suggested I think this can be
> > split off, because it'll make no functional change by itself.
> 
> I am not perfectly happy with your rearrangement to fold the !PCI_QUIRKS 
> stub into the change carrying the actual workaround and then have the 
> reset path update with a follow-up change only, but I won't fight over it.  
> It's only one tree revision that will be in this halfway-fixed state and 
> I'll trust your judgement here.

Thanks for raising this.  Here's my thought process:

  12 PCI: Provide stub failed link recovery for device probing and hot plug
  13 PCI: Add failed link recovery for device reset events
  14 PCI: Work around PCIe link training failures

Patch 12 [1] adds calls to pcie_failed_link_retrain(), which does
nothing and returns false.  Functionally, it's a no-op, but the
structure is important later.

Patch 13 [2] claims to request failed link recovery after resets, but
actually doesn't do anything yet because pcie_failed_link_retrain() is
still a no-op, so this was a bit confusing.

Patch 14 [3] implements pcie_failed_link_retrain(), so the recovery
mentioned in 12 and 13 actually happens.  But this patch doesn't add
the call to pcie_failed_link_retrain(), so it's a little bit hard to
connect the dots.

I agree that as I rearranged it, the workaround doesn't apply in all
cases simultaneously.  Maybe not ideal, but maybe not terrible either.
Looking at it again, maybe it would have made more sense to move the
pcie_wait_for_link_delay() change to the last patch along with the
pci_dev_wait() change.  I dunno.

Bjorn

[1] 12 
https://lore.kernel.org/r/alpine.deb.2.21.2306111619570.64...@angie.orcam.me.uk
[2] 13 
https://lore.kernel.org/r/alpine.deb.2.21.2306111631050.64...@angie.orcam.me.uk
[3] 14 
https://lore.kernel.org/r/alpine.deb.2.21.2305310038540.59...@angie.orcam.me.uk

Re: [PATCH v9 00/14] pci: Work around ASMedia ASM2824 PCIe link training failures

2023-06-15 Thread Bjorn Helgaas

On Thu, Jun 15, 2023 at 01:41:10AM +0100, Maciej W. Rozycki wrote:
> On Wed, 14 Jun 2023, Bjorn Helgaas wrote:
> 
> > >  This is v9 of the change to work around a PCIe link training phenomenon 
> > > where a pair of devices both capable of operating at a link speed above 
> > > 2.5GT/s seems unable to negotiate the link speed and continues training 
> > > indefinitely with the Link Training bit switching on and off repeatedly 
> > > and the data link layer never reaching the active state.
> > > 
> > >  With several requests addressed and a few extra issues spotted this
> > > version has now grown to 14 patches.  It has been verified for device 
> > > enumeration with and without PCI_QUIRKS enabled, using the same piece of 
> > > RISC-V hardware as previously.  Hot plug or reset events have not been 
> > > verified, as this is difficult if at all feasible with hardware in 
> > > question.

> >  static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
> >  {
> > -   bool retrain = true;
> > int delay = 1;
> > +   bool retrain = false;
> > +   struct pci_dev *bridge;
> > +
> > +   if (pci_is_pcie(dev)) {
> > +   retrain = true;
> > +   bridge = pci_upstream_bridge(dev);
> > +   }
> 
>  If doing it this way, which I actually like, I think it would be a little 
> bit better performance- and style-wise if this was written as:
> 
>   if (pci_is_pcie(dev)) {
>   bridge = pci_upstream_bridge(dev);
>   retrain = !!bridge;
>   }
> 
> (or "retrain = bridge != NULL" if you prefer this style), and then we 
> don't have to repeatedly check two variables iff (pcie && !bridge) in the 
> loop below:

Done, thanks, I do like that better.  I did:

  bridge = pci_upstream_bridge(dev);
  if (bridge)
retrain = true;

because it seems like it flows more naturally when reading.
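
For anyone following along without the tree handy, the resulting control
flow can be sketched in userspace roughly as below.  This is only a model
under stated assumptions: struct mock_dev, mock_dev_ready(),
mock_failed_link_retrain() and RESET_WAIT_MS are hypothetical stand-ins
for the pci_dev, the config read, pcie_failed_link_retrain() and
PCI_RESET_WAIT, and the sleep is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RESET_WAIT_MS 1000 /* stand-in for PCI_RESET_WAIT */

/* Hypothetical mock of a device and its upstream bridge. */
struct mock_dev {
	struct mock_dev *bridge; /* NULL when there is no upstream bridge */
	int ready_after_polls;   /* polls that fail before the device answers */
	int polls;               /* polls performed so far */
	int retrain_calls;       /* bookkeeping, meaningful on a bridge */
	bool retrain_succeeds;   /* what the mock retrain reports */
};

static bool mock_dev_ready(struct mock_dev *dev)
{
	return ++dev->polls > dev->ready_after_polls;
}

static bool mock_failed_link_retrain(struct mock_dev *bridge)
{
	assert(bridge != NULL);
	bridge->retrain_calls++;
	return bridge->retrain_succeeds;
}

/* Returns 0 once the device responds, -1 on timeout. */
static int mock_dev_wait(struct mock_dev *dev, int timeout_ms)
{
	struct mock_dev *bridge = dev->bridge;
	bool retrain = false;
	int delay = 1;

	/* Only one variable needs checking in the loop below. */
	if (bridge)
		retrain = true;

	while (!mock_dev_ready(dev)) {
		if (delay > timeout_ms)
			return -1;

		if (delay > RESET_WAIT_MS && retrain) {
			retrain = false; /* attempt the recovery at most once */
			if (mock_failed_link_retrain(bridge)) {
				delay = 1; /* start the backoff over */
				continue;
			}
		}

		delay *= 2; /* the real code sleeps for 'delay' ms first */
	}

	return 0;
}
```

The point of folding the bridge check into "retrain" is that a device
without an upstream bridge never reaches the retrain call at all.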

Bjorn

Re: [PATCH v9 00/14] pci: Work around ASMedia ASM2824 PCIe link training failures

2023-06-14 Thread Bjorn Helgaas

On Sun, Jun 11, 2023 at 06:19:08PM +0100, Maciej W. Rozycki wrote:
> Hi,
> 
>  This is v9 of the change to work around a PCIe link training phenomenon 
> where a pair of devices both capable of operating at a link speed above 
> 2.5GT/s seems unable to negotiate the link speed and continues training 
> indefinitely with the Link Training bit switching on and off repeatedly 
> and the data link layer never reaching the active state.
> 
>  With several requests addressed and a few extra issues spotted this
> version has now grown to 14 patches.  It has been verified for device 
> enumeration with and without PCI_QUIRKS enabled, using the same piece of 
> RISC-V hardware as previously.  Hot plug or reset events have not been 
> verified, as this is difficult if at all feasible with hardware in 
> question.
> 
>  Last iteration: 
> ,
>  
> and my input to it:
> .

Thanks, I applied these to pci/enumeration for v6.5.

I tweaked a few things, so double-check to be sure I didn't break
something:

  - Moved dev->link_active_reporting init to set_pcie_port_type()
because it does other PCIe-related stuff.

  - Reordered to keep all the link_active_reporting things together.

  - Reordered to clean up & factor pcie_retrain_link() before exposing
it to the rest of the PCI core.

  - Moved pcie_retrain_link() a little earlier to keep it next to
pcie_wait_for_link_status().

  - Squashed the stubs into the actual quirk so we don't have the
intermediate state where we call the stubs but they never do
anything (let me know if there's a reason we need your order).

  - Inline pcie_parent_link_retrain(), which seemed like it didn't add
enough to be worthwhile.

Interdiff below:

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 80694e2574b8..f11268924c8f 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1153,27 +1153,16 @@ void pci_resume_bus(struct pci_bus *bus)
pci_walk_bus(bus, pci_resume_one, NULL);
 }
 
-/**
- * pcie_parent_link_retrain - Check and retrain link we are downstream from
- * @dev: PCI device to handle.
- *
- * Return TRUE if the link was retrained, FALSE otherwise.
- */
-static bool pcie_parent_link_retrain(struct pci_dev *dev)
-{
-   struct pci_dev *bridge;
-
-   bridge = pci_upstream_bridge(dev);
-   if (bridge)
-   return pcie_failed_link_retrain(bridge);
-   else
-   return false;
-}
-
 static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
 {
-   bool retrain = true;
int delay = 1;
+   bool retrain = false;
+   struct pci_dev *bridge;
+
+   if (pci_is_pcie(dev)) {
+   retrain = true;
+   bridge = pci_upstream_bridge(dev);
+   }
 
/*
 * After reset, the device should not silently discard config
@@ -1201,9 +1190,9 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
}
 
if (delay > PCI_RESET_WAIT) {
-   if (retrain) {
+   if (retrain && bridge) {
retrain = false;
-   if (pcie_parent_link_retrain(dev)) {
+   if (pcie_failed_link_retrain(bridge)) {
delay = 1;
continue;
}
@@ -4914,6 +4903,38 @@ static bool pcie_wait_for_link_status(struct pci_dev *pdev,
return (lnksta & lnksta_mask) == lnksta_match;
 }
 
+/**
+ * pcie_retrain_link - Request a link retrain and wait for it to complete
+ * @pdev: Device whose link to retrain.
+ * @use_lt: Use the LT bit if TRUE, or the DLLLA bit if FALSE, for status.
+ *
+ * Retrain completion status is retrieved from the Link Status Register
+ * according to @use_lt.  It is not verified whether the use of the DLLLA
+ * bit is valid.
+ *
+ * Return TRUE if successful, or FALSE if training has not completed
+ * within PCIE_LINK_RETRAIN_TIMEOUT_MS milliseconds.
+ */
+bool pcie_retrain_link(struct pci_dev *pdev, bool use_lt)
+{
+   u16 lnkctl;
+
+   pcie_capability_read_word(pdev, PCI_EXP_LNKCTL, &lnkctl);
+   lnkctl |= PCI_EXP_LNKCTL_RL;
+   pcie_capability_write_word(pdev, PCI_EXP_LNKCTL, lnkctl);
+   if (pdev->clear_retrain_link) {
+   /*
+* Due to an erratum in some devices the Retrain Link bit
+* needs to be cleared again manually to allow the link
+* training to succeed.
+*/
+   lnkctl &= ~PCI_EXP_LNKCTL_RL;
+   pcie_capability_write_word(pdev, PCI_EXP_LNKCTL, lnkctl);
+   }
+
+   return pcie_wait_for_link_status(pdev, use_lt, !use_lt);
+}
+
 /**
  * pcie_wait_for_link_delay - Wait until link is active

Re: [PATCH v8 0/7] Add pci_dev_for_each_resource() helper and update users

2023-05-30 Thread Bjorn Helgaas

On Fri, May 12, 2023 at 02:48:51PM -0500, Bjorn Helgaas wrote:
> On Fri, May 12, 2023 at 01:56:29PM +0300, Andy Shevchenko wrote:
> > On Tue, May 09, 2023 at 01:21:22PM -0500, Bjorn Helgaas wrote:
> > > On Tue, Apr 04, 2023 at 11:11:01AM -0500, Bjorn Helgaas wrote:
> > > > On Thu, Mar 30, 2023 at 07:24:27PM +0300, Andy Shevchenko wrote:
> > > > > Provide two new helper macros to iterate over PCI device resources and
> > > > > convert users.
> > > 
> > > > Applied 2-7 to pci/resource for v6.4, thanks, I really like this!
> > > 
> > > This is 09cc90063240 ("PCI: Introduce pci_dev_for_each_resource()")
> > > upstream now.
> > > 
> > > Coverity complains about each use,
> > 
> > It needs more clarification here. Use of reduced variant of the
> > macro or all of them? If the former one, then I can speculate that
> > Coverity (famous for false positives) simply doesn't understand `for
> > (type var; var ...)` code.
> 
> True, Coverity finds false positives.  It flagged every use in
> drivers/pci and drivers/pnp.  It didn't mention the arch/alpha, arm,
> mips, powerpc, sh, or sparc uses, but I think it just didn't look at
> those.
> 
> It flagged both:
> 
>   pbus_size_io    pci_dev_for_each_resource(dev, r)
>   pbus_size_mem   pci_dev_for_each_resource(dev, r, i)
> 
> Here's a spreadsheet with a few more details (unfortunately I don't
> know how to make it dump the actual line numbers or analysis like I
> pasted below, so "pci_dev_for_each_resource" doesn't appear).  These
> are mostly in the "Drivers-PCI" component.
> 
> https://docs.google.com/spreadsheets/d/1ohOJwxqXXoDUA0gwopgk-z-6ArLvhN7AZn4mIlDkHhQ/edit?usp=sharing
> 
> These particular reports are in the "High Impact Outstanding" tab.

Where are we at?  Are we going to ignore this because some Coverity
reports are false positives?

Bjorn

Re: [PATCH v4 22/23] PCI/AER: Forward RCH downstream port-detected errors to the CXL.mem dev handler

2023-05-25 Thread Bjorn Helgaas

On Thu, May 25, 2023 at 11:29:58PM +0200, Robert Richter wrote:
> On 24.05.23 16:32:35, Bjorn Helgaas wrote:
> > On Tue, May 23, 2023 at 06:22:13PM -0500, Terry Bowman wrote:
> > > From: Robert Richter 
> > > 
> > > In Restricted CXL Device (RCD) mode a CXL device is exposed as an
> > > RCiEP, but CXL downstream and upstream ports are not enumerated and
> > > not visible in the PCIe hierarchy. Protocol and link errors are sent
> > > to an RCEC.
> > >
> > > Restricted CXL host (RCH) downstream port-detected errors are signaled
> > > as internal AER errors, either Uncorrectable Internal Error (UIE) or
> > > Corrected Internal Errors (CIE). 
> > 
> > From the parallelism with RCD above, I first thought that RCH devices
> > were non-RCD mode and *were* enumerated as part of the PCIe hierarchy,
> > but actually I suspect it's more like the following?
> > 
> >   ... but CXL downstream and upstream ports are not enumerated and not
> >   visible in the PCIe hierarchy.
> > 
> >   Protocol and link errors from these non-enumerated ports are
> >   signaled as internal AER errors ... via a CXL RCEC.
> 
> Exactly, except the RCEC is standard PCIe and also must not
> necessarily on the same PCI bus as the CXL RCiEPs are.

So make it "RCEC" instead of "CXL RCEC", I guess?  PCIe r6.0, sec
7.9.10.3, allows an RCEC to be associated with RCiEPs on different
buses, so nothing to see there.

> > > The error source is the id of the RCEC.
> > 
> > This seems odd; I assume this refers to the RCEC's AER Error Source
> > Identification register, and the ERR_COR or ERR_FATAL/NONFATAL Source
> > Identification would ordinarily be the Requester ID of the RCiEP that
> > "sent" the Error Message.  But you're saying it's actually the ID of
> > the *RCEC*, not the RCiEP?
> 
> Right, the downstream port has its own AER ext capability in
> non-config (io mapped) RCRB register range. Errors originating from
> there are signaled as internal AER errors via the RCEC *with* the
> RCEC's Requester ID. Code walks through all associated CXL endpoints,
> determines the dport and checks its AER.
> 
> There is also an RDPAS structure defined in CXL but that is only a
> different way to provide the RCEC to dport association instead of
> using the RCEC's Endpoint Association Extended Capability. In the end
> we get all associated RCHs and check the AER of all their dports.
> 
> The upstream port is signaled using the RCiEP's AER. CXL spec is
> strict here: "Upstream Port RCRB shall not implement the AER Extended
> Capability." The RCiEP's requestor ID is used then and its config
> space the AER is in.
> 
> CXL.cachemem errors are reported with the RCiEP as requester
> too. Status is in the CXL RAS cap and the UIE or CIE is set
> respectively in the AER status of the RCiEP.
>
> > We're going to call pci_aer_handle_error() as well, to handle the
> > non-internal errors, and I'm pretty sure that path expects the RCiEP
> > ID there.
> > 
> > Whatever the answer, I'm not sure this sentence is actually relevant
> > to this patch, since this patch doesn't read PCI_ERR_ROOT_ERR_SRC or
> > look at struct aer_err_source.id.
> 
> The source id is used in aer_process_err_devices() which finally calls
> handle_error_source() for the device with the requestor id. This is
> the place where cxl_rch_handle_error() checks if it is an RCEC that
> received an internal error and has cxl devices connected to it. Then,
> the request is forwarded to the cxl_mem handler which also needs to
> check the dport now. That is, pcie_walk_rcec() in
> cxl_rch_handle_error() is called with the RCEC's pci handle,
> cxl_rch_handle_error_iter() with the RCiEP's pci handle.

I'm still not sure this is relevant.  Isn't that last sentence just
the way we always use pcie_walk_rcec()?

If there's something *different* here about CXL, and it's important to
this patch, sure.  But I don't see that yet.  Maybe a comment in the
code if you think it's important to clarify something there.

Bjorn

Re: [PATCH v4 22/23] PCI/AER: Forward RCH downstream port-detected errors to the CXL.mem dev handler

2023-05-24 Thread Bjorn Helgaas

On Tue, May 23, 2023 at 06:22:13PM -0500, Terry Bowman wrote:
> From: Robert Richter 
> 
> In Restricted CXL Device (RCD) mode a CXL device is exposed as an
> RCiEP, but CXL downstream and upstream ports are not enumerated and
> not visible in the PCIe hierarchy. Protocol and link errors are sent
> to an RCEC.
>
> Restricted CXL host (RCH) downstream port-detected errors are signaled
> as internal AER errors, either Uncorrectable Internal Error (UIE) or
> Corrected Internal Errors (CIE). 

From the parallelism with RCD above, I first thought that RCH devices
were non-RCD mode and *were* enumerated as part of the PCIe hierarchy,
but actually I suspect it's more like the following?

  ... but CXL downstream and upstream ports are not enumerated and not
  visible in the PCIe hierarchy.

  Protocol and link errors from these non-enumerated ports are
  signaled as internal AER errors ... via a CXL RCEC.

> The error source is the id of the RCEC.

This seems odd; I assume this refers to the RCEC's AER Error Source
Identification register, and the ERR_COR or ERR_FATAL/NONFATAL Source
Identification would ordinarily be the Requester ID of the RCiEP that
"sent" the Error Message.  But you're saying it's actually the ID of
the *RCEC*, not the RCiEP?

We're going to call pci_aer_handle_error() as well, to handle the
non-internal errors, and I'm pretty sure that path expects the RCiEP
ID there.

Whatever the answer, I'm not sure this sentence is actually relevant
to this patch, since this patch doesn't read PCI_ERR_ROOT_ERR_SRC or
look at struct aer_err_source.id.

> A CXL handler must then inspect the error status in various CXL
> registers residing in the dport's component register space (CXL RAS
> capability) or the dport's RCRB (PCIe AER extended capability). [1]
> 
> Errors showing up in the RCEC's error handler must be handled and
> connected to the CXL subsystem. Implement this by forwarding the error
> to all CXL devices below the RCEC. Since the entire CXL device is
> controlled only using PCIe Configuration Space of device 0, function
> 0, only pass it there [2]. The error handling is limited to currently
> supported devices with the Memory Device class code set
> (PCI_CLASS_MEMORY_CXL, 502h), where the handler can be implemented in
> the existing cxl_pci driver. Support of CXL devices (e.g. a CXL.cache
> device) can be enabled later.

I assume the Memory Devices are CXL devices, so maybe "Error handling
for *other* CXL devices ... can be enabled later"?  

IIUC, this happens via cxl_rch_handle_error_iter() calling
pci_error_handlers for CXL RCiEPs.  Maybe the is_cxl_mem_dev() check
belongs inside those handlers, since that driver claimed the RCiEP and
should know its functionality?  Maybe is_internal_error() and
cxl_error_is_native(), too?
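
As a rough illustration of what such a check amounts to, here is a
userspace sketch built only from the two conditions the commit log
states (device 0, function 0, and the 502h Memory Device class code).
struct mock_pci_dev is a hypothetical stand-in for struct pci_dev, and
the body is my guess at the shape of is_cxl_mem_dev(), not the patch's
actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_CLASS_MEMORY_CXL 0x0502 /* base class 05h, subclass 02h */
#define PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))

/* Hypothetical stand-in for the few struct pci_dev fields used here. */
struct mock_pci_dev {
	uint8_t devfn;       /* encoded device and function number */
	uint32_t class_code; /* 24 bits: base class, subclass, prog-if */
};

static bool is_cxl_mem_dev(const struct mock_pci_dev *dev)
{
	assert(dev != NULL);

	/* Only device 0, function 0 controls the whole CXL device. */
	if (dev->devfn != PCI_DEVFN(0, 0))
		return false;

	/* Drop the prog-if byte, then compare base class and subclass. */
	return (dev->class_code >> 8) == PCI_CLASS_MEMORY_CXL;
}
```

Nothing in it depends on AER state, which is part of why it looks like
something the driver that claimed the RCiEP could decide for itself.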

> In addition to errors directed to the CXL endpoint device, a handler
> must also inspect the CXL RAS and PCIe AER capabilities of the CXL
> downstream port that is connected to the device.
> 
> Since CXL downstream port errors are signaled using internal errors,
> the handler requires those errors to be unmasked. This is subject of a
> follow-on patch.
> 
> The reason for choosing this implementation is that a CXL RCEC device
> is bound to the AER port driver,

  ... is that the AER service driver claims the CXL RCEC device, but
  does not allow registration of a CXL sub-service driver ...

> but the driver does not allow it to
> register a custom specific handler to support CXL. Connecting the RCEC
> hard-wired with a CXL handler does not work, as the CXL subsystem
> might not be present all the time. The alternative to add an
> implementation to the portdrv to allow the registration of a custom
> RCEC error handler isn't worth doing it as CXL would be its only user.
> Instead, just check for an CXL RCEC and pass it down to the connected
> CXL device's error handler. With this approach the code can entirely
> be implemented in the PCIe AER driver and is independent of the CXL
> subsystem. The CXL driver only provides the handler.
> 
> [1] CXL 3.0 spec, 12.2.1.1 RCH Downstream Port-detected Errors
> [2] CXL 3.0 spec, 8.1.3 PCIe DVSEC for CXL Devices
> 
> Co-developed-by: Terry Bowman 
> Signed-off-by: Terry Bowman 
> Signed-off-by: Robert Richter 
> Cc: "Oliver O'Halloran" 
> Cc: Bjorn Helgaas 
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-...@vger.kernel.org

Given the questions are minor:

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/pcie/Kconfig |  12 +
>  drivers/pci/pcie/aer.c   | 100 ++-
>  2 files changed, 110 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig
> index 228652a59f27..4

Re: [PATCHv2 pci-next 2/2] PCI/AER: Rate limit the reporting of the correctable errors

2023-05-17 Thread Bjorn Helgaas

On Fri, Apr 07, 2023 at 04:46:03PM -0700, Grant Grundler wrote:
> On Fri, Apr 7, 2023 at 12:46 PM Bjorn Helgaas  wrote:
> > On Fri, Apr 07, 2023 at 11:53:27AM -0700, Grant Grundler wrote:
> > > On Thu, Apr 6, 2023 at 12:50 PM Bjorn Helgaas wrote:
> > > > On Fri, Mar 17, 2023 at 10:51:09AM -0700, Grant Grundler wrote:
> > > > > From: Rajat Khandelwal 
> > > > >
> > > > > There are many instances where correctable errors tend to inundate
> > > > > the message buffer. We observe such instances during thunderbolt PCIe
> > > > > tunneling.
> > > ...
> >
> > > > >   if (info->severity == AER_CORRECTABLE)
> > > > > - pci_info(dev, "   [%2d] %-22s%s\n", i, errmsg,
> > > > > - info->first_error == i ? " (First)" : "");
> > > > > + pci_info_ratelimited(dev, "   [%2d] %-22s%s\n", i, errmsg,
> > > > > +  info->first_error == i ? " (First)" : "");
> > > >
> > > > I don't think this is going to reliably work the way we want.  We have
> > > > a bunch of pci_info_ratelimited() calls, and each caller has its own
> > > > ratelimit_state data.  Unless we call pci_info_ratelimited() exactly
> > > > the same number of times for each error, the ratelimit counters will
> > > > get out of sync and we'll end up printing fragments from error A mixed
> > > > with fragments from error B.
> > >
> > > Ok - what I'm reading between the lines here is the output should be
> > > emitted in one step, not multiple pci_info_ratelimited() calls. if the
> > > code built an output string (using sprintnf()), and then called
> > > pci_info_ratelimited() exactly once at the bottom, would that be
> > > sufficient?
> > >
> > > > I think we need to explicitly manage the ratelimiting ourselves,
> > > > similar to print_hmi_event_info() or print_extlog_rcd().  Then we can
> > > > have a *single* ratelimit_state, and we can check it once to determine
> > > > whether to log this correctable error.
> > >
> > > Is the rate limiting per call location or per device? From above, I
> > > understood rate limiting is "per call location".  If the code only
> > > has one call location, it should achieve the same goal, right?
> >
> > Rate-limiting is per call location, so yes, if we only have one call
> > location, that would solve it.  It would also have the nice property
> > that all the output would be atomic so it wouldn't get mixed with
> > other stuff, and it might encourage us to be a little less wordy in
> > the output.
> >
> 
> +1 to all of those reasons. Especially reducing the number of lines output.
> 
> I'm going to be out for the next week. If someone else (Rajat Kendalwal
> maybe?) wants to rework this to use one call location it should be fairly
> straight forward. If not, I'll tackle this when I'm back (in 2 weeks
> essentially).

Ping?  Really hoping to merge this for v6.5.
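
In case it helps whoever picks this up, the single-call-site idea from
the discussion above could be sketched in userspace like this.  It is a
model only: struct rl_state and rl_allow() are hypothetical stand-ins
for the kernel's ratelimit state and check, time is passed in explicitly
so the behaviour is deterministic, and the report format is made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a single shared ratelimit state. */
struct rl_state {
	long interval; /* window length, arbitrary time units */
	int burst;     /* reports allowed per window */
	long begin;    /* start of the current window */
	int printed;   /* reports emitted in the current window */
	int missed;    /* reports suppressed so far */
};

/* One decision per error: allow it, or count it as suppressed. */
static bool rl_allow(struct rl_state *rs, long now)
{
	assert(rs != NULL && rs->interval > 0);

	if (now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}

/*
 * Check the ratelimit once, then build the whole report in one buffer
 * so it is emitted (or dropped) as a unit.  Returns true if 'out' holds
 * a report the caller should print with a single call.
 */
static bool report_correctable(struct rl_state *rs, long now,
			       unsigned int status, char *out, size_t outlen)
{
	size_t used = 0;

	if (outlen == 0)
		return false;
	out[0] = '\0';

	if (!rl_allow(rs, now))
		return false;

	for (int i = 0; i < 32; i++) {
		int n;

		if (!(status & (1u << i)))
			continue;
		n = snprintf(out + used, outlen - used, "[%2d] ", i);
		if (n < 0 || (size_t)n >= outlen - used)
			break; /* buffer full: stop appending */
		used += (size_t)n;
	}

	return true;
}
```

Because the decision is taken once per error, a report is either printed
whole or dropped whole, so fragments of different errors cannot
interleave.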

Bjorn

Re: [PATCH v8 0/7] Add pci_dev_for_each_resource() helper and update users

2023-05-12 Thread Bjorn Helgaas

On Fri, May 12, 2023 at 01:56:29PM +0300, Andy Shevchenko wrote:
> On Tue, May 09, 2023 at 01:21:22PM -0500, Bjorn Helgaas wrote:
> > On Tue, Apr 04, 2023 at 11:11:01AM -0500, Bjorn Helgaas wrote:
> > > On Thu, Mar 30, 2023 at 07:24:27PM +0300, Andy Shevchenko wrote:
> > > > Provide two new helper macros to iterate over PCI device resources and
> > > > convert users.
> > 
> > > Applied 2-7 to pci/resource for v6.4, thanks, I really like this!
> > 
> > This is 09cc90063240 ("PCI: Introduce pci_dev_for_each_resource()")
> > upstream now.
> > 
> > Coverity complains about each use,
> 
> It needs more clarification here. Use of reduced variant of the
> macro or all of them? If the former one, then I can speculate that
> Coverity (famous for false positives) simply doesn't understand `for
> (type var; var ...)` code.

True, Coverity finds false positives.  It flagged every use in
drivers/pci and drivers/pnp.  It didn't mention the arch/alpha, arm,
mips, powerpc, sh, or sparc uses, but I think it just didn't look at
those.

It flagged both:

  pbus_size_io    pci_dev_for_each_resource(dev, r)
  pbus_size_mem   pci_dev_for_each_resource(dev, r, i)

Here's a spreadsheet with a few more details (unfortunately I don't
know how to make it dump the actual line numbers or analysis like I
pasted below, so "pci_dev_for_each_resource" doesn't appear).  These
are mostly in the "Drivers-PCI" component.

https://docs.google.com/spreadsheets/d/1ohOJwxqXXoDUA0gwopgk-z-6ArLvhN7AZn4mIlDkHhQ/edit?usp=sharing

These particular reports are in the "High Impact Outstanding" tab.
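
To show what I think Coverity is reacting to, here is a small userspace
sketch.  The "comma" macro is modelled on the condition text in the
Coverity trace quoted below, not copied from the tree; the "checked"
variant is just one possible alternative.  struct res, struct dev,
NUM_RES and both macro names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_RES 17 /* stand-in for PCI_NUM_RESOURCES in the quoted trace */

struct res { unsigned long flags; };
struct dev { struct res resource[NUM_RES]; };

/* Comma form: assigns the pointer first, then tests the index. */
#define for_each_res_comma(d, r)				\
	for (unsigned int b_ = 0;				\
	     (r) = &(d)->resource[b_], b_ < NUM_RES;		\
	     b_++)

/* Checked form: tests the index first, then assigns the pointer. */
#define for_each_res_checked(d, r)				\
	for (unsigned int b_ = 0;				\
	     b_ < NUM_RES && ((r) = &(d)->resource[b_]);	\
	     b_++)

/* Count entries with flags set; report where 'r' ends up afterwards. */
static unsigned int count_comma(struct dev *d, struct res **last)
{
	struct res *r;
	unsigned int n = 0;

	for_each_res_comma(d, r) {
		if (r->flags)
			n++;
	}
	*last = r; /* one past the end, never dereferenced in the loop */
	return n;
}

static unsigned int count_checked(struct dev *d, struct res **last)
{
	struct res *r = NULL;
	unsigned int n = 0;

	for_each_res_checked(d, r) {
		if (r->flags)
			n++;
	}
	*last = r; /* still the last valid element */
	return n;
}
```

Both loops visit the same elements.  The difference is that the comma
form leaves the iterator pointing one past the array on exit, which is
legal to form but would overrun if dereferenced; as far as I can tell
that final assignment is what the checker flags.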

> > sample below from
> > drivers/pci/vgaarb.c.  I didn't investigate at all, so it might be a
> > false positive; just FYI.
> > 
> >   1. Condition screen_info.capabilities & (2U /* 1 << 1 */), taking true branch.
> >   556    if (screen_info.capabilities & VIDEO_CAPABILITY_64BIT_BASE)
> >   557        base |= (u64)screen_info.ext_lfb_base << 32;
> >   558
> >   559    limit = base + size;
> >   560
> >   561    /* Does firmware framebuffer belong to us? */
> >   2. Condition __b < PCI_NUM_RESOURCES, taking true branch.
> >   3. Condition (r = &pdev->resource[__b]) , (__b < PCI_NUM_RESOURCES), taking true branch.
> >   6. Condition __b < PCI_NUM_RESOURCES, taking true branch.
> >   7. cond_at_most: Checking __b < PCI_NUM_RESOURCES implies that __b may be up to 16 on the true branch.
> >   8. Condition (r = &pdev->resource[__b]) , (__b < PCI_NUM_RESOURCES), taking true branch.
> >   11. incr: Incrementing __b. The value of __b may now be up to 17.
> >   12. alias: Assigning: r = &pdev->resource[__b]. r may now point to as high as element 17 of pdev->resource (which consists of 17 64-byte elements).
> >   13. Condition __b < PCI_NUM_RESOURCES, taking true branch.
> >   14. Condition (r = &pdev->resource[__b]) , (__b < PCI_NUM_RESOURCES), taking true branch.
> >   562    pci_dev_for_each_resource(pdev, r) {
> >   4. Condition resource_type(r) != 512, taking true branch.
> >   9. Condition resource_type(r) != 512, taking true branch.
> > 
> >   CID 1529911 (#1 of 1): Out-of-bounds read (OVERRUN)
> >   15. overrun-local: Overrunning array of 1088 bytes at byte offset 1088 by dereferencing pointer r.
> >   563        if (resource_type(r) != IORESOURCE_MEM)
> >   5. Continuing loop.
> >   10. Continuing loop.
> >   564            continue;
> 
> -- 
> With Best Regards,
> Andy Shevchenko
> 
>

Re: [EXT] Re: [PATCH v2 1/1] PCI: layerscape: Add the endpoint linkup notifier support

2023-05-09 Thread Bjorn Helgaas

On Mon, May 08, 2023 at 09:45:59PM +, Frank Li wrote:
> > > > Subject: [EXT] Re: [PATCH v2 1/1] PCI: layerscape: Add the endpoint
> > linkup
> > > > notifier support
> > 
> > All these quoted headers are redundant clutter since we've already
> > seen them when Manivannan sent his comments.  It would be nice if your
> > mailer could be configured to omit them.
> 
> Our email client quite stupid. 

Yeah, sometimes those are really hard to work around.

Bjorn

Re: [PATCH v8 0/7] Add pci_dev_for_each_resource() helper and update users

2023-05-09 Thread Bjorn Helgaas

On Tue, Apr 04, 2023 at 11:11:01AM -0500, Bjorn Helgaas wrote:
> On Thu, Mar 30, 2023 at 07:24:27PM +0300, Andy Shevchenko wrote:
> > Provide two new helper macros to iterate over PCI device resources and
> > convert users.

> Applied 2-7 to pci/resource for v6.4, thanks, I really like this!

This is 09cc90063240 ("PCI: Introduce pci_dev_for_each_resource()")
upstream now.

Coverity complains about each use, sample below from
drivers/pci/vgaarb.c.  I didn't investigate at all, so it might be a
false positive; just FYI.

  1. Condition screen_info.capabilities & (2U /* 1 << 1 */), taking true branch.
  556    if (screen_info.capabilities & VIDEO_CAPABILITY_64BIT_BASE)
  557        base |= (u64)screen_info.ext_lfb_base << 32;
  558
  559    limit = base + size;
  560
  561    /* Does firmware framebuffer belong to us? */
  2. Condition __b < PCI_NUM_RESOURCES, taking true branch.
  3. Condition (r = &pdev->resource[__b]) , (__b < PCI_NUM_RESOURCES), taking true branch.
  6. Condition __b < PCI_NUM_RESOURCES, taking true branch.
  7. cond_at_most: Checking __b < PCI_NUM_RESOURCES implies that __b may be up to 16 on the true branch.
  8. Condition (r = &pdev->resource[__b]) , (__b < PCI_NUM_RESOURCES), taking true branch.
  11. incr: Incrementing __b. The value of __b may now be up to 17.
  12. alias: Assigning: r = &pdev->resource[__b]. r may now point to as high as element 17 of pdev->resource (which consists of 17 64-byte elements).
  13. Condition __b < PCI_NUM_RESOURCES, taking true branch.
  14. Condition (r = &pdev->resource[__b]) , (__b < PCI_NUM_RESOURCES), taking true branch.
  562    pci_dev_for_each_resource(pdev, r) {
  4. Condition resource_type(r) != 512, taking true branch.
  9. Condition resource_type(r) != 512, taking true branch.

  CID 1529911 (#1 of 1): Out-of-bounds read (OVERRUN)
  15. overrun-local: Overrunning array of 1088 bytes at byte offset 1088 by dereferencing pointer r. [show details]
  563        if (resource_type(r) != IORESOURCE_MEM)
  5. Continuing loop.
  10. Continuing loop.
  564            continue;
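
For anyone trying to judge the report above, here is a minimal user-space
sketch of the loop shape Coverity describes in steps 2-14.  It assumes the
iterator expands to a for loop whose condition is a comma expression, as
the "(r = &pdev->resource[__b]) , (__b < PCI_NUM_RESOURCES)" lines in the
trace suggest.  struct fake_dev, fake_for_each_resource() and
count_mem_resources() are hypothetical stand-ins, not the kernel code.

```c
#define NUM_RESOURCES 17	/* stand-in for PCI_NUM_RESOURCES in the report */
#define RES_MEM 0x200		/* 512, the value compared in the Coverity trace */

struct fake_res { unsigned long flags; };
struct fake_dev { struct fake_res resource[NUM_RESOURCES]; };

/*
 * Hypothetical model of the iterator: the pointer is assigned first, then
 * the index is bounds-checked by the right-hand side of the comma operator.
 */
#define fake_for_each_resource(dev, res)				\
	for (unsigned int __b = 0;					\
	     res = &(dev)->resource[__b], __b < NUM_RESOURCES;		\
	     __b++)

/* Count memory resources; the body only runs while __b < NUM_RESOURCES. */
static int count_mem_resources(struct fake_dev *dev)
{
	struct fake_res *r;
	int n = 0;

	fake_for_each_resource(dev, r) {
		if (r->flags != RES_MEM)
			continue;
		n++;
	}
	return n;
}
```

In this model, the last evaluation of the condition points r one past the
final element, but the index check fails before the body can dereference
it.  That would fit a false positive, though the sketch says nothing
definite about the real macro.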

Re: [EXT] Re: [PATCH v2 1/1] PCI: layerscape: Add the endpoint linkup notifier support

2023-05-08 Thread Bjorn Helgaas

On Mon, May 08, 2023 at 01:31:26PM +, Frank Li wrote:
> > -Original Message-
> > From: Manivannan Sadhasivam 
> > Sent: Saturday, May 6, 2023 2:59 AM
> > To: Frank Li 
> > Cc: M.H. Lian ; Mingkai Hu
> > ; Roy Zang ; Lorenzo Pieralisi
> > ; Rob Herring ; Krzysztof
> > Wilczyński ; Bjorn Helgaas ; open
> > list:PCI DRIVER FOR FREESCALE LAYERSCAPE ;
> > open list:PCI DRIVER FOR FREESCALE LAYERSCAPE  > p...@vger.kernel.org>; moderated list:PCI DRIVER FOR FREESCALE
> > LAYERSCAPE ; open list  > ker...@vger.kernel.org>; i...@lists.linux.dev
> > Subject: [EXT] Re: [PATCH v2 1/1] PCI: layerscape: Add the endpoint linkup
> > notifier support

All these quoted headers are redundant clutter since we've already
seen them when Manivannan sent his comments.  It would be nice if your
mailer could be configured to omit them.

> > > +static int ls_pcie_ep_interrupt_init(struct ls_pcie_ep *pcie,
> > > +  struct platform_device *pdev)
> > > +{
> > > + u32 val;
> > > + int ret;
> > > +
> > > + pcie->irq = platform_get_irq_byname(pdev, "pme");
> > > + if (pcie->irq < 0) {
> > > + dev_err(&pdev->dev, "Can't get 'pme' IRQ\n");
> > 
> > PME
> 
> Here should be dts property `pme`, suppose should match
> platform_get_irq_byname(pdev, "pme");

You can also edit out all the other context and questions if you're
not responding to them.

There were a lot of other comments that were useful but are not
relevant to this reply.

Bjorn

Re: [PATCH v4 2/3] PCI/AER: Disable AER interrupt on suspend

2023-05-05 Thread Bjorn Helgaas

On Mon, Apr 24, 2023 at 01:52:48PM +0800, Kai-Heng Feng wrote:
> PCIe service that shares IRQ with PME may cause spurious wakeup on
> system suspend.
> 
> PCIe Base Spec 5.0, section 5.2 "Link State Power Management" states
> that TLP and DLLP transmission is disabled for a Link in L2/L3 Ready
> (D3hot), L2 (D3cold with aux power) and L3 (D3cold), so we don't lose
> much here to disable AER during system suspend.
> 
> This is very similar to previous attempts to suspend AER and DPC [1],
> but with a different reason.

What is the reason?  I assume it's something to do with the bugzilla
below, but the commit log should outline the user-visible problem this
fixes.  The commit log basically makes the case for "why should we
merge this patch."

I assume it's along the lines of "I tried to suspend this system, but
it immediately woke up again because of an AER interrupt, and
disabling AER during suspend avoids this problem.  And disabling
the AER interrupt is not a problem because X."

> [1] 
> https://lore.kernel.org/linux-pci/20220408153159.106741-1-kai.heng.f...@canonical.com/
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295
> 
> Reviewed-by: Mika Westerberg 
> Signed-off-by: Kai-Heng Feng 
> ---
>  drivers/pci/pcie/aer.c | 22 ++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 1420e1f27105..9c07fdbeb52d 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1356,6 +1356,26 @@ static int aer_probe(struct pcie_device *dev)
>   return 0;
>  }
>  
> +static int aer_suspend(struct pcie_device *dev)
> +{
> + struct aer_rpc *rpc = get_service_data(dev);
> + struct pci_dev *pdev = rpc->rpd;
> +
> + aer_disable_irq(pdev);
> +
> + return 0;
> +}
> +
> +static int aer_resume(struct pcie_device *dev)
> +{
> + struct aer_rpc *rpc = get_service_data(dev);
> + struct pci_dev *pdev = rpc->rpd;
> +
> + aer_enable_irq(pdev);
> +
> + return 0;
> +}
> +
>  /**
>   * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP
>   * @dev: pointer to Root Port, RCEC, or RCiEP
> @@ -1420,6 +1440,8 @@ static struct pcie_port_service_driver aerdriver = {
>   .service= PCIE_PORT_SERVICE_AER,
>  
>   .probe  = aer_probe,
> + .suspend= aer_suspend,
> + .resume = aer_resume,
>   .remove = aer_remove,
>  };
>  
> -- 
> 2.34.1
>

Re: [PATCH v8 2/7] PCI: Export PCI link retrain timeout

2023-05-04 Thread Bjorn Helgaas

On Thu, Apr 06, 2023 at 01:21:09AM +0100, Maciej W. Rozycki wrote:
> Rename LINK_RETRAIN_TIMEOUT to PCIE_LINK_RETRAIN_TIMEOUT and make it
> available via "pci.h" for PCI drivers to use.

> +#define PCIE_LINK_RETRAIN_TIMEOUT HZ

This is basically just a rename and move, but since we're touching it
anyway, can we make it "PCIE_LINK_RETRAIN_TIMEOUT_MS 1000" here and
use msecs_to_jiffies() below?

I know jiffies and HZ are probably idiomatic elsewhere in the kernel,
and this particular timeout is arbitrary and not based on anything in
the spec, but many of the delays in PCI *are* straight from a spec, so
I'd like to make the units more explicit.
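
A rough user-space sketch of what that could look like.  The HZ value and
sketch_msecs_to_jiffies() are simplified stand-ins for this example only
(the kernel's msecs_to_jiffies() handles more cases), and
retrain_deadline() is a hypothetical helper.

```c
#define HZ 250	/* assumed tick rate for this sketch; the kernel's is configurable */

#define PCIE_LINK_RETRAIN_TIMEOUT_MS 1000

/* Simplified stand-in for the kernel's msecs_to_jiffies(); rounds up. */
static unsigned long sketch_msecs_to_jiffies(unsigned int ms)
{
	return ((unsigned long)ms * HZ + 999) / 1000;
}

/* Deadline computed from a constant whose unit is visible in its name. */
static unsigned long retrain_deadline(unsigned long now_jiffies)
{
	return now_jiffies +
	       sketch_msecs_to_jiffies(PCIE_LINK_RETRAIN_TIMEOUT_MS);
}
```

The resulting timeout is the same one second as before; only the unit
becomes visible at the definition.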

>  extern const unsigned char pcie_link_speed[];
>  extern bool pci_early_dump;
>  
> Index: linux-macro/drivers/pci/pcie/aspm.c
> ===
> --- linux-macro.orig/drivers/pci/pcie/aspm.c
> +++ linux-macro/drivers/pci/pcie/aspm.c
> @@ -90,8 +90,6 @@ static const char *policy_str[] = {
>   [POLICY_POWER_SUPERSAVE] = "powersupersave"
>  };
>  
> -#define LINK_RETRAIN_TIMEOUT HZ
> -
>  /*
>   * The L1 PM substate capability is only implemented in function 0 in a
>   * multi function device.
> @@ -213,7 +211,7 @@ static bool pcie_retrain_link(struct pci
>   }
>  
>   /* Wait for link training end. Break out after waiting for timeout */
> - end_jiffies = jiffies + LINK_RETRAIN_TIMEOUT;
> + end_jiffies = jiffies + PCIE_LINK_RETRAIN_TIMEOUT;
>   do {
>   pcie_capability_read_word(parent, PCI_EXP_LNKSTA, &reg16);
>   if (!(reg16 & PCI_EXP_LNKSTA_LT))

Re: [PATCH v8 7/7] PCI: Work around PCIe link training failures

2023-05-04 Thread Bjorn Helgaas

On Thu, Apr 06, 2023 at 01:21:31AM +0100, Maciej W. Rozycki wrote:
> Attempt to handle cases such as with a downstream port of the ASMedia 
> ASM2824 PCIe switch where link training never completes and the link 
> continues switching between speeds indefinitely with the data link layer 
> never reaching the active state.

We're going to land this series this cycle, come hell or high water.

We talked about reusing pcie_retrain_link() earlier.  IIRC that didn't
work: ASPM needs to use PCI_EXP_LNKSTA_LT because not all devices
support PCI_EXP_LNKSTA_DLLLA, and you need PCI_EXP_LNKSTA_DLLLA
because the erratum makes PCI_EXP_LNKSTA_LT flap.

What if we made pcie_retrain_link() reusable by making it:

  bool pcie_retrain_link(struct pci_dev *pdev, u16 link_status_bit)

so ASPM could use pcie_retrain_link(link->pdev, PCI_EXP_LNKSTA_LT) and
you could use pcie_retrain_link(dev, PCI_EXP_LNKSTA_DLLLA)?

Maybe do it in two steps?

  1) Move pcie_retrain_link() just after pcie_wait_for_link() and make
  it take link->pdev instead of link.

  2) Add the bit parameter.
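
A user-space sketch of the idea, under stated assumptions: struct
fake_port and its scripted Link Status reads are hypothetical,
max_polls stands in for the jiffies timeout, and sketch_retrain_link()
only models the wait, not the Retrain Link write.  Note that the two bits
have opposite polarity (LT clears when training ends, DLLLA sets when the
link is active), so the sketch derives the expected state from the bit.

```c
#include <stdbool.h>
#include <stdint.h>

#define LNKSTA_LT	0x0800	/* Link Training in progress */
#define LNKSTA_DLLLA	0x2000	/* Data Link Layer Link Active */

/* Hypothetical device model: returns scripted Link Status values in order. */
struct fake_port {
	const uint16_t *lnksta_script;
	int len;
	int pos;
};

static uint16_t read_lnksta(struct fake_port *p)
{
	uint16_t v = p->lnksta_script[p->pos];

	/* Keep returning the last scripted value once the script runs out. */
	if (p->pos < p->len - 1)
		p->pos++;
	return v;
}

/*
 * Poll until the chosen bit says training is done: LT must be clear,
 * DLLLA must be set.  Returns false if max_polls reads were not enough.
 */
static bool sketch_retrain_link(struct fake_port *p, uint16_t link_status_bit,
				int max_polls)
{
	bool want_set = (link_status_bit == LNKSTA_DLLLA);

	for (int i = 0; i < max_polls; i++) {
		bool is_set = read_lnksta(p) & link_status_bit;

		if (is_set == want_set)
			return true;
	}
	return false;
}
```

With this shape, an ASPM-style caller would pass LNKSTA_LT and the quirk
would pass LNKSTA_DLLLA, sharing the same polling loop.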

I'm OK with having pcie_retrain_link() in pci.c, but the surrounding
logic about restricting to 2.5GT/s, retraining, removing the
restriction, retraining again is stuff I'd rather have in quirks.c so
it doesn't clutter pci.c.

I think it'd be good if the pci_device_add() path made clear that this
is a workaround for a problem, e.g.,

  void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
  {
    ...
    if (pcie_link_failed(dev))
      pcie_fix_link_train(dev);

where pcie_fix_link_train() could live in quirks.c (with a stub when
CONFIG_PCI_QUIRKS isn't enabled).  It *might* even be worth adding it
and the stub first because that's a trivial patch and wouldn't clutter
the probe.c git history with all the grotty details about ASM2824 and
this topology.
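
The stub arrangement could look roughly like the user-space sketch below.
struct fake_pci_dev, its fields and sketch_device_add() are hypothetical;
CONFIG_PCI_QUIRKS is defined by hand here only so the example compiles
both ways.

```c
#include <stdbool.h>

/* Defined by hand for this sketch; in the kernel it comes from Kconfig. */
#define CONFIG_PCI_QUIRKS 1

/* Hypothetical device model with just enough state for the example. */
struct fake_pci_dev {
	bool link_failed;
	bool retrained;
};

/* Generic detection; this part could stay with the core code. */
static bool pcie_link_failed(struct fake_pci_dev *dev)
{
	return dev->link_failed;
}

#ifdef CONFIG_PCI_QUIRKS
/* Would live in quirks.c, keeping the device-specific detail there. */
static void pcie_fix_link_train(struct fake_pci_dev *dev)
{
	dev->retrained = true;
}
#else
/* Stub so the caller needs no #ifdef of its own. */
static inline void pcie_fix_link_train(struct fake_pci_dev *dev) { }
#endif

/* The add path only shows that a workaround exists, not how it works. */
static void sketch_device_add(struct fake_pci_dev *dev)
{
	if (pcie_link_failed(dev))
		pcie_fix_link_train(dev);
}
```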

> +int pcie_downstream_link_retrain(struct pci_dev *dev)
> +{
> + static const struct pci_device_id ids[] = {
> + { PCI_VDEVICE(ASMEDIA, 0x2824) }, /* ASMedia ASM2824 */
> + {}
> + };
> + u16 lnksta, lnkctl2;
> +
> + if (!pci_is_pcie(dev) || !pcie_downstream_port(dev) ||
> + !pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting)
> + return -1;
> +
> + pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2);
> + pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
> + if ((lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) ==
> + PCI_EXP_LNKSTA_LBMS) {

You go to some trouble to make sure PCI_EXP_LNKSTA_LBMS is set, and I
can't remember what the reason is.  If you make a preparatory patch
like this, it would give a place for that background, e.g.,

  +bool pcie_link_failed(struct pci_dev *dev)
  +{
  +   u16 lnksta;
  +
  +   if (!pci_is_pcie(dev) || !pcie_downstream_port(dev) ||
  +   !pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting)
  +   return false;
  +
  +   pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
  +   if ((lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) ==
  +   PCI_EXP_LNKSTA_LBMS)
  +   return true;
  +
  +   return false;
  +}

If this is a generic thing and checking PCI_EXP_LNKSTA_LBMS makes
sense for everybody, it could go in pci.c; otherwise it could go in
quirks.c as well.  I guess it's not *truly* generic anyway because it
only detects link training failures for devices that have LNKCTL2 and
link_active_reporting.

> + unsigned long timeout;
> + u16 lnkctl;
> +
> + pci_info(dev, "broken device, retraining non-functional 
> downstream link at 2.5GT/s\n");
> +
> + pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
> + lnkctl |= PCI_EXP_LNKCTL_RL;
> + lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
> + lnkctl2 |= PCI_EXP_LNKCTL2_TLS_2_5GT;
> + pcie_capability_write_word(dev, PCI_EXP_LNKCTL2, lnkctl2);
> + pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl);
> + /*
> +  * Due to an erratum in some devices the Retrain Link bit
> +  * needs to be cleared again manually to allow the link
> +  * training to succeed.
> +  */
> + lnkctl &= ~PCI_EXP_LNKCTL_RL;
> + if (dev->clear_retrain_link)
> + pcie_capability_write_word(dev, PCI_EXP_LNKCTL,
> +lnkctl);
> +
> + timeout = jiffies + PCIE_LINK_RETRAIN_TIMEOUT;
> + do {
> + pcie_capability_read_word(dev, PCI_EXP_LNKSTA,
> +  &lnksta);
> + if (lnksta & PCI_EXP_LNKSTA_DLLLA)
> + break;
> + usleep_range(1, 2);
> + } while (time_before(jiffies, timeout));
> +
> + if (!(lnksta & PCI_EXP_LNKSTA_DLLLA)) {
> +

Re: [PATCH 1/1] PCI: layerscape: Add the endpoint linkup notifier support

2023-04-28 Thread Bjorn Helgaas

On Thu, Apr 20, 2023 at 06:11:17PM -0400, Frank Li wrote:
> Layerscape has PME interrupt, which can be use as linkup notifer.
> Set CFG_READY bit when linkup detected.

s/use/used/
s/notifer/notifier/

> +/* PEX PFa PCIE pme and message interrupt registers*/

s/pme/PME/ to match other usage and spec.

> + dev_info(pci->dev, "Detect the link up state !\n");
> + } else if (val & PEX_PF0_PME_MES_DR_LDD) {
> + dev_info(pci->dev, "Detect the link down state !\n");
> + } else if (val & PEX_PF0_PME_MES_DR_HRD) {
> + dev_info(pci->dev, "Detect the hot reset state !\n");

No spaces before "!".  Omit the "!" completely unless these are
unexpected situations.  They seem ordinary to me.

Would probably be better as just "Link up", "Link down", "Hot reset".
Or "Link up state detected" if you want.

> + dev_err(&pdev->dev, "Can't get 'pme' irq.\n");
> + dev_err(&pdev->dev, "Can't register PCIe IRQ.\n");

Capitalize "IRQ" in both the above message and this one.  No "."
needed at the end.

Bjorn

Re: [PATCH] PCI: Use of_property_present() for testing DT property presence

2023-04-18 Thread Bjorn Helgaas

On Fri, Mar 10, 2023 at 08:47:19AM -0600, Rob Herring wrote:
> It is preferred to use typed property access functions (i.e.
> of_property_read_ functions) rather than low-level
> of_get_property/of_find_property functions for reading properties. As
> part of this, convert of_get_property/of_find_property calls to the
> recently added of_property_present() helper when we just want to test
> for presence of a property and nothing more.
> 
> Signed-off-by: Rob Herring 

Applied with AngeloGioacchino's reviewed-by to pci/enumeration for
v6.4, thanks!

> ---
>  drivers/pci/controller/pci-tegra.c | 4 ++--
>  drivers/pci/controller/pcie-mediatek.c | 2 +-
>  drivers/pci/hotplug/rpaphp_core.c  | 4 ++--
>  drivers/pci/of.c   | 2 +-
>  4 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/pci/controller/pci-tegra.c 
> b/drivers/pci/controller/pci-tegra.c
> index 74c109f14ff0..79630885b9c8 100644
> --- a/drivers/pci/controller/pci-tegra.c
> +++ b/drivers/pci/controller/pci-tegra.c
> @@ -1375,7 +1375,7 @@ static int tegra_pcie_phys_get(struct tegra_pcie *pcie)
>   struct tegra_pcie_port *port;
>   int err;
>  
> - if (!soc->has_gen2 || of_find_property(np, "phys", NULL) != NULL)
> + if (!soc->has_gen2 || of_property_present(np, "phys"))
>   return tegra_pcie_phys_get_legacy(pcie);
>  
>   list_for_each_entry(port, &pcie->ports, list) {
> @@ -1944,7 +1944,7 @@ static bool of_regulator_bulk_available(struct 
> device_node *np,
>   for (i = 0; i < num_supplies; i++) {
>   snprintf(property, 32, "%s-supply", supplies[i].supply);
>  
> - if (of_find_property(np, property, NULL) == NULL)
> + if (!of_property_present(np, property))
>   return false;
>   }
>  
> diff --git a/drivers/pci/controller/pcie-mediatek.c 
> b/drivers/pci/controller/pcie-mediatek.c
> index ae5ad05ddc1d..31de7a29192c 100644
> --- a/drivers/pci/controller/pcie-mediatek.c
> +++ b/drivers/pci/controller/pcie-mediatek.c
> @@ -643,7 +643,7 @@ static int mtk_pcie_setup_irq(struct mtk_pcie_port *port,
>   return err;
>   }
>  
> - if (of_find_property(dev->of_node, "interrupt-names", NULL))
> + if (of_property_present(dev->of_node, "interrupt-names"))
>   port->irq = platform_get_irq_byname(pdev, "pcie_irq");
>   else
>   port->irq = platform_get_irq(pdev, port->slot);
> diff --git a/drivers/pci/hotplug/rpaphp_core.c 
> b/drivers/pci/hotplug/rpaphp_core.c
> index 491986197c47..2316de0fd198 100644
> --- a/drivers/pci/hotplug/rpaphp_core.c
> +++ b/drivers/pci/hotplug/rpaphp_core.c
> @@ -278,7 +278,7 @@ int rpaphp_check_drc_props(struct device_node *dn, char 
> *drc_name,
>   return -EINVAL;
>   }
>  
> - if (of_find_property(dn->parent, "ibm,drc-info", NULL))
> + if (of_property_present(dn->parent, "ibm,drc-info"))
>   return rpaphp_check_drc_props_v2(dn, drc_name, drc_type,
>   be32_to_cpu(*my_index));
>   else
> @@ -440,7 +440,7 @@ int rpaphp_add_slot(struct device_node *dn)
>   if (!of_node_name_eq(dn, "pci"))
>   return 0;
>  
> - if (of_find_property(dn, "ibm,drc-info", NULL))
> + if (of_property_present(dn, "ibm,drc-info"))
>   return rpaphp_drc_info_add_slot(dn);
>   else
>   return rpaphp_drc_add_slot(dn);
> diff --git a/drivers/pci/of.c b/drivers/pci/of.c
> index 196834ed44fe..e085f2eca372 100644
> --- a/drivers/pci/of.c
> +++ b/drivers/pci/of.c
> @@ -447,7 +447,7 @@ static int of_irq_parse_pci(const struct pci_dev *pdev, 
> struct of_phandle_args *
>   return -ENODEV;
>  
>   /* Local interrupt-map in the device node? Use it! */
> - if (of_get_property(dn, "interrupt-map", NULL)) {
> + if (of_property_present(dn, "interrupt-map")) {
>   pin = pci_swizzle_interrupt_pin(pdev, pin);
>   ppnode = dn;
>   }
> -- 
> 2.39.2

Re: [PATCH v3 6/6] PCI/AER: Unmask RCEC internal errors to enable RCH downstream port error handling

2023-04-14 Thread Bjorn Helgaas

On Thu, Apr 13, 2023 at 03:38:07PM +0200, Robert Richter wrote:
> On 12.04.23 16:29:01, Bjorn Helgaas wrote:
> > On Tue, Apr 11, 2023 at 01:03:02PM -0500, Terry Bowman wrote:
> > > From: Robert Richter 
> > > 
> > > RCEC AER corrected and uncorrectable internal errors (CIE/UIE) are
> > > disabled by default.

> > > +static void cxl_unmask_internal_errors(struct pci_dev *rcec)
> 
> Also renaming this to cxl_enable_rcec() to more generalize the
> function.

I didn't follow this.  "cxl_enable_rcec" doesn't say anything about
"unmasking" or "internal errors", which seems like the whole point.
And the function doesn't actually *enable* an RCEC.

> > > +{
> > > + if (!handles_cxl_errors(rcec))
> > > + return;
> > > +
> > > + if (__cxl_unmask_internal_errors(rcec))
> > > + dev_err(&rcec->dev, "cxl: Failed to unmask internal errors");
> > > + else
> > > + dev_dbg(&rcec->dev, "cxl: Internal errors unmasked");
> 
> I am going to change this to a pci_info() for alignment with other
> messages around:
> 
> [   14.200265] pcieport :40:00.3: PME: Signaling with IRQ 44
> [   14.213925] pcieport :40:00.3: AER: cxl: Internal errors unmasked
> [   14.228413] pcieport :40:00.3: AER: enabled with IRQ 44
> 
> Plus, using pci_err() instead of dev_err().

Thanks for that!

Bjorn

Re: [PATCH v3 5/6] PCI/AER: Forward RCH downstream port-detected errors to the CXL.mem dev handler

2023-04-14 Thread Bjorn Helgaas

On Thu, Apr 13, 2023 at 01:40:52PM +0200, Robert Richter wrote:
> On 12.04.23 17:02:33, Bjorn Helgaas wrote:
> > On Tue, Apr 11, 2023 at 01:03:01PM -0500, Terry Bowman wrote:
> > > From: Robert Richter 

> ...
> Let's assume just a simple CXL RCH topology:
> 
> PCI hierarchy:
> 
>   -
>   | ACPI0016  |--   Host bridge (CXL host)
>   | - CEDT| |
>---|   - RCRB base | |
>|  - :
>|   |
>|   |
>|  --- -
>|  | RCiEP   |.| RCEC  | Endpoint (CXL dev)
>|  | - BDF   | | - BDF |
>|  |   | - PCIe AER  | -
>|  |   | - CXL dvsec |
>|  |   |   (v2: reg loc) |
>|  |   |   - Comp regs   |
>|  |   | - CXL RAS   |
>|  |   ---
>:  :
>   
> CXL hierarchy:
> 
>::
>:  --|
>|  | CXL root port  |<
>|  ||
>|->| - dport RCRB   |<
>|  |   - PCIe AER   ||
>|  |   - Comp regs  ||
>|  | - CXL RAS  ||
>|  --|
>|  : |
>|  |   --|
>|  --->| CXL endpoint   |-
>|  | (v1: RCRB) |
>-->| - uport RCRB   |
>   |   - Comp regs  |
>   | - CXL RAS  |
>   --
> 
> Dport detected errors are reported using PCIe AER and CXL RAS caps in
> the dports RCRB.
> 
> Uport detected errors are reported using RCiEP's PCIe AER cap and
> either the uport's RCRB RAS cap or the RAS cap of the comp regs
> located using CXL DVSEC register locator.
> 
> In all cases the RCEC is used with either the RCEC (dport errors) or
> the RCiEP (uport errors) error source id (BDF: bus, dev, func).

I'm mostly interested in the PCI entities involved because that's all
aer.c can deal with.  For the above, I think the PCI core only knows
about these:

  00:00.0 RCEC  with AER, RCEC EA includes 00:01.0
  00:01.0 RCiEP with AER

aer_irq() would handle AER interrupts from 00:00.0.
cxl_handle_error() would be called for 00:00.0 and would call
handle_error_source() for everything below it (only 00:01.0 here).

> > The current code uses pcie_walk_rcec() in this path, which basically
> > searches below a Root Port or RCEC for devices that have an AER error
> > status bit set, add them to the e_info[] list, and call
> > handle_error_source() for each one:
> 
> For reference, this series adds support to handle RCH downstream
> port-detected errors as described in CXL 3.0, 12.2.1.1.
> 
> This flow looks correct to me, see comments inline.

We seem to be on the same page here, so I'll trim it out.

> ...
> > So we insert cxl_handle_error() in handle_error_source(), where it
> > gets called for the RCEC, and then it uses pcie_walk_rcec() again to
> > forcibly call handle_error_source() for *every* device "below" the
> > RCEC (even though they don't have AER error status bits set).
> 
> The CXL device contains the links to the dport's caps. Also, there can
> be multiple RCs with CXL devs connected to it. So we must search for
> all CXL devices now, determine the corresponding dport and inspect
> both, PCIe AER and CXL RAS caps.
> 
> > Then handle_error_source() ultimately calls the CXL driver err_handler
> > entry points (.cor_error_detected(), .error_detected(), etc), which
> > can look at the CXL-specific error status in the CXL RAS or RCRB or
> > whatever.
> 
> The AER driver (portdrv) does not have the knowledge of CXL internals.
> Thus the approach is to pass dport errors to the cxl_mem driver to
> handle it there in addition to cxl mem dev errors.
> 
> > So this basically looks like a workaround for the fact that the AER
> > code only calls handle_error_source() when it finds AER error status,
> > and CXL doesn't *set* that AER error status.  There's not that much
> > code here, but it seems like a quite a bit of complexity in an area
> > that is already pretty complicated.

My main point here (correct me if I got this wrong) is that:

  - A RCEC generates an AER interrupt

  - find_source_device() searches all devices below the RCEC and
builds a list everything for

Re: [PATCH v3 5/6] PCI/AER: Forward RCH downstream port-detected errors to the CXL.mem dev handler

2023-04-12 Thread Bjorn Helgaas

atus, package it up similarly,
and queue it via aer_recover_queue()?

> [1] CXL 3.0 spec, 12.2.1.1 RCH Downstream Port-detected Errors
> [2] CXL 3.0 spec, 8.1.3 PCIe DVSEC for CXL Devices
> 
> Co-developed-by: Terry Bowman 
> Signed-off-by: Robert Richter 
> Signed-off-by: Terry Bowman 
> Cc: "Oliver O'Halloran" 
> Cc: Bjorn Helgaas 
> Cc: Mahesh J Salgaonkar 
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-...@vger.kernel.org
> ---
>  drivers/pci/pcie/Kconfig |  8 ++
>  drivers/pci/pcie/aer.c   | 61 
>  2 files changed, 69 insertions(+)
> 
> diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig
> index 228652a59f27..b0dbd864d3a3 100644
> --- a/drivers/pci/pcie/Kconfig
> +++ b/drivers/pci/pcie/Kconfig
> @@ -49,6 +49,14 @@ config PCIEAER_INJECT
> gotten from:
>
> https://git.kernel.org/cgit/linux/kernel/git/gong.chen/aer-inject.git/
>  
> +config PCIEAER_CXL
> + bool "PCI Express CXL RAS support"
> + default y
> + depends on PCIEAER && CXL_PCI
> + help
> +   This enables CXL error handling for Restricted CXL Hosts
> +   (RCHs).
> +
>  #
>  # PCI Express ECRC
>  #
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 7a25b62d9e01..171a08fd8ebd 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -946,6 +946,65 @@ static bool find_source_device(struct pci_dev *parent,
>   return true;
>  }
>  
> +#ifdef CONFIG_PCIEAER_CXL
> +
> +static bool is_cxl_mem_dev(struct pci_dev *dev)
> +{
> + /*
> +  * A CXL device is controlled only using PCIe Configuration
> +  * Space of device 0, Function 0.
> +  */
> + if (dev->devfn != PCI_DEVFN(0, 0))
> + return false;
> +
> + /* Right now there is only a CXL.mem driver */
> + if ((dev->class >> 8) != PCI_CLASS_MEMORY_CXL)
> + return false;
> +
> + return true;
> +}
> +
> +static bool is_internal_error(struct aer_err_info *info)
> +{
> + if (info->severity == AER_CORRECTABLE)
> + return info->status & PCI_ERR_COR_INTERNAL;
> +
> + return info->status & PCI_ERR_UNC_INTN;
> +}
> +
> +static void handle_error_source(struct pci_dev *dev, struct aer_err_info 
> *info);
> +
> +static int cxl_handle_error_iter(struct pci_dev *dev, void *data)
> +{
> + struct aer_err_info *e_info = (struct aer_err_info *)data;
> +
> + if (!is_cxl_mem_dev(dev))
> + return 0;
> +
> + /* pci_dev_put() in handle_error_source() */
> + dev = pci_dev_get(dev);
> + if (dev)
> + handle_error_source(dev, e_info);
> +
> + return 0;
> +}
> +
> +static void cxl_handle_error(struct pci_dev *dev, struct aer_err_info *info)
> +{
> + /*
> +  * CXL downstream port errors are signaled as RCEC internal
> +  * errors. Forward them to all CXL devices below the RCEC.
> +  */
> + if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC &&
> + is_internal_error(info))
> + pcie_walk_rcec(dev, cxl_handle_error_iter, info);
> +}
> +
> +#else
> +static inline void cxl_handle_error(struct pci_dev *dev,
> + struct aer_err_info *info) { }
> +#endif
> +
>  /**
>   * handle_error_source - handle logging error into an event log
>   * @dev: pointer to pci_dev data structure of error source device
> @@ -957,6 +1016,8 @@ static void handle_error_source(struct pci_dev *dev, 
> struct aer_err_info *info)
>  {
>   int aer = dev->aer_cap;
>  
> + cxl_handle_error(dev, info);
> +
>   if (info->severity == AER_CORRECTABLE) {
>   /*
>* Correctable error does not need software intervention.
> -- 
> 2.34.1
>

Re: [PATCH v3 6/6] PCI/AER: Unmask RCEC internal errors to enable RCH downstream port error handling

2023-04-12 Thread Bjorn Helgaas

On Tue, Apr 11, 2023 at 01:03:02PM -0500, Terry Bowman wrote:
> From: Robert Richter 
> 
> RCEC AER corrected and uncorrectable internal errors (CIE/UIE) are
> disabled by default.

"Disabled by default" just means "the power-up state of CIE/UIE is
that they are masked", right?  It doesn't mean that Linux normally
masks them.

> [1][2] Enable them to receive CXL downstream port
> errors of a Restricted CXL Host (RCH).
> 
> [1] CXL 3.0 Spec, 12.2.1.1 - RCH Downstream Port Detected Errors
> [2] PCIe Base Spec 6.0, 7.8.4.3 Uncorrectable Error Mask Register,
> 7.8.4.6 Correctable Error Mask Register
> 
> Co-developed-by: Terry Bowman 
> Signed-off-by: Robert Richter 
> Signed-off-by: Terry Bowman 
> Cc: "Oliver O'Halloran" 
> Cc: Bjorn Helgaas 
> Cc: Mahesh J Salgaonkar 
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-...@vger.kernel.org
> ---
>  drivers/pci/pcie/aer.c | 73 ++
>  1 file changed, 73 insertions(+)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 171a08fd8ebd..3973c731e11d 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1000,7 +1000,79 @@ static void cxl_handle_error(struct pci_dev *dev, 
> struct aer_err_info *info)
>   pcie_walk_rcec(dev, cxl_handle_error_iter, info);
>  }
>  
> +static bool cxl_error_is_native(struct pci_dev *dev)
> +{
> + struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
> +
> + if (pcie_ports_native)
> + return true;
> +
> + return host->native_aer && host->native_cxl_error;
> +}
> +
> +static int handles_cxl_error_iter(struct pci_dev *dev, void *data)
> +{
> + int *handles_cxl = data;
> +
> + *handles_cxl = is_cxl_mem_dev(dev) && cxl_error_is_native(dev);
> +
> + return *handles_cxl;
> +}
> +
> +static bool handles_cxl_errors(struct pci_dev *rcec)
> +{
> + int handles_cxl = 0;
> +
> + if (!rcec->aer_cap)
> + return false;
> +
> + if (pci_pcie_type(rcec) == PCI_EXP_TYPE_RC_EC)
> + pcie_walk_rcec(rcec, handles_cxl_error_iter, &handles_cxl);
> +
> + return !!handles_cxl;
> +}
> +
> +static int __cxl_unmask_internal_errors(struct pci_dev *rcec)
> +{
> + int aer, rc;
> + u32 mask;
> +
> + /*
> +  * Internal errors are masked by default, unmask RCEC's here
> +  * PCI6.0 7.8.4.3 Uncorrectable Error Mask Register (Offset 08h)
> +  * PCI6.0 7.8.4.6 Correctable Error Mask Register (Offset 14h)
> +  */

Unmasking internal errors doesn't have anything specific to do with
CXL, so I don't think it should have "cxl" in the function name.
Maybe something like "pci_aer_unmask_internal_errors()".

This also has nothing special to do with RCECs, so I think we should
refer to the device as "dev" as is typical in this file.

I think this needs to check pcie_aer_is_native() as is done by
pci_aer_clear_nonfatal_status() and other functions that write the AER
Capability.

With the exception of this function, this patch looks like all CXL
code that maybe could be with other CXL code.  Would require making
pcie_walk_rcec() available outside drivers/pci, I guess.
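
Roughly what I have in mind, as a user-space sketch only: struct fake_dev
models the AER capability as an array of dwords, and its aer_native field
stands in for whatever pcie_aer_is_native() would return.  The function
name and the -1 return are illustrative, not a proposal for the final
interface.

```c
#include <stdbool.h>
#include <stdint.h>

#define ERR_UNCOR_MASK		0x08		/* Uncorrectable Error Mask offset */
#define ERR_COR_MASK		0x14		/* Correctable Error Mask offset */
#define ERR_UNC_INTN		0x00400000u	/* Uncorrectable Internal Error */
#define ERR_COR_INTERNAL	0x00004000u	/* Corrected Internal Error */

/* Hypothetical device: AER capability registers indexed by offset / 4. */
struct fake_dev {
	bool aer_native;	/* stand-in for the pcie_aer_is_native() result */
	uint32_t aer_regs[16];
};

/* Returns 0 on success, -1 if the OS does not own AER on this device. */
static int sketch_aer_unmask_internal_errors(struct fake_dev *dev)
{
	if (!dev->aer_native)
		return -1;

	/* Clearing a mask bit lets the corresponding error be reported. */
	dev->aer_regs[ERR_UNCOR_MASK / 4] &= ~ERR_UNC_INTN;
	dev->aer_regs[ERR_COR_MASK / 4] &= ~ERR_COR_INTERNAL;
	return 0;
}
```

Nothing in it is specific to CXL or to RCECs, which is why a generic name
and a plain "dev" argument seem like a better fit.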

> + aer = rcec->aer_cap;
> + rc = pci_read_config_dword(rcec, aer + PCI_ERR_UNCOR_MASK, &mask);
> + if (rc)
> + return rc;
> + mask &= ~PCI_ERR_UNC_INTN;
> + rc = pci_write_config_dword(rcec, aer + PCI_ERR_UNCOR_MASK, mask);
> + if (rc)
> + return rc;
> +
> + rc = pci_read_config_dword(rcec, aer + PCI_ERR_COR_MASK, &mask);
> + if (rc)
> + return rc;
> + mask &= ~PCI_ERR_COR_INTERNAL;
> + rc = pci_write_config_dword(rcec, aer + PCI_ERR_COR_MASK, mask);
> +
> + return rc;
> +}
> +
> +static void cxl_unmask_internal_errors(struct pci_dev *rcec)
> +{
> + if (!handles_cxl_errors(rcec))
> + return;
> +
> + if (__cxl_unmask_internal_errors(rcec))
> + dev_err(&rcec->dev, "cxl: Failed to unmask internal errors");
> + else
> + dev_dbg(&rcec->dev, "cxl: Internal errors unmasked");
> +}
> +
>  #else
> +static inline void cxl_unmask_internal_errors(struct pci_dev *dev) { }
>  static inline void cxl_handle_error(struct pci_dev *dev,
>   struct aer_err_info *info) { }
>  #endif
> @@ -1397,6 +1469,7 @@ static int aer_probe(struct pcie_device *dev)
>   return status;
>   }
>  
> + cxl_unmask_internal_errors(port);
>   aer_enable_rootport(rpc);
>   pci_info(port, "enabled with IRQ %d\n", dev->irq);
>   return 0;
> -- 
> 2.34.1
>

Re: [PATCHv2 pci-next 2/2] PCI/AER: Rate limit the reporting of the correctable errors

2023-04-07 Thread Bjorn Helgaas

On Fri, Apr 07, 2023 at 11:53:27AM -0700, Grant Grundler wrote:
> On Thu, Apr 6, 2023 at 12:50 PM Bjorn Helgaas  wrote:
> > On Fri, Mar 17, 2023 at 10:51:09AM -0700, Grant Grundler wrote:
> > > From: Rajat Khandelwal 
> > >
> > > There are many instances where correctable errors tend to inundate
> > > the message buffer. We observe such instances during thunderbolt PCIe
> > > tunneling.
> ...

> > >   if (info->severity == AER_CORRECTABLE)
> > > - pci_info(dev, "   [%2d] %-22s%s\n", i, errmsg,
> > > - info->first_error == i ? " (First)" : "");
> > > + pci_info_ratelimited(dev, "   [%2d] %-22s%s\n", i, errmsg,
> > > +  info->first_error == i ? " (First)" : "");
> >
> > I don't think this is going to reliably work the way we want.  We have
> > a bunch of pci_info_ratelimited() calls, and each caller has its own
> > ratelimit_state data.  Unless we call pci_info_ratelimited() exactly
> > the same number of times for each error, the ratelimit counters will
> > get out of sync and we'll end up printing fragments from error A mixed
> > with fragments from error B.
> 
> Ok - what I'm reading between the lines here is the output should be
> emitted in one step, not multiple pci_info_ratelimited() calls. If the
> code built an output string (using snprintf()), and then called
> pci_info_ratelimited() exactly once at the bottom, would that be
> sufficient?
>
> > I think we need to explicitly manage the ratelimiting ourselves,
> > similar to print_hmi_event_info() or print_extlog_rcd().  Then we can
> > have a *single* ratelimit_state, and we can check it once to determine
> > whether to log this correctable error.
> 
> Is the rate limiting per call location or per device? From above, I
> understood rate limiting is "per call location".  If the code only
> has one call location, it should achieve the same goal, right?

Rate-limiting is per call location, so yes, if we only have one call
location, that would solve it.  It would also have the nice property
that all the output would be atomic so it wouldn't get mixed with
other stuff, and it might encourage us to be a little less wordy in
the output.

But I don't think we need output in a single step; we just need a
single instance of ratelimit_state (or one for CPER path and another
for native AER path), and that can control all the output for a single
error.  E.g., print_hmi_event_info() looks like this:

  static void print_hmi_event_info(...)
  {
    static DEFINE_RATELIMIT_STATE(rs, ...);

    if (__ratelimit(&rs)) {
      printk("%s%s Hypervisor Maintenance interrupt ...");
      printk("%s Error detail: %s\n", ...);
      printk("%s  HMER: %016llx\n", ...);
    }
  }

I think it's nice that the struct ratelimit_state is explicit and
there's no danger of breaking it when adding another printk later.

It *could* be per pci_dev, too, but I suspect it's not worth spending
40ish bytes per device for the ratelimit data.
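
A minimal userspace sketch of the difference, not kernel code: struct
rl_state and rl_allow() are hypothetical stand-ins for struct
ratelimit_state and __ratelimit() (budget only, no time window), and
counters stand in for the printk() calls.  It compares separate
per-call-site budgets against one budget checked once per error.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct ratelimit_state: permits `burst` events. */
struct rl_state {
	int burst;
	int used;
};

/* Hypothetical stand-in for __ratelimit(): true while the budget lasts. */
static bool rl_allow(struct rl_state *rs)
{
	if (rs->used >= rs->burst)
		return false;
	rs->used++;
	return true;
}

/* Counters replace printk() so the outcome can be inspected. */
struct log_stats {
	int headers;	/* "PCIe Bus Error: ..." lines */
	int details;	/* "[%2d] ..." lines, one per status bit */
};

/* Per-call-site limiting: header and detail lines have separate budgets. */
static struct log_stats log_per_site(int nerrors, int nbits, int burst)
{
	struct rl_state hdr_rs = { .burst = burst };
	struct rl_state det_rs = { .burst = burst };
	struct log_stats st = { 0 };

	assert(nerrors >= 0 && nbits >= 0);
	for (int e = 0; e < nerrors; e++) {
		if (rl_allow(&hdr_rs))
			st.headers++;
		for (int i = 0; i < nbits; i++)
			if (rl_allow(&det_rs))
				st.details++;
	}
	return st;
}

/* Single state: one check per error gates the header and all its details. */
static struct log_stats log_single(int nerrors, int nbits, int burst)
{
	struct rl_state rs = { .burst = burst };
	struct log_stats st = { 0 };

	assert(nerrors >= 0 && nbits >= 0);
	for (int e = 0; e < nerrors; e++) {
		if (!rl_allow(&rs))
			continue;
		st.headers++;
		st.details += nbits;
	}
	return st;
}
```

With three errors of two status bits each and a budget of 3,
log_per_site(3, 2, 3) counts 3 headers but only 3 of the 6 detail
lines: the second report is cut short and the third header has no
details at all.  log_single() emits whole reports or nothing, so
details always equals headers * nbits.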

Bjorn

Re: [PATCHv2 pci-next 2/2] PCI/AER: Rate limit the reporting of the correctable errors

2023-04-06 Thread Bjorn Helgaas

On Fri, Mar 17, 2023 at 10:51:09AM -0700, Grant Grundler wrote:
> From: Rajat Khandelwal 
> 
> There are many instances where correctable errors tend to inundate
> the message buffer. We observe such instances during thunderbolt PCIe
> tunneling.
> 
> It's true that they are mitigated by the hardware and are non-fatal
> but we shouldn't be spamming the logs with such correctable errors as it
> confuses other kernel developers less familiar with PCI errors, support
> staff, and users who happen to look at the logs, hence rate limit them.
> 
> A typical example log inside an HP TBT4 dock:
> [54912.661142] pcieport :00:07.0: AER: Multiple Corrected error received: :2b:00.0
> [54912.661194] igc :2b:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
> [54912.661203] igc :2b:00.0:   device [8086:5502] error status/mask=1100/2000
> [54912.661211] igc :2b:00.0:[ 8] Rollover
> [54912.661219] igc :2b:00.0:[12] Timeout
> [54982.838760] pcieport :00:07.0: AER: Corrected error received: :2b:00.0
> [54982.838798] igc :2b:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
> [54982.838808] igc :2b:00.0:   device [8086:5502] error status/mask=1000/2000
> [54982.838817] igc :2b:00.0:[12] Timeout

The timestamps don't contribute to understanding the problem, so we
can omit them.

> This gets repeated continuously, thus inundating the buffer.
> 
> Signed-off-by: Rajat Khandelwal 
> Signed-off-by: Grant Grundler 
> ---
>  drivers/pci/pcie/aer.c | 42 --
>  1 file changed, 28 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index cb6b96233967..b592cea8bffe 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -706,8 +706,8 @@ static void __aer_print_error(struct pci_dev *dev,
>   errmsg = "Unknown Error Bit";
>  
>   if (info->severity == AER_CORRECTABLE)
> - pci_info(dev, "   [%2d] %-22s%s\n", i, errmsg,
> - info->first_error == i ? " (First)" : "");
> + pci_info_ratelimited(dev, "   [%2d] %-22s%s\n", i, errmsg,
> +  info->first_error == i ? " (First)" : "");

I don't think this is going to reliably work the way we want.  We have
a bunch of pci_info_ratelimited() calls, and each caller has its own
ratelimit_state data.  Unless we call pci_info_ratelimited() exactly
the same number of times for each error, the ratelimit counters will
get out of sync and we'll end up printing fragments from error A mixed
with fragments from error B.

I think we need to explicitly manage the ratelimiting ourselves,
similar to print_hmi_event_info() or print_extlog_rcd().  Then we can
have a *single* ratelimit_state, and we can check it once to determine
whether to log this correctable error.

>   else
>   pci_err(dev, "   [%2d] %-22s%s\n", i, errmsg,
>   info->first_error == i ? " (First)" : "");
> @@ -719,7 +719,6 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
>  {
>   int layer, agent;
>   int id = ((dev->bus->number << 8) | dev->devfn);
> - const char *level;
>  
>   if (!info->status) {
>   pci_err(dev, "PCIe Bus Error: severity=%s, type=Inaccessible, (Unregistered Agent ID)\n",
> @@ -730,14 +729,21 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
>   layer = AER_GET_LAYER_ERROR(info->severity, info->status);
>   agent = AER_GET_AGENT(info->severity, info->status);
>  
> - level = (info->severity == AER_CORRECTABLE) ? KERN_INFO : KERN_ERR;
> + if (info->severity == AER_CORRECTABLE) {
> + pci_info_ratelimited(dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n",
> +  aer_error_severity_string[info->severity],
> +  aer_error_layer[layer], aer_agent_string[agent]);
>  
> - pci_printk(level, dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n",
> -aer_error_severity_string[info->severity],
> -aer_error_layer[layer], aer_agent_string[agent]);
> + pci_info_ratelimited(dev, "  device [%04x:%04x] error status/mask=%08x/%08x\n",
> +  dev->vendor, dev->device, info->status, info->mask);
> + } else {
> + pci_err(dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n",
> + aer_error_severity_string[info->severity],
> + aer_error_layer[layer], aer_agent_string[agent]);
>  
> - pci_printk(level, dev, "  device [%04x:%04x] error status/mask=%08x/%08x\n",
> -dev->vendor, dev->device, info->status, info->mask);
> + pci_err(dev, "  device [%04x:%04x] error

Re: [PATCH v8 0/7] Add pci_dev_for_each_resource() helper and update users

2023-04-05 Thread Bjorn Helgaas

On Wed, Apr 05, 2023 at 11:28:27AM +0300, Andy Shevchenko wrote:
> On Tue, Apr 04, 2023 at 11:11:01AM -0500, Bjorn Helgaas wrote:
> > On Thu, Mar 30, 2023 at 07:24:27PM +0300, Andy Shevchenko wrote:
> > > Provide two new helper macros to iterate over PCI device resources and
> > > convert users.
> > > 
> > > Looking at it, refactor existing pci_bus_for_each_resource() and convert
> > > users accordingly.

> > Applied 2-7 to pci/resource for v6.4, thanks, I really like this!
> 
> Btw, can you actually drop patch 7, please?

Done.

> > I omitted
> > 
> >   [1/7] kernel.h: Split out COUNT_ARGS() and CONCATENATE()
> > 
> > only because it's not essential to this series and has only a trivial
> > one-line impact on include/linux/pci.h.
> 
> I'm not sure I understood what exactly "essentiality" means to you, but
> I included that because it makes the split which can be used later by
> others and not including kernel.h in the header is the objective I want
> to achieve. Without this patch the achievement is going to be deferred.
> Yet, this, as you have noticed, allows to compile and use the macros in
> the rest of the patches.

I haven't followed the kernel.h splitting, and I try to avoid
incidental changes outside of the files I maintain, so I just wanted
to keep this series purely PCI and avoid any possible objections to a
new include file or discussion about how it should be done.

Re: [PATCH v8 5/7] PCI: Allow pci_bus_for_each_resource() to take less arguments

2023-04-05 Thread Bjorn Helgaas

On Wed, Apr 05, 2023 at 02:50:47PM +0300, Andy Shevchenko wrote:
> On Thu, Mar 30, 2023 at 07:24:32PM +0300, Andy Shevchenko wrote:
> > Refactor pci_bus_for_each_resource() in the same way as it's done in
> > pci_dev_for_each_resource() case. This will allow to hide iterator
> > inside the loop, where it's not used otherwise.
> > 
> > No functional changes intended.
> 
> Bjorn, this has wrong author in your tree:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git/commit/?h=resource&id=46dbad19a59e0dd8f1e7065e5281345797fbb365

I botched it, sorry, should be fixed now.

Bjorn

Re: [PATCH v8 0/7] Add pci_dev_for_each_resource() helper and update users

2023-04-04 Thread Bjorn Helgaas

On Thu, Mar 30, 2023 at 07:24:27PM +0300, Andy Shevchenko wrote:
> Provide two new helper macros to iterate over PCI device resources and
> convert users.
> 
> Looking at it, refactor existing pci_bus_for_each_resource() and convert
> users accordingly.
> 
> Note, the amount of lines grew due to the documentation update.
> 
> Changelog v8:
> - fixed issue with pci_bus_for_each_resource() macro (LKP)
> - due to above added a new patch to document how it works
> - moved the last patch to be #2 (Philippe)
> - added tags (Philippe)
> 
> Changelog v7:
> - made both macros to share same name (Bjorn)

I didn't actually request the same name for both; I would have had no
idea how to even do that :)

v6 had:

  pci_dev_for_each_resource_p(dev, res)
  pci_dev_for_each_resource(dev, res, i)

and I suggested:

  pci_dev_for_each_resource(dev, res)
  pci_dev_for_each_resource_idx(dev, res, i)

because that pattern is used elsewhere.  But you figured out how to do
it, and having one name is even better, so thanks for that extra work!
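
For readers wondering how a single macro name can accept either two or
three arguments, here is a minimal userspace sketch of one common
technique: count the arguments and paste the count onto a helper name.
Everything here (MY_COUNT_ARGS, MY_CONCAT, dev_for_each_res, the
struct layouts) is hypothetical and simplified; it is not necessarily
how the series implements pci_dev_for_each_resource().

```c
#include <assert.h>

#define NUM_RES 4

struct res { int flags; };
struct dev { struct res resource[NUM_RES]; };

/* Simplified argument counter: yields 2 or 3 for two or three arguments. */
#define MY_COUNT_ARGS_(_1, _2, _3, n, ...) n
#define MY_COUNT_ARGS(...) MY_COUNT_ARGS_(__VA_ARGS__, 3, 2, 1, 0)

/* Two-level paste so the count is expanded before ## is applied. */
#define MY_CONCAT_(a, b) a##b
#define MY_CONCAT(a, b) MY_CONCAT_(a, b)

/* Two-argument form: the iterator is declared inside the loop. */
#define dev_for_each_res_2(d, r)					\
	for (unsigned int idx_ = 0;					\
	     (r) = &(d)->resource[idx_], idx_ < NUM_RES;		\
	     idx_++)

/* Three-argument form: the caller supplies the iterator. */
#define dev_for_each_res_3(d, r, i)					\
	for ((i) = 0;							\
	     (r) = &(d)->resource[(i)], (i) < NUM_RES;			\
	     (i)++)

/* One public name; the argument count selects the helper. */
#define dev_for_each_res(...)						\
	MY_CONCAT(dev_for_each_res_, MY_COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__)

/* Count resources with any flag set, without a visible index. */
static int count_set(struct dev *d)
{
	struct res *r;
	int n = 0;

	assert(d);
	dev_for_each_res(d, r)
		if (r->flags)
			n++;
	return n;
}

/* Return the index of the first resource with a flag set, or -1. */
static int first_set(struct dev *d)
{
	struct res *r;
	unsigned int i;

	assert(d);
	dev_for_each_res(d, r, i)
		if (r->flags)
			return (int)i;
	return -1;
}

/* Example data: resources 1 and 3 have flags set. */
static struct dev demo_dev = { .resource = { {0}, {1}, {0}, {2} } };
```

Here count_set(&demo_dev) uses the two-argument form and
first_set(&demo_dev) the three-argument form, both through the same
dev_for_each_res() name.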

> - split out the pci_resource_n() conversion (Bjorn)
> 
> Changelog v6:
> - dropped unused variable in PPC code (LKP)
> 
> Changelog v5:
> - renamed loop variable to minimize the clash (Keith)
> - addressed smatch warning (Dan)
> - addressed 0-day bot findings (LKP)
> 
> Changelog v4:
> - rebased on top of v6.3-rc1
> - added tag (Krzysztof)
> 
> Changelog v3:
> - rebased on top of v2 by Mika, see above
> - added tag to pcmcia patch (Dominik)
> 
> Changelog v2:
> - refactor to have two macros
> - refactor existing pci_bus_for_each_resource() in the same way and
>   convert users
> 
> Andy Shevchenko (6):
>   kernel.h: Split out COUNT_ARGS() and CONCATENATE()
>   PCI: Introduce pci_resource_n()
>   PCI: Document pci_bus_for_each_resource() to avoid confusion
>   PCI: Allow pci_bus_for_each_resource() to take less arguments
>   EISA: Convert to use less arguments in pci_bus_for_each_resource()
>   pcmcia: Convert to use less arguments in pci_bus_for_each_resource()
> 
> Mika Westerberg (1):
>   PCI: Introduce pci_dev_for_each_resource()
> 
>  .clang-format |  1 +
>  arch/alpha/kernel/pci.c   |  5 +-
>  arch/arm/kernel/bios32.c  | 16 +++--
>  arch/arm/mach-dove/pcie.c | 10 ++--
>  arch/arm/mach-mv78xx0/pcie.c  | 10 ++--
>  arch/arm/mach-orion5x/pci.c   | 10 ++--
>  arch/mips/pci/ops-bcm63xx.c   |  8 +--
>  arch/mips/pci/pci-legacy.c|  3 +-
>  arch/powerpc/kernel/pci-common.c  | 21 +++
>  arch/powerpc/platforms/4xx/pci.c  |  8 +--
>  arch/powerpc/platforms/52xx/mpc52xx_pci.c |  5 +-
>  arch/powerpc/platforms/pseries/pci.c  | 16 ++---
>  arch/sh/drivers/pci/pcie-sh7786.c | 10 ++--
>  arch/sparc/kernel/leon_pci.c  |  5 +-
>  arch/sparc/kernel/pci.c   | 10 ++--
>  arch/sparc/kernel/pcic.c  |  5 +-
>  drivers/eisa/pci_eisa.c   |  4 +-
>  drivers/pci/bus.c |  7 +--
>  drivers/pci/hotplug/shpchp_sysfs.c|  8 +--
>  drivers/pci/pci.c |  3 +-
>  drivers/pci/probe.c   |  2 +-
>  drivers/pci/remove.c  |  5 +-
>  drivers/pci/setup-bus.c   | 37 +---
>  drivers/pci/setup-res.c   |  4 +-
>  drivers/pci/vgaarb.c  | 17 ++
>  drivers/pci/xen-pcifront.c|  4 +-
>  drivers/pcmcia/rsrc_nonstatic.c   |  9 +--
>  drivers/pcmcia/yenta_socket.c |  3 +-
>  drivers/pnp/quirks.c  | 29 -
>  include/linux/args.h  | 13 
>  include/linux/kernel.h|  8 +--
>  include/linux/pci.h   | 72 +++
>  32 files changed, 190 insertions(+), 178 deletions(-)
>  create mode 100644 include/linux/args.h

Applied 2-7 to pci/resource for v6.4, thanks, I really like this!

I omitted

  [1/7] kernel.h: Split out COUNT_ARGS() and CONCATENATE()

only because it's not essential to this series and has only a trivial
one-line impact on include/linux/pci.h.

Bjorn

Re: [PATCH v2 4/5] cxl/pci: Forward RCH downstream port-detected errors to the CXL.mem dev handler

2023-03-28 Thread Bjorn Helgaas

[+cc linux-pci, more error handling folks; beginning of thread at
https://lore.kernel.org/all/20230323213808.398039-1-terry.bow...@amd.com/]

On Mon, Mar 27, 2023 at 11:51:39PM +0200, Robert Richter wrote:
> On 24.03.23 17:36:56, Bjorn Helgaas wrote:

> > > The CXL device driver is then responsible to
> > > enable error reporting in the RCEC's AER cap
> > 
> > I don't know exactly what you mean by "error reporting in the RCEC's
> > AER cap", but IIUC, for non-Root Port devices, generation of ERR_COR/
> > ERR_NONFATAL/ERR_FATAL messages is controlled by the Device Control
> > register and should already be enabled by pci_aer_init().
> > 
> > Maybe you mean setting AER mask/severity specifically for Internal
> > Errors?  I'm hoping to get as much of AER management as we can in the
> 
> Right, this is implemented in patch #5 in function
> rcec_enable_aer_ints().

I think we should add a PCI core interface for this so we can enforce
the AER ownership question (all the crud like pcie_aer_is_native()) in
one place.

> > PCI core and out of drivers, so maybe we need a new PCI interface to
> > do that.
> > 
> > In any event, I assume this sort of configuration would be an
> > enumeration-time thing, while *this* patch is a run-time thing, so
> > maybe this information belongs with a different patch?
> 
> Do you mean once a Restricted CXL host (RCH) is detected, the internal
> errors should be enabled in the device mask, all this done during
> device enumeration? But wouldn't interrupts being enabled then before
> the CXL device is ready?

I'm not sure what you mean by "before the CXL device is ready."  What
makes a CXL device ready, and how do we know when it is ready?

pci_aer_init() turns on PCI_EXP_DEVCTL_CERE, PCI_EXP_DEVCTL_FERE, etc
as soon as we enumerate the device, before any driver claims the
device.  I'm wondering whether we can do this PCI_ERR_COR_INTERNAL and
PCI_ERR_UNC_INTN fiddling around the same time?

> > I haven't worked all the way through this, but I thought Sean Kelley's
> > and Qiuxu Zhuo's work was along the same line and might cover this,
> > e.g.,
> > 
> >   a175102b0a82 ("PCI/ERR: Recover from RCEC AER errors")
> >   579086225502 ("PCI/ERR: Recover from RCiEP AER errors")
> >   af113553d961 ("PCI/AER: Add pcie_walk_rcec() to RCEC AER handling")
> > 
> > But I guess maybe it's not quite the same case?
> 
> Actually, we use this code to handle errors that are reported to the
> RCEC and only implement here the CXL specifics. That is, checking if
> the RCEC receives something from a CXL downstream port and forwarding
> that to a CXL handler (this patch). The handler then checks the AER
> err cap in the RCRB of all CXL downstream ports associated to the RCEC
> (not visible in the PCI hierarchy), but discovered through the :00.0
> RCiEP (patch #5).

There are two calls to pcie_walk_rcec():

  1) The existing one in find_source_device()
  2) The one you add in handle_cxl_error()

Does the call in handle_cxl_error() look at devices that the existing
call in find_source_device() does not?  I'm trying to understand why
we need both calls.

> > > +static bool is_internal_error(struct aer_err_info *info)
> > > +{
> > > + if (info->severity == AER_CORRECTABLE)
> > > + return info->status & PCI_ERR_COR_INTERNAL;
> > > +
> > > + return info->status & PCI_ERR_UNC_INTN;
> > > +}
> > > +
> > > +static void handle_cxl_error(struct pci_dev *dev, struct aer_err_info *info)
> > > +{
> > > + if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC &&
> > > + is_internal_error(info))
> > 
> > What's unique about Internal Errors?  I'm trying to figure out why you
> > wouldn't do this for *all* CXL errors.
> 
> Per CXL specification downstream port errors are signaled using
> internal errors. 

Maybe a spec reference here to explain is_internal_error()?  Is the
point of the check to *exclude* non-internal errors?  Or is it basically
documentation that there shouldn't ever *be* any non-internal errors?
I guess the latter wouldn't make sense because at this point we don't
know whether this is a CXL hierarchy.

> All other errors would be device specific, we cannot
> handle that in a generic CXL driver.

I'm missing the point here.  We don't have any device-specific error
handling in aer.c; it only connects the generic *reporting* mechanism
(AER log registers and Root Port interrupts) to the drivers that do
the device-specific things via err_handler hooks.  I assume we want a
similar model for CXL.

Bjorn

Re: [PATCH v6 1/4] PCI: Introduce pci_dev_for_each_resource()

2023-03-23 Thread Bjorn Helgaas

On Thu, Mar 23, 2023 at 04:30:01PM +0200, Andy Shevchenko wrote:
> On Wed, Mar 22, 2023 at 02:28:04PM -0500, Bjorn Helgaas wrote:
> > On Mon, Mar 20, 2023 at 03:16:30PM +0200, Andy Shevchenko wrote:
> ...
> 
> > > + pci_dev_for_each_resource_p(dev, r) {
> > >   /* zap the 2nd function of the winbond chip */
> > > - if (dev->resource[i].flags & IORESOURCE_IO
> > > - && dev->bus->number == 0 && dev->devfn == 0x81)
> > > - dev->resource[i].flags &= ~IORESOURCE_IO;
> > > - if (dev->resource[i].start == 0 && dev->resource[i].end) {
> > > - dev->resource[i].flags = 0;
> > > - dev->resource[i].end = 0;
> > > + if (dev->bus->number == 0 && dev->devfn == 0x81 &&
> > > + r->flags & IORESOURCE_IO)
> > 
> > This is a nice literal conversion, but it's kind of lame to test
> > bus->number and devfn *inside* the loop here, since they can't change
> > inside the loop.
> 
> Hmm... why are you asking me, even if I may agree on that? It's
> in the original code and out of scope of this series.

Yeah, I don't think it would be *unreasonable* to clean this up at the
same time so the maintainers can look at both at the same time (this
is arch/powerpc/platforms/pseries/pci.c, so Michael, et al), but no
need for you to do anything, certainly.  I can post a follow-up patch.
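
For reference, such a cleanup might look roughly like the following
userspace sketch.  struct dev, struct res, RES_IO and
fixup_winbond_like() are simplified, hypothetical stand-ins for the
kernel types and the pseries fixup; the only point is that the
bus/devfn test is evaluated once, before the loop.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_RES 4
#define RES_IO 0x100UL	/* hypothetical stand-in for IORESOURCE_IO */

struct res {
	unsigned long start, end, flags;
};

struct dev {
	unsigned int bus, devfn;
	struct res resource[NUM_RES];
};

static void fixup_winbond_like(struct dev *d)
{
	assert(d);

	/* Loop-invariant: decide once whether this is the 2nd function. */
	const bool second_fn = d->bus == 0 && d->devfn == 0x81;

	for (int i = 0; i < NUM_RES; i++) {
		struct res *r = &d->resource[i];

		/* zap I/O resources of the 2nd function only */
		if (second_fn && (r->flags & RES_IO))
			r->flags &= ~RES_IO;

		/* this check applies to every device, as in the original */
		if (r->start == 0 && r->end) {
			r->flags = 0;
			r->end = 0;
		}
	}
}

/* Example data: bus 0, devfn 0x81, two populated resources. */
static struct dev demo_dev = {
	.bus = 0, .devfn = 0x81,
	.resource = {
		{ .start = 0x1000, .end = 0x10ff, .flags = RES_IO | 0x1UL },
		{ .start = 0, .end = 0xff, .flags = RES_IO },
	},
};
```

The behavior is meant to match the quoted loop; only the placement of
the bus/devfn comparison changes.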

> > but
> > since we're converging on the "(dev, res)" style, I think we should
> > reverse the names so we have something like:
> > 
> >   pci_dev_for_each_resource(dev, res)
> >   pci_dev_for_each_resource_idx(dev, res, i)
> 
> Wouldn't it be more churn, including pci_bus_for_each_resource() correction?

Yes, it definitely is a little more churn because we already have
pci_bus_for_each_resource() that would have to be changed.

I poked around looking for similar patterns elsewhere with:

  git grep "#define.*for_each_.*_p("
  git grep "#define.*for_each_.*_idx("

I didn't find any other "_p" iterators and just a few "_idx" ones, so
my hope is to follow what little precedent there is, as well as
converge on the basic "*_for_each_resource()" iterators and remove the
"_idx()" versions over time by doing things like the
pci_claim_resource() change.

What do you think?  If it seems like excessive churn, we can do it
as-is and still try to reduce the use of the index variable over time.

Bjorn

Re: [PATCH v6 2/4] PCI: Split pci_bus_for_each_resource_p() out of pci_bus_for_each_resource()

2023-03-22 Thread Bjorn Helgaas

On Mon, Mar 20, 2023 at 03:16:31PM +0200, Andy Shevchenko wrote:
> ...

> -#define pci_bus_for_each_resource(bus, res, i) \
> - for (i = 0; \
> - (res = pci_bus_resource_n(bus, i)) || i < PCI_BRIDGE_RESOURCE_NUM; \
> -  i++)
> +#define __pci_bus_for_each_resource(bus, res, __i, vartype) \
> + for (vartype __i = 0; \
> +  res = pci_bus_resource_n(bus, __i), __i < PCI_BRIDGE_RESOURCE_NUM; \
> +  __i++)
> +
> +#define pci_bus_for_each_resource(bus, res, i) \
> + __pci_bus_for_each_resource(bus, res, i, )
> +
> +#define pci_bus_for_each_resource_p(bus, res) \
> + __pci_bus_for_each_resource(bus, res, __i, unsigned int)
I like these changes a lot, too!

Same comments about _p vs _idx and __pci_bus_for_each_resource(...,
vartype).

Also would prefer 80 char max instead of 81.

Re: [PATCH v6 1/4] PCI: Introduce pci_dev_for_each_resource()

2023-03-22 Thread Bjorn Helgaas

Hi Andy and Mika,

I really like the improvements here.  They make the code read much
better.

On Mon, Mar 20, 2023 at 03:16:30PM +0200, Andy Shevchenko wrote:
> From: Mika Westerberg 
> ...

>  static void fixup_winbond_82c105(struct pci_dev* dev)
>  {
> - int i;
> + struct resource *r;
>   unsigned int reg;
>  
>   if (!machine_is(pseries))
> @@ -251,14 +251,14 @@ static void fixup_winbond_82c105(struct pci_dev* dev)
>   /* Enable LEGIRQ to use INTC instead of ISA interrupts */
>   pci_write_config_dword(dev, 0x40, reg | (1<<11));
>  
> - for (i = 0; i < DEVICE_COUNT_RESOURCE; ++i) {
> + pci_dev_for_each_resource_p(dev, r) {
>   /* zap the 2nd function of the winbond chip */
> - if (dev->resource[i].flags & IORESOURCE_IO
> - && dev->bus->number == 0 && dev->devfn == 0x81)
> - dev->resource[i].flags &= ~IORESOURCE_IO;
> - if (dev->resource[i].start == 0 && dev->resource[i].end) {
> - dev->resource[i].flags = 0;
> - dev->resource[i].end = 0;
> + if (dev->bus->number == 0 && dev->devfn == 0x81 &&
> + r->flags & IORESOURCE_IO)

This is a nice literal conversion, but it's kind of lame to test
bus->number and devfn *inside* the loop here, since they can't change
inside the loop.

> + r->flags &= ~IORESOURCE_IO;
> + if (r->start == 0 && r->end) {
> + r->flags = 0;
> + r->end = 0;
>   }
>   }

>  #define pci_resource_len(dev,bar) \
>   ((pci_resource_end((dev), (bar)) == 0) ? 0 :\
>   \
> -  (pci_resource_end((dev), (bar)) -  \
> -   pci_resource_start((dev), (bar)) + 1))
> +  resource_size(pci_resource_n((dev), (bar))))

I like this change, but it's unrelated to pci_dev_for_each_resource()
and unmentioned in the commit log.

> +#define __pci_dev_for_each_resource(dev, res, __i, vartype) \
> + for (vartype __i = 0; \
> +  res = pci_resource_n(dev, __i), __i < PCI_NUM_RESOURCES; \
> +  __i++)
> +
> +#define pci_dev_for_each_resource(dev, res, i) \
> +   __pci_dev_for_each_resource(dev, res, i, )
> +
> +#define pci_dev_for_each_resource_p(dev, res) \
> + __pci_dev_for_each_resource(dev, res, __i, unsigned int)

This series converts many cases to drop the iterator variable ("i"),
which is fantastic.

Several of the remaining places need the iterator variable only to
call pci_claim_resource(), which could be converted to take a "struct
resource *" directly without much trouble.
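
As a rough illustration of why the index is not strictly needed, here
is a userspace sketch in which the index is recovered from the
resource pointer.  All names (struct dev, struct res, res_index(),
claim_by_index(), claim_by_res()) are hypothetical stand-ins, not the
kernel's pci_claim_resource() or any existing helper.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_RES 6

struct res {
	unsigned long start, end;
	bool claimed;
};

struct dev {
	struct res resource[NUM_RES];
};

/* Index of `r` within d->resource[], or -1 if `r` is not one of them. */
static int res_index(const struct dev *d, const struct res *r)
{
	assert(d && r);
	for (int i = 0; i < NUM_RES; i++)
		if (r == &d->resource[i])
			return i;
	return -1;
}

/* Index-based interface: the caller must carry the loop index around. */
static int claim_by_index(struct dev *d, int idx)
{
	assert(d);
	if (idx < 0 || idx >= NUM_RES)
		return -1;
	d->resource[idx].claimed = true;
	return 0;
}

/* Pointer-based interface: the index is recovered internally. */
static int claim_by_res(struct dev *d, struct res *r)
{
	return claim_by_index(d, res_index(d, r));
}

/* Example data: one device plus a resource that does not belong to it. */
static struct dev demo_dev;
static struct res stray_res;
```

A caller iterating with only a resource pointer could then call
claim_by_res(dev, res) inside the loop; a pointer that does not belong
to the device is rejected with -1.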

We don't have to do that pci_claim_resource() conversion now, but
since we're converging on the "(dev, res)" style, I think we should
reverse the names so we have something like:

  pci_dev_for_each_resource(dev, res)
  pci_dev_for_each_resource_idx(dev, res, i)

Not sure __pci_dev_for_each_resource() is worthwhile since it only
avoids repeating that single "for" statement, and passing in "vartype"
(sometimes empty to implicitly avoid the declaration) is a little
complicated to read.  I think it'd be easier to read like this:

  #define pci_dev_for_each_resource(dev, res)                       \
    for (unsigned int __i = 0;                                      \
         res = pci_resource_n(dev, __i), __i < PCI_NUM_RESOURCES;   \
         __i++)

  #define pci_dev_for_each_resource_idx(dev, res, idx)              \
    for (idx = 0;                                                   \
         res = pci_resource_n(dev, idx), idx < PCI_NUM_RESOURCES;   \
         idx++)

Bjorn

Re: [PATCHv2 pci-next 1/2] PCI/AER: correctable error message as KERN_INFO

2023-03-17 Thread Bjorn Helgaas

On Fri, Mar 17, 2023 at 11:50:22AM -0700, Sathyanarayanan Kuppuswamy wrote:
> On 3/17/23 10:51 AM, Grant Grundler wrote:
> > Since correctable errors have been corrected (and counted), the dmesg output
> > should not be reported as a warning, but rather as "informational".
> > 
> > Otherwise, using a certain well known vendor's PCIe parts in a USB4 docking
> > station, the dmesg buffer can be spammed with correctable errors, 717 bytes
> > per instance, potentially many MB per day.
> 
> Why don't you investigate why you are getting so many correctable errors?
> Isn't solving the problem preferable to hiding the logs?

I hope there's some effort to find the cause of the errors, too.  But
I do think KERN_INFO is a reasonable level for errors that have
already been corrected.  KERN_ERR seems a little bit too severe to me.

Does changing to KERN_INFO keep the messages out of the dmesg log?  I
don't think it does, because *most* kernel messages are at KERN_INFO.
This may be just a commit log clarification.

I would like to know *which* devices are involved.  Is there some
reason for weasel-wording this?  Knowing which devices are involved
helps in triaging issue reports.  If there are any public reports on
mailing lists, etc, we could also cite those here to help users find
this solution.

> > Given the "WARN" priority, these messages have already confused the typical
> > user that stumbles across them, support staff (triaging feedback reports),
> > and more than a few linux kernel devs. Changing to INFO will hide these
> > messages from most audiences.
> > 
> > Signed-off-by: Grant Grundler 
> > ---
> >  drivers/pci/pcie/aer.c | 29 +++--
> >  1 file changed, 19 insertions(+), 10 deletions(-)
> > 
> > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > index f6c24ded134c..cb6b96233967 100644
> > --- a/drivers/pci/pcie/aer.c
> > +++ b/drivers/pci/pcie/aer.c
> > @@ -687,23 +687,29 @@ static void __aer_print_error(struct pci_dev *dev,
> >  {
> > const char **strings;
> > unsigned long status = info->status & ~info->mask;
> > -   const char *level, *errmsg;
> > int i;
> >  
> > if (info->severity == AER_CORRECTABLE) {
> > strings = aer_correctable_error_string;
> > -   level = KERN_WARNING;
> > +   pci_info(dev, "aer_status: 0x%08x, aer_mask: 0x%08x\n",
> > +   info->status, info->mask);
> > } else {
> > strings = aer_uncorrectable_error_string;
> > -   level = KERN_ERR;
> > +   pci_err(dev, "aer_status: 0x%08x, aer_mask: 0x%08x\n",
> > +   info->status, info->mask);
> > }
> >  
> > for_each_set_bit(i, &status, 32) {
> > -   errmsg = strings[i];
> > +   const char *errmsg = strings[i];
> > +
> > if (!errmsg)
> > errmsg = "Unknown Error Bit";
> >  
> > -   pci_printk(level, dev, "   [%2d] %-22s%s\n", i, errmsg,
> > +   if (info->severity == AER_CORRECTABLE)
> > +   pci_info(dev, "   [%2d] %-22s%s\n", i, errmsg,
> > +   info->first_error == i ? " (First)" : "");
> > +   else
> > +   pci_err(dev, "   [%2d] %-22s%s\n", i, errmsg,
> > info->first_error == i ? " (First)" : "");

The -5/+11 line diff and the repetition of the printk strings don't
seem like an improvement compared to the -1/+1 in the v1 patch:

  @@ -692,7 +692,7 @@ static void __aer_print_error(struct pci_dev *dev,

  if (info->severity == AER_CORRECTABLE) {
  strings = aer_correctable_error_string;
  -   level = KERN_WARNING;
  +   level = KERN_INFO;
  } else {

But maybe there's a reason?

> > }
> > pci_dev_aer_stats_incr(dev, info);
> > @@ -724,7 +730,7 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
> > layer = AER_GET_LAYER_ERROR(info->severity, info->status);
> > agent = AER_GET_AGENT(info->severity, info->status);
> >  
> > -   level = (info->severity == AER_CORRECTABLE) ? KERN_WARNING : KERN_ERR;
> > +   level = (info->severity == AER_CORRECTABLE) ? KERN_INFO : KERN_ERR;
> >  
> > pci_printk(level, dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n",
> >aer_error_severity_string[info->severity],
> > @@ -797,14 +803,17 @@ void cper_print_aer(struct pci_dev *dev, int aer_severity,
> > info.mask = mask;
> > info.first_error = PCI_ERR_CAP_FEP(aer->cap_control);
> >  
> > -   pci_err(dev, "aer_status: 0x%08x, aer_mask: 0x%08x\n", status, mask);
> > __aer_print_error(dev, &info);
> > -   pci_err(dev, "aer_layer=%s, aer_agent=%s\n",
> > -   aer_error_layer[layer], aer_agent_string[agent]);
> >  
> > -   if (aer_severity != AER_CORRECTABLE)
> > +   if (aer_severity == AER_CORRECTABLE) {
> > +   pci_info(dev, "aer_layer=%s, aer_agent=%s\n",
> > +   aer_error_layer[layer], aer_ag

Re: [PATCH v3 4/9] scsi: lpfc: Change to use pci_aer_clear_uncorrect_error_status()

2023-03-15 Thread Bjorn Helgaas

On Tue, Dec 06, 2022 at 04:13:35PM -0600, Bjorn Helgaas wrote:
> On Wed, Sep 28, 2022 at 06:59:41PM +0800, Zhuo Chen wrote:
> > lpfc_aer_cleanup_state() requires clearing both fatal and non-fatal
> > uncorrectable error status.
> 
> I don't know what the point of lpfc_aer_cleanup_state() is.  AER
> errors should be handled and cleared by the PCI core, not by
> individual drivers.  Only lpfc, liquidio, and sky2 touch
> PCI_ERR_UNCOR_STATUS.
> 
> But lpfc_aer_cleanup_state() is visible in the
> "lpfc_aer_state_cleanup" sysfs file, so removing it would break any
> userspace that uses it.
> 
> If we can rely on the PCI core to clean up AER errors itself
> (admittedly, that might be a big "if"), maybe lpfc_aer_cleanup_state()
> could just become a no-op?
> 
> Any comment from the LPFC folks?
> 
> Ideally, I would rather not export pci_aer_clear_nonfatal_status() or
> pci_aer_clear_uncorrect_error_status() outside the PCI core at all.

Resurrecting this old thread.  Zhuo, can you figure out where the PCI
core clears these errors, include that in the commit log, and propose
a patch that makes lpfc_aer_cleanup_state() a no-op, by removing the
pci_aer_clear_nonfatal_status() call completely?

Such a patch could be sent to the SCSI maintainers since it doesn't
involve the PCI core.

If it turns out that the PCI core *doesn't* clear these errors, we
should figure out *why* it doesn't and try to change the PCI core so
it does.

> > But using pci_aer_clear_nonfatal_status()
> > will only clear non-fatal error status. To clear both fatal and
> > non-fatal error status, use pci_aer_clear_uncorrect_error_status().
> > 
> > Signed-off-by: Zhuo Chen 
> > ---
> >  drivers/scsi/lpfc/lpfc_attr.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
> > index 09cf2cd0ae60..d835cc0ba153 100644
> > --- a/drivers/scsi/lpfc/lpfc_attr.c
> > +++ b/drivers/scsi/lpfc/lpfc_attr.c
> > @@ -4689,7 +4689,7 @@ static DEVICE_ATTR_RW(lpfc_aer_support);
> >   * Description:
> >   * If the @buf contains 1 and the device currently has the AER support
> >   * enabled, then invokes the kernel AER helper routine
> > - * pci_aer_clear_nonfatal_status() to clean up the uncorrectable
> > + * pci_aer_clear_uncorrect_error_status() to clean up the uncorrectable
> >   * error status register.
> >   *
> >   * Notes:
> > @@ -4715,7 +4715,7 @@ lpfc_aer_cleanup_state(struct device *dev, struct 
> > device_attribute *attr,
> > return -EINVAL;
> >  
> > if (phba->hba_flag & HBA_AER_ENABLED)
> > -   rc = pci_aer_clear_nonfatal_status(phba->pcidev);
> > +   rc = pci_aer_clear_uncorrect_error_status(phba->pcidev);
> >  
> > if (rc == 0)
> > return strlen(buf);
> > -- 
> > 2.30.1 (Apple Git-130)
> >

Re: [PATCH v3 3/9] NTB: Remove pci_aer_clear_nonfatal_status() call

2023-03-15 Thread Bjorn Helgaas

On Wed, Sep 28, 2022 at 06:59:40PM +0800, Zhuo Chen wrote:
> There is no need to clear error status during init code, so remove it.
> 
> Signed-off-by: Zhuo Chen 

Can you send this to the NTB folks?  It doesn't depend on anything, so
no real reason to merge via the PCI tree.

To help reviewers, ideally the commit log would mention where the PCI
core clears the non-fatal errors so the driver doesn't have to.

> ---
>  drivers/ntb/hw/idt/ntb_hw_idt.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
> index 0ed6f809ff2e..fed03217289d 100644
> --- a/drivers/ntb/hw/idt/ntb_hw_idt.c
> +++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
> @@ -2657,8 +2657,6 @@ static int idt_init_pci(struct idt_ntb_dev *ndev)
>   ret = pci_enable_pcie_error_reporting(pdev);
>   if (ret != 0)
>   dev_warn(&pdev->dev, "PCIe AER capability disabled\n");
> - else /* Cleanup nonfatal error status before getting to init */
> - pci_aer_clear_nonfatal_status(pdev);
>  
>   /* First enable the PCI device */
>   ret = pcim_enable_device(pdev);
> -- 
> 2.30.1 (Apple Git-130)
>

Re: [PATCH v3 1/9] PCI/AER: Add pci_aer_clear_uncorrect_error_status() to PCI core

2023-03-15 Thread Bjorn Helgaas

On Wed, Sep 28, 2022 at 06:59:38PM +0800, Zhuo Chen wrote:
> In lpfc_aer_cleanup_state(), uncorrectable error status needs to be
> cleared, which can be done by calling pci_aer_clear_nonfatal_status()
> and pci_aer_clear_fatal_status(). Meanwhile they can be combined in
> one function (the same in dpc_process_error). So add
> pci_aer_clear_uncorrect_error_status() function to PCI core and
> export symbol to other modules which wants to use it.

Sorry for getting back to this so late.

Why does lpfc need this?  I think AER error status should be cleared
by the PCI core, not by individual drivers, so I really would rather
not add a new interface for drivers to use.

> Signed-off-by: Zhuo Chen 
> ---
>  drivers/pci/pcie/aer.c | 16 
>  include/linux/aer.h    |  5 +
>  2 files changed, 21 insertions(+)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index e2d8a74f83c3..4e637121be23 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -286,6 +286,22 @@ void pci_aer_clear_fatal_status(struct pci_dev *dev)
>   pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
>  }
>  
> +int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev)
> +{
> + int aer = dev->aer_cap;
> + u32 status;
> +
> + if (!pcie_aer_is_native(dev))
> + return -EIO;
> +
> + pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, &status);
> + if (status)
> + pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(pci_aer_clear_uncorrect_error_status);
> +
>  /**
>   * pci_aer_raw_clear_status - Clear AER error registers.
>   * @dev: the PCI device
> diff --git a/include/linux/aer.h b/include/linux/aer.h
> index 97f64ba1b34a..154690c278cb 100644
> --- a/include/linux/aer.h
> +++ b/include/linux/aer.h
> @@ -45,6 +45,7 @@ struct aer_capability_regs {
>  int pci_enable_pcie_error_reporting(struct pci_dev *dev);
>  int pci_disable_pcie_error_reporting(struct pci_dev *dev);
>  int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
> +int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev);
>  void pci_save_aer_state(struct pci_dev *dev);
>  void pci_restore_aer_state(struct pci_dev *dev);
>  #else
> @@ -60,6 +61,10 @@ static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
>  {
>   return -EINVAL;
>  }
> +static inline int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev)
> +{
> + return -EINVAL;
> +}
>  static inline void pci_save_aer_state(struct pci_dev *dev) {}
>  static inline void pci_restore_aer_state(struct pci_dev *dev) {}
>  #endif
> -- 
> 2.30.1 (Apple Git-130)
>
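
For readers following the thread: the uncorrectable status register is write-1-to-clear, and the helpers discussed here differ only in which logged bits they write back, as selected by the severity register. The following is a minimal userspace model of that behaviour, not kernel code; struct aer_model and the model_* function names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model, not kernel code. uncor_status stands in for
 * PCI_ERR_UNCOR_STATUS (write-1-to-clear) and uncor_sever for
 * PCI_ERR_UNCOR_SEVER (bit set = that error is reported as fatal).
 */
struct aer_model {
	uint32_t uncor_status;
	uint32_t uncor_sever;
};

/* Emulate a config write to a write-1-to-clear register. */
static void rw1c_write(uint32_t *reg, uint32_t val)
{
	*reg &= ~val;
}

/* Clear only the logged errors whose severity bit is 0 (non-fatal). */
void model_clear_nonfatal(struct aer_model *m)
{
	uint32_t status;

	assert(m);
	status = m->uncor_status & ~m->uncor_sever;
	if (status)
		rw1c_write(&m->uncor_status, status);
}

/* Clear only the logged errors whose severity bit is 1 (fatal). */
void model_clear_fatal(struct aer_model *m)
{
	uint32_t status;

	assert(m);
	status = m->uncor_status & m->uncor_sever;
	if (status)
		rw1c_write(&m->uncor_status, status);
}

/* Clear every logged uncorrectable error, like the helper quoted above. */
void model_clear_uncorrectable(struct aer_model *m)
{
	uint32_t status;

	assert(m);
	status = m->uncor_status;
	if (status)
		rw1c_write(&m->uncor_status, status);
}
```

In this model, calling model_clear_nonfatal() followed by model_clear_fatal() leaves the same state as a single model_clear_uncorrectable() call, which is the equivalence the quoted commit log relies on.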

Re: [PATCH 1/1] PCI: layerscape: Add the workaround for A-010305

2023-03-14 Thread Bjorn Helgaas

On Thu, Jan 12, 2023 at 02:44:33PM -0500, Frank Li wrote:
> From: Xiaowei Bao 
> 
> When a link down or hot reset event occurs, the PCI Express EP
> controller's Link Capabilities Register should retain the values of
> the Maximum Link Width and Supported Link Speed configured by RCW.

Can you rework this to say what the patch does and why it's necessary?

Apparently it's a workaround for some issue in A-010305?  The subject
line could also use more content.  What is A-010305?  What is the
problem this works around?

I don't see a check for A-010305; do *all* devices handled by this
driver have this problem?

The PCIe Link Capabilities register is supposed to be read-only; maybe this
device loses the value on link down or hot reset?  And I guess the
device interrupts on link up/down and reset, and you restore the value
then?

Link Capabilities contains several things other than Max Link Width
and Max Link Speed.  But they don't need to be restored?

What is RCW?

> Signed-off-by: Xiaowei Bao 
> Signed-off-by: Hou Zhiqiang 
> Signed-off-by: Frank Li 
> ---
>  .../pci/controller/dwc/pci-layerscape-ep.c    | 112 +-
>  1 file changed, 111 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> index ed5cfc9408d9..1b884854c18e 100644
> --- a/drivers/pci/controller/dwc/pci-layerscape-ep.c
> +++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> @@ -18,6 +18,22 @@
>  
>  #include "pcie-designware.h"
>  
> +#define PCIE_LINK_CAP		0x7C	/* PCIe Link Capabilities*/

Is this something you can find by searching the capability list
instead of hard-coding the config space offset?
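
To illustrate what searching the capability list means here, this is a rough userspace sketch that walks the standard capability list in a 256-byte config-space image and derives the Link Capabilities offset from the PCIe capability position. The find_capability() function and the test layout (PCIe capability at 0x70, so that Link Capabilities lands at 0x7C as in the quoted patch) are assumptions for illustration. In the driver itself, the DesignWare core's dw_pcie_find_capability() would presumably be the natural choice.

```c
#include <stdint.h>

/* Register offsets and IDs as defined by the PCI and PCIe specs. */
#define PCI_CAPABILITY_LIST	0x34	/* offset of first capability pointer */
#define PCI_CAP_LIST_ID		0	/* capability ID byte */
#define PCI_CAP_LIST_NEXT	1	/* next-capability pointer byte */
#define PCI_CAP_ID_EXP		0x10	/* PCI Express capability */
#define PCI_EXP_LNKCAP		0x0c	/* Link Capabilities, relative to cap */

/*
 * Hypothetical helper: return the config-space offset of capability
 * cap_id in a 256-byte image, or 0 if it is not present.
 */
uint8_t find_capability(const uint8_t *cfg, uint8_t cap_id)
{
	uint8_t pos = cfg[PCI_CAPABILITY_LIST] & ~3;
	int ttl = 48;	/* bound the walk in case the list loops */

	/* Capabilities live above the 64-byte standard header. */
	while (pos >= 0x40 && ttl--) {
		if (cfg[pos + PCI_CAP_LIST_ID] == cap_id)
			return pos;
		pos = cfg[pos + PCI_CAP_LIST_NEXT] & ~3;
	}

	return 0;
}
```

With the offset returned by the search, the register address becomes find_capability(cfg, PCI_CAP_ID_EXP) + PCI_EXP_LNKCAP instead of a fixed 0x7C.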

> +#define MAX_LINK_SP_MASK 0x0F
> +#define MAX_LINK_W_MASK  0x3F
> +#define MAX_LINK_W_SHIFT 4

These look like they should use PCI_EXP_LNKCAP_SLS and
PCI_EXP_LNKCAP_MLW instead of defining new ones.
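
As a sketch of what using the existing masks could look like, the userspace example below builds the two fields from PCI_EXP_LNKCAP_SLS and PCI_EXP_LNKCAP_MLW with a simplified stand-in for the kernel's FIELD_PREP(). The mask values match include/uapi/linux/pci_regs.h; field_prep(), mask_shift() and lnkcap_restore() are hypothetical names, and the read-modify-write shape is an assumption about how the other Link Capabilities bits might be preserved.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_EXP_LNKCAP_SLS	0x0000000f	/* Supported Link Speeds */
#define PCI_EXP_LNKCAP_MLW	0x000003f0	/* Maximum Link Width */

/* Position of the lowest set bit in mask. */
static unsigned int mask_shift(uint32_t mask)
{
	unsigned int shift = 0;

	assert(mask != 0);
	while (!(mask & 1)) {
		mask >>= 1;
		shift++;
	}
	return shift;
}

/* Simplified stand-in for FIELD_PREP(): place val into the mask's field. */
static uint32_t field_prep(uint32_t mask, uint32_t val)
{
	unsigned int shift = mask_shift(mask);

	assert((val & ~(mask >> shift)) == 0);	/* val must fit the field */
	return (val << shift) & mask;
}

/* Replace only the speed and width fields, keeping all other bits. */
uint32_t lnkcap_restore(uint32_t lnkcap, uint8_t max_speed, uint8_t max_width)
{
	lnkcap &= ~(uint32_t)(PCI_EXP_LNKCAP_SLS | PCI_EXP_LNKCAP_MLW);
	lnkcap |= field_prep(PCI_EXP_LNKCAP_SLS, max_speed);
	lnkcap |= field_prep(PCI_EXP_LNKCAP_MLW, max_width);
	return lnkcap;
}
```

Deriving the shift from the mask removes the need for a separate MAX_LINK_W_SHIFT definition, and masking before OR-ing leaves the remaining Link Capabilities fields untouched.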

> +/* PEX PFa PCIE pme and message interrupt registers*/
> +#define PEX_PF0_PME_MES_DR 0xC0020
> +#define PEX_PF0_PME_MES_DR_LUD (1 << 7)
> +#define PEX_PF0_PME_MES_DR_LDD (1 << 9)
> +#define PEX_PF0_PME_MES_DR_HRD (1 << 10)
> +
> +#define PEX_PF0_PME_MES_IER0xC0028
> +#define PEX_PF0_PME_MES_IER_LUDIE  (1 << 7)
> +#define PEX_PF0_PME_MES_IER_LDDIE  (1 << 9)
> +#define PEX_PF0_PME_MES_IER_HRDIE  (1 << 10)
> +
>  #define to_ls_pcie_ep(x) dev_get_drvdata((x)->dev)
>  
>  struct ls_pcie_ep_drvdata {
> @@ -30,8 +46,90 @@ struct ls_pcie_ep {
>   struct dw_pcie  *pci;
>   struct pci_epc_features *ls_epc;
>   const struct ls_pcie_ep_drvdata *drvdata;
> + u8  max_speed;
> + u8  max_width;
> + bool big_endian;
> + int irq;
>  };
>  
> +static u32 ls_lut_readl(struct ls_pcie_ep *pcie, u32 offset)
> +{
> + struct dw_pcie *pci = pcie->pci;
> +
> + if (pcie->big_endian)
> + return ioread32be(pci->dbi_base + offset);
> + else
> + return ioread32(pci->dbi_base + offset);
> +}
> +
> +static void ls_lut_writel(struct ls_pcie_ep *pcie, u32 offset,
> +   u32 value)
> +{
> + struct dw_pcie *pci = pcie->pci;
> +
> + if (pcie->big_endian)
> + iowrite32be(value, pci->dbi_base + offset);
> + else
> + iowrite32(value, pci->dbi_base + offset);
> +}
> +
> +static irqreturn_t ls_pcie_ep_event_handler(int irq, void *dev_id)
> +{
> + struct ls_pcie_ep *pcie = (struct ls_pcie_ep *)dev_id;
> + struct dw_pcie *pci = pcie->pci;
> + u32 val;
> +
> + val = ls_lut_readl(pcie, PEX_PF0_PME_MES_DR);
> + if (!val)
> + return IRQ_NONE;
> +
> + if (val & PEX_PF0_PME_MES_DR_LUD)
> + dev_info(pci->dev, "Detect the link up state !\n");
> + else if (val & PEX_PF0_PME_MES_DR_LDD)
> + dev_info(pci->dev, "Detect the link down state !\n");
> + else if (val & PEX_PF0_PME_MES_DR_HRD)
> + dev_info(pci->dev, "Detect the hot reset state !\n");

No space before "!".  Seems possibly more verbose than necessary,
since the endpoint may be reset as part of normal operation.

> + dw_pcie_dbi_ro_wr_en(pci);
> + dw_pcie_writew_dbi(pci, PCIE_LINK_CAP,
> +(pcie->max_width << MAX_LINK_W_SHIFT) |

Use FIELD_PREP() so you don't need a shift.

> +pcie->max_speed);
> + dw_pcie_dbi_ro_wr_dis(pci);
> +
> + ls_lut_writel(pcie, PEX_PF0_PME_MES_DR, val);
> +
> + return IRQ_HANDLED;
> +}
> +
> +static int ls_pcie_ep_interrupt_init(struct ls_pcie_ep *pcie,
> +  struct platform_device *pdev)
> +{
> + u32 val;
> + int ret;
> +
> + pcie->irq = platform_get_irq_byname(pdev, "pme");
> + if (pcie->irq < 0) {
> + dev_err(&pdev->dev, "Can't get '

Re: [PATCH] PCI/AER: correctable error message as KERN_INFO

2023-03-14 Thread Bjorn Helgaas

On Tue, Feb 28, 2023 at 10:04:53PM -0800, Grant Grundler wrote:
> Since correctable errors have been corrected (and counted), the dmesg output
> should not be reported as a warning, but rather as "informational".
> 
> Otherwise, using a certain well known vendor's PCIe parts in a USB4 docking
> station, the dmesg buffer can be spammed with correctable errors, 717 bytes
> per instance, potentially many MB per day.
> 
> Given the "WARN" priority, these messages have already confused the typical
> user that stumbles across them, support staff (triaging feedback reports),
> and more than a few linux kernel devs. Changing to INFO will hide these
> messages from most audiences.
> 
> Signed-off-by: Grant Grundler 
> ---
> This patch will likely conflict with:
>   
> https://lore.kernel.org/all/20230103165548.570377-1-rajat.khandel...@linux.intel.com/
> 
> which I'd also like to see upstream. Please let me know to resubmit
> mine if Rajat's patch lands first. Or feel free to fix up this one.

Yes.  I think it makes sense to separate this into two patches:

  1) Log correctable errors as KERN_INFO instead of KERN_WARNING, and

  2) Rate-limit correctable error logging.

>  drivers/pci/pcie/aer.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index f6c24ded134c..e4cf3ec40d66 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -692,7 +692,7 @@ static void __aer_print_error(struct pci_dev *dev,
>  
>   if (info->severity == AER_CORRECTABLE) {
>   strings = aer_correctable_error_string;
> - level = KERN_WARNING;
> + level = KERN_INFO;
>   } else {
>   strings = aer_uncorrectable_error_string;
>   level = KERN_ERR;
> @@ -724,7 +724,7 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
>   layer = AER_GET_LAYER_ERROR(info->severity, info->status);
>   agent = AER_GET_AGENT(info->severity, info->status);
>  
> - level = (info->severity == AER_CORRECTABLE) ? KERN_WARNING : KERN_ERR;
> + level = (info->severity == AER_CORRECTABLE) ? KERN_INFO : KERN_ERR;
>  
>   pci_printk(level, dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n",
>  aer_error_severity_string[info->severity],

Shouldn't we do the same in the cper_print_aer() path?  That path
currently uses pci_err() and then calls __aer_print_error(), so the
initial message will always be KERN_ERR, and the decoding done by
__aer_print_error() will be KERN_INFO (for correctable) or KERN_ERR.

Seems like a shame to do the same test in three places, but would
require a little more refactoring to avoid that.

Bjorn
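
One possible shape for the refactoring mentioned above is a single helper that maps severity to log level, which __aer_print_error(), aer_print_error() and cper_print_aer() could all call. The sketch below is a userspace approximation, not a proposed patch: aer_printk_level() and aer_format_header() are hypothetical names, and the severity labels are placeholders rather than the kernel's strings.

```c
#include <assert.h>
#include <stdio.h>

/* Log-level prefixes as the kernel defines them (SOH + digit). */
#define KERN_SOH	"\001"
#define KERN_ERR	KERN_SOH "3"
#define KERN_INFO	KERN_SOH "6"

/* Severity values as used by the AER code. */
#define AER_NONFATAL	0
#define AER_FATAL	1
#define AER_CORRECTABLE	2

/* Hypothetical helper: the one place that decides the log level. */
const char *aer_printk_level(int severity)
{
	return severity == AER_CORRECTABLE ? KERN_INFO : KERN_ERR;
}

/* Placeholder labels, for illustration only. */
static const char *const severity_label[] = {
	"non-fatal", "fatal", "correctable",
};

/* Stand-in for a call site: format the first line of an error report. */
int aer_format_header(char *buf, size_t len, int severity)
{
	assert(severity >= AER_NONFATAL && severity <= AER_CORRECTABLE);
	return snprintf(buf, len, "%sPCIe Bus Error: severity=%s\n",
			aer_printk_level(severity), severity_label[severity]);
}
```

With the decision in one function, changing correctable errors from KERN_WARNING to KERN_INFO would touch a single line, and the cper_print_aer() path would pick up the same level for its initial message.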
