Re: [PATCHv3 pci-next 1/2] PCI/AER: correctable error message as KERN_INFO

2024-03-26 Thread Bjorn Helgaas
On Tue, Mar 26, 2024 at 09:39:54AM +0800, Ethan Zhao wrote:
> On 3/25/2024 6:15 PM, Xi Ruoyao wrote:
> > On Mon, 2024-03-25 at 16:45 +0800, Ethan Zhao wrote:
> > > On 3/25/2024 1:19 AM, Xi Ruoyao wrote:
> > > > On Mon, 2023-09-18 at 14:39 -0500, Bjorn Helgaas wrote:
> > > > > On Mon, Sep 18, 2023 at 07:42:30PM +0800, Xi Ruoyao wrote:
> > > > > > ...
> > > > > > > My workstation suffers from too much correctable AER reporting as well
> > > > > > > (related to Intel's errata "RPL013: Incorrectly Formed PCIe Packets May
> > > > > > > Generate Correctable Errors" and/or the motherboard design, I guess).
> > > > > We should rate-limit correctable error reporting so it's not
> > > > > overwhelming.
> > > > > 
> > > > > At the same time, I'm *also* interested in the cause of these errors,
> > > > > in case there's a Linux defect or a hardware erratum that we can work
> > > > > around.  Do you have a bug report with any more details, e.g., a dmesg
> > > > > log and "sudo lspci -vv" output?
> > > > Hi Bjorn,
> > > > 
> > > > Sorry for the *very* late reply (somehow I didn't see the reply at all
> > > > before it was removed by my cron job, and now I just salvaged it from
> > > > lore.kernel.org...)
> > > > 
> > > > The dmesg is like:
> > > > 
> > > > [  882.456994] pcieport :00:1c.1: AER: Multiple Correctable error message received from :00:1c.1
> > > > [  882.457002] pcieport :00:1c.1: AER: found no error details for :00:1c.1
> > > > [  882.457003] pcieport :00:1c.1: AER: Multiple Correctable error message received from :06:00.0
> > > > [  883.545763] pcieport :00:1c.1: AER: Multiple Correctable error message received from :00:1c.1
> > > > [  883.545789] pcieport :00:1c.1: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
> > > > [  883.545790] pcieport :00:1c.1:   device [8086:7a39] error status/mask=0001/2000
> > > > [  883.545792] pcieport :00:1c.1:    [ 0] RxErr  (First)
> > > > [  883.545794] pcieport :00:1c.1: AER:   Error of this Agent is reported first
> > > > [  883.545798] r8169 :06:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Transmitter ID)
> > > > [  883.545799] r8169 :06:00.0:   device [10ec:8125] error status/mask=1101/e000
> > > > [  883.545800] r8169 :06:00.0:    [ 0] RxErr  (First)
> > > > [  883.545801] r8169 :06:00.0:    [ 8] Rollover
> > > > [  883.545802] r8169 :06:00.0:    [12] Timeout
> > > > [  883.545815] pcieport :00:1c.1: AER: Correctable error message received from :00:1c.1
> > > > [  883.545823] pcieport :00:1c.1: AER: found no error details for :00:1c.1
> > > > [  883.545824] pcieport :00:1c.1: AER: Multiple Correctable error message received from :06:00.0
> > > > 
> > > > lspci output attached.
> > > > 
> > > > Intel has issued an erratum, "RPL013", saying:
> > > > 
> > > > "Under complex microarchitectural conditions, the PCIe controller may
> > > > transmit an incorrectly formed Transaction Layer Packet (TLP), which
> > > > will fail CRC checks. When this erratum occurs, the PCIe end point may
> > > > record correctable errors resulting in either a NAK or link recovery.
> > > > Intel® has not observed any functional impact due to this erratum."
> > > > 
> > > > But I'm really unsure if it describes my issue.
> > > > 
> > > > Do you think I have some broken hardware and I should replace the CPU
> > > > and/or the motherboard (where the r8169 is soldered)?  I've noticed that
> > > > my 13900K is almost impossible to overclock (despite being a K), but I've
> > > > not encountered any issue other than this AER reporting so far after I
> > > > gave up overclocking.
> > > It seems there are two r8169 NICs on your board and only :06:00.0 reports
> > > AER errors. How about the other one, the :07:00.0 NIC?
> > It never happens to :07:00.0, even if I plug the ethernet cable into
> > it instead of :06:00.0.
> 
> So something is wrong with the physical layer, I guess.
> 
> > Maybe I should just use :07:00.0 and blacklist :06:00.0 as I
> > don't need two NICs?
> 
> Yup.
> Rate-limiting the AER warning is another option, instead of changing WARN
> to INFO. If a flood of corrected errors happens, even while the function
> still works, it suggests something is already wrong and will likely get
> worse; that is the meaning of WARN, I think.

We should fix this.  IMHO Correctable Errors should be "info" level,
non-alarming, and rate-limited.  They're basically hints about link
integrity.
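
To make the "rate-limited" part concrete, here is a minimal userspace
sketch of an interval/burst limiter of the kind that could gate
correctable-error messages.  It is not the kernel's implementation; the
struct and function names (cor_ratelimit, cor_ratelimit_ok) and the
millisecond clock are assumptions made for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an interval/burst rate limiter (illustration only). */
struct cor_ratelimit {
	uint64_t interval_ms;	/* length of one window */
	unsigned int burst;	/* messages allowed per window */
	uint64_t window_start;	/* start time of the current window */
	unsigned int printed;	/* messages emitted in this window */
	unsigned int missed;	/* messages suppressed in this window */
};

/* Return true if a message arriving at time now_ms may be printed. */
static bool cor_ratelimit_ok(struct cor_ratelimit *rl, uint64_t now_ms)
{
	if (now_ms - rl->window_start >= rl->interval_ms) {
		/* The window expired: start a new one and reset the counters. */
		rl->window_start = now_ms;
		rl->printed = 0;
		rl->missed = 0;
	}

	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}

	/* Over budget for this window: count the message and drop it. */
	rl->missed++;
	return false;
}
```

With a burst of 10 per 5000 ms, a flood like the one in the dmesg above
would produce at most 10 lines per window, and the missed counter could
be reported once when the next window opens.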

Bjorn


Re: [PATCH 3/4] PCI: Add TLP Prefix reading into pcie_read_tlp_log()

2024-03-22 Thread Bjorn Helgaas
On Tue, Feb 06, 2024 at 03:57:16PM +0200, Ilpo Järvinen wrote:
> pcie_read_tlp_log() handles only 4 TLP Header Log DWORDs but TLP Prefix
> Log (PCIe r6.1 secs 7.8.4.12 & 7.9.14.13) may also be present.

s/TLP Header Log/Header Log/ to match spec terminology (also below)

> Generalize pcie_read_tlp_log() and struct pcie_tlp_log to handle also
> TLP Prefix Log. The layout of relevant registers in AER and DPC
> Capability is not identical but the offsets of TLP Header Log and TLP
> Prefix Log vary so the callers must pass the offsets to
> pcie_read_tlp_log().

s/is not identical but/is identical, but/ ?

The spec is a little obtuse about Header Log Size.

> Convert eetlp_prefix_path into integer called eetlp_prefix_max and
> make it available also when CONFIG_PCI_PASID is not configured to
> be able to determine the number of E-E Prefixes.

I think this eetlp_prefix_path piece is right, but would be nice in a
separate patch since it's a little bit different piece to review.

> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -11336,7 +11336,9 @@ static pci_ers_result_t ixgbe_io_error_detected(struct pci_dev *pdev,
>   if (!pos)
>   goto skip_bad_vf_detection;
>  
> - ret = pcie_read_tlp_log(pdev, pos + PCI_ERR_HEADER_LOG, &tlp_log);
> + ret = pcie_read_tlp_log(pdev, pos + PCI_ERR_HEADER_LOG,
> + pos + PCI_ERR_PREFIX_LOG,
> + aer_tlp_log_len(pdev), &tlp_log);
>   if (ret < 0) {
>   ixgbe_check_cfg_remove(hw, pdev);
>   goto skip_bad_vf_detection;

We applied the patch to export pcie_read_tlp_log(), but I'm having
second thoughts about it.   I don't think drivers really have any
business here, and I'd rather not expose either pcie_read_tlp_log() or
aer_tlp_log_len().

This part of ixgbe_io_error_detected() was added by 83c61fa97a7d
("ixgbe: Add protection from VF invalid target DMA"), and to me it
looks like debug code that probably doesn't need to be there as long
as the PCI core does the appropriate logging.

Bjorn


Re: [PATCH 2/4] PCI: Generalize TLP Header Log reading

2024-03-14 Thread Bjorn Helgaas
[+cc Greg, Jeff -- ancient history, I know, sorry!]

On Tue, Feb 06, 2024 at 03:57:15PM +0200, Ilpo Järvinen wrote:
> Both AER and DPC RP PIO provide TLP Header Log registers (PCIe r6.1
> secs 7.8.4 & 7.9.14) to convey error diagnostics but the struct is
> named after AER as the struct aer_header_log_regs. Also, not all places
> that handle TLP Header Log use the struct and the struct members are
> named individually.
> 
> Generalize the struct name and members, and use it consistently where
> TLP Header Log is being handled so that a pcie_read_tlp_log() helper
> can be easily added.
> 
> Signed-off-by: Ilpo Järvinen 

> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
> b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index bd541527c8c7..5fdf37968b2d 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -1,6 +1,7 @@
>  // SPDX-License-Identifier: GPL-2.0
>  /* Copyright(c) 1999 - 2018 Intel Corporation. */
>  
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -391,22 +392,6 @@ u16 ixgbe_read_pci_cfg_word(struct ixgbe_hw *hw, u32 reg)
>   return value;
>  }
>  
> -#ifdef CONFIG_PCI_IOV
> -static u32 ixgbe_read_pci_cfg_dword(struct ixgbe_hw *hw, u32 reg)
> -{
> - struct ixgbe_adapter *adapter = hw->back;
> - u32 value;
> -
> - if (ixgbe_removed(hw->hw_addr))
> - return IXGBE_FAILED_READ_CFG_DWORD;
> - pci_read_config_dword(adapter->pdev, reg, &value);
> - if (value == IXGBE_FAILED_READ_CFG_DWORD &&
> - ixgbe_check_cfg_remove(hw, adapter->pdev))
> - return IXGBE_FAILED_READ_CFG_DWORD;
> - return value;
> -}
> -#endif /* CONFIG_PCI_IOV */
> -
>  void ixgbe_write_pci_cfg_word(struct ixgbe_hw *hw, u32 reg, u16 value)
>  {
>   struct ixgbe_adapter *adapter = hw->back;
> @@ -11332,8 +11317,8 @@ static pci_ers_result_t ixgbe_io_error_detected(struct pci_dev *pdev,
>  #ifdef CONFIG_PCI_IOV
>   struct ixgbe_hw *hw = &adapter->hw;
>   struct pci_dev *bdev, *vfdev;
> - u32 dw0, dw1, dw2, dw3;
> - int vf, pos;
> + struct pcie_tlp_log tlp_log;
> + int vf, pos, ret;
>   u16 req_id, pf_func;
>  
>   if (adapter->hw.mac.type == ixgbe_mac_82598EB ||
> @@ -11351,14 +11336,13 @@ static pci_ers_result_t ixgbe_io_error_detected(struct pci_dev *pdev,
>   if (!pos)
>   goto skip_bad_vf_detection;
>  
> - dw0 = ixgbe_read_pci_cfg_dword(hw, pos + PCI_ERR_HEADER_LOG);
> - dw1 = ixgbe_read_pci_cfg_dword(hw, pos + PCI_ERR_HEADER_LOG + 4);
> - dw2 = ixgbe_read_pci_cfg_dword(hw, pos + PCI_ERR_HEADER_LOG + 8);
> - dw3 = ixgbe_read_pci_cfg_dword(hw, pos + PCI_ERR_HEADER_LOG + 12);
> - if (ixgbe_removed(hw->hw_addr))
> + ret = pcie_read_tlp_log(pdev, pos + PCI_ERR_HEADER_LOG, &tlp_log);
> + if (ret < 0) {
> + ixgbe_check_cfg_remove(hw, pdev);
>   goto skip_bad_vf_detection;
> + }
>  
> - req_id = dw1 >> 16;
> + req_id = tlp_log.dw[1] >> 16;
>   /* On the 82599 if bit 7 of the requestor ID is set then it's a VF */
>   if (!(req_id & 0x0080))
>   goto skip_bad_vf_detection;
> @@ -11369,9 +11353,8 @@ static pci_ers_result_t ixgbe_io_error_detected(struct pci_dev *pdev,
>  
>   vf = FIELD_GET(0x7F, req_id);
>   e_dev_err("VF %d has caused a PCIe error\n", vf);
> - e_dev_err("TLP: dw0: %8.8x\tdw1: %8.8x\tdw2: "
> - "%8.8x\tdw3: %8.8x\n",
> - dw0, dw1, dw2, dw3);
> + e_dev_err("TLP: dw0: %8.8x\tdw1: %8.8x\tdw2: %8.8x\tdw3: %8.8x\n",
> +   tlp_log.dw[0], tlp_log.dw[1], tlp_log.dw[2], tlp_log.dw[3]);
>   switch (adapter->hw.mac.type) {
>   case ixgbe_mac_82599EB:
>   device_id = IXGBE_82599_VF_DEVICE_ID;

The rest of this patch is headed for v6.10, but I dropped this ixgbe
change for now.

These TLP Log registers are generic, not device-specific, and if
there's something lacking in the PCI core that leads to ixgbe reading
and dumping them itself, I'd rather improve the PCI core so all
drivers will benefit without having to add code like this.

83c61fa97a7d ("ixgbe: Add protection from VF invalid target DMA") [1]
added the ixgbe TLP Log dumping way back in v3.2 (2012).  It does do
some device-specific VF checking and so on, but even back then, it
looks like the PCI core would have dumped the log itself [2], so I
don't know why we needed the extra dumping in ixgbe.

So what I'd really like is to remove the TLP Log reading and printing
from ixgbe completely, but keep the VF checking.
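
For reference, the device-specific VF check that would remain is small.
The sketch below mirrors the quoted hunk (Requester ID from the upper 16
bits of DW1, bit 7 as the VF marker, FIELD_GET(0x7F, ...) for the
index), but it is a userspace illustration: struct tlp_header_log and
the three helper names are hypothetical, not kernel or ixgbe API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the four Header Log DWORDs. */
struct tlp_header_log {
	uint32_t dw[4];
};

/* As in the quoted hunk: the Requester ID is taken from DW1 bits 31:16. */
static uint16_t tlp_requester_id(const struct tlp_header_log *log)
{
	return (uint16_t)(log->dw[1] >> 16);
}

/* Per the ixgbe comment: on the 82599, bit 7 of the Requester ID marks a VF. */
static bool req_id_is_vf(uint16_t req_id)
{
	return (req_id & 0x0080) != 0;
}

/* Equivalent of FIELD_GET(0x7F, req_id) in the quoted hunk. */
static unsigned int req_id_vf_index(uint16_t req_id)
{
	return req_id & 0x7F;
}
```

Only the Requester ID is needed for this check, so it does not by itself
require the driver to print the log.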

Bjorn

[1] https://git.kernel.org/linus/83c61fa97a7d
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/pcie/aer/aerdrv_errprint.c?id=83c61fa97a7d#n181


Re: [PATCH 0/4] PCI: Consolidate TLP Log reading and printing

2024-03-08 Thread Bjorn Helgaas
On Tue, Feb 06, 2024 at 03:57:13PM +0200, Ilpo Järvinen wrote:
> This series consolidates AER & DPC TLP Log handling code. Helpers are
> added for reading and printing the TLP Log and the format is made to
> include E-E Prefixes in both cases (previously only one DPC RP PIO
> displayed the E-E Prefixes).
> 
> I'd appreciate if people familiar with ixgbe could check the error
> handling conversion within the driver is correct.
> 
> Ilpo Järvinen (4):
>   PCI/AER: Cleanup register variable
>   PCI: Generalize TLP Header Log reading

I applied these first two to pci/aer for v6.9, thanks, these are all
nice improvements!

I postponed the ixgbe part for now because I think we should get an
ack from those maintainers or just send it to them since it subtly
changes the error and device removal checking there.

>   PCI: Add TLP Prefix reading into pcie_read_tlp_log()
>   PCI: Create helper to print TLP Header and Prefix Log

I'll respond to these with some minor comments.

>  drivers/firmware/efi/cper.c   |  4 +-
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 39 +++--
>  drivers/pci/ats.c |  2 +-
>  drivers/pci/pci.c | 79 +++
>  drivers/pci/pci.h |  2 +-
>  drivers/pci/pcie/aer.c| 28 ++-
>  drivers/pci/pcie/dpc.c| 31 
>  drivers/pci/probe.c   | 14 ++--
>  include/linux/aer.h   | 16 ++--
>  include/linux/pci.h   |  2 +-
>  include/ras/ras_event.h   | 10 +--
>  include/uapi/linux/pci_regs.h |  2 +
>  12 files changed, 145 insertions(+), 84 deletions(-)
> 
> -- 
> 2.39.2
> 


Re: [PATCH 1/1] PCI/portdrv: Allow DPC if the OS controls AER natively.

2024-02-21 Thread Bjorn Helgaas
[+cc Mahesh, Oliver, linuxppc-dev, since I mentioned powerpc below.
Probably not of interest since this is about the ACPI EDR feature, but
just FYI]

On Wed, Feb 21, 2024 at 05:11:04PM -0600, Bjorn Helgaas wrote:
> On Tue, Jan 23, 2024 at 09:59:21AM -0600, Bjorn Helgaas wrote:
> > On Mon, Jan 22, 2024 at 06:37:48PM -0800, Kuppuswamy Sathyanarayanan wrote:
> > > On 1/22/24 11:32 AM, Bjorn Helgaas wrote:
> > > > On Mon, Jan 08, 2024 at 05:15:08PM -0700, Matthew W Carlis wrote:
> > > >> A small part is probably historical; we've been using DPC on PCIe
> > > >> switches since before there was any EDR support in the kernel. It
> > > >> looks like there was a PCIe DPC ECN as early as Feb 2012, but this
> > > >> EDR/DPC fw ECN didn't come in till Jan 2019 & kernel support for ECN
> > > > >> was even later. It's not immediately clear I would want to use EDR in
> > > > >> my newer architectures & then there are also the older architectures
> > > > >> still requiring support. When I submitted this patch I came at it
> > > > >> with the approach of trying to keep the old behavior & still support
> > > > >> the newer EDR behavior. Bjorn's patch from Dec 28 2023 would seem to
> > > >> change the behavior for both root ports & switch ports, requiring
> > > >> them to set _OSC Control Field bit 7 (DPC) and _OSC Support Field
> > > >> bit 7 (EDR) or a kernel command line value. I think no matter what,
> > > >> we want to ensure that PCIe Root Ports and PCIe switches arrive at
> > > >> the same policy here.
> > > > Is there an approved DPC ECN to the PCI Firmware spec that adds DPC
> > > > control negotiation, but does *not* add the EDR requirement?
> > > >
> > > > I'm looking at
> > > > https://members.pcisig.com/wg/PCI-SIG/document/previewpdf/12888, which
> > > > seems to be the final "Downstream Port Containment Related
> > > > Enhancements" ECN, which is dated 1/28/2019 and applies to the PCI
> > > > Firmware spec r3.2.
> > > >
> > > > It adds bit 7, "PCI Express Downstream Port Containment Configuration
> > > > control", to the passed-in _OSC Control field, which indicates that
> > > > the OS supports both "native OS control and firmware ownership models
> > > > (i.e. Error Disconnect Recover notification) of Downstream Port
> > > > Containment."
> > > >
> > > > It also adds the dependency that "If the OS sets bit 7 of the Control
> > > > field, it must set bit 7 of the Support field, indicating support for
> > > > the Error Disconnect Recover event."
> > > >
> > > > So I'm trying to figure out if the "support DPC but not EDR" situation
> > > > was ever a valid place to be.  Maybe it's a mistake to have separate
> > > > CONFIG_PCIE_DPC and CONFIG_PCIE_EDR options.
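
The ECN dependency quoted above (Control bit 7 implies Support bit 7)
can be written as a one-line check.  This is only a sketch: the macro
and function names are made up for illustration, and the two fields are
modeled as plain 32-bit values rather than real _OSC buffers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names; both features sit at bit 7 of their own field. */
#define OSC_SUPPORT_EDR_BIT	0x00000080u	/* Support field, bit 7 */
#define OSC_CONTROL_DPC_BIT	0x00000080u	/* Control field, bit 7 */

/*
 * Return true if the pair of fields respects the dependency: requesting
 * DPC control is only valid when EDR support is also advertised.
 */
static bool osc_dpc_request_valid(uint32_t support, uint32_t control)
{
	if (control & OSC_CONTROL_DPC_BIT)
		return (support & OSC_SUPPORT_EDR_BIT) != 0;

	return true;
}
```

Under this rule, "DPC control without EDR support" is the one
combination the ECN text rules out; the other three are allowed.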
> > > 
> > > My understanding is also similar. I have raised the same point in
> > > https://lore.kernel.org/all/3c02a6d6-917e-486c-ad41-bdf176639...@linux.intel.com/
> > 
> > Ah, sorry, I missed that.
> > 
> > > IMO, we don't need a separate config for EDR. I don't think user can
> > > gain anything with disabling EDR and enabling DPC. As long as
> > > firmware does not use EDR support, just compiling the code should
> > > be harmless.
> > > 
> > > So we can either remove it, or select it by default if user selects
> > > DPC config.
> > > 
> > > > CONFIG_PCIE_EDR depends on CONFIG_ACPI, so the situation is a little
> > > > bit murky on non-ACPI systems that support DPC.
> > > 
> > > If we are going to remove the EDR config, it might need #ifdef
> > > CONFIG_ACPI changes in edr.c to not compile ACPI specific code.
> > > Alternative choice is to compile edr.c with CONFIG_ACPI.
> > 
> > Right.  I think we should probably remove CONFIG_PCIE_EDR completely
> > and make everything controlled by CONFIG_PCIE_DPC.
> 
> In the PCI Firmware spec, r3.3, sec 4.5.1, table 4-4, the description
> of "Error Disconnect Recover Supported" hints at the possibility for
> an OS to support EDR but not DPC:
> 
>   In the context of PCIe, support for Error Disconnect Recover implies
>   that the operating system will invalidate the software state
>   associated with child devices of the port without attempting to
>   access the child device hardware. *If* the operating system supports
>   Downstream Po

Re: [PATCH v2 1/4] PCI/AER: Store more information in aer_err_info

2024-02-06 Thread Bjorn Helgaas
On Wed, Feb 07, 2024 at 12:41:41AM +0800, Wang, Qingshun wrote:
> On Mon, Feb 05, 2024 at 05:12:31PM -0600, Bjorn Helgaas wrote:
> > On Thu, Jan 25, 2024 at 02:27:59PM +0800, Wang, Qingshun wrote:
> > > When Advisory Non-Fatal errors are raised, both correctable and
> > > uncorrectable error statuses will be set. The current kernel code cannot
> > > store both statuses at the same time, thus failing to handle ANFE properly.
> > > In addition, to avoid clearing UEs that are not ANFE by accident, UE
> > > severity and Device Status also need to be recorded: any fatal UE cannot
> > > be ANFE, and if Fatal/Non-Fatal Error Detected is set in Device Status, do
> > > not take any assumption and let UE handler to clear UE status.
> > > 
> > > Store status and mask of both correctable and uncorrectable errors in
> > > aer_err_info. The severity of UEs and the values of the Device Status
> > > register are also recorded, which will be used to determine UEs that should
> > > be handled by the ANFE handler. Refactor the rest of the code to use
> > > cor/uncor_status and cor/uncor_mask fields instead of status and mask
> > > fields.
> > 
> > There's a lot going on in this patch.  Could it possibly be split up a
> > bit, e.g., first tease apart aer_err_info.status/.mask into
> > .cor_status/mask and .uncor_status/mask, then add .uncor_severity,
> > then add the device_status bit separately?  If it could be split up, I
> > think the ANFE case would be easier to see.
> 
> Thanks for the feedback! Will split it up into two patches in the next
> version.

Or even three:

  1) tease apart aer_err_info.status/.mask into .cor_status/mask and
 .uncor_status/mask

  2) add .uncor_severity

  3) add device_status

Looking at this again, I'm a little confused about 2) and 3).  I see
the new read of PCI_ERR_UNCOR_SEVER into .uncor_severity, but there's
no actual *use* of it.

Same for 3), I see the new read of PCI_EXP_DEVSTA, but AFAICS there's
no use of that value.

We should have the addition of these new values in the same patch
that *uses* them.

Bjorn


Re: [PATCH v2 2/4] PCI/AER: Handle Advisory Non-Fatal properly

2024-02-05 Thread Bjorn Helgaas
In the subject, "properly" really doesn't convey information.  I think
this patch does two things:

  - Prints error bits that might be ANFE 
  - Clears UNCOR_STATUS bits that were previously not cleared

Maybe the subject line could say something about those (clearing
UNCOR_STATUS might be more important, or maybe this could even be
split into two patches so we could see both).
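
As a reading aid, the selection rule described in the commit log quoted
below (leave UE status alone if Device Status shows a Fatal/Non-Fatal
error; otherwise pick unmasked, non-fatal bits from the ANFE-capable
set) can be sketched in userspace C.  The bit values are the ones
defined in include/uapi/linux/pci_regs.h; the function name and its
flat parameter list are assumptions for illustration, not the patch's
code.

```c
#include <stdint.h>

/* Values as defined in include/uapi/linux/pci_regs.h. */
#define PCI_ERR_UNC_POISON_TLP	0x00001000
#define PCI_ERR_UNC_COMP_TIME	0x00004000
#define PCI_ERR_UNC_COMP_ABORT	0x00008000
#define PCI_ERR_UNC_UNX_COMP	0x00010000
#define PCI_ERR_UNC_UNSUP	0x00100000
#define PCI_EXP_DEVSTA_NFED	0x0002
#define PCI_EXP_DEVSTA_FED	0x0004

#define ANFE_UNC_MASK	(PCI_ERR_UNC_POISON_TLP | PCI_ERR_UNC_COMP_TIME | \
			 PCI_ERR_UNC_COMP_ABORT | PCI_ERR_UNC_UNX_COMP | \
			 PCI_ERR_UNC_UNSUP)

/* Hypothetical helper: which uncorrectable bits may belong to an ANFE? */
static uint32_t anfe_candidates(uint16_t device_status, uint32_t uncor_status,
				uint32_t uncor_mask, uint32_t uncor_severity)
{
	/* A real Fatal/Non-Fatal error was detected: assume nothing. */
	if (device_status & (PCI_EXP_DEVSTA_NFED | PCI_EXP_DEVSTA_FED))
		return 0;

	/* Unmasked, ANFE-capable, and configured as non-fatal. */
	return uncor_status & ~uncor_mask & ANFE_UNC_MASK & ~uncor_severity;
}
```

For example, with status 0x00041000 and mask 0x00180020 as in the log
below, and an assumed severity register of 0x00462030, only bit 12
(Poisoned TLP) qualifies; bit 18 (Malformed TLP) is not in the
ANFE-capable set.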

On Thu, Jan 25, 2024 at 02:28:00PM +0800, Wang, Qingshun wrote:
> When processing an Advisory Non-Fatal error, ideally both correctable
> error status and uncorrectable error status should be cleared. However,
> there is no way to fully identify the UE associated with ANFE. Even
> worse, a Fatal/Non-Fatal error may set the same UE status bit as ANFE.
> Assuming an ANFE is FE/NFE is kind of bad, but assuming a FE/NFE is an
> ANFE is usually unacceptable. To avoid clearing UEs that are not ANFE by
> accident, the most conservative route is taken here: If any of the
> Fatal/Non-Fatal Error Detected bits is set in Device Status, do not
> touch UE status, they should be cleared later by the UE handler.
> Otherwise, a specific set of UEs that may be raised as ANFE according to
> the PCIe specification will be cleared if their corresponding severity
> is non-fatal. Additionally, log UEs that will be cleared.
> 
> For instance, previously when kernel receives an ANFE with Poisoned TLP
> in OS native AER mode, only status of CE will be reported and cleared:
> 
>   AER: Corrected error received: :b7:02.0
>   PCIe Bus Error: severity=Corrected, type=Transaction Layer, (Receiver ID)
> device [8086:0db0] error status/mask=2000/
>  [13] NonFatalErr
> 
> If the kernel receives a Malformed TLP after that, two UE will be
> reported, which is unexpected. Malformed TLP Header was lost since
> the previous ANF gated the TLP header logs:
> 
>   PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)
> device [8086:0db0] error status/mask=00041000/00180020
>  [12] TLP(First)
>  [18] MalfTLP
> 
> Now, in the same scenario, both CE status and related UE status will be
> reported and cleared after ANFE:
> 
>   AER: Corrected error received: :b7:02.0
>   PCIe Bus Error: severity=Corrected, type=Transaction Layer, (Receiver ID)
> device [8086:0db0] error status/mask=2000/
>  [13] NonFatalErr
> Uncorrectable errors that may cause Advisory Non-Fatal:
>  [18] TLP
> 
> Signed-off-by: "Wang, Qingshun" 
> ---
>  drivers/pci/pcie/aer.c | 61 +-
>  1 file changed, 60 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 6583dcf50977..713cbf625d3f 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -107,6 +107,12 @@ struct aer_stats {
>   PCI_ERR_ROOT_MULTI_COR_RCV |\
>   PCI_ERR_ROOT_MULTI_UNCOR_RCV)
>  
> +#define AER_ERR_ANFE_UNC_MASK(PCI_ERR_UNC_POISON_TLP |   \
> + PCI_ERR_UNC_COMP_TIME | \
> + PCI_ERR_UNC_COMP_ABORT |\
> + PCI_ERR_UNC_UNX_COMP |  \
> + PCI_ERR_UNC_UNSUP)
> +
>  static int pcie_aer_disable;
>  static pci_ers_result_t aer_root_reset(struct pci_dev *dev);
>  
> @@ -612,6 +618,32 @@ const struct attribute_group aer_stats_attr_group = {
>   .is_visible = aer_stats_attrs_are_visible,
>  };
>  
> +static int anfe_get_related_err(struct aer_err_info *info)
> +{
> + /*
> +  * Take the most conservative route here. If there are
> +  * Non-Fatal/Fatal errors detected, do not assume any
> +  * bit in uncor_status is set by ANFE.
> +  */
> + if (info->device_status & (PCI_EXP_DEVSTA_NFED | PCI_EXP_DEVSTA_FED))
> + return 0;
> + /*
> +  * According to PCIe Base Specification Revision 6.1,
> +  * Section 6.2.3.2.4, if an UNCOR error is raised as
> +  * Advisory Non-Fatal error, it will match the following
> +  * conditions:
> +  *  a. The severity of the error is Non-Fatal.
> +  *  b. The error is one of the following:
> +  *  1. Poisoned TLP
> +  *  2. Completion Timeout
> +  *  3. Completer Abort
> +  *  4. Unexpected Completion
> +  *  5. Unsupported Request
> +  */
> + return info->uncor_status & ~info->uncor_mask
> + & AER_ERR_ANFE_UNC_MASK & ~info->severity;
> +}
> +
>  static void pci_dev_aer_stats_incr(struct pci_dev *pdev,
>  struct aer_err_info *info)
>  {
> @@ -678,6 +710,7 @@ static void __aer_print_error(struct pci_dev *dev,
> struct aer_err_info *info)
>  {
>   unsigned long status;
> + unsigned long anfe_status;
>

Re: [PATCH v2 1/4] PCI/AER: Store more information in aer_err_info

2024-02-05 Thread Bjorn Helgaas
On Thu, Jan 25, 2024 at 02:27:59PM +0800, Wang, Qingshun wrote:
> When Advisory Non-Fatal errors are raised, both correctable and
> uncorrectable error statuses will be set. The current kernel code cannot
> store both statuses at the same time, thus failing to handle ANFE properly.
> In addition, to avoid clearing UEs that are not ANFE by accident, UE
> severity and Device Status also need to be recorded: any fatal UE cannot
> be ANFE, and if Fatal/Non-Fatal Error Detected is set in Device Status, do
> not take any assumption and let UE handler to clear UE status.
> 
> Store status and mask of both correctable and uncorrectable errors in
> aer_err_info. The severity of UEs and the values of the Device Status
> register are also recorded, which will be used to determine UEs that should
> be handled by the ANFE handler. Refactor the rest of the code to use
> cor/uncor_status and cor/uncor_mask fields instead of status and mask
> fields.

There's a lot going on in this patch.  Could it possibly be split up a
bit, e.g., first tease apart aer_err_info.status/.mask into
.cor_status/mask and .uncor_status/mask, then add .uncor_severity,
then add the device_status bit separately?  If it could be split up, I
think the ANFE case would be easier to see.

Thanks a lot for working on this area!

Bjorn


Re: [PATCH 1/1] PCI/DPC: Fix TLP Prefix register reading offset

2024-01-22 Thread Bjorn Helgaas
On Thu, Jan 18, 2024 at 01:08:15PM +0200, Ilpo Järvinen wrote:
> The TLP Prefix Log Register consists of multiple DWORDs (PCIe r6.1 sec
> 7.9.14.13) but the loop in dpc_process_rp_pio_error() keeps reading
> from the first DWORD. Add the iteration count based offset calculation
> into the config read.
> 
> Fixes: f20c4ea49ec4 ("PCI/DPC: Add eDPC support")
> Signed-off-by: Ilpo Järvinen 

Applied to pci/dpc for v6.9 with commit log below, thanks!

PCI/DPC: Print all TLP Prefixes, not just the first

The TLP Prefix Log Register consists of multiple DWORDs (PCIe r6.1 sec
7.9.14.13) but the loop in dpc_process_rp_pio_error() keeps reading from
the first DWORD, so we print only the first PIO TLP Prefix (duplicated
several times), and we never print the second, third, etc., Prefixes.

Add the iteration count based offset calculation into the config read.

Fixes: f20c4ea49ec4 ("PCI/DPC: Add eDPC support")
Link: https://lore.kernel.org/r/20240118110815.3867-1-ilpo.jarvi...@linux.intel.com
Signed-off-by: Ilpo Järvinen 
[bhelgaas: add user-visible details to commit log]
Signed-off-by: Bjorn Helgaas 

> ---
>  drivers/pci/pcie/dpc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> index 94111e438241..e5d7c12854fa 100644
> --- a/drivers/pci/pcie/dpc.c
> +++ b/drivers/pci/pcie/dpc.c
> @@ -234,7 +234,7 @@ static void dpc_process_rp_pio_error(struct pci_dev *pdev)
>  
>   for (i = 0; i < pdev->dpc_rp_log_size - 5; i++) {
>   pci_read_config_dword(pdev,
> - cap + PCI_EXP_DPC_RP_PIO_TLPPREFIX_LOG, &prefix);
> + cap + PCI_EXP_DPC_RP_PIO_TLPPREFIX_LOG + i * 4, &prefix);
>   pci_err(pdev, "TLP Prefix Header: dw%d, %#010x\n", i, prefix);
>   }
>   clear_status:
> -- 
> 2.39.2
> 


Re: [PATCH 1/1] PCI/DPC: Fix TLP Prefix register reading offset

2024-01-19 Thread Bjorn Helgaas
On Thu, Jan 18, 2024 at 01:08:15PM +0200, Ilpo Järvinen wrote:
> The TLP Prefix Log Register consists of multiple DWORDs (PCIe r6.1 sec
> 7.9.14.13) but the loop in dpc_process_rp_pio_error() keeps reading
> from the first DWORD. Add the iteration count based offset calculation
> into the config read.

So IIUC the user-visible bug is that we print only the first PIO TLP
Prefix (duplicated several times), and we never print the second,
third, etc Prefixes, right?

I wish we could print them all in a single pci_err(), as we do for the
TLP Header Log, instead of dribbling them out one by one.
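
Something along these lines is what I have in mind, as a userspace
sketch only.  Config space is modeled as an array of DWORDs, and
cfg_read_dword() and format_tlp_prefixes() are hypothetical names; the
point is the "+ i * 4" offset per Prefix and building one line before
printing.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical config space model: DWORD-indexed, offset given in bytes. */
static uint32_t cfg_read_dword(const uint32_t *cfg, unsigned int offset)
{
	return cfg[offset / 4];
}

/*
 * Format 'count' TLP Prefix DWORDs starting at byte offset 'prefix_log'
 * into one space-separated line.  Returns the length written, or -1 if
 * the buffer is too small.
 */
static int format_tlp_prefixes(const uint32_t *cfg, unsigned int prefix_log,
			       unsigned int count, char *buf, size_t len)
{
	size_t used = 0;
	unsigned int i;

	if (len == 0)
		return -1;
	buf[0] = '\0';

	for (i = 0; i < count; i++) {
		/* Each Prefix is its own DWORD, hence the i * 4 offset. */
		uint32_t prefix = cfg_read_dword(cfg, prefix_log + i * 4);
		int n = snprintf(buf + used, len - used, "%s%#010x",
				 i ? " " : "", (unsigned int)prefix);

		if (n < 0 || (size_t)n >= len - used)
			return -1;
		used += (size_t)n;
	}

	return (int)used;
}
```

Without the "+ i * 4", every iteration would read the first DWORD again,
which is the duplicated output described above.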

> Fixes: f20c4ea49ec4 ("PCI/DPC: Add eDPC support")
> Signed-off-by: Ilpo Järvinen 
> ---
>  drivers/pci/pcie/dpc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> index 94111e438241..e5d7c12854fa 100644
> --- a/drivers/pci/pcie/dpc.c
> +++ b/drivers/pci/pcie/dpc.c
> @@ -234,7 +234,7 @@ static void dpc_process_rp_pio_error(struct pci_dev *pdev)
>  
>   for (i = 0; i < pdev->dpc_rp_log_size - 5; i++) {
>   pci_read_config_dword(pdev,
> - cap + PCI_EXP_DPC_RP_PIO_TLPPREFIX_LOG, &prefix);
> + cap + PCI_EXP_DPC_RP_PIO_TLPPREFIX_LOG + i * 4, &prefix);
>   pci_err(pdev, "TLP Prefix Header: dw%d, %#010x\n", i, prefix);
>   }
>   clear_status:
> -- 
> 2.39.2
> 


[PATCH 7/8] powerpc: Fix typos

2024-01-03 Thread Bjorn Helgaas
From: Bjorn Helgaas 

Fix typos, most reported by "codespell arch/powerpc".  Only touches
comments, no code changes.

Signed-off-by: Bjorn Helgaas 
Cc: Nicholas Piggin 
Cc: Christophe Leroy 
Cc: linuxppc-dev@lists.ozlabs.org
---
 arch/powerpc/boot/Makefile   |  4 ++--
 arch/powerpc/boot/dts/acadia.dts |  2 +-
 arch/powerpc/boot/main.c |  2 +-
 arch/powerpc/boot/ps3.c  |  2 +-
 arch/powerpc/include/asm/io.h|  2 +-
 arch/powerpc/include/asm/opal-api.h  |  4 ++--
 arch/powerpc/include/asm/pmac_feature.h  |  2 +-
 arch/powerpc/include/asm/uninorth.h  |  2 +-
 arch/powerpc/include/uapi/asm/bootx.h|  2 +-
 arch/powerpc/kernel/eeh_pe.c |  2 +-
 arch/powerpc/kernel/fadump.c |  2 +-
 arch/powerpc/kernel/misc_64.S|  4 ++--
 arch/powerpc/kernel/process.c| 12 ++--
 arch/powerpc/kernel/ptrace/ptrace-tm.c   |  2 +-
 arch/powerpc/kernel/smp.c|  2 +-
 arch/powerpc/kernel/sysfs.c  |  4 ++--
 arch/powerpc/kvm/book3s_xive.c   |  2 +-
 arch/powerpc/mm/cacheflush.c |  2 +-
 arch/powerpc/mm/nohash/kaslr_booke.c |  2 +-
 arch/powerpc/platforms/512x/mpc512x_shared.c |  2 +-
 arch/powerpc/platforms/cell/spufs/sched.c|  2 +-
 arch/powerpc/platforms/maple/pci.c   |  2 +-
 arch/powerpc/platforms/powermac/pic.c|  2 +-
 arch/powerpc/platforms/powermac/sleep.S  |  2 +-
 arch/powerpc/platforms/powernv/pci-sriov.c   |  4 ++--
 arch/powerpc/platforms/powernv/vas-window.c  |  2 +-
 arch/powerpc/platforms/pseries/vas.c |  2 +-
 arch/powerpc/sysdev/xive/common.c|  4 ++--
 arch/powerpc/sysdev/xive/native.c|  2 +-
 29 files changed, 40 insertions(+), 40 deletions(-)

diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index 968aee2025b8..9c2b6e527ed1 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -108,8 +108,8 @@ DTC_FLAGS   ?= -p 1024
 # these files into the build dir, fix up any includes and ensure that dependent
 # files are copied in the right order.
 
-# these need to be seperate variables because they are copied out of different
-# directories in the kernel tree. Sure you COULd merge them, but it's a
+# these need to be separate variables because they are copied out of different
+# directories in the kernel tree. Sure you COULD merge them, but it's a
 # cure-is-worse-than-disease situation.
 zlib-decomp-$(CONFIG_KERNEL_GZIP) := decompress_inflate.c
 zlib-$(CONFIG_KERNEL_GZIP) := inffast.c inflate.c inftrees.c
diff --git a/arch/powerpc/boot/dts/acadia.dts b/arch/powerpc/boot/dts/acadia.dts
index deb52e41ab84..5fedda811378 100644
--- a/arch/powerpc/boot/dts/acadia.dts
+++ b/arch/powerpc/boot/dts/acadia.dts
@@ -172,7 +172,7 @@ ieee1588@ef602800 {
reg = <0xef602800 0x60>;
interrupt-parent = <>;
interrupts = <0x4 0x4>;
-   /* This thing is a bit weird.  It has it's own UIC
+   /* This thing is a bit weird.  It has its own UIC
 * that it uses to generate snapshot triggers.  We
 * don't really support this device yet, and it needs
 * work to figure this out.
diff --git a/arch/powerpc/boot/main.c b/arch/powerpc/boot/main.c
index cae31a6e8f02..2c0e2a1cab01 100644
--- a/arch/powerpc/boot/main.c
+++ b/arch/powerpc/boot/main.c
@@ -188,7 +188,7 @@ static inline void prep_esm_blob(struct addr_range vmlinux, void *chosen) { }
 
 /* A buffer that may be edited by tools operating on a zImage binary so as to
  * edit the command line passed to vmlinux (by setting /chosen/bootargs).
- * The buffer is put in it's own section so that tools may locate it easier.
+ * The buffer is put in its own section so that tools may locate it easier.
  */
 static char cmdline[BOOT_COMMAND_LINE_SIZE]
__attribute__((__section__("__builtin_cmdline")));
diff --git a/arch/powerpc/boot/ps3.c b/arch/powerpc/boot/ps3.c
index f157717ae814..89ff46b8b225 100644
--- a/arch/powerpc/boot/ps3.c
+++ b/arch/powerpc/boot/ps3.c
@@ -25,7 +25,7 @@ BSS_STACK(4096);
 
 /* A buffer that may be edited by tools operating on a zImage binary so as to
  * edit the command line passed to vmlinux (by setting /chosen/bootargs).
- * The buffer is put in it's own section so that tools may locate it easier.
+ * The buffer is put in its own section so that tools may locate it easier.
  */
 
 static char cmdline[BOOT_COMMAND_LINE_SIZE]
diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
index 5220274a6277..7fb001ab3109 100644
--- a/arch/powerpc/include/asm/io.h
+++ b/arch/powerpc/include/asm/io.h
@@ -989,7 +989,7 @@ static inline phys

Re: [PATCH 2/3] PCI/AER: Decode Requester ID when no error info found

2024-01-02 Thread Bjorn Helgaas
On Tue, Jan 02, 2024 at 11:22:53AM -0800, Kuppuswamy Sathyanarayanan wrote:
> On 12/6/2023 2:42 PM, Bjorn Helgaas wrote:
> > From: Bjorn Helgaas 
> > 
> > When a device with AER detects an error, it logs error information in its
> > own AER Error Status registers.  It may send an Error Message to the Root
> > Port (RCEC in the case of an RCiEP), which logs the fact that an Error
> > Message was received (Root Error Status) and the Requester ID of the
> > message source (Error Source Identification).
> > 
> > aer_print_port_info() prints the Requester ID from the Root Port Error
> > Source in the usual Linux "bb:dd.f" format, but when find_source_device()
> > finds no error details in the hierarchy below the Root Port, it printed the
> > raw Requester ID without decoding it.
> > 
> > Decode the Requester ID in the usual Linux format so it matches other
> > messages.
> > 
> > Sample message changes:
> > 
> >   - pcieport :00:1c.5: AER: Correctable error received: :00:1c.5
> >   - pcieport :00:1c.5: AER: can't find device of ID00e5
> > >   + pcieport :00:1c.5: AER: Correctable error message received from :00:1c.5
> >   + pcieport :00:1c.5: AER: found no error details for :00:1c.5
> > 
> > Signed-off-by: Bjorn Helgaas 
> 
> Except for the suggestion given below, it looks good to me.
> 
> Reviewed-by: Kuppuswamy Sathyanarayanan 
> 

Thanks for taking a look!

> > @@ -740,7 +740,7 @@ static void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info)
> > u8 bus = info->id >> 8;
> > u8 devfn = info->id & 0xff;
> >  
> > -   pci_info(dev, "%s%s error received: %04x:%02x:%02x.%d\n",
> > +   pci_info(dev, "%s%s error message received from %04x:%02x:%02x.%d\n",
> >  info->multi_error_valid ? "Multiple " : "",
> >  aer_error_severity_string[info->severity],
> >  pci_domain_nr(dev->bus), bus, PCI_SLOT(devfn),
> > @@ -929,7 +929,12 @@ static bool find_source_device(struct pci_dev *parent,
> > pci_walk_bus(parent->subordinate, find_device_iter, e_info);
> >  
> > if (!e_info->error_dev_num) {
> > -   pci_info(parent, "can't find device of ID%04x\n", e_info->id);
> > +   u8 bus = e_info->id >> 8;
> > +   u8 devfn = e_info->id & 0xff;
> 
> You can use PCI_BUS_NUM(e_info->id) for getting bus number.  Since
> you are extracting this info in more than one place, maybe you can
> also define a macro PCI_DEVFN(id) (following PCI_BUS_NUM()).

Thanks, both good ideas.

We already have a PCI_DEVFN() that *combines* slot + func into devfn,
so we'd have to come up with a different name.

I'll add a patch to use PCI_BUS_NUM() in the two places here and in
pme.c.

I think I'll wait with these until after the v6.7 release.

> > +   pci_info(parent, "found no error details for %04x:%02x:%02x.%d\n",
> > +pci_domain_nr(parent->bus), bus, PCI_SLOT(devfn),
> > +PCI_FUNC(devfn));
> > return false;
> > }
> > return true;
> 
> -- 
> Sathyanarayanan Kuppuswamy
> Linux Kernel Developer


Re: [PATCH 1/3] PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors

2023-12-12 Thread Bjorn Helgaas
On Tue, Dec 12, 2023 at 09:00:24AM -0600, Terry Bowman wrote:
> Hi Bjorn,
> 
> Will help prevent confusion. LGTM. 

Thanks a lot for taking a look at these!  I'd like to give you credit
in the log, e.g., "Reviewed-by: Terry Bowman ",
but I'm OCD enough that I don't want to translate "LGTM" into that all
by myself.

If you want that credit (and, I guess, the privilege of being cc'd
when we find that these patches break something :)), just reply again
with that actual "Reviewed-by:" text in it.

Bjorn


Re: [PATCH 0/3] PCI/AER: Clean up logging

2023-12-08 Thread Bjorn Helgaas
[+cc Jonathan]

On Wed, Dec 06, 2023 at 04:42:28PM -0600, Bjorn Helgaas wrote:
> From: Bjorn Helgaas 
> 
> Clean up some minor AER logging issues:
> 
>   - Log as "Correctable errors", not "Corrected errors"
> 
>   - Decode the Requester ID when we couldn't find detail error info
> 
> Bjorn Helgaas (3):
>   PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors
>   PCI/AER: Decode Requester ID when no error info found
>   PCI/AER: Use explicit register sizes for struct members
> 
>  drivers/pci/pcie/aer.c | 19 ---
>  include/linux/aer.h|  8 
>  2 files changed, 16 insertions(+), 11 deletions(-)

Applied to pci/aer for v6.8.  Thanks, Jonathan, for your time in
taking a look.


[PATCH 3/3] PCI/AER: Use explicit register sizes for struct members

2023-12-06 Thread Bjorn Helgaas
From: Bjorn Helgaas 

aer_irq() reads the AER Root Error Status and Error Source Identification
(PCI_ERR_ROOT_STATUS and PCI_ERR_ROOT_ERR_SRC) registers directly into
struct aer_err_source.  Both registers are 32 bits, so declare the members
explicitly as "u32" instead of "unsigned int".

Similarly, aer_get_device_error_info() reads the AER Header Log
(PCI_ERR_HEADER_LOG) registers, which are also 32 bits, into struct
aer_header_log_regs.  Declare those members as "u32" as well.

No functional changes intended.

Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/pcie/aer.c | 4 ++--
 include/linux/aer.h| 8 
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 2ff6bac9979f..60f84414ec2a 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -41,8 +41,8 @@
 #define AER_MAX_TYPEOF_UNCOR_ERRS  27  /* as per PCI_ERR_UNCOR_STATUS*/
 
 struct aer_err_source {
-   unsigned int status;
-   unsigned int id;
+   u32 status; /* PCI_ERR_ROOT_STATUS */
+   u32 id; /* PCI_ERR_ROOT_ERR_SRC */
 };
 
 struct aer_rpc {
diff --git a/include/linux/aer.h b/include/linux/aer.h
index f6ea2f57d808..ae0fae70d4bd 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -19,10 +19,10 @@
 struct pci_dev;
 
 struct aer_header_log_regs {
-   unsigned int dw0;
-   unsigned int dw1;
-   unsigned int dw2;
-   unsigned int dw3;
+   u32 dw0;
+   u32 dw1;
+   u32 dw2;
+   u32 dw3;
 };
 
 struct aer_capability_regs {
-- 
2.34.1
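
As a rough userspace illustration of why fixed-width members suit register snapshots, the sketch below uses uint32_t in place of the kernel's u32. The struct and helper names are hypothetical. The split of the Error Source Identification register into a low 16-bit ERR_COR source and a high 16-bit ERR_FATAL/NONFATAL source is stated here from memory of the AER capability layout and should be checked against the spec.

```c
#include <stdint.h>

/* Hypothetical mirror of struct aer_err_source with explicit 32-bit members. */
struct err_source_sketch {
	uint32_t status;	/* snapshot of Root Error Status */
	uint32_t id;		/* snapshot of Error Source Identification */
};

/* Two 32-bit registers should occupy exactly 8 bytes on any ABI. */
_Static_assert(sizeof(struct err_source_sketch) == 8,
	       "register snapshot must be two 32-bit words");

/* Assumed layout: bits 15:0 hold the ERR_COR source Requester ID. */
static inline uint16_t cor_source_id(uint32_t id)
{
	return (uint16_t)(id & 0xffff);
}

/* Assumed layout: bits 31:16 hold the ERR_FATAL/NONFATAL source Requester ID. */
static inline uint16_t uncor_source_id(uint32_t id)
{
	return (uint16_t)(id >> 16);
}
```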



[PATCH 2/3] PCI/AER: Decode Requester ID when no error info found

2023-12-06 Thread Bjorn Helgaas
From: Bjorn Helgaas 

When a device with AER detects an error, it logs error information in its
own AER Error Status registers.  It may send an Error Message to the Root
Port (RCEC in the case of an RCiEP), which logs the fact that an Error
Message was received (Root Error Status) and the Requester ID of the
message source (Error Source Identification).

aer_print_port_info() prints the Requester ID from the Root Port Error
Source in the usual Linux "bb:dd.f" format, but when find_source_device()
finds no error details in the hierarchy below the Root Port, it printed the
raw Requester ID without decoding it.

Decode the Requester ID in the usual Linux format so it matches other
messages.

Sample message changes:

  - pcieport :00:1c.5: AER: Correctable error received: :00:1c.5
  - pcieport :00:1c.5: AER: can't find device of ID00e5
  + pcieport :00:1c.5: AER: Correctable error message received from :00:1c.5
  + pcieport :00:1c.5: AER: found no error details for :00:1c.5

Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/pcie/aer.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 20db80018b5d..2ff6bac9979f 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -740,7 +740,7 @@ static void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info)
u8 bus = info->id >> 8;
u8 devfn = info->id & 0xff;
 
-   pci_info(dev, "%s%s error received: %04x:%02x:%02x.%d\n",
+   pci_info(dev, "%s%s error message received from %04x:%02x:%02x.%d\n",
 info->multi_error_valid ? "Multiple " : "",
 aer_error_severity_string[info->severity],
 pci_domain_nr(dev->bus), bus, PCI_SLOT(devfn),
@@ -929,7 +929,12 @@ static bool find_source_device(struct pci_dev *parent,
pci_walk_bus(parent->subordinate, find_device_iter, e_info);
 
if (!e_info->error_dev_num) {
-   pci_info(parent, "can't find device of ID%04x\n", e_info->id);
+   u8 bus = e_info->id >> 8;
+   u8 devfn = e_info->id & 0xff;
+
+   pci_info(parent, "found no error details for %04x:%02x:%02x.%d\n",
+pci_domain_nr(parent->bus), bus, PCI_SLOT(devfn),
+PCI_FUNC(devfn));
return false;
}
return true;
-- 
2.34.1



[PATCH 1/3] PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors

2023-12-06 Thread Bjorn Helgaas
From: Bjorn Helgaas 

The PCIe spec classifies errors as either "Correctable" or "Uncorrectable".
Previously we printed these as "Corrected" or "Uncorrected".  To avoid
confusion, use the same terms as the spec.

One confusing situation is when one agent detects an error, but another
agent is responsible for recovery, e.g., by re-attempting the operation.
The first agent may log a "correctable" error but it has not yet been
corrected.  The recovery agent must report an uncorrectable error if it is
unable to recover.  If we print the first agent's error as "Corrected", it
gives the false impression that it has already been resolved.

Sample message change:

  - pcieport :00:1c.5: AER: Corrected error received: :00:1c.5
  + pcieport :00:1c.5: AER: Correctable error received: :00:1c.5

Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/pcie/aer.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 42a3bd35a3e1..20db80018b5d 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -436,9 +436,9 @@ void pci_aer_exit(struct pci_dev *dev)
  * AER error strings
  */
 static const char *aer_error_severity_string[] = {
-   "Uncorrected (Non-Fatal)",
-   "Uncorrected (Fatal)",
-   "Corrected"
+   "Uncorrectable (Non-Fatal)",
+   "Uncorrectable (Fatal)",
+   "Correctable"
 };
 
 static const char *aer_error_layer[] = {
-- 
2.34.1



[PATCH 0/3] PCI/AER: Clean up logging

2023-12-06 Thread Bjorn Helgaas
From: Bjorn Helgaas 

Clean up some minor AER logging issues:

  - Log as "Correctable errors", not "Corrected errors"

  - Decode the Requester ID when we couldn't find detail error info

Bjorn Helgaas (3):
  PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors
  PCI/AER: Decode Requester ID when no error info found
  PCI/AER: Use explicit register sizes for struct members

 drivers/pci/pcie/aer.c | 19 ---
 include/linux/aer.h|  8 
 2 files changed, 16 insertions(+), 11 deletions(-)

-- 
2.34.1



Re: [PATCH 1/6] x86: Use PCI_HEADER_TYPE_* instead of literals

2023-12-01 Thread Bjorn Helgaas
[+cc scsi, powerpc folks]

On Fri, Dec 01, 2023 at 02:44:47PM -0600, Bjorn Helgaas wrote:
> On Fri, Nov 24, 2023 at 11:09:13AM +0200, Ilpo Järvinen wrote:
> > Replace 0x7f and 0x80 literals with PCI_HEADER_TYPE_* defines.
> > 
> > Signed-off-by: Ilpo Järvinen 
> 
> Applied entire series on the PCI "enumeration" branch for v6.8,
> thanks!
> 
> If anybody wants to take pieces separately, let me know and I'll drop
> from PCI.

OK, b4 picked up the entire series but I was only cc'd on this first
patch, so I missed the responses about EDAC, xtensa, bcma already
being applied elsewhere.

So I kept these in the PCI tree:

  420ac76610d7 ("scsi: lpfc: Use PCI_HEADER_TYPE_MFD instead of literal")
  3773343dd890 ("powerpc/fsl-pci: Use PCI_HEADER_TYPE_MASK instead of literal")
  197e0da1f1a3 ("x86/pci: Use PCI_HEADER_TYPE_* instead of literals")

and dropped the others.

x86, SCSI, powerpc folks, if you want to take these instead, let me
know and I'll drop them.

> > ---
> >  arch/x86/kernel/aperture_64.c  | 3 +--
> >  arch/x86/kernel/early-quirks.c | 4 ++--
> >  2 files changed, 3 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
> > index 4feaa670d578..89c0c8a3fc7e 100644
> > --- a/arch/x86/kernel/aperture_64.c
> > +++ b/arch/x86/kernel/aperture_64.c
> > @@ -259,10 +259,9 @@ static u32 __init search_agp_bridge(u32 *order, int 
> > *valid_agp)
> > order);
> > }
> >  
> > -   /* No multi-function device? */
> > type = read_pci_config_byte(bus, slot, func,
> >PCI_HEADER_TYPE);
> > -   if (!(type & 0x80))
> > +   if (!(type & PCI_HEADER_TYPE_MFD))
> > break;
> > }
> > }
> > diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c
> > index a6c1867fc7aa..59f4aefc6bc1 100644
> > --- a/arch/x86/kernel/early-quirks.c
> > +++ b/arch/x86/kernel/early-quirks.c
> > @@ -779,13 +779,13 @@ static int __init check_dev_quirk(int num, int slot, 
> > int func)
> > type = read_pci_config_byte(num, slot, func,
> > PCI_HEADER_TYPE);
> >  
> > -   if ((type & 0x7f) == PCI_HEADER_TYPE_BRIDGE) {
> > +   if ((type & PCI_HEADER_TYPE_MASK) == PCI_HEADER_TYPE_BRIDGE) {
> > sec = read_pci_config_byte(num, slot, func, PCI_SECONDARY_BUS);
> > if (sec > num)
> > early_pci_scan_bus(sec);
> > }
> >  
> > -   if (!(type & 0x80))
> > +   if (!(type & PCI_HEADER_TYPE_MFD))
> > return -1;
> >  
> > return 0;
> > -- 
> > 2.30.2
> > 


Re: [pci:controller/xilinx-xdma] BUILD REGRESSION 8d786149d78c7784144c7179e25134b6530b714b

2023-10-31 Thread Bjorn Helgaas
On Tue, Oct 31, 2023 at 09:59:29AM -0700, Nick Desaulniers wrote:
> On Tue, Oct 31, 2023 at 7:56 AM Bjorn Helgaas  wrote:
> > On Sat, Oct 28, 2023 at 08:22:54PM +0800, kernel test robot wrote:
> > > tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git 
> > > controller/xilinx-xdma
> > > branch HEAD: 8d786149d78c7784144c7179e25134b6530b714b  PCI: xilinx-xdma: 
> > > Add Xilinx XDMA Root Port driver
> > >
> > > Error/Warning ids grouped by kconfigs:
> > >
> > > clang_recent_errors
> > > `-- powerpc-pmac32_defconfig
> > > |-- 
> > > arch-powerpc-sysdev-grackle.c:error:unused-function-grackle_set_stg-Werror-Wunused-function
> > > |-- 
> > > arch-powerpc-xmon-xmon.c:error:unused-function-get_output_lock-Werror-Wunused-function
> > > `-- 
> > > arch-powerpc-xmon-xmon.c:error:unused-function-release_output_lock-Werror-Wunused-function
> >
> > This report is close to useless.  It doesn't show the complete error
> > message, it doesn't show how to reproduce the issue, and the pci -next
> > branch (including controller/xilinx-xdma) doesn't reference any of
> > these functions:
> >
> >   $ git grep -E "grackle_set_stg|get_output_lock|release_output_lock" | cat
> >   arch/powerpc/sysdev/grackle.c:static inline void grackle_set_stg(struct 
> > pci_controller* bp, int enable)
> >   arch/powerpc/sysdev/grackle.c:grackle_set_stg(hose, 1);
> >   arch/powerpc/xmon/xmon.c:static void get_output_lock(void)
> >   arch/powerpc/xmon/xmon.c:static void release_output_lock(void)
> >   arch/powerpc/xmon/xmon.c:static inline void get_output_lock(void) {}
> >   arch/powerpc/xmon/xmon.c:static inline void release_output_lock(void) {}
> >   arch/powerpc/xmon/xmon.c: get_output_lock();
> >   arch/powerpc/xmon/xmon.c: release_output_lock();
> >   arch/powerpc/xmon/xmon.c: get_output_lock();
> >   arch/powerpc/xmon/xmon.c: release_output_lock();
> >   arch/powerpc/xmon/xmon.c: get_output_lock();
> >   arch/powerpc/xmon/xmon.c: release_output_lock();
> >   arch/powerpc/xmon/xmon.c: get_output_lock();
> >   arch/powerpc/xmon/xmon.c: release_output_lock();
> >
> > That said, the unused functions do look legit:
> >
> > grackle_set_stg() is a static function and the only call is under
> > "#if 0".
> 
> Time to remove it then? Or is it a bug that it's not called?
> Otherwise the definition should be behind the same preprocessor guards
> as the caller.  Same for the below.

I don't really care whether we keep the warning or not.

My real complaint is that the 0-day report fingered
pci/controller/xilinx-xdma, which is completely unrelated, which is a
waste of time.

> > Same with get_output_lock() and release_output_lock(): they're static
> > and always defined in xmon.c, but only called if either CONFIG_SMP or
> > CONFIG_DEBUG_FS.
> >
> > But they're certainly not related to controller/xilinx-xdma, so I'm
> > going to ignore them.
> >
> > Bjorn
> >
> > P.S. Nathan & Nick, I cc'd you because of this earlier report that
> > also mentioned grackle_set_stg():
> > https://lore.kernel.org/lkml/202308121120.u2d3ypvt-...@intel.com/
> 
> 
> 
> -- 
> Thanks,
> ~Nick Desaulniers
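
Nick's suggestion of putting the definition behind the same preprocessor guard as its caller could look roughly like the sketch below. SKETCH_FEATURE, set_stg_sketch() and init_sketch() are made-up names for illustration; this is not the grackle.c or xmon.c code.

```c
/* Hypothetical config switch; set to 0 to compile out both definition and call. */
#define SKETCH_FEATURE 1

#if SKETCH_FEATURE
/* Defined only when something will call it, so -Wunused-function stays quiet. */
static int set_stg_sketch(int enable)
{
	return enable ? 1 : 0;
}
#endif

/* The caller uses the same guard as the definition above. */
int init_sketch(void)
{
	int state = 0;

#if SKETCH_FEATURE
	state = set_stg_sketch(1);
#endif
	return state;
}
```

If the only call site sits under "#if 0", the same reasoning says the helper should either move under that guard or be removed.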


Re: [linux-next:master] BUILD REGRESSION c503e3eec382ac708ee7adf874add37b77c5d312

2023-10-31 Thread Bjorn Helgaas
On Tue, Oct 31, 2023 at 04:35:23AM +0800, kernel test robot wrote:
> tree/branch: 
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> branch HEAD: c503e3eec382ac708ee7adf874add37b77c5d312  Add linux-next 
> specific files for 20231030
> 
> Error/Warning reports:
> ... 
> https://lore.kernel.org/oe-kbuild-all/202310302206.pkr5ebdi-...@intel.com

> Error/Warning: (recently discovered and may have been fixed)
> 
> Warning: MAINTAINERS references a file that doesn't exist: 
> Documentation/devicetree/bindings/iio/imu/bosch,bma400.yaml
> aarch64-linux-ld: drivers/cxl/core/pci.c:921:(.text+0xbbc): undefined 
> reference to `pci_print_aer'
> ...
> arch/riscv/include/asm/mmio.h:67:(.text+0xd66): undefined reference to 
> `pci_print_aer'
> csky-linux-ld: pci.c:(.text+0x6e8): undefined reference to `pci_print_aer'
> drivers/cxl/core/pci.c:921: undefined reference to `pci_print_aer'
> drivers/cxl/core/pci.c:921:(.text+0xbc0): undefined reference to 
> `pci_print_aer'
> ...
> ld: drivers/cxl/core/pci.c:921: undefined reference to `pci_print_aer'
> loongarch64-linux-ld: drivers/cxl/core/pci.c:921:(.text+0xa38): undefined 
> reference to `pci_print_aer'
> pci.c:(.text+0x662): undefined reference to `pci_print_aer'
> powerpc-linux-ld: pci.c:(.text+0xf10): undefined reference to `pci_print_aer'
> riscv64-linux-ld: pci.c:(.text+0x11ec): undefined reference to `pci_print_aer'

I have no idea about the above (and all the similar ones below); I
assume they all have to do with
https://lore.kernel.org/r/20231018171713.1883517-13-rrich...@amd.com

> Unverified Error/Warning (likely false positive, please contact us if 
> interested):
> 
> drivers/pci/controller/dwc/pcie-rcar-gen4.c:439:15: warning: cast to smaller 
> integer type 'enum dw_pcie_device_mode' from 'const void *' 
> [-Wvoid-pointer-to-enum-cast]

Safe but annoying.  Yoshihiro, can you fix this by adding structs for
the of_device_id.data member instead of casting DW_PCIE_RC_TYPE and
DW_PCIE_EP_TYPE?  Examples here:

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/dwc/pci-dra7xx.c?id=v6.6#n557
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/dwc/pci-keystone.c?id=v6.6#n1069
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/dwc/pcie-artpec6.c?id=v6.6#n452
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/dwc/pcie-designware-plat.c?id=v6.6#n159
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/dwc/pcie-keembay.c?id=v6.6#n437
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/dwc/pcie-tegra194.c?id=v6.6#n2431

Siddharth, since you're looking at keystone v3.65, it looks to me like
DW_PCIE_VER_365A is currently broken because ks_pcie_rc_of_data
doesn't set .mode, so it defaults to zero, and it looks like we should
end up at the INVALID device type case here:

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/dwc/pci-keystone.c?id=v6.6#n1285

> |-- arm64-buildonly-randconfig-r003-20220511
> |   `-- 
> aarch64-linux-ld:drivers-cxl-core-pci.c:(.text):undefined-reference-to-pci_print_aer

> |-- csky-randconfig-001-20231030
> |   |-- csky-linux-ld:pci.c:(.text):undefined-reference-to-pci_print_aer
> |   `-- pci.c:(.text):undefined-reference-to-pci_print_aer

> |-- i386-randconfig-141-20231030
> |   |-- ld:drivers-cxl-core-pci.c:undefined-reference-to-pci_print_aer

> |-- loongarch-randconfig-r014-20230225
> |   `-- drivers-cxl-core-pci.c:(.text):undefined-reference-to-pci_print_aer
> |-- loongarch-randconfig-r032-20220926
> |   `-- 
> loongarch64-linux-ld:drivers-cxl-core-pci.c:(.text):undefined-reference-to-pci_print_aer

> |-- powerpc-randconfig-003-20231016
> |   `-- powerpc-linux-ld:pci.c:(.text):undefined-reference-to-pci_print_aer

> |-- riscv-randconfig-r002-20220124
> |   `-- 
> arch-riscv-include-asm-mmio.h:(.text):undefined-reference-to-pci_print_aer
> |-- riscv-randconfig-r011-20220606
> |   `-- riscv64-linux-ld:pci.c:(.text):undefined-reference-to-pci_print_aer

> |-- x86_64-randconfig-x052-20230810
> |   `-- drivers-cxl-core-pci.c:undefined-reference-to-pci_print_aer
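
The request above, to hang a struct off the match table's data pointer instead of casting an enum through "const void *", can be sketched in userspace as follows. Everything here is a hypothetical stand-in: match_sketch plays the role of struct of_device_id, the compatible strings are invented, and mode_sketch only imitates enum dw_pcie_device_mode.

```c
#include <stddef.h>
#include <string.h>

/* Zero is deliberately the "unknown" value, so an unset .mode is detectable. */
enum mode_sketch { MODE_UNKNOWN = 0, MODE_EP, MODE_RC };

/* Stand-in for struct of_device_id: a match string plus opaque data. */
struct match_sketch {
	const char *compatible;
	const void *data;
};

/* Per-variant data lives in a real object, so no pointer/enum cast is needed. */
struct platdata_sketch {
	enum mode_sketch mode;
};

static const struct platdata_sketch rc_data = { .mode = MODE_RC };
static const struct platdata_sketch ep_data = { .mode = MODE_EP };

static const struct match_sketch match_table[] = {
	{ "vendor,pcie-rc", &rc_data },
	{ "vendor,pcie-ep", &ep_data },
	{ NULL, NULL },			/* sentinel */
};

/* Look up the mode for a compatible string; MODE_UNKNOWN if no entry matches. */
static enum mode_sketch lookup_mode(const char *compatible)
{
	const struct match_sketch *m;

	for (m = match_table; m->compatible; m++) {
		if (strcmp(m->compatible, compatible) == 0) {
			const struct platdata_sketch *pd = m->data;

			return pd->mode;
		}
	}
	return MODE_UNKNOWN;
}
```

The same layout also shows the failure mode described for keystone: a data struct that never sets .mode reads back as the zero value, which here is MODE_UNKNOWN.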


Re: [pci:controller/xilinx-xdma] BUILD REGRESSION 8d786149d78c7784144c7179e25134b6530b714b

2023-10-31 Thread Bjorn Helgaas
[+cc powerpc, clang folks]

On Sat, Oct 28, 2023 at 08:22:54PM +0800, kernel test robot wrote:
> tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git 
> controller/xilinx-xdma
> branch HEAD: 8d786149d78c7784144c7179e25134b6530b714b  PCI: xilinx-xdma: Add 
> Xilinx XDMA Root Port driver
> 
> Error/Warning ids grouped by kconfigs:
> 
> clang_recent_errors
> `-- powerpc-pmac32_defconfig
> |-- 
> arch-powerpc-sysdev-grackle.c:error:unused-function-grackle_set_stg-Werror-Wunused-function
> |-- 
> arch-powerpc-xmon-xmon.c:error:unused-function-get_output_lock-Werror-Wunused-function
> `-- 
> arch-powerpc-xmon-xmon.c:error:unused-function-release_output_lock-Werror-Wunused-function

This report is close to useless.  It doesn't show the complete error
message, it doesn't show how to reproduce the issue, and the pci -next
branch (including controller/xilinx-xdma) doesn't reference any of
these functions:

  $ git grep -E "grackle_set_stg|get_output_lock|release_output_lock" | cat
  arch/powerpc/sysdev/grackle.c:static inline void grackle_set_stg(struct 
pci_controller* bp, int enable)
  arch/powerpc/sysdev/grackle.c:grackle_set_stg(hose, 1);
  arch/powerpc/xmon/xmon.c:static void get_output_lock(void)
  arch/powerpc/xmon/xmon.c:static void release_output_lock(void)
  arch/powerpc/xmon/xmon.c:static inline void get_output_lock(void) {}
  arch/powerpc/xmon/xmon.c:static inline void release_output_lock(void) {}
  arch/powerpc/xmon/xmon.c: get_output_lock();
  arch/powerpc/xmon/xmon.c: release_output_lock();
  arch/powerpc/xmon/xmon.c: get_output_lock();
  arch/powerpc/xmon/xmon.c: release_output_lock();
  arch/powerpc/xmon/xmon.c: get_output_lock();
  arch/powerpc/xmon/xmon.c: release_output_lock();
  arch/powerpc/xmon/xmon.c: get_output_lock();
  arch/powerpc/xmon/xmon.c: release_output_lock();

That said, the unused functions do look legit:

grackle_set_stg() is a static function and the only call is under
"#if 0".

Same with get_output_lock() and release_output_lock(): they're static
and always defined in xmon.c, but only called if either CONFIG_SMP or
CONFIG_DEBUG_FS.

But they're certainly not related to controller/xilinx-xdma, so I'm
going to ignore them.

Bjorn

P.S. Nathan & Nick, I cc'd you because of this earlier report that
also mentioned grackle_set_stg():
https://lore.kernel.org/lkml/202308121120.u2d3ypvt-...@intel.com/


Re: [PATCH v6 1/3] PCI/AER: Factor out interrupt toggling into helpers

2023-10-25 Thread Bjorn Helgaas
On Fri, May 12, 2023 at 08:00:12AM +0800, Kai-Heng Feng wrote:
> There are many places that enable and disable AER interrupt, so move
> them into helpers.
> 
> Reviewed-by: Mika Westerberg 
> Reviewed-by: Kuppuswamy Sathyanarayanan 
> 
> Reviewed-by: Jonathan Cameron 
> Signed-off-by: Kai-Heng Feng 

I applied this patch (only 1/3) to pci/aer for v6.7.

I'm not clear on the others yet, so let's look at those again after
v6.7-rc1.  It seemed like there's still a question about disabling
interrupts when we're going to D3hot.

>  drivers/pci/pcie/aer.c | 45 +-
>  1 file changed, 27 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index f6c24ded134c..1420e1f27105 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1227,6 +1227,28 @@ static irqreturn_t aer_irq(int irq, void *context)
>   return IRQ_WAKE_THREAD;
>  }
>  
> +static void aer_enable_irq(struct pci_dev *pdev)
> +{
> + int aer = pdev->aer_cap;
> + u32 reg32;
> +
> + /* Enable Root Port's interrupt in response to error messages */
> + pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, &reg32);
> + reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
> + pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, reg32);
> +}
> +
> +static void aer_disable_irq(struct pci_dev *pdev)
> +{
> + int aer = pdev->aer_cap;
> + u32 reg32;
> +
> + /* Disable Root's interrupt in response to error messages */
> + pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, &reg32);
> + reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
> + pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, reg32);
> +}
> +
>  /**
>   * aer_enable_rootport - enable Root Port's interrupts when receiving messages
>   * @rpc: pointer to a Root Port data structure
> @@ -1256,10 +1278,7 @@ static void aer_enable_rootport(struct aer_rpc *rpc)
>   pci_read_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, &reg32);
>   pci_write_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, reg32);
>  
> - /* Enable Root Port's interrupt in response to error messages */
> - pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, &reg32);
> - reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
> - pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, reg32);
> + aer_enable_irq(pdev);
>  }
>  
>  /**
> @@ -1274,10 +1293,7 @@ static void aer_disable_rootport(struct aer_rpc *rpc)
>   int aer = pdev->aer_cap;
>   u32 reg32;
>  
> - /* Disable Root's interrupt in response to error messages */
> - pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, &reg32);
> - reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
> - pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, reg32);
> + aer_disable_irq(pdev);
>  
>   /* Clear Root's error status reg */
>   pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, &reg32);
> @@ -1372,12 +1388,8 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
>*/
>   aer = root ? root->aer_cap : 0;
>  
> - if ((host->native_aer || pcie_ports_native) && aer) {
> - /* Disable Root's interrupt in response to error messages */
> - pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, &reg32);
> - reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
> - pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, reg32);
> - }
> + if ((host->native_aer || pcie_ports_native) && aer)
> + aer_disable_irq(root);
>  
>   if (type == PCI_EXP_TYPE_RC_EC || type == PCI_EXP_TYPE_RC_END) {
>   rc = pcie_reset_flr(dev, PCI_RESET_DO_RESET);
> @@ -1396,10 +1408,7 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
>   pci_read_config_dword(root, aer + PCI_ERR_ROOT_STATUS, &reg32);
>   pci_write_config_dword(root, aer + PCI_ERR_ROOT_STATUS, reg32);
>  
> - /* Enable Root Port's interrupt in response to error messages */
> - pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, &reg32);
> - reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
> - pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, reg32);
> + aer_enable_irq(root);
>   }
>  
>   return rc ? PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_RECOVERED;
> -- 
> 2.34.1
> 
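
The helpers factored out in the quoted patch are a read-modify-write pair around one command register. A minimal userspace sketch of that pattern follows, with a plain variable standing in for the config register. The cfg_read()/cfg_write() accessors and the mask value 0x7 (assumed to be the three correctable, non-fatal and fatal enable bits) are illustrative assumptions, not the kernel's definitions.

```c
#include <stdint.h>

/* Assumed combined enable bits; the kernel calls this ROOT_PORT_INTR_ON_MESG_MASK. */
#define INTR_ON_MESG_MASK_SKETCH 0x7u

/* Stand-in for the Root Error Command register in config space. */
static uint32_t root_command;

static void cfg_read(uint32_t *val)
{
	*val = root_command;
}

static void cfg_write(uint32_t val)
{
	root_command = val;
}

/* Set only the interrupt-enable bits and leave the other bits untouched. */
static void enable_irq_sketch(void)
{
	uint32_t reg32;

	cfg_read(&reg32);
	reg32 |= INTR_ON_MESG_MASK_SKETCH;
	cfg_write(reg32);
}

/* Clear only the interrupt-enable bits and leave the other bits untouched. */
static void disable_irq_sketch(void)
{
	uint32_t reg32;

	cfg_read(&reg32);
	reg32 &= ~INTR_ON_MESG_MASK_SKETCH;
	cfg_write(reg32);
}
```

Keeping the sequence in one pair of helpers means every caller preserves unrelated bits in the same way, which is the main benefit of the refactoring.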


Re: [PATCH 2/3] PCI: layerscape: add suspend/resume for ls1021a

2023-10-16 Thread Bjorn Helgaas
On Mon, Oct 16, 2023 at 12:11:04PM -0400, Frank Li wrote:
> On Mon, Oct 16, 2023 at 10:22:11AM -0500, Bjorn Helgaas wrote:

> > Obviously Lorenzo *could* edit all your subject lines on your behalf,
> > but it makes everybody's life easier if people look at the existing
> > code and follow the style when making changes.
> 
> Understand, but simple mark 'a' and 'A' to me. I will update patches and
> take care for next time instead search whole document to guess which one
> violated. I know I make some mistakes at here. But I am working on many
> difference kernel subsystems, some require upper case, some require low
> case, someone doesn't care.

Right, that's why I always suggest following the example of the
surrounding code and history.  English is the only language I know,
but I speculate that this typographical detail probably doesn't make
sense in languages that don't have a similar upper/lowercase
distinction.

Thanks for persevering; we'd be having a lot more trouble if I tried
to send emails in your native language ;)

Bjorn


Re: [PATCH 2/3] PCI: layerscape: add suspend/resume for ls1021a

2023-10-16 Thread Bjorn Helgaas
On Mon, Oct 16, 2023 at 10:45:25AM -0400, Frank Li wrote:
> On Tue, Oct 10, 2023 at 06:02:36PM +0200, Lorenzo Pieralisi wrote:
> > On Tue, Oct 10, 2023 at 10:20:12AM -0400, Frank Li wrote:

> > > Ping
> > 
> > Read and follow please (and then ping us):
> > https://lore.kernel.org/linux-pci/20171026223701.ga25...@bhelgaas-glaptop.roam.corp.google.com
> 
> Could you please help point which specific one was not follow above guide?
> Then I can update my code. I think that's efficient communication method. I
> think I have read it serial times. But not sure which one violate the
> guide?
> 
> @Bjorn Helgaas. How do you think so? 

Since Lorenzo didn't point out anything specific in the patch itself,
I think he was probably referring to the subject line and this advice:

  - Follow the existing convention, i.e., run "git log --oneline
" and make yours match in format, capitalization, and
sentence structure.  For example, native host bridge driver patch
titles look like this:

  PCI: altera: Fix platform_get_irq() error handling
  PCI: vmd: Remove IRQ affinity so we can allocate more IRQs
  PCI: mediatek: Add MSI support for MT2712 and MT7622
  PCI: rockchip: Remove IRQ domain if probe fails

In this case, your subject line was:

  PCI: layerscape: add suspend/resume for ls1021a

The advice was to run this:

  $ git log --oneline drivers/pci/controller/dwc/pci-layerscape.c
  83c088148c8e PCI: Use PCI_HEADER_TYPE_* instead of literals
  9fda4d09905d PCI: layerscape: Add power management support for ls1028a
  277004d7a4a3 PCI: Remove unnecessary  includes
  60b3c27fb9b9 PCI: dwc: Rename struct pcie_port to dw_pcie_rp
  d23f0c11aca2 PCI: layerscape: Change to use the DWC common link-up check function
  7007b745a508 PCI: layerscape: Convert to builtin_platform_driver()
  60f5b73fa0f2 PCI: dwc: Remove unnecessary wrappers around dw_pcie_host_init()
  b9ac0f9dc8ea PCI: dwc: Move dw_pcie_setup_rc() to DWC common code
  f78f02638af5 PCI: dwc: Rework MSI initialization

Note that these summaries are all complete sentences that start with a
capital letter:

  Use PCI_HEADER_TYPE_* instead of literals
  Add power management support for ls1028a
  Remove unnecessary  includes
  ...

So yours could be this:

  PCI: layerscape: Add suspend/resume for ls1021a
                   ^

This is trivial, obviously.  But the uppercase/lowercase distinction
carries information, and it's an unnecessary distraction to notice
that "oh, this is different from the rest; is the difference
important or should I ignore it?"

Obviously Lorenzo *could* edit all your subject lines on your behalf,
but it makes everybody's life easier if people look at the existing
code and follow the style when making changes.

E.g., write subject lines that are similar in style to previous ones,
name local variables similarly to other functions, use line lengths
consistent with the rest of the file, etc.  After applying a change,
the file should look like a coherent whole; we should not be able to
tell that this hunk was added later by somebody else.  This all helps
make the code (and the git history) more readable and maintainable.
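
If you want to check the capitalization mechanically before sending, a
minimal sketch in C might look like the following.  The helper names
subject_summary() and subject_summary_capitalized() are made up for
illustration (they are not part of checkpatch or any kernel tool), and
the sketch assumes the summary itself contains no ": " sequence.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return the summary part of a subject line, i.e., the text after the
 * last "prefix: " component:
 *   "PCI: layerscape: Add ..." -> "Add ..."
 * Hypothetical helper; assumes the summary contains no ": " itself.
 */
static const char *subject_summary(const char *subject)
{
	const char *p = subject;
	const char *sep;

	while ((sep = strstr(p, ": ")) != NULL)
		p = sep + 2;

	return p;
}

/* True if the summary starts with a capital letter. */
static bool subject_summary_capitalized(const char *subject)
{
	const char *s = subject_summary(subject);

	return isupper((unsigned char)s[0]) != 0;
}
```

This only covers the capitalization point; matching the prefix and the
sentence structure still means looking at "git log --oneline" output
for the file you are changing.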

Bjorn


Re: [PATCH 0/3] PCI: PCI_HEADER_TYPE bugfix & cleanups

2023-10-03 Thread Bjorn Helgaas
On Tue, Oct 03, 2023 at 03:52:57PM +0300, Ilpo Järvinen wrote:
> One bugfix and cleanups for PCI_HEADER_TYPE_* literals.
> 
> This series only covers what's within drivers/pci/. I'd have patches
> for other subsystems too but I decided to wait with them until
> PCI_HEADER_TYPE_MFD is in Linus' tree (to keep the series receipient
> count reasonable, the rest can IMO go through the subsystem specific
> trees once the define is there).
> 
> Ilpo Järvinen (3):
>   PCI: vmd: Correct PCI Header Type Register's MFD bit check
>   PCI: Add PCI_HEADER_TYPE_MFD pci_regs.h
>   PCI: Use PCI_HEADER_TYPE_* instead of literals
> 
>  drivers/pci/controller/dwc/pci-layerscape.c   |  2 +-
>  .../controller/mobiveil/pcie-mobiveil-host.c  |  2 +-
>  drivers/pci/controller/pcie-iproc.c   |  2 +-
>  drivers/pci/controller/pcie-rcar-ep.c |  2 +-
>  drivers/pci/controller/pcie-rcar-host.c   |  2 +-
>  drivers/pci/controller/vmd.c  |  5 ++---
>  drivers/pci/hotplug/cpqphp_ctrl.c |  6 ++---
>  drivers/pci/hotplug/cpqphp_pci.c  | 22 +--
>  drivers/pci/hotplug/ibmphp.h  |  5 +++--
>  drivers/pci/hotplug/ibmphp_pci.c  |  2 +-
>  drivers/pci/pci.c |  2 +-
>  drivers/pci/quirks.c  |  6 ++---
>  include/uapi/linux/pci_regs.h |  1 +
>  13 files changed, 30 insertions(+), 29 deletions(-)

Applied to pci/enumeration for v6.7, thanks!


Re: Questions: Should kernel panic when PCIe fatal error occurs?

2023-09-26 Thread Bjorn Helgaas
On Fri, Sep 22, 2023 at 10:46:36AM +0800, Shuai Xue wrote:
> ...

> Actually, this is a question from my colleague from firmware team.
> The original question is that:
> 
> "Should I set CPER_SEV_FATAL for Generic Error Status Block when a
> PCIe fatal error is detected? If set, kernel will always panic.
> Otherwise, kernel will always not panic."
> 
> So I pull a question about desired behavior of Linux kernel first :)
> From the perspective of the kernel, CPER_SEV_FATAL for Generic Error
> Status Block is not reasonable. The kernel will attempt to recover
> Fatal errors, although recovery may fail.

I don't know the semantics of CPER_SEV_FATAL or why it's there.
With CPER, we have *two* error severities: a "native" one defined by
the PCIe spec and another defined by the platform via CPER.

I speculate that the reason for the CPER severity could be to provide
a severity for error sources that don't have a "native" severity like
AER does, or for the vendor to force the OS to restart (for
CPER_SEV_FATAL, anyway) in cases where it might not otherwise.

In the native case, we only have the PCIe severity and don't have the
CPER severity at all, and I suspect that unless there's uncontained
data corruption, we would rather handle even the most severe PCIe
fatal error by disabling the specific device(s) instead of panicking
and restarting the whole machine.

So for PCIe errors, I'm not sure setting CPER_SEV_FATAL is beneficial
unless the platform wants to force the OS to panic, e.g., maybe the
platform knows about data corruption and/or the vendor wants the OS to
panic as part of a reliability story.

Presumably the platform has already logged the error, and I assume the
platform *could* restart without even returning to the OS, but maybe
it wants the OS to do a crashdump or shutdown in a more orderly way.

Bjorn


Re: Questions: Should kernel panic when PCIe fatal error occurs?

2023-09-21 Thread Bjorn Helgaas
On Thu, Sep 21, 2023 at 08:10:19PM +0800, Shuai Xue wrote:
> On 2023/9/21 07:02, Bjorn Helgaas wrote:
> > On Mon, Sep 18, 2023 at 05:39:58PM +0800, Shuai Xue wrote:
> ...

> > I guess your point is that for CPER_SEV_FATAL errors, the APEI/GHES
> > path always panics but the native path never does, and that maybe both
> > paths should work the same way?
> 
> Yes, exactly. Both OS native and APEI/GHES firmware first are notifications
> used to handles PCIe AER errors, and IMHO, they should ideally work in the
> same way.

I agree, that would be nice, but the whole point of the APEI/GHES
functionality is vendor value-add, so I'm not sure we can achieve that
ideal.

> ...
> As a result, AER driver only does recovery for non-fatal PCIe error.

This is only true for the APEI/GHES path, right?  For *native* AER
handling, we attempt recovery for both fatal and non-fatal errors.

> > It doesn't seem like the native path should always panic.  If we can
> > tell that data was corrupted, we may want to panic, but otherwise I
> > don't think we should crash the entire system even if some device is
> > permanently broken.
> 
> Got it. But how can we tell if the data is corrupted with OS native?

I naively expect that by PCIe protocol, corrupted DLLPs or TLPs
detected by CRC, sequence number errors, etc, would be discarded
before corrupting memory, so I doubt we'd get an uncorrectable error
that means "sorry, I just corrupted your data."

But DPC is advertised as "avoiding the potential spread of any data
corruption," so there must be some mechanisms of corruption, and since
DPC is triggered by either ERR_FATAL or ERR_NONFATAL, I guess maybe
the errors could tell us something.  I'm going to quit speculating
because I obviously don't know enough about this area.

> >> However, I have changed my mind on this issue as I encounter a case where
> >> a error propagation is detected due to fatal DLLP (Data Link Protocol
> >> Error) error. A DLLP error occurred in the Compute node, causing the
> >> node to panic because `struct acpi_hest_generic_status::error_severity` was
> >> set as CPER_SEV_FATAL. However, data corruption was still detected in the
> >> storage node by CRC.
> > 
> > The only mention of Data Link Protocol Error that looks relevant is
> > PCIe r6.0, sec 3.6.2.2, which basically says a DLLP with an unexpected
> > Sequence Number should be discarded:
> > 
> >   For Ack and Nak DLLPs, the following steps are followed (see Figure
> >   3-21):
> > 
> >     - If the Sequence Number specified by the AckNak_Seq_Num does not
> >       correspond to an unacknowledged TLP, or to the value in
> >       ACKD_SEQ, the DLLP is discarded
> > 
> >       - This is a Data Link Protocol Error, which is a reported error
> >         associated with the Port (see Section 6.2).
> > 
> > So data from that DLLP should not have made it to memory, although of
> > course the DMA may not have been completed.  But it sounds like you
> > did see corrupted data written to memory?
> 
> The storage node use RDMA to directly access remote compute node.
> And a error detected by CRC in the storage node. So I suspect yes.

When doing the CRC, can you distinguish between corrupted data and
data that was not written because a DMA was only partially completed?

> ...
> I tried to inject Data Link Protocol Error on some platform. The mechanism
> behind is that rootport controls the sequence number of the specific TLPs
> and ACK/NAK DLLPs. Data Link Protocol Error will be detected at the Rx side
> of ACK/NAK DLLPs.
> 
> In such case, NIC and NVMe recovered on fatal and non-fatal DLLP
> errors.

I'm guessing this error injection directly writes the AER status bit,
which would probably only test the reporting (sending an ERR_FATAL
message), AER interrupt generation, firmware or OS interrupt handling,
etc.

It probably would not actually generate a DLLP with a bad sequence
number, so it probably does not test the hardware behavior of
discarding the DLLP if the sequence number is bad.  Just my guess
though.

> ...
> My point is that how kernel could recover from non-fatal and fatal
> errors in firmware first without DPC? If CPER_SEV_FATAL is used to
> report fatal PCIe error, kernel will panic in APEI/GHES driver.

The platform decides whether to use CPER_SEV_FATAL, so we can't change
that.  We *could* change whether Linux panics when the platform says
an error is CPER_SEV_FATAL.  That happens in drivers/acpi, so it's
really up to Rafael.

Personally I would want to hear from vendors who use the APEI/GHES
path.  Poking around the web for logs that mention HEST and related
things, it looks like at least Dell, HP, and Lenovo use it.  And there
are drivers/acpi/apei commits from nxp.com, alibaba.com, amd.com,
arm.com, huawei.com, etc., so some of them probably care, too.

Bjorn


Re: Questions: Should kernel panic when PCIe fatal error occurs?

2023-09-20 Thread Bjorn Helgaas
On Mon, Sep 18, 2023 at 05:39:58PM +0800, Shuai Xue wrote:
> Hi, all folks,
> 
> Error reporting and recovery are one of the important features of PCIe, and
> the kernel has been supporting them since version 2.6, 17 years ago.
> I am very curious about the expected behavior of the software.
> I first recap the error classification and then list my questions below it.
> 
> ## Recap: Error classification
> 
> - Fatal Errors
> 
> Fatal errors are uncorrectable error conditions which render the particular
> Link and related hardware unreliable. For Fatal errors, a reset of the
> components on the Link may be required to return to reliable operation.
> Platform handling of Fatal errors, and any efforts to limit the effects of
> these errors, is platform implementation specific. (PCIe 6.0.1, sec
> 6.2.2.2.1 Fatal Errors).
> 
> - Non-Fatal Errors
> 
> Non-fatal errors are uncorrectable errors which cause a particular
> transaction to be unreliable but the Link is otherwise fully functional.
> Isolating Non-fatal from Fatal errors provides Requester/Receiver logic in
> a device or system management software the opportunity to recover from the
> error without resetting the components on the Link and disturbing other
> transactions in progress. Devices not associated with the transaction in
> error are not impacted by the error.  (PCIe 6.0.1, sec 6.2.2.2.2 Non-Fatal
> Errors).
> 
> ## What the kernel do?
> 
> The Linux kernel supports both the OS native and firmware first modes in
> AER and DPC drivers. The error recovery API is defined in `struct
> pci_error_handlers`, and the recovery process is performed in several
> stages in pcie_do_recovery(). One main difference in handling PCIe errors
> is that the kernel only resets the link when a fatal error is detected.
> 
> ## Questions
> 
> 1. Should kernel panic when fatal errors occur without AER recovery?
> 
> IMHO, the answer is NO. The AER driver handles both fatal and
> non-fatal errors, and I have not found any panic changes in the
> recovery path in OS native mode.
> 
> As far as I know, on many X86 platforms, struct
> `acpi_hest_generic_status::error_severity` is set as CPER_SEV_FATAL
> in firmware first mode. As a result, kernel will panic immediately
> in ghes_proc() when fatal AER errors occur, and there is no chance
> to handle the error and perform recovery in AER driver.

UEFI r2.10, sec N.2.1, defines CPER_SEV_FATAL, and platform firmware
decides which Error Severity to put in the error record.  I don't see
anything in UEFI about how the OS should handle fatal errors.

ACPI r6.5, sec 18.1, says on fatal uncorrected error, the system
should be restarted to prevent propagation of the error.  For
CPER_SEV_FATAL errors, it looks like ghes_proc() panics even before
trying AER recovery.

I guess your point is that for CPER_SEV_FATAL errors, the APEI/GHES
path always panics but the native path never does, and that maybe both
paths should work the same way?

It would be nice if they worked the same, but I suspect that vendors
may rely on the fact that CPER_SEV_FATAL forces a restart/panic as
part of their system integrity story.

It doesn't seem like the native path should always panic.  If we can
tell that data was corrupted, we may want to panic, but otherwise I
don't think we should crash the entire system even if some device is
permanently broken.
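
To make the difference between the two paths concrete, here is a
simplified model in C of the behavior as described in this thread.  It
is only a sketch: the enum and function names other than the
CPER_SEV_* values are hypothetical, and it is not the actual
ghes_proc() or pcie_do_recovery() logic.

```c
/* Error Severity values as defined for CPER records by UEFI. */
enum cper_severity {
	CPER_SEV_RECOVERABLE = 0,
	CPER_SEV_FATAL = 1,
	CPER_SEV_CORRECTED = 2,
	CPER_SEV_INFORMATIONAL = 3,
};

/* PCIe "native" severities; names are hypothetical for this sketch. */
enum pcie_severity {
	PCIE_CORRECTABLE,
	PCIE_NONFATAL,
	PCIE_FATAL,
};

/* What the OS ends up doing in this model. */
enum os_action {
	OS_LOG_ONLY,
	OS_RECOVER,			/* recovery without a link reset */
	OS_RECOVER_AND_RESET_LINK,	/* recovery including a link reset */
	OS_PANIC,
};

/* Native path: only the PCIe severity exists, and we never panic. */
static enum os_action native_action(enum pcie_severity sev)
{
	switch (sev) {
	case PCIE_NONFATAL:
		return OS_RECOVER;
	case PCIE_FATAL:
		return OS_RECOVER_AND_RESET_LINK;
	default:
		return OS_LOG_ONLY;
	}
}

/*
 * Firmware-first path: the platform's block-level CPER severity is
 * looked at first, so CPER_SEV_FATAL panics before any AER recovery.
 */
static enum os_action firmware_first_action(enum cper_severity block_sev,
					     enum pcie_severity sev)
{
	if (block_sev == CPER_SEV_FATAL)
		return OS_PANIC;

	return native_action(sev);
}
```

In this model the same PCIe fatal error leads to a link reset natively
but to a panic when the platform reports CPER_SEV_FATAL, which is the
asymmetry in question.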

> For fatal and non-fatal errors, struct
> `acpi_hest_generic_status::error_severity` should as
> CPER_SEV_RECOVERABLE, and struct
> `acpi_hest_generic_data::error_severity` should reflect its real
> severity. Then, the kernel is equivalent to handling PCIe errors in
> Firmware first mode as it does in OS native mode.  Please correct me
> if I am wrong.

I don't know enough to comment on how Error Severity should be used in
the Generic Error Status Block vs the Generic Error Data Entry.

> However, I have changed my mind on this issue as I encounter a case where
> a error propagation is detected due to fatal DLLP (Data Link Protocol
> Error) error. A DLLP error occurred in the Compute node, causing the
> node to panic because `struct acpi_hest_generic_status::error_severity` was
> set as CPER_SEV_FATAL. However, data corruption was still detected in the
> storage node by CRC.

The only mention of Data Link Protocol Error that looks relevant is
PCIe r6.0, sec 3.6.2.2, which basically says a DLLP with an unexpected
Sequence Number should be discarded:

  For Ack and Nak DLLPs, the following steps are followed (see Figure
  3-21):

    - If the Sequence Number specified by the AckNak_Seq_Num does not
      correspond to an unacknowledged TLP, or to the value in
      ACKD_SEQ, the DLLP is discarded

      - This is a Data Link Protocol Error, which is a reported error
        associated with the Port (see Section 6.2).

So data from that DLLP should not have made it to memory, although of
course the DMA may not have been completed.  But it sounds like you
did see corrupted data written to memory?

I assume it is 

Re: [PATCHv3 pci-next 1/2] PCI/AER: correctable error message as KERN_INFO

2023-09-18 Thread Bjorn Helgaas
On Mon, Sep 18, 2023 at 07:42:30PM +0800, Xi Ruoyao wrote:
> ...

> My workstation suffers from too much correctable AER reporting as well
> (related to Intel's errata "RPL013: Incorrectly Formed PCIe Packets May
> Generate Correctable Errors" and/or the motherboard design, I guess).

We should rate-limit correctable error reporting so it's not
overwhelming.
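
As a rough idea of what "rate-limit" could mean here, the following is
a minimal userspace sketch in C of an interval/burst limiter.  The
struct and function names are hypothetical and the time source is
passed in explicitly; it is not the kernel's ratelimit implementation
and not what any AER patch actually does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limiter state: at most "burst" messages per "interval". */
struct msg_ratelimit {
	uint64_t interval;	/* window length, in caller-defined time units */
	unsigned int burst;	/* messages allowed per window */
	uint64_t begin;		/* start of the current window */
	unsigned int printed;	/* messages allowed in the current window */
	unsigned int missed;	/* messages suppressed so far */
};

/* Return true if a message arriving at time "now" may be printed. */
static bool msg_ratelimit_ok(struct msg_ratelimit *rs, uint64_t now)
{
	/* Start a new window once the current one has expired. */
	if (now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}

	rs->missed++;
	return false;
}
```

The "missed" count is kept so that a summary of suppressed messages
could still be reported, which keeps the information that errors are
occurring without flooding the log.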

At the same time, I'm *also* interested in the cause of these errors,
in case there's a Linux defect or a hardware erratum that we can work
around.  Do you have a bug report with any more details, e.g., a dmesg
log and "sudo lspci -vv" output?

Bjorn


Re: [PATCH v6 2/3] PCI/AER: Disable AER interrupt on suspend

2023-08-10 Thread Bjorn Helgaas
On Thu, Aug 10, 2023 at 04:17:21PM +0800, Kai-Heng Feng wrote:
> On Thu, Aug 10, 2023 at 2:52 AM Bjorn Helgaas  wrote:
> > On Fri, Jul 21, 2023 at 11:58:24AM +0800, Kai-Heng Feng wrote:
> > > On Tue, Jul 18, 2023 at 7:17 PM Bjorn Helgaas  wrote:
> > > > On Fri, May 12, 2023 at 08:00:13AM +0800, Kai-Heng Feng wrote:
> > > > > PCIe services that share an IRQ with PME, such as AER or DPC,
> > > > > may cause a spurious wakeup on system suspend. To prevent this,
> > > > > disable the AER interrupt notification during the system suspend
> > > > > process.
> > > >
> > > > I see that in this particular BZ dmesg log, PME, AER, and DPC do share
> > > > the same IRQ, but I don't think this is true in general.
> > > >
> > > > Root Ports usually use MSI or MSI-X.  PME and hotplug events use the
> > > > Interrupt Message Number in the PCIe Capability, but AER uses the one
> > > > in the AER Root Error Status register, and DPC uses the one in the DPC
> > > > Capability register.  Those potentially correspond to three distinct
> > > > MSI/MSI-X vectors.
> > > >
> > > > I think this probably has nothing to do with the IRQ being *shared*,
> > > > but just that putting the downstream component into D3cold, where the
> > > > link state is L3, may cause the upstream component to log and signal a
> > > > link-related error as the link goes completely down.
> > >
> > > That's quite likely a better explanation than my wording.
> > > Assuming AER IRQ and PME IRQ are not shared, does system get woken up
> > > by AER IRQ?
> >
> > Rafael could answer this better than I can, but
> > Documentation/power/suspend-and-interrupts.rst says device interrupts
> > are generally disabled during suspend after the "late" phase of
> > suspending devices, i.e.,
> >
> >   dpm_suspend_noirq
> >     suspend_device_irqs   <-- disable non-wakeup IRQs
> >     dpm_noirq_suspend_devices
> >       ...
> >         pci_pm_suspend_noirq  # (I assume)
> >           pci_prepare_to_sleep
> >
> > I think the downstream component would be put in D3cold by
> > pci_prepare_to_sleep(), so non-wakeup interrupts should be disabled by
> > then.
> >
> > I assume PME would generally *not* be disabled since it's needed for
> > wakeup, so I think any interrupt that shares the PME IRQ and occurs
> > during suspend may cause a spurious wakeup.
> 
> Yes, that's the case here.
> 
> > If so, it's exactly as you said at the beginning: AER/DPC/etc sharing
> > the PME IRQ may cause spurious wakeups, and we would have to disable
> > those other interrupts at the source, e.g., by clearing
> > PCI_ERR_ROOT_CMD_FATAL_EN etc (exactly as your series does).
> 
> So is the series good to be merged now?

If we merge as-is, won't we disable AER & DPC interrupts unnecessarily
in the case where the link goes to D3hot?  In that case, there's no
reason to expect interrupts related to the link going down, but things
like PTM messages still work, and they may cause errors that we should
know about.

> > > > I don't think D0-D3hot should be relevant here because in all those
> > > > states, the link should be active because the downstream config space
> > > > remains accessible.  So I'm not sure if it's possible, but I wonder if
> > > > there's a more targeted place we could do this, e.g., in the path that
> > > > puts downstream devices in D3cold.
> > >
> > > Let me try to work on this.
> > >
> > > Kai-Heng
> > >
> > > >
> > > > > As Per PCIe Base Spec 5.0, section 5.2, titled "Link State Power Management",
> > > > > TLP and DLLP transmission are disabled for a Link in L2/L3 Ready (D3hot), L2
> > > > > (D3cold with aux power) and L3 (D3cold) states. So disabling the AER
> > > > > notification during suspend and re-enabling them during the resume process
> > > > > should not affect the basic functionality.
> > > > >
> > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295
> > > > > Reviewed-by: Mika Westerberg 
> > > > > Signed-off-by: Kai-Heng Feng 
> > > > > ---
> > > > > v6:
> > > > > v5:
> > > > >  - Wording.
> > > > >
> > > > > v4:

Re: [PATCH v6 2/3] PCI/AER: Disable AER interrupt on suspend

2023-08-09 Thread Bjorn Helgaas
On Fri, Jul 21, 2023 at 11:58:24AM +0800, Kai-Heng Feng wrote:
> On Tue, Jul 18, 2023 at 7:17 PM Bjorn Helgaas  wrote:
> > On Fri, May 12, 2023 at 08:00:13AM +0800, Kai-Heng Feng wrote:
> > > PCIe services that share an IRQ with PME, such as AER or DPC,
> > > may cause a spurious wakeup on system suspend. To prevent this,
> > > disable the AER interrupt notification during the system suspend
> > > process.
> >
> > I see that in this particular BZ dmesg log, PME, AER, and DPC do share
> > the same IRQ, but I don't think this is true in general.
> >
> > Root Ports usually use MSI or MSI-X.  PME and hotplug events use the
> > Interrupt Message Number in the PCIe Capability, but AER uses the one
> > in the AER Root Error Status register, and DPC uses the one in the DPC
> > Capability register.  Those potentially correspond to three distinct
> > MSI/MSI-X vectors.
> >
> > I think this probably has nothing to do with the IRQ being *shared*,
> > but just that putting the downstream component into D3cold, where the
> > link state is L3, may cause the upstream component to log and signal a
> > link-related error as the link goes completely down.
> 
> That's quite likely a better explanation than my wording.
> Assuming AER IRQ and PME IRQ are not shared, does system get woken up
> by AER IRQ?

Rafael could answer this better than I can, but
Documentation/power/suspend-and-interrupts.rst says device interrupts
are generally disabled during suspend after the "late" phase of
suspending devices, i.e.,

  dpm_suspend_noirq
    suspend_device_irqs   <-- disable non-wakeup IRQs
    dpm_noirq_suspend_devices
      ...
        pci_pm_suspend_noirq  # (I assume)
          pci_prepare_to_sleep

I think the downstream component would be put in D3cold by
pci_prepare_to_sleep(), so non-wakeup interrupts should be disabled by
then.

I assume PME would generally *not* be disabled since it's needed for
wakeup, so I think any interrupt that shares the PME IRQ and occurs
during suspend may cause a spurious wakeup.

If so, it's exactly as you said at the beginning: AER/DPC/etc sharing
the PME IRQ may cause spurious wakeups, and we would have to disable
those other interrupts at the source, e.g., by clearing
PCI_ERR_ROOT_CMD_FATAL_EN etc (exactly as your series does).
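
For reference, "disabling at the source" amounts to a read-modify-write
of the AER Root Error Command register.  Here is a minimal userspace
sketch in C that operates on a plain register value.  The bit
definitions match include/uapi/linux/pci_regs.h, but the two helper
functions are hypothetical and stand in for the config-space accesses a
real driver would do.

```c
#include <stdint.h>

/* Root Error Command bits, as in include/uapi/linux/pci_regs.h */
#define PCI_ERR_ROOT_CMD_COR_EN		0x00000001 /* Correctable Err Reporting Enable */
#define PCI_ERR_ROOT_CMD_NONFATAL_EN	0x00000002 /* Non-Fatal Err Reporting Enable */
#define PCI_ERR_ROOT_CMD_FATAL_EN	0x00000004 /* Fatal Err Reporting Enable */

#define ROOT_CMD_INTR_MASK	(PCI_ERR_ROOT_CMD_COR_EN |	\
				 PCI_ERR_ROOT_CMD_NONFATAL_EN |	\
				 PCI_ERR_ROOT_CMD_FATAL_EN)

/* Hypothetical helper: new register value with AER interrupts off. */
static uint32_t root_cmd_disable_irq(uint32_t root_cmd)
{
	return root_cmd & ~(uint32_t)ROOT_CMD_INTR_MASK;
}

/* Hypothetical helper: new register value with AER interrupts on. */
static uint32_t root_cmd_enable_irq(uint32_t root_cmd)
{
	return root_cmd | ROOT_CMD_INTR_MASK;
}
```

Only the three enable bits change; any other bits in the register value
are preserved, so a suspend/resume pair leaves the rest untouched.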

> > I don't think D0-D3hot should be relevant here because in all those
> > states, the link should be active because the downstream config space
> > remains accessible.  So I'm not sure if it's possible, but I wonder if
> > there's a more targeted place we could do this, e.g., in the path that
> > puts downstream devices in D3cold.
> 
> Let me try to work on this.
> 
> Kai-Heng
> 
> >
> > > > As Per PCIe Base Spec 5.0, section 5.2, titled "Link State Power Management",
> > > > TLP and DLLP transmission are disabled for a Link in L2/L3 Ready (D3hot), L2
> > > > (D3cold with aux power) and L3 (D3cold) states. So disabling the AER
> > > > notification during suspend and re-enabling them during the resume process
> > > > should not affect the basic functionality.
> > >
> > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295
> > > Reviewed-by: Mika Westerberg 
> > > Signed-off-by: Kai-Heng Feng 
> > > ---
> > > v6:
> > > v5:
> > >  - Wording.
> > >
> > > v4:
> > > v3:
> > >  - No change.
> > >
> > > v2:
> > >  - Only disable AER IRQ.
> > >  - No more check on PME IRQ#.
> > >  - Use helper.
> > >
> > >  drivers/pci/pcie/aer.c | 22 ++
> > >  1 file changed, 22 insertions(+)
> > >
> > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > > index 1420e1f27105..9c07fdbeb52d 100644
> > > --- a/drivers/pci/pcie/aer.c
> > > +++ b/drivers/pci/pcie/aer.c
> > > @@ -1356,6 +1356,26 @@ static int aer_probe(struct pcie_device *dev)
> > >   return 0;
> > >  }
> > >
> > > +static int aer_suspend(struct pcie_device *dev)
> > > +{
> > > + struct aer_rpc *rpc = get_service_data(dev);
> > > + struct pci_dev *pdev = rpc->rpd;
> > > +
> > > + aer_disable_irq(pdev);
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static int aer_resume(struct pcie_device *dev)
> > > +{
> > > + struct aer_rpc *rpc = get_service_data(dev);
> > > + struct pci_dev *pdev = rpc->rpd;
> > > +
> > > + aer_enable_irq(pdev);
> > > +
> > > + return 0;
> > > +}
> > > +
> > >  /**
> > >   * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP
> > >   * @dev: pointer to Root Port, RCEC, or RCiEP
> > > @@ -1420,6 +1440,8 @@ static struct pcie_port_service_driver aerdriver = {
> > >   .service= PCIE_PORT_SERVICE_AER,
> > >
> > >   .probe  = aer_probe,
> > > + .suspend= aer_suspend,
> > > + .resume = aer_resume,
> > >   .remove = aer_remove,
> > >  };
> > >
> > > --
> > > 2.34.1
> > >


Re: [PATCH v2 1/2] PCI: Add pci_find_next_dvsec_capability to find next Designated VSEC

2023-08-08 Thread Bjorn Helgaas
Don't re-post just for this, but if you do repost, add "()" after the
function name in the subject line, as you did for the 2/2 patch.

On Tue, Aug 08, 2023 at 12:08:57PM +0800, Xiongfeng Wang wrote:
> Some devices may have several DVSEC (Designated Vendor-Specific Extended
> Capability) entries with the same DVSEC ID. Add
> pci_find_next_dvsec_capability() to find them all.


Re: [PATCH 1/2] PCI: Add pci_find_next_dvsec_capability to find next designated VSEC

2023-08-07 Thread Bjorn Helgaas
[+cc David since drivers/platform/x86/intel/vsec.c does some similar
things, although it seems to iterate over all Intel DVSEC IDs at once]

In subject:

  PCI: Add pci_find_next_dvsec_capability() to find next Designated VSEC

On Mon, Aug 07, 2023 at 11:18:45AM +0800, Xiongfeng Wang wrote:
> Some devices may have several DVSEC(Designated Vendor-Specific Extended
> Capability) entries with the same DVSEC ID. Add
> pci_find_next_dvsec_capability() to find them all.

Add space between "DVSEC" and "(Designated ...)".

> Signed-off-by: Xiongfeng Wang 

Acked-by: Bjorn Helgaas 

so you can merge this along with the ocxl patch that uses it.

> ---
>  drivers/pci/pci.c   | 37 +
>  include/linux/pci.h |  2 ++
>  2 files changed, 27 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 60230da957e0..3455ca7306ae 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -749,35 +749,48 @@ u16 pci_find_vsec_capability(struct pci_dev *dev, u16 vendor, int cap)
>  EXPORT_SYMBOL_GPL(pci_find_vsec_capability);
>  
>  /**
> - * pci_find_dvsec_capability - Find DVSEC for vendor
> + * pci_find_next_dvsec_capability - Find next DVSEC for vendor
>   * @dev: PCI device to query
> + * @start: address at which to start looking (0 to start at beginning of list)

s/address/Address/ to match other parameters

>   * @vendor: Vendor ID to match for the DVSEC
>   * @dvsec: Designated Vendor-specific capability ID

There are a lot of IDs floating around here, so to better match the
spec language:

  @dvsec: Vendor-defined DVSEC ID

> - * If DVSEC has Vendor ID @vendor and DVSEC ID @dvsec return the capability
> - * offset in config space; otherwise return 0.
> + * Returns the address of the next DVSEC if the DVSEC has Vendor ID @vendor and
> + * DVSEC ID @dvsec; otherwise return 0. DVSEC can occur several times with the
> + * same DVSEC ID for some devices, and this provides a way to find them all.
>   */
> -u16 pci_find_dvsec_capability(struct pci_dev *dev, u16 vendor, u16 dvsec)
> +u16 pci_find_next_dvsec_capability(struct pci_dev *dev, u16 start, u16 vendor,
> +                                   u16 dvsec)
>  {
> - int pos;
> + u16 pos = start;
>  
> - pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DVSEC);
> - if (!pos)
> - return 0;
> -
> - while (pos) {
> + while ((pos = pci_find_next_ext_capability(dev, pos,
> +   PCI_EXT_CAP_ID_DVSEC))) {
>   u16 v, id;
>  
>   pci_read_config_word(dev, pos + PCI_DVSEC_HEADER1, &v);
>   pci_read_config_word(dev, pos + PCI_DVSEC_HEADER2, &id);
>   if (vendor == v && dvsec == id)
>   return pos;
> -
> - pos = pci_find_next_ext_capability(dev, pos, PCI_EXT_CAP_ID_DVSEC);
>   }
>  
>   return 0;
>  }
> +EXPORT_SYMBOL_GPL(pci_find_next_dvsec_capability);
> +
> +/**
> + * pci_find_dvsec_capability - Find DVSEC for vendor
> + * @dev: PCI device to query
> + * @vendor: Vendor ID to match for the DVSEC
> + * @dvsec: Designated Vendor-specific capability ID
> + *
> + * If DVSEC has Vendor ID @vendor and DVSEC ID @dvsec return the capability
> + * offset in config space; otherwise return 0.
> + */
> +u16 pci_find_dvsec_capability(struct pci_dev *dev, u16 vendor, u16 dvsec)
> +{
> + return pci_find_next_dvsec_capability(dev, 0, vendor, dvsec);
> +}
>  EXPORT_SYMBOL_GPL(pci_find_dvsec_capability);
>  
>  /**
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index c69a2cc1f412..82bb905daf72 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1168,6 +1168,8 @@ u16 pci_find_next_ext_capability(struct pci_dev *dev, u16 pos, int cap);
>  struct pci_bus *pci_find_next_bus(const struct pci_bus *from);
>  u16 pci_find_vsec_capability(struct pci_dev *dev, u16 vendor, int cap);
>  u16 pci_find_dvsec_capability(struct pci_dev *dev, u16 vendor, u16 dvsec);
> +u16 pci_find_next_dvsec_capability(struct pci_dev *dev, u16 start, u16 vendor,
> +                                   u16 dvsec);
>  
>  u64 pci_get_dsn(struct pci_dev *dev);
>  
> -- 
> 2.20.1
> 
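
To illustrate how a caller would walk all matching DVSECs, here is a
self-contained userspace model in C that runs on a fake config space
buffer.  Everything prefixed "model_" is hypothetical and exists only
for this sketch; the constants match include/uapi/linux/pci_regs.h and
the loop structure follows the patch above, but this is not kernel
code.

```c
#include <stdint.h>

#define PCI_CFG_SPACE_SIZE	256	/* extended capabilities start here */
#define PCI_CFG_SPACE_EXP_SIZE	4096
#define PCI_EXT_CAP_ID_DVSEC	0x23
#define PCI_DVSEC_HEADER1	0x4	/* Vendor ID in bits 15:0 */
#define PCI_DVSEC_HEADER2	0x8	/* DVSEC ID in bits 15:0 */

/* Little-endian accessors for the fake config space buffer. */
static uint16_t model_read16(const uint8_t *cfg, uint16_t off)
{
	return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

static uint32_t model_read32(const uint8_t *cfg, uint16_t off)
{
	return model_read16(cfg, off) |
	       ((uint32_t)model_read16(cfg, (uint16_t)(off + 2)) << 16);
}

static void model_write32(uint8_t *cfg, uint16_t off, uint32_t val)
{
	for (int i = 0; i < 4; i++)
		cfg[off + i] = (uint8_t)(val >> (8 * i));
}

/* Test helper: place a DVSEC at @pos whose header points to @next. */
static void model_put_dvsec(uint8_t *cfg, uint16_t pos, uint16_t next,
			    uint16_t vendor, uint16_t id)
{
	model_write32(cfg, pos,
		      PCI_EXT_CAP_ID_DVSEC | (1u << 16) | ((uint32_t)next << 20));
	model_write32(cfg, (uint16_t)(pos + PCI_DVSEC_HEADER1), vendor);
	model_write32(cfg, (uint16_t)(pos + PCI_DVSEC_HEADER2), id);
}

/* Find the next extended capability @cap strictly after @start (0 = first). */
static uint16_t model_find_next_ext_cap(const uint8_t *cfg, uint16_t start,
					uint16_t cap)
{
	int ttl = (PCI_CFG_SPACE_EXP_SIZE - PCI_CFG_SPACE_SIZE) / 8;
	uint16_t pos = start ? start : PCI_CFG_SPACE_SIZE;
	uint32_t header = model_read32(cfg, pos);

	if (header == 0)
		return 0;

	while (ttl-- > 0) {
		if ((header & 0xffff) == cap && pos != start)
			return pos;

		pos = (header >> 20) & 0xffc;	/* Next Capability Offset */
		if (pos < PCI_CFG_SPACE_SIZE)
			break;

		header = model_read32(cfg, pos);
	}

	return 0;
}

/* Same loop shape as pci_find_next_dvsec_capability() in the patch. */
static uint16_t model_find_next_dvsec(const uint8_t *cfg, uint16_t start,
				      uint16_t vendor, uint16_t dvsec)
{
	uint16_t pos = start;

	while ((pos = model_find_next_ext_cap(cfg, pos, PCI_EXT_CAP_ID_DVSEC))) {
		uint16_t v = model_read16(cfg, (uint16_t)(pos + PCI_DVSEC_HEADER1));
		uint16_t id = model_read16(cfg, (uint16_t)(pos + PCI_DVSEC_HEADER2));

		if (vendor == v && dvsec == id)
			return pos;
	}

	return 0;
}
```

A caller that wants every instance would start with pos = 0 and keep
passing the previous result back in as the start until 0 is returned.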


Re: [PATCH v7 2/2] PCI: rpaphp: Error out on busy status from get-sensor-state

2023-08-01 Thread Bjorn Helgaas
sor() API which blocks the slot
> check state until RTAS call returns success. To avoid this, fix the PCI
> hotplug driver (rpaphp) to return an error (-EBUSY) if the slot presence
> state can not be detected immediately while PE is in EEH recovery state.
> Change rpaphp_get_sensor_state() to invoke rtas_call(get-sensor-state)
> directly only if the respective PE is in EEH recovery state, and take
> actions based on RTAS return status. This way EEH handler will not be
> blocked on rpaphp_get_sensor_state() and can immediately notify driver
> about the PCI error and stop any active operations.
> 
> In normal cases (non-EEH case) rpaphp_get_sensor_state() will continue to
> invoke rtas_get_sensor() as it was earlier with no change in existing
> behavior.
> 
> Signed-off-by: Mahesh Salgaonkar 
> Reviewed-by: Nathan Lynch 

Seems like maybe both patches could go via a ppc tree since they seem
very ppc-specific?  A couple minor comments below.

Acked-by: Bjorn Helgaas 

> + * get_adapter_status() can be called by the EEH handler during EEH recovery.
> + * On certain PHB failures, the RTAS call get-seHsor-state() returns extended

Looks like a typo in "get-seHsor-state"?

> +static int __rpaphp_get_sensor_state(struct slot *slot, int *state)
> +{
> +#ifdef CONFIG_EEH

Is this #ifdef redundant?  It looks like this file is only compiled
if CONFIG_EEH is set:

  config HOTPLUG_PCI_RPA
          tristate "RPA PCI Hotplug driver"
          depends on PPC_PSERIES && EEH

  obj-$(CONFIG_HOTPLUG_PCI_RPA)   += rpaphp.o

  rpaphp-objs :=  rpaphp_core.o   \
                  rpaphp_pci.o    \
                  rpaphp_slot.o

> + int rc;
> + int token = rtas_token("get-sensor-state");
> + struct pci_dn *pdn;
> + struct eeh_pe *pe;
> + struct pci_controller *phb = PCI_DN(slot->dn)->phb;
> +
> + if (token == RTAS_UNKNOWN_SERVICE)
> + return -ENOENT;
> +
> + /*
> +  * Fallback to existing method for empty slot or PE isn't in EEH
> +  * recovery.
> +  */
> + pdn = list_first_entry_or_null(&PCI_DN(phb->dn)->child_list,
> + struct pci_dn, list);
> + if (!pdn)
> + goto fallback;
> +
> + pe = eeh_dev_to_pe(pdn->edev);
> + if (pe && (pe->state & EEH_PE_RECOVERING)) {
> + rc = rtas_call(token, 2, 2, state, DR_ENTITY_SENSE,
> +slot->index);
> + return rtas_get_sensor_errno(rc);
> + }
> +fallback:
> +#endif
> + return rtas_get_sensor(DR_ENTITY_SENSE, slot->index, state);
> +}
> +
>  int rpaphp_get_sensor_state(struct slot *slot, int *state)
>  {
>   int rc;
>   int setlevel;
>  
> - rc = rtas_get_sensor(DR_ENTITY_SENSE, slot->index, state);
> + rc = __rpaphp_get_sensor_state(slot, state);
>  
>   if (rc < 0) {
>   if (rc == -EFAULT || rc == -EEXIST) {
> @@ -40,8 +117,7 @@ int rpaphp_get_sensor_state(struct slot *slot, int *state)
>   dbg("%s: power on slot[%s] failed rc=%d.\n",
>   __func__, slot->name, rc);
>   } else {
> - rc = rtas_get_sensor(DR_ENTITY_SENSE,
> -  slot->index, state);
> + rc = __rpaphp_get_sensor_state(slot, state);
>   }
>   } else if (rc == -ENODEV)
>   info("%s: slot is unusable\n", __func__);
> 
> 


Re: [PATCH v6 2/3] PCI/AER: Disable AER interrupt on suspend

2023-07-18 Thread Bjorn Helgaas
[+cc Rafael]

On Fri, May 12, 2023 at 08:00:13AM +0800, Kai-Heng Feng wrote:
> PCIe services that share an IRQ with PME, such as AER or DPC, may cause a
> spurious wakeup on system suspend. To prevent this, disable the AER interrupt
> notification during the system suspend process.

I see that in this particular BZ dmesg log, PME, AER, and DPC do share
the same IRQ, but I don't think this is true in general.

Root Ports usually use MSI or MSI-X.  PME and hotplug events use the
Interrupt Message Number in the PCIe Capability, but AER uses the one
in the AER Root Error Status register, and DPC uses the one in the DPC
Capability register.  Those potentially correspond to three distinct
MSI/MSI-X vectors.
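
As a sketch of where those three numbers come from, the following
userspace C decodes the Interrupt Message Number fields from raw
register values.  The masks match include/uapi/linux/pci_regs.h; the
function names are hypothetical, and a real driver would read the
registers from config space instead of taking them as arguments.

```c
#include <stdint.h>

/* Field masks as in include/uapi/linux/pci_regs.h */
#define PCI_EXP_FLAGS_IRQ	0x3e00		/* PCIe Capabilities: Interrupt Message Number */
#define PCI_ERR_ROOT_AER_IRQ	0xf8000000	/* Root Error Status: AER Interrupt Message Number */
#define PCI_EXP_DPC_IRQ		0x001F		/* DPC Capability: DPC Interrupt Message Number */

/* Vector index used for PME and hotplug events. */
static unsigned int pme_hp_msg_num(uint16_t pcie_flags)
{
	return (pcie_flags & PCI_EXP_FLAGS_IRQ) >> 9;
}

/* Vector index used for AER. */
static unsigned int aer_msg_num(uint32_t root_status)
{
	return (root_status & PCI_ERR_ROOT_AER_IRQ) >> 27;
}

/* Vector index used for DPC. */
static unsigned int dpc_msg_num(uint16_t dpc_cap)
{
	return dpc_cap & PCI_EXP_DPC_IRQ;
}
```

If all three decode to the same number, the services share one vector,
as in this BZ; if the device reports different numbers and enough
vectors are allocated, they can be distinct.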

I think this probably has nothing to do with the IRQ being *shared*,
but just that putting the downstream component into D3cold, where the
link state is L3, may cause the upstream component to log and signal a
link-related error as the link goes completely down.

I don't think D0-D3hot should be relevant here because in all those
states, the link should be active because the downstream config space
remains accessible.  So I'm not sure if it's possible, but I wonder if
there's a more targeted place we could do this, e.g., in the path that
puts downstream devices in D3cold.

> As Per PCIe Base Spec 5.0, section 5.2, titled "Link State Power Management",
> TLP and DLLP transmission are disabled for a Link in L2/L3 Ready (D3hot), L2
> (D3cold with aux power) and L3 (D3cold) states. So disabling the AER
> notification during suspend and re-enabling them during the resume process
> should not affect the basic functionality.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295
> Reviewed-by: Mika Westerberg 
> Signed-off-by: Kai-Heng Feng 
> ---
> v6:
> v5:
>  - Wording.
> 
> v4:
> v3:
>  - No change.
> 
> v2:
>  - Only disable AER IRQ.
>  - No more check on PME IRQ#.
>  - Use helper.
> 
>  drivers/pci/pcie/aer.c | 22 ++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 1420e1f27105..9c07fdbeb52d 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1356,6 +1356,26 @@ static int aer_probe(struct pcie_device *dev)
>   return 0;
>  }
>  
> +static int aer_suspend(struct pcie_device *dev)
> +{
> + struct aer_rpc *rpc = get_service_data(dev);
> + struct pci_dev *pdev = rpc->rpd;
> +
> + aer_disable_irq(pdev);
> +
> + return 0;
> +}
> +
> +static int aer_resume(struct pcie_device *dev)
> +{
> + struct aer_rpc *rpc = get_service_data(dev);
> + struct pci_dev *pdev = rpc->rpd;
> +
> + aer_enable_irq(pdev);
> +
> + return 0;
> +}
> +
>  /**
>   * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP
>   * @dev: pointer to Root Port, RCEC, or RCiEP
> @@ -1420,6 +1440,8 @@ static struct pcie_port_service_driver aerdriver = {
>   .service= PCIE_PORT_SERVICE_AER,
>  
>   .probe  = aer_probe,
> + .suspend= aer_suspend,
> + .resume = aer_resume,
>   .remove = aer_remove,
>  };
>  
> -- 
> 2.34.1
> 


Re: [PATCH 0/2] PCI/AER: Remove/unexport error reporting enable/disable

2023-07-13 Thread Bjorn Helgaas
On Mon, Jul 10, 2023 at 06:21:34PM -0500, Bjorn Helgaas wrote:
> From: Bjorn Helgaas 
> 
> pci_disable_pcie_error_reporting() is unused; remove it.
> pci_enable_pcie_error_reporting() is used only inside aer.c; make it
> static.
> 
> Bjorn Helgaas (2):
>   PCI/AER: Drop unused pci_disable_pcie_error_reporting()
>   PCI/AER: Unexport pci_enable_pcie_error_reporting()
> 
>  drivers/pci/pcie/aer.c | 15 +--
>  include/linux/aer.h| 11 ---
>  2 files changed, 1 insertion(+), 25 deletions(-)

Applied to pci/aer for v6.6, thanks Christoph and Sathy!


[PATCH 2/2] PCI/AER: Unexport pci_enable_pcie_error_reporting()

2023-07-10 Thread Bjorn Helgaas
From: Bjorn Helgaas 

pci_enable_pcie_error_reporting() is used only inside aer.c.  Stop exposing
it outside the file.

Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/pcie/aer.c | 3 +--
 include/linux/aer.h| 6 --
 2 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index d4c948b7c449..645149608054 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -230,7 +230,7 @@ int pcie_aer_is_native(struct pci_dev *dev)
return pcie_ports_native || host->native_aer;
 }
 
-int pci_enable_pcie_error_reporting(struct pci_dev *dev)
+static int pci_enable_pcie_error_reporting(struct pci_dev *dev)
 {
int rc;
 
@@ -240,7 +240,6 @@ int pci_enable_pcie_error_reporting(struct pci_dev *dev)
rc = pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_AER_FLAGS);
return pcibios_err_to_errno(rc);
 }
-EXPORT_SYMBOL_GPL(pci_enable_pcie_error_reporting);
 
 int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
 {
diff --git a/include/linux/aer.h b/include/linux/aer.h
index aadc9242cb20..2dd175f5debd 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -41,14 +41,8 @@ struct aer_capability_regs {
 };
 
 #if defined(CONFIG_PCIEAER)
-/* PCIe port driver needs this function to enable AER */
-int pci_enable_pcie_error_reporting(struct pci_dev *dev);
 int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
 #else
-static inline int pci_enable_pcie_error_reporting(struct pci_dev *dev)
-{
-   return -EINVAL;
-}
 static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
 {
return -EINVAL;
-- 
2.34.1



[PATCH 1/2] PCI/AER: Drop unused pci_disable_pcie_error_reporting()

2023-07-10 Thread Bjorn Helgaas
From: Bjorn Helgaas 

pci_disable_pcie_error_reporting() has no callers.  Remove it.

Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/pcie/aer.c | 12 
 include/linux/aer.h|  5 -
 2 files changed, 17 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index f6c24ded134c..d4c948b7c449 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -242,18 +242,6 @@ int pci_enable_pcie_error_reporting(struct pci_dev *dev)
 }
 EXPORT_SYMBOL_GPL(pci_enable_pcie_error_reporting);
 
-int pci_disable_pcie_error_reporting(struct pci_dev *dev)
-{
-   int rc;
-
-   if (!pcie_aer_is_native(dev))
-   return -EIO;
-
-   rc = pcie_capability_clear_word(dev, PCI_EXP_DEVCTL, PCI_EXP_AER_FLAGS);
-   return pcibios_err_to_errno(rc);
-}
-EXPORT_SYMBOL_GPL(pci_disable_pcie_error_reporting);
-
 int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
 {
int aer = dev->aer_cap;
diff --git a/include/linux/aer.h b/include/linux/aer.h
index 3a3ab05e13fd..aadc9242cb20 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -43,17 +43,12 @@ struct aer_capability_regs {
 #if defined(CONFIG_PCIEAER)
 /* PCIe port driver needs this function to enable AER */
 int pci_enable_pcie_error_reporting(struct pci_dev *dev);
-int pci_disable_pcie_error_reporting(struct pci_dev *dev);
 int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
 #else
 static inline int pci_enable_pcie_error_reporting(struct pci_dev *dev)
 {
return -EINVAL;
 }
-static inline int pci_disable_pcie_error_reporting(struct pci_dev *dev)
-{
-   return -EINVAL;
-}
 static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
 {
return -EINVAL;
-- 
2.34.1



[PATCH 0/2] PCI/AER: Remove/unexport error reporting enable/disable

2023-07-10 Thread Bjorn Helgaas
From: Bjorn Helgaas 

pci_disable_pcie_error_reporting() is unused; remove it.
pci_enable_pcie_error_reporting() is used only inside aer.c; make it
static.

Bjorn Helgaas (2):
  PCI/AER: Drop unused pci_disable_pcie_error_reporting()
  PCI/AER: Unexport pci_enable_pcie_error_reporting()

 drivers/pci/pcie/aer.c | 15 +--
 include/linux/aer.h| 11 ---
 2 files changed, 1 insertion(+), 25 deletions(-)

-- 
2.34.1



Re: [PATCH v9 00/14] pci: Work around ASMedia ASM2824 PCIe link training failures

2023-06-16 Thread Bjorn Helgaas
On Fri, Jun 16, 2023 at 01:27:52PM +0100, Maciej W. Rozycki wrote:
> On Thu, 15 Jun 2023, Bjorn Helgaas wrote:

>  As per my earlier remark:
> 
> > I think making a system halfway-fixed would make little sense, but with
> > the actual fix actually made last as you suggested I think this can be
> > split off, because it'll make no functional change by itself.
> 
> I am not perfectly happy with your rearrangement to fold the !PCI_QUIRKS 
> stub into the change carrying the actual workaround and then have the 
> reset path update with a follow-up change only, but I won't fight over it.  
> It's only one tree revision that will be in this halfway-fixed state and 
> I'll trust your judgement here.

Thanks for raising this.  Here's my thought process:

  12 PCI: Provide stub failed link recovery for device probing and hot plug
  13 PCI: Add failed link recovery for device reset events
  14 PCI: Work around PCIe link training failures

Patch 12 [1] adds calls to pcie_failed_link_retrain(), which does
nothing and returns false.  Functionally, it's a no-op, but the
structure is important later.

Patch 13 [2] claims to request failed link recovery after resets, but
actually doesn't do anything yet because pcie_failed_link_retrain() is
still a no-op, so this was a bit confusing.

Patch 14 [3] implements pcie_failed_link_retrain(), so the recovery
mentioned in 12 and 13 actually happens.  But this patch doesn't add
the call to pcie_failed_link_retrain(), so it's a little bit hard to
connect the dots.

I agree that as I rearranged it, the workaround doesn't apply in all
cases simultaneously.  Maybe not ideal, but maybe not terrible either.
Looking at it again, maybe it would have made more sense to move the
pcie_wait_for_link_delay() change to the last patch along with the
pci_dev_wait() change.  I dunno.

Bjorn
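
The stub-then-implement structure discussed above can be sketched in
user space as follows.  This is only an illustration under stated
assumptions: struct pci_dev here is a hypothetical stand-in, the
CONFIG_PCI_QUIRKS branch contains placeholder logic rather than the real
quirk, and wait_for_device() is an invented caller shaped loosely like
the reset path:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel type. */
struct pci_dev {
	int link_ok;
};

#ifdef CONFIG_PCI_QUIRKS
/* Quirks enabled: placeholder recovery logic, not the real workaround. */
static bool pcie_failed_link_retrain(struct pci_dev *dev)
{
	if (dev->link_ok)
		return false;	/* nothing to recover */
	dev->link_ok = 1;	/* pretend the retrain succeeded */
	return true;
}
#else
/* Stub: callers build and run unchanged, but no recovery ever happens. */
static inline bool pcie_failed_link_retrain(struct pci_dev *dev)
{
	(void)dev;
	return false;
}
#endif

/*
 * Invented caller: poll the device, and retry once if a retrain was
 * performed.  Returns the number of polls on success, -1 on failure.
 */
static int wait_for_device(struct pci_dev *dev)
{
	bool retrain = true;
	int attempts = 0;

	for (;;) {
		attempts++;
		if (dev->link_ok)
			return attempts;
		if (retrain) {
			retrain = false;
			if (pcie_failed_link_retrain(dev))
				continue;
		}
		return -1;
	}
}
```

With the stub in place the call site is a functional no-op, which is
why a patch that only adds the call changes nothing until the real
implementation lands.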

[1] 12 
https://lore.kernel.org/r/alpine.deb.2.21.2306111619570.64...@angie.orcam.me.uk
[2] 13 
https://lore.kernel.org/r/alpine.deb.2.21.2306111631050.64...@angie.orcam.me.uk
[3] 14 
https://lore.kernel.org/r/alpine.deb.2.21.2305310038540.59...@angie.orcam.me.uk


Re: [PATCH v9 00/14] pci: Work around ASMedia ASM2824 PCIe link training failures

2023-06-15 Thread Bjorn Helgaas
On Thu, Jun 15, 2023 at 01:41:10AM +0100, Maciej W. Rozycki wrote:
> On Wed, 14 Jun 2023, Bjorn Helgaas wrote:
> 
> > >  This is v9 of the change to work around a PCIe link training phenomenon 
> > > where a pair of devices both capable of operating at a link speed above 
> > > 2.5GT/s seems unable to negotiate the link speed and continues training 
> > > indefinitely with the Link Training bit switching on and off repeatedly 
> > > and the data link layer never reaching the active state.
> > > 
> > >  With several requests addressed and a few extra issues spotted this
> > > version has now grown to 14 patches.  It has been verified for device 
> > > enumeration with and without PCI_QUIRKS enabled, using the same piece of 
> > > RISC-V hardware as previously.  Hot plug or reset events have not been 
> > > verified, as this is difficult if at all feasible with hardware in 
> > > question.

> >  static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
> >  {
> > -   bool retrain = true;
> > int delay = 1;
> > +   bool retrain = false;
> > +   struct pci_dev *bridge;
> > +
> > +   if (pci_is_pcie(dev)) {
> > +   retrain = true;
> > +   bridge = pci_upstream_bridge(dev);
> > +   }
> 
>  If doing it this way, which I actually like, I think it would be a little 
> bit better performance- and style-wise if this was written as:
> 
>   if (pci_is_pcie(dev)) {
>   bridge = pci_upstream_bridge(dev);
>   retrain = !!bridge;
>   }
> 
> (or "retrain = bridge != NULL" if you prefer this style), and then we 
> don't have to repeatedly check two variables iff (pcie && !bridge) in the 
> loop below:

Done, thanks, I do like that better.  I did:

  bridge = pci_upstream_bridge(dev);
  if (bridge)
retrain = true;

because it seems like it flows more naturally when reading.
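
For comparison, a user-space sketch of the two shapes, assuming the
pci_is_pcie() guard from the interdiff stays in place.  The struct and
the two helper definitions are hypothetical stand-ins for the kernel
ones, and the want_retrain_*() names are invented for illustration:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel type and helpers. */
struct pci_dev {
	bool is_pcie;
	struct pci_dev *upstream;	/* NULL for a device on a root bus */
};

static bool pci_is_pcie(const struct pci_dev *dev)
{
	return dev->is_pcie;
}

static struct pci_dev *pci_upstream_bridge(const struct pci_dev *dev)
{
	return dev->upstream;
}

/* Interdiff shape: the wait loop later has to test "retrain && bridge". */
static bool want_retrain_interdiff(const struct pci_dev *dev)
{
	bool retrain = false;
	struct pci_dev *bridge = NULL;

	if (pci_is_pcie(dev)) {
		retrain = true;
		bridge = pci_upstream_bridge(dev);
	}
	return retrain && bridge;
}

/* Folded shape: the bridge check is absorbed into the flag up front. */
static bool want_retrain_folded(const struct pci_dev *dev)
{
	bool retrain = false;
	struct pci_dev *bridge;

	if (pci_is_pcie(dev)) {
		bridge = pci_upstream_bridge(dev);
		if (bridge)
			retrain = true;
	}
	return retrain;
}
```

Both functions give the same answer for every input, so the loop only
needs to test the single flag in the folded form.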

Bjorn


Re: [PATCH v9 00/14] pci: Work around ASMedia ASM2824 PCIe link training failures

2023-06-14 Thread Bjorn Helgaas
On Sun, Jun 11, 2023 at 06:19:08PM +0100, Maciej W. Rozycki wrote:
> Hi,
> 
>  This is v9 of the change to work around a PCIe link training phenomenon 
> where a pair of devices both capable of operating at a link speed above 
> 2.5GT/s seems unable to negotiate the link speed and continues training 
> indefinitely with the Link Training bit switching on and off repeatedly 
> and the data link layer never reaching the active state.
> 
>  With several requests addressed and a few extra issues spotted this
> version has now grown to 14 patches.  It has been verified for device 
> enumeration with and without PCI_QUIRKS enabled, using the same piece of 
> RISC-V hardware as previously.  Hot plug or reset events have not been 
> verified, as this is difficult if at all feasible with hardware in 
> question.
> 
>  Last iteration: 
> ,
>  
> and my input to it:
> .

Thanks, I applied these to pci/enumeration for v6.5.

I tweaked a few things, so double-check to be sure I didn't break
something:

  - Moved dev->link_active_reporting init to set_pcie_port_type()
because it does other PCIe-related stuff.

  - Reordered to keep all the link_active_reporting things together.

  - Reordered to clean up & factor pcie_retrain_link() before exposing
it to the rest of the PCI core.

  - Moved pcie_retrain_link() a little earlier to keep it next to
pcie_wait_for_link_status().

  - Squashed the stubs into the actual quirk so we don't have the
intermediate state where we call the stubs but they never do
anything (let me know if there's a reason we need your order).

  - Inline pcie_parent_link_retrain(), which seemed like it didn't add
enough to be worthwhile.

Interdiff below:

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 80694e2574b8..f11268924c8f 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1153,27 +1153,16 @@ void pci_resume_bus(struct pci_bus *bus)
pci_walk_bus(bus, pci_resume_one, NULL);
 }
 
-/**
- * pcie_parent_link_retrain - Check and retrain link we are downstream from
- * @dev: PCI device to handle.
- *
- * Return TRUE if the link was retrained, FALSE otherwise.
- */
-static bool pcie_parent_link_retrain(struct pci_dev *dev)
-{
-   struct pci_dev *bridge;
-
-   bridge = pci_upstream_bridge(dev);
-   if (bridge)
-   return pcie_failed_link_retrain(bridge);
-   else
-   return false;
-}
-
 static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
 {
-   bool retrain = true;
int delay = 1;
+   bool retrain = false;
+   struct pci_dev *bridge;
+
+   if (pci_is_pcie(dev)) {
+   retrain = true;
+   bridge = pci_upstream_bridge(dev);
+   }
 
/*
 * After reset, the device should not silently discard config
@@ -1201,9 +1190,9 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
}
 
if (delay > PCI_RESET_WAIT) {
-   if (retrain) {
+   if (retrain && bridge) {
retrain = false;
-   if (pcie_parent_link_retrain(dev)) {
+   if (pcie_failed_link_retrain(bridge)) {
delay = 1;
continue;
}
@@ -4914,6 +4903,38 @@ static bool pcie_wait_for_link_status(struct pci_dev *pdev,
return (lnksta & lnksta_mask) == lnksta_match;
 }
 
+/**
+ * pcie_retrain_link - Request a link retrain and wait for it to complete
+ * @pdev: Device whose link to retrain.
+ * @use_lt: Use the LT bit if TRUE, or the DLLLA bit if FALSE, for status.
+ *
+ * Retrain completion status is retrieved from the Link Status Register
+ * according to @use_lt.  It is not verified whether the use of the DLLLA
+ * bit is valid.
+ *
+ * Return TRUE if successful, or FALSE if training has not completed
+ * within PCIE_LINK_RETRAIN_TIMEOUT_MS milliseconds.
+ */
+bool pcie_retrain_link(struct pci_dev *pdev, bool use_lt)
+{
+   u16 lnkctl;
+
+   pcie_capability_read_word(pdev, PCI_EXP_LNKCTL, &lnkctl);
+   lnkctl |= PCI_EXP_LNKCTL_RL;
+   pcie_capability_write_word(pdev, PCI_EXP_LNKCTL, lnkctl);
+   if (pdev->clear_retrain_link) {
+   /*
+* Due to an erratum in some devices the Retrain Link bit
+* needs to be cleared again manually to allow the link
+* training to succeed.
+*/
+   lnkctl &= ~PCI_EXP_LNKCTL_RL;
+   pcie_capability_write_word(pdev, PCI_EXP_LNKCTL, lnkctl);
+   }
+
+   return pcie_wait_for_link_status(pdev, use_lt, !use_lt);
+}
+
 /**
  * pcie_wait_for_link_delay - Wait until link is active or 

Re: [PATCH v8 0/7] Add pci_dev_for_each_resource() helper and update users

2023-05-30 Thread Bjorn Helgaas
On Fri, May 12, 2023 at 02:48:51PM -0500, Bjorn Helgaas wrote:
> On Fri, May 12, 2023 at 01:56:29PM +0300, Andy Shevchenko wrote:
> > On Tue, May 09, 2023 at 01:21:22PM -0500, Bjorn Helgaas wrote:
> > > On Tue, Apr 04, 2023 at 11:11:01AM -0500, Bjorn Helgaas wrote:
> > > > On Thu, Mar 30, 2023 at 07:24:27PM +0300, Andy Shevchenko wrote:
> > > > > Provide two new helper macros to iterate over PCI device resources and
> > > > > convert users.
> > > 
> > > > Applied 2-7 to pci/resource for v6.4, thanks, I really like this!
> > > 
> > > This is 09cc90063240 ("PCI: Introduce pci_dev_for_each_resource()")
> > > upstream now.
> > > 
> > > Coverity complains about each use,
> > 
> > It needs more clarification here. Use of reduced variant of the
> > macro or all of them? If the former one, then I can speculate that
> > Coverity (famous for false positives) simply doesn't understand `for
> > (type var; var ...)` code.
> 
> True, Coverity finds false positives.  It flagged every use in
> drivers/pci and drivers/pnp.  It didn't mention the arch/alpha, arm,
> mips, powerpc, sh, or sparc uses, but I think it just didn't look at
> those.
> 
> It flagged both:
> 
>   pbus_size_io    pci_dev_for_each_resource(dev, r)
>   pbus_size_mem   pci_dev_for_each_resource(dev, r, i)
> 
> Here's a spreadsheet with a few more details (unfortunately I don't
> know how to make it dump the actual line numbers or analysis like I
> pasted below, so "pci_dev_for_each_resource" doesn't appear).  These
> are mostly in the "Drivers-PCI" component.
> 
> https://docs.google.com/spreadsheets/d/1ohOJwxqXXoDUA0gwopgk-z-6ArLvhN7AZn4mIlDkHhQ/edit?usp=sharing
> 
> These particular reports are in the "High Impact Outstanding" tab.

Where are we at?  Are we going to ignore this because some Coverity
reports are false positives?

Bjorn


Re: [PATCH v4 22/23] PCI/AER: Forward RCH downstream port-detected errors to the CXL.mem dev handler

2023-05-25 Thread Bjorn Helgaas
On Thu, May 25, 2023 at 11:29:58PM +0200, Robert Richter wrote:
> On 24.05.23 16:32:35, Bjorn Helgaas wrote:
> > On Tue, May 23, 2023 at 06:22:13PM -0500, Terry Bowman wrote:
> > > From: Robert Richter 
> > > 
> > > In Restricted CXL Device (RCD) mode a CXL device is exposed as an
> > > RCiEP, but CXL downstream and upstream ports are not enumerated and
> > > not visible in the PCIe hierarchy. Protocol and link errors are sent
> > > to an RCEC.
> > >
> > > Restricted CXL host (RCH) downstream port-detected errors are signaled
> > > as internal AER errors, either Uncorrectable Internal Error (UIE) or
> > > Corrected Internal Errors (CIE). 
> > 
> > From the parallelism with RCD above, I first thought that RCH devices
> > were non-RCD mode and *were* enumerated as part of the PCIe hierarchy,
> > but actually I suspect it's more like the following?
> > 
> >   ... but CXL downstream and upstream ports are not enumerated and not
> >   visible in the PCIe hierarchy.
> > 
> >   Protocol and link errors from these non-enumerated ports are
> >   signaled as internal AER errors ... via a CXL RCEC.
> 
> Exactly, except the RCEC is standard PCIe and also need not
> necessarily be on the same PCI bus as the CXL RCiEPs are.

So make it "RCEC" instead of "CXL RCEC", I guess?  PCIe r6.0, sec
7.9.10.3, allows an RCEC to be associated with RCiEPs on different
buses, so nothing to see there.

> > > The error source is the id of the RCEC.
> > 
> > This seems odd; I assume this refers to the RCEC's AER Error Source
> > Identification register, and the ERR_COR or ERR_FATAL/NONFATAL Source
> > Identification would ordinarily be the Requester ID of the RCiEP that
> > "sent" the Error Message.  But you're saying it's actually the ID of
> > the *RCEC*, not the RCiEP?
> 
> Right, the downstream port has its own AER ext capability in
> non-config (io mapped) RCRB register range. Errors originating from
> there are signaled as internal AER errors via the RCEC *with* the
> RCEC's Requester ID. Code walks through all associated CXL endpoints,
> determines the dport and checks its AER.
> 
> There is also an RDPAS structure defined in CXL but that is only a
> different way to provide the RCEC to dport association instead of
> using the RCEC's Endpoint Association Extended Capability. In the end
> we get all associated RCHs and check the AER of all their dports.
> 
> The upstream port is signaled using the RCiEP's AER. CXL spec is
> strict here: "Upstream Port RCRB shall not implement the AER Extended
> Capability." The RCiEP's requestor ID is used then and its config
> space the AER is in.
> 
> CXL.cachemem errors are reported with the RCiEP as requester
> too. Status is in the CXL RAS cap and the UIE or CIE is set
> respectively in the AER status of the RCiEP.
>
> > We're going to call pci_aer_handle_error() as well, to handle the
> > non-internal errors, and I'm pretty sure that path expects the RCiEP
> > ID there.
> > 
> > Whatever the answer, I'm not sure this sentence is actually relevant
> > to this patch, since this patch doesn't read PCI_ERR_ROOT_ERR_SRC or
> > look at struct aer_err_source.id.
> 
> The source id is used in aer_process_err_devices() which finally calls
> handle_error_source() for the device with the requestor id. This is
> the place where cxl_rch_handle_error() checks if it is an RCEC that
> received an internal error and has cxl devices connected to it. Then,
> the request is forwarded to the cxl_mem handler which also needs to
> check the dport now. That is, pcie_walk_rcec() in
> cxl_rch_handle_error() is called with the RCEC's pci handle,
> cxl_rch_handle_error_iter() with the RCiEP's pci handle.

I'm still not sure this is relevant.  Isn't that last sentence just
the way we always use pcie_walk_rcec()?

If there's something *different* here about CXL, and it's important to
this patch, sure.  But I don't see that yet.  Maybe a comment in the
code if you think it's important to clarify something there.

Bjorn


Re: [PATCH v4 22/23] PCI/AER: Forward RCH downstream port-detected errors to the CXL.mem dev handler

2023-05-24 Thread Bjorn Helgaas
On Tue, May 23, 2023 at 06:22:13PM -0500, Terry Bowman wrote:
> From: Robert Richter 
> 
> In Restricted CXL Device (RCD) mode a CXL device is exposed as an
> RCiEP, but CXL downstream and upstream ports are not enumerated and
> not visible in the PCIe hierarchy. Protocol and link errors are sent
> to an RCEC.
>
> Restricted CXL host (RCH) downstream port-detected errors are signaled
> as internal AER errors, either Uncorrectable Internal Error (UIE) or
> Corrected Internal Errors (CIE). 

From the parallelism with RCD above, I first thought that RCH devices
were non-RCD mode and *were* enumerated as part of the PCIe hierarchy,
but actually I suspect it's more like the following?

  ... but CXL downstream and upstream ports are not enumerated and not
  visible in the PCIe hierarchy.

  Protocol and link errors from these non-enumerated ports are
  signaled as internal AER errors ... via a CXL RCEC.

> The error source is the id of the RCEC.

This seems odd; I assume this refers to the RCEC's AER Error Source
Identification register, and the ERR_COR or ERR_FATAL/NONFATAL Source
Identification would ordinarily be the Requester ID of the RCiEP that
"sent" the Error Message.  But you're saying it's actually the ID of
the *RCEC*, not the RCiEP?

We're going to call pci_aer_handle_error() as well, to handle the
non-internal errors, and I'm pretty sure that path expects the RCiEP
ID there.

Whatever the answer, I'm not sure this sentence is actually relevant
to this patch, since this patch doesn't read PCI_ERR_ROOT_ERR_SRC or
look at struct aer_err_source.id.

> A CXL handler must then inspect the error status in various CXL
> registers residing in the dport's component register space (CXL RAS
> capability) or the dport's RCRB (PCIe AER extended capability). [1]
> 
> Errors showing up in the RCEC's error handler must be handled and
> connected to the CXL subsystem. Implement this by forwarding the error
> to all CXL devices below the RCEC. Since the entire CXL device is
> controlled only using PCIe Configuration Space of device 0, function
> 0, only pass it there [2]. The error handling is limited to currently
> supported devices with the Memory Device class code set
> (PCI_CLASS_MEMORY_CXL, 502h), where the handler can be implemented in
> the existing cxl_pci driver. Support of CXL devices (e.g. a CXL.cache
> device) can be enabled later.

I assume the Memory Devices are CXL devices, so maybe "Error handling
for *other* CXL devices ... can be enabled later"?  

IIUC, this happens via cxl_rch_handle_error_iter() calling
pci_error_handlers for CXL RCiEPs.  Maybe the is_cxl_mem_dev() check
belongs inside those handlers, since that driver claimed the RCiEP and
should know its functionality?  Maybe is_internal_error() and
cxl_error_is_native(), too?

> In addition to errors directed to the CXL endpoint device, a handler
> must also inspect the CXL RAS and PCIe AER capabilities of the CXL
> downstream port that is connected to the device.
> 
> Since CXL downstream port errors are signaled using internal errors,
> the handler requires those errors to be unmasked. This is the subject of a
> follow-on patch.
> 
> The reason for choosing this implementation is that a CXL RCEC device
> is bound to the AER port driver,

  ... is that the AER service driver claims the CXL RCEC device, but
  does not allow registration of a CXL sub-service driver ...

> but the driver does not allow it to
> register a custom specific handler to support CXL. Connecting the RCEC
> hard-wired with a CXL handler does not work, as the CXL subsystem
> might not be present all the time. The alternative to add an
> implementation to the portdrv to allow the registration of a custom
> RCEC error handler isn't worth doing it as CXL would be its only user.
> Instead, just check for a CXL RCEC and pass it down to the connected
> CXL device's error handler. With this approach the code can entirely
> be implemented in the PCIe AER driver and is independent of the CXL
> subsystem. The CXL driver only provides the handler.
> 
> [1] CXL 3.0 spec, 12.2.1.1 RCH Downstream Port-detected Errors
> [2] CXL 3.0 spec, 8.1.3 PCIe DVSEC for CXL Devices
> 
> Co-developed-by: Terry Bowman 
> Signed-off-by: Terry Bowman 
> Signed-off-by: Robert Richter 
> Cc: "Oliver O'Halloran" 
> Cc: Bjorn Helgaas 
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-...@vger.kernel.org

Given the questions are minor:

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/pcie/Kconfig |  12 +
>  drivers/pci/pcie/aer.c   | 100 ++-
>  2 files changed, 110 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig
> index 228652a59f27..4f0e70fafe2d 100644
> --- a/drivers/pci/pcie/Kconfig
> +++ b/driv

Re: [PATCHv2 pci-next 2/2] PCI/AER: Rate limit the reporting of the correctable errors

2023-05-17 Thread Bjorn Helgaas
On Fri, Apr 07, 2023 at 04:46:03PM -0700, Grant Grundler wrote:
> On Fri, Apr 7, 2023 at 12:46 PM Bjorn Helgaas  wrote:
> > On Fri, Apr 07, 2023 at 11:53:27AM -0700, Grant Grundler wrote:
> > > On Thu, Apr 6, 2023 at 12:50 PM Bjorn Helgaas wrote:
> > > > On Fri, Mar 17, 2023 at 10:51:09AM -0700, Grant Grundler wrote:
> > > > > From: Rajat Khandelwal 
> > > > >
> > > > > There are many instances where correctable errors tend to inundate
> > > > > the message buffer. We observe such instances during thunderbolt PCIe
> > > > > tunneling.
> > > ...
> >
> > > > >   if (info->severity == AER_CORRECTABLE)
> > > > > - pci_info(dev, "   [%2d] %-22s%s\n", i, errmsg,
> > > > > - info->first_error == i ? " (First)" :
> > "");
> > > > > + pci_info_ratelimited(dev, "   [%2d]
> > %-22s%s\n", i, errmsg,
> > > > > +  info->first_error == i ?
> > " (First)" : "");
> > > >
> > > > I don't think this is going to reliably work the way we want.  We have
> > > > a bunch of pci_info_ratelimited() calls, and each caller has its own
> > > > ratelimit_state data.  Unless we call pci_info_ratelimited() exactly
> > > > the same number of times for each error, the ratelimit counters will
> > > > get out of sync and we'll end up printing fragments from error A mixed
> > > > with fragments from error B.
> > >
> > > Ok - what I'm reading between the lines here is the output should be
> > > emitted in one step, not multiple pci_info_ratelimited() calls. if the
> > > code built an output string (using snprintf()), and then called
> > > pci_info_ratelimited() exactly once at the bottom, would that be
> > > sufficient?
> > >
> > > > I think we need to explicitly manage the ratelimiting ourselves,
> > > > similar to print_hmi_event_info() or print_extlog_rcd().  Then we can
> > > > have a *single* ratelimit_state, and we can check it once to determine
> > > > whether to log this correctable error.
> > >
> > > Is the rate limiting per call location or per device? From above, I
> > > understood rate limiting is "per call location".  If the code only
> > > has one call location, it should achieve the same goal, right?
> >
> > Rate-limiting is per call location, so yes, if we only have one call
> > location, that would solve it.  It would also have the nice property
> > that all the output would be atomic so it wouldn't get mixed with
> > other stuff, and it might encourage us to be a little less wordy in
> > the output.
> >
> 
> +1 to all of those reasons. Especially reducing the number of lines output.
> 
> I'm going to be out for the next week. If someone else (Rajat Khandelwal
> maybe?) wants to rework this to use one call location it should be fairly
> straightforward. If not, I'll tackle this when I'm back (in 2 weeks
> essentially).

Ping?  Really hoping to merge this for v6.5.

Bjorn
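
The "single ratelimit_state, one call location" idea from this thread
can be sketched in user space as below.  Everything here is an
assumption made for illustration: struct ratelimit is a hypothetical
analogue of the kernel's struct ratelimit_state, time is passed in as
plain ticks, and report_correctable() is an invented name, not the AER
code.  The point is that the decision is made once per error and the
whole report is built into one buffer, so either all of it is emitted
or none of it:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical user-space analogue of the kernel's ratelimit state. */
struct ratelimit {
	unsigned long interval;	/* window length, in caller-supplied ticks */
	unsigned int burst;	/* reports allowed per window */
	unsigned long begin;	/* start of the current window */
	unsigned int printed;	/* reports allowed so far in this window */
};

/* One decision per error: may the caller log now? */
static bool ratelimit_ok(struct ratelimit *rs, unsigned long now)
{
	if (now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
	}
	if (rs->printed >= rs->burst)
		return false;
	rs->printed++;
	return true;
}

/*
 * Build the complete report for one correctable error into buf.
 * Returns the report length, 0 if the report was suppressed, or -1 if
 * buf is too small.
 */
static int report_correctable(struct ratelimit *rs, unsigned long now,
			      unsigned int status, char *buf, size_t len)
{
	size_t off;
	int n, bit;

	if (!ratelimit_ok(rs, now))
		return 0;	/* whole report dropped, never a fragment */

	n = snprintf(buf, len, "correctable error, status=%#x:", status);
	if (n < 0 || (size_t)n >= len)
		return -1;
	off = (size_t)n;

	/* Append one entry per set status bit to the same buffer. */
	for (bit = 0; bit < 32; bit++) {
		if (!(status & (1u << bit)))
			continue;
		n = snprintf(buf + off, len - off, " [%d]", bit);
		if (n < 0 || (size_t)n >= len - off)
			return -1;
		off += (size_t)n;
	}
	return (int)off;
}
```

A caller would print buf with a single log call whenever the return
value is positive, which also keeps the output for one error from
interleaving with other messages.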


Re: [PATCH v8 0/7] Add pci_dev_for_each_resource() helper and update users

2023-05-12 Thread Bjorn Helgaas
On Fri, May 12, 2023 at 01:56:29PM +0300, Andy Shevchenko wrote:
> On Tue, May 09, 2023 at 01:21:22PM -0500, Bjorn Helgaas wrote:
> > On Tue, Apr 04, 2023 at 11:11:01AM -0500, Bjorn Helgaas wrote:
> > > On Thu, Mar 30, 2023 at 07:24:27PM +0300, Andy Shevchenko wrote:
> > > > Provide two new helper macros to iterate over PCI device resources and
> > > > convert users.
> > 
> > > Applied 2-7 to pci/resource for v6.4, thanks, I really like this!
> > 
> > This is 09cc90063240 ("PCI: Introduce pci_dev_for_each_resource()")
> > upstream now.
> > 
> > Coverity complains about each use,
> 
> It needs more clarification here. Use of reduced variant of the
> macro or all of them? If the former one, then I can speculate that
> Coverity (famous for false positives) simply doesn't understand `for
> (type var; var ...)` code.

True, Coverity finds false positives.  It flagged every use in
drivers/pci and drivers/pnp.  It didn't mention the arch/alpha, arm,
mips, powerpc, sh, or sparc uses, but I think it just didn't look at
those.

It flagged both:

  pbus_size_io    pci_dev_for_each_resource(dev, r)
  pbus_size_mem   pci_dev_for_each_resource(dev, r, i)

Here's a spreadsheet with a few more details (unfortunately I don't
know how to make it dump the actual line numbers or analysis like I
pasted below, so "pci_dev_for_each_resource" doesn't appear).  These
are mostly in the "Drivers-PCI" component.

https://docs.google.com/spreadsheets/d/1ohOJwxqXXoDUA0gwopgk-z-6ArLvhN7AZn4mIlDkHhQ/edit?usp=sharing

These particular reports are in the "High Impact Outstanding" tab.

> > sample below from
> > drivers/pci/vgaarb.c.  I didn't investigate at all, so it might be a
> > false positive; just FYI.
> > 
> >   1. Condition screen_info.capabilities & (2U /* 1 << 1 */), taking true branch.
> >   556    if (screen_info.capabilities & VIDEO_CAPABILITY_64BIT_BASE)
> >   557        base |= (u64)screen_info.ext_lfb_base << 32;
> >   558
> >   559    limit = base + size;
> >   560
> >   561    /* Does firmware framebuffer belong to us? */
> >   2. Condition __b < PCI_NUM_RESOURCES, taking true branch.
> >   3. Condition (r = &pdev->resource[__b]) , (__b < PCI_NUM_RESOURCES), taking true branch.
> >   6. Condition __b < PCI_NUM_RESOURCES, taking true branch.
> >   7. cond_at_most: Checking __b < PCI_NUM_RESOURCES implies that __b may be up to 16 on the true branch.
> >   8. Condition (r = &pdev->resource[__b]) , (__b < PCI_NUM_RESOURCES), taking true branch.
> >   11. incr: Incrementing __b. The value of __b may now be up to 17.
> >   12. alias: Assigning: r = &pdev->resource[__b]. r may now point to as high as element 17 of pdev->resource (which consists of 17 64-byte elements).
> >   13. Condition __b < PCI_NUM_RESOURCES, taking true branch.
> >   14. Condition (r = &pdev->resource[__b]) , (__b < PCI_NUM_RESOURCES), taking true branch.
> >   562    pci_dev_for_each_resource(pdev, r) {
> >   4. Condition resource_type(r) != 512, taking true branch.
> >   9. Condition resource_type(r) != 512, taking true branch.
> > 
> >   CID 1529911 (#1 of 1): Out-of-bounds read (OVERRUN)
> >   15. overrun-local: Overrunning array of 1088 bytes at byte offset 1088 by dereferencing pointer r.
> >   563        if (resource_type(r) != IORESOURCE_MEM)
> >   5. Continuing loop.
> >   10. Continuing loop.
> >   564            continue;
> 
> -- 
> With Best Regards,
> Andy Shevchenko
> 
> 


Re: [EXT] Re: [PATCH v2 1/1] PCI: layerscape: Add the endpoint linkup notifier support

2023-05-09 Thread Bjorn Helgaas
On Mon, May 08, 2023 at 09:45:59PM +0000, Frank Li wrote:
> > > > Subject: [EXT] Re: [PATCH v2 1/1] PCI: layerscape: Add the endpoint
> > linkup
> > > > notifier support
> > 
> > All these quoted headers are redundant clutter since we've already
> > seen them when Manivannan sent his comments.  It would be nice if your
> > mailer could be configured to omit them.
> 
> Our email client quite stupid. 

Yeah, sometimes those are really hard to work around.

Bjorn


Re: [PATCH v8 0/7] Add pci_dev_for_each_resource() helper and update users

2023-05-09 Thread Bjorn Helgaas
On Tue, Apr 04, 2023 at 11:11:01AM -0500, Bjorn Helgaas wrote:
> On Thu, Mar 30, 2023 at 07:24:27PM +0300, Andy Shevchenko wrote:
> > Provide two new helper macros to iterate over PCI device resources and
> > convert users.

> Applied 2-7 to pci/resource for v6.4, thanks, I really like this!

This is 09cc90063240 ("PCI: Introduce pci_dev_for_each_resource()")
upstream now.

Coverity complains about each use, sample below from
drivers/pci/vgaarb.c.  I didn't investigate at all, so it might be a
false positive; just FYI.

  1. Condition screen_info.capabilities & (2U /* 1 << 1 */), taking
     true branch.
  556        if (screen_info.capabilities & VIDEO_CAPABILITY_64BIT_BASE)
  557                base |= (u64)screen_info.ext_lfb_base << 32;
  558
  559        limit = base + size;
  560
  561        /* Does firmware framebuffer belong to us? */
  2. Condition __b < PCI_NUM_RESOURCES, taking true branch.
  3. Condition (r = &pdev->resource[__b]) , (__b < PCI_NUM_RESOURCES),
     taking true branch.
  6. Condition __b < PCI_NUM_RESOURCES, taking true branch.
  7. cond_at_most: Checking __b < PCI_NUM_RESOURCES implies that __b
     may be up to 16 on the true branch.
  8. Condition (r = &pdev->resource[__b]) , (__b < PCI_NUM_RESOURCES),
     taking true branch.
  11. incr: Incrementing __b. The value of __b may now be up to 17.
  12. alias: Assigning: r = &pdev->resource[__b]. r may now point to as
      high as element 17 of pdev->resource (which consists of 17 64-byte
      elements).
  13. Condition __b < PCI_NUM_RESOURCES, taking true branch.
  14. Condition (r = &pdev->resource[__b]) , (__b < PCI_NUM_RESOURCES),
      taking true branch.
  562        pci_dev_for_each_resource(pdev, r) {
  4. Condition resource_type(r) != 512, taking true branch.
  9. Condition resource_type(r) != 512, taking true branch.

  CID 1529911 (#1 of 1): Out-of-bounds read (OVERRUN)
  15. overrun-local: Overrunning array of 1088 bytes at byte offset 1088 by
      dereferencing pointer r. [show details]
  563                if (resource_type(r) != IORESOURCE_MEM)
  5. Continuing loop.
  10. Continuing loop.
  564                        continue;


Re: [EXT] Re: [PATCH v2 1/1] PCI: layerscape: Add the endpoint linkup notifier support

2023-05-08 Thread Bjorn Helgaas
On Mon, May 08, 2023 at 01:31:26PM +0000, Frank Li wrote:
> > -Original Message-
> > From: Manivannan Sadhasivam 
> > Sent: Saturday, May 6, 2023 2:59 AM
> > To: Frank Li 
> > Cc: M.H. Lian ; Mingkai Hu
> > ; Roy Zang ; Lorenzo Pieralisi
> > ; Rob Herring ; Krzysztof
> > Wilczyński ; Bjorn Helgaas ; open
> > list:PCI DRIVER FOR FREESCALE LAYERSCAPE ;
> > open list:PCI DRIVER FOR FREESCALE LAYERSCAPE  > p...@vger.kernel.org>; moderated list:PCI DRIVER FOR FREESCALE
> > LAYERSCAPE ; open list  > ker...@vger.kernel.org>; i...@lists.linux.dev
> > Subject: [EXT] Re: [PATCH v2 1/1] PCI: layerscape: Add the endpoint linkup
> > notifier support

All these quoted headers are redundant clutter since we've already
seen them when Manivannan sent his comments.  It would be nice if your
mailer could be configured to omit them.

> > > +static int ls_pcie_ep_interrupt_init(struct ls_pcie_ep *pcie,
> > > +  struct platform_device *pdev)
> > > +{
> > > + u32 val;
> > > + int ret;
> > > +
> > > + pcie->irq = platform_get_irq_byname(pdev, "pme");
> > > + if (pcie->irq < 0) {
> > > + dev_err(&pdev->dev, "Can't get 'pme' IRQ\n");
> > 
> > PME
> 
> Here should be dts property `pme`, suppose should match
> platform_get_irq_byname(pdev, "pme");

You can also edit out all the other context and questions if you're
not responding to them.

There were a lot of other comments that were useful but are not
relevant to this reply.

Bjorn


Re: [PATCH v4 2/3] PCI/AER: Disable AER interrupt on suspend

2023-05-05 Thread Bjorn Helgaas
On Mon, Apr 24, 2023 at 01:52:48PM +0800, Kai-Heng Feng wrote:
> PCIe service that shares IRQ with PME may cause spurious wakeup on
> system suspend.
> 
> PCIe Base Spec 5.0, section 5.2 "Link State Power Management" states
> that TLP and DLLP transmission is disabled for a Link in L2/L3 Ready
> (D3hot), L2 (D3cold with aux power) and L3 (D3cold), so we don't lose
> much here to disable AER during system suspend.
> 
> This is very similar to previous attempts to suspend AER and DPC [1],
> but with a different reason.

What is the reason?  I assume it's something to do with the bugzilla
below, but the commit log should outline the user-visible problem this
fixes.  The commit log basically makes the case for "why should we
merge this patch."

I assume it's along the lines of "I tried to suspend this system, but
it immediately woke up again because of an AER interrupt, and
disabling AER during suspend avoids this problem.  And disabling
the AER interrupt is not a problem because X."

> [1] 
> https://lore.kernel.org/linux-pci/20220408153159.106741-1-kai.heng.f...@canonical.com/
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295
> 
> Reviewed-by: Mika Westerberg 
> Signed-off-by: Kai-Heng Feng 
> ---
>  drivers/pci/pcie/aer.c | 22 ++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 1420e1f27105..9c07fdbeb52d 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1356,6 +1356,26 @@ static int aer_probe(struct pcie_device *dev)
>   return 0;
>  }
>  
> +static int aer_suspend(struct pcie_device *dev)
> +{
> + struct aer_rpc *rpc = get_service_data(dev);
> + struct pci_dev *pdev = rpc->rpd;
> +
> + aer_disable_irq(pdev);
> +
> + return 0;
> +}
> +
> +static int aer_resume(struct pcie_device *dev)
> +{
> + struct aer_rpc *rpc = get_service_data(dev);
> + struct pci_dev *pdev = rpc->rpd;
> +
> + aer_enable_irq(pdev);
> +
> + return 0;
> +}
> +
>  /**
>   * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP
>   * @dev: pointer to Root Port, RCEC, or RCiEP
> @@ -1420,6 +1440,8 @@ static struct pcie_port_service_driver aerdriver = {
>   .service= PCIE_PORT_SERVICE_AER,
>  
>   .probe  = aer_probe,
> + .suspend= aer_suspend,
> + .resume = aer_resume,
>   .remove = aer_remove,
>  };
>  
> -- 
> 2.34.1
> 


Re: [PATCH v8 2/7] PCI: Export PCI link retrain timeout

2023-05-04 Thread Bjorn Helgaas
On Thu, Apr 06, 2023 at 01:21:09AM +0100, Maciej W. Rozycki wrote:
> Rename LINK_RETRAIN_TIMEOUT to PCIE_LINK_RETRAIN_TIMEOUT and make it
> available via "pci.h" for PCI drivers to use.

> +#define PCIE_LINK_RETRAIN_TIMEOUT HZ

This is basically just a rename and move, but since we're touching it
anyway, can we make it "PCIE_LINK_RETRAIN_TIMEOUT_MS 1000" here and
use msecs_to_jiffies() below?

I know jiffies and HZ are probably idiomatic elsewhere in the kernel,
and this particular timeout is arbitrary and not based on anything in
the spec, but many of the delays in PCI *are* straight from a spec, so
I'd like to make the units more explicit.
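
Roughly what I have in mind, as a userspace sketch.  MODEL_HZ and
model_msecs_to_jiffies() are stand-ins invented here so this compiles
on its own; in the kernel the real msecs_to_jiffies() and HZ would be
used, and retrain_deadline() is only a hypothetical name for the
"jiffies + timeout" computation.

```c
#include <assert.h>

#define MODEL_HZ                      250UL  /* assumed tick rate, sketch only */
#define PCIE_LINK_RETRAIN_TIMEOUT_MS  1000U  /* units visible in the name */

/* Stand-in for msecs_to_jiffies(): convert milliseconds to ticks, rounding up. */
static unsigned long model_msecs_to_jiffies(unsigned int ms)
{
    return ((unsigned long)ms * MODEL_HZ + 999UL) / 1000UL;
}

/* Deadline for the retrain poll loop, given the current tick count. */
static unsigned long retrain_deadline(unsigned long now)
{
    unsigned long ticks = model_msecs_to_jiffies(PCIE_LINK_RETRAIN_TIMEOUT_MS);

    assert(ticks > 0);  /* a zero-length timeout would never poll */
    return now + ticks;
}
```

The timeout stays one second regardless of the tick rate, and anybody
comparing it with a delay from the spec can read the units straight off
the macro name.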

>  extern const unsigned char pcie_link_speed[];
>  extern bool pci_early_dump;
>  
> Index: linux-macro/drivers/pci/pcie/aspm.c
> ===
> --- linux-macro.orig/drivers/pci/pcie/aspm.c
> +++ linux-macro/drivers/pci/pcie/aspm.c
> @@ -90,8 +90,6 @@ static const char *policy_str[] = {
>   [POLICY_POWER_SUPERSAVE] = "powersupersave"
>  };
>  
> -#define LINK_RETRAIN_TIMEOUT HZ
> -
>  /*
>   * The L1 PM substate capability is only implemented in function 0 in a
>   * multi function device.
> @@ -213,7 +211,7 @@ static bool pcie_retrain_link(struct pci
>   }
>  
>   /* Wait for link training end. Break out after waiting for timeout */
> - end_jiffies = jiffies + LINK_RETRAIN_TIMEOUT;
> + end_jiffies = jiffies + PCIE_LINK_RETRAIN_TIMEOUT;
>   do {
>   pcie_capability_read_word(parent, PCI_EXP_LNKSTA, &reg16);
>   if (!(reg16 & PCI_EXP_LNKSTA_LT))


Re: [PATCH v8 7/7] PCI: Work around PCIe link training failures

2023-05-04 Thread Bjorn Helgaas
On Thu, Apr 06, 2023 at 01:21:31AM +0100, Maciej W. Rozycki wrote:
> Attempt to handle cases such as with a downstream port of the ASMedia 
> ASM2824 PCIe switch where link training never completes and the link 
> continues switching between speeds indefinitely with the data link layer 
> never reaching the active state.

We're going to land this series this cycle, come hell or high water.

We talked about reusing pcie_retrain_link() earlier.  IIRC that didn't
work: ASPM needs to use PCI_EXP_LNKSTA_LT because not all devices
support PCI_EXP_LNKSTA_DLLLA, and you need PCI_EXP_LNKSTA_DLLLA
because the erratum makes PCI_EXP_LNKSTA_LT flap.

What if we made pcie_retrain_link() reusable by making it:

  bool pcie_retrain_link(struct pci_dev *pdev, u16 link_status_bit)

so ASPM could use pcie_retrain_link(link->pdev, PCI_EXP_LNKSTA_LT) and
you could use pcie_retrain_link(dev, PCI_EXP_LNKSTA_DLLLA)?

Maybe do it two steps?

  1) Move pcie_retrain_link() just after pcie_wait_for_link() and make
  it take link->pdev instead of link.

  2) Add the bit parameter.
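
To make the idea concrete, here's a userspace sketch of a wait loop
that takes the status bit as a parameter.  struct model_link,
model_read_lnksta() and model_wait_for_link_status() are invented for
illustration and replay a canned sequence of Link Status values instead
of touching hardware; the two bit values are the ones I believe
pci_regs.h uses.  Note the polarity differs: the ASPM loop waits for LT
to *clear*, while your loop waits for DLLLA to be *set*, so the sketch
takes the wanted state as well as the bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_EXP_LNKSTA_LT     0x0800  /* Link Training in progress */
#define PCI_EXP_LNKSTA_DLLLA  0x2000  /* Data Link Layer Link Active */

/* Hypothetical fake device: replays a fixed sequence of LNKSTA reads. */
struct model_link {
    const uint16_t *lnksta_seq;
    size_t len;
    size_t pos;
};

/* Return the next canned value; the last one repeats forever. */
static uint16_t model_read_lnksta(struct model_link *l)
{
    uint16_t v;

    assert(l->len > 0);
    v = l->lnksta_seq[l->pos];
    if (l->pos + 1 < l->len)
        l->pos++;
    return v;
}

/* Poll until 'bit' reaches the wanted state or max_polls reads are used up. */
static bool model_wait_for_link_status(struct model_link *l, uint16_t bit,
                                       bool want_set, unsigned int max_polls)
{
    for (unsigned int i = 0; i < max_polls; i++) {
        bool set = (model_read_lnksta(l) & bit) != 0;

        if (set == want_set)
            return true;
    }
    return false;
}
```

With a sequence where LT flaps and DLLLA never comes up, waiting for LT
to clear reports success on the first flap while waiting for DLLLA
times out, which is the distinction the two callers care about.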

I'm OK with having pcie_retrain_link() in pci.c, but the surrounding
logic about restricting to 2.5GT/s, retraining, removing the
restriction, retraining again is stuff I'd rather have in quirks.c so
it doesn't clutter pci.c.

I think it'd be good if the pci_device_add() path made clear that this
is a workaround for a problem, e.g.,

  void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
  {
...
if (pcie_link_failed(dev))
  pcie_fix_link_train(dev);

where pcie_fix_link_train() could live in quirks.c (with a stub when
CONFIG_PCI_QUIRKS isn't enabled).  It *might* even be worth adding it
and the stub first because that's a trivial patch and wouldn't clutter
the probe.c git history with all the grotty details about ASM2824 and
this topology.
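
The stub arrangement could look something like the sketch below.  This
is a userspace model, not kernel code: MODEL_CONFIG_PCI_QUIRKS,
struct model_pci_dev and the model_* functions are all made up to show
the shape, with the config symbol left undefined so the stub is what
gets compiled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device: just enough state to observe what happened. */
struct model_pci_dev {
    bool link_failed;
    int fixes_attempted;
};

#ifdef MODEL_CONFIG_PCI_QUIRKS
/* Would live with the other quirks; the grotty details stay there. */
static int model_fix_link_train(struct model_pci_dev *dev)
{
    dev->fixes_attempted++;
    return 0;
}
#else
/* Stub used when quirks are not built in. */
static inline int model_fix_link_train(struct model_pci_dev *dev)
{
    (void)dev;
    return 0;
}
#endif

/* The add path only says "if the link failed, try the workaround". */
static void model_device_add(struct model_pci_dev *dev)
{
    assert(dev);
    if (dev->link_failed)
        model_fix_link_train(dev);
}
```

The caller reads the same either way, and the workaround details never
show up in the add path.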

> +int pcie_downstream_link_retrain(struct pci_dev *dev)
> +{
> + static const struct pci_device_id ids[] = {
> + { PCI_VDEVICE(ASMEDIA, 0x2824) }, /* ASMedia ASM2824 */
> + {}
> + };
> + u16 lnksta, lnkctl2;
> +
> + if (!pci_is_pcie(dev) || !pcie_downstream_port(dev) ||
> + !pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting)
> + return -1;
> +
> + pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2);
> + pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
> + if ((lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) ==
> + PCI_EXP_LNKSTA_LBMS) {

You go to some trouble to make sure PCI_EXP_LNKSTA_LBMS is set, and I
can't remember what the reason is.  If you make a preparatory patch
like this, it would give a place for that background, e.g.,

  +bool pcie_link_failed(struct pci_dev *dev)
  +{
  +   u16 lnksta;
  +
  +   if (!pci_is_pcie(dev) || !pcie_downstream_port(dev) ||
  +   !pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting)
  +   return false;
  +
  +   pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
  +   if ((lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) ==
  +   PCI_EXP_LNKSTA_LBMS)
  +   return true;
  +
  +   return false;
  +}

If this is a generic thing and checking PCI_EXP_LNKSTA_LBMS makes
sense for everybody, it could go in pci.c; otherwise it could go in
quirks.c as well.  I guess it's not *truly* generic anyway because it
only detects link training failures for devices that have LNKCTL2 and
link_active_reporting.
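
Just to spell out which Link Status values that test accepts, here is
the mask comparison on its own as a userspace sketch.
model_link_failed() is a hypothetical name and takes the raw register
value rather than a pci_dev; the bit values are what I believe
pci_regs.h defines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PCI_EXP_LNKSTA_DLLLA  0x2000  /* Data Link Layer Link Active */
#define PCI_EXP_LNKSTA_LBMS   0x4000  /* Link Bandwidth Management Status */

/* True only when LBMS is set and DLLLA is clear; other bits are ignored. */
static bool model_link_failed(uint16_t lnksta)
{
    const uint16_t mask = PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA;

    assert((mask & PCI_EXP_LNKSTA_LBMS) != 0);  /* sanity check on the defines */
    return (lnksta & mask) == PCI_EXP_LNKSTA_LBMS;
}
```

So an inactive link with LBMS clear is *not* treated as failed, which
is the part I'd like the commit log or a comment to explain.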

> + unsigned long timeout;
> + u16 lnkctl;
> +
> + pci_info(dev, "broken device, retraining non-functional 
> downstream link at 2.5GT/s\n");
> +
> + pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
> + lnkctl |= PCI_EXP_LNKCTL_RL;
> + lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
> + lnkctl2 |= PCI_EXP_LNKCTL2_TLS_2_5GT;
> + pcie_capability_write_word(dev, PCI_EXP_LNKCTL2, lnkctl2);
> + pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl);
> + /*
> +  * Due to an erratum in some devices the Retrain Link bit
> +  * needs to be cleared again manually to allow the link
> +  * training to succeed.
> +  */
> + lnkctl &= ~PCI_EXP_LNKCTL_RL;
> + if (dev->clear_retrain_link)
> + pcie_capability_write_word(dev, PCI_EXP_LNKCTL,
> +lnkctl);
> +
> + timeout = jiffies + PCIE_LINK_RETRAIN_TIMEOUT;
> + do {
> + pcie_capability_read_word(dev, PCI_EXP_LNKSTA,
> +  &lnksta);
> + if (lnksta & PCI_EXP_LNKSTA_DLLLA)
> + break;
> + usleep_range(1, 2);
> + } while (time_before(jiffies, timeout));
> +
> + if (!(lnksta & PCI_EXP_LNKSTA_DLLLA)) {
> + pci_info(dev, "retraining 

Re: [PATCH 1/1] PCI: layerscape: Add the endpoint linkup notifier support

2023-04-28 Thread Bjorn Helgaas
On Thu, Apr 20, 2023 at 06:11:17PM -0400, Frank Li wrote:
> Layerscape has PME interrupt, which can be use as linkup notifer.
> Set CFG_READY bit when linkup detected.

s/use/used/
s/notifer/notifier/

> +/* PEX PFa PCIE pme and message interrupt registers*/

s/pme/PME/ to match other usage and spec.

> + dev_info(pci->dev, "Detect the link up state !\n");
> + } else if (val & PEX_PF0_PME_MES_DR_LDD) {
> + dev_info(pci->dev, "Detect the link down state !\n");
> + } else if (val & PEX_PF0_PME_MES_DR_HRD) {
> + dev_info(pci->dev, "Detect the hot reset state !\n");

No spaces before "!".  Omit the "!" completely unless these are
unexpected situations.  They seem ordinary to me.

Would probably be better as just "Link up", "Link down", "Hot reset".
Or "Link up state detected" if you want.

> + dev_err(&pdev->dev, "Can't get 'pme' irq.\n");
> + dev_err(&pdev->dev, "Can't register PCIe IRQ.\n");

Capitalize "IRQ" in both the above message and this one.  No "."
needed at the end.

Bjorn


Re: [PATCH] PCI: Use of_property_present() for testing DT property presence

2023-04-18 Thread Bjorn Helgaas
On Fri, Mar 10, 2023 at 08:47:19AM -0600, Rob Herring wrote:
> It is preferred to use typed property access functions (i.e.
> of_property_read_ functions) rather than low-level
> of_get_property/of_find_property functions for reading properties. As
> part of this, convert of_get_property/of_find_property calls to the
> recently added of_property_present() helper when we just want to test
> for presence of a property and nothing more.
> 
> Signed-off-by: Rob Herring 

Applied with AngeloGioacchino's reviewed-by to pci/enumeration for
v6.4, thanks!

> ---
>  drivers/pci/controller/pci-tegra.c | 4 ++--
>  drivers/pci/controller/pcie-mediatek.c | 2 +-
>  drivers/pci/hotplug/rpaphp_core.c  | 4 ++--
>  drivers/pci/of.c   | 2 +-
>  4 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/pci/controller/pci-tegra.c 
> b/drivers/pci/controller/pci-tegra.c
> index 74c109f14ff0..79630885b9c8 100644
> --- a/drivers/pci/controller/pci-tegra.c
> +++ b/drivers/pci/controller/pci-tegra.c
> @@ -1375,7 +1375,7 @@ static int tegra_pcie_phys_get(struct tegra_pcie *pcie)
>   struct tegra_pcie_port *port;
>   int err;
>  
> - if (!soc->has_gen2 || of_find_property(np, "phys", NULL) != NULL)
> + if (!soc->has_gen2 || of_property_present(np, "phys"))
>   return tegra_pcie_phys_get_legacy(pcie);
>  
>   list_for_each_entry(port, &pcie->ports, list) {
> @@ -1944,7 +1944,7 @@ static bool of_regulator_bulk_available(struct 
> device_node *np,
>   for (i = 0; i < num_supplies; i++) {
>   snprintf(property, 32, "%s-supply", supplies[i].supply);
>  
> - if (of_find_property(np, property, NULL) == NULL)
> + if (!of_property_present(np, property))
>   return false;
>   }
>  
> diff --git a/drivers/pci/controller/pcie-mediatek.c 
> b/drivers/pci/controller/pcie-mediatek.c
> index ae5ad05ddc1d..31de7a29192c 100644
> --- a/drivers/pci/controller/pcie-mediatek.c
> +++ b/drivers/pci/controller/pcie-mediatek.c
> @@ -643,7 +643,7 @@ static int mtk_pcie_setup_irq(struct mtk_pcie_port *port,
>   return err;
>   }
>  
> - if (of_find_property(dev->of_node, "interrupt-names", NULL))
> + if (of_property_present(dev->of_node, "interrupt-names"))
>   port->irq = platform_get_irq_byname(pdev, "pcie_irq");
>   else
>   port->irq = platform_get_irq(pdev, port->slot);
> diff --git a/drivers/pci/hotplug/rpaphp_core.c 
> b/drivers/pci/hotplug/rpaphp_core.c
> index 491986197c47..2316de0fd198 100644
> --- a/drivers/pci/hotplug/rpaphp_core.c
> +++ b/drivers/pci/hotplug/rpaphp_core.c
> @@ -278,7 +278,7 @@ int rpaphp_check_drc_props(struct device_node *dn, char 
> *drc_name,
>   return -EINVAL;
>   }
>  
> - if (of_find_property(dn->parent, "ibm,drc-info", NULL))
> + if (of_property_present(dn->parent, "ibm,drc-info"))
>   return rpaphp_check_drc_props_v2(dn, drc_name, drc_type,
>   be32_to_cpu(*my_index));
>   else
> @@ -440,7 +440,7 @@ int rpaphp_add_slot(struct device_node *dn)
>   if (!of_node_name_eq(dn, "pci"))
>   return 0;
>  
> - if (of_find_property(dn, "ibm,drc-info", NULL))
> + if (of_property_present(dn, "ibm,drc-info"))
>   return rpaphp_drc_info_add_slot(dn);
>   else
>   return rpaphp_drc_add_slot(dn);
> diff --git a/drivers/pci/of.c b/drivers/pci/of.c
> index 196834ed44fe..e085f2eca372 100644
> --- a/drivers/pci/of.c
> +++ b/drivers/pci/of.c
> @@ -447,7 +447,7 @@ static int of_irq_parse_pci(const struct pci_dev *pdev, 
> struct of_phandle_args *
>   return -ENODEV;
>  
>   /* Local interrupt-map in the device node? Use it! */
> - if (of_get_property(dn, "interrupt-map", NULL)) {
> + if (of_property_present(dn, "interrupt-map")) {
>   pin = pci_swizzle_interrupt_pin(pdev, pin);
>   ppnode = dn;
>   }
> -- 
> 2.39.2
> 
> 
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


Re: [PATCH v3 6/6] PCI/AER: Unmask RCEC internal errors to enable RCH downstream port error handling

2023-04-14 Thread Bjorn Helgaas
On Thu, Apr 13, 2023 at 03:38:07PM +0200, Robert Richter wrote:
> On 12.04.23 16:29:01, Bjorn Helgaas wrote:
> > On Tue, Apr 11, 2023 at 01:03:02PM -0500, Terry Bowman wrote:
> > > From: Robert Richter 
> > > 
> > > RCEC AER corrected and uncorrectable internal errors (CIE/UIE) are
> > > disabled by default.

> > > +static void cxl_unmask_internal_errors(struct pci_dev *rcec)
> 
> Also renaming this to cxl_enable_rcec() to more generalize the
> function.

I didn't follow this.  "cxl_enable_rcec" doesn't say anything about
"unmasking" or "internal errors", which seems like the whole point.
And the function doesn't actually *enable* an RCEC.

> > > +{
> > > + if (!handles_cxl_errors(rcec))
> > > + return;
> > > +
> > > + if (__cxl_unmask_internal_errors(rcec))
> > > + dev_err(&rcec->dev, "cxl: Failed to unmask internal errors");
> > > + else
> > > + dev_dbg(&rcec->dev, "cxl: Internal errors unmasked");
> 
> I am going to change this to a pci_info() for alignment with other
> messages around:
> 
> [   14.200265] pcieport 0000:40:00.3: PME: Signaling with IRQ 44
> [   14.213925] pcieport 0000:40:00.3: AER: cxl: Internal errors unmasked
> [   14.228413] pcieport 0000:40:00.3: AER: enabled with IRQ 44
> 
> Plus, using pci_err() instead of dev_err().

Thanks for that!

Bjorn


Re: [PATCH v3 5/6] PCI/AER: Forward RCH downstream port-detected errors to the CXL.mem dev handler

2023-04-14 Thread Bjorn Helgaas
On Thu, Apr 13, 2023 at 01:40:52PM +0200, Robert Richter wrote:
> On 12.04.23 17:02:33, Bjorn Helgaas wrote:
> > On Tue, Apr 11, 2023 at 01:03:01PM -0500, Terry Bowman wrote:
> > > From: Robert Richter 

> ...
> Let's assume just a simple CXL RCH topology:
> 
> PCI hierarchy:
> 
>   -
>   | ACPI0016  |--   Host bridge (CXL host)
>   | - CEDT| |
>---|   - RCRB base | |
>|  - :
>|   |
>|   |
>|  --- -
>|  | RCiEP   |.| RCEC  | Endpoint (CXL dev)
>|  | - BDF   | | - BDF |
>|  |   | - PCIe AER  | -
>|  |   | - CXL dvsec |
>|  |   |   (v2: reg loc) |
>|  |   |   - Comp regs   |
>|  |   | - CXL RAS   |
>|  |   ---
>:  :
>   
> CXL hierarchy:
> 
>::
>:  --|
>|  | CXL root port  |<
>|  ||
>|->| - dport RCRB   |<
>|  |   - PCIe AER   ||
>|  |   - Comp regs  ||
>|  | - CXL RAS  ||
>|  --|
>|  : |
>|  |   --|
>|  --->| CXL endpoint   |-
>|  | (v1: RCRB) |
>-->| - uport RCRB   |
>   |   - Comp regs  |
>   | - CXL RAS  |
>   --
> 
> Dport detected errors are reported using PCIe AER and CXL RAS caps in
> the dports RCRB.
> 
> Uport detected errors are reported using RCiEP's PCIe AER cap and
> either the uport's RCRB RAS cap or the RAS cap of the comp regs
> located using CXL DVSEC register locator.
> 
> In all cases the RCEC is used with either the RCEC (dport errors) or
> the RCiEP (uport errors) error source id (BDF: bus, dev, func).

I'm mostly interested in the PCI entities involved because that's all
aer.c can deal with.  For the above, I think the PCI core only knows
about these:

  00:00.0 RCEC  with AER, RCEC EA includes 00:01.0
  00:01.0 RCiEP with AER

aer_irq() would handle AER interrupts from 00:00.0.
cxl_handle_error() would be called for 00:00.0 and would call
handle_error_source() for everything below it (only 00:01.0 here).

> > The current code uses pcie_walk_rcec() in this path, which basically
> > searches below a Root Port or RCEC for devices that have an AER error
> > status bit set, add them to the e_info[] list, and call
> > handle_error_source() for each one:
> 
> For reference, this series adds support to handle RCH downstream
> port-detected errors as described in CXL 3.0, 12.2.1.1.
> 
> This flow looks correct to me, see comments inline.

We seem to be on the same page here, so I'll trim it out.

> ...
> > So we insert cxl_handle_error() in handle_error_source(), where it
> > gets called for the RCEC, and then it uses pcie_walk_rcec() again to
> > forcibly call handle_error_source() for *every* device "below" the
> > RCEC (even though they don't have AER error status bits set).
> 
> The CXL device contains the links to the dport's caps. Also, there can
> be multiple RCs with CXL devs connected to it. So we must search for
> all CXL devices now, determine the corresponding dport and inspect
> both, PCIe AER and CXL RAS caps.
> 
> > Then handle_error_source() ultimately calls the CXL driver err_handler
> > entry points (.cor_error_detected(), .error_detected(), etc), which
> > can look at the CXL-specific error status in the CXL RAS or RCRB or
> > whatever.
> 
> The AER driver (portdrv) does not have the knowledge of CXL internals.
> Thus the approach is to pass dport errors to the cxl_mem driver to
> handle it there in addition to cxl mem dev errors.
> 
> > So this basically looks like a workaround for the fact that the AER
> > code only calls handle_error_source() when it finds AER error status,
> > and CXL doesn't *set* that AER error status.  There's not that much
> > code here, but it seems like a quite a bit of complexity in an area
> > that is already pretty complicated.

My main point here (correct me if I got this wrong) is that:

  - A RCEC generates an AER interrupt

  - find_source_device() searches all devices below the RCEC and
builds a list everything for which to call handle_error_source()

  - cxl_handl

Re: [PATCH v3 5/6] PCI/AER: Forward RCH downstream port-detected errors to the CXL.mem dev handler

2023-04-12 Thread Bjorn Helgaas
> [1] CXL 3.0 spec, 12.2.1.1 RCH Downstream Port-detected Errors
> [2] CXL 3.0 spec, 8.1.3 PCIe DVSEC for CXL Devices
> 
> Co-developed-by: Terry Bowman 
> Signed-off-by: Robert Richter 
> Signed-off-by: Terry Bowman 
> Cc: "Oliver O'Halloran" 
> Cc: Bjorn Helgaas 
> Cc: Mahesh J Salgaonkar 
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-...@vger.kernel.org
> ---
>  drivers/pci/pcie/Kconfig |  8 ++
>  drivers/pci/pcie/aer.c   | 61 
>  2 files changed, 69 insertions(+)
> 
> diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig
> index 228652a59f27..b0dbd864d3a3 100644
> --- a/drivers/pci/pcie/Kconfig
> +++ b/drivers/pci/pcie/Kconfig
> @@ -49,6 +49,14 @@ config PCIEAER_INJECT
> gotten from:
>
> https://git.kernel.org/cgit/linux/kernel/git/gong.chen/aer-inject.git/
>  
> +config PCIEAER_CXL
> + bool "PCI Express CXL RAS support"
> + default y
> + depends on PCIEAER && CXL_PCI
> + help
> +   This enables CXL error handling for Restricted CXL Hosts
> +   (RCHs).
> +
>  #
>  # PCI Express ECRC
>  #
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 7a25b62d9e01..171a08fd8ebd 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -946,6 +946,65 @@ static bool find_source_device(struct pci_dev *parent,
>   return true;
>  }
>  
> +#ifdef CONFIG_PCIEAER_CXL
> +
> +static bool is_cxl_mem_dev(struct pci_dev *dev)
> +{
> + /*
> +  * A CXL device is controlled only using PCIe Configuration
> +  * Space of device 0, Function 0.
> +  */
> + if (dev->devfn != PCI_DEVFN(0, 0))
> + return false;
> +
> + /* Right now there is only a CXL.mem driver */
> + if ((dev->class >> 8) != PCI_CLASS_MEMORY_CXL)
> + return false;
> +
> + return true;
> +}
> +
> +static bool is_internal_error(struct aer_err_info *info)
> +{
> + if (info->severity == AER_CORRECTABLE)
> + return info->status & PCI_ERR_COR_INTERNAL;
> +
> + return info->status & PCI_ERR_UNC_INTN;
> +}
> +
> +static void handle_error_source(struct pci_dev *dev, struct aer_err_info 
> *info);
> +
> +static int cxl_handle_error_iter(struct pci_dev *dev, void *data)
> +{
> + struct aer_err_info *e_info = (struct aer_err_info *)data;
> +
> + if (!is_cxl_mem_dev(dev))
> + return 0;
> +
> + /* pci_dev_put() in handle_error_source() */
> + dev = pci_dev_get(dev);
> + if (dev)
> + handle_error_source(dev, e_info);
> +
> + return 0;
> +}
> +
> +static void cxl_handle_error(struct pci_dev *dev, struct aer_err_info *info)
> +{
> + /*
> +  * CXL downstream port errors are signaled as RCEC internal
> +  * errors. Forward them to all CXL devices below the RCEC.
> +  */
> + if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC &&
> + is_internal_error(info))
> + pcie_walk_rcec(dev, cxl_handle_error_iter, info);
> +}
> +
> +#else
> +static inline void cxl_handle_error(struct pci_dev *dev,
> + struct aer_err_info *info) { }
> +#endif
> +
>  /**
>   * handle_error_source - handle logging error into an event log
>   * @dev: pointer to pci_dev data structure of error source device
> @@ -957,6 +1016,8 @@ static void handle_error_source(struct pci_dev *dev, 
> struct aer_err_info *info)
>  {
>   int aer = dev->aer_cap;
>  
> + cxl_handle_error(dev, info);
> +
>   if (info->severity == AER_CORRECTABLE) {
>   /*
>* Correctable error does not need software intervention.
> -- 
> 2.34.1
> 


Re: [PATCH v3 6/6] PCI/AER: Unmask RCEC internal errors to enable RCH downstream port error handling

2023-04-12 Thread Bjorn Helgaas
On Tue, Apr 11, 2023 at 01:03:02PM -0500, Terry Bowman wrote:
> From: Robert Richter 
> 
> RCEC AER corrected and uncorrectable internal errors (CIE/UIE) are
> disabled by default.

"Disabled by default" just means "the power-up state of CIE/UIE is
that they are masked", right?  It doesn't mean that Linux normally
masks them.

> [1][2] Enable them to receive CXL downstream port
> errors of a Restricted CXL Host (RCH).
> 
> [1] CXL 3.0 Spec, 12.2.1.1 - RCH Downstream Port Detected Errors
> [2] PCIe Base Spec 6.0, 7.8.4.3 Uncorrectable Error Mask Register,
> 7.8.4.6 Correctable Error Mask Register
> 
> Co-developed-by: Terry Bowman 
> Signed-off-by: Robert Richter 
> Signed-off-by: Terry Bowman 
> Cc: "Oliver O'Halloran" 
> Cc: Bjorn Helgaas 
> Cc: Mahesh J Salgaonkar 
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-...@vger.kernel.org
> ---
>  drivers/pci/pcie/aer.c | 73 ++
>  1 file changed, 73 insertions(+)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 171a08fd8ebd..3973c731e11d 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1000,7 +1000,79 @@ static void cxl_handle_error(struct pci_dev *dev, 
> struct aer_err_info *info)
>   pcie_walk_rcec(dev, cxl_handle_error_iter, info);
>  }
>  
> +static bool cxl_error_is_native(struct pci_dev *dev)
> +{
> + struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
> +
> + if (pcie_ports_native)
> + return true;
> +
> + return host->native_aer && host->native_cxl_error;
> +}
> +
> +static int handles_cxl_error_iter(struct pci_dev *dev, void *data)
> +{
> + int *handles_cxl = data;
> +
> + *handles_cxl = is_cxl_mem_dev(dev) && cxl_error_is_native(dev);
> +
> + return *handles_cxl;
> +}
> +
> +static bool handles_cxl_errors(struct pci_dev *rcec)
> +{
> + int handles_cxl = 0;
> +
> + if (!rcec->aer_cap)
> + return false;
> +
> + if (pci_pcie_type(rcec) == PCI_EXP_TYPE_RC_EC)
> + pcie_walk_rcec(rcec, handles_cxl_error_iter, &handles_cxl);
> +
> + return !!handles_cxl;
> +}
> +
> +static int __cxl_unmask_internal_errors(struct pci_dev *rcec)
> +{
> + int aer, rc;
> + u32 mask;
> +
> + /*
> +  * Internal errors are masked by default, unmask RCEC's here
> +  * PCI6.0 7.8.4.3 Uncorrectable Error Mask Register (Offset 08h)
> +  * PCI6.0 7.8.4.6 Correctable Error Mask Register (Offset 14h)
> +  */

Unmasking internal errors doesn't have anything specific to do with
CXL, so I don't think it should have "cxl" in the function name.
Maybe something like "pci_aer_unmask_internal_errors()".

This also has nothing special to do with RCECs, so I think we should
refer to the device as "dev" as is typical in this file.

I think this needs to check pcie_aer_is_native() as is done by
pci_aer_clear_nonfatal_status() and other functions that write the AER
Capability.

With the exception of this function, this patch looks like all CXL
code that maybe could be with other CXL code.  Would require making
pcie_walk_rcec() available outside drivers/pci, I guess.
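
Something along these lines, modeled in userspace so it can be run
without a device.  struct model_dev, model_cfg_dword() and
model_aer_unmask_internal_errors() are invented for this sketch, the
aer_native flag stands in for whatever pcie_aer_is_native() would
report, and config space is just an array.  The register offsets are
the ones from the comment in the patch; the bit values are what I
believe pci_regs.h defines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PCI_ERR_UNCOR_MASK    0x08        /* Uncorrectable Error Mask */
#define PCI_ERR_UNC_INTN      0x00400000  /* Uncorrectable Internal Error */
#define PCI_ERR_COR_MASK      0x14        /* Correctable Error Mask */
#define PCI_ERR_COR_INTERNAL  0x00004000  /* Corrected Internal Error */

#define MODEL_CFG_DWORDS      1024        /* 4096 bytes of config space */

/* Hypothetical device: fake config space plus the two facts we test. */
struct model_dev {
    bool aer_native;    /* stand-in for the OS owning AER */
    uint16_t aer_cap;   /* offset of the AER capability, 0 if absent */
    uint32_t cfg[MODEL_CFG_DWORDS];
};

/* Return a pointer to the dword at byte offset 'where'. */
static uint32_t *model_cfg_dword(struct model_dev *dev, unsigned int where)
{
    assert(where % 4 == 0 && where / 4 < MODEL_CFG_DWORDS);
    return &dev->cfg[where / 4];
}

/* Clear the internal-error mask bits; nothing RCEC- or CXL-specific. */
static bool model_aer_unmask_internal_errors(struct model_dev *dev)
{
    if (!dev->aer_cap || !dev->aer_native)
        return false;   /* no AER capability, or firmware owns it */

    *model_cfg_dword(dev, dev->aer_cap + PCI_ERR_UNCOR_MASK) &=
        ~(uint32_t)PCI_ERR_UNC_INTN;
    *model_cfg_dword(dev, dev->aer_cap + PCI_ERR_COR_MASK) &=
        ~(uint32_t)PCI_ERR_COR_INTERNAL;
    return true;
}
```

The CXL code would then decide *when* to call it; the helper itself
only knows about AER.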

> + aer = rcec->aer_cap;
> + rc = pci_read_config_dword(rcec, aer + PCI_ERR_UNCOR_MASK, &mask);
> + if (rc)
> + return rc;
> + mask &= ~PCI_ERR_UNC_INTN;
> + rc = pci_write_config_dword(rcec, aer + PCI_ERR_UNCOR_MASK, mask);
> + if (rc)
> + return rc;
> +
> + rc = pci_read_config_dword(rcec, aer + PCI_ERR_COR_MASK, &mask);
> + if (rc)
> + return rc;
> + mask &= ~PCI_ERR_COR_INTERNAL;
> + rc = pci_write_config_dword(rcec, aer + PCI_ERR_COR_MASK, mask);
> +
> + return rc;
> +}
> +
> +static void cxl_unmask_internal_errors(struct pci_dev *rcec)
> +{
> + if (!handles_cxl_errors(rcec))
> + return;
> +
> + if (__cxl_unmask_internal_errors(rcec))
> + dev_err(&rcec->dev, "cxl: Failed to unmask internal errors");
> + else
> + dev_dbg(&rcec->dev, "cxl: Internal errors unmasked");
> +}
> +
>  #else
> +static inline void cxl_unmask_internal_errors(struct pci_dev *dev) { }
>  static inline void cxl_handle_error(struct pci_dev *dev,
>   struct aer_err_info *info) { }
>  #endif
> @@ -1397,6 +1469,7 @@ static int aer_probe(struct pcie_device *dev)
>   return status;
>   }
>  
> + cxl_unmask_internal_errors(port);
>   aer_enable_rootport(rpc);
>   pci_info(port, "enabled with IRQ %d\n", dev->irq);
>   return 0;
> -- 
> 2.34.1
> 


Re: [PATCHv2 pci-next 2/2] PCI/AER: Rate limit the reporting of the correctable errors

2023-04-07 Thread Bjorn Helgaas
On Fri, Apr 07, 2023 at 11:53:27AM -0700, Grant Grundler wrote:
> On Thu, Apr 6, 2023 at 12:50 PM Bjorn Helgaas  wrote:
> > On Fri, Mar 17, 2023 at 10:51:09AM -0700, Grant Grundler wrote:
> > > From: Rajat Khandelwal 
> > >
> > > There are many instances where correctable errors tend to inundate
> > > the message buffer. We observe such instances during thunderbolt PCIe
> > > tunneling.
> ...

> > >   if (info->severity == AER_CORRECTABLE)
> > > - pci_info(dev, "   [%2d] %-22s%s\n", i, errmsg,
> > > - info->first_error == i ? " (First)" : "");
> > > + pci_info_ratelimited(dev, "   [%2d] %-22s%s\n", i, errmsg,
> > > +  info->first_error == i ? " (First)" : "");
> >
> > I don't think this is going to reliably work the way we want.  We have
> > a bunch of pci_info_ratelimited() calls, and each caller has its own
> > ratelimit_state data.  Unless we call pci_info_ratelimited() exactly
> > the same number of times for each error, the ratelimit counters will
> > get out of sync and we'll end up printing fragments from error A mixed
> > with fragments from error B.
> 
> Ok - what I'm reading between the lines here is the output should be
> emitted in one step, not multiple pci_info_ratelimited() calls. if the
> code built an output string (using snprintf()), and then called
> pci_info_ratelimited() exactly once at the bottom, would that be
> sufficient?
>
> > I think we need to explicitly manage the ratelimiting ourselves,
> > similar to print_hmi_event_info() or print_extlog_rcd().  Then we can
> > have a *single* ratelimit_state, and we can check it once to determine
> > whether to log this correctable error.
> 
> Is the rate limiting per call location or per device? From above, I
> understood rate limiting is "per call location".  If the code only
> has one call location, it should achieve the same goal, right?

Rate-limiting is per call location, so yes, if we only have one call
location, that would solve it.  It would also have the nice property
that all the output would be atomic so it wouldn't get mixed with
other stuff, and it might encourage us to be a little less wordy in
the output.

But I don't think we need output in a single step; we just need a
single instance of ratelimit_state (or one for CPER path and another
for native AER path), and that can control all the output for a single
error.  E.g., print_hmi_event_info() looks like this:

  static void print_hmi_event_info(...)
  {
    static DEFINE_RATELIMIT_STATE(rs, ...);

    if (__ratelimit(&rs)) {
      printk("%s%s Hypervisor Maintenance interrupt ...");
      printk("%s Error detail: %s\n", ...);
      printk("%s  HMER: %016llx\n", ...);
    }
  }

I think it's nice that the struct ratelimit_state is explicit and
there's no danger of breaking it when adding another printk later.

It *could* be per pci_dev, too, but I suspect it's not worth spending
40ish bytes per device for the ratelimit data.
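
To make the difference concrete, here is a minimal userspace sketch,
not kernel code.  toy_ratelimit and the two log_error_*() helpers are
made up for illustration, and the toy limiter only counts a burst (the
real struct ratelimit_state also has a time interval).  It compares one
gate per error against one gate per call site:

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Toy stand-in for the kernel's struct ratelimit_state (assumption:
 * burst count only, no time window, to keep the sketch short).
 */
struct toy_ratelimit {
	int burst;	/* events allowed */
	int used;	/* events consumed so far */
};

static bool toy_ratelimit_ok(struct toy_ratelimit *rs)
{
	if (rs->used >= rs->burst)
		return false;
	rs->used++;
	return true;
}

/* One gate per error: every line of the error is printed, or none is. */
static int log_error_single_gate(struct toy_ratelimit *rs, int nbits)
{
	int lines = 0;

	if (!toy_ratelimit_ok(rs))
		return 0;

	printf("PCIe Bus Error: severity=Correctable\n");
	lines++;
	for (int i = 0; i < nbits; i++) {
		printf("   [%2d] error bit\n", i);
		lines++;
	}
	return lines;
}

/*
 * One gate per call site: the header and the per-bit lines are limited
 * independently, so their counters drift apart when errors differ in
 * the number of bits set.
 */
static int log_error_per_site(struct toy_ratelimit *hdr_rs,
			      struct toy_ratelimit *bit_rs, int nbits)
{
	int lines = 0;

	if (toy_ratelimit_ok(hdr_rs)) {
		printf("PCIe Bus Error: severity=Correctable\n");
		lines++;
	}
	for (int i = 0; i < nbits; i++) {
		if (toy_ratelimit_ok(bit_rs)) {
			printf("   [%2d] error bit\n", i);
			lines++;
		}
	}
	return lines;
}
```

With a burst of 2 for every limiter, logging a 3-bit error and then a
1-bit error through log_error_per_site() prints the first error with
its last bit line missing and the second as a bare header.  The
single-gate version prints both errors whole and drops the third
entirely.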

Bjorn


Re: [PATCHv2 pci-next 2/2] PCI/AER: Rate limit the reporting of the correctable errors

2023-04-06 Thread Bjorn Helgaas
On Fri, Mar 17, 2023 at 10:51:09AM -0700, Grant Grundler wrote:
> From: Rajat Khandelwal 
> 
> There are many instances where correctable errors tend to inundate
> the message buffer. We observe such instances during thunderbolt PCIe
> tunneling.
> 
> It's true that they are mitigated by the hardware and are non-fatal
> but we shouldn't be spamming the logs with such correctable errors as it
> confuses other kernel developers less familiar with PCI errors, support
> staff, and users who happen to look at the logs, hence rate limit them.
> 
> A typical example log inside an HP TBT4 dock:
> [54912.661142] pcieport :00:07.0: AER: Multiple Corrected error received: :2b:00.0
> [54912.661194] igc :2b:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
> [54912.661203] igc :2b:00.0:   device [8086:5502] error status/mask=1100/2000
> [54912.661211] igc :2b:00.0:[ 8] Rollover
> [54912.661219] igc :2b:00.0:[12] Timeout
> [54982.838760] pcieport :00:07.0: AER: Corrected error received: :2b:00.0
> [54982.838798] igc :2b:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
> [54982.838808] igc :2b:00.0:   device [8086:5502] error status/mask=1000/2000
> [54982.838817] igc :2b:00.0:[12] Timeout

The timestamps don't contribute to understanding the problem, so we
can omit them.

> This gets repeated continuously, thus inundating the buffer.
> 
> Signed-off-by: Rajat Khandelwal 
> Signed-off-by: Grant Grundler 
> ---
>  drivers/pci/pcie/aer.c | 42 --
>  1 file changed, 28 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index cb6b96233967..b592cea8bffe 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -706,8 +706,8 @@ static void __aer_print_error(struct pci_dev *dev,
>   errmsg = "Unknown Error Bit";
>  
>   if (info->severity == AER_CORRECTABLE)
> - pci_info(dev, "   [%2d] %-22s%s\n", i, errmsg,
> - info->first_error == i ? " (First)" : "");
> + pci_info_ratelimited(dev, "   [%2d] %-22s%s\n", i, errmsg,
> +  info->first_error == i ? " (First)" : "");

I don't think this is going to reliably work the way we want.  We have
a bunch of pci_info_ratelimited() calls, and each caller has its own
ratelimit_state data.  Unless we call pci_info_ratelimited() exactly
the same number of times for each error, the ratelimit counters will
get out of sync and we'll end up printing fragments from error A mixed
with fragments from error B.

I think we need to explicitly manage the ratelimiting ourselves,
similar to print_hmi_event_info() or print_extlog_rcd().  Then we can
have a *single* ratelimit_state, and we can check it once to determine
whether to log this correctable error.

>   else
>   pci_err(dev, "   [%2d] %-22s%s\n", i, errmsg,
>   info->first_error == i ? " (First)" : "");
> @@ -719,7 +719,6 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
>  {
>   int layer, agent;
>   int id = ((dev->bus->number << 8) | dev->devfn);
> - const char *level;
>  
>   if (!info->status) {
>   pci_err(dev, "PCIe Bus Error: severity=%s, type=Inaccessible, (Unregistered Agent ID)\n",
> @@ -730,14 +729,21 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
>   layer = AER_GET_LAYER_ERROR(info->severity, info->status);
>   agent = AER_GET_AGENT(info->severity, info->status);
>  
> - level = (info->severity == AER_CORRECTABLE) ? KERN_INFO : KERN_ERR;
> + if (info->severity == AER_CORRECTABLE) {
> + pci_info_ratelimited(dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n",
> +  aer_error_severity_string[info->severity],
> +  aer_error_layer[layer], aer_agent_string[agent]);
>  
> - pci_printk(level, dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n",
> -aer_error_severity_string[info->severity],
> -aer_error_layer[layer], aer_agent_string[agent]);
> + pci_info_ratelimited(dev, "  device [%04x:%04x] error status/mask=%08x/%08x\n",
> +  dev->vendor, dev->device, info->status, info->mask);
> + } else {
> + pci_err(dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n",
> + aer_error_severity_string[info->severity],
> + aer_error_layer[layer], aer_agent_string[agent]);
>  
> - pci_printk(level, dev, "  device [%04x:%04x] error status/mask=%08x/%08x\n",
> -dev->vendor, dev->device, info->status, info->mask);
> + pci_err(dev, "  device [%04x:%04x] error 

Re: [PATCH v8 0/7] Add pci_dev_for_each_resource() helper and update users

2023-04-05 Thread Bjorn Helgaas
On Wed, Apr 05, 2023 at 11:28:27AM +0300, Andy Shevchenko wrote:
> On Tue, Apr 04, 2023 at 11:11:01AM -0500, Bjorn Helgaas wrote:
> > On Thu, Mar 30, 2023 at 07:24:27PM +0300, Andy Shevchenko wrote:
> > > Provide two new helper macros to iterate over PCI device resources and
> > > convert users.
> > > 
> > > Looking at it, refactor existing pci_bus_for_each_resource() and convert
> > > users accordingly.

> > Applied 2-7 to pci/resource for v6.4, thanks, I really like this!
> 
> Btw, can you actually drop patch 7, please?

Done.

> > I omitted
> > 
> >   [1/7] kernel.h: Split out COUNT_ARGS() and CONCATENATE()
> > 
> > only because it's not essential to this series and has only a trivial
> > one-line impact on include/linux/pci.h.
> 
> I'm not sure I understood what exactly "essentiality" means to you, but
> I included that because it makes the split which can be used later by
> others and not including kernel.h in the header is the objective I want
> to achieve. Without this patch the achievement is going to be deferred.
> Yet, this, as you have noticed, allows to compile and use the macros in
> the rest of the patches.

I haven't followed the kernel.h splitting, and I try to avoid
incidental changes outside of the files I maintain, so I just wanted
to keep this series purely PCI and avoid any possible objections to a
new include file or discussion about how it should be done.


Re: [PATCH v8 5/7] PCI: Allow pci_bus_for_each_resource() to take less arguments

2023-04-05 Thread Bjorn Helgaas
On Wed, Apr 05, 2023 at 02:50:47PM +0300, Andy Shevchenko wrote:
> On Thu, Mar 30, 2023 at 07:24:32PM +0300, Andy Shevchenko wrote:
> > Refactor pci_bus_for_each_resource() in the same way as it's done in
> > pci_dev_for_each_resource() case. This will allow to hide iterator
> > inside the loop, where it's not used otherwise.
> > 
> > No functional changes intended.
> 
> Bjorn, this has wrong author in your tree:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git/commit/?h=resource&id=46dbad19a59e0dd8f1e7065e5281345797fbb365

I botched it, sorry, should be fixed now.

Bjorn


Re: [PATCH v8 0/7] Add pci_dev_for_each_resource() helper and update users

2023-04-04 Thread Bjorn Helgaas
On Thu, Mar 30, 2023 at 07:24:27PM +0300, Andy Shevchenko wrote:
> Provide two new helper macros to iterate over PCI device resources and
> convert users.
> 
> Looking at it, refactor existing pci_bus_for_each_resource() and convert
> users accordingly.
> 
> Note, the amount of lines grew due to the documentation update.
> 
> Changelog v8:
> - fixed issue with pci_bus_for_each_resource() macro (LKP)
> - due to above added a new patch to document how it works
> - moved the last patch to be #2 (Philippe)
> - added tags (Philippe)
> 
> Changelog v7:
> - made both macros to share same name (Bjorn)

I didn't actually request the same name for both; I would have had no
idea how to even do that :)

v6 had:

  pci_dev_for_each_resource_p(dev, res)
  pci_dev_for_each_resource(dev, res, i)

and I suggested:

  pci_dev_for_each_resource(dev, res)
  pci_dev_for_each_resource_idx(dev, res, i)

because that pattern is used elsewhere.  But you figured out how to do
it, and having one name is even better, so thanks for that extra work!

> - split out the pci_resource_n() conversion (Bjorn)
> 
> Changelog v6:
> - dropped unused variable in PPC code (LKP)
> 
> Changelog v5:
> - renamed loop variable to minimize the clash (Keith)
> - addressed smatch warning (Dan)
> - addressed 0-day bot findings (LKP)
> 
> Changelog v4:
> - rebased on top of v6.3-rc1
> - added tag (Krzysztof)
> 
> Changelog v3:
> - rebased on top of v2 by Mika, see above
> - added tag to pcmcia patch (Dominik)
> 
> Changelog v2:
> - refactor to have two macros
> - refactor existing pci_bus_for_each_resource() in the same way and
>   convert users
> 
> Andy Shevchenko (6):
>   kernel.h: Split out COUNT_ARGS() and CONCATENATE()
>   PCI: Introduce pci_resource_n()
>   PCI: Document pci_bus_for_each_resource() to avoid confusion
>   PCI: Allow pci_bus_for_each_resource() to take less arguments
>   EISA: Convert to use less arguments in pci_bus_for_each_resource()
>   pcmcia: Convert to use less arguments in pci_bus_for_each_resource()
> 
> Mika Westerberg (1):
>   PCI: Introduce pci_dev_for_each_resource()
> 
>  .clang-format |  1 +
>  arch/alpha/kernel/pci.c   |  5 +-
>  arch/arm/kernel/bios32.c  | 16 +++--
>  arch/arm/mach-dove/pcie.c | 10 ++--
>  arch/arm/mach-mv78xx0/pcie.c  | 10 ++--
>  arch/arm/mach-orion5x/pci.c   | 10 ++--
>  arch/mips/pci/ops-bcm63xx.c   |  8 +--
>  arch/mips/pci/pci-legacy.c|  3 +-
>  arch/powerpc/kernel/pci-common.c  | 21 +++
>  arch/powerpc/platforms/4xx/pci.c  |  8 +--
>  arch/powerpc/platforms/52xx/mpc52xx_pci.c |  5 +-
>  arch/powerpc/platforms/pseries/pci.c  | 16 ++---
>  arch/sh/drivers/pci/pcie-sh7786.c | 10 ++--
>  arch/sparc/kernel/leon_pci.c  |  5 +-
>  arch/sparc/kernel/pci.c   | 10 ++--
>  arch/sparc/kernel/pcic.c  |  5 +-
>  drivers/eisa/pci_eisa.c   |  4 +-
>  drivers/pci/bus.c |  7 +--
>  drivers/pci/hotplug/shpchp_sysfs.c|  8 +--
>  drivers/pci/pci.c |  3 +-
>  drivers/pci/probe.c   |  2 +-
>  drivers/pci/remove.c  |  5 +-
>  drivers/pci/setup-bus.c   | 37 +---
>  drivers/pci/setup-res.c   |  4 +-
>  drivers/pci/vgaarb.c  | 17 ++
>  drivers/pci/xen-pcifront.c|  4 +-
>  drivers/pcmcia/rsrc_nonstatic.c   |  9 +--
>  drivers/pcmcia/yenta_socket.c |  3 +-
>  drivers/pnp/quirks.c  | 29 -
>  include/linux/args.h  | 13 
>  include/linux/kernel.h|  8 +--
>  include/linux/pci.h   | 72 +++
>  32 files changed, 190 insertions(+), 178 deletions(-)
>  create mode 100644 include/linux/args.h

Applied 2-7 to pci/resource for v6.4, thanks, I really like this!

I omitted

  [1/7] kernel.h: Split out COUNT_ARGS() and CONCATENATE()

only because it's not essential to this series and has only a trivial
one-line impact on include/linux/pci.h.

Bjorn


Re: [PATCH v2 4/5] cxl/pci: Forward RCH downstream port-detected errors to the CXL.mem dev handler

2023-03-28 Thread Bjorn Helgaas
[+cc linux-pci, more error handling folks; beginning of thread at
https://lore.kernel.org/all/20230323213808.398039-1-terry.bow...@amd.com/]

On Mon, Mar 27, 2023 at 11:51:39PM +0200, Robert Richter wrote:
> On 24.03.23 17:36:56, Bjorn Helgaas wrote:

> > > The CXL device driver is then responsible to
> > > enable error reporting in the RCEC's AER cap
> > 
> > I don't know exactly what you mean by "error reporting in the RCEC's
> > AER cap", but IIUC, for non-Root Port devices, generation of ERR_COR/
> > ERR_NONFATAL/ERR_FATAL messages is controlled by the Device Control
> > register and should already be enabled by pci_aer_init().
> > 
> > Maybe you mean setting AER mask/severity specifically for Internal
> > Errors?  I'm hoping to get as much of AER management as we can in the
> 
> Right, this is implemented in patch #5 in function
> rcec_enable_aer_ints().

I think we should add a PCI core interface for this so we can enforce
the AER ownership question (all the crud like pcie_aer_is_native()) in
one place.

> > PCI core and out of drivers, so maybe we need a new PCI interface to
> > do that.
> > 
> > In any event, I assume this sort of configuration would be an
> > enumeration-time thing, while *this* patch is a run-time thing, so
> > maybe this information belongs with a different patch?
> 
> Do you mean once a Restricted CXL host (RCH) is detected, the internal
> errors should be enabled in the device mask, all this done during
> device enumeration? But wouldn't interrupts being enabled then before
> the CXL device is ready?

I'm not sure what you mean by "before the CXL device is ready."  What
makes a CXL device ready, and how do we know when it is ready?

pci_aer_init() turns on PCI_EXP_DEVCTL_CERE, PCI_EXP_DEVCTL_FERE, etc
as soon as we enumerate the device, before any driver claims the
device.  I'm wondering whether we can do this PCI_ERR_COR_INTERNAL and
PCI_ERR_UNC_INTN fiddling around the same time?

> > I haven't worked all the way through this, but I thought Sean Kelley's
> > and Qiuxu Zhuo's work was along the same line and might cover this,
> > e.g.,
> > 
> >   a175102b0a82 ("PCI/ERR: Recover from RCEC AER errors")
> >   579086225502 ("PCI/ERR: Recover from RCiEP AER errors")
> >   af113553d961 ("PCI/AER: Add pcie_walk_rcec() to RCEC AER handling")
> > 
> > But I guess maybe it's not quite the same case?
> 
> Actually, we use this code to handle errors that are reported to the
> RCEC and only implement here the CXL specifics. That is, checking if
> the RCEC receives something from a CXL downstream port and forwarding
> that to a CXL handler (this patch). The handler then checks the AER
> err cap in the RCRB of all CXL downstream ports associated to the RCEC
> (not visible in the PCI hierarchy), but discovered through the :00.0
> RCiEP (patch #5).

There are two calls to pcie_walk_rcec():

  1) The existing one in find_source_device()
  2) The one you add in handle_cxl_error()

Does the call in handle_cxl_error() look at devices that the existing
call in find_source_device() does not?  I'm trying to understand why
we need both calls.

> > > +static bool is_internal_error(struct aer_err_info *info)
> > > +{
> > > + if (info->severity == AER_CORRECTABLE)
> > > + return info->status & PCI_ERR_COR_INTERNAL;
> > > +
> > > + return info->status & PCI_ERR_UNC_INTN;
> > > +}
> > > +
> > > +static void handle_cxl_error(struct pci_dev *dev, struct aer_err_info *info)
> > > +{
> > > + if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC &&
> > > + is_internal_error(info))
> > 
> > What's unique about Internal Errors?  I'm trying to figure out why you
> > wouldn't do this for *all* CXL errors.
> 
> Per CXL specification downstream port errors are signaled using
> internal errors. 

Maybe a spec reference here to explain is_internal_error()?  Is the
point of the check to *exclude* non-internal errors?  Or is basically
documentation that there shouldn't ever *be* any non-internal errors?
I guess the latter wouldn't make sense because at this point we don't
know whether this is a CXL hierarchy.

> All other errors would be device specific, we cannot
> handle that in a generic CXL driver.

I'm missing the point here.  We don't have any device-specific error
handling in aer.c; it only connects the generic *reporting* mechanism
(AER log registers and Root Port interrupts) to the drivers that do
the device-specific things via err_handler hooks.  I assume we want a
similar model for CXL.

Bjorn


Re: [PATCH v6 1/4] PCI: Introduce pci_dev_for_each_resource()

2023-03-23 Thread Bjorn Helgaas
On Thu, Mar 23, 2023 at 04:30:01PM +0200, Andy Shevchenko wrote:
> On Wed, Mar 22, 2023 at 02:28:04PM -0500, Bjorn Helgaas wrote:
> > On Mon, Mar 20, 2023 at 03:16:30PM +0200, Andy Shevchenko wrote:
> ...
> 
> > > + pci_dev_for_each_resource_p(dev, r) {
> > >   /* zap the 2nd function of the winbond chip */
> > > - if (dev->resource[i].flags & IORESOURCE_IO
> > > - && dev->bus->number == 0 && dev->devfn == 0x81)
> > > - dev->resource[i].flags &= ~IORESOURCE_IO;
> > > - if (dev->resource[i].start == 0 && dev->resource[i].end) {
> > > - dev->resource[i].flags = 0;
> > > - dev->resource[i].end = 0;
> > > + if (dev->bus->number == 0 && dev->devfn == 0x81 &&
> > > + r->flags & IORESOURCE_IO)
> > 
> > This is a nice literal conversion, but it's kind of lame to test
> > bus->number and devfn *inside* the loop here, since they can't change
> > inside the loop.
> 
> Hmm... why are you asking me, even if I may agree on that? It's
> in the original code and out of scope of this series.

Yeah, I don't think it would be *unreasonable* to clean this up at the
same time so the maintainers can look at both at the same time (this
is arch/powerpc/platforms/pseries/pci.c, so Michael, et al), but no
need for you to do anything, certainly.  I can post a follow-up patch.

> > but
> > since we're converging on the "(dev, res)" style, I think we should
> > reverse the names so we have something like:
> > 
> >   pci_dev_for_each_resource(dev, res)
> >   pci_dev_for_each_resource_idx(dev, res, i)
> 
> Wouldn't it be more churn, including pci_bus_for_each_resource() correction?

Yes, it definitely is a little more churn because we already have
pci_bus_for_each_resource() that would have to be changed.

I poked around looking for similar patterns elsewhere with:

  git grep "#define.*for_each_.*_p("
  git grep "#define.*for_each_.*_idx("

I didn't find any other "_p" iterators and just a few "_idx" ones, so
my hope is to follow what little precedent there is, as well as
converge on the basic "*_for_each_resource()" iterators and remove the
"_idx()" versions over time by doing things like the
pci_claim_resource() change.

What do you think?  If it seems like excessive churn, we can do it
as-is and still try to reduce the use of the index variable over time.

Bjorn


Re: [PATCH v6 2/4] PCI: Split pci_bus_for_each_resource_p() out of pci_bus_for_each_resource()

2023-03-22 Thread Bjorn Helgaas
On Mon, Mar 20, 2023 at 03:16:31PM +0200, Andy Shevchenko wrote:
> ...

> -#define pci_bus_for_each_resource(bus, res, i) \
> - for (i = 0; \
> - (res = pci_bus_resource_n(bus, i)) || i < PCI_BRIDGE_RESOURCE_NUM; \
> -  i++)
> +#define __pci_bus_for_each_resource(bus, res, __i, vartype) \
> + for (vartype __i = 0; \
> +  res = pci_bus_resource_n(bus, __i), __i < PCI_BRIDGE_RESOURCE_NUM; \
> +  __i++)
> +
> +#define pci_bus_for_each_resource(bus, res, i) \
> + __pci_bus_for_each_resource(bus, res, i, )
> +
> +#define pci_bus_for_each_resource_p(bus, res) \
> + __pci_bus_for_each_resource(bus, res, __i, unsigned int)

I like these changes a lot, too!

Same comments about _p vs _idx and __pci_bus_for_each_resource(...,
vartype).

Also would prefer 80 char max instead of 81.


Re: [PATCH v6 1/4] PCI: Introduce pci_dev_for_each_resource()

2023-03-22 Thread Bjorn Helgaas
Hi Andy and Mika,

I really like the improvements here.  They make the code read much
better.

On Mon, Mar 20, 2023 at 03:16:30PM +0200, Andy Shevchenko wrote:
> From: Mika Westerberg 
> ...

>  static void fixup_winbond_82c105(struct pci_dev* dev)
>  {
> - int i;
> + struct resource *r;
>   unsigned int reg;
>  
>   if (!machine_is(pseries))
> @@ -251,14 +251,14 @@ static void fixup_winbond_82c105(struct pci_dev* dev)
>   /* Enable LEGIRQ to use INTC instead of ISA interrupts */
>   pci_write_config_dword(dev, 0x40, reg | (1<<11));
>  
> - for (i = 0; i < DEVICE_COUNT_RESOURCE; ++i) {
> + pci_dev_for_each_resource_p(dev, r) {
>   /* zap the 2nd function of the winbond chip */
> - if (dev->resource[i].flags & IORESOURCE_IO
> - && dev->bus->number == 0 && dev->devfn == 0x81)
> - dev->resource[i].flags &= ~IORESOURCE_IO;
> - if (dev->resource[i].start == 0 && dev->resource[i].end) {
> - dev->resource[i].flags = 0;
> - dev->resource[i].end = 0;
> + if (dev->bus->number == 0 && dev->devfn == 0x81 &&
> + r->flags & IORESOURCE_IO)

This is a nice literal conversion, but it's kind of lame to test
bus->number and devfn *inside* the loop here, since they can't change
inside the loop.
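
A follow-up could hoist that test out of the loop.  The sketch below is
a userspace model only, assuming made-up toy_dev/toy_resource types and
a TOY_IORESOURCE_IO flag in place of the real struct pci_dev, struct
resource and IORESOURCE_IO; it is not the pseries code itself:

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NUM_RESOURCES	4
#define TOY_IORESOURCE_IO	0x00000100UL	/* illustrative flag value */

/* Hypothetical stand-ins for struct resource and struct pci_dev. */
struct toy_resource {
	unsigned long start;
	unsigned long end;
	unsigned long flags;
};

struct toy_dev {
	unsigned int bus_number;
	unsigned int devfn;
	struct toy_resource resource[TOY_NUM_RESOURCES];
};

static void toy_fixup(struct toy_dev *dev)
{
	/*
	 * Bus number and devfn cannot change while we iterate, so
	 * evaluate the "2nd function of the chip" test once up front.
	 */
	bool zap_io = dev->bus_number == 0 && dev->devfn == 0x81;

	for (size_t i = 0; i < TOY_NUM_RESOURCES; i++) {
		struct toy_resource *r = &dev->resource[i];

		if (zap_io && (r->flags & TOY_IORESOURCE_IO))
			r->flags &= ~TOY_IORESOURCE_IO;
		if (r->start == 0 && r->end) {
			r->flags = 0;
			r->end = 0;
		}
	}
}
```

The behavior is the same as testing inside the loop; the only change is
that the loop body no longer repeats a comparison whose result is fixed
for the whole iteration.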

> + r->flags &= ~IORESOURCE_IO;
> + if (r->start == 0 && r->end) {
> + r->flags = 0;
> + r->end = 0;
>   }
>   }

>  #define pci_resource_len(dev,bar) \
>   ((pci_resource_end((dev), (bar)) == 0) ? 0 :\
>   \
> -  (pci_resource_end((dev), (bar)) -  \
> -   pci_resource_start((dev), (bar)) + 1))
> +  resource_size(pci_resource_n((dev), (bar))))

I like this change, but it's unrelated to pci_dev_for_each_resource()
and unmentioned in the commit log.

> +#define __pci_dev_for_each_resource(dev, res, __i, vartype)  \
> + for (vartype __i = 0;   \
> +  res = pci_resource_n(dev, __i), __i < PCI_NUM_RESOURCES;   \
> +  __i++)
> +
> +#define pci_dev_for_each_resource(dev, res, i) \
> +   __pci_dev_for_each_resource(dev, res, i, )
> +
> +#define pci_dev_for_each_resource_p(dev, res) \
> + __pci_dev_for_each_resource(dev, res, __i, unsigned int)

This series converts many cases to drop the iterator variable ("i"),
which is fantastic.

Several of the remaining places need the iterator variable only to
call pci_claim_resource(), which could be converted to take a "struct
resource *" directly without much trouble.

We don't have to do that pci_claim_resource() conversion now, but
since we're converging on the "(dev, res)" style, I think we should
reverse the names so we have something like:

  pci_dev_for_each_resource(dev, res)
  pci_dev_for_each_resource_idx(dev, res, i)

Not sure __pci_dev_for_each_resource() is worthwhile since it only
avoids repeating that single "for" statement, and passing in "vartype"
(sometimes empty to implicitly avoid the declaration) is a little
complicated to read.  I think it'd be easier to read like this:

  #define pci_dev_for_each_resource(dev, res)                       \
    for (unsigned int __i = 0;                                      \
         res = pci_resource_n(dev, __i), __i < PCI_NUM_RESOURCES;   \
         __i++)

  #define pci_dev_for_each_resource_idx(dev, res, idx)              \
    for (idx = 0;                                                   \
         res = pci_resource_n(dev, idx), idx < PCI_NUM_RESOURCES;   \
         idx++)
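
For anyone who wants to try the two styles outside the kernel, here is
a rough userspace model.  The toy_* types, toy_resource_n() and the two
callers are invented for the example and only mimic the shape of
struct pci_dev and pci_resource_n(); the macro bodies follow the
suggestion above:

```c
#include <stddef.h>

#define TOY_NUM_RESOURCES	4
#define TOY_IORESOURCE_IO	0x00000100UL	/* illustrative flag value */

/* Hypothetical stand-ins for struct resource and struct pci_dev. */
struct toy_resource {
	unsigned long start;
	unsigned long end;
	unsigned long flags;
};

struct toy_dev {
	struct toy_resource resource[TOY_NUM_RESOURCES];
};

/* Computes the address only; index TOY_NUM_RESOURCES is one past the end. */
static struct toy_resource *toy_resource_n(struct toy_dev *dev, unsigned int n)
{
	return &dev->resource[n];
}

/* Iterator hidden inside the loop, for callers that never use the index. */
#define toy_dev_for_each_resource(dev, res)				\
	for (unsigned int __i = 0;					\
	     res = toy_resource_n(dev, __i), __i < TOY_NUM_RESOURCES;	\
	     __i++)

/* Caller-supplied iterator, for the places that still need the index. */
#define toy_dev_for_each_resource_idx(dev, res, idx)			\
	for (idx = 0;							\
	     res = toy_resource_n(dev, idx), idx < TOY_NUM_RESOURCES;	\
	     idx++)

/* Sums the sizes of all populated resources; no index needed. */
static unsigned long toy_total_size(struct toy_dev *dev)
{
	struct toy_resource *res;
	unsigned long total = 0;

	toy_dev_for_each_resource(dev, res) {
		if (res->end)
			total += res->end - res->start + 1;
	}
	return total;
}

/* Returns the index of the first I/O resource, or -1 if there is none. */
static int toy_first_io_index(struct toy_dev *dev)
{
	struct toy_resource *res;
	unsigned int i;

	toy_dev_for_each_resource_idx(dev, res, i) {
		if (res->flags & TOY_IORESOURCE_IO)
			return (int)i;
	}
	return -1;
}
```

Only toy_first_io_index() has to declare an index variable, which is
the point of keeping the "_idx" form as the less common variant.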

Bjorn


Re: [PATCHv2 pci-next 1/2] PCI/AER: correctable error message as KERN_INFO

2023-03-17 Thread Bjorn Helgaas
On Fri, Mar 17, 2023 at 11:50:22AM -0700, Sathyanarayanan Kuppuswamy wrote:
> On 3/17/23 10:51 AM, Grant Grundler wrote:
> > Since correctable errors have been corrected (and counted), the dmesg output
> > should not be reported as a warning, but rather as "informational".
> > 
> > Otherwise, using a certain well known vendor's PCIe parts in a USB4 docking
> > station, the dmesg buffer can be spammed with correctable errors, 717 bytes
> > per instance, potentially many MB per day.
> 
> Why don't you investigate why you are getting so many correctable errors?
> Isn't solving the problem preferable to hiding the logs?

I hope there's some effort to find the cause of the errors, too.  But
I do think KERN_INFO is a reasonable level for errors that have
already been corrected.  KERN_ERR seems a little bit too severe to me.

Does changing to KERN_INFO keep the messages out of the dmesg log?  I
don't think it does, because *most* kernel messages are at KERN_INFO.
This may be just a commit log clarification.

I would like to know *which* devices are involved.  Is there some
reason for weasel-wording this?  Knowing which devices are involved
helps in triaging issue reports.  If there are any public reports on
mailing lists, etc, we could also cite those here to help users find
this solution.

> > Given the "WARN" priority, these messages have already confused the typical
> > user that stumbles across them, support staff (triaging feedback reports),
> > and more than a few linux kernel devs. Changing to INFO will hide these
> > messages from most audiences.
> > 
> > Signed-off-by: Grant Grundler 
> > ---
> >  drivers/pci/pcie/aer.c | 29 +++--
> >  1 file changed, 19 insertions(+), 10 deletions(-)
> > 
> > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > index f6c24ded134c..cb6b96233967 100644
> > --- a/drivers/pci/pcie/aer.c
> > +++ b/drivers/pci/pcie/aer.c
> > @@ -687,23 +687,29 @@ static void __aer_print_error(struct pci_dev *dev,
> >  {
> > const char **strings;
> > unsigned long status = info->status & ~info->mask;
> > -   const char *level, *errmsg;
> > int i;
> >  
> > if (info->severity == AER_CORRECTABLE) {
> > strings = aer_correctable_error_string;
> > -   level = KERN_WARNING;
> > +   pci_info(dev, "aer_status: 0x%08x, aer_mask: 0x%08x\n",
> > +   info->status, info->mask);
> > } else {
> > strings = aer_uncorrectable_error_string;
> > -   level = KERN_ERR;
> > +   pci_err(dev, "aer_status: 0x%08x, aer_mask: 0x%08x\n",
> > +   info->status, info->mask);
> > }
> >  
> > > for_each_set_bit(i, &status, 32) {
> > -   errmsg = strings[i];
> > +   const char *errmsg = strings[i];
> > +
> > if (!errmsg)
> > errmsg = "Unknown Error Bit";
> >  
> > -   pci_printk(level, dev, "   [%2d] %-22s%s\n", i, errmsg,
> > +   if (info->severity == AER_CORRECTABLE)
> > +   pci_info(dev, "   [%2d] %-22s%s\n", i, errmsg,
> > +   info->first_error == i ? " (First)" : "");
> > +   else
> > +   pci_err(dev, "   [%2d] %-22s%s\n", i, errmsg,
> > info->first_error == i ? " (First)" : "");

The - 5 lines, + 11 lines diff and repetition of the printk strings
doesn't seem like an improvement compared to the -1, +1 in the v1
patch:

  @@ -692,7 +692,7 @@ static void __aer_print_error(struct pci_dev *dev,

  if (info->severity == AER_CORRECTABLE) {
  strings = aer_correctable_error_string;
  -   level = KERN_WARNING;
  +   level = KERN_INFO;
  } else {

But maybe there's a reason?

> > }
> > pci_dev_aer_stats_incr(dev, info);
> > @@ -724,7 +730,7 @@ void aer_print_error(struct pci_dev *dev, struct 
> > aer_err_info *info)
> > layer = AER_GET_LAYER_ERROR(info->severity, info->status);
> > agent = AER_GET_AGENT(info->severity, info->status);
> >  
> > -   level = (info->severity == AER_CORRECTABLE) ? KERN_WARNING : KERN_ERR;
> > +   level = (info->severity == AER_CORRECTABLE) ? KERN_INFO : KERN_ERR;
> >  
> > pci_printk(level, dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n",
> >aer_error_severity_string[info->severity],
> > @@ -797,14 +803,17 @@ void cper_print_aer(struct pci_dev *dev, int 
> > aer_severity,
> > info.mask = mask;
> > info.first_error = PCI_ERR_CAP_FEP(aer->cap_control);
> >  
> > -   pci_err(dev, "aer_status: 0x%08x, aer_mask: 0x%08x\n", status, mask);
> > > __aer_print_error(dev, &info);
> > -   pci_err(dev, "aer_layer=%s, aer_agent=%s\n",
> > -   aer_error_layer[layer], aer_agent_string[agent]);
> >  
> > -   if (aer_severity != AER_CORRECTABLE)
> > +   if (aer_severity == AER_CORRECTABLE) {
> > +   pci_info(dev, "aer_layer=%s, aer_agent=%s\n",
> > +   aer_error_layer[layer], 

Re: [PATCH v3 4/9] scsi: lpfc: Change to use pci_aer_clear_uncorrect_error_status()

2023-03-15 Thread Bjorn Helgaas
On Tue, Dec 06, 2022 at 04:13:35PM -0600, Bjorn Helgaas wrote:
> On Wed, Sep 28, 2022 at 06:59:41PM +0800, Zhuo Chen wrote:
> > lpfc_aer_cleanup_state() requires clearing both fatal and non-fatal
> > uncorrectable error status.
> 
> I don't know what the point of lpfc_aer_cleanup_state() is.  AER
> errors should be handled and cleared by the PCI core, not by
> individual drivers.  Only lpfc, liquidio, and sky2 touch
> PCI_ERR_UNCOR_STATUS.
> 
> But lpfc_aer_cleanup_state() is visible in the
> "lpfc_aer_state_cleanup" sysfs file, so removing it would break any
> userspace that uses it.
> 
> If we can rely on the PCI core to clean up AER errors itself
> (admittedly, that might be a big "if"), maybe lpfc_aer_cleanup_state()
> could just become a no-op?
> 
> Any comment from the LPFC folks?
> 
> Ideally, I would rather not export pci_aer_clear_nonfatal_status() or
> pci_aer_clear_uncorrect_error_status() outside the PCI core at all.

Resurrecting this old thread.  Zhuo, can you figure out where the PCI
core clears these errors, include that in the commit log, and propose
a patch that makes lpfc_aer_cleanup_state() a no-op, by removing the
pci_aer_clear_nonfatal_status() call completely?

Such a patch could be sent to the SCSI maintainers since it doesn't
involve the PCI core.

If it turns out that the PCI core *doesn't* clear these errors, we
should figure out *why* it doesn't and try to change the PCI core so
it does.

> > But using pci_aer_clear_nonfatal_status()
> > will only clear non-fatal error status. To clear both fatal and
> > non-fatal error status, use pci_aer_clear_uncorrect_error_status().
> > 
> > Signed-off-by: Zhuo Chen 
> > ---
> >  drivers/scsi/lpfc/lpfc_attr.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
> > index 09cf2cd0ae60..d835cc0ba153 100644
> > --- a/drivers/scsi/lpfc/lpfc_attr.c
> > +++ b/drivers/scsi/lpfc/lpfc_attr.c
> > @@ -4689,7 +4689,7 @@ static DEVICE_ATTR_RW(lpfc_aer_support);
> >   * Description:
> >   * If the @buf contains 1 and the device currently has the AER support
> >   * enabled, then invokes the kernel AER helper routine
> > - * pci_aer_clear_nonfatal_status() to clean up the uncorrectable
> > + * pci_aer_clear_uncorrect_error_status() to clean up the uncorrectable
> >   * error status register.
> >   *
> >   * Notes:
> > @@ -4715,7 +4715,7 @@ lpfc_aer_cleanup_state(struct device *dev, struct 
> > device_attribute *attr,
> > return -EINVAL;
> >  
> > if (phba->hba_flag & HBA_AER_ENABLED)
> > -   rc = pci_aer_clear_nonfatal_status(phba->pcidev);
> > +   rc = pci_aer_clear_uncorrect_error_status(phba->pcidev);
> >  
> > if (rc == 0)
> > return strlen(buf);
> > -- 
> > 2.30.1 (Apple Git-130)
> > 


Re: [PATCH v3 3/9] NTB: Remove pci_aer_clear_nonfatal_status() call

2023-03-15 Thread Bjorn Helgaas
On Wed, Sep 28, 2022 at 06:59:40PM +0800, Zhuo Chen wrote:
> There is no need to clear error status during init code, so remove it.
> 
> Signed-off-by: Zhuo Chen 

Can you send this to the NTB folks?  It doesn't depend on anything, so
no real reason to merge via the PCI tree.

To help reviewers, ideally the commit log would mention where the PCI
core clears the non-fatal errors so the driver doesn't have to.

> ---
>  drivers/ntb/hw/idt/ntb_hw_idt.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
> index 0ed6f809ff2e..fed03217289d 100644
> --- a/drivers/ntb/hw/idt/ntb_hw_idt.c
> +++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
> @@ -2657,8 +2657,6 @@ static int idt_init_pci(struct idt_ntb_dev *ndev)
>   ret = pci_enable_pcie_error_reporting(pdev);
>   if (ret != 0)
>   dev_warn(&pdev->dev, "PCIe AER capability disabled\n");
> - else /* Cleanup nonfatal error status before getting to init */
> - pci_aer_clear_nonfatal_status(pdev);
>  
>   /* First enable the PCI device */
>   ret = pcim_enable_device(pdev);
> -- 
> 2.30.1 (Apple Git-130)
> 


Re: [PATCH v3 1/9] PCI/AER: Add pci_aer_clear_uncorrect_error_status() to PCI core

2023-03-15 Thread Bjorn Helgaas
On Wed, Sep 28, 2022 at 06:59:38PM +0800, Zhuo Chen wrote:
> In lpfc_aer_cleanup_state(), uncorrectable error status needs to be
> cleared, which can be done by calling pci_aer_clear_nonfatal_status()
> and pci_aer_clear_fatal_status(). Meanwhile they can be combined in
> one function (the same in dpc_process_error). So add
> pci_aer_clear_uncorrect_error_status() function to PCI core and
> export symbol to other modules which wants to use it.

Sorry for getting back to this so late.

Why does lpfc need this?  I think AER error status should be cleared
by the PCI core, not by individual drivers, so I really would rather
not add a new interface for drivers to use.

> Signed-off-by: Zhuo Chen 
> ---
>  drivers/pci/pcie/aer.c | 16 
>  include/linux/aer.h|  5 +
>  2 files changed, 21 insertions(+)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index e2d8a74f83c3..4e637121be23 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -286,6 +286,22 @@ void pci_aer_clear_fatal_status(struct pci_dev *dev)
>   pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
>  }
>  
> +int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev)
> +{
> + int aer = dev->aer_cap;
> + u32 status;
> +
> + if (!pcie_aer_is_native(dev))
> + return -EIO;
> +
> + pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, &status);
> + if (status)
> + pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(pci_aer_clear_uncorrect_error_status);
> +
>  /**
>   * pci_aer_raw_clear_status - Clear AER error registers.
>   * @dev: the PCI device
> diff --git a/include/linux/aer.h b/include/linux/aer.h
> index 97f64ba1b34a..154690c278cb 100644
> --- a/include/linux/aer.h
> +++ b/include/linux/aer.h
> @@ -45,6 +45,7 @@ struct aer_capability_regs {
>  int pci_enable_pcie_error_reporting(struct pci_dev *dev);
>  int pci_disable_pcie_error_reporting(struct pci_dev *dev);
>  int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
> +int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev);
>  void pci_save_aer_state(struct pci_dev *dev);
>  void pci_restore_aer_state(struct pci_dev *dev);
>  #else
> @@ -60,6 +61,10 @@ static inline int pci_aer_clear_nonfatal_status(struct 
> pci_dev *dev)
>  {
>   return -EINVAL;
>  }
> +static inline int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev)
> +{
> + return -EINVAL;
> +}
>  static inline void pci_save_aer_state(struct pci_dev *dev) {}
>  static inline void pci_restore_aer_state(struct pci_dev *dev) {}
>  #endif
> -- 
> 2.30.1 (Apple Git-130)
> 


Re: [PATCH 1/1] PCI: layerscape: Add the workaround for A-010305

2023-03-14 Thread Bjorn Helgaas
On Thu, Jan 12, 2023 at 02:44:33PM -0500, Frank Li wrote:
> From: Xiaowei Bao 
> 
> When a link down or hot reset event occurs, the PCI Express EP
> controller's Link Capabilities Register should retain the values of
> the Maximum Link Width and Supported Link Speed configured by RCW.

Can you rework this to say what the patch does and why it's necessary?

Apparently it's a workaround for some issue in A-010305?  The subject
line could also use more content.  What is A-010305?  What is the
problem this works around?

I don't see a check for A-010305; do *all* devices handled by this
driver have this problem?

The PCIe Link Capabilities register is supposed to be read-only; maybe this
device loses the value on link down or hot reset?  And I guess the
device interrupts on link up/down and reset, and you restore the value
then?

Link Capabilities contains several things other than Max Link Width
and Max Link Speed.  But they don't need to be restored?

What is RCW?

> Signed-off-by: Xiaowei Bao 
> Signed-off-by: Hou Zhiqiang 
> Signed-off-by: Frank Li 
> ---
>  .../pci/controller/dwc/pci-layerscape-ep.c| 112 +-
>  1 file changed, 111 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c 
> b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> index ed5cfc9408d9..1b884854c18e 100644
> --- a/drivers/pci/controller/dwc/pci-layerscape-ep.c
> +++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> @@ -18,6 +18,22 @@
>  
>  #include "pcie-designware.h"
>  
> +#define PCIE_LINK_CAP    0x7C    /* PCIe Link Capabilities */

Is this something you can find by searching the capability list
instead of hard-coding the config space offset?

> +#define MAX_LINK_SP_MASK 0x0F
> +#define MAX_LINK_W_MASK  0x3F
> +#define MAX_LINK_W_SHIFT 4

These look like they should use PCI_EXP_LNKCAP_SLS and
PCI_EXP_LNKCAP_MLW instead of defining new ones.
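
A rough userspace sketch of what that could look like, for illustration
only.  The two mask values are meant to mirror PCI_EXP_LNKCAP_SLS and
PCI_EXP_LNKCAP_MLW from pci_regs.h; the SKETCH_* macros and
sketch_lnkcap_restore() are hypothetical stand-ins for
<linux/bitfield.h> and the driver code, not anything in the posted patch.

```c
#include <stdint.h>

/* Assumed to match PCI_EXP_LNKCAP_SLS / PCI_EXP_LNKCAP_MLW in pci_regs.h */
#define SKETCH_LNKCAP_SLS 0x0000000fu /* Supported Link Speeds */
#define SKETCH_LNKCAP_MLW 0x000003f0u /* Maximum Link Width */

/* Lowest set bit of a mask; multiplying or dividing by it replaces a shift */
#define SKETCH_LOW_BIT(mask) ((mask) & (~(mask) + 1u))

/* Simplified stand-ins for FIELD_PREP() and FIELD_GET() */
#define SKETCH_FIELD_PREP(mask, val) \
	((((uint32_t)(val)) * SKETCH_LOW_BIT(mask)) & (mask))
#define SKETCH_FIELD_GET(mask, reg) \
	((((uint32_t)(reg)) & (mask)) / SKETCH_LOW_BIT(mask))

/*
 * Rebuild only the speed and width fields of a Link Capabilities value,
 * leaving every other field as it was read.
 */
static uint32_t sketch_lnkcap_restore(uint32_t lnkcap, uint8_t max_speed,
				      uint8_t max_width)
{
	lnkcap &= ~(SKETCH_LNKCAP_SLS | SKETCH_LNKCAP_MLW);
	lnkcap |= SKETCH_FIELD_PREP(SKETCH_LNKCAP_SLS, max_speed);
	lnkcap |= SKETCH_FIELD_PREP(SKETCH_LNKCAP_MLW, max_width);
	return lnkcap;
}
```

With the masks carrying the field position, the separate *_SHIFT
definition is no longer needed.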

> +/* PEX PFa PCIE pme and message interrupt registers*/
> +#define PEX_PF0_PME_MES_DR 0xC0020
> +#define PEX_PF0_PME_MES_DR_LUD (1 << 7)
> +#define PEX_PF0_PME_MES_DR_LDD (1 << 9)
> +#define PEX_PF0_PME_MES_DR_HRD (1 << 10)
> +
> +#define PEX_PF0_PME_MES_IER 0xC0028
> +#define PEX_PF0_PME_MES_IER_LUDIE  (1 << 7)
> +#define PEX_PF0_PME_MES_IER_LDDIE  (1 << 9)
> +#define PEX_PF0_PME_MES_IER_HRDIE  (1 << 10)
> +
>  #define to_ls_pcie_ep(x) dev_get_drvdata((x)->dev)
>  
>  struct ls_pcie_ep_drvdata {
> @@ -30,8 +46,90 @@ struct ls_pcie_ep {
>   struct dw_pcie  *pci;
>   struct pci_epc_features *ls_epc;
>   const struct ls_pcie_ep_drvdata *drvdata;
> + u8  max_speed;
> + u8  max_width;
> + bool    big_endian;
> + int irq;
>  };
>  
> +static u32 ls_lut_readl(struct ls_pcie_ep *pcie, u32 offset)
> +{
> + struct dw_pcie *pci = pcie->pci;
> +
> + if (pcie->big_endian)
> + return ioread32be(pci->dbi_base + offset);
> + else
> + return ioread32(pci->dbi_base + offset);
> +}
> +
> +static void ls_lut_writel(struct ls_pcie_ep *pcie, u32 offset,
> +   u32 value)
> +{
> + struct dw_pcie *pci = pcie->pci;
> +
> + if (pcie->big_endian)
> + iowrite32be(value, pci->dbi_base + offset);
> + else
> + iowrite32(value, pci->dbi_base + offset);
> +}
> +
> +static irqreturn_t ls_pcie_ep_event_handler(int irq, void *dev_id)
> +{
> + struct ls_pcie_ep *pcie = (struct ls_pcie_ep *)dev_id;
> + struct dw_pcie *pci = pcie->pci;
> + u32 val;
> +
> + val = ls_lut_readl(pcie, PEX_PF0_PME_MES_DR);
> + if (!val)
> + return IRQ_NONE;
> +
> + if (val & PEX_PF0_PME_MES_DR_LUD)
> + dev_info(pci->dev, "Detect the link up state !\n");
> + else if (val & PEX_PF0_PME_MES_DR_LDD)
> + dev_info(pci->dev, "Detect the link down state !\n");
> + else if (val & PEX_PF0_PME_MES_DR_HRD)
> + dev_info(pci->dev, "Detect the hot reset state !\n");

No space before "!".  Seems possibly more verbose than necessary,
since the endpoint may be reset as part of normal operation.

> + dw_pcie_dbi_ro_wr_en(pci);
> + dw_pcie_writew_dbi(pci, PCIE_LINK_CAP,
> +(pcie->max_width << MAX_LINK_W_SHIFT) |

Use FIELD_PREP() so you don't need a shift.

> +pcie->max_speed);
> + dw_pcie_dbi_ro_wr_dis(pci);
> +
> + ls_lut_writel(pcie, PEX_PF0_PME_MES_DR, val);
> +
> + return IRQ_HANDLED;
> +}
> +
> +static int ls_pcie_ep_interrupt_init(struct ls_pcie_ep *pcie,
> +  struct platform_device *pdev)
> +{
> + u32 val;
> + int ret;
> +
> + pcie->irq = platform_get_irq_byname(pdev, "pme");
> + if (pcie->irq < 0) {
> + dev_err(&pdev->dev, "Can't get 'pme' 

Re: [PATCH] PCI/AER: correctable error message as KERN_INFO

2023-03-14 Thread Bjorn Helgaas
On Tue, Feb 28, 2023 at 10:04:53PM -0800, Grant Grundler wrote:
> Since correctable errors have been corrected (and counted), the dmesg output
> should not be reported as a warning, but rather as "informational".
> 
> Otherwise, using a certain well known vendor's PCIe parts in a USB4 docking
> station, the dmesg buffer can be spammed with correctable errors, 717 bytes
> per instance, potentially many MB per day.
> 
> Given the "WARN" priority, these messages have already confused the typical
> user that stumbles across them, support staff (triaging feedback reports),
> and more than a few linux kernel devs. Changing to INFO will hide these
> messages from most audiences.
> 
> Signed-off-by: Grant Grundler 
> ---
> This patch will likely conflict with:
>   
> https://lore.kernel.org/all/20230103165548.570377-1-rajat.khandel...@linux.intel.com/
> 
> which I'd also like to see upstream. Please let me know to resubmit
> mine if Rajat's patch lands first. Or feel free to fix up this one.

Yes.  I think it makes sense to separate this into two patches:

  1) Log correctable errors as KERN_INFO instead of KERN_WARNING, and

  2) Rate-limit correctable error logging.
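
To make item 2 concrete, here is a minimal userspace sketch of
interval/burst rate limiting.  It is an assumption about the general
shape only: a real patch would presumably use the kernel's existing
<linux/ratelimit.h> machinery, and the sketch_ratelimit names below are
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Allow at most 'burst' messages per 'interval' ticks, count the rest */
struct sketch_ratelimit {
	uint64_t interval;    /* window length, in caller-defined ticks */
	unsigned int burst;   /* messages allowed per window */
	uint64_t begin;       /* start of the current window */
	unsigned int printed; /* messages let through in this window */
	unsigned int missed;  /* messages suppressed so far */
};

/* Return true if a message at time 'now' should be logged */
static bool sketch_ratelimit_allow(struct sketch_ratelimit *rl, uint64_t now)
{
	/* Start a new window once the current one has expired */
	if (now - rl->begin >= rl->interval) {
		rl->begin = now;
		rl->printed = 0;
	}

	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}

	rl->missed++;
	return false;
}
```

Keeping a count of suppressed messages means the error counters stay
meaningful even when the log output is throttled.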

>  drivers/pci/pcie/aer.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index f6c24ded134c..e4cf3ec40d66 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -692,7 +692,7 @@ static void __aer_print_error(struct pci_dev *dev,
>  
>   if (info->severity == AER_CORRECTABLE) {
>   strings = aer_correctable_error_string;
> - level = KERN_WARNING;
> + level = KERN_INFO;
>   } else {
>   strings = aer_uncorrectable_error_string;
>   level = KERN_ERR;
> @@ -724,7 +724,7 @@ void aer_print_error(struct pci_dev *dev, struct 
> aer_err_info *info)
>   layer = AER_GET_LAYER_ERROR(info->severity, info->status);
>   agent = AER_GET_AGENT(info->severity, info->status);
>  
> - level = (info->severity == AER_CORRECTABLE) ? KERN_WARNING : KERN_ERR;
> + level = (info->severity == AER_CORRECTABLE) ? KERN_INFO : KERN_ERR;
>  
>   pci_printk(level, dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n",
>  aer_error_severity_string[info->severity],

Shouldn't we do the same in the cper_print_aer() path?  That path
currently uses pci_err() and then calls __aer_print_error(), so the
initial message will always be KERN_ERR, and the decoding done by
__aer_print_error() will be KERN_INFO (for correctable) or KERN_ERR.

Seems like a shame to do the same test in three places, but would
require a little more refactoring to avoid that.
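
One possible shape for that refactoring, as a userspace sketch under
stated assumptions: the enum and the numeric levels are hypothetical
stand-ins (3 and 6 are the conventional syslog numbers for "err" and
"info"), and sketch_aer_log_level() is not an existing kernel function.

```c
/* Hypothetical stand-ins for the AER severity values */
enum sketch_aer_severity {
	SKETCH_AER_NONFATAL,
	SKETCH_AER_FATAL,
	SKETCH_AER_CORRECTABLE,
};

/* Conventional syslog numbers, used here instead of the KERN_* strings */
#define SKETCH_LOGLEVEL_ERR  3
#define SKETCH_LOGLEVEL_INFO 6

/*
 * Single place that maps severity to log level, so each print path
 * could call this instead of repeating the comparison.
 */
static int sketch_aer_log_level(enum sketch_aer_severity severity)
{
	return severity == SKETCH_AER_CORRECTABLE ? SKETCH_LOGLEVEL_INFO
						  : SKETCH_LOGLEVEL_ERR;
}
```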

Bjorn


Re: [PATCH] PCI/AER: correctable error message as KERN_INFO

2023-03-08 Thread Bjorn Helgaas
On Wed, Mar 08, 2023 at 12:00:48PM -0800, Grant Grundler wrote:
> Ping? Did I miss an email or other work that this patch collides with?

Nope, we typically make topic branches based on -rc1, so not much
happens during the merge window.  -rc1 was tagged Sunday, so things
will start appearing in -next soon.

Bjorn

> On Tue, Feb 28, 2023 at 10:05 PM Grant Grundler  wrote:
> >
> > Since correctable errors have been corrected (and counted), the dmesg output
> > should not be reported as a warning, but rather as "informational".
> >
> > Otherwise, using a certain well known vendor's PCIe parts in a USB4 docking
> > station, the dmesg buffer can be spammed with correctable errors, 717 bytes
> > per instance, potentially many MB per day.
> >
> > Given the "WARN" priority, these messages have already confused the typical
> > user that stumbles across them, support staff (triaging feedback reports),
> > and more than a few linux kernel devs. Changing to INFO will hide these
> > messages from most audiences.
> >
> > Signed-off-by: Grant Grundler 
> > ---
> > This patch will likely conflict with:
> >   
> > https://lore.kernel.org/all/20230103165548.570377-1-rajat.khandel...@linux.intel.com/
> >
> > which I'd also like to see upstream. Please let me know to resubmit mine if 
> > Rajat's patch lands first. Or feel free to fix up this one.
> >
> >  drivers/pci/pcie/aer.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > index f6c24ded134c..e4cf3ec40d66 100644
> > --- a/drivers/pci/pcie/aer.c
> > +++ b/drivers/pci/pcie/aer.c
> > @@ -692,7 +692,7 @@ static void __aer_print_error(struct pci_dev *dev,
> >
> > if (info->severity == AER_CORRECTABLE) {
> > strings = aer_correctable_error_string;
> > -   level = KERN_WARNING;
> > +   level = KERN_INFO;
> > } else {
> > strings = aer_uncorrectable_error_string;
> > level = KERN_ERR;
> > @@ -724,7 +724,7 @@ void aer_print_error(struct pci_dev *dev, struct 
> > aer_err_info *info)
> > layer = AER_GET_LAYER_ERROR(info->severity, info->status);
> > agent = AER_GET_AGENT(info->severity, info->status);
> >
> > -   level = (info->severity == AER_CORRECTABLE) ? KERN_WARNING : 
> > KERN_ERR;
> > +   level = (info->severity == AER_CORRECTABLE) ? KERN_INFO : KERN_ERR;
> >
> > pci_printk(level, dev, "PCIe Bus Error: severity=%s, type=%s, 
> > (%s)\n",
> >aer_error_severity_string[info->severity],
> > --
> > 2.39.2.722.g9855ee24e9-goog
> >


Re: [PATCH RFC] PCI/AER: Enable internal AER errors by default

2023-02-13 Thread Bjorn Helgaas
On Fri, Feb 10, 2023 at 02:33:23PM -0800, Ira Weiny wrote:
> The CXL driver expects internal error reporting to be enabled via
> pci_enable_pcie_error_reporting().  It is likely other drivers expect the 
> same.
> Dave submitted a patch to enable the CXL side[1] but the PCI AER registers
> still mask errors.
> 
> PCIe v6.0 Uncorrectable Mask Register (7.8.4.3) and Correctable Mask
> Register (7.8.4.6) default to masking internal errors.  The
> Uncorrectable Error Severity Register (7.8.4.4) defaults internal errors
> as fatal.
> 
> Enable internal errors to be reported via the standard
> pci_enable_pcie_error_reporting() call.  Ensure uncorrectable errors are set
> non-fatal to limit any impact to other drivers.

Do you have any background on why the spec makes these errors masked
by default?  I'm sympathetic to wanting to learn about all the errors
we can, but I'm a little wary if the spec authors thought it was
important to mask these by default.

> [1] 
> https://lore.kernel.org/all/167604864163.2392965.5102660329807283871.stgit@djiang5-mobl3.local/
> 
> Cc: Bjorn Helgaas 
> Cc: Jonathan Cameron 
> Cc: Dan Williams 
> Cc: Dave Jiang 
> Cc: Stefan Roese 
> Cc: "Kuppuswamy Sathyanarayanan" 
> Cc: Mahesh J Salgaonkar 
> Cc: Oliver O'Halloran 
> Cc: linux-...@vger.kernel.org
> Cc: linux-ker...@vger.kernel.org
> Cc: linux-...@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Ira Weiny 
> ---
> This is RFC to see if it is acceptable to be part of the standard
> pci_enable_pcie_error_reporting() call or perhaps a separate pci core
> call should be introduced.  It is anticipated that enabling this error
> reporting is what existing drivers are expecting.  The errors are marked
> non-fatal therefore it should not adversely affect existing devices.
> ---
>  drivers/pci/pcie/aer.c | 17 +
>  1 file changed, 17 insertions(+)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 625f7b2cafe4..9d3ed3a5fc23 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -229,11 +229,28 @@ int pcie_aer_is_native(struct pci_dev *dev)
>  
>  int pci_enable_pcie_error_reporting(struct pci_dev *dev)
>  {
> + int pos_cap_err;
> + u32 reg;
>   int rc;
>  
>   if (!pcie_aer_is_native(dev))
>   return -EIO;
>  
> + pos_cap_err = dev->aer_cap;
> +
> + /* Unmask correctable and uncorrectable (non-fatal) internal errors */
> + pci_read_config_dword(dev, pos_cap_err + PCI_ERR_COR_MASK, &reg);
> + reg &= ~PCI_ERR_COR_INTERNAL;
> + pci_write_config_dword(dev, pos_cap_err + PCI_ERR_COR_MASK, reg);
> +
> + pci_read_config_dword(dev, pos_cap_err + PCI_ERR_UNCOR_SEVER, &reg);
> + reg &= ~PCI_ERR_UNC_INTN;
> + pci_write_config_dword(dev, pos_cap_err + PCI_ERR_UNCOR_SEVER, reg);
> +
> + pci_read_config_dword(dev, pos_cap_err + PCI_ERR_UNCOR_MASK, &reg);
> + reg &= ~PCI_ERR_UNC_INTN;
> + pci_write_config_dword(dev, pos_cap_err + PCI_ERR_UNCOR_MASK, reg);
> +
>   rc = pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_AER_FLAGS);
>   return pcibios_err_to_errno(rc);
>  }
> 
> ---
> base-commit: e5ab7f206ffc873160bd0f1a52cae17ab692a9d1
> change-id: 20230209-cxl-pci-aer-18dda61c8239
> 
> Best regards,
> -- 
> Ira Weiny 
> 


Re: [External] : [PATCH v3 1/1] PCI: layerscape: Add EP mode support for ls1028a

2023-02-10 Thread Bjorn Helgaas
On Fri, Feb 10, 2023 at 11:51:46PM +0530, ALOK TIWARI wrote:
> LGTM,

Thanks a lot for looking at this!

In the Linux world, "LGTM" is not something a maintainer can really
act on.  If you respond with a "Reviewed-by" tag as described here:

  
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?id=v6.1#n495

maintainers (or tooling like b4) can add it to the patch when merging
it.  Here are some examples of how it can be used:

  
https://lore.kernel.org/linux-pci/bn9pr11mb527699243353309a1ddfefbe8c...@bn9pr11mb5276.namprd11.prod.outlook.com/
  https://lore.kernel.org/linux-pci/y9aezvrtb4sob...@memverge.com/
  
https://lore.kernel.org/linux-pci/a20028e6-3318-26ca-117a-26c87c292...@linaro.org/

Bjorn

> On 2/9/2023 8:40 PM, Frank Li wrote:
> > From: Xiaowei Bao 
> > 
> > Add PCIe EP mode support for ls1028a.
> > 
> > Signed-off-by: Xiaowei Bao 
> > Signed-off-by: Hou Zhiqiang 
> > Signed-off-by: Frank Li 
> > Acked-by:  Roy Zang 


Re: [External] : RE: [EXT] [PATCH v2 1/1] PCI: layerscape: Add EP mode support for ls1028a

2023-02-08 Thread Bjorn Helgaas
On Tue, Feb 07, 2023 at 04:20:21PM +0000, Frank Li wrote:
> > Subject: Re: [External] : RE: [EXT] [PATCH v2 1/1] PCI: layerscape: Add EP
> > mode support for ls1028a
> > 
> >  { .compatible = "fsl,ls1046a-pcie-ep", .data = _ep_drvdata },
> > +   { .compatible = "fsl,ls1028a-pcie-ep", .data = _ep_drvdata },
> > { .compatible = "fsl,ls1088a-pcie-ep", .data = _ep_drvdata },
> > 
> > Can it be like this for better readability?
> 
> It is just the chip name and follows the naming convention, which is
> already upstreamed and documented.
>
> Why do you think it is not readable?

I thought maybe ALOK's point was to sort the list, which does make a
lot of sense.  But if you want to sort by the .data member, I would
think you would make .compatible a secondary sort key, which means
ls1028a would come before ls1046a, so you would end up with this
instead:

 static const struct of_device_id ls_pcie_ep_of_match[] = {
+   { .compatible = "fsl,ls1028a-pcie-ep", .data = _ep_drvdata },
{ .compatible = "fsl,ls1046a-pcie-ep", .data = _ep_drvdata },
{ .compatible = "fsl,ls1088a-pcie-ep", .data = _ep_drvdata },
{ .compatible = "fsl,ls2088a-pcie-ep", .data = _ep_drvdata },
{ .compatible = "fsl,lx2160ar2-pcie-ep", .data = _ep_drvdata },
{ },
 };



Re: [PATCH V2] PCI/AER: Configure ECRC only AER is native

2023-01-12 Thread Bjorn Helgaas
On Thu, Jan 12, 2023 at 12:51:11PM +0530, Vidya Sagar wrote:
> As the ECRC configuration bits are part of AER registers, configure
> ECRC only if AER is natively owned by the kernel.
> 
> Signed-off-by: Vidya Sagar 

Applied to pci/aer for v6.3, thanks!

> ---
> v2:
> * Updated kernel-parameters.txt document based on Bjorn's suggestion
> 
>  Documentation/admin-guide/kernel-parameters.txt | 4 +++-
>  drivers/pci/pcie/aer.c  | 3 +++
>  2 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> b/Documentation/admin-guide/kernel-parameters.txt
> index 426fa892d311..8f85a1230525 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4242,7 +4242,9 @@
>   specified, e.g., 12@pci:8086:9c22:103c:198f
>   for 4096-byte alignment.
>   ecrc=   Enable/disable PCIe ECRC (transaction layer
> - end-to-end CRC checking).
> + end-to-end CRC checking). Only effective if
> + OS has native AER control (either granted by
> + ACPI _OSC or forced via "pcie_ports=native")
>   bios: Use BIOS/firmware settings. This is the
>   the default.
>   off: Turn ECRC off
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index e2d8a74f83c3..730b47bdcdef 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -184,6 +184,9 @@ static int disable_ecrc_checking(struct pci_dev *dev)
>   */
>  void pcie_set_ecrc_checking(struct pci_dev *dev)
>  {
> + if (!pcie_aer_is_native(dev))
> + return;
> +
>   switch (ecrc_policy) {
>   case ECRC_POLICY_DEFAULT:
>   return;
> -- 
> 2.17.1
> 


Re: [PATCH V1] PCI/AER: Configure ECRC only AER is native

2023-01-12 Thread Bjorn Helgaas
On Wed, Jan 11, 2023 at 03:27:51PM -0800, Sathyanarayanan Kuppuswamy wrote:
> On 1/11/23 3:10 PM, Bjorn Helgaas wrote:
> > On Wed, Jan 11, 2023 at 01:42:21PM -0800, Sathyanarayanan Kuppuswamy wrote:
> >> On 1/11/23 12:31 PM, Vidya Sagar wrote:
> >>> As the ECRC configuration bits are part of AER registers, configure
> >>> ECRC only if AER is natively owned by the kernel.
> >>
> >> ecrc command line option takes "bios/on/off" as possible options. It
> >> does not clarify whether "on/off" choices can only be used if AER is
> >> owned by OS or it can override the ownership of ECRC configuration 
> >> similar to pcie_ports=native option. Maybe that needs to be clarified.
> > 
> > Good point, what do you think of an update like this:
> > 
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> > b/Documentation/admin-guide/kernel-parameters.txt
> > index 6cfa6e3996cf..f7b40a439194 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -4296,7 +4296,9 @@
> > specified, e.g., 12@pci:8086:9c22:103c:198f
> > for 4096-byte alignment.
> > ecrc=   Enable/disable PCIe ECRC (transaction layer
> > -   end-to-end CRC checking).
> > +   end-to-end CRC checking).  Only effective
> > +   if OS has native AER control (either granted by
> > +   ACPI _OSC or forced via "pcie_ports=native").
> > bios: Use BIOS/firmware settings. This is the
> > the default.
> > off: Turn ECRC off
> 
> Looks fine. But do we even need "bios" option? Since it is the default
> value, I am not sure why we need to list that as an option again. IMO
> this could be removed.

I agree, it seems pointless.

> > I don't know whether the "ecrc=" parameter is really needed.  If we
> > were adding it today, I would ask "why not enable ECRC wherever it is
> > supported?"  If there are devices where it's broken, we could always
> > add quirks to disable it on a case-by-case basis.
> 
> Checking the original patch which added it, it looks like the intention
> is to give option to boost performance over integrity.
> 
> commit 43c16408842b0eeb367c23a6fa540ce69f99e347
> Author: Andrew Patterson 
> Date:   Wed Apr 22 16:52:09 2009 -0600
> 
> PCI: Add support for turning PCIe ECRC on or off
> 
> Adds support for PCI Express transaction layer end-to-end CRC checking
> (ECRC).  This patch will enable/disable ECRC checking by setting/clearing
> the ECRC Check Enable and/or ECRC Generation Enable bits for devices that
> support ECRC.
> 
> The ECRC setting is controlled by the "pci=ecrc=" command-line
> option. If this option is not set or is set to 'bios", the enable and
> generation bits are left in whatever state that firmware/BIOS set them to.
> The "off" setting turns them off, and the "on" option turns them on (if 
> the
> device supports it).
> 
> Turning ECRC on or off can be a data integrity versus performance
> tradeoff.  In theory, turning it on will catch more data errors, turning
> it off means possibly better performance since CRC does not need to be
> calculated by the PCIe hardware and packet sizes are reduced.

Ah, right, and I think I was even part of the conversation when this
was added :)

I'm not sure I would make the same choice today, though.  IMHO it's
kind of hard to defend choosing performance over data integrity.

If a platform really wants to sacrifice integrity for performance, it
could retain control of AER, and after Vidya's patch, Linux will leave
the ECRC configuration alone.

Straw-man: If Linux owns AER and ECRC is supported, enable ECRC by
default.  Retain "ecrc=off" to turn it off, but drop a note in dmesg
and taint the kernel.
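
The straw-man can be written down as a small decision function.  This is
a userspace sketch only; the type and function names are hypothetical,
and it encodes nothing beyond the policy described above.

```c
#include <stdbool.h>

/* Hypothetical policy values: no parameter given, or "ecrc=off" */
enum sketch_ecrc_policy {
	SKETCH_ECRC_DEFAULT,
	SKETCH_ECRC_OFF,
};

struct sketch_ecrc_decision {
	bool touch;  /* should the OS write the ECRC enable bits at all? */
	bool enable; /* if so, turn ECRC on (true) or off (false) */
	bool taint;  /* note in dmesg and taint the kernel */
};

static struct sketch_ecrc_decision
sketch_ecrc_decide(bool aer_native, bool ecrc_supported,
		   enum sketch_ecrc_policy policy)
{
	struct sketch_ecrc_decision d = { false, false, false };

	/* Firmware owns AER, or no ECRC support: leave the config alone */
	if (!aer_native || !ecrc_supported)
		return d;

	d.touch = true;
	if (policy == SKETCH_ECRC_OFF)
		d.taint = true;  /* integrity traded for performance */
	else
		d.enable = true; /* default: enable wherever supported */

	return d;
}
```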

Bjorn


Re: [PATCH V1] PCI/AER: Configure ECRC only AER is native

2023-01-11 Thread Bjorn Helgaas
On Wed, Jan 11, 2023 at 01:42:21PM -0800, Sathyanarayanan Kuppuswamy wrote:
> On 1/11/23 12:31 PM, Vidya Sagar wrote:
> > As the ECRC configuration bits are part of AER registers, configure
> > ECRC only if AER is natively owned by the kernel.
> 
> ecrc command line option takes "bios/on/off" as possible options. It
> does not clarify whether "on/off" choices can only be used if AER is
> owned by OS or it can override the ownership of ECRC configuration 
> similar to pcie_ports=native option. Maybe that needs to be clarified.

Good point, what do you think of an update like this:

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 6cfa6e3996cf..f7b40a439194 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4296,7 +4296,9 @@
specified, e.g., 12@pci:8086:9c22:103c:198f
for 4096-byte alignment.
ecrc=   Enable/disable PCIe ECRC (transaction layer
-   end-to-end CRC checking).
+   end-to-end CRC checking).  Only effective
+   if OS has native AER control (either granted by
+   ACPI _OSC or forced via "pcie_ports=native").
bios: Use BIOS/firmware settings. This is the
the default.
off: Turn ECRC off

I don't know whether the "ecrc=" parameter is really needed.  If we
were adding it today, I would ask "why not enable ECRC wherever it is
supported?"  If there are devices where it's broken, we could always
add quirks to disable it on a case-by-case basis.

But I think the patch below is the right thing to do for now.  Vidya,
did you trip over an issue because of this, e.g., a conflict between
firmware use of AER and Linux use of it?  If so, maybe we could
mention a symptom on the commit log.  But my guess is you probably
found this by inspection.

Bjorn

> > Signed-off-by: Vidya Sagar 
> > ---
> >  drivers/pci/pcie/aer.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > index e2d8a74f83c3..730b47bdcdef 100644
> > --- a/drivers/pci/pcie/aer.c
> > +++ b/drivers/pci/pcie/aer.c
> > @@ -184,6 +184,9 @@ static int disable_ecrc_checking(struct pci_dev *dev)
> >   */
> >  void pcie_set_ecrc_checking(struct pci_dev *dev)
> >  {
> > +   if (!pcie_aer_is_native(dev))
> > +   return;
> > +
> > switch (ecrc_policy) {
> > case ECRC_POLICY_DEFAULT:
> > return;
> 
> -- 
> Sathyanarayanan Kuppuswamy
> Linux Kernel Developer


Re: [PATCH net-next 2/7] PCI: Remove PCI IDs used by the Sun Cassini driver

2023-01-10 Thread Bjorn Helgaas
On Fri, Jan 06, 2023 at 02:00:15PM -0800, Anirudh Venkataramanan wrote:
> The previous patch removed the Cassini driver (drivers/net/ethernet/sun).
> With this, PCI_DEVICE_ID_NS_SATURN and PCI_DEVICE_ID_SUN_CASSINI are
> unused. Remove them.
> 
> Cc: Leon Romanovsky 
> Signed-off-by: Anirudh Venkataramanan 
> ---
>  include/linux/pci_ids.h | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
> index b362d90..eca2340 100644
> --- a/include/linux/pci_ids.h
> +++ b/include/linux/pci_ids.h
> @@ -433,7 +433,6 @@
>  #define PCI_DEVICE_ID_NS_CS5535_AUDIO    0x002e
>  #define PCI_DEVICE_ID_NS_CS5535_USB      0x002f
>  #define PCI_DEVICE_ID_NS_GX_VIDEO        0x0030
> -#define PCI_DEVICE_ID_NS_SATURN  0x0035
>  #define PCI_DEVICE_ID_NS_SCx200_BRIDGE   0x0500
>  #define PCI_DEVICE_ID_NS_SCx200_SMI  0x0501
>  #define PCI_DEVICE_ID_NS_SCx200_IDE  0x0502
> @@ -1047,7 +1046,6 @@
>  #define PCI_DEVICE_ID_SUN_SABRE  0xa000
>  #define PCI_DEVICE_ID_SUN_HUMMINGBIRD    0xa001
>  #define PCI_DEVICE_ID_SUN_TOMATILLO      0xa801
> -#define PCI_DEVICE_ID_SUN_CASSINI        0xabba

I don't think there's value in removing these definitions.  I would
just leave them alone.


Re: [PATCH] PCI/AER: Rate limit the reporting of the correctable errors

2023-01-03 Thread Bjorn Helgaas
[+cc Paul, Sasha, Leon, Frederick]

(Please cc folks who have commented on previous versions of your
patch.)

On Tue, Jan 03, 2023 at 10:25:48PM +0530, Rajat Khandelwal wrote:
> There are many instances where correctable errors tend to inundate
> the message buffer. We observe such instances during thunderbolt PCIe
> tunneling.
> 
> It's true that they are mitigated by the hardware and are non-fatal
> but we shouldn't be spamming the logs with such correctable errors as it
> confuses other kernel developers less familiar with PCI errors, support
> staff, and users who happen to look at the logs, hence rate limit them.

I want a better understanding of why we have so many errors before
rate-limiting everybody.

> A typical example log inside an HP TBT4 dock:
> [54912.661142] pcieport 0000:00:07.0: AER: Multiple Corrected error received: 
> 0000:2b:00.0
> [54912.661194] igc 0000:2b:00.0: PCIe Bus Error: severity=Corrected, 
> type=Data Link Layer, (Transmitter ID)
> [54912.661203] igc 0000:2b:00.0:   device [8086:5502] error 
> status/mask=00001100/00002000
> [54912.661211] igc 0000:2b:00.0:    [ 8] Rollover
> [54912.661219] igc 0000:2b:00.0:    [12] Timeout
> [54982.838760] pcieport 0000:00:07.0: AER: Corrected error received: 
> 0000:2b:00.0
> [54982.838798] igc 0000:2b:00.0: PCIe Bus Error: severity=Corrected, 
> type=Data Link Layer, (Transmitter ID)
> [54982.838808] igc 0000:2b:00.0:   device [8086:5502] error 
> status/mask=00001000/00002000
> [54982.838817] igc 0000:2b:00.0:    [12] Timeout

Please remove the timestamps; they don't contribute to understanding
the problem.

> This gets repeated continuously, thus inundating the buffer.

Did you verify that we actually clear the Correctable Error Status
register?

https://bugzilla.kernel.org/show_bug.cgi?id=216863 looks like a
similar issue.  The issue Frederick is seeing happens when resuming
from sleep.  Is there some event that triggers the correctable errors
you see?

Bjorn


Re: [PATCH v3 4/9] scsi: lpfc: Change to use pci_aer_clear_uncorrect_error_status()

2022-12-06 Thread Bjorn Helgaas
[moved James, Dick, LPFC supporters to "to"]

On Wed, Sep 28, 2022 at 06:59:41PM +0800, Zhuo Chen wrote:
> lpfc_aer_cleanup_state() requires clearing both fatal and non-fatal
> uncorrectable error status.

I don't know what the point of lpfc_aer_cleanup_state() is.  AER
errors should be handled and cleared by the PCI core, not by
individual drivers.  Only lpfc, liquidio, and sky2 touch
PCI_ERR_UNCOR_STATUS.

But lpfc_aer_cleanup_state() is visible in the
"lpfc_aer_state_cleanup" sysfs file, so removing it would break any
userspace that uses it.

If we can rely on the PCI core to clean up AER errors itself
(admittedly, that might be a big "if"), maybe lpfc_aer_cleanup_state()
could just become a no-op?

Any comment from the LPFC folks?

Ideally, I would rather not export pci_aer_clear_nonfatal_status() or
pci_aer_clear_uncorrect_error_status() outside the PCI core at all.
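
For readers following along, the difference between the helpers under
discussion can be modeled in userspace.  This is a sketch based on my
reading of drivers/pci/pcie/aer.c, not the kernel code itself: the
struct and sketch_* names are hypothetical, and the status register is
modeled as write-1-to-clear.

```c
#include <stdint.h>

/* Toy model of the two AER registers involved */
struct sketch_aer_regs {
	uint32_t uncor_status; /* Uncorrectable Error Status (RW1C) */
	uint32_t uncor_sever;  /* Uncorrectable Error Severity: 1 = fatal */
};

/* Writing a 1 to a status bit clears it; writing 0 leaves it alone */
static void sketch_write_status(struct sketch_aer_regs *r, uint32_t val)
{
	r->uncor_status &= ~val;
}

/* Clear only bits whose severity is non-fatal */
static void sketch_clear_nonfatal(struct sketch_aer_regs *r)
{
	uint32_t status = r->uncor_status & ~r->uncor_sever;

	if (status)
		sketch_write_status(r, status);
}

/* Clear only bits whose severity is fatal */
static void sketch_clear_fatal(struct sketch_aer_regs *r)
{
	uint32_t status = r->uncor_status & r->uncor_sever;

	if (status)
		sketch_write_status(r, status);
}

/* Clear everything that is set, as the proposed helper does */
static void sketch_clear_uncorrect(struct sketch_aer_regs *r)
{
	uint32_t status = r->uncor_status;

	if (status)
		sketch_write_status(r, status);
}
```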

> But using pci_aer_clear_nonfatal_status()
> will only clear non-fatal error status. To clear both fatal and
> non-fatal error status, use pci_aer_clear_uncorrect_error_status().
> 
> Signed-off-by: Zhuo Chen 
> ---
>  drivers/scsi/lpfc/lpfc_attr.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
> index 09cf2cd0ae60..d835cc0ba153 100644
> --- a/drivers/scsi/lpfc/lpfc_attr.c
> +++ b/drivers/scsi/lpfc/lpfc_attr.c
> @@ -4689,7 +4689,7 @@ static DEVICE_ATTR_RW(lpfc_aer_support);
>   * Description:
>   * If the @buf contains 1 and the device currently has the AER support
>   * enabled, then invokes the kernel AER helper routine
> - * pci_aer_clear_nonfatal_status() to clean up the uncorrectable
> + * pci_aer_clear_uncorrect_error_status() to clean up the uncorrectable
>   * error status register.
>   *
>   * Notes:
> @@ -4715,7 +4715,7 @@ lpfc_aer_cleanup_state(struct device *dev, struct device_attribute *attr,
>   return -EINVAL;
>  
>   if (phba->hba_flag & HBA_AER_ENABLED)
> - rc = pci_aer_clear_nonfatal_status(phba->pcidev);
> + rc = pci_aer_clear_uncorrect_error_status(phba->pcidev);
>  
>   if (rc == 0)
>   return strlen(buf);
> -- 
> 2.30.1 (Apple Git-130)
> 


Re: [PATCH v3 8/9] PCI/ERR: Clear fatal error status when pci_channel_io_frozen

2022-12-06 Thread Bjorn Helgaas
Hi Zhuo,

On Wed, Sep 28, 2022 at 06:59:45PM +0800, Zhuo Chen wrote:
> When state is pci_channel_io_frozen in pcie_do_recovery(), the
> severity is fatal and fatal error status should be cleared.
> So add pci_aer_clear_fatal_status().
> 
> Signed-off-by: Zhuo Chen 
> ---
>  drivers/pci/pcie/err.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index f80b21244ef1..b46f1d36c090 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -241,7 +241,10 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>   pci_walk_bridge(bridge, report_resume, &status);
>  
>   pcie_clear_device_status(dev);
> - pci_aer_clear_nonfatal_status(dev);
> + if (state == pci_channel_io_frozen)
> + pci_aer_clear_fatal_status(dev);
> + else
> + pci_aer_clear_nonfatal_status(dev);

I'm confused.  It seems like we certainly need to clear fatal errors
after they occur *somewhere*, and if we don't, surely this would be a
very obvious issue.  But you didn't mention this being a bug fix, so I
assume it's more of a cleanup.

If it *is* a bug fix, please say that and give a hint about what the
bug looks like, e.g., what sort of messages a user might see.

If it's not a bug fix, I don't understand how AER fatal errors get
cleared today.  The PCI_ERR_UNCOR_STATUS bits are sticky, so they're
not cleared by a reset.  In the current tree, these are the only
places I see that clear AER fatal errors:

  pci_init_capabilities
    pci_aer_init                # once at device enumeration
      pci_aer_clear_status
        pci_aer_raw_clear_status
          pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status)

  aer_probe
    aer_enable_rootport         # once at Root Port enumeration
      pci_write_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, reg32)

  dpc_process_error             # after DPC triggered
    pci_aer_clear_fatal_status
      pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status)

  edr_handle_event              # after EDR event
    pci_aer_raw_clear_status
      pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status)

  pci_restore_state             # after reset or PM sleep/resume
    pci_aer_clear_status
      pci_aer_raw_clear_status
        pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status)

The only one that could clear errors after an AER error (not DPC or
EDR), would be the pci_restore_state() in the reset path.  If the
current code relies on that, I'd say that's a pretty non-obvious
dependency.
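
To make the fatal/non-fatal split concrete, here is a minimal userspace
sketch of the register behavior discussed above.  It is not kernel code:
struct aer_model and the model_*() helpers are hypothetical names, and it
assumes (as I understand the helpers) that the fatal and non-fatal clear
routines select bits by masking the status with PCI_ERR_UNCOR_SEVER and
then write those bits back to a write-1-to-clear register.

```c
#include <stdint.h>

/*
 * Toy model of one device's AER capability.  The field names are
 * hypothetical; only the register semantics follow the discussion.
 */
struct aer_model {
	uint32_t uncor_status;	/* sticky, write-1-to-clear status bits */
	uint32_t uncor_sever;	/* per-error severity: 1 = fatal, 0 = non-fatal */
};

/* A write to a write-1-to-clear register clears exactly the bits set in val. */
void rw1c_write(uint32_t *reg, uint32_t val)
{
	*reg &= ~val;
}

/* Clear only the logged errors that the severity register marks non-fatal. */
void model_clear_nonfatal(struct aer_model *aer)
{
	uint32_t status = aer->uncor_status & ~aer->uncor_sever;

	if (status)
		rw1c_write(&aer->uncor_status, status);
}

/* Clear only the logged errors that the severity register marks fatal. */
void model_clear_fatal(struct aer_model *aer)
{
	uint32_t status = aer->uncor_status & aer->uncor_sever;

	if (status)
		rw1c_write(&aer->uncor_status, status);
}

/* Sticky bits survive a reset, so a reset changes nothing in this model. */
void model_reset(struct aer_model *aer)
{
	(void)aer;
}
```

In this model a fatal bit stays set after model_reset() and after
model_clear_nonfatal(); only model_clear_fatal() (or a write of the full
status value) removes it, which is the gap the question above is about.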

>   pci_info(bridge, "device recovery successful\n");
>   return status;
> -- 
> 2.30.1 (Apple Git-130)
> 


Re: [PATCH v3 3/9] NTB: Remove pci_aer_clear_nonfatal_status() call

2022-12-06 Thread Bjorn Helgaas
On Wed, Sep 28, 2022 at 02:03:55PM +0300, Serge Semin wrote:
> On Wed, Sep 28, 2022 at 06:59:40PM +0800, Zhuo Chen wrote:
> > There is no need to clear error status during init code, so remove it.
> 
> Why do you think there isn't? Justify in more details.

Thanks for taking a look, Sergey!  I agree we should leave it or add
the rationale here.

> > Signed-off-by: Zhuo Chen 
> > ---
> >  drivers/ntb/hw/idt/ntb_hw_idt.c | 2 --
> >  1 file changed, 2 deletions(-)
> > 
> > diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
> > index 0ed6f809ff2e..fed03217289d 100644
> > --- a/drivers/ntb/hw/idt/ntb_hw_idt.c
> > +++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
> > @@ -2657,8 +2657,6 @@ static int idt_init_pci(struct idt_ntb_dev *ndev)
> > ret = pci_enable_pcie_error_reporting(pdev);
> > if (ret != 0)
> > dev_warn(&pdev->dev, "PCIe AER capability disabled\n");
> > -   else /* Cleanup nonfatal error status before getting to init */
> > -   pci_aer_clear_nonfatal_status(pdev);

I do think drivers should not need to clear errors; I think the PCI
core should be responsible for that.

And I think the core *does* do that in this path:

  pci_init_capabilities
    pci_aer_init
      pci_aer_clear_status
        pci_aer_raw_clear_status
          pci_write_config_dword(pdev, aer + PCI_ERR_COR_STATUS)
          pci_write_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS)

pci_aer_clear_nonfatal_status() clears only non-fatal uncorrectable
errors, while pci_aer_init() clears all correctable and all
uncorrectable errors, so the PCI core is already doing more than
idt_init_pci() does.

So I think this change is good because it removes some work from the
driver, but let me know if you think otherwise.

> >  
> > /* First enable the PCI device */
> > ret = pcim_enable_device(pdev);
> > -- 
> > 2.30.1 (Apple Git-130)
> > 


[PATCH] cxl: Remove unnecessary cxl_pci_window_alignment()

2022-12-05 Thread Bjorn Helgaas
From: Bjorn Helgaas 

cxl_pci_window_alignment() is referenced only via the struct
pci_controller_ops.window_alignment function pointer, and only in the
powerpc implementation of pcibios_window_alignment().

pcibios_window_alignment() defaults to returning 1 if the function pointer
is NULL, which is the same as what cxl_pci_window_alignment() does.

cxl_pci_window_alignment() is unnecessary, so remove it.  No functional
change intended.

Signed-off-by: Bjorn Helgaas 
---
 drivers/misc/cxl/vphb.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/drivers/misc/cxl/vphb.c b/drivers/misc/cxl/vphb.c
index 1264253cc07b..6332db8044bd 100644
--- a/drivers/misc/cxl/vphb.c
+++ b/drivers/misc/cxl/vphb.c
@@ -67,12 +67,6 @@ static void cxl_pci_disable_device(struct pci_dev *dev)
}
 }
 
-static resource_size_t cxl_pci_window_alignment(struct pci_bus *bus,
-   unsigned long type)
-{
-   return 1;
-}
-
 static void cxl_pci_reset_secondary_bus(struct pci_dev *dev)
 {
/* Should we do an AFU reset here ? */
@@ -200,7 +194,6 @@ static struct pci_controller_ops cxl_pci_controller_ops =
.enable_device_hook = cxl_pci_enable_device_hook,
.disable_device = cxl_pci_disable_device,
.release_device = cxl_pci_disable_device,
-   .window_alignment = cxl_pci_window_alignment,
.reset_secondary_bus = cxl_pci_reset_secondary_bus,
.setup_msi_irqs = cxl_setup_msi_irqs,
.teardown_msi_irqs = cxl_teardown_msi_irqs,
-- 
2.25.1
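
For readers unfamiliar with the pattern, the reasoning in the commit log
can be sketched in plain C.  This is an illustration only, under the
assumption that the dispatcher simply falls back to 1 when no hook is
installed, as the log states; every name ending in _model, and
always_one(), is hypothetical and not a kernel symbol.

```c
#include <stddef.h>

typedef unsigned long long resource_size_model_t;

struct bus_model {
	int number;
};

/* Stand-in for an ops table with an optional alignment hook. */
struct controller_ops_model {
	resource_size_model_t (*window_alignment)(struct bus_model *bus,
						  unsigned long type);
};

/* Dispatcher: use the hook when present, otherwise return the default. */
resource_size_model_t window_alignment_model(const struct controller_ops_model *ops,
					     struct bus_model *bus,
					     unsigned long type)
{
	if (ops && ops->window_alignment)
		return ops->window_alignment(bus, type);

	/* No hook installed: fall back to the default of 1. */
	return 1;
}

/* Shape of the removed callback: it only ever returned the default. */
resource_size_model_t always_one(struct bus_model *bus, unsigned long type)
{
	(void)bus;
	(void)type;
	return 1;
}
```

With or without always_one() installed, window_alignment_model() returns
1, which is why dropping such a callback should not change behavior.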



Re: [patch 01/39] PCI/MSI: Check for MSI enabled in __pci_msix_enable()

2022-11-16 Thread Bjorn Helgaas
On Fri, Nov 11, 2022 at 02:54:15PM +0100, Thomas Gleixner wrote:
> PCI/MSI and PCI/MSI-X are mutually exclusive, but the MSI-X enable code
> lacks a check for already enabled MSI.
> 
> Signed-off-by: Thomas Gleixner 

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/msi/msi.c |5 +
>  1 file changed, 5 insertions(+)
> 
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -935,6 +935,11 @@ static int __pci_enable_msix_range(struc
>   if (maxvec < minvec)
>   return -ERANGE;
>  
> + if (dev->msi_enabled) {
> + pci_info(dev, "can't enable MSI-X (MSI already enabled)\n");
> + return -EINVAL;
> + }
> +
>   if (WARN_ON_ONCE(dev->msix_enabled))
>   return -EINVAL;
>  
> 


Re: [patch 28/39] PCI/MSI: Move pci_irq_get_affinity() to api.c

2022-11-16 Thread Bjorn Helgaas
On Fri, Nov 11, 2022 at 02:54:59PM +0100, Thomas Gleixner wrote:
> From: Ahmed S. Darwish 
> 
> To disentangle the maze in msi.c, all exported device-driver MSI APIs are
> now to be grouped in one file, api.c.
> 
> Move pci_irq_get_affinity() and let its kernel-doc match rest of the
> file.
> 
> Signed-off-by: Ahmed S. Darwish 
> Signed-off-by: Thomas Gleixner 

Acked-by: Bjorn Helgaas 

One nit below.

> ---
>  drivers/pci/msi/api.c | 43 +++
>  drivers/pci/msi/msi.c | 38 --
>  2 files changed, 43 insertions(+), 38 deletions(-)
> ---
> diff --git a/drivers/pci/msi/api.c b/drivers/pci/msi/api.c
> index 653a61868ae6..473df7ba0584 100644
> --- a/drivers/pci/msi/api.c
> +++ b/drivers/pci/msi/api.c
> @@ -9,6 +9,7 @@
>   */
>  
>  #include 
> +#include 
>  
>  #include "msi.h"
>  
> @@ -251,6 +252,48 @@ int pci_irq_vector(struct pci_dev *dev, unsigned int nr)
>  EXPORT_SYMBOL(pci_irq_vector);
>  
>  /**
> + * pci_irq_get_affinity() - Get a device interrupt vector affinity
> + * @dev: the PCI device to operate on
> + * @nr:  device-relative interrupt vector index (0-based); has different
> + *   meanings, depending on interrupt mode
> + *     MSI-X   the index in the MSI-X vector table
> + *     MSI     the index of the enabled MSI vectors
> + *     INTx    must be 0
> + *
> + * Return: MSI/MSI-X vector affinity, NULL if @nr is out of range or if
> + * the MSI(-X) vector was allocated without explicit affinity
> + * requirements (e.g., by pci_enable_msi(), pci_enable_msix_range(), or
> + * pci_alloc_irq_vectors() without the %PCI_IRQ_AFFINITY flag). Return a
> + * generic set of CPU ids representing all possible CPUs available
> + * during system boot if the device is in legacy INTx mode.

s/ids/IDs/

> + */
> +const struct cpumask *pci_irq_get_affinity(struct pci_dev *dev, int nr)
> +{
> + int idx, irq = pci_irq_vector(dev, nr);
> + struct msi_desc *desc;
> +
> + if (WARN_ON_ONCE(irq <= 0))
> + return NULL;
> +
> + desc = irq_get_msi_desc(irq);
> + /* Non-MSI does not have the information handy */
> + if (!desc)
> + return cpu_possible_mask;
> +
> + /* MSI[X] interrupts can be allocated without affinity descriptor */
> + if (!desc->affinity)
> + return NULL;
> +
> + /*
> +  * MSI has a mask array in the descriptor.
> +  * MSI-X has a single mask.
> +  */
> + idx = dev->msi_enabled ? nr : 0;
> + return &desc->affinity[idx].mask;
> +}
> +EXPORT_SYMBOL(pci_irq_get_affinity);
> +
> +/**
>   * pci_free_irq_vectors() - Free previously allocated IRQs for a device
>   * @dev: the PCI device to operate on
>   *
> diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
> index 6fa90d07d2e4..d78646d1c116 100644
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -854,44 +854,6 @@ int __pci_enable_msix_range(struct pci_dev *dev,
>   }
>  }
>  
> -/**
> - * pci_irq_get_affinity - return the affinity of a particular MSI vector
> - * @dev: PCI device to operate on
> - * @nr:  device-relative interrupt vector index (0-based).
> - *
> - * @nr has the following meanings depending on the interrupt mode:
> - *   MSI-X:  The index in the MSI-X vector table
> - *   MSI:The index of the enabled MSI vectors
> - *   INTx:   Must be 0
> - *
> - * Return: A cpumask pointer or NULL if @nr is out of range
> - */
> -const struct cpumask *pci_irq_get_affinity(struct pci_dev *dev, int nr)
> -{
> - int idx, irq = pci_irq_vector(dev, nr);
> - struct msi_desc *desc;
> -
> - if (WARN_ON_ONCE(irq <= 0))
> - return NULL;
> -
> - desc = irq_get_msi_desc(irq);
> - /* Non-MSI does not have the information handy */
> - if (!desc)
> - return cpu_possible_mask;
> -
> - /* MSI[X] interrupts can be allocated without affinity descriptor */
> - if (!desc->affinity)
> - return NULL;
> -
> - /*
> -  * MSI has a mask array in the descriptor.
> -  * MSI-X has a single mask.
> -  */
> - idx = dev->msi_enabled ? nr : 0;
> - return &desc->affinity[idx].mask;
> -}
> -EXPORT_SYMBOL(pci_irq_get_affinity);
> -
>  struct pci_dev *msi_desc_to_pci_dev(struct msi_desc *desc)
>  {
>   return to_pci_dev(desc->dev);
> 


Re: [patch 37/39] PCI/MSI: Remove redundant msi_check() callback

2022-11-16 Thread Bjorn Helgaas
On Fri, Nov 11, 2022 at 02:55:14PM +0100, Thomas Gleixner wrote:
> All these sanity checks are now done _before_ any allocation work
> happens. No point in doing it twice.
> 
> Signed-off-by: Thomas Gleixner 

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/msi/irqdomain.c |   48 
> 
>  1 file changed, 48 deletions(-)
> 
> --- a/drivers/pci/msi/irqdomain.c
> +++ b/drivers/pci/msi/irqdomain.c
> @@ -64,51 +64,6 @@ static irq_hw_number_t pci_msi_domain_ca
>   (pci_domain_nr(dev->bus) & 0x) << 27;
>  }
>  
> -static inline bool pci_msi_desc_is_multi_msi(struct msi_desc *desc)
> -{
> - return !desc->pci.msi_attrib.is_msix && desc->nvec_used > 1;
> -}
> -
> -/**
> - * pci_msi_domain_check_cap - Verify that @domain supports the capabilities
> - * for @dev
> - * @domain:  The interrupt domain to check
> - * @info:The domain info for verification
> - * @dev: The device to check
> - *
> - * Returns:
> - *  0 if the functionality is supported
> - *  1 if Multi MSI is requested, but the domain does not support it
> - *  -ENOTSUPP otherwise
> - */
> -static int pci_msi_domain_check_cap(struct irq_domain *domain,
> - struct msi_domain_info *info,
> - struct device *dev)
> -{
> - struct msi_desc *desc = msi_first_desc(dev, MSI_DESC_ALL);
> -
> - /* Special handling to support __pci_enable_msi_range() */
> - if (pci_msi_desc_is_multi_msi(desc) &&
> - !(info->flags & MSI_FLAG_MULTI_PCI_MSI))
> - return 1;
> -
> - if (desc->pci.msi_attrib.is_msix) {
> - if (!(info->flags & MSI_FLAG_PCI_MSIX))
> - return -ENOTSUPP;
> -
> - if (info->flags & MSI_FLAG_MSIX_CONTIGUOUS) {
> - unsigned int idx = 0;
> -
> - /* Check for gaps in the entry indices */
> - msi_for_each_desc(desc, dev, MSI_DESC_ALL) {
> - if (desc->msi_index != idx++)
> - return -ENOTSUPP;
> - }
> - }
> - }
> - return 0;
> -}
> -
>  static void pci_msi_domain_set_desc(msi_alloc_info_t *arg,
>   struct msi_desc *desc)
>  {
> @@ -118,7 +73,6 @@ static void pci_msi_domain_set_desc(msi_
>  
>  static struct msi_domain_ops pci_msi_domain_ops_default = {
>   .set_desc   = pci_msi_domain_set_desc,
> - .msi_check  = pci_msi_domain_check_cap,
>  };
>  
>  static void pci_msi_domain_update_dom_ops(struct msi_domain_info *info)
> @@ -130,8 +84,6 @@ static void pci_msi_domain_update_dom_op
>   } else {
>   if (ops->set_desc == NULL)
>   ops->set_desc = pci_msi_domain_set_desc;
> - if (ops->msi_check == NULL)
> - ops->msi_check = pci_msi_domain_check_cap;
>   }
>  }
>  
> 


Re: [patch 36/39] PCI/MSI: Validate MSIX contiguous restriction early

2022-11-16 Thread Bjorn Helgaas
On Fri, Nov 11, 2022 at 02:55:12PM +0100, Thomas Gleixner wrote:
> With interrupt domains the sanity check for MSI-X vector validation can be
> done _before_ any allocation happens. The sanity check only applies to the
> allocation functions which have an 'entries' array argument. The entries
> array is filled by the caller with the requested MSI-X indicies. Some drivers
> have gaps in the index space which is not supported on all architectures.
> 
> The PCI/MSI irqdomain has a 'feature' bit to enforce this validation late
> during the allocation phase.
> 
> Just do it right away before doing any other work along with the other
> sanity checks on that array.
> 
> Signed-off-by: Thomas Gleixner 

Acked-by: Bjorn Helgaas 

s/indicies/indices/ (commit log)
s/irqdomain/irq domain/?  IIRC previous logs used "irq domain"
s/MSIX/MSI-X/ (subject line)
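
The three checks on the entries array (within the hardware limit, no
duplicates, and optionally no gaps) can be tried out in isolation with
the userspace sketch below.  It mirrors the shape of the quoted
pci_msix_validate_entries() but is not the kernel function:
struct entry_model and validate_entries_model() are hypothetical names,
and the nogap decision is passed in as a plain bool instead of being
queried from the irq domain.

```c
#include <stdbool.h>
#include <stddef.h>

struct entry_model {
	unsigned int entry;	/* requested MSI-X table index */
};

bool validate_entries_model(const struct entry_model *entries, int nvec,
			    int hwsize, bool nogap)
{
	int i, j;

	/* Nothing to validate when the caller passes no entries array */
	if (!entries)
		return true;

	for (i = 0; i < nvec; i++) {
		/* Entry within hardware limit? */
		if (entries[i].entry >= (unsigned int)hwsize)
			return false;

		/* Check for duplicate entries */
		for (j = i + 1; j < nvec; j++) {
			if (entries[i].entry == entries[j].entry)
				return false;
		}

		/* With nogap set, slot i must request index i */
		if (nogap && entries[i].entry != (unsigned int)i)
			return false;
	}
	return true;
}
```

An array requesting indices 0, 2, 5 passes when nogap is false and fails
when it is true, which is the case the early check is meant to reject.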

> ---
>  drivers/pci/msi/msi.c |   11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -725,13 +725,17 @@ static int msix_capability_init(struct p
>   return ret;
>  }
>  
> -static bool pci_msix_validate_entries(struct msix_entry *entries, int nvec, 
> int hwsize)
> +static bool pci_msix_validate_entries(struct pci_dev *dev, struct msix_entry 
> *entries,
> +   int nvec, int hwsize)
>  {
> + bool nogap;
>   int i, j;
>  
>   if (!entries)
>   return true;
>  
> + nogap = pci_msi_domain_supports(dev, MSI_FLAG_MSIX_CONTIGUOUS, 
> DENY_LEGACY);
> +
>   for (i = 0; i < nvec; i++) {
>   /* Entry within hardware limit? */
>   if (entries[i].entry >= hwsize)
> @@ -742,6 +746,9 @@ static bool pci_msix_validate_entries(st
>   if (entries[i].entry == entries[j].entry)
>   return false;
>   }
> + /* Check for unsupported gaps */
> + if (nogap && entries[i].entry != i)
> + return false;
>   }
>   return true;
>  }
> @@ -773,7 +780,7 @@ int __pci_enable_msix_range(struct pci_d
>   if (hwsize < 0)
>   return hwsize;
>  
> - if (!pci_msix_validate_entries(entries, nvec, hwsize))
> + if (!pci_msix_validate_entries(dev, entries, nvec, hwsize))
>   return -EINVAL;
>  
>   /* PCI_IRQ_VIRTUAL is a horrible hack! */
> 


Re: [patch 35/39] PCI/MSI: Reject MSI-X early

2022-11-16 Thread Bjorn Helgaas
On Fri, Nov 11, 2022 at 02:55:11PM +0100, Thomas Gleixner wrote:
> Similar to PCI multi-MSI reject MSI-X enablement when a irq domain is
> attached to the device which does not support MSI-X.
> 
> Signed-off-by: Thomas Gleixner 

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/msi/msi.c |4 
>  1 file changed, 4 insertions(+)
> 
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -760,6 +760,10 @@ int __pci_enable_msix_range(struct pci_d
>   if (WARN_ON_ONCE(dev->msix_enabled))
>   return -EINVAL;
>  
> + /* Check MSI-X early on irq domain enabled architectures */
> + if (!pci_msi_domain_supports(dev, MSI_FLAG_PCI_MSIX, ALLOW_LEGACY))
> + return -ENOTSUPP;
> +
>   if (!pci_msi_supported(dev, nvec) || dev->current_state != PCI_D0)
>   return -EINVAL;
>  
> 


Re: [patch 34/39] PCI/MSI: Reject multi-MSI early

2022-11-16 Thread Bjorn Helgaas
On Fri, Nov 11, 2022 at 02:55:09PM +0100, Thomas Gleixner wrote:
> When hierarchical MSI interrupt domains are enabled then there is no point
> to do tons of work and detect the missing support for multi-MSI late in the
> allocation path.
> 
> Just query the domain feature flags right away. The query function is going
> to be used for other purposes later and has a mode argument which influences
> the result:
> 
>   ALLOW_LEGACY returns true when:
>  - there is no irq domain attached (legacy support)
>  - there is a irq domain attached which has the feature flag set
> 
>   DENY_LEGACY returns only true when:
>  - there is a irq domain attached which has the feature flag set
> 
> This allows to use the function universally without ifdeffery in the
> calling code.
> 
> Signed-off-by: Thomas Gleixner 

Acked-by: Bjorn Helgaas 
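
The ALLOW_LEGACY/DENY_LEGACY semantics described in the commit log are
easy to get backwards, so here is a small userspace model of the truth
table.  It follows the logic of the quoted pci_msi_domain_supports() but
is only a sketch: struct domain_model, the *_MODEL enumerators, and the
flag values used in the usage note are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

enum support_mode_model {
	ALLOW_LEGACY_MODEL,
	DENY_LEGACY_MODEL,
};

/* Stand-in for an irq domain plus its feature flags. */
struct domain_model {
	bool hierarchical;
	unsigned int flags;
};

bool domain_supports_model(const struct domain_model *domain,
			   unsigned int feature_mask,
			   enum support_mode_model mode)
{
	/* No (hierarchical) domain: the answer depends only on the mode */
	if (!domain || !domain->hierarchical)
		return mode == ALLOW_LEGACY_MODEL;

	/* Domain present: every requested feature bit must be set */
	return (domain->flags & feature_mask) == feature_mask;
}
```

With a NULL domain the function returns true for ALLOW_LEGACY_MODEL and
false for DENY_LEGACY_MODEL.  Once a hierarchical domain is attached the
mode no longer matters; a domain with flags 0x1 does not satisfy a
request for 0x3 because the match must be complete.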

> ---
>  drivers/pci/msi/irqdomain.c |   22 ++
>  drivers/pci/msi/msi.c   |4 
>  drivers/pci/msi/msi.h   |9 +
>  3 files changed, 35 insertions(+)
> 
> --- a/drivers/pci/msi/irqdomain.c
> +++ b/drivers/pci/msi/irqdomain.c
> @@ -187,6 +187,28 @@ struct irq_domain *pci_msi_create_irq_do
>  }
>  EXPORT_SYMBOL_GPL(pci_msi_create_irq_domain);
>  
> +/**
> + * pci_msi_domain_supports - Check for support of a particular feature flag
> + * @pdev:The PCI device to operate on
> + * @feature_mask:The feature mask to check for (full match)
> + * @mode:If ALLOW_LEGACY this grants the feature when there is 
> no irq domain
> + *   associated to the device. If DENY_LEGACY the lack of an 
> irq domain
> + *   makes the feature unsupported

Looks like some of these might be wider than 80 columns, which I think
was the typical width of this file.

> + */
> +bool pci_msi_domain_supports(struct pci_dev *pdev, unsigned int feature_mask,
> +  enum support_mode mode)
> +{
> + struct msi_domain_info *info;
> + struct irq_domain *domain;
> +
> + domain = dev_get_msi_domain(&pdev->dev);
> +
> + if (!domain || !irq_domain_is_hierarchy(domain))
> + return mode == ALLOW_LEGACY;
> + info = domain->host_data;
> + return (info->flags & feature_mask) == feature_mask;
> +}
> +
>  /*
>   * Users of the generic MSI infrastructure expect a device to have a single 
> ID,
>   * so with DMA aliases we have to pick the least-worst compromise. Devices 
> with
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -347,6 +347,10 @@ static int msi_capability_init(struct pc
>   struct msi_desc *entry;
>   int ret;
>  
> + /* Reject multi-MSI early on irq domain enabled architectures */
> + if (nvec > 1 && !pci_msi_domain_supports(dev, MSI_FLAG_MULTI_PCI_MSI, 
> ALLOW_LEGACY))
> + return 1;
> +
>   /*
>* Disable MSI during setup in the hardware, but mark it enabled
>* so that setup code can evaluate it.
> --- a/drivers/pci/msi/msi.h
> +++ b/drivers/pci/msi/msi.h
> @@ -97,6 +97,15 @@ int __pci_enable_msix_range(struct pci_d
>  void __pci_restore_msi_state(struct pci_dev *dev);
>  void __pci_restore_msix_state(struct pci_dev *dev);
>  
> +/* irq_domain related functionality */
> +
> +enum support_mode {
> + ALLOW_LEGACY,
> + DENY_LEGACY,
> +};
> +
> +bool pci_msi_domain_supports(struct pci_dev *dev, unsigned int feature_mask, 
> enum support_mode mode);
> +
>  /* Legacy (!IRQDOMAIN) fallbacks */
>  
>  #ifdef CONFIG_PCI_MSI_ARCH_FALLBACKS
> 


Re: [patch 33/39] PCI/MSI: Sanitize MSI-X checks

2022-11-16 Thread Bjorn Helgaas
On Fri, Nov 11, 2022 at 02:55:07PM +0100, Thomas Gleixner wrote:
> There is no point in doing the same sanity checks over and over in a loop
> during MSI-X enablement. Put them in front of the loop and return early
> when they fail.
> 
> Signed-off-by: Thomas Gleixner 

Acked-by: Bjorn Helgaas 
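
The vector-count decisions that the patch hoists out of the loop can be
sketched as one pure function.  This is a userspace illustration of the
checks visible in the quoted hunk, not the kernel code itself:
msix_vector_count_model() is a hypothetical name, hwsize stands in for
the result of pci_msix_vec_count(), and allow_virtual stands in for the
PCI_IRQ_VIRTUAL flag.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Return the number of vectors to request, or a negative errno.
 * A negative hwsize is treated as an error code and passed through.
 */
int msix_vector_count_model(int minvec, int maxvec, int hwsize,
			    bool allow_virtual)
{
	int nvec = maxvec;

	if (maxvec < minvec)
		return -ERANGE;

	if (hwsize < 0)
		return hwsize;

	/* Clamp to the hardware table size unless virtual vectors are allowed */
	if (nvec > hwsize && !allow_virtual)
		nvec = hwsize;

	/* After clamping, the caller's minimum may no longer be reachable */
	if (nvec < minvec)
		return -ENOSPC;

	return nvec;
}
```

For example, a request for 1 to 8 vectors on a 4-entry table yields 4,
while a request for 6 to 8 on the same table fails with -ENOSPC before
any allocation work starts.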

> ---
>  drivers/pci/msi/msi.c |   67 
> +-
>  1 file changed, 34 insertions(+), 33 deletions(-)
> 
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -721,47 +721,31 @@ static int msix_capability_init(struct p
>   return ret;
>  }
>  
> -static int __pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries,
> -  int nvec, struct irq_affinity *affd, int flags)
> +static bool pci_msix_validate_entries(struct msix_entry *entries, int nvec, 
> int hwsize)
>  {
> - int nr_entries;
>   int i, j;
>  
> - if (!pci_msi_supported(dev, nvec) || dev->current_state != PCI_D0)
> - return -EINVAL;
> + if (!entries)
> + return true;
>  
> - nr_entries = pci_msix_vec_count(dev);
> - if (nr_entries < 0)
> - return nr_entries;
> - if (nvec > nr_entries && !(flags & PCI_IRQ_VIRTUAL))
> - return nr_entries;
> -
> - if (entries) {
> - /* Check for any invalid entries */
> - for (i = 0; i < nvec; i++) {
> - if (entries[i].entry >= nr_entries)
> - return -EINVAL; /* invalid entry */
> - for (j = i + 1; j < nvec; j++) {
> - if (entries[i].entry == entries[j].entry)
> - return -EINVAL; /* duplicate entry */
> - }
> + for (i = 0; i < nvec; i++) {
> + /* Entry within hardware limit? */
> + if (entries[i].entry >= hwsize)
> + return false;
> +
> + /* Check for duplicate entries */
> + for (j = i + 1; j < nvec; j++) {
> + if (entries[i].entry == entries[j].entry)
> + return false;
>   }
>   }
> -
> - /* Check whether driver already requested for MSI IRQ */
> - if (dev->msi_enabled) {
> - pci_info(dev, "can't enable MSI-X (MSI IRQ already 
> assigned)\n");
> - return -EINVAL;
> - }
> - return msix_capability_init(dev, entries, nvec, affd);
> + return true;
>  }
>  
> -int __pci_enable_msix_range(struct pci_dev *dev,
> - struct msix_entry *entries, int minvec,
> - int maxvec, struct irq_affinity *affd,
> - int flags)
> +int __pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries, 
> int minvec,
> + int maxvec, struct irq_affinity *affd, int flags)
>  {
> - int rc, nvec = maxvec;
> + int hwsize, rc, nvec = maxvec;
>  
>   if (maxvec < minvec)
>   return -ERANGE;
> @@ -774,6 +758,23 @@ int __pci_enable_msix_range(struct pci_d
>   if (WARN_ON_ONCE(dev->msix_enabled))
>   return -EINVAL;
>  
> + if (!pci_msi_supported(dev, nvec) || dev->current_state != PCI_D0)
> + return -EINVAL;
> +
> + hwsize = pci_msix_vec_count(dev);
> + if (hwsize < 0)
> + return hwsize;
> +
> + if (!pci_msix_validate_entries(entries, nvec, hwsize))
> + return -EINVAL;
> +
> + /* PCI_IRQ_VIRTUAL is a horrible hack! */
> + if (nvec > hwsize && !(flags & PCI_IRQ_VIRTUAL))
> + nvec = hwsize;
> +
> + if (nvec < minvec)
> + return -ENOSPC;
> +
>   rc = pci_setup_msi_context(dev);
>   if (rc)
>   return rc;
> @@ -785,7 +786,7 @@ int __pci_enable_msix_range(struct pci_d
>   return -ENOSPC;
>   }
>  
> - rc = __pci_enable_msix(dev, entries, nvec, affd, flags);
> + rc = msix_capability_init(dev, entries, nvec, affd);
>   if (rc == 0)
>   return nvec;
>  
> 


Re: [patch 32/39] PCI/MSI: Reorder functions in msi.c

2022-11-16 Thread Bjorn Helgaas
On Fri, Nov 11, 2022 at 02:55:06PM +0100, Thomas Gleixner wrote:
> From: Ahmed S. Darwish 
> 
> There is no way to navigate msi.c without banging the head against the wall
> every now and then because MSI and MSI-X specific functions are
> intermingled and the code flow is completely non-obvious.
> 
> Reorder everthing so common helpers, MSI and MSI-X specific functions are
> grouped together.

s/everthing/everything/

> Suggested-by: Thomas Gleixner 
> Signed-off-by: Ahmed S. Darwish 
> Signed-off-by: Thomas Gleixner 

Acked-by: Bjorn Helgaas 

I assume this is pure code movement, so I didn't even look at the
text below.
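
One piece of the moved code that is worth seeing on its own is the walk
up the bus hierarchy in pci_msi_supported(), quoted below: a NO_MSI flag
on any ancestor bus disables MSI for the device.  The sketch below models
just that walk in userspace.  struct bus_node_model,
BUS_FLAG_NO_MSI_MODEL, and msi_allowed_on_path_model() are hypothetical
names, not kernel symbols.

```c
#include <stdbool.h>
#include <stddef.h>

#define BUS_FLAG_NO_MSI_MODEL 0x1u

/* Stand-in for a bus with a link to its parent bus (NULL at the root). */
struct bus_node_model {
	struct bus_node_model *parent;
	unsigned int flags;
};

/* MSI is usable only if no bus between the device and the root forbids it. */
bool msi_allowed_on_path_model(const struct bus_node_model *bus)
{
	for (; bus; bus = bus->parent)
		if (bus->flags & BUS_FLAG_NO_MSI_MODEL)
			return false;

	return true;
}
```

A device two levels below a flagged bridge is rejected even though its
own bus carries no flag, while a sibling branch under the same root is
unaffected.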

> ---
>  drivers/pci/msi/msi.c |  577 
> +-
>  1 file changed, 295 insertions(+), 282 deletions(-)
> 
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -16,6 +16,97 @@
>  int pci_msi_enable = 1;
>  int pci_msi_ignore_mask;
>  
> +/**
> + * pci_msi_supported - check whether MSI may be enabled on a device
> + * @dev: pointer to the pci_dev data structure of MSI device function
> + * @nvec: how many MSIs have been requested?
> + *
> + * Look at global flags, the device itself, and its parent buses
> + * to determine if MSI/-X are supported for the device. If MSI/-X is
> + * supported return 1, else return 0.
> + **/
> +static int pci_msi_supported(struct pci_dev *dev, int nvec)
> +{
> + struct pci_bus *bus;
> +
> + /* MSI must be globally enabled and supported by the device */
> + if (!pci_msi_enable)
> + return 0;
> +
> + if (!dev || dev->no_msi)
> + return 0;
> +
> + /*
> +  * You can't ask to have 0 or less MSIs configured.
> +  *  a) it's stupid ..
> +  *  b) the list manipulation code assumes nvec >= 1.
> +  */
> + if (nvec < 1)
> + return 0;
> +
> + /*
> +  * Any bridge which does NOT route MSI transactions from its
> +  * secondary bus to its primary bus must set NO_MSI flag on
> +  * the secondary pci_bus.
> +  *
> +  * The NO_MSI flag can either be set directly by:
> +  * - arch-specific PCI host bus controller drivers (deprecated)
> +  * - quirks for specific PCI bridges
> +  *
> +  * or indirectly by platform-specific PCI host bridge drivers by
> +  * advertising the 'msi_domain' property, which results in
> +  * the NO_MSI flag when no MSI domain is found for this bridge
> +  * at probe time.
> +  */
> + for (bus = dev->bus; bus; bus = bus->parent)
> + if (bus->bus_flags & PCI_BUS_FLAGS_NO_MSI)
> + return 0;
> +
> + return 1;
> +}
> +
> +static void pcim_msi_release(void *pcidev)
> +{
> + struct pci_dev *dev = pcidev;
> +
> + dev->is_msi_managed = false;
> + pci_free_irq_vectors(dev);
> +}
> +
> +/*
> + * Needs to be separate from pcim_release to prevent an ordering problem
> + * vs. msi_device_data_release() in the MSI core code.
> + */
> +static int pcim_setup_msi_release(struct pci_dev *dev)
> +{
> + int ret;
> +
> + if (!pci_is_managed(dev) || dev->is_msi_managed)
> + return 0;
> +
> + ret = devm_add_action(&dev->dev, pcim_msi_release, dev);
> + if (!ret)
> + dev->is_msi_managed = true;
> + return ret;
> +}
> +
> +/*
> + * Ordering vs. devres: msi device data has to be installed first so that
> + * pcim_msi_release() is invoked before it on device release.
> + */
> +static int pci_setup_msi_context(struct pci_dev *dev)
> +{
> + int ret = msi_setup_device_data(&dev->dev);
> +
> + if (!ret)
> + ret = pcim_setup_msi_release(dev);
> + return ret;
> +}
> +
> +/*
> + * Helper functions for mask/unmask and MSI message handling
> + */
> +
>  void pci_msi_update_mask(struct msi_desc *desc, u32 clear, u32 set)
>  {
>   raw_spinlock_t *lock = &to_pci_dev(desc->dev)->msi_lock;
> @@ -163,15 +254,8 @@ void pci_write_msi_msg(unsigned int irq,
>  }
>  EXPORT_SYMBOL_GPL(pci_write_msi_msg);
>  
> -void pci_free_msi_irqs(struct pci_dev *dev)
> -{
> - pci_msi_teardown_msi_irqs(dev);
>  
> - if (dev->msix_base) {
> - iounmap(dev->msix_base);
> - dev->msix_base = NULL;
> - }
> -}
> +/* PCI/MSI specific functionality */
>  
>  static void pci_intx_for_msi(struct pci_dev *dev, int enable)
>  {
> @@ -190,111 +274,6 @@ static void pci_msi_set_enable(struct pc
>   pci_write_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, control);
>  }
>  
> -/*
> - * Architecture override returns true

Re: [patch 31/39] Documentation: PCI: Add reference to PCI/MSI device driver APIs

2022-11-16 Thread Bjorn Helgaas
On Fri, Nov 11, 2022 at 02:55:04PM +0100, Thomas Gleixner wrote:
> From: Ahmed S. Darwish 
> 
> All exported device-driver MSI APIs are now grouped in one place at
> drivers/pci/msi/api.c with comprehensive kernel-docs added.
> 
> Reference these kernel-docs in the official PCI/MSI howto.
> 
> Signed-off-by: Ahmed S. Darwish 
> Signed-off-by: Thomas Gleixner 

Acked-by: Bjorn Helgaas 

> ---
>  Documentation/PCI/msi-howto.rst |   10 ++
>  1 file changed, 10 insertions(+)
> ---
> --- a/Documentation/PCI/msi-howto.rst
> +++ b/Documentation/PCI/msi-howto.rst
> @@ -285,3 +285,13 @@ to bridges between the PCI root and the
>  It is also worth checking the device driver to see whether it supports MSIs.
>  For example, it may contain calls to pci_alloc_irq_vectors() with the
>  PCI_IRQ_MSI or PCI_IRQ_MSIX flags.
> +
> +
> +List of device drivers MSI(-X) APIs
> +===
> +
> +The PCI/MSI subsystem has a dedicated C file for its exported device driver
> +APIs — `drivers/pci/msi/api.c`. The following functions are exported:
> +
> +.. kernel-doc:: drivers/pci/msi/api.c
> +   :export:
> 


Re: [patch 30/39] PCI/MSI: Move pci_msi_restore_state() to api.c

2022-11-16 Thread Bjorn Helgaas
On Fri, Nov 11, 2022 at 02:55:03PM +0100, Thomas Gleixner wrote:
> From: Ahmed S. Darwish 
> 
> To disentangle the maze in msi.c, all exported device-driver MSI APIs are
> now to be grouped in one file, api.c.
> 
> Move pci_msi_enabled() and add kernel-doc for the function.
> 
> Signed-off-by: Ahmed S. Darwish 
> Signed-off-by: Thomas Gleixner 

Acked-by: Bjorn Helgaas 

> diff --git a/drivers/pci/msi/api.c b/drivers/pci/msi/api.c
> index ee9ed5ccd94d..8d1cf6db9bd7 100644
> --- a/drivers/pci/msi/api.c
> +++ b/drivers/pci/msi/api.c
> @@ -308,6 +308,21 @@ void pci_free_irq_vectors(struct pci_dev *dev)
>  }
>  EXPORT_SYMBOL(pci_free_irq_vectors);
>  
> +/**
> + * pci_restore_msi_state() - Restore cached MSI(-X) state on device
> + * @dev: the PCI device to operate on
> + *
> + * Write the Linux-cached MSI(-X) state back on device. This is
> + * typically useful upon system resume, or after an error-recovery PCI
> + * adapter reset.
> + */
> +void pci_restore_msi_state(struct pci_dev *dev)
> +{
> + __pci_restore_msi_state(dev);
> + __pci_restore_msix_state(dev);
> +}
> +EXPORT_SYMBOL_GPL(pci_restore_msi_state);
> +
>  /**
>   * pci_msi_enabled() - Are MSI(-X) interrupts enabled system-wide?
>   *
> diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
> index 59c33bc7fe81..a5d168c823ff 100644
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -199,7 +199,7 @@ bool __weak arch_restore_msi_irqs(struct pci_dev *dev)
>   return true;
>  }
>  
> -static void __pci_restore_msi_state(struct pci_dev *dev)
> +void __pci_restore_msi_state(struct pci_dev *dev)
>  {
>   struct msi_desc *entry;
>   u16 control;
> @@ -231,7 +231,7 @@ static void pci_msix_clear_and_set_ctrl(struct pci_dev 
> *dev, u16 clear, u16 set)
>   pci_write_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, ctrl);
>  }
>  
> -static void __pci_restore_msix_state(struct pci_dev *dev)
> +void __pci_restore_msix_state(struct pci_dev *dev)
>  {
>   struct msi_desc *entry;
>   bool write_msg;
> @@ -257,13 +257,6 @@ static void __pci_restore_msix_state(struct pci_dev *dev)
>   pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
>  }
>  
> -void pci_restore_msi_state(struct pci_dev *dev)
> -{
> - __pci_restore_msi_state(dev);
> - __pci_restore_msix_state(dev);
> -}
> -EXPORT_SYMBOL_GPL(pci_restore_msi_state);
> -
>  static void pcim_msi_release(void *pcidev)
>  {
>   struct pci_dev *dev = pcidev;
> diff --git a/drivers/pci/msi/msi.h b/drivers/pci/msi/msi.h
> index f3f4ede53171..8170ef2c5ad0 100644
> --- a/drivers/pci/msi/msi.h
> +++ b/drivers/pci/msi/msi.h
> @@ -94,6 +94,8 @@ void pci_free_msi_irqs(struct pci_dev *dev);
>  int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec, struct irq_affinity *affd);
>  int __pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries, int minvec,
>   int maxvec,  struct irq_affinity *affd, int flags);
> +void __pci_restore_msi_state(struct pci_dev *dev);
> +void __pci_restore_msix_state(struct pci_dev *dev);
>  
>  /* Legacy (!IRQDOMAIN) fallbacks */
>  
> 
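
The restore-from-cache behaviour described in the kernel-doc above can be modelled in userspace. This is only a minimal sketch under stated assumptions: `struct mock_pci_dev`, its fields, and every `mock_*` function are hypothetical stand-ins invented for illustration, not kernel types or APIs, and the early returns for a disabled mode are an assumption about how the internal helpers behave.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct pci_dev: only what this sketch needs. */
struct mock_pci_dev {
	bool     msi_enabled;      /* software view: driver enabled MSI */
	bool     msix_enabled;     /* software view: driver enabled MSI-X */
	uint16_t cached_msi_ctrl;  /* cached copy of the MSI control word */
	uint16_t cached_msix_ctrl; /* cached copy of the MSI-X control word */
	uint16_t hw_msi_ctrl;      /* models the device register */
	uint16_t hw_msix_ctrl;     /* models the device register */
};

/* Models an error-recovery reset: device registers are lost, the cache is not. */
void mock_adapter_reset(struct mock_pci_dev *dev)
{
	dev->hw_msi_ctrl = 0;
	dev->hw_msix_ctrl = 0;
}

/* Assumption: nothing is written back when MSI was never enabled. */
static void mock_restore_msi(struct mock_pci_dev *dev)
{
	if (!dev->msi_enabled)
		return;
	dev->hw_msi_ctrl = dev->cached_msi_ctrl;
}

/* Assumption: nothing is written back when MSI-X was never enabled. */
static void mock_restore_msix(struct mock_pci_dev *dev)
{
	if (!dev->msix_enabled)
		return;
	dev->hw_msix_ctrl = dev->cached_msix_ctrl;
}

/* Mirrors the shape of the moved function: try both modes, MSI then MSI-X. */
void mock_pci_restore_msi_state(struct mock_pci_dev *dev)
{
	mock_restore_msi(dev);
	mock_restore_msix(dev);
}
```

A caller in this model would invoke `mock_adapter_reset()` and then `mock_pci_restore_msi_state()`. Only the mode that was enabled beforehand gets its register rewritten, which is why the public wrapper can call both helpers unconditionally.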


Re: [patch 29/39] PCI/MSI: Move pci_msi_enabled() to api.c

2022-11-16 Thread Bjorn Helgaas
On Fri, Nov 11, 2022 at 02:55:01PM +0100, Thomas Gleixner wrote:
> From: Ahmed S. Darwish 
> 
> To disentangle the maze in msi.c, all exported device-driver MSI APIs are
> now to be grouped in one file, api.c.
> 
> Move pci_msi_enabled() and make its kernel-doc comprehensive.
> 
> Signed-off-by: Ahmed S. Darwish 
> Signed-off-by: Thomas Gleixner 

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/msi/api.c | 12 
>  drivers/pci/msi/msi.c | 14 +-
>  drivers/pci/msi/msi.h |  3 +++
>  3 files changed, 16 insertions(+), 13 deletions(-)
> ---
> diff --git a/drivers/pci/msi/api.c b/drivers/pci/msi/api.c
> index 473df7ba0584..ee9ed5ccd94d 100644
> --- a/drivers/pci/msi/api.c
> +++ b/drivers/pci/msi/api.c
> @@ -307,3 +307,15 @@ void pci_free_irq_vectors(struct pci_dev *dev)
>   pci_disable_msi(dev);
>  }
>  EXPORT_SYMBOL(pci_free_irq_vectors);
> +
> +/**
> + * pci_msi_enabled() - Are MSI(-X) interrupts enabled system-wide?
> + *
> + * Return: true if MSI has not been globally disabled through ACPI FADT,
> + * PCI bridge quirks, or the "pci=nomsi" kernel command-line option.
> + */
> +int pci_msi_enabled(void)
> +{
> + return pci_msi_enable;
> +}
> +EXPORT_SYMBOL(pci_msi_enabled);
> diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
> index d78646d1c116..59c33bc7fe81 100644
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -13,7 +13,7 @@
>  #include "../pci.h"
>  #include "msi.h"
>  
> -static int pci_msi_enable = 1;
> +int pci_msi_enable = 1;
>  int pci_msi_ignore_mask;
>  
>  void pci_msi_update_mask(struct msi_desc *desc, u32 clear, u32 set)
> @@ -864,15 +864,3 @@ void pci_no_msi(void)
>  {
>   pci_msi_enable = 0;
>  }
> -
> -/**
> - * pci_msi_enabled - is MSI enabled?
> - *
> - * Returns true if MSI has not been disabled by the command-line option
> - * pci=nomsi.
> - **/
> -int pci_msi_enabled(void)
> -{
> - return pci_msi_enable;
> -}
> -EXPORT_SYMBOL(pci_msi_enabled);
> diff --git a/drivers/pci/msi/msi.h b/drivers/pci/msi/msi.h
> index 77e2587f7e4f..f3f4ede53171 100644
> --- a/drivers/pci/msi/msi.h
> +++ b/drivers/pci/msi/msi.h
> @@ -84,6 +84,9 @@ static inline __attribute_const__ u32 msi_multi_mask(struct msi_desc *desc)
>   return (1 << (1 << desc->pci.msi_attrib.multi_cap)) - 1;
>  }
>  
> +/* Subsystem variables */
> +extern int pci_msi_enable;
> +
>  /* MSI internal functions invoked from the public APIs */
>  void pci_msi_shutdown(struct pci_dev *dev);
>  void pci_msix_shutdown(struct pci_dev *dev);
> 


Re: [patch 27/39] PCI/MSI: Move pci_disable_msix() to api.c

2022-11-16 Thread Bjorn Helgaas
On Fri, Nov 11, 2022 at 02:54:58PM +0100, Thomas Gleixner wrote:
> From: Ahmed S. Darwish 
> 
> To disentangle the maze in msi.c, all exported device-driver MSI APIs are
> now to be grouped in one file, api.c.
> 
> Move pci_disable_msix() and make its kernel-doc comprehensive.
> 
> Signed-off-by: Ahmed S. Darwish 
> Signed-off-by: Thomas Gleixner 

Acked-by: Bjorn Helgaas 

Trivial question below.

> ---
>  drivers/pci/msi/api.c | 24 
>  drivers/pci/msi/msi.c | 14 +-
>  drivers/pci/msi/msi.h |  1 +
>  3 files changed, 26 insertions(+), 13 deletions(-)
> ---
> diff --git a/drivers/pci/msi/api.c b/drivers/pci/msi/api.c
> index 83ea38ffa116..653a61868ae6 100644
> --- a/drivers/pci/msi/api.c
> +++ b/drivers/pci/msi/api.c
> @@ -112,6 +112,30 @@ int pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries,
>  EXPORT_SYMBOL(pci_enable_msix_range);
>  
>  /**
> + * pci_disable_msix() - Disable MSI-X interrupt mode on device
> + * @dev: the PCI device to operate on
> + *
> + * Legacy device driver API to disable MSI-X interrupt mode on device,
> + * free earlier-allocated interrupt vectors, and restore INTx emulation.

Isn't INTx *emulation* a PCIe implementation detail?  Doesn't seem
relevant to callers that it's emulated.

> + * The PCI device Linux IRQ (@dev->irq) is restored to its default pin
> + * assertion IRQ. This is the cleanup pair of pci_enable_msix_range().
> + *
> + * NOTE: The newer pci_alloc_irq_vectors() / pci_free_irq_vectors() API
> + * pair should, in general, be used instead.
> + */
> +void pci_disable_msix(struct pci_dev *dev)
> +{
> + if (!pci_msi_enabled() || !dev || !dev->msix_enabled)
> + return;
> +
> + msi_lock_descs(&dev->dev);
> + pci_msix_shutdown(dev);
> + pci_free_msi_irqs(dev);
> + msi_unlock_descs(&dev->dev);
> +}
> +EXPORT_SYMBOL(pci_disable_msix);
> +
> +/**
>   * pci_alloc_irq_vectors() - Allocate multiple device interrupt vectors
>   * @dev:  the PCI device to operate on
>   * @min_vecs: minimum required number of vectors (must be >= 1)
> diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
> index 1226d66da992..6fa90d07d2e4 100644
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -736,7 +736,7 @@ static int __pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries,
>   return msix_capability_init(dev, entries, nvec, affd);
>  }
>  
> -static void pci_msix_shutdown(struct pci_dev *dev)
> +void pci_msix_shutdown(struct pci_dev *dev)
>  {
>   struct msi_desc *desc;
>  
> @@ -758,18 +758,6 @@ static void pci_msix_shutdown(struct pci_dev *dev)
>   pcibios_alloc_irq(dev);
>  }
>  
> -void pci_disable_msix(struct pci_dev *dev)
> -{
> - if (!pci_msi_enable || !dev || !dev->msix_enabled)
> - return;
> -
> - msi_lock_descs(&dev->dev);
> - pci_msix_shutdown(dev);
> - pci_free_msi_irqs(dev);
> - msi_unlock_descs(&dev->dev);
> -}
> -EXPORT_SYMBOL(pci_disable_msix);
> -
>  int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
>  struct irq_affinity *affd)
>  {
> diff --git a/drivers/pci/msi/msi.h b/drivers/pci/msi/msi.h
> index 8c4a5289432d..77e2587f7e4f 100644
> --- a/drivers/pci/msi/msi.h
> +++ b/drivers/pci/msi/msi.h
> @@ -86,6 +86,7 @@ static inline __attribute_const__ u32 msi_multi_mask(struct msi_desc *desc)
>  
>  /* MSI internal functions invoked from the public APIs */
>  void pci_msi_shutdown(struct pci_dev *dev);
> +void pci_msix_shutdown(struct pci_dev *dev);
>  void pci_free_msi_irqs(struct pci_dev *dev);
>  int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec, struct irq_affinity *affd);
>  int __pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries, int minvec,
> 
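
The guard-then-teardown order in pci_disable_msix() above (bail out early, take the descriptor lock, shut down MSI-X, free the vectors, drop the lock) can be sketched in userspace. This is a rough model under stated assumptions, not the kernel implementation: `struct sketch_dev`, `mock_msi_enable`, and all `mock_*` functions are hypothetical names made up for illustration, and the lock is reduced to a boolean flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the device; none of these are kernel types. */
struct sketch_dev {
	bool msix_enabled;  /* MSI-X mode currently active */
	bool descs_locked;  /* models the MSI descriptor lock being held */
	int  nr_vectors;    /* allocated interrupt vectors */
	int  irq;           /* current Linux IRQ */
	int  pin_irq;       /* default pin-assertion (INTx) IRQ */
};

/* Models the global MSI enable flag checked by the guard. */
int mock_msi_enable = 1;

/* Leaves MSI-X mode and puts the IRQ back to its pin-assertion default. */
static void mock_msix_shutdown(struct sketch_dev *dev)
{
	assert(dev->descs_locked); /* must run under the descriptor lock */
	dev->msix_enabled = false;
	dev->irq = dev->pin_irq;
}

/* Releases the vectors allocated when MSI-X was enabled. */
static void mock_free_msi_irqs(struct sketch_dev *dev)
{
	assert(dev->descs_locked); /* must run under the descriptor lock */
	dev->nr_vectors = 0;
}

/* Same shape as the moved function: guard, lock, shutdown, free, unlock. */
void mock_pci_disable_msix(struct sketch_dev *dev)
{
	if (!mock_msi_enable || !dev || !dev->msix_enabled)
		return;

	dev->descs_locked = true;
	mock_msix_shutdown(dev);
	mock_free_msi_irqs(dev);
	dev->descs_locked = false;
}
```

In this model the call is safe to repeat and safe with a NULL device, because the guard returns before anything is touched. From the caller's side the visible result is that the IRQ is back at its pin-assertion value, whether or not INTx is emulated underneath.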

