Re: [PATCH v6 4/5] PCI: vmd: Update type of the __iomem pointers

2020-11-30 Thread Bjorn Helgaas
On Mon, Nov 30, 2020 at 09:06:56AM +, David Laight wrote:
> From: Krzysztof Wilczynski
> > Sent: 29 November 2020 23:08
> > 
> > Use "void __iomem" instead of "char __iomem" pointer type when working with
> > the accessor functions (with names like readb() or writel(), etc.) to
> > better match a given accessor function signature where commonly the
> > address pointing to an I/O memory region would be a "void __iomem"
> > pointer.
> 
> ISTM that is heading in the wrong direction.
> 
> I think (from the variable names etc) that these are pointers
> to specific registers.
> 
> So what you ought to have is a type for that register block.
> Typically this is actually a structure - to give some type
> checking that the offsets are being used with the correct
> base address.

In this case, "cfgbar" is not really a pointer to a register; it's the
address of memory-mapped config space.  The VMD hardware turns
accesses to that space into PCI config transactions on its secondary
side.  xgene_pcie_get_cfg_base() and brcm_pcie_map_conf() are similar
situations and use "void *".

Bjorn


Re: [PATCH v5] PCI: Unify ECAM constants in native PCI Express drivers

2020-11-28 Thread Bjorn Helgaas
On Fri, Nov 27, 2020 at 10:46:26AM +, Krzysztof Wilczyński wrote:
> Unify ECAM-related constants into a single set of standard constants
> defining memory address shift values for the byte-level address that can
> be used when accessing the PCI Express Configuration Space, and then
> move native PCI Express controller drivers to use newly introduced
> definitions retiring any driver-specific ones.
> 
> The ECAM ("Enhanced Configuration Access Mechanism") is defined by the
> PCI Express specification (see PCI Express Base Specification, Revision
> 5.0, Version 1.0, Section 7.2.2, p. 676), thus most hardware should
> implement it the same way.  Most of the native PCI Express controller
> drivers define their ECAM-related constants, many of these could be
> shared, or use open-coded values when setting the .bus_shift field of
> the struct pci_ecam_ops.
> 
> All of the newly added constants should remove ambiguity and reduce the
> number of open-coded values, and also correlate more strongly with the
> descriptions in the aforementioned specification (see Table 7-1
> "Enhanced Configuration Address Mapping", p. 677).
> 
> There is no change to functionality.
> 
> Suggested-by: Bjorn Helgaas 
> Signed-off-by: Krzysztof Wilczyński 

Beautiful.  This should probably go via Lorenzo's tree, so he may have
comments, too.  Could apply this as-is; I had a few trivial notes
below.

It's ironic that we don't use PCIE_ECAM_OFFSET in drivers/pci/ecam.c.
We could do something like this, which would also let us drop
.bus_shift completely in all the conforming implementations.  It also
closes the hole that we didn't limit "where" to 4K for
pci_ecam_map_bus() users.

  if (per_bus_mapping) {
    base = cfg->winp[busn];
    busn = 0;
  } else {
    base = cfg->win;
  }

  if (cfg->ops->bus_shift) {
    u32 bus_offset = (busn & 0xff) << cfg->ops->bus_shift;
    u32 devfn_offset = (devfn & 0xff) << (cfg->ops->bus_shift - 8);

    where &= 0xfff;

    return base + (bus_offset | devfn_offset | where);
  }

  return base + PCIE_ECAM_OFFSET(busn, devfn, where);
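
For readers following along, a minimal user-space sketch of the two
paths in the snippet above.  It assumes the standard ECAM layout (bus
in bits 27:20, devfn in bits 19:12, register offset in bits 11:0).
ecam_offset() and shifted_offset() are hypothetical stand-ins, not the
kernel's PCIE_ECAM_OFFSET() macro or pci_ecam_map_bus().

```c
#include <assert.h>
#include <stdint.h>

/* Standard ECAM layout; these names are illustrative, not the kernel's. */
#define ECAM_BUS_SHIFT   20u
#define ECAM_DEVFN_SHIFT 12u
#define ECAM_REG_MASK    0xfffu

/* Conforming case: fixed shifts, "where" limited to 4K. */
static uint32_t ecam_offset(uint32_t busn, uint32_t devfn, uint32_t where)
{
	return ((busn & 0xffu) << ECAM_BUS_SHIFT) |
	       ((devfn & 0xffu) << ECAM_DEVFN_SHIFT) |
	       (where & ECAM_REG_MASK);
}

/* Non-conforming case: the driver supplies its own bus shift. */
static uint32_t shifted_offset(uint32_t bus_shift, uint32_t busn,
			       uint32_t devfn, uint32_t where)
{
	/*
	 * devfn sits 8 bits below the bus field.  Below 20 it would
	 * overlap the 12-bit register offset; above 24 the bus field
	 * would not fit in 32 bits.
	 */
	assert(bus_shift >= 20 && bus_shift <= 24);

	return ((busn & 0xffu) << bus_shift) |
	       ((devfn & 0xffu) << (bus_shift - 8)) |
	       (where & ECAM_REG_MASK);
}
```

With bus_shift == 20 the second helper computes the same offset as the
first, which is the reason conforming implementations would not need
.bus_shift at all.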

Reviewed-by: Bjorn Helgaas 

>  static void __iomem *ppc4xx_pciex_get_config_base(struct ppc4xx_pciex_port *port,
> struct pci_bus *bus,
> -   unsigned int devfn)
> +   unsigned int devfn,
> +   int offset)

The interface change (to add "offset") could be a preparatory patch by
itself.

But I'm actually not sure it's worth even touching this file.  This is
the only place outside drivers/pci that includes linux/pci-ecam.h.  I
think I might rather put PCIE_ECAM_OFFSET() and related things in
drivers/pci/pci.h and keep it all inside drivers/pci.

>  static const struct pci_ecam_ops pci_thunder_pem_ops = {
> - .bus_shift  = 24,
> + .bus_shift  = THUNDER_PCIE_ECAM_BUS_SHIFT,
>   .init   = thunder_pem_platform_init,
>   .pci_ops= {
>   .map_bus= pci_ecam_map_bus,

This could be split to its own patch, no big deal either way.

>  const struct pci_ecam_ops xgene_v2_pcie_ecam_ops = {
> - .bus_shift  = 16,
>   .init   = xgene_v2_pcie_ecam_init,
>   .pci_ops= {
>   .map_bus= xgene_pcie_map_bus,

Thanks for mentioning this change in the cover letter.  It could also
be split off to a preparatory patch, since it's not related to
PCIE_ECAM_OFFSET(), which is the main point of this patch.

>  static void __iomem *iproc_pcie_map_ep_cfg_reg(struct iproc_pcie *pcie,
>  unsigned int busno,
> -unsigned int slot,
> -unsigned int fn,
> +unsigned int devfn,

This interface change *could* be a separate preparatory patch, too,
but I'm starting to feel even more OCD than usual :)
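
As background for the slot/fn to devfn change, a small sketch of how
the two forms relate.  The encoding (5-bit device number, 3-bit
function number) is the standard one; the helper names are made up for
this example and mirror what the kernel's PCI_DEVFN(), PCI_SLOT() and
PCI_FUNC() macros compute.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a device (slot) number and function number into one byte. */
static uint8_t devfn_pack(uint8_t slot, uint8_t fn)
{
	assert(slot < 32 && fn < 8);
	return (uint8_t)((slot << 3) | fn);
}

/* Upper five bits: device (slot) number. */
static uint8_t devfn_slot(uint8_t devfn)
{
	return (devfn >> 3) & 0x1f;
}

/* Lower three bits: function number. */
static uint8_t devfn_func(uint8_t devfn)
{
	return devfn & 0x07;
}
```

Passing the packed value means a callee that only needs the ECAM offset
never has to split and recombine the two fields.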

> @@ -94,7 +95,7 @@ struct vmd_dev {
>   struct pci_dev  *dev;
>  
>   spinlock_t  cfg_lock;
> - char __iomem*cfgbar;
> + void __iomem*cfgbar;

This type change might be worth pushing to a separate patch since the
casting issues are not completely trivial.


Re: [PATCH v4] PCI: Unify ECAM constants in native PCI Express drivers

2020-11-20 Thread Bjorn Helgaas
On Mon, Oct 05, 2020 at 12:38:05AM +, Krzysztof Wilczyński wrote:
> Unify ECAM-related constants into a single set of standard constants
> defining memory address shift values for the byte-level address that can
> be used when accessing the PCI Express Configuration Space, and then
> move native PCI Express controller drivers to use newly introduced
> definitions retiring any driver-specific ones.
> 
> The ECAM ("Enhanced Configuration Access Mechanism") is defined by the
> PCI Express specification (see PCI Express Base Specification, Revision
> 5.0, Version 1.0, Section 7.2.2, p. 676), thus most hardware should
> implement it the same way.  Most of the native PCI Express controller
> drivers define their ECAM-related constants, many of these could be
> shared, or use open-coded values when setting the .bus_shift field of
> the struct pci_ecam_ops.
> 
> All of the newly added constants should remove ambiguity and reduce the
> number of open-coded values, and also correlate more strongly with the
> descriptions in the aforementioned specification (see Table 7-1
> "Enhanced Configuration Address Mapping", p. 677).
> 
> There is no change to functionality.
> 
> Suggested-by: Bjorn Helgaas 
> Signed-off-by: Krzysztof Wilczyński 

I think this is a nice cleanup.  PCIE_ECAM_DEV_SHIFT is unused, so I'd
probably remove it and maybe rename PCIE_ECAM_FUN_SHIFT to
PCIE_ECAM_DEVFN_SHIFT or similar.

I assume this would best go through Lorenzo's tree.

> ---
> Changed in v4:
>   Removed constants related to "CAM".
>   Added more platforms and devices that can use new ECAM macros and
>   constants.
>   Removed unused ".bus_shift" initialisers from pci-xgene.c as
>   xgene_pcie_map_bus() did not use these.
> 
> Changes in v3:
>   Updated commit message wording.
>   Updated regarding custom ECAM bus shift values and concerning PCI base
>   configuration space access for Type 1 access.
>   Refactored rockchip_pcie_rd_other_conf() and rockchip_pcie_wr_other_conf()
>   and removed the "busdev" variable.
>   Removed surplus "relbus" variable from nwl_pcie_map_bus() and
>   xilinx_pcie_map_bus().
>   Renamed the PCIE_ECAM_ADDR() macro to PCIE_ECAM_OFFSET().
> 
> Changes in v2:
>   Use PCIE_ECAM_ADDR macro when computing ECAM address offset, but drop
>   PCI_SLOT and PCI_FUNC macros from the PCIE_ECAM_ADDR macro in favour
>   of using a single value for the device/function.
> 
>  arch/powerpc/platforms/4xx/pci.c|  7 +++--
>  drivers/pci/controller/dwc/pcie-al.c|  8 +++---
>  drivers/pci/controller/dwc/pcie-hisi.c  |  4 +--
>  drivers/pci/controller/pci-aardvark.c   | 13 +++--
>  drivers/pci/controller/pci-host-generic.c   |  2 +-
>  drivers/pci/controller/pci-thunder-ecam.c   |  2 +-
>  drivers/pci/controller/pci-thunder-pem.c| 13 +++--
>  drivers/pci/controller/pci-xgene.c  |  2 --
>  drivers/pci/controller/pcie-brcmstb.c   | 16 ++--
>  drivers/pci/controller/pcie-iproc.c | 29 ++---
>  drivers/pci/controller/pcie-rockchip-host.c | 27 +--
>  drivers/pci/controller/pcie-rockchip.h  |  8 +-
>  drivers/pci/controller/pcie-tango.c |  2 +-
>  drivers/pci/controller/pcie-xilinx-nwl.c|  9 ++-
>  drivers/pci/controller/pcie-xilinx.c| 11 ++--
>  drivers/pci/controller/vmd.c|  5 ++--
>  drivers/pci/ecam.c  |  4 +--
>  include/linux/pci-ecam.h| 24 +
>  18 files changed, 82 insertions(+), 104 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/4xx/pci.c 
> b/arch/powerpc/platforms/4xx/pci.c
> index c13d64c3b019..cee40e0b061c 100644
> --- a/arch/powerpc/platforms/4xx/pci.c
> +++ b/arch/powerpc/platforms/4xx/pci.c
> @@ -20,6 +20,7 @@
>  
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -1585,17 +1586,15 @@ static void __iomem 
> *ppc4xx_pciex_get_config_base(struct ppc4xx_pciex_port *port
> struct pci_bus *bus,
> unsigned int devfn)
>  {
> - int relbus;
> -
>   /* Remove the casts when we finally remove the stupid volatile
>* in struct pci_controller
>*/
>   if (bus->number == port->hose->first_busno)
>   return (void __iomem *)port->hose->cfg_addr;
>  
> - relbus = bus->number - (port->hose->first_busno + 1);
>   return (void __iomem *)port->hose->cfg_data +
> - ((relbus  << 20) | (devfn << 12));
> + PCIE_ECAM_BUS(bus->number - (port->hos

Re: [PATCH] rpadlpar_io:Add MODULE_DESCRIPTION entries to kernel modules

2020-09-25 Thread Bjorn Helgaas
On Thu, Sep 24, 2020 at 04:41:39PM +1000, Oliver O'Halloran wrote:
> On Thu, Sep 24, 2020 at 3:15 PM Mamatha Inamdar
>  wrote:
> >
> > This patch adds a brief MODULE_DESCRIPTION to rpadlpar_io kernel modules
> > (descriptions taken from Kconfig file)
> >
> > Signed-off-by: Mamatha Inamdar 
> > ---
> >  drivers/pci/hotplug/rpadlpar_core.c |1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/pci/hotplug/rpadlpar_core.c 
> > b/drivers/pci/hotplug/rpadlpar_core.c
> > index f979b70..bac65ed 100644
> > --- a/drivers/pci/hotplug/rpadlpar_core.c
> > +++ b/drivers/pci/hotplug/rpadlpar_core.c
> > @@ -478,3 +478,4 @@ static void __exit rpadlpar_io_exit(void)
> >  module_init(rpadlpar_io_init);
> >  module_exit(rpadlpar_io_exit);
> >  MODULE_LICENSE("GPL");
> > +MODULE_DESCRIPTION("RPA Dynamic Logical Partitioning driver for I/O slots");
> 
> RPA as a spec was superseded by PAPR in the early 2000s. Can we rename
> this already?
> 
> The only potential problem I can see is scripts doing: modprobe
> rpadlpar_io or similar
> 
> However, we should be able to fix that with a module alias.

Is MODULE_DESCRIPTION() connected with how modprobe works?

If this patch just improves documentation, without breaking users of
modprobe, I'm fine with it, even if it would be nice to rename to PAPR
or something in the future.

But, please use "git log --oneline drivers/pci/hotplug/rpadlpar*" and
match the style, and also look through the rest of drivers/pci/ to see
if we should do the same thing to any other modules.

Bjorn


Re: [PATCH v6 04/11] PCI: designware-ep: Modify MSI and MSIX CAP way of finding

2020-09-23 Thread Bjorn Helgaas
s/MSIX/MSI-X/ (subject and below)

On Sat, Mar 14, 2020 at 11:30:31AM +0800, Xiaowei Bao wrote:
> Each PF of EP device should have it's own MSI or MSIX capabitily
> struct, so create a dw_pcie_ep_func struct and remove the msi_cap
> and msix_cap to this struct from dw_pcie_ep, and manage the PFs
> with a list.

s/capabitily/capability/

I know Lorenzo has already applied this, but I'm noting it for the
future, or in case there are other reasons to update this patch.

There are a bunch of unnecessary initializations below for future
cleanup.

> Signed-off-by: Xiaowei Bao 
> ---
> v3:
>  - This is a new patch, to fix the issue of MSI and MSIX CAP way of
>finding.
> v4:
>  - Correct some word of commit message.
> v5:
>  - No change.
> v6:
>  - Fix up the compile error.
> 
>  drivers/pci/controller/dwc/pcie-designware-ep.c | 135 
> +---
>  drivers/pci/controller/dwc/pcie-designware.h|  18 +++-
>  2 files changed, 134 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c 
> b/drivers/pci/controller/dwc/pcie-designware-ep.c
> index 933bb89..fb915f2 100644
> --- a/drivers/pci/controller/dwc/pcie-designware-ep.c
> +++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
> @@ -19,6 +19,19 @@ void dw_pcie_ep_linkup(struct dw_pcie_ep *ep)
>   pci_epc_linkup(epc);
>  }
>  
> +struct dw_pcie_ep_func *
> +dw_pcie_ep_get_func_from_ep(struct dw_pcie_ep *ep, u8 func_no)
> +{
> + struct dw_pcie_ep_func *ep_func;
> +
> + list_for_each_entry(ep_func, &ep->func_list, list) {
> + if (ep_func->func_no == func_no)
> + return ep_func;
> + }
> +
> + return NULL;
> +}
> +
>  static unsigned int dw_pcie_ep_func_select(struct dw_pcie_ep *ep, u8 func_no)
>  {
>   unsigned int func_offset = 0;
> @@ -59,6 +72,47 @@ void dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum 
> pci_barno bar)
>   __dw_pcie_ep_reset_bar(pci, func_no, bar, 0);
>  }
>  
> +static u8 __dw_pcie_ep_find_next_cap(struct dw_pcie_ep *ep, u8 func_no,
> + u8 cap_ptr, u8 cap)
> +{
> + struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> + unsigned int func_offset = 0;

Unnecessary initialization.

> + u8 cap_id, next_cap_ptr;
> + u16 reg;
> +
> + if (!cap_ptr)
> + return 0;
> +
> + func_offset = dw_pcie_ep_func_select(ep, func_no);
> +
> + reg = dw_pcie_readw_dbi(pci, func_offset + cap_ptr);
> + cap_id = (reg & 0x00ff);
> +
> + if (cap_id > PCI_CAP_ID_MAX)
> + return 0;
> +
> + if (cap_id == cap)
> + return cap_ptr;
> +
> + next_cap_ptr = (reg & 0xff00) >> 8;
> + return __dw_pcie_ep_find_next_cap(ep, func_no, next_cap_ptr, cap);
> +}
> +
> +static u8 dw_pcie_ep_find_capability(struct dw_pcie_ep *ep, u8 func_no, u8 
> cap)
> +{
> + struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> + unsigned int func_offset = 0;

Unnecessary initialization.

> + u8 next_cap_ptr;
> + u16 reg;
> +
> + func_offset = dw_pcie_ep_func_select(ep, func_no);
> +
> + reg = dw_pcie_readw_dbi(pci, func_offset + PCI_CAPABILITY_LIST);
> + next_cap_ptr = (reg & 0x00ff);
> +
> + return __dw_pcie_ep_find_next_cap(ep, func_no, next_cap_ptr, cap);
> +}
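
The recursive walk above can also be written as a loop.  The following
is a user-space sketch over a plain byte array standing in for one
function's config space.  find_capability() and the hop limit are
assumptions made for this example, not the driver's code; the offsets
(capability pointer at 0x34, ID in the low byte, next pointer in the
following byte) follow the standard PCI layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CAP_LIST_PTR  0x34	/* same value as PCI_CAPABILITY_LIST */
#define MAX_HOPS      48	/* arbitrary bound so a looping list terminates */

/* "cfg" must point at 256 bytes of (simulated) config space. */
static uint8_t find_capability(const uint8_t *cfg, uint8_t cap)
{
	uint8_t ptr;
	int hops = MAX_HOPS;

	assert(cfg != NULL);
	ptr = cfg[CAP_LIST_PTR] & 0xfc;	/* low two bits are reserved */

	while (ptr && hops--) {
		uint8_t id = cfg[ptr];

		if (id == 0xff)		/* no device, or garbage */
			break;
		if (id == cap)
			return ptr;	/* offset of the matching capability */
		ptr = cfg[ptr + 1] & 0xfc;
	}

	return 0;			/* not found */
}
```

The hop limit is one way to guard against a malformed, circular list;
the quoted patch bounds the walk differently (by rejecting IDs above
PCI_CAP_ID_MAX).
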
> +
>  static int dw_pcie_ep_write_header(struct pci_epc *epc, u8 func_no,
>  struct pci_epf_header *hdr)
>  {
> @@ -246,13 +300,18 @@ static int dw_pcie_ep_get_msi(struct pci_epc *epc, u8 
> func_no)
>   struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
>   u32 val, reg;
>   unsigned int func_offset = 0;

Unnecessary initialization (not from your patch).

> + struct dw_pcie_ep_func *ep_func;
>  
> - if (!ep->msi_cap)
> + ep_func = dw_pcie_ep_get_func_from_ep(ep, func_no);
> + if (!ep_func)
> + return -EINVAL;
> +
> + if (!ep_func->msi_cap)
>   return -EINVAL;
>  
>   func_offset = dw_pcie_ep_func_select(ep, func_no);
>  
> - reg = ep->msi_cap + func_offset + PCI_MSI_FLAGS;
> + reg = ep_func->msi_cap + func_offset + PCI_MSI_FLAGS;
>   val = dw_pcie_readw_dbi(pci, reg);
>   if (!(val & PCI_MSI_FLAGS_ENABLE))
>   return -EINVAL;
> @@ -268,13 +327,18 @@ static int dw_pcie_ep_set_msi(struct pci_epc *epc, u8 
> func_no, u8 interrupts)
>   struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
>   u32 val, reg;
>   unsigned int func_offset = 0;

Unnecessary initialization (not from your patch).

> + struct dw_pcie_ep_func *ep_func;
> +
> + ep_func = dw_pcie_ep_get_func_from_ep(ep, func_no);
> + if (!ep_func)
> + return -EINVAL;
>  
> - if (!ep->msi_cap)
> + if (!ep_func->msi_cap)
>   return -EINVAL;
>  
>   func_offset = dw_pcie_ep_func_select(ep, func_no);
>  
> - reg = ep->msi_cap + func_offset + PCI_MSI_FLAGS;
> + reg = ep_func->msi_cap + func_offset + PCI_MSI_FLAGS;
>   val = dw_pcie_readw_dbi(pci, reg);
>   val &= ~PCI_MSI_FLAGS_QMASK;
>   val 

Re: [PATCH -next] PCI: rpadlpar: use for_each_child_of_node() and for_each_node_by_name

2020-09-17 Thread Bjorn Helgaas
On Wed, Sep 16, 2020 at 02:21:28PM +0800, Qinglang Miao wrote:
> Use for_each_child_of_node() and for_each_node_by_name macro
> instead of open coding it.
> 
> Signed-off-by: Qinglang Miao 

Applied to pci/hotplug for v5.10, thanks!
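
The pattern the patch relies on can be sketched in user space as
follows.  This is a simplified model: struct node, next_child() and
for_each_child() are hypothetical stand-ins for struct device_node,
of_get_next_child() and for_each_child_of_node(), and the reference
counting done by the real helpers is left out.

```c
#include <assert.h>
#include <stddef.h>

struct node {
	struct node *first_child;
	struct node *next_sibling;
	int id;
};

/* Return the child after "prev", or the first child when prev is NULL. */
static struct node *next_child(const struct node *parent, struct node *prev)
{
	assert(parent != NULL);
	return prev ? prev->next_sibling : parent->first_child;
}

/* The macro seeds the iterator itself, so callers need no "= NULL". */
#define for_each_child(parent, child) \
	for ((child) = next_child((parent), NULL); (child) != NULL; \
	     (child) = next_child((parent), (child)))

/* Find a child by id; as in the patched code, "break" leaves n at the match. */
static struct node *find_child(const struct node *parent, int id)
{
	struct node *n;

	for_each_child(parent, n) {
		if (n->id == id)
			break;
	}

	return n;	/* NULL when the loop ran to completion */
}
```

Because the macro assigns the iterator before first use, dropping the
"= NULL" initializers in the patch does not leave dn or np
uninitialized.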

> ---
>  drivers/pci/hotplug/rpadlpar_core.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/rpadlpar_core.c 
> b/drivers/pci/hotplug/rpadlpar_core.c
> index f979b7098..0a3c80ba6 100644
> --- a/drivers/pci/hotplug/rpadlpar_core.c
> +++ b/drivers/pci/hotplug/rpadlpar_core.c
> @@ -40,13 +40,13 @@ static DEFINE_MUTEX(rpadlpar_mutex);
>  static struct device_node *find_vio_slot_node(char *drc_name)
>  {
>   struct device_node *parent = of_find_node_by_name(NULL, "vdevice");
> - struct device_node *dn = NULL;
> + struct device_node *dn;
>   int rc;
>  
>   if (!parent)
>   return NULL;
>  
> - while ((dn = of_get_next_child(parent, dn))) {
> + for_each_child_of_node(parent, dn) {
>   rc = rpaphp_check_drc_props(dn, drc_name, NULL);
>   if (rc == 0)
>   break;
> @@ -60,10 +60,10 @@ static struct device_node *find_vio_slot_node(char 
> *drc_name)
>  static struct device_node *find_php_slot_pci_node(char *drc_name,
> char *drc_type)
>  {
> - struct device_node *np = NULL;
> + struct device_node *np;
>   int rc;
>  
> - while ((np = of_find_node_by_name(np, "pci"))) {
> + for_each_node_by_name(np, "pci") {
>   rc = rpaphp_check_drc_props(np, drc_name, drc_type);
>   if (rc == 0)
>   break;
> -- 
> 2.23.0
> 


Re: [PATCH -next] PCI: rpadlpar: Make some functions static

2020-07-30 Thread Bjorn Helgaas
On Tue, Jul 21, 2020 at 11:17:35PM +0800, Wei Yongjun wrote:
> The sparse tool reports build warnings as follows:
> 
> drivers/pci/hotplug/rpadlpar_core.c:355:5: warning:
>  symbol 'dlpar_remove_pci_slot' was not declared. Should it be static?
> drivers/pci/hotplug/rpadlpar_core.c:461:12: warning:
>  symbol 'rpadlpar_io_init' was not declared. Should it be static?
> drivers/pci/hotplug/rpadlpar_core.c:473:6: warning:
>  symbol 'rpadlpar_io_exit' was not declared. Should it be static?
> 
> Those functions are not used outside of this file, so mark them
> static.
> Also mark rpadlpar_io_exit() as __exit.
> 
> Reported-by: Hulk Robot 
> Signed-off-by: Wei Yongjun 

Applied to pci/hotplug for v5.9, thanks!

> ---
>  drivers/pci/hotplug/rpadlpar_core.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/rpadlpar_core.c 
> b/drivers/pci/hotplug/rpadlpar_core.c
> index c5eb509c72f0..f979b7098acf 100644
> --- a/drivers/pci/hotplug/rpadlpar_core.c
> +++ b/drivers/pci/hotplug/rpadlpar_core.c
> @@ -352,7 +352,7 @@ static int dlpar_remove_vio_slot(char *drc_name, struct 
> device_node *dn)
>   * -ENODEV   Not a valid drc_name
>   * -EIO  Internal PCI Error
>   */
> -int dlpar_remove_pci_slot(char *drc_name, struct device_node *dn)
> +static int dlpar_remove_pci_slot(char *drc_name, struct device_node *dn)
>  {
>   struct pci_bus *bus;
>   struct slot *slot;
> @@ -458,7 +458,7 @@ static inline int is_dlpar_capable(void)
>   return (int) (rc != RTAS_UNKNOWN_SERVICE);
>  }
>  
> -int __init rpadlpar_io_init(void)
> +static int __init rpadlpar_io_init(void)
>  {
>  
>   if (!is_dlpar_capable()) {
> @@ -470,7 +470,7 @@ int __init rpadlpar_io_init(void)
>   return dlpar_sysfs_init();
>  }
>  
> -void rpadlpar_io_exit(void)
> +static void __exit rpadlpar_io_exit(void)
>  {
>   dlpar_sysfs_exit();
>  }
> 


Re: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86

2020-07-15 Thread Bjorn Helgaas
On Wed, Jul 15, 2020 at 02:38:29PM +, David Laight wrote:
> From: Oliver O'Halloran
> > Sent: 15 July 2020 05:19
> > 
> > On Wed, Jul 15, 2020 at 8:03 AM Arnd Bergmann  wrote:
> ...
> > > - config space accesses are very rare compared to memory
> > >   space access and on the hardware side the error handling
> > >   would be similar, but readl/writel don't return errors, they just
> > >   access wrong registers or return 0x.
> > >   arch/powerpc/kernel/eeh.c has a ton extra code written to
> > >   deal with it, but no other architectures do.
> > 
> > TBH the EEH MMIO hooks were probably a mistake to begin with. Errors
> > detected via MMIO are almost always asynchronous to the error itself
> > so you usually just wind up with a misleading stack trace rather than
> > any kind of useful synchronous error reporting. It seems like most
> > drivers don't bother checking for 0xFFs either and rely on the
> > asynchronous reporting via .error_detected() instead, so I have to
> > wonder what the point is. I've been thinking of removing the MMIO
> > hooks and using a background poller to check for errors on each PHB
> > periodically (assuming we don't have an EEH interrupt) instead. That
> > would remove the requirement for eeh_dev_check_failure() to be
> > interrupt safe too, so it might even let us fix all the godawful races
> > in EEH.
> 
> I've 'played' with PCIe error handling - without much success.
> What might be useful is for a driver that has just read ~0u to
> be able to ask 'has there been an error signalled for this device?'.

In many cases a driver will know that ~0 is not a valid value for the
register it's reading.  But if ~0 *could* be valid, an interface like
you suggest could be useful.  I don't think we have anything like that
today, but maybe we could.  It would certainly be nice if the PCI core
noticed, logged, and cleared errors.  We have some of that for AER,
but that's an optional feature, and support for the error bits in the
garden-variety PCI_STATUS register is pretty haphazard.  As you note
below, this sort of SERR/PERR reporting is frequently hard-wired in
ways that take it out of our purview.

Bjorn


Re: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86

2020-07-15 Thread Bjorn Helgaas
On Wed, Jul 15, 2020 at 02:24:21PM +, David Laight wrote:
> From: Arnd Bergmann
> > Sent: 15 July 2020 07:47
> > On Wed, Jul 15, 2020 at 1:46 AM Bjorn Helgaas  wrote:
> > 
> >  So something like:
> > >
> > >   void pci_read_config_word(struct pci_dev *dev, int where, u16 *val)
> > >
> > > and where we used to return anything non-zero, we just set *val = ~0
> > > instead?  I think we do that already in most, maybe all, cases.
> > 
> > Right, this is what I had in mind. If we start by removing the handling
> > of the return code in all files that clearly don't need it, looking at
> > whatever remains will give a much better idea of what a good interface
> > should be.
> 
> It would be best to get rid of that nasty 'u16 *' parameter.

Do you mean nasty because it's basically a return value, but not
returned as the *function's* return value?  I agree that if we were
starting from scratch it would be nicer to have:

  u16 pci_read_config_word(struct pci_dev *dev, int where)

but I don't think it's worth changing the thousands of callers just
for that.

> Make the return int and return the read value or -1 on error.
> (Or maybe 0x on error??)
> 
> For a 32bit read (there must be one for the BARs) returning
> a 64bit signed integer would work even for 32bit systems.
> 
> If code cares about the error, and it can be detected then
> it can check. Otherwise it all 'just works'.

There are u8 (byte), u16 (word), and u32 (dword) config reads &
writes.  But I don't think it really helps to return something wider
than the access.  For programmatic errors like invalid alignment, we
could indeed use the extra bits to return an unambiguous error.  But
we still have the "device was unplugged" sort of errors where the
*hardware* typically returns ~0 and the config accessor doesn't know
whether that's valid data or an error.

Bjorn


Re: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86

2020-07-14 Thread Bjorn Helgaas
[+cc Kjetil]

On Wed, Jul 15, 2020 at 12:01:56AM +0200, Arnd Bergmann wrote:
> On Tue, Jul 14, 2020 at 8:45 PM Bjorn Helgaas  wrote:
> > On Mon, Jul 13, 2020 at 05:08:10PM +0200, Arnd Bergmann wrote:
> > > On Mon, Jul 13, 2020 at 3:22 PM Saheed O. Bolarinwa
> > > Starting with a), my first question is whether any high-level
> > > drivers even need to care about errors from these functions. I see
> > > 4913 callers that ignore the return code, and 576 that actually
> > > check it, and almost none care about the specific error (as you
> > > found as well). Unless we conclude that most PCI drivers are wrong,
> > > could we just change the return type to 'void' and assume they never
> > > fail for valid arguments on a valid pci_device* ?
> >
> > I really like this idea.
> >
> > pci_write_config_*() has one return value, and only 100ish of 2500
> > callers check for errors.  It's sometimes possible for config
> > accessors to detect PCI errors and return failure, e.g., device was
> > removed or didn't respond, but most of them don't, and detecting these
> > errors is not really that valuable.
> >
> > pci_read_config_*() is much more interesting because it returns two
> > things, the function return value and the value read from the PCI
> > device, and it's complicated to check both.
> >
> > Again it's sometimes possible for config read accessors to detect PCI
> > errors, but in most cases a PCI error means the accessor returns
> > success and the value from PCI is ~0.
> >
> > Checking the function return value catches programming errors (bad
> > alignment, etc) but misses most of the interesting errors (device was
> > unplugged or reported a PCI error).
> 
> My thinking was more that most of the time the error checking may
> be completely bogus to start with, and I would just not check for
> errors at all.

Yes.  I have no problem with that.  There are a few cases where it's
important to check for errors, e.g., we read a status register and do
something based on a bit being set.  A failure will return all bits
set, and we may do the wrong thing.  But most of the errors we care
about will be on MMIO reads, not config reads, so we can probably
ignore most config read errors.

> > Checking the value returned from PCI is tricky because ~0 is a valid
> > value for some config registers, and only the driver knows for sure.
> > If the driver knows that ~0 is a possible value, it would have to do
> > something else, e.g., another config read of a register that *cannot*
> > be ~0, to see whether it's really an error.
> >
> > I suspect that if we had a single value to look at it would be easier
> > to get right.  Error checking with current interface would look like
> > this:
> >
> > >   err = pci_read_config_word(dev, addr, &val);
> > >   if (err)
> > >     return -EINVAL;
> > >
> > >   if (PCI_POSSIBLE_ERROR(val)) {
> > >     /* if driver knows ~0 is invalid */
> > >     return -EINVAL;
> > >
> > >     /* if ~0 is potentially a valid value */
> > >     err = pci_read_config_word(dev, PCI_VENDOR_ID, &val2);
> > >     if (err)
> > >       return -EINVAL;
> > >
> > >     if (PCI_POSSIBLE_ERROR(val2))
> > >       return -EINVAL;
> > >   }
> >
> > Error checking with a possible interface that returned only a single
> > value could look like this:
> >
> > >   val = pci_config_read_word(dev, addr);
> > >   if (PCI_POSSIBLE_ERROR(val)) {
> > >     /* if driver knows ~0 is invalid */
> > >     return -EINVAL;
> > >
> > >     /* if ~0 is potentially a valid value */
> > >     val2 = pci_config_read_word(dev, PCI_VENDOR_ID);
> > >     if (PCI_POSSIBLE_ERROR(val2))
> > >       return -EINVAL;
> > >   }
> >
> > Am I understanding you correctly?
> 
> That would require changing all callers of the function, which
> I think would involve changing some 700 files. 

Yeah, that would be a disaster.  So something like:

  void pci_read_config_word(struct pci_dev *dev, int where, u16 *val)

and where we used to return anything non-zero, we just set *val = ~0
instead?  I think we do that already in most, maybe all, cases.

> What I was suggesting was to only change the return type to void and
> categorize all drivers that today check it as either
> 
> a) checking the return code is not helpful, or possibly even
> wrong, so we just stop doing it. I expect those to be the
> vast majority of callers, but that could be wrong.
> 
> b) Code that legitimately checks the error code and needs to
>take an appropriate action. These could be changed to
>calling a different interface such as 'pci_bus_r

Re: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86

2020-07-14 Thread Bjorn Helgaas
[trimmed the cc list; it's still too large but maybe arch folks care]

On Mon, Jul 13, 2020 at 05:08:10PM +0200, Arnd Bergmann wrote:
> On Mon, Jul 13, 2020 at 3:22 PM Saheed O. Bolarinwa
>  wrote:
> > The goal of this series is to move the definition of *all*
> > PCIBIOS* from include/linux/pci.h to arch/x86 and limit their use
> > to there.  All other tree-specific definitions will be left
> > intact. Maybe they can be renamed.
> >
> > PCIBIOS* is an x86 concept as defined by the PCI spec. The
> > returned error codes of PCIBIOS* are positive values and this
> > introduces some complexities which other archs need not incur.
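
For context, the positive codes in question and one way to translate
them into ordinary negative errno values.  The constants are the ones
defined in include/linux/pci.h.  The mapping is modeled on the kernel's
pcibios_err_to_errno() helper and should be checked against the tree;
err_to_errno() below is a sketch, not a copy.

```c
#include <assert.h>
#include <errno.h>

#define PCIBIOS_SUCCESSFUL		0x00
#define PCIBIOS_FUNC_NOT_SUPPORTED	0x81
#define PCIBIOS_BAD_VENDOR_ID		0x83
#define PCIBIOS_DEVICE_NOT_FOUND	0x86
#define PCIBIOS_BAD_REGISTER_NUMBER	0x87
#define PCIBIOS_SET_FAILED		0x88
#define PCIBIOS_BUFFER_TOO_SMALL	0x89

/* Translate a positive PCIBIOS_* code into a negative errno. */
static int err_to_errno(int err)
{
	int ret;

	if (err <= PCIBIOS_SUCCESSFUL)
		return err;	/* zero, or assumed to be an errno already */

	switch (err) {
	case PCIBIOS_FUNC_NOT_SUPPORTED:
		ret = -ENOENT;
		break;
	case PCIBIOS_BAD_VENDOR_ID:
		ret = -ENOTTY;
		break;
	case PCIBIOS_DEVICE_NOT_FOUND:
		ret = -ENODEV;
		break;
	case PCIBIOS_BAD_REGISTER_NUMBER:
		ret = -EFAULT;
		break;
	case PCIBIOS_SET_FAILED:
		ret = -EIO;
		break;
	case PCIBIOS_BUFFER_TOO_SMALL:
		ret = -ENOSPC;
		break;
	default:
		ret = -ERANGE;	/* unknown positive code */
		break;
	}

	assert(ret < 0);	/* a positive code must never leak out */
	return ret;
}
```

Callers that test "if (err < 0)" silently miss the positive codes,
which is the complexity the cover letter refers to.
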
> 
> I think the intention is good, but I find the series in its current
> form very hard to review, in particular the way you touch some
> functions three times with trivial changes. Instead of
> 
> 1) replace PCIBIOS_SUCCESSFUL with 0
> 2) drop pointless 0-comparison
> 3) reformat whitespace
> 
> I would suggest to combine the first two steps into one patch per
> subsystem and drop the third step.

I agree.  BUT please don't just run out and post new patches to do
this.  Let's talk about Arnd's further ideas below first.

> ...
> Maybe the work can be split up differently, with a similar end
> result but fewer and easier reviewed patches. The way I'd look at
> the problem, there are three main areas that can be dealt with one
> at a time:
> 
> a) callers of the high-level config space accessors
>pci_{write,read}_config_{byte,word,dword}, mostly in device
>drivers.
> b) low-level implementation of the config space accessors
> through struct pci_ops
> c) all other occurrences of these constants
> 
> Starting with a), my first question is whether any high-level
> drivers even need to care about errors from these functions. I see
> 4913 callers that ignore the return code, and 576 that actually
> check it, and almost none care about the specific error (as you
> found as well). Unless we conclude that most PCI drivers are wrong,
> could we just change the return type to 'void' and assume they never
> fail for valid arguments on a valid pci_device* ?

I really like this idea.

pci_write_config_*() has one return value, and only 100ish of 2500
callers check for errors.  It's sometimes possible for config
accessors to detect PCI errors and return failure, e.g., device was
removed or didn't respond, but most of them don't, and detecting these
errors is not really that valuable.

pci_read_config_*() is much more interesting because it returns two
things, the function return value and the value read from the PCI
device, and it's complicated to check both. 

Again it's sometimes possible for config read accessors to detect PCI
errors, but in most cases a PCI error means the accessor returns
success and the value from PCI is ~0.

Checking the function return value catches programming errors (bad
alignment, etc) but misses most of the interesting errors (device was
unplugged or reported a PCI error).

Checking the value returned from PCI is tricky because ~0 is a valid
value for some config registers, and only the driver knows for sure.
If the driver knows that ~0 is a possible value, it would have to do
something else, e.g., another config read of a register that *cannot*
be ~0, to see whether it's really an error.

I suspect that if we had a single value to look at it would be easier
to get right.  Error checking with current interface would look like
this:

  err = pci_read_config_word(dev, addr, &val);
  if (err)
    return -EINVAL;

  if (PCI_POSSIBLE_ERROR(val)) {
    /* if driver knows ~0 is invalid */
    return -EINVAL;

    /* if ~0 is potentially a valid value */
    err = pci_read_config_word(dev, PCI_VENDOR_ID, &val2);
    if (err)
      return -EINVAL;

    if (PCI_POSSIBLE_ERROR(val2))
      return -EINVAL;
  }

Error checking with a possible interface that returned only a single
value could look like this:

  val = pci_config_read_word(dev, addr);
  if (PCI_POSSIBLE_ERROR(val)) {
    /* if driver knows ~0 is invalid */
    return -EINVAL;

    /* if ~0 is potentially a valid value */
    val2 = pci_config_read_word(dev, PCI_VENDOR_ID);
    if (PCI_POSSIBLE_ERROR(val2))
      return -EINVAL;
  }

Am I understanding you correctly?
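
For illustration only, here is a minimal user-space sketch of the
single-value idea for a register where ~0 may be legitimate.  Nothing in
it is kernel API: struct fake_dev, fake_config_read_word(),
POSSIBLE_ERROR_U16() and FAKE_VENDOR_ID are hypothetical stand-ins, and
the fake accessor simply returns all ones when the device is "gone".

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the PCI_POSSIBLE_ERROR() idea: all-ones data
 * is what a failed config read typically looks like. */
#define POSSIBLE_ERROR_U16(val) ((uint16_t)(val) == (uint16_t)~0U)

/* Hypothetical offset of a register that can never legitimately read ~0. */
#define FAKE_VENDOR_ID 0x00

/* Hypothetical fake device: 256 bytes of config space plus a presence flag. */
struct fake_dev {
	bool present;
	uint16_t cfg[128];
};

/* Single-value read: returns the data, or ~0 if the read cannot succeed. */
static uint16_t fake_config_read_word(const struct fake_dev *dev,
				      unsigned int addr)
{
	if (!dev->present || addr >= 256 || (addr & 1))
		return 0xffff;
	return dev->cfg[addr / 2];
}

/* Read a register for which ~0 may be a valid value.  Returns 0 and stores
 * the value on success, or -EINVAL if the device looks unreachable. */
static int read_reg_maybe_all_ones(const struct fake_dev *dev,
				   unsigned int addr, uint16_t *out)
{
	uint16_t val = fake_config_read_word(dev, addr);

	if (POSSIBLE_ERROR_U16(val)) {
		/* Disambiguate with a second read of a register that cannot be ~0. */
		uint16_t vendor = fake_config_read_word(dev, FAKE_VENDOR_ID);

		if (POSSIBLE_ERROR_U16(vendor))
			return -EINVAL;
	}

	*out = val;
	return 0;
}
```

The second read is only issued when the first value is ambiguous, so the
common path stays at one config access.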

> For b), it might be nice to also change other aspects of the
> interface, e.g. passing a pci_host_bridge pointer plus bus number
> instead of a pci_bus pointer, or having the callback in the
> pci_host_bridge structure.

I like this idea a lot, too.  I think the fact that
pci_bus_read_config_word() requires a pci_bus * complicates things in
a few places.

I think it's completely separate, as you say, and we should defer it
for now because even part a) is a lot of work.  I added it to my list
of possible future projects.

Bjorn


Re: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86

2020-07-13 Thread Bjorn Helgaas
On Mon, Jul 13, 2020 at 02:22:12PM +0200, Saheed O. Bolarinwa wrote:
> The goal of this series is to move the definition of *all* PCIBIOS* from
> include/linux/pci.h to arch/x86 and limit their use within there.
> All other tree-specific definitions will be left intact. Maybe they can
> be renamed.

More comments later, but a few trivial whitespace issues you can clean
up in the meantime.  Don't repost for at least a few days to avoid
spamming everybody.  I found these with:

  $ b4 am -om/ 20200713122247.10985-1-refactormys...@gmail.com
  $ git am m/20200713_refactormyself_move_all_pcibios_definitions_into_arch_x86.mbx

  Applying: atm: Change PCIBIOS_SUCCESSFUL to 0
  .git/rebase-apply/patch:11: trailing whitespace.
  iadev = INPH_IA_DEV(dev);
  .git/rebase-apply/patch:12: trailing whitespace.
  for(i=0; i<64; i++)
  .git/rebase-apply/patch:13: trailing whitespace.
if ((error = pci_read_config_dword(iadev->pci,
  .git/rebase-apply/patch:16: trailing whitespace, space before tab in indent.
return error;
  .git/rebase-apply/patch:17: trailing whitespace.
  writel(0, iadev->reg+IPHASE5575_EXT_RESET);
  warning: squelched 5 whitespace errors
  warning: 10 lines add whitespace errors.
  Applying: atm: Tidy Success/Failure checks
  .git/rebase-apply/patch:13: trailing whitespace.

  .git/rebase-apply/patch:14: trailing whitespace.
  iadev = INPH_IA_DEV(dev);
  .git/rebase-apply/patch:15: trailing whitespace.
  for(i=0; i<64; i++)
  .git/rebase-apply/patch:21: trailing whitespace.
  writel(0, iadev->reg+IPHASE5575_EXT_RESET);
  .git/rebase-apply/patch:22: trailing whitespace.
  for(i=0; i<64; i++)
  warning: squelched 3 whitespace errors
  warning: 8 lines add whitespace errors.
  Applying: atm: Fix Style ERROR- assignment in if condition
  .git/rebase-apply/patch:12: trailing whitespace.
  unsigned int pci[64];
  .git/rebase-apply/patch:13: trailing whitespace.

  .git/rebase-apply/patch:14: trailing whitespace.
  iadev = INPH_IA_DEV(dev);
  .git/rebase-apply/patch:23: trailing whitespace.
  writel(0, iadev->reg+IPHASE5575_EXT_RESET);
  .git/rebase-apply/patch:32: trailing whitespace.
  udelay(5);
  warning: squelched 2 whitespace errors
  warning: 7 lines add whitespace errors.
  Applying: PCI: Change PCIBIOS_SUCCESSFUL to 0
  .git/rebase-apply/patch:37: trailing whitespace.
  struct pci_ops apecs_pci_ops =
  .git/rebase-apply/patch:50: trailing whitespace.
  static int
  .git/rebase-apply/patch:59: trailing whitespace.
  struct pci_ops cia_pci_ops =
  .git/rebase-apply/patch:94: trailing whitespace.
  static int
  .git/rebase-apply/patch:103: trailing whitespace.
  struct pci_ops lca_pci_ops =
  warning: squelched 10 whitespace errors
  warning: 15 lines add whitespace errors.


Re: [PATCH 2/2] PCI/AER: Log correctable errors as warning, not error

2020-07-09 Thread Bjorn Helgaas
On Tue, Jul 07, 2020 at 07:14:01PM -0500, Bjorn Helgaas wrote:
> From: Matt Jolly 
> 
> PCIe correctable errors are recovered by hardware with no need for software
> intervention (PCIe r5.0, sec 6.2.2.1).
> 
> Reduce the log level of correctable errors from KERN_ERR to KERN_WARNING.
> 
> The bug reports below are for correctable error logging.  This doesn't fix
> the cause of those reports, but it may make the messages less alarming.
> 
> [bhelgaas: commit log, use pci_printk() to avoid code duplication]
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=201517
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=196183
> Link: https://lore.kernel.org/r/20200618155511.16009-1-Kangie@footclan.ninja
> Signed-off-by: Matt Jolly 
> Signed-off-by: Bjorn Helgaas 

I applied both of these to pci/error for v5.9.

> ---
>  drivers/pci/pcie/aer.c | 25 +++--
>  1 file changed, 15 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 9176c8a968b9..ca886bf91fd9 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -673,20 +673,23 @@ static void __aer_print_error(struct pci_dev *dev,
>  {
>   const char **strings;
>   unsigned long status = info->status & ~info->mask;
> - const char *errmsg;
> + const char *level, *errmsg;
>   int i;
>  
> - if (info->severity == AER_CORRECTABLE)
> + if (info->severity == AER_CORRECTABLE) {
>   strings = aer_correctable_error_string;
> - else
> + level = KERN_WARNING;
> + } else {
>   strings = aer_uncorrectable_error_string;
> + level = KERN_ERR;
> + }
>  
>   for_each_set_bit(i, &status, 32) {
>   errmsg = strings[i];
>   if (!errmsg)
>   errmsg = "Unknown Error Bit";
>  
> - pci_err(dev, "   [%2d] %-22s%s\n", i, errmsg,
> + pci_printk(level, dev, "   [%2d] %-22s%s\n", i, errmsg,
>   info->first_error == i ? " (First)" : "");
>   }
>   pci_dev_aer_stats_incr(dev, info);
> @@ -696,6 +699,7 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
>  {
>   int layer, agent;
>   int id = ((dev->bus->number << 8) | dev->devfn);
> + const char *level;
>  
>   if (!info->status) {
>   pci_err(dev, "PCIe Bus Error: severity=%s, type=Inaccessible, (Unregistered Agent ID)\n",
> @@ -706,13 +710,14 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
>   layer = AER_GET_LAYER_ERROR(info->severity, info->status);
>   agent = AER_GET_AGENT(info->severity, info->status);
>  
> - pci_err(dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n",
> - aer_error_severity_string[info->severity],
> - aer_error_layer[layer], aer_agent_string[agent]);
> + level = (info->severity == AER_CORRECTABLE) ? KERN_WARNING : KERN_ERR;
> +
> + pci_printk(level, dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n",
> +aer_error_severity_string[info->severity],
> +aer_error_layer[layer], aer_agent_string[agent]);
>  
> - pci_err(dev, "  device [%04x:%04x] error status/mask=%08x/%08x\n",
> - dev->vendor, dev->device,
> - info->status, info->mask);
> + pci_printk(level, dev, "  device [%04x:%04x] error status/mask=%08x/%08x\n",
> +dev->vendor, dev->device, info->status, info->mask);
>  
>   __aer_print_error(dev, info);
>  
> -- 
> 2.25.1
> 


[PATCH 2/2] PCI/AER: Log correctable errors as warning, not error

2020-07-07 Thread Bjorn Helgaas
From: Matt Jolly 

PCIe correctable errors are recovered by hardware with no need for software
intervention (PCIe r5.0, sec 6.2.2.1).

Reduce the log level of correctable errors from KERN_ERR to KERN_WARNING.

The bug reports below are for correctable error logging.  This doesn't fix
the cause of those reports, but it may make the messages less alarming.

[bhelgaas: commit log, use pci_printk() to avoid code duplication]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=201517
Link: https://bugzilla.kernel.org/show_bug.cgi?id=196183
Link: https://lore.kernel.org/r/20200618155511.16009-1-Kangie@footclan.ninja
Signed-off-by: Matt Jolly 
Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/pcie/aer.c | 25 +++--
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 9176c8a968b9..ca886bf91fd9 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -673,20 +673,23 @@ static void __aer_print_error(struct pci_dev *dev,
 {
const char **strings;
unsigned long status = info->status & ~info->mask;
-   const char *errmsg;
+   const char *level, *errmsg;
int i;
 
-   if (info->severity == AER_CORRECTABLE)
+   if (info->severity == AER_CORRECTABLE) {
strings = aer_correctable_error_string;
-   else
+   level = KERN_WARNING;
+   } else {
strings = aer_uncorrectable_error_string;
+   level = KERN_ERR;
+   }
 
for_each_set_bit(i, &status, 32) {
errmsg = strings[i];
if (!errmsg)
errmsg = "Unknown Error Bit";
 
-   pci_err(dev, "   [%2d] %-22s%s\n", i, errmsg,
+   pci_printk(level, dev, "   [%2d] %-22s%s\n", i, errmsg,
info->first_error == i ? " (First)" : "");
}
pci_dev_aer_stats_incr(dev, info);
@@ -696,6 +699,7 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
 {
int layer, agent;
int id = ((dev->bus->number << 8) | dev->devfn);
+   const char *level;
 
if (!info->status) {
pci_err(dev, "PCIe Bus Error: severity=%s, type=Inaccessible, (Unregistered Agent ID)\n",
@@ -706,13 +710,14 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
layer = AER_GET_LAYER_ERROR(info->severity, info->status);
agent = AER_GET_AGENT(info->severity, info->status);
 
-   pci_err(dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n",
-   aer_error_severity_string[info->severity],
-   aer_error_layer[layer], aer_agent_string[agent]);
+   level = (info->severity == AER_CORRECTABLE) ? KERN_WARNING : KERN_ERR;
+
+   pci_printk(level, dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n",
+  aer_error_severity_string[info->severity],
+  aer_error_layer[layer], aer_agent_string[agent]);
 
-   pci_err(dev, "  device [%04x:%04x] error status/mask=%08x/%08x\n",
-   dev->vendor, dev->device,
-   info->status, info->mask);
+   pci_printk(level, dev, "  device [%04x:%04x] error status/mask=%08x/%08x\n",
+  dev->vendor, dev->device, info->status, info->mask);
 
__aer_print_error(dev, info);
 
-- 
2.25.1



[PATCH 1/2] PCI/AER: Simplify __aer_print_error()

2020-07-07 Thread Bjorn Helgaas
From: Bjorn Helgaas 

aer_correctable_error_string[] and aer_uncorrectable_error_string[] have
descriptions of AER error status bits.  Add NULL entries to these tables so
all entries for bits 0-31 are defined.  Then we don't have to check for
ARRAY_SIZE() when decoding a status word, which simplifies
__aer_print_error().

Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/pcie/aer.c | 48 ++
 1 file changed, 34 insertions(+), 14 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 3acf56683915..9176c8a968b9 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -447,7 +447,7 @@ static const char *aer_error_layer[] = {
"Transaction Layer"
 };
 
-static const char *aer_correctable_error_string[AER_MAX_TYPEOF_COR_ERRS] = {
+static const char *aer_correctable_error_string[] = {
"RxErr",/* Bit Position 0   */
NULL,
NULL,
@@ -464,9 +464,25 @@ static const char *aer_correctable_error_string[AER_MAX_TYPEOF_COR_ERRS] = {
"NonFatalErr",  /* Bit Position 13  */
"CorrIntErr",   /* Bit Position 14  */
"HeaderOF", /* Bit Position 15  */
+   NULL,   /* Bit Position 16  */
+   NULL,   /* Bit Position 17  */
+   NULL,   /* Bit Position 18  */
+   NULL,   /* Bit Position 19  */
+   NULL,   /* Bit Position 20  */
+   NULL,   /* Bit Position 21  */
+   NULL,   /* Bit Position 22  */
+   NULL,   /* Bit Position 23  */
+   NULL,   /* Bit Position 24  */
+   NULL,   /* Bit Position 25  */
+   NULL,   /* Bit Position 26  */
+   NULL,   /* Bit Position 27  */
+   NULL,   /* Bit Position 28  */
+   NULL,   /* Bit Position 29  */
+   NULL,   /* Bit Position 30  */
+   NULL,   /* Bit Position 31  */
 };
 
-static const char *aer_uncorrectable_error_string[AER_MAX_TYPEOF_UNCOR_ERRS] = {
+static const char *aer_uncorrectable_error_string[] = {
"Undefined",/* Bit Position 0   */
NULL,
NULL,
@@ -494,6 +510,11 @@ static const char *aer_uncorrectable_error_string[AER_MAX_TYPEOF_UNCOR_ERRS] = {
"AtomicOpBlocked",  /* Bit Position 24  */
"TLPBlockedErr",/* Bit Position 25  */
"PoisonTLPBlocked", /* Bit Position 26  */
+   NULL,   /* Bit Position 27  */
+   NULL,   /* Bit Position 28  */
+   NULL,   /* Bit Position 29  */
+   NULL,   /* Bit Position 30  */
+   NULL,   /* Bit Position 31  */
 };
 
 static const char *aer_agent_string[] = {
@@ -650,24 +671,23 @@ static void __print_tlp_header(struct pci_dev *dev,
 static void __aer_print_error(struct pci_dev *dev,
  struct aer_err_info *info)
 {
+   const char **strings;
unsigned long status = info->status & ~info->mask;
-   const char *errmsg = NULL;
+   const char *errmsg;
int i;
 
+   if (info->severity == AER_CORRECTABLE)
+   strings = aer_correctable_error_string;
+   else
+   strings = aer_uncorrectable_error_string;
+
for_each_set_bit(i, &status, 32) {
-   if (info->severity == AER_CORRECTABLE)
-   errmsg = i < ARRAY_SIZE(aer_correctable_error_string) ?
-   aer_correctable_error_string[i] : NULL;
-   else
-   errmsg = i < ARRAY_SIZE(aer_uncorrectable_error_string) ?
-   aer_uncorrectable_error_string[i] : NULL;
+   errmsg = strings[i];
+   if (!errmsg)
+   errmsg = "Unknown Error Bit";
 
-   if (errmsg)
-   pci_err(dev, "   [%2d] %-22s%s\n", i, errmsg,
+   pci_err(dev, "   [%2d] %-22s%s\n", i, errmsg,
info->first_error == i ? " (First)" : "");
-   else
-   pci_err(dev, "   [%2d] Unknown Error Bit%s\n",
-   i, info->first_error == i ? " (First)" : "");
}
pci_dev_aer_stats_incr(dev, info);
 }
-- 
2.25.1



Re: [PATCH] pci: pcie: AER: Fix logging of Correctable errors

2020-07-07 Thread Bjorn Helgaas
On Fri, Jun 19, 2020 at 01:55:11AM +1000, Matt Jolly wrote:
> The AER documentation indicates that correctable (severity=Corrected)
> errors should be output as a warning so that users can filter these
> errors if they choose to; this functionality does not appear to have been
> implemented.
> 
> This patch modifies the functions aer_print_error and __aer_print_error
> to send correctable errors as a warning (pci_warn), rather than as an error
> (pci_err). It partially addresses several bugs in relation to kernel message
> buffer spam for misbehaving devices - the root cause (possibly device
> firmware?) isn't addressed, but the dmesg output is less alarming for end
> users, and can be filtered separately from uncorrectable errors. This should
> hopefully reduce the need for users to disable AER to suppress corrected
> errors.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=201517
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=196183
> 
> Signed-off-by: Matt Jolly 
> ---
>  drivers/pci/pcie/aer.c | 36 ++--
>  1 file changed, 26 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 3acf56683915..131ecc0df2cb 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -662,12 +662,18 @@ static void __aer_print_error(struct pci_dev *dev,
>   errmsg = i < ARRAY_SIZE(aer_uncorrectable_error_string) ?
>   aer_uncorrectable_error_string[i] : NULL;
>  
> - if (errmsg)
> - pci_err(dev, "   [%2d] %-22s%s\n", i, errmsg,
> - info->first_error == i ? " (First)" : "");
> - else
> + if (errmsg) {
> + if (info->severity == AER_CORRECTABLE) {
> + pci_warn(dev, "   [%2d] %-22s%s\n", i, errmsg,

I think we can use pci_printk() here to reduce the code duplication.

And I think we can also simplify the aer_correctable_error_string/
aer_uncorrectable_error_string stuff above, which would make this even
simpler.

I'll respond to this with my proposal.
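
To make the two ideas concrete, here is a rough user-space sketch (not
the actual proposal): pick the string table and the log level once per
severity, and size the tables for all 32 bits so no bounds check is
needed per table.  The tables are abbreviated to a few names that appear
in this thread, the "warning"/"error" strings stand in for KERN_WARNING
and KERN_ERR, and level_for(), bit_name() and print_status() are
hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

enum severity { SEV_CORRECTABLE, SEV_UNCORRECTABLE };

/* Abbreviated tables: sized for bits 0-31, unnamed bits are NULL. */
static const char *const correctable_strings[32] = {
	[0]  = "RxErr",
	[13] = "NonFatalErr",
	[14] = "CorrIntErr",
	[15] = "HeaderOF",
};

static const char *const uncorrectable_strings[32] = {
	[0]  = "Undefined",
	[24] = "AtomicOpBlocked",
	[25] = "TLPBlockedErr",
	[26] = "PoisonTLPBlocked",
};

/* Stand-in for choosing KERN_WARNING vs KERN_ERR. */
static const char *level_for(enum severity sev)
{
	return sev == SEV_CORRECTABLE ? "warning" : "error";
}

/* Look up the name of one status bit, falling back for undefined bits. */
static const char *bit_name(enum severity sev, unsigned int bit)
{
	const char *const *strings = sev == SEV_CORRECTABLE ?
		correctable_strings : uncorrectable_strings;
	const char *msg = bit < 32 ? strings[bit] : NULL;

	return msg ? msg : "Unknown Error Bit";
}

/* Print every unmasked status bit at the level chosen for the severity.
 * Returns the number of bits printed. */
static int print_status(enum severity sev, uint32_t status, uint32_t mask,
			FILE *out)
{
	uint32_t bits = status & ~mask;
	int printed = 0;

	for (unsigned int i = 0; i < 32; i++) {
		if (!(bits & (UINT32_C(1) << i)))
			continue;
		fprintf(out, "%s:   [%2u] %s\n", level_for(sev), i,
			bit_name(sev, i));
		printed++;
	}
	return printed;
}
```

With a single print call per bit, the correctable and uncorrectable
paths differ only in data (table and level), not in control flow.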

> + info->first_error == i ? " (First)" : "");
> + } else {
> + pci_err(dev, "   [%2d] %-22s%s\n", i, errmsg,
> + info->first_error == i ? " (First)" : "");
> + }
> + } else {
>   pci_err(dev, "   [%2d] Unknown Error Bit%s\n",
>   i, info->first_error == i ? " (First)" : "");
> + }
>   }
>   pci_dev_aer_stats_incr(dev, info);
>  }
> @@ -686,13 +692,23 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
>   layer = AER_GET_LAYER_ERROR(info->severity, info->status);
>   agent = AER_GET_AGENT(info->severity, info->status);
>  
> - pci_err(dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n",
> - aer_error_severity_string[info->severity],
> - aer_error_layer[layer], aer_agent_string[agent]);
> + if  (info->severity == AER_CORRECTABLE) {
> + pci_warn(dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n",
> + aer_error_severity_string[info->severity],
> + aer_error_layer[layer], aer_agent_string[agent]);
>  
> - pci_err(dev, "  device [%04x:%04x] error status/mask=%08x/%08x\n",
> - dev->vendor, dev->device,
> - info->status, info->mask);
> + pci_warn(dev, "  device [%04x:%04x] error status/mask=%08x/%08x\n",
> + dev->vendor, dev->device,
> + info->status, info->mask);
> + } else {
> + pci_err(dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n",
> + aer_error_severity_string[info->severity],
> + aer_error_layer[layer], aer_agent_string[agent]);
> +
> + pci_err(dev, "  device [%04x:%04x] error status/mask=%08x/%08x\n",
> + dev->vendor, dev->device,
> + info->status, info->mask);
> + }
>  
>   __aer_print_error(dev, info);
>  
> -- 
> 2.26.2
> 


Re: [PATCH 0/8 v2] PCI: Align return values of PCIe capability and PCI accessors

2020-06-26 Thread Bjorn Helgaas
On Mon, Jun 15, 2020 at 09:32:17AM +0200, refactormys...@gmail.com wrote:
> From: Bolarinwa Olayemi Saheed 
> 
> 
> PATCH 1/8 to 7/8:
> PCIBIOS_ error codes have positive values and they are passed down the
> call hierarchy from accessors. For functions which are meant to return
> only a negative value on failure, passing on this value is a bug.
> To mitigate this, call pcibios_err_to_errno() before passing on the return
> value from the PCIe capability accessors' call hierarchy. This function
> converts any positive PCIBIOS_ error codes to negative generic error
> values.
> 
> PATCH 8/8:
> The PCIe capability accessors can return 0, -EINVAL, or any PCIBIOS_ error
> code. The PCI accessors, on the other hand, can only return 0 or any PCIBIOS_
> error code. This inconsistency among these accessors makes it harder for
> callers to check for errors.
> Return PCIBIOS_BAD_REGISTER_NUMBER instead of -EINVAL in all PCIe
> capability accessors.
> 
> MERGING:
> These may all be merged via the PCI tree, since it is a collection of
> similar fixes. This way they all get merged at once.
> 
> Version 2:
> * cc to maintainers and mailing lists
> * Edit the Subject to conform with previous style
> * reorder "Signed by" and "Suggested by"
> * made spelling corrections
> * fixed redundant initialisation in PATCH 3/8
> * include missing call to pcibios_err_to_errno() in PATCH 6/8 and 7/8
> 
> 
> Bolarinwa Olayemi Saheed (8):
>   dmaengine: ioatdma: Convert PCIBIOS_* errors to generic -E* errors
>   IB/hfi1: Convert PCIBIOS_* errors to generic -E* errors
>   IB/hfi1: Convert PCIBIOS_* errors to generic -E* errors
>   PCI: Convert PCIBIOS_* errors to generic -E* errors
>   scsi: smartpqi: Convert PCIBIOS_* errors to generic -E* errors
>   PCI/AER: Convert PCIBIOS_* errors to generic -E* errors
>   PCI/AER: Convert PCIBIOS_* errors to generic -E* errors
>   PCI: Align return values of PCIe capability and PCI accessorss
> 
>  drivers/dma/ioat/init.c   |  4 ++--
>  drivers/infiniband/hw/hfi1/pcie.c | 18 +-
>  drivers/pci/access.c  |  8 
>  drivers/pci/pci.c | 10 --
>  drivers/pci/pcie/aer.c| 12 ++--
>  drivers/scsi/smartpqi/smartpqi_init.c |  6 +-
>  6 files changed, 42 insertions(+), 16 deletions(-)

Since these are really fixing a single PCI API problem, not individual
driver-related problems, I squashed the pcibios_err_to_errno() patches
together (except IB/hfi1, since Jason will take those separately) and
applied them to pci/misc, thanks!
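
As a small user-space sketch of the conversion involved (the constants
and the mapping mirror what include/linux/pci.h had at the time, as
quoted elsewhere in this thread; fake_capability_write() and
fake_probe() are hypothetical), a probe-style caller that must return 0
or a negative errno looks roughly like this:

```c
#include <errno.h>

/* Positive PCIBIOS_* values as defined in include/linux/pci.h at the time. */
#define PCIBIOS_SUCCESSFUL		0x00
#define PCIBIOS_FUNC_NOT_SUPPORTED	0x81
#define PCIBIOS_BAD_VENDOR_ID		0x83
#define PCIBIOS_DEVICE_NOT_FOUND	0x86
#define PCIBIOS_BAD_REGISTER_NUMBER	0x87
#define PCIBIOS_SET_FAILED		0x88
#define PCIBIOS_BUFFER_TOO_SMALL	0x89

/* Translate a PCIBIOS_* code to a generic negative errno. */
static int pcibios_err_to_errno(int err)
{
	/* Zero and values that are already negative pass through unchanged. */
	if (err <= PCIBIOS_SUCCESSFUL)
		return err;

	switch (err) {
	case PCIBIOS_FUNC_NOT_SUPPORTED:
		return -ENOENT;
	case PCIBIOS_BAD_VENDOR_ID:
		return -ENOTTY;
	case PCIBIOS_DEVICE_NOT_FOUND:
		return -ENODEV;
	case PCIBIOS_BAD_REGISTER_NUMBER:
		return -EFAULT;
	case PCIBIOS_SET_FAILED:
		return -EIO;
	case PCIBIOS_BUFFER_TOO_SMALL:
		return -ENOSPC;
	}

	return -ERANGE;
}

/* Hypothetical accessor that fails with a positive PCIBIOS_* code. */
static int fake_capability_write(int fail)
{
	return fail ? PCIBIOS_BAD_REGISTER_NUMBER : PCIBIOS_SUCCESSFUL;
}

/* Hypothetical probe-style caller: converts before returning. */
static int fake_probe(int fail)
{
	return pcibios_err_to_errno(fake_capability_write(fail));
}
```

Without the conversion, fake_probe(1) would return the positive value
0x87, which is the kind of return a probe function is not supposed to
produce.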

The squashed patch as applied is:

commit d20df83b66cc ("PCI: Convert PCIe capability PCIBIOS errors to errno")
Author: Bolarinwa Olayemi Saheed 
Date:   Mon Jun 15 09:32:18 2020 +0200

PCI: Convert PCIe capability PCIBIOS errors to errno

The PCI config accessors (pci_read_config_word(), et al) return
PCIBIOS_SUCCESSFUL (zero) or positive error values like
PCIBIOS_FUNC_NOT_SUPPORTED.

The PCIe capability accessors (pcie_capability_read_word(), et al)
similarly return PCIBIOS errors, but some callers assume they return
generic errno values like -EINVAL.

For example, the Myri-10G probe function returns a positive PCIBIOS error
if the pcie_capability_clear_and_set_word() in pcie_set_readrq() fails:

  myri10ge_probe
status = pcie_set_readrq
  return pcie_capability_clear_and_set_word
if (status)
  return status

A positive return from a PCI driver probe function would cause a "Driver
probe function unexpectedly returned" warning from local_pci_probe()
instead of the desired probe failure.

Convert PCIBIOS errors to generic errno for all callers of:

  pcie_capability_read_word
  pcie_capability_read_dword
  pcie_capability_write_word
  pcie_capability_write_dword
  pcie_capability_set_word
  pcie_capability_set_dword
  pcie_capability_clear_word
  pcie_capability_clear_dword
  pcie_capability_clear_and_set_word
  pcie_capability_clear_and_set_dword

that check the return code for anything other than zero.

[bhelgaas: commit log, squash together]
Suggested-by: Bjorn Helgaas 
Link: https://lore.kernel.org/r/20200615073225.24061-1-refactormys...@gmail.com
Signed-off-by: Bolarinwa Olayemi Saheed 
Signed-off-by: Bjorn Helgaas 

diff --git a/drivers/dma/ioat/init.c b/drivers/dma/ioat/init.c
index 58d13564f88b..9a6a9ec3cf48 100644
--- a/drivers/dma/ioat/init.c
+++ b/drivers/dma/ioat/init.c
@@ -1195,13 +1195,13 @@ static int ioat3_dma_probe(struct ioatdma_device *ioat_dma, int dca)
/* disable relaxed ordering */
err = pcie_capability_read_word(pdev, IOAT_DEVCTRL_OFFSET, );
if (err)
-  

Re: [PATCH v3 0/2] PCI/ERR: Allow Native AER/DPC using _OSC

2020-05-22 Thread Bjorn Helgaas
On Fri, May 22, 2020 at 05:23:31PM +, Derrick, Jonathan wrote:
> On Fri, 2020-05-01 at 11:35 -0600, Jonathan Derrick wrote:
> > On Fri, 2020-05-01 at 12:16 -0500, Bjorn Helgaas wrote:
> > > On Thu, Apr 30, 2020 at 12:46:07PM -0600, Jon Derrick wrote:
> > > > Hi Bjorn & Kuppuswamy,
> > > > 
> > > > I see a problem in the DPC ECN [1] to _OSC in that it doesn't
> > > > give us a way to determine if firmware supports _OSC DPC
> > > > negotiation, and therefore how to handle DPC.
> > > > 
> > > > Here is the wording of the ECN that implies that Firmware
> > > > without _OSC DPC negotiation support should have the OSPM rely
> > > > on _OSC AER negotiation when determining DPC control:
> > > > 
> > > >   PCIe Base Specification suggests that Downstream Port
> > > >   Containment may be controlled either by the Firmware or the
> > > >   Operating System. It also suggests that the Firmware retain
> > > >   ownership of Downstream Port Containment if it also owns
> > > >   AER. When the Firmware owns Downstream Port Containment, it
> > > >   is expected to use the new "Error Disconnect Recover"
> > > >   notification to alert OSPM of a Downstream Port Containment
> > > >   event.
> > > > 
> > > > In legacy platforms, as bits in _OSC are reserved prior to
> > > > implementation, ACPI Root Bus enumeration will mark these Host
> > > > Bridges as without Native DPC support, even though the
> > > > specification implies it's expected that AER _OSC negotiation
> > > > determines DPC control for these platforms. There seems to be
> > > > a need for a way to determine if the DPC control bit in _OSC
> > > > is supported and fall back on AER otherwise.
> > > > 
> > > > 
> > > > Currently portdrv assumes DPC control if the port has Native
> > > > AER services:
> > > > 
> > > > static int get_port_device_capability(struct pci_dev *dev)
> > > > ...
> > > > if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) &&
> > > > pci_aer_available() &&
> > > > (pcie_ports_dpc_native || (services & PCIE_PORT_SERVICE_AER)))
> > > > services |= PCIE_PORT_SERVICE_DPC;
> > > > 
> > > > Newer firmware may not grant OSPM DPC control, if for
> > > > instance, it expects to use Error Disconnect Recovery. However
> > > > it looks like ACPI will use DPC services via the EDR driver,
> > > > without binding the full DPC port service driver.
> > > > 
> > > > 
> > > > If we change portdrv to probe based on host->native_dpc and
> > > > not AER, then we break instances with legacy firmware where
> > > > OSPM will clear host->native_dpc solely due to _OSC bits being
> > > > reserved:
> > > > 
> > > > struct pci_bus *acpi_pci_root_create(struct acpi_pci_root *root,
> > > > ...
> > > > if (!(root->osc_control_set & OSC_PCI_EXPRESS_DPC_CONTROL))
> > > > host_bridge->native_dpc = 0;
> > > > 
> > > > 
> > > > 
> > > > So my assumption instead is that host->native_dpc can be 0 and
> > > > expect Native DPC services if AER is used. In other words, if
> > > > and only if DPC probe is invoked from portdrv, then it needs
> > > > to rely on the AER dependency. Otherwise it should be assumed
> > > > that ACPI set up DPC via EDR. This covers legacy firmware.
> > > > 
> > > > However it seems like that could be trouble with newer
> > > > firmware that might give OSPM control of AER but not DPC, and
> > > > would result in both Native DPC and EDR being in effect.
> > > > 
> > > > 
> > > > Anyways here are two patches that give control of AER and DPC
> > > > on the results of _OSC. They don't mess with the HEST parser
> > > > as I expect those to be removed at some point. I need these
> > > > for VMD support which doesn't even rely on _OSC, but I suspect
> > > > this won't be the last effort as we detangle Firmware First.
> > > > 
> > > > [1] https://members.pcisig.com/wg/PCI-SIG/document/12888
> > > 
> > > Hi Jon, I think we need to sort out the _OSC/FIRMWARE_FIRST patches
> > > from Alex and Sathy first, then 

Re: [PATCH v3 0/2] PCI/ERR: Allow Native AER/DPC using _OSC

2020-05-01 Thread Bjorn Helgaas
On Thu, Apr 30, 2020 at 12:46:07PM -0600, Jon Derrick wrote:
> Hi Bjorn & Kuppuswamy,
> 
> I see a problem in the DPC ECN [1] to _OSC in that it doesn't give us a way to
> determine if firmware supports _OSC DPC negotiation, and therefore how to
> handle DPC.
> 
> Here is the wording of the ECN that implies that Firmware without _OSC DPC
> negotiation support should have the OSPM rely on _OSC AER negotiation when
> determining DPC control:
> 
>   PCIe Base Specification suggests that Downstream Port Containment may be
>   controlled either by the Firmware or the Operating System. It also suggests
>   that the Firmware retain ownership of Downstream Port Containment if it also
>   owns AER. When the Firmware owns Downstream Port Containment, it is expected
>   to use the new "Error Disconnect Recover" notification to alert OSPM of a
>   Downstream Port Containment event.
> 
> In legacy platforms, as bits in _OSC are reserved prior to implementation,
> ACPI Root Bus enumeration will mark these Host Bridges as without Native DPC
> support, even though the specification implies it's expected that AER _OSC
> negotiation determines DPC control for these platforms. There seems to be a
> need for a way to determine if the DPC control bit in _OSC is supported and
> fall back on AER otherwise.
> 
> 
> Currently portdrv assumes DPC control if the port has Native AER services:
> 
> static int get_port_device_capability(struct pci_dev *dev)
> ...
>   if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) &&
>   pci_aer_available() &&
>   (pcie_ports_dpc_native || (services & PCIE_PORT_SERVICE_AER)))
>   services |= PCIE_PORT_SERVICE_DPC;
> 
> Newer firmware may not grant OSPM DPC control, if for instance, it expects to
> use Error Disconnect Recovery. However it looks like ACPI will use DPC
> services via the EDR driver, without binding the full DPC port service driver.
> 
> 
> If we change portdrv to probe based on host->native_dpc and not AER, then we
> break instances with legacy firmware where OSPM will clear host->native_dpc
> solely due to _OSC bits being reserved:
> 
> struct pci_bus *acpi_pci_root_create(struct acpi_pci_root *root,
> ...
>   if (!(root->osc_control_set & OSC_PCI_EXPRESS_DPC_CONTROL))
>   host_bridge->native_dpc = 0;
> 
> 
> 
> So my assumption instead is that host->native_dpc can be 0 and expect Native
> DPC services if AER is used. In other words, if and only if DPC probe is
> invoked from portdrv, then it needs to rely on the AER dependency. Otherwise
> it should be assumed that ACPI set up DPC via EDR. This covers legacy
> firmware.
> 
> However it seems like that could be trouble with newer firmware that might
> give OSPM control of AER but not DPC, and would result in both Native DPC and
> EDR being in effect.
> 
> 
> Anyways here are two patches that give control of AER and DPC on the results
> of _OSC. They don't mess with the HEST parser as I expect those to be removed
> at some point. I need these for VMD support which doesn't even rely on _OSC,
> but I suspect this won't be the last effort as we detangle Firmware First.
> 
> [1] https://members.pcisig.com/wg/PCI-SIG/document/12888

Hi Jon, I think we need to sort out the _OSC/FIRMWARE_FIRST patches
from Alex and Sathy first, then see what needs to be done on top of
those, so I'm going to push these off for a few days and they'll
probably need a refresh.

Bjorn


Re: [PATCH v4] pci: Make return value of pcie_capability_read*() consistent

2020-04-28 Thread Bjorn Helgaas
On Tue, Apr 28, 2020 at 10:19:08AM +0800, Yicong Yang wrote:
> On 2020/4/28 2:13, Bjorn Helgaas wrote:
> >
> > I'm starting to think we're approaching this backwards.  I searched
> > for PCIBIOS_FUNC_NOT_SUPPORTED, PCIBIOS_BAD_VENDOR_ID, and the other
> > error values.  Almost every use is a *return* in a config accessor.
> > There are very, very few *tests* for these values.
> 
> If we have certain reasons to reserve PCI_BIOS* error to identify
> PCI errors in PCI drivers, maybe redefine the PCI_BIOS* to generic
> error codes can solve the issues, and no need to call
> pcibios_err_to_errno() to do the conversion.  Few changes may be
> made to current codes. One possible patch may look like below.
> Otherwise, maybe convert all PCI_BIOS* errors to generic error codes
> is a better idea.
> 
> Not sure it's the best way or not. Just FYI.

That's a brilliant idea!  We should still look carefully at all the
callers of the config accessors, but this would avoid changing all the
arch accessors, so the patch would be dramatically smaller.
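
A tiny user-space sketch of why that works, assuming the PCIBIOS_*
names are redefined as negative errno values as in the quoted proposal
(fake_arch_read() and fake_driver_init() are hypothetical): the accessor
source is unchanged, and the caller can pass the result straight up.

```c
#include <errno.h>

/* Assumed redefinition, following the quoted proposal. */
#define PCIBIOS_SUCCESSFUL		0
#define PCIBIOS_DEVICE_NOT_FOUND	(-ENODEV)
#define PCIBIOS_BAD_REGISTER_NUMBER	(-EFAULT)

/* Hypothetical arch accessor: same source text as before the redefinition. */
static int fake_arch_read(unsigned int where, unsigned int *val)
{
	if (where & 3)
		return PCIBIOS_BAD_REGISTER_NUMBER;
	if (where >= 256)
		return PCIBIOS_DEVICE_NOT_FOUND;

	*val = 0x12345678;
	return PCIBIOS_SUCCESSFUL;
}

/* Hypothetical caller: no pcibios_err_to_errno() step is needed. */
static int fake_driver_init(unsigned int where)
{
	unsigned int val;
	int err = fake_arch_read(where, &val);

	if (err)
		return err;	/* already a negative errno */

	return 0;
}
```

Callers that only test for non-zero keep working; any caller that
compares against a positive value or tests "err > 0" would need a
closer look.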

> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 83ce1cd..843987c 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -675,14 +675,18 @@ static inline bool pci_dev_msi_enabled(struct pci_dev *pci_dev) { return false;
>  
>  /* Error values that may be returned by PCI functions */
>  #define PCIBIOS_SUCCESSFUL   0x00
> -#define PCIBIOS_FUNC_NOT_SUPPORTED   0x81
> -#define PCIBIOS_BAD_VENDOR_ID0x83
> -#define PCIBIOS_DEVICE_NOT_FOUND 0x86
> -#define PCIBIOS_BAD_REGISTER_NUMBER  0x87
> -#define PCIBIOS_SET_FAILED   0x88
> -#define PCIBIOS_BUFFER_TOO_SMALL 0x89
> -
> -/* Translate above to generic errno for passing back through non-PCI code */
> +#define PCIBIOS_FUNC_NOT_SUPPORTED   -ENOENT
> +#define PCIBIOS_BAD_VENDOR_ID-ENOTTY
> +#define PCIBIOS_DEVICE_NOT_FOUND -ENODEV
> +#define PCIBIOS_BAD_REGISTER_NUMBER  -EFAULT
> +#define PCIBIOS_SET_FAILED   -EIO
> +#define PCIBIOS_BUFFER_TOO_SMALL -ENOSPC
> +
> +/**
> + * Translate above to generic errno for passing back through non-PCI code
> + *
> + * Deprecated. Use the PCIBIOS_* directly without a translation.
> + */
>  static inline int pcibios_err_to_errno(int err)
>  {
>   if (err <= PCIBIOS_SUCCESSFUL)
> @@ -690,17 +694,12 @@ static inline int pcibios_err_to_errno(int err)
>  
>   switch (err) {
>   case PCIBIOS_FUNC_NOT_SUPPORTED:
> - return -ENOENT;
>   case PCIBIOS_BAD_VENDOR_ID:
> - return -ENOTTY;
>   case PCIBIOS_DEVICE_NOT_FOUND:
> - return -ENODEV;
>   case PCIBIOS_BAD_REGISTER_NUMBER:
> - return -EFAULT;
>   case PCIBIOS_SET_FAILED:
> - return -EIO;
>   case PCIBIOS_BUFFER_TOO_SMALL:
> - return -ENOSPC;
> + return err;
>   }
>  
>   return -ERANGE;
> 
> > For example, the only tests for PCIBIOS_FUNC_NOT_SUPPORTED are in
> > xen_pcibios_err_to_errno() and pcibios_err_to_errno(), i.e., we're
> > just converting that value to -ENOENT or the Xen-specific thing.
> >
> > So I think the best approach might be to remove the PCIBIOS_* error
> > values completely and replace them with the corresponding values from
> > pcibios_err_to_errno().  For example, a part of the patch would look
> > like this:
> >
> > diff --git a/arch/mips/pci/ops-emma2rh.c b/arch/mips/pci/ops-emma2rh.c
> > index 65f47344536c..d4d9c902c147 100644
> > --- a/arch/mips/pci/ops-emma2rh.c
> > +++ b/arch/mips/pci/ops-emma2rh.c
> > @@ -100,7 +100,7 @@ static int pci_config_read(struct pci_bus *bus, unsigned int devfn, int where,
> > break;
> > default:
> > emma2rh_out32(EMMA2RH_PCI_IWIN0_CTR, backup_win0);
> > -   return PCIBIOS_FUNC_NOT_SUPPORTED;
> > +   return -ENOENT;
> > }
> >  
> > emma2rh_out32(EMMA2RH_PCI_IWIN0_CTR, backup_win0);
> > @@ -149,7 +149,7 @@ static int pci_config_write(struct pci_bus *bus, unsigned int devfn, int where,
> > break;
> > default:
> > emma2rh_out32(EMMA2RH_PCI_IWIN0_CTR, backup_win0);
> > -   return PCIBIOS_FUNC_NOT_SUPPORTED;
> > +   return -ENOENT;
> > }
> > *(volatile u32 *)(base + (PCI_FUNC(devfn) << 8) +
> >   (where & 0xfffc)) = data;
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index 83ce1cdf5676..f95637a8d391 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> 

Re: [PATCH v2 1/2] PCI/AER: Allow Native AER Host Bridges to use AER

2020-04-27 Thread Bjorn Helgaas
On Mon, Apr 27, 2020 at 04:11:07PM +, Derrick, Jonathan wrote:
> On Fri, 2020-04-24 at 18:30 -0500, Bjorn Helgaas wrote:
> > I'm glad you raised this because I think the way we handle
> > FIRMWARE_FIRST is really screwed up.
> > 
> > On Mon, Apr 20, 2020 at 03:37:09PM -0600, Jon Derrick wrote:
> > > Some platforms have a mix of ports whose capabilities can be negotiated
> > > by _OSC, and some ports which are not described by ACPI and instead
> > > managed by Native drivers. The existing Firmware-First HEST model can
> > > incorrectly tag these Native, Non-ACPI ports as Firmware-First managed
> > > ports by advertising the HEST Global Flag and matching the type and
> > > class of the port (aer_hest_parse).
> > > 
> > > If the port requests Native AER through the Host Bridge's capability
> > > settings, the AER driver should honor those settings and allow the port
> > > to bind. This patch changes the definition of Firmware-First to exclude
> > > ports whose Host Bridges request Native AER.
> > > 
> > > Signed-off-by: Jon Derrick 
> > > ---
> > >  drivers/pci/pcie/aer.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > > index f4274d3..30fbd1f 100644
> > > --- a/drivers/pci/pcie/aer.c
> > > +++ b/drivers/pci/pcie/aer.c
> > > @@ -314,6 +314,9 @@ int pcie_aer_get_firmware_first(struct pci_dev *dev)
> > >   if (pcie_ports_native)
> > >   return 0;
> > >  
> > > + if (pci_find_host_bridge(dev->bus)->native_aer)
> > > + return 0;
> > 
> > I hope we don't have to complicate pcie_aer_get_firmware_first() by
> > adding this "native_aer" check here.  I'm not sure what we actually
> > *should* do based on FIRMWARE_FIRST, but I don't think the current
> > uses really make sense.
> > 
> > I think Linux makes too many assumptions based on the FIRMWARE_FIRST
> > bit.  The ACPI spec really only says (ACPI v6.3, sec 18.3.2.4):
> > 
> >   If set, FIRMWARE_FIRST indicates to the OSPM that system firmware
> >   will handle errors from this source first.
> > 
> >   If FIRMWARE_FIRST is set in the flags field, the Enabled field [of
> >   the HEST AER structure] is ignored by the OSPM.
> > 
> > I do not see anything there about who owns the AER Capability, but
> > Linux assumes that if FIRMWARE_FIRST is set, firmware must own the AER
> > Capability.  I think that's reading too much into the spec.
> > 
> > We already have _OSC, which *does* explicitly talk about who owns the
> > AER Capability, and I think we should rely on that.  If firmware
> > doesn't want the OS to touch the AER Capability, it should decline to
> > give ownership to the OS via _OSC.
> > 
> > >   if (!dev->__aer_firmware_first_valid)
> > >   aer_set_firmware_first(dev);
> > >   return dev->__aer_firmware_first;
> 
> Just a little bit of reading and my interpretation, as it seems like
> some of this is just layers upon layers of possibly conflicting yet
> intentionally vague descriptions.
> 
> _OSC seems to describe that OSPM can handle AER (6.2.11.3):
> PCI Express Advanced Error Reporting (AER) control
>    The OS sets this bit to 1 to request control over PCI Express AER.
>    If the OS successfully receives control of this feature, it must
>    handle error reporting through the AER Capability as described in
>    the PCI Express Base Specification.
> 
> 
> For AER and DPC the ACPI root port enumeration will properly set
> native_aer/dpc based on _OSC:
> 
> struct pci_bus *acpi_pci_root_create(struct acpi_pci_root *root,
> ...
>   if (!(root->osc_control_set & OSC_PCI_EXPRESS_AER_CONTROL))
>   host_bridge->native_aer = 0;
>   if (!(root->osc_control_set & OSC_PCI_EXPRESS_PME_CONTROL))
>   host_bridge->native_pme = 0;
>   if (!(root->osc_control_set & OSC_PCI_EXPRESS_LTR_CONTROL))
>   host_bridge->native_ltr = 0;
>   if (!(root->osc_control_set & OSC_PCI_EXPRESS_DPC_CONTROL))
>   host_bridge->native_dpc = 0;
> 
> As DPC was defined in an ECN [1], I would imagine AER will need to
> cover DPC for legacy platforms prior to the ECN.
> 
> 
> 
> The complication is that HEST also seems to describe how ports (and
> other devices) are managed either individually or globally:
> 
> Table 18-387  PCI Express Root Port AER Structure
> ...
> Flags:
>[0] - FIRMWARE_FIRST: If set, this b

Re: [PATCH v4] pci: Make return value of pcie_capability_read*() consistent

2020-04-27 Thread Bjorn Helgaas
[+cc Thomas, Michael, linux-mips, linux-ppc, LKML
Background:

  - PCI config accessors (pci_read_config_word(), etc) return 0 or a
positive error (PCIBIOS_BAD_REGISTER_NUMBER, etc).

  - PCI Express capability accessors (pcie_capability_read_word(),
etc) return 0, a negative error (-EINVAL), or a positive error
(PCIBIOS_BAD_REGISTER_NUMBER, etc).

  - The PCI Express case is hard for callers to deal with.  The
original plan was to convert this case to either return 0 or
positive errors, just like pci_read_config_word().

  - I'm raising the possibility of instead getting rid of the positive
PCIBIOS_* error values completely and replacing them with -EINVAL,
-ENOENT, etc.

  - Very few callers check the return codes at all.  Most of the ones
that do either check for non-zero or use pcibios_err_to_errno() to
convert PCIBIOS_* to -EINVAL, etc.

I added MIPS and powerpc folks to CC: just as FYI because you're the
biggest users of PCIBIOS_*.  The intent is that this would be zero
functional change.
]
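
As a rough illustration of what callers face today, the following userspace sketch normalizes a return value that may be 0, a negative errno, or a positive PCIBIOS_* code. The two positive values and their errno mappings are taken from the include/linux/pci.h hunk quoted earlier in this archive; the function name and the SKETCH_ prefix are hypothetical.

```c
#include <errno.h>

/* Positive PCIBIOS_* values as quoted from include/linux/pci.h. */
#define SKETCH_PCIBIOS_DEVICE_NOT_FOUND		0x86
#define SKETCH_PCIBIOS_BAD_REGISTER_NUMBER	0x87

/* Hypothetical helper: fold all three conventions into 0 or -errno. */
static int sketch_normalize_ret(int ret)
{
	if (ret <= 0)
		return ret;	/* success, or already a negative errno */

	switch (ret) {
	case SKETCH_PCIBIOS_DEVICE_NOT_FOUND:
		return -ENODEV;
	case SKETCH_PCIBIOS_BAD_REGISTER_NUMBER:
		return -EFAULT;
	}

	return -ERANGE;	/* unknown positive value */
}
```

If the positive values went away, a helper like this would reduce to returning its argument.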

On Sun, Apr 26, 2020 at 11:51:30AM +0200, Saheed Bolarinwa wrote:
> On 4/25/20 12:30 AM, Bjorn Helgaas wrote:
> > On Fri, Apr 24, 2020 at 04:27:11PM +0200, Bolarinwa Olayemi Saheed wrote:
> > > pcie_capability_read*() could return 0, -EINVAL, or any of the
> > > PCIBIOS_* error codes (which are positive).
> > > This behaviour is now changed to return only PCIBIOS_* error
> > > codes on error.
> > > This is consistent with pci_read_config_*(). Callers can now have
> > > a consistent way for checking which error has occurred.
> > > 
> > > An audit of the callers of this function was made and no case was found
> > > where there is need for a change within the caller function or their
> > > dependencies down the hierarchy.
> > > Out of all caller functions discovered only 8 functions either persist the
> > > return value of pcie_capability_read*() or directly pass on the return
> > > value.
> > > 
> > > 1.) "./drivers/infiniband/hw/hfi1/pcie.c" :
> > > => pcie_speeds() line-306
> > > 
> > >   if (ret) {
> > >   dd_dev_err(dd, "Unable to read from PCI config\n");
> > >   return ret;
> > >   }
> > > 
> > > remarks: The variable "ret" is the captured return value.
> > >   This function passes on the return value. The return value was
> > >   stored only by hfi1_init_dd() line-15076 in
> > >   ./drivers/infiniband/hw/hfi1/chip.c and it behaves the same on all
> > >   errors. So this patch will not require a change in this function.
> > Thanks for the analysis, but I don't think it's quite complete.
> > Here's the call chain I see:
> > 
> >   local_pci_probe
> >     pci_drv->probe(..)
> >       init_one                # hfi1_pci_driver.probe method
> >         hfi1_init_dd
> >           pcie_speeds
> >             pcie_capability_read_dword
> 
> Thank you for pointing out the call chain. After checking it, I noticed that
> the error is handled within the chain in two places without being passed on.
> 
> 1. init_one() in ./drivers/infiniband/hw/hfi1/init.c
> 
>     ret = hfi1_init_dd(dd);
>     if (ret)
>         goto clean_bail; /* error already printed */
> 
>     ...
>     clean_bail:
>         hfi1_pcie_cleanup(pdev);  /*EXITS*/
> 
> 2. hfi1_init_dd() in ./drivers/infiniband/hw/hfi1/chip.c
> 
>     ret = pcie_speeds(dd);
>     if (ret)
>         goto bail_cleanup;
> 
>     ...
> 
>     bail_cleanup:
>         hfi1_pcie_ddcleanup(dd);  /*EXITS*/
> 
> > If pcie_capability_read_dword() returns any non-zero value, that value
> > propagates all the way up and is eventually returned by init_one().
> > init_one() is called by local_pci_probe(), which interprets:
> > 
> >   < 0 as failure
> >     0 as success, and
> >   > 0 as "success but warn"
> > 
> > So previously an error from pcie_capability_read_dword() could cause
> > either failure or "success but warn" for the probe method, and after
> > this patch those errors will always cause "success but warn".
> > 
> > The current behavior is definitely a bug: if
> > pci_bus_read_config_word() returns PCIBIOS_BAD_REGISTER_NUMBER, that
> > causes pcie_capability_read_dword() to also return
> > PCIBIOS_BAD_REGISTER_NUMBER, which will lead to the probe succeeding
> > with a warning, when it should fail.
> > 
> > I think the fix is to make pcie_speeds() cal

Re: [PATCH v2 1/2] PCI/AER: Allow Native AER Host Bridges to use AER

2020-04-24 Thread Bjorn Helgaas
Hi Jon,

I'm glad you raised this because I think the way we handle
FIRMWARE_FIRST is really screwed up.

On Mon, Apr 20, 2020 at 03:37:09PM -0600, Jon Derrick wrote:
> Some platforms have a mix of ports whose capabilities can be negotiated
> by _OSC, and some ports which are not described by ACPI and instead
> managed by Native drivers. The existing Firmware-First HEST model can
> incorrectly tag these Native, Non-ACPI ports as Firmware-First managed
> ports by advertising the HEST Global Flag and matching the type and
> class of the port (aer_hest_parse).
> 
> If the port requests Native AER through the Host Bridge's capability
> settings, the AER driver should honor those settings and allow the port
> to bind. This patch changes the definition of Firmware-First to exclude
> ports whose Host Bridges request Native AER.
> 
> Signed-off-by: Jon Derrick 
> ---
>  drivers/pci/pcie/aer.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index f4274d3..30fbd1f 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -314,6 +314,9 @@ int pcie_aer_get_firmware_first(struct pci_dev *dev)
>   if (pcie_ports_native)
>   return 0;
>  
> + if (pci_find_host_bridge(dev->bus)->native_aer)
> + return 0;

I hope we don't have to complicate pcie_aer_get_firmware_first() by
adding this "native_aer" check here.  I'm not sure what we actually
*should* do based on FIRMWARE_FIRST, but I don't think the current
uses really make sense.

I think Linux makes too many assumptions based on the FIRMWARE_FIRST
bit.  The ACPI spec really only says (ACPI v6.3, sec 18.3.2.4):

  If set, FIRMWARE_FIRST indicates to the OSPM that system firmware
  will handle errors from this source first.

  If FIRMWARE_FIRST is set in the flags field, the Enabled field [of
  the HEST AER structure] is ignored by the OSPM.

I do not see anything there about who owns the AER Capability, but
Linux assumes that if FIRMWARE_FIRST is set, firmware must own the AER
Capability.  I think that's reading too much into the spec.

We already have _OSC, which *does* explicitly talk about who owns the
AER Capability, and I think we should rely on that.  If firmware
doesn't want the OS to touch the AER Capability, it should decline to
give ownership to the OS via _OSC.
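
A minimal sketch of that idea, assuming ownership is decided purely from the _OSC control bits the firmware granted. The struct, the function names, and the bit value used for SKETCH_OSC_AER_CONTROL are illustrative stand-ins, not the kernel definitions.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_OSC_AER_CONTROL	0x08u	/* illustrative bit, an assumption */

struct sketch_host_bridge {
	bool native_aer;	/* true: the OS may use the AER Capability */
};

/* Clear native_aer when firmware declined to grant AER control. */
static void sketch_apply_osc(struct sketch_host_bridge *hb, uint32_t granted)
{
	if (!(granted & SKETCH_OSC_AER_CONTROL))
		hb->native_aer = false;
}

/* Ownership follows _OSC only; the HEST flag is deliberately ignored. */
static bool sketch_os_may_use_aer(const struct sketch_host_bridge *hb,
				  bool hest_firmware_first)
{
	(void)hest_firmware_first;
	return hb->native_aer;
}
```

The point of the second function is only that FIRMWARE_FIRST does not enter the ownership decision; what else the flag should influence is left open here.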

>   if (!dev->__aer_firmware_first_valid)
>   aer_set_firmware_first(dev);
>   return dev->__aer_firmware_first;
> -- 
> 1.8.3.1
> 


Re: [PATCH] PCI: Use of_node_name_eq for node name comparisons

2020-04-24 Thread Bjorn Helgaas
On Thu, Apr 16, 2020 at 04:51:14PM -0500, Rob Herring wrote:
> Convert string compares of DT node names to use of_node_name_eq helper
> instead. This removes direct access to the node name pointer.
> 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: Bjorn Helgaas 
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-...@vger.kernel.org
> Signed-off-by: Rob Herring 

Applied to pci/hotplug for v5.8, thanks!

> ---
>  drivers/pci/hotplug/rpaphp_core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/hotplug/rpaphp_core.c b/drivers/pci/hotplug/rpaphp_core.c
> index 6504869efabc..9887c9de08c3 100644
> --- a/drivers/pci/hotplug/rpaphp_core.c
> +++ b/drivers/pci/hotplug/rpaphp_core.c
> @@ -435,7 +435,7 @@ static int rpaphp_drc_add_slot(struct device_node *dn)
>   */
>  int rpaphp_add_slot(struct device_node *dn)
>  {
> - if (!dn->name || strcmp(dn->name, "pci"))
> + if (!of_node_name_eq(dn, "pci"))
>   return 0;
>  
>   if (of_find_property(dn, "ibm,drc-info", NULL))
> -- 
> 2.20.1
> 
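
For readers unfamiliar with the helper, here is a userspace approximation of the comparison it performs. It assumes the helper matches the last path component of the node's full name up to any '@' unit address; sketch_node_name_eq() and its string argument are hypothetical, since the real helper takes a struct device_node.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in: compare "name" against the node's base name. */
static bool sketch_node_name_eq(const char *full_name, const char *name)
{
	const char *base;
	size_t len;

	if (!full_name)
		return false;

	/* Skip to the last path component. */
	base = strrchr(full_name, '/');
	base = base ? base + 1 : full_name;

	/* Ignore the unit address that follows '@', if any. */
	len = strcspn(base, "@");

	return strlen(name) == len && strncmp(base, name, len) == 0;
}
```

The NULL check is why the separate "!dn->name" test in the old code is no longer needed in this sketch.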


Re: [patch V2 02/15] pci/switchtec: Replace completion wait queue usage for poll

2020-03-18 Thread Bjorn Helgaas
On Wed, Mar 18, 2020 at 09:43:04PM +0100, Thomas Gleixner wrote:
> From: Sebastian Andrzej Siewior 
> 
> The poll callback is using the completion wait queue and sticks it into
> poll_wait() to wake up pollers after a command has completed.
> 
> This works to some extent, but cannot provide EPOLLEXCLUSIVE support
> because the waker side uses complete_all() which unconditionally wakes up
> all waiters. complete_all() is required because completions internally use
> exclusive wait and complete() only wakes up one waiter by default.
> 
> This mixes conceptually different mechanisms and relies on internal
> implementation details of completions, which in turn puts constraints on
> changing the internal implementation of completions.
> 
> Replace it with a regular wait queue and store the state in struct
> switchtec_user.
> 
> Signed-off-by: Sebastian Andrzej Siewior 
> Acked-by: Peter Zijlstra (Intel) 
> Cc: Kurt Schwemmer 
> Cc: Logan Gunthorpe 
> Cc: Bjorn Helgaas 
> Cc: linux-...@vger.kernel.org

Acked-by: Bjorn Helgaas 

But please tweak the subject so it matches the other:

  - pci/switchtec: Replace completion wait queue usage for poll
  + PCI/switchtec: Replace completion wait queue usage for poll

> ---
> V2: Reworded changelog.
> ---
>  drivers/pci/switch/switchtec.c |   22 +-
>  1 file changed, 13 insertions(+), 9 deletions(-)
> 
> --- a/drivers/pci/switch/switchtec.c
> +++ b/drivers/pci/switch/switchtec.c
> @@ -52,10 +52,11 @@ struct switchtec_user {
>  
>   enum mrpc_state state;
>  
> - struct completion comp;
> + wait_queue_head_t cmd_comp;
>   struct kref kref;
>   struct list_head list;
>  
> + bool cmd_done;
>   u32 cmd;
>   u32 status;
>   u32 return_code;
> @@ -77,7 +78,7 @@ static struct switchtec_user *stuser_cre
>   stuser->stdev = stdev;
>   kref_init(&stuser->kref);
>   INIT_LIST_HEAD(&stuser->list);
> - init_completion(&stuser->comp);
> + init_waitqueue_head(&stuser->cmd_comp);
>   stuser->event_cnt = atomic_read(&stdev->event_cnt);
>  
>   dev_dbg(&stdev->dev, "%s: %p\n", __func__, stuser);
> @@ -175,7 +176,7 @@ static int mrpc_queue_cmd(struct switcht
>   kref_get(&stuser->kref);
>   stuser->read_len = sizeof(stuser->data);
>   stuser_set_state(stuser, MRPC_QUEUED);
> - reinit_completion(&stuser->comp);
> + stuser->cmd_done = false;
>   list_add_tail(&stuser->list, &stdev->mrpc_queue);
>  
>   mrpc_cmd_submit(stdev);
> @@ -222,7 +223,8 @@ static void mrpc_complete_cmd(struct swi
>   memcpy_fromio(stuser->data, &stdev->mmio_mrpc->output_data,
> stuser->read_len);
>  out:
> - complete_all(&stuser->comp);
> + stuser->cmd_done = true;
> + wake_up_interruptible(&stuser->cmd_comp);
>   list_del_init(&stuser->list);
>   stuser_put(stuser);
>   stdev->mrpc_busy = 0;
> @@ -529,10 +531,11 @@ static ssize_t switchtec_dev_read(struct
>   mutex_unlock(&stdev->mrpc_mutex);
>  
>   if (filp->f_flags & O_NONBLOCK) {
> - if (!try_wait_for_completion(&stuser->comp))
> + if (!stuser->cmd_done)
>   return -EAGAIN;
>   } else {
> - rc = wait_for_completion_interruptible(&stuser->comp);
> + rc = wait_event_interruptible(stuser->cmd_comp,
> +   stuser->cmd_done);
>   if (rc < 0)
>   return rc;
>   }
> @@ -580,7 +583,7 @@ static __poll_t switchtec_dev_poll(struc
>   struct switchtec_dev *stdev = stuser->stdev;
>   __poll_t ret = 0;
>  
> - poll_wait(filp, &stuser->comp.wait, wait);
> + poll_wait(filp, &stuser->cmd_comp, wait);
>   poll_wait(filp, &stdev->event_wq, wait);
>  
>   if (lock_mutex_and_test_alive(stdev))
> @@ -588,7 +591,7 @@ static __poll_t switchtec_dev_poll(struc
>  
>   mutex_unlock(&stdev->mrpc_mutex);
>  
> - if (try_wait_for_completion(&stuser->comp))
> + if (stuser->cmd_done)
>   ret |= EPOLLIN | EPOLLRDNORM;
>  
>   if (stuser->event_cnt != atomic_read(&stdev->event_cnt))
> @@ -1272,7 +1275,8 @@ static void stdev_kill(struct switchtec_
>  
>   /* Wake up and kill any users waiting on an MRPC request */
>   list_for_each_entry_safe(stuser, tmpuser, &stdev->mrpc_queue, list) {
> - complete_all(&stuser->comp);
> + stuser->cmd_done = true;
> + wake_up_interruptible(&stuser->cmd_comp);
>   list_del_init(&stuser->list);
>   stuser_put(stuser);
>   }
> 
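
The state handling this patch moves to can be pictured with a small userspace model. It covers only the cmd_done flag and the non-blocking read path; the wait queue and locking are left out, and every sketch_ name is hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the per-user state added by the patch. */
struct sketch_user {
	bool cmd_done;
};

/* Queueing a command re-arms the flag. */
static void sketch_queue_cmd(struct sketch_user *u)
{
	u->cmd_done = false;
}

/* Completion sets the flag; kernel code would also wake the wait queue. */
static void sketch_complete_cmd(struct sketch_user *u)
{
	u->cmd_done = true;
}

/* Non-blocking read: report -EAGAIN until the command has completed. */
static int sketch_try_read(const struct sketch_user *u)
{
	return u->cmd_done ? 0 : -EAGAIN;
}
```

Because the flag is plain state rather than a completion counter, testing it from poll or read does not consume anything, which is what makes an ordinary wait queue sufficient in this model.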


Re: [patch V2 01/15] PCI/switchtec: Fix init_completion race condition with poll_wait()

2020-03-18 Thread Bjorn Helgaas
On Wed, Mar 18, 2020 at 09:43:03PM +0100, Thomas Gleixner wrote:
> From: Logan Gunthorpe 
> 
> The call to init_completion() in mrpc_queue_cmd() can theoretically
> race with the call to poll_wait() in switchtec_dev_poll().
> 
>   poll()                             write()
>     switchtec_dev_poll()               switchtec_dev_write()
>       poll_wait(&s->comp.wait);          mrpc_queue_cmd()
>                                            init_completion(&s->comp)
>                                              init_waitqueue_head(&s->comp.wait)
> 
> To my knowledge, no one has hit this bug.
> 
> Fix this by using reinit_completion() instead of init_completion() in
> mrpc_queue_cmd().
> 
> Fixes: 080b47def5e5 ("MicroSemi Switchtec management interface driver")
> Reported-by: Sebastian Andrzej Siewior 
> Signed-off-by: Logan Gunthorpe 
> Signed-off-by: Thomas Gleixner 
> Link: https://lkml.kernel.org/r/20200313183608.2646-1-log...@deltatee.com

Acked-by: Bjorn Helgaas 

Not because I understand and have reviewed this, but because I trust
you to do the right thing and it belongs with the rest of the series.

> ---
>  drivers/pci/switch/switchtec.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/switch/switchtec.c b/drivers/pci/switch/switchtec.c
> index a823b4b8ef8a..81dc7ac01381 100644
> --- a/drivers/pci/switch/switchtec.c
> +++ b/drivers/pci/switch/switchtec.c
> @@ -175,7 +175,7 @@ static int mrpc_queue_cmd(struct switchtec_user *stuser)
>   kref_get(&stuser->kref);
>   stuser->read_len = sizeof(stuser->data);
>   stuser_set_state(stuser, MRPC_QUEUED);
> - init_completion(&stuser->comp);
> + reinit_completion(&stuser->comp);
>   list_add_tail(&stuser->list, &stdev->mrpc_queue);
>  
>   mrpc_cmd_submit(stdev);
> -- 
> 2.20.1
> 
> 


Re: [PATCH -next] PCI: rpaphp: remove set but not used variable 'value'

2020-03-12 Thread Bjorn Helgaas
On Thu, Mar 12, 2020 at 09:38:02AM -0500, Bjorn Helgaas wrote:
> On Thu, Mar 12, 2020 at 10:04:12PM +0800, Chen Zhou wrote:
> > Fixes gcc '-Wunused-but-set-variable' warning:
> > 
> > drivers/pci/hotplug/rpaphp_core.c: In function is_php_type:
> > drivers/pci/hotplug/rpaphp_core.c:291:16: warning:
> > variable value set but not used [-Wunused-but-set-variable]
> > 
> > Reported-by: Hulk Robot 
> > Signed-off-by: Chen Zhou 
> 
> Michael, if you want this:
> 
> Acked-by: Bjorn Helgaas 
> 
> If you don't mind, edit the subject to follow the convention, e.g.,
> 
>   PCI: rpaphp: Remove unused variable 'value'
> 
> Apparently simple_strtoul() is deprecated and we're supposed to use
> kstrtoul() instead.  Looks like kstrtoul() might simplify the code a
> little, too, e.g.,
> 
> >   if (kstrtoul(drc_type, 0, &value) == 0)
> >     return 1;
> 
>   return 0;

I guess there are several other uses of simple_strtoul() in this file.
Not sure if it's worth changing them all, just this one, or just the
patch below as-is.

> > ---
> >  drivers/pci/hotplug/rpaphp_core.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > > diff --git a/drivers/pci/hotplug/rpaphp_core.c b/drivers/pci/hotplug/rpaphp_core.c
> > index e408e40..5d871ef 100644
> > --- a/drivers/pci/hotplug/rpaphp_core.c
> > +++ b/drivers/pci/hotplug/rpaphp_core.c
> > @@ -288,11 +288,10 @@ EXPORT_SYMBOL_GPL(rpaphp_check_drc_props);
> >  
> >  static int is_php_type(char *drc_type)
> >  {
> > -   unsigned long value;
> > char *endptr;
> >  
> > /* PCI Hotplug nodes have an integer for drc_type */
> > > -   value = simple_strtoul(drc_type, &endptr, 10);
> > > +   simple_strtoul(drc_type, &endptr, 10);
> > if (endptr == drc_type)
> > return 0;
> >  
> > -- 
> > 2.7.4
> > 


Re: [PATCH -next] PCI: rpaphp: remove set but not used variable 'value'

2020-03-12 Thread Bjorn Helgaas
On Thu, Mar 12, 2020 at 10:04:12PM +0800, Chen Zhou wrote:
> Fixes gcc '-Wunused-but-set-variable' warning:
> 
> drivers/pci/hotplug/rpaphp_core.c: In function is_php_type:
> drivers/pci/hotplug/rpaphp_core.c:291:16: warning:
>   variable value set but not used [-Wunused-but-set-variable]
> 
> Reported-by: Hulk Robot 
> Signed-off-by: Chen Zhou 

Michael, if you want this:

Acked-by: Bjorn Helgaas 

If you don't mind, edit the subject to follow the convention, e.g.,

  PCI: rpaphp: Remove unused variable 'value'

Apparently simple_strtoul() is deprecated and we're supposed to use
kstrtoul() instead.  Looks like kstrtoul() might simplify the code a
little, too, e.g.,

  if (kstrtoul(drc_type, 0, &value) == 0)
    return 1;

  return 0;
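
For comparison, a userspace approximation of a strict "whole string is an integer" check, which is roughly what a kstrtoul()-based version would give. This uses strtoul() from the C library and is not an exact emulation of kstrtoul(); sketch_is_php_type() is a hypothetical name, and base 10 is kept from the existing call.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Return 1 if drc_type is entirely an unsigned decimal integer, else 0. */
static int sketch_is_php_type(const char *drc_type)
{
	char *end;

	/* Reject empty strings, signs and leading whitespace up front. */
	if (!isdigit((unsigned char)drc_type[0]))
		return 0;

	errno = 0;
	(void)strtoul(drc_type, &end, 10);
	if (errno != 0 || *end != '\0')
		return 0;

	return 1;
}
```

One behavioral difference to keep in mind: the existing code only checks that parsing consumed at least one character, so a string with trailing junk is accepted today but rejected by a strict check like this.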

> ---
>  drivers/pci/hotplug/rpaphp_core.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/rpaphp_core.c b/drivers/pci/hotplug/rpaphp_core.c
> index e408e40..5d871ef 100644
> --- a/drivers/pci/hotplug/rpaphp_core.c
> +++ b/drivers/pci/hotplug/rpaphp_core.c
> @@ -288,11 +288,10 @@ EXPORT_SYMBOL_GPL(rpaphp_check_drc_props);
>  
>  static int is_php_type(char *drc_type)
>  {
> - unsigned long value;
>   char *endptr;
>  
>   /* PCI Hotplug nodes have an integer for drc_type */
> - value = simple_strtoul(drc_type, &endptr, 10);
> + simple_strtoul(drc_type, &endptr, 10);
>   if (endptr == drc_type)
>   return 0;
>  
> -- 
> 2.7.4
> 


Re: [PATCH] powerpc/powernv: Disable native PCIe port management

2019-11-14 Thread Bjorn Helgaas
On Fri, Nov 15, 2019 at 12:34:50AM +1100, Oliver O'Halloran wrote:
> On Thu, Nov 14, 2019 at 1:31 AM Bjorn Helgaas  wrote:
> >
> > This is fine, but it feels like sort of a blunt instrument.  Is there
> > any practical way to clear pci_host_bridge.native_pcie_hotplug (and
> > native_aer if appropriate) for the PHBs in question? That would also
> > prevent pciehp from binding.
> 
> It is a large hammer, but I don't see a better way to handle it for
> the moment. I had another look and my initial assessment was wrong in
> that it's the portbus driver which claims the MSI rather than pciehp
> itself. The MSI in the PCIe capability is shared between hotplug
> events, PMEs, and BW notifications so to make the portbus concept work
> the portbus driver needs to own the interrupt. Basically, pnv_php and
> portbus are fundamentally at odds with each other and can't be used
> concurrently.

Yeah, that makes sense.  Is there a Kconfig symbol for pnv_php?  If
so, you could make CONFIG_PCIEPORTBUS unselectable in the first place.

But I'm guessing there isn't such a symbol because you probably want
to be able to build generic kernels that run on machines that *can*
use portdrv as well as on PowerNV.

So I'm fine with the patch as posted.

Bjorn


Re: [PATCH] powerpc/powernv: Disable native PCIe port management

2019-11-13 Thread Bjorn Helgaas
On Wed, Nov 13, 2019 at 08:40:35PM +1100, Oliver O'Halloran wrote:
> On PowerNV the PCIe topology is (currently) managed by the powernv platform
> code in cooperation with firmware. The PCIe-native service drivers bypass
> both and this can cause problems.
> 
> Historically this hasn't been a big deal since the only port service
> driver that saw much use was the AER driver. The AER driver relies on
> a kernel service to report when errors occur rather than acting autonomously
> so it's fairly easy to ignore. On PowerNV (and pseries) AER events are
> handled through EEH, which ignores the AER service, so it's never been
> an issue.
> 
> Unfortunately, the hotplug port service driver (pciehp) does act
> autonomously and conflicts with the platform specific hotplug
> driver (pnv_php). The main issue is that pciehp claims the interrupt
> associated with the PCIe capability which in turn prevents pnv_php from
> claiming it.
> 
> This results in hotplug events being handled by pciehp which does not
> notify firmware when the PCIe topology changes, and does not setup/teardown
> the arch specific PCI device structures (pci_dn) when the topology changes.
> The end result is that hot-added devices cannot be enabled and hot-removed
> devices may not be fully torn-down on removal.
> 
> We can fix these problems by setting the "pcie_ports_disabled" flag during
> platform initialisation. The flag indicates the platform owns the PCIe
> ports which stops the portbus driver being registered.
> 
> Cc: Sergey Miroshnichenko 
> Fixes: 66725152fb9f ("PCI/hotplug: PowerPC PowerNV PCI hotplug driver")
> Signed-off-by: Oliver O'Halloran 
> ---
> Sergey, just FYI. I'll try sort out the rest of the hotplug
> trainwreck in 5.6.
> 
> The Fixes: here is for the patch that added pnv_php in 4.8. It's been
> a problem since then, but wasn't noticed until people started testing
> it after the EEH fixes in commit 799abe283e51 ("powerpc/eeh: Clean up
> EEH PEs after recovery finishes") went in earlier in the 5.4 cycle.
> ---
>  arch/powerpc/platforms/powernv/pci.c | 17 +
>  1 file changed, 17 insertions(+)
> 
> diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
> index 2825d00..ae62583 100644
> --- a/arch/powerpc/platforms/powernv/pci.c
> +++ b/arch/powerpc/platforms/powernv/pci.c
> @@ -941,6 +941,23 @@ void __init pnv_pci_init(void)
>  
>   pci_add_flags(PCI_CAN_SKIP_ISA_ALIGN);
>  
> +#ifdef CONFIG_PCIEPORTBUS
> + /*
> +  * On PowerNV PCIe devices are (currently) managed in cooperation
> +  * with firmware. This isn't *strictly* required, but there's enough
> +  * assumptions baked into both firmware and the platform code that
> +  * it's unwise to allow the portbus services to be used.
> +  *
> +  * We need to fix this eventually, but for now set this flag to disable
> +  * the portbus driver. The AER service isn't required since that AER
> +  * events are handled via EEH. The pciehp hotplug driver can't work
> +  * without kernel changes (and portbus binding breaks pnv_php). The
> +  * other services also require some thinking about how we're going
> +  * to integrate them.
> +  */
> + pcie_ports_disabled = true;
> +#endif

This is fine, but it feels like sort of a blunt instrument.  Is there
any practical way to clear pci_host_bridge.native_pcie_hotplug (and
native_aer if appropriate) for the PHBs in question?  That would also
prevent pciehp from binding.

We might someday pull portdrv into the PCI core directly instead of as
a separate driver, and I'm thinking that might be easier if we have
more specific indications of what the core shouldn't use.
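
To make the contrast concrete, here is a small model of the two kinds of switch: one global flag that disables every port service, versus per-host-bridge flags that turn off individual services. The struct and function are hypothetical simplifications and do not reproduce the real portdrv logic.

```c
#include <stdbool.h>

/* Hypothetical per-bridge ownership flags. */
struct sketch_host_bridge {
	bool native_pcie_hotplug;
	bool native_aer;
};

/* Would a native hotplug service be allowed to bind under this bridge? */
static bool sketch_hotplug_may_bind(bool ports_disabled,
				    const struct sketch_host_bridge *hb)
{
	if (ports_disabled)
		return false;	/* blunt: nothing binds anywhere */

	return hb->native_pcie_hotplug;	/* specific: decided per bridge */
}
```

In this model, clearing native_pcie_hotplug on only the affected bridges would keep the hotplug service away from them while leaving other services and other bridges alone.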

>   /* If we don't have OPAL, eg. in sim, just skip PCI probe */
>   if (!firmware_has_feature(FW_FEATURE_OPAL))
>   return;
> -- 
> 2.9.5
> 


Re: [PATCH v2 0/9] Fixes and Enablement of ibm,drc-info property

2019-11-11 Thread Bjorn Helgaas
On Sun, Nov 10, 2019 at 11:21:27PM -0600, Tyrel Datwyler wrote:
> There was a previous effort to add support for the PAPR
> architected ibm,drc-info property. This property provides a more
> memory compact representation of a partition's Dynamic Reconfig
> Connectors (DRC). These can otherwise be thought of as currently
> partitioned, or available but yet to be partitioned system resources
> such as cpus, memory, and physical/logical IOA devices.
> 
> The initial implementation proved buggy and was fully turned off by
> disabling the bit in the appropriate CAS support vector. We now have
> PowerVM firmware in the field that supports this new property, and
> further to support partitions with 24TB+ of possible memory this
> property is required to perform platform migration.
> 
> This series fixes the shortcomings of the previous submission
> in the areas of general implementation, cpu hotplug, and IOA hotplug.
> 
> v2 changelog:
>   Cover Letter: fixed up spelling errors (mpe, tfalcon)
>   Patch 3: added comment regarding indexing of drc values (tfalcon)
>split drc-index and drc-info logic into multiple
>functions for collecting cpu drc's for dlpar (mpe)
>   Patch 7: fix up a couple more sparse warnings (mpe)
> 
> Tyrel Datwyler (9):
>   powerpc/pseries: Fix bad drc_index_start value parsing of drc-info
> entry
>   powerpc/pseries: Fix drc-info mappings of logical cpus to drc-index
>   powerpc/pseries: Add cpu DLPAR support for drc-info property
>   PCI: rpaphp: Fix up pointer to first drc-info entry
>   PCI: rpaphp: Don't rely on firmware feature to imply drc-info support
>   PCI: rpaphp: Add drc-info support for hotplug slot registration
>   PCI: rpaphp: Annotate and correctly byte swap DRC properties
>   PCI: rpaphp: Correctly match ibm,my-drc-index to drc-name when using
> drc-info
>   powerpc/pseries: Enable support for ibm,drc-info property
> 
>  arch/powerpc/kernel/prom_init.c                 |   2 +-
>  arch/powerpc/platforms/pseries/hotplug-cpu.c    | 127 +---
>  arch/powerpc/platforms/pseries/of_helpers.c     |   8 +-
>  arch/powerpc/platforms/pseries/pseries_energy.c |  23 ++---
>  drivers/pci/hotplug/rpaphp_core.c               | 127 +---

For the drivers/pci/* parts:

Acked-by: Bjorn Helgaas 

I assume they will be merged along with the rest of the series via
powerpc.

>  5 files changed, 216 insertions(+), 71 deletions(-)
> 
> -- 
> 2.7.4
> 


Re: [PATCH v2 0/2] Enabling MSI for Microblaze

2019-10-25 Thread Bjorn Helgaas
On Fri, Oct 25, 2019 at 08:10:36AM +0200, Michal Simek wrote:
> Hi,
> 
> these two patches come from discussion with Christoph, Bjorn, Palmer and
> Waiman. The first patch was suggested by Christoph here
> https://lore.kernel.org/linux-riscv/20191008154604.ga7...@infradead.org/
> The second part was discussed
> https://lore.kernel.org/linux-pci/mhng-5d9bcb53-225e-441f-86cc-b335624b3e7c@palmer-si-x1e/
> and
> https://lore.kernel.org/linux-pci/20191017181937.7004-1-pal...@sifive.com/
> 
> Thanks,
> Michal
> 
> Changes in v2:
> - Fix typo in commit message s/expect/except/ - Reported-by: Masahiro
> 
> Michal Simek (1):
>   asm-generic: Make msi.h a mandatory include/asm header
> 
> Palmer Dabbelt (1):
>   pci: Default to PCI_MSI_IRQ_DOMAIN
> 
>  arch/arc/include/asm/Kbuild | 1 -
>  arch/arm/include/asm/Kbuild | 1 -
>  arch/arm64/include/asm/Kbuild   | 1 -
>  arch/mips/include/asm/Kbuild| 1 -
>  arch/powerpc/include/asm/Kbuild | 1 -
>  arch/riscv/include/asm/Kbuild   | 1 -
>  arch/sparc/include/asm/Kbuild   | 1 -
>  drivers/pci/Kconfig | 2 +-
>  include/asm-generic/Kbuild  | 1 +
>  9 files changed, 2 insertions(+), 8 deletions(-)

I applied these to pci/msi for v5.5, thanks!


Re: Oxford Semiconductor Ltd OX16PCI954 - weird dmesg

2019-10-25 Thread Bjorn Helgaas
On Fri, Oct 25, 2019 at 04:33:13PM +0200, Carlo Pisani wrote:
> pci_bus :00: root bus resource [mem 0x5000-0x5fff]
> pci_bus :00: root bus resource [io  0x1880-0x188f]
> pci_bus :00: root bus resource [??? 0x flags 0x0]
> pci_bus :00: No busn resource found for root bus, will use [bus 00-ff]
> pci :00:00.0: [Firmware Bug]: reg 0x14: invalid BAR (can't size)
> pci :00:00.0: [Firmware Bug]: reg 0x18: invalid BAR (can't size)
> pci :00:04.0: BAR 0: assigned [mem 0x5000-0x5000]
> pci :00:05.0: BAR 1: assigned [mem 0x5001-0x50010fff]
> pci :00:05.0: BAR 3: assigned [mem 0x50011000-0x50011fff]
> pci :00:0a.0: BAR 1: assigned [mem 0x50012000-0x50012fff]
> pci :00:0a.0: BAR 3: assigned [mem 0x50013000-0x50013fff]
> pci :00:02.0: BAR 0: assigned [io  0x1880-0x188000ff]
> pci :00:02.0: BAR 1: assigned [mem 0x50014000-0x500140ff]
> pci :00:03.0: BAR 0: assigned [io  0x18800400-0x188004ff]
> pci :00:03.0: BAR 1: assigned [mem 0x50014100-0x500141ff]
> pci :00:05.0: BAR 0: assigned [io  0x18800800-0x1880081f]
> pci :00:05.0: BAR 2: assigned [io  0x18800820-0x1880083f]
> pci :00:0a.0: BAR 0: assigned [io  0x18800840-0x1880085f]
> pci :00:0a.0: BAR 2: assigned [io  0x18800860-0x1880087f]
> 
> 
> 00:00.0 Non-VGA unclassified device: Integrated Device Technology,
> Inc. Device 
> Subsystem: Device 0214:011d
> Flags: bus master, 66MHz, medium devsel, latency 60, IRQ 140
> Memory at  (32-bit, prefetchable)
> I/O ports at 
> I/O ports at 
> 
> 00:02.0 Ethernet controller: VIA Technologies, Inc. VT6105/VT6106S
> [Rhine-III] (rev 86)
> Subsystem: AST Research Inc Device 086c
> Flags: bus master, stepping, medium devsel, latency 64, IRQ 142
> I/O ports at 1880 [size=256]
> Memory at 50014000 (32-bit, non-prefetchable) [size=256]
> Capabilities: [40] Power Management version 2
> Kernel driver in use: via-rhine
> 
> 00:03.0 Ethernet controller: VIA Technologies, Inc. VT6105/VT6106S
> [Rhine-III] (rev 86)
> Subsystem: AST Research Inc Device 086c
> Flags: bus master, stepping, medium devsel, latency 64, IRQ 143
> I/O ports at 18800400 [size=256]
> Memory at 50014100 (32-bit, non-prefetchable) [size=256]
> Capabilities: [40] Power Management version 2
> Kernel driver in use: via-rhine
> 
> 00:04.0 Network controller: Atheros Communications Inc. Device 0029 (rev 01)
> Subsystem: Atheros Communications Inc. Device 2091
> Flags: bus master, 66MHz, medium devsel, latency 168, IRQ 142
> Memory at 5000 (32-bit, non-prefetchable) [size=64K]
> Capabilities: [44] Power Management version 2
> Kernel driver in use: ath9k
> 
> 00:05.0 Serial controller: Oxford Semiconductor Ltd OX16PCI954 (Quad
> 16950 UART) function 0 (Uart) (rev 01) (prog-if 06 [)
> Subsystem: Oxford Semiconductor Ltd Device 
> Flags: medium devsel, IRQ 143
> I/O ports at 18800800 [size=32]
> Memory at 5001 (32-bit, non-prefetchable) [size=4K]
> I/O ports at 18800820 [size=32]
> Memory at 50011000 (32-bit, non-prefetchable) [size=4K]
> Capabilities: [40] Power Management version 2
> Kernel driver in use: serial
> 
> 00:05.1 Non-VGA unclassified device: Oxford Semiconductor Ltd
> OX16PCI954 (Quad 16950 UART) function 0 (Disabled) (rev 01)
> Subsystem: Oxford Semiconductor Ltd Device 
> Flags: medium devsel, IRQ 143
> I/O ports at  [disabled]
> I/O ports at  [disabled]
> I/O ports at  [disabled]
> Capabilities: [40] Power Management version 2
> 
> 00:0a.0 Serial controller: Oxford Semiconductor Ltd OX16PCI954 (Quad
> 16950 UART) function 0 (Uart) (rev 01) (prog-if 06 [)
> Subsystem: Oxford Semiconductor Ltd Device 
> Flags: medium devsel, IRQ 140
> I/O ports at 18800840 [size=32]
> Memory at 50012000 (32-bit, non-prefetchable) [size=4K]
> I/O ports at 18800860 [size=32]
> Memory at 50013000 (32-bit, non-prefetchable) [size=4K]
> Capabilities: [40] Power Management version 2
> Kernel driver in use: serial
> 
> 00:0a.1 Non-VGA unclassified device: Oxford Semiconductor Ltd
> OX16PCI954 (Quad 16950 UART) function 0 (Disabled) (rev 01)
> Subsystem: Oxford Semiconductor Ltd Device 
> Flags: medium devsel, IRQ 140
> I/O ports at  [disabled]
> I/O ports at  [disabled]
> I/O ports at  [disabled]
> Capabilities: [40] Power Management version 2
> 
> 
> hi guys
> I have a couple of miniPCI Oxford Semiconductor Ltd OX16PCI954 cards
> installed, and the dmesg looks weird
> 
> especially these lines
> pci_bus :00: root bus resource [mem 0x5000-0x5fff]
> pci_bus :00: root bus resource [io  0x1880-0x188f]
> pci_bus :00: root bus 

Re: [PATCH v5 03/23] PCI: hotplug: Add a flag for the movable BARs feature

2019-10-16 Thread Bjorn Helgaas
On Wed, Oct 16, 2019 at 06:50:30PM +0300, Sergey Miroshnichenko wrote:
> On 10/16/19 1:14 AM, Bjorn Helgaas wrote:
> > On Mon, Sep 30, 2019 at 03:59:25PM +0300, Sergey Miroshnichenko wrote:
> > > On 9/28/19 1:02 AM, Bjorn Helgaas wrote:

> > > > It's possible that a hot-add will trigger this attempt to move things
> > > > around, and it's possible that we won't find space for the new device
> > > > even if we move things around.  But are we certain that every device
> > > > that worked *before* the hot-add will still work *afterwards*?
> > > > 
> > > > Much of the assignment was probably done by the BIOS using different
> > > > algorithms than Linux has, so I think there's some chance that the
> > > > BIOS did a better job and if we lose that BIOS assignment, we might
> > > > not be able to recreate it.
> > > 
> > > If the hardware has special constraints on BAR assignment that the
> > > kernel is not yet aware of, movable BARs may break things after a
> > > hotplug event. So the feature must be disabled there (manually) until
> > > the kernel gets support for those special needs.
> > 
> > I'm not talking about special constraints on BAR assignment.  (I'm not
> > sure what those constraints would be -- AFAIK the constraints for a
> > spec-compliant device are all discoverable via the BAR size and type
> > (or the Enhanced Allocation capability)).
> > 
> > What I'm concerned about is the case where we boot with a working
> > assignment, we hot-add a device, we move things around to try to
> > accommodate the new device, and not only do we fail to find resources
> > for the new device, we also fail to find a working assignment for the
> > devices that were present at boot.  We've moved things around from
> > what BIOS did, and since we use a different algorithm than the BIOS,
> > there's no guarantee that we'll be able to find the assignment BIOS
> > did.
> 
> If BAR assignment fails with a hot-added device, these patches will
> disable BARs for this device and retry, falling back to the situation
> where number of BARs and their size are the same as they were before
> the hotplug event.
> 
> If all the BARs are immovable - they will just remain on their
> positions. Nothing to break here I guess.
> 
> If almost all the BARs are immovable and there is one movable BAR,
> after releasing the bridge windows there will be a free gap - right
> where this movable BAR was. These patches are keeping the size of
> released BARs, not requesting the size from the devices again - so the
> device can't ask for a larger BAR. The space reserving is disabled by
> this patchset, so the kernel will request the same size for the bridge
> window containing this movable BAR. So there always will be a gap for
> this BAR - in the same location it was before.
> 
> Based on these considerations I assume that the kernel is always able
> to arrange BARs from scratch if a BIOS was able to make it before.
> 
> But! There is an implicit assumption that there will be the same
> number of BARs after the fallback (which is equivalent to a PCI rescan
> triggered on an unchanged topology). And two weeks ago I found that
> this is not always true!
> 
> I was testing on a "new" x86_64 PC, where BIOS doesn't reserve a space
> for SR-IOV BARs (of a network adapter). On the boot, the kernel wasn't
> arranging BARs itself - it took values written by the BIOS. And the
> bridge window was "jammed" between immovable BARs, so it can't expand.
> BARs of this device are also immovable, so the bridge window can't be
> moved away. During the PCI rescan, the kernel tried to allocate both
> "regular" and SR-IOV BARs - and failed. Even without changes in the
> PCI topology.
> 
> So in the next version of this series there will be one more patch,
> that allows the kernel to ignore BIOS's setting for the "safe" (non-IO
> and non-VGA) BARs, so these BARs will be arranged kernel-way - and
> also those forgotten by the BIOS.

This still seems a little scary, so I'll probably ask about it again :)

> After modifying the code as you advised, it became possible to mark
> only some BARs of the device as immovable. So the code is less ugly
> now, and it also works for drivers/video/fbdev/efifb.c , which uses
> the BAR in a weird way (dev->driver is NULL, but not the res->child):
> 
>   static bool pci_dev_movable(struct pci_dev *dev,
>   bool res_has_children)
>   {
> if (!pci_can_move_bars)
>   return false;
> 
> if (dev->driver && dev->driver->rescan_

Re: [PATCH v5 03/23] PCI: hotplug: Add a flag for the movable BARs feature

2019-10-15 Thread Bjorn Helgaas
On Mon, Sep 30, 2019 at 03:59:25PM +0300, Sergey Miroshnichenko wrote:
> Hello Bjorn,
> 
> On 9/28/19 1:02 AM, Bjorn Helgaas wrote:
> > On Fri, Aug 16, 2019 at 07:50:41PM +0300, Sergey Miroshnichenko wrote:
> > > When hot-adding a device, the bridge may have windows not big enough (or
> > > fragmented too much) for newly requested BARs to fit in. And expanding
> > > these bridge windows may be impossible because blocked by "neighboring"
> > > BARs and bridge windows.
> > > 
> > > > Still, it may be possible to allocate a memory region for new BARs
> > > > with the following procedure:
> > > 
> > > > 1) notify all the drivers which support movable BARs to pause and
> > > > release the BARs; the rest of the drivers are guaranteed that their
> > > > devices will not get BARs moved;
> > > 
> > > 2) release all the bridge windows except of root bridges;
> > > 
> > > 3) try to recalculate new bridge windows that will fit all the BAR types:
> > > - fixed;
> > > - immovable;
> > > - movable;
> > > - newly requested by hot-added devices;
> > > 
> > > 4) if the previous step fails, disable BARs for one of the hot-added
> > > devices and retry from step 3;
> > > 
> > > 5) notify the drivers, so they remap BARs and resume.
> > 
> > You don't do the actual recalculation in *this* patch, but since you
> > mention the procedure here, are we confident that we never make things
> > worse?
> > 
> > It's possible that a hot-add will trigger this attempt to move things
> > around, and it's possible that we won't find space for the new device
> > even if we move things around.  But are we certain that every device
> > that worked *before* the hot-add will still work *afterwards*?
> > 
> > Much of the assignment was probably done by the BIOS using different
> > algorithms than Linux has, so I think there's some chance that the
> > BIOS did a better job and if we lose that BIOS assignment, we might
> > not be able to recreate it.
> 
> If the hardware has special constraints on BAR assignment that the
> kernel is not yet aware of, movable BARs may break things after a
> hotplug event. So the feature must be disabled there (manually) until
> the kernel gets support for those special needs.

I'm not talking about special constraints on BAR assignment.  (I'm not
sure what those constraints would be -- AFAIK the constraints for a
spec-compliant device are all discoverable via the BAR size and type
(or the Enhanced Allocation capability)).

What I'm concerned about is the case where we boot with a working
assignment, we hot-add a device, we move things around to try to
accommodate the new device, and not only do we fail to find resources
for the new device, we also fail to find a working assignment for the
devices that were present at boot.  We've moved things around from
what BIOS did, and since we use a different algorithm than the BIOS,
there's no guarantee that we'll be able to find the assignment BIOS
did.

> > I'm not sure why the PCI_CLASS_DISPLAY_VGA special case is there; can
> > you add a comment about why that's needed?  Obviously we can't move
> > the 0xa legacy frame buffer because I think devices are allowed to
> > claim that region even if no BAR describes it.  But I would think
> > *other* BARs of VGA devices could be movable.
> 
> Sure, I'll add a comment to the code.
> 
> The issue that we are avoiding by that is the "nomodeset" command line
> argument, which prevents a video driver from being bound, so the BARs
> seem to be in use but can't be moved, otherwise machines just hang
> after hotplug events. That was the only special ugly case we've
> spotted during testing. I'll check if it will be enough just to work
> around the 0xa.

"nomodeset" is not really documented and is a funny way to say "don't
bind video drivers that know about it", but OK.  Thanks for checking
on the other BARs.

> > > +bool pci_movable_bars_enabled(void);
> > 
> > I would really like it if this were simply
> > 
> >extern bool pci_no_movable_bars;
> > 
> > in drivers/pci/pci.h.  It would default to false since it's
> > uninitialized, and "pci=no_movable_bars" would set it to true.
> 
> I have a premonition of platforms that will not support the feature.
> Wouldn't be better to put this variable-flag to include/linux/pci.h ,
> so code in arch/* can set it, so they could work by default, without
> the command line argument?

In general I don't see why a platform wouldn't support this since
there really isn't anything platform-specific here.  But if a platform
does need to disable it, having arch code set this flag sounds
reasonable.  We shouldn't make it globally visible until we actually
need that, though.
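
To make the single negative flag concrete, here is a minimal userspace
sketch, assuming a comma-separated "pci=" option string.
parse_pci_option() is a hypothetical stand-in, not the kernel's own
option parser; only the flag name and the "no_movable_bars" word come
from the discussion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Zero-initialized, so it defaults to false and movable BARs stay on. */
static bool pci_no_movable_bars;

/*
 * Hypothetical parser for the value of a "pci=" style option, i.e. a
 * comma-separated list of words.  Only "no_movable_bars" is recognized
 * here; every other word is skipped.
 */
static void parse_pci_option(const char *str)
{
	static const char word[] = "no_movable_bars";

	assert(str != NULL);

	while (*str) {
		size_t len = strcspn(str, ",");	/* length of current word */

		if (len == strlen(word) && !strncmp(str, word, len))
			pci_no_movable_bars = true;

		str += len;
		if (*str == ',')
			str++;
	}
}
```

With this shape there is only one state to reason about: the feature is
on unless "pci=no_movable_bars" was given, and arch code could set the
same flag if that ever turns out to be needed.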

> > We have similar "=off" and "=force" parameters for ASPM and other
> > things, and it makes the code really hard to analyze.

The "=off" and "=force" things are the biggest things I'd like to
avoid.

Bjorn


Re: [RFC PATCH 0/9] Fixes and Enablement of ibm,drc-info property

2019-10-01 Thread Bjorn Helgaas
On Tue, Oct 01, 2019 at 01:12:05AM -0500, Tyrel Datwyler wrote:
> There was an initial previous effort to add support for the PAPR
> architected ibm,drc-info property. This property provides a more
> memory compact representation of a partition's Dynamic Reconfig
> Connectors (DRC). These can otherwise be thought of as the currently
> partitioned, or available, but yet to be partitioned, system resources
> such as cpus, memory, and physical/logical IOA devices.
> 
> The initial implementation proved buggy and was fully turned off by
> disabling the bit in the appropriate CAS support vector. We now have
> PowerVM firmware in the field that supports this new property, and
> further to support partitions with 24TB+ of possible memory this
> property is required to perform platform migration.
> 
> This series fixes up the short comings of the previous implementation
> in the areas of general implementation, cpu hotplug, and IOA hotplug.
> 
> Tyrel Datwyler (9):
>   powerpc/pseries: add cpu DLPAR support for drc-info property
>   powerpc/pseries: fix bad drc_index_start value parsing of drc-info
> entry
>   powerpc/pseries: fix drc-info mappings of logical cpus to drc-index
>   PCI: rpaphp: fix up pointer to first drc-info entry
>   PCI: rpaphp: don't rely on firmware feature to imply drc-info support
>   PCI: rpaphp: add drc-info support for hotplug slot registration
>   PCI: rpaphp: annotate and correctly byte swap DRC properties
>   PCI: rpaphp: correctly match ibm,my-drc-index to drc-name when using
> drc-info
>   powerpc: Enable support for ibm,drc-info property
> 
>  arch/powerpc/kernel/prom_init.c                 |   2 +-
>  arch/powerpc/platforms/pseries/hotplug-cpu.c    | 117 --
>  arch/powerpc/platforms/pseries/of_helpers.c     |   8 +-
>  arch/powerpc/platforms/pseries/pseries_energy.c |  23 ++---
>  drivers/pci/hotplug/rpaphp_core.c               | 124 +---
>  5 files changed, 191 insertions(+), 83 deletions(-)

Michael, I assume you'll take care of this.  If I were applying, I
would capitalize the commit subjects and fix the typos in the commit
logs, e.g.,

  s/the this/the/
  s/the the/that the/
  s/short coming/shortcoming/
  s/seperate/separate/
  s/bid endian/big endian/
  s/were appropriate/where appropriate/
  s/name form/name from/

etc.  git am also complains about space before tab whitespace errors.
And it adds a few lines >80 chars.


Re: [PATCH 1/3] powernv/iov: Ensure the pdn for VFs always contains a valid PE number

2019-09-30 Thread Bjorn Helgaas
On Mon, Sep 30, 2019 at 12:08:46PM +1000, Oliver O'Halloran wrote:

This is all powerpc, so I assume Michael will handle this.  Just
random things I noticed; ignore if they don't make sense:

> On PowerNV we use the pcibios_sriov_enable() hook to do two things:
> 
> 1. Create a pci_dn structure for each of the VFs, and
> 2. Configure the PHB's internal BARs that map MMIO ranges to PEs
>so that each VF has it's own PE. Note that the PE also determines

s/it's/its/

>the IOMMU table the HW uses for the device.
> 
> Currently we do not set the pe_number field of the pci_dn immediately after
> assigning the PE number for the VF that it represents. Instead, we do that
> in a fixup (see pnv_pci_dma_dev_setup) which is run inside the
> pcibios_add_device() hook which is run prior to adding the device to the
> bus.
> 
> On PowerNV we add the device to it's IOMMU group using a bus notifier and

s/it's/its/

> in order for this to work the PE number needs to be known when the bus
> notifier is run. This works today since the PE number is set in the fixup
> which runs before adding the device to the bus. However, if we want to move
> the fixup to a later stage this will break.
> 
> We can fix this by setting the pdn->pe_number inside of
> pcibios_sriov_enable(). There's no good to avoid this since we already have

s/no good/no good reason/ ?

Not quite sure what "this" refers to ... "no good reason to avoid
setting pdn->pe_number in pcibios_sriov_enable()"?  The double
negative makes it a little hard to parse.

> all the required information at that point, so... do that. Moving this
> earlier does cause two problems:
> 
> 1. We trip the WARN_ON() in the fixup code, and
> 2. The EEH core will clear pdn->pe_number while recovering VFs.
> 
> The only justification for either of these is a comment in eeh_rmv_device()
> suggesting that pdn->pe_number *must* be set to IODA_INVALID_PE in order
> for the VF to be scanned. However, this comment appears to have no basis in
> reality so just delete it.
> 
> Signed-off-by: Oliver O'Halloran 
> ---
> Can't get rid of the fixup entirely since we need it to set the
> ioda_pe->pdev back-pointer. I'll look at killing that another time.
> ---
>  arch/powerpc/kernel/eeh_driver.c  |  6 --
>  arch/powerpc/platforms/powernv/pci-ioda.c | 19 +++
>  arch/powerpc/platforms/powernv/pci.c  |  4 
>  3 files changed, 15 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
> index d9279d0..7955fba 100644
> --- a/arch/powerpc/kernel/eeh_driver.c
> +++ b/arch/powerpc/kernel/eeh_driver.c
> @@ -541,12 +541,6 @@ static void eeh_rmv_device(struct eeh_dev *edev, void *userdata)
>  
>   pci_iov_remove_virtfn(edev->physfn, pdn->vf_index);
>   edev->pdev = NULL;
> -
> - /*
> -  * We have to set the VF PE number to invalid one, which is
> -  * required to plug the VF successfully.
> -  */
> - pdn->pe_number = IODA_INVALID_PE;
>  #endif
>   if (rmv_data)
>   list_add(&edev->rmv_entry, &rmv_data->removed_vf_list);
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 5e3172d..70508b3 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1558,6 +1558,10 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>  
>   /* Reserve PE for each VF */
>   for (vf_index = 0; vf_index < num_vfs; vf_index++) {
> + int vf_devfn = pci_iov_virtfn_devfn(pdev, vf_index);
> + int vf_bus = pci_iov_virtfn_bus(pdev, vf_index);
> + struct pci_dn *vf_pdn;
> +
>   if (pdn->m64_single_mode)
>   pe_num = pdn->pe_num_map[vf_index];
>   else
> @@ -1570,13 +1574,11 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>   pe->pbus = NULL;
>   pe->parent_dev = pdev;
>   pe->mve_number = -1;
> - pe->rid = (pci_iov_virtfn_bus(pdev, vf_index) << 8) |
> -pci_iov_virtfn_devfn(pdev, vf_index);
> + pe->rid = (vf_bus << 8) | vf_devfn;
>  
>   pe_info(pe, "VF %04d:%02d:%02d.%d associated with PE#%x\n",

Not related to *this* patch, but this looks like maybe it's supposed
to match the pci_name(), e.g., "%04x:%02x:%02x.%d" from
pci_setup_device()?  If so, the "%04d:%02d:%02d" here will be
confusing since the decimal & hex won't always match.
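
As a quick illustration of the mismatch, the userspace sketch below
formats the same domain/bus/devfn both ways.  The helper names are
made up for this example; the slot/function bit layout is the same one
the PCI_SLOT() and PCI_FUNC() macros use.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same bit layout as PCI_SLOT()/PCI_FUNC(): 5 bits of slot, 3 of function. */
#define TOY_SLOT(devfn)	(((devfn) >> 3) & 0x1f)
#define TOY_FUNC(devfn)	((devfn) & 0x07)

/* Hex style, "%04x:%02x:%02x.%d", as suggested in the review comment. */
static int format_hex(char *buf, size_t len, int domain, int bus, int devfn)
{
	assert(buf != NULL);
	return snprintf(buf, len, "%04x:%02x:%02x.%d",
			domain, bus, TOY_SLOT(devfn), TOY_FUNC(devfn));
}

/* Decimal style, "%04d:%02d:%02d.%d", as in the quoted pe_info() call. */
static int format_dec(char *buf, size_t len, int domain, int bus, int devfn)
{
	assert(buf != NULL);
	return snprintf(buf, len, "%04d:%02d:%02d.%d",
			domain, bus, TOY_SLOT(devfn), TOY_FUNC(devfn));
}

/* Returns nonzero when both styles print the same string. */
static int formats_agree(int domain, int bus, int devfn)
{
	char hex[32], dec[32];

	format_hex(hex, sizeof(hex), domain, bus, devfn);
	format_dec(dec, sizeof(dec), domain, bus, devfn);
	return strcmp(hex, dec) == 0;
}
```

For bus 0x1a, slot 0x10, function 2 the hex style gives "0000:1a:10.2"
while the decimal style gives "0000:26:16.2"; the two only agree while
every field is below 10.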

>   hose->global_number, pdev->bus->number,

Consider pci_domain_nr(bus) instead of hose->global_number?  It would
be nice if you had the pci_dev * for each VF so you could just use
pci_name(vf) instead of all this domain/bus/PCI_SLOT/FUNC.

> - PCI_SLOT(pci_iov_virtfn_devfn(pdev, vf_index)),
> - 

Re: [PATCH v5 03/23] PCI: hotplug: Add a flag for the movable BARs feature

2019-09-27 Thread Bjorn Helgaas
On Fri, Aug 16, 2019 at 07:50:41PM +0300, Sergey Miroshnichenko wrote:
> When hot-adding a device, the bridge may have windows not big enough (or
> fragmented too much) for newly requested BARs to fit in. And expanding
> these bridge windows may be impossible because blocked by "neighboring"
> BARs and bridge windows.
> 
> Still, it may be possible to allocate a memory region for new BARs with the
> following procedure:
> 
> 1) notify all the drivers which support movable BARs to pause and release
>the BARs; the rest of the drivers are guaranteed that their devices will
>not get BARs moved;
> 
> 2) release all the bridge windows except of root bridges;
> 
> 3) try to recalculate new bridge windows that will fit all the BAR types:
>- fixed;
>- immovable;
>- movable;
>- newly requested by hot-added devices;
> 
> 4) if the previous step fails, disable BARs for one of the hot-added
>devices and retry from step 3;
> 
> 5) notify the drivers, so they remap BARs and resume.

You don't do the actual recalculation in *this* patch, but since you
mention the procedure here, are we confident that we never make things
worse?

It's possible that a hot-add will trigger this attempt to move things
around, and it's possible that we won't find space for the new device
even if we move things around.  But are we certain that every device
that worked *before* the hot-add will still work *afterwards*?

Much of the assignment was probably done by the BIOS using different
algorithms than Linux has, so I think there's some chance that the
BIOS did a better job and if we lose that BIOS assignment, we might
not be able to recreate it.
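
To make the question concrete, here is a toy userspace model of steps
3 and 4 of the quoted procedure.  Everything in it is an assumption
made for illustration: a single window, power-of-two BAR sizes with
natural alignment, and a greedy largest-first placement.  It is not
the allocator from this series.  The point is only that the retry loop
has two different failure outcomes, and the -1 case is the one I'm
asking about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct toy_bar {
	uint64_t size;		/* power of two; BAR is aligned to its size */
	bool hot_added;		/* belongs to a newly added device */
	bool enabled;
};

/* qsort() comparator: largest BAR first. */
static int by_size_desc(const void *a, const void *b)
{
	const struct toy_bar *x = a, *y = b;

	if (x->size == y->size)
		return 0;
	return x->size < y->size ? 1 : -1;
}

/* Step 3: place every enabled BAR in [start, end), largest first. */
static bool toy_fit(struct toy_bar *bars, size_t n, uint64_t start,
		    uint64_t end)
{
	uint64_t pos = start;

	qsort(bars, n, sizeof(*bars), by_size_desc);
	for (size_t i = 0; i < n; i++) {
		if (!bars[i].enabled)
			continue;
		/* Round up to the BAR's natural alignment. */
		pos = (pos + bars[i].size - 1) & ~(bars[i].size - 1);
		if (pos + bars[i].size > end)
			return false;
		pos += bars[i].size;
	}
	return true;
}

/*
 * Steps 3 and 4: retry, disabling one hot-added BAR per failed attempt.
 * Returns the number of hot-added BARs left enabled, or -1 if even the
 * BARs that were present before the hot-add no longer fit.
 */
static int toy_rescan(struct toy_bar *bars, size_t n, uint64_t start,
		      uint64_t end)
{
	assert(bars != NULL);

	for (;;) {
		size_t i;

		if (toy_fit(bars, n, start, end)) {
			int kept = 0;

			for (i = 0; i < n; i++)
				if (bars[i].hot_added && bars[i].enabled)
					kept++;
			return kept;
		}

		/* Pick one still-enabled hot-added BAR to give up on. */
		for (i = 0; i < n; i++)
			if (bars[i].hot_added && bars[i].enabled)
				break;
		if (i == n)
			return -1;	/* nothing left to disable */
		bars[i].enabled = false;
	}
}
```

A return of 0 means the new device got nothing but the old devices
still work.  A return of -1 means the devices that worked at boot can
no longer be placed, which can only be ruled out if the kernel's
algorithm is at least as good as whatever produced the boot-time
assignment.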

> This makes the prior reservation of memory by BIOS/bootloader/firmware not
> required anymore for the PCI hotplug.
> 
> Drivers indicate their support of movable BARs by implementing the new
> .rescan_prepare() and .rescan_done() hooks in the struct pci_driver. All
> device's activity must be paused during a rescan, and iounmap()+ioremap()
> must be applied to every used BAR.
> 
> The platform also may need to prepare for BAR movement, so new hooks are
> added: pcibios_rescan_prepare(pci_dev) and pcibios_rescan_done(pci_dev).
> 
> This patch is a preparation for future patches with actual implementation,
> and for now it just does the following:
>  - declares the feature;
>  - defines pci_movable_bars_enabled(), pci_dev_movable_bars_supported(dev);
>  - invokes the .rescan_prepare() and .rescan_done() driver notifiers;
>  - declares and invokes the pcibios_rescan_prepare()/_done() hooks;
>  - adds the PCI_IMMOVABLE_BARS flag.
> 
> The feature is disabled by default (via PCI_IMMOVABLE_BARS) until the final
> patch of the series. It can be overridden per-arch using this flag or by
> the following command line option:
> 
> pcie_movable_bars={ off | force }
> 
> CC: Sam Bobroff 
> CC: Rajat Jain 
> CC: Lukas Wunner 
> CC: Oliver O'Halloran 
> CC: David Laight 
> Signed-off-by: Sergey Miroshnichenko 
> ---
>  .../admin-guide/kernel-parameters.txt |  7 ++
>  drivers/pci/pci-driver.c  |  2 +
>  drivers/pci/pci.c | 24 ++
>  drivers/pci/pci.h |  2 +
>  drivers/pci/probe.c   | 86 ++-
>  include/linux/pci.h   |  7 ++
>  6 files changed, 126 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 47d981a86e2f..e2274ee87a35 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3526,6 +3526,13 @@
>   nomsi   Do not use MSI for native PCIe PME signaling (this makes
>   all PCIe root ports use INTx for all services).
>  
> + pcie_movable_bars=[PCIE]

This isn't a PCIe-specific feature, it's just a function of whether
drivers are smart enough, so we shouldn't tie it specifically to PCIe.
We could eventually do this for conventional PCI as well.

> + Override the movable BARs support detection:
> + off
> + Disable even if supported by the platform
> + force
> + Enable even if not explicitly declared as supported

What's the need for "force"?  If it's possible, I think we should
enable this functionality all the time and just have a disable switch
in case we trip over cases where it doesn't work, e.g., something
like:

  pci=no_movable_bars

>   pcmv=   [HW,PCMCIA] BadgePAD 4
>  
>   pd_ignore_unused
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index a8124e47bf6e..d11909e79263 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -1688,6 +1688,8 @@ static int __init pci_driver_init(void)
>  {
>   int ret;
>  
> + pci_add_flags(PCI_IMMOVABLE_BARS);
> +
>   ret = 

Re: [PATCH v5 02/23] PCI: Enable bridge's I/O and MEM access for hotplugged devices

2019-09-27 Thread Bjorn Helgaas
On Fri, Aug 16, 2019 at 07:50:40PM +0300, Sergey Miroshnichenko wrote:
> The PCI_COMMAND_IO and PCI_COMMAND_MEMORY bits of the bridge must be
> updated not only when enabling the bridge for the first time, but also if a
> hotplugged device requests these types of resources.

Yeah, this assumption that pci_is_enabled() means PCI_COMMAND_IO and
PCI_COMMAND_MEMORY are set correctly even though we may now need
*different* settings than when we incremented pdev->enable_cnt is
quite broken.
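
A small userspace sketch of why "already enabled" is not enough: the
decode bits a bridge needs depend on the windows it has *now*.  The
helper names are hypothetical; the flag and register bit values are
the ones from ioport.h and pci_regs.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values as defined in the kernel headers. */
#define IORESOURCE_IO		0x00000100
#define IORESOURCE_MEM		0x00000200
#define PCI_COMMAND_IO		0x1
#define PCI_COMMAND_MEMORY	0x2

/* Hypothetical helper: decode bits required by a set of window flags. */
static uint16_t toy_needed_command(const unsigned long *flags, size_t n)
{
	uint16_t cmd = 0;

	assert(flags != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (flags[i] & IORESOURCE_IO)
			cmd |= PCI_COMMAND_IO;
		if (flags[i] & IORESOURCE_MEM)
			cmd |= PCI_COMMAND_MEMORY;
	}
	return cmd;
}

/*
 * Hypothetical helper: bits that still have to be set even though the
 * bridge was enabled earlier with "current" in its command register.
 */
static uint16_t toy_missing_command(uint16_t current,
				    const unsigned long *flags, size_t n)
{
	return (uint16_t)(toy_needed_command(flags, n) & ~current);
}
```

A bridge that was enabled while empty has neither bit set, so after a
hot-add that needs a memory window, toy_missing_command() reports
PCI_COMMAND_MEMORY even though the enable count is already non-zero.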

> Originally these bits were set by the pci_enable_device_flags() only, which
> exits early if the bridge is already pci_is_enabled(). So if the bridge was
> empty initially (an edge case), then hotplugged devices fail to IO/MEM.
> 
> Signed-off-by: Sergey Miroshnichenko 
> ---
>  drivers/pci/pci.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index e7f8c354e644..61d951766087 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1652,6 +1652,14 @@ static void pci_enable_bridge(struct pci_dev *dev)
>   pci_enable_bridge(bridge);
>  
>   if (pci_is_enabled(dev)) {
> + int i, bars = 0;
> +
> + for (i = PCI_BRIDGE_RESOURCES; i < DEVICE_COUNT_RESOURCE; i++) {
> + if (dev->resource[i].flags & (IORESOURCE_MEM | IORESOURCE_IO))
> + bars |= (1 << i);
> + }
> + do_pci_enable_device(dev, bars);
> +
>   if (!dev->is_busmaster)
>   pci_set_master(dev);
>   mutex_unlock(&dev->enable_mutex);
> -- 
> 2.21.0
> 


Re: [PATCH v5 01/23] PCI: Fix race condition in pci_enable/disable_device()

2019-09-27 Thread Bjorn Helgaas
On Fri, Aug 16, 2019 at 07:50:39PM +0300, Sergey Miroshnichenko wrote:
> This is yet another approach to fix an old [1-2] concurrency issue, when:
>  - two or more devices are being hot-added into a bridge which was
>initially empty;
>  - a bridge with two or more devices is being hot-added;
>  - during boot, if BIOS/bootloader/firmware doesn't pre-enable bridges.
> 
> The problem is that a bridge is reported as enabled before the MEM/IO bits
> are actually written to the PCI_COMMAND register, so another driver thread
> starts memory requests through the not-yet-enabled bridge:
> 
>  CPU0                                  CPU1
> 
>  pci_enable_device_mem()               pci_enable_device_mem()
>    pci_enable_bridge()                   pci_enable_bridge()
>      pci_is_enabled()
>        return false;
>      atomic_inc_return(enable_cnt)
>      Start actual enabling the bridge
>      ...                                   pci_is_enabled()
>      ...                                     return true;
>      ...                                 Start memory requests <-- FAIL
>      ...
>      Set the PCI_COMMAND_MEMORY bit <-- Must wait for this
> 
> Protect the pci_enable/disable_device() and pci_enable_bridge(), which is
> similar to the previous solution from commit 40f11adc7cd9 ("PCI: Avoid race
> while enabling upstream bridges"), but adding per-device mutexes and
> preventing the dev->enable_cnt from incrementing early.

This isn't directly related to the movable BARs functionality; is it
here because you see the problem more frequently when moving BARs?
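
For reference, the ordering the commit log asks for can be sketched in
userspace with a pthread mutex standing in for the per-device mutex.
The struct and function names are invented for this sketch and the
"hardware write" is just a bool; this is not the patch itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Toy stand-in for a bridge; field names are made up for this sketch. */
struct toy_bridge {
	pthread_mutex_t enable_mutex;	/* per-device lock */
	int enable_cnt;
	bool memory_decode;		/* models PCI_COMMAND_MEMORY */
};

/*
 * The count is only incremented after the decode bit has been written,
 * and both happen under the lock, so nobody can observe "enabled"
 * before the bridge actually forwards memory requests.
 */
static void toy_enable_bridge(struct toy_bridge *b)
{
	assert(b != NULL);

	pthread_mutex_lock(&b->enable_mutex);
	if (b->enable_cnt == 0)
		b->memory_decode = true;	/* the slow hardware write */
	b->enable_cnt++;
	pthread_mutex_unlock(&b->enable_mutex);
}

/* A second caller takes the same lock before trusting the count. */
static bool toy_bridge_ready(struct toy_bridge *b)
{
	bool ready;

	assert(b != NULL);

	pthread_mutex_lock(&b->enable_mutex);
	ready = b->enable_cnt > 0 && b->memory_decode;
	pthread_mutex_unlock(&b->enable_mutex);
	return ready;
}
```

In the diagram above, CPU1 would block in toy_bridge_ready() until
CPU0 has left toy_enable_bridge(), instead of seeing a non-zero count
while the write is still pending.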


Re: [PATCH v1 1/2] PCI/AER: Use for_each_set_bit()

2019-09-27 Thread Bjorn Helgaas
On Tue, Aug 27, 2019 at 06:18:22PM +0300, Andy Shevchenko wrote:
> This simplifies and standardizes slot manipulation code
> by using for_each_set_bit() library function.
> 
> Signed-off-by: Andy Shevchenko 
> ---
>  drivers/pci/pcie/aer.c | 19 ---
>  1 file changed, 8 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index b45bc47d04fe..f883f81d759a 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -15,6 +15,7 @@
>  #define pr_fmt(fmt) "AER: " fmt
>  #define dev_fmt pr_fmt
>  
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -657,7 +658,8 @@ const struct attribute_group aer_stats_attr_group = {
>  static void pci_dev_aer_stats_incr(struct pci_dev *pdev,
>  struct aer_err_info *info)
>  {
> - int status, i, max = -1;
> + unsigned long status = info->status & ~info->mask;
> + int i, max = -1;
>   u64 *counter = NULL;
>   struct aer_stats *aer_stats = pdev->aer_stats;
>  
> @@ -682,10 +684,8 @@ static void pci_dev_aer_stats_incr(struct pci_dev *pdev,
>   break;
>   }
>  
> - status = (info->status & ~info->mask);
> - for (i = 0; i < max; i++)
> - if (status & (1 << i))
> - counter[i]++;
> + for_each_set_bit(i, &status, max)

I applied this, but I confess to being a little ambivalent.  It's
arguably a little easier to read, but it's not nearly as efficient
(not a great concern here) and more importantly much harder to verify
that it's correct because you have to chase through
for_each_set_bit(), find_first_bit(), _ffs(), etc, etc.  No doubt it's
great for bitmaps of arbitrary size, but for a simple 32-bit register
I'm a little hesitant.  But I applied it anyway.

> + counter[i]++;
>  }
>  
>  static void pci_rootport_aer_stats_incr(struct pci_dev *pdev,
> @@ -717,14 +717,11 @@ static void __print_tlp_header(struct pci_dev *dev,
>  static void __aer_print_error(struct pci_dev *dev,
> struct aer_err_info *info)
>  {
> - int i, status;
> + unsigned long status = info->status & ~info->mask;
>   const char *errmsg = NULL;
> - status = (info->status & ~info->mask);
> -
> - for (i = 0; i < 32; i++) {
> - if (!(status & (1 << i)))
> - continue;
> + int i;
>  
> + for_each_set_bit(i, &status, 32) {
>   if (info->severity == AER_CORRECTABLE)
>   errmsg = i < ARRAY_SIZE(aer_correctable_error_string) ?
>   aer_correctable_error_string[i] : NULL;
> -- 
> 2.23.0.rc1
> 


Re: [PATCH v1 1/2] PCI/AER: Use for_each_set_bit()

2019-09-27 Thread Bjorn Helgaas
On Tue, Aug 27, 2019 at 06:18:22PM +0300, Andy Shevchenko wrote:
> This simplifies and standardizes slot manipulation code
> by using for_each_set_bit() library function.
> 
> Signed-off-by: Andy Shevchenko 

Applied both with Kuppuswamy's reviewed-by to pci/aer for v5.5,
thanks!

> ---
>  drivers/pci/pcie/aer.c | 19 ---
>  1 file changed, 8 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index b45bc47d04fe..f883f81d759a 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -15,6 +15,7 @@
>  #define pr_fmt(fmt) "AER: " fmt
>  #define dev_fmt pr_fmt
>  
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -657,7 +658,8 @@ const struct attribute_group aer_stats_attr_group = {
>  static void pci_dev_aer_stats_incr(struct pci_dev *pdev,
>  struct aer_err_info *info)
>  {
> - int status, i, max = -1;
> + unsigned long status = info->status & ~info->mask;
> + int i, max = -1;
>   u64 *counter = NULL;
>   struct aer_stats *aer_stats = pdev->aer_stats;
>  
> @@ -682,10 +684,8 @@ static void pci_dev_aer_stats_incr(struct pci_dev *pdev,
>   break;
>   }
>  
> - status = (info->status & ~info->mask);
> - for (i = 0; i < max; i++)
> - if (status & (1 << i))
> - counter[i]++;
> + for_each_set_bit(i, &status, max)
> + counter[i]++;
>  }
>  
>  static void pci_rootport_aer_stats_incr(struct pci_dev *pdev,
> @@ -717,14 +717,11 @@ static void __print_tlp_header(struct pci_dev *dev,
>  static void __aer_print_error(struct pci_dev *dev,
> struct aer_err_info *info)
>  {
> - int i, status;
> + unsigned long status = info->status & ~info->mask;
>   const char *errmsg = NULL;
> - status = (info->status & ~info->mask);
> -
> - for (i = 0; i < 32; i++) {
> - if (!(status & (1 << i)))
> - continue;
> + int i;
>  
> + for_each_set_bit(i, &status, 32) {
>   if (info->severity == AER_CORRECTABLE)
>   errmsg = i < ARRAY_SIZE(aer_correctable_error_string) ?
>   aer_correctable_error_string[i] : NULL;
> -- 
> 2.23.0.rc1
> 


Re: [PATCH] PCI: hotplug: Remove surplus return from a void function

2019-08-27 Thread Bjorn Helgaas
On Mon, Aug 26, 2019 at 11:51:43AM +0200, Krzysztof Wilczynski wrote:
> Remove unnecessary empty return statement at the end of a void
> function in the following:
> 
>   - drivers/pci/hotplug/cpci_hotplug_core.c: cleanup_slots()
>   - drivers/pci/hotplug/cpqphp_core.c: pci_print_IRQ_route()
>   - drivers/pci/hotplug/cpqphp_ctrl.c: cpqhp_pushbutton_thread()
>   - drivers/pci/hotplug/cpqphp_ctrl.c: interrupt_event_handler()
>   - drivers/pci/hotplug/cpqphp_nvram.h: compaq_nvram_init()
>   - drivers/pci/hotplug/rpadlpar_core.c: rpadlpar_io_init()
>   - drivers/pci/hotplug/rpaphp_core.c: cleanup_slots()
> 
> Signed-off-by: Krzysztof Wilczynski 

Applied to pci/trivial for v5.4, thanks!

I squashed the mediatek patch into this since they're both trivial.

> ---
>  drivers/pci/hotplug/cpci_hotplug_core.c | 1 -
>  drivers/pci/hotplug/cpqphp_core.c   | 1 -
>  drivers/pci/hotplug/cpqphp_ctrl.c   | 4 
>  drivers/pci/hotplug/cpqphp_nvram.h  | 5 +
>  drivers/pci/hotplug/rpadlpar_core.c | 1 -
>  drivers/pci/hotplug/rpaphp_core.c   | 1 -
>  6 files changed, 1 insertion(+), 12 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/cpci_hotplug_core.c 
> b/drivers/pci/hotplug/cpci_hotplug_core.c
> index 603eadf3d965..d0559d2faf50 100644
> --- a/drivers/pci/hotplug/cpci_hotplug_core.c
> +++ b/drivers/pci/hotplug/cpci_hotplug_core.c
> @@ -563,7 +563,6 @@ cleanup_slots(void)
>   }
>  cleanup_null:
>   up_write(&list_rwsem);
> - return;
>  }
>  
>  int
> diff --git a/drivers/pci/hotplug/cpqphp_core.c 
> b/drivers/pci/hotplug/cpqphp_core.c
> index 16bbb183695a..b8aacb41a83c 100644
> --- a/drivers/pci/hotplug/cpqphp_core.c
> +++ b/drivers/pci/hotplug/cpqphp_core.c
> @@ -173,7 +173,6 @@ static void pci_print_IRQ_route(void)
>   dbg("%d %d %d %d\n", tbus, tdevice >> 3, tdevice & 0x7, tslot);
>  
>   }
> - return;
>  }
>  
>  
> diff --git a/drivers/pci/hotplug/cpqphp_ctrl.c 
> b/drivers/pci/hotplug/cpqphp_ctrl.c
> index b7f4e1f099d9..68de958a9be8 100644
> --- a/drivers/pci/hotplug/cpqphp_ctrl.c
> +++ b/drivers/pci/hotplug/cpqphp_ctrl.c
> @@ -1872,8 +1872,6 @@ static void interrupt_event_handler(struct controller 
> *ctrl)
>   }
>   }   /* End of FOR loop */
>   }
> -
> - return;
>  }
>  
>  
> @@ -1943,8 +1941,6 @@ void cpqhp_pushbutton_thread(struct timer_list *t)
>  
>   p_slot->state = STATIC_STATE;
>   }
> -
> - return;
>  }
>  
>  
> diff --git a/drivers/pci/hotplug/cpqphp_nvram.h 
> b/drivers/pci/hotplug/cpqphp_nvram.h
> index 918ff8dbfe62..70e879b6a23f 100644
> --- a/drivers/pci/hotplug/cpqphp_nvram.h
> +++ b/drivers/pci/hotplug/cpqphp_nvram.h
> @@ -16,10 +16,7 @@
>  
>  #ifndef CONFIG_HOTPLUG_PCI_COMPAQ_NVRAM
>  
> -static inline void compaq_nvram_init(void __iomem *rom_start)
> -{
> - return;
> -}
> +static inline void compaq_nvram_init(void __iomem *rom_start) { }
>  
>  static inline int compaq_nvram_load(void __iomem *rom_start, struct 
> controller *ctrl)
>  {
> diff --git a/drivers/pci/hotplug/rpadlpar_core.c 
> b/drivers/pci/hotplug/rpadlpar_core.c
> index 182f9e3443ee..977946e4e613 100644
> --- a/drivers/pci/hotplug/rpadlpar_core.c
> +++ b/drivers/pci/hotplug/rpadlpar_core.c
> @@ -473,7 +473,6 @@ int __init rpadlpar_io_init(void)
>  void rpadlpar_io_exit(void)
>  {
>   dlpar_sysfs_exit();
> - return;
>  }
>  
>  module_init(rpadlpar_io_init);
> diff --git a/drivers/pci/hotplug/rpaphp_core.c 
> b/drivers/pci/hotplug/rpaphp_core.c
> index c3899ee1db99..18627bb21e9e 100644
> --- a/drivers/pci/hotplug/rpaphp_core.c
> +++ b/drivers/pci/hotplug/rpaphp_core.c
> @@ -408,7 +408,6 @@ static void __exit cleanup_slots(void)
>   pci_hp_deregister(&slot->hotplug_slot);
>   dealloc_slot_struct(slot);
>   }
> - return;
>  }
>  
>  static int __init rpaphp_init(void)
> -- 
> 2.22.1
> 


Re: [PATCH v2] PCI: rpaphp: Avoid a sometimes-uninitialized warning

2019-08-01 Thread Bjorn Helgaas
On Mon, Jul 22, 2019 at 02:05:12PM +1000, Michael Ellerman wrote:
> Nathan Chancellor  writes:
> > On Mon, Jun 03, 2019 at 03:11:58PM -0700, Nathan Chancellor wrote:
> >> When building with -Wsometimes-uninitialized, clang warns:
> >> 
> >> drivers/pci/hotplug/rpaphp_core.c:243:14: warning: variable 'fndit' is
> >> used uninitialized whenever 'for' loop exits because its condition is
> >> false [-Wsometimes-uninitialized]
> >> for (j = 0; j < entries; j++) {
> >> ^~~
> >> drivers/pci/hotplug/rpaphp_core.c:256:6: note: uninitialized use occurs
> >> here
> >> if (fndit)
> >> ^
> >> drivers/pci/hotplug/rpaphp_core.c:243:14: note: remove the condition if
> >> it is always true
> >> for (j = 0; j < entries; j++) {
> >> ^~~
> >> drivers/pci/hotplug/rpaphp_core.c:233:14: note: initialize the variable
> >> 'fndit' to silence this warning
> >> int j, fndit;
> >> ^
> >>  = 0
> >> 
> >> fndit is only used to gate a sprintf call, which can be moved into the
> >> loop to simplify the code and eliminate the local variable, which will
> >> fix this warning.
> >> 
> >> Link: https://github.com/ClangBuiltLinux/linux/issues/504
> >> Fixes: 2fcf3ae508c2 ("hotplug/drc-info: Add code to search ibm,drc-info 
> >> property")
> >> Suggested-by: Nick Desaulniers 
> >> Signed-off-by: Nathan Chancellor 
> >> ---
> >> 
> >> v1 -> v2:
> >> 
> >> * Eliminate fndit altogether by shuffling the sprintf call into the for
> >>   loop and changing the if conditional, as suggested by Nick.
> >> 
> >>  drivers/pci/hotplug/rpaphp_core.c | 18 +++---
> >>  1 file changed, 7 insertions(+), 11 deletions(-)
> >> 
> >> diff --git a/drivers/pci/hotplug/rpaphp_core.c 
> >> b/drivers/pci/hotplug/rpaphp_core.c
> >> index bcd5d357ca23..c3899ee1db99 100644
> >> --- a/drivers/pci/hotplug/rpaphp_core.c
> >> +++ b/drivers/pci/hotplug/rpaphp_core.c
> >> @@ -230,7 +230,7 @@ static int rpaphp_check_drc_props_v2(struct 
> >> device_node *dn, char *drc_name,
> >>struct of_drc_info drc;
> >>const __be32 *value;
> >>char cell_drc_name[MAX_DRC_NAME_LEN];
> >> -  int j, fndit;
> >> +  int j;
> >>  
> >>info = of_find_property(dn->parent, "ibm,drc-info", NULL);
> >>if (info == NULL)
> >> @@ -245,17 +245,13 @@ static int rpaphp_check_drc_props_v2(struct 
> >> device_node *dn, char *drc_name,
> >>  
> >>/* Should now know end of current entry */
> >>  
> >> -  if (my_index > drc.last_drc_index)
> >> -  continue;
> >> -
> >> -  fndit = 1;
> >> -  break;
> >> +  /* Found it */
> >> +  if (my_index <= drc.last_drc_index) {
> >> +  sprintf(cell_drc_name, "%s%d", drc.drc_name_prefix,
> >> +  my_index);
> >> +  break;
> >> +  }
> >>}
> >> -  /* Found it */
> >> -
> >> -  if (fndit)
> >> -  sprintf(cell_drc_name, "%s%d", drc.drc_name_prefix, 
> >> -  my_index);
> >>  
> >>if (((drc_name == NULL) ||
> >> (drc_name && !strcmp(drc_name, cell_drc_name))) &&
> >> -- 
> >> 2.22.0.rc3
> >> 
> >
> > Hi all,
> >
> > Could someone please pick this up?
> 
> I'll take it.
> 
> I was expecting Bjorn to take it as a PCI patch, but I realise now that
> I merged that code in the first place so may as well take this too.
> 
> I'll put it in my next branch once that opens next week.

Sorry, I should have done something with this.  Did you take it,
Michael?  I don't see it in -next and haven't figured out where to
look in your git tree, so I can't tell.  Just let me know either way
so I know whether to drop this or apply it.

Bjorn
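
For anyone skimming the thread, the shape of the fix can be sketched in
plain userspace C.  This is only an illustration under assumed names
(struct range and name_for_index() are hypothetical, not the rpaphp
code): doing the formatting inside the loop at the point of the match
removes the need for a separate "found it" flag, so there is nothing
left for -Wsometimes-uninitialized to complain about.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for one drc-info entry. */
struct range {
	const char *prefix;
	int last_index;
};

/*
 * Returns 1 and fills 'name' if my_index falls within one of the ranges,
 * 0 otherwise.  No flag variable is needed because the work happens at
 * the point where the match is found.
 */
static int name_for_index(const struct range *r, size_t n, int my_index,
			  char *name, size_t len)
{
	for (size_t j = 0; j < n; j++) {
		if (my_index <= r[j].last_index) {
			snprintf(name, len, "%s%d", r[j].prefix, my_index);
			return 1;
		}
	}
	return 0;
}
```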


Re: [PATCH v2 0/5] PCI: Convert pci_resource_to_user() to a weak function

2019-07-30 Thread Bjorn Helgaas
On Mon, Jul 29, 2019 at 01:13:56PM +0300, Denis Efremov wrote:
> Architectures currently define HAVE_ARCH_PCI_RESOURCE_TO_USER if they want
> to provide their own pci_resource_to_user() implementation. This could be
> simplified if we make the generic version a weak function. Thus,
> architecture specific versions will automatically override the generic one.
> 
> Changes in v2:
> 1. Removed __weak from pci_resource_to_user() declaration
> 2. Fixed typo s/spark/sparc/g
> 
> Denis Efremov (5):
>   PCI: Convert pci_resource_to_user to a weak function
>   microblaze/PCI: Remove HAVE_ARCH_PCI_RESOURCE_TO_USER
>   mips/PCI: Remove HAVE_ARCH_PCI_RESOURCE_TO_USER
>   powerpc/PCI: Remove HAVE_ARCH_PCI_RESOURCE_TO_USER
>   sparc/PCI: Remove HAVE_ARCH_PCI_RESOURCE_TO_USER
> 
>  arch/microblaze/include/asm/pci.h |  2 --
>  arch/mips/include/asm/pci.h   |  1 -
>  arch/powerpc/include/asm/pci.h|  2 --
>  arch/sparc/include/asm/pci.h  |  2 --
>  drivers/pci/pci.c |  8 
>  include/linux/pci.h   | 12 
>  6 files changed, 8 insertions(+), 19 deletions(-)

Thanks, I added Paul's ack, squashed into a single patch since I think
it's easier to see what's going on then, and applied to pci/misc for
v5.4.
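
As a rough illustration of the weak-function approach (a sketch, not the
kernel code): with GCC or Clang, a default definition marked weak is used
unless some other object file provides a non-weak definition of the same
symbol, in which case the linker picks that one.  struct region and
region_to_user() are hypothetical names for this example; in the kernel
the __weak macro is used rather than spelling out the attribute.

```c
#include <stdint.h>

/* Hypothetical stand-in for a resource with start/end addresses. */
struct region {
	uint64_t start;
	uint64_t end;
};

/*
 * Generic default.  Because it is weak, an architecture could supply its
 * own non-weak region_to_user() in another file and the linker would use
 * that instead, with no #ifdef or HAVE_ARCH_* define needed.
 */
__attribute__((weak))
void region_to_user(const struct region *rsrc, uint64_t *start, uint64_t *end)
{
	*start = rsrc->start;
	*end = rsrc->end;
}
```

With no override linked in, callers simply get the generic behavior.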


Re: [PATCHv2 2/2] PCI: layerscape: EP and RC drivers are compiled separately

2019-06-26 Thread Bjorn Helgaas
If you post another revision for any reason, please change the subject
so it's worded as a command and mentions the new config options, e.g.,

  PCI: layerscape: Add CONFIG_PCI_LAYERSCAPE_EP to build EP/RC separately

On Wed, Jun 26, 2019 at 07:11:39PM +0800, Xiaowei Bao wrote:
> Compile the EP and RC drivers separately with different configuration
> options, this looks clearer.
> 
> Signed-off-by: Xiaowei Bao 
> ---
> v2:
>  - No change.
> 
>  drivers/pci/controller/dwc/Kconfig  |   20 ++--
>  drivers/pci/controller/dwc/Makefile |3 ++-
>  2 files changed, 20 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/controller/dwc/Kconfig 
> b/drivers/pci/controller/dwc/Kconfig
> index a6ce1ee..a41ccf5 100644
> --- a/drivers/pci/controller/dwc/Kconfig
> +++ b/drivers/pci/controller/dwc/Kconfig
> @@ -131,13 +131,29 @@ config PCI_KEYSTONE_EP
> DesignWare core functions to implement the driver.
>  
>  config PCI_LAYERSCAPE
> - bool "Freescale Layerscape PCIe controller"
> + bool "Freescale Layerscape PCIe controller - Host mode"
>   depends on OF && (ARM || ARCH_LAYERSCAPE || COMPILE_TEST)
>   depends on PCI_MSI_IRQ_DOMAIN
>   select MFD_SYSCON
>   select PCIE_DW_HOST
>   help
> -   Say Y here if you want PCIe controller support on Layerscape SoCs.
> +   Say Y here if you want to enable PCIe controller support on Layerscape
> +   SoCs to work in Host mode.
> +   This controller can work either as EP or RC. The RCW[HOST_AGT_PEX]
> +   determines which PCIe controller works in EP mode and which PCIe
> +   controller works in RC mode.
> +
> +config PCI_LAYERSCAPE_EP
> + bool "Freescale Layerscape PCIe controller - Endpoint mode"
> + depends on OF && (ARM || ARCH_LAYERSCAPE || COMPILE_TEST)
> + depends on PCI_ENDPOINT
> + select PCIE_DW_EP
> + help
> +   Say Y here if you want to enable PCIe controller support on Layerscape
> +   SoCs to work in Endpoint mode.
> +   This controller can work either as EP or RC. The RCW[HOST_AGT_PEX]
> +   determines which PCIe controller works in EP mode and which PCIe
> +   controller works in RC mode.
>  
>  config PCI_HISI
>   depends on OF && (ARM64 || COMPILE_TEST)
> diff --git a/drivers/pci/controller/dwc/Makefile 
> b/drivers/pci/controller/dwc/Makefile
> index b085dfd..824fde7 100644
> --- a/drivers/pci/controller/dwc/Makefile
> +++ b/drivers/pci/controller/dwc/Makefile
> @@ -8,7 +8,8 @@ obj-$(CONFIG_PCI_EXYNOS) += pci-exynos.o
>  obj-$(CONFIG_PCI_IMX6) += pci-imx6.o
>  obj-$(CONFIG_PCIE_SPEAR13XX) += pcie-spear13xx.o
>  obj-$(CONFIG_PCI_KEYSTONE) += pci-keystone.o
> -obj-$(CONFIG_PCI_LAYERSCAPE) += pci-layerscape.o pci-layerscape-ep.o
> +obj-$(CONFIG_PCI_LAYERSCAPE) += pci-layerscape.o
> +obj-$(CONFIG_PCI_LAYERSCAPE_EP) += pci-layerscape-ep.o
>  obj-$(CONFIG_PCIE_QCOM) += pcie-qcom.o
>  obj-$(CONFIG_PCIE_ARMADA_8K) += pcie-armada8k.o
>  obj-$(CONFIG_PCIE_ARTPEC6) += pcie-artpec6.o
> -- 
> 1.7.1
> 


Re: [PATCH v4 19/28] docs: powerpc: convert docs to ReST and rename to *.rst

2019-06-14 Thread Bjorn Helgaas
On Fri, Jun 14, 2019 at 02:36:35PM -0600, Jonathan Corbet wrote:
> On Wed, 12 Jun 2019 14:52:55 -0300
> Mauro Carvalho Chehab  wrote:
> 
> > Convert docs to ReST and add them to the arch-specific
> > book.
> > 
> > The conversion here was trivial, as almost every file there
> > was already using an elegant format close to ReST standard.
> > 
> > The changes were mostly to mark literal blocks and add a few
> > missing section title identifiers.
> > 
> > One note with regards to "--": on Sphinx, this can't be used
> > to identify a list, as it will format it badly. This can be
> > used, however, to identify a long hyphen - and "---" is an
> > even longer one.
> > 
> > At its new index.rst, let's add a :orphan: while this is not linked to
> > the main index.rst file, in order to avoid build warnings.
> > 
> > Signed-off-by: Mauro Carvalho Chehab 
> > Acked-by: Andrew Donnellan  # cxl
> 
> This one fails to apply because ...
> 
> [...]
> 
> > diff --git a/Documentation/PCI/pci-error-recovery.rst 
> > b/Documentation/PCI/pci-error-recovery.rst
> > index 83db42092935..acc21ecca322 100644
> > --- a/Documentation/PCI/pci-error-recovery.rst
> > +++ b/Documentation/PCI/pci-error-recovery.rst
> > @@ -403,7 +403,7 @@ That is, the recovery API only requires that:
> >  .. note::
> >  
> > Implementation details for the powerpc platform are discussed in
> > -   the file Documentation/powerpc/eeh-pci-error-recovery.txt
> > +   the file Documentation/powerpc/eeh-pci-error-recovery.rst
> >  
> > As of this writing, there is a growing list of device drivers with
> > patches implementing error recovery. Not all of these patches are in
> > @@ -422,3 +422,24 @@ That is, the recovery API only requires that:
> > - drivers/net/cxgb3
> > - drivers/net/s2io.c
> > - drivers/net/qlge
> > +
> > +>>> As of this writing, there is a growing list of device drivers with
> > +>>> patches implementing error recovery. Not all of these patches are in
> > +>>> mainline yet. These may be used as "examples":
> > +>>>
> > +>>> drivers/scsi/ipr
> > +>>> drivers/scsi/sym53c8xx_2
> > +>>> drivers/scsi/qla2xxx
> > +>>> drivers/scsi/lpfc
> > +>>> drivers/next/bnx2.c
> > +>>> drivers/next/e100.c
> > +>>> drivers/net/e1000
> > +>>> drivers/net/e1000e
> > +>>> drivers/net/ixgb
> > +>>> drivers/net/ixgbe
> > +>>> drivers/net/cxgb3
> > +>>> drivers/net/s2io.c
> > +>>> drivers/net/qlge  
> 
> ...of this, which has the look of a set of conflict markers that managed
> to get committed...?

I don't see these conflict markers in my local branch or in
linux-next (next-20190614).

Let me know if I need to do something.

Bjorn


Re: [PATCH v3 3/3] powerpc/pseries: Allow user-specified PCI resource alignment after init

2019-05-29 Thread Bjorn Helgaas
On Mon, May 27, 2019 at 11:03:13PM -0500, Shawn Anastasio wrote:
> On pseries, custom PCI resource alignment specified with the commandline
> argument pci=resource_alignment is disabled due to PCI resources being
> managed by the firmware. However, in the case of PCI hotplug the
> resources are managed by the kernel, so custom alignments should be
> honored in these cases. This is done by only honoring custom
> alignments after initial PCI initialization is done, to ensure that
> all devices managed by the firmware are excluded.
> 
> Without this ability, sub-page BARs sometimes get mapped in between
> page boundaries for hotplugged devices and are therefore unusable
> with the VFIO framework. This change allows users to request
> page alignment for devices they wish to access via VFIO using
> the pci=resource_alignment commandline argument.
> 
> In the future, this could be extended to provide page-aligned
> resources by default for hotplugged devices, similar to what is
> done on powernv by commit 382746376993 ("powerpc/powernv: Override
> pcibios_default_alignment() to force PCI devices to be page aligned")
> 
> Signed-off-by: Shawn Anastasio 
> ---
>  arch/powerpc/include/asm/machdep.h |  3 +++
>  arch/powerpc/kernel/pci-common.c   |  9 +
>  arch/powerpc/platforms/pseries/setup.c | 22 ++
>  3 files changed, 34 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/machdep.h 
> b/arch/powerpc/include/asm/machdep.h
> index 2fbfaa9176ed..46eb62c0954e 100644
> --- a/arch/powerpc/include/asm/machdep.h
> +++ b/arch/powerpc/include/asm/machdep.h
> @@ -179,6 +179,9 @@ struct machdep_calls {
>  
>   resource_size_t (*pcibios_default_alignment)(void);
>  
> + /* Called when determining PCI resource alignment */
> + int (*pcibios_ignore_alignment_request)(void);
> +
>  #ifdef CONFIG_PCI_IOV
>   void (*pcibios_fixup_sriov)(struct pci_dev *pdev);
>   resource_size_t (*pcibios_iov_resource_alignment)(struct pci_dev *, int 
> resno);
> diff --git a/arch/powerpc/kernel/pci-common.c 
> b/arch/powerpc/kernel/pci-common.c
> index ff4b7539cbdf..8e0d73b4c188 100644
> --- a/arch/powerpc/kernel/pci-common.c
> +++ b/arch/powerpc/kernel/pci-common.c
> @@ -238,6 +238,15 @@ resource_size_t pcibios_default_alignment(void)
>   return 0;
>  }
>  
> +int pcibios_ignore_alignment_request(void)
> +{
> + if (ppc_md.pcibios_ignore_alignment_request)
> + return ppc_md.pcibios_ignore_alignment_request();
> +
> + /* Fall back to default method of checking PCI_PROBE_ONLY */
> + return pci_has_flag(PCI_PROBE_ONLY);
> +}
> +
>  #ifdef CONFIG_PCI_IOV
>  resource_size_t pcibios_iov_resource_alignment(struct pci_dev *pdev, int 
> resno)
>  {
> diff --git a/arch/powerpc/platforms/pseries/setup.c 
> b/arch/powerpc/platforms/pseries/setup.c
> index e4f0dfd4ae33..07f03be02afe 100644
> --- a/arch/powerpc/platforms/pseries/setup.c
> +++ b/arch/powerpc/platforms/pseries/setup.c
> @@ -82,6 +82,8 @@ EXPORT_SYMBOL(CMO_PageSize);
>  
>  int fwnmi_active;  /* TRUE if an FWNMI handler is present */
>  
> +static int initial_pci_init_done; /* TRUE if initial pcibios init has 
> completed */
> +
>  static void pSeries_show_cpuinfo(struct seq_file *m)
>  {
>   struct device_node *root;
> @@ -749,6 +751,23 @@ static resource_size_t 
> pseries_pci_iov_resource_alignment(struct pci_dev *pdev,
>  }
>  #endif
>  
> +static void pseries_after_init(void)
> +{
> + initial_pci_init_done = 1;
> +}
> +
> +static int pseries_ignore_alignment_request(void)
> +{
> + if (initial_pci_init_done)
> + /*
> +  * Allow custom alignments after init for things
> +  * like PCI hotplugging.
> +  */
> + return 0;

Hmm, if there's any way to avoid this sort of early/late flag, that
would be nicer.

> +
> + return pci_has_flag(PCI_PROBE_ONLY);
> +}
> +
>  static void __init pSeries_setup_arch(void)
>  {
>   set_arch_panic_timeout(10, ARCH_PANIC_TIMEOUT);
> @@ -797,6 +816,9 @@ static void __init pSeries_setup_arch(void)
>   }
>  
>   ppc_md.pcibios_root_bridge_prepare = pseries_root_bridge_prepare;
> + ppc_md.pcibios_after_init = pseries_after_init;
> + ppc_md.pcibios_ignore_alignment_request =
> + pseries_ignore_alignment_request;
>  }
>  
>  static void pseries_panic(char *str)
> -- 
> 2.20.1
> 


Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request

2019-05-29 Thread Bjorn Helgaas
On Tue, May 28, 2019 at 03:36:34PM +1000, Oliver wrote:
> On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio  wrote:
> >
> > Introduce a new pcibios function pcibios_ignore_alignment_request
> > which allows the PCI core to defer to platform-specific code to
> > determine whether or not to ignore alignment requests for PCI resources.
> >
> > The existing behavior is to simply ignore alignment requests when
> > PCI_PROBE_ONLY is set. This behavior is maintained by the
> > default implementation of pcibios_ignore_alignment_request.
> >
> > Signed-off-by: Shawn Anastasio 
> > ---
> >  drivers/pci/pci.c   | 9 +++--
> >  include/linux/pci.h | 1 +
> >  2 files changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 8abc843b1615..8207a09085d1 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -5882,6 +5882,11 @@ resource_size_t __weak 
> > pcibios_default_alignment(void)
> > return 0;
> >  }
> >
> > +int __weak pcibios_ignore_alignment_request(void)
> > +{
> > +   return pci_has_flag(PCI_PROBE_ONLY);
> > +}
> > +
> >  #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
> >  static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
> >  static DEFINE_SPINLOCK(resource_alignment_lock);
> > @@ -5906,9 +5911,9 @@ static resource_size_t 
> > pci_specified_resource_alignment(struct pci_dev *dev,
> > p = resource_alignment_param;
> > if (!*p && !align)
> > goto out;
> > -   if (pci_has_flag(PCI_PROBE_ONLY)) {
> > +   if (pcibios_ignore_alignment_request()) {
> > align = 0;
> > -   pr_info_once("PCI: Ignoring requested alignments 
> > (PCI_PROBE_ONLY)\n");
> > +   pr_info_once("PCI: Ignoring requested alignments\n");
> > goto out;
> > }
> 
> I think the logic here is questionable to begin with. If the user has
> explicitly requested re-aligning a resource via the command line then
> we should probably do it even if PCI_PROBE_ONLY is set. When it breaks
> they get to keep the pieces.

I agree.  I don't like PCI_PROBE_ONLY in the first place.  It's a
sledgehammer approach that doesn't tell us which resource assignments
need to be preserved or why.  I'd rather use IORESOURCE_PCI_FIXED and
set it for the BARs where there's actually some sort of
hypervisor/firmware/OS dependency.

If there's a way to avoid another pcibios_*() weak function, that
would also be better.

Bjorn
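
To illustrate the per-BAR idea in the abstract, here is a small userspace
sketch.  It is not kernel code: RES_FIXED is a hypothetical stand-in for
IORESOURCE_PCI_FIXED, and struct res and effective_alignment() are made
up for this example.  The point is only that the "leave this alone"
decision is carried by each resource instead of one global flag.

```c
#include <stdint.h>

#define RES_FIXED 0x1u	/* hypothetical stand-in for IORESOURCE_PCI_FIXED */

/* Hypothetical, simplified resource descriptor. */
struct res {
	uint64_t start;
	uint64_t size;
	unsigned int flags;
};

/*
 * Alignment to apply to 'r': 0 if this particular resource must stay
 * where firmware put it, otherwise the alignment the user requested.
 */
static uint64_t effective_alignment(const struct res *r, uint64_t requested)
{
	if (r->flags & RES_FIXED)
		return 0;
	return requested;
}
```

Only resources that actually carry the flag are excluded; everything else
on the same system can still be realigned.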


Re: [PATCH v4 00/63] Include linux ACPI/PCI/X86 docs into Sphinx TOC tree

2019-04-23 Thread Bjorn Helgaas
On Tue, Apr 23, 2019 at 06:39:47PM +0200, Rafael J. Wysocki wrote:
> On Tue, Apr 23, 2019 at 6:30 PM Changbin Du  wrote:
> > Hi Corbet and All,
> > The kernel now uses Sphinx to generate intelligent and beautiful
> > documentation from reStructuredText files. I converted all of the Linux
> > ACPI/PCI/X86 docs to reST format in this series.
> >
> > In this version I combined ACPI and PCI docs, and added new x86 docs
> > conversion.
> 
> I'm not sure if combining all three into one big patch series has been
> a good idea, honestly.

Yeah, if you post this again, I would find it easier to deal with if
linux-pci only got the PCI-related things.  63 patches is a little too
much for one series.

Bjorn


Re: [PATCH 1/2] pci: rpadlpar: fix leaked device_node references in add/remove paths

2019-04-05 Thread Bjorn Helgaas
On Fri, Mar 22, 2019 at 01:27:21PM -0500, Tyrel Datwyler wrote:
> The find_dlpar_node() helper returns a device node with its reference
> incremented. Both the add and remove paths use this helper to find the
> appropriate node, but fail to release the reference when done.
> 
> Annotate the find_dlpar_node() helper with a comment about the incremented
> reference count, and call of_node_put() on the obtained device_node in the
> add and remove paths. Also, fixup a reference leak in the find_vio_slot()
> helper where we fail to call of_node_put() on the vdevice node after we
> iterate over its children.
> 
> Signed-off-by: Tyrel Datwyler 

Both applied to pci/hotplug for v5.2, thanks!

> ---
>  drivers/pci/hotplug/rpadlpar_core.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/pci/hotplug/rpadlpar_core.c 
> b/drivers/pci/hotplug/rpadlpar_core.c
> index e2356a9c7088..182f9e3443ee 100644
> --- a/drivers/pci/hotplug/rpadlpar_core.c
> +++ b/drivers/pci/hotplug/rpadlpar_core.c
> @@ -51,6 +51,7 @@ static struct device_node *find_vio_slot_node(char 
> *drc_name)
>   if (rc == 0)
>   break;
>   }
> + of_node_put(parent);
>  
>   return dn;
>  }
> @@ -71,6 +72,7 @@ static struct device_node *find_php_slot_pci_node(char 
> *drc_name,
>   return np;
>  }
>  
> +/* Returns a device_node with its reference count incremented */
>  static struct device_node *find_dlpar_node(char *drc_name, int *node_type)
>  {
>   struct device_node *dn;
> @@ -306,6 +308,7 @@ int dlpar_add_slot(char *drc_name)
>   rc = dlpar_add_phb(drc_name, dn);
>   break;
>   }
> + of_node_put(dn);
>  
>   printk(KERN_INFO "%s: slot %s added\n", DLPAR_MODULE_NAME, drc_name);
>  exit:
> @@ -439,6 +442,7 @@ int dlpar_remove_slot(char *drc_name)
>   rc = dlpar_remove_pci_slot(drc_name, dn);
>   break;
>   }
> + of_node_put(dn);
>   vm_unmap_aliases();
>  
>   printk(KERN_INFO "%s: slot %s removed\n", DLPAR_MODULE_NAME, drc_name);
> -- 
> 2.12.3
> 
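
The get/put pairing the patch restores can be sketched in userspace C.
This is only an illustration with hypothetical names (struct node,
node_get(), node_put(), find_node(), use_node()); the real code uses
struct device_node and of_node_put().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified reference-counted node. */
struct node {
	const char *name;
	int refcount;
};

static struct node *node_get(struct node *n)
{
	if (n)
		n->refcount++;
	return n;
}

static void node_put(struct node *n)
{
	if (n)
		n->refcount--;
}

/* Returns a node with its reference count incremented, or NULL. */
static struct node *find_node(struct node *nodes, size_t count,
			      const char *name)
{
	for (size_t i = 0; i < count; i++)
		if (strcmp(nodes[i].name, name) == 0)
			return node_get(&nodes[i]);
	return NULL;
}

/* Caller pattern: look the node up, use it, then drop the reference. */
static int use_node(struct node *nodes, size_t count, const char *name)
{
	struct node *n = find_node(nodes, count, name);

	if (!n)
		return -1;

	/* ... work with n would happen here ... */

	node_put(n);	/* without this, each call leaks one reference */
	return 0;
}
```

After use_node() returns, the count is back where it started, which is
the invariant the add and remove paths were missing.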


Re: [PATCH v5 0/8] powerpc/powernv/pci: Make hotplug self-sufficient, independent of FW and DT

2019-03-27 Thread Bjorn Helgaas
Hi Sergey,

Since this doesn't touch drivers/pci, I assume powerpc folks will
handle this series.  Let me know if otherwise.

On Mon, Mar 11, 2019 at 02:52:25PM +0300, Sergey Miroshnichenko wrote:
> This patchset allows switching from the pnv_php module to the standard
> pciehp driver for PCIe hotplug functionality, if the platform supports it:
> PowerNV working on top of the skiboot with the "core/pci: Sync VFs and
> the changes of bdfns between the firmware and the OS" [1] patch serie
> applied.

s/bdfns/BDFs/  Maybe?  I see this is a reference to another patch
  series, but if it hasn't been merged yet, "BDFs" would be consistent
  with "VFs" and give a hint that "bdfns" is not itself a word.

s/serie/series/

> The feature is activated by the "pci=realloc" command line argument.

From a user point of view, it doesn't seem intuitive that
"pci=realloc" also means "switch from pnv_php to pciehp".

The only direct effect of "pci=realloc" is to set pci_realloc_enable.
I haven't read the patches, but is there really something in
arch/powerpc/ that does something different based on
pci_realloc_enable?

> The goal is ability to hotplug bridges full of devices in the future. The
> "Movable BARs" [2] is a platform-independent part of our work in this. The
> final part will be movable bus numbers to support inserting a bridge in the
> middle of an existing PCIe tree.
> 
> Tested on POWER8 PowerNV+PHB3 ppc64le (our Vesnin server) with:
>  - the pciehp driver active;
>  - the pnv_php driver disabled;
>  - The "pci=realloc" argument is passed;
>  - surprise hotplug of an NVME disk works;
>  - controlled hotplug of a network card with SR-IOV works;
>  - activating of SR-IOV on a network card works;
>  - [with extra patches] manually initiated (via sysfs) rescan has found
>and turned on a hotplugged bridge;
>  - Without "pci=realloc" works just as before.
> 
> Changes since v4:
>  - Fixed failing build when EEH is disabled in a kernel config;
>  - Unfreeze the bus on EEH_IO_ERROR_VALUE(size), not only 0x;
>  - Replaced the 0xff magic constant with phb->ioda.reserved_pe_idx;
>  - Renamed create_pdn() -> pci_create_pdn_from_dev();
>  - Renamed add_one_dev_pci_data(..., vf_index, ...) -> pci_alloc_pdn();
>  - Renamed add_dev_pci_data() -> pci_create_vf_pdns();
>  - Renamed remove_dev_pci_data() -> pci_destroy_vf_pdns();
>  - Removed the patch fixing uninitialized IOMMU group - now it is fixed in
>commit 8f5b27347e88 ("powerpc/powernv/sriov: Register IOMMU groups for
>VFs")
> 
> Changes since v3 [3]:
>  - Subject changed;
>  - Don't disable EEH during rescan anymore - instead just unfreeze the
>target buses deliberately;
>  - Add synchronization with the firmware when changing the PCIe topology;
>  - Fixed for VFs;
>  - Code cleanup.
> 
> Changes since v2:
>  - Don't reassign bus numbers on PowerNV by default (to retain the default
>behavior), but only when pci=realloc is passed;
>  - Less code affected;
>  - pci_add_device_node_info is refactored with add_one_dev_pci_data;
>  - Minor code cleanup.
> 
> Changes since v1:
>  - Fixed build for ppc64le and ppc64be when CONFIG_PCI_IOV is disabled;
>  - Fixed build for ppc64e when CONFIG_EEH is disabled;
>  - Fixed code style warnings.
> 
> [1] https://lists.ozlabs.org/pipermail/skiboot/2019-March/013571.html
> [2] https://www.spinics.net/lists/linux-pci/msg79995.html
> [3] https://lists.ozlabs.org/pipermail/linuxppc-dev/2018-September/178053.html
> 
> Sergey Miroshnichenko (8):
>   powerpc/pci: Access PCI config space directly w/o pci_dn
>   powerpc/powernv/pci: Suppress an EEH error when reading an empty slot
>   powerpc/pci: Create pci_dn on demand
>   powerpc/pci: Reduce code duplication in pci_add_device_node_info
>   powerpc/pci/IOV: Add support for runtime enabling the VFs
>   powerpc/pci: Don't rely on DT is the PCI_REASSIGN_ALL_BUS is set
>   powerpc/powernv/pci: Hook up the writes to PCI_SECONDARY_BUS register
>   powerpc/powernv/pci: Enable reassigning the bus numbers
> 
>  arch/powerpc/include/asm/pci-bridge.h|   4 +-
>  arch/powerpc/include/asm/ppc-pci.h   |   1 +
>  arch/powerpc/kernel/pci_dn.c | 170 ++-
>  arch/powerpc/kernel/rtas_pci.c   |  97 ++---
>  arch/powerpc/platforms/powernv/eeh-powernv.c |   2 +-
>  arch/powerpc/platforms/powernv/pci-ioda.c|   4 +-
>  arch/powerpc/platforms/powernv/pci.c | 205 +--
>  arch/powerpc/platforms/pseries/pci.c |   4 +-
>  8 files changed, 379 insertions(+), 108 deletions(-)
> 
> -- 
> 2.20.1
> 


Re: [PATCH RFC v4 20/21] PCI: pciehp: Add support for the movable BARs feature

2019-03-26 Thread Bjorn Helgaas
On Mon, Mar 11, 2019 at 04:31:21PM +0300, Sergey Miroshnichenko wrote:
> With movable BARs, adding a hotplugged device may affect all the PCIe
> domain starting from the root, so use a pci_rescan_bus() function which
> handles the rearrangement of existing BARs and bridge windows.
> 
> Signed-off-by: Sergey Miroshnichenko 
> ---
>  drivers/pci/hotplug/pciehp_pci.c | 14 +-
>  1 file changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/pciehp_pci.c 
> b/drivers/pci/hotplug/pciehp_pci.c
> index b9c1396db6fe..7c0871db5bae 100644
> --- a/drivers/pci/hotplug/pciehp_pci.c
> +++ b/drivers/pci/hotplug/pciehp_pci.c
> @@ -56,12 +56,16 @@ int pciehp_configure_device(struct controller *ctrl)
>   goto out;
>   }
>  
> - for_each_pci_bridge(dev, parent)
> - pci_hp_add_bridge(dev);
> + if (pci_movable_bars_enabled()) {
> + pci_rescan_bus(parent);
> + } else {
> + for_each_pci_bridge(dev, parent)
> + pci_hp_add_bridge(dev);
>  
> - pci_assign_unassigned_bridge_resources(bridge);
> - pcie_bus_configure_settings(parent);
> - pci_bus_add_devices(parent);
> + pci_assign_unassigned_bridge_resources(bridge);
> + pcie_bus_configure_settings(parent);
> + pci_bus_add_devices(parent);
> + }

The addition of a second path at this level, i.e., different paths
depending on whether movable BARs are enabled, seems a little
problematic because it's hard to determine whether they're equivalent
except for the movable BAR aspect.  For example, you don't call
pci_hp_add_bridge() when movable BARs are enabled, and I can't tell
whether that's intentional or whether it's a problem.

This looks like the sort of change that should be made in other
hotplug paths, e.g., enable_slot() for acpiphp,
pcibios_finish_adding_to_bus() for powerpc (maybe? I can't really
tell), cpci_configure_slot(), shpchp_configure_device()?

If we have or could invent some top-level interface that all these
places could use, and somewhere inside that we could do the movable
BAR magic, I think that would make it more maintainable.
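
A rough userspace sketch of that idea, for illustration only.  The name
pci_hp_configure_bus() and all of the stubs are hypothetical; the stubs
merely record the order in which the steps from the quoted hunk would run,
so that the movable-BAR decision lives in exactly one place.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Userspace model: every stub appends its name to this trace. */
static char trace[64];

static void record(const char *step)
{
	/* Room for a separator, the step name and the terminating NUL. */
	assert(strlen(trace) + strlen(step) + 2 <= sizeof(trace));
	if (trace[0] != '\0')
		strcat(trace, ",");
	strcat(trace, step);
}

/* Stand-ins for the kernel calls that appear in the quoted hunk. */
static void stub_rescan_bus(void)         { record("rescan"); }
static void stub_hp_add_bridges(void)     { record("add_bridges"); }
static void stub_assign_resources(void)   { record("assign"); }
static void stub_configure_settings(void) { record("settings"); }
static void stub_bus_add_devices(void)    { record("add_devices"); }

/*
 * Hypothetical single entry point that pciehp, acpiphp, shpchp, etc.
 * would all call instead of open-coding the two paths themselves.
 */
static void pci_hp_configure_bus(bool movable_bars)
{
	trace[0] = '\0';

	if (movable_bars) {
		stub_rescan_bus();
		return;
	}

	stub_hp_add_bridges();
	stub_assign_resources();
	stub_configure_settings();
	stub_bus_add_devices();
}

/* Helper for experiments: run the entry point and return the trace. */
static const char *hp_trace(bool movable_bars)
{
	pci_hp_configure_bus(movable_bars);
	return trace;
}
```

With one entry point, the question of whether the movable-BAR path may
skip pci_hp_add_bridge() only has to be answered once, not per hotplug
driver.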

>   out:
>   pci_unlock_rescan_remove();
> -- 
> 2.20.1
> 


Re: [PATCH RFC v4 17/21] PCI: Calculate boundaries for bridge windows

2019-03-26 Thread Bjorn Helgaas
On Mon, Mar 11, 2019 at 04:31:18PM +0300, Sergey Miroshnichenko wrote:
> If a bridge window contains fixed areas (there are PCIe devices with
> immovable BARs located on this bus), 

I think what you mean by "immovable BARs" is "drivers that don't
support moving BARs".  I want to keep the concept of legacy and EA
resources separate because those are immovable in principle, but
drivers can always be improved.

> this window must be allocated
> within the bound memory area, limited by windows size and by address
> range of fixed resources, calculated as follows:
> 
>| <-- bus's fixed_range_hard   --> |
>   | <--  fixed_range_hard.end - window size   --> |
>| <--  fixed_range_hard.start + window size   --> |
>   | <--bus's fixed_range_soft--> |
> 
> Signed-off-by: Sergey Miroshnichenko 
> ---
>  drivers/pci/setup-bus.c | 56 +
>  include/linux/pci.h |  4 ++-
>  2 files changed, 59 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index a1fd7f3c5ea8..f4737339d5ec 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -1809,6 +1809,61 @@ static enum enable_type pci_realloc_detect(struct pci_bus *bus,
>  }
>  #endif
>  
> +static void pci_bus_update_fixed_range_soft(struct pci_bus *bus)
> +{
> + struct pci_dev *dev;
> + struct pci_bus *parent = bus->parent;
> + int idx;
> +
> + list_for_each_entry(dev, &bus->devices, bus_list)
> + if (dev->subordinate)
> + pci_bus_update_fixed_range_soft(dev->subordinate);
> +
> + if (!parent || !bus->self)
> + return;
> +
> + for (idx = 0; idx < ARRAY_SIZE(bus->fixed_range_hard); ++idx) {
> + struct resource *r;
> + resource_size_t soft_start, soft_end;
> + resource_size_t hard_start = bus->fixed_range_hard[idx].start;
> + resource_size_t hard_end = bus->fixed_range_hard[idx].end;
> +
> + if (hard_start > hard_end)
> + continue;
> +
> + r = bus->resource[idx];
> +
> + soft_start = hard_end - resource_size(r) + 1;
> + soft_end = hard_start + resource_size(r) - 1;
> +
> + if (soft_start > hard_start)
> + soft_start = hard_start;
> +
> + if (soft_end < hard_end)
> + soft_end = hard_end;
> +
> + list_for_each_entry(dev, &parent->devices, bus_list) {
> + struct pci_bus *sibling = dev->subordinate;
> + resource_size_t s_start, s_end;
> +
> + if (!sibling || sibling == bus)
> + continue;
> +
> + s_start = sibling->fixed_range_hard[idx].start;
> + s_end = sibling->fixed_range_hard[idx].end;
> +
> + if (s_start > s_end)
> + continue;
> +
> + if (s_end < hard_start && s_end > soft_start)
> + soft_start = s_end;
> + }
> +
> + bus->fixed_range_soft[idx].start = soft_start;
> + bus->fixed_range_soft[idx].end = soft_end;
> + }
> +}
> +
>  /*
>   * first try will not touch pci bridge res
>   * second and later try will clear small leaf bridge res
> @@ -1847,6 +1902,7 @@ void pci_assign_unassigned_root_bus_resources(struct pci_bus *bus)
>   /* Depth first, calculate sizes and alignments of all
>  subordinate buses. */
>   __pci_bus_size_bridges(bus, add_list);
> + pci_bus_update_fixed_range_soft(bus);
>  
>   /* Depth last, allocate resources and update the hardware. */
>   __pci_bus_assign_resources(bus, add_list, &fail_head);
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 7a4d62d84bc1..75a56db73ad4 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -574,9 +574,11 @@ struct pci_bus {
>  
>   /*
>* If there are fixed resources in the bridge window, the hard range
> -  * contains the lowest and the highest addresses of them.
> +  * contains the lowest and the highest addresses of them, and this
> +  * bridge window must reside within the soft range.
>*/
>   struct resource fixed_range_hard[PCI_BRIDGE_RESOURCE_NUM];
> + struct resource fixed_range_soft[PCI_BRIDGE_RESOURCE_NUM];
>  
>   struct pci_ops  *ops;   /* Configuration access functions */
>   struct msi_controller *msi; /* MSI controller */
> -- 
> 2.20.1
> 
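
The soft-range arithmetic in the quoted pci_bus_update_fixed_range_soft()
can be worked through in a small standalone sketch.  This is a userspace
model, not the patch itself: struct range and soft_range() are made-up
names, the sibling-bus adjustment is left out, and the wraparound guard
is an addition of the sketch that the quoted code does not have.

```c
#include <assert.h>
#include <stdint.h>

/* Inclusive address range, like start/end in struct resource. */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Given the span covered by fixed resources (hard) and the size of the
 * bridge window, return the widest range the window may be placed in
 * while still covering every fixed resource.
 */
static struct range soft_range(struct range hard, uint64_t window_size)
{
	struct range soft;

	assert(hard.start <= hard.end);
	assert(window_size > 0);

	/* Lowest start: the window's last byte sits on hard.end. */
	if (window_size - 1 > hard.end)
		soft.start = 0;		/* sketch-only guard against wraparound */
	else
		soft.start = hard.end - (window_size - 1);

	/* Highest end: the window's first byte sits on hard.start. */
	soft.end = hard.start + (window_size - 1);

	/* Same clamping as the patch: soft is never narrower than hard. */
	if (soft.start > hard.start)
		soft.start = hard.start;
	if (soft.end < hard.end)
		soft.end = hard.end;

	return soft;
}
```

For fixed resources spanning 0x10000-0x10fff and a 0x4000-byte window,
this gives a soft range of 0xd000-0x13fff: the window can slide down
until its top byte is 0x10fff, or up until its bottom byte is 0x10000.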


Re: [PATCH RFC v4 15/21] PCI: Allow the failed resources to be reassigned later

2019-03-26 Thread Bjorn Helgaas
On Mon, Mar 11, 2019 at 04:31:16PM +0300, Sergey Miroshnichenko wrote:
> Don't lose the size of the requested EP's BAR if it can't be fit
> in a current trial, so this can be retried.

s/EP/device/, since this applies equally to conventional PCI.

> But a failed bridge window must be dropped and recalculated in the
> next trial.
> 
> Signed-off-by: Sergey Miroshnichenko 
> ---
>  drivers/pci/setup-bus.c |  3 ++-
>  drivers/pci/setup-res.c | 12 
>  2 files changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index f9d605cd1725..c1559a4a8564 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -309,7 +309,8 @@ static void assign_requested_resources_sorted(struct list_head *head,
>   0 /* don't care */,
>   0 /* don't care */);
>   }
> - reset_resource(res);
> + if (!pci_movable_bars_enabled())
> + reset_resource(res);
>   }
>   }
>  }
> diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
> index d8ca40a97693..732d18f60f1b 100644
> --- a/drivers/pci/setup-res.c
> +++ b/drivers/pci/setup-res.c
> @@ -298,6 +298,18 @@ static int _pci_assign_resource(struct pci_dev *dev, int resno,
>  
>   bus = dev->bus;
>   while ((ret = __pci_assign_resource(bus, dev, resno, size, min_align))) {
> + if (pci_movable_bars_enabled()) {
> + if (resno >= PCI_BRIDGE_RESOURCES &&
> + resno <= PCI_BRIDGE_RESOURCE_END) {
> + struct resource *res = dev->resource + resno;
> +
> + res->start = 0;
> + res->end = 0;
> + res->flags = 0;
> + }
> + break;
> + }
> +
>   if (!bus->parent || !bus->self->transparent)
>   break;
>   bus = bus->parent;
> -- 
> 2.20.1
> 


Re: [PATCH RFC v4 14/21] PCI: Don't reserve memory for hotplug when enabled movable BARs

2019-03-26 Thread Bjorn Helgaas
On Mon, Mar 11, 2019 at 04:31:15PM +0300, Sergey Miroshnichenko wrote:
> pbus_size_mem() returns a precise amount of memory required to fit
> all the requested BARs and windows of children bridges.

Please make the commit log complete in itself, without requiring the
subject.

> Signed-off-by: Sergey Miroshnichenko 
> ---
>  drivers/pci/setup-bus.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index 9d93f2b32bf1..f9d605cd1725 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -1229,7 +1229,7 @@ void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head *realloc_head)
>  
>   case PCI_HEADER_TYPE_BRIDGE:
>   pci_bridge_check_ranges(bus);
> - if (bus->self->is_hotplug_bridge) {
> + if (bus->self->is_hotplug_bridge && !pci_movable_bars_enabled()) {
>   additional_io_size  = pci_hotplug_io_size;
>   additional_mem_size = pci_hotplug_mem_size;
>   }
> -- 
> 2.20.1
> 


Re: [PATCH RFC v4 12/21] PCI: Don't allow hotplugged devices to steal resources

2019-03-26 Thread Bjorn Helgaas
On Mon, Mar 11, 2019 at 04:31:13PM +0300, Sergey Miroshnichenko wrote:
> When movable BARs are enabled, the PCI subsystem at first releases
> all the bridge windows and then performs an attempt to assign new
> requested resources and re-assign the existing ones.

s/performs an attempt/attempts/

I guess "new requested resources" means "resources to newly hotplugged
devices"?

> If a hotplugged device gets its resources first, there could be no
> space left to re-assign resources of already working devices, which
> is unacceptable. If this happens, this patch marks one of the new
> devices with the new introduced flag PCI_DEV_IGNORE and retries the
> resource assignment.
> 
> This patch adds a new res_mask bitmask to the struct pci_dev for
> storing the indices of assigned resources.
> 
> Signed-off-by: Sergey Miroshnichenko 
> ---
>  drivers/pci/bus.c   |   5 ++
>  drivers/pci/pci.h   |  11 +
>  drivers/pci/probe.c | 100 +++-
>  drivers/pci/setup-bus.c |  15 ++
>  include/linux/pci.h |   1 +
>  5 files changed, 130 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> index 5cb40b2518f9..a9784144d6f2 100644
> --- a/drivers/pci/bus.c
> +++ b/drivers/pci/bus.c
> @@ -311,6 +311,11 @@ void pci_bus_add_device(struct pci_dev *dev)
>  {
>   int retval;
>  
> + if (pci_dev_is_ignored(dev)) {
> + pci_warn(dev, "%s: don't enable the ignored device\n", __func__);
> + return;

I'm not sure about this.  Even if we're unable to assign space for all
the device's BARs, it still should respond to config accesses, and I
think it should show up in sysfs and lspci.

> + }
> +
>   /*
>* Can not put in pci_device_add yet because resources
>* are not assigned yet for some devices.
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index e06e8692a7b1..56b905068ac5 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -366,6 +366,7 @@ static inline bool pci_dev_is_disconnected(const struct pci_dev *dev)
>  
>  /* pci_dev priv_flags */
>  #define PCI_DEV_ADDED 0
> +#define PCI_DEV_IGNORE 1
>  
>  static inline void pci_dev_assign_added(struct pci_dev *dev, bool added)
>  {
> @@ -377,6 +378,16 @@ static inline bool pci_dev_is_added(const struct pci_dev *dev)
>   return test_bit(PCI_DEV_ADDED, &dev->priv_flags);
>  }
>  
> +static inline void pci_dev_ignore(struct pci_dev *dev, bool ignore)
> +{
> + assign_bit(PCI_DEV_IGNORE, &dev->priv_flags, ignore);
> +}
> +
> +static inline bool pci_dev_is_ignored(const struct pci_dev *dev)
> +{
> + return test_bit(PCI_DEV_IGNORE, &dev->priv_flags);
> +}
> +
>  #ifdef CONFIG_PCIEAER
>  #include <linux/aer.h>
>  
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 692752c71f71..62f4058a001f 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -3248,6 +3248,23 @@ unsigned int pci_rescan_bus_bridge_resize(struct pci_dev *bridge)
>   return max;
>  }
>  
> +static unsigned int pci_dev_res_mask(struct pci_dev *dev)
> +{
> + unsigned int res_mask = 0;
> + int i;
> +
> + for (i = 0; i < PCI_BRIDGE_RESOURCES; i++) {
> + struct resource *r = &dev->resource[i];
> +
> + if (!r->flags || (r->flags & IORESOURCE_UNSET) || !r->parent)
> + continue;
> +
> + res_mask |= (1 << i);
> + }
> +
> + return res_mask;
> +}
> +
>  static void pci_bus_rescan_prepare(struct pci_bus *bus)
>  {
>   struct pci_dev *dev;
> @@ -3257,6 +3274,8 @@ static void pci_bus_rescan_prepare(struct pci_bus *bus)
>   list_for_each_entry(dev, &bus->devices, bus_list) {
>   struct pci_bus *child = dev->subordinate;
>  
> + dev->res_mask = pci_dev_res_mask(dev);
> +
>   if (child) {
>   pci_bus_rescan_prepare(child);
>   } else if (dev->driver &&
> @@ -3318,6 +3337,84 @@ static void pci_setup_bridges(struct pci_bus *bus)
>   pci_setup_bridge(bus);
>  }
>  
> +static struct pci_dev *pci_find_next_new_device(struct pci_bus *bus)
> +{
> + struct pci_dev *dev;
> +
> + if (!bus)
> + return NULL;
> +
> + list_for_each_entry(dev, &bus->devices, bus_list) {
> + struct pci_bus *child_bus = dev->subordinate;
> +
> + if (!pci_dev_is_added(dev) && !pci_dev_is_ignored(dev))
> + return dev;
> +
> + if (child_bus) {
> + struct pci_dev *next_new_dev;
> +
> + next_new_dev = pci_find_next_new_device(child_bus);
> + if (next_new_dev)
> + return next_new_dev;
> + }
> + }
> +
> + return NULL;
> +}
> +
> +static bool pci_bus_validate_resources(struct pci_bus *bus)

The name of this function should tell us what the return value means.
Just from the name "pci_bus_validate_resources", I can't tell whether we
call it for side-effects, or whether true 

Re: [PATCH RFC v4 11/21] PCI: Release and reassign the root bridge resources during rescan

2019-03-26 Thread Bjorn Helgaas
On Mon, Mar 11, 2019 at 04:31:12PM +0300, Sergey Miroshnichenko wrote:
> When the movable BARs feature is enabled, don't rely on the memory gaps
> reserved by the BIOS/bootloader/firmware, but instead rearrange the BARs
> and bridge windows starting from the root.
> 
> Endpoint device's BARs, after being released, are resorted and written
> back by the pci_assign_unassigned_root_bus_resources().
> 
> The last step of writing the recalculated windows to the bridges is done
> by the new pci_setup_bridges() function.
> 
> Signed-off-by: Sergey Miroshnichenko 
> ---
>  drivers/pci/pci.h   |  1 +
>  drivers/pci/probe.c | 22 ++
>  drivers/pci/setup-bus.c | 11 ++-
>  3 files changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 224d88634115..e06e8692a7b1 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -248,6 +248,7 @@ void __pci_bus_assign_resources(const struct pci_bus *bus,
>   struct list_head *realloc_head,
>   struct list_head *fail_head);
>  bool pci_bus_clip_resource(struct pci_dev *dev, int idx);
> +void pci_bus_release_root_bridge_resources(struct pci_bus *bus);
>  
>  void pci_reassigndev_resource_alignment(struct pci_dev *dev);
>  void pci_disable_bridge_window(struct pci_dev *dev);
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 1cf6ec960236..692752c71f71 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -3299,6 +3299,25 @@ static void pci_bus_rescan_done(struct pci_bus *bus)
>   pm_runtime_put(&bus->dev);
>  }
>  
> +static void pci_setup_bridges(struct pci_bus *bus)
> +{
> + struct pci_dev *dev;
> +
> + list_for_each_entry(dev, &bus->devices, bus_list) {
> + struct pci_bus *child;
> +
> + if (!pci_dev_is_added(dev) || pci_dev_is_ignored(dev))
> + continue;
> +
> + child = dev->subordinate;
> + if (child)
> + pci_setup_bridges(child);
> + }
> +
> + if (bus->self)
> + pci_setup_bridge(bus);
> +}
> +
>  /**
>   * pci_rescan_bus - Scan a PCI bus for devices
>   * @bus: PCI bus to scan
> @@ -3321,8 +3340,11 @@ unsigned int pci_rescan_bus(struct pci_bus *bus)
>   pci_bus_rescan_prepare(root);
>  
>   max = pci_scan_child_bus(root);
> +
> + pci_bus_release_root_bridge_resources(root);
>   pci_assign_unassigned_root_bus_resources(root);
>  
> + pci_setup_bridges(root);
>   pci_bus_rescan_done(root);
>   } else {
>   max = pci_scan_child_bus(bus);
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index be7d4e6d7b65..36a1907d9509 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -1584,7 +1584,7 @@ static void pci_bridge_release_resources(struct pci_bus *bus,
>   pci_printk(KERN_DEBUG, dev, "resource %d %pR released\n",
>   PCI_BRIDGE_RESOURCES + idx, r);
>   /* keep the old size */
> - r->end = resource_size(r) - 1;
> + r->end = pci_movable_bars_enabled() ? 0 : (resource_size(r) - 1);

Doesn't this mean we're throwing away the information about the BAR
size, and we'll have to size the BAR again somewhere?  I would like to
avoid that.  But I don't know yet where you rely on this, so maybe
it's not possible to avoid it.
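
For reference, the "keep the old size" idiom on the removed line can be
modelled in userspace as below.  struct res, res_size() and
release_keep_size() are illustrative names for this sketch only; the
point is that a released resource rebased to start at 0 still carries
its size, which the proposed "r->end = 0" would discard.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal stand-in for the fields of struct resource used here. */
struct res {
	uint64_t start;
	uint64_t end;		/* inclusive */
	unsigned long flags;
};

/* Same definition as the kernel's resource_size(): end - start + 1. */
static uint64_t res_size(const struct res *r)
{
	return r->end - r->start + 1;
}

/* Release the resource but remember how big it was. */
static void release_keep_size(struct res *r)
{
	uint64_t size = res_size(r);

	r->end = size - 1;	/* must be computed before start is cleared */
	r->start = 0;
	r->flags = 0;

	assert(res_size(r) == size);
}
```

With r->end forced to 0 instead, res_size() would report 1 after the
release, so the real size would have to be recovered some other way.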

>   r->start = 0;
>   r->flags = 0;
>  
> @@ -1637,6 +1637,15 @@ static void pci_bus_release_bridge_resources(struct pci_bus *bus,
>   pci_bridge_release_resources(bus, type);
>  }
>  
> +void pci_bus_release_root_bridge_resources(struct pci_bus *root_bus)
> +{
> + pci_bus_release_bridge_resources(root_bus, IORESOURCE_IO, whole_subtree);
> + pci_bus_release_bridge_resources(root_bus, IORESOURCE_MEM, whole_subtree);
> + pci_bus_release_bridge_resources(root_bus,
> +  IORESOURCE_MEM_64 | IORESOURCE_PREFETCH,
> +  whole_subtree);
> +}
> +
>  static void pci_bus_dump_res(struct pci_bus *bus)
>  {
>   struct resource *res;
> -- 
> 2.20.1
> 


Re: [PATCH RFC v4 10/21] PCI: Fix assigning of fixed prefetchable resources

2019-03-26 Thread Bjorn Helgaas
On Mon, Mar 11, 2019 at 04:31:11PM +0300, Sergey Miroshnichenko wrote:
> Allow matching them to non-prefetchable windows, as it is done for movable
> resources.

Please make the commit log complete in itself, without requiring the
subject.  It's OK if you have to repeat the subject.

IIUC, this is actually a bug fix and is not strictly related to
movable resources.  We should be able to have an IORESOURCE_PCI_FIXED
prefetchable BAR in a non-prefetchable window.

I suppose movable windows exposes this case because as currently
implemented, it marks many more BARs as IORESOURCE_PCI_FIXED.  I think
we should use something other than IORESOURCE_PCI_FIXED for that case,
so maybe this patch will end up being unnecessary?
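
The matching rule the quoted hunk implements can be written as a small
standalone predicate.  This is only a sketch: window_can_hold() is a
made-up name, and the flag values are copied here on the assumption that
they match include/linux/ioport.h, so that the example builds outside
the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror include/linux/ioport.h; redefined for a standalone build. */
#define IORESOURCE_TYPE_BITS	0x00001f00UL
#define IORESOURCE_IO		0x00000100UL
#define IORESOURCE_MEM		0x00000200UL
#define IORESOURCE_PREFETCH	0x00002000UL

/*
 * May a BAR with bar_flags be placed in a bridge window with
 * window_flags?  A prefetchable BAR may live in a non-prefetchable
 * window, but not the other way around.
 */
static bool window_can_hold(unsigned long window_flags, unsigned long bar_flags)
{
	/* The caller is expected to pass a typed (I/O or MEM) BAR. */
	assert(bar_flags & IORESOURCE_TYPE_BITS);

	if ((window_flags & IORESOURCE_TYPE_BITS) !=
	    (bar_flags & IORESOURCE_TYPE_BITS))
		return false;

	if ((window_flags & IORESOURCE_PREFETCH) &&
	    !(bar_flags & IORESOURCE_PREFETCH))
		return false;

	return true;
}
```

The old code compared IORESOURCE_PREFETCH as part of the mask, so the
first case in the comment was rejected even though it is legal.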

> Signed-off-by: Sergey Miroshnichenko 
> ---
>  drivers/pci/setup-bus.c | 13 +
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index 3644feb13179..be7d4e6d7b65 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -1301,15 +1301,20 @@ static void assign_fixed_resource_on_bus(struct pci_bus *b, struct resource *r)
>  {
>   int i;
>   struct resource *parent_r;
> - unsigned long mask = IORESOURCE_IO | IORESOURCE_MEM |
> -  IORESOURCE_PREFETCH;
> + unsigned long mask = IORESOURCE_TYPE_BITS;
>  
>   pci_bus_for_each_resource(b, parent_r, i) {
>   if (!parent_r)
>   continue;
>  
> - if ((r->flags & mask) == (parent_r->flags & mask) &&
> - resource_contains(parent_r, r))
> + if ((r->flags & mask) != (parent_r->flags & mask))
> + continue;
> +
> + if (parent_r->flags & IORESOURCE_PREFETCH &&
> + !(r->flags & IORESOURCE_PREFETCH))
> + continue;
> +
> + if (resource_contains(parent_r, r))
>   request_resource(parent_r, r);
>   }
>  }
> -- 
> 2.20.1
> 


Re: [PATCH RFC v4 09/21] PCI: Mark immovable BARs with PCI_FIXED

2019-03-26 Thread Bjorn Helgaas
On Mon, Mar 11, 2019 at 04:31:10PM +0300, Sergey Miroshnichenko wrote:
> If a PCIe device driver doesn't yet have support for movable BARs,
> mark device's BARs with IORESOURCE_PCI_FIXED.

I'm hesitant about using IORESOURCE_PCI_FIXED for this purpose.  That
was originally added to describe resources that can not be changed
because they're hardwired in the device, e.g., legacy resources and
Enhanced Allocation resources.

In general, I think the bits in res->flags should tell us things about
the hardware.  This particular use would be something about the
*driver*, and I think we should figure that out by looking at
dev->driver.
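
Something along these lines, perhaps.  This is a userspace sketch with
made-up type and function names (pci_dev_model, pci_driver_model,
dev_bars_movable()); it leaves out the VGA special case from the quoted
hunk and assumes a driver counts as movable-aware only if it supplies
both callbacks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct pci_dev_model;

/* Only the two callbacks this series adds to struct pci_driver. */
struct pci_driver_model {
	void (*rescan_prepare)(struct pci_dev_model *dev);
	void (*rescan_done)(struct pci_dev_model *dev);
};

struct pci_dev_model {
	const struct pci_driver_model *driver;	/* NULL when unbound */
	bool hw_fixed;	/* legacy or EA resources: fixed by the hardware */
};

/* Placeholder callback so the example can build a "movable-aware" driver. */
static void noop_cb(struct pci_dev_model *dev)
{
	(void)dev;
}

/*
 * Decide movability at the point of use instead of caching it in
 * res->flags: hardware properties come from the device, the rest from
 * whatever driver is bound right now.
 */
static bool dev_bars_movable(const struct pci_dev_model *dev)
{
	assert(dev != NULL);

	if (dev->hw_fixed)
		return false;
	if (!dev->driver)
		return true;	/* nobody is using the BARs */

	return dev->driver->rescan_prepare != NULL &&
	       dev->driver->rescan_done != NULL;
}
```

Because nothing is stored in the resource, the answer changes by itself
when a driver is unbound or replaced by one that supports moving BARs.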

> Signed-off-by: Sergey Miroshnichenko 
> ---
>  drivers/pci/probe.c | 15 +++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index dc935f82a595..1cf6ec960236 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -3262,6 +3262,21 @@ static void pci_bus_rescan_prepare(struct pci_bus *bus)
>   } else if (dev->driver &&
>  dev->driver->rescan_prepare) {
>   dev->driver->rescan_prepare(dev);
> + } else if (dev->driver || ((dev->class >> 8) == PCI_CLASS_DISPLAY_VGA)) {
> + int i;
> +
> + for (i = 0; i < PCI_NUM_RESOURCES; i++) {
> + struct resource *r = &dev->resource[i];
> +
> + if (!r->flags || !r->parent ||
> + (r->flags & IORESOURCE_UNSET) ||
> + (r->flags & IORESOURCE_PCI_FIXED))
> + continue;
> +
> + r->flags |= IORESOURCE_PCI_FIXED;
> + pci_warn(dev, "%s: no support for movable BARs, mark BAR %d (%pR) as fixed\n",
> +  __func__, i, r);
> + }
>   }
>   }
>  }
> -- 
> 2.20.1
> 


Re: [PATCH RFC v4 08/21] nvme-pci: Handle movable BARs

2019-03-26 Thread Bjorn Helgaas
[+cc Keith, Jens, Christoph, Sagi, linux-nvme, LKML]

On Mon, Mar 11, 2019 at 04:31:09PM +0300, Sergey Miroshnichenko wrote:
> Hotplugged devices can affect the existing ones by moving their BARs.
> PCI subsystem will inform the NVME driver about this by invoking
> reset_prepare()+reset_done(), then iounmap()+ioremap() must be called.

Do you mean the PCI core will invoke ->rescan_prepare() and
->rescan_done() (as opposed to *reset*)?

> Signed-off-by: Sergey Miroshnichenko 
> ---
>  drivers/nvme/host/pci.c | 29 +++--
>  1 file changed, 27 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 92bad1c810ac..ccea3033a67a 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -106,6 +106,7 @@ struct nvme_dev {
>   unsigned int num_vecs;
>   int q_depth;
>   u32 db_stride;
> + resource_size_t current_phys_bar;
>   void __iomem *bar;
>   unsigned long bar_mapped_size;
>   struct work_struct remove_work;
> @@ -1672,13 +1673,16 @@ static int nvme_remap_bar(struct nvme_dev *dev, unsigned long size)
>  {
>   struct pci_dev *pdev = to_pci_dev(dev->dev);
>  
> - if (size <= dev->bar_mapped_size)
> + if (dev->bar &&
> + dev->current_phys_bar == pci_resource_start(pdev, 0) &&
> + size <= dev->bar_mapped_size)
>   return 0;
>   if (size > pci_resource_len(pdev, 0))
>   return -ENOMEM;
>   if (dev->bar)
>   iounmap(dev->bar);
> - dev->bar = ioremap(pci_resource_start(pdev, 0), size);
> + dev->current_phys_bar = pci_resource_start(pdev, 0);
> + dev->bar = ioremap(dev->current_phys_bar, size);

dev->current_phys_bar is different from pci_resource_start() in the
case where the PCI core has moved the nvme BAR, but nvme has not yet
remapped it.

I'm not sure it's worth keeping track of current_phys_bar, as opposed
to always unmapping and remapping.  Is this a performance path?  I
think there are advantages to always exercising the same code path,
regardless of whether the BAR happened to be moved, e.g., if there's a
bug in the "BAR moved" path, it may be a heisenbug because whether we
exercise that path depends on the current configuration.

If you do need to cache current_phys_bar, maybe this, so it's a little
easier to see that you're not changing the ioremap() itself:

  dev->bar = ioremap(pci_resource_start(pdev, 0), size);
  dev->current_phys_bar = pci_resource_start(pdev, 0);
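
For comparison, the "always unmap and remap" alternative could look
roughly like the userspace model below.  struct bar_model and
remap_bar_always() are illustrative names, not nvme code; the real
ioremap()/iounmap() calls are reduced to bookkeeping, and the existing
"size <= bar_mapped_size" shortcut is deliberately dropped.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model of the state nvme_remap_bar() works with. */
struct bar_model {
	uint64_t res_start;	/* models pci_resource_start(pdev, 0) */
	uint64_t res_len;	/* models pci_resource_len(pdev, 0) */
	uint64_t mapped_phys;	/* physical address behind the live mapping */
	uint64_t mapped_size;
	int is_mapped;
	unsigned int map_calls;	/* how often the modelled ioremap() ran */
};

/* Always tear down and rebuild the mapping; no "did it move?" test. */
static int remap_bar_always(struct bar_model *b, uint64_t size)
{
	assert(b != NULL);

	if (size > b->res_len)
		return -ENOMEM;

	if (b->is_mapped)
		b->is_mapped = 0;		/* models iounmap() */

	b->mapped_phys = b->res_start;		/* models ioremap() */
	b->mapped_size = size;
	b->is_mapped = 1;
	b->map_calls++;
	return 0;
}
```

Here the same statements run whether or not the PCI core moved the BAR,
so a bug in the remap path cannot hide behind a configuration in which
the BAR happens to stay put.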

>   if (!dev->bar) {
>   dev->bar_mapped_size = 0;
>   return -ENOMEM;
> @@ -2504,6 +2508,8 @@ static void nvme_reset_work(struct work_struct *work)
>   if (WARN_ON(dev->ctrl.state != NVME_CTRL_RESETTING))
>   goto out;
>  
> + nvme_remap_bar(dev, db_bar_size(dev, 0));

How is this change connected to rescan?  This looks reset-related.

>   /*
>* If we're called to reset a live controller first shut it down before
>* moving on.
> @@ -2910,6 +2916,23 @@ static void nvme_error_resume(struct pci_dev *pdev)
>   flush_work(&dev->ctrl.reset_work);
>  }
>  
> +void nvme_rescan_prepare(struct pci_dev *pdev)
> +{
> + struct nvme_dev *dev = pci_get_drvdata(pdev);
> +
> + nvme_dev_disable(dev, false);
> + nvme_dev_unmap(dev);
> + dev->bar = NULL;
> +}
> +
> +void nvme_rescan_done(struct pci_dev *pdev)
> +{
> + struct nvme_dev *dev = pci_get_drvdata(pdev);
> +
> + nvme_dev_map(dev);
> + nvme_reset_ctrl_sync(&dev->ctrl);
> +}
> +
>  static const struct pci_error_handlers nvme_err_handler = {
>   .error_detected = nvme_error_detected,
>   .slot_reset = nvme_slot_reset,
> @@ -2974,6 +2997,8 @@ static struct pci_driver nvme_driver = {
>   },
>   .sriov_configure = pci_sriov_configure_simple,
>   .err_handler= &nvme_err_handler,
> + .rescan_prepare = nvme_rescan_prepare,
> + .rescan_done= nvme_rescan_done,
>  };
>  
>  static int __init nvme_init(void)
> -- 
> 2.20.1
> 


Re: [PATCH RFC v4 07/21] PCI: Wake up bridges during rescan when movable BARs enabled

2019-03-26 Thread Bjorn Helgaas
On Mon, Mar 11, 2019 at 04:31:08PM +0300, Sergey Miroshnichenko wrote:
> Use the PM runtime methods to wake up the bridges before accessing
> their config space.
> 
> Signed-off-by: Sergey Miroshnichenko 
> ---
>  drivers/pci/probe.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 88350dd56344..dc935f82a595 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -3252,6 +3252,8 @@ static void pci_bus_rescan_prepare(struct pci_bus *bus)
>  {
>   struct pci_dev *dev;
>  
> + pm_runtime_get_sync(&bus->dev);

This should be part of the patch that adds the config space access
so we can tell specifically what code requires the wakeup.

>   list_for_each_entry(dev, &bus->devices, bus_list) {
>   struct pci_bus *child = dev->subordinate;
>  
> @@ -3278,6 +3280,8 @@ static void pci_bus_rescan_done(struct pci_bus *bus)
>   dev->driver->rescan_done(dev);
>   }
>   }
> +
> + pm_runtime_put(&bus->dev);
>  }
>  
>  /**
> -- 
> 2.20.1
> 


Re: [PATCH RFC v4 05/21] PCI: hotplug: Add a flag for the movable BARs feature

2019-03-26 Thread Bjorn Helgaas
On Mon, Mar 11, 2019 at 04:31:06PM +0300, Sergey Miroshnichenko wrote:
> If a new PCIe device has been hot-plugged between the two active ones
> without big enough gap between their BARs, 

Just to speak precisely here, a hot-added device is not "between" two
active ones because the new device has zeros in its BARs.

BARs from different devices can be interleaved arbitrarily, subject to
bridge window constraints, so we can really only speak about a *BAR*
(not the entire device) being between two other BARs.

Also, I don't think there's anything here that is PCIe-specific, so we
should talk about "PCI", not "PCIe".

> these BARs should be moved
> if their drivers support this feature. The drivers should be notified
> and paused during the procedure:
> 
> 1) dev 8 (new)
>|
>v
> .. |  dev 3  |  dev 3  |  dev 5  |  dev 7  |
> .. |  BAR 0  |  BAR 1  |  BAR 0  |  BAR 0  |
> 
> 2) dev 8
>  |
>  v
> .. |  dev 3  |  dev 3  | -->   --> |  dev 5  |  dev 7  |
> .. |  BAR 0  |  BAR 1  | -->   --> |  BAR 0  |  BAR 0  |
> 
>  3)
> 
> .. |  dev 3  |  dev 3  |  dev 8  |  dev 8  |  dev 5  |  dev 7  |
> .. |  BAR 0  |  BAR 1  |  BAR 0  |  BAR 1  |  BAR 0  |  BAR 0  |
> 
> Thus, prior reservation of memory regions by BIOS/bootloader/firmware
> is not required anymore for the PCIe hotplug.
> 
> The PCI_MOVABLE_BARS flag is set by the platform if this feature is
> supported and tested, but can be overridden by the following command
> line option:
> pcie_movable_bars={ off | force }

A chicken switch to turn this functionality off is OK, but I think it
should be enabled by default.  There isn't anything about this that's
platform-specific, is there?
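
A sketch of what "on by default, with an off switch" might look like,
modelled in userspace.  The function names are placeholders for this
example, and the parser is called directly here where the kernel would
use __setup(); unknown values are simply ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Default state: feature enabled unless the user turns it off. */
static bool movable_bars_off;

/* Models a handler for a "...movable_bars=off" command line option. */
static int movable_bars_setup(const char *str)
{
	assert(str != NULL);

	if (!strcmp(str, "off"))
		movable_bars_off = true;

	return 1;
}

/* No platform flag and no "force": only the chicken switch matters. */
static bool movable_bars_enabled(void)
{
	return !movable_bars_off;
}
```

Compared with the quoted pci_movable_bars_enabled(), this drops both the
PCI_MOVABLE_BARS platform flag and the "force" value, since neither is
needed once the default is "enabled".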

> Signed-off-by: Sergey Miroshnichenko 
> ---
>  .../admin-guide/kernel-parameters.txt |  7 ++
>  drivers/pci/pci.c | 24 +++
>  include/linux/pci.h   |  2 ++
>  3 files changed, 33 insertions(+)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 2b8ee90bb644..d40eaf993f80 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3417,6 +3417,13 @@
>   nomsi   Do not use MSI for native PCIe PME signaling (this makes
>   all PCIe root ports use INTx for all services).
>  
> + pcie_movable_bars=[PCIE]
> + Override the movable BARs support detection:
> + off
> + Disable even if supported by the platform
> + force
> + Enable even if not explicitly declared as supported
> +
>   pcmv=   [HW,PCMCIA] BadgePAD 4
>  
>   pd_ignore_unused
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 69898fe5255e..4dac49a887ec 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -139,6 +139,30 @@ static int __init pcie_port_pm_setup(char *str)
>  }
>  __setup("pcie_port_pm=", pcie_port_pm_setup);
>  
> +static bool pcie_movable_bars_off;
> +static bool pcie_movable_bars_force;
> +static int __init pcie_movable_bars_setup(char *str)
> +{
> + if (!strcmp(str, "off"))
> + pcie_movable_bars_off = true;
> + else if (!strcmp(str, "force"))
> + pcie_movable_bars_force = true;
> + return 1;
> +}
> +__setup("pcie_movable_bars=", pcie_movable_bars_setup);
> +
> +bool pci_movable_bars_enabled(void)
> +{
> + if (pcie_movable_bars_off)
> + return false;
> +
> + if (pcie_movable_bars_force)
> + return true;
> +
> + return pci_has_flag(PCI_MOVABLE_BARS);
> +}
> +EXPORT_SYMBOL(pci_movable_bars_enabled);
> +
>  /* Time to wait after a reset for device to become responsive */
>  #define PCIE_RESET_READY_POLL_MS 60000
>  
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index cb2760a31fe2..cbe661aff9f5 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -866,6 +866,7 @@ enum {
>   PCI_ENABLE_PROC_DOMAINS = 0x00000010,   /* Enable domains in /proc */
>   PCI_COMPAT_DOMAIN_0 = 0x00000020,   /* ... except domain 0 */
>   PCI_SCAN_ALL_PCIE_DEVS  = 0x00000040,   /* Scan all, not just dev 0 */
> + PCI_MOVABLE_BARS= 0x00000080,   /* Runtime BAR reassign after hotplug */
>  };
>  
>  /* These external functions are only available when PCI support is enabled */
> @@ -1345,6 +1346,7 @@ unsigned char pci_bus_max_busnr(struct pci_bus *bus);
>  void pci_setup_bridge(struct pci_bus *bus);
>  resource_size_t pcibios_window_alignment(struct pci_bus *bus,
>unsigned long type);
> +bool pci_movable_bars_enabled(void);
>  
>  #define PCI_VGA_STATE_CHANGE_BRIDGE (1 << 0)
>  #define PCI_VGA_STATE_CHANGE_DECODES (1 << 1)
> -- 
> 2.20.1

Re: [PATCH RFC v4 03/21] PCI: Enable bridge's I/O and MEM access for hotplugged devices

2019-03-26 Thread Bjorn Helgaas
On Mon, Mar 11, 2019 at 04:31:04PM +0300, Sergey Miroshnichenko wrote:
> After updating the bridge window resources, the PCI_COMMAND_IO and
> PCI_COMMAND_MEMORY bits of the bridge must be addressed as well.
> 
> Signed-off-by: Sergey Miroshnichenko 
> ---
>  drivers/pci/pci.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 895201d4c9e6..69898fe5255e 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1622,6 +1622,14 @@ static void pci_enable_bridge(struct pci_dev *dev)
>   pci_enable_bridge(bridge);
>  
>   if (pci_is_enabled(dev)) {
> + int i, bars = 0;
> +
> + for (i = PCI_BRIDGE_RESOURCES; i < DEVICE_COUNT_RESOURCE; i++) {
> + if (dev->resource[i].flags & (IORESOURCE_MEM | IORESOURCE_IO))
> + bars |= (1 << i);
> + }
> + do_pci_enable_device(dev, bars);

In what situation is this needed, exactly?  This code already exists
in pci_enable_device_flags().  Why isn't that enough?

I guess maybe there's some case where we enable the bridge, then
assign bridge windows, then enable a downstream device?

Does this fix a bug with current hotplug?

>   if (!dev->is_busmaster)
>   pci_set_master(dev);
>   mutex_unlock(&dev->enable_mutex);
> -- 
> 2.20.1
> 


Re: [PATCH RFC v4 02/21] PCI: Fix race condition in pci_enable/disable_device()

2019-03-26 Thread Bjorn Helgaas
[+cc Srinath, Marta, LKML]

On Mon, Mar 11, 2019 at 04:31:03PM +0300, Sergey Miroshnichenko wrote:
>  CPU0                                  CPU1
> 
>  pci_enable_device_mem()               pci_enable_device_mem()
>    pci_enable_bridge()                   pci_enable_bridge()
>      pci_is_enabled()
>        return false;
>      atomic_inc_return(enable_cnt)
>      Start actual enabling the bridge
>      ...                                 pci_is_enabled()
>      ...                                   return true;
>      ...                                 Start memory requests <-- FAIL
>      ...
>      Set the PCI_COMMAND_MEMORY bit <-- Must wait for this
> 
> This patch protects the pci_enable/disable_device() and pci_enable_bridge()
> with mutexes.

This is a subtle issue that we've tried to fix before, but we've never
had a satisfactory solution, so I hope you've figured out the right
fix.

I'll include some links to previous discussion.  This patch is very
similar to [2], which we didn't actually apply.  We did apply the
patch from [3] as 40f11adc7cd9 ("PCI: Avoid race while enabling
upstream bridges"), but it caused the regressions reported in [4,5],
so we reverted it with 0f50a49e3008 ("Revert "PCI: Avoid race while
enabling upstream bridges"").

I think the underlying design problem is that we have a driver for
device B calling pci_enable_device(), and it is changing the state of
device A (an upstream bridge).  The model generally is that a driver
should only touch the device it is bound to.

It's tricky to get the locking right when several children of device A
all need to operate on A.

That's all to say I'll have to think carefully about this particular
patch, so I'll go on to the others and come back to this one.

Bjorn

[1] 
https://lore.kernel.org/linux-pci/1494256190-28993-1-git-send-email-srinath.man...@broadcom.com/T/#u
[RFC PATCH] pci: Concurrency issue in NVMe Init through PCIe switch

[2] 
https://lore.kernel.org/linux-pci/1496135297-19680-1-git-send-email-srinath.man...@broadcom.com/T/#u
[RFC PATCH v2] pci: Concurrency issue in NVMe Init through PCIe switch

[3] 
https://lore.kernel.org/linux-pci/1501858648-8-1-git-send-email-srinath.man...@broadcom.com/T/#u
[RFC PATCH v3] pci: Concurrency issue during pci enable bridge

[4] 
https://lore.kernel.org/linux-pci/150547971091.977464.16294045866179907260.stgit@buzz/T/#u
[PATCH bisected regression in 4.14] PCI: fix race while enabling upstream 
bridges concurrently

[5] 
https://lore.kernel.org/linux-wireless/04c9b578-693c-1dc6-9f0f-904580231...@kernel.dk/T/#u
iwlwifi firmware load broken in current -git

[6] 
https://lore.kernel.org/linux-pci/744877924.5841545.1521630049567.javamail.zim...@kalray.eu/T/#u
[RFC PATCH] nvme: avoid race-conditions when enabling devices

> Signed-off-by: Sergey Miroshnichenko 
> ---
>  drivers/pci/pci.c   | 26 ++
>  drivers/pci/probe.c |  1 +
>  include/linux/pci.h |  1 +
>  3 files changed, 24 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index f006068be209..895201d4c9e6 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1615,6 +1615,8 @@ static void pci_enable_bridge(struct pci_dev *dev)
>   struct pci_dev *bridge;
>   int retval;
>  
> + mutex_lock(&dev->enable_mutex);
> +
>   bridge = pci_upstream_bridge(dev);
>   if (bridge)
>   pci_enable_bridge(bridge);
> @@ -1622,6 +1624,7 @@ static void pci_enable_bridge(struct pci_dev *dev)
>   if (pci_is_enabled(dev)) {
>   if (!dev->is_busmaster)
>   pci_set_master(dev);
> + mutex_unlock(&dev->enable_mutex);
>   return;
>   }
>  
> @@ -1630,11 +1633,14 @@ static void pci_enable_bridge(struct pci_dev *dev)
>   pci_err(dev, "Error enabling bridge (%d), continuing\n",
>   retval);
>   pci_set_master(dev);
> + mutex_unlock(&dev->enable_mutex);
>  }
>  
>  static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
>  {
>   struct pci_dev *bridge;
> + /* Enable-locking of bridges is performed within the 
> pci_enable_bridge() */
> + bool need_lock = !dev->subordinate;
>   int err;
>   int i, bars = 0;
>  
> @@ -1650,8 +1656,13 @@ static int pci_enable_device_flags(struct pci_dev 
> *dev, unsigned long flags)
>   dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK);
>   }
>  
> - if (atomic_inc_return(&dev->enable_cnt) > 1)
> + if (need_lock)
> + mutex_lock(&dev->enable_mutex);
> + if (pci_is_enabled(dev)) {
> + if (need_lock)
> + mutex_unlock(&dev->enable_mutex);
>   return 0;   /* already enabled */
> + }
>  
>   bridge = pci_upstream_bridge(dev);
>   if (bridge)
> @@ -1666,8 +1677,10 @@ static int pci_enable_device_flags(struct pci_dev 
> *dev, unsigned long flags)
>   

Re: [PATCH RFC v4 01/21] PCI: Fix writing invalid BARs during pci_restore_state()

2019-03-26 Thread Bjorn Helgaas
Hi Sergey,

Thanks for all your work here.  This is a long-standing problem, and
I'm glad you're working on it.

On Mon, Mar 11, 2019 at 04:31:02PM +0300, Sergey Miroshnichenko wrote:
> If BAR movement has happened (due to PCIe hotplug) after pci_save_state(),
> the saved addresses will become outdated. Restore them to the most recently
> calculated values, not the ones stored at an arbitrary moment.

Maybe pci_save_state() should not even save BAR values, since we have
no mechanism to determine whether those saved values are valid?

> Signed-off-by: Sergey Miroshnichenko 
> ---
>  drivers/pci/pci.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 7c1b362f599a..f006068be209 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1376,7 +1376,7 @@ static void pci_restore_config_space(struct pci_dev 
> *pdev)
>   if (pdev->hdr_type == PCI_HEADER_TYPE_NORMAL) {
>   pci_restore_config_space_range(pdev, 10, 15, 0, false);
>   /* Restore BARs before the command register. */
> - pci_restore_config_space_range(pdev, 4, 9, 10, false);
> + pci_restore_bars(pdev);

pci_restore_bars() is a much longer call path than
pci_restore_config_space_range(), so it's a little bit scary just from
the complexity point of view, but I think this does make sense.

But I am concerned that we don't handle bridge BARs the same way (this
is an existing problem, not something you're introducing).

Bridge BARs (if implemented) are dwords 4 and 5, so they are currently
restored as part of this range:

  pci_restore_config_space_range(pdev, 0, 8, 0, false);

If we followed the same pattern as for type 0 devices, this would look
like:

  pci_restore_config_space_range(pdev, 6, 8, 0, false);
  pci_restore_config_space_range(pdev, 4, 5, 10, false);  /* BARs */
  pci_restore_config_space_range(pdev, 0, 3, 0, false);

And after your patch, it would look like:

  pci_restore_config_space_range(pdev, 6, 8, 0, false);
  pci_restore_bars(pdev);
  pci_restore_config_space_range(pdev, 0, 3, 0, false);

I think this would require a little enhancement in pci_restore_bars()
to filter the BAR range based on the hdr_type.
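
A small userspace sketch of the write order the split ranges would
produce for a type 1 header.  It assumes the range helper walks each
range from "end" down to "start"; restore_log, model_restore_range()
and model_restore_bridge() are hypothetical names used only here, and
the retry argument is left out of the model.

```c
#include <assert.h>
#include <stddef.h>

/* Records the order in which config dwords would be written. */
struct restore_log {
	int dwords[16];
	size_t count;
};

/* Model of one range restore: assumed to walk from 'end' down to 'start'. */
static void model_restore_range(struct restore_log *log, int start, int end)
{
	assert(log);
	assert(start >= 0 && start <= end && end < 16);
	assert(log->count + (size_t)(end - start + 1) <= 16);

	for (int i = end; i >= start; i--)
		log->dwords[log->count++] = i;
}

/* The (0, 8) bridge range split into (6, 8); (4, 5); (0, 3). */
static void model_restore_bridge(struct restore_log *log)
{
	model_restore_range(log, 6, 8);
	model_restore_range(log, 4, 5);	/* bridge BARs */
	model_restore_range(log, 0, 3);	/* dword 1 holds the command register */
}
```

Under that assumption the dwords are written in the order 8, 7, 6, 5,
4, 3, 2, 1, 0, the same as for the single (0, 8) range, so the split by
itself would not reorder anything.  It only isolates the BAR dwords so
they can be handled separately.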

I would propose

  - adding a new patch to split up the bridge restore so the (0, 8)
range is split into (6, 8); (4, 5); (0, 3), so it matches the type
0 restore.

  - adding another new patch to filter the BAR range in
pci_restore_bars().

  - updating this patch to use pci_restore_bars() in both the type 0
and type 1 paths.

  - possibly adding a patch to make pci_save_state() not save BAR
values in dev->saved_config_space, and any other changes needed to
stop reading BARs from that area.

What do you think?

>   pci_restore_config_space_range(pdev, 0, 3, 0, false);
>   } else if (pdev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
>   pci_restore_config_space_range(pdev, 12, 15, 0, false);
> -- 
> 2.20.1
> 


Re: [PATCH] hotplug/drc-info: initialize fndit to zero

2019-03-20 Thread Bjorn Helgaas
[+cc Michael B (original author)]

On Sat, Mar 16, 2019 at 09:40:16PM +, Colin King wrote:
> From: Colin Ian King 
> 
> Currently variable fndit is not initialized and contains a
> garbage value, later it is set to 1 if a drc entry is found.
> Ensure fndit is not containing garbage by initializing it to
> zero. Also remove an extraneous space at the end of an
> sprintf call.
> 
> Detected by static analysis with cppcheck.
> 
> Fixes: 2fcf3ae508c2 ("hotplug/drc-info: Add code to search ibm,drc-info 
> property")
> Signed-off-by: Colin Ian King 

Michael E, I assume you'll take this since you took the original?
Let me know if you want me to.

> ---
>  drivers/pci/hotplug/rpaphp_core.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/rpaphp_core.c 
> b/drivers/pci/hotplug/rpaphp_core.c
> index bcd5d357ca23..28213f44f64a 100644
> --- a/drivers/pci/hotplug/rpaphp_core.c
> +++ b/drivers/pci/hotplug/rpaphp_core.c
> @@ -230,7 +230,7 @@ static int rpaphp_check_drc_props_v2(struct device_node 
> *dn, char *drc_name,
>   struct of_drc_info drc;
>   const __be32 *value;
>   char cell_drc_name[MAX_DRC_NAME_LEN];
> - int j, fndit;
> + int j, fndit = 0;
>  
>   info = of_find_property(dn->parent, "ibm,drc-info", NULL);
>   if (info == NULL)
> @@ -254,7 +254,7 @@ static int rpaphp_check_drc_props_v2(struct device_node 
> *dn, char *drc_name,
>   /* Found it */
>  
>   if (fndit)
> - sprintf(cell_drc_name, "%s%d", drc.drc_name_prefix, 
> + sprintf(cell_drc_name, "%s%d", drc.drc_name_prefix,
>   my_index);
>  
>   if (((drc_name == NULL) ||
> -- 
> 2.20.1
> 


Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER

2019-03-05 Thread Bjorn Helgaas
On Thu, Nov 15, 2018 at 05:16:01PM -0600, Alexandru Gagniuc wrote:
> Thanks to Keith for pointing out that it doesn't make sense to disable
> AER services when only one device has a FIRMWARE_FIRST HEST.
> 
> AER ownership is an interesting issue brought in by FFS (firmware-first)
> model. In a nutshell if FFS handles AER, then OS should not touch any
> of the AER bits. FW might set things up so that it receives AER
> notifications via SMI. It's theoretically possible to receive SCIs,
> but the exact mechanism is platform-dependent. OS touching AER bits
> when firmware owns them may interfere with these notifications.
> 
> The ACPI mechanism for negotiating control of AER is _OSC, and is
> described in detail in ACPI 6.2 Ch. 6.2.11.3. _OSC is negotiated at
> the root bus level. Any root port, switch, or endpoint under the bus
> would have its AER ownership negotiated in one _OSC call.
> 
> Then there is HEST, which is part of ACPI Platform Error Interfaces
> (APEI). HEST tables describe the errors that FW may report to the OS.
> Types 6, 7 and 8 HEST tables describe AER errors from PCIe devices.
> As part of this description, we're told if the error source is FFS.
> 
> Information in HEST seems to be redundant, as each error reported by
> FW will have a CPER table that describes it in detail.
> 
> Because HEST describes an error source as firmware-first or not, we've
> taken this to mean ownership of AER. Because AER ownership and error
> reporting are coupled, _OSC and HEST usually agree on the matter of
> ownership. However, that doesn't seem to be required by ACPI.
> 
> I've asked around a few people at Dell and they unanimously agree that
> _OSC is the correct way to determine ownership of AER. In Linux, we
> use the result of _OSC to enable AER services, but we use HEST to
> determine AER ownership. That's inconsistent. This series drops the
> use of HEST in favor of _OSC.
> 
> [1] https://lkml.org/lkml/2018/11/15/62
> 
> Alexandru Gagniuc (2):
>   PCI/AER: Do not use APEI/HEST to disable AER services globally
>   PCI/AER: Determine AER ownership based on _OSC instead of HEST
> 
>  drivers/acpi/pci_root.c  |  9 +
>  drivers/pci/pcie/aer.c   | 82 ++--
>  include/linux/pci-acpi.h |  6 ---
>  3 files changed, 5 insertions(+), 92 deletions(-)

I'm pretty sure we do need to do something here, but there was quite a
lot of discussion that didn't seem to really get resolved, so I'm
dropping these for now.

Please repost them with any relevant updates and we'll see if we can
get a consensus that we're going the right direction.

Bjorn


Re: [PATCH v1] PCI/AER: use match_string() helper to simplify the code

2019-01-29 Thread Bjorn Helgaas
On Mon, Jan 28, 2019 at 01:57:28PM +0200, Andy Shevchenko wrote:
> match_string() returns the index of an array for a matching string,
> which can be used instead of the open-coded implementation.
> 
> Signed-off-by: Andy Shevchenko 

Applied to pci/aer for v5.1, thanks!
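
For readers unfamiliar with the helper, here is a userspace sketch of
the lookup the commit message describes.  model_match_string() and
model_policy_str are hypothetical names; the model returns -1 on a
miss where the kernel helper returns a negative errno, and it assumes
an exact string comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace model: index of the first exact match, or -1 if there is none. */
static int model_match_string(const char * const *array, size_t n,
			      const char *string)
{
	assert(array && string);

	for (size_t i = 0; i < n; i++) {
		if (!array[i])
			break;		/* a NULL entry ends the search early */
		if (strcmp(array[i], string) == 0)
			return (int)i;
	}
	return -1;
}

/* Same strings as ecrc_policy_str[] in the patch, in the same order. */
static const char * const model_policy_str[] = { "bios", "off", "on" };
```

One visible difference in this model: the removed loop compared only
the first strlen(ecrc_policy_str[i]) characters, so a string such as
"onx" would have matched "on", while an exact comparison rejects it.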

> ---
>  drivers/pci/pcie/aer.c | 9 +++--
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index fed29de783e0..f8fc2114ad39 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -117,7 +117,7 @@ bool pci_aer_available(void)
>  
>  static int ecrc_policy = ECRC_POLICY_DEFAULT;
>  
> -static const char *ecrc_policy_str[] = {
> +static const char * const ecrc_policy_str[] = {
>   [ECRC_POLICY_DEFAULT] = "bios",
>   [ECRC_POLICY_OFF] = "off",
>   [ECRC_POLICY_ON] = "on"
> @@ -203,11 +203,8 @@ void pcie_ecrc_get_policy(char *str)
>  {
>   int i;
>  
> - for (i = 0; i < ARRAY_SIZE(ecrc_policy_str); i++)
> - if (!strncmp(str, ecrc_policy_str[i],
> -  strlen(ecrc_policy_str[i])))
> - break;
> - if (i >= ARRAY_SIZE(ecrc_policy_str))
> + i = match_string(ecrc_policy_str, ARRAY_SIZE(ecrc_policy_str), str);
> + if (i < 0)
>   return;
>  
>   ecrc_policy = i;
> -- 
> 2.20.1
> 


Re: [PATCH] PCI: Use of_node_name_eq for node name comparisons

2019-01-22 Thread Bjorn Helgaas
[+cc Michael B]

On Wed, Dec 05, 2018 at 01:50:34PM -0600, Rob Herring wrote:
> Convert string compares of DT node names to use of_node_name_eq helper
> instead. This removes direct access to the node name pointer.
> 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: Bjorn Helgaas 
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-...@vger.kernel.org
> Signed-off-by: Rob Herring 

I applied the pci/of.c change to pci/misc for v5.1, thanks!

I dropped the rpaphp_core.c part because there's another patch
in-flight [1] that touches the same code, and it would be easier if
Michael picked up this change and added it to his series.
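
A userspace sketch of the comparison the helper is meant to replace.
It assumes the helper compares the last path component of the node's
full name, up to any '@' unit address, against the given name;
model_node_name_eq() is a hypothetical stand-in that takes the full
name as a plain string.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Compare the node name (basename, without "@unit-address") to 'name'. */
static bool model_node_name_eq(const char *full_name, const char *name)
{
	const char *base;
	size_t len;

	if (!full_name || !name)
		return false;

	base = strrchr(full_name, '/');
	base = base ? base + 1 : full_name;
	len = strcspn(base, "@");	/* length up to '@' or end of string */

	return strlen(name) == len && strncmp(base, name, len) == 0;
}
```

In this model a missing name simply compares unequal, which is why the
separate "!dn->name" check in rpaphp_add_slot() is no longer needed.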

Bjorn

[1] 
https://lore.kernel.org/lkml/20181214205120.16435.46952.st...@powerkvm6.aus.stglabs.ibm.com/

> ---
>  drivers/pci/hotplug/rpaphp_core.c | 2 +-
>  drivers/pci/of.c  | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/rpaphp_core.c 
> b/drivers/pci/hotplug/rpaphp_core.c
> index bcd5d357ca23..7697d9c92b98 100644
> --- a/drivers/pci/hotplug/rpaphp_core.c
> +++ b/drivers/pci/hotplug/rpaphp_core.c
> @@ -354,7 +354,7 @@ int rpaphp_add_slot(struct device_node *dn)
>   const int *indexes, *names, *types, *power_domains;
>   char *name, *type;
>  
> - if (!dn->name || strcmp(dn->name, "pci"))
> + if (!of_node_name_eq(dn, "pci"))
>   return 0;
>  
>   /* If this is not a hotplug slot, return without doing anything. */
> diff --git a/drivers/pci/of.c b/drivers/pci/of.c
> index 4c4217d0c3f1..3d32da15c215 100644
> --- a/drivers/pci/of.c
> +++ b/drivers/pci/of.c
> @@ -113,7 +113,7 @@ struct device_node *of_pci_find_child_device(struct 
> device_node *parent,
>* a fake root for all functions of a multi-function
>* device we go down them as well.
>*/
> - if (!strcmp(node->name, "multifunc-device")) {
> + if (of_node_name_eq(node, "multifunc-device")) {
>   for_each_child_of_node(node, node2) {
>   if (__of_pci_pci_compare(node2, devfn)) {
>   of_node_put(node);
> -- 
> 2.19.1
> 


Re: [RFC 5/6] powerpc/pci/hotplug: Use common drcinfo parsing

2019-01-22 Thread Bjorn Helgaas
On Mon, Jan 14, 2019 at 06:28:57PM -0600, Bjorn Helgaas wrote:
> On Fri, Dec 14, 2018 at 02:51:31PM -0600, Michael Bringmann wrote:
> > > The pseries-specific drc info properties are currently
> > > implemented in pseries-specific and non-pseries-specific
> > files.  This patch set uses a new implementation of the device-tree
> > parsing code for the properties.
> > 
> > This patch refactors parsing of the pseries-specific drc-info properties
> > out of rpaphp_core.c to use the common parser.  In the case where an
> > architecture does not use these properties, an __weak copy of the
> > function is provided with dummy return values.  Changes include creating
> > appropriate callback functions and passing callback-specific data
> > blocks into arch_find_drc_match.  All functions that were used just
> > to support the previous parsing implementation have been moved.
> > 
> > Signed-off-by: Michael Bringmann 
> 
> This is fine with me.  Any rpaphp_core.c maintainers want to comment?
> Tyrel?
> 
> $ ./scripts/get_maintainer.pl -f drivers/pci/hotplug/rpaphp_core.c
> Tyrel Datwyler  (supporter:IBM Power PCI Hotplug 
> Driver for RPA-compliant...)
> Benjamin Herrenschmidt  (supporter:LINUX FOR 
> POWERPC (32-BIT AND 64-BIT))
> Paul Mackerras  (supporter:LINUX FOR POWERPC (32-BIT AND 
> 64-BIT))
> Michael Ellerman  (supporter:LINUX FOR POWERPC (32-BIT 
> AND 64-BIT))

This looks like part of a larger series, but I only got this patch.  So I
assume you'll route this elsewhere along with the rest of the series.
Here's my ack if it's useful:

Acked-by: Bjorn Helgaas 

> > ---
> >  drivers/pci/hotplug/rpaphp_core.c |  232 
> > -
> >  1 file changed, 28 insertions(+), 204 deletions(-)
> > 
> > diff --git a/drivers/pci/hotplug/rpaphp_core.c 
> > b/drivers/pci/hotplug/rpaphp_core.c
> > index bcd5d35..9ad7384 100644
> > --- a/drivers/pci/hotplug/rpaphp_core.c
> > +++ b/drivers/pci/hotplug/rpaphp_core.c
> > @@ -154,182 +154,18 @@ static enum pci_bus_speed get_max_bus_speed(struct 
> > slot *slot)
> > return speed;
> >  }
> >  
> > -static int get_children_props(struct device_node *dn, const int 
> > **drc_indexes,
> > -   const int **drc_names, const int **drc_types,
> > -   const int **drc_power_domains)
> > -{
> > -   const int *indexes, *names, *types, *domains;
> > -
> > -   indexes = of_get_property(dn, "ibm,drc-indexes", NULL);
> > -   names = of_get_property(dn, "ibm,drc-names", NULL);
> > -   types = of_get_property(dn, "ibm,drc-types", NULL);
> > -   domains = of_get_property(dn, "ibm,drc-power-domains", NULL);
> > -
> > -   if (!indexes || !names || !types || !domains) {
> > -   /* Slot does not have dynamically-removable children */
> > -   return -EINVAL;
> > -   }
> > -   if (drc_indexes)
> > -   *drc_indexes = indexes;
> > -   if (drc_names)
> > -   /* _names[1] contains NULL terminated slot names */
> > -   *drc_names = names;
> > -   if (drc_types)
> > -   /* _types[1] contains NULL terminated slot types */
> > -   *drc_types = types;
> > -   if (drc_power_domains)
> > -   *drc_power_domains = domains;
> > -
> > -   return 0;
> > -}
> > -
> > -
> >  /* Verify the existence of 'drc_name' and/or 'drc_type' within the
> > - * current node.  First obtain it's my-drc-index property.  Next,
> > - * obtain the DRC info from it's parent.  Use the my-drc-index for
> > - * correlation, and obtain/validate the requested properties.
> > + * current node.
> >   */
> >  
> > -static int rpaphp_check_drc_props_v1(struct device_node *dn, char 
> > *drc_name,
> > -   char *drc_type, unsigned int my_index)
> > -{
> > -   char *name_tmp, *type_tmp;
> > -   const int *indexes, *names;
> > -   const int *types, *domains;
> > -   int i, rc;
> > -
> > -   rc = get_children_props(dn->parent, &indexes, &names, &types, &domains);
> > -   if (rc < 0) {
> > -   return -EINVAL;
> > -   }
> > -
> > -   name_tmp = (char *) &names[1];
> > -   type_tmp = (char *) &types[1];
> > -
> > -   /* Iterate through parent properties, looking for my-drc-index */
> > -   for (i = 0; i < be32_to_cpu(indexes[0]); i++) {
> > -   if ((unsigned int) indexes[i + 1] == my_index)
> > -   break;
> > -
> > -   name_tmp += (strlen(name_tmp) + 1);
> > -

Re: [RFC 5/6] powerpc/pci/hotplug: Use common drcinfo parsing

2019-01-14 Thread Bjorn Helgaas
On Fri, Dec 14, 2018 at 02:51:31PM -0600, Michael Bringmann wrote:
> The pseries-specific drc info properties are currently
> implemented in pseries-specific and non-pseries-specific
> files.  This patch set uses a new implementation of the device-tree
> parsing code for the properties.
> 
> This patch refactors parsing of the pseries-specific drc-info properties
> out of rpaphp_core.c to use the common parser.  In the case where an
> architecture does not use these properties, an __weak copy of the
> function is provided with dummy return values.  Changes include creating
> appropriate callback functions and passing callback-specific data
> blocks into arch_find_drc_match.  All functions that were used just
> to support the previous parsing implementation have been moved.
> 
> Signed-off-by: Michael Bringmann 

This is fine with me.  Any rpaphp_core.c maintainers want to comment?
Tyrel?

$ ./scripts/get_maintainer.pl -f drivers/pci/hotplug/rpaphp_core.c
Tyrel Datwyler  (supporter:IBM Power PCI Hotplug 
Driver for RPA-compliant...)
Benjamin Herrenschmidt  (supporter:LINUX FOR POWERPC 
(32-BIT AND 64-BIT))
Paul Mackerras  (supporter:LINUX FOR POWERPC (32-BIT AND 
64-BIT))
Michael Ellerman  (supporter:LINUX FOR POWERPC (32-BIT AND 
64-BIT))

> ---
>  drivers/pci/hotplug/rpaphp_core.c |  232 
> -
>  1 file changed, 28 insertions(+), 204 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/rpaphp_core.c 
> b/drivers/pci/hotplug/rpaphp_core.c
> index bcd5d35..9ad7384 100644
> --- a/drivers/pci/hotplug/rpaphp_core.c
> +++ b/drivers/pci/hotplug/rpaphp_core.c
> @@ -154,182 +154,18 @@ static enum pci_bus_speed get_max_bus_speed(struct 
> slot *slot)
>   return speed;
>  }
>  
> -static int get_children_props(struct device_node *dn, const int 
> **drc_indexes,
> - const int **drc_names, const int **drc_types,
> - const int **drc_power_domains)
> -{
> - const int *indexes, *names, *types, *domains;
> -
> - indexes = of_get_property(dn, "ibm,drc-indexes", NULL);
> - names = of_get_property(dn, "ibm,drc-names", NULL);
> - types = of_get_property(dn, "ibm,drc-types", NULL);
> - domains = of_get_property(dn, "ibm,drc-power-domains", NULL);
> -
> - if (!indexes || !names || !types || !domains) {
> - /* Slot does not have dynamically-removable children */
> - return -EINVAL;
> - }
> - if (drc_indexes)
> - *drc_indexes = indexes;
> - if (drc_names)
> - /* _names[1] contains NULL terminated slot names */
> - *drc_names = names;
> - if (drc_types)
> - /* _types[1] contains NULL terminated slot types */
> - *drc_types = types;
> - if (drc_power_domains)
> - *drc_power_domains = domains;
> -
> - return 0;
> -}
> -
> -
>  /* Verify the existence of 'drc_name' and/or 'drc_type' within the
> - * current node.  First obtain it's my-drc-index property.  Next,
> - * obtain the DRC info from it's parent.  Use the my-drc-index for
> - * correlation, and obtain/validate the requested properties.
> + * current node.
>   */
>  
> -static int rpaphp_check_drc_props_v1(struct device_node *dn, char *drc_name,
> - char *drc_type, unsigned int my_index)
> -{
> - char *name_tmp, *type_tmp;
> - const int *indexes, *names;
> - const int *types, *domains;
> - int i, rc;
> -
> - rc = get_children_props(dn->parent, &indexes, &names, &types, &domains);
> - if (rc < 0) {
> - return -EINVAL;
> - }
> -
> - name_tmp = (char *) &names[1];
> - type_tmp = (char *) &types[1];
> -
> - /* Iterate through parent properties, looking for my-drc-index */
> - for (i = 0; i < be32_to_cpu(indexes[0]); i++) {
> - if ((unsigned int) indexes[i + 1] == my_index)
> - break;
> -
> - name_tmp += (strlen(name_tmp) + 1);
> - type_tmp += (strlen(type_tmp) + 1);
> - }
> -
> - if (((drc_name == NULL) || (drc_name && !strcmp(drc_name, name_tmp))) &&
> - ((drc_type == NULL) || (drc_type && !strcmp(drc_type, type_tmp))))
> - return 0;
> -
> - return -EINVAL;
> -}
> -
> -static int rpaphp_check_drc_props_v2(struct device_node *dn, char *drc_name,
> - char *drc_type, unsigned int my_index)
> -{
> - struct property *info;
> - unsigned int entries;
> - struct of_drc_info drc;
> - const __be32 *value;
> - char cell_drc_name[MAX_DRC_NAME_LEN];
> - int j, fndit;
> -
> - info = of_find_property(dn->parent, "ibm,drc-info", NULL);
> - if (info == NULL)
> - return -EINVAL;
> -
> - value = of_prop_next_u32(info, NULL, &entries);
> - if (!value)
> - return -EINVAL;
> -
> - for (j = 0; j < entries; j++) {
> - of_read_drc_info_cell(&info, &value, &drc);
> -
> - /* Should now know end of current entry */
> -
> - if (my_index > 

Kconfig label updates

2019-01-08 Thread Bjorn Helgaas
Hi,

I want to update the PCI Kconfig labels so they're more consistent and
useful to users, something like the patch below.  IIUC, the items
below are all IBM-related; please correct me if not.

I'd also like to expand (or remove) "RPA" because Google doesn't find
anything about "IBM RPA", except Robotic Process Automation, which I
think must be something else.

Is there some text expansion of RPA that we could use that would be
meaningful to a user, i.e., something he/she might find on a nameplate
or in a user manual?

Ideally the PCI Kconfig labels would match the terms used in
arch/.../Kconfig, e.g.,

  config PPC_POWERNV
bool "IBM PowerNV (Non-Virtualized) platform support"

  config PPC_PSERIES
bool "IBM pSeries & new (POWER5-based) iSeries"

  config MARCH_Z900
bool "IBM zSeries model z800 and z900"

  config MARCH_Z9_109
bool "IBM System z9"

Bjorn


diff --git a/drivers/pci/hotplug/Kconfig b/drivers/pci/hotplug/Kconfig
index e9f78eb390d2..1c1d145bfd84 100644
--- a/drivers/pci/hotplug/Kconfig
+++ b/drivers/pci/hotplug/Kconfig
@@ -112,7 +112,7 @@ config HOTPLUG_PCI_SHPC
  When in doubt, say N.
 
 config HOTPLUG_PCI_POWERNV
-   tristate "PowerPC PowerNV PCI Hotplug driver"
+   tristate "IBM PowerNV PCI Hotplug driver"
depends on PPC_POWERNV && EEH
select OF_DYNAMIC
help
@@ -125,10 +125,11 @@ config HOTPLUG_PCI_POWERNV
  When in doubt, say N.
 
 config HOTPLUG_PCI_RPA
-   tristate "RPA PCI Hotplug driver"
+   tristate "IBM Power Systems RPA PCI Hotplug driver"
depends on PPC_PSERIES && EEH
help
  Say Y here if you have a RPA system that supports PCI Hotplug.
+ This includes the earlier pSeries and iSeries.
 
  To compile this driver as a module, choose M here: the
  module will be called rpaphp.
@@ -136,7 +137,7 @@ config HOTPLUG_PCI_RPA
  When in doubt, say N.
 
 config HOTPLUG_PCI_RPA_DLPAR
-   tristate "RPA Dynamic Logical Partitioning for I/O slots"
+   tristate "IBM RPA Dynamic Logical Partitioning for I/O slots"
depends on HOTPLUG_PCI_RPA
help
  Say Y here if your system supports Dynamic Logical Partitioning
@@ -157,7 +158,7 @@ config HOTPLUG_PCI_SGI
  When in doubt, say N.
 
 config HOTPLUG_PCI_S390
-   bool "System z PCI Hotplug Support"
+   bool "IBM System z PCI Hotplug Support"
depends on S390 && 64BIT
help
  Say Y here if you want to use the System z PCI Hotplug


Re: [PATCH 1/2] PCI/IOV: provide flag to skip VF scanning

2019-01-01 Thread Bjorn Helgaas
On Fri, Dec 21, 2018 at 03:19:49PM +0100, Sebastian Ott wrote:
> Hello Bjorn,
> 
> On Thu, 20 Dec 2018, Bjorn Helgaas wrote:
> > I think the strategy is fine, but can you restructure the patches
> > like this:
> > 
> >   1) Factor out sriov_add_vfs() and sriov_del_vfs().  This makes no
> >  functional change at all.
> > 
> >   2) Add dev->no_vf_scan, set it in the s390 pcibios_add_device(), and
> >  test it in sriov_add_vfs() and sriov_del_vfs().
> > 
> > I think both pieces will be easier to review that way.
> 
> Done. I took the liberty to add Christoph's R-b to the first two patches
> since it's just a split of the patch he gave the R-b to.

Thanks.

It's really way too late to do this, but they're pretty trivial, and
I've been out longer than expected for vacation and illness, so I applied
these to pci/virtualization for v4.21.

Bjorn


Re: [PATCH 1/2] PCI/IOV: provide flag to skip VF scanning

2018-12-20 Thread Bjorn Helgaas
Hi Sebastian,

On Tue, Dec 18, 2018 at 11:16:49AM +0100, Sebastian Ott wrote:
> Provide a flag to skip scanning for new VFs after SRIOV enablement.
> This can be set by implementations for which the VFs are already
> reported by other means.
> 
> Signed-off-by: Sebastian Ott 
> ---
>  drivers/pci/iov.c   | 48 
>  include/linux/pci.h |  1 +
>  2 files changed, 37 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> index 9616eca3182f..3aa115ed3a65 100644
> --- a/drivers/pci/iov.c
> +++ b/drivers/pci/iov.c
> @@ -252,6 +252,27 @@ int __weak pcibios_sriov_disable(struct pci_dev *pdev)
>   return 0;
>  }
>  
> +static int sriov_add_vfs(struct pci_dev *dev, u16 num_vfs)
> +{
> + unsigned int i;
> + int rc;
> +
> + if (dev->no_vf_scan)
> + return 0;
> +
> + for (i = 0; i < num_vfs; i++) {
> + rc = pci_iov_add_virtfn(dev, i);
> + if (rc)
> + goto failed;
> + }
> + return 0;
> +failed:
> + while (i--)
> + pci_iov_remove_virtfn(dev, i);
> +
> + return rc;
> +}

I think the strategy is fine, but can you restructure the patches
like this:

  1) Factor out sriov_add_vfs() and sriov_del_vfs().  This makes no
     functional change at all.

  2) Add dev->no_vf_scan, set it in the s390 pcibios_add_device(), and
     test it in sriov_add_vfs() and sriov_del_vfs().

I think both pieces will be easier to review that way.
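
The add-then-unwind pattern in the factored-out helper can be sketched
in userspace as follows.  model_dev, model_add_virtfn() and
model_remove_virtfn() are hypothetical stand-ins, and fail_at exists
only to simulate a failure partway through.

```c
#include <assert.h>

#define MODEL_MAX_VFS 8

/* Hypothetical stand-in for the PF; this is not struct pci_dev. */
struct model_dev {
	int added[MODEL_MAX_VFS];	/* 1 while VF i is added */
	int fail_at;			/* index whose add fails, or -1 */
	unsigned int no_vf_scan;	/* skip scanning when set */
};

static int model_add_virtfn(struct model_dev *d, unsigned int i)
{
	if ((int)i == d->fail_at)
		return -1;
	d->added[i] = 1;
	return 0;
}

static void model_remove_virtfn(struct model_dev *d, unsigned int i)
{
	d->added[i] = 0;
}

/* Add VFs 0..num_vfs-1; on failure remove the ones already added. */
static int model_add_vfs(struct model_dev *d, unsigned int num_vfs)
{
	unsigned int i;
	int rc;

	assert(d && num_vfs <= MODEL_MAX_VFS);

	if (d->no_vf_scan)
		return 0;

	for (i = 0; i < num_vfs; i++) {
		rc = model_add_virtfn(d, i);
		if (rc)
			goto failed;
	}
	return 0;

failed:
	while (i--)		/* unwind VFs i-1 .. 0; VF i was never added */
		model_remove_virtfn(d, i);
	return rc;
}
```

Keeping the unwind inside the helper is what lets sriov_enable() drop
its own "failed" label and jump straight to err_pcibios.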

>  static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
>  {
>   int rc;
> @@ -337,21 +358,15 @@ static int sriov_enable(struct pci_dev *dev, int 
> nr_virtfn)
>   msleep(100);
>   pci_cfg_access_unlock(dev);
>  
> - for (i = 0; i < initial; i++) {
> - rc = pci_iov_add_virtfn(dev, i);
> - if (rc)
> - goto failed;
> - }
> + rc = sriov_add_vfs(dev, initial);
> + if (rc)
> + goto err_pcibios;
>  
>   kobject_uevent(&dev->dev.kobj, KOBJ_CHANGE);
>   iov->num_VFs = nr_virtfn;
>  
>   return 0;
>  
> -failed:
> - while (i--)
> - pci_iov_remove_virtfn(dev, i);
> -
>  err_pcibios:
>   iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
>   pci_cfg_access_lock(dev);
> @@ -368,17 +383,26 @@ static int sriov_enable(struct pci_dev *dev, int 
> nr_virtfn)
>   return rc;
>  }
>  
> -static void sriov_disable(struct pci_dev *dev)
> +static void sriov_del_vfs(struct pci_dev *dev)
>  {
> - int i;
>   struct pci_sriov *iov = dev->sriov;
> + int i;
>  
> - if (!iov->num_VFs)
> + if (dev->no_vf_scan)
>   return;
>  
>   for (i = 0; i < iov->num_VFs; i++)
>   pci_iov_remove_virtfn(dev, i);
> +}
> +
> +static void sriov_disable(struct pci_dev *dev)
> +{
> + struct pci_sriov *iov = dev->sriov;
> +
> + if (!iov->num_VFs)
> + return;
>  
> + sriov_del_vfs(dev);
>   iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
>   pci_cfg_access_lock(dev);
>   pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 11c71c4ecf75..f70b9ccd3e86 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -405,6 +405,7 @@ struct pci_dev {
>   unsigned int non_compliant_bars:1;   /* Broken BARs; ignore them */
>   unsigned int is_probed:1;            /* Device probing in progress */
>   unsigned int link_active_reporting:1;/* Device capable of reporting link active */
> + unsigned int no_vf_scan:1;           /* Don't scan for VF's after VF enablement */
>   pci_dev_flags_t dev_flags;
>   atomic_t enable_cnt; /* pci_enable_device has been called */
>  
> -- 
> 2.13.4
> 


Re: [PATCH] PCI/AER: only insert one element into kfifo

2018-12-14 Thread Bjorn Helgaas
On Wed, Dec 12, 2018 at 04:32:30PM +0800, Yanjiang Jin wrote:
> Commit ecae65e133f2 ("PCI/AER: Use kfifo_in_spinlocked() to
> insert locked elements") replaced kfifo_put() with kfifo_in_spinlocked().
> 
> But as "kfifo_in(fifo, buf, n)" describes:
> " * @n: number of elements to be added".
> 
> We want to insert only one element into kfifo, not "sizeof(entry) = 16".
> Without this patch, we would get 15 uninitialized elements.
> 
> Signed-off-by: Yanjiang Jin 

Since this fixes ecae65e133f2, which was applied for v4.20, I applied
this with Keith's reviewed-by to for-linus so we can get it into
v4.20.

For some reason the patch didn't apply, but I can't see why, so I
just applied it by hand.

Thanks!
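
A userspace sketch of why the count matters: for a fifo of records,
"n" counts elements, not bytes.  model_fifo, model_entry and
model_fifo_in() are hypothetical and only model the counting, not the
kernel kfifo implementation or its locking.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record type standing in for the AER recover entry. */
struct model_entry {
	unsigned int bus;
	unsigned int devfn;
};

#define MODEL_FIFO_SIZE 16

struct model_fifo {
	struct model_entry slots[MODEL_FIFO_SIZE];
	size_t in;			/* number of elements stored */
};

/* Copy up to n elements (not bytes) from buf; return how many were stored. */
static size_t model_fifo_in(struct model_fifo *f, const struct model_entry *buf,
			    size_t n)
{
	size_t room;

	assert(f && buf);

	room = MODEL_FIFO_SIZE - f->in;
	if (n > room)
		n = room;
	memcpy(&f->slots[f->in], buf, n * sizeof(*buf));
	f->in += n;
	return n;
}
```

Queueing a single record is model_fifo_in(&fifo, &entry, 1).  Passing
sizeof(entry) instead would ask the model to copy that many records
starting at the address of one record, reading past it.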

> ---
>  drivers/pci/pcie/aer.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index a90a919..fed29de 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1064,7 +1064,7 @@ void aer_recover_queue(int domain, unsigned int bus, 
> unsigned int devfn,
> .regs   = aer_regs,
> };
> 
> -   if (kfifo_in_spinlocked(&aer_recover_ring, &entry, sizeof(entry),
> +   if (kfifo_in_spinlocked(&aer_recover_ring, &entry, 1,
>  &aer_recover_ring_lock))
> schedule_work(&aer_recover_work);
> else
> --
> 1.8.3.1
> 


Re: [PATCH 0/2] sriov enablement on s390

2018-12-12 Thread Bjorn Helgaas
On Wed, Dec 05, 2018 at 02:45:14PM +0100, Sebastian Ott wrote:
> Hello Bjorn,
> 
> On Wed, 10 Oct 2018, Bjorn Helgaas wrote:
> > On Wed, Oct 10, 2018 at 02:55:07PM +0200, Sebastian Ott wrote:
> > > On Wed, 12 Sep 2018, Bjorn Helgaas wrote:
> > > > On Wed, Sep 12, 2018 at 02:34:09PM +0200, Sebastian Ott wrote:
> > > > > On s390 we currently handle SRIOV within firmware. Which means
> > > > > that the PF is under firmware control and not visible to operating
> > > > > systems. SRIOV enablement happens within firmware and VFs are
> > > > > passed through to logical partitions.
> > > > > 
> > > > > I'm working on a new mode where the PF is under operating system
> > > > > control (including SRIOV enablement). However we still need
> > > > > firmware support to access the VFs. The way this is supposed
> > > > > to work is that when firmware traps the SRIOV enablement it
> > > > > will present machine checks to the logical partition that
> > > > > triggered the SRIOV enablement and provide the VFs via hotplug
> > > > > events.
> > > > > 
> > > > > The problem I'm faced with is that the VF detection code in
> > > > > sriov_enable leads to unusable functions in s390.
> > > > 
> > > > We're moving away from the weak function implementation style.  Can
> > > > you take a look at Arnd's work here, which uses pci_host_bridge
> > > > callbacks instead?
> > > > 
> > > >   https://lkml.kernel.org/r/20180817102645.3839621-1-a...@arndb.de
> > > 
> > > What's the status of Arnd's patches - will they go upstream in the next
> > > couple of versions?
> > 
> > I hope so [1].  IIRC Arnd mentioned doing some minor updates, so I'm
> > waiting on that.
> > 
> > > What about my patches that I rebased on Arnd's branch
> > > will they be considered?
> > 
> > Definitely.  From my point of view they're just lined up behind Arnd's
> > patches.
> > 
> > [1] 
> > https://lore.kernel.org/linux-pci/20181002205903.gd120...@bhelgaas-glaptop.roam.corp.google.com
> 
> It appears like these patches are not in-line for the next merge window.
> Would it be possible to go with my original patches (using __weak
> functions)? (This would also make life easier with regards to backports)
> I can post patches to convert this to use function pointers once Arnd's
> patches make it to the kernel.

Yeah, sorry, I think we should just go with your original approach.

Can you repost those patches with minor changelog updates so
"git log --oneline" on the files looks consistent?  Also, capitalize
"PCI", "VF", etc., consistently when used in English text.

Bjorn


Re: [PATCH] PCI: Add no-D3 quirk for Mellanox ConnectX-[45]

2018-12-11 Thread Bjorn Helgaas
On Tue, Dec 11, 2018 at 6:38 PM David Gibson wrote:
>
> On Tue, Dec 11, 2018 at 08:01:43AM -0600, Bjorn Helgaas wrote:
> > Hi David,
> >
> > I see you're still working on this, but if you do end up going this
> > direction eventually, would you mind splitting this into two patches:
> > 1) rename the quirk to make it more generic (but not changing any
> > behavior), and 2) add the ConnectX devices to the quirk.  That way
> > the ConnectX change is smaller and more easily
> > understood/reverted/etc.
>
> Sure.  Would it make sense to send (1) as an independent cleanup,
> while I'm still working out exactly what (if anything) we need for
> (2)?

You could, but I don't think there's really much benefit in doing the
first without the second, and I think there is some value in handling
both patches at the same time.


Re: [PATCH] PCI: Add no-D3 quirk for Mellanox ConnectX-[45]

2018-12-11 Thread Bjorn Helgaas
Hi David,

I see you're still working on this, but if you do end up going this
direction eventually, would you mind splitting this into two patches:
1) rename the quirk to make it more generic (but not changing any
behavior), and 2) add the ConnectX devices to the quirk.  That way
the ConnectX change is smaller and more easily understood/reverted/etc.

On Thu, Dec 06, 2018 at 03:19:51PM +1100, David Gibson wrote:
> Mellanox ConnectX-5 IB cards (MT27800) seem to cause a call trace when
> unbound from their regular driver and attached to vfio-pci in order to pass
> them through to a guest.
> 
> This goes away if the disable_idle_d3 option is used, so it looks like a
> problem with the hardware handling D3 state.  To fix that more permanently,
> use a device quirk to disable D3 state for these devices.
> 
> We do this by renaming the existing quirk_no_ata_d3() more generally and
> attaching it to the ConnectX-[45] devices (0x15b3:0x1013).
> 
> Signed-off-by: David Gibson 
> ---
>  drivers/pci/quirks.c | 17 +++--
>  1 file changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 4700d24e5d55..add3f516ca12 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -1315,23 +1315,24 @@ static void quirk_ide_samemode(struct pci_dev *pdev)
>  }
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82801CA_10, 
> quirk_ide_samemode);
>  
> -/* Some ATA devices break if put into D3 */
> -static void quirk_no_ata_d3(struct pci_dev *pdev)
> +/* Some devices (including a number of ATA cards) break if put into D3 */
> +static void quirk_no_d3(struct pci_dev *pdev)
>  {
>   pdev->dev_flags |= PCI_DEV_FLAGS_NO_D3;
>  }
> +
>  /* Quirk the legacy ATA devices only. The AHCI ones are ok */
>  DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_SERVERWORKS, PCI_ANY_ID,
> - PCI_CLASS_STORAGE_IDE, 8, quirk_no_ata_d3);
> + PCI_CLASS_STORAGE_IDE, 8, quirk_no_d3);
>  DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_ATI, PCI_ANY_ID,
> - PCI_CLASS_STORAGE_IDE, 8, quirk_no_ata_d3);
> + PCI_CLASS_STORAGE_IDE, 8, quirk_no_d3);
>  /* ALi loses some register settings that we cannot then restore */
>  DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AL, PCI_ANY_ID,
> - PCI_CLASS_STORAGE_IDE, 8, quirk_no_ata_d3);
> + PCI_CLASS_STORAGE_IDE, 8, quirk_no_d3);
>  /* VIA comes back fine but we need to keep it alive or ACPI GTM failures
> occur when mode detecting */
>  DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_VIA, PCI_ANY_ID,
> - PCI_CLASS_STORAGE_IDE, 8, quirk_no_ata_d3);
> + PCI_CLASS_STORAGE_IDE, 8, quirk_no_d3);
>  
>  /*
>   * This was originally an Alpha-specific thing, but it really fits here.
> @@ -3367,6 +3368,10 @@ static void mellanox_check_broken_intx_masking(struct pci_dev *pdev)
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MELLANOX, PCI_ANY_ID,
>   mellanox_check_broken_intx_masking);
>  
> +/* Mellanox MT27800 (ConnectX-5) IB card seems to break with D3
> + * In particular this shows up when the device is bound to the vfio-pci 
> driver */

Follow usual multiline comment style, i.e.,

  /*
   * text ...
   * more text ...
   */

> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_MELLANOX, 
> PCI_DEVICE_ID_MELLANOX_CONNECTX4, quirk_no_d3)
> +
>  static void quirk_no_bus_reset(struct pci_dev *dev)
>  {
>   dev->dev_flags |= PCI_DEV_FLAGS_NO_BUS_RESET;
> -- 
> 2.19.2
> 


Re: [PATCH v2] PCI/MSI: Don't touch MSI bits when the PCI device is disconnected

2018-11-14 Thread Bjorn Helgaas
On Wed, Nov 14, 2018 at 07:22:04PM +, alex_gagn...@dellteam.com wrote:
> On 11/14/2018 12:00 AM, Bjorn Helgaas wrote:
> > On Tue, Nov 13, 2018 at 10:39:15PM +, alex_gagn...@dellteam.com wrote:
> >> On 11/12/2018 11:02 PM, Bjorn Helgaas wrote:
> >> ...
> >>> Do you think Linux observes the rule about not touching AER bits on
> >>> FFS?  I'm not sure it does.  I'm not even sure what section of the
> >>> spec is relevant.
> >>
> >> I haven't found any place where linux breaks this rule. I'm very
> >> confident that, unless otherwise instructed, we follow this rule.
> > 
> > Just to make sure we're on the same page, can you point me to this
> > rule?  I do see that OSPM must request control of AER using _OSC
> > before it touches the AER registers.  What I don't see is the
> > connection between firmware-first and the AER registers.
> 
> ACPI 6.2 - 6.2.11.3, Table 6-197:
> 
> PCI Express Advanced Error Reporting control:
>   * The firmware sets this bit to 1 to grant control over PCI Express 
> Advanced Error Reporting. If firmware allows the OS control of this 
> feature, then in the context of the _OSC method it must ensure that 
> error messages are routed to device interrupts as described in the PCI 
> Express Base Specification[...]

The PCIe Base Spec is pretty big, so I wish this reference were a
little more explicit.  I *guess* maybe it's referring to PCIe r4.0,
figure 6-3 in sec 6.2.6, where PCIe ERR_* Messages can be routed to
"INTx or MSI Error Interrupts" and/or "platform-specific System Error"
interrupts.

"Device interrupts" seems like it refers to the "INTx or MSI"
interrupts, not the platform-specific System Errors, so I would read
that as saying "if firmware grants OS control of AER via _OSC,
firmware must set the AER Reporting Enables in the AER Root Error
Command register."  But that seems a little silly because the OS now
*owns* the AER capability and it can set the AER Root Error Command
register itself if it wants to.
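
For reference, the "Reporting Enables" in question are three bits in the AER Root Error Command register. The following is a minimal user-space sketch, not kernel code: the bit values match the definitions in Linux's pci_regs.h, while the helper names and the plain `uint32_t` standing in for the register are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values as defined in Linux include/uapi/linux/pci_regs.h. */
#define PCI_ERR_ROOT_CMD_COR_EN      0x00000001u /* Correctable Error Reporting Enable */
#define PCI_ERR_ROOT_CMD_NONFATAL_EN 0x00000002u /* Non-Fatal Error Reporting Enable */
#define PCI_ERR_ROOT_CMD_FATAL_EN    0x00000004u /* Fatal Error Reporting Enable */

#define ROOT_CMD_ALL_EN (PCI_ERR_ROOT_CMD_COR_EN | \
			 PCI_ERR_ROOT_CMD_NONFATAL_EN | \
			 PCI_ERR_ROOT_CMD_FATAL_EN)

/* Hypothetical helper: true if any of the three Reporting Enables is set. */
static inline bool root_cmd_any_enabled(uint32_t root_cmd)
{
	return (root_cmd & ROOT_CMD_ALL_EN) != 0;
}

/*
 * Hypothetical helper: set all three Reporting Enables in a cached copy
 * of the register, leaving the other bits untouched.
 */
static inline void root_cmd_enable_all(uint32_t *root_cmd)
{
	assert(root_cmd != NULL);
	*root_cmd |= ROOT_CMD_ALL_EN;
}
```

Whoever owns the AER capability can do this read-modify-write itself, which is why a requirement that firmware pre-set these bits looks redundant once control has been granted to the OS.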

And I still don't see the connection here with Firmware-First.  I'm
pretty sure firmware could not be notified via INTx or MSI interrupts
because those are totally managed by OSPM.

> > The closest I can find is the "Enabled" field in the HEST PCIe
> > AER structures (ACPI v6.2, sec 18.3.2.4, .5, .6), where it says:
> > 
> >If the field value is 1, indicates this error source is
> >to be enabled.
> > 
> >If the field value is 0, indicates that the error source
> >is not to be enabled.
> > 
> >If FIRMWARE_FIRST is set in the flags field, the Enabled
> >field is ignored by the OSPM.
> > 
> > AFAICT, Linux completely ignores the Enabled field in these
> > structures.
> 
> I don't think ignoring the field is a problem:
>   * With FFS, OS should ignore it.
>   * Without FFS, we have control, and we get to make the decisions anyway.
> In the latter case we decide whether to use AER, independent of the crap 
> in ACPI. I'm not even sure why "Enabled" matters in native AER handling. 

It seems like these HEST structures are "here's how firmware thinks
you should set up AER on this device".  But I agree, I have no idea
how to interpret "Enabled".  The rest of the HEST fields cover all the
useful AER registers, including the Reporting Enables in the AER Root
Error Command register *and* the Error Reporting Enables in the Device
Control register.  So I don't know what the "Enabled" field adds to
all that.  What a mess.

> > For firmware-first to work, firmware has to get control.  How does
> > it get control?  How does OSPM know to either set up that
> > mechanism or keep its mitts off something firmware set up before
> > handoff?
> 
> My understanding is that, if FW keeps control of AER in _OSC, then
> it will have set things up to get notified instead of the OS. OSPM
> not touching AER bits is to make sure it doesn't mess up FW's setup.
> I think there are some proprietary bits in the root port to route
> interrupts to SMIs instead of the AER vectors.

It makes good sense that if OSPM doesn't have AER control, firmware
does all AER handling, including any setup for firmware-first
notification.  If we can assume that firmware-first notification is
done in some way the OS doesn't know about and can't mess up, that
would be awesome.
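
The ownership rule discussed here comes down to one bit in the _OSC control field. A minimal sketch, assuming the granted control bits have already been obtained from _OSC: the AER bit value matches OSC_PCI_EXPRESS_AER_CONTROL in Linux's include/linux/acpi.h, but `struct osc_result` and `os_owns_aer()` are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* _OSC control bit for PCIe AER, as defined in Linux include/linux/acpi.h. */
#define OSC_PCI_EXPRESS_AER_CONTROL 0x00000008u

/* Hypothetical container for the control bits the platform granted via _OSC. */
struct osc_result {
	uint32_t granted;
};

/*
 * True if the OS was granted AER control.  If this returns false, the OS
 * is expected to leave the AER registers to firmware.
 */
static inline bool os_owns_aer(const struct osc_result *osc)
{
	assert(osc != NULL);
	return (osc->granted & OSC_PCI_EXPRESS_AER_CONTROL) != 0;
}
```

Note that this check says nothing about how firmware gets notified when it keeps control, which is the open question in this thread.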

But I think the VMD model really has nothing to do with the APEI
firmware-first model.  With VMD, it sounds like OSPM owns the AER
capability and doesn't know firmware exists *except* that it has to be
careful not to step on firmware's interrupt.  So maybe we can handle it
separately.

Bjorn


Re: [PATCH v2] PCI/MSI: Don't touch MSI bits when the PCI device is disconnected

2018-11-13 Thread Bjorn Helgaas
On Tue, Nov 13, 2018 at 10:39:15PM +, alex_gagn...@dellteam.com wrote:
> On 11/12/2018 11:02 PM, Bjorn Helgaas wrote:
> > 
> > [EXTERNAL EMAIL]
> > Please report any suspicious attachments, links, or requests for sensitive 
> > information.

It looks like Dell's email system adds the above in such a way that the
email quoting convention suggests that *I* wrote it, when I did not.

> ...
> > Do you think Linux observes the rule about not touching AER bits on
> > FFS?  I'm not sure it does.  I'm not even sure what section of the
> > spec is relevant.
> 
> I haven't found any place where linux breaks this rule. I'm very 
> confident that, unless otherwise instructed, we follow this rule.

Just to make sure we're on the same page, can you point me to this
rule?  I do see that OSPM must request control of AER using _OSC
before it touches the AER registers.  What I don't see is the
connection between firmware-first and the AER registers.

The closest I can find is the "Enabled" field in the HEST PCIe
AER structures (ACPI v6.2, sec 18.3.2.4, .5, .6), where it says:

  If the field value is 1, indicates this error source is
  to be enabled.

  If the field value is 0, indicates that the error source
  is not to be enabled.

  If FIRMWARE_FIRST is set in the flags field, the Enabled
  field is ignored by the OSPM.

AFAICT, Linux completely ignores the Enabled field in these
structures.
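
To make the three quoted rules concrete, here is a small sketch of how an OS could interpret the field if it chose to honor it. The FIRMWARE_FIRST flag value matches ACPICA's ACPI_HEST_FIRMWARE_FIRST; the struct is a simplified, hypothetical stand-in for the common part of a HEST AER entry, and the enum and function names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flags bit 0 of a HEST AER entry, as in ACPICA's actbl1.h. */
#define ACPI_HEST_FIRMWARE_FIRST 0x1u

/* Simplified, hypothetical stand-in for the common part of a HEST AER entry. */
struct hest_aer_entry {
	uint8_t flags;
	uint8_t enabled;
};

/* Hypothetical tri-state result of interpreting the "Enabled" field. */
enum hest_enable_hint {
	HEST_HINT_IGNORED,	/* FIRMWARE_FIRST set: OSPM ignores "Enabled" */
	HEST_HINT_ENABLE,	/* "Enabled" == 1 */
	HEST_HINT_DISABLE,	/* "Enabled" == 0 */
};

static inline enum hest_enable_hint
hest_enable_hint(const struct hest_aer_entry *e)
{
	assert(e != NULL);
	if (e->flags & ACPI_HEST_FIRMWARE_FIRST)
		return HEST_HINT_IGNORED;
	return e->enabled ? HEST_HINT_ENABLE : HEST_HINT_DISABLE;
}
```

What "enable" or "disable" should translate to in terms of register writes is not spelled out in the quoted text, so the sketch stops at classifying the hint.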

These structures also contain values the OS is apparently supposed to
write to Device Control and several AER registers (in struct
acpi_hest_aer_common).  Linux ignores these as well.

These seem like fairly serious omissions in Linux.

> > The whole issue of firmware-first, the mechanism by which firmware
> > gets control, the System Error enables in Root Port Root Control
> > registers, etc., is very murky to me.  Jon has a sort of similar issue
> > with VMD where he needs to leave System Errors enabled instead of
> > disabling them as we currently do.
> 
> Well, OS gets control via _OSC method, and based on that it should 
> touch/not touch the AER bits. 

I agree so far.

> The bits that get set/cleared come from _HPX method,

_HPX tells us about some AER registers, Device Control, Link Control,
and some bridge registers.  It doesn't say anything about the Root
Control register that Jon is concerned with.

For firmware-first to work, firmware has to get control.  How does it
get control?  How does OSPM know to either set up that mechanism or
keep its mitts off something firmware set up before handoff?  In Jon's
VMD case, I think firmware-first relies on the System Error controlled
by the Root Control register.  Linux thinks it owns that, and I don't
know how to learn otherwise.

> and there's more about FFS described in the ACPI spec. It 
> seems that if the platform wants to enable VMD, it should pass the correct 
> bits via _HPX. I'm curious to know in what new twisted way FFS doesn't 
> work as intended.

Bjorn


Re: [PATCH v2] PCI/MSI: Don't touch MSI bits when the PCI device is disconnected

2018-11-12 Thread Bjorn Helgaas
[+cc Jon, for related VMD firmware-first error enable issue]

On Mon, Nov 12, 2018 at 08:05:41PM +, alex_gagn...@dellteam.com wrote:
> On 11/11/2018 11:50 PM, Oliver O'Halloran wrote:
> > On Thu, 2018-11-08 at 23:06 +, alex_gagn...@dellteam.com wrote:

> >> But it's not the firmware that crashes. It's linux as a result of a
> >> fatal error message from the firmware. And we can't fix that because FFS
> >> handling requires that the system reboots [1].
> > 
> > Do we know the exact circumstances that result in firmware requesting a
> > reboot? If it happens on any PCIe error I don't see what we can do to
> > prevent that beyond masking UEs entirely (are we even allowed to do
> > that on FFS systems?).
> 
> Pull a drive out at an angle, push two drives in at the same time, pull 
> out a drive really slow. If an error is even reported to the OS depends 
> on PD state, and proprietary mechanisms and logic in the HW and FW. OS 
> is not supposed to mask errors (touch AER bits) on FFS.

PD?

Do you think Linux observes the rule about not touching AER bits on
FFS?  I'm not sure it does.  I'm not even sure what section of the
spec is relevant.

The whole issue of firmware-first, the mechanism by which firmware
gets control, the System Error enables in Root Port Root Control
registers, etc., is very murky to me.  Jon has a sort of similar issue
with VMD where he needs to leave System Errors enabled instead of
disabling them as we currently do.

Bjorn

[1] 
https://lore.kernel.org/linux-pci/20181029210651.gb13...@bhelgaas-glaptop.roam.corp.google.com


Re: [PATCH 6/9] PCI: consolidate PCI config entry in drivers/pci

2018-11-08 Thread Bjorn Helgaas
On Fri, Oct 19, 2018 at 02:09:49PM +0200, Christoph Hellwig wrote:
> There is no good reason to duplicate the PCI menu in every architecture.
> Instead provide a selectable HAVE_PCI symbol that indicates availability
> of PCI support and then handle the rest in drivers/pci.
> 
> Note that for powerpc we now select HAVE_PCI globally instead of the
> convoluted mess of conditional or non-conditional support per board,
> similar to what we do e.g. on x86.  For alpha PCI is selected for the
> non-jensen configs as it was the default before, and a lot of code does
> not compile without PCI enabled.  On other architectures with limited
> PCI support that wasn't as complicated I've left the selection as-is.
> 
> Signed-off-by: Christoph Hellwig 
> Acked-by: Max Filippov 
> Acked-by: Thomas Gleixner 
> Acked-by: Bjorn Helgaas 

Sounds like Masahiro plans to take this series and I'm fine with that.
Minor typo below, since it sounds like there's another revision coming.

> --- a/drivers/pci/Kconfig
> +++ b/drivers/pci/Kconfig
> @@ -3,6 +3,18 @@
>  # PCI configuration
>  #
>  
> +config HAVE_PCI
> + bool
> +
> +menuconfig PCI
> + bool "PCI support"
> + depends on HAVE_PCI
> +
> + help
> +   This option enables support for the PCI local bus, including
> +   support for PCI-X and the fundations for PCI Express support.

s/fundations/foundations/

> +   Say 'Y' here unless you know what you are doing.
> +
>  source "drivers/pci/pcie/Kconfig"


Re: [PATCH v2] PCI/MSI: Don't touch MSI bits when the PCI device is disconnected

2018-11-08 Thread Bjorn Helgaas
[+cc Jonathan, Greg, Lukas, Russell, Sam, Oliver for discussion about
PCI error recovery in general]

On Wed, Nov 07, 2018 at 05:42:57PM -0600, Bjorn Helgaas wrote:
> On Tue, Sep 18, 2018 at 05:15:00PM -0500, Alexandru Gagniuc wrote:
> > When a PCI device is gone, we don't want to send IO to it if we can
> > avoid it. We expose functionality via the irq_chip structure. As
> > users of that structure may not know about the underlying PCI device,
> > it's our responsibility to guard against removed devices.
> > 
> > .irq_write_msi_msg() is already guarded inside __pci_write_msi_msg().
> > .irq_mask/unmask() are not. Guard them for completeness.
> > 
> > For example, surprise removal of a PCIe device triggers teardown. This
> > > touches the irq_chips ops at some point to disable the interrupts. I/O
> > generated here can crash the system on firmware-first machines.
> > Not triggering the IO in the first place greatly reduces the
> > possibility of the problem occurring.
> > 
> > Signed-off-by: Alexandru Gagniuc 
> 
> Applied to pci/misc for v4.21, thanks!

I'm having second thoughts about this.  One thing I'm uncomfortable
with is that sprinkling pci_dev_is_disconnected() around feels ad hoc
instead of systematic, in the sense that I don't know how we convince
ourselves that this (and only this) is the correct place to put it.

Another is that the only place we call pci_dev_set_disconnected() is
in pciehp and acpiphp, so the only "disconnected" case we catch is if
hotplug happens to be involved.  Every MMIO read from the device is an
opportunity to learn whether it is reachable (a read from an
unreachable device typically returns ~0 data), but we don't do
anything at all with those.

The config accessors already check pci_dev_is_disconnected(), so this
patch is really aimed at MMIO accesses.  I think it would be more
robust if we added wrappers for readl() and writel() so we could
notice read errors and avoid future reads and writes.

Two compiled but untested patches below for your comments.  You can
mostly ignore the first; it's just boring interface changes.  The
important part is the second, which adds the wrappers.

These would be an alternative to the (admittedly much shorter) patch
here.  The wrappers are independent of MSI and could potentially be
exported from the PCI core for use by drivers.  I think it would be
better for drivers to use something like this instead of calling
pci_device_is_present() or pci_dev_is_disconnected() directly.
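
The idea behind the wrappers can be modelled in user space as follows. This is only a sketch under stated assumptions, not the patches themselves: `struct sim_dev`, `sim_readl()` and `sim_writel()` are hypothetical names, and a plain memory word stands in for an ioremap()ed register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a device with one 32-bit MMIO register. */
struct sim_dev {
	volatile uint32_t *reg;	/* stand-in for an ioremap()ed register */
	bool disconnected;	/* sticky "device is gone" flag */
};

/*
 * Read wrapper: skip the access if the device is already known to be gone,
 * and latch the flag when a read returns all ones.  All ones can also be
 * legitimate register data, so a real implementation would likely need a
 * confirming check before declaring the device disconnected.
 */
static inline uint32_t sim_readl(struct sim_dev *dev)
{
	uint32_t val;

	assert(dev != NULL && dev->reg != NULL);
	if (dev->disconnected)
		return UINT32_MAX;

	val = *dev->reg;
	if (val == UINT32_MAX)
		dev->disconnected = true;
	return val;
}

/* Write wrapper: drop the write once the device is marked disconnected. */
static inline void sim_writel(struct sim_dev *dev, uint32_t val)
{
	assert(dev != NULL && dev->reg != NULL);
	if (dev->disconnected)
		return;
	*dev->reg = val;
}
```

The point of the design is that the disconnected state is learned from the accesses themselves rather than only from a hotplug event, so callers do not have to sprinkle explicit presence checks around each access.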

> > ---
> >  drivers/pci/msi.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> > index f2ef896464b3..f31058fd2260 100644
> > --- a/drivers/pci/msi.c
> > +++ b/drivers/pci/msi.c
> > > @@ -227,6 +227,9 @@ static void msi_set_mask_bit(struct irq_data *data, u32 flag)
> >  {
> > struct msi_desc *desc = irq_data_get_msi_desc(data);
> >  
> > +   if (pci_dev_is_disconnected(msi_desc_to_pci_dev(desc)))
> > +   return;
> > +
> >     if (desc->msi_attrib.is_msix) {
> > msix_mask_irq(desc, flag);
> > readl(desc->mask_base); /* Flush write to device */


commit 150346e09edbcaedc520a6d7dec2b16f3a53afa1
Author: Bjorn Helgaas 
Date:   Thu Nov 8 09:17:51 2018 -0600

PCI/MSI: Pass pci_dev into IRQ mask interfaces

Add the struct pci_dev pointer to these interfaces:

  __pci_msix_desc_mask_irq()
  __pci_msi_desc_mask_irq()
  msix_mask_irq()
  msi_mask_irq()

The pci_dev pointer is currently unused, so there's no functional change
intended with this patch.  A subsequent patch will use the pointer to
improve error detection.

diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index 9f6f392a4461..56bbee2cf761 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -462,9 +462,9 @@ void arch_teardown_msi_irqs(struct pci_dev *pdev)
if (!msi->irq)
continue;
if (msi->msi_attrib.is_msix)
-   __pci_msix_desc_mask_irq(msi, 1);
+   __pci_msix_desc_mask_irq(pdev, msi, 1);
else
-   __pci_msi_desc_mask_irq(msi, 1, 1);
+   __pci_msi_desc_mask_irq(pdev, msi, 1, 1);
irq_set_msi_desc(msi->irq, NULL);
irq_free_desc(msi->irq);
msi->msg.address_lo = 0;
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index af24ed50a245..d46ae506e296 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -170,7 +170,8 @@ static inline __attribute_const__ u32 msi_mask(unsigned x)
  * reliably as devices without an INTx disable bit will then generate a
  * level IRQ which will never be cleared.
  */
-u32 __pci_msi_de

Re: [PATCH 4/8] pci: consolidate PCI config entry in drivers/pci

2018-10-15 Thread Bjorn Helgaas
s/^pci: /PCI: / in subject

On Sat, Oct 13, 2018 at 05:10:12PM +0200, Christoph Hellwig wrote:
> There is no good reason to duplicate the PCI menu in every architecture.
> Instead provide a selectable HAS_PCI symbol that indicates availability
> of PCI support and then handle the rest in drivers/pci.
> 
> Note that for powerpc we now select HAS_PCI globally instead of the
> convoluted mess of conditional or non-conditional support per board,
> similar to what we do e.g. on x86.  For alpha PCI is selected for the
> non-jensen configs as it was the default before, and a lot of code does
> not compile without PCI enabled.  On other architectures with limited
> PCI support that wasn't as complicated I've left the selection as-is.

Thanks for doing this.  It's a great cleanup.  I know you have a few
things you're cleaning up, but add my:

Acked-by: Bjorn Helgaas 

when you do that.


Re: [PATCH 1/8] aha152x: rename the PCMCIA define

2018-10-15 Thread Bjorn Helgaas
On Sat, Oct 13, 2018 at 05:10:09PM +0200, Christoph Hellwig wrote:
> We plan to enable building the pcmcia core and drivers, and the
> non-prefixed PCMCIA name clashes with some arch headers.

In the followup PCMCIA patch, you capitalized "PCMCIA core".


Re: [PATCH 0/2] sriov enablement on s390

2018-10-10 Thread Bjorn Helgaas
On Wed, Oct 10, 2018 at 02:55:07PM +0200, Sebastian Ott wrote:
> Hello Bjorn,
> 
> On Wed, 12 Sep 2018, Bjorn Helgaas wrote:
> > On Wed, Sep 12, 2018 at 02:34:09PM +0200, Sebastian Ott wrote:
> > > On s390 we currently handle SRIOV within firmware. Which means
> > > that the PF is under firmware control and not visible to operating
> > > systems. SRIOV enablement happens within firmware and VFs are
> > > passed through to logical partitions.
> > > 
> > > I'm working on a new mode where the PF is under operating system
> > > control (including SRIOV enablement). However we still need
> > > firmware support to access the VFs. The way this is supposed
> > > to work is that when firmware traps the SRIOV enablement it
> > > will present machine checks to the logical partition that
> > > triggered the SRIOV enablement and provide the VFs via hotplug
> > > events.
> > > 
> > > The problem I'm faced with is that the VF detection code in
> > > sriov_enable leads to unusable functions in s390.
> > 
> > We're moving away from the weak function implementation style.  Can
> > you take a look at Arnd's work here, which uses pci_host_bridge
> > callbacks instead?
> > 
> >   https://lkml.kernel.org/r/20180817102645.3839621-1-a...@arndb.de
> 
> What's the status of Arnd's patches - will they go upstream in the next
> couple of versions?

I hope so [1].  IIRC Arnd mentioned doing some minor updates, so I'm
waiting on that.

> What about my patches that I rebased on Arnd's branch
> will they be considered?

Definitely.  From my point of view they're just lined up behind Arnd's
patches.

[1] 
https://lore.kernel.org/linux-pci/20181002205903.gd120...@bhelgaas-glaptop.roam.corp.google.com


Re: [linux-next][bisected e3fbcc7c] build error at drivers/pci/pcie/aer_inject.c:444:6: error: ‘struct pt_regs’ has no member named ‘ip

2018-10-09 Thread Bjorn Helgaas
On Tue, Oct 9, 2018 at 5:40 AM Abdul Haleem  wrote:
>
> On Tue, 2018-10-09 at 20:47 +1100, Michael Ellerman wrote:
> > Abdul Haleem  writes:
> > > Greetings
> > >
> > > Today's linux-next fails to build on ppc with below error
> > >
> > > drivers/pci/pcie/aer_inject.c: In function ‘aer_inject_ftrace_thunk’:
> > > drivers/pci/pcie/aer_inject.c:444:6: error: ‘struct pt_regs’ has no
> > > member named ‘ip’
> > >   regs->ip = (unsigned long) hook->function;
> > >   ^
> > > make[3]: *** [drivers/pci/pcie/aer_inject.o] Error 1
> > > make[2]: *** [drivers/pci/pcie] Error 2
> > > make[1]: *** [drivers/pci] Error 2
> > >
> > > Machine: Power 9
> > > kernel version: 4.19.0-rc7-next-20181008
> > > gcc version: 4.8.5 20150623
> > > config attached
> > >
> > > Recent code changes from these commits:
> > >
> > > c79ad38b36dd4967d67f83fc48bade37ee041a47 PCI/AER: Abstract AER interrupt handling
> > > 7b23285f930a8dedcbf749661cd0f2cc27f79be6 PCI/AER: Reuse existing pcie_port_find_device() interface
> > > e3fbcc7c4b130f81ea586183db561fa3ce9c6447 PCI/AER: Covertly inject errors with ftrace hooks
> >
> > I don't see this commit in today's linux next (20181009) so presumably
> > someone else has noticed and dropped the commit?
>
> Yes, the commit is dropped from next-20181009, and now build is fine.

I dropped those commits yesterday.


Re: [RFC 00/15] PCI: turn some __weak functions into callbacks

2018-10-02 Thread Bjorn Helgaas
On Fri, Aug 17, 2018 at 12:26:30PM +0200, Arnd Bergmann wrote:
> Hi Bjorn and others,
> 
> Triggered by Christoph's patches, I had another go at converting
> all of the remaining pci host bridge implementations to be based
> on pci_alloc_host_bridge and a separate registration function.
> 
> This is made possible through work from Lorenzo and others to
> convert many of the existing drivers, as well as the removal
> of some of the older architectures that nobody used.
> 
> I'm adding a bit of duplication into the less maintained code
> here, but it makes everything more consistent, and gives an
> easy place to hook up callback functions etc.
> 
> The three parts of this series are:
> 
> a) push up the registration into the callers (this is where
>code gets added)
> b) clean up some of the more common host bridge
>implementations again to integrate that code better.
>This could be done for the rest as well, or we could just
>leave them alone.
> c) start moving the __weak functions into callbacks in
>pci_host_bridge. This is intentionally incomplete, since
>it is a lot of work to do it for all those functions,
>and I want to get consensus on the approach first, as well
>as maybe get other developers to help out with the rest.
> 
> Please have a look.
> 
>Arnd
> 
> [1] https://lore.kernel.org/lkml/4288331.jNpl6KXlNO@wuerfel/
> [2] https://patchwork.kernel.org/patch/10555657/
> 
> Arnd Bergmann (15):
>   PCI: clean up legacy host bridge scan functions
>   PCI: move pci_scan_bus into callers
>   PCI: move pci_scan_root_bus into callers
>   PCI: export pci_register_host_bridge
>   PCI: move pci_create_root_bus into callers
>   powerpc/pci: fold pci_create_root_bus into pcibios_scan_phb
>   PCI/ACPI: clean up acpi_pci_root_create()
>   x86: PCI: clean up pcibios_scan_root()
>   PCI: xenfront: clean up pcifront_scan_root()
>   sparc/PCI: simplify pci_scan_one_pbm
>   PCI: hyperv: convert to pci_scan_root_bus_bridge
>   PCI: make pcibios_bus_add_device() a callback function
>   PCI: turn pcibios_alloc_irq into a callback
>   PCI: make pcibios_root_bridge_prepare a callback
>   PCI: make pcibios_add_bus/remove_bus callbacks
> 
>  arch/arm64/kernel/pci.c   |  40 ++-
>  arch/ia64/pci/pci.c   |  25 +
>  arch/ia64/sn/kernel/io_init.c |  27 +
>  arch/microblaze/pci/pci-common.c  |  27 +
>  arch/powerpc/include/asm/pci-bridge.h |   3 +
>  arch/powerpc/kernel/pci-common.c  |  60 +--
>  arch/s390/pci/pci.c   |  30 +-
>  arch/sh/drivers/pci/pci.c |   1 +
>  arch/sh/drivers/pci/pcie-sh7786.c |   3 +-
>  arch/sh/include/asm/pci.h |   2 +
>  arch/sparc/kernel/pci.c   |  40 ---
>  arch/sparc/kernel/pcic.c  |  35 ++
>  arch/x86/pci/acpi.c   |  15 +--
>  arch/x86/pci/common.c |  42 
>  arch/xtensa/kernel/pci.c  |  27 +
>  drivers/acpi/pci_root.c   |  43 +---
>  drivers/parisc/dino.c |  28 +
>  drivers/parisc/lba_pci.c  |  28 +
>  drivers/pci/bus.c |   8 +-
>  drivers/pci/controller/pci-hyperv.c   |  47 
>  drivers/pci/controller/vmd.c  |  30 +-
>  drivers/pci/hotplug/ibmphp_core.c |  35 ++
>  drivers/pci/pci-driver.c  |  13 ++-
>  drivers/pci/probe.c   | 150 +-
>  drivers/pci/xen-pcifront.c|  40 +++
>  include/linux/acpi.h  |   2 +
>  include/linux/pci.h   |  17 ++-
>  27 files changed, 514 insertions(+), 304 deletions(-)

Sorry for the late response to this.

I think I'm generally on-board with this.  I admit I'm a little
hesitant about adding 200 lines of code when this is really more
"cleanup" than new functionality, but I think a lot of that is because
this series contains costs (e.g., duplicating code) for everybody but
only has the corresponding benefits for a few (ACPI, x86, xenfront).
Those cases are much closer to parity in terms of lines added/removed.

I saw some minor comments that suggested you had some updates, so I'll
watch for an updated posting.

Bjorn


Re: [PATCH -next] PCI: hotplug: Use kmemdup rather than duplicating its implementation in pnv_php_add_devtree()

2018-09-28 Thread Bjorn Helgaas
On Thu, Sep 27, 2018 at 06:52:21AM +, YueHaibing wrote:
> Use kmemdup rather than duplicating its implementation
> 
> Signed-off-by: YueHaibing 

Applied with Michael's ack to pci/hotplug for v4.20, thanks!
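
For readers unfamiliar with the helper, kmemdup() allocates a buffer and copies the source into it in one call, which is what the allocate-then-memcpy() pair in the patch below did by hand. A rough user-space analogue, with `dup_mem()` as a hypothetical name and malloc() standing in for the kernel allocator:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical user-space analogue of kmemdup(src, len, gfp): allocate
 * len bytes and copy src into them.  Returns NULL if allocation fails.
 */
static inline void *dup_mem(const void *src, size_t len)
{
	void *dst;

	assert(src != NULL);
	dst = malloc(len);
	if (dst)
		memcpy(dst, src, len);
	return dst;
}
```

Since every byte of the new buffer is overwritten by the copy, the zeroing done by the original kzalloc() call was redundant work.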

> ---
>  drivers/pci/hotplug/pnv_php.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
> index 5070620..ee54f5b 100644
> --- a/drivers/pci/hotplug/pnv_php.c
> +++ b/drivers/pci/hotplug/pnv_php.c
> @@ -275,14 +275,13 @@ static int pnv_php_add_devtree(struct pnv_php_slot *php_slot)
>   goto free_fdt1;
>   }
>  
> - fdt = kzalloc(fdt_totalsize(fdt1), GFP_KERNEL);
> + fdt = kmemdup(fdt1, fdt_totalsize(fdt1), GFP_KERNEL);
>   if (!fdt) {
>   ret = -ENOMEM;
>   goto free_fdt1;
>   }
>  
>   /* Unflatten device tree blob */
> - memcpy(fdt, fdt1, fdt_totalsize(fdt1));
>   dt = of_fdt_unflatten_tree(fdt, php_slot->dn, NULL);
>   if (!dt) {
>   ret = -EINVAL;
> 
> 
> 


Re: [PATCH -next] PCI/AER: Remove duplicated include from err.c

2018-09-26 Thread Bjorn Helgaas
On Wed, Sep 26, 2018 at 11:00:10AM +, YueHaibing wrote:
> Remove duplicated include.
> 
> Signed-off-by: YueHaibing 

Applied to pci/hotplug for v4.20, thanks!

> ---
>  drivers/pci/pcie/err.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index 4da2a62b..773197a 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -12,7 +12,6 @@
>  
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
> 
> 
> 


Re: [PATCH] MAINTAINERS: Add PPC contacts for PCI core error handling

2018-09-19 Thread Bjorn Helgaas
On Wed, Sep 19, 2018 at 11:49:26AM +1000, Russell Currey wrote:
> On Tue, 2018-09-18 at 16:58 -0500, Bjorn Helgaas wrote:
> > On Wed, Sep 12, 2018 at 11:55:26AM -0500, Bjorn Helgaas wrote:
> > > From: Bjorn Helgaas 
> > > 
> > > The original PCI error recovery functionality was for the powerpc-specific
> > > IBM EEH feature.  PCIe subsequently added some similar features, including
> > > AER and DPC, that can be used on any architecture.
> > > 
> > > We want the generic PCI core error handling support to work with all of
> > > these features.  Driver error recovery callbacks should be independent of
> > > which feature the platform provides.
> > > 
> > > Add the generic PCI core error recovery files to the powerpc EEH
> > > MAINTAINERS entry so the powerpc folks will be copied on changes to the
> > > generic PCI error handling strategy.
> > > 
> > > Signed-off-by: Bjorn Helgaas 
> > 
> > I applied the following to for-linus for v4.19.  Russell, if you want
> > to be removed, let me know and I'll do that.
> 
> Oliver's email address for kernel stuff is ooh...@gmail.com, I think benh has
> been CCing his IBM address.  But other than that,
> 
> Acked-by: Russell Currey 

I updated Oliver's email address and added your ack, thanks!


Re: [PATCH] MAINTAINERS: Add PPC contacts for PCI core error handling

2018-09-18 Thread Bjorn Helgaas
On Wed, Sep 12, 2018 at 11:55:26AM -0500, Bjorn Helgaas wrote:
> From: Bjorn Helgaas 
> 
> The original PCI error recovery functionality was for the powerpc-specific
> IBM EEH feature.  PCIe subsequently added some similar features, including
> AER and DPC, that can be used on any architecture.
> 
> We want the generic PCI core error handling support to work with all of
> these features.  Driver error recovery callbacks should be independent of
> which feature the platform provides.
> 
> Add the generic PCI core error recovery files to the powerpc EEH
> MAINTAINERS entry so the powerpc folks will be copied on changes to the
> generic PCI error handling strategy.
> 
> Signed-off-by: Bjorn Helgaas 

I applied the following to for-linus for v4.19.  Russell, if you want
to be removed, let me know and I'll do that.

commit 3fed0e04026c
Author: Bjorn Helgaas 
Date:   Wed Sep 12 11:55:26 2018 -0500

MAINTAINERS: Update PPC contacts for PCI core error handling

The original PCI error recovery functionality was for the powerpc-specific
IBM EEH feature.  PCIe subsequently added some similar features, including
AER and DPC, that can be used on any architecture.

We want the generic PCI core error handling support to work with all of
these features.  Driver error recovery callbacks should be independent of
which feature the platform provides.

Add the generic PCI core error recovery files to the powerpc EEH
MAINTAINERS entry so the powerpc folks will be copied on changes to the
generic PCI error handling strategy.

Add Sam and Oliver as maintainers for this area.

Signed-off-by: Bjorn Helgaas 

diff --git a/MAINTAINERS b/MAINTAINERS
index 4ece30f15777..f23244003836 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11203,8 +11203,14 @@ F: tools/pci/
 
 PCI ENHANCED ERROR HANDLING (EEH) FOR POWERPC
 M: Russell Currey 
+M: Sam Bobroff 
+M: Oliver O'Halloran 
 L: linuxppc-dev@lists.ozlabs.org
 S: Supported
+F: Documentation/PCI/pci-error-recovery.txt
+F: drivers/pci/pcie/aer.c
+F: drivers/pci/pcie/dpc.c
+F: drivers/pci/pcie/err.c
 F: Documentation/powerpc/eeh-pci-error-recovery.txt
 F: arch/powerpc/kernel/eeh*.c
 F: arch/powerpc/platforms/*/eeh*.c


Re: [PATCH RFC 1/4] PCI: hotplug: Add parameter to put devices to reset during rescan

2018-09-18 Thread Bjorn Helgaas
On Tue, Sep 18, 2018 at 05:01:48PM +0300, Sergey Miroshnichenko wrote:
> On 9/18/18 1:59 AM, Bjorn Helgaas wrote:
> > On Mon, Sep 17, 2018 at 11:55:43PM +0300, Sergey Miroshnichenko wrote:
> >> On 9/17/18 8:28 AM, Sam Bobroff wrote:
> >>> On Fri, Sep 14, 2018 at 07:14:01PM +0300, Sergey Miroshnichenko wrote:
> >>>> Introduce a new command line option "pci=pcie_movable_bars"
> >>>> that indicates support of PCIe hotplug without prior
> >>>> reservation of memory regions by BIOS/bootloader.

> >>> What about devices with drivers that don't have reset_prepare()?  It
> >>> looks like it will just reconfigure them anyway. Is that right?
> >>
> >> It is possible that an unprepared driver without these hooks will get BARs
> >> moved, I should put a warning message there. There are three ways we can see
> >> to make this safe:
> >>  - add the reset_prepare()/reset_done() hooks to *every* PCIe driver;
> >>  - refuse BAR movement if at least one unprepared driver has been
> >> encountered during rescan;
> >>  - reduce the number of drivers which can be affected to some
> >> controllable value and prepare them on demand.
> >>
> >> Applying the second proposal as a major restriction seems fairly
> >> reasonable, but for our particular setups and use-cases it is probably
> >> too stiff:
> >>  - we've noticed that devices connected directly to the root bridge
> >> don't get moved BARs, and this covers our x86_64 servers: we only
> >> insert/remove devices into "second-level" and "lower" bridges there, but
> >> not root;
> >>  - on PowerNV we have system devices (network interfaces, USB hub, etc.)
> >> grouped into dedicated domain, with all other domains ready for hotplug,
> >> and only these domains can be rescanned.
> >>
> >> With our scenarios currently reduced to these two, we can live with just
> >> a few drivers "prepared" for now: NVME and few Ethernet adapters, this
> >> gives us a possibility to use this feature before "converting" *all* the
> >> drivers, and even have the NVidia cards running on a closed proprietary
> >> driver.
> >>
> >> Should we make this behavior adjustable with something like
> >> "pcie_movable_bars=safe" and "pcie_movable_bars=always" ?
> > 
> > I like the overall idea of this a lot.
> > 
> >   - Why do we need a command line parameter to enable this?  Can't we
> > do it unconditionally and automatically when it's possible?  We
> > could have a chicken switch to *disable* it in case this breaks
> > something horribly, but I would like this functionality to be
> > always available without a special option.
> 
> After making this feature completely safe we could activate it with the
> existing option "pci=realloc".

That *sounds* good, but in practice it never happens that we decide a
feature is completely safe and somebody makes it the default.  If
we're going to do this, I think we need to commit to making it work
100% of the time, with no option needed.

> >   - I'm not sure the existence of .reset_done() is evidence that a
> > driver is prepared for its BARs to move.  I don't see any
> > documentation that refers to BAR movement, and I doubt it's been
> > tested.  But I only see 5 implementations in the tree, so it'd be
> > easy to verify.
> 
> You are right, and we should clarify the description:
>  - drivers which have the .reset_done() already - none of them are aware
> of movable BARs yet;
>  - the rest of the drivers should both be able to pause and handle the
> changes in BARs.

This doesn't clarify it for me.  If you want to update all existing
.reset_done() methods so they deal with BAR changes, that would be
fine with me.  That would be done by preliminary patches in the series
that adds the feature.

> >   - I think your second proposal above sounds right: we should regard
> > any device whose driver lacks .reset_done() as immovable.  We will
> > likely be able to move some devices but not others.  Implementing
> > .reset_done() will increase flexibility but it shouldn't be a
> > requirement for all drivers.
> 
> Thanks for the advice! This is more flexible and doesn't have any
> prerequisites. In this case the greater the "movable"/"immovable" ratio
> of the devices that was working before the hotplug event - the higher
> the probability to free some space for new BARs. But even a single
> "immovable" device at an undesir

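The policy discussed above (treat a device as immovable unless its
driver implements .reset_done()) can be sketched as a small userspace
model.  Every name here (struct dev_model, dev_is_movable(),
count_movable()) is hypothetical and exists only for illustration; this
is not code from the patch series or from the PCI core.  One further
assumption, not stated in the thread: a device with no driver bound is
treated as movable, on the grounds that nothing is using its BARs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver error-handler callbacks. */
struct err_handlers_model {
	void (*reset_prepare)(void *dev);
	void (*reset_done)(void *dev);
};

struct driver_model {
	const struct err_handlers_model *err_handler;
};

struct dev_model {
	const struct driver_model *driver;	/* NULL if no driver is bound */
};

/* Sample hook so that a "prepared" driver can be constructed. */
static void sample_reset_done(void *dev)
{
	(void)dev;
}

/*
 * A device counts as movable only if its driver provides reset_done.
 * ASSUMPTION of this sketch: an unbound device is movable.
 */
static bool dev_is_movable(const struct dev_model *dev)
{
	assert(dev != NULL);

	if (!dev->driver)
		return true;

	return dev->driver->err_handler &&
	       dev->driver->err_handler->reset_done;
}

/* Count how many of the n devices could have their BARs moved. */
static size_t count_movable(const struct dev_model *devs, size_t n)
{
	size_t i, movable = 0;

	for (i = 0; i < n; i++)
		if (dev_is_movable(&devs[i]))
			movable++;
	return movable;
}
```

With a per-device check like this, a rescan could move some devices and
leave others in place, rather than refusing BAR movement outright as
soon as one unprepared driver is found.
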
Re: [PATCH RFC 1/4] PCI: hotplug: Add parameter to put devices to reset during rescan

2018-09-17 Thread Bjorn Helgaas
[+cc Russell, Ben, Oliver, linuxppc-dev]

On Mon, Sep 17, 2018 at 11:55:43PM +0300, Sergey Miroshnichenko wrote:
> Hello Sam,
> 
> On 9/17/18 8:28 AM, Sam Bobroff wrote:
> > Hi Sergey,
> > 
> > On Fri, Sep 14, 2018 at 07:14:01PM +0300, Sergey Miroshnichenko wrote:
> >> Introduce a new command line option "pci=pcie_movable_bars" that indicates
> >> support of PCIe hotplug without prior reservation of memory regions by
> >> BIOS/bootloader.
> >>
> >> If a new PCIe device has been hot-plugged between two active ones, which
> >> have no (or not big enough) gap between their BARs, allocating new BARs
> >> requires to move BARs of the following working devices:
> >>
> >> 1)   dev 4
> >>|
> >>v
> >> .. |  dev 3  |  dev 3  |  dev 5  |  dev 7  |
> >> .. |  BAR 0  |  BAR 1  |  BAR 0  |  BAR 0  |
> >>
> >> 2) dev 4
> >>  |
> >>  v
> >> .. |  dev 3  |  dev 3  | -->   --> |  dev 5  |  dev 7  |
> >> .. |  BAR 0  |  BAR 1  | -->   --> |  BAR 0  |  BAR 0  |
> >>
> >> 3)
> >>
> >> .. |  dev 3  |  dev 3  |  dev 4  |  dev 4  |  dev 5  |  dev 7  |
> >> .. |  BAR 0  |  BAR 1  |  BAR 0  |  BAR 1  |  BAR 0  |  BAR 0  |
> >>
> >> Not only BARs, but also bridge windows can be updated during a PCIe rescan,
> >> threatening all memory transactions during this procedure, so the PCI
> >> subsystem will instruct the drivers to pause by calling the reset_prepare()
> >> and reset_done() callbacks.
> >>
> >> If a device may be affected by BAR movement, the BAR changes tracking must
> >> be implemented in its driver.
> >>
> >> Signed-off-by: Sergey Miroshnichenko 
> >> ---
> >>  .../admin-guide/kernel-parameters.txt |  6 +++
> >>  drivers/pci/pci.c |  2 +
> >>  drivers/pci/probe.c   | 43 +++
> >>  include/linux/pci.h   |  1 +
> >>  4 files changed, 52 insertions(+)
> >>
> >> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> >> index 64a3bf54b974..f8132a709061 100644
> >> --- a/Documentation/admin-guide/kernel-parameters.txt
> >> +++ b/Documentation/admin-guide/kernel-parameters.txt
> >> @@ -3311,6 +3311,12 @@
> >>bridges without forcing it upstream. Note:
> >>this removes isolation between devices and
> >>may put more devices in an IOMMU group.
> >> +  pcie_movable_bars   Arrange a space at runtime for BARs of
> >> +  hotplugged PCIe devices - usable if bootloader
> >> +  doesn't reserve memory regions for them. Freeing
> >> +  a space may require moving BARs of active devices
> >> +  to higher addresses, so device drivers will be
> >> +  paused during rescan.
> >>  
> >>pcie_aspm=  [PCIE] Forcibly enable or disable PCIe Active State Power
> >>Management.
> >> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> >> index 1835f3a7aa8d..5f07a59b5924 100644
> >> --- a/drivers/pci/pci.c
> >> +++ b/drivers/pci/pci.c
> >> @@ -6105,6 +6105,8 @@ static int __init pci_setup(char *str)
> >>pci_add_flags(PCI_SCAN_ALL_PCIE_DEVS);
> >>} else if (!strncmp(str, "disable_acs_redir=", 18)) {
> >>disable_acs_redir_param = str + 18;
> >> +  } else if (!strncmp(str, "pcie_movable_bars", 17)) {
> >> +  pci_add_flags(PCI_MOVABLE_BARS);
> >>} else {
> >>printk(KERN_ERR "PCI: Unknown option `%s'\n",
> >>str);
> >> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> >> index 201f9e5ff55c..bdaafc48dc4c 100644
> >> --- a/drivers/pci/probe.c
> >> +++ b/drivers/pci/probe.c
> >> @@ -3138,6 +3138,45 @@ unsigned int pci_rescan_bus_bridge_resize(struct pci_dev *bridge)
> >>return max;
> >>  }
> >>  
> >> +/*
> >> + * Put all devices of the bus and its children to reset
> >> + */
> >> +static void pci_bus_reset_prepare(struct pci_bus *bus)
> >> +{
> >> +  struct pci_dev *dev;
> >> +
> >> +  list_for_each_entry(dev, &bus->devices, bus_list) {
> >> +  struct pci_bus *child = dev->subordinate;
> >> +
> >> +  if (child) {
> >> +  pci_bus_reset_prepare(child);
> >> +  } else if (dev->driver &&
> >> + dev->driver->err_handler &&
> >> + dev->driver->err_handler->reset_prepare) {
> >> +  dev->driver->err_handler->reset_prepare(dev);
> >> +  }
> > 
> > What about devices with drivers that don't have reset_prepare()?  It
> > looks like it will just reconfigure them anyway. Is that 

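The recursive walk in pci_bus_reset_prepare() quoted above can be
modeled in userspace as follows.  This is a sketch under stated
assumptions, not kernel code: the struct names are hypothetical, a plain
singly linked list replaces list_for_each_entry(), and a flag set by a
sample hook replaces the real driver callback.  As in the quoted patch,
a device with a subordinate bus is descended into, and a leaf device
whose driver lacks the hook is silently skipped, which is the case Sam
asks about.

```c
#include <assert.h>
#include <stddef.h>

struct bus_model;

struct pdev_model {
	struct pdev_model *next;	/* next device on the same bus */
	struct bus_model *subordinate;	/* child bus if this is a bridge */
	void (*reset_prepare)(struct pdev_model *dev);	/* NULL if absent */
	int prepared;			/* set by the sample hook */
};

struct bus_model {
	struct pdev_model *devices;	/* head of the device list */
};

/* Sample hook: records that the device was asked to pause. */
static void sample_reset_prepare(struct pdev_model *dev)
{
	dev->prepared = 1;
}

/*
 * Walk the bus and all child buses, calling reset_prepare on every
 * leaf device that provides it.  Returns the number of hooks called.
 */
static int bus_reset_prepare_model(struct bus_model *bus)
{
	struct pdev_model *dev;
	int called = 0;

	assert(bus != NULL);

	for (dev = bus->devices; dev; dev = dev->next) {
		if (dev->subordinate) {
			/* Bridge: recurse; its own hook is not called. */
			called += bus_reset_prepare_model(dev->subordinate);
		} else if (dev->reset_prepare) {
			dev->reset_prepare(dev);
			called++;
		}
		/* Leaf without a hook: skipped, yet may still be moved. */
	}
	return called;
}
```

Comparing the return value with the total number of leaf devices would
be one way to detect that some devices were not paused before their BARs
are reassigned.
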
Re: [PATCH] MAINTAINERS: Add PPC contacts for PCI core error handling

2018-09-12 Thread Bjorn Helgaas
On Wed, Sep 12, 2018 at 11:55:26AM -0500, Bjorn Helgaas wrote:
> From: Bjorn Helgaas 
> 
> The original PCI error recovery functionality was for the powerpc-specific
> IBM EEH feature.  PCIe subsequently added some similar features, including
> AER and DPC, that can be used on any architecture.
> 
> We want the generic PCI core error handling support to work with all of
> these features.  Driver error recovery callbacks should be independent of
> which feature the platform provides.
> 
> Add the generic PCI core error recovery files to the powerpc EEH
> MAINTAINERS entry so the powerpc folks will be copied on changes to the
> generic PCI error handling strategy.

I really want to make sure the powerpc folks are plugged into any PCI core
error handling discussions.  Please let me know if there's a better way
than this patch, or if there are other people who should be added.

> Signed-off-by: Bjorn Helgaas 
> ---
>  MAINTAINERS |4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 7e10ba65bfe4..d6699597fd89 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -11202,6 +11202,10 @@ PCI ENHANCED ERROR HANDLING (EEH) FOR POWERPC
>  M:   Russell Currey 
>  L:   linuxppc-dev@lists.ozlabs.org
>  S:   Supported
> +F:   Documentation/PCI/pci-error-recovery.txt
> +F:   drivers/pci/pcie/aer.c
> +F:   drivers/pci/pcie/dpc.c
> +F:   drivers/pci/pcie/err.c
>  F:   Documentation/powerpc/eeh-pci-error-recovery.txt
>  F:   arch/powerpc/kernel/eeh*.c
>  F:   arch/powerpc/platforms/*/eeh*.c
> 


[PATCH] MAINTAINERS: Add PPC contacts for PCI core error handling

2018-09-12 Thread Bjorn Helgaas
From: Bjorn Helgaas 

The original PCI error recovery functionality was for the powerpc-specific
IBM EEH feature.  PCIe subsequently added some similar features, including
AER and DPC, that can be used on any architecture.

We want the generic PCI core error handling support to work with all of
these features.  Driver error recovery callbacks should be independent of
which feature the platform provides.

Add the generic PCI core error recovery files to the powerpc EEH
MAINTAINERS entry so the powerpc folks will be copied on changes to the
generic PCI error handling strategy.

Signed-off-by: Bjorn Helgaas 
---
 MAINTAINERS |4 
 1 file changed, 4 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 7e10ba65bfe4..d6699597fd89 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11202,6 +11202,10 @@ PCI ENHANCED ERROR HANDLING (EEH) FOR POWERPC
 M: Russell Currey 
 L: linuxppc-dev@lists.ozlabs.org
 S: Supported
+F: Documentation/PCI/pci-error-recovery.txt
+F: drivers/pci/pcie/aer.c
+F: drivers/pci/pcie/dpc.c
+F: drivers/pci/pcie/err.c
 F: Documentation/powerpc/eeh-pci-error-recovery.txt
 F: arch/powerpc/kernel/eeh*.c
 F: arch/powerpc/platforms/*/eeh*.c



Re: [PATCH 0/2] sriov enablement on s390

2018-09-12 Thread Bjorn Helgaas
[+cc Arnd, powerpc folks]

On Wed, Sep 12, 2018 at 02:34:09PM +0200, Sebastian Ott wrote:
> Hello Bjorn,
> 
> On s390 we currently handle SRIOV within firmware. Which means
> that the PF is under firmware control and not visible to operating
> systems. SRIOV enablement happens within firmware and VFs are
> passed through to logical partitions.
> 
> I'm working on a new mode where the PF is under operating system
> control (including SRIOV enablement). However we still need
> firmware support to access the VFs. The way this is supposed
> to work is that when firmware traps the SRIOV enablement it
> will present machine checks to the logical partition that
> triggered the SRIOV enablement and provide the VFs via hotplug
> events.
> 
> The problem I'm faced with is that the VF detection code in
> sriov_enable leads to unusable functions in s390.

We're moving away from the weak function implementation style.  Can
you take a look at Arnd's work here, which uses pci_host_bridge
callbacks instead?

  https://lkml.kernel.org/r/20180817102645.3839621-1-a...@arndb.de

I cc'd some powerpc folks because they also have a fair amount of
arch-specific SR-IOV code that might one day move in this direction.

> Sebastian Ott (2):
>   pci: provide pcibios_sriov_add_vfs
>   s390/pci: handle function enumeration after sriov enablement
> 
>  arch/s390/pci/pci.c | 11 +++
>  drivers/pci/iov.c   | 43 +++
>  include/linux/pci.h |  2 ++
>  3 files changed, 44 insertions(+), 12 deletions(-)
> 
> -- 
> 2.13.4
> 

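To illustrate the callback style mentioned above, here is a minimal
sketch: generic code consults a per-bridge function pointer and falls
back to a default when it is unset, instead of relying on a link-time
__weak override.  All names (struct bridge_model, the sriov_add_vfs
member, bridge_add_vfs()) are hypothetical.  This does not reproduce
Arnd's series or the interface added by Sebastian's patches.

```c
#include <assert.h>
#include <stddef.h>

struct bridge_model {
	/* Optional per-bridge hook; NULL means "use the generic path". */
	int (*sriov_add_vfs)(struct bridge_model *bridge, int num_vfs);
	int vfs_added_by_core;	/* how many VFs the generic code created */
};

/* Generic path: the core enumerates the VFs itself. */
static int default_add_vfs(struct bridge_model *bridge, int num_vfs)
{
	bridge->vfs_added_by_core = num_vfs;
	return 0;
}

/*
 * Platform path for a firmware-managed setup: the core adds nothing
 * and the VFs are expected to show up later through hotplug events.
 */
static int firmware_managed_add_vfs(struct bridge_model *bridge, int num_vfs)
{
	(void)bridge;
	(void)num_vfs;
	return 0;
}

/* Dispatch through the bridge hook if present, else the default. */
static int bridge_add_vfs(struct bridge_model *bridge, int num_vfs)
{
	assert(bridge != NULL);

	if (bridge->sriov_add_vfs)
		return bridge->sriov_add_vfs(bridge, num_vfs);
	return default_add_vfs(bridge, num_vfs);
}
```

Because the hook is chosen per bridge at run time, one kernel image can
serve both kinds of host bridge, whereas a weak function is resolved
once for the whole build.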

Re: [PATCH] MAINTAINERS: add maintainer entries for RPA pci hotplug drivers

2018-09-06 Thread Bjorn Helgaas
On Mon, Aug 20, 2018 at 06:20:31PM -0700, Tyrel Datwyler wrote:
> Adding myself as maintainer of the IBM RPA hotplug modules located in
> drivers/pci/hotplug directory. These modules provide kernel interfaces
> for support of Dynamic Logical Partitioning (DLPAR) of Logical and
> Physical IO slots, and hotplug of physical PCI slots of a PHB on
> RPA-compliant ppc64 platforms (pseries).
> 
> Signed-off-by: Tyrel Datwyler 

Applied to for-linus for v4.19, thanks!

> ---
>  MAINTAINERS | 14 ++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 5df1b36..7b5dc3f 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -6984,6 +6984,20 @@ F: drivers/crypto/vmx/aes*
>  F:   drivers/crypto/vmx/ghash*
>  F:   drivers/crypto/vmx/ppc-xlate.pl
>  
> +IBM Power PCI Hotplug Driver for RPA-compliant PPC64 platform
> +M:   Tyrel Datwyler 
> +L:   linux-...@vger.kernel.org
> +L:   linuxppc-dev@lists.ozlabs.org
> +S:   Supported
> +F:   drivers/pci/hotplug/rpaphp*
> +
> +IBM Power IO DLPAR Driver for RPA-compliant PPC64 platform
> +M:   Tyrel Datwyler 
> +L:   linux-...@vger.kernel.org
> +L:   linuxppc-dev@lists.ozlabs.org
> +S:   Supported
> +F:   drivers/pci/hotplug/rpadlpar*
> +
>  IBM ServeRAID RAID DRIVER
>  S:   Orphan
>  F:   drivers/scsi/ips.*
> -- 
> 1.8.3.1
> 


Re: [PATCH] PCI: call dma_debug_add_bus for pci_bus_type in common code

2018-07-31 Thread Bjorn Helgaas
On Mon, Jul 30, 2018 at 09:38:42AM +0200, Christoph Hellwig wrote:
> There is nothing arch specific about PCI or dma-debug, so move this
> call to common code just after registering the bus type.
> 
> Signed-off-by: Christoph Hellwig 

Applied with acks from Thomas and Michael to pci/misc for v4.19, thanks!

> ---
>  arch/powerpc/kernel/dma.c | 3 ---
>  arch/sh/drivers/pci/pci.c | 2 --
>  arch/x86/kernel/pci-dma.c | 3 ---
>  drivers/pci/pci-driver.c  | 2 +-
>  4 files changed, 1 insertion(+), 9 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
> index 155170d70324..dbfc7056d7df 100644
> --- a/arch/powerpc/kernel/dma.c
> +++ b/arch/powerpc/kernel/dma.c
> @@ -357,9 +357,6 @@ EXPORT_SYMBOL_GPL(dma_get_required_mask);
>  
>  static int __init dma_init(void)
>  {
> -#ifdef CONFIG_PCI
> - dma_debug_add_bus(&pci_bus_type);
> -#endif
>  #ifdef CONFIG_IBMVIO
>   dma_debug_add_bus(&vio_bus_type);
>  #endif
> diff --git a/arch/sh/drivers/pci/pci.c b/arch/sh/drivers/pci/pci.c
> index e5b7437ab4af..8256626bc53c 100644
> --- a/arch/sh/drivers/pci/pci.c
> +++ b/arch/sh/drivers/pci/pci.c
> @@ -160,8 +160,6 @@ static int __init pcibios_init(void)
>   for (hose = hose_head; hose; hose = hose->next)
>   pcibios_scanbus(hose);
>  
> - dma_debug_add_bus(&pci_bus_type);
> -
>   pci_initialized = 1;
>  
>   return 0;
> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
> index ab5d9dd668d2..43f58632f123 100644
> --- a/arch/x86/kernel/pci-dma.c
> +++ b/arch/x86/kernel/pci-dma.c
> @@ -155,9 +155,6 @@ static int __init pci_iommu_init(void)
>  {
>   struct iommu_table_entry *p;
>  
> -#ifdef CONFIG_PCI
> - dma_debug_add_bus(&pci_bus_type);
> -#endif
>   x86_init.iommu.iommu_init();
>  
>   for (p = __iommu_table; p < __iommu_table_end; p++) {
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 6792292b5fc7..bef17c3fca67 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -1668,7 +1668,7 @@ static int __init pci_driver_init(void)
>   if (ret)
>   return ret;
>  #endif
> -
> + dma_debug_add_bus(&pci_bus_type);
>   return 0;
>  }
>  postcore_initcall(pci_driver_init);
> -- 
> 2.18.0
> 

