[GIT PULL] PCI fixes for v4.20

2018-12-07 Thread Bjorn Helgaas
PCI fixes:

  - Revert ASPM change that caused a regression (Bjorn Helgaas)


The following changes since commit c74eadf881ad634c68880e2c1b504989d95993ee:

  Merge remote-tracking branch 'lorenzo/pci/controller-fixes' into for-linus (2018-11-30 23:42:08 -0600)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git tags/pci-v4.20-fixes-3

for you to fetch changes up to b07b864ee4232b03125992a8f6a490b040adcb6a:

  Revert "PCI/ASPM: Do not initialize link state when aspm_disabled is set" (2018-12-03 18:05:17 -0600)


pci-v4.20-fixes-3

--------
Bjorn Helgaas (1):
  Revert "PCI/ASPM: Do not initialize link state when aspm_disabled is set"

 drivers/pci/pcie/aspm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


Re: [PATCH] PCI: Add missing include to drivers/pci.h

2018-12-06 Thread Bjorn Helgaas
On Wed, Nov 28, 2018 at 04:28:04PM -0600, Alexandru Gagniuc wrote:
> This file makes use of definitions provided in . This
> only compiles when  is included beforehand, and creates
> a nasty include dependency. Instead, just include the correct file.
> 
> Signed-off-by: Alexandru Gagniuc 

Applied to pci/misc for v4.21, thanks!

> ---
>  drivers/pci/pci.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 662b7457db23..224d88634115 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -2,6 +2,8 @@
>  #ifndef DRIVERS_PCI_H
>  #define DRIVERS_PCI_H
>  
> +#include 
> +
>  #define PCI_FIND_CAP_TTL 48
>  
>  #define PCI_VSEC_ID_INTEL_TBT 0x1234  /* Thunderbolt */
> -- 
> 2.17.1
> 


Re: [PATCH] PCI/P2PDMA: Match interface changes to devm_memremap_pages()

2018-12-06 Thread Bjorn Helgaas
On Fri, Nov 30, 2018 at 03:59:11PM -0700, Logan Gunthorpe wrote:
> "mm-hmm-mark-hmm_devmem_add-add_resource-export_symbol_gpl.patch" in the
> mm tree breaks p2pdma. The patch was written and reviewed before p2pdma
> was merged so the necessary changes were not done to the call site in
> that code.
> 
> Without this patch, all drivers will fail to register P2P resources
> because devm_memremap_pages() will return -EINVAL due to the 'kill'
> member of the pagemap structure not yet being set.
> 
> Signed-off-by: Logan Gunthorpe 
> Cc: Andrew Morton 
> Cc: Dan Williams 
> Cc: Bjorn Helgaas 

Applied with Dan's reviewed-by to pci/peer-to-peer for v4.21, thanks!

If the mm patch you mention gets merged for v4.20, let me know and I can
promote this to for-linus so v4.20 doesn't end up broken.

> ---
> 
> Ideally this patch should be squashed with the one mentioned above to
> avoid a bisect regression point.
> 
> drivers/pci/p2pdma.c | 10 ++
>  1 file changed, 2 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index ae3c5b25dcc7..a2eb25271c96 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -82,10 +82,8 @@ static void pci_p2pdma_percpu_release(struct percpu_ref *ref)
>   complete_all(>devmap_ref_done);
>  }
> 
> -static void pci_p2pdma_percpu_kill(void *data)
> +static void pci_p2pdma_percpu_kill(struct percpu_ref *ref)
>  {
> - struct percpu_ref *ref = data;
> -
>   /*
>* pci_p2pdma_add_resource() may be called multiple times
>* by a driver and may register the percpu_kill devm action multiple
> @@ -198,6 +196,7 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
>   pgmap->type = MEMORY_DEVICE_PCI_P2PDMA;
>   pgmap->pci_p2pdma_bus_offset = pci_bus_address(pdev, bar) -
>   pci_resource_start(pdev, bar);
> + pgmap->kill = pci_p2pdma_percpu_kill;
> 
>   addr = devm_memremap_pages(&pdev->dev, pgmap);
>   if (IS_ERR(addr)) {
> @@ -211,11 +210,6 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
>   if (error)
>   goto pgmap_free;
> 
> - error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_percpu_kill,
> -   &pdev->p2pdma->devmap_ref);
> - if (error)
> - goto pgmap_free;
> -
>   pci_info(pdev, "added peer-to-peer DMA memory %pR\n",
>>res);
> 
> --
> 2.19.0


Re: [PATCH] pcie: portdrv: Fix Unnecessary space before function pointer arguments

2018-12-06 Thread Bjorn Helgaas
On Sat, Dec 01, 2018 at 08:07:11AM -0800, Benjamin Young wrote:
> Made spacing more consistent in the code for function pointer
> declarations based on checkpatch.pl
> 
> Signed-off-by: Benjamin Young 

Applied to pci/misc for v4.21, thanks!

I also made similar changes to include/linux/pci.h.  For trivial changes
like this I like to fix similar issues in all of PCI at the same time.

> ---
>  drivers/pci/pcie/portdrv.h | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
> index e495f04..fbbf00b0 100644
> --- a/drivers/pci/pcie/portdrv.h
> +++ b/drivers/pci/pcie/portdrv.h
> @@ -71,19 +71,19 @@ static inline void *get_service_data(struct pcie_device *dev)
>  
>  struct pcie_port_service_driver {
>   const char *name;
> - int (*probe) (struct pcie_device *dev);
> - void (*remove) (struct pcie_device *dev);
> - int (*suspend) (struct pcie_device *dev);
> - int (*resume_noirq) (struct pcie_device *dev);
> - int (*resume) (struct pcie_device *dev);
> - int (*runtime_suspend) (struct pcie_device *dev);
> - int (*runtime_resume) (struct pcie_device *dev);
> + int (*probe)(struct pcie_device *dev);
> + void (*remove)(struct pcie_device *dev);
> + int (*suspend)(struct pcie_device *dev);
> + int (*resume_noirq)(struct pcie_device *dev);
> + int (*resume)(struct pcie_device *dev);
> + int (*runtime_suspend)(struct pcie_device *dev);
> + int (*runtime_resume)(struct pcie_device *dev);
>  
>   /* Device driver may resume normal operations */
>   void (*error_resume)(struct pci_dev *dev);
>  
>   /* Link Reset Capability - AER service driver specific */
> - pci_ers_result_t (*reset_link) (struct pci_dev *dev);
> + pci_ers_result_t (*reset_link)(struct pci_dev *dev);
>  
>   int port_type;  /* Type of the port this driver can handle */
>   u32 service;  /* Port service this device represents */
> -- 
> 2.5.0
> 


Re: [PATCH] pci: p2pdma: clean up documentation and kernel-doc

2018-12-06 Thread Bjorn Helgaas
On Sat, Dec 01, 2018 at 09:31:34AM -0800, Randy Dunlap wrote:
> From: Randy Dunlap 
> 
> Fix typos, spellos, and grammar in p2pdma.rst and p2pdma.c.
> 
> Fix return value(s) in function pci_p2pmem_alloc_sgl().
> 
> Signed-off-by: Randy Dunlap 
> Cc: linux-...@vger.kernel.org
> Cc: Bjorn Helgaas 
> Cc: Jonathan Corbet 
> Cc: Logan Gunthorpe 

Applied with Logan's ack to pci/peer-to-peer for v4.21, thanks!

> ---
>  Documentation/driver-api/pci/p2pdma.rst |4 ++--
>  drivers/pci/p2pdma.c|   14 +++---
>  2 files changed, 9 insertions(+), 9 deletions(-)
> 
> --- lnx-420-rc4.orig/Documentation/driver-api/pci/p2pdma.rst
> +++ lnx-420-rc4/Documentation/driver-api/pci/p2pdma.rst
> @@ -49,7 +49,7 @@ For example, in the NVMe Target Copy Off
>in that it exposes any CMB (Controller Memory Buffer) as a P2P memory
>resource (provider), it accepts P2P memory pages as buffers in requests
>to be used directly (client) and it can also make use of the CMB as
> -  submission queue entries (orchastrator).
> +  submission queue entries (orchestrator).
>  * The RDMA driver is a client in this arrangement so that an RNIC
>can DMA directly to the memory exposed by the NVMe device.
>  * The NVMe Target driver (nvmet) can orchestrate the data from the RNIC
> @@ -111,7 +111,7 @@ that's compatible with all clients using
>  If more than one provider is supported, the one nearest to all the clients will
>  be chosen first. If more than one provider is an equal distance away, the
>  one returned will be chosen at random (it is not an arbitrary but
> -truely random). This function returns the PCI device to use for the provider
> +truly random). This function returns the PCI device to use for the provider
>  with a reference taken and therefore when it's no longer needed it should be
>  returned with pci_dev_put().
>  
> --- lnx-420-rc4.orig/drivers/pci/p2pdma.c
> +++ lnx-420-rc4/drivers/pci/p2pdma.c
> @@ -422,7 +422,7 @@ static int upstream_bridge_distance_warn
>   *
>   * Returns -1 if any of the clients are not compatible (behind the same
>   * root port as the provider), otherwise returns a positive number where
> - * a lower number is the preferrable choice. (If there's one client
> + * a lower number is the preferable choice. (If there's one client
>   * that's the same as the provider it will return 0, which is best choice).
>   *
>   * For now, "compatible" means the provider and the clients are all behind
> @@ -493,7 +493,7 @@ EXPORT_SYMBOL_GPL(pci_has_p2pmem);
>   * @num_clients: number of client devices in the list
>   *
>   * If multiple devices are behind the same switch, the one "closest" to the
> - * client devices in use will be chosen first. (So if one of the providers are
> + * client devices in use will be chosen first. (So if one of the providers is
>   * the same as one of the clients, that provider will be used ahead of any
>   * other providers that are unrelated). If multiple providers are an equal
>   * distance away, one will be chosen at random.
> @@ -580,7 +580,7 @@ EXPORT_SYMBOL_GPL(pci_alloc_p2pmem);
>   * pci_free_p2pmem - free peer-to-peer DMA memory
>   * @pdev: the device the memory was allocated from
>   * @addr: address of the memory that was allocated
> - * @size: number of bytes that was allocated
> + * @size: number of bytes that were allocated
>   */
>  void pci_free_p2pmem(struct pci_dev *pdev, void *addr, size_t size)
>  {
> @@ -617,7 +617,7 @@ EXPORT_SYMBOL_GPL(pci_p2pmem_virt_to_bus
>   * @nents: the number of SG entries in the list
>   * @length: number of bytes to allocate
>   *
> - * Returns 0 on success
> + * Return: %NULL on error or  scatterlist pointer and @nents on success
>   */
>  struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev *pdev,
>unsigned int *nents, u32 length)
> @@ -673,7 +673,7 @@ EXPORT_SYMBOL_GPL(pci_p2pmem_free_sgl);
>   *
>   * Published memory can be used by other PCI device drivers for
>   * peer-2-peer DMA operations. Non-published memory is reserved for
> - * exlusive use of the device driver that registers the peer-to-peer
> + * exclusive use of the device driver that registers the peer-to-peer
>   * memory.
>   */
>  void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
> @@ -733,7 +733,7 @@ EXPORT_SYMBOL_GPL(pci_p2pdma_map_sg);
>   * @use_p2pdma: returns whether to enable p2pdma or not
>   *
>   * Parses an attribute value to decide whether to enable p2pdma.
> - * The value can select a PCI device (using it's full BDF device
> + * The value can select a PCI device (using its full BDF device
>   * name) or a boolean (in any format strtobool() accepts). A false

Re: [Bug] SD card reader in Acer Aspire S5 broken in 4.20-rc

2018-12-03 Thread Bjorn Helgaas
On Wed, Nov 28, 2018 at 02:05:21PM -0600, Bjorn Helgaas wrote:
> On Wed, Nov 28, 2018 at 6:13 AM Rafael J. Wysocki  wrote:
> > On Tuesday, November 27, 2018 9:25:14 PM CET Bjorn Helgaas wrote:
> > > On Mon, Nov 26, 2018 at 11:37:20PM +0100, Rafael J. Wysocki wrote:
> > > > On Monday, November 26, 2018 7:03:58 PM CET Rafael J. Wysocki wrote:
> > > > > Hi Bjorn,
> > > > >
> > > > > The SD card reader in my Acer Aspire S5 doesn't work with 4.20-rc.
> > > > >
> > > > > Here's what lspci -v says about it (in a bad kernel):
> > > > >
> > > > > > 02:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS5209 PCI Express Card Reader (rev 01)
> > > > > Subsystem: Acer Incorporated [ALI] Device 0704
> > > > > Flags: bus master, fast devsel, latency 0, IRQ 35
> > > > > Memory at d9001000 (32-bit, non-prefetchable) [size=4K]
> > > > > Capabilities: [40] Power Management version 3
> > > > > Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
> > > > > Capabilities: [70] Express Endpoint, MSI 00
> > > > > Capabilities: [100] Advanced Error Reporting
> > > > > > Capabilities: [140] Device Serial Number 00-00-00-01-00-4c-e0-00
> > > > > Kernel driver in use: rtsx_pci
> > > > > Kernel modules: rtsx_pci
> > >
> > > Thanks a lot for bisecting this!
> > >
> > > With a good kernel (v4.19 or v4.20-rc with 17c91487364f reverted),
> > > would you mind collecting "lspci -vv" output, the dmesg log with
> > > "pci=earlydump", and the FADT dump?
> >
> > All of the information is attached to the BZ entry at
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=201801
> 
> Thanks!  I hope Patrick has a chance to look at this.  Per the
> bugzilla mentioned in 17c91487364f, it fixes a problem with a custom
> proprietary PCIe device, and there's a lot of good detailed analysis
> there, so hopefully we can figure out a way to address both
> situations.

I queued up a revert on for-linus, since we haven't made any progress on
this yet.  I'll be on vacation much of this week, but I want to get
the revert (or better, a fix if we can find one) in before -rc6 comes
out next Sunday.

If we figure out a fix before then, I'll drop the revert and use the
fix instead.

Bjorn


Fwd: [Bug 201517] New: pcieport 0000:00:03.1: AER: Corrected error received: 0000:00:00.0

2018-12-03 Thread Bjorn Helgaas
[Forwarding this to linux-pci since nobody really monitors the bugzilla]

Possibly the same issue reported here:

  https://bugzilla.kernel.org/show_bug.cgi?id=109691
  https://bugzilla.kernel.org/show_bug.cgi?id=111601
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1588428/
  https://lore.kernel.org/linux-pci/20160215171423.GA12641@localhost/

I had a theory about the problem (see the lore.kernel link above), but
that was before a lot of AER rework, and I haven't checked the code
since then.

---------- Forwarded message ---------
From: 
Date: Thu, Oct 25, 2018 at 12:45 AM
Subject: [Bug 201517] New: pcieport 0000:00:03.1: AER: Corrected error received: 0000:00:00.0
To: 


https://bugzilla.kernel.org/show_bug.cgi?id=201517

Bug ID: 201517
   Summary: pcieport 0000:00:03.1: AER: Corrected error received: 0000:00:00.0
   Product: Drivers
   Version: 2.5
Kernel Version: 4.19
  Hardware: All
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: PCI
  Assignee: drivers_...@kernel-bugs.osdl.org
  Reporter: mikhail.v.gavri...@gmail.com
Regression: No

Created attachment 279149
  --> https://bugzilla.kernel.org/attachment.cgi?id=279149&action=edit
dmesg

I often get a strange error in the kernel log:

[ 8885.590311] pcieport 0000:00:03.1: AER: Corrected error received: 0000:00:00.0
[ 8885.590320] pcieport 0000:00:03.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
[ 8885.590324] pcieport 0000:00:03.1:   device [1022:1453] error status/mask=1000/6000
[ 8885.590328] pcieport 0000:00:03.1:    [12] Timeout

This does not happen on every boot. If the message starts to appear after a
reboot, it keeps recurring for as long as the system is up; if it does not
appear after a reboot, it does not appear at all until the next boot.

# lspci -nn
00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Root Complex [1022:1450]
00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) I/O Memory Management Unit [1022:1451]
00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
00:01.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
00:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
00:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]
00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]
00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 59)
00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 0 [1022:1460]
00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 1 [1022:1461]
00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 2 [1022:1462]
00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 3 [1022:1463]
00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 4 [1022:1464]
00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 5 [1022:1465]
00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 6 [1022:1466]
00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7 [1022:1467]
01:00.0 Non-Volatile memory controller [0108]: Intel Corporation Optane SSD 900P Series [8086:2700]
02:00.0 USB controller [0c03]: 

Re: [PATCH v6 2/2] PCI: amlogic: Add the Amlogic Meson PCIe controller driver

2018-12-03 Thread Bjorn Helgaas
On Mon, Dec 03, 2018 at 04:41:50PM +0000, Lorenzo Pieralisi wrote:
> On Thu, Nov 22, 2018 at 04:53:54PM +0800, Hanjie Lin wrote:
> 
> [...]
> 
> > +static int meson_pcie_rd_own_conf(struct pcie_port *pp, int where, int size,
> > + u32 *val)
> > +{
> > +   struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> > +
> > +   /*
> > +* There is a bug in the MESON AXG PCIe controller: software cannot
> > +* program the PCI_CLASS_DEVICE register, so we must return a fake but
> > +* correct value to ensure the driver can probe successfully.
> > +*/
> > +   if (where == PCI_CLASS_REVISION) {
> > +   *val = readl(pci->dbi_base + PCI_CLASS_REVISION);
> > +   /* keep revision id */
> > +   *val &= PCI_CLASS_REVISION_MASK;
> > +   *val |= PCI_CLASS_BRIDGE_PCI << 16;
> > +   return PCIBIOS_SUCCESSFUL;
> > +   }
> 
> As I said before, this looks broken. If this code (or other drivers with
> the same broken assumptions, eg dwc/pcie-qcom.c) carries out a, say,
> byte sized config access of eg PCI_CLASS_DEVICE you will get junk out of
> it according to your comment above.
> 
> I would like to pick Bjorn's brain on this to see what we can really do
> to fix this (and other) drivers.

  - Check to see whether you're reading anything in the 32-bit dword at
offset 0x08.

  - Do the 32-bit readl().

  - Insert the correct Sub-Class and Base Class code (you also throw
away the Programming Interface; not sure why that is)

  - If you're reading something smaller than 32 bits, mask & shift as
needed.  pci_bridge_emul_conf_read() does something similar that
you might be able to copy.
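
The steps above can be sketched in plain C. This is a minimal user-space
illustration under stated assumptions, not the driver's code:
emul_class_read() and both macros are hypothetical names, and the raw dword is
passed in as an argument instead of coming from readl(). Unlike the patch, the
sketch assumes the Programming Interface byte should be preserved.

```c
#include <assert.h>
#include <stdint.h>

#define CFG_CLASS_REVISION 0x08   /* dword: Revision ID, Prog IF, Sub-Class, Base Class */
#define CLASS_BRIDGE_PCI   0x0604 /* Base Class 0x06, Sub-Class 0x04 */

/*
 * Hypothetical helper: given the raw 32-bit dword read from offset 0x08,
 * return what a config read of `size` bytes at offset `where` should see
 * once the class code has been patched to "PCI-to-PCI bridge".
 */
static uint32_t emul_class_read(uint32_t raw, int where, int size)
{
	uint32_t dword;

	/* Only accesses that fall entirely inside the dword at 0x08. */
	assert(size == 1 || size == 2 || size == 4);
	assert(where >= CFG_CLASS_REVISION &&
	       where + size <= CFG_CLASS_REVISION + 4);

	/* Keep Revision ID (bits 7:0) and Prog IF (bits 15:8), replace the rest. */
	dword = (raw & 0x0000ffffu) | ((uint32_t)CLASS_BRIDGE_PCI << 16);

	/* Shift the requested bytes down, then mask to the access size. */
	dword >>= (where - CFG_CLASS_REVISION) * 8;
	if (size < 4)
		dword &= (1u << (size * 8)) - 1;

	return dword;
}
```

With this shape, a byte read of offset 0x0b and a word read of offset 0x0a see
the same class code that a full dword read of offset 0x08 would report.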

Out of curiosity, what code depends on PCI_CLASS_BRIDGE_PCI?  There
are several places in the kernel that currently depend on it, but I
think several of them *should* be checking dev->hdr_type to identify a
type 1 header instead.
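
As an illustration of the hdr_type check: the Header Type register (config
offset 0x0e) carries the header layout in bits 6:0 and the multi-function flag
in bit 7, so a type 1 header can be recognized without consulting the class
code at all. A minimal sketch; is_type1_header() and the macros are
hypothetical names chosen for this example, not the kernel's own definitions.

```c
#include <stdbool.h>
#include <stdint.h>

#define HDR_TYPE_LAYOUT_MASK 0x7f /* bits 6:0: header layout */
#define HDR_TYPE_BRIDGE      0x01 /* type 1: PCI-to-PCI bridge */

/* True if the Header Type register value describes a type 1 header. */
static bool is_type1_header(uint8_t hdr_type)
{
	/* Bit 7 (multi-function) must be ignored when comparing the layout. */
	return (hdr_type & HDR_TYPE_LAYOUT_MASK) == HDR_TYPE_BRIDGE;
}
```

A check like this keeps working on controllers such as the one above, where
the class code register cannot be programmed correctly.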

Bjorn


Re: [PATCH 2/2] PCI: mobiveil: ls_pcie_g4: add Workaround for A-011451

2018-12-03 Thread Bjorn Helgaas
On Sun, Dec 02, 2018 at 01:32:45PM +0000, Z.q. Hou wrote:
> From: Hou Zhiqiang 
> 
> When the LX2 PCIe controller is sending multiple split completions and
> the ACK latency timer expires, an ACK should be sent at priority.
> However, because of the large number of split completions and FC update
> DLLPs, the controller does not give priority to ACK transmission. This
> results in an ACK latency timer timeout error at the link partner, and
> the link partner replays the pending TLPs.
> 
> Workaround:
> 1. Reduce the ACK latency timeout value to a very small value.
> 2. Restrict the number of completions from the LX2 PCIe controller
>to 1, by changing the Max Read Request Size (MRRS) of link partner
>to the same value as Max Packet size (MPS).
> 
> This patch implements part 1; part 2 can be enabled with the kernel
> parameter 'pci=pcie_bus_perf'

So you're saying that users of this controller must boot with
"pci=pcie_bus_perf"?  That's a little unfriendly to users.  When they
forget to use that parameter and some mysterious PCIe error occurs,
they will not thank you.

We should be able to figure this out automatically via some sort of
quirk in the driver, and then do the right thing in the MPS/MRRS
configuration.  That would also give us a chance to make sure that
when the MPS/MRRS code changes, it can be done in a way that keeps
this Rev1.0 controller working.

If you depend on users booting with "pci=pcie_bus_perf", there's no
connection in the code, and if we change or remove that parameter, we
would have no clue that you depend on it.
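
As a rough illustration of what such a quirk would have to compute, the sketch
below sets Max_Read_Request_Size equal to Max_Payload_Size in a Device Control
register value. Both fields encode a size of 128 << n bytes (MPS in bits 7:5,
MRRS in bits 14:12). The helper name and macros are hypothetical, and a real
quirk would apply this to the link partner through the kernel's config
accessors rather than to a bare integer.

```c
#include <stdint.h>

#define DEVCTL_MPS_MASK   0x00e0 /* Max_Payload_Size, bits 7:5 */
#define DEVCTL_MPS_SHIFT  5
#define DEVCTL_MRRS_MASK  0x7000 /* Max_Read_Request_Size, bits 14:12 */
#define DEVCTL_MRRS_SHIFT 12

/* Return `devctl` with the MRRS field rewritten to match the MPS field. */
static uint16_t devctl_mrrs_equal_mps(uint16_t devctl)
{
	unsigned int mps = (devctl & DEVCTL_MPS_MASK) >> DEVCTL_MPS_SHIFT;

	/* Clear the old MRRS encoding and copy the MPS encoding into it. */
	return (uint16_t)((devctl & ~DEVCTL_MRRS_MASK) |
			  (mps << DEVCTL_MRRS_SHIFT));
}
```

For example, a register value of 0x2810 (MPS 128 bytes, MRRS 512 bytes)
becomes 0x0810, which leaves every read request small enough to be answered
with a single completion.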

> This erratum applies only to LX2160A Rev1.0 and will be fixed
> in Rev2.0.
> 
> Signed-off-by: Hou Zhiqiang 
> ---
>  .../pci/controller/mobiveil/pci-layerscape-gen4.c  | 14 ++
>  drivers/pci/controller/mobiveil/pcie-mobiveil.h|  4 
>  2 files changed, 18 insertions(+)
> 
> diff --git a/drivers/pci/controller/mobiveil/pci-layerscape-gen4.c b/drivers/pci/controller/mobiveil/pci-layerscape-gen4.c
> index 1fe56532b288..ef43033e1c2a 100644
> --- a/drivers/pci/controller/mobiveil/pci-layerscape-gen4.c
> +++ b/drivers/pci/controller/mobiveil/pci-layerscape-gen4.c
> @@ -220,6 +220,18 @@ static const struct mobiveil_pab_ops ls_pcie_g4_pab_ops = {
>   .link_up = ls_pcie_g4_link_up,
>  };
>  
> +static void workaround_A011451(struct ls_pcie_g4 *pcie)
> +{
> + struct mobiveil_pcie *mv_pci = pcie->pci;
> + u32 val;
> +
> + /* Set ACK latency timeout */
> + val = csr_readl(mv_pci, GPEX_ACK_REPLAY_TO);
> + val &= ~(ACK_LAT_TO_VAL_MASK << ACK_LAT_TO_VAL_SHIFT);
> + val |= (4 << ACK_LAT_TO_VAL_SHIFT);
> + csr_writel(mv_pci, val, GPEX_ACK_REPLAY_TO);
> +}
> +
>  static int __init ls_pcie_g4_probe(struct platform_device *pdev)
>  {
>   struct device *dev = &pdev->dev;
> @@ -259,6 +271,8 @@ static int __init ls_pcie_g4_probe(struct platform_device *pdev)
>   if (!ls_pcie_g4_is_bridge(pcie))
>   return -ENODEV;
>  
> + workaround_A011451(pcie);
> +
>   return 0;
>  }
>  
> diff --git a/drivers/pci/controller/mobiveil/pcie-mobiveil.h b/drivers/pci/controller/mobiveil/pcie-mobiveil.h
> index ef93b41f4419..c75b7c304c46 100644
> --- a/drivers/pci/controller/mobiveil/pcie-mobiveil.h
> +++ b/drivers/pci/controller/mobiveil/pcie-mobiveil.h
> @@ -85,6 +85,10 @@
>  #define PAB_AXI_AMAP_PEX_WIN_H(win)  PAB_REG_ADDR(0x0bac, win)
>  #define PAB_INTP_AXI_PIO_CLASS   0x474
>  
> +#define GPEX_ACK_REPLAY_TO   0x438
> +#define  ACK_LAT_TO_VAL_MASK 0x1fff
> +#define  ACK_LAT_TO_VAL_SHIFT0
> +
>  #define PAB_PEX_AMAP_CTRL(win)   PAB_REG_ADDR(0x4ba0, win)
>  #define  AMAP_CTRL_EN_SHIFT  0
>  #define  AMAP_CTRL_TYPE_SHIFT1
> -- 
> 2.17.1
> 


Re: [PATCH] x86/pci: Remove dead code DBG() macro

2018-12-03 Thread Bjorn Helgaas
On Mon, Dec 03, 2018 at 09:21:40AM +0100, Ingo Molnar wrote:
> From 22b71f970f18f5f38161be028ab7ce7cd1f769f7 Mon Sep 17 00:00:00 2001
> From: Ingo Molnar 
> Date: Mon, 3 Dec 2018 09:15:40 +0100
> Subject: [PATCH] x86/pci: Remove the dead-code DBG() macro
> 
> While reading arch/x86/include/asm/pci_x86.h I noticed that we have ancient
> residuals of debugging code, which is never actually enabled via any regular
> Kconfig mechanism:
> 
>  #undef DEBUG
> 
>  #ifdef DEBUG
>  #define DBG(fmt, ...) printk(fmt, ##__VA_ARGS__)
>  #else
>  #define DBG(fmt, ...)  \
>  do {   \
>if (0)  \
>printk(fmt, ##__VA_ARGS__); \
>  } while (0)
>  #endif
> 
> Remove this and the call sites. These messages might have been
> super interesting decades ago when the PCI code was first
> bootstrapped, but we have better mechanisms meanwhile, and code
> readability is king ... ;-)
> 
> Signed-off-by: Ingo Molnar 

Acked-by: Bjorn Helgaas 

for both of these.  I have nothing in the queue for these files, so
probably easier if you just take them.  FWIW, recent history
capitalizes the subject lines as "x86/PCI: "

Thanks for all this cleanup!

> ---
>  arch/x86/include/asm/pci_x86.h | 12 
>  arch/x86/pci/direct.c  |  1 -
>  arch/x86/pci/i386.c|  2 --
>  arch/x86/pci/irq.c | 29 +++--
>  arch/x86/pci/legacy.c  |  3 +--
>  arch/x86/pci/pcbios.c  |  9 ++---
>  6 files changed, 6 insertions(+), 50 deletions(-)
> 
> diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
> index 959d618dbb17..3f8d9c75b9ee 100644
> --- a/arch/x86/include/asm/pci_x86.h
> +++ b/arch/x86/include/asm/pci_x86.h
> @@ -7,18 +7,6 @@
>  
>  #include 
>  
> -#undef DEBUG
> -
> -#ifdef DEBUG
> -#define DBG(fmt, ...) printk(fmt, ##__VA_ARGS__)
> -#else
> -#define DBG(fmt, ...)\
> -do { \
> - if (0)  \
> - printk(fmt, ##__VA_ARGS__); \
> -} while (0)
> -#endif
> -
>  #define PCI_PROBE_BIOS   0x0001
>  #define PCI_PROBE_CONF1  0x0002
>  #define PCI_PROBE_CONF2  0x0004
> diff --git a/arch/x86/pci/direct.c b/arch/x86/pci/direct.c
> index a51074c55982..68a115629568 100644
> --- a/arch/x86/pci/direct.c
> +++ b/arch/x86/pci/direct.c
> @@ -216,7 +216,6 @@ static int __init pci_sanity_check(const struct pci_raw_ops *o)
>   return 1;
>   }
>  
> - DBG(KERN_WARNING "PCI: Sanity check failed\n");
>   return 0;
>  }
>  
> diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c
> index 8cd66152cdb0..5c2b31315750 100644
> --- a/arch/x86/pci/i386.c
> +++ b/arch/x86/pci/i386.c
> @@ -389,8 +389,6 @@ void __init pcibios_resource_survey(void)
>  {
>   struct pci_bus *bus;
>  
> - DBG("PCI: Allocating resources\n");
> -
>   list_for_each_entry(bus, &pci_root_buses, node)
>   pcibios_allocate_bus_resources(bus);
>  
> diff --git a/arch/x86/pci/irq.c b/arch/x86/pci/irq.c
> index 52e55108404e..1286f138e281 100644
> --- a/arch/x86/pci/irq.c
> +++ b/arch/x86/pci/irq.c
> @@ -77,11 +77,8 @@ static inline struct irq_routing_table *pirq_check_routing_table(u8 *addr)
>   sum = 0;
>   for (i = 0; i < rt->size; i++)
>   sum += addr[i];
> - if (!sum) {
> - DBG(KERN_DEBUG "PCI: Interrupt Routing Table found at 0x%p\n",
> - rt);
> + if (!sum)
>   return rt;
> - }
>   return NULL;
>  }
>  
> @@ -126,15 +123,6 @@ static void __init pirq_peer_trick(void)
>   memset(busmap, 0, sizeof(busmap));
>   for (i = 0; i < (rt->size - sizeof(struct irq_routing_table)) / sizeof(struct irq_info); i++) {
>   e = &rt->slots[i];
> -#ifdef DEBUG
> - {
> - int j;
> - DBG(KERN_DEBUG "%02x:%02x slot=%02x", e->bus, e->devfn/8, e->slot);
> - for (j = 0; j < 4; j++)
> - DBG(" %d:%02x/%04x", j, e->irq[j].link, e->irq[j].bitmap);
> - DBG("\n");
> - }
> -#endif
>   busmap[e->bus] = 1;
>   }
>   for (i = 1; i < 256; i++) {
> @@ -163,10 +151,8 @@ void elcr_set_level_irq(unsigned int irq)
>   elcr_irq_mask |= (1 << irq);
>   printk(KERN_DEBUG "

Re: [PATCH 1/2] PCI: mobiveil: ls_pcie_g4: add Workaround for A-011577

2018-12-03 Thread Bjorn Helgaas
On Sun, Dec 02, 2018 at 01:32:42PM +0000, Z.q. Hou wrote:
> From: Hou Zhiqiang 

Can we pick one driver name (either "mobiveil" or "ls_pcie_g4" (this
seems excessively long and excessively specific), or something else)?
I don't want to waste the space of "PCI: mobiveil: ls_pcie_g4:" in
every future subject line.

Then "Add workaround for ...".  I assume the "A-011577" part is
meaningful inside NXP, but it's not useful to anybody else.  Move that
to the changelog proper and say something about the actual issue in
the subject.

> PCIe configuration access to non-existent function triggered
> SERROR interrupt exception.
> 
> Workaround:
> Disable error reporting on AXI bus during the Vendor ID read
> transactions in enumeration.
> 
> This erratum applies only to LX2160A Rev1.0 and will be fixed
> in Rev2.0.
> 
> Signed-off-by: Hou Zhiqiang 
> ---
>  .../controller/mobiveil/pci-layerscape-gen4.c | 24 +++
>  .../controller/mobiveil/pcie-mobiveil-host.c  | 13 +-
>  .../pci/controller/mobiveil/pcie-mobiveil.h   |  2 ++
>  3 files changed, 38 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/controller/mobiveil/pci-layerscape-gen4.c b/drivers/pci/controller/mobiveil/pci-layerscape-gen4.c
> index 174cbcac4059..1fe56532b288 100644
> --- a/drivers/pci/controller/mobiveil/pci-layerscape-gen4.c
> +++ b/drivers/pci/controller/mobiveil/pci-layerscape-gen4.c
> @@ -24,6 +24,9 @@
>  
>  /* LUT and PF control registers */
>  #define PCIE_LUT_OFF (0x8)
> +#define PCIE_LUT_GCR (0x28)
> +#define PCIE_LUT_GCR_RRE (0)
> +
>  #define PCIE_PF_OFF  (0xc)
>  #define PCIE_PF_INT_STAT (0x18)
>  #define PF_INT_STAT_PABRST   (31)
> @@ -188,8 +191,29 @@ static void ls_pcie_g4_reset(struct work_struct *work)
>   ls_pcie_g4_reinit_hw(pcie);
>  }
>  
> +static int ls_pcie_g4_read_other_conf(struct pci_bus *bus, unsigned int devfn,
> +int where, int size, u32 *val)
> +{
> + struct mobiveil_pcie *pci = bus->sysdata;
> + struct ls_pcie_g4 *pcie = to_ls_pcie_g4(pci);
> + int ret;
> +
> + if (where == PCI_VENDOR_ID)
> + ls_pcie_g4_lut_writel(pcie, PCIE_LUT_GCR,
> +   0 << PCIE_LUT_GCR_RRE);
> +
> + ret = pci_generic_config_read(bus, devfn, where, size, val);
> +
> + if (where == PCI_VENDOR_ID)
> + ls_pcie_g4_lut_writel(pcie, PCIE_LUT_GCR,
> +   1 << PCIE_LUT_GCR_RRE);

1) As a general style rule, it's better to "clear, then restore" than
to "clear, then set" the bit.  That way if somebody elsewhere decides
that PCIE_LUT_GCR_RRE should be cleared by default, this code won't
stomp on that decision.  E.g.,

  gcr = ls_pcie_g4_lut_readl(...);
  ls_pcie_g4_lut_writel(..., 0 << PCIE_LUT_GCR_RRE);
  ret = pci_generic_config_read(...);
  ls_pcie_g4_lut_writel(..., gcr);

2) I don't *think* the PCIe spec requires that the first access to a
device be a read of the Vendor ID, so this is a 99% solution, not a
100% solution.  A 100% solution would be to handle the SERROR so it's
not fatal.  But I'm pretty sure Linux always does read the Vendor ID
first (except after a reset, and when we do config reads after a
reset, we already know the device *exists*), so this is probably
pretty safe.
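
The "clear, then restore" pattern from point 1 can be sketched in
self-contained C. Everything here is an assumption made for illustration: the
register is modeled by a plain variable, and gcr_read(), gcr_write(),
with_rre_cleared() and sample_access() are hypothetical stand-ins for the
driver's LUT accessors and the config read. The sketch also clears only the
RRE bit rather than writing zero to the whole register.

```c
#include <stdint.h>

#define GCR_RRE_BIT 0 /* bit position, mirroring PCIE_LUT_GCR_RRE in the patch */

static uint32_t fake_gcr; /* stands in for the PCIE_LUT_GCR register */

static uint32_t gcr_read(void)
{
	return fake_gcr;
}

static void gcr_write(uint32_t val)
{
	fake_gcr = val;
}

/* Example access: reports the state of the RRE bit while it runs. */
static int sample_access(void)
{
	return (int)((gcr_read() >> GCR_RRE_BIT) & 1u);
}

/* Run `access` with RRE cleared, then put back whatever was there before. */
static int with_rre_cleared(int (*access)(void))
{
	uint32_t saved = gcr_read();
	int ret;

	gcr_write(saved & ~(UINT32_C(1) << GCR_RRE_BIT));
	ret = access();
	gcr_write(saved); /* restore, do not unconditionally set */

	return ret;
}
```

If the bit was already clear on entry, it stays clear afterwards, which is the
property the unconditional "1 << PCIE_LUT_GCR_RRE" write in the patch does not
have.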

> + return ret;
> +}
> +
>  static struct mobiveil_rp_ops ls_pcie_g4_rp_ops = {
>   .interrupt_init = ls_pcie_g4_interrupt_init,
> + .read_other_conf = ls_pcie_g4_read_other_conf,
>  };
>  
>  static const struct mobiveil_pab_ops ls_pcie_g4_pab_ops = {
> diff --git a/drivers/pci/controller/mobiveil/pcie-mobiveil-host.c b/drivers/pci/controller/mobiveil/pcie-mobiveil-host.c
> index c85f00d3cfcf..8b6db38320d7 100644
> --- a/drivers/pci/controller/mobiveil/pcie-mobiveil-host.c
> +++ b/drivers/pci/controller/mobiveil/pcie-mobiveil-host.c
> @@ -79,9 +79,20 @@ static void __iomem *mobiveil_pcie_map_bus(struct pci_bus *bus,
>   return pcie->rp.config_axi_slave_base + where;
>  }
>  
> +static int mobiveil_pcie_config_read(struct pci_bus *bus, unsigned int devfn,
> +  int where, int size, u32 *val)
> +{
> + struct mobiveil_pcie *pcie = bus->sysdata;
> + struct root_port *rp = &pcie->rp;
> +
> + if (bus->number > rp->root_bus_nr && rp->ops->read_other_conf)
> + return rp->ops->read_other_conf(bus, devfn, where, size, val);
> +
> + return pci_generic_config_read(bus, devfn, where, size, val);
> +}
>  static struct pci_ops mobiveil_pcie_ops = {
>   .map_bus = mobiveil_pcie_map_bus,
> - .read = pci_generic_config_read,
> + .read = mobiveil_pcie_config_read,
>   .write = pci_generic_config_write,
>  };
>  
> diff --git a/drivers/pci/controller/mobiveil/pcie-mobiveil.h 
> b/drivers/pci/controller/mobiveil/pcie-mobiveil.h
> index 0ccd6cee5f8f..ef93b41f4419 100644
> --- a/drivers/pci/controller/mobiveil/pcie-mobiveil.h
> +++ 

[GIT PULL] PCI fixes for v4.20

2018-11-30 Thread Bjorn Helgaas
PCI fixes:

  - Fix a link speed checking interface that broke PCIe gen3 cards in gen1
slots (Mikulas Patocka)

  - Fix an imx6 link training error (Trent Piepho)

  - Fix a layerscape outbound window accessor calling error (Hou Zhiqiang)

  - Fix a DesignWare endpoint MSI-X address calculation error (Gustavo
Pimentel)


The following changes since commit 0d76bcc960e6057750fcf556b65da13f8bbdfd2b:

  Revert "ACPI/PCI: Pay attention to device-specific _PXM node values" 
(2018-11-13 08:38:17 -0600)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git 
tags/pci-v4.20-fixes-2

for you to fetch changes up to c74eadf881ad634c68880e2c1b504989d95993ee:

  Merge remote-tracking branch 'lorenzo/pci/controller-fixes' into for-linus 
(2018-11-30 23:42:08 -0600)


pci-v4.20-fixes-2

--------
Bjorn Helgaas (1):
  Merge remote-tracking branch 'lorenzo/pci/controller-fixes' into for-linus

Gustavo Pimentel (1):
  PCI: dwc: Fix MSI-X EP framework address calculation bug

Hou Zhiqiang (1):
  PCI: layerscape: Fix wrong invocation of outbound window disable accessor

Mikulas Patocka (1):
  PCI: Fix incorrect value returned from pcie_get_speed_cap()

Trent Piepho (1):
  PCI: imx6: Fix link training status detection in link up check

 drivers/pci/controller/dwc/pci-imx6.c   | 10 +-
 drivers/pci/controller/dwc/pci-layerscape.c |  2 +-
 drivers/pci/controller/dwc/pcie-designware-ep.c |  1 -
 drivers/pci/pci.c   | 24 +++-
 4 files changed, 13 insertions(+), 24 deletions(-)


Re: [PATCH] PCI: pciehp: Report degraded links via link bandwidth notification

2018-11-29 Thread Bjorn Helgaas
On Thu, Nov 29, 2018 at 08:13:12PM +0100, Lukas Wunner wrote:
> On Thu, Nov 29, 2018 at 06:57:37PM +0000, alex_gagn...@dellteam.com wrote:
> > On 11/29/2018 11:36 AM, Bjorn Helgaas wrote:
> > > On Wed, Nov 28, 2018 at 06:08:24PM -0600, Alexandru Gagniuc wrote:
> > >> A warning is generated when a PCIe device is probed with a degraded
> > >> link, but there was no similar mechanism to warn when the link becomes
> > >> degraded after probing. The Link Bandwidth Notification provides this
> > >> mechanism.
> > >>
> > >> Use the link bandwidth notification interrupt to detect bandwidth
> > >> changes, and rescan the bandwidth, looking for the weakest point. This
> > >> is the same logic used in probe().
> > > 
> > > I like the concept of this.  What I don't like is the fact that it's
> > > tied to pciehp, since I don't think the concept of Link Bandwidth
> > > Notification is related to hotplug.  So I think we'll only notice this
> > > for ports that support hotplug.  Maybe it's worth doing it this way
> > > anyway, even if it could be generalized in the future?
> > 
> > That makes sense. At first, I thought that BW notification was tied to 
> > hotplug, but our PCIe spec writer disagreed with that assertion. I'm 
> > just not sure where to handle the interrupt otherwise.
> 
> I guess the interrupt is shared with hotplug and PME?  In that case write
> a separate pcie_port_service_driver and request the interrupt with
> IRQF_SHARED.  Define a new service type in drivers/pci/pcie/portdrv.h.
> Amend get_port_device_capability() to check for PCI_EXP_LNKCAP_LBNC.

I really don't like the port driver design.  I'd rather integrate
those services more tightly into the PCI core.  But realistically
that's wishful thinking and may never happen, so this might be the
most expedient approach.

Bjorn


Re: [PATCH] PCI: pciehp: Report degraded links via link bandwidth notification

2018-11-29 Thread Bjorn Helgaas
On Wed, Nov 28, 2018 at 06:08:24PM -0600, Alexandru Gagniuc wrote:
> A warning is generated when a PCIe device is probed with a degraded
> link, but there was no similar mechanism to warn when the link becomes
> degraded after probing. The Link Bandwidth Notification provides this
> mechanism.
> 
> Use the link bandwidth notification interrupt to detect bandwidth
> changes, and rescan the bandwidth, looking for the weakest point. This
> is the same logic used in probe().

I like the concept of this.  What I don't like is the fact that it's
tied to pciehp, since I don't think the concept of Link Bandwidth
Notification is related to hotplug.  So I think we'll only notice this
for ports that support hotplug.  Maybe it's worth doing it this way
anyway, even if it could be generalized in the future?
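
For reference, a rough sketch of the register bits involved, independent of
pciehp.  The three bit values are the ones in include/uapi/linux/pci_regs.h;
struct link_regs and the two helpers are hypothetical stand-ins that operate
on in-memory copies of the registers rather than on real config space.  The
handler acknowledges only LBMS (which is RW1C in hardware) and leaves the
other status bits alone, which is an assumption about what a generic handler
would want, not what the patch below does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit definitions as found in include/uapi/linux/pci_regs.h */
#define PCI_EXP_LNKCAP_LBNC	0x00200000 /* Link Bandwidth Notification Capability */
#define PCI_EXP_LNKCTL_LBMIE	0x0400     /* Link Bandwidth Management Interrupt Enable */
#define PCI_EXP_LNKSTA_LBMS	0x4000     /* Link Bandwidth Management Status */

/* Hypothetical in-memory copy of a port's link registers. */
struct link_regs {
	uint32_t lnkcap;
	uint16_t lnkctl;
	uint16_t lnksta;
};

/* Enable the interrupt only if the port advertises the capability. */
static bool bw_notification_enable(struct link_regs *regs)
{
	assert(regs);

	if (!(regs->lnkcap & PCI_EXP_LNKCAP_LBNC))
		return false;

	regs->lnkctl |= PCI_EXP_LNKCTL_LBMIE;
	return true;
}

/*
 * Return true if a bandwidth event was pending.  Real hardware clears
 * LBMS when a 1 is written to it; this sketch emulates that by clearing
 * the bit in the copy and leaving every other status bit untouched.
 */
static bool bw_notification_irq(struct link_regs *regs)
{
	assert(regs);

	if (!(regs->lnksta & PCI_EXP_LNKSTA_LBMS))
		return false;

	regs->lnksta &= (uint16_t)~PCI_EXP_LNKSTA_LBMS;
	return true;
}
```

Nothing in this sequence depends on the Slot Capabilities or on hotplug.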

> Signed-off-by: Alexandru Gagniuc 
> ---
>  drivers/pci/hotplug/pciehp_hpc.c | 35 +++-
>  1 file changed, 34 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c 
> b/drivers/pci/hotplug/pciehp_hpc.c
> index 7dd443aea5a5..834672000b59 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -515,7 +515,8 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
>   struct controller *ctrl = (struct controller *)dev_id;
>   struct pci_dev *pdev = ctrl_dev(ctrl);
>   struct device *parent = pdev->dev.parent;
> - u16 status, events;
> + struct pci_dev *endpoint;
> + u16 status, events, link_status;
>  
>   /*
>* Interrupts only occur in D3hot or shallower and only if enabled
> @@ -525,6 +526,17 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
>   (!(ctrl->slot_ctrl & PCI_EXP_SLTCTL_HPIE) && !pciehp_poll_mode))
>   return IRQ_NONE;
>  
> + pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &link_status);
> +
> + if (link_status & PCI_EXP_LNKSTA_LBMS) {
> + if (pdev->subordinate && pdev->subordinate->self)
> + endpoint = pdev->subordinate->self;
> + else
> + endpoint = pdev;
> + __pcie_print_link_status(endpoint, false);
> + pcie_capability_write_word(pdev, PCI_EXP_LNKSTA, link_status);
> + }
> +
>   /*
>* Keep the port accessible by holding a runtime PM ref on its parent.
>* Defer resume of the parent to the IRQ thread if it's suspended.
> @@ -677,6 +689,24 @@ static int pciehp_poll(void *data)
>   return 0;
>  }
>  
> +static bool pcie_link_bandwidth_notification_supported(struct controller *ctrl)
> +{
> + int ret;
> + u32 cap;
> +
> + ret = pcie_capability_read_dword(ctrl_dev(ctrl), PCI_EXP_LNKCAP, &cap);
> + return (ret == PCIBIOS_SUCCESSFUL) && (cap & PCI_EXP_LNKCAP_LBNC);
> +}
> +
> +static void pcie_enable_link_bandwidth_notification(struct controller *ctrl)
> +{
> + u16 lnk_ctl;
> +
> + pcie_capability_read_word(ctrl_dev(ctrl), PCI_EXP_LNKCTL, &lnk_ctl);
> + lnk_ctl |= PCI_EXP_LNKCTL_LBMIE;
> + pcie_capability_write_word(ctrl_dev(ctrl), PCI_EXP_LNKCTL, lnk_ctl);
> +}
> +
>  static void pcie_enable_notification(struct controller *ctrl)
>  {
>   u16 cmd, mask;
> @@ -713,6 +743,9 @@ static void pcie_enable_notification(struct controller 
> *ctrl)
>   pcie_write_cmd_nowait(ctrl, cmd, mask);
>   ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
>pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, cmd);
> +
> + if (pcie_link_bandwidth_notification_supported(ctrl))
> + pcie_enable_link_bandwidth_notification(ctrl);
>  }
>  
>  static void pcie_disable_notification(struct controller *ctrl)
> -- 
> 2.17.1
> 


Re: [PATCH v2] PCI: assign bus numbers present in EA capability for bridges

2018-11-29 Thread Bjorn Helgaas
On Thu, Nov 29, 2018 at 07:00:14PM +0530, sundeep subbaraya wrote:
> On Thu, Nov 29, 2018 at 3:25 AM Bjorn Helgaas  wrote:
> > On Mon, Nov 19, 2018 at 06:44:32PM +0530, sundeep.l...@gmail.com wrote:
> > > From: Subbaraya Sundeep 
> > >
> > > As per the spec, bridges with EA capability work
> > > with fixed secondary and subordinate bus numbers.
> > > Hence assign bus numbers to bridges from EA if the
> > > capability exists.
> >
> > A reference to the spec section would be good, e.g., PCIe r4.0, sec xxx.
> >
> Ok. I referred ECN 2014 section 6.9.1.2.
> 
> > > Signed-off-by: Subbaraya Sundeep 
> > > ---
> > > Changes for v2:
> > >   No changes just added Sean Stalley who did EA support for BARs.
> > >
> > >  drivers/pci/probe.c   | 58 
> > > ---
> > >  include/uapi/linux/pci_regs.h |  6 +
> > >  2 files changed, 60 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> > > index b1c05b5..f41d2e6 100644
> > > --- a/drivers/pci/probe.c
> > > +++ b/drivers/pci/probe.c
> > > @@ -1030,6 +1030,40 @@ static void pci_enable_crs(struct pci_dev *pdev)
> > >
> > >  static unsigned int pci_scan_child_bus_extend(struct pci_bus *bus,
> > > unsigned int available_buses);
> > > +/*
> > > + * pci_ea_fixed_busnrs() - Read fixed Secondary and Subordinate bus
> > > + * numbers from EA capability.
> > > + * @dev: Bridge with EA
> > > + * @secondary: updated with secondary bus number in EA
> > > + * @subordinate: updated with subordinate bus number in EA
> > > + *
> > > + * If it is a bridge with EA capability then fixed bus numbers are
> > > + * read from EA capability list and secondary, subordinate reference
> > > + * variables will be updated. Otherwise secondary and subordinate 
> > > reference
> > > + * variables will be zeroed.
> > > + */
> > > +static void pci_ea_fixed_busnrs(struct pci_dev *dev, u8 *secondary,
> > > + u8 *subordinate)
> > > +{
> > > + int ea;
> > > + int offset;
> > > + u32 dw;
> > > +
> > > + *secondary = *subordinate = 0;
> > > +
> > > + if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE)
> > > + return;
> > > +
> > > + /* find PCI EA capability in list */
> > > + ea = pci_find_capability(dev, PCI_CAP_ID_EA);
> > > + if (!ea)
> > > + return;
> > > +
> > > + offset = ea + PCI_EA_FIRST_ENT;
> > > + pci_read_config_dword(dev, offset, &dw);
> >
> > "Num Entries" in the first DW of the capability is allowed to be zero,
> > in which case this word (the second DW) is invalid.  [See comments
> > below; this code would be valid based on the 2014 ECN, but not per the
> > 2017 PCIe r4.0 spec]
> >
> Yes but Entries follow after first DW of EA capability for devices and
> after second
> DW for bridges.
> 2014 ECN says for bridges DW2 of EA must be present:
> "For Type 1 functions only, there is a second DW in the capability,
> preceding the first entry.
> This second DW must be included in the Enhanced Allocation Capability
> whenever this
> capability is implemented in a Type 1 Function"
> So for normal device EA DW1, Entries(if any)
> for bridges EA DW1,EA DW2, Entries(if any)
> 
> > It would be much better if this function could be somehow incorporated
> > into pci_ea_init(), which already knows how to parse much of the EA
> > capability.
> >
> I initially thought of this but didn't do it to avoid new members in pci_dev.
> 
> > > + *secondary =  dw & PCI_EA_SEC_BUS_MASK;
> > > + *subordinate = (dw & PCI_EA_SUB_BUS_MASK) >> PCI_EA_SUB_BUS_SHIFT;
> > > +}
> > >
> > >  /*
> > >   * pci_scan_bridge_extend() - Scan buses behind a bridge
> > > @@ -1064,6 +1098,8 @@ static int pci_scan_bridge_extend(struct pci_bus 
> > > *bus, struct pci_dev *dev,
> > >   u16 bctl;
> > >   u8 primary, secondary, subordinate;
> > >   int broken = 0;
> > > + u8 fixed_sec, fixed_sub;
> > > + int next_busnr;
> > >
> > >   /*
> > >* Make sure the bridge is powered on to be able to access config
> > > @@ -1163,17 +1199,25 @@ static int pci_scan_bridge_e

Re: [PATCH v2] PCI: assign bus numbers present in EA capability for bridges

2018-11-28 Thread Bjorn Helgaas
On Mon, Nov 19, 2018 at 06:44:32PM +0530, sundeep.l...@gmail.com wrote:
> From: Subbaraya Sundeep 
> 
> As per the spec, bridges with EA capability work
> with fixed secondary and subordinate bus numbers.
> Hence assign bus numbers to bridges from EA if the
> capability exists.

A reference to the spec section would be good, e.g., PCIe r4.0, sec xxx.

> Signed-off-by: Subbaraya Sundeep 
> ---
> Changes for v2:
>   No changes just added Sean Stalley who did EA support for BARs.
> 
>  drivers/pci/probe.c   | 58 
> ---
>  include/uapi/linux/pci_regs.h |  6 +
>  2 files changed, 60 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index b1c05b5..f41d2e6 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1030,6 +1030,40 @@ static void pci_enable_crs(struct pci_dev *pdev)
>  
>  static unsigned int pci_scan_child_bus_extend(struct pci_bus *bus,
> unsigned int available_buses);
> +/*
> + * pci_ea_fixed_busnrs() - Read fixed Secondary and Subordinate bus
> + * numbers from EA capability.
> + * @dev: Bridge with EA
> + * @secondary: updated with secondary bus number in EA
> + * @subordinate: updated with subordinate bus number in EA
> + *
> + * If it is a bridge with EA capability then fixed bus numbers are
> + * read from EA capability list and secondary, subordinate reference
> + * variables will be updated. Otherwise secondary and subordinate reference
> + * variables will be zeroed.
> + */
> +static void pci_ea_fixed_busnrs(struct pci_dev *dev, u8 *secondary,
> + u8 *subordinate)
> +{
> + int ea;
> + int offset;
> + u32 dw;
> +
> + *secondary = *subordinate = 0;
> +
> + if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE)
> + return;
> +
> + /* find PCI EA capability in list */
> + ea = pci_find_capability(dev, PCI_CAP_ID_EA);
> + if (!ea)
> + return;
> +
> + offset = ea + PCI_EA_FIRST_ENT;
> + pci_read_config_dword(dev, offset, &dw);

"Num Entries" in the first DW of the capability is allowed to be zero,
in which case this word (the second DW) is invalid.  [See comments
below; this code would be valid based on the 2014 ECN, but not per the
2017 PCIe r4.0 spec]

It would be much better if this function could be somehow incorporated
into pci_ea_init(), which already knows how to parse much of the EA
capability.
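
To illustrate the Num Entries concern, here is a small sketch that decodes
the first two DWs of the capability after they have been read from config
space.  It follows the reading above that the second DW should not be
trusted when Num Entries is zero.  The helper name and the EA_* macros are
hypothetical; the secondary/subordinate field positions (bits 7:0 and 15:8
of the second DW) are assumed to match what the patch's new pci_regs.h
definitions encode, which are not visible in this hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EA_NUM_ENT_SHIFT	16	/* Num Entries: bits 21:16 of the first DW */
#define EA_NUM_ENT_MASK		0x3f
#define EA_SEC_BUS_MASK		0x00ff	/* assumed: bits 7:0 of the second DW */
#define EA_SUB_BUS_MASK		0xff00	/* assumed: bits 15:8 of the second DW */
#define EA_SUB_BUS_SHIFT	8

/*
 * Decode fixed bus numbers from the first two DWs of a bridge's EA
 * capability.  Returns false and zeroes both outputs when Num Entries
 * is zero, i.e. when the second DW is treated as invalid.
 */
static bool ea_fixed_busnrs(uint32_t dw1, uint32_t dw2,
			    uint8_t *secondary, uint8_t *subordinate)
{
	assert(secondary && subordinate);

	*secondary = 0;
	*subordinate = 0;

	if (((dw1 >> EA_NUM_ENT_SHIFT) & EA_NUM_ENT_MASK) == 0)
		return false;

	*secondary = (uint8_t)(dw2 & EA_SEC_BUS_MASK);
	*subordinate = (uint8_t)((dw2 & EA_SUB_BUS_MASK) >> EA_SUB_BUS_SHIFT);
	return true;
}
```

A check like this would sit naturally next to the Num Entries parsing that
pci_ea_init() already does.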

> + *secondary =  dw & PCI_EA_SEC_BUS_MASK;
> + *subordinate = (dw & PCI_EA_SUB_BUS_MASK) >> PCI_EA_SUB_BUS_SHIFT;
> +}
>  
>  /*
>   * pci_scan_bridge_extend() - Scan buses behind a bridge
> @@ -1064,6 +1098,8 @@ static int pci_scan_bridge_extend(struct pci_bus *bus, 
> struct pci_dev *dev,
>   u16 bctl;
>   u8 primary, secondary, subordinate;
>   int broken = 0;
> + u8 fixed_sec, fixed_sub;
> + int next_busnr;
>  
>   /*
>* Make sure the bridge is powered on to be able to access config
> @@ -1163,17 +1199,25 @@ static int pci_scan_bridge_extend(struct pci_bus 
> *bus, struct pci_dev *dev,
>   /* Clear errors */
>   pci_write_config_word(dev, PCI_STATUS, 0xffff);
>  
> + /* read bus numbers from EA */
> + pci_ea_fixed_busnrs(dev, &fixed_sec, &fixed_sub);
> +
> + next_busnr = max + 1;
> + /* Use secondary bus number in EA */
> + if (fixed_sec)
> + next_busnr = fixed_sec;
> +
>   /*
>* Prevent assigning a bus number that already exists.
>* This can happen when a bridge is hot-plugged, so in this
>* case we only re-scan this bus.
>*/
> - child = pci_find_bus(pci_domain_nr(bus), max+1);
> + child = pci_find_bus(pci_domain_nr(bus), next_busnr);
>   if (!child) {
> - child = pci_add_new_bus(bus, dev, max+1);
> + child = pci_add_new_bus(bus, dev, next_busnr);
>   if (!child)
>   goto out;
> - pci_bus_insert_busn_res(child, max+1,
> + pci_bus_insert_busn_res(child, next_busnr,
>   bus->busn_res.end);
>   }
>   max++;
> @@ -1234,7 +1278,13 @@ static int pci_scan_bridge_extend(struct pci_bus *bus, 
> struct pci_dev *dev,
>   max += i;
>   }
>  
> - /* Set subordinate bus number to its real value */
> + /*
> +  * Set subordinate bus number to its real value.
> +  * If fixed subordinate bus number exists from EA
> +  * capability then use it.
> +  */
> + if (fixed_sub)
> + max = fixed_sub;
>   pci_bus_update_busn_res_end(child, max);
>   pci_write_config_byte(dev, PCI_SUBORDINATE_BUS, 

Re: Fwd: [Bug 201647] New: Intel Wireless card 3165 does not get detected but bluetooth works

2018-11-28 Thread Bjorn Helgaas
[+cc Emmanuel, LKML]

On Fri, Nov 09, 2018 at 03:43:06PM -0600, Bjorn Helgaas wrote:
> -- Forwarded message -
> From: 
> Date: Fri, Nov 9, 2018 at 4:10 AM
> Subject: [Bug 201647] New: Intel Wireless card 3165 does not get
> detected but bluetooth works
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=201647
> 
> Bug ID: 201647
>Summary: Intel Wireless card 3165 does not get detected but
> bluetooth works
>Product: Drivers
>Version: 2.5
> Kernel Version: 4.19.1
>   Hardware: Intel
> OS: Linux
>   Tree: Mainline
> Status: NEW
>   Severity: high
>   Priority: P1
>  Component: PCI
>   Assignee: drivers_...@kernel-bugs.osdl.org
>   Reporter: mertar...@gmail.com
> Regression: No
> 
> This bug affects most of the devices with a Celeron N4000 and an
> Intel wifi 3165 Ac adapter.
> 
> When using Linux wifi is not working however, Bluetooth is working
> fine.  Also, Bluetooth part of this chip is connected via btusb and
> the wifi part of this chip is connected via PCIe.

Can you attach a screenshot of the Windows 10 device manager info for
the wifi adapter to the bugzilla?  If you can get a raw hex dump of
its config space, that would be awesome.

Also attach a copy of your kernel .config file (typically in /boot/).

My only guess is that maybe the system keeps wifi completely powered
down and uses hotplug to add it when needed.  [1] mentions wifi being
on pcibus 1 under Windows.  Your lspci does show bridge 00:13.0
leading to bus 01, but Linux doesn't find any devices on bus 01.

Hotplug could be done via either acpiphp (ACPI mediated hotplug) or
pciehp (native PCIe hotplug).  Your dmesg shows you do have acpiphp.

I can't tell about pciehp (your .config will show that), but I think
pciehp will only claim bridges where SltCap contains HotPlug+, and
yours shows HotPlug-, so I don't think pciehp will do anything on your
system.

Even if the system does use hotplug, I don't know what mechanism the
OS would use to wake up the device, since we don't know it even
exists.  I guess there could be some magic switch accessible via USB.
But if that were the case, I'm sure Emmanuel would know about it.

[1] 
https://www.chinamobilemag.de/forum/hardware/3779-teclast-f5-linux-erkennt-kein-wlan-intel-3165-in-ubuntu-18-04-1.html?start=10


Re: [PATCH] PCI: Mark NXP LS1088 to avoid bus reset bus

2018-11-28 Thread Bjorn Helgaas
On Tue, Nov 27, 2018 at 10:32 PM Bharat Bhushan  wrote:

> > -Original Message-
> > From: Alex Williamson 
> > Sent: Tuesday, November 27, 2018 9:39 PM
> > To: Bjorn Helgaas 
> > Cc: Bharat Bhushan ; linux-...@vger.kernel.org;
> > linux-kernel@vger.kernel.org; bharatb.ya...@gmail.com; David Daney
> > ; Jan Glauber ; Maik
> > Broemme ; Chris Blake
> > 
> > Subject: Re: [PATCH] PCI: Mark NXP LS1088 to avoid bus reset bus
> >
> > On Tue, 27 Nov 2018 09:33:56 -0600
> > Bjorn Helgaas  wrote:

> > > 4) Is there a hardware erratum for this?  If so, please include the
> > > URL here.
>
> No h/w errata as of now.

Does that mean (a) the HW folks agree this is a hardware problem but
they haven't written an erratum, (b) there is an erratum but it isn't
public, (c) we don't have any concrete evidence of a hardware problem,
but things just don't work if we do a bus reset, (d) something else?

> In pci_reset_secondary_bus() I have tried to increase the delay after reset 
> but not helped.
> Do I need to add delay at some other place as well?

No, I think the place you tried should be enough.

You should also be able to exercise this from user-space by using
"setpci" to set and clear the Secondary Bus Reset bit in the Bridge
Control register.  Then you can also use setpci to read/write config
space of the NIC.  The kernel would normally read the Vendor and
Device IDs as the first access to the device during enumeration.  You
also might be able to learn something by using "lspci -vv" on the
bridge before and after the reset to see if it logs any AER bits (if
it supports AER) or the other standard error logging bits.


Re: [Bug] SD card reader in Acer Aspire S5 broken in 4.20-rc

2018-11-28 Thread Bjorn Helgaas
On Wed, Nov 28, 2018 at 6:13 AM Rafael J. Wysocki  wrote:
>
> On Tuesday, November 27, 2018 9:25:14 PM CET Bjorn Helgaas wrote:
> > On Mon, Nov 26, 2018 at 11:37:20PM +0100, Rafael J. Wysocki wrote:
> > > On Monday, November 26, 2018 7:03:58 PM CET Rafael J. Wysocki wrote:
> > > > Hi Bjorn,
> > > >
> > > > The SD card reader in my Acer Aspire S5 doesn't work with 4.20-rc.
> > > >
> > > > Here's what lspci -v says about it (in a bad kernel):
> > > >
> > > > 02:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. 
> > > > RTS5209 PCI Express Card Reader
> > > > (rev 01)
> > > > Subsystem: Acer Incorporated [ALI] Device 0704
> > > > Flags: bus master, fast devsel, latency 0, IRQ 35
> > > > Memory at d9001000 (32-bit, non-prefetchable) [size=4K]
> > > > Capabilities: [40] Power Management version 3
> > > > Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
> > > > Capabilities: [70] Express Endpoint, MSI 00
> > > > Capabilities: [100] Advanced Error Reporting
> > > > Capabilities: [140] Device Serial Number 00-00-00-01-00-4c-e0-00
> > > > Kernel driver in use: rtsx_pci
> > > > Kernel modules: rtsx_pci
> >
> > Thanks a lot for bisecting this!
> >
> > With a good kernel (v4.19 or v4.20-rc with 17c91487364f reverted),
> > would you mind collecting "lspci -vv" output, the dmesg log with
> > "pci=earlydump", and the FADT dump?
>
> All of the information is attached to the BZ entry at
>
> https://bugzilla.kernel.org/show_bug.cgi?id=201801

Thanks!  I hope Patrick has a chance to look at this.  Per the
bugzilla mentioned in 17c91487364f, it fixes a problem with a custom
proprietary PCIe device, and there's a lot of good detailed analysis
there, so hopefully we can figure out a way to address both
situations.


Re: [RFC PATCH v3] genirq/affinity: Create and transfer more irq desc info by a new structure

2018-11-28 Thread Bjorn Helgaas
[+cc linux-pci]

On Wed, Nov 28, 2018 at 10:45 AM Dou Liyang  wrote:
>
> Now, Linux just spread the irq affinity to irqdesc core by a cpumask pointer.
> if an Vector's affinity is not NULL, it will be marked as managed.
>
> But, as Kashyap and Sumit reported, in MSI/-x subsystem, the pre/post vectors
> may be used to some extra reply queues for performance. their affinities are
> not NULL, but, they should be mapped as unmanaged interrupts. So, only
> transfering the irq affinity assignments is not enough
>
> Create a new structure named irq_affinity_desc, which include both the irq
> affinity masks and flags. Replace the cpumask pointer with a irq_affinity_desc
> pointer which allows to expand this in the future without touching all the
> functions ever again, just modify the data irq_affinity_desc structure.
>
> Reported-by: Kashyap Desai 
> Reported-by: Sumit Saxena 

Since you mention reports, are there URLs to mailing list archives you
can include?

> Suggested-by: Thomas Gleixner 
> Signed-off-by: Dou Liyang 
> ---
> Changelog:
>   v2 --> v3
>   - Create a new irq_affinity_desc pointer to transfer the info
> suggested by tglx
>   - rebase to the tip irq/core branch
>
>   v1 --> v2
>   - Add a bitmap for marking if an interrupt is managed or not.
>   the size of bitmap is runtime allocation.
>
>   - Need more tests, Just test this patch in QEmu.
>
>   - v1: https://lkml.org/lkml/2018/9/13/366
>
>  drivers/pci/msi.c | 29 ++--
>  include/linux/interrupt.h | 19 ---
>  include/linux/irq.h   |  3 ++-
>  include/linux/irqdomain.h |  7 ---
>  include/linux/msi.h   |  4 ++--
>  kernel/irq/affinity.c | 40 +--
>  kernel/irq/devres.c   | 23 --
>  kernel/irq/irqdesc.c  | 32 +++
>  kernel/irq/irqdomain.c| 14 +++---
>  kernel/irq/msi.c  | 21 ++--
>  10 files changed, 135 insertions(+), 57 deletions(-)
>
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index 265ed3e4c920..431449163316 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -534,16 +534,15 @@ static int populate_msi_sysfs(struct pci_dev *pdev)
>  static struct msi_desc *
>  msi_setup_entry(struct pci_dev *dev, int nvec, const struct irq_affinity 
> *affd)
>  {
> -   struct cpumask *masks = NULL;
> +   struct irq_affinity_desc *affi_desc = NULL;
> struct msi_desc *entry;
> u16 control;
>
> if (affd)
> -   masks = irq_create_affinity_masks(nvec, affd);
> -
> +   affi_desc = irq_create_affinity_desc(nvec, affd);
>
> /* MSI Entry Initialization */
> -   entry = alloc_msi_entry(&dev->dev, nvec, masks);
> +   entry = alloc_msi_entry(&dev->dev, nvec, affi_desc);

Can you split this into two or more patches?  Most of these changes
are trivial and not very interesting, and the fact that they're all in
one patch makes it hard to find and review the interesting bits.  For
example,

  1) Rename all the local variables while keeping the type the same
(or just leave the name the same; I think "affinity" would be a fine
name, and I would be OK if we ended up with "struct irq_affinity_desc
*masks" or "struct irq_affinity_desc *affinity").  This patch would
obviously have no functional impact and would remove a lot of the
trivial changes.

  2) Add "struct irq_affinity_desc" containing only "struct cpumask
masks" and irq_create_affinity_desc() (or leave the name as
irq_create_affinity_masks() and adjust the signature).  This would
also have no functional impact and would be a fairly trivial patch.

  3) Add "flags" to struct irq_affinity_desc and the related code.
This is the real meat of your patch, and with the above out of the
way, it will be much smaller and it'll be obvious what the important
changes are.
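
As a rough illustration of where steps 2 and 3 end up, here is a userspace
sketch.  struct cpumask is replaced by a 64-bit stand-in, and the flag name,
the helper name, and the one-CPU-per-vector spreading are all hypothetical;
the real spreading logic in kernel/irq/affinity.c is much more involved.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for the kernel's struct cpumask: one bit per CPU, up to 64. */
struct cpumask {
	uint64_t bits;
};

/*
 * Step 2 would introduce this wrapper with only the mask in it.
 * Step 3 adds the flags member and the code that sets it.
 */
struct irq_affinity_desc {
	struct cpumask mask;
	unsigned int flags;
};

#define AFFI_DESC_MANAGED	0x1	/* hypothetical flag name */

/*
 * Hypothetical allocator: the first pre_vectors and the last post_vectors
 * entries stay unmanaged even though their masks are filled in.
 */
static struct irq_affinity_desc *create_affinity_desc(int nvec, int pre_vectors,
						      int post_vectors)
{
	struct irq_affinity_desc *desc;
	int i;

	assert(nvec > 0 && pre_vectors >= 0 && post_vectors >= 0);
	assert(pre_vectors + post_vectors <= nvec);

	desc = calloc((size_t)nvec, sizeof(*desc));
	if (!desc)
		return NULL;

	for (i = 0; i < nvec; i++) {
		/* Toy spreading: vector i goes to CPU i (mod 64). */
		desc[i].mask.bits = UINT64_C(1) << (i % 64);
		if (i >= pre_vectors && i < nvec - post_vectors)
			desc[i].flags |= AFFI_DESC_MANAGED;
	}

	return desc;
}
```

With the type in place from step 2, the step 3 diff is only the flags member
and the few lines that set and consume it.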

> if (!entry)
> goto out;
>
> @@ -567,7 +566,7 @@ msi_setup_entry(struct pci_dev *dev, int nvec, const 
> struct irq_affinity *affd)
> pci_read_config_dword(dev, entry->mask_pos, &entry->masked);
>
>  out:
> -   kfree(masks);
> +   kfree(affi_desc);
> return entry;
>  }
>
> @@ -672,15 +671,15 @@ static int msix_setup_entries(struct pci_dev *dev, void 
> __iomem *base,
>   struct msix_entry *entries, int nvec,
>   const struct irq_affinity *affd)
>  {
> -   struct cpumask *curmsk, *masks = NULL;
> +   struct irq_affinity_desc *cur_affi_desc, *affi_desc = NULL;
> struct msi_desc *entry;
> int ret, i;
>
> if (affd)
> -   masks = irq_create_affinity_masks(nvec, affd);
> +   affi_desc = irq_create_affinity_desc(nvec, affd);
>
> -   for (i = 0, curmsk = masks; i < nvec; i++) {
> -   entry = alloc_msi_entry(&dev->dev, 1, curmsk);
> +   for (i = 0, cur_affi_desc = affi_desc; i < nvec; i++) {
> +   entry = 

Re: [Bug] SD card reader in Acer Aspire S5 broken in 4.20-rc

2018-11-27 Thread Bjorn Helgaas
On Mon, Nov 26, 2018 at 11:37:20PM +0100, Rafael J. Wysocki wrote:
> On Monday, November 26, 2018 7:03:58 PM CET Rafael J. Wysocki wrote:
> > Hi Bjorn,
> > 
> > The SD card reader in my Acer Aspire S5 doesn't work with 4.20-rc.
> > 
> > Here's what lspci -v says about it (in a bad kernel):
> > 
> > 02:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS5209 
> > PCI Express Card Reader 
> > (rev 01)
> > Subsystem: Acer Incorporated [ALI] Device 0704
> > Flags: bus master, fast devsel, latency 0, IRQ 35
> > Memory at d9001000 (32-bit, non-prefetchable) [size=4K]
> > Capabilities: [40] Power Management version 3
> > Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
> > Capabilities: [70] Express Endpoint, MSI 00
> > Capabilities: [100] Advanced Error Reporting
> > Capabilities: [140] Device Serial Number 00-00-00-01-00-4c-e0-00
> > Kernel driver in use: rtsx_pci
> > Kernel modules: rtsx_pci

Thanks a lot for bisecting this!

With a good kernel (v4.19 or v4.20-rc with 17c91487364f reverted),
would you mind collecting "lspci -vv" output, the dmesg log with
"pci=earlydump", and the FADT dump?

I'm interested in the initial state of the device at handoff from
BIOS, and what Linux changes even when aspm_disabled is set.

If we can't figure out a way to fix both this issue and the one fixed
by 17c91487364f, I guess the fallback will be to revert 17c91487364f
since it's better to allow a system that was previously broken to
remain broken than it is to break a system that previously worked.

But obviously I hope we can figure out a solution that fixes both
cases.

> > When it doesn't work, it doesn't generate any interrupts on device insertion
> > and removal and this seems to be reproducible 100% of the time.
> 
> So reverting 17c91487364f (PCI/ASPM: Do not initialize link state when
> aspm_disabled is set) on top of 4.20-rc4 makes the problem go away.
> 
> I guess that the device in question needs pcie_aspm_cap_init() to be
> called for it even though the FADT says "NO_ASPM".


Re: [PATCH] PCI: Mark NXP LS1088 to avoid bus reset bus

2018-11-27 Thread Bjorn Helgaas
[+cc David, Jan, Alex, Maik, Chris]

On Tue, Nov 27, 2018 at 08:46:33AM +0000, Bharat Bhushan wrote:
> NXP (Freescale Vendor ID) LS1088 chips do not behave correctly after
> bus reset with e1000e. Link state of device does not comes UP and so
> config space never accessible again.

Previous similar commits:

  822155100e58 ("PCI: Mark Cavium CN8xxx to avoid bus reset")
  8e2e03179923 ("PCI: Mark Atheros AR9580 to avoid bus reset")
  9ac0108c2bac ("PCI: Mark Atheros AR9485 and QCA9882 to avoid bus reset")
  c3e59ee4e766 ("PCI: Mark Atheros AR93xx to avoid bus reset")

1) Please make your subject match (remove the spurious "bus" at the
end)

2) This should probably be marked for stable (v3.14 and later, since
the quirk itself appeared in v3.19 and was marked for v3.14 and later
stable kernels).  Maybe even mark it as "Fixes: c3e59ee4e766..." to
connect it.

3) The 1957:80c0 PCI ID doesn't appear in https://pci-ids.ucw.cz/; can
you add it?

4) Is there a hardware erratum for this?  If so, please include the
URL here.

5) Can you reproduce the problem using the same endpoint (e1000e) on a
different system with a different bridge?

6) Have you looked at this with a PCIe analyzer?  It would be very
interesting to compare the boot-time or system reboot path with the
individual bus reset path you're fixing.

Since there are several similar reports and they sometimes involve the
same devices (both your patch and 822155100e58 mention e1000e), I'm a
little suspicious that we're doing something wrong in the bus reset
path.

I think bus reset uses Secondary Bus Reset in the Bridge Control
register.  That's a generic mechanism that I would expect to be pretty
well-tested.  I suspect the BIOS probably uses it in the reboot path,
and the device probably works after that.

So I wonder if the Linux delay isn't quite long enough, or our first
access to the device isn't quite right, e.g., maybe there's some issue
with the bus/device number capture (PCIe r4.0, sec 2.2.6.2).

> Signed-off-by: Bharat Bhushan 
> ---
>  drivers/pci/quirks.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 4700d24e5d55..b9ae4e9f101a 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3391,6 +3391,13 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0033, quirk_no_bus_reset);
>   */
>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_CAVIUM, 0xa100, quirk_no_bus_reset);
>  
> +/*
> + * NXP (Freescale Vendor ID) LS1088 chips do not behave correctly after
> + * bus reset. Link state of device does not comes UP and so config space
> + * never accessible again.
> + */
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_FREESCALE, 0x80c0, quirk_no_bus_reset);
> +
>  static void quirk_no_pm_reset(struct pci_dev *dev)
>  {
>   /*
> -- 
> 2.19.1
> 


Re: [PATCH 00/12] Bring suspend to RAM support to PCIe Aardvark driver

2018-11-26 Thread Bjorn Helgaas
On Fri, Nov 23, 2018 at 8:18 AM Miquel Raynal  wrote:
>
> Hello,
>
> As part of an effort to bring suspend to RAM support to Armada 3700
> SoCs (main target: ESPRESSObin), this series handles the work around
> the PCIe IP.
>
> First, more configuration is done in the 'setup' helper as inspired
> from the U-Boot driver. This is needed to entirely initialize the IP
> during future resume operation (patch 1).
>
> Then, reset GPIO, PHY and clock support are introduced (patch 2-4). As
> current device trees do not provide the corresponding properties, not
> finding one of these properties is not an error and just produces a
> warning. However, if the property is present, an error during PHY
> initialization will fail the probe of the driver.
>
> Note: To be sure the clock will be resumed before this driver, a first
> series adding links between clocks and consumers has been submitted,
> see [1].
>
> Patch 5 adds suspend/resume hooks, re-using all the above.
>
> Finally, bindings and device trees are updated to reflect the hardware
> (patch 6-12). While the clock depends on the SoC, the reset GPIO and
> the PHY depends on the board so the clock is added in the
> armada-37xx.dtsi file while the two other properties are added in
> armada-3720-espressobin.dts.
>
> [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-November/614527.html
>
> Thanks,
> Miquèl
>
>
> Miquel Raynal (12):
>   PCI: aardvark: configure more registers in the configuration helper
>   PCI: aardvark: add reset GPIO support
>   PCI: aardvark: add PHY support
>   PCI: aardvark: add clock support
>   PCI: aardvark: add suspend to RAM support
>   dt-bindings: PCI: aardvark: describe the reset-gpios property
>   dt-bindings: PCI: aardvark: describe the clocks property
>   dt-bindings: PCI: aardvark: describe the PHY property
>   ARM64: dts: marvell: armada-37xx: declare PCIe reset pin
>   ARM64: dts: marvell: armada-3720-espressobin: declare PCIe reset GPIO
>   ARM64: dts: marvell: armada-37xx: declare PCIe clock
>   ARM64: dts: marvell: armada-3720-espressobin: declare PCIe PHY

Hi Miquèl,

Thanks for your work!  If/when you post a v2, please run "git log
--oneline" and adjust your subject lines to match the capitalization
conventions, i.e., for PCI, start the description with a capital
letter: "PCI: aardvark: Add suspend to RAM support".

BTW, I notice you closed your email with "Miquèl", but the patches
contain "Miquel".  You *should* be able to use the correctly accented
version of your name in the Signed-off-by lines.  I have tripped over
some tool issues, but if we pay attention, we should be able to get it
to work.

>  .../devicetree/bindings/pci/aardvark-pci.txt  |   9 +
>  .../dts/marvell/armada-3720-espressobin.dts   |   4 +
>  arch/arm64/boot/dts/marvell/armada-37xx.dtsi  |   5 +
>  drivers/pci/controller/pci-aardvark.c | 214 ++
>  4 files changed, 232 insertions(+)
>
> --
> 2.19.1
>


Re: [PATCH v2 1/2] pci: prevent sk hynix nvme from entering D3

2018-11-15 Thread Bjorn Helgaas
On Thu, Nov 15, 2018 at 08:58:09AM -0600, Bjorn Helgaas wrote:
> On Thu, Nov 15, 2018 at 03:16:29PM +0800, Kai Heng Feng wrote:
> > On Nov 9, 2018, at 08:21, Bjorn Helgaas  wrote:

> > > I'm not sure we want a quirk for this at all, since as Christoph
> > > points out, it doesn't fix a functional issue as the other uses of
> > > quirk_no_ata_d3() do.
> > > 
> > > From your emails with Christoph, it sounds like this quirk is a
> > > workaround for a firmware defect.  If we *do* end up wanting a quirk,
> > > the changelog should at least mention the firmware defect and maybe
> > > check whether it has been fixed.
> > 
> > According to SK Hynix folks and new evidence on the new Intel NVMe
> > we have, this is something we are going to see more often.
> 
> Hmmm, are you suggesting that if we went this quirk route, we'd be
> updating the quirk frequently to add new devices?
> 
> I'm opposed to that as a strategy because it makes needless work.  You
> have to update the quirk, backport it to older kernels, re-release
> distro kernels, etc.

But I guess you have to do this anyway just to add the vendor/device
ID to the driver, so maybe this isn't a big deal to you.  If you can
do a quirk like this in the driver, it would be invisible to me and I
wouldn't care.  I just don't want to deal with ongoing tweaks like
this in the PCI core :)

Bjorn


Re: [GIT PULL] PCI fixes for v4.20

2018-11-15 Thread Bjorn Helgaas
On Thu, Nov 15, 2018 at 10:54:18AM -0500, Konstantin Ryabitsev wrote:
> On Thu, Nov 15, 2018 at 09:03:21AM -0600, Bjorn Helgaas wrote:
> > > You didn't really do anything wrong. In *general* I prefer to see
> > > public URLs if they are sent to public lists, so if you're cc'ing
> > > something to LKML, I would generally expect the pull request to have a
> > > public URL like https://git.kernel.org/ instead of a private ssh:// URL
> > > that is only accessible to people with a kernel.org account.
> > > 
> > > That's basically all there is to it. It doesn't *really* matter, since
> > > Linus is the one who will be merging the actual pull request, and he
> > > certainly has access to internal ssh:// URLs. However, in case someone
> > > else was interested in reviewing the pull request, it would be more
> > > friendly to have a public URL for them.
> > 
> > OK, I think I'll remove the insteadOf chunk from my .gitconfig.  Should
> > https://korg.wiki.kernel.org/userdoc/gitolite be updated to remove or
> > expand that recommendation?  The only reason I added insteadOf in the first
> > place was because it sounded like a security improvement.
> 
> It is. Does adding the insteadOf rules result in ssh:// URLs when using
> git-request-pull? I didn't expect that it would.

Yep, it seems to for me.  Maybe I'm doing something else weird, because I
don't see many other pull requests with ssh:// URLs.


Re: [GIT PULL] PCI fixes for v4.20

2018-11-15 Thread Bjorn Helgaas
On Thu, Nov 15, 2018 at 02:53:30AM -0500, Konstantin Ryabitsev wrote:
> On Thu, Nov 15, 2018 at 01:12:53AM -0600, Bjorn Helgaas wrote:
> > > and I kinda see the point of maybe not having your ssh username in the
> > > URL. Not that it is a big deal for us, k.org users though.
> > 
> > Sorry, I don't understand the problem.  I have this in my .gitconfig:
> > 
> > [url "ssh://g...@gitolite.kernel.org"]
> > insteadOf = https://git.kernel.org
> > insteadOf = http://git.kernel.org
> > insteadOf = git://git.kernel.org
> > 
> > because I thought that was the recommended way (see the end of
> > https://korg.wiki.kernel.org/userdoc/gitolite). But that also makes my
> > request-pull:
> > 
> > $ git request-pull origin/master git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git pci-v4.20-fixes-1
> > 
> > generate the ssh URL above.  If I remove the insteadOf stuff from
> > .gitconfig, request-pull produces this instead:
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git tags/pci-v4.20-fixes-1
> > 
> > I'm happy to do either; just tell me which :)
> 
> You didn't really do anything wrong. In *general* I prefer to see
> public URLs if they are sent to public lists, so if you're cc'ing
> something to LKML, I would generally expect the pull request to have a
> public URL like https://git.kernel.org/ instead of a private ssh:// URL
> that is only accessible to people with a kernel.org account.
> 
> That's basically all there is to it. It doesn't *really* matter, since
> Linus is the one who will be merging the actual pull request, and he
> certainly has access to internal ssh:// URLs. However, in case someone
> else was interested in reviewing the pull request, it would be more
> friendly to have a public URL for them.

OK, I think I'll remove the insteadOf chunk from my .gitconfig.  Should
https://korg.wiki.kernel.org/userdoc/gitolite be updated to remove or
expand that recommendation?  The only reason I added insteadOf in the first
place was because it sounded like a security improvement.

Bjorn


Re: [PATCH v2 1/2] pci: prevent sk hynix nvme from entering D3

2018-11-15 Thread Bjorn Helgaas
On Thu, Nov 15, 2018 at 03:16:29PM +0800, Kai Heng Feng wrote:
> > On Nov 9, 2018, at 08:21, Bjorn Helgaas  wrote:
> > On Tue, Nov 06, 2018 at 03:12:13PM +0800, AceLan Kao wrote:
> >> It leads to the power consumption raises to 2.2W during s2idle, while
> >> it consumes less than 1W during long idle if put SK hynix nvme to D3
> >> and then enter s2idle.
> >> From SK hynix FE, MS Windows doesn't put nvme to D3, and uses its own
> >> APST feature to do the power management.
> >> To leverage its APST feature during s2idle, we can't disable nvme
> >> device while suspending, too.
> 
> We have a new Intel NVMe [8086:f1a6] that has this “new” behavior.
> 
> > I don't know how APST works, but it sounds like you want to disable D3
> > if you're using APST.  But that's not what this patch does; this
> > disables it always.
> 
> Ok, will work on a new patch that only disables D3 when APST is enabled.

My comment was that the changelog didn't match the code.  I don't know
which one is wrong, so I wasn't trying to suggest that you change the
code.  If the code is right and the changelog is wrong, just change
the changelog.

> > I'm not sure we want a quirk for this at all, since as Christoph
> > points out, it doesn't fix a functional issue as the other uses of
> > quirk_no_ata_d3() do.
> > 
> > From your emails with Christoph, it sounds like this quirk is a
> > workaround for a firmware defect.  If we *do* end up wanting a quirk,
> > the changelog should at least mention the firmware defect and maybe
> > check whether it has been fixed.
> 
> According to SK Hynix folks and new evidence on the new Intel NVMe
> we have, this is something we are going to see more often.

Hmmm, are you suggesting that if we went this quirk route, we'd be
updating the quirk frequently to add new devices?

I'm opposed to that as a strategy because it makes needless work.  You
have to update the quirk, backport it to older kernels, re-release
distro kernels, etc.

If this situation is going to happen frequently, it would be better to
(a) fix the firmware defect (if that's what this is) or (b) pursue
some APST or other spec change so there's a generic documented way to
handle this without requiring device-specific quirks.

Bjorn


Re: [GIT PULL] PCI fixes for v4.20

2018-11-14 Thread Bjorn Helgaas
On Wed, Nov 14, 2018 at 11:48:39PM +0100, Borislav Petkov wrote:
> On Wed, Nov 14, 2018 at 05:21:54PM -0500, Konstantin Ryabitsev wrote:
> > For the record, there's nothing wrong with that, it's just a condition
> > that I didn't expect. I have a fix in place that should avoid this in
> > the future.
> 
> Actually, I meant the pull request URL.

My pull request URL was this:

  ssh://g...@gitolite.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git tags/pci-v4.20-fixes-1

> Here's some background info:
> 
> https://lkml.kernel.org/r/CA%2B55aFyMxkS=8jzz%2brooafkwr45ekbnq0gumqs4f%2br_-ffw...@mail.gmail.com
> 
> and I kinda see the point of maybe not having your ssh username in the
> URL. Not that it is a big deal for us, k.org users though.

Sorry, I don't understand the problem.  I have this in my .gitconfig:

[url "ssh://g...@gitolite.kernel.org"]
insteadOf = https://git.kernel.org
insteadOf = http://git.kernel.org
insteadOf = git://git.kernel.org

because I thought that was the recommended way (see the end of
https://korg.wiki.kernel.org/userdoc/gitolite). But that also makes my
request-pull:

$ git request-pull origin/master git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git pci-v4.20-fixes-1

generate the ssh URL above.  If I remove the insteadOf stuff from
.gitconfig, request-pull produces this instead:

git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git tags/pci-v4.20-fixes-1

I'm happy to do either; just tell me which :)

Bjorn


[GIT PULL] PCI fixes for v4.20

2018-11-13 Thread Bjorn Helgaas
PCI fixes:

  - Revert a _PXM change that causes silent early boot failure on some AMD
ThreadRipper systems (Bjorn Helgaas)


The following changes since commit 651022382c7f8da46cb4872a545ee1da6d097d2a:

  Linux 4.20-rc1 (2018-11-04 15:37:52 -0800)

are available in the Git repository at:

  ssh://g...@gitolite.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git tags/pci-v4.20-fixes-1

for you to fetch changes up to 0d76bcc960e6057750fcf556b65da13f8bbdfd2b:

  Revert "ACPI/PCI: Pay attention to device-specific _PXM node values" (2018-11-13 08:38:17 -0600)


pci-v4.20-fixes-1

--------
Bjorn Helgaas (1):
  Revert "ACPI/PCI: Pay attention to device-specific _PXM node values"

 drivers/pci/pci-acpi.c | 5 -
 1 file changed, 5 deletions(-)


Re: [GIT PULL] PCI changes for v4.20

2018-11-13 Thread Bjorn Helgaas
[+cc Martin, Rafael, Len, linux-acpi]

On Tue, Nov 13, 2018 at 11:20:04AM +0100, Borislav Petkov wrote:
> On Tue, Nov 13, 2018 at 08:17:12AM +0100, Ingo Molnar wrote:
> > 
> > * Bjorn Helgaas  wrote:
> > 
> > > PCI changes:
> > > 
> > >   - Pay attention to device-specific _PXM node values (Jonathan Cameron)
> > 
> > There's a new boot regression, my AMD ThreadRipper system (MSI X399 SLI 
> > PLUS (MS-7B09)) hangs during early bootup, and I have bisected it down to 
> > this commit:
> > 
> >   bad7dcd94f39: ACPI/PCI: Pay attention to device-specific _PXM node values
> > 
> > Reverting it solves the hang.
> > 
> > Unfortunately there's no console output when it hangs, even with 
> > earlyprintk. It just hangs after the "loading initrd" line.
> > 
> > Config is an Ubuntu-ish config with PROVE_LOCKING=y and a few other debug 
> > options.
> > 
> > All my other testsystems boot fine with similar configs, so it's probably 
> > something specific to this system.

Martin reported the same thing [1] (unfortunately the archive didn't
capture Martin's original emails, I think because they were multi-part
messages with attachments).

Looks like Martin might have a similar system:

  DMI: To Be Filled By O.E.M. To Be Filled By O.E.M./X399 Taichi, BIOS P3.30 08/14/2018
  smpboot: CPU0: AMD Ryzen Threadripper 2950X 16-Core Processor (family: 0x17, model: 0x8, stepping: 0x2)

Given how painful this is to debug, I queued up a revert on my
for-linus branch until we figure out what sanity checks are needed to
make the original patch safe.

I would expect proximity information to be basically just a hint for
optimization, not a functional requirement, so it would be really
interesting to figure out why this causes such a catastrophic failure.
Maybe there's a way to improve that path as well so it would be more
robust or at least more debuggable.

Bjorn

[1] https://lore.kernel.org/linux-pci/20180912152140.3676-2-jonathan.came...@huawei.com


Re: [PATCH v2 1/2] pci: prevent sk hynix nvme from entering D3

2018-11-08 Thread Bjorn Helgaas
On Tue, Nov 06, 2018 at 03:12:13PM +0800, AceLan Kao wrote:
> It leads to the power consumption raises to 2.2W during s2idle, while
> it consumes less than 1W during long idle if put SK hynix nvme to D3
> and then enter s2idle.
> From SK hynix FE, MS Windows doesn't put nvme to D3, and uses its own
> APST feature to do the power management.
> To leverage its APST feature during s2idle, we can't disable nvme
> device while suspending, too.

I don't know how APST works, but it sounds like you want to disable D3
if you're using APST.  But that's not what this patch does; this
disables it always.

I'm not sure we want a quirk for this at all, since as Christoph
points out, it doesn't fix a functional issue as the other uses of
quirk_no_ata_d3() do.

From your emails with Christoph, it sounds like this quirk is a
workaround for a firmware defect.  If we *do* end up wanting a quirk,
the changelog should at least mention the firmware defect and maybe
check whether it has been fixed.

> BTW, preventing it from entering D3 will increase the power consumption around
> 0.13W ~ 0.15W during short/long idle, and the power consumption during
> s2idle becomes 0.77W.
> 
> Signed-off-by: AceLan Kao 
> ---
>  drivers/pci/quirks.c| 1 +
>  include/linux/pci_ids.h | 2 ++
>  2 files changed, 3 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 4700d24e5d55..b7e6492e8311 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -1332,6 +1332,7 @@ DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AL, PCI_ANY_ID,
> occur when mode detecting */
>  DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_VIA, PCI_ANY_ID,
>   PCI_CLASS_STORAGE_IDE, 8, quirk_no_ata_d3);
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_SK_HYNIX, 0x1527, quirk_no_ata_d3);
>  
>  /*
>   * This was originally an Alpha-specific thing, but it really fits here.
> diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
> index 69f0abe1ba1a..5f5adda07de0 100644
> --- a/include/linux/pci_ids.h
> +++ b/include/linux/pci_ids.h
> @@ -3090,4 +3090,6 @@
>  
>  #define PCI_VENDOR_ID_NCUBE  0x10ff
>  
> +#define PCI_VENDOR_ID_SK_HYNIX   0x1c5c
> +
>  #endif /* _LINUX_PCI_IDS_H */
> -- 
> 2.17.1
> 
> 


Re: [PATCH] x86/PCI: Fix Broadcom CNB20LE unintended sign extension (redux)

2018-11-08 Thread Bjorn Helgaas
On Thu, Oct 25, 2018 at 02:52:31PM +0100, Colin King wrote:
> From: Colin Ian King 
> 
> In the expression "word1 << 16", word1 starts as u16, but is promoted to
> a signed int, then sign-extended to resource_size_t, which is probably
> not what was intended.  Cast to resource_size_t to avoid the sign
> extension.
> 
> This fixes an identical issue as fixed by commit 0b2d70764bb3
> ("x86/PCI: Fix Broadcom CNB20LE unintended sign extension") back in 2014.
> 
> Detected by CoverityScan, CID#138749, 138750 ("Unintended sign extension")
> 
> Fixes: 3f6ea84a3035 ("PCI: read memory ranges out of Broadcom CNB20LE host bridge")
> Signed-off-by: Colin Ian King 

How lame that I fixed one but not both with 0b2d70764bb3, sorry about
that!

Applied to pci/enumeration for v4.21, thanks!
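
For anyone following along, the promotion problem from the changelog can
be reproduced in plain userspace C.  This is only a sketch under some
assumptions: uint64_t stands in for a 64-bit resource_size_t, the
function names are made up for illustration, and the explicit int32_t
conversion models the implicit promotion of "word1 << 16" to a signed
int (that conversion is implementation-defined; gcc and clang both wrap).

```c
#include <stdint.h>

/*
 * Model of the old expression "(word1 << 16)": the u16 is promoted to a
 * 32-bit signed int, so bit 15 of the input lands in the sign bit.
 * Widening that int to 64 bits then copies the sign bit into the upper
 * half of the result.
 */
static uint64_t window_base_promoted(uint16_t word)
{
	int32_t shifted = (int32_t)((uint32_t)word << 16);

	return (uint64_t)shifted;
}

/* Model of the fix: widen first, then shift, so nothing is sign-extended. */
static uint64_t window_base_widened(uint16_t word)
{
	return (uint64_t)word << 16;
}
```

The two forms only differ when bit 15 of the input is set, which is
presumably why this went unnoticed for so long.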

> ---
>  arch/x86/pci/broadcom_bus.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/pci/broadcom_bus.c b/arch/x86/pci/broadcom_bus.c
> index 526536c81ddc..d09c401a300d 100644
> --- a/arch/x86/pci/broadcom_bus.c
> +++ b/arch/x86/pci/broadcom_bus.c
> @@ -50,8 +50,8 @@ static void __init cnb20le_res(u8 bus, u8 slot, u8 func)
>   word1 = read_pci_config_16(bus, slot, func, 0xc0);
>   word2 = read_pci_config_16(bus, slot, func, 0xc2);
>   if (word1 != word2) {
> - res.start = (word1 << 16) | 0x;
> - res.end   = (word2 << 16) | 0x;
> + res.start = ((resource_size_t) word1 << 16) | 0x;
> + res.end   = ((resource_size_t) word2 << 16) | 0x;
>   res.flags = IORESOURCE_MEM;
>   update_res(info, res.start, res.end, res.flags, 0);
>   }
> -- 
> 2.19.1
> 


Re: [RFC] x86/pci: Mark pci_root_ops as const

2018-11-08 Thread Bjorn Helgaas
Hi Zubin,

On Thu, Nov 08, 2018 at 09:11:15AM -0800, Zubin Mithra wrote:
> pci_root_ops is only written to from within intel_mid_pci_init. This
> is linked in only when CONFIG_X86_INTEL_MID is set. If not for this,
> pci_root_ops could be marked as const.
> 
> Fix this by replacing pci_root_ops usage with pci_root_ops_ptr. If
> CONFIG_X86_INTEL_MID is set, pci_root_ops_ptr will be set to
> intel_mid_pci_ops inside intel_mid_pci_init.
> 
> Introduce pci_acpi_set_ops for intel_mid_pci_init to set
> acpi_pci_root_ops.pci_ops.
> 
> This also means that intel_mid_pci_ops cannot be freed after init, hence
> remove __initconst.
> 
> Signed-off-by: Zubin Mithra 
> ---
>  arch/x86/include/asm/pci_x86.h |  4 +++-
>  arch/x86/pci/acpi.c|  5 +
>  arch/x86/pci/common.c  |  5 +++--
>  arch/x86/pci/intel_mid_pci.c   |  5 +++--
>  drivers/pci/access.c   |  4 ++--
>  drivers/pci/probe.c|  4 ++--
>  include/linux/pci-acpi.h   |  2 +-
>  include/linux/pci.h| 11 ++-
>  8 files changed, 25 insertions(+), 15 deletions(-)

Can you:

  - Split this into an x86 patch and a PCI core patch (if possible)?
  - Make the same fixes for other arches?
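
For reference, my understanding of the indirection described in the
changelog, reduced to a userspace sketch.  All names here are
hypothetical stand-ins (model_ops for struct pci_ops, model_ops_ptr for
pci_root_ops_ptr); the point is only that both ops tables can be const
while a single pointer selects which one is live.

```c
/* Hypothetical stand-in for an ops table with one accessor. */
struct model_ops {
	int (*read)(int where);
};

static int model_default_read(int where)
{
	return where;
}

static int model_mid_read(int where)
{
	return where + 0x100;
}

/* Both tables are read-only; neither is ever written after build time. */
static const struct model_ops model_default_ops = { .read = model_default_read };
static const struct model_ops model_mid_ops = { .read = model_mid_read };

/* The only writable object is the pointer that selects a table. */
static const struct model_ops *model_ops_ptr = &model_default_ops;

/* Platform init swaps the pointer instead of overwriting a table. */
static void model_use_mid_ops(void)
{
	model_ops_ptr = &model_mid_ops;
}

static int model_bus_read(int where)
{
	return model_ops_ptr->read(where);
}
```

Note that the table selected at init time has to stay around for the
life of the system, which matches the changelog's remark about dropping
__initconst.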

Bjorn


Re: [PATCH] MAINTAINERS: Add x86 early-quirks.c file pattern to PCI subsystem

2018-11-08 Thread Bjorn Helgaas
On Wed, Oct 24, 2018 at 05:13:59PM -0500, Bjorn Helgaas wrote:
> From: Bjorn Helgaas 
> 
> arch/x86/kernel/early-quirks.c contains special PCI quirks that need to
> run even before the usual DECLARE_PCI_FIXUP_EARLY() quirks.  These have
> typically been merged by the x86 maintainers, which is fine, but PCI folks
> should at least see what's happening, so add a file pattern to the PCI
> subsystem entry.
> 
> Signed-off-by: Bjorn Helgaas 

I applied this with Ingo's ack to pci/misc for v4.20.

> ---
>  MAINTAINERS |1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 4ece30f15777..63cb7f3dbbb4 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -11249,6 +11249,7 @@ F:include/uapi/linux/pci*
>  F:   lib/pci*
>  F:   arch/x86/pci/
>  F:   arch/x86/kernel/quirks.c
> +F:   arch/x86/kernel/early-quirks.c
>  
>  PCI NATIVE HOST BRIDGE AND ENDPOINT DRIVERS
>  M:   Lorenzo Pieralisi 
> 


Re: [PATCH v2] PCI/MSI: Don't touch MSI bits when the PCI device is disconnected

2018-11-07 Thread Bjorn Helgaas
On Tue, Sep 18, 2018 at 05:15:00PM -0500, Alexandru Gagniuc wrote:
> When a PCI device is gone, we don't want to send IO to it if we can
> avoid it. We expose functionality via the irq_chip structure. As
> users of that structure may not know about the underlying PCI device,
> it's our responsibility to guard against removed devices.
> 
> .irq_write_msi_msg() is already guarded inside __pci_write_msi_msg().
> .irq_mask/unmask() are not. Guard them for completeness.
> 
> For example, surprise removal of a PCIe device triggers teardown. This
> touches the irq_chips ops at some point to disable the interrupts. I/O
> generated here can crash the system on firmware-first machines.
> Not triggering the IO in the first place greatly reduces the
> possibility of the problem occurring.
> 
> Signed-off-by: Alexandru Gagniuc 

Applied to pci/misc for v4.21, thanks!
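
The shape of the guard, as a userspace sketch rather than the real
msi.c code: model_dev and its fields are hypothetical, and the register
write is just a counter so the effect of the early return is visible.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a device with one memory-mapped mask register. */
struct model_dev {
	bool disconnected;
	uint32_t mask_reg;
	unsigned int mmio_accesses;
};

/* Skip the register access entirely once the device is known to be gone. */
static void model_set_mask_bit(struct model_dev *dev, uint32_t flag)
{
	if (dev->disconnected)
		return;

	dev->mask_reg = flag;
	dev->mmio_accesses++;
}

/* Helper: how many register accesses does one mask operation generate? */
static unsigned int accesses_after_mask(bool disconnected)
{
	struct model_dev dev = { .disconnected = disconnected };

	model_set_mask_bit(&dev, 1);
	return dev.mmio_accesses;
}
```

As the changelog says, this reduces the chance of I/O to a removed
device; it cannot close the window where the device disappears between
the check and the access.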

> ---
>  drivers/pci/msi.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index f2ef896464b3..f31058fd2260 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -227,6 +227,9 @@ static void msi_set_mask_bit(struct irq_data *data, u32 flag)
>  {
>   struct msi_desc *desc = irq_data_get_msi_desc(data);
>  
> + if (pci_dev_is_disconnected(msi_desc_to_pci_dev(desc)))
> + return;
> +
>   if (desc->msi_attrib.is_msix) {
>   msix_mask_irq(desc, flag);
>   readl(desc->mask_base); /* Flush write to device */
> -- 
> 2.17.1
> 


Re: [PATCH 2/4] x86/amd_nb: add support for newer PCI topologies

2018-11-05 Thread Bjorn Helgaas
[+cc Takashi, Andy, Colin, Myron for potential distro impact]

[Beginning of thread:
https://lore.kernel.org/linux-pci/20181102181055.130531-1-brian.wo...@amd.com/]

On Sat, Nov 03, 2018 at 12:29:48AM +0100, Borislav Petkov wrote:
> On Fri, Nov 02, 2018 at 02:59:25PM -0500, Bjorn Helgaas wrote:
> > This isn't my code, and I'm not really objecting to these changes, but
> > from where I sit, the fact that you need this sort of vendor-specific
> > topology discovery is a little bit ugly and seems like something of a
> > maintenance issue.  

I think this is the most important part, and I should have elaborated
on it instead of getting into the driver structure details below.

It is a major goal of ACPI and PCI that an old kernel should work
unchanged on a new platform unless it needs to use new functionality
introduced in the new platform.

amd_nb.c prevents us from achieving that goal.  These patches don't
add new functionality; they merely describe minor topological
differences in new hardware.  We usually try to do that in a more
generic way, e.g., via an ACPI method, so the new platform can update
the ACPI method and use an old, already-qualified, already-shipped
kernel.

I'm not strenuously objecting to these because this isn't a *huge*
deal, but I suspect it is a source of friction for distros that don't
want to update and requalify their software for every new platform.

> > You could argue that this is sort of an "AMD CPU
> > driver", which is entitled to be device-specific, and that does make
> > some sense.
> 
> It is a bunch of glue code which enumerates the PCI devices a CPU
> has and other in-kernel users can use that instead of doing the
> discovery/enumeration themselves.
> 
> > But device-specific code is typically packaged as a driver that uses
> > driver registration interfaces like acpi_bus_register_driver(),
> > pci_register_driver(), etc.  That gives you a consistent structure
> > and, more importantly, a framework for dealing with hotplug.  It
> > doesn't look like amd_nb.c would deal well with hot-add of CPUs.
> 
> If you mean physical hotadd, then that's a non-issue as, AFAIK, AMD
> doesn't support that.
> 
> Now, TBH I've never tried soft-offlining the cores of a node and then
> check whether using the PCI devices of that node would work.
> 
> Now, I don't mind this getting converted to a proper PCI driver as long
> as it is not a module as it has to be present at all times. Other than
> that, I'm a happy camper.

amd_nb.c uses pci_get_device(), which is incompatible with hotplug and
subverts the usual driver/device ownership model.  We could pursue
this part of the conversation, but I think it's more fruitful to
approach this from the "new machine, old kernel" angle above.


Re: [PATCH v7] i2c: Add PCI and platform drivers for the AMD MP2 I2C controller

2018-10-30 Thread Bjorn Helgaas
[+cc Rafael, Len, linux-acpi]

On Sat, Oct 27, 2018 at 12:09:10PM -0300, Elie Morisse wrote:
> This contains two drivers:
>  * i2c-amd-plat-mp2: platform driver managing an i2c adapter (one of
> the two busses of the MP2) and routing any i2c read/write command to
> the PCI driver.
>  * i2c-amd-pci-mp2: PCI driver communicating through the C2P/P2C
> mailbox registers, or through DMA for more than 32 bytes transfers.

I'm dubious about this two-driver structure.  If I understand
correctly (and it's very possible that I don't), the PCI driver
(amd_mp2_pci_probe()) is the real owner of the i2c adapter: it
claims the PCI device, claims its BARs, and requests an IRQ.

The i2c_amd_probe() code *looks* like a platform driver that claims
AMDI0011 devices from the ACPI namespace, but I don't think it's
really a driver.  It looks like it exists mainly to extract some
information (bus speed and maybe a bus number?) from the namespace,
then to call i2c_add_adapter().

It looks like i2c_amd_probe() must run *after* amd_mp2_pci_probe(),
but there's no way to really enforce that ordering.

And i2c-amd-plat-mp2 contains the i2c_amd_algorithm functions, which 
operate on the PCI device, which requires exported interfaces
(amd_mp2_read(), amd_mp2_write()) that are implemented in the PCI
driver but called from the platform part.

It seems like there should be a way to put the ACPI lookups into
i2c-amd-pci-mp2 so there's only one driver.

I only have a couple trivial comments below but I'm not trimming my
response so the ACPI folks can see the whole context.

> This is major rework of the patch submitted by Nehal-bakulchandra Shah
> from AMD (https://patchwork.kernel.org/patch/10597369/).
> 
> Most of the event handling of v2/v3 was rewritten since it couldn't work
> if more than one bus was enabled, and contains many more fixes listed
> in the patch changelog.
> 
> With those changes both the touchpad and the touchscreen of the
> Ryzen-based Lenovo Yoga 530 which lie in separate busses work beautifully.
> 
> Signed-off-by: Elie Morisse 
> ---
> Changes since v1:(https://www.spinics.net/lists/linux-i2c/msg34650.html)
> -> Add fix for IOMMU
> -> Add dependency of ACPI
> -> Add locks to avoid the crash
> 
> Changes since v2:(https://patchwork.ozlabs.org/patch/961270/)
> 
> -> fix for review comments
> -> fix for more than 32 bytes write issue
> 
> Changes since v3 (https://patchwork.kernel.org/patch/10597369/) by Elie M.:
> 
> -> support more than one bus/adapter
> -> support more than one slave per bus
> -> use the bus speed specified by the slaves declared in the DSDT instead of
>assuming speed == 400kbits/s
> -> instead of kzalloc'ing a buffer for every less than 32 bytes reads, simply
>use i2c_msg.buf
> -> fix buffer overreads/overflows when (<=32 bytes) message lengths aren't a
>multiple of 4 by using memcpy_fromio and memcpy_toio
> -> use streaming DMA mappings instead of allocating a coherent DMA buffer for
>every >32 bytes read/write
> -> properly check for timeouts during i2c_amd_xfer and increase it from 50
>jiffies to 250 msecs (which is more in line with other drivers)
> -> complete amd_i2c_dev.msg even if the device doesn't return a xxx_success
>event, instead of stalling i2c_amd_xfer
> -> removed the spinlock and mdelay during i2c_amd_pci_configure, I didn't see
>the point since it's already waiting for a i2c_busenable_complete event
> -> add an adapter-specific mutex lock for i2c_amd_xfer, since we don't want
>parallel calls writing to AMD_C2P_MSG0 (or AMD_C2P_MSG1)
> -> add a global mutex lock for registers AMD_C2P_MSG2 to AMD_C2P_MSG9,  which
>are shared across the two busses/adapters
> -> add MODULE_DEVICE_TABLE to automatically load i2c-amd-platdrv if the DSDT
>enumerates devices with the "AMDI0011" HID
> -> set maximum length of reads/writes to 4095 (event's length field is 12 bits)
> -> basic PM support
> -> style corrections to match the kernel code style, and tried to reduce code
>duplication whenever possible
> 
> Changes since v4 (https://marc.info/?l=linux-kernel=154031133019835) by 
> Elie M.:
> 
> -> fix missing typecast warning
> -> removed the duplicated entry in Kconfig
> 
> Changes since v5 by Elie M.:
> 
> -> move DMA mapping from the platform driver to the PCI driver
> -> attempt to find the platform device's PCI parent through the _DEP ACPI method
>(if not found take the first MP2 device registered in the i2c-amd-pci-mp2
>driver, like before)
> -> do not assume anymore that the PCI device is owned by the i2c-amd-pci-mp2
>driver
> -> address other review comments by Bjorn Helgaas (meant for v3)
>

Re: [PATCH 1/3] PCI/AER: Option to leave System Error Interrupts as-is

2018-10-29 Thread Bjorn Helgaas
[+cc Rafael, Len, Tony, Borislav, Tyler, Christoph, linux-acpi, LKML]

On Fri, Oct 26, 2018 at 02:19:04PM -0600, Jon Derrick wrote:
> Add a bit in pci_host_bridge to indicate to leave the System Error
> Interrupts as configured by the pre-boot environment. Propagate this to
> the AER driver which disables System Error Interrupts.
> 
> Signed-off-by: Jon Derrick 
> ---
>  drivers/pci/pcie/aer.c | 7 +--
>  include/linux/pci.h| 3 +++
>  2 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 83180ed..6a4af63 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1360,6 +1360,7 @@ static void set_downstream_devices_error_reporting(struct pci_dev *dev,
>  static void aer_enable_rootport(struct aer_rpc *rpc)
>  {
>   struct pci_dev *pdev = rpc->rpd;
> + struct pci_host_bridge *host;
>   int aer_pos;
>   u16 reg16;
>   u32 reg32;
> @@ -1369,8 +1370,10 @@ static void aer_enable_rootport(struct aer_rpc *rpc)
>   pcie_capability_write_word(pdev, PCI_EXP_DEVSTA, reg16);
>  
>   /* Disable system error generation in response to error messages */
> - pcie_capability_clear_word(pdev, PCI_EXP_RTCTL,
> -SYSTEM_ERROR_INTR_ON_MESG_MASK);
> + host = pci_find_host_bridge(pdev->bus);
> + if (!host->no_disable_sys_err)
> + pcie_capability_clear_word(pdev, PCI_EXP_RTCTL,
> +SYSTEM_ERROR_INTR_ON_MESG_MASK);

If I squint hard enough this sort of makes sense, but it also makes me
confused about how the normal APEI firmware-first model works.

In the NON-firmware-first case, firmware isn't involved in handling AER
errors.  The Linux AER driver fields an interrupt from a Root Port,
reads AER log registers, etc.

In the normal APEI firmware-first case, when the hardware reports an
AER event, I think firmware gets control first, and *it* reads the AER
log registers, packages them up, and generates an interrupt to the OS,
which reads the packaged error state from the firmware via the HEST.

If I understand this special Intel VMD firmware-first case correctly,
firmware gets control first, reads the AER log registers, and
synthesizes what looks to the OS like a normal AER interrupt.  The
Linux AER driver gets what it thinks is an interrupt from a Root Port,
and it reads AER log registers from the hardware just like it does in
the NON-firmware-first case.

My confusion is about how we manage the mechanism by which the
firmware gets control first.  In the Intel VMD case, it looks like
firmware fields the System Errors controlled by the Root Control
register of Root Ports.  This patch adds some framework so we know not
to touch something set up by firmware.

But in the normal APEI firmware-first case, we disable those System
Errors in the Root Control registers, so firmware must get control
some other way.

How does the OS know what mechanism the firmware uses, so it can make
sure to preserve it?  This patch might be part of the solution, but it
seems pretty ad hoc, and of course it does nothing for the APEI
firmware-first case.  How does firmware get control in that case?
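
To make the hunk above concrete, here is a minimal user-space sketch of
the decision it adds.  Everything prefixed model_ or RTCTL_ is a
hypothetical stand-in, not kernel API; the three bit values are the
Root Control "System Error on Correctable/Non-Fatal/Fatal Error Enable"
bits.  It only models the "don't touch what firmware set up" flag and
says nothing about how firmware gets control in the APEI case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Root Control "System Error on ... Error Enable" bits. */
#define RTCTL_SECEE		0x0001u	/* correctable */
#define RTCTL_SENFEE		0x0002u	/* non-fatal */
#define RTCTL_SEFEE		0x0004u	/* fatal */
#define RTCTL_SYS_ERR_MASK	(RTCTL_SECEE | RTCTL_SENFEE | RTCTL_SEFEE)

/* Hypothetical stand-in for struct pci_host_bridge. */
struct model_host_bridge {
	bool no_disable_sys_err;	/* firmware relies on System Errors */
};

/* Hypothetical stand-in for a Root Port and its Root Control register. */
struct model_root_port {
	const struct model_host_bridge *host;
	uint16_t rtctl;
};

/* Clear the System Error enables unless the host bridge asks us not to. */
static void model_aer_enable_rootport(struct model_root_port *rp)
{
	assert(rp && rp->host);

	if (!rp->host->no_disable_sys_err)
		rp->rtctl &= (uint16_t)~RTCTL_SYS_ERR_MASK;
}
```

With the flag clear, the three enable bits are cleared and any other
Root Control bits are preserved; with the flag set, the register is left
exactly as firmware programmed it.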

>   aer_pos = pdev->aer_cap;
>   /* Clear error status */
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 6925828..6fcfab4 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -484,6 +484,9 @@ struct pci_host_bridge {
>   unsigned int native_shpc_hotplug:1;  /* OS may use SHPC hotplug */
>   unsigned int native_pme:1;   /* OS may use PCIe PME */
>   unsigned int native_ltr:1;   /* OS may use PCIe LTR */
> + unsigned int no_disable_sys_err:1;   /* Don't disable system
> +                                         error interrupts */
> +
>   /* Resource alignment requirements */
>   resource_size_t (*align_resource)(struct pci_dev *dev,
>   const struct resource *res,
> -- 
> 1.8.3.1
> 


Re: [PATCH] PCI/Layerscape: fix wrongly invoking of outbound window disable accessor

2018-10-29 Thread Bjorn Helgaas
On Fri, Oct 26, 2018 at 02:20:21AM +, Z.q. Hou wrote:
> > From: Bjorn Helgaas 

> > Holy cow, this has been broken since v4.14.  If fixing this makes
> > a difference, you might want to tag it for stable.
> 
> How can I tag it for stable?

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/stable-kernel-rules.rst


Re: [PATCH] PCI/Layerscape: fix wrongly invoking of outbound window disable accessor

2018-10-25 Thread Bjorn Helgaas
  $ git log --oneline --follow drivers/pci/controller/dwc/pci-layerscape.c | head
  6e0832fa432e PCI: Collect all native drivers under drivers/pci/controller/
  3f43ccc4ea1b PCI: dwc: Remove old MSI IRQs API
  8cfab3cf63cf PCI: Add SPDX GPL-2.0 to replace GPL v2 boilerplate
  84d897d69938 PCI: layerscape: Change default error response behavior
  a335b122ba27 PCI: layerscape: Add support for ls1012a
  03fc6134c260 PCI: layerscape: Add support for ls1088a
  8f89357094e6 PCI: layerscape: Add support for ls2088a
  c3f909398827 PCI: layerscape: Remove unnecessary class code fixup
  e44abfed6fcb PCI: dwc: Add accessors for write permission of DBI read-only registers
  4a2745d760fa PCI: layerscape: Disable outbound windows configured by bootloader

Make yours match, e.g., "PCI: layerscape: Call dw_pcie_disable_atu()
correctly"

On Thu, Oct 25, 2018 at 08:53:37AM +, Z.q. Hou wrote:
> From: Hou Zhiqiang 
> 
> This issue is introduced by commit 4a2745d760fac ("PCI: layerscape: Disable
> outbound windows configured by bootloader").

Conventional commit reference:

  $ git --no-pager show -s --abbrev-commit --abbrev=12 --pretty=format:"%h (\"%s\")%n" 4a2745d760fac
  4a2745d760fa ("PCI: layerscape: Disable outbound windows configured by bootloader")

Holy cow, this has been broken since v4.14.  If fixing this makes a
difference, you might want to tag it for stable.

> Signed-off-by: Hou Zhiqiang 
> ---
>  drivers/pci/controller/dwc/pci-layerscape.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/controller/dwc/pci-layerscape.c b/drivers/pci/controller/dwc/pci-layerscape.c
> index 69f3f1a5a782..b2085988dbee 100644
> --- a/drivers/pci/controller/dwc/pci-layerscape.c
> +++ b/drivers/pci/controller/dwc/pci-layerscape.c
> @@ -88,7 +88,7 @@ static void ls_pcie_disable_outbound_atus(struct ls_pcie *pcie)
>   int i;
>  
>   for (i = 0; i < PCIE_IATU_NUM; i++)
> - dw_pcie_disable_atu(pcie->pci, DW_PCIE_REGION_OUTBOUND, i);
> + dw_pcie_disable_atu(pcie->pci, i, DW_PCIE_REGION_OUTBOUND);
>  }
>  
>  static int ls1021_pcie_link_up(struct dw_pcie *pci)
> -- 
> 2.17.1
> 
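
A bug like this can sit unnoticed because C converts between int and
enum implicitly, so swapped arguments compile cleanly.  The sketch
below is a simplified user-space model, not the DesignWare code: all
model_ names, the enum values, and the window count are assumptions for
illustration, and only outbound windows are modeled.

```c
#include <stdbool.h>

enum model_region {
	MODEL_REGION_UNKNOWN,
	MODEL_REGION_INBOUND,
	MODEL_REGION_OUTBOUND,
};

#define MODEL_IATU_NUM 6	/* assumed number of outbound windows */

static bool model_outbound_enabled[MODEL_IATU_NUM];

/* Disable one outbound window; ignore any other region type. */
static void model_disable_atu(int index, enum model_region type)
{
	if (type != MODEL_REGION_OUTBOUND)
		return;
	if (index < 0 || index >= MODEL_IATU_NUM)
		return;
	model_outbound_enabled[index] = false;
}

/* Pretend the bootloader left every outbound window enabled. */
static void model_enable_all(void)
{
	for (int i = 0; i < MODEL_IATU_NUM; i++)
		model_outbound_enabled[i] = true;
}

static int model_count_enabled(void)
{
	int count = 0;

	for (int i = 0; i < MODEL_IATU_NUM; i++)
		count += model_outbound_enabled[i];
	return count;
}

/* Arguments swapped: compiles without complaint, disables almost nothing. */
static void model_disable_all_swapped(void)
{
	for (int i = 0; i < MODEL_IATU_NUM; i++)
		model_disable_atu(MODEL_REGION_OUTBOUND, i);
}

/* Arguments in the declared order: index first, then region type. */
static void model_disable_all_fixed(void)
{
	for (int i = 0; i < MODEL_IATU_NUM; i++)
		model_disable_atu(i, MODEL_REGION_OUTBOUND);
}
```

In this model the swapped loop always passes index 2 and only matches
the outbound type on one iteration, so five of the six windows stay
enabled; the fixed loop disables all of them.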


[PATCH] MAINTAINERS: Add x86 early-quirks.c file pattern to PCI subsystem

2018-10-24 Thread Bjorn Helgaas
From: Bjorn Helgaas 

arch/x86/kernel/early-quirks.c contains special PCI quirks that need to
run even before the usual DECLARE_PCI_FIXUP_EARLY() quirks.  These have
typically been merged by the x86 maintainers, which is fine, but PCI folks
should at least see what's happening, so add a file pattern to the PCI
subsystem entry.

Signed-off-by: Bjorn Helgaas 
---
 MAINTAINERS |1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 4ece30f15777..63cb7f3dbbb4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11249,6 +11249,7 @@ F:  include/uapi/linux/pci*
 F: lib/pci*
 F: arch/x86/pci/
 F: arch/x86/kernel/quirks.c
+F: arch/x86/kernel/early-quirks.c
 
 PCI NATIVE HOST BRIDGE AND ENDPOINT DRIVERS
 M: Lorenzo Pieralisi 



Re: [PATCH v3] i2c:amd I2C Driver based on PCI Interface for upcoming, platform

2018-10-24 Thread Bjorn Helgaas
On Thu, Oct 25, 2018 at 01:26:51AM +0800, Kai Heng Feng wrote:
> > On Sep 17, 2018, at 16:19, Kai-Heng Feng  
> > wrote:
> > at 18:54, Shah, Nehal-bakulchandra  wrote:
> > 
> >> From: Nehal-bakulchandra Shah 
> >> 
> >> This contains two drivers.
> >> 1)i2c-amd-platdrv: This is based on I2C framework of
> >> linux kernel. So any i2c read write call or commands
> >> to this driver is routed to PCI Interface driver.
> >> 2) i2c-amd-pci-mp2: This driver is responsible to
> >> talk with Mp2 and does all C2P/P2C communication
> >> or reading/writing from DRAM in case of more
> >> data.
> >> 
> >> Reviewed-by: S-k, Shyam-sundar 
> >> Reviewed-by: Sandeep Singh 
> >> Signed-off-by: Nehal-bakulchandra Shah 
> > 
> > The I2C touchpad on Latitude 5495 works with this patch.
> 
> From PCI’s point of view, do you think this driver is good?

Speaking for myself, I usually only review things that *change* the
PCI core, not things like drivers that simply *use* the PCI core
services.

But since you made the mistake of asking, I do have a few comments
below :)

> >> --- /dev/null
> >> +++ b/drivers/i2c/busses/i2c-amd-pci-mp2.c
> >> @@ -0,0 +1,580 @@
> >> +// SPDX-License-Identifier: GPL-2.0
> >> +/*
> >> + * Copyright (C) 2018 Advanced Micro Devices, Inc. All Rights Reserved.
> >> + *
> >> + *

Extra blank line above.

> >> + * Author: Shyam Sundar S K 
> >> + * AMD PCIe MP2 Communication Driver
> >> + */
> >> +
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +
> >> +#include "i2c-amd-pci-mp2.h"
> >> +
> >> +#define DRIVER_NAME   "pcie_mp2_amd"
> >> +#define DRIVER_DESC   "AMD(R) PCI-E MP2 Communication Driver"

"PCIe" is the usual spelling.

> >> +#define DRIVER_VER    "1.0"
> >> +
> >> +MODULE_DESCRIPTION(DRIVER_DESC);
> >> +MODULE_VERSION(DRIVER_VER);
> >> +MODULE_LICENSE("Dual BSD/GPL");

This doesn't match the SPDX tag above.
See 
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/license-rules.rst

> >> +MODULE_AUTHOR("Shyam Sundar S K ");
> >> +
> >> +static const struct file_operations amd_mp2_debugfs_info;
> >> +static struct dentry *debugfs_dir;
> >> +
> >> +int amd_mp2_connect(struct pci_dev *dev, struct i2c_config i2c_cfg)
> >> +{
> >> +  struct amd_mp2_dev *privdata = pci_get_drvdata(dev);
> >> +  union i2c_cmd_base i2c_cmd_base;
> >> +  unsigned  long  flags;

Extra spaces in the "flags" declaration.

> >> +
> >> +  raw_spin_lock_irqsave(>lock, flags);
> >> +  dev_dbg(ndev_dev(privdata), "%s addr: %x id: %d\n", __func__,
> >> +  i2c_cfg.dev_addr, i2c_cfg.bus_id);
> >> +
> >> +  i2c_cmd_base.ul = 0;
> >> +  i2c_cmd_base.s.i2c_cmd = i2c_enable;
> >> +  i2c_cmd_base.s.bus_id = i2c_cfg.bus_id;
> >> +  i2c_cmd_base.s.i2c_speed = i2c_cfg.i2c_speed;
> >> +
> >> +  if (i2c_cmd_base.s.bus_id == i2c_bus_1) {
> >> +  writel(i2c_cmd_base.ul, privdata->mmio + AMD_C2P_MSG1);
> >> +  } else if (i2c_cmd_base.s.bus_id == i2c_bus_0) {
> >> +  writel(i2c_cmd_base.ul, privdata->mmio + AMD_C2P_MSG0);
> >> +  } else {
> >> +  dev_err(ndev_dev(privdata), "%s Invalid bus id\n", __func__);
> >> +  return -EINVAL;
> >> +  }
> >> +  raw_spin_unlock_irqrestore(>lock, flags);
> >> +  return 0;
> >> +}
> >> +EXPORT_SYMBOL_GPL(amd_mp2_connect);
> >> +
> >> +int amd_mp2_read(struct pci_dev *dev, struct i2c_config i2c_cfg)
> >> +{
> >> +  struct amd_mp2_dev *privdata = pci_get_drvdata(dev);
> >> +  union i2c_cmd_base i2c_cmd_base;
> >> +
> >> +  dev_dbg(ndev_dev(privdata), "%s addr: %x id: %d\n", __func__,
> >> +  i2c_cfg.dev_addr, i2c_cfg.bus_id);
> >> +
> >> +  privdata->requested = true;
> >> +  i2c_cmd_base.ul = 0;
> >> +  i2c_cmd_base.s.i2c_cmd = i2c_read;
> >> +  i2c_cmd_base.s.dev_addr = i2c_cfg.dev_addr;
> >> +  i2c_cmd_base.s.length = i2c_cfg.length;
> >> +  i2c_cmd_base.s.bus_id = i2c_cfg.bus_id;

This block is repeated and could be factored out to a helper function
(parameterized with i2c_read/i2c_write).
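
A rough sketch of what such a helper might look like, written as
stand-alone C so it can be compiled outside the kernel.  The model_
types are hypothetical simplifications of the driver's i2c_config and
i2c_cmd_base (plain struct fields instead of the real bitfield union),
so treat the layout as an assumption.

```c
#include <stdint.h>

enum model_i2c_cmd {
	MODEL_I2C_READ,
	MODEL_I2C_WRITE,
};

/* Hypothetical subset of the caller-supplied configuration. */
struct model_i2c_config {
	uint16_t dev_addr;
	uint16_t length;
	uint8_t bus_id;
};

/* Hypothetical command word, one field per item the driver fills in. */
struct model_cmd_base {
	enum model_i2c_cmd i2c_cmd;
	uint16_t dev_addr;
	uint16_t length;
	uint8_t bus_id;
};

/* Shared setup for read and write; only the command code differs. */
static struct model_cmd_base
model_build_cmd(enum model_i2c_cmd cmd, const struct model_i2c_config *cfg)
{
	struct model_cmd_base base = { 0 };

	base.i2c_cmd = cmd;
	base.dev_addr = cfg->dev_addr;
	base.length = cfg->length;
	base.bus_id = cfg->bus_id;
	return base;
}
```

The read and write paths would then each start with one call, e.g.
model_build_cmd(MODEL_I2C_READ, &cfg), and keep only the code that
really differs between them.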

> >> +
> >> +  if (i2c_cfg.length <= 32) {
> >> +  i2c_cmd_base.s.mem_type = use_c2pmsg;
> >> +  privdata->eventval.buf = (u32 *)i2c_cfg.read_buf;
> >> +  if (!privdata->eventval.buf) {
> >> +  dev_err(ndev_dev(privdata), "%s no mem for buf 
> >> received\n",
> >> +  __func__);
> >> +  return -ENOMEM;
> >> +  }
> >> +  } else {
> >> +  i2c_cmd_base.s.mem_type = use_dram;
> >> +  privdata->i2c_cfg.phy_addr = i2c_cfg.phy_addr;
> >> +  privdata->i2c_cfg.read_buf = i2c_cfg.read_buf;
> >> +  write64((u64)privdata->i2c_cfg.phy_addr,
> >> +  privdata->mmio + AMD_C2P_MSG2);
> >> +  }
> >> +
> >> +  switch (i2c_cfg.i2c_speed) {
> >> +  case 0:
> >> +  i2c_cmd_base.s.i2c_speed = speed100k;
> >> +  break;
> >> +  case 1:
> >> +  i2c_cmd_base.s.i2c_speed = speed400k;
> >> +  break;
> >> +  case 2:
> >> +  i2c_cmd_base.s.i2c_speed = 

Re: HP DL585 warm boot fail (old)

2018-10-24 Thread Bjorn Helgaas
On Wed, Oct 24, 2018 at 05:47:17PM +0300, Meelis Roos wrote:
> > Can you try the patch below?  This is extracted from the code here:
> > https://github.com/joyent/illumos-joyent/blob/b6a0b04d591f5b877cfe05f45e81f0e8a5cfc2b3/usr/src/uts/intel/io/pci/pci_boot.c#L1805
> 
> Thank you. Unfortunately it does not change anything noticeable.

Do you see the "disabling NMI on error" message?

Can you boot with "pci=earlydump vga=0xf07" and capture the output?
Drop the "vga=0xf07" if it doesn't work or makes the screen
unreadable.

> > I'm not sure why this would be only an intermittent problem, but at
> > least we can see if this is related.
> 
> It seems 4.19 and current git are 100% reproducers so far - I have
> not managed to successfully boot either of them yet. I have seen
> 4.19-rc1 era git kernel booting at least once.
> 
> I noticed that Debian packaged 4.17 with initramfs worked fine so
> far for my test, from these I have in grub menu. My selfcompiled
> kernels do not use initramfs.

It seems like the hang happens long before we would do anything with
an initramfs, but maybe there's a timing or memory map issue.  It
seems like a hassle to pursue this angle, but if we can't figure it
out otherwise, maybe we'll have to.

Bjorn


Re: HP DL585 warm boot fail (old)

2018-10-24 Thread Bjorn Helgaas
On Wed, Oct 24, 2018 at 10:47:24AM +0300, Meelis Roos wrote:
> > Would you mind opening a report at https://bugzilla.kernel.org?  I'm
> > not sure if anybody will be able to do anything about this, but it's
> > always possible.
> 
> Submitted now, https://bugzilla.kernel.org/show_bug.cgi?id=201503
> 
> > A complete dmesg log and "sudo lspci -vv" output from a successful
> > boot would be a good start.  And if you have a screenshot of the
> > failure, that would help, too.  You can use the "ignore_loglevel"
> > kernel parameter to make sure we see everything on the console.
> 
> Added.
> 
> >  Does this machine have an iLO?  If so, it may have logs that
> >  could be useful if this is related to some sort of bus error.
> 
> Nothing in the ILO logs.

Great, thanks!

Can you try the patch below?  This is extracted from the code here:
https://github.com/joyent/illumos-joyent/blob/b6a0b04d591f5b877cfe05f45e81f0e8a5cfc2b3/usr/src/uts/intel/io/pci/pci_boot.c#L1805

I'm not sure why this would be only an intermittent problem, but at
least we can see if this is related.


diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 6bc27b7fd452..842f900ed194 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5113,3 +5113,15 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8575,
quirk_switchtec_ntb_dma_alias);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8576,
quirk_switchtec_ntb_dma_alias);
+
+static void quirk_amd_8111(struct pci_dev *pdev)
+{
+   u8 ioc;
+
+   pci_read_config_byte(pdev, 0x40, &ioc);
+   if (ioc & 0x80) {
+   pci_info(pdev, "disabling NMI on error\n");
+   pci_write_config_byte(pdev, 0x40, ioc & ~0x80);
+   }
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x7468, quirk_amd_8111);
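
For anyone who wants to check the bit manipulation without booting the
box, the read-modify-write in the quirk can be modeled in user space.
This is only a sketch: the model_ names and the MODEL_ macros are made
up for illustration, and the 256-byte array stands in for the device's
config space.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_IOC_REG		0x40	/* config offset used by the quirk */
#define MODEL_NMI_ON_ERR	0x80	/* bit the quirk clears */

/* Stand-in for one device's 256-byte config space. */
struct model_cfg {
	uint8_t bytes[256];
};

/*
 * Clear the NMI-on-error bit if it is set, leaving the other bits of
 * the register alone.  Returns true if the register was changed.
 */
static bool model_quirk_amd_8111(struct model_cfg *cfg)
{
	uint8_t ioc = cfg->bytes[MODEL_IOC_REG];

	if (!(ioc & MODEL_NMI_ON_ERR))
		return false;

	cfg->bytes[MODEL_IOC_REG] = ioc & (uint8_t)~MODEL_NMI_ON_ERR;
	return true;
}
```

A second call on the same device is a no-op, which matches the quirk
only printing "disabling NMI on error" when the bit was actually set.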


[GIT PULL] PCI changes for v4.20

2018-10-23 Thread Bjorn Helgaas
PCI changes:

  - Fix ASPM link_state teardown on removal (Lukas Wunner)

  - Fix misleading _OSC ASPM message (Sinan Kaya)

  - Make _OSC optional for PCI (Sinan Kaya)

  - Don't initialize ASPM link state when ACPI_FADT_NO_ASPM is set (Patrick
Talbert)

  - Remove x86 and arm64 node-local allocation for host bridge structures
(Punit Agrawal)

  - Pay attention to device-specific _PXM node values (Jonathan Cameron)

  - Support new Immediate Readiness bit (Felipe Balbi)

  - Differentiate between pciehp surprise and safe removal (Lukas Wunner)

  - Remove unnecessary pciehp includes (Lukas Wunner)

  - Drop pciehp hotplug_slot_ops wrappers (Lukas Wunner)

  - Tolerate PCIe Slot Presence Detect being hardwired to zero to
workaround broken hardware, e.g., the Wilocity switch/wireless device
(Lukas Wunner)

  - Unify pciehp controller & slot structs (Lukas Wunner)

  - Constify hotplug_slot_ops (Lukas Wunner)

  - Drop hotplug_slot_info (Lukas Wunner)

  - Embed hotplug_slot struct into users instead of allocating it
separately (Lukas Wunner)

  - Initialize PCIe port service drivers directly instead of relying on
initcall ordering (Keith Busch)

  - Restore PCI config state after a slot reset (Keith Busch)

  - Save/restore DPC config state along with other PCI config state (Keith
Busch)

  - Reference count devices during AER handling to avoid race issue with
concurrent hot removal (Keith Busch)

  - If an Upstream Port reports ERR_FATAL, don't try to read the Port's
config space because it is probably unreachable (Keith Busch)

  - During error handling, use slot-specific reset instead of secondary
bus reset to avoid link up/down issues on hotplug ports (Keith Busch)

  - Restore previous AER/DPC handling that does not remove and re-enumerate
devices on ERR_FATAL (Keith Busch)

  - Notify all drivers that may be affected by error recovery resets (Keith
Busch)

  - Always generate error recovery uevents, even if a driver doesn't have
error callbacks (Keith Busch)

  - Make PCIe link active reporting detection generic (Keith Busch)

  - Support D3cold in PCIe hierarchies during system sleep and runtime,
including hotplug and Thunderbolt ports (Mika Westerberg)

  - Handle hpmemsize/hpiosize kernel parameters uniformly, whether slots
are empty or occupied (Jon Derrick)

  - Remove duplicated include from pci/pcie/err.c and unused variable from
cpqphp (YueHaibing)

  - Remove driver pci_cleanup_aer_uncorrect_error_status() calls (Oza
Pawandeep)

  - Uninline PCI bus accessors for better ftracing (Keith Busch)

  - Remove unused AER Root Port .error_resume method (Keith Busch)

  - Use kfifo in AER instead of a local version (Keith Busch)

  - Use threaded IRQ in AER bottom half (Keith Busch)

  - Use managed resources in AER core (Keith Busch)

  - Reuse pcie_port_find_device() for AER injection (Keith Busch)

  - Abstract AER interrupt handling to disconnect error injection (Keith
Busch)

  - Refactor AER injection callbacks to simplify future improvements (Keith
Busch)

  - Remove unused Netronome NFP32xx Device IDs (Jakub Kicinski)

  - Use bitmap_zalloc() for dma_alias_mask (Andy Shevchenko)

  - Add switch fall-through annotations (Gustavo A. R. Silva)

  - Remove unused Switchtec quirk variable (Joshua Abraham)

  - Fix pci.c kernel-doc warning (Randy Dunlap)

  - Remove trivial PCI wrappers for DMA APIs (Christoph Hellwig)

  - Add Intel GPU device IDs to spurious interrupt quirk (Bin Meng)

  - Run Switchtec DMA aliasing quirk only on NTB endpoints to avoid useless
dmesg errors (Logan Gunthorpe)

  - Update Switchtec NTB documentation (Wesley Yung)

  - Remove redundant "default n" from Kconfig (Bartlomiej Zolnierkiewicz)

  - Avoid panic when drivers enable MSI/MSI-X twice (Tonghao Zhang)

  - Add PCI support for peer-to-peer DMA (Logan Gunthorpe)

  - Add sysfs group for PCI peer-to-peer memory statistics (Logan
Gunthorpe)

  - Add PCI peer-to-peer DMA scatterlist mapping interface (Logan
Gunthorpe)

  - Add PCI configfs/sysfs helpers for use by peer-to-peer users (Logan
Gunthorpe)

  - Add PCI peer-to-peer DMA driver writer's documentation (Logan
Gunthorpe)

  - Add block layer flag to indicate driver support for PCI peer-to-peer
DMA (Logan Gunthorpe)

  - Map Infiniband scatterlists for peer-to-peer DMA if they contain P2P
memory (Logan Gunthorpe)

  - Register nvme-pci CMB buffer as PCI peer-to-peer memory (Logan
Gunthorpe)

  - Add nvme-pci support for PCI peer-to-peer memory in requests (Logan
Gunthorpe)

  - Use PCI peer-to-peer memory in nvme (Stephen Bates, Steve Wise,
Christoph Hellwig, Logan Gunthorpe)

  - Cache VF config space size to optimize enumeration of many VFs
(KarimAllah Ahmed)

  - Remove unnecessary  include (Bjorn Helgaas)

  - Fix VMD AERSID quirk Device ID matching (Jon Derrick)

  - Fix Cadence PHY handling during probe (Alan Douglas)

  - 

Re: [PATCH v4] PCI/AER: Enable error reporting for all ports

2018-10-18 Thread Bjorn Helgaas
On Thu, Oct 18, 2018 at 05:03:13PM -0600, Keith Busch wrote:
> On Thu, Oct 18, 2018 at 03:53:58PM -0500, Bjorn Helgaas wrote:
> > Change the AER service driver so it binds to *all* PCIe Ports,
> > including Switch Upstream and Downstream Ports.  Enable AER error
> > reporting for all these Ports, but not for any children.
> 
> I'm looking at this again and think enabling/disabling error
> reporting for ports is the responsibility of the port driver, not
> the AER service.

That's an interesting idea.  Can you expand on this a little more?
Why is it the responsibility of the port driver?

Do you think pci_enable_pcie_error_reporting() shouldn't be part of
the AER service because it updates the Device Control register, which
is in the PCIe Capability, not the AER Capability?

What about pci_aer_clear_device_status(), which clears Device Status,
which is also in the PCIe Capability?

> The following should do the same as this patch, but without making
> AER driver handle non-root ports.  The report enabling/disabling
> functions are already stubbed for '!CONFIG_PCIE_AER' and have checks
> for aer_cap and firmware first.

If we thought we should enable error reporting *always*, regardless of
whether the AER service is enabled, this would make perfect sense to
me, and I might suggest doing it in an even more generic place like
pci_configure_device() or pci_init_capabilities().

But that doesn't seem like where you're headed.  It seems like you
still only want error reporting enabled when CONFIG_PCIEAER=y.  If
that's the case, it seems like doing it in portdrv only obfuscates the
connection with AER.  When CONFIG_PCIEAER is unset, the portdrv code
*looks* like it's doing something but it's really not because of the
#ifdef magic.
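
The "#ifdef magic" is the usual stub pattern.  Here is a minimal
user-space sketch of it; MODEL_CONFIG_AER and the model_ functions are
hypothetical names, not the real config symbol or kernel helpers.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a PCIe Port. */
struct model_port {
	bool err_reporting;
};

#ifdef MODEL_CONFIG_AER
/* Real implementation, built only when the option is enabled. */
static inline int model_enable_error_reporting(struct model_port *p)
{
	p->err_reporting = true;
	return 0;
}
#else
/* Stub: same signature, does nothing, reports failure. */
static inline int model_enable_error_reporting(struct model_port *p)
{
	(void)p;
	return -EINVAL;
}
#endif

/* The caller reads the same either way and ignores the return value. */
static int model_portdrv_probe(struct model_port *p)
{
	model_enable_error_reporting(p);
	return 0;
}
```

Built without MODEL_CONFIG_AER, model_portdrv_probe() still succeeds
but leaves err_reporting false, and nothing at the call site hints that
the call was a no-op.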

> A real patch for this could even make this remove all the aer
> specific error report enabling, so it'd be a net-loss in code lines.
> :)
> 
> ---
> diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
> index 0acca3596807..f129a33c8303 100644
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -122,12 +122,13 @@ static int pcie_portdrv_probe(struct pci_dev *dev,
>   pm_runtime_put_autosuspend(&dev->dev);
>   pm_runtime_allow(&dev->dev);
>   }
> -
> + pci_enable_pcie_error_reporting(dev);
>   return 0;
>  }
>  
>  static void pcie_portdrv_remove(struct pci_dev *dev)
>  {
> + pci_disable_pcie_error_reporting(dev);
>   if (pci_bridge_d3_possible(dev)) {
>   pm_runtime_forbid(&dev->dev);
>   pm_runtime_get_noresume(&dev->dev);
> --


[PATCH v4] PCI/AER: Enable error reporting for all ports

2018-10-18 Thread Bjorn Helgaas
From: Bjorn Helgaas 

Previously we enabled AER error reporting only for Switch Ports that were
enumerated prior to registering the AER service driver.  Switch Ports
enumerated after AER driver registration were left with error reporting
disabled.

A common order, which works correctly, is that we enumerate devices before
registering portdrv and the AER driver:

  - Enumerate all the devices at boot-time

  - Register portdrv and bind it to all Root Ports and Switch Ports, which
disables error reporting for these Ports

  - Register AER service driver and bind it to all Root Ports, which
enables error reporting for the Root Ports and any Switch Ports below
them

But if we enumerate devices *after* registering portdrv and the AER driver,
e.g., if a host bridge driver is loaded as a module, error reporting is not
enabled correctly:

  - Register portdrv and AER driver (this happens at boot-time)

  - Enumerate a Root Port

  - Bind portdrv to Root Port, disabling its error reporting

  - Bind AER service driver to Root Port, enabling error reporting for it
and its children (there are no children, since we haven't enumerated
them yet)

  - Enumerate Switch Port below the Root Port

  - Bind portdrv to Switch Port, disabling its error reporting

  - AER service driver doesn't bind to Switch Ports, so error reporting
remains disabled

Hot-adding a Switch fails similarly: error reporting is enabled correctly
for the Root Port, but when the Switch is enumerated, the AER service
driver doesn't claim it, so there's nothing to enable error reporting for
the Switch Ports.

Change the AER service driver so it binds to *all* PCIe Ports, including
Switch Upstream and Downstream Ports.  Enable AER error reporting for all
these Ports, but not for any children.
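
The late-enumeration sequence described above can be replayed with a
small user-space model.  This is a sketch under stated assumptions: the
model_ names are invented, "old" means the AER driver binds only to
Root Ports and enables reporting for whatever is already enumerated
below them, and "new" means it binds to every Port and enables
reporting for that Port alone.

```c
#include <stdbool.h>

enum model_type { MODEL_ROOT_PORT, MODEL_SWITCH_PORT };

struct model_port {
	enum model_type type;
	int parent;		/* index of the parent Port, -1 for a Root Port */
	bool enumerated;
	bool err_reporting;
};

/* True if ports[idx] is ports[root] itself or sits anywhere below it. */
static bool model_is_at_or_below(const struct model_port *ports, int idx,
				 int root)
{
	for (; idx >= 0; idx = ports[idx].parent)
		if (idx == root)
			return true;
	return false;
}

/* Old behavior: Root Ports only, enable what is enumerated so far. */
static void model_aer_bind_old(struct model_port *ports, int n, int idx)
{
	if (ports[idx].type != MODEL_ROOT_PORT)
		return;
	for (int i = 0; i < n; i++)
		if (ports[i].enumerated && model_is_at_or_below(ports, i, idx))
			ports[i].err_reporting = true;
}

/* New behavior: every Port, enable reporting for that Port only. */
static void model_aer_bind_new(struct model_port *ports, int idx)
{
	ports[idx].err_reporting = true;
}

/* Late enumeration: portdrv and the AER driver are already registered. */
static void model_enumerate(struct model_port *ports, int n, int idx,
			    bool new_policy)
{
	ports[idx].enumerated = true;
	ports[idx].err_reporting = false;	/* portdrv bind disables it */
	if (new_policy)
		model_aer_bind_new(ports, idx);
	else
		model_aer_bind_old(ports, n, idx);
}
```

Enumerating a Root Port and then a Switch Port below it leaves the
Switch Port with reporting disabled under the old policy and enabled
under the new one.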

Binding the AER driver to all PCIe Ports requires additional changes
because aer_remove() and aer_root_reset() were previously called only for
Root Ports but may now be called for any Port.

  - aer_remove() must check for Root Ports before disabling downstream
device error reporting and Root Port interrupts and status.

  - aer_root_reset() must check for Root Ports before disabling and
restoring Root Port interrupts.  This is called from reset_link(),
which previously fell back to default_reset_link() for Switch Ports
because they weren't claimed by the AER service driver.  With the new
Root Port check, aer_root_reset() is equivalent to default_reset_link()
in that case.

Link: 
https://lore.kernel.org/linux-pci/1536085989-2956-1-git-send-email-jonathan.derr...@intel.com
Based-on-patch-by: Jon Derrick 
Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/pcie/aer.c |   66 +++-
 1 file changed, 37 insertions(+), 29 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index a90a9194ac4a..f4cafb2ee7da 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1315,12 +1315,6 @@ static void aer_enable_rootport(struct aer_rpc *rpc)
pci_read_config_dword(pdev, aer_pos + PCI_ERR_UNCOR_STATUS, &reg32);
pci_write_config_dword(pdev, aer_pos + PCI_ERR_UNCOR_STATUS, reg32);
 
-   /*
-* Enable error reporting for the root port device and downstream port
-* devices.
-*/
-   set_downstream_devices_error_reporting(pdev, true);
-
/* Enable Root Port's interrupt in response to error messages */
pci_read_config_dword(pdev, aer_pos + PCI_ERR_ROOT_COMMAND, &reg32);
reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
@@ -1364,9 +1358,12 @@ static void aer_disable_rootport(struct aer_rpc *rpc)
  */
 static void aer_remove(struct pcie_device *dev)
 {
-   struct aer_rpc *rpc = get_service_data(dev);
+   struct aer_rpc *rpc;
 
-   aer_disable_rootport(rpc);
+   if (pci_pcie_type(dev->port) == PCI_EXP_TYPE_ROOT_PORT) {
+   rpc = get_service_data(dev);
+   aer_disable_rootport(rpc);
+   }
 }
 
 /**
@@ -1377,10 +1374,17 @@ static void aer_remove(struct pcie_device *dev)
  */
 static int aer_probe(struct pcie_device *dev)
 {
+   struct pci_dev *pdev = dev->port;
+   int type = pci_pcie_type(pdev);
int status;
struct aer_rpc *rpc;
struct device *device = &dev->device;
 
+   if (type == PCI_EXP_TYPE_UPSTREAM || type == PCI_EXP_TYPE_DOWNSTREAM) {
+   pci_enable_pcie_error_reporting(pdev);
+   return 0;
+   }
+
rpc = devm_kzalloc(device, sizeof(struct aer_rpc), GFP_KERNEL);
if (!rpc) {
dev_printk(KERN_DEBUG, device, "alloc AER rpc failed\n");
@@ -1398,52 +1402,56 @@ static int aer_probe(struct pcie_device *dev)
}
 
aer_enable_rootport(rpc);
+   pci_enable_pcie_error_reporting(pdev);
dev_info(device, "AER enabled with IRQ %d\n", dev->irq);
return 0;
 }
 
 /**
- * aer_root_reset - reset link on Root Port
- * @dev: pointer to Ro

Re: [PATCH v3] PCI/AER: Enable reporting for ports enumerated after AER driver registration

2018-10-18 Thread Bjorn Helgaas
On Fri, Oct 12, 2018 at 04:16:04PM +0800, Dongdong Liu wrote:
> On 2018/10/11 23:57, Keith Busch wrote:
> > On Thu, Oct 11, 2018 at 08:26:18AM -0700, Bjorn Helgaas wrote:
> > > From: Bjorn Helgaas 
> > > 
> > > Previously we enabled AER error reporting only for Switch Ports that were
> > > enumerated prior to registering the AER service driver.  Switch Ports
> > > enumerated after AER driver registration were left with error reporting
> > > disabled.
> > > 
> > > A common order, which works correctly, is that we enumerate devices before
> > > registering portdrv and the AER driver:
> > > 
> > >   - Enumerate all the devices at boot-time
> > > 
> > >   - Register portdrv and bind it to all Root Ports and Switch Ports, which
> > > disables error reporting for these Ports
> > > 
> > >   - Register AER service driver and bind it to all Root Ports, which
> > > enables error reporting for the Root Ports and any Switch Ports below
> > > them
> > > 
> > > But if we enumerate devices *after* registering portdrv and the AER driver,
> > > e.g., if a host bridge driver is loaded as a module, error reporting is not
> > > enabled correctly:
> > > 
> > >   - Register portdrv and AER driver (this happens at boot-time)
> > > 
> > >   - Enumerate a Root Port
> > > 
> > >   - Bind portdrv to Root Port, disabling its error reporting
> > > 
> > >   - Bind AER service driver to Root Port, enabling error reporting for it
> > > and its children (there are no children, since we haven't enumerated
> > > them yet)
> > > 
> > >   - Enumerate Switch Port below the Root Port
> > > 
> > >   - Bind portdrv to Switch Port, disabling its error reporting
> > > 
> > >   - AER service driver doesn't bind to Switch Ports, so error reporting
> > > remains disabled
> > > 
> > > Hot-adding a Switch fails similarly: error reporting is enabled correctly
> > > for the Root Port, but when the Switch is enumerated, the AER service
> > > driver doesn't claim it, so there's nothing to enable error reporting for
> > > the Switch Ports.
> > > 
> > > Change the AER service driver so it binds to *all* PCIe Ports, including
> > > Switch Upstream and Downstream Ports.  Enable AER error reporting for all
> > > these Ports, but not for any children.
> > > 
> > > Link: 
> > > https://lore.kernel.org/linux-pci/1536085989-2956-1-git-send-email-jonathan.derr...@intel.com
> > > Based-on-patch-by: Jon Derrick 
> > > Signed-off-by: Bjorn Helgaas 
> > > ---
> > >  drivers/pci/pcie/aer.c |   16 +---
> > >  1 file changed, 9 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > > index 90b53abf621d..c40c6607849b 100644
> > > --- a/drivers/pci/pcie/aer.c
> > > +++ b/drivers/pci/pcie/aer.c
> > > @@ -1316,12 +1316,6 @@ static void aer_enable_rootport(struct aer_rpc *rpc)
> > >   pci_read_config_dword(pdev, aer_pos + PCI_ERR_UNCOR_STATUS, &reg32);
> > >   pci_write_config_dword(pdev, aer_pos + PCI_ERR_UNCOR_STATUS, reg32);
> > > 
> > > - /*
> > > -  * Enable error reporting for the root port device and downstream port
> > > -  * devices.
> > > -  */
> > > - set_downstream_devices_error_reporting(pdev, true);
> > > -
> > >   /* Enable Root Port's interrupt in response to error messages */
> > >   pci_read_config_dword(pdev, aer_pos + PCI_ERR_ROOT_COMMAND, &reg32);
> > >   reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
> > > @@ -1378,10 +1372,17 @@ static void aer_remove(struct pcie_device *dev)
> > >   */
> > >  static int aer_probe(struct pcie_device *dev)
> > >  {
> > > + struct pci_dev *pdev = dev->port;
> > > + int type = pci_pcie_type(pdev);
> > >   int status;
> > >   struct aer_rpc *rpc;
> > >   struct device *device = &dev->device;
> > > 
> > > + if (type == PCI_EXP_TYPE_UPSTREAM || type == PCI_EXP_TYPE_DOWNSTREAM) {
> > > + pci_enable_pcie_error_reporting(pdev);
> > > + return 0;
> > > + }
> > 
> > I think we need to either return an error in this case so that the
> > pcie_device won't be eligible for the .remove() callback, or add a
> > similar type check in aer_remove().
> 
> It seems aer_root_reset() will also be called for Downstream Ports
> (err.c: driver->reset_link(dev)), but aer_root_reset() is only for Root Ports.

Also right, thanks!

I think it will do the right thing if we make aer_root_reset() check
the port type and only disable/restore the Root Port interrupt
settings and status when called for a Root Port.  The
pci_bus_error_reset() it does is exactly what default_reset_link()
does for devices that aren't claimed by the AER driver.
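
Roughly, the shape I have in mind looks like the following user-space
sketch.  The model_ names and counters are invented for illustration,
and the interrupt handling is reduced to a single flag that is saved
and restored; it is not the actual aer_root_reset() code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a Port claimed by the AER driver. */
struct model_port {
	bool is_root_port;
	bool root_intr_enabled;	/* meaningful for Root Ports only */
	int bus_resets;		/* how many times the bus below was reset */
	int intr_updates;	/* how many times interrupt state was touched */
};

/* Stand-in for the bus reset that every Port type gets. */
static void model_bus_error_reset(struct model_port *p)
{
	p->bus_resets++;
}

/* Touch Root Port interrupt state only when this really is a Root Port. */
static void model_aer_root_reset(struct model_port *p)
{
	bool saved = false;

	if (p->is_root_port) {
		saved = p->root_intr_enabled;
		p->root_intr_enabled = false;
		p->intr_updates++;
	}

	model_bus_error_reset(p);

	if (p->is_root_port)
		p->root_intr_enabled = saved;
}
```

For a Switch Port the function reduces to the bus reset alone, which is
the sense in which it matches the default path.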

Bjorn


Re: [PATCH v3] PCI/AER: Enable reporting for ports enumerated after AER driver registration

2018-10-18 Thread Bjorn Helgaas
On Thu, Oct 11, 2018 at 09:57:16AM -0600, Keith Busch wrote:
> On Thu, Oct 11, 2018 at 08:26:18AM -0700, Bjorn Helgaas wrote:
> > From: Bjorn Helgaas 
> > 
> > Previously we enabled AER error reporting only for Switch Ports that were
> > enumerated prior to registering the AER service driver.  Switch Ports
> > enumerated after AER driver registration were left with error reporting
> > disabled.
> > 
> > A common order, which works correctly, is that we enumerate devices before
> > registering portdrv and the AER driver:
> > 
> >   - Enumerate all the devices at boot-time
> > 
> >   - Register portdrv and bind it to all Root Ports and Switch Ports, which
> > disables error reporting for these Ports
> > 
> >   - Register AER service driver and bind it to all Root Ports, which
> > enables error reporting for the Root Ports and any Switch Ports below
> > them
> > 
> > But if we enumerate devices *after* registering portdrv and the AER driver,
> > e.g., if a host bridge driver is loaded as a module, error reporting is not
> > enabled correctly:
> > 
> >   - Register portdrv and AER driver (this happens at boot-time)
> > 
> >   - Enumerate a Root Port
> > 
> >   - Bind portdrv to Root Port, disabling its error reporting
> > 
> >   - Bind AER service driver to Root Port, enabling error reporting for it
> > and its children (there are no children, since we haven't enumerated
> > them yet)
> > 
> >   - Enumerate Switch Port below the Root Port
> > 
> >   - Bind portdrv to Switch Port, disabling its error reporting
> > 
> >   - AER service driver doesn't bind to Switch Ports, so error reporting
> > remains disabled
> > 
> > Hot-adding a Switch fails similarly: error reporting is enabled correctly
> > for the Root Port, but when the Switch is enumerated, the AER service
> > driver doesn't claim it, so there's nothing to enable error reporting for
> > the Switch Ports.
> > 
> > Change the AER service driver so it binds to *all* PCIe Ports, including
> > Switch Upstream and Downstream Ports.  Enable AER error reporting for all
> > these Ports, but not for any children.
> > 
> > Link: 
> > https://lore.kernel.org/linux-pci/1536085989-2956-1-git-send-email-jonathan.derr...@intel.com
> > Based-on-patch-by: Jon Derrick 
> > Signed-off-by: Bjorn Helgaas 
> > ---
> >  drivers/pci/pcie/aer.c |   16 +++++++++-------
> >  1 file changed, 9 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > index 90b53abf621d..c40c6607849b 100644
> > --- a/drivers/pci/pcie/aer.c
> > +++ b/drivers/pci/pcie/aer.c
> > @@ -1316,12 +1316,6 @@ static void aer_enable_rootport(struct aer_rpc *rpc)
> > pci_read_config_dword(pdev, aer_pos + PCI_ERR_UNCOR_STATUS, &reg32);
> > pci_write_config_dword(pdev, aer_pos + PCI_ERR_UNCOR_STATUS, reg32);
> >  
> > -   /*
> > -* Enable error reporting for the root port device and downstream port
> > -* devices.
> > -*/
> > -   set_downstream_devices_error_reporting(pdev, true);
> > -
> > /* Enable Root Port's interrupt in response to error messages */
> > pci_read_config_dword(pdev, aer_pos + PCI_ERR_ROOT_COMMAND, &reg32);
> > reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
> > @@ -1378,10 +1372,17 @@ static void aer_remove(struct pcie_device *dev)
> >   */
> >  static int aer_probe(struct pcie_device *dev)
> >  {
> > +   struct pci_dev *pdev = dev->port;
> > +   int type = pci_pcie_type(pdev);
> > int status;
> > struct aer_rpc *rpc;
> > struct device *device = &dev->device;
> >  
> > +   if (type == PCI_EXP_TYPE_UPSTREAM || type == PCI_EXP_TYPE_DOWNSTREAM) {
> > +   pci_enable_pcie_error_reporting(pdev);
> > +   return 0;
> > +   }
> 
> I think we need to either return an error in this case so that the
> pcie_device won't be eligible for the .remove() callback, or add a
> similar type check in aer_remove().

Indeed, thanks!  I think a check in aer_remove() seems nicer.  It doesn't
seem right to return an error here, since everything is working correctly.
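
The probe/remove symmetry discussed above can be sketched in plain C.  This is
only a userspace mock, not the AER driver: the mock_* names, the rpc
allocation size, and what remove does for a Switch Port are all assumptions
made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's port types and pcie_device. */
enum mock_port_type { MOCK_ROOT_PORT, MOCK_UPSTREAM, MOCK_DOWNSTREAM };

struct mock_port {
	enum mock_port_type type;
	bool error_reporting;	/* models the port's error reporting enable */
	void *rpc;		/* per-Root-Port state; NULL for Switch Ports */
};

static bool mock_is_switch_port(const struct mock_port *port)
{
	return port->type == MOCK_UPSTREAM || port->type == MOCK_DOWNSTREAM;
}

/* Probe succeeds for every port type, but only Root Ports get state. */
static int mock_aer_probe(struct mock_port *port)
{
	port->error_reporting = true;
	if (mock_is_switch_port(port))
		return 0;

	port->rpc = calloc(1, 64);	/* size is arbitrary in this mock */
	return port->rpc ? 0 : -1;
}

/* Remove mirrors probe: skip the Root-Port-only teardown on Switch Ports. */
static void mock_aer_remove(struct mock_port *port)
{
	port->error_reporting = false;	/* assumed behaviour, see above */
	if (mock_is_switch_port(port))
		return;

	free(port->rpc);
	port->rpc = NULL;
}
```

Because probe returns 0 for Switch Ports, remove will be called for them too,
so the same type check has to guard any teardown that assumes Root Port state.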

Bjorn


Re: [PATCH] PCI: pcie: remove redundant 'default n' from Kconfig

2018-10-18 Thread Bjorn Helgaas
On Tue, Oct 16, 2018 at 04:38:13PM +0200, Bartlomiej Zolnierkiewicz wrote:
> 'default n' is the default value for any bool or tristate Kconfig
> setting so there is no need to write it explicitly.
> 
> Also since commit f467c5640c29 ("kconfig: only write '# CONFIG_FOO
> is not set' for visible symbols") the Kconfig behavior is the same
> regardless of 'default n' being present or not:
> 
> ...
> One side effect of (and the main motivation for) this change is making
> the following two definitions behave exactly the same:
> 
> config FOO
> bool
> 
> config FOO
> bool
> default n
> 
> With this change, neither of these will generate a
> '# CONFIG_FOO is not set' line (assuming FOO isn't selected/implied).
> That might make it clearer to people that a bare 'default n' is
> redundant.
> ...
> 
> Signed-off-by: Bartlomiej Zolnierkiewicz 

Applied to pci/misc for v4.20, thanks!

> ---
>  drivers/pci/pcie/Kconfig |4 
>  1 file changed, 4 deletions(-)
> 
> Index: b/drivers/pci/pcie/Kconfig
> ===
> --- a/drivers/pci/pcie/Kconfig	2018-10-09 15:58:49.831123212 +0200
> +++ b/drivers/pci/pcie/Kconfig	2018-10-16 16:36:32.419732670 +0200
> @@ -36,7 +36,6 @@ config PCIEAER
>  config PCIEAER_INJECT
>   tristate "PCI Express error injection support"
>   depends on PCIEAER && DYNAMIC_FTRACE_WITH_REGS
> - default n
>   help
> This enables PCI Express Root Port Advanced Error Reporting
> (AER) software error injector.
> @@ -84,7 +83,6 @@ config PCIEASPM
>  config PCIEASPM_DEBUG
>   bool "Debug PCI Express ASPM"
>   depends on PCIEASPM
> - default n
>   help
> This enables PCI Express ASPM debug support. It will add per-device
> interface to control ASPM.
> @@ -129,7 +127,6 @@ config PCIE_PME
>  config PCIE_DPC
>   bool "PCI Express Downstream Port Containment support"
>   depends on PCIEPORTBUS && PCIEAER
> - default n
>   help
> This enables PCI Express Downstream Port Containment (DPC)
> driver support.  DPC events from Root and Downstream ports
> @@ -139,7 +136,6 @@ config PCIE_DPC
>  
>  config PCIE_PTM
>   bool "PCI Express Precision Time Measurement support"
> - default n
>   depends on PCIEPORTBUS
>   help
> This enables PCI Express Precision Time Measurement (PTM)


Re: [PATCH] PCI/P2PDMA: Fix NULL check in pci_p2pmem_publish()

2018-10-17 Thread Bjorn Helgaas
On Wed, Oct 17, 2018 at 10:05:10AM -0600, Logan Gunthorpe wrote:
> We should only assign 'p2pmem_published' if 'pdev->p2pdma' is not NULL.
> The extra check on 'publish' makes no sense.
> 
> Signed-off-by: Logan Gunthorpe 
> Reported-by: Dan Carpenter 
> Cc: Bjorn Helgaas 
> Cc: Christoph Hellwig 

I folded this into the original commit on pci/peer-to-peer, thanks!
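
For readers following along, here is a minimal userspace sketch of why the
old check was wrong.  The mock_* structs are hypothetical, trimmed-down
stand-ins for the kernel types; only the two conditions are taken from the
diff quoted below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for struct pci_p2pdma and pci_dev. */
struct mock_p2pdma {
	bool p2pmem_published;
};

struct mock_pci_dev {
	struct mock_p2pdma *p2pdma;	/* NULL until p2p memory is added */
};

/*
 * The pre-fix check, reduced to a predicate: would the old code have gone
 * on to dereference a NULL p2pdma pointer?  It returned early only when
 * publish was true, so publish == false fell through to the assignment.
 */
static bool old_check_would_deref_null(const struct mock_pci_dev *pdev,
				       bool publish)
{
	bool returned_early = publish && !pdev->p2pdma;

	return !returned_early && pdev->p2pdma == NULL;
}

/* The fixed logic: the guard depends only on the pointer. */
static void mock_p2pmem_publish(struct mock_pci_dev *pdev, bool publish)
{
	if (pdev->p2pdma)
		pdev->p2pdma->p2pmem_published = publish;
}
```

With a NULL p2pdma, the old condition was safe for publish == true but not
for publish == false, which is the case the patch closes.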

> ---
>  drivers/pci/p2pdma.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index da66c7e31730..d710b5ef65a1 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -678,10 +678,8 @@ EXPORT_SYMBOL_GPL(pci_p2pmem_free_sgl);
>   */
>  void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
>  {
> - if (publish && !pdev->p2pdma)
> - return;
> -
> - pdev->p2pdma->p2pmem_published = publish;
> + if (pdev->p2pdma)
> + pdev->p2pdma->p2pmem_published = publish;
>  }
>  EXPORT_SYMBOL_GPL(pci_p2pmem_publish);
>  
> -- 
> 2.19.0
> 


Re: [PATCH v9 2/9] PCI: Using PCI configuration space header type instead of class type to assign resource

2018-10-17 Thread Bjorn Helgaas
On Tue, Oct 16, 2018 at 03:53:55PM +0100, Lorenzo Pieralisi wrote:
> On Tue, Oct 16, 2018 at 06:44:43PM +0800, honghui.zh...@mediatek.com wrote:
> > From: Honghui Zhang 
> > 
> > The PCI configuration space header type defines the layout of the rest
> > of the header (PCI r3.0 sec 6.1, PCIe r4.0 sec 7.5.1.1.9), and
> > resource assignment depends on that configuration space layout, not
> > on the class type. Use the configuration space header type instead of
> > the class type for resource assignment.
> > 
> > Suggested-by: Bjorn Helgaas 
> > Signed-off-by: Honghui Zhang 
> > ---
> >  drivers/pci/pci.c   |  3 +--
> >  drivers/pci/probe.c |  3 ---
> >  drivers/pci/setup-bus.c | 20 ++--
> >  3 files changed, 11 insertions(+), 15 deletions(-)
> > 
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 29ff961..7d379ca 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -5908,8 +5908,7 @@ void pci_reassigndev_resource_alignment(struct 
> > pci_dev *dev)
> >  * to enable the kernel to reassign new resource
> >  * window later on.
> >  */
> > -   if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE &&
> > -   (dev->class >> 8) == PCI_CLASS_BRIDGE_PCI) {
> > +   if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
> > for (i = PCI_BRIDGE_RESOURCES; i < PCI_NUM_RESOURCES; i++) {
> > r = &dev->resource[i];
> > if (!(r->flags & IORESOURCE_MEM))
> > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> > index ec78400..29a35c1 100644
> > --- a/drivers/pci/probe.c
> > +++ b/drivers/pci/probe.c
> > @@ -1695,9 +1695,6 @@ int pci_setup_device(struct pci_dev *dev)
> > break;
> >  
> > case PCI_HEADER_TYPE_BRIDGE:/* bridge header */
> > -   if (class != PCI_CLASS_BRIDGE_PCI)
> > -   goto bad;
> > -
> > /*
> >  * The PCI-to-PCI bridge spec requires that subtractive
> >  * decoding (i.e. transparent) bridge must have programming
> > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> > index 79b1824..69f90f4 100644
> > --- a/drivers/pci/setup-bus.c
> > +++ b/drivers/pci/setup-bus.c
> > @@ -182,7 +182,7 @@ static void __dev_sort_resources(struct pci_dev *dev,
> > u16 class = dev->class >> 8;
> >  
> > /* Don't touch classless devices or host bridges or ioapics.  */
> > -   if (class == PCI_CLASS_NOT_DEFINED || class == PCI_CLASS_BRIDGE_HOST)
> > +   if (class == PCI_CLASS_NOT_DEFINED)
> 
> I think this check has been there since the first initial git commit,
> whether that's _really_ needed or not in the current kernel it is very
> hard to say.
> 
> I am not that sure it is safe to remove it, especially given that we are at
> -rc8 and close to a release, it would be good if this patch could sit in
> next to give it some exposure to testing before merging it upstream.

Yes, you're right; I think this is a little too risky at this
point.  I'll pull this patch out and queue it up for the next cycle
(v4.21).

For v4.20, I think you should resurrect the class code patch [1].  That
should be enough to make things work in v4.20, even without this hdr_type
patch.  It will also improve the lspci output, because I think it uses the
class code to look up the generic description, e.g., in this output:

  00:1c.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port 
(rev f1)

I think the "PCI bridge" part is based on the class code.
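
To illustrate the class-code lookup guessed at above: the helper below is a
hypothetical sketch, not lspci's code.  The two class values match
include/linux/pci_ids.h; the description strings and the fallback are
assumptions for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Values as defined in include/linux/pci_ids.h. */
#define PCI_CLASS_BRIDGE_HOST	0x0600
#define PCI_CLASS_BRIDGE_PCI	0x0604

/*
 * Hypothetical helper: map the 16-bit base class/subclass to a generic
 * description, the way a tool like lspci might.  The 24-bit class value
 * also carries a programming interface byte in bits 7:0, which is dropped
 * by the shift.
 */
static const char *mock_class_description(uint32_t class24)
{
	switch (class24 >> 8) {
	case PCI_CLASS_BRIDGE_HOST:
		return "Host bridge";
	case PCI_CLASS_BRIDGE_PCI:
		return "PCI bridge";
	default:
		return "Unknown class";
	}
}
```

Under this sketch, a Root Port that advertises PCI_CLASS_BRIDGE_HOST would be
labelled as a host bridge even though it has a type 1 header.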

Bjorn

[1] 
https://lore.kernel.org/linux-pci/1539590940-13355-3-git-send-email-honghui.zh...@mediatek.com

> > return;
> >  
> > /* Don't touch ioapic devices already enabled by firmware */
> > @@ -1221,12 +1221,12 @@ void __pci_bus_size_bridges(struct pci_bus *bus, 
> > struct list_head *realloc_head)
> > if (!b)
> > continue;
> >  
> > -   switch (dev->class >> 8) {
> > -   case PCI_CLASS_BRIDGE_CARDBUS:
> > +   switch (dev->hdr_type) {
> > +   case PCI_HEADER_TYPE_CARDBUS:
> > pci_bus_size_cardbus(b, realloc_head);
> > break;
> >  
> > -   case PCI_CLASS_BRIDGE_PCI:
> > +   case PCI_HEADER_TYPE_BRIDGE:
> > default:
> > __pci_bus_size_bridges(b, realloc_head);
> > break;
> > @@ -1237,12 +1237,12 @@ 

Re: [PATCH v8 2/9] PCI: mediatek: Fix class type for MT7622 as PCI_CLASS_BRIDGE_PCI

2018-10-15 Thread Bjorn Helgaas
On Mon, Oct 15, 2018 at 04:08:53PM +0800, honghui.zh...@mediatek.com wrote:
> From: Honghui Zhang 
> 
> Commit 101c92dc80c8 ("PCI: mediatek: Set up vendor ID and class
> type for MT7622") set the class type for MT7622 to the improper
> value PCI_CLASS_BRIDGE_HOST.
> 
> The PCIe controller of MT7622 combines a Root Port and a PCI-to-PCI
> bridge; the bridge has a type 1 configuration space header and related
> bridge windows. The HW default value of this bridge's class type is
> invalid. Fix its class type to PCI_CLASS_BRIDGE_PCI, since that is what
> the HW defines.
> 
> Making the bridge visible to the PCI framework by setting its class type
> properly gets its bridge windows configured during PCI device
> enumeration.
> 
> Fixes: 101c92dc80c8 ("PCI: mediatek: Set up vendor ID and class type for 
> MT7622")
> Signed-off-by: Honghui Zhang 
> Acked-by: Ryder Lee 

Nak until this patch is preceded by one that fixes the PCI core defect
I pointed out earlier [1].  It's OK to change the class code, but
not as a way of working around that PCI core defect.

[1] 
https://lore.kernel.org/linux-pci/20181012141202.gv5...@bhelgaas-glaptop.roam.corp.google.com

> ---
>  drivers/pci/controller/pcie-mediatek.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/controller/pcie-mediatek.c 
> b/drivers/pci/controller/pcie-mediatek.c
> index 288b8e2..bcdac9b 100644
> --- a/drivers/pci/controller/pcie-mediatek.c
> +++ b/drivers/pci/controller/pcie-mediatek.c
> @@ -432,7 +432,7 @@ static int mtk_pcie_startup_port_v2(struct mtk_pcie_port 
> *port)
>   val = PCI_VENDOR_ID_MEDIATEK;
>   writew(val, port->base + PCIE_CONF_VEND_ID);
>  
> - val = PCI_CLASS_BRIDGE_HOST;
> + val = PCI_CLASS_BRIDGE_PCI;
>   writew(val, port->base + PCIE_CONF_CLASS_ID);
>   }
>  
> -- 
> 2.6.4
> 
> 


Re: [PATCH v6 2/9] PCI: mediatek: Fixup class ID for MT7622 as PCI_CLASS_BRIDGE_PCI

2018-10-15 Thread Bjorn Helgaas
On Mon, Oct 15, 2018 at 10:42:23AM +0800, Honghui Zhang wrote:
> On Fri, 2018-10-12 at 09:12 -0500, Bjorn Helgaas wrote:
> > On Fri, Oct 12, 2018 at 11:22:30AM +0100, Lorenzo Pieralisi wrote:
> > > On Fri, Oct 12, 2018 at 04:01:29PM +0800, Honghui Zhang wrote:
> > >> On Thu, 2018-10-11 at 12:38 +0100, Lorenzo Pieralisi wrote:
> > >>> On Tue, Oct 09, 2018 at 11:08:15AM +0800, Honghui Zhang wrote:
> > >>>> On Mon, 2018-10-08 at 18:23 +0100, Lorenzo Pieralisi wrote:
> > >>>>> On Mon, Oct 08, 2018 at 11:24:41AM +0800, honghui.zh...@mediatek.com 
> > >>>>> wrote:
> > >>>>>> From: Honghui Zhang 
> > >>>>>> 
> > >>>>>> The PCIe controller of MT7622 has a type 1 configuration
> > >>>>>> space header, but the HW default class type value is
> > >>>>>> invalid.
> > >>>>>> 
> > >>>>>> The commit 101c92dc80c8 ("PCI: mediatek: Set up vendor ID
> > >>>>>> and class type for MT7622") set the class ID for
> > >>>>>> MT7622 to PCI_CLASS_BRIDGE_HOST, but that does not work
> > >>>>>> for MT7622:
> > >>>>>> 
> > >>>>>> In __pci_bus_assign_resources, the framework sets up a
> > >>>>>> bridge's resource window only if the class type is
> > >>>>>> PCI_CLASS_BRIDGE_PCI. Otherwise it leaves the subordinate PCIe
> > >>>>>> device's MMIO window untouched.
> > 
> > I think __pci_bus_assign_resources() should be testing dev->hdr_type
> > instead of dev->class.  The connection between "Header Type" and the
> > layout of the rest of the header is very explicit (PCI r3.0 sec 6.1,
> > PCIe r4.0 sec 7.5.1.1.9), and the reason for the switch statement in
> > __pci_bus_assign_resources() is precisely to determine which layout to
> > use.
> > 
> > There are several other uses of dev->class in setup-bus.c that I think
> > should also be changed to use dev->hdr_type.
> > 
> > If we make these changes in setup-bus.c, I suspect the class code you
> > assign won't matter too much.  There are a few other tests of the
> > class code to figure out whether to leave certain things untouched.
> > These seem a little hacky to me, but we're probably stuck with them
> > for now, so you should look and see whether they apply to your
> > situation.
> 
> If these changes could be made in the PCI core, then any class code
> would work for MT7622.
> 
> As Lorenzo pointed out, it's more reasonable for MT7622 to be defined with
> a PCI-to-PCI class code since the IP is defined as that. I intend to
> follow Lorenzo's suggestion to update the commit message and re-send
> this patch set with the current solution.
> 
> > >>>> And for MT7622, it integrated with block of internal control
> > >>>> registers, type 1 configuration space, and is considered as a
> > >>>> root complex.
> > >>> 
> > >>> I assume you mean a type 1 config header here. I do not think it
> > >>> is mandatory for a host bridge to have a type 1 config header (and
> > >>> related bridge windows + primary/secondary/subordinate bus
> > >>> numbers) but I do not know how the IP you are programming is
> > >>> designed.
> > 
> > It is definitely not mandatory for a host bridge to have a type 1
> > header.  I'm not even sure that would make sense: the "Primary Bus
> > Number" would not apply to a host bridge (since a host bridge's
> > primary bus is some sort of CPU bus, not a PCI bus), and a type 1
> > device cannot perform address translation between its primary and
> > secondary buses, while a host bridge can.
> > 
> > A Root Port is a type 1 device where the primary bus is logically
> > internal to the Root Complex.  A host bridge bridges from the CPU bus
> > to that internal bus and might perform address translation.  The Root
> > Port must be a PCI device.  A host bridge, being a bridge *to* the PCI
> > domain, is not itself generally programmed via PCI config space and
> > might not even be visible as a device in PCI config space.
> > 
> Thanks for the explanation. Per my understanding, MT7622 is more like a
> complex of Root Port and PCI-to-PCI bridge. It has type 1 header and has
> the ability to translate address between its primary and secondary
> buses.

Nope.  Logically speaking, the PCI device in question is a Root 

Re: [PATCH v6 2/9] PCI: mediatek: Fixup class ID for MT7622 as PCI_CLASS_BRIDGE_PCI

2018-10-12 Thread Bjorn Helgaas
On Fri, Oct 12, 2018 at 11:22:30AM +0100, Lorenzo Pieralisi wrote:
> On Fri, Oct 12, 2018 at 04:01:29PM +0800, Honghui Zhang wrote:
>> On Thu, 2018-10-11 at 12:38 +0100, Lorenzo Pieralisi wrote:
>>> On Tue, Oct 09, 2018 at 11:08:15AM +0800, Honghui Zhang wrote:
>>>> On Mon, 2018-10-08 at 18:23 +0100, Lorenzo Pieralisi wrote:
>>>>> On Mon, Oct 08, 2018 at 11:24:41AM +0800, honghui.zh...@mediatek.com 
>>>>> wrote:
>>>>>> From: Honghui Zhang 
>>>>>> 
>>>>>> The PCIe controller of MT7622 has a type 1 configuration
>>>>>> space header, but the HW default class type value is
>>>>>> invalid.
>>>>>> 
>>>>>> The commit 101c92dc80c8 ("PCI: mediatek: Set up vendor ID
>>>>>> and class type for MT7622") set the class ID for
>>>>>> MT7622 to PCI_CLASS_BRIDGE_HOST, but that does not work
>>>>>> for MT7622:
>>>>>> 
>>>>>> In __pci_bus_assign_resources, the framework sets up a
>>>>>> bridge's resource window only if the class type is
>>>>>> PCI_CLASS_BRIDGE_PCI. Otherwise it leaves the subordinate PCIe
>>>>>> device's MMIO window untouched.

I think __pci_bus_assign_resources() should be testing dev->hdr_type
instead of dev->class.  The connection between "Header Type" and the
layout of the rest of the header is very explicit (PCI r3.0 sec 6.1,
PCIe r4.0 sec 7.5.1.1.9), and the reason for the switch statement in
__pci_bus_assign_resources() is precisely to determine which layout to
use.

There are several other uses of dev->class in setup-bus.c that I think
should also be changed to use dev->hdr_type.

If we make these changes in setup-bus.c, I suspect the class code you
assign won't matter too much.  There are a few other tests of the
class code to figure out whether to leave certain things untouched.
These seem a little hacky to me, but we're probably stuck with them
for now, so you should look and see whether they apply to your
situation.
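
A small sketch of the difference, for illustration only.  The constants match
the kernel headers; the mock_dev struct, the layout enum, and both helper
functions are hypothetical and merely model the switch statement being
discussed.

```c
#include <assert.h>
#include <stdint.h>

/* Values as defined in include/uapi/linux/pci_regs.h and pci_ids.h. */
#define PCI_HEADER_TYPE_BRIDGE		1
#define PCI_HEADER_TYPE_CARDBUS		2
#define PCI_CLASS_BRIDGE_HOST		0x0600
#define PCI_CLASS_BRIDGE_PCI		0x0604
#define PCI_CLASS_BRIDGE_CARDBUS	0x0607

/* Hypothetical: which bridge window layout the core would program. */
enum mock_layout { LAYOUT_NONE, LAYOUT_PCI_BRIDGE, LAYOUT_CARDBUS };

struct mock_dev {
	uint8_t hdr_type;	/* Header Type, multi-function bit masked off */
	uint32_t class;		/* 24-bit class code */
};

/* Models a switch keyed on the class code. */
static enum mock_layout layout_by_class(const struct mock_dev *dev)
{
	switch (dev->class >> 8) {
	case PCI_CLASS_BRIDGE_PCI:
		return LAYOUT_PCI_BRIDGE;
	case PCI_CLASS_BRIDGE_CARDBUS:
		return LAYOUT_CARDBUS;
	default:
		return LAYOUT_NONE;
	}
}

/* Models the suggested switch keyed on the header type. */
static enum mock_layout layout_by_hdr_type(const struct mock_dev *dev)
{
	switch (dev->hdr_type) {
	case PCI_HEADER_TYPE_BRIDGE:
		return LAYOUT_PCI_BRIDGE;
	case PCI_HEADER_TYPE_CARDBUS:
		return LAYOUT_CARDBUS;
	default:
		return LAYOUT_NONE;
	}
}
```

A device with a type 1 header but a host bridge class code gets no windows
under the first helper and ordinary bridge windows under the second.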

>>>> And for MT7622, it integrated with block of internal control
>>>> registers, type 1 configuration space, and is considered as a
>>>> root complex.
>>> 
>>> I assume you mean a type 1 config header here. I do not think it
>>> is mandatory for a host bridge to have a type 1 config header (and
>>> related bridge windows + primary/secondary/subordinate bus
>>> numbers) but I do not know how the IP you are programming is
>>> designed.

It is definitely not mandatory for a host bridge to have a type 1
header.  I'm not even sure that would make sense: the "Primary Bus
Number" would not apply to a host bridge (since a host bridge's
primary bus is some sort of CPU bus, not a PCI bus), and a type 1
device cannot perform address translation between its primary and
secondary buses, while a host bridge can.

A Root Port is a type 1 device where the primary bus is logically
internal to the Root Complex.  A host bridge bridges from the CPU bus
to that internal bus and might perform address translation.  The Root
Port must be a PCI device.  A host bridge, being a bridge *to* the PCI
domain, is not itself generally programmed via PCI config space and
might not even be visible as a device in PCI config space.

Bjorn


Re: [PATCH v2] PCI/IOV: Use VF0 cached config space size for other VFs

2018-10-11 Thread Bjorn Helgaas
On Wed, Oct 10, 2018 at 06:00:10PM +0200, KarimAllah Ahmed wrote:
> Cache the config space size from VF0 and use it for all other VFs instead
> of reading it from the config space of each VF. We assume that it will be
> the same across all associated VFs.
> 
> This is an optimization when enabling SR-IOV on a device with many VFs.
> 
> Cc: Bjorn Helgaas 
> Cc: linux-...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: KarimAllah Ahmed 

Applied to pci/virtualization for v4.20, thanks!

As I mentioned last time, I think CONFIG_PCI_ATS is the wrong symbol to
test here, so I changed that to CONFIG_PCI_IOV.  I also moved the #ifdef
wrapper so the caller doesn't need an ifdef.  Please let me know if these
break anything.  The patch I applied is appended.
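
As a generic illustration of the two ideas (cache the size from VF0, and keep
the conditional compilation out of the caller), here is a userspace sketch.
It is not the patch as applied: MOCK_CONFIG_IOV, the mock_* types, the
vf_index field, and the fixed 4096 return value are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical switch standing in for a Kconfig symbol like CONFIG_PCI_IOV. */
#define MOCK_CONFIG_IOV 1

struct mock_pf {
	int cfg_size;		/* cached from VF0 */
};

struct mock_dev {
	bool is_virtfn;
	int vf_index;		/* hypothetical: 0 for VF0 */
	struct mock_pf *physfn;
	int config_reads;	/* counts simulated config space probes */
};

#if MOCK_CONFIG_IOV
static bool mock_use_cached_size(const struct mock_dev *dev)
{
	return dev->is_virtfn && dev->vf_index != 0;
}
#else
/* Stub keeps the caller free of conditionals when the option is off. */
static inline bool mock_use_cached_size(const struct mock_dev *dev)
{
	(void)dev;
	return false;
}
#endif

static int mock_cfg_space_size(struct mock_dev *dev)
{
	/* Every VF except VF0 reuses the value cached in the PF. */
	if (mock_use_cached_size(dev))
		return dev->physfn->cfg_size;

	dev->config_reads++;	/* stands in for probing the device */
	return 4096;		/* assumed: extended config space present */
}
```

In this sketch the caller stores VF0's result in the PF once; later VFs return
it without touching their own config space.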

> ---
> v1 -> v2:
> - Drop the __pci_cfg_space_size (bhelgaas@)
> - Extend pci_cfg_space_size to return the cached value for all VFs except
>   VF0 (bhelgaas@)
> ---
>  drivers/pci/iov.c   |  2 ++
>  drivers/pci/pci.h   |  1 +
>  drivers/pci/probe.c | 17 +
>  3 files changed, 20 insertions(+)
> 
> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> index c5f3cd4e..4238b53 100644
> --- a/drivers/pci/iov.c
> +++ b/drivers/pci/iov.c
> @@ -133,6 +133,8 @@ static void pci_read_vf_config_common(struct pci_dev 
> *virtfn)
>  &physfn->sriov->subsystem_vendor);
>   pci_read_config_word(virtfn, PCI_SUBSYSTEM_ID,
>  &physfn->sriov->subsystem_device);
> +
> + physfn->sriov->cfg_size = pci_cfg_space_size(virtfn);
>  }
>  
>  int pci_iov_add_virtfn(struct pci_dev *dev, int id)
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 6e0d152..2f14542 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -285,6 +285,7 @@ struct pci_sriov {
>   u16 driver_max_VFs; /* Max num VFs driver supports */
>   struct pci_dev  *dev;   /* Lowest numbered PF */
>   struct pci_dev  *self;  /* This PF */
> + u32 cfg_size;   /* VF config space size */
>   u32 class;  /* VF device */
>   u8  hdr_type;   /* VF header type */
>   u16 subsystem_vendor; /* VF subsystem vendor */
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 201f9e5..8c0f428 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1438,12 +1438,29 @@ static int pci_cfg_space_size_ext(struct pci_dev *dev)
>   return PCI_CFG_SPACE_EXP_SIZE;
>  }
>  
> +#ifdef CONFIG_PCI_ATS
> +static bool is_vf0(struct pci_dev *dev)
> +{
> + if (pci_iov_virtfn_devfn(dev->physfn, 0) == dev->devfn &&
> + pci_iov_virtfn_bus(dev->physfn, 0) == dev->bus->number)
> + return true;
> +
> + return false;
> +}
> +#endif
> +
>  int pci_cfg_space_size(struct pci_dev *dev)
>  {
>   int pos;
>   u32 status;
>   u16 class;
>  
> +#ifdef CONFIG_PCI_ATS
> + /* Read cached value for all VFs except for VF0 */
> + if (dev->is_virtfn && !is_vf0(dev))
> + return dev->physfn->sriov->cfg_size;
> +#endif
> +
>   if (dev->bus->bus_flags & PCI_BUS_FLAGS_NO_EXTCFG)
>   return PCI_CFG_SPACE_SIZE;
>  
> -- 
> 2.7.4
> 

commit 601f9f6679157b70a7a4e752baa590bd2af69ffb
Author: KarimAllah Ahmed 
Date:   Thu Oct 11 11:49:58 2018 -0500

PCI/IOV: Use VF0 cached config space size for other VFs

Cache the config space size from VF0 and use it for all other VFs instead
of reading it from the config space of each VF.  We assume that it will be
the same across all associated VFs.

This is an optimization when enabling SR-IOV on a device with many VFs.

Signed-off-by: KarimAllah Ahmed 
[bhelgaas: use CONFIG_PCI_IOV (not CONFIG_PCI_ATS), adjust is_vf0() wrapper
so caller doesn't need ifdef]
Signed-off-by: Bjorn Helgaas 

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index c5f3cd4ed766..4238b539f9d8 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -133,6 +133,8 @@ static void pci_read_vf_config_common(struct pci_dev 
*virtfn)
 &physfn->sriov->subsystem_vendor);
pci_read_config_word(virtfn, PCI_SUBSYSTEM_ID,
 &physfn->sriov->subsystem_device);
+
+   physfn->sriov->cfg_size = pci_cfg_space_size(virtfn);
 }
 
 int pci_iov_add_virtfn(struct pci_dev *dev, int id)
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 6e0d1528d471..2f1454209257 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -285,6 +285,7 @@ struct pci_sriov {
u16 driver_max_VFs; /* Max num VFs driver supports */
struct p

[PATCH v3] PCI/AER: Enable reporting for all ports

2018-10-11 Thread Bjorn Helgaas
This is another attempt to fix the AER error reporting issue reported
by Jon.  I've compiled this on x86, but can't test it myself.

If we can get this tested, I'd like to include this for v4.20.


v3: This post
Fix the problem that v2 didn't enable error reporting on Root
Ports, as pointed out by Dongdong

v2: 
https://lore.kernel.org/linux-pci/20181009231915.gc5...@bhelgaas-glaptop.roam.corp.google.com
My attempt to move the fix to the AER service driver

v1: 
https://lore.kernel.org/linux-pci/1536085989-2956-1-git-send-email-jonathan.derr...@intel.com
Jon's initial posting

---

Bjorn Helgaas (1):
  PCI/AER: Enable reporting for ports enumerated after AER driver 
registration


 drivers/pci/pcie/aer.c |   16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)


[PATCH v3] PCI/AER: Enable reporting for ports enumerated after AER driver registration

2018-10-11 Thread Bjorn Helgaas
From: Bjorn Helgaas 

Previously we enabled AER error reporting only for Switch Ports that were
enumerated prior to registering the AER service driver.  Switch Ports
enumerated after AER driver registration were left with error reporting
disabled.

A common order, which works correctly, is that we enumerate devices before
registering portdrv and the AER driver:

  - Enumerate all the devices at boot-time

  - Register portdrv and bind it to all Root Ports and Switch Ports, which
disables error reporting for these Ports

  - Register AER service driver and bind it to all Root Ports, which
enables error reporting for the Root Ports and any Switch Ports below
them

But if we enumerate devices *after* registering portdrv and the AER driver,
e.g., if a host bridge driver is loaded as a module, error reporting is not
enabled correctly:

  - Register portdrv and AER driver (this happens at boot-time)

  - Enumerate a Root Port

  - Bind portdrv to Root Port, disabling its error reporting

  - Bind AER service driver to Root Port, enabling error reporting for it
and its children (there are no children, since we haven't enumerated
them yet)

  - Enumerate Switch Port below the Root Port

  - Bind portdrv to Switch Port, disabling its error reporting

  - AER service driver doesn't bind to Switch Ports, so error reporting
remains disabled

Hot-adding a Switch fails similarly: error reporting is enabled correctly
for the Root Port, but when the Switch is enumerated, the AER service
driver doesn't claim it, so there's nothing to enable error reporting for
the Switch Ports.

Change the AER service driver so it binds to *all* PCIe Ports, including
Switch Upstream and Downstream Ports.  Enable AER error reporting for all
these Ports, but not for any children.
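
The late-enumeration sequence can be modelled with a toy simulation.  This is
a sketch under stated assumptions, not kernel code: all mock_* names are
hypothetical, each port has at most one port below it, and "binding" is
reduced to setting a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum mock_type { MOCK_ROOT_PORT, MOCK_SWITCH_PORT };

struct mock_port {
	enum mock_type type;
	bool enumerated;
	bool reporting;			/* error reporting enabled? */
	struct mock_port *child;	/* at most one port below, for brevity */
};

/* portdrv binds to every port and leaves reporting disabled. */
static void mock_portdrv_bind(struct mock_port *p)
{
	p->reporting = false;
}

/* Old behaviour: bind to Root Ports only, then walk enumerated children. */
static void mock_aer_bind_old(struct mock_port *p)
{
	if (p->type != MOCK_ROOT_PORT)
		return;
	for (; p && p->enumerated; p = p->child)
		p->reporting = true;
}

/* New behaviour: bind to any port and enable reporting for that port only. */
static void mock_aer_bind_new(struct mock_port *p)
{
	p->reporting = true;
}

/*
 * Late enumeration: both drivers are already registered, so each port is
 * bound as soon as it is enumerated, Root Port first.  Returns true only
 * if both ports end up with reporting enabled.
 */
static bool mock_late_enumeration(void (*aer_bind)(struct mock_port *))
{
	struct mock_port sw = { .type = MOCK_SWITCH_PORT };
	struct mock_port rp = { .type = MOCK_ROOT_PORT, .child = &sw };
	struct mock_port *order[] = { &rp, &sw };

	for (size_t i = 0; i < 2; i++) {
		order[i]->enumerated = true;
		mock_portdrv_bind(order[i]);
		aer_bind(order[i]);
	}
	return rp.reporting && sw.reporting;
}
```

With the old binding rule the Switch Port is still unenumerated when the Root
Port is bound, so nothing ever enables it; binding to every port removes the
dependence on enumeration order.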

Link: 
https://lore.kernel.org/linux-pci/1536085989-2956-1-git-send-email-jonathan.derr...@intel.com
Based-on-patch-by: Jon Derrick 
Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/pcie/aer.c |   16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 90b53abf621d..c40c6607849b 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1316,12 +1316,6 @@ static void aer_enable_rootport(struct aer_rpc *rpc)
pci_read_config_dword(pdev, aer_pos + PCI_ERR_UNCOR_STATUS, &reg32);
pci_write_config_dword(pdev, aer_pos + PCI_ERR_UNCOR_STATUS, reg32);
 
-   /*
-* Enable error reporting for the root port device and downstream port
-* devices.
-*/
-   set_downstream_devices_error_reporting(pdev, true);
-
/* Enable Root Port's interrupt in response to error messages */
pci_read_config_dword(pdev, aer_pos + PCI_ERR_ROOT_COMMAND, &reg32);
reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
@@ -1378,10 +1372,17 @@ static void aer_remove(struct pcie_device *dev)
  */
 static int aer_probe(struct pcie_device *dev)
 {
+   struct pci_dev *pdev = dev->port;
+   int type = pci_pcie_type(pdev);
int status;
struct aer_rpc *rpc;
struct device *device = &dev->device;
 
+   if (type == PCI_EXP_TYPE_UPSTREAM || type == PCI_EXP_TYPE_DOWNSTREAM) {
+   pci_enable_pcie_error_reporting(pdev);
+   return 0;
+   }
+
rpc = devm_kzalloc(device, sizeof(struct aer_rpc), GFP_KERNEL);
if (!rpc) {
dev_printk(KERN_DEBUG, device, "alloc AER rpc failed\n");
@@ -1399,6 +1400,7 @@ static int aer_probe(struct pcie_device *dev)
}
 
aer_enable_rootport(rpc);
+   pci_enable_pcie_error_reporting(pdev);
dev_info(device, "AER enabled with IRQ %d\n", dev->irq);
return 0;
 }
@@ -1439,7 +1441,7 @@ static pci_ers_result_t aer_root_reset(struct pci_dev 
*dev)
 
 static struct pcie_port_service_driver aerdriver = {
.name   = "aer",
-   .port_type  = PCI_EXP_TYPE_ROOT_PORT,
+   .port_type  = PCIE_ANY_PORT,
.service= PCIE_PORT_SERVICE_AER,
 
.probe  = aer_probe,



Re: [PATCH] PCI/portdrv: Enable error reporting on managed ports

2018-10-11 Thread Bjorn Helgaas
On Thu, Oct 11, 2018 at 07:58:47PM +0800, Dongdong Liu wrote:
> Hi Bjorn
> 
> > commit 15a6711649915ca3e9d1086dc88ff4b616b99aac
> > Author: Bjorn Helgaas 
> > Date:   Tue Oct 9 17:25:25 2018 -0500
> > 
> > PCI/AER: Enable reporting for ports enumerated after AER driver 
> > registration
> > 
> > Previously we enabled AER error reporting only for Switch Ports that 
> > were
> > enumerated prior to registering the AER service driver.  Switch Ports
> > enumerated after AER driver registration were left with error reporting
> > disabled.
> > 
> > A common order, which works correctly, is that we enumerate devices 
> > before
> > registering portdrv and the AER driver:
> > 
> >   - Enumerate all the devices at boot-time
> > 
> >   - Register portdrv and bind it to all Root Ports and Switch Ports, 
> > which
> > disables error reporting for these Ports
> > 
> >   - Register AER service driver and bind it to all Root Ports, which
> > enables error reporting for the Root Ports and any Switch Ports 
> > below
> > them
> > 
> > But if we enumerate devices *after* registering portdrv and the AER 
> > driver,
> > e.g., if a host bridge driver is loaded as a module, error reporting is 
> > not
> > enabled correctly:
> > 
> >   - Register portdrv and AER driver (this happens at boot-time)
> > 
> >   - Enumerate a Root Port
> > 
> >   - Bind portdrv to Root Port, disabling its error reporting
> > 
> >   - Bind AER service driver to Root Port, enabling error reporting for 
> > it
> > and its children (none, since we haven't enumerated them yet)
> > 
> >   - Enumerate Switch Port below the Root Port
> > 
> >   - Bind portdrv to Switch Port, disabling its error reporting
> > 
> >   - AER service driver doesn't bind to Switch Ports, so error reporting
> > remains disabled
> > 
> > Hot-adding a Switch fails similarly: error reporting is enabled 
> > correctly
> > for the Root Port, but when the Switch is enumerated, the AER service
> > driver doesn't claim it, so there's nothing to enable error reporting 
> > for
> > the Switch Ports.
> > 
> > Change the AER service driver so it binds to *all* PCIe ports, including
> > Switch Upstream and Downstream Ports.  For Switch Ports, enable AER 
> > error
> > reporting.
> > 
> > Link: 
> > https://lore.kernel.org/linux-pci/1536085989-2956-1-git-send-email-jonathan.derr...@intel.com
> > Based-on-patch-by: Jon Derrick 
> > Signed-off-by: Bjorn Helgaas 
> > 
> > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > index 90b53abf621d..fe6c16461367 100644
> > --- a/drivers/pci/pcie/aer.c
> > +++ b/drivers/pci/pcie/aer.c
> > @@ -1316,12 +1316,6 @@ static void aer_enable_rootport(struct aer_rpc *rpc)
> > pci_read_config_dword(pdev, aer_pos + PCI_ERR_UNCOR_STATUS, &reg32);
> > pci_write_config_dword(pdev, aer_pos + PCI_ERR_UNCOR_STATUS, reg32);
> > 
> > -   /*
> > -* Enable error reporting for the root port device and downstream port
> > -* devices.
> > -*/
> > -   set_downstream_devices_error_reporting(pdev, true);
> > -
> 
> Deleting this code will also leave error reporting disabled for the Root
> Port, because binding portdrv to the Root Port has already disabled it,
> so we need to enable error reporting for the Root Port:
> +pci_enable_pcie_error_reporting(pdev);

Oh, you're right, thank you!

I'll post a "v3" to fix this, i.e.,

  v1 - Jon's original post, 
https://lore.kernel.org/linux-pci/1536085989-2956-1-git-send-email-jonathan.derr...@intel.com
  v2 - My rework, 
https://lore.kernel.org/linux-pci/20181009231915.gc5...@bhelgaas-glaptop.roam.corp.google.com
  v3 - My rework + enable error reporting for Root Ports

Bjorn


Re: [PATCH v9 00/13] Copy Offload in NVMe Fabrics with P2P PCI Memory

2018-10-10 Thread Bjorn Helgaas
On Wed, Oct 10, 2018 at 05:03:33PM -0600, Logan Gunthorpe wrote:
> 
> 
> On 2018-10-10 2:19 p.m., Bjorn Helgaas wrote:
> > I added the reviewed-by tags from Christoph, Jens' ack on the blkdev.h
> > change, and applied these to pci/peer-to-peer with the intent of
> > merging these for v4.20.
> > 
> > I gave up on waiting for an ack for the memremap.h and mm.h changes.
> > 
> > I dropped the "nvme-pci: Add a quirk for a pseudo CMB" quirk because
> > of Christoph's objection.  After this is all merged, I won't need to
> > be involved, and you and the NVMe folks can hash that out.
> > 
> > If there are updates to "nvmet: Optionally use PCI P2P memory" based
> > on Sagi's comments, send an incremental patch and I'll fold them in.
> 
> Thanks for picking this up. However, I hate to throw a wrench in the
> works, but I had a v10[1] queued up because kbuild found some problems
> with the series over the weekend. I can send v10 off right away if you
> want to just replace it in your branch or, if you'd like, I can generate
> some incremental patches. Let me know which you'd prefer.

I applied the updates from your v10 to my pci/peer-to-peer branch.

> [1] https://github.com/sbates130272/linux-p2pmem pci-p2p-v10


Re: [PATCH v2] PCI: Fix Switchtec DMA aliasing quirk dmesg noise

2018-10-10 Thread Bjorn Helgaas
On Fri, Oct 05, 2018 at 09:49:40AM -0600, Logan Gunthorpe wrote:
> Currently the Switchtec quirk runs on all endpoints in the Switch
> which includes all the upstream and downstream ports. Since these
> other functions do not contain BARs, the quirk fails when trying to
> map the BAR and prints the error "Cannot iomap Switchtec device".
> The user will see a few of these useless and scary errors, one for
> each port in the switch.
> 
> At most, the quirk should only run on either a management endpoint
> (class=PCI_CLASS_MEMORY_OTHER) or an NTB endpoint
> (PCI_CLASS_BRIDGE_OTHER). However, since the quirk is useless except
> in NTB applications, we will only run it when the class is
> PCI_CLASS_BRIDGE_OTHER.
> 
> Thus, switch to using DECLARE_PCI_FIXUP_CLASS_FINAL and clean up
> the list with a define (so we don't have to change as much code if
> we ever have to adjust the list).
> 
> Reported-by: Stephen Bates 
> Cc: Doug Meyer 
> Cc: Bjorn Helgaas 
> Cc: Kurt Schwemmer 
> Fixes: ad281ecf1c7d ("PCI: Add DMA alias quirk for Microsemi Switchtec NTB")
> Signed-off-by: Logan Gunthorpe 

Applied to pci/misc for v4.20, thanks!

I split this into two patches so the important change doesn't get lost in
the noise of the SWITCHTEC_QUIRK() addition:

  - Add the SWITCHTEC_QUIRK() macro, but don't change anything else
  - Change SWITCHTEC_QUIRK() to use DECLARE_PCI_FIXUP_CLASS_FINAL
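
To illustrate why the class-based fixup quiets the noise, here is a
minimal userspace sketch.  It is not the kernel's fixup machinery: the
toy_* names, the table layout, and the vendor value are assumptions made
up for the example.  Only the idea comes from the patch: one macro
expands to a table entry, and the entry carries a class filter so the
hook runs for the NTB endpoint but not for the switch ports.

```c
#include <assert.h>
#include <stddef.h>

/* Base class/subclass codes as they appear in the upper 16 bits of the
 * 24-bit class register. */
#define TOY_CLASS_ANY          (~0u)
#define TOY_CLASS_BRIDGE_PCI   0x0604u
#define TOY_CLASS_BRIDGE_OTHER 0x0680u
#define TOY_VENDOR             0x11f8u	/* placeholder vendor ID for the sketch */

struct toy_dev {
	unsigned int vendor, device;
	unsigned int dev_class;		/* 24-bit class code, prog-if in low byte */
	int hook_runs;
};

struct toy_fixup {
	unsigned int vendor, device, dev_class, class_shift;
	void (*hook)(struct toy_dev *);
};

static void toy_quirk(struct toy_dev *d)
{
	d->hook_runs++;
}

/* One place to change if the class filter or hook ever needs adjusting. */
#define TOY_QUIRK(dev_id) \
	{ TOY_VENDOR, (dev_id), TOY_CLASS_BRIDGE_OTHER, 8, toy_quirk }

static const struct toy_fixup toy_fixups[] = {
	TOY_QUIRK(0x8531),
	TOY_QUIRK(0x8532),
};

/* Run every fixup whose vendor, device, and class filter match. */
static void toy_run_fixups(struct toy_dev *d)
{
	size_t i;

	assert(d);
	for (i = 0; i < sizeof(toy_fixups) / sizeof(toy_fixups[0]); i++) {
		const struct toy_fixup *f = &toy_fixups[i];

		if (f->vendor != d->vendor || f->device != d->device)
			continue;
		if (f->dev_class != TOY_CLASS_ANY &&
		    f->dev_class != (d->dev_class >> f->class_shift))
			continue;
		f->hook(d);
	}
}
```

With this table, a function whose class is PCI-to-PCI bridge (a switch
port) is skipped, while a function with the same device ID and class
"other bridge" gets the hook.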

> ---
> 
> * v2: Changes comment style, per feedback from Christoph
> 
>  drivers/pci/quirks.c | 90 +---
>  1 file changed, 34 insertions(+), 56 deletions(-)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 6bc27b7fd452..0f072aed30f5 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -5057,59 +5057,37 @@ static void quirk_switchtec_ntb_dma_alias(struct pci_dev *pdev)
>   pci_iounmap(pdev, mmio);
>   pci_disable_device(pdev);
>  }
> -DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8531,
> - quirk_switchtec_ntb_dma_alias);
> -DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8532,
> - quirk_switchtec_ntb_dma_alias);
> -DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8533,
> - quirk_switchtec_ntb_dma_alias);
> -DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8534,
> - quirk_switchtec_ntb_dma_alias);
> -DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8535,
> - quirk_switchtec_ntb_dma_alias);
> -DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8536,
> - quirk_switchtec_ntb_dma_alias);
> -DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8543,
> - quirk_switchtec_ntb_dma_alias);
> -DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8544,
> - quirk_switchtec_ntb_dma_alias);
> -DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8545,
> - quirk_switchtec_ntb_dma_alias);
> -DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8546,
> - quirk_switchtec_ntb_dma_alias);
> -DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8551,
> - quirk_switchtec_ntb_dma_alias);
> -DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8552,
> - quirk_switchtec_ntb_dma_alias);
> -DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8553,
> - quirk_switchtec_ntb_dma_alias);
> -DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8554,
> - quirk_switchtec_ntb_dma_alias);
> -DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8555,
> - quirk_switchtec_ntb_dma_alias);
> -DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8556,
> - quirk_switchtec_ntb_dma_alias);
> -DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8561,
> - quirk_switchtec_ntb_dma_alias);
> -DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8562,
> - quirk_switchtec_ntb_dma_alias);
> -DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8563,
> - quirk_switchtec_ntb_dma_alias);
> -DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8564,
> - quirk_switchtec_ntb_dma_alias);
> -DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8565,
> - quirk_switchtec_ntb_dma_alias);
> -DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8566,
> - quirk_switchtec_ntb_dma_alias);
> -DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8571,
> - quirk_switchtec_ntb_dma_alias);
> -DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8572,
> - quirk_switchtec_ntb_dma_alias);
> -DEC

Re: [PATCH 3/3] PCI: remove pci_set_dma_max_seg_size

2018-10-10 Thread Bjorn Helgaas
[+cc maintainers]

On Tue, Oct 09, 2018 at 04:08:24PM +0200, Christoph Hellwig wrote:
> The few callers can just use dma_set_max_seg_size directly.

I intend to apply this, just FYI about these trivial changes to your
drivers.
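
For anyone wondering why the conversion is mechanical: the PCI helper
only forwards to the generic DMA call on the embedded struct device.
Below is a small userspace model of that relationship.  The toy_* types
and functions are invented for the sketch and only mimic the shape of
the real structures; the -EIO return for a device without DMA parameters
is an assumption of this model.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_dma_parms {
	unsigned int max_segment_size;
};

struct toy_device {
	struct toy_dma_parms *dma_parms;	/* may be NULL */
};

struct toy_pci_dev {
	struct toy_device dev;			/* embedded generic device */
	struct toy_dma_parms dma_parms;
};

/* Generic call: works on any device that has DMA parameters. */
static int toy_dma_set_max_seg_size(struct toy_device *dev, unsigned int size)
{
	assert(dev);
	if (!dev->dma_parms)
		return -EIO;
	dev->dma_parms->max_segment_size = size;
	return 0;
}

/* The kind of thin wrapper the patch removes: a pure pass-through. */
static int toy_pci_set_dma_max_seg_size(struct toy_pci_dev *pdev,
					unsigned int size)
{
	return toy_dma_set_max_seg_size(&pdev->dev, size);
}
```

Calling either function on the same device leaves it in the same state,
which is why callers can use the generic one directly.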

> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/ata/sata_inic162x.c| 2 +-
>  drivers/block/rsxx/core.c  | 2 +-
>  drivers/pci/probe.c| 2 +-
>  drivers/s390/net/ism_drv.c | 2 +-
>  drivers/scsi/aacraid/linit.c   | 2 +-
>  include/linux/pci-dma-compat.h | 9 -
>  6 files changed, 5 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/ata/sata_inic162x.c b/drivers/ata/sata_inic162x.c
> index 9b6d7930d1c7..e0bcf9b2dab0 100644
> --- a/drivers/ata/sata_inic162x.c
> +++ b/drivers/ata/sata_inic162x.c
> @@ -873,7 +873,7 @@ static int inic_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
>* like others but it will lock up the whole machine HARD if
>* 65536 byte PRD entry is fed. Reduce maximum segment size.
>*/
> - rc = pci_set_dma_max_seg_size(pdev, 65536 - 512);
> + rc = dma_set_max_seg_size(&pdev->dev, 65536 - 512);
>   if (rc) {
>   dev_err(&pdev->dev, "failed to set the maximum segment size\n");
>   return rc;
> diff --git a/drivers/block/rsxx/core.c b/drivers/block/rsxx/core.c
> index f2c631ce793c..37df486c7c3c 100644
> --- a/drivers/block/rsxx/core.c
> +++ b/drivers/block/rsxx/core.c
> @@ -780,7 +780,7 @@ static int rsxx_pci_probe(struct pci_dev *dev,
>   goto failed_enable;
>  
>   pci_set_master(dev);
> - pci_set_dma_max_seg_size(dev, RSXX_HW_BLK_SIZE);
> + dma_set_max_seg_size(&dev->dev, RSXX_HW_BLK_SIZE);
>  
>   st = pci_set_dma_mask(dev, DMA_BIT_MASK(64));
>   if (st) {
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index fc6340d76814..f97513b3d030 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2397,7 +2397,7 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
>   dev->dev.dma_parms = >dma_parms;
>   dev->dev.coherent_dma_mask = 0xull;
>  
> - pci_set_dma_max_seg_size(dev, 65536);
> + dma_set_max_seg_size(&dev->dev, 65536);
>   dma_set_seg_boundary(>dev, 0x);
>  
>   /* Fix up broken headers */
> diff --git a/drivers/s390/net/ism_drv.c b/drivers/s390/net/ism_drv.c
> index 8688c0fff761..f96ec68af2e5 100644
> --- a/drivers/s390/net/ism_drv.c
> +++ b/drivers/s390/net/ism_drv.c
> @@ -516,7 +516,7 @@ static int ism_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>   goto err_unmap;
>  
>   dma_set_seg_boundary(&pdev->dev, SZ_1M - 1);
> - pci_set_dma_max_seg_size(pdev, SZ_1M);
> + dma_set_max_seg_size(&pdev->dev, SZ_1M);
>   pci_set_master(pdev);
>  
>   ism->smcd = smcd_alloc_dev(>dev, dev_name(>dev), _ops,
> diff --git a/drivers/scsi/aacraid/linit.c b/drivers/scsi/aacraid/linit.c
> index 04443577d48b..53eb2e9569b9 100644
> --- a/drivers/scsi/aacraid/linit.c
> +++ b/drivers/scsi/aacraid/linit.c
> @@ -1747,7 +1747,7 @@ static int aac_probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
>   shost->max_sectors = (shost->sg_tablesize * 8) + 112;
>   }
>  
> - error = pci_set_dma_max_seg_size(pdev,
> + error = dma_set_max_seg_size(&pdev->dev,
>   (aac->adapter_info.options & AAC_OPT_NEW_COMM) ?
>   (shost->max_sectors << 9) : 65536);
>   if (error)
> diff --git a/include/linux/pci-dma-compat.h b/include/linux/pci-dma-compat.h
> index 558a109ab497..cb1adf0b78a9 100644
> --- a/include/linux/pci-dma-compat.h
> +++ b/include/linux/pci-dma-compat.h
> @@ -119,20 +119,11 @@ static inline int pci_set_consistent_dma_mask(struct pci_dev *dev, u64 mask)
>  {
>   return dma_set_coherent_mask(>dev, mask);
>  }
> -
> -static inline int pci_set_dma_max_seg_size(struct pci_dev *dev,
> -unsigned int size)
> -{
> - return dma_set_max_seg_size(>dev, size);
> -}
>  #else
>  static inline int pci_set_dma_mask(struct pci_dev *dev, u64 mask)
>  { return -EIO; }
>  static inline int pci_set_consistent_dma_mask(struct pci_dev *dev, u64 mask)
>  { return -EIO; }
> -static inline int pci_set_dma_max_seg_size(struct pci_dev *dev,
> -unsigned int size)
> -{ return -EIO; }
>  #endif
>  
>  #endif
> -- 
> 2.19.0
> 


Re: [PATCH 2/3] PCI: remove pci_set_dma_seg_boundary

2018-10-10 Thread Bjorn Helgaas
[+cc s390 network maintainers]

On Tue, Oct 09, 2018 at 04:08:23PM +0200, Christoph Hellwig wrote:
> The two callers can just use dma_set_seg_boundary directly.

I intend to apply this trivial patch, so just FYI.

> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/pci/probe.c| 2 +-
>  drivers/s390/net/ism_drv.c | 2 +-
>  include/linux/pci-dma-compat.h | 9 -
>  3 files changed, 2 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 201f9e5ff55c..fc6340d76814 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2398,7 +2398,7 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
>   dev->dev.coherent_dma_mask = 0xull;
>  
>   pci_set_dma_max_seg_size(dev, 65536);
> - pci_set_dma_seg_boundary(dev, 0x);
> + dma_set_seg_boundary(>dev, 0x);
>  
>   /* Fix up broken headers */
>   pci_fixup_device(pci_fixup_header, dev);
> diff --git a/drivers/s390/net/ism_drv.c b/drivers/s390/net/ism_drv.c
> index c0631895154e..8688c0fff761 100644
> --- a/drivers/s390/net/ism_drv.c
> +++ b/drivers/s390/net/ism_drv.c
> @@ -515,7 +515,7 @@ static int ism_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>   if (ret)
>   goto err_unmap;
>  
> - pci_set_dma_seg_boundary(pdev, SZ_1M - 1);
> + dma_set_seg_boundary(&pdev->dev, SZ_1M - 1);
>   pci_set_dma_max_seg_size(pdev, SZ_1M);
>   pci_set_master(pdev);
>  
> diff --git a/include/linux/pci-dma-compat.h b/include/linux/pci-dma-compat.h
> index c3f1b44ade29..558a109ab497 100644
> --- a/include/linux/pci-dma-compat.h
> +++ b/include/linux/pci-dma-compat.h
> @@ -125,12 +125,6 @@ static inline int pci_set_dma_max_seg_size(struct pci_dev *dev,
>  {
>   return dma_set_max_seg_size(>dev, size);
>  }
> -
> -static inline int pci_set_dma_seg_boundary(struct pci_dev *dev,
> -unsigned long mask)
> -{
> - return dma_set_seg_boundary(>dev, mask);
> -}
>  #else
>  static inline int pci_set_dma_mask(struct pci_dev *dev, u64 mask)
>  { return -EIO; }
> @@ -139,9 +133,6 @@ static inline int pci_set_consistent_dma_mask(struct pci_dev *dev, u64 mask)
>  static inline int pci_set_dma_max_seg_size(struct pci_dev *dev,
>  unsigned int size)
>  { return -EIO; }
> -static inline int pci_set_dma_seg_boundary(struct pci_dev *dev,
> -unsigned long mask)
> -{ return -EIO; }
>  #endif
>  
>  #endif
> -- 
> 2.19.0
> 


Re: [PATCH 1/3] PCI: remove DMA unmap wrappers

2018-10-10 Thread Bjorn Helgaas
[+cc folks from MAINTAINERS]

On Tue, Oct 09, 2018 at 04:08:22PM +0200, Christoph Hellwig wrote:
> Only some of these were still used by the cxgb4 driver, and that despite
> the fact that the driver otherwise uses the generic DMA API.

This is trivial and I intend to apply it, so just copying cxgb4 folks
as FYI.

> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/infiniband/hw/cxgb4/qp.c | 10 +-
>  drivers/infiniband/hw/cxgb4/t4.h |  2 +-
>  include/linux/pci-dma.h  | 12 
>  include/linux/pci.h  |  1 -
>  4 files changed, 6 insertions(+), 19 deletions(-)
>  delete mode 100644 include/linux/pci-dma.h
> 
> diff --git a/drivers/infiniband/hw/cxgb4/qp.c b/drivers/infiniband/hw/cxgb4/qp.c
> index 347fe18b1a41..62d6f197ec0b 100644
> --- a/drivers/infiniband/hw/cxgb4/qp.c
> +++ b/drivers/infiniband/hw/cxgb4/qp.c
> @@ -99,7 +99,7 @@ static void dealloc_oc_sq(struct c4iw_rdev *rdev, struct t4_sq *sq)
>  static void dealloc_host_sq(struct c4iw_rdev *rdev, struct t4_sq *sq)
>  {
>   dma_free_coherent(&(rdev->lldi.pdev->dev), sq->memsize, sq->queue,
> -   pci_unmap_addr(sq, mapping));
> +   dma_unmap_addr(sq, mapping));
>  }
>  
>  static void dealloc_sq(struct c4iw_rdev *rdev, struct t4_sq *sq)
> @@ -132,7 +132,7 @@ static int alloc_host_sq(struct c4iw_rdev *rdev, struct t4_sq *sq)
>   if (!sq->queue)
>   return -ENOMEM;
>   sq->phys_addr = virt_to_phys(sq->queue);
> - pci_unmap_addr_set(sq, mapping, sq->dma_addr);
> + dma_unmap_addr_set(sq, mapping, sq->dma_addr);
>   return 0;
>  }
>  
> @@ -2521,7 +2521,7 @@ static void free_srq_queue(struct c4iw_srq *srq, struct c4iw_dev_ucontext *uctx,
>  
>   dma_free_coherent(>lldi.pdev->dev,
> wq->memsize, wq->queue,
> - pci_unmap_addr(wq, mapping));
> + dma_unmap_addr(wq, mapping));
>   c4iw_rqtpool_free(rdev, wq->rqt_hwaddr, wq->rqt_size);
>   kfree(wq->sw_rq);
>   c4iw_put_qpid(rdev, wq->qid, uctx);
> @@ -2570,7 +2570,7 @@ static int alloc_srq_queue(struct c4iw_srq *srq, struct c4iw_dev_ucontext *uctx,
>   goto err_free_rqtpool;
>  
>   memset(wq->queue, 0, wq->memsize);
> - pci_unmap_addr_set(wq, mapping, wq->dma_addr);
> + dma_unmap_addr_set(wq, mapping, wq->dma_addr);
>  
>   wq->bar2_va = c4iw_bar2_addrs(rdev, wq->qid, T4_BAR2_QTYPE_EGRESS,
> >bar2_qid,
> @@ -2649,7 +2649,7 @@ static int alloc_srq_queue(struct c4iw_srq *srq, struct c4iw_dev_ucontext *uctx,
>  err_free_queue:
>   dma_free_coherent(>lldi.pdev->dev,
> wq->memsize, wq->queue,
> - pci_unmap_addr(wq, mapping));
> + dma_unmap_addr(wq, mapping));
>  err_free_rqtpool:
>   c4iw_rqtpool_free(rdev, wq->rqt_hwaddr, wq->rqt_size);
>  err_free_pending_wrs:
> diff --git a/drivers/infiniband/hw/cxgb4/t4.h b/drivers/infiniband/hw/cxgb4/t4.h
> index e42021fd6fd6..fff6d48d262f 100644
> --- a/drivers/infiniband/hw/cxgb4/t4.h
> +++ b/drivers/infiniband/hw/cxgb4/t4.h
> @@ -397,7 +397,7 @@ struct t4_srq_pending_wr {
>  struct t4_srq {
>   union t4_recv_wr *queue;
>   dma_addr_t dma_addr;
> - DECLARE_PCI_UNMAP_ADDR(mapping);
> + DEFINE_DMA_UNMAP_ADDR(mapping);
>   struct t4_swrqe *sw_rq;
>   void __iomem *bar2_va;
>   u64 bar2_pa;
> diff --git a/include/linux/pci-dma.h b/include/linux/pci-dma.h
> deleted file mode 100644
> index 0f7aa7353ca3..
> --- a/include/linux/pci-dma.h
> +++ /dev/null
> @@ -1,12 +0,0 @@
> -/* SPDX-License-Identifier: GPL-2.0 */
> -#ifndef _LINUX_PCI_DMA_H
> -#define _LINUX_PCI_DMA_H
> -
> -#define DECLARE_PCI_UNMAP_ADDR(ADDR_NAME) DEFINE_DMA_UNMAP_ADDR(ADDR_NAME);
> -#define DECLARE_PCI_UNMAP_LEN(LEN_NAME)   DEFINE_DMA_UNMAP_LEN(LEN_NAME);
> -#define pci_unmap_addr dma_unmap_addr
> -#define pci_unmap_addr_set dma_unmap_addr_set
> -#define pci_unmap_len  dma_unmap_len
> -#define pci_unmap_len_set  dma_unmap_len_set
> -
> -#endif
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 6925828f9f25..e938e80e59c1 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1342,7 +1342,6 @@ int pci_set_vga_state(struct pci_dev *pdev, bool decode,
>  
>  /* kmem_cache style wrapper around pci_alloc_consistent() */
>  
> -#include 
>  #include 
>  
>  #define  pci_pool dma_pool
> -- 
> 2.19.0
> 


Re: [PATCH] PCI/portdrv: Enable error reporting on managed ports

2018-10-09 Thread Bjorn Helgaas
On Tue, Oct 09, 2018 at 07:51:58PM +, Derrick, Jonathan wrote:
> On Tue, 2018-10-09 at 12:56 -0500, Bjorn Helgaas wrote:
> > On Tue, Sep 04, 2018 at 12:33:09PM -0600, Jon Derrick wrote:
> > > During probe, the port driver will disable error reporting and
> > > assumes it will be enabled later by the AER driver's
> > > pci_walk_bus() sequence.  This may not be the case for
> > > host-bridge enabled root ports, which will first enable error
> > > reporting on the bus during the root port probe, and then
> > > disable error reporting on downstream devices during subsequent
> > > probing of the bus.
> > 
> > I understand the hotplug case (see below), but help me understand
> > this "host-bridge enabled root ports" thing.  I'm not sure what
> > that means.
>
> Sorry for the confusion. I meant a device which doesn't expose the
> root ports with firmware but has to expose the root ports using
> pci_create_root_bus or similar methods. These methods use a host
> bridge aperture on the backend.

I guess the contrast you're making is between drivers/acpi/pci_root.c,
which claims ACPI PNP0A03 and PNP0A08 devices, and what I call the
"native" host bridge drivers, which normally claim DT platform devices
and know about the register layout and programming model of those
devices?

In both cases the PCI core has to know about the host bridge apertures
(the address ranges on the primary (upstream) side of the host bridge
that are translated to PCI addresses on the secondary (downstream)
side of the bridge).
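
As a rough picture of what an aperture is, here is a tiny userspace
sketch.  The toy_aperture struct and its field names are invented for
the example; the only assumption carried over from the text above is
that an address inside the CPU-side window maps to a PCI bus address,
modeled here as a fixed offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host bridge window; cpu_end is inclusive. */
struct toy_aperture {
	uint64_t cpu_start;
	uint64_t cpu_end;
	uint64_t offset;	/* CPU address minus PCI bus address */
};

/*
 * Translate a CPU (primary side) address to a PCI (secondary side)
 * address.  Returns false if the address is outside the window, in
 * which case the bridge would not forward it.
 */
static bool toy_cpu_to_pci(const struct toy_aperture *ap, uint64_t cpu,
			   uint64_t *pci)
{
	assert(ap && pci && ap->cpu_start <= ap->cpu_end);
	if (cpu < ap->cpu_start || cpu > ap->cpu_end)
		return false;
	*pci = cpu - ap->offset;
	return true;
}
```

Whether the window was described by ACPI or by a native driver, the core
needs the same three pieces of information.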

I'm still not seeing the connection between the ACPI/native
distinction or the host bridge apertures and this AER enablement
issue.  But I think the problem has to do with the ordering between
enumeration and portdrv/AER driver binding.  See below.

> > We run pcie_portdrv_probe() for every root port, switch upstream
> > port, and switch downstream port, and it always disables error
> > reporting for the port:
> > 
> >   pcie_portdrv_probe  # pci_driver .probe
> > pcie_port_device_register
> >   get_port_device_capability
> > services |= PCIE_PORT_SERVICE_AER
> > pci_disable_pcie_error_reporting
> >   # clear DEVCTL Error Reporting Enables
> > 
> > For root ports, we call aer_probe(), and it enables error reporting
> > for the entire tree below the root port:
> > 
> >   aer_probe   # pcie_port_service .probe
> > aer_enable_rootport
> >   set_downstream_devices_error_reporting(dev, true)
> > pci_walk_bus(dev->subordinate, set_device_error_reporting)
> >   set_device_error_reporting
> > if (Root Port || Upstream Port || Downstream Port)
> >   pci_enable_pcie_error_reporting
> > # set DEVCTL Error Reporting Enables
> > 
> > This is definitely broken for hot-added switches because
> > aer_probe() is the only place we enable error reporting, and it's
> > only run when we enumerate a root port, not when we hot-add things
> > below that root port.
> 
> I don't currently have the hardware to test hotplugging a switch,
> although I think it should be possible to test with Thunderbolt.
> Mika?  :)

It seems clear enough to me that this is broken; I don't think we need
more testing to confirm it.  Your scenario below looks like it's
probably from VMD, given the domain number, and the fact that the
timestamps look like they're after boot suggests that VMD is being
loaded as a module.

I think I can work out the order of events there:

  - register pcie_portdriver   # device_initcall
  - register aerdriver # device_initcall, after portdrv
  - load VMD module
  - add VMD host bridge to domain 1
  - 1:00:00.0: enumerate VMD root port
  - 1:00:00.0: bind portdrv, disable error reporting
  - 1:00:00.0: bind aerdriver, enable error reporting for children
  - 1:01:00.0: enumerate VMD switch upstream port
  - 1:01:00.0: bind portdrv, disable error reporting
  - 1:01:00.0: do not bind AER driver (because not a root port)
  - 1:02:0x.0: enumerate VMD switch downstream ports
  - 1:02:0x.0: bind portdrv, disable error reporting
  - 1:02:0x.0: do not bind AER driver (not root ports)
  - 1:06:00.0: enumerate NVMe endpoint
  - 1:06:00.0: nvme driver enables error reporting

The end state is that error reporting is enabled only for the root
port and the NVMe device, but not for the switch ports in between.
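
The sequence above can be replayed with a small userspace model.  This
is only a sketch: the toy_* names are invented, the hierarchy is
flattened into an array with the root port at index 0, and the three
"probe" steps are reduced to the one side effect discussed here (setting
or clearing the error reporting enable).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_kind { TOY_ROOT_PORT, TOY_UPSTREAM, TOY_DOWNSTREAM, TOY_ENDPOINT };

struct toy_dev {
	enum toy_kind kind;
	bool enumerated;
	bool reporting;		/* models the DEVCTL error reporting enables */
};

static bool toy_is_port(const struct toy_dev *d)
{
	return d->kind != TOY_ENDPOINT;
}

/* portdrv binds to every port and always disables error reporting. */
static void toy_portdrv_probe(struct toy_dev *d)
{
	d->reporting = false;
}

/*
 * The AER driver binds to the root port only and enables reporting on
 * the ports that have been enumerated so far.
 */
static void toy_aer_probe(struct toy_dev *tree, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (tree[i].enumerated && toy_is_port(&tree[i]))
			tree[i].reporting = true;
}

/* Drivers are already registered; each device binds as it is enumerated. */
static void toy_enumerate_top_down(struct toy_dev *tree, size_t n)
{
	size_t i;

	assert(n > 0 && tree[0].kind == TOY_ROOT_PORT);
	for (i = 0; i < n; i++) {
		struct toy_dev *d = &tree[i];

		d->enumerated = true;
		if (toy_is_port(d))
			toy_portdrv_probe(d);
		if (d->kind == TOY_ROOT_PORT)
			toy_aer_probe(tree, n);
		if (d->kind == TOY_ENDPOINT)
			d->reporting = true;	/* endpoint driver enables its own */
	}
}
```

Running this on a root port, one upstream port, one downstream port, and
one endpoint ends with reporting enabled on the first and last entries
only, which matches the end state described above.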

I think the critical ordering is the portdrv/AER driver registration
vs. the device enumeration.  If we enumerate the devices before
registering portdrv/AER (as is the typical case with ACPI host
bridges), when we register portdrv, we'll bind portdrv to all the
bri

Re: [PATCH] PCI/portdrv: Enable error reporting on managed ports

2018-10-09 Thread Bjorn Helgaas
On Tue, Sep 04, 2018 at 12:33:09PM -0600, Jon Derrick wrote:
> During probe, the port driver will disable error reporting and assumes
> it will be enabled later by the AER driver's pci_walk_bus() sequence.
> This may not be the case for host-bridge enabled root ports, which will
> first enable error reporting on the bus during the root port probe, and
> then disable error reporting on downstream devices during subsequent
> probing of the bus.

I understand the hotplug case (see below), but help me understand this
"host-bridge enabled root ports" thing.  I'm not sure what that means.

We run pcie_portdrv_probe() for every root port, switch upstream port,
and switch downstream port, and it always disables error reporting for
the port:

  pcie_portdrv_probe  # pci_driver .probe
pcie_port_device_register
  get_port_device_capability
services |= PCIE_PORT_SERVICE_AER
pci_disable_pcie_error_reporting
  # clear DEVCTL Error Reporting Enables

For root ports, we call aer_probe(), and it enables error reporting
for the entire tree below the root port:

  aer_probe   # pcie_port_service .probe
aer_enable_rootport
  set_downstream_devices_error_reporting(dev, true)
pci_walk_bus(dev->subordinate, set_device_error_reporting)
  set_device_error_reporting
if (Root Port || Upstream Port || Downstream Port)
  pci_enable_pcie_error_reporting
# set DEVCTL Error Reporting Enables

This is definitely broken for hot-added switches because aer_probe()
is the only place we enable error reporting, and it's only run when we
enumerate a root port, not when we hot-add things below that root
port.

> A hotplugged port device may also fail to enable error reporting as the
> AER driver has already run on the root bus.

> Check for these conditions and enable error reporting during portdrv
> probing.
> 
> Example case:

pcie_portdrv_probe(1:00:00.0):
> [  343.790573] pcieport 1:00:00.0: pci_disable_pcie_error_reporting

aer_probe(1:00:00.0):
> [  343.809812] pcieport 1:00:00.0: pci_enable_pcie_error_reporting
> [  343.819506] pci 1:01:00.0: pci_enable_pcie_error_reporting
> [  343.828814] pci 1:02:00.0: pci_enable_pcie_error_reporting
> [  343.838089] pci 1:02:01.0: pci_enable_pcie_error_reporting
> [  343.847478] pci 1:02:02.0: pci_enable_pcie_error_reporting
> [  343.856659] pci 1:02:03.0: pci_enable_pcie_error_reporting
> [  343.865794] pci 1:02:04.0: pci_enable_pcie_error_reporting
> [  343.874875] pci 1:02:05.0: pci_enable_pcie_error_reporting
> [  343.883918] pci 1:02:06.0: pci_enable_pcie_error_reporting
> [  343.892922] pci 1:02:07.0: pci_enable_pcie_error_reporting

pcie_portdrv_probe(1:01:00.0):
> [  343.918900] pcieport 1:01:00.0: pci_disable_pcie_error_reporting

pcie_portdrv_probe(1:02:00.0):
> [  343.968426] pcieport 1:02:00.0: pci_disable_pcie_error_reporting

...
> [  344.028179] pcieport 1:02:01.0: pci_disable_pcie_error_reporting
> [  344.091269] pcieport 1:02:02.0: pci_disable_pcie_error_reporting
> [  344.156473] pcieport 1:02:03.0: pci_disable_pcie_error_reporting
> [  344.238042] pcieport 1:02:04.0: pci_disable_pcie_error_reporting
> [  344.321864] pcieport 1:02:05.0: pci_disable_pcie_error_reporting
> [  344.411601] pcieport 1:02:06.0: pci_disable_pcie_error_reporting
> [  344.505332] pcieport 1:02:07.0: pci_disable_pcie_error_reporting

> [  344.621824] nvme 1:06:00.0: pci_enable_pcie_error_reporting
> 
> Signed-off-by: Jon Derrick 
> ---
>  drivers/pci/pcie/portdrv_core.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
> index 7c37d81..fdd953a 100644
> --- a/drivers/pci/pcie/portdrv_core.c
> +++ b/drivers/pci/pcie/portdrv_core.c
> @@ -343,6 +343,16 @@ int pcie_port_device_register(struct pci_dev *dev)
>   if (!nr_service)
>   goto error_cleanup_irqs;
>  
> +#ifdef CONFIG_PCIEAER
> + /*
> +  * Enable error reporting for this port in case AER probing has already
> +  * run on the root bus or this port device is hot-inserted
> +  */
> + if (dev->aer_cap && pci_aer_available() &&
> + (pcie_ports_native || pci_find_host_bridge(dev->bus)->native_aer))
> + pci_enable_pcie_error_reporting(dev);
> +#endif

I plan to apply this after we clarify the changelog a bit, but I don't
really like this patch because it (and the corresponding code added by
2bd50dd800b5 ("PCI: PCIe: Disable PCIe port services during port
initialization")) seem a little out of place.

The way I think this *should* work is that the PCI core should arrange to
handle AER interrupts when it enumerates the devices that can generate
them (Root Ports and Root Complex Event Collectors), even before it
enumerates the devices below the Root Port.

Then the PCI core could directly enable the AER interrupts 

[tip:x86/mm] resource: Fix find_next_iomem_res() iteration issue

2018-10-09 Thread tip-bot for Bjorn Helgaas
Commit-ID:  010a93bf97c72f43aac664d0a685942f83d1a103
Gitweb: https://git.kernel.org/tip/010a93bf97c72f43aac664d0a685942f83d1a103
Author: Bjorn Helgaas 
AuthorDate: Thu, 27 Sep 2018 09:22:09 -0500
Committer:  Borislav Petkov 
CommitDate: Tue, 9 Oct 2018 17:18:36 +0200

resource: Fix find_next_iomem_res() iteration issue

Previously find_next_iomem_res() used "*res" as both an input parameter for
the range to search and the type of resource to search for, and an output
parameter for the resource we found, which makes the interface confusing.

The current callers use find_next_iomem_res() incorrectly because they
allocate a single struct resource and use it for repeated calls to
find_next_iomem_res().  When find_next_iomem_res() returns a resource, it
overwrites the start, end, flags, and desc members of the struct.  If we
call find_next_iomem_res() again, we must update or restore these fields.
The previous code restored res.start and res.end, but not res.flags or
res.desc.

Since the callers did not restore res.flags, if they searched for flags
IORESOURCE_MEM | IORESOURCE_BUSY and found a resource with flags
IORESOURCE_MEM | IORESOURCE_BUSY | IORESOURCE_SYSRAM, the next search would
incorrectly skip resources unless they were also marked as
IORESOURCE_SYSRAM.

Fix this by restructuring the interface so it takes explicit "start, end,
flags" parameters and uses "*res" only as an output parameter.
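
The stale-flags problem can be reproduced outside the kernel.  The
following is a minimal userspace sketch, not the code in
kernel/resource.c: the toy_* names, the flat table, and the flag values
are all invented for illustration.  It contrasts a walk that reuses one
struct as both query and result with a walk that passes the query by
value.

```c
#include <assert.h>
#include <stddef.h>

/* Made-up flag bits for this sketch; not the kernel's IORESOURCE_* values. */
#define TOY_MEM    0x1u
#define TOY_BUSY   0x2u
#define TOY_SYSRAM 0x4u

struct toy_res {
	unsigned long start, end;	/* inclusive range */
	unsigned int flags;
};

/* Sorted toy "tree": a SYSRAM region followed by a non-SYSRAM one. */
static const struct toy_res toy_table[] = {
	{ 0x0000, 0x0fff, TOY_MEM | TOY_BUSY | TOY_SYSRAM },
	{ 0x1000, 0x1fff, TOY_MEM | TOY_BUSY },
};

/* New style: the query is passed by value, *res is output only. */
static int toy_find_next(unsigned long start, unsigned long end,
			 unsigned int flags, struct toy_res *res)
{
	size_t i;

	assert(res && start < end);
	for (i = 0; i < sizeof(toy_table) / sizeof(toy_table[0]); i++) {
		const struct toy_res *p = &toy_table[i];

		if ((p->flags & flags) != flags)
			continue;
		if (p->end >= start && p->start <= end) {
			res->start = p->start > start ? p->start : start;
			res->end = p->end < end ? p->end : end;
			res->flags = p->flags;
			return 0;
		}
	}
	return -1;
}

/* Old style: *res is both the query and the result. */
static int toy_find_next_old(struct toy_res *res)
{
	return toy_find_next(res->start, res->end, res->flags, res);
}

/* Old-style walk: restores start/end between calls but not flags. */
static int toy_walk_old(unsigned long start, unsigned long end,
			unsigned int flags)
{
	struct toy_res res = { start, end, flags };
	int found = 0;

	while (res.start < res.end && toy_find_next_old(&res) == 0) {
		found++;
		res.start = res.end + 1;
		res.end = end;	/* res.flags keeps the last match's flags */
	}
	return found;
}

/* New-style walk: the requested flags never change. */
static int toy_walk_new(unsigned long start, unsigned long end,
			unsigned int flags)
{
	struct toy_res res;
	int found = 0;

	while (start < end && toy_find_next(start, end, flags, &res) == 0) {
		found++;
		start = res.end + 1;
	}
	return found;
}
```

Searching [0x0..0x1fff] for TOY_MEM | TOY_BUSY, the old-style walk stops
after the first region because the second search silently also demands
TOY_SYSRAM; the new-style walk visits both regions.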

Based on a patch by Lianbo Jiang.

 [ bp: While at it:
   - make comments kernel-doc style.
   -

Originally-by: http://lore.kernel.org/lkml/20180921073211.20097-2-liji...@redhat.com
Signed-off-by: Bjorn Helgaas 
Signed-off-by: Borislav Petkov 
CC: Andrew Morton 
CC: Brijesh Singh 
CC: Dan Williams 
CC: H. Peter Anvin 
CC: Lianbo Jiang 
CC: Takashi Iwai 
CC: Thomas Gleixner 
CC: Tom Lendacky 
CC: Vivek Goyal 
CC: Yaowei Bai 
CC: b...@redhat.com
CC: dan.j.willi...@intel.com
CC: dyo...@redhat.com
CC: ke...@lists.infradead.org
CC: mi...@redhat.com
CC: x86-ml 
Link: http://lkml.kernel.org/r/153805812916.1157.177580438135143788.st...@bhelgaas-glaptop.roam.corp.google.com
---
 kernel/resource.c | 96 ---
 1 file changed, 42 insertions(+), 54 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index 155ec873ea4d..38b8d11c9eaf 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -318,24 +318,27 @@ int release_resource(struct resource *old)
 
 EXPORT_SYMBOL(release_resource);
 
-/*
- * Finds the lowest iomem resource existing within [res->start..res->end].
- * The caller must specify res->start, res->end, res->flags, and optionally
- * desc.  If found, returns 0, res is overwritten, if not found, returns -1.
- * This function walks the whole tree and not just first level children until
- * and unless first_level_children_only is true.
+/**
+ * Finds the lowest iomem resource that covers part of [start..end].  The
+ * caller must specify start, end, flags, and desc (which may be
+ * IORES_DESC_NONE).
+ *
+ * If a resource is found, returns 0 and *res is overwritten with the part
+ * of the resource that's within [start..end]; if none is found, returns
+ * -1.
+ *
+ * This function walks the whole tree and not just first level children
+ * unless @first_level_children_only is true.
  */
-static int find_next_iomem_res(struct resource *res, unsigned long desc,
-  bool first_level_children_only)
+static int find_next_iomem_res(resource_size_t start, resource_size_t end,
+  unsigned long flags, unsigned long desc,
+  bool first_level_children_only,
+  struct resource *res)
 {
-   resource_size_t start, end;
struct resource *p;
bool sibling_only = false;
 
BUG_ON(!res);
-
-   start = res->start;
-   end = res->end;
BUG_ON(start >= end);
 
if (first_level_children_only)
@@ -344,7 +347,7 @@ static int find_next_iomem_res(struct resource *res, unsigned long desc,
read_lock(_lock);
 
for (p = iomem_resource.child; p; p = next_resource(p, sibling_only)) {
-   if ((p->flags & res->flags) != res->flags)
+   if ((p->flags & flags) != flags)
continue;
if ((desc != IORES_DESC_NONE) && (desc != p->desc))
continue;
@@ -359,32 +362,31 @@ static int find_next_iomem_res(struct resource *res, unsigned long desc,
read_unlock(_lock);
if (!p)
return -1;
+
/* copy data */
-   if (res->start < p->start)
-   res->start = p->start;
-   if (res->end > p->end)
-   res->end = p->end;
+   res->start = max(start, p->start);
+   res->end = min(end, p->end);
res->flags = p->

[tip:x86/mm] resource: Include resource end in walk_*() interfaces

2018-10-09 Thread tip-bot for Bjorn Helgaas
Commit-ID:  a98959fdbda1849a01b2150bb635ed559ec06700
Gitweb: https://git.kernel.org/tip/a98959fdbda1849a01b2150bb635ed559ec06700
Author: Bjorn Helgaas 
AuthorDate: Thu, 27 Sep 2018 09:22:02 -0500
Committer:  Borislav Petkov 
CommitDate: Tue, 9 Oct 2018 17:18:34 +0200

resource: Include resource end in walk_*() interfaces

find_next_iomem_res() finds an iomem resource that covers part of a range
described by "start, end".  All callers expect that range to be inclusive,
i.e., both start and end are included, but find_next_iomem_res() doesn't
handle the end address correctly.

If it finds an iomem resource that contains exactly the end address, it
skips it, e.g., if "start, end" is [0x0-0x1] and there happens to be an
iomem resource [mem 0x1-0x1] (the single byte at 0x1), we skip
it:

  find_next_iomem_res(...)
  {
start = 0x0;
end = 0x1;
for (p = next_resource(...)) {
  # p->start = 0x1;
  # p->end = 0x1;
  # we *should* return this resource, but this condition is false:
  if ((p->end >= start) && (p->start < end))
break;

Adjust find_next_iomem_res() so it allows a resource that includes the
single byte at the end of the range.  This is a corner case that we
probably don't see in practice.
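
The corner case is easy to check in isolation.  Below is a minimal
userspace sketch of the two overlap tests; the toy_* function names and
the addresses used with them are made up for the example, and both
ranges are treated as inclusive.

```c
#include <assert.h>
#include <stdbool.h>

/* Old test: behaves as if "end" were exclusive. */
static bool toy_overlaps_old(unsigned long p_start, unsigned long p_end,
			     unsigned long start, unsigned long end)
{
	assert(p_start <= p_end && start <= end);
	return p_end >= start && p_start < end;
}

/* Fixed test: [p_start, p_end] and [start, end] are both inclusive. */
static bool toy_overlaps_fixed(unsigned long p_start, unsigned long p_end,
			       unsigned long start, unsigned long end)
{
	assert(p_start <= p_end && start <= end);
	return p_end >= start && p_start <= end;
}
```

For a one-byte resource sitting exactly at the end address of the search
range, the old test reports no overlap and the fixed test reports one;
for resources strictly inside the range the two agree.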

Fixes: 58c1b5b07907 ("[PATCH] memory hotadd fixes: find_next_system_ram catch range fix")
Signed-off-by: Bjorn Helgaas 
Signed-off-by: Borislav Petkov 
CC: Andrew Morton 
CC: Brijesh Singh 
CC: Dan Williams 
CC: H. Peter Anvin 
CC: Lianbo Jiang 
CC: Takashi Iwai 
CC: Thomas Gleixner 
CC: Tom Lendacky 
CC: Vivek Goyal 
CC: Yaowei Bai 
CC: b...@redhat.com
CC: dan.j.willi...@intel.com
CC: dyo...@redhat.com
CC: ke...@lists.infradead.org
CC: mi...@redhat.com
CC: x86-ml 
Link: http://lkml.kernel.org/r/153805812254.1157.16736368485811773752.st...@bhelgaas-glaptop.roam.corp.google.com
---
 kernel/resource.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index 30e1bc68503b..155ec873ea4d 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -319,7 +319,7 @@ int release_resource(struct resource *old)
 EXPORT_SYMBOL(release_resource);
 
 /*
- * Finds the lowest iomem resource existing within [res->start.res->end).
+ * Finds the lowest iomem resource existing within [res->start..res->end].
  * The caller must specify res->start, res->end, res->flags, and optionally
  * desc.  If found, returns 0, res is overwritten, if not found, returns -1.
  * This function walks the whole tree and not just first level children until
@@ -352,7 +352,7 @@ static int find_next_iomem_res(struct resource *res, unsigned long desc,
p = NULL;
break;
}
-   if ((p->end >= start) && (p->start < end))
+   if ((p->end >= start) && (p->start <= end))
break;
}
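
The off-by-one fixed above can be illustrated outside the kernel. The following is a minimal sketch, assuming a hypothetical `struct range` in place of `struct resource`; the two helpers mirror the old and new conditions from the hunk and are not kernel functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct resource: an inclusive [start, end] range. */
struct range {
	uint64_t start;
	uint64_t end;
};

/* Old condition: treats the walked range as half-open, [start, end). */
static bool overlaps_half_open(const struct range *p, uint64_t start, uint64_t end)
{
	return p->end >= start && p->start < end;
}

/* New condition: treats the walked range as inclusive, [start, end]. */
static bool overlaps_inclusive(const struct range *p, uint64_t start, uint64_t end)
{
	return p->end >= start && p->start <= end;
}
```

With a resource covering only the byte at 0x1 and a walked range of [0x0-0x1], the old condition rejects the resource and the new one accepts it.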
 


[tip:x86/mm] x86/kexec: Correct KEXEC_BACKUP_SRC_END off-by-one error

2018-10-09 Thread tip-bot for Bjorn Helgaas
Commit-ID:  51fbf14f2528a8c6401290e37f1c893a2412f1d3
Gitweb: https://git.kernel.org/tip/51fbf14f2528a8c6401290e37f1c893a2412f1d3
Author: Bjorn Helgaas 
AuthorDate: Thu, 27 Sep 2018 09:21:55 -0500
Committer:  Borislav Petkov 
CommitDate: Tue, 9 Oct 2018 17:18:31 +0200

x86/kexec: Correct KEXEC_BACKUP_SRC_END off-by-one error

The only use of KEXEC_BACKUP_SRC_END is as an argument to
walk_system_ram_res():

  int crash_load_segments(struct kimage *image)
  {
    ...
    walk_system_ram_res(KEXEC_BACKUP_SRC_START, KEXEC_BACKUP_SRC_END,
                        image, determine_backup_region);

walk_system_ram_res() expects "start, end" arguments that are inclusive,
i.e., the range to be walked includes both the start and end addresses.

KEXEC_BACKUP_SRC_END was previously defined as (640 * 1024UL), which is the
first address *past* the desired 0-640KB range.

Define KEXEC_BACKUP_SRC_END as (640 * 1024UL - 1) so the KEXEC_BACKUP_SRC
region is [0-0x9ffff], not [0-0xa0000].

Fixes: dd5f726076cc ("kexec: support for kexec on panic using new system call")
Signed-off-by: Bjorn Helgaas 
Signed-off-by: Borislav Petkov 
CC: "H. Peter Anvin" 
CC: Andrew Morton 
CC: Brijesh Singh 
CC: Greg Kroah-Hartman 
CC: Ingo Molnar 
CC: Lianbo Jiang 
CC: Takashi Iwai 
CC: Thomas Gleixner 
CC: Tom Lendacky 
CC: Vivek Goyal 
CC: baiyao...@cmss.chinamobile.com
CC: b...@redhat.com
CC: dan.j.willi...@intel.com
CC: dyo...@redhat.com
CC: ke...@lists.infradead.org
Link: http://lkml.kernel.org/r/153805811578.1157.6948388946904655969.st...@bhelgaas-glaptop.roam.corp.google.com
---
 arch/x86/include/asm/kexec.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index f327236f0fa7..5125fca472bb 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -67,7 +67,7 @@ struct kimage;
 
 /* Memory to backup during crash kdump */
 #define KEXEC_BACKUP_SRC_START (0UL)
-#define KEXEC_BACKUP_SRC_END   (640 * 1024UL)  /* 640K */
+#define KEXEC_BACKUP_SRC_END   (640 * 1024UL - 1)  /* 640K */
 
 /*
  * CPU does not save ss and sp on stack if execution is already
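
The arithmetic behind the new definition can be checked with a small sketch. The macro names below are shortened, hypothetical copies of the kexec.h definitions, and `inclusive_len()` is an illustrative helper, not a kernel function.

```c
#define BACKUP_SRC_START   (0UL)
#define BACKUP_SRC_END_OLD (640 * 1024UL)      /* first address past 640K */
#define BACKUP_SRC_END_NEW (640 * 1024UL - 1)  /* last address within 640K */

/* Number of bytes covered by an inclusive [start, end] range. */
static unsigned long inclusive_len(unsigned long start, unsigned long end)
{
	return end - start + 1;
}
```

Interpreted as an inclusive end, the old value covers one byte more than 640K, while the new value covers exactly 640K.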


Re: x86/mm: Found insecure W+X mapping at address (ptrval)/0xc00a0000

2018-10-08 Thread Bjorn Helgaas
On Mon, Oct 8, 2018 at 2:37 PM Thomas Gleixner  wrote:
>
> Paul,
>
> On Fri, 5 Oct 2018, Paul Menzel wrote:
> > On 10/05/18 11:27, Thomas Gleixner wrote:
> > > If pcibios is enabled and used, need to look at the gory details of that
> > > first, then the W+X check has to exclude that region. We can't do much
> > > about that.
> >
> > That would also explain, why it only happens with the SeaBIOS payload,
> > which sets up legacy BIOS calls. Using GRUB directly as payload, no BIOS
> > calls are set up.
> >
> > Reading the Kconfig description of the PCI access mode, the BIOS should
> > only be used last.
>
> Correct. And looking at the dmesg you provided it is initialized:
>
> [0.441062] PCI: PCI BIOS area is rw and x. Use pci=nobios if you want it NX.
> [0.441062] PCI: PCI BIOS revision 2.10 entry at 0xffa40, last bus=3
>
> Though I assume it's not really required, but this PCI BIOS thing is not
> really well documented and there are some obscure usage sites involved.
>
> Bjorn, do you have any insight or did you flush those memories long ago?

No, I don't.  I was never really involved with PCIBIOS.


Re: [PATCH] PCI: expand the "PF" acronym in Kconfig help text

2018-10-08 Thread Bjorn Helgaas
On Sat, Oct 06, 2018 at 08:56:33PM -0700, Randy Dunlap wrote:
> From: Randy Dunlap 
> 
> Tell users what a PCI PF is in the PCI_PF_STUB config help text.
> 
> Fixes: a8ccf8a3 ("PCI/IOV: Add pci-pf-stub driver for PFs that only enable VFs")
> 
> Signed-off-by: Randy Dunlap 
> Cc: Alexander Duyck 

Applied with Alexander's ack to for-linus for v4.19.

> ---
>  drivers/pci/Kconfig |6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> --- lnx-419-rc6.orig/drivers/pci/Kconfig
> +++ lnx-419-rc6/drivers/pci/Kconfig
> @@ -73,9 +73,9 @@ config PCI_PF_STUB
>   depends on PCI_IOV
>   help
> Say Y or M here if you want to enable support for devices that
> -   require SR-IOV support, while at the same time the PF itself is
> -   not providing any actual services on the host itself such as
> -   storage or networking.
> +   require SR-IOV support, while at the same time the PF (Physical
> +   Function) itself is not providing any actual services on the
> +   host itself such as storage or networking.
>  
> When in doubt, say N.
>  
> 
> 


Re: [PATCH v6 0/9] PCI: mediatek: fixup find_port, enable_msi and add pm, module support

2018-10-08 Thread Bjorn Helgaas
On Mon, Oct 08, 2018 at 11:24:39AM +0800, honghui.zh...@mediatek.com wrote:

> Honghui Zhang (9):
>   PCI: mediatek: Using slot's devfn for compare to fix
> mtk_pcie_find_port logic
>   PCI: mediatek: Fixup class ID for MT7622 as PCI_CLASS_BRIDGE_PCI
>   PCI: mediatek: Remove the redundant dev->pm_domain check
>   PCI: mediatek: Convert to use pci_host_probe()
>   PCI: mediatek: Move the mtk_pcie_startup_port_v2 function's define
> after mtk_pcie_setup_irq
>   PCI: mediatek: Fixup enable msi logic by enable msi after clock
> enabled

s/msi/MSI/ (twice)

>   PCI: mediatek: Add system pm support for MT2712 and MT7622

s/pm/PM/

"msi" and "pm" are not English words, and capitalizing them tells the
reader that they are acronyms or initialisms (like GIC and IRQ below).

>   PCI: mediatek: Save the GIC IRQ in mtk_pcie_port
>   PCI: mediatek: Add loadable kernel module support


Re: [PATCH 00/12] error handling and pciehp maintenance

2018-10-08 Thread Bjorn Helgaas
On Mon, Oct 08, 2018 at 10:18:47AM -0600, Keith Busch wrote:
> On Fri, Oct 05, 2018 at 12:31:45PM -0500, Bjorn Helgaas wrote:
> > [+cc arm64 folks, LKML: This conversation is about this patch:
> > 
> >   
> > https://lore.kernel.org/linux-pci/20180918235848.26694-3-keith.bu...@intel.com
> > 
> > which fixes some PCIe AER error injection bugs, but also makes the error
> > injector dependent on DYNAMIC_FTRACE_WITH_REGS, which not all arches
> > support.  Note that this question is only about the error *injection*
> > module used for testing.  It doesn't affect AER support itself.]
> > 
> > On Thu, Oct 04, 2018 at 04:11:37PM -0600, Keith Busch wrote:
> > > On Thu, Oct 04, 2018 at 04:40:15PM -0500, Bjorn Helgaas wrote:
> > > > On Tue, Sep 18, 2018 at 05:58:36PM -0600, Keith Busch wrote:
> > > > > I ran into a lot of trouble testing error handling, and this series is
> > > > > > just trying to simplify some things. The first 4 fix up aer_inject, and
> > > > > the rest are cleanup to make better use of kernel APIs.
> > > > > 
> > > > > Keith Busch (12):
> > > > >   PCI: Set PCI bus accessors to noinline
> > > > >   PCI/AER: Covertly inject errors
> > > > >   PCI/AER: Reuse existing service device lookup
> > > > >   PCI/AER: Abstract AER interrupt handling
> > > > >   PCI/AER: Remove dead code
> > > > >   PCI/AER: Remove error source from aer struct
> > > > >   PCI/AER: Use kfifo for tracking events
> > > > >   PCI/AER: Use kfifo helper inserting locked elements
> > > > >   PCI/AER: Don't read upstream ports below fatal errors
> > > > >   PCI/AER: Use threaded IRQ for bottom half
> > > > >   PCI/AER: Use managed resource allocations
> > > > >   PCI/pciehp: Use device managed allocations
> > > > > 
> > > > >  drivers/pci/access.c  |   4 +-
> > > > >  drivers/pci/hotplug/pciehp_core.c |  14 +-
> > > > >  drivers/pci/hotplug/pciehp_hpc.c  |  48 ++
> > > > >  drivers/pci/pcie/Kconfig  |   2 +-
> > > > >  drivers/pci/pcie/aer.c| 219 ++-
> > > > > >  drivers/pci/pcie/aer_inject.c | 306 --
> > > > >  drivers/pci/pcie/portdrv.h|   4 -
> > > > >  drivers/pci/pcie/portdrv_core.c   |   1 +
> > > > >  8 files changed, 227 insertions(+), 371 deletions(-)
> > > > 
> > > > Thanks a lot for doing this!  I applied these to pci/hotplug for
> > > > v4.20, except for "PCI/AER: Don't read upstream ports below fatal
> > > > errors", which seems to be already there via another posting, and
> > > > "PCI/pciehp: Use device managed allocations", which needs a few
> > > > tweaks.
> > > 
> > > Sounds good, and thanks for applying!
> > > 
> > > In case this went unnoticed, patch 2's aer_inject using ftrace hooks
> > > to pci config accessors is really cool and fixes several kernel crashes
> > > I encountered, but it may not work on every architecture. I'm not sure
> > > how widely aer_inject is used, so maybe there are no concerns with the
> > > DYNAMIC_FTRACE_WITH_REGS dependency, but I just want to reemphasize that
> > > dependency in case there are valid objections.
> > 
> > Oh, indeed, I hadn't noticed this arch dependency.  AFAICT, the new
> > DYNAMIC_FTRACE_WITH_REGS dependency means aer_inject will work only
> > on these arches:
> > 
> >   arm   # if (!XIP_KERNEL) && !CPU_ENDIAN_BE32 && MMU
> >   powerpc   # if PPC64 && CPU_LITTLE_ENDIAN
> >   riscv # ARCH_RV64I only
> >   s390
> >   x86
> > 
> > Notably missing is arm64, which has DYNAMIC_FTRACE but not
> > DYNAMIC_FTRACE_WITH_REGS.
> > 
> > Bjorn
> 
> Looks like the kbuild bot found an ARM kernel config that has
> DYNAMIC_FTRACE_WITH_REGS set, and then the module can't compile
> there. I'll need to update this patch regardless, and I think the right
> thing to do is maintain the "old" way with conditional compiling for
> any arch specific features.

Sounds messy, but probably the best route.

I dropped these patches for now:

  PCI/AER: Covertly inject errors with ftrace hooks
  PCI/AER: Reuse existing pcie_port_find_device() interface
  PCI/AER: Abstract AER interrupt handling


[GIT PULL] PCI fixes for v4.19

2018-10-05 Thread Bjorn Helgaas
PCI fixes:

  - Reprogram bridge prefetch registers to fix NVIDIA and Radeon issues
after suspend/resume (Daniel Drake)

  - Fix mvebu I/O mapping creation sequence (Thomas Petazzoni)

  - Fix minor MAINTAINERS file match issue (Bjorn Helgaas)


The following changes since commit f188b99f0b2d33794b4af8a225f95d1e968c0a3f:

  ACPI / hotplug / PCI: Don't scan for non-hotplug bridges if slot is not bridge (2018-09-26 15:39:28 -0500)

are available in the Git repository at:
 
  ssh://g...@gitolite.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git tags/pci-v4.19-fixes-3

for you to fetch changes up to 95375f2ab2960c135484d83ea9f8f357cb1be26a:

  PCI: mvebu: Fix PCI I/O mapping creation sequence (2018-10-01 15:42:09 -0500)


pci-v4.19-fixes-3


Bjorn Helgaas (1):
  MAINTAINERS: Remove obsolete drivers/pci pattern from ACPI section

Daniel Drake (1):
  PCI: Reprogram bridge prefetch registers on resume

Thomas Petazzoni (1):
  PCI: mvebu: Fix PCI I/O mapping creation sequence

 MAINTAINERS|  1 -
 drivers/pci/controller/pci-mvebu.c | 52 +++---
 drivers/pci/pci.c  | 27 ++--
 3 files changed, 67 insertions(+), 13 deletions(-)


Re: [PATCH 00/12] error handling and pciehp maintenance

2018-10-05 Thread Bjorn Helgaas
[+cc arm64 folks, LKML: This conversation is about this patch:

  https://lore.kernel.org/linux-pci/20180918235848.26694-3-keith.bu...@intel.com

which fixes some PCIe AER error injection bugs, but also makes the error
injector dependent on DYNAMIC_FTRACE_WITH_REGS, which not all arches
support.  Note that this question is only about the error *injection*
module used for testing.  It doesn't affect AER support itself.]

On Thu, Oct 04, 2018 at 04:11:37PM -0600, Keith Busch wrote:
> On Thu, Oct 04, 2018 at 04:40:15PM -0500, Bjorn Helgaas wrote:
> > On Tue, Sep 18, 2018 at 05:58:36PM -0600, Keith Busch wrote:
> > > I ran into a lot of trouble testing error handling, and this series is
> > > just trying to simplify some things. The first 4 fix up aer_inject, and
> > > the rest are cleanup to make better use of kernel APIs.
> > > 
> > > Keith Busch (12):
> > >   PCI: Set PCI bus accessors to noinline
> > >   PCI/AER: Covertly inject errors
> > >   PCI/AER: Reuse existing service device lookup
> > >   PCI/AER: Abstract AER interrupt handling
> > >   PCI/AER: Remove dead code
> > >   PCI/AER: Remove error source from aer struct
> > >   PCI/AER: Use kfifo for tracking events
> > >   PCI/AER: Use kfifo helper inserting locked elements
> > >   PCI/AER: Don't read upstream ports below fatal errors
> > >   PCI/AER: Use threaded IRQ for bottom half
> > >   PCI/AER: Use managed resource allocations
> > >   PCI/pciehp: Use device managed allocations
> > > 
> > >  drivers/pci/access.c  |   4 +-
> > >  drivers/pci/hotplug/pciehp_core.c |  14 +-
> > >  drivers/pci/hotplug/pciehp_hpc.c  |  48 ++
> > >  drivers/pci/pcie/Kconfig  |   2 +-
> > >  drivers/pci/pcie/aer.c| 219 ++-
> > > >  drivers/pci/pcie/aer_inject.c | 306 --
> > >  drivers/pci/pcie/portdrv.h|   4 -
> > >  drivers/pci/pcie/portdrv_core.c   |   1 +
> > >  8 files changed, 227 insertions(+), 371 deletions(-)
> > 
> > Thanks a lot for doing this!  I applied these to pci/hotplug for
> > v4.20, except for "PCI/AER: Don't read upstream ports below fatal
> > errors", which seems to be already there via another posting, and
> > "PCI/pciehp: Use device managed allocations", which needs a few
> > tweaks.
> 
> Sounds good, and thanks for applying!
> 
> In case this went unnoticed, patch 2's aer_inject using ftrace hooks
> to pci config accessors is really cool and fixes several kernel crashes
> I encountered, but it may not work on every architecture. I'm not sure
> how widely aer_inject is used, so maybe there are no concerns with the
> DYNAMIC_FTRACE_WITH_REGS dependency, but I just want to reemphasize that
> dependency in case there are valid objections.

Oh, indeed, I hadn't noticed this arch dependency.  AFAICT, the new
DYNAMIC_FTRACE_WITH_REGS dependency means aer_inject will work only
on these arches:

  arm   # if (!XIP_KERNEL) && !CPU_ENDIAN_BE32 && MMU
  powerpc   # if PPC64 && CPU_LITTLE_ENDIAN
  riscv # ARCH_RV64I only
  s390
  x86

Notably missing is arm64, which has DYNAMIC_FTRACE but not
DYNAMIC_FTRACE_WITH_REGS.

Bjorn


Re: [PATCH] PCI / ACPI: Mark expected switch fall-through

2018-10-04 Thread Bjorn Helgaas
On Thu, Oct 04, 2018 at 05:40:41PM +0200, Gustavo A. R. Silva wrote:
> In preparation to enabling -Wimplicit-fallthrough, mark switch cases
> where we are expecting to fall through.
> 
> Addresses-Coverity-ID: 1472052 ("Missing break in switch")
> Signed-off-by: Gustavo A. R. Silva 

Applied to pci/misc for v4.20, thanks!

> ---
>  drivers/pci/pci-acpi.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> index 79c8e95..2a4aa64 100644
> --- a/drivers/pci/pci-acpi.c
> +++ b/drivers/pci/pci-acpi.c
> @@ -588,6 +588,7 @@ static int acpi_pci_set_power_state(struct pci_dev *dev, pci_power_t state)
>   error = -EBUSY;
>   break;
>   }
> + /* Fall through */
>   case PCI_D0:
>   case PCI_D1:
>   case PCI_D2:
> -- 
> 2.7.4
> 
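
The annotated pattern can be sketched in isolation. This is a minimal illustration, assuming a hypothetical `toy_power_state` enum and a `busy` flag in place of the driver's real condition; it is not the code from pci-acpi.c.

```c
#include <errno.h>

/* Hypothetical power states, loosely modelled on PCI_D0..PCI_D3cold. */
enum toy_power_state { TOY_D0, TOY_D1, TOY_D2, TOY_D3_HOT, TOY_D3_COLD };

static int toy_set_power_state(enum toy_power_state state, int busy)
{
	int error = -EINVAL;	/* only returned for an out-of-range state */

	switch (state) {
	case TOY_D3_COLD:
		if (busy) {
			error = -EBUSY;
			break;
		}
		/* Fall through */
	case TOY_D0:
	case TOY_D1:
	case TOY_D2:
	case TOY_D3_HOT:
		error = 0;	/* common path shared by every state */
		break;
	}
	return error;
}
```

The comment placed immediately before the next case label tells both the reader and the compiler warning that the missing break is intentional.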


Re: [PATCH 01/16] x86/PCI: Replace spin_is_locked() with lockdep

2018-10-03 Thread Bjorn Helgaas
On Tue, Oct 02, 2018 at 10:38:47PM -0700, Lance Roy wrote:
> lockdep_assert_held() is better suited to checking locking requirements,
> since it won't get confused when someone else holds the lock. This is
> also a step towards possibly removing spin_is_locked().
> 
> Signed-off-by: Lance Roy 
> Cc: Bjorn Helgaas 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Borislav Petkov 
> Cc: "H. Peter Anvin" 
> Cc: 
> Cc: 

I assume you plan to merge the whole series together.  I don't object
to that, but I don't know enough to be able to formally ack this.

It would be useful to include a tiny bit more detail in the changelog.
The spin_is_locked() documentation doesn't mention anything about
differences with respect to the lock being held by self vs by someone
else, so I can't tell where the confusion arises.

Bjorn

> ---
>  arch/x86/pci/i386.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c
> index ed4ac215305d..24bb58a007de 100644
> --- a/arch/x86/pci/i386.c
> +++ b/arch/x86/pci/i386.c
> @@ -59,7 +59,7 @@ static struct pcibios_fwaddrmap *pcibios_fwaddrmap_lookup(struct pci_dev *dev)
>  {
>   struct pcibios_fwaddrmap *map;
>  
> - WARN_ON_SMP(!spin_is_locked(&pcibios_fwaddrmap_lock));
> + lockdep_assert_held(&pcibios_fwaddrmap_lock);
>  
>   list_for_each_entry(map, &pcibios_fwaddrmappings, list)
>   if (map->dev == dev)
> -- 
> 2.19.0
> 
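
One way to read the changelog's "confused when someone else holds the lock" is sketched below. It assumes a hypothetical toy lock that records its owner; the kernel's spinlock_t and lockdep work differently in detail, so this only illustrates the distinction between "locked by anyone" and "held by this context".

```c
#include <stdbool.h>

/* Hypothetical toy lock that records its owner; not the kernel's spinlock_t. */
struct toy_lock {
	bool locked;
	int owner;	/* meaningful only while locked */
};

static void toy_lock_acquire(struct toy_lock *l, int ctx)
{
	l->locked = true;
	l->owner = ctx;
}

/* Analogue of spin_is_locked(): true if anyone holds the lock. */
static bool toy_is_locked(const struct toy_lock *l)
{
	return l->locked;
}

/* Analogue of the lockdep check: true only if this context holds the lock. */
static bool toy_held_by(const struct toy_lock *l, int ctx)
{
	return l->locked && l->owner == ctx;
}
```

If context 1 takes the lock, an "is locked" test run from context 2 still passes even though context 2 never acquired it, while the ownership test does not.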


Re: [PATCH v4 1/6] mpt3sas: Introduce mpt3sas_base_pci_device_is_available

2018-10-02 Thread Bjorn Helgaas
On Mon, Oct 01, 2018 at 03:40:51PM -0500, Bjorn Helgaas wrote:
> I think the names "pci_device_is_present()" and
> "mpt3sas_base_pci_device_is_available()" contribute to the problem
> because they make promises that can't be kept -- all we can say is
> that the device *was* present, but we know whether it is *still*
> present.

Oops, I meant "we DON'T know whether it is still present."

> I think it would be better if the interfaces were something
> like "pci_device_is_absent()" because that gives a result we can rely
> on.  If that returns true, we know the device is definitely gone.
> 
> Bjorn


Re: linux-next: build warnings after merge of the pci tree

2018-10-02 Thread Bjorn Helgaas
On Mon, Oct 1, 2018 at 7:26 PM Stephen Rothwell  wrote:
>
> Hi Bjorn,
>
> After merging the pci tree, today's linux-next build (powerpc
> ppc64_defconfig) produced these warnings:
>
> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c: In function 'ixgbe_io_slot_reset':
> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:11102:6: warning: unused variable 'err' [-Wunused-variable]
>   int err;
>   ^~~
> drivers/net/ethernet/intel/i40e/i40e_main.c: In function 'i40e_pci_error_slot_reset':
> drivers/net/ethernet/intel/i40e/i40e_main.c:14230:6: warning: unused variable 'err' [-Wunused-variable]
>   int err;
>   ^~~
>
> also from the x86_64 allmodconfig build:
>
> drivers/dma/ioat/init.c: In function 'ioat_pcie_error_slot_reset':
> drivers/dma/ioat/init.c:1255:6: warning: unused variable 'err' [-Wunused-variable]
>   int err;
>   ^~~
> drivers/net/ethernet/sfc/efx.c: In function 'efx_io_slot_reset':
> drivers/net/ethernet/sfc/efx.c:3824:6: warning: unused variable 'rc' [-Wunused-variable]
>   int rc;
>   ^~
> drivers/net/ethernet/sfc/falcon/efx.c: In function 'ef4_io_slot_reset':
> drivers/net/ethernet/sfc/falcon/efx.c:3163:6: warning: unused variable 'rc' [-Wunused-variable]
>   int rc;
>   ^~

Fixed the above, thanks!

> Introduced by commit
>
>   6dcde3e574b2 ("XXX PCI/AER: Remove pci_cleanup_aer_uncorrect_error_status() calls")
> (good subject prefix there :-))

Yeah, that's a note to self to update something in the changelog.

Bjorn


Re: [BISECTED] Regression: Solidrun Clearfog Base won't boot since "PCI: mvebu: Only remap I/O space if configured"

2018-10-01 Thread Bjorn Helgaas
On Mon, Oct 01, 2018 at 02:51:48PM +0200, Thomas Petazzoni wrote:
> Hello,
> 
> On Mon, 01 Oct 2018 12:56:37 +0200, Jan Kundrát wrote:
> 
> > Thomas, Russell, Lorenzo,
> > did you have time to convert this into a patch which can hit 4.19? I don't 
> > see anything related in 4.19-rc6, but perhaps I missed something. Is there 
> > something that I should test or otherwise help?
> 
> Sorry, I suddenly got busy (my second son arrived a few days earlier
> than expected).

Congratulations!

> I just sent a proper patch with the proposal I made last week, after
> testing on ClearFog and Armada XP GP. Note that on ClearFog, I only
> tested that it fixes the panic at boot, since I didn't had any
> mini-PCIe devices at hand. On Armada XP GP, I verified that an E1000E
> NIC was still working as expected. Therefore, it would be useful if
> you could test on your ClearFog platform with PCI devices connected.
> 
> Thanks a lot and sorry for the delay.

And thanks for the patch!  No need to apologize for having a life :)

Bjorn


Re: Bad MAINTAINERS pattern in section 'ACPI'

2018-10-01 Thread Bjorn Helgaas
On Fri, Sep 28, 2018 at 06:06:17PM -0500, Bjorn Helgaas wrote:
> [+cc Tony, Borislav (ACPI APEI reviewers), linux-pci]
> 
> On Fri, Sep 28, 2018 at 02:50:53PM -0700, Joe Perches wrote:
> > Please fix this defect appropriately.
> > 
> > linux-next MAINTAINERS section:
> > 
> > 308 ACPI
> > 309 M:  "Rafael J. Wysocki" 
> > 310 M:  Len Brown 
> > 311 L:  linux-a...@vger.kernel.org
> > 312 W:  https://01.org/linux-acpi
> > 313 Q:  https://patchwork.kernel.org/project/linux-acpi/list/
> > 314 T:  git git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
> > 315 B:  https://bugzilla.kernel.org
> > 316 S:  Supported
> > 317 F:  drivers/acpi/
> > 318 F:  drivers/pnp/pnpacpi/
> > 319 F:  include/linux/acpi.h
> > 320 F:  include/linux/fwnode.h
> > 321 F:  include/acpi/
> > 322 F:  Documentation/acpi/
> > 323 F:  Documentation/ABI/testing/sysfs-bus-acpi
> > 324 F:  Documentation/ABI/testing/configfs-acpi
> > 325 F:  drivers/pci/*acpi*
> > 326 F:  drivers/pci/*/*acpi*
> > --> 327 F:  drivers/pci/*/*/*acpi*
> > 328 F:  tools/power/acpi/
> 
> My proposal to fix this:
> 
> commit a99051c0d3c59fd259fd76a8bbd9837b76b509d9
> Author: Bjorn Helgaas 
> Date:   Fri Sep 28 17:34:21 2018 -0500
> 
> MAINTAINERS: Remove obsolete drivers/pci pattern from ACPI section
> 
> Prior to 256a45937093 ("PCI/AER: Squash aerdrv_acpi.c into aerdrv.c"),
> drivers/pci/pcie/aer/aerdrv_acpi.c contained code to parse the ACPI HEST
> table.  That code now lives in drivers/pci/pcie/aer.c.
> 
> Remove the "F: drivers/pci/*/*/*acpi*" pattern because it matches nothing.
> 
> We could add a "F: drivers/pci/pcie/aer.c" pattern to the ACPI APEI
> section, but that file sees a lot of changes, almost none of which are of
> interest to the ACPI folks.
> 
> Signed-off-by: Bjorn Helgaas 
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 700408b7bc53..9babd8a0406b 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -324,7 +324,6 @@ F:Documentation/ABI/testing/sysfs-bus-acpi
>  F:   Documentation/ABI/testing/configfs-acpi
>  F:   drivers/pci/*acpi*
>  F:   drivers/pci/*/*acpi*
> -F:   drivers/pci/*/*/*acpi*
>  F:   tools/power/acpi/
>  
>  ACPI APEI

Applied with Rafael's ack to for-linus for v4.19.
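
Why the removed pattern matches nothing can be approximated with POSIX fnmatch(). This is a sketch under the assumption that FNM_PATHNAME ("*" does not cross "/") is close enough to how MAINTAINERS "F:" patterns are interpreted; get_maintainer.pl uses its own matching code, and `pattern_matches()` is a hypothetical helper.

```c
#include <fnmatch.h>
#include <stdbool.h>

/* True if path matches pattern, with '*' never matching a '/'. */
static bool pattern_matches(const char *pattern, const char *path)
{
	return fnmatch(pattern, path, FNM_PATHNAME) == 0;
}
```

Under that assumption, "drivers/pci/*/*/*acpi*" matches the old drivers/pci/pcie/aer/aerdrv_acpi.c but not drivers/pci/pcie/aer.c, which is one directory level too shallow and has no "acpi" in its name.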


Re: Bad MAINTAINERS pattern in section 'ACPI'

2018-09-28 Thread Bjorn Helgaas
[+cc Tony, Borislav (ACPI APEI reviewers), linux-pci]

On Fri, Sep 28, 2018 at 02:50:53PM -0700, Joe Perches wrote:
> Please fix this defect appropriately.
> 
> linux-next MAINTAINERS section:
> 
>   308 ACPI
>   309 M:  "Rafael J. Wysocki" 
>   310 M:  Len Brown 
>   311 L:  linux-a...@vger.kernel.org
>   312 W:  https://01.org/linux-acpi
>   313 Q:  https://patchwork.kernel.org/project/linux-acpi/list/
>   314 T:  git git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
>   315 B:  https://bugzilla.kernel.org
>   316 S:  Supported
>   317 F:  drivers/acpi/
>   318 F:  drivers/pnp/pnpacpi/
>   319 F:  include/linux/acpi.h
>   320 F:  include/linux/fwnode.h
>   321 F:  include/acpi/
>   322 F:  Documentation/acpi/
>   323 F:  Documentation/ABI/testing/sysfs-bus-acpi
>   324 F:  Documentation/ABI/testing/configfs-acpi
>   325 F:  drivers/pci/*acpi*
>   326 F:  drivers/pci/*/*acpi*
> -->   327 F:  drivers/pci/*/*/*acpi*
>   328 F:  tools/power/acpi/

My proposal to fix this:

commit a99051c0d3c59fd259fd76a8bbd9837b76b509d9
Author: Bjorn Helgaas 
Date:   Fri Sep 28 17:34:21 2018 -0500

MAINTAINERS: Remove obsolete drivers/pci pattern from ACPI section

Prior to 256a45937093 ("PCI/AER: Squash aerdrv_acpi.c into aerdrv.c"),
drivers/pci/pcie/aer/aerdrv_acpi.c contained code to parse the ACPI HEST
table.  That code now lives in drivers/pci/pcie/aer.c.

Remove the "F: drivers/pci/*/*/*acpi*" pattern because it matches nothing.

We could add a "F: drivers/pci/pcie/aer.c" pattern to the ACPI APEI
section, but that file sees a lot of changes, almost none of which are of
interest to the ACPI folks.

Signed-off-by: Bjorn Helgaas 

diff --git a/MAINTAINERS b/MAINTAINERS
index 700408b7bc53..9babd8a0406b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -324,7 +324,6 @@ F:  Documentation/ABI/testing/sysfs-bus-acpi
 F: Documentation/ABI/testing/configfs-acpi
 F: drivers/pci/*acpi*
 F: drivers/pci/*/*acpi*
-F: drivers/pci/*/*/*acpi*
 F: tools/power/acpi/
 
 ACPI APEI


Re: [PATCH v3] PCI: Reprogram bridge prefetch registers on resume

2018-09-27 Thread Bjorn Helgaas
[+cc LKML]

On Tue, Sep 18, 2018 at 04:32:44PM -0500, Bjorn Helgaas wrote:
> On Thu, Sep 13, 2018 at 11:37:45AM +0800, Daniel Drake wrote:
> > On 38+ Intel-based Asus products, the nvidia GPU becomes unusable
> > after S3 suspend/resume. The affected products include multiple
> > generations of nvidia GPUs and Intel SoCs. After resume, nouveau logs
> > many errors such as:
> > 
> > fifo: fault 00 [READ] at 00555000 engine 00 [GR] client 04
> >   [HUB/FE] reason 4a [] on channel -1 [007fa91000 unknown]
> > DRM: failed to idle channel 0 [DRM]
> > 
> > Similarly, the nvidia proprietary driver also fails after resume
> > (black screen, 100% CPU usage in Xorg process). We shipped a sample
> > to Nvidia for diagnosis, and their response indicated that it's a
> > problem with the parent PCI bridge (on the Intel SoC), not the GPU.
> > 
> > Runtime suspend/resume works fine, only S3 suspend is affected.
> > 
> > We found a workaround: on resume, rewrite the Intel PCI bridge
> > 'Prefetchable Base Upper 32 Bits' register (PCI_PREF_BASE_UPPER32). In
> > the cases that I checked, this register has value 0 and we just have to
> > rewrite that value.
> > 
> > Linux already saves and restores PCI config space during suspend/resume,
> > but this register was being skipped because upon resume, it already
> > has value 0 (the correct, pre-suspend value).
> > 
> > Intel appear to have previously acknowledged this behaviour and the
> > requirement to rewrite this register.
> > https://bugzilla.kernel.org/show_bug.cgi?id=116851#c23
> > 
> > Based on that, rewrite the prefetch register values even when that
> > appears unnecessary.
> > 
> > We have confirmed this solution on all the affected models we have
> > in-hands (X542UQ, UX533FD, X530UN, V272UN).
> > 
> > Additionally, this solves an issue where r8169 MSI-X interrupts were
> > broken after S3 suspend/resume on Asus X441UAR. This issue was recently
> > worked around in commit 7bb05b85bc2d ("r8169: don't use MSI-X on
> > RTL8106e"). It also fixes the same issue on RTL6186evl/8111evl on an
> > Aimfor-tech laptop that we had not yet patched. I suspect it will also
> > fix the issue that was worked around in commit 7c53a722459c ("r8169:
> > don't use MSI-X on RTL8168g").
> > 
> > Thomas Martitz reports that this change also solves an issue where
> > the AMD Radeon Polaris 10 GPU on the HP Zbook 14u G5 is unresponsive
> > after S3 suspend/resume.
> > 
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=201069
> > Signed-off-by: Daniel Drake 
> 
> Applied with Rafael's and Peter's reviewed-by to pci/enumeration for v4.20.
> Thanks for the huge investigative effort!

Since this looks low-risk and fixes several painful issues, I think
this merits a stable tag and being included in v4.19 (instead of
waiting for v4.20).  

I moved it to for-linus for v4.19.  Let me know if you object.

> > ---
> >  drivers/pci/pci.c | 25 +
> >  1 file changed, 17 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 29ff9619b5fa..5d58220b6997 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -1289,12 +1289,12 @@ int pci_save_state(struct pci_dev *dev)
> >  EXPORT_SYMBOL(pci_save_state);
> >  
> >  static void pci_restore_config_dword(struct pci_dev *pdev, int offset,
> > -u32 saved_val, int retry)
> > +u32 saved_val, int retry, bool force)
> >  {
> > u32 val;
> >  
> > pci_read_config_dword(pdev, offset, &val);
> > -   if (val == saved_val)
> > +   if (!force && val == saved_val)
> > return;
> >  
> > for (;;) {
> > @@ -1313,25 +1313,34 @@ static void pci_restore_config_dword(struct pci_dev *pdev, int offset,
> >  }
> >  
> >  static void pci_restore_config_space_range(struct pci_dev *pdev,
> > -  int start, int end, int retry)
> > +  int start, int end, int retry,
> > +  bool force)
> >  {
> > int index;
> >  
> > for (index = end; index >= start; index--)
> > pci_restore_config_dword(pdev, 4 * index,
> >  pdev->saved_config_space[index],
> > -retry);
> > +retry, force);
> >  }
> >  
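
The effect of the new "force" argument can be sketched with a simulated register. The `fake_reg` type and `restore_dword()` helper are hypothetical; the real function also retries the write, which is omitted here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simulated config register that counts writes. */
struct fake_reg {
	uint32_t val;
	unsigned int writes;
};

/*
 * Restore a saved value.  Without force, the write is skipped when the
 * register already reads back the saved value; with force, it is always
 * rewritten.
 */
static void restore_dword(struct fake_reg *reg, uint32_t saved_val, bool force)
{
	if (!force && reg->val == saved_val)
		return;

	reg->val = saved_val;
	reg->writes++;
}
```

A register that reads 0 and has a saved value of 0 is left alone without force but is rewritten with it, which is the case the changelog describes for PCI_PREF_BASE_UPPER32.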

[GIT PULL] PCI fixes for v4.19

2018-09-27 Thread Bjorn Helgaas
PCI fixes:

  - Fix ACPI hotplug issue that causes black screen crash at boot (Mika
Westerberg)

  - Fix DesignWare "scheduling while atomic" issues (Jisheng Zhang)

  - Add PPC contacts to MAINTAINERS for PCI core error handling (Bjorn
    Helgaas)

  - Sort Mobiveil MAINTAINERS entry (Lorenzo Pieralisi)


The following changes since commit 7876320f88802b22d4e2daf7eb027dd14175a0f8:

  Linux 4.19-rc4 (2018-09-16 11:52:37 -0700)

are available in the Git repository at:

  ssh://g...@gitolite.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git tags/pci-v4.19-fixes-2

for you to fetch changes up to f188b99f0b2d33794b4af8a225f95d1e968c0a3f:

  ACPI / hotplug / PCI: Don't scan for non-hotplug bridges if slot is not bridge (2018-09-26 15:39:28 -0500)


pci-v4.19-fixes-2

--------
Bjorn Helgaas (1):
  MAINTAINERS: Update PPC contacts for PCI core error handling

Jisheng Zhang (1):
  PCI: dwc: Fix scheduling while atomic issues

Lorenzo Pieralisi (1):
  MAINTAINERS: Move mobiveil PCI driver entry where it belongs

Mika Westerberg (1):
  ACPI / hotplug / PCI: Don't scan for non-hotplug bridges if slot is not bridge

 MAINTAINERS  | 20 +---
 drivers/pci/controller/dwc/pcie-designware.c |  8 
 drivers/pci/controller/dwc/pcie-designware.h |  3 +--
 drivers/pci/hotplug/acpiphp_glue.c   | 11 ++-
 4 files changed, 24 insertions(+), 18 deletions(-)


Re: [PATCH v4 1/6] mpt3sas: Introduce mpt3sas_base_pci_device_is_available

2018-09-27 Thread Bjorn Helgaas
[+cc Ben, Russell, Sam, Oliver in case they have pertinent experience from
powerpc error handling; thread begins at
https://lore.kernel.org/linux-pci/1537935759-14754-1-git-send-email-suganath-prabu.subram...@broadcom.com/]

On Thu, Sep 27, 2018 at 09:03:27AM +0200, Lukas Wunner wrote:
> On Wed, Sep 26, 2018 at 04:32:41PM -0500, Bjorn Helgaas wrote:
> > On Wed, Sep 26, 2018 at 09:52:34AM +0530, Suganath Prabu S wrote:
> > > @@ -6853,6 +6872,13 @@ mpt3sas_wait_for_commands_to_complete(struct MPT3SAS_ADAPTER *ioc)
> > >  
> > >   ioc->pending_io_count = 0;
> > >  
> > > + if (!mpt3sas_base_pci_device_is_available(ioc)) {
> > > + pr_err(MPT3SAS_FMT
> > > + "%s: pci error recovery reset or pci device unplug occurred\n",
> > > + ioc->name, __func__);
> > > + return;
> > > + }
> > > +
> > >   ioc_state = mpt3sas_base_get_iocstate(ioc, 0);
> > 
> > This is a good example of why I don't like pci_device_is_present(): it
> > is fundamentally racy and gives a false sense of security.  Here we
> > *think* we're making the code safer, but in fact we could have this
> > sequence:
> > 
> >   mpt3sas_base_pci_device_is_available()# returns true
> >   # device is removed
> >   ioc_state = mpt3sas_base_get_iocstate()
> > 
> > In this case the readl() inside mpt3sas_base_get_iocstate() will
> > probably return 0xffffffff data, and we assume that's valid and
> > continue on our merry way, pretending that "ioc_state" makes sense
> > when it really doesn't.
> 
> The function does the following:
> 
>   ioc_state = mpt3sas_base_get_iocstate(ioc, 0);
>   if ((ioc_state & MPI2_IOC_STATE_MASK) != MPI2_IOC_STATE_OPERATIONAL)
>   return;
> 
> where MPI2_IOC_STATE_MASK is 0xF0000000 and MPI2_IOC_STATE_OPERATIONAL
> is 0x20000000.  If the device is removed after the call to
> mpt3sas_base_pci_device_is_available(), the result of the bitwise "and"
> operation would be 0xF0000000, which is unequal to 0x20000000.
> Hence this looks safe.

I agree this particular case is technically safe, but figuring that
out requires an unreasonable amount of analysis.  And there's no hint
in the code that we need to be concerned about whether the readl()
returns valid data, so the need for the analysis won't even occur to
most readers.

I don't feel good about encouraging this style of adding an explicit
test for whether the device is available, followed by a completely
implicit test that accidentally happens to correctly handle a device
that was removed after the explicit test.

If we instead added a test for ~0 after the readl(), we would avoid
the race and give the reader a clue that *any* read from the device
can potentially fail without advance warning.
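
To make that concrete, here is a rough user-space sketch of the pattern
(not kernel code): "struct fake_dev", fake_readl() and read_ioc_state()
are made-up names standing in for the device, readl() and the driver's
state read.  It assumes all ones is never a legitimate value for the
register in question.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a device; not a kernel structure. */
struct fake_dev {
	bool present;      /* false once the device has been removed */
	uint32_t doorbell; /* register contents while the device is present */
};

/*
 * Stand-in for readl(): a read from a device that is gone completes
 * with all bits set, which is what PCI reads typically return then.
 */
static uint32_t fake_readl(const struct fake_dev *dev)
{
	return dev->present ? dev->doorbell : UINT32_MAX;
}

/*
 * Read the register and check the value itself instead of asking
 * "is the device present?" beforehand.  Returns 0 and stores the
 * value on success, -1 if the read returned all ones.
 */
static int read_ioc_state(const struct fake_dev *dev, uint32_t *state)
{
	uint32_t val = fake_readl(dev);

	if (val == UINT32_MAX)	/* ~0: device likely gone, data not valid */
		return -1;

	*state = val;
	return 0;
}
```

Because the check sits after the read, there is no window between "test"
and "use" in which a removal can go unnoticed.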

> I agree that pci_device_is_present() (and the pci_dev_is_disconnected()
> it calls) must be used judiciously, but here it seems to have been done
> correctly.
> 
> One thing to be aware of is that a return value of "true" from
> pci_dev_is_disconnected() is definitive and can be trusted.
> On the other hand a return value of "false" is more like a fuzzy
> "likely not disconnected, but can't give any guarantees".
> So the boolean return value is kind of the problem here.
> Boolean logic doesn't really fit these "definitive if true,
> not definitive if false" semantics.
> 
> However being able to get the definitive answer in the disconnected
> case is valuable:  pciehp is the only entity that can determine
> surprise removal authoritatively and unambiguously (albeit with
> a latency).  All the other tools that we have at our disposal don't
> have that quality:  E.g. checking the Vendor ID is ambiguous because
> it returns a valid value if a device was quickly replaced with another
> one.  Also, all ones may be returned in the case of an Uncorrectable
> Error, but the device may revert to valid responses if the error can
> be recovered.  (Please correct me if I'm wrong.)

I think everything you said above is true, but I'm not yet convinced
that it's being applied usefully in mpt3sas.

  bool pci_dev_is_disconnected(pdev)   # "true" is definitive
  {
    return test_bit(PCI_DEV_DISCONNECTED, &pdev->priv_flags);
  }

  bool pci_device_is_present(pdev) # "false" is definitive
  {
    if (pci_dev_is_disconnected(pdev))
      return false;
    return pci_bus_read_dev_vendor_id(...);
  }

  mpt3sas_base_pci_device_is_available(ioc)  # "false" is definitive
  {
    return !ioc->pci_error_recovery && pci_device_is_present(ioc->pdev);
  }

  mpt3sas_wait_for_commands_to_complete

[PATCH 3/3] resource: Fix find_next_iomem_res() iteration issue

2018-09-27 Thread Bjorn Helgaas
From: Bjorn Helgaas 

Previously find_next_iomem_res() used "*res" as both an input parameter for
the range to search and the type of resource to search for, and an output
parameter for the resource we found, which makes the interface confusing.

The current callers use find_next_iomem_res() incorrectly because they
allocate a single struct resource and use it for repeated calls to
find_next_iomem_res().  When find_next_iomem_res() returns a resource, it
overwrites the start, end, flags, and desc members of the struct.  If we
call find_next_iomem_res() again, we must update or restore these fields.
The previous code restored res.start and res.end, but not res.flags or
res.desc.

Since the callers did not restore res.flags, if they searched for flags
IORESOURCE_MEM | IORESOURCE_BUSY and found a resource with flags
IORESOURCE_MEM | IORESOURCE_BUSY | IORESOURCE_SYSRAM, the next search would
incorrectly skip resources unless they were also marked as
IORESOURCE_SYSRAM.

Fix this by restructuring the interface so it takes explicit "start, end,
flags" parameters and uses "*res" only as an output parameter.
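
The flags problem described above can be reproduced with a small
user-space model.  This is only a sketch: the F_* values, "struct res"
and the find_old()/find_new() helpers are invented for illustration and
search a flat array rather than the kernel's resource tree.

```c
#include <stddef.h>

/* Illustrative flag values; the kernel's IORESOURCE_* constants differ. */
#define F_MEM    0x1UL
#define F_BUSY   0x2UL
#define F_SYSRAM 0x4UL

struct res {
	unsigned long start, end, flags;
};

/* Same test as in find_next_iomem_res(): p must have every wanted flag. */
static int flags_match(unsigned long p_flags, unsigned long wanted)
{
	return (p_flags & wanted) == wanted;
}

/*
 * Old style: "res" carries the search criteria in and the result out.
 * Returns the index of the first match at or after "from", or -1.
 */
static int find_old(const struct res *tbl, size_t n, size_t from,
		    struct res *res)
{
	for (size_t i = from; i < n; i++) {
		if (!flags_match(tbl[i].flags, res->flags))
			continue;
		*res = tbl[i];	/* overwrites res->flags as well */
		return (int)i;
	}
	return -1;
}

/* New style: criteria are explicit inputs, "res" is output only. */
static int find_new(const struct res *tbl, size_t n, size_t from,
		    unsigned long flags, struct res *res)
{
	for (size_t i = from; i < n; i++) {
		if (!flags_match(tbl[i].flags, flags))
			continue;
		*res = tbl[i];
		return (int)i;
	}
	return -1;
}
```

With a table holding one MEM|BUSY|SYSRAM entry followed by one MEM|BUSY
entry, a second find_old() call that reuses the same struct misses the
second entry, while find_new() finds both.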

Original-patch: 
http://lore.kernel.org/lkml/20180921073211.20097-2-liji...@redhat.com
Based-on-patch-by: Lianbo Jiang 
Signed-off-by: Bjorn Helgaas 
---
 kernel/resource.c |   94 +++--
 1 file changed, 41 insertions(+), 53 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index 155ec873ea4d..9891ea90cc8f 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -319,23 +319,26 @@ int release_resource(struct resource *old)
 EXPORT_SYMBOL(release_resource);
 
 /*
- * Finds the lowest iomem resource existing within [res->start..res->end].
- * The caller must specify res->start, res->end, res->flags, and optionally
- * desc.  If found, returns 0, res is overwritten, if not found, returns -1.
- * This function walks the whole tree and not just first level children until
- * and unless first_level_children_only is true.
+ * Finds the lowest iomem resource that covers part of [start..end].  The
+ * caller must specify start, end, flags, and desc (which may be
+ * IORES_DESC_NONE).
+ *
+ * If a resource is found, returns 0 and *res is overwritten with the part
+ * of the resource that's within [start..end]; if none is found, returns
+ * -1.
+ *
+ * This function walks the whole tree and not just first level children
+ * unless first_level_children_only is true.
  */
-static int find_next_iomem_res(struct resource *res, unsigned long desc,
-  bool first_level_children_only)
+static int find_next_iomem_res(resource_size_t start, resource_size_t end,
+  unsigned long flags, unsigned long desc,
+  bool first_level_children_only,
+  struct resource *res)
 {
-   resource_size_t start, end;
struct resource *p;
bool sibling_only = false;
 
BUG_ON(!res);
-
-   start = res->start;
-   end = res->end;
BUG_ON(start >= end);
 
if (first_level_children_only)
@@ -344,7 +347,7 @@ static int find_next_iomem_res(struct resource *res, unsigned long desc,
    read_lock(&resource_lock);
 
for (p = iomem_resource.child; p; p = next_resource(p, sibling_only)) {
-   if ((p->flags & res->flags) != res->flags)
+   if ((p->flags & flags) != flags)
continue;
if ((desc != IORES_DESC_NONE) && (desc != p->desc))
continue;
@@ -359,32 +362,31 @@ static int find_next_iomem_res(struct resource *res, unsigned long desc,
    read_unlock(&resource_lock);
if (!p)
return -1;
+
/* copy data */
-   if (res->start < p->start)
-   res->start = p->start;
-   if (res->end > p->end)
-   res->end = p->end;
+   res->start = max(start, p->start);
+   res->end = min(end, p->end);
res->flags = p->flags;
res->desc = p->desc;
return 0;
 }
 
-static int __walk_iomem_res_desc(struct resource *res, unsigned long desc,
-bool first_level_children_only,
-void *arg,
+static int __walk_iomem_res_desc(resource_size_t start, resource_size_t end,
+unsigned long flags, unsigned long desc,
+bool first_level_children_only, void *arg,
 int (*func)(struct resource *, void *))
 {
-   u64 orig_end = res->end;
+   struct resource res;
int ret = -1;
 
-   while ((res->start < res->end) &&
-  !find_next_iomem_res(res, desc, first_level_children_only)) {
-   ret = (*func)(res, arg);
+   while (start < end &

[PATCH 2/3] resource: Include resource end in walk_*() interfaces

2018-09-27 Thread Bjorn Helgaas
From: Bjorn Helgaas 

find_next_iomem_res() finds an iomem resource that covers part of a range
described by "start, end".  All callers expect that range to be inclusive,
i.e., both start and end are included, but find_next_iomem_res() doesn't
handle the end address correctly.

If it finds an iomem resource that contains exactly the end address, it
skips it, e.g., if "start, end" is [0x0-0x1] and there happens to be an
iomem resource [mem 0x1-0x1] (the single byte at 0x1), we skip
it:

  find_next_iomem_res(...)
  {
start = 0x0;
end = 0x1;
for (p = next_resource(...)) {
  # p->start = 0x1;
  # p->end = 0x1;
  # we *should* return this resource, but this condition is false:
  if ((p->end >= start) && (p->start < end))
break;

Adjust find_next_iomem_res() so it allows a resource that includes the
single byte at the end of the range.  This is a corner case that we
probably don't see in practice.
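
The difference between the two conditions can be checked in isolation.
The sketch below is user-space C with made-up helper names
(overlaps_old(), overlaps_new()); the addresses in the note after it are
illustrative and not taken from any real resource tree.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Old check: treats "end" as exclusive, so a resource that starts
 * exactly at "end" is skipped.
 */
static bool overlaps_old(uint64_t p_start, uint64_t p_end,
			 uint64_t start, uint64_t end)
{
	return p_end >= start && p_start < end;
}

/* Fixed check: both [p_start..p_end] and [start..end] are inclusive. */
static bool overlaps_new(uint64_t p_start, uint64_t p_end,
			 uint64_t start, uint64_t end)
{
	return p_end >= start && p_start <= end;
}
```

For a search range of [0x0..0x1000] and a one-byte resource at 0x1000,
overlaps_old() is false and overlaps_new() is true; for resources wholly
inside or wholly past the range the two agree.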

Fixes: 58c1b5b07907 ("[PATCH] memory hotadd fixes: find_next_system_ram catch range fix")
Signed-off-by: Bjorn Helgaas 
---
 kernel/resource.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index 30e1bc68503b..155ec873ea4d 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -319,7 +319,7 @@ int release_resource(struct resource *old)
 EXPORT_SYMBOL(release_resource);
 
 /*
- * Finds the lowest iomem resource existing within [res->start.res->end).
+ * Finds the lowest iomem resource existing within [res->start..res->end].
  * The caller must specify res->start, res->end, res->flags, and optionally
  * desc.  If found, returns 0, res is overwritten, if not found, returns -1.
  * This function walks the whole tree and not just first level children until
@@ -352,7 +352,7 @@ static int find_next_iomem_res(struct resource *res, 
unsigned long desc,
p = NULL;
break;
}
-   if ((p->end >= start) && (p->start < end))
+   if ((p->end >= start) && (p->start <= end))
break;
}
 



[PATCH 1/3] x86/kexec: Correct KEXEC_BACKUP_SRC_END off-by-one error

2018-09-27 Thread Bjorn Helgaas
From: Bjorn Helgaas 

The only use of KEXEC_BACKUP_SRC_END is as an argument to
walk_system_ram_res():

  int crash_load_segments(struct kimage *image)
  {
...
walk_system_ram_res(KEXEC_BACKUP_SRC_START, KEXEC_BACKUP_SRC_END,
image, determine_backup_region);

walk_system_ram_res() expects "start, end" arguments that are inclusive,
i.e., the range to be walked includes both the start and end addresses.

KEXEC_BACKUP_SRC_END was previously defined as (640 * 1024UL), which is the
first address *past* the desired 0-640KB range.

Define KEXEC_BACKUP_SRC_END as (640 * 1024UL - 1) so the KEXEC_BACKUP_SRC
region is [0-0x9ffff], not [0-0xa0000].
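
The arithmetic can be checked with a few lines of user-space C.  The
macro values mirror the patch; the BACKUP_SRC_* names and
inclusive_size() are only for illustration.

```c
/* Values mirror the patch; the names are shortened for illustration. */
#define BACKUP_SRC_START   (0UL)
#define BACKUP_SRC_END_OLD (640 * 1024UL)      /* first byte past 640K */
#define BACKUP_SRC_END_NEW (640 * 1024UL - 1)  /* last byte below 640K */

/* Size of an inclusive [start..end] range, as the walk_*() callers see it. */
static unsigned long inclusive_size(unsigned long start, unsigned long end)
{
	return end - start + 1;
}
```

With the old value the inclusive range covers 640K plus one byte; with
the new value it covers exactly 640K.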

Fixes: dd5f726076cc ("kexec: support for kexec on panic using new system call")
Signed-off-by: Bjorn Helgaas 
---
 arch/x86/include/asm/kexec.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index f327236f0fa7..5125fca472bb 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -67,7 +67,7 @@ struct kimage;
 
 /* Memory to backup during crash kdump */
 #define KEXEC_BACKUP_SRC_START (0UL)
-#define KEXEC_BACKUP_SRC_END   (640 * 1024UL)  /* 640K */
+#define KEXEC_BACKUP_SRC_END   (640 * 1024UL - 1)  /* 640K */
 
 /*
  * CPU does not save ss and sp on stack if execution is already



[PATCH 0/3] find_next_iomem_res() fixes

2018-09-27 Thread Bjorn Helgaas
These fix:

  - A kexec off-by-one error that we probably never see in practice (only
affects a resource starting at exactly 640KB).

  - A corner case in walk_iomem_res_desc(), walk_system_ram_res(), etc that
we probably also never see in practice (only affects a single byte
resource at the very end of the region we're searching)

  - An issue in walk_iomem_res_desc() that apparently causes a kdump issue
(see Lianbo's note at [1]).

I think we need to fix the kdump issue either by these patches
(specifically the last one) or by the patch Lianbo posted [2].

I'm hoping to avoid deciding and merging these myself, but I'm not
sure who really wants to own kernel/resource.c.

[1] https://lore.kernel.org/lkml/01551d06-c421-5df3-b19f-fc66f3639...@redhat.com
[2] https://lore.kernel.org/lkml/20180921073211.20097-2-liji...@redhat.com

---

Bjorn Helgaas (3):
  x86/kexec: Correct KEXEC_BACKUP_SRC_END off-by-one error
  resource: Include resource end in walk_*() interfaces
  resource: Fix find_next_iomem_res() iteration issue


 arch/x86/include/asm/kexec.h |2 -
 kernel/resource.c|   96 ++
 2 files changed, 43 insertions(+), 55 deletions(-)


Re: [PATCH 3/3] resource: Fix find_next_iomem_res() iteration issue

2018-09-27 Thread Bjorn Helgaas
On Thu, Sep 27, 2018 at 01:27:41PM +0800, lijiang wrote:
> On 2018-09-25 06:15, Bjorn Helgaas wrote:
> > From: Bjorn Helgaas 
> > 
> > Previously find_next_iomem_res() used "*res" as both an input parameter for
> > the range to search and the type of resource to search for, and an output
> > parameter for the resource we found, which makes the interface confusing
> > and hard to use correctly.
> > 
> > All callers allocate a single struct resource and use it for repeated calls
> > to find_next_iomem_res().  When find_next_iomem_res() returns a resource,
> > it overwrites the start, end, flags, and desc members of the struct.  If we
> > call find_next_iomem_res() again, we must update or restore these fields.
> > 
> > The callers (__walk_iomem_res_desc() and walk_system_ram_range()) do not
> > restore res->flags, so if the caller is searching for flags of
> > IORESOURCE_MEM | IORESOURCE_BUSY and finds a resource with flags of
> > IORESOURCE_MEM | IORESOURCE_BUSY | IORESOURCE_SYSRAM, the next search will
> > find only resources marked as IORESOURCE_SYSRAM.
> > 
> > Fix this by restructuring the interface so it takes explicit "start, end,
> > flags" parameters and uses "*res" only as an output parameter.
> 
> Hi, Bjorn
> I'd suggest adding some comments to the code to make it clear and easy to
> understand; that could resolve the old confusion without further code changes.

Since I think the current interface (using *res as both input and
output parameters that have very different meanings) is confusing,
it's hard for *me* to write comments that make it less confusing, but
of course, you're welcome to propose something.

My opinion (probably not universally shared) is that my proposal would
make the code more readable, and it's worth doing even though the diff
is larger.

Anyway, I'll post these patches independently and see if anybody else
has an opinion.

Bjorn

> > Original-patch: 
> > http://lore.kernel.org/lkml/20180921073211.20097-2-liji...@redhat.com
> > Based-on-patch-by: Lianbo Jiang 
> > Signed-off-by: Bjorn Helgaas 
> > ---
> >  kernel/resource.c |   94 
> > +++--
> >  1 file changed, 41 insertions(+), 53 deletions(-)
> > 
> > diff --git a/kernel/resource.c b/kernel/resource.c
> > index 155ec873ea4d..9891ea90cc8f 100644
> > --- a/kernel/resource.c
> > +++ b/kernel/resource.c
> > @@ -319,23 +319,26 @@ int release_resource(struct resource *old)
> >  EXPORT_SYMBOL(release_resource);
> >  
> >  /*
> > - * Finds the lowest iomem resource existing within [res->start..res->end].
> > - * The caller must specify res->start, res->end, res->flags, and optionally
> > - * desc.  If found, returns 0, res is overwritten, if not found, returns -1.
> > - * This function walks the whole tree and not just first level children until
> > - * and unless first_level_children_only is true.
> > + * Finds the lowest iomem resource that covers part of [start..end].  The
> > + * caller must specify start, end, flags, and desc (which may be
> > + * IORES_DESC_NONE).
> > + *
> > + * If a resource is found, returns 0 and *res is overwritten with the part
> > + * of the resource that's within [start..end]; if none is found, returns
> > + * -1.
> > + *
> > + * This function walks the whole tree and not just first level children
> > + * unless first_level_children_only is true.
> >   */
> > -static int find_next_iomem_res(struct resource *res, unsigned long desc,
> > -  bool first_level_children_only)
> > +static int find_next_iomem_res(resource_size_t start, resource_size_t end,
> > +  unsigned long flags, unsigned long desc,
> > +  bool first_level_children_only,
> > +  struct resource *res)
> >  {
> > -   resource_size_t start, end;
> > struct resource *p;
> > bool sibling_only = false;
> >  
> > BUG_ON(!res);
> > -
> > -   start = res->start;
> > -   end = res->end;
> > BUG_ON(start >= end);
> >  
> > if (first_level_children_only)
> > @@ -344,7 +347,7 @@ static int find_next_iomem_res(struct resource *res, unsigned long desc,
> >     read_lock(&resource_lock);
> >  
> > for (p = iomem_resource.child; p; p = next_resource(p, sibling_only)) {
> > -   if ((p->flags & res->flags) != res->flags)
> > +   if ((p->flags & flags) != flags)
> >   

Re: [PATCH v3] PCI: Equalize hotplug memory and io for non/occupied slots

2018-09-26 Thread Bjorn Helgaas
On Tue, Sep 25, 2018 at 12:39:06PM -0600, Jon Derrick wrote:
> Currently, a hotplug bridge will be given hpmemsize additional memory
> and hpiosize additional io if available, in order to satisfy any future
> hotplug allocation requirements.
> 
> These calculations don't consider the current memory/io size of the
> hotplug bridge/slot, so hotplug bridges/slots which have downstream
> devices will be allocated their current allocation in addition to the
> hpmemsize value.
> 
> This makes for possibly undesirable results with a mix of unoccupied and
> occupied slots (ex, with hpmemsize=2M):
> 
> 02:03.0 PCI bridge: <-- Occupied
>   Memory behind bridge: d6200000-d64fffff [size=3M]
> 02:04.0 PCI bridge: <-- Unoccupied
>   Memory behind bridge: d6500000-d66fffff [size=2M]
> 
> This change considers the current allocation size when using the
> hpmemsize/hpiosize parameters to make the reservations predictable for
> the mix of unoccupied and occupied slots:
> 
> 02:03.0 PCI bridge: <-- Occupied
>   Memory behind bridge: d6200000-d63fffff [size=2M]
> 02:04.0 PCI bridge: <-- Unoccupied
>   Memory behind bridge: d6400000-d65fffff [size=2M]
> 
> Signed-off-by: Jon Derrick 

Applied to pci/hotplug for v4.20, thanks!
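
For readers following the size calculation, here is a simplified
user-space sketch of the before/after memory sizing.  It leaves out the
min_size and old_size handling of calculate_memsize(), and align_up(),
max_size(), memsize_old() and memsize_new() are illustrative names, not
kernel functions.

```c
#include <stdint.h>

typedef uint64_t resource_size_t;

#define MB (1024ULL * 1024ULL)

/* Round x up to a multiple of a; a must be a power of two. */
static resource_size_t align_up(resource_size_t x, resource_size_t a)
{
	return (x + a - 1) & ~(a - 1);
}

static resource_size_t max_size(resource_size_t a, resource_size_t b)
{
	return a > b ? a : b;
}

/* Before: the hotplug reservation is added on top of the current size. */
static resource_size_t memsize_old(resource_size_t size,
				   resource_size_t add_size,
				   resource_size_t align)
{
	return align_up(size + add_size, align);
}

/* After: the hotplug reservation acts as a floor for the current size. */
static resource_size_t memsize_new(resource_size_t size,
				   resource_size_t add_size,
				   resource_size_t children_add_size,
				   resource_size_t align)
{
	return align_up(max_size(size, add_size) + children_add_size, align);
}
```

Assuming an occupied slot that currently needs 1M, with hpmemsize=2M and
1M alignment, the old calculation yields 3M and the new one 2M, matching
the bridge windows quoted in the commit message; an unoccupied slot gets
2M either way.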

> ---
> v2->v3: Made the IO and mem size calculations nearly equivalent
> 
>  drivers/pci/setup-bus.c | 28 +++-
>  1 file changed, 15 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index 79b1824..ed96043 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -811,6 +811,8 @@ static struct resource *find_free_bus_resource(struct 
> pci_bus *bus,
>  static resource_size_t calculate_iosize(resource_size_t size,
>   resource_size_t min_size,
>   resource_size_t size1,
> + resource_size_t add_size,
> + resource_size_t children_add_size,
>   resource_size_t old_size,
>   resource_size_t align)
>  {
> @@ -823,15 +825,18 @@ static resource_size_t calculate_iosize(resource_size_t 
> size,
>  #if defined(CONFIG_ISA) || defined(CONFIG_EISA)
>   size = (size & 0xff) + ((size & ~0xffUL) << 2);
>  #endif
> - size = ALIGN(size + size1, align);
> + size = size + size1;
>   if (size < old_size)
>   size = old_size;
> +
> + size = ALIGN(max(size, add_size) + children_add_size, align);
>   return size;
>  }
>  
>  static resource_size_t calculate_memsize(resource_size_t size,
>   resource_size_t min_size,
> - resource_size_t size1,
> + resource_size_t add_size,
> + resource_size_t children_add_size,
>   resource_size_t old_size,
>   resource_size_t align)
>  {
> @@ -841,7 +846,8 @@ static resource_size_t calculate_memsize(resource_size_t 
> size,
>   old_size = 0;
>   if (size < old_size)
>   size = old_size;
> - size = ALIGN(size + size1, align);
> +
> + size = ALIGN(max(size, add_size) + children_add_size, align);
>   return size;
>  }
>  
> @@ -930,12 +936,10 @@ static void pbus_size_io(struct pci_bus *bus, 
> resource_size_t min_size,
>   }
>   }
>  
> - size0 = calculate_iosize(size, min_size, size1,
> + size0 = calculate_iosize(size, min_size, size1, 0, 0,
>   resource_size(b_res), min_align);
> - if (children_add_size > add_size)
> - add_size = children_add_size;
> - size1 = (!realloc_head || (realloc_head && !add_size)) ? size0 :
> - calculate_iosize(size, min_size, add_size + size1,
> + size1 = (!realloc_head || (realloc_head && !add_size && 
> !children_add_size)) ? size0 :
> + calculate_iosize(size, min_size, size1, add_size, 
> children_add_size,
>   resource_size(b_res), min_align);
>   if (!size0 && !size1) {
>   if (b_res->start || b_res->end)
> @@ -1079,12 +1083,10 @@ static int pbus_size_mem(struct pci_bus *bus, 
> unsigned long mask,
>  
>   min_align = calculate_mem_align(aligns, max_order);
>   min_align = max(min_align, window_alignment(bus, b_res->flags));
> - size0 = calculate_memsize(size, min_size, 0, resource_size(b_res), 
> min_align);
> + size0 = calculate_memsize(size, min_size, 0, 0, resource_size(b_res), 
> min_align);
>   add_align = max(min_align, add_align);
> - if (children_add_size > add_size)
> - add_size = children_add_size;
> - size1 = (!realloc_head || (realloc_head && !add_size)) ? size0 :
> - calculate_memsize(size, min_size, add_size,
> + size1 = (!realloc_head || (realloc_head && !add_size && 
> !children_add_size)) ? size0 :
> + calculate_memsize(size, min_size, add_size, children_add_size,
>   resource_size(b_res), add_align);
>   if (!size0 && !size1) {
>   if (b_res->start || b_res->end)
> -- 
> 1.8.3.1
> 


Re: [PATCH] PCI/AER: Clear uncorrectable error status for device

2018-09-26 Thread Bjorn Helgaas
[+cc Sinan, LKML]

On Tue, Sep 18, 2018 at 04:20:29AM -0400, Oza Pawandeep wrote:
> PCI-based device drivers handle ERR_NONFATAL by registering
> pci_error_handlers. Some of the drivers clear the AER uncorrectable status
> in slot_reset, while others do so in resume.
> 
> Drivers should not be responsible for clearing the AER status; instead, that
> should be done by the error recovery framework defined in err.c.

Agreed, and Keith's patch 43c9a34fe04e ("PCI/ERR: Always use the first
downstream port") [1], which is queued on pci/hotplug for v4.20, does
call pci_cleanup_aer_uncorrect_error_status() at the end of
pcie_do_recovery().

1) Does that seem like the right place?

2) I guess all we need now would be to remove the calls from the
   drivers?

3) If we remove all the calls from the drivers, we should remove the
   declaration from include/linux/aer.h, too.

I can take care of these updates if we agree they're the right thing
to do.

[1] http://git.kernel.org/cgit/linux/kernel/git/helgaas/pci.git/commit/?id=43c9a34fe04e

> Clear the status while resuming, after reset_link was successful.
> 
> Signed-off-by: Oza Pawandeep 
> 
> diff --git a/drivers/crypto/qat/qat_common/adf_aer.c 
> b/drivers/crypto/qat/qat_common/adf_aer.c
> index da8a2d3..61ded36 100644
> --- a/drivers/crypto/qat/qat_common/adf_aer.c
> +++ b/drivers/crypto/qat/qat_common/adf_aer.c
> @@ -198,7 +198,6 @@ static pci_ers_result_t adf_slot_reset(struct pci_dev 
> *pdev)
>   pr_err("QAT: Can't find acceleration device\n");
>   return PCI_ERS_RESULT_DISCONNECT;
>   }
> - pci_cleanup_aer_uncorrect_error_status(pdev);
>   if (adf_dev_aer_schedule_reset(accel_dev, ADF_DEV_RESET_SYNC))
>   return PCI_ERS_RESULT_DISCONNECT;
>  
> diff --git a/drivers/dma/ioat/init.c b/drivers/dma/ioat/init.c
> index 4fa4c06..80c475f 100644
> --- a/drivers/dma/ioat/init.c
> +++ b/drivers/dma/ioat/init.c
> @@ -1267,12 +1267,6 @@ static pci_ers_result_t 
> ioat_pcie_error_slot_reset(struct pci_dev *pdev)
>   pci_wake_from_d3(pdev, false);
>   }
>  
> - err = pci_cleanup_aer_uncorrect_error_status(pdev);
> - if (err) {
> - dev_err(&pdev->dev,
> - "AER uncorrect error status clear failed: %#x\n", err);
> - }
> -
>   return result;
>  }
>  
> diff --git a/drivers/infiniband/hw/hfi1/pcie.c 
> b/drivers/infiniband/hw/hfi1/pcie.c
> index baf7c32..38bc804 100644
> --- a/drivers/infiniband/hw/hfi1/pcie.c
> +++ b/drivers/infiniband/hw/hfi1/pcie.c
> @@ -655,7 +655,6 @@ pci_resume(struct pci_dev *pdev)
>   struct hfi1_devdata *dd = pci_get_drvdata(pdev);
>  
>   dd_dev_info(dd, "HFI1 resume function called\n");
> - pci_cleanup_aer_uncorrect_error_status(pdev);
>   /*
>* Running jobs will fail, since it's asynchronous
>* unlike sysfs-requested reset.   Better than
> diff --git a/drivers/infiniband/hw/qib/qib_pcie.c 
> b/drivers/infiniband/hw/qib/qib_pcie.c
> index 5ac7b31..30595b3 100644
> --- a/drivers/infiniband/hw/qib/qib_pcie.c
> +++ b/drivers/infiniband/hw/qib/qib_pcie.c
> @@ -597,7 +597,6 @@ qib_pci_resume(struct pci_dev *pdev)
>   struct qib_devdata *dd = pci_get_drvdata(pdev);
>  
>   qib_devinfo(pdev, "QIB resume function called\n");
> - pci_cleanup_aer_uncorrect_error_status(pdev);
>   /*
>* Running jobs will fail, since it's asynchronous
>* unlike sysfs-requested reset.   Better than
> diff --git a/drivers/net/ethernet/atheros/alx/main.c 
> b/drivers/net/ethernet/atheros/alx/main.c
> index 567ee54..0d0b6a4 100644
> --- a/drivers/net/ethernet/atheros/alx/main.c
> +++ b/drivers/net/ethernet/atheros/alx/main.c
> @@ -1960,8 +1960,6 @@ static pci_ers_result_t alx_pci_error_slot_reset(struct 
> pci_dev *pdev)
>   if (!alx_reset_mac(hw))
>   rc = PCI_ERS_RESULT_RECOVERED;
>  out:
> - pci_cleanup_aer_uncorrect_error_status(pdev);
> -
>   rtnl_unlock();
>  
>   return rc;
> diff --git a/drivers/net/ethernet/broadcom/bnx2.c 
> b/drivers/net/ethernet/broadcom/bnx2.c
> index 122fdb8..bbb2471 100644
> --- a/drivers/net/ethernet/broadcom/bnx2.c
> +++ b/drivers/net/ethernet/broadcom/bnx2.c
> @@ -8793,13 +8793,6 @@ static pci_ers_result_t bnx2_io_slot_reset(struct 
> pci_dev *pdev)
>   if (!(bp->flags & BNX2_FLAG_AER_ENABLED))
>   return result;
>  
> - err = pci_cleanup_aer_uncorrect_error_status(pdev);
> - if (err) {
> - dev_err(&pdev->dev,
> - "pci_cleanup_aer_uncorrect_error_status failed 0x%0x\n",
> -  err); /* non-fatal, continue */
> - }
> -
>   return result;
>  }
>  
> diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c 
> b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
> index 5b1ed24..cfb6c89 100644
> --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
> +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
> @@ -14379,14 +14379,6 @@ static pci_ers_result_t bnx2x_io_slot_reset(struct 
> pci_dev 

Re: linux-next: build failure after merge of the pci tree

2018-09-26 Thread Bjorn Helgaas
On Wed, Sep 26, 2018 at 9:56 AM Keith Busch  wrote:
>
> On Wed, Sep 26, 2018 at 08:25:40AM -0600, Keith Busch wrote:
> > On Wed, Sep 26, 2018 at 03:00:51PM +1000, Stephen Rothwell wrote:
> > > Hi Bjorn,
> > >
> > > After merging the pci tree, today's linux-next build (powerpc allnoconfig)
> > > failed like this:
> > >
> > > ld: drivers/pci/pci.o: in function `pci_bus_error_reset':
> > > pci.c:(.text+0x5fba): undefined reference to `pci_slot_mutex'
> > > ld: pci.c:(.text+0x5fc2): undefined reference to `pci_slot_mutex'
> > >
> > > Caused by commit
> > >
> > >   131b0ca2c7b2 ("PCI/ERR: Use slot reset if available")
> > >
> > > I have applied the following hack for today (there is probably a better
> > > way):
> >
> > Thanks for the notice. Does this mean you don't have CONFIG_SYSFS? I
> > must admit I missed that connection for building slot.c.
> >
> >
> > > From: Stephen Rothwell 
> > > Date: Wed, 26 Sep 2018 14:55:37 +1000
> > > Subject: [PATCH] pci: move pci_slot_mutex so it is available where needed
> > >
> > > Fixes: 131b0ca2c7b2 ("PCI/ERR: Use slot reset if available")
> > > Signed-off-by: Stephen Rothwell 
> > > ---
> > >  drivers/pci/pci.c  | 2 ++
> > >  drivers/pci/slot.c | 1 -
> > >  2 files changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > index 8c1e99a637d8..1fa67db6b21e 100644
> > > --- a/drivers/pci/pci.c
> > > +++ b/drivers/pci/pci.c
> > > @@ -5190,6 +5190,8 @@ static int pci_bus_reset(struct pci_bus *bus, int 
> > > probe)
> > > return ret;
> > >  }
> > >
> > > +DEFINE_MUTEX(pci_slot_mutex);
> > > +
> > >  /**
> > >   * pci_bus_error_reset - reset the bridge's subordinate bus
> > >   * @bridge: The parent device that connects to the bus to reset
> > > diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c
> > > index 3da03fcc6fbf..c46d5e1ff536 100644
> > > --- a/drivers/pci/slot.c
> > > +++ b/drivers/pci/slot.c
> > > @@ -14,7 +14,6 @@
> > >
> > >  struct kset *pci_slots_kset;
> > >  EXPORT_SYMBOL_GPL(pci_slots_kset);
> > > -DEFINE_MUTEX(pci_slot_mutex);
> > >
> > >  static ssize_t pci_slot_attr_show(struct kobject *kobj,
> > > struct attribute *attr, char *buf)
> > > --
> > > 2.18.0
>
> There's unfortunately a second bug here when there are no slots, which
> would be the case without CONFIG_SYSFS: the slot list is empty, and the
> function just returned success, but it should have gone to the default
> secondary bus reset behavior in that case. I'll send a patch shortly.

I folded in Keith's patch for this, so you should be able to drop your
workaround, Stephen.
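
For anyone reading along, the second bug Keith describes has roughly the
following shape.  This is a loose user-space model, not the code in
drivers/pci/pci.c: "struct fake_bus", "enum reset_kind" and both
functions are invented for illustration.

```c
#include <stddef.h>

/* What kind of reset was actually performed; illustrative only. */
enum reset_kind { RESET_NONE, RESET_SLOT, RESET_BUS };

/* Hypothetical bus model: only the number of slots registered on it. */
struct fake_bus {
	size_t nr_slots;
};

/*
 * Buggy shape: with no slots the loop body never runs, nothing is
 * reset, and the function still reports success (0).
 */
static int error_reset_buggy(const struct fake_bus *bus, enum reset_kind *done)
{
	*done = RESET_NONE;
	for (size_t i = 0; i < bus->nr_slots; i++)
		*done = RESET_SLOT;
	return 0;
}

/*
 * Fixed shape: an empty slot list falls back to a secondary bus reset
 * instead of silently succeeding.
 */
static int error_reset_fixed(const struct fake_bus *bus, enum reset_kind *done)
{
	if (bus->nr_slots == 0) {
		*done = RESET_BUS;
		return 0;
	}
	*done = RESET_SLOT;
	return 0;
}
```

In the model, a bus with no slots returns 0 from both functions, but only
the fixed variant has actually reset anything.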


Re: [PATCH -next] PCI: hotplug: Remove set but not used variable 'physical_slot'

2018-09-26 Thread Bjorn Helgaas
On Wed, Sep 26, 2018 at 11:06:02AM +, YueHaibing wrote:
> Fixes gcc '-Wunused-but-set-variable' warning:
> 
> drivers/pci/hotplug/cpqphp_core.c: In function 'init_SERR':
> drivers/pci/hotplug/cpqphp_core.c:124:5: warning:
>  variable 'physical_slot' set but not used [-Wunused-but-set-variable]
> 
> Signed-off-by: YueHaibing 

Applied to pci/hotplug for v4.20, thanks!

> ---
>  drivers/pci/hotplug/cpqphp_core.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/cpqphp_core.c 
> b/drivers/pci/hotplug/cpqphp_core.c
> index 95b7d60..16bbb18 100644
> --- a/drivers/pci/hotplug/cpqphp_core.c
> +++ b/drivers/pci/hotplug/cpqphp_core.c
> @@ -121,7 +121,6 @@ static int init_SERR(struct controller *ctrl)
>  {
>   u32 tempdword;
>   u32 number_of_slots;
> - u8 physical_slot;
>  
>   if (!ctrl)
>   return 1;
> @@ -131,7 +130,6 @@ static int init_SERR(struct controller *ctrl)
>   number_of_slots = readb(ctrl->hpc_reg + SLOT_MASK) & 0x0F;
>   /* Loop through slots */
>   while (number_of_slots) {
> - physical_slot = tempdword;
>   writeb(0, ctrl->hpc_reg + SLOT_SERR);
>   tempdword++;
>   number_of_slots--;
> 


[PATCH 1/3] x86/kexec: Correct KEXEC_BACKUP_SRC_END off-by-one error

2018-09-24 Thread Bjorn Helgaas
From: Bjorn Helgaas 

The only use of KEXEC_BACKUP_SRC_END is as an argument to
walk_system_ram_res():

  int crash_load_segments(struct kimage *image)
  {
...
walk_system_ram_res(KEXEC_BACKUP_SRC_START, KEXEC_BACKUP_SRC_END,
image, determine_backup_region);

walk_system_ram_res() expects "start, end" arguments that are inclusive,
i.e., the range to be walked includes both the start and end addresses.

KEXEC_BACKUP_SRC_END was previously defined as (640 * 1024UL), which is the
first address *past* the desired 0-640KB range.

Define KEXEC_BACKUP_SRC_END as (640 * 1024UL - 1) so the KEXEC_BACKUP_SRC
region is [0-0x9ffff], not [0-0xa0000].

Fixes: dd5f726076cc ("kexec: support for kexec on panic using new system call")
Signed-off-by: Bjorn Helgaas 
---
 arch/x86/include/asm/kexec.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index f327236f0fa7..5125fca472bb 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -67,7 +67,7 @@ struct kimage;
 
 /* Memory to backup during crash kdump */
 #define KEXEC_BACKUP_SRC_START (0UL)
-#define KEXEC_BACKUP_SRC_END   (640 * 1024UL)  /* 640K */
+#define KEXEC_BACKUP_SRC_END   (640 * 1024UL - 1)  /* 640K */
 
 /*
  * CPU does not save ss and sp on stack if execution is already



Re: [PATCH v6 03/13] PCI/P2PDMA: Add PCI p2pmem DMA mappings to adjust the bus offset

2018-09-21 Thread Bjorn Helgaas
On Fri, Sep 21, 2018 at 12:13:21PM -0600, Logan Gunthorpe wrote:
> On 2018-09-21 10:48 AM, Bjorn Helgaas wrote:
> >> I think the use of "map" in this context is slightly confusing because the
> >> general expectation is that map/unmap must be balanced.
> 
> Yeah, Jason said the same thing, but having an empty unmap function
> seems wasteful and Christoph said to just remove it. My opinion is that
> it's not that big an issue one way or another -- if we have to add an
> unmap later it's not really that hard.
> 
> >> If you keep "map", maybe add a sentence or two about why there's no
> >> corresponding unmap?
> 
> Will do.
> 
> > Another wrinkle is that "map" usually takes an A and gives you back a
> > B.  Now the caller has both A and B and both are still valid.
> > Here we pass in an SGL and the SGL is transformed, so the caller only
> > has B and A has been destroyed, i.e., the SGL can no longer be used as
> > it was before, and there's no way to get A back.
> 
> I wouldn't say that. Our map_sg function is doing the same thing
> dma_map_sg is: it sets the DMA address and length in the scatter list.
> So B is still A just with other fields set. If the caller wanted to map
> this SG in a different way they can still do so and the new DMA
> address/length would override the old values. (Normally, you'd want to
> unmap before doing something like that, but seeing our unmap is an empty
> operation, we wouldn't have to do that.)

Ok.  I was assuming s->dma_address would have been already set before
the call and would be overwritten by pci_p2pmem_map_sg().  But I guess
that's not the case -- sounds like s->dma_address is undefined before
the call.

Bjorn


Re: [PATCH v6 06/13] PCI/P2PDMA: Add P2P DMA driver writer's documentation

2018-09-21 Thread Bjorn Helgaas
On Wed, Sep 12, 2018 at 06:11:49PM -0600, Logan Gunthorpe wrote:
> Add a restructured text file describing how to write drivers
> with support for P2P DMA transactions. The document describes
> how to use the APIs that were added in the previous few
> commits.
> 
> Also adds an index for the PCI documentation tree even though this
> is the only PCI document that has been converted to restructured text
> at this time.
> 
> Signed-off-by: Logan Gunthorpe 

Acked-by: Bjorn Helgaas 

> +With the client list in hand, the orchestrator may then call
> +:c:func:`pci_p2pmem_find()` to obtain a published P2P memory provider
> +that is supported (behind the same root port) as all the clients. If more
> +than one provider is supported, the one nearest to all the clients will
> +be chosen first. If there are more than one provider is an equal distance

s/If there are more/If more/

> +away, the one returned will be chosen at random. This function returns the PCI

s/the one returned will be chosen at random/one will be chosen
arbitrarily/ ?  (I doubt it's really random)

> +device to use for the provider with a reference taken and therefore
> +when it's no longer needed it should be returned with pci_dev_put().

> +Struct Page Caveats
> +-------------------
> +
> +Driver writers should be very careful about not passing these special
> +struct pages to code that isn't prepared for it. At this time, the kernel
> +interfaces do not have any checks for ensuring this. This obviously
> +precludes passing these pages to userspace.

Sounds like landmines here since the reader probably can't translate
"code that isn't prepared for it" into a list of interfaces that are
off-limits.  But that's a VM issue that is above my pay grade, so I'm
not suggesting any change; just pointing out something that makes me
wonder "hmmm..., how would I act on this?"

> +P2P memory is also technically IO memory but should never have any side
> +effects behind it. Thus, the order of loads and stores should not be important
> +and ioreadX(), iowriteX() and friends should not be necessary.
> +However, as the memory is not cache coherent, if access ever needs to
> +be protected by a spinlock then :c:func:`mmiowb()` must be used before
> +unlocking the lock. (See ACQUIRES VS I/O ACCESSES in
> +Documentation/memory-barriers.txt)


Re: [PATCH v6 02/13] PCI/P2PDMA: Add sysfs group to display p2pmem stats

2018-09-21 Thread Bjorn Helgaas
On Wed, Sep 12, 2018 at 06:11:45PM -0600, Logan Gunthorpe wrote:
> Add a sysfs group to display statistics about P2P memory that is
> registered in each PCI device.
> 
> Attributes in the group display the total amount of P2P memory, the
> amount available and whether it is published or not.
> 
> Signed-off-by: Logan Gunthorpe 

Acked-by: Bjorn Helgaas 

> +What:		/sys/bus/pci/devices/.../p2pmem/available
> +Date:		November 2017
> +Contact:	Logan Gunthorpe
> +Description:
> + If the device has any Peer-to-Peer memory registered, this
> + file contains the amount of memory that has not been
> + allocated (in decimal).
> +
> +What:		/sys/bus/pci/devices/.../p2pmem/size
> +Date:		November 2017
> +Contact:	Logan Gunthorpe
> +Description:
> + If the device has any Peer-to-Peer memory registered, this
> + file contains the total amount of memory that the device
> + provides (in decimal).

Maybe reorder this so the "size" (total amount) is documented before
"available" (some subset of "size")?

> +
> +What:		/sys/bus/pci/devices/.../p2pmem/published
> +Date:		November 2017
> +Contact:	Logan Gunthorpe
> +Description:
> + If the device has any Peer-to-Peer memory registered, this
> + file contains a '1' if the memory has been published for
> + use inside the kernel or a '0' if it is only intended
> + for use within the driver that published it.

It doesn't read quite right to talk about "use within the driver that
*published* it".  Is it really published in that case?  That sounds more
like "private".  I expected something like the following (but I don't claim
to understand the whole use model here):

  ... this file contains a '1' if the memory has been published for use
  outside the driver that owns the device.



Re: [PATCH v3] PCI: dwc: fix scheduling while atomic issues

2018-09-20 Thread Bjorn Helgaas
On Thu, Sep 13, 2018 at 04:05:54PM +0100, Lorenzo Pieralisi wrote:
> On Wed, Aug 29, 2018 at 11:04:08AM +0800, Jisheng Zhang wrote:
> > When programming inbound/outbound atu, we call usleep_range() after
> > each checking PCIE_ATU_ENABLE bit. Unfortunately, the atu programming
> > can be called in atomic context:
> > 
> > inbound atu programming could be called through
> > pci_epc_write_header()
> >   =>dw_pcie_ep_write_header()
> > =>dw_pcie_prog_inbound_atu()
> > 
> > outbound atu programming could be called through
> > pci_bus_read_config_dword()
> >   =>dw_pcie_rd_conf()
> > =>dw_pcie_prog_outbound_atu()
> > 
> > Fix this issue by calling mdelay() instead.
> > 
> > Fixes: f8aed6ec624f ("PCI: dwc: designware: Add EP mode support")
> > Fixes: d8bbeb39fbf3 ("PCI: designware: Wait for iATU enable")
> > Signed-off-by: Jisheng Zhang 
> > Acked-by: Gustavo Pimentel 
> > ---
> 
> Applied to pci/controller-fixes aiming at one of the upcoming -rc*.

I cherry-picked this into for-linus for v4.19.

> > since v2:
> >  - Add Fixes tag
> >  - Add Gustavo's Ack
> > 
> > since v1:
> >  - use mdelay() instead of udelay() to avoid __bad_udelay()
> > 
> >  drivers/pci/controller/dwc/pcie-designware.c | 8 
> >  drivers/pci/controller/dwc/pcie-designware.h | 3 +--
> >  2 files changed, 5 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c
> > index 778c4f76a884..2153956a0b20 100644
> > --- a/drivers/pci/controller/dwc/pcie-designware.c
> > +++ b/drivers/pci/controller/dwc/pcie-designware.c
> > @@ -135,7 +135,7 @@ static void dw_pcie_prog_outbound_atu_unroll(struct dw_pcie *pci, int index,
> > if (val & PCIE_ATU_ENABLE)
> > return;
> >  
> > -   usleep_range(LINK_WAIT_IATU_MIN, LINK_WAIT_IATU_MAX);
> > +   mdelay(LINK_WAIT_IATU);
> > }
> > dev_err(pci->dev, "Outbound iATU is not being enabled\n");
> >  }
> > @@ -178,7 +178,7 @@ void dw_pcie_prog_outbound_atu(struct dw_pcie *pci, int index, int type,
> > if (val & PCIE_ATU_ENABLE)
> > return;
> >  
> > -   usleep_range(LINK_WAIT_IATU_MIN, LINK_WAIT_IATU_MAX);
> > +   mdelay(LINK_WAIT_IATU);
> > }
> > dev_err(pci->dev, "Outbound iATU is not being enabled\n");
> >  }
> > @@ -236,7 +236,7 @@ static int dw_pcie_prog_inbound_atu_unroll(struct dw_pcie *pci, int index,
> > if (val & PCIE_ATU_ENABLE)
> > return 0;
> >  
> > -   usleep_range(LINK_WAIT_IATU_MIN, LINK_WAIT_IATU_MAX);
> > +   mdelay(LINK_WAIT_IATU);
> > }
> > dev_err(pci->dev, "Inbound iATU is not being enabled\n");
> >  
> > @@ -282,7 +282,7 @@ int dw_pcie_prog_inbound_atu(struct dw_pcie *pci, int index, int bar,
> > if (val & PCIE_ATU_ENABLE)
> > return 0;
> >  
> > -   usleep_range(LINK_WAIT_IATU_MIN, LINK_WAIT_IATU_MAX);
> > +   mdelay(LINK_WAIT_IATU);
> > }
> > dev_err(pci->dev, "Inbound iATU is not being enabled\n");
> >  
> > diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h
> > index 96126fd8403c..9f1a5e399b70 100644
> > --- a/drivers/pci/controller/dwc/pcie-designware.h
> > +++ b/drivers/pci/controller/dwc/pcie-designware.h
> > @@ -26,8 +26,7 @@
> >  
> >  /* Parameters for the waiting for iATU enabled routine */
> >  #define LINK_WAIT_MAX_IATU_RETRIES 5
> > -#define LINK_WAIT_IATU_MIN 9000
> > -#define LINK_WAIT_IATU_MAX 10000
> > +#define LINK_WAIT_IATU 9
> >  
> >  /* Synopsys-specific PCIe configuration registers */
> >  #define PCIE_PORT_LINK_CONTROL 0x710
> > -- 
> > 2.18.0
> > 
> 
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


Re: [PATCH v2] PCI hotplug Eq v2

2018-09-17 Thread Bjorn Helgaas
On Thu, Aug 30, 2018 at 04:11:59PM -0600, Jon Derrick wrote:
> Hi Bjorn,
> 
> Sorry for the delay on this one and pushing it after RC1.
> Feel free to queue it up for 4.20 if it looks fine.
> 
> I've added comments to the git log and source explaining why
> calculate_iosize was left unchanged. Basically I could not
> synthesize a condition where it would have affected the topology.

In other words, the only reason you didn't change the
calculate_iosize() path was because you couldn't test it?

I appreciate your desire to avoid untested changes, but I think it's
very important to preserve and even improve the symmetry between
calculate_memsize() and calculate_iosize().  For example, it's not
obvious why the order is different here:

  calculate_iosize():
size = ALIGN(size + size1, align);
if (size < old_size)
  size = old_size;

  calculate_memsize():
if (size < old_size)
  size = old_size;
size = ALIGN(size + size1, align);

So I don't want to diverge them further unless there's a real
functional reason why we need to handle I/O port space differently
than MMIO space.

You've tested the MMIO path, and I'm willing to take the risk of
doing the same thing in the I/O port path.

Bjorn

