Re: [net-next 10/16] net/mlx5: Support PCIe buffer congestion handling via Devlink

2018-07-30 Thread Bjorn Helgaas
On Mon, Jul 30, 2018 at 08:02:48AM -0700, Alexander Duyck wrote:
> On Mon, Jul 30, 2018 at 7:07 AM, Bjorn Helgaas  wrote:
> > On Sun, Jul 29, 2018 at 03:00:28PM -0700, Alexander Duyck wrote:
> >> On Sun, Jul 29, 2018 at 2:23 AM, Moshe Shemesh  
> >> wrote:
> >> > On Sat, Jul 28, 2018 at 7:06 PM, Bjorn Helgaas  
> >> > wrote:
> >> >> On Thu, Jul 26, 2018 at 07:00:20AM -0700, Alexander Duyck wrote:
> >> >> > On Thu, Jul 26, 2018 at 12:14 AM, Jiri Pirko  wrote:
> >> >> > > Thu, Jul 26, 2018 at 02:43:59AM CEST, jakub.kicin...@netronome.com
> >> >> > > wrote:
> >> >> > >>On Wed, 25 Jul 2018 08:23:26 -0700, Alexander Duyck wrote:
> >> >> > >>> On Wed, Jul 25, 2018 at 5:31 AM, Eran Ben Elisha wrote:
> >> >> > >>> > On 7/24/2018 10:51 PM, Jakub Kicinski wrote:
> >> >> > >>> >>>> The devlink params haven't been upstream even for a full 
> >> >> > >>> >>>> cycle
> >> >> > >>> >>>> and
> >> >> > >>> >>>> already you guys are starting to use them to configure 
> >> >> > >>> >>>> standard
> >> >> > >>> >>>> features like queuing.
> >> >> > >>> >>>
> >> >> > >>> >>> We developed the devlink params in order to support 
> >> >> > >>> >>> non-standard
> >> >> > >>> >>> configuration only. And for non-standard, there are generic 
> >> >> > >>> >>> and
> >> >> > >>> >>> vendor
> >> >> > >>> >>> specific options.
> >> >> > >>> >>
> >> >> > >>> >> I thought it was developed for performing non-standard and
> >> >> > >>> >> possibly
> >> >> > >>> >> vendor specific configuration.  Look at DEVLINK_PARAM_GENERIC_*
> >> >> > >>> >> for
> >> >> > >>> >> examples of well justified generic options for which we have no
> >> >> > >>> >> other API.  The vendor mlx4 options look fairly vendor specific
> >> >> > >>> >> if you
> >> >> > >>> >> ask me, too.
> >> >> > >>> >>
> >> >> > >>> >> Configuring queuing has an API.  The question is it acceptable 
> >> >> > >>> >> to
> >> >> > >>> >> enter
> >> >> > >>> >> into the risky territory of controlling offloads via devlink
> >> >> > >>> >> parameters
> >> >> > >>> >> or would we rather make vendors take the time and effort to 
> >> >> > >>> >> model
> >> >> > >>> >> things to (a subset) of existing APIs.  The HW never fits the
> >> >> > >>> >> APIs
> >> >> > >>> >> perfectly.
> >> >> > >>> >
> >> >> > >>> > I understand what you meant here, I would like to highlight that
> >> >> > >>> > this
> >> >> > >>> > mechanism was not meant to handle SRIOV, Representors, etc.
> >> >> > >>> > The vendor specific configuration suggested here is to handle a
> >> >> > >>> > congestion
> >> >> > >>> > state in Multi Host environment (which includes PF and multiple
> >> >> > >>> > VFs per
> >> >> > >>> > host), where one host is not aware to the other hosts, and each 
> >> >> > >>> > is
> >> >> > >>> > running
> >> >> > >>> > on its own pci/driver. It is a device working mode 
> >> >> > >>> > configuration.
> >> >> > >>> >
> >> >> > >>> > This  couldn't fit into any existing API, thus creating this
> >> >> > >>> > vendor specific
> >> >> > >>> > unique API is needed.
> >> >> > >>>
> >> >> > >>> If we are just goin

Re: [net-next 10/16] net/mlx5: Support PCIe buffer congestion handling via Devlink

2018-07-31 Thread Bjorn Helgaas
On Mon, Jul 30, 2018 at 08:19:50PM -0700, Alexander Duyck wrote:
> On Mon, Jul 30, 2018 at 7:33 PM, Bjorn Helgaas  wrote:
> > On Mon, Jul 30, 2018 at 08:02:48AM -0700, Alexander Duyck wrote:
> >> On Mon, Jul 30, 2018 at 7:07 AM, Bjorn Helgaas  wrote:
> >> > On Sun, Jul 29, 2018 at 03:00:28PM -0700, Alexander Duyck wrote:
> >> >> On Sun, Jul 29, 2018 at 2:23 AM, Moshe Shemesh  
> >> >> wrote:
> >> >> > On Sat, Jul 28, 2018 at 7:06 PM, Bjorn Helgaas  
> >> >> > wrote:
> >> >> >> On Thu, Jul 26, 2018 at 07:00:20AM -0700, Alexander Duyck wrote:
> >> >> >> > On Thu, Jul 26, 2018 at 12:14 AM, Jiri Pirko  
> >> >> >> > wrote:
> >> >> >> > > Thu, Jul 26, 2018 at 02:43:59AM CEST, 
> >> >> >> > > jakub.kicin...@netronome.com
> >> >> >> > > wrote:
> >> >> >> > >>On Wed, 25 Jul 2018 08:23:26 -0700, Alexander Duyck wrote:
> >> >> >> > >>> On Wed, Jul 25, 2018 at 5:31 AM, Eran Ben Elisha wrote:
> >> >> >> > >>> > On 7/24/2018 10:51 PM, Jakub Kicinski wrote:
> >> >> >> > >>> >>>> The devlink params haven't been upstream even for a full 
> >> >> >> > >>> >>>> cycle
> >> >> >> > >>> >>>> and
> >> >> >> > >>> >>>> already you guys are starting to use them to configure 
> >> >> >> > >>> >>>> standard
> >> >> >> > >>> >>>> features like queuing.
> >> >> >> > >>> >>>
> >> >> >> > >>> >>> We developed the devlink params in order to support 
> >> >> >> > >>> >>> non-standard
> >> >> >> > >>> >>> configuration only. And for non-standard, there are 
> >> >> >> > >>> >>> generic and
> >> >> >> > >>> >>> vendor
> >> >> >> > >>> >>> specific options.
> >> >> >> > >>> >>
> >> >> >> > >>> >> I thought it was developed for performing non-standard and
> >> >> >> > >>> >> possibly
> >> >> >> > >>> >> vendor specific configuration.  Look at 
> >> >> >> > >>> >> DEVLINK_PARAM_GENERIC_*
> >> >> >> > >>> >> for
> >> >> >> > >>> >> examples of well justified generic options for which we 
> >> >> >> > >>> >> have no
> >> >> >> > >>> >> other API.  The vendor mlx4 options look fairly vendor 
> >> >> >> > >>> >> specific
> >> >> >> > >>> >> if you
> >> >> >> > >>> >> ask me, too.
> >> >> >> > >>> >>
> >> >> >> > >>> >> Configuring queuing has an API.  The question is it 
> >> >> >> > >>> >> acceptable to
> >> >> >> > >>> >> enter
> >> >> >> > >>> >> into the risky territory of controlling offloads via devlink
> >> >> >> > >>> >> parameters
> >> >> >> > >>> >> or would we rather make vendors take the time and effort to 
> >> >> >> > >>> >> model
> >> >> >> > >>> >> things to (a subset) of existing APIs.  The HW never fits 
> >> >> >> > >>> >> the
> >> >> >> > >>> >> APIs
> >> >> >> > >>> >> perfectly.
> >> >> >> > >>> >
> >> >> >> > >>> > I understand what you meant here, I would like to highlight 
> >> >> >> > >>> > that
> >> >> >> > >>> > this
> >> >> >> > >>> > mechanism was not meant to handle SRIOV, Representors, etc.
> >> >> >> > >>> > The vendor specific configuration suggested here is to 
> >> &g

[PATCH v1 0/4] PCI: Remove unnecessary includes of <linux/pci-aspm.h>

2018-07-25 Thread Bjorn Helgaas
Remove includes of <linux/pci-aspm.h> from files that don't need
it.  I'll apply all these via the PCI tree unless there's objection.

---

Bjorn Helgaas (4):
  igb: Remove unnecessary include of <linux/pci-aspm.h>
  ath9k: Remove unnecessary include of <linux/pci-aspm.h>
  iwlwifi: Remove unnecessary include of <linux/pci-aspm.h>
  PCI: Remove unnecessary include of <linux/pci-aspm.h>


 drivers/net/ethernet/intel/igb/igb_main.c |1 -
 drivers/net/wireless/ath/ath9k/pci.c  |1 -
 drivers/net/wireless/intel/iwlwifi/pcie/drv.c |1 -
 drivers/pci/pci-sysfs.c   |1 -
 drivers/pci/pci.c |1 -
 drivers/pci/probe.c   |1 -
 drivers/pci/remove.c  |1 -
 7 files changed, 7 deletions(-)


[PATCH v1 1/4] igb: Remove unnecessary include of <linux/pci-aspm.h>

2018-07-25 Thread Bjorn Helgaas
From: Bjorn Helgaas 

The igb driver doesn't need anything provided by pci-aspm.h, so remove
the unnecessary include of it.

Signed-off-by: Bjorn Helgaas 
---
 drivers/net/ethernet/intel/igb/igb_main.c |1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index f707709969ac..c77fda05f683 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -22,7 +22,6 @@
 #include 
 #include 
 #include 
-#include <linux/pci-aspm.h>
 #include 
 #include 
 #include 



[PATCH v1 4/4] PCI: Remove unnecessary include of <linux/pci-aspm.h>

2018-07-25 Thread Bjorn Helgaas
From: Bjorn Helgaas 

Several PCI core files include pci-aspm.h even though they don't need
anything provided by that file.  Remove the unnecessary includes of it.

Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/pci-sysfs.c |1 -
 drivers/pci/pci.c   |1 -
 drivers/pci/probe.c |1 -
 drivers/pci/remove.c|1 -
 4 files changed, 4 deletions(-)

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 0c4653c1d2ce..91337faae60d 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -23,7 +23,6 @@
 #include 
 #include 
 #include 
-#include <linux/pci-aspm.h>
 #include 
 #include 
 #include 
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index f5c6ab14fb31..7c2f0e682fc0 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -23,7 +23,6 @@
 #include 
 #include 
 #include 
-#include <linux/pci-aspm.h>
 #include 
 #include 
 #include 
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index ac876e32de4b..1ed2852dee21 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -13,7 +13,6 @@
 #include 
 #include 
 #include 
-#include <linux/pci-aspm.h>
 #include 
 #include 
 #include 
diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
index 6f072eae4f7a..01ec7fcb5634 100644
--- a/drivers/pci/remove.c
+++ b/drivers/pci/remove.c
@@ -1,7 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 #include 
 #include 
-#include <linux/pci-aspm.h>
 #include "pci.h"
 
 static void pci_free_resources(struct pci_dev *dev)



[PATCH v1 3/4] iwlwifi: Remove unnecessary include of <linux/pci-aspm.h>

2018-07-25 Thread Bjorn Helgaas
From: Bjorn Helgaas 

This part of the iwlwifi driver doesn't need anything provided by
pci-aspm.h, so remove the unnecessary include of it.

Signed-off-by: Bjorn Helgaas 
---
 drivers/net/wireless/intel/iwlwifi/pcie/drv.c |1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/drv.c 
b/drivers/net/wireless/intel/iwlwifi/pcie/drv.c
index 38234bda9017..d6c55e111fda 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/drv.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/drv.c
@@ -72,7 +72,6 @@
 #include 
 #include 
 #include 
-#include <linux/pci-aspm.h>
 #include 
 
 #include "fw/acpi.h"



[PATCH v1 2/4] ath9k: Remove unnecessary include of <linux/pci-aspm.h>

2018-07-25 Thread Bjorn Helgaas
From: Bjorn Helgaas 

The ath9k driver doesn't need anything provided by pci-aspm.h, so remove
the unnecessary include of it.

Signed-off-by: Bjorn Helgaas 
---
 drivers/net/wireless/ath/ath9k/pci.c |1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/wireless/ath/ath9k/pci.c 
b/drivers/net/wireless/ath/ath9k/pci.c
index 645f0fbd9179..92b2dd396436 100644
--- a/drivers/net/wireless/ath/ath9k/pci.c
+++ b/drivers/net/wireless/ath/ath9k/pci.c
@@ -18,7 +18,6 @@
 
 #include 
 #include 
-#include <linux/pci-aspm.h>
 #include 
 #include "ath9k.h"
 



Re: [PATCH v1 0/4] PCI: Remove unnecessary includes of <linux/pci-aspm.h>

2018-07-25 Thread Bjorn Helgaas
On Wed, Jul 25, 2018 at 01:33:23PM -0700, Sinan Kaya wrote:
> On 7/25/2018 12:52 PM, Bjorn Helgaas wrote:
> > Remove includes of <linux/pci-aspm.h> from files that don't need
> > it.  I'll apply all these via the PCI tree unless there's objection.
> > 
> > ---
> > 
> > Bjorn Helgaas (4):
> >igb: Remove unnecessary include of <linux/pci-aspm.h>
> >ath9k: Remove unnecessary include of <linux/pci-aspm.h>
> >iwlwifi: Remove unnecessary include of <linux/pci-aspm.h>
> >PCI: Remove unnecessary include of <linux/pci-aspm.h>
> 
> Thanks.
> 
> Reviewed-by: Sinan Kaya 
> 
> Is it possible to kill that file altogether? I haven't looked who is
> using outside of pci directory.

Thanks for taking a look!

It's possible we could remove it altogether; there's very little in
it, and in most cases the only reason drivers include it is to disable
certain ASPM link states to work around hardware defects.  It might
make sense to just move that interface into <linux/pci.h>.
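
As a rough illustration (not from this thread), the interface in question is
pci_disable_link_state(); a probe-time quirk along these lines, with a
placeholder driver name and an arbitrary choice of link states, is about all
that most users of the header do:

	#include <linux/pci.h>
	#include <linux/pci-aspm.h>	/* pci_disable_link_state() lives here today */

	static int some_driver_probe(struct pci_dev *pdev,
				     const struct pci_device_id *id)
	{
		/* keep the link out of ASPM L0s/L1 to work around a hardware defect */
		pci_disable_link_state(pdev, PCIE_LINK_STATE_L0S | PCIE_LINK_STATE_L1);
		return 0;
	}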


Re: [PATCH v5 05/14] PCI: Add pcie_print_link_status() to log link speed and whether it's limited

2018-04-13 Thread Bjorn Helgaas
On Thu, Apr 12, 2018 at 09:32:49PM -0700, Jakub Kicinski wrote:
> On Fri, 30 Mar 2018 16:05:18 -0500, Bjorn Helgaas wrote:
> > +   if (bw_avail >= bw_cap)
> > +   pci_info(dev, "%d Mb/s available bandwidth (%s x%d link)\n",
> > +bw_cap, PCIE_SPEED2STR(speed_cap), width_cap);
> > +   else
> > +   pci_info(dev, "%d Mb/s available bandwidth, limited by %s x%d link at %s (capable of %d Mb/s with %s x%d link)\n",
> > +bw_avail, PCIE_SPEED2STR(speed), width,
> > +limiting_dev ? pci_name(limiting_dev) : "<unknown>",
> > +bw_cap, PCIE_SPEED2STR(speed_cap), width_cap);
> 
> I was just looking at using this new function to print PCIe BW for a
> NIC, but I'm slightly worried that there is nothing in the message that
> says PCIe...  For a NIC some people may interpret the bandwidth as NIC
> bandwidth:
> 
> [   39.839989] nfp :04:00.0: Netronome Flow Processor NFP4000/NFP6000 
> PCIe Card Probe
> [   39.848943] nfp :04:00.0: 63.008 Gb/s available bandwidth (8 GT/s x8 
> link)
> [   39.857146] nfp :04:00.0: RESERVED BARs: 0.0: General/MSI-X SRAM, 0.1: 
> PCIe XPB/MSI-X PBA, 0.4: Explicit0, 0.5: Explicit1, fre4
> 
> It's not a 63Gbps NIC...  I'm sorry if this was discussed before and I
> didn't find it.  Would it make sense to add the "PCIe: " prefix to the
> message like bnx2x used to do?  Like:
> 
> nfp :04:00.0: PCIe: 63.008 Gb/s available bandwidth (8 GT/s x8 link)

I agree, that does look potentially confusing.  How about this:

  nfp :04:00.0: 63.008 Gb/s available PCIe bandwidth (8 GT/s x8 link)

I did have to look twice at this before I remembered that we're
printing Gb/s (not GB/s).  Most of the references I found on the web
use GB/s when talking about total PCIe bandwidth.

But either way I think it's definitely worth mentioning PCIe
explicitly.
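
For what it's worth, a back-of-the-envelope check of that 63 Gb/s figure (an
aside, not from the original mail; the variable names are only illustrative):

	/* one 8 GT/s lane with 128b/130b encoding */
	unsigned int lane_mbps = 8000 * 128 / 130;	/* = 7876 Mb/s of data per lane */
	unsigned int link_mbps = lane_mbps * 8;		/* x8 link = 63008 Mb/s = 63.008 Gb/s */
	/* divide by 8 for bytes: ~7.9 GB/s, which is why Gb/s is easy to misread as GB/s */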


Re: [PATCH] PCI: Add PCIe to pcie_print_link_status() messages

2018-04-20 Thread Bjorn Helgaas
On Fri, Apr 13, 2018 at 11:16:38AM -0700, Jakub Kicinski wrote:
> Currently the pcie_print_link_status() will print PCIe bandwidth
> and link width information but does not mention it is pertaining
> to the PCIe.  Since this and related functions are used exclusively
> by networking drivers today users may get confused into thinking
> that it's the NIC bandwidth that is being talked about.  Insert a
> "PCIe" into the messages.
> 
> Signed-off-by: Jakub Kicinski 

Applied to for-linus for v4.17, thanks!

> ---
>  drivers/pci/pci.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index aa86e904f93c..73a0a4993f6a 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5273,11 +5273,11 @@ void pcie_print_link_status(struct pci_dev *dev)
>   bw_avail = pcie_bandwidth_available(dev, &limiting_dev, &speed, &width);
>  
>   if (bw_avail >= bw_cap)
> - pci_info(dev, "%u.%03u Gb/s available bandwidth (%s x%d link)\n",
> + pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth (%s x%d link)\n",
>bw_cap / 1000, bw_cap % 1000,
>PCIE_SPEED2STR(speed_cap), width_cap);
>   else
> - pci_info(dev, "%u.%03u Gb/s available bandwidth, limited by %s x%d link at %s (capable of %u.%03u Gb/s with %s x%d link)\n",
> + pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth, limited by %s x%d link at %s (capable of %u.%03u Gb/s with %s x%d link)\n",
>bw_avail / 1000, bw_avail % 1000,
>PCIE_SPEED2STR(speed), width,
>limiting_dev ? pci_name(limiting_dev) : "<unknown>",
> -- 
> 2.16.2
> 


Re: [pci PATCH v8 0/4] Add support for unmanaged SR-IOV

2018-04-21 Thread Bjorn Helgaas
On Fri, Apr 20, 2018 at 12:28:08PM -0400, Alexander Duyck wrote:
> This series is meant to add support for SR-IOV on devices when the VFs are
> not managed by the kernel. Examples of recent patches attempting to do this
> include:
> virto - https://patchwork.kernel.org/patch/10241225/
> pci-stub - https://patchwork.kernel.org/patch/10109935/
> vfio - https://patchwork.kernel.org/patch/10103353/
> uio - https://patchwork.kernel.org/patch/9974031/
> 
> Since this is quickly blowing up into a multi-driver problem it is probably
> best to implement this solution as generically as possible.
> 
> This series is an attempt to do that. What we do with this patch set is
> provide a generic framework to enable SR-IOV in the case that the PF driver
> doesn't support managing the VFs itself.
> 
> I based my patch set originally on the patch by Mark Rustad but there isn't
> much left after going through and cleaning out the bits that were no longer
> needed, and after incorporating the feedback from David Miller. At this point
> the only items to be fully reused was his patch description which is now
> present in patch 3 of the set.
> 
> This solution is limited in scope to just adding support for devices that
> provide no functionality for SR-IOV other than allocating the VFs by
> calling pci_enable_sriov. Previous sets had included patches for VFIO, but
> for now I am dropping that as the scope of that work is larger then I
> think I can take on at this time.
> 
> v2: Reduced scope back to just virtio_pci and vfio-pci
> Broke into 3 patch set from single patch
> Changed autoprobe behavior to always set when num_vfs is set non-zero
> v3: Updated Documentation to clarify when sriov_unmanaged_autoprobe is used
> Wrapped vfio_pci_sriov_configure to fix build errors w/o SR-IOV in kernel
> v4: Dropped vfio-pci patch
> Added ena and nvme to drivers now using pci_sriov_configure_unmanaged
> Dropped pci_disable_sriov call in virtio_pci to be consistent with ena
> v5: Dropped sriov_unmanaged_autoprobe and pci_sriov_conifgure_unmanaged
> Added new patch that enables pci_sriov_configure_simple
> Updated drivers to use pci_sriov_configure_simple
> v6: Defined pci_sriov_configure_simple as NULL when SR-IOV is not enabled
> Updated drivers to drop "#ifdef" checks for IOV
> Added pci-pf-stub as place for PF-only drivers to add support
> v7: Dropped pci_id table explanation from pci-pf-stub driver
> Updated pci_sriov_configure_simple to drop need for err value
> Fixed comment explaining why pci_sriov_configure_simple is NULL
> v8: Dropped virtio from the set, support to be added later after TC approval
> 
> Cc: Mark Rustad 
> Cc: Maximilian Heyne 
> Cc: Liang-Min Wang 
> Cc: David Woodhouse 
> 
> ---
> 
> Alexander Duyck (4):
>   pci: Add pci_sriov_configure_simple for PFs that don't manage VF 
> resources
>   ena: Migrate over to unmanaged SR-IOV support
>   nvme: Migrate over to unmanaged SR-IOV support
>   pci-pf-stub: Add PF driver stub for PFs that function only to enable VFs
> 
> 
>  drivers/net/ethernet/amazon/ena/ena_netdev.c |   28 -
>  drivers/nvme/host/pci.c  |   20 --
>  drivers/pci/Kconfig  |   12 ++
>  drivers/pci/Makefile |2 +
>  drivers/pci/iov.c|   31 +++
>  drivers/pci/pci-pf-stub.c|   54 
> ++
>  include/linux/pci.h  |3 +
>  include/linux/pci_ids.h  |2 +
>  8 files changed, 106 insertions(+), 46 deletions(-)
>  create mode 100644 drivers/pci/pci-pf-stub.c

I tentatively applied these to pci/virtualization-review.

The code changes look fine, but I want to flesh out the changelogs a
little bit before merging them.

For example, I'm not sure what you mean by "devices where the PF is
not capable of managing VF resources."

It *sounds* like you're saying the hardware works differently on some
devices, but I don't think that's what you mean.  I think you're
saying something about which drivers are used for the PF and the VF.

I think a trivial example of how this will be used might help.  I
assume this involves a virtualization scenario where the host uses the
PF to enable several VFs, but the host doesn't use the PF for much
else.  Then you assign the VFs to guests, and drivers in the guest
OSes use the VFs.

Since .sriov_configure() is only used by sriov_numvfs_store(), I
assume the usage model involves writing to the sysfs sriov_numvfs
attribute to enable the VFs, then assigning them to guests?

Bjorn


Re: [PATCH net-next 1/2] PCI: Add two more values for PCIe Max_Read_Request_Size

2018-04-16 Thread Bjorn Helgaas
On Mon, Apr 16, 2018 at 09:37:13PM +0200, Heiner Kallweit wrote:
> This patch adds missing values for the max read request size.
> E.g. network driver r8169 uses a value of 4K.
> 
> Signed-off-by: Heiner Kallweit <hkallwe...@gmail.com>

I'd prefer a subject line with more details, e.g.,

  PCI: Add #defines for 2K and 4K Max Read Request Size

Acked-by: Bjorn Helgaas <bhelg...@google.com>

I suspect conflicts are more likely in r8169.c so it might make more
sense to route these through the netdev tree.  I'd also be happy to
take them, so let me know if you want me to take them, David.

> ---
>  include/uapi/linux/pci_regs.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> index 0c79eac5..699257fb 100644
> --- a/include/uapi/linux/pci_regs.h
> +++ b/include/uapi/linux/pci_regs.h
> @@ -506,6 +506,8 @@
>  #define  PCI_EXP_DEVCTL_READRQ_256B  0x1000 /* 256 Bytes */
>  #define  PCI_EXP_DEVCTL_READRQ_512B  0x2000 /* 512 Bytes */
>  #define  PCI_EXP_DEVCTL_READRQ_1024B 0x3000 /* 1024 Bytes */
> +#define  PCI_EXP_DEVCTL_READRQ_2048B 0x4000 /* 2048 Bytes */
> +#define  PCI_EXP_DEVCTL_READRQ_4096B 0x5000 /* 4096 Bytes */
>  #define  PCI_EXP_DEVCTL_BCR_FLR 0x8000  /* Bridge Configuration Retry / FLR 
> */
>  #define PCI_EXP_DEVSTA   10  /* Device Status */
>  #define  PCI_EXP_DEVSTA_CED  0x0001  /* Correctable Error Detected */
> -- 
> 2.17.0
> 
> 
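
A hedged sketch of how a driver might consume the new 4096-byte value once it
exists (not part of Heiner's patch; a driver could equally call
pcie_set_readrq(pdev, 4096) and let the PCI core compute the field):

	/* request a 4096-byte Max_Read_Request_Size on this function */
	pcie_capability_clear_and_set_word(pdev, PCI_EXP_DEVCTL,
					   PCI_EXP_DEVCTL_READRQ,
					   PCI_EXP_DEVCTL_READRQ_4096B);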


Re: [pci PATCH v8 0/4] Add support for unmanaged SR-IOV

2018-04-24 Thread Bjorn Helgaas
On Sat, Apr 21, 2018 at 05:22:27PM -0700, Alexander Duyck wrote:
> On Sat, Apr 21, 2018 at 1:34 PM, Bjorn Helgaas <helg...@kernel.org> wrote:

> > For example, I'm not sure what you mean by "devices where the PF is
> > not capable of managing VF resources."
> >
> > It *sounds* like you're saying the hardware works differently on some
> > devices, but I don't think that's what you mean.  I think you're
> > saying something about which drivers are used for the PF and the VF.
> 
> That is sort of what I am saying.
> 
> So for example with ixgbe there is functionality which is controlled
> in the MMIO space of the PF that affects the functionality of the VFs
> that are generated on the device. The PF has to rearrange the
> resources such as queues and interrupts on the device before it can
> enable SR-IOV, and it could alter those later to limit what the VF is
> capable of doing.
> 
> The model I am dealing with via this patch set has a PF that is not
> much different than the VFs other than the fact that it has some
> extended configuration space bits in place for SR-IOV, ARI, ACS, and
> whatever other bits are needed in order to support spawning isolated
> VFs.

OK, thanks for the explanation, I think I understand what's going on
now, correct me if I'm mistaken.  I added a hint about "PF" for Randy,
too.

These are on pci/virtualization for v4.18.


commit 8effc395c209
Author: Alexander Duyck <alexander.h.du...@intel.com>
Date:   Sat Apr 21 15:23:09 2018 -0500

PCI/IOV: Add pci_sriov_configure_simple()

SR-IOV (Single Root I/O Virtualization) is an optional PCIe capability (see
PCIe r4.0, sec 9).  A PCIe Function with the SR-IOV capability is referred
to as a PF (Physical Function).  If SR-IOV is enabled on the PF, several
VFs (Virtual Functions) may be created.  The VFs can be individually
assigned to virtual machines, which allows them to share a single hardware
device while being isolated from each other.

Some SR-IOV devices have resources such as queues and interrupts that must
be set up in the PF before enabling the VFs, so they require a PF driver to
do that.

Other SR-IOV devices don't require any PF setup before enabling VFs.  Add a
pci_sriov_configure_simple() interface so PF drivers for such devices can
use it without repeating the VF-enabling code.

Tested-by: Mark Rustad <mark.d.rus...@intel.com>
    Signed-off-by: Alexander Duyck <alexander.h.du...@intel.com>
[bhelgaas: changelog, comment]
Signed-off-by: Bjorn Helgaas <bhelg...@google.com>
Reviewed-by: Greg Rose <gvrose8...@gmail.com>
Reviewed-by: Christoph Hellwig <h...@lst.de>
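
The iov.c hunk itself is not quoted in this digest; reconstructed only from
the changelog above (the merged code may differ in its exact checks),
pci_sriov_configure_simple() behaves roughly like this:

	int pci_sriov_configure_simple(struct pci_dev *dev, int nr_virtfn)
	{
		int rc;

		if (!dev->is_physfn)
			return -ENODEV;

		if (pci_vfs_assigned(dev)) {
			pci_warn(dev, "Cannot modify SR-IOV while VFs are assigned\n");
			return -EPERM;
		}

		if (nr_virtfn == 0) {
			pci_disable_sriov(dev);
			return 0;
		}

		rc = pci_enable_sriov(dev, nr_virtfn);
		if (rc < 0)
			return rc;

		/* .sriov_configure() returns the VF count on success */
		return nr_virtfn;
	}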

commit a8ccf8a3
Author: Alexander Duyck <alexander.h.du...@intel.com>
Date:   Tue Apr 24 16:47:16 2018 -0500

PCI/IOV: Add pci-pf-stub driver for PFs that only enable VFs

Some SR-IOV PF devices provide no functionality other than acting as a
means of enabling VFs.  For these devices, we want to enable the VFs and
assign them to guest virtual machines, but there's no need to have a driver
for the PF itself.

Add a new pci-pf-stub driver to claim those PF devices and provide the
generic VF enable functionality.  An administrator can use the sysfs
"sriov_numvfs" file to enable VFs, then assign them to guests.

For now I only have one example ID provided by Amazon in terms of devices
that require this functionality.  The general idea is that in the future we
will see other devices added as vendors come up with devices where the PF
is more or less just a lightweight shim used to allocate VFs.

Signed-off-by: Alexander Duyck <alexander.h.du...@intel.com>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelg...@google.com>
Reviewed-by: Greg Rose <gvrose8...@gmail.com>
Reviewed-by: Christoph Hellwig <h...@lst.de>
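
The shape of such a stub driver is small; a sketch (the vendor/device IDs
below are placeholders, not the Amazon ID carried by the actual patch):

	static int pf_stub_probe(struct pci_dev *dev, const struct pci_device_id *id)
	{
		pci_info(dev, "claimed by pci-pf-stub\n");
		return 0;
	}

	static const struct pci_device_id pf_stub_ids[] = {
		{ PCI_DEVICE(0xabcd, 0x1234) },	/* placeholder vendor/device */
		{ 0 }
	};
	MODULE_DEVICE_TABLE(pci, pf_stub_ids);

	static struct pci_driver pf_stub_driver = {
		.name			= "pci-pf-stub",
		.id_table		= pf_stub_ids,
		.probe			= pf_stub_probe,
		.sriov_configure	= pci_sriov_configure_simple,
	};
	module_pci_driver(pf_stub_driver);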

commit 115ddc491922
Author: Alexander Duyck <alexander.h.du...@intel.com>
Date:   Tue Apr 24 16:47:22 2018 -0500

net: ena: Use pci_sriov_configure_simple() to enable VFs

Instead of implementing our own version of a SR-IOV configuration stub in
the ena driver, use the existing pci_sriov_configure_simple() function.

Signed-off-by: Alexander Duyck <alexander.h.du...@intel.com>
Signed-off-by: Bjorn Helgaas <bhelg...@google.com>
Reviewed-by: Greg Rose <gvrose8...@gmail.com>
Reviewed-by: Christoph Hellwig <h...@lst.de>

commit 74d986abc20b
Author: Alexander Duyck <alexander.h.du...@intel.com>
Date:   Tue Apr 24 16:47:27 2018 -0500

nvme-pci: Use pci_sriov_configure_simple() to enable VFs

Instead of implementing our own version of a SR-IOV configuration stub in
the nvme driver, use the existing pci_sriov_configure_simple() function.

Signed-off-by: Alexander Duyck <alexander.h.du...@intel.com>
Signed-off-by: Bjorn Helgaas <bhelg...@google.com>
Reviewed-by: Christoph Hellwig <h...@lst.de>


[PATCH v5 12/14] fm10k: Report PCIe link properties with pcie_print_link_status()

2018-03-30 Thread Bjorn Helgaas
From: Bjorn Helgaas <bhelg...@google.com>

Use pcie_print_link_status() to report PCIe link speed and possible
limitations instead of implementing this in the driver itself.

Note that pcie_get_minimum_link() can return misleading information because
it finds the slowest link and the narrowest link without considering the
total bandwidth of the link.  If the path contains a 16 GT/s x1 link and a
2.5 GT/s x16 link, pcie_get_minimum_link() returns 2.5 GT/s x1, which
corresponds to 250 MB/s of bandwidth, not the actual available bandwidth of
about 2000 MB/s for a 16 GT/s x1 link.
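
To make the failure mode concrete, a worked example with made-up links (an
aside, not part of the original changelog):

	/*
	 * Path with two links:
	 *   link A: 16 GT/s x1   -> 16e9 * 128/130 / 8        ~= 1969 MB/s of data
	 *   link B: 2.5 GT/s x16 -> 2.5e9 * 8/10 / 8 * 16      = 4000 MB/s of data
	 *
	 * pcie_get_minimum_link() combines min(speed) = 2.5 GT/s with
	 * min(width) = x1 and reports ~250 MB/s, while the real bottleneck
	 * is link A at ~2000 MB/s, which is what pcie_bandwidth_available()
	 * and pcie_print_link_status() account for.
	 */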

Signed-off-by: Bjorn Helgaas <bhelg...@google.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c |   87 --
 1 file changed, 1 insertion(+), 86 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c 
b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index a434fecfdfeb..aa05fb534942 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -2120,91 +2120,6 @@ static int fm10k_sw_init(struct fm10k_intfc *interface,
return 0;
 }
 
-static void fm10k_slot_warn(struct fm10k_intfc *interface)
-{
-   enum pcie_link_width width = PCIE_LNK_WIDTH_UNKNOWN;
-   enum pci_bus_speed speed = PCI_SPEED_UNKNOWN;
-   struct fm10k_hw *hw = &interface->hw;
-   int max_gts = 0, expected_gts = 0;
-
-   if (pcie_get_minimum_link(interface->pdev, &speed, &width) ||
-   speed == PCI_SPEED_UNKNOWN || width == PCIE_LNK_WIDTH_UNKNOWN) {
-   dev_warn(&interface->pdev->dev,
-"Unable to determine PCI Express bandwidth.\n");
-   return;
-   }
-
-   switch (speed) {
-   case PCIE_SPEED_2_5GT:
-   /* 8b/10b encoding reduces max throughput by 20% */
-   max_gts = 2 * width;
-   break;
-   case PCIE_SPEED_5_0GT:
-   /* 8b/10b encoding reduces max throughput by 20% */
-   max_gts = 4 * width;
-   break;
-   case PCIE_SPEED_8_0GT:
-   /* 128b/130b encoding has less than 2% impact on throughput */
-   max_gts = 8 * width;
-   break;
-   default:
-   dev_warn(&interface->pdev->dev,
-"Unable to determine PCI Express bandwidth.\n");
-   return;
-   }
-
-   dev_info(&interface->pdev->dev,
-"PCI Express bandwidth of %dGT/s available\n",
-max_gts);
-   dev_info(&interface->pdev->dev,
-"(Speed:%s, Width: x%d, Encoding Loss:%s, Payload:%s)\n",
-(speed == PCIE_SPEED_8_0GT ? "8.0GT/s" :
- speed == PCIE_SPEED_5_0GT ? "5.0GT/s" :
- speed == PCIE_SPEED_2_5GT ? "2.5GT/s" :
- "Unknown"),
-hw->bus.width,
-(speed == PCIE_SPEED_2_5GT ? "20%" :
- speed == PCIE_SPEED_5_0GT ? "20%" :
- speed == PCIE_SPEED_8_0GT ? "<2%" :
- "Unknown"),
-(hw->bus.payload == fm10k_bus_payload_128 ? "128B" :
- hw->bus.payload == fm10k_bus_payload_256 ? "256B" :
- hw->bus.payload == fm10k_bus_payload_512 ? "512B" :
- "Unknown"));
-
-   switch (hw->bus_caps.speed) {
-   case fm10k_bus_speed_2500:
-   /* 8b/10b encoding reduces max throughput by 20% */
-   expected_gts = 2 * hw->bus_caps.width;
-   break;
-   case fm10k_bus_speed_5000:
-   /* 8b/10b encoding reduces max throughput by 20% */
-   expected_gts = 4 * hw->bus_caps.width;
-   break;
-   case fm10k_bus_speed_8000:
-   /* 128b/130b encoding has less than 2% impact on throughput */
-   expected_gts = 8 * hw->bus_caps.width;
-   break;
-   default:
-   dev_warn(&interface->pdev->dev,
-"Unable to determine expected PCI Express bandwidth.\n");
-   return;
-   }
-
-   if (max_gts >= expected_gts)
-   return;
-
-   dev_warn(&interface->pdev->dev,
-"This device requires %dGT/s of bandwidth for optimal performance.\n",
-expected_gts);
-   dev_warn(&interface->pdev->dev,
-"A %sslot with x%d lanes is suggested.\n",
-(hw->bus_caps.speed == fm10k_bus_speed_2500 ? "2.5GT/s " :
- hw->bus_caps.speed == fm10k_bus_speed_5000 ? "5.0GT/s " :
- hw->bus_caps.speed == fm10k_bus_speed_8000 ? "8.0GT/s " : ""),
-hw->bus_caps.width);
-}
-
 /**
  * fm10k_probe - Device Initialization Routine
  * @pdev: PCI device information 

[PATCH v5 11/14] cxgb4: Report PCIe link properties with pcie_print_link_status()

2018-03-30 Thread Bjorn Helgaas
From: Bjorn Helgaas <bhelg...@google.com>

Use pcie_print_link_status() to report PCIe link speed and possible
limitations instead of implementing this in the driver itself.

Note that pcie_get_minimum_link() can return misleading information because
it finds the slowest link and the narrowest link without considering the
total bandwidth of the link.  If the path contains a 16 GT/s x1 link and a
2.5 GT/s x16 link, pcie_get_minimum_link() returns 2.5 GT/s x1, which
corresponds to 250 MB/s of bandwidth, not the actual available bandwidth of
about 2000 MB/s for a 16 GT/s x1 link.

Signed-off-by: Bjorn Helgaas <bhelg...@google.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c |   75 ---
 1 file changed, 1 insertion(+), 74 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 56bc626ef006..2d6864c8199e 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -4762,79 +4762,6 @@ static int init_rss(struct adapter *adap)
return 0;
 }
 
-static int cxgb4_get_pcie_dev_link_caps(struct adapter *adap,
-   enum pci_bus_speed *speed,
-   enum pcie_link_width *width)
-{
-   u32 lnkcap1, lnkcap2;
-   int err1, err2;
-
-#define  PCIE_MLW_CAP_SHIFT 4   /* start of MLW mask in link capabilities */
-
-   *speed = PCI_SPEED_UNKNOWN;
-   *width = PCIE_LNK_WIDTH_UNKNOWN;
-
-   err1 = pcie_capability_read_dword(adap->pdev, PCI_EXP_LNKCAP,
- &lnkcap1);
-   err2 = pcie_capability_read_dword(adap->pdev, PCI_EXP_LNKCAP2,
- &lnkcap2);
-   if (!err2 && lnkcap2) { /* PCIe r3.0-compliant */
-   if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_8_0GB)
-   *speed = PCIE_SPEED_8_0GT;
-   else if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_5_0GB)
-   *speed = PCIE_SPEED_5_0GT;
-   else if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_2_5GB)
-   *speed = PCIE_SPEED_2_5GT;
-   }
-   if (!err1) {
-   *width = (lnkcap1 & PCI_EXP_LNKCAP_MLW) >> PCIE_MLW_CAP_SHIFT;
-   if (!lnkcap2) { /* pre-r3.0 */
-   if (lnkcap1 & PCI_EXP_LNKCAP_SLS_5_0GB)
-   *speed = PCIE_SPEED_5_0GT;
-   else if (lnkcap1 & PCI_EXP_LNKCAP_SLS_2_5GB)
-   *speed = PCIE_SPEED_2_5GT;
-   }
-   }
-
-   if (*speed == PCI_SPEED_UNKNOWN || *width == PCIE_LNK_WIDTH_UNKNOWN)
-   return err1 ? err1 : err2 ? err2 : -EINVAL;
-   return 0;
-}
-
-static void cxgb4_check_pcie_caps(struct adapter *adap)
-{
-   enum pcie_link_width width, width_cap;
-   enum pci_bus_speed speed, speed_cap;
-
-#define PCIE_SPEED_STR(speed) \
-   (speed == PCIE_SPEED_8_0GT ? "8.0GT/s" : \
-speed == PCIE_SPEED_5_0GT ? "5.0GT/s" : \
-speed == PCIE_SPEED_2_5GT ? "2.5GT/s" : \
-"Unknown")
-
-   if (cxgb4_get_pcie_dev_link_caps(adap, &speed_cap, &width_cap)) {
-   dev_warn(adap->pdev_dev,
-"Unable to determine PCIe device BW capabilities\n");
-   return;
-   }
-
-   if (pcie_get_minimum_link(adap->pdev, &speed, &width) ||
-   speed == PCI_SPEED_UNKNOWN || width == PCIE_LNK_WIDTH_UNKNOWN) {
-   dev_warn(adap->pdev_dev,
-"Unable to determine PCI Express bandwidth.\n");
-   return;
-   }
-
-   dev_info(adap->pdev_dev, "PCIe link speed is %s, device supports %s\n",
-PCIE_SPEED_STR(speed), PCIE_SPEED_STR(speed_cap));
-   dev_info(adap->pdev_dev, "PCIe link width is x%d, device supports 
x%d\n",
-width, width_cap);
-   if (speed < speed_cap || width < width_cap)
-   dev_info(adap->pdev_dev,
-"A slot with more lanes and/or higher speed is "
-"suggested for optimal performance.\n");
-}
-
 /* Dump basic information about the adapter */
 static void print_adapter_info(struct adapter *adapter)
 {
@@ -5466,7 +5393,7 @@ static int init_one(struct pci_dev *pdev, const struct 
pci_device_id *ent)
}
 
/* check for PCI Express bandwidth capabiltites */
-   cxgb4_check_pcie_caps(adapter);
+   pcie_print_link_status(pdev);
 
err = init_rss(adapter);
if (err)



[PATCH v5 09/14] bnx2x: Report PCIe link properties with pcie_print_link_status()

2018-03-30 Thread Bjorn Helgaas
From: Bjorn Helgaas <bhelg...@google.com>

Use pcie_print_link_status() to report PCIe link speed and possible
limitations instead of implementing this in the driver itself.

Note that pcie_get_minimum_link() can return misleading information because
it finds the slowest link and the narrowest link without considering the
total bandwidth of the link.  If the path contains a 16 GT/s x1 link and a
2.5 GT/s x16 link, pcie_get_minimum_link() returns 2.5 GT/s x1, which
corresponds to 250 MB/s of bandwidth, not the actual available bandwidth of
about 2000 MB/s for a 16 GT/s x1 link.

Signed-off-by: Bjorn Helgaas <bhelg...@google.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |   23 ++
 1 file changed, 6 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 74fc9af4aadb..c92601f1b0f3 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -13922,8 +13922,6 @@ static int bnx2x_init_one(struct pci_dev *pdev,
 {
struct net_device *dev = NULL;
struct bnx2x *bp;
-   enum pcie_link_width pcie_width;
-   enum pci_bus_speed pcie_speed;
int rc, max_non_def_sbs;
int rx_count, tx_count, rss_count, doorbell_size;
int max_cos_est;
@@ -14091,21 +14089,12 @@ static int bnx2x_init_one(struct pci_dev *pdev,
dev_addr_add(bp->dev, bp->fip_mac, NETDEV_HW_ADDR_T_SAN);
rtnl_unlock();
}
-   if (pcie_get_minimum_link(bp->pdev, &pcie_speed, &pcie_width) ||
-   pcie_speed == PCI_SPEED_UNKNOWN ||
-   pcie_width == PCIE_LNK_WIDTH_UNKNOWN)
-   BNX2X_DEV_INFO("Failed to determine PCI Express Bandwidth\n");
-   else
-   BNX2X_DEV_INFO(
-  "%s (%c%d) PCI-E x%d %s found at mem %lx, IRQ %d, node 
addr %pM\n",
-  board_info[ent->driver_data].name,
-  (CHIP_REV(bp) >> 12) + 'A', (CHIP_METAL(bp) >> 4),
-  pcie_width,
-  pcie_speed == PCIE_SPEED_2_5GT ? "2.5GHz" :
-  pcie_speed == PCIE_SPEED_5_0GT ? "5.0GHz" :
-  pcie_speed == PCIE_SPEED_8_0GT ? "8.0GHz" :
-  "Unknown",
-  dev->base_addr, bp->pdev->irq, dev->dev_addr);
+   BNX2X_DEV_INFO(
+  "%s (%c%d) PCI-E found at mem %lx, IRQ %d, node addr %pM\n",
+  board_info[ent->driver_data].name,
+  (CHIP_REV(bp) >> 12) + 'A', (CHIP_METAL(bp) >> 4),
+  dev->base_addr, bp->pdev->irq, dev->dev_addr);
+   pcie_print_link_status(bp->pdev);
 
bnx2x_register_phc(bp);
 



[PATCH v5 13/14] ixgbe: Report PCIe link properties with pcie_print_link_status()

2018-03-30 Thread Bjorn Helgaas
From: Bjorn Helgaas <bhelg...@google.com>

Use pcie_print_link_status() to report PCIe link speed and possible
limitations instead of implementing this in the driver itself.

Note that pcie_get_minimum_link() can return misleading information because
it finds the slowest link and the narrowest link without considering the
total bandwidth of the link.  If the path contains a 16 GT/s x1 link and a
2.5 GT/s x16 link, pcie_get_minimum_link() returns 2.5 GT/s x1, which
corresponds to 250 MB/s of bandwidth, not the actual available bandwidth of
about 2000 MB/s for a 16 GT/s x1 link.

Signed-off-by: Bjorn Helgaas <bhelg...@google.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   47 +
 1 file changed, 1 insertion(+), 46 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 0da5aa2c8aba..38bb9c17d333 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -270,9 +270,6 @@ static void ixgbe_check_minimum_link(struct ixgbe_adapter 
*adapter,
 int expected_gts)
 {
struct ixgbe_hw *hw = &adapter->hw;
-   int max_gts = 0;
-   enum pci_bus_speed speed = PCI_SPEED_UNKNOWN;
-   enum pcie_link_width width = PCIE_LNK_WIDTH_UNKNOWN;
struct pci_dev *pdev;
 
/* Some devices are not connected over PCIe and thus do not negotiate
@@ -288,49 +285,7 @@ static void ixgbe_check_minimum_link(struct ixgbe_adapter 
*adapter,
else
pdev = adapter->pdev;
 
-   if (pcie_get_minimum_link(pdev, &speed, &width) ||
-   speed == PCI_SPEED_UNKNOWN || width == PCIE_LNK_WIDTH_UNKNOWN) {
-   e_dev_warn("Unable to determine PCI Express bandwidth.\n");
-   return;
-   }
-
-   switch (speed) {
-   case PCIE_SPEED_2_5GT:
-   /* 8b/10b encoding reduces max throughput by 20% */
-   max_gts = 2 * width;
-   break;
-   case PCIE_SPEED_5_0GT:
-   /* 8b/10b encoding reduces max throughput by 20% */
-   max_gts = 4 * width;
-   break;
-   case PCIE_SPEED_8_0GT:
-   /* 128b/130b encoding reduces throughput by less than 2% */
-   max_gts = 8 * width;
-   break;
-   default:
-   e_dev_warn("Unable to determine PCI Express bandwidth.\n");
-   return;
-   }
-
-   e_dev_info("PCI Express bandwidth of %dGT/s available\n",
-  max_gts);
-   e_dev_info("(Speed:%s, Width: x%d, Encoding Loss:%s)\n",
-  (speed == PCIE_SPEED_8_0GT ? "8.0GT/s" :
-   speed == PCIE_SPEED_5_0GT ? "5.0GT/s" :
-   speed == PCIE_SPEED_2_5GT ? "2.5GT/s" :
-   "Unknown"),
-  width,
-  (speed == PCIE_SPEED_2_5GT ? "20%" :
-   speed == PCIE_SPEED_5_0GT ? "20%" :
-   speed == PCIE_SPEED_8_0GT ? "<2%" :
-   "Unknown"));
-
-   if (max_gts < expected_gts) {
-   e_dev_warn("This is not sufficient for optimal performance of 
this card.\n");
-   e_dev_warn("For optimal performance, at least %dGT/s of 
bandwidth is required.\n",
-   expected_gts);
-   e_dev_warn("A slot with more lanes and/or higher speed is 
suggested.\n");
-   }
+   pcie_print_link_status(pdev);
 }
 
 static void ixgbe_service_event_schedule(struct ixgbe_adapter *adapter)



[PATCH v5 14/14] PCI: Remove unused pcie_get_minimum_link()

2018-03-30 Thread Bjorn Helgaas
From: Bjorn Helgaas <bhelg...@google.com>

In some cases pcie_get_minimum_link() returned misleading information
because it found the slowest link and the narrowest link without
considering the total bandwidth of the link.  For example, if the path
contained a 16 GT/s x1 link and a 2.5 GT/s x16 link,
pcie_get_minimum_link() returned 2.5 GT/s x1, which corresponds to 250 MB/s
of bandwidth, not the actual available bandwidth of about 2000 MB/s for a
16 GT/s x1 link.

Callers should use pcie_print_link_status() instead, or
pcie_bandwidth_available() if they need more detailed information.

Signed-off-by: Bjorn Helgaas <bhelg...@google.com>
---
 drivers/pci/pci.c   |   43 ---
 include/linux/pci.h |2 --
 2 files changed, 45 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index cec7aed09f6b..b6951c44ae6c 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5103,49 +5103,6 @@ int pcie_set_mps(struct pci_dev *dev, int mps)
 }
 EXPORT_SYMBOL(pcie_set_mps);
 
-/**
- * pcie_get_minimum_link - determine minimum link settings of a PCI device
- * @dev: PCI device to query
- * @speed: storage for minimum speed
- * @width: storage for minimum width
- *
- * This function will walk up the PCI device chain and determine the minimum
- * link width and speed of the device.
- */
-int pcie_get_minimum_link(struct pci_dev *dev, enum pci_bus_speed *speed,
- enum pcie_link_width *width)
-{
-   int ret;
-
-   *speed = PCI_SPEED_UNKNOWN;
-   *width = PCIE_LNK_WIDTH_UNKNOWN;
-
-   while (dev) {
-   u16 lnksta;
-   enum pci_bus_speed next_speed;
-   enum pcie_link_width next_width;
-
-   ret = pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
-   if (ret)
-   return ret;
-
-   next_speed = pcie_link_speed[lnksta & PCI_EXP_LNKSTA_CLS];
-   next_width = (lnksta & PCI_EXP_LNKSTA_NLW) >>
-   PCI_EXP_LNKSTA_NLW_SHIFT;
-
-   if (next_speed < *speed)
-   *speed = next_speed;
-
-   if (next_width < *width)
-   *width = next_width;
-
-   dev = dev->bus->self;
-   }
-
-   return 0;
-}
-EXPORT_SYMBOL(pcie_get_minimum_link);
-
 /**
  * pcie_bandwidth_available - determine minimum link settings of a PCIe
  *   device and its bandwidth limitation
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 38f7957121ef..5ccee29fe1b1 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1081,8 +1081,6 @@ int pcie_get_readrq(struct pci_dev *dev);
 int pcie_set_readrq(struct pci_dev *dev, int rq);
 int pcie_get_mps(struct pci_dev *dev);
 int pcie_set_mps(struct pci_dev *dev, int mps);
-int pcie_get_minimum_link(struct pci_dev *dev, enum pci_bus_speed *speed,
- enum pcie_link_width *width);
 u32 pcie_bandwidth_available(struct pci_dev *dev, struct pci_dev 
**limiting_dev,
 enum pci_bus_speed *speed,
 enum pcie_link_width *width);



[PATCH v5 10/14] bnxt_en: Report PCIe link properties with pcie_print_link_status()

2018-03-30 Thread Bjorn Helgaas
From: Bjorn Helgaas <bhelg...@google.com>

Use pcie_print_link_status() to report PCIe link speed and possible
limitations instead of implementing this in the driver itself.

Note that pcie_get_minimum_link() can return misleading information because
it finds the slowest link and the narrowest link without considering the
total bandwidth of the link.  If the path contains a 16 GT/s x1 link and a
2.5 GT/s x16 link, pcie_get_minimum_link() returns 2.5 GT/s x1, which
corresponds to 250 MB/s of bandwidth, not the actual available bandwidth of
about 2000 MB/s for a 16 GT/s x1 link.

Signed-off-by: Bjorn Helgaas <bhelg...@google.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c |   19 +--
 1 file changed, 1 insertion(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 1500243b9886..3be42431e029 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -8469,22 +8469,6 @@ static int bnxt_init_mac_addr(struct bnxt *bp)
return rc;
 }
 
-static void bnxt_parse_log_pcie_link(struct bnxt *bp)
-{
-   enum pcie_link_width width = PCIE_LNK_WIDTH_UNKNOWN;
-   enum pci_bus_speed speed = PCI_SPEED_UNKNOWN;
-
-   if (pcie_get_minimum_link(pci_physfn(bp->pdev), &speed, &width) ||
-   speed == PCI_SPEED_UNKNOWN || width == PCIE_LNK_WIDTH_UNKNOWN)
-   netdev_info(bp->dev, "Failed to determine PCIe Link Info\n");
-   else
-   netdev_info(bp->dev, "PCIe: Speed %s Width x%d\n",
-   speed == PCIE_SPEED_2_5GT ? "2.5GT/s" :
-   speed == PCIE_SPEED_5_0GT ? "5.0GT/s" :
-   speed == PCIE_SPEED_8_0GT ? "8.0GT/s" :
-   "Unknown", width);
-}
-
 static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 {
static int version_printed;
@@ -8694,8 +8678,7 @@ static int bnxt_init_one(struct pci_dev *pdev, const 
struct pci_device_id *ent)
netdev_info(dev, "%s found at mem %lx, node addr %pM\n",
board_info[ent->driver_data].name,
(long)pci_resource_start(pdev, 0), dev->dev_addr);
-
-   bnxt_parse_log_pcie_link(bp);
+   pcie_print_link_status(pdev);
 
return 0;
 



[PATCH v5 07/14] net/mlx5: Report PCIe link properties with pcie_print_link_status()

2018-03-30 Thread Bjorn Helgaas
From: Tal Gilboa <ta...@mellanox.com>

Use pcie_print_link_status() to report PCIe link speed and possible
limitations.

Signed-off-by: Tal Gilboa <ta...@mellanox.com>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelg...@google.com>
Reviewed-by: Tariq Toukan <tar...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c |4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 2ef641c91c26..622f02d34aae 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1043,6 +1043,10 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, 
struct mlx5_priv *priv,
dev_info(&pdev->dev, "firmware version: %d.%d.%d\n", fw_rev_maj(dev),
 fw_rev_min(dev), fw_rev_sub(dev));
 
+   /* Only PFs hold the relevant PCIe information for this query */
+   if (mlx5_core_is_pf(dev))
+   pcie_print_link_status(dev->pdev);
+
/* on load removing any previous indication of internal error, device is
 * up
 */



[PATCH v5 08/14] net/mlx5e: Use pcie_bandwidth_available() to compute bandwidth

2018-03-30 Thread Bjorn Helgaas
From: Tal Gilboa <ta...@mellanox.com>

Use the new pci_bandwidth_available() function to calculate maximum
available bandwidth through the PCI chain instead of computing it ourselves
with mlx5e_get_pci_bw().

This is used to detect when the device is capable of more bandwidth than is
available in the current slot.  The driver may adjust compression settings
accordingly.

Note that pci_bandwidth_available() accounts for PCIe encoding overhead, so
it is more accurate than mlx5e_get_pci_bw() was.

Signed-off-by: Tal Gilboa <ta...@mellanox.com>
[bhelgaas: remove mlx5e_get_pci_bw() wrapper altogether]
Signed-off-by: Bjorn Helgaas <bhelg...@google.com>
Reviewed-by: Tariq Toukan <tar...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |   32 +
 1 file changed, 1 insertion(+), 31 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 47bab842c5ee..93291ec4a3d1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3864,36 +3864,6 @@ void mlx5e_build_default_indir_rqt(u32 *indirection_rqt, 
int len,
indirection_rqt[i] = i % num_channels;
 }
 
-static int mlx5e_get_pci_bw(struct mlx5_core_dev *mdev, u32 *pci_bw)
-{
-   enum pcie_link_width width;
-   enum pci_bus_speed speed;
-   int err = 0;
-
-   err = pcie_get_minimum_link(mdev->pdev, &speed, &width);
-   if (err)
-   return err;
-
-   if (speed == PCI_SPEED_UNKNOWN || width == PCIE_LNK_WIDTH_UNKNOWN)
-   return -EINVAL;
-
-   switch (speed) {
-   case PCIE_SPEED_2_5GT:
-   *pci_bw = 2500 * width;
-   break;
-   case PCIE_SPEED_5_0GT:
-   *pci_bw = 5000 * width;
-   break;
-   case PCIE_SPEED_8_0GT:
-   *pci_bw = 8000 * width;
-   break;
-   default:
-   return -EINVAL;
-   }
-
-   return 0;
-}
-
 static bool cqe_compress_heuristic(u32 link_speed, u32 pci_bw)
 {
return (link_speed && pci_bw &&
@@ -3979,7 +3949,7 @@ void mlx5e_build_nic_params(struct mlx5_core_dev *mdev,
params->num_tc   = 1;
 
mlx5e_get_max_linkspeed(mdev, &link_speed);
-   mlx5e_get_pci_bw(mdev, &pci_bw);
+   pci_bw = pcie_bandwidth_available(mdev->pdev, NULL, NULL, NULL);
mlx5_core_dbg(mdev, "Max link speed = %d, PCI BW = %d\n",
  link_speed, pci_bw);
 



[PATCH v5 06/14] net/mlx4_core: Report PCIe link properties with pcie_print_link_status()

2018-03-30 Thread Bjorn Helgaas
From: Tal Gilboa <ta...@mellanox.com>

Use pcie_print_link_status() to report PCIe link speed and possible
limitations instead of implementing this in the driver itself.

Signed-off-by: Tal Gilboa <ta...@mellanox.com>
Signed-off-by: Tariq Toukan <tar...@mellanox.com>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelg...@google.com>
---
 drivers/net/ethernet/mellanox/mlx4/main.c |   81 -
 1 file changed, 1 insertion(+), 80 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c 
b/drivers/net/ethernet/mellanox/mlx4/main.c
index 4d84cab77105..30cacac54e69 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -623,85 +623,6 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct 
mlx4_dev_cap *dev_cap)
return 0;
 }
 
-static int mlx4_get_pcie_dev_link_caps(struct mlx4_dev *dev,
-  enum pci_bus_speed *speed,
-  enum pcie_link_width *width)
-{
-   u32 lnkcap1, lnkcap2;
-   int err1, err2;
-
-#define  PCIE_MLW_CAP_SHIFT 4  /* start of MLW mask in link capabilities */
-
-   *speed = PCI_SPEED_UNKNOWN;
-   *width = PCIE_LNK_WIDTH_UNKNOWN;
-
-   err1 = pcie_capability_read_dword(dev->persist->pdev, PCI_EXP_LNKCAP,
- &lnkcap1);
-   err2 = pcie_capability_read_dword(dev->persist->pdev, PCI_EXP_LNKCAP2,
- &lnkcap2);
-   if (!err2 && lnkcap2) { /* PCIe r3.0-compliant */
-   if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_8_0GB)
-   *speed = PCIE_SPEED_8_0GT;
-   else if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_5_0GB)
-   *speed = PCIE_SPEED_5_0GT;
-   else if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_2_5GB)
-   *speed = PCIE_SPEED_2_5GT;
-   }
-   if (!err1) {
-   *width = (lnkcap1 & PCI_EXP_LNKCAP_MLW) >> PCIE_MLW_CAP_SHIFT;
-   if (!lnkcap2) { /* pre-r3.0 */
-   if (lnkcap1 & PCI_EXP_LNKCAP_SLS_5_0GB)
-   *speed = PCIE_SPEED_5_0GT;
-   else if (lnkcap1 & PCI_EXP_LNKCAP_SLS_2_5GB)
-   *speed = PCIE_SPEED_2_5GT;
-   }
-   }
-
-   if (*speed == PCI_SPEED_UNKNOWN || *width == PCIE_LNK_WIDTH_UNKNOWN) {
-   return err1 ? err1 :
-   err2 ? err2 : -EINVAL;
-   }
-   return 0;
-}
-
-static void mlx4_check_pcie_caps(struct mlx4_dev *dev)
-{
-   enum pcie_link_width width, width_cap;
-   enum pci_bus_speed speed, speed_cap;
-   int err;
-
-#define PCIE_SPEED_STR(speed) \
-   (speed == PCIE_SPEED_8_0GT ? "8.0GT/s" : \
-speed == PCIE_SPEED_5_0GT ? "5.0GT/s" : \
-speed == PCIE_SPEED_2_5GT ? "2.5GT/s" : \
-"Unknown")
-
-   err = mlx4_get_pcie_dev_link_caps(dev, &speed_cap, &width_cap);
-   if (err) {
-   mlx4_warn(dev,
- "Unable to determine PCIe device BW capabilities\n");
-   return;
-   }
-
-   err = pcie_get_minimum_link(dev->persist->pdev, &speed, &width);
-   if (err || speed == PCI_SPEED_UNKNOWN ||
-   width == PCIE_LNK_WIDTH_UNKNOWN) {
-   mlx4_warn(dev,
- "Unable to determine PCI device chain minimum BW\n");
-   return;
-   }
-
-   if (width != width_cap || speed != speed_cap)
-   mlx4_warn(dev,
- "PCIe BW is different than device's capability\n");
-
-   mlx4_info(dev, "PCIe link speed is %s, device supports %s\n",
- PCIE_SPEED_STR(speed), PCIE_SPEED_STR(speed_cap));
-   mlx4_info(dev, "PCIe link width is x%d, device supports x%d\n",
- width, width_cap);
-   return;
-}
-
 /*The function checks if there are live vf, return the num of them*/
 static int mlx4_how_many_lives_vf(struct mlx4_dev *dev)
 {
@@ -3475,7 +3396,7 @@ static int mlx4_load_one(struct pci_dev *pdev, int 
pci_dev_data,
 * express device capabilities are under-satisfied by the bus.
 */
if (!mlx4_is_slave(dev))
-   mlx4_check_pcie_caps(dev);
+   pcie_print_link_status(dev->persist->pdev);
 
/* In master functions, the communication channel must be initialized
 * after obtaining its address from fw */



[PATCH v5 01/14] PCI: Add pcie_get_speed_cap() to find max supported link speed

2018-03-30 Thread Bjorn Helgaas
From: Tal Gilboa <ta...@mellanox.com>

Add pcie_get_speed_cap() to find the max link speed supported by a device.
Change max_link_speed_show() to use pcie_get_speed_cap().

Signed-off-by: Tal Gilboa <ta...@mellanox.com>
[bhelgaas: return speed directly instead of error and *speed, don't export
outside drivers/pci]
Signed-off-by: Bjorn Helgaas <bhelg...@google.com>
Reviewed-by: Tariq Toukan <tar...@mellanox.com>
---
 drivers/pci/pci-sysfs.c |   28 ++--
 drivers/pci/pci.c   |   44 
 drivers/pci/pci.h   |   10 ++
 3 files changed, 56 insertions(+), 26 deletions(-)

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 7dc5be545d18..c2ea05fbbf1d 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -158,33 +158,9 @@ static DEVICE_ATTR_RO(resource);
 static ssize_t max_link_speed_show(struct device *dev,
   struct device_attribute *attr, char *buf)
 {
-   struct pci_dev *pci_dev = to_pci_dev(dev);
-   u32 linkcap;
-   int err;
-   const char *speed;
-
-   err = pcie_capability_read_dword(pci_dev, PCI_EXP_LNKCAP, &linkcap);
-   if (err)
-   return -EINVAL;
-
-   switch (linkcap & PCI_EXP_LNKCAP_SLS) {
-   case PCI_EXP_LNKCAP_SLS_16_0GB:
-   speed = "16 GT/s";
-   break;
-   case PCI_EXP_LNKCAP_SLS_8_0GB:
-   speed = "8 GT/s";
-   break;
-   case PCI_EXP_LNKCAP_SLS_5_0GB:
-   speed = "5 GT/s";
-   break;
-   case PCI_EXP_LNKCAP_SLS_2_5GB:
-   speed = "2.5 GT/s";
-   break;
-   default:
-   speed = "Unknown speed";
-   }
+   struct pci_dev *pdev = to_pci_dev(dev);
 
-   return sprintf(buf, "%s\n", speed);
+   return sprintf(buf, "%s\n", PCIE_SPEED2STR(pcie_get_speed_cap(pdev)));
 }
 static DEVICE_ATTR_RO(max_link_speed);
 
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index f6a4dd10d9b0..b29d3436ee9f 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5146,6 +5146,50 @@ int pcie_get_minimum_link(struct pci_dev *dev, enum 
pci_bus_speed *speed,
 }
 EXPORT_SYMBOL(pcie_get_minimum_link);
 
+/**
+ * pcie_get_speed_cap - query for the PCI device's link speed capability
+ * @dev: PCI device to query
+ *
+ * Query the PCI device speed capability.  Return the maximum link speed
+ * supported by the device.
+ */
+enum pci_bus_speed pcie_get_speed_cap(struct pci_dev *dev)
+{
+   u32 lnkcap2, lnkcap;
+
+   /*
+* PCIe r4.0 sec 7.5.3.18 recommends using the Supported Link
+* Speeds Vector in Link Capabilities 2 when supported, falling
+* back to Max Link Speed in Link Capabilities otherwise.
+*/
+   pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2, &lnkcap2);
+   if (lnkcap2) { /* PCIe r3.0-compliant */
+   if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_16_0GB)
+   return PCIE_SPEED_16_0GT;
+   else if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_8_0GB)
+   return PCIE_SPEED_8_0GT;
+   else if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_5_0GB)
+   return PCIE_SPEED_5_0GT;
+   else if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_2_5GB)
+   return PCIE_SPEED_2_5GT;
+   return PCI_SPEED_UNKNOWN;
+   }
+
+   pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
+   if (lnkcap) {
+   if (lnkcap & PCI_EXP_LNKCAP_SLS_16_0GB)
+   return PCIE_SPEED_16_0GT;
+   else if (lnkcap & PCI_EXP_LNKCAP_SLS_8_0GB)
+   return PCIE_SPEED_8_0GT;
+   else if (lnkcap & PCI_EXP_LNKCAP_SLS_5_0GB)
+   return PCIE_SPEED_5_0GT;
+   else if (lnkcap & PCI_EXP_LNKCAP_SLS_2_5GB)
+   return PCIE_SPEED_2_5GT;
+   }
+
+   return PCI_SPEED_UNKNOWN;
+}
+
 /**
  * pci_select_bars - Make BAR mask from the type of resource
  * @dev: the PCI device for which BAR mask is made
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index fcd81911b127..1186d8be6055 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -253,6 +253,16 @@ bool pci_bus_clip_resource(struct pci_dev *dev, int idx);
 void pci_reassigndev_resource_alignment(struct pci_dev *dev);
 void pci_disable_bridge_window(struct pci_dev *dev);
 
+/* PCIe link information */
+#define PCIE_SPEED2STR(speed) \
+   ((speed) == PCIE_SPEED_16_0GT ? "16 GT/s" : \
+(speed) == PCIE_SPEED_8_0GT ? "8 GT/s" : \
+(speed) == PCIE_SPEED_5_0GT ? "5 GT/s" : \
+(speed) == PCIE_SPEED_2_5GT ? "2.5 GT/s" : \
+"Unknown speed")
+
+enum pci_bus_speed pcie_get_speed_cap(struct pci_dev *dev);
+
 /* Single Root I/O Virtualization */
 struct pci_sriov {
int pos;/* Capability position */



[PATCH v5 03/14] PCI: Add pcie_bandwidth_capable() to compute max supported link bandwidth

2018-03-30 Thread Bjorn Helgaas
From: Tal Gilboa <ta...@mellanox.com>

Add pcie_bandwidth_capable() to compute the max link bandwidth supported by
a device, based on the max link speed and width, adjusted by the encoding
overhead.

The maximum bandwidth of the link is computed as:

  max_link_speed * max_link_width * (1 - encoding_overhead)

The encoding overhead is about 20% for 2.5 and 5.0 GT/s links using 8b/10b
encoding, and about 1.5% for 8 GT/s or higher speed links using 128b/130b
encoding.
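
As an aside (plain arithmetic, not from the original changelog), the per-lane
values that this overhead produces, as used by PCIE_SPEED2MBS_ENC in the hunk
below, are:

	2.5 GT/s * 8/10    = 2000 Mb/s per lane
	5.0 GT/s * 8/10    = 4000 Mb/s per lane
	8.0 GT/s * 128/130 ~= 7877 Mb/s per lane

so, for example, an 8 GT/s x16 link is capable of about 16 * 7877 = 126032 Mb/s
(~126 Gb/s).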

Signed-off-by: Tal Gilboa <ta...@mellanox.com>
[bhelgaas: adjust for pcie_get_speed_cap() and pcie_get_width_cap()
signatures, don't export outside drivers/pci]
Signed-off-by: Bjorn Helgaas <bhelg...@google.com>
Reviewed-by: Tariq Toukan <tar...@mellanox.com>
---
 drivers/pci/pci.c |   21 +
 drivers/pci/pci.h |9 +
 2 files changed, 30 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 43075be79388..9ce89e254197 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5208,6 +5208,27 @@ enum pcie_link_width pcie_get_width_cap(struct pci_dev 
*dev)
return PCIE_LNK_WIDTH_UNKNOWN;
 }
 
+/**
+ * pcie_bandwidth_capable - calculates a PCI device's link bandwidth capability
+ * @dev: PCI device
+ * @speed: storage for link speed
+ * @width: storage for link width
+ *
+ * Calculate a PCI device's link bandwidth by querying for its link speed
+ * and width, multiplying them, and applying encoding overhead.
+ */
+u32 pcie_bandwidth_capable(struct pci_dev *dev, enum pci_bus_speed *speed,
+  enum pcie_link_width *width)
+{
+   *speed = pcie_get_speed_cap(dev);
+   *width = pcie_get_width_cap(dev);
+
+   if (*speed == PCI_SPEED_UNKNOWN || *width == PCIE_LNK_WIDTH_UNKNOWN)
+   return 0;
+
+   return *width * PCIE_SPEED2MBS_ENC(*speed);
+}
+
 /**
  * pci_select_bars - Make BAR mask from the type of resource
  * @dev: the PCI device for which BAR mask is made
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 66738f1050c0..2a50172b9803 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -261,8 +261,17 @@ void pci_disable_bridge_window(struct pci_dev *dev);
 (speed) == PCIE_SPEED_2_5GT ? "2.5 GT/s" : \
 "Unknown speed")
 
+/* PCIe speed to Mb/s with encoding overhead: 20% for gen2, ~1.5% for gen3 */
+#define PCIE_SPEED2MBS_ENC(speed) \
+   ((speed) == PCIE_SPEED_8_0GT ? 7877 : \
+(speed) == PCIE_SPEED_5_0GT ? 4000 : \
+(speed) == PCIE_SPEED_2_5GT ? 2000 : \
+0)
+
 enum pci_bus_speed pcie_get_speed_cap(struct pci_dev *dev);
 enum pcie_link_width pcie_get_width_cap(struct pci_dev *dev);
+u32 pcie_bandwidth_capable(struct pci_dev *dev, enum pci_bus_speed *speed,
+  enum pcie_link_width *width);
 
 /* Single Root I/O Virtualization */
 struct pci_sriov {



[PATCH v5 02/14] PCI: Add pcie_get_width_cap() to find max supported link width

2018-03-30 Thread Bjorn Helgaas
From: Tal Gilboa <ta...@mellanox.com>

Add pcie_get_width_cap() to find the max link width supported by a device.
Change max_link_width_show() to use pcie_get_width_cap().

Signed-off-by: Tal Gilboa <ta...@mellanox.com>
[bhelgaas: return width directly instead of error and *width, don't export
outside drivers/pci]
Signed-off-by: Bjorn Helgaas <bhelg...@google.com>
Reviewed-by: Tariq Toukan <tar...@mellanox.com>
---
 drivers/pci/pci-sysfs.c |   10 ++
 drivers/pci/pci.c   |   18 ++
 drivers/pci/pci.h   |1 +
 3 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index c2ea05fbbf1d..63d0952684fb 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -167,15 +167,9 @@ static DEVICE_ATTR_RO(max_link_speed);
 static ssize_t max_link_width_show(struct device *dev,
   struct device_attribute *attr, char *buf)
 {
-   struct pci_dev *pci_dev = to_pci_dev(dev);
-   u32 linkcap;
-   int err;
-
-   err = pcie_capability_read_dword(pci_dev, PCI_EXP_LNKCAP, &linkcap);
-   if (err)
-   return -EINVAL;
+   struct pci_dev *pdev = to_pci_dev(dev);
 
-   return sprintf(buf, "%u\n", (linkcap & PCI_EXP_LNKCAP_MLW) >> 4);
+   return sprintf(buf, "%u\n", pcie_get_width_cap(pdev));
 }
 static DEVICE_ATTR_RO(max_link_width);
 
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index b29d3436ee9f..43075be79388 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5190,6 +5190,24 @@ enum pci_bus_speed pcie_get_speed_cap(struct pci_dev 
*dev)
return PCI_SPEED_UNKNOWN;
 }
 
+/**
+ * pcie_get_width_cap - query for the PCI device's link width capability
+ * @dev: PCI device to query
+ *
+ * Query the PCI device width capability.  Return the maximum link width
+ * supported by the device.
+ */
+enum pcie_link_width pcie_get_width_cap(struct pci_dev *dev)
+{
+   u32 lnkcap;
+
+   pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
+   if (lnkcap)
+   return (lnkcap & PCI_EXP_LNKCAP_MLW) >> 4;
+
+   return PCIE_LNK_WIDTH_UNKNOWN;
+}
+
 /**
  * pci_select_bars - Make BAR mask from the type of resource
  * @dev: the PCI device for which BAR mask is made
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 1186d8be6055..66738f1050c0 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -262,6 +262,7 @@ void pci_disable_bridge_window(struct pci_dev *dev);
 "Unknown speed")
 
 enum pci_bus_speed pcie_get_speed_cap(struct pci_dev *dev);
+enum pcie_link_width pcie_get_width_cap(struct pci_dev *dev);
 
 /* Single Root I/O Virtualization */
 struct pci_sriov {



[PATCH v5 04/14] PCI: Add pcie_bandwidth_available() to compute bandwidth available to device

2018-03-30 Thread Bjorn Helgaas
From: Tal Gilboa <ta...@mellanox.com>

Add pcie_bandwidth_available() to compute the bandwidth available to a
device.  This may be limited by the device itself or by a slower upstream
link leading to the device.

The available bandwidth at each link along the path is computed as:

  link_speed * link_width * (1 - encoding_overhead)

The encoding overhead is about 20% for 2.5 and 5.0 GT/s links using 8b/10b
encoding, and about 1.5% for 8 GT/s or higher speed links using 128b/130b
encoding.

Also return the device with the slowest link and the speed and width of
that link.
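
An illustrative caller (a sketch only; "pdev" and the log message are made
up, not part of this patch) might use it like:

	struct pci_dev *limiting = NULL;
	enum pci_bus_speed speed;
	enum pcie_link_width width;
	u32 bw;

	/* Mb/s available at the slowest link on the path to the root */
	bw = pcie_bandwidth_available(pdev, &limiting, &speed, &width);
	if (limiting && limiting != pdev)
		pci_info(pdev, "bandwidth limited to %u Mb/s by %s\n",
			 bw, pci_name(limiting));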

Signed-off-by: Tal Gilboa <ta...@mellanox.com>
[bhelgaas: changelog, leave pcie_get_minimum_link() alone for now, return
bw directly, use pci_upstream_bridge(), check "next_bw <= bw" to find
uppermost limiting device, return speed/width of the limiting device]
Signed-off-by: Bjorn Helgaas <bhelg...@google.com>
---
 drivers/pci/pci.c   |   54 +++
 include/linux/pci.h |3 +++
 2 files changed, 57 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 9ce89e254197..e00d56b12747 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5146,6 +5146,60 @@ int pcie_get_minimum_link(struct pci_dev *dev, enum 
pci_bus_speed *speed,
 }
 EXPORT_SYMBOL(pcie_get_minimum_link);
 
+/**
+ * pcie_bandwidth_available - determine minimum link settings of a PCIe
+ *   device and its bandwidth limitation
+ * @dev: PCI device to query
+ * @limiting_dev: storage for device causing the bandwidth limitation
+ * @speed: storage for speed of limiting device
+ * @width: storage for width of limiting device
+ *
+ * Walk up the PCI device chain and find the point where the minimum
+ * bandwidth is available.  Return the bandwidth available there and (if
+ * limiting_dev, speed, and width pointers are supplied) information about
+ * that point.
+ */
+u32 pcie_bandwidth_available(struct pci_dev *dev, struct pci_dev 
**limiting_dev,
+enum pci_bus_speed *speed,
+enum pcie_link_width *width)
+{
+   u16 lnksta;
+   enum pci_bus_speed next_speed;
+   enum pcie_link_width next_width;
+   u32 bw, next_bw;
+
+   *speed = PCI_SPEED_UNKNOWN;
+   *width = PCIE_LNK_WIDTH_UNKNOWN;
+   bw = 0;
+
+   while (dev) {
+   pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
+
+   next_speed = pcie_link_speed[lnksta & PCI_EXP_LNKSTA_CLS];
+   next_width = (lnksta & PCI_EXP_LNKSTA_NLW) >>
+   PCI_EXP_LNKSTA_NLW_SHIFT;
+
+   next_bw = next_width * PCIE_SPEED2MBS_ENC(next_speed);
+
+   /* Check if current device limits the total bandwidth */
+   if (!bw || next_bw <= bw) {
+   bw = next_bw;
+
+   if (limiting_dev)
+   *limiting_dev = dev;
+   if (speed)
+   *speed = next_speed;
+   if (width)
+   *width = next_width;
+   }
+
+   dev = pci_upstream_bridge(dev);
+   }
+
+   return bw;
+}
+EXPORT_SYMBOL(pcie_bandwidth_available);
+
 /**
  * pcie_get_speed_cap - query for the PCI device's link speed capability
  * @dev: PCI device to query
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 8043a5937ad0..f2bf2b7a66c7 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1083,6 +1083,9 @@ int pcie_get_mps(struct pci_dev *dev);
 int pcie_set_mps(struct pci_dev *dev, int mps);
 int pcie_get_minimum_link(struct pci_dev *dev, enum pci_bus_speed *speed,
  enum pcie_link_width *width);
+u32 pcie_bandwidth_available(struct pci_dev *dev, struct pci_dev 
**limiting_dev,
+enum pci_bus_speed *speed,
+enum pcie_link_width *width);
 void pcie_flr(struct pci_dev *dev);
 int __pci_reset_function_locked(struct pci_dev *dev);
 int pci_reset_function(struct pci_dev *dev);



Re: [PATCH v5 03/14] PCI: Add pcie_bandwidth_capable() to compute max supported link bandwidth

2018-04-02 Thread Bjorn Helgaas
On Mon, Apr 02, 2018 at 10:34:58AM +0300, Tal Gilboa wrote:
> On 4/2/2018 3:40 AM, Bjorn Helgaas wrote:
> > On Sun, Apr 01, 2018 at 11:38:53PM +0300, Tal Gilboa wrote:
> > > On 3/31/2018 12:05 AM, Bjorn Helgaas wrote:
> > > > From: Tal Gilboa <ta...@mellanox.com>
> > > > 
> > > > Add pcie_bandwidth_capable() to compute the max link bandwidth 
> > > > supported by
> > > > a device, based on the max link speed and width, adjusted by the 
> > > > encoding
> > > > overhead.
> > > > 
> > > > The maximum bandwidth of the link is computed as:
> > > > 
> > > > max_link_speed * max_link_width * (1 - encoding_overhead)
> > > > 
> > > > The encoding overhead is about 20% for 2.5 and 5.0 GT/s links using 
> > > > 8b/10b
> > > > encoding, and about 1.5% for 8 GT/s or higher speed links using 
> > > > 128b/130b
> > > > encoding.
> > > > 
> > > > Signed-off-by: Tal Gilboa <ta...@mellanox.com>
> > > > [bhelgaas: adjust for pcie_get_speed_cap() and pcie_get_width_cap()
> > > > signatures, don't export outside drivers/pci]
> > > > Signed-off-by: Bjorn Helgaas <bhelg...@google.com>
> > > > Reviewed-by: Tariq Toukan <tar...@mellanox.com>
> > > > ---
> > > >drivers/pci/pci.c |   21 +
> > > >drivers/pci/pci.h |9 +
> > > >2 files changed, 30 insertions(+)
> > > > 
> > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > > index 43075be79388..9ce89e254197 100644
> > > > --- a/drivers/pci/pci.c
> > > > +++ b/drivers/pci/pci.c
> > > > @@ -5208,6 +5208,27 @@ enum pcie_link_width pcie_get_width_cap(struct 
> > > > pci_dev *dev)
> > > > return PCIE_LNK_WIDTH_UNKNOWN;
> > > >}
> > > > +/**
> > > > + * pcie_bandwidth_capable - calculates a PCI device's link bandwidth 
> > > > capability
> > > > + * @dev: PCI device
> > > > + * @speed: storage for link speed
> > > > + * @width: storage for link width
> > > > + *
> > > > + * Calculate a PCI device's link bandwidth by querying for its link 
> > > > speed
> > > > + * and width, multiplying them, and applying encoding overhead.
> > > > + */
> > > > +u32 pcie_bandwidth_capable(struct pci_dev *dev, enum pci_bus_speed 
> > > > *speed,
> > > > +  enum pcie_link_width *width)
> > > > +{
> > > > +   *speed = pcie_get_speed_cap(dev);
> > > > +   *width = pcie_get_width_cap(dev);
> > > > +
> > > > +   if (*speed == PCI_SPEED_UNKNOWN || *width == 
> > > > PCIE_LNK_WIDTH_UNKNOWN)
> > > > +   return 0;
> > > > +
> > > > +   return *width * PCIE_SPEED2MBS_ENC(*speed);
> > > > +}
> > > > +
> > > >/**
> > > > * pci_select_bars - Make BAR mask from the type of resource
> > > > * @dev: the PCI device for which BAR mask is made
> > > > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > > > index 66738f1050c0..2a50172b9803 100644
> > > > --- a/drivers/pci/pci.h
> > > > +++ b/drivers/pci/pci.h
> > > > @@ -261,8 +261,17 @@ void pci_disable_bridge_window(struct pci_dev 
> > > > *dev);
> > > >  (speed) == PCIE_SPEED_2_5GT ? "2.5 GT/s" : \
> > > >  "Unknown speed")
> > > > +/* PCIe speed to Mb/s with encoding overhead: 20% for gen2, ~1.5% for 
> > > > gen3 */
> > > > +#define PCIE_SPEED2MBS_ENC(speed) \
> > > 
> > > Missing gen4.
> > 
> > I made it "gen3+".  I think that's accurate, isn't it?  The spec
> > doesn't seem to actually use "gen3" as a specific term, but sec 4.2.2
> > says rates of 8 GT/s or higher (which I think includes gen3 and gen4)
> > use 128b/130b encoding.
> > 
> 
> I meant that PCIE_SPEED_16_0GT will return 0 from this macro since it wasn't
> added. Need to return 15754.

Oh, duh, of course!  Sorry for being dense.  What about the following?
I included the calculation as opposed to just the magic numbers to try
to make it clear how they're derived.  This has the disadvantage of
truncating the result instead of rounding, but I doubt that's
significant in this context.  If it is, we could use the magic numbers
and put 

Re: [PATCH v5 03/14] PCI: Add pcie_bandwidth_capable() to compute max supported link bandwidth

2018-04-02 Thread Bjorn Helgaas
On Mon, Apr 02, 2018 at 04:00:16PM +, Keller, Jacob E wrote:
> > -Original Message-
> > From: Tal Gilboa [mailto:ta...@mellanox.com]
> > Sent: Monday, April 02, 2018 7:34 AM
> > To: Bjorn Helgaas <helg...@kernel.org>
> > Cc: Tariq Toukan <tar...@mellanox.com>; Keller, Jacob E
> > <jacob.e.kel...@intel.com>; Ariel Elior <ariel.el...@cavium.com>; Ganesh
> > Goudar <ganes...@chelsio.com>; Kirsher, Jeffrey T
> > <jeffrey.t.kirs...@intel.com>; everest-linux...@cavium.com; intel-wired-
> > l...@lists.osuosl.org; netdev@vger.kernel.org; linux-ker...@vger.kernel.org;
> > linux-...@vger.kernel.org
> > Subject: Re: [PATCH v5 03/14] PCI: Add pcie_bandwidth_capable() to compute
> > max supported link bandwidth
> > 
> > On 4/2/2018 5:05 PM, Bjorn Helgaas wrote:
> > > On Mon, Apr 02, 2018 at 10:34:58AM +0300, Tal Gilboa wrote:
> > >> On 4/2/2018 3:40 AM, Bjorn Helgaas wrote:
> > >>> On Sun, Apr 01, 2018 at 11:38:53PM +0300, Tal Gilboa wrote:
> > >>>> On 3/31/2018 12:05 AM, Bjorn Helgaas wrote:
> > >>>>> From: Tal Gilboa <ta...@mellanox.com>
> > >>>>>
> > >>>>> Add pcie_bandwidth_capable() to compute the max link bandwidth
> > supported by
> > >>>>> a device, based on the max link speed and width, adjusted by the
> > encoding
> > >>>>> overhead.
> > >>>>>
> > >>>>> The maximum bandwidth of the link is computed as:
> > >>>>>
> > >>>>>  max_link_speed * max_link_width * (1 - encoding_overhead)
> > >>>>>
> > >>>>> The encoding overhead is about 20% for 2.5 and 5.0 GT/s links using
> > 8b/10b
> > >>>>> encoding, and about 1.5% for 8 GT/s or higher speed links using 
> > >>>>> 128b/130b
> > >>>>> encoding.
> > >>>>>
> > >>>>> Signed-off-by: Tal Gilboa <ta...@mellanox.com>
> > >>>>> [bhelgaas: adjust for pcie_get_speed_cap() and pcie_get_width_cap()
> > >>>>> signatures, don't export outside drivers/pci]
> > >>>>> Signed-off-by: Bjorn Helgaas <bhelg...@google.com>
> > >>>>> Reviewed-by: Tariq Toukan <tar...@mellanox.com>
> > >>>>> ---
> > >>>>> drivers/pci/pci.c |   21 +
> > >>>>> drivers/pci/pci.h |9 +
> > >>>>> 2 files changed, 30 insertions(+)
> > >>>>>
> > >>>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > >>>>> index 43075be79388..9ce89e254197 100644
> > >>>>> --- a/drivers/pci/pci.c
> > >>>>> +++ b/drivers/pci/pci.c
> > >>>>> @@ -5208,6 +5208,27 @@ enum pcie_link_width
> > pcie_get_width_cap(struct pci_dev *dev)
> > >>>>>   return PCIE_LNK_WIDTH_UNKNOWN;
> > >>>>> }
> > >>>>> +/**
> > >>>>> + * pcie_bandwidth_capable - calculates a PCI device's link bandwidth
> > capability
> > >>>>> + * @dev: PCI device
> > >>>>> + * @speed: storage for link speed
> > >>>>> + * @width: storage for link width
> > >>>>> + *
> > >>>>> + * Calculate a PCI device's link bandwidth by querying for its link 
> > >>>>> speed
> > >>>>> + * and width, multiplying them, and applying encoding overhead.
> > >>>>> + */
> > >>>>> +u32 pcie_bandwidth_capable(struct pci_dev *dev, enum pci_bus_speed
> > *speed,
> > >>>>> +enum pcie_link_width *width)
> > >>>>> +{
> > >>>>> + *speed = pcie_get_speed_cap(dev);
> > >>>>> + *width = pcie_get_width_cap(dev);
> > >>>>> +
> > >>>>> + if (*speed == PCI_SPEED_UNKNOWN || *width ==
> > PCIE_LNK_WIDTH_UNKNOWN)
> > >>>>> + return 0;
> > >>>>> +
> > >>>>> + return *width * PCIE_SPEED2MBS_ENC(*speed);
> > >>>>> +}
> > >>>>> +
> > >>>>> /**
> > >>>>>  * pci_select_bars - Make BAR mask from the type of resource
> > >>>>>  * @dev: the PCI device for which BAR mask is 

Re: [PATCH v5 05/14] PCI: Add pcie_print_link_status() to log link speed and whether it's limited

2018-04-02 Thread Bjorn Helgaas
On Mon, Apr 02, 2018 at 04:25:17PM +, Keller, Jacob E wrote:
> > -Original Message-
> > From: Bjorn Helgaas [mailto:helg...@kernel.org]
> > Sent: Friday, March 30, 2018 2:05 PM
> > To: Tal Gilboa <ta...@mellanox.com>
> > Cc: Tariq Toukan <tar...@mellanox.com>; Keller, Jacob E
> > <jacob.e.kel...@intel.com>; Ariel Elior <ariel.el...@cavium.com>; Ganesh
> > Goudar <ganes...@chelsio.com>; Kirsher, Jeffrey T
> > <jeffrey.t.kirs...@intel.com>; everest-linux...@cavium.com; intel-wired-
> > l...@lists.osuosl.org; netdev@vger.kernel.org; linux-ker...@vger.kernel.org;
> > linux-...@vger.kernel.org
> > Subject: [PATCH v5 05/14] PCI: Add pcie_print_link_status() to log link 
> > speed and
> > whether it's limited
> > 
> > From: Tal Gilboa <ta...@mellanox.com>
> > 
> > Add pcie_print_link_status().  This logs the current settings of the link
> > (speed, width, and total available bandwidth).
> > 
> > If the device is capable of more bandwidth but is limited by a slower
> > upstream link, we include information about the link that limits the
> > device's performance.
> > 
> > The user may be able to move the device to a different slot for better
> > performance.
> > 
> > This provides a unified method for all PCI devices to report status and
> > issues, instead of each device reporting in a different way, using
> > different code.
> > 
> > Signed-off-by: Tal Gilboa <ta...@mellanox.com>
> > [bhelgaas: changelog, reword log messages, print device capabilities when
> > not limited]
> > Signed-off-by: Bjorn Helgaas <bhelg...@google.com>
> > ---
> >  drivers/pci/pci.c   |   29 +
> >  include/linux/pci.h |1 +
> >  2 files changed, 30 insertions(+)
> > 
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index e00d56b12747..cec7aed09f6b 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -5283,6 +5283,35 @@ u32 pcie_bandwidth_capable(struct pci_dev *dev,
> > enum pci_bus_speed *speed,
> > return *width * PCIE_SPEED2MBS_ENC(*speed);
> >  }
> > 
> > +/**
> > + * pcie_print_link_status - Report the PCI device's link speed and width
> > + * @dev: PCI device to query
> > + *
> > + * Report the available bandwidth at the device.  If this is less than the
> > + * device is capable of, report the device's maximum possible bandwidth and
> > + * the upstream link that limits its performance to less than that.
> > + */
> > +void pcie_print_link_status(struct pci_dev *dev)
> > +{
> > +   enum pcie_link_width width, width_cap;
> > +   enum pci_bus_speed speed, speed_cap;
> > +   struct pci_dev *limiting_dev = NULL;
> > +   u32 bw_avail, bw_cap;
> > +
> > +   bw_cap = pcie_bandwidth_capable(dev, &speed_cap, &width_cap);
> > +   bw_avail = pcie_bandwidth_available(dev, &limiting_dev, &speed,
> > &width);
> > +
> > +   if (bw_avail >= bw_cap)
> > +   pci_info(dev, "%d Mb/s available bandwidth (%s x%d link)\n",
> > +bw_cap, PCIE_SPEED2STR(speed_cap), width_cap);
> > +   else
> > +   pci_info(dev, "%d Mb/s available bandwidth, limited by %s x%d
> > link at %s (capable of %d Mb/s with %s x%d link)\n",
> > +bw_avail, PCIE_SPEED2STR(speed), width,
> > +limiting_dev ? pci_name(limiting_dev) : "",
> > +bw_cap, PCIE_SPEED2STR(speed_cap), width_cap);
> > +}
> 
> Personally, I would make this last one a pci_warn() to indicate it at a
> higher log level, but I'm  ok with the wording, and if consensus is that
> this should be at info, I'm ok with that.

Tal's original patch did have a pci_warn() here, and we went back and
forth a bit.  They get bug reports when a device doesn't perform as
expected, which argues for pci_warn().  But they also got feedback
saying warnings are a bit too much, which argues for pci_info() [1]

I don't have a really strong opinion either way.  I have a slight
preference for info because the user may not be able to do anything
about it (there may not be a faster slot available), and I think
distros are usually configured so a warning interrupts the smooth
graphical boot.

It looks like mlx4, fm10k, and ixgbe currently use warnings, while
bnx2x, bnxt_en, and cxgb4 use info.  It's a tie so far :)

[1] https://lkml.kernel.org/r/e47f3628-b56c-4d0a-f18b-5ffaf261c...@mellanox.com

Here's a proposal for printing the bandwidth as "x.xxx Gb/s":

commit ad370f38c1b5e9b8bb941eaed84ebb676c4bdaa4
Author: Tal Gilboa <ta...@mellanox.com>
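
The idea (a minimal sketch, assuming the bandwidth values are kept in Mb/s;
names are illustrative, not the final patch) is to print the integer and
fractional Gb/s parts separately:

	pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth (%s x%d link)\n",
		 bw_avail / 1000, bw_avail % 1000,
		 PCIE_SPEED2STR(speed), width);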

Re: [PATCH v5 12/14] fm10k: Report PCIe link properties with pcie_print_link_status()

2018-04-02 Thread Bjorn Helgaas
On Mon, Apr 02, 2018 at 03:56:06PM +, Keller, Jacob E wrote:
> > -Original Message-
> > From: Bjorn Helgaas [mailto:helg...@kernel.org]
> > Sent: Friday, March 30, 2018 2:06 PM
> > To: Tal Gilboa <ta...@mellanox.com>
> > Cc: Tariq Toukan <tar...@mellanox.com>; Keller, Jacob E
> > <jacob.e.kel...@intel.com>; Ariel Elior <ariel.el...@cavium.com>; Ganesh
> > Goudar <ganes...@chelsio.com>; Kirsher, Jeffrey T
> > <jeffrey.t.kirs...@intel.com>; everest-linux...@cavium.com; intel-wired-
> > l...@lists.osuosl.org; netdev@vger.kernel.org; linux-ker...@vger.kernel.org;
> > linux-...@vger.kernel.org
> > Subject: [PATCH v5 12/14] fm10k: Report PCIe link properties with
> > pcie_print_link_status()
> > 
> > From: Bjorn Helgaas <bhelg...@google.com>
> > 
> > Use pcie_print_link_status() to report PCIe link speed and possible
> > limitations instead of implementing this in the driver itself.
> > 
> > Note that pcie_get_minimum_link() can return misleading information because
> > it finds the slowest link and the narrowest link without considering the
> > total bandwidth of the link.  If the path contains a 16 GT/s x1 link and a
> > 2.5 GT/s x16 link, pcie_get_minimum_link() returns 2.5 GT/s x1, which
> > corresponds to 250 MB/s of bandwidth, not the actual available bandwidth of
> > about 2000 MB/s for a 16 GT/s x1 link.
> 
> This comment is about what's being fixed, so it would have been easier to
> parse if it were written to more clearly indicate that we're removing
> (and not adding) this behavior.

Good point.  Is this any better?

  fm10k: Report PCIe link properties with pcie_print_link_status()
  
  Previously the driver used pcie_get_minimum_link() to warn when the NIC
  is in a slot that can't supply as much bandwidth as the NIC could use.
  
  pcie_get_minimum_link() can be misleading because it finds the slowest link
  and the narrowest link (which may be different links) without considering
  the total bandwidth of each link.  For a path with a 16 GT/s x1 link and a
  2.5 GT/s x16 link, it returns 2.5 GT/s x1, which corresponds to 250 MB/s of
  bandwidth, not the true available bandwidth of about 1969 MB/s for a
  16 GT/s x1 link.
  
  Use pcie_print_link_status() to report PCIe link speed and possible
  limitations instead of implementing this in the driver itself.  This finds
  the slowest link in the path to the device by computing the total bandwidth
  of each link and compares that with the capabilities of the device.
  
  Note that the driver previously used dev_warn() to suggest using a
  different slot, but pcie_print_link_status() uses dev_info() because if the
  platform has no faster slot available, the user can't do anything about the
  warning and may not want to be bothered with it.
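
(For reference, those numbers follow from the per-link formula: 2.5 GT/s x1
with 8b/10b encoding is 2500 * 8/10 = 2000 Mb/s = 250 MB/s, while 16 GT/s x1
with 128b/130b encoding is 16000 * 128/130 = ~15754 Mb/s = ~1969 MB/s.)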


Re: [PATCH v5 04/14] PCI: Add pcie_bandwidth_available() to compute bandwidth available to device

2018-04-01 Thread Bjorn Helgaas
On Sun, Apr 01, 2018 at 11:41:42PM +0300, Tal Gilboa wrote:
> On 3/31/2018 12:05 AM, Bjorn Helgaas wrote:
> > From: Tal Gilboa <ta...@mellanox.com>
> > 
> > Add pcie_bandwidth_available() to compute the bandwidth available to a
> > device.  This may be limited by the device itself or by a slower upstream
> > link leading to the device.
> > 
> > The available bandwidth at each link along the path is computed as:
> > 
> >link_speed * link_width * (1 - encoding_overhead)
> > 
> > The encoding overhead is about 20% for 2.5 and 5.0 GT/s links using 8b/10b
> > encoding, and about 1.5% for 8 GT/s or higher speed links using 128b/130b
> > encoding.
> > 
> > Also return the device with the slowest link and the speed and width of
> > that link.
> > 
> > Signed-off-by: Tal Gilboa <ta...@mellanox.com>
> > [bhelgaas: changelog, leave pcie_get_minimum_link() alone for now, return
> > bw directly, use pci_upstream_bridge(), check "next_bw <= bw" to find
> > uppermost limiting device, return speed/width of the limiting device]
> > Signed-off-by: Bjorn Helgaas <bhelg...@google.com>
> > ---
> >   drivers/pci/pci.c   |   54 
> > +++
> >   include/linux/pci.h |3 +++
> >   2 files changed, 57 insertions(+)
> > 
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 9ce89e254197..e00d56b12747 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -5146,6 +5146,60 @@ int pcie_get_minimum_link(struct pci_dev *dev, enum 
> > pci_bus_speed *speed,
> >   }
> >   EXPORT_SYMBOL(pcie_get_minimum_link);
> > +/**
> > + * pcie_bandwidth_available - determine minimum link settings of a PCIe
> > + *   device and its bandwidth limitation
> > + * @dev: PCI device to query
> > + * @limiting_dev: storage for device causing the bandwidth limitation
> > + * @speed: storage for speed of limiting device
> > + * @width: storage for width of limiting device
> > + *
> > + * Walk up the PCI device chain and find the point where the minimum
> > + * bandwidth is available.  Return the bandwidth available there and (if
> > + * limiting_dev, speed, and width pointers are supplied) information about
> > + * that point.
> > + */
> > +u32 pcie_bandwidth_available(struct pci_dev *dev, struct pci_dev 
> > **limiting_dev,
> > +enum pci_bus_speed *speed,
> > +enum pcie_link_width *width)
> > +{
> > +   u16 lnksta;
> > +   enum pci_bus_speed next_speed;
> > +   enum pcie_link_width next_width;
> > +   u32 bw, next_bw;
> > +
> > +   *speed = PCI_SPEED_UNKNOWN;
> > +   *width = PCIE_LNK_WIDTH_UNKNOWN;
> 
> This is not safe anymore, now that we allow speed/width=NULL.

Good catch, thanks!


Re: [PATCH v5 03/14] PCI: Add pcie_bandwidth_capable() to compute max supported link bandwidth

2018-04-01 Thread Bjorn Helgaas
On Sun, Apr 01, 2018 at 11:38:53PM +0300, Tal Gilboa wrote:
> On 3/31/2018 12:05 AM, Bjorn Helgaas wrote:
> > From: Tal Gilboa <ta...@mellanox.com>
> > 
> > Add pcie_bandwidth_capable() to compute the max link bandwidth supported by
> > a device, based on the max link speed and width, adjusted by the encoding
> > overhead.
> > 
> > The maximum bandwidth of the link is computed as:
> > 
> >max_link_speed * max_link_width * (1 - encoding_overhead)
> > 
> > The encoding overhead is about 20% for 2.5 and 5.0 GT/s links using 8b/10b
> > encoding, and about 1.5% for 8 GT/s or higher speed links using 128b/130b
> > encoding.
> > 
> > Signed-off-by: Tal Gilboa <ta...@mellanox.com>
> > [bhelgaas: adjust for pcie_get_speed_cap() and pcie_get_width_cap()
> > signatures, don't export outside drivers/pci]
> > Signed-off-by: Bjorn Helgaas <bhelg...@google.com>
> > Reviewed-by: Tariq Toukan <tar...@mellanox.com>
> > ---
> >   drivers/pci/pci.c |   21 +
> >   drivers/pci/pci.h |9 +
> >   2 files changed, 30 insertions(+)
> > 
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 43075be79388..9ce89e254197 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -5208,6 +5208,27 @@ enum pcie_link_width pcie_get_width_cap(struct 
> > pci_dev *dev)
> > return PCIE_LNK_WIDTH_UNKNOWN;
> >   }
> > +/**
> > + * pcie_bandwidth_capable - calculates a PCI device's link bandwidth 
> > capability
> > + * @dev: PCI device
> > + * @speed: storage for link speed
> > + * @width: storage for link width
> > + *
> > + * Calculate a PCI device's link bandwidth by querying for its link speed
> > + * and width, multiplying them, and applying encoding overhead.
> > + */
> > +u32 pcie_bandwidth_capable(struct pci_dev *dev, enum pci_bus_speed *speed,
> > +  enum pcie_link_width *width)
> > +{
> > +   *speed = pcie_get_speed_cap(dev);
> > +   *width = pcie_get_width_cap(dev);
> > +
> > +   if (*speed == PCI_SPEED_UNKNOWN || *width == PCIE_LNK_WIDTH_UNKNOWN)
> > +   return 0;
> > +
> > +   return *width * PCIE_SPEED2MBS_ENC(*speed);
> > +}
> > +
> >   /**
> >* pci_select_bars - Make BAR mask from the type of resource
> >* @dev: the PCI device for which BAR mask is made
> > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > index 66738f1050c0..2a50172b9803 100644
> > --- a/drivers/pci/pci.h
> > +++ b/drivers/pci/pci.h
> > @@ -261,8 +261,17 @@ void pci_disable_bridge_window(struct pci_dev *dev);
> >  (speed) == PCIE_SPEED_2_5GT ? "2.5 GT/s" : \
> >  "Unknown speed")
> > +/* PCIe speed to Mb/s with encoding overhead: 20% for gen2, ~1.5% for gen3 
> > */
> > +#define PCIE_SPEED2MBS_ENC(speed) \
> 
> Missing gen4.

I made it "gen3+".  I think that's accurate, isn't it?  The spec
doesn't seem to actually use "gen3" as a specific term, but sec 4.2.2
says rates of 8 GT/s or higher (which I think includes gen3 and gen4)
use 128b/130b encoding.


Re: [PATCH v5 03/14] PCI: Add pcie_bandwidth_capable() to compute max supported link bandwidth

2018-04-03 Thread Bjorn Helgaas
On Mon, Apr 02, 2018 at 05:30:54PM -0700, Jacob Keller wrote:
> On Mon, Apr 2, 2018 at 7:05 AM, Bjorn Helgaas <helg...@kernel.org> wrote:
> > +/* PCIe speed to Mb/s reduced by encoding overhead */
> > +#define PCIE_SPEED2MBS_ENC(speed) \
> > +   ((speed) == PCIE_SPEED_16_0GT ? (16000*(128/130)) : \
> > +(speed) == PCIE_SPEED_8_0GT  ?  (8000*(128/130)) : \
> > +(speed) == PCIE_SPEED_5_0GT  ?  (5000*(8/10)) : \
> > +(speed) == PCIE_SPEED_2_5GT  ?  (2500*(8/10)) : \
> > +0)
> > +
> 
> Should this be "(speed * x ) / y" instead? wouldn't they calculate
> 128/130 and truncate that to zero before multiplying by the speed? Or
> are compilers smart enough to do this the other way to avoid the
> losses?

Yep, thanks for saving me yet more embarrassment.
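
For what it's worth, one way to keep the derivation visible while letting
integer arithmetic do the right thing (a sketch, not necessarily the final
wording) is to multiply before dividing:

/* PCIe speed to Mb/s reduced by encoding overhead */
#define PCIE_SPEED2MBS_ENC(speed) \
	((speed) == PCIE_SPEED_16_0GT ? 16000*128/130 : \
	 (speed) == PCIE_SPEED_8_0GT  ?  8000*128/130 : \
	 (speed) == PCIE_SPEED_5_0GT  ?  5000*8/10 : \
	 (speed) == PCIE_SPEED_2_5GT  ?  2500*8/10 : \
	 0)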


[PATCH v5 05/14] PCI: Add pcie_print_link_status() to log link speed and whether it's limited

2018-03-30 Thread Bjorn Helgaas
From: Tal Gilboa <ta...@mellanox.com>

Add pcie_print_link_status().  This logs the current settings of the link
(speed, width, and total available bandwidth).

If the device is capable of more bandwidth but is limited by a slower
upstream link, we include information about the link that limits the
device's performance.

The user may be able to move the device to a different slot for better
performance.

This provides a unified method for all PCI devices to report status and
issues, instead of each device reporting in a different way, using
different code.
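
An illustrative call site (a sketch; the surrounding driver code is made up)
is a single line in a driver's probe path:

	/* e.g. from a NIC driver's probe(), after pci_enable_device() */
	pcie_print_link_status(pdev);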

Signed-off-by: Tal Gilboa <ta...@mellanox.com>
[bhelgaas: changelog, reword log messages, print device capabilities when
not limited]
Signed-off-by: Bjorn Helgaas <bhelg...@google.com>
---
 drivers/pci/pci.c   |   29 +
 include/linux/pci.h |1 +
 2 files changed, 30 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index e00d56b12747..cec7aed09f6b 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5283,6 +5283,35 @@ u32 pcie_bandwidth_capable(struct pci_dev *dev, enum 
pci_bus_speed *speed,
return *width * PCIE_SPEED2MBS_ENC(*speed);
 }
 
+/**
+ * pcie_print_link_status - Report the PCI device's link speed and width
+ * @dev: PCI device to query
+ *
+ * Report the available bandwidth at the device.  If this is less than the
+ * device is capable of, report the device's maximum possible bandwidth and
+ * the upstream link that limits its performance to less than that.
+ */
+void pcie_print_link_status(struct pci_dev *dev)
+{
+   enum pcie_link_width width, width_cap;
+   enum pci_bus_speed speed, speed_cap;
+   struct pci_dev *limiting_dev = NULL;
+   u32 bw_avail, bw_cap;
+
+   bw_cap = pcie_bandwidth_capable(dev, &speed_cap, &width_cap);
+   bw_avail = pcie_bandwidth_available(dev, &limiting_dev, &speed, &width);
+
+   if (bw_avail >= bw_cap)
+   pci_info(dev, "%d Mb/s available bandwidth (%s x%d link)\n",
+bw_cap, PCIE_SPEED2STR(speed_cap), width_cap);
+   else
+   pci_info(dev, "%d Mb/s available bandwidth, limited by %s x%d 
link at %s (capable of %d Mb/s with %s x%d link)\n",
+bw_avail, PCIE_SPEED2STR(speed), width,
+limiting_dev ? pci_name(limiting_dev) : "",
+bw_cap, PCIE_SPEED2STR(speed_cap), width_cap);
+}
+EXPORT_SYMBOL(pcie_print_link_status);
+
 /**
  * pci_select_bars - Make BAR mask from the type of resource
  * @dev: the PCI device for which BAR mask is made
diff --git a/include/linux/pci.h b/include/linux/pci.h
index f2bf2b7a66c7..38f7957121ef 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1086,6 +1086,7 @@ int pcie_get_minimum_link(struct pci_dev *dev, enum 
pci_bus_speed *speed,
 u32 pcie_bandwidth_available(struct pci_dev *dev, struct pci_dev 
**limiting_dev,
 enum pci_bus_speed *speed,
 enum pcie_link_width *width);
+void pcie_print_link_status(struct pci_dev *dev);
 void pcie_flr(struct pci_dev *dev);
 int __pci_reset_function_locked(struct pci_dev *dev);
 int pci_reset_function(struct pci_dev *dev);



[PATCH v5 00/14] Report PCI device link status

2018-03-30 Thread Bjorn Helgaas
This is mostly Tal's work to reduce code duplication in drivers and unify
the approach for reporting PCIe link speed/width and whether the device is
being limited by a slower upstream link.

This v5 series is based on Tal's v4 [1].

Changes since v4:
  - Added patches to replace uses of pcie_get_minimum_link() in bnx2x,
bnxt_en, cxgb4, fm10k, and ixgbe.  Note that this is a user-visible
change to the log messages, and in some cases changes dev_warn() to
dev_info().  I hope we can converge on something that works for
everybody, and it's OK if we need to tweak the text and/or level used
in pcie_print_link_status() to get there.

  - Rebased on top of Jay Fang's patch that adds 16 GT/s decoding support.

  - Changed pcie_get_speed_cap() and pcie_get_width_cap() to return the
values directly instead of returning both an error code and the value
via a reference parameter.  I don't think the callers can really use
both the error and the value.

  - Moved some declarations from linux/pci.h to drivers/pci/pci.h so
they're not visible outside the PCI subsystem.  Also removed
corresponding EXPORT_SYMBOL()s.  If we need these outside the PCI core,
we can export them again, but that's not needed yet.

  - Reworked pcie_bandwidth_available() so it finds the uppermost limiting
device and returns width/speed info for that device (previously it
could return width from one device and speed from a different one).

The incremental diff between the v4 series (based on v4.17-rc1) and this v5
series (based on v4.17-rc1 + Jay Fang's patch) is attached.  This diff
doesn't include the new patches to bnx2x, bnxt_en, cxgb4, fm10k, and ixgbe.

I don't have any of this hardware, so this is only compile-tested.

Bjorn


[1] 
https://lkml.kernel.org/r/1522394086-3555-1-git-send-email-ta...@mellanox.com

---

Bjorn Helgaas (6):
  bnx2x: Report PCIe link properties with pcie_print_link_status()
  bnxt_en: Report PCIe link properties with pcie_print_link_status()
  cxgb4: Report PCIe link properties with pcie_print_link_status()
  fm10k: Report PCIe link properties with pcie_print_link_status()
  ixgbe: Report PCIe link properties with pcie_print_link_status()
  PCI: Remove unused pcie_get_minimum_link()

Tal Gilboa (8):
  PCI: Add pcie_get_speed_cap() to find max supported link speed
  PCI: Add pcie_get_width_cap() to find max supported link width
  PCI: Add pcie_bandwidth_capable() to compute max supported link bandwidth
  PCI: Add pcie_bandwidth_available() to compute bandwidth available to 
device
  PCI: Add pcie_print_link_status() to log link speed and whether it's 
limited
  net/mlx4_core: Report PCIe link properties with pcie_print_link_status()
  net/mlx5: Report PCIe link properties with pcie_print_link_status()
  net/mlx5e: Use pcie_bandwidth_available() to compute bandwidth


 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c  |   23 +--
 drivers/net/ethernet/broadcom/bnxt/bnxt.c |   19 --
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c   |   75 -
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c  |   87 ---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   47 --
 drivers/net/ethernet/mellanox/mlx4/main.c |   81 --
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |   32 
 drivers/net/ethernet/mellanox/mlx5/core/main.c|4 +
 drivers/pci/pci-sysfs.c   |   38 +
 drivers/pci/pci.c |  167 ++---
 drivers/pci/pci.h |   20 +++
 include/linux/pci.h   |6 +
 12 files changed, 189 insertions(+), 410 deletions(-)



diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 1bbd6cd20213..93291ec4a3d1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3864,25 +3864,6 @@ void mlx5e_build_default_indir_rqt(u32 *indirection_rqt, 
int len,
indirection_rqt[i] = i % num_channels;
 }
 
-static int mlx5e_get_pci_bw(struct mlx5_core_dev *mdev, u32 *pci_bw)
-{
-   enum pcie_link_width width;
-   enum pci_bus_speed speed;
-   int err = 0;
-   int bw;
-
-   err = pcie_bandwidth_available(mdev->pdev, &speed, &width, &bw, NULL);
-   if (err)
-   return err;
-
-   if (speed == PCI_SPEED_UNKNOWN || width == PCIE_LNK_WIDTH_UNKNOWN)
-   return -EINVAL;
-
-   *pci_bw = bw;
-
-   return 0;
-}
-
 static bool cqe_compress_heuristic(u32 link_speed, u32 pci_bw)
 {
return (link_speed && pci_bw &&
@@ -3968,7 +3949,7 @@ void mlx5e_build_nic_params(struct mlx5_core_dev *mdev,
params->num_tc   = 1;
 
mlx5e_get_max_linkspeed(mdev, &link_speed);
-   mlx5e_get_pci_bw(mdev, &pci_bw);
+   pci_bw = pcie_bandwidth_available(mdev->pdev, NULL, NULL, NULL);

Re: [PATCH v3] PCI: Reprogram bridge prefetch registers on resume

2018-09-27 Thread Bjorn Helgaas
[+cc LKML]

On Tue, Sep 18, 2018 at 04:32:44PM -0500, Bjorn Helgaas wrote:
> On Thu, Sep 13, 2018 at 11:37:45AM +0800, Daniel Drake wrote:
> > On 38+ Intel-based Asus products, the nvidia GPU becomes unusable
> > after S3 suspend/resume. The affected products include multiple
> > generations of nvidia GPUs and Intel SoCs. After resume, nouveau logs
> > many errors such as:
> > 
> > fifo: fault 00 [READ] at 00555000 engine 00 [GR] client 04
> >   [HUB/FE] reason 4a [] on channel -1 [007fa91000 unknown]
> > DRM: failed to idle channel 0 [DRM]
> > 
> > Similarly, the nvidia proprietary driver also fails after resume
> > (black screen, 100% CPU usage in Xorg process). We shipped a sample
> > to Nvidia for diagnosis, and their response indicated that it's a
> > problem with the parent PCI bridge (on the Intel SoC), not the GPU.
> > 
> > Runtime suspend/resume works fine, only S3 suspend is affected.
> > 
> > We found a workaround: on resume, rewrite the Intel PCI bridge
> > 'Prefetchable Base Upper 32 Bits' register (PCI_PREF_BASE_UPPER32). In
> > the cases that I checked, this register has value 0 and we just have to
> > rewrite that value.
> > 
> > Linux already saves and restores PCI config space during suspend/resume,
> > but this register was being skipped because upon resume, it already
> > has value 0 (the correct, pre-suspend value).
> > 
> > Intel appear to have previously acknowledged this behaviour and the
> > requirement to rewrite this register.
> > https://bugzilla.kernel.org/show_bug.cgi?id=116851#c23
> > 
> > Based on that, rewrite the prefetch register values even when that
> > appears unnecessary.
> > 
> > We have confirmed this solution on all the affected models we have
> > in-hands (X542UQ, UX533FD, X530UN, V272UN).
> > 
> > Additionally, this solves an issue where r8169 MSI-X interrupts were
> > broken after S3 suspend/resume on Asus X441UAR. This issue was recently
> > worked around in commit 7bb05b85bc2d ("r8169: don't use MSI-X on
> > RTL8106e"). It also fixes the same issue on RTL6186evl/8111evl on an
> > Aimfor-tech laptop that we had not yet patched. I suspect it will also
> > fix the issue that was worked around in commit 7c53a722459c ("r8169:
> > don't use MSI-X on RTL8168g").
> > 
> > Thomas Martitz reports that this change also solves an issue where
> > the AMD Radeon Polaris 10 GPU on the HP Zbook 14u G5 is unresponsive
> > after S3 suspend/resume.
> > 
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=201069
> > Signed-off-by: Daniel Drake 
> 
> Applied with Rafael's and Peter's reviewed-by to pci/enumeration for v4.20.
> Thanks for the huge investigative effort!

Since this looks low-risk and fixes several painful issues, I think
this merits a stable tag and being included in v4.19 (instead of
waiting for v4.20).  

I moved it to for-linus for v4.19.  Let me know if you object.

> > ---
> >  drivers/pci/pci.c | 25 +
> >  1 file changed, 17 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 29ff9619b5fa..5d58220b6997 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -1289,12 +1289,12 @@ int pci_save_state(struct pci_dev *dev)
> >  EXPORT_SYMBOL(pci_save_state);
> >  
> >  static void pci_restore_config_dword(struct pci_dev *pdev, int offset,
> > -u32 saved_val, int retry)
> > +u32 saved_val, int retry, bool force)
> >  {
> > u32 val;
> >  
> > pci_read_config_dword(pdev, offset, &val);
> > -   if (val == saved_val)
> > +   if (!force && val == saved_val)
> > return;
> >  
> > for (;;) {
> > @@ -1313,25 +1313,34 @@ static void pci_restore_config_dword(struct pci_dev 
> > *pdev, int offset,
> >  }
> >  
> >  static void pci_restore_config_space_range(struct pci_dev *pdev,
> > -  int start, int end, int retry)
> > +  int start, int end, int retry,
> > +  bool force)
> >  {
> > int index;
> >  
> > for (index = end; index >= start; index--)
> > pci_restore_config_dword(pdev, 4 * index,
> >  pdev->saved_config_space[index],
> > -retry);
> > +retry, force);
> >  }
> >  
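
The gist (sketched from the description above; the exact ranges in the final
patch may differ) is that for type 1 (bridge) headers the prefetchable window
registers are force-restored even when the values read back unchanged:

	/* Dwords 9-11 of a bridge header hold the prefetchable window,
	 * including PCI_PREF_BASE_UPPER32; always rewrite them on resume */
	pci_restore_config_space_range(pdev, 9, 11, 0, true);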
