Re: [PATCH net-next v2 5/5] net-next: dsa: add dsa support for Mediatek MT7530 switch

2017-03-27 Thread Sean Wang
Hi Florian,

Thanks for taking the time to review. My comments are inline.

On Wed, 2017-03-22 at 11:39 -0700, Florian Fainelli wrote:
> On 03/21/2017 02:35 AM, sean.w...@mediatek.com wrote:
> > From: Sean Wang 
> > 
> > MT7530 is a 7-port Gigabit Ethernet switch found on Mediatek router
> > platforms such as MT7623A or MT7623N. The switch includes a 7-port
> > Gigabit Ethernet MAC and a 5-port Gigabit Ethernet PHY. Ports 0 to 4
> > are the user ports connecting to remote devices, while ports 5 and 6
> > are the CPU ports connecting to the Mediatek Ethernet GMAC.
> > 
> > Port 6 can communicate with the CPU via the Mediatek Ethernet GMAC
> > through either TRGMII or RGMII, selected by phy-mode in the
> > dt-bindings. Port 5 supports only RGMII. Currently, only port 6 is
> > supported in this DSA driver.
> > 
> > The driver was written with reference to qca8k and other existing DSA
> > drivers. Most of the essential DSA callbacks are already supported,
> > including tag insertion for distinguishing user ports, port control,
> > bridge offloading, STP setup and ethtool operations, which allows DSA
> > to model each user port as a standalone netdevice, as the other DSA
> > drivers do.
> 
> Overall, this looks pretty nice and clean, a few comments below
> 
> > 
> > Signed-off-by: Sean Wang 
> > Signed-off-by: Landen Chao 
> > ---
> 
> > +static void
> > +mt7530_fdb_read(struct mt7530_priv *priv, struct mt7530_fdb *fdb)
> > +{
> > +   u32 reg[3];
> > +   int i;
> > +
> > +   /* Read from ARL table into an array */
> > +   for (i = 0; i < 3; i++) {
> > +   reg[i] = mt7530_read(priv, MT7530_TSRA1 + (i * 4));
> > +
> > +   dev_dbg(priv->dev, "%s(%d) reg[%d]=0x%x\n",
> > +   __func__, __LINE__, i, reg[i]);
> > +   }
> > +
> > +   /* vid - 11:0 on reg[1] */
> > +   fdb->vid = (reg[1] >> 0) & 0xfff;
> > +   /* aging - 31:24 on reg[2] */
> > +   fdb->aging = (reg[2] >> 24) & 0xff;
> > +   /* portmask - 11:4 on reg[2] */
> > +   fdb->port_mask = (reg[2] >> 4) & 0xff;
> > +   /* mac - 31:0 on reg[0] and 31:16 on reg[1] */
> > +   fdb->mac[0] = (reg[0] >> 24) & 0xff;
> > +   fdb->mac[1] = (reg[0] >> 16) & 0xff;
> > +   fdb->mac[2] = (reg[0] >>  8) & 0xff;
> > +   fdb->mac[3] = (reg[0] >>  0) & 0xff;
> > +   fdb->mac[4] = (reg[1] >> 24) & 0xff;
> > +   fdb->mac[5] = (reg[1] >> 16) & 0xff;
> > +   /* noarp - 3:2 on reg[2] */
> > +   fdb->noarp = ((reg[2] >> 2) & 0x3) == STATIC_ENT;
> 
> Could you add some definitions for the bits and masks that you are
> shifting here?
> 

Okay, I'll turn these into proper macros for readability.

> > +}
> > +
> > +static void
> > +mt7530_fdb_write(struct mt7530_priv *priv, u16 vid,
> > +u8 port_mask, const u8 *mac,
> > +u8 aging, u8 type)
> > +{
> > +   u32 reg[3] = { 0 };
> > +   int i;
> > +
> > +   /* vid - 11:0 on reg[1] */
> > +   reg[1] |= (vid & 0xfff) << 0;
> > +   /* aging - 31:24 on reg[2] */
> > +   reg[2] |= (aging & 0xff) << 24;
> > +   /* portmask - 11:4 on reg[2] */
> > +   reg[2] |= (port_mask & 0xff) << 4;
> > +   /* type - 3 indicate that entry is static wouldn't
> > +* be aged out and 0 specified as erasing an entry
> > +*/
> > +   reg[2] |= (type & 0x3) << 2;
> > +   /* mac - 31:0 on reg[0] and 31:16 on reg[1] */
> > +   reg[1] |= mac[5] << 16;
> > +   reg[1] |= mac[4] << 24;
> > +   reg[0] |= mac[3] << 0;
> > +   reg[0] |= mac[2] << 8;
> > +   reg[0] |= mac[1] << 16;
> > +   reg[0] |= mac[0] << 24;
> > +
> > +   /* Write array into the ARL table */
> > +   for (i = 0; i < 3; i++)
> > +   mt7530_write(priv, MT7530_ATA1 + (i * 4), reg[i]);
> > +}
> 
> Same here.
> 

As above. I will improve them.


> > +
> > +static int
> > +mt7530_pad_clk_setup(struct dsa_switch *ds, int mode)
> > +{
> > +   struct mt7530_priv *priv = ds->priv;
> > +   u32 ncpo1, ssc_delta, trgint, i;
> > +
> > +   switch (mode) {
> > +   case PHY_INTERFACE_MODE_RGMII:
> > +   trgint = 0;
> > +   ncpo1 = 0x0c80;
> > +   ssc_delta = 0x87;
> > +   break;
> > +   case PHY_INTERFACE_MODE_TRGMII:
> > +   trgint = 1;
> > +   ncpo1 = 0x1400;
> > +   ssc_delta = 0x57;
> > +   break;
> > +   default:
> > +   pr_err("xMII mode %d not supported\n", mode);
> > +   return -EINVAL;
> > +   }
> 
> You may be able to move this to an adjust_link callback that the PHY
> library would call when the PHY gets setup and the port is finally used,
> as opposed to doing this upfront during driver initialization.
> 


Good point. I will follow up.


> 
> > +mt7530_setup(struct dsa_switch *ds)
> > +{
> > +   struct mt7530_priv *priv = ds->priv;
> > +   int ret, i, phy_mode;
> > +   u8  cpup_mask = 0;
> > +   u32 id, val;
> > +   struct regmap 

Re: [PATCH 1/3] soc: qcom: smd: Transition client drivers from smd to rpmsg

2017-03-27 Thread Bjorn Andersson
On Mon 27 Mar 16:04 PDT 2017, David Miller wrote:

> From: Bjorn Andersson 
> Date: Mon, 27 Mar 2017 15:58:37 -0700
> 
> > I'm sorry, but I can't figure out how to reproduce this.
> 
> All of my builds are "make allmodconfig" so it should be easy to reproduce.

Thanks, it turns out that while it was possible to select CONFIG_SMD_RPM
and CONFIG_QCOM_WCNSS_CTRL, drivers/soc/Makefile does not traverse into
qcom/ unless CONFIG_ARCH_QCOM is set.

So I just sent out version 2 of the three patches, where I add an
explicit dependency on ARCH_QCOM for those two options.

Regards,
Bjorn


Re: [PATCH] Make EN2 pin optional in the TRF7970A driver

2017-03-27 Thread Heiko Schocher

Hello all,

On 21.02.2017 at 17:43, Rob Herring wrote:

On Sun, Feb 19, 2017 at 11:19 PM, Heiko Schocher  wrote:

Hello all,

On 13.02.2017 at 22:31, Rob Herring wrote:


On Mon, Feb 13, 2017 at 12:38 AM, Heiko Schocher  wrote:


Hello Rob,


On 10.02.2017 at 16:51, Rob Herring wrote:



On Tue, Feb 07, 2017 at 06:22:04AM +0100, Heiko Schocher wrote:



From: Guan Ben 

Make the EN2 pin optional. This is useful for boards
that have this pin hard-wired, for example to ground.

Signed-off-by: Guan Ben 
Signed-off-by: Mark Jonas 
Signed-off-by: Heiko Schocher 

---

.../devicetree/bindings/net/nfc/trf7970a.txt   |  4 ++--
drivers/nfc/trf7970a.c | 26
--
2 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
index 32b35a0..5889a3d 100644
--- a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
+++ b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
@@ -5,8 +5,8 @@ Required properties:
- spi-max-frequency: Maximum SPI frequency (<= 200).
- interrupt-parent: phandle of parent interrupt handler.
- interrupts: A single interrupt specifier.
-- ti,enable-gpios: Two GPIO entries used for 'EN' and 'EN2' pins on the
-  TRF7970A.
+- ti,enable-gpios: One or two GPIO entries used for 'EN' and 'EN2' pins on the
+  TRF7970A. EN2 is optional.




Could EN ever be optional/fixed? If so, perhaps deprecate this property
and do 2 properties, one for each pin.




The hardware I have has the EN2 pin hard-wired to ground. Looking
at http://www.ti.com/lit/ds/slos743k/slos743k.pdf page 19, tables 6-3
and 6-4, the EN2 pin is a don't care if EN = 1. If EN = 0, the EN2 pin
selects between Power Down and Sleep Mode ... I see no reason why
this is not possible/allowed ...

Hmm.. I do not like the idea of deprecating the "ti,enable-gpios"
property in favor of 2 separate properties ... but if this would be a
reason for not accepting this patch, I can do this ... How should I name
the 2 new properties?



I guess if this ever happens, then we just add "ti,enable2-gpios" and
ti,enable-gpios continues to point to EN. We don't need to deprecate
anything (or maybe just deprecate having both GPIOs on single
property).

In that case,

Acked-by: Rob Herring 



gentle ping.

Are there any more comments to this patch? Is it acceptable as it
is?


I acked it, so yes, it is fine.


Gentle ping. Any more issues or can this patch go into mainline?

bye,
Heiko
--
DENX Software Engineering GmbH,  Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany


[PATCH v2 1/3] soc: qcom: smd: Transition client drivers from smd to rpmsg

2017-03-27 Thread Bjorn Andersson
By moving these client drivers to use RPMSG instead of the direct SMD
API we can reuse them on top of the newly added GLINK wire-protocol
support found in the 820 and 835 Qualcomm platforms.

As the new (RPMSG-based) and old SMD implementations are mutually
exclusive we have to change all client drivers in one commit, to make
sure we have a working system before and after this transition.

Acked-by: Andy Gross 
Acked-by: Kalle Valo 
Acked-by: Marcel Holtmann 
Signed-off-by: Bjorn Andersson 
---

Changes since v1:
- Add dependency on ARCH_QCOM for soc config options, to match the fact that
  drivers/soc/Makefile only enters qcom/ iff ARCH_QCOM is set.

 drivers/bluetooth/Kconfig  |  2 +-
 drivers/bluetooth/btqcomsmd.c  | 32 +--
 drivers/net/wireless/ath/wcn36xx/Kconfig   |  2 +-
 drivers/net/wireless/ath/wcn36xx/main.c|  6 ++--
 drivers/net/wireless/ath/wcn36xx/smd.c | 10 +++---
 drivers/net/wireless/ath/wcn36xx/smd.h |  6 ++--
 drivers/net/wireless/ath/wcn36xx/wcn36xx.h |  2 +-
 drivers/soc/qcom/Kconfig   |  6 ++--
 drivers/soc/qcom/smd-rpm.c | 43 +
 drivers/soc/qcom/wcnss_ctrl.c  | 50 +-
 include/linux/soc/qcom/wcnss_ctrl.h| 11 ---
 net/qrtr/Kconfig   |  2 +-
 net/qrtr/smd.c | 42 -
 13 files changed, 110 insertions(+), 104 deletions(-)

diff --git a/drivers/bluetooth/Kconfig b/drivers/bluetooth/Kconfig
index 08e054507d0b..a6a9dd4d0eef 100644
--- a/drivers/bluetooth/Kconfig
+++ b/drivers/bluetooth/Kconfig
@@ -344,7 +344,7 @@ config BT_WILINK
 
 config BT_QCOMSMD
tristate "Qualcomm SMD based HCI support"
-   depends on QCOM_SMD || (COMPILE_TEST && QCOM_SMD=n)
+   depends on RPMSG || (COMPILE_TEST && RPMSG=n)
depends on QCOM_WCNSS_CTRL || (COMPILE_TEST && QCOM_WCNSS_CTRL=n)
select BT_QCA
help
diff --git a/drivers/bluetooth/btqcomsmd.c b/drivers/bluetooth/btqcomsmd.c
index 8d4868af9bbd..ef730c173d4b 100644
--- a/drivers/bluetooth/btqcomsmd.c
+++ b/drivers/bluetooth/btqcomsmd.c
@@ -14,7 +14,7 @@
 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 
@@ -26,8 +26,8 @@
 struct btqcomsmd {
struct hci_dev *hdev;
 
-   struct qcom_smd_channel *acl_channel;
-   struct qcom_smd_channel *cmd_channel;
+   struct rpmsg_endpoint *acl_channel;
+   struct rpmsg_endpoint *cmd_channel;
 };
 
 static int btqcomsmd_recv(struct hci_dev *hdev, unsigned int type,
@@ -48,19 +48,19 @@ static int btqcomsmd_recv(struct hci_dev *hdev, unsigned int type,
return hci_recv_frame(hdev, skb);
 }
 
-static int btqcomsmd_acl_callback(struct qcom_smd_channel *channel,
- const void *data, size_t count)
+static int btqcomsmd_acl_callback(struct rpmsg_device *rpdev, void *data,
+ int count, void *priv, u32 addr)
 {
-   struct btqcomsmd *btq = qcom_smd_get_drvdata(channel);
+   struct btqcomsmd *btq = priv;
 
btq->hdev->stat.byte_rx += count;
return btqcomsmd_recv(btq->hdev, HCI_ACLDATA_PKT, data, count);
 }
 
-static int btqcomsmd_cmd_callback(struct qcom_smd_channel *channel,
- const void *data, size_t count)
+static int btqcomsmd_cmd_callback(struct rpmsg_device *rpdev, void *data,
+ int count, void *priv, u32 addr)
 {
-   struct btqcomsmd *btq = qcom_smd_get_drvdata(channel);
+   struct btqcomsmd *btq = priv;
 
return btqcomsmd_recv(btq->hdev, HCI_EVENT_PKT, data, count);
 }
@@ -72,12 +72,12 @@ static int btqcomsmd_send(struct hci_dev *hdev, struct sk_buff *skb)
 
switch (hci_skb_pkt_type(skb)) {
case HCI_ACLDATA_PKT:
-   ret = qcom_smd_send(btq->acl_channel, skb->data, skb->len);
+   ret = rpmsg_send(btq->acl_channel, skb->data, skb->len);
hdev->stat.acl_tx++;
hdev->stat.byte_tx += skb->len;
break;
case HCI_COMMAND_PKT:
-   ret = qcom_smd_send(btq->cmd_channel, skb->data, skb->len);
+   ret = rpmsg_send(btq->cmd_channel, skb->data, skb->len);
hdev->stat.cmd_tx++;
break;
default:
@@ -114,18 +114,15 @@ static int btqcomsmd_probe(struct platform_device *pdev)
wcnss = dev_get_drvdata(pdev->dev.parent);
 
btq->acl_channel = qcom_wcnss_open_channel(wcnss, "APPS_RIVA_BT_ACL",
-  btqcomsmd_acl_callback);
+  btqcomsmd_acl_callback, btq);
if (IS_ERR(btq->acl_channel))
return PTR_ERR(btq->acl_channel);
 
btq->cmd_channel = qcom_wcnss_open_channel(wcnss, "APPS_RIVA_BT_CMD",
-  

[PATCH v2 3/3] soc: qcom: smd-rpm: Add msm8996 compatibility

2017-03-27 Thread Bjorn Andersson
With the RPM driver transitioned to RPMSG we can reuse the SMD-RPM
driver on top of GLINK for 8996, without any modifications.

Acked-by: Andy Gross 
Signed-off-by: Bjorn Andersson 
---
 drivers/soc/qcom/smd-rpm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/soc/qcom/smd-rpm.c b/drivers/soc/qcom/smd-rpm.c
index 0dcf1bf33126..c2346752b3ea 100644
--- a/drivers/soc/qcom/smd-rpm.c
+++ b/drivers/soc/qcom/smd-rpm.c
@@ -225,6 +225,7 @@ static const struct of_device_id qcom_smd_rpm_of_match[] = {
{ .compatible = "qcom,rpm-apq8084" },
{ .compatible = "qcom,rpm-msm8916" },
{ .compatible = "qcom,rpm-msm8974" },
+   { .compatible = "qcom,rpm-msm8996" },
{}
 };
 MODULE_DEVICE_TABLE(of, qcom_smd_rpm_of_match);
-- 
2.12.0



[PATCH v2 2/3] soc: qcom: smd: Remove standalone driver

2017-03-27 Thread Bjorn Andersson
Remove the standalone SMD implementation as we have transitioned the
client drivers to use the RPMSG based one.

Also remove all dependencies on QCOM_SMD from Kconfig files, in order to
keep them selectable in the absence of the removed symbol.

Acked-by: Andy Gross 
Signed-off-by: Bjorn Andersson 
---
 drivers/remoteproc/Kconfig |6 +-
 drivers/rpmsg/Kconfig  |1 -
 drivers/soc/qcom/Kconfig   |8 -
 drivers/soc/qcom/Makefile  |1 -
 drivers/soc/qcom/smd.c | 1560 
 include/linux/rpmsg/qcom_smd.h |2 +-
 include/linux/soc/qcom/smd.h   |  139 
 7 files changed, 4 insertions(+), 1713 deletions(-)
 delete mode 100644 drivers/soc/qcom/smd.c
 delete mode 100644 include/linux/soc/qcom/smd.h

diff --git a/drivers/remoteproc/Kconfig b/drivers/remoteproc/Kconfig
index 1dc43fc5f65f..faad69a1a597 100644
--- a/drivers/remoteproc/Kconfig
+++ b/drivers/remoteproc/Kconfig
@@ -76,7 +76,7 @@ config QCOM_ADSP_PIL
depends on OF && ARCH_QCOM
depends on REMOTEPROC
depends on QCOM_SMEM
-   depends on RPMSG_QCOM_SMD || QCOM_SMD || (COMPILE_TEST && QCOM_SMD=n && RPMSG_QCOM_SMD=n)
+   depends on RPMSG_QCOM_SMD || (COMPILE_TEST && RPMSG_QCOM_SMD=n)
select MFD_SYSCON
select QCOM_MDT_LOADER
select QCOM_RPROC_COMMON
@@ -93,7 +93,7 @@ config QCOM_Q6V5_PIL
depends on OF && ARCH_QCOM
depends on QCOM_SMEM
depends on REMOTEPROC
-   depends on RPMSG_QCOM_SMD || QCOM_SMD || (COMPILE_TEST && QCOM_SMD=n && RPMSG_QCOM_SMD=n)
+   depends on RPMSG_QCOM_SMD || (COMPILE_TEST && RPMSG_QCOM_SMD=n)
select MFD_SYSCON
select QCOM_RPROC_COMMON
select QCOM_SCM
@@ -104,7 +104,7 @@ config QCOM_Q6V5_PIL
 config QCOM_WCNSS_PIL
tristate "Qualcomm WCNSS Peripheral Image Loader"
depends on OF && ARCH_QCOM
-   depends on RPMSG_QCOM_SMD || QCOM_SMD || (COMPILE_TEST && QCOM_SMD=n && RPMSG_QCOM_SMD=n)
+   depends on RPMSG_QCOM_SMD || (COMPILE_TEST && RPMSG_QCOM_SMD=n)
depends on QCOM_SMEM
depends on REMOTEPROC
select QCOM_MDT_LOADER
diff --git a/drivers/rpmsg/Kconfig b/drivers/rpmsg/Kconfig
index f12ac0b28263..edc008f55663 100644
--- a/drivers/rpmsg/Kconfig
+++ b/drivers/rpmsg/Kconfig
@@ -16,7 +16,6 @@ config RPMSG_CHAR
 config RPMSG_QCOM_SMD
tristate "Qualcomm Shared Memory Driver (SMD)"
depends on QCOM_SMEM
-   depends on QCOM_SMD=n
select RPMSG
help
  Say y here to enable support for the Qualcomm Shared Memory Driver
diff --git a/drivers/soc/qcom/Kconfig b/drivers/soc/qcom/Kconfig
index 4e090c697eb6..9fca977ef18d 100644
--- a/drivers/soc/qcom/Kconfig
+++ b/drivers/soc/qcom/Kconfig
@@ -33,14 +33,6 @@ config QCOM_SMEM
  The driver provides an interface to items in a heap shared among all
  processors in a Qualcomm platform.
 
-config QCOM_SMD
-   tristate "Qualcomm Shared Memory Driver (SMD)"
-   depends on QCOM_SMEM
-   help
- Say y here to enable support for the Qualcomm Shared Memory Driver
- providing communication channels to remote processors in Qualcomm
- platforms.
-
 config QCOM_SMD_RPM
tristate "Qualcomm Resource Power Manager (RPM) over SMD"
depends on ARCH_QCOM
diff --git a/drivers/soc/qcom/Makefile b/drivers/soc/qcom/Makefile
index 1f30260b06b8..414f0de274fa 100644
--- a/drivers/soc/qcom/Makefile
+++ b/drivers/soc/qcom/Makefile
@@ -1,7 +1,6 @@
 obj-$(CONFIG_QCOM_GSBI)+=  qcom_gsbi.o
 obj-$(CONFIG_QCOM_MDT_LOADER)  += mdt_loader.o
 obj-$(CONFIG_QCOM_PM)  +=  spm.o
-obj-$(CONFIG_QCOM_SMD) +=  smd.o
 obj-$(CONFIG_QCOM_SMD_RPM) += smd-rpm.o
 obj-$(CONFIG_QCOM_SMEM) += smem.o
 obj-$(CONFIG_QCOM_SMEM_STATE) += smem_state.o
diff --git a/drivers/soc/qcom/smd.c b/drivers/soc/qcom/smd.c
deleted file mode 100644
index 322034ab9d37..
--- a/drivers/soc/qcom/smd.c
+++ /dev/null
@@ -1,1560 +0,0 @@
-/*
- * Copyright (c) 2015, Sony Mobile Communications AB.
- * Copyright (c) 2012-2013, The Linux Foundation. All rights reserved.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 and
- * only version 2 as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- */
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-/*
- * The Qualcomm Shared Memory communication solution provides point-to-point
- * channels for clients to send and receive streaming or packet based data.
- *
- * Each channel consists of a 

Re: genphy_read_status() vs. 1000bT Pause capability

2017-03-27 Thread Benjamin Herrenschmidt
On Tue, 2017-03-28 at 16:17 +1100, Benjamin Herrenschmidt wrote:
> Hence my question ... how "standard" is the re-use of the LPA bits
> for these alternate meanings in 1000bT and should we update genphy
> to perform that decoding ?

And btw, I'm happy to provide patches if we agree on the approach :-)

Cheers,
Ben.




Re: genphy_read_status() vs. 1000bT Pause capability

2017-03-27 Thread Benjamin Herrenschmidt
On Mon, 2017-03-27 at 21:14 -0700, Florian Fainelli wrote:
> > Doesn't fix my other problem with Pause in 1000bT land. Do you know if
> > that way of reflecting the pause capability by hijacking the old
> > LPA bits is widely implemented enough that we should put it in
> > genphy_read_status() ?
> 
> Not sure I follow you here? The link partner pause capability is
> reflected in phydev->pause and phydev->asym_pause (yes, these are
> terrible names) when the link is established. 

Right. The problem is that they aren't for some gigabit links :-)

Basically, in my setup with a PHY which uses genphy_read_status()
(like most of them), I never get those advertised despite the fact
that I *think* they are supported by the other end (even after fixing
my side of the advertising).

I added a printk inside genphy_read_status() to inspect the result
of the negotiation, and this is what I read:

lpa=c1e1 lpagb=3800 adv=de1 common_adv=1e1 common_adv_gb=800

As you can see, LPA doesn't have the Pause bits. *However* it does
have bit 0x80 which can mean ADVERTISE_100HALF, but according to
our own mii.h can also mean ADVERTISE_1000XPAUSE. Similarly it
has bit 0x100 which can mean ADVERTISE_100FULL but also can mean
ADVERTISE_1000XPSE_ASYM.

In fact we appear to have two functions to interpret them as
such in the non-uapi mii.h:

ethtool_adv_to_mii_adv_x
mii_adv_to_ethtool_adv_x

However they aren't used much in the tree and not at all by the
"genphy" code.

So my question is... when we observe that we have a 1000 link
established, should we use these to "interpret" the LPA bits as
above ?

As it is, we never seem to advertise the capability because we never
decode the above (tried with 2 different PHYs, a Realtek and a
Broadcom) while my Cisco switches, I think, do support Pause.

Hence my question ... how "standard" is the re-use of the LPA bits
for these alternate meanings in 1000bT and should we update genphy
to perform that decoding ?

(I'm trying to download the 802.3 document referenced in the phy.txt
to see if it says anything about it but it's taking forever for some
reason).

Cheers,
Ben.



Re: EINVAL when using connect() for udp sockets

2017-03-27 Thread Cong Wang
On Fri, Mar 24, 2017 at 4:19 PM, Eric Dumazet  wrote:
> On Fri, 2017-03-24 at 15:34 -0700, Cong Wang wrote:
>> (Cc'ing Michael Kerrisk)
>>
>> On Wed, Mar 22, 2017 at 10:18 PM, Eric Dumazet wrote:
>> > On Thu, 2017-03-23 at 13:22 +1100, Daurnimator wrote:
>> >> On 9 March 2017 at 14:10, Daurnimator  wrote:
>> >> > When debugging https://github.com/daurnimator/lua-http/issues/73 which
>> >> > uses https://github.com/wahern/dns we ran into an issue where modern
>> >> > linux kernels return EINVAL if you try and re-use a udp socket.
>> >> > The issue seems to occur if you go from a local destination ip to a
>> >> > non-local one.
>> >>
>> >> Did anyone get a chance to look into this issue?
>> >
>> > I believe man page is not complete.
>> >
>> > A disconnect is needed before another connect()
>>
>> Is it? Making connect() reentrant is reasonable for a connection-less
>> protocol like UDP, but I haven't dug into POSIX for the details. If so
>> we need something like below...
>>
>> --- a/net/ipv4/datagram.c
>> +++ b/net/ipv4/datagram.c
>> @@ -40,7 +40,7 @@ int __ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len
>> sk_dst_reset(sk);
>>
>> oif = sk->sk_bound_dev_if;
>> -   saddr = inet->inet_saddr;
>> +   saddr = inet->inet_saddr = 0;
>> if (ipv4_is_multicast(usin->sin_addr.s_addr)) {
>> if (!oif)
>> oif = inet->mc_index;
>
> Won't this break bind()?
>

Right. We need to distinguish bind() and connect(), something
like below?

--- a/net/ipv4/datagram.c
+++ b/net/ipv4/datagram.c
@@ -26,7 +26,7 @@ int __ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len
struct sockaddr_in *usin = (struct sockaddr_in *) uaddr;
struct flowi4 *fl4;
struct rtable *rt;
-   __be32 saddr;
+   __be32 saddr = 0;
int oif;
int err;

@@ -40,7 +40,8 @@ int __ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len
sk_dst_reset(sk);

oif = sk->sk_bound_dev_if;
-   saddr = inet->inet_saddr;
+   if (sk->sk_userlocks & SOCK_BINDADDR_LOCK)
+   saddr = inet->inet_saddr;
if (ipv4_is_multicast(usin->sin_addr.s_addr)) {
if (!oif)
oif = inet->mc_index;
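
For userspace, the workaround Eric mentions earlier in the thread (disconnect before the next connect()) can be sketched as below. On Linux, connect(2) documents that a connectionless socket dissolves its association when connected to an address whose sa_family is AF_UNSPEC. The helper names are hypothetical, and this only illustrates the workaround; it says nothing about what the kernel should do.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Dissolve the current UDP association by connecting to AF_UNSPEC. */
static int udp_disconnect(int fd)
{
	struct sockaddr sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_family = AF_UNSPEC;
	return connect(fd, &sa, sizeof(sa));
}

/*
 * Point a UDP socket at a new IPv4 peer, disconnecting first.
 * Returns 0 on success, -1 on a bad address or a failed syscall.
 */
static int udp_reconnect(int fd, const char *ip, uint16_t port)
{
	struct sockaddr_in sin;

	assert(ip != NULL);

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_port = htons(port);
	if (inet_pton(AF_INET, ip, &sin.sin_addr) != 1)
		return -1;

	if (udp_disconnect(fd) < 0)
		return -1;
	return connect(fd, (struct sockaddr *)&sin, sizeof(sin));
}
```

A resolver that reuses one socket for several servers would call udp_reconnect() instead of a bare connect() when switching destinations.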


Re: [pull request][net-next 00/14] Mellanox mlx5e Fail-safe config

2017-03-27 Thread David Miller
From: Saeed Mahameed 
Date: Mon, 27 Mar 2017 23:48:56 +0300

> This series provides a fail-safe mechanism to allow safely re-configuring
> mlx5e netdevice and provides a resiliency against sporadic
> configuration failures.
> 
> For additional information please see below.
> 
> Please pull and let me know if there's any problem.

Looks good, pulled, thanks!


Re: [PATCHv2 net] sctp: change to save MSG_MORE flag into assoc

2017-03-27 Thread Xin Long
On Tue, Mar 28, 2017 at 6:43 AM, Marcelo Ricardo Leitner wrote:
> On Mon, Mar 27, 2017 at 12:21:15AM +0800, Xin Long wrote:
>> David Laight noticed the support for MSG_MORE with datamsg->force_delay
>> didn't really work as we expected, as the first msg with MSG_MORE set
>> would always block the following chunks' dequeuing.
>>
>> This patch rewrites it by saving the MSG_MORE flag into assoc as
>> David Laight suggested.
>>
>> asoc->force_delay is used to save MSG_MORE flag before a msg is sent.
>> All chunks in queue would not be sent out if asoc->force_delay is set
>> by the msg with MSG_MORE flag, until a new msg without MSG_MORE flag
>> clears asoc->force_delay.
>>
>> Note that this change would not affect the flush generated by other
>> triggers, like asoc->state != ESTABLISHED, queue size > pmtu etc.
>>
>> v1->v2:
>>   Not clear asoc->force_delay after sending the msg with MSG_MORE flag.
>>
>> Fixes: 4ea0c32f5f42 ("sctp: add support for MSG_MORE")
>> Signed-off-by: Xin Long 
>> ---
>>  include/net/sctp/structs.h | 2 +-
>>  net/sctp/output.c  | 2 +-
>>  net/sctp/socket.c  | 2 +-
>>  3 files changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
>> index 592dece..8caa5ee 100644
>> --- a/include/net/sctp/structs.h
>> +++ b/include/net/sctp/structs.h
>> @@ -499,7 +499,6 @@ struct sctp_datamsg {
>>   /* Did the messenge fail to send? */
>>   int send_error;
>>   u8 send_failed:1,
>> -force_delay:1,
>>  can_delay;   /* should this message be Nagle delayed */
>>  };
>>
>> @@ -1878,6 +1877,7 @@ struct sctp_association {
>>
>>   __u8 need_ecne:1,   /* Need to send an ECNE Chunk? */
>>temp:1,/* Is it a temporary association? */
>> +  force_delay:1,
>>prsctp_enable:1,
>>reconf_enable:1;
>>
>> diff --git a/net/sctp/output.c b/net/sctp/output.c
>> index 1224421..73fd178 100644
>> --- a/net/sctp/output.c
>> +++ b/net/sctp/output.c
>> @@ -704,7 +704,7 @@ static sctp_xmit_t sctp_packet_can_append_data(struct sctp_packet *packet,
>>*/
>>
>>   if ((sctp_sk(asoc->base.sk)->nodelay || inflight == 0) &&
>> - !chunk->msg->force_delay)
>> + !asoc->force_delay)
>
> How is this going to not block the flush on asoc->state != ESTABLISHED?
> AFAICT b7018d0b6300 ("sctp: flush out queue once assoc state falls into
> SHUTDOWN_PENDING") need to clear asoc->force_delay too.

It won't block the flush on asoc->state != ESTABLISHED; see [1] in
sctp_packet_can_append_data:

	if ((sctp_sk(asoc->base.sk)->nodelay || inflight == 0) &&
	    !chunk->msg->force_delay)
		/* Nothing unacked */
		return SCTP_XMIT_OK;

	if (!sctp_packet_empty(packet))
		/* Append to packet */
		return SCTP_XMIT_OK;

	if (!sctp_state(asoc, ESTABLISHED)) <-[1]
		return SCTP_XMIT_OK;



>
> Case I have in mind is the same old one:
> - app send a msg with MSG_MORE
> - close the asoc, without sending the final msg
>
>>   /* Nothing unacked */
>>   return SCTP_XMIT_OK;
>>
>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
>> index 0f378ea..baa269a 100644
>> --- a/net/sctp/socket.c
>> +++ b/net/sctp/socket.c
>> @@ -1965,7 +1965,7 @@ static int sctp_sendmsg(struct sock *sk, struct msghdr *msg, size_t msg_len)
>>   err = PTR_ERR(datamsg);
>>   goto out_free;
>>   }
>> - datamsg->force_delay = !!(msg->msg_flags & MSG_MORE);
>> + asoc->force_delay = !!(msg->msg_flags & MSG_MORE);
>>
>>   /* Now send the (possibly) fragmented message. */
>>   list_for_each_entry(chunk, &datamsg->chunks, frag_list) {
>> --
>> 2.1.0
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>


Re: genphy_read_status() vs. 1000bT Pause capability

2017-03-27 Thread Florian Fainelli
Hi Ben,

On 03/27/2017 07:55 PM, Benjamin Herrenschmidt wrote:
> On Tue, 2017-03-28 at 13:28 +1100, Benjamin Herrenschmidt wrote:
>>
>>> Hi Ben
>>>
>>> It is worth reading Documentation/networking/phy.txt
>>>
>>> The MAC should set SUPPORTED_Pause and SUPPORTED_Asym_Pause if the MAC
>>> supports these features. The PHY will then negotiate them.
>>
> Haha ! The OpenBMC kernel is still at 4.7 which was still saying you
> should only clear bits in there :-) I think that's what I initially
> read.
> 
> Thanks for the pointer.
> 
> Doesn't fix my other problem with Pause in 1000bT land. Do you know if
> that way of reflecting the pause capability by hijacking the old
> LPA bits is widely implemented enough that we should put it in
> genphy_read_status() ?

Not sure I follow you here? The link partner pause capability is
reflected in phydev->pause and phydev->asym_pause (yes, these are
terrible names) when the link is established. An Ethernet driver is
still supposed to reconcile the locally advertised pause parameters
(auto negotiated, or manually configured) from ethtool_{get,set}_param
and then decide what to do in return (typically advertise or not the
support for pause frames).

The plan is eventually to provide better helper functions for PHYLIB
aware Ethernet MAC drivers such that given the local pause settings of
the driver (resolved via ethtool), advertisement and auto-(re)negotiation
work as expected, essentially providing something generic à la
tg3_set_pauseparam().

Have not gotten the time to get there yet so if you, or Russell beat me
to it, I'd happily review such patches.
-- 
Florian


Re: [PATCH next 0/5] link-status fixes for mii-monitoring

2017-03-27 Thread David Miller
From: Mahesh Bandewar 
Date: Mon, 27 Mar 2017 11:37:24 -0700

> From: Mahesh Bandewar 
> 
> The mii monitoring is divided into two phases - inspect and commit. The
> inspect phase technically should not make any changes to the state and
> should defer them to the commit phase. However, we detected link state
> inconsistencies on several machines and discovered that they result from
> inconsistent updates to link states and the assumption that you *always*
> get rtnl-mutex. In reality when trylock() fails to acquire rtnl-mutex, the
> commit phase is postponed until next mii-mon run. At the next round
> because of the state change performed in the previous inspect-run, this
> round does not detect any changes and would skip calling commit phase.
> This would result in an inconsistent state until next link event happens
> (if it ever happens).
> 
> During the commit phase, it's assumed that the speed and duplex
> fetch is always successful, but that's not always the case. However the
> slave state is marked UP irrespective of speed / duplex fetch operation.
> If the speed / duplex fetch operation results in insane values for either
> of these two fields, then keeping internal link state UP is not going to
> provide fruitful results either.
> 
> Please see the individual patches for more details.

Looks good, series applied, thanks.


Re: [PATCH] net: ipconfig: fix ic_close_devs() use-after-free

2017-03-27 Thread David Miller
From: Mark Rutland 
Date: Mon, 27 Mar 2017 18:00:14 +0100

> Our chosen ic_dev may be anywhere in our list of ic_devs, and we may
> free it before attempting to close others. When we compare d->dev and
> ic_dev->dev, we're potentially dereferencing memory returned to the
> allocator. This causes KASAN to scream for each subsequent ic_dev we
> check.
> 
> As there's a 1-1 mapping between ic_devs and netdevs, we can instead
> compare d and ic_dev directly, which implicitly handles the !ic_dev
> case, and avoids the use-after-free. The ic_dev pointer may be stale,
> but we will not dereference it.
> 
> Original splat:
 ...
> Signed-off-by: Mark Rutland 

Applied, thanks.


Re: [PATCH v2] net: moxa: fix TX overrun memory leak

2017-03-27 Thread David Miller
From: Jonas Jensen 
Date: Mon, 27 Mar 2017 14:31:19 +0200

> @@ -25,6 +25,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "moxart_ether.h"
>  
> @@ -297,6 +298,7 @@ static void moxart_tx_finished(struct net_device *ndev)
>   tx_tail = TX_NEXT(tx_tail);
>   }
>   priv->tx_tail = tx_tail;
> + netif_wake_queue(ndev);
>  }
>  
>  static irqreturn_t moxart_mac_interrupt(int irq, void *dev_id)

Doing the wakeup unconditionally is very wasteful, you just need to do it
when enough space has been made available.

Therefore the wakeup should be more like:

	if (netif_queue_stopped(ndev) &&
	    moxart_tx_queue_space(ndev) >= MOXART_TX_WAKEUP_THRESHOLD)
		netif_wake_queue(ndev);

Otherwise you're just going to flap back and forth under high load and
get almost no packet batching at all, hurting performance.




Re: [PATCH net-next 0/4] net: mpls: Allow users to configure more labels per route

2017-03-27 Thread Eric W. Biederman
David Ahern  writes:

> On 3/27/17 4:39 AM, Robert Shearman wrote:
>> On 25/03/17 19:15, Eric W. Biederman wrote:
>>> David Ahern  writes:
>>>
>>>> Bump the maximum number of labels for MPLS routes from 2 to 12. To keep
>>>> memory consumption in check the labels array is moved to the end of
>>>> mpls_nh and mpls_iptunnel_encap structs as a 0-sized array. Allocations
>>>> use the maximum number of labels across all nexthops in a route for LSR
>>>> and the number of labels configured for LWT.
>>>>
>>>> The mpls_route layout is changed to:
>>>>
>>>>    +----------------------+
>>>>    | mpls_route           |
>>>>    +----------------------+
>>>>    | mpls_nh 0            |
>>>>    +----------------------+
>>>>    | alignment padding    |   4 bytes for odd number of labels; 0 for even
>>>>    +----------------------+
>>>>    | via[rt_max_alen] 0   |
>>>>    +----------------------+
>>>>    | alignment padding    |   via's aligned on sizeof(unsigned long)
>>>>    +----------------------+
>>>>    | ...                  |
>>>>
>>>> Meaning the via follows its mpls_nh providing better locality as the
>>>> number of labels increases. UDP_RR tests with namespaces shows no impact
>>>> to a modest performance increase with this layout for 1 or 2 labels and
>>>> 1 or 2 nexthops.
>>>>
>>>> The new limit is set to 12 to cover all currently known segment
>>>> routing use cases.
>>>
>>> How does this compare with running the packet a couple of times through
>>> the mpls table to get all of the desired labels applied?
>> 
>> At the moment (i.e setting output interface for a route to the loopback
>> interface) the TTL would currently be calculated incorrectly since it'll
>> be decremented each time the packet is run through the input processing.
>> If that was avoided, then the only issue would be the lower performance.
>
> We have the infrastructure to add all the labels on one pass. It does
> not make sense to recirculate the packet to get the same effect.

I was really asking what are the advantages and disadvantages of this
change rather than suggesting it was a bad idea.  The information about
ttl is useful.

Adding that this will route packets with more labels more quickly than
the recirculation method is also useful to know.

>>> I can certainly see the case in an mpls tunnel ingress where this
>>> could be desirable.  Which is something you implement in your last
>>> patch.  However is it at all common to push lots of labels at once
>>> during routing?
>>>
>>> I am probably a bit naive but it seems absurd to push more
>>> than a handful of labels onto a packet as you are routing it.
>> 
>> From draft-ietf-spring-segment-routing-mpls-07:
>> 
>>Note that the kind of deployment of Segment Routing may affect the
>>depth of the MPLS label stack.  As every segment in the list is
>>represented by an additional MPLS label, the length of the segment
>>list directly correlates to the depth of the label stack.
>>Implementing a long path with many explicit hops as a segment list
>>may thus yield a deep label stack that would need to be pushed at the
>>head of the SR tunnel.
>> 
>>However, many use cases would need very few segments in the list.
>>This is especially true when taking good advantage of the ECMP aware
>>routing within each segment.  In fact most use cases need just one
>>additional segment and thus lead to a similar label stack depth as
>>e.g.  RSVP-based routing.
>> 
>> The summary is that when using short-path routing then the number of
>> labels needed to be pushed on will be small (2 or 3). However, if using
>> SR to implement traffic engineering through a list of explicit hops then
>> the number of labels pushed could be much greater and up to the diameter
>> of the IGP network. Traffic engineering like this is not unusual.
>
> And the thread on the FRR mailing list has other ietf references. The
> summary is that there are plenty of use cases for more labels on ingress
> (ip->mpls) and route paths (mpls->mpls). I did see one comment that 12
> may not be enough for all use cases. Why not 16 or 20?
>
> This patch set bumps the number of labels and the performance impact is
> only to users that use a high label count. Other than a temporary stack
> variable for installing the routes, no memory is allocated based on the
> limit as an array size, so we could just as easily go with 16 - a nice
> round number.

Overall I like what is being accomplished by this patchset.
I especially like the fact that the forwarding path is left
essentially unchanged, and that the struct mpls_route shrinks a little
for the common case.

I believe we should just kill MAX_NEW_LABELS.

I think the only significant change from your patch is the removal of an
array from mpls_route_config.

With the removal of MAX_NEW_LABELS I would replace it by a sanity check
in mpls_rt_alloc that verifies that the amount we are going to 

Re: genphy_read_status() vs. 1000bT Pause capability

2017-03-27 Thread Benjamin Herrenschmidt
On Tue, 2017-03-28 at 13:28 +1100, Benjamin Herrenschmidt wrote:
> 
> > Hi Ben
> > 
> > It is worth reading Documentation/networking/phy.txt
> > 
> > The MAC should set SUPPORTED_Pause and SUPPORTED_Asym_Pause if the MAC
> > supports these features. The PHY will then negotiate them.
> 
Haha ! The OpenBMC kernel is still at 4.7 which was still saying you
should only clear bits in there :-) I think that's what I initially
read.

Thanks for the pointer.

Doesn't fix my other problem with Pause in 1000bT land. Do you know if
that way of reflecting the pause capability by hijacking the old
LPA bits is widely implemented enough that we should put it in
genphy_read_status() ?

Cheers,
Ben.



Re: [PATCH] net: mvneta: set rx mode during resume if interface is running

2017-03-27 Thread Jisheng Zhang
Dear David,

On Mon, 27 Mar 2017 16:15:34 -0700 David Miller wrote:

> From: Jisheng Zhang 
> Date: Mon, 27 Mar 2017 18:59:05 +0800
> 
> > I found a bug by:
> > 
> > 0. boot and start dhcp client
> > 1. echo mem > /sys/power/state
> > 2. resume back immediately
> > 3. don't touch dhcp client to renew the lease
> > 4. ping the gateway. No acks
> > 
> > Usually, after step2, the DHCP lease isn't expired, so in theory we
> > should resume all back. But in fact, it doesn't. It turns out
> > the rx mode isn't resumed correctly. This patch fixes it by adding
> > mvneta_set_rx_mode(dev) in the resume hook if interface is running.
> > 
> > Signed-off-by: Jisheng Zhang   
> 
> This doesn't apply cleanly to the net tree, please respin.

This patch was generated against net-next, because mvneta suspend/resume
support was only recently added there. I should have used "[PATCH net-next]"
in the patch title; I will take care of that in the future.

Sorry for the confusion,
Jisheng


Re: genphy_read_status() vs. 1000bT Pause capability

2017-03-27 Thread Benjamin Herrenschmidt
On Tue, 2017-03-28 at 03:09 +0200, Andrew Lunn wrote:
> On Tue, Mar 28, 2017 at 11:49:50AM +1100, Benjamin Herrenschmidt wrote:
> > Hi !
> > 
> > I noticed that flow control isn't being enabled on a system I'm
> > working on by default. I've tracked it down to two things:
> > 
> >  - The realtek.c PHY driver doesn't have Pause or Asym_Pause in
> > its exposed capabilities. This is in part because PHY_GBIT_FEATURES
> > does not include SUPPORTED_Pause and SUPPORTED_Asym_Pause. Is there
> > a specific reason for that ?
> 
> Hi Ben
> 
> It is worth reading Documentation/networking/phy.txt
> 
> The MAC should set SUPPORTED_Pause and SUPPORTED_Asym_Pause if the MAC
> supports these features. The PHY will then negotiate them.

Ok. I had added them but hit the other issue with the 1000bT  style
pause.

Cheers,
Ben.



Re: [PATCH net-next 2/4] net: mpls: change mpls_route layout

2017-03-27 Thread Eric W. Biederman
David Ahern  writes:

> diff --git a/net/mpls/internal.h b/net/mpls/internal.h
> index 66f388ba2d49..302d48f54b57 100644
> --- a/net/mpls/internal.h
> +++ b/net/mpls/internal.h
> @@ -64,7 +64,6 @@ struct mpls_dev {
>  struct sk_buff;
>  
>  #define LABEL_NOT_SPECIFIED (1 << 20)
> -#define MAX_NEW_LABELS 2
>  
>  /* This maximum ha length copied from the definition of struct neighbour */
>  #define VIA_ALEN_ALIGN sizeof(unsigned long)
> @@ -84,12 +83,25 @@ enum mpls_payload_type {
>  struct mpls_nh { /* next hop label forwarding entry */
>   struct net_device __rcu *nh_dev;
>   unsigned int    nh_flags;
> - u32 nh_label[MAX_NEW_LABELS];
>   u8  nh_labels;
>   u8  nh_via_alen;
>   u8  nh_via_table;
> + /* u8 hole */

This hole would probably be better documented with:
u8  nh_reserved1;
> + u32 nh_label[0];
>  };

Eric
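
To make the layout under discussion concrete, here is a small userspace sketch of a next-hop struct whose label array sits at the end and is sized per allocation. It uses a C99 flexible array member where the patch uses a zero-length array, and the names (nh_sketch, nh_size, nh_alloc) are illustrative assumptions rather than the kernel's definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace model of a next hop with a trailing, variably sized array. */
struct nh_sketch {
	unsigned int flags;
	uint8_t labels;		/* number of valid entries in label[] */
	uint8_t via_alen;
	uint8_t via_table;
	uint8_t reserved1;	/* names the padding byte explicitly */
	uint32_t label[];	/* flexible array member, sized at allocation */
};

/* Bytes needed for one next hop carrying the given number of labels. */
size_t nh_size(unsigned int labels)
{
	return sizeof(struct nh_sketch) + labels * sizeof(uint32_t);
}

/* Allocate a zeroed next hop with room for exactly 'labels' labels. */
struct nh_sketch *nh_alloc(uint8_t labels)
{
	struct nh_sketch *nh = calloc(1, nh_size(labels));

	if (nh)
		nh->labels = labels;
	return nh;
}
```

In this model a route with two labels pays for two array slots, and only routes that use more labels pay for more, which is the memory argument made in the cover letter.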


Re: genphy_read_status() vs. 1000bT Pause capability

2017-03-27 Thread Andrew Lunn
On Tue, Mar 28, 2017 at 11:49:50AM +1100, Benjamin Herrenschmidt wrote:
> Hi !
> 
> I noticed that flow control isn't being enabled on a system I'm
> working on by default. I've tracked it down to two things:
> 
>  - The realtek.c PHY driver doesn't have Pause or Asym_Pause in
> its exposed capabilities. This is in part because PHY_GBIT_FEATURES
> does not include SUPPORTED_Pause and SUPPORTED_Asym_Pause. Is there
> a specific reason for that ?

Hi Ben

It is worth reading Documentation/networking/phy.txt

The MAC should set SUPPORTED_Pause and SUPPORTED_Asym_Pause if the MAC
supports these features. The PHY will then negotiate them.

 Andrew


Re: [PATCH] net: udp: add socket option to report RX queue level

2017-03-27 Thread Chris Kuiper
Sorry, I have been transferring jobs and had no time to look at this.

Josh Hunt's change seems to solve a different problem. I was looking
for something that works the same way as SO_RXQ_OVERFL, providing
information as ancillary data to the recvmsg() call. The problem with
SO_RXQ_OVERFL alone is that it tells you when things have already gone
wrong (you dropped data), so the new option SO_RXQ_ALLOC acts as a
leading indicator to check if you are getting close to hitting such a
problem.

Regarding only UDP being supported, it is only meaningful for UDP. TCP
doesn't drop data and if its buffer gets full it just stops the sender
from sending more. The buffer level in that case doesn't even tell you
the whole picture, since it doesn't include any information on how
much additional buffering is done at the sender side.

In terms of "a lot of overhead", logically the overhead of adding
additional getsockopt() calls after each recvmsg() is significantly
larger than just getting the information as part of recvmsg(). If you
don't need it, then don't enable this option. Admittedly, you can reduce
the frequency of calling getsockopt() relative to recvmsg(), but that
also increases your risk of missing the point where data is dropped.

-Chris
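
As a rough illustration of the "leading indicator" idea, the sketch below turns a reported queue allocation (in bytes) and the socket's SO_RCVBUF value into a fill percentage. The helper names (rxq_fill_percent, socket_rcvbuf) are hypothetical; only getsockopt() with SOL_SOCKET/SO_RCVBUF is an existing API. Note that on Linux the value returned for SO_RCVBUF is double what was requested via setsockopt(), as described in socket(7), so the percentage is an approximation.

```c
#include <stdint.h>
#include <sys/socket.h>

/*
 * Percentage (0..100) of the receive buffer in use, or -1 when the buffer
 * size is not positive. Values above the buffer size are clamped to 100.
 */
int rxq_fill_percent(uint32_t alloc_bytes, int rcvbuf_bytes)
{
	uint64_t pct;

	if (rcvbuf_bytes <= 0)
		return -1;
	pct = (uint64_t)alloc_bytes * 100 / (uint32_t)rcvbuf_bytes;
	return pct > 100 ? 100 : (int)pct;
}

/* Read SO_RCVBUF for an open socket; returns -1 on error. */
int socket_rcvbuf(int fd)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &val, &len) < 0)
		return -1;
	return val;
}
```

An application could call socket_rcvbuf() once after socket setup and then feed each ancillary value through rxq_fill_percent() to decide when to raise an alarm.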


On Fri, Mar 17, 2017 at 3:01 PM, Eric Dumazet  wrote:
> On Fri, 2017-03-17 at 14:13 -0700, Chris Kuiper wrote:
>> This adds a new socket option "SO_RXQ_ALLOC" that enables providing
>> the RX queue buffer allocation as ancillary data from the recvmsg()
>> system call. The value reported is a byte number and together with
>> the RX queue size (obtained via getsockopt(SO_RCVBUF)) can be used to
>> calculate a percentage value on how full the socket buffer is.
>> ---
>
> Seems a lot of overhead, and only UDP would be supported.
>
> I very much prefer Josh Hunt proposal
> ( https://patchwork.ozlabs.org/patch/738250/ )
>
> Ie using a separate getsockopt() call instead of adding code to UDP fast
> path ?
>
>
>


genphy_read_status() vs. 1000bT Pause capability

2017-03-27 Thread Benjamin Herrenschmidt
Hi !

I noticed that flow control isn't being enabled on a system I'm
working on by default. I've tracked it down to two things:

 - The realtek.c PHY driver doesn't have Pause or Asym_Pause in
its exposed capabilities. This is in part because PHY_GBIT_FEATURES
does not include SUPPORTED_Pause and SUPPORTED_Asym_Pause. Is there
a specific reason for that ?

 - After I've hacked the above, I get in genphy_read_status():

lpa=c1e1 lpagb=3800 adv=de1 common_adv=1e1 common_adv_gb=800

So we have negotiated 1000bT full duplex. LPA_PAUSE's aren't set
but I was under the impression that in Gigabit mode, LPA bit 0x80
which *is* set, meant ADVERTISE_1000XPAUSE which is the pause
capability isn't it ? Or am I confusing with something else ?
This seems to be how mii_adv_to_ethtool_adv_x() decodes them
but that function is not called by genphy_read_status()...
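
As an aid for reading the register dump above, here is a small decoding sketch. The bit values are the ones defined in the kernel's include/uapi/linux/mii.h; the helper names are hypothetical, and the sketch only shows which bits are set. It does not settle whether a given PHY reports pause through bit 0x0080.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values as defined in include/uapi/linux/mii.h */
#define SK_LPA_100HALF		0x0080	/* MII_LPA, copper: 100BASE-TX half */
#define SK_LPA_PAUSE_CAP	0x0400	/* MII_LPA, copper: symmetric pause */
#define SK_LPA_PAUSE_ASYM	0x0800	/* MII_LPA, copper: asymmetric pause */
#define SK_ADV_1000XPAUSE	0x0080	/* 1000BASE-X advertisement: pause */
#define SK_LPA_1000FULL		0x0800	/* MII_STAT1000: partner 1000 full */

/* Copper-style pause bits in the link partner ability register. */
bool lpa_copper_pause(uint16_t lpa)
{
	return lpa & SK_LPA_PAUSE_CAP;
}

bool lpa_copper_asym_pause(uint16_t lpa)
{
	return lpa & SK_LPA_PAUSE_ASYM;
}

/*
 * Bit 0x0080 has two readings: 100BASE-TX half duplex in the copper
 * layout, pause in the 1000BASE-X layout. This only reports the raw bit.
 */
bool lpa_bit7_set(uint16_t lpa)
{
	return lpa & SK_LPA_100HALF;
}

/* Abilities advertised by both ends. */
uint16_t common_adv(uint16_t adv, uint16_t lpa)
{
	return adv & lpa;
}

bool partner_1000full(uint16_t lpagb)
{
	return lpagb & SK_LPA_1000FULL;
}
```

Fed with the values from the dump (adv=0x0de1, lpa=0xc1e1, lpagb=0x3800), the sketch reproduces common_adv=0x01e1 and shows neither copper pause bit set in lpa, while bit 0x0080 is set.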

Now it's been a while since I hacked network drivers and back then
everybody did their own salad with gigabit PHYs so it's very possible
that I missed something here.

Should we update genphy_read_status() to grab the pause details
from mii_adv_to_ethtool_adv_x() when in 1000bT mode ?

Thanks !

Cheers,
Ben.



Re: [net-next v2 00/10][pull request] 40GbE Intel Wired LAN Driver Updates 2017-03-27

2017-03-27 Thread David Miller
From: Jeff Kirsher 
Date: Mon, 27 Mar 2017 16:52:00 -0700

> This series contains updates to i40e and i40evf only.

Pulled, thanks Jeff.


[net-next v2 10/10] i40e: initialize params before notifying of l2_param_changes

2017-03-27 Thread Jeff Kirsher
From: Jacob Keller 

Fix a bug, probably introduced by a mis-merge, associated with commits
d7ce6422d6e6 ("i40e: don't check params until after checking for client
instance", 2017-02-09) and 3140aa9a78c9 ("i40e: KISS the client
interface", 2017-03-14).

The first commit tried to move the initialization of the params
structure so that we didn't bother doing this if we didn't have a client
interface. You can already see that it looks fishy because of the
indentation. The second commit refactors a bunch of the interface, and
incorrectly drops the params initialization.

I believe what occurred is that internally the two patches were
re-ordered, and the merge conflicts as a result were performed
incorrectly.

Fix the use of an uninitialized variable by correctly initializing the
params variable via i40e_client_get_params().

Reported-by: Colin Ian King 
Signed-off-by: Jacob Keller 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_client.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_client.c b/drivers/net/ethernet/intel/i40e/i40e_client.c
index a9f0d22a7cf4..191028b1489b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_client.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_client.c
@@ -147,6 +147,8 @@ void i40e_notify_client_of_l2_param_changes(struct i40e_vsi *vsi)
 		dev_dbg(&vsi->back->pdev->dev, "Client is not open, abort l2 param change\n");
 		return;
 	}
+	memset(&params, 0, sizeof(params));
+	i40e_client_get_params(vsi, &params);
 	memcpy(&cdev->lan_info.params, &params, sizeof(struct i40e_params));
 	cdev->client->ops->l2_param_change(&cdev->lan_info, cdev->client,
 					   &params);
-- 
2.12.0



[net-next v2 09/10] i40evf: dereference VSI after VSI has been null checked

2017-03-27 Thread Jeff Kirsher
From: Colin Ian King 

VSI is being dereferenced before the VSI null check; if VSI is
null we end up with a null pointer dereference.  Fix this by
performing VSI dereference after the VSI null check.  Also remove
the need for using adapter by using vsi->back->cinst.

Detected by CoverityScan, CID#1419696, CID#1419697
("Dereference before null check")
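
The ordering problem described above can be shown in a few lines of standalone C. The struct and function names below (vsi_sketch, notify_client_sketch and so on) are hypothetical stand-ins for the driver's types; the point is only that nothing is read through vsi until it has been checked.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures. */
struct cinst_sketch { int opened; };
struct adapter_sketch { struct cinst_sketch *cinst; };
struct vsi_sketch { struct adapter_sketch *back; };

/*
 * Returns -1 for a NULL vsi, -2 when there is no usable client instance,
 * and 0 when the client would have been notified.
 */
int notify_client_sketch(struct vsi_sketch *vsi)
{
	struct cinst_sketch *cinst;	/* declared, but not initialized from vsi */

	if (!vsi)
		return -1;

	cinst = vsi->back->cinst;	/* safe: vsi is known to be non-NULL */
	if (!cinst || !cinst->opened)
		return -2;
	return 0;
}

/* Build the small object graph on the stack and run the check. */
int demo_notify(int have_vsi, int have_cinst, int opened)
{
	struct cinst_sketch cinst = { opened };
	struct adapter_sketch adapter = { have_cinst ? &cinst : NULL };
	struct vsi_sketch vsi = { &adapter };

	return notify_client_sketch(have_vsi ? &vsi : NULL);
}
```

Initializing cinst in its declaration, as the pre-patch code did, would read vsi->back before the NULL check had a chance to run.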

Fixes: ed0e894de7c133 ("i40evf: add client interface")
Signed-off-by: Colin Ian King 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40evf/i40evf_client.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_client.c b/drivers/net/ethernet/intel/i40evf/i40evf_client.c
index 5b43e5b6e2eb..ee737680a0e9 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_client.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_client.c
@@ -34,12 +34,12 @@ static struct i40e_ops i40evf_lan_ops = {
  **/
 void i40evf_notify_client_message(struct i40e_vsi *vsi, u8 *msg, u16 len)
 {
-	struct i40evf_adapter *adapter = vsi->back;
-	struct i40e_client_instance *cinst = adapter->cinst;
+	struct i40e_client_instance *cinst;
 
 	if (!vsi)
 		return;
 
+	cinst = vsi->back->cinst;
 	if (!cinst || !cinst->client || !cinst->client->ops ||
 	    !cinst->client->ops->virtchnl_receive) {
 		dev_dbg(&vsi->back->pdev->dev,
@@ -58,12 +58,13 @@ void i40evf_notify_client_message(struct i40e_vsi *vsi, u8 *msg, u16 len)
  **/
 void i40evf_notify_client_l2_params(struct i40e_vsi *vsi)
 {
-	struct i40evf_adapter *adapter = vsi->back;
-	struct i40e_client_instance *cinst = adapter->cinst;
+	struct i40e_client_instance *cinst;
 	struct i40e_params params;
 
 	if (!vsi)
 		return;
+
+	cinst = vsi->back->cinst;
 	memset(&params, 0, sizeof(params));
 	params.mtu = vsi->netdev->mtu;
 	params.link_up = vsi->back->link_up;
-- 
2.12.0



[net-next v2 00/10][pull request] 40GbE Intel Wired LAN Driver Updates 2017-03-27

2017-03-27 Thread Jeff Kirsher
This series contains updates to i40e and i40evf only.

Alex updates the driver code so that we can do bulk updates of the page
reference count instead of just incrementing it by one reference at a
time.  Fixed an issue where we were not resetting skb back to NULL when
we have freed it.  Cleaned up the i40e_process_skb_fields() to align with
other Intel drivers.  Removed FCoE code, since it is not supported in any
of the Fortville/Fortpark hardware, so there is not much point of carrying
the code around, especially if it is broken and untested.

Harshitha fixes a bug in the driver where the calculation of the RSS size
was not taking into account the number of traffic classes enabled.

Robert fixes potential race conditions during VF reset: IOMMU DMAR
faults caused by VF hardware, and the case where the OS initiates a VF
reset and we modify the VF's settings before the reset has finished.

Bimmy removes a delay that is no longer needed, since it was only needed
for preproduction hardware.

Colin King fixes null pointer dereference, where VSI was being
dereferenced before the VSI NULL check.

Jake fixes an issue with the recent addition of the "client code" to the
driver, where we attempt to use an uninitialized variable, so correctly
initialize the params variable by calling i40e_client_get_params().

v2: dropped patch 5 of the original series from Carolyn since we need
    more documentation and justification for the added delay, so Carolyn
    is taking the time to update the patch before we re-submit it for
    kernel inclusion.

The following are changes since commit 402a5bc462d47f0b7c9e8a516c124c9c162fe2aa:
  ipv6: sr: select DST_CACHE by default
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 40GbE

Alexander Duyck (5):
  i40e/i40evf: Update code to better handle incrementing page count
  i40e/i40evf: Fix use after free in Rx cleanup path
  i40e/i40evf: Clean-up process_skb_fields
  i40e: Drop FCoE code from core driver files
  i40e: Drop FCoE code that always evaluates to false or 0

Bimmy Pujari (1):
  i40e: removed no longer needed delays

Colin Ian King (1):
  i40evf: dereference VSI after VSI has been null checked

Harshitha Ramamurthy (1):
  i40e: fix configuration of RSS table with DCB

Jacob Keller (1):
  i40e: initialize params before notifying of l2_param_changes

Robert Konklewski (1):
  i40e: Fixed race conditions in VF reset

 drivers/net/ethernet/intel/Kconfig |  11 -
 drivers/net/ethernet/intel/i40e/Makefile   |   1 -
 drivers/net/ethernet/intel/i40e/i40e.h |  62 -
 drivers/net/ethernet/intel/i40e/i40e_client.c  |   2 +
 drivers/net/ethernet/intel/i40e/i40e_common.c  |  27 --
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c |  19 --
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c |  35 ---
 drivers/net/ethernet/intel/i40e/i40e_main.c| 295 +
 drivers/net/ethernet/intel/i40e/i40e_osdep.h   |   3 -
 drivers/net/ethernet/intel/i40e/i40e_prototype.h   |   3 -
 drivers/net/ethernet/intel/i40e/i40e_txrx.c|  60 ++---
 drivers/net/ethernet/intel/i40e/i40e_txrx.h|  24 +-
 drivers/net/ethernet/intel/i40e/i40e_type.h| 138 --
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |  43 ++-
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h |   1 -
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c  |  33 ++-
 drivers/net/ethernet/intel/i40evf/i40e_txrx.h  |  19 +-
 drivers/net/ethernet/intel/i40evf/i40evf_client.c  |   9 +-
 18 files changed, 112 insertions(+), 673 deletions(-)

-- 
2.12.0



[net-next v2 08/10] i40e: Drop FCoE code that always evaluates to false or 0

2017-03-27 Thread Jeff Kirsher
From: Alexander Duyck 

Since FCoE isn't supported by the i40e products there isn't much point in
carrying around code that will always evaluate to false. This patch goes
through and strips out the code in several spots so that we don't go around
carrying variables and/or code that is always going to evaluate to false or
0.

Change-ID: I39d1d779c66c638b75525839db2b6208fdc809d7
Signed-off-by: Alexander Duyck 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e.h |  3 ---
 drivers/net/ethernet/intel/i40e/i40e_main.c| 17 +++--
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h |  1 -
 3 files changed, 3 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index ee298adbb6db..d7e84f99eb2d 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -502,9 +502,6 @@ struct i40e_pf {
 */
u16 dcbx_cap;
 
-   u32 fcoe_hmc_filt_num;
-   u32 fcoe_hmc_cntx_num;
-
struct i40e_filter_control_settings filter_settings;
 
struct ptp_clock *ptp_clock;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index b3520567a12f..96bedb54701c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -4369,14 +4369,6 @@ static void i40e_quiesce_vsi(struct i40e_vsi *vsi)
 	if (test_bit(__I40E_DOWN, &vsi->state))
 		return;
 
-	/* No need to disable FCoE VSI when Tx suspended */
-	if ((test_bit(__I40E_PORT_TX_SUSPENDED, &vsi->back->state)) &&
-	    vsi->type == I40E_VSI_FCOE) {
-		dev_dbg(&vsi->back->pdev->dev,
-			 "VSI seid %d skipping FCoE VSI disable\n", vsi->seid);
-		return;
-	}
-
 	set_bit(__I40E_NEEDS_RESTART, &vsi->state);
 	if (vsi->netdev && netif_running(vsi->netdev))
 		vsi->netdev->netdev_ops->ndo_stop(vsi->netdev);
@@ -4479,8 +4471,7 @@ static int i40e_pf_wait_queues_disabled(struct i40e_pf *pf)
 	int v, ret = 0;
 
 	for (v = 0; v < pf->hw.func_caps.num_vsis; v++) {
-		/* No need to wait for FCoE VSI queues */
-		if (pf->vsi[v] && pf->vsi[v]->type != I40E_VSI_FCOE) {
+		if (pf->vsi[v]) {
 			ret = i40e_vsi_wait_queues_disabled(pf->vsi[v]);
 			if (ret)
 				break;
@@ -6968,8 +6959,7 @@ static void i40e_reset_and_rebuild(struct i40e_pf *pf, bool reinit)
 		goto end_core_reset;
 
 	ret = i40e_init_lan_hmc(hw, hw->func_caps.num_tx_qp,
-				hw->func_caps.num_rx_qp,
-				pf->fcoe_hmc_cntx_num, pf->fcoe_hmc_filt_num);
+				hw->func_caps.num_rx_qp, 0, 0);
 	if (ret) {
 		dev_info(&pf->pdev->dev, "init_lan_hmc failed: %d\n", ret);
 		goto end_core_reset;
@@ -11014,8 +11004,7 @@ static int i40e_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	}
 
 	err = i40e_init_lan_hmc(hw, hw->func_caps.num_tx_qp,
-				hw->func_caps.num_rx_qp,
-				pf->fcoe_hmc_cntx_num, pf->fcoe_hmc_filt_num);
+				hw->func_caps.num_rx_qp, 0, 0);
 	if (err) {
 		dev_info(&pdev->dev, "init_lan_hmc failed: %d\n", err);
 		goto err_init_lan_hmc;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h
index 4012d069939a..37af437daa5d 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h
@@ -87,7 +87,6 @@ struct i40e_vf {
u16 stag;
 
struct i40e_virtchnl_ether_addr default_lan_addr;
-   struct i40e_virtchnl_ether_addr default_fcoe_addr;
u16 port_vlan_id;
bool pf_set_mac;/* The VMM admin set the VF MAC address */
bool trusted;
-- 
2.12.0



[net-next v2 06/10] i40e/i40evf: Clean-up process_skb_fields

2017-03-27 Thread Jeff Kirsher
From: Alexander Duyck 

This is a minor clean-up to make the i40e/i40evf process_skb_fields
function look a little more like what we have in igb.  The Rx checksum
function called out a need for skb->protocol but I can't see where it
actually needs it.  I am assuming this is something that was likely
refactored out some time ago as the Rx checksum code has gone through a few
rewrites.

Change-ID: I0b4668a34d90b61b66ded7c7c26e19a3e2d06251
Signed-off-by: Alexander Duyck 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 8 +++-
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 8 +++-
 2 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 9f2c9f1b8e06..a40338fd0126 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1392,8 +1392,6 @@ bool i40e_alloc_rx_buffers(struct i40e_ring *rx_ring, u16 
cleaned_count)
  * @vsi: the VSI we care about
  * @skb: skb currently being received and modified
  * @rx_desc: the receive descriptor
- *
- * skb->protocol must be set before this function is called
  **/
 static inline void i40e_rx_checksum(struct i40e_vsi *vsi,
struct sk_buff *skb,
@@ -1555,12 +1553,12 @@ void i40e_process_skb_fields(struct i40e_ring *rx_ring,
 
i40e_rx_hash(rx_ring, rx_desc, skb, rx_ptype);
 
-   /* modifies the skb - consumes the enet header */
-   skb->protocol = eth_type_trans(skb, rx_ring->netdev);
-
i40e_rx_checksum(rx_ring->vsi, skb, rx_desc);
 
skb_record_rx_queue(skb, rx_ring->queue_index);
+
+   /* modifies the skb - consumes the enet header */
+   skb->protocol = eth_type_trans(skb, rx_ring->netdev);
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index 38f93075f496..8915c5598d20 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -764,8 +764,6 @@ bool i40evf_alloc_rx_buffers(struct i40e_ring *rx_ring, u16 
cleaned_count)
  * @vsi: the VSI we care about
  * @skb: skb currently being received and modified
  * @rx_desc: the receive descriptor
- *
- * skb->protocol must be set before this function is called
  **/
 static inline void i40e_rx_checksum(struct i40e_vsi *vsi,
struct sk_buff *skb,
@@ -917,12 +915,12 @@ void i40evf_process_skb_fields(struct i40e_ring *rx_ring,
 {
i40e_rx_hash(rx_ring, rx_desc, skb, rx_ptype);
 
-   /* modifies the skb - consumes the enet header */
-   skb->protocol = eth_type_trans(skb, rx_ring->netdev);
-
i40e_rx_checksum(rx_ring->vsi, skb, rx_desc);
 
skb_record_rx_queue(skb, rx_ring->queue_index);
+
+   /* modifies the skb - consumes the enet header */
+   skb->protocol = eth_type_trans(skb, rx_ring->netdev);
 }
 
 /**
-- 
2.12.0



[net-next v2 02/10] i40e: fix configuration of RSS table with DCB

2017-03-27 Thread Jeff Kirsher
From: Harshitha Ramamurthy 

There exists a bug in the driver where the calculation of the
RSS size was not taking into account the number of traffic classes
enabled. This patch factors in the traffic classes both in
the initial configuration of the table as well as reconfiguration.
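
The calculation the patch introduces can be sketched as a pure function: divide the queue pairs by the number of enabled traffic classes, then cap the result at the allocated RSS size. The function name and the guard against a non-positive class count are assumptions made for this sketch; the driver itself reads these values from the VSI and PF structures.

```c
/*
 * RSS size for a VSI: queue pairs are split evenly across the enabled
 * traffic classes, and the per-class count is capped at the RSS size
 * allocated to the PF. Returns 0 when no queue is left per class.
 */
int rss_size_sketch(int alloc_rss_size, int num_queue_pairs, int numtc)
{
	int qcount;

	if (numtc <= 0)		/* sketch-only guard against divide by zero */
		return 0;
	qcount = num_queue_pairs / numtc;
	return alloc_rss_size < qcount ? alloc_rss_size : qcount;
}
```

With 64 queue pairs and four traffic classes, for example, the per-class count is 16, where the old code would have used 64. A result of 0 corresponds to the case the driver rejects with -EINVAL.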

Change-ID: I34dcd345ce52faf1d6b9614bea28d450cfd5f621
Signed-off-by: Harshitha Ramamurthy 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 1d8febd721ac..5da990909a88 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -8577,9 +8577,12 @@ static int i40e_pf_config_rss(struct i40e_pf *pf)
i40e_write_rx_ctl(hw, I40E_PFQF_CTL_0, reg_val);
 
/* Determine the RSS size of the VSI */
-   if (!vsi->rss_size)
-   vsi->rss_size = min_t(int, pf->alloc_rss_size,
- vsi->num_queue_pairs);
+   if (!vsi->rss_size) {
+   u16 qcount;
+
+   qcount = vsi->num_queue_pairs / vsi->tc_config.numtc;
+   vsi->rss_size = min_t(int, pf->alloc_rss_size, qcount);
+   }
if (!vsi->rss_size)
return -EINVAL;
 
@@ -8625,6 +8628,8 @@ int i40e_reconfig_rss_queues(struct i40e_pf *pf, int queue_count)
new_rss_size = min_t(int, queue_count, pf->rss_size_max);
 
if (queue_count != vsi->num_queue_pairs) {
+   u16 qcount;
+
vsi->req_queue_pairs = queue_count;
i40e_prep_for_reset(pf);
 
@@ -8642,8 +8647,8 @@ int i40e_reconfig_rss_queues(struct i40e_pf *pf, int queue_count)
}
 
/* Reset vsi->rss_size, as number of enabled queues changed */
-   vsi->rss_size = min_t(int, pf->alloc_rss_size,
- vsi->num_queue_pairs);
+   qcount = vsi->num_queue_pairs / vsi->tc_config.numtc;
+   vsi->rss_size = min_t(int, pf->alloc_rss_size, qcount);
 
i40e_pf_config_rss(pf);
}
-- 
2.12.0



[net-next v2 05/10] i40e: removed no longer needed delays

2017-03-27 Thread Jeff Kirsher
From: Bimmy Pujari 

Remove delays that are no longer needed.  They were required at the
preproduction stage but serve no purpose now.

Signed-off-by: Bimmy Pujari 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 5da990909a88..0359f60b4792 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -4101,8 +4101,6 @@ static int i40e_vsi_control_tx(struct i40e_vsi *vsi, bool enable)
}
}
 
-   if (hw->revision_id == 0)
-   mdelay(50);
return ret;
 }
 
-- 
2.12.0



[net-next v2 07/10] i40e: Drop FCoE code from core driver files

2017-03-27 Thread Jeff Kirsher
From: Alexander Duyck 

Looking over the code for FCoE it looks like the Rx path has been broken at
least since the last major Rx refactor almost a year ago.  It seems like
FCoE isn't supported for any of the Fortville/Fortpark hardware so there
isn't much point in carrying the code around, especially if it is broken
and untested.

Change-ID: I892de8fa551cb129ce2361e738ff82ce55fa229e
Signed-off-by: Alexander Duyck 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/Kconfig   |  11 -
 drivers/net/ethernet/intel/i40e/Makefile |   1 -
 drivers/net/ethernet/intel/i40e/i40e.h   |  61 +-
 drivers/net/ethernet/intel/i40e/i40e_common.c|  27 ---
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c   |  19 --
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c   |  35 ---
 drivers/net/ethernet/intel/i40e/i40e_main.c  | 261 +--
 drivers/net/ethernet/intel/i40e/i40e_osdep.h |   3 -
 drivers/net/ethernet/intel/i40e/i40e_prototype.h |   3 -
 drivers/net/ethernet/intel/i40e/i40e_txrx.c  |  26 ---
 drivers/net/ethernet/intel/i40e/i40e_txrx.h  |  17 --
 drivers/net/ethernet/intel/i40e/i40e_type.h  | 138 
 drivers/net/ethernet/intel/i40evf/i40e_txrx.h|  12 --
 13 files changed, 2 insertions(+), 612 deletions(-)

diff --git a/drivers/net/ethernet/intel/Kconfig 
b/drivers/net/ethernet/intel/Kconfig
index 1349b45f014d..1542a2158e96 100644
--- a/drivers/net/ethernet/intel/Kconfig
+++ b/drivers/net/ethernet/intel/Kconfig
@@ -235,17 +235,6 @@ config I40E_DCB
 
  If unsure, say N.
 
-config I40E_FCOE
-   bool "Fibre Channel over Ethernet (FCoE)"
-   default n
-   depends on I40E && DCB && FCOE
-   ---help---
- Say Y here if you want to use Fibre Channel over Ethernet (FCoE)
- in the driver. This will create new netdev for exclusive FCoE
- use with XL710 FCoE offloads enabled.
-
- If unsure, say N.
-
 config I40EVF
tristate "Intel(R) XL710 X710 Virtual Function Ethernet support"
depends on PCI_MSI
diff --git a/drivers/net/ethernet/intel/i40e/Makefile 
b/drivers/net/ethernet/intel/i40e/Makefile
index 3b3c63e54ed6..4f454d364d0d 100644
--- a/drivers/net/ethernet/intel/i40e/Makefile
+++ b/drivers/net/ethernet/intel/i40e/Makefile
@@ -45,4 +45,3 @@ i40e-objs := i40e_main.o \
i40e_virtchnl_pf.o
 
 i40e-$(CONFIG_I40E_DCB) += i40e_dcb.o i40e_dcb_nl.o
-i40e-$(CONFIG_I40E_FCOE) += i40e_fcoe.o
diff --git a/drivers/net/ethernet/intel/i40e/i40e.h 
b/drivers/net/ethernet/intel/i40e/i40e.h
index 3133a1a8b8b3..ee298adbb6db 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -56,9 +56,6 @@
 #include 
 #include "i40e_type.h"
 #include "i40e_prototype.h"
-#ifdef I40E_FCOE
-#include "i40e_fcoe.h"
-#endif
 #include "i40e_client.h"
 #include "i40e_virtchnl.h"
 #include "i40e_virtchnl_pf.h"
@@ -85,10 +82,6 @@
(((pf)->flags & I40E_FLAG_128_QP_RSS_CAPABLE) ? 128 : 64)
 #define I40E_FDIR_RING 0
 #define I40E_FDIR_RING_COUNT   32
-#ifdef I40E_FCOE
-#define I40E_DEFAULT_FCOE  8 /* default number of QPs for FCoE */
-#define I40E_MINIMUM_FCOE  1 /* minimum number of QPs for FCoE */
-#endif /* I40E_FCOE */
 #define I40E_MAX_AQ_BUF_SIZE   4096
 #define I40E_AQ_LEN256
 #define I40E_AQ_WORK_LIMIT 66 /* max number of VFs + a little */
@@ -347,10 +340,6 @@ struct i40e_pf {
u16 num_vmdq_msix; /* num queue vectors per vmdq pool */
u16 num_req_vfs;   /* num VFs requested for this VF */
u16 num_vf_qps;/* num queue pairs per VF */
-#ifdef I40E_FCOE
-   u16 num_fcoe_qps;  /* num fcoe queues this PF has set up */
-   u16 num_fcoe_msix; /* num queue vectors per fcoe pool */
-#endif /* I40E_FCOE */
u16 num_lan_qps;   /* num lan queues this PF has set up */
u16 num_lan_msix;  /* num queue vectors for the base PF vsi */
u16 num_fdsb_msix; /* num queue vectors for sideband Fdir */
@@ -411,9 +400,6 @@ struct i40e_pf {
 #define I40E_FLAG_FDIR_REQUIRES_REINIT BIT_ULL(8)
 #define I40E_FLAG_NEED_LINK_UPDATE BIT_ULL(9)
 #define I40E_FLAG_IWARP_ENABLEDBIT_ULL(10)
-#ifdef I40E_FCOE
-#define I40E_FLAG_FCOE_ENABLED BIT_ULL(11)
-#endif /* I40E_FCOE */
 #define I40E_FLAG_CLEAN_ADMINQ BIT_ULL(14)
 #define I40E_FLAG_FILTER_SYNC  BIT_ULL(15)
 #define I40E_FLAG_SERVICE_CLIENT_REQUESTED BIT_ULL(16)
@@ -461,10 +447,6 @@ struct i40e_pf {
 */
u64 hw_disabled_flags;
 
-#ifdef I40E_FCOE
-   struct i40e_fcoe fcoe;
-
-#endif /* I40E_FCOE */
struct i40e_client_instance *cinst;
bool 

[net-next v2 01/10] i40e/i40evf: Update code to better handle incrementing page count

2017-03-27 Thread Jeff Kirsher
From: Alexander Duyck 

Update the driver code so that we do bulk updates of the page reference
count instead of just incrementing it by one reference at a time.  The
advantage to doing this is that we cut down on atomic operations and
this in turn should give us a slight improvement in cycles per packet.
In addition if we eventually move this over to using build_skb the gains
will be more noticeable.

I also found and fixed a store forwarding stall from where we were
assigning "*new_buff = *old_buff".  By breaking it up into individual
copies we can avoid this and as a result the performance is slightly
improved.
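
The bulk-update idea above can be sketched in userspace C. This is only a
model under stated assumptions, not the driver code: fake_page,
fake_rx_buffer and the fake_* helpers are hypothetical stand-ins, and the
atomic page reference count is modelled as a plain integer plus a counter of
driver-side updates.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: just a reference count. */
struct fake_page {
	unsigned int refcount;          /* models page_count(); atomic in the kernel */
	unsigned int driver_atomic_ops; /* driver-side refcount updates performed */
};

/* Hypothetical stand-in for struct i40e_rx_buffer. */
struct fake_rx_buffer {
	struct fake_page *page;
	unsigned short pagecnt_bias;    /* references still owned by the driver */
};

/* Models page_ref_add(): one atomic operation, however large nr is. */
static void fake_page_ref_add(struct fake_page *page, unsigned int nr)
{
	page->refcount += nr;
	page->driver_atomic_ops++;
}

/* Models the stack dropping its reference once it is done with a fragment. */
static void fake_stack_put_page(struct fake_page *page)
{
	assert(page->refcount > 0);
	page->refcount--;
}

static bool fake_can_reuse_rx_page(struct fake_rx_buffer *buf)
{
	/* Hand one driver-owned reference to the stack: a plain decrement. */
	unsigned int bias = buf->pagecnt_bias--;

	assert(buf->page != NULL);

	/* Someone else still holds a reference, so the page cannot be reused. */
	if (buf->page->refcount != bias)
		return false;

	/* Pool drained: restock with a single bulk update. */
	if (bias == 1) {
		fake_page_ref_add(buf->page, USHRT_MAX);
		buf->pagecnt_bias = USHRT_MAX;
	}
	return true;
}
```

In this model, handing 1000 buffers to the stack costs one driver-side
refcount update rather than 1000, which is the saving described above.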

Change-ID: I1d3880dece4133eca3c32423b04a5467321ccc52
Signed-off-by: Alexander Duyck 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 25 ++---
 drivers/net/ethernet/intel/i40e/i40e_txrx.h   |  7 ++-
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 24 ++--
 drivers/net/ethernet/intel/i40evf/i40e_txrx.h |  7 ++-
 4 files changed, 48 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 0ca307a6c731..e5c89770cbc2 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1154,7 +1154,7 @@ void i40e_clean_rx_ring(struct i40e_ring *rx_ring)
 PAGE_SIZE,
 DMA_FROM_DEVICE,
 I40E_RX_DMA_ATTR);
-   __free_pages(rx_bi->page, 0);
+   __page_frag_cache_drain(rx_bi->page, rx_bi->pagecnt_bias);
 
rx_bi->page = NULL;
rx_bi->page_offset = 0;
@@ -1299,6 +1299,7 @@ static bool i40e_alloc_mapped_page(struct i40e_ring *rx_ring,
bi->dma = dma;
bi->page = page;
bi->page_offset = 0;
+   bi->pagecnt_bias = 1;
 
return true;
 }
@@ -1604,7 +1605,10 @@ static void i40e_reuse_rx_page(struct i40e_ring *rx_ring,
rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
 
/* transfer page from old buffer to new buffer */
-   *new_buff = *old_buff;
+   new_buff->dma   = old_buff->dma;
+   new_buff->page  = old_buff->page;
+   new_buff->page_offset   = old_buff->page_offset;
+   new_buff->pagecnt_bias  = old_buff->pagecnt_bias;
 }
 
 /**
@@ -1656,6 +1660,7 @@ static bool i40e_can_reuse_rx_page(struct i40e_rx_buffer *rx_buffer,
 #if (PAGE_SIZE >= 8192)
unsigned int last_offset = PAGE_SIZE - I40E_RXBUFFER_2048;
 #endif
+   unsigned int pagecnt_bias = rx_buffer->pagecnt_bias--;
 
/* Is any reuse possible? */
if (unlikely(!i40e_page_is_reusable(page)))
@@ -1663,7 +1668,7 @@ static bool i40e_can_reuse_rx_page(struct i40e_rx_buffer *rx_buffer,
 
 #if (PAGE_SIZE < 8192)
/* if we are only owner of page we can reuse it */
-   if (unlikely(page_count(page) != 1))
+   if (unlikely(page_count(page) != pagecnt_bias))
return false;
 
/* flip page offset to other buffer */
@@ -1676,9 +1681,14 @@ static bool i40e_can_reuse_rx_page(struct i40e_rx_buffer *rx_buffer,
return false;
 #endif
 
-   /* Inc ref count on page before passing it up to the stack */
-   get_page(page);
-
+   /* If we have drained the page fragment pool we need to update
+* the pagecnt_bias and page count so that we fully restock the
+* number of references the driver holds.
+*/
+   if (unlikely(pagecnt_bias == 1)) {
+   page_ref_add(page, USHRT_MAX);
+   rx_buffer->pagecnt_bias = USHRT_MAX;
+   }
return true;
 }
 
@@ -1725,7 +1735,6 @@ static bool i40e_add_rx_frag(struct i40e_ring *rx_ring,
return true;
 
/* this page cannot be reused so discard it */
-   __free_pages(page, 0);
return false;
}
 
@@ -1819,6 +1828,8 @@ struct sk_buff *i40e_fetch_rx_buffer(struct i40e_ring *rx_ring,
/* we are not reusing the buffer so unmap it */
dma_unmap_page_attrs(rx_ring->dev, rx_buffer->dma, PAGE_SIZE,
 DMA_FROM_DEVICE, I40E_RX_DMA_ATTR);
+   __page_frag_cache_drain(rx_buffer->page,
+   rx_buffer->pagecnt_bias);
}
 
/* clear contents of buffer_info */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index 49c7b2089d8e..77c3e96f5172 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -258,7 +258,12 @@ struct i40e_tx_buffer {
 struct i40e_rx_buffer {
dma_addr_t dma;
struct page *page;
-   

[BUG] ethernet:mellanox:mlx5: Oops in health_recover get_nic_state(dev)

2017-03-27 Thread Goel, Sameer
Stack frame:
[ 1744.418958] [] get_nic_state+0x24/0x40 [mlx5_core]
[ 1744.425273] [] health_recover+0x28/0x80 [mlx5_core]
[ 1744.431496] [] process_one_work+0x150/0x460
[ 1744.437218] [] worker_thread+0x50/0x4b8
[ 1744.442609] [] kthread+0xd8/0xf0
[ 1744.447377] [] ret_from_fork+0x10/0x20

Summary:
This issue was seen on a QDF2400 system about 30 minutes into a speccpu 2006 run.
During the test a recoverable PCIe error was seen that gave the following log:
[ 1673.170969] pcieport 0002:00:00.0: aer_status: 0x4000, aer_mask: 0x0040
[ 1673.177961] pcieport 0002:00:00.0: aer_layer=Transaction Layer, aer_agent=Requester ID
[ 1673.185832] pcieport 0002:00:00.0: aer_uncor_severity: 0x00462030
[ 1675.536391] mlx5_core 0002:01:00.0: assert_var[0] 0x
[ 1675.541093] mlx5_core 0002:01:00.0: assert_var[1] 0x
[ 1675.546750] mlx5_core 0002:01:00.0: assert_var[2] 0x
[ 1675.552377] mlx5_core 0002:01:00.0: assert_var[3] 0x
[ 1675.558040] mlx5_core 0002:01:00.0: assert_var[4] 0x
[ 1675.563661] mlx5_core 0002:01:00.0: assert_exit_ptr 0x
[ 1675.569488] mlx5_core 0002:01:00.0: assert_callra 0x
[ 1675.575120] mlx5_core 0002:01:00.0: fw_ver 15.4095.65535
[ 1675.580426] mlx5_core 0002:01:00.0: hw_id 0x
[ 1675.585363] mlx5_core 0002:01:00.0: irisc_index 255
[ 1675.590242] mlx5_core 0002:01:00.0: synd 0xff: unrecognized error
[ 1675.596301] mlx5_core 0002:01:00.0: ext_synd 0x
[ 1675.601209] mlx5_core 0002:01:00.0: mlx5_enter_error_state:120:(pid 7205): start
[ 1675.608613] mlx5_core 0002:01:00.0: mlx5_enter_error_state:127:(pid 7205): end

After this log we see the stack frame shown above and a page fault due to an
invalid dev pointer.

So the recovery work is queued and the timer is stopped. Somehow the
workqueue is not cleared and when it runs the dev pointer is invalid.

This issue was difficult to repro and was seen only once in multiple runs on a 
specific device.

Thanks,
Sameer 
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.


[net-next v2 03/10] i40e/i40evf: Fix use after free in Rx cleanup path

2017-03-27 Thread Jeff Kirsher
From: Alexander Duyck 

We need to reset skb back to NULL when we have freed it in the Rx cleanup
path.  I found one spot where this wasn't occurring so this patch fixes it.
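
Why the reset matters can be shown with a simplified userspace model. It
assumes, as in the Rx cleanup loop, that the skb pointer is carried from one
loop iteration to the next so a packet can span several descriptors.
fake_desc, fake_skb and fake_clean_rx() are hypothetical names for this
sketch only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical descriptor: error bit and end-of-packet bit only. */
struct fake_desc {
	bool error;
	bool end_of_packet;
};

/* Hypothetical packet buffer: counts the fragments attached so far. */
struct fake_skb {
	int frags;
};

/* Returns the number of packets delivered from the first n descriptors. */
static int fake_clean_rx(const struct fake_desc *ring, int n)
{
	struct fake_skb *skb = NULL;	/* carried across loop iterations */
	int delivered = 0;

	for (int i = 0; i < n; i++) {
		/* Start a new packet only when none is in progress. */
		if (!skb) {
			skb = calloc(1, sizeof(*skb));
			assert(skb != NULL);
		}
		skb->frags++;

		if (ring[i].error) {
			free(skb);
			/* Without this reset the next iteration would see a
			 * non-NULL skb and write to freed memory.
			 */
			skb = NULL;
			continue;
		}

		if (!ring[i].end_of_packet)
			continue;

		delivered++;
		free(skb);
		skb = NULL;
	}

	free(skb);	/* drop any partially assembled packet */
	return delivered;
}
```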

Change-ID: Iaca68934200732cd4a63eb0bd83b539c95f8c4dd
Signed-off-by: Alexander Duyck 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 1 +
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index e5c89770cbc2..9f2c9f1b8e06 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1941,6 +1941,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
 */
if (unlikely(i40e_test_staterr(rx_desc, BIT(I40E_RXD_QW1_ERROR_SHIFT)))) {
dev_kfree_skb_any(skb);
+   skb = NULL;
continue;
}
 
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index d892922a2ed9..38f93075f496 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -1299,6 +1299,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
 */
if (unlikely(i40e_test_staterr(rx_desc, BIT(I40E_RXD_QW1_ERROR_SHIFT)))) {
dev_kfree_skb_any(skb);
+   skb = NULL;
continue;
}
 
-- 
2.12.0



[net-next v2 04/10] i40e: Fixed race conditions in VF reset

2017-03-27 Thread Jeff Kirsher
From: Robert Konklewski 

First, this patch eliminates IOMMU DMAR Faults caused by VF hardware.
This is done by enabling VF hardware only after VSI resources are
freed. Otherwise, hardware could DMA into memory that is being (or has
just been) freed.

Then, the VF driver is activated only after VSI resources have been
reallocated. That's because the VF driver can request resources
immediately after it's activated. So they need to be ready at that
point.

The second race condition happens when the OS initiates a VF reset,
and then before it's finished modifies VF's settings by changing its
MAC, VLAN ID, bandwidth allocation, anti-spoof checking, etc. These
functions needed to be blocked while VF is undergoing reset. Otherwise,
they could operate on data structures that had just been freed or not
yet fully initialized.
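
The ordering described above can be summarised in a small single-threaded
model. It is a sketch under stated assumptions rather than the driver logic:
fake_vf, FAKE_VF_STAT_INIT and the fake_* functions are hypothetical, the
-EAGAIN return value is chosen here for illustration, and a real
implementation also has to cope with config calls that are already running
when the flag is cleared.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define FAKE_VF_STAT_INIT (1u << 0)	/* config API is allowed to run */

/* Hypothetical VF state: a flag word, one VSI field, a hardware gate. */
struct fake_vf {
	unsigned int states;
	int *vsi_mac;		/* stands in for the VF's VSI resources */
	bool hw_enabled;	/* hardware may DMA only while true */
};

static int fake_vsi_storage;

/* Models a config call such as "set VF MAC": refused during reset. */
static int fake_set_vf_mac(struct fake_vf *vf, int mac)
{
	if (!(vf->states & FAKE_VF_STAT_INIT))
		return -EAGAIN;
	assert(vf->vsi_mac != NULL);
	*vf->vsi_mac = mac;
	return 0;
}

/* First half of a reset: block config calls, then free resources. */
static void fake_reset_vf_begin(struct fake_vf *vf)
{
	vf->states &= ~FAKE_VF_STAT_INIT;	/* 1. close the config API */
	vf->hw_enabled = false;			/* hardware is held in reset */
	vf->vsi_mac = NULL;			/* 2. free the VSI resources */
}

/* Second half: release hardware only after the free, then reallocate. */
static void fake_reset_vf_finish(struct fake_vf *vf)
{
	assert(vf->vsi_mac == NULL);
	vf->hw_enabled = true;			/* 3. nothing is mid-free now */
	vf->vsi_mac = &fake_vsi_storage;	/* 4. reallocate resources */
	vf->states |= FAKE_VF_STAT_INIT;	/* 5. reopen the config API */
}
```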

Change-ID: I43ba5a7ae2c9a1cce3911611ffc4598ae33ae3ff
Signed-off-by: Robert Konklewski 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 43 ++
 1 file changed, 35 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index cfe8b78dac0e..d526940ff951 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -809,6 +809,11 @@ static void i40e_free_vf_res(struct i40e_vf *vf)
u32 reg_idx, reg;
int i, msix_vf;
 
+   /* Start by disabling VF's configuration API to prevent the OS from
+* accessing the VF's VSI after it's freed / invalidated.
+*/
+   clear_bit(I40E_VF_STAT_INIT, &vf->vf_states);
+
/* free vsi & disconnect it from the parent uplink */
if (vf->lan_vsi_idx) {
i40e_vsi_release(pf->vsi[vf->lan_vsi_idx]);
@@ -848,7 +853,6 @@ static void i40e_free_vf_res(struct i40e_vf *vf)
/* reset some of the state variables keeping track of the resources */
vf->num_queue_pairs = 0;
vf->vf_states = 0;
-   clear_bit(I40E_VF_STAT_INIT, &vf->vf_states);
 }
 
 /**
@@ -939,6 +943,14 @@ void i40e_reset_vf(struct i40e_vf *vf, bool flr)
/* warn the VF */
clear_bit(I40E_VF_STAT_ACTIVE, &vf->vf_states);
 
+   /* Disable VF's configuration API during reset. The flag is re-enabled
+* in i40e_alloc_vf_res(), when it's safe again to access VF's VSI.
+* It's normally disabled in i40e_free_vf_res(), but it's safer
+* to do it earlier to give some time to finish to any VF config
+* functions that may still be running at this point.
+*/
+   clear_bit(I40E_VF_STAT_INIT, &vf->vf_states);
+
/* In the case of a VFLR, the HW has already reset the VF and we
 * just need to clean up, so don't hit the VFRTRIG register.
 */
@@ -982,11 +994,6 @@ void i40e_reset_vf(struct i40e_vf *vf, bool flr)
if (!rsd)
dev_err(&pf->pdev->dev, "VF reset check timeout on VF %d\n",
vf->vf_id);
-   wr32(hw, I40E_VFGEN_RSTAT1(vf->vf_id), I40E_VFR_COMPLETED);
-   /* clear the reset bit in the VPGEN_VFRTRIG reg */
-   reg = rd32(hw, I40E_VPGEN_VFRTRIG(vf->vf_id));
-   reg &= ~I40E_VPGEN_VFRTRIG_VFSWR_MASK;
-   wr32(hw, I40E_VPGEN_VFRTRIG(vf->vf_id), reg);
 
/* On initial reset, we won't have any queues */
if (vf->lan_vsi_idx == 0)
@@ -994,8 +1001,24 @@ void i40e_reset_vf(struct i40e_vf *vf, bool flr)
 
i40e_vsi_stop_rings(pf->vsi[vf->lan_vsi_idx]);
 complete_reset:
-   /* reallocate VF resources to reset the VSI state */
+   /* free VF resources to begin resetting the VSI state */
i40e_free_vf_res(vf);
+
+   /* Enable hardware by clearing the reset bit in the VPGEN_VFRTRIG reg.
+* By doing this we allow HW to access VF memory at any point. If we
+* did it any sooner, HW could access memory while it was being freed
+* in i40e_free_vf_res(), causing an IOMMU fault.
+*
+* On the other hand, this needs to be done ASAP, because the VF driver
+* is waiting for this to happen and may report a timeout. It's
+* harmless, but it gets logged into Guest OS kernel log, so best avoid
+* it.
+*/
+   reg = rd32(hw, I40E_VPGEN_VFRTRIG(vf->vf_id));
+   reg &= ~I40E_VPGEN_VFRTRIG_VFSWR_MASK;
+   wr32(hw, I40E_VPGEN_VFRTRIG(vf->vf_id), reg);
+
+   /* reallocate VF resources to finish resetting the VSI state */
if (!i40e_alloc_vf_res(vf)) {
int abs_vf_id = vf->vf_id + hw->func_caps.vf_base_id;
i40e_enable_vf_mappings(vf);
@@ -1006,7 +1029,11 @@ void i40e_reset_vf(struct i40e_vf *vf, bool flr)
i40e_notify_client_of_vf_reset(pf, abs_vf_id);
vf->num_vlan = 0;
}
-   /* tell 

Re: [PATCH net] MAINTAINERS: Add Andrew Lunn as co-maintainer of PHYLIB

2017-03-27 Thread David Miller
From: Florian Fainelli 
Date: Mon, 27 Mar 2017 10:48:11 -0700

> Andrew has been contributing a lot to PHYLIB over the past months and
> his feedback on patches is more than welcome.
> 
> Signed-off-by: Florian Fainelli 

Applied, thanks.


Re: [net-next 00/11][pull request] 40GbE Intel Wired LAN Driver Updates 2017-03-25

2017-03-27 Thread Jeff Kirsher
On Sat, 2017-03-25 at 01:12 -0700, Jeff Kirsher wrote:
> This series contains updates to i40e and i40evf only.
> 
> Alex updates the driver code so that we can do bulk updates of the page
> reference count instead of just incrementing it by one reference at a
> time.  Fixed an issue where we were not resetting skb back to NULL when
> we have freed it.  Cleaned up the i40e_process_skb_fields() to align with
> other Intel drivers.  Removed FCoE code, since it is not supported in any
> of the Fortville/Fortpark hardware, so there is not much point of carrying
> the code around, especially if it is broken and untested.
> 
> Harshitha fixes a bug in the driver where the calculation of the RSS size
> was not taking into account the number of traffic classes enabled.
> 
> Robert fixes a potential race condition during VF reset by eliminating
> IOMMU DMAR Faults caused by VF hardware and when the OS initiates a VF
> reset and before the reset is finished we modify the VF's settings.
> 
> Carolyn adds a needed delay to accommodate the hardware needs.
> 
> Bimmy removes a delay that is no longer needed, since it was only needed
> for preproduction hardware.
> 
> Colin King fixes null pointer dereference, where VSI was being
> dereferenced before the VSI NULL check.
> 
> Jake fixes an issue with the recent addition of the "client code" to the
> driver, where we attempt to use an uninitialized variable, so correctly
> initialize the params variable by calling i40e_client_get_params().

I will re-spin the series without Carolyn's patch, while she works on
the patch to update the documentation/explanation on her change.

> 
> The following are changes since commit
> 2239cc634395ecce69dd047be9104f71edc417b4:
>   Merge branch 'epoll-busypoll'
> and are available in the git repository at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 40GbE
> 
> Alexander Duyck (5):
>   i40e/i40evf: Update code to better handle incrementing page count
>   i40e/i40evf: Fix use after free in Rx cleanup path
>   i40e/i40evf: Clean-up process_skb_fields
>   i40e: Drop FCoE code from core driver files
>   i40e: Drop FCoE code that always evaluates to false or 0
> 
> Bimmy Pujari (1):
>   i40e: removed no longer needed delays
> 
> Carolyn Wyborny (1):
>   i40e: fix for queue timing delays
> 
> Colin Ian King (1):
>   i40evf: dereference VSI after VSI has been null checked
> 
> Harshitha Ramamurthy (1):
>   i40e: fix configuration of RSS table with DCB
> 
> Jacob Keller (1):
>   i40e: initialize params before notifying of l2_param_changes
> 
> Robert Konklewski (1):
>   i40e: Fixed race conditions in VF reset
> 
>  drivers/net/ethernet/intel/Kconfig |  11 -
>  drivers/net/ethernet/intel/i40e/Makefile   |   1 -
>  drivers/net/ethernet/intel/i40e/i40e.h |  62 -
>  drivers/net/ethernet/intel/i40e/i40e_client.c  |   2 +
>  drivers/net/ethernet/intel/i40e/i40e_common.c  |  27 --
>  drivers/net/ethernet/intel/i40e/i40e_debugfs.c |  19 --
>  drivers/net/ethernet/intel/i40e/i40e_ethtool.c |  35 ---
>  drivers/net/ethernet/intel/i40e/i40e_main.c| 297 ++---
>  drivers/net/ethernet/intel/i40e/i40e_osdep.h   |   3 -
>  drivers/net/ethernet/intel/i40e/i40e_prototype.h   |   3 -
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c|  60 ++---
>  drivers/net/ethernet/intel/i40e/i40e_txrx.h|  24 +-
>  drivers/net/ethernet/intel/i40e/i40e_type.h| 138 --
>  drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |  43 ++-
>  drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h |   1 -
>  drivers/net/ethernet/intel/i40evf/i40e_txrx.c  |  33 ++-
>  drivers/net/ethernet/intel/i40evf/i40e_txrx.h  |  19 +-
>  drivers/net/ethernet/intel/i40evf/i40evf_client.c  |   9 +-
>  18 files changed, 114 insertions(+), 673 deletions(-)
> 




Re: [PATCH] net: mvneta: set rx mode during resume if interface is running

2017-03-27 Thread David Miller
From: Jisheng Zhang 
Date: Mon, 27 Mar 2017 18:59:05 +0800

> I found a bug by:
> 
> 0. boot and start dhcp client
> 1. echo mem > /sys/power/state
> 2. resume back immediately
> 3. don't touch dhcp client to renew the lease
> 4. ping the gateway. No acks
> 
> Usually, after step2, the DHCP lease isn't expired, so in theory we
> should resume all back. But in fact, it doesn't. It turns out
> the rx mode isn't resumed correctly. This patch fixes it by adding
> mvneta_set_rx_mode(dev) in the resume hook if interface is running.
> 
> Signed-off-by: Jisheng Zhang 

This doesn't apply cleanly to the net tree, please respin.


Re: [PATCH net-next] ipv6: sr: select DST_CACHE by default

2017-03-27 Thread David Miller
From: David Lebrun 
Date: Mon, 27 Mar 2017 11:43:59 +0200

> When CONFIG_IPV6_SEG6_LWTUNNEL is selected, automatically select DST_CACHE.
> This allows to remove multiple ifdefs.
> 
> Signed-off-by: David Lebrun 

Applied, thanks for following up on this David.


Re: [PATCH 1/3] soc: qcom: smd: Transition client drivers from smd to rpmsg

2017-03-27 Thread David Miller
From: Bjorn Andersson 
Date: Mon, 27 Mar 2017 15:58:37 -0700

> I'm sorry, but I can't figure out how to reproduce this.

All of my builds are "make allmodconfig" so it should be easy to reproduce.


Re: [PATCH net-next] net: ibmveth: Remove unused stats member from struct ibmveth_adapter

2017-03-27 Thread David Miller
From: Tobias Klauser 
Date: Mon, 27 Mar 2017 08:56:15 +0200

> The ibmveth driver keeps its statistics in net_device->stats, so the
> stats member in struct ibmveth_adapter is unused. Remove it.
> 
> Signed-off-by: Tobias Klauser 

Applied.


Re: [PATCH net-next] net: ibmvnic: Remove unused net_stats member from struct ibmvnic_adapter

2017-03-27 Thread David Miller
From: Tobias Klauser 
Date: Mon, 27 Mar 2017 08:56:59 +0200

> The ibmvnic driver keeps its statistics in net_device->stats, so the
> net_stats member in struct ibmvnic_adapter is unused. Remove it.
> 
> Signed-off-by: Tobias Klauser 

Applied.


Re: [PATCH] netvsc: fix dereference before null check errors

2017-03-27 Thread David Miller
From: Haiyang Zhang 
Date: Mon, 27 Mar 2017 00:50:27 +

> 
> 
>> -Original Message-
>> From: Colin King [mailto:colin.k...@canonical.com]
>> Sent: Saturday, March 25, 2017 10:27 AM
>> To: KY Srinivasan ; Haiyang Zhang
>> ; Stephen Hemminger ;
>> de...@linuxdriverproject.org; netdev@vger.kernel.org
>> Cc: kernel-janit...@vger.kernel.org; linux-ker...@vger.kernel.org
>> Subject: [PATCH] netvsc: fix dereference before null check errors
>> 
>> From: Colin Ian King 
>> 
>> ndev is being checked to see if it is a null pointer however before
>> the null check ndev is being dereferenced; hence there is a potential
>> null pointer dereference bug that needs fixing. Fix this by only
>> dereferencing ndev after the null check.
>> 
>> Detected by CoverityScan, CID#1420760, CID#140761 ("Dereference
>> before null check")
>> 
>> Signed-off-by: Colin Ian King 
> 
> Reviewed-by: Haiyang Zhang 

Applied.


Re: [PATCH net-next 1/4] net: mpls: Convert number of nexthops to u8

2017-03-27 Thread Eric W. Biederman
David Ahern  writes:

> On 3/26/17 9:11 PM, Eric W. Biederman wrote:
>> I don't like this.  Byte writes don't exist on all architectures.
>> 
>> So while I think always writing to rtn_nhn_alive under the
>> rtn_lock ensures that we don't have wrong values written
>> it is quite subtle.  And I don't know how this will interact with other
>> fields that you are introducing.
>> 
>> AKA this might be ok, but I expect this formulation of the code
>> will easily bit-rot and break.
>
> net/ has other use cases -- e.g., ipv6 tunneling has proto as a u8.
>
> It unrealistic for a route to have 255 or more nexthops so the point of
> this patch is to not waste 8 bytes tracking it - especially when
> removing it gets routes with ipv4 and ipv6 via's into a cache line.

The argument isn't that 255 nexthops is too few but that there is no
instruction to write to a single byte on some architectures.

My concern is that if we are writing a field using a non-byte write
without care we could easily have confusion with adjacent fields.
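
To make the concern concrete, here is a deterministic userspace sketch of a
lost update. It assumes an architecture where a one-byte store has to be
emulated as load word, modify, store word; the two "writers" are interleaved
by hand instead of using threads, and all fake_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* First half of the emulated byte store: read the containing word. */
static uint32_t fake_load_word(const uint32_t *word)
{
	return *word;
}

/*
 * Second half: patch byte idx (counted from the least significant byte)
 * into the earlier snapshot and write the whole word back.
 */
static void fake_store_byte_via_word(uint32_t *word, uint32_t snapshot,
				     unsigned int idx, uint8_t val)
{
	unsigned int shift = idx * 8;

	assert(idx < 4);
	snapshot &= ~(UINT32_C(0xff) << shift);
	snapshot |= (uint32_t)val << shift;
	*word = snapshot;
}

/* Two writers update different bytes of one word without serialization. */
static uint32_t fake_lost_update_demo(void)
{
	uint32_t word = 0;
	uint32_t a = fake_load_word(&word);	/* writer A reads */
	uint32_t b = fake_load_word(&word);	/* writer B reads */

	fake_store_byte_via_word(&word, a, 0, 0x11);	/* A writes byte 0 */
	fake_store_byte_via_word(&word, b, 1, 0x22);	/* B writes byte 1 */

	/* B wrote back its stale snapshot, so A's byte 0 is gone. */
	return word;
}
```

In this model the final word is 0x00002200: the update to byte 0 is silently
undone by the writer of byte 1, even though the two never touched the same
field.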

> I can make the alive counter a u16 without increasing the size of the
> struct. I'd prefer to leave it as an u8 to have a u8 hole for flags
> should something be needed later.

u16 is no better than u8.

The original architecture was that all changes to an mpls route would
be done in read, copy, allocate a new route, and replace the pointer
fashion.  Which allows rcu access.

There was an argument made that it is silly to do that when the network
device for a hop goes up or down.  Something about the memory allocation
not being reliable as I recall. And so we now have rt_nhn_alive and it is
stored as an int so that it can be read and written atomically.

It is absolutely a no-brainer to change rt_nhn to a u8.  And I very much
appreciate all work to keep mpls_route in a single cache line, as in
practice that is one of the most important factors for performance.

Which leads to the functions mpls_ifup, mpls_ifdown, and
mpls_select_multipath.

To make this all work correctly we need a couple of things.
- A big fat comment on struct mpls_route and mpls_nh about how
  and why these structures are modified and not replaced during
  nexthop processing, including the fact that all modifications
  may only happen with rtnl_lock held.

- The use of READ_ONCE and WRITE_ONCE on all rt->rt_nhn_alive accesses,
  that happen after the route is installed (and is thus rcu reachable).

- The use of READ_ONCE and WRITE_ONCE on all nh->nh_flags accesses,
  that happen after the route is installed (and is thus rcu reachable).

Someone needs to fix mpls_ifup AKA something like:

struct net_device *nh_dev =
rtnl_dereference(nh->nh_dev);

+   unsigned int flags = READ_ONCE(nh->nh_flags);
+   if (nh_dev == dev) {
+   flags &= ~nh_flags;
+   WRITE_ONCE(nh->nh_flags, flags);
+   }
+   if (!(flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN)))
+   alive++;
-   if (!(nh->nh_flags & nh_flags)) {
-   alive++;
-   continue;
-   }
-   if (nh_dev != dev)
-   continue;
-   alive++;
-   nh->nh_flags &= ~nh_flags;
} endfor_nexthops(rt);
 
-   ACCESS_ONCE(rt->rt_nhn_alive) = alive;
+   WRITE_ONCE(rt->rt_nhn_alive, alive);
}
 }
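
The loop suggested above can be tried out as a standalone userspace model.
This is a sketch, not kernel code: FAKE_READ_ONCE and FAKE_WRITE_ONCE are
simplified volatile-cast stand-ins for the kernel macros (they rely on the
GCC/Clang __typeof__ extension), and the fake_* types and flag values are
hypothetical.

```c
#include <assert.h>

/* Simplified stand-ins for READ_ONCE()/WRITE_ONCE(). */
#define FAKE_READ_ONCE(x)	(*(const volatile __typeof__(x) *)&(x))
#define FAKE_WRITE_ONCE(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))

#define FAKE_F_DEAD	0x01u	/* hypothetical flag values */
#define FAKE_F_LINKDOWN	0x10u
#define FAKE_MAX_NH	4

struct fake_nh {
	int dev;		/* stands in for the nexthop's net_device */
	unsigned int flags;
};

struct fake_route {
	struct fake_nh nh[FAKE_MAX_NH];
	unsigned int nhn;
	unsigned int nhn_alive;
};

/* Clear nh_flags on every nexthop using dev and recount live nexthops. */
static void fake_ifup(struct fake_route *rt, int dev, unsigned int nh_flags)
{
	unsigned int alive = 0;

	assert(rt->nhn <= FAKE_MAX_NH);

	for (unsigned int i = 0; i < rt->nhn; i++) {
		struct fake_nh *nh = &rt->nh[i];
		unsigned int flags = FAKE_READ_ONCE(nh->flags);

		if (nh->dev == dev) {
			flags &= ~nh_flags;
			FAKE_WRITE_ONCE(nh->flags, flags);
		}
		if (!(flags & (FAKE_F_DEAD | FAKE_F_LINKDOWN)))
			alive++;
	}

	/* Publish the new count with a single store. */
	FAKE_WRITE_ONCE(rt->nhn_alive, alive);
}
```

Each nexthop's flags are read once into a local, so the alive test and the
write-back work from the same value even if a reader runs concurrently.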

If we comment it all clearly and make very certain that the magic with
nh->nh_flags and rt->rt_nhn_alive works I don't object.  But we need to
let future people who touch the code know: here be dragons.

Especially as our update of rt->rt_nhn_alive can rewrite anything else
in the same 32 bits as that field.  So we need to be very careful to
serialize any update like that.

Eric


Re: [PATCH net-next] net: bfin_mac: Remove unused stats member from struct bfin_mac_local

2017-03-27 Thread David Miller
From: Tobias Klauser 
Date: Mon, 27 Mar 2017 08:55:11 +0200

> The bfin_mac driver keeps its statistics in net_device->stats, so the
> stats member in struct bfin_mac_local is unused. Remove it, as well as
> the accompanying comment.
> 
> Cc: adi-buildroot-de...@lists.sourceforge.net
> Signed-off-by: Tobias Klauser 

Applied.


Re: [PATCH] net: tehuti: use new api ethtool_{get|set}_link_ksettings

2017-03-27 Thread David Miller
From: Philippe Reynes 
Date: Sun, 26 Mar 2017 22:03:13 +0200

> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
> 
> As I don't have the hardware, I'd be very pleased if
> someone may test this patch.
> 
> Signed-off-by: Philippe Reynes 

Applied.


Re: [PATCH net-next] rtnl: Add support for netdev event to link messages

2017-03-27 Thread David Miller
From: Vladislav Yasevich 
Date: Sat, 25 Mar 2017 21:59:47 -0400

> RTNL currently generates notifications on some netdev notifier events.
> However, user space has no idea what changed.  All it sees is the
> data and has to infer what has changed.  For some events that is not
> possible.
> 
> This patch adds a new field to RTM_NEWLINK message called IFLA_EVENT
> that would have an encoding of the event that triggered this
> notification.  Currently, only 2 events (NETDEV_NOTIFY_PEERS and
> NETDEV_MTUCHANGED) are supported.  These events could be interesting
> in the virt space to trigger additional configuration commands to VMs.
> Other events of interest may be added later.
> 
> Signed-off-by: Vladislav Yasevich 

At what point do we start providing the metadata for the changed
values as well?  You'd probably need to provide both the old and
new values to cover all cases.

> @@ -4044,6 +4076,7 @@ static int rtnl_stats_dump(struct sk_buff *skb, struct 
> netlink_callback *cb)
>   return skb->len;
>  }
>  
> +
>  /* Process one rtnetlink message. */
>  
>  static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)

Please don't add more empty lines between functions, one is enough.


Re: [PATCH 1/3] soc: qcom: smd: Transition client drivers from smd to rpmsg

2017-03-27 Thread Bjorn Andersson
On Thu 23 Mar 16:56 PDT 2017, David Miller wrote:

> From: Bjorn Andersson 
> Date: Wed, 22 Mar 2017 14:57:33 -0700
> 
> > On Wed 22 Mar 11:44 PDT 2017, David Miller wrote:
> > 
> >> From: Bjorn Andersson 
> >> Date: Mon, 20 Mar 2017 16:35:42 -0700
> >> 
> >> What is the status of the Kconfig dependency fix and how will I be
> >> getting it?
> >> 
> > 
> > There are two Kconfig dependencies in play here, the first is
> > c3104aae5d8c ("remoteproc: qcom: fix QCOM_SMD dependencies"), this was
> > picked up by Linus yesterday and will as such be in v4.10-rc4.
> > 
> > The other dependency, is the one Marcel wants you to pick up here is
> > https://patchwork.kernel.org/patch/9635385/. It's on LKML, but if you
> > want I can resend it with you as direct recipient, with Marcel's ack.
> > 
> > Likely Arnd would like this fix to be sent upstream for v4.11 already.
> > 
> >> Second, should I merge all three of these patches to net-next or just
> >> this one?
> >> 
> > 
> > I would like all three to be merged in this cycle and in addition I have
> > a couple of patches coming up that will cause some minor conflicts with
> > patch 2 - so I would prefer if patch 2 was available in a tag I can
> > merge into my tree.
> 
> I should have all the dependencies in net-next now, but when I apply
> this series I get undefined symbols:
> 

I'm sorry, but I can't figure out how to reproduce this. I took the
master branch of net-next and applied the three patches in this series.

> [davem@localhost net-next]$ time make -s -j8
> Kernel: arch/x86/boot/bzImage is ready  (#578)
> ERROR: "qcom_rpm_smd_write" [drivers/regulator/qcom_smd-regulator.ko] 
> undefined!

According to drivers/regulator/Kconfig REGULATOR_QCOM_SMD_RPM depends on
QCOM_SMD_RPM and there's nothing tricky here. So if the Kconfig
dependency is met you should have qcom_rpm_smd_write().

> ERROR: "qcom_wcnss_open_channel" 
> [drivers/net/wireless/ath/wcn36xx/wcn36xx.ko] undefined!

WCN36XX depends on QCOM_WCNSS_CTRL || QCOM_WCNSS_CTRL=n, in other words
either qcom_wcnss_open_channel() should be defined from
drivers/soc/qcom/wcnss_ctrl.o (or .ko) or be stubbed by
include/linux/soc/qcom/wcnss_ctrl.h
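
For readers not familiar with the pattern, the "real symbol or header stub"
arrangement can be sketched in plain C. The names below
(FAKE_CONFIG_WCNSS_CTRL, fake_open_channel, fake_client_probe) are
hypothetical and only illustrate the shape; they are not the contents of the
actual header.

```c
#include <errno.h>
#include <stddef.h>

/* Stands in for the Kconfig symbol; 0 models "provider not built". */
#ifndef FAKE_CONFIG_WCNSS_CTRL
#define FAKE_CONFIG_WCNSS_CTRL 0
#endif

#if FAKE_CONFIG_WCNSS_CTRL
/* Provider is built: the real function is linked in from elsewhere. */
int fake_open_channel(const char *name);
#else
/* Provider is disabled: callers get an inline stub that just fails. */
static inline int fake_open_channel(const char *name)
{
	(void)name;
	return -ENXIO;
}
#endif

/* A client that builds and links in either configuration. */
static int fake_client_probe(void)
{
	return fake_open_channel("WLAN_CTRL");
}
```

The "depends on X || X=n" form covers the one case the stub cannot: with the
provider built as a module, the dependency evaluates to m, so the client is
limited to m or n and cannot be built in with an unresolved reference to a
module symbol.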

> ERROR: "qcom_rpm_smd_write" [drivers/clk/qcom/clk-smd-rpm.ko] undefined!

As with REGULATOR_QCOM_SMD_RPM, this depends on QCOM_SMD_RPM - so it
should be covered.

> ERROR: "qcom_wcnss_open_channel" [drivers/bluetooth/btqcomsmd.ko] undefined!

This is the problem that was corrected by  6e9e6cc8f4e4 ("Bluetooth:
btqcomsmd: fix compile-test dependency") and with the same dependencies
as CONFIG_WCN36XX I don't see how this can happen (with that patch
applied).

> scripts/Makefile.modpost:91: recipe for target '__modpost' failed
> 
> Please fix this up.

Can you please help me figure this out, for example by letting me know the
state of the following config options in your local .config?

CONFIG_QCOM_SMD_RPM
CONFIG_QCOM_WCNSS_CTRL
CONFIG_REGULATOR_QCOM_SMD_RPM
CONFIG_RPMSG
CONFIG_WCN36XX

I'm sorry for the inconvenience.

Regards,
Bjorn


Re: [PATCH] net: cris: eth_v10: use new api ethtool_{get|set}_link_ksettings

2017-03-27 Thread David Miller
From: Philippe Reynes 
Date: Sat, 25 Mar 2017 19:39:05 +0100

> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
> 
> As I don't have the hardware, I'd be very pleased if
> someone may test this patch.
> 
> Signed-off-by: Philippe Reynes 

Applied.


Re: [PATCH net-next 0/4] net: mpls: Allow users to configure more labels per route

2017-03-27 Thread David Miller
From: David Ahern 
Date: Sat, 25 Mar 2017 10:03:24 -0700

> Bump the maximum number of labels for MPLS routes from 2 to 12.

This doesn't apply cleanly to net-next, please respin.

Perhaps it conflicts with your recent cleanups.


Re: [patch net-next 0/8] Add support for pipeline debug (dpipe)

2017-03-27 Thread David Miller

Please fix up these warnings and resubmit:

drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c: In function ‘mlxsw_sp_rif_counter_free’:
drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:208:2: warning: ‘p_counter_index’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  mlxsw_sp_rif_counter_edit(mlxsw_sp, rif->rif_index,
  ^
drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c: In function ‘mlxsw_sp_table_erif_entries_dump’:
drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c:220:9: warning: missing braces around initializer [-Wmissing-braces]
  struct devlink_dpipe_value match_value = {0}, action_value = {0};
 ^
drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c:220:9: warning: (near initialization for ‘match_value.’) [-Wmissing-braces]
drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c:220:9: warning: missing braces around initializer [-Wmissing-braces]
drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c:220:9: warning: (near initialization for ‘action_value.’) [-Wmissing-braces]


Re: [PATCH RFC v2 3/3] net: phy: allow EEE with SGMII interface modes

2017-03-27 Thread Russell King - ARM Linux
On Mon, Mar 27, 2017 at 03:42:57PM -0700, Florian Fainelli wrote:
> I think David will require you to resubmit this as an entire patch
> series and without an RFC tag. Do you want to hold off a bit to get
> build coverage or go ahead and target net-next right away?

I'd prefer to get build coverage first - there's no point in not using
the facilities that are available.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


[PATCH net-next] vxlan: don't age NTF_EXT_LEARNED fdb entries

2017-03-27 Thread Roopa Prabhu
From: Roopa Prabhu 

vxlan driver already implicitly supports installing
of external fdb entries with NTF_EXT_LEARNED. This
patch just makes sure these entries are not aged
by the vxlan driver. An external entity managing these
entries will age them out. This is consistent with
the use of NTF_EXT_LEARNED in the bridge driver.

Signed-off-by: Roopa Prabhu 
---
 drivers/net/vxlan.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 09855be..1e54fb5 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2322,6 +2322,9 @@ static void vxlan_cleanup(unsigned long arg)
if (f->state & (NUD_PERMANENT | NUD_NOARP))
continue;
 
+   if (f->flags & NTF_EXT_LEARNED)
+   continue;
+
timeout = f->used + vxlan->cfg.age_interval * HZ;
if (time_before_eq(timeout, jiffies)) {
netdev_dbg(vxlan->dev,
-- 
1.9.1
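
For readers who want to see the effect of the hunk above outside the kernel,
here is a minimal sketch of the aging pass. It is only an illustration under
stated assumptions: struct sketch_fdb, the SKETCH_* flag values and
sketch_age_fdb() are hypothetical stand-ins and not code from vxlan.c, and
plain tick counters replace jiffies.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel flag values; illustration only. */
#define SKETCH_STATE_PERMANENT  0x80
#define SKETCH_STATE_NOARP      0x40
#define SKETCH_FLAG_EXT_LEARNED 0x10

struct sketch_fdb {
	unsigned int state;
	unsigned int flags;
	unsigned long used;	/* last-used timestamp, in ticks */
	bool present;		/* false once the entry has been aged out */
};

/*
 * Walk the table and age out entries that have been idle for at least
 * age_ticks. Permanent/noarp entries and externally learned entries are
 * skipped, mirroring the two "continue" checks in the patch.
 * Returns the number of entries removed.
 */
static size_t sketch_age_fdb(struct sketch_fdb *tbl, size_t n,
			     unsigned long now, unsigned long age_ticks)
{
	size_t removed = 0;

	for (size_t i = 0; i < n; i++) {
		struct sketch_fdb *f = &tbl[i];

		if (!f->present)
			continue;
		if (f->state & (SKETCH_STATE_PERMANENT | SKETCH_STATE_NOARP))
			continue;
		if (f->flags & SKETCH_FLAG_EXT_LEARNED)
			continue;	/* left to the external entity */
		if (now - f->used >= age_ticks) {
			f->present = false;
			removed++;
		}
	}
	return removed;
}
```

With one ordinary, one externally learned and one permanent entry that are all
idle past the limit, only the ordinary entry is removed by this sketch.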



Re: [PATCHv2 net] sctp: change to save MSG_MORE flag into assoc

2017-03-27 Thread Marcelo Ricardo Leitner
On Mon, Mar 27, 2017 at 12:21:15AM +0800, Xin Long wrote:
> David Laight noticed the support for MSG_MORE with datamsg->force_delay
> didn't really work as we expected, as the first msg with MSG_MORE set
> would always block the following chunks' dequeuing.
> 
> This Patch is to rewrite it by saving the MSG_MORE flag into assoc as
> David Laight suggested.
> 
> asoc->force_delay is used to save MSG_MORE flag before a msg is sent.
> All chunks in queue would not be sent out if asoc->force_delay is set
> by the msg with MSG_MORE flag, until a new msg without MSG_MORE flag
> clears asoc->force_delay.
> 
> Note that this change would not affect the flush generated by other
> triggers, like asoc->state != ESTABLISHED, queue size > pmtu etc.
> 
> v1->v2:
>   Not clear asoc->force_delay after sending the msg with MSG_MORE flag.
> 
> Fixes: 4ea0c32f5f42 ("sctp: add support for MSG_MORE")
> Signed-off-by: Xin Long 
> ---
>  include/net/sctp/structs.h | 2 +-
>  net/sctp/output.c  | 2 +-
>  net/sctp/socket.c  | 2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> index 592dece..8caa5ee 100644
> --- a/include/net/sctp/structs.h
> +++ b/include/net/sctp/structs.h
> @@ -499,7 +499,6 @@ struct sctp_datamsg {
>   /* Did the messenge fail to send? */
>   int send_error;
>   u8 send_failed:1,
> -force_delay:1,
>  can_delay;   /* should this message be Nagle delayed */
>  };
>  
> @@ -1878,6 +1877,7 @@ struct sctp_association {
>  
>   __u8 need_ecne:1,   /* Need to send an ECNE Chunk? */
>temp:1,/* Is it a temporary association? */
> +  force_delay:1,
>prsctp_enable:1,
>reconf_enable:1;
>  
> diff --git a/net/sctp/output.c b/net/sctp/output.c
> index 1224421..73fd178 100644
> --- a/net/sctp/output.c
> +++ b/net/sctp/output.c
> @@ -704,7 +704,7 @@ static sctp_xmit_t sctp_packet_can_append_data(struct 
> sctp_packet *packet,
>*/
>  
>   if ((sctp_sk(asoc->base.sk)->nodelay || inflight == 0) &&
> - !chunk->msg->force_delay)
> + !asoc->force_delay)

How is this going to not block the flush on asoc->state != ESTABLISHED?
AFAICT b7018d0b6300 ("sctp: flush out queue once assoc state falls into
SHUTDOWN_PENDING") needs to clear asoc->force_delay too.

The case I have in mind is the same old one:
- app sends a msg with MSG_MORE
- app closes the asoc, without sending the final msg
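
The case above can be reproduced with a toy model. This is only a sketch under
loud assumptions: struct sketch_asoc and the sketch_*() helpers are
hypothetical and do not exist in net/sctp, and the model ignores nodelay,
inflight data and every other flush trigger. It shows only what happens to a
queued chunk when an association-wide delay flag is, or is not, cleared on
shutdown.

```c
#include <stdbool.h>

struct sketch_asoc {
	bool force_delay;	/* models the proposed asoc->force_delay bit */
	bool established;
	unsigned int queued;	/* chunks waiting in the out queue */
	unsigned int sent;	/* chunks put on the wire */
};

/* Flush the queue unless an earlier MSG_MORE send asked us to hold back. */
static void sketch_flush(struct sketch_asoc *a)
{
	if (a->force_delay)
		return;
	a->sent += a->queued;
	a->queued = 0;
}

/* Queue one chunk; 'more' plays the role of MSG_MORE. */
static void sketch_send(struct sketch_asoc *a, bool more)
{
	a->force_delay = more;
	a->queued++;
	sketch_flush(a);
}

/*
 * Begin shutdown. When clear_delay is false, the flag left behind by the
 * last MSG_MORE send is still honoured, so the pending chunk never leaves.
 */
static void sketch_shutdown(struct sketch_asoc *a, bool clear_delay)
{
	a->established = false;
	if (clear_delay)
		a->force_delay = false;
	sketch_flush(a);
}
```

In this model, a send with the flag set followed by a shutdown that does not
clear it leaves one chunk queued; clearing the flag on shutdown drains it.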

>   /* Nothing unacked */
>   return SCTP_XMIT_OK;
>  
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index 0f378ea..baa269a 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -1965,7 +1965,7 @@ static int sctp_sendmsg(struct sock *sk, struct msghdr 
> *msg, size_t msg_len)
>   err = PTR_ERR(datamsg);
>   goto out_free;
>   }
> - datamsg->force_delay = !!(msg->msg_flags & MSG_MORE);
> + asoc->force_delay = !!(msg->msg_flags & MSG_MORE);
>  
>   /* Now send the (possibly) fragmented message. */
>   list_for_each_entry(chunk, &datamsg->chunks, frag_list) {
> -- 
> 2.1.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


Re: [PATCH RFC v2 3/3] net: phy: allow EEE with SGMII interface modes

2017-03-27 Thread Florian Fainelli
On 03/27/2017 01:15 PM, Russell King - ARM Linux wrote:
> Here's the revised patch as requested.
> 
> Thanks.
> 
> 8<===
> From: Russell King 
> Subject: [PATCH] net: phy: allow EEE with any interface mode
> 
> EEE is able to work in any PHY interface mode, there is nothing which
> fundamentally restricts it to only a few modes.  For example, EEE works
> in SGMII mode with the Marvell 88E1512.
> 
> Rather than just adding SGMII mode to the list, Florian suggests
> removing the list of interface modes entirely:
> 
>   It actually sounds like we should just kill the check entirely,
>   it does not appear that any of the interface mode would not
>   fundamentally be able to support EEE, because the "lowest" mode
>   we support is MII, and even there it's quite possible to support
>   EEE.
> 
> Signed-off-by: Russell King 

Reviewed-by: Florian Fainelli 

I think David will require you to resubmit this as an entire patch
series and without an RFC tag. Do you want to hold off a bit to get
build coverage or go ahead and target net-next right away?

Thanks!
-- 
Florian


[PATCH v2] net: netfilter: Remove multiple assignment.

2017-03-27 Thread Arushi Singhal
This patch removes multiple assignments to follow the kernel coding
style as also reported by checkpatch.pl.
Done using coccinelle.
@@
identifier i1,i2;
constant c;
@@
- i1=i2=c;
+ i1=c;
+ i2=i1;

Signed-off-by: Arushi Singhal 
---
changes in v2
 -Make the commit message more clear and appropriate.

 net/netfilter/nf_conntrack_proto_sctp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nf_conntrack_proto_sctp.c 
b/net/netfilter/nf_conntrack_proto_sctp.c
index 33279aab583d..723386bcc2cb 100644
--- a/net/netfilter/nf_conntrack_proto_sctp.c
+++ b/net/netfilter/nf_conntrack_proto_sctp.c
@@ -346,7 +346,8 @@ static int sctp_packet(struct nf_conn *ct,
goto out;
}
 
-   old_state = new_state = SCTP_CONNTRACK_NONE;
+   old_state = SCTP_CONNTRACK_NONE;
+   new_state = old_state;
spin_lock_bh(&ct->lock);
for_each_sctp_chunk (skb, sch, _sch, offset, dataoff, count) {
/* Special cases of Verification tag check (Sec 8.5.1) */
-- 
2.11.0



Re: Extending socket timestamping API for NTP

2017-03-27 Thread Denny Page

> On Mar 27, 2017, at 13:58, Richard Cochran  wrote:
> 
> On Mon, Mar 27, 2017 at 12:18:47PM -0700, Denny Page wrote:
>> I think that on average, the Vendor’s numbers are likely to be more
>> accurate than anyone else’s. The concept that independent software
>> implementations are going to somehow obtain and maintain better
>> numbers is too much of a stretch.
> 
> But you just said that Intel's first published numbers were wrong.  If
> the vendors would have published accurate information, then you would
> not have to have made your own measurements, and the drivers could
> simply use the correct values.
> 
> Sadly, this will never happen.  The vendor's track record is 100%
> fail.  The apps will always need to implement their own, truly correct
> values.  Having "almost correct" values hard coded into the drivers
> only makes things worse.

Yes, Intel’s original numbers were wrong. But that doesn’t mean that other
people’s numbers are going to be particularly better. Even Intel’s original
numbers were far better than most will be able to achieve.

But let’s bring this back to the driver. If someone conducts tests and believes
that they have better numbers than those currently used in the driver, let them
come forward with their information and propose a kernel patch. No harm in that
at all. And much easier than bringing a patch for dozens of applications.

Denny



Re: [PATCH net-next 0/2] net: mpls: multipath route cleanups

2017-03-27 Thread David Miller
From: David Ahern 
Date: Fri, 24 Mar 2017 15:21:55 -0700

> When a device associated with a nexthop is deleted, the nexthop in
> the route is effectively removed, so remove it from the route dump.
> 
> Further, when all nexthops have been deleted the route is effectively
> done, so remove the route.

Series applied, but I agree with Roopa that you need to add that nexthop
NULL device check to lfib_nlmsg_size() too.


Re: [PATCH net-next 2/2] net: stmmac: fix number of tx queues in stmmac_poll

2017-03-27 Thread David Miller
From: Joao Pinto 
Date: Mon, 27 Mar 2017 18:44:22 +0100

> From what I understand, SoCs based on Core versions >= 4.00 are working
> properly and for some reason SoCs based on older versions are not working.

Please send me a revert, and work offline with these people to resolve
all of the regressions you introduced.

Once you resolved all of the regressions, we can put the changes back
in.

Thank you.


Re: Extending socket timestamping API for NTP

2017-03-27 Thread Richard Cochran
On Mon, Mar 27, 2017 at 12:19:57PM -0700, Denny Page wrote:
> Do you still have the resulting correction values from this?

No, I don't, but next time I drag out the phyter I will take another
look and let you know.

Thanks,
Richard


Re: Extending socket timestamping API for NTP

2017-03-27 Thread Richard Cochran
On Mon, Mar 27, 2017 at 12:18:47PM -0700, Denny Page wrote:
> I think that on average, the Vendor’s numbers are likely to be more
> accurate than anyone else’s. The concept that independent software
> implementations are going to somehow obtain and maintain better
> numbers is too much of a stretch.

But you just said that Intel's first published numbers were wrong.  If
the vendors would have published accurate information, then you would
not have to have made your own measurements, and the drivers could
simply use the correct values.

Sadly, this will never happen.  The vendor's track record is 100%
fail.  The apps will always need to implement their own, truly correct
values.  Having "almost correct" values hard coded into the drivers
only makes things worse.

> FWIW, My testing indicates that the 100Mb numbers that Intel
> currently publishes are quite accurate. I don’t believe that Intel
> did the driver corrections btw, if memory serves these values were
> lifted from the Mac.

Huh?  Mac?  -ENOPARSE.

Thanks,
Richard


Re: [PATCH net-next 2/2] cls_flower: add support for matching MPLS labels

2017-03-27 Thread Jiri Pirko
Mon, Mar 27, 2017 at 08:16:02PM CEST, benjamin.laha...@netronome.com wrote:
>Add support to tc flower to match based on fields in MPLS labels (TTL, 
>Bottom of Stack, TC field, Label).
>
>Signed-off-by: Benjamin LaHaise 
>Signed-off-by: Benjamin LaHaise 
>Reviewed-by: Simon Horman 
>Reviewed-by: Jakub Kicinski 
>
>diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
>index 7a69f2a..f1129e3 100644
>--- a/include/uapi/linux/pkt_cls.h
>+++ b/include/uapi/linux/pkt_cls.h
>@@ -432,6 +432,11 @@ enum {
>   TCA_FLOWER_KEY_ARP_THA, /* ETH_ALEN */
>   TCA_FLOWER_KEY_ARP_THA_MASK,/* ETH_ALEN */
> 
>+  TCA_FLOWER_KEY_MPLS_TTL,/* u8 - 8 bits */
>+  TCA_FLOWER_KEY_MPLS_BOS,/* u8 - 1 bit */
>+  TCA_FLOWER_KEY_MPLS_TC, /* u8 - 3 bits */
>+  TCA_FLOWER_KEY_MPLS_LABEL,  /* be32 - 20 bits */
>+
>   __TCA_FLOWER_MAX,
> };
> 
>diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
>index 9d0c99d..24619f9 100644
>--- a/net/sched/cls_flower.c
>+++ b/net/sched/cls_flower.c
>@@ -18,6 +18,7 @@
> #include 
> #include 
> #include 
>+#include 
> 
> #include 
> #include 
>@@ -47,6 +48,7 @@ struct fl_flow_key {
>   struct flow_dissector_key_ipv6_addrs enc_ipv6;
>   };
>   struct flow_dissector_key_ports enc_tp;
>+  struct flow_dissector_key_mpls mpls;
> } __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as 
> longs. */
> 
> struct fl_flow_mask_range {
>@@ -423,6 +425,10 @@ static const struct nla_policy fl_policy[TCA_FLOWER_MAX + 
>1] = {
>   [TCA_FLOWER_KEY_ARP_SHA_MASK]   = { .len = ETH_ALEN },
>   [TCA_FLOWER_KEY_ARP_THA]= { .len = ETH_ALEN },
>   [TCA_FLOWER_KEY_ARP_THA_MASK]   = { .len = ETH_ALEN },
>+  [TCA_FLOWER_KEY_MPLS_TTL]   = { .type = NLA_U8 },
>+  [TCA_FLOWER_KEY_MPLS_BOS]   = { .type = NLA_U8 },
>+  [TCA_FLOWER_KEY_MPLS_TC]= { .type = NLA_U8 },
>+  [TCA_FLOWER_KEY_MPLS_LABEL] = { .type = NLA_U32 },
> };
> 
> static void fl_set_key_val(struct nlattr **tb,
>@@ -438,6 +444,36 @@ static void fl_set_key_val(struct nlattr **tb,
>   memcpy(mask, nla_data(tb[mask_type]), len);
> }
> 
>+static void fl_set_key_mpls(struct nlattr **tb,
>+  struct flow_dissector_key_mpls *key_val,
>+  struct flow_dissector_key_mpls *key_mask)
>+{
>+#define MPLS_TTL_MASK (MPLS_LS_TTL_MASK >> MPLS_LS_TTL_SHIFT)
>+#define MPLS_BOS_MASK (MPLS_LS_S_MASK >> MPLS_LS_S_SHIFT)
>+#define MPLS_TC_MASK  (MPLS_LS_TC_MASK >> MPLS_LS_TC_SHIFT)
>+#define MPLS_LABEL_MASK   (MPLS_LS_LABEL_MASK >> MPLS_LS_LABEL_SHIFT)

I wonder whether these defines should be moved to mpls.h so they could
possibly be re-used?

Other than this, looks fine.

Acked-by: Jiri Pirko 
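
To make the four masks in the quoted hunk concrete, here is a small
self-contained sketch. The MPLS_LS_* values follow the RFC 3032 label stack
entry layout and are assumed to match include/uapi/linux/mpls.h; they are
redefined locally so the snippet builds on its own. struct sketch_mpls_fields
and sketch_mpls_decode() are hypothetical helpers that are not part of the
patch, and the entry is taken in host byte order.

```c
#include <stdint.h>

/* Label stack entry layout per RFC 3032 (assumed equal to linux/mpls.h). */
#define MPLS_LS_LABEL_MASK	0xFFFFF000
#define MPLS_LS_LABEL_SHIFT	12
#define MPLS_LS_TC_MASK		0x00000E00
#define MPLS_LS_TC_SHIFT	9
#define MPLS_LS_S_MASK		0x00000100
#define MPLS_LS_S_SHIFT		8
#define MPLS_LS_TTL_MASK	0x000000FF
#define MPLS_LS_TTL_SHIFT	0

/* Field-width masks, derived the same way as in the quoted patch. */
#define MPLS_TTL_MASK	(MPLS_LS_TTL_MASK >> MPLS_LS_TTL_SHIFT)
#define MPLS_BOS_MASK	(MPLS_LS_S_MASK >> MPLS_LS_S_SHIFT)
#define MPLS_TC_MASK	(MPLS_LS_TC_MASK >> MPLS_LS_TC_SHIFT)
#define MPLS_LABEL_MASK	(MPLS_LS_LABEL_MASK >> MPLS_LS_LABEL_SHIFT)

struct sketch_mpls_fields {
	uint32_t label;	/* 20 bits */
	uint8_t tc;	/* 3 bits */
	uint8_t bos;	/* 1 bit */
	uint8_t ttl;	/* 8 bits */
};

/* Split one host-order label stack entry into its four fields. */
static struct sketch_mpls_fields sketch_mpls_decode(uint32_t entry)
{
	struct sketch_mpls_fields f;

	f.label = (entry & MPLS_LS_LABEL_MASK) >> MPLS_LS_LABEL_SHIFT;
	f.tc = (uint8_t)((entry & MPLS_LS_TC_MASK) >> MPLS_LS_TC_SHIFT);
	f.bos = (uint8_t)((entry & MPLS_LS_S_MASK) >> MPLS_LS_S_SHIFT);
	f.ttl = (uint8_t)((entry & MPLS_LS_TTL_MASK) >> MPLS_LS_TTL_SHIFT);
	return f;
}
```

Under these definitions the derived masks are 0xFF (TTL), 0x1 (BOS), 0x7 (TC)
and 0xFFFFF (label), which matches the bit widths noted in the uapi comments
of the patch.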



>+
>+  if (tb[TCA_FLOWER_KEY_MPLS_TTL]) {
>+  key_val->mpls_ttl = nla_get_u8(tb[TCA_FLOWER_KEY_MPLS_TTL]);
>+  key_mask->mpls_ttl = MPLS_TTL_MASK;
>+  }
>+  if (tb[TCA_FLOWER_KEY_MPLS_BOS]) {
>+  key_val->mpls_bos = nla_get_u8(tb[TCA_FLOWER_KEY_MPLS_BOS]);
>+  key_mask->mpls_bos = MPLS_BOS_MASK;
>+  }
>+  if (tb[TCA_FLOWER_KEY_MPLS_TC]) {
>+  key_val->mpls_tc =
>+  nla_get_u8(tb[TCA_FLOWER_KEY_MPLS_TC]) & MPLS_TC_MASK;
>+  key_mask->mpls_tc = MPLS_TC_MASK;
>+  }
>+  if (tb[TCA_FLOWER_KEY_MPLS_LABEL]) {
>+  key_val->mpls_label =
>+  nla_get_u32(tb[TCA_FLOWER_KEY_MPLS_LABEL]) &
>+  MPLS_LABEL_MASK;
>+  key_mask->mpls_label = MPLS_LABEL_MASK;
>+  }
>+}
>+
> static void fl_set_key_vlan(struct nlattr **tb,
>   struct flow_dissector_key_vlan *key_val,
>   struct flow_dissector_key_vlan *key_mask)
>@@ -594,6 +630,9 @@ static int fl_set_key(struct net *net, struct nlattr **tb,
>  &mask->icmp.code,
>  TCA_FLOWER_KEY_ICMPV6_CODE_MASK,
>  sizeof(key->icmp.code));
>+  } else if (key->basic.n_proto == htons(ETH_P_MPLS_UC) ||
>+ key->basic.n_proto == htons(ETH_P_MPLS_MC)) {
>+  fl_set_key_mpls(tb, &key->mpls, &mask->mpls);
>   } else if (key->basic.n_proto == htons(ETH_P_ARP) ||
>  key->basic.n_proto == htons(ETH_P_RARP)) {
>   fl_set_key_val(tb, &key->arp.sip, TCA_FLOWER_KEY_ARP_SIP,
>@@ -730,6 +769,8 @@ static void fl_init_dissector(struct cls_fl_head *head,
>   FL_KEY_SET_IF_MASKED(>key, keys, cnt,
>FLOW_DISSECTOR_KEY_ARP, arp);
>   FL_KEY_SET_IF_MASKED(>key, keys, cnt,
>+   FLOW_DISSECTOR_KEY_MPLS, mpls);
>+  FL_KEY_SET_IF_MASKED(>key, keys, cnt,
>FLOW_DISSECTOR_KEY_VLAN, vlan);
>   

[net-next 13/14] net/mlx5e: Fail safe tc setup

2017-03-27 Thread Saeed Mahameed
Use the new fail-safe channels switch mechanism to set up new
tc parameters.

Signed-off-by: Saeed Mahameed 
Reviewed-by: Tariq Toukan 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 97e153209834..1e29f40d84ca 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2910,7 +2910,7 @@ int mlx5e_modify_channels_vsd(struct mlx5e_channels *chs, 
bool vsd)
 static int mlx5e_setup_tc(struct net_device *netdev, u8 tc)
 {
struct mlx5e_priv *priv = netdev_priv(netdev);
-   bool was_opened;
+   struct mlx5e_channels new_channels = {};
int err = 0;
 
if (tc && tc != MLX5E_MAX_NUM_TC)
@@ -2918,17 +2918,21 @@ static int mlx5e_setup_tc(struct net_device *netdev, u8 
tc)
 
mutex_lock(&priv->state_lock);
 
-   was_opened = test_bit(MLX5E_STATE_OPENED, &priv->state);
-   if (was_opened)
-   mlx5e_close_locked(priv->netdev);
+   new_channels.params = priv->channels.params;
+   new_channels.params.num_tc = tc ? tc : 1;
 
-   priv->channels.params.num_tc = tc ? tc : 1;
+   if (test_bit(MLX5E_STATE_OPENED, &priv->state)) {
+   priv->channels.params = new_channels.params;
+   goto out;
+   }
 
-   if (was_opened)
-   err = mlx5e_open_locked(priv->netdev);
+   err = mlx5e_open_channels(priv, &new_channels);
+   if (err)
+   goto out;
 
+   mlx5e_switch_priv_channels(priv, &new_channels);
+out:
mutex_unlock(&priv->state_lock);
-
return err;
 }
 
-- 
2.11.0



[net-next 10/14] net/mlx5e: Introduce switch channels

2017-03-27 Thread Saeed Mahameed
A fail-safe helper function that allows switching to new channels on the
fly. In simple words:

make_new_config(new_params)
{
new_channels = open_channels(new_params);
if (!new_channels)
 return "Failed, but current channels are still active :)"

switch_channels(new_channels);

return "SUCCESS";
}

Demonstrate mlx5e_switch_priv_channels usage in set channels ethtool
callback and make it fail-safe using the new switch channels mechanism.
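
The open-then-switch idea in the pseudocode above can be sketched as plain C.
This is an illustration under stated assumptions, not the driver code: the
sketch_* types and helpers are hypothetical, calloc() stands in for opening
hardware channels, and -1 stands in for a real errno value.

```c
#include <stdlib.h>

struct sketch_channels {
	int num;	/* requested channel count */
	int *ring;	/* stand-in for per-channel resources */
};

struct sketch_priv {
	struct sketch_channels channels;	/* currently active set */
};

/* Allocate resources for chs->num channels; negative return on failure. */
static int sketch_open_channels(struct sketch_channels *chs)
{
	if (chs->num <= 0)
		return -1;
	chs->ring = calloc((size_t)chs->num, sizeof(*chs->ring));
	return chs->ring ? 0 : -1;
}

static void sketch_close_channels(struct sketch_channels *chs)
{
	free(chs->ring);
	chs->ring = NULL;
}

/* Install new_chs as the active set, then release the previous one. */
static void sketch_switch_channels(struct sketch_priv *priv,
				   struct sketch_channels *new_chs)
{
	struct sketch_channels old = priv->channels;

	priv->channels = *new_chs;
	sketch_close_channels(&old);
}

/* Fail-safe reconfiguration: the old set stays active if the open fails. */
static int sketch_set_num_channels(struct sketch_priv *priv, int num)
{
	struct sketch_channels new_chs = { .num = num };
	int err = sketch_open_channels(&new_chs);

	if (err)
		return err;	/* active channels untouched */

	sketch_switch_channels(priv, &new_chs);
	return 0;
}
```

The design point is the ordering: the new set is fully opened before the old
one is touched, so a failure never leaves the device without channels.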

Signed-off-by: Saeed Mahameed 
Reviewed-by: Tariq Toukan 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  7 +
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 29 -
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 30 +++---
 3 files changed, 51 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 44c454b34754..2f259dfbf844 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -863,6 +863,13 @@ void mlx5e_build_indir_tir_ctx_hash(struct mlx5e_params 
*params,
 
 int mlx5e_open_locked(struct net_device *netdev);
 int mlx5e_close_locked(struct net_device *netdev);
+
+int mlx5e_open_channels(struct mlx5e_priv *priv,
+   struct mlx5e_channels *chs);
+void mlx5e_close_channels(struct mlx5e_channels *chs);
+void mlx5e_switch_priv_channels(struct mlx5e_priv *priv,
+   struct mlx5e_channels *new_chs);
+
 void mlx5e_build_default_indir_rqt(struct mlx5_core_dev *mdev,
   u32 *indirection_rqt, int len,
   int num_channels);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index b2cd0ef7921e..e5cee400a4d3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -556,8 +556,8 @@ static int mlx5e_set_channels(struct net_device *dev,
 {
struct mlx5e_priv *priv = netdev_priv(dev);
unsigned int count = ch->combined_count;
+   struct mlx5e_channels new_channels = {};
bool arfs_enabled;
-   bool was_opened;
int err = 0;
 
if (!count) {
@@ -571,22 +571,27 @@ static int mlx5e_set_channels(struct net_device *dev,
 
mutex_lock(&priv->state_lock);
 
-   was_opened = test_bit(MLX5E_STATE_OPENED, &priv->state);
-   if (was_opened)
-   mlx5e_close_locked(dev);
+   new_channels.params = priv->channels.params;
+   new_channels.params.num_channels = count;
+   mlx5e_build_default_indir_rqt(priv->mdev, 
new_channels.params.indirection_rqt,
+ MLX5E_INDIR_RQT_SIZE, count);
+
+   if (!test_bit(MLX5E_STATE_OPENED, &priv->state)) {
+   priv->channels.params = new_channels.params;
+   goto out;
+   }
+
+   /* Create fresh channels with new parameters */
+   err = mlx5e_open_channels(priv, &new_channels);
+   if (err)
+   goto out;
 
arfs_enabled = dev->features & NETIF_F_NTUPLE;
if (arfs_enabled)
mlx5e_arfs_disable(priv);
 
-   priv->channels.params.num_channels = count;
-   mlx5e_build_default_indir_rqt(priv->mdev, 
priv->channels.params.indirection_rqt,
- MLX5E_INDIR_RQT_SIZE, count);
-
-   if (was_opened)
-   err = mlx5e_open_locked(dev);
-   if (err)
-   goto out;
+   /* Switch to new channels, set new parameters and close old ones */
+   mlx5e_switch_priv_channels(priv, &new_channels);
 
if (arfs_enabled) {
err = mlx5e_arfs_enable(priv);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index a94f84ec2c1a..97e153209834 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1972,8 +1972,8 @@ static void mlx5e_build_channel_param(struct mlx5e_priv 
*priv,
mlx5e_build_ico_cq_param(priv, icosq_log_wq_sz, >icosq_cq);
 }
 
-static int mlx5e_open_channels(struct mlx5e_priv *priv,
-  struct mlx5e_channels *chs)
+int mlx5e_open_channels(struct mlx5e_priv *priv,
+   struct mlx5e_channels *chs)
 {
struct mlx5e_channel_param *cparam;
int err = -ENOMEM;
@@ -2037,7 +2037,7 @@ static void mlx5e_deactivate_channels(struct 
mlx5e_channels *chs)
mlx5e_deactivate_channel(chs->c[i]);
 }
 
-static void mlx5e_close_channels(struct mlx5e_channels *chs)
+void mlx5e_close_channels(struct mlx5e_channels *chs)
 {
int i;
 
@@ -2533,6 +2533,30 @@ static void mlx5e_deactivate_priv_channels(struct 
mlx5e_priv *priv)
mlx5e_deactivate_channels(&priv->channels);
 }
 
+void 

[net-next 01/14] net/mlx5e: Set SQ max rate on mlx5e_open_txqsq rather on open_channel

2017-03-27 Thread Saeed Mahameed
Instead of iterating over the channel SQs to set their max rate, do it
on SQ creation per TXQ SQ.

Signed-off-by: Saeed Mahameed 
Reviewed-by: Tariq Toukan 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index e849a0fc2653..469d6c147db7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1207,6 +1207,9 @@ static int mlx5e_create_sq_rdy(struct mlx5e_priv *priv,
return err;
 }
 
+static int mlx5e_set_sq_maxrate(struct net_device *dev,
+   struct mlx5e_txqsq *sq, u32 rate);
+
 static int mlx5e_open_txqsq(struct mlx5e_channel *c,
int tc,
struct mlx5e_sq_param *param,
@@ -1214,6 +1217,8 @@ static int mlx5e_open_txqsq(struct mlx5e_channel *c,
 {
struct mlx5e_create_sq_param csp = {};
struct mlx5e_priv *priv = c->priv;
+   u32 tx_rate;
+   int txq_ix;
int err;
 
err = mlx5e_alloc_txqsq(c, tc, param, sq);
@@ -1230,6 +1235,11 @@ static int mlx5e_open_txqsq(struct mlx5e_channel *c,
if (err)
goto err_free_txqsq;
 
+   txq_ix = c->ix + tc * priv->params.num_channels;
+   tx_rate = priv->tx_rates[txq_ix];
+   if (tx_rate)
+   mlx5e_set_sq_maxrate(priv->netdev, sq, tx_rate);
+
netdev_tx_reset_queue(sq->txq);
netif_tx_start_queue(sq->txq);
return 0;
@@ -1692,7 +1702,6 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, 
int ix,
int cpu = mlx5e_get_cpu(priv, ix);
struct mlx5e_channel *c;
int err;
-   int i;
 
c = kzalloc_node(sizeof(*c), GFP_KERNEL, cpu_to_node(cpu));
if (!c)
@@ -1745,17 +1754,6 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, 
int ix,
if (err)
goto err_close_icosq;
 
-   for (i = 0; i < priv->params.num_tc; i++) {
-   u32 txq_ix = priv->channeltc_to_txq_map[ix][i];
-
-   if (priv->tx_rates[txq_ix]) {
-   struct mlx5e_txqsq *sq = priv->txq_to_sq_map[txq_ix];
-
-   mlx5e_set_sq_maxrate(priv->netdev, sq,
-priv->tx_rates[txq_ix]);
-   }
-   }
-
err = c->xdp ? mlx5e_open_xdpsq(c, >xdp_sq, >rq.xdpsq) : 0;
if (err)
goto err_close_sqs;
-- 
2.11.0



[net-next 02/14] net/mlx5e: Set netdev->rx_cpu_rmap on netdev creation

2017-03-27 Thread Saeed Mahameed
To simplify the mlx5e_open_locked flow, set netdev->rx_cpu_rmap on netdev
creation rather than on netdev open; it is redundant to set it on every
call to mlx5e_open_locked.

Signed-off-by: Saeed Mahameed 
Reviewed-by: Tariq Toukan 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 469d6c147db7..f0eff5e30729 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2478,9 +2478,7 @@ int mlx5e_open_locked(struct net_device *netdev)
mlx5e_redirect_rqts(priv);
mlx5e_update_carrier(priv);
mlx5e_timestamp_init(priv);
-#ifdef CONFIG_RFS_ACCEL
-   priv->netdev->rx_cpu_rmap = priv->mdev->rmap;
-#endif
+
if (priv->profile->update_stats)
queue_delayed_work(priv->wq, &priv->update_stats_work, 0);
 
@@ -4022,6 +4020,10 @@ struct net_device *mlx5e_create_netdev(struct 
mlx5_core_dev *mdev,
return NULL;
}
 
+#ifdef CONFIG_RFS_ACCEL
+   netdev->rx_cpu_rmap = mdev->rmap;
+#endif
+
profile->init(mdev, netdev, profile, ppriv);
 
netif_carrier_off(netdev);
-- 
2.11.0



[net-next 03/14] net/mlx5e: Introduce mlx5e_channels

2017-03-27 Thread Saeed Mahameed
Add a dedicated "channels" structure that serves as a holder for the
channels (RQs/SQs/etc.), to help separate channel operations from parameter
operations. This prepares for the downstream fail-safe configuration flow,
where we will create a new instance of mlx5e_channels with the newly
requested parameters and switch to the new channels on the fly.

Signed-off-by: Saeed Mahameed 
Reviewed-by: Tariq Toukan 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  9 ++-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 27 +++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 86 +++---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   | 14 ++--
 4 files changed, 71 insertions(+), 65 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index bace9233dc1f..b00c6688ddcf 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -560,6 +560,11 @@ struct mlx5e_channel {
intcpu;
 };
 
+struct mlx5e_channels {
+   struct mlx5e_channel **c;
+   unsigned int   num;
+};
+
 enum mlx5e_traffic_types {
MLX5E_TT_IPV4_TCP,
MLX5E_TT_IPV6_TCP,
@@ -736,7 +741,7 @@ struct mlx5e_priv {
struct mutex   state_lock; /* Protects Interface state */
struct mlx5e_rqdrop_rq;
 
-   struct mlx5e_channel **channel;
+   struct mlx5e_channels  channels;
u32tisn[MLX5E_MAX_NUM_TC];
struct mlx5e_rqt   indir_rqt;
struct mlx5e_tir   indir_tir[MLX5E_NUM_INDIR_TIRS];
@@ -836,7 +841,7 @@ int mlx5e_vlan_rx_kill_vid(struct net_device *dev, 
__always_unused __be16 proto,
 void mlx5e_enable_vlan_filter(struct mlx5e_priv *priv);
 void mlx5e_disable_vlan_filter(struct mlx5e_priv *priv);
 
-int mlx5e_modify_rqs_vsd(struct mlx5e_priv *priv, bool vsd);
+int mlx5e_modify_channels_vsd(struct mlx5e_channels *chs, bool vsd);
 
 int mlx5e_redirect_rqt(struct mlx5e_priv *priv, u32 rqtn, int sz, int ix);
 void mlx5e_build_indir_tir_ctx_hash(struct mlx5e_priv *priv, void *tirc,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index a004a5a1a4c2..2e54a6564d86 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -152,12 +152,9 @@ static bool mlx5e_query_global_pause_combined(struct 
mlx5e_priv *priv)
 }
 
 #define MLX5E_NUM_Q_CNTRS(priv) (NUM_Q_COUNTERS * (!!priv->q_counter))
-#define MLX5E_NUM_RQ_STATS(priv) \
-   (NUM_RQ_STATS * priv->params.num_channels * \
-test_bit(MLX5E_STATE_OPENED, >state))
+#define MLX5E_NUM_RQ_STATS(priv) (NUM_RQ_STATS * (priv)->channels.num)
 #define MLX5E_NUM_SQ_STATS(priv) \
-   (NUM_SQ_STATS * priv->params.num_channels * priv->params.num_tc * \
-test_bit(MLX5E_STATE_OPENED, >state))
+   (NUM_SQ_STATS * (priv)->channels.num * (priv)->params.num_tc)
 #define MLX5E_NUM_PFC_COUNTERS(priv) \
((mlx5e_query_global_pause_combined(priv) + 
hweight8(mlx5e_query_pfc_combined(priv))) * \
  NUM_PPORT_PER_PRIO_PFC_COUNTERS)
@@ -262,13 +259,13 @@ static void mlx5e_fill_stats_strings(struct mlx5e_priv 
*priv, uint8_t *data)
return;
 
/* per channel counters */
-   for (i = 0; i < priv->params.num_channels; i++)
+   for (i = 0; i < priv->channels.num; i++)
for (j = 0; j < NUM_RQ_STATS; j++)
sprintf(data + (idx++) * ETH_GSTRING_LEN,
rq_stats_desc[j].format, i);
 
for (tc = 0; tc < priv->params.num_tc; tc++)
-   for (i = 0; i < priv->params.num_channels; i++)
+   for (i = 0; i < priv->channels.num; i++)
for (j = 0; j < NUM_SQ_STATS; j++)
sprintf(data + (idx++) * ETH_GSTRING_LEN,
sq_stats_desc[j].format,
@@ -303,6 +300,7 @@ static void mlx5e_get_ethtool_stats(struct net_device *dev,
struct ethtool_stats *stats, u64 *data)
 {
struct mlx5e_priv *priv = netdev_priv(dev);
+   struct mlx5e_channels *channels;
struct mlx5_priv *mlx5_priv;
int i, j, tc, prio, idx = 0;
unsigned long pfc_combined;
@@ -313,6 +311,7 @@ static void mlx5e_get_ethtool_stats(struct net_device *dev,
mutex_lock(&priv->state_lock);
if (test_bit(MLX5E_STATE_OPENED, &priv->state))
mlx5e_update_stats(priv);
+   channels = &priv->channels;
mutex_unlock(&priv->state_lock);
 
for (i = 0; i < NUM_SW_COUNTERS; i++)
@@ -382,16 +381,16 @@ static void mlx5e_get_ethtool_stats(struct net_device 
*dev,
return;
 
/* per channel counters */
-   for (i = 0; i < priv->params.num_channels; i++)
+   for (i = 0; i < 

[net-next 09/14] net/mlx5e: Minimize mlx5e_{open/close}_locked

2017-03-27 Thread Saeed Mahameed
mlx5e_redirect_rqts_to_{channels,drop}, mlx5e_{add,remove}_sqs_fwd_rules
and setting the real number of tx/rx queues belong in
mlx5e_{activate,deactivate}_priv_channels. Move them there and minimize
the mlx5e_open/close flows.

This will be needed in downstream patches to replace old channels with new
ones without the need to call mlx5e_close/open.

Signed-off-by: Saeed Mahameed 
Reviewed-by: Tariq Toukan 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 40 +++
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c  | 10 --
 2 files changed, 26 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index a6e09c46440b..a94f84ec2c1a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2498,14 +2498,33 @@ static void mlx5e_build_channels_tx_maps(struct 
mlx5e_priv *priv)
 
 static void mlx5e_activate_priv_channels(struct mlx5e_priv *priv)
 {
+   int num_txqs = priv->channels.num * priv->channels.params.num_tc;
+   struct net_device *netdev = priv->netdev;
+
+   mlx5e_netdev_set_tcs(netdev);
+   if (netdev->real_num_tx_queues != num_txqs)
+   netif_set_real_num_tx_queues(netdev, num_txqs);
+   if (netdev->real_num_rx_queues != priv->channels.num)
+   netif_set_real_num_rx_queues(netdev, priv->channels.num);
+
mlx5e_build_channels_tx_maps(priv);
mlx5e_activate_channels(&priv->channels);
netif_tx_start_all_queues(priv->netdev);
+
+   if (MLX5_CAP_GEN(priv->mdev, vport_group_manager))
+   mlx5e_add_sqs_fwd_rules(priv);
+
mlx5e_wait_channels_min_rx_wqes(&priv->channels);
+   mlx5e_redirect_rqts_to_channels(priv, &priv->channels);
 }
 
 static void mlx5e_deactivate_priv_channels(struct mlx5e_priv *priv)
 {
+   mlx5e_redirect_rqts_to_drop(priv);
+
+   if (MLX5_CAP_GEN(priv->mdev, vport_group_manager))
+   mlx5e_remove_sqs_fwd_rules(priv);
+
/* FIXME: This is a W/A only for tx timeout watch dog false alarm when
 * polling for inactive tx queues.
 */
@@ -2517,40 +2536,24 @@ static void mlx5e_deactivate_priv_channels(struct 
mlx5e_priv *priv)
 int mlx5e_open_locked(struct net_device *netdev)
 {
struct mlx5e_priv *priv = netdev_priv(netdev);
-   struct mlx5_core_dev *mdev = priv->mdev;
-   int num_txqs;
int err;
 
set_bit(MLX5E_STATE_OPENED, &priv->state);
 
-   mlx5e_netdev_set_tcs(netdev);
-
-   num_txqs = priv->channels.params.num_channels * 
priv->channels.params.num_tc;
-   netif_set_real_num_tx_queues(netdev, num_txqs);
-   netif_set_real_num_rx_queues(netdev, 
priv->channels.params.num_channels);
-
err = mlx5e_open_channels(priv, &priv->channels);
if (err)
goto err_clear_state_opened_flag;
 
mlx5e_refresh_tirs(priv, false);
mlx5e_activate_priv_channels(priv);
-   mlx5e_redirect_rqts_to_channels(priv, &priv->channels);
mlx5e_update_carrier(priv);
mlx5e_timestamp_init(priv);
 
if (priv->profile->update_stats)
queue_delayed_work(priv->wq, &priv->update_stats_work, 0);
 
-   if (MLX5_CAP_GEN(mdev, vport_group_manager)) {
-   err = mlx5e_add_sqs_fwd_rules(priv);
-   if (err)
-   goto err_close_channels;
-   }
return 0;
 
-err_close_channels:
-   mlx5e_close_channels(&priv->channels);
 err_clear_state_opened_flag:
clear_bit(MLX5E_STATE_OPENED, &priv->state);
return err;
@@ -2571,7 +2574,6 @@ int mlx5e_open(struct net_device *netdev)
 int mlx5e_close_locked(struct net_device *netdev)
 {
struct mlx5e_priv *priv = netdev_priv(netdev);
-   struct mlx5_core_dev *mdev = priv->mdev;
 
/* May already be CLOSED in case a previous configuration operation
 * (e.g RX/TX queue size change) that involves close failed.
@@ -2581,12 +2583,8 @@ int mlx5e_close_locked(struct net_device *netdev)
 
clear_bit(MLX5E_STATE_OPENED, &priv->state);
 
-   if (MLX5_CAP_GEN(mdev, vport_group_manager))
-   mlx5e_remove_sqs_fwd_rules(priv);
-
mlx5e_timestamp_cleanup(priv);
netif_carrier_off(priv->netdev);
-   mlx5e_redirect_rqts_to_drop(priv);
mlx5e_deactivate_priv_channels(priv);
mlx5e_close_channels(&priv->channels);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index d277c1979b2a..53db5ec2c122 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -189,12 +189,13 @@ int mlx5e_add_sqs_fwd_rules(struct mlx5e_priv *priv)
struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
struct mlx5_eswitch_rep *rep = priv->ppriv;
struct mlx5e_channel *c;
-   int n, tc, err, num_sqs = 0;
+   int n, 

[net-next 05/14] net/mlx5e: Refactor refresh TIRs

2017-03-27 Thread Saeed Mahameed
Rename mlx5e_refresh_tirs_self_loopback to mlx5e_refresh_tirs, as it
will be used in downstream (safe config flow) patches, and make it
fail-safe in mlx5e_open: a failure is now logged by the function itself
and no longer aborts the open.

Signed-off-by: Saeed Mahameed 
Reviewed-by: Tariq Toukan 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  3 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_common.c   | 17 +++--
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |  8 +---
 drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c |  9 +++--
 4 files changed, 16 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 50dfc4c6c8e4..5f7cc58d900c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -959,8 +959,7 @@ void mlx5e_destroy_tir(struct mlx5_core_dev *mdev,
   struct mlx5e_tir *tir);
 int mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev);
 void mlx5e_destroy_mdev_resources(struct mlx5_core_dev *mdev);
-int mlx5e_refresh_tirs_self_loopback(struct mlx5_core_dev *mdev,
-bool enable_uc_lb);
+int mlx5e_refresh_tirs(struct mlx5e_priv *priv, bool enable_uc_lb);
 
 struct mlx5_eswitch_rep;
 int mlx5e_vport_rep_load(struct mlx5_eswitch *esw,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c
index 20bdbe685795..f1f17f7a3cd0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c
@@ -136,18 +136,20 @@ void mlx5e_destroy_mdev_resources(struct mlx5_core_dev 
*mdev)
mlx5_core_dealloc_pd(mdev, res->pdn);
 }
 
-int mlx5e_refresh_tirs_self_loopback(struct mlx5_core_dev *mdev,
-bool enable_uc_lb)
+int mlx5e_refresh_tirs(struct mlx5e_priv *priv, bool enable_uc_lb)
 {
+   struct mlx5_core_dev *mdev = priv->mdev;
struct mlx5e_tir *tir;
-   void *in;
+   int err  = -ENOMEM;
+   u32 tirn = 0;
int inlen;
-   int err = 0;
+   void *in;
+
 
inlen = MLX5_ST_SZ_BYTES(modify_tir_in);
in = mlx5_vzalloc(inlen);
if (!in)
-   return -ENOMEM;
+   goto out;
 
if (enable_uc_lb)
MLX5_SET(modify_tir_in, in, ctx.self_lb_block,
@@ -156,13 +158,16 @@ int mlx5e_refresh_tirs_self_loopback(struct mlx5_core_dev 
*mdev,
MLX5_SET(modify_tir_in, in, bitmask.self_lb_en, 1);
 
list_for_each_entry(tir, &mdev->mlx5e_res.td.tirs_list, list) {
-   err = mlx5_core_modify_tir(mdev, tir->tirn, in, inlen);
+   tirn = tir->tirn;
+   err = mlx5_core_modify_tir(mdev, tirn, in, inlen);
if (err)
goto out;
}
 
 out:
kvfree(in);
+   if (err)
+   netdev_err(priv->netdev, "refresh tir(0x%x) failed, %d\n", 
tirn, err);
 
return err;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index aec77f075714..a98d01684247 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2498,13 +2498,7 @@ int mlx5e_open_locked(struct net_device *netdev)
goto err_clear_state_opened_flag;
}
 
-   err = mlx5e_refresh_tirs_self_loopback(priv->mdev, false);
-   if (err) {
-   netdev_err(netdev, "%s: mlx5e_refresh_tirs_self_loopback_enable 
failed, %d\n",
-  __func__, err);
-   goto err_close_channels;
-   }
-
+   mlx5e_refresh_tirs(priv, false);
mlx5e_redirect_rqts_to_channels(priv, &priv->channels);
mlx5e_update_carrier(priv);
mlx5e_timestamp_init(priv);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
index 5621dcfda4f1..5225f2226a67 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
@@ -236,12 +236,9 @@ static int mlx5e_test_loopback_setup(struct mlx5e_priv 
*priv,
 {
int err = 0;
 
-   err = mlx5e_refresh_tirs_self_loopback(priv->mdev, true);
-   if (err) {
-   netdev_err(priv->netdev,
-  "\tFailed to enable UC loopback err(%d)\n", err);
+   err = mlx5e_refresh_tirs(priv, true);
+   if (err)
return err;
-   }
 
lbtp->loopback_ok = false;
init_completion(&lbtp->comp);
@@ -258,7 +255,7 @@ static void mlx5e_test_loopback_cleanup(struct mlx5e_priv 
*priv,
struct mlx5e_lbt_priv *lbtp)
 {
dev_remove_pack(&lbtp->pt);
-   mlx5e_refresh_tirs_self_loopback(priv->mdev, false);
+   mlx5e_refresh_tirs(priv, false);
 }
 
 #define MLX5E_LB_VERIFY_TIMEOUT 

[net-next 12/14] net/mlx5e: Fail safe cqe compressing/moderation mode setting

2017-03-27 Thread Saeed Mahameed
Use the new fail-safe channels switch mechanism to set new
CQE compression and CQE moderation mode settings.

We also move the RX CQE compression modify function out of the en_rx
file to a more appropriate place.

Signed-off-by: Saeed Mahameed 
Reviewed-by: Tariq Toukan 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_clock.c |  8 +++-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 53 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c| 22 -
 4 files changed, 51 insertions(+), 34 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 2f259dfbf844..8b93d8d02116 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -833,7 +833,7 @@ void mlx5e_pps_event_handler(struct mlx5e_priv *priv,
 struct ptp_clock_event *event);
 int mlx5e_hwstamp_set(struct net_device *dev, struct ifreq *ifr);
 int mlx5e_hwstamp_get(struct net_device *dev, struct ifreq *ifr);
-void mlx5e_modify_rx_cqe_compression_locked(struct mlx5e_priv *priv, bool val);
+int mlx5e_modify_rx_cqe_compression_locked(struct mlx5e_priv *priv, bool val);
 
 int mlx5e_vlan_rx_add_vid(struct net_device *dev, __always_unused __be16 proto,
  u16 vid);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c
index 485c23b59f93..e706a87fc8b2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c
@@ -90,6 +90,7 @@ int mlx5e_hwstamp_set(struct net_device *dev, struct ifreq 
*ifr)
 {
struct mlx5e_priv *priv = netdev_priv(dev);
struct hwtstamp_config config;
+   int err;
 
if (!MLX5_CAP_GEN(priv->mdev, device_frequency_khz))
return -EOPNOTSUPP;
@@ -129,7 +130,12 @@ int mlx5e_hwstamp_set(struct net_device *dev, struct ifreq 
*ifr)
case HWTSTAMP_FILTER_PTP_V2_DELAY_REQ:
/* Disable CQE compression */
netdev_warn(dev, "Disabling cqe compression");
-   mlx5e_modify_rx_cqe_compression_locked(priv, false);
+   err = mlx5e_modify_rx_cqe_compression_locked(priv, false);
+   if (err) {
+   netdev_err(dev, "Failed disabling cqe compression err=%d\n", err);
+   mutex_unlock(&priv->state_lock);
+   return err;
+   }
config.rx_filter = HWTSTAMP_FILTER_ALL;
break;
default:
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 457a796cc248..c5f49e294987 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -1474,10 +1474,10 @@ static int set_pflag_rx_cqe_based_moder(struct 
net_device *netdev, bool enable)
 {
struct mlx5e_priv *priv = netdev_priv(netdev);
struct mlx5_core_dev *mdev = priv->mdev;
+   struct mlx5e_channels new_channels = {};
bool rx_mode_changed;
u8 rx_cq_period_mode;
int err = 0;
-   bool reset;
 
rx_cq_period_mode = enable ?
MLX5_CQ_PERIOD_MODE_START_FROM_CQE :
@@ -1491,16 +1491,51 @@ static int set_pflag_rx_cqe_based_moder(struct 
net_device *netdev, bool enable)
if (!rx_mode_changed)
return 0;
 
-   reset = test_bit(MLX5E_STATE_OPENED, &priv->state);
-   if (reset)
-   mlx5e_close_locked(netdev);
+   new_channels.params = priv->channels.params;
+   mlx5e_set_rx_cq_mode_params(&new_channels.params, rx_cq_period_mode);
 
-   mlx5e_set_rx_cq_mode_params(&priv->channels.params, rx_cq_period_mode);
+   if (!test_bit(MLX5E_STATE_OPENED, &priv->state)) {
+   priv->channels.params = new_channels.params;
+   return 0;
+   }
+
+   err = mlx5e_open_channels(priv, &new_channels);
+   if (err)
+   return err;
 
-   if (reset)
-   err = mlx5e_open_locked(netdev);
+   mlx5e_switch_priv_channels(priv, &new_channels);
+   return 0;
+}
 
-   return err;
+int mlx5e_modify_rx_cqe_compression_locked(struct mlx5e_priv *priv, bool new_val)
+{
+   bool curr_val = MLX5E_GET_PFLAG(&priv->channels.params, MLX5E_PFLAG_RX_CQE_COMPRESS);
+   struct mlx5e_channels new_channels = {};
+   int err = 0;
+
+   if (!MLX5_CAP_GEN(priv->mdev, cqe_compression))
+   return new_val ? -EOPNOTSUPP : 0;
+
+   if (curr_val == new_val)
+   return 0;
+
+   new_channels.params = priv->channels.params;
+   MLX5E_SET_PFLAG(&new_channels.params, MLX5E_PFLAG_RX_CQE_COMPRESS, new_val);
+
+   mlx5e_set_rq_type_params(priv->mdev, &new_channels.params,
+

[net-next 14/14] net/mlx5e: Fail safe mtu and lro setting

2017-03-27 Thread Saeed Mahameed
Use the new fail-safe channels switch mechanism to set new
netdev MTU and LRO settings.

MTU and LRO settings demand some HW configuration changes after new
channels are created and ready for action. In order to unify the switch
channels routine for LRO and MTU changes, and possibly future
configuration features, we now pass it a HW-modify function pointer to
be invoked directly after the old channels are deactivated and before
the new channels are activated.
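
As a rough illustration of that ordering, here is a minimal stand-alone
sketch. It is not the driver code: demo_priv, demo_switch_channels and
the event log are hypothetical stand-ins, and only the order
"deactivate old, call the hook, activate new" is taken from the
description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the mlx5e types. */
enum demo_event { DEMO_DEACTIVATE_OLD = 1, DEMO_HW_MODIFY, DEMO_ACTIVATE_NEW };

struct demo_priv {
	int active_channels;		/* id of the active channel set */
	int hw_mtu;			/* value "programmed" into HW */
	int sw_mtu;			/* value requested by the user */
	enum demo_event log[8];		/* order in which the steps ran */
	size_t log_len;
};

/* Same shape as the function pointer typedef added by the patch. */
typedef int (*demo_hw_modify_fn)(struct demo_priv *priv);

static void demo_log(struct demo_priv *priv, enum demo_event ev)
{
	assert(priv->log_len < sizeof(priv->log) / sizeof(priv->log[0]));
	priv->log[priv->log_len++] = ev;
}

/* Example hook: push the software MTU to the (simulated) HW. */
static int demo_set_port_mtu(struct demo_priv *priv)
{
	demo_log(priv, DEMO_HW_MODIFY);
	priv->hw_mtu = priv->sw_mtu;
	return 0;
}

static void demo_switch_channels(struct demo_priv *priv, int new_channels,
				 demo_hw_modify_fn hw_modify)
{
	demo_log(priv, DEMO_DEACTIVATE_OLD);
	priv->active_channels = new_channels;

	/* Optional hook: runs while no channels are active. */
	if (hw_modify)
		hw_modify(priv);

	demo_log(priv, DEMO_ACTIVATE_NEW);
}
```

Callers that need no HW change pass NULL for the hook, which matches
how the ethtool call sites in this patch use the new argument.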

Signed-off-by: Saeed Mahameed 
Reviewed-by: Tariq Toukan 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  8 ++-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 12 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 70 ++
 3 files changed, 58 insertions(+), 32 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 8b93d8d02116..150fb52a0737 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -867,8 +867,14 @@ int mlx5e_close_locked(struct net_device *netdev);
 int mlx5e_open_channels(struct mlx5e_priv *priv,
struct mlx5e_channels *chs);
 void mlx5e_close_channels(struct mlx5e_channels *chs);
+
+/* Function pointer to be used to modify HW settings while
+ * switching channels
+ */
+typedef int (*mlx5e_fp_hw_modify)(struct mlx5e_priv *priv);
 void mlx5e_switch_priv_channels(struct mlx5e_priv *priv,
-   struct mlx5e_channels *new_chs);
+   struct mlx5e_channels *new_chs,
+   mlx5e_fp_hw_modify hw_modify);
 
 void mlx5e_build_default_indir_rqt(struct mlx5_core_dev *mdev,
   u32 *indirection_rqt, int len,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index c5f49e294987..40912937d211 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -540,7 +540,7 @@ static int mlx5e_set_ringparam(struct net_device *dev,
if (err)
goto unlock;
 
-   mlx5e_switch_priv_channels(priv, &new_channels);
+   mlx5e_switch_priv_channels(priv, &new_channels, NULL);
 
 unlock:
mutex_unlock(&priv->state_lock);
@@ -597,7 +597,7 @@ static int mlx5e_set_channels(struct net_device *dev,
mlx5e_arfs_disable(priv);
 
/* Switch to new channels, set new parameters and close old ones */
-   mlx5e_switch_priv_channels(priv, &new_channels);
+   mlx5e_switch_priv_channels(priv, &new_channels, NULL);
 
if (arfs_enabled) {
err = mlx5e_arfs_enable(priv);
@@ -691,7 +691,7 @@ static int mlx5e_set_coalesce(struct net_device *netdev,
if (err)
goto out;
 
-   mlx5e_switch_priv_channels(priv, &new_channels);
+   mlx5e_switch_priv_channels(priv, &new_channels, NULL);
 
 out:
mutex_unlock(&priv->state_lock);
@@ -1166,7 +1166,7 @@ static int mlx5e_set_tunable(struct net_device *dev,
err = mlx5e_open_channels(priv, &new_channels);
if (err)
break;
-   mlx5e_switch_priv_channels(priv, &new_channels);
+   mlx5e_switch_priv_channels(priv, &new_channels, NULL);
 
break;
default:
@@ -1503,7 +1503,7 @@ static int set_pflag_rx_cqe_based_moder(struct net_device 
*netdev, bool enable)
if (err)
return err;
 
-   mlx5e_switch_priv_channels(priv, &new_channels);
+   mlx5e_switch_priv_channels(priv, &new_channels, NULL);
return 0;
 }
 
@@ -1534,7 +1534,7 @@ int mlx5e_modify_rx_cqe_compression_locked(struct 
mlx5e_priv *priv, bool new_val
if (err)
return err;
 
-   mlx5e_switch_priv_channels(priv, &new_channels);
+   mlx5e_switch_priv_channels(priv, &new_channels, NULL);
return 0;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 1e29f40d84ca..68d6c3c58ba7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2437,9 +2437,9 @@ static void mlx5e_query_mtu(struct mlx5e_priv *priv, u16 
*mtu)
*mtu = MLX5E_HW2SW_MTU(hw_mtu);
 }
 
-static int mlx5e_set_dev_port_mtu(struct net_device *netdev)
+static int mlx5e_set_dev_port_mtu(struct mlx5e_priv *priv)
 {
-   struct mlx5e_priv *priv = netdev_priv(netdev);
+   struct net_device *netdev = priv->netdev;
u16 mtu;
int err;
 
@@ -2534,7 +2534,8 @@ static void mlx5e_deactivate_priv_channels(struct 
mlx5e_priv *priv)
 }
 
 void mlx5e_switch_priv_channels(struct mlx5e_priv *priv,
-   struct mlx5e_channels *new_chs)
+   struct mlx5e_channels *new_chs,
+   mlx5e_fp_hw_modify hw_modify)
 {

[net-next 04/14] net/mlx5e: Redirect RQT refactoring

2017-03-27 Thread Saeed Mahameed
RQ tables (indirection tables) are created once, on netdev creation,
and at that stage they always point to the drop RQ.

We therefore don't need mlx5e_fill_{direct,indir}_rqt_rqns to fill in
the drop RQ in the create RQT procedure.

Instead of having separate flows to redirect direct and indirect RQ Tables
to the current active channels Receive Queues (RQs), we unify the two
flows by introducing mlx5e_redirect_rqt function and redirect_rqt_param
struct. Combined, they provide one generic logic to fill the RQ table RQ
numbers regardless of the RQ table purpose (direct/indirect).
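
A minimal stand-alone sketch of that idea follows. The demo_* names are
hypothetical, the struct is a simplified version of the one in the diff,
and the round-robin spread in the RSS case is an assumption made for
illustration (the driver consults its indirection table and hash
function instead).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not the mlx5e definitions. */
struct demo_channels {
	uint32_t rqn[4];	/* RQ number of each active channel */
	unsigned int num;	/* number of active channels */
};

struct demo_redirect_param {
	bool is_rss;
	union {
		uint32_t rqn;	/* direct RQN (non-RSS) */
		struct {
			const struct demo_channels *channels;
		} rss;		/* RSS data */
	};
};

/* Pick the RQ number for table entry ix. */
static uint32_t demo_get_rqn(int ix, struct demo_redirect_param rrp)
{
	if (!rrp.is_rss)
		return rrp.rqn;

	/* Assumption: plain round-robin spread over the channels. */
	assert(rrp.rss.channels->num > 0);
	return rrp.rss.channels->rqn[(unsigned int)ix % rrp.rss.channels->num];
}

/* One fill routine serves both direct and indirect tables. */
static void demo_fill_rqt(uint32_t *rqt, int sz, struct demo_redirect_param rrp)
{
	int i;

	for (i = 0; i < sz; i++)
		rqt[i] = demo_get_rqn(i, rrp);
}
```

Pointing a table at the drop RQ is then just the non-RSS case with the
drop RQ number, so no separate fill helper is needed for it.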

The usage is demonstrated by mlx5e_redirect_rqts_to_channels, which is
called on mlx5e_open, and by mlx5e_redirect_rqts_to_drop, which is
called on mlx5e_close.

Signed-off-by: Saeed Mahameed 
Reviewed-by: Tariq Toukan 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  14 +-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |  24 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 169 -
 3 files changed, 129 insertions(+), 78 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index b00c6688ddcf..50dfc4c6c8e4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -843,7 +843,19 @@ void mlx5e_disable_vlan_filter(struct mlx5e_priv *priv);
 
 int mlx5e_modify_channels_vsd(struct mlx5e_channels *chs, bool vsd);
 
-int mlx5e_redirect_rqt(struct mlx5e_priv *priv, u32 rqtn, int sz, int ix);
+struct mlx5e_redirect_rqt_param {
+   bool is_rss;
+   union {
+   u32 rqn; /* Direct RQN (Non-RSS) */
+   struct {
+   u8 hfunc;
+   struct mlx5e_channels *channels;
+   } rss; /* RSS data */
+   };
+};
+
+int mlx5e_redirect_rqt(struct mlx5e_priv *priv, u32 rqtn, int sz,
+  struct mlx5e_redirect_rqt_param rrp);
 void mlx5e_build_indir_tir_ctx_hash(struct mlx5e_priv *priv, void *tirc,
enum mlx5e_traffic_types tt);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 2e54a6564d86..faa21848c9dc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -1027,20 +1027,28 @@ static int mlx5e_set_rxfh(struct net_device *dev, const 
u32 *indir,
 
mutex_lock(&priv->state_lock);
 
-   if (indir) {
-   u32 rqtn = priv->indir_rqt.rqtn;
-
-   memcpy(priv->params.indirection_rqt, indir,
-  sizeof(priv->params.indirection_rqt));
-   mlx5e_redirect_rqt(priv, rqtn, MLX5E_INDIR_RQT_SIZE, 0);
-   }
-
if (hfunc != ETH_RSS_HASH_NO_CHANGE &&
hfunc != priv->params.rss_hfunc) {
priv->params.rss_hfunc = hfunc;
hash_changed = true;
}
 
+   if (indir) {
+   memcpy(priv->params.indirection_rqt, indir,
+  sizeof(priv->params.indirection_rqt));
+
+   if (test_bit(MLX5E_STATE_OPENED, &priv->state)) {
+   u32 rqtn = priv->indir_rqt.rqtn;
+   struct mlx5e_redirect_rqt_param rrp = {
+   .is_rss = true,
+   .rss.hfunc = priv->params.rss_hfunc,
+   .rss.channels  = &priv->channels
+   };
+
+   mlx5e_redirect_rqt(priv, rqtn, MLX5E_INDIR_RQT_SIZE, rrp);
+   }
+   }
+
if (key) {
memcpy(priv->params.toeplitz_hash_key, key,
   sizeof(priv->params.toeplitz_hash_key));
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 920e72ae992e..aec77f075714 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2046,61 +2046,15 @@ static void mlx5e_close_channels(struct mlx5e_priv 
*priv)
chs->num = 0;
 }
 
-static int mlx5e_rx_hash_fn(int hfunc)
-{
-   return (hfunc == ETH_RSS_HASH_TOP) ?
-  MLX5_RX_HASH_FN_TOEPLITZ :
-  MLX5_RX_HASH_FN_INVERTED_XOR8;
-}
-
-static int mlx5e_bits_invert(unsigned long a, int size)
-{
-   int inv = 0;
-   int i;
-
-   for (i = 0; i < size; i++)
-   inv |= (test_bit(size - i - 1, &a) ? 1 : 0) << i;
-
-   return inv;
-}
-
-static void mlx5e_fill_indir_rqt_rqns(struct mlx5e_priv *priv, void *rqtc)
-{
-   int i;
-
-   for (i = 0; i < MLX5E_INDIR_RQT_SIZE; i++) {
-   int ix = i;
-   u32 rqn;
-
-   if (priv->params.rss_hfunc == ETH_RSS_HASH_XOR)
-   ix = mlx5e_bits_invert(i, MLX5E_LOG_INDIR_RQT_SIZE);
-
-   

[net-next 11/14] net/mlx5e: Fail safe ethtool settings

2017-03-27 Thread Saeed Mahameed
Use the new fail-safe channels switch mechanism to set new ethtool
settings:
 - ring parameters
 - coalesce parameters
 - tx copy break parameters
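
The pattern applied to each of these settings can be sketched roughly as
below. This is a simplified stand-alone model, not the driver code: all
demo_* names are hypothetical, and the "resources" are a plain heap
allocation that fails for out-of-range sizes so the failure path can be
exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; not the mlx5e types. */
struct demo_params {
	int log_rq_size;
	int log_sq_size;
};

struct demo_channels {
	struct demo_params params;
	int *rings;	/* resources built from params */
	int num;
};

struct demo_priv {
	bool opened;
	struct demo_channels channels;
};

/* Build resources for chs->params; may fail without side effects. */
static int demo_open_channels(struct demo_channels *chs)
{
	if (chs->params.log_rq_size < 1 || chs->params.log_rq_size > 16)
		return -EINVAL;

	chs->num = 4;
	chs->rings = calloc((size_t)chs->num, sizeof(*chs->rings));
	if (!chs->rings) {
		chs->num = 0;
		return -ENOMEM;
	}
	return 0;
}

static void demo_close_channels(struct demo_channels *chs)
{
	free(chs->rings);
	chs->rings = NULL;
	chs->num = 0;
}

/* Swap in the new channels, then release the old ones. */
static void demo_switch_channels(struct demo_priv *priv,
				 struct demo_channels *new_chs)
{
	struct demo_channels old = priv->channels;

	assert(new_chs->rings);
	priv->channels = *new_chs;
	demo_close_channels(&old);
}

static int demo_set_ring_size(struct demo_priv *priv, int log_rq_size)
{
	struct demo_channels new_chs = {0};
	int err;

	new_chs.params = priv->channels.params;
	new_chs.params.log_rq_size = log_rq_size;

	if (!priv->opened) {
		/* Nothing is running: just remember the new parameters. */
		priv->channels.params = new_chs.params;
		return 0;
	}

	err = demo_open_channels(&new_chs);
	if (err)
		return err;	/* old channels keep running untouched */

	demo_switch_channels(priv, &new_chs);
	return 0;
}
```

The point of the ordering is that the old channels are only torn down
after the new ones were created successfully, so a failed request
leaves the device in its previous working configuration.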

Signed-off-by: Saeed Mahameed 
Reviewed-by: Tariq Toukan 
---
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 120 +
 1 file changed, 73 insertions(+), 47 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index e5cee400a4d3..457a796cc248 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -457,8 +457,8 @@ static int mlx5e_set_ringparam(struct net_device *dev,
 {
struct mlx5e_priv *priv = netdev_priv(dev);
int rq_wq_type = priv->channels.params.rq_wq_type;
+   struct mlx5e_channels new_channels = {};
u32 rx_pending_wqes;
-   bool was_opened;
u32 min_rq_size;
u32 max_rq_size;
u8 log_rq_size;
@@ -527,16 +527,22 @@ static int mlx5e_set_ringparam(struct net_device *dev,
 
mutex_lock(&priv->state_lock);
 
-   was_opened = test_bit(MLX5E_STATE_OPENED, &priv->state);
-   if (was_opened)
-   mlx5e_close_locked(dev);
+   new_channels.params = priv->channels.params;
+   new_channels.params.log_rq_size = log_rq_size;
+   new_channels.params.log_sq_size = log_sq_size;
 
-   priv->channels.params.log_rq_size = log_rq_size;
-   priv->channels.params.log_sq_size = log_sq_size;
+   if (!test_bit(MLX5E_STATE_OPENED, &priv->state)) {
+   priv->channels.params = new_channels.params;
+   goto unlock;
+   }
 
-   if (was_opened)
-   err = mlx5e_open_locked(dev);
+   err = mlx5e_open_channels(priv, &new_channels);
+   if (err)
+   goto unlock;
+
+   mlx5e_switch_priv_channels(priv, &new_channels);
 
+unlock:
mutex_unlock(&priv->state_lock);
 
return err;
@@ -623,36 +629,13 @@ static int mlx5e_get_coalesce(struct net_device *netdev,
return 0;
 }
 
-static int mlx5e_set_coalesce(struct net_device *netdev,
- struct ethtool_coalesce *coal)
+static void
+mlx5e_set_priv_channels_coalesce(struct mlx5e_priv *priv, struct 
ethtool_coalesce *coal)
 {
-   struct mlx5e_priv *priv= netdev_priv(netdev);
struct mlx5_core_dev *mdev = priv->mdev;
-   bool restart =
-   !!coal->use_adaptive_rx_coalesce != 
priv->channels.params.rx_am_enabled;
-   bool was_opened;
-   int err = 0;
int tc;
int i;
 
-   if (!MLX5_CAP_GEN(mdev, cq_moderation))
-   return -EOPNOTSUPP;
-
-   mutex_lock(&priv->state_lock);
-
-   was_opened = test_bit(MLX5E_STATE_OPENED, &priv->state);
-   if (was_opened && restart) {
-   mlx5e_close_locked(netdev);
-   priv->channels.params.rx_am_enabled = 
!!coal->use_adaptive_rx_coalesce;
-   }
-
-   priv->channels.params.tx_cq_moderation.usec = coal->tx_coalesce_usecs;
-   priv->channels.params.tx_cq_moderation.pkts = 
coal->tx_max_coalesced_frames;
-   priv->channels.params.rx_cq_moderation.usec = coal->rx_coalesce_usecs;
-   priv->channels.params.rx_cq_moderation.pkts = 
coal->rx_max_coalesced_frames;
-
-   if (!was_opened || restart)
-   goto out;
for (i = 0; i < priv->channels.num; ++i) {
struct mlx5e_channel *c = priv->channels.c[i];
 
@@ -667,11 +650,50 @@ static int mlx5e_set_coalesce(struct net_device *netdev,
   coal->rx_coalesce_usecs,
   coal->rx_max_coalesced_frames);
}
+}
 
-out:
-   if (was_opened && restart)
-   err = mlx5e_open_locked(netdev);
+static int mlx5e_set_coalesce(struct net_device *netdev,
+ struct ethtool_coalesce *coal)
+{
+   struct mlx5e_priv *priv= netdev_priv(netdev);
+   struct mlx5_core_dev *mdev = priv->mdev;
+   struct mlx5e_channels new_channels = {};
+   int err = 0;
+   bool reset;
 
+   if (!MLX5_CAP_GEN(mdev, cq_moderation))
+   return -EOPNOTSUPP;
+
+   mutex_lock(&priv->state_lock);
+   new_channels.params = priv->channels.params;
+
+   new_channels.params.tx_cq_moderation.usec = coal->tx_coalesce_usecs;
+   new_channels.params.tx_cq_moderation.pkts = 
coal->tx_max_coalesced_frames;
+   new_channels.params.rx_cq_moderation.usec = coal->rx_coalesce_usecs;
+   new_channels.params.rx_cq_moderation.pkts = 
coal->rx_max_coalesced_frames;
+   new_channels.params.rx_am_enabled = 
!!coal->use_adaptive_rx_coalesce;
+
+   if (!test_bit(MLX5E_STATE_OPENED, &priv->state)) {
+   priv->channels.params = new_channels.params;
+   goto out;
+   }
+   /* we are opened */
+
+   reset = !!coal->use_adaptive_rx_coalesce != 

[net-next 08/14] net/mlx5e: CQ and RQ don't need priv pointer

2017-03-27 Thread Saeed Mahameed
Remove the mlx5e_priv pointer from the CQ and RQ structs; it was needed
only to reach the mdev pointer through priv.

Instead we now pass mdev where needed.

Signed-off-by: Saeed Mahameed 
Reviewed-by: Tariq Toukan 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  38 +++--
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 181 +
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |   3 +-
 4 files changed, 99 insertions(+), 125 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 007f91f54fda..44c454b34754 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -280,7 +280,6 @@ struct mlx5e_cq {
struct napi_struct *napi;
struct mlx5_core_cq mcq;
struct mlx5e_channel  *channel;
-   struct mlx5e_priv *priv;
 
/* cqe decompression */
struct mlx5_cqe64  title;
@@ -290,6 +289,7 @@ struct mlx5e_cq {
u16 decmprs_wqe_counter;
 
/* control */
+   struct mlx5_core_dev  *mdev;
struct mlx5_frag_wq_ctrl   wq_ctrl;
 } cacheline_aligned_in_smp;
 
@@ -533,7 +533,7 @@ struct mlx5e_rq {
u32 mpwqe_num_strides;
u32 rqn;
struct mlx5e_channel  *channel;
-   struct mlx5e_priv *priv;
+   struct mlx5_core_dev  *mdev;
struct mlx5_core_mkey  umr_mkey;
 } cacheline_aligned_in_smp;
 
@@ -556,6 +556,8 @@ struct mlx5e_channel {
 
/* control */
struct mlx5e_priv *priv;
+   struct mlx5_core_dev  *mdev;
+   struct mlx5e_tstamp   *tstamp;
int ix;
int cpu;
 };
@@ -715,22 +717,6 @@ enum {
MLX5E_NIC_PRIO
 };
 
-struct mlx5e_profile {
-   void(*init)(struct mlx5_core_dev *mdev,
-   struct net_device *netdev,
-   const struct mlx5e_profile *profile, void *ppriv);
-   void(*cleanup)(struct mlx5e_priv *priv);
-   int (*init_rx)(struct mlx5e_priv *priv);
-   void(*cleanup_rx)(struct mlx5e_priv *priv);
-   int (*init_tx)(struct mlx5e_priv *priv);
-   void(*cleanup_tx)(struct mlx5e_priv *priv);
-   void(*enable)(struct mlx5e_priv *priv);
-   void(*disable)(struct mlx5e_priv *priv);
-   void(*update_stats)(struct mlx5e_priv *priv);
-   int (*max_nch)(struct mlx5_core_dev *mdev);
-   int max_tc;
-};
-
 struct mlx5e_priv {
/* priv data path fields - start */
struct mlx5e_txqsq *txq2sq[MLX5E_MAX_NUM_CHANNELS * MLX5E_MAX_NUM_TC];
@@ -770,6 +756,22 @@ struct mlx5e_priv {
void  *ppriv;
 };
 
+struct mlx5e_profile {
+   void(*init)(struct mlx5_core_dev *mdev,
+   struct net_device *netdev,
+   const struct mlx5e_profile *profile, void *ppriv);
+   void(*cleanup)(struct mlx5e_priv *priv);
+   int (*init_rx)(struct mlx5e_priv *priv);
+   void(*cleanup_rx)(struct mlx5e_priv *priv);
+   int (*init_tx)(struct mlx5e_priv *priv);
+   void(*cleanup_tx)(struct mlx5e_priv *priv);
+   void(*enable)(struct mlx5e_priv *priv);
+   void(*disable)(struct mlx5e_priv *priv);
+   void(*update_stats)(struct mlx5e_priv *priv);
+   int (*max_nch)(struct mlx5_core_dev *mdev);
+   int max_tc;
+};
+
 void mlx5e_build_ptys2ethtool_map(void);
 
 u16 mlx5e_select_queue(struct net_device *dev, struct sk_buff *skb,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index cf8df1d3275e..a6e09c46440b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -491,11 +491,10 @@ static void mlx5e_rq_free_mpwqe_info(struct mlx5e_rq *rq)
kfree(rq->mpwqe.info);
 }
 
-static int mlx5e_create_umr_mkey(struct mlx5e_priv *priv,
+static int mlx5e_create_umr_mkey(struct mlx5_core_dev *mdev,
 u64 npages, u8 page_shift,
 struct mlx5_core_mkey *umr_mkey)
 {
-   struct mlx5_core_dev *mdev = priv->mdev;
int inlen = MLX5_ST_SZ_BYTES(create_mkey_in);
void *mkc;
u32 *in;
@@ -529,12 +528,11 @@ static int mlx5e_create_umr_mkey(struct mlx5e_priv *priv,
return err;
 }
 
-static int mlx5e_create_rq_umr_mkey(struct mlx5e_rq *rq)
+static int mlx5e_create_rq_umr_mkey(struct mlx5_core_dev *mdev, struct 
mlx5e_rq *rq)
 {
-   struct mlx5e_priv *priv = rq->priv;
u64 num_mtts = MLX5E_REQUIRED_MTTS(mlx5_wq_ll_get_size(&rq->wq));
 
-   return mlx5e_create_umr_mkey(priv, num_mtts, PAGE_SHIFT, &rq->umr_mkey);
+ 

[net-next 07/14] net/mlx5e: Isolate open_channels from priv->params

2017-03-27 Thread Saeed Mahameed
In order to have a clean separation between the channels resource
creation flows and the currently active mlx5e netdev parameters, make
sure each resource creation function does not access priv->params and
works only with a fresh set of parameters.

For this we add a new mlx5e_params field to the mlx5e_channels structure
and pass it down to mlx5e_open_{cq,rq,sq} and so on.
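
A small stand-alone sketch of what this buys us is shown below. The
macro shape follows the MLX5E_SET_PFLAG/MLX5E_GET_PFLAG change in this
patch (they now take a params pointer rather than priv); every demo_*
name and the demo_candidate helper are hypothetical and exist only for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag and struct names; only the macro shape follows the patch. */
enum demo_pflag {
	DEMO_PFLAG_RX_CQE_BASED_MODER = (1 << 0),
	DEMO_PFLAG_RX_CQE_COMPRESS    = (1 << 1),
};

struct demo_params { uint32_t pflags; };
struct demo_channels { struct demo_params params; };
struct demo_priv { struct demo_channels channels; };

/* Operate on any parameter set, not only on the active one. */
#define DEMO_SET_PFLAG(params, pflag, enable)			\
	do {							\
		if (enable)					\
			(params)->pflags |= (uint32_t)(pflag);	\
		else						\
			(params)->pflags &= ~(uint32_t)(pflag);	\
	} while (0)

#define DEMO_GET_PFLAG(params, pflag) \
	(!!((params)->pflags & (uint32_t)(pflag)))

/* Build a candidate parameter set without touching the active one. */
static struct demo_params demo_candidate(const struct demo_priv *priv,
					 int compress)
{
	struct demo_params p;

	assert(priv);
	p = priv->channels.params;
	DEMO_SET_PFLAG(&p, DEMO_PFLAG_RX_CQE_COMPRESS, compress);
	return p;
}
```

Because the helpers no longer reach into priv, a candidate configuration
can be prepared and validated while the active parameters stay as they
are.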

Signed-off-by: Saeed Mahameed 
Reviewed-by: Tariq Toukan 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  22 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_clock.c |   2 +-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 119 +++---
 .../ethernet/mellanox/mlx5/core/en_fs_ethtool.c|   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 448 ++---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |  61 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c|   8 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c|   7 +-
 8 files changed, 328 insertions(+), 341 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index f1895ebe7fe5..007f91f54fda 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -182,15 +182,15 @@ enum mlx5e_priv_flag {
MLX5E_PFLAG_RX_CQE_COMPRESS = (1 << 1),
 };
 
-#define MLX5E_SET_PFLAG(priv, pflag, enable)   \
+#define MLX5E_SET_PFLAG(params, pflag, enable) \
do {\
if (enable) \
-   (priv)->params.pflags |= (pflag);   \
+   (params)->pflags |= (pflag);\
else\
-   (priv)->params.pflags &= ~(pflag);  \
+   (params)->pflags &= ~(pflag);   \
} while (0)
 
-#define MLX5E_GET_PFLAG(priv, pflag) (!!((priv)->params.pflags & (pflag)))
+#define MLX5E_GET_PFLAG(params, pflag) (!!((params)->pflags & (pflag)))
 
 #ifdef CONFIG_MLX5_CORE_EN_DCB
 #define MLX5E_MAX_BW_ALLOC 100 /* Max percentage of BW allocation */
@@ -213,7 +213,6 @@ struct mlx5e_params {
bool rx_cqe_compress_def;
struct mlx5e_cq_moder rx_cq_moderation;
struct mlx5e_cq_moder tx_cq_moderation;
-   u16 min_rx_wqes;
bool lro_en;
u32 lro_wqe_sz;
u16 tx_max_inline;
@@ -225,6 +224,7 @@ struct mlx5e_params {
bool rx_am_enabled;
u32 lro_timeout;
u32 pflags;
+   struct bpf_prog *xdp_prog;
 };
 
 #ifdef CONFIG_MLX5_CORE_EN_DCB
@@ -357,7 +357,6 @@ struct mlx5e_txqsq {
/* control path */
struct mlx5_wq_ctrl wq_ctrl;
struct mlx5e_channel  *channel;
-   int tc;
int txq_ix;
u32 rate_limit;
 } cacheline_aligned_in_smp;
@@ -564,6 +563,7 @@ struct mlx5e_channel {
 struct mlx5e_channels {
struct mlx5e_channel **c;
unsigned int   num;
+   struct mlx5e_params params;
 };
 
 enum mlx5e_traffic_types {
@@ -735,7 +735,6 @@ struct mlx5e_priv {
/* priv data path fields - start */
struct mlx5e_txqsq *txq2sq[MLX5E_MAX_NUM_CHANNELS * MLX5E_MAX_NUM_TC];
int channel_tc2txq[MLX5E_MAX_NUM_CHANNELS][MLX5E_MAX_NUM_TC];
-   struct bpf_prog *xdp_prog;
/* priv data path fields - end */
 
unsigned long  state;
@@ -752,7 +751,6 @@ struct mlx5e_priv {
struct mlx5e_flow_steering fs;
struct mlx5e_vxlan_db  vxlan;
 
-   struct mlx5e_params params;
struct workqueue_struct *wq;
struct work_struct update_carrier_work;
struct work_struct set_rx_mode_work;
@@ -857,8 +855,9 @@ struct mlx5e_redirect_rqt_param {
 
 int mlx5e_redirect_rqt(struct mlx5e_priv *priv, u32 rqtn, int sz,
   struct mlx5e_redirect_rqt_param rrp);
-void mlx5e_build_indir_tir_ctx_hash(struct mlx5e_priv *priv, void *tirc,
-   enum mlx5e_traffic_types tt);
+void mlx5e_build_indir_tir_ctx_hash(struct mlx5e_params *params,
+   enum mlx5e_traffic_types tt,
+   void *tirc);
 
 int mlx5e_open_locked(struct net_device *netdev);
 int mlx5e_close_locked(struct net_device *netdev);
@@ -869,7 +868,8 @@ int mlx5e_get_max_linkspeed(struct mlx5_core_dev *mdev, u32 
*speed);
 
 void mlx5e_set_rx_cq_mode_params(struct mlx5e_params *params,
 u8 cq_period_mode);
-void mlx5e_set_rq_type_params(struct mlx5e_priv *priv, u8 rq_type);
+void mlx5e_set_rq_type_params(struct mlx5_core_dev *mdev,
+ struct mlx5e_params *params, u8 rq_type);
 
 static inline
 struct mlx5e_tx_wqe *mlx5e_post_nop(struct 

[pull request][net-next 00/14] Mellanox mlx5e Fail-safe config

2017-03-27 Thread Saeed Mahameed
Hi Dave,

This series provides a fail-safe mechanism for safely re-configuring the
mlx5e netdevice and provides resiliency against sporadic
configuration failures.

For additional information please see below.

Please pull and let me know if there's any problem.

Thanks,
Saeed.

---

The following changes since commit 88275ed0cb3ac89ed869a925337b951801b154d7:

  Merge branch 'netvsc-next' (2017-03-25 20:15:56 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git 
tags/mlx5e-failsafe

for you to fetch changes up to 2e20a151205be8e7efa9644cdb942381e7bec787:

  net/mlx5e: Fail safe mtu and lro setting (2017-03-27 15:08:24 +0300)


mlx5e-failsafe 27-03-2017

This series provides a fail-safe mechanism for safely re-configuring the
mlx5e netdevice and provides resiliency against sporadic
configuration failures.

To enable this we do some refactoring and code reorganizing to allow
breaking the driver's open/close flows into stages:
  open -> activate -> deactivate -> close.

In addition we need to allow creating fresh HW ring resources
(mlx5e_channels) with their own "new" set of parameters, while keeping
the current ones running and active until the new channels are
successfully created with the new configuration, and only then we can
safely replace (switch) old channels with new ones.

For that we introduce mlx5e_channels object and an API to manage it:
 - channels = open_channels(new_params):
   open fresh TX/RX channels
 - activate_channels(channels):
   redirect traffic to them and attach them to the netdev
 - deactivate_channels(channels)
   stop traffic and detach from netdev
 - close(channels)
   Free the TX/RX HW resources of those channels

With the above strategy it is straightforward to achieve the desired
behavior of fail-safe configuration.  In pseudo code:

make_new_config(new_params)
{
old_channels = current_active_channels;
new_channels = create_channels(new_params);
if (!new_channels)
return "Failed, but current channels are still active :)"

deactivate_channels(old_channels); /* Can't fail */
set_hw_new_state();/* If needed  */
activate_channels(new_channels);   /* Can't fail */
close_channels(old_channels);
current_active_channels = new_channels;

return "SUCCESS";
}
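
The pseudo code above can be sketched as a small, self-contained C
program. This is only an illustration of the open/activate/deactivate/close
ordering, not the driver's code: struct params, struct channels and every
function below are hypothetical stand-ins, and the "hardware" work is
reduced to a heap allocation so that the failure path can be exercised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's parameter and channel objects. */
struct params {
	int num_channels;
	int ring_size;
};

struct channels {
	struct params params;	/* each channel set carries its own parameters */
	bool active;
};

/* The "heavy" step: it may fail (bad parameters, out of memory). */
static struct channels *open_channels(const struct params *p)
{
	struct channels *chs;

	if (p->num_channels <= 0 || p->ring_size <= 0)
		return NULL;

	chs = calloc(1, sizeof(*chs));
	if (!chs)
		return NULL;

	chs->params = *p;
	return chs;
}

/* The cheap steps: they only flip state or release memory and cannot fail. */
static void activate_channels(struct channels *chs)   { chs->active = true; }
static void deactivate_channels(struct channels *chs) { chs->active = false; }
static void close_channels(struct channels *chs)      { free(chs); }

/*
 * Replace *current with channels built from new_params.
 * Returns 0 on success. On failure returns -1 and leaves *current
 * untouched, so the old channels keep running.
 */
static int switch_channels(struct channels **current,
			   const struct params *new_params)
{
	struct channels *old_chs;
	struct channels *new_chs;

	assert(current != NULL);
	old_chs = *current;

	new_chs = open_channels(new_params);
	if (!new_chs)
		return -1;	/* nothing was torn down yet */

	if (old_chs)
		deactivate_channels(old_chs);
	activate_channels(new_chs);
	if (old_chs)
		close_channels(old_chs);

	*current = new_chs;
	return 0;
}
```

The point of the ordering is that the only step that can fail runs before
anything belonging to the old configuration is touched.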

At the top of this series, we change the following flows to be fail-safe:
ethtool:
   - ring parameters
   - coalesce parameters
   - tx copy break parameters
   - cqe compressing/moderation mode setting (priv flags)
ndos:
   - tc setup
   - set features: LRO
   - change mtu


Saeed Mahameed (14):
  net/mlx5e: Set SQ max rate on mlx5e_open_txqsq rather on open_channel
  net/mlx5e: Set netdev->rx_cpu_rmap on netdev creation
  net/mlx5e: Introduce mlx5e_channels
  net/mlx5e: Redirect RQT refactoring
  net/mlx5e: Refactor refresh TIRs
  net/mlx5e: Split open/close channels to stages
  net/mlx5e: Isolate open_channels from priv->params
  net/mlx5e: CQ and RQ don't need priv pointer
  net/mlx5e: Minimize mlx5e_{open/close}_locked
  net/mlx5e: Introduce switch channels
  net/mlx5e: Fail safe ethtool settings
  net/mlx5e: Fail safe cqe compressing/moderation mode setting
  net/mlx5e: Fail safe tc setup
  net/mlx5e: Fail safe mtu and lro setting

 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  106 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_clock.c |   10 +-
 .../net/ethernet/mellanox/mlx5/core/en_common.c|   17 +-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |  336 +++---
 .../ethernet/mellanox/mlx5/core/en_fs_ethtool.c|2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 1172 +++-
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |   83 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c|   22 -
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c |2 +-
 .../net/ethernet/mellanox/mlx5/core/en_selftest.c  |9 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c|   11 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |3 +-
 12 files changed, 984 insertions(+), 789 deletions(-)


[net-next 06/14] net/mlx5e: Split open/close channels to stages

2017-03-27 Thread Saeed Mahameed
As a foundation for the safe config flow, we need a simple, clear API
(open, then activate), where "open" handles the heavy, failure-prone
creation work and "activate" is fast and fail-safe and merely
enables the newly created channels.

For this we split the RQs/TXQ SQs and channels open/close flows to
open => activate, deactivate => close.

This will simplify the ability to have fail safe configuration changes
in downstream patches as follows:

make_new_config(new_params)
{
 old_channels = current_active_channels;
 new_channels = create_channels(new_params);
 if (!new_channels)
  return "Failed, but current channels still active :)"
 deactivate_channels(old_channels); /* Can't fail */
 activate_channels(new_channels); /* Can't fail */
 close_channels(old_channels);
 current_active_channels = new_channels;

 return "SUCCESS";
}

Signed-off-by: Saeed Mahameed 
Reviewed-by: Tariq Toukan 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |   5 +-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 214 ++---
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c|   4 +-
 4 files changed, 148 insertions(+), 77 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 5f7cc58d900c..f1895ebe7fe5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -358,6 +358,7 @@ struct mlx5e_txqsq {
struct mlx5_wq_ctrlwq_ctrl;
struct mlx5e_channel  *channel;
inttc;
+   inttxq_ix;
u32rate_limit;
 } cacheline_aligned_in_smp;
 
@@ -732,8 +733,8 @@ struct mlx5e_profile {
 
 struct mlx5e_priv {
/* priv data path fields - start */
-   struct mlx5e_txqsq **txq_to_sq_map;
-   int channeltc_to_txq_map[MLX5E_MAX_NUM_CHANNELS][MLX5E_MAX_NUM_TC];
+   struct mlx5e_txqsq *txq2sq[MLX5E_MAX_NUM_CHANNELS * MLX5E_MAX_NUM_TC];
+   int channel_tc2txq[MLX5E_MAX_NUM_CHANNELS][MLX5E_MAX_NUM_TC];
struct bpf_prog *xdp_prog;
/* priv data path fields - end */
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index faa21848c9dc..5159358a242d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -269,7 +269,7 @@ static void mlx5e_fill_stats_strings(struct mlx5e_priv 
*priv, uint8_t *data)
for (j = 0; j < NUM_SQ_STATS; j++)
sprintf(data + (idx++) * ETH_GSTRING_LEN,
sq_stats_desc[j].format,
-   priv->channeltc_to_txq_map[i][tc]);
+   priv->channel_tc2txq[i][tc]);
 }
 
 static void mlx5e_get_strings(struct net_device *dev,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index a98d01684247..6be7c2367d41 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -820,6 +820,8 @@ static int mlx5e_wait_for_min_rx_wqes(struct mlx5e_rq *rq)
msleep(20);
}
 
+   netdev_warn(priv->netdev, "Failed to get min RX wqes on RQN[0x%x] wq 
cur_sz(%d) min_rx_wqes(%d)\n",
+   rq->rqn, wq->cur_sz, priv->params.min_rx_wqes);
return -ETIMEDOUT;
 }
 
@@ -848,9 +850,6 @@ static int mlx5e_open_rq(struct mlx5e_channel *c,
 struct mlx5e_rq_param *param,
 struct mlx5e_rq *rq)
 {
-   struct mlx5e_icosq *sq = &c->icosq;
-   u16 pi = sq->pc & sq->wq.sz_m1;
-   struct mlx5e_tx_wqe *nopwqe;
int err;
 
err = mlx5e_alloc_rq(c, param, rq);
@@ -861,7 +860,6 @@ static int mlx5e_open_rq(struct mlx5e_channel *c,
if (err)
goto err_free_rq;
 
-   set_bit(MLX5E_RQ_STATE_ENABLED, &rq->state);
err = mlx5e_modify_rq_state(rq, MLX5_RQC_STATE_RST, MLX5_RQC_STATE_RDY);
if (err)
goto err_destroy_rq;
@@ -869,14 +867,9 @@ static int mlx5e_open_rq(struct mlx5e_channel *c,
if (param->am_enabled)
set_bit(MLX5E_RQ_STATE_AM, &c->rq.state);
 
-   sq->db.ico_wqe[pi].opcode = MLX5_OPCODE_NOP;
-   sq->db.ico_wqe[pi].num_wqebbs = 1;
-   nopwqe = mlx5e_post_nop(&sq->wq, sq->sqn, &sq->pc);
-   mlx5e_notify_hw(&sq->wq, sq->pc, sq->uar_map, &nopwqe->ctrl);
return 0;
 
 err_destroy_rq:
-   clear_bit(MLX5E_RQ_STATE_ENABLED, &rq->state);
mlx5e_destroy_rq(rq);
 err_free_rq:
mlx5e_free_rq(rq);
@@ -884,12 +877,28 @@ static int mlx5e_open_rq(struct mlx5e_channel *c,
return err;
 }
 
-static void 

Re: [PATCH net-next 2/2] cls_flower: add support for matching MPLS labels

2017-03-27 Thread Benjamin LaHaise
On Mon, Mar 27, 2017 at 10:30:41PM +0200, Jiri Pirko wrote:
> Mon, Mar 27, 2017 at 08:16:02PM CEST, benjamin.laha...@netronome.com wrote:
> >Add support to tc flower to match based on fields in MPLS labels (TTL, 
> >Bottom of Stack, TC field, Label).
> 
> Please use scripts/get_maintainer.pl to get list of ccs for the patches
> you submit.

Oops.  Adding Jamal to the Cc -- please holler if you want me to resend.

-ben

> 
> >
> >Signed-off-by: Benjamin LaHaise 
> >Signed-off-by: Benjamin LaHaise 
> >Reviewed-by: Simon Horman 
> >Reviewed-by: Jakub Kicinski 
> >
> >diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
> >index 7a69f2a..f1129e3 100644
> >--- a/include/uapi/linux/pkt_cls.h
> >+++ b/include/uapi/linux/pkt_cls.h
> >@@ -432,6 +432,11 @@ enum {
> > TCA_FLOWER_KEY_ARP_THA, /* ETH_ALEN */
> > TCA_FLOWER_KEY_ARP_THA_MASK,/* ETH_ALEN */
> > 
> >+TCA_FLOWER_KEY_MPLS_TTL,/* u8 - 8 bits */
> >+TCA_FLOWER_KEY_MPLS_BOS,/* u8 - 1 bit */
> >+TCA_FLOWER_KEY_MPLS_TC, /* u8 - 3 bits */
> >+TCA_FLOWER_KEY_MPLS_LABEL,  /* be32 - 20 bits */
> >+
> > __TCA_FLOWER_MAX,
> > };
> > 
> >diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
> >index 9d0c99d..24619f9 100644
> >--- a/net/sched/cls_flower.c
> >+++ b/net/sched/cls_flower.c
> >@@ -18,6 +18,7 @@
> > #include 
> > #include 
> > #include 
> >+#include 
> > 
> > #include 
> > #include 
> >@@ -47,6 +48,7 @@ struct fl_flow_key {
> > struct flow_dissector_key_ipv6_addrs enc_ipv6;
> > };
> > struct flow_dissector_key_ports enc_tp;
> >+struct flow_dissector_key_mpls mpls;
> > } __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as 
> > longs. */
> > 
> > struct fl_flow_mask_range {
> >@@ -423,6 +425,10 @@ static const struct nla_policy fl_policy[TCA_FLOWER_MAX 
> >+ 1] = {
> > [TCA_FLOWER_KEY_ARP_SHA_MASK]   = { .len = ETH_ALEN },
> > [TCA_FLOWER_KEY_ARP_THA]= { .len = ETH_ALEN },
> > [TCA_FLOWER_KEY_ARP_THA_MASK]   = { .len = ETH_ALEN },
> >+[TCA_FLOWER_KEY_MPLS_TTL]   = { .type = NLA_U8 },
> >+[TCA_FLOWER_KEY_MPLS_BOS]   = { .type = NLA_U8 },
> >+[TCA_FLOWER_KEY_MPLS_TC]= { .type = NLA_U8 },
> >+[TCA_FLOWER_KEY_MPLS_LABEL] = { .type = NLA_U32 },
> > };
> > 
> > static void fl_set_key_val(struct nlattr **tb,
> >@@ -438,6 +444,36 @@ static void fl_set_key_val(struct nlattr **tb,
> > memcpy(mask, nla_data(tb[mask_type]), len);
> > }
> > 
> >+static void fl_set_key_mpls(struct nlattr **tb,
> >+struct flow_dissector_key_mpls *key_val,
> >+struct flow_dissector_key_mpls *key_mask)
> >+{
> >+#define MPLS_TTL_MASK   (MPLS_LS_TTL_MASK >> MPLS_LS_TTL_SHIFT)
> >+#define MPLS_BOS_MASK   (MPLS_LS_S_MASK >> MPLS_LS_S_SHIFT)
> >+#define MPLS_TC_MASK(MPLS_LS_TC_MASK >> MPLS_LS_TC_SHIFT)
> >+#define MPLS_LABEL_MASK (MPLS_LS_LABEL_MASK >> MPLS_LS_LABEL_SHIFT)
> >+
> >+if (tb[TCA_FLOWER_KEY_MPLS_TTL]) {
> >+key_val->mpls_ttl = nla_get_u8(tb[TCA_FLOWER_KEY_MPLS_TTL]);
> >+key_mask->mpls_ttl = MPLS_TTL_MASK;
> >+}
> >+if (tb[TCA_FLOWER_KEY_MPLS_BOS]) {
> >+key_val->mpls_bos = nla_get_u8(tb[TCA_FLOWER_KEY_MPLS_BOS]);
> >+key_mask->mpls_bos = MPLS_BOS_MASK;
> >+}
> >+if (tb[TCA_FLOWER_KEY_MPLS_TC]) {
> >+key_val->mpls_tc =
> >+nla_get_u8(tb[TCA_FLOWER_KEY_MPLS_TC]) & MPLS_TC_MASK;
> >+key_mask->mpls_tc = MPLS_TC_MASK;
> >+}
> >+if (tb[TCA_FLOWER_KEY_MPLS_LABEL]) {
> >+key_val->mpls_label =
> >+nla_get_u32(tb[TCA_FLOWER_KEY_MPLS_LABEL]) &
> >+MPLS_LABEL_MASK;
> >+key_mask->mpls_label = MPLS_LABEL_MASK;
> >+}
> >+}
> >+
> > static void fl_set_key_vlan(struct nlattr **tb,
> > struct flow_dissector_key_vlan *key_val,
> > struct flow_dissector_key_vlan *key_mask)
> >@@ -594,6 +630,9 @@ static int fl_set_key(struct net *net, struct nlattr 
> >**tb,
> >   &mask->icmp.code,
> >TCA_FLOWER_KEY_ICMPV6_CODE_MASK,
> >sizeof(key->icmp.code));
> >+} else if (key->basic.n_proto == htons(ETH_P_MPLS_UC) ||
> >+   key->basic.n_proto == htons(ETH_P_MPLS_MC)) {
> >+fl_set_key_mpls(tb, &key->mpls, &mask->mpls);
> > } else if (key->basic.n_proto == htons(ETH_P_ARP) ||
> >key->basic.n_proto == htons(ETH_P_RARP)) {
> > fl_set_key_val(tb, &key->arp.sip, TCA_FLOWER_KEY_ARP_SIP,
> >@@ -730,6 +769,8 @@ static void fl_init_dissector(struct cls_fl_head *head,
> > FL_KEY_SET_IF_MASKED(&mask->key, keys, cnt,
> >  FLOW_DISSECTOR_KEY_ARP, arp);
> > 

Re: [PATCH net-next 1/2] flow_dissector: add mpls support

2017-03-27 Thread Jiri Pirko
Mon, Mar 27, 2017 at 08:13:42PM CEST, benjamin.laha...@netronome.com wrote:
>Add support for parsing MPLS flows to the flow dissector in preparation for
>adding MPLS match support to cls_flower.
>
>Signed-off-by: Benjamin LaHaise 
>Signed-off-by: Benjamin LaHaise 
>Reviewed-by: Simon Horman 
>Reviewed-by: Jakub Kicinski 
>
>diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
>index ac97030..00d704f 100644
>--- a/include/net/flow_dissector.h
>+++ b/include/net/flow_dissector.h
>@@ -41,6 +41,13 @@ struct flow_dissector_key_vlan {
>   u16 padding;
> };
> 
>+struct flow_dissector_key_mpls {
>+  u32 mpls_ttl : 8,
^ remove this space
   ^
 unnecessary tab

>+  mpls_bos : 1,
>+  mpls_tc : 3,
>+  mpls_label : 20;
>+};
>+
> struct flow_dissector_key_keyid {
>   __be32  keyid;
  ^^ also tab not necessary


Other than these nits, the patch looks good to me.
Reviewed-by: Jiri Pirko 



> };
>@@ -169,6 +176,7 @@ enum flow_dissector_key_id {
>   FLOW_DISSECTOR_KEY_ENC_IPV6_ADDRS, /* struct 
> flow_dissector_key_ipv6_addrs */
>   FLOW_DISSECTOR_KEY_ENC_CONTROL, /* struct flow_dissector_key_control */
>   FLOW_DISSECTOR_KEY_ENC_PORTS, /* struct flow_dissector_key_ports */
>+  FLOW_DISSECTOR_KEY_MPLS, /* struct flow_dissector_key_mpls */
> 
>   FLOW_DISSECTOR_KEY_MAX,
> };
>diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
>index 5f3ae92..15185d8 100644
>--- a/net/core/flow_dissector.c
>+++ b/net/core/flow_dissector.c
>@@ -126,9 +126,11 @@ __skb_flow_dissect_mpls(const struct sk_buff *skb,
> {
>   struct flow_dissector_key_keyid *key_keyid;
>   struct mpls_label *hdr, _hdr[2];
>+  u32 entry, label;
> 
>   if (!dissector_uses_key(flow_dissector,
>-  FLOW_DISSECTOR_KEY_MPLS_ENTROPY))
>+  FLOW_DISSECTOR_KEY_MPLS_ENTROPY) &&
>+  !dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_MPLS))
>   return FLOW_DISSECT_RET_OUT_GOOD;
> 
>   hdr = __skb_header_pointer(skb, nhoff, sizeof(_hdr), data,
>@@ -136,8 +138,25 @@ __skb_flow_dissect_mpls(const struct sk_buff *skb,
>   if (!hdr)
>   return FLOW_DISSECT_RET_OUT_BAD;
> 
>-  if ((ntohl(hdr[0].entry) & MPLS_LS_LABEL_MASK) >>
>-  MPLS_LS_LABEL_SHIFT == MPLS_LABEL_ENTROPY) {
>+  entry = ntohl(hdr[0].entry);
>+  label = (entry & MPLS_LS_LABEL_MASK) >> MPLS_LS_LABEL_SHIFT;
>+
>+  if (dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_MPLS)) {
>+  struct flow_dissector_key_mpls *key_mpls;
>+
>+  key_mpls = skb_flow_dissector_target(flow_dissector,
>+   FLOW_DISSECTOR_KEY_MPLS,
>+   target_container);
>+  key_mpls->mpls_label = label;
>+  key_mpls->mpls_ttl = (entry & MPLS_LS_TTL_MASK)
>+  >> MPLS_LS_TTL_SHIFT;
>+  key_mpls->mpls_tc = (entry & MPLS_LS_TC_MASK)
>+  >> MPLS_LS_TC_SHIFT;
>+  key_mpls->mpls_bos = (entry & MPLS_LS_S_MASK)
>+  >> MPLS_LS_S_SHIFT;
>+  }
>+
>+  if (label == MPLS_LABEL_ENTROPY) {
>   key_keyid = skb_flow_dissector_target(flow_dissector,
> 
> FLOW_DISSECTOR_KEY_MPLS_ENTROPY,
> target_container);
>
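
For readers following the review, the field extraction that the patch
above performs on the first label stack entry can be sketched in user
space as follows. This is a minimal sketch, not the kernel code: the
LS_* constants are written out locally and are assumed to mirror the
MPLS_LS_* masks and shifts from the uapi header (label in the top 20
bits, then 3 bits of TC, the bottom-of-stack bit, and 8 bits of TTL),
and struct mpls_fields and both helpers are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to match MPLS_LS_*_MASK / MPLS_LS_*_SHIFT in the uapi header. */
#define LS_LABEL_MASK	0xFFFFF000u
#define LS_LABEL_SHIFT	12
#define LS_TC_MASK	0x00000E00u
#define LS_TC_SHIFT	9
#define LS_S_MASK	0x00000100u
#define LS_S_SHIFT	8
#define LS_TTL_MASK	0x000000FFu
#define LS_TTL_SHIFT	0

/* Hypothetical plain struct; the kernel key uses bitfields instead. */
struct mpls_fields {
	uint32_t label;	/* 20 bits */
	uint8_t tc;	/* 3 bits */
	uint8_t bos;	/* 1 bit, bottom of stack */
	uint8_t ttl;	/* 8 bits */
};

/* Assemble the 32-bit entry from network (big-endian) byte order. */
static uint32_t mpls_entry_from_wire(const uint8_t b[4])
{
	assert(b != NULL);
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Split a host-order label stack entry into its four fields. */
static struct mpls_fields mpls_decode(uint32_t entry)
{
	struct mpls_fields f;

	f.label = (entry & LS_LABEL_MASK) >> LS_LABEL_SHIFT;
	f.tc = (uint8_t)((entry & LS_TC_MASK) >> LS_TC_SHIFT);
	f.bos = (uint8_t)((entry & LS_S_MASK) >> LS_S_SHIFT);
	f.ttl = (uint8_t)((entry & LS_TTL_MASK) >> LS_TTL_SHIFT);
	return f;
}
```

For example, the wire bytes 12 34 5b 40 (hex) decode to label 0x12345,
TC 5, bottom-of-stack 1 and TTL 64.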


Re: [PATCH net-next 2/2] cls_flower: add support for matching MPLS labels

2017-03-27 Thread Jiri Pirko
Mon, Mar 27, 2017 at 08:16:02PM CEST, benjamin.laha...@netronome.com wrote:
>Add support to tc flower to match based on fields in MPLS labels (TTL, 
>Bottom of Stack, TC field, Label).

Please use scripts/get_maintainer.pl to get list of ccs for the patches
you submit.

>
>Signed-off-by: Benjamin LaHaise 
>Signed-off-by: Benjamin LaHaise 
>Reviewed-by: Simon Horman 
>Reviewed-by: Jakub Kicinski 
>
>diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
>index 7a69f2a..f1129e3 100644
>--- a/include/uapi/linux/pkt_cls.h
>+++ b/include/uapi/linux/pkt_cls.h
>@@ -432,6 +432,11 @@ enum {
>   TCA_FLOWER_KEY_ARP_THA, /* ETH_ALEN */
>   TCA_FLOWER_KEY_ARP_THA_MASK,/* ETH_ALEN */
> 
>+  TCA_FLOWER_KEY_MPLS_TTL,/* u8 - 8 bits */
>+  TCA_FLOWER_KEY_MPLS_BOS,/* u8 - 1 bit */
>+  TCA_FLOWER_KEY_MPLS_TC, /* u8 - 3 bits */
>+  TCA_FLOWER_KEY_MPLS_LABEL,  /* be32 - 20 bits */
>+
>   __TCA_FLOWER_MAX,
> };
> 
>diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
>index 9d0c99d..24619f9 100644
>--- a/net/sched/cls_flower.c
>+++ b/net/sched/cls_flower.c
>@@ -18,6 +18,7 @@
> #include 
> #include 
> #include 
>+#include 
> 
> #include 
> #include 
>@@ -47,6 +48,7 @@ struct fl_flow_key {
>   struct flow_dissector_key_ipv6_addrs enc_ipv6;
>   };
>   struct flow_dissector_key_ports enc_tp;
>+  struct flow_dissector_key_mpls mpls;
> } __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as 
> longs. */
> 
> struct fl_flow_mask_range {
>@@ -423,6 +425,10 @@ static const struct nla_policy fl_policy[TCA_FLOWER_MAX + 
>1] = {
>   [TCA_FLOWER_KEY_ARP_SHA_MASK]   = { .len = ETH_ALEN },
>   [TCA_FLOWER_KEY_ARP_THA]= { .len = ETH_ALEN },
>   [TCA_FLOWER_KEY_ARP_THA_MASK]   = { .len = ETH_ALEN },
>+  [TCA_FLOWER_KEY_MPLS_TTL]   = { .type = NLA_U8 },
>+  [TCA_FLOWER_KEY_MPLS_BOS]   = { .type = NLA_U8 },
>+  [TCA_FLOWER_KEY_MPLS_TC]= { .type = NLA_U8 },
>+  [TCA_FLOWER_KEY_MPLS_LABEL] = { .type = NLA_U32 },
> };
> 
> static void fl_set_key_val(struct nlattr **tb,
>@@ -438,6 +444,36 @@ static void fl_set_key_val(struct nlattr **tb,
>   memcpy(mask, nla_data(tb[mask_type]), len);
> }
> 
>+static void fl_set_key_mpls(struct nlattr **tb,
>+  struct flow_dissector_key_mpls *key_val,
>+  struct flow_dissector_key_mpls *key_mask)
>+{
>+#define MPLS_TTL_MASK (MPLS_LS_TTL_MASK >> MPLS_LS_TTL_SHIFT)
>+#define MPLS_BOS_MASK (MPLS_LS_S_MASK >> MPLS_LS_S_SHIFT)
>+#define MPLS_TC_MASK  (MPLS_LS_TC_MASK >> MPLS_LS_TC_SHIFT)
>+#define MPLS_LABEL_MASK   (MPLS_LS_LABEL_MASK >> MPLS_LS_LABEL_SHIFT)
>+
>+  if (tb[TCA_FLOWER_KEY_MPLS_TTL]) {
>+  key_val->mpls_ttl = nla_get_u8(tb[TCA_FLOWER_KEY_MPLS_TTL]);
>+  key_mask->mpls_ttl = MPLS_TTL_MASK;
>+  }
>+  if (tb[TCA_FLOWER_KEY_MPLS_BOS]) {
>+  key_val->mpls_bos = nla_get_u8(tb[TCA_FLOWER_KEY_MPLS_BOS]);
>+  key_mask->mpls_bos = MPLS_BOS_MASK;
>+  }
>+  if (tb[TCA_FLOWER_KEY_MPLS_TC]) {
>+  key_val->mpls_tc =
>+  nla_get_u8(tb[TCA_FLOWER_KEY_MPLS_TC]) & MPLS_TC_MASK;
>+  key_mask->mpls_tc = MPLS_TC_MASK;
>+  }
>+  if (tb[TCA_FLOWER_KEY_MPLS_LABEL]) {
>+  key_val->mpls_label =
>+  nla_get_u32(tb[TCA_FLOWER_KEY_MPLS_LABEL]) &
>+  MPLS_LABEL_MASK;
>+  key_mask->mpls_label = MPLS_LABEL_MASK;
>+  }
>+}
>+
> static void fl_set_key_vlan(struct nlattr **tb,
>   struct flow_dissector_key_vlan *key_val,
>   struct flow_dissector_key_vlan *key_mask)
>@@ -594,6 +630,9 @@ static int fl_set_key(struct net *net, struct nlattr **tb,
>  &mask->icmp.code,
>  TCA_FLOWER_KEY_ICMPV6_CODE_MASK,
>  sizeof(key->icmp.code));
>+  } else if (key->basic.n_proto == htons(ETH_P_MPLS_UC) ||
>+ key->basic.n_proto == htons(ETH_P_MPLS_MC)) {
>+  fl_set_key_mpls(tb, &key->mpls, &mask->mpls);
>   } else if (key->basic.n_proto == htons(ETH_P_ARP) ||
>  key->basic.n_proto == htons(ETH_P_RARP)) {
>   fl_set_key_val(tb, &key->arp.sip, TCA_FLOWER_KEY_ARP_SIP,
>@@ -730,6 +769,8 @@ static void fl_init_dissector(struct cls_fl_head *head,
>   FL_KEY_SET_IF_MASKED(&mask->key, keys, cnt,
>FLOW_DISSECTOR_KEY_ARP, arp);
>   FL_KEY_SET_IF_MASKED(&mask->key, keys, cnt,
>+   FLOW_DISSECTOR_KEY_MPLS, mpls);
>+  FL_KEY_SET_IF_MASKED(&mask->key, keys, cnt,
>FLOW_DISSECTOR_KEY_VLAN, vlan);
>   FL_KEY_SET_IF_MASKED(&mask->key, keys, cnt,
>

Re: [PATCH RFC net-next 3/3] net: phy: stop the PHY clock during LPI only if supported

2017-03-27 Thread Andrew Lunn
On Mon, Mar 27, 2017 at 11:47:21AM -0700, Florian Fainelli wrote:
> Now that we detect whether a PHY supports stopping its clock during LPI,
> deny a call to phy_init_eee() with clk_stop_enable being set and the PHY
> not supporting that.

Hi Florian

We are not denying the call. This just ignores the clk_stop_enable
parameter if the clock cannot be stopped. So I think this message
could be better worded.

Maybe also update the function comment?

* and it programs the MMD register 3.0 setting the "Clock stop enable"
* bit if supported by the device.

  Andrew


Re: [PATCH RFC net-next 2/3] net: phy: read whether PHY supports stopping clock during LPI

2017-03-27 Thread Andrew Lunn
On Mon, Mar 27, 2017 at 11:47:20AM -0700, Florian Fainelli wrote:
> In order to use phy_init_eee() correctly, in particular the clk_stop
> argument, we need to know whether the Ethernet PHY supports stopping its
> clock.
> 
> Right now, we would have to call phy_init_eee(phydev, 1), see if that
> tails, and call again with phy_init_eee(phydev, 0) to enable EEE this is
> not an acceptable API use.

Hi Florian

I'm having trouble parsing this paragraph. Should tails be fails?  I
think "This is not an acceptable API use." should be a sentence?

> 
> Update phy_init_hw() to read whether the PHY supports this, and retain
> that information in the phydev structure so we can re-use it later.
> 
> Signed-off-by: Florian Fainelli 
> ---
>  drivers/net/phy/phy_device.c | 23 ++-
>  include/linux/phy.h  |  2 ++
>  2 files changed, 24 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index 1219eeab69d1..2755d77626f7 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -837,6 +837,21 @@ static int phy_poll_reset(struct phy_device *phydev)
>   return 0;
>  }
>  
> +static void phy_read_clock_stop_capable(struct phy_device *phydev)
> +{
> + int ret;
> +
> + /* Read if the PHY supports stopping its clocks (reg 3.1) */
> + ret = phy_read_mmd_indirect(phydev, MDIO_MMD_PCS, MDIO_STAT1,
> + phydev->addr);
> + if (ret < 0)
> + return;

It is pretty unusual to ignore a real error. It might be better to
make this an int function, and return the error.

 Andrew
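
A rough sketch of what the int-returning variant could look like,
written as stand-alone C so it can be compiled outside the kernel.
Everything here is an assumption made for illustration: struct fake_phy
and fake_read_pcs_stat1() stand in for the phydev and the MMD read, and
the capability bit value 0x0040 is assumed to match
MDIO_PCS_STAT1_CLKSTOP_CAP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed to match MDIO_PCS_STAT1_CLKSTOP_CAP (bit 6 of PCS status 1). */
#define PCS_STAT1_CLKSTOP_CAP	0x0040

/* Hypothetical stand-in for struct phy_device. */
struct fake_phy {
	int pcs_stat1;		/* simulated read result: value or negative errno */
	bool clk_stop_capable;	/* cached capability, reused by later calls */
};

/* Hypothetical stand-in for the MMD register read of reg 3.1. */
static int fake_read_pcs_stat1(const struct fake_phy *phy)
{
	return phy->pcs_stat1;
}

/*
 * Read whether the PHY can stop its clock during LPI and cache the answer.
 * Returns 0 on success or the negative error from the read, so the caller
 * decides whether a failed read is fatal instead of it being dropped here.
 */
static int phy_read_clock_stop_capable(struct fake_phy *phy)
{
	int ret;

	assert(phy != NULL);

	ret = fake_read_pcs_stat1(phy);
	if (ret < 0)
		return ret;

	phy->clk_stop_capable = (ret & PCS_STAT1_CLKSTOP_CAP) != 0;
	return 0;
}
```

With this shape the init path can propagate the error with a plain
"if (ret < 0) return ret;" rather than silently continuing.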


Re: [PATCH RFC v4 07/10] dt-bindings: net: add binding for QCA7000 UART

2017-03-27 Thread Rob Herring
On Mon, Mar 27, 2017 at 8:37 AM, Stefan Wahren  wrote:
> This is the serdev binding for the QCA7000 UART driver (Ethernet over UART).
>
> Signed-off-by: Stefan Wahren 
> ---
>
> There are still some open questions about this binding:
>
> Where should the optional hardware flow control be defined (at master or 
> slave side)?

Probably should be in the slave side. We already have uart-has-rtscts
and rts/cts-gpios for the UART. Those mean we have RTS/CTS, but not
necessarily that we want to enable them.

In many cases, the driver may know what it needs.

> Is it okay to have two bindings (qca-qca7000-spi and qca-qca7000-uart) or 
> should they be merged?

Are they mutually exclusive, or can both be used at the same time? What
are the dependencies between the interfaces?

>
>
>  .../devicetree/bindings/net/qca-qca7000-uart.txt   | 31 
> ++
>  1 file changed, 31 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/net/qca-qca7000-uart.txt
>
> diff --git a/Documentation/devicetree/bindings/net/qca-qca7000-uart.txt 
> b/Documentation/devicetree/bindings/net/qca-qca7000-uart.txt
> new file mode 100644
> index 000..f2e0450
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/qca-qca7000-uart.txt
> @@ -0,0 +1,31 @@
> +* Qualcomm QCA7000 (Ethernet over UART protocol)
> +
> +Note: This binding applies in case the QCA7000 is configured as a
> +UART slave device. It is possible to preconfigure the UART settings
> +of the QCA7000 firmware, which can't be changed during runtime.
> +
> +Required properties:
> +- compatible: Should be "qca,qca7000-uart"
> +
> +Optional properties:
> +- local-mac-address : 6 bytes, Specifies MAC address

The description can be "see ./ethernet.txt"

> +- current-speed : Specifies the serial device speed in
> + bits per second (default = 115200), which is
> + predefined by the QCA7000 firmware configuration

Add this to the slave binding doc with some caveats as to when this
should or should not be used as we discussed.

Rob


Re: [PATCH RFC v2 3/3] net: phy: allow EEE with SGMII interface modes

2017-03-27 Thread Russell King - ARM Linux
Here's the revised patch as requested.

Thanks.

8<===
From: Russell King 
Subject: [PATCH] net: phy: allow EEE with any interface mode

EEE is able to work in any PHY interface mode, there is nothing which
fundamentally restricts it to only a few modes.  For example, EEE works
in SGMII mode with the Marvell 88E1512.

Rather than just adding SGMII mode to the list, Florian suggests
removing the list of interface modes entirely:

  It actually sounds like we should just kill the check entirely,
  it does not appear that any of the interface mode would not
  fundamentally be able to support EEE, because the "lowest" mode
  we support is MII, and even there it's quite possible to support
  EEE.

Signed-off-by: Russell King 
---
 drivers/net/phy/phy.c | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 345251f21699..867c42154087 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -1208,15 +1208,8 @@ int phy_init_eee(struct phy_device *phydev, bool 
clk_stop_enable)
return -EIO;
 
/* According to 802.3az,the EEE is supported only in full duplex-mode.
-* Also EEE feature is active when core is operating with MII, GMII
-* or RGMII (all kinds). Internal PHYs are also allowed to proceed and
-* should return an error if they do not support EEE.
 */
-   if ((phydev->duplex == DUPLEX_FULL) &&
-   ((phydev->interface == PHY_INTERFACE_MODE_MII) ||
-   (phydev->interface == PHY_INTERFACE_MODE_GMII) ||
-phy_interface_is_rgmii(phydev) ||
-phy_is_internal(phydev))) {
+   if (phydev->duplex == DUPLEX_FULL) {
int eee_lp, eee_cap, eee_adv;
u32 lp, cap, adv;
int status;
-- 
2.7.4


-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: [PATCH net] MAINTAINERS: Add Andrew Lunn as co-maintainer of PHYLIB

2017-03-27 Thread Andrew Lunn
On Mon, Mar 27, 2017 at 10:48:11AM -0700, Florian Fainelli wrote:
> Andrew has been contributing a lot to PHYLIB over the past months and
> his feedback on patches is more than welcome.
> 
> Signed-off-by: Florian Fainelli 

Thanks Florian

Acked-by: Andrew Lunn 

Andrew


Re: [PATCH RFC v4 09/10] tty: serdev-ttyport: return actual baudrate from ttyport_set_baudrate

2017-03-27 Thread Rob Herring
On Mon, Mar 27, 2017 at 8:37 AM, Stefan Wahren  wrote:
> Instead of returning the requested baudrate, we better return the
> actual one because it isn't always the same.
>
> Signed-off-by: Stefan Wahren 
> ---
>  drivers/tty/serdev/serdev-ttyport.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Acked-by: Rob Herring 


Re: [PATCH RFC v4 10/10] tty: serdev: add functions to retrieve common UART settings

2017-03-27 Thread Rob Herring
On Mon, Mar 27, 2017 at 8:37 AM, Stefan Wahren  wrote:
> Currently serdev core doesn't provide functions to retrieve common
> UART settings like data bits, stop bits or parity. This patch adds
> the interface to the core and the necessary implementation for
> serdev-ttyport.

It doesn't provide them because why do you need to know? The attached
device should request the settings it needs and be done with it. Maybe
some devices can support a number of settings and you might want to
negotiate the settings with the UART, though surely 8N1 is in that
list. It's rare to see something that's not 8N1 from what I've seen.

Rob


[PATCH v2] cfg80211: Fix array-bounds warning in fragment copy

2017-03-27 Thread Matthias Kaehlcke
__ieee80211_amsdu_copy_frag intentionally initializes a pointer to
array[-1] to increment it later to valid values. clang rightfully
generates an array-bounds warning on the initialization statement.

Initialize the pointer to array[0] and change the algorithm from
increment before to increment after consume.

Signed-off-by: Matthias Kaehlcke 
---
 net/wireless/util.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/wireless/util.c b/net/wireless/util.c
index 68e5f2ecee1a..52795ae5337f 100644
--- a/net/wireless/util.c
+++ b/net/wireless/util.c
@@ -659,7 +659,7 @@ __ieee80211_amsdu_copy_frag(struct sk_buff *skb, struct 
sk_buff *frame,
int offset, int len)
 {
struct skb_shared_info *sh = skb_shinfo(skb);
-   const skb_frag_t *frag = &sh->frags[-1];
+   const skb_frag_t *frag = &sh->frags[0];
struct page *frag_page;
void *frag_ptr;
int frag_len, frag_size;
@@ -672,10 +672,10 @@ __ieee80211_amsdu_copy_frag(struct sk_buff *skb, struct 
sk_buff *frame,
 
while (offset >= frag_size) {
offset -= frag_size;
-   frag++;
frag_page = skb_frag_page(frag);
frag_ptr = skb_frag_address(frag);
frag_size = skb_frag_size(frag);
+   frag++;
}
 
frag_ptr += offset;
@@ -687,12 +687,12 @@ __ieee80211_amsdu_copy_frag(struct sk_buff *skb, struct 
sk_buff *frame,
len -= cur_len;
 
while (len > 0) {
-   frag++;
frag_len = skb_frag_size(frag);
cur_len = min(len, frag_len);
__frame_add_frag(frame, skb_frag_page(frag),
 skb_frag_address(frag), cur_len, frag_len);
len -= cur_len;
+   frag++;
}
 }
 
-- 
2.12.1.578.ge9c3154ca4-goog
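
The "consume, then increment" pattern described in the commit message
can be illustrated with a small stand-alone C sketch. It is not the
cfg80211 function: locate_offset() and its arguments are hypothetical,
plain size_t lengths replace skb fragments, and the bounds check is an
addition of this sketch. The cursor starts at element 0 and always
points at the next unread fragment, so no pointer before the array is
ever formed.

```c
#include <assert.h>
#include <stddef.h>

/*
 * A packet is a linear head of head_len bytes followed by nfrags fragments.
 * Find the buffer that holds byte `offset`:
 *   returns -1 if it lies in the head,
 *   the fragment index (0-based) if it lies in a fragment,
 *   -2 if the offset is past the end of the data.
 * On success *rem is the offset inside that buffer.
 */
static int locate_offset(size_t head_len, const size_t *frag_len,
			 size_t nfrags, size_t offset, size_t *rem)
{
	const size_t *next = &frag_len[0];	/* next unread fragment */
	const size_t *end = frag_len + nfrags;	/* one past the last one */
	size_t cur_size = head_len;
	int cur = -1;				/* -1 means "the head" */

	assert(rem != NULL);

	while (offset >= cur_size) {
		offset -= cur_size;
		if (next == end)
			return -2;
		cur_size = *next;	/* consume the fragment first ... */
		next++;			/* ... then advance the cursor */
		cur++;
	}

	*rem = offset;
	return cur;
}
```

Compared with starting at index -1 and incrementing before each use, the
set of fragments visited is the same; only the point at which the cursor
is advanced moves.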



Re: [PATCH RFC v2 3/3] net: phy: allow EEE with SGMII interface modes

2017-03-27 Thread Russell King - ARM Linux
On Mon, Mar 27, 2017 at 11:03:12AM -0700, Florian Fainelli wrote:
> On 03/27/2017 10:00 AM, Russell King - ARM Linux wrote:
> > On Mon, Mar 27, 2017 at 09:47:31AM -0700, Florian Fainelli wrote:
> >> On 03/27/2017 02:59 AM, Russell King wrote:
> >>> As EEE is able to work in SGMII mode as well, add it to the list of
> >>> permissible EEE modes that phy_init_eee() will accept.  This is
> >>> necessary so that EEE can work with an 88E1512 connected in SGMII mode.
> >>
> >> As you mention in your cover letter, we should probably reverse this
> >> test and make it reject modes where EEE has no chance of being supported
> >> at all.
> > 
> > Want me to re-spin?  Any thought on which interface modes we should
> > explicitly exclude?
> 
> It actually sounds like we should just kill the check entirely, it does
> not appear that any of the interface mode would not fundamentally be
> able to support EEE, because the "lowest" mode we support is MII, and
> even there it's quite possible to support EEE.

Right, so it looks like the test reduces down to just:

if (phydev->duplex == DUPLEX_FULL) {

agreed?

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: [PATCH RFC v2 3/3] net: phy: allow EEE with SGMII interface modes

2017-03-27 Thread Florian Fainelli
On 03/27/2017 12:47 PM, Russell King - ARM Linux wrote:
> On Mon, Mar 27, 2017 at 11:03:12AM -0700, Florian Fainelli wrote:
>> On 03/27/2017 10:00 AM, Russell King - ARM Linux wrote:
>>> On Mon, Mar 27, 2017 at 09:47:31AM -0700, Florian Fainelli wrote:
>>>> On 03/27/2017 02:59 AM, Russell King wrote:
>>>>> As EEE is able to work in SGMII mode as well, add it to the list of
>>>>> permissible EEE modes that phy_init_eee() will accept.  This is
>>>>> necessary so that EEE can work with an 88E1512 connected in SGMII mode.
>>>>
>>>> As you mention in your cover letter, we should probably reverse this
>>>> test and make it reject modes where EEE has no chance of being supported
>>>> at all.
>>>
>>> Want me to re-spin?  Any thought on which interface modes we should
>>> explicitly exclude?
>>
>> It actually sounds like we should just kill the check entirely; it does
>> not appear that any of the interface modes would be fundamentally
>> unable to support EEE, because the "lowest" mode we support is MII, and
>> even there it's quite possible to support EEE.
> 
> Right, so it looks like the test reduces down to just:
> 
>   if (phydev->duplex == DUPLEX_FULL) {
> 
> agreed?

Yes indeed. Thanks!
-- 
Florian
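
For readers following the thread, the reduced test agreed on above can be sketched as a small stand-alone C fragment. This is only an illustration under stated assumptions: the struct, the enum and the helper name (all prefixed `sketch_`) are hypothetical stand-ins rather than kernel definitions, and in the real phy_init_eee() this check is followed by further capability and advertisement checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for illustration only; the kernel has its own DUPLEX_* values. */
enum sketch_duplex { SKETCH_DUPLEX_HALF, SKETCH_DUPLEX_FULL };

/* Hypothetical, trimmed-down view of the link state. */
struct sketch_phy {
	enum sketch_duplex duplex;
	int interface;	/* interface mode, deliberately ignored below */
};

/*
 * With the interface-mode list removed, full duplex is the only
 * precondition left before the EEE setup path continues.
 */
bool sketch_eee_precheck(const struct sketch_phy *phy)
{
	assert(phy != NULL);
	return phy->duplex == SKETCH_DUPLEX_FULL;
}
```

The point of the sketch is that the interface mode no longer takes part in the decision at all, which matches the observation that even MII can support EEE.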


Re: [PATCH] cfg80211: Fix array-bounds warning in fragment copy

2017-03-27 Thread Matthias Kaehlcke
On Mon, Mar 27, 2017 at 12:47:59PM +0200, Johannes Berg wrote:

> On Fri, 2017-03-24 at 18:06 -0700, Matthias Kaehlcke wrote:
> > __ieee80211_amsdu_copy_frag intentionally initializes a pointer to
> > array[-1] to increment it later to valid values. clang rightfully
> > generates an array-bounds warning on the initialization statement.
> > Work around this by initializing the pointer to array[0] and
> > decrementing it later, which allows to leave the rest of the
> > algorithm untouched.
> > 
> > Signed-off-by: Matthias Kaehlcke 
> > ---
> >  net/wireless/util.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/net/wireless/util.c b/net/wireless/util.c
> > index 68e5f2ecee1a..d3d459e4a070 100644
> > --- a/net/wireless/util.c
> > +++ b/net/wireless/util.c
> > @@ -659,7 +659,7 @@ __ieee80211_amsdu_copy_frag(struct sk_buff *skb,
> > struct sk_buff *frame,
> >     int offset, int len)
> >  {
> >     struct skb_shared_info *sh = skb_shinfo(skb);
> > -   const skb_frag_t *frag = &sh->frags[-1];
> > +   const skb_frag_t *frag = &sh->frags[0];
> >     struct page *frag_page;
> >     void *frag_ptr;
> >     int frag_len, frag_size;
> > @@ -669,6 +669,7 @@ __ieee80211_amsdu_copy_frag(struct sk_buff *skb,
> > struct sk_buff *frame,
> >     frag_page = virt_to_head_page(skb->head);
> >     frag_ptr = skb->data;
> >     frag_size = head_size;
> > +   frag--;
> 
> Isn't it just a question of time until the compiler will see through
> this trick and warn about it?

Maybe.

Actually it seems the algorithm can be easily adapted to increment the
pointer after consumption, which is clearer anyway. I will give this a
shot. I'm not sure how to exercise the code path for testing and would
appreciate help on this end.

Matthias
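
The "increment after consumption" idea mentioned above can be sketched outside the kernel as follows. This is a minimal illustration, not the cfg80211 code: `struct sketch_frag` and `sketch_gather_len()` are hypothetical names, only fragment sizes are modelled, and an explicit end-of-array check is added that the original function does not have. The cursor starts at element 0 and is advanced only after the current element has been read, so no pointer to element -1 is ever formed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for skb_frag_t: only the size matters here. */
struct sketch_frag {
	int size;
};

static int sketch_min(int a, int b)
{
	return a < b ? a : b;
}

/*
 * Count how many bytes would be gathered when taking `len` bytes starting
 * at `offset`, walking the head area first and then frags[0..nfrags).
 */
int sketch_gather_len(int head_size, const struct sketch_frag *frags,
		      size_t nfrags, int offset, int len)
{
	const struct sketch_frag *frag = frags;	/* frags[0], never frags[-1] */
	const struct sketch_frag *end = frags + nfrags;
	int cur_size = head_size;
	int gathered = 0;
	int cur_len;

	assert(frags != NULL && offset >= 0 && len >= 0);

	/* Skip every region that lies entirely before the offset. */
	while (offset >= cur_size) {
		if (frag == end)
			return gathered;
		offset -= cur_size;
		cur_size = frag->size;
		frag++;		/* advance only after this fragment was read */
	}

	/* Partial first region. */
	cur_len = sketch_min(len, cur_size - offset);
	gathered += cur_len;
	len -= cur_len;

	/* Remaining whole fragments; again, increment after consumption. */
	while (len > 0 && frag != end) {
		cur_len = sketch_min(len, frag->size);
		gathered += cur_len;
		len -= cur_len;
		frag++;
	}

	return gathered;
}
```

With a 10-byte head and fragments of 20 and 30 bytes, a request for 100 bytes at offset 12 gathers the last 18 bytes of the first fragment and all 30 bytes of the second.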



Re: Extending socket timestamping API for NTP

2017-03-27 Thread Denny Page
[Resend in plain text for vger]

> On Mar 27, 2017, at 11:28, Richard Cochran  wrote:
> 
> I didn't do anything super methodical, and I didn't keep notes, but I
> had a phyter (whose delays were published by TI and independently
> confirmed in an ISPCS paper by Christian Riesch) and an i210 with a 100
> MBit link and with a PPS between them.  The phyter's numbers are
> correct to within a nanosecond, and I saw that the i210 was repeatedly
> landing at the published extreme of the range.  I don't remember which
> extreme, and I didn't repeat more than a few times, however.

Do you still have the resulting correction values from this?

Thanks,
Denny



Re: Extending socket timestamping API for NTP

2017-03-27 Thread Denny Page
[Resend in plain text for vger]

> On Mar 27, 2017, at 11:28, Richard Cochran  wrote:
> 
> On Mon, Mar 27, 2017 at 09:25:03AM -0700, Denny Page wrote:
> 
>> I agree that the values in the igb driver are incorrect. They were
>> middle of the range values from the old tables. At least for 100Mb,
>> Intel seems to know that the original table was incorrect. I’ve done
>> extensive measurements of the i210 and i211 at both 100Mb and
>> 1Gb. The “external link partner” numbers Intel currently publishes
>> for the 100Mb appear accurate.
> 
> Well, after reading this, I am more convinced than ever that doing the
> correction in user space is the right way.  If the one and only vendor
> who publishes numbers can't even get them straight, how on earth will
> we ever get the drivers right?

I think that on average, the vendor’s numbers are likely to be more accurate
than anyone else’s. The concept that independent software implementations are
going to somehow obtain and maintain better numbers is too much of a stretch.

FWIW, my testing indicates that the 100Mb numbers that Intel currently
publishes are quite accurate. I don’t believe that Intel did the driver
corrections, btw; if memory serves, these values were lifted from the Mac.

Denny



Re: [PATCH net-next 2/2] net: stmmac: fix number of tx queues in stmmac_poll

2017-03-27 Thread Corentin Labbe
On Mon, Mar 27, 2017 at 06:44:22PM +0100, Joao Pinto wrote:
> On 3/27/2017 at 6:28 PM, David Miller wrote:
> > From: Corentin Labbe 
> > Date: Mon, 27 Mar 2017 19:00:58 +0200
> > 
> >> On Mon, Mar 27, 2017 at 04:26:48PM +0100, Joao Pinto wrote:
> >>> Hi David,
> >>>
> >>> On 3/25/2017 at 7:26 AM, Corentin Labbe wrote:
> >>>> On Fri, Mar 24, 2017 at 05:16:45PM +, Joao Pinto wrote:
> >>>>> For cores that have more than 1 TX queue configured, the kernel would crash,
> >>>>> since only one TX queue is permitted by default.
> >>>>>
> >>>>> Signed-off-by: Joao Pinto 
> >>>>> ---
> >>>>>  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 2 +-
> >>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>>
> >>>>> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> >>>>> index 3827952..1eab084 100644
> >>>>> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> >>>>> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> >>>>> @@ -3429,7 +3429,7 @@ static int stmmac_poll(struct napi_struct *napi, int budget)
> >>>>> struct stmmac_rx_queue *rx_q =
> >>>>> container_of(napi, struct stmmac_rx_queue, napi);
> >>>>> struct stmmac_priv *priv = rx_q->priv_data;
> >>>>> -   u32 tx_count = priv->dma_cap.number_tx_queues;
> >>>>> +   u32 tx_count = priv->plat->tx_queues_to_use;
> >>>>> u32 chan = rx_q->queue_index;
> >>>>> u32 work_done = 0;
> >>>>> u32 queue = 0;
> >>>>> -- 
> >>>>> 2.9.3
> >>>>>
> >>>>
> >>>> This patch fixes the performance issue on dwmac-sun8i only.
> >>>> The dwmac-sunxi is still broken.
> >>>>
> >>>
> >>> This patch series can be upstreamed, please, since it makes 2 fixes, one of
> >>> them solving the problem in dwmac-sun8i.
> >>>
> >>> Thanks.
> >>
> >> As I said in a previous answer, in the end dwmac-sun8i is still broken.
> >> Adding those 2 patches will just make the revert harder.
> > 
> > I agree.
> 
> From what I understand, SoCs based on core versions >= 4.00 are working
> properly and for some reason SoCs based on older versions are not working.
> 
> This fix is necessary, since if tx_queues_to_use as configured in the driver
> differs from priv->dma_cap.number_tx_queues in the core, this can lead to
> kernel crashes.
> 
> The other fix (netdev resources release) is also necessary, since when you
> release the driver it crashes, because the rx queue struct is freed before
> releasing the netdevs.
> 
> We can revert, but I think it might not solve the issue. We can break the
> "multiple buffers" patch into "rx multiple buffers" and "tx multiple buffers",
> but will that actually work? We can give it a try; I don't mind making a new
> multiple buffers patch broken into 2 that can be tested on new cores and older
> cores.
> 

Reverting will at least bring my archs back to a good state :)
Splitting will not magically solve the issue, but it will make it easy to detect
which part is faulty.
And I am sure that it is possible to split into more than 2.
The smaller the patches are, the easier it will be.

Regards


[PATCH RFC net-next 2/3] net: phy: read whether PHY supports stopping clock during LPI

2017-03-27 Thread Florian Fainelli
In order to use phy_init_eee() correctly, in particular the clk_stop
argument, we need to know whether the Ethernet PHY supports stopping its
clock.

Right now, we would have to call phy_init_eee(phydev, 1), see if that
fails, and call again with phy_init_eee(phydev, 0) to enable EEE; this is
not an acceptable API use.

Update phy_init_hw() to read whether the PHY supports this, and retain
that information in the phydev structure so we can re-use it later.

Signed-off-by: Florian Fainelli 
---
 drivers/net/phy/phy_device.c | 23 ++-
 include/linux/phy.h  |  2 ++
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 1219eeab69d1..2755d77626f7 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -837,6 +837,21 @@ static int phy_poll_reset(struct phy_device *phydev)
return 0;
 }
 
+static void phy_read_clock_stop_capable(struct phy_device *phydev)
+{
+   int ret;
+
+   /* Read if the PHY supports stopping its clocks (reg 3.1) */
+   ret = phy_read_mmd_indirect(phydev, MDIO_MMD_PCS, MDIO_STAT1,
+   phydev->addr);
+   if (ret < 0)
+   return;
+
+   /* Do not make this fatal */
+   if (ret & MDIO_STAT1_CLOCK_STOP_CAPABLE)
+   phydev->clk_stop_cap = true;
+}
+
 int phy_init_hw(struct phy_device *phydev)
 {
int ret = 0;
@@ -856,7 +871,13 @@ int phy_init_hw(struct phy_device *phydev)
if (ret < 0)
return ret;
 
-   return phydev->drv->config_init(phydev);
+   ret = phydev->drv->config_init(phydev);
+   if (ret < 0)
+   return ret;
+
+   phy_read_clock_stop_capable(phydev);
+
+   return 0;
 }
 EXPORT_SYMBOL(phy_init_hw);
 
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 624cecf69c28..c61fd519f341 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -357,6 +357,7 @@ struct phy_c45_device_ids {
  * is_pseudo_fixed_link: Set to true if this phy is an Ethernet switch, etc.
  * has_fixups: Set to true if this phy has fixups/quirks.
  * suspended: Set to true if this phy has been suspended successfully.
+ * clk_stop_cap: Set to true if this phy supports TX clock stopping during EEE.
  * state: state of the PHY for management purposes
  * dev_flags: Device-specific flags used by the PHY driver.
  * link_timeout: The number of timer firings to wait before the
@@ -393,6 +394,7 @@ struct phy_device {
bool is_pseudo_fixed_link;
bool has_fixups;
bool suspended;
+   bool clk_stop_cap;
 
enum phy_state state;
 
-- 
2.9.3
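
The read-once-and-cache pattern introduced by this patch can be tried out in isolation with the sketch below. It is a user-space illustration only: `struct sketch_phydev`, the `sketch_read_fn` accessor type and `sketch_fixed_read()` are hypothetical stand-ins for the phy_device structure and the MMD read helper, and the bit value is the one proposed in patch 1/3 of this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bit value taken from patch 1/3 of this series. */
#define SKETCH_STAT1_CLOCK_STOP_CAPABLE	0x0040

/* Hypothetical, trimmed-down device state; only the new flag is modelled. */
struct sketch_phydev {
	bool clk_stop_cap;
};

/* Hypothetical register accessor: returns the value or a negative error. */
typedef int (*sketch_read_fn)(void *ctx);

/* Stub accessor for experiments: ctx points at the value to return. */
int sketch_fixed_read(void *ctx)
{
	return *(int *)ctx;
}

/* Read the status once and remember whether the clock can be stopped. */
void sketch_read_clock_stop_capable(struct sketch_phydev *phy,
				    sketch_read_fn read_stat1, void *ctx)
{
	int ret;

	assert(phy != NULL && read_stat1 != NULL);

	ret = read_stat1(ctx);
	if (ret < 0)
		return;	/* a failed read is not fatal; the flag is left as is */

	if (ret & SKETCH_STAT1_CLOCK_STOP_CAPABLE)
		phy->clk_stop_cap = true;
}
```

A caller can then consult the cached flag instead of probing by trial and error, which is the API problem the commit message describes.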



[PATCH RFC net-next 3/3] net: phy: stop the PHY clock during LPI only if supported

2017-03-27 Thread Florian Fainelli
Now that we detect whether a PHY supports stopping its clock during LPI,
deny a call to phy_init_eee() with clk_stop_enable being set and the PHY
not supporting that.

Signed-off-by: Florian Fainelli 
---
 drivers/net/phy/phy.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index ba4676ee9018..1c3800e01d82 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -1251,7 +1251,7 @@ int phy_init_eee(struct phy_device *phydev, bool 
clk_stop_enable)
if (!phy_check_valid(phydev->speed, phydev->duplex, lp & adv))
goto eee_exit_err;
 
-   if (clk_stop_enable) {
+   if (clk_stop_enable && phydev->clk_stop_cap) {
/* Configure the PHY to stop receiving xMII
 * clock while it is signaling LPI.
 */
-- 
2.9.3
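
The effect of the one-line change can be summarized as a small decision function. This is a sketch with hypothetical names (`sketch_clk_stop_action`, `sketch_clk_stop_decide()`), not code from the patch. Note that, as far as the hunk above shows, a request for clock stopping on a PHY without the capability is skipped and the function carries on, rather than returning an error.

```c
#include <stdbool.h>

/* What the EEE setup path does about the clock-stop configuration. */
enum sketch_clk_stop_action {
	SKETCH_CLK_STOP_LEAVE_ALONE,	/* do not touch the clock-stop setting */
	SKETCH_CLK_STOP_CONFIGURE,	/* program the PHY to stop its clock in LPI */
};

/*
 * The clock-stop setting is programmed only when the caller asked for it
 * and the PHY reported the capability at init time.
 */
enum sketch_clk_stop_action sketch_clk_stop_decide(bool clk_stop_enable,
						    bool clk_stop_cap)
{
	if (clk_stop_enable && clk_stop_cap)
		return SKETCH_CLK_STOP_CONFIGURE;

	return SKETCH_CLK_STOP_LEAVE_ALONE;
}
```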



[PATCH RFC net-next 1/3] net: mdio: add definition for MDIO_STAT1_CLOCK_STOP_CAPABLE

2017-03-27 Thread Florian Fainelli
Add the definition for the Clause 45 IEEE PCS Status 1 Register (3.1)
reporting whether a PHY supports stopping its clock or not during LPI
(EEE).

Signed-off-by: Florian Fainelli 
---
 include/uapi/linux/mdio.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/mdio.h b/include/uapi/linux/mdio.h
index c94a510a577e..4f17427db4cd 100644
--- a/include/uapi/linux/mdio.h
+++ b/include/uapi/linux/mdio.h
@@ -96,6 +96,7 @@
 #define MDIO_STAT1_LPOWERABLE  0x0002  /* Low-power ability */
 #define MDIO_STAT1_LSTATUS BMSR_LSTATUS
 #define MDIO_STAT1_FAULT   0x0080  /* Fault */
+#define MDIO_STAT1_CLOCK_STOP_CAPABLE  0x0040  /* Clock stop capable */
 #define MDIO_AN_STAT1_LPABLE   0x0001  /* Link partner AN ability */
 #define MDIO_AN_STAT1_ABLE BMSR_ANEGCAPABLE
 #define MDIO_AN_STAT1_RFAULT   BMSR_RFAULT
-- 
2.9.3
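
To show where the new bit sits relative to its neighbours, the following sketch decodes a raw status value using only the masks visible in the hunk above (0x0002, 0x0040, 0x0080). The `SKETCH_` macros, the struct and the function name are hypothetical and exist only so the fragment compiles on its own outside the kernel tree.

```c
#include <stdbool.h>

/* Mask values copied from the hunk above; names are local to this sketch. */
#define SKETCH_STAT1_LPOWERABLE		0x0002	/* Low-power ability */
#define SKETCH_STAT1_CLOCK_STOP_CAPABLE	0x0040	/* Clock stop capable */
#define SKETCH_STAT1_FAULT		0x0080	/* Fault */

/* Decoded view of the three bits above. */
struct sketch_stat1 {
	bool low_power_able;
	bool clock_stop_capable;
	bool fault;
};

/* Split a raw 16-bit status value into individual flags. */
struct sketch_stat1 sketch_decode_stat1(unsigned int val)
{
	struct sketch_stat1 st;

	st.low_power_able = (val & SKETCH_STAT1_LPOWERABLE) != 0;
	st.clock_stop_capable = (val & SKETCH_STAT1_CLOCK_STOP_CAPABLE) != 0;
	st.fault = (val & SKETCH_STAT1_FAULT) != 0;

	return st;
}
```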



[PATCH RFC net-next 0/3] net: phy: Read if PHY can stop its clock

2017-03-27 Thread Florian Fainelli
This patch series updates PHYLIB to read whether the PHY is actually capable of
stopping its clocks during EEE low power modes.

One problem (not the only one) with phy_init_eee() is that it takes a
clk_stop_enable argument that the caller has no idea how to determine.

This patch series makes the PHY library read whether the PHY is capable of
stopping its clock (after config_init has been called) which will allow
future patches to rename clk_stop_enable into something that conveys an
intention (and therefore could fail if the PHY does not support it).

Florian Fainelli (3):
  net: mdio: add definition for MDIO_STAT1_CLOCK_STOP_CAPABLE
  net: phy: read whether PHY supports stopping clock during LPI
  net: phy: stop the PHY clock during LPI only if supported

 drivers/net/phy/phy.c|  2 +-
 drivers/net/phy/phy_device.c | 23 ++-
 include/linux/phy.h  |  2 ++
 include/uapi/linux/mdio.h|  1 +
 4 files changed, 26 insertions(+), 2 deletions(-)

-- 
2.9.3


