date:20180914

Re: Need input on placement of driver

2018-09-14 Thread Sunil Kovvuri

On Wed, Sep 12, 2018 at 10:20 PM Sunil Kovvuri  wrote:
>
> Hi David,
>
> I am trying to submit a driver into drivers/soc folder and Arnd is of
> the opinion that
> the driver should be moved to drivers/net/ethernet.
>
> Can you please go through below and give your feedback.
>
> HW functionality in brief
> # HW has an Admin Function (AF) PCI device which has privileged access
> to configure co-processors.
> # Co-processors include a network block, a crypto block, a ring buffer
> block used by both network and crypto blocks, a packet or any other work
> scheduler block, an ingress/egress packet parser and forwarding block,
> internal state machine caches etc.
> # There are multiple instances of each of these blocks, and they can be
> attached to other PCI devices.
> # Future variants of the same silicon might have additional functional
> blocks in AF.
> # There are other SRIOV PF/VF devices which are dumb at power-on and
> acquire functionality based on what blocks are attached to them by AF.
> # So AF is the one which configures, facilitates and manages all HW
> resources (network and non-network), but it doesn't handle any data.
> # PF/VFs communicate with AF via a shared mailbox memory for functional
> block attach / detach requests, HW configuration etc.
>
> The AF driver will contain not only the functionality needed by kernel
> netdev or crypto drivers but also the HW configuration logic needed by
> userspace application drivers.
>
> Keeping current and future functionality in view, we thought of having
> 3 drivers in the kernel:
> # AF driver at drivers/soc
> # PF / VF netdev driver (network & ring buffer blocks attached to these
> devices) at drivers/net/ethernet
> # PF / VF crypto driver (ring buffer and crypto blocks attached to
> these devices) at drivers/crypto.
>
> I have submitted a few patches for the AF driver:
> https://patchwork.kernel.org/cover/10587635/
>
> Here Arnd has opined that all drivers should move into drivers/net/ethernet.
> So I wanted to check if you would be okay with this.
>
> Thanks,
> Sunil.


Hi David,

Sorry for the reminder.
It would be great to have your feedback / inputs.

Thanks,
Sunil.

Re: [PATCH net-next RFC 7/8] udp: gro behind static key

2018-09-14 Thread Subash Abhinov Kasiviswanathan


On 2018-09-14 11:59, Willem de Bruijn wrote:

From: Willem de Bruijn 

Avoid the socket lookup cost in udp_gro_receive if no socket has a
gro callback configured.

Signed-off-by: Willem de Bruijn 
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 4f6aa95a9b12..f44fe328aa0f 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -405,7 +405,7 @@ static struct sk_buff *udp4_gro_receive(struct list_head *head,
 {
 	struct udphdr *uh = udp_gro_udphdr(skb);
 
-	if (unlikely(!uh))
+	if (unlikely(!uh) || !static_branch_unlikely(&udp_encap_needed_key))
 		goto flush;



Hi Willem

Does this need to be

diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 6dd3f0a..fcd5589 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -407,7 +407,7 @@ static struct sk_buff *udp4_gro_receive(struct list_head *head,
 {
 	struct udphdr *uh = udp_gro_udphdr(skb);
 
-	if (unlikely(!uh) || !static_branch_unlikely(&udp_encap_needed_key))
+	if (unlikely(!uh) || static_branch_unlikely(&udp_encap_needed_key))
 		goto flush;
 
 	/* Don't bother verifying checksum if we're going to flush anyway. */


I tried setting the UDP_GRO socket option, and I had to make this change to
exercise the udp_gro_receive_cb code path.
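
The two hunks differ only in the polarity of the static-key test. A minimal user-space sketch of that difference follows. It models the key as a plain bool (an assumption made for illustration; a real static key is a runtime-patched branch), and the helper names flush_rfc() and flush_reply() are hypothetical. It does not settle which polarity is correct for the series.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel static key: a plain flag. */
static bool encap_needed_key;

/* Polarity from the RFC patch: take the flush path unless the key is on. */
static bool flush_rfc(const void *uh)
{
	return uh == NULL || !encap_needed_key;
}

/* Polarity from the reply: take the flush path when the key is on. */
static bool flush_reply(const void *uh)
{
	return uh == NULL || encap_needed_key;
}
```

With the flag off, flush_rfc() reports a flush for every packet and flush_reply() for none; with the flag on, the results swap. A missing header flushes in both variants.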

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [PATCH net-next RFC 5/8] net: deconstify net_offload

2018-09-14 Thread Subash Abhinov Kasiviswanathan


On 2018-09-14 11:59, Willem de Bruijn wrote:

From: Willem de Bruijn 

With configurable gro, the flags field in net_offloads may be changed.

Remove the const keyword. This is a noop otherwise.

Signed-off-by: Willem de Bruijn 
diff --git a/net/sctp/offload.c b/net/sctp/offload.c
index 123e9f2dc226..ad504b83245d 100644
--- a/net/sctp/offload.c
+++ b/net/sctp/offload.c
@@ -90,7 +90,7 @@ static struct sk_buff *sctp_gso_segment(struct sk_buff *skb,
 	return segs;
 }
 
-static const struct net_offload sctp_offload = {
+static struct net_offload sctp_offload = {
 	.callbacks = {
 		.gso_segment = sctp_gso_segment,
 	},


Hi Willem

sctp6 also needs to be deconstified.

diff --git a/net/sctp/offload.c b/net/sctp/offload.c
index ad504b8..4be7794 100644
--- a/net/sctp/offload.c
+++ b/net/sctp/offload.c
@@ -96,7 +96,7 @@ static struct sk_buff *sctp_gso_segment(struct sk_buff *skb,
 	},
 };
 
-static const struct net_offload sctp6_offload = {
+static struct net_offload sctp6_offload = {
 	.callbacks = {
 		.gso_segment = sctp_gso_segment,
 	},



[PATCH net-next] net: hns: make function hns_gmac_wait_fifo_clean() static

2018-09-14 Thread Wei Yongjun

Fixes the following sparse warning:

drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c:322:5: warning:
 symbol 'hns_gmac_wait_fifo_clean' was not declared. Should it be static?

Signed-off-by: Wei Yongjun 
---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c
index 09e4061..aaf72c0 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c
@@ -319,7 +319,7 @@ static void hns_gmac_set_promisc(void *mac_drv, u8 en)
hns_gmac_set_uc_match(mac_drv, en);
 }
 
-int hns_gmac_wait_fifo_clean(void *mac_drv)
+static int hns_gmac_wait_fifo_clean(void *mac_drv)
 {
struct mac_driver *drv = (struct mac_driver *)mac_drv;
int wait_cnt;

Re: [net-next, RFC PATCH] net: sched: cls_range: Introduce Range classifier

2018-09-14 Thread Nambiar, Amritha

On 9/14/2018 2:58 AM, Jiri Pirko wrote:
> Thu, Sep 13, 2018 at 10:52:06PM CEST, amritha.namb...@intel.com wrote:
> 
> [...]
> 
>> +static struct cls_range_filter *range_lookup(struct cls_range_head *head,
>> + struct range_flow_key *key,
>> + struct range_flow_key *mkey,
>> + bool is_skb)
>> +{
>> +struct cls_range_filter *filter, *next_filter;
>> +struct range_params range;
>> +int ret;
>> +size_t cmp_size;
>> +
>> +list_for_each_entry_safe(filter, next_filter, &head->filters, flist) {
> 
> This really should be list_for_each_entry_rcu().
> 
> Also, as I wrote in the previous email, this should be done in
> cls_flower. Look at fl_lookup(): it looks up the hashtable. You just need
> to add linked-list traversal and range comparison to that function for
> the hit in the hashtable.
> 

I see. Will integrate the range comparison into cls_flower.

> 
>> +if (!is_skb) {
>> +/* Existing filter comparison */
>> +cmp_size = sizeof(filter->mkey);
>> +} else {
>> +/* skb classification */
>> +ret = range_compare_params(&range, filter, key,
>> +   RANGE_PORT_DST);
>> +if (ret < 0)
>> +continue;
>> +
>> +ret = range_compare_params(&range, filter, key,
>> +   RANGE_PORT_SRC);
>> +if (ret < 0)
>> +continue;
>> +
>> +/* skb does not have min and max values */
>> +cmp_size = RANGE_KEY_MEMBER_OFFSET(tp_min);
>> +}
>> +if (!memcmp(mkey, &filter->mkey, cmp_size))
>> +return filter;
>> +}
>> +return NULL;
> 
> [...]
>
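
The suggestion in this thread (find candidates by mask-based lookup, then walk a list and compare against each filter's range) can be sketched in user space as below. This is only an illustration under stated assumptions: struct range_filter and range_match() are hypothetical names, the list is a plain singly linked list rather than an RCU-protected kernel list, and ports are assumed to be in host byte order already.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical filter entry holding an inclusive destination-port range. */
struct range_filter {
	uint16_t dport_min;
	uint16_t dport_max;
	struct range_filter *next;
};

/* Walk the candidates and return the first whose range contains dport. */
static const struct range_filter *
range_match(const struct range_filter *head, uint16_t dport)
{
	for (const struct range_filter *f = head; f; f = f->next)
		if (dport >= f->dport_min && dport <= f->dport_max)
			return f;
	return NULL;
}
```

In the kernel the traversal would need the RCU list iterator mentioned above; the comparison itself stays the same.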

Re: [net-next,RFC PATCH] Introduce TC Range classifier

2018-09-14 Thread Nambiar, Amritha

On 9/14/2018 2:09 PM, Cong Wang wrote:
> On Fri, Sep 14, 2018 at 2:53 AM Jiri Pirko  wrote:
>>
>> Thu, Sep 13, 2018 at 10:52:01PM CEST, amritha.namb...@intel.com wrote:
>>> This patch introduces a TC range classifier to support filtering based
>>> on ranges. Only port-range filters are supported currently. This can
>>> be combined with flower classifier to support filters that are a
>>> combination of port-ranges and other parameters based on existing
>>> fields supported by cls_flower. The 'goto chain' action can be used to
>>> combine the flower and range filter.
>>> The filter precedence is decided based on the 'prio' value.
>>
>> For example, the Spectrum ASIC supports mask-based and range-based
>> matching in a single TCAM rule. No chains needed. Also, I don't really
>> understand why this is a separate cls. I believe that this functionality
>> should be put in as an extension of the existing cls_flower.
> 
> Exactly. u32 filters support range matching too with proper masks.
> 
Can u32 filters support ranges that are not a power of 2?
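
For background on this question: a single value/mask pair with a contiguous (prefix) mask covers exactly one aligned power-of-2 block, so a purely mask-based matcher generally needs several pairs to cover an arbitrary range. The following is a minimal sketch of that decomposition, not of any u32 or tc interface; struct port_mask and range_to_masks() are hypothetical names introduced here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical value/mask pair: matches port p when (p & mask) == value. */
struct port_mask {
	uint16_t value;
	uint16_t mask;
};

/*
 * Split the inclusive range [lo, hi] into aligned power-of-2 blocks,
 * one value/mask pair per block. Returns the number of pairs written,
 * or 0 if lo > hi or the caller's buffer (cap entries) is too small.
 */
static size_t range_to_masks(uint16_t lo, uint16_t hi,
			     struct port_mask *out, size_t cap)
{
	size_t n = 0;
	uint32_t cur = lo;

	while (cur <= hi) {
		uint32_t size = 1;

		/* Grow the block while it stays aligned and inside the range. */
		while ((cur & ((size << 1) - 1)) == 0 &&
		       cur + (size << 1) - 1 <= hi)
			size <<= 1;

		if (n == cap)
			return 0;
		out[n].value = (uint16_t)cur;
		out[n].mask = (uint16_t)~(size - 1);
		n++;
		cur += size;
	}
	return n;
}
```

For example, the range 1 to 6 comes out as four pairs (1, 2 to 3, 4 to 5, 6), while 1000 to 1007 fits in one pair with mask 0xFFF8. A 16-bit range never needs more than 30 pairs.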

[PATCH net-next] net: lantiq: Fix return value check in xrx200_probe()

2018-09-14 Thread Wei Yongjun

In case of error, the function devm_ioremap_resource() returns ERR_PTR()
and never returns NULL. The NULL test in the return value check should
be replaced with IS_ERR().

Fixes: fe1a56420cf2 ("net: lantiq: Add Lantiq / Intel VRX200 Ethernet driver")
Signed-off-by: Wei Yongjun 
---
 drivers/net/ethernet/lantiq_xrx200.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/lantiq_xrx200.c b/drivers/net/ethernet/lantiq_xrx200.c
index c8b6d90..4a16076 100644
--- a/drivers/net/ethernet/lantiq_xrx200.c
+++ b/drivers/net/ethernet/lantiq_xrx200.c
@@ -461,9 +461,9 @@ static int xrx200_probe(struct platform_device *pdev)
}
 
priv->pmac_reg = devm_ioremap_resource(dev, res);
-   if (!priv->pmac_reg) {
+   if (IS_ERR(priv->pmac_reg)) {
dev_err(dev, "failed to request and remap io ranges\n");
-   return -ENOMEM;
+   return PTR_ERR(priv->pmac_reg);
}
 
priv->chan_rx.dma.irq = platform_get_irq_byname(pdev, "rx");
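
The reasoning in the commit message above can be reproduced outside the kernel. The sketch below re-creates the error-pointer helpers in user space as an approximation of <linux/err.h> (the real definitions live in the kernel tree), and fake_ioremap() is a hypothetical stand-in that fails the way devm_ioremap_resource() is described as failing: with an encoded errno, never NULL.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* User-space imitations of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline bool IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)(-MAX_ERRNO);
}

/* Hypothetical mapping routine: valid pointer on success, ERR_PTR on failure. */
static void *fake_ioremap(bool ok)
{
	static int region;

	return ok ? (void *)&region : ERR_PTR(-EBUSY);
}
```

A failed fake_ioremap(false) is non-NULL, so a `!ptr` test would treat it as success; IS_ERR() catches it and PTR_ERR() recovers the original -EBUSY rather than a hard-coded -ENOMEM.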

[PATCH net-next] net: dsa: gswip: Fix copy-paste error in gswip_gphy_fw_probe()

2018-09-14 Thread Wei Yongjun

The return value from of_reset_control_array_get_exclusive() is not
checked correctly: the test is done against the wrong variable. This
patch fixes it.

Fixes: 14fceff4771e ("net: dsa: Add Lantiq / Intel DSA driver for vrx200")
Signed-off-by: Wei Yongjun 
---
 drivers/net/dsa/lantiq_gswip.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/dsa/lantiq_gswip.c b/drivers/net/dsa/lantiq_gswip.c
index 9c28d0b..9c10570 100644
--- a/drivers/net/dsa/lantiq_gswip.c
+++ b/drivers/net/dsa/lantiq_gswip.c
@@ -934,10 +934,10 @@ static int gswip_gphy_fw_probe(struct gswip_priv *priv,
}
 
gphy_fw->reset = of_reset_control_array_get_exclusive(gphy_fw_np);
-   if (IS_ERR(priv->gphy_fw)) {
-   if (PTR_ERR(priv->gphy_fw) != -EPROBE_DEFER)
+   if (IS_ERR(gphy_fw->reset)) {
+   if (PTR_ERR(gphy_fw->reset) != -EPROBE_DEFER)
dev_err(dev, "Failed to lookup gphy reset\n");
-   return PTR_ERR(priv->gphy_fw);
+   return PTR_ERR(gphy_fw->reset);
}
 
return gswip_gphy_fw_load(priv, gphy_fw);

[PATCH net-next] net: dsa: gswip: Fix return value check in gswip_probe()

2018-09-14 Thread Wei Yongjun

In case of error, the function devm_ioremap_resource() returns ERR_PTR()
and never returns NULL. The NULL test in the return value check should
be replaced with IS_ERR().

Fixes: 14fceff4771e ("net: dsa: Add Lantiq / Intel DSA driver for vrx200")
Signed-off-by: Wei Yongjun 
---
 drivers/net/dsa/lantiq_gswip.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/dsa/lantiq_gswip.c b/drivers/net/dsa/lantiq_gswip.c
index 9c28d0b..faac359 100644
--- a/drivers/net/dsa/lantiq_gswip.c
+++ b/drivers/net/dsa/lantiq_gswip.c
@@ -1044,18 +1044,18 @@ static int gswip_probe(struct platform_device *pdev)
 
gswip_res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
priv->gswip = devm_ioremap_resource(dev, gswip_res);
-   if (!priv->gswip)
-   return -ENOMEM;
+   if (IS_ERR(priv->gswip))
+   return PTR_ERR(priv->gswip);
 
mdio_res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
priv->mdio = devm_ioremap_resource(dev, mdio_res);
-   if (!priv->mdio)
-   return -ENOMEM;
+   if (IS_ERR(priv->mdio))
+   return PTR_ERR(priv->mdio);
 
mii_res = platform_get_resource(pdev, IORESOURCE_MEM, 2);
priv->mii = devm_ioremap_resource(dev, mii_res);
-   if (!priv->mii)
-   return -ENOMEM;
+   if (IS_ERR(priv->mii))
+   return PTR_ERR(priv->mii);
 
priv->hw_info = of_device_get_match_data(dev);
if (!priv->hw_info)

[PATCH net-next v2 04/14] iavf: rename i40e_status to iavf_status

2018-09-14 Thread Jesse Brandeburg

This is just a rename of an internal variable i40e_status, but
it was a pretty big change and so deserved its own patch.

Signed-off-by: Jesse Brandeburg 
---
 drivers/net/ethernet/intel/iavf/i40e_adminq.c |  94 +-
 drivers/net/ethernet/intel/iavf/i40e_alloc.h  |   8 +-
 drivers/net/ethernet/intel/iavf/i40e_common.c |  72 +++---
 drivers/net/ethernet/intel/iavf/i40e_osdep.h  |   2 +-
 drivers/net/ethernet/intel/iavf/i40e_prototype.h  |  28 +++---
 drivers/net/ethernet/intel/iavf/i40evf.h  |   2 +-
 drivers/net/ethernet/intel/iavf/i40evf_client.c   |   6 +-
 drivers/net/ethernet/intel/iavf/i40evf_ethtool.c  |  52 --
 drivers/net/ethernet/intel/iavf/i40evf_main.c |  14 +--
 drivers/net/ethernet/intel/iavf/i40evf_virtchnl.c | 115 +-
 10 files changed, 179 insertions(+), 214 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/i40e_adminq.c b/drivers/net/ethernet/intel/iavf/i40e_adminq.c
index 3d1c874f5f85..f0e6f9bbb819 100644
--- a/drivers/net/ethernet/intel/iavf/i40e_adminq.c
+++ b/drivers/net/ethernet/intel/iavf/i40e_adminq.c
@@ -34,9 +34,9 @@ static void i40e_adminq_init_regs(struct i40e_hw *hw)
  *  i40e_alloc_adminq_asq_ring - Allocate Admin Queue send rings
  *  @hw: pointer to the hardware structure
  **/
-static i40e_status i40e_alloc_adminq_asq_ring(struct i40e_hw *hw)
+static iavf_status i40e_alloc_adminq_asq_ring(struct i40e_hw *hw)
 {
-   i40e_status ret_code;
+   iavf_status ret_code;
 
ret_code = i40e_allocate_dma_mem(hw, &hw->aq.asq.desc_buf,
 i40e_mem_atq_ring,
@@ -61,9 +61,9 @@ static i40e_status i40e_alloc_adminq_asq_ring(struct i40e_hw *hw)
  *  i40e_alloc_adminq_arq_ring - Allocate Admin Queue receive rings
  *  @hw: pointer to the hardware structure
  **/
-static i40e_status i40e_alloc_adminq_arq_ring(struct i40e_hw *hw)
+static iavf_status i40e_alloc_adminq_arq_ring(struct i40e_hw *hw)
 {
-   i40e_status ret_code;
+   iavf_status ret_code;
 
ret_code = i40e_allocate_dma_mem(hw, &hw->aq.arq.desc_buf,
 i40e_mem_arq_ring,
@@ -102,9 +102,9 @@ static void i40e_free_adminq_arq(struct i40e_hw *hw)
  *  i40e_alloc_arq_bufs - Allocate pre-posted buffers for the receive queue
  *  @hw: pointer to the hardware structure
  **/
-static i40e_status i40e_alloc_arq_bufs(struct i40e_hw *hw)
+static iavf_status i40e_alloc_arq_bufs(struct i40e_hw *hw)
 {
-   i40e_status ret_code;
+   iavf_status ret_code;
struct i40e_aq_desc *desc;
struct i40e_dma_mem *bi;
int i;
@@ -115,7 +115,7 @@ static i40e_status i40e_alloc_arq_bufs(struct i40e_hw *hw)
 
/* buffer_info structures do not need alignment */
ret_code = i40e_allocate_virt_mem(hw, &hw->aq.arq.dma_head,
-   (hw->aq.num_arq_entries * sizeof(struct i40e_dma_mem)));
+ (hw->aq.num_arq_entries * sizeof(struct i40e_dma_mem)));
if (ret_code)
goto alloc_arq_bufs;
hw->aq.arq.r.arq_bi = (struct i40e_dma_mem *)hw->aq.arq.dma_head.va;
@@ -169,15 +169,15 @@ static i40e_status i40e_alloc_arq_bufs(struct i40e_hw *hw)
  *  i40e_alloc_asq_bufs - Allocate empty buffer structs for the send queue
  *  @hw: pointer to the hardware structure
  **/
-static i40e_status i40e_alloc_asq_bufs(struct i40e_hw *hw)
+static iavf_status i40e_alloc_asq_bufs(struct i40e_hw *hw)
 {
-   i40e_status ret_code;
+   iavf_status ret_code;
struct i40e_dma_mem *bi;
int i;
 
/* No mapped memory needed yet, just the buffer info structures */
ret_code = i40e_allocate_virt_mem(hw, &hw->aq.asq.dma_head,
-   (hw->aq.num_asq_entries * sizeof(struct i40e_dma_mem)));
+ (hw->aq.num_asq_entries * sizeof(struct i40e_dma_mem)));
if (ret_code)
goto alloc_asq_bufs;
hw->aq.asq.r.asq_bi = (struct i40e_dma_mem *)hw->aq.asq.dma_head.va;
@@ -253,9 +253,9 @@ static void i40e_free_asq_bufs(struct i40e_hw *hw)
  *
  *  Configure base address and length registers for the transmit queue
  **/
-static i40e_status i40e_config_asq_regs(struct i40e_hw *hw)
+static iavf_status i40e_config_asq_regs(struct i40e_hw *hw)
 {
-   i40e_status ret_code = 0;
+   iavf_status ret_code = 0;
u32 reg = 0;
 
/* Clear Head and Tail */
@@ -282,9 +282,9 @@ static i40e_status i40e_config_asq_regs(struct i40e_hw *hw)
  *
  * Configure base address and length registers for the receive (event queue)
  **/
-static i40e_status i40e_config_arq_regs(struct i40e_hw *hw)
+static iavf_status i40e_config_arq_regs(struct i40e_hw *hw)
 {
-   i40e_status ret_code = 0;
+   iavf_status ret_code = 0;
u32 reg = 0;
 
/* Clear Head and Tail */
@@ -321,9 +321,9 @@ static i40e_status i40e_config_arq_regs(struct i40e_hw *hw)
  *  Do *NOT* hold the lock when calling this as the memory

[PATCH net-next v2 09/14] iavf: rename i40e_hw to iavf_hw

2018-09-14 Thread Jesse Brandeburg

Fix up the i40e_hw names to new name, including versions
inside other strings.

Signed-off-by: Jesse Brandeburg 
---
 drivers/net/ethernet/intel/iavf/i40e_adminq.c| 42 +++
 drivers/net/ethernet/intel/iavf/i40e_alloc.h | 21 +++-
 drivers/net/ethernet/intel/iavf/i40e_common.c| 30 +--
 drivers/net/ethernet/intel/iavf/i40e_prototype.h | 65 +++-
 drivers/net/ethernet/intel/iavf/i40e_type.h  | 10 ++--
 drivers/net/ethernet/intel/iavf/iavf.h   |  2 +-
 drivers/net/ethernet/intel/iavf/iavf_main.c  | 52 +--
 drivers/net/ethernet/intel/iavf/iavf_txrx.c  |  2 +-
 drivers/net/ethernet/intel/iavf/iavf_virtchnl.c  |  6 +--
 9 files changed, 111 insertions(+), 119 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/i40e_adminq.c b/drivers/net/ethernet/intel/iavf/i40e_adminq.c
index 69dfdfd69796..480c3e8c38c8 100644
--- a/drivers/net/ethernet/intel/iavf/i40e_adminq.c
+++ b/drivers/net/ethernet/intel/iavf/i40e_adminq.c
@@ -13,7 +13,7 @@
  *
  *  This assumes the alloc_asq and alloc_arq functions have already been called
  **/
-static void i40e_adminq_init_regs(struct i40e_hw *hw)
+static void i40e_adminq_init_regs(struct iavf_hw *hw)
 {
/* set head and tail registers in our local struct */
hw->aq.asq.tail = IAVF_VF_ATQT1;
@@ -32,7 +32,7 @@ static void i40e_adminq_init_regs(struct i40e_hw *hw)
  *  i40e_alloc_adminq_asq_ring - Allocate Admin Queue send rings
  *  @hw: pointer to the hardware structure
  **/
-static iavf_status i40e_alloc_adminq_asq_ring(struct i40e_hw *hw)
+static iavf_status i40e_alloc_adminq_asq_ring(struct iavf_hw *hw)
 {
iavf_status ret_code;
 
@@ -59,7 +59,7 @@ static iavf_status i40e_alloc_adminq_asq_ring(struct i40e_hw *hw)
  *  i40e_alloc_adminq_arq_ring - Allocate Admin Queue receive rings
  *  @hw: pointer to the hardware structure
  **/
-static iavf_status i40e_alloc_adminq_arq_ring(struct i40e_hw *hw)
+static iavf_status i40e_alloc_adminq_arq_ring(struct iavf_hw *hw)
 {
iavf_status ret_code;
 
@@ -79,7 +79,7 @@ static iavf_status i40e_alloc_adminq_arq_ring(struct i40e_hw *hw)
  *  This assumes the posted send buffers have already been cleaned
  *  and de-allocated
  **/
-static void i40e_free_adminq_asq(struct i40e_hw *hw)
+static void i40e_free_adminq_asq(struct iavf_hw *hw)
 {
i40e_free_dma_mem(hw, &hw->aq.asq.desc_buf);
 }
@@ -91,7 +91,7 @@ static void i40e_free_adminq_asq(struct i40e_hw *hw)
  *  This assumes the posted receive buffers have already been cleaned
  *  and de-allocated
  **/
-static void i40e_free_adminq_arq(struct i40e_hw *hw)
+static void i40e_free_adminq_arq(struct iavf_hw *hw)
 {
i40e_free_dma_mem(hw, &hw->aq.arq.desc_buf);
 }
@@ -100,7 +100,7 @@ static void i40e_free_adminq_arq(struct i40e_hw *hw)
  *  i40e_alloc_arq_bufs - Allocate pre-posted buffers for the receive queue
  *  @hw: pointer to the hardware structure
  **/
-static iavf_status i40e_alloc_arq_bufs(struct i40e_hw *hw)
+static iavf_status i40e_alloc_arq_bufs(struct iavf_hw *hw)
 {
iavf_status ret_code;
struct i40e_aq_desc *desc;
@@ -167,7 +167,7 @@ static iavf_status i40e_alloc_arq_bufs(struct i40e_hw *hw)
  *  i40e_alloc_asq_bufs - Allocate empty buffer structs for the send queue
  *  @hw: pointer to the hardware structure
  **/
-static iavf_status i40e_alloc_asq_bufs(struct i40e_hw *hw)
+static iavf_status i40e_alloc_asq_bufs(struct iavf_hw *hw)
 {
iavf_status ret_code;
struct i40e_dma_mem *bi;
@@ -207,7 +207,7 @@ static iavf_status i40e_alloc_asq_bufs(struct i40e_hw *hw)
  *  i40e_free_arq_bufs - Free receive queue buffer info elements
  *  @hw: pointer to the hardware structure
  **/
-static void i40e_free_arq_bufs(struct i40e_hw *hw)
+static void i40e_free_arq_bufs(struct iavf_hw *hw)
 {
int i;
 
@@ -226,7 +226,7 @@ static void i40e_free_arq_bufs(struct i40e_hw *hw)
  *  i40e_free_asq_bufs - Free send queue buffer info elements
  *  @hw: pointer to the hardware structure
  **/
-static void i40e_free_asq_bufs(struct i40e_hw *hw)
+static void i40e_free_asq_bufs(struct iavf_hw *hw)
 {
int i;
 
@@ -251,7 +251,7 @@ static void i40e_free_asq_bufs(struct i40e_hw *hw)
  *
  *  Configure base address and length registers for the transmit queue
  **/
-static iavf_status i40e_config_asq_regs(struct i40e_hw *hw)
+static iavf_status i40e_config_asq_regs(struct iavf_hw *hw)
 {
iavf_status ret_code = 0;
u32 reg = 0;
@@ -280,7 +280,7 @@ static iavf_status i40e_config_asq_regs(struct i40e_hw *hw)
  *
  * Configure base address and length registers for the receive (event queue)
  **/
-static iavf_status i40e_config_arq_regs(struct i40e_hw *hw)
+static iavf_status i40e_config_arq_regs(struct iavf_hw *hw)
 {
iavf_status ret_code = 0;
u32 reg = 0;
@@ -319,7 +319,7 @@ static iavf_status i40e_config_arq_regs(struct i40e_hw *hw)
  *  Do *NOT* hold the lock when calling this as the memory allocation routines
  *

[PATCH net-next v2 06/14] iavf: remove references to old names

2018-09-14 Thread Jesse Brandeburg

Remove the register name references to I40E_VF* and change to
IAVF_VF. Update the descriptor names and defines to the IAVF
name.

Signed-off-by: Jesse Brandeburg 
---
 drivers/net/ethernet/intel/iavf/i40e_adminq.c   |  28 ++--
 drivers/net/ethernet/intel/iavf/i40e_common.c   |   2 +-
 drivers/net/ethernet/intel/iavf/i40e_osdep.h|   2 +-
 drivers/net/ethernet/intel/iavf/i40e_register.h | 128 +-
 drivers/net/ethernet/intel/iavf/i40e_type.h | 170 
 drivers/net/ethernet/intel/iavf/iavf.h  |  10 +-
 drivers/net/ethernet/intel/iavf/iavf_main.c |  92 ++---
 drivers/net/ethernet/intel/iavf/iavf_txrx.c | 104 +++
 drivers/net/ethernet/intel/iavf/iavf_txrx.h |   2 +-
 9 files changed, 267 insertions(+), 271 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/i40e_adminq.c b/drivers/net/ethernet/intel/iavf/i40e_adminq.c
index f0e6f9bbb819..50e0f1225298 100644
--- a/drivers/net/ethernet/intel/iavf/i40e_adminq.c
+++ b/drivers/net/ethernet/intel/iavf/i40e_adminq.c
@@ -17,16 +17,16 @@ static void i40e_adminq_init_regs(struct i40e_hw *hw)
 {
/* set head and tail registers in our local struct */
if (i40e_is_vf(hw)) {
-   hw->aq.asq.tail = I40E_VF_ATQT1;
-   hw->aq.asq.head = I40E_VF_ATQH1;
-   hw->aq.asq.len  = I40E_VF_ATQLEN1;
-   hw->aq.asq.bal  = I40E_VF_ATQBAL1;
-   hw->aq.asq.bah  = I40E_VF_ATQBAH1;
-   hw->aq.arq.tail = I40E_VF_ARQT1;
-   hw->aq.arq.head = I40E_VF_ARQH1;
-   hw->aq.arq.len  = I40E_VF_ARQLEN1;
-   hw->aq.arq.bal  = I40E_VF_ARQBAL1;
-   hw->aq.arq.bah  = I40E_VF_ARQBAH1;
+   hw->aq.asq.tail = IAVF_VF_ATQT1;
+   hw->aq.asq.head = IAVF_VF_ATQH1;
+   hw->aq.asq.len  = IAVF_VF_ATQLEN1;
+   hw->aq.asq.bal  = IAVF_VF_ATQBAL1;
+   hw->aq.asq.bah  = IAVF_VF_ATQBAH1;
+   hw->aq.arq.tail = IAVF_VF_ARQT1;
+   hw->aq.arq.head = IAVF_VF_ARQH1;
+   hw->aq.arq.len  = IAVF_VF_ARQLEN1;
+   hw->aq.arq.bal  = IAVF_VF_ARQBAL1;
+   hw->aq.arq.bah  = IAVF_VF_ARQBAH1;
}
 }
 
@@ -264,7 +264,7 @@ static iavf_status i40e_config_asq_regs(struct i40e_hw *hw)
 
/* set starting point */
wr32(hw, hw->aq.asq.len, (hw->aq.num_asq_entries |
- I40E_VF_ATQLEN1_ATQENABLE_MASK));
+ IAVF_VF_ATQLEN1_ATQENABLE_MASK));
wr32(hw, hw->aq.asq.bal, lower_32_bits(hw->aq.asq.desc_buf.pa));
wr32(hw, hw->aq.asq.bah, upper_32_bits(hw->aq.asq.desc_buf.pa));
 
@@ -293,7 +293,7 @@ static iavf_status i40e_config_arq_regs(struct i40e_hw *hw)
 
/* set starting point */
wr32(hw, hw->aq.arq.len, (hw->aq.num_arq_entries |
- I40E_VF_ARQLEN1_ARQENABLE_MASK));
+ IAVF_VF_ARQLEN1_ARQENABLE_MASK));
wr32(hw, hw->aq.arq.bal, lower_32_bits(hw->aq.arq.desc_buf.pa));
wr32(hw, hw->aq.arq.bah, upper_32_bits(hw->aq.arq.desc_buf.pa));
 
@@ -800,7 +800,7 @@ iavf_status iavf_asq_send_command(struct i40e_hw *hw, struct i40e_aq_desc *desc,
/* update the error if time out occurred */
if ((!cmd_completed) &&
(!details->async && !details->postpone)) {
-   if (rd32(hw, hw->aq.asq.len) & I40E_VF_ATQLEN1_ATQCRIT_MASK) {
+   if (rd32(hw, hw->aq.asq.len) & IAVF_VF_ATQLEN1_ATQCRIT_MASK) {
i40e_debug(hw, I40E_DEBUG_AQ_MESSAGE,
   "AQTX: AQ Critical error.\n");
status = I40E_ERR_ADMIN_QUEUE_CRITICAL_ERROR;
@@ -868,7 +868,7 @@ iavf_status iavf_clean_arq_element(struct i40e_hw *hw,
}
 
/* set next_to_use to head */
-   ntu = rd32(hw, hw->aq.arq.head) & I40E_VF_ARQH1_ARQH_MASK;
+   ntu = rd32(hw, hw->aq.arq.head) & IAVF_VF_ARQH1_ARQH_MASK;
if (ntu == ntc) {
/* nothing to do - shouldn't need to update ring's values */
ret_code = I40E_ERR_ADMIN_QUEUE_NO_WORK;
diff --git a/drivers/net/ethernet/intel/iavf/i40e_common.c b/drivers/net/ethernet/intel/iavf/i40e_common.c
index 96133efddf72..733e5cfeaf71 100644
--- a/drivers/net/ethernet/intel/iavf/i40e_common.c
+++ b/drivers/net/ethernet/intel/iavf/i40e_common.c
@@ -335,7 +335,7 @@ bool iavf_check_asq_alive(struct i40e_hw *hw)
 {
if (hw->aq.asq.len)
return !!(rd32(hw, hw->aq.asq.len) &
- I40E_VF_ATQLEN1_ATQENABLE_MASK);
+ IAVF_VF_ATQLEN1_ATQENABLE_MASK);
else
return false;
 }
diff --git a/drivers/net/ethernet/intel/iavf/i40e_osdep.h b/drivers/net/ethernet/intel/iavf/i40e_osdep.h
index 788a599dc26b..0fceb284e54a 100644
--- a/drivers/net/ethernet/intel/iavf/i40e_osdep.h
+++ b/drivers/net/ethernet/intel/iavf/i40e_osdep.h
@@

[PATCH net-next v2 08/14] iavf: rename I40E_ADMINQ_DESC

2018-09-14 Thread Jesse Brandeburg

Take care of some renames containing I40E_ADMINQ_DESC.

Signed-off-by: Jesse Brandeburg 
---
 drivers/net/ethernet/intel/iavf/i40e_adminq.c | 18 +-
 drivers/net/ethernet/intel/iavf/i40e_adminq.h |  4 ++--
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/i40e_adminq.c b/drivers/net/ethernet/intel/iavf/i40e_adminq.c
index 8110b92fa2b0..69dfdfd69796 100644
--- a/drivers/net/ethernet/intel/iavf/i40e_adminq.c
+++ b/drivers/net/ethernet/intel/iavf/i40e_adminq.c
@@ -40,7 +40,7 @@ static iavf_status i40e_alloc_adminq_asq_ring(struct i40e_hw *hw)
 i40e_mem_atq_ring,
 (hw->aq.num_asq_entries *
 sizeof(struct i40e_aq_desc)),
-I40E_ADMINQ_DESC_ALIGNMENT);
+IAVF_ADMINQ_DESC_ALIGNMENT);
if (ret_code)
return ret_code;
 
@@ -67,7 +67,7 @@ static iavf_status i40e_alloc_adminq_arq_ring(struct i40e_hw *hw)
 i40e_mem_arq_ring,
 (hw->aq.num_arq_entries *
 sizeof(struct i40e_aq_desc)),
-I40E_ADMINQ_DESC_ALIGNMENT);
+IAVF_ADMINQ_DESC_ALIGNMENT);
 
return ret_code;
 }
@@ -124,12 +124,12 @@ static iavf_status i40e_alloc_arq_bufs(struct i40e_hw *hw)
ret_code = i40e_allocate_dma_mem(hw, bi,
 i40e_mem_arq_buf,
 hw->aq.arq_buf_size,
-I40E_ADMINQ_DESC_ALIGNMENT);
+IAVF_ADMINQ_DESC_ALIGNMENT);
if (ret_code)
goto unwind_alloc_arq_bufs;
 
/* now configure the descriptors for use */
-   desc = I40E_ADMINQ_DESC(hw->aq.arq, i);
+   desc = IAVF_ADMINQ_DESC(hw->aq.arq, i);
 
desc->flags = cpu_to_le16(I40E_AQ_FLAG_BUF);
if (hw->aq.arq_buf_size > I40E_AQ_LARGE_BUF)
@@ -186,7 +186,7 @@ static iavf_status i40e_alloc_asq_bufs(struct i40e_hw *hw)
ret_code = i40e_allocate_dma_mem(hw, bi,
 i40e_mem_asq_buf,
 hw->aq.asq_buf_size,
-I40E_ADMINQ_DESC_ALIGNMENT);
+IAVF_ADMINQ_DESC_ALIGNMENT);
if (ret_code)
goto unwind_alloc_asq_bufs;
}
@@ -574,7 +574,7 @@ static u16 i40e_clean_asq(struct i40e_hw *hw)
struct i40e_aq_desc desc_cb;
struct i40e_aq_desc *desc;
 
-   desc = I40E_ADMINQ_DESC(*asq, ntc);
+   desc = IAVF_ADMINQ_DESC(*asq, ntc);
details = I40E_ADMINQ_DETAILS(*asq, ntc);
while (rd32(hw, hw->aq.asq.head) != ntc) {
i40e_debug(hw, I40E_DEBUG_AQ_MESSAGE,
@@ -592,7 +592,7 @@ static u16 i40e_clean_asq(struct i40e_hw *hw)
ntc++;
if (ntc == asq->count)
ntc = 0;
-   desc = I40E_ADMINQ_DESC(*asq, ntc);
+   desc = IAVF_ADMINQ_DESC(*asq, ntc);
details = I40E_ADMINQ_DETAILS(*asq, ntc);
}
 
@@ -714,7 +714,7 @@ iavf_status iavf_asq_send_command(struct i40e_hw *hw, struct i40e_aq_desc *desc,
}
 
/* initialize the temp desc pointer with the right desc */
-   desc_on_ring = I40E_ADMINQ_DESC(hw->aq.asq, hw->aq.asq.next_to_use);
+   desc_on_ring = IAVF_ADMINQ_DESC(hw->aq.asq, hw->aq.asq.next_to_use);
 
/* if the desc is available copy the temp desc to the right place */
*desc_on_ring = *desc;
@@ -874,7 +874,7 @@ iavf_status iavf_clean_arq_element(struct i40e_hw *hw,
}
 
/* now clean the next descriptor */
-   desc = I40E_ADMINQ_DESC(hw->aq.arq, ntc);
+   desc = IAVF_ADMINQ_DESC(hw->aq.arq, ntc);
desc_idx = ntc;
 
hw->aq.arq_last_status =
diff --git a/drivers/net/ethernet/intel/iavf/i40e_adminq.h b/drivers/net/ethernet/intel/iavf/i40e_adminq.h
index 80b70a65028f..fd162a293c38 100644
--- a/drivers/net/ethernet/intel/iavf/i40e_adminq.h
+++ b/drivers/net/ethernet/intel/iavf/i40e_adminq.h
@@ -8,10 +8,10 @@
 #include "i40e_status.h"
 #include "i40e_adminq_cmd.h"
 
-#define I40E_ADMINQ_DESC(R, i)   \
+#define IAVF_ADMINQ_DESC(R, i)   \
(&(((struct i40e_aq_desc *)((R).desc_buf.va))[i]))
 
-#define I40E_ADMINQ_DESC_ALIGNMENT 4096
+#define IAVF_ADMINQ_DESC_ALIGNMENT 4096
 
 struct i40e_adminq_ring {
struct i40e_virt_mem dma_head;  /* space for dma structures */
-- 
2.14.4

[PATCH net-next v2 07/14] iavf: rename device ID defines

2018-09-14 Thread Jesse Brandeburg

Rename the device ID defines to have IAVF in them
and remove all the unused defines.

Signed-off-by: Jesse Brandeburg 
---
 drivers/net/ethernet/intel/iavf/i40e_adminq.c | 22 +++
 drivers/net/ethernet/intel/iavf/i40e_common.c | 29 +++
 drivers/net/ethernet/intel/iavf/i40e_devids.h | 40 ++-
 drivers/net/ethernet/intel/iavf/i40e_type.h   |  6 
 drivers/net/ethernet/intel/iavf/iavf_main.c   |  8 +++---
 5 files changed, 27 insertions(+), 78 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/i40e_adminq.c b/drivers/net/ethernet/intel/iavf/i40e_adminq.c
index 50e0f1225298..8110b92fa2b0 100644
--- a/drivers/net/ethernet/intel/iavf/i40e_adminq.c
+++ b/drivers/net/ethernet/intel/iavf/i40e_adminq.c
@@ -16,18 +16,16 @@
 static void i40e_adminq_init_regs(struct i40e_hw *hw)
 {
/* set head and tail registers in our local struct */
-   if (i40e_is_vf(hw)) {
-   hw->aq.asq.tail = IAVF_VF_ATQT1;
-   hw->aq.asq.head = IAVF_VF_ATQH1;
-   hw->aq.asq.len  = IAVF_VF_ATQLEN1;
-   hw->aq.asq.bal  = IAVF_VF_ATQBAL1;
-   hw->aq.asq.bah  = IAVF_VF_ATQBAH1;
-   hw->aq.arq.tail = IAVF_VF_ARQT1;
-   hw->aq.arq.head = IAVF_VF_ARQH1;
-   hw->aq.arq.len  = IAVF_VF_ARQLEN1;
-   hw->aq.arq.bal  = IAVF_VF_ARQBAL1;
-   hw->aq.arq.bah  = IAVF_VF_ARQBAH1;
-   }
+   hw->aq.asq.tail = IAVF_VF_ATQT1;
+   hw->aq.asq.head = IAVF_VF_ATQH1;
+   hw->aq.asq.len  = IAVF_VF_ATQLEN1;
+   hw->aq.asq.bal  = IAVF_VF_ATQBAL1;
+   hw->aq.asq.bah  = IAVF_VF_ATQBAH1;
+   hw->aq.arq.tail = IAVF_VF_ARQT1;
+   hw->aq.arq.head = IAVF_VF_ARQH1;
+   hw->aq.arq.len  = IAVF_VF_ARQLEN1;
+   hw->aq.arq.bal  = IAVF_VF_ARQBAL1;
+   hw->aq.arq.bah  = IAVF_VF_ARQBAH1;
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/iavf/i40e_common.c b/drivers/net/ethernet/intel/iavf/i40e_common.c
index 733e5cfeaf71..b97e8925d20e 100644
--- a/drivers/net/ethernet/intel/iavf/i40e_common.c
+++ b/drivers/net/ethernet/intel/iavf/i40e_common.c
@@ -19,33 +19,12 @@ iavf_status i40e_set_mac_type(struct i40e_hw *hw)
 
if (hw->vendor_id == PCI_VENDOR_ID_INTEL) {
switch (hw->device_id) {
-   case I40E_DEV_ID_SFP_XL710:
-   case I40E_DEV_ID_QEMU:
-   case I40E_DEV_ID_KX_B:
-   case I40E_DEV_ID_KX_C:
-   case I40E_DEV_ID_QSFP_A:
-   case I40E_DEV_ID_QSFP_B:
-   case I40E_DEV_ID_QSFP_C:
-   case I40E_DEV_ID_10G_BASE_T:
-   case I40E_DEV_ID_10G_BASE_T4:
-   case I40E_DEV_ID_20G_KR2:
-   case I40E_DEV_ID_20G_KR2_A:
-   case I40E_DEV_ID_25G_B:
-   case I40E_DEV_ID_25G_SFP28:
-   hw->mac.type = I40E_MAC_XL710;
-   break;
-   case I40E_DEV_ID_SFP_X722:
-   case I40E_DEV_ID_1G_BASE_T_X722:
-   case I40E_DEV_ID_10G_BASE_T_X722:
-   case I40E_DEV_ID_SFP_I_X722:
-   hw->mac.type = I40E_MAC_X722;
-   break;
-   case I40E_DEV_ID_X722_VF:
+   case IAVF_DEV_ID_X722_VF:
hw->mac.type = I40E_MAC_X722_VF;
break;
-   case I40E_DEV_ID_VF:
-   case I40E_DEV_ID_VF_HV:
-   case I40E_DEV_ID_ADAPTIVE_VF:
+   case IAVF_DEV_ID_VF:
+   case IAVF_DEV_ID_VF_HV:
+   case IAVF_DEV_ID_ADAPTIVE_VF:
hw->mac.type = I40E_MAC_VF;
break;
default:
diff --git a/drivers/net/ethernet/intel/iavf/i40e_devids.h 
b/drivers/net/ethernet/intel/iavf/i40e_devids.h
index f300bf271824..8eb7b697e96c 100644
--- a/drivers/net/ethernet/intel/iavf/i40e_devids.h
+++ b/drivers/net/ethernet/intel/iavf/i40e_devids.h
@@ -1,34 +1,12 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 /* Copyright(c) 2013 - 2018 Intel Corporation. */
 
-#ifndef _I40E_DEVIDS_H_
-#define _I40E_DEVIDS_H_
-
-/* Device IDs */
-#define I40E_DEV_ID_SFP_XL710  0x1572
-#define I40E_DEV_ID_QEMU   0x1574
-#define I40E_DEV_ID_KX_B   0x1580
-#define I40E_DEV_ID_KX_C   0x1581
-#define I40E_DEV_ID_QSFP_A 0x1583
-#define I40E_DEV_ID_QSFP_B 0x1584
-#define I40E_DEV_ID_QSFP_C 0x1585
-#define I40E_DEV_ID_10G_BASE_T 0x1586
-#define I40E_DEV_ID_20G_KR2 0x1587
-#define I40E_DEV_ID_20G_KR2_A  0x1588
-#define I40E_DEV_ID_10G_BASE_T4 0x1589
-#define I40E_DEV_ID_25G_B  0x158A
-#define I40E_DEV_ID_25G_SFP28  0x158B
-#define I40E_DEV_ID_VF 0x154C
-#define I40E_DEV_ID_VF_HV  0x1571
-#define I40E_DEV_ID_ADAPTIVE_VF 0x1889
-#define I40E_DEV_ID_SFP_X722   0x37D0
-#define
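
For readers following the i40e_set_mac_type() hunk above, the trimmed switch can be approximated in userspace as below. This is only a sketch: the SK_ names, the enum, and sketch_set_mac_type() are hypothetical, the VF, VF_HV and ADAPTIVE_VF values are copied from the removed I40E_ defines on the assumption that the IAVF_ names keep the same numbers, and the X722 VF value (0x37CD) is an assumption because the header is cut off before that define. The default case is also a guess at the fallback behaviour.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_VENDOR_ID_INTEL	0x8086

/* Copied from the removed I40E_DEV_ID_* defines shown in the diff. */
#define SK_DEV_ID_VF		0x154C
#define SK_DEV_ID_VF_HV		0x1571
#define SK_DEV_ID_ADAPTIVE_VF	0x1889
/* Assumption: this value is not visible in the truncated diff. */
#define SK_DEV_ID_X722_VF	0x37CD

/* Hypothetical stand-ins for the driver's MAC type enum. */
enum sk_mac_type {
	SK_MAC_UNKNOWN = 0,
	SK_MAC_VF,
	SK_MAC_X722_VF,
	SK_MAC_GENERIC,
};

/* After the patch only VF device IDs remain in the switch. */
static enum sk_mac_type sketch_set_mac_type(uint16_t vendor_id,
					    uint16_t device_id)
{
	if (vendor_id != PCI_VENDOR_ID_INTEL)
		return SK_MAC_UNKNOWN;

	switch (device_id) {
	case SK_DEV_ID_X722_VF:
		return SK_MAC_X722_VF;
	case SK_DEV_ID_VF:
	case SK_DEV_ID_VF_HV:
	case SK_DEV_ID_ADAPTIVE_VF:
		return SK_MAC_VF;
	default:
		/* Assumed fallback for any other Intel device ID. */
		return SK_MAC_GENERIC;
	}
}
```

A former PF ID such as 0x1572 (SFP_XL710) now falls through to the default case in this sketch, which is the practical effect of dropping the PF cases from a VF-only driver.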

[PATCH net-next v2 10/14] iavf: replace i40e_debug with iavf version

2018-09-14 Thread Jesse Brandeburg

Rename another string: replace i40e_debug with iavf_debug.

Signed-off-by: Jesse Brandeburg 
---
 drivers/net/ethernet/intel/iavf/i40e_adminq.c| 28 
 drivers/net/ethernet/intel/iavf/i40e_common.c| 12 +-
 drivers/net/ethernet/intel/iavf/i40e_osdep.h |  2 +-
 drivers/net/ethernet/intel/iavf/i40e_prototype.h |  2 +-
 drivers/net/ethernet/intel/iavf/i40e_type.h  |  2 +-
 5 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/i40e_adminq.c 
b/drivers/net/ethernet/intel/iavf/i40e_adminq.c
index 480c3e8c38c8..d614722fbb3d 100644
--- a/drivers/net/ethernet/intel/iavf/i40e_adminq.c
+++ b/drivers/net/ethernet/intel/iavf/i40e_adminq.c
@@ -577,7 +577,7 @@ static u16 i40e_clean_asq(struct iavf_hw *hw)
desc = IAVF_ADMINQ_DESC(*asq, ntc);
details = I40E_ADMINQ_DETAILS(*asq, ntc);
while (rd32(hw, hw->aq.asq.head) != ntc) {
-   i40e_debug(hw, I40E_DEBUG_AQ_MESSAGE,
+   iavf_debug(hw, I40E_DEBUG_AQ_MESSAGE,
   "ntc %d head %d.\n", ntc, rd32(hw, hw->aq.asq.head));
 
if (details->callback) {
@@ -643,7 +643,7 @@ iavf_status iavf_asq_send_command(struct iavf_hw *hw, 
struct i40e_aq_desc *desc,
mutex_lock(&hw->aq.asq_mutex);
 
if (hw->aq.asq.count == 0) {
-   i40e_debug(hw, I40E_DEBUG_AQ_MESSAGE,
+   iavf_debug(hw, I40E_DEBUG_AQ_MESSAGE,
   "AQTX: Admin queue not initialized.\n");
status = I40E_ERR_QUEUE_EMPTY;
goto asq_send_command_error;
@@ -653,7 +653,7 @@ iavf_status iavf_asq_send_command(struct iavf_hw *hw, 
struct i40e_aq_desc *desc,
 
val = rd32(hw, hw->aq.asq.head);
if (val >= hw->aq.num_asq_entries) {
-   i40e_debug(hw, I40E_DEBUG_AQ_MESSAGE,
+   iavf_debug(hw, I40E_DEBUG_AQ_MESSAGE,
   "AQTX: head overrun at %d\n", val);
status = I40E_ERR_QUEUE_EMPTY;
goto asq_send_command_error;
@@ -682,7 +682,7 @@ iavf_status iavf_asq_send_command(struct iavf_hw *hw, 
struct i40e_aq_desc *desc,
desc->flags |= cpu_to_le16(details->flags_ena);
 
if (buff_size > hw->aq.asq_buf_size) {
-   i40e_debug(hw,
+   iavf_debug(hw,
   I40E_DEBUG_AQ_MESSAGE,
   "AQTX: Invalid buffer size: %d.\n",
   buff_size);
@@ -691,7 +691,7 @@ iavf_status iavf_asq_send_command(struct iavf_hw *hw, 
struct i40e_aq_desc *desc,
}
 
if (details->postpone && !details->async) {
-   i40e_debug(hw,
+   iavf_debug(hw,
   I40E_DEBUG_AQ_MESSAGE,
   "AQTX: Async flag not set along with postpone flag");
status = I40E_ERR_PARAM;
@@ -706,7 +706,7 @@ iavf_status iavf_asq_send_command(struct iavf_hw *hw, 
struct i40e_aq_desc *desc,
 * in case of asynchronous completions
 */
if (i40e_clean_asq(hw) == 0) {
-   i40e_debug(hw,
+   iavf_debug(hw,
   I40E_DEBUG_AQ_MESSAGE,
   "AQTX: Error queue is full.\n");
status = I40E_ERR_ADMIN_QUEUE_FULL;
@@ -736,7 +736,7 @@ iavf_status iavf_asq_send_command(struct iavf_hw *hw, 
struct i40e_aq_desc *desc,
}
 
/* bump the tail */
-   i40e_debug(hw, I40E_DEBUG_AQ_MESSAGE, "AQTX: desc and buffer:\n");
+   iavf_debug(hw, I40E_DEBUG_AQ_MESSAGE, "AQTX: desc and buffer:\n");
iavf_debug_aq(hw, I40E_DEBUG_AQ_COMMAND, (void *)desc_on_ring,
  buff, buff_size);
(hw->aq.asq.next_to_use)++;
@@ -769,7 +769,7 @@ iavf_status iavf_asq_send_command(struct iavf_hw *hw, 
struct i40e_aq_desc *desc,
memcpy(buff, dma_buff->va, buff_size);
retval = le16_to_cpu(desc->retval);
if (retval != 0) {
-   i40e_debug(hw,
+   iavf_debug(hw,
   I40E_DEBUG_AQ_MESSAGE,
   "AQTX: Command completed with error 0x%X.\n",
   retval);
@@ -787,7 +787,7 @@ iavf_status iavf_asq_send_command(struct iavf_hw *hw, 
struct i40e_aq_desc *desc,
hw->aq.asq_last_status = (enum i40e_admin_queue_err)retval;
}
 
-   i40e_debug(hw, I40E_DEBUG_AQ_MESSAGE,
+   iavf_debug(hw, I40E_DEBUG_AQ_MESSAGE,
   "AQTX: desc and buffer writeback:\n");
iavf_debug_aq(hw, I40E_DEBUG_AQ_COMMAND, (void *)desc, buff, buff_size);
 
@@ -799,11 +799,11 @@ iavf_status iavf_asq_send_command(struct iavf_hw *hw, 
struct i40e_aq_desc *desc,
if ((!cmd_completed) &&
(!details->async && !details->postpone)) {
if (rd32(hw, hw->aq.asq.len) & IAVF_VF_ATQLEN1_ATQCRIT_MASK) {
-   i40e_debug(hw, I40E_DEBUG_AQ_MESSAGE,
+

[PATCH net-next v2 13/14] iavf: finish renaming files to iavf

2018-09-14 Thread Jesse Brandeburg

This finishes the process of renaming the files that
make sense to rename (skipping adminq related files that
talk to i40e), and fixes up the build and the #includes
so that everything builds nicely.

Signed-off-by: Jesse Brandeburg 
---
 drivers/net/ethernet/intel/iavf/Makefile  | 2 +-
 drivers/net/ethernet/intel/iavf/i40e_adminq.c | 8 
 drivers/net/ethernet/intel/iavf/i40e_adminq.h | 4 ++--
 drivers/net/ethernet/intel/iavf/iavf.h| 4 ++--
 drivers/net/ethernet/intel/iavf/{i40e_alloc.h => iavf_alloc.h}| 0
 drivers/net/ethernet/intel/iavf/iavf_client.c | 2 +-
 drivers/net/ethernet/intel/iavf/{i40e_common.c => iavf_common.c}  | 4 ++--
 drivers/net/ethernet/intel/iavf/{i40e_devids.h => iavf_devids.h}  | 0
 drivers/net/ethernet/intel/iavf/iavf_main.c   | 2 +-
 drivers/net/ethernet/intel/iavf/{i40e_osdep.h => iavf_osdep.h}| 0
 .../ethernet/intel/iavf/{i40e_prototype.h => iavf_prototype.h}| 4 ++--
 .../net/ethernet/intel/iavf/{i40e_register.h => iavf_register.h}  | 0
 drivers/net/ethernet/intel/iavf/{i40e_status.h => iavf_status.h}  | 0
 drivers/net/ethernet/intel/iavf/iavf_txrx.c   | 2 +-
 drivers/net/ethernet/intel/iavf/{i40e_type.h => iavf_type.h}  | 8 
 drivers/net/ethernet/intel/iavf/iavf_virtchnl.c   | 2 +-
 16 files changed, 21 insertions(+), 21 deletions(-)
 rename drivers/net/ethernet/intel/iavf/{i40e_alloc.h => iavf_alloc.h} (100%)
 rename drivers/net/ethernet/intel/iavf/{i40e_common.c => iavf_common.c} (99%)
 rename drivers/net/ethernet/intel/iavf/{i40e_devids.h => iavf_devids.h} (100%)
 rename drivers/net/ethernet/intel/iavf/{i40e_osdep.h => iavf_osdep.h} (100%)
 rename drivers/net/ethernet/intel/iavf/{i40e_prototype.h => iavf_prototype.h} 
(98%)
 rename drivers/net/ethernet/intel/iavf/{i40e_register.h => iavf_register.h} 
(100%)
 rename drivers/net/ethernet/intel/iavf/{i40e_status.h => iavf_status.h} (100%)
 rename drivers/net/ethernet/intel/iavf/{i40e_type.h => iavf_type.h} (99%)

diff --git a/drivers/net/ethernet/intel/iavf/Makefile 
b/drivers/net/ethernet/intel/iavf/Makefile
index fa4c43be2266..87ddfbac2f17 100644
--- a/drivers/net/ethernet/intel/iavf/Makefile
+++ b/drivers/net/ethernet/intel/iavf/Makefile
@@ -12,4 +12,4 @@ subdir-ccflags-y += -I$(src)
 obj-$(CONFIG_IAVF) += iavf.o
 
 iavf-objs := iavf_main.o iavf_ethtool.o iavf_virtchnl.o \
-iavf_txrx.o i40e_common.o i40e_adminq.o iavf_client.o
+iavf_txrx.o iavf_common.o i40e_adminq.o iavf_client.o
diff --git a/drivers/net/ethernet/intel/iavf/i40e_adminq.c 
b/drivers/net/ethernet/intel/iavf/i40e_adminq.c
index 8aa817808cd5..d2b165b610fa 100644
--- a/drivers/net/ethernet/intel/iavf/i40e_adminq.c
+++ b/drivers/net/ethernet/intel/iavf/i40e_adminq.c
@@ -1,11 +1,11 @@
 // SPDX-License-Identifier: GPL-2.0
 /* Copyright(c) 2013 - 2018 Intel Corporation. */
 
-#include "i40e_status.h"
-#include "i40e_type.h"
-#include "i40e_register.h"
+#include "iavf_status.h"
+#include "iavf_type.h"
+#include "iavf_register.h"
 #include "i40e_adminq.h"
-#include "i40e_prototype.h"
+#include "iavf_prototype.h"
 
 /**
  *  i40e_adminq_init_regs - Initialize AdminQ registers
diff --git a/drivers/net/ethernet/intel/iavf/i40e_adminq.h 
b/drivers/net/ethernet/intel/iavf/i40e_adminq.h
index e34625e25589..ee983889eab0 100644
--- a/drivers/net/ethernet/intel/iavf/i40e_adminq.h
+++ b/drivers/net/ethernet/intel/iavf/i40e_adminq.h
@@ -4,8 +4,8 @@
 #ifndef _IAVF_ADMINQ_H_
 #define _IAVF_ADMINQ_H_
 
-#include "i40e_osdep.h"
-#include "i40e_status.h"
+#include "iavf_osdep.h"
+#include "iavf_status.h"
 #include "i40e_adminq_cmd.h"
 
 #define IAVF_ADMINQ_DESC(R, i)   \
diff --git a/drivers/net/ethernet/intel/iavf/iavf.h 
b/drivers/net/ethernet/intel/iavf/iavf.h
index 1d973b4cd973..961c1a71b671 100644
--- a/drivers/net/ethernet/intel/iavf/iavf.h
+++ b/drivers/net/ethernet/intel/iavf/iavf.h
@@ -34,7 +34,7 @@
 #include 
 #include 
 
-#include "i40e_type.h"
+#include "iavf_type.h"
 #include 
 #include "iavf_txrx.h"
 
@@ -298,7 +298,7 @@ struct iavf_adapter {
struct net_device *netdev;
struct pci_dev *pdev;
 
-   struct iavf_hw hw; /* defined in i40e_type.h */
+   struct iavf_hw hw; /* defined in iavf_type.h */
 
enum iavf_state_t state;
unsigned long crit_section;
diff --git a/drivers/net/ethernet/intel/iavf/i40e_alloc.h 
b/drivers/net/ethernet/intel/iavf/iavf_alloc.h
similarity index 100%
rename from drivers/net/ethernet/intel/iavf/i40e_alloc.h
rename to drivers/net/ethernet/intel/iavf/iavf_alloc.h
diff --git a/drivers/net/ethernet/intel/iavf/iavf_client.c 
b/drivers/net/ethernet/intel/iavf/iavf_client.c
index f4c195a4167a..a0bfa6b9555e 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_client.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_client.c
@@ -5,7 +5,7 @@
 #include 
 
 #include "iavf.h"
-#include

[PATCH net-next v2 11/14] iavf: tracing infrastructure rename

2018-09-14 Thread Jesse Brandeburg

Rename the i40e_trace file and fix up all the callers
to the new names inside the iavf_trace.h file.

Signed-off-by: Jesse Brandeburg 
---
 drivers/net/ethernet/intel/iavf/iavf_main.c|  2 +-
 .../intel/iavf/{i40e_trace.h => iavf_trace.h}  | 28 +++---
 drivers/net/ethernet/intel/iavf/iavf_txrx.c| 14 +--
 3 files changed, 22 insertions(+), 22 deletions(-)
 rename drivers/net/ethernet/intel/iavf/{i40e_trace.h => iavf_trace.h} (85%)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c 
b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 63c5d97b1658..b8edf43e36f1 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -9,7 +9,7 @@
  * CREATE_TRACE_POINTS defined
  */
 #define CREATE_TRACE_POINTS
-#include "i40e_trace.h"
+#include "iavf_trace.h"
 
 static int iavf_setup_all_tx_resources(struct iavf_adapter *adapter);
 static int iavf_setup_all_rx_resources(struct iavf_adapter *adapter);
diff --git a/drivers/net/ethernet/intel/iavf/i40e_trace.h 
b/drivers/net/ethernet/intel/iavf/iavf_trace.h
similarity index 85%
rename from drivers/net/ethernet/intel/iavf/i40e_trace.h
rename to drivers/net/ethernet/intel/iavf/iavf_trace.h
index 552cfbfcce71..24f34d79f20a 100644
--- a/drivers/net/ethernet/intel/iavf/i40e_trace.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_trace.h
@@ -5,7 +5,7 @@
 
 /* The trace subsystem name for iavf will be "iavf".
  *
- * This file is named i40e_trace.h.
+ * This file is named iavf_trace.h.
  *
  * Since this include file's name is different from the trace
  * subsystem name, we'll have to define TRACE_INCLUDE_FILE at the end
@@ -23,14 +23,14 @@
 #include 
 
 /**
- * i40e_trace() macro enables shared code to refer to trace points
+ * iavf_trace() macro enables shared code to refer to trace points
  * like:
  *
- * trace_i40e{,vf}_example(args...)
+ * trace_iavf{,vf}_example(args...)
  *
  * ... as:
  *
- * i40e_trace(example, args...)
+ * iavf_trace(example, args...)
  *
  * ... to resolve to the PF or VF version of the tracepoint without
  * ifdefs, and to allow tracepoints to be disabled entirely at build
@@ -39,18 +39,18 @@
  * Trace point should always be referred to in the driver via this
  * macro.
  *
- * Similarly, i40e_trace_enabled(trace_name) wraps references to
- * trace_i40e{,vf}__enabled() functions.
+ * Similarly, iavf_trace_enabled(trace_name) wraps references to
+ * trace_iavf{,vf}__enabled() functions.
  */
-#define _I40E_TRACE_NAME(trace_name) (trace_ ## iavf ## _ ## trace_name)
-#define I40E_TRACE_NAME(trace_name) _I40E_TRACE_NAME(trace_name)
+#define _IAVF_TRACE_NAME(trace_name) (trace_ ## iavf ## _ ## trace_name)
+#define IAVF_TRACE_NAME(trace_name) _IAVF_TRACE_NAME(trace_name)
 
-#define i40e_trace(trace_name, args...) I40E_TRACE_NAME(trace_name)(args)
+#define iavf_trace(trace_name, args...) IAVF_TRACE_NAME(trace_name)(args)
 
-#define i40e_trace_enabled(trace_name) I40E_TRACE_NAME(trace_name##_enabled)()
+#define iavf_trace_enabled(trace_name) IAVF_TRACE_NAME(trace_name##_enabled)()
 
 /* Events common to PF and VF. Corresponding versions will be defined
- * for both, named trace_i40e_* and trace_iavf_*. The i40e_trace()
+ * for both, named trace_iavf_* and trace_iavf_*. The iavf_trace()
  * macro above will select the right trace point name for the driver
  * being built from shared code.
  */
@@ -195,8 +195,8 @@ DEFINE_EVENT(
 
 /* Events unique to the VF. */
 
-#endif /* _I40E_TRACE_H_ */
-/* This must be outside ifdef _I40E_TRACE_H */
+#endif /* _IAVF_TRACE_H_ */
+/* This must be outside ifdef _IAVF_TRACE_H */
 
 /* This trace include file is not located in the .../include/trace
  * with the kernel tracepoint definitions, because we're a loadable
@@ -205,5 +205,5 @@ DEFINE_EVENT(
 #undef TRACE_INCLUDE_PATH
 #define TRACE_INCLUDE_PATH .
 #undef TRACE_INCLUDE_FILE
-#define TRACE_INCLUDE_FILE i40e_trace
+#define TRACE_INCLUDE_FILE iavf_trace
 #include 
diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.c 
b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
index 66d9f1bf9467..5164e812f009 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
@@ -5,7 +5,7 @@
 #include 
 
 #include "iavf.h"
-#include "i40e_trace.h"
+#include "iavf_trace.h"
 #include "i40e_prototype.h"
 
 static inline __le64 build_ctob(u32 td_cmd, u32 td_offset, unsigned int size,
@@ -211,7 +211,7 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
/* prevent any other reads prior to eop_desc */
smp_rmb();
 
-   i40e_trace(clean_tx_irq, tx_ring, tx_desc, tx_buf);
+   iavf_trace(clean_tx_irq, tx_ring, tx_desc, tx_buf);
/* if the descriptor isn't done, no work yet to do */
if (!(eop_desc->cmd_type_offset_bsz &
  cpu_to_le64(IAVF_TX_DESC_DTYPE_DESC_DONE)))
@@ -239,7 +239,7 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
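
The token pasting behind the renamed iavf_trace() and iavf_trace_enabled() macros above can be tried out in plain userspace C. This is a sketch, not the driver code: the two trace_iavf_clean_tx_irq* functions are hypothetical stand-ins for what the tracepoint machinery would generate, the counter and sketch_trace_demo() exist only for the demonstration, and the GNU-style "args..." from the header is replaced here with the standard "..." / __VA_ARGS__ form.

```c
#include <assert.h>

/* Hypothetical stand-ins for generated tracepoint functions. */
static int clean_tx_irq_calls;

static void trace_iavf_clean_tx_irq(int ring, int desc)
{
	(void)ring;
	(void)desc;
	clean_tx_irq_calls++;
}

static int trace_iavf_clean_tx_irq_enabled(void)
{
	return 1;
}

/* Same two-level pasting as in iavf_trace.h: the outer macro lets its
 * argument expand first, the inner one pastes trace_ + iavf + _ + name. */
#define _IAVF_TRACE_NAME(trace_name) (trace_ ## iavf ## _ ## trace_name)
#define IAVF_TRACE_NAME(trace_name) _IAVF_TRACE_NAME(trace_name)

/* Standard variadic form used here instead of the GNU "args..." form. */
#define iavf_trace(trace_name, ...) IAVF_TRACE_NAME(trace_name)(__VA_ARGS__)

#define iavf_trace_enabled(trace_name) IAVF_TRACE_NAME(trace_name##_enabled)()

/* Calls the tracepoint through the macros, returns the running count. */
static int sketch_trace_demo(void)
{
	if (iavf_trace_enabled(clean_tx_irq))
		iavf_trace(clean_tx_irq, 1, 2);
	return clean_tx_irq_calls;
}
```

Here iavf_trace(clean_tx_irq, 1, 2) expands to (trace_iavf_clean_tx_irq)(1, 2), so callers never spell out the trace_iavf_ prefix themselves.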

[PATCH net-next v2 05/14] iavf: move i40evf files to new name

2018-09-14 Thread Jesse Brandeburg

Simply move the i40evf files to the new name, updating the #includes
to track the new names, and updating the Makefile as well.

A future patch will remove the i40e references (after the code
removal patches later in this series).

Signed-off-by: Jesse Brandeburg 
---
 drivers/net/ethernet/intel/iavf/Makefile  | 4 ++--
 drivers/net/ethernet/intel/iavf/{i40evf.h => iavf.h}  | 2 +-
 drivers/net/ethernet/intel/iavf/{i40evf_client.c => iavf_client.c}| 4 ++--
 drivers/net/ethernet/intel/iavf/{i40evf_client.h => iavf_client.h}| 0
 drivers/net/ethernet/intel/iavf/{i40evf_ethtool.c => iavf_ethtool.c}  | 2 +-
 drivers/net/ethernet/intel/iavf/{i40evf_main.c => iavf_main.c}| 4 ++--
 drivers/net/ethernet/intel/iavf/{i40e_txrx.c => iavf_txrx.c}  | 2 +-
 drivers/net/ethernet/intel/iavf/{i40e_txrx.h => iavf_txrx.h}  | 0
 .../net/ethernet/intel/iavf/{i40evf_virtchnl.c => iavf_virtchnl.c}| 4 ++--
 9 files changed, 11 insertions(+), 11 deletions(-)
 rename drivers/net/ethernet/intel/iavf/{i40evf.h => iavf.h} (99%)
 rename drivers/net/ethernet/intel/iavf/{i40evf_client.c => iavf_client.c} (99%)
 rename drivers/net/ethernet/intel/iavf/{i40evf_client.h => iavf_client.h} 
(100%)
 rename drivers/net/ethernet/intel/iavf/{i40evf_ethtool.c => iavf_ethtool.c} 
(99%)
 rename drivers/net/ethernet/intel/iavf/{i40evf_main.c => iavf_main.c} (99%)
 rename drivers/net/ethernet/intel/iavf/{i40e_txrx.c => iavf_txrx.c} (99%)
 rename drivers/net/ethernet/intel/iavf/{i40e_txrx.h => iavf_txrx.h} (100%)
 rename drivers/net/ethernet/intel/iavf/{i40evf_virtchnl.c => iavf_virtchnl.c} 
(99%)

diff --git a/drivers/net/ethernet/intel/iavf/Makefile 
b/drivers/net/ethernet/intel/iavf/Makefile
index ce2dce1e1ebf..fa4c43be2266 100644
--- a/drivers/net/ethernet/intel/iavf/Makefile
+++ b/drivers/net/ethernet/intel/iavf/Makefile
@@ -11,5 +11,5 @@ subdir-ccflags-y += -I$(src)
 
 obj-$(CONFIG_IAVF) += iavf.o
 
-iavf-objs := i40evf_main.o i40evf_ethtool.o i40evf_virtchnl.o \
-i40e_txrx.o i40e_common.o i40e_adminq.o i40evf_client.o
+iavf-objs := iavf_main.o iavf_ethtool.o iavf_virtchnl.o \
+iavf_txrx.o i40e_common.o i40e_adminq.o iavf_client.o
diff --git a/drivers/net/ethernet/intel/iavf/i40evf.h 
b/drivers/net/ethernet/intel/iavf/iavf.h
similarity index 99%
rename from drivers/net/ethernet/intel/iavf/i40evf.h
rename to drivers/net/ethernet/intel/iavf/iavf.h
index 19a93bfdb65c..c7ce2db958b0 100644
--- a/drivers/net/ethernet/intel/iavf/i40evf.h
+++ b/drivers/net/ethernet/intel/iavf/iavf.h
@@ -36,7 +36,7 @@
 
 #include "i40e_type.h"
 #include 
-#include "i40e_txrx.h"
+#include "iavf_txrx.h"
 
 #define DEFAULT_DEBUG_LEVEL_SHIFT 3
 #define PFX "iavf: "
diff --git a/drivers/net/ethernet/intel/iavf/i40evf_client.c 
b/drivers/net/ethernet/intel/iavf/iavf_client.c
similarity index 99%
rename from drivers/net/ethernet/intel/iavf/i40evf_client.c
rename to drivers/net/ethernet/intel/iavf/iavf_client.c
index d2660659174d..16971bfc5e43 100644
--- a/drivers/net/ethernet/intel/iavf/i40evf_client.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_client.c
@@ -4,9 +4,9 @@
 #include 
 #include 
 
-#include "i40evf.h"
+#include "iavf.h"
 #include "i40e_prototype.h"
-#include "i40evf_client.h"
+#include "iavf_client.h"
 
 static
 const char iavf_client_interface_version_str[] = IAVF_CLIENT_VERSION_STR;
diff --git a/drivers/net/ethernet/intel/iavf/i40evf_client.h 
b/drivers/net/ethernet/intel/iavf/iavf_client.h
similarity index 100%
rename from drivers/net/ethernet/intel/iavf/i40evf_client.h
rename to drivers/net/ethernet/intel/iavf/iavf_client.h
diff --git a/drivers/net/ethernet/intel/iavf/i40evf_ethtool.c 
b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
similarity index 99%
rename from drivers/net/ethernet/intel/iavf/i40evf_ethtool.c
rename to drivers/net/ethernet/intel/iavf/iavf_ethtool.c
index 0277df40e53f..74a142802074 100644
--- a/drivers/net/ethernet/intel/iavf/i40evf_ethtool.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
@@ -2,7 +2,7 @@
 /* Copyright(c) 2013 - 2018 Intel Corporation. */
 
 /* ethtool support for iavf */
-#include "i40evf.h"
+#include "iavf.h"
 
 #include 
 
diff --git a/drivers/net/ethernet/intel/iavf/i40evf_main.c 
b/drivers/net/ethernet/intel/iavf/iavf_main.c
similarity index 99%
rename from drivers/net/ethernet/intel/iavf/i40evf_main.c
rename to drivers/net/ethernet/intel/iavf/iavf_main.c
index 600ea4040af2..7d815ace2d98 100644
--- a/drivers/net/ethernet/intel/iavf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -1,9 +1,9 @@
 // SPDX-License-Identifier: GPL-2.0
 /* Copyright(c) 2013 - 2018 Intel Corporation. */
 
-#include "i40evf.h"
+#include "iavf.h"
 #include "i40e_prototype.h"
-#include "i40evf_client.h"
+#include "iavf_client.h"
 /* All iavf tracepoints are defined by the include below, which must
  * be included exactly once across the whole kernel with
  * CREATE_TRACE_POINTS defined
diff --git

[PATCH net-next v2 14/14] intel-ethernet: use correct module license

2018-09-14 Thread Jesse Brandeburg

We recently updated all our SPDX identifiers to correctly
indicate our net/ethernet/intel/* drivers were always released
and intended to be released under GPL v2, but the MODULE_LICENSE
declaration was never updated.

Fix the MODULE_LICENSE to be GPL v2, for all our drivers.

Signed-off-by: Jesse Brandeburg 
---
 drivers/net/ethernet/intel/e100.c | 2 +-
 drivers/net/ethernet/intel/e1000/e1000_main.c | 2 +-
 drivers/net/ethernet/intel/e1000e/netdev.c| 2 +-
 drivers/net/ethernet/intel/fm10k/fm10k_main.c | 2 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c   | 2 +-
 drivers/net/ethernet/intel/iavf/iavf_main.c   | 4 ++--
 drivers/net/ethernet/intel/ice/ice_main.c | 2 +-
 drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
 drivers/net/ethernet/intel/igbvf/netdev.c | 2 +-
 drivers/net/ethernet/intel/ixgb/ixgb_main.c   | 2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 2 +-
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 2 +-
 12 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/intel/e100.c 
b/drivers/net/ethernet/intel/e100.c
index 27d5f27163d2..7c4b55482f72 100644
--- a/drivers/net/ethernet/intel/e100.c
+++ b/drivers/net/ethernet/intel/e100.c
@@ -164,7 +164,7 @@
 
 MODULE_DESCRIPTION(DRV_DESCRIPTION);
 MODULE_AUTHOR(DRV_COPYRIGHT);
-MODULE_LICENSE("GPL");
+MODULE_LICENSE("GPL v2");
 MODULE_VERSION(DRV_VERSION);
 MODULE_FIRMWARE(FIRMWARE_D101M);
 MODULE_FIRMWARE(FIRMWARE_D101S);
diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c 
b/drivers/net/ethernet/intel/e1000/e1000_main.c
index 2110d5f2da19..7e0f1f96a8a1 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -195,7 +195,7 @@ static struct pci_driver e1000_driver = {
 
 MODULE_AUTHOR("Intel Corporation, ");
 MODULE_DESCRIPTION("Intel(R) PRO/1000 Network Driver");
-MODULE_LICENSE("GPL");
+MODULE_LICENSE("GPL v2");
 MODULE_VERSION(DRV_VERSION);
 
 #define DEFAULT_MSG_ENABLE (NETIF_MSG_DRV|NETIF_MSG_PROBE|NETIF_MSG_LINK)
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index 3ba0c90e7055..c0f9faca70c4 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -7592,7 +7592,7 @@ module_exit(e1000_exit_module);
 
 MODULE_AUTHOR("Intel Corporation, ");
 MODULE_DESCRIPTION("Intel(R) PRO/1000 Network Driver");
-MODULE_LICENSE("GPL");
+MODULE_LICENSE("GPL v2");
 MODULE_VERSION(DRV_VERSION);
 
 /* netdev.c */
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c 
b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index 3f536541f45f..503bbc017792 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -21,7 +21,7 @@ static const char fm10k_copyright[] =
 
 MODULE_AUTHOR("Intel Corporation, ");
 MODULE_DESCRIPTION(DRV_SUMMARY);
-MODULE_LICENSE("GPL");
+MODULE_LICENSE("GPL v2");
 MODULE_VERSION(DRV_VERSION);
 
 /* single workqueue for entire fm10k driver */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 5d209d8fe9b8..c7d2c9010fdf 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -91,7 +91,7 @@ MODULE_PARM_DESC(debug, "Debug level (0=none,...,16=all), 
Debug mask (0x8XXX
 
 MODULE_AUTHOR("Intel Corporation, ");
 MODULE_DESCRIPTION("Intel(R) Ethernet Connection XL710 Network Driver");
-MODULE_LICENSE("GPL");
+MODULE_LICENSE("GPL v2");
 MODULE_VERSION(DRV_VERSION);
 
 static struct workqueue_struct *i40e_wq;
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c 
b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 54d8a1ed05ac..0e2f78175f0e 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -53,8 +53,8 @@ MODULE_DEVICE_TABLE(pci, iavf_pci_tbl);
 
 MODULE_ALIAS("i40evf");
 MODULE_AUTHOR("Intel Corporation, ");
-MODULE_DESCRIPTION("Intel(R) XL710 X710 Virtual Function Network Driver");
-MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Intel(R) Ethernet Adaptive Virtual Function Network Driver");
+MODULE_LICENSE("GPL v2");
 MODULE_VERSION(DRV_VERSION);
 
 static struct workqueue_struct *iavf_wq;
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c 
b/drivers/net/ethernet/intel/ice/ice_main.c
index 1b49a605d094..d54e63785ff0 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -15,7 +15,7 @@ static const char ice_copyright[] = "Copyright (c) 2018, 
Intel Corporation.";
 
 MODULE_AUTHOR("Intel Corporation, ");
 MODULE_DESCRIPTION(DRV_SUMMARY);
-MODULE_LICENSE("GPL");
+MODULE_LICENSE("GPL v2");
 MODULE_VERSION(DRV_VERSION);
 
 static int debug = -1;
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index a32c576c1e65..c18e79112cad 100644
---
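
As a side note on what the "GPL" to "GPL v2" change affects at load time: the kernel decides whether a module counts as GPL-compatible by comparing its license string against a short fixed list. The userspace sketch below is modeled on the kernel's license_is_gpl_compatible() helper as I recall it; the function name sketch_license_is_gpl_compatible() is hypothetical, and the exact list of strings should be checked against include/linux/license.h rather than taken from here.

```c
#include <assert.h>
#include <string.h>

/* Userspace approximation: returns 1 when the MODULE_LICENSE string is
 * one the kernel is assumed to treat as GPL-compatible, 0 otherwise. */
static int sketch_license_is_gpl_compatible(const char *license)
{
	return (strcmp(license, "GPL") == 0
		|| strcmp(license, "GPL v2") == 0
		|| strcmp(license, "GPL and additional rights") == 0
		|| strcmp(license, "Dual BSD/GPL") == 0
		|| strcmp(license, "Dual MIT/GPL") == 0
		|| strcmp(license, "Dual MPL/GPL") == 0);
}
```

Under that assumption both the old and the new string pass the check, so the patch changes only the declared license and not whether the modules are treated as GPL-compatible.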

[PATCH net-next v2 02/14] iavf: diet and reformat

2018-09-14 Thread Jesse Brandeburg

Remove a bunch of unused code and reformat a few lines. Also
remove some now unnecessary files.

Signed-off-by: Jesse Brandeburg 
---
 drivers/net/ethernet/intel/iavf/i40e_adminq.c |   27 -
 drivers/net/ethernet/intel/iavf/i40e_adminq_cmd.h | 2277 +
 drivers/net/ethernet/intel/iavf/i40e_common.c |  337 ---
 drivers/net/ethernet/intel/iavf/i40e_hmc.h|  215 --
 drivers/net/ethernet/intel/iavf/i40e_lan_hmc.h|  158 --
 drivers/net/ethernet/intel/iavf/i40e_prototype.h  |   65 +-
 drivers/net/ethernet/intel/iavf/i40e_register.h   |  245 ---
 drivers/net/ethernet/intel/iavf/i40e_type.h   |  783 +--
 8 files changed, 50 insertions(+), 4057 deletions(-)
 delete mode 100644 drivers/net/ethernet/intel/iavf/i40e_hmc.h
 delete mode 100644 drivers/net/ethernet/intel/iavf/i40e_lan_hmc.h

diff --git a/drivers/net/ethernet/intel/iavf/i40e_adminq.c 
b/drivers/net/ethernet/intel/iavf/i40e_adminq.c
index 21a0dbf6ccf6..32e0e2d9cdc5 100644
--- a/drivers/net/ethernet/intel/iavf/i40e_adminq.c
+++ b/drivers/net/ethernet/intel/iavf/i40e_adminq.c
@@ -7,16 +7,6 @@
 #include "i40e_adminq.h"
 #include "i40e_prototype.h"
 
-/**
- * i40e_is_nvm_update_op - return true if this is an NVM update operation
- * @desc: API request descriptor
- **/
-static inline bool i40e_is_nvm_update_op(struct i40e_aq_desc *desc)
-{
-   return (desc->opcode == i40e_aqc_opc_nvm_erase) ||
-  (desc->opcode == i40e_aqc_opc_nvm_update);
-}
-
 /**
  *  i40e_adminq_init_regs - Initialize AdminQ registers
  *  @hw: pointer to the hardware structure
@@ -569,9 +559,6 @@ i40e_status i40evf_shutdown_adminq(struct i40e_hw *hw)
i40e_shutdown_asq(hw);
i40e_shutdown_arq(hw);
 
-   if (hw->nvm_buff.va)
-   i40e_free_virt_mem(hw, &hw->nvm_buff);
-
return ret_code;
 }
 
@@ -951,17 +938,3 @@ i40e_status i40evf_clean_arq_element(struct i40e_hw *hw,
 
return ret_code;
 }
-
-void i40evf_resume_aq(struct i40e_hw *hw)
-{
-   /* Registers are reset after PF reset */
-   hw->aq.asq.next_to_use = 0;
-   hw->aq.asq.next_to_clean = 0;
-
-   i40e_config_asq_regs(hw);
-
-   hw->aq.arq.next_to_use = 0;
-   hw->aq.arq.next_to_clean = 0;
-
-   i40e_config_arq_regs(hw);
-}
diff --git a/drivers/net/ethernet/intel/iavf/i40e_adminq_cmd.h 
b/drivers/net/ethernet/intel/iavf/i40e_adminq_cmd.h
index 5fd8529465d4..493bdc5331f7 100644
--- a/drivers/net/ethernet/intel/iavf/i40e_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/iavf/i40e_adminq_cmd.h
@@ -307,33 +307,6 @@ enum i40e_admin_queue_opc {
  */
 #define I40E_CHECK_CMD_LENGTH(X)   I40E_CHECK_STRUCT_LEN(16, X)
 
-/* internal (0x00XX) commands */
-
-/* Get version (direct 0x0001) */
-struct i40e_aqc_get_version {
-   __le32 rom_ver;
-   __le32 fw_build;
-   __le16 fw_major;
-   __le16 fw_minor;
-   __le16 api_major;
-   __le16 api_minor;
-};
-
-I40E_CHECK_CMD_LENGTH(i40e_aqc_get_version);
-
-/* Send driver version (indirect 0x0002) */
-struct i40e_aqc_driver_version {
-   u8  driver_major_ver;
-   u8  driver_minor_ver;
-   u8  driver_build_ver;
-   u8  driver_subbuild_ver;
-   u8  reserved[4];
-   __le32  address_high;
-   __le32  address_low;
-};
-
-I40E_CHECK_CMD_LENGTH(i40e_aqc_driver_version);
-
 /* Queue Shutdown (direct 0x0003) */
 struct i40e_aqc_queue_shutdown {
__le32  driver_unloading;
@@ -343,490 +316,6 @@ struct i40e_aqc_queue_shutdown {
 
 I40E_CHECK_CMD_LENGTH(i40e_aqc_queue_shutdown);
 
-/* Set PF context (0x0004, direct) */
-struct i40e_aqc_set_pf_context {
-   u8  pf_id;
-   u8  reserved[15];
-};
-
-I40E_CHECK_CMD_LENGTH(i40e_aqc_set_pf_context);
-
-/* Request resource ownership (direct 0x0008)
- * Release resource ownership (direct 0x0009)
- */
-#define I40E_AQ_RESOURCE_NVM   1
-#define I40E_AQ_RESOURCE_SDP   2
-#define I40E_AQ_RESOURCE_ACCESS_READ   1
-#define I40E_AQ_RESOURCE_ACCESS_WRITE  2
-#define I40E_AQ_RESOURCE_NVM_READ_TIMEOUT  3000
-#define I40E_AQ_RESOURCE_NVM_WRITE_TIMEOUT 18
-
-struct i40e_aqc_request_resource {
-   __le16  resource_id;
-   __le16  access_type;
-   __le32  timeout;
-   __le32  resource_number;
-   u8  reserved[4];
-};
-
-I40E_CHECK_CMD_LENGTH(i40e_aqc_request_resource);
-
-/* Get function capabilities (indirect 0x000A)
- * Get device capabilities (indirect 0x000B)
- */
-struct i40e_aqc_list_capabilites {
-   u8 command_flags;
-#define I40E_AQ_LIST_CAP_PF_INDEX_EN   1
-   u8 pf_index;
-   u8 reserved[2];
-   __le32 count;
-   __le32 addr_high;
-   __le32 addr_low;
-};
-
-I40E_CHECK_CMD_LENGTH(i40e_aqc_list_capabilites);
-
-struct i40e_aqc_list_capabilities_element_resp {
-   __le16  id;
-   u8  major_rev;
-   u8  minor_rev;
-   __le32  number;
-   __le32  logical_id;
-   __le32  phys_id;
-   u8  reserved[16];
-};
-
-/* list of

[PATCH net-next v2 00/14] rename and shrink i40evf

2018-09-14 Thread Jesse Brandeburg

This series contains changes to i40evf so that it becomes a more
generic virtual function driver for current and future silicon.

While doing the rename of i40evf to a more generic name of iavf,
we also put the driver on a severe diet due to how much of the
code was unneeded or was unused.  The outcome is a lean and mean
virtual function driver that continues to work on existing 40GbE
(i40e) virtual devices and is prepped for future supported devices,
like the 100GbE (ice) virtual devices.

This solves two issues that we either saw coming or that were already
present.  The first was the constant code duplication between i40e and
i40evf, when much of the duplicated code in i40evf was unused or
unneeded.  The second was the likely future confusion over why VF
devices that are not "40GbE"-only devices were supported by i40evf.

The thought is that iavf will be the virtual function driver for
all future devices, so it should have a "generic" name to properly
represent that it is the VF driver for multiple generations of
devices.

The last patch in this series is unrelated to the iavf conversion
and just has to do with a MODULE_LICENSE correction.

Known Caveats:
Existing user space configurations may have to change, but the module
alias in patch 1 helps a bit here.
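
To illustrate the caveat, here is a heavily simplified userspace model of how an alias lets a request for the old module name land on the renamed module. It is only a sketch: struct sk_alias, the table, and sketch_resolve_module() are hypothetical, and the real lookup is done by the module loading tools from the MODULE_ALIAS("i40evf") entry, not by code like this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical alias table entry: old name -> module that provides it. */
struct sk_alias {
	const char *alias;
	const char *module;
};

static const struct sk_alias sk_aliases[] = {
	{ "i40evf", "iavf" },	/* mirrors MODULE_ALIAS("i40evf") in iavf */
};

/* Returns the module to load for a requested name; names that are not
 * aliases are returned unchanged. */
static const char *sketch_resolve_module(const char *requested)
{
	size_t i;

	for (i = 0; i < sizeof(sk_aliases) / sizeof(sk_aliases[0]); i++) {
		if (strcmp(requested, sk_aliases[i].alias) == 0)
			return sk_aliases[i].module;
	}
	return requested;
}
```

In this model a request for "i40evf" resolves to "iavf", which is the part the alias covers; other user space configuration that refers to the old name is presumably what may still need changing.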

---
NOTE: This series is compile tested, but needs more testing before
  commit.  I expect it will go through Jeff's regular
  intel-wired-lan process.

v2: first non-RFC version, updated Kconfig migration in patch 1
v1: initial RFC


Jesse Brandeburg (14):
  intel-ethernet: rename i40evf to iavf
  iavf: diet and reformat
  iavf: rename functions and structs to new name
  iavf: rename i40e_status to iavf_status
  iavf: move i40evf files to new name
  iavf: remove references to old names
  iavf: rename device ID defines
  iavf: rename I40E_ADMINQ_DESC
  iavf: rename i40e_hw to iavf_hw
  iavf: replace i40e_debug with iavf version
  iavf: tracing infrastructure rename
  iavf: rename most of i40e strings
  iavf: finish renaming files to iavf
  intel-ethernet: use correct module license

 Documentation/networking/00-INDEX  |4 +-
 Documentation/networking/{i40evf.txt => iavf.txt}  |   16 +-
 MAINTAINERS|2 +-
 drivers/net/ethernet/intel/Kconfig |   15 +-
 drivers/net/ethernet/intel/Makefile|2 +-
 drivers/net/ethernet/intel/e100.c  |2 +-
 drivers/net/ethernet/intel/e1000/e1000_main.c  |2 +-
 drivers/net/ethernet/intel/e1000e/netdev.c |2 +-
 drivers/net/ethernet/intel/fm10k/fm10k_main.c  |2 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c|2 +-
 drivers/net/ethernet/intel/i40evf/i40e_devids.h|   34 -
 drivers/net/ethernet/intel/i40evf/i40e_hmc.h   |  215 --
 drivers/net/ethernet/intel/i40evf/i40e_lan_hmc.h   |  158 --
 drivers/net/ethernet/intel/i40evf/i40e_register.h  |  313 ---
 .../net/ethernet/intel/{i40evf => iavf}/Makefile   |   11 +-
 .../ethernet/intel/{i40evf => iavf}/i40e_adminq.c  |  309 ++-
 .../ethernet/intel/{i40evf => iavf}/i40e_adminq.h  |   35 +-
 .../intel/{i40evf => iavf}/i40e_adminq_cmd.h   | 2281 +---
 .../intel/{i40evf/i40evf.h => iavf/iavf.h} |  407 ++--
 .../{i40evf/i40e_alloc.h => iavf/iavf_alloc.h} |   47 +-
 .../{i40evf/i40evf_client.c => iavf/iavf_client.c} |  200 +-
 .../{i40evf/i40evf_client.h => iavf/iavf_client.h} |   30 +-
 .../{i40evf/i40e_common.c => iavf/iavf_common.c}   | 1105 --
 drivers/net/ethernet/intel/iavf/iavf_devids.h  |   12 +
 .../i40evf_ethtool.c => iavf/iavf_ethtool.c}   |  510 +++--
 .../{i40evf/i40evf_main.c => iavf/iavf_main.c} | 1688 ---
 .../{i40evf/i40e_osdep.h => iavf/iavf_osdep.h} |   28 +-
 .../i40e_prototype.h => iavf/iavf_prototype.h} |  147 +-
 drivers/net/ethernet/intel/iavf/iavf_register.h|   68 +
 .../{i40evf/i40e_status.h => iavf/iavf_status.h}   |8 +-
 .../{i40evf/i40e_trace.h => iavf/iavf_trace.h} |   86 +-
 .../intel/{i40evf/i40e_txrx.c => iavf/iavf_txrx.c} |  804 +++
 .../intel/{i40evf/i40e_txrx.h => iavf/iavf_txrx.h} |  359 ++-
 .../intel/{i40evf/i40e_type.h => iavf/iavf_type.h} | 1604 --
 .../i40evf_virtchnl.c => iavf/iavf_virtchnl.c} |  501 +++--
 drivers/net/ethernet/intel/ice/ice_main.c  |2 +-
 drivers/net/ethernet/intel/igb/igb_main.c  |2 +-
 drivers/net/ethernet/intel/igbvf/netdev.c  |2 +-
 drivers/net/ethernet/intel/ixgb/ixgb_main.c|2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  |2 +-
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c  |2 +-
 41 files changed, 3440 insertions(+), 7581 deletions(-)
 rename Documentation/networking/{i40evf.txt => iavf.txt} (72%)
 delete mode 100644 drivers/net/ethernet/intel/i40evf/i40e_devids.h
 delete mode 100644 drivers/net/ethernet/intel/i40evf/i40e_hmc.h
 delete mode 100644

[PATCH net-next v2 01/14] intel-ethernet: rename i40evf to iavf

2018-09-14 Thread Jesse Brandeburg

Rename the Intel Ethernet Adaptive Virtual Function driver
(i40evf) to a new name (iavf) that is more consistent with
the ongoing maintenance of the driver as the universal VF driver
for multiple product lines.

This first patch fixes up the directory names and the .ko name,
intentionally ignoring the function names inside the driver
for now.  Basically this is the simplest patch that gets
the rename done and will be followed by other patches that
rename the internal functions.

This patch also addresses a couple of string/name issues
and updates the Copyright year.

Also, made sure to add a MODULE_ALIAS to the old name.

Signed-off-by: Jesse Brandeburg 

---

v2: add Kconfig migration as suggested by davem
---
 Documentation/networking/00-INDEX|  4 ++--
 Documentation/networking/{i40evf.txt => iavf.txt}| 16 +---
 MAINTAINERS  |  2 +-
 drivers/net/ethernet/intel/Kconfig   | 15 +++
 drivers/net/ethernet/intel/Makefile  |  2 +-
 drivers/net/ethernet/intel/{i40evf => iavf}/Makefile | 11 +--
 .../net/ethernet/intel/{i40evf => iavf}/i40e_adminq.c|  0
 .../net/ethernet/intel/{i40evf => iavf}/i40e_adminq.h|  0
 .../ethernet/intel/{i40evf => iavf}/i40e_adminq_cmd.h|  0
 drivers/net/ethernet/intel/{i40evf => iavf}/i40e_alloc.h |  0
 .../net/ethernet/intel/{i40evf => iavf}/i40e_common.c|  0
 .../net/ethernet/intel/{i40evf => iavf}/i40e_devids.h|  0
 drivers/net/ethernet/intel/{i40evf => iavf}/i40e_hmc.h   |  0
 .../net/ethernet/intel/{i40evf => iavf}/i40e_lan_hmc.h   |  0
 drivers/net/ethernet/intel/{i40evf => iavf}/i40e_osdep.h |  0
 .../net/ethernet/intel/{i40evf => iavf}/i40e_prototype.h |  0
 .../net/ethernet/intel/{i40evf => iavf}/i40e_register.h  |  0
 .../net/ethernet/intel/{i40evf => iavf}/i40e_status.h|  0
 drivers/net/ethernet/intel/{i40evf => iavf}/i40e_trace.h |  0
 drivers/net/ethernet/intel/{i40evf => iavf}/i40e_txrx.c  |  0
 drivers/net/ethernet/intel/{i40evf => iavf}/i40e_txrx.h  |  0
 drivers/net/ethernet/intel/{i40evf => iavf}/i40e_type.h  |  0
 drivers/net/ethernet/intel/{i40evf => iavf}/i40evf.h |  0
 .../net/ethernet/intel/{i40evf => iavf}/i40evf_client.c  |  0
 .../net/ethernet/intel/{i40evf => iavf}/i40evf_client.h  |  0
 .../net/ethernet/intel/{i40evf => iavf}/i40evf_ethtool.c |  0
 .../net/ethernet/intel/{i40evf => iavf}/i40evf_main.c|  7 ---
 .../ethernet/intel/{i40evf => iavf}/i40evf_virtchnl.c|  0
 28 files changed, 33 insertions(+), 24 deletions(-)
 rename Documentation/networking/{i40evf.txt => iavf.txt} (72%)
 rename drivers/net/ethernet/intel/{i40evf => iavf}/Makefile (37%)
 rename drivers/net/ethernet/intel/{i40evf => iavf}/i40e_adminq.c (100%)
 rename drivers/net/ethernet/intel/{i40evf => iavf}/i40e_adminq.h (100%)
 rename drivers/net/ethernet/intel/{i40evf => iavf}/i40e_adminq_cmd.h (100%)
 rename drivers/net/ethernet/intel/{i40evf => iavf}/i40e_alloc.h (100%)
 rename drivers/net/ethernet/intel/{i40evf => iavf}/i40e_common.c (100%)
 rename drivers/net/ethernet/intel/{i40evf => iavf}/i40e_devids.h (100%)
 rename drivers/net/ethernet/intel/{i40evf => iavf}/i40e_hmc.h (100%)
 rename drivers/net/ethernet/intel/{i40evf => iavf}/i40e_lan_hmc.h (100%)
 rename drivers/net/ethernet/intel/{i40evf => iavf}/i40e_osdep.h (100%)
 rename drivers/net/ethernet/intel/{i40evf => iavf}/i40e_prototype.h (100%)
 rename drivers/net/ethernet/intel/{i40evf => iavf}/i40e_register.h (100%)
 rename drivers/net/ethernet/intel/{i40evf => iavf}/i40e_status.h (100%)
 rename drivers/net/ethernet/intel/{i40evf => iavf}/i40e_trace.h (100%)
 rename drivers/net/ethernet/intel/{i40evf => iavf}/i40e_txrx.c (100%)
 rename drivers/net/ethernet/intel/{i40evf => iavf}/i40e_txrx.h (100%)
 rename drivers/net/ethernet/intel/{i40evf => iavf}/i40e_type.h (100%)
 rename drivers/net/ethernet/intel/{i40evf => iavf}/i40evf.h (100%)
 rename drivers/net/ethernet/intel/{i40evf => iavf}/i40evf_client.c (100%)
 rename drivers/net/ethernet/intel/{i40evf => iavf}/i40evf_client.h (100%)
 rename drivers/net/ethernet/intel/{i40evf => iavf}/i40evf_ethtool.c (100%)
 rename drivers/net/ethernet/intel/{i40evf => iavf}/i40evf_main.c (99%)
 rename drivers/net/ethernet/intel/{i40evf => iavf}/i40evf_virtchnl.c (100%)

diff --git a/Documentation/networking/00-INDEX 
b/Documentation/networking/00-INDEX
index dcbccae4043e..f4f2b5d6c8d8 100644
--- a/Documentation/networking/00-INDEX
+++ b/Documentation/networking/00-INDEX
@@ -94,8 +94,8 @@ gianfar.txt
- Gianfar Ethernet Driver.
 i40e.txt
- README for the Intel Ethernet Controller XL710 Driver (i40e).
-i40evf.txt
-   - Short note on the Driver for the Intel(R) XL710 X710 Virtual Function
+iavf.txt
+   - README for the Intel Ethernet Adaptive Virtual Function Driver (iavf).
 ieee802154.txt
- Linux IEEE 802.15.4 implementation, API and drivers
 igb.txt
diff --git

Re: [PATH RFC net-next 5/8] net: phy: Add limkmode equivalents to some of the MII ethtool helpers

2018-09-14 Thread Florian Fainelli

On 09/14/2018 02:38 PM, Andrew Lunn wrote:
> Add helpers which take a linkmode rather than a u32 ethtool for
> advertising settings.
> 
> Signed-off-by: Andrew Lunn 

Reviewed-by: Florian Fainelli 
-- 
Florian

Re: [PATH RFC net-next 4/8] net: phy: Add helper for advertise to lcl value

2018-09-14 Thread Florian Fainelli

On 09/14/2018 02:38 PM, Andrew Lunn wrote:
> Add a helper to convert the local advertising to an LCL capabilities,
> which is then used to resolve pause flow control settings.
> 
> Signed-off-by: Andrew Lunn 

Reviewed-by: Florian Fainelli 
-- 
Florian
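
As a rough illustration of the conversion described above, the sketch below maps the Pause and Asym_Pause link mode bits to local MII advertisement bits and then resolves full-duplex pause. It is a user-space approximation only: the constants are copied by hand from the kernel's MII and ethtool headers as assumptions, and linkmode_test(), linkmode_to_lcl_adv() and resolve_flowctrl_fdx() are hypothetical names for this sketch, not the helpers added by the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Assumed to match the kernel's MII register and flow control values. */
#define ADVERTISE_PAUSE_CAP   0x0400
#define ADVERTISE_PAUSE_ASYM  0x0800
#define FLOW_CTRL_TX          0x01
#define FLOW_CTRL_RX          0x02

/* Assumed to match the ethtool link mode bit indices. */
#define LINK_MODE_Pause_BIT       13u
#define LINK_MODE_Asym_Pause_BIT  14u

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helper: test one bit in a link mode bitmap. */
static int linkmode_test(unsigned int bit, const unsigned long *mask)
{
	return (int)((mask[bit / BITS_PER_LONG] >> (bit % BITS_PER_LONG)) & 1UL);
}

/* Hypothetical helper: local advertising bitmap -> MII pause bits. */
static uint16_t linkmode_to_lcl_adv(const unsigned long *advertising)
{
	uint16_t lcl_adv = 0;

	assert(advertising);
	if (linkmode_test(LINK_MODE_Pause_BIT, advertising))
		lcl_adv |= ADVERTISE_PAUSE_CAP;
	if (linkmode_test(LINK_MODE_Asym_Pause_BIT, advertising))
		lcl_adv |= ADVERTISE_PAUSE_ASYM;
	return lcl_adv;
}

/* Resolve full-duplex pause from local and link partner pause bits. */
static uint8_t resolve_flowctrl_fdx(uint16_t lcladv, uint16_t rmtadv)
{
	uint8_t cap = 0;

	if (lcladv & rmtadv & ADVERTISE_PAUSE_CAP) {
		cap = FLOW_CTRL_TX | FLOW_CTRL_RX;
	} else if (lcladv & rmtadv & ADVERTISE_PAUSE_ASYM) {
		if (lcladv & ADVERTISE_PAUSE_CAP)
			cap = FLOW_CTRL_RX;
		else if (rmtadv & ADVERTISE_PAUSE_CAP)
			cap = FLOW_CTRL_TX;
	}
	return cap;
}
```

With only the Pause bit set locally and a partner that also advertises symmetric pause, the sketch resolves to both TX and RX pause; with neither side advertising a common bit it resolves to none.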

Re: [PATH RFC net-next 2/8] net: phy: Add phydev_warn()

2018-09-14 Thread Andrew Lunn

On Fri, Sep 14, 2018 at 03:10:36PM -0700, Florian Fainelli wrote:
> On 09/14/2018 02:38 PM, Andrew Lunn wrote:
> > Not all new style LINK_MODE bits can be converted into old style
> > SUPPORTED bits. We need to warn when such a conversion is attempted.
> > Add a helper for this.
> > 
> > Signed-off-by: Andrew Lunn 
> 
> Acked-by: Florian Fainelli 
> 
> Do you mind converting drivers/net/phy/marvell10g.c to use it? I would
> also suggest adding phydev_info() while we are at it and do the two
> conversions to it that exist in drivers/net/phy/phy_device.c?

Yes, we might as well have the full set.

 Andrew

Re: [PATH RFC net-next 3/8] net: phy: Add helper to convert MII ADV register to a linkmode

2018-09-14 Thread Andrew Lunn

On Fri, Sep 14, 2018 at 03:23:14PM -0700, Florian Fainelli wrote:
> On 09/14/2018 02:38 PM, Andrew Lunn wrote:
> > The phy_mii_ioctl can be used to write a value into the MII_ADVERTISE
> > register in the PHY. Since this changes the state of the PHY, we need
> > to make the same change to phydev->advertising. Add a helper which can
> > convert the register value to a linkmode.
> 
> It would have been nice if we could eliminate the duplication between
> mii_adv_to_ethtool_adv_t() and mii_adv_to_linkmode_adv_t() but I don't
> really see how without changing the former function's signature.

Some of these functions are also used by non-phylib MAC drivers. So
the ethtool version cannot be eliminated.

And the UAPI for EEE still uses a u32 for which modes EEE is
advertised :-(

   Andrew

Re: [PATCH net-next RFC 6/8] net: make gro configurable

2018-09-14 Thread Willem de Bruijn

On Fri, Sep 14, 2018 at 1:59 PM Willem de Bruijn
 wrote:
>
> From: Willem de Bruijn 
>
> Add net_offload flag NET_OFF_FLAG_GRO_OFF. If set, a net_offload will
> not be used for gro receive processing.
>
> Also add sysctl helper proc_do_net_offload that toggles this flag and
> register sysctls net.{core,ipv4,ipv6}.gro
>
> Signed-off-by: Willem de Bruijn 
> ---
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 20d9552afd38..0fd5273bc931 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -154,6 +154,7 @@
>  #define GRO_MAX_HEAD (MAX_HEADER + 128)
>
>  static DEFINE_SPINLOCK(ptype_lock);
> +DEFINE_SPINLOCK(offload_lock);
>  struct list_head ptype_base[PTYPE_HASH_SIZE] __read_mostly;
>  struct list_head ptype_all __read_mostly;  /* Taps */
>  static struct list_head offload_base __read_mostly;
> diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
> index b1a2c5e38530..d2d72afdd9eb 100644
> --- a/net/core/sysctl_net_core.c
> +++ b/net/core/sysctl_net_core.c
> @@ -15,6 +15,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  #include 
>  #include 
> @@ -34,6 +35,58 @@ static int net_msg_warn; /* Unused, but still a sysctl 
> */
>  int sysctl_fb_tunnels_only_for_init_net __read_mostly = 0;
>  EXPORT_SYMBOL(sysctl_fb_tunnels_only_for_init_net);
>
> +extern spinlock_t offload_lock;
> +
> +#define NET_OFF_TBL_LEN 256
> +
> +int proc_do_net_offload(struct ctl_table *ctl, int write, void __user 
> *buffer,
> +   size_t *lenp, loff_t *ppos)
> +{
> +   unsigned long bitmap[NET_OFF_TBL_LEN / (sizeof(unsigned long) << 3)];
> +   struct ctl_table tbl = { .maxlen = NET_OFF_TBL_LEN, .data = bitmap };
> +   unsigned long flag = (unsigned long) ctl->extra2;
> +   struct net_offload __rcu **offs = ctl->extra1;
> +   struct net_offload *off;
> +   int i, ret;
> +
> +   memset(bitmap, 0, sizeof(bitmap));
> +
> +   spin_lock(&offload_lock);
> +
> +   for (i = 0; i < tbl.maxlen; i++) {
> +   off = rcu_dereference_protected(offs[i], lockdep_is_held(&offload_lock));
> +   if (off && off->flags & flag) {

This does not actually work as is. No protocol will have this flag set
out of the box.

I was in the middle of rewriting some of this when it became topical,
so I sent it out for discussion. It's bound not to be the only bug of
the patchset as is. I'll work through them to get it back in shape.
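
To make the point about the flag concrete, here is a small user-space sketch of the idea, not the patch itself. It assumes a 256-entry offload table, a hypothetical flag OFF_FLAG_GRO_OFF, and hypothetical helpers flags_to_bitmap() and gro_lookup(). Because the flag means "off", a freshly registered protocol has it clear, so the bitmap built from it starts out empty even though GRO is enabled for every protocol.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TBL_LEN          256
#define BITS_PER_LONG    (sizeof(unsigned long) << 3)
#define BITMAP_LONGS     (TBL_LEN / BITS_PER_LONG)
#define OFF_FLAG_GRO_OFF 0x1u	/* hypothetical flag value */

/* Minimal stand-in for an offload entry; only the flags matter here. */
struct offload {
	unsigned int flags;
};

/* Set bit i in the bitmap for every table entry that has 'flag' set. */
static void flags_to_bitmap(struct offload *const *offs, unsigned int flag,
			    unsigned long *bitmap)
{
	size_t i;

	memset(bitmap, 0, BITMAP_LONGS * sizeof(unsigned long));
	for (i = 0; i < TBL_LEN; i++) {
		if (offs[i] && (offs[i]->flags & flag))
			bitmap[i / BITS_PER_LONG] |= 1UL << (i % BITS_PER_LONG);
	}
}

/* Return the offload for 'proto' unless it is missing or GRO is off. */
static const struct offload *gro_lookup(struct offload *const *offs,
					unsigned int proto)
{
	const struct offload *off;

	assert(proto < TBL_LEN);
	off = offs[proto];
	if (!off || (off->flags & OFF_FLAG_GRO_OFF))
		return NULL;
	return off;
}
```

A caller would treat a NULL result from gro_lookup() the same way the vxlan hunk treats a NULL ops pointer: skip GRO for that packet.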

Re: [PATCH net-next RFC 6/8] net: make gro configurable

2018-09-14 Thread Willem de Bruijn

On Fri, Sep 14, 2018 at 6:50 PM Willem de Bruijn
 wrote:
>
> On Fri, Sep 14, 2018 at 2:39 PM Stephen Hemminger
>  wrote:
> >
> > On Fri, 14 Sep 2018 13:59:39 -0400
> > Willem de Bruijn  wrote:
> >
> > > diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> > > index e5d236595206..8cb8e02c8ab6 100644
> > > --- a/drivers/net/vxlan.c
> > > +++ b/drivers/net/vxlan.c
> > > @@ -572,6 +572,7 @@ static struct sk_buff *vxlan_gro_receive(struct sock 
> > > *sk,
> > >struct list_head *head,
> > >struct sk_buff *skb)
> > >  {
> > > + const struct net_offload *ops;
> > >   struct sk_buff *pp = NULL;
> > >   struct sk_buff *p;
> > >   struct vxlanhdr *vh, *vh2;
> > > @@ -606,6 +607,12 @@ static struct sk_buff *vxlan_gro_receive(struct sock 
> > > *sk,
> > >   goto out;
> > >   }
> > >
> > > + rcu_read_lock();
> > > + ops = net_gro_receive(dev_offloads, ETH_P_TEB);
> > > + rcu_read_unlock();
> > > + if (!ops)
> > > + goto out;
> >
> > Isn't rcu_read_lock already held here?
> > RCU read lock is always held in the receive handler path
>
> There is a critical section on receive, taken in
> netif_receive_skb_core, but gro code runs before that. All the
> existing gro handlers call rcu_read_lock.

Though if dev_gro_receive is the entry point for all of gro, then
all other handlers are ensured to be executed within its rcu
readside section.

Re: [PATCH net-next RFC 6/8] net: make gro configurable

2018-09-14 Thread Willem de Bruijn

On Fri, Sep 14, 2018 at 2:39 PM Stephen Hemminger
 wrote:
>
> On Fri, 14 Sep 2018 13:59:39 -0400
> Willem de Bruijn  wrote:
>
> > diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> > index e5d236595206..8cb8e02c8ab6 100644
> > --- a/drivers/net/vxlan.c
> > +++ b/drivers/net/vxlan.c
> > @@ -572,6 +572,7 @@ static struct sk_buff *vxlan_gro_receive(struct sock 
> > *sk,
> >struct list_head *head,
> >struct sk_buff *skb)
> >  {
> > + const struct net_offload *ops;
> >   struct sk_buff *pp = NULL;
> >   struct sk_buff *p;
> >   struct vxlanhdr *vh, *vh2;
> > @@ -606,6 +607,12 @@ static struct sk_buff *vxlan_gro_receive(struct sock 
> > *sk,
> >   goto out;
> >   }
> >
> > + rcu_read_lock();
> > + ops = net_gro_receive(dev_offloads, ETH_P_TEB);
> > + rcu_read_unlock();
> > + if (!ops)
> > + goto out;
>
> Isn't rcu_read_lock already held here?
> RCU read lock is always held in the receive handler path

There is a critical section on receive, taken in
netif_receive_skb_core, but gro code runs before that. All the
existing gro handlers call rcu_read_lock.

> > +
> >   skb_gro_pull(skb, sizeof(struct vxlanhdr)); /* pull vxlan header */
> >
> >   list_for_each_entry(p, head, list) {
> > @@ -621,6 +628,7 @@ static struct sk_buff *vxlan_gro_receive(struct sock 
> > *sk,
> >   }
> >
> >   pp = call_gro_receive(eth_gro_receive, head, skb);
> > +
> >   flush = 0;
>
> whitespace change crept into this patch.

Oops, thanks.

Re: [PATH RFC net-next 3/8] net: phy: Add helper to convert MII ADV register to a linkmode

2018-09-14 Thread Florian Fainelli

On 09/14/2018 02:38 PM, Andrew Lunn wrote:
> The phy_mii_ioctl can be used to write a value into the MII_ADVERTISE
> register in the PHY. Since this changes the state of the PHY, we need
> to make the same change to phydev->advertising. Add a helper which can
> convert the register value to a linkmode.

It would have been nice if we could eliminate the duplication between
mii_adv_to_ethtool_adv_t() and mii_adv_to_linkmode_adv_t() but I don't
really see how without changing the former function's signature.

Reviewed-by: Florian Fainelli 

> 
> Signed-off-by: Andrew Lunn 
> ---
>  include/linux/mii.h | 31 +++
>  1 file changed, 31 insertions(+)
> 
> diff --git a/include/linux/mii.h b/include/linux/mii.h
> index 567047ef0309..8c7da9473ad9 100644
> --- a/include/linux/mii.h
> +++ b/include/linux/mii.h
> @@ -303,6 +303,37 @@ static inline u32 mii_lpa_to_ethtool_lpa_x(u32 lpa)
>   return result | mii_adv_to_ethtool_adv_x(lpa);
>  }
>  
> +/**
> + * mii_adv_to_linkmode_adv_t
> + * @advertising:pointer to destination link mode.
> + * @adv: value of the MII_ADVERTISE register
> + *
> + * A small helper function that translates MII_ADVERTISE bits
> + * to linkmode advertisement settings.
> + */
> +static inline void mii_adv_to_linkmode_adv_t(unsigned long *advertising,
> +  u32 adv)
> +{
> + linkmode_zero(advertising);
> +
> + if (adv & ADVERTISE_10HALF)
> + linkmode_set_bit(ETHTOOL_LINK_MODE_10baseT_Half_BIT,
> +  advertising);
> + if (adv & ADVERTISE_10FULL)
> + linkmode_set_bit(ETHTOOL_LINK_MODE_10baseT_Full_BIT,
> +  advertising);
> + if (adv & ADVERTISE_100HALF)
> + linkmode_set_bit(ETHTOOL_LINK_MODE_100baseT_Half_BIT,
> +  advertising);
> + if (adv & ADVERTISE_100FULL)
> + linkmode_set_bit(ETHTOOL_LINK_MODE_100baseT_Full_BIT,
> +  advertising);
> + if (adv & ADVERTISE_PAUSE_CAP)
> + linkmode_set_bit(ETHTOOL_LINK_MODE_Pause_BIT, advertising);
> + if (adv & ADVERTISE_PAUSE_ASYM)
> + linkmode_set_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT, advertising);
> +}
> +
>  /**
>   * mii_advertise_flowctrl - get flow control advertisement flags
>   * @cap: Flow control capabilities (FLOW_CTRL_RX, FLOW_CTRL_TX or both)
> 


-- 
Florian
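
For readers without a kernel tree at hand, the conversion in the quoted patch can be approximated in user space as follows. This is only a sketch: the register and link mode bit values are copied by hand from the kernel headers as assumptions, LINK_MODE_NBITS is an arbitrary width chosen for the example, and linkmode_set() and adv_to_linkmode() are hypothetical names. The lookup table is used purely for brevity and is not a proposal for removing the duplication discussed above.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed to match the MII_ADVERTISE register bit definitions. */
#define ADVERTISE_10HALF      0x0020
#define ADVERTISE_10FULL      0x0040
#define ADVERTISE_100HALF     0x0080
#define ADVERTISE_100FULL     0x0100
#define ADVERTISE_PAUSE_CAP   0x0400
#define ADVERTISE_PAUSE_ASYM  0x0800

/* Assumed to match the ethtool link mode bit indices. */
enum {
	LINK_MODE_10baseT_Half_BIT  = 0,
	LINK_MODE_10baseT_Full_BIT  = 1,
	LINK_MODE_100baseT_Half_BIT = 2,
	LINK_MODE_100baseT_Full_BIT = 3,
	LINK_MODE_Pause_BIT         = 13,
	LINK_MODE_Asym_Pause_BIT    = 14,
};

/* Arbitrary mask width for this sketch; the kernel derives its own. */
#define LINK_MODE_NBITS 64u
#define BITS_PER_LONG   (sizeof(unsigned long) * CHAR_BIT)
#define LINK_MODE_LONGS ((LINK_MODE_NBITS + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Hypothetical helper: set one bit in a link mode bitmap. */
static void linkmode_set(unsigned int bit, unsigned long *mask)
{
	assert(bit < LINK_MODE_NBITS);
	mask[bit / BITS_PER_LONG] |= 1UL << (bit % BITS_PER_LONG);
}

/* Translate MII_ADVERTISE register bits into a link mode bitmap. */
static void adv_to_linkmode(unsigned long *advertising, uint32_t adv)
{
	static const struct {
		uint32_t reg;
		unsigned int bit;
	} map[] = {
		{ ADVERTISE_10HALF,     LINK_MODE_10baseT_Half_BIT },
		{ ADVERTISE_10FULL,     LINK_MODE_10baseT_Full_BIT },
		{ ADVERTISE_100HALF,    LINK_MODE_100baseT_Half_BIT },
		{ ADVERTISE_100FULL,    LINK_MODE_100baseT_Full_BIT },
		{ ADVERTISE_PAUSE_CAP,  LINK_MODE_Pause_BIT },
		{ ADVERTISE_PAUSE_ASYM, LINK_MODE_Asym_Pause_BIT },
	};
	size_t i;

	/* Start from an empty mask, as the patch does with linkmode_zero(). */
	memset(advertising, 0, LINK_MODE_LONGS * sizeof(unsigned long));
	for (i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
		if (adv & map[i].reg)
			linkmode_set(map[i].bit, advertising);
	}
}
```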

Re: [PATH RFC net-next 2/8] net: phy: Add phydev_warn()

2018-09-14 Thread Florian Fainelli

On 09/14/2018 02:38 PM, Andrew Lunn wrote:
> Not all new style LINK_MODE bits can be converted into old style
> SUPPORTED bits. We need to warn when such a conversion is attempted.
> Add a helper for this.
> 
> Signed-off-by: Andrew Lunn 

Acked-by: Florian Fainelli 

Do you mind converting drivers/net/phy/marvell10g.c to use it? I would
also suggest adding phydev_info() while we are at it and do the two
conversions to it that exist in drivers/net/phy/phy_device.c?

Thanks!
-- 
Florian

Re: [PATH RFC net-next 1/8] net: phy: Move linkmode helpers to somewhere public

2018-09-14 Thread Florian Fainelli

On 09/14/2018 02:38 PM, Andrew Lunn wrote:
> phylink has some useful helpers for working with linkmode bitmaps.
> Move them to their own header so other code can use them.

Good idea, I wonder if we should create a more specific directory within
include/linux/ that can host a variety of PHYLIB, PHYLINK and what not
header files, but this could be solved later on.

> 
> Signed-off-by: Andrew Lunn 

Acked-by: Florian Fainelli 
-- 
Florian

Re: [bpf-next, v4 0/5] Introduce eBPF flow dissector

2018-09-14 Thread Petar Penkov

On Fri, Sep 14, 2018 at 2:47 PM, Y Song  wrote:
> On Fri, Sep 14, 2018 at 12:24 PM Alexei Starovoitov
>  wrote:
>>
>> On Fri, Sep 14, 2018 at 07:46:17AM -0700, Petar Penkov wrote:
>> > From: Petar Penkov 
>> >
>> > This patch series hardens the RX stack by allowing flow dissection in BPF,
>> > as previously discussed [1]. Because of the rigorous checks of the BPF
>> > verifier, this provides significant security guarantees. In particular, the
>> > BPF flow dissector cannot get inside of an infinite loop, as with
>> > CVE-2013-4348, because BPF programs are guaranteed to terminate. It cannot
>> > read outside of packet bounds, because all memory accesses are checked.
>> > Also, with BPF the administrator can decide which protocols to support,
>> > reducing potential attack surface. Rarely encountered protocols can be
>> > excluded from dissection and the program can be updated without kernel
>> > recompile or reboot if a bug is discovered.
>> >
>> > Patch 1 adds infrastructure to execute a BPF program in __skb_flow_dissect.
>> > This includes a new BPF program and attach type.
>> >
>> > Patch 2 adds the new BPF flow dissector definitions to tools/uapi.
>> >
>> > Patch 3 adds support for the new BPF program type to libbpf and bpftool.
>> >
>> > Patch 4 adds a flow dissector program in BPF. This parses most protocols in
>> > __skb_flow_dissect in BPF for a subset of flow keys (basic, control, ports,
>> > and address types).
>> >
>> > Patch 5 adds a selftest that attaches the BPF program to the flow dissector
>> > and sends traffic with different levels of encapsulation.
>> >
>> > Performance Evaluation:
>> > The in-kernel implementation was compared against the demo program from
>> > patch 4 using the test in patch 5 with IPv4/UDP traffic over 10 seconds.
>> >   $perf record -a -C 4 taskset -c 4 ./test_flow_dissector -i 4 -f 8 \
>> >   -t 10
>>
>> Looks great. Applied to bpf-next with one extra patch:
>>  SEC("dissect")
>> -int dissect(struct __sk_buff *skb)
>> +int _dissect(struct __sk_buff *skb)
>>
>> otherwise the test doesn't build.
>> I'm not sure how it builds for you. Which llvm did you use?
>
> This is a known issue. IIRC, llvm <= 4 should be okay and llvm >= 5 would 
> fail.
>
I was running a much older version of llvm so I imagine this was the
issue. Thanks for the fix!
>>
>> Also above command works and ipv4 test in ./test_flow_dissector.sh
>> is passing as well, but it still fails at the end for me:
>> ./test_flow_dissector.sh
>> bpffs not mounted. Mounting...
>> 0: IP
>> 1: IPV6
>> 2: IPV6OP
>> 3: IPV6FR
>> 4: MPLS
>> 5: VLAN
>> Testing IPv4...
>> inner.dest4: 127.0.0.1
>> inner.source4: 127.0.0.3
>> pkts: tx=10 rx=10
>> inner.dest4: 127.0.0.1
>> inner.source4: 127.0.0.3
>> pkts: tx=10 rx=0
>> inner.dest4: 127.0.0.1
>> inner.source4: 127.0.0.3
>> pkts: tx=10 rx=10
>> Testing IPIP...
>> tunnels before test:
>> tunl0: any/ip remote any local any ttl inherit nopmtudisc
>> sit_test_LV5N: any/ip remote 127.0.0.2 local 127.0.0.1 dev lo ttl inherit
>> ipip_test_LV5N: any/ip remote 127.0.0.2 local 127.0.0.1 dev lo ttl inherit
>> sit0: ipv6/ip remote any local any ttl 64 nopmtudisc
>> gre_test_LV5N: gre/ip remote 127.0.0.2 local 127.0.0.1 dev lo ttl inherit
>> gre0: gre/ip remote any local any ttl inherit nopmtudisc
>> inner.dest4: 192.168.0.1
>> inner.source4: 1.1.1.1
>> encap proto:   4
>> outer.dest4: 127.0.0.1
>> outer.source4: 127.0.0.2
>> pkts: tx=10 rx=0
>> tunnels after test:
>> tunl0: any/ip remote any local any ttl inherit nopmtudisc
>> sit0: ipv6/ip remote any local any ttl 64 nopmtudisc
>> gre0: gre/ip remote any local any ttl inherit nopmtudisc
>> selftests: test_flow_dissector [FAILED]
>>
>> is it something in my setup or test is broken?
>>
I just reran the test and it is passing. We will investigate what
could be causing the issue.

[PATCH bpf-next] tools/bpf: bpftool: improve output format for bpftool net

2018-09-14 Thread Yonghong Song

This is a followup patch for Commit f6f3bac08ff9
("tools/bpf: bpftool: add net support").
Some improvements are made for the bpftool net output.
Specifically, the plain output is more concise, so that
each attachment should fit nicely on one line.
Compared to previous output, the prog tag is removed
since it can be easily obtained with program id.
Similar to xdp attachments, the device name is added
to tc_filters attachments.

The bpf program attached through shared block
mechanism is supported as well.
  $ ip link add dev v1 type veth peer name v2
  $ tc qdisc add dev v1 ingress_block 10 egress_block 20 clsact
  $ tc qdisc add dev v2 ingress_block 10 egress_block 20 clsact
  $ tc filter add block 10 protocol ip prio 25 bpf obj bpf_shared.o sec ingress 
flowid 1:1
  $ tc filter add block 20 protocol ip prio 30 bpf obj bpf_cyclic.o sec 
classifier flowid 1:1
  $ bpftool net
  xdp [
  ]
  tc_filters [
   v2(7) qdisc_clsact_ingress bpf_shared.o:[ingress] id 23
   v2(7) qdisc_clsact_egress bpf_cyclic.o:[classifier] id 24
   v1(8) qdisc_clsact_ingress bpf_shared.o:[ingress] id 23
   v1(8) qdisc_clsact_egress bpf_cyclic.o:[classifier] id 24
  ]

The documentation and "bpftool net help" are updated
to make it clear that current implementation only
supports xdp and tc attachments. For programs
attached to cgroups, "bpftool cgroup" can be used
to dump attachments. For other programs e.g.
sk_{filter,skb,msg,reuseport} and lwt/seg6,
iproute2 tools should be used.

The new output:
  $ bpftool net
  xdp [
   eth0(2) id/drv 198
  ]
  tc_filters [
   eth0(2) qdisc_clsact_ingress fbflow_icmp id 335 act [{icmp_action id 336}]
   eth0(2) qdisc_clsact_egress fbflow_egress id 334
  ]
  $ bpftool -jp net
  [{
"xdp": [{
"devname": "eth0",
"ifindex": 2,
"id/drv": 198
}
],
"tc_filters": [{
"devname": "eth0",
"ifindex": 2,
"kind": "qdisc_clsact_ingress",
"name": "fbflow_icmp",
"id": 335,
"act": [{
"name": "icmp_action",
"id": 336
}
]
},{
"devname": "eth0",
"ifindex": 2,
"kind": "qdisc_clsact_egress",
"name": "fbflow_egress",
"id": 334
}
]
}
  ]

Signed-off-by: Yonghong Song 
---
 .../bpf/bpftool/Documentation/bpftool-net.rst |  58 +-
 tools/bpf/bpftool/main.h  |   3 +-
 tools/bpf/bpftool/net.c   | 100 --
 tools/bpf/bpftool/netlink_dumper.c|  78 ++
 tools/bpf/bpftool/netlink_dumper.h|  20 ++--
 5 files changed, 143 insertions(+), 116 deletions(-)

diff --git a/tools/bpf/bpftool/Documentation/bpftool-net.rst 
b/tools/bpf/bpftool/Documentation/bpftool-net.rst
index 48a61837a264..433581592c72 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-net.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-net.rst
@@ -26,9 +26,20 @@ NET COMMANDS
 DESCRIPTION
 ===
**bpftool net { show | list } [ dev name ]**
- List all networking device driver and tc attachment in the 
system.
-
-  Output will start with all xdp program attachment, followed 
by
+  List bpf program attachments in the kernel networking 
subsystem.
+
+  Currently, only device driver xdp attachments and tc filter
+  classification/action attachments are implemented, i.e., for
+  program types **BPF_PROG_TYPE_SCHED_CLS**,
+  **BPF_PROG_TYPE_SCHED_ACT** and **BPF_PROG_TYPE_XDP**.
+  For programs attached to a particular cgroup, e.g.,
+  **BPF_PROG_TYPE_CGROUP_SKB**, **BPF_PROG_TYPE_CGROUP_SOCK**,
+  **BPF_PROG_TYPE_SOCK_OPS** and 
**BPF_PROG_TYPE_CGROUP_SOCK_ADDR**,
+  users can use **bpftool cgroup** to dump cgroup attachments.
+  For sk_{filter, skb, msg, reuseport} and lwt/seg6
+  bpf programs, users should consult other tools, e.g., 
iproute2.
+
+  The current output will start with all xdp program 
attachments, followed by
   all tc class/qdisc bpf program attachments. Both xdp 
programs and
   tc programs are ordered based on ifindex number. If multiple 
bpf
   programs attached to the same networking device through **tc 
filter**,
@@ -62,19 +73,14 @@ EXAMPLES
 ::
 
   xdp [
-  ifindex 2 devname eth0 prog_id 198
+   eth0(2) id/drv 198
   ]
   tc_filters [
-  ifindex 2 kind qdisc_htb name prefix_matcher.o:[cls_prefix_matcher_htb]
-prog_id 111727 tag d08fe3b4319bc2fd act []
-  ifindex 2 kind qdisc_clsact_ingress name fbflow_icmp
-prog_id 130246 tag 3f265c7f26db62c9 act []
-

Re: [bpf-next, v4 0/5] Introduce eBPF flow dissector

2018-09-14 Thread Y Song

On Fri, Sep 14, 2018 at 12:24 PM Alexei Starovoitov
 wrote:
>
> On Fri, Sep 14, 2018 at 07:46:17AM -0700, Petar Penkov wrote:
> > From: Petar Penkov 
> >
> > This patch series hardens the RX stack by allowing flow dissection in BPF,
> > as previously discussed [1]. Because of the rigorous checks of the BPF
> > verifier, this provides significant security guarantees. In particular, the
> > BPF flow dissector cannot get inside of an infinite loop, as with
> > CVE-2013-4348, because BPF programs are guaranteed to terminate. It cannot
> > read outside of packet bounds, because all memory accesses are checked.
> > Also, with BPF the administrator can decide which protocols to support,
> > reducing potential attack surface. Rarely encountered protocols can be
> > excluded from dissection and the program can be updated without kernel
> > recompile or reboot if a bug is discovered.
> >
> > Patch 1 adds infrastructure to execute a BPF program in __skb_flow_dissect.
> > This includes a new BPF program and attach type.
> >
> > Patch 2 adds the new BPF flow dissector definitions to tools/uapi.
> >
> > Patch 3 adds support for the new BPF program type to libbpf and bpftool.
> >
> > Patch 4 adds a flow dissector program in BPF. This parses most protocols in
> > __skb_flow_dissect in BPF for a subset of flow keys (basic, control, ports,
> > and address types).
> >
> > Patch 5 adds a selftest that attaches the BPF program to the flow dissector
> > and sends traffic with different levels of encapsulation.
> >
> > Performance Evaluation:
> > The in-kernel implementation was compared against the demo program from
> > patch 4 using the test in patch 5 with IPv4/UDP traffic over 10 seconds.
> >   $perf record -a -C 4 taskset -c 4 ./test_flow_dissector -i 4 -f 8 \
> >   -t 10
>
> Looks great. Applied to bpf-next with one extra patch:
>  SEC("dissect")
> -int dissect(struct __sk_buff *skb)
> +int _dissect(struct __sk_buff *skb)
>
> otherwise the test doesn't build.
> I'm not sure how it builds for you. Which llvm did you use?

This is a known issue. IIRC, llvm <= 4 should be okay and llvm >= 5 would fail.

>
> Also above command works and ipv4 test in ./test_flow_dissector.sh
> is passing as well, but it still fails at the end for me:
> ./test_flow_dissector.sh
> bpffs not mounted. Mounting...
> 0: IP
> 1: IPV6
> 2: IPV6OP
> 3: IPV6FR
> 4: MPLS
> 5: VLAN
> Testing IPv4...
> inner.dest4: 127.0.0.1
> inner.source4: 127.0.0.3
> pkts: tx=10 rx=10
> inner.dest4: 127.0.0.1
> inner.source4: 127.0.0.3
> pkts: tx=10 rx=0
> inner.dest4: 127.0.0.1
> inner.source4: 127.0.0.3
> pkts: tx=10 rx=10
> Testing IPIP...
> tunnels before test:
> tunl0: any/ip remote any local any ttl inherit nopmtudisc
> sit_test_LV5N: any/ip remote 127.0.0.2 local 127.0.0.1 dev lo ttl inherit
> ipip_test_LV5N: any/ip remote 127.0.0.2 local 127.0.0.1 dev lo ttl inherit
> sit0: ipv6/ip remote any local any ttl 64 nopmtudisc
> gre_test_LV5N: gre/ip remote 127.0.0.2 local 127.0.0.1 dev lo ttl inherit
> gre0: gre/ip remote any local any ttl inherit nopmtudisc
> inner.dest4: 192.168.0.1
> inner.source4: 1.1.1.1
> encap proto:   4
> outer.dest4: 127.0.0.1
> outer.source4: 127.0.0.2
> pkts: tx=10 rx=0
> tunnels after test:
> tunl0: any/ip remote any local any ttl inherit nopmtudisc
> sit0: ipv6/ip remote any local any ttl 64 nopmtudisc
> gre0: gre/ip remote any local any ttl inherit nopmtudisc
> selftests: test_flow_dissector [FAILED]
>
> is it something in my setup or test is broken?
>

[PATCH net] net: dsa: mv88e6xxx: Fix ATU Miss Violation

2018-09-14 Thread Andrew Lunn

Fix a cut/paste error and a typo which results in ATU miss violations
not being reported.

Fixes: 0977644c5005 ("net: dsa: mv88e6xxx: Decode ATU problem interrupt")
Signed-off-by: Andrew Lunn 
---
 drivers/net/dsa/mv88e6xxx/global1.h | 2 +-
 drivers/net/dsa/mv88e6xxx/global1_atu.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/global1.h 
b/drivers/net/dsa/mv88e6xxx/global1.h
index 7c791c1da4b9..bef01331266f 100644
--- a/drivers/net/dsa/mv88e6xxx/global1.h
+++ b/drivers/net/dsa/mv88e6xxx/global1.h
@@ -128,7 +128,7 @@
 #define MV88E6XXX_G1_ATU_OP_GET_CLR_VIOLATION  0x7000
 #define MV88E6XXX_G1_ATU_OP_AGE_OUT_VIOLATION  BIT(7)
 #define MV88E6XXX_G1_ATU_OP_MEMBER_VIOLATION   BIT(6)
-#define MV88E6XXX_G1_ATU_OP_MISS_VIOLTATIONBIT(5)
+#define MV88E6XXX_G1_ATU_OP_MISS_VIOLATION BIT(5)
 #define MV88E6XXX_G1_ATU_OP_FULL_VIOLATION BIT(4)
 
 /* Offset 0x0C: ATU Data Register */
diff --git a/drivers/net/dsa/mv88e6xxx/global1_atu.c 
b/drivers/net/dsa/mv88e6xxx/global1_atu.c
index 307410898fc9..5200e4bdce93 100644
--- a/drivers/net/dsa/mv88e6xxx/global1_atu.c
+++ b/drivers/net/dsa/mv88e6xxx/global1_atu.c
@@ -349,7 +349,7 @@ static irqreturn_t mv88e6xxx_g1_atu_prob_irq_thread_fn(int irq, void *dev_id)
chip->ports[entry.portvec].atu_member_violation++;
}
 
-   if (val & MV88E6XXX_G1_ATU_OP_MEMBER_VIOLATION) {
+   if (val & MV88E6XXX_G1_ATU_OP_MISS_VIOLATION) {
dev_err_ratelimited(chip->dev,
"ATU miss violation for %pM portvec %x\n",
entry.mac, entry.portvec);
-- 
2.19.0.rc1
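
The effect of the cut/paste error can be reproduced outside the driver. The following is a minimal user-space sketch, not the driver code: the bit positions are taken from the global1.h hunk above, while struct atu_stats and both decode functions are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as defined in the global1.h hunk of this patch. */
#define ATU_OP_MEMBER_VIOLATION  (1u << 6)
#define ATU_OP_MISS_VIOLATION    (1u << 5)

/* Hypothetical counters standing in for the driver's per-port state. */
struct atu_stats {
	unsigned int member;
	unsigned int miss;
};

/* Before the fix: the second test repeats the member bit. */
static void atu_decode_before(uint16_t val, struct atu_stats *s)
{
	if (val & ATU_OP_MEMBER_VIOLATION)
		s->member++;
	if (val & ATU_OP_MEMBER_VIOLATION)	/* cut/paste error */
		s->miss++;
}

/* After the fix: each violation is tested against its own bit. */
static void atu_decode_after(uint16_t val, struct atu_stats *s)
{
	if (val & ATU_OP_MEMBER_VIOLATION)
		s->member++;
	if (val & ATU_OP_MISS_VIOLATION)
		s->miss++;
}
```

With only the miss bit set in val, the first variant never counts a miss, which matches the symptom described in the commit message.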

[PATCH RFC net-next 0/8] Continue towards using linkmode in phylib

2018-09-14 Thread Andrew Lunn

These patches contain some further cleanup and helpers, and the first
real patch towards using linkmode bitmaps in phylink.

It is RFC because I don't like patch #7 and maybe somebody has a
better idea how to do this. Ideally, we want to initialise a Linux
generic bitmap at compile time.

Thanks
Andrew

Andrew Lunn (8):
  net: phy: Move linkmode helpers to somewhere public
  net: phy: Add phydev_warn()
  net: phy: Add helper to convert MII ADV register to a linkmode
  net: phy: Add helper for advertise to lcl value
  net: phy: Add linkmode equivalents to some of the MII ethtool helpers
  net: ethernet xgbe expand PHY_GBIT_FEATURES
  net: phy: Replace phy driver features u32 with link_mode bitmap
  net: phy: Add build warning if assumptions get broken

 drivers/net/dsa/mt7530.c  |   6 +-
 drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c   |  15 +-
 drivers/net/ethernet/freescale/fman/mac.c |   6 +-
 drivers/net/ethernet/freescale/gianfar.c  |   7 +-
 .../hisilicon/hns3/hns3pf/hclge_main.c|   6 +-
 drivers/net/ethernet/marvell/pxa168_eth.c |   4 +-
 drivers/net/ethernet/mediatek/mtk_eth_soc.c   |   6 +-
 drivers/net/ethernet/socionext/sni_ave.c  |   5 +-
 drivers/net/phy/aquantia.c|  12 +-
 drivers/net/phy/bcm63xx.c |   9 +-
 drivers/net/phy/marvell.c |   2 +-
 drivers/net/phy/marvell10g.c  |  11 +-
 drivers/net/phy/microchip_t1.c|   2 +-
 drivers/net/phy/phy_device.c  | 211 +-
 drivers/net/phy/phylink.c |  27 ---
 include/linux/linkmode.h  |  67 ++
 include/linux/mii.h   | 101 +
 include/linux/phy.h   |  28 ++-
 18 files changed, 421 insertions(+), 104 deletions(-)
 create mode 100644 include/linux/linkmode.h

-- 
2.19.0.rc1

[PATCH RFC net-next 7/8] net: phy: Replace phy driver features u32 with link_mode bitmap

2018-09-14 Thread Andrew Lunn

This is one step in allowing phylib to make use of link_mode bitmaps,
instead of u32 for supported and advertised features. Convert the phy
drivers to use bitmaps to indicate the features they support. This
requires some macro magic in order to construct constant bitmaps used
to initialise the driver structures.

Some new PHY_*_FEATURES are added, to indicate FIBRE is supported, and
that all media ports are supported. This is done since bitmaps cannot
be ORed together at compile time.

Within phylib, the features bitmap is currently turned back into a
u32.  The MAC API to phylib needs to be cleaned up before the core of
phylib can be converted to using bitmaps instead of u32.

Signed-off-by: Andrew Lunn 
---
 drivers/net/ethernet/marvell/pxa168_eth.c |   4 +-
 drivers/net/phy/aquantia.c|  12 +-
 drivers/net/phy/bcm63xx.c |   9 +-
 drivers/net/phy/marvell.c |   2 +-
 drivers/net/phy/marvell10g.c  |  11 +-
 drivers/net/phy/microchip_t1.c|   2 +-
 drivers/net/phy/phy_device.c  | 204 +-
 include/linux/phy.h   |  24 ++-
 8 files changed, 229 insertions(+), 39 deletions(-)

diff --git a/drivers/net/ethernet/marvell/pxa168_eth.c b/drivers/net/ethernet/marvell/pxa168_eth.c
index 3a9730612a70..b406395bbb37 100644
--- a/drivers/net/ethernet/marvell/pxa168_eth.c
+++ b/drivers/net/ethernet/marvell/pxa168_eth.c
@@ -988,8 +988,8 @@ static int pxa168_init_phy(struct net_device *dev)
cmd.base.phy_address = pep->phy_addr;
cmd.base.speed = pep->phy_speed;
cmd.base.duplex = pep->phy_duplex;
-   ethtool_convert_legacy_u32_to_link_mode(cmd.link_modes.advertising,
-   PHY_BASIC_FEATURES);
+   bitmap_copy(cmd.link_modes.advertising, PHY_BASIC_FEATURES,
+   __ETHTOOL_LINK_MODE_MASK_NBITS);
cmd.base.autoneg = AUTONEG_ENABLE;
 
if (cmd.base.speed != 0)
diff --git a/drivers/net/phy/aquantia.c b/drivers/net/phy/aquantia.c
index 319edc9c8ec7..632472cab3bb 100644
--- a/drivers/net/phy/aquantia.c
+++ b/drivers/net/phy/aquantia.c
@@ -115,7 +115,7 @@ static struct phy_driver aquantia_driver[] = {
.phy_id = PHY_ID_AQ1202,
.phy_id_mask= 0xfff0,
.name   = "Aquantia AQ1202",
-   .features   = PHY_AQUANTIA_FEATURES,
+   .features   = PHY_10GBIT_FULL_FEATURES,
.flags  = PHY_HAS_INTERRUPT,
.aneg_done  = genphy_c45_aneg_done,
.config_aneg= aquantia_config_aneg,
@@ -127,7 +127,7 @@ static struct phy_driver aquantia_driver[] = {
.phy_id = PHY_ID_AQ2104,
.phy_id_mask= 0xfff0,
.name   = "Aquantia AQ2104",
-   .features   = PHY_AQUANTIA_FEATURES,
+   .features   = PHY_10GBIT_FULL_FEATURES,
.flags  = PHY_HAS_INTERRUPT,
.aneg_done  = genphy_c45_aneg_done,
.config_aneg= aquantia_config_aneg,
@@ -139,7 +139,7 @@ static struct phy_driver aquantia_driver[] = {
.phy_id = PHY_ID_AQR105,
.phy_id_mask= 0xfff0,
.name   = "Aquantia AQR105",
-   .features   = PHY_AQUANTIA_FEATURES,
+   .features   = PHY_10GBIT_FULL_FEATURES,
.flags  = PHY_HAS_INTERRUPT,
.aneg_done  = genphy_c45_aneg_done,
.config_aneg= aquantia_config_aneg,
@@ -151,7 +151,7 @@ static struct phy_driver aquantia_driver[] = {
.phy_id = PHY_ID_AQR106,
.phy_id_mask= 0xfff0,
.name   = "Aquantia AQR106",
-   .features   = PHY_AQUANTIA_FEATURES,
+   .features   = PHY_10GBIT_FULL_FEATURES,
.flags  = PHY_HAS_INTERRUPT,
.aneg_done  = genphy_c45_aneg_done,
.config_aneg= aquantia_config_aneg,
@@ -163,7 +163,7 @@ static struct phy_driver aquantia_driver[] = {
.phy_id = PHY_ID_AQR107,
.phy_id_mask= 0xfff0,
.name   = "Aquantia AQR107",
-   .features   = PHY_AQUANTIA_FEATURES,
+   .features   = PHY_10GBIT_FULL_FEATURES,
.flags  = PHY_HAS_INTERRUPT,
.aneg_done  = genphy_c45_aneg_done,
.config_aneg= aquantia_config_aneg,
@@ -175,7 +175,7 @@ static struct phy_driver aquantia_driver[] = {
.phy_id = PHY_ID_AQR405,
.phy_id_mask= 0xfff0,
.name   = "Aquantia AQR405",
-   .features   = PHY_AQUANTIA_FEATURES,
+   .features   = PHY_10GBIT_FULL_FEATURES,
.flags  = PHY_HAS_INTERRUPT,
.aneg_done  = genphy_c45_aneg_done,
.config_aneg= aquantia_config_aneg,
diff --git a/drivers/net/phy/bcm63xx.c b/drivers/net/phy/bcm63xx.c
index cf14613745c9..ff5acf01b877 100644
--- a/drivers/net/phy/bcm63xx.c
+++ b/drivers/net/phy/bcm63xx.c
@@ -42,6 +42,9 @@ static int bcm63xx_config_init(struct

[PATCH RFC net-next 8/8] net: phy: Add build warning if assumptions get broken

2018-09-14 Thread Andrew Lunn

The macro magic to build constant bitmaps of supported PHY features
breaks when we have more than 63 ETHTOOL_LINK_MODE bits. Make the
breakage loud, not a subtle bug, when we get to that condition.

Signed-off-by: Andrew Lunn 
---
 drivers/net/phy/phy_device.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index eed61ee1d394..7bee59c7834b 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -2297,6 +2297,13 @@ static int __init phy_init(void)
 {
int rc;
 
+   /* The phy_basic_features, phy_gbit_features etc, above, only
+* work for values up to 63. Ensure we get a loud error if
+* this threshold is exceeded, and the necessary changes are
+* made.
+*/
+   BUILD_BUG_ON(__ETHTOOL_LINK_MODE_LAST > 63);
+
rc = mdio_bus_init();
if (rc)
return rc;
-- 
2.19.0.rc1
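
The 63-bit limit is easier to see with a concrete model. The following is a minimal sketch, assuming the constant feature sets are built by ORing single-bit masks into one 64-bit word; the macros actually added to phy_device.c are not shown in this archive, so MODE_BIT(), the MODE_* names and the *_MASK macros are hypothetical. The bit numbers mirror the first entries of the ethtool link mode enumeration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of link mode bit numbers. */
enum {
	MODE_10baseT_Half = 0,
	MODE_10baseT_Full = 1,
	MODE_100baseT_Half = 2,
	MODE_100baseT_Full = 3,
	MODE_1000baseT_Half = 4,
	MODE_1000baseT_Full = 5,
	MODE_LAST = MODE_1000baseT_Full,
};

/* A single-bit mask; only valid for bit numbers 0..63. */
#define MODE_BIT(nr) (UINT64_C(1) << (nr))

#define BASIC_FEATURES_MASK \
	(MODE_BIT(MODE_10baseT_Half) | MODE_BIT(MODE_10baseT_Full) | \
	 MODE_BIT(MODE_100baseT_Half) | MODE_BIT(MODE_100baseT_Full))

#define GBIT_FEATURES_MASK \
	(BASIC_FEATURES_MASK | \
	 MODE_BIT(MODE_1000baseT_Half) | MODE_BIT(MODE_1000baseT_Full))

/* Fail the build, rather than shift out of range, once a bit exceeds 63. */
static_assert(MODE_LAST <= 63, "link mode bits no longer fit in one 64-bit word");

/* Constant initialisers: integer constants can be ORed at compile time. */
static const uint64_t basic_features = BASIC_FEATURES_MASK;
static const uint64_t gbit_features = GBIT_FEATURES_MASK;

/* Return 1 if link mode nr is set in features, 0 otherwise. */
static int mode_supported(uint64_t features, unsigned int nr)
{
	return (int)((features >> nr) & 1);
}
```

The static_assert plays the role that BUILD_BUG_ON() plays in the hunk above: adding a mode numbered 64 or higher stops the build instead of silently producing a wrong mask.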

[PATCH RFC net-next 6/8] net: ethernet xgbe expand PHY_GBIT_FEATURES

2018-09-14 Thread Andrew Lunn

The macro PHY_GBIT_FEATURES needs to change into a bitmap in order to
support link_modes. Remove its use from xgbe by replacing it with its
definition.

The current behavior is probably wrong: it should probably be
ANDing, not assigning.

Signed-off-by: Andrew Lunn 
---
 drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c b/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c
index a7e03e3ecc93..d49e76982453 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c
@@ -878,8 +878,9 @@ static bool xgbe_phy_finisar_phy_quirks(struct xgbe_prv_data *pdata)
phy_write(phy_data->phydev, 0x04, 0x0d01);
phy_write(phy_data->phydev, 0x00, 0x9140);
 
-   phy_data->phydev->supported = PHY_GBIT_FEATURES;
-   phy_data->phydev->advertising = phy_data->phydev->supported;
+   phy_data->phydev->supported = (PHY_10BT_FEATURES |
+  PHY_100BT_FEATURES |
+  PHY_1000BT_FEATURES);
phy_support_asym_pause(phy_data->phydev);
 
netif_dbg(pdata, drv, pdata->netdev,
@@ -950,8 +951,9 @@ static bool xgbe_phy_belfuse_phy_quirks(struct xgbe_prv_data *pdata)
reg = phy_read(phy_data->phydev, 0x00);
phy_write(phy_data->phydev, 0x00, reg & ~0x00800);
 
-   phy_data->phydev->supported = PHY_GBIT_FEATURES;
-   phy_data->phydev->advertising = phy_data->phydev->supported;
+   phy_data->phydev->supported = (PHY_10BT_FEATURES |
+  PHY_100BT_FEATURES |
+  PHY_1000BT_FEATURES);
phy_support_asym_pause(phy_data->phydev);
 
netif_dbg(pdata, drv, pdata->netdev,
-- 
2.19.0.rc1

[PATCH RFC net-next 3/8] net: phy: Add helper to convert MII ADV register to a linkmode

2018-09-14 Thread Andrew Lunn

The phy_mii_ioctl can be used to write a value into the MII_ADVERTISE
register in the PHY. Since this changes the state of the PHY, we need
to make the same change to phydev->advertising. Add a helper which can
convert the register value to a linkmode.

Signed-off-by: Andrew Lunn 
---
 include/linux/mii.h | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/include/linux/mii.h b/include/linux/mii.h
index 567047ef0309..8c7da9473ad9 100644
--- a/include/linux/mii.h
+++ b/include/linux/mii.h
@@ -303,6 +303,37 @@ static inline u32 mii_lpa_to_ethtool_lpa_x(u32 lpa)
return result | mii_adv_to_ethtool_adv_x(lpa);
 }
 
+/**
+ * mii_adv_to_linkmode_adv_t
+ * @advertising:pointer to destination link mode.
+ * @adv: value of the MII_ADVERTISE register
+ *
+ * A small helper function that translates MII_ADVERTISE bits
+ * to linkmode advertisement settings.
+ */
+static inline void mii_adv_to_linkmode_adv_t(unsigned long *advertising,
+u32 adv)
+{
+   linkmode_zero(advertising);
+
+   if (adv & ADVERTISE_10HALF)
+   linkmode_set_bit(ETHTOOL_LINK_MODE_10baseT_Half_BIT,
+advertising);
+   if (adv & ADVERTISE_10FULL)
+   linkmode_set_bit(ETHTOOL_LINK_MODE_10baseT_Full_BIT,
+advertising);
+   if (adv & ADVERTISE_100HALF)
+   linkmode_set_bit(ETHTOOL_LINK_MODE_100baseT_Half_BIT,
+advertising);
+   if (adv & ADVERTISE_100FULL)
+   linkmode_set_bit(ETHTOOL_LINK_MODE_100baseT_Full_BIT,
+advertising);
+   if (adv & ADVERTISE_PAUSE_CAP)
+   linkmode_set_bit(ETHTOOL_LINK_MODE_Pause_BIT, advertising);
+   if (adv & ADVERTISE_PAUSE_ASYM)
+   linkmode_set_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT, advertising);
+}
+
 /**
  * mii_advertise_flowctrl - get flow control advertisement flags
  * @cap: Flow control capabilities (FLOW_CTRL_RX, FLOW_CTRL_TX or both)
-- 
2.19.0.rc1
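
The conversion can be tried in user space. The following is a sketch under stated assumptions, not the kernel helper itself: the register bits and link mode bit numbers are the values I believe the uapi mii.h and ethtool.h headers define, LM_NBITS is a hypothetical stand-in for __ETHTOOL_LINK_MODE_MASK_NBITS, and lm_set_bit(), lm_test_bit() and adv_to_linkmode() are illustrative names.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* MII_ADVERTISE register bits. */
#define ADVERTISE_10HALF      0x0020
#define ADVERTISE_10FULL      0x0040
#define ADVERTISE_100HALF     0x0080
#define ADVERTISE_100FULL     0x0100
#define ADVERTISE_PAUSE_CAP   0x0400
#define ADVERTISE_PAUSE_ASYM  0x0800

/* Link mode bit numbers used by this sketch. */
enum {
	LM_10baseT_Half = 0,
	LM_10baseT_Full = 1,
	LM_100baseT_Half = 2,
	LM_100baseT_Full = 3,
	LM_Pause = 13,
	LM_Asym_Pause = 14,
};

/* Hypothetical bitmap size for the sketch. */
#define LM_NBITS          64
#define LM_BITS_PER_LONG  (sizeof(unsigned long) * CHAR_BIT)
#define LM_LONGS          ((LM_NBITS + LM_BITS_PER_LONG - 1) / LM_BITS_PER_LONG)

static void lm_set_bit(unsigned int nr, unsigned long *map)
{
	map[nr / LM_BITS_PER_LONG] |= 1UL << (nr % LM_BITS_PER_LONG);
}

static int lm_test_bit(unsigned int nr, const unsigned long *map)
{
	return (int)((map[nr / LM_BITS_PER_LONG] >> (nr % LM_BITS_PER_LONG)) & 1UL);
}

/* Clear the bitmap, then set one link mode bit per advertised register bit. */
static void adv_to_linkmode(unsigned long *advertising, uint32_t adv)
{
	memset(advertising, 0, LM_LONGS * sizeof(unsigned long));

	if (adv & ADVERTISE_10HALF)
		lm_set_bit(LM_10baseT_Half, advertising);
	if (adv & ADVERTISE_10FULL)
		lm_set_bit(LM_10baseT_Full, advertising);
	if (adv & ADVERTISE_100HALF)
		lm_set_bit(LM_100baseT_Half, advertising);
	if (adv & ADVERTISE_100FULL)
		lm_set_bit(LM_100baseT_Full, advertising);
	if (adv & ADVERTISE_PAUSE_CAP)
		lm_set_bit(LM_Pause, advertising);
	if (adv & ADVERTISE_PAUSE_ASYM)
		lm_set_bit(LM_Asym_Pause, advertising);
}
```

Because the destination is zeroed first, the result reflects only the register value passed in, as in the helper added by this patch.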

[PATCH RFC net-next 2/8] net: phy: Add phydev_warn()

2018-09-14 Thread Andrew Lunn

Not all new style LINK_MODE bits can be converted into old style
SUPPORTED bits. We need to warn when such a conversion is attempted.
Add a helper for this.

Signed-off-by: Andrew Lunn 
---
 include/linux/phy.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/phy.h b/include/linux/phy.h
index d24cc46748e2..0ab9f89773fd 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -968,6 +968,9 @@ static inline void phy_device_reset(struct phy_device *phydev, int value)
 #define phydev_err(_phydev, format, args...)   \
dev_err(&_phydev->mdio.dev, format, ##args)
 
+#define phydev_warn(_phydev, format, args...)  \
+   dev_warn(&_phydev->mdio.dev, format, ##args)
+
 #define phydev_dbg(_phydev, format, args...)   \
dev_dbg(&_phydev->mdio.dev, format, ##args)
 
-- 
2.19.0.rc1

[PATCH RFC net-next 4/8] net: phy: Add helper for advertise to lcl value

2018-09-14 Thread Andrew Lunn

Add a helper to convert the local advertising to LCL capabilities,
which are then used to resolve pause flow control settings.

Signed-off-by: Andrew Lunn 
---
 drivers/net/dsa/mt7530.c  |  6 +-
 drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c   |  5 +
 drivers/net/ethernet/freescale/fman/mac.c |  6 +-
 drivers/net/ethernet/freescale/gianfar.c  |  7 +--
 .../hisilicon/hns3/hns3pf/hclge_main.c|  6 +-
 drivers/net/ethernet/mediatek/mtk_eth_soc.c   |  6 +-
 drivers/net/ethernet/socionext/sni_ave.c  |  5 +
 include/linux/mii.h   | 19 +++
 8 files changed, 26 insertions(+), 34 deletions(-)

diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c
index 62e486652e62..a5de9bffe5be 100644
--- a/drivers/net/dsa/mt7530.c
+++ b/drivers/net/dsa/mt7530.c
@@ -658,11 +658,7 @@ static void mt7530_adjust_link(struct dsa_switch *ds, int port,
if (phydev->asym_pause)
rmt_adv |= LPA_PAUSE_ASYM;
 
-   if (phydev->advertising & ADVERTISED_Pause)
-   lcl_adv |= ADVERTISE_PAUSE_CAP;
-   if (phydev->advertising & ADVERTISED_Asym_Pause)
-   lcl_adv |= ADVERTISE_PAUSE_ASYM;
-
+   lcl_adv = ethtool_adv_to_lcl_adv_t(phydev->advertising);
flowctrl = mii_resolve_flowctrl_fdx(lcl_adv, rmt_adv);
 
if (flowctrl & FLOW_CTRL_TX)
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c b/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c
index 289129011b9f..a7e03e3ecc93 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c
@@ -1495,10 +1495,7 @@ static void xgbe_phy_phydev_flowctrl(struct xgbe_prv_data *pdata)
if (!phy_data->phydev)
return;
 
-   if (phy_data->phydev->advertising & ADVERTISED_Pause)
-   lcl_adv |= ADVERTISE_PAUSE_CAP;
-   if (phy_data->phydev->advertising & ADVERTISED_Asym_Pause)
-   lcl_adv |= ADVERTISE_PAUSE_ASYM;
+   lcl_adv = ethtool_adv_to_lcl_adv_t(phy_data->phydev->advertising);
 
if (phy_data->phydev->pause) {
XGBE_SET_LP_ADV(lks, Pause);
diff --git a/drivers/net/ethernet/freescale/fman/mac.c b/drivers/net/ethernet/freescale/fman/mac.c
index a847b9c3b31a..d79e4e009d63 100644
--- a/drivers/net/ethernet/freescale/fman/mac.c
+++ b/drivers/net/ethernet/freescale/fman/mac.c
@@ -393,11 +393,7 @@ void fman_get_pause_cfg(struct mac_device *mac_dev, bool *rx_pause,
 */
 
/* get local capabilities */
-   lcl_adv = 0;
-   if (phy_dev->advertising & ADVERTISED_Pause)
-   lcl_adv |= ADVERTISE_PAUSE_CAP;
-   if (phy_dev->advertising & ADVERTISED_Asym_Pause)
-   lcl_adv |= ADVERTISE_PAUSE_ASYM;
+   lcl_adv = ethtool_adv_to_lcl_adv_t(phy_dev->advertising);
 
/* get link partner capabilities */
rmt_adv = 0;
diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
index 40a1a87cd338..a24b242bf752 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -3658,12 +3658,7 @@ static u32 gfar_get_flowctrl_cfg(struct gfar_private *priv)
if (phydev->asym_pause)
rmt_adv |= LPA_PAUSE_ASYM;
 
-   lcl_adv = 0;
-   if (phydev->advertising & ADVERTISED_Pause)
-   lcl_adv |= ADVERTISE_PAUSE_CAP;
-   if (phydev->advertising & ADVERTISED_Asym_Pause)
-   lcl_adv |= ADVERTISE_PAUSE_ASYM;
-
+   lcl_adv = ethtool_adv_to_lcl_adv_t(phydev->advertising);
flowctrl = mii_resolve_flowctrl_fdx(lcl_adv, rmt_adv);
if (flowctrl & FLOW_CTRL_TX)
val |= MACCFG1_TX_FLOW;
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index cf18608669f5..a8088ba2ac9c 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -5270,11 +5270,7 @@ int hclge_cfg_flowctrl(struct hclge_dev *hdev)
if (!phydev->link || !phydev->autoneg)
return 0;
 
-   if (phydev->advertising & ADVERTISED_Pause)
-   local_advertising = ADVERTISE_PAUSE_CAP;
-
-   if (phydev->advertising & ADVERTISED_Asym_Pause)
-   local_advertising |= ADVERTISE_PAUSE_ASYM;
+   local_advertising = ethtool_adv_to_lcl_adv_t(phydev->advertising);
 
if (phydev->pause)
remote_advertising = LPA_PAUSE_CAP;
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index cc1e9a96a43b..7dbfdac4067a 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++
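
The include/linux/mii.h hunk that adds the helper is cut off above, so its body is not visible here. Judging only from the lines removed in each driver, it presumably looks something like the sketch below. This is a reconstruction under that assumption, not the patch's code: adv_to_lcl_adv_sketch() is a hypothetical name, and the constants are the values I believe the uapi ethtool.h and mii.h headers define.

```c
#include <assert.h>
#include <stdint.h>

/* Legacy ethtool advertising flags (u32 bit masks). */
#define ADVERTISED_Pause       (1u << 13)
#define ADVERTISED_Asym_Pause  (1u << 14)

/* MII_ADVERTISE pause bits. */
#define ADVERTISE_PAUSE_CAP    0x0400
#define ADVERTISE_PAUSE_ASYM   0x0800

/*
 * Translate the pause bits of a legacy advertising mask into the
 * MII-style local advertisement value used for flow control resolution.
 * All other advertising bits are ignored.
 */
static uint32_t adv_to_lcl_adv_sketch(uint32_t advertising)
{
	uint32_t lcl_adv = 0;

	if (advertising & ADVERTISED_Pause)
		lcl_adv |= ADVERTISE_PAUSE_CAP;
	if (advertising & ADVERTISED_Asym_Pause)
		lcl_adv |= ADVERTISE_PAUSE_ASYM;

	return lcl_adv;
}
```

A helper of this shape always starts from zero, so callers no longer need to initialise lcl_adv themselves before ORing bits into it.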

[PATCH RFC net-next 1/8] net: phy: Move linkmode helpers to somewhere public

2018-09-14 Thread Andrew Lunn

phylink has some useful helpers for working with linkmode bitmaps.
Move them to their own header so other code can use them.

Signed-off-by: Andrew Lunn 
---
 drivers/net/phy/phylink.c | 27 
 include/linux/linkmode.h  | 67 +++
 include/linux/mii.h   |  1 +
 include/linux/phy.h   |  1 +
 4 files changed, 69 insertions(+), 27 deletions(-)
 create mode 100644 include/linux/linkmode.h

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 3ba5cf2a8a5f..95ab492089f2 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -68,33 +68,6 @@ struct phylink {
struct sfp_bus *sfp_bus;
 };
 
-static inline void linkmode_zero(unsigned long *dst)
-{
-   bitmap_zero(dst, __ETHTOOL_LINK_MODE_MASK_NBITS);
-}
-
-static inline void linkmode_copy(unsigned long *dst, const unsigned long *src)
-{
-   bitmap_copy(dst, src, __ETHTOOL_LINK_MODE_MASK_NBITS);
-}
-
-static inline void linkmode_and(unsigned long *dst, const unsigned long *a,
-   const unsigned long *b)
-{
-   bitmap_and(dst, a, b, __ETHTOOL_LINK_MODE_MASK_NBITS);
-}
-
-static inline void linkmode_or(unsigned long *dst, const unsigned long *a,
-   const unsigned long *b)
-{
-   bitmap_or(dst, a, b, __ETHTOOL_LINK_MODE_MASK_NBITS);
-}
-
-static inline bool linkmode_empty(const unsigned long *src)
-{
-   return bitmap_empty(src, __ETHTOOL_LINK_MODE_MASK_NBITS);
-}
-
 /**
  * phylink_set_port_modes() - set the port type modes in the ethtool mask
  * @mask: ethtool link mode mask
diff --git a/include/linux/linkmode.h b/include/linux/linkmode.h
new file mode 100644
index ..014fb86c7114
--- /dev/null
+++ b/include/linux/linkmode.h
@@ -0,0 +1,67 @@
+#ifndef __LINKMODE_H
+#define __LINKMODE_H
+
+#include 
+#include 
+#include 
+
+static inline void linkmode_zero(unsigned long *dst)
+{
+   bitmap_zero(dst, __ETHTOOL_LINK_MODE_MASK_NBITS);
+}
+
+static inline void linkmode_copy(unsigned long *dst, const unsigned long *src)
+{
+   bitmap_copy(dst, src, __ETHTOOL_LINK_MODE_MASK_NBITS);
+}
+
+static inline void linkmode_and(unsigned long *dst, const unsigned long *a,
+   const unsigned long *b)
+{
+   bitmap_and(dst, a, b, __ETHTOOL_LINK_MODE_MASK_NBITS);
+}
+
+static inline void linkmode_or(unsigned long *dst, const unsigned long *a,
+   const unsigned long *b)
+{
+   bitmap_or(dst, a, b, __ETHTOOL_LINK_MODE_MASK_NBITS);
+}
+
+static inline bool linkmode_empty(const unsigned long *src)
+{
+   return bitmap_empty(src, __ETHTOOL_LINK_MODE_MASK_NBITS);
+}
+
+static inline int linkmode_andnot(unsigned long *dst, const unsigned long *src1,
+ const unsigned long *src2)
+{
+   return bitmap_andnot(dst, src1, src2,  __ETHTOOL_LINK_MODE_MASK_NBITS);
+}
+
+static inline void linkmode_set_bit(int nr, volatile unsigned long *addr)
+{
+   __set_bit(nr, addr);
+}
+
+static inline void linkmode_clear_bit(int nr, volatile unsigned long *addr)
+{
+   __clear_bit(nr, addr);
+}
+
+static inline void linkmode_change_bit(int nr, volatile unsigned long *addr)
+{
+   __change_bit(nr, addr);
+}
+
+static inline int linkmode_test_bit(int nr, volatile unsigned long *addr)
+{
+   return test_bit(nr, addr);
+}
+
+static inline int linkmode_equal(const unsigned long *src1,
+const unsigned long *src2)
+{
+   return bitmap_equal(src1, src2, __ETHTOOL_LINK_MODE_MASK_NBITS);
+}
+
+#endif /* __LINKMODE_H */
diff --git a/include/linux/mii.h b/include/linux/mii.h
index 55000ee5c6ad..567047ef0309 100644
--- a/include/linux/mii.h
+++ b/include/linux/mii.h
@@ -10,6 +10,7 @@
 
 
 #include 
+#include 
 #include 
 
 struct ethtool_cmd;
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 192a1fa0c73b..d24cc46748e2 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
-- 
2.19.0.rc1

[PATCH RFC net-next 5/8] net: phy: Add linkmode equivalents to some of the MII ethtool helpers

2018-09-14 Thread Andrew Lunn

Add helpers which take a linkmode bitmap rather than a u32 ethtool
value for the advertising settings.

Signed-off-by: Andrew Lunn 
---
 include/linux/mii.h | 50 +
 1 file changed, 50 insertions(+)

diff --git a/include/linux/mii.h b/include/linux/mii.h
index 9ed49c8261d0..2da85b02e1c0 100644
--- a/include/linux/mii.h
+++ b/include/linux/mii.h
@@ -132,6 +132,34 @@ static inline u32 ethtool_adv_to_mii_adv_t(u32 ethadv)
return result;
 }
 
+/**
+ * linkmode_adv_to_mii_adv_t
+ * @advertising: the linkmode advertisement settings
+ *
+ * A small helper function that translates linkmode advertisement
+ * settings to phy autonegotiation advertisements for the
+ * MII_ADVERTISE register.
+ */
+static inline u32 linkmode_adv_to_mii_adv_t(unsigned long *advertising)
+{
+   u32 result = 0;
+
+   if (linkmode_test_bit(ETHTOOL_LINK_MODE_10baseT_Half_BIT, advertising))
+   result |= ADVERTISE_10HALF;
+   if (linkmode_test_bit(ETHTOOL_LINK_MODE_10baseT_Full_BIT, advertising))
+   result |= ADVERTISE_10FULL;
+   if (linkmode_test_bit(ETHTOOL_LINK_MODE_100baseT_Half_BIT, advertising))
+   result |= ADVERTISE_100HALF;
+   if (linkmode_test_bit(ETHTOOL_LINK_MODE_100baseT_Full_BIT, advertising))
+   result |= ADVERTISE_100FULL;
+   if (linkmode_test_bit(ETHTOOL_LINK_MODE_Pause_BIT, advertising))
+   result |= ADVERTISE_PAUSE_CAP;
+   if (linkmode_test_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT, advertising))
+   result |= ADVERTISE_PAUSE_ASYM;
+
+   return result;
+}
+
 /**
  * mii_adv_to_ethtool_adv_t
  * @adv: value of the MII_ADVERTISE register
@@ -179,6 +207,28 @@ static inline u32 ethtool_adv_to_mii_ctrl1000_t(u32 ethadv)
return result;
 }
 
+/**
+ * linkmode_adv_to_mii_ctrl1000_t
+ * @advertising: the linkmode advertisement settings
+ *
+ * A small helper function that translates linkmode advertisement
+ * settings to phy autonegotiation advertisements for the
+ * MII_CTRL1000 register when in 1000T mode.
+ */
+static inline u32 linkmode_adv_to_mii_ctrl1000_t(unsigned long *advertising)
+{
+   u32 result = 0;
+
+   if (linkmode_test_bit(ETHTOOL_LINK_MODE_1000baseT_Half_BIT,
+ advertising))
+   result |= ADVERTISE_1000HALF;
+   if (linkmode_test_bit(ETHTOOL_LINK_MODE_1000baseT_Full_BIT,
+ advertising))
+   result |= ADVERTISE_1000FULL;
+
+   return result;
+}
+
 /**
  * mii_ctrl1000_to_ethtool_adv_t
  * @adv: value of the MII_CTRL1000 register
-- 
2.19.0.rc1
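
The two helpers split one bitmap across two PHY registers: the 10/100 and pause modes go to MII_ADVERTISE, the 1000BASE-T modes to MII_CTRL1000. The following user-space sketch illustrates that split. It is not the kernel code: the constants are the values I believe the uapi headers define, LM_NBITS is a hypothetical bitmap size, and the lm_* function names are illustrative.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* MII_ADVERTISE register bits. */
#define ADVERTISE_10HALF      0x0020
#define ADVERTISE_10FULL      0x0040
#define ADVERTISE_100HALF     0x0080
#define ADVERTISE_100FULL     0x0100
#define ADVERTISE_PAUSE_CAP   0x0400
#define ADVERTISE_PAUSE_ASYM  0x0800

/* MII_CTRL1000 register bits. */
#define ADVERTISE_1000HALF    0x0100
#define ADVERTISE_1000FULL    0x0200

/* Link mode bit numbers used by this sketch. */
enum {
	LM_10baseT_Half = 0,
	LM_10baseT_Full = 1,
	LM_100baseT_Half = 2,
	LM_100baseT_Full = 3,
	LM_1000baseT_Half = 4,
	LM_1000baseT_Full = 5,
	LM_Pause = 13,
	LM_Asym_Pause = 14,
};

/* Hypothetical bitmap size for the sketch. */
#define LM_NBITS          64
#define LM_BITS_PER_LONG  (sizeof(unsigned long) * CHAR_BIT)
#define LM_LONGS          ((LM_NBITS + LM_BITS_PER_LONG - 1) / LM_BITS_PER_LONG)

static int lm_test_bit(unsigned int nr, const unsigned long *map)
{
	return (int)((map[nr / LM_BITS_PER_LONG] >> (nr % LM_BITS_PER_LONG)) & 1UL);
}

/* Value for MII_ADVERTISE: 10/100 modes and pause bits only. */
static uint32_t lm_to_mii_adv(const unsigned long *advertising)
{
	uint32_t result = 0;

	if (lm_test_bit(LM_10baseT_Half, advertising))
		result |= ADVERTISE_10HALF;
	if (lm_test_bit(LM_10baseT_Full, advertising))
		result |= ADVERTISE_10FULL;
	if (lm_test_bit(LM_100baseT_Half, advertising))
		result |= ADVERTISE_100HALF;
	if (lm_test_bit(LM_100baseT_Full, advertising))
		result |= ADVERTISE_100FULL;
	if (lm_test_bit(LM_Pause, advertising))
		result |= ADVERTISE_PAUSE_CAP;
	if (lm_test_bit(LM_Asym_Pause, advertising))
		result |= ADVERTISE_PAUSE_ASYM;

	return result;
}

/* Value for MII_CTRL1000: 1000BASE-T modes only. */
static uint32_t lm_to_mii_ctrl1000(const unsigned long *advertising)
{
	uint32_t result = 0;

	if (lm_test_bit(LM_1000baseT_Half, advertising))
		result |= ADVERTISE_1000HALF;
	if (lm_test_bit(LM_1000baseT_Full, advertising))
		result |= ADVERTISE_1000FULL;

	return result;
}
```

Note that ADVERTISE_100FULL and ADVERTISE_1000HALF share the numeric value 0x0100; they do not clash because they live in different registers.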

Re: [PATCH net-next v3 0/2] net: stmmac: Coalesce and tail addr fixes

2018-09-14 Thread David Miller

From: Jose Abreu 
Date: Thu, 13 Sep 2018 09:02:21 +0100

> The fix for coalesce timer and a fix in tail address setting that impacts
> XGMAC2 operation.

This series is fixing bugs going all the way back to 4.7.

There is no logical way that targeting net-next is valid.

net-next is always for new features and cleanups.

Bug fixes always go to 'net'.

Thank you.

mlx5_core: null pointer dereference in mlx5_accel_tls_device_caps() (net-next kernel)

2018-09-14 Thread Michal Kubecek

I just encountered a null pointer dereference on mlx5_core module
initialization while booting net-next kernel (based on commit
ee4fccbee7d3) on an aarch64 machine:

[   12.021971] iommu: Adding device :01:00.0 to group 3
[   12.022925] mlx5_core :01:00.0: firmware version: 12.17.2020
[   12.022954] mlx5_core :01:00.0: 63.008 Gb/s available PCIe bandwidth (8 GT/s x8 link)
[   12.068709] Adding 98830144k swap on /dev/sda4.  Priority:-2 extents:1 across:98830144k FS
[   12.347571] (:01:00.0): E-Switch: Total vports 9, per vport: max uc(1024) max mc(16384)
[   12.351962] mlx5_core :01:00.0: Port module event: module 0, Cable plugged
[   12.366306] mlx5_core :01:00.0: MLX5E: StrdRq(0) RqSz(1024) StrdSz(128) RxCqeCmprss(0)
[   12.366741] Unable to handle kernel NULL pointer dereference at virtual address 0050
[   12.374603] Mem abort info:
[   12.377368]   ESR = 0x9604
[   12.380406]   Exception class = DABT (current EL), IL = 32 bits
[   12.386357]   SET = 0, FnV = 0
[   12.389347]   EA = 0, S1PTW = 0
[   12.392471] Data abort info:
[   12.395343]   ISV = 0, ISS = 0x0004
[   12.399156]   CM = 0, WnR = 0
[   12.402108] user pgtable: 4k pages, 48-bit VAs, pgdp = (ptrval)
[   12.408711] [0050] pgd=
[   12.413567] Internal error: Oops: 9604 [#1] SMP
[   12.418427] Modules linked in: fat mlx5_core(+) ipmi_ssif(+) aes_ce_blk crypto_simd cryptd aes_ce_cipher crc32_ce crct10dif_ce ghash_ce aes_arm64 sha2_ce sha256_arm64 sha1_ce ipmi_devintf ipmi_msghandler sbsa_gwdt tls mlxfw devlink at803x qcom_emac btrfs libcrc32c xor zlib_deflate raid6_pq ahci_platform libahci_platform hdma hdma_mgmt i2c_qup sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua efivarfs
[   12.454800] CPU: 40 PID: 742 Comm: systemd-udevd Not tainted 4.19.0-rc3-ethnl.15-default #1
[   12.463131] Hardware name: To be filled by O.E.M. To be filled by O.E.M./To be filled by O.E.M., BIOS 5.13 12/12/2012
[   12.473722] pstate: 6045 (nZCv daif +PAN -UAO)
[   12.478559] pc : mlx5_accel_tls_device_caps+0x28/0x38 [mlx5_core]
[   12.484598] lr : mlx5e_tls_build_netdev+0x24/0x98 [mlx5_core]
[   12.490301] sp : 21873a30
[   12.493599] x29: 21873a30 x28: 2a72560a7940 
[   12.498895] x27: 2a7256df6000 x26: 2a71a0fed650 
[   12.504190] x25:  x24: 92c7f2b988c0 
[   12.509485] x23: 92c7fe01c0c0 x22: 2a71a0fcfa70 
[   12.514780] x21: 92c7f2b808c0 x20: 92c7f741c110 
[   12.520075] x19: 92c7f2b988c0 x18: 218739b0 
[   12.525370] x17:  x16: 2a725625ade0 
[   12.530665] x15: 29818ed4 x14: d47aab07 
[   12.535961] x13: 8a24 x12:  
[   12.541256] x11:  x10:  
[   12.546551] x9 :  x8 :  
[   12.551846] x7 :  x6 : 92c8159dc910 
[   12.557141] x5 : 0400 x4 : 7e4b205a20c7 
[   12.562436] x3 :  x2 : 2a725625ae1c 
[   12.567731] x1 : ab078a24 x0 :  
[   12.573027] Process systemd-udevd (pid: 742, stack limit = 0x(ptrval))
[   12.580232] Call trace:
[   12.582688]  mlx5_accel_tls_device_caps+0x28/0x38 [mlx5_core]
[   12.588419]  mlx5e_build_nic_netdev+0x27c/0x348 [mlx5_core]
[   12.593974]  mlx5e_nic_init+0x1a0/0x258 [mlx5_core]
[   12.598835]  mlx5e_create_netdev+0x74/0x118 [mlx5_core]
[   12.604043]  mlx5e_add+0xf0/0x2c0 [mlx5_core]
[   12.608384]  mlx5_add_device+0x88/0x1a8 [mlx5_core]
[   12.613246]  mlx5_register_interface+0x78/0xb0 [mlx5_core]
[   12.618713]  mlx5e_init+0x24/0x30 [mlx5_core]
[   12.623052]  init+0x88/0xa0 [mlx5_core]
[   12.626850]  do_one_initcall+0x54/0x200
[   12.630667]  do_init_module+0x64/0x1d8
[   12.634401]  load_module+0x1480/0x1510
[   12.638132]  __se_sys_finit_module+0xc8/0xd8
[   12.642385]  __arm64_sys_finit_module+0x24/0x30
[   12.646901]  el0_svc_common+0x7c/0x118
[   12.650631]  el0_svc_handler+0x38/0x78
[   12.654364]  el0_svc+0x8/0xc
[   12.657229] Code: d503201f f97c7e60 f9400bf3 a8c27bfd (f9402800) 
[   12.663306] ---[ end trace 57e772dd3cf718f1 ]---

The function looks like this:


drivers/net/ethernet/mellanox/mlx5/core/accel/tls.c:
68  {
   0x00058230 <+0>: stp x29, x30, [sp, #-32]!
   0x00058234 <+4>: mov x29, sp
   0x00058238 <+8>: str x19, [sp, #16]
   0x0005823c <+12>:mov x19, x0
   0x00058240 <+16>:mov x0, x30

drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.h:
68  return mdev->fpga->tls->caps;
   0x00058244 <+20>:add x19, x19, #0x38, lsl #12

drivers/net/ethernet/mellanox/mlx5/core/accel/tls.c:
68  {
   0x00058248 <+24>:bl  0x58248


drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.h:
68  return mdev->fpga->tls->caps;

Re: [net-next,RFC PATCH] Introduce TC Range classifier

2018-09-14 Thread Cong Wang

On Fri, Sep 14, 2018 at 2:53 AM Jiri Pirko  wrote:
>
> Thu, Sep 13, 2018 at 10:52:01PM CEST, amritha.namb...@intel.com wrote:
> >This patch introduces a TC range classifier to support filtering based
> >on ranges. Only port-range filters are supported currently. This can
> >be combined with flower classifier to support filters that are a
> >combination of port-ranges and other parameters based on existing
> >fields supported by cls_flower. The 'goto chain' action can be used to
> >combine the flower and range filter.
> >The filter precedence is decided based on the 'prio' value.
>
> For example Spectrum ASIC supports mask-based and range-based matching
> in a single TCAM rule. No chains needed. Also, I don't really understand
> why is this a separate cls. I believe that this functionality should be
> put as an extension of existing cls_flower.

Exactly. u32 filters support range matching too with proper masks.

Re: [net-next, RFC PATCH] net: sched: cls_range: Introduce Range classifier

2018-09-14 Thread Cong Wang

On Thu, Sep 13, 2018 at 6:53 PM Amritha Nambiar
 wrote:
>
> This patch introduces a range classifier to support filtering based
> on ranges. Only port-range filters are supported currently. This can
> be combined with flower classifier to support filters that are a
> combination of port-ranges and other parameters based on existing
> fields supported by cls_flower.

Why should we have a special-purpose filter just for ports here?

We have achieved almost the same goal with u32 filter:

https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/isolators/network/port_mapping.cpp

There is a large overlap with other general purpose filters.

I don't see any justification for it. If it is just for
convenience, can't we just build it on top of the other
general-purpose header-matching filters?
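
To make the point about masks concrete: any contiguous port range can be covered by a small set of value/mask pairs, each of which matches an aligned power-of-two block of ports. The following is a minimal sketch of that decomposition. It is not taken from u32 or from the Mesos code linked above; struct port_match, range_to_masks() and port_matches() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One value/mask pair: a port p matches when (p & mask) == value. */
struct port_match {
	uint16_t value;
	uint16_t mask;
};

/*
 * Split the inclusive range [lo, hi] into aligned power-of-two blocks.
 * Returns the number of pairs written, or 0 if lo > hi or out[] is too
 * small. A 16-bit range never needs more than 30 pairs.
 */
static size_t range_to_masks(uint16_t lo, uint16_t hi,
			     struct port_match *out, size_t max)
{
	uint32_t cur = lo;
	uint32_t end = (uint32_t)hi + 1;	/* exclusive, may be 0x10000 */
	size_t n = 0;

	while (cur < end) {
		uint32_t size = 1;

		/* Double the block while it stays aligned and inside the range. */
		while ((cur & ((size << 1) - 1)) == 0 && cur + (size << 1) <= end)
			size <<= 1;

		if (n == max)
			return 0;
		out[n].value = (uint16_t)cur;
		out[n].mask = (uint16_t)~(size - 1);
		n++;
		cur += size;
	}
	return n;
}

/* Return 1 if port matches any of the n pairs, 0 otherwise. */
static int port_matches(const struct port_match *m, size_t n, uint16_t port)
{
	for (size_t i = 0; i < n; i++)
		if ((port & m[i].mask) == m[i].value)
			return 1;
	return 0;
}
```

An aligned range such as 1024-2047 collapses to a single pair (value 1024, mask 0xfc00), while an unaligned one such as 1000-1999 needs seven. That rule count is the trade-off between mask-based matching and a dedicated range match.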

[PATCH net] tls: fix currently broken MSG_PEEK behavior

2018-09-14 Thread Daniel Borkmann

In kTLS, MSG_PEEK behavior is currently broken. An strace example:

  [pid  2430] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3
  [pid  2430] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 4
  [pid  2430] bind(4, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
  [pid  2430] listen(4, 10)   = 0
  [pid  2430] getsockname(4, {sa_family=AF_INET, sin_port=htons(38855), sin_addr=inet_addr("0.0.0.0")}, [16]) = 0
  [pid  2430] connect(3, {sa_family=AF_INET, sin_port=htons(38855), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
  [pid  2430] setsockopt(3, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0
  [pid  2430] setsockopt(3, 0x11a /* SOL_?? */, 1, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0
  [pid  2430] accept(4, {sa_family=AF_INET, sin_port=htons(49636), sin_addr=inet_addr("127.0.0.1")}, [16]) = 5
  [pid  2430] setsockopt(5, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0
  [pid  2430] setsockopt(5, 0x11a /* SOL_?? */, 2, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0
  [pid  2430] close(4)= 0
  [pid  2430] sendto(3, "test_read_peek", 14, 0, NULL, 0) = 14
  [pid  2430] sendto(3, "_mult_recs\0", 11, 0, NULL, 0) = 11
  [pid  2430] recvfrom(5, "test_read_peektest_read_peektest"..., 64, MSG_PEEK, NULL, NULL) = 64

As can be seen from strace, there are two TLS records sent,
i) 'test_read_peek' and ii) '_mult_recs\0' where we end up
peeking 'test_read_peektest_read_peektest'. This is clearly
wrong, and what happens is that given peek cannot call into
tls_sw_advance_skb() to unpause strparser and proceed with
the next skb, we end up looping over the current one, copying
the 'test_read_peek' over and over into the user provided
buffer.

Here, we can only peek into the currently held skb (the current,
full TLS record). Otherwise we would have to hold all the
original skb(s) (depending on the peek depth) in a separate
queue when unpausing strparser to process the next records. The
minimally intrusive fix is to return only up to the current
record's size (which is likely what c46234ebb4d1 ("tls: RX path
for ktls") originally intended as well). Thus, after the patch
we properly peek the first record:

  [pid  2046] wait4(2075,  
  [pid  2075] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3
  [pid  2075] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 4
  [pid  2075] bind(4, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
  [pid  2075] listen(4, 10)   = 0
  [pid  2075] getsockname(4, {sa_family=AF_INET, sin_port=htons(55115), sin_addr=inet_addr("0.0.0.0")}, [16]) = 0
  [pid  2075] connect(3, {sa_family=AF_INET, sin_port=htons(55115), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
  [pid  2075] setsockopt(3, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0
  [pid  2075] setsockopt(3, 0x11a /* SOL_?? */, 1, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0
  [pid  2075] accept(4, {sa_family=AF_INET, sin_port=htons(45732), sin_addr=inet_addr("127.0.0.1")}, [16]) = 5
  [pid  2075] setsockopt(5, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0
  [pid  2075] setsockopt(5, 0x11a /* SOL_?? */, 2, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0
  [pid  2075] close(4)= 0
  [pid  2075] sendto(3, "test_read_peek", 14, 0, NULL, 0) = 14
  [pid  2075] sendto(3, "_mult_recs\0", 11, 0, NULL, 0) = 11
  [pid  2075] recvfrom(5, "test_read_peek", 64, MSG_PEEK, NULL, NULL) = 14

Fixes: c46234ebb4d1 ("tls: RX path for ktls")
Signed-off-by: Daniel Borkmann 
---
 net/tls/tls_sw.c  |  8 +++
 tools/testing/selftests/net/tls.c | 49 +++
 2 files changed, 57 insertions(+)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index e28a6ff..b0cea79 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -931,7 +931,15 @@ int tls_sw_recvmsg(struct sock *sk,
if (control != TLS_RECORD_TYPE_DATA)
goto recv_end;
}
+   } else {
+   /* MSG_PEEK right now cannot look beyond current skb
+* from strparser, meaning we cannot advance skb here
+* and thus unpause strparser since we'd lose original
+* one.
+*/
+   break;
}
+
/* If we have a new message from strparser, continue now. */
if (copied >= target && !ctx->recv_pkt)
break;
diff --git a/tools/testing/selftests/net/tls.c 
b/tools/testing/selftests/net/tls.c
index b3ebf26..8fdfeaf 100644
--- a/tools/testing/selftests/net/tls.c
+++ b/tools/testing/selftests/net/tls.c
@@ -502,6 +502,55 @@ TEST_F(tls, recv_peek_multiple)
EXPECT_EQ(memcmp(test_str, buf, send_len), 0);
 }
 
+TEST_F(tls, recv_peek_multiple_records)
+{
+

Re: [PATCH net-next v2] net: sched: change tcf_del_walker() to take idrinfo->lock

2018-09-14 Thread Cong Wang

On Fri, Sep 14, 2018 at 3:46 AM Vlad Buslov  wrote:
>
>
> On Thu 13 Sep 2018 at 17:13, Cong Wang  wrote:
> > On Wed, Sep 12, 2018 at 1:51 AM Vlad Buslov  wrote:
> >>
> >>
> >> On Fri 07 Sep 2018 at 19:12, Cong Wang  wrote:
> >> > On Fri, Sep 7, 2018 at 6:52 AM Vlad Buslov  wrote:
> >> >>
> >> >> Action API was changed to work with actions and action_idr in 
> >> >> concurrency
> >> >> safe manner, however tcf_del_walker() still uses actions without taking 
> >> >> a
> >> >> reference or idrinfo->lock first, and deletes them directly, 
> >> >> disregarding
> >> >> possible concurrent delete.
> >> >>
> >> >> Add tc_action_wq workqueue to action API. Implement
> >> >> tcf_idr_release_unsafe() that assumes external synchronization by caller
> >> >> and delays blocking action cleanup part to tc_action_wq workqueue. 
> >> >> Extend
> >> >> tcf_action_cleanup() with 'async' argument to indicate that function 
> >> >> should
> >> >> free action asynchronously.
> >> >
> >> > Where exactly is blocking in tcf_action_cleanup()?
> >> >
> >> > From your code, it looks like free_tcf(), but from my observation,
> >> > the only blocking function inside is tcf_action_goto_chain_fini()
> >> > which calls __tcf_chain_put(). But, __tcf_chain_put() is blocking
> >> > _ONLY_ when tc_chain_notify() is called, for tc action it is never
> >> > called.
> >> >
> >> > So, what else is blocking?
> >>
> >> __tcf_chain_put() calls tc_chain_tmplt_del(), which calls
> >> ops->tmplt_destroy(). This last function uses hw offload API, which is
> >> blocking.
> >
> > Good to know.
> >
> > Can we just make ops->tmplt_destroy() to use workqueue?
> > Making tc action to workqueue seems overkill, for me.
>
> How about changing tcf_chain_put_by_act() to use tc_filter_wq, instead
> of directly calling __tcf_chain_put()? IMO it is a better solution
> because it benefits all classifiers, instead of requiring every
> classifier with templates support to implement non-blocking
> ops->tmplt_destroy().

My point is, only one filter implements ops->tmplt_destroy so far,
so there is no reason to adjust all filters for this single one.
Not to mention actions; actions are innocent here.

[Patch net-next] ipv4: initialize ra_mutex in inet_init_net()

2018-09-14 Thread Cong Wang

ra_mutex is an IPv4-specific mutex; it lives inside struct netns_ipv4,
but it is initialized in the generic netns code, setup_net().

Move it to the IPv4-specific net init code, inet_init_net().

Fixes: d9ff3049739e ("net: Replace ip_ra_lock with per-net mutex")
Cc: Kirill Tkhai 
Signed-off-by: Cong Wang 
---
 net/core/net_namespace.c | 1 -
 net/ipv4/af_inet.c   | 2 ++
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 670c84b1bfc2..b272ccfcbf63 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -308,7 +308,6 @@ static __net_init int setup_net(struct net *net, struct 
user_namespace *user_ns)
net->user_ns = user_ns;
idr_init(&net->netns_ids);
spin_lock_init(&net->nsid_lock);
-   mutex_init(&net->ipv4.ra_mutex);
 
list_for_each_entry(ops, &pernet_list, list) {
error = ops_init(ops, net);
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 20fda8fb8ffd..57b7bffb93e5 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1817,6 +1817,8 @@ static __net_init int inet_init_net(struct net *net)
net->ipv4.sysctl_igmp_llm_reports = 1;
net->ipv4.sysctl_igmp_qrv = 2;
 
+   mutex_init(&net->ipv4.ra_mutex);
+
return 0;
 }
 
-- 
2.14.4

Re: [PATCH net] bnxt_en: Fix VF mac address regression.

2018-09-14 Thread Siwei Liu

Ack. Looks fine to me.

-Siwei

On Fri, Sep 14, 2018 at 12:41 PM, Michael Chan
 wrote:
> The recent commit to always forward the VF MAC address to the PF for
> approval may not work if the PF driver or the firmware is older.  This
> will cause the VF driver to fail during probe:
>
>   bnxt_en :00:03.0 (unnamed net_device) (uninitialized): hwrm req_type 
> 0xf seq id 0x5 error 0x
>   bnxt_en :00:03.0 (unnamed net_device) (uninitialized): VF MAC address 
> 00:00:17:02:05:d0 not approved by the PF
>   bnxt_en :00:03.0: Unable to initialize mac address.
>   bnxt_en: probe of :00:03.0 failed with error -99
>
> We fix it by treating the error as fatal only if the VF MAC address is
> locally generated by the VF.
>
> Fixes: 707e7e966026 ("bnxt_en: Always forward VF MAC address to the PF.")
> Reported-by: Seth Forshee 
> Reported-by: Siwei Liu 
> Signed-off-by: Michael Chan 
> ---
> Please queue this for stable as well.  Thanks.
>
>  drivers/net/ethernet/broadcom/bnxt/bnxt.c   | 9 +++--
>  drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c | 9 +
>  drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h | 2 +-
>  3 files changed, 13 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
> b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index cecbb1d..177587f 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> @@ -8027,7 +8027,7 @@ static int bnxt_change_mac_addr(struct net_device *dev, 
> void *p)
> if (ether_addr_equal(addr->sa_data, dev->dev_addr))
> return 0;
>
> -   rc = bnxt_approve_mac(bp, addr->sa_data);
> +   rc = bnxt_approve_mac(bp, addr->sa_data, true);
> if (rc)
> return rc;
>
> @@ -8827,14 +8827,19 @@ static int bnxt_init_mac_addr(struct bnxt *bp)
> } else {
>  #ifdef CONFIG_BNXT_SRIOV
> struct bnxt_vf_info *vf = &bp->vf;
> +   bool strict_approval = true;
>
> if (is_valid_ether_addr(vf->mac_addr)) {
> /* overwrite netdev dev_addr with admin VF MAC */
> memcpy(bp->dev->dev_addr, vf->mac_addr, ETH_ALEN);
> +   /* Older PF driver or firmware may not approve this
> +* correctly.
> +*/
> +   strict_approval = false;
> } else {
> eth_hw_addr_random(bp->dev);
> }
> -   rc = bnxt_approve_mac(bp, bp->dev->dev_addr);
> +   rc = bnxt_approve_mac(bp, bp->dev->dev_addr, strict_approval);
>  #endif
> }
> return rc;
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c 
> b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
> index fcd085a..3962f6f 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
> @@ -1104,7 +1104,7 @@ void bnxt_update_vf_mac(struct bnxt *bp)
> mutex_unlock(&bp->hwrm_cmd_lock);
>  }
>
> -int bnxt_approve_mac(struct bnxt *bp, u8 *mac)
> +int bnxt_approve_mac(struct bnxt *bp, u8 *mac, bool strict)
>  {
> struct hwrm_func_vf_cfg_input req = {0};
> int rc = 0;
> @@ -1122,12 +1122,13 @@ int bnxt_approve_mac(struct bnxt *bp, u8 *mac)
> memcpy(req.dflt_mac_addr, mac, ETH_ALEN);
> rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
>  mac_done:
> -   if (rc) {
> +   if (rc && strict) {
> rc = -EADDRNOTAVAIL;
> netdev_warn(bp->dev, "VF MAC address %pM not approved by the 
> PF\n",
> mac);
> +   return rc;
> }
> -   return rc;
> +   return 0;
>  }
>  #else
>
> @@ -1144,7 +1145,7 @@ void bnxt_update_vf_mac(struct bnxt *bp)
>  {
>  }
>
> -int bnxt_approve_mac(struct bnxt *bp, u8 *mac)
> +int bnxt_approve_mac(struct bnxt *bp, u8 *mac, bool strict)
>  {
> return 0;
>  }
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h 
> b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h
> index e9b20cd..2eed9ed 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h
> @@ -39,5 +39,5 @@ int bnxt_sriov_configure(struct pci_dev *pdev, int num_vfs);
>  void bnxt_sriov_disable(struct bnxt *);
>  void bnxt_hwrm_exec_fwd_req(struct bnxt *);
>  void bnxt_update_vf_mac(struct bnxt *);
> -int bnxt_approve_mac(struct bnxt *, u8 *);
> +int bnxt_approve_mac(struct bnxt *, u8 *, bool);
>  #endif
> --
> 2.5.1
>

[net-next PATCH] tls: async support causes out-of-bounds access in crypto APIs

2018-09-14 Thread John Fastabend

When async support was added it needed to access the sk from the async
callback to report errors up the stack. The patch tried to use space
after the aead request struct by directly setting the reqsize field in
aead_request. This is an internal field that should not be used
outside the crypto APIs. It is used by the crypto code to define extra
space for private structures used in the crypto context. Users of the
API then use crypto_aead_reqsize() and add the returned amount of
bytes to the end of the request memory allocation before posting the
request to encrypt/decrypt APIs.
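
As a rough user-space model of that allocation contract (a sketch only;
toy_tfm, toy_request and the helper names are hypothetical and not part
of the kernel crypto API), the caller sizes the request from an accessor
and never writes the size field itself:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the crypto API types. */
struct toy_tfm {
    size_t reqsize;  /* bytes the crypto driver needs after each request */
};

struct toy_request {
    struct toy_tfm *tfm;
    unsigned char ctx[];  /* driver-private area, reqsize bytes long */
};

/* Accessor, playing the role of crypto_aead_reqsize(). */
static size_t toy_reqsize(const struct toy_tfm *tfm)
{
    return tfm->reqsize;
}

/* Correct pattern: the caller sizes the allocation from the accessor. */
static struct toy_request *toy_request_alloc(struct toy_tfm *tfm)
{
    struct toy_request *req;

    assert(tfm != NULL);
    req = calloc(1, sizeof(*req) + toy_reqsize(tfm));
    if (req)
        req->tfm = tfm;
    return req;
}

/* Trailing bytes the driver would touch that the caller did not allocate. */
static size_t toy_overrun(size_t driver_needs, size_t allocated_tail)
{
    return driver_needs > allocated_tail ? driver_needs - allocated_tail : 0;
}
```

If a user overwrites reqsize with a smaller value (say 8 bytes where the
driver needs 96; both numbers are made up for illustration),
toy_overrun() reports the 88 bytes the driver would write past the end
of the allocation.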

So this breaks (with a general protection fault and a KASAN error, if
enabled) because the request sent to decrypt is shorter than required,
causing out-of-bounds accesses in the crypto API. Also, it seems unlikely
the sk is even valid by the time it gets to the callback because of the
memset in the crypto layer.

Fix this by holding the sk in the skb->sk field when the
callback is set up; because the skb is already passed through to
the callback handler via void *, we can access it in the handler. Then
in the handler we need to be careful to NULL the pointer again before
kfree_skb. I added comments on both the setup (in tls_do_decryption)
and where we clear it in the crypto callback handler
tls_decrypt_done(). After this, selftests pass again and the KASAN
errors/warnings are gone.

Fixes: 94524d8fc965 ("net/tls: Add support for async decryption of tls records")
Signed-off-by: John Fastabend 
---
 include/net/tls.h |4 
 net/tls/tls_sw.c  |   39 +++
 2 files changed, 23 insertions(+), 20 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index cd0a65b..8630d28 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -128,10 +128,6 @@ struct tls_sw_context_rx {
bool async_notify;
 };
 
-struct decrypt_req_ctx {
-   struct sock *sk;
-};
-
 struct tls_record_info {
struct list_head list;
u32 end_seq;
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index be4f2e9..cef69b6 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -122,25 +122,32 @@ static int skb_nsg(struct sk_buff *skb, int offset, int 
len)
 static void tls_decrypt_done(struct crypto_async_request *req, int err)
 {
struct aead_request *aead_req = (struct aead_request *)req;
-   struct decrypt_req_ctx *req_ctx =
-   (struct decrypt_req_ctx *)(aead_req + 1);
-
struct scatterlist *sgout = aead_req->dst;
-
-   struct tls_context *tls_ctx = tls_get_ctx(req_ctx->sk);
-   struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
-   int pending = atomic_dec_return(&ctx->decrypt_pending);
+   struct tls_sw_context_rx *ctx;
+   struct tls_context *tls_ctx;
struct scatterlist *sg;
+   struct sk_buff *skb;
unsigned int pages;
+   int pending;
+
+   skb = (struct sk_buff *)req->data;
+   tls_ctx = tls_get_ctx(skb->sk);
+   ctx = tls_sw_ctx_rx(tls_ctx);
+   pending = atomic_dec_return(&ctx->decrypt_pending);
 
/* Propagate if there was an err */
if (err) {
ctx->async_wait.err = err;
-   tls_err_abort(req_ctx->sk, err);
+   tls_err_abort(skb->sk, err);
}
 
+   /* After using skb->sk to propagate sk through crypto async callback
+* we need to NULL it again.
+*/
+   skb->sk = NULL;
+
/* Release the skb, pages and memory allocated for crypto req */
-   kfree_skb(req->data);
+   kfree_skb(skb);
 
/* Skip the first S/G entry as it points to AAD */
for_each_sg(sg_next(sgout), sg, UINT_MAX, pages) {
@@ -175,11 +182,13 @@ static int tls_do_decryption(struct sock *sk,
   (u8 *)iv_recv);
 
if (async) {
-   struct decrypt_req_ctx *req_ctx;
-
-   req_ctx = (struct decrypt_req_ctx *)(aead_req + 1);
-   req_ctx->sk = sk;
-
+   /* Using skb->sk to push sk through to crypto async callback
+* handler. This allows propagating errors up to the socket
+* if needed. It _must_ be cleared in the async handler
+* before kfree_skb is called. We _know_ skb->sk is NULL
+* because it is a clone from strparser.
+*/
+   skb->sk = sk;
aead_request_set_callback(aead_req,
  CRYPTO_TFM_REQ_MAY_BACKLOG,
  tls_decrypt_done, skb);
@@ -1455,8 +1464,6 @@ int tls_set_sw_offload(struct sock *sk, struct 
tls_context *ctx, int tx)
goto free_aead;
 
if (sw_ctx_rx) {
-   (*aead)->reqsize = sizeof(struct decrypt_req_ctx);
-
/* Set up strparser */
memset(&cb, 0, sizeof(cb));
cb.rcv_msg = tls_queue;

[PATCH v2 2/2] hv_netvsc: pair VF based on serial number

2018-09-14 Thread Stephen Hemminger

Matching network devices based on MAC address is problematic
since a non-VF network device can be created with a duplicate MAC
address, causing confusion and problems.  The VMBus API does provide
a serial number that is a better matching method.
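
A minimal user-space sketch of the lookup idea follows. It is not the
driver code: struct toy_netvsc and find_by_serial() are hypothetical
names, and a plain array stands in for the driver's device list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a synthetic NIC's VF association. */
struct toy_netvsc {
    const char *name;
    bool vf_alloc;       /* host reported a VF as attached */
    uint32_t vf_serial;  /* serial number reported by the host */
};

/* Return the synthetic device paired with this serial, or NULL if none. */
static const struct toy_netvsc *
find_by_serial(const struct toy_netvsc *devs, size_t n, uint32_t serial)
{
    assert(devs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!devs[i].vf_alloc)
            continue;  /* no VF attached, serial is meaningless */
        if (devs[i].vf_serial == serial)
            return &devs[i];
    }
    return NULL;
}
```

Unlike a MAC comparison, a duplicate address on a tunnel or layered
device cannot produce a false match here, because only devices for
which the host announced a VF are considered.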

Signed-off-by: Stephen Hemminger 
---
 drivers/net/hyperv/netvsc.c |  3 ++
 drivers/net/hyperv/netvsc_drv.c | 58 +++--
 2 files changed, 36 insertions(+), 25 deletions(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 31c3d77b4733..fe01e141c8f8 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -1203,6 +1203,9 @@ static void netvsc_send_vf(struct net_device *ndev,
 
net_device_ctx->vf_alloc = nvmsg->msg.v4_msg.vf_assoc.allocated;
net_device_ctx->vf_serial = nvmsg->msg.v4_msg.vf_assoc.serial;
+   netdev_info(ndev, "VF slot %u %s\n",
+   net_device_ctx->vf_serial,
+   net_device_ctx->vf_alloc ? "added" : "removed");
 }
 
 static  void netvsc_receive_inband(struct net_device *ndev,
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 1121a1ec407c..9dedc1463e88 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1894,20 +1894,6 @@ static void netvsc_link_change(struct work_struct *w)
rtnl_unlock();
 }
 
-static struct net_device *get_netvsc_bymac(const u8 *mac)
-{
-   struct net_device_context *ndev_ctx;
-
-   list_for_each_entry(ndev_ctx, &netvsc_dev_list, list) {
-   struct net_device *dev = hv_get_drvdata(ndev_ctx->device_ctx);
-
-   if (ether_addr_equal(mac, dev->perm_addr))
-   return dev;
-   }
-
-   return NULL;
-}
-
 static struct net_device *get_netvsc_byref(struct net_device *vf_netdev)
 {
struct net_device_context *net_device_ctx;
@@ -2036,26 +2022,48 @@ static void netvsc_vf_setup(struct work_struct *w)
rtnl_unlock();
 }
 
+/* Find netvsc by VMBus serial number.
+ * The PCI hyperv controller records the serial number as the slot.
+ */
+static struct net_device *get_netvsc_byslot(const struct net_device *vf_netdev)
+{
+   struct device *parent = vf_netdev->dev.parent;
+   struct net_device_context *ndev_ctx;
+   struct pci_dev *pdev;
+
+   if (!parent || !dev_is_pci(parent))
+   return NULL; /* not a PCI device */
+
+   pdev = to_pci_dev(parent);
+   if (!pdev->slot) {
+   netdev_notice(vf_netdev, "no PCI slot information\n");
+   return NULL;
+   }
+
+   list_for_each_entry(ndev_ctx, &netvsc_dev_list, list) {
+   if (!ndev_ctx->vf_alloc)
+   continue;
+
+   if (ndev_ctx->vf_serial == pdev->slot->number)
+   return hv_get_drvdata(ndev_ctx->device_ctx);
+   }
+
+   netdev_notice(vf_netdev,
+ "no netdev found for slot %u\n", pdev->slot->number);
+   return NULL;
+}
+
 static int netvsc_register_vf(struct net_device *vf_netdev)
 {
-   struct net_device *ndev;
struct net_device_context *net_device_ctx;
-   struct device *pdev = vf_netdev->dev.parent;
struct netvsc_device *netvsc_dev;
+   struct net_device *ndev;
int ret;
 
if (vf_netdev->addr_len != ETH_ALEN)
return NOTIFY_DONE;
 
-   if (!pdev || !dev_is_pci(pdev) || dev_is_pf(pdev))
-   return NOTIFY_DONE;
-
-   /*
-* We will use the MAC address to locate the synthetic interface to
-* associate with the VF interface. If we don't find a matching
-* synthetic interface, move on.
-*/
-   ndev = get_netvsc_bymac(vf_netdev->perm_addr);
+   ndev = get_netvsc_byslot(vf_netdev);
if (!ndev)
return NOTIFY_DONE;
 
-- 
2.18.0

[PATCH v2 0/2] hv_netvsc: associate VF and PV device by serial number

2018-09-14 Thread Stephen Hemminger

The Hyper-V implementation of the PCI controller has the concept of a 32-bit
serial number (not to be confused with the PCI-E serial number).  This value
is sent in the protocol from the host to indicate that an SR-IOV VF device is
attached to a synthetic NIC.

Using the serial number (instead of MAC address) to associate the two devices
avoids lots of potential problems when there are duplicate MAC addresses from
tunnels or layered devices.

The patch set is broken into two parts, one is for the PCI controller
and the other is for the netvsc device. Normally, these go through different
trees but sending them together here for better review. The PCI changes
were submitted previously, but the main review comment was "why do you
need this?". This is why.

v2 - slot name can be shorter.
 remove locking when creating pci_slots; see comment for explanation

Stephen Hemminger (2):
  PCI: hv: support reporting serial number as slot information
  hv_netvsc: pair VF based on serial number

 drivers/net/hyperv/netvsc.c |  3 ++
 drivers/net/hyperv/netvsc_drv.c | 58 -
 drivers/pci/controller/pci-hyperv.c | 37 ++
 3 files changed, 73 insertions(+), 25 deletions(-)

-- 
2.18.0

[PATCH v2 1/2] PCI: hv: support reporting serial number as slot information

2018-09-14 Thread Stephen Hemminger

The Hyper-V host API for PCI provides a unique "serial number" which
can be used as basis for sysfs PCI slot table. This can be useful
for cases where userspace wants to find the PCI device based on
serial number.

When an SR-IOV NIC is added, the host sends an attach message
with serial number. The kernel doesn't use the serial number, but
it is useful when doing the same thing in a userspace driver such
as the DPDK. By having /sys/bus/pci/slots/N it provides a direct
way to find the matching PCI device.
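
For illustration, a userspace lookup might look like the sketch below.
It assumes the slot directory is named after the decimal serial number
and exposes the usual "address" attribute; slot_address_path() and
read_slot_address() are hypothetical helpers, not an existing API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "/sys/bus/pci/slots/<serial>/address" into buf.
 * Returns 0 on success, -1 if buf is too small.
 */
static int slot_address_path(uint32_t serial, char *buf, size_t buflen)
{
    int n;

    assert(buf != NULL);
    n = snprintf(buf, buflen, "/sys/bus/pci/slots/%u/address",
                 (unsigned int)serial);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}

/* Read the PCI address string for a slot; returns 0 on success. */
static int read_slot_address(uint32_t serial, char *addr, size_t addrlen)
{
    char path[64];
    FILE *f;
    int ok;

    if (slot_address_path(serial, path, sizeof(path)) != 0)
        return -1;
    f = fopen(path, "r");
    if (!f)
        return -1;
    ok = fgets(addr, (int)addrlen, f) != NULL;
    fclose(f);
    if (!ok)
        return -1;
    addr[strcspn(addr, "\n")] = '\0';  /* strip the trailing newline */
    return 0;
}
```

A DPDK-style application could call read_slot_address() with the serial
number from the attach message and then open the device at the returned
address.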

There may be some cases where the serial number is not unique, such
as when using GPUs. But the PCI slot infrastructure will handle
that.

This has a side effect which may also be useful. The common udev
network device naming policy uses the slot information (rather
than PCI address).

Signed-off-by: Stephen Hemminger 
---
 drivers/pci/controller/pci-hyperv.c | 37 +
 1 file changed, 37 insertions(+)

diff --git a/drivers/pci/controller/pci-hyperv.c 
b/drivers/pci/controller/pci-hyperv.c
index c00f82cc54aa..ee80e79db21a 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -89,6 +89,9 @@ static enum pci_protocol_version_t pci_protocol_version;
 
 #define STATUS_REVISION_MISMATCH 0xC059
 
+/* space for 32bit serial number as string */
+#define SLOT_NAME_SIZE 11
+
 /*
  * Message Types
  */
@@ -494,6 +497,7 @@ struct hv_pci_dev {
struct list_head list_entry;
refcount_t refs;
enum hv_pcichild_state state;
+   struct pci_slot *pci_slot;
struct pci_function_description desc;
bool reported_missing;
struct hv_pcibus_device *hbus;
@@ -1457,6 +1461,34 @@ static void prepopulate_bars(struct hv_pcibus_device 
*hbus)
spin_unlock_irqrestore(&hbus->device_list_lock, flags);
 }
 
+/*
+ * Assign entries in sysfs pci slot directory.
+ *
+ * Note that this function does not need to lock the children list
+ * because it is called from pci_devices_present_work which
+ * is serialized with hv_eject_device_work because they are on the
+ * same ordered workqueue. Therefore hbus->children list will not change
+ * even when pci_create_slot sleeps.
+ */
+static void hv_pci_assign_slots(struct hv_pcibus_device *hbus)
+{
+   struct hv_pci_dev *hpdev;
+   char name[SLOT_NAME_SIZE];
+   int slot_nr;
+
+   list_for_each_entry(hpdev, &hbus->children, list_entry) {
+   if (hpdev->pci_slot)
+   continue;
+
+   slot_nr = PCI_SLOT(wslot_to_devfn(hpdev->desc.win_slot.slot));
+   snprintf(name, SLOT_NAME_SIZE, "%u", hpdev->desc.ser);
+   hpdev->pci_slot = pci_create_slot(hbus->pci_bus, slot_nr,
+ name, NULL);
+   if (!hpdev->pci_slot)
+   pr_warn("pci_create slot %s failed\n", name);
+   }
+}
+
 /**
  * create_root_hv_pci_bus() - Expose a new root PCI bus
  * @hbus:  Root PCI bus, as understood by this driver
@@ -1480,6 +1512,7 @@ static int create_root_hv_pci_bus(struct hv_pcibus_device 
*hbus)
pci_lock_rescan_remove();
pci_scan_child_bus(hbus->pci_bus);
pci_bus_assign_resources(hbus->pci_bus);
+   hv_pci_assign_slots(hbus);
pci_bus_add_devices(hbus->pci_bus);
pci_unlock_rescan_remove();
hbus->state = hv_pcibus_installed;
@@ -1742,6 +1775,7 @@ static void pci_devices_present_work(struct work_struct 
*work)
 */
pci_lock_rescan_remove();
pci_scan_child_bus(hbus->pci_bus);
+   hv_pci_assign_slots(hbus);
pci_unlock_rescan_remove();
break;
 
@@ -1858,6 +1892,9 @@ static void hv_eject_device_work(struct work_struct *work)
list_del(&hpdev->list_entry);
spin_unlock_irqrestore(&hpdev->hbus->device_list_lock, flags);
 
+   if (hpdev->pci_slot)
+   pci_destroy_slot(hpdev->pci_slot);
+
memset(&ctxt, 0, sizeof(ctxt));
ejct_pkt = (struct pci_eject_response *)&ctxt.pkt.message;
ejct_pkt->message_type.type = PCI_EJECTION_COMPLETE;
-- 
2.18.0

[PATCH net] bnxt_en: Fix VF mac address regression.

2018-09-14 Thread Michael Chan

The recent commit to always forward the VF MAC address to the PF for
approval may not work if the PF driver or the firmware is older.  This
will cause the VF driver to fail during probe:

  bnxt_en :00:03.0 (unnamed net_device) (uninitialized): hwrm req_type 0xf 
seq id 0x5 error 0x
  bnxt_en :00:03.0 (unnamed net_device) (uninitialized): VF MAC address 
00:00:17:02:05:d0 not approved by the PF
  bnxt_en :00:03.0: Unable to initialize mac address.
  bnxt_en: probe of :00:03.0 failed with error -99

We fix it by treating the error as fatal only if the VF MAC address is
locally generated by the VF.
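
The decision can be sketched in isolation as below. This is a
simplified model, not the driver code: approve_result() and
needs_strict_approval() are hypothetical helpers, and pf_rc stands for
whatever the PF or firmware returned.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Map the PF/firmware result to the value the VF driver acts on.
 * A failure is only fatal when strict approval was requested.
 */
static int approve_result(int pf_rc, bool strict)
{
    if (pf_rc && strict)
        return -EADDRNOTAVAIL;
    return 0;
}

/*
 * An admin-assigned VF MAC is tolerated even if an older PF driver or
 * firmware fails to approve it; a locally generated MAC is not.
 */
static bool needs_strict_approval(bool admin_mac_valid)
{
    return !admin_mac_valid;
}
```

So an approval failure for an admin-assigned MAC yields 0 and probe
continues, while the same failure for a random MAC still yields
-EADDRNOTAVAIL (the -99 seen in the log above on Linux).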

Fixes: 707e7e966026 ("bnxt_en: Always forward VF MAC address to the PF.")
Reported-by: Seth Forshee 
Reported-by: Siwei Liu 
Signed-off-by: Michael Chan 
---
Please queue this for stable as well.  Thanks.

 drivers/net/ethernet/broadcom/bnxt/bnxt.c   | 9 +++--
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c | 9 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h | 2 +-
 3 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index cecbb1d..177587f 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -8027,7 +8027,7 @@ static int bnxt_change_mac_addr(struct net_device *dev, 
void *p)
if (ether_addr_equal(addr->sa_data, dev->dev_addr))
return 0;
 
-   rc = bnxt_approve_mac(bp, addr->sa_data);
+   rc = bnxt_approve_mac(bp, addr->sa_data, true);
if (rc)
return rc;
 
@@ -8827,14 +8827,19 @@ static int bnxt_init_mac_addr(struct bnxt *bp)
} else {
 #ifdef CONFIG_BNXT_SRIOV
struct bnxt_vf_info *vf = &bp->vf;
+   bool strict_approval = true;
 
if (is_valid_ether_addr(vf->mac_addr)) {
/* overwrite netdev dev_addr with admin VF MAC */
memcpy(bp->dev->dev_addr, vf->mac_addr, ETH_ALEN);
+   /* Older PF driver or firmware may not approve this
+* correctly.
+*/
+   strict_approval = false;
} else {
eth_hw_addr_random(bp->dev);
}
-   rc = bnxt_approve_mac(bp, bp->dev->dev_addr);
+   rc = bnxt_approve_mac(bp, bp->dev->dev_addr, strict_approval);
 #endif
}
return rc;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
index fcd085a..3962f6f 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
@@ -1104,7 +1104,7 @@ void bnxt_update_vf_mac(struct bnxt *bp)
mutex_unlock(&bp->hwrm_cmd_lock);
 }
 
-int bnxt_approve_mac(struct bnxt *bp, u8 *mac)
+int bnxt_approve_mac(struct bnxt *bp, u8 *mac, bool strict)
 {
struct hwrm_func_vf_cfg_input req = {0};
int rc = 0;
@@ -1122,12 +1122,13 @@ int bnxt_approve_mac(struct bnxt *bp, u8 *mac)
memcpy(req.dflt_mac_addr, mac, ETH_ALEN);
rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
 mac_done:
-   if (rc) {
+   if (rc && strict) {
rc = -EADDRNOTAVAIL;
netdev_warn(bp->dev, "VF MAC address %pM not approved by the 
PF\n",
mac);
+   return rc;
}
-   return rc;
+   return 0;
 }
 #else
 
@@ -1144,7 +1145,7 @@ void bnxt_update_vf_mac(struct bnxt *bp)
 {
 }
 
-int bnxt_approve_mac(struct bnxt *bp, u8 *mac)
+int bnxt_approve_mac(struct bnxt *bp, u8 *mac, bool strict)
 {
return 0;
 }
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h 
b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h
index e9b20cd..2eed9ed 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h
@@ -39,5 +39,5 @@ int bnxt_sriov_configure(struct pci_dev *pdev, int num_vfs);
 void bnxt_sriov_disable(struct bnxt *);
 void bnxt_hwrm_exec_fwd_req(struct bnxt *);
 void bnxt_update_vf_mac(struct bnxt *);
-int bnxt_approve_mac(struct bnxt *, u8 *);
+int bnxt_approve_mac(struct bnxt *, u8 *, bool);
 #endif
-- 
2.5.1

Re: [PATCH net-next v2] net/tls: Add support for async decryption of tls records

2018-09-14 Thread John Fastabend

On 08/29/2018 02:56 AM, Vakul Garg wrote:
> When tls records are decrypted using asynchronous accelerators such as
> NXP CAAM engine, the crypto apis return -EINPROGRESS. Presently, on
> getting -EINPROGRESS, the tls record processing stops till the time the
> crypto accelerator finishes off and returns the result. This incurs a
> context switch and is not an efficient way of accessing the crypto
> accelerators. Crypto accelerators work efficiently when they are queued
> with multiple crypto jobs without having to wait for the previous ones
> to complete.
> 
> The patch submits multiple crypto requests without having to wait
> for previous ones to complete. This has been implemented for records
> which are decrypted in zero-copy mode. At the end of recvmsg(), we wait
> for all the asynchronous decryption requests to complete.
> 
> The references to records which have been sent for async decryption are
> dropped. For cases where record decryption is not possible in zero-copy
> mode, asynchronous decryption is not used and we wait for decryption
> crypto api to complete.
> 
> For crypto requests executing in async fashion, the memory for
> aead_request, sglists and skb etc is freed from the decryption
> completion handler. The decryption completion handler wakes up the
> sleeping user context when recvmsg() flags that it has done sending
> all the decryption requests and there are no more decryption requests
> pending to be completed.
> 
> Signed-off-by: Vakul Garg 
> Reviewed-by: Dave Watson 
> ---

[...]


> @@ -1271,6 +1377,8 @@ int tls_set_sw_offload(struct sock *sk, struct 
> tls_context *ctx, int tx)
>   goto free_aead;
>  
>   if (sw_ctx_rx) {
> + (*aead)->reqsize = sizeof(struct decrypt_req_ctx);
> +

This is not valid and may cause a GPF or, in the best case, only a KASAN
warning. 'reqsize' should probably not be mangled outside the
internal crypto APIs, but the real reason is that reqsize is used
to determine how much space is needed at the end of the aead_request
for the crypto private ctx in encrypt/decrypt. After this patch,
when we submit an aead_request the crypto layer will think it
has room for its private structs at the end, but now only 8B will
be there and the crypto layer will happily memset some arbitrary
memory for you, amongst other things.

Anyway, I am testing a fix now and will post it shortly.

Thanks,
John

Re: [bpf-next, v4 0/5] Introduce eBPF flow dissector

2018-09-14 Thread Alexei Starovoitov

On Fri, Sep 14, 2018 at 07:46:17AM -0700, Petar Penkov wrote:
> From: Petar Penkov 
> 
> This patch series hardens the RX stack by allowing flow dissection in BPF,
> as previously discussed [1]. Because of the rigorous checks of the BPF
> verifier, this provides significant security guarantees. In particular, the
> BPF flow dissector cannot get inside of an infinite loop, as with
> CVE-2013-4348, because BPF programs are guaranteed to terminate. It cannot
> read outside of packet bounds, because all memory accesses are checked.
> Also, with BPF the administrator can decide which protocols to support,
> reducing potential attack surface. Rarely encountered protocols can be
> excluded from dissection and the program can be updated without kernel
> recompile or reboot if a bug is discovered.
> 
> Patch 1 adds infrastructure to execute a BPF program in __skb_flow_dissect.
> This includes a new BPF program and attach type.
> 
> Patch 2 adds the new BPF flow dissector definitions to tools/uapi.
> 
> Patch 3 adds support for the new BPF program type to libbpf and bpftool.
> 
> Patch 4 adds a flow dissector program in BPF. This parses most protocols in
> __skb_flow_dissect in BPF for a subset of flow keys (basic, control, ports,
> and address types).
> 
> Patch 5 adds a selftest that attaches the BPF program to the flow dissector
> and sends traffic with different levels of encapsulation.
> 
> Performance Evaluation:
> The in-kernel implementation was compared against the demo program from
> patch 4 using the test in patch 5 with IPv4/UDP traffic over 10 seconds.
>   $perf record -a -C 4 taskset -c 4 ./test_flow_dissector -i 4 -f 8 \
>   -t 10

Looks great. Applied to bpf-next with one extra patch:
 SEC("dissect")
-int dissect(struct __sk_buff *skb)
+int _dissect(struct __sk_buff *skb)

otherwise the test doesn't build.
I'm not sure how it builds for you. Which llvm did you use?

Also above command works and ipv4 test in ./test_flow_dissector.sh
is passing as well, but it still fails at the end for me:
./test_flow_dissector.sh
bpffs not mounted. Mounting...
0: IP
1: IPV6
2: IPV6OP
3: IPV6FR
4: MPLS
5: VLAN
Testing IPv4...
inner.dest4: 127.0.0.1
inner.source4: 127.0.0.3
pkts: tx=10 rx=10
inner.dest4: 127.0.0.1
inner.source4: 127.0.0.3
pkts: tx=10 rx=0
inner.dest4: 127.0.0.1
inner.source4: 127.0.0.3
pkts: tx=10 rx=10
Testing IPIP...
tunnels before test:
tunl0: any/ip remote any local any ttl inherit nopmtudisc
sit_test_LV5N: any/ip remote 127.0.0.2 local 127.0.0.1 dev lo ttl inherit
ipip_test_LV5N: any/ip remote 127.0.0.2 local 127.0.0.1 dev lo ttl inherit
sit0: ipv6/ip remote any local any ttl 64 nopmtudisc
gre_test_LV5N: gre/ip remote 127.0.0.2 local 127.0.0.1 dev lo ttl inherit
gre0: gre/ip remote any local any ttl inherit nopmtudisc
inner.dest4: 192.168.0.1
inner.source4: 1.1.1.1
encap proto:   4
outer.dest4: 127.0.0.1
outer.source4: 127.0.0.2
pkts: tx=10 rx=0
tunnels after test:
tunl0: any/ip remote any local any ttl inherit nopmtudisc
sit0: ipv6/ip remote any local any ttl 64 nopmtudisc
gre0: gre/ip remote any local any ttl inherit nopmtudisc
selftests: test_flow_dissector [FAILED]

Is it something in my setup, or is the test broken?

Re: [RFC PATCH net-next v1 00/14] rename and shrink i40evf

2018-09-14 Thread Jesse Brandeburg

On Fri, 14 Sep 2018 12:10:45 +0300 Or wrote:
> On Fri, Sep 14, 2018 at 1:31 AM, Jesse Brandeburg
>  wrote:
> on what HW ring format do you standardize? do i40e/Fortville and
> ice/what's-the-intel-code-name?  HWs can/use the same posting/completion
> descriptor?

The initial ring format is the same as used for XL710/X722 devices, and
is planned to be supported for the Intel Ethernet E800 series (ice driver) and
future VF devices using SR-IOV.

> > This solves 2 issues we saw coming or were already present, the
> > first was constant code duplication happening with i40e/i40evf,
> > when much of the duplicate code in the i40evf was not used or was
> > not needed.  
> 
> could you spare few words on the origin/nature of these duplicates? were them
> just developer C mistakes for functionality which is irrelevant for
> a VF? like what?
> if not, what was there?

In particular, some of the code was not used at all, but was not caught
by any automation because it was in a header file and included into
multiple file scopes.  The other big chunk of the duplicate code was for
the PF's usage of the communication channel to firmware, which for some
reason was left in the VF driver code (probably just to avoid changing
the file) - but the VF driver doesn't communicate to firmware, just to
the PF.

> > The second was to remove the future confusion of why
> > future VF devices that were not considered "40GbE" only devices
> > were supported by i40evf.  
> 
> can elaborate further?

The name i40evf was generating customer questions, and was confusing
when you add in multiple generations of PF hardware that are no longer
using the i40e driver.

> > The thought is that iavf will be the virtual function driver for
> > all future devices, so it should have a "generic" name to propery
> > represent that it is the VF driver for multiple generations of
> > devices.  
> 
> for that end,  as I think was explained @ the netdev Tokyo AVF session,
> you would need a mechanism for feature negotiation, is it here or coming up?

The driver already has it (feature negotiation); please see the
function called iavf_send_vf_config_msg, and follow from where it is
called.  Basically the VF driver negotiates with the PF for what it can
do, and the PF guarantees that the base set of features will always
work, with optional advanced features which the code may/may-not have
in the future.
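
To illustrate the negotiation described above, here is a minimal sketch of
a capability handshake. It is not the iavf/virtchnl code: the CAP_* bits and
both function names are hypothetical, and the only behavior taken from the
text is that the PF always grants a base feature set and may grant optional
ones.

```c
#include <stdint.h>

/* Hypothetical capability bits; these are NOT the real virtchnl flags. */
#define CAP_BASE	(1u << 0)	/* baseline features, always granted */
#define CAP_RSS		(1u << 1)	/* optional feature */
#define CAP_VLAN	(1u << 2)	/* optional feature */
#define CAP_ADV_DESC	(1u << 3)	/* optional feature a newer PF may add */

/*
 * PF side: grant what the VF asked for and the PF supports, and always
 * include the base set so an old VF driver keeps working on a new PF.
 */
static uint32_t pf_grant_caps(uint32_t pf_supported, uint32_t vf_requested)
{
	return (pf_supported & vf_requested) | CAP_BASE;
}

/* VF side: check a granted capability before using the feature. */
static int vf_has_cap(uint32_t granted, uint32_t cap)
{
	return (granted & cap) != 0;
}
```

A VF that requests CAP_RSS | CAP_ADV_DESC from a PF supporting only
CAP_BASE | CAP_RSS ends up with CAP_BASE | CAP_RSS and simply leaves the
advanced descriptor path disabled.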

> >  41 files changed, 3436 insertions(+), 7581 deletions(-)  
> 
> code diet is cool!

Thanks! ~4000 lines less made me very happy too.

[PATCH net] ipv6: fix possible use-after-free in ip6_xmit()

2018-09-14 Thread Eric Dumazet

In the unlikely case ip6_xmit() has to call skb_realloc_headroom(),
we need to call skb_set_owner_w() before consuming original skb,
otherwise we risk a use-after-free.

Bring IPv6 in line with what we do in IPv4 to fix this.
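
The ordering issue can be shown with a small userspace model. This is only
a sketch under stated assumptions: struct owner and struct buf are
hypothetical stand-ins for struct sock and struct sk_buff, and hit_zero
marks the point where the owner could have been freed. The new buffer is
charged to the owner before the old buffer is released, so the count never
touches zero in between.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct sock: counts buffers charged to it. */
struct owner {
	int refs;
	int hit_zero;	/* set once refs drops to 0 (owner could be freed) */
};

/* Hypothetical stand-in for struct sk_buff. */
struct buf {
	struct owner *owner;
	size_t len;
	unsigned char *data;
};

static struct buf *buf_alloc(size_t len)
{
	struct buf *b = calloc(1, sizeof(*b));

	if (!b)
		return NULL;
	b->data = calloc(len ? len : 1, 1);
	if (!b->data) {
		free(b);
		return NULL;
	}
	b->len = len;
	return b;
}

static void buf_set_owner(struct buf *b, struct owner *o)
{
	b->owner = o;
	o->refs++;
}

static void buf_free(struct buf *b)
{
	if (b->owner && --b->owner->refs == 0)
		b->owner->hit_zero = 1;
	free(b->data);
	free(b);
}

/* Replace old with a larger copy, keeping the owner alive throughout. */
static struct buf *buf_grow(struct buf *old, size_t extra)
{
	struct buf *nb = buf_alloc(old->len + extra);

	if (!nb)
		return NULL;	/* caller keeps ownership of old */
	memcpy(nb->data, old->data, old->len);
	if (old->owner)
		buf_set_owner(nb, old->owner);	/* charge the new buffer first */
	buf_free(old);				/* then release the old one */
	return nb;
}
```

Swapping the last two steps in buf_grow() would let refs reach zero while
the owner pointer is still about to be used, which is the pattern this
patch removes.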

Fixes: 1da177e4c3f41 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet 
Reported-by: syzbot 
---
 net/ipv6/ip6_output.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 16f200f06500758c4cae84ea16229d5dbce912cb..f9f8f554d141676a7d342f85088d12d9a6815e9d 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -219,12 +219,10 @@ int ip6_xmit(const struct sock *sk, struct sk_buff *skb, struct flowi6 *fl6,
kfree_skb(skb);
return -ENOBUFS;
}
+   if (skb->sk)
+   skb_set_owner_w(skb2, skb->sk);
consume_skb(skb);
skb = skb2;
-   /* skb_set_owner_w() changes sk->sk_wmem_alloc atomically,
-* it is safe to call in our context (socket lock not held)
-*/
-   skb_set_owner_w(skb, (struct sock *)sk);
}
if (opt->opt_flen)
ipv6_push_frag_opts(skb, opt, &proto);
-- 
2.19.0.397.gdd90340f6a-goog

Re: [RFC PATCH net-next v1 00/14] rename and shrink i40evf

2018-09-14 Thread Jesse Brandeburg

On Fri, 14 Sep 2018 13:39:17 +0900 Benjamin wrote:
> > Jesse Brandeburg (14):
> >   intel-ethernet: rename i40evf to iavf  
> 
> Seems like patch 1 didn't make it to netdev
> https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20180910/014025.html

Hi Ben, Thanks for the note, I don't know why it didn't show up for
you, it's here if you want to take a look:
https://patchwork.ozlabs.org/patch/969557/

Re: [PATCH net-next RFC 6/8] net: make gro configurable

2018-09-14 Thread Stephen Hemminger

On Fri, 14 Sep 2018 13:59:39 -0400
Willem de Bruijn  wrote:

> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index e5d236595206..8cb8e02c8ab6 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -572,6 +572,7 @@ static struct sk_buff *vxlan_gro_receive(struct sock *sk,
>struct list_head *head,
>struct sk_buff *skb)
>  {
> + const struct net_offload *ops;
>   struct sk_buff *pp = NULL;
>   struct sk_buff *p;
>   struct vxlanhdr *vh, *vh2;
> @@ -606,6 +607,12 @@ static struct sk_buff *vxlan_gro_receive(struct sock *sk,
>   goto out;
>   }
>  
> + rcu_read_lock();
> + ops = net_gro_receive(dev_offloads, ETH_P_TEB);
> + rcu_read_unlock();
> + if (!ops)
> + goto out;

Isn't rcu_read_lock already held here?
The RCU read lock is always held in the receive handler path.

> +
>   skb_gro_pull(skb, sizeof(struct vxlanhdr)); /* pull vxlan header */
>  
>   list_for_each_entry(p, head, list) {
> @@ -621,6 +628,7 @@ static struct sk_buff *vxlan_gro_receive(struct sock *sk,
>   }
>  
>   pp = call_gro_receive(eth_gro_receive, head, skb);
> +
>   flush = 0;

A whitespace change crept into this patch.

Re: mlx5 driver loading failing on v4.19 / net-next / bpf-next

2018-09-14 Thread Saeed Mahameed

On Fri, Sep 14, 2018 at 1:52 AM, Jesper Dangaard Brouer
 wrote:
> On Fri, 14 Sep 2018 01:22:15 -0700
> Saeed Mahameed  wrote:
>
>> On Thu, Sep 13, 2018 at 11:36 PM, Jesper Dangaard Brouer
>>  wrote:
>> > On Thu, 13 Sep 2018 15:55:29 -0700
>> > Alexei Starovoitov  wrote:
>> >
>> >> On Thu, Aug 30, 2018 at 1:35 AM, Tariq Toukan  wrote:
>> >> >
>> >> >
>> >> > On 29/08/2018 6:05 PM, Jesper Dangaard Brouer wrote:
>> >> >>
>> >> >> Hi Saeed,
>> >> >>
>> >> >> I'm having issues loading mlx5 driver on v4.19 kernels (tested both
>> >> >> net-next and bpf-next), while kernel v4.18 seems to work.  It happens
>> >> >> with a Mellanox ConnectX-5 NIC (and also a CX4-Lx but I removed that
>> >> >> from the system now).
>> >> >>
>> >> >
>> >> > Hi Jesper,
>> >> >
>> >> > Thanks for your report!
>> >> >
>> >> > We are working to analyze and debug the issue.
>> >>
>> >> looks like serious issue to me... while no news in 2 weeks.
>> >> any update?
>> >
>> > Mellanox took it offlist, and Sep 6th found that this is a regression
>> > introduced by commit 269d26f47f6f ("net/mlx5: Reduce command polling
>> > interval"), but only if CONFIG_PREEMPT is on.
>> >
>> > I can confirm that reverting this commit fixed the issue (and not the
>> > firmware upgrade I also did).
>> >
>> > I think Moshe (Cc) is responsible for this case, and I expect to soon
>> > see a revert or alternative solution to this!?
>> >
>> > Thanks for the kick Alexei :-)
>>
>> Thanks you Alexei and Jesper for following up,
>> the fix is already being tested [1] and will be submitted tomorrow,
>> as Jesper pointed out the issue happens only with 269d26f47f6f
>> ("net/mlx5: Reduce command polling
>> interval"), and only if CONFIG_PREEMPT is on.
>> the only affected kernel is 4.19 which is not GA yet.
>>
>> [1] 
>> https://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git/commit/?h=net-mlx5
>
> Sound good.
>
> I will appreciate if you add a:
>
> Reported-by: Jesper Dangaard Brouer 
>

Of course I will add it; the patch was simply in my review queue
before your report :).

> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer

Re: [PATCH] net: caif: remove redundant null check on frontpkt

2018-09-14 Thread Colin Ian King

On 14/09/18 18:54, Sergei Shtylyov wrote:
> Hello!
> 
> On 09/14/2018 08:19 PM, Colin King wrote:
> 
>> From: Colin Ian King 
>>
>> It is impossible for frontpkt to be null at the point of the null
>> check because it has been assigned from rearpkt and there is no
>> way realpkt can be null at the point of the assignment because
> 
>rearpkt?

Good spot. Can this be fixed up when the patch is applied?

> 
>> of the sanity checking and exit paths taken previously. Remove
>> the redundant null check.
>>
>> Detected by CoverityScan, CID#114434 ("Logically dead code")
>>
>> Signed-off-by: Colin Ian King 
> [...]
> 
> MBR, Sergei
>

[PATCH net-next RFC 8/8] udp: add gro

2018-09-14 Thread Willem de Bruijn

From: Willem de Bruijn 

Very rough initial version of UDP GRO, for discussion purposes only at
this point.

Among other things, it
- lacks the cmsg UDP_SEGMENT to return gso_size
- probably breaks udp tunnels
- hard breaks at 40 segments
- does not allow a last segment of unequal size
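
The last two limitations can be sketched as a simplified merge rule. This
is a userspace model, not the patch's callback: struct agg and both
functions are hypothetical, and it refuses the 41st segment rather than
flushing after the 40th as the code below does.

```c
#include <stddef.h>

#define MAX_SEGS 40	/* the hard limit listed above */

/* Hypothetical aggregate of coalesced datagrams from one flow. */
struct agg {
	size_t seg_size;	/* size every merged segment must have */
	unsigned int count;	/* segments merged so far */
	size_t total;		/* total payload bytes */
};

/* Return 1 if the datagram was merged, 0 if the aggregate must be flushed. */
static int agg_try_merge(struct agg *a, size_t len)
{
	if (a->count == 0) {
		a->seg_size = len;
		a->count = 1;
		a->total = len;
		return 1;
	}
	if (len != a->seg_size)		/* no shorter last segment allowed */
		return 0;
	if (a->count >= MAX_SEGS)	/* hard break at 40 segments */
		return 0;
	a->count++;
	a->total += len;
	return 1;
}

/* Offer n datagrams of the same length; return how many were merged. */
static unsigned int agg_merge_n(struct agg *a, size_t len, unsigned int n)
{
	unsigned int merged = 0;

	while (n-- && agg_try_merge(a, len))
		merged++;
	return merged;
}
```

Allowing a final segment of unequal size would mean accepting len <
seg_size once and then forcing a flush, which the TODO comments in the
patch leave open.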

Signed-off-by: Willem de Bruijn 
---
 include/uapi/linux/udp.h |  1 +
 net/ipv4/udp.c   | 71 
 net/ipv4/udp_offload.c   | 11 +++
 3 files changed, 76 insertions(+), 7 deletions(-)

diff --git a/include/uapi/linux/udp.h b/include/uapi/linux/udp.h
index 09d00f8c442b..7fda3e8c7fcf 100644
--- a/include/uapi/linux/udp.h
+++ b/include/uapi/linux/udp.h
@@ -33,6 +33,7 @@ struct udphdr {
 #define UDP_NO_CHECK6_TX 101   /* Disable sending checksum for UDP6X */
 #define UDP_NO_CHECK6_RX 102   /* Disable accpeting checksum for UDP6 */
 #define UDP_SEGMENT103 /* Set GSO segmentation size */
+#define UDP_GRO104 /* Enable GRO */
 
 /* UDP encapsulation types */
 #define UDP_ENCAP_ESPINUDP_NON_IKE 1 /* draft-ietf-ipsec-nat-t-ike-00/01 */
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index bd873a5b8a86..ae49c08e6225 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2387,6 +2387,51 @@ void udp_destroy_sock(struct sock *sk)
}
 }
 
+static struct sk_buff *udp_gro_receive_cb(struct sock *sk,
+ struct list_head *head,
+ struct sk_buff *skb)
+{
+   struct sk_buff *p;
+   unsigned int off;
+
+   off = skb_gro_offset(skb) - sizeof(struct udphdr);
+
+   list_for_each_entry(p, head, list) {
+   if (!NAPI_GRO_CB(p)->same_flow)
+   continue;
+
+   /* TODO: for UDP_GRO: match size unless last segment */
+   if (NAPI_GRO_CB(p)->flush)
+   break;
+
+   /* TODO: look into ip id check */
+   if (skb_gro_receive(p, skb)) {
+   NAPI_GRO_CB(skb)->flush = 1;
+   break;
+   }
+
+   if (NAPI_GRO_CB(skb)->count >= 40) {
+   return p;
+   }
+
+   return NULL;
+   }
+
+   return NULL;
+}
+
+static int udp_gro_complete_cb(struct sock *sk, struct sk_buff *skb,
+  int nhoff)
+{
+   skb->csum_start = (unsigned char *)udp_hdr(skb) - skb->head;
+   skb->csum_offset = offsetof(struct udphdr, check);
+   skb->ip_summed = CHECKSUM_PARTIAL;
+
+   skb_shinfo(skb)->gso_segs = NAPI_GRO_CB(skb)->count;
+
+   return 0;
+}
+
 /*
  * Socket option code for UDP
  */
@@ -2450,6 +2495,32 @@ int udp_lib_setsockopt(struct sock *sk, int level, int optname,
up->gso_size = val;
break;
 
+   case UDP_GRO:
+   {
+   if (val < 0 || val > 1)
+   return -EINVAL;
+
+   lock_sock(sk);
+   if (val) {
+
+   if (!udp_sk(sk)->gro_receive) {
+   udp_sk(sk)->gro_complete = udp_gro_complete_cb;
+   udp_sk(sk)->gro_receive = udp_gro_receive_cb;
+   } else {
+   err = -EALREADY;
+   }
+   } else {
+   if (udp_sk(sk)->gro_receive) {
+   udp_sk(sk)->gro_receive = NULL;
+   udp_sk(sk)->gro_complete = NULL;
+   } else {
+   err = -ENOENT;
+   }
+   }
+   release_sock(sk);
+   break;
+   }
+
/*
 *  UDP-Lite's partial checksum coverage (RFC 3828).
 */
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index f44fe328aa0f..6dd3f0a28b5e 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -386,6 +386,8 @@ struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,
NAPI_GRO_CB(p)->same_flow = 0;
continue;
}
+
+   /* TODO: for UDP_GRO: match size */
}
 
skb_gro_pull(skb, sizeof(struct udphdr)); /* pull encapsulating udp header */
@@ -437,11 +439,6 @@ int udp_gro_complete(struct sk_buff *skb, int nhoff,
 
uh->len = newlen;
 
-   /* Set encapsulation before calling into inner gro_complete() functions
-* to make them set up the inner offsets.
-*/
-   skb->encapsulation = 1;
-
rcu_read_lock();
sk = (*lookup)(skb, uh->source, uh->dest);
if (sk && udp_sk(sk)->gro_complete)
@@ -462,11 +459,11 @@ static int udp4_gro_complete(struct sk_buff *skb, int nhoff)
struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);
 
if (uh->check) {
-   skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_TUNNEL_CSUM;
+

[PATCH net-next RFC 1/8] gro: convert device offloads to net_offload

2018-09-14 Thread Willem de Bruijn

From: Willem de Bruijn 

In preparation for making GRO receive configurable, have all offloads
share the same infrastructure.
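
As a rough userspace illustration of the table-based registration this
patch moves to, the sketch below uses C11 atomics in place of the kernel's
cmpxchg(). The struct, table, and function names are hypothetical. A slot
is claimed only if it is empty and released only by the entry that holds
it.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for struct net_offload. */
struct offload {
	uint16_t type;
};

typedef const struct offload *offload_ptr;

static _Atomic(offload_ptr) offload_tbl[256];

/* Same idea as net_offload_from_type(): index by the low byte. */
static uint8_t slot_from_type(uint16_t type)
{
	return type & 0xFF;
}

/* Claim the slot if it is empty; return -1 if something is already there. */
static int offload_add(const struct offload *o)
{
	offload_ptr expected = NULL;

	return atomic_compare_exchange_strong(
			&offload_tbl[slot_from_type(o->type)],
			&expected, o) ? 0 : -1;
}

/* Clear the slot only if it still holds this entry. */
static int offload_del(const struct offload *o)
{
	offload_ptr expected = o;

	return atomic_compare_exchange_strong(
			&offload_tbl[slot_from_type(o->type)],
			&expected, (offload_ptr)NULL) ? 0 : -1;
}
```

Unlike this sketch, dev_add_offload() in the patch stays void and ignores
the cmpxchg() result, relying on the stated assumption that no two
registered types share a low byte.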

Signed-off-by: Willem de Bruijn 
---
 include/linux/netdevice.h |  17 +-
 include/net/protocol.h|   7 ---
 net/core/dev.c| 105 +-
 3 files changed, 51 insertions(+), 78 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e2b3bd750c98..7425068fa249 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2366,13 +2366,18 @@ struct offload_callbacks {
int (*gro_complete)(struct sk_buff *skb, int nhoff);
 };
 
-struct packet_offload {
+struct net_offload {
__be16   type;  /* This is really htons(ether_type). */
u16  priority;
struct offload_callbacks callbacks;
-   struct list_head list;
+   unsigned int flags; /* Flags used by IPv6 for now */
 };
 
+#define packet_offload net_offload
+
+/* This should be set for any extension header which is compatible with GSO. */
+#define INET6_PROTO_GSO_EXTHDR 0x1
+
 /* often modified stats are per-CPU, other are shared (netdev->stats) */
 struct pcpu_sw_netstats {
u64 rx_packets;
@@ -3554,6 +3559,14 @@ gro_result_t napi_gro_frags(struct napi_struct *napi);
 struct packet_offload *gro_find_receive_by_type(__be16 type);
 struct packet_offload *gro_find_complete_by_type(__be16 type);
 
+static inline u8 net_offload_from_type(u16 type)
+{
+   /* Do not bother handling collisions. There are none.
+* If they do occur with new offloads, add a mapping function here.
+*/
+   return type & 0xFF;
+}
+
 static inline void napi_free_frags(struct napi_struct *napi)
 {
kfree_skb(napi->skb);
diff --git a/include/net/protocol.h b/include/net/protocol.h
index 4fc75f7ae23b..53a0322ee545 100644
--- a/include/net/protocol.h
+++ b/include/net/protocol.h
@@ -69,13 +69,6 @@ struct inet6_protocol {
 #define INET6_PROTO_FINAL  0x2
 #endif
 
-struct net_offload {
-   struct offload_callbacks callbacks;
-   unsigned int flags; /* Flags used by IPv6 for now */
-};
-/* This should be set for any extension header which is compatible with GSO. */
-#define INET6_PROTO_GSO_EXTHDR 0x1
-
 /* This is used to register socket interfaces for IP protocols.  */
 struct inet_protosw {
struct list_head list;
diff --git a/net/core/dev.c b/net/core/dev.c
index 0b2d777e5b9e..55f86b6d3182 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -154,7 +154,6 @@
 #define GRO_MAX_HEAD (MAX_HEADER + 128)
 
 static DEFINE_SPINLOCK(ptype_lock);
-static DEFINE_SPINLOCK(offload_lock);
 struct list_head ptype_base[PTYPE_HASH_SIZE] __read_mostly;
 struct list_head ptype_all __read_mostly;  /* Taps */
 static struct list_head offload_base __read_mostly;
@@ -467,6 +466,9 @@ void dev_remove_pack(struct packet_type *pt)
 EXPORT_SYMBOL(dev_remove_pack);
 
 
+const struct net_offload __rcu *dev_offloads[256] __read_mostly;
+EXPORT_SYMBOL(dev_offloads);
+
 /**
  * dev_add_offload - register offload handlers
  * @po: protocol offload declaration
@@ -481,15 +483,9 @@ EXPORT_SYMBOL(dev_remove_pack);
  */
 void dev_add_offload(struct packet_offload *po)
 {
-   struct packet_offload *elem;
-
-   spin_lock(&offload_lock);
-   list_for_each_entry(elem, &offload_base, list) {
-   if (po->priority < elem->priority)
-   break;
-   }
-   list_add_rcu(&po->list, elem->list.prev);
-   spin_unlock(&offload_lock);
+   cmpxchg((const struct net_offload **)
+   &dev_offloads[net_offload_from_type(po->type)],
+   NULL, po);
 }
 EXPORT_SYMBOL(dev_add_offload);
 
@@ -506,23 +502,11 @@ EXPORT_SYMBOL(dev_add_offload);
  * and must not be freed until after all the CPU's have gone
  * through a quiescent state.
  */
-static void __dev_remove_offload(struct packet_offload *po)
+static int __dev_remove_offload(struct packet_offload *po)
 {
-   struct list_head *head = &offload_base;
-   struct packet_offload *po1;
-
-   spin_lock(&offload_lock);
-
-   list_for_each_entry(po1, head, list) {
-   if (po == po1) {
-   list_del_rcu(&po->list);
-   goto out;
-   }
-   }
-
-   pr_warn("dev_remove_offload: %p not found\n", po);
-out:
-   spin_unlock(&offload_lock);
+   return (cmpxchg((const struct net_offload **)
+   &dev_offloads[net_offload_from_type(po->type)],
+  po, NULL) == po) ? 0 : -1;
 }
 
 /**
@@ -2962,7 +2946,7 @@ struct sk_buff *skb_mac_gso_segment(struct sk_buff *skb,
netdev_features_t features)
 {
struct sk_buff *segs = ERR_PTR(-EPROTONOSUPPORT);
-   struct packet_offload *ptype;
+   const struct net_offload *off;
int vlan_depth = skb->mac_len;
__be16 type = skb_network_protocol(skb,

[PATCH net-next RFC 3/8] gro: add net_gro_receive

2018-09-14 Thread Willem de Bruijn

From: Willem de Bruijn 

For configurable gro_receive all callsites need to be updated. Similar
to gro_complete, introduce a single shared helper, net_gro_receive.
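
A minimal userspace sketch of such a shared lookup helper follows. It
mirrors the checks of net_gro_receive() in the diff below (entry present,
callback set, type either unset or an exact match) but leaves out RCU. The
reduced struct and the names find_gro_receive and dummy_receive are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef int (*gro_receive_fn)(void *pkt);

/* Hypothetical, reduced stand-in for struct net_offload. */
struct offload {
	uint16_t type;			/* 0 means "no type check" */
	gro_receive_fn gro_receive;	/* NULL if the protocol has no GRO */
};

/*
 * One lookup for every caller: index by the low byte of the type, then
 * reject entries that lack a callback or that only collide on that byte.
 */
static const struct offload *
find_gro_receive(const struct offload *const *tbl, uint16_t type)
{
	const struct offload *off = tbl[type & 0xFF];

	if (off && off->gro_receive && (!off->type || off->type == type))
		return off;
	return NULL;
}

/* Placeholder callback so the table can hold a GRO-capable entry. */
static int dummy_receive(void *pkt)
{
	(void)pkt;
	return 0;
}
```

Callers then only test the returned pointer, as the converted call sites
in geneve, vlan and dev_gro_receive() do.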

Signed-off-by: Willem de Bruijn 
---
 drivers/net/geneve.c  |  2 +-
 include/linux/netdevice.h | 14 +-
 net/8021q/vlan.c  |  2 +-
 net/core/dev.c| 20 
 net/ethernet/eth.c|  2 +-
 net/ipv4/af_inet.c|  4 ++--
 net/ipv4/fou.c|  8 
 net/ipv4/gre_offload.c| 12 ++--
 net/ipv6/ip6_offload.c|  8 
 9 files changed, 36 insertions(+), 36 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index a3a4621d9bee..a812a774e5fd 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -467,7 +467,7 @@ static struct sk_buff *geneve_gro_receive(struct sock *sk,
type = gh->proto_type;
 
rcu_read_lock();
-   ptype = gro_find_receive_by_type(type);
+   ptype = net_gro_receive(dev_offloads, type);
if (!ptype)
goto out_unlock;
 
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0d292ea6716e..0be594f8d1ce 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3556,7 +3556,6 @@ gro_result_t napi_gro_receive(struct napi_struct *napi, struct sk_buff *skb);
 void napi_gro_flush(struct napi_struct *napi, bool flush_old);
 struct sk_buff *napi_get_frags(struct napi_struct *napi);
 gro_result_t napi_gro_frags(struct napi_struct *napi);
-struct packet_offload *gro_find_receive_by_type(__be16 type);
 
 extern const struct net_offload __rcu *dev_offloads[256];
 
@@ -3568,6 +3567,19 @@ static inline u8 net_offload_from_type(u16 type)
return type & 0xFF;
 }
 
+static inline const struct net_offload *
+net_gro_receive(const struct net_offload __rcu **offs, u16 type)
+{
+   const struct net_offload *off;
+
+   off = rcu_dereference(offs[net_offload_from_type(type)]);
+   if (off && off->callbacks.gro_receive &&
+   (!off->type || off->type == type))
+   return off;
+   else
+   return NULL;
+}
+
 static inline int net_gro_complete(const struct net_offload __rcu **offs,
   u16 type, struct sk_buff *skb, int nhoff)
 {
diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 6ac27aa9f158..a106c5373b1d 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -670,7 +670,7 @@ static struct sk_buff *vlan_gro_receive(struct list_head *head,
type = vhdr->h_vlan_encapsulated_proto;
 
rcu_read_lock();
-   ptype = gro_find_receive_by_type(type);
+   ptype = net_gro_receive(dev_offloads, type);
if (!ptype)
goto out_unlock;
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 2c21e507291f..ae5fbd4114d2 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5382,7 +5382,7 @@ static void gro_flush_oldest(struct list_head *head)
 static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff *skb)
 {
u32 hash = skb_get_hash_raw(skb) & (GRO_HASH_BUCKETS - 1);
-   const struct packet_offload *ptype;
+   const struct net_offload *ops;
__be16 type = skb->protocol;
struct list_head *gro_head;
struct sk_buff *pp = NULL;
@@ -5396,8 +5396,8 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
gro_head = gro_list_prepare(napi, skb);
 
rcu_read_lock();
-   ptype = dev_offloads[net_offload_from_type(type)];
-   if (ptype && ptype->callbacks.gro_receive) {
+   ops = net_gro_receive(dev_offloads, type);
+   if (ops) {
skb_set_network_header(skb, skb_gro_offset(skb));
skb_reset_mac_len(skb);
NAPI_GRO_CB(skb)->same_flow = 0;
@@ -5425,7 +5425,7 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
NAPI_GRO_CB(skb)->csum_valid = 0;
}
 
-   pp = ptype->callbacks.gro_receive(gro_head, skb);
+   pp = ops->callbacks.gro_receive(gro_head, skb);
rcu_read_unlock();
} else {
rcu_read_unlock();
@@ -5483,18 +5483,6 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
goto pull;
 }
 
-struct packet_offload *gro_find_receive_by_type(__be16 type)
-{
-   struct net_offload *off;
-
-   off = (struct net_offload *) rcu_dereference(dev_offloads[type & 0xFF]);
-   if (off && off->type == type && off->callbacks.gro_receive)
-   return off;
-   else
-   return NULL;
-}
-EXPORT_SYMBOL(gro_find_receive_by_type);
-
 static void napi_skb_free_stolen_head(struct sk_buff *skb)
 {
skb_dst_drop(skb);
diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index fb17a13722e8..542dbc2ec956 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -462,7 +462,7 @@ struct sk_buff *eth_gro_receive(struct

[PATCH net-next RFC 6/8] net: make gro configurable

2018-09-14 Thread Willem de Bruijn

From: Willem de Bruijn 

Add net_offload flag NET_OFF_FLAG_GRO_OFF. If set, a net_offload will
not be used for gro receive processing.

Also add sysctl helper proc_do_net_offload that toggles this flag and
register sysctls net.{core,ipv4,ipv6}.gro.
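
The sysctl path amounts to turning a 256-bit bitmap into a per-entry flag.
The sketch below models that step in userspace, assuming that a set bit
means "GRO disabled for this slot". The struct, helper names, and table
layout are hypothetical, and locking is omitted.

```c
#include <limits.h>
#include <stddef.h>

#define TBL_LEN		256
#define FLAG_GRO_OFF	0x2u	/* same value as NET_OFF_FLAG_GRO_OFF */
#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS	(TBL_LEN / BITS_PER_WORD)

/* Hypothetical, reduced stand-in for struct net_offload. */
struct offload {
	unsigned int flags;
};

static void bitmap_set(unsigned long *bm, unsigned int i)
{
	bm[i / BITS_PER_WORD] |= 1UL << (i % BITS_PER_WORD);
}

static int bitmap_test(const unsigned long *bm, unsigned int i)
{
	return (bm[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & 1UL;
}

/* Copy the administrator's bitmap into the per-entry flag. */
static void apply_gro_bitmap(struct offload **tbl, const unsigned long *bm)
{
	unsigned int i;

	for (i = 0; i < TBL_LEN; i++) {
		if (!tbl[i])
			continue;	/* nothing registered in this slot */
		if (bitmap_test(bm, i))
			tbl[i]->flags |= FLAG_GRO_OFF;
		else
			tbl[i]->flags &= ~FLAG_GRO_OFF;
	}
}

/* The receive path only needs this one test. */
static int gro_enabled(const struct offload *off)
{
	return off && !(off->flags & FLAG_GRO_OFF);
}
```

Keeping the decision in a flag on the entry means the receive path pays
for one extra bit test and never touches the bitmap itself.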

Signed-off-by: Willem de Bruijn 
---
 drivers/net/vxlan.c|  8 +
 include/linux/netdevice.h  |  7 -
 net/core/dev.c |  1 +
 net/core/sysctl_net_core.c | 60 ++
 net/ipv4/sysctl_net_ipv4.c |  7 +
 net/ipv6/ip6_offload.c | 10 +--
 net/ipv6/sysctl_net_ipv6.c |  8 +
 7 files changed, 97 insertions(+), 4 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index e5d236595206..8cb8e02c8ab6 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -572,6 +572,7 @@ static struct sk_buff *vxlan_gro_receive(struct sock *sk,
 struct list_head *head,
 struct sk_buff *skb)
 {
+   const struct net_offload *ops;
struct sk_buff *pp = NULL;
struct sk_buff *p;
struct vxlanhdr *vh, *vh2;
@@ -606,6 +607,12 @@ static struct sk_buff *vxlan_gro_receive(struct sock *sk,
goto out;
}
 
+   rcu_read_lock();
+   ops = net_gro_receive(dev_offloads, ETH_P_TEB);
+   rcu_read_unlock();
+   if (!ops)
+   goto out;
+
skb_gro_pull(skb, sizeof(struct vxlanhdr)); /* pull vxlan header */
 
list_for_each_entry(p, head, list) {
@@ -621,6 +628,7 @@ static struct sk_buff *vxlan_gro_receive(struct sock *sk,
}
 
pp = call_gro_receive(eth_gro_receive, head, skb);
+
flush = 0;
 
 out:
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index b9e671887fc2..93e8c9ade593 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2377,6 +2377,10 @@ struct net_offload {
 
 /* This should be set for any extension header which is compatible with GSO. */
 #define INET6_PROTO_GSO_EXTHDR 0x1
+#define NET_OFF_FLAG_GRO_OFF   0x2
+
+int proc_do_net_offload(struct ctl_table *ctl, int write, void __user *buffer,
+   size_t *lenp, loff_t *ppos);
 
 /* often modified stats are per-CPU, other are shared (netdev->stats) */
 struct pcpu_sw_netstats {
@@ -3583,7 +3587,8 @@ net_gro_receive(struct net_offload __rcu **offs, u16 type)
 
off = rcu_dereference(offs[net_offload_from_type(type)]);
if (off && off->callbacks.gro_receive &&
-   (!off->type || off->type == type))
+   (!off->type || off->type == type) &&
+   !(off->flags & NET_OFF_FLAG_GRO_OFF))
return off;
else
return NULL;
diff --git a/net/core/dev.c b/net/core/dev.c
index 20d9552afd38..0fd5273bc931 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -154,6 +154,7 @@
 #define GRO_MAX_HEAD (MAX_HEADER + 128)
 
 static DEFINE_SPINLOCK(ptype_lock);
+DEFINE_SPINLOCK(offload_lock);
 struct list_head ptype_base[PTYPE_HASH_SIZE] __read_mostly;
 struct list_head ptype_all __read_mostly;  /* Taps */
 static struct list_head offload_base __read_mostly;
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index b1a2c5e38530..d2d72afdd9eb 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -34,6 +35,58 @@ static int net_msg_warn; /* Unused, but still a sysctl */
 int sysctl_fb_tunnels_only_for_init_net __read_mostly = 0;
 EXPORT_SYMBOL(sysctl_fb_tunnels_only_for_init_net);
 
+extern spinlock_t offload_lock;
+
+#define NET_OFF_TBL_LEN256
+
+int proc_do_net_offload(struct ctl_table *ctl, int write, void __user *buffer,
+   size_t *lenp, loff_t *ppos)
+{
+   unsigned long bitmap[NET_OFF_TBL_LEN / (sizeof(unsigned long) << 3)];
+   struct ctl_table tbl = { .maxlen = NET_OFF_TBL_LEN, .data = bitmap };
+   unsigned long flag = (unsigned long) ctl->extra2;
+   struct net_offload __rcu **offs = ctl->extra1;
+   struct net_offload *off;
+   int i, ret;
+
+   memset(bitmap, 0, sizeof(bitmap));
+
+   spin_lock(&offload_lock);
+
+   for (i = 0; i < tbl.maxlen; i++) {
+   off = rcu_dereference_protected(offs[i], lockdep_is_held(&offload_lock));
+   if (off && off->flags & flag) {
+   /* flag specific constraints */
+   if (flag == NET_OFF_FLAG_GRO_OFF) {
+   /* gro disable bit: only if can gro */
+   if (!off->callbacks.gro_receive &&
+   !(off->flags & INET6_PROTO_GSO_EXTHDR))
+   continue;
+   }
+   set_bit(i, bitmap);
+   }
+   }
+
+   ret = proc_do_large_bitmap(&tbl, write, buffer, lenp, ppos);
+
+   if (write && !ret) {
+   for

[PATCH net-next RFC 7/8] udp: gro behind static key

2018-09-14 Thread Willem de Bruijn

From: Willem de Bruijn 

Avoid the socket lookup cost in udp_gro_receive if no socket has a
gro callback configured.
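
The effect can be modelled in userspace with a plain flag guarding the
expensive step. A real static key patches the branch in the instruction
stream, which this sketch does not attempt. All names here are
hypothetical, and port 4789 stands in for "a socket with a gro callback".

```c
#include <stdbool.h>

static bool gro_sockets_present;	/* stands in for the static key */
static unsigned int lookups;		/* counts expensive lookups performed */

/* Called when the first socket installs a gro callback. */
static void gro_enable(void)
{
	gro_sockets_present = true;
}

/* Stand-in for the per-packet socket lookup we want to avoid. */
static int expensive_socket_lookup(int port)
{
	lookups++;
	return port == 4789;
}

/* Return 1 if the packet would be handed to a gro callback, 0 if flushed. */
static int udp_gro_try(int port)
{
	if (!gro_sockets_present)
		return 0;	/* common case: skip the lookup entirely */
	return expensive_socket_lookup(port);
}
```

Until gro_enable() has run, udp_gro_try() never reaches the lookup, which
is the saving the commit message describes.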

Signed-off-by: Willem de Bruijn 
---
 include/net/udp.h  | 2 ++
 net/ipv4/udp.c | 2 +-
 net/ipv4/udp_offload.c | 2 +-
 net/ipv6/udp.c | 2 +-
 net/ipv6/udp_offload.c | 2 +-
 5 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/net/udp.h b/include/net/udp.h
index 8482a990b0bb..9e82cb391dea 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -443,8 +443,10 @@ int udpv4_offload_init(void);
 
 void udp_init(void);
 
+DECLARE_STATIC_KEY_FALSE(udp_encap_needed_key);
 void udp_encap_enable(void);
 #if IS_ENABLED(CONFIG_IPV6)
+DECLARE_STATIC_KEY_FALSE(udpv6_encap_needed_key);
 void udpv6_encap_enable(void);
 #endif
 
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index f4e35b2ff8b8..bd873a5b8a86 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1889,7 +1889,7 @@ static int __udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
return 0;
 }
 
-static DEFINE_STATIC_KEY_FALSE(udp_encap_needed_key);
+DEFINE_STATIC_KEY_FALSE(udp_encap_needed_key);
 void udp_encap_enable(void)
 {
static_branch_enable(&udp_encap_needed_key);
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 4f6aa95a9b12..f44fe328aa0f 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -405,7 +405,7 @@ static struct sk_buff *udp4_gro_receive(struct list_head *head,
 {
struct udphdr *uh = udp_gro_udphdr(skb);
 
-   if (unlikely(!uh))
+   if (unlikely(!uh) || !static_branch_unlikely(&udp_encap_needed_key))
goto flush;
 
/* Don't bother verifying checksum if we're going to flush anyway. */
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 83f4c77c79d8..d84672959f10 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -548,7 +548,7 @@ static __inline__ void udpv6_err(struct sk_buff *skb,
__udp6_lib_err(skb, opt, type, code, offset, info, &udp_table);
 }
 
-static DEFINE_STATIC_KEY_FALSE(udpv6_encap_needed_key);
+DEFINE_STATIC_KEY_FALSE(udpv6_encap_needed_key);
 void udpv6_encap_enable(void)
 {
static_branch_enable(&udpv6_encap_needed_key);
diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index 2a41da0dd33f..e00f19c4a939 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -119,7 +119,7 @@ static struct sk_buff *udp6_gro_receive(struct list_head *head,
 {
struct udphdr *uh = udp_gro_udphdr(skb);
 
-   if (unlikely(!uh))
+   if (unlikely(!uh) || !static_branch_unlikely(&udpv6_encap_needed_key))
goto flush;
 
/* Don't bother verifying checksum if we're going to flush anyway. */
-- 
2.19.0.397.gdd90340f6a-goog

[PATCH net-next RFC 4/8] ipv6: remove offload exception for hopopts

2018-09-14 Thread Willem de Bruijn

From: Willem de Bruijn 

Extension headers in ipv6 are pulled without calling a callback
function. An inet6_offload signals this feature with flag
INET6_PROTO_GSO_EXTHDR.

Add net_offload_has_flag helper to hide implementation details and in
preparation for configurable gro.

Convert NEXTHDR_HOP from a special case branch to a standard
extension header offload.
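
The flag-driven walk can be sketched in userspace as follows. The function
name and table are hypothetical and bounds checks are added for safety.
The protocol numbers are the standard IPv6 ones, and each header's length
is (hdrlen + 1) * 8 bytes, as computed by ipv6_optlen(). Hop-by-hop is just
another flagged table entry, with no special-case branch.

```c
#include <stddef.h>
#include <stdint.h>

#define FLAG_EXTHDR	0x1u	/* same value as INET6_PROTO_GSO_EXTHDR */

#define PROTO_HOPOPTS	0
#define PROTO_TCP	6
#define PROTO_ROUTING	43
#define PROTO_DSTOPTS	60

/* Per-protocol flags; hop-by-hop is an ordinary entry like the others. */
static const unsigned int proto_flags[256] = {
	[PROTO_HOPOPTS] = FLAG_EXTHDR,
	[PROTO_ROUTING] = FLAG_EXTHDR,
	[PROTO_DSTOPTS] = FLAG_EXTHDR,
};

/*
 * Walk the extension headers in buf, starting with protocol 'first'.
 * Byte 0 of each header is the next protocol, byte 1 the length in
 * 8-octet units not counting the first 8 octets.
 * Returns the total extension header length and stores the first
 * non-extension protocol in *pproto, or returns -1 if buf is too short.
 */
static int exthdrs_len(const uint8_t *buf, size_t buflen, uint8_t first,
		       uint8_t *pproto)
{
	size_t off = 0;
	uint8_t proto = first;

	while (proto_flags[proto] & FLAG_EXTHDR) {
		size_t optlen;

		if (buflen - off < 8)
			return -1;
		optlen = ((size_t)buf[off + 1] + 1) << 3;
		if (buflen - off < optlen)
			return -1;
		proto = buf[off];
		off += optlen;
	}

	*pproto = proto;
	return (int)off;
}
```

Returning the final protocol through pproto matches the reworked
ipv6_exthdrs_len() signature in the diff below, where the caller looks up
the transport offload itself.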

Signed-off-by: Willem de Bruijn 
---
 include/linux/netdevice.h  |  9 +
 net/ipv6/exthdrs_offload.c | 17 ++---
 net/ipv6/ip6_offload.c | 36 +---
 3 files changed, 36 insertions(+), 26 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0be594f8d1ce..1c97a048506f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3567,6 +3567,15 @@ static inline u8 net_offload_from_type(u16 type)
return type & 0xFF;
 }
 
+static inline bool net_offload_has_flag(const struct net_offload __rcu **offs,
+   u16 type, u16 flag)
+{
+   const struct net_offload *off;
+
+   off = offs ? rcu_dereference(offs[net_offload_from_type(type)]) : NULL;
+   return off && off->flags & flag;
+}
+
 static inline const struct net_offload *
 net_gro_receive(const struct net_offload __rcu **offs, u16 type)
 {
diff --git a/net/ipv6/exthdrs_offload.c b/net/ipv6/exthdrs_offload.c
index f5e2ba1c18bf..2230331c6012 100644
--- a/net/ipv6/exthdrs_offload.c
+++ b/net/ipv6/exthdrs_offload.c
@@ -12,11 +12,15 @@
 #include 
 #include "ip6_offload.h"
 
-static const struct net_offload rthdr_offload = {
+static struct net_offload hophdr_offload = {
.flags  =   INET6_PROTO_GSO_EXTHDR,
 };
 
-static const struct net_offload dstopt_offload = {
+static struct net_offload rthdr_offload = {
+   .flags  =   INET6_PROTO_GSO_EXTHDR,
+};
+
+static struct net_offload dstopt_offload = {
.flags  =   INET6_PROTO_GSO_EXTHDR,
 };
 
@@ -24,10 +28,14 @@ int __init ipv6_exthdrs_offload_init(void)
 {
int ret;
 
-   ret = inet6_add_offload(&rthdr_offload, IPPROTO_ROUTING);
+   ret = inet6_add_offload(&hophdr_offload, IPPROTO_HOPOPTS);
if (ret)
goto out;
 
+   ret = inet6_add_offload(&rthdr_offload, IPPROTO_ROUTING);
+   if (ret)
+   goto out_hop;
+
ret = inet6_add_offload(&dstopt_offload, IPPROTO_DSTOPTS);
if (ret)
goto out_rt;
@@ -37,5 +45,8 @@ int __init ipv6_exthdrs_offload_init(void)
 
 out_rt:
inet6_del_offload(&rthdr_offload, IPPROTO_ROUTING);
+
+out_hop:
+   inet6_del_offload(&hophdr_offload, IPPROTO_HOPOPTS);
goto out;
 }
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index 9d301bef0e23..4854509a2c5d 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -22,21 +22,13 @@
 
 static int ipv6_gso_pull_exthdrs(struct sk_buff *skb, int proto)
 {
-   const struct net_offload *ops = NULL;
-
for (;;) {
struct ipv6_opt_hdr *opth;
int len;
 
-   if (proto != NEXTHDR_HOP) {
-   ops = rcu_dereference(inet6_offloads[proto]);
-
-   if (unlikely(!ops))
-   break;
-
-   if (!(ops->flags & INET6_PROTO_GSO_EXTHDR))
-   break;
-   }
+   if (!net_offload_has_flag(inet6_offloads, proto,
+ INET6_PROTO_GSO_EXTHDR))
+   break;
 
if (unlikely(!pskb_may_pull(skb, 8)))
break;
@@ -141,26 +133,24 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff 
*skb,
 /* Return the total length of all the extension hdrs, following the same
  * logic in ipv6_gso_pull_exthdrs() when parsing ext-hdrs.
  */
-static int ipv6_exthdrs_len(struct ipv6hdr *iph,
-   const struct net_offload **opps)
+static int ipv6_exthdrs_len(struct ipv6hdr *iph, u8 *pproto)
 {
struct ipv6_opt_hdr *opth = (void *)iph;
int len = 0, proto, optlen = sizeof(*iph);
 
proto = iph->nexthdr;
for (;;) {
-   if (proto != NEXTHDR_HOP) {
-   *opps = rcu_dereference(inet6_offloads[proto]);
-   if (unlikely(!(*opps)))
-   break;
-   if (!((*opps)->flags & INET6_PROTO_GSO_EXTHDR))
-   break;
-   }
+   if (!net_offload_has_flag(inet6_offloads, proto,
+ INET6_PROTO_GSO_EXTHDR))
+   break;
+
opth = (void *)opth + optlen;
optlen = ipv6_optlen(opth);
len += optlen;
proto = opth->nexthdr;
}
+
+   *pproto = proto;
return len;
 }
 
@@ -296,8 +286,8 @@ static struct sk_buff *ip4ip6_gro_receive(struct list_head 
*head,
 
 static int

[PATCH net-next RFC 0/8] udp and configurable gro

2018-09-14 Thread Willem de Bruijn

From: Willem de Bruijn 

This is a *very rough* draft. Mainly for discussion while we also
look at another partially overlapping approach [1].

Reduce UDP receive cost for bulk traffic by enabling datagram
coalescing with GRO.

Before adding more GRO callbacks, make GRO configurable by the
administrator to optionally reduce the attack surface of this
early receive path. See also [2].

Introduce sysctls net.(core|ipv4|ipv6).gro that expose the table of
protocols for which GRO is supported. Allow the administrator to disable
individual entries in the table.

To have a single infrastructure, convert dev_offloads to the
table-based approach already used by inet(6)_offloads. A small additional
benefit is that ipv6 no longer takes two list lookups to find.

Patch 1 converts dev_offloads to the infra of inet(6)_offloads
Patch 2 deduplicates gro_complete logic now that all share infra
Patch 3 does the same for gro_receive, in anticipation of adding
a branch to check whether gro_receive is enabled
Patch 4 harmonizes ipv6 header opts, so that those, too can be
optionally disabled.
Patch 5 makes inet(6)_offloads non-const to allow disabling a flag
Patch 6 introduces the administrative sysctl
Patch 7 avoids udp gro cost if no udp gro callback is registered
Patch 8 introduces udp gro

[1] http://patchwork.ozlabs.org/project/netdev/list/?series=65741
[2] http://vger.kernel.org/netconf2017_files/rx_hardening_and_udp_gso.pdf

Willem de Bruijn (8):
  gro: convert device offloads to net_offload
  gro: deduplicate gro_complete
  gro: add net_gro_receive
  ipv6: remove offload exception for hopopts
  net: deconstify net_offload
  net: make gro configurable
  udp: gro behind static key
  udp: add gro

 drivers/net/geneve.c   |  11 +---
 drivers/net/vxlan.c|   8 +++
 include/linux/netdevice.h  |  64 +++--
 include/net/protocol.h |  19 ++-
 include/net/udp.h  |   2 +
 include/uapi/linux/udp.h   |   1 +
 net/8021q/vlan.c   |  12 +---
 net/core/dev.c | 112 -
 net/core/sysctl_net_core.c |  60 
 net/ethernet/eth.c |  13 +
 net/ipv4/af_inet.c |  21 ++-
 net/ipv4/esp4_offload.c|   2 +-
 net/ipv4/fou.c |  41 --
 net/ipv4/gre_offload.c |  26 -
 net/ipv4/protocol.c|  10 ++--
 net/ipv4/sysctl_net_ipv4.c |   7 +++
 net/ipv4/tcp_offload.c |   2 +-
 net/ipv4/udp.c |  73 +++-
 net/ipv4/udp_offload.c |  19 +++
 net/ipv6/esp6_offload.c|   2 +-
 net/ipv6/exthdrs_offload.c |  17 +-
 net/ipv6/ip6_offload.c |  69 +--
 net/ipv6/protocol.c|  10 ++--
 net/ipv6/sysctl_net_ipv6.c |   8 +++
 net/ipv6/tcpv6_offload.c   |   2 +-
 net/ipv6/udp.c |   2 +-
 net/ipv6/udp_offload.c |   4 +-
 net/sctp/offload.c |   2 +-
 28 files changed, 344 insertions(+), 275 deletions(-)

-- 
2.19.0.397.gdd90340f6a-goog

[PATCH net-next RFC 2/8] gro: deduplicate gro_complete

2018-09-14 Thread Willem de Bruijn

From: Willem de Bruijn 

The gro completion datapath is open coded for all protocols.
Deduplicate with new helper function net_gro_complete.

Signed-off-by: Willem de Bruijn 
---
 drivers/net/geneve.c  |  9 +
 include/linux/netdevice.h | 19 ++-
 net/8021q/vlan.c  | 10 +-
 net/core/dev.c| 24 +---
 net/ethernet/eth.c| 11 +--
 net/ipv4/af_inet.c| 15 ++-
 net/ipv4/fou.c| 25 +++--
 net/ipv4/gre_offload.c| 12 +++-
 net/ipv6/ip6_offload.c| 13 +
 9 files changed, 31 insertions(+), 107 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 6625fabe2c88..a3a4621d9bee 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -488,7 +488,6 @@ static int geneve_gro_complete(struct sock *sk, struct 
sk_buff *skb,
   int nhoff)
 {
struct genevehdr *gh;
-   struct packet_offload *ptype;
__be16 type;
int gh_len;
int err = -ENOSYS;
@@ -497,13 +496,7 @@ static int geneve_gro_complete(struct sock *sk, struct 
sk_buff *skb,
gh_len = geneve_hlen(gh);
type = gh->proto_type;
 
-   rcu_read_lock();
-   ptype = gro_find_complete_by_type(type);
-   if (ptype)
-   err = ptype->callbacks.gro_complete(skb, nhoff + gh_len);
-
-   rcu_read_unlock();
-
+   err = net_gro_complete(dev_offloads, type, skb, nhoff + gh_len);
skb_set_inner_mac_header(skb, nhoff + gh_len);
 
return err;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 7425068fa249..0d292ea6716e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3557,7 +3557,8 @@ void napi_gro_flush(struct napi_struct *napi, bool 
flush_old);
 struct sk_buff *napi_get_frags(struct napi_struct *napi);
 gro_result_t napi_gro_frags(struct napi_struct *napi);
 struct packet_offload *gro_find_receive_by_type(__be16 type);
-struct packet_offload *gro_find_complete_by_type(__be16 type);
+
+extern const struct net_offload __rcu *dev_offloads[256];
 
 static inline u8 net_offload_from_type(u16 type)
 {
@@ -3567,6 +3568,22 @@ static inline u8 net_offload_from_type(u16 type)
return type & 0xFF;
 }
 
+static inline int net_gro_complete(const struct net_offload __rcu **offs,
+  u16 type, struct sk_buff *skb, int nhoff)
+{
+   const struct net_offload *off;
+   int ret = -ENOENT;
+
+   rcu_read_lock();
+   off = rcu_dereference(offs[net_offload_from_type(type)]);
+   if (off && off->callbacks.gro_complete &&
+   (!off->type || off->type == type))
+   ret = off->callbacks.gro_complete(skb, nhoff);
+   rcu_read_unlock();
+
+   return ret;
+}
+
 static inline void napi_free_frags(struct napi_struct *napi)
 {
kfree_skb(napi->skb);
diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 5e9950453955..6ac27aa9f158 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -703,16 +703,8 @@ static int vlan_gro_complete(struct sk_buff *skb, int 
nhoff)
 {
struct vlan_hdr *vhdr = (struct vlan_hdr *)(skb->data + nhoff);
__be16 type = vhdr->h_vlan_encapsulated_proto;
-   struct packet_offload *ptype;
-   int err = -ENOENT;
 
-   rcu_read_lock();
-   ptype = gro_find_complete_by_type(type);
-   if (ptype)
-   err = ptype->callbacks.gro_complete(skb, nhoff + sizeof(*vhdr));
-
-   rcu_read_unlock();
-   return err;
+   return net_gro_complete(dev_offloads, type, skb, nhoff + sizeof(*vhdr));
 }
 
 static struct packet_offload vlan_packet_offloads[] __read_mostly = {
diff --git a/net/core/dev.c b/net/core/dev.c
index 55f86b6d3182..2c21e507291f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5235,10 +5235,6 @@ static void flush_all_backlogs(void)
 
 static int napi_gro_complete(struct sk_buff *skb)
 {
-   const struct packet_offload *ptype;
-   __be16 type = skb->protocol;
-   int err = -ENOENT;
-
BUILD_BUG_ON(sizeof(struct napi_gro_cb) > sizeof(skb->cb));
 
if (NAPI_GRO_CB(skb)->count == 1) {
@@ -5246,13 +5242,7 @@ static int napi_gro_complete(struct sk_buff *skb)
goto out;
}
 
-   rcu_read_lock();
-   ptype = dev_offloads[net_offload_from_type(type)];
-   if (ptype && ptype->callbacks.gro_complete)
-   err = ptype->callbacks.gro_complete(skb, 0);
-   rcu_read_unlock();
-
-   if (err) {
+   if (net_gro_complete(dev_offloads, skb->protocol, skb, 0)) {
kfree_skb(skb);
return NET_RX_SUCCESS;
}
@@ -5505,18 +5495,6 @@ struct packet_offload *gro_find_receive_by_type(__be16 
type)
 }
 EXPORT_SYMBOL(gro_find_receive_by_type);
 
-struct packet_offload *gro_find_complete_by_type(__be16 type)
-{
-   struct net_offload *off;
-
-   off = (struct net_offload *)

[PATCH net-next RFC 5/8] net: deconstify net_offload

2018-09-14 Thread Willem de Bruijn

From: Willem de Bruijn 

With configurable gro, the flags field in net_offloads may be changed.

Remove the const keyword. This is a noop otherwise.

Signed-off-by: Willem de Bruijn 
---
 include/linux/netdevice.h | 14 +++---
 include/net/protocol.h| 12 ++--
 net/core/dev.c|  8 +++-
 net/ipv4/af_inet.c|  2 +-
 net/ipv4/esp4_offload.c   |  2 +-
 net/ipv4/fou.c|  8 
 net/ipv4/gre_offload.c|  2 +-
 net/ipv4/protocol.c   | 10 +-
 net/ipv4/tcp_offload.c|  2 +-
 net/ipv4/udp_offload.c|  6 +++---
 net/ipv6/esp6_offload.c   |  2 +-
 net/ipv6/ip6_offload.c|  6 +++---
 net/ipv6/protocol.c   | 10 +-
 net/ipv6/tcpv6_offload.c  |  2 +-
 net/ipv6/udp_offload.c|  2 +-
 net/sctp/offload.c|  2 +-
 16 files changed, 44 insertions(+), 46 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 1c97a048506f..b9e671887fc2 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3557,7 +3557,7 @@ void napi_gro_flush(struct napi_struct *napi, bool 
flush_old);
 struct sk_buff *napi_get_frags(struct napi_struct *napi);
 gro_result_t napi_gro_frags(struct napi_struct *napi);
 
-extern const struct net_offload __rcu *dev_offloads[256];
+extern struct net_offload __rcu *dev_offloads[256];
 
 static inline u8 net_offload_from_type(u16 type)
 {
@@ -3567,19 +3567,19 @@ static inline u8 net_offload_from_type(u16 type)
return type & 0xFF;
 }
 
-static inline bool net_offload_has_flag(const struct net_offload __rcu **offs,
+static inline bool net_offload_has_flag(struct net_offload __rcu **offs,
u16 type, u16 flag)
 {
-   const struct net_offload *off;
+   struct net_offload *off;
 
off = offs ? rcu_dereference(offs[net_offload_from_type(type)]) : NULL;
return off && off->flags & flag;
 }
 
 static inline const struct net_offload *
-net_gro_receive(const struct net_offload __rcu **offs, u16 type)
+net_gro_receive(struct net_offload __rcu **offs, u16 type)
 {
-   const struct net_offload *off;
+   struct net_offload *off;
 
off = rcu_dereference(offs[net_offload_from_type(type)]);
if (off && off->callbacks.gro_receive &&
@@ -3589,10 +3589,10 @@ net_gro_receive(const struct net_offload __rcu **offs, 
u16 type)
return NULL;
 }
 
-static inline int net_gro_complete(const struct net_offload __rcu **offs,
+static inline int net_gro_complete(struct net_offload __rcu **offs,
   u16 type, struct sk_buff *skb, int nhoff)
 {
-   const struct net_offload *off;
+   struct net_offload *off;
int ret = -ENOENT;
 
rcu_read_lock();
diff --git a/include/net/protocol.h b/include/net/protocol.h
index 53a0322ee545..5e2c20b662d1 100644
--- a/include/net/protocol.h
+++ b/include/net/protocol.h
@@ -87,8 +87,8 @@ struct inet_protosw {
 #define INET_PROTOSW_ICSK  0x04  /* Is this an inet_connection_sock? */
 
 extern struct net_protocol __rcu *inet_protos[MAX_INET_PROTOS];
-extern const struct net_offload __rcu *inet_offloads[MAX_INET_PROTOS];
-extern const struct net_offload __rcu *inet6_offloads[MAX_INET_PROTOS];
+extern struct net_offload __rcu *inet_offloads[MAX_INET_PROTOS];
+extern struct net_offload __rcu *inet6_offloads[MAX_INET_PROTOS];
 
 #if IS_ENABLED(CONFIG_IPV6)
 extern struct inet6_protocol __rcu *inet6_protos[MAX_INET_PROTOS];
@@ -96,8 +96,8 @@ extern struct inet6_protocol __rcu 
*inet6_protos[MAX_INET_PROTOS];
 
 int inet_add_protocol(const struct net_protocol *prot, unsigned char num);
 int inet_del_protocol(const struct net_protocol *prot, unsigned char num);
-int inet_add_offload(const struct net_offload *prot, unsigned char num);
-int inet_del_offload(const struct net_offload *prot, unsigned char num);
+int inet_add_offload(struct net_offload *prot, unsigned char num);
+int inet_del_offload(struct net_offload *prot, unsigned char num);
 void inet_register_protosw(struct inet_protosw *p);
 void inet_unregister_protosw(struct inet_protosw *p);
 
@@ -107,7 +107,7 @@ int inet6_del_protocol(const struct inet6_protocol *prot, 
unsigned char num);
 int inet6_register_protosw(struct inet_protosw *p);
 void inet6_unregister_protosw(struct inet_protosw *p);
 #endif
-int inet6_add_offload(const struct net_offload *prot, unsigned char num);
-int inet6_del_offload(const struct net_offload *prot, unsigned char num);
+int inet6_add_offload(struct net_offload *prot, unsigned char num);
+int inet6_del_offload(struct net_offload *prot, unsigned char num);
 
 #endif /* _PROTOCOL_H */
diff --git a/net/core/dev.c b/net/core/dev.c
index ae5fbd4114d2..20d9552afd38 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -466,7 +466,7 @@ void dev_remove_pack(struct packet_type *pt)
 EXPORT_SYMBOL(dev_remove_pack);
 
 
-const struct net_offload __rcu *dev_offloads[256] __read_mostly;
+struct net_offload __rcu *dev_offloads[256] __read_mostly;

Re: [PATCH] net: caif: remove redundant null check on frontpkt

2018-09-14 Thread Sergei Shtylyov

Hello!

On 09/14/2018 08:19 PM, Colin King wrote:

> From: Colin Ian King 
> 
> It is impossible for frontpkt to be null at the point of the null
> check because it has been assigned from rearpkt and there is no
> way realpkt can be null at the point of the assignment because

   rearpkt?

> of the sanity checking and exit paths taken previously. Remove
> the redundant null check.
> 
> Detected by CoverityScan, CID#114434 ("Logically dead code")
> 
> Signed-off-by: Colin Ian King 
[...]

MBR, Sergei

Re: [PATCH iproute2] libnetlink: fix leak and using unused memory on error

2018-09-14 Thread महेश बंडेवार

On Thu, Sep 13, 2018 at 12:33 PM, Stephen Hemminger
 wrote:
> If an error happens in multi-segment message (tc only)
> then report the error and stop processing further responses.
> This also fixes referring to the buffer after free.
>
> The sequence check is not necessary here because the
> response message has already been validated to be in
> the window of the sequence number of the iov.
>
> Reported-by: Mahesh Bandewar 
> Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic")
> Signed-off-by: Stephen Hemminger 
Acked-by: Mahesh Bandewar 
> ---
>  lib/libnetlink.c | 23 +--
>  1 file changed, 9 insertions(+), 14 deletions(-)
>
> diff --git a/lib/libnetlink.c b/lib/libnetlink.c
> index 928de1dd16d8..586809292594 100644
> --- a/lib/libnetlink.c
> +++ b/lib/libnetlink.c
> @@ -617,7 +617,6 @@ static int __rtnl_talk_iov(struct rtnl_handle *rtnl, 
> struct iovec *iov,
> msg.msg_iovlen = 1;
> i = 0;
> while (1) {
> -next:
> status = rtnl_recvmsg(rtnl->fd, &msg, &buf);
> ++i;
>
> @@ -660,27 +659,23 @@ next:
>
> if (l < sizeof(struct nlmsgerr)) {
> fprintf(stderr, "ERROR truncated\n");
> -   } else if (!err->error) {
> +   free(buf);
> +   return -1;
> +   }
> +
> +   if (!err->error)
> /* check messages from kernel */
> nl_dump_ext_ack(h, errfn);
>
> -   if (answer)
> -   *answer = (struct nlmsghdr 
> *)buf;
> -   else
> -   free(buf);
> -   if (h->nlmsg_seq == seq)
> -   return 0;
> -   else if (i < iovlen)
> -   goto next;
> -   return 0;
> -   }
> -
> if (rtnl->proto != NETLINK_SOCK_DIAG &&
> show_rtnl_err)
> rtnl_talk_error(h, err, errfn);
>
> errno = -err->error;
> -   free(buf);
> +   if (answer)
> +   *answer = (struct nlmsghdr *)buf;
> +   else
> +   free(buf);
> return -i;
> }
>
> --
> 2.18.0
>

[PATCH] net: caif: remove redundant null check on frontpkt

2018-09-14 Thread Colin King

From: Colin Ian King 

It is impossible for frontpkt to be null at the point of the null
check because it has been assigned from rearpkt and there is no
way realpkt can be null at the point of the assignment because
of the sanity checking and exit paths taken previously. Remove
the redundant null check.

Detected by CoverityScan, CID#114434 ("Logically dead code")

Signed-off-by: Colin Ian King 
---
 net/caif/cfrfml.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/net/caif/cfrfml.c b/net/caif/cfrfml.c
index b82440e1fcb4..a931a71ef6df 100644
--- a/net/caif/cfrfml.c
+++ b/net/caif/cfrfml.c
@@ -264,9 +264,6 @@ static int cfrfml_transmit(struct cflayer *layr, struct 
cfpkt *pkt)
frontpkt = rearpkt;
rearpkt = NULL;
 
-   err = -ENOMEM;
-   if (frontpkt == NULL)
-   goto out;
err = -EPROTO;
if (cfpkt_add_head(frontpkt, head, 6) < 0)
goto out;
-- 
2.17.1

Re: [RFC PATCH 2/4] net: enable UDP gro on demand.

2018-09-14 Thread Willem de Bruijn

On Fri, Sep 14, 2018 at 11:47 AM Paolo Abeni  wrote:
>
> Currently, the UDP GRO callback is always invoked, regardless of
> the existence of any actual user (e.g. a UDP tunnel). With retpoline
> enabled, this causes measurable overhead.
>
> This changeset introduces explicit accounting of the sockets requiring
> UDP GRO and updates the UDP offloads at runtime accordingly, so that
> the GRO callback is present (and invoked) only when there is at least
> one socket requiring it.

I have a different solution both to the UDP socket lookup avoidance
and configurable GRO in general.

The first can be achieved by exporting the udp_encap_needed_key static key:

"
diff --git a/include/net/udp.h b/include/net/udp.h
index 8482a990b0bb..9e82cb391dea 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -443,8 +443,10 @@ int udpv4_offload_init(void);

 void udp_init(void);

+DECLARE_STATIC_KEY_FALSE(udp_encap_needed_key);
 void udp_encap_enable(void);

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index f4e35b2ff8b8..bd873a5b8a86 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1889,7 +1889,7 @@ static int __udp_queue_rcv_skb(struct sock *sk,
struct sk_buff *skb)
return 0;
 }

-static DEFINE_STATIC_KEY_FALSE(udp_encap_needed_key);
+DEFINE_STATIC_KEY_FALSE(udp_encap_needed_key);
 void udp_encap_enable(void)
 {
static_branch_enable(&udp_encap_needed_key);
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 4f6aa95a9b12..f44fe328aa0f 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -405,7 +405,7 @@ static struct sk_buff *udp4_gro_receive(struct
list_head *head,
 {
struct udphdr *uh = udp_gro_udphdr(skb);

-   if (unlikely(!uh))
+   if (unlikely(!uh) || !static_branch_unlikely(&udp_encap_needed_key))
goto flush;
 "

.. and same for ipv6.

The second is a larger patchset that converts dev_offload to
net_offload, so that all offloads share the same infrastructure, and a
sysctl interface to be able to disable all gro_receive types, not just
udp.

I've been sitting on it for too long. Let me slightly clean it up and
send it out for discussion's sake.

>
> Tested with pktgen vs udpgso_bench_rx
> Before:
> udp rx: 27 MB/s  1613271 calls/s
>
> After:
> udp rx: 30 MB/s  1771537 calls/s
>
> Signed-off-by: Paolo Abeni

Re: [PATCH 1/1] net: rds: use memset to optimize the recv

2018-09-14 Thread Santosh Shilimkar


On 9/14/2018 1:45 AM, Zhu Yanjun wrote:

The function rds_inc_init() is on the receive path. Using memset() there
optimizes the function.
The test result:

 Before:
 1) + 24.950 us   |rds_inc_init [rds]();
 After:
 1) + 10.990 us   |rds_inc_init [rds]();

Signed-off-by: Zhu Yanjun 
---

Looks good. Thanks !!

Acked-by: Santosh Shilimkar

Re: [RFC PATCH 3/4] udp: implement GRO plain UDP sockets.

2018-09-14 Thread Eric Dumazet




On 09/14/2018 08:43 AM, Paolo Abeni wrote:
> This is the RX counter part of commit bec1f6f69736 ("udp: generate gso
> with UDP_SEGMENT"). When UDP_SEGMENT is enabled, such socket is also
> eligible for GRO in the rx path: UDP segments directed to such socket
> are assembled into a larger GSO_UDP_L4 packet.
> 
> The core UDP GRO support is enabled/updated on setsockopt(UDP_SEGMENT) and
> disabled, if needed at socket destruction time.
> 
> Initial benchmark numbers:
> 
> Before:
> udp rx:   1079 MB/s   769065 calls/s
> 
> After:
> udp rx:   1466 MB/s    24877 calls/s


Are you sure the data is actually fully copied to user space ?

tools/testing/selftests/net/udpgso_bench_rx.c

uses :

static char rbuf[ETH_DATA_LEN];
   /* MSG_TRUNC will make return value full datagram length */
   ret = recv(fd, rbuf, len, MSG_TRUNC | MSG_DONTWAIT);

So you need to change this program.

Also, GRO reception would mean that userspace can retrieve,
not only full bytes of X datagrams, but also the gso_size (or length of 
individual datagrams)

You can not know the size of the packets in advance, the sender will decide.

Re: [PATCH net] pppoe: fix reception of frames with no mac header

2018-09-14 Thread Alexander Potapenko

On Fri, Sep 14, 2018 at 4:35 PM Guillaume Nault  wrote:
>
> On Fri, Sep 14, 2018 at 04:28:05PM +0200, Guillaume Nault wrote:
> > pppoe_rcv() needs to look back at the Ethernet header in order to
> > lookup the PPPoE session. Therefore we need to ensure that the mac
> > header is big enough to contain an Ethernet header. Otherwise
> > eth_hdr(skb)->h_source might access invalid data.
> >
> Forgot to Cc Alexander :/
> Sorry...
> BTW, thanks for your first analysis.
Thank you!



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

Re: [PATCH net-next] cxgb4: update supported DCB version

2018-09-14 Thread David Miller

From: Ganesh Goudar 
Date: Fri, 14 Sep 2018 17:35:55 +0530

> - In CXGB4_DCB_STATE_FW_INCOMPLETE state check if the dcb
>   version is changed and update the dcb supported version.
> 
> - Also, fill the priority code point value for priority
>   based flow control.
> 
> Signed-off-by: Ganesh Goudar 

Applied, thank you.

Re: [PATCH net] net/sched: act_sample: fix NULL dereference in the data path

2018-09-14 Thread David Miller

From: Davide Caratti 
Date: Fri, 14 Sep 2018 12:03:18 +0200

> Matteo reported the following splat, testing the datapath of TC 'sample':
 ...
> tcf_sample_act() tried to update its per-cpu stats, but tcf_sample_init()
> forgot to allocate them, because tcf_idr_create() was called with a wrong
> value of 'cpustats'. Setting it to true proved to fix the reported crash.
> 
> Reported-by: Matteo Croce 
> Fixes: 65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use 
> IDR")
> Fixes: 5c5670fae430 ("net/sched: Introduce sample tc action")
> Tested-by: Matteo Croce 
> Signed-off-by: Davide Caratti 

Applied and queued up for -stable, thanks.

[RFC PATCH 2/4] net: enable UDP gro on demand.

2018-09-14 Thread Paolo Abeni

Currently, the UDP GRO callback is always invoked, regardless of
the existence of any actual user (e.g. a UDP tunnel). With retpoline
enabled, this causes measurable overhead.

This changeset introduces explicit accounting of the sockets requiring
UDP GRO and updates the UDP offloads at runtime accordingly, so that
the GRO callback is present (and invoked) only when there is at least
one socket requiring it.

Tested with pktgen vs udpgso_bench_rx
Before:
udp rx: 27 MB/s  1613271 calls/s

After:
udp rx: 30 MB/s  1771537 calls/s

Signed-off-by: Paolo Abeni 
---
 include/linux/udp.h| 18 +++-
 include/net/addrconf.h |  1 +
 include/net/udp.h  | 12 
 net/ipv4/udp.c |  2 ++
 net/ipv4/udp_offload.c | 63 --
 net/ipv4/udp_tunnel.c  |  1 +
 net/ipv6/af_inet6.c|  1 +
 net/ipv6/udp_offload.c | 25 +++--
 8 files changed, 117 insertions(+), 6 deletions(-)

diff --git a/include/linux/udp.h b/include/linux/udp.h
index 320d49d85484..56a321a55ba1 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -49,7 +49,8 @@ struct udp_sock {
unsigned int corkflag;  /* Cork is required */
__u8 encap_type;/* Is this an Encapsulation socket? */
unsigned char no_check6_tx:1,/* Send zero UDP6 checksums on TX? */
-no_check6_rx:1;/* Allow zero UDP6 checksums on RX? */
+no_check6_rx:1,/* Allow zero UDP6 checksums on RX? */
+gro_in_use:1;  /* UDP GRO is requested */
/*
 * Following member retains the information to create a UDP header
 * when the socket is uncorked.
@@ -105,6 +106,11 @@ static inline void udp_set_no_check6_rx(struct sock *sk, 
bool val)
udp_sk(sk)->no_check6_rx = val;
 }
 
+static inline void udp_set_gro_in_use(struct sock *sk, bool val)
+{
+   udp_sk(sk)->gro_in_use = val;
+}
+
 static inline bool udp_get_no_check6_tx(struct sock *sk)
 {
return udp_sk(sk)->no_check6_tx;
@@ -115,6 +121,16 @@ static inline bool udp_get_no_check6_rx(struct sock *sk)
return udp_sk(sk)->no_check6_rx;
 }
 
+static inline bool udp_get_gro_in_use(struct sock *sk)
+{
+   return udp_sk(sk)->gro_in_use;
+}
+
+static inline bool udp_want_gro(struct sock *sk)
+{
+   return udp_sk(sk)->gro_receive;
+}
+
 #define udp_portaddr_for_each_entry(__sk, list) \
hlist_for_each_entry(__sk, list, __sk_common.skc_portaddr_node)
 
diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index 6def0351bcc3..fb2ac3ca3417 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -254,6 +254,7 @@ struct ipv6_stub {
 struct in6_addr *saddr);
 
void (*udpv6_encap_enable)(void);
+   void (*udpv6_update_offload)(bool enable_gro);
void (*ndisc_send_na)(struct net_device *dev, const struct in6_addr 
*daddr,
  const struct in6_addr *solicited_addr,
  bool router, bool solicited, bool override, bool 
inc_opt);
diff --git a/include/net/udp.h b/include/net/udp.h
index 8482a990b0bb..eff2dfa0571b 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -444,8 +444,20 @@ int udpv4_offload_init(void);
 void udp_init(void);
 
 void udp_encap_enable(void);
+void udp_gro_in_use_changed(struct sock *sk);
+
 #if IS_ENABLED(CONFIG_IPV6)
 void udpv6_encap_enable(void);
+void udpv6_update_offload(bool);
 #endif
 
+static inline void udp_update_gro_in_use(struct sock *sk, bool want_gro)
+{
+   if (want_gro == udp_get_gro_in_use(sk))
+   return;
+
+   udp_set_gro_in_use(sk, want_gro);
+   udp_gro_in_use_changed(sk);
+}
+
 #endif /* _UDP_H */
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index f4e35b2ff8b8..5ac794230013 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1438,6 +1438,8 @@ void udp_destruct_sock(struct sock *sk)
}
udp_rmem_release(sk, total, 0, true);
 
+   udp_update_gro_in_use(sk, 0);
+
inet_sock_destruct(sk);
 }
 EXPORT_SYMBOL_GPL(udp_destruct_sock);
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 0c0522b79b43..08b225adf763 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -14,6 +14,10 @@
 #include 
 #include 
 
+#if IS_ENABLED(CONFIG_IPV6)
+#include 
+#endif
+
 static struct sk_buff *__skb_udp_tunnel_segment(struct sk_buff *skb,
netdev_features_t features,
struct sk_buff *(*gso_inner_segment)(struct sk_buff *skb,
@@ -472,7 +476,13 @@ static int udp4_gro_complete(struct sk_buff *skb, int 
nhoff)
return udp_gro_complete(skb, nhoff, udp4_lib_lookup_skb);
 }
 
-static const struct net_offload udpv4_offload = {
+static const struct net_offload udpv4_no_gro_offload = {
+   .callbacks = {
+   .gso_segment = udp4_ufo_fragment,
+   },
+};
+
+static const struct net_offload udpv4_gro_offload = {
.callbacks = {

[RFC PATCH 4/4] selftests: add GRO support, fix port option processing

2018-09-14 Thread Paolo Abeni

Not a full test-case yet, but allows triggering the UDP GSO code
path.

Signed-off-by: Paolo Abeni 
---
 tools/testing/selftests/net/udpgso_bench_rx.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/net/udpgso_bench_rx.c 
b/tools/testing/selftests/net/udpgso_bench_rx.c
index 727cf67a3f75..f8bb7ea6bd25 100644
--- a/tools/testing/selftests/net/udpgso_bench_rx.c
+++ b/tools/testing/selftests/net/udpgso_bench_rx.c
@@ -31,9 +31,14 @@
 #include 
 #include 
 
+#ifndef UDP_SEGMENT
+#define UDP_SEGMENT 103
+#endif
+
 static int  cfg_port   = 8000;
 static bool cfg_tcp;
 static bool cfg_verify;
+static bool cfg_gro_segment;
 
 static bool interrupted;
 static unsigned long packets, bytes;
@@ -199,10 +204,13 @@ static void parse_opts(int argc, char **argv)
 {
int c;
 
-   while ((c = getopt(argc, argv, "ptv")) != -1) {
+   while ((c = getopt(argc, argv, "p:Stv")) != -1) {
switch (c) {
case 'p':
-   cfg_port = htons(strtoul(optarg, NULL, 0));
+   cfg_port = strtoul(optarg, NULL, 0);
+   break;
+   case 'S':
+   cfg_gro_segment = true;
break;
case 't':
cfg_tcp = true;
@@ -227,6 +235,12 @@ static void do_recv(void)
 
fd = do_socket(cfg_tcp);
 
+   if (cfg_gro_segment) {
+   int val = 1;
+   if (setsockopt(fd, IPPROTO_UDP, UDP_SEGMENT, &val, sizeof(val)))
+   error(1, errno, "setsockopt UDP_SEGMENT");
+   }
+
treport = gettimeofday_ms() + 1000;
do {
do_poll(fd);
-- 
2.17.1

[RFC PATCH 1/4] net: add new helper to update an already registered offload

2018-09-14 Thread Paolo Abeni

This will allow us to enable/disable UDP GRO at runtime in
a later patch.

Signed-off-by: Paolo Abeni 
---
 include/net/protocol.h |  4 
 net/ipv4/protocol.c| 13 +
 net/ipv6/protocol.c| 13 +
 3 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/include/net/protocol.h b/include/net/protocol.h
index 4fc75f7ae23b..aa77e7feffab 100644
--- a/include/net/protocol.h
+++ b/include/net/protocol.h
@@ -104,6 +104,8 @@ extern struct inet6_protocol __rcu 
*inet6_protos[MAX_INET_PROTOS];
 int inet_add_protocol(const struct net_protocol *prot, unsigned char num);
 int inet_del_protocol(const struct net_protocol *prot, unsigned char num);
 int inet_add_offload(const struct net_offload *prot, unsigned char num);
+int inet_update_offload(const struct net_offload *old_prot,
+   const struct net_offload *new_prot, unsigned char num);
 int inet_del_offload(const struct net_offload *prot, unsigned char num);
 void inet_register_protosw(struct inet_protosw *p);
 void inet_unregister_protosw(struct inet_protosw *p);
@@ -115,6 +117,8 @@ int inet6_register_protosw(struct inet_protosw *p);
 void inet6_unregister_protosw(struct inet_protosw *p);
 #endif
 int inet6_add_offload(const struct net_offload *prot, unsigned char num);
+int inet6_update_offload(const struct net_offload *old_prot,
+const struct net_offload *new_prot, unsigned char num);
 int inet6_del_offload(const struct net_offload *prot, unsigned char num);
 
 #endif /* _PROTOCOL_H */
diff --git a/net/ipv4/protocol.c b/net/ipv4/protocol.c
index 32a691b7ce2c..b60f1686b918 100644
--- a/net/ipv4/protocol.c
+++ b/net/ipv4/protocol.c
@@ -65,12 +65,17 @@ int inet_del_protocol(const struct net_protocol *prot, 
unsigned char protocol)
 }
 EXPORT_SYMBOL(inet_del_protocol);
 
-int inet_del_offload(const struct net_offload *prot, unsigned char protocol)
+int inet_update_offload(const struct net_offload *old_prot,
+   const struct net_offload *new_prot,
+   unsigned char protocol)
 {
-   int ret;
+   return (cmpxchg((const struct net_offload **)&inet_offloads[protocol],
+   old_prot, new_prot) == old_prot) ? 0 : -1;
+}
 
-   ret = (cmpxchg((const struct net_offload **)&inet_offloads[protocol],
-  prot, NULL) == prot) ? 0 : -1;
+int inet_del_offload(const struct net_offload *prot, unsigned char protocol)
+{
+   int ret = inet_update_offload(prot, NULL, protocol);
 
synchronize_net();
 
diff --git a/net/ipv6/protocol.c b/net/ipv6/protocol.c
index b5d54d4f995c..9ee6aff1f3fa 100644
--- a/net/ipv6/protocol.c
+++ b/net/ipv6/protocol.c
@@ -60,12 +60,17 @@ int inet6_add_offload(const struct net_offload *prot, 
unsigned char protocol)
 }
 EXPORT_SYMBOL(inet6_add_offload);
 
-int inet6_del_offload(const struct net_offload *prot, unsigned char protocol)
+int inet6_update_offload(const struct net_offload *old_prot,
+const struct net_offload *new_prot,
+unsigned char protocol)
 {
-   int ret;
+   return (cmpxchg((const struct net_offload **)&inet6_offloads[protocol],
+   old_prot, new_prot) == old_prot) ? 0 : -1;
+}
 
-   ret = (cmpxchg((const struct net_offload **)&inet6_offloads[protocol],
-  prot, NULL) == prot) ? 0 : -1;
+int inet6_del_offload(const struct net_offload *prot, unsigned char protocol)
+{
+   int ret = inet6_update_offload(prot, NULL, protocol);
 
synchronize_net();
 
-- 
2.17.1
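
The update helper in the patch above is a compare-and-swap on one slot of a
per-protocol pointer table. A minimal userspace sketch of that idea follows,
assuming C11 atomics in place of the kernel's cmpxchg(). The names struct
offload, offload_table, update_offload, add_offload and del_offload are
hypothetical stand-ins, not kernel symbols, and the synchronize_net() call
that follows removal in the kernel is deliberately left out.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net_offload; only its address matters. */
struct offload {
	int id;
};

#define MAX_PROTOS 256

/* One slot per protocol number; every slot starts out as NULL. */
static _Atomic(const struct offload *) offload_table[MAX_PROTOS];

/* Swap old_p for new_p in slot num, but only if the slot still holds old_p.
 * Returns 0 on success and -1 if the slot held something else.
 */
static int update_offload(const struct offload *old_p,
			  const struct offload *new_p, unsigned char num)
{
	const struct offload *expected = old_p;

	return atomic_compare_exchange_strong(&offload_table[num],
					      &expected, new_p) ? 0 : -1;
}

/* Registration is an update from NULL to the new handler. */
static int add_offload(const struct offload *p, unsigned char num)
{
	return update_offload(NULL, p, num);
}

/* Removal is an update from the registered handler back to NULL.
 * The kernel additionally waits for in-flight readers; this sketch does not.
 */
static int del_offload(const struct offload *p, unsigned char num)
{
	return update_offload(p, NULL, num);
}
```

With this shape, a caller can replace a registered handler in one step via
update_offload(old, new, num), and a stale caller gets -1 instead of
overwriting whatever is currently installed.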

[RFC PATCH 3/4] udp: implement GRO plain UDP sockets.

2018-09-14 Thread Paolo Abeni

This is the RX counterpart of commit bec1f6f69736 ("udp: generate gso
with UDP_SEGMENT"). When UDP_SEGMENT is enabled on a socket, that socket is
also eligible for GRO in the rx path: UDP segments directed to it
are assembled into a larger GSO_UDP_L4 packet.

The core UDP GRO support is enabled/updated on setsockopt(UDP_SEGMENT) and
disabled, if needed, at socket destruction time.

Initial benchmark numbers:

Before:
udp rx:   1079 MB/s   769065 calls/s

After:
udp rx:   1466 MB/s    24877 calls/s

This change introduces a side effect with respect to UDP tunnels:
after a UDP tunnel is created, the kernel now performs a lookup for every
ingress UDP packet; previously, the lookup happened only if the ingress packet
carried a valid internal header csum.

Signed-off-by: Paolo Abeni 
---
 include/linux/udp.h|   2 +-
 net/ipv4/udp.c |   1 +
 net/ipv4/udp_offload.c | 107 +
 net/ipv6/udp_offload.c |   6 +--
 4 files changed, 90 insertions(+), 26 deletions(-)

diff --git a/include/linux/udp.h b/include/linux/udp.h
index 56a321a55ba1..27dea956ef6e 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -128,7 +128,7 @@ static inline bool udp_get_gro_in_use(struct sock *sk)
 
 static inline bool udp_want_gro(struct sock *sk)
 {
-   return udp_sk(sk)->gro_receive;
+   return udp_sk(sk)->gro_receive || udp_sk(sk)->gso_size;
 }
 
 #define udp_portaddr_for_each_entry(__sk, list) \
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 5ac794230013..871ee55afd96 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2450,6 +2450,7 @@ int udp_lib_setsockopt(struct sock *sk, int level, int optname,
if (val < 0 || val > USHRT_MAX)
return -EINVAL;
up->gso_size = val;
+   udp_update_gro_in_use(sk, udp_want_gro(sk));
break;
 
/*
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 08b225adf763..4ff150bb84de 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -347,6 +347,54 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb,
return segs;
 }
 
+#define UDO_GRO_CNT_MAX 64
+static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
+  struct sk_buff *skb)
+{
+   struct udphdr *uh = udp_hdr(skb);
+   struct sk_buff *pp = NULL;
+   struct udphdr *uh2;
+   struct sk_buff *p;
+
+   /* requires non zero csum, for symmetry with GSO */
+   if (!uh->check) {
+   NAPI_GRO_CB(skb)->flush = 1;
+   return NULL;
+   }
+
+   /* pull encapsulating udp header */
+   skb_gro_pull(skb, sizeof(struct udphdr));
+   skb_gro_postpull_rcsum(skb, uh, sizeof(struct udphdr));
+
+   list_for_each_entry(p, head, list) {
+   if (!NAPI_GRO_CB(p)->same_flow)
+   continue;
+
+   uh2 = udp_hdr(p);
+
+   /* Match ports only, as csum is always non zero */
+   if ((*(u32 *)&uh->source != *(u32 *)&uh2->source)) {
+   NAPI_GRO_CB(p)->same_flow = 0;
+   continue;
+   }
+
+   /* Terminate the flow on len mismatch or if it grows "too much".
+* Under small packet flood GRO count could otherwise grow a lot
+* leading to excessive truesize values
+*/
+   if (!skb_gro_receive(p, skb) &&
+   NAPI_GRO_CB(p)->count > UDO_GRO_CNT_MAX)
+   pp = p;
+   else if (uh->len != uh2->len)
+   pp = p;
+
+   return pp;
+   }
+
+   /* mismatch, but we never need to flush */
+   return NULL;
+}
+
 struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,
struct udphdr *uh, udp_lookup_t lookup)
 {
@@ -357,23 +405,29 @@ struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,
int flush = 1;
struct sock *sk;
 
+   rcu_read_lock();
+   sk = (*lookup)(skb, uh->source, uh->dest);
+   if (!sk)
+   goto out_unlock;
+
+   if (udp_sk(sk)->gso_size) {
+   pp = call_gro_receive(udp_gro_receive_segment, head, skb);
+   rcu_read_unlock();
+   return pp;
+   }
+
if (NAPI_GRO_CB(skb)->encap_mark ||
(skb->ip_summed != CHECKSUM_PARTIAL &&
 NAPI_GRO_CB(skb)->csum_cnt == 0 &&
 !NAPI_GRO_CB(skb)->csum_valid))
-   goto out;
+   goto out_unlock;
 
/* mark that this skb passed once through the tunnel gro layer */
NAPI_GRO_CB(skb)->encap_mark = 1;
 
-   rcu_read_lock();
-   sk = (*lookup)(skb, uh->source, uh->dest);
-
-   if (sk && udp_sk(sk)->gro_receive)
-   goto unflush;
-   goto out_unlock;
+   if (!udp_sk(sk)->gro_receive)
+   goto out_unlock;
 
-unflush:
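
The per-flow decision made by udp_gro_receive_segment() in the hunk above can
be modelled outside the kernel. The sketch below is a toy, not the kernel
code: struct udp_header, same_ports(), should_flush() and GRO_CNT_MAX are
hypothetical names, and the merge step itself is reduced to a flag telling
whether it failed. It shows the two ideas in the patch: both ports are
compared as a single 32-bit value, and a held packet is flushed once the merge
count passes the cap or the new segment has a different length.

```c
#include <stdint.h>
#include <string.h>

/* Minimal UDP header layout; the ports occupy the first four bytes. */
struct udp_header {
	uint16_t source;
	uint16_t dest;
	uint16_t len;
	uint16_t check;
};

/* Cap on merged segments; mirrors the value 64 used in the patch. */
#define GRO_CNT_MAX 64

/* Compare source and destination port in one 32-bit comparison.
 * memcpy() is used instead of a pointer cast to stay within C aliasing rules.
 */
static int same_ports(const struct udp_header *a, const struct udp_header *b)
{
	uint32_t pa, pb;

	memcpy(&pa, a, sizeof(pa));
	memcpy(&pb, b, sizeof(pb));
	return pa == pb;
}

/* Decide whether the held packet must be flushed after a merge attempt.
 * merge_failed: nonzero if the merge step reported an error.
 * count:        number of segments merged into the held packet so far.
 * Flush when the merge worked but the count exceeds the cap, or when the
 * new segment's length differs from the held packet's length.
 */
static int should_flush(int merge_failed, unsigned int count,
			uint16_t len_new, uint16_t len_held)
{
	if (!merge_failed && count > GRO_CNT_MAX)
		return 1;
	return len_new != len_held;
}
```

A segment with a shorter length than the held packet therefore ends the
aggregate, which matches the usual GSO layout where only the last segment of a
burst may be short.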

[RFC PATCH 0/4] UDP: implement GRO support for UDP_SEGMENT socket

2018-09-14 Thread Paolo Abeni

This series implements GRO support for UDP sockets, as the RX counterpart
of commit bec1f6f69736 ("udp: generate gso with UDP_SEGMENT").
The first two patches allow UDP GRO registration on demand, avoiding additional
overhead when no UDP_SEGMENT sockets are created, actually decreasing the GRO
engine costs for the default configuration for UDP packets. They could possibly
live on their own.
The third patch contains the actual UDP GRO implementation, while the 4th patch
allows using the udpgso_bench_rx program under selftest to trigger UDP GRO. A
full self-test is not there yet.

Paolo Abeni (4):
  net: add new helper to update an already registered offload
  net: enable UDP gro on demand.
  udp: implement GRO plain UDP sockets.
  selftests: add GRO support, fix port option processing

 include/linux/udp.h   |  18 +-
 include/net/addrconf.h|   1 +
 include/net/protocol.h|   4 +
 include/net/udp.h |  12 ++
 net/ipv4/protocol.c   |  13 +-
 net/ipv4/udp.c|   3 +
 net/ipv4/udp_offload.c| 170 +++---
 net/ipv4/udp_tunnel.c |   1 +
 net/ipv6/af_inet6.c   |   1 +
 net/ipv6/protocol.c   |  13 +-
 net/ipv6/udp_offload.c|  31 +++-
 tools/testing/selftests/net/udpgso_bench_rx.c |  18 +-
 12 files changed, 244 insertions(+), 41 deletions(-)

-- 
2.17.1

Re: [PATCH net-next] cxgb4: Fix endianness issue in t4_fwcache()

2018-09-14 Thread David Miller

From: Ganesh Goudar 
Date: Fri, 14 Sep 2018 14:36:27 +0530

> Do not put host-endian 0 or 1 into big endian field.
> 
> Reported-by: Al Viro 
> Signed-off-by: Ganesh Goudar 

Applied.

Re: [PATCH net-next] cxgb4: add per rx-queue counter for packet errors

2018-09-14 Thread David Miller

From: Ganesh Goudar 
Date: Fri, 14 Sep 2018 14:46:04 +0530

> print per rx-queue packet errors in sge_qinfo
> 
> Signed-off-by: Casey Leedom 
> Signed-off-by: Ganesh Goudar 

Applied.

Re: [PATCH][net-next] net: move definition of pcpu_lstats to header file

2018-09-14 Thread David Miller

From: Li RongQing 
Date: Fri, 14 Sep 2018 16:00:51 +0800

> pcpu_lstats is defined in several files, so unify them as one
> and move to header file
> 
> Signed-off-by: Zhang Yu 
> Signed-off-by: Li RongQing 

This looks fine, applied, thanks.

Re: [PATCH net] net: diag: Fix swapped src/dst in udp_dump_one.

2018-09-14 Thread David Miller

From: Lorenzo Colitti 
Date: Fri, 14 Sep 2018 15:25:53 +0900

> Since its inception, udp_dump_one has had a bug where userspace
> needs to swap src and dst addresses and ports in order to find
> the socket it wants.
> 
> This is because udp_dump_one misuses __udp[46]_lib_lookup by
> passing the source address as the source address argument.
> Unfortunately, those functions are intended to find local sockets
> matching received packets, so the order of the arguments is
> inverted: the argument that ends up being compared with, e.g.,
> sk_daddr is actually saddr, not daddr.
> 
> While it's true that this creates a backwards compatibility
> problem, this is clearly a bug since inet_diag_sockid is very
> clear about which struct elements are the source address and port
> and which are the destination address and port. Also, this bug
> does not affect TCP sockets, SOCK_DESTROY of UDP sockets, or
> finding UDP sockets with NLMSG_DUMP.
> 
> Fixes: a925aa00a55 ("udp_diag: Implement the get_exact dumping functionality")
> Tested: https://android-review.googlesource.com/c/kernel/tests/+/755889/
> Signed-off-by: Lorenzo Colitti 

Unfortunately I think we are stuck with how things are now.

Indisputably, your patch breaks userland components that have
workarounds in order to work with existing kernels.  People who
wrote such code:

1) Won't get any warnings that things are about to break on them

2) Will have limited options to have their code work on all kernels,
   ones that have this change and ones that do not.

Maybe if this got introduced 1 or 2 releases ago we could consider
doing this, but all the way back to v3.3?  No way.

I cannot apply this, sorry.
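
The argument inversion described in the quoted patch can be illustrated with a
toy model. Everything below is hypothetical (struct toy_sock, struct
toy_sockid, rx_lookup(), find_passthrough(), find_swapped()); it is not the
kernel's __udp4_lib_lookup(), only a sketch of why a receive-path lookup and a
socket-identity query disagree about what "source" means.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical connected UDP socket. */
struct toy_sock {
	uint32_t local_addr;
	uint16_t local_port;
	uint32_t remote_addr;
	uint16_t remote_port;
};

/* Socket identity as a diag user states it: src is the socket's own end. */
struct toy_sockid {
	uint32_t src;
	uint16_t sport;
	uint32_t dst;
	uint16_t dport;
};

/* Receive-path style lookup: saddr/sport describe the packet's sender, so
 * they are matched against the socket's remote end, not its local end.
 */
static const struct toy_sock *rx_lookup(const struct toy_sock *tbl, size_t n,
					uint32_t saddr, uint16_t sport,
					uint32_t daddr, uint16_t dport)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (tbl[i].remote_addr == saddr &&
		    tbl[i].remote_port == sport &&
		    tbl[i].local_addr == daddr &&
		    tbl[i].local_port == dport)
			return &tbl[i];
	}
	return NULL;
}

/* Pass the identity through unchanged: src lands in the saddr argument. */
static const struct toy_sock *find_passthrough(const struct toy_sock *tbl,
					       size_t n,
					       const struct toy_sockid *id)
{
	return rx_lookup(tbl, n, id->src, id->sport, id->dst, id->dport);
}

/* Swap the ends first, so the socket's own end is matched as local. */
static const struct toy_sock *find_swapped(const struct toy_sock *tbl,
					   size_t n,
					   const struct toy_sockid *id)
{
	return rx_lookup(tbl, n, id->dst, id->dport, id->src, id->sport);
}
```

In this model find_passthrough() only finds the socket when the caller has
already swapped src and dst, which is the workaround existing userspace
carries, while find_swapped() accepts the identity as written. Changing from
one to the other is exactly what breaks callers that swap on their own.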

Re: [PATCH net-next 08/13] net: sched: rename tcf_block_get{_ext}() and tcf_block_put{_ext}()

2018-09-14 Thread Jiri Pirko

Fri, Sep 14, 2018 at 12:38:08PM CEST, vla...@mellanox.com wrote:
>
>On Thu 13 Sep 2018 at 17:21, Cong Wang  wrote:
>> On Wed, Sep 12, 2018 at 1:24 AM Vlad Buslov  wrote:
>>>
>>>
>>> On Fri 07 Sep 2018 at 20:09, Cong Wang  wrote:
>>> > On Thu, Sep 6, 2018 at 12:59 AM Vlad Buslov  wrote:
>>> >>
>>> >> Functions tcf_block_get{_ext}() and tcf_block_put{_ext}() actually
>>> >> attach/detach block to specific Qdisc besides just taking/putting
>>> >> reference. Rename them according to their purpose.
>>> >
>>> > Where exactly does it attach to?
>>> >
>>> > Each qdisc provides a pointer to a pointer of a block, like
>>> > >block. It is where the result is saved to. It takes a parameter
>>> > of Qdisc* merely for read-only purpose.
>>>
>>> tcf_block_attach_ext() passes qdisc parameter to tcf_block_owner_add()
>>> which saves qdisc to new tcf_block_owner_item and adds the item to
>>> block's owner list. I proposed several naming options for these
>>> functions to Jiri on internal review and he suggested "attach" as better
>>> option.
>>
>> But that is merely item->q = q, this is why I said it is read-only,
>> hard to claim this is attaching.
>>
>>
>>>
>>> >
>>> > So, renaming it to *attach() is even confusing, at least not
>>> > any better. Please find other names or leave them as they are.
>>>
>>> What would you recommend?
>>
>> I don't know, perhaps "acquire"?
>>
>> Or, leaving tcf_block_get() as it is but rename your refcnt
>> increment function to be something like tcf_block_refcnt_get()?
>
>Cong, I'm okay with both options.
>
>Jiri, which naming would you prefer?

Maybe tcf_block_refcnt_get() is better.

Re: [PATCH net-next v3 0/2] net: stmmac: Coalesce and tail addr fixes

2018-09-14 Thread Jerome Brunet

On Thu, 2018-09-13 at 09:02 +0100, Jose Abreu wrote:
> The fix for coalesce timer and a fix in tail address setting that impacts
> XGMAC2 operation.
> 
> Cc: Florian Fainelli 
> Cc: Neil Armstrong 
> Cc: Jerome Brunet 
> Cc: Martin Blumenstingl 
> Cc: David S. Miller 
> Cc: Joao Pinto 
> Cc: Giuseppe Cavallaro 
> Cc: Alexandre Torgue 
> 
> Jose Abreu (2):
>   net: stmmac: Rework coalesce timer and fix multi-queue races
>   net: stmmac: Fixup the tail addr setting in xmit path

Looks better this time. Stable so far, with even a small throughput improvement
on the Tx path.

so for the a113 s400 board (single queue)
Tested-by: Jerome Brunet 

> 
>  drivers/net/ethernet/stmicro/stmmac/common.h  |   4 +-
>  drivers/net/ethernet/stmicro/stmmac/stmmac.h  |  14 +-
>  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 238 
> --
>  include/linux/stmmac.h|   1 +
>  4 files changed, 149 insertions(+), 108 deletions(-)
>

[bpf-next, v4 5/5] selftests/bpf: test bpf flow dissection

2018-09-14 Thread Petar Penkov

From: Petar Penkov 

Adds a test that sends different types of packets over multiple
tunnels and verifies that valid packets are dissected correctly.  To do
so, a tc-flower rule is added to drop packets on UDP src port 9, and
packets are sent from ports 8, 9, and 10. Only the packets on port 9
should be dropped. Because tc-flower relies on the flow dissector to
match flows, correct classification demonstrates correct dissection.

Also add support logic to load the BPF program and to inject the test
packets.

Signed-off-by: Petar Penkov 
Signed-off-by: Willem de Bruijn 
---
 tools/testing/selftests/bpf/.gitignore|   2 +
 tools/testing/selftests/bpf/Makefile  |   6 +-
 tools/testing/selftests/bpf/config|   1 +
 .../selftests/bpf/flow_dissector_load.c   | 140 
 .../selftests/bpf/test_flow_dissector.c   | 782 ++
 .../selftests/bpf/test_flow_dissector.sh  | 115 +++
 tools/testing/selftests/bpf/with_addr.sh  |  54 ++
 tools/testing/selftests/bpf/with_tunnels.sh   |  36 +
 8 files changed, 1134 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/flow_dissector_load.c
 create mode 100644 tools/testing/selftests/bpf/test_flow_dissector.c
 create mode 100755 tools/testing/selftests/bpf/test_flow_dissector.sh
 create mode 100755 tools/testing/selftests/bpf/with_addr.sh
 create mode 100755 tools/testing/selftests/bpf/with_tunnels.sh

diff --git a/tools/testing/selftests/bpf/.gitignore b/tools/testing/selftests/bpf/.gitignore
index 4d789c1e5167..8a60c9b9892d 100644
--- a/tools/testing/selftests/bpf/.gitignore
+++ b/tools/testing/selftests/bpf/.gitignore
@@ -23,3 +23,5 @@ test_skb_cgroup_id_user
 test_socket_cookie
 test_cgroup_storage
 test_select_reuseport
+test_flow_dissector
+flow_dissector_load
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index e65f50f9185e..fd3851d5c079 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -47,10 +47,12 @@ TEST_PROGS := test_kmod.sh \
test_tunnel.sh \
test_lwt_seg6local.sh \
test_lirc_mode2.sh \
-   test_skb_cgroup_id.sh
+   test_skb_cgroup_id.sh \
+   test_flow_dissector.sh
 
 # Compile but not part of 'make run_tests'
-TEST_GEN_PROGS_EXTENDED = test_libbpf_open test_sock_addr test_skb_cgroup_id_user
+TEST_GEN_PROGS_EXTENDED = test_libbpf_open test_sock_addr test_skb_cgroup_id_user \
+   flow_dissector_load test_flow_dissector
 
 include ../lib.mk
 
diff --git a/tools/testing/selftests/bpf/config b/tools/testing/selftests/bpf/config
index b4994a94968b..3655508f95fd 100644
--- a/tools/testing/selftests/bpf/config
+++ b/tools/testing/selftests/bpf/config
@@ -18,3 +18,4 @@ CONFIG_CRYPTO_HMAC=m
 CONFIG_CRYPTO_SHA256=m
 CONFIG_VXLAN=y
 CONFIG_GENEVE=y
+CONFIG_NET_CLS_FLOWER=m
diff --git a/tools/testing/selftests/bpf/flow_dissector_load.c b/tools/testing/selftests/bpf/flow_dissector_load.c
new file mode 100644
index ..d3273b5b3173
--- /dev/null
+++ b/tools/testing/selftests/bpf/flow_dissector_load.c
@@ -0,0 +1,140 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+const char *cfg_pin_path = "/sys/fs/bpf/flow_dissector";
+const char *cfg_map_name = "jmp_table";
+bool cfg_attach = true;
+char *cfg_section_name;
+char *cfg_path_name;
+
+static void load_and_attach_program(void)
+{
+   struct bpf_program *prog, *main_prog;
+   struct bpf_map *prog_array;
+   int i, fd, prog_fd, ret;
+   struct bpf_object *obj;
+   int prog_array_fd;
+
+   ret = bpf_prog_load(cfg_path_name, BPF_PROG_TYPE_FLOW_DISSECTOR, &obj,
+   &prog_fd);
+   if (ret)
+   error(1, 0, "bpf_prog_load %s", cfg_path_name);
+
+   main_prog = bpf_object__find_program_by_title(obj, cfg_section_name);
+   if (!main_prog)
+   error(1, 0, "bpf_object__find_program_by_title %s",
+ cfg_section_name);
+
+   prog_fd = bpf_program__fd(main_prog);
+   if (prog_fd < 0)
+   error(1, 0, "bpf_program__fd");
+
+   prog_array = bpf_object__find_map_by_name(obj, cfg_map_name);
+   if (!prog_array)
+   error(1, 0, "bpf_object__find_map_by_name %s", cfg_map_name);
+
+   prog_array_fd = bpf_map__fd(prog_array);
+   if (prog_array_fd < 0)
+   error(1, 0, "bpf_map__fd %s", cfg_map_name);
+
+   i = 0;
+   bpf_object__for_each_program(prog, obj) {
+   fd = bpf_program__fd(prog);
+   if (fd < 0)
+   error(1, 0, "bpf_program__fd");
+
+   if (fd != prog_fd) {
+   printf("%d: %s\n", i, bpf_program__title(prog, false));
+   bpf_map_update_elem(prog_array_fd, &i, &fd, BPF_ANY);
+   ++i;
+   }
+   }
+
+   ret =

[bpf-next, v4 4/5] flow_dissector: implements eBPF parser

2018-09-14 Thread Petar Penkov

From: Petar Penkov 

This eBPF program extracts basic/control/ip address/ports keys from
incoming packets. It supports recursive parsing for IP encapsulation,
and VLAN, along with IPv4/IPv6 and extension headers.  This program is
meant to show how flow dissection and key extraction can be done in
eBPF.

Link: http://vger.kernel.org/netconf2017_files/rx_hardening_and_udp_gso.pdf
Signed-off-by: Petar Penkov 
Signed-off-by: Willem de Bruijn 
---
 tools/testing/selftests/bpf/Makefile   |   2 +-
 tools/testing/selftests/bpf/bpf_flow.c | 373 +
 2 files changed, 374 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/bpf_flow.c

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index fff7fb1285fc..e65f50f9185e 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -35,7 +35,7 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test
test_get_stack_rawtp.o test_sockmap_kern.o test_sockhash_kern.o \
test_lwt_seg6local.o sendmsg4_prog.o sendmsg6_prog.o test_lirc_mode2_kern.o \
get_cgroup_id_kern.o socket_cookie_prog.o test_select_reuseport_kern.o \
-   test_skb_cgroup_id_kern.o
+   test_skb_cgroup_id_kern.o bpf_flow.o
 
 # Order correspond to 'make run_tests' order
 TEST_PROGS := test_kmod.sh \
diff --git a/tools/testing/selftests/bpf/bpf_flow.c b/tools/testing/selftests/bpf/bpf_flow.c
new file mode 100644
index ..5fb809d95867
--- /dev/null
+++ b/tools/testing/selftests/bpf/bpf_flow.c
@@ -0,0 +1,373 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "bpf_helpers.h"
+#include "bpf_endian.h"
+
+int _version SEC("version") = 1;
+#define PROG(F) SEC(#F) int bpf_func_##F
+
+/* These are the identifiers of the BPF programs that will be used in tail
+ * calls. Name is limited to 16 characters, with the terminating character and
+ * bpf_func_ above, we have only 6 to work with, anything after will be cropped.
+ */
+enum {
+   IP,
+   IPV6,
+   IPV6OP, /* Destination/Hop-by-Hop Options IPv6 Extension header */
+   IPV6FR, /* Fragmentation IPv6 Extension Header */
+   MPLS,
+   VLAN,
+};
+
+#define IP_MF  0x2000
+#define IP_OFFSET  0x1FFF
+#define IP6_MF 0x0001
+#define IP6_OFFSET 0xFFF8
+
+struct vlan_hdr {
+   __be16 h_vlan_TCI;
+   __be16 h_vlan_encapsulated_proto;
+};
+
+struct gre_hdr {
+   __be16 flags;
+   __be16 proto;
+};
+
+struct frag_hdr {
+   __u8 nexthdr;
+   __u8 reserved;
+   __be16 frag_off;
+   __be32 identification;
+};
+
+struct bpf_map_def SEC("maps") jmp_table = {
+   .type = BPF_MAP_TYPE_PROG_ARRAY,
+   .key_size = sizeof(__u32),
+   .value_size = sizeof(__u32),
+   .max_entries = 8
+};
+
+static __always_inline void *bpf_flow_dissect_get_header(struct __sk_buff *skb,
+__u16 hdr_size,
+void *buffer)
+{
+   void *data_end = (void *)(long)skb->data_end;
+   void *data = (void *)(long)skb->data;
+   __u16 nhoff = skb->flow_keys->nhoff;
+   __u8 *hdr;
+
+   /* Verifies this variable offset does not overflow */
+   if (nhoff > (USHRT_MAX - hdr_size))
+   return NULL;
+
+   hdr = data + nhoff;
+   if (hdr + hdr_size <= data_end)
+   return hdr;
+
+   if (bpf_skb_load_bytes(skb, nhoff, buffer, hdr_size))
+   return NULL;
+
+   return buffer;
+}
+
+/* Dispatches on ETHERTYPE */
+static __always_inline int parse_eth_proto(struct __sk_buff *skb, __be16 proto)
+{
+   struct bpf_flow_keys *keys = skb->flow_keys;
+
+   keys->n_proto = proto;
+   switch (proto) {
+   case bpf_htons(ETH_P_IP):
+   bpf_tail_call(skb, &jmp_table, IP);
+   break;
+   case bpf_htons(ETH_P_IPV6):
+   bpf_tail_call(skb, &jmp_table, IPV6);
+   break;
+   case bpf_htons(ETH_P_MPLS_MC):
+   case bpf_htons(ETH_P_MPLS_UC):
+   bpf_tail_call(skb, &jmp_table, MPLS);
+   break;
+   case bpf_htons(ETH_P_8021Q):
+   case bpf_htons(ETH_P_8021AD):
+   bpf_tail_call(skb, &jmp_table, VLAN);
+   break;
+   default:
+   /* Protocol not supported */
+   return BPF_DROP;
+   }
+
+   return BPF_DROP;
+}
+
+SEC("dissect")
+int dissect(struct __sk_buff *skb)
+{
+   if (!skb->vlan_present)
+   return parse_eth_proto(skb, skb->protocol);
+   else
+   return parse_eth_proto(skb, skb->vlan_proto);
+}
+
+/* Parses on IPPROTO_* */
+static __always_inline int parse_ip_proto(struct __sk_buff *skb, __u8 proto)
+{
+   struct
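
The header access pattern used by bpf_flow_dissect_get_header() above (guard
against offset overflow, prefer a direct pointer, fall back to a copy) can be
sketched in plain userspace C. This is an assumption-laden model, not BPF
code: struct toy_pkt and get_header() are hypothetical, the "linear" length
stands in for the data/data_end window, and memcpy() stands in for
bpf_skb_load_bytes().

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical packet: only the first linear_len bytes may be used through
 * a direct pointer; the rest must be copied out before being read.
 */
struct toy_pkt {
	const uint8_t *data;
	size_t linear_len;
	size_t total_len;
};

/* Return a pointer to hdr_size bytes at offset nhoff, or NULL.
 * The result points into the packet when the header lies fully inside the
 * linear area, and into buffer when it had to be copied.
 */
static const void *get_header(const struct toy_pkt *pkt, uint16_t nhoff,
			      uint16_t hdr_size, void *buffer)
{
	/* Reject offsets where nhoff + hdr_size would not fit in 16 bits. */
	if (nhoff > (USHRT_MAX - hdr_size))
		return NULL;

	/* Fast path: the whole header is directly addressable. */
	if ((size_t)nhoff + hdr_size <= pkt->linear_len)
		return pkt->data + nhoff;

	/* Slow path: copy, unless the header runs past the packet end. */
	if ((size_t)nhoff + hdr_size > pkt->total_len)
		return NULL;
	memcpy(buffer, pkt->data + nhoff, hdr_size);
	return buffer;
}
```

Callers pass a scratch buffer at least hdr_size bytes long and then read the
header only through the returned pointer, so the same parsing code works for
both the direct and the copied case.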

[bpf-next, v4 2/5] bpf: sync bpf.h uapi with tools/

2018-09-14 Thread Petar Penkov

From: Petar Penkov 

This patch syncs tools/include/uapi/linux/bpf.h with the flow dissector
definitions from include/uapi/linux/bpf.h

Signed-off-by: Petar Penkov 
Signed-off-by: Willem de Bruijn 
---
 tools/include/uapi/linux/bpf.h | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 66917a4eba27..aa5ccd2385ed 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -152,6 +152,7 @@ enum bpf_prog_type {
BPF_PROG_TYPE_LWT_SEG6LOCAL,
BPF_PROG_TYPE_LIRC_MODE2,
BPF_PROG_TYPE_SK_REUSEPORT,
+   BPF_PROG_TYPE_FLOW_DISSECTOR,
 };
 
 enum bpf_attach_type {
@@ -172,6 +173,7 @@ enum bpf_attach_type {
BPF_CGROUP_UDP4_SENDMSG,
BPF_CGROUP_UDP6_SENDMSG,
BPF_LIRC_MODE2,
+   BPF_FLOW_DISSECTOR,
__MAX_BPF_ATTACH_TYPE
 };
 
@@ -2333,6 +2335,7 @@ struct __sk_buff {
/* ... here. */
 
__u32 data_meta;
+   struct bpf_flow_keys *flow_keys;
 };
 
 struct bpf_tunnel_key {
@@ -2778,4 +2781,27 @@ enum bpf_task_fd_type {
BPF_FD_TYPE_URETPROBE,  /* filename + offset */
 };
 
+struct bpf_flow_keys {
+   __u16   nhoff;
+   __u16   thoff;
+   __u16   addr_proto; /* ETH_P_* of valid addrs */
+   __u8    is_frag;
+   __u8    is_first_frag;
+   __u8    is_encap;
+   __u8    ip_proto;
+   __be16  n_proto;
+   __be16  sport;
+   __be16  dport;
+   union {
+   struct {
+   __be32  ipv4_src;
+   __be32  ipv4_dst;
+   };
+   struct {
+   __u32   ipv6_src[4];    /* in6_addr; network order */
+   __u32   ipv6_dst[4];    /* in6_addr; network order */
+   };
+   };
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
-- 
2.19.0.397.gdd90340f6a-goog

[bpf-next, v4 3/5] bpf: support flow dissector in libbpf and bpftool

2018-09-14 Thread Petar Penkov

From: Petar Penkov 

This patch extends libbpf and bpftool to work with programs of type
BPF_PROG_TYPE_FLOW_DISSECTOR.

Signed-off-by: Petar Penkov 
Signed-off-by: Willem de Bruijn 
---
 tools/bpf/bpftool/prog.c | 1 +
 tools/lib/bpf/libbpf.c   | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index dce960d22106..b1cd3bc8db70 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -74,6 +74,7 @@ static const char * const prog_type_name[] = {
[BPF_PROG_TYPE_RAW_TRACEPOINT]  = "raw_tracepoint",
[BPF_PROG_TYPE_CGROUP_SOCK_ADDR] = "cgroup_sock_addr",
[BPF_PROG_TYPE_LIRC_MODE2]  = "lirc_mode2",
+   [BPF_PROG_TYPE_FLOW_DISSECTOR]  = "flow_dissector",
 };
 
 static void print_boot_time(__u64 nsecs, char *buf, unsigned int size)
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 8476da7f2720..9ca8e0e624d8 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1502,6 +1502,7 @@ static bool bpf_prog_type__needs_kver(enum bpf_prog_type type)
case BPF_PROG_TYPE_CGROUP_SOCK_ADDR:
case BPF_PROG_TYPE_LIRC_MODE2:
case BPF_PROG_TYPE_SK_REUSEPORT:
+   case BPF_PROG_TYPE_FLOW_DISSECTOR:
return false;
case BPF_PROG_TYPE_UNSPEC:
case BPF_PROG_TYPE_KPROBE:
@@ -2121,6 +2122,7 @@ static const struct {
BPF_PROG_SEC("sk_skb",  BPF_PROG_TYPE_SK_SKB),
BPF_PROG_SEC("sk_msg",  BPF_PROG_TYPE_SK_MSG),
BPF_PROG_SEC("lirc_mode2",  BPF_PROG_TYPE_LIRC_MODE2),
+   BPF_PROG_SEC("flow_dissector",  BPF_PROG_TYPE_FLOW_DISSECTOR),
BPF_SA_PROG_SEC("cgroup/bind4", BPF_CGROUP_INET4_BIND),
BPF_SA_PROG_SEC("cgroup/bind6", BPF_CGROUP_INET6_BIND),
BPF_SA_PROG_SEC("cgroup/connect4", BPF_CGROUP_INET4_CONNECT),
-- 
2.19.0.397.gdd90340f6a-goog
