Re: [Intel-wired-lan] NFS over NAT causes e1000e transmit hangs

2017-04-22 Thread Neftin, Sasha

On 4/20/2017 00:15, Florian Fainelli wrote:

On 04/19/2017 01:52 AM, Neftin, Sasha wrote:

On 4/18/2017 22:05, Florian Fainelli wrote:

On 04/18/2017 12:03 PM, Eric Dumazet wrote:

On Tue, 2017-04-18 at 11:18 -0700, Florian Fainelli wrote:

Hi,

I am using NFS over a NAT with two e1000e adapters and with eth1 being
the LAN interface and eth0 the WAN interface. The kernel is Ubuntu's
16.10 kernel: 4.8.0-46-generic. The device doing NFS over NAT is just
mounting a remote folder and doing normal execution/file accesses. It's
enough to untar a file from this device onto a NFS share to expose the
problem.

The transmit hangs look like the ones below. Doing a rmmod/insmod does
not help eliminate the problem, nor does a power cycle. Stopping the
NFS over NAT definitely does let the adapter recover.

Is this NFS over TCP or UDP ?

This is NFS over TCP mounted with the following:

type nfs
(rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=2049,timeo=70,retrans=3,sec=sys,local_lock=none,addr=X.X.X.X)


Thanks Eric!

Please try disabling TCP segmentation offload: ethtool -K  tso off.

I am not able to reproduce the hangs with TSO turned off. Is there a
specific patch you would want me to try?


Please work with TSO turned off, then. There is no patch for this specific
problem.




[PATCH 1/1] openvswitch: check return value of nla_nest_start

2017-04-22 Thread Pan Bian
Function nla_nest_start() will return a NULL pointer on error, and its
return value should be validated before it is used. However, in function
queue_userspace_packet(), its return value is ignored. This may result
in NULL dereference when calling nla_nest_end(). This patch fixes the
bug.

Signed-off-by: Pan Bian 
---
 net/openvswitch/datapath.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 9c62b63..34c0fbd 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -489,7 +489,8 @@ static int queue_userspace_packet(struct datapath *dp, 
struct sk_buff *skb,
err = ovs_nla_put_tunnel_info(user_skb,
  upcall_info->egress_tun_info);
BUG_ON(err);
-   nla_nest_end(user_skb, nla);
+   if (nla)
+   nla_nest_end(user_skb, nla);
}
 
if (upcall_info->actions_len) {
@@ -497,7 +498,7 @@ static int queue_userspace_packet(struct datapath *dp, 
struct sk_buff *skb,
err = ovs_nla_put_actions(upcall_info->actions,
  upcall_info->actions_len,
  user_skb);
-   if (!err)
+   if (!err && nla)
nla_nest_end(user_skb, nla);
else
nla_nest_cancel(user_skb, nla);
-- 
1.9.1
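
For readers following the thread, here is a minimal userspace sketch of
the "check the nest-start result before using it" pattern that the patch
description argues for. It is only an illustration under stated
assumptions: demo_msg, demo_nest_start(), demo_nest_cancel(), demo_fill()
and DEMO_HDR_LEN are hypothetical stand-ins, not the kernel netlink API,
and the buffer model is deliberately simplified.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_HDR_LEN 4u /* hypothetical size of a nest header */

/* Hypothetical message buffer model; NOT struct sk_buff. */
struct demo_msg {
	size_t len;      /* bytes used so far */
	size_t cap;      /* total capacity */
	size_t nest_off; /* offset where the open nest began */
};

/* Reserve room for a nest header; returns NULL when it does not fit. */
static size_t *demo_nest_start(struct demo_msg *msg)
{
	assert(msg != NULL);

	if (msg->len + DEMO_HDR_LEN > msg->cap)
		return NULL;

	msg->nest_off = msg->len;
	msg->len += DEMO_HDR_LEN;
	return &msg->nest_off;
}

/* Trim the message back to where the nest started. */
static void demo_nest_cancel(struct demo_msg *msg, const size_t *nest)
{
	msg->len = *nest;
}

/* Caller side: validate the nest pointer before any further use. */
static int demo_fill(struct demo_msg *msg, size_t payload)
{
	size_t *nest = demo_nest_start(msg);

	if (!nest)
		return -1; /* no room for the header, nothing to cancel */

	if (msg->len + payload > msg->cap) {
		demo_nest_cancel(msg, nest); /* safe: nest is known non-NULL */
		return -1;
	}

	msg->len += payload;
	return 0;
}
```

The point of the sketch is the ordering: the NULL test happens once,
right after the start call, so every later end or cancel call can rely
on a valid pointer.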




[PATCH 1/1] lwtunnel: check return value of nla_nest_start

2017-04-22 Thread Pan Bian
Function nla_nest_start() may return a NULL pointer on error. However,
in function lwtunnel_fill_encap(), the return value of nla_nest_start()
is not validated before it is used. This patch checks the return value
of nla_nest_start() against NULL.

Signed-off-by: Pan Bian 
---
 net/core/lwtunnel.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c
index 6df9f8f..3471ce7 100644
--- a/net/core/lwtunnel.c
+++ b/net/core/lwtunnel.c
@@ -216,6 +216,8 @@ int lwtunnel_fill_encap(struct sk_buff *skb, struct 
lwtunnel_state *lwtstate)
 
ret = -EOPNOTSUPP;
nest = nla_nest_start(skb, RTA_ENCAP);
+   if (!nest)
+   goto nla_put_failure;
rcu_read_lock();
ops = rcu_dereference(lwtun_encaps[lwtstate->type]);
if (likely(ops && ops->fill_encap))
-- 
1.9.1




Re: [net-next 03/11] ixgbe: add support for XDP_TX action

2017-04-22 Thread Jakub Kicinski
On Sat, 22 Apr 2017 20:40:22 -0700, John Fastabend wrote:
> >> @@ -9557,7 +9739,21 @@ static int ixgbe_xdp_setup(struct net_device *dev, 
> >> struct bpf_prog *prog)
> >>return -EINVAL;
> >>}
> >>  
> >> +  if (nr_cpu_ids > MAX_XDP_QUEUES)
> >> +  return -ENOMEM;
> >> +
> >>old_prog = xchg(&adapter->xdp_prog, prog);
> >> +
> >> +  /* If transitioning XDP modes reconfigure rings */
> >> +  if (!!prog != !!old_prog) {
> >> +  int err = ixgbe_setup_tc(dev, netdev_get_num_tc(dev));
> >> +
> >> +  if (err) {
> >> +  rcu_assign_pointer(adapter->xdp_prog, old_prog);
> >> +  return -EINVAL;
> >> +  }
> >> +  }
> >> +
> >>for (i = 0; i < adapter->num_rx_queues; i++)
> >>xchg(&adapter->rx_ring[i]->xdp_prog, adapter->xdp_prog);
> >>
> > 
> > In case of disabling XDP I assume ixgbe_setup_tc() will free the rings
> > before the xdp_prog on the rings is swapped to NULL.  Is there anything
> > preventing TX in that time window?  I think usual ordering would be to
> > install the prog after reconfig but uninstall before.
> >   
> 
> Well in the ixgbe_setup_tc() case we set the rx_ring->xdp_prog in
> ixgbe_setup_rx_resources(), while the DMA engine is disabled, so the for
> loop is just doing another set on the rx_ring, assigning it to the program
> already set previously.
> 
> It's not really buggy, it's just extra useless work, so I'll change it to this:
> 
>   if (!!prog != !!old_prog) {
>   ...
>   } else {
>   for ( ... )
>   swap xdp prog
>   }
> 
> Nice spot, thanks for reviewing. And I missed a build error so I'll roll these
> fixes in a resend.

Ah, thanks for explaining.  No bugs that I can spot then :)


Re: [net-next 00/11][pull request] 10GbE Intel Wired LAN Driver Updates 2017-04-20

2017-04-22 Thread John Fastabend
On 17-04-21 11:18 AM, David Miller wrote:
> From: Jeff Kirsher 
> Date: Thu, 20 Apr 2017 18:50:18 -0700
> 
>> John adds XDP support (yeah!) for ixgbe.
> 
> As excited and eager as I am about this, I want to see the build regression
> for PAGE_SIZE>=8192 fixed before I pull this.
> 
> Thanks.
> 

Dang :(

Jeff, Alex sent you a fix already for this, but Jakub had a few nice
improvements. I'm thinking the easiest thing to do is for me to merge Alex's fix
and Jakub's comments and resend the patches.

Thanks,
John


Re: [net-next 03/11] ixgbe: add support for XDP_TX action

2017-04-22 Thread John Fastabend
On 17-04-22 07:24 PM, Jakub Kicinski wrote:
> On Thu, 20 Apr 2017 18:50:21 -0700, Jeff Kirsher wrote:
>> +static int ixgbe_xdp_queues(struct ixgbe_adapter *adapter)
>> +{
>> +if (nr_cpu_ids > MAX_XDP_QUEUES)
>> +return 0;
>> +
>> +return adapter->xdp_prog ? nr_cpu_ids : 0;
>> +}
> 
> Nit: AFAICT ixgbe_xdp_setup() will guarantee xdp_prog is not set if
> there are too many CPU ids.

Sure, being a bit paranoid I guess.

> 
>> @@ -6120,10 +6193,21 @@ static int ixgbe_setup_all_tx_resources(struct 
>> ixgbe_adapter *adapter)
>>  e_err(probe, "Allocation for Tx Queue %u failed\n", i);
>>  goto err_setup_tx;
>>  }
>> +for (j = 0; j < adapter->num_xdp_queues; j++) {
>> +err = ixgbe_setup_tx_resources(adapter->xdp_ring[j]);
>> +if (!err)
>> +continue;
>> +
>> +e_err(probe, "Allocation for Tx Queue %u failed\n", j);
>> +goto err_setup_tx;
>> +}
>> +
>>  
> 
> Nit: extra line here

OK well I guess we can fix this if we need a respin anyways.

> 
>> @@ -9557,7 +9739,21 @@ static int ixgbe_xdp_setup(struct net_device *dev, 
>> struct bpf_prog *prog)
>>  return -EINVAL;
>>  }
>>  
>> +if (nr_cpu_ids > MAX_XDP_QUEUES)
>> +return -ENOMEM;
>> +
>>  old_prog = xchg(&adapter->xdp_prog, prog);
>> +
>> +/* If transitioning XDP modes reconfigure rings */
>> +if (!!prog != !!old_prog) {
>> +int err = ixgbe_setup_tc(dev, netdev_get_num_tc(dev));
>> +
>> +if (err) {
>> +rcu_assign_pointer(adapter->xdp_prog, old_prog);
>> +return -EINVAL;
>> +}
>> +}
>> +
>>  for (i = 0; i < adapter->num_rx_queues; i++)
>>  xchg(&adapter->rx_ring[i]->xdp_prog, adapter->xdp_prog);
>>  
> 
> In case of disabling XDP I assume ixgbe_setup_tc() will free the rings
> before the xdp_prog on the rings is swapped to NULL.  Is there anything
> preventing TX in that time window?  I think usual ordering would be to
> install the prog after reconfig but uninstall before.
> 

Well in the ixgbe_setup_tc() case we set the rx_ring->xdp_prog in
ixgbe_setup_rx_resources(), while the DMA engine is disabled, so the for
loop is just doing another set on the rx_ring, assigning it to the program
already set previously.

It's not really buggy, it's just extra useless work, so I'll change it to this:

if (!!prog != !!old_prog) {
...
} else {
for ( ... )
swap xdp prog
}

Nice spot, thanks for reviewing. And I missed a build error so I'll roll these
fixes in a resend.

Thanks,
John
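
A rough userspace model of the restructuring described above may help
when reading the thread. It assumes, as the reply states, that the ring
reconfiguration path itself installs the adapter's program on every
ring, so the per-ring loop is only needed when no reconfiguration
happens. All names here (demo_prog, demo_adapter, demo_reconfigure(),
demo_xdp_setup(), DEMO_NUM_RINGS) are hypothetical; the sketch does not
model error rollback or the ordering question raised in the review.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NUM_RINGS 4 /* hypothetical ring count */

/* Hypothetical stand-in for a loaded program; not struct bpf_prog. */
struct demo_prog {
	int id;
};

struct demo_adapter {
	const struct demo_prog *xdp_prog;
	const struct demo_prog *ring_prog[DEMO_NUM_RINGS];
	int reconfig_count; /* how many full ring reconfigurations ran */
};

/* Models ring setup: every ring picks up the adapter's program. */
static void demo_reconfigure(struct demo_adapter *ad)
{
	int i;

	for (i = 0; i < DEMO_NUM_RINGS; i++)
		ad->ring_prog[i] = ad->xdp_prog;
	ad->reconfig_count++;
}

static void demo_xdp_setup(struct demo_adapter *ad,
			   const struct demo_prog *prog)
{
	const struct demo_prog *old_prog;
	int i;

	assert(ad != NULL);

	old_prog = ad->xdp_prog;
	ad->xdp_prog = prog;

	if (!!prog != !!old_prog) {
		/* XDP switched on or off: rings are rebuilt and re-armed. */
		demo_reconfigure(ad);
	} else {
		/* Same mode: only swap the program seen by each ring. */
		for (i = 0; i < DEMO_NUM_RINGS; i++)
			ad->ring_prog[i] = prog;
	}
}
```

The `!!prog != !!old_prog` test normalizes both pointers to 0 or 1, so
it is true only for a NULL to non-NULL transition or the reverse, and
false when one program simply replaces another.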






[PATCH net-next v3 5/5] nfp: remove the refresh of all ports optimization

2017-04-22 Thread Jakub Kicinski
The code refreshing the eth port state was trying to update state
of all ports of the card.  Unfortunately to safely walk the port
list we would have to hold the port lock, which we can't due to
lock ordering constraints against rtnl.

Make the per-port sync refresh and async refresh of all ports
completely separate routines.

Fixes: 172f638c93dd ("nfp: add port state refresh")
Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_net.h   |  3 +-
 .../net/ethernet/netronome/nfp/nfp_net_ethtool.c   | 13 +++--
 drivers/net/ethernet/netronome/nfp/nfp_net_main.c  | 67 +++---
 3 files changed, 58 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h 
b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index 8302a2d688da..8f20fdef0754 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -819,7 +819,8 @@ struct nfp_net_dp *nfp_net_clone_dp(struct nfp_net *nn);
 int nfp_net_ring_reconfig(struct nfp_net *nn, struct nfp_net_dp *new);
 
 bool nfp_net_link_changed_read_clear(struct nfp_net *nn);
-void nfp_net_refresh_port_config(struct nfp_net *nn);
+int nfp_net_refresh_eth_port(struct nfp_net *nn);
+void nfp_net_refresh_port_table(struct nfp_net *nn);
 
 #ifdef CONFIG_NFP_DEBUG
 void nfp_net_debugfs_create(void);
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
index 3328041ec290..6e27d1281425 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
@@ -211,10 +211,15 @@ nfp_net_get_link_ksettings(struct net_device *netdev,
return 0;
 
/* Use link speed from ETH table if available, otherwise try the BAR */
-   if (nn->eth_port && nfp_net_link_changed_read_clear(nn))
-   nfp_net_refresh_port_config(nn);
-   /* Separate if - on FW error the port could've disappeared from table */
if (nn->eth_port) {
+   int err;
+
+   if (nfp_net_link_changed_read_clear(nn)) {
+   err = nfp_net_refresh_eth_port(nn);
+   if (err)
+   return err;
+   }
+
cmd->base.port = nn->eth_port->port_type;
cmd->base.speed = nn->eth_port->speed;
cmd->base.duplex = DUPLEX_FULL;
@@ -273,7 +278,7 @@ nfp_net_set_link_ksettings(struct net_device *netdev,
if (err > 0)
return 0; /* no change */
 
-   nfp_net_refresh_port_config(nn);
+   nfp_net_refresh_port_table(nn);
 
return err;
 
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
index 4c6863a072d3..8cb87cbe1120 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
@@ -176,13 +176,13 @@ nfp_net_get_mac_addr(struct nfp_net *nn, struct nfp_cpp 
*cpp, unsigned int id)
 }
 
 static struct nfp_eth_table_port *
-nfp_net_find_port(struct nfp_pf *pf, unsigned int id)
+nfp_net_find_port(struct nfp_eth_table *eth_tbl, unsigned int id)
 {
int i;
 
-   for (i = 0; pf->eth_tbl && i < pf->eth_tbl->count; i++)
-   if (pf->eth_tbl->ports[i].eth_index == id)
-   return &pf->eth_tbl->ports[i];
+   for (i = 0; eth_tbl && i < eth_tbl->count; i++)
+   if (eth_tbl->ports[i].eth_index == id)
+   return &eth_tbl->ports[i];
 
return NULL;
 }
@@ -367,7 +367,7 @@ nfp_net_pf_alloc_netdevs(struct nfp_pf *pf, void __iomem 
*ctrl_bar,
prev_tx_base = tgt_tx_base;
prev_rx_base = tgt_rx_base;
 
-   eth_port = nfp_net_find_port(pf, i);
+   eth_port = nfp_net_find_port(pf->eth_tbl, i);
if (eth_port && eth_port->override_changed) {
nfp_warn(pf->cpp, "Config changed for port #%d, reboot 
required before port will be operational\n", i);
} else {
@@ -485,6 +485,7 @@ static void nfp_net_refresh_netdevs(struct work_struct 
*work)
 {
struct nfp_pf *pf = container_of(work, struct nfp_pf,
 port_refresh_work);
+   struct nfp_eth_table *eth_table;
struct nfp_net *nn, *next;
 
mutex_lock(&pf->port_lock);
@@ -493,6 +494,27 @@ static void nfp_net_refresh_netdevs(struct work_struct 
*work)
if (list_empty(&pf->ports))
goto out;
 
+   list_for_each_entry(nn, &pf->ports, port_list)
+   nfp_net_link_changed_read_clear(nn);
+
+   eth_table = nfp_eth_read_ports(pf->cpp);
+   if (!eth_table) {
+   nfp_err(pf->cpp, "Error refreshing port config!\n");
+   goto out;
+   }
+
+   rtnl_lock();
+   list_for_each_entry(nn, &pf->ports, port_list) {
+   if (!nn->eth_port)
+

[PATCH net-next v3 3/5] nfp: add NSP routine to get static information

2017-04-22 Thread Jakub Kicinski
From: David Brunecz 

Retrieve identifying information from the NSP.  For now it only
contains versions of firmware subcomponents.

Signed-off-by: David Brunecz 
Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/Makefile|  1 +
 drivers/net/ethernet/netronome/nfp/nfp_main.c  |  7 ++
 drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h   |  1 +
 .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c   |  7 ++
 .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h   | 24 ++
 .../ethernet/netronome/nfp/nfpcore/nfp_nsp_cmds.c  | 89 ++
 6 files changed, 129 insertions(+)
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_cmds.c

diff --git a/drivers/net/ethernet/netronome/nfp/Makefile 
b/drivers/net/ethernet/netronome/nfp/Makefile
index 4a5d13ef92a4..4b15f0f496aa 100644
--- a/drivers/net/ethernet/netronome/nfp/Makefile
+++ b/drivers/net/ethernet/netronome/nfp/Makefile
@@ -9,6 +9,7 @@ nfp-objs := \
nfpcore/nfp_mutex.o \
nfpcore/nfp_nffw.o \
nfpcore/nfp_nsp.o \
+   nfpcore/nfp_nsp_cmds.o \
nfpcore/nfp_nsp_eth.o \
nfpcore/nfp_resource.o \
nfpcore/nfp_rtsym.o \
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.c 
b/drivers/net/ethernet/netronome/nfp/nfp_main.c
index bea2a1a6c211..dde35dae35c5 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_main.c
@@ -253,6 +253,7 @@ nfp_fw_load(struct pci_dev *pdev, struct nfp_pf *pf, struct 
nfp_nsp *nsp)
 
 static int nfp_nsp_init(struct pci_dev *pdev, struct nfp_pf *pf)
 {
+   struct nfp_nsp_identify *nspi;
struct nfp_nsp *nsp;
int err;
 
@@ -269,6 +270,12 @@ static int nfp_nsp_init(struct pci_dev *pdev, struct 
nfp_pf *pf)
 
pf->eth_tbl = __nfp_eth_read_ports(pf->cpp, nsp);
 
+   nspi = __nfp_nsp_identify(nsp);
+   if (nspi) {
+   dev_info(&pdev->dev, "BSP: %s\n", nspi->version);
+   kfree(nspi);
+   }
+
err = nfp_fw_load(pdev, pf, nsp);
if (err < 0) {
kfree(pf->eth_tbl);
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h 
b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h
index 8afef7593f13..4df2ce261b3f 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h
@@ -63,6 +63,7 @@ void nfp_nsp_config_clear_state(struct nfp_nsp *state);
 int nfp_nsp_read_eth_table(struct nfp_nsp *state, void *buf, unsigned int 
size);
 int nfp_nsp_write_eth_table(struct nfp_nsp *state,
const void *buf, unsigned int size);
+int nfp_nsp_read_identify(struct nfp_nsp *state, void *buf, unsigned int size);
 
 /* Implemented in nfp_resource.c */
 
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c 
b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
index 4635f42e15b0..61797c98f5fe 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
@@ -93,6 +93,7 @@ enum nfp_nsp_cmd {
SPCODE_FW_LOAD  = 6, /* Load fw from buffer, len in option */
SPCODE_ETH_RESCAN   = 7, /* Rescan ETHs, write ETH_TABLE to buf */
SPCODE_ETH_CONTROL  = 8, /* Update media config from buffer */
+   SPCODE_NSP_IDENTIFY = 13, /* Read NSP version */
 
__MAX_SPCODE,
 };
@@ -493,3 +494,9 @@ int nfp_nsp_write_eth_table(struct nfp_nsp *state,
return nfp_nsp_command_buf(state, SPCODE_ETH_CONTROL, size, buf, size,
   NULL, 0);
 }
+
+int nfp_nsp_read_identify(struct nfp_nsp *state, void *buf, unsigned int size)
+{
+   return nfp_nsp_command_buf(state, SPCODE_NSP_IDENTIFY, size, NULL, 0,
+  buf, size);
+}
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h 
b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h
index 7d34ff145fd7..36b21e4dc56d 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h
@@ -147,4 +147,28 @@ int __nfp_eth_set_aneg(struct nfp_nsp *nsp, enum 
nfp_eth_aneg mode);
 int __nfp_eth_set_speed(struct nfp_nsp *nsp, unsigned int speed);
 int __nfp_eth_set_split(struct nfp_nsp *nsp, unsigned int lanes);
 
+/**
+ * struct nfp_nsp_identify - NSP static information
+ * @version:  opaque version string
+ * @flags:version flags
+ * @br_primary:   branch id of primary bootloader
+ * @br_secondary: branch id of secondary bootloader
+ * @br_nsp:   branch id of NSP
+ * @primary:  version of primary bootloader
+ * @secondary:version id of secondary bootloader
+ * @nsp:  version id of NSP
+ */
+struct nfp_nsp_identify {
+   char version[40];
+   u8 flags;
+   u8 br_primary;
+   u8 br_secondary;
+   u8 br_nsp;
+   u16 primary;
+   u16 secondary;
+   u16 nsp;
+};
+
+struct nfp_nsp_identify *

[PATCH net-next v3 0/5] nfp: DMA flags, adjust head and fixes

2017-04-22 Thread Jakub Kicinski
Hi!

This series takes advantage of Alex's DMA_ATTR_SKIP_CPU_SYNC to make 
XDP packet modifications "correct" from DMA API point of view.  It 
also allows us to parse the metadata before we run XDP at no additional
DMA sync cost.  That way we can get rid of the metadata memcpy, and 
remove the last upstream user of bpf_prog->xdp_adjust_head.

David's patch adds a way to read capabilities from the management
firmware.

There are also two net-next fixes.  Patch 4 which fixes what seems to
be a result of a botched rebase on my part.  Patch 5 corrects locking
when state of ethernet ports is being refreshed.

---
v3: move the sync from alloc func to the actual give to hw func
v2: sync rx buffers before giving them to the card (Alex)


David Brunecz (1):
  nfp: add NSP routine to get static information

Jakub Kicinski (4):
  nfp: make use of the DMA_ATTR_SKIP_CPU_SYNC attr
  nfp: parse metadata prepend before XDP runs
  nfp: fix free list buffer size reporting
  nfp: remove the refresh of all ports optimization

 drivers/net/ethernet/netronome/nfp/Makefile|   1 +
 drivers/net/ethernet/netronome/nfp/nfp_main.c  |   7 ++
 drivers/net/ethernet/netronome/nfp/nfp_net.h   |   9 +-
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 125 -
 .../net/ethernet/netronome/nfp/nfp_net_ethtool.c   |  13 ++-
 drivers/net/ethernet/netronome/nfp/nfp_net_main.c  |  67 +++
 drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h   |   1 +
 .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c   |   7 ++
 .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h   |  24 
 .../ethernet/netronome/nfp/nfpcore/nfp_nsp_cmds.c  |  89 +++
 10 files changed, 265 insertions(+), 78 deletions(-)
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_cmds.c

-- 
2.11.0



[PATCH net-next v3 1/5] nfp: make use of the DMA_ATTR_SKIP_CPU_SYNC attr

2017-04-22 Thread Jakub Kicinski
DMA unmap may destroy changes CPU made to the buffer.  To make XDP
run correctly on non-x86 platforms we should use the
DMA_ATTR_SKIP_CPU_SYNC attribute.

Thanks to using the attribute we can now push the sync operation to the
common code path from XDP handler.

A little bit of variable name reshuffling is required to bring the
code back to readable state.

Signed-off-by: Jakub Kicinski 
---
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 53 ++
 1 file changed, 35 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index e2197160e4dc..f1128d12cd24 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -87,16 +87,31 @@ void nfp_net_get_fw_version(struct nfp_net_fw_version 
*fw_ver,
 
 static dma_addr_t nfp_net_dma_map_rx(struct nfp_net_dp *dp, void *frag)
 {
-   return dma_map_single(dp->dev, frag + NFP_NET_RX_BUF_HEADROOM,
- dp->fl_bufsz - NFP_NET_RX_BUF_NON_DATA,
- dp->rx_dma_dir);
+   return dma_map_single_attrs(dp->dev, frag + NFP_NET_RX_BUF_HEADROOM,
+   dp->fl_bufsz - NFP_NET_RX_BUF_NON_DATA,
+   dp->rx_dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
+}
+
+static void
+nfp_net_dma_sync_dev_rx(const struct nfp_net_dp *dp, dma_addr_t dma_addr)
+{
+   dma_sync_single_for_device(dp->dev, dma_addr,
+  dp->fl_bufsz - NFP_NET_RX_BUF_NON_DATA,
+  dp->rx_dma_dir);
 }
 
 static void nfp_net_dma_unmap_rx(struct nfp_net_dp *dp, dma_addr_t dma_addr)
 {
-   dma_unmap_single(dp->dev, dma_addr,
-dp->fl_bufsz - NFP_NET_RX_BUF_NON_DATA,
-dp->rx_dma_dir);
+   dma_unmap_single_attrs(dp->dev, dma_addr,
+  dp->fl_bufsz - NFP_NET_RX_BUF_NON_DATA,
+  dp->rx_dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
+}
+
+static void nfp_net_dma_sync_cpu_rx(struct nfp_net_dp *dp, dma_addr_t dma_addr,
+   unsigned int len)
+{
+   dma_sync_single_for_cpu(dp->dev, dma_addr - NFP_NET_RX_BUF_HEADROOM,
+   len, dp->rx_dma_dir);
 }
 
 /* Firmware reconfig
@@ -1208,6 +1223,8 @@ static void nfp_net_rx_give_one(const struct nfp_net_dp 
*dp,
 
wr_idx = rx_ring->wr_p & (rx_ring->cnt - 1);
 
+   nfp_net_dma_sync_dev_rx(dp, dma_addr);
+
/* Stash SKB and DMA address away */
rx_ring->rxbufs[wr_idx].frag = frag;
rx_ring->rxbufs[wr_idx].dma_addr = dma_addr;
@@ -1569,7 +1586,7 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, 
int budget)
tx_ring = r_vec->xdp_ring;
 
while (pkts_polled < budget) {
-   unsigned int meta_len, data_len, data_off, pkt_len;
+   unsigned int meta_len, data_len, meta_off, pkt_len, pkt_off;
u8 meta_prepend[NFP_NET_MAX_PREPEND];
struct nfp_net_rx_buf *rxbuf;
struct nfp_net_rx_desc *rxd;
@@ -1608,11 +1625,12 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, 
int budget)
data_len = le16_to_cpu(rxd->rxd.data_len);
pkt_len = data_len - meta_len;
 
+   pkt_off = NFP_NET_RX_BUF_HEADROOM + dp->rx_dma_off;
if (dp->rx_offset == NFP_NET_CFG_RX_OFFSET_DYNAMIC)
-   data_off = NFP_NET_RX_BUF_HEADROOM + meta_len;
+   pkt_off += meta_len;
else
-   data_off = NFP_NET_RX_BUF_HEADROOM + dp->rx_offset;
-   data_off += dp->rx_dma_off;
+   pkt_off += dp->rx_offset;
+   meta_off = pkt_off - meta_len;
 
/* Stats update */
u64_stats_update_begin(&r_vec->rx_sync);
@@ -1621,7 +1639,7 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, 
int budget)
u64_stats_update_end(&r_vec->rx_sync);
 
/* Pointer to start of metadata */
-   meta = rxbuf->frag + data_off - meta_len;
+   meta = rxbuf->frag + meta_off;
 
if (unlikely(meta_len > NFP_NET_MAX_PREPEND ||
 (dp->rx_offset && meta_len > dp->rx_offset))) {
@@ -1631,6 +1649,9 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, 
int budget)
continue;
}
 
+   nfp_net_dma_sync_cpu_rx(dp, rxbuf->dma_addr + meta_off,
+   data_len);
+
if (xdp_prog && !(rxd->rxd.flags & PCIE_DESC_RX_BPF &&
  dp->bpf_offload_xdp)) {
unsigned int dma_off;
@@ -1638,10 +1659,6 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, 
int budget)
int act;
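
The offset arithmetic introduced in the nfp_net_rx() hunk above can be
checked in isolation. The following is a small sketch of the same
computation, assuming the "dynamic" rx_offset setting is encoded as 0
and using a made-up headroom of 64 bytes; DEMO_HEADROOM,
DEMO_OFFSET_DYNAMIC, demo_offsets and demo_rx_offsets() are
illustrative names, not driver symbols.

```c
#include <assert.h>

#define DEMO_HEADROOM       64u /* hypothetical headroom in bytes */
#define DEMO_OFFSET_DYNAMIC 0u  /* assumed encoding of "dynamic" mode */

struct demo_offsets {
	unsigned int pkt_off;  /* start of packet data within the buffer */
	unsigned int meta_off; /* start of the metadata prepend */
};

static struct demo_offsets demo_rx_offsets(unsigned int rx_dma_off,
					   unsigned int rx_offset,
					   unsigned int meta_len)
{
	struct demo_offsets o;

	/* In fixed-offset mode the metadata must fit in the reserved gap. */
	assert(rx_offset == DEMO_OFFSET_DYNAMIC || meta_len <= rx_offset);

	o.pkt_off = DEMO_HEADROOM + rx_dma_off;
	if (rx_offset == DEMO_OFFSET_DYNAMIC)
		o.pkt_off += meta_len;  /* packet follows the metadata */
	else
		o.pkt_off += rx_offset; /* packet sits at a fixed offset */

	/* Metadata always ends exactly where the packet begins. */
	o.meta_off = o.pkt_off - meta_len;
	return o;
}
```

With these assumed values, dynamic mode with 8 bytes of metadata gives
pkt_off 72 and meta_off 64, while a fixed offset of 32 with a DMA
offset of 256 gives pkt_off 352 and meta_off 344.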
 

[PATCH net-next v3 2/5] nfp: parse metadata prepend before XDP runs

2017-04-22 Thread Jakub Kicinski
Calling memcpy to shift metadata out of the way for XDP to run
seems like an overkill.  The most common metadata contents are
8 bytes containing type and flow hash.  Simply parse the metadata
before we run XDP.

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_net.h   |  6 ++
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 67 +++---
 2 files changed, 40 insertions(+), 33 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h 
b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index 052db9208fbb..8302a2d688da 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -284,6 +284,12 @@ struct nfp_net_rx_desc {
 
 #define NFP_NET_META_FIELD_MASK GENMASK(NFP_NET_META_FIELD_SIZE - 1, 0)
 
+struct nfp_meta_parsed {
+   u32 hash_type;
+   u32 hash;
+   u32 mark;
+};
+
 struct nfp_net_rx_hash {
__be32 hash_type;
__be32 hash;
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index f1128d12cd24..3285053bece0 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1402,8 +1402,9 @@ static void nfp_net_rx_csum(struct nfp_net_dp *dp,
}
 }
 
-static void nfp_net_set_hash(struct net_device *netdev, struct sk_buff *skb,
-unsigned int type, __be32 *hash)
+static void
+nfp_net_set_hash(struct net_device *netdev, struct nfp_meta_parsed *meta,
+unsigned int type, __be32 *hash)
 {
if (!(netdev->features & NETIF_F_RXHASH))
return;
@@ -1412,16 +1413,18 @@ static void nfp_net_set_hash(struct net_device *netdev, 
struct sk_buff *skb,
case NFP_NET_RSS_IPV4:
case NFP_NET_RSS_IPV6:
case NFP_NET_RSS_IPV6_EX:
-   skb_set_hash(skb, get_unaligned_be32(hash), PKT_HASH_TYPE_L3);
+   meta->hash_type = PKT_HASH_TYPE_L3;
break;
default:
-   skb_set_hash(skb, get_unaligned_be32(hash), PKT_HASH_TYPE_L4);
+   meta->hash_type = PKT_HASH_TYPE_L4;
break;
}
+
+   meta->hash = get_unaligned_be32(hash);
 }
 
 static void
-nfp_net_set_hash_desc(struct net_device *netdev, struct sk_buff *skb,
+nfp_net_set_hash_desc(struct net_device *netdev, struct nfp_meta_parsed *meta,
  void *data, struct nfp_net_rx_desc *rxd)
 {
struct nfp_net_rx_hash *rx_hash = data;
@@ -1429,12 +1432,12 @@ nfp_net_set_hash_desc(struct net_device *netdev, struct 
sk_buff *skb,
if (!(rxd->rxd.flags & PCIE_DESC_RX_RSS))
return;
 
-   nfp_net_set_hash(netdev, skb, get_unaligned_be32(&rx_hash->hash_type),
+   nfp_net_set_hash(netdev, meta, get_unaligned_be32(&rx_hash->hash_type),
 &rx_hash->hash);
 }
 
 static void *
-nfp_net_parse_meta(struct net_device *netdev, struct sk_buff *skb,
+nfp_net_parse_meta(struct net_device *netdev, struct nfp_meta_parsed *meta,
   void *data, int meta_len)
 {
u32 meta_info;
@@ -1446,13 +1449,13 @@ nfp_net_parse_meta(struct net_device *netdev, struct 
sk_buff *skb,
switch (meta_info & NFP_NET_META_FIELD_MASK) {
case NFP_NET_META_HASH:
meta_info >>= NFP_NET_META_FIELD_SIZE;
-   nfp_net_set_hash(netdev, skb,
+   nfp_net_set_hash(netdev, meta,
 meta_info & NFP_NET_META_FIELD_MASK,
 (__be32 *)data);
data += 4;
break;
case NFP_NET_META_MARK:
-   skb->mark = get_unaligned_be32(data);
+   meta->mark = get_unaligned_be32(data);
data += 4;
break;
default:
@@ -1587,12 +1590,11 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, 
int budget)
 
while (pkts_polled < budget) {
unsigned int meta_len, data_len, meta_off, pkt_len, pkt_off;
-   u8 meta_prepend[NFP_NET_MAX_PREPEND];
struct nfp_net_rx_buf *rxbuf;
struct nfp_net_rx_desc *rxd;
+   struct nfp_meta_parsed meta;
dma_addr_t new_dma_addr;
void *new_frag;
-   u8 *meta;
 
idx = rx_ring->rd_p & (rx_ring->cnt - 1);
 
@@ -1605,6 +1607,8 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, 
int budget)
 */
dma_rmb();
 
+   memset(&meta, 0, sizeof(meta));
+
rx_ring->rd_p++;
pkts_polled++;
 
@@ -1638,9 +1642,6 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, 
int budget)
r_vec->rx_bytes += pkt_len;
u64_stats_update_end(&r_vec->rx_sync);
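
As a reading aid, here is a simplified userspace sketch of parsing a
metadata prepend into a small struct instead of writing into the skb,
which is the idea this patch describes. The layout is an assumption
made for the example: a big-endian 32-bit word of 4-bit field tags
followed by one 32-bit value per field. The DEMO_* constants and the
demo_* names are hypothetical and do not claim to match the driver's
definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_FIELD_SIZE 4u /* assumed bits per field tag */
#define DEMO_FIELD_MASK ((1u << DEMO_FIELD_SIZE) - 1u)
#define DEMO_META_HASH  1u /* hypothetical tag values */
#define DEMO_META_MARK  2u

struct demo_meta_parsed {
	uint32_t hash_type;
	uint32_t hash;
	uint32_t mark;
};

/* Read a big-endian 32-bit value from a possibly unaligned pointer. */
static uint32_t demo_get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns the number of bytes consumed, or -1 on malformed metadata. */
static int demo_parse_meta(struct demo_meta_parsed *meta,
			   const uint8_t *data, size_t meta_len)
{
	const uint8_t *p = data;
	uint32_t meta_info;

	assert(meta != NULL && data != NULL);

	if (meta_len < 4)
		return -1;
	meta_info = demo_get_be32(p);
	p += 4;

	while (meta_info) {
		if ((size_t)(p - data) + 4 > meta_len)
			return -1; /* field value would run past the prepend */

		switch (meta_info & DEMO_FIELD_MASK) {
		case DEMO_META_HASH:
			/* The next tag slot carries the hash type. */
			meta_info >>= DEMO_FIELD_SIZE;
			meta->hash_type = meta_info & DEMO_FIELD_MASK;
			meta->hash = demo_get_be32(p);
			p += 4;
			break;
		case DEMO_META_MARK:
			meta->mark = demo_get_be32(p);
			p += 4;
			break;
		default:
			return -1; /* unknown field tag */
		}

		meta_info >>= DEMO_FIELD_SIZE;
	}

	return (int)(p - data);
}
```

Because the parsed values live in a plain struct, they can be filled in
before XDP runs and copied to the skb later, only if an skb is actually
built.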
 

[PATCH net-next v3 4/5] nfp: fix free list buffer size reporting

2017-04-22 Thread Jakub Kicinski
XDP headroom should not be included in free list buffer size.

Fixes: 6fe0c3b43804 ("nfp: add support for xdp_adjust_head()")
Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 3285053bece0..8a9b74305493 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -2165,7 +2165,7 @@ nfp_net_tx_ring_hw_cfg_write(struct nfp_net *nn,
  */
 static int nfp_net_set_config_and_enable(struct nfp_net *nn)
 {
-   u32 new_ctrl, update = 0;
+   u32 bufsz, new_ctrl, update = 0;
unsigned int r;
int err;
 
@@ -2199,8 +2199,9 @@ static int nfp_net_set_config_and_enable(struct nfp_net 
*nn)
nfp_net_write_mac_addr(nn);
 
nn_writel(nn, NFP_NET_CFG_MTU, nn->dp.netdev->mtu);
-   nn_writel(nn, NFP_NET_CFG_FLBUFSZ,
- nn->dp.fl_bufsz - NFP_NET_RX_BUF_NON_DATA);
+
+   bufsz = nn->dp.fl_bufsz - nn->dp.rx_dma_off - NFP_NET_RX_BUF_NON_DATA;
+   nn_writel(nn, NFP_NET_CFG_FLBUFSZ, bufsz);
 
/* Enable device */
new_ctrl |= NFP_NET_CFG_CTRL_ENABLE;
-- 
2.11.0
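
A quick arithmetic check of the corrected buffer size may be useful
here. This is a sketch with made-up numbers: DEMO_RX_BUF_NON_DATA and
demo_fl_bufsz_reported() are hypothetical, and the byte values are
chosen only to make the subtraction easy to follow.

```c
#include <assert.h>

#define DEMO_RX_BUF_NON_DATA 384u /* hypothetical non-data overhead */

/*
 * Size reported to the device: the full buffer minus the XDP headroom
 * (rx_dma_off) and minus the non-data overhead.
 */
static unsigned int demo_fl_bufsz_reported(unsigned int fl_bufsz,
					   unsigned int rx_dma_off)
{
	assert(fl_bufsz >= rx_dma_off + DEMO_RX_BUF_NON_DATA);

	return fl_bufsz - rx_dma_off - DEMO_RX_BUF_NON_DATA;
}
```

Under these assumptions, a 2432-byte buffer without XDP headroom and a
2688-byte buffer with 256 bytes of headroom both report 2048 bytes,
which is the behaviour the commit message asks for: the headroom is not
counted as space the device may write into.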



Re: [PATCH v2 net] net: ipv6: regenerate host route if moved to gc list

2017-04-22 Thread Martin KaFai Lau
On Sat, Apr 22, 2017 at 07:12:34PM -0600, David Ahern wrote:
> On 4/22/17 4:00 PM, Martin KaFai Lau wrote:
> > On Sat, Apr 22, 2017 at 09:40:37AM -0700, David Ahern wrote:
> > [...]
> >> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> >> index 08f9e8ea7a81..97e86158bbcb 100644
> >> --- a/net/ipv6/addrconf.c
> >> +++ b/net/ipv6/addrconf.c
> >> @@ -3303,14 +3303,24 @@ static void addrconf_gre_config(struct net_device 
> >> *dev)
> >>  static int fixup_permanent_addr(struct inet6_dev *idev,
> >>struct inet6_ifaddr *ifp)
> >>  {
> >> -  if (!ifp->rt) {
> >> -  struct rt6_info *rt;
> >> +  /* rt6i_ref == 0 means the host route was removed from the
> >> +   * FIB, for example, if 'lo' device is taken down. In that
> >> +   * case regenerate the host route.
> >> +   */
> >> +  if (!ifp->rt || !atomic_read(&ifp->rt->rt6i_ref)) {
> >> +  struct rt6_info *rt, *prev;
> >>
> >>rt = addrconf_dst_alloc(idev, &ifp->addr, false);
> > The rt regeneration makes sense.
> >
> >>if (unlikely(IS_ERR(rt)))
> >>return PTR_ERR(rt);
> >>
> >> +  spin_lock(&ifp->lock);
> >> +  prev = ifp->rt;
> >>ifp->rt = rt;
> > I am still missing something on the new spin_lock:
> > 1) Is there an existing race in the existing
> >    ifp->rt modification ('ifp->rt = rt') which is
> >not related to this bug?
> > 2) If there is a race in ifp->rt, is the above if-checks
> >on ifp->rt racy and need protection also? F.e. 'ifp->rt->rt6i_ref'
> >since ifp->rt could be NULL or ifp->rt->rt6i_ref
> >may not be zero later if there is concurrent
> >modification on ifp->rt?
>
> As I understand it:
> - rt6i_ref is modified by the fib code (adding and removing to tree) and
> always under RTNL.
> - ifp->rt is only *set* under RTNL, but is accessed without (dad via
> workqueue and sysctl).
>
> The code path to fixup_permanent_addr is under RTNL, so the if check on
> ifp->rt and rt6i_ref is ok -- neither can be changed since RTNL is held.
>
> Since ifp->rt can be accessed outside of RTNL, the spinlock is needed to
> change its value.
Got it. It is to protect the readers which are not under RTNL.
Many thanks for pointing out what I was missing.  It all makes sense now.

> Arguably only 'ifp->rt = rt;' needs the spinlock.
It still seems like the existing 'ifp->rt = rt;' needs protection
anyway regardless of the rt regeneration change.  It would be nice to
explain it in the commit log or even better separating it out
into another patch.

>
> There are many twists and turns with the ipv6 code.
Nod Nod :)

>
> >
> >> +  spin_unlock(&ifp->lock);
> >> +
> >> +  if (prev)
> >> +  ip6_rt_put(prev);
> > Nit. ip6_rt_put() takes NULL.
>
> ok.
>


Re: [net-next 03/11] ixgbe: add support for XDP_TX action

2017-04-22 Thread Jakub Kicinski
On Thu, 20 Apr 2017 18:50:21 -0700, Jeff Kirsher wrote:
> +static int ixgbe_xdp_queues(struct ixgbe_adapter *adapter)
> +{
> + if (nr_cpu_ids > MAX_XDP_QUEUES)
> + return 0;
> +
> + return adapter->xdp_prog ? nr_cpu_ids : 0;
> +}

Nit: AFAICT ixgbe_xdp_setup() will guarantee xdp_prog is not set if
there are too many CPU ids.

> @@ -6120,10 +6193,21 @@ static int ixgbe_setup_all_tx_resources(struct 
> ixgbe_adapter *adapter)
>   e_err(probe, "Allocation for Tx Queue %u failed\n", i);
>   goto err_setup_tx;
>   }
> + for (j = 0; j < adapter->num_xdp_queues; j++) {
> + err = ixgbe_setup_tx_resources(adapter->xdp_ring[j]);
> + if (!err)
> + continue;
> +
> + e_err(probe, "Allocation for Tx Queue %u failed\n", j);
> + goto err_setup_tx;
> + }
> +
>  

Nit: extra line here

> @@ -9557,7 +9739,21 @@ static int ixgbe_xdp_setup(struct net_device *dev, 
> struct bpf_prog *prog)
>   return -EINVAL;
>   }
>  
> + if (nr_cpu_ids > MAX_XDP_QUEUES)
> + return -ENOMEM;
> +
>   old_prog = xchg(&adapter->xdp_prog, prog);
> +
> + /* If transitioning XDP modes reconfigure rings */
> + if (!!prog != !!old_prog) {
> + int err = ixgbe_setup_tc(dev, netdev_get_num_tc(dev));
> +
> + if (err) {
> + rcu_assign_pointer(adapter->xdp_prog, old_prog);
> + return -EINVAL;
> + }
> + }
> +
>   for (i = 0; i < adapter->num_rx_queues; i++)
>   xchg(&adapter->rx_ring[i]->xdp_prog, adapter->xdp_prog);
>  

When XDP is being disabled, I assume ixgbe_setup_tc() will free the rings
before the xdp_prog on the rings is swapped to NULL.  Is there anything
preventing TX in that time window?  I think the usual ordering would be to
install the prog after the reconfig but uninstall it before.
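
To make the ordering concern concrete, here is a minimal userspace sketch. It is not ixgbe code: fake_adapter, xdp_enable() and xdp_disable() are hypothetical names, and the only thing modelled is the invariant that a program must never be visible while the XDP TX rings are missing. A real driver would presumably also have to wait for in-flight polling to finish between the two steps, which this sketch does not model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for driver state; not the ixgbe structures. */
struct fake_adapter {
	const void *xdp_prog;	/* non-NULL while an XDP program is installed */
	bool xdp_rings_ready;	/* true while the XDP TX rings are allocated */
};

/* Invariant: a program may only be visible while the XDP TX rings exist. */
static bool xdp_state_ok(const struct fake_adapter *a)
{
	return a->xdp_prog == NULL || a->xdp_rings_ready;
}

/* Enable: allocate the rings first, then publish the program. */
static void xdp_enable(struct fake_adapter *a, const void *prog)
{
	a->xdp_rings_ready = true;
	assert(xdp_state_ok(a));
	a->xdp_prog = prog;
	assert(xdp_state_ok(a));
}

/* Disable: unpublish the program first, then free the rings. */
static void xdp_disable(struct fake_adapter *a)
{
	a->xdp_prog = NULL;
	assert(xdp_state_ok(a));
	a->xdp_rings_ready = false;
	assert(xdp_state_ok(a));
}
```

Swapping the two statements in xdp_disable() would trip the first assertion, which is the window the question above is asking about.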


Re: [PATCH v2 net] net: ipv6: regenerate host route if moved to gc list

2017-04-22 Thread David Ahern
On 4/22/17 4:00 PM, Martin KaFai Lau wrote:
> On Sat, Apr 22, 2017 at 09:40:37AM -0700, David Ahern wrote:
> [...]
>> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
>> index 08f9e8ea7a81..97e86158bbcb 100644
>> --- a/net/ipv6/addrconf.c
>> +++ b/net/ipv6/addrconf.c
>> @@ -3303,14 +3303,24 @@ static void addrconf_gre_config(struct net_device 
>> *dev)
>>  static int fixup_permanent_addr(struct inet6_dev *idev,
>>  struct inet6_ifaddr *ifp)
>>  {
>> -if (!ifp->rt) {
>> -struct rt6_info *rt;
>> +/* rt6i_ref == 0 means the host route was removed from the
>> + * FIB, for example, if 'lo' device is taken down. In that
>> + * case regenerate the host route.
>> + */
>> +if (!ifp->rt || !atomic_read(&ifp->rt->rt6i_ref)) {
>> +struct rt6_info *rt, *prev;
>>
>>  rt = addrconf_dst_alloc(idev, &ifp->addr, false);
> The rt regeneration makes sense.
> 
>>  if (unlikely(IS_ERR(rt)))
>>  return PTR_ERR(rt);
>>
>> +spin_lock(&ifp->lock);
>> +prev = ifp->rt;
>>  ifp->rt = rt;
> I am still missing something on the new spin_lock:
> 1) Is there an existing race in the existing
>ifp->rt modification ('ifp->rt = rt') which is
>not related to this bug?
> 2) If there is a race in ifp->rt, is the above if-checks
>on ifp->rt racy and need protection also? F.e. 'ifp->rt->rt6i_ref'
>since ifp->rt could be NULL or ifp->rt->rt6i_ref
>may not be zero later if there is concurrent
>modification on ifp->rt?

As I understand it:
- rt6i_ref is modified by the fib code (adding and removing to tree) and
always under RTNL.
- ifp->rt is only *set* under RTNL, but is accessed without (dad via
workqueue and sysctl).

The code path to fixup_permanent_addr is under RTNL, so the if check on
ifp->rt and rt6i_ref is ok -- neither can be changed since RTNL is held.

Since ifp->rt can be accessed outside of RTNL, the spinlock is needed to
change its value. Arguably only 'ifp->rt = rt;' needs the spinlock.

Let me know if I am missing something. There are many twists and turns
with the ipv6 code.

> 
>> +spin_unlock(&ifp->lock);
>> +
>> +if (prev)
>> +ip6_rt_put(prev);
> Nit. ip6_rt_put() takes NULL.

ok.
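
The locking rule discussed above (the pointer is only written under one big lock, but read by paths that do not hold it, so the write itself takes a small per-object lock) can be sketched in userspace as follows. This is only an analogy under stated assumptions: fake_route, fake_ifaddr and the helper names are hypothetical, a pthread mutex stands in for ifp->lock, and a plain integer stands in for the route reference count. Whether the kernel readers take a reference under the lock is not something this sketch claims.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's rt6_info or inet6_ifaddr. */
struct fake_route {
	int refcnt;
};

struct fake_ifaddr {
	pthread_mutex_t lock;	/* plays the role of ifp->lock */
	struct fake_route *rt;	/* read by paths that do not hold the big lock */
};

/* Drop a reference; accepts NULL so callers need no check of their own. */
static void route_put(struct fake_route *rt)
{
	if (rt)
		rt->refcnt--;
}

/* Writer: swap the pointer under the lock, release the old route outside it. */
static void ifaddr_replace_route(struct fake_ifaddr *ifp, struct fake_route *new_rt)
{
	struct fake_route *prev;

	pthread_mutex_lock(&ifp->lock);
	prev = ifp->rt;
	ifp->rt = new_rt;
	pthread_mutex_unlock(&ifp->lock);

	route_put(prev);
}

/* Reader: hold the lock just long enough to take a reference. */
static struct fake_route *ifaddr_get_route(struct fake_ifaddr *ifp)
{
	struct fake_route *rt;

	pthread_mutex_lock(&ifp->lock);
	rt = ifp->rt;
	if (rt)
		rt->refcnt++;
	pthread_mutex_unlock(&ifp->lock);
	return rt;
}
```

Because route_put() tolerates NULL, the writer needs no "if (prev)" test, which is the same point as the nit about ip6_rt_put().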



Re: [PATCH v2 net] net: ipv6: regenerate host route if moved to gc list

2017-04-22 Thread Martin KaFai Lau
On Sat, Apr 22, 2017 at 09:40:37AM -0700, David Ahern wrote:
[...]
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 08f9e8ea7a81..97e86158bbcb 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -3303,14 +3303,24 @@ static void addrconf_gre_config(struct net_device 
> *dev)
>  static int fixup_permanent_addr(struct inet6_dev *idev,
>   struct inet6_ifaddr *ifp)
>  {
> - if (!ifp->rt) {
> - struct rt6_info *rt;
> + /* rt6i_ref == 0 means the host route was removed from the
> +  * FIB, for example, if 'lo' device is taken down. In that
> +  * case regenerate the host route.
> +  */
> + if (!ifp->rt || !atomic_read(&ifp->rt->rt6i_ref)) {
> + struct rt6_info *rt, *prev;
>
>   rt = addrconf_dst_alloc(idev, &ifp->addr, false);
The rt regeneration makes sense.

>   if (unlikely(IS_ERR(rt)))
>   return PTR_ERR(rt);
>
> + spin_lock(&ifp->lock);
> + prev = ifp->rt;
>   ifp->rt = rt;
I am still missing something on the new spin_lock:
1) Is there an existing race in the existing
   ifp->rt modification ('ifp->rt = rt') which is
   not related to this bug?
2) If there is a race in ifp->rt, is the above if-checks
   on ifp->rt racy and need protection also? F.e. 'ifp->rt->rt6i_ref'
   since ifp->rt could be NULL or ifp->rt->rt6i_ref
   may not be zero later if there is concurrent
   modification on ifp->rt?

> + spin_unlock(&ifp->lock);
> +
> + if (prev)
> + ip6_rt_put(prev);
Nit. ip6_rt_put() takes NULL.

>   }
>
>   if (!(ifp->flags & IFA_F_NOPREFIXROUTE)) {
> --
> 2.1.4
>


[PATCH net-next 2/2] cls_flower: add support for matching MPLS fields (v2)

2017-04-22 Thread Benjamin LaHaise
Add support to the tc flower classifier to match based on fields in MPLS
labels (TTL, Bottom of Stack, TC field, Label).

Signed-off-by: Benjamin LaHaise 
Reviewed-by: Jakub Kicinski 
Cc: "David S. Miller" 
Cc: Simon Horman 
Cc: Jamal Hadi Salim 
Cc: Cong Wang 
Cc: Jiri Pirko 
Cc: Eric Dumazet 
Cc: Hadar Hen Zion 
Cc: Gao Feng 
---
 include/uapi/linux/pkt_cls.h |  5 +++
 net/sched/cls_flower.c   | 74 
 2 files changed, 79 insertions(+)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 7a69f2a..f1129e3 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -432,6 +432,11 @@ enum {
TCA_FLOWER_KEY_ARP_THA, /* ETH_ALEN */
TCA_FLOWER_KEY_ARP_THA_MASK,/* ETH_ALEN */
 
+   TCA_FLOWER_KEY_MPLS_TTL,/* u8 - 8 bits */
+   TCA_FLOWER_KEY_MPLS_BOS,/* u8 - 1 bit */
+   TCA_FLOWER_KEY_MPLS_TC, /* u8 - 3 bits */
+   TCA_FLOWER_KEY_MPLS_LABEL,  /* be32 - 20 bits */
+
__TCA_FLOWER_MAX,
 };
 
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 31ee340..3ecf076 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -47,6 +48,7 @@ struct fl_flow_key {
struct flow_dissector_key_ipv6_addrs enc_ipv6;
};
struct flow_dissector_key_ports enc_tp;
+   struct flow_dissector_key_mpls mpls;
 } __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as longs. 
*/
 
 struct fl_flow_mask_range {
@@ -418,6 +420,10 @@ static const struct nla_policy fl_policy[TCA_FLOWER_MAX + 
1] = {
[TCA_FLOWER_KEY_ARP_SHA_MASK]   = { .len = ETH_ALEN },
[TCA_FLOWER_KEY_ARP_THA]= { .len = ETH_ALEN },
[TCA_FLOWER_KEY_ARP_THA_MASK]   = { .len = ETH_ALEN },
+   [TCA_FLOWER_KEY_MPLS_TTL]   = { .type = NLA_U8 },
+   [TCA_FLOWER_KEY_MPLS_BOS]   = { .type = NLA_U8 },
+   [TCA_FLOWER_KEY_MPLS_TC]= { .type = NLA_U8 },
+   [TCA_FLOWER_KEY_MPLS_LABEL] = { .type = NLA_U32 },
 };
 
 static void fl_set_key_val(struct nlattr **tb,
@@ -433,6 +439,31 @@ static void fl_set_key_val(struct nlattr **tb,
memcpy(mask, nla_data(tb[mask_type]), len);
 }
 
+static void fl_set_key_mpls(struct nlattr **tb,
+   struct flow_dissector_key_mpls *key_val,
+   struct flow_dissector_key_mpls *key_mask)
+{
+   if (tb[TCA_FLOWER_KEY_MPLS_TTL]) {
+   key_val->mpls_ttl = nla_get_u8(tb[TCA_FLOWER_KEY_MPLS_TTL]);
+   key_mask->mpls_ttl = MPLS_TTL_MASK;
+   }
+   if (tb[TCA_FLOWER_KEY_MPLS_BOS]) {
+   key_val->mpls_bos = nla_get_u8(tb[TCA_FLOWER_KEY_MPLS_BOS]);
+   key_mask->mpls_bos = MPLS_BOS_MASK;
+   }
+   if (tb[TCA_FLOWER_KEY_MPLS_TC]) {
+   key_val->mpls_tc =
+   nla_get_u8(tb[TCA_FLOWER_KEY_MPLS_TC]) & MPLS_TC_MASK;
+   key_mask->mpls_tc = MPLS_TC_MASK;
+   }
+   if (tb[TCA_FLOWER_KEY_MPLS_LABEL]) {
+   key_val->mpls_label =
+   nla_get_u32(tb[TCA_FLOWER_KEY_MPLS_LABEL]) &
+   MPLS_LABEL_MASK;
+   key_mask->mpls_label = MPLS_LABEL_MASK;
+   }
+}
+
 static void fl_set_key_vlan(struct nlattr **tb,
struct flow_dissector_key_vlan *key_val,
struct flow_dissector_key_vlan *key_mask)
@@ -589,6 +620,9 @@ static int fl_set_key(struct net *net, struct nlattr **tb,
   &mask->icmp.code,
   TCA_FLOWER_KEY_ICMPV6_CODE_MASK,
   sizeof(key->icmp.code));
+   } else if (key->basic.n_proto == htons(ETH_P_MPLS_UC) ||
+  key->basic.n_proto == htons(ETH_P_MPLS_MC)) {
+   fl_set_key_mpls(tb, &key->mpls, &mask->mpls);
} else if (key->basic.n_proto == htons(ETH_P_ARP) ||
   key->basic.n_proto == htons(ETH_P_RARP)) {
fl_set_key_val(tb, &key->arp.sip, TCA_FLOWER_KEY_ARP_SIP,
@@ -725,6 +759,8 @@ static void fl_init_dissector(struct cls_fl_head *head,
FL_KEY_SET_IF_MASKED(&mask->key, keys, cnt,
 FLOW_DISSECTOR_KEY_ARP, arp);
FL_KEY_SET_IF_MASKED(&mask->key, keys, cnt,
+FLOW_DISSECTOR_KEY_MPLS, mpls);
+   FL_KEY_SET_IF_MASKED(&mask->key, keys, cnt,
 FLOW_DISSECTOR_KEY_VLAN, vlan);
FL_KEY_SET_IF_MASKED(&mask->key, keys, cnt,
 FLOW_DISSECTOR_KEY_ENC_KEYID, enc_key_id);
@@ -991,6 +1027,41 @@ static int fl_dump_key_val(struct sk_buff *skb,
return 0;
 }
 
+static int fl_dump_key_mpls(struct sk_buff *skb,
+   struct flow_dissector_key_mpls *mpls_key,
+ 

[PATCH net-next 0/2] flower: add MPLS matching support

2017-04-22 Thread Benjamin LaHaise
From: Benjamin LaHaise 

This patch series adds support for parsing MPLS flows in the flow dissector
and the flower classifier.  Each of the MPLS TTL, BOS, TC and Label fields
can be used for matching.

v2: incorporate style feedback, move #defines to linux/include/mpls.h
Note: this omits Jiri's request to remove tabs between the type and 
field names in struct declarations.  This would be inconsistent with 
numerous other struct definitions.

Benjamin LaHaise (2):
  flow_dissector: add mpls support (v2)
  cls_flower: add support for matching MPLS fields (v2)

 include/linux/mpls.h |  5 +++
 include/net/flow_dissector.h |  8 +
 include/uapi/linux/pkt_cls.h |  5 +++
 net/core/flow_dissector.c| 25 +--
 net/sched/cls_flower.c   | 74 
 5 files changed, 114 insertions(+), 3 deletions(-)

-- 
2.7.4
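
For readers unfamiliar with key/mask matching, the following is a small field-by-field sketch of what "matching on TTL, BOS, TC and Label" means. It is an illustration only and not how cls_flower performs its lookup; the struct and function names are hypothetical. A mask of zero makes a field a wildcard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical key layout; widths follow the MPLS label stack entry. */
struct mpls_key {
	uint32_t label;	/* 20 bits */
	uint8_t tc;	/* 3 bits */
	uint8_t bos;	/* 1 bit */
	uint8_t ttl;	/* 8 bits */
};

/* A packet matches when every masked bit equals the corresponding key bit. */
static bool mpls_match(const struct mpls_key *pkt,
		       const struct mpls_key *key,
		       const struct mpls_key *mask)
{
	return ((pkt->label ^ key->label) & mask->label) == 0 &&
	       ((pkt->tc ^ key->tc) & mask->tc) == 0 &&
	       ((pkt->bos ^ key->bos) & mask->bos) == 0 &&
	       ((pkt->ttl ^ key->ttl) & mask->ttl) == 0;
}
```

With a key of label 100 and BOS 1, and masks set only for those two fields, any TC or TTL value in the packet is accepted.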



[PATCH net-next 1/2] flow_dissector: add mpls support (v2)

2017-04-22 Thread Benjamin LaHaise
Add support for parsing MPLS flows to the flow dissector in preparation for
adding MPLS match support to cls_flower.

Signed-off-by: Benjamin LaHaise 
Reviewed-by: Jakub Kicinski 
Cc: "David S. Miller" 
Cc: Simon Horman 
Cc: Jamal Hadi Salim 
Cc: Cong Wang 
Cc: Jiri Pirko 
Cc: Eric Dumazet 
Cc: Hadar Hen Zion 
Cc: Gao Feng 
---
 include/linux/mpls.h |  5 +
 include/net/flow_dissector.h |  8 
 net/core/flow_dissector.c| 25 ++---
 3 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/include/linux/mpls.h b/include/linux/mpls.h
index 145..384fb22 100644
--- a/include/linux/mpls.h
+++ b/include/linux/mpls.h
@@ -3,4 +3,9 @@
 
 #include 
 
+#define MPLS_TTL_MASK  (MPLS_LS_TTL_MASK >> MPLS_LS_TTL_SHIFT)
+#define MPLS_BOS_MASK  (MPLS_LS_S_MASK >> MPLS_LS_S_SHIFT)
+#define MPLS_TC_MASK   (MPLS_LS_TC_MASK >> MPLS_LS_TC_SHIFT)
+#define MPLS_LABEL_MASK  (MPLS_LS_LABEL_MASK >> MPLS_LS_LABEL_SHIFT)
+
 #endif  /* _LINUX_MPLS_H */
diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
index ac97030..8d21d44 100644
--- a/include/net/flow_dissector.h
+++ b/include/net/flow_dissector.h
@@ -41,6 +41,13 @@ struct flow_dissector_key_vlan {
u16 padding;
 };
 
+struct flow_dissector_key_mpls {
+   u32 mpls_ttl:8,
+   mpls_bos:1,
+   mpls_tc:3,
+   mpls_label:20;
+};
+
 struct flow_dissector_key_keyid {
__be32  keyid;
 };
@@ -169,6 +176,7 @@ enum flow_dissector_key_id {
FLOW_DISSECTOR_KEY_ENC_IPV6_ADDRS, /* struct 
flow_dissector_key_ipv6_addrs */
FLOW_DISSECTOR_KEY_ENC_CONTROL, /* struct flow_dissector_key_control */
FLOW_DISSECTOR_KEY_ENC_PORTS, /* struct flow_dissector_key_ports */
+   FLOW_DISSECTOR_KEY_MPLS, /* struct flow_dissector_key_mpls */
 
FLOW_DISSECTOR_KEY_MAX,
 };
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index c9cf425..28d94bc 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -126,9 +126,11 @@ __skb_flow_dissect_mpls(const struct sk_buff *skb,
 {
struct flow_dissector_key_keyid *key_keyid;
struct mpls_label *hdr, _hdr[2];
+   u32 entry, label;
 
if (!dissector_uses_key(flow_dissector,
-   FLOW_DISSECTOR_KEY_MPLS_ENTROPY))
+   FLOW_DISSECTOR_KEY_MPLS_ENTROPY) &&
+   !dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_MPLS))
return FLOW_DISSECT_RET_OUT_GOOD;
 
hdr = __skb_header_pointer(skb, nhoff, sizeof(_hdr), data,
@@ -136,8 +138,25 @@ __skb_flow_dissect_mpls(const struct sk_buff *skb,
if (!hdr)
return FLOW_DISSECT_RET_OUT_BAD;
 
-   if ((ntohl(hdr[0].entry) & MPLS_LS_LABEL_MASK) >>
-   MPLS_LS_LABEL_SHIFT == MPLS_LABEL_ENTROPY) {
+   entry = ntohl(hdr[0].entry);
+   label = (entry & MPLS_LS_LABEL_MASK) >> MPLS_LS_LABEL_SHIFT;
+
+   if (dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_MPLS)) {
+   struct flow_dissector_key_mpls *key_mpls;
+
+   key_mpls = skb_flow_dissector_target(flow_dissector,
+FLOW_DISSECTOR_KEY_MPLS,
+target_container);
+   key_mpls->mpls_label = label;
+   key_mpls->mpls_ttl = (entry & MPLS_LS_TTL_MASK)
+   >> MPLS_LS_TTL_SHIFT;
+   key_mpls->mpls_tc = (entry & MPLS_LS_TC_MASK)
+   >> MPLS_LS_TC_SHIFT;
+   key_mpls->mpls_bos = (entry & MPLS_LS_S_MASK)
+   >> MPLS_LS_S_SHIFT;
+   }
+
+   if (label == MPLS_LABEL_ENTROPY) {
key_keyid = skb_flow_dissector_target(flow_dissector,
  
FLOW_DISSECTOR_KEY_MPLS_ENTROPY,
  target_container);
-- 
2.7.4
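
As a standalone illustration of the field extraction in the hunk above, here is a userspace sketch that splits one 32-bit label stack entry into its four fields. The LS_* constants are local copies that are meant to mirror the MPLS_LS_* values from the kernel's uapi mpls.h (label in bits 31..12, TC in 11..9, S in bit 8, TTL in 7..0); the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the label stack entry layout (assumed to match MPLS_LS_*). */
#define LS_LABEL_MASK	0xFFFFF000u
#define LS_LABEL_SHIFT	12
#define LS_TC_MASK	0x00000E00u
#define LS_TC_SHIFT	9
#define LS_S_MASK	0x00000100u
#define LS_S_SHIFT	8
#define LS_TTL_MASK	0x000000FFu
#define LS_TTL_SHIFT	0

struct mpls_fields {
	uint32_t label;
	uint8_t tc;
	uint8_t bos;
	uint8_t ttl;
};

/* entry must already be in host byte order, i.e. after ntohl(). */
static struct mpls_fields mpls_decode(uint32_t entry)
{
	struct mpls_fields f;

	f.label = (entry & LS_LABEL_MASK) >> LS_LABEL_SHIFT;
	f.tc = (uint8_t)((entry & LS_TC_MASK) >> LS_TC_SHIFT);
	f.bos = (uint8_t)((entry & LS_S_MASK) >> LS_S_SHIFT);
	f.ttl = (uint8_t)((entry & LS_TTL_MASK) >> LS_TTL_SHIFT);
	return f;
}

/* Inverse of mpls_decode(); out-of-range bits are masked off. */
static uint32_t mpls_encode(struct mpls_fields f)
{
	return ((f.label << LS_LABEL_SHIFT) & LS_LABEL_MASK) |
	       (((uint32_t)f.tc << LS_TC_SHIFT) & LS_TC_MASK) |
	       (((uint32_t)f.bos << LS_S_SHIFT) & LS_S_MASK) |
	       (((uint32_t)f.ttl << LS_TTL_SHIFT) & LS_TTL_MASK);
}
```

For example, the entry 0x00010B40 decodes to label 16, TC 5, bottom-of-stack 1 and TTL 64.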



Re: [PATCH] bpf: Add sparc support to tools and samples.

2017-04-22 Thread Daniel Borkmann

On 04/22/2017 10:02 PM, David Miller wrote:
> From: Daniel Borkmann
> Date: Sat, 22 Apr 2017 21:46:46 +0200
>
>> On 04/22/2017 09:38 PM, David Miller wrote:
>>>
>>> Signed-off-by: David S. Miller
>>
>> LGTM, thanks!
>>
>> Acked-by: Daniel Borkmann
>
> Great, this and the sparc64 eBPF JIT are now pushed out to net-next.

Awesome, thanks for all the work!


Re: compile issue in latest iproute2

2017-04-22 Thread Jamal Hadi Salim

On 17-04-22 12:54 PM, Stephen Hemminger wrote:
> On Sat, 22 Apr 2017 12:43:50 -0400
> Jamal Hadi Salim  wrote:
>
>> On 17-04-22 12:18 PM, Daniel Borkmann wrote:
>> [..]
>>
>>> Anything I'm missing?
>>
>> Let me get back to that machine (couple of hours) and try to see how I
>> created the issue.
>> Should've cut-and-pasted the error message. Can't recreate it on this laptop.
>>
>> cheers,
>> jamal
>
> Current tip of iproute2 master compiles fine for me
> both with and without HAVE_ELF

Sorry - I cannot recreate it. I tried from scratch, applied the patches
I was testing, and it compiled cleanly. Apologies for the alarm.

cheers,
jamal



Re: [PATCH] bpf: Add sparc support to tools and samples.

2017-04-22 Thread David Miller
From: Daniel Borkmann 
Date: Sat, 22 Apr 2017 21:46:46 +0200

> On 04/22/2017 09:38 PM, David Miller wrote:
>>
>> Signed-off-by: David S. Miller 
> 
> LGTM, thanks!
> 
> Acked-by: Daniel Borkmann 

Great, this and the sparc64 eBPF JIT are now pushed out to net-next.


Re: [PATCH] bpf: Add sparc support to tools and samples.

2017-04-22 Thread Daniel Borkmann

On 04/22/2017 09:38 PM, David Miller wrote:
>
> Signed-off-by: David S. Miller

LGTM, thanks!

Acked-by: Daniel Borkmann


tools/testing/selftests/bpf/Makefile

2017-04-22 Thread David Miller

Alexei, that unconditional -D__x86_64__ isn't going to work.  It in
fact makes the build break on sparc because the types.h asm headers
explicitly check for things like __sparc__ && __arch64__ etc.

There are other places that want stuff like this, so let's do it
right.

In every

arch/${ARCH}/Makefile

extract out the "-DXXX" stuff from CHECKFLAGS into a new Makefile
variable, expand that into CHECKFLAGS and use the new variable in
places like

tools/testing/selftests/bpf/Makefile

and

tools/testing/selftests/ipc/Makefile

Thanks.


[PATCH] bpf: Add sparc support to tools and samples.

2017-04-22 Thread David Miller

Signed-off-by: David S. Miller 
---
 samples/bpf/bpf_helpers.h  | 19 +++
 tools/build/feature/test-bpf.c |  3 +++
 tools/lib/bpf/bpf.c|  2 ++
 3 files changed, 24 insertions(+)

diff --git a/samples/bpf/bpf_helpers.h b/samples/bpf/bpf_helpers.h
index 52de9d8..9a9c95f 100644
--- a/samples/bpf/bpf_helpers.h
+++ b/samples/bpf/bpf_helpers.h
@@ -146,11 +146,30 @@ static int (*bpf_skb_change_head)(void *, int len, int 
flags) =
 #define PT_REGS_SP(x) ((x)->sp)
 #define PT_REGS_IP(x) ((x)->nip)
 
+#elif defined(__sparc__)
+
+#define PT_REGS_PARM1(x) ((x)->u_regs[UREG_I0])
+#define PT_REGS_PARM2(x) ((x)->u_regs[UREG_I1])
+#define PT_REGS_PARM3(x) ((x)->u_regs[UREG_I2])
+#define PT_REGS_PARM4(x) ((x)->u_regs[UREG_I3])
+#define PT_REGS_PARM5(x) ((x)->u_regs[UREG_I4])
+#define PT_REGS_RET(x) ((x)->u_regs[UREG_I7])
+#define PT_REGS_RC(x) ((x)->u_regs[UREG_I0])
+#define PT_REGS_SP(x) ((x)->u_regs[UREG_FP])
+#if defined(__arch64__)
+#define PT_REGS_IP(x) ((x)->tpc)
+#else
+#define PT_REGS_IP(x) ((x)->pc)
+#endif
+
 #endif
 
 #ifdef __powerpc__
 #define BPF_KPROBE_READ_RET_IP(ip, ctx)   ({ (ip) = (ctx)->link; })
 #define BPF_KRETPROBE_READ_RET_IP         BPF_KPROBE_READ_RET_IP
+#elif defined(__sparc__)
+#define BPF_KPROBE_READ_RET_IP(ip, ctx)   ({ (ip) = PT_REGS_RET(ctx); })
+#define BPF_KRETPROBE_READ_RET_IP         BPF_KPROBE_READ_RET_IP
 #else
 #define BPF_KPROBE_READ_RET_IP(ip, ctx)   ({                            \
        bpf_probe_read(&(ip), sizeof(ip), (void *)PT_REGS_RET(ctx)); })
diff --git a/tools/build/feature/test-bpf.c b/tools/build/feature/test-bpf.c
index e04ab89..ebc6dce 100644
--- a/tools/build/feature/test-bpf.c
+++ b/tools/build/feature/test-bpf.c
@@ -9,6 +9,9 @@
 #  define __NR_bpf 321
 # elif defined(__aarch64__)
 #  define __NR_bpf 280
+# elif defined(__sparc__)
+#  define __NR_bpf 349
+# else
 #  error __NR_bpf not defined. libbpf does not support your arch.
 # endif
 #endif
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index f84c398..4fe444b80 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -37,6 +37,8 @@
 #  define __NR_bpf 321
 # elif defined(__aarch64__)
 #  define __NR_bpf 280
+# elif defined(__sparc__)
+#  define __NR_bpf 349
 # else
 #  error __NR_bpf not defined. libbpf does not support your arch.
 # endif
-- 
2.1.2.532.g19b5d50



Re: [PATCH v2] net: natsemi: ns83820: add checks for dma mapping error

2017-04-22 Thread Francois Romieu
Alexey Khoroshilov  :
[...]
> diff --git a/drivers/net/ethernet/natsemi/ns83820.c 
> b/drivers/net/ethernet/natsemi/ns83820.c
> index 729095db3e08..dfc64e1e31f9 100644
> --- a/drivers/net/ethernet/natsemi/ns83820.c
> +++ b/drivers/net/ethernet/natsemi/ns83820.c
[...]
> @@ -1183,6 +1193,32 @@ static netdev_tx_t ns83820_hard_start_xmit(struct 
> sk_buff *skb,
>   netif_start_queue(ndev);
>  
>   return NETDEV_TX_OK;
> +
> +dma_error:
> + do {
> + free_idx = (free_idx + NR_TX_DESC - 1) % NR_TX_DESC;
> + desc = dev->tx_descs + (free_idx * DESC_SIZE);
> + cmdsts = le32_to_cpu(desc[DESC_CMDSTS]);
> + len = cmdsts & CMDSTS_LEN_MASK;
> + buf = desc_addr_get(desc + DESC_BUFPTR);
> + if (desc == first_desc)
> + pci_unmap_single(dev->pci_dev,
> + buf,
> + len,
> + PCI_DMA_TODEVICE);
> + else
> + pci_unmap_page(dev->pci_dev,
> + buf,
> + len,
> + PCI_DMA_TODEVICE);

(use tabs + spaces to indent: code should line up right after the parenthesis)

(premature line breaks imho)

(nevermind, both can be avoided :o) )

> + desc[DESC_CMDSTS] = cpu_to_le32(0);
> + mb();
> + } while (desc != first_desc);
> +
> +dma_error_first:
> + dev_kfree_skb_any(skb);
> + ndev->stats.tx_errors++;
^ -> should be tx_dropped
> + return NETDEV_TX_OK;
>  }

You only need a single test in the error loop if you mimic the map loop.
Something like:

diff --git a/drivers/net/ethernet/natsemi/ns83820.c 
b/drivers/net/ethernet/natsemi/ns83820.c
index 729095d..5e2dbc9 100644
--- a/drivers/net/ethernet/natsemi/ns83820.c
+++ b/drivers/net/ethernet/natsemi/ns83820.c
@@ -1160,9 +1160,11 @@ static netdev_tx_t ns83820_hard_start_xmit(struct 
sk_buff *skb,
 
buf = skb_frag_dma_map(&dev->pci_dev->dev, frag, 0,
   skb_frag_size(frag), DMA_TO_DEVICE);
+   if (dma_mapping_error(&dev->pci_dev->dev, buf))
+   goto err_unmap_frags;
dprintk("frag: buf=%08Lx  page=%08lx offset=%08lx\n",
(long long)buf, (long) page_to_pfn(frag->page),
frag->page_offset);
len = skb_frag_size(frag);
frag++;
nr_frags--;
@@ -1181,8 +1184,27 @@ static netdev_tx_t ns83820_hard_start_xmit(struct 
sk_buff *skb,
/* Check again: we may have raced with a tx done irq */
if (stopped && (dev->tx_done_idx != tx_done_idx) && start_tx_okay(dev))
netif_start_queue(ndev);
-
+out:
return NETDEV_TX_OK;
+
+err_unmap_frags:
+   while (1) {
+   buf = desc_addr_get(desc + DESC_BUFPTR);
+   if (!--nr_frags)
+   break;
+
+   pci_unmap_page(dev->pci_dev, buf, len, PCI_DMA_TODEVICE);
+
+   free_idx = (free_idx - 1) % NR_TX_DESC;
+   desc = dev->tx_descs + (free_idx * DESC_SIZE);
+   len = le32_to_cpu(desc[DESC_CMDSTS]) & CMDSTS_LEN_MASK;
+   }
+   pci_unmap_single(dev->pci_dev, buf, len, PCI_DMA_TODEVICE);
+
+err_free_skb:
+   dev_kfree_skb_any(skb);
+   ndev->stats.tx_dropped++;
+   goto out;
 }
 
 static void ns83820_update_stats(struct ns83820 *dev)


Thinking more about it, the driver seems rather unsafe if a failing
start_xmit closely follows a succeeding one. The driver should imho
map frags first *then* plug the remaining hole in the descriptor ring.
Until it does, the implicit assumption about descriptor ownership that
the error unroll loop relies on may be wrong.
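
A side note on stepping a ring index backwards, as the unroll loops above do: the two forms seen in this thread, (idx + N - 1) % N and (idx - 1) % N, agree only under certain assumptions. The sketch below uses hypothetical helper names and assumes an unsigned 32-bit index; I have not checked the type of free_idx or the value of NR_TX_DESC in the driver, so take it as an illustration of the arithmetic rather than a claim about ns83820.

```c
#include <assert.h>
#include <stdint.h>

/* Previous slot in a ring of ring_size entries; correct for any size > 0. */
static uint32_t ring_prev(uint32_t idx, uint32_t ring_size)
{
	assert(ring_size > 0 && idx < ring_size);
	return (idx + ring_size - 1) % ring_size;
}

/*
 * Shorter form: relies on unsigned wraparound at idx == 0, so it only
 * gives the right answer when ring_size is a power of two.
 */
static uint32_t ring_prev_naive(uint32_t idx, uint32_t ring_size)
{
	return (idx - 1) % ring_size;
}
```

With a ring of 128 entries both helpers map index 0 to 127. With a ring of 10 entries ring_prev(0, 10) gives 9, while the shorter form gives 5, because 4294967295 % 10 is 5. With a signed index the shorter form would yield a negative value at index 0 instead.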

-- 
Ueimor


[net-next 3/5] net/mlx5: E-Switch, Add control for encapsulation

2017-04-22 Thread Saeed Mahameed
From: Roi Dayan 

Implement the devlink e-switch encapsulation control set and get
callbacks. Apply the value set by the user on the switchdev offloads
mode when creating the fast FDB table where offloaded rules will be set.

Signed-off-by: Roi Dayan 
Reviewed-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |  5 ++
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |  3 ++
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 63 +-
 drivers/net/ethernet/mellanox/mlx5/core/main.c |  2 +
 4 files changed, 71 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index b3281d1118b3..21bed3c3334d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -1806,6 +1806,11 @@ int mlx5_eswitch_init(struct mlx5_core_dev *dev)
esw->enabled_vports = 0;
esw->mode = SRIOV_NONE;
esw->offloads.inline_mode = MLX5_INLINE_MODE_NONE;
+   if (MLX5_CAP_ESW_FLOWTABLE_FDB(dev, encap) &&
+   MLX5_CAP_ESW_FLOWTABLE_FDB(dev, decap))
+   esw->offloads.encap = DEVLINK_ESWITCH_ENCAP_MODE_BASIC;
+   else
+   esw->offloads.encap = DEVLINK_ESWITCH_ENCAP_MODE_NONE;
 
dev->priv.eswitch = esw;
return 0;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 1f56ed9f5a6f..1e7f21be1233 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -210,6 +210,7 @@ struct mlx5_esw_offload {
DECLARE_HASHTABLE(encap_tbl, 8);
u8 inline_mode;
u64 num_flows;
+   u8 encap;
 };
 
 struct mlx5_eswitch {
@@ -322,6 +323,8 @@ int mlx5_devlink_eswitch_mode_get(struct devlink *devlink, 
u16 *mode);
 int mlx5_devlink_eswitch_inline_mode_set(struct devlink *devlink, u8 mode);
 int mlx5_devlink_eswitch_inline_mode_get(struct devlink *devlink, u8 *mode);
 int mlx5_eswitch_inline_mode_get(struct mlx5_eswitch *esw, int nvfs, u8 *mode);
+int mlx5_devlink_eswitch_encap_mode_set(struct devlink *devlink, u8 encap);
+int mlx5_devlink_eswitch_encap_mode_get(struct devlink *devlink, u8 *encap);
 void mlx5_eswitch_register_vport_rep(struct mlx5_eswitch *esw,
 int vport_index,
 struct mlx5_eswitch_rep *rep);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index ce3a2c040706..189d24dbd3e3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -450,8 +450,7 @@ static int esw_create_offloads_fast_fdb_table(struct 
mlx5_eswitch *esw)
esw_size = min_t(int, MLX5_CAP_GEN(dev, max_flow_counter) * 
ESW_OFFLOADS_NUM_GROUPS,
 1 << MLX5_CAP_ESW_FLOWTABLE_FDB(dev, log_max_ft_size));
 
-   if (MLX5_CAP_ESW_FLOWTABLE_FDB(dev, encap) &&
-   MLX5_CAP_ESW_FLOWTABLE_FDB(dev, decap))
+   if (esw->offloads.encap != DEVLINK_ESWITCH_ENCAP_MODE_NONE)
flags |= MLX5_FLOW_TABLE_TUNNEL_EN;
 
fdb = mlx5_create_auto_grouped_flow_table(root_ns, FDB_FAST_PATH,
@@ -1045,6 +1044,66 @@ int mlx5_eswitch_inline_mode_get(struct mlx5_eswitch 
*esw, int nvfs, u8 *mode)
return 0;
 }
 
+int mlx5_devlink_eswitch_encap_mode_set(struct devlink *devlink, u8 encap)
+{
+   struct mlx5_core_dev *dev = devlink_priv(devlink);
+   struct mlx5_eswitch *esw = dev->priv.eswitch;
+   int err;
+
+   if (!MLX5_CAP_GEN(dev, vport_group_manager))
+   return -EOPNOTSUPP;
+
+   if (esw->mode == SRIOV_NONE)
+   return -EOPNOTSUPP;
+
+   if (encap != DEVLINK_ESWITCH_ENCAP_MODE_NONE &&
+   (!MLX5_CAP_ESW_FLOWTABLE_FDB(dev, encap) ||
+!MLX5_CAP_ESW_FLOWTABLE_FDB(dev, decap)))
+   return -EOPNOTSUPP;
+
+   if (encap && encap != DEVLINK_ESWITCH_ENCAP_MODE_BASIC)
+   return -EOPNOTSUPP;
+
+   if (esw->mode == SRIOV_LEGACY) {
+   esw->offloads.encap = encap;
+   return 0;
+   }
+
+   if (esw->offloads.encap == encap)
+   return 0;
+
+   if (esw->offloads.num_flows > 0) {
+   esw_warn(dev, "Can't set encapsulation when flows are 
configured\n");
+   return -EOPNOTSUPP;
+   }
+
+   esw_destroy_offloads_fast_fdb_table(esw);
+
+   esw->offloads.encap = encap;
+   err = esw_create_offloads_fast_fdb_table(esw);
+   if (err) {
+   esw_warn(esw->dev, "Failed re-creating fast FDB table, err 
%d\n", err);
+   esw->offloads.encap = !encap;
+   (void) esw_create_offloads_fast_fdb_table(esw);
+   }
+   return err;
+}
+
+int mlx5_devlink_eswitc

[net-next 2/5] net/mlx5: E-Switch, Refactor fast path FDB table creation in switchdev mode

2017-04-22 Thread Saeed Mahameed
From: Or Gerlitz 

Refactor the creation of the fast path FDB table that holds the
offloaded rules in SRIOV switchdev mode into its own function.

This will be used in the next patch to be able to re-create the
table under different settings without going through legacy mode.

This patch doesn't change any functionality.

Signed-off-by: Or Gerlitz 
Reviewed-by: Roi Dayan 
Signed-off-by: Saeed Mahameed 
---
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 69 +++---
 1 file changed, 49 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 992b380d36be..ce3a2c040706 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -426,31 +426,21 @@ static int esw_add_fdb_miss_rule(struct mlx5_eswitch *esw)
return err;
 }
 
-#define MAX_PF_SQ 256
 #define ESW_OFFLOADS_NUM_GROUPS  4
 
-static int esw_create_offloads_fdb_table(struct mlx5_eswitch *esw, int nvports)
+static int esw_create_offloads_fast_fdb_table(struct mlx5_eswitch *esw)
 {
-   int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in);
-   struct mlx5_flow_table_attr ft_attr = {};
-   int table_size, ix, esw_size, err = 0;
struct mlx5_core_dev *dev = esw->dev;
struct mlx5_flow_namespace *root_ns;
struct mlx5_flow_table *fdb = NULL;
-   struct mlx5_flow_group *g;
-   u32 *flow_group_in;
-   void *match_criteria;
+   int esw_size, err = 0;
u32 flags = 0;
 
-   flow_group_in = mlx5_vzalloc(inlen);
-   if (!flow_group_in)
-   return -ENOMEM;
-
root_ns = mlx5_get_flow_namespace(dev, MLX5_FLOW_NAMESPACE_FDB);
if (!root_ns) {
esw_warn(dev, "Failed to get FDB flow namespace\n");
err = -EOPNOTSUPP;
-   goto ns_err;
+   goto out;
}
 
esw_debug(dev, "Create offloads FDB table, min (max esw size(2^%d), max 
counters(%d)*groups(%d))\n",
@@ -471,10 +461,49 @@ static int esw_create_offloads_fdb_table(struct 
mlx5_eswitch *esw, int nvports)
if (IS_ERR(fdb)) {
err = PTR_ERR(fdb);
esw_warn(dev, "Failed to create Fast path FDB Table err %d\n", 
err);
-   goto fast_fdb_err;
+   goto out;
}
esw->fdb_table.fdb = fdb;
 
+out:
+   return err;
+}
+
+static void esw_destroy_offloads_fast_fdb_table(struct mlx5_eswitch *esw)
+{
+   mlx5_destroy_flow_table(esw->fdb_table.fdb);
+}
+
+#define MAX_PF_SQ 256
+
+static int esw_create_offloads_fdb_tables(struct mlx5_eswitch *esw, int 
nvports)
+{
+   int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in);
+   struct mlx5_flow_table_attr ft_attr = {};
+   struct mlx5_core_dev *dev = esw->dev;
+   struct mlx5_flow_namespace *root_ns;
+   struct mlx5_flow_table *fdb = NULL;
+   int table_size, ix, err = 0;
+   struct mlx5_flow_group *g;
+   void *match_criteria;
+   u32 *flow_group_in;
+
+   esw_debug(esw->dev, "Create offloads FDB Tables\n");
+   flow_group_in = mlx5_vzalloc(inlen);
+   if (!flow_group_in)
+   return -ENOMEM;
+
+   root_ns = mlx5_get_flow_namespace(dev, MLX5_FLOW_NAMESPACE_FDB);
+   if (!root_ns) {
+   esw_warn(dev, "Failed to get FDB flow namespace\n");
+   err = -EOPNOTSUPP;
+   goto ns_err;
+   }
+
+   err = esw_create_offloads_fast_fdb_table(esw);
+   if (err)
+   goto fast_fdb_err;
+
table_size = nvports + MAX_PF_SQ + 1;
 
ft_attr.max_fte = table_size;
@@ -545,18 +574,18 @@ static int esw_create_offloads_fdb_table(struct 
mlx5_eswitch *esw, int nvports)
return err;
 }
 
-static void esw_destroy_offloads_fdb_table(struct mlx5_eswitch *esw)
+static void esw_destroy_offloads_fdb_tables(struct mlx5_eswitch *esw)
 {
if (!esw->fdb_table.fdb)
return;
 
-   esw_debug(esw->dev, "Destroy offloads FDB Table\n");
+   esw_debug(esw->dev, "Destroy offloads FDB Tables\n");
mlx5_del_flow_rules(esw->fdb_table.offloads.miss_rule);
mlx5_destroy_flow_group(esw->fdb_table.offloads.send_to_vport_grp);
mlx5_destroy_flow_group(esw->fdb_table.offloads.miss_grp);
 
mlx5_destroy_flow_table(esw->fdb_table.offloads.fdb);
-   mlx5_destroy_flow_table(esw->fdb_table.fdb);
+   esw_destroy_offloads_fast_fdb_table(esw);
 }
 
 static int esw_create_offloads_table(struct mlx5_eswitch *esw)
@@ -716,7 +745,7 @@ int esw_offloads_init(struct mlx5_eswitch *esw, int nvports)
mlx5_remove_dev_by_protocol(esw->dev, MLX5_INTERFACE_PROTOCOL_IB);
mlx5_dev_list_unlock();
 
-   err = esw_create_offloads_fdb_table(esw, nvports);
+   err = esw_create_offloads_fdb_tables(esw, nvports);
if (err)
goto create_fdb_err;
 
@@ -753,7 +782,7 @@ 

[pull request][net-next 0/5] Mellanox, mlx5 updates 2017-04-22

2017-04-22 Thread Saeed Mahameed
Hi Dave,

This series contains some updates to mlx5 driver.

Sparse and compiler warnings fixes from Stephen Hemminger.

From Roi Dayan and Or Gerlitz: add devlink and mlx5 support for controlling
the E-Switch encapsulation mode. This knob enables HW support for applying
encapsulation/decapsulation to VF traffic as part of SRIOV e-switch offloading.

Please pull and let me know if there's any problem.

Thanks,
Saeed.

---

The following changes since commit fb796707d7a6c9b24fdf80b9b4f24fa5ffcf0ec5:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2017-04-21 
20:23:53 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git 
tags/mlx5-updates-2017-04-22

for you to fetch changes up to 8bf3198a5e394ed6815aeb8fedaf49722986bbd3:

  mlx5: fix warning about missing prototype (2017-04-22 20:26:42 +0300)

Or Gerlitz (1):
  net/mlx5: E-Switch, Refactor fast path FDB table creation in switchdev 
mode

Roi Dayan (2):
  net/devlink: Add E-Switch encapsulation control
  net/mlx5: E-Switch, Add control for encapsulation

Stephen Hemminger (2):
  mlx5: hide unused functions
  mlx5: fix warning about missing prototype

 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c|   1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c|   1 +
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |   5 +
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |   3 +
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 132 +
 drivers/net/ethernet/mellanox/mlx5/core/ipoib.c|  24 ++--
 drivers/net/ethernet/mellanox/mlx5/core/main.c |   2 +
 include/net/devlink.h  |   2 +
 include/uapi/linux/devlink.h   |   7 ++
 net/core/devlink.c |  26 +++-
 10 files changed, 167 insertions(+), 36 deletions(-)


[net-next 4/5] mlx5: hide unused functions

2017-04-22 Thread Saeed Mahameed
From: Stephen Hemminger 

Fix sparse warnings in recent ipoib support.
The RDMA functions are not used yet, hide behind #ifdef.
Based on comment, they will eventually be local so make static.

Signed-off-by: Stephen Hemminger 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/ipoib.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c 
b/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c
index ec78e637840f..3c84e36af018 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c
@@ -178,7 +178,7 @@ static int mlx5i_init_tx(struct mlx5e_priv *priv)
return 0;
 }
 
-void mlx5i_cleanup_tx(struct mlx5e_priv *priv)
+static void mlx5i_cleanup_tx(struct mlx5e_priv *priv)
 {
struct mlx5i_priv *ipriv = priv->ppriv;
 
@@ -359,9 +359,10 @@ static int mlx5i_close(struct net_device *netdev)
return 0;
 }
 
+#ifdef notusedyet
 /* IPoIB RDMA netdev callbacks */
-int mlx5i_attach_mcast(struct net_device *netdev, struct ib_device *hca,
-  union ib_gid *gid, u16 lid, int set_qkey)
+static int mlx5i_attach_mcast(struct net_device *netdev, struct ib_device *hca,
+ union ib_gid *gid, u16 lid, int set_qkey)
 {
struct mlx5e_priv*epriv = mlx5i_epriv(netdev);
struct mlx5_core_dev *mdev  = epriv->mdev;
@@ -377,8 +378,8 @@ int mlx5i_attach_mcast(struct net_device *netdev, struct 
ib_device *hca,
return err;
 }
 
-int mlx5i_detach_mcast(struct net_device *netdev, struct ib_device *hca,
-  union ib_gid *gid, u16 lid)
+static int mlx5i_detach_mcast(struct net_device *netdev, struct ib_device *hca,
+ union ib_gid *gid, u16 lid)
 {
struct mlx5e_priv*epriv = mlx5i_epriv(netdev);
struct mlx5_core_dev *mdev  = epriv->mdev;
@@ -395,7 +396,7 @@ int mlx5i_detach_mcast(struct net_device *netdev, struct 
ib_device *hca,
return err;
 }
 
-int mlx5i_xmit(struct net_device *dev, struct sk_buff *skb,
+static int mlx5i_xmit(struct net_device *dev, struct sk_buff *skb,
   struct ib_ah *address, u32 dqpn, u32 dqkey)
 {
struct mlx5e_priv *epriv = mlx5i_epriv(dev);
@@ -404,6 +405,7 @@ int mlx5i_xmit(struct net_device *dev, struct sk_buff *skb,
 
return mlx5i_sq_xmit(sq, skb, &mah->av, dqpn, dqkey);
 }
+#endif
 
 static int mlx5i_check_required_hca_cap(struct mlx5_core_dev *mdev)
 {
@@ -418,10 +420,10 @@ static int mlx5i_check_required_hca_cap(struct 
mlx5_core_dev *mdev)
return 0;
 }
 
-struct net_device *mlx5_rdma_netdev_alloc(struct mlx5_core_dev *mdev,
- struct ib_device *ibdev,
- const char *name,
- void (*setup)(struct net_device *))
+static struct net_device *mlx5_rdma_netdev_alloc(struct mlx5_core_dev *mdev,
+struct ib_device *ibdev,
+const char *name,
+void (*setup)(struct 
net_device *))
 {
const struct mlx5e_profile *profile = &mlx5i_nic_profile;
int nch = profile->max_nch(mdev);
@@ -480,7 +482,7 @@ struct net_device *mlx5_rdma_netdev_alloc(struct 
mlx5_core_dev *mdev,
 }
 EXPORT_SYMBOL(mlx5_rdma_netdev_alloc);
 
-void mlx5_rdma_netdev_free(struct net_device *netdev)
+static void mlx5_rdma_netdev_free(struct net_device *netdev)
 {
struct mlx5e_priv  *priv= mlx5i_epriv(netdev);
const struct mlx5e_profile *profile = priv->profile;
-- 
2.11.0



[net-next 5/5] mlx5: fix warning about missing prototype

2017-04-22 Thread Saeed Mahameed
From: Stephen Hemminger 

Fix sparse warning about missing prototypes. The rx/tx code path
defines functions with prototypes in ipoib.h.

Signed-off-by: Stephen Hemminger 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 43308243f519..ae66fad98244 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -39,6 +39,7 @@
 #include "en.h"
 #include "en_tc.h"
 #include "eswitch.h"
+#include "ipoib.h"
 
 static inline bool mlx5e_rx_hw_stamp(struct mlx5e_tstamp *tstamp)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index dda7db503043..ab3bb026ff9e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include "en.h"
+#include "ipoib.h"
 
 #define MLX5E_SQ_NOPS_ROOM  MLX5_SEND_WQE_MAX_WQEBBS
 #define MLX5E_SQ_STOP_ROOM (MLX5_SEND_WQE_MAX_WQEBBS +\
-- 
2.11.0



[net-next 1/5] net/devlink: Add E-Switch encapsulation control

2017-04-22 Thread Saeed Mahameed
From: Roi Dayan 

This is an e-switch global knob to enable HW support for applying
encapsulation/decapsulation to VF traffic as part of SRIOV e-switch offloading.

The actual encap/decap is carried out (along with the matching and other
actions) per offloaded e-switch rule, e.g. as done when offloading the TC
tunnel key action.

Signed-off-by: Roi Dayan 
Reviewed-by: Or Gerlitz 
Acked-by: Jiri Pirko 
Signed-off-by: Saeed Mahameed 
---
 include/net/devlink.h|  2 ++
 include/uapi/linux/devlink.h |  7 +++
 net/core/devlink.c   | 26 +++---
 3 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 24de13f8c94f..ed7687bbf5d0 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -268,6 +268,8 @@ struct devlink_ops {
int (*eswitch_mode_set)(struct devlink *devlink, u16 mode);
int (*eswitch_inline_mode_get)(struct devlink *devlink, u8 
*p_inline_mode);
int (*eswitch_inline_mode_set)(struct devlink *devlink, u8 inline_mode);
+   int (*eswitch_encap_mode_get)(struct devlink *devlink, u8 
*p_encap_mode);
+   int (*eswitch_encap_mode_set)(struct devlink *devlink, u8 encap_mode);
 };
 
 static inline void *devlink_priv(struct devlink *devlink)
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index b47bee277347..b0e807ac53bb 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -119,6 +119,11 @@ enum devlink_eswitch_inline_mode {
DEVLINK_ESWITCH_INLINE_MODE_TRANSPORT,
 };
 
+enum devlink_eswitch_encap_mode {
+   DEVLINK_ESWITCH_ENCAP_MODE_NONE,
+   DEVLINK_ESWITCH_ENCAP_MODE_BASIC,
+};
+
 enum devlink_attr {
/* don't change the order or add anything between, this is ABI! */
DEVLINK_ATTR_UNSPEC,
@@ -195,6 +200,8 @@ enum devlink_attr {
 
DEVLINK_ATTR_PAD,
 
+   DEVLINK_ATTR_ESWITCH_ENCAP_MODE,/* u8 */
+
/* add new attributes above here, update the policy in devlink.c */
 
__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 0afac5800b57..b0b87a292e7c 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -1397,10 +1397,10 @@ static int devlink_nl_eswitch_fill(struct sk_buff *msg, 
struct devlink *devlink,
   u32 seq, int flags)
 {
const struct devlink_ops *ops = devlink->ops;
+   u8 inline_mode, encap_mode;
void *hdr;
int err = 0;
u16 mode;
-   u8 inline_mode;
 
hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd);
if (!hdr)
@@ -1429,6 +1429,15 @@ static int devlink_nl_eswitch_fill(struct sk_buff *msg, 
struct devlink *devlink,
goto nla_put_failure;
}
 
+   if (ops->eswitch_encap_mode_get) {
+   err = ops->eswitch_encap_mode_get(devlink, &encap_mode);
+   if (err)
+   goto nla_put_failure;
+   err = nla_put_u8(msg, DEVLINK_ATTR_ESWITCH_ENCAP_MODE, 
encap_mode);
+   if (err)
+   goto nla_put_failure;
+   }
+
genlmsg_end(msg, hdr);
return 0;
 
@@ -1468,9 +1477,9 @@ static int devlink_nl_cmd_eswitch_set_doit(struct sk_buff 
*skb,
 {
struct devlink *devlink = info->user_ptr[0];
const struct devlink_ops *ops = devlink->ops;
-   u16 mode;
-   u8 inline_mode;
+   u8 inline_mode, encap_mode;
int err = 0;
+   u16 mode;
 
if (!ops)
return -EOPNOTSUPP;
@@ -1493,6 +1502,16 @@ static int devlink_nl_cmd_eswitch_set_doit(struct 
sk_buff *skb,
if (err)
return err;
}
+
+   if (info->attrs[DEVLINK_ATTR_ESWITCH_ENCAP_MODE]) {
+   if (!ops->eswitch_encap_mode_set)
+   return -EOPNOTSUPP;
+   encap_mode = 
nla_get_u8(info->attrs[DEVLINK_ATTR_ESWITCH_ENCAP_MODE]);
+   err = ops->eswitch_encap_mode_set(devlink, encap_mode);
+   if (err)
+   return err;
+   }
+
return 0;
 }
 
@@ -2190,6 +2209,7 @@ static const struct nla_policy 
devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
[DEVLINK_ATTR_SB_TC_INDEX] = { .type = NLA_U16 },
[DEVLINK_ATTR_ESWITCH_MODE] = { .type = NLA_U16 },
[DEVLINK_ATTR_ESWITCH_INLINE_MODE] = { .type = NLA_U8 },
+   [DEVLINK_ATTR_ESWITCH_ENCAP_MODE] = { .type = NLA_U8 },
[DEVLINK_ATTR_DPIPE_TABLE_NAME] = { .type = NLA_NUL_STRING },
[DEVLINK_ATTR_DPIPE_TABLE_COUNTERS_ENABLED] = { .type = NLA_U8 },
 };
-- 
2.11.0
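
For readers following the devlink change above, here is a minimal userspace
sketch of the optional-callback pattern it relies on: the set path rejects a
request when the driver has no setter, while the fill path simply skips the
attribute when there is no getter. Everything named here (struct fake_ops,
struct fake_dev, core_set_encap(), core_get_encap(), demo_get(), demo_set())
is hypothetical and only models the control flow; it is not the devlink API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the two values the patch adds to the uapi header. */
enum encap_mode { ENCAP_MODE_NONE, ENCAP_MODE_BASIC };

/* Hypothetical device state; the real state lives in the driver. */
struct fake_dev {
	uint8_t encap_mode;
};

/* Hypothetical stand-in for the optional callbacks in struct devlink_ops. */
struct fake_ops {
	int (*encap_mode_get)(struct fake_dev *dev, uint8_t *p_mode);
	int (*encap_mode_set)(struct fake_dev *dev, uint8_t mode);
};

/* Example driver callbacks. */
static int demo_get(struct fake_dev *dev, uint8_t *p_mode)
{
	*p_mode = dev->encap_mode;
	return 0;
}

static int demo_set(struct fake_dev *dev, uint8_t mode)
{
	if (mode > ENCAP_MODE_BASIC)
		return -EINVAL;
	dev->encap_mode = mode;
	return 0;
}

/* Set path: asking for a knob the driver lacks is an error. */
static int core_set_encap(const struct fake_ops *ops, struct fake_dev *dev,
			  uint8_t mode)
{
	if (!ops || !ops->encap_mode_set)
		return -EOPNOTSUPP;
	return ops->encap_mode_set(dev, mode);
}

/*
 * Fill path: a missing getter is not an error, the attribute is just left
 * out. Returns 1 if a value was reported, 0 if skipped, negative on error.
 */
static int core_get_encap(const struct fake_ops *ops, struct fake_dev *dev,
			  uint8_t *p_mode)
{
	int err;

	assert(ops != NULL && p_mode != NULL);
	if (!ops->encap_mode_get)
		return 0;
	err = ops->encap_mode_get(dev, p_mode);
	return err ? err : 1;
}
```

With an ops table that has both callbacks NULL, core_set_encap() returns
-EOPNOTSUPP and core_get_encap() returns 0 without touching *p_mode.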



Re: [PATCH 2/2] sparc64: Add eBPF JIT.

2017-04-22 Thread David Miller
From: Alexei Starovoitov 
Date: Sat, 22 Apr 2017 08:32:35 -0700

> On Fri, Apr 21, 2017 at 08:17:11PM -0700, David Miller wrote:
>> 
>> This is an eBPF JIT for sparc64.  All major features are supported.
>> 
>> All tests under tools/testing/selftests/bpf/ pass.
>> 
>> Signed-off-by: David S. Miller 
> ...
>> +/* tail call */
>> +case BPF_JMP | BPF_CALL |BPF_X:
>> +emit_tail_call(ctx);
>> +
> 
> I think 'break;' is missing here.

Good catch, I'll fix that.

> When tail_call's target program is null the current program should
> continue instead of aborting.
> Like in our current ddos+lb setup the program looks like:
>  bpf_tail_call(ctx, &prog_array, 1);
>  bpf_tail_call(ctx, &prog_array, 2);
>  bpf_tail_call(ctx, &prog_array, 3);
>  return XDP_DROP;
> 
> this way it will jump into the program that is installed in slot 1.
> If it's empty, it will try slot 2...
> If no programs installed it will drop the packet.

Yes, with the break; fixed above that's what the sparc64 JIT will
end up doing.  If any of the tests don't pass in emit_tail_call()
we branch to the end of the emit_tail_call() sequence.
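
The fall-through behaviour discussed above can be modelled in plain userspace
C. This is only a sketch under stated assumptions: prog_array, tail_call(),
dispatch() and the VERDICT_* values are hypothetical stand-ins, not the BPF
API, and a real tail call replaces the running program instead of calling a
function and returning.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical verdicts standing in for XDP_DROP and XDP_PASS. */
enum verdict { VERDICT_DROP = 1, VERDICT_PASS = 2 };

typedef enum verdict (*prog_fn)(void);

#define PROG_ARRAY_SIZE 4

/* Model of a program array; a NULL slot means no program is installed. */
static prog_fn prog_array[PROG_ARRAY_SIZE];

/*
 * Model of a tail call. Returns 1 and stores the target's verdict when the
 * slot holds a program. Returns 0 when the index is out of range or the slot
 * is empty, so the caller simply continues with its next statement.
 */
static int tail_call(unsigned int index, enum verdict *out)
{
	assert(out != NULL);
	if (index >= PROG_ARRAY_SIZE || prog_array[index] == NULL)
		return 0;
	*out = prog_array[index]();
	return 1;
}

/* Example target program that can be installed in a slot. */
static enum verdict pass_prog(void)
{
	return VERDICT_PASS;
}

/* The dispatcher from the mail: try slots 1, 2 and 3, then drop. */
static enum verdict dispatch(void)
{
	enum verdict v = VERDICT_DROP;
	unsigned int slot;

	for (slot = 1; slot <= 3; slot++)
		if (tail_call(slot, &v))
			return v;
	return VERDICT_DROP;
}
```

With every slot empty, dispatch() falls through all three attempts and drops;
installing pass_prog in slot 2 makes it return that program's verdict.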

Thanks.


Re: [PATCH] rtl_bt: Update firmware for BT part of rtl8822be

2017-04-22 Thread Kyle McMartin
On Fri, Apr 14, 2017 at 12:55:52AM -0500, Larry Finger wrote:
> These files were supplied by Realtek.
> 
> Signed-off-by: Larry Finger 

Applied, thanks Larry.

--Kyle


Re: [PATCH] net: can: usb: gs_usb: Fix buffer on stack

2017-04-22 Thread Fabio Estevam
On Sat, Apr 22, 2017 at 1:56 PM, Maksim Salau  wrote:
> Allocate buffer on HEAP instead of STACK for a local structure
> that is to be sent using usb_control_msg().
>
> Signed-off-by: Maksim Salau 
> ---
>  drivers/net/can/usb/gs_usb.c | 17 -
>  1 file changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/can/usb/gs_usb.c b/drivers/net/can/usb/gs_usb.c
> index a0dabd4..98f972a 100644
> --- a/drivers/net/can/usb/gs_usb.c
> +++ b/drivers/net/can/usb/gs_usb.c
> @@ -740,13 +740,18 @@ static const struct net_device_ops gs_usb_netdev_ops = {
>  static int gs_usb_set_identify(struct net_device *netdev, bool do_identify)
>  {
> struct gs_can *dev = netdev_priv(netdev);
> -   struct gs_identify_mode imode;
> +   struct gs_identify_mode *imode = NULL;

No need to assign imode to NULL here.


Re: PROBLEM: IPVS incorrectly reverse-NATs traffic to LVS host

2017-04-22 Thread Julian Anastasov

Hello,

On Wed, 12 Apr 2017, Nick Moriarty wrote:

> Hi,
> 
> I've experienced a problem in how traffic returning to an LVS host is
> handled in certain circumstances.  Please find a bug report below - if
> there's any further information you'd like, please let me know.
> 
> [1.] One line summary of the problem:
> IPVS incorrectly reverse-NATs traffic to LVS host
> 
> [2.] Full description of the problem/report:
> When using IPVS in direct-routing mode, normal traffic from the LVS
> host to a back-end server is sometimes incorrectly NATed on the way
> back into the LVS host.  Using tcpdump shows that the return packets
> have the correct source IP, but by the time it makes it back to the
> application, it's been changed.
> 
> To reproduce this, a configuration such as the following will work:
> - Set up an LVS system with a VIP serving UDP to a backend DNS server
> using the direct-routing method in IPVS
> - Make an outgoing UDP request to the VIP from the LVS system itself
> (this causes a connection to be added to the IPVS connection table)
> - The request should succeed as normal
> - Note the UDP source port used
> - Within 5 minutes (before the UDP connection entry expires), make an
> outgoing UDP request directly to the backend DNS server
> - The request will fail as the reply is incorrectly modified on its
> way back and appears to return from the VIP
> 
> Monitoring the above sequence with tcpdump verifies that the returned
> packet (as it enters the host) is from the DNS IP, even though the
> application sees the VIP.
> 
> If an outgoing request direct to the DNS server is made from a port
> not in the connection table, everything is fine.

Thanks for the detailed report! I think I fixed the
problem. Let me know if you are able to test the appended fix.

> I expect that somewhere, something (e.g. functionality for IPVS MASQ
> responses) is applying IPVS connection
> information to incoming traffic, matching a DROUTE rule, and treating
> it as NAT traffic.

Yep, that is what happens.



[PATCH net] ipvs: SNAT packet replies only for NATed connections

We do not check if packet from real server is for NAT
connection before performing SNAT. This causes problems
for setups that use DR/TUN and allow local clients to
access the real server directly, for example:

- local client in director creates IPVS-DR/TUN connection
CIP->VIP and the request packets are routed to RIP.
Talks are finished but IPVS connection is not expired yet.

- second local client creates non-IPVS connection CIP->RIP
with same reply tuple RIP->CIP and when replies are received
on LOCAL_IN we wrongly assign them for the first client
connection because RIP->CIP matches the reply direction.

The problem is more visible to local UDP clients but in rare
cases it can happen also for TCP or remote clients when the
real server sends the reply traffic via the director.

So, better to be more precise for the reply traffic.
As replies are not expected for DR/TUN connections, better
to not touch them.
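
A small userspace model of the idea, for illustration only: the reply-direction
lookup still finds the stale DR entry, but the source address is rewritten
only when the matched connection uses masquerading. The types and helpers
(struct conn, struct pkt, conn_out_get(), handle_reply(), the FWD_* values)
are hypothetical and much simpler than the IPVS code; port rewriting is
left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical forwarding methods, modelled on NAT, DR and TUN. */
enum fwd_method { FWD_MASQ, FWD_DROUTE, FWD_TUNNEL };

struct conn {
	uint32_t cip, vip, rip;	/* client, virtual and real server address */
	uint16_t cport, rport;
	enum fwd_method method;
};

struct pkt {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* Reply-direction lookup: matches RIP:rport -> CIP:cport. */
static const struct conn *conn_out_get(const struct conn *tbl, size_t n,
				       const struct pkt *p)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (tbl[i].rip == p->saddr && tbl[i].rport == p->sport &&
		    tbl[i].cip == p->daddr && tbl[i].cport == p->dport)
			return &tbl[i];
	return NULL;
}

/* Returns 1 if the source was rewritten to the VIP, 0 if left untouched. */
static int handle_reply(const struct conn *tbl, size_t n, struct pkt *p)
{
	const struct conn *cp;

	assert(p != NULL);
	cp = conn_out_get(tbl, n, p);
	if (!cp || cp->method != FWD_MASQ)
		return 0;	/* no entry, or DR/TUN entry: do not touch */
	p->saddr = cp->vip;
	return 1;
}
```

A reply that matches a FWD_DROUTE entry keeps the real server's address as its
source; the same reply against a FWD_MASQ entry is rewritten to the VIP.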

Reported-by: Nick Moriarty 
Signed-off-by: Julian Anastasov 
---
 net/netfilter/ipvs/ip_vs_core.c | 19 ++-
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index db40050..ee44ed5 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -849,10 +849,8 @@ static int handle_response_icmp(int af, struct sk_buff 
*skb,
 {
unsigned int verdict = NF_DROP;
 
-   if (IP_VS_FWD_METHOD(cp) != 0) {
-   pr_err("shouldn't reach here, because the box is on the "
-  "half connection in the tun/dr module.\n");
-   }
+   if (IP_VS_FWD_METHOD(cp) != IP_VS_CONN_F_MASQ)
+   goto ignore_cp;
 
/* Ensure the checksum is correct */
if (!skb_csum_unnecessary(skb) && ip_vs_checksum_complete(skb, ihl)) {
@@ -886,6 +884,8 @@ static int handle_response_icmp(int af, struct sk_buff *skb,
ip_vs_notrack(skb);
else
ip_vs_update_conntrack(skb, cp, 0);
+
+ignore_cp:
verdict = NF_ACCEPT;
 
 out:
@@ -1385,8 +1385,11 @@ ip_vs_out(struct netns_ipvs *ipvs, unsigned int hooknum, 
struct sk_buff *skb, in
 */
cp = pp->conn_out_get(ipvs, af, skb, &iph);
 
-   if (likely(cp))
+   if (likely(cp)) {
+   if (IP_VS_FWD_METHOD(cp) != IP_VS_CONN_F_MASQ)
+   goto ignore_cp;
return handle_response(af, skb, pd, cp, &iph, hooknum);
+   }
 
/* Check for real-server-started requests */
if (atomic_read(&ipvs->conn_out_counter)) {
@@ -1444,9 +1447,15 @@ ip_vs_out(struct netns_ipvs *ipvs, unsigned int hooknum, 
struct sk_buff *skb, in
}
}
}
+
+out:
IP_VS_DBG_PKT(12, af, pp, skb, iph.off,
  "ip_vs_out: pack

[PATCH] net: wireless: orinoco: usb: Fix buffer on stack

2017-04-22 Thread Maksim Salau
Allocate buffer on HEAP instead of STACK for a local variable
that is to be sent using usb_control_msg().

Signed-off-by: Maksim Salau 
---
 drivers/net/wireless/intersil/orinoco/orinoco_usb.c | 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/net/wireless/intersil/orinoco/orinoco_usb.c 
b/drivers/net/wireless/intersil/orinoco/orinoco_usb.c
index bca6935..eb4528b 100644
--- a/drivers/net/wireless/intersil/orinoco/orinoco_usb.c
+++ b/drivers/net/wireless/intersil/orinoco/orinoco_usb.c
@@ -770,18 +770,31 @@ static int ezusb_submit_in_urb(struct ezusb_priv *upriv)
 
 static inline int ezusb_8051_cpucs(struct ezusb_priv *upriv, int reset)
 {
-   u8 res_val = reset; /* avoid argument promotion */
+   int ret;
+   u8 *res_val = NULL;
 
if (!upriv->udev) {
err("%s: !upriv->udev", __func__);
return -EFAULT;
}
-   return usb_control_msg(upriv->udev,
+
+   res_val = kmalloc(sizeof(*res_val), GFP_KERNEL);
+
+   if (!res_val)
+   return -ENOMEM;
+
+   *res_val = reset;   /* avoid argument promotion */
+
+   ret =  usb_control_msg(upriv->udev,
   usb_sndctrlpipe(upriv->udev, 0),
   EZUSB_REQUEST_FW_TRANS,
   USB_TYPE_VENDOR | USB_RECIP_DEVICE |
-  USB_DIR_OUT, EZUSB_CPUCS_REG, 0, &res_val,
-  sizeof(res_val), DEF_TIMEOUT);
+  USB_DIR_OUT, EZUSB_CPUCS_REG, 0, res_val,
+  sizeof(*res_val), DEF_TIMEOUT);
+
+   kfree(res_val);
+
+   return ret;
 }
 
 static int ezusb_firmware_download(struct ezusb_priv *upriv,
-- 
2.9.3
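
The shape of the change above (allocate, check, fill, send, free, return the
transfer result) can be sketched in userspace as follows. The xfer_fn hook and
send_reset_byte() are hypothetical stand-ins for usb_control_msg() and
ezusb_8051_cpucs(); malloc()/free() stand in for kmalloc()/kfree(). The
motivation is presumably that the transfer buffer may be handed to DMA, which
stack memory is not guaranteed to support.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical transport hook standing in for usb_control_msg(). */
typedef int (*xfer_fn)(const void *buf, size_t len);

/* Send a single byte from a heap buffer and return the transfer result. */
static int send_reset_byte(xfer_fn xfer, int reset)
{
	uint8_t *val;
	int ret;

	assert(xfer != NULL);

	val = malloc(sizeof(*val));
	if (!val)
		return -ENOMEM;

	*val = (uint8_t)reset;
	ret = xfer(val, sizeof(*val));

	/* Free on every path once the transfer has completed. */
	free(val);
	return ret;
}

/* Test transports: one records what it was given, one always fails. */
static uint8_t last_byte;
static size_t last_len;

static int record_xfer(const void *buf, size_t len)
{
	last_byte = *(const uint8_t *)buf;
	last_len = len;
	return (int)len;
}

static int failing_xfer(const void *buf, size_t len)
{
	(void)buf;
	(void)len;
	return -EIO;
}
```

The result is stored before the buffer is released, so the error code from the
transfer is preserved and nothing leaks on failure.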



[PATCH] net: can: usb: gs_usb: Fix buffer on stack

2017-04-22 Thread Maksim Salau
Allocate buffer on HEAP instead of STACK for a local structure
that is to be sent using usb_control_msg().

Signed-off-by: Maksim Salau 
---
 drivers/net/can/usb/gs_usb.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/net/can/usb/gs_usb.c b/drivers/net/can/usb/gs_usb.c
index a0dabd4..98f972a 100644
--- a/drivers/net/can/usb/gs_usb.c
+++ b/drivers/net/can/usb/gs_usb.c
@@ -740,13 +740,18 @@ static const struct net_device_ops gs_usb_netdev_ops = {
 static int gs_usb_set_identify(struct net_device *netdev, bool do_identify)
 {
struct gs_can *dev = netdev_priv(netdev);
-   struct gs_identify_mode imode;
+   struct gs_identify_mode *imode = NULL;
int rc;
 
+   imode = kmalloc(sizeof(*imode), GFP_KERNEL);
+
+   if (!imode)
+   return -ENOMEM;
+
if (do_identify)
-   imode.mode = GS_CAN_IDENTIFY_ON;
+   imode->mode = GS_CAN_IDENTIFY_ON;
else
-   imode.mode = GS_CAN_IDENTIFY_OFF;
+   imode->mode = GS_CAN_IDENTIFY_OFF;
 
rc = usb_control_msg(interface_to_usbdev(dev->iface),
 usb_sndctrlpipe(interface_to_usbdev(dev->iface),
@@ -756,10 +761,12 @@ static int gs_usb_set_identify(struct net_device *netdev, 
bool do_identify)
 USB_RECIP_INTERFACE,
 dev->channel,
 0,
-&imode,
-sizeof(imode),
+imode,
+sizeof(*imode),
 100);
 
+   kfree(imode);
+
return (rc > 0) ? 0 : rc;
 }
 
-- 
2.9.3



Re: compile issue in latest iproute2

2017-04-22 Thread Stephen Hemminger
On Sat, 22 Apr 2017 12:43:50 -0400
Jamal Hadi Salim  wrote:

> On 17-04-22 12:18 PM, Daniel Borkmann wrote:
> [..]
> >
> > Anything I'm missing?  
> 
> 
> Let me get back to that machine (couple of hours) and try to see how I
> created the issue.
> Should've cut and pasted the error msg. Can't recreate it on this laptop.
> 
> cheers,
> jamal

Current tip of iproute2 master compiles fine for me
both with and without HAVE_ELF





Fw: [Bug 195495] New: unchecked return value of nla_nest_start() in function lwtunnel_fill_encap()

2017-04-22 Thread Stephen Hemminger


Begin forwarded message:

Date: Sat, 22 Apr 2017 14:49:46 +
From: bugzilla-dae...@bugzilla.kernel.org
To: step...@networkplumber.org
Subject: [Bug 195495] New: unchecked return value of nla_nest_start() in 
function lwtunnel_fill_encap()


https://bugzilla.kernel.org/show_bug.cgi?id=195495

Bug ID: 195495
   Summary: unchecked return value of nla_nest_start() in function
lwtunnel_fill_encap()
   Product: Networking
   Version: 2.5
Kernel Version: linux-4.11-rc7
  Hardware: All
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: Other
  Assignee: step...@networkplumber.org
  Reporter: bianpan2...@ruc.edu.cn
Regression: No

Function nla_nest_start() may return a NULL pointer on error. However, in
function lwtunnel_fill_encap(), the return value of nla_nest_start() is not
checked against NULL (see line 218), and may result in bad memory access.
Related code snippets are shown as follows.

lwtunnel_fill_encap @@ net/core/lwtunnel.c: 204
204 int lwtunnel_fill_encap(struct sk_buff *skb, struct lwtunnel_state
*lwtstate)
205 {
206 const struct lwtunnel_encap_ops *ops;
207 struct nlattr *nest;
208 int ret = -EINVAL;
209 
210 if (!lwtstate)
211 return 0;
212 
213 if (lwtstate->type == LWTUNNEL_ENCAP_NONE ||
214 lwtstate->type > LWTUNNEL_ENCAP_MAX)
215 return 0;
216 
217 ret = -EOPNOTSUPP;
218 nest = nla_nest_start(skb, RTA_ENCAP);
219 rcu_read_lock();
220 ops = rcu_dereference(lwtun_encaps[lwtstate->type]);
221 if (likely(ops && ops->fill_encap))
222 ret = ops->fill_encap(skb, lwtstate);
223 rcu_read_unlock();
224 
225 if (ret)
226 goto nla_put_failure;
227 nla_nest_end(skb, nest);
228 ret = nla_put_u16(skb, RTA_ENCAP_TYPE, lwtstate->type);
229 if (ret)
230 goto nla_put_failure;
231 
232 return 0;
233 
234 nla_put_failure:
235 nla_nest_cancel(skb, nest);
236 
237 return (ret == -EOPNOTSUPP ? 0 : ret);
238 }

Generally, the return value of function nla_nest_start() should be checked
against NULL, as follows.
rtnetlink_put_metrics @@ net/core/rtnetlink.c: 
 686 int rtnetlink_put_metrics(struct sk_buff *skb, u32 *metrics)
 687 {
 688 struct nlattr *mx;
 689 int i, valid = 0;
 690 
 691 mx = nla_nest_start(skb, RTA_METRICS);
 692 if (mx == NULL)
 693 return -ENOBUFS;
 ...
 726 return nla_nest_end(skb, mx);
 727 
 728 nla_put_failure:
 729 nla_nest_cancel(skb, mx);
 730 return -EMSGSIZE;
 731 }
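
The check requested above can be modelled with a tiny userspace message
buffer. This is a sketch only: struct msg, nest_start(), nest_cancel(),
put_u16() and fill_encap() are hypothetical and do not reproduce the kernel's
netlink helpers or their signatures. The point is that the pointer returned
by the nest-start helper is tested before anything uses it, and that a later
failure rolls the message back.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MSG_MAX_ATTRS 2

struct attr {
	unsigned short type;
	unsigned short value;
};

/* Hypothetical bounded message: room for MSG_MAX_ATTRS attributes. */
struct msg {
	struct attr attrs[MSG_MAX_ATTRS];
	size_t used;
};

/* Reserve a slot for a nest header; NULL when the message is full. */
static struct attr *nest_start(struct msg *m, unsigned short type)
{
	struct attr *a;

	assert(m != NULL);
	if (m->used >= MSG_MAX_ATTRS)
		return NULL;
	a = &m->attrs[m->used++];
	a->type = type;
	a->value = 0;
	return a;
}

/* Roll the message back to where the nest began. */
static void nest_cancel(struct msg *m, const struct attr *start)
{
	m->used = (size_t)(start - m->attrs);
}

/* Append one attribute; fails when there is no room left. */
static int put_u16(struct msg *m, unsigned short type, unsigned short value)
{
	if (m->used >= MSG_MAX_ATTRS)
		return -EMSGSIZE;
	m->attrs[m->used].type = type;
	m->attrs[m->used].value = value;
	m->used++;
	return 0;
}

/* Fill a nest with one attribute, checking the nest pointer first. */
static int fill_encap(struct msg *m, unsigned short encap_type)
{
	struct attr *nest = nest_start(m, 1);

	if (!nest)
		return -EMSGSIZE;	/* checked before any use */
	if (put_u16(m, 2, encap_type)) {
		nest_cancel(m, nest);	/* safe: nest is known non-NULL */
		return -EMSGSIZE;
	}
	return 0;
}
```

When the message is already full, fill_encap() returns early and never
reaches the cancel path with a NULL nest pointer.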


Thanks very much for your attention!

Pan Bian

-- 
You are receiving this mail because:
You are the assignee for the bug.


Fw: [Bug 195503] New: tipc: unchecked return value of nlmsg_new() in function tipc_nl_node_get_monitor()

2017-04-22 Thread Stephen Hemminger


Begin forwarded message:

Date: Sat, 22 Apr 2017 14:56:25 +
From: bugzilla-dae...@bugzilla.kernel.org
To: step...@networkplumber.org
Subject: [Bug 195503] New: tipc: unchecked return value of nlmsg_new() in 
function tipc_nl_node_get_monitor()


https://bugzilla.kernel.org/show_bug.cgi?id=195503

Bug ID: 195503
   Summary: tipc: unchecked return value of nlmsg_new() in
function tipc_nl_node_get_monitor()
   Product: Networking
   Version: 2.5
Kernel Version: linux-4.11-rc7
  Hardware: All
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: Other
  Assignee: step...@networkplumber.org
  Reporter: bianpan2...@ruc.edu.cn
Regression: No

Function nlmsg_new() will return a NULL pointer if there is not enough memory.
In function tipc_nl_node_get_monitor(), the return value of nlmsg_new() is not
checked (see line 2100), which may result in bad memory access. 
tipc_nl_node_get_monitor @@ net/tipc/node.c
2094 int tipc_nl_node_get_monitor(struct sk_buff *skb, struct genl_info *info)
2095 {
2096 struct net *net = sock_net(skb->sk);
2097 struct tipc_nl_msg msg;
2098 int err;
2099 
2100 msg.skb = nlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);
2101 msg.portid = info->snd_portid;
2102 msg.seq = info->snd_seq;
2103 
2104 err = __tipc_nl_add_monitor_prop(net, &msg);
2105 if (err) {
2106 nlmsg_free(msg.skb);
2107 return err;
2108 }
2109 
2110 return genlmsg_reply(msg.skb, info);
2111 }

Generally, the return value of nlmsg_new() should be checked against NULL, as
follows.
nfc_genl_target_lost @@ net/nfc/netlink.c: 
 213 int nfc_genl_target_lost(struct nfc_dev *dev, u32 target_idx)
 214 {
 215 struct sk_buff *msg;
 216 void *hdr;
 217 
 218 msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
 219 if (!msg)
 220 return -ENOMEM;
 ...
 237 nla_put_failure:
 238 genlmsg_cancel(msg, hdr);
 239 free_msg:
 240 nlmsg_free(msg);
 241 return -EMSGSIZE;
 242 }


Thanks very much for your attention!

Pan Bian

-- 
You are receiving this mail because:
You are the assignee for the bug.


Fw: [Bug 195497] New: openvswitch: unchecked return value of nla_nest_start() in function queue_userspace_packet()

2017-04-22 Thread Stephen Hemminger


Begin forwarded message:

Date: Sat, 22 Apr 2017 14:52:46 +
From: bugzilla-dae...@bugzilla.kernel.org
To: step...@networkplumber.org
Subject: [Bug 195497] New: openvswitch: unchecked return value of 
nla_nest_start() in function queue_userspace_packet()


https://bugzilla.kernel.org/show_bug.cgi?id=195497

Bug ID: 195497
   Summary: openvswitch: unchecked return value of
nla_nest_start() in function queue_userspace_packet()
   Product: Networking
   Version: 2.5
Kernel Version: linux-4.11-rc7
  Hardware: All
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: Other
  Assignee: step...@networkplumber.org
  Reporter: bianpan2...@ruc.edu.cn
Regression: No

Function nla_nest_start() may return a NULL pointer on error. However, in
function queue_userspace_packet(), the return value of nla_nest_start() is not
checked against NULL (see lines 488 and 496), and may result in bad memory
access. Related code snippets are shown as follows.

queue_userspace_packet @@ net/openvswitch/datapath.c:420
 420 static int queue_userspace_packet(struct datapath *dp, struct sk_buff
*skb,
 421   const struct sw_flow_key *key,
 422   const struct dp_upcall_info *upcall_info,
 423   uint32_t cutlen)
 424 {
 425 struct ovs_header *upcall;
 ...
 468 len = upcall_msg_size(upcall_info, hlen - cutlen);
 469 user_skb = genlmsg_new(len, GFP_ATOMIC);
 470 if (!user_skb) {
 471 err = -ENOMEM;
 472 goto out;
 473 }
 474 
 475 upcall = genlmsg_put(user_skb, 0, 0, &dp_packet_genl_family,
 476  0, upcall_info->cmd);
 477 upcall->dp_ifindex = dp_ifindex;
 ...
 487 if (upcall_info->egress_tun_info) {
 488 nla = nla_nest_start(user_skb, OVS_PACKET_ATTR_EGRESS_TUN_KEY);
 489 err = ovs_nla_put_tunnel_info(user_skb,
 490   upcall_info->egress_tun_info);
 491 BUG_ON(err);
 492 nla_nest_end(user_skb, nla);
 493 }
 494 
 495 if (upcall_info->actions_len) {
 496 nla = nla_nest_start(user_skb, OVS_PACKET_ATTR_ACTIONS);
 497 err = ovs_nla_put_actions(upcall_info->actions,
 498   upcall_info->actions_len,
 499   user_skb);
 500 if (!err)
 501 nla_nest_end(user_skb, nla);
 502 else
 503 nla_nest_cancel(user_skb, nla);
 504 }
 ...
 545 out:
 546 if (err)
 547 skb_tx_error(skb);
 548 kfree_skb(user_skb);
 549 kfree_skb(nskb);
 550 return err;
 551 }

Generally, the return value of function nla_nest_start() should be checked
against NULL, as follows.
rtnetlink_put_metrics @@ net/core/rtnetlink.c: 
 686 int rtnetlink_put_metrics(struct sk_buff *skb, u32 *metrics)
 687 {
 688 struct nlattr *mx;
 689 int i, valid = 0;
 690 
 691 mx = nla_nest_start(skb, RTA_METRICS);
 692 if (mx == NULL)
 693 return -ENOBUFS;
 ...
 726 return nla_nest_end(skb, mx);
 727 
 728 nla_put_failure:
 729 nla_nest_cancel(skb, mx);
 730 return -EMSGSIZE;
 731 }


Thanks very much for your attention!

Pan Bian

-- 
You are receiving this mail because:
You are the assignee for the bug.


Re: compile issue in latest iproute2

2017-04-22 Thread Jamal Hadi Salim

On 17-04-22 12:18 PM, Daniel Borkmann wrote:
[..]


Anything I'm missing?



Let me get back to that machine (couple of hours) and try to see how I
created the issue.
Should've cut and pasted the error msg. Can't recreate it on this laptop.

cheers,
jamal


[PATCH v2 net] net: ipv6: regenerate host route if moved to gc list

2017-04-22 Thread David Ahern
Taking down the loopback device wreaks havoc on IPv6 routes. By
extension, taking down a VRF device wreaks havoc on its table.

Dmitry and Andrey both reported heap out-of-bounds accesses in the IPv6
FIB code while running the syzkaller fuzzer. The root cause is that a dead
dst on the garbage list gets reinserted into the IPv6 FIB. While on the gc
list (or perhaps when it gets added to the gc list) the dst->next is
set to an IPv4 dst. A subsequent walk of the ipv6 tables causes the
out-of-bounds access.

Andrey's reproducer was the key to getting to the bottom of this.

With IPv6, host routes for an address have the dst->dev set to the
loopback device. When the 'lo' device is taken down, rt6_ifdown initiates
a walk of the fib evicting routes with the 'lo' device which means all
host routes are removed. That process moves the dst which is attached to
an inet6_ifaddr to the gc list and marks it as dead.

The recent change to keep global IPv6 addresses added a new function
fixup_permanent_addr that is called on admin up. That function restarts
dad for an inet6_ifaddr and when it completes the host route attached
to it is inserted into the fib. Since the route was marked dead and
moved to the gc list, we get the reported out-of-bounds accesses. If
the device with the address is taken down or the address is removed, the
WARN_ON in fib6_del is triggered.

All of those faults are fixed by regenerating the host route if the
existing one has been moved to the gc list, something that can be
determined by checking if the rt6i_ref counter is 0.
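
As a rough userspace model of that rule (not the kernel code): a host route
is reused only while it is still referenced by the FIB, otherwise a fresh one
is allocated, swapped in, and the old reference dropped. struct route,
struct ifaddr, route_alloc(), route_put() and fixup_addr() are hypothetical;
fib_ref stands in for rt6i_ref, users for the dst reference count, and the
locking around the swap is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct route {
	int fib_ref;	/* stands in for rt6i_ref: 0 means not in the FIB */
	int users;	/* stands in for the dst reference count */
};

struct ifaddr {
	struct route *rt;
};

/* Allocate a route holding one reference for the address that owns it. */
static struct route *route_alloc(void)
{
	struct route *rt = calloc(1, sizeof(*rt));

	if (rt)
		rt->users = 1;
	return rt;
}

/* Drop one reference; free the route when the last one goes away. */
static void route_put(struct route *rt)
{
	if (rt && --rt->users == 0)
		free(rt);
}

/* Regenerate the host route if it is missing or no longer in the FIB. */
static int fixup_addr(struct ifaddr *ifp)
{
	assert(ifp != NULL);

	if (!ifp->rt || ifp->rt->fib_ref == 0) {
		struct route *rt = route_alloc();
		struct route *prev;

		if (!rt)
			return -ENOMEM;

		/* The kernel patch performs this swap under ifp->lock. */
		prev = ifp->rt;
		ifp->rt = rt;
		route_put(prev);
	}
	return 0;
}
```

A route whose fib_ref is still non-zero is left in place, so the common case
does no allocation at all.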

Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional")
Reported-by: Dmitry Vyukov 
Reported-by: Andrey Konovalov 
Signed-off-by: David Ahern 
---
v2
- change ifp->rt under spinlock vs cmpxchg
- add comment about rt6i_ref == 0

Dmitry / Andrey: can you guys add this patch to your tree and run
syzkaller tests? I'd like to confirm that all of the fib traces
are fixed. Thanks.

 net/ipv6/addrconf.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 08f9e8ea7a81..97e86158bbcb 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3303,14 +3303,24 @@ static void addrconf_gre_config(struct net_device *dev)
 static int fixup_permanent_addr(struct inet6_dev *idev,
struct inet6_ifaddr *ifp)
 {
-   if (!ifp->rt) {
-   struct rt6_info *rt;
+   /* rt6i_ref == 0 means the host route was removed from the
+* FIB, for example, if 'lo' device is taken down. In that
+* case regenerate the host route.
+*/
+   if (!ifp->rt || !atomic_read(&ifp->rt->rt6i_ref)) {
+   struct rt6_info *rt, *prev;
 
rt = addrconf_dst_alloc(idev, &ifp->addr, false);
if (unlikely(IS_ERR(rt)))
return PTR_ERR(rt);
 
+   spin_lock(&ifp->lock);
+   prev = ifp->rt;
ifp->rt = rt;
+   spin_unlock(&ifp->lock);
+
+   if (prev)
+   ip6_rt_put(prev);
}
 
if (!(ifp->flags & IFA_F_NOPREFIXROUTE)) {
-- 
2.1.4



[PATCH net-next] net: add rcu locking when changing early demux

2017-04-22 Thread David Ahern
systemd-sysctl is triggering a suspicious RCU usage message when
net.ipv4.tcp_early_demux or net.ipv4.udp_early_demux is changed via
a sysctl config file:

[   33.896184] ===
[   33.899558] [ ERR: suspicious RCU usage.  ]
[   33.900624] 4.11.0-rc7+ #104 Not tainted
[   33.901698] ---
[   33.903059] /home/dsa/kernel-2.git/net/ipv4/sysctl_net_ipv4.c:305 suspicious 
rcu_dereference_check() usage!
[   33.905724]
other info that might help us debug this:

[   33.907656]
rcu_scheduler_active = 2, debug_locks = 0
[   33.909288] 1 lock held by systemd-sysctl/143:
[   33.910373]  #0:  (sb_writers#5){.+.+.+}, at: [] 
file_start_write+0x45/0x48
[   33.912407]
stack backtrace:
[   33.914018] CPU: 0 PID: 143 Comm: systemd-sysctl Not tainted 4.11.0-rc7+ #104
[   33.915631] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.7.5-20140531_083030-gandalf 04/01/2014
[   33.917870] Call Trace:
[   33.918431]  dump_stack+0x81/0xb6
[   33.919241]  lockdep_rcu_suspicious+0x10f/0x118
[   33.920263]  proc_configure_early_demux+0x65/0x10a
[   33.921391]  proc_udp_early_demux+0x3a/0x41

add rcu locking to proc_configure_early_demux.

Fixes: dddb64bcb3461 ("net: Add sysctl to toggle early demux for tcp and udp")
Signed-off-by: David Ahern 
---
 net/ipv4/sysctl_net_ipv4.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 6fb25693c00b..ddac9e64b702 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -302,6 +302,8 @@ static void proc_configure_early_demux(int enabled, int protocol)
struct inet6_protocol *ip6prot;
 #endif
 
+   rcu_read_lock();
+
ipprot = rcu_dereference(inet_protos[protocol]);
if (ipprot)
ipprot->early_demux = enabled ? ipprot->early_demux_handler :
@@ -313,6 +315,7 @@ static void proc_configure_early_demux(int enabled, int protocol)
ip6prot->early_demux = enabled ? ip6prot->early_demux_handler :
 NULL;
 #endif
+   rcu_read_unlock();
 }
 
 static int proc_tcp_early_demux(struct ctl_table *table, int write,
-- 
2.1.4



Re: compile issue in latest iproute2

2017-04-22 Thread Daniel Borkmann

On 04/22/2017 05:00 PM, Daniel Borkmann wrote:

On 04/22/2017 02:31 PM, Jamal Hadi Salim wrote:


I don't think this is a kernel uapi - but it was failing to compile
when HAVE_ELF is false.
-
jhs@jhs-UX:~/git-trees/others/iproute-with-ck$ git diff include/bpf_util.h
diff --git a/include/bpf_util.h b/include/bpf_util.h
index 5361dab..edca339 100644
--- a/include/bpf_util.h
+++ b/include/bpf_util.h
@@ -266,7 +266,7 @@ int bpf_send_map_fds(const char *path, const char *obj);
  int bpf_recv_map_fds(const char *path, int *fds, struct bpf_map_aux *aux,
  unsigned int entries);
  #else
-static inline int bpf_send_map_fds(const char *path, const char *obj)
+inline int bpf_send_map_fds(const char *path, const char *obj)
  {
 return 0;
  }
-

Let me know if you want a formal patch or feel free to take it.


Will resolve it and send a patch later today, thanks!


Hmm, I'm on latest f443565f8df6 ("ip vrf: Add command name next to
pid") commit in master branch. Compiles fine for me with and without
ELF support. I verified that there's no HAVE_ELF defined and I'm
not seeing an error.

Without ELF support:

# ./configure
TC schedulers
 ATM no

libc has setns: yes
SELinux support: yes
ELF support: no
libmnl support: yes
Berkeley DB: yes

docs: latex: yes
 pdflatex: yes
 sgml2latex: no
 WARNING: no LaTeX files can be build from SGML files
 sgml2html: no
 WARNING: no HTML docs can be built from SGML

# make > /dev/null
ssfilter.y: conflicts: 35 shift/reduce
#

With ELF support:

# ./configure
TC schedulers
 ATM no

libc has setns: yes
SELinux support: yes
ELF support: yes
libmnl support: yes
Berkeley DB: yes

docs: latex: yes
 pdflatex: yes
 sgml2latex: no
 WARNING: no LaTeX files can be build from SGML files
 sgml2html: no
 WARNING: no HTML docs can be built from SGML

# make > /dev/null
ssfilter.y: conflicts: 35 shift/reduce
#

Anything I'm missing?


[PATCH v2 net-next] net: ipv6: send unsolicited NA if enabled for all interfaces

2017-04-22 Thread David Ahern
When arp_notify is set to 1 for either a specific interface or for 'all'
interfaces, gratuitous arp requests are sent. Since ndisc_notify is the
ipv6 equivalent to arp_notify, it should follow the same semantics.
Commit 4a6e3c5def13 ("net: ipv6: send unsolicited NA on admin up") sends
the NA on admin up. The final piece is checking devconf_all->ndisc_notify
in addition to the per device setting. Add it.

Fixes: 5cb04436eef6 ("ipv6: add knob to send unsolicited ND on link-layer 
address change")
Signed-off-by: David Ahern 
---
v2
- update commit message with subject of commit 4a6e3c5def13 per comment
  from Sergei

 net/ipv6/ndisc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index b23822e64228..d310dc41209a 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -1753,7 +1753,8 @@ static int ndisc_netdev_event(struct notifier_block *this, unsigned long event,
idev = in6_dev_get(dev);
if (!idev)
break;
-   if (idev->cnf.ndisc_notify)
+   if (idev->cnf.ndisc_notify ||
+   net->ipv6.devconf_all->ndisc_notify)
ndisc_send_unsol_na(dev);
in6_dev_put(idev);
break;
-- 
2.1.4



Re: Why max netlink msg size is limited to 16k

2017-04-22 Thread Eric Dumazet
On Sat, 2017-04-22 at 19:43 +0530, prashantkumar dhotre wrote:
> I am observing that max netlink msg that my kernel module can send to
> user app is close to 16K.
> 
> For larger sizes, genlmsg_unicast() succeeds but my app does not receive data.
> 
> I have tried increasing RECV buffer size in my user app but that does not 
> help.
> 
> Regards


You need a kernel >= linux-4.9 to get about 32KB 

Why is this limited ? Please read 

https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=d35c99ff77ecb2eb239731b799386f3b3637a31e





Re: [PATCH 2/2] sparc64: Add eBPF JIT.

2017-04-22 Thread Alexei Starovoitov
On Fri, Apr 21, 2017 at 08:17:11PM -0700, David Miller wrote:
> 
> This is an eBPF JIT for sparc64.  All major features are supported.
> 
> All tests under tools/testing/selftests/bpf/ pass.
> 
> Signed-off-by: David S. Miller 
...
> + /* tail call */
> + case BPF_JMP | BPF_CALL |BPF_X:
> + emit_tail_call(ctx);
> +

I think 'break;' is missing here.
When tail_call's target program is null the current program should
continue instead of aborting.
Like in our current ddos+lb setup the program looks like:
 bpf_tail_call(ctx, &prog_array, 1);
 bpf_tail_call(ctx, &prog_array, 2);
 bpf_tail_call(ctx, &prog_array, 3);
 return XDP_DROP;

this way it will jump into the program that is installed in slot 1.
If it's empty, it will try slot 2...
If no programs installed it will drop the packet.
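
The fall-through behaviour described above can be modelled in plain C. This is a sketch, not the JIT or the BPF runtime: `tail_call()`, `dispatch()`, `pass_prog()` and the `prog_fn` type are hypothetical names, and the `XDP_*` values simply mirror the kernel's enum xdp_action.

```c
#include <stddef.h>

/* Values mirror the kernel's enum xdp_action. */
enum { XDP_DROP = 1, XDP_PASS = 2 };

/* Hypothetical program type: takes a context, returns a verdict. */
typedef int (*prog_fn)(void *ctx);

/* Example program that could be installed in a slot. */
int pass_prog(void *ctx)
{
    (void)ctx;
    return XDP_PASS;
}

/* Model of a tail call: if the slot holds a program, run it and report
 * that control was transferred (1). If the slot is empty or out of
 * range, return 0 so the caller simply continues. */
int tail_call(void *ctx, prog_fn *prog_array, size_t nslots,
              size_t slot, int *result)
{
    if (slot >= nslots || !prog_array[slot])
        return 0;
    *result = prog_array[slot](ctx);
    return 1;
}

/* Try slots 1, 2 and 3 in order; drop the packet if all are empty. */
int dispatch(void *ctx, prog_fn *prog_array, size_t nslots)
{
    int result;

    for (size_t slot = 1; slot <= 3; slot++)
        if (tail_call(ctx, prog_array, nslots, slot, &result))
            return result;
    return XDP_DROP;
}
```

An empty slot is not an error in this model: the caller keeps going, which is why the JIT case needs to continue with the next instruction rather than abort.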

> + /* function return */
> + case BPF_JMP | BPF_EXIT:
> + /* Optimization: when last instruction is EXIT,
> +simply fallthrough to epilogue. */
> + if (i == ctx->prog->len - 1)
> + break;
> + emit_branch(BA, ctx->idx, ctx->epilogue_offset, ctx);
> + emit_nop(ctx);



Re: compile issue in latest iproute2

2017-04-22 Thread Daniel Borkmann

On 04/22/2017 02:31 PM, Jamal Hadi Salim wrote:


I don't think this is a kernel uapi - but it was failing to compile
when HAVE_ELF is false.
-
jhs@jhs-UX:~/git-trees/others/iproute-with-ck$ git diff include/bpf_util.h
diff --git a/include/bpf_util.h b/include/bpf_util.h
index 5361dab..edca339 100644
--- a/include/bpf_util.h
+++ b/include/bpf_util.h
@@ -266,7 +266,7 @@ int bpf_send_map_fds(const char *path, const char *obj);
  int bpf_recv_map_fds(const char *path, int *fds, struct bpf_map_aux *aux,
  unsigned int entries);
  #else
-static inline int bpf_send_map_fds(const char *path, const char *obj)
+inline int bpf_send_map_fds(const char *path, const char *obj)
  {
 return 0;
  }
-

Let me know if you want a formal patch or feel free to take it.


Will resolve it and send a patch later today, thanks!


Re: [PATCH net] net: ipv6: regenerate host route if moved to gc list

2017-04-22 Thread David Ahern
On 4/22/17 3:14 AM, Dmitry Vyukov wrote:
>> One small question.  Why cmpxchg is needed instead
>> of a ip6_rt_put() and then assign?
>> Is it fixing another bug?
> cmpxchg here looks fishy.
> If there are no concurrent modifications, then it is not needed.
> If there are and cmpxchg fails, then we will put the installed rt and
> leak the new one.
> 

Yes, I need to convert that to changing the rt under a lock.

Leftover from the beginning of the investigation when I suspected
locking and recalled Li's patch. I'll send a v2.


Why max netlink msg size is limited to 16k

2017-04-22 Thread prashantkumar dhotre
I am observing that max netlink msg that my kernel module can send to
user app is close to 16K.

For larger sizes, genlmsg_unicast() succeeds but my app does not receive data.

I have tried increasing RECV buffer size in my user app but that does not help.

Regards


[PATCH] orinoco: fix spelling mistake: "Registerred" -> "Registered"

2017-04-22 Thread Colin King
From: Colin Ian King 

Trivial fix to a spelling mistake in a dev_dbg message

Signed-off-by: Colin Ian King 
---
 drivers/net/wireless/intersil/orinoco/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/intersil/orinoco/main.c b/drivers/net/wireless/intersil/orinoco/main.c
index 28cf97489001..d9128bb25e85 100644
--- a/drivers/net/wireless/intersil/orinoco/main.c
+++ b/drivers/net/wireless/intersil/orinoco/main.c
@@ -2283,7 +2283,7 @@ int orinoco_if_add(struct orinoco_private *priv,
priv->ndev = dev;
 
/* Report what we've done */
-   dev_dbg(priv->dev, "Registerred interface %s.\n", dev->name);
+   dev_dbg(priv->dev, "Registered interface %s.\n", dev->name);
 
return 0;
 
-- 
2.11.0



pull request: bluetooth-next 2017-04-22

2017-04-22 Thread Johan Hedberg
Hi Dave,

Here are some more Bluetooth patches (and one 802.15.4 patch) in the
bluetooth-next tree targeting the 4.12 kernel. Most of them are pure
fixes.

Please let me know if there are any issues pulling. Thanks.

Johan

---
The following changes since commit fb796707d7a6c9b24fdf80b9b4f24fa5ffcf0ec5:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2017-04-21 
20:23:53 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git 
for-upstream

for you to fetch changes up to d160b74da85a4ec072b076db056e27ba7246eba0:

  Bluetooth: hci_ldisc: Add missing clear HCI_UART_PROTO_READY (2017-04-22 
10:28:40 +0200)


Arnd Bergmann (2):
  Bluetooth: try to improve CONFIG_SERIAL_DEV_BUS dependency
  ieee802154: don't select COMMON_CLK

Dean Jenkins (3):
  Bluetooth: hci_ldisc: Add missing return in hci_uart_init_work()
  Bluetooth: hci_ldisc: Ensure hu->hdev set to NULL before freeing hdev
  Bluetooth: hci_ldisc: Add missing clear HCI_UART_PROTO_READY

Sebastian Reichel (1):
  Bluetooth: hci_ll: Fix NULL pointer deref on FW upload failure

 drivers/bluetooth/Kconfig  | 8 +++-
 drivers/bluetooth/Makefile | 2 +-
 drivers/bluetooth/hci_ldisc.c  | 7 ++-
 drivers/bluetooth/hci_ll.c | 3 +--
 drivers/net/ieee802154/Kconfig | 2 +-
 5 files changed, 16 insertions(+), 6 deletions(-)


signature.asc
Description: PGP signature


[PATCH v2] net: natsemi: ns83820: add checks for dma mapping error

2017-04-22 Thread Alexey Khoroshilov
The driver does not check whether mapping DMA memory succeeds.
The patch adds the checks and failure handling.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov 
---
 drivers/net/ethernet/natsemi/ns83820.c | 42 +++---
 1 file changed, 39 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/natsemi/ns83820.c 
b/drivers/net/ethernet/natsemi/ns83820.c
index 729095db3e08..dfc64e1e31f9 100644
--- a/drivers/net/ethernet/natsemi/ns83820.c
+++ b/drivers/net/ethernet/natsemi/ns83820.c
@@ -534,14 +534,19 @@ static inline int ns83820_add_rx_skb(struct ns83820 *dev, 
struct sk_buff *skb)
);
 #endif
 
+   buf = pci_map_single(dev->pci_dev, skb->data,
+REAL_RX_BUF_SIZE, PCI_DMA_FROMDEVICE);
+   if (pci_dma_mapping_error(dev->pci_dev, buf)) {
+   kfree_skb(skb);
+   return 1;
+   }
+
sg = dev->rx_info.descs + (next_empty * DESC_SIZE);
BUG_ON(NULL != dev->rx_info.skbs[next_empty]);
dev->rx_info.skbs[next_empty] = skb;
 
dev->rx_info.next_empty = (next_empty + 1) % NR_RX_DESC;
cmdsts = REAL_RX_BUF_SIZE | CMDSTS_INTR;
-   buf = pci_map_single(dev->pci_dev, skb->data,
-REAL_RX_BUF_SIZE, PCI_DMA_FROMDEVICE);
build_rx_desc(dev, sg, 0, buf, cmdsts, 0);
/* update link of previous rx */
if (likely(next_empty != dev->rx_info.next_rx))
@@ -1068,6 +1073,7 @@ static netdev_tx_t ns83820_hard_start_xmit(struct sk_buff 
*skb,
int stopped = 0;
int do_intr = 0;
volatile __le32 *first_desc;
+   volatile __le32 *desc;
 
dprintk("ns83820_hard_start_xmit\n");
 
@@ -1136,11 +1142,13 @@ static netdev_tx_t ns83820_hard_start_xmit(struct 
sk_buff *skb,
if (nr_frags)
len -= skb->data_len;
buf = pci_map_single(dev->pci_dev, skb->data, len, PCI_DMA_TODEVICE);
+   if (pci_dma_mapping_error(dev->pci_dev, buf))
+   goto dma_error_first;
 
first_desc = dev->tx_descs + (free_idx * DESC_SIZE);
 
for (;;) {
-   volatile __le32 *desc = dev->tx_descs + (free_idx * DESC_SIZE);
+   desc = dev->tx_descs + (free_idx * DESC_SIZE);
 
dprintk("frag[%3u]: %4u @ 0x%08Lx\n", free_idx, len,
(unsigned long long)buf);
@@ -1160,6 +1168,8 @@ static netdev_tx_t ns83820_hard_start_xmit(struct sk_buff 
*skb,
 
buf = skb_frag_dma_map(&dev->pci_dev->dev, frag, 0,
   skb_frag_size(frag), DMA_TO_DEVICE);
+   if (dma_mapping_error(&dev->pci_dev->dev, buf))
+   goto dma_error;
dprintk("frag: buf=%08Lx  page=%08lx offset=%08lx\n",
(long long)buf, (long) page_to_pfn(frag->page),
frag->page_offset);
@@ -1183,6 +1193,32 @@ static netdev_tx_t ns83820_hard_start_xmit(struct 
sk_buff *skb,
netif_start_queue(ndev);
 
return NETDEV_TX_OK;
+
+dma_error:
+   do {
+   free_idx = (free_idx + NR_TX_DESC - 1) % NR_TX_DESC;
+   desc = dev->tx_descs + (free_idx * DESC_SIZE);
+   cmdsts = le32_to_cpu(desc[DESC_CMDSTS]);
+   len = cmdsts & CMDSTS_LEN_MASK;
+   buf = desc_addr_get(desc + DESC_BUFPTR);
+   if (desc == first_desc)
+   pci_unmap_single(dev->pci_dev,
+   buf,
+   len,
+   PCI_DMA_TODEVICE);
+   else
+   pci_unmap_page(dev->pci_dev,
+   buf,
+   len,
+   PCI_DMA_TODEVICE);
+   desc[DESC_CMDSTS] = cpu_to_le32(0);
+   mb();
+   } while (desc != first_desc);
+
+dma_error_first:
+   dev_kfree_skb_any(skb);
+   ndev->stats.tx_errors++;
+   return NETDEV_TX_OK;
 }
 
 static void ns83820_update_stats(struct ns83820 *dev)
-- 
2.7.4



[PATCH iproute2 1/1] actions: Add support for user cookies

2017-04-22 Thread Jamal Hadi Salim
From: Jamal Hadi Salim 

Make use of 128b user cookies

Introduce optional 128-bit action cookie.
Like all other cookie schemes in the networking world (eg in protocols
like http or existing kernel fib protocol field, etc) the idea is to
save user state that when retrieved serves as a correlator. The kernel
_should not_ interpret it. The user can store whatever they wish in the
128 bits.
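
As an illustration of the user-side handling, the sketch below turns a hex string such as a1b2c3d4 into at most 16 cookie bytes. It is not the code from this patch (which relies on hex2mem): `parse_cookie()`, `hex_nibble()` and `COOKIE_MAX` are hypothetical, and rejecting odd-length strings is an assumption of the sketch.

```c
#include <stddef.h>
#include <string.h>

#define COOKIE_MAX 16       /* 128 bits, the limit described above */

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/* Parse hex into out. Returns the number of bytes written, or -1 if the
 * string is empty, has odd length, is too long, or is not hex. */
int parse_cookie(const char *hex, unsigned char out[COOKIE_MAX])
{
    size_t slen = strlen(hex);

    if (slen == 0 || slen % 2 || slen > COOKIE_MAX * 2)
        return -1;

    for (size_t i = 0; i < slen / 2; i++) {
        int hi = hex_nibble(hex[2 * i]);
        int lo = hex_nibble(hex[2 * i + 1]);

        if (hi < 0 || lo < 0)
            return -1;
        out[i] = (unsigned char)((hi << 4) | lo);
    }
    return (int)(slen / 2);
}
```

The return value is the byte count, not the string length, so an 8-character string yields a 4-byte cookie.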

Sample exercise(showing variable length use of cookie)

.. create an accept action with cookie a1b2c3d4
sudo $TC actions add action ok index 1 cookie a1b2c3d4

.. dump all gact actions..
sudo $TC -s actions ls action gact

action order 0: gact action pass
 random type none pass val 0
 index 1 ref 1 bind 0 installed 5 sec used 5 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
cookie a1b2c3d4

.. bind the accept action to a filter..
sudo $TC filter add dev lo parent : protocol ip prio 1 \
u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 1

... send some traffic..
$ ping 127.0.0.1 -c 3
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.020 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.027 ms
64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.038 ms

Signed-off-by: Jamal Hadi Salim 
---
 tc/m_action.c | 49 +++--
 1 file changed, 43 insertions(+), 6 deletions(-)

diff --git a/tc/m_action.c b/tc/m_action.c
index 05ef07e..6ebe85e 100644
--- a/tc/m_action.c
+++ b/tc/m_action.c
@@ -150,18 +150,19 @@ new_cmd(char **argv)
 
 }
 
-int
-parse_action(int *argc_p, char ***argv_p, int tca_id, struct nlmsghdr *n)
+int parse_action(int *argc_p, char ***argv_p, int tca_id, struct nlmsghdr *n)
 {
int argc = *argc_p;
char **argv = *argv_p;
struct rtattr *tail, *tail2;
char k[16];
+   int act_ck_len = 0;
int ok = 0;
int eap = 0; /* expect action parameters */
 
int ret = 0;
int prio = 0;
+   unsigned char act_ck[TC_COOKIE_MAX_SIZE];
 
if (argc <= 0)
return -1;
@@ -215,16 +216,44 @@ done0:
addattr_l(n, MAX_MSG, ++prio, NULL, 0);
addattr_l(n, MAX_MSG, TCA_ACT_KIND, k, strlen(k) + 1);
 
-   ret = a->parse_aopt(a, &argc, &argv, TCA_ACT_OPTIONS, 
n);
+   ret = a->parse_aopt(a, &argc, &argv, TCA_ACT_OPTIONS,
+   n);
 
if (ret < 0) {
fprintf(stderr, "bad action parsing\n");
goto bad_val;
}
+
+   if (*argv && strcmp(*argv, "cookie") == 0) {
+   size_t slen;
+
+   NEXT_ARG();
+   slen = strlen(*argv);
+   if (slen > TC_COOKIE_MAX_SIZE * 2) {
+   char cookie_err_m[128];
+
+   snprintf(cookie_err_m, 128,
+"%zd Max allowed size %d",
+slen, TC_COOKIE_MAX_SIZE*2);
+   invarg(cookie_err_m, *argv);
+   }
+
+   if (hex2mem(*argv, act_ck, slen / 2) < 0)
+   invarg("cookie must be a hex string\n",
+  *argv);
+
+   act_ck_len = slen;
+   argc--;
+   argv++;
+   }
+
+   if (act_ck_len)
+   addattr_l(n, MAX_MSG, TCA_ACT_COOKIE,
+ &act_ck, act_ck_len);
+
tail->rta_len = (void *) NLMSG_TAIL(n) - (void *) tail;
ok++;
}
-
}
 
if (eap > 0) {
@@ -245,8 +274,7 @@ bad_val:
return -1;
 }
 
-static int
-tc_print_one_action(FILE *f, struct rtattr *arg)
+static int tc_print_one_action(FILE *f, struct rtattr *arg)
 {
 
struct rtattr *tb[TCA_ACT_MAX + 1];
@@ -274,8 +302,17 @@ tc_print_one_action(FILE *f, struct rtattr *arg)
return err;
 
if (show_stats && tb[TCA_ACT_STATS]) {
+
fprintf(f, "\tAction statistics:\n");
print_tcstats2_attr(f, tb[TCA_ACT_STATS], "\t", NULL);
+   if (tb[TCA_ACT_COOKIE]) {
+   int strsz = RTA_PAYLOAD(tb[TCA_ACT_COOKIE]);
+   char b1[strsz+1];
+
+   fprintf(f, "\n\tcookie len %d %s ", strsz,
+   hexstring_n2a(RTA_DATA(tb[TCA_ACT_COOKIE]),
+ strsz, b1, sizeof(b1)));
+   }
 

compile issue in latest iproute2

2017-04-22 Thread Jamal Hadi Salim


I don't think this is a kernel uapi - but it was failing to compile
when HAVE_ELF is false.
-
jhs@jhs-UX:~/git-trees/others/iproute-with-ck$ git diff include/bpf_util.h
diff --git a/include/bpf_util.h b/include/bpf_util.h
index 5361dab..edca339 100644
--- a/include/bpf_util.h
+++ b/include/bpf_util.h
@@ -266,7 +266,7 @@ int bpf_send_map_fds(const char *path, const char *obj);
 int bpf_recv_map_fds(const char *path, int *fds, struct bpf_map_aux *aux,
 unsigned int entries);
 #else
-static inline int bpf_send_map_fds(const char *path, const char *obj)
+inline int bpf_send_map_fds(const char *path, const char *obj)
 {
return 0;
 }
-

Let me know if you want a formal patch or feel free to take it.

cheers,
jamal


Re: [PATCH 4/4] [DO NOT MERGE] arm64: allwinner: a64: enable RTL8211E PHY workaround

2017-04-22 Thread kbuild test robot
Hi Icenowy,

[auto build test ERROR on net-next/master]
[also build test ERROR on v4.11-rc7 next-20170421]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Icenowy-Zheng/net-phy-realtek-change-macro-name-for-page-select-register/20170422-144641
config: arm64-defconfig (attached as .config)
compiler: aarch64-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
wget 
https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm64 

All errors (new ones prefixed by >>):

>> Error: arch/arm64/boot/dts/allwinner/sun50i-a64-pine64-plus.dts:52.1-9 Label 
>> or path ext_phy not found
>> FATAL ERROR: Syntax error parsing input tree

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


[PATCH] net: netcp: fix spelling mistake: "memomry" -> "memory"

2017-04-22 Thread Colin King
From: Colin Ian King 

Trivial fix to spelling mistake in dev_err message and rejoin
line.

Signed-off-by: Colin Ian King 
---
 drivers/net/ethernet/ti/netcp_ethss.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ti/netcp_ethss.c 
b/drivers/net/ethernet/ti/netcp_ethss.c
index eece3e2eec14..897176fc5043 100644
--- a/drivers/net/ethernet/ti/netcp_ethss.c
+++ b/drivers/net/ethernet/ti/netcp_ethss.c
@@ -3048,8 +3048,7 @@ static void init_secondary_ports(struct gbe_priv *gbe_dev,
for_each_child_of_node(node, port) {
slave = devm_kzalloc(dev, sizeof(*slave), GFP_KERNEL);
if (!slave) {
-   dev_err(dev,
-   "memomry alloc failed for secondary port(%s), skipping...\n",
+   dev_err(dev, "memory alloc failed for secondary port(%s), skipping...\n",
port->name);
continue;
}
-- 
2.11.0



Re: r8169: Long link becomes ready times

2017-04-22 Thread Francois Romieu
Paul Menzel  :
[...]
> The ASRock E350M1 has a Realtek ethernet controller.
> 
> It takes almost three seconds for the link to become ready. This is
> noticeable after resume from suspend, where the user wants to continue
> working but first has to wait for the network.
> 
> This test is done with Linux 4.10.
[...]
> The test below is done, removing the module, and then inserting it.
> 
> ```
> Apr 22 10:56:11.919311 myasrocke350m1 kernel: r8169 :03:00.0 eth0: 
> RTL8168e/8111e at 0xf82ad000, bc:5f:f4:c8:d3:98, XID 0c20 IRQ 26
> Apr 22 10:56:11.920631 myasrocke350m1 kernel: r8169 :03:00.0 eth0: jumbo 
> features [frames: 9200 bytes, tx checksumming: ko]
> Apr 22 10:56:11.967396 myasrocke350m1 kernel: r8169 :03:00.0 eth6: 
> renamed from eth0
> Apr 22 10:56:12.064323 myasrocke350m1 kernel: IPv6: ADDRCONF(NETDEV_UP): 
> eth6: link is not ready
> Apr 22 10:56:12.179106 myasrocke350m1 kernel: r8169 :03:00.0: firmware: 
> direct-loading firmware rtl_nic/rtl8168e-2.fw
> Apr 22 10:56:12.247858 myasrocke350m1 kernel: r8169 :03:00.0 eth6: link 
> down
> Apr 22 10:56:12.248593 myasrocke350m1 kernel: IPv6: ADDRCONF(NETDEV_UP): 
> eth6: link is not ready
> Apr 22 10:56:14.992108 myasrocke350m1 kernel: r8169 :03:00.0 eth6: link up
> Apr 22 10:56:14.993299 myasrocke350m1 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): 
> eth6: link becomes ready
> ```
> 
> Is it possible to get this well below one second?

Gross as it is, the link detection is already irq driven. Most currently
used delays - see grep -E '(sleep|delay)' drivers/.../r8169.c - are
supposed to be busy waiting loops with a moderate (well...) delay per
loop. The iteration bound is not expected to make a difference.

So, unless there is a big crawling sleep/delay hidden somewhere, this part
of the r8169 driver should not induce huge delays.

Realtek does not communicate hardware documentation, neither programming
specification nor known bugs. Some phy related pieces of material may be
found on their site but you would have to experiment a lot to check if
things can behave differently.

I also experience ~3s link down / link up transition with a 8168c when
connected to a 3Com 4200G. Same 2~3s figures with an intel 82578dc.
It looks similar with a 82574l.

I've never aimed at well below one second (500 ms ?) reliable autoneg.
The phy man may have a different vision.

-- 
Ueimor


Re: [PATCH 2/2] ipv6: don't deliver packets with zero length to raw sockets

2017-04-22 Thread Jamie Bainbridge
On Sat, Apr 22, 2017 at 12:53 AM, David Miller  wrote:
> From: Jamie Bainbridge 
> Date: Fri, 21 Apr 2017 21:18:00 +1000
>
>> I cannot see the use in delivering a skb with zero bytes after the
>> network header to a raw socket.
>
> Then it cannot be used to look at zero length UDP packets, which are
> completely legal and used.
>
> So we must deliver it.

Understood, thank you both for the clarification.

That would mean the pattern of select/ioctl/recvfrom is the incorrect
way to code an IPv6 raw socket application. I will let our user know.
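
A receive step that tolerates empty datagrams could look like the sketch below. It is an assumption-laden illustration: `drain_one()` is a hypothetical helper, and it works on any datagram socket descriptor rather than on an IPv6 raw socket specifically. The point is that readiness comes from poll() and the length comes from recv(), so a byte count of 0 from the ioctl is never used to decide whether to read.

```c
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Consume at most one pending datagram from fd without blocking.
 * Returns 1 if a datagram was read (its length, possibly 0, is stored
 * in *len_out), 0 if nothing was pending, -1 on a receive error. */
int drain_one(int fd, void *buf, size_t cap, ssize_t *len_out)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    ssize_t n;

    if (poll(&pfd, 1, 0) <= 0 || !(pfd.revents & POLLIN))
        return 0;

    /* Do not gate this read on a queued-byte count: an empty datagram
     * is still a datagram and has to be consumed. */
    n = recv(fd, buf, cap, 0);
    if (n < 0)
        return -1;

    *len_out = n;
    return 1;
}
```

A return of 1 with *len_out == 0 means a zero-length packet was delivered, which is distinct from "nothing to read".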

How about the other patch in this series? That actually is a valid bug
when skb are paged in a certain way. That patch does not change
behaviour, it just allows ioctl to return the correct result whether
data is linear or paged. Will I resubmit that patch on its own with a
revised commit message?

Jamie


Re: [PATCH net-next v4 1/2] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch

2017-04-22 Thread Jamal Hadi Salim

On 17-04-21 02:11 PM, Jamal Hadi Salim wrote:


Please bear with me. I want to make sure to get this right.

Let's say I updated the kernel today to reject transactions with
bits it didn't understand. Let's call this "old kernel". A tc that
understands/sets these bits and nothing else: call it "old tc".
3 months later:
I add one more bit setting to introduce a new feature in a new
kernel version. Let's call this "new kernel". I update tc to
understand the new bits. Call it "new tc".

The possibilities:
a) old tc + old kernel combo. No problem
b) new tc + new kernel combo. No problem.
c) old tc + new kernel combo. No problem.
d) new tc + old kernel. Rejection.

For #d, if I have a smart tc, it would retry with a new combination
which restores its behavior to the old tc level. Of course this means
apps would have to be rewritten going forward to understand these
mechanics.
The alternative is to request capabilities first and then do a
lowest common denominator request.
But even that is a lot more code and crosses user/kernel twice.

There is a simpler approach that would work going forward.
How about letting the user choose their fate? Set something, maybe
in the netlink header, to tell the kernel "if you don't understand
something I am asking for, please ignore it and do what you can".
This would maintain current behavior but would force the user to
explicitly state so.
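
The strict versus liberal check discussed above can be sketched as a small validation function. The two flag names are the ones used in the patch below; `KNOWN_FLAGS` and `check_root_flags()` are hypothetical and only illustrate the policy, they are not kernel code.

```c
#include <errno.h>
#include <stdint.h>

#define TCA_FLAG_LARGE_DUMP_ON     (1u << 0)
#define TCA_FLAG_LIBERAL_CHECK_ON  (1u << 1)

/* Bits this (hypothetical) kernel version understands. */
#define KNOWN_FLAGS (TCA_FLAG_LARGE_DUMP_ON | TCA_FLAG_LIBERAL_CHECK_ON)

/* Strict by default: unknown bits are rejected with -EINVAL. If the
 * sender set the liberal bit, unknown bits are ignored and only the
 * understood ones are reported back through *accepted. */
int check_root_flags(uint32_t flags, uint32_t *accepted)
{
    uint32_t unknown = flags & ~KNOWN_FLAGS;

    if (unknown && !(flags & TCA_FLAG_LIBERAL_CHECK_ON))
        return -EINVAL;

    *accepted = flags & KNOWN_FLAGS;
    return 0;
}
```

With this policy a "new tc" talking to an "old kernel" is rejected by default, and degrades to the old behaviour only when it explicitly asked for the liberal check.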




Tested patch that demonstrates this idea is attached.

cheers,
jamal

diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index cce0613..48d3acb 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -674,10 +674,28 @@ struct tcamsg {
unsigned char   tca__pad1;
unsigned short  tca__pad2;
 };
+
+enum {
+   TCA_ROOT_UNSPEC,
+   TCA_ROOT_TAB,
+#define TCA_ACT_TAB TCA_ROOT_TAB
+   TCA_ROOT_FLAGS,
+   TCA_ROOT_COUNT,
+   __TCA_ROOT_MAX,
+#define TCA_ROOT_MAX (__TCA_ROOT_MAX - 1)
+};
+
 #define TA_RTA(r)  ((struct rtattr*)(((char*)(r)) + NLMSG_ALIGN(sizeof(struct tcamsg))))
 #define TA_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct tcamsg))
-#define TCA_ACT_TAB 1 /* attr type must be >=1 */  
-#define TCAA_MAX 1
+/* tcamsg flags stored in attribute TCA_ROOT_FLAGS
+ *
+ * TCA_FLAG_LARGE_DUMP_ON user->kernel to request for larger than 
TCA_ACT_MAX_PRIO
+ * actions in a dump. All dump responses will contain the number of actions
+ * being dumped stored in for user app's consumption in TCA_ROOT_COUNT
+ *
+ */
+#define TCA_FLAG_LARGE_DUMP_ON (1 << 0)
+#define TCA_FLAG_LIBERAL_CHECK_ON  (1 << 1)
 
 /* New extended info filters for IFLA_EXT_MASK */
 #define RTEXT_FILTER_VF (1 << 0)
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 9ce22b7..fbe96ae 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -83,6 +83,7 @@ static int tcf_dump_walker(struct tcf_hashinfo *hinfo, struct 
sk_buff *skb,
   struct netlink_callback *cb)
 {
int err = 0, index = -1, i = 0, s_i = 0, n_i = 0;
+   u32 act_flags = cb->args[2];
struct nlattr *nest;
 
spin_lock_bh(&hinfo->lock);
@@ -111,14 +112,18 @@ static int tcf_dump_walker(struct tcf_hashinfo *hinfo, 
struct sk_buff *skb,
}
nla_nest_end(skb, nest);
n_i++;
-   if (n_i >= TCA_ACT_MAX_PRIO)
+   if (!(act_flags & TCA_FLAG_LARGE_DUMP_ON) &&
+   n_i >= TCA_ACT_MAX_PRIO)
goto done;
}
}
 done:
spin_unlock_bh(&hinfo->lock);
-   if (n_i)
+   if (n_i) {
cb->args[0] += n_i;
+   if (act_flags & TCA_FLAG_LARGE_DUMP_ON)
+   cb->args[1] = n_i;
+   }
return n_i;
 
 nla_put_failure:
@@ -993,11 +998,15 @@ static int tcf_action_add(struct net *net, struct nlattr 
*nla,
return tcf_add_notify(net, n, &actions, portid);
 }
 
+static const struct nla_policy tcaa_policy[TCA_ROOT_MAX + 1] = {
+   [TCA_ROOT_FLAGS]  = { .type = NLA_U32 },
+};
+
 static int tc_ctl_action(struct sk_buff *skb, struct nlmsghdr *n,
 struct netlink_ext_ack *extack)
 {
struct net *net = sock_net(skb->sk);
-   struct nlattr *tca[TCAA_MAX + 1];
+   struct nlattr *tca[TCA_ROOT_MAX + 1];
u32 portid = skb ? NETLINK_CB(skb).portid : 0;
int ret = 0, ovr = 0;
 
@@ -1005,7 +1014,7 @@ static int tc_ctl_action(struct sk_buff *skb, struct 
nlmsghdr *n,
!netlink_capable(skb, CAP_NET_ADMIN))
return -EPERM;
 
-   ret = nlmsg_parse(n, sizeof(struct tcamsg), tca, TCAA_MAX, NULL,
+   ret = nlmsg_parse(n, sizeof(struct tcamsg), tca, TCA_ROOT_MAX, NULL,
  extack);
if (ret < 0)
return ret;
@@ -1046,16 +1055,12 @@ static int tc_ctl_action(struct sk_buff *skb, struct 
nlmsghdr *n,
return ret;
 }

Re: [PATCH] ipvs: explicitly forbid ipv6 service/dest creation if ipv6 mod is disabled

2017-04-22 Thread Julian Anastasov

Hello,

On Thu, 20 Apr 2017, Paolo Abeni wrote:

> When creating a new ipvs service, ipv6 addresses are always accepted
> if CONFIG_IP_VS_IPV6 is enabled. On dest creation the address family
> is not explicitly checked.
> 
> This allows the user-space to configure ipvs services even if the
> system is booted with ipv6.disable=1. On specific configuration, ipvs
> can try to call ipv6 routing code at setup time, causing the kernel to
> oops due to fib6_rules_ops being NULL.
> 
> This change addresses the issue adding a check for the ipv6
> module being enabled while validating ipv6 service operations and
> adding the same validation for dest operations.
> 
> According to git history, this issue is apparently present since
> the introduction of ipv6 support, and the oops can be triggered
> since commit 09571c7ae30865ad ("IPVS: Add function to determine
> if IPv6 address is local")
> 
> Fixes: 09571c7ae30865ad ("IPVS: Add function to determine if IPv6 address is 
> local")
> Signed-off-by: Paolo Abeni 

Looks good to me but I see two places that can benefit
from such check:

- in ip_vs_genl_new_daemon() if we do not want to create IPv6 sockets
for the sync protocol in make_send_sock() and make_receive_sock().
Not sure if this can lead to crashes.

- in ip_vs_proc_sync_conn() if we do not want backup server to accept 
IPv6 conns because they may be created even when dests are missing.
We may use retc = 10 there. Not fatal but may eat memory for
conns that will not be used.

Regards

--
Julian Anastasov 


[PATCH net] ravb: Double free on error in ravb_start_xmit()

2017-04-22 Thread Dan Carpenter
If skb_put_padto() fails then it frees the skb.  I shifted that code
up a bit to make my error handling a little simpler.

Fixes: a0d2f20650e8 ("Renesas Ethernet AVB PTP clock driver")
Signed-off-by: Dan Carpenter 

diff --git a/drivers/net/ethernet/renesas/ravb_main.c 
b/drivers/net/ethernet/renesas/ravb_main.c
index 8cfc4a54f2dc..3cd7989c007d 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -1516,11 +1516,12 @@ static netdev_tx_t ravb_start_xmit(struct sk_buff *skb, 
struct net_device *ndev)
spin_unlock_irqrestore(&priv->lock, flags);
return NETDEV_TX_BUSY;
}
-   entry = priv->cur_tx[q] % (priv->num_tx_ring[q] * NUM_TX_DESC);
-   priv->tx_skb[q][entry / NUM_TX_DESC] = skb;
 
if (skb_put_padto(skb, ETH_ZLEN))
-   goto drop;
+   goto exit;
+
+   entry = priv->cur_tx[q] % (priv->num_tx_ring[q] * NUM_TX_DESC);
+   priv->tx_skb[q][entry / NUM_TX_DESC] = skb;
 
buffer = PTR_ALIGN(priv->tx_align[q], DPTR_ALIGN) +
 entry / NUM_TX_DESC * DPTR_ALIGN;
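
The ownership rule behind this fix (a padding helper that frees the buffer when it fails, so the caller must not free it again) can be shown with a small userspace model. Everything here is hypothetical: `struct buf`, `buf_pad_to()`, `xmit()`, the `simulate_oom` knob and the `free_count` counter exist only for the sketch, and 60 stands in for ETH_ZLEN.

```c
#include <stdlib.h>
#include <string.h>

/* Instrumentation for the sketch: counts how often a buffer is freed. */
int free_count;

struct buf {
    unsigned char *data;
    size_t len;
};

struct buf *buf_alloc(size_t len)
{
    struct buf *b = malloc(sizeof(*b));

    if (!b)
        return NULL;
    b->data = calloc(1, len ? len : 1);
    if (!b->data) {
        free(b);
        return NULL;
    }
    b->len = len;
    return b;
}

void buf_free(struct buf *b)
{
    free(b->data);
    free(b);
    free_count++;
}

/* Pad b with zeroes up to min_len. On failure the buffer is freed here,
 * which is the behaviour the commit message describes for
 * skb_put_padto(). */
int buf_pad_to(struct buf *b, size_t min_len, int simulate_oom)
{
    unsigned char *p;

    if (b->len >= min_len)
        return 0;

    p = simulate_oom ? NULL : realloc(b->data, min_len);
    if (!p) {
        buf_free(b);
        return -1;
    }
    memset(p + b->len, 0, min_len - b->len);
    b->data = p;
    b->len = min_len;
    return 0;
}

/* Transmit path: returns 1 if the buffer was sent, 0 if it was dropped. */
int xmit(struct buf *b, int simulate_oom)
{
    if (buf_pad_to(b, 60, simulate_oom))
        return 0;           /* b is already gone: do not free it again */

    /* ... the buffer would be handed to hardware here ... */
    buf_free(b);            /* completion */
    return 1;
}
```

The failure branch returns without touching b, which corresponds to jumping to an exit label instead of a drop label that would free the buffer a second time.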


r8169: Long link becomes ready times

2017-04-22 Thread Paul Menzel
Dear Linux folks,


The ASRock E350M1 has a Realtek ethernet controller.

It takes almost three seconds for the link to become ready. This is
noticeable after resume from suspend, where the user wants to continue
working but first has to wait for the network.

This test is done with Linux 4.10.

```
$ sudo lspci -s 3:00.0 -nn -v
03:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. 
RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 06)
Subsystem: ASRock Incorporation Motherboard (one of many) [1849:8168]
Flags: bus master, fast devsel, latency 0, IRQ 26
I/O ports at 1000 [size=256]
Memory at f0004000 (64-bit, prefetchable) [size=4K]
Memory at f000 (64-bit, prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [70] Express Endpoint, MSI 01
Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
Capabilities: [d0] Vital Product Data
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Virtual Channel
Capabilities: [160] Device Serial Number 01-00-00-00-68-4c-e0-00
Kernel driver in use: r8169
Kernel modules: r8169
```

The test below was done by removing the module and then inserting it again.

```
Apr 22 10:56:11.919311 myasrocke350m1 kernel: r8169 :03:00.0 eth0: RTL8168e/8111e at 0xf82ad000, bc:5f:f4:c8:d3:98, XID 0c20 IRQ 26
Apr 22 10:56:11.920631 myasrocke350m1 kernel: r8169 :03:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
Apr 22 10:56:11.967396 myasrocke350m1 kernel: r8169 :03:00.0 eth6: renamed from eth0
Apr 22 10:56:12.064323 myasrocke350m1 kernel: IPv6: ADDRCONF(NETDEV_UP): eth6: link is not ready
Apr 22 10:56:12.179106 myasrocke350m1 kernel: r8169 :03:00.0: firmware: direct-loading firmware rtl_nic/rtl8168e-2.fw
Apr 22 10:56:12.247858 myasrocke350m1 kernel: r8169 :03:00.0 eth6: link down
Apr 22 10:56:12.248593 myasrocke350m1 kernel: IPv6: ADDRCONF(NETDEV_UP): eth6: link is not ready
Apr 22 10:56:14.992108 myasrocke350m1 kernel: r8169 :03:00.0 eth6: link up
Apr 22 10:56:14.993299 myasrocke350m1 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth6: link becomes ready
```

Is it possible to get this well below one second?


Thanks,

Paul
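
As a worked check of the "almost three seconds" figure, the delay can be computed from the journal timestamps quoted in the log above. This is a small sketch, not part of the original report; `ts_to_us()` and `delta_us()` are hypothetical helper names, and the parser assumes a `HH:MM:SS.uuuuuu` timestamp with exactly six fractional digits, all on the same day.

```c
#include <assert.h>
#include <stdio.h>

/* Parse "HH:MM:SS.uuuuuu" into microseconds since midnight; -1 on error. */
static long long ts_to_us(const char *ts)
{
	int h, m, s;
	long us;

	if (sscanf(ts, "%d:%d:%d.%6ld", &h, &m, &s, &us) != 4)
		return -1;
	return ((h * 60LL + m) * 60LL + s) * 1000000LL + us;
}

/* Difference between two same-day timestamps, in microseconds. */
static long long delta_us(const char *from, const char *to)
{
	return ts_to_us(to) - ts_to_us(from);
}
```

For the log above, `delta_us("10:56:12.247858", "10:56:14.992108")` gives 2744250, so about 2.74 s pass between "link down" and "link up". Measured from the first r8169 line at 10:56:11.919311 to "link becomes ready" at 10:56:14.993299, the total is 3073988 microseconds, or about 3.07 s.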



Re: [RFC] change the default Kconfig value of mlx5_en

2017-04-22 Thread Ian Kumlien
On Sat, Apr 22, 2017 at 3:07 AM, Saeed Mahameed wrote:
> On Sat, Apr 22, 2017 at 3:47 AM, Ian Kumlien wrote:
>> On Sat, Apr 22, 2017 at 2:34 AM, Saeed Mahameed wrote:
>>> On Sat, Apr 22, 2017 at 2:10 AM, Ian Kumlien wrote:
>>>> Sorry,
>>>>
>>>> Back again, fighting cold, hot whiskey has been consumed...
>>>>
>>>> Something like this would perhaps be a better solution:
>>>>
>>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
>>>> index 60154a175bd3..fe192e247601 100644
>>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
>>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
>>>> @@ -1139,6 +1139,10 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
>>>>
>>>>  #ifdef CONFIG_MLX5_CORE_EN
>>>>         mlx5_eswitch_attach(dev->priv.eswitch);
>>>> +#else
>>>> +       if (MLX5_CAP_GEN(dev, port_type) == MLX5_CAP_PORT_TYPE_ETH) {
>>>> +               dev_info(&pdev->dev, "Ethernet device discovered but support not enabled in kernel.");
>>>> +       }
>>>>  #endif

>>>
>>> Currently both MLX5_CORE=n and MLX5_CORE_EN=n are the defaults; the issue
>>> you are seeing can occur only if you explicitly set MLX5_CORE=y and
>>> MLX5_CORE_EN=n. Why would someone do this if he knows he wants Ethernet
>>> support as well? IMHO this print is redundant.
>>
>> Well, I'm running a prebuilt kernel - which was configured this way,
>> and since there is no mlx5_en module and it does state that the link
>> is "Ethernet", it just looks like the driver is broken or in some kind
>> of really weird state.
>>
>>> Anyway, Are you looking for RDMA support over ethernet (RoCE) ? and
>>> you are not interested to have ethernet netdev support ?
>>
>> ? RDMA is something we'll look at in the future, right now, having the
>> nics actually work as nics is a priority ;)
>>
>
> I see, i just wanted to understand your situation :)
>
>>> if yes, I think this is something that can be achieved, but the
>>> question is do we really need this ?
>>
>> It's really weird to see the driver load, to see everything register
>> and have no feedback.
>>
>
> So, in your case you have mlx5 core support without MLX5_CORE_EN which
> provides the eswitch and netdev functionality in ethernet.

Yes

> But you will still have mlx5_ib register an RDMA interface and
> theoretically it should work, the only thing you won't see is a
> netdevice.
>
> The weird thing is that you don't see a link up on the RDMA interface,
> Leon/Matan can you please look into this ? do we really need a netdev
> to have a functioning RDMA logical link in ethernet ?

The switch we have does support RDMA, but the manual is sparse (as in
nothing really there) wrt enabling/configuring the RDMA bit, so something
might be missing.

I'll try to remember to do the same test when we set up the Mellanox switches =)

>> Including no network devices, but if you run the Infiniband commands,
>> they tell you that you are connected to Ethernet but that the device
>> is down and disabled.
>>
>> To me, down and disabled is not the same as in "Ethernet support is
>> not included" =)
>>
>> Basically, i would hate for someone else to end up in the same
>> situation since you only get guides on how to enable infiniband/RDMA
>> but what you really want to do at that point is to disable it and see
>> if that gives you your network devices back =)
>>
>
> Yes this is misleading, Maybe your kernel log warning is not so bad
> after all, but let me dig more into this.
> I will get back to you next week.

Thanks, I bet that there are better ways to do it; this one was just
one of the first ones I found =)

>> I have had similar issues with some connectx3 devices while playing at
>> home but i suspect that it's just a limitation of OFED packages
>> available for the dist I'm running.
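
The idea in the diff quoted at the top of this message, reporting at load time that Ethernet support was compiled out, can be sketched in plain C as follows. This is an illustration only: `DEMO_ETH_SUPPORT` is a hypothetical macro standing in for `CONFIG_MLX5_CORE_EN`, and `enum port_type` and `load_notice()` are made-up names, not mlx5 APIs.

```c
#include <assert.h>
#include <string.h>

/*
 * DEMO_ETH_SUPPORT is a hypothetical stand-in for CONFIG_MLX5_CORE_EN.
 * Build with -DDEMO_ETH_SUPPORT to take the first branch.
 */
enum port_type { PORT_IB, PORT_ETH };

/* Returns the notice to log at load time, or "" if there is nothing to say. */
static const char *load_notice(enum port_type type)
{
#ifdef DEMO_ETH_SUPPORT
	(void)type;
	return "";	/* Ethernet support built in: nothing to report */
#else
	if (type == PORT_ETH)
		return "Ethernet device discovered but support not enabled in kernel.";
	return "";
#endif
}
```

Built without the macro, an Ethernet port yields the notice and an InfiniBand port yields nothing, which is the feedback the thread is asking for.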


Re: [PATCH net] net: ipv6: regenerate host route if moved to gc list

2017-04-22 Thread Dmitry Vyukov
On Sat, Apr 22, 2017 at 7:57 AM, Martin KaFai Lau  wrote:
> On Fri, Apr 21, 2017 at 04:40:30PM -0700, David Ahern wrote:
>> Taking down the loopback device wreaks havoc on IPv6 routes. By
>> extension, taking down a VRF device wreaks havoc on its table.
>>
>> Dmitry and Andrey both reported heap out-of-bounds reports in the IPv6
>> FIB code while running syzkaller fuzzer. The root cause is a dead dst
>> that is on the garbage list gets reinserted into the IPv6 FIB. While on
>> the gc (or perhaps when it gets added to the gc list) the dst->next is
>> set to an IPv4 dst. A subsequent walk of the ipv6 tables causes the
>> out-of-bounds access.
> Thanks for the investigation and detailed explanation.
>
> It sounds like the dst is already in DST_OBSOLETE_DEAD during
> the second fib6_add().  Glad that the fib6_del() caught it.
>
>>
>> Andrey's reproducer was the key to getting to the bottom of this.
>>
>> With IPv6, host routes for an address have the dst->dev set to the
>> loopback device. When the 'lo' device is taken down, rt6_ifdown initiates
>> a walk of the fib evicting routes with the 'lo' device which means all
>> host routes are removed. That process moves the dst which is attached to
>> an inet6_ifaddr to the gc list and marks it as dead.
>>
>> The recent change to keep global IPv6 addresses added a new function
>> fixup_permanent_addr that is called on admin up. That function restarts
>> dad for an inet6_ifaddr and when it completes the host route attached
>> to it is inserted into the fib. Since the route was marked dead and
>> moved to the gc list, we get the reported out-of-bounds accesses. If
>> the device with the address is taken down or the address is removed, the
>> WARN_ON in fib6_del is triggered.
>>
>> All of those faults are fixed by regenerating the host route if the
>> existing one has been moved to the gc list, something that can be
>> determined by checking if the rt6i_ref counter is 0.
>>
>> The update of the route on the ifp is done using cmpxchg as suggested
>> by Li RongQing in a patch a year ago.
>>
>> Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional")
>> Reported-by: Dmitry Vyukov 
>> Reported-by: Andrey Konovalov 
>> Signed-off-by: David Ahern 
>> ---
>>  net/ipv6/addrconf.c | 8 +++++---
>>  1 file changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
>> index 08f9e8ea7a81..9328a45b 100644
>> --- a/net/ipv6/addrconf.c
>> +++ b/net/ipv6/addrconf.c
>> @@ -3303,14 +3303,16 @@ static void addrconf_gre_config(struct net_device *dev)
>>  static int fixup_permanent_addr(struct inet6_dev *idev,
>>   struct inet6_ifaddr *ifp)
>>  {
>> - if (!ifp->rt) {
>> - struct rt6_info *rt;
>> + if (!ifp->rt || !atomic_read(&ifp->rt->rt6i_ref)) {
>> + struct rt6_info *rt, *prev;
>>
>>   rt = addrconf_dst_alloc(idev, &ifp->addr, false);
>>   if (unlikely(IS_ERR(rt)))
>>   return PTR_ERR(rt);
>>
>> - ifp->rt = rt;
>> + prev = cmpxchg(&ifp->rt, ifp->rt, rt);
> One small question.  Why is cmpxchg needed instead
> of an ip6_rt_put() followed by an assignment?
> Is it fixing another bug?

cmpxchg here looks fishy.
If there are no concurrent modifications, then it is not needed.
If there are and cmpxchg fails, then we will put the installed rt and
leak the new one.




>> + if (prev)
>> + ip6_rt_put(prev);
>>   }
>>
>>   if (!(ifp->flags & IFA_F_NOPREFIXROUTE)) {
>> --
>> 2.1.4
>>
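
The concern raised in the reply above can be illustrated with a userspace sketch built on C11 atomics. It is not the kernel code and not a proposed fix: `struct route`, `route_put()` and `replace_route()` are hypothetical names, and `atomic_compare_exchange_strong()` stands in for the kernel's `cmpxchg()`. The sketch shows what a caller has to do on each outcome of the compare-and-swap if neither route is to be leaked or released while still installed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct route {
	int refs;
};

/* Drop one reference; stands in for ip6_rt_put(). */
static void route_put(struct route *r)
{
	if (r)
		r->refs--;
}

/*
 * Try to replace *slot with new_rt, expecting it to still hold 'expected'.
 * Returns 1 if new_rt was installed, 0 if another writer got there first.
 */
static int replace_route(_Atomic(struct route *) *slot,
			 struct route *expected, struct route *new_rt)
{
	struct route *seen = expected;

	if (atomic_compare_exchange_strong(slot, &seen, new_rt)) {
		/* Swap succeeded: the slot no longer refers to the old route. */
		route_put(expected);
		return 1;
	}

	/*
	 * Swap failed: 'seen' is the route that is still installed, so it
	 * must not be released here. The new route was never published and
	 * has to be dropped instead, otherwise it leaks.
	 */
	route_put(new_rt);
	return 0;
}
```

In the quoted patch the value returned by `cmpxchg()` is released whenever it is non-NULL. On a failed swap that value is the route still installed in `ifp->rt`, which matches the "put the installed rt and leak the new one" scenario described in the reply.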


Re: [PATCH 3/4] net: phy: realtek: add disable RX delay hack for RTL8211E

2017-04-22 Thread kbuild test robot
Hi Icenowy,

[auto build test ERROR on net-next/master]
[also build test ERROR on v4.11-rc7 next-20170421]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:    https://github.com/0day-ci/linux/commits/Icenowy-Zheng/net-phy-realtek-change-macro-name-for-page-select-register/20170422-144641
config: i386-randconfig-x070-201716 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   drivers/net/phy/realtek.c: In function 'rtl8211e_config_init':
>> drivers/net/phy/realtek.c:147:3: error: expected ';' before 'phy_write'
  phy_write(phydev, RTL8211E_EXT_PAGE_SELECT, 0xa4);
  ^

vim +147 drivers/net/phy/realtek.c

   141   *
   142   * The datasheet of RTL8211E didn't cover this ext page.
   143   *
   144   * Select extension page 0xa4 here.
   145   */
   146  phy_write(phydev, RTL8211_PAGE_SELECT, RTL8211E_EXT_PAGE)
 > 147  phy_write(phydev, RTL8211E_EXT_PAGE_SELECT, 0xa4);
   148  
   149  /* Write the magic number */
   150  phy_write(phydev, 0x1c, 0xb591);
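
The compiler message points at line 147, but the statement on line 146 is the one missing its terminating `;`. A minimal userspace sketch of the three writes with that semicolon in place is shown here. The `phy_write()` below is a logging stub with the same shape as the kernel helper, `struct phy_device` is an empty placeholder, and the three macro values are placeholders chosen for illustration, not taken from the patch or a datasheet.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder values for illustration only. */
#define RTL8211_PAGE_SELECT		0x1f
#define RTL8211E_EXT_PAGE		0x07
#define RTL8211E_EXT_PAGE_SELECT	0x1e

struct phy_device { int unused; };	/* hypothetical stub */

struct reg_write { uint32_t reg; uint16_t val; };
static struct reg_write write_log[8];
static int n_writes;

/* Stub shaped like the kernel's phy_write(); it only records the write. */
static int phy_write(struct phy_device *phydev, uint32_t reg, uint16_t val)
{
	(void)phydev;
	if (n_writes >= 8)
		return -1;
	write_log[n_writes].reg = reg;
	write_log[n_writes].val = val;
	n_writes++;
	return 0;
}

static void select_ext_page_and_write_magic(struct phy_device *phydev)
{
	/* The ';' at the end of this statement is what line 146 lacked. */
	phy_write(phydev, RTL8211_PAGE_SELECT, RTL8211E_EXT_PAGE);
	phy_write(phydev, RTL8211E_EXT_PAGE_SELECT, 0xa4);
	phy_write(phydev, 0x1c, 0xb591);
}
```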

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

