[PATCH] net: freescale: gianfar : constify dev_pm_ops structures.

2017-06-28 Thread Arvind Yadav
dev_pm_ops are not supposed to change at runtime. All functions
working with dev_pm_ops provided by <linux/device.h> work with const
dev_pm_ops. So mark the non-const structs as const.

File size before:
   text    data     bss     dec     hex filename
  19057     392       0   19449    4bf9 drivers/net/ethernet/freescale/gianfar.o

File size After adding 'const':
   text    data     bss     dec     hex filename
  19249     192       0   19441    4bf1 drivers/net/ethernet/freescale/gianfar.o

Signed-off-by: Arvind Yadav 
---
 drivers/net/ethernet/freescale/gianfar.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/gianfar.c 
b/drivers/net/ethernet/freescale/gianfar.c
index 0ff166e..e3b0501 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -1718,7 +1718,7 @@ static int gfar_restore(struct device *dev)
return 0;
 }
 
-static struct dev_pm_ops gfar_pm_ops = {
+static const struct dev_pm_ops gfar_pm_ops = {
.suspend = gfar_suspend,
.resume = gfar_resume,
.freeze = gfar_suspend,
-- 
1.9.1



[PATCH] net: smc91x: constify dev_pm_ops structures.

2017-06-28 Thread Arvind Yadav
dev_pm_ops are not supposed to change at runtime. All functions
working with dev_pm_ops provided by <linux/device.h> work with const
dev_pm_ops. So mark the non-const structs as const.

File size before:
   text    data     bss     dec     hex filename
  18709     401       0   19110    4aa6 drivers/net/ethernet/smsc/smc91x.o

File size After adding 'const':
   text    data     bss     dec     hex filename
  18901     201       0   19102    4a9e drivers/net/ethernet/smsc/smc91x.o

Signed-off-by: Arvind Yadav 
---
 drivers/net/ethernet/smsc/smc91x.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/smsc/smc91x.c 
b/drivers/net/ethernet/smsc/smc91x.c
index 91e9bd7..83cf052 100644
--- a/drivers/net/ethernet/smsc/smc91x.c
+++ b/drivers/net/ethernet/smsc/smc91x.c
@@ -2488,7 +2488,7 @@ static int smc_drv_resume(struct device *dev)
return 0;
 }
 
-static struct dev_pm_ops smc_drv_pm_ops = {
+static const struct dev_pm_ops smc_drv_pm_ops = {
.suspend= smc_drv_suspend,
.resume = smc_drv_resume,
 };
-- 
1.9.1



Re: [PATCH v6 0/3] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

2017-06-28 Thread Ding Tianhong
ping

On 2017/6/22 20:15, Ding Tianhong wrote:
> Some devices have problems with Transaction Layer Packets with the Relaxed
> Ordering Attribute set.  This patch set adds a new PCIe Device Flag,
> PCI_DEV_FLAGS_NO_RELAXED_ORDERING, a set of PCI Quirks to catch some known
> devices with Relaxed Ordering issues, and a use of this new flag by the
> cxgb4 driver to avoid using Relaxed Ordering with problematic Root Complex
> Ports.
> 
> It's been years since I've submitted kernel.org patches, I apologise for the
> almost certain submission errors.
> 
> v2: Alexander pointed out that the v1 was only a part of the whole solution,
> some platform which has some issues could use the new flag to indicate
> that it is not safe to enable relaxed ordering attribute, then we need
> to clear the relaxed ordering enable bits in the PCI configuration when
> initializing the device. So add a new second patch to modify the PCI
> initialization code to clear the relaxed ordering enable bit in the
> event that the root complex doesn't want relaxed ordering enabled.
> 
> The third patch was base on the v1's second patch and only be changed
> to query the relaxed ordering enable bit in the PCI configuration space
> to allow the Chelsio NIC to send TLPs with the relaxed ordering attributes
> set.
> 
> This version didn't plan to drop the defines for Intel Drivers to use the
> new checking way to enable relaxed ordering because it is not the hardest
> part at the moment; we could fix it in the next patchset when these patches
> reach the goal.  
> 
> v3: Redesigned the logic for pci_configure_relaxed_ordering when 
> configuration,
> If a PCIe device didn't enable the relaxed ordering attribute default,
> we should not do anything in the PCIe configuration, otherwise we
> should check if any of the devices above us do not support relaxed
> ordering by the PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag, then base on
> the result if we get a return that indicate that the relaxed ordering
> is not supported we should update our device to disable relaxed ordering
> in configuration space. If the device above us doesn't exist or isn't
> the PCIe device, we shouldn't do anything and skip updating relaxed 
> ordering
> because we are probably running in a guest.
> 
> v4: Rename the functions pcie_get_relaxed_ordering and 
> pcie_disable_relaxed_ordering
> according John's suggestion, and modify the description, use the 
> true/false
> as the return value.
> 
> We shouldn't enable relaxed ordering attribute by the setting in the root
> complex configuration space for PCIe device, so fix it for cxgb4.
> 
> Fix some format issues.
> 
> v5: Removed the unnecessary code for some function which only return the bool
> value, and add the check for VF device.
> 
> Make this patch set base on 4.12-rc5.
> 
> v6: Fix the logic error in the need to enable the relaxed ordering attribute 
> for cxgb4.
>  
> Casey Leedom (2):
>   PCI: Add new PCIe Fabric End Node flag,
> PCI_DEV_FLAGS_NO_RELAXED_ORDERING
>   net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
> 
> Ding Tianhong (1):
>   PCI: Enable PCIe Relaxed Ordering if supported
> 
>  drivers/net/ethernet/chelsio/cxgb4/cxgb4.h  |  1 +
>  drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 17 ++
>  drivers/net/ethernet/chelsio/cxgb4/sge.c|  5 +--
>  drivers/pci/pci.c   | 32 +++
>  drivers/pci/probe.c | 41 
> +
>  drivers/pci/quirks.c| 38 +++
>  include/linux/pci.h |  4 +++
>  7 files changed, 136 insertions(+), 2 deletions(-)
> 
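For reference, a minimal sketch of the configuration-time check described in the
v3 notes above: walk the upstream bridges and report whether any of them carries
the new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag. The helper name below is
hypothetical and not taken from the series; a caller that gets true back would
then clear PCI_EXP_DEVCTL_RELAX_EN with pcie_capability_clear_word().

#include <linux/pci.h>

/* Sketch only; the helper name is illustrative, not from the posted series. */
static bool pcie_relaxed_ordering_blocked(struct pci_dev *dev)
{
	struct pci_dev *bridge;

	for (bridge = pci_upstream_bridge(dev); bridge;
	     bridge = pci_upstream_bridge(bridge)) {
		if (bridge->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING)
			return true;	/* a device above us has the quirk */
	}

	return false;
}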



[PATCH] net: ibm: ibmveth: constify dev_pm_ops structures.

2017-06-28 Thread Arvind Yadav
dev_pm_ops are not supposed to change at runtime. All functions
working with dev_pm_ops provided by <linux/device.h> work with const
dev_pm_ops. So mark the non-const structs as const.

File size before:
   text    data     bss     dec     hex filename
  15426    1256       0   16682    412a drivers/net/ethernet/ibm/ibmveth.o

File size After adding 'const':
   text    data     bss     dec     hex filename
  15618    1064       0   16682    412a drivers/net/ethernet/ibm/ibmveth.o

Signed-off-by: Arvind Yadav 
---
 drivers/net/ethernet/ibm/ibmveth.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ibm/ibmveth.c 
b/drivers/net/ethernet/ibm/ibmveth.c
index 72ab7b6..02b26bf 100644
--- a/drivers/net/ethernet/ibm/ibmveth.c
+++ b/drivers/net/ethernet/ibm/ibmveth.c
@@ -1843,7 +1843,7 @@ static int ibmveth_resume(struct device *dev)
 };
 MODULE_DEVICE_TABLE(vio, ibmveth_device_table);
 
-static struct dev_pm_ops ibmveth_pm_ops = {
+static const struct dev_pm_ops ibmveth_pm_ops = {
.resume = ibmveth_resume
 };
 
-- 
1.9.1



Re: [PATCH net-next v2 3/9] nfp: provide infrastructure for offloading flower based TC filters

2017-06-28 Thread Simon Horman
On Wed, Jun 28, 2017 at 06:35:07PM -0700, Jakub Kicinski wrote:
> On Wed, 28 Jun 2017 22:29:56 +0200, Simon Horman wrote:
> > From: Pieter Jansen van Vuuren 
> > 
> > Adds a flower based TC offload handler for representor devices, this
> > is in addition to the bpf based offload handler. The changes in this
> > patch will be used in a follow-up patch to add tc flower offload to
> > the NFP.
> > 
> > The flower app enables tc offloads on representors by default.
> > 
> > Signed-off-by: Pieter Jansen van Vuuren 
> > 
> > Signed-off-by: Simon Horman 
> 
> Thanks, two nits, since it seems like there will have to be another
> respin.

Thanks, I will fix those in v3.


Re: [PATCH v3 net-next 03/12] nfp: change bpf verifier hooks to match new verifier data structures

2017-06-28 Thread Jakub Kicinski
On Tue, 27 Jun 2017 13:57:34 +0100, Edward Cree wrote:
> Signed-off-by: Edward Cree 

Acked-by: Jakub Kicinski 

Sorry about the delay.


Re: [PATCH] [net-next] net/mlx5e: select CONFIG_MLXFW

2017-06-28 Thread Or Gerlitz
On Wed, Jun 28, 2017 at 11:10 PM, Arnd Bergmann  wrote:
> With the introduction of mlx5 firmware flash support, we get a link
> error with CONFIG_MLXFW=m and CONFIG_MLX5_CORE=y:
>
> drivers/net/ethernet/mellanox/mlx5/core/fw.o: In function 
> `mlx5_firmware_flash':
> fw.c:(.text+0x9d4): undefined reference to `mlxfw_firmware_flash'

Thanks Arnd, I got a report on that from Jakub but you were before me here..

> We could have a more elaborate method to force MLX5 to be a loadable
> module in this case, but the easiest fix seems to be to always enable
> MLXFW as well, like we do for CONFIG_MLXSW_SPECTRUM, which is the other
> user of mlxfw_firmware_flash.

We would not want to force mlx5 users to build mlxfw.

So let's either use the more elaborate method, or maybe instead of using
IS_ENABLED in mlxfw.h use IS_REACHABLE (this was suggested by Jakub)

Or.
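For reference, a rough sketch of the IS_REACHABLE() variant in mlxfw.h; the
prototype below is assumed for illustration, only the guard is the point:

#if IS_REACHABLE(CONFIG_MLXFW)
int mlxfw_firmware_flash(struct mlxfw_dev *mlxfw_dev,
			 const struct firmware *firmware);
#else
static inline int mlxfw_firmware_flash(struct mlxfw_dev *mlxfw_dev,
				       const struct firmware *firmware)
{
	return -EOPNOTSUPP;
}
#endif

With IS_REACHABLE(), a built-in mlx5 core calling into a modular mlxfw gets the
static inline stub instead of an unresolved symbol, which is exactly the
CONFIG_MLXFW=m / CONFIG_MLX5_CORE=y case from the report.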


Re: [PATCH net] bpf: prevent leaking pointer via xadd on unpriviledged

2017-06-28 Thread Alexei Starovoitov

On 6/28/17 6:04 PM, Daniel Borkmann wrote:

Prevent this by checking xadd src reg for pointer types. Also
add a couple of test cases related to this.

Fixes: 1be7f75d1668 ("bpf: enable non-root eBPF programs")
Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)")
Signed-off-by: Daniel Borkmann 


Acked-by: Alexei Starovoitov 



Re: [PATCH net-next v2 8/9] nfp: add a stats handler for flower offloads

2017-06-28 Thread Jakub Kicinski
On Wed, 28 Jun 2017 22:30:01 +0200, Simon Horman wrote:
> From: Pieter Jansen van Vuuren 
> 
> Previously there was no way of updating flow rule stats after they
> have been offloaded to hardware. This is solved by keeping track of
> stats received from hardware and providing this to the TC handler
> on request.
> 
> Signed-off-by: Pieter Jansen van Vuuren 
> Signed-off-by: Simon Horman 

> @@ -334,7 +441,10 @@ int nfp_modify_flow_metadata(struct nfp_app *app,
>   /* Update flow payload with mask ids. */
>   nfp_flow->unmasked_data[NFP_FL_MASK_ID_LOCATION] = new_mask_id;
>  
> - return 0;
> + /* Release the stats ctx id. */
> + temp_ctx_id = be32_to_cpu(nfp_flow->meta.host_ctx_id);
> +
> + return nfp_release_stats_entry(app, temp_ctx_id);
>  }
>  
>  int nfp_flower_metadata_init(struct nfp_app *app)
> @@ -362,6 +472,15 @@ int nfp_flower_metadata_init(struct nfp_app *app)
>   return -ENOMEM;
>   }
>  
> + /* Init ring buffer and unallocated stats_ids. */
> + priv->stats_ids.free_list.buf =
> + vmalloc(NFP_FL_STATS_ENTRY_RS * NFP_FL_STATS_ELEM_RS);
> + if (!priv->stats_ids.free_list.buf) {
> + vfree(priv->mask_ids.mask_id_free_list.buf);
> + return -ENOMEM;

This is hiding a leak, I think.  There were 2 things allocated above.
Please add a proper unwind path with goto's - it makes catching bugs
like this much easier.

> + }
> + priv->stats_ids.init_unalloc = NFP_FL_REPEATED_HASH_MAX;
> +
>   return 0;
>  }
>  



Re: [PATCH NET V5 2/2] net: hns: Use phy_driver to setup Phy loopback

2017-06-28 Thread Yunsheng Lin
Hi, Andrew

On 2017/6/29 4:28, Andrew Lunn wrote:
>>> >From your description, it sounds like you can call phy_resume() on a
>>> device which is not suspended. 
>> Do you mean after calling dev_close, the device is still not suspended?
> 
> You only call dev_close() if the device is running. What if somebody
> runs the self test on an interface when it has never been opened? It
> looks like you will call phy_resume(). But since it has never been
> suspended, you could be in trouble.
Here is what I can think of:
1. When the mac driver is first loaded, the phy has a default state. Suspended?
2. If the user runs the self test after 'ifconfig ethX down', then I suppose the
phy is already suspended.

Also I don't quite understand what you mean by "in trouble". Right now in the phy
core, phy_resume() returns ok even if the phy is not suspended.
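For illustration only, a guard of this shape in the phy core would make such a
resume a no-op instead of relying on every caller tracking the suspend state
(a sketch, not the fix being discussed here):

int phy_resume(struct phy_device *phydev)
{
	struct phy_driver *phydrv = to_phy_driver(phydev->mdio.dev.driver);
	int ret = 0;

	/* Nothing was suspended, so there is no state to restore. */
	if (!phydev->suspended)
		return 0;

	if (phydev->drv && phydrv->resume)
		ret = phydrv->resume(phydev);

	if (!ret)
		phydev->suspended = false;

	return ret;
}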

Best Regards
Yunsheng Lin
>>
>> In general, suspend is expected to
>>> store away state which will be lost when powering down a
>>> device. Resume writes that state back into the device after it is
>>> powered up. So resuming a device which was never suspended could write
>>> bad state into it.
>>
>> Do you mean phydev->suspended has bad state?
> 
> phy_resume() current does not check the phydev->suspended state.
> 
>>> Also, what about if WOL has been set before closing the device?
>>
>> phy_suspend will return errro.
>>
>> int phy_suspend(struct phy_device *phydev)
>> {
>>  struct phy_driver *phydrv = to_phy_driver(phydev->mdio.dev.driver);
>>  struct ethtool_wolinfo wol = { .cmd = ETHTOOL_GWOL };
>>  int ret = 0;
>>
>>  /* If the device has WOL enabled, we cannot suspend the PHY */
>>  phy_ethtool_get_wol(phydev, &wol);
>>  if (wol.wolopts)
>>  return -EBUSY;
>>
>>  if (phydev->drv && phydrv->suspend)
>>  ret = phydrv->suspend(phydev);
>>
>>  if (ret)
>>  return ret;
>>
>>  phydev->suspended = true;
>>
>>  return ret;
>> }
> 
> Which means when you call phy_resume() in lb_setup() you are again
> resuming a device which is not suspended...
> 
>Andrew
> 
> .
> 



Re: [PATCH net-next v2 7/9] nfp: add metadata to each flow offload

2017-06-28 Thread Jakub Kicinski
On Wed, 28 Jun 2017 22:30:00 +0200, Simon Horman wrote:
> From: Pieter Jansen van Vuuren 
> 
> Adds metadata describing the mask id of each flow and keeps track of
> flows installed in hardware. Previously a flow could not be removed
> from hardware as there was no way of knowing if that a specific flow
> was installed. This is solved by storing the offloaded flows in a
> hash table.
> 
> Signed-off-by: Pieter Jansen van Vuuren 
> Signed-off-by: Simon Horman 

> diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.c 
> b/drivers/net/ethernet/netronome/nfp/flower/main.c
> index 19f20f819e2f..1103d23a8ec7 100644
> --- a/drivers/net/ethernet/netronome/nfp/flower/main.c
> +++ b/drivers/net/ethernet/netronome/nfp/flower/main.c
> @@ -50,14 +50,6 @@
>  
>  #define NFP_FLOWER_ALLOWED_VER 0x00010001UL
>  
> -/**
> - * struct nfp_flower_priv - Flower APP per-vNIC priv data
> - * @nn:   Pointer to vNIC
> - */
> -struct nfp_flower_priv {
> - struct nfp_net *nn;
> -};
> -
>  static const char *nfp_flower_extra_cap(struct nfp_app *app, struct nfp_net 
> *nn)
>  {
>   return "FLOWER";
> @@ -351,6 +343,12 @@ static int nfp_flower_init(struct nfp_app *app)
>   if (!app->priv)
>   return -ENOMEM;
>  
> + err = nfp_flower_metadata_init(app);
> + if (nfp_flower_metadata_init(app)) {

You're calling init twice here.  Also please do the error path with a
goto; as explained in the review of patch 8, having a proper unwind makes
later patches easier to review.

> + kfree(app->priv);
> + return err;
> + }
> +
>   return 0;
>  }
>  
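For reference, a sketch of the goto-style unwind being asked for, assuming
metadata init is the only step after the priv allocation (illustrative only,
not the respin itself):

static int nfp_flower_init(struct nfp_app *app)
{
	int err;

	/* ... eth table and firmware version checks as in the patch ... */

	app->priv = kzalloc(sizeof(struct nfp_flower_priv), GFP_KERNEL);
	if (!app->priv)
		return -ENOMEM;

	err = nfp_flower_metadata_init(app);
	if (err)
		goto err_free_priv;

	return 0;

err_free_priv:
	kfree(app->priv);
	app->priv = NULL;
	return err;
}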

> +static int nfp_mask_alloc(struct nfp_app *app, u8 *mask_id)
> +{
> + struct nfp_flower_priv *priv = app->priv;
> + struct timespec64 delta, now;
> + struct circ_buf *ring;
> + u8 temp_id, freed_id;
> +
> + ring = &priv->mask_ids.mask_id_free_list;
> + freed_id = NFP_FLOWER_MASK_ENTRY_RS - 1;
> + /* Checking for unallocated entries first. */
> + if (priv->mask_ids.init_unallocated > 0) {
> + *mask_id = priv->mask_ids.init_unallocated;
> + priv->mask_ids.init_unallocated--;
> + return 0;
> + }
> +
> + /* Checking if buffer is empty. */
> + if (ring->head == ring->tail) {
> + *mask_id = freed_id;
> + return -ENOENT;
> + }
> +
> + memcpy(&temp_id, &ring->buf[ring->tail], NFP_FLOWER_MASK_ELEMENT_RS);
> + *mask_id = temp_id;
> + memcpy(&ring->buf[ring->tail], &freed_id, NFP_FLOWER_MASK_ELEMENT_RS);
> + ring->tail = (ring->tail + NFP_FLOWER_MASK_ELEMENT_RS) %
> +  (NFP_FLOWER_MASK_ENTRY_RS * NFP_FLOWER_MASK_ELEMENT_RS);
> +
> + getnstimeofday64(&now);
> + delta = timespec64_sub(now, priv->mask_ids.last_used[*mask_id]);
> +
> + if (timespec64_to_ns(&delta) < NFP_FL_MASK_REUSE_TIME_NS) {
> + nfp_release_mask_id(app, *mask_id);

nfp_release_mask_id() will reset the time stamp and put the mask at the
end of the queue.  Is that OK?

> + return -ENOENT;
> + }
> +
> + return 0;
> +}
> +

> +int nfp_flower_metadata_init(struct nfp_app *app)
> +{
> + struct nfp_flower_priv *priv = app->priv;
> +
> + hash_init(priv->mask_table);
> + hash_init(priv->flow_table);
> +
> + /* Init ring buffer and unallocated mask_ids. */
> + priv->mask_ids.mask_id_free_list.buf =
> + kmalloc(NFP_FLOWER_MASK_ENTRY_RS * NFP_FLOWER_MASK_ELEMENT_RS,
> + GFP_KERNEL);

kmalloc_array, perhaps?  

> + if (!priv->mask_ids.mask_id_free_list.buf)
> + return -ENOMEM;
> +
> + priv->mask_ids.init_unallocated = NFP_FLOWER_MASK_ENTRY_RS - 1;
> +
> + /* Init timestamps for mask id*/
> + priv->mask_ids.last_used =
> + kmalloc_array(NFP_FLOWER_MASK_ENTRY_RS,
> +   sizeof(*priv->mask_ids.last_used), GFP_KERNEL);
> + if (!priv->mask_ids.last_used) {
> + kfree(priv->mask_ids.mask_id_free_list.buf);
> + return -ENOMEM;
> + }
> +
> + return 0;
> +}
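For reference, the kmalloc_array() form suggested above would be roughly the
following, keeping the same allocation but with overflow checking on the
element count:

	priv->mask_ids.mask_id_free_list.buf =
		kmalloc_array(NFP_FLOWER_MASK_ENTRY_RS,
			      NFP_FLOWER_MASK_ELEMENT_RS, GFP_KERNEL);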


Re: [PATCH net-next v2 3/9] nfp: provide infrastructure for offloading flower based TC filters

2017-06-28 Thread Jakub Kicinski
On Wed, 28 Jun 2017 22:29:56 +0200, Simon Horman wrote:
> +/**
> + * nfp_flower_del_offload() - Removes a flow from hardware.
> + * @app: Pointer to the APP handle
> + * @netdev:  netdev structure.
> + * @flow:   TC flower classifier offload structure

Nit: there are spaces and tabs mixed here, same for
nfp_flower_get_stats().

> + *
> + * Removes a flow from the repeated hash structure and clears the
> + * action payload.
> + *
> + * Return: negative value on error, 0 if removed successfully.
> + */
> +static int
> +nfp_flower_del_offload(struct nfp_app *app, struct net_device *netdev,
> +struct tc_cls_flower_offload *flow)
> +{
> + return -EOPNOTSUPP;
> +}



[PATCH v3] datapath: Avoid using stack larger than 1024.

2017-06-28 Thread Tonghao Zhang
When compiling OvS-master on 4.4.0-81 kernel,
there is a warning:

CC [M]  /root/ovs/datapath/linux/datapath.o
/root/ovs/datapath/linux/datapath.c: In function
'ovs_flow_cmd_set':
/root/ovs/datapath/linux/datapath.c:1221:1: warning:
the frame size of 1040 bytes is larger than 1024 bytes
[-Wframe-larger-than=]

This patch factors out match-init and action-copy to avoid the
"Wframe-larger-than=1024" warning. Because the mask is only
used to get the actions, we add a new function to save some
stack space.

Signed-off-by: Tonghao Zhang 
---
 datapath/datapath.c | 81 ++---
 1 file changed, 58 insertions(+), 23 deletions(-)

diff --git a/datapath/datapath.c b/datapath/datapath.c
index c85029c..fb9f114 100644
--- a/datapath/datapath.c
+++ b/datapath/datapath.c
@@ -1100,6 +1100,58 @@ static struct sw_flow_actions *get_flow_actions(struct 
net *net,
return acts;
 }
 
+/* Factor out match-init and action-copy to avoid the
+ * "Wframe-larger-than=1024" warning. Because the mask is only
+ * used to get the actions, we add a new function to save some
+ * stack space.
+ *
+ * If there are no key and action attrs, we return 0
+ * directly. In that case, the caller will also not use the
+ * match as before. If there is an action attr, we try to get
+ * the actions and save them to *acts. Before returning from
+ * the function, we reset the match->mask pointer, because
+ * we should not return a match object with a dangling reference
+ * to the mask.
+ */
+static int ovs_nla_init_match_and_action(struct net *net,
+struct sw_flow_match *match,
+struct sw_flow_key *key,
+struct nlattr **a,
+struct sw_flow_actions **acts,
+bool log)
+{
+   struct sw_flow_mask mask;
+   int error = 0;
+
+   if (a[OVS_FLOW_ATTR_KEY]) {
+   ovs_match_init(match, key, true, &mask);
+   error = ovs_nla_get_match(net, match, a[OVS_FLOW_ATTR_KEY],
+ a[OVS_FLOW_ATTR_MASK], log);
+   if (error)
+   goto error;
+   }
+
+   if (a[OVS_FLOW_ATTR_ACTIONS]) {
+   if (!a[OVS_FLOW_ATTR_KEY]) {
+   OVS_NLERR(log,
+ "Flow key attribute not present in set 
flow.");
+   return -EINVAL;
+   }
+
+   *acts = get_flow_actions(net, a[OVS_FLOW_ATTR_ACTIONS], key,
+ &mask, log);
+   if (IS_ERR(*acts)) {
+   error = PTR_ERR(*acts);
+   goto error;
+   }
+   }
+
+   /* On success, error is 0. */
+error:
+   match->mask = NULL;
+   return error;
+}
+
 static int ovs_flow_cmd_set(struct sk_buff *skb, struct genl_info *info)
 {
struct net *net = sock_net(skb->sk);
@@ -1107,7 +1159,6 @@ static int ovs_flow_cmd_set(struct sk_buff *skb, struct 
genl_info *info)
struct ovs_header *ovs_header = info->userhdr;
struct sw_flow_key key;
struct sw_flow *flow;
-   struct sw_flow_mask mask;
struct sk_buff *reply = NULL;
struct datapath *dp;
struct sw_flow_actions *old_acts = NULL, *acts = NULL;
@@ -1119,34 +1170,18 @@ static int ovs_flow_cmd_set(struct sk_buff *skb, struct 
genl_info *info)
bool ufid_present;
 
ufid_present = ovs_nla_get_ufid(&sfid, a[OVS_FLOW_ATTR_UFID], log);
-   if (a[OVS_FLOW_ATTR_KEY]) {
-   ovs_match_init(&match, &key, true, &mask);
-   error = ovs_nla_get_match(net, &match, a[OVS_FLOW_ATTR_KEY],
- a[OVS_FLOW_ATTR_MASK], log);
-   } else if (!ufid_present) {
+   if (!a[OVS_FLOW_ATTR_KEY] && !ufid_present) {
OVS_NLERR(log,
  "Flow set message rejected, Key attribute missing.");
-   error = -EINVAL;
+   return -EINVAL;
}
+
+   error = ovs_nla_init_match_and_action(net, &match, &key, a,
+ &acts, log);
if (error)
goto error;
 
-   /* Validate actions. */
-   if (a[OVS_FLOW_ATTR_ACTIONS]) {
-   if (!a[OVS_FLOW_ATTR_KEY]) {
-   OVS_NLERR(log,
- "Flow key attribute not present in set 
flow.");
-   error = -EINVAL;
-   goto error;
-   }
-
-   acts = get_flow_actions(net, a[OVS_FLOW_ATTR_ACTIONS], &key,
-   &mask, log);
-   if (IS_ERR(acts)) {
-   error = PTR_ERR(acts);
-   goto error;
-   }
-
+   if (acts) {
/* Can allocate before locking if have acts. */
reply = 

Re: [PATCH NET V7 1/2] net: phy: Add phy loopback support in net phy framework

2017-06-28 Thread Yunsheng Lin
Hi, Andrew

On 2017/6/28 21:27, Andrew Lunn wrote:
> On Wed, Jun 28, 2017 at 05:13:10PM +0800, Lin Yun Sheng wrote:
>> This patch add set_loopback in phy_driver, which is used by MAC
>> driver to enable or disable phy loopback. it also add a generic
>> genphy_loopback function, which use BMCR loopback bit to enable
>> or disable loopback.
>>
>> Signed-off-by: Lin Yun Sheng 
> 
> Hi Lin
> 
> It is normal to include my
> 
> Reviewed-by: Andrew Lunn 
> 
> when resubmitting a patch. The only time you drop such tags is when
> you make a big change.
Will do next time, thanks for reviewing.

Best Regards
Yunsheng Lin



Re: [PATCH net-next v2 3/9] nfp: provide infrastructure for offloading flower based TC filters

2017-06-28 Thread Jakub Kicinski
On Wed, 28 Jun 2017 22:29:56 +0200, Simon Horman wrote:
> From: Pieter Jansen van Vuuren 
> 
> Adds a flower based TC offload handler for representor devices, this
> is in addition to the bpf based offload handler. The changes in this
> patch will be used in a follow-up patch to add tc flower offload to
> the NFP.
> 
> The flower app enables tc offloads on representors by default.
> 
> Signed-off-by: Pieter Jansen van Vuuren 
> Signed-off-by: Simon Horman 

Thanks, two nits, since it seems like there will have to be another
respin.

> @@ -313,6 +317,8 @@ static int nfp_flower_vnic_init(struct nfp_app *app, 
> struct nfp_net *nn,
>  static int nfp_flower_init(struct nfp_app *app)
>  {
>   const struct nfp_pf *pf = app->pf;
> + u64 version;
> + int err;
>  
>   if (!pf->eth_tbl) {
>   nfp_warn(app->cpp, "FlowerNIC requires eth table\n");
> @@ -329,6 +335,18 @@ static int nfp_flower_init(struct nfp_app *app)
>   return -EINVAL;
>   }
>  
> + version = nfp_rtsym_read_le(app->pf->rtbl, "hw_flower_version", );
> + if (err) {
> + nfp_warn(app->cpp, "FlowerNIC requires hw_flower_version memory 
> symbol\n");
> + return err;
> + }
> +
> + /* We need to ensure hardware has enough flower capabilities. */
> + if (version != NFP_FLOWER_ALLOWED_VER) {
> + nfp_warn(app->cpp, "FlowerNIC: unspported firmware version\n");

s/unspported/unsupported/

> + return -EINVAL;
> + }
> +
>   app->priv = kzalloc(sizeof(struct nfp_flower_priv), GFP_KERNEL);
>   if (!app->priv)
>   return -ENOMEM;

> diff --git a/drivers/net/ethernet/netronome/nfp/nfp_port.h 
> b/drivers/net/ethernet/netronome/nfp/nfp_port.h
> index de60cacd3362..f3552da3c277 100644
> --- a/drivers/net/ethernet/netronome/nfp/nfp_port.h
> +++ b/drivers/net/ethernet/netronome/nfp/nfp_port.h
> @@ -37,6 +37,7 @@
>  #include 
>  #include 
>  
> +struct tc_to_netdev;
>  struct net_device;
>  struct nfp_app;
>  struct nfp_pf;
> @@ -109,6 +110,10 @@ struct nfp_port {
>  
>  extern const struct switchdev_ops nfp_port_switchdev_ops;
>  
> +int
> +nfp_port_setup_tc(struct net_device *netdev, u32 handle, u32 chain_index,

int can be on the same line


Re: [PATCH net-next v2 8/9] nfp: add a stats handler for flower offloads

2017-06-28 Thread Jakub Kicinski
On Wed, 28 Jun 2017 22:30:01 +0200, Simon Horman wrote:
> @@ -288,7 +292,21 @@ nfp_flower_del_offload(struct nfp_app *app, struct 
> net_device *netdev,
>  static int
>  nfp_flower_get_stats(struct nfp_app *app, struct tc_cls_flower_offload *flow)
>  {
> - return -EOPNOTSUPP;
> + struct nfp_fl_payload *nfp_flow;
> +
> + nfp_flow = nfp_flower_find_in_fl_table(app, flow->cookie);
> + if (!nfp_flow)
> + return -EINVAL;
> +
> + spin_lock(&nfp_flow->lock);
> + tcf_exts_stats_update(flow->exts, nfp_flow->stats.bytes,
> +   nfp_flow->stats.pkts, nfp_flow->stats.used);
> +
> + nfp_flow->stats.pkts = 0;
> + nfp_flow->stats.bytes = 0;
> + spin_unlock(&nfp_flow->lock);
> +
> + return 0;
>  }

This needs to take spin_lock_bh() to lock out the RX path safely :(
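In other words, the read-and-reset sequence above would become something like
the following (sketch of the suggested change only):

	spin_lock_bh(&nfp_flow->lock);
	tcf_exts_stats_update(flow->exts, nfp_flow->stats.bytes,
			      nfp_flow->stats.pkts, nfp_flow->stats.used);
	nfp_flow->stats.pkts = 0;
	nfp_flow->stats.bytes = 0;
	spin_unlock_bh(&nfp_flow->lock);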


[PATCH net] bpf: prevent leaking pointer via xadd on unpriviledged

2017-06-28 Thread Daniel Borkmann
Leaking kernel addresses on unprivileged is generally disallowed,
for example, verifier rejects the following:

  0: (b7) r0 = 0
  1: (18) r2 = 0x897e82304400
  3: (7b) *(u64 *)(r1 +48) = r2
  R2 leaks addr into ctx

Doing pointer arithmetic on them is also forbidden, so that they
don't turn into unknown value and then get leaked out. However,
there's xadd as a special case, where we don't check the src reg
for being a pointer register, e.g. the following will pass:

  0: (b7) r0 = 0
  1: (7b) *(u64 *)(r1 +48) = r0
  2: (18) r2 = 0x897e82304400 ; map
  4: (db) lock *(u64 *)(r1 +48) += r2
  5: (95) exit

We could store the pointer into skb->cb, lose the type context,
and then read it out from there again to leak it eventually out
of a map value. Or more easily in a different variant, too:

   0: (bf) r6 = r1
   1: (7a) *(u64 *)(r10 -8) = 0
   2: (bf) r2 = r10
   3: (07) r2 += -8
   4: (18) r1 = 0x0
   6: (85) call bpf_map_lookup_elem#1
   7: (15) if r0 == 0x0 goto pc+3
   R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R6=ctx R10=fp
   8: (b7) r3 = 0
   9: (7b) *(u64 *)(r0 +0) = r3
  10: (db) lock *(u64 *)(r0 +0) += r6
  11: (b7) r0 = 0
  12: (95) exit

  from 7 to 11: R0=inv,min_value=0,max_value=0 R6=ctx R10=fp
  11: (b7) r0 = 0
  12: (95) exit

Prevent this by checking xadd src reg for pointer types. Also
add a couple of test cases related to this.

Fixes: 1be7f75d1668 ("bpf: enable non-root eBPF programs")
Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)")
Signed-off-by: Daniel Borkmann 
---
 kernel/bpf/verifier.c   |  5 +++
 tools/testing/selftests/bpf/test_verifier.c | 66 +
 2 files changed, 71 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 339c8a1..a8a7256 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -989,6 +989,11 @@ static int check_xadd(struct bpf_verifier_env *env, struct 
bpf_insn *insn)
if (err)
return err;
 
+   if (is_pointer_value(env, insn->src_reg)) {
+   verbose("R%d leaks addr into mem\n", insn->src_reg);
+   return -EACCES;
+   }
+
/* check whether atomic_add can read the memory */
err = check_mem_access(env, insn->dst_reg, insn->off,
   BPF_SIZE(insn->code), BPF_READ, -1);
diff --git a/tools/testing/selftests/bpf/test_verifier.c 
b/tools/testing/selftests/bpf/test_verifier.c
index cabb19b..0ff8c55 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -3749,6 +3749,72 @@ struct test_val {
.errstr = "invalid bpf_context access",
},
{
+   "leak pointer into ctx 1",
+   .insns = {
+   BPF_MOV64_IMM(BPF_REG_0, 0),
+   BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0,
+   offsetof(struct __sk_buff, cb[0])),
+   BPF_LD_MAP_FD(BPF_REG_2, 0),
+   BPF_STX_XADD(BPF_DW, BPF_REG_1, BPF_REG_2,
+ offsetof(struct __sk_buff, cb[0])),
+   BPF_EXIT_INSN(),
+   },
+   .fixup_map1 = { 2 },
+   .errstr_unpriv = "R2 leaks addr into mem",
+   .result_unpriv = REJECT,
+   .result = ACCEPT,
+   },
+   {
+   "leak pointer into ctx 2",
+   .insns = {
+   BPF_MOV64_IMM(BPF_REG_0, 0),
+   BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0,
+   offsetof(struct __sk_buff, cb[0])),
+   BPF_STX_XADD(BPF_DW, BPF_REG_1, BPF_REG_10,
+ offsetof(struct __sk_buff, cb[0])),
+   BPF_EXIT_INSN(),
+   },
+   .errstr_unpriv = "R10 leaks addr into mem",
+   .result_unpriv = REJECT,
+   .result = ACCEPT,
+   },
+   {
+   "leak pointer into ctx 3",
+   .insns = {
+   BPF_MOV64_IMM(BPF_REG_0, 0),
+   BPF_LD_MAP_FD(BPF_REG_2, 0),
+   BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_2,
+ offsetof(struct __sk_buff, cb[0])),
+   BPF_EXIT_INSN(),
+   },
+   .fixup_map1 = { 1 },
+   .errstr_unpriv = "R2 leaks addr into ctx",
+   .result_unpriv = REJECT,
+   .result = ACCEPT,
+   },
+   {
+   "leak pointer into map val",
+   .insns = {
+   BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+   BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+   BPF_LD_MAP_FD(BPF_REG_1, 0),
+ 

Re: [PATCH net-next 1/3] net: ethtool: add support for forward error correction modes

2017-06-28 Thread Jakub Kicinski
On Wed, 28 Jun 2017 14:47:51 -0700, Dustin Byford wrote:
> Hi Andrew,
> 
> On Wed Jun 28 15:41, Andrew Lunn wrote:
> > On Tue, Jun 27, 2017 at 03:22:39AM -0700, Jakub Kicinski wrote:  
> > > On Sat, 24 Jun 2017 12:19:43 -0700, Roopa Prabhu wrote:  
> > > > Encoding: Types of encoding
> > > > Off:  Turning off any encoding
> > > > RS :  enforcing RS-FEC encoding on supported speeds
> > > > BaseR  :  enforcing Base R encoding on supported speeds
> > > > Auto   :  IEEE defaults for the speed/medium combination  
> > > 
> > > Just to be sure - does auto mean autonegotiate as defined by IEEE or
> > > some presets?  
> > 
> > I don't know this field very well. Is this confusion likely to happen
> > a lot? Is there a better name for Auto which is less likely to be
> > confused?  
> 
> You're not the first, or the second to ask that question.  I agree it
> could use clarification.
> 
> I always read auto in this context as automatic rather than autoneg.
> The best I can come up with is to perhaps fully spell out "automatic" in
> the documentation and the associated uapi enums.  It's accurate, and
> hopefully different enough from "autoneg" to hint people away from the
> IEEE autoneg concept.

So perhaps just "default"?  Even saying something like ieee-selected
doesn't really help, because apparently there are two autonegs defined
- IEEE one and a "consortium" one...


[PATCH net-next] ibmvnic: Fix assignment of RX/TX IRQ's

2017-06-28 Thread Thomas Falcon
The driver currently creates RX/TX queues during device probe, but
assigns IRQs to them during device open. On reset, however,
IRQs are assigned when resetting the queues. If there is a reset
while the device is closed and the device is later opened, the driver will
request IRQs twice, causing the open to fail. This patch assigns
the IRQs in the ibmvnic_init function after the queues are reset or
initialized, ensuring IRQs are only requested once.

Signed-off-by: Thomas Falcon 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index 87db1eb..a3e6946 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -763,12 +763,6 @@ static int init_resources(struct ibmvnic_adapter *adapter)
if (rc)
return rc;
 
-   rc = init_sub_crq_irqs(adapter);
-   if (rc) {
-   netdev_err(netdev, "failed to initialize sub crq irqs\n");
-   return -1;
-   }
-
rc = init_stats_token(adapter);
if (rc)
return rc;
@@ -1803,7 +1797,6 @@ static int reset_sub_crq_queues(struct ibmvnic_adapter 
*adapter)
return rc;
}
 
-   rc = init_sub_crq_irqs(adapter);
return rc;
 }
 
@@ -3669,6 +3662,13 @@ static int ibmvnic_init(struct ibmvnic_adapter *adapter)
if (rc) {
dev_err(dev, "Initialization of sub crqs failed\n");
release_crq_queue(adapter);
+   return rc;
+   }
+
+   rc = init_sub_crq_irqs(adapter);
+   if (rc) {
+   dev_err(dev, "Failed to initialize sub crq irqs\n");
+   release_crq_queue(adapter);
}
 
return rc;
-- 
2.7.4



Re: ti: wl18xx: add checks on wl18xx_top_reg_write() return value

2017-06-28 Thread Gustavo A. R. Silva


Quoting Kalle Valo :


"Gustavo A. R. Silva"  wrote:


Check return value from call to wl18xx_top_reg_write(),
so in case of error jump to goto label out and return.

Also, remove unnecessary value check before goto label out.

Addresses-Coverity-ID: 1226938
Signed-off-by: Gustavo A. R. Silva 


The prefix should be "wl18xx:", I'll fix that.



Thanks, Kalle.
--
Gustavo A. R. Silva







Re: [RFC 2/2] phy: bcm-ns-usb3: fix MDIO_BUS dependency

2017-06-28 Thread Florian Fainelli
On 06/21/2017 03:06 PM, Arnd Bergmann wrote:
> The driver attempts to 'select MDIO_DEVICE', but the code
> is actually a loadable module when PHYLIB=m:
> 
> drivers/phy/broadcom/phy-bcm-ns-usb3.o: In function 
> `bcm_ns_usb3_mdiodev_phy_write':
> phy-bcm-ns-usb3.c:(.text.bcm_ns_usb3_mdiodev_phy_write+0x28): undefined 
> reference to `mdiobus_write'
> drivers/phy/broadcom/phy-bcm-ns-usb3.o: In function `bcm_ns_usb3_module_exit':
> phy-bcm-ns-usb3.c:(.exit.text+0x18): undefined reference to 
> `mdio_driver_unregister'
> drivers/phy/broadcom/phy-bcm-ns-usb3.o: In function `bcm_ns_usb3_module_init':
> phy-bcm-ns-usb3.c:(.init.text+0x18): undefined reference to 
> `mdio_driver_register'
> phy-bcm-ns-usb3.c:(.init.text+0x38): undefined reference to 
> `mdio_driver_unregister'
> 
> Using 'depends on MDIO_BUS' instead will avoid the link error.
> 
> Fixes: af850e14a7ae ("phy: bcm-ns-usb3: add MDIO driver using proper bus 
> layer")
> Signed-off-by: Arnd Bergmann 

Reviewed-by: Florian Fainelli 
-- 
Florian


Re: [RFC 1/2] net: phy: rework Kconfig settings for MDIO_BUS

2017-06-28 Thread Florian Fainelli
On 06/21/2017 03:06 PM, Arnd Bergmann wrote:
> I still see build errors in randconfig builds and have had this
> patch for a while to locally work around it:
> 
> drivers/built-in.o: In function `xgene_mdio_probe':
> mux-core.c:(.text+0x352154): undefined reference to `of_mdiobus_register'
> mux-core.c:(.text+0x352168): undefined reference to `mdiobus_free'
> mux-core.c:(.text+0x3521c0): undefined reference to `mdiobus_alloc_size'
> 
> The idea is that CONFIG_MDIO_BUS now reflects whether the mdio_bus
> code is built-in or a module, and other drivers that use the core
> code can simply depend on that, instead of having a complex
> dependency line.
> 
> Fixes: 90eff9096c01 ("net: phy: Allow splitting MDIO bus/device support from 
> PHYs")
> Signed-off-by: Arnd Bergmann 

Reviewed-by: Florian Fainelli 

This looks a lot better indeed, thanks!
-- 
Florian


CAN-FD Transceiver Limitations

2017-06-28 Thread Franklin S Cooper Jr
Hi All,

The various CAN transceivers I've seen that support CAN-FD appear to be
fairly limited in terms of their supported max speed. I've seen some
transceivers that only support up to 2 Mbps while others support up to 5
Mbps. This is a problem when the SoC's CAN IP can support even higher
values than the transceiver.

Ideally I would think the MCAN driver should at the very least know the
maximum speed supported by the transceiver it is connected to.
It could then either throw an error if a speed above the transceiver's
capability is requested, or lower the requested speed to whatever the
transceiver is capable of doing.

In either case I do not know if it makes sense to add a DT property
within the MCAN driver or create another subnode that contains this
information. For example, I see some ethernet drivers support a
"fixed-link" subnode which is trying to solve a similar issue. Should I
go with that approach? If so, would it make sense to reuse fixed-link
even though the majority of its properties aren't applicable? Or should I
create something similar such as fixed-can-transceiver?
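One possible shape, purely as a sketch: describe the transceiver limit in a
child node and have the MCAN driver clamp or reject bitrates above it. The
"can-transceiver" node name and "max-bitrate" property below are assumptions
for illustration, not an existing binding:

#include <linux/of.h>

/* Sketch only: node and property names are hypothetical. */
static u32 mcan_transceiver_max_bitrate(struct device_node *np)
{
	struct device_node *xcvr;
	u32 max_bitrate = 0;

	xcvr = of_get_child_by_name(np, "can-transceiver");
	if (!xcvr)
		return 0;	/* no transceiver limit described */

	of_property_read_u32(xcvr, "max-bitrate", &max_bitrate);
	of_node_put(xcvr);

	return max_bitrate;	/* 0 means unknown/unlimited */
}

The driver could then compare the requested bittiming against this value and
either return an error or clamp it, as described above.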


Re: Adding nfc-next to linux-next

2017-06-28 Thread Stephen Rothwell
Hi Samuel,

On Wed, 28 Jun 2017 09:25:31 +0200 Samuel Ortiz  wrote:
>
> Could you please add the nfc-next tree to linux-next?
> 
> It's here:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next.git/
> 
> and the branch is the master one.

Added from today.

Thanks for adding your subsystem tree as a participant of linux-next.  As
you may know, this is not a judgement of your code.  The purpose of
linux-next is for integration testing and to lower the impact of
conflicts between subsystems in the next merge window. 

You will need to ensure that the patches/commits in your tree/series have
been:
 * submitted under GPL v2 (or later) and include the Contributor's
Signed-off-by,
 * posted to the relevant mailing list,
 * reviewed by you (or another maintainer of your subsystem tree),
 * successfully unit tested, and 
 * destined for the current or next Linux merge window.

Basically, this should be just what you would send to Linus (or ask him
to fetch).  It is allowed to be rebased if you deem it necessary.

-- 
Cheers,
Stephen Rothwell 
s...@canb.auug.org.au


Re: [PATCH net-next 1/3] net: ethtool: add support for forward error correction modes

2017-06-28 Thread Dustin Byford
Hi Andrew,

On Wed Jun 28 15:41, Andrew Lunn wrote:
> On Tue, Jun 27, 2017 at 03:22:39AM -0700, Jakub Kicinski wrote:
> > On Sat, 24 Jun 2017 12:19:43 -0700, Roopa Prabhu wrote:
> > > Encoding: Types of encoding
> > > Off:  Turning off any encoding
> > > RS :  enforcing RS-FEC encoding on supported speeds
> > > BaseR  :  enforcing Base R encoding on supported speeds
> > > Auto   :  IEEE defaults for the speed/medium combination
> > 
> > Just to be sure - does auto mean autonegotiate as defined by IEEE or
> > some presets?
> 
> I don't know this field very well. Is this confusion likely to happen
> a lot? Is there a better name for Auto which is less likely to be
> confused?

You're not the first, or the second to ask that question.  I agree it
could use clarification.

I always read auto in this context as automatic rather than autoneg.
The best I can come up with is to perhaps fully spell out "automatic" in
the documentation and the associated uapi enums.  It's accurate, and
hopefully different enough from "autoneg" to hint people away from the
IEEE autoneg concept.

--Dustin


Re: [PATCH v3 net-next 00/12] bpf: rewrite value tracking in verifier

2017-06-28 Thread Alexei Starovoitov
On Wed, Jun 28, 2017 at 10:38:02PM +0200, Daniel Borkmann wrote:
> On 06/28/2017 04:11 PM, Edward Cree wrote:
> > On 28/06/17 14:50, Daniel Borkmann wrote:
> > > Hi Edward,
> > > 
> > > Did you also have a chance in the meantime to look at reducing complexity
> > > along with your unification? I did run the cilium test suite with your
> > > latest set from here and current # worst case processed insns that
> > > verifier has to go through for cilium progs increases from ~53k we have
> > > right now to ~76k. I'm a bit worried that this quickly gets us close to
> > > the upper ~98k max limit starting to reject programs again. Alternative
> > > is to bump the complexity limit again in near future once run into it,
> > > but preferably there's a way to optimize it along with the rewrite? Do
> > > you see any possibilities worth exploring?
> > The trouble, I think, is that as we're now tracking more information about
> >   each register value, we're less able to prune branches.  But often that
> >   information is not actually being used in reaching the exit state.  So it
> 
> Agree.
> 
> >   seems like the way to tackle this would be to track what information is
> >   used — or at least, which registers are read from (including e.g. writing
> >   through them or passing them to helper calls) — in reaching a safe state.
> >   Then only registers which are used are required to match for pruning.
> > But that tracking would presumably have to propagate backwards through the
> >   verifier stack, and I'm not sure how easily that could be done.  Someone
> >   (was it you?) was talking about replacing the current DAG walking and
> >   pruning with some kind of basic-block thing, which would help with this.
> > Summary: I think it could be done, but I haven't looked into the details
> >   of implementation yet; if it's not actually breaking your programs (yet),
> >   maybe leave it for a followup patch series?
> 
> Could we adapt the limit to 128k perhaps as part of this set
> given we know that we're tracking more meta data here anyway?

Increasing the limit is must have, since pruning suffered so much.
Going from 53k to 76k is pretty substantial.
What is the % increase for tests in selftests/ ?
I think we need to pin point exactly the reason.
Saying we just track more data is not enough.
We've tried v2 set on our load balancer and also saw ~20% increase.
I don't remember the absolute numbers.
These jumps don't make me comfortable with these extra tracking.
Can you try to roll back ptr and full negative/positive tracking
and see whether it gets back to what we had before?
I agree that long term it's better to do proper basic block based
liveness, but we need to do understand what's causing the increase today.
If tnum is causing it that would be reasonable trade off to make,
but if it's full neg/pos tracking that has no use today other than
(the whole thing is cleaner) I would rather drop it then.
We can always come back to it later once pruning issues are solved.



Re: [PATCH net-next 0/3] fix sw timestamping for non PTP packets

2017-06-28 Thread Grygorii Strashko



On 06/27/2017 08:58 AM, Ivan Khoronzhuk wrote:

This series contains several corrections connected with timestamping
for cpsw and netcp drivers based on same cpts module.

Based on net/next


Reviewed-by: Grygorii Strashko 



Ivan Khoronzhuk (3):
   net: ethernet: ti: cpsw: move skb timestamp to packet_submit
   net: ethernet: ti: cpsw: fix sw timestamping for non PTP packets
   net: ethernet: ti: netcp_ethss: use cpts to check if packet needs
 timestamping

  drivers/net/ethernet/ti/cpsw.c|  6 +++---
  drivers/net/ethernet/ti/cpts.h| 16 
  drivers/net/ethernet/ti/netcp_ethss.c | 18 +-
  3 files changed, 20 insertions(+), 20 deletions(-)



--
regards,
-grygorii


Re: [PATCH v3 net-next 03/12] nfp: change bpf verifier hooks to match new verifier data structures

2017-06-28 Thread Daniel Borkmann

On 06/27/2017 02:57 PM, Edward Cree wrote:

Signed-off-by: Edward Cree 


Acked-by: Daniel Borkmann 


Re: [PATCH v3 net-next 00/12] bpf: rewrite value tracking in verifier

2017-06-28 Thread Daniel Borkmann

On 06/28/2017 04:11 PM, Edward Cree wrote:

On 28/06/17 14:50, Daniel Borkmann wrote:

Hi Edward,

Did you also have a chance in the meantime to look at reducing complexity
along with your unification? I did run the cilium test suite with your
latest set from here and current # worst case processed insns that
verifier has to go through for cilium progs increases from ~53k we have
right now to ~76k. I'm a bit worried that this quickly gets us close to
the upper ~98k max limit starting to reject programs again. Alternative
is to bump the complexity limit again in near future once run into it,
but preferably there's a way to optimize it along with the rewrite? Do
you see any possibilities worth exploring?

The trouble, I think, is that as we're now tracking more information about
  each register value, we're less able to prune branches.  But often that
  information is not actually being used in reaching the exit state.  So it


Agree.


  seems like the way to tackle this would be to track what information is
  used — or at least, which registers are read from (including e.g. writing
  through them or passing them to helper calls) — in reaching a safe state.
  Then only registers which are used are required to match for pruning.
But that tracking would presumably have to propagate backwards through the
  verifier stack, and I'm not sure how easily that could be done.  Someone
  (was it you?) was talking about replacing the current DAG walking and
  pruning with some kind of basic-block thing, which would help with this.
Summary: I think it could be done, but I haven't looked into the details
  of implementation yet; if it's not actually breaking your programs (yet),
  maybe leave it for a followup patch series?


Could we adapt the limit to 128k perhaps as part of this set
given we know that we're tracking more meta data here anyway?
Then we could potentially avoid going via -stable later on,
biggest pain point is usually tracking differences in LLVM
code generation (e.g. differences in optimizations) along with
verifier changes to make sure that programs still keep loading
on older kernels with e.g. newer LLVM; one of the issues is that
pruning can be quite fragile. E.g. worst case adding a simple
var in a branch that LLVM assigns a stack slot that was otherwise
not used throughout the prog can cause a significant increase of
verifier work (run into this multiple times in the past and
is a bit of a pain to track down actually). If we could keep
some buffer in BPF_COMPLEXITY_LIMIT_INSNS at least when we know
that more work is needed anyway from that point onward, that
would be good.


Re: [PATCH v2] datapath: Avoid using stack larger than 1024.

2017-06-28 Thread Pravin Shelar
On Tue, Jun 27, 2017 at 7:29 PM, Tonghao Zhang  wrote:
> When compiling OvS-master on 4.4.0-81 kernel,
> there is a warning:
>
> CC [M]  /root/ovs/datapath/linux/datapath.o
> /root/ovs/datapath/linux/datapath.c: In function
> ‘ovs_flow_cmd_set’:
> /root/ovs/datapath/linux/datapath.c:1221:1: warning:
> the frame size of 1040 bytes is larger than 1024 bytes
> [-Wframe-larger-than=]
>
> This patch factors out match-init and action-copy to avoid the
> "Wframe-larger-than=1024" warning. Because the mask is only
> used to get the actions, we add a new function to save some
> stack space.
>
> Signed-off-by: Tonghao Zhang 
> ---
>  datapath/datapath.c | 73 
> -
>  1 file changed, 50 insertions(+), 23 deletions(-)
>
> diff --git a/datapath/datapath.c b/datapath/datapath.c
> index c85029c..fdbe314 100644
> --- a/datapath/datapath.c
> +++ b/datapath/datapath.c
> @@ -1100,6 +1100,50 @@ static struct sw_flow_actions *get_flow_actions(struct 
> net *net,
> return acts;
>  }
>
> +/* Factor out match-init and action-copy to avoid the
> + * "Wframe-larger-than=1024" warning. Because the mask is only
> + * used to get the actions, we add a new function to save some
> + * stack space.
> + *
> + * If there are no key and action attrs, we return 0
> + * directly. In that case, the caller will also not use the
> + * match as before. If there is an action attr, we try to get
> + * the actions and save them to *acts.
> + */
> +static int ovs_nla_init_match_and_action(struct net *net,
> +struct sw_flow_match *match,
> +struct sw_flow_key *key,
> +struct nlattr **a,
> +struct sw_flow_actions **acts,
> +bool log)
> +{
> +   struct sw_flow_mask mask;
> +   int error = 0;
> +
> +   if (a[OVS_FLOW_ATTR_KEY]) {
> +   ovs_match_init(match, key, true, &mask);
> +   error = ovs_nla_get_match(net, match, a[OVS_FLOW_ATTR_KEY],
> + a[OVS_FLOW_ATTR_MASK], log);
> +   if (error)
> +   return error;
> +   }
> +
> +   if (a[OVS_FLOW_ATTR_ACTIONS]) {
> +   if (!a[OVS_FLOW_ATTR_KEY]) {
> +   OVS_NLERR(log,
> + "Flow key attribute not present in set 
> flow.");
> +   return -EINVAL;
> +   }
> +
> +   *acts = get_flow_actions(net, a[OVS_FLOW_ATTR_ACTIONS], key,
> + &mask, log);
> +   if (IS_ERR(*acts))
> +   return PTR_ERR(*acts);
> +   }
> +
> +   return 0;
> +}
> +
>  static int ovs_flow_cmd_set(struct sk_buff *skb, struct genl_info *info)
>  {
> struct net *net = sock_net(skb->sk);
> @@ -1107,7 +1151,6 @@ static int ovs_flow_cmd_set(struct sk_buff *skb, struct 
> genl_info *info)
> struct ovs_header *ovs_header = info->userhdr;
> struct sw_flow_key key;
> struct sw_flow *flow;
> -   struct sw_flow_mask mask;
> struct sk_buff *reply = NULL;
> struct datapath *dp;
> struct sw_flow_actions *old_acts = NULL, *acts = NULL;
> @@ -1119,34 +1162,18 @@ static int ovs_flow_cmd_set(struct sk_buff *skb, 
> struct genl_info *info)
> bool ufid_present;
>
> ufid_present = ovs_nla_get_ufid(&sfid, a[OVS_FLOW_ATTR_UFID], log);
> -   if (a[OVS_FLOW_ATTR_KEY]) {
> -   ovs_match_init(&match, &key, true, &mask);
> -   error = ovs_nla_get_match(net, &match, a[OVS_FLOW_ATTR_KEY],
> - a[OVS_FLOW_ATTR_MASK], log);
> -   } else if (!ufid_present) {
> +   if (!a[OVS_FLOW_ATTR_KEY] && !ufid_present) {
> OVS_NLERR(log,
>   "Flow set message rejected, Key attribute 
> missing.");
> -   error = -EINVAL;
> +   return -EINVAL;
> }
> +
> +   error = ovs_nla_init_match_and_action(net, &match, &key, a,
> + &acts, log);
This looks good. But it is returning the match object with a dangling
reference to the mask. It is fine for now as it is not referenced outside
of the function, but it is not pretty. We could reset the mask
pointer before returning from the function, and write a comment
explaining it.
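For reference, the v3 posting earlier in this digest does exactly that, ending
the helper with roughly:

	/* On success, error is 0. */
error:
	match->mask = NULL;
	return error;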


[PATCH V2 0/1] Reduce cdc_ncm memory use when kernel memory low

2017-06-28 Thread Jim Baxter
Problem
---

We are using an ARM embedded platform and require 16KiB NTB's to allow for fast
data transfer. Unfortunately we have found that there are times after
running the kernel for a while and transferring a lot of data over the CDC-NCM
connection that it can become harder to find 16KiB pages of memory for
allocation.
This results in a disconnection of the NCM Gadget attached to the host platform.

We are running with reduced buffers to not cross over into the 32KiB page
boundary by setting the buffer sizes to:
tx_max=16000
rx_max=16000


Analysis


We identified through investigation that the lack of 16KiB pages would be short
lived as the kernel would compact the buddy list soon after the failure which 
results in pages being available within seconds.

Solution


In order to avoid disconnections I implemented a patch that will attempt to
use a 2048 Byte minimum size NTB if the allocation of the maximum size NTB
fails.
This allows the connection to limp along until the memory has been recovered
which was usually between 1 and 4 NTB's on our heavy traffic system.
The algorithm will wait for an increasing number of small allocations each
time we have a failure to not burden a system short on memory.

---

V1: Sent to linux-usb for review.
V2: Added code to increase amount of time spent making small allocations to
reduce the burden on the system.

This is the diff between Version 1 and 2 of the patches.

-- File: drivers/net/usb/cdc_ncm.c
41c51,60
< @@ -1055,10 +1055,10 @@ static struct usb_cdc_ncm_ndp16 *cdc_ncm_ndp(struct 
cdc_ncm_ctx *ctx, struct sk_
---
> @@ -89,6 +89,8 @@ struct cdc_ncm_stats {
>   CDC_NCM_SIMPLE_STAT(rx_ntbs),
>  };
>  
> +#define CDC_NCM_LOW_MEM_MAX_CNT 10
> +
>  static int cdc_ncm_get_sset_count(struct net_device __always_unused *netdev, 
> int sset)
>  {
>   switch (sset) {
> @@ -1055,10 +1057,10 @@ static struct usb_cdc_ncm_ndp16 *cdc_ncm_ndp(struct 
> cdc_ncm_ctx *ctx, struct sk_
59,60c78,91
< + ctx->tx_curr_size = ctx->tx_max;
< + skb_out = alloc_skb(ctx->tx_curr_size, GFP_ATOMIC);
---
> + if (ctx->tx_low_mem_val == 0) {
> + ctx->tx_curr_size = ctx->tx_max;
> + skb_out = alloc_skb(ctx->tx_curr_size, GFP_ATOMIC);
> + /* If the memory allocation fails we will wait longer
> +  * each time before attempting another full size
> +  * allocation again to not overload the system
> +  * further.
> +  */
> + if (skb_out == NULL) {
> + ctx->tx_low_mem_max_cnt = 
> min(ctx->tx_low_mem_max_cnt + 1,
> +   
> (unsigned)CDC_NCM_LOW_MEM_MAX_CNT);
> + ctx->tx_low_mem_val = ctx->tx_low_mem_max_cnt;
> + }
> + }
84a116
> + ctx->tx_low_mem_val--;

-- File: include/linux/usb/cdc_ncm.h
130a163,164
> + u32 tx_low_mem_max_cnt;
> + u32 tx_low_mem_val;


Jim Baxter (1):
  net: cdc_ncm: Reduce memory use when kernel memory low

 drivers/net/usb/cdc_ncm.c   | 54 +++--
 include/linux/usb/cdc_ncm.h |  3 +++
 2 files changed, 45 insertions(+), 12 deletions(-)

-- 
1.9.1



[PATCH V2 1/1] net: cdc_ncm: Reduce memory use when kernel memory low

2017-06-28 Thread Jim Baxter
The CDC-NCM driver can require large amounts of memory to create
skb's and this can be a problem when the memory becomes fragmented.

This especially affects embedded systems that have constrained
resources but wish to maximise the throughput of CDC-NCM with 16KiB
NTB's.

The issue is after running for a while the kernel memory can become
fragmented and it needs compacting.
If the NTB allocation is needed before the memory has been compacted
the atomic allocation can fail which can cause increased latency,
large re-transmissions or disconnections depending upon the data
being transmitted at the time.
This situation occurs for less than a second until the kernel has
compacted the memory but the failed devices can take a lot longer to
recover from the failed TX packets.

To ease this temporary situation I modified the CDC-NCM TX path to
temporarily switch into a reduced memory mode which allocates an NTB
that will fit into a USB_CDC_NCM_NTB_MIN_OUT_SIZE (default 2048 Bytes)
sized memory block and only transmit NTB's with a single network frame
until the memory situation is resolved.
Each time this issue occurs we wait for an increasing number of
reduced size allocations before requesting a full size one to not
put additional pressure on a low memory system.

Once the memory is compacted the CDC-NCM data can resume transmitting
at the normal tx_max rate once again.

Signed-off-by: Jim Baxter 

---

V1: Sent to linux-usb for review.
V2: Added code to increase amount of time spent making small allocations to
reduce the burden on the system.

 drivers/net/usb/cdc_ncm.c   | 54 +++--
 include/linux/usb/cdc_ncm.h |  3 +++
 2 files changed, 45 insertions(+), 12 deletions(-)

diff --git a/drivers/net/usb/cdc_ncm.c b/drivers/net/usb/cdc_ncm.c
index b5cec18..f9187d8 100644
--- a/drivers/net/usb/cdc_ncm.c
+++ b/drivers/net/usb/cdc_ncm.c
@@ -89,6 +89,8 @@ struct cdc_ncm_stats {
CDC_NCM_SIMPLE_STAT(rx_ntbs),
 };
 
+#define CDC_NCM_LOW_MEM_MAX_CNT 10
+
 static int cdc_ncm_get_sset_count(struct net_device __always_unused *netdev, 
int sset)
 {
switch (sset) {
@@ -1055,10 +1057,10 @@ static struct usb_cdc_ncm_ndp16 *cdc_ncm_ndp(struct 
cdc_ncm_ctx *ctx, struct sk_
 
/* align new NDP */
if (!(ctx->drvflags & CDC_NCM_FLAG_NDP_TO_END))
-   cdc_ncm_align_tail(skb, ctx->tx_ndp_modulus, 0, ctx->tx_max);
+   cdc_ncm_align_tail(skb, ctx->tx_ndp_modulus, 0, 
ctx->tx_curr_size);
 
/* verify that there is room for the NDP and the datagram (reserve) */
-   if ((ctx->tx_max - skb->len - reserve) < ctx->max_ndp_size)
+   if ((ctx->tx_curr_size - skb->len - reserve) < ctx->max_ndp_size)
return NULL;
 
/* link to it */
@@ -,13 +1113,41 @@ struct sk_buff *
 
/* allocate a new OUT skb */
if (!skb_out) {
-   skb_out = alloc_skb(ctx->tx_max, GFP_ATOMIC);
+   if (ctx->tx_low_mem_val == 0) {
+   ctx->tx_curr_size = ctx->tx_max;
+   skb_out = alloc_skb(ctx->tx_curr_size, GFP_ATOMIC);
+   /* If the memory allocation fails we will wait longer
+* each time before attempting another full size
+* allocation again to not overload the system
+* further.
+*/
+   if (skb_out == NULL) {
+   ctx->tx_low_mem_max_cnt = 
min(ctx->tx_low_mem_max_cnt + 1,
+ 
(unsigned)CDC_NCM_LOW_MEM_MAX_CNT);
+   ctx->tx_low_mem_val = ctx->tx_low_mem_max_cnt;
+   }
+   }
if (skb_out == NULL) {
-   if (skb != NULL) {
-   dev_kfree_skb_any(skb);
-   dev->net->stats.tx_dropped++;
+   /* See if a very small allocation is possible.
+* We will send this packet immediately and hope
+* that there is more memory available later.
+*/
+   if (skb)
+   ctx->tx_curr_size = max(skb->len,
+   (u32)USB_CDC_NCM_NTB_MIN_OUT_SIZE);
+   else
+   ctx->tx_curr_size = 
USB_CDC_NCM_NTB_MIN_OUT_SIZE;
+   skb_out = alloc_skb(ctx->tx_curr_size, GFP_ATOMIC);
+
+   /* No allocation possible so we will abort */
+   if (skb_out == NULL) {
+   if (skb != NULL) {
+   dev_kfree_skb_any(skb);
+   dev->net->stats.tx_dropped++;
+   }
+   goto exit_no_skb;
 

[PATCH net-next v2 0/9] introduce flower offload capabilities

2017-06-28 Thread Simon Horman
Hi,

this series adds flower offload to the NFP driver. It builds on recent
work to add representor and a skeleton flower app - now the app does what
its name says.

In general the approach taken is to allow some flows within
the universe of possible flower matches and tc actions to be offloaded.
It is planned that this support will grow over time but the support
offered by this patch-set seems to be a reasonable starting point.

Key Changes since RFC:
* Revised locking scheme for flows
* Make generalise tc_setup NDO
* Addressed other review of RFC
* Dropped RFC designation

Pieter Jansen van Vuuren (7):
  nfp: provide infrastructure for offloading flower based TC filters
  nfp: extend flower add flow offload
  nfp: extend flower matching capabilities
  nfp: add basic action capabilities to flower offloads
  nfp: add metadata to each flow offload
  nfp: add a stats handler for flower offloads
  nfp: add control message passing capabilities to flower offloads

Simon Horman (2):
  net: switchdev: add SET_SWITCHDEV_OPS helper
  nfp: add phys_switch_id support

 drivers/net/ethernet/netronome/nfp/Makefile|   6 +-
 drivers/net/ethernet/netronome/nfp/flower/action.c | 211 +
 drivers/net/ethernet/netronome/nfp/flower/cmsg.c   |  11 +-
 drivers/net/ethernet/netronome/nfp/flower/cmsg.h   | 202 +
 drivers/net/ethernet/netronome/nfp/flower/main.c   |  32 +-
 drivers/net/ethernet/netronome/nfp/flower/main.h   | 159 +++
 drivers/net/ethernet/netronome/nfp/flower/match.c  | 292 
 .../net/ethernet/netronome/nfp/flower/metadata.c   | 497 +
 .../net/ethernet/netronome/nfp/flower/offload.c| 397 
 .../net/ethernet/netronome/nfp/nfp_net_common.c|  17 +-
 drivers/net/ethernet/netronome/nfp/nfp_net_repr.c  |   8 +
 drivers/net/ethernet/netronome/nfp/nfp_net_repr.h  |   9 +
 drivers/net/ethernet/netronome/nfp/nfp_port.c  |  43 ++
 drivers/net/ethernet/netronome/nfp/nfp_port.h  |   8 +
 include/net/switchdev.h|   4 +
 15 files changed, 1869 insertions(+), 27 deletions(-)
 create mode 100644 drivers/net/ethernet/netronome/nfp/flower/action.c
 create mode 100644 drivers/net/ethernet/netronome/nfp/flower/main.h
 create mode 100644 drivers/net/ethernet/netronome/nfp/flower/match.c
 create mode 100644 drivers/net/ethernet/netronome/nfp/flower/metadata.c
 create mode 100644 drivers/net/ethernet/netronome/nfp/flower/offload.c

-- 
2.1.4



[PATCH net-next v2 7/9] nfp: add metadata to each flow offload

2017-06-28 Thread Simon Horman
From: Pieter Jansen van Vuuren 

Adds metadata describing the mask id of each flow and keeps track of
flows installed in hardware. Previously a flow could not be removed
from hardware as there was no way of knowing whether a specific flow
was installed. This is solved by storing the offloaded flows in a
hash table.
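
As a rough sketch of the idea (names are illustrative, not the exact
helpers added in metadata.c), each offloaded flow is keyed by its TC
cookie in a kernel hashtable so it can be found again at removal time:

  /* Sketch only; assumes <linux/hashtable.h> and the flow_table
   * declared in struct nfp_flower_priv below.
   */
  struct nfp_fl_entry {
          unsigned long tc_flower_cookie;
          struct nfp_fl_payload *payload;
          struct hlist_node link;
  };

  static void flow_table_add(struct nfp_flower_priv *priv,
                             struct nfp_fl_entry *entry)
  {
          /* hash on the TC cookie so removal can find the entry */
          hash_add(priv->flow_table, &entry->link, entry->tc_flower_cookie);
  }

  static struct nfp_fl_entry *
  flow_table_find(struct nfp_flower_priv *priv, unsigned long cookie)
  {
          struct nfp_fl_entry *entry;

          hash_for_each_possible(priv->flow_table, entry, link, cookie)
                  if (entry->tc_flower_cookie == cookie)
                          return entry;
          return NULL;
  }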

Signed-off-by: Pieter Jansen van Vuuren 
Signed-off-by: Simon Horman 
---
 drivers/net/ethernet/netronome/nfp/Makefile|   1 +
 drivers/net/ethernet/netronome/nfp/flower/main.c   |  14 +-
 drivers/net/ethernet/netronome/nfp/flower/main.h   |  53 +++
 .../net/ethernet/netronome/nfp/flower/metadata.c   | 377 +
 .../net/ethernet/netronome/nfp/flower/offload.c|  22 +-
 5 files changed, 456 insertions(+), 11 deletions(-)
 create mode 100644 drivers/net/ethernet/netronome/nfp/flower/metadata.c

diff --git a/drivers/net/ethernet/netronome/nfp/Makefile 
b/drivers/net/ethernet/netronome/nfp/Makefile
index 1ba0ea78adc3..b8e1358868bd 100644
--- a/drivers/net/ethernet/netronome/nfp/Makefile
+++ b/drivers/net/ethernet/netronome/nfp/Makefile
@@ -35,6 +35,7 @@ nfp-objs += \
flower/cmsg.o \
flower/main.o \
flower/match.o \
+   flower/metadata.o \
flower/offload.o
 endif
 
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.c 
b/drivers/net/ethernet/netronome/nfp/flower/main.c
index 19f20f819e2f..1103d23a8ec7 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.c
@@ -50,14 +50,6 @@
 
 #define NFP_FLOWER_ALLOWED_VER 0x00010001UL
 
-/**
- * struct nfp_flower_priv - Flower APP per-vNIC priv data
- * @nn: Pointer to vNIC
- */
-struct nfp_flower_priv {
-   struct nfp_net *nn;
-};
-
 static const char *nfp_flower_extra_cap(struct nfp_app *app, struct nfp_net 
*nn)
 {
return "FLOWER";
@@ -351,6 +343,12 @@ static int nfp_flower_init(struct nfp_app *app)
if (!app->priv)
return -ENOMEM;
 
+   err = nfp_flower_metadata_init(app);
+   if (err) {
+   kfree(app->priv);
+   return err;
+   }
+
return 0;
 }
 
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.h 
b/drivers/net/ethernet/netronome/nfp/flower/main.h
index 7c9530504752..ae54b8052043 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.h
@@ -34,12 +34,51 @@
 #ifndef __NFP_FLOWER_H__
 #define __NFP_FLOWER_H__ 1
 
+#include 
+#include 
+#include 
 #include 
 
 struct tc_to_netdev;
 struct net_device;
 struct nfp_app;
 
+#define NFP_FLOWER_HASH_BITS   10
+#define NFP_FLOWER_HASH_SEED   129004
+
+#define NFP_FLOWER_MASK_ENTRY_RS   256
+#define NFP_FLOWER_MASK_ELEMENT_RS 1
+#define NFP_FLOWER_MASK_HASH_BITS  10
+#define NFP_FLOWER_MASK_HASH_SEED  9198806
+
+#define NFP_FL_META_FLAG_NEW_MASK  128
+#define NFP_FL_META_FLAG_LAST_MASK 1
+
+#define NFP_FL_MASK_REUSE_TIME_NS  4
+#define NFP_FL_MASK_ID_LOCATION1
+
+struct nfp_fl_mask_id {
+   struct circ_buf mask_id_free_list;
+   struct timespec64 *last_used;
+   u8 init_unallocated;
+};
+
+/**
+ * struct nfp_flower_priv - Flower APP per-vNIC priv data
+ * @nn:Pointer to vNIC
+ * @flower_version:HW version of flower
+ * @mask_ids:  List of free mask ids
+ * @mask_table:Hash table used to store masks
+ * @flow_table:Hash table used to store flower rules
+ */
+struct nfp_flower_priv {
+   struct nfp_net *nn;
+   u64 flower_version;
+   struct nfp_fl_mask_id mask_ids;
+   DECLARE_HASHTABLE(mask_table, NFP_FLOWER_MASK_HASH_BITS);
+   DECLARE_HASHTABLE(flow_table, NFP_FLOWER_HASH_BITS);
+};
+
 struct nfp_fl_key_ls {
u32 key_layer_two;
u8 key_layer;
@@ -64,6 +103,9 @@ struct nfp_fl_payload {
char *action_data;
 };
 
+int nfp_flower_metadata_init(struct nfp_app *app);
+void nfp_flower_metadata_cleanup(struct nfp_app *app);
+
 int nfp_flower_setup_tc(struct nfp_app *app, struct net_device *netdev,
u32 handle, __be16 proto, struct tc_to_netdev *tc);
 int nfp_flower_compile_flow_match(struct tc_cls_flower_offload *flow,
@@ -73,5 +115,16 @@ int nfp_flower_compile_flow_match(struct 
tc_cls_flower_offload *flow,
 int nfp_flower_compile_action(struct tc_cls_flower_offload *flow,
  struct net_device *netdev,
  struct nfp_fl_payload *nfp_flow);
+int nfp_compile_flow_metadata(struct nfp_app *app,
+ struct tc_cls_flower_offload *flow,
+ struct nfp_fl_payload *nfp_flow);
+int nfp_modify_flow_metadata(struct nfp_app *app,
+struct 

[PATCH net-next v2 8/9] nfp: add a stats handler for flower offloads

2017-06-28 Thread Simon Horman
From: Pieter Jansen van Vuuren 

Previously there was no way of updating flow rule stats after they
had been offloaded to hardware. This is solved by keeping track of
stats received from hardware and providing them to the TC handler
on request.
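
Conceptually, the stats handler walks the stats frames carried in a
control message and folds each one into the matching flow's counters
under the per-flow lock. A simplified sketch (not the exact code in
metadata.c, and assuming the hardware reports per-interval deltas):

  static void sketch_update_stats(struct nfp_app *app,
                                  struct nfp_fl_stats_frame *frame)
  {
          unsigned long cookie = be64_to_cpu(frame->stats_cookie);
          struct nfp_fl_payload *nfp_flow;

          /* look up the offloaded flow by its TC cookie */
          nfp_flow = nfp_flower_find_in_fl_table(app, cookie);
          if (!nfp_flow)
                  return;

          spin_lock(&nfp_flow->lock);
          nfp_flow->stats.pkts  += be32_to_cpu(frame->pkt_count);
          nfp_flow->stats.bytes += be64_to_cpu(frame->byte_count);
          nfp_flow->stats.used   = jiffies;
          spin_unlock(&nfp_flow->lock);
  }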

Signed-off-by: Pieter Jansen van Vuuren 
Signed-off-by: Simon Horman 
---
 drivers/net/ethernet/netronome/nfp/flower/cmsg.c   |   5 -
 drivers/net/ethernet/netronome/nfp/flower/cmsg.h   |   5 +
 drivers/net/ethernet/netronome/nfp/flower/main.h   |  29 +
 .../net/ethernet/netronome/nfp/flower/metadata.c   | 124 -
 .../net/ethernet/netronome/nfp/flower/offload.c|  20 +++-
 5 files changed, 175 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/cmsg.c 
b/drivers/net/ethernet/netronome/nfp/flower/cmsg.c
index 916a6196d2ba..0f5410aa66d6 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/cmsg.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/cmsg.c
@@ -52,11 +52,6 @@ nfp_flower_cmsg_get_hdr(struct sk_buff *skb)
return (struct nfp_flower_cmsg_hdr *)skb->data;
 }
 
-static void *nfp_flower_cmsg_get_data(struct sk_buff *skb)
-{
-   return (unsigned char *)skb->data + NFP_FLOWER_CMSG_HLEN;
-}
-
 static struct sk_buff *
 nfp_flower_cmsg_alloc(struct nfp_app *app, unsigned int size,
  enum nfp_flower_cmsg_type_port type)
diff --git a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h 
b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
index 4c72e537af32..736c4848f073 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
@@ -300,6 +300,11 @@ nfp_flower_cmsg_pcie_port(u8 nfp_pcie, enum 
nfp_flower_cmsg_port_vnic_type type,
   NFP_FLOWER_CMSG_PORT_TYPE_PCIE_PORT);
 }
 
+static inline void *nfp_flower_cmsg_get_data(struct sk_buff *skb)
+{
+   return (unsigned char *)skb->data + NFP_FLOWER_CMSG_HLEN;
+}
+
 int nfp_flower_cmsg_portmod(struct nfp_repr *repr, bool carrier_ok);
 void nfp_flower_cmsg_rx(struct nfp_app *app, struct sk_buff *skb);
 
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.h 
b/drivers/net/ethernet/netronome/nfp/flower/main.h
index ae54b8052043..cfa54540aff0 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.h
@@ -43,9 +43,13 @@ struct tc_to_netdev;
 struct net_device;
 struct nfp_app;
 
+#define NFP_FL_REPEATED_HASH_MAX   BIT(17)
 #define NFP_FLOWER_HASH_BITS   10
 #define NFP_FLOWER_HASH_SEED   129004
 
+#define NFP_FL_STATS_ENTRY_RS  BIT(20)
+#define NFP_FL_STATS_ELEM_RS   4
+
 #define NFP_FLOWER_MASK_ENTRY_RS   256
 #define NFP_FLOWER_MASK_ELEMENT_RS 1
 #define NFP_FLOWER_MASK_HASH_BITS  10
@@ -63,10 +67,17 @@ struct nfp_fl_mask_id {
u8 init_unallocated;
 };
 
+struct nfp_fl_stats_id {
+   struct circ_buf free_list;
+   u32 init_unalloc;
+   u8 repeated_em_count;
+};
+
 /**
  * struct nfp_flower_priv - Flower APP per-vNIC priv data
  * @nn:Pointer to vNIC
  * @flower_version:HW version of flower
+ * @stats_ids: List of free stats ids
  * @mask_ids:  List of free mask ids
  * @mask_table:Hash table used to store masks
  * @flow_table:Hash table used to store flower rules
@@ -74,6 +85,7 @@ struct nfp_fl_mask_id {
 struct nfp_flower_priv {
struct nfp_net *nn;
u64 flower_version;
+   struct nfp_fl_stats_id stats_ids;
struct nfp_fl_mask_id mask_ids;
DECLARE_HASHTABLE(mask_table, NFP_FLOWER_MASK_HASH_BITS);
DECLARE_HASHTABLE(flow_table, NFP_FLOWER_HASH_BITS);
@@ -96,13 +108,28 @@ struct nfp_fl_rule_metadata {
__be32 shortcut;
 };
 
+struct nfp_fl_stats {
+   u64 pkts;
+   u64 bytes;
+   u64 used;
+};
+
 struct nfp_fl_payload {
struct nfp_fl_rule_metadata meta;
+   spinlock_t lock; /* lock stats */
+   struct nfp_fl_stats stats;
char *unmasked_data;
char *mask_data;
char *action_data;
 };
 
+struct nfp_fl_stats_frame {
+   __be32 stats_con_id;
+   __be32 pkt_count;
+   __be64 byte_count;
+   __be64 stats_cookie;
+};
+
 int nfp_flower_metadata_init(struct nfp_app *app);
 void nfp_flower_metadata_cleanup(struct nfp_app *app);
 
@@ -127,4 +154,6 @@ nfp_flower_find_in_fl_table(struct nfp_app *app,
 struct nfp_fl_payload *
 nfp_flower_remove_fl_table(struct nfp_app *app, unsigned long 
tc_flower_cookie);
 
+void nfp_flower_rx_flow_stats(struct nfp_app *app, struct sk_buff *skb);
+
 #endif
diff --git a/drivers/net/ethernet/netronome/nfp/flower/metadata.c 
b/drivers/net/ethernet/netronome/nfp/flower/metadata.c
index 51de9ab85951..f03e45ab15af 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/metadata.c
+++ 

[PATCH net-next v2 6/9] nfp: add basic action capabilities to flower offloads

2017-06-28 Thread Simon Horman
From: Pieter Jansen van Vuuren 

Adds push vlan, pop vlan, output and drop action capabilities
to flower offloads.

Signed-off-by: Pieter Jansen van Vuuren 
Signed-off-by: Simon Horman 
---
 drivers/net/ethernet/netronome/nfp/Makefile|   1 +
 drivers/net/ethernet/netronome/nfp/flower/action.c | 211 +
 drivers/net/ethernet/netronome/nfp/flower/cmsg.h   |  45 +
 drivers/net/ethernet/netronome/nfp/flower/main.h   |   3 +
 .../net/ethernet/netronome/nfp/flower/offload.c|  11 ++
 5 files changed, 271 insertions(+)
 create mode 100644 drivers/net/ethernet/netronome/nfp/flower/action.c

diff --git a/drivers/net/ethernet/netronome/nfp/Makefile 
b/drivers/net/ethernet/netronome/nfp/Makefile
index 018cef3fa10a..1ba0ea78adc3 100644
--- a/drivers/net/ethernet/netronome/nfp/Makefile
+++ b/drivers/net/ethernet/netronome/nfp/Makefile
@@ -31,6 +31,7 @@ nfp-objs := \
 
 ifeq ($(CONFIG_NFP_APP_FLOWER),y)
 nfp-objs += \
+   flower/action.o \
flower/cmsg.o \
flower/main.o \
flower/match.o \
diff --git a/drivers/net/ethernet/netronome/nfp/flower/action.c 
b/drivers/net/ethernet/netronome/nfp/flower/action.c
new file mode 100644
index ..291b9af0e5d4
--- /dev/null
+++ b/drivers/net/ethernet/netronome/nfp/flower/action.c
@@ -0,0 +1,211 @@
+/*
+ * Copyright (C) 2017 Netronome Systems, Inc.
+ *
+ * This software is dual licensed under the GNU General License Version 2,
+ * June 1991 as shown in the file COPYING in the top-level directory of this
+ * source tree or the BSD 2-Clause License provided below.  You have the
+ * option to license this software under the complete terms of either license.
+ *
+ * The BSD 2-Clause License:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  1. Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ *  2. Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "cmsg.h"
+#include "main.h"
+#include "../nfp_net_repr.h"
+
+static void nfp_fl_pop_vlan(struct nfp_fl_pop_vlan *pop_vlan)
+{
+   size_t act_size = sizeof(struct nfp_fl_pop_vlan);
+   u16 tmp_pop_vlan_op;
+
+   tmp_pop_vlan_op =
+   FIELD_PREP(NFP_FL_ACT_LEN_LW, act_size / NFP_FL_LW_SIZ) |
+   FIELD_PREP(NFP_FL_ACT_JMP_ID, NFP_FL_ACTION_OPCODE_POP_VLAN);
+
+   pop_vlan->a_op = cpu_to_be16(tmp_pop_vlan_op);
+   pop_vlan->reserved = 0;
+}
+
+static void
+nfp_fl_push_vlan(struct nfp_fl_push_vlan *push_vlan,
+const struct tc_action *action)
+{
+   size_t act_size = sizeof(struct nfp_fl_push_vlan);
+   struct tcf_vlan *vlan = to_vlan(action);
+   u16 tmp_push_vlan_tci;
+   u16 tmp_push_vlan_op;
+
+   tmp_push_vlan_op =
+   FIELD_PREP(NFP_FL_ACT_LEN_LW, act_size / NFP_FL_LW_SIZ) |
+   FIELD_PREP(NFP_FL_ACT_JMP_ID, NFP_FL_ACTION_OPCODE_PUSH_VLAN);
+
+   push_vlan->a_op = cpu_to_be16(tmp_push_vlan_op);
+   /* Set action push vlan parameters. */
+   push_vlan->reserved = 0;
+   push_vlan->vlan_tpid = tcf_vlan_push_proto(action);
+
+   tmp_push_vlan_tci =
+   FIELD_PREP(NFP_FL_PUSH_VLAN_PRIO, vlan->tcfv_push_prio) |
+   FIELD_PREP(NFP_FL_PUSH_VLAN_VID, vlan->tcfv_push_vid) |
+   NFP_FL_PUSH_VLAN_CFI;
+   push_vlan->vlan_tci = cpu_to_be16(tmp_push_vlan_tci);
+}
+
+static int
+nfp_fl_output(struct nfp_fl_output *output, const struct tc_action *action,
+ struct nfp_fl_payload *nfp_flow, bool last,
+ struct net_device *in_dev)
+{
+   size_t act_size = sizeof(struct nfp_fl_output);
+   struct net_device *out_dev;
+   u16 tmp_output_op;
+   int ifindex;
+
+   /* Set action opcode to output action. */
+   tmp_output_op =
+   FIELD_PREP(NFP_FL_ACT_LEN_LW, act_size / NFP_FL_LW_SIZ) |
+   FIELD_PREP(NFP_FL_ACT_JMP_ID, NFP_FL_ACTION_OPCODE_OUTPUT);
+
+ 

[PATCH net-next v2 4/9] nfp: extend flower add flow offload

2017-06-28 Thread Simon Horman
From: Pieter Jansen van Vuuren 

Extends the flower flow add function by calculating which match
fields are present in the flower offload structure and allocating
the appropriate space to describe these.
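
Roughly, the idea is to inspect which dissector keys the filter uses and
grow the key layer bitmap and key size accordingly. A simplified sketch
(not the exact code in offload.c; the key_size field and the IPv4 match
struct name are assumptions for illustration):

  static void sketch_key_layers(struct tc_cls_flower_offload *flow,
                                struct nfp_fl_key_ls *key_ls)
  {
          /* metadata, ingress port and MAC layers are always present */
          key_ls->key_layer = NFP_FLOWER_LAYER_META | NFP_FLOWER_LAYER_PORT |
                              NFP_FLOWER_LAYER_MAC;
          key_ls->key_size = sizeof(struct nfp_flower_meta_one) +
                             sizeof(struct nfp_flower_in_port) +
                             sizeof(struct nfp_flower_mac_mpls);

          if (dissector_uses_key(flow->dissector,
                                 FLOW_DISSECTOR_KEY_IPV4_ADDRS)) {
                  key_ls->key_layer |= NFP_FLOWER_LAYER_IPV4;
                  key_ls->key_size += sizeof(struct nfp_flower_ipv4);
          }

          if (dissector_uses_key(flow->dissector, FLOW_DISSECTOR_KEY_PORTS)) {
                  key_ls->key_layer |= NFP_FLOWER_LAYER_TP;
                  key_ls->key_size += sizeof(struct nfp_flower_tp_ports);
          }
  }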

Signed-off-by: Pieter Jansen van Vuuren 
Signed-off-by: Simon Horman 
---
 drivers/net/ethernet/netronome/nfp/flower/cmsg.h   | 141 +
 drivers/net/ethernet/netronome/nfp/flower/main.h   |  24 +++
 .../net/ethernet/netronome/nfp/flower/offload.c| 166 -
 3 files changed, 330 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h 
b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
index c10ae7631941..1b1888e8dc14 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
@@ -40,6 +40,147 @@
 
 #include "../nfp_app.h"
 
+#define NFP_FLOWER_LAYER_META  BIT(0)
+#define NFP_FLOWER_LAYER_PORT  BIT(1)
+#define NFP_FLOWER_LAYER_MAC   BIT(2)
+#define NFP_FLOWER_LAYER_TPBIT(3)
+#define NFP_FLOWER_LAYER_IPV4  BIT(4)
+#define NFP_FLOWER_LAYER_IPV6  BIT(5)
+#define NFP_FLOWER_LAYER_CTBIT(6)
+#define NFP_FLOWER_LAYER_VXLAN BIT(7)
+
+#define NFP_FLOWER_LAYER_ETHER BIT(3)
+#define NFP_FLOWER_LAYER_ARP   BIT(4)
+
+/* Metadata without L2 (1W/4B)
+ * 
+ *3   2   1
+ *  1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |  key_layers   |mask_id|   reserved|
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ */
+struct nfp_flower_meta_one {
+   u8 nfp_flow_key_layer;
+   u8 mask_id;
+   u16 reserved;
+};
+
+/* Metadata with L2 (1W/4B)
+ * 
+ *3   2   1
+ *  1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |key_type   |mask_id| PCP |p|   vlan outermost VID  |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * ^   ^
+ *   NOTE: | TCI   |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ */
+struct nfp_flower_meta_two {
+   u8 nfp_flow_key_layer;
+   u8 mask_id;
+   __be16 tci;
+};
+
+/* Port details (1W/4B)
+ * 
+ *3   2   1
+ *  1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * | port_ingress  |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ */
+struct nfp_flower_in_port {
+   __be32 in_port;
+};
+
+/* L2 details (4W/16B)
+ *3   2   1
+ *  1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * | mac_addr_dst, 31 - 0  |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |  mac_addr_dst, 47 - 32| mac_addr_src, 15 - 0  |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * | mac_addr_src, 47 - 16 |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |   mpls outermost label|  TC |B|   reserved  |q|
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ */
+struct nfp_flower_mac_mpls {
+   u8 mac_dst[6];
+   u8 mac_src[6];
+   __be32 mpls_lse;
+};
+
+/* L4 ports (for UDP, TCP, SCTP) (1W/4B)
+ *3   2   1
+ *  1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |port_src   |   port_dst|
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ */
+struct nfp_flower_tp_ports {
+   __be16 port_src;
+   __be16 port_dst;
+};
+
+/* L3 IPv4 details (3W/12B)
+ *3   2   1
+ *  1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |DSCP   |ECN|   protocol|   reserved|
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |ipv4_addr_src  |
+ * 

[PATCH net-next v2 9/9] nfp: add control message passing capabilities to flower offloads

2017-06-28 Thread Simon Horman
From: Pieter Jansen van Vuuren 

Previously the flower offloads never sent messages to the hardware
and never registered a handler for receiving messages from hardware.
This patch enables the flower offloads to send control messages to
hardware when adding and removing flow rules. Additionally it
registers a control message rx handler for receiving stats updates
from hardware for each offloaded flow.

This patch also adds 4 control message types: flow add, modify and
delete, as well as flow stats. It also allows
nfp_flower_cmsg_get_data() to be used outside of cmsg.c.

Signed-off-by: Pieter Jansen van Vuuren 
Signed-off-by: Simon Horman 
---
 drivers/net/ethernet/netronome/nfp/flower/cmsg.c   |  6 ++-
 drivers/net/ethernet/netronome/nfp/flower/cmsg.h   |  7 +++
 .../net/ethernet/netronome/nfp/flower/offload.c| 56 ++
 3 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/cmsg.c 
b/drivers/net/ethernet/netronome/nfp/flower/cmsg.c
index 0f5410aa66d6..dd7fa9cf225f 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/cmsg.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/cmsg.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 
+#include "main.h"
 #include "../nfpcore/nfp_cpp.h"
 #include "../nfp_net_repr.h"
 #include "./cmsg.h"
@@ -52,7 +53,7 @@ nfp_flower_cmsg_get_hdr(struct sk_buff *skb)
return (struct nfp_flower_cmsg_hdr *)skb->data;
 }
 
-static struct sk_buff *
+struct sk_buff *
 nfp_flower_cmsg_alloc(struct nfp_app *app, unsigned int size,
  enum nfp_flower_cmsg_type_port type)
 {
@@ -143,6 +144,9 @@ void nfp_flower_cmsg_rx(struct nfp_app *app, struct sk_buff 
*skb)
case NFP_FLOWER_CMSG_TYPE_PORT_MOD:
nfp_flower_cmsg_portmod_rx(app, skb);
break;
+   case NFP_FLOWER_CMSG_TYPE_FLOW_STATS:
+   nfp_flower_rx_flow_stats(app, skb);
+   break;
default:
nfp_flower_cmsg_warn(app, "Cannot handle invalid repr control 
type %u\n",
 type);
diff --git a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h 
b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
index 736c4848f073..5a997feb6f80 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
@@ -245,7 +245,11 @@ struct nfp_flower_cmsg_hdr {
 
 /* Types defined for port related control messages  */
 enum nfp_flower_cmsg_type_port {
+   NFP_FLOWER_CMSG_TYPE_FLOW_ADD = 0,
+   NFP_FLOWER_CMSG_TYPE_FLOW_MOD = 1,
+   NFP_FLOWER_CMSG_TYPE_FLOW_DEL = 2,
NFP_FLOWER_CMSG_TYPE_PORT_MOD = 8,
+   NFP_FLOWER_CMSG_TYPE_FLOW_STATS =   15,
NFP_FLOWER_CMSG_TYPE_PORT_ECHO =16,
NFP_FLOWER_CMSG_TYPE_MAX =  32,
 };
@@ -307,5 +311,8 @@ static inline void *nfp_flower_cmsg_get_data(struct sk_buff 
*skb)
 
 int nfp_flower_cmsg_portmod(struct nfp_repr *repr, bool carrier_ok);
 void nfp_flower_cmsg_rx(struct nfp_app *app, struct sk_buff *skb);
+struct sk_buff *
+nfp_flower_cmsg_alloc(struct nfp_app *app, unsigned int size,
+ enum nfp_flower_cmsg_type_port type);
 
 #endif
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c 
b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index 79df9d74f89e..73b3467f5bcd 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -44,6 +44,52 @@
 #include "../nfp_net.h"
 #include "../nfp_port.h"
 
+static int
+nfp_flower_xmit_flow(struct net_device *netdev,
+struct nfp_fl_payload *nfp_flow, u8 mtype)
+{
+   u32 meta_len, key_len, mask_len, act_len, tot_len;
+   struct nfp_repr *priv = netdev_priv(netdev);
+   struct sk_buff *skb;
+   unsigned char *msg;
+
+   meta_len =  sizeof(struct nfp_fl_rule_metadata);
+   key_len = nfp_flow->meta.key_len;
+   mask_len = nfp_flow->meta.mask_len;
+   act_len = nfp_flow->meta.act_len;
+
+   tot_len = meta_len + key_len + mask_len + act_len;
+
+   /* Convert to long words as firmware expects
+* lengths in units of NFP_FL_LW_SIZ.
+*/
+   nfp_flow->meta.key_len /= NFP_FL_LW_SIZ;
+   nfp_flow->meta.mask_len /= NFP_FL_LW_SIZ;
+   nfp_flow->meta.act_len /= NFP_FL_LW_SIZ;
+
+   skb = nfp_flower_cmsg_alloc(priv->app, tot_len, mtype);
+   if (!skb)
+   return -ENOMEM;
+
+   msg = nfp_flower_cmsg_get_data(skb);
+   memcpy(msg, &nfp_flow->meta, meta_len);
+   memcpy(&msg[meta_len], nfp_flow->unmasked_data, key_len);
+   memcpy(&msg[meta_len + key_len], nfp_flow->mask_data, mask_len);
+   memcpy(&msg[meta_len + key_len + mask_len],
+  nfp_flow->action_data, act_len);
+
+   /* Convert back to bytes as software expects
+

[PATCH net-next v2 5/9] nfp: extend flower matching capabilities

2017-06-28 Thread Simon Horman
From: Pieter Jansen van Vuuren 

Extends matching capabilities for flower offloads to include vlan,
layer 2, layer 3 and layer 4 type matches. This includes both exact
and wildcard matching.
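
For example, a filter that matches a destination MAC address exactly is
sent with an all-ones mask (ff:ff:ff:ff:ff:ff) for that field, while
wildcarding, say, the last octet would use a mask of ff:ff:ff:ff:ff:00;
fields that are not matched at all carry an all-zero mask.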

Signed-off-by: Pieter Jansen van Vuuren 
Signed-off-by: Simon Horman 
---
 drivers/net/ethernet/netronome/nfp/Makefile|   1 +
 drivers/net/ethernet/netronome/nfp/flower/cmsg.h   |   4 +
 drivers/net/ethernet/netronome/nfp/flower/main.h   |   5 +
 drivers/net/ethernet/netronome/nfp/flower/match.c  | 292 +
 .../net/ethernet/netronome/nfp/flower/offload.c|   5 +
 drivers/net/ethernet/netronome/nfp/nfp_net_repr.h  |   9 +
 6 files changed, 316 insertions(+)
 create mode 100644 drivers/net/ethernet/netronome/nfp/flower/match.c

diff --git a/drivers/net/ethernet/netronome/nfp/Makefile 
b/drivers/net/ethernet/netronome/nfp/Makefile
index d7afd2b410fe..018cef3fa10a 100644
--- a/drivers/net/ethernet/netronome/nfp/Makefile
+++ b/drivers/net/ethernet/netronome/nfp/Makefile
@@ -33,6 +33,7 @@ ifeq ($(CONFIG_NFP_APP_FLOWER),y)
 nfp-objs += \
flower/cmsg.o \
flower/main.o \
+   flower/match.o \
flower/offload.o
 endif
 
diff --git a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h 
b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
index 1b1888e8dc14..1956c1acf39f 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
@@ -52,6 +52,10 @@
 #define NFP_FLOWER_LAYER_ETHER BIT(3)
 #define NFP_FLOWER_LAYER_ARP   BIT(4)
 
+#define NFP_FLOWER_MASK_VLAN_PRIO  GENMASK(15, 13)
+#define NFP_FLOWER_MASK_VLAN_CFI   BIT(12)
+#define NFP_FLOWER_MASK_VLAN_VID   GENMASK(11, 0)
+
 /* Metadata without L2 (1W/4B)
  * 
  *3   2   1
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.h 
b/drivers/net/ethernet/netronome/nfp/flower/main.h
index ba3e14c3ec26..5ba7e5194708 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.h
@@ -66,4 +66,9 @@ struct nfp_fl_payload {
 
 int nfp_flower_setup_tc(struct nfp_app *app, struct net_device *netdev,
u32 handle, __be16 proto, struct tc_to_netdev *tc);
+int nfp_flower_compile_flow_match(struct tc_cls_flower_offload *flow,
+ struct nfp_fl_key_ls *key_ls,
+ struct net_device *netdev,
+ struct nfp_fl_payload *nfp_flow);
+
 #endif
diff --git a/drivers/net/ethernet/netronome/nfp/flower/match.c 
b/drivers/net/ethernet/netronome/nfp/flower/match.c
new file mode 100644
index ..b700daf300e0
--- /dev/null
+++ b/drivers/net/ethernet/netronome/nfp/flower/match.c
@@ -0,0 +1,292 @@
+/*
+ * Copyright (C) 2017 Netronome Systems, Inc.
+ *
+ * This software is dual licensed under the GNU General License Version 2,
+ * June 1991 as shown in the file COPYING in the top-level directory of this
+ * source tree or the BSD 2-Clause License provided below.  You have the
+ * option to license this software under the complete terms of either license.
+ *
+ * The BSD 2-Clause License:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  1. Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ *  2. Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+#include 
+
+#include "cmsg.h"
+#include "main.h"
+
+static void
+nfp_flower_compile_meta_tci(struct nfp_flower_meta_two *frame,
+   struct tc_cls_flower_offload *flow, u8 key_type,
+   bool mask_version)
+{
+   struct flow_dissector_key_vlan *flow_vlan;
+   u16 tmp_tci;
+
+   /* Populate the metadata frame. */
+   frame->nfp_flow_key_layer = key_type;
+   frame->mask_id = ~0;
+
+   if (mask_version) {
+   frame->tci = cpu_to_be16(~0);

[PATCH net-next v2 2/9] nfp: add phys_switch_id support

2017-06-28 Thread Simon Horman
Add phys_switch_id support by allowing lookup of
SWITCHDEV_ATTR_ID_PORT_PARENT_ID via the nfp_port_attr_get
switchdev operation.

This is visible to user-space in the phys_switch_id attribute
of a netdev.

e.g.
cd /sys/devices/pci:00/:00:01.0/:01:00.0
find . -name phys_switch_id | xargs grep .
./net/eth3/phys_switch_id:00154d1300bd
./net/eth4/phys_switch_id:00154d1300bd
./net/eth2/phys_switch_id:00154d1300bd
grep: ./net/eth5/phys_switch_id: Operation not supported

In the above, eth2 and eth3 are representor netdevs for the first and second
physical ports. eth4 is the representor for the PF, and eth5 is the PF netdev.

Signed-off-by: Simon Horman 
Reviewed-by: Jakub Kicinski 
---
 .../net/ethernet/netronome/nfp/nfp_net_common.c|  3 +++
 drivers/net/ethernet/netronome/nfp/nfp_net_repr.c  |  2 ++
 drivers/net/ethernet/netronome/nfp/nfp_port.c  | 28 ++
 drivers/net/ethernet/netronome/nfp/nfp_port.h  |  3 +++
 4 files changed, 36 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 2e728543e840..b5834525c5f0 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -64,6 +64,7 @@
 #include 
 #include 
 
+#include 
 #include 
 
 #include "nfpcore/nfp_nsp.h"
@@ -3703,6 +3704,8 @@ static void nfp_net_netdev_init(struct nfp_net *nn)
netdev->netdev_ops = _net_netdev_ops;
netdev->watchdog_timeo = msecs_to_jiffies(5 * 1000);
 
+   SWITCHDEV_SET_OPS(netdev, _port_switchdev_ops);
+
/* MTU range: 68 - hw-specific max */
netdev->min_mtu = ETH_MIN_MTU;
netdev->max_mtu = nn->max_mtu;
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
index 046b89eb4cf2..bc9108071e5b 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "nfpcore/nfp_cpp.h"
 #include "nfpcore/nfp_nsp.h"
@@ -299,6 +300,7 @@ int nfp_repr_init(struct nfp_app *app, struct net_device 
*netdev,
repr->dst->u.port_info.lower_dev = pf_netdev;
 
netdev->netdev_ops = _repr_netdev_ops;
+   SWITCHDEV_SET_OPS(netdev, _port_switchdev_ops);
 
err = register_netdev(netdev);
if (err)
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_port.c 
b/drivers/net/ethernet/netronome/nfp/nfp_port.c
index 0b44952945d8..c95215eb87c2 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_port.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_port.c
@@ -59,6 +59,34 @@ struct nfp_port *nfp_port_from_netdev(struct net_device 
*netdev)
return NULL;
 }
 
+static int
+nfp_port_attr_get(struct net_device *netdev, struct switchdev_attr *attr)
+{
+   struct nfp_port *port;
+
+   port = nfp_port_from_netdev(netdev);
+   if (!port)
+   return -EOPNOTSUPP;
+
+   switch (attr->id) {
+   case SWITCHDEV_ATTR_ID_PORT_PARENT_ID: {
+   const u8 *serial;
+   /* N.B: attr->u.ppid.id is binary data */
+   attr->u.ppid.id_len = nfp_cpp_serial(port->app->cpp, &serial);
+   memcpy(&attr->u.ppid.id, serial, attr->u.ppid.id_len);
+   break;
+   }
+   default:
+   return -EOPNOTSUPP;
+   }
+
+   return 0;
+}
+
+const struct switchdev_ops nfp_port_switchdev_ops = {
+   .switchdev_port_attr_get= nfp_port_attr_get,
+};
+
 struct nfp_port *
 nfp_port_from_id(struct nfp_pf *pf, enum nfp_port_type type, unsigned int id)
 {
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_port.h 
b/drivers/net/ethernet/netronome/nfp/nfp_port.h
index 57d852a4ca59..de60cacd3362 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_port.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_port.h
@@ -35,6 +35,7 @@
 #define _NFP_PORT_H_
 
 #include 
+#include 
 
 struct net_device;
 struct nfp_app;
@@ -106,6 +107,8 @@ struct nfp_port {
struct list_head port_list;
 };
 
+extern const struct switchdev_ops nfp_port_switchdev_ops;
+
 struct nfp_port *nfp_port_from_netdev(struct net_device *netdev);
 struct nfp_port *
 nfp_port_from_id(struct nfp_pf *pf, enum nfp_port_type type, unsigned int id);
-- 
2.1.4



[PATCH net-next v2 3/9] nfp: provide infrastructure for offloading flower based TC filters

2017-06-28 Thread Simon Horman
From: Pieter Jansen van Vuuren 

Adds a flower based TC offload handler for representor devices; this
is in addition to the bpf based offload handler. The changes in this
patch will be used in a follow-up patch to add tc flower offload to
the NFP.

The flower app enables tc offloads on representors by default.
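
As a usage example (interface name hypothetical), once a representor such
as eth2 is up, a flower filter like:

  tc qdisc add dev eth2 ingress
  tc filter add dev eth2 parent ffff: protocol ip flower \
      dst_ip 10.0.0.2 action drop

is passed to the handler added here via the driver's setup_tc callback.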

Signed-off-by: Pieter Jansen van Vuuren 
Signed-off-by: Simon Horman 
---
 drivers/net/ethernet/netronome/nfp/Makefile|   3 +-
 drivers/net/ethernet/netronome/nfp/flower/main.c   |  20 
 drivers/net/ethernet/netronome/nfp/flower/main.h   |  45 
 .../net/ethernet/netronome/nfp/flower/offload.c| 127 +
 .../net/ethernet/netronome/nfp/nfp_net_common.c|  14 +--
 drivers/net/ethernet/netronome/nfp/nfp_net_repr.c  |   6 +
 drivers/net/ethernet/netronome/nfp/nfp_port.c  |  15 +++
 drivers/net/ethernet/netronome/nfp/nfp_port.h  |   5 +
 8 files changed, 221 insertions(+), 14 deletions(-)
 create mode 100644 drivers/net/ethernet/netronome/nfp/flower/main.h
 create mode 100644 drivers/net/ethernet/netronome/nfp/flower/offload.c

diff --git a/drivers/net/ethernet/netronome/nfp/Makefile 
b/drivers/net/ethernet/netronome/nfp/Makefile
index 43bdbc228969..d7afd2b410fe 100644
--- a/drivers/net/ethernet/netronome/nfp/Makefile
+++ b/drivers/net/ethernet/netronome/nfp/Makefile
@@ -32,7 +32,8 @@ nfp-objs := \
 ifeq ($(CONFIG_NFP_APP_FLOWER),y)
 nfp-objs += \
flower/cmsg.o \
-   flower/main.o
+   flower/main.o \
+   flower/offload.o
 endif
 
 ifeq ($(CONFIG_BPF_SYSCALL),y)
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.c 
b/drivers/net/ethernet/netronome/nfp/flower/main.c
index ab68a8f58862..19f20f819e2f 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.c
@@ -37,7 +37,9 @@
 #include 
 #include 
 
+#include "main.h"
 #include "../nfpcore/nfp_cpp.h"
+#include "../nfpcore/nfp_nffw.h"
 #include "../nfpcore/nfp_nsp.h"
 #include "../nfp_app.h"
 #include "../nfp_main.h"
@@ -46,6 +48,8 @@
 #include "../nfp_port.h"
 #include "./cmsg.h"
 
+#define NFP_FLOWER_ALLOWED_VER 0x00010001UL
+
 /**
  * struct nfp_flower_priv - Flower APP per-vNIC priv data
  * @nn: Pointer to vNIC
@@ -313,6 +317,8 @@ static int nfp_flower_vnic_init(struct nfp_app *app, struct 
nfp_net *nn,
 static int nfp_flower_init(struct nfp_app *app)
 {
const struct nfp_pf *pf = app->pf;
+   u64 version;
+   int err;
 
if (!pf->eth_tbl) {
nfp_warn(app->cpp, "FlowerNIC requires eth table\n");
@@ -329,6 +335,18 @@ static int nfp_flower_init(struct nfp_app *app)
return -EINVAL;
}
 
+   version = nfp_rtsym_read_le(app->pf->rtbl, "hw_flower_version", &err);
+   if (err) {
+   nfp_warn(app->cpp, "FlowerNIC requires hw_flower_version memory 
symbol\n");
+   return err;
+   }
+
+   /* We need to ensure hardware has enough flower capabilities. */
+   if (version != NFP_FLOWER_ALLOWED_VER) {
+   nfp_warn(app->cpp, "FlowerNIC: unsupported firmware version\n");
+   return -EINVAL;
+   }
+
app->priv = kzalloc(sizeof(struct nfp_flower_priv), GFP_KERNEL);
if (!app->priv)
return -ENOMEM;
@@ -367,4 +385,6 @@ const struct nfp_app_type app_flower = {
 
.eswitch_mode_get  = eswitch_mode_get,
.repr_get   = nfp_flower_repr_get,
+
+   .setup_tc   = nfp_flower_setup_tc,
 };
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.h 
b/drivers/net/ethernet/netronome/nfp/flower/main.h
new file mode 100644
index ..c7a19527875e
--- /dev/null
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.h
@@ -0,0 +1,45 @@
+/*
+ * Copyright (C) 2017 Netronome Systems, Inc.
+ *
+ * This software is dual licensed under the GNU General License Version 2,
+ * June 1991 as shown in the file COPYING in the top-level directory of this
+ * source tree or the BSD 2-Clause License provided below.  You have the
+ * option to license this software under the complete terms of either license.
+ *
+ * The BSD 2-Clause License:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  1. Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ *  2. Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT 

[PATCH net-next v2 1/9] net: switchdev: add SET_SWITCHDEV_OPS helper

2017-06-28 Thread Simon Horman
Add a helper that sets the switchdev ops if NET_SWITCHDEV is configured
and does nothing otherwise. This allows for slightly cleaner code in
drivers which use switchdev but do not select NET_SWITCHDEV.
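
For example (sketch; the ops name is illustrative), instead of guarding
each assignment:

  #ifdef CONFIG_NET_SWITCHDEV
          netdev->switchdev_ops = &foo_switchdev_ops;
  #endif

a driver can simply write:

  SWITCHDEV_SET_OPS(netdev, &foo_switchdev_ops);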

Signed-off-by: Simon Horman 
---
 include/net/switchdev.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index c784a6ac6ef1..8ae9e3b6392e 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -217,6 +217,8 @@ void switchdev_port_fwd_mark_set(struct net_device *dev,
 
 bool switchdev_port_same_parent_id(struct net_device *a,
   struct net_device *b);
+
+#define SWITCHDEV_SET_OPS(netdev, ops) ((netdev)->switchdev_ops = (ops))
 #else
 
 static inline void switchdev_deferred_process(void)
@@ -322,6 +324,8 @@ static inline bool switchdev_port_same_parent_id(struct 
net_device *a,
return false;
 }
 
+#define SWITCHDEV_SET_OPS(netdev, ops) do {} while (0)
+
 #endif
 
 #endif /* _LINUX_SWITCHDEV_H_ */
-- 
2.1.4



Re: [PATCH NET V5 2/2] net: hns: Use phy_driver to setup Phy loopback

2017-06-28 Thread Andrew Lunn
> > From your description, it sounds like you can call phy_resume() on a
> > device which is not suspended. 
> Do you mean after calling dev_close, the device is still not suspended?

You only call dev_close() if the device is running. What if somebody
runs the self test on an interface when it has never been opened? It
looks like you will call phy_resume(). But since it has never been
suspended, you could be in trouble.
> 
> In general, suspend is expected to
> > store away state which will be lost when powering down a
> > device. Resume writes that state back into the device after it is
> > powered up. So resuming a device which was never suspended could write
> > bad state into it.
>
> Do you mean phydev->suspended has bad state?

phy_resume() currently does not check the phydev->suspended state.

> > Also, what about if WOL has been set before closing the device?
>
> phy_suspend will return an error.
> 
> int phy_suspend(struct phy_device *phydev)
> {
>   struct phy_driver *phydrv = to_phy_driver(phydev->mdio.dev.driver);
>   struct ethtool_wolinfo wol = { .cmd = ETHTOOL_GWOL };
>   int ret = 0;
> 
>   /* If the device has WOL enabled, we cannot suspend the PHY */
>   phy_ethtool_get_wol(phydev, &wol);
>   if (wol.wolopts)
>   return -EBUSY;
> 
>   if (phydev->drv && phydrv->suspend)
>   ret = phydrv->suspend(phydev);
> 
>   if (ret)
>   return ret;
> 
>   phydev->suspended = true;
> 
>   return ret;
> }

Which means when you call phy_resume() in lb_setup() you are again
resuming a device which is not suspended...

 Andrew


[PATCH] [net-next] net/mlx5e: select CONFIG_MLXFW

2017-06-28 Thread Arnd Bergmann
With the introduction of mlx5 firmware flash support, we get a link
error with CONFIG_MLXFW=m and CONFIG_MLX5_CORE=y:

drivers/net/ethernet/mellanox/mlx5/core/fw.o: In function `mlx5_firmware_flash':
fw.c:(.text+0x9d4): undefined reference to `mlxfw_firmware_flash'

We could have a more elaborate method to force MLX5 to be a loadable
module in this case, but the easiest fix seems to be to always enable
MLXFW as well, like we do for CONFIG_MLXSW_SPECTRUM, which is the other
user of mlxfw_firmware_flash.
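
(The more elaborate method would be something along the lines of the
usual Kconfig idiom, sketched here and not part of this patch:
"depends on MLXFW || MLXFW=n", which restricts MLX5_CORE to =m whenever
MLXFW=m instead of forcing MLXFW on.)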

Fixes: 3ffaabecd1a1 ("net/mlx5e: Support the flash device ethtool callback")
Signed-off-by: Arnd Bergmann 
---
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig 
b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
index cf1ef48bfd8d..09edee060b03 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
@@ -6,6 +6,7 @@ config MLX5_CORE
tristate "Mellanox Technologies ConnectX-4 and Connect-IB core driver"
depends on MAY_USE_DEVLINK
depends on PCI
+   select MLXFW
default n
---help---
  Core driver for low level functionality of the ConnectX-4 and
-- 
2.9.0



Re: [PATCH net-next v4 01/16] bpf: BPF support for sock_ops

2017-06-28 Thread Alexei Starovoitov

On 6/28/17 10:31 AM, Lawrence Brakmo wrote:

+#ifdef CONFIG_BPF
+static inline int tcp_call_bpf(struct sock *sk, bool is_req_sock, int op)
+{
+   struct bpf_sock_ops_kern sock_ops;
+   int ret;
+
+   if (!is_req_sock)
+   sock_owned_by_me(sk);
+
+   memset(&sock_ops, 0, sizeof(sock_ops));
+   sock_ops.sk = sk;
+   sock_ops.is_req_sock = is_req_sock;
+   sock_ops.op = op;
+
+   ret = BPF_CGROUP_RUN_PROG_SOCK_OPS(&sock_ops);
+   if (ret == 0)
+   ret = sock_ops.reply;
+   else
+   ret = -1;
+   return ret;
+}


the switch to cgroup attached only made it really nice and clean.
No global state to worry about.
I haven't looked through the minor patch details, but overall
it all looks good to me. I don't have any architectural concerns.

Acked-by: Alexei Starovoitov 



Re: [PATCH v3 net-next 02/12] bpf/verifier: rework value tracking

2017-06-28 Thread Daniel Borkmann

On 06/28/2017 06:07 PM, Edward Cree wrote:

On 28/06/17 16:15, Daniel Borkmann wrote:

On 06/27/2017 02:56 PM, Edward Cree wrote:

Tracks value alignment by means of tracking known & unknown bits.
Tightens some min/max value checks and fixes a couple of bugs therein.


You mean the one in relation to patch 1/12? Would be good to elaborate
here since otherwise this gets forgotten a few weeks later.

That wasn't the only one; there were also some in the new min/max value
  calculation for ALU ops.  For instance, in subtraction we were taking
  the new bounds as [min-min, max-max] instead of [min-max, max-min].
I can't remember what else there was and there might also have been some
  that I missed but that got incidentally fixed by the rewrite.  But I
  guess I should change "checks" to "checks and updates" in the above?
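
(Concretely, for the subtraction example above: if A is in [1, 3] and B
  is in [0, 2], then A - B can be anywhere in [1 - 2, 3 - 0] = [-1, 3],
  whereas the old [min-min, max-max] computation would have claimed [1, 1].)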


Ok. Would be good though to have them all covered in the selftests
part of your series if possible, so we can make sure to keep track
of these cases.


Could you also document all the changes that verifier will then start
allowing for after the patch?

Maybe not the changes, because the old verifier had a lot of special
  cases, but I could, and probably should, document the new behaviour
  (maybe in Documentation/networking/filter.txt, that already has a bit
  of description of the verifier).


Yeah, that would definitely help; filter.txt should be fine.


[...]

   /* check whether memory at (regno + off) is accessible for t = (read | write)
@@ -899,52 +965,79 @@ static int check_mem_access(struct bpf_verifier_env *env, 
int insn_idx, u32 regn
   struct bpf_reg_state *reg = >regs[regno];
   int size, err = 0;

-if (reg->type == PTR_TO_STACK)
-off += reg->imm;
-
   size = bpf_size_to_bytes(bpf_size);
   if (size < 0)
   return size;


[...]

-if (reg->type == PTR_TO_MAP_VALUE ||
-reg->type == PTR_TO_MAP_VALUE_ADJ) {
+/* for access checks, reg->off is just part of off */
+off += reg->off;


Could you elaborate on why removing the reg->type == PTR_TO_STACK?

Previously bpf_reg_state had a member 'imm' which, for PTR_TO_STACK, was
  a fixed offset, so we had to add it in to the offset.  Now we instead
  have reg->off and it's generic to all pointerish types, so we don't need
  special handling of PTR_TO_STACK here.

Also in context of below PTR_TO_CTX.

[...]

   } else if (reg->type == PTR_TO_CTX) {
-enum bpf_reg_type reg_type = UNKNOWN_VALUE;
+enum bpf_reg_type reg_type = SCALAR_VALUE;

   if (t == BPF_WRITE && value_regno >= 0 &&
   is_pointer_value(env, value_regno)) {
   verbose("R%d leaks addr into ctx\n", value_regno);
   return -EACCES;
   }
+/* ctx accesses must be at a fixed offset, so that we can
+ * determine what type of data were returned.
+ */
+if (!tnum_is_const(reg->var_off)) {
+char tn_buf[48];
+
+tnum_strn(tn_buf, sizeof(tn_buf), reg->var_off);
+verbose("variable ctx access var_off=%s off=%d size=%d",
+tn_buf, off, size);
+return -EACCES;
+}
+off += reg->var_off.value;


... f.e. in PTR_TO_CTX case the only access that is currently
allowed is LDX/STX with fixed offset from insn->off, which is
passed as off param to check_mem_access(). Can you elaborate on
off += reg->var_off.value? Meaning we make this more dynamic
as long as access is known const?

So, I can't actually figure out how to construct a pointer with a known
  variable offset, but future changes to the verifier (like learning from
  comparing two pointers with the same base) could make it possible.  The
  situation we're handling here is where our register holds ctx + x,
  where x is also known to be some constant value k, and currently I don't
  know if that's possible except for the trivial case of k==0, and the edge
  case where k is too big to fit in the s32 reg->off (in which case the
  check_ctx_access will presumably reject it).
Stepping back a bit, each register holding a pointer type has two offsets,
  reg->off and reg->var_off, and the latter is a tnum representing
  knowledge about a value that's not necessarily exactly known.  But
  tnum_is_const checks that it _is_ exactly known.
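
(As a concrete illustration: a tnum is a (value, mask) pair in which the
  mask bits are the unknown bits, so e.g. value=0b100, mask=0b011 stands
  for any of 4, 5, 6 or 7, and tnum_is_const() holds only when mask == 0.)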


Right, I was reviewing this with the thought in mind where we could
run into a pruning situation where in the first path we either add
a scalar or offset to the ctx ptr that is then spilled to stack, later
filled to a reg again with eventual successful exit. And the second path
would prune on the spilled reg, but even if scalar, we require that it's
a _known_ const whereas reading back from stack marks it unknown, so that
is not possible. So all is fine; including your below example since it
all has to be a _known_ scalar.


There is another case that we allow now through the reg->off handling:
  adding a constant to a pointer and then dereferencing it.
So, with r1=ctx, instead of r2 = 

Re: [PATCH iproute2 3/5] rdma: Add device capability parsing

2017-06-28 Thread Leon Romanovsky
On Wed, Jun 28, 2017 at 10:11:12AM -0600, Jason Gunthorpe wrote:
> On Tue, Jun 27, 2017 at 03:18:59PM -0700, Stephen Hemminger wrote:
> > On Tue, 27 Jun 2017 20:46:15 +0300
> > Leon Romanovsky  wrote:
> >
> > > On Tue, Jun 27, 2017 at 11:37:35AM -0600, Jason Gunthorpe wrote:
> > > > On Tue, Jun 27, 2017 at 08:33:01PM +0300, Leon Romanovsky wrote:
> > > >
> > > > > My initial plan was to put all parsers under their respective names, 
> > > > > in
> > > > > the similar way as I did for caps: $ rdma dev show mlx5_4 caps
> > > >
> > > > I think you should have a useful summary display similar to 'ip a' and
> > > > other commands.
> > > >
> > > > guid(s), subnet prefix or default gid for IB, lid/lmc, link state,
> > > > speed, mtu, pkeys protocol(s)
> > >
> > > It will, but before I would like to see this tool be a part of
> > > iproute2, so other people will be able to extend it in addition
> > > to me.
> > >
> > > Are you fine with the proposed code?
> > >
> >
> > Output formats need to be nailed down. The output of iproute2 commands is 
> > almost
> > like an ABI. Users build scripts to parse it (whether that is a great idea 
> > or not
> > is debatable, it mostly shows the weakness in programmatic APIs).
> > Therefore fully
> > changing output formats in later revisions is likely to get users upset.
>
> It would be nice to see an example of what the completed command
> should output to make judgements on the format.. Going bit by bit
> doesn't really give a full picture, IHO.

Bit-by-bit expansion allows easy control of what is needed. Mostly,
those full examples are nothing close to a real use case.

>
> Jason


signature.asc
Description: PGP signature


Re: mwifiex: fix spelling mistake: "secuirty" -> "security"

2017-06-28 Thread Kalle Valo
Colin Ian King  wrote:

> From: Colin Ian King 
> 
> Trivial fix to spelling mistake in mwifiex_dbg message
> 
> Signed-off-by: Colin Ian King 

Patch applied to wireless-drivers-next.git, thanks.

3334c28ec56c mwifiex: fix spelling mistake: "secuirty" -> "security"

-- 
https://patchwork.kernel.org/patch/9814767/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches



[PATCH net-next v1 13/14] amd-xgbe: Simplify the burst length settings

2017-06-28 Thread Tom Lendacky
Currently the driver hardcodes the PBLx8 setting.  Remove the need for
specifying the PBLx8 setting and calculate it automatically based on the
specified PBL value. Since the PBLx8 setting applies to both Tx and Rx,
use the same PBL value for both of them.

Also, the driver currently uses a single bit to set the AXI master burst
length. Change to use the full bit field range and set the burst length
based on the specified value.
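
For example, with the default PBL of 128 (DMA_PBL_128) set below, the
value is greater than 32, so PBLx8 is enabled and the Tx/Rx PBL register
fields are written with 128 >> 3 = 16, i.e. the same effective burst
length of 16 x 8 = 128 on both Tx and Rx.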

Signed-off-by: Tom Lendacky 
---
 drivers/net/ethernet/amd/xgbe/xgbe-common.h |   11 
 drivers/net/ethernet/amd/xgbe/xgbe-dev.c|   67 +++
 drivers/net/ethernet/amd/xgbe/xgbe-main.c   |5 +-
 drivers/net/ethernet/amd/xgbe/xgbe.h|   12 +
 4 files changed, 31 insertions(+), 64 deletions(-)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-common.h 
b/drivers/net/ethernet/amd/xgbe/xgbe-common.h
index dc09883..6b5c72d 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-common.h
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-common.h
@@ -137,12 +137,19 @@
 #define DMA_MR_SWR_WIDTH   1
 #define DMA_SBMR_EAME_INDEX11
 #define DMA_SBMR_EAME_WIDTH1
-#define DMA_SBMR_BLEN_256_INDEX7
-#define DMA_SBMR_BLEN_256_WIDTH1
+#define DMA_SBMR_BLEN_INDEX1
+#define DMA_SBMR_BLEN_WIDTH7
 #define DMA_SBMR_UNDEF_INDEX   0
 #define DMA_SBMR_UNDEF_WIDTH   1
 
 /* DMA register values */
+#define DMA_SBMR_BLEN_256  256
+#define DMA_SBMR_BLEN_128  128
+#define DMA_SBMR_BLEN_64   64
+#define DMA_SBMR_BLEN_32   32
+#define DMA_SBMR_BLEN_16   16
+#define DMA_SBMR_BLEN_88
+#define DMA_SBMR_BLEN_44
 #define DMA_DSR_RPS_WIDTH  4
 #define DMA_DSR_TPS_WIDTH  4
 #define DMA_DSR_Q_WIDTH(DMA_DSR_RPS_WIDTH + 
DMA_DSR_TPS_WIDTH)
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-dev.c 
b/drivers/net/ethernet/amd/xgbe/xgbe-dev.c
index 98da249..a51ece5 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-dev.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-dev.c
@@ -174,52 +174,30 @@ static unsigned int xgbe_riwt_to_usec(struct 
xgbe_prv_data *pdata,
return ret;
 }
 
-static int xgbe_config_pblx8(struct xgbe_prv_data *pdata)
+static int xgbe_config_pbl_val(struct xgbe_prv_data *pdata)
 {
+   unsigned int pblx8, pbl;
unsigned int i;
 
-   for (i = 0; i < pdata->channel_count; i++)
-   XGMAC_DMA_IOWRITE_BITS(pdata->channel[i], DMA_CH_CR, PBLX8,
-  pdata->pblx8);
-
-   return 0;
-}
-
-static int xgbe_get_tx_pbl_val(struct xgbe_prv_data *pdata)
-{
-   return XGMAC_DMA_IOREAD_BITS(pdata->channel[0], DMA_CH_TCR, PBL);
-}
-
-static int xgbe_config_tx_pbl_val(struct xgbe_prv_data *pdata)
-{
-   unsigned int i;
-
-   for (i = 0; i < pdata->channel_count; i++) {
-   if (!pdata->channel[i]->tx_ring)
-   break;
+   pblx8 = DMA_PBL_X8_DISABLE;
+   pbl = pdata->pbl;
 
-   XGMAC_DMA_IOWRITE_BITS(pdata->channel[i], DMA_CH_TCR, PBL,
-  pdata->tx_pbl);
+   if (pdata->pbl > 32) {
+   pblx8 = DMA_PBL_X8_ENABLE;
+   pbl >>= 3;
}
 
-   return 0;
-}
-
-static int xgbe_get_rx_pbl_val(struct xgbe_prv_data *pdata)
-{
-   return XGMAC_DMA_IOREAD_BITS(pdata->channel[0], DMA_CH_RCR, PBL);
-}
-
-static int xgbe_config_rx_pbl_val(struct xgbe_prv_data *pdata)
-{
-   unsigned int i;
-
for (i = 0; i < pdata->channel_count; i++) {
-   if (!pdata->channel[i]->rx_ring)
-   break;
+   XGMAC_DMA_IOWRITE_BITS(pdata->channel[i], DMA_CH_CR, PBLX8,
+  pblx8);
+
+   if (pdata->channel[i]->tx_ring)
+   XGMAC_DMA_IOWRITE_BITS(pdata->channel[i], DMA_CH_TCR,
+  PBL, pbl);
 
-   XGMAC_DMA_IOWRITE_BITS(pdata->channel[i], DMA_CH_RCR, PBL,
-  pdata->rx_pbl);
+   if (pdata->channel[i]->rx_ring)
+   XGMAC_DMA_IOWRITE_BITS(pdata->channel[i], DMA_CH_RCR,
+  PBL, pbl);
}
 
return 0;
@@ -2141,7 +2119,7 @@ static void xgbe_config_dma_bus(struct xgbe_prv_data 
*pdata)
 
/* Set the System Bus mode */
XGMAC_IOWRITE_BITS(pdata, DMA_SBMR, UNDEF, 1);
-   XGMAC_IOWRITE_BITS(pdata, DMA_SBMR, BLEN_256, 1);
+   XGMAC_IOWRITE_BITS(pdata, DMA_SBMR, BLEN, pdata->blen >> 2);
 }
 
 static void xgbe_config_dma_cache(struct xgbe_prv_data *pdata)
@@ -3381,9 +3359,7 @@ static int xgbe_init(struct xgbe_prv_data *pdata)
xgbe_config_dma_bus(pdata);
xgbe_config_dma_cache(pdata);
xgbe_config_osp_mode(pdata);
-   

[PATCH net-next v1 14/14] amd-xgbe: Adjust register settings to improve performance

2017-06-28 Thread Tom Lendacky
Add support for changing some general performance settings and for
providing device-specific performance settings based on the device that
is probed.

This includes:

- Setting the maximum read/write outstanding request limit
- Reducing the AXI interface burst length size
- Selectively setting the Tx and Rx descriptor pre-fetch threshold
- Selectively setting additional cache coherency controls

Tested and verified on all versions of the hardware.

Signed-off-by: Tom Lendacky 
---
 drivers/net/ethernet/amd/xgbe/xgbe-common.h |   13 +
 drivers/net/ethernet/amd/xgbe/xgbe-dev.c|   26 +++---
 drivers/net/ethernet/amd/xgbe/xgbe-main.c   |5 -
 drivers/net/ethernet/amd/xgbe/xgbe-pci.c|9 +++--
 drivers/net/ethernet/amd/xgbe/xgbe.h|   11 +++
 5 files changed, 58 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-common.h 
b/drivers/net/ethernet/amd/xgbe/xgbe-common.h
index 6b5c72d..9795419 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-common.h
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-common.h
@@ -123,8 +123,11 @@
 #define DMA_ISR0x3008
 #define DMA_AXIARCR0x3010
 #define DMA_AXIAWCR0x3018
+#define DMA_AXIAWARCR  0x301c
 #define DMA_DSR0   0x3020
 #define DMA_DSR1   0x3024
+#define DMA_TXEDMACR   0x3040
+#define DMA_RXEDMACR   0x3044
 
 /* DMA register entry bit positions and sizes */
 #define DMA_ISR_MACIS_INDEX17
@@ -135,12 +138,22 @@
 #define DMA_MR_INTM_WIDTH  2
 #define DMA_MR_SWR_INDEX   0
 #define DMA_MR_SWR_WIDTH   1
+#define DMA_RXEDMACR_RDPS_INDEX0
+#define DMA_RXEDMACR_RDPS_WIDTH3
+#define DMA_SBMR_AAL_INDEX 12
+#define DMA_SBMR_AAL_WIDTH 1
 #define DMA_SBMR_EAME_INDEX11
 #define DMA_SBMR_EAME_WIDTH1
 #define DMA_SBMR_BLEN_INDEX1
 #define DMA_SBMR_BLEN_WIDTH7
+#define DMA_SBMR_RD_OSR_LMT_INDEX  16
+#define DMA_SBMR_RD_OSR_LMT_WIDTH  6
 #define DMA_SBMR_UNDEF_INDEX   0
 #define DMA_SBMR_UNDEF_WIDTH   1
+#define DMA_SBMR_WR_OSR_LMT_INDEX  24
+#define DMA_SBMR_WR_OSR_LMT_WIDTH  6
+#define DMA_TXEDMACR_TDPS_INDEX0
+#define DMA_TXEDMACR_TDPS_WIDTH3
 
 /* DMA register values */
 #define DMA_SBMR_BLEN_256  256
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-dev.c 
b/drivers/net/ethernet/amd/xgbe/xgbe-dev.c
index a51ece5..06f953e 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-dev.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-dev.c
@@ -2114,18 +2114,38 @@ static int xgbe_flush_tx_queues(struct xgbe_prv_data 
*pdata)
 
 static void xgbe_config_dma_bus(struct xgbe_prv_data *pdata)
 {
+   unsigned int sbmr;
+
+   sbmr = XGMAC_IOREAD(pdata, DMA_SBMR);
+
/* Set enhanced addressing mode */
-   XGMAC_IOWRITE_BITS(pdata, DMA_SBMR, EAME, 1);
+   XGMAC_SET_BITS(sbmr, DMA_SBMR, EAME, 1);
 
/* Set the System Bus mode */
-   XGMAC_IOWRITE_BITS(pdata, DMA_SBMR, UNDEF, 1);
-   XGMAC_IOWRITE_BITS(pdata, DMA_SBMR, BLEN, pdata->blen >> 2);
+   XGMAC_SET_BITS(sbmr, DMA_SBMR, UNDEF, 1);
+   XGMAC_SET_BITS(sbmr, DMA_SBMR, BLEN, pdata->blen >> 2);
+   XGMAC_SET_BITS(sbmr, DMA_SBMR, AAL, pdata->aal);
+   XGMAC_SET_BITS(sbmr, DMA_SBMR, RD_OSR_LMT, pdata->rd_osr_limit - 1);
+   XGMAC_SET_BITS(sbmr, DMA_SBMR, WR_OSR_LMT, pdata->wr_osr_limit - 1);
+
+   XGMAC_IOWRITE(pdata, DMA_SBMR, sbmr);
+
+   /* Set descriptor fetching threshold */
+   if (pdata->vdata->tx_desc_prefetch)
+   XGMAC_IOWRITE_BITS(pdata, DMA_TXEDMACR, TDPS,
+  pdata->vdata->tx_desc_prefetch);
+
+   if (pdata->vdata->rx_desc_prefetch)
+   XGMAC_IOWRITE_BITS(pdata, DMA_RXEDMACR, RDPS,
+  pdata->vdata->rx_desc_prefetch);
 }
 
 static void xgbe_config_dma_cache(struct xgbe_prv_data *pdata)
 {
XGMAC_IOWRITE(pdata, DMA_AXIARCR, pdata->arcr);
XGMAC_IOWRITE(pdata, DMA_AXIAWCR, pdata->awcr);
+   if (pdata->awarcr)
+   XGMAC_IOWRITE(pdata, DMA_AXIAWARCR, pdata->awarcr);
 }
 
 static void xgbe_config_mtl_mode(struct xgbe_prv_data *pdata)
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-main.c 
b/drivers/net/ethernet/amd/xgbe/xgbe-main.c
index 8eec9f5..500147d 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-main.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-main.c
@@ -140,8 +140,11 @@ static void xgbe_default_config(struct xgbe_prv_data 
*pdata)
 {
DBGPR("-->xgbe_default_config\n");
 
-   pdata->blen = DMA_SBMR_BLEN_256;
+   pdata->blen = DMA_SBMR_BLEN_64;
pdata->pbl = DMA_PBL_128;
+   pdata->aal = 1;
+   pdata->rd_osr_limit = 8;
+   pdata->wr_osr_limit = 8;

[PATCH net-next v1 11/14] amd-xgbe: Add NUMA affinity support for IRQ hints

2017-06-28 Thread Tom Lendacky
For IRQ affinity, set the affinity hints for the IRQs to be (initially) on
the processors corresponding to the NUMA node of the device.

Signed-off-by: Tom Lendacky 
---
 drivers/net/ethernet/amd/xgbe/xgbe-drv.c |   18 +++---
 drivers/net/ethernet/amd/xgbe/xgbe.h |2 ++
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c 
b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
index 43b84ff..ecef3ee 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
@@ -192,12 +192,17 @@ static int xgbe_alloc_channels(struct xgbe_prv_data 
*pdata)
struct xgbe_channel *channel;
struct xgbe_ring *ring;
unsigned int count, i;
+   unsigned int cpu;
int node;
 
-   node = dev_to_node(pdata->dev);
-
count = max_t(unsigned int, pdata->tx_ring_count, pdata->rx_ring_count);
for (i = 0; i < count; i++) {
+   /* Attempt to use a CPU on the node the device is on */
+   cpu = cpumask_local_spread(i, dev_to_node(pdata->dev));
+
+   /* Set the allocation node based on the returned CPU */
+   node = cpu_to_node(cpu);
+
channel = xgbe_alloc_node(sizeof(*channel), node);
if (!channel)
goto err_mem;
@@ -209,6 +214,7 @@ static int xgbe_alloc_channels(struct xgbe_prv_data *pdata)
channel->dma_regs = pdata->xgmac_regs + DMA_CH_BASE +
(DMA_CH_INC * i);
channel->node = node;
+   cpumask_set_cpu(cpu, >affinity_mask);
 
if (pdata->per_channel_irq)
channel->dma_irq = pdata->channel_irq[i];
@@ -236,7 +242,7 @@ static int xgbe_alloc_channels(struct xgbe_prv_data *pdata)
}
 
netif_dbg(pdata, drv, pdata->netdev,
- "%s: node=%d\n", channel->name, node);
+ "%s: cpu=%u, node=%d\n", channel->name, cpu, node);
 
netif_dbg(pdata, drv, pdata->netdev,
  "%s: dma_regs=%p, dma_irq=%d, tx=%p, rx=%p\n",
@@ -916,6 +922,9 @@ static int xgbe_request_irqs(struct xgbe_prv_data *pdata)
 channel->dma_irq);
goto err_dma_irq;
}
+
+   irq_set_affinity_hint(channel->dma_irq,
+ >affinity_mask);
}
 
return 0;
@@ -925,6 +934,7 @@ static int xgbe_request_irqs(struct xgbe_prv_data *pdata)
for (i--; i < pdata->channel_count; i--) {
channel = pdata->channel[i];
 
+   irq_set_affinity_hint(channel->dma_irq, NULL);
devm_free_irq(pdata->dev, channel->dma_irq, channel);
}
 
@@ -952,6 +962,8 @@ static void xgbe_free_irqs(struct xgbe_prv_data *pdata)
 
for (i = 0; i < pdata->channel_count; i++) {
channel = pdata->channel[i];
+
+   irq_set_affinity_hint(channel->dma_irq, NULL);
devm_free_irq(pdata->dev, channel->dma_irq, channel);
}
 }
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe.h 
b/drivers/net/ethernet/amd/xgbe/xgbe.h
index ac3b558..7b50469 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe.h
+++ b/drivers/net/ethernet/amd/xgbe/xgbe.h
@@ -128,6 +128,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define XGBE_DRV_NAME  "amd-xgbe"
 #define XGBE_DRV_VERSION   "1.0.3"
@@ -465,6 +466,7 @@ struct xgbe_channel {
struct xgbe_ring *rx_ring;
 
int node;
+   cpumask_t affinity_mask;
 } cacheline_aligned;
 
 enum xgbe_state {
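
A minimal sketch of the affinity pattern the patch above applies, using
illustrative foo_ names rather than the driver's own: pick a CPU local to
the device's NUMA node for each channel, publish it as the IRQ affinity
hint, and clear the hint again before the IRQ is freed.

#include <linux/cpumask.h>
#include <linux/interrupt.h>

/* Hypothetical helper: channel index 'i', device NUMA node 'node' */
static int foo_set_channel_affinity(unsigned int irq, unsigned int i,
				    int node, cpumask_t *mask)
{
	unsigned int cpu;

	/* Pick the i-th CPU local to 'node', wrapping to other nodes if needed */
	cpu = cpumask_local_spread(i, node);

	cpumask_clear(mask);
	cpumask_set_cpu(cpu, mask);

	/* Tell the IRQ core (and irqbalance) where this IRQ would like to live */
	return irq_set_affinity_hint(irq, mask);
}

static void foo_clear_channel_affinity(unsigned int irq)
{
	/* The hint must be cleared before the IRQ is freed */
	irq_set_affinity_hint(irq, NULL);
}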



[PATCH net-next v1 12/14] amd-xgbe: Prepare for more fine grained cache coherency controls

2017-06-28 Thread Tom Lendacky
In preparation for setting fine-grained read and write DMA cache coherency
controls, allow specific values to be used to set the cache coherency
registers.

Signed-off-by: Tom Lendacky 
---
 drivers/net/ethernet/amd/xgbe/xgbe-common.h   |   28 -
 drivers/net/ethernet/amd/xgbe/xgbe-dev.c  |   23 ++---
 drivers/net/ethernet/amd/xgbe/xgbe-pci.c  |5 ++--
 drivers/net/ethernet/amd/xgbe/xgbe-platform.c |   10 -
 drivers/net/ethernet/amd/xgbe/xgbe.h  |   15 +
 5 files changed, 14 insertions(+), 67 deletions(-)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-common.h 
b/drivers/net/ethernet/amd/xgbe/xgbe-common.h
index e7b6804..dc09883 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-common.h
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-common.h
@@ -127,34 +127,6 @@
 #define DMA_DSR1   0x3024
 
 /* DMA register entry bit positions and sizes */
-#define DMA_AXIARCR_DRC_INDEX  0
-#define DMA_AXIARCR_DRC_WIDTH  4
-#define DMA_AXIARCR_DRD_INDEX  4
-#define DMA_AXIARCR_DRD_WIDTH  2
-#define DMA_AXIARCR_TEC_INDEX  8
-#define DMA_AXIARCR_TEC_WIDTH  4
-#define DMA_AXIARCR_TED_INDEX  12
-#define DMA_AXIARCR_TED_WIDTH  2
-#define DMA_AXIARCR_THC_INDEX  16
-#define DMA_AXIARCR_THC_WIDTH  4
-#define DMA_AXIARCR_THD_INDEX  20
-#define DMA_AXIARCR_THD_WIDTH  2
-#define DMA_AXIAWCR_DWC_INDEX  0
-#define DMA_AXIAWCR_DWC_WIDTH  4
-#define DMA_AXIAWCR_DWD_INDEX  4
-#define DMA_AXIAWCR_DWD_WIDTH  2
-#define DMA_AXIAWCR_RPC_INDEX  8
-#define DMA_AXIAWCR_RPC_WIDTH  4
-#define DMA_AXIAWCR_RPD_INDEX  12
-#define DMA_AXIAWCR_RPD_WIDTH  2
-#define DMA_AXIAWCR_RHC_INDEX  16
-#define DMA_AXIAWCR_RHC_WIDTH  4
-#define DMA_AXIAWCR_RHD_INDEX  20
-#define DMA_AXIAWCR_RHD_WIDTH  2
-#define DMA_AXIAWCR_TDC_INDEX  24
-#define DMA_AXIAWCR_TDC_WIDTH  4
-#define DMA_AXIAWCR_TDD_INDEX  28
-#define DMA_AXIAWCR_TDD_WIDTH  2
 #define DMA_ISR_MACIS_INDEX17
 #define DMA_ISR_MACIS_WIDTH1
 #define DMA_ISR_MTLIS_INDEX16
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-dev.c 
b/drivers/net/ethernet/amd/xgbe/xgbe-dev.c
index b05393f..98da249 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-dev.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-dev.c
@@ -2146,27 +2146,8 @@ static void xgbe_config_dma_bus(struct xgbe_prv_data 
*pdata)
 
 static void xgbe_config_dma_cache(struct xgbe_prv_data *pdata)
 {
-   unsigned int arcache, awcache;
-
-   arcache = 0;
-   XGMAC_SET_BITS(arcache, DMA_AXIARCR, DRC, pdata->arcache);
-   XGMAC_SET_BITS(arcache, DMA_AXIARCR, DRD, pdata->axdomain);
-   XGMAC_SET_BITS(arcache, DMA_AXIARCR, TEC, pdata->arcache);
-   XGMAC_SET_BITS(arcache, DMA_AXIARCR, TED, pdata->axdomain);
-   XGMAC_SET_BITS(arcache, DMA_AXIARCR, THC, pdata->arcache);
-   XGMAC_SET_BITS(arcache, DMA_AXIARCR, THD, pdata->axdomain);
-   XGMAC_IOWRITE(pdata, DMA_AXIARCR, arcache);
-
-   awcache = 0;
-   XGMAC_SET_BITS(awcache, DMA_AXIAWCR, DWC, pdata->awcache);
-   XGMAC_SET_BITS(awcache, DMA_AXIAWCR, DWD, pdata->axdomain);
-   XGMAC_SET_BITS(awcache, DMA_AXIAWCR, RPC, pdata->awcache);
-   XGMAC_SET_BITS(awcache, DMA_AXIAWCR, RPD, pdata->axdomain);
-   XGMAC_SET_BITS(awcache, DMA_AXIAWCR, RHC, pdata->awcache);
-   XGMAC_SET_BITS(awcache, DMA_AXIAWCR, RHD, pdata->axdomain);
-   XGMAC_SET_BITS(awcache, DMA_AXIAWCR, TDC, pdata->awcache);
-   XGMAC_SET_BITS(awcache, DMA_AXIAWCR, TDD, pdata->axdomain);
-   XGMAC_IOWRITE(pdata, DMA_AXIAWCR, awcache);
+   XGMAC_IOWRITE(pdata, DMA_AXIARCR, pdata->arcr);
+   XGMAC_IOWRITE(pdata, DMA_AXIAWCR, pdata->awcr);
 }
 
 static void xgbe_config_mtl_mode(struct xgbe_prv_data *pdata)
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-pci.c 
b/drivers/net/ethernet/amd/xgbe/xgbe-pci.c
index f0c2e88..1e73768 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-pci.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-pci.c
@@ -327,9 +327,8 @@ static int xgbe_pci_probe(struct pci_dev *pdev, const 
struct pci_device_id *id)
 
/* Set the DMA coherency values */
pdata->coherent = 1;
-   pdata->axdomain = XGBE_DMA_OS_AXDOMAIN;
-   pdata->arcache = XGBE_DMA_OS_ARCACHE;
-   pdata->awcache = XGBE_DMA_OS_AWCACHE;
+   pdata->arcr = XGBE_DMA_OS_ARCR;
+   pdata->awcr = XGBE_DMA_OS_AWCR;
 
/* Set the maximum channels and queues */
reg = XP_IOREAD(pdata, XP_PROP_1);
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-platform.c 
b/drivers/net/ethernet/amd/xgbe/xgbe-platform.c
index 84d4c51..d0f3dfb 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-platform.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-platform.c
@@ -448,13 +448,11 @@ static int xgbe_platform_probe(struct platform_device 

[PATCH net-next v1 10/14] amd-xgbe: Add NUMA affinity support for memory allocations

2017-06-28 Thread Tom Lendacky
Add support to perform memory allocations on the node of the device. The
original allocation of the ring structure and Tx/Rx queues allocated all
of the memory at once and then carved it up for each channel and queue.
To best ensure that we get as much memory from the NUMA node as we can,
break the channel and ring allocations into individual allocations.

Signed-off-by: Tom Lendacky 
---
 drivers/net/ethernet/amd/xgbe/xgbe-desc.c |   94 ++-
 drivers/net/ethernet/amd/xgbe/xgbe-dev.c  |  135 +-
 drivers/net/ethernet/amd/xgbe/xgbe-drv.c  |  177 -
 drivers/net/ethernet/amd/xgbe/xgbe.h  |5 +
 4 files changed, 217 insertions(+), 194 deletions(-)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-desc.c 
b/drivers/net/ethernet/amd/xgbe/xgbe-desc.c
index 0a98c36..45d9230 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-desc.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-desc.c
@@ -176,8 +176,8 @@ static void xgbe_free_ring_resources(struct xgbe_prv_data 
*pdata)
 
DBGPR("-->xgbe_free_ring_resources\n");
 
-   channel = pdata->channel;
-   for (i = 0; i < pdata->channel_count; i++, channel++) {
+   for (i = 0; i < pdata->channel_count; i++) {
+   channel = pdata->channel[i];
xgbe_free_ring(pdata, channel->tx_ring);
xgbe_free_ring(pdata, channel->rx_ring);
}
@@ -185,34 +185,60 @@ static void xgbe_free_ring_resources(struct xgbe_prv_data 
*pdata)
DBGPR("<--xgbe_free_ring_resources\n");
 }
 
+static void *xgbe_alloc_node(size_t size, int node)
+{
+   void *mem;
+
+   mem = kzalloc_node(size, GFP_KERNEL, node);
+   if (!mem)
+   mem = kzalloc(size, GFP_KERNEL);
+
+   return mem;
+}
+
+static void *xgbe_dma_alloc_node(struct device *dev, size_t size,
+dma_addr_t *dma, int node)
+{
+   void *mem;
+   int cur_node = dev_to_node(dev);
+
+   set_dev_node(dev, node);
+   mem = dma_alloc_coherent(dev, size, dma, GFP_KERNEL);
+   set_dev_node(dev, cur_node);
+
+   if (!mem)
+   mem = dma_alloc_coherent(dev, size, dma, GFP_KERNEL);
+
+   return mem;
+}
+
 static int xgbe_init_ring(struct xgbe_prv_data *pdata,
  struct xgbe_ring *ring, unsigned int rdesc_count)
 {
-   DBGPR("-->xgbe_init_ring\n");
+   size_t size;
 
if (!ring)
return 0;
 
/* Descriptors */
+   size = rdesc_count * sizeof(struct xgbe_ring_desc);
+
ring->rdesc_count = rdesc_count;
-   ring->rdesc = dma_alloc_coherent(pdata->dev,
-(sizeof(struct xgbe_ring_desc) *
- rdesc_count), >rdesc_dma,
-GFP_KERNEL);
+   ring->rdesc = xgbe_dma_alloc_node(pdata->dev, size, >rdesc_dma,
+ ring->node);
if (!ring->rdesc)
return -ENOMEM;
 
/* Descriptor information */
-   ring->rdata = kcalloc(rdesc_count, sizeof(struct xgbe_ring_data),
- GFP_KERNEL);
+   size = rdesc_count * sizeof(struct xgbe_ring_data);
+
+   ring->rdata = xgbe_alloc_node(size, ring->node);
if (!ring->rdata)
return -ENOMEM;
 
netif_dbg(pdata, drv, pdata->netdev,
- "rdesc=%p, rdesc_dma=%pad, rdata=%p\n",
- ring->rdesc, >rdesc_dma, ring->rdata);
-
-   DBGPR("<--xgbe_init_ring\n");
+ "rdesc=%p, rdesc_dma=%pad, rdata=%p, node=%d\n",
+ ring->rdesc, >rdesc_dma, ring->rdata, ring->node);
 
return 0;
 }
@@ -223,10 +249,8 @@ static int xgbe_alloc_ring_resources(struct xgbe_prv_data 
*pdata)
unsigned int i;
int ret;
 
-   DBGPR("-->xgbe_alloc_ring_resources\n");
-
-   channel = pdata->channel;
-   for (i = 0; i < pdata->channel_count; i++, channel++) {
+   for (i = 0; i < pdata->channel_count; i++) {
+   channel = pdata->channel[i];
netif_dbg(pdata, drv, pdata->netdev, "%s - Tx ring:\n",
  channel->name);
 
@@ -250,8 +274,6 @@ static int xgbe_alloc_ring_resources(struct xgbe_prv_data 
*pdata)
}
}
 
-   DBGPR("<--xgbe_alloc_ring_resources\n");
-
return 0;
 
 err_ring:
@@ -261,21 +283,33 @@ static int xgbe_alloc_ring_resources(struct xgbe_prv_data 
*pdata)
 }
 
 static int xgbe_alloc_pages(struct xgbe_prv_data *pdata,
-   struct xgbe_page_alloc *pa, gfp_t gfp, int order)
+   struct xgbe_page_alloc *pa, int alloc_order,
+   int node)
 {
struct page *pages = NULL;
dma_addr_t pages_dma;
-   int ret;
+   gfp_t gfp;
+   int order, ret;
+
+again:
+   order = alloc_order;
 
/* Try to obtain pages, decreasing order if necessary 

[PATCH net-next v1 09/14] amd-xgbe: Re-issue interrupt if interrupt status not cleared

2017-06-28 Thread Tom Lendacky
Some of the device interrupts should function as level interrupts. For
some hardware configurations this requires setting some control bits
so that the interrupt is reissued if the interrupt status has not been
cleared.

Additionally, when using MSI or MSI-X interrupts, run the interrupt
service routine as a tasklet so that the re-issuance of the interrupt
is handled properly.

Signed-off-by: Tom Lendacky 
---
 drivers/net/ethernet/amd/xgbe/xgbe-common.h |1 +
 drivers/net/ethernet/amd/xgbe/xgbe-drv.c|   53 +++
 drivers/net/ethernet/amd/xgbe/xgbe-i2c.c|   30 +--
 drivers/net/ethernet/amd/xgbe/xgbe-mdio.c   |   33 +++--
 drivers/net/ethernet/amd/xgbe/xgbe-pci.c|4 ++
 drivers/net/ethernet/amd/xgbe/xgbe.h|   11 +-
 6 files changed, 115 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-common.h 
b/drivers/net/ethernet/amd/xgbe/xgbe-common.h
index 127adbe..e7b6804 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-common.h
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-common.h
@@ -959,6 +959,7 @@
 #define XP_DRIVER_INT_RO   0x0064
 #define XP_DRIVER_SCRATCH_00x0068
 #define XP_DRIVER_SCRATCH_10x006c
+#define XP_INT_REISSUE_EN  0x0074
 #define XP_INT_EN  0x0078
 #define XP_I2C_MUTEX   0x0080
 #define XP_MDIO_MUTEX  0x0084
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c 
b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
index 2068510..ff6d204 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
@@ -382,9 +382,9 @@ static bool xgbe_ecc_ded(struct xgbe_prv_data *pdata, 
unsigned long *period,
return false;
 }
 
-static irqreturn_t xgbe_ecc_isr(int irq, void *data)
+static void xgbe_ecc_isr_task(unsigned long data)
 {
-   struct xgbe_prv_data *pdata = data;
+   struct xgbe_prv_data *pdata = (struct xgbe_prv_data *)data;
unsigned int ecc_isr;
bool stop = false;
 
@@ -435,12 +435,26 @@ static irqreturn_t xgbe_ecc_isr(int irq, void *data)
/* Clear all ECC interrupts */
XP_IOWRITE(pdata, XP_ECC_ISR, ecc_isr);
 
-   return IRQ_HANDLED;
+   /* Reissue interrupt if status is not clear */
+   if (pdata->vdata->irq_reissue_support)
+   XP_IOWRITE(pdata, XP_INT_REISSUE_EN, 1 << 1);
 }
 
-static irqreturn_t xgbe_isr(int irq, void *data)
+static irqreturn_t xgbe_ecc_isr(int irq, void *data)
 {
struct xgbe_prv_data *pdata = data;
+
+   if (pdata->isr_as_tasklet)
+   tasklet_schedule(>tasklet_ecc);
+   else
+   xgbe_ecc_isr_task((unsigned long)pdata);
+
+   return IRQ_HANDLED;
+}
+
+static void xgbe_isr_task(unsigned long data)
+{
+   struct xgbe_prv_data *pdata = (struct xgbe_prv_data *)data;
struct xgbe_hw_if *hw_if = >hw_if;
struct xgbe_channel *channel;
unsigned int dma_isr, dma_ch_isr;
@@ -543,15 +557,36 @@ static irqreturn_t xgbe_isr(int irq, void *data)
 isr_done:
/* If there is not a separate AN irq, handle it here */
if (pdata->dev_irq == pdata->an_irq)
-   pdata->phy_if.an_isr(irq, pdata);
+   pdata->phy_if.an_isr(pdata);
 
/* If there is not a separate ECC irq, handle it here */
if (pdata->vdata->ecc_support && (pdata->dev_irq == pdata->ecc_irq))
-   xgbe_ecc_isr(irq, pdata);
+   xgbe_ecc_isr_task((unsigned long)pdata);
 
/* If there is not a separate I2C irq, handle it here */
if (pdata->vdata->i2c_support && (pdata->dev_irq == pdata->i2c_irq))
-   pdata->i2c_if.i2c_isr(irq, pdata);
+   pdata->i2c_if.i2c_isr(pdata);
+
+   /* Reissue interrupt if status is not clear */
+   if (pdata->vdata->irq_reissue_support) {
+   unsigned int reissue_mask;
+
+   reissue_mask = 1 << 0;
+   if (!pdata->per_channel_irq)
+   reissue_mask |= 0xffff << 4;
+
+   XP_IOWRITE(pdata, XP_INT_REISSUE_EN, reissue_mask);
+   }
+}
+
+static irqreturn_t xgbe_isr(int irq, void *data)
+{
+   struct xgbe_prv_data *pdata = data;
+
+   if (pdata->isr_as_tasklet)
+   tasklet_schedule(>tasklet_dev);
+   else
+   xgbe_isr_task((unsigned long)pdata);
 
return IRQ_HANDLED;
 }
@@ -826,6 +861,10 @@ static int xgbe_request_irqs(struct xgbe_prv_data *pdata)
unsigned int i;
int ret;
 
+   tasklet_init(>tasklet_dev, xgbe_isr_task, (unsigned long)pdata);
+   tasklet_init(>tasklet_ecc, xgbe_ecc_isr_task,
+(unsigned long)pdata);
+
ret = devm_request_irq(pdata->dev, pdata->dev_irq, xgbe_isr, 0,
   netdev->name, pdata);
if (ret) {
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-i2c.c 
b/drivers/net/ethernet/amd/xgbe/xgbe-i2c.c

[PATCH net-next v1 04/14] amd-xgbe: Add a check for an skb in the timestamp path

2017-06-28 Thread Tom Lendacky
Spurious Tx timestamp interrupts can cause an oops in the Tx timestamp
processing function if a Tx timestamp skb is NULL. Add a check to ensure
a Tx timestamp skb is present before attempting to use it.

Signed-off-by: Tom Lendacky 
---
 drivers/net/ethernet/amd/xgbe/xgbe-drv.c |7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c 
b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
index a934bd5..2068510 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
@@ -1212,6 +1212,10 @@ static void xgbe_tx_tstamp(struct work_struct *work)
u64 nsec;
unsigned long flags;
 
+   spin_lock_irqsave(>tstamp_lock, flags);
+   if (!pdata->tx_tstamp_skb)
+   goto unlock;
+
if (pdata->tx_tstamp) {
nsec = timecounter_cyc2time(>tstamp_tc,
pdata->tx_tstamp);
@@ -1223,8 +1227,9 @@ static void xgbe_tx_tstamp(struct work_struct *work)
 
dev_kfree_skb_any(pdata->tx_tstamp_skb);
 
-   spin_lock_irqsave(>tstamp_lock, flags);
pdata->tx_tstamp_skb = NULL;
+
+unlock:
spin_unlock_irqrestore(>tstamp_lock, flags);
 }
 



[PATCH net-next v1 08/14] amd-xgbe: Limit the I2C error messages that are output

2017-06-28 Thread Tom Lendacky
When I2C communication fails, it tends to keep failing. Rather than
continuously issuing an error message (once per second in most cases),
change the message to be issued just once.

Signed-off-by: Tom Lendacky 
---
 drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c |9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c 
b/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c
index b8be62e..04b5c14 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c
@@ -1121,7 +1121,8 @@ static int xgbe_phy_sfp_read_eeprom(struct xgbe_prv_data 
*pdata)
 
ret = xgbe_phy_sfp_get_mux(pdata);
if (ret) {
-   netdev_err(pdata->netdev, "I2C error setting SFP MUX\n");
+   dev_err_once(pdata->dev, "%s: I2C error setting SFP MUX\n",
+netdev_name(pdata->netdev));
return ret;
}
 
@@ -1131,7 +1132,8 @@ static int xgbe_phy_sfp_read_eeprom(struct xgbe_prv_data 
*pdata)
_addr, sizeof(eeprom_addr),
_eeprom, sizeof(sfp_eeprom));
if (ret) {
-   netdev_err(pdata->netdev, "I2C error reading SFP EEPROM\n");
+   dev_err_once(pdata->dev, "%s: I2C error reading SFP EEPROM\n",
+netdev_name(pdata->netdev));
goto put;
}
 
@@ -1190,7 +1192,8 @@ static void xgbe_phy_sfp_signals(struct xgbe_prv_data 
*pdata)
_reg, sizeof(gpio_reg),
gpio_ports, sizeof(gpio_ports));
if (ret) {
-   netdev_err(pdata->netdev, "I2C error reading SFP GPIOs\n");
+   dev_err_once(pdata->dev, "%s: I2C error reading SFP GPIOs\n",
+netdev_name(pdata->netdev));
return;
}
 



[PATCH net-next v1 06/14] amd-xgbe: Handle return code from software reset function

2017-06-28 Thread Tom Lendacky
Currently the function that performs a software reset of the hardware
provides a return code.  During driver probe, check this return code and
exit with an error if the software reset fails.

Signed-off-by: Tom Lendacky 
---
 drivers/net/ethernet/amd/xgbe/xgbe-main.c |6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-main.c 
b/drivers/net/ethernet/amd/xgbe/xgbe-main.c
index 17ac8f9..982368b 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-main.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-main.c
@@ -277,7 +277,11 @@ int xgbe_config_netdev(struct xgbe_prv_data *pdata)
pdata->desc_ded_period = jiffies;
 
/* Issue software reset to device */
-   pdata->hw_if.exit(pdata);
+   ret = pdata->hw_if.exit(pdata);
+   if (ret) {
+   dev_err(dev, "software reset failed\n");
+   return ret;
+   }
 
/* Set default configuration data */
xgbe_default_config(pdata);



[PATCH net-next v1 07/14] amd-xgbe: Fixes for working with PHYs that support 2.5GbE

2017-06-28 Thread Tom Lendacky
The driver has some missing functionality when operating in the mode that
supports 2.5GbE.  Fix the driver to fully recognize and support this speed.

Signed-off-by: Tom Lendacky 
---
 drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c |7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c 
b/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c
index 756e116..b8be62e 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c
@@ -1966,6 +1966,8 @@ static enum xgbe_mode xgbe_phy_get_baset_mode(struct 
xgbe_phy_data *phy_data,
return XGBE_MODE_SGMII_100;
case SPEED_1000:
return XGBE_MODE_SGMII_1000;
+   case SPEED_2500:
+   return XGBE_MODE_KX_2500;
case SPEED_10000:
return XGBE_MODE_KR;
default:
@@ -2109,6 +2111,9 @@ static bool xgbe_phy_use_baset_mode(struct xgbe_prv_data 
*pdata,
case XGBE_MODE_SGMII_1000:
return xgbe_phy_check_mode(pdata, mode,
   ADVERTISED_1000baseT_Full);
+   case XGBE_MODE_KX_2500:
+   return xgbe_phy_check_mode(pdata, mode,
+  ADVERTISED_2500baseX_Full);
case XGBE_MODE_KR:
return xgbe_phy_check_mode(pdata, mode,
   ADVERTISED_10000baseT_Full);
@@ -2218,6 +2223,8 @@ static bool xgbe_phy_valid_speed_baset_mode(struct 
xgbe_phy_data *phy_data,
case SPEED_100:
case SPEED_1000:
return true;
+   case SPEED_2500:
+   return (phy_data->port_mode == XGBE_PORT_MODE_NBASE_T);
case SPEED_10000:
return (phy_data->port_mode == XGBE_PORT_MODE_10GBASE_T);
default:



[PATCH net-next v1 05/14] amd-xgbe: Prevent looping forever if timestamp update fails

2017-06-28 Thread Tom Lendacky
Just to be on the safe side, should the update of the timestamp registers
not complete, issue a warning rather than looping forever waiting for the
update to complete.

Signed-off-by: Tom Lendacky 
---
 drivers/net/ethernet/amd/xgbe/xgbe-dev.c |   15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-dev.c 
b/drivers/net/ethernet/amd/xgbe/xgbe-dev.c
index 24a687c..3ad4036 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-dev.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-dev.c
@@ -1497,26 +1497,37 @@ static void xgbe_rx_desc_init(struct xgbe_channel 
*channel)
 static void xgbe_update_tstamp_addend(struct xgbe_prv_data *pdata,
  unsigned int addend)
 {
+   unsigned int count = 10000;
+
/* Set the addend register value and tell the device */
XGMAC_IOWRITE(pdata, MAC_TSAR, addend);
XGMAC_IOWRITE_BITS(pdata, MAC_TSCR, TSADDREG, 1);
 
/* Wait for addend update to complete */
-   while (XGMAC_IOREAD_BITS(pdata, MAC_TSCR, TSADDREG))
+   while (--count && XGMAC_IOREAD_BITS(pdata, MAC_TSCR, TSADDREG))
udelay(5);
+
+   if (!count)
+   netdev_err(pdata->netdev,
+  "timed out updating timestamp addend register\n");
 }
 
 static void xgbe_set_tstamp_time(struct xgbe_prv_data *pdata, unsigned int sec,
 unsigned int nsec)
 {
+   unsigned int count = 10000;
+
/* Set the time values and tell the device */
XGMAC_IOWRITE(pdata, MAC_STSUR, sec);
XGMAC_IOWRITE(pdata, MAC_STNUR, nsec);
XGMAC_IOWRITE_BITS(pdata, MAC_TSCR, TSINIT, 1);
 
/* Wait for time update to complete */
-   while (XGMAC_IOREAD_BITS(pdata, MAC_TSCR, TSINIT))
+   while (--count && XGMAC_IOREAD_BITS(pdata, MAC_TSCR, TSINIT))
udelay(5);
+
+   if (!count)
+   netdev_err(pdata->netdev, "timed out initializing timestamp\n");
 }
 
 static u64 xgbe_get_tstamp_time(struct xgbe_prv_data *pdata)



[PATCH net-next v1 01/14] amd-xgbe: Simplify mailbox interface rate change code

2017-06-28 Thread Tom Lendacky
Simplify and centralize the mailbox command rate change interface by
having a single function perform the writes to the mailbox registers
to issue the request.

Signed-off-by: Tom Lendacky 
---
 drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c |  155 +--
 1 file changed, 29 insertions(+), 126 deletions(-)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c 
b/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c
index e707c49..0429840 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c
@@ -1694,19 +1694,25 @@ static void xgbe_phy_set_redrv_mode(struct 
xgbe_prv_data *pdata)
xgbe_phy_put_comm_ownership(pdata);
 }
 
-static void xgbe_phy_start_ratechange(struct xgbe_prv_data *pdata)
+static void xgbe_phy_perform_ratechange(struct xgbe_prv_data *pdata,
+   unsigned int cmd, unsigned int sub_cmd)
 {
-   if (!XP_IOREAD_BITS(pdata, XP_DRIVER_INT_RO, STATUS))
-   return;
+   unsigned int s0 = 0;
+   unsigned int wait;
 
/* Log if a previous command did not complete */
-   netif_dbg(pdata, link, pdata->netdev,
- "firmware mailbox not ready for command\n");
-}
+   if (XP_IOREAD_BITS(pdata, XP_DRIVER_INT_RO, STATUS))
+   netif_dbg(pdata, link, pdata->netdev,
+ "firmware mailbox not ready for command\n");
 
-static void xgbe_phy_complete_ratechange(struct xgbe_prv_data *pdata)
-{
-   unsigned int wait;
+   /* Construct the command */
+   XP_SET_BITS(s0, XP_DRIVER_SCRATCH_0, COMMAND, cmd);
+   XP_SET_BITS(s0, XP_DRIVER_SCRATCH_0, SUB_COMMAND, sub_cmd);
+
+   /* Issue the command */
+   XP_IOWRITE(pdata, XP_DRIVER_SCRATCH_0, s0);
+   XP_IOWRITE(pdata, XP_DRIVER_SCRATCH_1, 0);
+   XP_IOWRITE_BITS(pdata, XP_DRIVER_INT_REQ, REQUEST, 1);
 
/* Wait for command to complete */
wait = XGBE_RATECHANGE_COUNT;
@@ -1723,21 +1729,8 @@ static void xgbe_phy_complete_ratechange(struct 
xgbe_prv_data *pdata)
 
 static void xgbe_phy_rrc(struct xgbe_prv_data *pdata)
 {
-   unsigned int s0;
-
-   xgbe_phy_start_ratechange(pdata);
-
/* Receiver Reset Cycle */
-   s0 = 0;
-   XP_SET_BITS(s0, XP_DRIVER_SCRATCH_0, COMMAND, 5);
-   XP_SET_BITS(s0, XP_DRIVER_SCRATCH_0, SUB_COMMAND, 0);
-
-   /* Call FW to make the change */
-   XP_IOWRITE(pdata, XP_DRIVER_SCRATCH_0, s0);
-   XP_IOWRITE(pdata, XP_DRIVER_SCRATCH_1, 0);
-   XP_IOWRITE_BITS(pdata, XP_DRIVER_INT_REQ, REQUEST, 1);
-
-   xgbe_phy_complete_ratechange(pdata);
+   xgbe_phy_perform_ratechange(pdata, 5, 0);
 
netif_dbg(pdata, link, pdata->netdev, "receiver reset complete\n");
 }
@@ -1746,14 +1739,8 @@ static void xgbe_phy_power_off(struct xgbe_prv_data 
*pdata)
 {
struct xgbe_phy_data *phy_data = pdata->phy_data;
 
-   xgbe_phy_start_ratechange(pdata);
-
-   /* Call FW to make the change */
-   XP_IOWRITE(pdata, XP_DRIVER_SCRATCH_0, 0);
-   XP_IOWRITE(pdata, XP_DRIVER_SCRATCH_1, 0);
-   XP_IOWRITE_BITS(pdata, XP_DRIVER_INT_REQ, REQUEST, 1);
-
-   xgbe_phy_complete_ratechange(pdata);
+   /* Power off */
+   xgbe_phy_perform_ratechange(pdata, 0, 0);
 
phy_data->cur_mode = XGBE_MODE_UNKNOWN;
 
@@ -1763,33 +1750,21 @@ static void xgbe_phy_power_off(struct xgbe_prv_data 
*pdata)
 static void xgbe_phy_sfi_mode(struct xgbe_prv_data *pdata)
 {
struct xgbe_phy_data *phy_data = pdata->phy_data;
-   unsigned int s0;
 
xgbe_phy_set_redrv_mode(pdata);
 
-   xgbe_phy_start_ratechange(pdata);
-
/* 10G/SFI */
-   s0 = 0;
-   XP_SET_BITS(s0, XP_DRIVER_SCRATCH_0, COMMAND, 3);
if (phy_data->sfp_cable != XGBE_SFP_CABLE_PASSIVE) {
-   XP_SET_BITS(s0, XP_DRIVER_SCRATCH_0, SUB_COMMAND, 0);
+   xgbe_phy_perform_ratechange(pdata, 3, 0);
} else {
if (phy_data->sfp_cable_len <= 1)
-   XP_SET_BITS(s0, XP_DRIVER_SCRATCH_0, SUB_COMMAND, 1);
+   xgbe_phy_perform_ratechange(pdata, 3, 1);
else if (phy_data->sfp_cable_len <= 3)
-   XP_SET_BITS(s0, XP_DRIVER_SCRATCH_0, SUB_COMMAND, 2);
+   xgbe_phy_perform_ratechange(pdata, 3, 2);
else
-   XP_SET_BITS(s0, XP_DRIVER_SCRATCH_0, SUB_COMMAND, 3);
+   xgbe_phy_perform_ratechange(pdata, 3, 3);
}
 
-   /* Call FW to make the change */
-   XP_IOWRITE(pdata, XP_DRIVER_SCRATCH_0, s0);
-   XP_IOWRITE(pdata, XP_DRIVER_SCRATCH_1, 0);
-   XP_IOWRITE_BITS(pdata, XP_DRIVER_INT_REQ, REQUEST, 1);
-
-   xgbe_phy_complete_ratechange(pdata);
-
phy_data->cur_mode = XGBE_MODE_SFI;
 
netif_dbg(pdata, link, pdata->netdev, "10GbE SFI mode set\n");
@@ -1798,23 +1773,11 @@ static void xgbe_phy_sfi_mode(struct xgbe_prv_data 
*pdata)
 

[PATCH net-next v1 02/14] amd-xgbe: Fix SFP PHY supported/advertised settings

2017-06-28 Thread Tom Lendacky
When using SFPs, the supported and advertised settings should be initially
based on the SFP that has been detected.  The code currently indicates the
overall support of the device as opposed to what the SFP is capable of.
Update the code to change the supported link modes, auto-negotiation, etc.
to be based on the installed SFP.

Signed-off-by: Tom Lendacky 
---
 drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c |   69 ++-
 1 file changed, 47 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c 
b/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c
index 0429840..756e116 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c
@@ -711,23 +711,39 @@ static void xgbe_phy_sfp_phy_settings(struct 
xgbe_prv_data *pdata)
 {
struct xgbe_phy_data *phy_data = pdata->phy_data;
 
+   if (!phy_data->sfp_mod_absent && !phy_data->sfp_changed)
+   return;
+
+   pdata->phy.supported &= ~SUPPORTED_Autoneg;
+   pdata->phy.supported &= ~(SUPPORTED_Pause | SUPPORTED_Asym_Pause);
+   pdata->phy.supported &= ~SUPPORTED_TP;
+   pdata->phy.supported &= ~SUPPORTED_FIBRE;
+   pdata->phy.supported &= ~SUPPORTED_100baseT_Full;
+   pdata->phy.supported &= ~SUPPORTED_1000baseT_Full;
+   pdata->phy.supported &= ~SUPPORTED_10000baseT_Full;
+
if (phy_data->sfp_mod_absent) {
pdata->phy.speed = SPEED_UNKNOWN;
pdata->phy.duplex = DUPLEX_UNKNOWN;
pdata->phy.autoneg = AUTONEG_ENABLE;
+   pdata->phy.pause_autoneg = AUTONEG_ENABLE;
+
+   pdata->phy.supported |= SUPPORTED_Autoneg;
+   pdata->phy.supported |= SUPPORTED_Pause | SUPPORTED_Asym_Pause;
+   pdata->phy.supported |= SUPPORTED_TP;
+   pdata->phy.supported |= SUPPORTED_FIBRE;
+   if (phy_data->port_speeds & XGBE_PHY_PORT_SPEED_100)
+   pdata->phy.supported |= SUPPORTED_100baseT_Full;
+   if (phy_data->port_speeds & XGBE_PHY_PORT_SPEED_1000)
+   pdata->phy.supported |= SUPPORTED_1000baseT_Full;
+   if (phy_data->port_speeds & XGBE_PHY_PORT_SPEED_10000)
+   pdata->phy.supported |= SUPPORTED_10000baseT_Full;
+
pdata->phy.advertising = pdata->phy.supported;
 
return;
}
 
-   pdata->phy.advertising &= ~ADVERTISED_Autoneg;
-   pdata->phy.advertising &= ~ADVERTISED_TP;
-   pdata->phy.advertising &= ~ADVERTISED_FIBRE;
-   pdata->phy.advertising &= ~ADVERTISED_100baseT_Full;
-   pdata->phy.advertising &= ~ADVERTISED_1000baseT_Full;
-   pdata->phy.advertising &= ~ADVERTISED_1baseT_Full;
-   pdata->phy.advertising &= ~ADVERTISED_10000baseR_FEC;
-
switch (phy_data->sfp_base) {
case XGBE_SFP_BASE_1000_T:
case XGBE_SFP_BASE_1000_SX:
@@ -736,17 +752,25 @@ static void xgbe_phy_sfp_phy_settings(struct 
xgbe_prv_data *pdata)
pdata->phy.speed = SPEED_UNKNOWN;
pdata->phy.duplex = DUPLEX_UNKNOWN;
pdata->phy.autoneg = AUTONEG_ENABLE;
-   pdata->phy.advertising |= ADVERTISED_Autoneg;
+   pdata->phy.pause_autoneg = AUTONEG_ENABLE;
+   pdata->phy.supported |= SUPPORTED_Autoneg;
+   pdata->phy.supported |= SUPPORTED_Pause | SUPPORTED_Asym_Pause;
break;
case XGBE_SFP_BASE_10000_SR:
case XGBE_SFP_BASE_10000_LR:
case XGBE_SFP_BASE_10000_LRM:
case XGBE_SFP_BASE_10000_ER:
case XGBE_SFP_BASE_10000_CR:
-   default:
pdata->phy.speed = SPEED_10000;
pdata->phy.duplex = DUPLEX_FULL;
pdata->phy.autoneg = AUTONEG_DISABLE;
+   pdata->phy.pause_autoneg = AUTONEG_DISABLE;
+   break;
+   default:
+   pdata->phy.speed = SPEED_UNKNOWN;
+   pdata->phy.duplex = DUPLEX_UNKNOWN;
+   pdata->phy.autoneg = AUTONEG_DISABLE;
+   pdata->phy.pause_autoneg = AUTONEG_DISABLE;
break;
}
 
@@ -754,36 +778,38 @@ static void xgbe_phy_sfp_phy_settings(struct 
xgbe_prv_data *pdata)
case XGBE_SFP_BASE_1000_T:
case XGBE_SFP_BASE_1000_CX:
case XGBE_SFP_BASE_10000_CR:
-   pdata->phy.advertising |= ADVERTISED_TP;
+   pdata->phy.supported |= SUPPORTED_TP;
break;
default:
-   pdata->phy.advertising |= ADVERTISED_FIBRE;
+   pdata->phy.supported |= SUPPORTED_FIBRE;
}
 
switch (phy_data->sfp_speed) {
case XGBE_SFP_SPEED_100_1000:
if (phy_data->port_speeds & XGBE_PHY_PORT_SPEED_100)
-   pdata->phy.advertising |= ADVERTISED_100baseT_Full;
+   pdata->phy.supported |= SUPPORTED_100baseT_Full;
if 

[PATCH net-next v1 03/14] amd-xgbe: Use the proper register during PTP initialization

2017-06-28 Thread Tom Lendacky
During PTP initialization, the Timestamp Control register should be
cleared and not the Tx Configuration register.  While this typo causes
the wrong register to be cleared, the default value of each register and
the fact that the Tx Configuration register is programmed afterwards mean
it doesn't result in a bug, hence this is only being fixed in net-next.

Signed-off-by: Tom Lendacky 
---
 drivers/net/ethernet/amd/xgbe/xgbe-ptp.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-ptp.c 
b/drivers/net/ethernet/amd/xgbe/xgbe-ptp.c
index a533a6c..d06d260 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-ptp.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-ptp.c
@@ -267,7 +267,7 @@ void xgbe_ptp_register(struct xgbe_prv_data *pdata)
 ktime_to_ns(ktime_get_real()));
 
/* Disable all timestamping to start */
-   XGMAC_IOWRITE(pdata, MAC_TCR, 0);
+   XGMAC_IOWRITE(pdata, MAC_TSCR, 0);
pdata->tstamp_config.tx_type = HWTSTAMP_TX_OFF;
pdata->tstamp_config.rx_filter = HWTSTAMP_FILTER_NONE;
 }



[PATCH net-next v1 00/14] amd-xgbe: AMD XGBE driver updates 2017-06-28

2017-06-28 Thread Tom Lendacky
The following updates and fixes are included in this driver update series:

- Simplify mailbox interface code
- Fix SFP supported and advertising settings
- Fix PTP initialization register usage
  - Ensure there is a timestamp skb present before using it
- Add a timeout to timestamp register updates
- Handle return code from software reset function
- Some fixes for handling 2.5Gbps rates
- Limit I2C error messages
- Fix non-DMA interrupt handling through tasklet usage
- Add NUMA affinity support for memory allocations
- Add NUMA affinity support for interrupts
- Prepare for more fine-grained cache coherency controls
- Simplify setting the DMA burst length programming
- Performance improvements

This patch series is based on net-next.

---

Tom Lendacky (14):
  amd-xgbe: Simplify mailbox interface rate change code
  amd-xgbe: Fix SFP PHY supported/advertised settings
  amd-xgbe: Use the proper register during PTP initialization
  amd-xgbe: Add a check for an skb in the timestamp path
  amd-xgbe: Prevent looping forever if timestamp update fails
  amd-xgbe: Handle return code from software reset function
  amd-xgbe: Fixes for working with PHYs that support 2.5GbE
  amd-xgbe: Limit the I2C error messages that are output
  amd-xgbe: Re-issue interrupt if interrupt status not cleared
  amd-xgbe: Add NUMA affinity support for memory allocations
  amd-xgbe: Add NUMA affinity support for IRQ hints
  amd-xgbe: Prepare for more fine grained cache coherency controls
  amd-xgbe: Simplify the burst length settings
  amd-xgbe: Adjust register settings to improve performance


 drivers/net/ethernet/amd/xgbe/xgbe-common.h   |   53 ++---
 drivers/net/ethernet/amd/xgbe/xgbe-desc.c |   94 +++---
 drivers/net/ethernet/amd/xgbe/xgbe-dev.c  |  244 ++---
 drivers/net/ethernet/amd/xgbe/xgbe-drv.c  |  245 -
 drivers/net/ethernet/amd/xgbe/xgbe-i2c.c  |   30 +++
 drivers/net/ethernet/amd/xgbe/xgbe-main.c |   14 +
 drivers/net/ethernet/amd/xgbe/xgbe-mdio.c |   33 +++
 drivers/net/ethernet/amd/xgbe/xgbe-pci.c  |   14 +
 drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c   |  240 +---
 drivers/net/ethernet/amd/xgbe/xgbe-platform.c |   10 -
 drivers/net/ethernet/amd/xgbe/xgbe-ptp.c  |2 
 drivers/net/ethernet/amd/xgbe/xgbe.h  |   56 +++---
 12 files changed, 547 insertions(+), 488 deletions(-)

-- 
Tom Lendacky


Re: [PATCH v3 net-next 02/12] bpf/verifier: rework value tracking

2017-06-28 Thread Edward Cree
On 28/06/17 18:09, Daniel Borkmann wrote:
> Could you elaborate on this one? If I understand it correctly, then
> the scalar += pointer case would mean the following: given I have one
> of the allowed pointer types in adjust_ptr_min_max_vals() then the
> prior scalar type inherits the ptr type/id. I would then 'destroy' the
> pointer value so we get a -EACCES on it. We mark the tmp off_reg as
> scalar type, but shouldn't also actual dst_reg be marked as such
> like in below pointer += scalar case, such that we undo the prior
> ptr_type inheritance?
Good catch.  The intent was that adjust_ptr_min_max_vals() wouldn't mark
 dst_reg's type/id in the case when it returned -EACCES, but indeed there
 are some such paths, and rather than changing those it may be easier to
 change the type/id back to scalar/0.  I don't think we need to go as far
 as calling __mark_reg_unknown() on it though, its bounds and align
 shouldn't have been screwed up by adjust_ptr_min_max_vals().

-Ed


Re: wl18xx: add checks on wl18xx_top_reg_write() return value

2017-06-28 Thread Kalle Valo
"Gustavo A. R. Silva"  wrote:

> Check the return value from the call to wl18xx_top_reg_write(),
> so that in case of error we jump to the goto label out and return.
> 
> Also, remove unnecessary value check before goto label out.
> 
> Addresses-Coverity-ID: 1226938
> Signed-off-by: Gustavo A. R. Silva 

Patch applied to wireless-drivers-next.git, thanks.

059c98599b1a wl18xx: add checks on wl18xx_top_reg_write() return value

-- 
https://patchwork.kernel.org/patch/9810591/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches



Re: cw1200: add const to hwbus_ops structures

2017-06-28 Thread Kalle Valo
Bhumika Goyal  wrote:

> Declare hwbus_ops structures as const as they are only passed as an
> argument to the function cw1200_core_probe. This argument is of type
> const. So, make these structures const.
> 
> Signed-off-by: Bhumika Goyal 

Patch applied to wireless-drivers-next.git, thanks.

3ac27dd37b40 cw1200: add const to hwbus_ops structures

-- 
https://patchwork.kernel.org/patch/9806289/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches



Re: [PATCH] net: ipmr: Add ipmr_rtm_getroute

2017-06-28 Thread Nikolay Aleksandrov
On 28/06/17 20:58, Donald Sharp wrote:
> Add to RTNL_FAMILY_IPMR, RTM_GETROUTE the ability
> to retrieve one S,G mroute from a specified table.
> 
> *,G will return mroute information for just that
> particular mroute if it exists.  This is because
> it is entirely possible to have more S's than
> can fit in one skb to return to the requesting
> process.
> 
> Signed-off-by: Donald Sharp 
> ---
>  net/ipv4/ipmr.c | 63 
> -
>  1 file changed, 62 insertions(+), 1 deletion(-)
> 

This is targeted at net-next.

Signed-off-by: Nikolay Aleksandrov 




[PATCH] net: ipmr: Add ipmr_rtm_getroute

2017-06-28 Thread Donald Sharp
Add to RTNL_FAMILY_IPMR, RTM_GETROUTE the ability
to retrieve one S,G mroute from a specified table.

*,G will return mroute information for just that
particular mroute if it exists.  This is because
it is entirely possible to have more S's than
can fit in one skb to return to the requesting
process.

Signed-off-by: Donald Sharp 
---
 net/ipv4/ipmr.c | 63 -
 1 file changed, 62 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index a1d521b..bb909f1 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -2406,6 +2406,67 @@ static void igmpmsg_netlink_event(struct mr_table *mrt, 
struct sk_buff *pkt)
rtnl_set_sk_err(net, RTNLGRP_IPV4_MROUTE_R, -ENOBUFS);
 }
 
+static int ipmr_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
+struct netlink_ext_ack *extack)
+{
+   struct net *net = sock_net(in_skb->sk);
+   struct nlattr *tb[RTA_MAX + 1];
+   struct sk_buff *skb = NULL;
+   struct mfc_cache *cache;
+   struct mr_table *mrt;
+   struct rtmsg *rtm;
+   __be32 src, grp;
+   u32 tableid;
+   int err;
+
+   err = nlmsg_parse(nlh, sizeof(*rtm), tb, RTA_MAX,
+ rtm_ipv4_policy, extack);
+   if (err < 0)
+   goto errout;
+
+   rtm = nlmsg_data(nlh);
+
+   src = tb[RTA_SRC] ? nla_get_in_addr(tb[RTA_SRC]) : 0;
+   grp = tb[RTA_DST] ? nla_get_in_addr(tb[RTA_DST]) : 0;
+   tableid = tb[RTA_TABLE] ? nla_get_u32(tb[RTA_TABLE]) : 0;
+
+   mrt = ipmr_get_table(net, tableid ? tableid : RT_TABLE_DEFAULT);
+   if (IS_ERR(mrt)) {
+   err = PTR_ERR(mrt);
+   goto errout_free;
+   }
+
+   /* entries are added/deleted only under RTNL */
+   rcu_read_lock();
+   cache = ipmr_cache_find(mrt, src, grp);
+   rcu_read_unlock();
+   if (!cache) {
+   err = -ENOENT;
+   goto errout_free;
+   }
+
+   skb = nlmsg_new(mroute_msgsize(false, mrt->maxvif), GFP_KERNEL);
+   if (!skb) {
+   err = -ENOBUFS;
+   goto errout_free;
+   }
+
+   err = ipmr_fill_mroute(mrt, skb, NETLINK_CB(in_skb).portid,
+  nlh->nlmsg_seq, cache,
+  RTM_NEWROUTE, 0);
+   if (err < 0)
+   goto errout_free;
+
+   err = rtnl_unicast(skb, net, NETLINK_CB(in_skb).portid);
+
+errout:
+   return err;
+
+errout_free:
+   kfree_skb(skb);
+   goto errout;
+}
+
 static int ipmr_rtm_dumproute(struct sk_buff *skb, struct netlink_callback *cb)
 {
struct net *net = sock_net(skb->sk);
@@ -3053,7 +3114,7 @@ int __init ip_mr_init(void)
}
 #endif
rtnl_register(RTNL_FAMILY_IPMR, RTM_GETROUTE,
- NULL, ipmr_rtm_dumproute, NULL);
+ ipmr_rtm_getroute, ipmr_rtm_dumproute, NULL);
rtnl_register(RTNL_FAMILY_IPMR, RTM_NEWROUTE,
  ipmr_rtm_route, NULL, NULL);
rtnl_register(RTNL_FAMILY_IPMR, RTM_DELROUTE,
-- 
2.9.4
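
A hypothetical user-space sketch of how the new handler could be exercised:
build an RTM_GETROUTE request with rtm_family set to RTNL_FAMILY_IPMR plus
RTA_SRC/RTA_DST attributes and send it over NETLINK_ROUTE; the kernel
replies with a single RTM_NEWROUTE message built by ipmr_fill_mroute().
The addresses are made up, and error handling and reply parsing are
omitted; this loader is not part of the patch.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static void add_rta(struct nlmsghdr *nlh, unsigned short type,
		    const void *data, unsigned short len)
{
	struct rtattr *rta;

	rta = (struct rtattr *)((char *)nlh + NLMSG_ALIGN(nlh->nlmsg_len));
	rta->rta_type = type;
	rta->rta_len = RTA_LENGTH(len);
	memcpy(RTA_DATA(rta), data, len);
	nlh->nlmsg_len = NLMSG_ALIGN(nlh->nlmsg_len) + RTA_ALIGN(rta->rta_len);
}

int main(void)
{
	struct {
		struct nlmsghdr nlh;
		char payload[128];
	} req = { 0 };
	struct sockaddr_nl kernel = { .nl_family = AF_NETLINK };
	struct nlmsghdr *nlh = &req.nlh;
	struct in_addr src, grp;
	struct rtmsg *rtm;
	int fd;

	nlh->nlmsg_len = NLMSG_LENGTH(sizeof(*rtm));
	nlh->nlmsg_type = RTM_GETROUTE;
	nlh->nlmsg_flags = NLM_F_REQUEST;

	rtm = NLMSG_DATA(nlh);
	rtm->rtm_family = RTNL_FAMILY_IPMR;

	inet_pton(AF_INET, "10.0.0.1", &src);	/* S: made-up source */
	inet_pton(AF_INET, "239.1.1.1", &grp);	/* G: made-up group */
	add_rta(nlh, RTA_SRC, &src, sizeof(src));
	add_rta(nlh, RTA_DST, &grp, sizeof(grp));
	/* RTA_TABLE omitted: the handler falls back to RT_TABLE_DEFAULT */

	fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
	sendto(fd, nlh, nlh->nlmsg_len, 0,
	       (struct sockaddr *)&kernel, sizeof(kernel));
	/* recv() on fd would return the RTM_NEWROUTE reply (or an error) */
	close(fd);

	return 0;
}
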



Re: rsi: add in missing RSI_FSM_STATES into array fsm_state

2017-06-28 Thread Kalle Valo
Colin Ian King  wrote:

> From: Colin Ian King 
> 
> Two recent commits added new RSI_FSM_STATES (namely FSM_FW_NOT_LOADED
> and FSM_COMMON_DEV_PARAMS_SENT) and the corresponding table fsm_state
> was not updated to match. This can lead to an array overrun when
> accessing the latter two states in fsm_state. Fix this by adding in
> the missing states.
> 
> Detected by CoverityScan, CID#1398379 ("Illegal address computation")
> 
> Fixes: 9920322ccd8e ("rsi: add tx frame for common device configuration")
> Fixes: 015e367494c1 ("rsi: Register interrupt handler before firmware load")
> Signed-off-by: Colin Ian King 

Patch applied to wireless-drivers-next.git, thanks.

58828680af49 rsi: add in missing RSI_FSM_STATES into array fsm_state

-- 
https://patchwork.kernel.org/patch/9804857/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches



Re: [net-next] qtnfmac: fix uninitialized return code in ret

2017-06-28 Thread Kalle Valo
Colin Ian King  wrote:

> From: Colin Ian King 
> 
> The return value ret is uninitialized and garbage is being returned
> for the three different error conditions when setting up the PCIe
> BARs. Fix this by initializing ret to -ENOMEM to indicate that
> the BARs failed to be set up correctly.
> 
> Detected by CoverityScan, CID#1437563 ("Unitialized scalar variable")
> 
> Signed-off-by: Colin Ian King 
> Reviewed-by: Sergey Matyukevich 

Patch applied to wireless-drivers-next.git, thanks.

3e3d8aa61107 qtnfmac: fix uninitialized return code in ret

-- 
https://patchwork.kernel.org/patch/9801833/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches



Re: [PATCH net-next v3 01/15] bpf: BPF support for sock_ops

2017-06-28 Thread Lawrence Brakmo
On 6/23/17, 2:15 PM, "Daniel Borkmann"  wrote:

On 06/23/2017 01:57 AM, Lawrence Brakmo wrote:
> On 6/22/17, 4:19 PM, "netdev-ow...@vger.kernel.org on behalf of Daniel 
Borkmann"  
wrote:
>
>  On 06/23/2017 12:58 AM, Lawrence Brakmo wrote:
>  [...]
>  > Daniel, I see value for having a global program, so I would like 
to keep that. When
>  > this patchset is accepted, I will submit one that adds support for 
per cgroup
>  > sock_ops programs, with the option to use the global one if none is
>  > specified for a cgroup. We could also have the option of the 
cgroup sock_ops
>  > program choosing if the global program should run for a particular 
op based on
>  > its return value. We can iron it out the details when that patch 
is submitted.
>
>  Hm, could you elaborate on the value part compared to per cgroups 
ops?
>  My understanding is that per cgroup would already be a proper 
superset
>  of just the global one anyway, so why not going with that in the 
first
>  place since you're working on it?
>
>  What would be the additional value? How would global vs per cgroup 
one
>  interact with each other in terms of enforcement e.g., there's 
already
>  semantics in place for cgroups descendants, would it be that we set
>  TCP parameters twice or would you disable the global one altogether?
>  Just wondering as you could avoid these altogether with going via 
cgroups
>  initially.
>
>  Thanks,
>  Daniel
>
> Well, for starters the global program will work even if CONFIG_CGROUP_BPF 
is
> not defined. It is also an easier concept for when a global program is 
all that

Otoh, major distros are highly likely to enable this on by default anyway.

> is required. But I also had in mind that behaviors that were in common for
> most cgroup programs could be handled by the global program instead of
> adding it to all cgroup programs. In this scenario the global program
> represents the default behavior that can be override by the cgroup
> program (per op). For example, the cgroup program could return a value
> to indicate that that op should be passed to the global program.

But then you would need to go through two program passes for setting
such parameters? Other option could be to make the per cgroup ops more
fine grained and use the effective one that was inherited for delegating
to default ops. My gut feeling is just that this makes interactions to
manage this and enforcement in combination with the later planned per
cgroups ops more complex if the same use-case could indeed be resolved
with per cgroups only.

Daniel, thank you for the feedback. I just submitted a new patch set without
the global program and using bpf cgroups framework.

> I agree 100% with you on the value of cgroup programs, but I just happen
> to think there is also value in the global program.
>
> Thanks,
> Lawrence




[PATCH net-next] bpf: Fix out-of-bound access on interpreters[]

2017-06-28 Thread Martin KaFai Lau
The index is off-by-one when fp->aux->stack_depth
has already been rounded up to a multiple of 32.  In particular,
if stack_depth is 512, the index will be 16.

The fix is to round_up and then take -1 instead of round_down.

[   22.318680] 
==
[   22.319745] BUG: KASAN: global-out-of-bounds in 
bpf_prog_select_runtime+0x48a/0x670
[   22.320737] Read of size 8 at addr 82aadae0 by task sockex3/1946
[   22.321646]
[   22.321858] CPU: 1 PID: 1946 Comm: sockex3 Tainted: GW   
4.12.0-rc6-01680-g2ee87db3a287 #22
[   22.323061] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.9.3-1.el7.centos 04/01/2014
[   22.324260] Call Trace:
[   22.324612]  dump_stack+0x67/0x99
[   22.325081]  print_address_description+0x1e8/0x290
[   22.325734]  ? bpf_prog_select_runtime+0x48a/0x670
[   22.326360]  kasan_report+0x265/0x350
[   22.326860]  __asan_report_load8_noabort+0x19/0x20
[   22.327484]  bpf_prog_select_runtime+0x48a/0x670
[   22.328109]  bpf_prog_load+0x626/0xd40
[   22.328637]  ? __bpf_prog_charge+0xc0/0xc0
[   22.329222]  ? check_nnp_nosuid.isra.61+0x100/0x100
[   22.329890]  ? __might_fault+0xf6/0x1b0
[   22.330446]  ? lock_acquire+0x360/0x360
[   22.331013]  SyS_bpf+0x67c/0x24d0
[   22.331491]  ? trace_hardirqs_on+0xd/0x10
[   22.332049]  ? __getnstimeofday64+0xaf/0x1c0
[   22.332635]  ? bpf_prog_get+0x20/0x20
[   22.333135]  ? __audit_syscall_entry+0x300/0x600
[   22.333770]  ? syscall_trace_enter+0x540/0xdd0
[   22.334339]  ? exit_to_usermode_loop+0xe0/0xe0
[   22.334950]  ? do_syscall_64+0x48/0x410
[   22.335446]  ? bpf_prog_get+0x20/0x20
[   22.335954]  do_syscall_64+0x181/0x410
[   22.336454]  entry_SYSCALL64_slow_path+0x25/0x25
[   22.337121] RIP: 0033:0x7f263fe81f19
[   22.337618] RSP: 002b:7ffd9a3440c8 EFLAGS: 0202 ORIG_RAX: 
0141
[   22.338619] RAX: ffda RBX: 00aac5fb RCX: 7f263fe81f19
[   22.339600] RDX: 0030 RSI: 7ffd9a3440d0 RDI: 0005
[   22.340470] RBP: 00a9a1e0 R08: 00a9a1e0 R09: 009d0001
[   22.341430] R10:  R11: 0202 R12: 0001
[   22.342411] R13: 00a9a023 R14: 0001 R15: 0003
[   22.343369]
[   22.343593] The buggy address belongs to the variable:
[   22.344241]  interpreters+0x80/0x980
[   22.344708]
[   22.344908] Memory state around the buggy address:
[   22.345556]  82aad980: 00 00 00 04 fa fa fa fa 04 fa fa fa fa fa fa 
fa
[   22.346449]  82aada00: 00 00 00 00 00 fa fa fa fa fa fa fa 00 00 00 
00
[   22.347361] >82aada80: 00 00 00 00 00 00 00 00 00 00 00 00 fa fa fa 
fa
[   22.348301]^
[   22.349142]  82aadb00: 00 01 fa fa fa fa fa fa 00 00 00 00 00 00 00 
00
[   22.350058]  82aadb80: 00 00 07 fa fa fa fa fa 00 00 05 fa fa fa fa 
fa
[   22.350984] 
==

Fixes: b870aa901f4b ("bpf: use different interpreter depending on required 
stack size")
Signed-off-by: Martin KaFai Lau 
Acked-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
---
 kernel/bpf/core.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 774069ca18a7..ad5f55922a13 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1297,7 +1297,9 @@ static int bpf_check_tail_call(const struct bpf_prog *fp)
  */
 struct bpf_prog *bpf_prog_select_runtime(struct bpf_prog *fp, int *err)
 {
-   fp->bpf_func = interpreters[round_down(fp->aux->stack_depth, 32) / 32];
+   u32 stack_depth = max_t(u32, fp->aux->stack_depth, 1);
+
+   fp->bpf_func = interpreters[(round_up(stack_depth, 32) / 32) - 1];
 
/* eBPF JITs can rewrite the program in case constant
 * blinding is active. However, in case of error during
-- 
2.9.3
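
A small stand-alone sketch (plain C, not kernel code) of the index
arithmetic in the fix: it assumes a 16-entry table where entry i serves a
stack of 32 * (i + 1) bytes, as in kernel/bpf/core.c, and shows how the
round_down form walks one slot past the end at a depth of 512 while the
round_up-minus-one form does not.  The max_t(..., 1) clamp in the patch
additionally keeps a zero stack_depth from underflowing the round_up form.

#include <stdio.h>

#define ROUND_DOWN(x, y)	(((x) / (y)) * (y))
#define ROUND_UP(x, y)		((((x) + (y) - 1) / (y)) * (y))

int main(void)
{
	/* interpreters[] has 16 entries; valid indices are 0..15 */
	unsigned int depths[] = { 1, 32, 33, 64, 511, 512 };
	unsigned int i;

	for (i = 0; i < sizeof(depths) / sizeof(depths[0]); i++) {
		unsigned int d = depths[i];
		unsigned int old_idx = ROUND_DOWN(d, 32) / 32;		/* pre-fix  */
		unsigned int new_idx = ROUND_UP(d, 32) / 32 - 1;	/* with fix */

		printf("stack_depth %3u -> old index %2u, fixed index %2u\n",
		       d, old_idx, new_idx);
	}

	/* For a depth of 512 the old index is 16, one past the last slot. */
	return 0;
}
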



[PATCH net-next v4 16/16] bpf: update tools/include/uapi/linux/bpf.h

2017-06-28 Thread Lawrence Brakmo
Update tools/include/uapi/linux/bpf.h to include changes related to new
bpf sock_ops program type.

Signed-off-by: Lawrence Brakmo 
---
 tools/include/uapi/linux/bpf.h | 66 +-
 1 file changed, 65 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index f94b48b..284b366 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -120,12 +120,14 @@ enum bpf_prog_type {
BPF_PROG_TYPE_LWT_IN,
BPF_PROG_TYPE_LWT_OUT,
BPF_PROG_TYPE_LWT_XMIT,
+   BPF_PROG_TYPE_SOCK_OPS,
 };
 
 enum bpf_attach_type {
BPF_CGROUP_INET_INGRESS,
BPF_CGROUP_INET_EGRESS,
BPF_CGROUP_INET_SOCK_CREATE,
+   BPF_CGROUP_SOCK_OPS,
__MAX_BPF_ATTACH_TYPE
 };
 
@@ -518,6 +520,17 @@ union bpf_attr {
  * Set full skb->hash.
  * @skb: pointer to skb
  * @hash: hash to set
+ *
+ * int bpf_setsockopt(bpf_socket, level, optname, optval, optlen)
+ * Calls setsockopt. Not all opts are available, only those with
+ * integer optvals plus TCP_CONGESTION.
+ * Supported levels: SOL_SOCKET and IPPROTO_TCP
+ * @bpf_socket: pointer to bpf_socket
+ * @level: SOL_SOCKET or IPPROTO_TCP
+ * @optname: option name
+ * @optval: pointer to option value
+ * @optlen: length of optval in bytes
+ * Return: 0 or negative error
  */
 #define __BPF_FUNC_MAPPER(FN)  \
FN(unspec), \
@@ -568,7 +581,8 @@ union bpf_attr {
FN(probe_read_str), \
FN(get_socket_cookie),  \
FN(get_socket_uid), \
-   FN(set_hash),
+   FN(set_hash),   \
+   FN(setsockopt),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -720,4 +734,54 @@ struct bpf_map_info {
__u32 map_flags;
 } __attribute__((aligned(8)));
 
+/* User bpf_sock_ops struct to access socket values and specify request ops
+ * and their replies.
+ * New fields can only be added at the end of this structure
+ */
+struct bpf_sock_ops {
+   __u32 op;
+   union {
+   __u32 reply;
+   __u32 replylong[4];
+   };
+   __u32 family;
+   __u32 remote_ip4;
+   __u32 local_ip4;
+   __u32 remote_ip6[4];
+   __u32 local_ip6[4];
+   __u32 remote_port;
+   __u32 local_port;
+};
+
+/* List of known BPF sock_ops operators.
+ * New entries can only be added at the end
+ */
+enum {
+   BPF_SOCK_OPS_VOID,
+   BPF_SOCK_OPS_TIMEOUT_INIT,  /* Should return SYN-RTO value to use or
+* -1 if default value should be used
+*/
+   BPF_SOCK_OPS_RWND_INIT, /* Should return initial advertized
+* window (in packets) or -1 if default
+* value should be used
+*/
+   BPF_SOCK_OPS_TCP_CONNECT_CB,/* Calls BPF program right before an
+* active connection is initialized
+*/
+   BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB, /* Calls BPF program when an
+* active connection is
+* established
+*/
+   BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB,/* Calls BPF program when a
+* passive connection is
+* established
+*/
+   BPF_SOCK_OPS_NEEDS_ECN, /* If connection's congestion control
+* needs ECN
+*/
+};
+
+#define TCP_BPF_IW 1001/* Set TCP initial congestion window */
+#define TCP_BPF_SNDCWND_CLAMP  1002/* Set sndcwnd_clamp */
+
 #endif /* _UAPI__LINUX_BPF_H__ */
-- 
2.9.3
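
A hypothetical loader sketch showing how a program of the new
BPF_PROG_TYPE_SOCK_OPS type could be attached using the new
BPF_CGROUP_SOCK_OPS attach type.  It assumes the samples/bpf helpers
(load_bpf_file() and the prog_fd[] array from bpf_load.h) and the libbpf
bpf_prog_attach() wrapper; the cgroup path and object file name are made
up, and error reporting is minimal.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <linux/bpf.h>
#include "libbpf.h"	/* bpf_prog_attach() */
#include "bpf_load.h"	/* load_bpf_file(), prog_fd[] */

int main(void)
{
	int cg_fd, ret;

	if (load_bpf_file("tcp_iw_kern.o"))
		return 1;

	cg_fd = open("/sys/fs/cgroup/unified/foo", O_DIRECTORY | O_RDONLY);
	if (cg_fd < 0) {
		perror("open cgroup");
		return 1;
	}

	/* BPF_CGROUP_SOCK_OPS is the attach type introduced in this series */
	ret = bpf_prog_attach(prog_fd[0], cg_fd, BPF_CGROUP_SOCK_OPS, 0);
	if (ret) {
		perror("bpf_prog_attach");
		return 1;
	}

	close(cg_fd);
	return 0;
}
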



[PATCH net-next v4 13/16] bpf: Sample BPF program to set initial cwnd

2017-06-28 Thread Lawrence Brakmo
Sample BPF program that assumes hosts are far away (i.e. large RTTs)
and sets initial cwnd and initial receive window to 40 packets,
send and receive buffers to 1.5MB.

In practice there would be a test to ensure the hosts are actually
far enough away.

Signed-off-by: Lawrence Brakmo 
---
 samples/bpf/Makefile  |  1 +
 samples/bpf/tcp_iw_kern.c | 79 +++
 2 files changed, 80 insertions(+)
 create mode 100644 samples/bpf/tcp_iw_kern.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 6fdf32d..242d76e 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -117,6 +117,7 @@ always += tcp_synrto_kern.o
 always += tcp_rwnd_kern.o
 always += tcp_bufs_kern.o
 always += tcp_cong_kern.o
+always += tcp_iw_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 HOSTCFLAGS += -I$(srctree)/tools/lib/
diff --git a/samples/bpf/tcp_iw_kern.c b/samples/bpf/tcp_iw_kern.c
new file mode 100644
index 000..28626f9
--- /dev/null
+++ b/samples/bpf/tcp_iw_kern.c
@@ -0,0 +1,79 @@
+/* Copyright (c) 2017 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * BPF program to set initial congestion window and initial receive
+ * window to 40 packets and send and receive buffers to 1.5MB. This
+ * would usually be done after doing appropriate checks that indicate
+ * the hosts are far enough away (i.e. large RTT).
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "bpf_helpers.h"
+
+#define DEBUG 1
+
+SEC("sockops")
+int bpf_iw(struct bpf_sock_ops *skops)
+{
+   char fmt1[] = "BPF command: %d\n";
+   char fmt2[] = "  Returning %d\n";
+   int bufsize = 1500000;
+   int rwnd_init = 40;
+   int iw = 40;
+   int rv = 0;
+   int op;
+
+   /* For testing purposes, only execute the rest of the BPF program
+* if one of the port numbers is 55601
+*/
+   if (skops->remote_port != 55601 && skops->local_port != 55601)
+   return -1;
+
+   op = (int) skops->op;
+
+#ifdef DEBUG
+   bpf_trace_printk(fmt1, sizeof(fmt1), op);
+#endif
+
+   /* Usually there would be a check to ensure the hosts are far
+* from each other so it makes sense to increase buffer sizes
+*/
+   switch (op) {
+   case BPF_SOCK_OPS_RWND_INIT:
+   rv = rwnd_init;
+   break;
+   case BPF_SOCK_OPS_TCP_CONNECT_CB:
+   /* Set sndbuf and rcvbuf of active connections */
+   rv = bpf_setsockopt(skops, SOL_SOCKET, SO_SNDBUF, ,
+   sizeof(bufsize));
+   rv = rv*100 + bpf_setsockopt(skops, SOL_SOCKET, SO_RCVBUF,
+, sizeof(bufsize));
+   break;
+   case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB:
+   rv = bpf_setsockopt(skops, SOL_TCP, TCP_BPF_IW, ,
+   sizeof(iw));
+   break;
+   case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB:
+   /* Set sndbuf and rcvbuf of passive connections */
+   rv = bpf_setsockopt(skops, SOL_SOCKET, SO_SNDBUF, ,
+   sizeof(bufsize));
+   rv = rv*100 + bpf_setsockopt(skops, SOL_SOCKET, SO_RCVBUF,
+, sizeof(bufsize));
+   break;
+   default:
+   rv = -1;
+   }
+#ifdef DEBUG
+   bpf_trace_printk(fmt2, sizeof(fmt2), rv);
+#endif
+   skops->reply = rv;
+   return 1;
+}
+char _license[] SEC("license") = "GPL";
-- 
2.9.3



[PATCH net-next v4 15/16] bpf: Sample bpf program to set sndcwnd clamp

2017-06-28 Thread Lawrence Brakmo
Sample BPF program, tcp_clamp_kern.c, to demonstrate the use
of setting the sndcwnd clamp. This program assumes that if the
first 5.5 bytes of the hosts' IPv6 addresses are the same, then
the hosts are in the same datacenter and sets sndcwnd clamp to
100 packets, SYN and SYN-ACK RTOs to 10ms and send/receive buffer
sizes to 150KB.

Signed-off-by: Lawrence Brakmo 
---
 samples/bpf/Makefile |  1 +
 samples/bpf/tcp_clamp_kern.c | 94 
 2 files changed, 95 insertions(+)
 create mode 100644 samples/bpf/tcp_clamp_kern.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 242d76e..9c65058 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -118,6 +118,7 @@ always += tcp_rwnd_kern.o
 always += tcp_bufs_kern.o
 always += tcp_cong_kern.o
 always += tcp_iw_kern.o
+always += tcp_clamp_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 HOSTCFLAGS += -I$(srctree)/tools/lib/
diff --git a/samples/bpf/tcp_clamp_kern.c b/samples/bpf/tcp_clamp_kern.c
new file mode 100644
index 000..07e334e
--- /dev/null
+++ b/samples/bpf/tcp_clamp_kern.c
@@ -0,0 +1,94 @@
+/* Copyright (c) 2017 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * Sample BPF program to set send and receive buffers to 150KB, sndcwnd clamp
+ * to 100 packets and SYN and SYN_ACK RTOs to 10ms when both hosts are within
+ * the same datacenter. For his example, we assume they are within the same
+ * datacenter when the first 5.5 bytes of their IPv6 addresses are the same.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "bpf_helpers.h"
+
+#define DEBUG 1
+
+SEC("sockops")
+int bpf_clamp(struct bpf_sock_ops *skops)
+{
+   char fmt1[] = "BPF command: %d\n";
+   char fmt2[] = "  Returning %d\n";
+   int bufsize = 15;
+   int to_init = 10;
+   int clamp = 100;
+   int rv = 0;
+   int op;
+
+   /* For testing purposes, only execute rest of BPF program
+* if neither port numberis 55601
+*/
+   if (skops->remote_port != 55601 && skops->local_port != 55601)
+   return -1;
+
+   op = (int) skops->op;
+
+#ifdef DEBUG
+   bpf_trace_printk(fmt1, sizeof(fmt1), op);
+#endif
+
+   /* Check that both hosts are within same datacenter. For this example
+* it is the case when the first 5.5 bytes of their IPv6 addresses are
+* the same.
+*/
+   if (skops->family == AF_INET6 &&
+   skops->local_ip6[0] == skops->remote_ip6[0] &&
+   (skops->local_ip6[1] & 0xfff0) ==
+   (skops->remote_ip6[1] & 0xfff0)) {
+   switch (op) {
+   case BPF_SOCK_OPS_TIMEOUT_INIT:
+   rv = to_init;
+   break;
+   case BPF_SOCK_OPS_TCP_CONNECT_CB:
+   /* Set sndbuf and rcvbuf of active connections */
+   rv = bpf_setsockopt(skops, SOL_SOCKET, SO_SNDBUF,
+   , sizeof(bufsize));
+   rv = rv*100 + bpf_setsockopt(skops, SOL_SOCKET,
+ SO_RCVBUF, ,
+ sizeof(bufsize));
+   break;
+   case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB:
+   rv = bpf_setsockopt(skops, SOL_TCP,
+   TCP_BPF_SNDCWND_CLAMP,
+   , sizeof(clamp));
+   break;
+   case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB:
+   /* Set sndbuf and rcvbuf of passive connections */
+   rv = bpf_setsockopt(skops, SOL_TCP,
+   TCP_BPF_SNDCWND_CLAMP,
+   , sizeof(clamp));
+   rv = rv*100 + bpf_setsockopt(skops, SOL_SOCKET,
+ SO_SNDBUF, ,
+ sizeof(bufsize));
+   rv = rv*100 + bpf_setsockopt(skops, SOL_SOCKET,
+ SO_RCVBUF, ,
+ sizeof(bufsize));
+   break;
+   default:
+   rv = -1;
+   }
+   } else {
+   rv = -1;
+   }
+#ifdef DEBUG
+   bpf_trace_printk(fmt2, sizeof(fmt2), rv);
+#endif
+   skops->reply = rv;
+   return 1;
+}
+char _license[] SEC("license") = "GPL";
-- 
2.9.3



[PATCH net-next v4 12/16] bpf: Adds support for setting initial cwnd

2017-06-28 Thread Lawrence Brakmo
Adds a new bpf_setsockopt for TCP sockets, TCP_BPF_IW, which sets the
initial congestion window. This can be used when the hosts are far
apart (large RTTs) and it is safe to start with a large initial cwnd.
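
Usage from a sockops program is a single helper call. The fragment below
is only a sketch (skops is the program's struct bpf_sock_ops pointer and
the value of 40 packets is borrowed from the sample in a later patch):

	int iw = 40;	/* illustrative value, not part of this patch */

	/* rejected (-EINVAL) once data has been sent on the connection */
	bpf_setsockopt(skops, SOL_TCP, TCP_BPF_IW, &iw, sizeof(iw));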

Signed-off-by: Lawrence Brakmo 
---
 include/uapi/linux/bpf.h |  2 ++
 net/core/filter.c| 14 +-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 77d05ff..0d9ff6d 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -781,4 +781,6 @@ enum {
 */
 };
 
+#define TCP_BPF_IW 1001/* Set TCP initial congestion window */
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/net/core/filter.c b/net/core/filter.c
index b36ec83..147b637 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2724,7 +2724,19 @@ BPF_CALL_5(bpf_setsockopt, struct bpf_sock_ops_kern *, 
bpf_sock,
tcp_reinit_congestion_control(sk,
inet_csk(sk)->icsk_ca_ops);
} else {
-   ret = -EINVAL;
+   struct tcp_sock *tp = tcp_sk(sk);
+
+   val = *((int *)optval);
+   switch (optname) {
+   case TCP_BPF_IW:
+   if (val <= 0 || tp->data_segs_out > 0)
+   ret = -EINVAL;
+   else
+   tp->snd_cwnd = val;
+   break;
+   default:
+   ret = -EINVAL;
+   }
}
} else {
ret = -EINVAL;
-- 
2.9.3



[PATCH net-next v4 14/16] bpf: Adds support for setting sndcwnd clamp

2017-06-28 Thread Lawrence Brakmo
Adds a new bpf_setsockopt for TCP sockets, TCP_BPF_SNDCWND_CLAMP, which
sets the send congestion window clamp. It is useful to limit the sndcwnd
when the hosts are close to each other (small RTT).
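
As with TCP_BPF_IW, usage from a sockops program is a single helper call.
This is a sketch only, with the 100-packet value borrowed from the sample
in a later patch (skops is the program's struct bpf_sock_ops pointer):

	int clamp = 100;	/* illustrative value, not part of this patch */

	bpf_setsockopt(skops, SOL_TCP, TCP_BPF_SNDCWND_CLAMP,
		       &clamp, sizeof(clamp));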

Signed-off-by: Lawrence Brakmo 
---
 include/uapi/linux/bpf.h | 1 +
 net/core/filter.c| 7 +++
 2 files changed, 8 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 0d9ff6d..284b366 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -782,5 +782,6 @@ enum {
 };
 
 #define TCP_BPF_IW 1001/* Set TCP initial congestion window */
+#define TCP_BPF_SNDCWND_CLAMP  1002/* Set sndcwnd_clamp */
 
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/net/core/filter.c b/net/core/filter.c
index 147b637..516353e 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2734,6 +2734,13 @@ BPF_CALL_5(bpf_setsockopt, struct bpf_sock_ops_kern *, 
bpf_sock,
else
tp->snd_cwnd = val;
break;
+   case TCP_BPF_SNDCWND_CLAMP:
+   if (val <= 0) {
+   ret = -EINVAL;
+   } else {
+   tp->snd_cwnd_clamp = val;
+   tp->snd_ssthresh = val;
+   }
default:
ret = -EINVAL;
}
-- 
2.9.3



[PATCH net-next v4 08/16] bpf: Add TCP connection BPF callbacks

2017-06-28 Thread Lawrence Brakmo
Added callbacks to BPF SOCK_OPS type program before an active
connection is initialized and after a passive or active connection is
established.

The following patch demonstrates how they can be used to set send and
receive buffer sizes.

Signed-off-by: Lawrence Brakmo 
---
 include/uapi/linux/bpf.h | 11 +++
 net/ipv4/tcp_fastopen.c  |  1 +
 net/ipv4/tcp_input.c |  4 +++-
 net/ipv4/tcp_output.c|  1 +
 4 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 2dbae9e..5b7207d 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -765,6 +765,17 @@ enum {
 * window (in packets) or -1 if default
 * value should be used
 */
+   BPF_SOCK_OPS_TCP_CONNECT_CB,/* Calls BPF program right before an
+* active connection is initialized
+*/
+   BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB, /* Calls BPF program when an
+* active connection is
+* established
+*/
+   BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB,/* Calls BPF program when a
+* passive connection is
+* established
+*/
 };
 
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
index 4af82b9..ed6b549 100644
--- a/net/ipv4/tcp_fastopen.c
+++ b/net/ipv4/tcp_fastopen.c
@@ -221,6 +221,7 @@ static struct sock *tcp_fastopen_create_child(struct sock 
*sk,
tcp_init_congestion_control(child);
tcp_mtup_init(child);
tcp_init_metrics(child);
+   tcp_call_bpf(child, false, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB);
tcp_init_buffer_space(child);
 
tp->rcv_nxt = TCP_SKB_CB(skb)->seq + 1;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 0867b05..1b868ae 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5571,7 +5571,7 @@ void tcp_finish_connect(struct sock *sk, struct sk_buff 
*skb)
icsk->icsk_af_ops->rebuild_header(sk);
 
tcp_init_metrics(sk);
-
+   tcp_call_bpf(sk, false, BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB);
tcp_init_congestion_control(sk);
 
/* Prevent spurious tcp_cwnd_restart() on first data
@@ -5977,6 +5977,8 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff 
*skb)
} else {
/* Make sure socket is routed, for correct metrics. */
icsk->icsk_af_ops->rebuild_header(sk);
+   tcp_call_bpf(sk, false,
+BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB);
tcp_init_congestion_control(sk);
 
tcp_mtup_init(sk);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index e5f623f..958edc8 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3445,6 +3445,7 @@ int tcp_connect(struct sock *sk)
struct sk_buff *buff;
int err;
 
+   tcp_call_bpf(sk, false, BPF_SOCK_OPS_TCP_CONNECT_CB);
tcp_connect_init(sk);
 
if (unlikely(tp->repair)) {
-- 
2.9.3



[PATCH net-next v4 07/16] bpf: Add setsockopt helper function to bpf

2017-06-28 Thread Lawrence Brakmo
Added support for calling a subset of socket setsockopts from
BPF_PROG_TYPE_SOCK_OPS programs. The code was duplicated rather
than reworked to call the existing socket setsockopt function, because
the changes required for that would have been larger.

The ops supported are:
  SO_RCVBUF
  SO_SNDBUF
  SO_MAX_PACING_RATE
  SO_PRIORITY
  SO_RCVLOWAT
  SO_MARK
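
For example, a sockops program could size the socket buffers roughly like
this (a sketch only; the 1.5MB value mirrors the later sample programs and
skops is the program's struct bpf_sock_ops pointer):

	int bufsize = 1500000;	/* illustrative value (1.5MB) */

	bpf_setsockopt(skops, SOL_SOCKET, SO_SNDBUF, &bufsize, sizeof(bufsize));
	bpf_setsockopt(skops, SOL_SOCKET, SO_RCVBUF, &bufsize, sizeof(bufsize));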

Signed-off-by: Lawrence Brakmo 
---
 include/uapi/linux/bpf.h  | 14 -
 net/core/filter.c | 77 ++-
 samples/bpf/bpf_helpers.h |  3 ++
 3 files changed, 92 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index cdec348..2dbae9e 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -520,6 +520,17 @@ union bpf_attr {
  * Set full skb->hash.
  * @skb: pointer to skb
  * @hash: hash to set
+ *
+ * int bpf_setsockopt(bpf_socket, level, optname, optval, optlen)
+ * Calls setsockopt. Not all opts are available, only those with
+ * integer optvals plus TCP_CONGESTION.
+ * Supported levels: SOL_SOCKET and IPROTO_TCP
+ * @bpf_socket: pointer to bpf_socket
+ * @level: SOL_SOCKET or IPROTO_TCP
+ * @optname: option name
+ * @optval: pointer to option value
+ * @optlen: length of optval in byes
+ * Return: 0 or negative error
  */
 #define __BPF_FUNC_MAPPER(FN)  \
FN(unspec), \
@@ -570,7 +581,8 @@ union bpf_attr {
FN(probe_read_str), \
FN(get_socket_cookie),  \
FN(get_socket_uid), \
-   FN(set_hash),
+   FN(set_hash),   \
+   FN(setsockopt),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/net/core/filter.c b/net/core/filter.c
index bb54832..167eca0 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -54,6 +54,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /**
  * sk_filter_trim_cap - run a packet through a socket filter
@@ -2672,6 +2673,69 @@ static const struct bpf_func_proto 
bpf_get_socket_uid_proto = {
.arg1_type  = ARG_PTR_TO_CTX,
 };
 
+BPF_CALL_5(bpf_setsockopt, struct bpf_sock_ops_kern *, bpf_sock,
+  int, level, int, optname, char *, optval, int, optlen)
+{
+   struct sock *sk = bpf_sock->sk;
+   int ret = 0;
+   int val;
+
+   if (bpf_sock->is_req_sock)
+   return -EINVAL;
+
+   if (level == SOL_SOCKET) {
+   /* Only some socketops are supported */
+   val = *((int *)optval);
+
+   switch (optname) {
+   case SO_RCVBUF:
+   sk->sk_userlocks |= SOCK_RCVBUF_LOCK;
+   sk->sk_rcvbuf = max_t(int, val * 2, SOCK_MIN_RCVBUF);
+   break;
+   case SO_SNDBUF:
+   sk->sk_userlocks |= SOCK_SNDBUF_LOCK;
+   sk->sk_sndbuf = max_t(int, val * 2, SOCK_MIN_SNDBUF);
+   break;
+   case SO_MAX_PACING_RATE:
+   sk->sk_max_pacing_rate = val;
+   sk->sk_pacing_rate = min(sk->sk_pacing_rate,
+sk->sk_max_pacing_rate);
+   break;
+   case SO_PRIORITY:
+   sk->sk_priority = val;
+   break;
+   case SO_RCVLOWAT:
+   if (val < 0)
+   val = INT_MAX;
+   sk->sk_rcvlowat = val ? : 1;
+   break;
+   case SO_MARK:
+   sk->sk_mark = val;
+   break;
+   default:
+   ret = -EINVAL;
+   }
+   } else if (level == SOL_TCP &&
+  sk->sk_prot->setsockopt == tcp_setsockopt) {
+   /* Place holder */
+   ret = -EINVAL;
+   } else {
+   ret = -EINVAL;
+   }
+   return ret;
+}
+
+static const struct bpf_func_proto bpf_setsockopt_proto = {
+   .func   = bpf_setsockopt,
+   .gpl_only   = true,
+   .ret_type   = RET_INTEGER,
+   .arg1_type  = ARG_PTR_TO_CTX,
+   .arg2_type  = ARG_ANYTHING,
+   .arg3_type  = ARG_ANYTHING,
+   .arg4_type  = ARG_PTR_TO_MEM,
+   .arg5_type  = ARG_CONST_SIZE_OR_ZERO,
+};
+
 static const struct bpf_func_proto *
 bpf_base_func_proto(enum bpf_func_id func_id)
 {
@@ -2823,6 +2887,17 @@ lwt_inout_func_proto(enum bpf_func_id func_id)
 }
 
 static const struct bpf_func_proto *
+   sock_ops_func_proto(enum bpf_func_id func_id)
+{
+   switch (func_id) {
+   case BPF_FUNC_setsockopt:
+   return _setsockopt_proto;
+   default:
+   return bpf_base_func_proto(func_id);
+   }
+}
+
+static const struct bpf_func_proto *
 

[PATCH net-next v4 09/16] bpf: Sample BPF program to set buffer sizes

2017-06-28 Thread Lawrence Brakmo
This patch contains a BPF program to set initial receive window to
40 packets and send and receive buffers to 1.5MB. This would usually
be done after doing appropriate checks that indicate the hosts are
far enough away (i.e. large RTT).

Signed-off-by: Lawrence Brakmo 
---
 samples/bpf/Makefile|  1 +
 samples/bpf/tcp_bufs_kern.c | 77 +
 2 files changed, 78 insertions(+)
 create mode 100644 samples/bpf/tcp_bufs_kern.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index ca95528..3b300db 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -115,6 +115,7 @@ always += test_map_in_map_kern.o
 always += cookie_uid_helper_example.o
 always += tcp_synrto_kern.o
 always += tcp_rwnd_kern.o
+always += tcp_bufs_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 HOSTCFLAGS += -I$(srctree)/tools/lib/
diff --git a/samples/bpf/tcp_bufs_kern.c b/samples/bpf/tcp_bufs_kern.c
new file mode 100644
index 000..ccd3bbe
--- /dev/null
+++ b/samples/bpf/tcp_bufs_kern.c
@@ -0,0 +1,77 @@
+/* Copyright (c) 2017 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * BPF program to set initial receive window to 40 packets and send
+ * and receive buffers to 1.5MB. This would usually be done after
+ * doing appropriate checks that indicate the hosts are far enough
+ * away (i.e. large RTT).
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "bpf_helpers.h"
+
+#define DEBUG 1
+
+SEC("sockops")
+int bpf_bufs(struct bpf_sock_ops *skops)
+{
+   char fmt1[] = "BPF command: %d\n";
+   char fmt2[] = "  Returning %d\n";
+   int bufsize = 150;
+   int rwnd_init = 40;
+   int rv = 0;
+   int op;
+
+   /* For testing purposes, only execute rest of BPF program
+* if neither port numberis 55601
+*/
+   if (skops->remote_port != 55601 && skops->local_port != 55601)
+   return -1;
+
+   op = (int) skops->op;
+
+#ifdef DEBUG
+   bpf_trace_printk(fmt1, sizeof(fmt1), op);
+#endif
+
+   /* Usually there would be a check to insure the hosts are far
+* from each other so it makes sense to increase buffer sizes
+*/
+   switch (op) {
+   case BPF_SOCK_OPS_RWND_INIT:
+   rv = rwnd_init;
+   break;
+   case BPF_SOCK_OPS_TCP_CONNECT_CB:
+   /* Set sndbuf and rcvbuf of active connections */
+   rv = bpf_setsockopt(skops, SOL_SOCKET, SO_SNDBUF, ,
+   sizeof(bufsize));
+   rv = rv*100 + bpf_setsockopt(skops, SOL_SOCKET, SO_RCVBUF,
+, sizeof(bufsize));
+   break;
+   case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB:
+   /* Nothing to do */
+   break;
+   case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB:
+   /* Set sndbuf and rcvbuf of passive connections */
+   rv = bpf_setsockopt(skops, SOL_SOCKET, SO_SNDBUF, ,
+   sizeof(bufsize));
+   rv = rv*100 + bpf_setsockopt(skops, SOL_SOCKET, SO_RCVBUF,
+, sizeof(bufsize));
+   break;
+   default:
+   rv = -1;
+   }
+#ifdef DEBUG
+   bpf_trace_printk(fmt2, sizeof(fmt2), rv);
+#endif
+   skops->reply = rv;
+   return 1;
+}
+char _license[] SEC("license") = "GPL";
-- 
2.9.3



[PATCH net-next v4 02/16] bpf: program to load and attach sock_ops BPF progs

2017-06-28 Thread Lawrence Brakmo
The program load_sock_ops can be used to load sock_ops bpf programs and
to attach them to an existing (v2) cgroup. It can also be used to detach
sock_ops programs.

Examples:
load_sock_ops [-l]  
Load and attach a sock_ops program to the specified cgroup.
If "-l" is used, the program will continue to run to output the
BPF log buffer.
If the specified filename does not end in ".o", it appends
"_kern.o" to the name.

load_sock_ops -r 
Detaches the currently attached sock_ops program from the
specified cgroup.

Signed-off-by: Lawrence Brakmo 
---
 samples/bpf/Makefile|  3 ++
 samples/bpf/load_sock_ops.c | 97 +
 2 files changed, 100 insertions(+)
 create mode 100644 samples/bpf/load_sock_ops.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index e7ec9b8..015589b 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -36,6 +36,7 @@ hostprogs-y += lwt_len_hist
 hostprogs-y += xdp_tx_iptunnel
 hostprogs-y += test_map_in_map
 hostprogs-y += per_socket_stats_example
+hostprogs-y += load_sock_ops
 
 # Libbpf dependencies
 LIBBPF := ../../tools/lib/bpf/bpf.o
@@ -52,6 +53,7 @@ tracex3-objs := bpf_load.o $(LIBBPF) tracex3_user.o
 tracex4-objs := bpf_load.o $(LIBBPF) tracex4_user.o
 tracex5-objs := bpf_load.o $(LIBBPF) tracex5_user.o
 tracex6-objs := bpf_load.o $(LIBBPF) tracex6_user.o
+load_sock_ops-objs := bpf_load.o $(LIBBPF) load_sock_ops.o
 test_probe_write_user-objs := bpf_load.o $(LIBBPF) test_probe_write_user_user.o
 trace_output-objs := bpf_load.o $(LIBBPF) trace_output_user.o
 lathist-objs := bpf_load.o $(LIBBPF) lathist_user.o
@@ -130,6 +132,7 @@ HOSTLOADLIBES_tracex4 += -lelf -lrt
 HOSTLOADLIBES_tracex5 += -lelf
 HOSTLOADLIBES_tracex6 += -lelf
 HOSTLOADLIBES_test_cgrp2_sock2 += -lelf
+HOSTLOADLIBES_load_sock_ops += -lelf
 HOSTLOADLIBES_test_probe_write_user += -lelf
 HOSTLOADLIBES_trace_output += -lelf -lrt
 HOSTLOADLIBES_lathist += -lelf
diff --git a/samples/bpf/load_sock_ops.c b/samples/bpf/load_sock_ops.c
new file mode 100644
index 000..91aa00d
--- /dev/null
+++ b/samples/bpf/load_sock_ops.c
@@ -0,0 +1,97 @@
+/* Copyright (c) 2017 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#include 
+#include 
+#include 
+#include 
+#include "libbpf.h"
+#include "bpf_load.h"
+#include 
+#include 
+#include 
+#include 
+
+static void usage(char *pname)
+{
+   printf("USAGE:\n  %s [-l]  \n", pname);
+   printf("\tLoad and attach a sock_ops program to the specified "
+  "cgroup\n");
+   printf("\tIf \"-l\" is used, the program will continue to run\n");
+   printf("\tprinting the BPF log buffer\n");
+   printf("\tIf the specified filename does not end in \".o\", it\n");
+   printf("\tappends \"_kern.o\" to the name\n");
+   printf("\n");
+   printf("  %s -r \n", pname);
+   printf("\tDetaches the currently attached sock_ops program\n");
+   printf("\tfrom the specified cgroup\n");
+   printf("\n");
+   exit(0);
+}
+
+int main(int argc, char **argv)
+{
+   int logFlag = 0;
+   int error = 0;
+   char *cg_path;
+   char fn[500];
+   char *prog;
+   int cg_fd;
+
+   if (argc < 3)
+   usage(argv[0]);
+
+   if (!strcmp(argv[1], "-r")) {
+   cg_path = argv[2];
+   cg_fd = open(cg_path, O_DIRECTORY, O_RDONLY);
+   error = bpf_prog_detach(cg_fd, BPF_CGROUP_SOCK_OPS);
+   if (error) {
+   printf("ERROR: bpf_prog_detach: %d (%s)\n",
+  error, strerror(errno));
+   return 1;
+   }
+   return 0;
+   } else if (!strcmp(argv[1], "-h")) {
+   usage(argv[0]);
+   } else if (!strcmp(argv[1], "-l")) {
+   logFlag = 1;
+   if (argc < 4)
+   usage(argv[0]);
+   }
+
+   prog = argv[argc - 1];
+   cg_path = argv[argc - 2];
+   if (strlen(prog) > 480) {
+   fprintf(stderr, "ERROR: program name too long (> 480 chars)\n");
+   exit(2);
+   }
+   cg_fd = open(cg_path, O_DIRECTORY, O_RDONLY);
+
+   if (!strcmp(prog + strlen(prog)-2, ".o"))
+   strcpy(fn, prog);
+   else
+   sprintf(fn, "%s_kern.o", prog);
+   if (logFlag)
+   printf("loading bpf file:%s\n", fn);
+   if (load_bpf_file(fn)) {
+   printf("ERROR: load_bpf_file failed for: %s\n", fn);
+   printf("%s", bpf_log_buf);
+   return 1;
+   }
+   if (logFlag)
+   printf("TCP BPF Loaded %s\n", fn);
+
+   error = bpf_prog_attach(prog_fd[0], cg_fd, BPF_CGROUP_SOCK_OPS, 0);
+   if (error) {
+   printf("ERROR: 

[PATCH net-next v4 05/16] bpf: Support for setting initial receive window

2017-06-28 Thread Lawrence Brakmo
This patch adds support for setting the initial advertised window from
within a BPF_SOCK_OPS program. This can be used to support larger
initial cwnd values in environments where it is known to be safe.
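
On the BPF side this is just another return value. A sketch of the
relevant fragment of a sockops program (mirroring the sample added in a
later patch):

	int rv;

	switch (skops->op) {
	case BPF_SOCK_OPS_RWND_INIT:
		rv = 40;	/* initial advertised window, in packets */
		break;
	default:
		rv = -1;	/* -1 means use the kernel default */
	}
	skops->reply = rv;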

Signed-off-by: Lawrence Brakmo 
---
 include/net/tcp.h| 10 ++
 include/uapi/linux/bpf.h |  4 
 net/ipv4/tcp_minisocks.c |  9 -
 net/ipv4/tcp_output.c|  7 ++-
 4 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index cd9ef63..af404aa 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2069,4 +2069,14 @@ static inline u32 tcp_timeout_init(struct sock *sk, bool 
is_req_sock)
return timeout;
 }
 
+static inline u32 tcp_rwnd_init_bpf(struct sock *sk, bool is_req_sock)
+{
+   int rwnd;
+
+   rwnd = tcp_call_bpf(sk, is_req_sock, BPF_SOCK_OPS_RWND_INIT);
+
+   if (rwnd < 0)
+   rwnd = 0;
+   return rwnd;
+}
 #endif /* _TCP_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 4174668..cdec348 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -749,6 +749,10 @@ enum {
BPF_SOCK_OPS_TIMEOUT_INIT,  /* Should return SYN-RTO value to use or
 * -1 if default value should be used
 */
+   BPF_SOCK_OPS_RWND_INIT, /* Should return initial advertized
+* window (in packets) or -1 if default
+* value should be used
+*/
 };
 
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index d30ee31..bbaf3c6 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -351,6 +351,7 @@ void tcp_openreq_init_rwin(struct request_sock *req,
int full_space = tcp_full_space(sk_listener);
u32 window_clamp;
__u8 rcv_wscale;
+   u32 rcv_wnd;
int mss;
 
mss = tcp_mss_clamp(tp, dst_metric_advmss(dst));
@@ -363,6 +364,12 @@ void tcp_openreq_init_rwin(struct request_sock *req,
(req->rsk_window_clamp > full_space || req->rsk_window_clamp == 0))
req->rsk_window_clamp = full_space;
 
+   rcv_wnd = tcp_rwnd_init_bpf((struct sock *)req, true);
+   if (rcv_wnd == 0)
+   rcv_wnd = dst_metric(dst, RTAX_INITRWND);
+   else if (full_space < rcv_wnd * mss)
+   full_space = rcv_wnd * mss;
+
/* tcp_full_space because it is guaranteed to be the first packet */
tcp_select_initial_window(full_space,
mss - (ireq->tstamp_ok ? TCPOLEN_TSTAMP_ALIGNED : 0),
@@ -370,7 +377,7 @@ void tcp_openreq_init_rwin(struct request_sock *req,
>rsk_window_clamp,
ireq->wscale_ok,
_wscale,
-   dst_metric(dst, RTAX_INITRWND));
+   rcv_wnd);
ireq->rcv_wscale = rcv_wscale;
 }
 EXPORT_SYMBOL(tcp_openreq_init_rwin);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 5e478a1..e5f623f 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3267,6 +3267,7 @@ static void tcp_connect_init(struct sock *sk)
const struct dst_entry *dst = __sk_dst_get(sk);
struct tcp_sock *tp = tcp_sk(sk);
__u8 rcv_wscale;
+   u32 rcv_wnd;
 
/* We'll fix this up when we get a response from the other end.
 * See tcp_input.c:tcp_rcv_state_process case TCP_SYN_SENT.
@@ -3300,13 +3301,17 @@ static void tcp_connect_init(struct sock *sk)
(tp->window_clamp > tcp_full_space(sk) || tp->window_clamp == 0))
tp->window_clamp = tcp_full_space(sk);
 
+   rcv_wnd = tcp_rwnd_init_bpf(sk, false);
+   if (rcv_wnd == 0)
+   rcv_wnd = dst_metric(dst, RTAX_INITRWND);
+
tcp_select_initial_window(tcp_full_space(sk),
  tp->advmss - (tp->rx_opt.ts_recent_stamp ? 
tp->tcp_header_len - sizeof(struct tcphdr) : 0),
  >rcv_wnd,
  >window_clamp,
  sock_net(sk)->ipv4.sysctl_tcp_window_scaling,
  _wscale,
- dst_metric(dst, RTAX_INITRWND));
+ rcv_wnd);
 
tp->rx_opt.rcv_wscale = rcv_wscale;
tp->rcv_ssthresh = tp->rcv_wnd;
-- 
2.9.3



[PATCH net-next v4 04/16] bpf: Sample bpf program to set SYN/SYN-ACK RTOs

2017-06-28 Thread Lawrence Brakmo
The sample BPF program, tcp_synrto_kern.c, sets the SYN and SYN-ACK
RTOs to 10ms when both hosts are within the same datacenter (i.e.
small RTTs), using common IPv6 prefixes to determine that both hosts
are in the same data center.

Signed-off-by: Lawrence Brakmo 
---
 samples/bpf/Makefile  |  1 +
 samples/bpf/tcp_synrto_kern.c | 60 +++
 2 files changed, 61 insertions(+)
 create mode 100644 samples/bpf/tcp_synrto_kern.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 015589b..e29370a 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -113,6 +113,7 @@ always += lwt_len_hist_kern.o
 always += xdp_tx_iptunnel_kern.o
 always += test_map_in_map_kern.o
 always += cookie_uid_helper_example.o
+always += tcp_synrto_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 HOSTCFLAGS += -I$(srctree)/tools/lib/
diff --git a/samples/bpf/tcp_synrto_kern.c b/samples/bpf/tcp_synrto_kern.c
new file mode 100644
index 000..b16ac39
--- /dev/null
+++ b/samples/bpf/tcp_synrto_kern.c
@@ -0,0 +1,60 @@
+/* Copyright (c) 2017 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * BPF program to set SYN and SYN-ACK RTOs to 10ms when using IPv6 addresses
+ * and the first 5.5 bytes of the IPv6 addresses are the same (in this example
+ * that means both hosts are in the same datacenter.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "bpf_helpers.h"
+
+#define DEBUG 1
+
+SEC("sockops")
+int bpf_synrto(struct bpf_sock_ops *skops)
+{
+   char fmt1[] = "BPF command: %d\n";
+   char fmt2[] = "  Returning %d\n";
+   int rv = -1;
+   int op;
+
+   /* For testing purposes, only execute rest of BPF program
+* if neither port numberis 55601
+*/
+   if (skops->remote_port != 55601 && skops->local_port != 55601)
+   return -1;
+
+   op = (int) skops->op;
+
+#ifdef DEBUG
+   bpf_trace_printk(fmt1, sizeof(fmt1), op);
+#endif
+
+   /* Check for TIMEOUT_INIT operation and IPv6 addresses */
+   if (op == BPF_SOCK_OPS_TIMEOUT_INIT &&
+   skops->family == AF_INET6) {
+
+   /* If the first 5.5 bytes of the IPv6 address are the same
+* then both hosts are in the same datacenter
+* so use an RTO of 10ms
+*/
+   if (skops->local_ip6[0] == skops->remote_ip6[0] &&
+   (skops->local_ip6[1] & 0xfff0) ==
+   (skops->remote_ip6[1] & 0xfff0))
+   rv = 10;
+   }
+#ifdef DEBUG
+   bpf_trace_printk(fmt2, sizeof(fmt2), rv);
+#endif
+   skops->reply = rv;
+   return 1;
+}
+char _license[] SEC("license") = "GPL";
-- 
2.9.3



[PATCH net-next v4 10/16] bpf: Add support for changing congestion control

2017-06-28 Thread Lawrence Brakmo
Added support for changing congestion control for SOCK_OPS bpf
programs through the setsockopt bpf helper function. It also adds
a new SOCK_OPS op, BPF_SOCK_OPS_NEEDS_ECN, that is needed for
congestion controls, like dctcp, that need to enable ECN in the
SYN packets.
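
In a sockops program the two pieces look roughly as follows (a sketch
only, mirroring the tcp_cong_kern.c sample later in this series):

	char cong[] = "dctcp";
	int rv = -1;

	switch (skops->op) {
	case BPF_SOCK_OPS_NEEDS_ECN:
		rv = 1;		/* dctcp needs ECN negotiated on the SYN */
		break;
	case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB:
	case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB:
		rv = bpf_setsockopt(skops, SOL_TCP, TCP_CONGESTION,
				    cong, sizeof(cong));
		break;
	}
	skops->reply = rv;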

Signed-off-by: Lawrence Brakmo 
---
 include/net/tcp.h|  9 -
 include/uapi/linux/bpf.h |  3 +++
 net/core/filter.c| 11 +--
 net/ipv4/tcp.c   |  2 +-
 net/ipv4/tcp_cong.c  | 32 ++--
 net/ipv4/tcp_input.c |  3 ++-
 net/ipv4/tcp_output.c|  8 +---
 7 files changed, 50 insertions(+), 18 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index af404aa..4faa8d1 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1004,7 +1004,9 @@ void tcp_get_default_congestion_control(char *name);
 void tcp_get_available_congestion_control(char *buf, size_t len);
 void tcp_get_allowed_congestion_control(char *buf, size_t len);
 int tcp_set_allowed_congestion_control(char *allowed);
-int tcp_set_congestion_control(struct sock *sk, const char *name);
+int tcp_set_congestion_control(struct sock *sk, const char *name, bool load);
+void tcp_reinit_congestion_control(struct sock *sk,
+  const struct tcp_congestion_ops *ca);
 u32 tcp_slow_start(struct tcp_sock *tp, u32 acked);
 void tcp_cong_avoid_ai(struct tcp_sock *tp, u32 w, u32 acked);
 
@@ -2079,4 +2081,9 @@ static inline u32 tcp_rwnd_init_bpf(struct sock *sk, bool 
is_req_sock)
rwnd = 0;
return rwnd;
 }
+
+static inline bool tcp_bpf_ca_needs_ecn(struct sock *sk)
+{
+   return (tcp_call_bpf(sk, true, BPF_SOCK_OPS_NEEDS_ECN) == 1);
+}
 #endif /* _TCP_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 5b7207d..77d05ff 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -776,6 +776,9 @@ enum {
 * passive connection is
 * established
 */
+   BPF_SOCK_OPS_NEEDS_ECN, /* If connection's congestion control
+* needs ECN
+*/
 };
 
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/net/core/filter.c b/net/core/filter.c
index 167eca0..b36ec83 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2717,8 +2717,15 @@ BPF_CALL_5(bpf_setsockopt, struct bpf_sock_ops_kern *, 
bpf_sock,
}
} else if (level == SOL_TCP &&
   sk->sk_prot->setsockopt == tcp_setsockopt) {
-   /* Place holder */
-   ret = -EINVAL;
+   if (optname == TCP_CONGESTION) {
+   ret = tcp_set_congestion_control(sk, optval, false);
+   if (!ret && bpf_sock->op > BPF_SOCK_OPS_NEEDS_ECN)
+   /* replacing an existing ca */
+   tcp_reinit_congestion_control(sk,
+   inet_csk(sk)->icsk_ca_ops);
+   } else {
+   ret = -EINVAL;
+   }
} else {
ret = -EINVAL;
}
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 4c88d20..5199952 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2479,7 +2479,7 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
name[val] = 0;
 
lock_sock(sk);
-   err = tcp_set_congestion_control(sk, name);
+   err = tcp_set_congestion_control(sk, name, true);
release_sock(sk);
return err;
}
diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index 324c9bc..fde983f 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -189,8 +189,8 @@ void tcp_init_congestion_control(struct sock *sk)
INET_ECN_dontxmit(sk);
 }
 
-static void tcp_reinit_congestion_control(struct sock *sk,
- const struct tcp_congestion_ops *ca)
+void tcp_reinit_congestion_control(struct sock *sk,
+  const struct tcp_congestion_ops *ca)
 {
struct inet_connection_sock *icsk = inet_csk(sk);
 
@@ -333,8 +333,12 @@ int tcp_set_allowed_congestion_control(char *val)
return ret;
 }
 
-/* Change congestion control for socket */
-int tcp_set_congestion_control(struct sock *sk, const char *name)
+/* Change congestion control for socket. If load is false, then it is the
+ * responsibility of the caller to call tcp_init_congestion_control or
+ * tcp_reinit_congestion_control (if the current congestion control was
+ * already initialized.
+ */
+int tcp_set_congestion_control(struct sock *sk, const char *name, bool load)
 {
struct inet_connection_sock *icsk = inet_csk(sk);
const struct tcp_congestion_ops *ca;
@@ -344,21 +348,29 @@ int 

[PATCH net-next v4 11/16] bpf: Sample BPF program to set congestion control

2017-06-28 Thread Lawrence Brakmo
Sample BPF program that sets the congestion control to dctcp when both
hosts are within the same datacenter. In this example that is assumed
to be the case when the first 5.5 bytes of their IPv6 addresses are the same.

Signed-off-by: Lawrence Brakmo 
---
 samples/bpf/Makefile|  1 +
 samples/bpf/tcp_cong_kern.c | 74 +
 2 files changed, 75 insertions(+)
 create mode 100644 samples/bpf/tcp_cong_kern.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 3b300db..6fdf32d 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -116,6 +116,7 @@ always += cookie_uid_helper_example.o
 always += tcp_synrto_kern.o
 always += tcp_rwnd_kern.o
 always += tcp_bufs_kern.o
+always += tcp_cong_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 HOSTCFLAGS += -I$(srctree)/tools/lib/
diff --git a/samples/bpf/tcp_cong_kern.c b/samples/bpf/tcp_cong_kern.c
new file mode 100644
index 000..fdced0f
--- /dev/null
+++ b/samples/bpf/tcp_cong_kern.c
@@ -0,0 +1,74 @@
+/* Copyright (c) 2017 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * BPF program to set congestion control to dctcp when both hosts are
+ * in the same datacenter (as deteremined by IPv6 prefix).
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "bpf_helpers.h"
+
+#define DEBUG 1
+
+SEC("sockops")
+int bpf_cong(struct bpf_sock_ops *skops)
+{
+   char fmt1[] = "BPF command: %d\n";
+   char fmt2[] = "  Returning %d\n";
+   char cong[] = "dctcp";
+   int rv = 0;
+   int op;
+
+   /* For testing purposes, only execute rest of BPF program
+* if neither port numberis 55601
+*/
+   if (skops->remote_port != 55601 && skops->local_port != 55601)
+   return -1;
+
+   op = (int) skops->op;
+
+#ifdef DEBUG
+   bpf_trace_printk(fmt1, sizeof(fmt1), op);
+#endif
+
+   /* Check if both hosts are in the same datacenter. For this
+* example they are if the 1st 5.5 bytes in the IPv6 address
+* are the same.
+*/
+   if (skops->family == AF_INET6 &&
+   skops->local_ip6[0] == skops->remote_ip6[0] &&
+   (skops->local_ip6[1] & 0xfff0) ==
+   (skops->remote_ip6[1] & 0xfff0)) {
+   switch (op) {
+   case BPF_SOCK_OPS_NEEDS_ECN:
+   rv = 1;
+   break;
+   case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB:
+   rv = bpf_setsockopt(skops, SOL_TCP, TCP_CONGESTION,
+   cong, sizeof(cong));
+   break;
+   case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB:
+   rv = bpf_setsockopt(skops, SOL_TCP, TCP_CONGESTION,
+   cong, sizeof(cong));
+   break;
+   default:
+   rv = -1;
+   }
+   } else {
+   rv = -1;
+   }
+#ifdef DEBUG
+   bpf_trace_printk(fmt2, sizeof(fmt2), rv);
+#endif
+   skops->reply = rv;
+   return 1;
+}
+char _license[] SEC("license") = "GPL";
-- 
2.9.3



[PATCH net-next v4 03/16] bpf: Support for per connection SYN/SYN-ACK RTOs

2017-06-28 Thread Lawrence Brakmo
This patch adds support for setting per-connection SYN and
SYN-ACK RTOs from within a BPF_SOCK_OPS program, for example
to set small RTOs when it is known that both hosts are within a
datacenter.
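
From the BPF side the RTO is simply the program's return value. A sketch
of the relevant fragment of a sockops program (mirroring the
tcp_synrto_kern.c sample):

	int rv;

	switch (skops->op) {
	case BPF_SOCK_OPS_TIMEOUT_INIT:
		rv = 10;	/* small SYN/SYN-ACK RTO for nearby hosts */
		break;
	default:
		rv = -1;	/* use the kernel default */
	}
	skops->reply = rv;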

Signed-off-by: Lawrence Brakmo 
---
 include/net/tcp.h| 11 +++
 include/uapi/linux/bpf.h |  3 +++
 net/ipv4/tcp_input.c |  3 ++-
 net/ipv4/tcp_output.c|  2 +-
 4 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 804c27a..cd9ef63 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2058,4 +2058,15 @@ static inline int tcp_call_bpf(struct sock *sk, bool 
is_req_sock, int op)
 }
 #endif
 
+static inline u32 tcp_timeout_init(struct sock *sk, bool is_req_sock)
+{
+   int timeout;
+
+   timeout = tcp_call_bpf(sk, is_req_sock, BPF_SOCK_OPS_TIMEOUT_INIT);
+
+   if (timeout <= 0)
+   timeout = TCP_TIMEOUT_INIT;
+   return timeout;
+}
+
 #endif /* _TCP_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 617fb66..4174668 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -746,6 +746,9 @@ struct bpf_sock_ops {
  */
 enum {
BPF_SOCK_OPS_VOID,
+   BPF_SOCK_OPS_TIMEOUT_INIT,  /* Should return SYN-RTO value to use or
+* -1 if default value should be used
+*/
 };
 
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 2ab7e2f..0867b05 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6406,7 +6406,8 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
} else {
tcp_rsk(req)->tfo_listener = false;
if (!want_cookie)
-   inet_csk_reqsk_queue_hash_add(sk, req, 
TCP_TIMEOUT_INIT);
+   inet_csk_reqsk_queue_hash_add(sk, req,
+   tcp_timeout_init((struct sock *)req, true));
af_ops->send_synack(sk, dst, , req, ,
!want_cookie ? TCP_SYNACK_NORMAL :
   TCP_SYNACK_COOKIE);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 9a9c395..5e478a1 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3327,7 +3327,7 @@ static void tcp_connect_init(struct sock *sk)
tp->rcv_wup = tp->rcv_nxt;
tp->copied_seq = tp->rcv_nxt;
 
-   inet_csk(sk)->icsk_rto = TCP_TIMEOUT_INIT;
+   inet_csk(sk)->icsk_rto = tcp_timeout_init(sk, false);
inet_csk(sk)->icsk_retransmits = 0;
tcp_clear_retrans(tp);
 }
-- 
2.9.3



[PATCH net-next v4 01/16] bpf: BPF support for sock_ops

2017-06-28 Thread Lawrence Brakmo
Created a new BPF program type, BPF_PROG_TYPE_SOCK_OPS, and a corresponding
struct that allows BPF programs of this type to access some of the
socket's fields (such as IP addresses, ports, etc.). It uses the
existing bpf cgroups infrastructure so the programs can be attached per
cgroup with full inheritance support. The program will be called at
appropriate times to set relevant connection parameters such as buffer
sizes, SYN and SYN-ACK RTOs, etc., based on connection information such
as IP addresses, port numbers, etc.

Although there are already 3 mechanisms to set parameters (sysctls,
route metrics and setsockopts), this new mechanism provides some
distinct advantages. Unlike sysctls, it can set parameters per
connection. In contrast to route metrics, it can also use port numbers
and information provided by a user level program. In addition, it could
set parameters probabilistically for evaluation purposes (i.e. do
something different on 10% of the flows and compare results with the
other 90% of the flows). Also, in cases where IPv6 addresses contain
geographic information, the rules to make changes based on the distance
(or RTT) between the hosts are much easier than route metric rules and
can be global. Finally, unlike setsockopt, it does not require
application changes and it can be updated easily at any time.

Although the bpf cgroup framework already contains a sock related
program type (BPF_PROG_TYPE_CGROUP_SOCK), I created the new type
(BPF_PROG_TYPE_SOCK_OPS) because the existing type expects to be called
only once during the connection's lifetime. In contrast, the new
program type will be called multiple times from different places in the
network stack code.  For example, before sending SYN and SYN-ACKs to set
an appropriate timeout, when the connection is established to set
congestion control, etc. As a result it has an "op" field to specify the
type of operation requested.

The purpose of this new program type is to simplify setting connection
parameters, such as buffer sizes, TCP's SYN RTO, etc. For example, it is
easy to use facebook's internal IPv6 addresses to determine if both hosts
of a connection are in the same datacenter. Therefore, it is easy to
write a BPF program to choose a small SYN RTO value when both hosts are
in the same datacenter.

This patch only contains the framework to support the new BPF program
type, following patches add the functionality to set various connection
parameters.

This patch defines a new BPF program type: BPF_PROG_TYPE_SOCKET_OPS
and a new bpf syscall command to load a new program of this type:
BPF_PROG_LOAD_SOCKET_OPS.

Two new corresponding structs (one for the kernel one for the user/BPF
program):

/* kernel version */
struct bpf_sock_ops_kern {
struct sock *sk;
bool   is_req_sock:1;
__u32  op;
union {
__u32 reply;
__u32 replylong[4];
};
};

/* user version */
struct bpf_sock_ops {
__u32 op;
union {
__u32 reply;
__u32 replylong[4];
};
__u32 family;
__u32 remote_ip4;
__u32 local_ip4;
__u32 remote_ip6[4];
__u32 local_ip6[4];
__u32 remote_port;
__u32 local_port;
};

Currently there are two types of ops. The first type expects the BPF
program to return a value which is then used by the caller (or a
negative value to indicate the operation is not supported). The second
type expects state changes to be done by the BPF program, for example
through a setsockopt BPF helper function, and they ignore the return
value.

The reply fields of the bpf_sock_ops struct are there in case a bpf
program needs to return a value larger than an integer.
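
For illustration, a minimal program of the new type could look roughly
like the sketch below (not part of this patch; it assumes the
bpf_helpers.h wrappers from samples/bpf and only shows the op/reply
plumbing, since this patch does not define any real ops yet):

	#include <uapi/linux/bpf.h>
	#include "bpf_helpers.h"

	SEC("sockops")
	int bpf_min_sockops(struct bpf_sock_ops *skops)
	{
		int rv = -1;	/* -1 means "use the kernel default" */

		switch (skops->op) {
		case BPF_SOCK_OPS_VOID:
		default:
			break;
		}
		skops->reply = rv;
		return 1;
	}
	char _license[] SEC("license") = "GPL";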

Signed-off-by: Lawrence Brakmo 
---
 include/linux/bpf-cgroup.h |  18 +
 include/linux/bpf_types.h  |   1 +
 include/linux/filter.h |  10 +++
 include/net/tcp.h  |  37 ++
 include/uapi/linux/bpf.h   |  28 
 kernel/bpf/cgroup.c|  37 ++
 kernel/bpf/syscall.c   |   5 ++
 net/core/filter.c  | 170 +
 samples/bpf/bpf_load.c |  13 +++-
 9 files changed, 316 insertions(+), 3 deletions(-)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index c970a25..26449c7 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -7,6 +7,7 @@
 struct sock;
 struct cgroup;
 struct sk_buff;
+struct bpf_sock_ops_kern;
 
 #ifdef CONFIG_CGROUP_BPF
 
@@ -42,6 +43,10 @@ int __cgroup_bpf_run_filter_skb(struct sock *sk,
 int __cgroup_bpf_run_filter_sk(struct sock *sk,
   enum bpf_attach_type type);
 
+int __cgroup_bpf_run_filter_sock_ops(struct sock *sk,
+struct bpf_sock_ops_kern *sock_ops,
+enum bpf_attach_type type);
+
 /* Wrappers for __cgroup_bpf_run_filter_skb() guarded by cgroup_bpf_enabled. */
 #define 

[PATCH net-next v4 06/16] bpf: Sample bpf program to set initial window

2017-06-28 Thread Lawrence Brakmo
The sample bpf program, tcp_rwnd_kern.c, sets the initial
advertized window to 40 packets in an environment where
distinct IPv6 prefixes indicate that both hosts are not
in the same data center.

Signed-off-by: Lawrence Brakmo 
---
 samples/bpf/Makefile|  1 +
 samples/bpf/tcp_rwnd_kern.c | 61 +
 2 files changed, 62 insertions(+)
 create mode 100644 samples/bpf/tcp_rwnd_kern.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index e29370a..ca95528 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -114,6 +114,7 @@ always += xdp_tx_iptunnel_kern.o
 always += test_map_in_map_kern.o
 always += cookie_uid_helper_example.o
 always += tcp_synrto_kern.o
+always += tcp_rwnd_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 HOSTCFLAGS += -I$(srctree)/tools/lib/
diff --git a/samples/bpf/tcp_rwnd_kern.c b/samples/bpf/tcp_rwnd_kern.c
new file mode 100644
index 000..5daa649
--- /dev/null
+++ b/samples/bpf/tcp_rwnd_kern.c
@@ -0,0 +1,61 @@
+/* Copyright (c) 2017 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * BPF program to set initial receive window to 40 packets when using IPv6
+ * and the first 5.5 bytes of the IPv6 addresses are not the same (in this
+ * example that means both hosts are not the same datacenter.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "bpf_helpers.h"
+
+#define DEBUG 1
+
+SEC("sockops")
+int bpf_rwnd(struct bpf_sock_ops *skops)
+{
+   char fmt1[] = "BPF command: %d\n";
+   char fmt2[] = "  Returning %d\n";
+   int rv = -1;
+   int op;
+
+   /* For testing purposes, only execute rest of BPF program
+* if neither port numberis 55601
+*/
+   if (skops->remote_port != 55601 && skops->local_port != 55601)
+   return -1;
+
+   op = (int) skops->op;
+
+#ifdef DEBUG
+   bpf_trace_printk(fmt1, sizeof(fmt1), op);
+#endif
+
+   /* Check for RWND_INIT operation and IPv6 addresses */
+   if (op == BPF_SOCK_OPS_RWND_INIT &&
+   skops->family == AF_INET6) {
+
+   /* If the first 5.5 bytes of the IPv6 address are not the same
+* then both hosts are not in the same datacenter
+* so use a larger initial advertized window (40 packets)
+*/
+   if (skops->local_ip6[0] != skops->remote_ip6[0] ||
+   (skops->local_ip6[1] & 0xf000) !=
+   (skops->remote_ip6[1] & 0xf000))
+   bpf_trace_printk(fmt2, sizeof(fmt2), -1);
+   rv = 40;
+   }
+#ifdef DEBUG
+   bpf_trace_printk(fmt2, sizeof(fmt2), rv);
+#endif
+   skops->reply = rv;
+   return 1;
+}
+char _license[] SEC("license") = "GPL";
-- 
2.9.3



[PATCH net-next v4 00/16] bpf: BPF cgroup support for sock_ops

2017-06-28 Thread Lawrence Brakmo
Created a new BPF program type, BPF_PROG_TYPE_SOCK_OPS, and a corresponding
struct that allows BPF programs of this type to access some of the
socket's fields (such as IP addresses, ports, etc.) and to set
connection parameters such as buffer sizes, initial window, SYN/SYN-ACK
RTOs, etc.

Unlike current BPF program types that expect to be called at a particular
place in the network stack code, SOCK_OPS program can be called at
different places and use an "op" field to indicate the context. There
are currently two types of operations, those whose effect is through
their return value and those whose effect is through the new
bpf_setsockopt BPF helper function.

Example operations of the first type are:
  BPF_SOCK_OPS_TIMEOUT_INIT
  BPF_SOCK_OPS_RWND_INIT
  BPF_SOCK_OPS_NEEDS_ECN

Example operations of the second type are:
  BPF_SOCK_OPS_TCP_CONNECT_CB
  BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB
  BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB

Current operations are only called during connection establishment so
there should not be any BPF overheads after connection establishment. The
main idea is to use connection information from both hosts, such as IP
addresses and ports, to allow setting of per-connection parameters to
optimize the connection's performance.

Although there are already 3 mechanisms to set parameters (sysctls,
route metrics and setsockopts), this new mechanism provides some
distinct advantages. Unlike sysctls, it can set parameters per
connection. In contrast to route metrics, it can also use port numbers
and information provided by a user level program. In addition, it could
set parameters probabilistically for evaluation purposes (i.e. do
something different on 10% of the flows and compare results with the
other 90% of the flows). Also, in cases where IPv6 addresses contain
geographic information, the rules to make changes based on the distance
(or RTT) between the hosts are much easier than route metric rules and
can be global. Finally, unlike setsockopt, it does not require
application changes and it can be updated easily at any time.

It uses the existing bpf cgroups infrastructure so the programs can be
attached per cgroup with full inheritance support. Although the bpf cgroup
framework already contains a sock related program type 
(BPF_PROG_TYPE_CGROUP_SOCK),
I created the new type (BPF_PROG_TYPE_SOCK_OPS) because the existing type
expects to be called only once during the connection's lifetime. In contrast,
the new program type will be called multiple times from different places in the
network stack code.  For example, before sending SYN and SYN-ACKs to set
an appropriate timeout, when the connection is established to set congestion
control, etc. As a result it has an "op" field to specify the type of operation
requested.

This patch set also includes sample BPF programs to demonstrate the different
features.
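
For reference, loading and attaching one of the sample programs to a
cgroup boils down to the following sketch (error handling omitted; it
uses the bpf_load.c and libbpf helpers from samples/bpf, as in
load_sock_ops.c, and the cgroup path is just an example):

	int cg_fd = open("/mnt/cgroup2/mygroup", O_DIRECTORY, O_RDONLY);

	load_bpf_file("tcp_synrto_kern.o");	/* fills prog_fd[] */
	bpf_prog_attach(prog_fd[0], cg_fd, BPF_CGROUP_SOCK_OPS, 0);
	/* ... later, to remove the program: */
	bpf_prog_detach(cg_fd, BPF_CGROUP_SOCK_OPS);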

v2: Formatting changes, rebased to latest net-next

v3: Fixed build issues, changed socket_ops to sock_ops throughout,
fixed formatting issues, removed the syscall to load sock_ops
program and added functionality to use existing bpf attach and
bpf detach system calls, removed reader/writer locks in
sock_bpfops.c (used when saving sock_ops global program)
and fixed missing module refcount increment.

v4: Removed global sock_ops program and instead used existing cgroup bpf
infrastructure to support a new BPF_CGROUP_SOCK_OPS attach type.

Consists of the following patches:


 include/linux/bpf-cgroup.h |  18 
 include/linux/bpf_types.h  |   1 +
 include/linux/filter.h |  10 ++
 include/net/tcp.h  |  67 +++-
 include/uapi/linux/bpf.h   |  66 +++-
 kernel/bpf/cgroup.c|  37 +++
 kernel/bpf/syscall.c   |   5 +
 net/core/filter.c  | 271 
+++
 net/ipv4/tcp.c |   2 +-
 net/ipv4/tcp_cong.c|  32 --
 net/ipv4/tcp_fastopen.c|   1 +
 net/ipv4/tcp_input.c   |  10 +-
 net/ipv4/tcp_minisocks.c   |   9 +-
 net/ipv4/tcp_output.c  |  18 +++-
 samples/bpf/Makefile   |   9 ++
 samples/bpf/bpf_helpers.h  |   3 +
 samples/bpf/bpf_load.c |  13 ++-
 samples/bpf/load_sock_ops.c|  97 +
 samples/bpf/tcp_bufs_kern.c|  77 ++
 samples/bpf/tcp_clamp_kern.c   |  94 
 samples/bpf/tcp_cong_kern.c|  74 +
 samples/bpf/tcp_iw_kern.c  |  79 ++
 samples/bpf/tcp_rwnd_kern.c|  61 +++
 samples/bpf/tcp_synrto_kern.c  |  60 +++
 tools/include/uapi/linux/bpf.h |  66 +++-
 25 files changed, 1154 insertions(+), 26 deletions(-)



Re: [PATCH net-next 2/2] vxlan: add back error messages to vxlan_config_validate() as extended netlink acks

2017-06-28 Thread Jiri Benc
On Tue, 27 Jun 2017 22:47:58 +0200, Matthias Schiffer wrote:
>   if ((conf->flags & ~VXLAN_F_ALLOWED_GPE) ||
>   !(conf->flags & VXLAN_F_COLLECT_METADATA)) {
> + NL_SET_ERR_MSG(extack,
> +"unsupported combination of extensions");

Since we're redesigning this, let's be more helpful to the user.
There probably aren't going to be tremendous improvements here, but let's
try at least a bit.

"VXLAN GPE does not support this combination of extensions"

>   if (local_type & IPV6_ADDR_LINKLOCAL) {
>   if (!(remote_type & IPV6_ADDR_LINKLOCAL) &&
> - (remote_type != IPV6_ADDR_ANY))
> + (remote_type != IPV6_ADDR_ANY)) {
> + NL_SET_ERR_MSG(extack,
> +"invalid combination of 
> address scopes");

"invalid combination of local and remote address scopes"

>   return -EINVAL;
> + }
>  
>   conf->flags |= VXLAN_F_IPV6_LINKLOCAL;
>   } else {
>   if (remote_type ==
> - (IPV6_ADDR_UNICAST | IPV6_ADDR_LINKLOCAL))
> + (IPV6_ADDR_UNICAST | IPV6_ADDR_LINKLOCAL)) {
> + NL_SET_ERR_MSG(extack,
> +"invalid combination of 
> address scopes");

ditto

The rest looks good to me. Thanks a lot for doing the work, Matthias!

 Jiri


Re: [PATCH net-next 1/2] vxlan: change vxlan_validate() to use netlink_ext_ack for error reporting

2017-06-28 Thread Jiri Benc
On Tue, 27 Jun 2017 22:47:57 +0200, Matthias Schiffer wrote:
>   if (!is_valid_ether_addr(nla_data(tb[IFLA_ADDRESS]))) {
> - pr_debug("invalid all zero ethernet address\n");
> + NL_SET_ERR_MSG_ATTR(extack, tb[IFLA_ADDRESS],
> + "invalid ethernet address");

Could we be more specific here? This is better than nothing but still
not as helpful to the user as it could be. What about something like
"the provided ethernet address is not unicast"?

> - if (mtu < ETH_MIN_MTU || mtu > ETH_MAX_MTU)
> + if (mtu < ETH_MIN_MTU || mtu > ETH_MAX_MTU) {
> + NL_SET_ERR_MSG_ATTR(extack, tb[IFLA_MTU],
> + "invalid MTU");

"MTU must be between 68 and 65535"

> - if (id >= VXLAN_N_VID)
> + if (id >= VXLAN_N_VID) {
> + NL_SET_ERR_MSG_ATTR(extack, data[IFLA_VXLAN_ID],
> + "invalid VXLAN ID");

"VXLAN ID must be lower than 16777216"

>   if (ntohs(p->high) < ntohs(p->low)) {
> - pr_debug("port range %u .. %u not valid\n",
> -  ntohs(p->low), ntohs(p->high));
> + NL_SET_ERR_MSG_ATTR(extack, data[IFLA_VXLAN_PORT_RANGE],
> + "port range not valid");

Since you're getting rid of the values output, I'd rather suggest more
explicit "the first value of the port range must not be higher than the
second value" or so. Shorter wording is welcome :-)

Thanks,

 Jiri


Re: [PATCH v3 net-next 02/12] bpf/verifier: rework value tracking

2017-06-28 Thread Daniel Borkmann

On 06/27/2017 02:56 PM, Edward Cree wrote:

Tracks value alignment by means of tracking known & unknown bits.
Tightens some min/max value checks and fixes a couple of bugs therein.
If pointer leaks are allowed, and adjust_ptr_min_max_vals returns -EACCES,
  treat the pointer as an unknown scalar and try again, because we might be
  able to conclude something about the result (e.g. pointer & 0x40 is either
  0 or 0x40).

Signed-off-by: Edward Cree 

[...]

+static int adjust_reg_min_max_vals(struct bpf_verifier_env *env,
+  struct bpf_insn *insn)
+{
+   struct bpf_reg_state *regs = env->cur_state.regs, *dst_reg, *src_reg;
+   struct bpf_reg_state *ptr_reg = NULL, off_reg = {0};
+   u8 opcode = BPF_OP(insn->code);
+   int rc;
+
+   dst_reg = [insn->dst_reg];
+   check_reg_overflow(dst_reg);
+   src_reg = NULL;
+   if (dst_reg->type != SCALAR_VALUE)
+   ptr_reg = dst_reg;
+   if (BPF_SRC(insn->code) == BPF_X) {
+   src_reg = [insn->src_reg];
+   check_reg_overflow(src_reg);
+
+   if (src_reg->type != SCALAR_VALUE) {
+   if (dst_reg->type != SCALAR_VALUE) {
+   /* Combining two pointers by any ALU op yields
+* an arbitrary scalar.
+*/
+   if (!env->allow_ptr_leaks) {
+   verbose("R%d pointer %s pointer 
prohibited\n",
+   insn->dst_reg,
+   bpf_alu_string[opcode >> 4]);
+   return -EACCES;
+   }
+   mark_reg_unknown(regs, insn->dst_reg);
+   return 0;
+   } else {
+   /* scalar += pointer
+* This is legal, but we have to reverse our
+* src/dest handling in computing the range
+*/
+   rc = adjust_ptr_min_max_vals(env, insn,
+src_reg, dst_reg);
+   if (rc == -EACCES && env->allow_ptr_leaks) {
+   /* scalar += unknown scalar */
+   __mark_reg_unknown(_reg);
+   return adjust_scalar_min_max_vals(
+   env, insn,
+   dst_reg, _reg);


Could you elaborate on this one? If I understand it correctly, then
the scalar += pointer case would mean the following: given I have one
of the allowed pointer types in adjust_ptr_min_max_vals() then the
prior scalar type inherits the ptr type/id. I would then 'destroy' the
pointer value so we get a -EACCES on it. We mark the tmp off_reg as
scalar type, but shouldn't the actual dst_reg also be marked as such,
like in the pointer += scalar case below, so that we undo the prior
ptr_type inheritance?


+   }
+   return rc;
+   }
+   } else if (ptr_reg) {
+   /* pointer += scalar */
+   rc = adjust_ptr_min_max_vals(env, insn,
+dst_reg, src_reg);
+   if (rc == -EACCES && env->allow_ptr_leaks) {
+   /* unknown scalar += scalar */
+   __mark_reg_unknown(dst_reg);
+   return adjust_scalar_min_max_vals(
+   env, insn, dst_reg, src_reg);
+   }
+   return rc;
+   }
+   } else {

[...]


[PATCH iproute2 1/1] tc: updated ife man page.

2017-06-28 Thread Roman Mashak
Explain when skbmark encoding may fail.

Signed-off-by: Roman Mashak 
---
 man/man8/tc-ife.8 | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/man/man8/tc-ife.8 b/man/man8/tc-ife.8
index a8f1f28..1499a3f 100644
--- a/man/man8/tc-ife.8
+++ b/man/man8/tc-ife.8
@@ -59,7 +59,10 @@ Encode direction only. Enforce static encoding of specified 
metadata.
 .BR mark " [ "
 .IR u32_value " ]"
 The value to set for the skb mark. The u32 value is required only when
-.BR use " is specified."
+.BR use " is specified. If
+.BR mark " value is zero, it will not be encoded, instead
+"overlimits" statistics increment and
+.BR CONTROL " action is taken.
 .TP
 .BR prio " [ "
 .IR u32_value " ]"
-- 
1.9.1



Re: ath9k: remove useless variable assignment in ath_mci_intr()

2017-06-28 Thread Kalle Valo
"Gustavo A. R. Silva"  wrote:

> Value assigned to variable offset at line 551 is overwritten at line 562,
> before it can be used. This makes such variable assignment useless.
> 
> Addresses-Coverity-ID: 1226941
> Signed-off-by: Gustavo A. R. Silva 
> Signed-off-by: Kalle Valo 

Patch applied to ath-next branch of ath.git, thanks.

6788a3832c70 ath9k: remove useless variable assignment in ath_mci_intr()

-- 
https://patchwork.kernel.org/patch/9810609/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches



Re: [-next] ath10k: fix a bunch of spelling mistakes in messages

2017-06-28 Thread Kalle Valo
Colin Ian King  wrote:

> Fix the following spelling mistakes in messages:
>   syncronise -> synchronize
>   unusally -> unusually
>   addrress -> address
>   inverval -> interval
> 
> Signed-off-by: Colin Ian King 
> Signed-off-by: Kalle Valo 

Patch applied to ath-next branch of ath.git, thanks.

23de57975f14 ath10k: fix a bunch of spelling mistakes in messages

-- 
https://patchwork.kernel.org/patch/9808405/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches



Re: ath10k: add const to thermal_cooling_device_ops structure

2017-06-28 Thread Kalle Valo
Bhumika Goyal  wrote:

> Declare thermal_cooling_device_ops structure as const as it is only passed
> as an argument to the function thermal_cooling_device_register and this
> argument is of type const. So, declare the structure as const.
> 
> Signed-off-by: Bhumika Goyal 
> Signed-off-by: Kalle Valo 

Patch applied to ath-next branch of ath.git, thanks.

1cdb6c9fd433 ath10k: add const to thermal_cooling_device_ops structure

-- 
https://patchwork.kernel.org/patch/9801291/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches



[PATCH] amd-xgbe: fix spelling mistake: "avialable" -> "available"

2017-06-28 Thread Colin King
From: Colin Ian King 

Trivial fix to spelling mistake in netdev_err message

Signed-off-by: Colin Ian King 
---
 drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c b/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c
index 920566a3a599..67a2e52ad25d 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c
@@ -247,7 +247,7 @@ static int xgbe_set_pauseparam(struct net_device *netdev,
 
if (pause->autoneg && (pdata->phy.autoneg != AUTONEG_ENABLE)) {
netdev_err(netdev,
-  "autoneg disabled, pause autoneg not avialable\n");
+  "autoneg disabled, pause autoneg not available\n");
return -EINVAL;
}
 
-- 
2.11.0



Re: ti: wl18xx: add checks on wl18xx_top_reg_write() return value

2017-06-28 Thread Kalle Valo
"Gustavo A. R. Silva"  wrote:

> Check return value from call to wl18xx_top_reg_write(),
> so in case of error jump to goto label out and return.
> 
> Also, remove unnecessary value check before goto label out.
> 
> Addresses-Coverity-ID: 1226938
> Signed-off-by: Gustavo A. R. Silva 

The prefix should be "wl18xx:", I'll fix that.

-- 
https://patchwork.kernel.org/patch/9810591/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches



[PATCH 1/4] arcnet: add err_skb package for package status feedback

2017-06-28 Thread Michael Grzeschik
We need to track the status of our queued packets. This way the driving
process knows whether failed packets need to be retransmitted. For this
purpose we queue the transferred/failed packets back into the err_skb
message queue, along with some status information.
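
For context, a minimal userspace sketch (not part of this patch) of how
the queued notifications could be consumed: the driver hands the status
back through the socket error queue with origin SO_EE_ORIGIN_TXSTATUS,
ee_data carrying the skb's tskey and ee_info the ack status. The socket
setup is omitted and assumed to be a packet socket bound to the ARCnet
interface; only ee_origin is checked here because the exact cmsg
level/type depends on the socket family.

#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>
#include <linux/errqueue.h>

/* Drain one message from the error queue and print any TX status
 * notification found in its control data.
 */
static int arcnet_read_tx_status(int fd)
{
	char data[256], ctrl[512];
	struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = ctrl,
		.msg_controllen = sizeof(ctrl),
	};
	struct cmsghdr *cm;

	if (recvmsg(fd, &msg, MSG_ERRQUEUE) < 0)
		return -errno;

	for (cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) {
		struct sock_extended_err *ee =
			(struct sock_extended_err *)CMSG_DATA(cm);

		if (ee->ee_origin == SO_EE_ORIGIN_TXSTATUS)
			printf("tskey %u status %u\n",
			       ee->ee_data, ee->ee_info);
	}

	return 0;
}

In practice this would be driven by poll() reporting POLLERR on the
socket; judging by the interrupt handler changes below, a status of 2
means the packet was acknowledged, 1 an excessive NAK, and 0 no
acknowledge.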

Signed-off-by: Michael Grzeschik 
---
 drivers/net/arcnet/arcdevice.h |  4 +++
 drivers/net/arcnet/arcnet.c| 74 --
 2 files changed, 68 insertions(+), 10 deletions(-)

diff --git a/drivers/net/arcnet/arcdevice.h b/drivers/net/arcnet/arcdevice.h
index 20bfb9ba83ea2..cbb4f8566bbe5 100644
--- a/drivers/net/arcnet/arcdevice.h
+++ b/drivers/net/arcnet/arcdevice.h
@@ -269,6 +269,10 @@ struct arcnet_local {
 
struct timer_list   timer;
 
+   struct net_device *dev;
+   int reply_status;
+   struct tasklet_struct reply_tasklet;
+
/*
 * Buffer management: an ARCnet card has 4 x 512-byte buffers, each of
 * which can be used for either sending or receiving.  The new dynamic
diff --git a/drivers/net/arcnet/arcnet.c b/drivers/net/arcnet/arcnet.c
index 62ee439d58829..d87f4da29f113 100644
--- a/drivers/net/arcnet/arcnet.c
+++ b/drivers/net/arcnet/arcnet.c
@@ -51,6 +51,7 @@
 #include 
 #include 
 #include 
+#include <linux/errqueue.h>
 
 #include 
 
@@ -391,6 +392,52 @@ static void arcnet_timer(unsigned long data)
}
 }
 
+static void arcnet_reply_tasklet(unsigned long data)
+{
+   struct arcnet_local *lp = (struct arcnet_local *)data;
+
+   struct sk_buff *ackskb, *skb;
+   struct sock_exterr_skb *serr;
+   struct sock *sk;
+   int ret;
+
+   local_irq_disable();
+   skb = lp->outgoing.skb;
+   if (!skb || !skb->sk) {
+   local_irq_enable();
+   return;
+   }
+
+   sock_hold(skb->sk);
+   sk = skb->sk;
+   ackskb = skb_clone_sk(skb);
+   sock_put(skb->sk);
+
+   if (!ackskb) {
+   local_irq_enable();
+   return;
+   }
+
+   serr = SKB_EXT_ERR(ackskb);
+   memset(serr, 0, sizeof(*serr));
+   serr->ee.ee_errno = ENOMSG;
+   serr->ee.ee_origin = SO_EE_ORIGIN_TXSTATUS;
+   serr->ee.ee_data = skb_shinfo(skb)->tskey;
+   serr->ee.ee_info = lp->reply_status;
+
+   /* finally erasing outgoing skb */
+   dev_kfree_skb(lp->outgoing.skb);
+   lp->outgoing.skb = NULL;
+
+   ackskb->dev = lp->dev;
+
+   ret = sock_queue_err_skb(sk, ackskb);
+   if (ret)
+   kfree_skb(ackskb);
+
+   local_irq_enable();
+};
+
 struct net_device *alloc_arcdev(const char *name)
 {
struct net_device *dev;
@@ -401,6 +448,7 @@ struct net_device *alloc_arcdev(const char *name)
if (dev) {
struct arcnet_local *lp = netdev_priv(dev);
 
+   lp->dev = dev;
spin_lock_init(&lp->lock);
init_timer(&lp->timer);
lp->timer.data = (unsigned long) dev;
@@ -436,6 +484,9 @@ int arcnet_open(struct net_device *dev)
arc_cont(D_PROTO, "\n");
}
 
+   tasklet_init(&lp->reply_tasklet, arcnet_reply_tasklet,
+(unsigned long)lp);
+
arc_printk(D_INIT, dev, "arcnet_open: resetting card.\n");
 
/* try to put the card in a defined state - if it fails the first
@@ -527,6 +578,8 @@ int arcnet_close(struct net_device *dev)
netif_stop_queue(dev);
netif_carrier_off(dev);
 
+   tasklet_kill(&lp->reply_tasklet);
+
/* flush TX and disable RX */
lp->hw.intmask(dev, 0);
lp->hw.command(dev, NOTXcmd);   /* stop transmit */
@@ -635,13 +688,13 @@ netdev_tx_t arcnet_send_packet(struct sk_buff *skb,
txbuf = -1;
 
if (txbuf != -1) {
+   lp->outgoing.skb = skb;
if (proto->prepare_tx(dev, pkt, skb->len, txbuf) &&
!proto->ack_tx) {
/* done right away and we don't want to acknowledge
 *  the package later - forget about it now
 */
dev->stats.tx_bytes += skb->len;
-   dev_kfree_skb(skb);
} else {
/* do it the 'split' way */
lp->outgoing.proto = proto;
@@ -842,8 +895,16 @@ irqreturn_t arcnet_interrupt(int irq, void *dev_id)
 
/* a transmit finished, and we're interested in it. */
if ((status & lp->intmask & TXFREEflag) || lp->timed_out) {
+   int ackstatus;
lp->intmask &= ~(TXFREEflag | EXCNAKflag);
 
+   if (status & TXACKflag)
+   ackstatus = 2;
+   else if (lp->excnak_pending)
+   ackstatus = 1;
+   else
+   ackstatus = 0;
+
arc_printk(D_DURING, dev, "TX IRQ (stat=%Xh)\n",
 

[PATCH 2/4] arcnet: com20020-pci: add attribute to readback backplane status

2017-06-28 Thread Michael Grzeschik
We add a sysfs attribute to read back the backplane status of the
interface.
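
A minimal sketch of reading the new attribute back from userspace
(assumptions: the interface is named arc0; since the attribute group's
.name is NULL, backplane_mode appears directly under the netdev's sysfs
directory):

#include <stdio.h>

int main(void)
{
	char buf[16];
	/* hypothetical path; substitute the actual interface name */
	FILE *f = fopen("/sys/class/net/arc0/backplane_mode", "r");

	if (!f) {
		perror("backplane_mode");
		return 1;
	}
	if (fgets(buf, sizeof(buf), f))
		printf("backplane mode: %s", buf);
	fclose(f);
	return 0;
}

The attribute prints "true" or "false" plus a newline, matching the
sprintf() in backplane_mode_show() below.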

Signed-off-by: Michael Grzeschik 
---
 drivers/net/arcnet/com20020-pci.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/drivers/net/arcnet/com20020-pci.c b/drivers/net/arcnet/com20020-pci.c
index 239de38fbd6a5..dec300cac55f9 100644
--- a/drivers/net/arcnet/com20020-pci.c
+++ b/drivers/net/arcnet/com20020-pci.c
@@ -93,6 +93,27 @@ static void led_recon_set(struct led_classdev *led_cdev,
outb(!!value, priv->misc + ci->leds[card->index].red);
 }
 
+static ssize_t backplane_mode_show(struct device *dev,
+  struct device_attribute *attr,
+  char *buf)
+{
+   struct net_device *net_dev = to_net_dev(dev);
+   struct arcnet_local *lp = netdev_priv(net_dev);
+
+   return sprintf(buf, "%s\n", lp->backplane ? "true" : "false");
+}
+static DEVICE_ATTR_RO(backplane_mode);
+
+static struct attribute *com20020_state_attrs[] = {
+   &dev_attr_backplane_mode.attr,
+   NULL,
+};
+
+static struct attribute_group com20020_state_group = {
+   .name = NULL,
+   .attrs = com20020_state_attrs,
+};
+
 static void com20020pci_remove(struct pci_dev *pdev);
 
 static int com20020pci_probe(struct pci_dev *pdev,
@@ -168,6 +189,7 @@ static int com20020pci_probe(struct pci_dev *pdev,
 
dev->base_addr = ioaddr;
dev->dev_addr[0] = node;
+   dev->sysfs_groups[0] = &com20020_state_group;
dev->irq = pdev->irq;
lp->card_name = "PCI COM20020";
lp->card_flags = ci->flags;
-- 
2.11.0


