[PATCH net-next 3/3] nfp: flower: include geneve as supported offload tunnel type

2018-11-07 Thread John Hurley
Offload of geneve decap rules is supported in NFP. Include geneve in the
check for supported types.

Signed-off-by: John Hurley 
Reviewed-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c 
b/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
index 8e5bec0..170f314 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
@@ -190,6 +190,8 @@ static bool nfp_tun_is_netdev_to_offload(struct net_device 
*netdev)
return true;
if (netif_is_vxlan(netdev))
return true;
+   if (netif_is_geneve(netdev))
+   return true;
 
return false;
 }
-- 
2.7.4



[PATCH net-next 2/3] nfp: flower: use geneve and vxlan helpers

2018-11-07 Thread John Hurley
Make use of the recently added VXLAN and geneve helper functions to
determine the type of the netdev from its rtnl_link_ops.

Signed-off-by: John Hurley 
Reviewed-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/flower/action.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/action.c 
b/drivers/net/ethernet/netronome/nfp/flower/action.c
index 244dc26..2f67cd55 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/action.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/action.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "cmsg.h"
 #include "main.h"
@@ -94,13 +95,10 @@ nfp_fl_pre_lag(struct nfp_app *app, const struct tc_action 
*action,
 static bool nfp_fl_netdev_is_tunnel_type(struct net_device *out_dev,
 enum nfp_flower_tun_type tun_type)
 {
-   if (!out_dev->rtnl_link_ops)
-   return false;
-
-   if (!strcmp(out_dev->rtnl_link_ops->kind, "vxlan"))
+   if (netif_is_vxlan(out_dev))
return tun_type == NFP_FL_TUNNEL_VXLAN;
 
-   if (!strcmp(out_dev->rtnl_link_ops->kind, "geneve"))
+   if (netif_is_geneve(out_dev))
return tun_type == NFP_FL_TUNNEL_GENEVE;
 
return false;
-- 
2.7.4



[PATCH net-next 1/3] net: add netif_is_geneve()

2018-11-07 Thread John Hurley
Add a helper function to determine if the type of a netdev is geneve based
on its rtnl_link_ops. This allows drivers that may wish to offload tunnels
to check the underlying type of the device.

A recent patch added a similar helper to vxlan.h

Signed-off-by: John Hurley 
Reviewed-by: Jakub Kicinski 
---
 include/net/geneve.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/include/net/geneve.h b/include/net/geneve.h
index a7600ed..fc6a7e0 100644
--- a/include/net/geneve.h
+++ b/include/net/geneve.h
@@ -60,6 +60,12 @@ struct genevehdr {
struct geneve_opt options[];
 };
 
+static inline bool netif_is_geneve(const struct net_device *dev)
+{
+   return dev->rtnl_link_ops &&
+  !strcmp(dev->rtnl_link_ops->kind, "geneve");
+}
+
 #ifdef CONFIG_INET
 struct net_device *geneve_dev_create_fb(struct net *net, const char *name,
u8 name_assign_type, u16 dst_port);
-- 
2.7.4



[PATCH net-next 0/3] nfp: add and use tunnel netdev helpers

2018-11-07 Thread John Hurley
A recent patch introduced the function netif_is_vxlan() to verify the
tunnel type of a given netdev as vxlan.

Add a similar function to detect geneve netdevs and make use of this
function in the NFP driver. Also make use of the vxlan helper where
applicable.

John Hurley (3):
  net: add netif_is_geneve()
  nfp: flower: use geneve and vxlan helpers
  nfp: flower: include geneve as supported offload tunnel type

 drivers/net/ethernet/netronome/nfp/flower/action.c  | 8 +++-
 drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c | 2 ++
 include/net/geneve.h| 6 ++
 3 files changed, 11 insertions(+), 5 deletions(-)

-- 
2.7.4



Re: [RFC net-next v2 1/8] net: sched: register callbacks for indirect tc block binds

2018-10-29 Thread John Hurley
On Sun, Oct 28, 2018 at 11:10 AM Or Gerlitz  wrote:
>
> On Thu, Oct 25, 2018 at 3:28 PM John Hurley  wrote:
> > Currently drivers can register to receive TC block bind/unbind callbacks
> > by implementing the setup_tc ndo in any of their given netdevs. However,
> > drivers may also be interested in binds to higher level devices (e.g.
> > tunnel drivers) to potentially offload filters applied to them.
>
> > Introduce indirect block devs which allows drivers to register callbacks
> > for block binds on other devices. The calling driver is expected to
> > reference an 'owner' struct that it will pass to all block registrations.
> > This is used to track the callbacks from a given driver and free them if
> > the driver is removed while the upper level device is still active.
>
> Hi John,
>
> Maybe it would be better to follow the trusted environment model of the kernel
> and not protect the core from driver bugs? If the driver does things right 
> they
> will unregister before bailing out and if not, they will have to fix..
>

Hi Or,
The owner stuff just makes it easier for a driver to track the blocks
it has registered for and, in turn, release these when exiting.
We could just leave this up to the driver to ensure it properly cleans
up after itself.
I don't feel that strongly either way.
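
For reference, here is a rough sketch (illustrative only; the example_*
names are made up and error handling is trimmed) of how a driver might
use the owner API proposed in patch 1/8 to tie its indirect block
registrations to its own lifetime:

#include <linux/netdevice.h>
#include <net/pkt_cls.h>

static struct tcf_indr_block_owner *example_owner;

static int example_indr_setup_cb(struct net_device *dev, void *cb_priv,
                                 enum tc_setup_type type, void *type_data)
{
        /* Bind/unbind handling for the higher-level device goes here. */
        return -EOPNOTSUPP;
}

static int example_track_tunnel_dev(struct net_device *tunnel_dev,
                                    void *cb_priv)
{
        if (!example_owner) {
                example_owner = tc_indr_block_owner_create();
                if (!example_owner)
                        return -ENOMEM;
        }

        return tc_indr_block_cb_register(tunnel_dev, cb_priv,
                                         example_indr_setup_cb, cb_priv,
                                         example_owner);
}

static void example_driver_exit(void)
{
        /* Releases any remaining registrations, triggering unbinds. */
        tc_indr_block_owner_clean(example_owner);
}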

> > Freeing a callback will also trigger an unbind event (if necessary) to
> > direct the driver to remove any offloaded rules and unreg any block filter
> > callbacks.
>
> > Allow registering an indirect block dev callback for a device that is
> > already bound to a block. In this case (if it is an ingress block),
> > register and also trigger the callback meaning that any already installed
> > rules can be replayed to the calling driver.
>
> not just can be replayed.. they will be replayed, but through an
> existing (tc re-offload?)
> facility, correct?
>

Yes, currently in TC, when you register for rule callbacks on a block
that already has rules, those rules are replayed.
With the indirect block approach we still use the same mechanism for
requesting rule callbacks, so existing rules are replayed through that
same facility.

> Or.


Re: [RFC net-next v2 2/8] net: add netif_is_geneve()

2018-10-29 Thread John Hurley
On Fri, Oct 26, 2018 at 9:52 AM Sergei Shtylyov
 wrote:
>
> Hello!
>
> On 25.10.2018 15:26, John Hurley wrote:
>
> > Add a helper function to determine if the type of a netdev is geneve based
> > on its rtnl_link_ops. This allows drivers that may wish to ofload tunnels
>
> Offload?
>

Offload encap/decap to a hardware device such as a SmartNIC.
Sorry, I should have made this clearer.

> > to check the underlying type of the device.
> >
> > A recent patch added a similar helper to vxlan.h
> >
> > Signed-off-by: John Hurley 
> > Reviewed-by: Jakub Kicinski 
> [...]
>
> MBR, Sergei
>
>


Re: [RFC net-next v2 2/8] net: add netif_is_geneve()

2018-10-25 Thread John Hurley
On Thu, Oct 25, 2018 at 2:00 PM Jiri Pirko  wrote:
>
> Thu, Oct 25, 2018 at 02:26:51PM CEST, john.hur...@netronome.com wrote:
> >Add a helper function to determine if the type of a netdev is geneve based
> >on its rtnl_link_ops. This allows drivers that may wish to ofload tunnels
> >to check the underlying type of the device.
> >
> >A recent patch added a similar helper to vxlan.h
> >
> >Signed-off-by: John Hurley 
> >Reviewed-by: Jakub Kicinski 
>
> I don't understand why this and the next patch are part of this
> patchset. They don't seem directly related.

This is used in later patches that implement the indirect block
offload but I suppose it is not directly related.
We can probably move it to a separate patchset.
Thanks


Re: [RFC net-next v2 0/8] indirect tc block cb registration

2018-10-25 Thread John Hurley
On Thu, Oct 25, 2018 at 1:58 PM Jiri Pirko  wrote:
>
> Thu, Oct 25, 2018 at 02:26:49PM CEST, john.hur...@netronome.com wrote:
> >This patchset introduces an alternative to egdev offload by allowing a
> >driver to register for block updates when an external device (e.g. tunnel
> >netdev) is bound to a TC block. Drivers can track new netdevs or register
> >to existing ones to receive information on such events. Based on this,
> >they may register for block offload rules using already existing
> >functions.
> >
> >The patchset also implements this new indirect block registration in the
> >NFP driver to allow the offloading of tunnel rules. The use of egdev
> >offload (which is currently only used for tunnel offload) is subsequently
> >removed.
>
> John, I'm missing v1->v2 changelog. Could you please add it?
>
> Thanks!

Hi Jiri,
There's little change outside the NFP in v2 but here's a short changelog:

v1->v2:
- free allocated owner struct in block_owner_clean function
- add geneve type helper function
- move test stub in NFP (v1 patch 2) to full tunnel offload
implementation via indirect blocks (v2 patches 3-8)


[RFC net-next v2 8/8] nfp: flower: remove unnecessary code in flow lookup

2018-10-25 Thread John Hurley
Recent changes to NFP mean that stats updates from fw to driver no longer
require a flow lookup and (because egdev offload has been removed) the
ingress netdev for a lookup is now always known.

Remove obsolete code in a flow lookup that matches on host context and
that allows for a netdev to be NULL.

Signed-off-by: John Hurley 
Reviewed-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/flower/main.h |  3 +--
 drivers/net/ethernet/netronome/nfp/flower/metadata.c | 11 +++
 drivers/net/ethernet/netronome/nfp/flower/offload.c  |  6 ++
 3 files changed, 6 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.h 
b/drivers/net/ethernet/netronome/nfp/flower/main.h
index d8c8f0d..3d3a13f 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.h
@@ -20,7 +20,6 @@ struct nfp_fl_pre_lag;
 struct net_device;
 struct nfp_app;
 
-#define NFP_FL_STATS_CTX_DONT_CARE cpu_to_be32(0xffffffff)
 #define NFP_FL_STATS_ELEM_RS   FIELD_SIZEOF(struct nfp_fl_stats_id, \
 init_unalloc)
 #define NFP_FLOWER_MASK_ENTRY_RS   256
@@ -248,7 +247,7 @@ int nfp_modify_flow_metadata(struct nfp_app *app,
 
 struct nfp_fl_payload *
 nfp_flower_search_fl_table(struct nfp_app *app, unsigned long tc_flower_cookie,
-  struct net_device *netdev, __be32 host_ctx);
+  struct net_device *netdev);
 struct nfp_fl_payload *
 nfp_flower_remove_fl_table(struct nfp_app *app, unsigned long 
tc_flower_cookie);
 
diff --git a/drivers/net/ethernet/netronome/nfp/flower/metadata.c 
b/drivers/net/ethernet/netronome/nfp/flower/metadata.c
index 9b4711c..573a440 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/metadata.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/metadata.c
@@ -21,7 +21,6 @@ struct nfp_mask_id_table {
 struct nfp_fl_flow_table_cmp_arg {
struct net_device *netdev;
unsigned long cookie;
-   __be32 host_ctx;
 };
 
 static int nfp_release_stats_entry(struct nfp_app *app, u32 stats_context_id)
@@ -76,14 +75,13 @@ static int nfp_get_stats_entry(struct nfp_app *app, u32 
*stats_context_id)
 /* Must be called with either RTNL or rcu_read_lock */
 struct nfp_fl_payload *
 nfp_flower_search_fl_table(struct nfp_app *app, unsigned long tc_flower_cookie,
-  struct net_device *netdev, __be32 host_ctx)
+  struct net_device *netdev)
 {
struct nfp_fl_flow_table_cmp_arg flower_cmp_arg;
struct nfp_flower_priv *priv = app->priv;
 
flower_cmp_arg.netdev = netdev;
flower_cmp_arg.cookie = tc_flower_cookie;
-   flower_cmp_arg.host_ctx = host_ctx;
 
return rhashtable_lookup_fast(&priv->flow_table, &flower_cmp_arg,
  nfp_flower_table_params);
@@ -307,8 +305,7 @@ int nfp_compile_flow_metadata(struct nfp_app *app,
priv->stats[stats_cxt].bytes = 0;
priv->stats[stats_cxt].used = jiffies;
 
-   check_entry = nfp_flower_search_fl_table(app, flow->cookie, netdev,
-NFP_FL_STATS_CTX_DONT_CARE);
+   check_entry = nfp_flower_search_fl_table(app, flow->cookie, netdev);
if (check_entry) {
if (nfp_release_stats_entry(app, stats_cxt))
return -EINVAL;
@@ -353,9 +350,7 @@ static int nfp_fl_obj_cmpfn(struct rhashtable_compare_arg 
*arg,
const struct nfp_fl_flow_table_cmp_arg *cmp_arg = arg->key;
const struct nfp_fl_payload *flow_entry = obj;
 
-   if ((!cmp_arg->netdev || flow_entry->ingress_dev == cmp_arg->netdev) &&
-   (cmp_arg->host_ctx == NFP_FL_STATS_CTX_DONT_CARE ||
-flow_entry->meta.host_ctx_id == cmp_arg->host_ctx))
+   if (flow_entry->ingress_dev == cmp_arg->netdev)
return flow_entry->tc_flower_cookie != cmp_arg->cookie;
 
return 1;
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c 
b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index 392d292..07ff728 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -512,8 +512,7 @@ nfp_flower_del_offload(struct nfp_app *app, struct 
net_device *netdev,
if (nfp_netdev_is_nfp_repr(netdev))
port = nfp_port_from_netdev(netdev);
 
-   nfp_flow = nfp_flower_search_fl_table(app, flow->cookie, netdev,
- NFP_FL_STATS_CTX_DONT_CARE);
+   nfp_flow = nfp_flower_search_fl_table(app, flow->cookie, netdev);
if (!nfp_flow)
return -ENOENT;
 
@@ -561,8 +560,7 @@ nfp_flower_get_stats(struct nfp_app *app, struct net_device 
*netdev,
struct nfp_fl_payload *nfp_flow;
u32 ctx_id;
 
- 

[RFC net-next v2 5/8] nfp: flower: add infrastructure for indirect TC block register

2018-10-25 Thread John Hurley
Add support structures and functions that can be used by NFP to implement
the indirect block register functionality of TC.

Signed-off-by: John Hurley 
Reviewed-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/flower/main.c   |  13 +++
 drivers/net/ethernet/netronome/nfp/flower/main.h   |   8 ++
 .../net/ethernet/netronome/nfp/flower/offload.c| 129 +
 3 files changed, 150 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.c 
b/drivers/net/ethernet/netronome/nfp/flower/main.c
index 3a54728..518006c 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.c
@@ -568,8 +568,18 @@ static int nfp_flower_init(struct nfp_app *app)
goto err_cleanup_metadata;
}
 
+   INIT_LIST_HEAD(&app_priv->indr_block_cb_priv);
+   app_priv->indr_block_owner = tc_indr_block_owner_create();
+   if (!app_priv->indr_block_owner) {
+   err = -ENOMEM;
+   goto err_lag_clean;
+   }
+
return 0;
 
+err_lag_clean:
+   if (app_priv->flower_ext_feats & NFP_FL_FEATS_LAG)
+   nfp_flower_lag_cleanup(&app_priv->nfp_lag);
 err_cleanup_metadata:
nfp_flower_metadata_cleanup(app);
 err_free_app_priv:
@@ -588,6 +598,8 @@ static void nfp_flower_clean(struct nfp_app *app)
if (app_priv->flower_ext_feats & NFP_FL_FEATS_LAG)
nfp_flower_lag_cleanup(&app_priv->nfp_lag);
 
+   nfp_flower_clean_indr_block_priv(app);
+
nfp_flower_metadata_cleanup(app);
vfree(app->priv);
app->priv = NULL;
@@ -678,6 +690,7 @@ static void nfp_flower_stop(struct nfp_app *app)
unregister_netdevice_notifier(&app_priv->nfp_lag.lag_nb);
 
nfp_tunnel_config_stop(app);
+   tc_indr_block_owner_clean(app_priv->indr_block_owner);
 }
 
 const struct nfp_app_type app_flower = {
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.h 
b/drivers/net/ethernet/netronome/nfp/flower/main.h
index a91ac52..8b4bcf3 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.h
@@ -133,6 +133,8 @@ struct nfp_fl_lag {
  * @reify_wait_queue:  wait queue for repr reify response counting
  * @mtu_conf:  Configuration of repr MTU value
  * @nfp_lag:   Link aggregation data block
+ * @indr_block_cb_priv:List of priv data passed to indirect block 
registers
+ * @indr_block_owner:  Struct required for indirect blocks
  */
 struct nfp_flower_priv {
struct nfp_app *app;
@@ -166,6 +168,8 @@ struct nfp_flower_priv {
wait_queue_head_t reify_wait_queue;
struct nfp_mtu_conf mtu_conf;
struct nfp_fl_lag nfp_lag;
+   struct list_head indr_block_cb_priv;
+   struct tcf_indr_block_owner *indr_block_owner;
 };
 
 /**
@@ -269,5 +273,9 @@ int nfp_flower_lag_populate_pre_action(struct nfp_app *app,
   struct nfp_fl_pre_lag *pre_act);
 int nfp_flower_lag_get_output_id(struct nfp_app *app,
 struct net_device *master);
+void
+nfp_flower_register_indr_block(struct nfp_app *app, struct net_device *netdev);
+void nfp_flower_unregister_indr_block(struct net_device *netdev);
+void nfp_flower_clean_indr_block_priv(struct nfp_app *app);
 
 #endif
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c 
b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index 2c32edf..f701b2e 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -693,3 +693,132 @@ int nfp_flower_setup_tc(struct nfp_app *app, struct 
net_device *netdev,
return -EOPNOTSUPP;
}
 }
+
+struct nfp_flower_indr_block_cb_priv {
+   struct net_device *netdev;
+   struct nfp_app *app;
+   struct list_head list;
+};
+
+static struct nfp_flower_indr_block_cb_priv *
+nfp_flower_indr_block_cb_priv_lookup(struct nfp_app *app,
+struct net_device *netdev)
+{
+   struct nfp_flower_indr_block_cb_priv *cb_priv;
+   struct nfp_flower_priv *priv = app->priv;
+
+   /* All callback list access should be protected by RTNL. */
+   ASSERT_RTNL();
+
+   list_for_each_entry(cb_priv, &priv->indr_block_cb_priv, list)
+   if (cb_priv->netdev == netdev)
+   return cb_priv;
+
+   return NULL;
+}
+
+void nfp_flower_clean_indr_block_priv(struct nfp_app *app)
+{
+   struct nfp_flower_indr_block_cb_priv *cb_priv, *temp;
+   struct nfp_flower_priv *priv = app->priv;
+
+   list_for_each_entry_safe(cb_priv, temp, &priv->indr_block_cb_priv, list)
+   kfree(cb_priv);
+}
+
+static int nfp_flower_setup_indr_block_cb(enum tc_setup_type type,
+ void *type_data, void *cb_priv)
+{
+   struct nfp_flower_indr_block_cb_priv *priv = cb_priv;

[RFC net-next v2 6/8] nfp: flower: offload tunnel decap rules via indirect TC blocks

2018-10-25 Thread John Hurley
Previously, TC block tunnel decap rules were only offloaded when a
callback was triggered through registration of the rule's egress device.
This meant that the driver had no access to the ingress netdev and so
could not verify it was the same tunnel type that the rule implied.

Register tunnel devices for indirect TC block offloads in NFP, giving
access to new rules based on the ingress device rather than egress. Use
this to verify the netdev type of VXLAN and Geneve based rules and offload
the rules to HW if applicable.

Tunnel registration is done via a netdev notifier. On notifier
registration, this is triggered for already existing netdevs. This means
that NFP can register for offloads from devices that exist before it is
loaded (filter rules will be replayed from the TC core).

Signed-off-by: John Hurley 
Reviewed-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/flower/action.c  | 15 ---
 drivers/net/ethernet/netronome/nfp/flower/cmsg.h| 13 +
 drivers/net/ethernet/netronome/nfp/flower/offload.c | 11 +++
 drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c |  9 -
 4 files changed, 28 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/action.c 
b/drivers/net/ethernet/netronome/nfp/flower/action.c
index 04349c7..1260825 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/action.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/action.c
@@ -91,21 +91,6 @@ nfp_fl_pre_lag(struct nfp_app *app, const struct tc_action 
*action,
return act_size;
 }
 
-static bool nfp_fl_netdev_is_tunnel_type(struct net_device *out_dev,
-enum nfp_flower_tun_type tun_type)
-{
-   if (!out_dev->rtnl_link_ops)
-   return false;
-
-   if (!strcmp(out_dev->rtnl_link_ops->kind, "vxlan"))
-   return tun_type == NFP_FL_TUNNEL_VXLAN;
-
-   if (!strcmp(out_dev->rtnl_link_ops->kind, "geneve"))
-   return tun_type == NFP_FL_TUNNEL_GENEVE;
-
-   return false;
-}
-
 static int
 nfp_fl_output(struct nfp_app *app, struct nfp_fl_output *output,
  const struct tc_action *action, struct nfp_fl_payload *nfp_flow,
diff --git a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h 
b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
index 29d673a..06e2888 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "../nfp_app.h"
 #include "../nfpcore/nfp_cpp.h"
@@ -475,6 +476,18 @@ static inline int nfp_flower_cmsg_get_data_len(struct 
sk_buff *skb)
return skb->len - NFP_FLOWER_CMSG_HLEN;
 }
 
+static inline bool
+nfp_fl_netdev_is_tunnel_type(struct net_device *dev,
+enum nfp_flower_tun_type tun_type)
+{
+   if (netif_is_vxlan(dev))
+   return tun_type == NFP_FL_TUNNEL_VXLAN;
+   if (netif_is_geneve(dev))
+   return tun_type == NFP_FL_TUNNEL_GENEVE;
+
+   return false;
+}
+
 struct sk_buff *
 nfp_flower_cmsg_mac_repr_start(struct nfp_app *app, unsigned int num_ports);
 void
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c 
b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index f701b2e..1dc6044 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -128,6 +128,7 @@ nfp_flower_calc_opt_layer(struct 
flow_dissector_key_enc_opts *enc_opts,
 
 static int
 nfp_flower_calculate_key_layers(struct nfp_app *app,
+   struct net_device *netdev,
struct nfp_fl_key_ls *ret_key_ls,
struct tc_cls_flower_offload *flow,
bool egress,
@@ -186,8 +187,6 @@ nfp_flower_calculate_key_layers(struct nfp_app *app,
skb_flow_dissector_target(flow->dissector,
  
FLOW_DISSECTOR_KEY_ENC_CONTROL,
  flow->key);
-   if (!egress)
-   return -EOPNOTSUPP;
 
if (mask_enc_ctl->addr_type != 0xffff ||
enc_ctl->addr_type != FLOW_DISSECTOR_KEY_IPV4_ADDRS)
@@ -250,6 +249,10 @@ nfp_flower_calculate_key_layers(struct nfp_app *app,
default:
return -EOPNOTSUPP;
}
+
+   /* Ensure the ingress netdev matches the expected tun type. */
+   if (!nfp_fl_netdev_is_tunnel_type(netdev, *tun_type))
+   return -EOPNOTSUPP;
} else if (egress) {
/* Reject non tunnel matches offloaded to egress repr. */
return -EOPNOTSUPP;
@@ -451,8 +454,8 @@ nfp_flower_add_offload(struct nfp_app *app, struct 
net_

[RFC net-next v2 7/8] nfp: flower: remove TC egdev offloads

2018-10-25 Thread John Hurley
Previously, only tunnel decap rules required egdev registration for
offload in NFP. These are now supported via indirect TC block callbacks.

Remove the egdev code from NFP.

Signed-off-by: John Hurley 
Reviewed-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/flower/main.c   | 12 
 drivers/net/ethernet/netronome/nfp/flower/main.h   |  3 -
 .../net/ethernet/netronome/nfp/flower/metadata.c   |  1 +
 .../net/ethernet/netronome/nfp/flower/offload.c| 79 +-
 4 files changed, 17 insertions(+), 78 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.c 
b/drivers/net/ethernet/netronome/nfp/flower/main.c
index 518006c..45ab4be 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.c
@@ -146,23 +146,12 @@ nfp_flower_repr_netdev_stop(struct nfp_app *app, struct 
nfp_repr *repr)
return nfp_flower_cmsg_portmod(repr, false, repr->netdev->mtu, false);
 }
 
-static int
-nfp_flower_repr_netdev_init(struct nfp_app *app, struct net_device *netdev)
-{
-   return tc_setup_cb_egdev_register(netdev,
- nfp_flower_setup_tc_egress_cb,
- netdev_priv(netdev));
-}
-
 static void
 nfp_flower_repr_netdev_clean(struct nfp_app *app, struct net_device *netdev)
 {
struct nfp_repr *repr = netdev_priv(netdev);
 
kfree(repr->app_priv);
-
-   tc_setup_cb_egdev_unregister(netdev, nfp_flower_setup_tc_egress_cb,
-netdev_priv(netdev));
 }
 
 static void
@@ -711,7 +700,6 @@ const struct nfp_app_type app_flower = {
.vnic_init  = nfp_flower_vnic_init,
.vnic_clean = nfp_flower_vnic_clean,
 
-   .repr_init  = nfp_flower_repr_netdev_init,
.repr_preclean  = nfp_flower_repr_netdev_preclean,
.repr_clean = nfp_flower_repr_netdev_clean,
 
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.h 
b/drivers/net/ethernet/netronome/nfp/flower/main.h
index 8b4bcf3..d8c8f0d 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.h
@@ -213,7 +213,6 @@ struct nfp_fl_payload {
char *unmasked_data;
char *mask_data;
char *action_data;
-   bool ingress_offload;
 };
 
 extern const struct rhashtable_params nfp_flower_table_params;
@@ -262,8 +261,6 @@ void nfp_tunnel_del_ipv4_off(struct nfp_app *app, __be32 
ipv4);
 void nfp_tunnel_add_ipv4_off(struct nfp_app *app, __be32 ipv4);
 void nfp_tunnel_request_route(struct nfp_app *app, struct sk_buff *skb);
 void nfp_tunnel_keep_alive(struct nfp_app *app, struct sk_buff *skb);
-int nfp_flower_setup_tc_egress_cb(enum tc_setup_type type, void *type_data,
- void *cb_priv);
 void nfp_flower_lag_init(struct nfp_fl_lag *lag);
 void nfp_flower_lag_cleanup(struct nfp_fl_lag *lag);
 int nfp_flower_lag_reset(struct nfp_fl_lag *lag);
diff --git a/drivers/net/ethernet/netronome/nfp/flower/metadata.c 
b/drivers/net/ethernet/netronome/nfp/flower/metadata.c
index 48729bf..9b4711c 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/metadata.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/metadata.c
@@ -287,6 +287,7 @@ int nfp_compile_flow_metadata(struct nfp_app *app,
 
nfp_flow->meta.host_ctx_id = cpu_to_be32(stats_cxt);
nfp_flow->meta.host_cookie = cpu_to_be64(flow->cookie);
+   nfp_flow->ingress_dev = netdev;
 
new_mask_id = 0;
if (!nfp_check_mask_add(app, nfp_flow->mask_data,
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c 
b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index 1dc6044..392d292 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -131,7 +131,6 @@ nfp_flower_calculate_key_layers(struct nfp_app *app,
struct net_device *netdev,
struct nfp_fl_key_ls *ret_key_ls,
struct tc_cls_flower_offload *flow,
-   bool egress,
enum nfp_flower_tun_type *tun_type)
 {
struct flow_dissector_key_basic *mask_basic = NULL;
@@ -253,9 +252,6 @@ nfp_flower_calculate_key_layers(struct nfp_app *app,
/* Ensure the ingress netdev matches the expected tun type. */
if (!nfp_fl_netdev_is_tunnel_type(netdev, *tun_type))
return -EOPNOTSUPP;
-   } else if (egress) {
-   /* Reject non tunnel matches offloaded to egress repr. */
-   return -EOPNOTSUPP;
}
 
if (dissector_uses_key(flow->dissector, FLOW_DISSECTOR_KEY_BASIC)) {
@@ -376,7 +372,7 @@ nfp_flower_calculate_key_layers(struct nfp_app *app,
 }
 
 static struct nfp_fl_payload *
-nfp_flower_allocate_new(struct nfp_fl_key_ls *key_layer, bool egress)
+nfp

[RFC net-next v2 2/8] net: add netif_is_geneve()

2018-10-25 Thread John Hurley
Add a helper function to determine if the type of a netdev is geneve based
on its rtnl_link_ops. This allows drivers that may wish to ofload tunnels
to check the underlying type of the device.

A recent patch added a similar helper to vxlan.h

Signed-off-by: John Hurley 
Reviewed-by: Jakub Kicinski 
---
 include/net/geneve.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/include/net/geneve.h b/include/net/geneve.h
index a7600ed..fc6a7e0 100644
--- a/include/net/geneve.h
+++ b/include/net/geneve.h
@@ -60,6 +60,12 @@ struct genevehdr {
struct geneve_opt options[];
 };
 
+static inline bool netif_is_geneve(const struct net_device *dev)
+{
+   return dev->rtnl_link_ops &&
+  !strcmp(dev->rtnl_link_ops->kind, "geneve");
+}
+
 #ifdef CONFIG_INET
 struct net_device *geneve_dev_create_fb(struct net *net, const char *name,
u8 name_assign_type, u16 dst_port);
-- 
2.7.4



[RFC net-next v2 4/8] nfp: flower: allow non repr netdev offload

2018-10-25 Thread John Hurley
Previously the offload functions in NFP assumed that the ingress (or
egress) netdev passed to them was an nfp repr.

Modify the driver to permit the passing of non repr netdevs as the ingress
device for an offload rule candidate. This may include devices such as
tunnels. The driver should then base its offload decision on a combination
of ingress device and egress port for a rule.

Signed-off-by: John Hurley 
Reviewed-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/flower/action.c | 14 
 drivers/net/ethernet/netronome/nfp/flower/main.h   |  3 +-
 drivers/net/ethernet/netronome/nfp/flower/match.c  | 38 --
 .../net/ethernet/netronome/nfp/flower/offload.c| 33 +++
 4 files changed, 49 insertions(+), 39 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/action.c 
b/drivers/net/ethernet/netronome/nfp/flower/action.c
index 244dc26..04349c7 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/action.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/action.c
@@ -151,11 +151,12 @@ nfp_fl_output(struct nfp_app *app, struct nfp_fl_output 
*output,
/* Set action output parameters. */
output->flags = cpu_to_be16(tmp_flags);
 
-   /* Only offload if egress ports are on the same device as the
-* ingress port.
-*/
-   if (!switchdev_port_same_parent_id(in_dev, out_dev))
-   return -EOPNOTSUPP;
+   if (nfp_netdev_is_nfp_repr(in_dev)) {
+   /* Confirm ingress and egress are on same device. */
+   if (!switchdev_port_same_parent_id(in_dev, out_dev))
+   return -EOPNOTSUPP;
+   }
+
if (!nfp_netdev_is_nfp_repr(out_dev))
return -EOPNOTSUPP;
 
@@ -728,9 +729,8 @@ nfp_flower_loop_action(struct nfp_app *app, const struct 
tc_action *a,
*a_len += sizeof(struct nfp_fl_push_vlan);
} else if (is_tcf_tunnel_set(a)) {
struct ip_tunnel_info *ip_tun = tcf_tunnel_info(a);
-   struct nfp_repr *repr = netdev_priv(netdev);
 
-   *tun_type = nfp_fl_get_tun_from_act_l4_port(repr->app, a);
+   *tun_type = nfp_fl_get_tun_from_act_l4_port(app, a);
if (*tun_type == NFP_FL_TUNNEL_NONE)
return -EOPNOTSUPP;
 
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.h 
b/drivers/net/ethernet/netronome/nfp/flower/main.h
index 90045ba..a91ac52 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.h
@@ -226,7 +226,8 @@ void nfp_flower_metadata_cleanup(struct nfp_app *app);
 
 int nfp_flower_setup_tc(struct nfp_app *app, struct net_device *netdev,
enum tc_setup_type type, void *type_data);
-int nfp_flower_compile_flow_match(struct tc_cls_flower_offload *flow,
+int nfp_flower_compile_flow_match(struct nfp_app *app,
+ struct tc_cls_flower_offload *flow,
  struct nfp_fl_key_ls *key_ls,
  struct net_device *netdev,
  struct nfp_fl_payload *nfp_flow,
diff --git a/drivers/net/ethernet/netronome/nfp/flower/match.c 
b/drivers/net/ethernet/netronome/nfp/flower/match.c
index e54fb60..cdf7559 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/match.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/match.c
@@ -52,10 +52,13 @@ nfp_flower_compile_port(struct nfp_flower_in_port *frame, 
u32 cmsg_port,
return 0;
}
 
-   if (tun_type)
+   if (tun_type) {
frame->in_port = cpu_to_be32(NFP_FL_PORT_TYPE_TUN | tun_type);
-   else
+   } else {
+   if (!cmsg_port)
+   return -EOPNOTSUPP;
frame->in_port = cpu_to_be32(cmsg_port);
+   }
 
return 0;
 }
@@ -289,17 +292,21 @@ nfp_flower_compile_ipv4_udp_tun(struct 
nfp_flower_ipv4_udp_tun *frame,
}
 }
 
-int nfp_flower_compile_flow_match(struct tc_cls_flower_offload *flow,
+int nfp_flower_compile_flow_match(struct nfp_app *app,
+ struct tc_cls_flower_offload *flow,
  struct nfp_fl_key_ls *key_ls,
  struct net_device *netdev,
  struct nfp_fl_payload *nfp_flow,
  enum nfp_flower_tun_type tun_type)
 {
-   struct nfp_repr *netdev_repr;
+   u32 cmsg_port = 0;
int err;
u8 *ext;
u8 *msk;
 
+   if (nfp_netdev_is_nfp_repr(netdev))
+   cmsg_port = nfp_repr_get_port_id(netdev);
+
memset(nfp_flow->unmasked_data, 0, key_ls->key_size);
memset(nfp_flow->mask_data, 0, key_ls->key_size);
 
@@ -327,15 +334,13 @@ int nfp_flower_c

[RFC net-next v2 3/8] nfp: flower: include geneve as supported offload tunnel type

2018-10-25 Thread John Hurley
Offload of geneve decap rules is supported in NFP. Include geneve in the
check for supported types.

Signed-off-by: John Hurley 
Reviewed-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c 
b/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
index 8e5bec0..170f314 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
@@ -190,6 +190,8 @@ static bool nfp_tun_is_netdev_to_offload(struct net_device 
*netdev)
return true;
if (netif_is_vxlan(netdev))
return true;
+   if (netif_is_geneve(netdev))
+   return true;
 
return false;
 }
-- 
2.7.4



[RFC net-next v2 0/8] indirect tc block cb registration

2018-10-25 Thread John Hurley
This patchset introduces an alternative to egdev offload by allowing a
driver to register for block updates when an external device (e.g. tunnel
netdev) is bound to a TC block. Drivers can track new netdevs or register
to existing ones to receive information on such events. Based on this,
they may register for block offload rules using already existing
functions.

The patchset also implements this new indirect block registration in the
NFP driver to allow the offloading of tunnel rules. The use of egdev
offload (which is currently only used for tunnel offload) is subsequently
removed.

John Hurley (8):
  net: sched: register callbacks for indirect tc block binds
  net: add netif_is_geneve()
  nfp: flower: include geneve as supported offload tunnel type
  nfp: flower: allow non repr netdev offload
  nfp: flower: add infrastructure for indirect TC block register
  nfp: flower: offload tunnel decap rules via indirect TC blocks
  nfp: flower: remove TC egdev offloads
  nfp: flower: remove unnecessary code in flow lookup

 drivers/net/ethernet/netronome/nfp/flower/action.c |  29 +-
 drivers/net/ethernet/netronome/nfp/flower/cmsg.h   |  13 +
 drivers/net/ethernet/netronome/nfp/flower/main.c   |  25 +-
 drivers/net/ethernet/netronome/nfp/flower/main.h   |  17 +-
 drivers/net/ethernet/netronome/nfp/flower/match.c  |  38 +--
 .../net/ethernet/netronome/nfp/flower/metadata.c   |  12 +-
 .../net/ethernet/netronome/nfp/flower/offload.c| 246 +++--
 .../ethernet/netronome/nfp/flower/tunnel_conf.c|  11 +-
 include/net/geneve.h   |   6 +
 include/net/pkt_cls.h  |  56 
 include/net/sch_generic.h  |   3 +
 net/sched/cls_api.c| 299 -
 12 files changed, 609 insertions(+), 146 deletions(-)

-- 
2.7.4



[RFC net-next v2 1/8] net: sched: register callbacks for indirect tc block binds

2018-10-25 Thread John Hurley
Currently drivers can register to receive TC block bind/unbind callbacks
by implementing the setup_tc ndo in any of their given netdevs. However,
drivers may also be interested in binds to higher level devices (e.g.
tunnel drivers) to potentially offload filters applied to them.

Introduce indirect block devs which allow drivers to register callbacks
for block binds on other devices. The calling driver is expected to
reference an 'owner' struct that it will pass to all block registrations.
This is used to track the callbacks from a given driver and free them if
the driver is removed while the upper level device is still active.
Freeing a callback will also trigger an unbind event (if necessary) to
direct the driver to remove any offloaded rules and unreg any block filter
callbacks.

Allow registering an indirect block dev callback for a device that is
already bound to a block. In this case (if it is an ingress block),
register and also trigger the callback meaning that any already installed
rules can be replayed to the calling driver.

Signed-off-by: John Hurley 
Signed-off-by: Jakub Kicinski 
---
 include/net/pkt_cls.h |  56 +
 include/net/sch_generic.h |   3 +
 net/sched/cls_api.c   | 299 +-
 3 files changed, 357 insertions(+), 1 deletion(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 72ffb31..1b47837 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -37,6 +37,7 @@ struct tcf_block_ext_info {
 };
 
 struct tcf_block_cb;
+struct tcf_indr_block_owner;
 bool tcf_queue_work(struct rcu_work *rwork, work_func_t func);
 
 #ifdef CONFIG_NET_CLS
@@ -81,6 +82,20 @@ void __tcf_block_cb_unregister(struct tcf_block *block,
   struct tcf_block_cb *block_cb);
 void tcf_block_cb_unregister(struct tcf_block *block,
 tc_setup_cb_t *cb, void *cb_ident);
+int __tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
+   tc_indr_block_bind_cb_t *cb, void *cb_ident,
+   struct tcf_indr_block_owner *owner);
+int tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
+ tc_indr_block_bind_cb_t *cb, void *cb_ident,
+ struct tcf_indr_block_owner *owner);
+void __tc_indr_block_cb_unregister(struct net_device *dev,
+  tc_indr_block_bind_cb_t *cb, void *cb_ident);
+void tc_indr_block_cb_unregister(struct net_device *dev,
+tc_indr_block_bind_cb_t *cb,
+void *cb_ident);
+
+struct tcf_indr_block_owner *tc_indr_block_owner_create(void);
+void tc_indr_block_owner_clean(struct tcf_indr_block_owner *owner);
 
 int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 struct tcf_result *res, bool compat_mode);
@@ -183,6 +198,47 @@ void tcf_block_cb_unregister(struct tcf_block *block,
 {
 }
 
+static inline
+int __tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
+   tc_indr_block_bind_cb_t *cb,
+   void *cb_ident,
+   struct tcf_indr_block_owner *owner)
+{
+   return 0;
+}
+
+static inline
+int tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
+ tc_indr_block_bind_cb_t *cb, void *cb_ident,
+ struct tcf_indr_block_owner *owner)
+{
+   return 0;
+}
+
+static inline
+void __tc_indr_block_cb_unregister(struct net_device *dev,
+  tc_indr_block_bind_cb_t *cb,
+  void *cb_ident)
+{
+}
+
+static inline
+void tc_indr_block_cb_unregister(struct net_device *dev,
+tc_indr_block_bind_cb_t *cb,
+void *cb_ident)
+{
+}
+
+static inline struct tcf_indr_block_owner *tc_indr_block_owner_create(void)
+{
+   /* NULL would mean an error, only CONFIG_NET_CLS can dereference this */
+   return (void *)1;
+}
+
+static inline void tc_indr_block_owner_clean(struct tcf_indr_block_owner 
*owner)
+{
+}
+
 static inline int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
   struct tcf_result *res, bool compat_mode)
 {
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 4d73642..8301581 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -24,6 +24,9 @@ struct bpf_flow_keys;
 typedef int tc_setup_cb_t(enum tc_setup_type type,
  void *type_data, void *cb_priv);
 
+typedef int tc_indr_block_bind_cb_t(struct net_device *dev, void *cb_priv,
+   enum tc_setup_type type, void *type_data);
+
 struct qdisc_rate_table {
struct tc_ratespec rate;
u32 data[256];
diff --git a/net/sched/cls_api.c b/net/sched

Re: [RFC 0/2] net: sched: indirect/remote setup tc block cb registering

2018-10-11 Thread John Hurley
On Wed, Oct 10, 2018 at 2:38 PM Or Gerlitz  wrote:
>
> On Thu, Oct 4, 2018 at 8:19 PM Jakub Kicinski
>  wrote:
> > On Thu, 4 Oct 2018 17:20:43 +0100, John Hurley wrote:
> > > > > In this case the hw driver will receive the rules from the tunnel 
> > > > > device directly.
> > > > > The driver can then offload them as it sees fit.
> > > >
> > > > if both instances of the hw drivers (uplink0, uplink1) register to get
> > > > the rules installed on the block of the tunnel device we have exactly
> > > > what we want, isn't that?
> > > >
> > >
> > > The design here is that each hw driver should only need to register
> > > for callbacks on a 'higher level' device's block once.
> > > When a callback is triggered the driver receives one instance of the
> > > rule and can make its own decision about what to do.
> > > This is slightly different from registering ingress devs where each
> > > uplink registers for its own block.
> > > It is probably more akin to the egdev setup in that if a rule on a
> > > block egresses to an uplink, the driver receives 1 callback on the
> > > rule, irrespective of how many underlying netdevs are on the block.
> >
> > Right, though nothing stops the driver from registering multiple
> > callbacks for the same device, if its somehow useful.
>
> I must be missing something.. put uplink bonding a side. If the user
> is setting tc ingress rule
> on a tunnel device (vxlan0/gre0) over a system with multiple unrelated
> NICs/uplinks that support
> TC decap offload, wouldn't each of these netdevs want to install the
> rule into HW? why do we want
> the HW driver to duplicate the rule between the potential candidates
> among the netdev instances they created?
> and not each of them to get the callback and decide??
>
> we want each netdev instance of these NIC

Hi Or,
It depends on how we want to offload tunnels.
In the case of the NFP, we offload 1 instance of a tunnel rule, not
one instance per uplink.
With this, it makes sense to have 1 callback per tunnel netdev (and
per driver) rather than per uplink (although as Jakub pointed out, the
option is there to register more callbacks).
If we consider the egdev model for offload, we only got a single
callback per rule if the egress device was registered and did not know
the ingress dev - is this not similar in that the driver gets 1
callback for the rule and decides what to do with it?
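
To illustrate what I mean (a sketch only, not the NFP code; the
example_* names are invented), the driver can simply remember which
netdevs it has already registered for and call the proposed register
API once per tunnel device:

#include <linux/list.h>
#include <linux/slab.h>
#include <net/pkt_cls.h>

struct example_indr_entry {
        struct net_device *netdev;
        struct list_head list;
};

static LIST_HEAD(example_indr_list);

/* Register the indirect block callback at most once per tunnel netdev. */
static int example_register_once(struct net_device *netdev, void *app_priv,
                                 tc_indr_block_bind_cb_t *cb,
                                 struct tcf_indr_block_owner *owner)
{
        struct example_indr_entry *entry;
        int err;

        list_for_each_entry(entry, &example_indr_list, list)
                if (entry->netdev == netdev)
                        return 0;       /* already registered for this dev */

        entry = kzalloc(sizeof(*entry), GFP_KERNEL);
        if (!entry)
                return -ENOMEM;

        entry->netdev = netdev;
        list_add(&entry->list, &example_indr_list);

        err = tc_indr_block_cb_register(netdev, app_priv, cb, app_priv, owner);
        if (err) {
                list_del(&entry->list);
                kfree(entry);
        }

        return err;
}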
John


Re: [RFC 0/2] net: sched: indirect/remote setup tc block cb registering

2018-10-04 Thread John Hurley
On Thu, Oct 4, 2018 at 4:53 PM Or Gerlitz  wrote:
>
> On Thu, Oct 4, 2018 at 6:44 PM John Hurley  wrote:
> > On Thu, Oct 4, 2018 at 3:28 PM Or Gerlitz  wrote:
> > > On Thu, Oct 4, 2018 at 7:55 AM Jakub Kicinski 
> > >  wrote:
>
> > > > This patchset introduces as an alternative to egdev offload by allowing 
> > > > a
> > > > driver to register for block updates when an external device (e.g. 
> > > > tunnel
> > > > netdev) is bound to a TC block.
>
> > > In a slightly different but hopefully somehow related context, regarding
> > > the case of flow offloading in the presence of upper devices 
> > > (specifically LAG),
> > > your ovs user patch [1]  applied TC block sharing on the slave of lag 
> > > (bond/team)
> > > device which serves as ovs port. This way, flows that are installed on
> > > the bond are propagated to both uplink devices - good!
>
> > > However, when tunneling comes into play, the bond device is not part of
> > > the virtual switch but rather the tunnel device, so the SW DP is
> > >
> > > wire --> hw driver --> bond --> stack --> tunnel driver --> virtual switch
> > >
> > > So now, if the HW driver uses your new facility to register for rules 
> > > installed on the
> > > tunnel device, we are again properly sharing (duplicating) the rules
> > > to both uplinks, right?!
> > >
> > > [1] d22f892 netdev-linux: monitor and offload LAG slaves to TC
>
> > Because the bond is not on the vSwitch, the TC rule will not be
> > offloaded to it and therefore not duplicated to its devices.
>
> indeed
>
> > In this case the hw driver will receive the rules from the tunnel device 
> > directly.
> > The driver can then offload them as it sees fit.
>
> if both instances of the hw drivers (uplink0, uplink1) register to get
> the rules installed on the block of the tunnel device we have exactly
> what we want, isn't that?
>

The design here is that each hw driver should only need to register
for callbacks on a 'higher level' device's block once.
When a callback is triggered the driver receives one instance of the
rule and can make its own decision about what to do.
This is slightly different from registering ingress devs where each
uplink registers for its own block.
It is probably more akin to the egdev setup in that if a rule on a
block egresses to an uplink, the driver receives 1 callback on the
rule, irrespective of how many underlying netdevs are on the block.

> > Currently, this setup would be offloaded via egdev.
>
> not following, egdev I thought could be removed... and it's not needed
> as I explained above, unless I miss something?

Apologies - my use of 'currently' meant with the current upstream
kernel (i.e. prior to this patch).


Re: [RFC 0/2] net: sched: indirect/remote setup tc block cb registering

2018-10-04 Thread John Hurley
On Thu, Oct 4, 2018 at 3:28 PM Or Gerlitz  wrote:
>
> On Thu, Oct 4, 2018 at 7:55 AM Jakub Kicinski
>  wrote:
> >
> > Hi!
> >
> > This set contains a rough RFC implementation of a proposed [1] replacement
> > for egdev cls_flower offloads.  I did some last minute restructuring
> > and removal of parts I felt were unnecessary, so if there are glaring bugs
> > they are probably mine, not John's :)  but hopefully this will give an idea
> > of the general direction.  We need to beef up the driver part to see how
> > it fully comes together.
> >
> > [1] http://vger.kernel.org/netconf2018_files/JakubKicinski_netconf2018.pdf
> > slides 10-13
> >
> > John's says:
> >
> > This patchset introduces as an alternative to egdev offload by allowing a
> > driver to register for block updates when an external device (e.g. tunnel
> > netdev) is bound to a TC block.
>
> In a slightly different but hopefully somehow related context, regarding
> the case of flow offloading in the presence of upper devices (specifically 
> LAG),
> your ovs user patch [1]  applied TC block sharing on the slave of lag
> (bond/team)
> device which serves as ovs port. This way, flows that are installed on
> the bond are
> propagated to both uplink devices - good!
>
> However, when tunneling comes into play, the bond device is not part of
> the virtual switch but rather the tunnel device, so the SW DP is
>
> wire --> hw driver --> bond --> stack --> tunnel driver --> virtual switch
>
> So now, if the HW driver uses your new facility to register for rules
> installed on the
> tunnel device, we are again properly sharing (duplicating) the rules
> to both uplinks, right?!
>
> [1] d22f892 netdev-linux: monitor and offload LAG slaves to TC
>

Hi Or,
In this case the hw driver will receive the rules from the tunnel
device directly.
The driver can then offload them as it sees fit.
Because the bond is not on the vSwitch, the TC rule will not be
offloaded to it and therefore not duplicated to its devices.
Currently, this setup would be offloaded via egdev.

> > Drivers can track new netdevs or register
> > to existing ones to receive information on such events. Based on this,
> > they may register for block offload rules using already existing functions.
>
> Just to make it clear, (part of) the claim to fame here is that once
> we have this
> code in, we can just go and remove all the egdev related code from the
> kernel (both
> core and drivers), right? only nfp and mlx5 use egdev, so the removal
> should be simple
> exercise.
>

Along with simplifying things and removing the need for duplicate rule
offload checks, I see (at least) 2 main functional benefits of using
this instead of egdev:
1. we can get access to the ingress netdev and so can check for things
like tunnel type rather than relying on TC rules and well known ports
to determine this.
2. we can offload rules that do not have an uplink/repr as ingress or
egress dev (currently, the hw driver will not receive a callback) -
e.g. VXLAN --> bond.

> > Included with this RFC is a patch to the NFP driver. This is only supposed
> > to provide an example of how the remote block setup can be used.
>
> We will look and play with the patches next week and provide feedback, cool
> that you took the lead to improve the facilities here!
cool, thanks


Re: [PATCH net-next 2/6] nfp: flower: allow matching on ipv4 UDP tunnel tos and ttl

2018-08-07 Thread John Hurley
On Tue, Aug 7, 2018 at 6:46 PM, Or Gerlitz  wrote:
> On Tue, Aug 7, 2018 at 6:35 PM, Simon Horman  
> wrote:
>> From: John Hurley 
>>
>> The addition of FLOW_DISSECTOR_KEY_ENC_IP to TC flower means that the ToS
>> and TTL of the tunnel header can now be matched on.
>>
>> Extend the NFP tunnel match function to include these new fields.
>
> (referring to you patch title) where is this explicitly dealing with
> udp tunnel? tos/ttl belong to any IP tunnel, e.g GRE as well
>
> Or.

Hi Or,

Yes, you are correct.
But in our case we do not currently support GRE so, for NFP, this
offload match only applies to IPv4 UDP tunnels.

John


Re: Fwd: [PATCH 0/6] offload Linux LAG devices to the TC datapath

2018-06-26 Thread John Hurley
On Tue, Jun 26, 2018 at 3:57 PM, Or Gerlitz  wrote:
>>  Forwarded Message 
>> Subject: [PATCH 0/6] offload Linux LAG devices to the TC datapath
>> Date: Thu, 21 Jun 2018 14:35:55 +0100
>> From: John Hurley 
>> To: d...@openvswitch.org, r...@mellanox.com, g...@mellanox.com, 
>> pa...@mellanox.com, f...@sysclose.org, simon.hor...@netronome.com
>> CC: John Hurley 
>>
>> This patchset extends OvS TC and the linux-netdev implementation to
>> support the offloading of Linux Link Aggregation devices (LAG) and their
>> slaves. TC blocks are used to provide this offload. Blocks, in TC, group
>> together a series of qdiscs. If a filter is added to one of these qdiscs
>> then it applied to all. Similarly, if a packet is matched on one of the
>> grouped qdiscs then the stats for the entire block are increased. The
>> basis of the LAG offload is that the LAG master (attached to the OvS
>> bridge) and slaves that may exist outside of OvS are all added to the same
>> TC block. OvS can then control the filters and collect the stats on the
>> slaves via its interaction with the LAG master.
>>
>> The TC API is extended within OvS to allow the addition of a block id to
>> ingress qdisc adds. Block ids are then assigned to each LAG master that is
>> attached to the OvS bridge. The linux netdev netlink socket is used to
>> monitor slave devices. If a LAG slave is found whose master is on the bridge
>> then it is added to the same block as its master. If the underlying slaves
>> belong to an offloadable device then the Linux LAG device can be offloaded
>> to hardware.
>
> Guys (J/J/J),
>
> Doing this here b/c
>
> a. this has impact on the kernel side of things
>
> b. I am more of a netdev and not openvswitch citizen..
>
> some comments,
>
> 1. this + Jakub's patch for the reply are really a great design
>
> 2. re the egress side of things. Some NIC HWs can't just use LAG
> as the egress port destination of an ACL (tc rule) and the HW rule
> needs to be duplicated to both HW ports. So... in that case, you
> see the HW driver doing the duplication (:() or we can somehow
> make it happen from user-space?
>

Hi Or,
I'm not sure how rule duplication would work for rules that egress to
a LAG device.
Perhaps this could be done for an active/backup mode where user-space
adds a rule to 1 port and deletes from another as appropriate.
For load balancing modes where the egress port is selected based on a
hash of packet fields, it would be a lot more complicated.
OvS can do this with its own bonds as far as I'm aware but (if
recirculation is turned off) it basically creates exact match datapath
entries for each packet flow.
Perhaps I do not fully understand your question?

> 3. for the case of overlay networks, e.g OVS based vxlan tunnel, the
> ingress (decap) rule is set on the vxlan device. Jakub, you mentioned
> a possible kernel patch to the HW (nfp, mlx5) drivers to have them bind
> to the tunnel device for ingress rules. If we have agreed way to identify
> uplink representors, can we do that from ovs too? does it matter if we are
> bonding + encapsulating or just encapsulating? note that under encap scheme
> the bond is typically not part of the OVS bridge.
>

If we have a way to bind the HW drivers to tunnel devs for ingress
rules then this should work fine with OvS (possibly requiring a small
patch - Id need to check).

In terms of bonding + encap this probably needs to be handled in the
hw itself for the same reason I mentioned in point 2.

> Or.


Re: [PATCH net-next 0/8] nfp: offload LAG for tc flower egress

2018-05-31 Thread John Hurley
On Wed, May 30, 2018 at 9:29 PM, Jiri Pirko  wrote:
> Wed, May 30, 2018 at 11:26:23AM CEST, john.hur...@netronome.com wrote:
>>On Tue, May 29, 2018 at 11:09 PM, Jiri Pirko  wrote:
>>> Tue, May 29, 2018 at 04:08:48PM CEST, john.hur...@netronome.com wrote:
On Sat, May 26, 2018 at 3:47 AM, Jakub Kicinski
 wrote:
> On Fri, 25 May 2018 08:48:09 +0200, Jiri Pirko wrote:
>> Thu, May 24, 2018 at 04:22:47AM CEST, jakub.kicin...@netronome.com wrote:
>> >Hi!
>> >
>> >This series from John adds bond offload to the nfp driver.  Patch 5
>> >exposes the hash type for NETDEV_LAG_TX_TYPE_HASH to make sure nfp
>> >hashing matches that of the software LAG.  This may be unnecessarily
>> >conservative, let's see what LAG maintainers think :)
>>
>> So you need to restrict offload to only certain hash algo? In mlxsw, we
>> just ignore the lag setting and do some hw default hashing. Would not be
>> enough? Note that there's a good reason for it, as you see, in team, the
>> hashing is done in a BPF function and could be totally arbitrary.
>> Your patchset effectively disables team offload for nfp.
>
> My understanding is that the project requirements only called for L3/L4
> hash algorithm offload, hence the temptation to err on the side of
> caution and not offload all the bond configurations.  John can provide
> more details.  Not being able to offload team is unfortunate indeed.

Hi Jiri,
Yes, as Jakub mentions, we restrict ourselves to L3/L4 hash algorithm
as this is currently what is supported in fw.
>>>
>>> In mlxsw, a default l3/l4 is used always, no matter what the
>>> bonding/team sets. It is not correct, but it works with team as well.
>>> Perhaps we can have NETDEV_LAG_HASH_UNKNOWN to indicate to the driver to
>>> do some default? That would make the "team" offload functional.
>>>
>>
>>yes, I would agree with that.
>>Thanks
>
> Okay, would you please adjust your driver?
>

Will do.
Thanks, Jiri

> I will teka care of mlxsw bits.
>
> Thanks!
>
>>
Hopefully this will change as fw features are expanded.
I understand the issue this presents with offloading team.
Perhaps resorting to a default hw hash for team is acceptable.
John


Re: [PATCH net-next 0/8] nfp: offload LAG for tc flower egress

2018-05-30 Thread John Hurley
On Tue, May 29, 2018 at 11:09 PM, Jiri Pirko  wrote:
> Tue, May 29, 2018 at 04:08:48PM CEST, john.hur...@netronome.com wrote:
>>On Sat, May 26, 2018 at 3:47 AM, Jakub Kicinski
>> wrote:
>>> On Fri, 25 May 2018 08:48:09 +0200, Jiri Pirko wrote:
 Thu, May 24, 2018 at 04:22:47AM CEST, jakub.kicin...@netronome.com wrote:
 >Hi!
 >
 >This series from John adds bond offload to the nfp driver.  Patch 5
 >exposes the hash type for NETDEV_LAG_TX_TYPE_HASH to make sure nfp
 >hashing matches that of the software LAG.  This may be unnecessarily
 >conservative, let's see what LAG maintainers think :)

 So you need to restrict offload to only certain hash algo? In mlxsw, we
 just ignore the lag setting and do some hw default hashing. Would not be
 enough? Note that there's a good reason for it, as you see, in team, the
 hashing is done in a BPF function and could be totally arbitrary.
 Your patchset effectively disables team offload for nfp.
>>>
>>> My understanding is that the project requirements only called for L3/L4
>>> hash algorithm offload, hence the temptation to err on the side of
>>> caution and not offload all the bond configurations.  John can provide
>>> more details.  Not being able to offload team is unfortunate indeed.
>>
>>Hi Jiri,
>>Yes, as Jakub mentions, we restrict ourselves to L3/L4 hash algorithm
>>as this is currently what is supported in fw.
>
> In mlxsw, a default l3/l4 is used always, no matter what the
> bonding/team sets. It is not correct, but it works with team as well.
> Perhaps we can have NETDEV_LAG_HASH_UNKNOWN to indicate to the driver to
> do some default? That would make the "team" offload functional.
>

yes, I would agree with that.
Thanks

>>Hopefully this will change as fw features are expanded.
>>I understand the issue this presents with offloading team.
>>Perhaps resorting to a default hw hash for team is acceptable.
>>John


Re: [PATCH net-next 0/8] nfp: offload LAG for tc flower egress

2018-05-29 Thread John Hurley
On Sat, May 26, 2018 at 3:47 AM, Jakub Kicinski
 wrote:
> On Fri, 25 May 2018 08:48:09 +0200, Jiri Pirko wrote:
>> Thu, May 24, 2018 at 04:22:47AM CEST, jakub.kicin...@netronome.com wrote:
>> >Hi!
>> >
>> >This series from John adds bond offload to the nfp driver.  Patch 5
>> >exposes the hash type for NETDEV_LAG_TX_TYPE_HASH to make sure nfp
>> >hashing matches that of the software LAG.  This may be unnecessarily
>> >conservative, let's see what LAG maintainers think :)
>>
>> So you need to restrict offload to only certain hash algo? In mlxsw, we
>> just ignore the lag setting and do some hw default hashing. Would not be
>> enough? Note that there's a good reason for it, as you see, in team, the
>> hashing is done in a BPF function and could be totally arbitrary.
>> Your patchset effectively disables team offload for nfp.
>
> My understanding is that the project requirements only called for L3/L4
> hash algorithm offload, hence the temptation to err on the side of
> caution and not offload all the bond configurations.  John can provide
> more details.  Not being able to offload team is unfortunate indeed.

Hi Jiri,
Yes, as Jakub mentions, we restrict ourselves to L3/L4 hash algorithm
as this is currently what is supported in fw.
Hopefully this will change as fw features are expanded.
I understand the issue this presents with offloading team.
Perhaps resorting to a default hw hash for team is acceptable.
John


Re: [PATCH net-next 8/8] nfp: flower: compute link aggregation action

2018-05-24 Thread John Hurley
On Thu, May 24, 2018 at 6:09 PM, Or Gerlitz <gerlitz...@gmail.com> wrote:
> On Thu, May 24, 2018 at 5:22 AM, Jakub Kicinski
> <jakub.kicin...@netronome.com> wrote:
>> From: John Hurley <john.hur...@netronome.com>
>>
>> If the egress device of an offloaded rule is a LAG port, then encode the
>> output port to the NFP with a LAG identifier and the offloaded group ID.
>>
>> A prelag action is also offloaded which must be the first action of the
>> series (although may appear after other pre-actions - e.g. tunnels). This
>> causes the FW to check that it has the necessary information to output to
>> the requested LAG port. If it does not, the packet is sent to the kernel
>> before any other actions are applied to it.
>
> Offload decision typically also looks if both devices have the same
> switchdev ID.
>
> In your case, do both reprs get the same switchdev ID automatically when
> being put into the same team/bond instance? I wasn't sure whether to expect
> to see changes here for that matter

Hi Or,
Yes, you are correct.
We essentially replaced the switchdev ID check with a repr app check.
An app runs per card and spawns the reprs for that card - each repr
has a backpointer to its creating app.
When deciding if we can offload a bond, we ensure that all bond ports
are reprs and belong to the same app, and so to the same card/switchdev_id.
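As a rough sketch of that check (nfp_netdev_is_nfp_repr() and the
netdev_for_each_lower_dev() iterator are real kernel/driver symbols; reading
the app backpointer out of netdev_priv() follows the description above, but
the function itself is illustrative, not the driver's code):

/* Offload the bond only if every slave is a repr spawned by this app.
 * Must be called with rtnl held so the lower-dev list is stable.
 */
static bool example_bond_offloadable(struct nfp_app *app,
				     struct net_device *bond_dev)
{
	struct net_device *slave;
	struct list_head *iter;

	netdev_for_each_lower_dev(bond_dev, slave, iter) {
		struct nfp_repr *repr;

		if (!nfp_netdev_is_nfp_repr(slave))
			return false;	/* non-repr port - cannot offload */

		repr = netdev_priv(slave);
		if (repr->app != app)
			return false;	/* repr of a different card */
	}

	return true;
}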


Re: [PATCH net-next 4/4] nfp: flower: ignore duplicate cb requests for same rule

2018-04-25 Thread John Hurley
On Wed, Apr 25, 2018 at 10:17 AM, Or Gerlitz <gerlitz...@gmail.com> wrote:
> On Wed, Apr 25, 2018 at 7:17 AM, Jakub Kicinski
> <jakub.kicin...@netronome.com> wrote:
>> From: John Hurley <john.hur...@netronome.com>
>>
>> If a flower rule has a repr both as ingress and egress port then 2
>> callbacks may be generated for the same rule request.
>>
>> Add an indicator to each flow as to whether or not it was added from an
>> ingress registered cb. If so then ignore add/del/stat requests to it from
>> an egress cb.
>
> So on add() you ignore (return success) - I wasn't sure from the patch
> what do you do for stat()/del() -- success? why not err? as you know I am
> working on the same patch for mlx5, lets align here please.
>
> Or.

OK, this is the way I've implemented the calls...

add:
- if egress cb duplicate but the flow has been added on ingress cb
then ignore (return success)
- if egress cb duplicate and flow not added on ingress then err (this
is a 'true duplicate')

stats:
- if egress cb but the flow has an ingress cb then ignore - stat
request will have been covered on ingress cb so shouldn't error, just
don't repeat

del:
- if egress cb and flow no longer exists then assume it was deleted on
ingress so ignore (return success)
- if ingress cb and flow no longer exists then (as ingress cb is hit
first) this is a bad request whereby trying to delete a flow that was
never added - return err
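Condensed into a sketch, the rules above amount to something like the
following. The structure and function names are hypothetical, not the
driver's actual symbols; a return value of 1 here just means "proceed with
the normal path".

struct example_flow {
	unsigned long	cookie;
	bool		ingress_offload;	/* added via an ingress cb */
};

/* add from the egress cb */
static int example_egress_add(struct example_flow *existing)
{
	if (existing && existing->ingress_offload)
		return 0;		/* already handled on ingress - ignore */
	if (existing)
		return -EEXIST;		/* true duplicate - reject */
	return 1;			/* new flow - proceed with the add */
}

/* stats from the egress cb: ingress-owned flows are skipped, not errored */
static bool example_egress_wants_stats(const struct example_flow *flow)
{
	return !flow->ingress_offload;
}

/* delete: egress tolerates a missing flow, ingress treats it as an error */
static int example_del(const struct example_flow *existing, bool egress_cb)
{
	if (!existing)
		return egress_cb ? 0 : -ENOENT;
	return 1;			/* flow found - proceed with the delete */
}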


Re: [PATCH net-next 3/4] nfp: flower: support offloading multiple rules with same cookie

2018-04-25 Thread John Hurley
On Wed, Apr 25, 2018 at 10:13 AM, Or Gerlitz <gerlitz...@gmail.com> wrote:
> On Wed, Apr 25, 2018 at 12:02 PM, John Hurley <john.hur...@netronome.com> 
> wrote:
>> On Wed, Apr 25, 2018 at 9:56 AM, Or Gerlitz <gerlitz...@gmail.com> wrote:
>>> On Wed, Apr 25, 2018 at 11:51 AM, John Hurley <john.hur...@netronome.com> 
>>> wrote:
>>>> On Wed, Apr 25, 2018 at 7:31 AM, Or Gerlitz <gerlitz...@gmail.com> wrote:
>>>>> On Wed, Apr 25, 2018 at 7:17 AM, Jakub Kicinski
>>>>> <jakub.kicin...@netronome.com> wrote:
>>>>>> From: John Hurley <john.hur...@netronome.com>
>>>>>>
>>>>>> When multiple netdevs are attached to a tc offload block and register for
>>>>>> callbacks, a rule added to the block will be propagated to all netdevs.
>>>>>> Previously these were detected as duplicates (based on cookie) and
>>>>>> rejected. Modify the rule nfp lookup function to optionally include an
>>>>>> ingress netdev and a host context along with the cookie value when
>>>>>> searching for a rule. When a new rule is passed to the driver, the netdev
>>>>>> the rule is to be attached to is considered when searching for 
>>>>>> duplicates.
>>>>>
>>>>> so if the same rule (cookie) is provided to the driver through multiple 
>>>>> ingress
>>>>> devices you will not reject it -- what is the use case for that, is it
>>>>> block sharing?
>>>>
>>>> Hi Or,
>>>> Yes, block sharing is the current use-case.
>>>> Simple example for clarity
>>>> Here we want to offload the filter to both ingress devs nfp_0 and nfp_1:
>>>>
>>>> tc qdisc add dev nfp_p0 ingress_block 22 ingress
>>>> tc qdisc add dev nfp_p1 ingress_block 22 ingress
>>>> tc filter add block 22 protocol ip parent ffff: flower skip_sw
>>>> ip_proto tcp action drop
>>>
>>> cool!
>>>
>>> Just out of curiosity, do you actually share this HW rule or you duplicate 
>>> it?
>>
>> It's duplicated. At HW level the ingress port is part of the match so 
>> technically it's
>> a different rule.
>
> I see, we have also a match on the ingress port as part of the HW API, which
> means we will have to apply a similar practice if we want to support
> block sharing quickly.
>
> Just to make sure, under tc block sharing the tc stack calls for hw
> offloading of the
> same rule (same cookie) multiple times, each with different ingress
> device, right?
>
>
> Or.

So in the example above, when each qdisc add is called, a callback
will be registered to the block.
For each callback, the dev used is passed as priv data (presumably you
do similar).
When the filter is added, the block code triggers all callbacks with
the same rule data [1].
We differentiate the callbacks with the priv data (ingress dev).

[1] https://elixir.bootlin.com/linux/v4.17-rc2/source/net/sched/cls_api.c#L741
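A stripped-down sketch of that callback shape (the tc types are the kernel's
of that era; example_offload_flower() is a stand-in for the driver's flower
handling, not a real symbol):

static int example_block_cb(enum tc_setup_type type, void *type_data,
			    void *cb_priv)
{
	/* cb_priv is the netdev that was bound to the block at qdisc add
	 * time, so each registered callback knows its own ingress dev even
	 * though every callback sees the same rule (same cookie).
	 */
	struct net_device *ingress_dev = cb_priv;

	switch (type) {
	case TC_SETUP_CLSFLOWER:
		return example_offload_flower(ingress_dev, type_data);
	default:
		return -EOPNOTSUPP;
	}
}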


Re: [PATCH net-next 3/4] nfp: flower: support offloading multiple rules with same cookie

2018-04-25 Thread John Hurley
On Wed, Apr 25, 2018 at 9:56 AM, Or Gerlitz <gerlitz...@gmail.com> wrote:
> On Wed, Apr 25, 2018 at 11:51 AM, John Hurley <john.hur...@netronome.com> 
> wrote:
>> On Wed, Apr 25, 2018 at 7:31 AM, Or Gerlitz <gerlitz...@gmail.com> wrote:
>>> On Wed, Apr 25, 2018 at 7:17 AM, Jakub Kicinski
>>> <jakub.kicin...@netronome.com> wrote:
>>>> From: John Hurley <john.hur...@netronome.com>
>>>>
>>>> When multiple netdevs are attached to a tc offload block and register for
>>>> callbacks, a rule added to the block will be propagated to all netdevs.
>>>> Previously these were detected as duplicates (based on cookie) and
>>>> rejected. Modify the rule nfp lookup function to optionally include an
>>>> ingress netdev and a host context along with the cookie value when
>>>> searching for a rule. When a new rule is passed to the driver, the netdev
>>>> the rule is to be attached to is considered when searching for duplicates.
>>>
>>> so if the same rule (cookie) is provided to the driver through multiple 
>>> ingress
>>> devices you will not reject it -- what is the use case for that, is it
>>> block sharing?
>>
>> Hi Or,
>> Yes, block sharing is the current use-case.
>> Simple example for clarity
>> Here we want to offload the filter to both ingress devs nfp_0 and nfp_1:
>>
>> tc qdisc add dev nfp_p0 ingress_block 22 ingress
>> tc qdisc add dev nfp_p1 ingress_block 22 ingress
>> tc filter add block 22 protocol ip parent ffff: flower skip_sw
>> ip_proto tcp action drop
>
> cool!
>
> Just out of curiosity, do you actually share this HW rule or you duplicate it?

It's duplicated.
At HW level the ingress port is part of the match so technically it's
a different rule.


Re: [PATCH net-next 3/4] nfp: flower: support offloading multiple rules with same cookie

2018-04-25 Thread John Hurley
On Wed, Apr 25, 2018 at 7:31 AM, Or Gerlitz <gerlitz...@gmail.com> wrote:
> On Wed, Apr 25, 2018 at 7:17 AM, Jakub Kicinski
> <jakub.kicin...@netronome.com> wrote:
>> From: John Hurley <john.hur...@netronome.com>
>>
>> When multiple netdevs are attached to a tc offload block and register for
>> callbacks, a rule added to the block will be propagated to all netdevs.
>> Previously these were detected as duplicates (based on cookie) and
>> rejected. Modify the rule nfp lookup function to optionally include an
>> ingress netdev and a host context along with the cookie value when
>> searching for a rule. When a new rule is passed to the driver, the netdev
>> the rule is to be attached to is considered when searching for duplicates.
>
> so if the same rule (cookie) is provided to the driver through multiple 
> ingress
> devices you will not reject it -- what is the use case for that, is it
> block sharing?

Hi Or,
Yes, block sharing is the current use-case.
Simple example for clarity
Here we want to offload the filter to both ingress devs nfp_0 and nfp_1:

tc qdisc add dev nfp_p0 ingress_block 22 ingress
tc qdisc add dev nfp_p1 ingress_block 22 ingress
tc filter add block 22 protocol ip parent ffff: flower skip_sw
ip_proto tcp action drop


Re: [PATCH net 5/5] nfp: remove false positive offloads in flower vxlan

2018-04-18 Thread John Hurley
On Wed, Apr 18, 2018 at 7:18 PM, Or Gerlitz <gerlitz...@gmail.com> wrote:
> On Wed, Apr 18, 2018 at 3:31 PM, John Hurley <john.hur...@netronome.com> 
> wrote:
>> On Wed, Apr 18, 2018 at 8:43 AM, Or Gerlitz <gerlitz...@gmail.com> wrote:
>>> On Fri, Nov 17, 2017 at 4:06 AM, Jakub Kicinski
>>> <jakub.kicin...@netronome.com> wrote:
>>>> From: John Hurley <john.hur...@netronome.com>
>>>>
>>>> Pass information to the match offload on whether or not the repr is the
>>>> ingress or egress dev. Only accept tunnel matches if repr is the egress 
>>>> dev.
>>>>
>>>> This means rules such as the following are successfully offloaded:
>>>> tc .. add dev vxlan0 .. enc_dst_port 4789 .. action redirect dev nfp_p0
>>>>
>>>> While rules such as the following are rejected:
>>>> tc .. add dev nfp_p0 .. enc_dst_port 4789 .. action redirect dev vxlan0
>>>
>>> cool
>>>
>>>
>>>> Also reject non tunnel flows that are offloaded to an egress dev.
>>>> Non tunnel matches assume that the offload dev is the ingress port and
>>>> offload a match accordingly.
>>>
>>> not following on the "Also" here, see below
>>>
>>>
>>>> diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c 
>>>> b/drivers/net/ethernet/netronome/nfp/flower/offload.c
>>>> index a0193e0c24a0..f5d73b83dcc2 100644
>>>> --- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
>>>> +++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
>>>> @@ -131,7 +131,8 @@ static bool nfp_flower_check_higher_than_mac(struct 
>>>> tc_cls_flower_offload *f)
>>>>
>>>>  static int
>>>>  nfp_flower_calculate_key_layers(struct nfp_fl_key_ls *ret_key_ls,
>>>> -   struct tc_cls_flower_offload *flow)
>>>> +   struct tc_cls_flower_offload *flow,
>>>> +   bool egress)
>>>>  {
>>>> struct flow_dissector_key_basic *mask_basic = NULL;
>>>> struct flow_dissector_key_basic *key_basic = NULL;
>>>> @@ -167,6 +168,9 @@ nfp_flower_calculate_key_layers(struct nfp_fl_key_ls 
>>>> *ret_key_ls,
>>>> skb_flow_dissector_target(flow->dissector,
>>>>   
>>>> FLOW_DISSECTOR_KEY_ENC_CONTROL,
>>>>   flow->key);
>>>> +   if (!egress)
>>>> +   return -EOPNOTSUPP;
>>>> +
>>>> if (mask_enc_ctl->addr_type != 0xffff ||
>>>> enc_ctl->addr_type != FLOW_DISSECTOR_KEY_IPV4_ADDRS)
>>>> return -EOPNOTSUPP;
>>>> @@ -194,6 +198,9 @@ nfp_flower_calculate_key_layers(struct nfp_fl_key_ls 
>>>> *ret_key_ls,
>>>>
>>>> key_layer |= NFP_FLOWER_LAYER_VXLAN;
>>>> key_size += sizeof(struct nfp_flower_vxlan);
>>>> +   } else if (egress) {
>>>> +   /* Reject non tunnel matches offloaded to egress repr. */
>>>> +   return -EOPNOTSUPP;
>>>> }
>>>
>>> with these two hunks we get: egress <- IFF -> encap match, right?
>>>
>>> (1) we can't offload the egress way if there isn't matching on encap headers
>>> (2) we can't go the matching on encap headers way if we are not egress
>>>
>>
>> yes, this is correct.
>> With the block code and egdev offload, we do not have access to the
>> ingress netdev when doing an offload.
>> We need to use the encap headers (especially the enc_port) to
>> distinguish the type of tunnel used and, therefore, require that the
>> encap matches be present before offloading.
>>
>>> what other cases are rejected by this logic?
>>>
>>
>> Yes, some other cases may be rejected (like veth mentioned below).
>
> my claim is that the veth case I mentioned below will not be rejected
> if it has the matching on encap headers, and a wrong rule will be set
> into hw, agree?
>

Yes, unfortunately this is correct.
Without access to the ingress netdev we have to apply as many
restrictions as possible to ensure it is 'almost certainly' a given
ingress netdev, but extreme cases can bypass this.

>> However, this is better than allowing

Re: [PATCH net 5/5] nfp: remove false positive offloads in flower vxlan

2018-04-18 Thread John Hurley
On Wed, Apr 18, 2018 at 8:43 AM, Or Gerlitz <gerlitz...@gmail.com> wrote:
> On Fri, Nov 17, 2017 at 4:06 AM, Jakub Kicinski
> <jakub.kicin...@netronome.com> wrote:
>> From: John Hurley <john.hur...@netronome.com>
>>
>> Pass information to the match offload on whether or not the repr is the
>> ingress or egress dev. Only accept tunnel matches if repr is the egress dev.
>>
>> This means rules such as the following are successfully offloaded:
>> tc .. add dev vxlan0 .. enc_dst_port 4789 .. action redirect dev nfp_p0
>>
>> While rules such as the following are rejected:
>> tc .. add dev nfp_p0 .. enc_dst_port 4789 .. action redirect dev vxlan0
>
> cool
>
>
>> Also reject non tunnel flows that are offloaded to an egress dev.
>> Non tunnel matches assume that the offload dev is the ingress port and
>> offload a match accordingly.
>
> not following on the "Also" here, see below
>
>
>> diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c 
>> b/drivers/net/ethernet/netronome/nfp/flower/offload.c
>> index a0193e0c24a0..f5d73b83dcc2 100644
>> --- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
>> +++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
>> @@ -131,7 +131,8 @@ static bool nfp_flower_check_higher_than_mac(struct 
>> tc_cls_flower_offload *f)
>>
>>  static int
>>  nfp_flower_calculate_key_layers(struct nfp_fl_key_ls *ret_key_ls,
>> -   struct tc_cls_flower_offload *flow)
>> +   struct tc_cls_flower_offload *flow,
>> +   bool egress)
>>  {
>> struct flow_dissector_key_basic *mask_basic = NULL;
>> struct flow_dissector_key_basic *key_basic = NULL;
>> @@ -167,6 +168,9 @@ nfp_flower_calculate_key_layers(struct nfp_fl_key_ls 
>> *ret_key_ls,
>> skb_flow_dissector_target(flow->dissector,
>>   
>> FLOW_DISSECTOR_KEY_ENC_CONTROL,
>>   flow->key);
>> +   if (!egress)
>> +   return -EOPNOTSUPP;
>> +
>> if (mask_enc_ctl->addr_type != 0xffff ||
>> enc_ctl->addr_type != FLOW_DISSECTOR_KEY_IPV4_ADDRS)
>> return -EOPNOTSUPP;
>> @@ -194,6 +198,9 @@ nfp_flower_calculate_key_layers(struct nfp_fl_key_ls 
>> *ret_key_ls,
>>
>> key_layer |= NFP_FLOWER_LAYER_VXLAN;
>> key_size += sizeof(struct nfp_flower_vxlan);
>> +   } else if (egress) {
>> +   /* Reject non tunnel matches offloaded to egress repr. */
>> +   return -EOPNOTSUPP;
>> }
>
> with these two hunks we get: egress <- IFF -> encap match, right?
>
> (1) we can't offload the egress way if there isn't matching on encap headers
> (2) we can't go the matching on encap headers way if we are not egress
>

yes, this is correct.
With the block code and egdev offload, we do not have access to the
ingress netdev when doing an offload.
We need to use the encap headers (especially the enc_port) to
distinguish the type of tunnel used and, therefore, require that the
encap matches be present before offloading.

> what other cases are rejected by this logic?
>

Yes, some other cases may be rejected (like veth mentioned below).
However, this is better than allowing rules to be incorrectly
offloaded (as could have happened before these changes).
Currently, we are looking at offloading flows on other ingress devices
such as bonds so this will require a change to the driver code here.
IMO, the cleanest solution will also require tc core changes to either
avoid egdev offload or to have access to the ingress netdev of a rule.

> e.g. If we add a rule with a SW device (veth, tap) being the ingress, and
> HW device (vf rep)
> being the egress while not using skip_sw (just no flags == both) we
> get the TC stack
> go along the egdev callback from the vf rep hw device and add an
> (uplink --> vf rep) rule
> which will not be rejected if there is matching on tunnel headers, it
> will also not be rejected
> by some driver logic as the one we discussed to identify and ignore
> rules that are attempted to being added twice.
>
> Or.


Re: [RFC net-next 0/6] offload linux bonding tc ingress rules

2018-03-05 Thread John Hurley
On Mon, Mar 5, 2018 at 9:43 PM, Or Gerlitz <gerlitz...@gmail.com> wrote:
> On Mon, Mar 5, 2018 at 3:28 PM, John Hurley <john.hur...@netronome.com> wrote:
>> This RFC patchset adds support for offloading tc ingress rules applied to
>> linux bonds. The premise of these patches is that if a rule is applied to
>> a bond port then the rule should be applied to each slave of the bond.
>>
>> The linux bond itself registers a cb for offloading tc rules. Potential
>> slave netdevs on offload devices can then register with the bond for a
>> further callback - this code is basically the same as registering for an
>> egress dev offload in TC. Then when a rule is offloaded to the bond, it
>> can be relayed to each netdev that has registered with the bond code and
>> which is a slave of the given bond.
>>
>> To prevent sync issues between the kernel and offload device, the linux
>> bond driver is effectively locked when it has offloaded rules, i.e. no new
>> ports can be enslaved and no slaves can be released until the offload
>> rules are removed. Similarly, if a port on a bond is deleted, the bond is
>> destroyed, forcing a flush of all offloaded rules.
>>
>> Also included in the RFC are changes to the NFP driver to utilise the new
>> code by registering NFP port representors for bond offload rules and
>> modifying cookie handling to allow the relaying of a rule to multiple ports.
>
> what is your approach for rules whose bond is their egress device
> (ingress = vf port
> representor)?

Egress offload will be handled entirely by the NFP driver.
Basically, notifiers will track the slaves/masters and update the card
with any groups that consist entirely of reprs.
We then offload the TC rule outputting to the given group - because it
is an ingress match we can access the egress netdev in the block
callback.
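A rough sketch of that notifier-based tracking, where example_sync_lag_group()
stands in for the driver logic that recomputes and pushes the group to the fw
(the notifier plumbing itself is the standard kernel API):

static int example_netdev_event(struct notifier_block *nb,
				unsigned long event, void *ptr)
{
	struct net_device *netdev = netdev_notifier_info_to_dev(ptr);
	struct netdev_notifier_changeupper_info *info = ptr;

	if (event != NETDEV_CHANGEUPPER)
		return NOTIFY_DONE;
	if (!netif_is_lag_master(info->upper_dev))
		return NOTIFY_DONE;

	/* A port joined or left a bond: recompute the group for
	 * info->upper_dev and, if every member is a repr, push the
	 * member list to the fw so egress rules can target the group.
	 */
	example_sync_lag_group(info->upper_dev, netdev);

	return NOTIFY_OK;
}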


Re: [RFC net-next 4/6] nfp: add ndo_set_mac_address for representors

2018-03-05 Thread John Hurley
On Mon, Mar 5, 2018 at 9:39 PM, Or Gerlitz <gerlitz...@gmail.com> wrote:
> On Mon, Mar 5, 2018 at 3:28 PM, John Hurley <john.hur...@netronome.com> wrote:
>> A representor hardware address does not have any meaning outside of the
>> kernel netdev/networking stack. Thus there is no need for any app specific
>> code for setting a representor's hardware address; the default eth_mac_addr
>> is sufficient.
>
> where did you need that? does libvirt attempt to change the mac address or
> is it for bonding to call? worth mentioning the use-case in the change log

Hi Or,
Yes, setting the MAC is required to add the repr to a Linux bond.
I agree, I should add the use case here. Thanks


[RFC net-next 0/6] offload linux bonding tc ingress rules

2018-03-05 Thread John Hurley
Hi,

This RFC patchset adds support for offloading tc ingress rules applied to
linux bonds. The premise of these patches is that if a rule is applied to
a bond port then the rule should be applied to each slave of the bond.

The linux bond itself registers a cb for offloading tc rules. Potential
slave netdevs on offload devices can then register with the bond for a
further callback - this code is basically the same as registering for an
egress dev offload in TC. Then when a rule is offloaded to the bond, it
can be relayed to each netdev that has registered with the bond code and
which is a slave of the given bond.
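For illustration, the relay described above boils down to something like the
sketch below; example_relay_to_slave() stands in for whatever per-slave
callback has been registered and is not code from the series (struct bonding,
struct slave and bond_for_each_slave() come from include/net/bonding.h).

static int example_bond_relay(struct bonding *bond, enum tc_setup_type type,
			      void *type_data)
{
	struct list_head *iter;
	struct slave *slave;
	int err;

	/* Replay the offload request to every enslaved netdev that has
	 * registered a callback with the bond; rtnl is held around block
	 * callbacks, so the slave list is stable here.
	 */
	bond_for_each_slave(bond, slave, iter) {
		err = example_relay_to_slave(slave->dev, type, type_data);
		if (err)
			return err;
	}

	return 0;
}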

To prevent sync issues between the kernel and offload device, the linux
bond driver is effectively locked when it has offloaded rules, i.e. no new
ports can be enslaved and no slaves can be released until the offload
rules are removed. Similarly, if a port on a bond is deleted, the bond is
destroyed, forcing a flush of all offloaded rules.

Also included in the RFC are changes to the NFP driver to utilise the new
code by registering NFP port representors for bond offload rules and
modifying cookie handling to allow the relaying of a rule to multiple
ports.

Thanks,
John

John Hurley (6):
  drivers: net: bonding: add tc offload infrastructure to bond
  driver: net: bonding: allow registration of tc offload callbacks in
bond
  drivers: net: bonding: restrict bond mods when rules are offloaded
  nfp: add ndo_set_mac_address for representors
  nfp: register repr ports for bond offloads
  nfp: support offloading multiple rules with same cookie

 drivers/net/bonding/bond_main.c| 277 -
 drivers/net/ethernet/netronome/nfp/flower/main.c   |  24 +-
 drivers/net/ethernet/netronome/nfp/flower/main.h   |  10 +-
 .../net/ethernet/netronome/nfp/flower/metadata.c   |  20 +-
 .../net/ethernet/netronome/nfp/flower/offload.c|  33 ++-
 drivers/net/ethernet/netronome/nfp/nfp_net_repr.c  |   1 +
 include/net/bonding.h  |   9 +
 7 files changed, 351 insertions(+), 23 deletions(-)

-- 
2.7.4



[RFC net-next 1/6] drivers: net: bonding: add tc offload infrastructure to bond

2018-03-05 Thread John Hurley
Register an ndo and callback for linux bonds to offload TC block ingress
rules. Enable tc-hw-offload to be set by the user (defaults to off). When
on, the flag cannot be turned off if rules are offloaded.

Signed-off-by: John Hurley <john.hur...@netronome.com>
---
 drivers/net/bonding/bond_main.c | 64 +
 include/net/bonding.h   |  2 ++
 2 files changed, 66 insertions(+)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 4c19d23..e6415f6 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -325,6 +325,55 @@ static int bond_vlan_rx_kill_vid(struct net_device 
*bond_dev,
return 0;
 }
 
+/*-- TC HW Offload --*/
+
+static inline unsigned int bond_get_offload_cnt(struct bonding *bond)
+{
+   if (!bond->tc_block)
+   return 0;
+
+   return bond->tc_block->offloadcnt;
+}
+
+static int bond_tc_relay_cb(enum tc_setup_type type, void *type_data,
+   void *cb_priv)
+{
+   return 0;
+}
+
+static int bond_setup_tc_block(struct net_device *netdev,
+  struct tc_block_offload *f)
+{
+   struct bonding *bond = netdev_priv(netdev);
+
+   if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
+   return -EOPNOTSUPP;
+
+   switch (f->command) {
+   case TC_BLOCK_BIND:
+   bond->tc_block = f->block;
+   return tcf_block_cb_register(f->block, bond_tc_relay_cb,
+netdev, netdev);
+   case TC_BLOCK_UNBIND:
+   bond->tc_block = NULL;
+   tcf_block_cb_unregister(f->block, bond_tc_relay_cb, netdev);
+   return 0;
+   default:
+   return -EOPNOTSUPP;
+   }
+}
+
+static int bond_setup_tc(struct net_device *netdev, enum tc_setup_type type,
+void *type_data)
+{
+   switch (type) {
+   case TC_SETUP_BLOCK:
+   return bond_setup_tc_block(netdev, type_data);
+   default:
+   return -EOPNOTSUPP;
+   }
+}
+
 /*--- Link status ---*/
 
 /* Set the carrier state for the master according to the state of its
@@ -1065,6 +1114,17 @@ static netdev_features_t bond_fix_features(struct 
net_device *dev,
return features;
 }
 
+int bond_set_features(struct net_device *netdev, netdev_features_t features)
+{
+   if ((features & NETIF_F_HW_TC) <
+   !!bond_get_offload_cnt(netdev_priv(netdev))) {
+   netdev_err(netdev, "Cannot disable TC offload while active\n");
+   return -EBUSY;
+   }
+
+   return 0;
+}
+
 #define BOND_VLAN_FEATURES (NETIF_F_HW_CSUM | NETIF_F_SG | \
 NETIF_F_FRAGLIST | NETIF_F_ALL_TSO | \
 NETIF_F_HIGHDMA | NETIF_F_LRO)
@@ -4198,7 +4258,9 @@ static const struct net_device_ops bond_netdev_ops = {
.ndo_add_slave  = bond_enslave,
.ndo_del_slave  = bond_release,
.ndo_fix_features   = bond_fix_features,
+   .ndo_set_features   = bond_set_features,
.ndo_features_check = passthru_features_check,
+   .ndo_setup_tc   = bond_setup_tc,
 };
 
 static const struct device_type bond_type = {
@@ -4259,6 +4321,8 @@ void bond_setup(struct net_device *bond_dev)
 
bond_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL;
bond_dev->features |= bond_dev->hw_features;
+   /* Disable hw tc offload by default. */
+   bond_dev->hw_features |= NETIF_F_HW_TC;
 }
 
 /* Destroy a bonding device.
diff --git a/include/net/bonding.h b/include/net/bonding.h
index f801fc9..424b9ea 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define BOND_MAX_ARP_TARGETS   16
 
@@ -199,6 +200,7 @@ struct bonding {
struct   bond_up_slave __rcu *slave_arr; /* Array of usable slaves */
bool force_primary;
s32  slave_cnt; /* never change this value outside the 
attach/detach wrappers */
+   struct   tcf_block *tc_block;
int (*recv_probe)(const struct sk_buff *, struct bonding *,
  struct slave *);
/* mode_lock is used for mode-specific locking needs, currently used by:
-- 
2.7.4



[RFC net-next 5/6] nfp: register repr ports for bond offloads

2018-03-05 Thread John Hurley
On initialisation, register nfp repr ports to receive callbacks when tc
rules are offloaded to any bond they may be attached to. The callback
function is the same as the one used for non-bonded port rule offload.

Signed-off-by: John Hurley <john.hur...@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/flower/main.c   | 24 +++---
 drivers/net/ethernet/netronome/nfp/flower/main.h   |  2 ++
 .../net/ethernet/netronome/nfp/flower/offload.c|  4 ++--
 3 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.c 
b/drivers/net/ethernet/netronome/nfp/flower/main.c
index 742d6f1..0ca15f6 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.c
@@ -38,6 +38,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "main.h"
 #include "../nfpcore/nfp_cpp.h"
@@ -177,14 +178,31 @@ nfp_flower_repr_netdev_stop(struct nfp_app *app, struct 
nfp_repr *repr)
 static int
 nfp_flower_repr_netdev_init(struct nfp_app *app, struct net_device *netdev)
 {
-   return tc_setup_cb_egdev_register(netdev,
- nfp_flower_setup_tc_egress_cb,
- netdev_priv(netdev));
+   int err;
+
+   err = tc_setup_cb_bond_register(netdev, nfp_flower_setup_tc_block_cb,
+   netdev_priv(netdev));
+   if (err)
+   return err;
+
+   err = tc_setup_cb_egdev_register(netdev, nfp_flower_setup_tc_egress_cb,
+netdev_priv(netdev));
+   if (err)
+   goto err_egdev;
+
+   return err;
+err_egdev:
+   tc_setup_cb_bond_unregister(netdev, nfp_flower_setup_tc_block_cb,
+   netdev_priv(netdev));
+   return err;
 }
 
 static void
 nfp_flower_repr_netdev_clean(struct nfp_app *app, struct net_device *netdev)
 {
+   tc_setup_cb_bond_unregister(netdev, nfp_flower_setup_tc_block_cb,
+   netdev_priv(netdev));
+
tc_setup_cb_egdev_unregister(netdev, nfp_flower_setup_tc_egress_cb,
 netdev_priv(netdev));
 }
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.h 
b/drivers/net/ethernet/netronome/nfp/flower/main.h
index c5cebf6..5fd7c1f 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.h
@@ -213,5 +213,7 @@ void nfp_tunnel_request_route(struct nfp_app *app, struct 
sk_buff *skb);
 void nfp_tunnel_keep_alive(struct nfp_app *app, struct sk_buff *skb);
 int nfp_flower_setup_tc_egress_cb(enum tc_setup_type type, void *type_data,
  void *cb_priv);
+int nfp_flower_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
+void *cb_priv);
 
 #endif
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c 
b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index f3586c5..eb8abeb 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -553,8 +553,8 @@ int nfp_flower_setup_tc_egress_cb(enum tc_setup_type type, 
void *type_data,
}
 }
 
-static int nfp_flower_setup_tc_block_cb(enum tc_setup_type type,
-   void *type_data, void *cb_priv)
+int nfp_flower_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
+void *cb_priv)
 {
struct nfp_repr *repr = cb_priv;
 
-- 
2.7.4



[RFC net-next 2/6] driver: net: bonding: allow registration of tc offload callbacks in bond

2018-03-05 Thread John Hurley
Allow drivers to register netdev callbacks for tc offload in linux bonds.
If a netdev has registered and is a slave of a given bond, then any tc
rules offloaded to the bond will be relayed to it if both the bond and the
slave permit hw offload.

Because the bond itself is not offloaded, just the rules, we don't care
about whether the bond ports are on the same device or whether some of
the slaves are representor ports and some are not.

Signed-off-by: John Hurley <john.hur...@netronome.com>
---
 drivers/net/bonding/bond_main.c | 195 +++-
 include/net/bonding.h   |   7 ++
 2 files changed, 201 insertions(+), 1 deletion(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index e6415f6..d9e41cf 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -335,9 +335,201 @@ static inline unsigned int bond_get_offload_cnt(struct 
bonding *bond)
return bond->tc_block->offloadcnt;
 }
 
+struct tcf_bond_cb {
+   struct list_head list;
+   tc_setup_cb_t *cb;
+   void *cb_priv;
+};
+
+struct tcf_bond_off {
+   struct rhash_head ht_node;
+   const struct net_device *netdev;
+   unsigned int refcnt;
+   struct list_head cb_list;
+};
+
+static const struct rhashtable_params tcf_bond_ht_params = {
+   .key_offset = offsetof(struct tcf_bond_off, netdev),
+   .head_offset = offsetof(struct tcf_bond_off, ht_node),
+   .key_len = sizeof(const struct net_device *),
+};
+
+static struct tcf_bond_off *tcf_bond_off_lookup(const struct net_device *dev)
+{
+   struct bond_net *bn = net_generic(dev_net(dev), bond_net_id);
+
+   return rhashtable_lookup_fast(&bn->bond_offload_ht, &dev,
+ tcf_bond_ht_params);
+}
+
+static struct tcf_bond_cb *tcf_bond_off_cb_lookup(struct tcf_bond_off *off,
+ tc_setup_cb_t *cb,
+ void *cb_priv)
+{
+   struct tcf_bond_cb *bond_cb;
+
+   list_for_each_entry(bond_cb, &off->cb_list, list)
+   if (bond_cb->cb == cb && bond_cb->cb_priv == cb_priv)
+   return bond_cb;
+   return NULL;
+}
+
+static struct tcf_bond_off *tcf_bond_off_get(const struct net_device *dev,
+tc_setup_cb_t *cb,
+void *cb_priv)
+{
+   struct tcf_bond_off *bond_off;
+   struct bond_net *bn;
+
+   bond_off = tcf_bond_off_lookup(dev);
+   if (bond_off)
+   goto inc_ref;
+
+   bond_off = kzalloc(sizeof(*bond_off), GFP_KERNEL);
+   if (!bond_off)
+   return NULL;
+   INIT_LIST_HEAD(&bond_off->cb_list);
+   bond_off->netdev = dev;
+   bn = net_generic(dev_net(dev), bond_net_id);
+   rhashtable_insert_fast(&bn->bond_offload_ht, &bond_off->ht_node,
+  tcf_bond_ht_params);
+
+inc_ref:
+   bond_off->refcnt++;
+   return bond_off;
+}
+
+static void tcf_bond_off_put(struct tcf_bond_off *bond_off)
+{
+   struct bond_net *bn;
+
+   if (--bond_off->refcnt)
+   return;
+   bn = net_generic(dev_net(bond_off->netdev), bond_net_id);
+   rhashtable_remove_fast(&bn->bond_offload_ht, &bond_off->ht_node,
+  tcf_bond_ht_params);
+   kfree(bond_off);
+}
+
+static int tcf_bond_off_cb_add(struct tcf_bond_off *bond_off,
+  tc_setup_cb_t *cb, void *cb_priv)
+{
+   struct tcf_bond_cb *bond_cb;
+
+   bond_cb = tcf_bond_off_cb_lookup(bond_off, cb, cb_priv);
+   if (WARN_ON(bond_cb))
+   return -EEXIST;
+   bond_cb = kzalloc(sizeof(*bond_cb), GFP_KERNEL);
+   if (!bond_cb)
+   return -ENOMEM;
+   bond_cb->cb = cb;
+   bond_cb->cb_priv = cb_priv;
+   list_add(&bond_cb->list, &bond_off->cb_list);
+   return 0;
+}
+
+static void tcf_bond_off_cb_del(struct tcf_bond_off *bond_off,
+   tc_setup_cb_t *cb, void *cb_priv)
+{
+   struct tcf_bond_cb *bond_cb;
+
+   bond_cb = tcf_bond_off_cb_lookup(bond_off, cb, cb_priv);
+   if (WARN_ON(!bond_cb))
+   return;
+   list_del(&bond_cb->list);
+   kfree(bond_cb);
+}
+
+static int __tc_setup_cb_bond_register(const struct net_device *dev,
+  tc_setup_cb_t *cb, void *cb_priv)
+{
+   struct tcf_bond_off *bond_off = tcf_bond_off_get(dev, cb, cb_priv);
+   int err;
+
+   if (!bond_off)
+   return -ENOMEM;
+   err = tcf_bond_off_cb_add(bond_off, cb, cb_priv);
+   if (err)
+   goto err_cb_add;
+   return 0;
+
+err_cb_add:
+   tcf_bond_off_put(bond_off);
+   return err;
+}
+
+int tc_setup_cb_bond_register(const struct net_device *dev, tc_setup_cb_t *cb,
+ void *cb_priv)
+{
+   int err

[RFC net-next 4/6] nfp: add ndo_set_mac_address for representors

2018-03-05 Thread John Hurley
A representor hardware address does not have any meaning outside of the
kernel netdev/networking stack. Thus there is no need for any app specific
code for setting a representor's hardware address; the default eth_mac_addr
is sufficient.

Signed-off-by: John Hurley <john.hur...@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/nfp_net_repr.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
index 6195705..329e37d 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
@@ -266,6 +266,7 @@ const struct net_device_ops nfp_repr_netdev_ops = {
.ndo_get_vf_config  = nfp_app_get_vf_config,
.ndo_set_vf_link_state  = nfp_app_set_vf_link_state,
.ndo_set_features   = nfp_port_set_features,
+   .ndo_set_mac_address= eth_mac_addr,
 };
 
 static void nfp_repr_clean(struct nfp_repr *repr)
-- 
2.7.4



[RFC net-next 6/6] nfp: support offloading multiple rules with same cookie

2018-03-05 Thread John Hurley
If ports are bonded, the same rule with the same cookie may be offloaded
to multiple ports. Modify the rule lookup function to optionally include
an ingress netdev and a host context along with the cookie value when
searching for a rule. When a new rule is passed to the driver, the netdev
the rule is to be attached to is considered when searching for duplicates.
Conversely, when a stats update is received from the NFP card, the host
context is used alongside the cookie to map to the correct host rule.

Signed-off-by: John Hurley <john.hur...@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/flower/main.h   |  8 --
 .../net/ethernet/netronome/nfp/flower/metadata.c   | 20 +--
 .../net/ethernet/netronome/nfp/flower/offload.c| 29 --
 3 files changed, 40 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.h 
b/drivers/net/ethernet/netronome/nfp/flower/main.h
index 5fd7c1f..2c48e10 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.h
@@ -47,6 +47,7 @@
 struct net_device;
 struct nfp_app;
 
+#define NFP_FL_STATS_CTX_DONT_CARE cpu_to_be32(0xffffffff)
 #define NFP_FL_STATS_ENTRY_RS  BIT(20)
 #define NFP_FL_STATS_ELEM_RS   4
 #define NFP_FL_REPEATED_HASH_MAX   BIT(17)
@@ -166,6 +167,7 @@ struct nfp_fl_payload {
spinlock_t lock; /* lock stats */
struct nfp_fl_stats stats;
__be32 nfp_tun_ipv4_addr;
+   struct net_device *ingress_dev;
char *unmasked_data;
char *mask_data;
char *action_data;
@@ -193,12 +195,14 @@ int nfp_flower_compile_action(struct 
tc_cls_flower_offload *flow,
  struct nfp_fl_payload *nfp_flow);
 int nfp_compile_flow_metadata(struct nfp_app *app,
  struct tc_cls_flower_offload *flow,
- struct nfp_fl_payload *nfp_flow);
+ struct nfp_fl_payload *nfp_flow,
+ struct net_device *netdev);
 int nfp_modify_flow_metadata(struct nfp_app *app,
 struct nfp_fl_payload *nfp_flow);
 
 struct nfp_fl_payload *
-nfp_flower_search_fl_table(struct nfp_app *app, unsigned long 
tc_flower_cookie);
+nfp_flower_search_fl_table(struct nfp_app *app, unsigned long tc_flower_cookie,
+  struct net_device *netdev, __be32 host_ctx);
 struct nfp_fl_payload *
 nfp_flower_remove_fl_table(struct nfp_app *app, unsigned long 
tc_flower_cookie);
 
diff --git a/drivers/net/ethernet/netronome/nfp/flower/metadata.c 
b/drivers/net/ethernet/netronome/nfp/flower/metadata.c
index db977cf..21668aa 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/metadata.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/metadata.c
@@ -99,14 +99,18 @@ static int nfp_get_stats_entry(struct nfp_app *app, u32 
*stats_context_id)
 
 /* Must be called with either RTNL or rcu_read_lock */
 struct nfp_fl_payload *
-nfp_flower_search_fl_table(struct nfp_app *app, unsigned long tc_flower_cookie)
+nfp_flower_search_fl_table(struct nfp_app *app, unsigned long tc_flower_cookie,
+  struct net_device *netdev, __be32 host_ctx)
 {
struct nfp_flower_priv *priv = app->priv;
struct nfp_fl_payload *flower_entry;
 
hash_for_each_possible_rcu(priv->flow_table, flower_entry, link,
   tc_flower_cookie)
-   if (flower_entry->tc_flower_cookie == tc_flower_cookie)
+   if (flower_entry->tc_flower_cookie == tc_flower_cookie &&
+   (!netdev || flower_entry->ingress_dev == netdev) &&
+   (host_ctx == NFP_FL_STATS_CTX_DONT_CARE ||
+flower_entry->meta.host_ctx_id == host_ctx))
return flower_entry;
 
return NULL;
@@ -121,13 +125,11 @@ nfp_flower_update_stats(struct nfp_app *app, struct 
nfp_fl_stats_frame *stats)
flower_cookie = be64_to_cpu(stats->stats_cookie);
 
rcu_read_lock();
-   nfp_flow = nfp_flower_search_fl_table(app, flower_cookie);
+   nfp_flow = nfp_flower_search_fl_table(app, flower_cookie, NULL,
+ stats->stats_con_id);
if (!nfp_flow)
goto exit_rcu_unlock;
 
-   if (nfp_flow->meta.host_ctx_id != stats->stats_con_id)
-   goto exit_rcu_unlock;
-
+   spin_lock(&nfp_flow->lock);
nfp_flow->stats.pkts += be32_to_cpu(stats->pkt_count);
nfp_flow->stats.bytes += be64_to_cpu(stats->byte_count);
@@ -317,7 +319,8 @@ nfp_check_mask_remove(struct nfp_app *app, char *mask_data, 
u32 mask_len,
 
 int nfp_compile_flow_metadata(struct nfp_app *app,
  struct tc_cls_flower_offload *flow,
- struct nfp_fl_payload *nfp_flow)
+ st

[RFC net-next 3/6] drivers: net: bonding: restrict bond mods when rules are offloaded

2018-03-05 Thread John Hurley
When a bond has tc rules offloaded to its slaves, prevent new slaves being
added. To remove a slave from a bond, the offloaded rules must first be
deleted. For the case where a slave port on a bond is unregistered from
the kernel, flush all offloaded rules and destroy the bond.

Signed-off-by: John Hurley <john.hur...@netronome.com>
---
 drivers/net/bonding/bond_main.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index d9e41cf..4c146b1 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1607,6 +1607,15 @@ int bond_enslave(struct net_device *bond_dev, struct 
net_device *slave_dev,
return -EPERM;
}
 
+   /* check for TC offloaded */
+   if (bond_get_offload_cnt(bond)) {
+   NL_SET_ERR_MSG(extack,
+  "Cannot enslave - bond has offloaded rules.");
+   netdev_err(bond_dev,
+  "cannot enslave - bond has offloaded rules.\n");
+   return -EPERM;
+   }
+
/* vlan challenged mutual exclusion */
/* no need to lock since we're protected by rtnl_lock */
if (slave_dev->features & NETIF_F_VLAN_CHALLENGED) {
@@ -2237,6 +2246,13 @@ static int __bond_release_one(struct net_device 
*bond_dev,
 /* A wrapper used because of ndo_del_link */
 int bond_release(struct net_device *bond_dev, struct net_device *slave_dev)
 {
+   if (bond_get_offload_cnt(netdev_priv(bond_dev))) {
+   netdev_err(bond_dev,
+  "cannot release %s - has offloaded rules.\n",
+  slave_dev->name);
+   return -EPERM;
+   }
+
return __bond_release_one(bond_dev, slave_dev, false, false);
 }
 
@@ -3325,6 +3341,8 @@ static int bond_slave_netdev_event(unsigned long event,
case NETDEV_UNREGISTER:
if (bond_dev->type != ARPHRD_ETHER)
bond_release_and_destroy(bond_dev, slave_dev);
+   else if (bond_get_offload_cnt(bond))
+   unregister_netdevice_queue(bond_dev, NULL);
else
__bond_release_one(bond_dev, slave_dev, false, true);
break;
-- 
2.7.4



Re: [PATCH net-next 2/7] nfp: compile flower vxlan tunnel metadata match fields

2017-09-26 Thread John Hurley
On Tue, Sep 26, 2017 at 4:33 PM, Or Gerlitz <gerlitz...@gmail.com> wrote:
> On Tue, Sep 26, 2017 at 6:11 PM, John Hurley <john.hur...@netronome.com> 
> wrote:
>> On Tue, Sep 26, 2017 at 3:12 PM, Or Gerlitz <gerlitz...@gmail.com> wrote:
>>> On Tue, Sep 26, 2017 at 4:58 PM, John Hurley <john.hur...@netronome.com> 
>>> wrote:
>>>> On Mon, Sep 25, 2017 at 7:35 PM, Or Gerlitz <gerlitz...@gmail.com> wrote:
>>>>> On Mon, Sep 25, 2017 at 1:23 PM, Simon Horman
>>>>> <simon.hor...@netronome.com> wrote:
>>>>>> From: John Hurley <john.hur...@netronome.com>
>>>>>>
>>>>>> Compile ovs-tc flower vxlan metadata match fields for offloading. Only
>>>>>
>>>>> anything in the nfp kernel bits has direct relation to ovs? what?
>>>>>
>>>>
>>>> Sorry, this is a typo  and should refer to TC.
>>>>
>>>>>> +++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
>>>>>> @@ -52,8 +52,25 @@
>>>>>>  BIT(FLOW_DISSECTOR_KEY_PORTS) | \
>>>>>>  BIT(FLOW_DISSECTOR_KEY_ETH_ADDRS) | \
>>>>>>  BIT(FLOW_DISSECTOR_KEY_VLAN) | \
>>>>>> +BIT(FLOW_DISSECTOR_KEY_ENC_KEYID) | \
>>>>>> +BIT(FLOW_DISSECTOR_KEY_ENC_IPV4_ADDRS) | \
>>>>>> +BIT(FLOW_DISSECTOR_KEY_ENC_IPV6_ADDRS) | \
>>>>>
>>>>> this series takes care of IPv6 tunnels too?
>>>>
>>>> IPv6 is not included in this set.
>>>> The reason the IPv6 bit is included here is to account for behavior we
>>>> have noticed in TC flower.
>>>> If, for example, I add a filter with the following match fields:
>>>> 'protocol ip flower enc_src_ip 10.0.0.1 enc_dst_ip 10.0.0.2
>>>> enc_dst_port 4789 enc_key_id 123'
>>>> The 'used_keys' value in the dissector marks both IPv4 and IPv6 encap
>>>> addresses as 'used'.
>>>> I am not sure if this is a bug in TC or that we are expected to check
>>>> the enc_control fields to determine if IPv4 or v6 addresses are used.
>>>
>>> you should have your code to check enc_control->addr_type to be
>>> FLOW_DISSECTOR_KEY_IPV4_ADDRS or IPV6_ADDRS
>>>
>>>
>>>> Including the IPv6 used_keys bit in our whitelist approach allows us
>>>> to accept legitimate IPv4 tunnel rules in these situations.
>>>
>>> mmm can please take a look on fl_init_dissector() and tell me if you
>>> see why FLOW_DISSECTOR_KEY_IPV6_ADDRS is set for ipv4 tunnels,
>>> I am not sure.
>>
>>
>> The fl_init_dissector uses the FL_KEY_SET_IF_MASKED macro to set an
>> array of keys which are then translated to the used_keys values.
>> The FL_KEY_SET_IF_MASKED takes a 'struct fl_flow_key' as input and
>> checks if any mask bits are set in a particular field - if so it
>> eventually marks it as used.
>> In struct fl_flow_key, the encap ipv4 and ipv6 addresses are
>> represented as a union of the 2.
>> Therefore, if we have masked bits set for IPv4, they are also being
>> set for the IPv6 field.
>
> I see, do you consider it a bug?

The code seems to insist that, if either IPv4 or IPv6 is in use then a
control encap key is also used:

if (FL_KEY_IS_MASKED(&mask->key, enc_ipv4) ||
    FL_KEY_IS_MASKED(&mask->key, enc_ipv6))
	FL_KEY_SET(keys, cnt, FLOW_DISSECTOR_KEY_ENC_CONTROL,
		   enc_control);

Therefore, I think it should be ok to use this to determine the IP
type in use by the tunnel.
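A sketch of the driver-side use of that guarantee, using the same dissector
helpers that appear in the quoted diff (the wrapper function and its return
convention are illustrative only, not driver code):

static int example_tun_ip_version(struct tc_cls_flower_offload *flow)
{
	struct flow_dissector_key_control *enc_ctl;

	if (!dissector_uses_key(flow->dissector,
				FLOW_DISSECTOR_KEY_ENC_CONTROL))
		return 0;	/* no tunnel match at all */

	enc_ctl = skb_flow_dissector_target(flow->dissector,
					    FLOW_DISSECTOR_KEY_ENC_CONTROL,
					    flow->key);

	if (enc_ctl->addr_type == FLOW_DISSECTOR_KEY_IPV4_ADDRS)
		return 4;	/* parse FLOW_DISSECTOR_KEY_ENC_IPV4_ADDRS */
	if (enc_ctl->addr_type == FLOW_DISSECTOR_KEY_IPV6_ADDRS)
		return 6;	/* or reject if IPv6 tunnels are unsupported */

	return -EOPNOTSUPP;
}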


Re: [PATCH net-next 2/7] nfp: compile flower vxlan tunnel metadata match fields

2017-09-26 Thread John Hurley
On Tue, Sep 26, 2017 at 3:12 PM, Or Gerlitz <gerlitz...@gmail.com> wrote:
> On Tue, Sep 26, 2017 at 4:58 PM, John Hurley <john.hur...@netronome.com> 
> wrote:
>> On Mon, Sep 25, 2017 at 7:35 PM, Or Gerlitz <gerlitz...@gmail.com> wrote:
>>> On Mon, Sep 25, 2017 at 1:23 PM, Simon Horman
>>> <simon.hor...@netronome.com> wrote:
>>>> From: John Hurley <john.hur...@netronome.com>
>>>>
>>>> Compile ovs-tc flower vxlan metadata match fields for offloading. Only
>>>
>>> anything in the npf kernel bits has direct relation to ovs? what?
>>>
>>
>> Sorry, this is a typo  and should refer to TC.
>>
>>>> +++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
>>>> @@ -52,8 +52,25 @@
>>>>  BIT(FLOW_DISSECTOR_KEY_PORTS) | \
>>>>  BIT(FLOW_DISSECTOR_KEY_ETH_ADDRS) | \
>>>>  BIT(FLOW_DISSECTOR_KEY_VLAN) | \
>>>> +BIT(FLOW_DISSECTOR_KEY_ENC_KEYID) | \
>>>> +BIT(FLOW_DISSECTOR_KEY_ENC_IPV4_ADDRS) | \
>>>> +BIT(FLOW_DISSECTOR_KEY_ENC_IPV6_ADDRS) | \
>>>
>>> this series takes care of IPv6 tunnels too?
>>
>> IPv6 is not included in this set.
>> The reason the IPv6 bit is included here is to account for behavior we
>> have noticed in TC flower.
>> If, for example, I add a filter with the following match fields:
>> 'protocol ip flower enc_src_ip 10.0.0.1 enc_dst_ip 10.0.0.2
>> enc_dst_port 4789 enc_key_id 123'
>> The 'used_keys' value in the dissector marks both IPv4 and IPv6 encap
>> addresses as 'used'.
>> I am not sure if this is a bug in TC or that we are expected to check
>> the enc_control fields to determine if IPv4 or v6 addresses are used.
>
> you should have your code to check enc_control->addr_type to be
> FLOW_DISSECTOR_KEY_IPV4_ADDRS or IPV6_ADDRS
>
>
>> Including the IPv6 used_keys bit in our whitelist approach allows us
>> to accept legitimate IPv4 tunnel rules in these situations.
>
> mmm can please take a look on fl_init_dissector() and tell me if you
> see why FLOW_DISSECTOR_KEY_IPV6_ADDRS is set for ipv4 tunnels,
> I am not sure.


The fl_init_dissector uses the FL_KEY_SET_IF_MASKED macro to set an
array of keys which are then translated to the used_keys values.
The FL_KEY_SET_IF_MASKED takes a 'struct fl_flow_key' as input and
checks if any mask bits are set in a particular field - if so it
eventually marks it as used.
In struct fl_flow_key, the encap ipv4 and ipv6 addresses are
represented as a union of the 2.
Therefore, if we have masked bits set for IPv4, they are also being
set for the IPv6 field.
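The relevant part of struct fl_flow_key (roughly, from
net/sched/cls_flower.c of that era) shows why: the IPv4 and IPv6 encap
addresses share storage, so masking one marks both members as used.

struct fl_flow_key {
	/* ... other members elided ... */
	union {
		struct flow_dissector_key_ipv4_addrs enc_ipv4;
		struct flow_dissector_key_ipv6_addrs enc_ipv6;
	};
	/* ... */
};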


>
>> If it is found to be IPv6 when the rule is parsed, it will be rejected here.


Re: [PATCH net-next 2/7] nfp: compile flower vxlan tunnel metadata match fields

2017-09-26 Thread John Hurley
On Mon, Sep 25, 2017 at 7:35 PM, Or Gerlitz <gerlitz...@gmail.com> wrote:
> On Mon, Sep 25, 2017 at 1:23 PM, Simon Horman
> <simon.hor...@netronome.com> wrote:
>> From: John Hurley <john.hur...@netronome.com>
>>
>> Compile ovs-tc flower vxlan metadata match fields for offloading. Only
>
> anything in the nfp kernel bits has direct relation to ovs? what?
>

Sorry, this is a typo and should refer to TC.

>> +++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
>> @@ -52,8 +52,25 @@
>>  BIT(FLOW_DISSECTOR_KEY_PORTS) | \
>>  BIT(FLOW_DISSECTOR_KEY_ETH_ADDRS) | \
>>  BIT(FLOW_DISSECTOR_KEY_VLAN) | \
>> +BIT(FLOW_DISSECTOR_KEY_ENC_KEYID) | \
>> +BIT(FLOW_DISSECTOR_KEY_ENC_IPV4_ADDRS) | \
>> +BIT(FLOW_DISSECTOR_KEY_ENC_IPV6_ADDRS) | \
>
> this series takes care of IPv6 tunnels too?

IPv6 is not included in this set.
The reason the IPv6 bit is included here is to account for behavior we
have noticed in TC flower.
If, for example, I add a filter with the following match fields:
'protocol ip flower enc_src_ip 10.0.0.1 enc_dst_ip 10.0.0.2
enc_dst_port 4789 enc_key_id 123'
The 'used_keys' value in the dissector marks both IPv4 and IPv6 encap
addresses as 'used'.
I am not sure if this is a bug in TC or whether we are expected to check
the enc_control fields to determine if IPv4 or v6 addresses are used.
Including the IPv6 used_keys bit in our whitelist approach allows us
to accept legitimate IPv4 tunnel rules in these situations.
If it is found to be IPv6 when the rule is parsed, it will be rejected here.


Re: [PATCH net-next 7/7] nfp: flower vxlan neighbour keep-alive

2017-09-26 Thread John Hurley
[ Reposting in plaintext only]



On Mon, Sep 25, 2017 at 7:32 PM, Or Gerlitz <gerlitz...@gmail.com> wrote:
>
> On Mon, Sep 25, 2017 at 1:23 PM, Simon Horman
> <simon.hor...@netronome.com> wrote:
> > From: John Hurley <john.hur...@netronome.com>
> >
> > Periodically receive messages containing the destination IPs of tunnels
> > that have recently forwarded traffic. Update the neighbour entries' 'used'
> > value for these IPs' next hops.
>
> Are you proactively sending keep alive messages from the driver or the
> fw? what's wrong with the probes sent by the kernel NUD subsystem?
>

Hi Or,

The messages are sent from the FW to the driver. They indicate which
offloaded tunnels are currently active.

>
> In our driver we also update the used value for neighs of offloaded
> tunnels, we do it based on flow counters for the offloaded tunnels
> which is an evidence for activity. Any reason for you not to apply a
> similar practice?


Yes, this would provide the same outcome. Because our firmware already
offered these messages, we chose to support this approach.
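A minimal sketch of what such a keep-alive handler does with each destination
IP the fw reports (the neighbour APIs are the kernel's; the function itself is
illustrative, not the actual driver routine):

#include <net/arp.h>
#include <net/neighbour.h>

static void example_tun_keep_alive(struct net_device *netdev, __be32 dst_ip)
{
	struct neighbour *n;

	n = neigh_lookup(&arp_tbl, &dst_ip, netdev);
	if (!n)
		return;

	/* Refresh the entry so the kernel does not age out the next hop
	 * of a tunnel that is still passing offloaded traffic.
	 */
	neigh_event_send(n, NULL);
	neigh_release(n);
}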

>
>
> Or.