date:20161225

[ovs-dev] [PATCH v2 6/6] ovn: distributed NAT flows

2016-12-25 Thread Mickey Spiegel

This patch implements the flows required in the ingress and egress
pipeline stages in order to support NAT on a distributed logical router.

NAT functionality is associated with the logical router gateway port.
The flows that carry out NAT functionality all have match conditions on
inport or outport equal to the logical router gateway port.  There are
additional flows that are used to redirect traffic when necessary,
using the tunnel key of a "chassisredirect" SB port binding in order to
redirect traffic to the instance of the logical router gateway port on
the centralized "redirect-chassis".

North/south traffic subject to one-to-one "dnat_and_snat" is handled
in a distributed manner, with south-to-north traffic going to the
local instance of the logical router gateway port.  North/south
traffic subject to (possibly one-to-many) "snat" is handled in a
centralized manner, with south-to-north traffic going to the instance
of the logical router gateway port on the "redirect-chassis".
North-to-south traffic is directed to the corresponding chassis by
limiting ARP responses to the appropriate instance of the logical
router gateway port on one chassis.  For centralized NAT rules, this
is the instance on the "redirect-chassis".  For distributed NAT rules,
this is the chassis where the corresponding logical port resides, using
an ethernet address specified in the NB NAT rule to trigger upstream
MAC learning.
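
As a rough illustration of the split described above, the following C
sketch classifies a NAT rule as distributed or centralized and picks the
port whose chassis should answer ARP for the external IP.  The
struct nat_rule type, its fields, and both helpers are hypothetical
stand-ins rather than the ovn-northd code.  The sketch assumes that a
rule is distributed only when it is of type "dnat_and_snat" and
specifies both a logical port and an external ethernet address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of an NB NAT rule. */
struct nat_rule {
    const char *type;          /* "snat", "dnat", or "dnat_and_snat". */
    const char *logical_port;  /* Port behind the NAT, or NULL. */
    const char *external_mac;  /* Ethernet address used for ARP, or NULL. */
};

/* Assumption: only one-to-one rules that name both a logical port and an
 * external MAC can be handled on the chassis where that port resides. */
static bool
nat_is_distributed(const struct nat_rule *nat)
{
    return !strcmp(nat->type, "dnat_and_snat")
           && nat->logical_port != NULL
           && nat->external_mac != NULL;
}

/* Returns the name of the port whose chassis should answer ARP requests
 * for the rule's external IP: the rule's own logical port for distributed
 * rules, otherwise the (hypothetical) redirect port name passed in. */
static const char *
arp_responder_port(const struct nat_rule *nat, const char *redirect_port)
{
    return nat_is_distributed(nat) ? nat->logical_port : redirect_port;
}
```

Keeping the classification in one predicate means the ARP decision and
the redirect decision cannot disagree about which rules are distributed.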

East/west NAT traffic is all handled in a centralized manner.  While it
is certainly possible to handle some of this traffic in a distributed
manner, the centralized approach keeps the NAT flows simpler and
cleaner.  The expectation is that east/west NAT traffic is not as
important to optimize as north/south NAT traffic, with most east/west
traffic not requiring NAT.

Automated tests are currently limited to only a single node.  The
single node automated tests cover both north/south and east/west
traffic flows.

Signed-off-by: Mickey Spiegel 
---
 ovn/controller/ovn-controller.c |   6 +-
 ovn/northd/ovn-northd.8.xml | 293 ++--
 ovn/northd/ovn-northd.c | 365 ++--
 ovn/ovn-nb.ovsschema|   6 +-
 ovn/ovn-nb.xml  |  40 -
 tests/system-ovn.at | 338 +
 6 files changed, 974 insertions(+), 74 deletions(-)

diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-controller.c
index 131c900..5fd517c 100644
--- a/ovn/controller/ovn-controller.c
+++ b/ovn/controller/ovn-controller.c
@@ -302,10 +302,8 @@ update_ct_zones(struct sset *lports, const struct hmap 
*local_datapaths,
 /* Local patched datapath (gateway routers) need zones assigned. */
 const struct local_datapath *ld;
 HMAP_FOR_EACH (ld, hmap_node, local_datapaths) {
-if (!ld->has_local_l3gateway) {
-continue;
-}
-
+/* XXX Add method to limit zone assignment to logical router
+ * datapaths with NAT */
 char *dnat = alloc_nat_zone_key(&ld->datapath->header_.uuid, "dnat");
 char *snat = alloc_nat_zone_key(&ld->datapath->header_.uuid, "snat");
 sset_add(_users, dnat);
diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
index b8af946..432ceeb 100644
--- a/ovn/northd/ovn-northd.8.xml
+++ b/ovn/northd/ovn-northd.8.xml
@@ -867,6 +867,15 @@ output;
 P && (eth.mcast || eth.dst ==
 E), with action next;.
   
+
+  
+For each dnat or dnat_and_snat NAT
+rule on a distributed router that specifies an external
+Ethernet address E, a priority-50 flow that matches
+inport == GW &&
+eth.dst == E, where GW is the
+logical router gateway port, with action next;.
+  
 
 
 
@@ -1031,6 +1040,50 @@ outport = P;
 flags.loopback = 1;
 output;
 
+
+
+  For the gateway port on a distributed logical router with NAT
+  (where one of the logical router ports specifies a
+  redirect-chassis):
+
+
+
+  
+If the corresponding NAT rule cannot be handled in a
+distributed manner, then this flow is only programmed on
+the gateway port instance on the
+redirect-chassis.  This behavior avoids
+generation of multiple ARP responses from different chassis,
+and allows upstream MAC learning to point to the
+redirect-chassis.
+  
+
+  
+
+  If the corresponding NAT rule can be handled in a distributed
+  manner, then this flow is only programmed on the gateway port
+  instance where the logical_port specified in the
+  NAT rule resides.
+
+
+
+  Some of the actions are different for this case, using the
+  external_mac specified in the NAT rule rather
+  than the gateway

[ovs-dev] [PATCH v2 2/6] ovn: Introduce "chassisredirect" port binding

2016-12-25 Thread Mickey Spiegel

Currently OVN handles all logical router ports in a distributed manner,
creating instances on each chassis.  The logical router ingress and
egress pipelines are traversed locally on the source chassis.

In order to support advanced features such as one-to-many NAT (aka IP
masquerading), where multiple private IP addresses spread across
multiple chassis are mapped to one public IP address, it will be
necessary to handle some of the logical router processing on a specific
chassis in a centralized manner.

The goal of this patch is to develop abstractions that allow for a
subset of router gateway traffic to be handled in a centralized manner
(e.g. one-to-many NAT traffic), while allowing for other subsets of
router gateway traffic to be handled in a distributed manner (e.g.
floating IP traffic).

This patch introduces a new type of SB port_binding called
"chassisredirect".  A "chassisredirect" port represents a particular
instance, bound to a specific chassis, of an otherwise distributed
port.  The ovn-controller on that chassis populates the "chassis"
column for this record as an indication for other ovn-controllers of
its physical location.  Other ovn-controllers do not treat this port
as a local port.

A "chassisredirect" port should never be used as an "inport".  When an
ingress pipeline sets the "outport", it may set the value to a logical
port of type "chassisredirect".  This will cause the packet to be
directed to a specific chassis to carry out the egress logical router
pipeline, in the same way that a logical switch forwards egress traffic
to a VIF port residing on a specific chassis.  At the beginning of the
egress pipeline, the "outport" will be reset to the value of the
distributed port.

For outbound traffic to be handled in a centralized manner, the
"outport" should be set to the "chassisredirect" port representing
centralized gateway functionality in the otherwise distributed router.
For outbound traffic to be handled in a distributed manner, locally on
the source chassis, the "outport" should be set to the existing "patch"
port representing distributed gateway functionality.
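
The outport handling described above can be sketched in C as follows.
This is a minimal model under stated assumptions, not the ovn-northd or
ovn-controller implementation: struct lport, enum port_type, and both
functions are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical port kinds relevant to a distributed gateway port. */
enum port_type { PORT_PATCH, PORT_CHASSISREDIRECT };

/* Hypothetical logical port record. */
struct lport {
    const char *name;
    enum port_type type;
    const struct lport *distributed;  /* For a chassisredirect port, the
                                       * distributed port it represents. */
};

/* Ingress side: centralized traffic targets the chassisredirect port,
 * distributed traffic targets the ordinary patch port. */
static const struct lport *
select_outport(const struct lport *patch, const struct lport *redirect,
               bool centralized)
{
    return centralized ? redirect : patch;
}

/* Start of the egress pipeline: a chassisredirect outport is reset to the
 * distributed port, so later stages see its IP and MAC addresses. */
static const struct lport *
egress_outport(const struct lport *outport)
{
    return outport->type == PORT_CHASSISREDIRECT
           ? outport->distributed
           : outport;
}
```

In this model the chassisredirect port only influences where the packet
is delivered.  Everything after the reset runs in the context of the
distributed port.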

Inbound traffic will be directed to the appropriate chassis by
restricting source MAC address usage and ARP responses to that chassis,
or by running dynamic routing protocols.

Note that "chassisredirect" ports have no associated IP or MAC addresses.
Any pipeline stages that depend on port specific IP or MAC addresses
should be carried out in the context of the distributed port.

Although the abstraction represented by the "chassisredirect" port
binding is generalized, in this patch the "chassisredirect" port binding
is only created for NB logical router ports that specify the new
"redirect-chassis" option.  There is no explicit notion of a
"chassisredirect" port in the NB database.  The expectation is that
when capabilities are implemented that take advantage of "chassisredirect"
ports (e.g. NAT), the addition of flows specifying a "chassisredirect"
port as the outport will also be triggered by the presence of the
"redirect-chassis" option.

Signed-off-by: Mickey Spiegel 
---
 ovn/controller/binding.c|   8 ++
 ovn/controller/ovn-controller.c |   4 +
 ovn/controller/physical.c   |  63 +
 ovn/northd/ovn-northd.8.xml |  75 ++-
 ovn/northd/ovn-northd.c | 195 +--
 ovn/ovn-nb.ovsschema|   9 +-
 ovn/ovn-nb.xml  |  30 +
 ovn/ovn-sb.xml  |  35 +
 ovn/utilities/ovn-trace.c   |  43 +-
 tests/ovn.at| 287 
 10 files changed, 736 insertions(+), 13 deletions(-)

diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c
index 2f24e9d..25592c2 100644
--- a/ovn/controller/binding.c
+++ b/ovn/controller/binding.c
@@ -355,6 +355,14 @@ consider_local_datapath(struct controller_ctx *ctx,
 add_local_datapath(ldatapaths, lports, binding_rec->datapath,
false, local_datapaths);
 }
+} else if (!strcmp(binding_rec->type, "chassisredirect")) {
+const char *chassis_id = smap_get(&binding_rec->options,
+  "redirect-chassis");
+our_chassis = chassis_id && !strcmp(chassis_id, chassis_rec->name);
+if (our_chassis) {
+add_local_datapath(ldatapaths, lports, binding_rec->datapath,
+   false, local_datapaths);
+}
 } else if (!strcmp(binding_rec->type, "l3gateway")) {
 const char *chassis_id = smap_get(&binding_rec->options,
   "l3gateway-chassis");
diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-controller.c
index 5bddcf3..131c900 100644
--- a/ovn/controller/ovn-controller.c
+++ b/ovn/controller/ovn-controller.c
@@ -155,6 +155,10 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
 sbrec_port_binding_add_clause_options(,

[ovs-dev] [PATCH v2 4/6] ovn: move load balancing flows after NAT flows

2016-12-25 Thread Mickey Spiegel

This will make it easy for distributed NAT to reuse some of the
existing code for NAT flows, while leaving load balancing and defrag
as functionality specific to gateway routers.  There is no intent to
change any functionality in this patch.

Signed-off-by: Mickey Spiegel 
---
 ovn/northd/ovn-northd.c | 140 
 1 file changed, 70 insertions(+), 70 deletions(-)

diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 6779d46..a333d1c 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -4068,76 +4068,6 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap 
*ports,
 const char *lb_force_snat_ip = get_force_snat_ip(od, "lb",
  _ip);
 
-/* A set to hold all ips that need defragmentation and tracking. */
-struct sset all_ips = SSET_INITIALIZER(&all_ips);
-
-for (int i = 0; i < od->nbr->n_load_balancer; i++) {
-struct nbrec_load_balancer *lb = od->nbr->load_balancer[i];
-struct smap *vips = &lb->vips;
-struct smap_node *node;
-
-SMAP_FOR_EACH (node, vips) {
-uint16_t port = 0;
-
-/* node->key contains IP:port or just IP. */
-char *ip_address = NULL;
-ip_address_and_port_from_lb_key(node->key, _address, );
-if (!ip_address) {
-continue;
-}
-
-if (!sset_contains(&all_ips, ip_address)) {
-sset_add(&all_ips, ip_address);
-}
-
-/* Higher priority rules are added for load-balancing in DNAT
- * table.  For every match (on a VIP[:port]), we add two flows
- * via add_router_lb_flow().  One flow is for specific matching
- * on ct.new with an action of "ct_lb($targets);".  The other
- * flow is for ct.est with an action of "ct_dnat;". */
-ds_clear();
-ds_put_format(, "ct_lb(%s);", node->value);
-
-ds_clear();
-ds_put_format(, "ip && ip4.dst == %s",
-  ip_address);
-free(ip_address);
-
-if (port) {
-if (lb->protocol && !strcmp(lb->protocol, "udp")) {
-ds_put_format(, " && udp && udp.dst == %d",
-  port);
-} else {
-ds_put_format(, " && tcp && tcp.dst == %d",
-  port);
-}
-add_router_lb_flow(lflows, od, , , 120,
-   lb_force_snat_ip);
-} else {
-add_router_lb_flow(lflows, od, , , 110,
-   lb_force_snat_ip);
-}
-}
-}
-
-/* If there are any load balancing rules, we should send the
- * packet to conntrack for defragmentation and tracking.  This helps
- * with two things.
- *
- * 1. With tracking, we can send only new connections to pick a
- *DNAT ip address from a group.
- * 2. If there are L4 ports in load balancing rules, we need the
- *defragmentation to match on L4 ports. */
-const char *ip_address;
-SSET_FOR_EACH(ip_address, &all_ips) {
-ds_clear();
-ds_put_format(, "ip && ip4.dst == %s", ip_address);
-ovn_lflow_add(lflows, od, S_ROUTER_IN_DEFRAG,
-  100, ds_cstr(), "ct_next;");
-}
-
-sset_destroy(&all_ips);
-
 for (int i = 0; i < od->nbr->n_nat; i++) {
 const struct nbrec_nat *nat;
 
@@ -4292,6 +4222,76 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap 
*ports,
 * routing in the openflow pipeline. */
 ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 50,
   "ip", "flags.loopback = 1; ct_dnat;");
+
+/* A set to hold all ips that need defragmentation and tracking. */
+struct sset all_ips = SSET_INITIALIZER(&all_ips);
+
+for (int i = 0; i < od->nbr->n_load_balancer; i++) {
+struct nbrec_load_balancer *lb = od->nbr->load_balancer[i];
+struct smap *vips = &lb->vips;
+struct smap_node *node;
+
+SMAP_FOR_EACH (node, vips) {
+uint16_t port = 0;
+
+/* node->key contains IP:port or just IP. */
+char *ip_address = NULL;
+ip_address_and_port_from_lb_key(node->key, _address, );
+if (!ip_address) {
+continue;
+}
+
+if (!sset_contains(&all_ips, ip_address)) {
+sset_add(&all_ips, ip_address);
+}
+
+/* Higher priority rules are added for load-balancing in DNAT
+

[ovs-dev] [PATCH v2 3/6] ovn: add egress loopback capability

2016-12-25 Thread Mickey Spiegel

This patch adds the capability to force loopback at the end of the
egress pipeline.  A new flags.force_egress_loopback symbol is defined,
along with corresponding flags bits.  When flags.force_egress_loopback
is set, at OFTABLE_LOG_TO_PHY, instead of the packet being sent out to
the peer patch port or out the outport, the packet is forced back to
the beginning of the ingress pipeline with inport = outport.  All
other registers are cleared, as if the packet just arrived on that
inport.

This capability is needed in order to implement some of the east/west
distributed NAT flows.

Note: The existing flags.loopback allows a packet to go from the end
of the ingress pipeline to the beginning of the egress pipeline with
outport = inport, which is different.
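
To make the register handling concrete, here is a small C model of the
forced egress loopback step.  It is a sketch only: struct pkt_meta, the
register count, and the flag bit positions are hypothetical, and the
real logic is expressed as OpenFlow actions in physical.c rather than
as a C function operating on a struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define N_LOG_REGS 10                              /* Assumed register count. */
#define FLAG_FORCE_EGRESS_LOOPBACK    (1u << 0)    /* Hypothetical bit. */
#define FLAG_EGRESS_LOOPBACK_OCCURRED (1u << 1)    /* Hypothetical bit. */

/* Hypothetical per-packet logical metadata. */
struct pkt_meta {
    uint32_t inport;
    uint32_t outport;
    uint32_t flags;
    uint32_t regs[N_LOG_REGS];
};

/* At the end of the egress pipeline: if the force flag is set, make the
 * packet look as if it had just arrived on the port it was about to leave
 * from.  Returns true if the packet should re-enter the ingress pipeline. */
static bool
force_egress_loopback(struct pkt_meta *md)
{
    if (!(md->flags & FLAG_FORCE_EGRESS_LOOPBACK)) {
        return false;               /* Normal output path. */
    }
    md->inport = md->outport;       /* inport = outport. */
    md->outport = 0;
    md->flags = FLAG_EGRESS_LOOPBACK_OCCURRED;
    memset(md->regs, 0, sizeof md->regs);
    return true;
}
```

Replacing the whole flags word also clears the force bit, so in this
model a looped-back packet cannot loop again unless a later stage sets
the flag anew.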

Initially, there are no tests incorporated in this patch.  This
functionality is tested in a subsequent distributed NAT flows patch.
Tests specific to egress loopback may be added once the capability
to inject a packet with one of the flags bits set is added.

Signed-off-by: Mickey Spiegel 
---
 ovn/controller/physical.c   | 38 ++
 ovn/lib/logical-fields.c|  8 
 ovn/lib/logical-fields.h| 14 ++
 ovn/northd/ovn-northd.8.xml |  4 +++-
 ovn/northd/ovn-northd.c |  2 ++
 ovn/ovn-sb.xml  |  2 +-
 ovn/utilities/ovn-trace.c   | 41 ++---
 7 files changed, 92 insertions(+), 17 deletions(-)

diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c
index 3ea1290..cba1c0e 100644
--- a/ovn/controller/physical.c
+++ b/ovn/controller/physical.c
@@ -183,7 +183,7 @@ get_zone_ids(const struct sbrec_port_binding *binding,
 }
 
 static void
-put_local_common_flows(uint32_t dp_key, uint32_t port_key,
+put_local_common_flows(uint32_t dp_key, uint32_t port_key, ofp_port_t ofport,
bool nested_container, const struct zone_ids *zone_ids,
struct ofpbuf *ofpacts_p, struct hmap *flow_table)
 {
@@ -258,6 +258,36 @@ put_local_common_flows(uint32_t dp_key, uint32_t port_key,
 put_resubmit(OFTABLE_LOG_TO_PHY, ofpacts_p);
 put_stack(MFF_IN_PORT, ofpact_put_STACK_POP(ofpacts_p));
 ofctrl_add_flow(flow_table, OFTABLE_SAVE_INPORT, 100, , ofpacts_p);
+
+/* Table 65, Priority 150.
+ * ===
+ *
+ * Send packets with MLF_FORCE_EGRESS_LOOPBACK flag back to the
+ * ingress pipeline with inport = outport. */
+
+match_init_catchall();
+ofpbuf_clear(ofpacts_p);
+match_set_metadata(, htonll(dp_key));
+match_set_reg(, MFF_LOG_OUTPORT - MFF_REG0, port_key);
+match_set_reg_masked(, MFF_LOG_FLAGS - MFF_REG0,
+ MLF_FORCE_EGRESS_LOOPBACK, MLF_FORCE_EGRESS_LOOPBACK);
+
+size_t clone_ofs = ofpacts_p->size;
+struct ofpact_nest *clone = ofpact_put_CLONE(ofpacts_p);
+put_load(ofport, MFF_IN_PORT, 0, 16, ofpacts_p);
+put_load(port_key, MFF_LOG_INPORT, 0, 32, ofpacts_p);
+put_load(0, MFF_LOG_OUTPORT, 0, 32, ofpacts_p);
+put_load(MLF_EGRESS_LOOPBACK_OCCURRED, MFF_LOG_FLAGS, 0, 32, ofpacts_p);
+for (int i = 0; i < MFF_N_LOG_REGS; i++) {
+put_load(0, MFF_LOG_REG0 + i, 0, 32, ofpacts_p);
+}
+put_resubmit(OFTABLE_LOG_INGRESS_PIPELINE, ofpacts_p);
+clone = ofpbuf_at_assert(ofpacts_p, clone_ofs, sizeof *clone);
+ofpacts_p->header = clone;
+ofpact_finish_CLONE(ofpacts_p, );
+
+ofctrl_add_flow(flow_table, OFTABLE_LOG_TO_PHY, 150,
+, ofpacts_p);
 }
 
 static void
@@ -320,7 +350,7 @@ consider_port_binding(enum mf_field_id mff_ovn_geneve,
 }
 
 struct zone_ids binding_zones = get_zone_ids(binding, ct_zones);
-put_local_common_flows(dp_key, port_key, false, &binding_zones,
+put_local_common_flows(dp_key, port_key, 0, false, &binding_zones,
ofpacts_p, flow_table);
 
 match_init_catchall();
@@ -489,8 +519,8 @@ consider_port_binding(enum mf_field_id mff_ovn_geneve,
  */
 
 struct zone_ids zone_ids = get_zone_ids(binding, ct_zones);
-put_local_common_flows(dp_key, port_key, nested_container, &zone_ids,
-   ofpacts_p, flow_table);
+put_local_common_flows(dp_key, port_key, ofport, nested_container,
+   &zone_ids, ofpacts_p, flow_table);
 
 /* Table 0, Priority 150 and 100.
  * ==
diff --git a/ovn/lib/logical-fields.c b/ovn/lib/logical-fields.c
index fa134d6..c056e41 100644
--- a/ovn/lib/logical-fields.c
+++ b/ovn/lib/logical-fields.c
@@ -96,6 +96,14 @@ ovn_init_symtab(struct shash *symtab)
  MLF_FORCE_SNAT_FOR_LB_BIT);
 expr_symtab_add_subfield(symtab, "flags.force_snat_for_lb", NULL,
  flags_str);
+snprintf(flags_str, sizeof flags_str, "flags[%d]",
+ MLF_FORCE_EGRESS_LOOPBACK_BIT);
+expr_symtab_add_subfield(symtab,

[ovs-dev] [PATCH v2 1/6] ovn: add is_chassis_resident match expression component

2016-12-25 Thread Mickey Spiegel

This patch introduces a new match expression component
is_chassis_resident().  Unlike match expression comparisons,
is_chassis_resident is not pushed down to OpenFlow.  It is a
conditional that is evaluated in the controller during expr_simplify(),
when it is replaced by a boolean expression.  The is_chassis_resident
conditional evaluates to "true" when the specified string identifies a
port name that is resident on this controller chassis, i.e., the
corresponding southbound database Port_Binding has a chassis column
that matches this chassis.  Otherwise it evaluates to "false".

This allows higher level features to specify flows that are only
installed on some chassis rather than on all chassis with the
corresponding datapath.
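
The evaluation described above can be modeled with a short C sketch.
The types and the simplify_condition() helper below are hypothetical and
much simpler than the expr.c machinery.  They only illustrate how a
condition on a port name collapses to a boolean constant on each
chassis before any OpenFlow match is built.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a southbound Port_Binding row. */
struct port_binding {
    const char *logical_port;  /* Port name. */
    const char *chassis;       /* Chassis the port is bound to, or NULL. */
};

/* Hypothetical lookup context: known bindings plus this chassis's name. */
struct resident_ctx {
    const struct port_binding *bindings;
    size_t n_bindings;
    const char *this_chassis;
};

/* Returns true if 'port_name' names a port bound to the local chassis. */
static bool
is_chassis_resident(const struct resident_ctx *ctx, const char *port_name)
{
    for (size_t i = 0; i < ctx->n_bindings; i++) {
        const struct port_binding *pb = &ctx->bindings[i];
        if (!strcmp(pb->logical_port, port_name)) {
            return pb->chassis && !strcmp(pb->chassis, ctx->this_chassis);
        }
    }
    return false;   /* Unknown ports are never resident. */
}

/* Folds a (possibly negated) condition into a boolean constant. */
static bool
simplify_condition(const struct resident_ctx *ctx, const char *port_name,
                   bool negate)
{
    return is_chassis_resident(ctx, port_name) != negate;
}
```

Because the result is a constant, a flow whose match contains a false
condition can simply be dropped on that chassis, which is what lets the
same logical flow be installed on only some chassis.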

Suggested-by: Ben Pfaff 
Signed-off-by: Mickey Spiegel 
---
 include/ovn/expr.h  |  22 +-
 ovn/controller/lflow.c  |  39 --
 ovn/controller/lflow.h  |   5 +-
 ovn/controller/ovn-controller.c |   5 +-
 ovn/lib/expr.c  | 155 ++--
 ovn/utilities/ovn-trace.c   |  21 +-
 tests/ovn.at|  14 
 tests/test-ovn.c|  15 +++-
 8 files changed, 260 insertions(+), 16 deletions(-)

diff --git a/include/ovn/expr.h b/include/ovn/expr.h
index 371ba20..d3749fa 100644
--- a/include/ovn/expr.h
+++ b/include/ovn/expr.h
@@ -292,6 +292,15 @@ enum expr_type {
 EXPR_T_AND, /* Logical AND of 2 or more subexpressions. */
 EXPR_T_OR,  /* Logical OR of 2 or more subexpressions. */
 EXPR_T_BOOLEAN, /* True or false constant. */
+EXPR_T_CONDITION,   /* Conditional to be evaluated in the
+ * controller during expr_simplify(),
+ * prior to constructing OpenFlow matches. */
+};
+
+/* Expression condition type. */
+enum expr_cond_type {
+EXPR_COND_CHASSIS_RESIDENT, /* Check if specified logical port name is
+ * resident on the controller chassis. */
 };
 
 /* Relational operator. */
@@ -349,6 +358,14 @@ struct expr {
 
 /* EXPR_T_BOOLEAN. */
 bool boolean;
+
+/* EXPR_T_CONDITION. */
+struct {
+enum expr_cond_type type;
+bool not;
+/* XXX Should arguments for conditions be generic? */
+char *string;
+} cond;
 };
 };
 
@@ -375,7 +392,10 @@ void expr_destroy(struct expr *);
 
 struct expr *expr_annotate(struct expr *, const struct shash *symtab,
char **errorp);
-struct expr *expr_simplify(struct expr *);
+struct expr *expr_simplify(struct expr *,
+   bool (*is_chassis_resident)(const void *c_aux,
+   const char *port_name),
+   const void *c_aux);
 struct expr *expr_normalize(struct expr *);
 
 bool expr_honors_invariants(const struct expr *);
diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c
index d913998..0384c8d 100644
--- a/ovn/controller/lflow.c
+++ b/ovn/controller/lflow.c
@@ -64,12 +64,18 @@ struct lookup_port_aux {
 const struct sbrec_datapath_binding *dp;
 };
 
+struct condition_aux {
+const struct lport_index *lports;
+const struct sbrec_chassis *chassis;
+};
+
 static void consider_logical_flow(const struct lport_index *lports,
   const struct mcgroup_index *mcgroups,
   const struct sbrec_logical_flow *lflow,
   const struct hmap *local_datapaths,
   struct group_table *group_table,
   const struct simap *ct_zones,
+  const struct sbrec_chassis *chassis,
   struct hmap *dhcp_opts_p,
   struct hmap *dhcpv6_opts_p,
   uint32_t *conj_id_ofs_p,
@@ -99,6 +105,20 @@ lookup_port_cb(const void *aux_, const char *port_name, 
unsigned int *portp)
 }
 
 static bool
+is_chassis_resident_cb(const void *c_aux_, const char *port_name)
+{
+const struct condition_aux *c_aux = c_aux_;
+
+const struct sbrec_port_binding *pb
+= lport_lookup_by_name(c_aux->lports, port_name);
+if (pb && pb->chassis && pb->chassis == c_aux->chassis) {
+return true;
+}
+
+return false;
+}
+
+static bool
 is_switch(const struct sbrec_datapath_binding *ldp)
 {
 return smap_get(&ldp->external_ids, "logical-switch") != NULL;
@@ -112,6 +132,7 @@ add_logical_flows(struct controller_ctx *ctx, const struct 
lport_index *lports,
   const struct hmap *local_datapaths,
   struct group_table *group_table,
   const struct simap *ct_zones,
+  const struct sbrec_chassis *chassis,
   struct

[ovs-dev] [PATCH v2 0/6] ovn: add distributed NAT capability

2016-12-25 Thread Mickey Spiegel

Currently OVN supports NAT functionality by connecting each distributed
logical router to a centralized "l3gateway" router that resides on a
single chassis.  NAT is only carried out in the "l3gateway" router.

This patch set introduces NAT capability in the distributed logical
router itself, avoiding the need to pass through a transit logical
switch and a second logical router, and in many cases avoiding the need
to pass through a centralized chassis.

NAT functionality is associated with the logical router gateway port.
In order to support one-to-many SNAT (aka IP masquerading), where
multiple private IP addresses spread across multiple chassis are mapped
to a single public IP address, it will be necessary to handle some of
the logical router processing on a specific chassis in a centralized
manner.  Some NAT flows are handled in a distributed manner on all
chassis (following the local "patch" port as is normally done for
distributed logical routers), while other NAT flows are handled on a
centralized "redirect-chassis".

Possible future work items (hopefully not required for this patch set
to be accepted) include:
1. The NAT flows patch lifts the restriction that conntrack zones are
   only assigned to datapaths for gateway routers.  Given recent
   changes to ovn-controller, a hypervisor only sees the datapaths
   for which there is a port resident on this chassis, or datapaths
   reachable from ports resident on this chassis.  Is that good
   enough?  Or should conntrack zone assignment for datapaths be
   restricted further, perhaps only to logical router datapaths?
2. The current automated test for NAT flows is single node, so it does
   not cover the distributed functionality.  Full coverage requires a
   multi-node test with conntrack NAT capability, either in the kernel
   or userspace.  Is this possible?
   Multi-node tests have been added for the chassisredirect patch,
   testing non-NAT aspects of the distributed router gateway port.
3. Consider how to generalize distributed versus centralized handling
   of non-NAT traffic being output on the distributed gateway port.
   If MAC learning is used in the upstream network, then the
   distributed gateway port’s MAC address must be restricted to the
   redirect-chassis by using the chassisredirect port.  In the
   presence of dynamic protocols such as BGP EVPN, non-NAT traffic
   could be handled in a distributed manner.
4. Gratuitous ARP for NAT addresses needs to be updated for
   distributed NAT.
5. Add load balancing on the redirect chassis of an otherwise
   distributed logical router.

PATCH v1 -> PATCH v2
Added ovn-trace logic for chassisredirect ports, including automated test.
Added ovn-trace logic for egress loopback.
Fixed some bugs in ovn-trace register handling from ingress to egress,
and across patch ports (should these be filed separately as well?).

RFC v4 -> PATCH v1
Added egress loopback capability
Added east/west NAT tests to system-ovn.at (make check-kernel)
Added REGBIT_NAT_REDIRECT flows to IN_IP_ROUTING and IN_ARP_RESOLVE,
resolving remaining issues with east/west NAT

RFC v3 -> RFC v4
Rebased to pick up recent changes to ovn-controller, including a fix
to the localnet issue where VIFs had to be added on a chassis in order
to cause the localnet port to be instantiated.
The chassisredirect port logic was rewritten to avoid creating an
ofport.  Besides streamlining the code significantly, this fixed the
problem when the distributed port name was longer than 12 characters.
Restricted IPv6 ND replies for the router IP address to the redirect
chassis, similar to IPv4 ARP restrictions.
Added specific gateway redirect flows for unresolved ethernet
destination, so that ARP requests generated by the router are sent
through the redirect chassis regardless of NAT rules.
Relaxed checks in chassisredirect tests so that they are independent
of register assignments.
Renamed ovn-northd.c "l3gateway_port" to "l3dgw_port" in order to
avoid overlaps with gateway router terminology.

RFC v2 -> RFC v3
Reordered the first two patches.
Moved non-NAT specific flows from patch 5 to patch 2.
Added automated tests for is_chassis_resident (which is ready for
review) and chassisredirect patches.
Added flows to limit ICMP echo replies for router IPs on the gateway
interface, so that they are only generated on the redirect-chassis.

Mickey Spiegel (6):
  ovn: add is_chassis_resident match expression component
  ovn: Introduce "chassisredirect" port binding
  ovn: add egress loopback capability
  ovn: move load balancing flows after NAT flows
  ovn: avoid snat recirc only on gateway routers
  ovn: distributed NAT flows

 include/ovn/actions.h   |   3 +
 include/ovn/expr.h  |  22 +-
 ovn/controller/binding.c|   8 +
 ovn/controller/lflow.c  |  49 ++-
 ovn/controller/lflow.h  |   5 +-
 ovn/controller/ovn-controller.c |  15 +-
 ovn/controller/physical.c   | 101 +-
 ovn/lib/actions.c   |  15 +-

[ovs-dev] openvswitch is marked for autoremoval from testing

2016-12-25 Thread Debian testing autoremoval watch

openvswitch 2.3.0+git20140819-4 is marked for autoremoval from testing on 
2017-01-23

It is affected by these RC bugs:
828478: openvswitch: FTBFS with openssl 1.1.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH ovs V2 17/21] dpif-netlink: Use netdev flow get api to query a flow

2016-12-25 Thread Paul Blakey

Search all netdevs added to the datapath for a given flow using the
netdev flow API, and parse the result back into a dpif flow.
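
The search can be pictured with the following C sketch.  The
struct fake_netdev type and both functions are hypothetical stand-ins
for the netdev flow API, which is not reproduced here.  The sketch only
shows the "ask each netdev in turn" pattern with a zero-on-success
return convention.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical netdev holding a set of offloaded flow identifiers. */
struct fake_netdev {
    const char *name;
    const uint32_t *flow_ids;
    size_t n_flows;
};

/* Returns 0 if 'dev' holds flow 'id', otherwise ENOENT. */
static int
fake_flow_get(const struct fake_netdev *dev, uint32_t id)
{
    for (size_t i = 0; i < dev->n_flows; i++) {
        if (dev->flow_ids[i] == id) {
            return 0;
        }
    }
    return ENOENT;
}

/* Asks each netdev in turn and returns the first one that knows the flow,
 * or NULL if none does. */
static const struct fake_netdev *
find_flow_owner(const struct fake_netdev *devs, size_t n_devs, uint32_t id)
{
    for (size_t i = 0; i < n_devs; i++) {
        if (!fake_flow_get(&devs[i], id)) {
            return &devs[i];
        }
    }
    return NULL;
}
```

Unlike the loop in the patch, this sketch stops at the first netdev
that reports the flow.  That is a simplification chosen for the
example, not a claim about the intended behavior.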

Signed-off-by: Paul Blakey 
Reviewed-by: Roi Dayan 
---
 lib/dpif-netlink.c | 58 +-
 1 file changed, 57 insertions(+), 1 deletion(-)

diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index f8cc59d..ec54512 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -1908,6 +1908,53 @@ dpif_netlink_operate__(struct dpif_netlink *dpif,
 return n_ops;
 }
 
+static bool
+parse_flow_get(struct dpif_netlink *dpif, struct dpif_flow_get *get)
+{
+struct dpif_flow *dpif_flow = get->flow;
+struct ovs_list port_list;
+struct netdev_list_element *element;
+struct match match;
+struct nlattr *actions = 0;
+struct dpif_flow_stats stats;
+struct ofpbuf buf;
+uint64_t act_buf[1024 / 8];
+bool found = false;
+struct odputil_keybuf maskbuf;
+struct odputil_keybuf keybuf;
+struct odputil_keybuf actbuf;
+struct ofpbuf key, mask, act;
+
+ofpbuf_use_stack(, _buf, sizeof act_buf);
+netdev_hmap_port_get_list(dpif->dpif.dpif_class, &port_list);
+LIST_FOR_EACH(element, node, &port_list) {
+if (!netdev_flow_get(element->netdev, , , ,
+ (ovs_u128 *) get->ufid, )) {
+found = true;
+}
+}
+netdev_port_list_del(&port_list);
+if (!found) {
+return false;
+}
+
+VLOG_DBG("found flow from netdev, translating to dpif flow");
+
+ofpbuf_use_stack(, , sizeof keybuf);
+ofpbuf_use_stack(, , sizeof actbuf);
+ofpbuf_use_stack(, , sizeof maskbuf);
+dpif_netlink_netdev_match_to_dpif_flow(, , , actions,
+   ,
+   (ovs_u128 *) get->ufid,
+   dpif_flow,
+   false);
+ofpbuf_put(get->buffer, nl_attr_get(actions), nl_attr_get_size(actions));
+dpif_flow->actions = ofpbuf_at(get->buffer, 0, 0);
+dpif_flow->actions_len = nl_attr_get_size(actions);
+
+return true;
+}
+
 static int
 parse_key_and_mask_to_match(const struct nlattr *key, size_t key_len,
 const struct nlattr *mask, size_t mask_len,
@@ -2127,7 +2174,16 @@ try_send_to_netdev(struct dpif_netlink *dpif, struct 
dpif_op *op)
del->ufid, "DEL");
 return parse_flow_del(dpif, del);
 }
-case DPIF_OP_FLOW_GET:
+case DPIF_OP_FLOW_GET: {
+struct dpif_flow_get *get = &op->u.flow_get;
+
+if (!op->u.flow_get.ufid) {
+return false;
+}
+dbg_print_flow(get->key, get->key_len, NULL, 0, NULL, 0,
+   get->ufid, "GET");
+return parse_flow_get(dpif, get);
+}
 case DPIF_OP_EXECUTE:
 default:
 break;
-- 
1.8.3.1

[ovs-dev] [PATCH ovs V2 14/21] netdev-tc-offloads: Netdev flow put implementation using tc api

2016-12-25 Thread Paul Blakey

Signed-off-by: Paul Blakey 
Reviewed-by: Roi Dayan 
---
 lib/netdev-tc-offloads.c | 186 +--
 1 file changed, 180 insertions(+), 6 deletions(-)

diff --git a/lib/netdev-tc-offloads.c b/lib/netdev-tc-offloads.c
index b4eee98..4acc8ea 100644
--- a/lib/netdev-tc-offloads.c
+++ b/lib/netdev-tc-offloads.c
@@ -370,15 +370,189 @@ netdev_tc_flow_dump_next(struct netdev_flow_dump *dump,
 return false;
 }
 
+static int
+parse_put_flow_set_action(struct tc_flow *tc_flow, const struct nlattr *set,
+  size_t set_len)
+{
+const struct nlattr *set_attr;
+size_t set_left;
+
+NL_ATTR_FOR_EACH_UNSAFE(set_attr, set_left, set, set_len) {
+if (nl_attr_type(set_attr) == OVS_KEY_ATTR_TUNNEL) {
+const struct nlattr *tunnel = nl_attr_get(set_attr);
+const size_t tunnel_len = nl_attr_get_size(set_attr);
+const struct nlattr *tun_attr;
+size_t tun_left;
+
+tc_flow->set.set = true;
+NL_ATTR_FOR_EACH_UNSAFE(tun_attr, tun_left, tunnel, tunnel_len) {
+switch (nl_attr_type(tun_attr)) {
+case OVS_TUNNEL_KEY_ATTR_ID: {
+tc_flow->set.id = nl_attr_get_be64(tun_attr);
+}
+break;
+case OVS_TUNNEL_KEY_ATTR_IPV4_SRC: {
+tc_flow->set.ipv4_src = nl_attr_get_be32(tun_attr);
+}
+break;
+case OVS_TUNNEL_KEY_ATTR_IPV4_DST: {
+tc_flow->set.ipv4_dst = nl_attr_get_be32(tun_attr);
+}
+break;
+case OVS_TUNNEL_KEY_ATTR_TP_SRC: {
+tc_flow->set.tp_src = nl_attr_get_be16(tun_attr);
+}
+break;
+case OVS_TUNNEL_KEY_ATTR_TP_DST: {
+tc_flow->set.tp_dst = nl_attr_get_be16(tun_attr);
+}
+break;
+}
+}
+} else {
+VLOG_DBG("unsupported set action type: %d",
+ nl_attr_type(set_attr));
+return -1;
+}
+}
+return 0;
+}
+
 int
-netdev_tc_flow_put(struct netdev *netdev OVS_UNUSED,
-  struct match *match OVS_UNUSED,
-  struct nlattr *actions OVS_UNUSED,
-  size_t actions_len OVS_UNUSED,
+netdev_tc_flow_put(struct netdev *netdev,
+  struct match *match,
+  struct nlattr *actions,
+  size_t actions_len,
   struct dpif_flow_stats *stats OVS_UNUSED,
-  ovs_u128 *ufid OVS_UNUSED)
+  ovs_u128 *ufid)
 {
-return EOPNOTSUPP;
+struct tc_flow tc_flow;
+struct flow *key = &match->flow;
+struct flow *mask = &match->wc.masks;
+const struct flow_tnl *tnl = &match->flow.tunnel;
+struct nlattr *nla;
+size_t left;
+int prio = 0;
+int handle;
+int err;
+
+memset(&tc_flow, 0, sizeof(tc_flow));
+
+if (tnl->tun_id) {
+VLOG_INFO("tun_id %#"PRIx64, ntohll(tnl->tun_id));
+VLOG_DBG("tun_src "IP_FMT" tun_dst "IP_FMT,
+ IP_ARGS(tnl->ip_src), IP_ARGS(tnl->ip_dst));
+VLOG_DBG("tun_tp_src %d, tun_tp_dst %d",
+ ntohs(tnl->tp_src), ntohs(tnl->tp_dst));
+tc_flow.tunnel.id = tnl->tun_id;
+tc_flow.tunnel.ipv4_src = tnl->ip_src;
+tc_flow.tunnel.ipv4_dst = tnl->ip_dst;
+tc_flow.tunnel.tp_src = tnl->tp_src;
+tc_flow.tunnel.tp_dst = tnl->tp_dst;
+tc_flow.tunnel.tunnel = true;
+}
+
+tc_flow.key.eth_type = key->dl_type;
+tc_flow.mask.eth_type = mask->dl_type;
+
+if (mask->vlan_tci) {
+ovs_be16 vid_mask = mask->vlan_tci & htons(VLAN_VID_MASK);
+ovs_be16 pcp_mask = mask->vlan_tci & htons(VLAN_PCP_MASK);
+ovs_be16 cfi = mask->vlan_tci & htons(VLAN_CFI);
+
+if (cfi && key->vlan_tci & htons(VLAN_CFI)
+&& (!vid_mask || vid_mask == htons(VLAN_VID_MASK))
+&& (!pcp_mask || pcp_mask == htons(VLAN_PCP_MASK))
+&& (vid_mask || pcp_mask)) {
+if (vid_mask) {
+tc_flow.key.vlan_id = vlan_tci_to_vid(key->vlan_tci);
+VLOG_DBG("vlan_id: %d\n", tc_flow.key.vlan_id);
+}
+if (pcp_mask) {
+tc_flow.key.vlan_prio = vlan_tci_to_pcp(key->vlan_tci);
+VLOG_DBG("vlan_prio %d\n", tc_flow.key.vlan_prio);
+}
+tc_flow.key.encap_eth_type = key->dl_type;
+tc_flow.key.eth_type = htons(ETH_TYPE_VLAN);
+} else if (mask->vlan_tci == htons(0x) &&
+   ntohs(key->vlan_tci) == 0) {
+/* exact && no vlan */
+} else {
+/* partial mask */
+return EOPNOTSUPP;
+}
+}
+
+tc_flow.key.dst_mac = key->dl_dst;
+

[ovs-dev] [PATCH ovs V2 12/21] dpif-netlink: Use netdev flow put api to insert a flow

2016-12-25 Thread Paul Blakey

With the new netdev flow API, operate will now try to offload
flows to the relevant netdev of the input port.
Support for the other operate methods will come in later patches.

Signed-off-by: Paul Blakey 
Reviewed-by: Roi Dayan 
---
 lib/dpif-netlink.c | 232 -
 1 file changed, 228 insertions(+), 4 deletions(-)

diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index 3d8940e..717af90 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -1908,15 +1908,239 @@ dpif_netlink_operate__(struct dpif_netlink *dpif,
 return n_ops;
 }
 
+static int
+parse_key_and_mask_to_match(const struct nlattr *key, size_t key_len,
+const struct nlattr *mask, size_t mask_len,
+struct match *match)
+{
+enum odp_key_fitness fitness;
+
+fitness = odp_flow_key_to_flow(key, key_len, &match->flow);
+if (fitness) {
+/* This should not happen: it indicates that odp_flow_key_from_flow()
+ * and odp_flow_key_to_flow() disagree on the acceptable form of a
+ * flow.  Log the problem as an error, with enough details to enable
+ * debugging. */
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
+
+if (!VLOG_DROP_ERR(&rl)) {
+struct ds s;
+
+ds_init(&s);
+odp_flow_format(key, key_len, NULL, 0, NULL, &s, true);
+VLOG_ERR("internal error parsing flow key %s", ds_cstr(&s));
+ds_destroy(&s);
+}
+
+return EINVAL;
+}
+
+fitness = odp_flow_key_to_mask(mask, mask_len, &match->wc, &match->flow);
+if (fitness) {
+/* This should not happen: it indicates that
+ * odp_flow_key_from_mask() and odp_flow_key_to_mask()
+ * disagree on the acceptable form of a mask.  Log the problem
+ * as an error, with enough details to enable debugging. */
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
+
+if (!VLOG_DROP_ERR(&rl)) {
+struct ds s;
+
+VLOG_ERR("internal error parsing flow mask %s (%s)",
+ ds_cstr(&s), odp_key_fitness_to_string(fitness));
+ds_destroy(&s);
+}
+
+return EINVAL;
+}
+
+return 0;
+}
+
+static bool
+parse_flow_put(struct dpif_netlink *dpif, struct dpif_flow_put *put)
+{
+struct match match;
+odp_port_t in_port;
+const struct nlattr *nla;
+size_t left;
+int outputs = 0;
+struct ofpbuf buf;
+uint64_t act_stub[1024 / 8];
+size_t offset;
+struct nlattr *act;
+struct netdev *dev;
+int err;
+
+/* 0x1234 - fake eth type sent to probe feature */
+if (put->flags & DPIF_FP_PROBE || match.flow.dl_type == htons(0x1234)) {
+return false;
+}
+
+if (parse_key_and_mask_to_match(put->key, put->key_len, put->mask,
+put->mask_len, &match)) {
+return false;
+}
+
+in_port = match.flow.in_port.odp_port;
+ofpbuf_use_stub(&buf, act_stub, sizeof act_stub);
+offset = nl_msg_start_nested(&buf, OVS_FLOW_ATTR_ACTIONS);
+NL_ATTR_FOR_EACH(nla, left, put->actions, put->actions_len) {
+if (nl_attr_type(nla) == OVS_ACTION_ATTR_OUTPUT) {
+struct netdev *outdev;
+int ifindex_out;
+const struct netdev_tunnel_config *tnl_cfg;
+size_t out_off;
+odp_port_t out_port;
+
+outputs++;
+if (outputs > 1) {
+break;
+}
+
+out_port = nl_attr_get_u32(nla);
+outdev = netdev_hmap_port_get(out_port, dpif->dpif.dpif_class);
+tnl_cfg = netdev_get_tunnel_config(outdev);
+
+out_off = nl_msg_start_nested(&buf, OVS_ACTION_ATTR_OUTPUT);
+ifindex_out = netdev_get_ifindex(outdev);
+nl_msg_put_u32(&buf, OVS_ACTION_ATTR_OUTPUT, ifindex_out);
+if (tnl_cfg && tnl_cfg->dst_port != 0) {
+nl_msg_put_u32(&buf, OVS_TUNNEL_KEY_ATTR_TP_DST, tnl_cfg->dst_port);
+}
+nl_msg_end_nested(&buf, out_off);
+
+if (outdev) {
+netdev_close(outdev);
+}
+} else {
+nl_msg_put_unspec(&buf, nl_attr_type(nla), nl_attr_get(nla),
+  nl_attr_get_size(nla));
+}
+}
+nl_msg_end_nested(&buf, offset);
+
+if (outputs > 1) {
+return false;
+}
+
+act = ofpbuf_at_assert(&buf, offset, sizeof(struct nlattr));
+dev = netdev_hmap_port_get(in_port, dpif->dpif.dpif_class);
+err = netdev_flow_put(dev, &match, CONST_CAST(struct nlattr *,
+  nl_attr_get(act)),
+  nl_attr_get_size(act), put->stats,
+  CONST_CAST(ovs_u128 *, put->ufid));
+netdev_close(dev);
+
+if (!err) {
+if (put->flags & DPIF_FP_MODIFY) {
+struct dpif_op *opp;
+struct dpif_op op;
+
+op.type =

[ovs-dev] [PATCH ovs V2 10/21] netdev-tc-offloads: Add ufid to tc/netdev map

2016-12-25 Thread Paul Blakey

Flows offloaded to tc are identified by a priority and handle
pair, while OVS flows are identified by ufid.
Add a hash map to convert between the two, for later
retrieval and deletion of offloaded flows.
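
As a rough illustration of the mapping described above, here is a minimal
standalone sketch. It is not the code from this patch: struct ufid,
struct tc_mapping, map_find(), map_add() and map_del() are hypothetical
stand-ins for ovs_u128, struct ufid_to_tc_data and the OVS hmap helpers,
and a fixed-size table with no locking replaces the mutex-protected hmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for ovs_u128. */
struct ufid {
    uint64_t hi, lo;
};

/* Hypothetical stand-in for the patch's struct ufid_to_tc_data. */
struct tc_mapping {
    bool used;
    struct ufid ufid;
    uint16_t prio;
    uint32_t handle;
};

#define MAP_SLOTS 64            /* Fixed capacity keeps the sketch short. */
static struct tc_mapping map[MAP_SLOTS];

static size_t
slot_for(const struct ufid *ufid)
{
    return (size_t) ((ufid->hi ^ ufid->lo) % MAP_SLOTS);
}

/* Returns the entry holding 'ufid', or NULL if it is not in the map.
 * Every slot is scanned, so deleted entries cannot hide later ones. */
static struct tc_mapping *
map_find(const struct ufid *ufid)
{
    size_t start = slot_for(ufid);

    for (size_t i = 0; i < MAP_SLOTS; i++) {
        struct tc_mapping *m = &map[(start + i) % MAP_SLOTS];

        if (m->used && m->ufid.hi == ufid->hi && m->ufid.lo == ufid->lo) {
            return m;
        }
    }
    return NULL;
}

/* Adds or replaces the mapping for 'ufid'.  Returns true if an existing
 * entry was replaced, like add_ufid_tc_mapping() in the patch. */
static bool
map_add(const struct ufid *ufid, uint16_t prio, uint32_t handle)
{
    struct tc_mapping *m = map_find(ufid);
    bool replace = m != NULL;

    if (!m) {
        size_t start = slot_for(ufid);

        for (size_t i = 0; i < MAP_SLOTS && !m; i++) {
            struct tc_mapping *c = &map[(start + i) % MAP_SLOTS];

            if (!c->used) {
                m = c;
            }
        }
    }
    assert(m != NULL);          /* Sketch only: the table must not be full. */
    m->used = true;
    m->ufid = *ufid;
    m->prio = prio;
    m->handle = handle;
    return replace;
}

/* Removes the mapping for 'ufid'.  Returns true if one existed. */
static bool
map_del(const struct ufid *ufid)
{
    struct tc_mapping *m = map_find(ufid);

    if (!m) {
        return false;
    }
    memset(m, 0, sizeof *m);
    return true;
}
```

A caller would record (prio, handle) with map_add() after a successful tc
insert and look the pair up again with map_find() before a get or delete.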

Signed-off-by: Paul Blakey 
Reviewed-by: Roi Dayan 
---
 lib/netdev-tc-offloads.c | 98 
 1 file changed, 98 insertions(+)

diff --git a/lib/netdev-tc-offloads.c b/lib/netdev-tc-offloads.c
index 4ba6086..f470aa3 100644
--- a/lib/netdev-tc-offloads.c
+++ b/lib/netdev-tc-offloads.c
@@ -75,6 +75,104 @@
 
 VLOG_DEFINE_THIS_MODULE(netdev_tc_offloads);
 
+static struct hmap ufid_to_tc = HMAP_INITIALIZER(&ufid_to_tc);
+static struct ovs_mutex ufid_lock = OVS_MUTEX_INITIALIZER;
+
+struct ufid_to_tc_data {
+struct hmap_node node;
+ovs_u128 ufid;
+uint16_t prio;
+uint32_t handle;
+struct netdev *netdev;
+};
+
+static bool
+del_ufid_tc_mapping(ovs_u128 *ufid)
+{
+size_t hash = hash_bytes(ufid, sizeof *ufid, 0);
+struct ufid_to_tc_data *data;
+
+ovs_mutex_lock(&ufid_lock);
+HMAP_FOR_EACH_WITH_HASH(data, node, hash, &ufid_to_tc) {
+if (ovs_u128_equals(*ufid, data->ufid)) {
+break;
+}
+}
+if (data) {
+hmap_remove(&ufid_to_tc, &data->node);
+ovs_mutex_unlock(&ufid_lock);
+netdev_close(data->netdev);
+free(data);
+return true;
+}
+ovs_mutex_unlock(&ufid_lock);
+return false;
+}
+
+static ovs_u128 *
+find_ufid(int prio, int handle, struct netdev *netdev)
+{
+int ifindex = netdev_get_ifindex(netdev);
+struct ufid_to_tc_data *data;
+
+ovs_mutex_lock(&ufid_lock);
+HMAP_FOR_EACH(data, node, &ufid_to_tc) {
+if (data->prio == prio && data->handle == handle
+&& netdev_get_ifindex(data->netdev) == ifindex) {
+break;
+}
+}
+ovs_mutex_unlock(&ufid_lock);
+if (data) {
+return &data->ufid;
+}
+return NULL;
+}
+
+static int
+get_ufid_tc_mapping(ovs_u128 *ufid, int *prio, struct netdev **netdev)
+{
+size_t hash = hash_bytes(ufid, sizeof *ufid, 0);
+struct ufid_to_tc_data *data;
+
+ovs_mutex_lock(&ufid_lock);
+HMAP_FOR_EACH_WITH_HASH(data, node, hash, &ufid_to_tc) {
+if (ovs_u128_equals(*ufid, data->ufid)) {
+break;
+}
+}
+ovs_mutex_unlock(&ufid_lock);
+if (data) {
+if (prio) {
+*prio = data->prio;
+}
+if (netdev) {
+*netdev = netdev_ref(data->netdev);
+}
+return data->handle;
+}
+return 0;
+}
+
+static bool
+add_ufid_tc_mapping(ovs_u128 *ufid, int prio, int handle, struct netdev *netdev)
+{
+size_t hash = hash_bytes(ufid, sizeof *ufid, 0);
+bool replace = del_ufid_tc_mapping(ufid);
+struct ufid_to_tc_data *new_data = xzalloc(sizeof *new_data);
+
+new_data->ufid = *ufid;
+new_data->prio = prio;
+new_data->handle = handle;
+new_data->netdev = netdev_ref(netdev);
+
+ovs_mutex_lock(&ufid_lock);
+hmap_insert(&ufid_to_tc, &new_data->node, hash);
+ovs_mutex_unlock(&ufid_lock);
+
+return replace;
+}
+
 int
 netdev_tc_flow_flush(struct netdev *netdev)
 {
-- 
1.8.3.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH ovs V2 19/21] dpctl: read vswitch config on start

2016-12-25 Thread Paul Blakey

Use the Open vSwitch IDL pattern to read the OVS configuration when dpctl
starts. This is needed because some functionality depends on that configuration.

Signed-off-by: Paul Blakey 
Reviewed-by: Roi Dayan 
---
 lib/dpctl.c   | 44 
 lib/dpctl.h   |  2 ++
 utilities/ovs-dpctl.c |  2 ++
 3 files changed, 48 insertions(+)

diff --git a/lib/dpctl.c b/lib/dpctl.c
index edccb7f..a892632 100644
--- a/lib/dpctl.c
+++ b/lib/dpctl.c
@@ -50,6 +50,10 @@
 #include "unixctl.h"
 #include "util.h"
 #include "openvswitch/ofp-parse.h"
+#include "ovsdb-idl.h"
+#include "vswitch-idl.h"
+#include "db-ctl-base.h"
+#include "tc.h"
 
 typedef int dpctl_command_handler(int argc, const char *argv[],
   struct dpctl_params *);
@@ -1645,6 +1649,46 @@ static const struct dpctl_command *get_all_dpctl_commands(void)
 return all_commands;
 }
 
+int
+dpctl_read_db()
+{
+char *db = ctl_default_db();
+struct ovsdb_idl *idl = ovsdb_idl_create(db, &ovsrec_idl_class, true,
+ true);
+ovsdb_idl_track_add_all(idl);
+unsigned int seqno = ovsdb_idl_get_seqno(idl);
+const struct ovsrec_open_vswitch *cfg;
+
+for (;;) {
+/* synchronize OVSDB */
+ovsdb_idl_run(idl);
+
+if (!ovsdb_idl_is_alive(idl)) {
+int retval = ovsdb_idl_get_last_error(idl);
+
+ctl_fatal("%s: database connection failed (%s)",
+  db, ovs_retval_to_string(retval));
+}
+
+if (seqno != ovsdb_idl_get_seqno(idl)) {
+cfg = ovsrec_open_vswitch_first(idl);
+if (cfg) {
+netdev_set_flow_api_enabled(smap_get_bool(&cfg->other_config,
+  "hw-offload",
+  false));
+tc_set_skip_hw(smap_get_bool(&cfg->other_config, "skip_hw",
+ false));
+break;
+}
+} else {
+ovsdb_idl_wait(idl);
+}
+}
+
+ovsdb_idl_destroy(idl);
+return 0;
+}
+
 /* Runs the command designated by argv[0] within the command table specified by
  * 'commands', which must be terminated by a command whose 'name' member is a
  * null pointer. */
diff --git a/lib/dpctl.h b/lib/dpctl.h
index 4ee083f..4828f3d 100644
--- a/lib/dpctl.h
+++ b/lib/dpctl.h
@@ -50,6 +50,8 @@ struct dpctl_params {
 void (*usage)(void *aux);
 };
 
+int dpctl_read_db(void);
+
 int dpctl_run_command(int argc, const char *argv[],
   struct dpctl_params *dpctl_p);
 
diff --git a/utilities/ovs-dpctl.c b/utilities/ovs-dpctl.c
index 843d305..135035d 100644
--- a/utilities/ovs-dpctl.c
+++ b/utilities/ovs-dpctl.c
@@ -66,6 +66,8 @@ main(int argc, char *argv[])
 dpctl_p.output = dpctl_print;
 dpctl_p.usage = usage;
 
+dpctl_read_db();
+
 error = dpctl_run_command(argc - optind, (const char **) argv + optind,
   &dpctl_p);
 return error ? EXIT_FAILURE : EXIT_SUCCESS;
-- 
1.8.3.1


[ovs-dev] [PATCH ovs V2 15/21] dpif-netlink: delete a flow from netdev

2016-12-25 Thread Paul Blakey

If a flow was offloaded to a netdev, delete it using the netdev
flow API.
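
The delete path in this patch walks the list of ports and stops at the
first netdev that reports a successful delete. A minimal standalone sketch
of that "try each device until one succeeds" pattern follows. The types
and helpers (struct fake_netdev, fake_flow_del(), flow_del_any()) are
hypothetical and only model the control flow, not the OVS API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_FLOWS 8

/* Hypothetical stand-in for a netdev holding offloaded flow ids. */
struct fake_netdev {
    uint64_t flows[MAX_FLOWS];
    size_t n_flows;
};

/* Removes flow 'id' from 'dev'.  Returns 0 on success, ENOENT if the
 * device does not hold that flow. */
static int
fake_flow_del(struct fake_netdev *dev, uint64_t id)
{
    for (size_t i = 0; i < dev->n_flows; i++) {
        if (dev->flows[i] == id) {
            /* Order does not matter here, so move the last entry down. */
            dev->flows[i] = dev->flows[--dev->n_flows];
            return 0;
        }
    }
    return ENOENT;
}

/* Tries each device in turn and stops at the first successful delete.
 * Returns true if some device held the flow. */
static bool
flow_del_any(struct fake_netdev *devs, size_t n_devs, uint64_t id)
{
    assert(devs != NULL || n_devs == 0);
    for (size_t i = 0; i < n_devs; i++) {
        if (!fake_flow_del(&devs[i], id)) {
            return true;
        }
    }
    return false;
}
```

A false return corresponds to the case where no netdev held the flow, so
the caller can fall back to the regular datapath delete.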

Signed-off-by: Paul Blakey 
Reviewed-by: Roi Dayan 
---
 lib/dpif-netlink.c | 30 +-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index 717af90..f8cc59d 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -2056,6 +2056,25 @@ parse_flow_put(struct dpif_netlink *dpif, struct dpif_flow_put *put)
 return false;
 }
 
+static bool
+parse_flow_del(struct dpif_netlink *dpif, struct dpif_flow_del *del)
+{
+bool ret = false;
+struct ovs_list port_list;
+struct netdev_list_element *element;
+
+netdev_hmap_port_get_list(dpif->dpif.dpif_class, &port_list);
+LIST_FOR_EACH(element, node, &port_list) {
+if (!netdev_flow_del(element->netdev, del->stats,
+ CONST_CAST(ovs_u128 *, del->ufid))) {
+ret = true;
+break;
+}
+}
+netdev_port_list_del(&port_list);
+return ret;
+}
+
 static void
 dbg_print_flow(const struct nlattr *key, size_t key_len,
const struct nlattr *mask, size_t mask_len,
@@ -2098,7 +2117,16 @@ try_send_to_netdev(struct dpif_netlink *dpif, struct dpif_op *op)
put->actions, put->actions_len, put->ufid, "PUT");
 return parse_flow_put(dpif, put);
 }
-case DPIF_OP_FLOW_DEL:
+case DPIF_OP_FLOW_DEL: {
+struct dpif_flow_del *del = &op->u.flow_del;
+
+if (!del->ufid) {
+return false;
+}
+dbg_print_flow(del->key, del->key_len, NULL, 0, NULL, 0,
+   del->ufid, "DEL");
+return parse_flow_del(dpif, del);
+}
 case DPIF_OP_FLOW_GET:
 case DPIF_OP_EXECUTE:
 default:
-- 
1.8.3.1


[ovs-dev] [PATCH ovs V2 13/21] netdev-tc-offloads: Add flower mask to priority map

2016-12-25 Thread Paul Blakey

The flower classifier requires a different priority per mask,
so we hash the mask and generate a new priority for
each new mask used.
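
To make the idea concrete, here is a minimal standalone sketch of
"one priority per distinct mask". It is an illustration under assumptions,
not the patch's code: the mask is modelled as a fixed-size byte array
(MASK_LEN is made up), a linear scan replaces the hashed lookup, and
prio_for_mask() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MASK_LEN 16     /* Hypothetical fixed-size mask. */
#define MAX_MASKS 32    /* Sketch only: small fixed table. */

struct prio_entry {
    uint8_t mask[MASK_LEN];
    uint16_t protocol;
    uint16_t prio;
};

static struct prio_entry entries[MAX_MASKS];
static int n_entries;

/* Returns the priority already assigned to (mask, protocol), or assigns
 * the next unused priority if this combination has not been seen. */
static uint16_t
prio_for_mask(const uint8_t mask[MASK_LEN], uint16_t protocol)
{
    struct prio_entry *e;

    for (int i = 0; i < n_entries; i++) {
        if (entries[i].protocol == protocol
            && !memcmp(entries[i].mask, mask, MASK_LEN)) {
            return entries[i].prio;
        }
    }

    assert(n_entries < MAX_MASKS);
    e = &entries[n_entries];
    memcpy(e->mask, mask, MASK_LEN);
    e->protocol = protocol;
    e->prio = (uint16_t) (n_entries + 1);   /* Priorities start at 1. */
    n_entries++;
    return e->prio;
}
```

Two flows that share a mask and protocol get the same priority, while a
new mask or a new protocol gets the next one.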

Signed-off-by: Paul Blakey 
Reviewed-by: Roi Dayan 
---
 lib/netdev-tc-offloads.c | 37 +
 1 file changed, 37 insertions(+)

diff --git a/lib/netdev-tc-offloads.c b/lib/netdev-tc-offloads.c
index 7729cfc..b4eee98 100644
--- a/lib/netdev-tc-offloads.c
+++ b/lib/netdev-tc-offloads.c
@@ -173,6 +173,43 @@ add_ufid_tc_mapping(ovs_u128 *ufid, int prio, int handle, struct netdev *netdev)
 return replace;
 }
 
+struct prio_map_data {
+struct hmap_node node;
+struct tc_flow_key mask;
+uint16_t protocol;
+uint16_t prio;
+};
+
+static uint16_t
+get_prio_for_tc_flow(struct tc_flow *tc_flow)
+{
+static struct hmap prios = HMAP_INITIALIZER(&prios);
+static struct ovs_mutex prios_lock = OVS_MUTEX_INITIALIZER;
+static int last_prio = 0;
+size_t key_len = sizeof(struct tc_flow_key);
+size_t hash = hash_bytes(&tc_flow->mask, key_len, tc_flow->key.eth_type);
+struct prio_map_data *data;
+struct prio_map_data *new_data;
+
+ovs_mutex_lock(&prios_lock);
+HMAP_FOR_EACH_WITH_HASH(data, node, hash, &prios) {
+if (!memcmp(&tc_flow->mask, &data->mask, key_len)
+&& data->protocol == tc_flow->key.eth_type) {
+ovs_mutex_unlock(&prios_lock);
+return data->prio;
+}
+}
+
+new_data = xzalloc(sizeof *new_data);
+memcpy(&new_data->mask, &tc_flow->mask, key_len);
+new_data->prio = ++last_prio;
+new_data->protocol = tc_flow->key.eth_type;
+hmap_insert(&prios, &new_data->node, hash);
+ovs_mutex_unlock(&prios_lock);
+
+return new_data->prio;
+}
+
 int
 netdev_tc_flow_flush(struct netdev *netdev)
 {
-- 
1.8.3.1


[ovs-dev] [PATCH ovs V2 11/21] netdev-tc-offloads: Implement netdev flow dump api using tc interface

2016-12-25 Thread Paul Blakey

Signed-off-by: Paul Blakey 
Reviewed-by: Roi Dayan 
---
 lib/netdev-tc-offloads.c | 138 ---
 1 file changed, 131 insertions(+), 7 deletions(-)

diff --git a/lib/netdev-tc-offloads.c b/lib/netdev-tc-offloads.c
index f470aa3..7729cfc 100644
--- a/lib/netdev-tc-offloads.c
+++ b/lib/netdev-tc-offloads.c
@@ -184,28 +184,152 @@ netdev_tc_flow_dump_create(struct netdev *netdev)
 {
 struct netdev_flow_dump *dump = xzalloc(sizeof *dump);
 
+memset(dump, 0, sizeof *dump);
+dump->nl_dump = xzalloc(sizeof *dump->nl_dump);
 dump->netdev = netdev_ref(netdev);
+tc_dump_flower_start(netdev_get_ifindex(netdev), dump->nl_dump);
 return dump;
 }
 
 int
 netdev_tc_flow_dump_destroy(struct netdev_flow_dump *dump)
 {
+nl_dump_done(dump->nl_dump);
 netdev_close(dump->netdev);
+free(dump->nl_dump);
 free(dump);
+return 0;
+}
+
+static int
+parse_tc_flow_to_match(struct tc_flow *tc_flow,
+   struct match *match,
+   struct nlattr **actions,
+   struct dpif_flow_stats *stats,
+   struct ofpbuf *buf) {
+size_t act_off;
+struct tc_flow_key *key = &tc_flow->key;
+struct tc_flow_key *mask = &tc_flow->mask;
+
+match_init_catchall(match);
+match_set_dl_type(match, key->eth_type);
+match_set_dl_src_masked(match, key->src_mac, mask->src_mac);
+match_set_dl_dst_masked(match, key->dst_mac, mask->dst_mac);
+if (key->vlan_id || key->vlan_prio) {
+match_set_dl_vlan(match, ntohs(key->vlan_id));
+match_set_dl_vlan_pcp(match, ntohs(key->vlan_prio));
+match_set_dl_type(match, key->encap_eth_type);
+}
+
+if (key->ip_proto &&
+(key->eth_type == htons(ETH_P_IP)
+ || key->eth_type == htons(ETH_P_IPV6))) {
+match_set_nw_proto(match, key->ip_proto);
+}
+match_set_nw_src_masked(match, key->ipv4.ipv4_src, mask->ipv4.ipv4_src);
+match_set_nw_dst_masked(match, key->ipv4.ipv4_dst, mask->ipv4.ipv4_dst);
+
+match_set_tp_dst_masked(match, key->dst_port, mask->dst_port);
+match_set_tp_src_masked(match, key->src_port, mask->src_port);
+
+if (tc_flow->tunnel.tunnel) {
+match_set_tun_id(match, tc_flow->tunnel.id);
+match_set_tun_src(match, tc_flow->tunnel.ipv4_src);
+match_set_tun_dst(match, tc_flow->tunnel.ipv4_dst);
+match_set_tp_dst(match, tc_flow->tunnel.tp_dst);
+}
+
+act_off = nl_msg_start_nested(buf, OVS_FLOW_ATTR_ACTIONS);
+{
+if (tc_flow->vlan_pop)
+nl_msg_put_flag(buf, OVS_ACTION_ATTR_POP_VLAN);
+
+if (tc_flow->vlan_push_id || tc_flow->vlan_push_prio) {
+struct ovs_action_push_vlan *push;
+push = nl_msg_put_unspec_zero(buf, OVS_ACTION_ATTR_PUSH_VLAN,
+  sizeof *push);
+
+push->vlan_tpid = ntohs(ETH_TYPE_VLAN);
+push->vlan_tci = ntohs(tc_flow->vlan_push_id
+   | (tc_flow->vlan_push_prio << 13)
+   | VLAN_CFI);
+}
+
+if (tc_flow->ifindex_out > 0) {
+int ifx = netdev_hmap_port_get_byifidx(tc_flow->ifindex_out);
+
+nl_msg_put_u32(buf, OVS_ACTION_ATTR_OUTPUT, ifx ? ifx : 0xFF);
+}
+
+if (tc_flow->set.set) {
+size_t set_offset = nl_msg_start_nested(buf, OVS_ACTION_ATTR_SET);
+size_t tunnel_offset = nl_msg_start_nested(buf, OVS_KEY_ATTR_TUNNEL);
+
+nl_msg_put_be64(buf, OVS_TUNNEL_KEY_ATTR_ID, tc_flow->set.id);
+nl_msg_put_be32(buf, OVS_TUNNEL_KEY_ATTR_IPV4_SRC, tc_flow->set.ipv4_src);
+nl_msg_put_be32(buf, OVS_TUNNEL_KEY_ATTR_IPV4_DST, tc_flow->set.ipv4_dst);
+nl_msg_put_be16(buf, OVS_TUNNEL_KEY_ATTR_TP_DST, tc_flow->set.tp_dst);
+
+nl_msg_end_nested(buf, tunnel_offset);
+nl_msg_end_nested(buf, set_offset);
+}
+}
+nl_msg_end_nested(buf, act_off);
+
+*actions = ofpbuf_at_assert(buf, act_off, sizeof(struct nlattr));
+
+if (stats) {
+memset(stats, 0, sizeof *stats);
+stats->n_packets = get_32aligned_u64(&tc_flow->stats.n_packets);
+stats->n_bytes = get_32aligned_u64(&tc_flow->stats.n_bytes);
+stats->used = tc_flow->lastused;
+}
 
 return 0;
 }
 
 bool
-netdev_tc_flow_dump_next(struct netdev_flow_dump *dump OVS_UNUSED,
-struct match *match OVS_UNUSED,
-struct nlattr **actions OVS_UNUSED,
-struct dpif_flow_stats *stats OVS_UNUSED,
-ovs_u128 *ufid OVS_UNUSED,
-struct ofpbuf *rbuffer OVS_UNUSED,
-struct ofpbuf *wbuffer OVS_UNUSED)
+netdev_tc_flow_dump_next(struct netdev_flow_dump *dump,
+struct match *match,
+

[ovs-dev] [PATCH ovs V2 21/21] netdev-vport: use common offloads interface

2016-12-25 Thread Paul Blakey

netdev vports are backed by actual netdevs at the kernel
level, so they can use the common netdev-tc offloads interface
for flow offloading (if enabled).

Signed-off-by: Paul Blakey 
Reviewed-by: Roi Dayan 
---
 lib/netdev-vport.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
index 04c9d62..4127ace 100644
--- a/lib/netdev-vport.c
+++ b/lib/netdev-vport.c
@@ -44,6 +44,7 @@
 #include "unaligned.h"
 #include "unixctl.h"
 #include "openvswitch/vlog.h"
+#include "netdev-tc-offloads.h"
 
 VLOG_DEFINE_THIS_MODULE(netdev_vport);
 
@@ -838,14 +839,14 @@ netdev_vport_get_ifindex(const struct netdev *netdev_)
 NULL,   /* rx_wait */   \
 NULL,   /* rx_drain */  \
 \
-NULL,   /* flow_flush */\
-NULL,   /* flow_dump_create */  \
-NULL,   /* flow_dump_destroy */ \
-NULL,   /* flow_dump_next */\
-NULL,   /* flow_put */  \
-NULL,   /* flow_get */  \
-NULL,   /* flow_del */  \
-NULL,   /* init_flow_api */
+netdev_tc_flow_flush,   \
+netdev_tc_flow_dump_create, \
+netdev_tc_flow_dump_destroy,\
+netdev_tc_flow_dump_next,   \
+netdev_tc_flow_put, \
+netdev_tc_flow_get, \
+netdev_tc_flow_del, \
+netdev_tc_init_flow_api,
 
 
 #define TUNNEL_CLASS(NAME, DPIF_PORT, BUILD_HEADER, PUSH_HEADER, POP_HEADER)   \
-- 
1.8.3.1


[ovs-dev] [PATCH ovs V2 20/21] netdev-linux: always add ingress qdisc

2016-12-25 Thread Paul Blakey

Flow offloading by tc needs an ingress qdisc on the device.
The ingress qdisc used to be deleted in order to flush
policing filters. Instead, just flush the filters and leave
the ingress qdisc in place (adding it if there wasn't one).

Signed-off-by: Paul Blakey 
Reviewed-by: Roi Dayan 
---
 lib/netdev-linux.c | 29 +++--
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 6a23a82..90b6a64 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -75,6 +75,7 @@
 #include "openvswitch/vlog.h"
 #include "util.h"
 #include "netdev-tc-offloads.h"
+#include "tc.h"
 
 VLOG_DEFINE_THIS_MODULE(netdev_linux);
 
@@ -2057,6 +2058,7 @@ netdev_linux_set_policing(struct netdev *netdev_,
 struct netdev_linux *netdev = netdev_linux_cast(netdev_);
 const char *netdev_name = netdev_get_name(netdev_);
 int error;
+int ifindex;
 
 kbits_burst = (!kbits_rate ? 0   /* Force to 0 if no rate specified. */
: !kbits_burst ? 8000 /* Default to 8000 kbits if 0. */
@@ -2074,22 +2076,29 @@ netdev_linux_set_policing(struct netdev *netdev_,
 }
 
 COVERAGE_INC(netdev_set_policing);
-/* Remove any existing ingress qdisc. */
-error = tc_add_del_ingress_qdisc(netdev_, false);
+error = tc_add_del_ingress_qdisc(netdev_, true);
+error = (error == EEXIST) ? 0 : error;
 if (error) {
-VLOG_WARN_RL(&rl, "%s: removing policing failed: %s",
+VLOG_WARN_RL(&rl, "%s: adding policing qdisc failed: %s",
+netdev_name, ovs_strerror(error));
+goto out;
+}
+
+/* Remove any existing policing. */
+error = get_ifindex(&netdev->up, &ifindex);
+if (error) {
+VLOG_WARN_RL(&rl, "%s: getting ifindex failed: %s",
+ netdev_name, ovs_strerror(error));
+goto out;
+}
+error = tc_flush_flower(ifindex);
+if (error) {
+VLOG_WARN_RL(&rl, "%s: flushing policing failed: %s",
  netdev_name, ovs_strerror(error));
 goto out;
 }
 
 if (kbits_rate) {
-error = tc_add_del_ingress_qdisc(netdev_, true);
-if (error) {
-VLOG_WARN_RL(&rl, "%s: adding policing qdisc failed: %s",
- netdev_name, ovs_strerror(error));
-goto out;
-}
-
 error = tc_add_policer(netdev_, kbits_rate, kbits_burst);
 if (error){
 VLOG_WARN_RL(&rl, "%s: adding policing action failed: %s",
-- 
1.8.3.1


[ovs-dev] [PATCH ovs V2 09/21] dpif-netlink: Dump netdevs flows on flow dump

2016-12-25 Thread Paul Blakey

While dumping flows, dump flows that were offloaded to
netdev and parse them back to dpif flow.

Signed-off-by: Paul Blakey 
Reviewed-by: Roi Dayan 
---
 lib/dpif-netlink.c | 179 +
 1 file changed, 179 insertions(+)

diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index 36f2888..3d8940e 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -38,6 +38,7 @@
 #include "flow.h"
 #include "fat-rwlock.h"
 #include "netdev.h"
+#include "netdev-provider.h"
 #include "netdev-linux.h"
 #include "netdev-vport.h"
 #include "netlink-conntrack.h"
@@ -55,6 +56,7 @@
 #include "unaligned.h"
 #include "util.h"
 #include "openvswitch/vlog.h"
+#include "openvswitch/match.h"
 
 VLOG_DEFINE_THIS_MODULE(dpif_netlink);
 #ifdef _WIN32
@@ -68,6 +70,8 @@ enum { MAX_PORTS = USHRT_MAX };
  * missing if we have old headers. */
 #define ETH_FLAG_LRO  (1 << 15)/* LRO is enabled */
 
+#define FLOW_DUMP_MAX_BATCH 50
+
 struct dpif_netlink_dp {
 /* Generic Netlink header. */
 uint8_t cmd;
@@ -1355,6 +1359,10 @@ struct dpif_netlink_flow_dump {
 struct dpif_flow_dump up;
 struct nl_dump nl_dump;
 atomic_int status;
+struct netdev_flow_dump **netdev_dumps;
+int netdev_num;
+int netdev_given;
+struct ovs_mutex netdev_lock;
 };
 
 static struct dpif_netlink_flow_dump *
@@ -1363,6 +1371,34 @@ dpif_netlink_flow_dump_cast(struct dpif_flow_dump *dump)
 return CONTAINER_OF(dump, struct dpif_netlink_flow_dump, up);
 }
 
+static void start_netdev_dump(const struct dpif *dpif_,
+  struct dpif_netlink_flow_dump *dump) {
+
+if (!netdev_flow_api_enabled) {
+dump->netdev_num = 0;
+return;
+}
+
+struct netdev_list_element *element;
+struct ovs_list port_list;
+int ports = netdev_hmap_port_get_list(dpif_->dpif_class, &port_list);
+int i = 0;
+
+dump->netdev_dumps =
+ports ? xzalloc(sizeof(struct netdev_flow_dump *) * ports) : 0;
+dump->netdev_num = ports;
+dump->netdev_given = 0;
+
+LIST_FOR_EACH(element, node, &port_list) {
+dump->netdev_dumps[i] = netdev_flow_dump_create(element->netdev);
+dump->netdev_dumps[i]->port = element->port_no;
+i++;
+}
+netdev_port_list_del(&port_list);
+
+ovs_mutex_init(&dump->netdev_lock);
+}
+
 static struct dpif_flow_dump *
 dpif_netlink_flow_dump_create(const struct dpif *dpif_, bool terse)
 {
@@ -1387,6 +1423,8 @@ dpif_netlink_flow_dump_create(const struct dpif *dpif_, bool terse)
 atomic_init(&dump->status, 0);
 dump->up.terse = terse;
 
+start_netdev_dump(dpif_, dump);
+
 return &dump->up;
 }
 
@@ -1397,6 +1435,16 @@ dpif_netlink_flow_dump_destroy(struct dpif_flow_dump *dump_)
 unsigned int nl_status = nl_dump_done(&dump->nl_dump);
 int dump_status;
 
+if (netdev_flow_api_enabled) {
+for (int i = 0; i < dump->netdev_num; i++) {
+int err = netdev_flow_dump_destroy(dump->netdev_dumps[i]);
+if (err != 0 && err != EOPNOTSUPP) {
+VLOG_ERR("failed dumping netdev: %s", ovs_strerror(err));
+}
+}
+free(dump->netdev_dumps);
+}
+
 /* No other thread has access to 'dump' at this point. */
 atomic_read_relaxed(&dump->status, &dump_status);
 free(dump);
@@ -1410,6 +1458,11 @@ struct dpif_netlink_flow_dump_thread {
 struct dpif_flow_stats stats;
 struct ofpbuf nl_flows; /* Always used to store flows. */
 struct ofpbuf *nl_actions;  /* Used if kernel does not supply actions. */
+struct odputil_keybuf keybuf[FLOW_DUMP_MAX_BATCH];
+struct odputil_keybuf maskbuf[FLOW_DUMP_MAX_BATCH];
+struct odputil_keybuf actbuf[FLOW_DUMP_MAX_BATCH];
+int netdev_cur_dump;
+bool netdev_done;
 };
 
 static struct dpif_netlink_flow_dump_thread *
@@ -1429,6 +1482,8 @@ dpif_netlink_flow_dump_thread_create(struct dpif_flow_dump *dump_)
 thread->dump = dump;
 ofpbuf_init(&thread->nl_flows, NL_DUMP_BUFSIZE);
 thread->nl_actions = NULL;
+thread->netdev_cur_dump = 0;
+thread->netdev_done = !(thread->netdev_cur_dump < dump->netdev_num);
 
 return &thread->up;
 }
@@ -1466,6 +1521,90 @@ dpif_netlink_flow_to_dpif_flow(struct dpif *dpif, struct dpif_flow *dpif_flow,
 dpif_netlink_flow_get_stats(datapath_flow, &dpif_flow->stats);
 }
 
+static void
+dpif_netlink_advance_netdev_dump(struct dpif_netlink_flow_dump_thread *thread)
+{
+struct dpif_netlink_flow_dump *dump = thread->dump;
+
+ovs_mutex_lock(&dump->netdev_lock);
+/* if we haven't finished (dumped everything) */
+if (dump->netdev_given < dump->netdev_num) {
+/* if we are the first to find that given dump is finished
+ * (for race condition, e.g 3 finish dump 0 at the same time) */
+if (thread->netdev_cur_dump == dump->netdev_given) {
+thread->netdev_cur_dump = ++dump->netdev_given;
+/* did we just finish the last dump? done. */
+if (dump->netdev_given ==

[ovs-dev] [PATCH ovs V2 07/21] dpif-netlink: Flush added ports using netdev flow api

2016-12-25 Thread Paul Blakey

If netdev flow offloading is enabled, flush all
added ports using netdev flow api.

Signed-off-by: Paul Blakey 
Reviewed-by: Roi Dayan 
---
 lib/dpif-netlink.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index a39faa2..36f2888 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -1054,10 +1054,21 @@ dpif_netlink_flow_flush(struct dpif *dpif_)
 {
 const struct dpif_netlink *dpif = dpif_netlink_cast(dpif_);
 struct dpif_netlink_flow flow;
+struct ovs_list port_list;
+struct netdev_list_element *element;
 
 dpif_netlink_flow_init(&flow);
 flow.cmd = OVS_FLOW_CMD_DEL;
 flow.dp_ifindex = dpif->dp_ifindex;
+
+if (netdev_flow_api_enabled) {
+netdev_hmap_port_get_list(dpif_->dpif_class, &port_list);
+LIST_FOR_EACH(element, node, &port_list) {
+netdev_flow_flush(element->netdev);
+}
+netdev_port_list_del(&port_list);
+}
+
 return dpif_netlink_flow_transact(&flow, NULL, NULL);
 }
 
-- 
1.8.3.1


[ovs-dev] [PATCH ovs V2 08/21] netdev-tc-offloads: Implement netdev flow flush using tc interface

2016-12-25 Thread Paul Blakey

Signed-off-by: Paul Blakey 
Reviewed-by: Roi Dayan 
---
 lib/netdev-tc-offloads.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/netdev-tc-offloads.c b/lib/netdev-tc-offloads.c
index 692ab76..4ba6086 100644
--- a/lib/netdev-tc-offloads.c
+++ b/lib/netdev-tc-offloads.c
@@ -76,9 +76,9 @@
 VLOG_DEFINE_THIS_MODULE(netdev_tc_offloads);
 
 int
-netdev_tc_flow_flush(struct netdev *netdev OVS_UNUSED)
+netdev_tc_flow_flush(struct netdev *netdev)
 {
-return EOPNOTSUPP;
+return tc_flush_flower(netdev_get_ifindex(netdev));
 }
 
 struct netdev_flow_dump *
-- 
1.8.3.1


[ovs-dev] [PATCH ovs V2 03/21] other-config: Add hw-offload switch to control netdev flow offloading

2016-12-25 Thread Paul Blakey

Add a new configuration option, hw-offload, that enables the netdev
flow API. Enabling this option allows flows to be offloaded
using the netdev implementation instead of the kernel datapath.
The option defaults to false (disabled).
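
The setter in this patch only honours the first value it is given, which
is why changing the option requires a restart. A minimal standalone sketch
of that "first caller wins" behaviour is shown below. It is single-threaded
and uses a plain static flag where the patch uses ovsthread_once.
flow_api_enabled and set_flow_api_enabled() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Global switch, disabled by default like hw-offload. */
static bool flow_api_enabled = false;

/* Applies 'enabled' the first time it is called and ignores every later
 * call.  Not thread-safe: this sketch assumes a single caller thread. */
static void
set_flow_api_enabled(bool enabled)
{
    static bool configured = false;

    if (!configured) {
        flow_api_enabled = enabled;
        configured = true;
    }
}
```

Calling set_flow_api_enabled(true) and then set_flow_api_enabled(false)
leaves the flag set to true until the process restarts.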

Signed-off-by: Paul Blakey 
Reviewed-by: Roi Dayan 
---
 lib/netdev.c | 18 ++
 lib/netdev.h |  2 ++
 vswitchd/bridge.c|  2 ++
 vswitchd/vswitch.xml | 11 +++
 4 files changed, 33 insertions(+)

diff --git a/lib/netdev.c b/lib/netdev.c
index 3ac3c48..b289166 100644
--- a/lib/netdev.c
+++ b/lib/netdev.c
@@ -2071,7 +2071,25 @@ netdev_init_flow_api(struct netdev *netdev)
 {
 const struct netdev_class *class = netdev->netdev_class;
 
+if (!netdev_flow_api_enabled) {
+return EOPNOTSUPP;
+}
+
 return (class->init_flow_api
 ? class->init_flow_api(netdev)
 : EOPNOTSUPP);
 }
+
+bool netdev_flow_api_enabled = false;
+
+void
+netdev_set_flow_api_enabled(bool enabled)
+{
+static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
+
+if (ovsthread_once_start(&once)) {
+netdev_flow_api_enabled = enabled;
+VLOG_INFO("netdev: Flow API %s", enabled ? "Enabled" : "Disabled");
+ovsthread_once_done(&once);
+}
+}
diff --git a/lib/netdev.h b/lib/netdev.h
index c04632d..5be67ea 100644
--- a/lib/netdev.h
+++ b/lib/netdev.h
@@ -168,6 +168,8 @@ int netdev_flow_get(struct netdev *, struct match *, struct nlattr **actions,
 struct dpif_flow_stats *, ovs_u128 *, struct ofpbuf *);
 int netdev_flow_del(struct netdev *, struct dpif_flow_stats *, ovs_u128 *);
 int netdev_init_flow_api(struct netdev *);
+extern bool netdev_flow_api_enabled;
+void netdev_set_flow_api_enabled(bool flow_api_enabled);
 
 /* native tunnel APIs */
 /* Structure to pass parameters required to build a tunnel header. */
diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
index 7f33070..e90a31a 100644
--- a/vswitchd/bridge.c
+++ b/vswitchd/bridge.c
@@ -2919,6 +2919,8 @@ bridge_run(void)
 cfg = ovsrec_open_vswitch_first(idl);
 
 if (cfg) {
+netdev_set_flow_api_enabled(smap_get_bool(&cfg->other_config,
+  "hw-offload", false));
 dpdk_init(&cfg->other_config);
 }
 
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 66c349b..9bc9f5f 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -169,6 +169,17 @@
 
   The default is 1.
 
+
+
+  
+
+  Set this value to true to enable netdev flow offload.
+
+
+  The default value is false. Changing this value requires
+  restarting the daemon
+
   
 
   https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH ovs V2 06/21] netdev-vport: Add get ifindex implementation

2016-12-25 Thread Paul Blakey

The ifindex is needed for flow offloading using tc, so we
try to get it by the real vport netdev name.

Signed-off-by: Paul Blakey 
Reviewed-by: Roi Dayan 
---
 lib/netdev-vport.c | 28 +++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
index 2cde854..04c9d62 100644
--- a/lib/netdev-vport.c
+++ b/lib/netdev-vport.c
@@ -742,6 +742,32 @@ get_stats(const struct netdev *netdev, struct netdev_stats *stats)
 return 0;
 }
 
+static int
+do_get_ifindex(const char *netdev_name)
+{
+struct ifreq ifr;
+int error;
+
+ovs_strzcpy(ifr.ifr_name, netdev_name, sizeof ifr.ifr_name);
+
+error = af_inet_ioctl(SIOCGIFINDEX, &ifr);
+if (error) {
+VLOG_ERR("ioctl(SIOCGIFINDEX) on %s device failed: %s",
+ netdev_name, ovs_strerror(error));
+return -error;
+}
+return ifr.ifr_ifindex;
+}
+
+static int
+netdev_vport_get_ifindex(const struct netdev *netdev_)
+{
+char buf[32];
+const char *name = netdev_vport_get_dpif_port(netdev_, buf, sizeof(buf));
+
+return do_get_ifindex(name);
+}
+
 
 #define VPORT_FUNCTIONS(GET_CONFIG, SET_CONFIG, \
 GET_TUNNEL_CONFIG, GET_STATUS,  \
@@ -771,7 +797,7 @@ get_stats(const struct netdev *netdev, struct netdev_stats *stats)
 netdev_vport_get_etheraddr, \
 NULL,   /* get_mtu */   \
 NULL,   /* set_mtu */   \
-NULL,   /* get_ifindex */   \
+netdev_vport_get_ifindex,   \
 NULL,   /* get_carrier */   \
 NULL,   /* get_carrier_resets */\
 NULL,   /* get_miimon */\
-- 
1.8.3.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH ovs V2 05/21] dpif: Save added ports in a port map for netdev flow api use

2016-12-25 Thread Paul Blakey

To use the netdev flow offloading API, dpifs need to iterate over
added ports. This patch inserts the added dpif ports into a hash map.
The map will also be used to translate dpif ports to netdevs.
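
To make the two lookups concrete, the following is a simplified sketch of a
port map with a forward lookup, (owner, port number) to entry, and a reverse
lookup, ifindex to port number. It is an assumption-laden stand-in: it uses a
small fixed-size table with a linear scan instead of the OVS hmap used in the
patch, and every name in it (port_entry, port_map_add, and so on) is
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PORT_MAP_MAX 64             /* Arbitrary capacity for the sketch. */

/* One mapped port.  (owner, port_no) identifies a datapath port. */
struct port_entry {
    bool used;
    const void *owner;              /* Stands in for the dpif class pointer. */
    uint32_t port_no;               /* Stands in for odp_port_t. */
    int ifindex;
    char name[16];
};

static struct port_entry port_map[PORT_MAP_MAX];

/* Forward lookup: returns the entry for (owner, port_no), or NULL. */
static struct port_entry *
port_map_find(const void *owner, uint32_t port_no)
{
    for (size_t i = 0; i < PORT_MAP_MAX; i++) {
        struct port_entry *e = &port_map[i];

        if (e->used && e->owner == owner && e->port_no == port_no) {
            return e;
        }
    }
    return NULL;
}

/* Removes (owner, port_no).  Returns 0 on success, -1 if it was not mapped. */
static int
port_map_del(const void *owner, uint32_t port_no)
{
    struct port_entry *e = port_map_find(owner, port_no);

    if (!e) {
        return -1;
    }
    e->used = false;
    return 0;
}

/* Adds or replaces (owner, port_no).  Returns 0, or -1 if the table is full. */
static int
port_map_add(const void *owner, uint32_t port_no, int ifindex,
             const char *name)
{
    assert(name != NULL);

    port_map_del(owner, port_no);   /* Drop any stale entry first. */
    for (size_t i = 0; i < PORT_MAP_MAX; i++) {
        struct port_entry *e = &port_map[i];

        if (!e->used) {
            e->used = true;
            e->owner = owner;
            e->port_no = port_no;
            e->ifindex = ifindex;
            strncpy(e->name, name, sizeof e->name - 1);
            e->name[sizeof e->name - 1] = '\0';
            return 0;
        }
    }
    return -1;
}

/* Reverse lookup: stores the port mapped to 'ifindex' in '*port_no' and
 * returns true, or returns false if no port has that ifindex. */
static bool
port_map_port_by_ifindex(int ifindex, uint32_t *port_no)
{
    for (size_t i = 0; i < PORT_MAP_MAX; i++) {
        if (port_map[i].used && port_map[i].ifindex == ifindex) {
            *port_no = port_map[i].port_no;
            return true;
        }
    }
    return false;
}
```

One design choice in the sketch differs from the patch: the reverse lookup
returns a separate found/not-found result, whereas
netdev_hmap_port_get_byifidx() returns port 0 when nothing matches.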

Signed-off-by: Paul Blakey 
Reviewed-by: Roi Dayan 
---
 lib/dpif.c   |  25 +++
 lib/netdev.c | 133 +++
 lib/netdev.h |  14 +++
 3 files changed, 172 insertions(+)

diff --git a/lib/dpif.c b/lib/dpif.c
index 53958c5..c67ea92 100644
--- a/lib/dpif.c
+++ b/lib/dpif.c
@@ -352,7 +352,22 @@ do_open(const char *name, const char *type, bool create, struct dpif **dpifp)
 error = registered_class->dpif_class->open(registered_class->dpif_class,
name, create, &dpif);
 if (!error) {
+struct dpif_port_dump port_dump;
+struct dpif_port dpif_port;
+
 ovs_assert(dpif->dpif_class == registered_class->dpif_class);
+
+DPIF_PORT_FOR_EACH(&dpif_port, &port_dump, dpif) {
+struct netdev *netdev;
+int err = netdev_open(dpif_port.name, dpif_port.type, &netdev);
+
+if (!err) {
+netdev_hmap_port_add(netdev, dpif->dpif_class, &dpif_port);
+netdev_close(netdev);
+} else {
+VLOG_WARN("could not open netdev %s type %s", name, type);
+}
+}
 } else {
 dp_class_unref(registered_class);
 }
@@ -545,6 +560,14 @@ dpif_port_add(struct dpif *dpif, struct netdev *netdev, odp_port_t *port_nop)
 if (!error) {
 VLOG_DBG_RL(_rl, "%s: added %s as port %"PRIu32,
 dpif_name(dpif), netdev_name, port_no);
+
+/* temp dpif_port, will be cloned in netdev_hmap_port_add */
+struct dpif_port dpif_port;
+
+dpif_port.type = CONST_CAST(char *, netdev_get_type(netdev));
+dpif_port.name = CONST_CAST(char *, netdev_name);
+dpif_port.port_no = port_no;
+netdev_hmap_port_add(netdev, dpif->dpif_class, &dpif_port);
 } else {
 VLOG_WARN_RL(_rl, "%s: failed to add %s as port: %s",
  dpif_name(dpif), netdev_name, ovs_strerror(error));
@@ -569,6 +592,8 @@ dpif_port_del(struct dpif *dpif, odp_port_t port_no)
 if (!error) {
 VLOG_DBG_RL(_rl, "%s: port_del(%"PRIu32")",
 dpif_name(dpif), port_no);
+
+netdev_hmap_port_del(port_no, dpif->dpif_class->type);
 } else {
 log_operation(dpif, "port_del", error);
 }
diff --git a/lib/netdev.c b/lib/netdev.c
index b289166..2630802 100644
--- a/lib/netdev.c
+++ b/lib/netdev.c
@@ -2080,6 +2080,139 @@ netdev_init_flow_api(struct netdev *netdev)
 : EOPNOTSUPP);
 }
 
+static struct hmap port_to_netdev = HMAP_INITIALIZER(&port_to_netdev);
+static struct hmap ifindex_to_port = HMAP_INITIALIZER(&ifindex_to_port);
+
+struct port_to_netdev_data {
+struct hmap_node node;
+struct netdev *netdev;
+struct dpif_port dpif_port;
+const void *obj;
+};
+
+struct ifindex_to_port_data {
+struct hmap_node node;
+int ifindex;
+odp_port_t port;
+};
+
+int
+netdev_hmap_port_add(struct netdev *netdev, const void *obj,
+ struct dpif_port *dpif_port)
+{
+size_t hash = hash_int(dpif_port->port_no, hash_pointer(obj, 0));
+struct port_to_netdev_data *data = xzalloc(sizeof *data);
+struct ifindex_to_port_data *ifidx = xzalloc(sizeof *ifidx);
+
+netdev_hmap_port_del(dpif_port->port_no, obj);
+
+data->netdev = netdev_ref(netdev);
+data->obj = obj;
+dpif_port_clone(&data->dpif_port, dpif_port);
+
+ifidx->ifindex = netdev_get_ifindex(netdev);
+ifidx->port = dpif_port->port_no;
+
+hmap_insert(&port_to_netdev, &data->node, hash);
+hmap_insert(&ifindex_to_port, &ifidx->node, ifidx->ifindex);
+
+return 0;
+}
+
+struct netdev *
+netdev_hmap_port_get(odp_port_t port_no, const void *obj)
+{
+size_t hash = hash_int(port_no, hash_pointer(obj, 0));
+struct port_to_netdev_data *data;
+
+HMAP_FOR_EACH_WITH_HASH(data, node, hash, &port_to_netdev) {
+if (data->obj == obj && data->dpif_port.port_no == port_no) {
+break;
+}
+}
+
+if (data) {
+netdev_ref(data->netdev);
+return data->netdev;
+}
+
+return 0;
+}
+
+odp_port_t
+netdev_hmap_port_get_byifidx(int ifindex)
+{
+struct ifindex_to_port_data *data;
+
+HMAP_FOR_EACH_WITH_HASH(data, node, ifindex, &ifindex_to_port) {
+if (data->ifindex == ifindex) {
+return data->port;
+}
+}
+
+return 0;
+}
+
+int
+netdev_hmap_port_del(odp_port_t port_no, const void *obj)
+{
+size_t hash = hash_int(port_no, hash_pointer(obj, 0));
+struct port_to_netdev_data *data;
+
+HMAP_FOR_EACH_WITH_HASH(data, node, hash, &port_to_netdev) {
+if (data->obj == obj && data->dpif_port.port_no == port_no) {
+break;
+}
+}
+
+if (data) {
+dpif_port_destroy(&data->dpif_port);
+

[ovs-dev] [PATCH ovs V2 01/21] tc: Add tc flower interface

2016-12-25 Thread Paul Blakey

Add a tc flower interface that will be used to offload flows via the tc
flower classifier. Depending on the flag used (skip_sw/skip_hw), flower
will pass those flows to HW or handle them itself.
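
The handle arithmetic and the skip flags mentioned above can be sketched as
follows. This is only an illustration: the SK_* constants are local copies of
the TC_H_MAJ_MASK / TC_H_MIN_MASK values from <linux/pkt_sched.h> and the
TCA_CLS_FLAGS_SKIP_HW / TCA_CLS_FLAGS_SKIP_SW values from <linux/pkt_cls.h>,
so that the sketch builds without kernel headers, and the function names are
hypothetical. How the series chooses the flag beyond the SKIP_HW variable is
not shown in this message, so sketch_flower_flags() is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of kernel UAPI constants, renamed to avoid clashes. */
#define SK_TC_H_MAJ_MASK          0xFFFF0000U
#define SK_TC_H_MIN_MASK          0x0000FFFFU
#define SK_TCA_CLS_FLAGS_SKIP_HW  (1U << 0)   /* Do not offload to hardware. */
#define SK_TCA_CLS_FLAGS_SKIP_SW  (1U << 1)   /* Do not install in software. */

/* Builds the 32-bit tc handle 'major':'minor': major in the upper 16 bits,
 * minor in the lower 16 bits. */
static uint32_t
sketch_tc_make_handle(unsigned int major, unsigned int minor)
{
    assert(major <= 0xFFFF && minor <= 0xFFFF);

    return (((uint32_t) major << 16) & SK_TC_H_MAJ_MASK)
           | ((uint32_t) minor & SK_TC_H_MIN_MASK);
}

/* Assumed policy for the sketch: when 'skip_hw' is set, ask flower to keep
 * the rule in software only; otherwise pass no flag. */
static uint32_t
sketch_flower_flags(bool skip_hw)
{
    return skip_hw ? SK_TCA_CLS_FLAGS_SKIP_HW : 0;
}
```

For example, the handle ffff:0 comes out as 0xFFFF0000, and 1:2 as
0x00010002.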

Signed-off-by: Shahar Klein 
Signed-off-by: Paul Blakey 
Reviewed-by: Roi Dayan 
---
 lib/automake.mk |   2 +
 lib/tc.c| 996 
 lib/tc.h| 107 ++
 3 files changed, 1105 insertions(+)
 create mode 100644 lib/tc.c
 create mode 100644 lib/tc.h

diff --git a/lib/automake.mk b/lib/automake.mk
index 9345cee..bcc7813 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -351,6 +351,8 @@ if LINUX
 lib_libopenvswitch_la_SOURCES += \
lib/dpif-netlink.c \
lib/dpif-netlink.h \
+   lib/tc.h \
+   lib/tc.c \
lib/if-notifier.c \
lib/if-notifier.h \
lib/netdev-linux.c \
diff --git a/lib/tc.c b/lib/tc.c
new file mode 100644
index 000..b5f6603
--- /dev/null
+++ b/lib/tc.c
@@ -0,0 +1,996 @@
+/*
+ * Copyright (c) 2016 Mellanox Technologies, Ltd.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "timeval.h"
+#include "netlink-socket.h"
+#include "netlink.h"
+#include "rtnetlink.h"
+#include "openvswitch/vlog.h"
+#include "openvswitch/ofpbuf.h"
+#include "tc.h"
+#include "util.h"
+#include "byte-order.h"
+
+VLOG_DEFINE_THIS_MODULE(tc);
+
+bool SKIP_HW = false;
+
+/* Returns tc handle 'major':'minor'. */
+static unsigned int
+tc_make_handle(unsigned int major, unsigned int minor)
+{
+return TC_H_MAKE(major << 16, minor);
+}
+
+static struct tcmsg *
+tc_make_req(int ifindex, int type, unsigned int flags, struct ofpbuf *request)
+{
+struct tcmsg *tcmsg;
+struct nlmsghdr *nlmsghdr;
+
+ofpbuf_init(request, 512);
+
+nl_msg_reserve(request, NLMSG_HDRLEN + sizeof *tcmsg);
+nlmsghdr = nl_msg_put_uninit(request, NLMSG_HDRLEN);
+nlmsghdr->nlmsg_len = 0;
+nlmsghdr->nlmsg_type = type;
+nlmsghdr->nlmsg_flags = NLM_F_REQUEST | flags;
+nlmsghdr->nlmsg_seq = 0;
+nlmsghdr->nlmsg_pid = 0;
+
+tcmsg = ofpbuf_put_zeros(request, sizeof *tcmsg);
+tcmsg->tcm_family = AF_UNSPEC;
+tcmsg->tcm_ifindex = ifindex;
+
+return tcmsg;
+}
+
+static int
+tc_transact(struct ofpbuf *request, struct ofpbuf **replyp)
+{
+int error = nl_transact(NETLINK_ROUTE, request, replyp);
+
+ofpbuf_uninit(request);
+return error;
+}
+
+static const struct nl_policy tca_policy[] = {
+[TCA_KIND] = { .type = NL_A_STRING, .optional = false, },
+[TCA_OPTIONS] = { .type = NL_A_NESTED, .optional = false, },
+[TCA_STATS] = { .type = NL_A_UNSPEC,
+.min_len = sizeof(struct tc_stats), .optional = true, },
+[TCA_STATS2] = { .type = NL_A_NESTED, .optional = true, },
+};
+
+static const struct nl_policy tca_flower_policy[] = {
+[TCA_FLOWER_CLASSID] = { .type = NL_A_U32, .optional = true, },
+[TCA_FLOWER_INDEV] = { .type = NL_A_STRING, .max_len = IFNAMSIZ,
+   .optional = true, },
+[TCA_FLOWER_KEY_ETH_SRC] = { .type = NL_A_UNSPEC,
+ .min_len = ETH_ALEN, .optional = true, },
+[TCA_FLOWER_KEY_ETH_DST] = { .type = NL_A_UNSPEC,
+ .min_len = ETH_ALEN, .optional = true, },
+[TCA_FLOWER_KEY_ETH_SRC_MASK] = { .type = NL_A_UNSPEC,
+  .min_len = ETH_ALEN,
+  .optional = true, },
+[TCA_FLOWER_KEY_ETH_DST_MASK] = { .type = NL_A_UNSPEC,
+  .min_len = ETH_ALEN,
+  .optional = true, },
+[TCA_FLOWER_KEY_ETH_TYPE] = { .type = NL_A_U16, .optional = false, },
+[TCA_FLOWER_FLAGS] = { .type = NL_A_U32, .optional = false, },
+[TCA_FLOWER_ACT] = { .type = NL_A_NESTED, .optional = false, },
+[TCA_FLOWER_KEY_IP_PROTO] = { .type = NL_A_U8, .optional = true, },
+[TCA_FLOWER_KEY_IPV4_SRC] = { .type = NL_A_U32, .optional = true, },
+[TCA_FLOWER_KEY_IPV4_DST] = {.type = NL_A_U32, .optional = true, },
+[TCA_FLOWER_KEY_IPV4_SRC_MASK] = { .type = NL_A_U32, .optional = true, },
+[TCA_FLOWER_KEY_IPV4_DST_MASK] = { .type = NL_A_U32, .optional = true, },
+[TCA_FLOWER_KEY_IPV6_SRC] = { .type = NL_A_UNSPEC,
+  .min_len =

[ovs-dev] [PATCH ovs V2 02/21] netdev: Adding a new netdev api to be used for offloading flows

2016-12-25 Thread Paul Blakey

Signed-off-by: Paul Blakey 
Reviewed-by: Roi Dayan 
---
 lib/automake.mk  |   2 +
 lib/netdev-bsd.c |   9 +++
 lib/netdev-dummy.c   |   9 +++
 lib/netdev-linux.c   |  26 +++--
 lib/netdev-provider.h|  29 +
 lib/netdev-tc-offloads.c | 149 +++
 lib/netdev-tc-offloads.h |  40 +
 lib/netdev-vport.c   |  11 +++-
 lib/netdev.c | 102 
 lib/netdev.h |  16 +
 10 files changed, 388 insertions(+), 5 deletions(-)
 create mode 100644 lib/netdev-tc-offloads.c
 create mode 100644 lib/netdev-tc-offloads.h

diff --git a/lib/automake.mk b/lib/automake.mk
index bcc7813..d06fc8b 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -353,6 +353,8 @@ lib_libopenvswitch_la_SOURCES += \
lib/dpif-netlink.h \
lib/tc.h \
lib/tc.c \
+   lib/netdev-tc-offloads.h \
+   lib/netdev-tc-offloads.c \
lib/if-notifier.c \
lib/if-notifier.h \
lib/netdev-linux.c \
diff --git a/lib/netdev-bsd.c b/lib/netdev-bsd.c
index 75a330b..d61c229 100644
--- a/lib/netdev-bsd.c
+++ b/lib/netdev-bsd.c
@@ -1543,6 +1543,15 @@ netdev_bsd_update_flags(struct netdev *netdev_, enum netdev_flags off,
 netdev_bsd_rxq_recv, \
 netdev_bsd_rxq_wait, \
 netdev_bsd_rxq_drain,\
+ \
+NULL, /* flow_flush */   \
+NULL, /* flow_dump_create */ \
+NULL, /* flow_dump_destroy */\
+NULL, /* flow_dump_next */   \
+NULL, /* flow_put */ \
+NULL, /* flow_get */ \
+NULL, /* flow_del */ \
+NULL, /* init_flow_api */\
 }
 
 const struct netdev_class netdev_bsd_class =
diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c
index dec1a8e..9408cc4 100644
--- a/lib/netdev-dummy.c
+++ b/lib/netdev-dummy.c
@@ -1384,6 +1384,15 @@ netdev_dummy_update_flags(struct netdev *netdev_,
 netdev_dummy_rxq_recv,  \
 netdev_dummy_rxq_wait,  \
 netdev_dummy_rxq_drain, \
+\
+NULL,   /* flow_flush */\
+NULL,   /* flow_dump_create */  \
+NULL,   /* flow_dump_destroy */ \
+NULL,   /* flow_dump_next */\
+NULL,   /* flow_put */  \
+NULL,   /* flow_get */  \
+NULL,   /* flow_del */  \
+NULL,   /* init_flow_api */ \
 }
 
 static const struct netdev_class dummy_class =
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index a5a9ec1..6a23a82 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -74,6 +74,7 @@
 #include "unaligned.h"
 #include "openvswitch/vlog.h"
 #include "util.h"
+#include "netdev-tc-offloads.h"
 
 VLOG_DEFINE_THIS_MODULE(netdev_linux);
 
@@ -2762,7 +2763,8 @@ netdev_linux_update_flags(struct netdev *netdev_, enum netdev_flags off,
 }
 
 #define NETDEV_LINUX_CLASS(NAME, CONSTRUCT, GET_STATS,  \
-   GET_FEATURES, GET_STATUS)\
+   GET_FEATURES, GET_STATUS,\
+   FLOW_OFFLOAD_API)\
 {   \
 NAME,   \
 false,  /* is_pmd */\
@@ -2831,15 +2833,29 @@ netdev_linux_update_flags(struct netdev *netdev_, enum netdev_flags off,
 netdev_linux_rxq_recv,  \
 netdev_linux_rxq_wait,  \
 netdev_linux_rxq_drain, \
+\
+FLOW_OFFLOAD_API\
 }
 
+#define NO_OFFLOAD_API NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL
+#define LINUX_OFFLOAD   \
+netdev_tc_flow_flush,   \
+netdev_tc_flow_dump_create, \
+netdev_tc_flow_dump_destroy,\
+netdev_tc_flow_dump_next,   \
+netdev_tc_flow_put, \
+netdev_tc_flow_get, \
+netdev_tc_flow_del,

[ovs-dev] [PATCH ovs V2 00/21] Introducing HW offload support for openvswitch

2016-12-25 Thread Paul Blakey

This patch series introduces rule offload functionality to dpif-netlink
via a new flow offloading API on netdev ports. The user can specify whether
to enable rule offloading via OVS configuration. Netdev providers
can implement the netdev flow offload API in order to offload rules.
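
The provider model described above can be pictured with a small sketch: a
provider table carries optional flow callbacks, and a generic wrapper returns
EOPNOTSUPP when a provider leaves a callback NULL. The struct and function
names here are hypothetical and heavily trimmed; the real provider table in
the series has more callbacks (flush, dump, get, init) and different
signatures.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct sketch_netdev;

/* Hypothetical, trimmed-down provider table. */
struct sketch_netdev_ops {
    const char *type;
    int (*flow_put)(struct sketch_netdev *, int flow_id);  /* May be NULL. */
    int (*flow_del)(struct sketch_netdev *, int flow_id);  /* May be NULL. */
};

struct sketch_netdev {
    const struct sketch_netdev_ops *ops;
    int n_flows;                    /* Flows currently "offloaded". */
};

/* Generic entry points: dispatch to the provider, or report that offload is
 * unsupported.  Errors are positive errno values, 0 means success. */
static int
sketch_netdev_flow_put(struct sketch_netdev *netdev, int flow_id)
{
    assert(netdev && netdev->ops);
    return netdev->ops->flow_put
           ? netdev->ops->flow_put(netdev, flow_id)
           : EOPNOTSUPP;
}

static int
sketch_netdev_flow_del(struct sketch_netdev *netdev, int flow_id)
{
    assert(netdev && netdev->ops);
    return netdev->ops->flow_del
           ? netdev->ops->flow_del(netdev, flow_id)
           : EOPNOTSUPP;
}

/* A toy provider that only counts flows instead of programming hardware. */
static int
counting_flow_put(struct sketch_netdev *netdev, int flow_id)
{
    (void) flow_id;
    netdev->n_flows++;
    return 0;
}

static int
counting_flow_del(struct sketch_netdev *netdev, int flow_id)
{
    (void) flow_id;
    if (!netdev->n_flows) {
        return ENOENT;
    }
    netdev->n_flows--;
    return 0;
}

static const struct sketch_netdev_ops offload_ops = {
    "offload", counting_flow_put, counting_flow_del
};
static const struct sketch_netdev_ops plain_ops = { "plain", NULL, NULL };
```

A caller such as a datapath interface could then try the offload path first
and fall back to its regular path when it sees EOPNOTSUPP; whether the series
does exactly that is outside what this message shows.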

This patch series also implements one offload scheme for netdev-linux,
using the TC flower classifier, which was chosen because it is fairly natural
to state OVS DP rules for this classifier. However, the code can be extended
to support other classifiers such as U32, eBPF, etc., which support offload
as well.

The use case we are currently addressing is the new SR-IOV switchdev mode in
the Linux kernel, which was introduced in version 4.8 [1][2]. This series was
tested against the SR-IOV VF vport representors of the Mellanox 100G
ConnectX-4 series exposed by the mlx5 kernel driver.

Changes from V1:
- Added generic netdev flow offloads API.
- Implemented relevant flow API in netdev-linux (and netdev-vport).
- Added an other_config hw-offload option to enable offloading (defaults to
  false).
- Fixed coding style to conform with OVS.
- Policy removed for now (how best to implement it will be discussed later).
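
The behaviour of the hw-offload option listed above (false unless set, and
fixed for the lifetime of the daemon) can be sketched roughly as below. The
helper names are hypothetical, the parsing rule is an assumption rather than
a description of smap_get_bool(), and the latch is a plain flag that is not
thread-safe, whereas the series guards the equivalent step with OVS's
ovsthread_once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical parser: returns 'def' when the key is absent (NULL) or the
 * value is neither "true" nor "false" (compared case-insensitively). */
static bool
sketch_parse_bool(const char *value, bool def)
{
    if (!value) {
        return def;
    }
    if (!strcasecmp(value, "true")) {
        return true;
    }
    if (!strcasecmp(value, "false")) {
        return false;
    }
    return def;
}

static bool flow_api_enabled;       /* Offload state; defaults to false. */
static bool flow_api_configured;    /* Set once the first value is latched. */

/* Latches the first value it is given; later calls are ignored, which is
 * why a changed setting would only take effect after a restart. */
static void
sketch_set_flow_api_enabled(bool enabled)
{
    if (!flow_api_configured) {
        flow_api_enabled = enabled;
        flow_api_configured = true;
    }
}
```

In the sketch, calling sketch_set_flow_api_enabled() with the parsed value on
every configuration pass is harmless, because only the first call has any
effect.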

Paul Blakey (21):
  tc: Add tc flower interface
  netdev: Adding a new netdev api to be used for offloading flows
  other-config: Add hw-offload switch to control netdev flow offloading
  other-config: Add skip-hw switch to control tc flower flag
  dpif: Save added ports in a port map for netdev flow api use
  netdev-vport: Add get ifindex implementation
  dpif-netlink: Flush added ports using netdev flow api
  netdev-tc-offloads: Implement netdev flow flush using tc interface
  dpif-netlink: Dump netdevs flows on flow dump
  netdev-tc-offloads: Add ufid to tc/netdev map
  netdev-tc-offloads: Implement netdev flow dump api using tc interface
  dpif-netlink: Use netdev flow put api to insert a flow
  netdev-tc-offloads: Add flower mask to priority map
  netdev-tc-offloads: Netdev flow put implementation using tc api
  dpif-netlink: delete a flow from netdev
  netdev-tc-offloads: netdev flow del using tc interface
  dpif-netlink: Use netdev flow get api to query a flow
  netdev-tc-offloads: Implement flow get using tc interface
  dpctl: read vswitch config on start
  netdev-linux: always add ingress qdisc
  netdev-vport: use common offloads interface

 lib/automake.mk  |   4 +
 lib/dpctl.c  |  44 +++
 lib/dpctl.h  |   2 +
 lib/dpif-netlink.c   | 506 +++-
 lib/dpif.c   |  25 ++
 lib/netdev-bsd.c |   9 +
 lib/netdev-dummy.c   |   9 +
 lib/netdev-linux.c   |  55 ++-
 lib/netdev-provider.h|  29 ++
 lib/netdev-tc-offloads.c | 625 +
 lib/netdev-tc-offloads.h |  40 ++
 lib/netdev-vport.c   |  40 +-
 lib/netdev.c | 253 
 lib/netdev.h |  32 ++
 lib/tc.c | 996 +++
 lib/tc.h | 107 +
 utilities/ovs-dpctl.c|   2 +
 vswitchd/bridge.c|   4 +
 vswitchd/vswitch.xml |  25 ++
 19 files changed, 2787 insertions(+), 20 deletions(-)
 create mode 100644 lib/netdev-tc-offloads.c
 create mode 100644 lib/netdev-tc-offloads.h
 create mode 100644 lib/tc.c
 create mode 100644 lib/tc.h

-- 
1.8.3.1


