[ovs-dev] [PATCH v4 ovn 3/3] northd: add check_pkt_larger lflows for ingress traffic
Introduce check_pkt_larger action for ingress traffic entering the cluster from a distributed gw router port or from a gw router. This patch enables pMTU discovery for ingress traffic. Signed-off-by: Lorenzo Bianconi --- northd/ovn-northd.8.xml | 60 +-- northd/ovn-northd.c | 166 northd/ovn_northd.dl| 153 tests/ovn-northd.at | 40 +- tests/ovn.at| 137 +++-- 5 files changed, 446 insertions(+), 110 deletions(-) diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml index aff189ad5..bfb048710 100644 --- a/northd/ovn-northd.8.xml +++ b/northd/ovn-northd.8.xml @@ -1923,6 +1923,15 @@ output; eth.dst == E is only programmed on the gateway port instance on the gateway chassis. + + + For a distributed logical router or for gateway router where + the port is configured with options:gateway_mtu + the action of the above flow is modified adding + check_pkt_larger in order to mark the packet + setting REGBIT_PKT_LARGER if the size is greater + than the MTU + @@ -2147,6 +2156,46 @@ next; + + + For distributed logical routers or gateway routers with gateway port + configured with options:gateway_mtu to a valid integer + value, a priority-150 flow with the match inport == + LRP REGBIT_PKT_LARGER + REGBIT_EGRESS_LOOPBACK == 0, where LRP is the + logical router port and applies the following action for ipv4 + and ipv6 respectively: + + + +icmp4 { +icmp4.type = 3; /* Destination Unreachable. */ +icmp4.code = 4; /* Frag Needed and DF was Set. */ +icmp4.frag_mtu = M; +eth.dst = E; +ip4.dst = ip4.src; +ip4.src = I; +ip.ttl = 255; +REGBIT_EGRESS_LOOPBACK = 1; +REGBIT_PKT_LARGER 0; +next(pipeline=ingress, table=0); +}; + +icmp6 { +icmp6.type = 2; +icmp6.code = 0; +icmp6.frag_mtu = M; +eth.dst = E; +ip6.dst = ip6.src; +ip6.src = I; +ip.ttl = 255; +REGBIT_EGRESS_LOOPBACK = 1; +REGBIT_PKT_LARGER 0; +next(pipeline=ingress, table=0); +}; + + + For each NAT entry of a distributed logical router (with @@ -3631,12 +3680,11 @@ REGBIT_PKT_LARGER = check_pkt_larger(L); next; For distributed logical routers or gateway routers with gateway port configured with options:gateway_mtu to a valid integer - value, this table adds the following priority-50 logical flow for each + value, this table adds the following priority-150 logical flow for each logical router port with the match inport == LRP - outport == GW_PORT - REGBIT_PKT_LARGER, where LRP is the logical - router port and GW_PORT is the gateway router port and applies - the following action for ipv4 and ipv6 respectively: + REGBIT_PKT_LARGER !REGBIT_EGRESS_LOOPBACK, + where LRP is the logical router port and applies the following + action for ipv4 and ipv6 respectively: @@ -3649,6 +3697,7 @@ icmp4 { ip4.src = I; ip.ttl = 255; REGBIT_EGRESS_LOOPBACK = 1; +REGBIT_PKT_LARGER = 0; next(pipeline=ingress, table=0); }; @@ -3661,6 +3710,7 @@ icmp6 { ip6.src = I; ip.ttl = 255; REGBIT_EGRESS_LOOPBACK = 1; +REGBIT_PKT_LARGER = 0; next(pipeline=ingress, table=0); }; diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c index f0eb715bf..8c76508cf 100644 --- a/northd/ovn-northd.c +++ b/northd/ovn-northd.c @@ -9509,6 +9509,10 @@ build_adm_ctrl_flows_for_lrouter( } } +static void +build_check_pkt_len_action_string(struct ovn_port *op, int *pmtu, + struct ds *actions); + /* Logical router ingress Table 0: L2 Admission Control * This table drops packets that the router shouldn???t see at all based * on their Ethernet headers. @@ -9536,6 +9540,8 @@ build_adm_ctrl_flows_for_lrouter_port( * the pipeline. */ ds_clear(actions); + +build_check_pkt_len_action_string(op, NULL, actions); ds_put_format(actions, REG_INPORT_ETH_ADDR " = %s; next;", op->lrp_networks.ea_s); @@ -10430,32 +10436,110 @@ build_arp_resolve_flows_for_lrouter_port( } +static void +build_icmperr_pkt_big_flows(struct ovn_port *op, int mtu, struct hmap *lflows, +struct ds *match, struct ds *actions, +enum ovn_stage stage) +{ +if (op->lrp_networks.ipv4_addrs) { +ds_clear(match); +ds_put_format(match, + "inport == %s && ip4 && "REGBIT_PKT_LARGER + " && "REGBIT_EGRESS_LOOPBACK" == 0", op->json_key); + +ds_clear(actions); +/* Set icmp4.frag_mtu to
[ovs-dev] [PATCH v4 ovn 2/3] northd: enable check_pkt_larger for gw router
As it is already done for distributed gw router scenario, introduce check_pkt_larger logical flows for gw router use case. Co-authored-by: Numan Siddique Signed-off-by: Numan Siddique Signed-off-by: Lorenzo Bianconi --- northd/ovn-northd.8.xml | 24 northd/ovn-northd.c | 31 +++--- northd/ovn_northd.dl| 82 +++ tests/ovn-northd.at | 122 tests/ovn.at| 47 5 files changed, 285 insertions(+), 21 deletions(-) diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml index 407464602..aff189ad5 100644 --- a/northd/ovn-northd.8.xml +++ b/northd/ovn-northd.8.xml @@ -3599,13 +3599,13 @@ outport = P Ingress Table 15: Check packet length - For distributed logical routers with distributed gateway port configured - with options:gateway_mtu to a valid integer value, this - table adds a priority-50 logical flow with the match - ip4 outport == GW_PORT where - GW_PORT is the distributed gateway router port and applies the - action check_pkt_larger and advances the packet to the - next table. + For distributed logical routers or gateway routers with gateway + port configured with options:gateway_mtu to a valid + integer value, this table adds a priority-50 logical flow with + the match ip4 outport == GW_PORT + where GW_PORT is the gateway router port and applies + the action check_pkt_larger and advances the packet + to the next table. @@ -3629,14 +3629,14 @@ REGBIT_PKT_LARGER = check_pkt_larger(L); next; Ingress Table 16: Handle larger packets - For distributed logical routers with distributed gateway port configured - with options:gateway_mtu to a valid integer value, this - table adds the following priority-50 logical flow for each + For distributed logical routers or gateway routers with gateway port + configured with options:gateway_mtu to a valid integer + value, this table adds the following priority-50 logical flow for each logical router port with the match inport == LRP outport == GW_PORT REGBIT_PKT_LARGER, where LRP is the logical - router port and GW_PORT is the distributed gateway router - port and applies the following action for ipv4 and ipv6 respectively: + router port and GW_PORT is the gateway router port and applies + the following action for ipv4 and ipv6 respectively: diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c index dd135c38f..f0eb715bf 100644 --- a/northd/ovn-northd.c +++ b/northd/ovn-northd.c @@ -10542,17 +10542,30 @@ build_check_pkt_len_flows_for_lrouter( struct hmap *ports, struct ds *match, struct ds *actions) { -if (od->nbr) { +if (!od->nbr) { +return; +} -/* Packets are allowed by default. */ -ovn_lflow_add(lflows, od, S_ROUTER_IN_CHK_PKT_LEN, 0, "1", - "next;"); -ovn_lflow_add(lflows, od, S_ROUTER_IN_LARGER_PKTS, 0, "1", - "next;"); +/* Packets are allowed by default. */ +ovn_lflow_add(lflows, od, S_ROUTER_IN_CHK_PKT_LEN, 0, "1", + "next;"); +ovn_lflow_add(lflows, od, S_ROUTER_IN_LARGER_PKTS, 0, "1", + "next;"); -if (od->l3dgw_port && od->l3redirect_port) { -build_check_pkt_len_flows_for_lrp(od->l3dgw_port, lflows, - ports, match, actions); +if (od->l3dgw_port && od->l3redirect_port) { +/* gw router port */ +build_check_pkt_len_flows_for_lrp(od->l3dgw_port, lflows, + ports, match, actions); +} else if (smap_get(>nbr->options, "chassis")) { +for (size_t i = 0; i < od->nbr->n_ports; i++) { +/* gw router */ +struct ovn_port *rp = ovn_port_find(ports, +od->nbr->ports[i]->name); +if (!rp) { +continue; +} +build_check_pkt_len_flows_for_lrp(rp, lflows, ports, match, + actions); } } } diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl index 3afa80a3b..0b704b524 100644 --- a/northd/ovn_northd.dl +++ b/northd/ovn_northd.dl @@ -7112,6 +7112,7 @@ for ((._uuid = lr_uuid)) /* Local router ingress table CHK_PKT_LEN: Check packet length. * + * For distributed routers with gateway ports. * Any IPv4 packet with outport set to the distributed gateway * router port, check the packet length and store the result in the * 'REGBIT_PKT_LARGER' register bit. @@ -7122,6 +7123,18 @@ for ((._uuid = lr_uuid)) * router port and the 'REGBIT_PKT_LARGER' register bit is set, * generate ICMPv4 packet with type 3 (Destination Unreachable) and * code 4 (Fragmentation needed). + * + * For Gateway
[ovs-dev] [PATCH v4 ovn 0/3] Introduce check_pkt_larger for ingress traffic
In the current codebase, check_pkt_larger is applied just for traffic leaving the ovn cluster. This series introduces the same capability for traffic entering the network from a gateway router or distributed gateway router port in order to send an ICMP error packet if the frame size is greater than the configured MTU. Changes since v3: - add missing documentation - add DDlog implementation Changes since v2: - squash gw router and distributed gw router tests - fix typos Changes since v1: - drop router pipeline rearrangement - refer to check_pkt_larger instead of check_pkt_len Lorenzo Bianconi (3): northd: introduce build_check_pkt_len_flows_for_lrp routine northd: enable check_pkt_larger for gw router northd: add check_pkt_larger lflows for ingress traffic northd/ovn-northd.8.xml | 78 ++--- northd/ovn-northd.c | 240 northd/ovn_northd.dl| 223 +++-- tests/ovn-northd.at | 122 tests/ovn.at| 182 +- 5 files changed, 727 insertions(+), 118 deletions(-) -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v4 ovn 1/3] northd: introduce build_check_pkt_len_flows_for_lrp routine
Introduce build_check_pkt_len_flows_for_lrp routine to configure check_pkt_larger logical flow for a given logical port. This is a preliminary patch to enable check_pkt_larger support for gw router use case. Acked-by: Mark Michelson Signed-off-by: Lorenzo Bianconi --- northd/ovn-northd.c | 181 +++- 1 file changed, 95 insertions(+), 86 deletions(-) diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c index 75484c5cd..dd135c38f 100644 --- a/northd/ovn-northd.c +++ b/northd/ovn-northd.c @@ -10430,6 +10430,99 @@ build_arp_resolve_flows_for_lrouter_port( } +static void +build_check_pkt_len_flows_for_lrp(struct ovn_port *op, + struct hmap *lflows, struct hmap *ports, + struct ds *match, struct ds *actions) +{ +int gw_mtu = 0; + +if (op->nbrp) { + gw_mtu = smap_get_int(>nbrp->options, "gateway_mtu", 0); +} +/* Add the flows only if gateway_mtu is configured. */ +if (gw_mtu <= 0) { +return; +} + +ds_clear(match); +ds_put_format(match, "outport == %s", op->json_key); + +ds_clear(actions); +ds_put_format(actions, + REGBIT_PKT_LARGER" = check_pkt_larger(%d);" + " next;", gw_mtu + VLAN_ETH_HEADER_LEN); +ovn_lflow_add_with_hint(lflows, op->od, S_ROUTER_IN_CHK_PKT_LEN, 50, +ds_cstr(match), ds_cstr(actions), +>nbrp->header_); + +for (size_t i = 0; i < op->od->nbr->n_ports; i++) { +struct ovn_port *rp = ovn_port_find(ports, +op->od->nbr->ports[i]->name); +if (!rp || rp == op) { +continue; +} + +if (rp->lrp_networks.ipv4_addrs) { +ds_clear(match); +ds_put_format(match, "inport == %s && outport == %s" + " && ip4 && "REGBIT_PKT_LARGER, + rp->json_key, op->json_key); + +ds_clear(actions); +/* Set icmp4.frag_mtu to gw_mtu */ +ds_put_format(actions, +"icmp4_error {" +REGBIT_EGRESS_LOOPBACK" = 1; " +"eth.dst = %s; " +"ip4.dst = ip4.src; " +"ip4.src = %s; " +"ip.ttl = 255; " +"icmp4.type = 3; /* Destination Unreachable. */ " +"icmp4.code = 4; /* Frag Needed and DF was Set. */ " +"icmp4.frag_mtu = %d; " +"next(pipeline=ingress, table=%d); };", +rp->lrp_networks.ea_s, +rp->lrp_networks.ipv4_addrs[0].addr_s, +gw_mtu, +ovn_stage_get_table(S_ROUTER_IN_ADMISSION)); +ovn_lflow_add_with_hint(lflows, op->od, +S_ROUTER_IN_LARGER_PKTS, 50, +ds_cstr(match), ds_cstr(actions), +>nbrp->header_); +} + +if (rp->lrp_networks.ipv6_addrs) { +ds_clear(match); +ds_put_format(match, "inport == %s && outport == %s" + " && ip6 && "REGBIT_PKT_LARGER, + rp->json_key, op->json_key); + +ds_clear(actions); +/* Set icmp6.frag_mtu to gw_mtu */ +ds_put_format(actions, +"icmp6_error {" +REGBIT_EGRESS_LOOPBACK" = 1; " +"eth.dst = %s; " +"ip6.dst = ip6.src; " +"ip6.src = %s; " +"ip.ttl = 255; " +"icmp6.type = 2; /* Packet Too Big. */ " +"icmp6.code = 0; " +"icmp6.frag_mtu = %d; " +"next(pipeline=ingress, table=%d); };", +rp->lrp_networks.ea_s, +rp->lrp_networks.ipv6_addrs[0].addr_s, +gw_mtu, +ovn_stage_get_table(S_ROUTER_IN_ADMISSION)); +ovn_lflow_add_with_hint(lflows, op->od, +S_ROUTER_IN_LARGER_PKTS, 50, +ds_cstr(match), ds_cstr(actions), +>nbrp->header_); +} +} +} + /* Local router ingress table CHK_PKT_LEN: Check packet length. * * Any IPv4 packet with outport set to the distributed gateway @@ -10458,92 +10551,8 @@ build_check_pkt_len_flows_for_lrouter( "next;"); if (od->l3dgw_port && od->l3redirect_port) { -int gw_mtu = 0; -if (od->l3dgw_port->nbrp) { - gw_mtu = smap_get_int(>l3dgw_port->nbrp->options, - "gateway_mtu", 0); -} -/* Add the flows only if gateway_mtu is configured. */ -if (gw_mtu <= 0) { -return; -} - -ds_clear(match); -ds_put_format(match, "outport == %s",
Re: [ovs-dev] [PATCH v6] ofproto-dpif: APIs and CLI option to add/delete static fdb entry
Vasu, I reviewed your v7, but added my comments in the v6 email :( As only the documentation updated in v7, it should reflect the same code areas. Cheers, Eelco On 23 Jun 2021, at 16:41, Eelco Chaudron wrote: >> On 12 Jun 2021, at 4:09, Vasu Dasari wrote: > > See my inline comments below. > > Cheers, > > Eelco > > >> Currently there is an option to add/flush/show ARP/ND neighbor. This covers >> L3 >> side. For L2 side, there is only fdb show command. This patch gives an >> option >> to add/del an fdb entry via ovs-appctl. >> >> CLI command looks like: >> >> To add: >> ovs-appctl fdb/add >> ovs-appctl fdb/add br0 p1 0 50:54:00:00:00:05 >> >> To del: >> ovs-appctl fdb/del >> ovs-appctl fdb/del br0 p1 0 50:54:00:00:00:05 >> >> Added two new APIs to provide convenient interface to add and delete >> static-macs. >> bool xlate_add_static_mac_entry(const struct ofproto_dpif *, ofp_port_t >> in_port, >>struct eth_addr dl_src, int vlan); >> bool xlate_delete_static_mac_entry(const struct ofproto_dpif *, >> struct eth_addr dl_src, int vlan); >> >> 1. Static entry should not age. To indicate that entry being programmed is a >> static entry, >>'expires' field in 'struct mac_entry' will be set to a >> MAC_ENTRY_AGE_STATIC_ENTRY. A >>check for this value is made while deleting mac entry as part of regular >> aging process. >> 2. Another change to of mac-update logic, when a packet with same dl_src as >> that of a >>static-mac entry arrives on any port, the logic will not modify the >> expires field. >> 3. While flushing fdb entries, made sure static ones are not evicted. >> 4. Updated "ovs-appctl fdb/stats-show br0" to display numberof static >> entries in switch >> >> Added following tests: >> ofproto-dpif - static-mac add/del/flush >> ofproto-dpif - static-mac mac moves >> >> Signed-off-by: Vasu Dasari >> Reported-at: >> https://mail.openvswitch.org/pipermail/ovs-discuss/2019-June/048894.html >> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1597752 >> Tested-by: Eelco Chaudron >> --- >> v1: >> - Fixed 0-day robot warnings >> v2: >> - Fix valgrind error in the modified code in mac_learning_insert() where a >> read is >>is performed on e->expires which is not initialized >> v3: >> - Addressed code review comments >> - Added more documentation >> - Fixed mac_entry_age() and is_mac_learning_update_needed() to have common >>understanding of return values when mac_entry is a static one. >> - Added NEWS item >> v4: >> - Addressed code review comments >> - Static entries will not be purged when fdb/flush is performed. >> - Static entries will not be overwritten when packet with same dl_src >> arrives on >>any port of switch >> - Provided bit more detail while doing fdb/add, to indicate if static-mac is >>overriding already present entry >> - Separated test cases for a bit more clarity >> v5: >> - Addressed code review comments >> - Added new total_static counter to count number of static entries. >> - Removed mac_entry_set_idle_time() >> - Added mac_learning_add_static_entry() and mac_learning_del_static_entry() >> - Modified APIs xlate_add_static_mac_entry() and >> xlate_delete_static_mac_entry() >>return 0 on success else a failure code >> v6: >> - Fixed a probable bug with Eelco's code review comments in >>is_mac_learning_update_needed() >> --- >> NEWS | 2 + >> lib/mac-learning.c | 149 +++ >> lib/mac-learning.h | 17 >> ofproto/ofproto-dpif-xlate.c | 48 +-- >> ofproto/ofproto-dpif-xlate.h | 5 ++ >> ofproto/ofproto-dpif.c | 95 +- >> tests/ofproto-dpif.at| 93 ++ >> 7 files changed, 380 insertions(+), 29 deletions(-) >> >> diff --git a/NEWS b/NEWS >> index ebba17b22..501b26cee 100644 >> --- a/NEWS >> +++ b/NEWS >> @@ -9,6 +9,8 @@ Post-v2.15.0 >> - Userspace datapath: >> * Auto load balancing of PMDs now partially supports cross-NUMA polling >> cases, e.g if all PMD threads are running on the same NUMA node. >> + * Added ability to add and delete static mac entries using: >> + 'ovs-appctl fdb/{add,del}' > > I think this needs its own section, not being part of "- Userspace > datapath:". So something like: > > - Added ability to add and delete static mac entries using: >'ovs-appctl fdb/{add,del}' > >> - ovs-ctl: >> * New option '--no-record-hostname' to disable hostname configuration >> in ovsdb on startup. >> diff --git a/lib/mac-learning.c b/lib/mac-learning.c >> index 3d5293d3b..2f51a6553 100644 >> --- a/lib/mac-learning.c >> +++ b/lib/mac-learning.c >> @@ -35,12 +35,23 @@ COVERAGE_DEFINE(mac_learning_expired); >> COVERAGE_DEFINE(mac_learning_evicted); >> COVERAGE_DEFINE(mac_learning_moved); >> >> -/* Returns the number
Re: [ovs-dev] [v4 01/12] dpif-netdev: Add command line and function pointer for miniflow extract
> This patch introduces the mfex function pointers which allows > the user to switch between different miniflow extract implementations > which are provided by the OVS based on optimized ISA CPU. Thanks for the patch Amber/harry. Comments inline below. > > The user can query for the available minflow extract variants available > for that CPU by following commands: > > $ovs-appctl dpif-netdev/miniflow-parser-get > > Similarly an user can set the miniflow implementation by the following > command : > > $ ovs-appctl dpif-netdev/miniflow-parser-set name > > This allow for more performance and flexibility to the user to choose Minor typo above, allow -> allows. > the miniflow implementation according to the needs. > > Signed-off-by: Kumar Amber > Co-authored-by: Harry van Haaren > Signed-off-by: Harry van Haaren > --- > lib/automake.mk | 2 + > lib/dpif-netdev-avx512.c | 32 ++-- > lib/dpif-netdev-private-extract.c | 86 > lib/dpif-netdev-private-extract.h | 94 ++ > lib/dpif-netdev-private-thread.h | 4 + > lib/dpif-netdev.c | 126 +- > 6 files changed, 337 insertions(+), 7 deletions(-) > create mode 100644 lib/dpif-netdev-private-extract.c > create mode 100644 lib/dpif-netdev-private-extract.h > > diff --git a/lib/automake.mk b/lib/automake.mk > index 49f42c2a3..6657b9ae5 100644 > --- a/lib/automake.mk > +++ b/lib/automake.mk > @@ -118,6 +118,8 @@ lib_libopenvswitch_la_SOURCES = \ > lib/dpif-netdev-private-dpcls.h \ > lib/dpif-netdev-private-dpif.c \ > lib/dpif-netdev-private-dpif.h \ > + lib/dpif-netdev-private-extract.c \ > + lib/dpif-netdev-private-extract.h \ > lib/dpif-netdev-private-flow.h \ > lib/dpif-netdev-private-hwol.h \ > lib/dpif-netdev-private-thread.h \ > diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c > index f9b199637..bb99b23ff 100644 > --- a/lib/dpif-netdev-avx512.c > +++ b/lib/dpif-netdev-avx512.c > @@ -148,6 +148,15 @@ dp_netdev_input_outer_avx512(struct > dp_netdev_pmd_thread *pmd, > * // do all processing (HWOL->MFEX->EMC->SMC) > * } > */ > + > +/* Do a batch minfilow extract into keys. */ > +uint32_t mf_mask = 0; > +if (pmd->miniflow_extract_opt) { > +mf_mask = pmd->miniflow_extract_opt(packets, keys, > +batch_size, in_port, > +(void *) pmd); > +} > +/* Perform first packet interation */ Would add whitespace above to separate the comment form the conditional block. Also minor nit, missing period at end of comment. > uint32_t lookup_pkts_bitmask = (1ULL << batch_size) - 1; > uint32_t iter = lookup_pkts_bitmask; > while (iter) { > @@ -159,6 +168,12 @@ dp_netdev_input_outer_avx512(struct > dp_netdev_pmd_thread *pmd, > pkt_metadata_init(>md, in_port); > > struct dp_netdev_flow *f = NULL; > +struct netdev_flow_key *key = [i]; > + > +/* Check the minfiflow mask to see if the packet was correctly > +* classifed by vector mfex else do a scalar miniflow extract > +* for that packet. */ Typo above for miniflow. Also alignment of * in comment seems out of place. Should be vertically aligned. Would also suggest finishing with */ on separate line after the comment. > +uint32_t mfex_hit = (mf_mask & (1 << i)); > > /* Check for partial hardware offload mark. */ > uint32_t mark; > @@ -166,7 +181,13 @@ dp_netdev_input_outer_avx512(struct > dp_netdev_pmd_thread *pmd, > f = mark_to_flow_find(pmd, mark); > if (f) { > rules[i] = >cr; > -pkt_meta[i].tcp_flags = parse_tcp_flags(packet); > +/* If AVX512 MFEX already classified the packet, use it. */ > +if (mfex_hit) { > +pkt_meta[i].tcp_flags = miniflow_get_tcp_flags(>mf); > +} else { > +pkt_meta[i].tcp_flags = parse_tcp_flags(packet); > +} > + > pkt_meta[i].bytes = dp_packet_size(packet); > phwol_hits++; > hwol_emc_smc_hitmask |= (1 << i); > @@ -174,11 +195,12 @@ dp_netdev_input_outer_avx512(struct > dp_netdev_pmd_thread *pmd, > } > } > > -/* Do miniflow extract into keys. */ > -struct netdev_flow_key *key = [i]; > -miniflow_extract(packet, >mf); > +if (!mfex_hit) { > +/* Do a scalar miniflow extract into keys */ Minor, missing period in comment. > +miniflow_extract(packet, >mf); > +} > > -/* Cache TCP and byte values for all packets. */ > +/* Cache TCP and byte values for all packets */ Period removed from comment, should be put back. > pkt_meta[i].bytes = dp_packet_size(packet); >
Re: [ovs-dev] [PATCH V6 00/13] Netdev vxlan-decap offload
On 6/23/21 5:18 PM, Ferriter, Cian wrote: >> -Original Message- >> From: dev On Behalf Of Ferriter, Cian >> Sent: Wednesday 23 June 2021 13:38 >> To: Ilya Maximets ; Eli Britstein ; >> d...@openvswitch.org >> Cc: ivan.ma...@oktetlabs.ru; Ameer Mahagneh ; Majd >> Dibbiny >> Subject: Re: [ovs-dev] [PATCH V6 00/13] Netdev vxlan-decap offload >> >> Hi all, >> >> As part of rebasing our AVX512 DPIF on this patchset, I tested this patchset >> with partial HWOL and I'm >> seeing strange behaviour. >> >> I'll report back more detailed findings soon, just wanted to mention this >> here as soon as I found the >> issue. >> >> Thanks, >> Cian >> > > More details on the issue I'm seeing: > I'm using Ilya's branch from Github: > https://github.com/igsilya/ovs/tree/tmp-vxlan-decap > > ~/ovs_scripts# $OVS_DIR/utilities/ovs-vsctl list Open_vSwitch > dpdk_version: "DPDK 20.11.1" > other_config: {dpdk-hugepage-dir="/mnt/huge", dpdk-init="true", > dpdk-lcore-mask="0x1", dpdk-socket-mem="2048,0", emc-insert-inv-prob="0", > hw-offload="true", pmd-cpu-mask="0x2"} > > ~/ovs_scripts# $OVS_DIR/utilities/ovs-vsctl show > 31584ce5-09c1-44b3-ab27-1a0308d63fff > Bridge br0 > datapath_type: netdev > Port br0 > Interface br0 > type: internal > Port phy0 > Interface phy0 > type: dpdk > options: {dpdk-devargs="5e:00.0"} > > ~/ovs_scripts# $OVS_DIR/utilities/ovs-ofctl dump-flows br0 > cookie=0x0, duration=29.466s, table=0, n_packets=0, n_bytes=0, in_port=phy0 > actions=IN_PORT > > I'm expecting the flow to be partially offloaded, but I get a segfault when > using the above branch. More info on the segfault below: > > Thread 13 "pmd-c01/id:8" received signal SIGSEGV, Segmentation fault. > [Switching to Thread 0x7f9f72734700 (LWP 19327)] > 0x56163bf0d825 in set_error (error=0x0, type=RTE_FLOW_ERROR_TYPE_ATTR) at > lib/netdev-dpdk.h:84 > (gdb) bt > #0 0x56163bf0d825 in set_error (error=0x0, > type=RTE_FLOW_ERROR_TYPE_ATTR) at lib/netdev-dpdk.h:84 > #1 0x56163bf0d8d3 in netdev_dpdk_rte_flow_get_restore_info > (netdev=0x1bfc65c80, p=0x19033af00, info=0x7f9f72729a20, error=0x0) at > lib/netdev-dpdk.h:119 > #2 0x56163bf14da3 in netdev_offload_dpdk_hw_miss_packet_recover > (netdev=0x1bfc65c80, packet=0x19033af00) at lib/netdev-offload-dpdk.c:2133 > #3 0x56163bde3662 in netdev_hw_miss_packet_recover (netdev=0x1bfc65c80, > packet=0x19033af00) at lib/netdev-offload.c:265 > #4 0x56163bda19a9 in dp_netdev_hw_flow (pmd=0x7f9f72735010, port_no=2, > packet=0x19033af00, flow=0x7f9f72729b98) at lib/dpif-netdev.c:7087 > #5 0x56163bda1c5c in dfc_processing (pmd=0x7f9f72735010, > packets_=0x7f9f727310d0, keys=0x7f9f7272c480, missed_keys=0x7f9f7272c370, > batches=0x7f9f72729f60, n_batches=0x7f9f72730f70, flow_map=0x7f9f72729c50, > n_flows=0x7f9f72730f78, index_map=0x7f9f72729c30 "", md_is_valid=false, > port_no=2) at lib/dpif-netdev.c:7168 > #6 0x56163bda2f3e in dp_netdev_input__ (pmd=0x7f9f72735010, > packets=0x7f9f727310d0, md_is_valid=false, port_no=2) at > lib/dpif-netdev.c:7475 > #7 0x56163bda3105 in dp_netdev_input (pmd=0x7f9f72735010, > packets=0x7f9f727310d0, port_no=2) at lib/dpif-netdev.c:7519 > #8 0x56163bd9ab04 in dp_netdev_process_rxq_port (pmd=0x7f9f72735010, > rxq=0x56163fb3f610, port_no=2) at lib/dpif-netdev.c:4774 > #9 0x56163bd9ee17 in pmd_thread_main (f_=0x7f9f72735010) at > lib/dpif-netdev.c:6063 > #10 0x56163be71c88 in ovsthread_wrapper (aux_=0x56163fb3fe70) at > lib/ovs-thread.c:383 > #11 0x7f9f884cf6db in start_thread (arg=0x7f9f72734700) at > pthread_create.c:463 > #12 0x7f9f862bb71f in clone () at > ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 > > In netdev_offload_dpdk_hw_miss_packet_recover() calls > netdev_dpdk_rte_flow_get_restore_info() with a NULL for the struct > rte_flow_error *error argument: > > if (netdev_dpdk_rte_flow_get_restore_info(netdev, packet, > _restore_info, NULL)) { > /* This function is called for every packet, and in most cases there > * will be no restore info from the HW, thus error is expected. > */ > return 0; > } > > There are 2 "netdev_dpdk_rte_flow_get_restore_info()" functions. One in > lib/netdev-dpdk.h and one in lib/netdev-dpdk.c. > > I don't have the experimental API enabled, so I'm using the function rom > lib/netdev-dpdk.h. Yes, that's my fault. I replaced 'error' with NULL, because actual DPDK implementation supports that and we're not using this error anyway. But I missed the fact that dummy implementation doesn't support NULL as argument. Following change should fix your issue: diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h index 7b77ed8e0..699be3fb4 100644 --- a/lib/netdev-dpdk.h +++ b/lib/netdev-dpdk.h @@ -81,6 +81,9 @@ int
Re: [ovs-dev] [PATCH] dpif-netlink: "bonding_masters" is a reserved name
On 6/23/21 2:12 PM, Timothy Redaelli wrote: > Currently, on Linux, if you try to create a system datapath called > "bonding_masters", when you have bonding module loaded, you have a > kernel trace > ("sysfs: cannot create duplicate filename '/class/net/bonding_masters'"). > > This trace appears since "bonding" kernel modules creates a file called > "/sys/class/net/bonding_masters", that prevents any network interface to > be called "bonding_masters". > > This commits forbid an user to create a system datapath (that is a network > interface) called "bonding_masters" to avoid the kernel trace and to > avoid that bonding module can't work if it's loaded after > "bonding_masters" interface is created. > > Reported-at: https://bugzilla.redhat.com/1974303 > Signed-off-by: Timothy Redaelli > --- Hi, Timothy. Looking at BZ linked above, I tend to agree that it's a kernel's bug and working around it in every userspace program that is able to create a network interface doesn't make much sense to me. I think, kernel should just reject attempts to create network interfaces with this kind of names. I can create this kind of interface with just an ip command, OVS can create this kind of interface, any DPDK application is able to create tap interface with this name, QEMU, and so on. Simple 'ip tuntap add mode tap bonding_masters && modprobe bonding' gives the same call trace in a kernel. Also, the change below will only reject creation of bridges with such name, but will not prevent creation of regular ports (e.g. tap interfaces) and having this check in 3-5 places in the code doesn't look right to me. > lib/dpif-netlink.c | 8 > 1 file changed, 8 insertions(+) > > diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c > index 73d5608a8..ada1d8479 100644 > --- a/lib/dpif-netlink.c > +++ b/lib/dpif-netlink.c > @@ -330,6 +330,14 @@ dpif_netlink_open(const struct dpif_class *class > OVS_UNUSED, const char *name, > uint32_t upcall_pid; > int error; > > +/* "bonding_masters" is a reserved interface name under Linux, > + * since bonding module creates /sys/class/net/bonding_masters > + * and so no interface can be called "bonding_masters". > + */ > +if (!strcmp(name, "bonding_masters")) { > +return EINVAL; > +} > + > error = dpif_netlink_init(); > if (error) { > return error; > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] conntrack: increment coverage counter for all bad checksum cases
On Wed, Jun 23, 2021 at 1:01 AM Paolo Valerio wrote: > > Paolo Valerio writes: > > > conntrack_l4csum_err gets incremented only when corrupted icmp > > pass through conntrack. > > Increase it for the remaining bad checksum cases including when > > checksum is offloaded. > > > > Signed-off-by: Paolo Valerio > > --- > > Missed the Fixes tag: > > Fixes: 38c69ccf8e29 ("conntrack: Add coverage count for l4csum error.") Acked-by: Tonghao Zhang Thanks! > ___ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev -- Best regards, Tonghao ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v3 0/2] add port-based ingress policing based packet-per-second rate-limiting
On Wed, Jun 09, 2021 at 11:52:07AM +0200, Simon Horman wrote: > Hi, > > this short test adds support for add port-based ingress policing based > packet-per-second rate-limiting. This builds on existing support for > byte-per-second rate limiting. > > Changes since v2 > > * Remove the for loop in function nl_msg_put_act_police() > * Remove unused enum definition for qos type > * Define 1 kpkts as 1000 packets rather than 1024 packets > * Update the description for the new item in ovsdb > * Fix some format warnings according robot's comments > > Changes between v1 and v2 > * Correct typo: s/comsume/consume/ Hi Marcelo, could I trouble you for a review of this series. I believe it addresses the issues that you raised in v2. > Tianyu Yuan (1): > add test cases for ingress_policing_kpkts parameters > > Yong Xu (1): > add port-based ingress policing based packet-per-second rate-limiting > > acinclude.m4 | 6 +- > include/linux/pkt_cls.h | 6 +- > lib/netdev-dpdk.c| 4 +- > lib/netdev-linux-private.h | 4 +- > lib/netdev-linux.c | 109 ++- > lib/netdev-provider.h| 11 ++-- > lib/netdev.c | 15 +++-- > lib/netdev.h | 3 +- > tests/atlocal.in | 17 - > tests/ovs-vsctl.at | 23 +++ > tests/system-offloads-traffic.at | 50 ++ > vswitchd/bridge.c| 6 +- > vswitchd/vswitch.ovsschema | 10 ++- > vswitchd/vswitch.xml | 59 - > 14 files changed, 266 insertions(+), 57 deletions(-) > > -- > 2.20.1 > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v6] ofproto-dpif: APIs and CLI option to add/delete static fdb entry
> On 12 Jun 2021, at 4:09, Vasu Dasari wrote: See my inline comments below. Cheers, Eelco > Currently there is an option to add/flush/show ARP/ND neighbor. This covers L3 > side. For L2 side, there is only fdb show command. This patch gives an option > to add/del an fdb entry via ovs-appctl. > > CLI command looks like: > > To add: > ovs-appctl fdb/add > ovs-appctl fdb/add br0 p1 0 50:54:00:00:00:05 > > To del: > ovs-appctl fdb/del > ovs-appctl fdb/del br0 p1 0 50:54:00:00:00:05 > > Added two new APIs to provide convenient interface to add and delete > static-macs. > bool xlate_add_static_mac_entry(const struct ofproto_dpif *, ofp_port_t > in_port, >struct eth_addr dl_src, int vlan); > bool xlate_delete_static_mac_entry(const struct ofproto_dpif *, > struct eth_addr dl_src, int vlan); > > 1. Static entry should not age. To indicate that entry being programmed is a > static entry, >'expires' field in 'struct mac_entry' will be set to a > MAC_ENTRY_AGE_STATIC_ENTRY. A >check for this value is made while deleting mac entry as part of regular > aging process. > 2. Another change to of mac-update logic, when a packet with same dl_src as > that of a >static-mac entry arrives on any port, the logic will not modify the > expires field. > 3. While flushing fdb entries, made sure static ones are not evicted. > 4. Updated "ovs-appctl fdb/stats-show br0" to display numberof static entries > in switch > > Added following tests: > ofproto-dpif - static-mac add/del/flush > ofproto-dpif - static-mac mac moves > > Signed-off-by: Vasu Dasari > Reported-at: > https://mail.openvswitch.org/pipermail/ovs-discuss/2019-June/048894.html > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1597752 > Tested-by: Eelco Chaudron > --- > v1: > - Fixed 0-day robot warnings > v2: > - Fix valgrind error in the modified code in mac_learning_insert() where a > read is >is performed on e->expires which is not initialized > v3: > - Addressed code review comments > - Added more documentation > - Fixed mac_entry_age() and is_mac_learning_update_needed() to have common >understanding of return values when mac_entry is a static one. > - Added NEWS item > v4: > - Addressed code review comments > - Static entries will not be purged when fdb/flush is performed. > - Static entries will not be overwritten when packet with same dl_src > arrives on >any port of switch > - Provided bit more detail while doing fdb/add, to indicate if static-mac is >overriding already present entry > - Separated test cases for a bit more clarity > v5: > - Addressed code review comments > - Added new total_static counter to count number of static entries. > - Removed mac_entry_set_idle_time() > - Added mac_learning_add_static_entry() and mac_learning_del_static_entry() > - Modified APIs xlate_add_static_mac_entry() and > xlate_delete_static_mac_entry() >return 0 on success else a failure code > v6: > - Fixed a probable bug with Eelco's code review comments in >is_mac_learning_update_needed() > --- > NEWS | 2 + > lib/mac-learning.c | 149 +++ > lib/mac-learning.h | 17 > ofproto/ofproto-dpif-xlate.c | 48 +-- > ofproto/ofproto-dpif-xlate.h | 5 ++ > ofproto/ofproto-dpif.c | 95 +- > tests/ofproto-dpif.at| 93 ++ > 7 files changed, 380 insertions(+), 29 deletions(-) > > diff --git a/NEWS b/NEWS > index ebba17b22..501b26cee 100644 > --- a/NEWS > +++ b/NEWS > @@ -9,6 +9,8 @@ Post-v2.15.0 > - Userspace datapath: > * Auto load balancing of PMDs now partially supports cross-NUMA polling > cases, e.g if all PMD threads are running on the same NUMA node. > + * Added ability to add and delete static mac entries using: > + 'ovs-appctl fdb/{add,del}' I think this needs its own section, not being part of "- Userspace datapath:". So something like: - Added ability to add and delete static mac entries using: 'ovs-appctl fdb/{add,del}' > - ovs-ctl: > * New option '--no-record-hostname' to disable hostname configuration > in ovsdb on startup. > diff --git a/lib/mac-learning.c b/lib/mac-learning.c > index 3d5293d3b..2f51a6553 100644 > --- a/lib/mac-learning.c > +++ b/lib/mac-learning.c > @@ -35,12 +35,23 @@ COVERAGE_DEFINE(mac_learning_expired); > COVERAGE_DEFINE(mac_learning_evicted); > COVERAGE_DEFINE(mac_learning_moved); > > -/* Returns the number of seconds since 'e' (within 'ml') was last learned. */ > +/* > + * This function will return age of mac entry in the fdb. It NIT: If you break the line before 80, I would also move the above "It" to the line below. > + * will return either one of the following: > + * 1. Number of seconds since 'e' (within 'ml') was last learned. > + * 2. If the mac
Re: [ovs-dev] [PATCH V6 00/13] Netdev vxlan-decap offload
> -Original Message- > From: dev On Behalf Of Ferriter, Cian > Sent: Wednesday 23 June 2021 13:38 > To: Ilya Maximets ; Eli Britstein ; > d...@openvswitch.org > Cc: ivan.ma...@oktetlabs.ru; Ameer Mahagneh ; Majd Dibbiny > > Subject: Re: [ovs-dev] [PATCH V6 00/13] Netdev vxlan-decap offload > > Hi all, > > As part of rebasing our AVX512 DPIF on this patchset, I tested this patchset > with partial HWOL and I'm > seeing strange behaviour. > > I'll report back more detailed findings soon, just wanted to mention this > here as soon as I found the > issue. > > Thanks, > Cian > More details on the issue I'm seeing: I'm using Ilya's branch from Github: https://github.com/igsilya/ovs/tree/tmp-vxlan-decap ~/ovs_scripts# $OVS_DIR/utilities/ovs-vsctl list Open_vSwitch dpdk_version: "DPDK 20.11.1" other_config: {dpdk-hugepage-dir="/mnt/huge", dpdk-init="true", dpdk-lcore-mask="0x1", dpdk-socket-mem="2048,0", emc-insert-inv-prob="0", hw-offload="true", pmd-cpu-mask="0x2"} ~/ovs_scripts# $OVS_DIR/utilities/ovs-vsctl show 31584ce5-09c1-44b3-ab27-1a0308d63fff Bridge br0 datapath_type: netdev Port br0 Interface br0 type: internal Port phy0 Interface phy0 type: dpdk options: {dpdk-devargs="5e:00.0"} ~/ovs_scripts# $OVS_DIR/utilities/ovs-ofctl dump-flows br0 cookie=0x0, duration=29.466s, table=0, n_packets=0, n_bytes=0, in_port=phy0 actions=IN_PORT I'm expecting the flow to be partially offloaded, but I get a segfault when using the above branch. More info on the segfault below: Thread 13 "pmd-c01/id:8" received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x7f9f72734700 (LWP 19327)] 0x56163bf0d825 in set_error (error=0x0, type=RTE_FLOW_ERROR_TYPE_ATTR) at lib/netdev-dpdk.h:84 (gdb) bt #0 0x56163bf0d825 in set_error (error=0x0, type=RTE_FLOW_ERROR_TYPE_ATTR) at lib/netdev-dpdk.h:84 #1 0x56163bf0d8d3 in netdev_dpdk_rte_flow_get_restore_info (netdev=0x1bfc65c80, p=0x19033af00, info=0x7f9f72729a20, error=0x0) at lib/netdev-dpdk.h:119 #2 0x56163bf14da3 in netdev_offload_dpdk_hw_miss_packet_recover (netdev=0x1bfc65c80, packet=0x19033af00) at lib/netdev-offload-dpdk.c:2133 #3 0x56163bde3662 in netdev_hw_miss_packet_recover (netdev=0x1bfc65c80, packet=0x19033af00) at lib/netdev-offload.c:265 #4 0x56163bda19a9 in dp_netdev_hw_flow (pmd=0x7f9f72735010, port_no=2, packet=0x19033af00, flow=0x7f9f72729b98) at lib/dpif-netdev.c:7087 #5 0x56163bda1c5c in dfc_processing (pmd=0x7f9f72735010, packets_=0x7f9f727310d0, keys=0x7f9f7272c480, missed_keys=0x7f9f7272c370, batches=0x7f9f72729f60, n_batches=0x7f9f72730f70, flow_map=0x7f9f72729c50, n_flows=0x7f9f72730f78, index_map=0x7f9f72729c30 "", md_is_valid=false, port_no=2) at lib/dpif-netdev.c:7168 #6 0x56163bda2f3e in dp_netdev_input__ (pmd=0x7f9f72735010, packets=0x7f9f727310d0, md_is_valid=false, port_no=2) at lib/dpif-netdev.c:7475 #7 0x56163bda3105 in dp_netdev_input (pmd=0x7f9f72735010, packets=0x7f9f727310d0, port_no=2) at lib/dpif-netdev.c:7519 #8 0x56163bd9ab04 in dp_netdev_process_rxq_port (pmd=0x7f9f72735010, rxq=0x56163fb3f610, port_no=2) at lib/dpif-netdev.c:4774 #9 0x56163bd9ee17 in pmd_thread_main (f_=0x7f9f72735010) at lib/dpif-netdev.c:6063 #10 0x56163be71c88 in ovsthread_wrapper (aux_=0x56163fb3fe70) at lib/ovs-thread.c:383 #11 0x7f9f884cf6db in start_thread (arg=0x7f9f72734700) at pthread_create.c:463 #12 0x7f9f862bb71f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 In netdev_offload_dpdk_hw_miss_packet_recover() calls netdev_dpdk_rte_flow_get_restore_info() with a NULL for the struct rte_flow_error *error argument: if (netdev_dpdk_rte_flow_get_restore_info(netdev, packet, _restore_info, NULL)) { /* This function is called for every packet, and in most cases there * will be no restore info from the HW, thus error is expected. */ return 0; } There are 2 "netdev_dpdk_rte_flow_get_restore_info()" functions. One in lib/netdev-dpdk.h and one in lib/netdev-dpdk.c. I don't have the experimental API enabled, so I'm using the function rom lib/netdev-dpdk.h. > > -Original Message- > > From: dev On Behalf Of Ilya Maximets > > Sent: Tuesday 22 June 2021 16:55 > > To: Eli Britstein ; d...@openvswitch.org; Ilya Maximets > > > > Cc: ivan.ma...@oktetlabs.ru; Ameer Mahagneh ; Majd > > Dibbiny > > Subject: Re: [ovs-dev] [PATCH V6 00/13] Netdev vxlan-decap offload > > > > On 4/4/21 11:54 AM, Eli Britstein wrote: > > > VXLAN decap in OVS-DPDK configuration consists of two flows: > > > F1: in_port(ens1f0),eth(),ipv4(),udp(), actions:tnl_pop(vxlan_sys_4789) > > > F2: tunnel(),in_port(vxlan_sys_4789),eth(),ipv4(), actions:ens1f0_0 > > > > > > F1 is a classification flow. It has outer headers matches and it > > > classifies the packet as a
Re: [ovs-dev] [PATCH ovn] controller: Fix the wrong 'struct' type for 'pflow_output_data'.
Acked-by: Mark Michelson On 6/22/21 4:10 PM, num...@ovn.org wrote: From: Numan Siddique 'pflow_output_data' should be of type 'struct ed_type_pflow_output' and not 'struct ed_type_lflow_output'. Fixes: e07e397b7ae("ovn-controller: Split logical flow and physical flow processing.") Signed-off-by: Numan Siddique --- controller/ovn-controller.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c index 1cfe4b713..3968ef059 100644 --- a/controller/ovn-controller.c +++ b/controller/ovn-controller.c @@ -2980,7 +2980,7 @@ main(int argc, char *argv[]) struct ed_type_lflow_output *lflow_output_data = engine_get_internal_data(_lflow_output); -struct ed_type_lflow_output *pflow_output_data = +struct ed_type_pflow_output *pflow_output_data = engine_get_internal_data(_pflow_output); struct ed_type_ct_zones *ct_zones_data = engine_get_internal_data(_ct_zones); ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v4] datapath: Add a new action dec_ttl
On 26 May 2021, at 14:34, Eelco Chaudron wrote: On 18 May 2021, at 20:15, Ilya Maximets wrote: On 5/18/21 4:44 PM, Eelco Chaudron wrote: Add support for the dec_ttl action. Instead of programming the datapath with a flow that matches the packet TTL and an IP set, use a single dec_ttl action. The old behavior is kept if the new action is not supported by the datapath. # ovs-ofctl dump-flows br0 cookie=0x0, duration=12.538s, table=0, n_packets=4, n_bytes=392, ip actions=dec_ttl,NORMAL cookie=0x0, duration=12.536s, table=0, n_packets=4, n_bytes=168, actions=NORMAL # ping -c1 -t 20 192.168.0.2 PING 192.168.0.2 (192.168.0.2) 56(84) bytes of data. IP (tos 0x0, ttl 19, id 45336, offset 0, flags [DF], proto ICMP (1), length 84) 192.168.0.1 > 192.168.0.2: ICMP echo request, id 8865, seq 1, length 64 Linux netlink datapath support depends on upstream Linux commit: 744676e77720 ("openvswitch: add TTL decrement action") Note that in the Linux kernel tree the OVS_ACTION_ATTR_ADD_MPLS has been defined, and to make sure the IDs are in sync, it had to be added to the OVS source tree. This required some additional case statements, which should be revisited once the OVS implementation is added. @@ -5186,6 +5187,40 @@ xlate_delete_field(struct xlate_ctx *ctx, ds_destroy(); } +/* New handling for dec_ttl action. */ 'New handling' makes sense in a patch, but doesn't while reading the code. Yes, I will remove this comment altogether as it makes no sense. +/* Tests whether 'dpif' datapath supports decrement of the IP TTL via + * OVS_ACTION_DEC_TTL. */ +static bool +check_dec_ttl_action(struct dpif *dpif) +{ +struct odputil_keybuf keybuf; +struct flow flow = { 0 }; It's probbaly better to just memset it as in other similar functions to avoid compiler's complains. ACK, will use a memset here. /* Tests whether 'backer''s datapath supports the clone action * OVS_ACTION_ATTR_CLONE. */ static bool @@ -1590,6 +1629,7 @@ check_support(struct dpif_backer *backer) dpif_supports_explicit_drop_action(backer->dpif); backer->rt_support.lb_output_action= dpif_supports_lb_output_action(backer->dpif); +backer->rt_support.dec_ttl_action = check_dec_ttl_action(backer->dpif); During discussions about all-zero SNAT feature support I remembered that we have a 'capabilities' table that should reflect all the datapath supported fetures. So, this should be added there as well. And documented in vswitchd/vswitch.xml. ACK, will add. /* Stores the various features which the corresponding backer supports. */ diff --git a/tests/odp.at b/tests/odp.at index b762ebb2b..24946bec4 100644 --- a/tests/odp.at +++ b/tests/odp.at @@ -384,6 +384,7 @@ check_pkt_len(size=200,gt(drop),le(5)) check_pkt_len(size=200,gt(ct(nat)),le(drop)) check_pkt_len(size=200,gt(set(eth(src=00:01:02:03:04:05,dst=10:11:12:13:14:15))),le(set(eth(src=00:01:02:03:04:06,dst=10:11:12:13:14:16 lb_output(1) +dec_ttl(le_1(userspace(pid=3614533484,controller(reason=2,dont_send=0,continuation=0,recirc_id=1,rule_cookie=0,controller_id=0,max_len=65535 Maybe it will make sense to also add a very simple variant of this action, e.g. dec_ttl(le_1(drop)). Added, drop and it resulted in an issue (fixed). Doing some final tests and will sent out a V5 soon! Hi Ilya/Bindiya, I was preparing my v5, and I noticed that a bunch of self-tests fail. I was wondering why I (and Matteo/Bindiya) never noticed. For me, it was because I was running make check on my dev system, which had an old kernel :( The datapath tests I was running on my DUT. I did solve most of the problems, but there are some corner cases that might be hard (and was wondering if you guys even thought about them): - As dec_ttl is none reversible there are some cases that it needs to clone the packet. Take the following example with a patch port (to solve this I sent another preceding patch, "ofproto-dpif: fix issue with non-reversible actions on a patch port"): Rule set: ovs-ofctl -O OpenFlow13 add-flow br0 in_port=local,ip,actions=2,1]) [br0 port 2 <===> port 1 br1] ovs-ofctl -O OpenFlow13 add-flow br1 in_port=1,ip,actions=dec_ttl,push_mpls:0x8847,3]) Becomes: clone(dec_ttl(le_1(userspace(pid=0,controller(reason=2,dont_send=1,continuation=0,recirc_id=1,rule_cookie=0,controller_id=0,max_len=65535,push_mpls(label=0,tc=0,ttl=63,bos=1,eth_type=0x8847),3),1 Where it was: set(ipv4(ttl=63)),push_mpls(label=0,tc=0,ttl=63,bos=1,eth_type=0x8847),3,pop_mpls(eth_type=0x800),set(ipv4(ttl=64)),1' This clone action is NOT hardware offloadable, so wondering if this is needed in your use case? This is fixed in my v5. - tunnel encapsulation copies TTL value from the header however, because dec_ttl is a dynamic action, the value from the upcall packet is copied as the mpls_pop is static Not sure how to solve this as the TTL
[ovs-dev] [PATCH v3] ovsdb: provide raft and command interfaces with priority
From: Anton Ivanov Set a soft time limit of "raft election timer"/2 on ovsdb processing. This improves behaviour in large heavily loaded clusters. While it cannot fully eliminate spurious raft elections under heavy load, it significantly decreases their number. Processing is (to the extent possible) restarted where it stopped on the previous iteration to ensure that sessions towards the tail of the session list are not starved. Signed-off-by: Anton Ivanov --- ovsdb/jsonrpc-server.c | 86 -- ovsdb/jsonrpc-server.h | 2 +- ovsdb/ovsdb-server.c | 24 +++- ovsdb/raft.c | 5 +++ ovsdb/raft.h | 3 ++ ovsdb/storage.c| 11 ++ ovsdb/storage.h| 2 + 7 files changed, 127 insertions(+), 6 deletions(-) diff --git a/ovsdb/jsonrpc-server.c b/ovsdb/jsonrpc-server.c index 4e2dfc3d7..a2f47aae7 100644 --- a/ovsdb/jsonrpc-server.c +++ b/ovsdb/jsonrpc-server.c @@ -60,7 +60,8 @@ static struct ovsdb_jsonrpc_session *ovsdb_jsonrpc_session_create( struct ovsdb_jsonrpc_remote *, struct jsonrpc_session *, bool); static void ovsdb_jsonrpc_session_preremove_db(struct ovsdb_jsonrpc_remote *, struct ovsdb *); -static void ovsdb_jsonrpc_session_run_all(struct ovsdb_jsonrpc_remote *); +static void ovsdb_jsonrpc_session_run_all(struct ovsdb_jsonrpc_remote *, + uint64_t limit); static void ovsdb_jsonrpc_session_wait_all(struct ovsdb_jsonrpc_remote *); static void ovsdb_jsonrpc_session_get_memory_usage_all( const struct ovsdb_jsonrpc_remote *, struct simap *usage); @@ -128,6 +129,8 @@ struct ovsdb_jsonrpc_server { bool read_only;/* This server is does not accept any transactions that can modify the database. */ struct shash remotes; /* Contains "struct ovsdb_jsonrpc_remote *"s. */ +struct ovsdb_jsonrpc_remote *skip_to; +bool must_wake_up; }; /* A configured remote. This is either a passive stream listener plus a list @@ -137,6 +140,7 @@ struct ovsdb_jsonrpc_remote { struct ovsdb_jsonrpc_server *server; struct pstream *listener; /* Listener, if passive. */ struct ovs_list sessions; /* List of "struct ovsdb_jsonrpc_session"s. */ +struct ovsdb_jsonrpc_session *skip_to; uint8_t dscp; bool read_only; char *role; @@ -159,6 +163,8 @@ ovsdb_jsonrpc_server_create(bool read_only) ovsdb_server_init(>up); shash_init(>remotes); server->read_only = read_only; +server->must_wake_up = false; +server->skip_to = NULL; return server; } @@ -279,6 +285,7 @@ ovsdb_jsonrpc_server_add_remote(struct ovsdb_jsonrpc_server *svr, remote->dscp = options->dscp; remote->read_only = options->read_only; remote->role = nullable_xstrdup(options->role); +remote->skip_to = NULL; shash_add(>remotes, name, remote); if (!listener) { @@ -293,6 +300,13 @@ ovsdb_jsonrpc_server_del_remote(struct shash_node *node) { struct ovsdb_jsonrpc_remote *remote = node->data; +/* safest option - rerun all remotes */ +if (remote->server->skip_to) { +remote->server->skip_to = NULL; +} + +remote->skip_to = NULL; + ovsdb_jsonrpc_session_close_all(remote); pstream_close(remote->listener); shash_delete(>server->remotes, node); @@ -378,12 +392,25 @@ ovsdb_jsonrpc_server_set_read_only(struct ovsdb_jsonrpc_server *svr, } void -ovsdb_jsonrpc_server_run(struct ovsdb_jsonrpc_server *svr) +ovsdb_jsonrpc_server_run(struct ovsdb_jsonrpc_server *svr, uint64_t limit) { struct shash_node *node; +uint64_t elapsed = 0, start_time = 0; + +start_time = time_msec(); + +svr->must_wake_up = false; SHASH_FOR_EACH (node, >remotes) { struct ovsdb_jsonrpc_remote *remote = node->data; +if (svr->skip_to) { +if (remote != svr->skip_to) { +continue; +} else { +svr->skip_to = NULL; +svr->must_wake_up = true; +} +} if (remote->listener) { struct stream *stream; @@ -403,7 +430,14 @@ ovsdb_jsonrpc_server_run(struct ovsdb_jsonrpc_server *svr) } } -ovsdb_jsonrpc_session_run_all(remote); +ovsdb_jsonrpc_session_run_all(remote, limit - elapsed); + +elapsed = time_msec() - start_time; +if (elapsed > limit) { +svr->must_wake_up = true; +svr->skip_to = remote; +break; +} } } @@ -412,6 +446,11 @@ ovsdb_jsonrpc_server_wait(struct ovsdb_jsonrpc_server *svr) { struct shash_node *node; +if (svr->must_wake_up) { +poll_immediate_wake(); +svr->must_wake_up = false; +} + SHASH_FOR_EACH (node, >remotes) { struct ovsdb_jsonrpc_remote *remote = node->data; @@ -583,15 +622,54 @@ ovsdb_jsonrpc_session_set_options(struct
Re: [ovs-dev] [PATCH] bridge: fix type mismatch
Friendly ping > -Original Message- > From: wangyunjian > Sent: Tuesday, April 27, 2021 2:42 PM > To: d...@openvswitch.org; i.maxim...@ovn.org > Cc: dingxiaoxiong ; wangyunjian > > Subject: [ovs-dev] [PATCH] bridge: fix type mismatch > > From: Yunjian Wang > > Currently the function ofproto_set_flow_limit() was not checking > 'limit' value. It maybe negative, which will be lead to a big > unsigned value. The 'limit' should never be negative so it's better > to just use smap_get_uint() to get it right. > > And fix ofproto_set_max_idle(), ofproto_set_min_revalidate_pps(), > ofproto_set_max_revalidator() and ofproto_set_bundle_idle_timeout() > together. > > Signed-off-by: Yunjian Wang > --- > vswitchd/bridge.c | 10 +- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c > index 5ed7e8234..985e05099 100644 > --- a/vswitchd/bridge.c > +++ b/vswitchd/bridge.c > @@ -822,19 +822,19 @@ bridge_reconfigure(const struct > ovsrec_open_vswitch *ovs_cfg) > > COVERAGE_INC(bridge_reconfigure); > > -ofproto_set_flow_limit(smap_get_int(_cfg->other_config, > "flow-limit", > +ofproto_set_flow_limit(smap_get_uint(_cfg->other_config, > "flow-limit", > > OFPROTO_FLOW_LIMIT_DEFAULT)); > -ofproto_set_max_idle(smap_get_int(_cfg->other_config, > "max-idle", > +ofproto_set_max_idle(smap_get_uint(_cfg->other_config, > "max-idle", > > OFPROTO_MAX_IDLE_DEFAULT)); > -ofproto_set_max_revalidator(smap_get_int(_cfg->other_config, > +ofproto_set_max_revalidator(smap_get_uint(_cfg->other_config, > "max-revalidator", > > OFPROTO_MAX_REVALIDATOR_DEFAULT)); > ofproto_set_min_revalidate_pps( > -smap_get_int(_cfg->other_config, "min-revalidate-pps", > +smap_get_uint(_cfg->other_config, "min-revalidate-pps", > OFPROTO_MIN_REVALIDATE_PPS_DEFAULT)); > ofproto_set_vlan_limit(smap_get_int(_cfg->other_config, > "vlan-limit", > > LEGACY_MAX_VLAN_HEADERS)); > - > ofproto_set_bundle_idle_timeout(smap_get_int(_cfg->other_config, > + > ofproto_set_bundle_idle_timeout(smap_get_uint(_cfg->other_config, > > "bundle-idle-timeout", 0)); > ofproto_set_threads( > smap_get_int(_cfg->other_config, "n-handler-threads", 0), > -- > 2.18.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v5 ovn 3/4] ovn-northd: Add CoPP policies for flows that punt packets to ovn-controller.
From: Dumitru Ceara Change the ovn-northd implementation to set the new 'controller_meter' field for flows that need to punt packets to ovn-controller. Protocol packets for which CoPP is enforced when sending packets to ovn-controller (if configured): - ARP - ND_NS - ND_NA - ND_RA - DNS - IGMP - packets that require ARP resolution before forwarding - packets that require ND_NS before forwarding - packets that need to be replied to with ICMP Errors - packets that need to be replied to with TCP RST - packets that need to be replied to with DHCP_OPTS - packets that trigger a SCTP abort action - contoller_events - BFD Co-authored-by: Lorenzo Bianconi Signed-off-by: Lorenzo Bianconi Signed-off-by: Dumitru Ceara --- include/ovn/actions.h | 1 - lib/actions.c | 50 +--- lib/copp.c| 1 + lib/copp.h| 1 + northd/ovn-northd.c | 497 -- ovn-nb.xml| 3 + tests/atlocal.in | 3 + tests/ovn.at | 9 +- tests/system-ovn.at | 138 +++ utilities/ovn-nbctl.8.xml | 3 + 10 files changed, 477 insertions(+), 229 deletions(-) diff --git a/include/ovn/actions.h b/include/ovn/actions.h index a33d02681..f023a37b9 100644 --- a/include/ovn/actions.h +++ b/include/ovn/actions.h @@ -394,7 +394,6 @@ struct ovnact_controller_event { int event_type; /* controller event type */ struct ovnact_gen_option *options; size_t n_options; -char *meter; }; /* OVNACT_BIND_VPORT. */ diff --git a/lib/actions.c b/lib/actions.c index 2355a9ace..c572e88ae 100644 --- a/lib/actions.c +++ b/lib/actions.c @@ -1644,9 +1644,6 @@ format_TRIGGER_EVENT(const struct ovnact_controller_event *event, { ds_put_format(s, "trigger_event(event = \"%s\"", event_to_string(event->event_type)); -if (event->meter) { -ds_put_format(s, ", meter = \"%s\"", event->meter); -} for (const struct ovnact_gen_option *o = event->options; o < >options[event->n_options]; o++) { ds_put_cstr(s, ", "); @@ -1821,24 +1818,11 @@ encode_event_empty_lb_backends_opts(struct ofpbuf *ofpacts, static void encode_TRIGGER_EVENT(const struct ovnact_controller_event *event, - const struct ovnact_encode_params *ep OVS_UNUSED, + const struct ovnact_encode_params *ep, struct ofpbuf *ofpacts) { -uint32_t meter_id = NX_CTLR_NO_METER; -size_t oc_offset; - -if (event->meter) { -meter_id = ovn_extend_table_assign_id(ep->meter_table, event->meter, - ep->lflow_uuid); -if (meter_id == EXT_TABLE_ID_INVALID) { -VLOG_WARN("Unable to assign id for trigger meter: %s", - event->meter); -return; -} -} - -oc_offset = encode_start_controller_op(ACTION_OPCODE_EVENT, false, - meter_id, ofpacts); +size_t oc_offset = encode_start_controller_op(ACTION_OPCODE_EVENT, false, + ep->ctrl_meter_id, ofpacts); ovs_be32 ofs = htonl(event->event_type); ofpbuf_put(ofpacts, , sizeof ofs); @@ -2372,27 +2356,12 @@ parse_trigger_event(struct action_context *ctx, sizeof *event->options); } -if (lexer_match_id(ctx->lexer, "meter")) { -if (!lexer_force_match(ctx->lexer, LEX_T_EQUALS)) { -return; -} -/* If multiple meters are given, use the most recent. */ -if (ctx->lexer->token.type == LEX_T_STRING && -strlen(ctx->lexer->token.s)) { -free(event->meter); -event->meter = xstrdup(ctx->lexer->token.s); -} else if (ctx->lexer->token.type != LEX_T_STRING) { -lexer_syntax_error(ctx->lexer, "expecting string"); -return; -} -lexer_get(ctx->lexer); -} else { -struct ovnact_gen_option *o = >options[event->n_options++]; -memset(o, 0, sizeof *o); -parse_gen_opt(ctx, o, ->pp->controller_event_opts->event_opts[event_type], -event_to_string(event_type)); -} +struct ovnact_gen_option *o = >options[event->n_options++]; +memset(o, 0, sizeof *o); +parse_gen_opt(ctx, o, + >pp->controller_event_opts->event_opts[event_type], + event_to_string(event_type)); + if (ctx->lexer->error) { return; } @@ -2413,7 +2382,6 @@ static void ovnact_controller_event_free(struct ovnact_controller_event *event) { free_gen_options(event->options, event->n_options); -free(event->meter); } static void diff --git a/lib/copp.c b/lib/copp.c index e3d14938a..bbe66924b 100644 --- a/lib/copp.c +++
[ovs-dev] [PATCH v5 ovn 2/4] ovn-northd: Add support for CoPP.
From: Dumitru Ceara Add new 'Copp' (Control plane protection) table to OVN Northbound DB: - this stores mappings between control plane protocol names and meters that should be used to rate limit controller-destined traffic for those protocols. Add new 'copp' columns to the following OVN Northbound DB tables: - Logical_Switch - Logical_Router For now, no control plane protection policy is installed for any of the existing flows that punt packets to ovn-controller. This will be added in follow-up patches. Add CLI commands in 'ovn-nbctl' to allow the user to manage Control Plane Protection Policies at different levels (logical switch, logical router). Acked-by: Mark D. Gray Co-authored-by: Lorenzo Bianconi Signed-off-by: Lorenzo Bianconi Signed-off-by: Dumitru Ceara --- lib/automake.mk | 2 + lib/copp.c| 142 ++ lib/copp.h| 58 + northd/ovn-northd.c | 58 + ovn-nb.ovsschema | 16 +++- ovn-nb.xml| 78 + tests/ovn-controller.at | 52 +++ tests/ovn-northd.at | 96 utilities/ovn-nbctl.8.xml | 72 +++ utilities/ovn-nbctl.c | 178 ++ 10 files changed, 734 insertions(+), 18 deletions(-) create mode 100644 lib/copp.c create mode 100644 lib/copp.h diff --git a/lib/automake.mk b/lib/automake.mk index 917b28e1e..ac0fde8a3 100644 --- a/lib/automake.mk +++ b/lib/automake.mk @@ -9,6 +9,8 @@ lib_libovn_la_SOURCES = \ lib/actions.c \ lib/chassis-index.c \ lib/chassis-index.h \ + lib/copp.c \ + lib/copp.h \ lib/ovn-dirs.h \ lib/expr.c \ lib/extend-table.h \ diff --git a/lib/copp.c b/lib/copp.c new file mode 100644 index 0..e3d14938a --- /dev/null +++ b/lib/copp.c @@ -0,0 +1,142 @@ +/* Copyright (c) 2021, Red Hat, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include +#include + +#include "openvswitch/shash.h" +#include "db-ctl-base.h" +#include "smap.h" +#include "lib/ovn-nb-idl.h" +#include "lib/copp.h" + +static char *copp_proto_names[COPP_PROTO_MAX] = { +[COPP_ARP] = "arp", +[COPP_ARP_RESOLVE] = "arp-resolve", +[COPP_DHCPV4_OPTS] = "dhcpv4-opts", +[COPP_DHCPV6_OPTS] = "dhcpv6-opts", +[COPP_DNS] = "dns", +[COPP_EVENT_ELB] = "event-elb", +[COPP_ICMP4_ERR] = "icmp4-error", +[COPP_ICMP6_ERR] = "icmp6-error", +[COPP_IGMP] = "igmp", +[COPP_ND_NA] = "nd-na", +[COPP_ND_NS] = "nd-ns", +[COPP_ND_NS_RESOLVE] = "nd-ns-resolve", +[COPP_ND_RA_OPTS]= "nd-ra-opts", +[COPP_TCP_RESET] = "tcp-reset", +[COPP_BFD] = "bfd", +}; + +static const char * +copp_proto_get_name(enum copp_proto proto) +{ +if (proto >= COPP_PROTO_MAX) { +return ""; +} +return copp_proto_names[proto]; +} + +const char * +copp_meter_get(enum copp_proto proto, const struct nbrec_copp *copp, + const struct shash *meter_groups) +{ +if (!copp || proto >= COPP_PROTO_MAX) { +return NULL; +} + +const char *meter = smap_get(>meters, copp_proto_names[proto]); + +if (meter && shash_find(meter_groups, meter)) { +return meter; +} + +return NULL; +} + +void +copp_meter_list(struct ctl_context *ctx, const struct nbrec_copp *copp) +{ +if (!copp) { +return; +} + +struct smap_node *node; + +SMAP_FOR_EACH (node, >meters) { +ds_put_format(>output, "%s: %s\n", node->key, node->value); +} +} + +const struct nbrec_copp * +copp_meter_add(struct ctl_context *ctx, const struct nbrec_copp *copp, + const char *proto_name, const char *meter) +{ +if (!copp) { +copp = nbrec_copp_insert(ctx->txn); +} + +struct smap meters; +smap_init(); +smap_clone(, >meters); +smap_replace(, proto_name, meter); +nbrec_copp_set_meters(copp, ); +smap_destroy(); + +return copp; +} + +void +copp_meter_del(const struct nbrec_copp *copp, const char *proto_name) +{ +if (!copp) { +return; +} + +if (proto_name) { +if (smap_get(>meters, proto_name)) { +struct smap meters; +smap_init(); +smap_clone(, >meters); +smap_remove(, proto_name); +nbrec_copp_set_meters(copp, ); +
[ovs-dev] [PATCH v5 ovn 4/4] NEWS: Add CoPP support.
From: Dumitru Ceara Acked-by: Mark D. Gray Co-authored-by: Lorenzo Bianconi Signed-off-by: Lorenzo Bianconi Signed-off-by: Dumitru Ceara --- NEWS | 1 + 1 file changed, 1 insertion(+) diff --git a/NEWS b/NEWS index 0da7d8f97..72585e56b 100644 --- a/NEWS +++ b/NEWS @@ -25,6 +25,7 @@ OVN v21.06.0 - 11 May 2021 * ovn-sbctl now also supports daemon mode. - Added support in native DNS to respond to PTR request types. - New --dry-run option for ovn-northd and ovn-northd-ddlog. + - Added Control Plane Protection support (control plane traffic metering). OVN v21.03.0 - 12 Mar 2021 - -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v5 ovn 0/4] respin CoPP series
This series respin CoPP support introduced here [0] by Dumitru rebasing on top of ovn master branch and adding some missing meters (e.g. bfd or acl reject). The main goal of this series is to continue the discussion about the proposed approach and to align on CMS APIs. For the moment DDLog is not supported yet and it will be added in a subsequent series. Related bz: https://bugzilla.redhat.com/show_bug.cgi?id=1947913 https://bugzilla.redhat.com/show_bug.cgi?id=1946610 Changes since v4: - cosmetics - rebased on top of ovn master Changes since v3: - fix DDlog compilation errors - rebased on top of ovn master Changes since v2: - add sbctl checks in tests/ovn-northd.at unit tests - remove letfovers in utilities/ovn-nbctl.8.xml Changes since v1: - merge patch 3/5 and 4/5 - cosmetics - improve naming conventions - add more unit-tests/system-tests - remove duplicated flow - remove some leftover entries in ovn-nbctl.8.xml - add metering for sctp abort packets Changes since RFC: - drop per-port metering - add unit/system tests - add reject action metering Dumitru Ceara (4): ovn-controller: Add support for Logical_Flow control meters ovn-northd: Add support for CoPP. ovn-northd: Add CoPP policies for flows that punt packets to ovn-controller. NEWS: Add CoPP support. NEWS | 1 + controller/lflow.c| 40 ++- controller/ofctrl.c | 56 ++-- controller/ofctrl.h | 21 +- controller/physical.c | 9 +- include/ovn/actions.h | 3 +- lib/actions.c | 116 +++- lib/automake.mk | 2 + lib/copp.c| 143 ++ lib/copp.h| 59 northd/ovn-northd.c | 549 -- northd/ovn_northd.dl | 2 + ovn-nb.ovsschema | 16 +- ovn-nb.xml| 81 ++ ovn-sb.ovsschema | 6 +- ovn-sb.xml| 6 + tests/atlocal.in | 3 + tests/ovn-controller.at | 52 tests/ovn-northd.at | 96 +++ tests/ovn.at | 9 +- tests/system-ovn.at | 138 ++ utilities/ovn-nbctl.8.xml | 75 ++ utilities/ovn-nbctl.c | 178 23 files changed, 1353 insertions(+), 308 deletions(-) create mode 100644 lib/copp.c create mode 100644 lib/copp.h -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v5 ovn 1/4] ovn-controller: Add support for Logical_Flow control meters
From: Dumitru Ceara Add a new 'controller_meter' column to OVN Southbound Logical_Flow table. This stores an optional string which should correspond to the Meter that must be used for rate limiting controller actions generated by packets hitting the flow. Add a new 'ofctrl_add_flow_metered' function to create a new 'ovn_flow' with an attached controller meter. Change ofctrl_check_and_add_flow to allow specifying a meter ID for packets that are punted to controller. Change consider_logical_flow to parse controller_meter from the logical flow and use it when building openflow entries. Add a new 'ctrl_meter_id' field to 'struct ovnact_encode_params' to be used when encoding controller actions from logical flow actions. Acked-by: Mark D. Gray Co-authored-by: Lorenzo Bianconi Signed-off-by: Lorenzo Bianconi Signed-off-by: Dumitru Ceara --- controller/lflow.c| 40 +++--- controller/ofctrl.c | 56 +--- controller/ofctrl.h | 21 ++ controller/physical.c | 9 +++--- include/ovn/actions.h | 2 ++ lib/actions.c | 66 +++ northd/ovn_northd.dl | 2 ++ ovn-sb.ovsschema | 6 ++-- ovn-sb.xml| 6 9 files changed, 144 insertions(+), 64 deletions(-) diff --git a/controller/lflow.c b/controller/lflow.c index 34eca135a..cbb4a52d1 100644 --- a/controller/lflow.c +++ b/controller/lflow.c @@ -569,6 +569,27 @@ update_conj_id_ofs(uint32_t *conj_id_ofs, uint32_t n_conjs) return false; } +static void +lflow_parse_ctrl_meter(const struct sbrec_logical_flow *lflow, + struct ovn_extend_table *meter_table, + uint32_t *meter_id) +{ +ovs_assert(meter_id); +*meter_id = NX_CTLR_NO_METER; + +if (lflow->controller_meter) { +*meter_id = ovn_extend_table_assign_id(meter_table, + lflow->controller_meter, + lflow->header_.uuid); +if (*meter_id == EXT_TABLE_ID_INVALID) { +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1); +VLOG_WARN_RL(, "Unable to assign id for meter: %s", + lflow->controller_meter); +return; +} +} +} + static void add_matches_to_flow_table(const struct sbrec_logical_flow *lflow, const struct sbrec_datapath_binding *dp, @@ -586,6 +607,13 @@ add_matches_to_flow_table(const struct sbrec_logical_flow *lflow, .lfrr = l_ctx_out->lfrr, }; +/* Parse any meter to be used if this flow should punt packets to + * controller. + */ +uint32_t ctrl_meter_id = NX_CTLR_NO_METER; +lflow_parse_ctrl_meter(lflow, l_ctx_out->meter_table, + _meter_id); + /* Encode OVN logical actions into OpenFlow. */ uint64_t ofpacts_stub[1024 / 8]; struct ofpbuf ofpacts = OFPBUF_STUB_INITIALIZER(ofpacts_stub); @@ -609,6 +637,7 @@ add_matches_to_flow_table(const struct sbrec_logical_flow *lflow, .ct_snat_vip_ptable = OFTABLE_CT_SNAT_FOR_VIP, .fdb_ptable = OFTABLE_GET_FDB, .fdb_lookup_ptable = OFTABLE_LOOKUP_FDB, +.ctrl_meter_id = ctrl_meter_id, }; ovnacts_encode(ovnacts->data, ovnacts->size, , ); @@ -635,9 +664,11 @@ add_matches_to_flow_table(const struct sbrec_logical_flow *lflow, } } if (!m->n) { -ofctrl_add_flow(l_ctx_out->flow_table, ptable, lflow->priority, -lflow->header_.uuid.parts[0], >match, , ->header_.uuid); +ofctrl_add_flow_metered(l_ctx_out->flow_table, ptable, +lflow->priority, +lflow->header_.uuid.parts[0], >match, +, >header_.uuid, +ctrl_meter_id); } else { uint64_t conj_stubs[64 / 8]; struct ofpbuf conj; @@ -655,7 +686,8 @@ add_matches_to_flow_table(const struct sbrec_logical_flow *lflow, ofctrl_add_or_append_flow(l_ctx_out->flow_table, ptable, lflow->priority, 0, - >match, , >header_.uuid); + >match, , >header_.uuid, + ctrl_meter_id); ofpbuf_uninit(); } } diff --git a/controller/ofctrl.c b/controller/ofctrl.c index 053631590..eebb27567 100644 --- a/controller/ofctrl.c +++ b/controller/ofctrl.c @@ -66,6 +66,7 @@ struct ovn_flow { struct ofpact *ofpacts; size_t ofpacts_len; uint64_t cookie; +uint32_t ctrl_meter_id; /* Meter to be used for controller actions. */ }; /* A desired flow, in struct ovn_desired_flow_table, calculated by the @@ -220,7 +221,8 @@ static struct desired_flow
Re: [ovs-dev] [PATCH] ovsdb-server: Add limitation for ovsdb-server remotes
> -Original Message- > From: Ilya Maximets [mailto:i.maxim...@ovn.org] > Sent: Wednesday, May 12, 2021 9:03 PM > To: wangyunjian ; Ilya Maximets > ; d...@openvswitch.org > Cc: Lilijun (Jerry) ; xudingke > ; chenchanghu > Subject: Re: [ovs-dev] [PATCH] ovsdb-server: Add limitation for ovsdb-server > remotes > > On 1/11/21 2:27 PM, wangyunjian wrote: > >> -Original Message- > >> From: Ilya Maximets [mailto:i.maxim...@ovn.org] > >> Sent: Monday, January 11, 2021 8:40 PM > >> To: wangyunjian ; d...@openvswitch.org > >> Cc: i.maxim...@ovn.org; Lilijun (Jerry) ; > >> xudingke ; chenchanghu > > >> Subject: Re: [ovs-dev] [PATCH] ovsdb-server: Add limitation for > >> ovsdb-server remotes > >> > >> On 12/22/20 12:31 PM, wangyunjian wrote: > >>> From: Yunjian Wang > >>> > >>> Currently there is no limit to add ovsdb-server remotes, which will > >>> cause all FDs maybe be consumed when we always call > >>> ovsdb_server_add_remote() function. And as a result, other > >>> connections cannot be created. To fix this issue, we can add > >>> limitation for ovsdb-server remotes. It's limited to 64, witch is > >>> just an empirical value. > >> > >> Hi. Why do you need so many remotes? > >> And why not removing ones that you don't need? > > > > This issue is caused by pressure tests. The remotes are continuously > > created. The ovsdb-server service is abnormal and cannot be recovered. > > Thinking more about this, it makes some sense to limit number of remotes, but > this patch only limits it in one place. Commonly, some database table is used > as a source of remotes for an ovsdb process and this patch doesn't limit > number of connections that could be added to these tables. OK, I can fix them together, but I don't know the exact command for the some database tables. Can you explain them in detail? Thanks > > Best regards, Ilya Maximets. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH 1/5] dpif-netdev: Rework rxq scheduling code.
On Fri, Jun 4, 2021 at 11:19 PM Kevin Traynor wrote: > > This reworks the current rxq scheduling code to break it into more > generic and reusable pieces. > > The behaviour does not change from a user perspective, except the logs > are updated to be more consistent. > > From an implementation view, there are some changes with mind to adding > functionality and reuse in later patches. > > The high level reusable functions added in this patch are: > - Generate a list of current numas and pmds > - Perform rxq scheduling into the list > - Effect the rxq scheduling assignments so they are used > > The rxq scheduling is updated to handle both pinned and non-pinned rxqs > in the same call. As a global comment, I prefer consistency for function names. I put suggestions inline, you can take these as simple nits. This patch prepares arrival of new scheduling algorithm, but uses a single boolean flag. I would avoid this temp boolean and introduce the enum (used later in this series). > > Signed-off-by: Kevin Traynor > --- > lib/dpif-netdev.c | 538 ++ > tests/pmd.at | 2 +- > 2 files changed, 446 insertions(+), 94 deletions(-) > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c > index 650e67ab3..57d23e112 100644 > --- a/lib/dpif-netdev.c > +++ b/lib/dpif-netdev.c > @@ -5006,4 +5006,211 @@ rr_numa_list_destroy(struct rr_numa_list *rr) > } > > +struct sched_numa_list { > +struct hmap numas; /* Contains 'struct sched_numa'. */ > +}; > + > +/* Meta data for out-of-place pmd rxq assignments. */ > +struct sched_pmd { > +/* Associated PMD thread. */ > +struct dp_netdev_pmd_thread *pmd; > +uint64_t pmd_proc_cycles; > +struct dp_netdev_rxq **rxqs; > +unsigned n_rxq; > +bool isolated; > +}; sched_pmd objects are associated in a unique fashion to a sched_numa object. Having a back pointer to sched_numa in the sched_pmd object removes the need for sched_numa_list_find_numa(). > + > +struct sched_numa { > +struct hmap_node node; > +int numa_id; > +/* PMDs on numa node. */ > +struct sched_pmd *pmds; > +/* Num of PMDs on numa node. */ > +unsigned n_pmds; > +/* Num of isolated PMDs on numa node. */ > +unsigned n_iso; > +int rr_cur_index; > +bool rr_idx_inc; > +}; > + > +static size_t > +sched_numa_list_count(struct sched_numa_list *numa_list) > +{ > +return hmap_count(_list->numas); > +} > + > +static struct sched_numa * > +sched_numa_list_next(struct sched_numa_list *numa_list, > + const struct sched_numa *numa) > +{ > +struct hmap_node *node = NULL; > + > +if (numa) { > +node = hmap_next(_list->numas, >node); > +} > +if (!node) { > +node = hmap_first(_list->numas); > +} > + > +return (node) ? CONTAINER_OF(node, struct sched_numa, node) : NULL; > +} > + > +static struct sched_numa * > +sched_numa_list_lookup(struct sched_numa_list * numa_list, int numa_id) Nit: no space *numa_list > +{ > +struct sched_numa *numa; > + > +HMAP_FOR_EACH_WITH_HASH (numa, node, hash_int(numa_id, 0), > + _list->numas) { > +if (numa->numa_id == numa_id) { > +return numa; > +} > +} > +return NULL; > +} > + > +/* Populate numas and pmds on those numas */ Nit: missing a '.'. > +static void > +sched_numa_list_populate(struct sched_numa_list *numa_list, > + struct dp_netdev *dp) > +{ > +struct dp_netdev_pmd_thread *pmd; Nit: missing a blank line after var definition. > +hmap_init(_list->numas); > + > +/* For each pmd on this datapath. */ > +CMAP_FOR_EACH (pmd, node, >poll_threads) { > +struct sched_numa *numa; > +struct sched_pmd *sched_pmd; > +if (pmd->core_id == NON_PMD_CORE_ID) { > +continue; > +} > + > +/* Get the numa of the PMD. */ > +numa = sched_numa_list_lookup(numa_list, pmd->numa_id); > +/* Create a new numa node for it if not already created */ Nit: missing a '.'. > +if (!numa) { > +numa = xzalloc(sizeof *numa); > +numa->numa_id = pmd->numa_id; > +hmap_insert(_list->numas, >node, > +hash_int(pmd->numa_id, 0)); > +} > + > +/* Create a sched_pmd on this numa for the pmd. */ > +numa->n_pmds++; > +numa->pmds = xrealloc(numa->pmds, numa->n_pmds * sizeof *numa->pmds); > +sched_pmd = >pmds[numa->n_pmds - 1]; > +memset(sched_pmd ,0, sizeof *sched_pmd); Nit: should be sched_pmd, 0, > +sched_pmd->pmd = pmd; > +/* At least one pmd is present so initialize curr_idx and idx_inc. */ > +numa->rr_cur_index = 0; > +numa->rr_idx_inc = true; > +} > +} > + > +static void > +sched_numa_list_free_entries(struct sched_numa_list *numa_list) > +{ > +struct sched_numa *numa; > + > +HMAP_FOR_EACH_POP (numa, node, _list->numas) {
Re: [ovs-dev] [PATCH ovn v3 3/3] ovn-controller: Fix incremental processing for logical port references.
On Mon, Jun 21, 2021 at 2:52 AM Han Zhou wrote: > > If a lflow has an lport name in the match, but when the lflow is > processed the port-binding is not seen by ovn-controller, the > corresponding openflow will not be created. Later if the port-binding is > created/monitored by ovn-controller, the lflow is not reprocessed > because the lflow didn't change and ovn-controller doesn't know that the > port-binding affects the lflow. This patch fixes the problem by tracking > the references when parsing the lflow, even if the port-binding is not > found when the lflow is firstly parsed. A test case is also added to > cover the scenario. > > Signed-off-by: Han Zhou Hi Han, Thanks for fixing these issues. I've a few questions. I haven't reviewed the patch completely. > --- > controller/lflow.c | 63 ++--- > controller/lflow.h | 3 ++ > controller/ovn-controller.c | 35 - > include/ovn/expr.h | 2 +- > lib/expr.c | 14 +++-- > tests/ovn.at| 47 +++ > tests/test-ovn.c| 4 +-- > utilities/ovn-trace.c | 2 +- > 8 files changed, 132 insertions(+), 38 deletions(-) > > diff --git a/controller/lflow.c b/controller/lflow.c > index 34eca135a..b7699a309 100644 > --- a/controller/lflow.c > +++ b/controller/lflow.c > @@ -61,6 +61,7 @@ struct lookup_port_aux { > > struct condition_aux { > struct ovsdb_idl_index *sbrec_port_binding_by_name; > +const struct sbrec_datapath_binding *dp; > const struct sbrec_chassis *chassis; > const struct sset *active_tunnels; > const struct sbrec_logical_flow *lflow; > @@ -98,6 +99,12 @@ lookup_port_cb(const void *aux_, const char *port_name, > unsigned int *portp) > > const struct lookup_port_aux *aux = aux_; > > +/* Store the name that used to lookup the lport to lflow reference, so > that > + * in the future when the lport's port binding changes, the logical flow > + * that references this lport can be reprocessed. */ > +lflow_resource_add(aux->lfrr, REF_TYPE_PORTBINDING, port_name, > + >lflow->header_.uuid); > + > const struct sbrec_port_binding *pb > = lport_lookup_by_name(aux->sbrec_port_binding_by_name, port_name); > if (pb && pb->datapath == aux->dp) { > @@ -149,19 +156,18 @@ is_chassis_resident_cb(const void *c_aux_, const char > *port_name) > { > const struct condition_aux *c_aux = c_aux_; > > +/* Store the port name that used to lookup the lport to lflow reference, > so > + * that in the future when the lport's port-binding changes the logical > + * flow that references this lport can be reprocessed. */ > +lflow_resource_add(c_aux->lfrr, REF_TYPE_PORTBINDING, port_name, > + _aux->lflow->header_.uuid); > + > const struct sbrec_port_binding *pb > = lport_lookup_by_name(c_aux->sbrec_port_binding_by_name, port_name); > if (!pb) { > return false; > } > > -/* Store the port_name to lflow reference. */ > -int64_t dp_id = pb->datapath->tunnel_key; > -char buf[16]; > -get_unique_lport_key(dp_id, pb->tunnel_key, buf, sizeof(buf)); > -lflow_resource_add(c_aux->lfrr, REF_TYPE_PORTBINDING, buf, > - _aux->lflow->header_.uuid); > - > if (strcmp(pb->type, "chassisredirect")) { > /* for non-chassisredirect ports */ > return pb->chassis && pb->chassis == c_aux->chassis; > @@ -623,8 +629,6 @@ add_matches_to_flow_table(const struct sbrec_logical_flow > *lflow, > int64_t dp_id = dp->tunnel_key; > char buf[16]; > get_unique_lport_key(dp_id, port_id, buf, sizeof(buf)); > -lflow_resource_add(l_ctx_out->lfrr, REF_TYPE_PORTBINDING, > buf, > - >header_.uuid); > if (!sset_contains(l_ctx_in->local_lport_ids, buf)) { > VLOG_DBG("lflow "UUID_FMT > " port %s in match is not local, skip", > @@ -788,6 +792,7 @@ consider_logical_flow__(const struct sbrec_logical_flow > *lflow, > }; > struct condition_aux cond_aux = { > .sbrec_port_binding_by_name = l_ctx_in->sbrec_port_binding_by_name, > +.dp = dp, > .chassis = l_ctx_in->chassis, > .active_tunnels = l_ctx_in->active_tunnels, > .lflow = lflow, > @@ -805,7 +810,6 @@ consider_logical_flow__(const struct sbrec_logical_flow > *lflow, > struct hmap *matches = NULL; > size_t matches_size = 0; > > -bool is_cr_cond_present = false; > bool pg_addr_set_ref = false; > uint32_t n_conjs = 0; > > @@ -843,8 +847,8 @@ consider_logical_flow__(const struct sbrec_logical_flow > *lflow, > case LCACHE_T_NONE: > case LCACHE_T_CONJ_ID: > case LCACHE_T_EXPR: > -expr = expr_evaluate_condition(expr, is_chassis_resident_cb, >
Re: [ovs-dev] [PATCH v3] dpif-netdev: Expand the meter capacity.
On 5/12/21 11:17 AM, xiangxia.m@gmail.com wrote: > From: Tonghao Zhang > > For now, ovs-vswitchd use the array of the dp_meter struct > to store meter's data, and at most, there are only 65536 > (defined by MAX_METERS) meters that can be used. But in some > case, for example, in the edge gateway, we should use 200,000+, > at least, meters for IP address bandwidth limitation. > Every one IP address will use two meters for its rx and tx > path[1]. In other way, ovs-vswitchd should support meter-offload > (rte_mtr_xxx api introduced by dpdk.), but there are more than > 65536 meters in the hardware, such as Mellanox ConnectX-6. > > This patch use cmap to manage the meter, instead of the array. > > * Insertion performance, ovs-ofctl add-meter 1000+ meters, > the cmap takes abount 4000ms, as same as previous implementation. > * Lookup performance in datapath, we add 1000+ meters which rate limit > are 10Gbps (the NIC cards are 10Gbps, so netdev-datapath will not > drop the packets.), and a flow which only forward packets from p0 > to p1, with meter action[2]. On other machine, pktgen-dpdk will > generate 64B packets to p0. > > The forwarding performance always is 1324 Kpps on my server > which CPU is Intel E5-2650, 2.00GHz. > > [1]. > $ in_port=p0,ip,ip_dst=1.1.1.x action=meter:n,output:p1 > $ in_port=p1,ip,ip_src=1.1.1.x action=meter:m,output:p0 > > [2]. > $ in_port=p0 action=meter:100,output:p1 > > Signed-off-by: Tonghao Zhang > --- > v3: > * update the commit message > * remove dp_netdev_meter struct > * remove create_dp_netdev function > * don't use the hash_basis > * use the meter_id as a hash instead of hash_xxx function. see *dp_meter_hash > for details > * fix coding style > * v2: > http://patchwork.ozlabs.org/project/openvswitch/patch/1584254601-7321-1-git-send-email-xiangxia.m@gmail.com/ > --- > lib/dpif-netdev.c | 158 -- > 1 file changed, 97 insertions(+), 61 deletions(-) Hi. Thanks for v3! This version looks mostly OK to me with only one question: In current code meter locks are adaptive mutexes, but this patch makes them usual. Is there particular reason to do that? Have you tested performance in case where several threads uses the same meter? If not, I'd prefer to keep it adaptive, as it's the current behavior and adaptive mutexes sometimes provides better performance since they act like spinlocks for a short period of time (in some cases they're worse than simple mutexes, but extensive performance testing is needed for each particular case to confirm). I can change the type of meter mutexes before applying the patch. Let me know, what do you think. Best regards, Ilya Maximets. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH 2/5] dpif-netdev: Make PMD auto load balance use common rxq scheduling.
Hey Kevin , Patch looks good to me. Builds fine , all test cases here http://patchwork.ozlabs.org/project/openvswitch/patch/20210316154532.127858-1-ktray...@redhat.com/ pass as well. Some minor nits inline : > +static bool > +pmd_rebalance_dry_run(struct dp_netdev *dp) > +OVS_REQUIRES(dp->port_mutex) > +{ > +struct sched_numa_list *numa_list_cur; > +struct sched_numa_list *numa_list_est; > +bool thresh_met = false; > +uint64_t current, estimate; current and estimate aren't specific, may be current_var and estimate_var ? > +uint64_t improvement = 0; > + > +VLOG_DBG("PMD auto load balance performing dry run."); > + > +/* Populate current assignments. */ > +numa_list_cur = xzalloc(sizeof *numa_list_cur); > +sched_numa_list_populate(numa_list_cur, dp); > +sched_numa_list_assignments(numa_list_cur, dp); > + > +/* Populate estimated assignments. */ > +numa_list_est = xzalloc(sizeof *numa_list_est); > +sched_numa_list_populate(numa_list_est, dp); > +sched_numa_list_schedule(numa_list_est, dp, > + dp->pmd_rxq_assign_cyc, VLL_DBG); > + > +/* Check if cross-numa polling, there is only one numa with PMDs. */ > +if (!sched_numa_list_cross_numa_polling(numa_list_est) || > +sched_numa_list_count(numa_list_est) == 1) { > + > +/* Calculate variances. */ > +current = sched_numa_list_variance(numa_list_cur); > +estimate = sched_numa_list_variance(numa_list_est); > + > +if (estimate < current) { > + improvement = ((current - estimate) * 100) / current; > +} > +VLOG_DBG("Current variance %"PRIu64" Estimated variance > %"PRIu64"", > + current, estimate); space alignment issues. extra space is required at the start of second statement to align with the first one ? Also, comma or full stop after Current variance %"PRIu64" ? > +VLOG_DBG("Variance improvement %"PRIu64"%%", improvement); > + > +if (improvement >= dp->pmd_alb.rebalance_improve_thresh) { > +thresh_met = true; > +VLOG_DBG("PMD load variance improvement threshold %u%% " > + "is met", dp->pmd_alb.rebalance_improve_thresh); space alignment issue. Extra space added here before "is_met". > +} else { > +VLOG_DBG("PMD load variance improvement threshold %u%% is not > met", > + dp->pmd_alb.rebalance_improve_thresh); > +} > +} else { > +VLOG_DBG("PMD auto load balance detected cross-numa polling with " > + "multiple numa nodes. Unable to accurately estimate."); > +} > + > +sched_numa_list_free_entries(numa_list_cur); > +sched_numa_list_free_entries(numa_list_est); > + > +free(numa_list_cur); > +free(numa_list_est); > + > +return thresh_met; > +} > + > static void > reload_affected_pmds(struct dp_netdev *dp) @@ -5897,215 +5925,4 @@ > variance(uint64_t a[], int n) } ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] dpif-netlink: "bonding_masters" is a reserved name
Currently, on Linux, if you try to create a system datapath called "bonding_masters", when you have bonding module loaded, you have a kernel trace ("sysfs: cannot create duplicate filename '/class/net/bonding_masters'"). This trace appears since "bonding" kernel modules creates a file called "/sys/class/net/bonding_masters", that prevents any network interface to be called "bonding_masters". This commits forbid an user to create a system datapath (that is a network interface) called "bonding_masters" to avoid the kernel trace and to avoid that bonding module can't work if it's loaded after "bonding_masters" interface is created. Reported-at: https://bugzilla.redhat.com/1974303 Signed-off-by: Timothy Redaelli --- lib/dpif-netlink.c | 8 1 file changed, 8 insertions(+) diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c index 73d5608a8..ada1d8479 100644 --- a/lib/dpif-netlink.c +++ b/lib/dpif-netlink.c @@ -330,6 +330,14 @@ dpif_netlink_open(const struct dpif_class *class OVS_UNUSED, const char *name, uint32_t upcall_pid; int error; +/* "bonding_masters" is a reserved interface name under Linux, + * since bonding module creates /sys/class/net/bonding_masters + * and so no interface can be called "bonding_masters". + */ +if (!strcmp(name, "bonding_masters")) { +return EINVAL; +} + error = dpif_netlink_init(); if (error) { return error; -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn v8 6/6] northd: Flood ARPs to routers for "unreachable" addresses.
Hi Mark, Would it be possible to install /32 IP on the router to avoid flooding all router ports? This can be hundreds if not thousands of ports and I think I've seen issues with flooding at that scale before. I'd have to test it in the lab though first. Best Regards, Krzysztof On Thu, Jun 3, 2021, at 20:49, Mark Michelson wrote: > Previously, ARP TPAs were filtered down only to "reachable" addresses. > Reachable addresses are all router interface addresses, as well as NAT > external addresses and load balancer VIPs that are within the subnet > handled by a router's port. > > However, it is possible that in some configurations, CMSes purposely > configure NAT or load balancer addresses on a router that are outside > the router's subnets, and they expect the router to respond to ARPs for > those addresses. > > This commit adds a higher priority flow to logical switches that makes > it so ARPs targeted at "unreachable" addresses are flooded to all ports. > This way, the ARPs can reach the router appropriately and receive a > response. > > Reported at: https://bugzilla.redhat.com/show_bug.cgi?id=1929901 > > Signed-off-by: Mark Michelson > --- > northd/ovn-northd.8.xml | 8 +++ > northd/ovn-northd.c | 153 +++- > northd/ovn_northd.dl| 101 -- > tests/ovn-northd.at | 99 ++ > tests/system-ovn.at | 102 +++ > 5 files changed, 391 insertions(+), 72 deletions(-) > > diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml > index bb77689de..8bb77bf6c 100644 > --- a/northd/ovn-northd.8.xml > +++ b/northd/ovn-northd.8.xml > @@ -1549,6 +1549,14 @@ output; > logical ports. > > > + > +Priority-80 flows for each IP address/VIP/NAT address > configured > +outside its owning router port's subnet. These flows match ARP > +requests and ND packets for the specific IP addresses. > Matched packets > +are forwarded to the MC_FLOOD multicast group > which > +contains all connected logical ports. > + > + > > Priority-75 flows for each port connected to a logical router > matching self originated ARP request/ND packets. These packets > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c > index 414bf9c48..eacbab96a 100644 > --- a/northd/ovn-northd.c > +++ b/northd/ovn-northd.c > @@ -6539,44 +6539,48 @@ > build_lswitch_rport_arp_req_self_orig_flow(struct ovn_port *op, > ds_destroy(); > } > > -/* > - * Ingress table 19: Flows that forward ARP/ND requests only to the routers > - * that own the addresses. Other ARP/ND packets are still flooded in the > - * switching domain as regular broadcast. > - */ > static void > -build_lswitch_rport_arp_req_flow_for_ip(struct sset *ips, > -int addr_family, > -struct ovn_port *patch_op, > -struct ovn_datapath *od, > -uint32_t priority, > -struct hmap *lflows, > -const struct ovsdb_idl_row > *stage_hint) > +arp_nd_ns_match(struct sset *ips, int addr_family, struct ds *match) > { > -struct ds match = DS_EMPTY_INITIALIZER; > -struct ds actions = DS_EMPTY_INITIALIZER; > > /* Packets received from VXLAN tunnels have already been through the > * router pipeline so we should skip them. Normally this is done by the > * multicast_group implementation (VXLAN packets skip table 32 which > * delivers to patch ports) but we're bypassing multicast_groups. > */ > -ds_put_cstr(, FLAGBIT_NOT_VXLAN " && "); > +ds_put_cstr(match, FLAGBIT_NOT_VXLAN " && "); > > if (addr_family == AF_INET) { > -ds_put_cstr(, "arp.op == 1 && arp.tpa == { "); > +ds_put_cstr(match, "arp.op == 1 && arp.tpa == {"); > } else { > -ds_put_cstr(, "nd_ns && nd.target == { "); > +ds_put_cstr(match, "nd_ns && nd.target == {"); > } > > const char *ip_address; > SSET_FOR_EACH (ip_address, ips) { > -ds_put_format(, "%s, ", ip_address); > +ds_put_format(match, "%s, ", ip_address); > } > > -ds_chomp(, ' '); > -ds_chomp(, ','); > -ds_put_cstr(, "}"); > +ds_chomp(match, ' '); > +ds_chomp(match, ','); > +ds_put_cstr(match, "}"); > +} > + > +/* > + * Ingress table 19: Flows that forward ARP/ND requests only to the routers > + * that own the addresses. Other ARP/ND packets are still flooded in the > + * switching domain as regular broadcast. > + */ > +static void > +build_lswitch_rport_arp_req_flow_for_reachable_ip(struct sset *ips, > +int addr_family, struct ovn_port *patch_op, struct ovn_datapath *od, > +uint32_t priority, struct hmap *lflows, > +const struct ovsdb_idl_row *stage_hint) > +{ > +
Re: [ovs-dev] [PATCH v2] ovsdb: provide raft and command interfaces with priority
I just sent an updated version: 1. Logic fixed in the outer (remotes) loop. Inner (sessions) was mostly OK. 2. Appropriate measures have been taken to ensure that the "skip to" pointer is always valid. 3. Tested by forcing <10ms timeouts on processing and/or mandating "skip" on every iteration. All tests pass (see 4) 4. I had to disable skipping for replay. They seem to be incompatible. Brgds, A. On 21/06/2021 20:29, Ilya Maximets wrote: On 6/11/21 5:42 PM, anton.iva...@cambridgegreys.com wrote: From: Anton Ivanov Set a soft time limit of "raft election timer"/2 on ovsdb processing. This improves behaviour in large heavily loaded clusters. While it cannot fully eliminate spurious raft elections under heavy load, it significantly decreases their number. Processing is (to the extent possible) restarted where it stopped on the previous iteration to ensure that sessions towards the tail of the session list are not starved. Signed-off-by: Anton Ivanov --- Hey, Anton. Thanks for the patch! This is not a complete review, but a few things that I noticed. See inline. Best regards, Ilya Maximets. ovsdb/jsonrpc-server.c | 80 +++--- ovsdb/jsonrpc-server.h | 2 +- ovsdb/ovsdb-server.c | 15 +++- ovsdb/raft.c | 5 +++ ovsdb/raft.h | 3 ++ ovsdb/storage.c| 8 + ovsdb/storage.h| 2 ++ 7 files changed, 109 insertions(+), 6 deletions(-) diff --git a/ovsdb/jsonrpc-server.c b/ovsdb/jsonrpc-server.c index 4e2dfc3d7..84e0f69b5 100644 --- a/ovsdb/jsonrpc-server.c +++ b/ovsdb/jsonrpc-server.c @@ -60,7 +60,8 @@ static struct ovsdb_jsonrpc_session *ovsdb_jsonrpc_session_create( struct ovsdb_jsonrpc_remote *, struct jsonrpc_session *, bool); static void ovsdb_jsonrpc_session_preremove_db(struct ovsdb_jsonrpc_remote *, struct ovsdb *); -static void ovsdb_jsonrpc_session_run_all(struct ovsdb_jsonrpc_remote *); +static void ovsdb_jsonrpc_session_run_all(struct ovsdb_jsonrpc_remote *, + uint64_t limit); static void ovsdb_jsonrpc_session_wait_all(struct ovsdb_jsonrpc_remote *); static void ovsdb_jsonrpc_session_get_memory_usage_all( const struct ovsdb_jsonrpc_remote *, struct simap *usage); @@ -128,6 +129,8 @@ struct ovsdb_jsonrpc_server { bool read_only;/* This server is does not accept any transactions that can modify the database. */ struct shash remotes; /* Contains "struct ovsdb_jsonrpc_remote *"s. */ +struct ovsdb_jsonrpc_remote *skip_to; +bool yield_immediately; 'yield' doesn't seem to be a right word here. Maybe 'wake_up' or something similar? Also, both fields above needs a comment. OTOH, do we really need this filed? I mean, if we didn't process some session, shouldn't next session_wait() wake us up? }; /* A configured remote. This is either a passive stream listener plus a list @@ -137,6 +140,7 @@ struct ovsdb_jsonrpc_remote { struct ovsdb_jsonrpc_server *server; struct pstream *listener; /* Listener, if passive. */ struct ovs_list sessions; /* List of "struct ovsdb_jsonrpc_session"s. */ +struct ovsdb_jsonrpc_session *skip_to; uint8_t dscp; bool read_only; char *role; @@ -159,6 +163,8 @@ ovsdb_jsonrpc_server_create(bool read_only) ovsdb_server_init(>up); shash_init(>remotes); server->read_only = read_only; +server->yield_immediately = false; +server->skip_to = NULL; return server; } @@ -279,6 +285,7 @@ ovsdb_jsonrpc_server_add_remote(struct ovsdb_jsonrpc_server *svr, remote->dscp = options->dscp; remote->read_only = options->read_only; remote->role = nullable_xstrdup(options->role); +remote->skip_to = NULL; shash_add(>remotes, name, remote); if (!listener) { @@ -378,12 +385,26 @@ ovsdb_jsonrpc_server_set_read_only(struct ovsdb_jsonrpc_server *svr, } void -ovsdb_jsonrpc_server_run(struct ovsdb_jsonrpc_server *svr) +ovsdb_jsonrpc_server_run(struct ovsdb_jsonrpc_server *svr, uint64_t limit) { struct shash_node *node; +uint64_t elapsed = 0, start_time = 0; + +if (limit) { +start_time = time_now(); Why this function uses time_now() while others are using time_msec() ? time_now() returns seconds while 'limit' is in milliseconds. +} + +svr->yield_immediately = false; SHASH_FOR_EACH (node, >remotes) { struct ovsdb_jsonrpc_remote *remote = node->data; +if (svr->skip_to) { +if (remote != svr->skip_to) { +continue; What if 'skip_to' is already removed from the list? We will, probably, never process any remotes again. Also, we didn't process first N remotes here and we're not setting 'yield_immediately'. This is inconsistent, at least. But, yes, it's unclear if 'yield_immediately' needed at all. +
Re: [ovs-dev] [PATCH v5 ovn 2/4] ovn-northd: Add support for CoPP.
Bleep bloop. Greetings Lorenzo Bianconi, I am a robot and I have tried out your patch. Thanks for your contribution. I encountered some error that I wasn't expecting. See the details below. checkpatch: WARNING: Line lacks whitespace around operator #884 FILE: utilities/ovn-nbctl.c:430: ls-copp-add SWITCH PROTO METER\n\ WARNING: Line lacks whitespace around operator #887 FILE: utilities/ovn-nbctl.c:433: ls-copp-del SWITCH [PROTO]\n\ WARNING: Line lacks whitespace around operator #891 FILE: utilities/ovn-nbctl.c:437: ls-copp-list SWITCH\n\ WARNING: Line lacks whitespace around operator #894 FILE: utilities/ovn-nbctl.c:440: lr-copp-add ROUTER PROTO METER\n\ WARNING: Line lacks whitespace around operator #897 FILE: utilities/ovn-nbctl.c:443: lr-copp-del ROUTER [PROTO]\n\ WARNING: Line lacks whitespace around operator #901 FILE: utilities/ovn-nbctl.c:447: lr-copp-list ROUTER\n\ Lines checked: 1079, Warnings: 6, Errors: 0 Please check this out. If you feel there has been an error, please email acon...@redhat.com Thanks, 0-day Robot ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V6 00/13] Netdev vxlan-decap offload
Hi all, As part of rebasing our AVX512 DPIF on this patchset, I tested this patchset with partial HWOL and I'm seeing strange behaviour. I'll report back more detailed findings soon, just wanted to mention this here as soon as I found the issue. Thanks, Cian > -Original Message- > From: dev On Behalf Of Ilya Maximets > Sent: Tuesday 22 June 2021 16:55 > To: Eli Britstein ; d...@openvswitch.org; Ilya Maximets > > Cc: ivan.ma...@oktetlabs.ru; Ameer Mahagneh ; Majd Dibbiny > > Subject: Re: [ovs-dev] [PATCH V6 00/13] Netdev vxlan-decap offload > > On 4/4/21 11:54 AM, Eli Britstein wrote: > > VXLAN decap in OVS-DPDK configuration consists of two flows: > > F1: in_port(ens1f0),eth(),ipv4(),udp(), actions:tnl_pop(vxlan_sys_4789) > > F2: tunnel(),in_port(vxlan_sys_4789),eth(),ipv4(), actions:ens1f0_0 > > > > F1 is a classification flow. It has outer headers matches and it > > classifies the packet as a VXLAN packet, and using tnl_pop action the > > packet continues processing in F2. > > F2 is a flow that has matches on tunnel metadata as well as on the inner > > packet headers (as any other flow). > > > > In order to fully offload VXLAN decap path, both F1 and F2 should be > > offloaded. As there are more than one flow in HW, it is possible that > > F1 is done by HW but F2 is not. Packet is received by SW, and should be > > processed starting from F2 as F1 was already done by HW. > > Rte_flows are applicable only on physical port IDs. Keeping the original > > physical in port on which the packet is received on enables applying > > vport flows (e.g. F2) on that physical port. > > > > This patch-set makes use of [1] introduced in DPDK 20.11, that adds API > > for tunnel offloads. > > > > Note that MLX5 PMD has a bug that the tnl_pop private actions must be > > first. In OVS it is not. > > Fixing this issue is scheduled for 21.05 (and stable 20.11.2). > > Meanwhile, tests were done with a workaround for it [2]. > > > > v2-v1: > > - Tracking original in_port, and applying vport on that physical port > > instead of all PFs. > > v3-v2: > > - Traversing ports using a new API instead of flow_dump. > > - Refactor packet state recover logic, with bug fix for error pop_header. > > - One ref count for netdev in non-tunnel case. > > - Rename variables, comments, rebase. > > v4-v3: > > - Extract orig_in_port from physdev for flow modify. > > - Miss handling fixes. > > v5-v4: > > - Drop refactor offload rule creation commit. > > - Comment about setting in_port in restore. > > - Refactor vports flow offload commit. > > v6-v5: > > - Fixed duplicate netdev ref bug. > > > > Travis: > > v1: https://travis-ci.org/github/elibritstein/OVS/builds/756418552 > > v2: https://travis-ci.org/github/elibritstein/OVS/builds/758382963 > > v3: https://travis-ci.org/github/elibritstein/OVS/builds/761089087 > > v4: https://travis-ci.org/github/elibritstein/OVS/builds/763146966 > > v5: https://travis-ci.org/github/elibritstein/OVS/builds/765271879 > > v6: https://travis-ci.org/github/elibritstein/OVS/builds/765816800 > > > > GitHub Actions: > > v1: https://github.com/elibritstein/OVS/actions/runs/515334647 > > v2: https://github.com/elibritstein/OVS/actions/runs/554986007 > > v3: https://github.com/elibritstein/OVS/actions/runs/613226225 > > v4: https://github.com/elibritstein/OVS/actions/runs/658415274 > > v5: https://github.com/elibritstein/OVS/actions/runs/704357369 > > v6: https://github.com/elibritstein/OVS/actions/runs/716304028 > > > > [1] https://mails.dpdk.org/archives/dev/2020-October/187314.html > > [2] https://github.com/elibritstein/dpdk-stable/pull/1 > > > > > > Eli Britstein (10): > > netdev-offload: Add HW miss packet state recover API > > netdev-dpdk: Introduce DPDK tunnel APIs > > netdev-offload: Introduce an API to traverse ports > > netdev-dpdk: Add flow_api support for netdev vxlan vports > > netdev-offload-dpdk: Implement HW miss packet recover for vport > > dpif-netdev: Add HW miss packet state recover logic > > netdev-offload-dpdk: Change log rate limits > > netdev-offload-dpdk: Support tunnel pop action > > netdev-offload-dpdk: Support vports flows offload > > netdev-dpdk-offload: Add vxlan pattern matching function > > > > Ilya Maximets (2): > > netdev-offload: Allow offloading to netdev without ifindex. > > netdev-offload: Disallow offloading to unrelated tunneling vports. > > > > Sriharsha Basavapatna (1): > > dpif-netdev: Provide orig_in_port in metadata for tunneled packets > > > > Documentation/howto/dpdk.rst | 1 + > > NEWS | 2 + > > lib/dpif-netdev.c | 97 +++-- > > lib/netdev-dpdk.c | 118 ++ > > lib/netdev-dpdk.h | 106 - > > lib/netdev-offload-dpdk.c | 704 +++--- > > lib/netdev-offload-provider.h | 5 + > > lib/netdev-offload-tc.c | 8 + > > lib/netdev-offload.c | 47 ++- > > lib/netdev-offload.h | 10 + > > lib/packets.h
Re: [ovs-dev] [PATCH V6 00/13] Netdev vxlan-decap offload
> -Original Message- > From: Ilya Maximets > Sent: Wednesday 23 June 2021 16:25 > To: Ferriter, Cian ; Ilya Maximets > ; Eli Britstein > ; d...@openvswitch.org > Cc: ivan.ma...@oktetlabs.ru; Ameer Mahagneh ; Majd Dibbiny > > Subject: Re: [ovs-dev] [PATCH V6 00/13] Netdev vxlan-decap offload > > On 6/23/21 5:18 PM, Ferriter, Cian wrote: > >> -Original Message- > >> From: dev On Behalf Of Ferriter, Cian > >> Sent: Wednesday 23 June 2021 13:38 > >> To: Ilya Maximets ; Eli Britstein ; > >> d...@openvswitch.org > >> Cc: ivan.ma...@oktetlabs.ru; Ameer Mahagneh ; Majd > >> Dibbiny > >> Subject: Re: [ovs-dev] [PATCH V6 00/13] Netdev vxlan-decap offload > >> > >> Hi all, > >> > >> As part of rebasing our AVX512 DPIF on this patchset, I tested this > >> patchset with partial HWOL and > I'm > >> seeing strange behaviour. > >> > >> I'll report back more detailed findings soon, just wanted to mention this > >> here as soon as I found > the > >> issue. > >> > >> Thanks, > >> Cian > >> > > > > More details on the issue I'm seeing: > > I'm using Ilya's branch from Github: > > https://github.com/igsilya/ovs/tree/tmp-vxlan-decap > > > > ~/ovs_scripts# $OVS_DIR/utilities/ovs-vsctl list Open_vSwitch > > dpdk_version: "DPDK 20.11.1" > > other_config: {dpdk-hugepage-dir="/mnt/huge", dpdk-init="true", > > dpdk-lcore-mask="0x1", dpdk- > socket-mem="2048,0", emc-insert-inv-prob="0", hw-offload="true", > pmd-cpu-mask="0x2"} > > > > ~/ovs_scripts# $OVS_DIR/utilities/ovs-vsctl show > > 31584ce5-09c1-44b3-ab27-1a0308d63fff > > Bridge br0 > > datapath_type: netdev > > Port br0 > > Interface br0 > > type: internal > > Port phy0 > > Interface phy0 > > type: dpdk > > options: {dpdk-devargs="5e:00.0"} > > > > ~/ovs_scripts# $OVS_DIR/utilities/ovs-ofctl dump-flows br0 > > cookie=0x0, duration=29.466s, table=0, n_packets=0, n_bytes=0, > > in_port=phy0 actions=IN_PORT > > > > I'm expecting the flow to be partially offloaded, but I get a segfault when > > using the above branch. > More info on the segfault below: > > > > Thread 13 "pmd-c01/id:8" received signal SIGSEGV, Segmentation fault. > > [Switching to Thread 0x7f9f72734700 (LWP 19327)] > > 0x56163bf0d825 in set_error (error=0x0, type=RTE_FLOW_ERROR_TYPE_ATTR) > > at lib/netdev-dpdk.h:84 > > (gdb) bt > > #0 0x56163bf0d825 in set_error (error=0x0, > > type=RTE_FLOW_ERROR_TYPE_ATTR) at lib/netdev- > dpdk.h:84 > > #1 0x56163bf0d8d3 in netdev_dpdk_rte_flow_get_restore_info > > (netdev=0x1bfc65c80, p=0x19033af00, > info=0x7f9f72729a20, error=0x0) at lib/netdev-dpdk.h:119 > > #2 0x56163bf14da3 in netdev_offload_dpdk_hw_miss_packet_recover > > (netdev=0x1bfc65c80, > packet=0x19033af00) at lib/netdev-offload-dpdk.c:2133 > > #3 0x56163bde3662 in netdev_hw_miss_packet_recover > > (netdev=0x1bfc65c80, packet=0x19033af00) at > lib/netdev-offload.c:265 > > #4 0x56163bda19a9 in dp_netdev_hw_flow (pmd=0x7f9f72735010, port_no=2, > > packet=0x19033af00, > flow=0x7f9f72729b98) at lib/dpif-netdev.c:7087 > > #5 0x56163bda1c5c in dfc_processing (pmd=0x7f9f72735010, > > packets_=0x7f9f727310d0, > keys=0x7f9f7272c480, missed_keys=0x7f9f7272c370, batches=0x7f9f72729f60, > n_batches=0x7f9f72730f70, > flow_map=0x7f9f72729c50, n_flows=0x7f9f72730f78, index_map=0x7f9f72729c30 "", > md_is_valid=false, > port_no=2) at lib/dpif-netdev.c:7168 > > #6 0x56163bda2f3e in dp_netdev_input__ (pmd=0x7f9f72735010, > > packets=0x7f9f727310d0, > md_is_valid=false, port_no=2) at lib/dpif-netdev.c:7475 > > #7 0x56163bda3105 in dp_netdev_input (pmd=0x7f9f72735010, > > packets=0x7f9f727310d0, port_no=2) at > lib/dpif-netdev.c:7519 > > #8 0x56163bd9ab04 in dp_netdev_process_rxq_port (pmd=0x7f9f72735010, > > rxq=0x56163fb3f610, > port_no=2) at lib/dpif-netdev.c:4774 > > #9 0x56163bd9ee17 in pmd_thread_main (f_=0x7f9f72735010) at > > lib/dpif-netdev.c:6063 > > #10 0x56163be71c88 in ovsthread_wrapper (aux_=0x56163fb3fe70) at > > lib/ovs-thread.c:383 > > #11 0x7f9f884cf6db in start_thread (arg=0x7f9f72734700) at > > pthread_create.c:463 > > #12 0x7f9f862bb71f in clone () at > > ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 > > > > In netdev_offload_dpdk_hw_miss_packet_recover() calls > > netdev_dpdk_rte_flow_get_restore_info() with a > NULL for the struct rte_flow_error *error argument: > > > > if (netdev_dpdk_rte_flow_get_restore_info(netdev, packet, > > _restore_info, NULL)) { > > /* This function is called for every packet, and in most cases there > > * will be no restore info from the HW, thus error is expected. > > */ > > return 0; > > } > > > > There are 2 "netdev_dpdk_rte_flow_get_restore_info()" functions. One in > > lib/netdev-dpdk.h and one in > lib/netdev-dpdk.c. > > > > I don't have the experimental API enabled,
Re: [ovs-dev] [PATCH V6 00/13] Netdev vxlan-decap offload
On 6/23/2021 6:18 PM, Ferriter, Cian wrote: External email: Use caution opening links or attachments -Original Message- From: dev On Behalf Of Ferriter, Cian Sent: Wednesday 23 June 2021 13:38 To: Ilya Maximets ; Eli Britstein ; d...@openvswitch.org Cc: ivan.ma...@oktetlabs.ru; Ameer Mahagneh ; Majd Dibbiny Subject: Re: [ovs-dev] [PATCH V6 00/13] Netdev vxlan-decap offload Hi all, As part of rebasing our AVX512 DPIF on this patchset, I tested this patchset with partial HWOL and I'm seeing strange behaviour. I'll report back more detailed findings soon, just wanted to mention this here as soon as I found the issue. Thanks, Cian More details on the issue I'm seeing: I'm using Ilya's branch from Github: https://github.com/igsilya/ovs/tree/tmp-vxlan-decap ~/ovs_scripts# $OVS_DIR/utilities/ovs-vsctl list Open_vSwitch dpdk_version: "DPDK 20.11.1" other_config: {dpdk-hugepage-dir="/mnt/huge", dpdk-init="true", dpdk-lcore-mask="0x1", dpdk-socket-mem="2048,0", emc-insert-inv-prob="0", hw-offload="true", pmd-cpu-mask="0x2"} ~/ovs_scripts# $OVS_DIR/utilities/ovs-vsctl show 31584ce5-09c1-44b3-ab27-1a0308d63fff Bridge br0 datapath_type: netdev Port br0 Interface br0 type: internal Port phy0 Interface phy0 type: dpdk options: {dpdk-devargs="5e:00.0"} ~/ovs_scripts# $OVS_DIR/utilities/ovs-ofctl dump-flows br0 cookie=0x0, duration=29.466s, table=0, n_packets=0, n_bytes=0, in_port=phy0 actions=IN_PORT I'm expecting the flow to be partially offloaded, but I get a segfault when using the above branch. More info on the segfault below: Thread 13 "pmd-c01/id:8" received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x7f9f72734700 (LWP 19327)] 0x56163bf0d825 in set_error (error=0x0, type=RTE_FLOW_ERROR_TYPE_ATTR) at lib/netdev-dpdk.h:84 (gdb) bt #0 0x56163bf0d825 in set_error (error=0x0, type=RTE_FLOW_ERROR_TYPE_ATTR) at lib/netdev-dpdk.h:84 Yes, it is caused by passing NULL instead of valid struct rte_error, by Ilya's comments. I will fix it in v7. #1 0x56163bf0d8d3 in netdev_dpdk_rte_flow_get_restore_info (netdev=0x1bfc65c80, p=0x19033af00, info=0x7f9f72729a20, error=0x0) at lib/netdev-dpdk.h:119 #2 0x56163bf14da3 in netdev_offload_dpdk_hw_miss_packet_recover (netdev=0x1bfc65c80, packet=0x19033af00) at lib/netdev-offload-dpdk.c:2133 #3 0x56163bde3662 in netdev_hw_miss_packet_recover (netdev=0x1bfc65c80, packet=0x19033af00) at lib/netdev-offload.c:265 #4 0x56163bda19a9 in dp_netdev_hw_flow (pmd=0x7f9f72735010, port_no=2, packet=0x19033af00, flow=0x7f9f72729b98) at lib/dpif-netdev.c:7087 #5 0x56163bda1c5c in dfc_processing (pmd=0x7f9f72735010, packets_=0x7f9f727310d0, keys=0x7f9f7272c480, missed_keys=0x7f9f7272c370, batches=0x7f9f72729f60, n_batches=0x7f9f72730f70, flow_map=0x7f9f72729c50, n_flows=0x7f9f72730f78, index_map=0x7f9f72729c30 "", md_is_valid=false, port_no=2) at lib/dpif-netdev.c:7168 #6 0x56163bda2f3e in dp_netdev_input__ (pmd=0x7f9f72735010, packets=0x7f9f727310d0, md_is_valid=false, port_no=2) at lib/dpif-netdev.c:7475 #7 0x56163bda3105 in dp_netdev_input (pmd=0x7f9f72735010, packets=0x7f9f727310d0, port_no=2) at lib/dpif-netdev.c:7519 #8 0x56163bd9ab04 in dp_netdev_process_rxq_port (pmd=0x7f9f72735010, rxq=0x56163fb3f610, port_no=2) at lib/dpif-netdev.c:4774 #9 0x56163bd9ee17 in pmd_thread_main (f_=0x7f9f72735010) at lib/dpif-netdev.c:6063 #10 0x56163be71c88 in ovsthread_wrapper (aux_=0x56163fb3fe70) at lib/ovs-thread.c:383 #11 0x7f9f884cf6db in start_thread (arg=0x7f9f72734700) at pthread_create.c:463 #12 0x7f9f862bb71f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 In netdev_offload_dpdk_hw_miss_packet_recover() calls netdev_dpdk_rte_flow_get_restore_info() with a NULL for the struct rte_flow_error *error argument: if (netdev_dpdk_rte_flow_get_restore_info(netdev, packet, _restore_info, NULL)) { /* This function is called for every packet, and in most cases there * will be no restore info from the HW, thus error is expected. */ return 0; } There are 2 "netdev_dpdk_rte_flow_get_restore_info()" functions. One in lib/netdev-dpdk.h and one in lib/netdev-dpdk.c. I don't have the experimental API enabled, so I'm using the function rom lib/netdev-dpdk.h. -Original Message- From: dev On Behalf Of Ilya Maximets Sent: Tuesday 22 June 2021 16:55 To: Eli Britstein ; d...@openvswitch.org; Ilya Maximets Cc: ivan.ma...@oktetlabs.ru; Ameer Mahagneh ; Majd Dibbiny Subject: Re: [ovs-dev] [PATCH V6 00/13] Netdev vxlan-decap offload On 4/4/21 11:54 AM, Eli Britstein wrote: VXLAN decap in OVS-DPDK configuration consists of two flows: F1: in_port(ens1f0),eth(),ipv4(),udp(), actions:tnl_pop(vxlan_sys_4789) F2:
[ovs-dev] [PATCH V7 09/13] netdev-offload-dpdk: Change log rate limits.
In order to allow showing more debug messages, increase the rate limits. Signed-off-by: Eli Britstein Reviewed-by: Gaetan Rivet Acked-by: Sriharsha Basavapatna Tested-by: Emma Finn Tested-by: Marko Kovacevic Signed-off-by: Ilya Maximets --- lib/netdev-offload-dpdk.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c index fa25e059a..2495453af 100644 --- a/lib/netdev-offload-dpdk.c +++ b/lib/netdev-offload-dpdk.c @@ -31,7 +31,7 @@ #include "uuid.h" VLOG_DEFINE_THIS_MODULE(netdev_offload_dpdk); -static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(100, 5); +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(600, 600); /* Thread-safety * = -- 2.28.0.2311.g225365fb51 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V7 04/13] netdev-dpdk: Add flow_api support for netdev vxlan vports.
Bleep bloop. Greetings Eli Britstein, I am a robot and I have tried out your patch. Thanks for your contribution. I encountered some error that I wasn't expecting. See the details below. checkpatch: WARNING: Unexpected sign-offs from developers who are not authors or co-authors or committers: Ilya Maximets Lines checked: 38, Warnings: 1, Errors: 0 Please check this out. If you feel there has been an error, please email acon...@redhat.com Thanks, 0-day Robot ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V7 07/13] netdev-offload: Allow offloading to netdev without ifindex.
Bleep bloop. Greetings Eli Britstein, I am a robot and I have tried out your patch. Thanks for your contribution. I encountered some error that I wasn't expecting. See the details below. checkpatch: WARNING: Unexpected sign-offs from developers who are not authors or co-authors or committers: Eli Britstein Lines checked: 71, Warnings: 1, Errors: 0 Please check this out. If you feel there has been an error, please email acon...@redhat.com Thanks, 0-day Robot ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn v2] ovn.at: Fix test "virtual ports -- ovn-northd-ddlog".
On Mon, Jun 21, 2021 at 06:58:37PM -0700, Han Zhou wrote: > On Thu, Jun 17, 2021 at 12:59 PM Mark Michelson wrote: > > > > On 6/14/21 2:44 PM, Ben Pfaff wrote: > > > On Fri, Jun 11, 2021 at 03:48:52PM -0700, Han Zhou wrote: > > >> The test case fails quite often for northd-ddlog because of the tunnel > > >> keys mismatch when comparing OpenFlow rules. Keys can change in > > >> different runs. This patch fixes it by extracting the expected keys > from > > >> SB DB before comparison instead of hardcoding. > > >> > > >> There are some other potential timing issues in this test and this > > >> patch fixes them as well by replacing AT_CHECK with OVS_WAIT_UNTIL. > > >> > > >> Signed-off-by: Han Zhou > > > > > > Awesome! Thank you. > > > > > >> -AT_CHECK([test x$(ovn-sbctl --bare --columns chassis find > port_binding \ > > >> +OVS_WAIT_UNTIL([test x$(ovn-sbctl --bare --columns chassis find > port_binding \ > > >> logical_port=sw0-vir) = x], [0], []) > > > > > > I think the above can be better written: > > > wait_row_count Port_Binding 0 logical_port=sw0-vir > > > > I don't think this is correct. The test is not attempting to wait for > > the Port_Binding record to be deleted. It's waiting for the chassis > > column in the Port_Binding to contain an empty string. I think > > wait_column() could work: > > > > wait_column "" Port_Binding chassis logical_port=sw0-vir > > > > (assuming that testing for an empty string works) > > > > Thanks Ben and Mark! I used wait_column in v4: > https://patchwork.ozlabs.org/project/ovn/patch/20210622015529.2005615-1-hz...@ovn.org/ I see. This is done elsewhere along the following pattern: wait_row_count Port_Binding 1 logical_port=sw0-vir 'chassis=[[]]' I *think* that wait_column works too. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] Path MTU discovery on GRE interfaces
[updating Jesse's email address] On Wed, Jun 23, 2021 at 04:48:29PM +0200, Matthias May via dev wrote: > I'm currently fighting with issues where TCP/UDP frames that are larger than > the MTU of a GRE tunnel are dropped. > I'm aware of the whys and how to work around the issue, but while looking for > solutions i stumbled over the fact that: > * [1] added PMTUD support to OVS > * [2] disabled/removed with v1.9.0 respectively v1.10.0 the feature > > Even after some significant time looking through the history i haven't found > a reason why this was removed, just that it > was removed. > > I started some preliminary work to add PMTUD support to OVS (again), but the > fact that it was removed 8 years ago seems > to me like a red flag to not do it (again). > > Could someone fluent with the OVS history from 8 years ago shed some light on > why PMTUD support was dropped? > Any pointers to a thread on this topic? It was a layering violation. This caused problems like, for example, not having a good IP address to send the "frag needed" message from. Jesse may remember more. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH V7 13/13] netdev-dpdk-offload: Add vxlan pattern matching function.
For VXLAN offload, matches should be done on outer header for tunnel properties as well as inner packet matches. Add a function for parsing VXLAN tunnel matches. Signed-off-by: Eli Britstein Reviewed-by: Gaetan Rivet Acked-by: Sriharsha Basavapatna Tested-by: Emma Finn Tested-by: Marko Kovacevic Signed-off-by: Ilya Maximets --- NEWS | 2 + lib/netdev-offload-dpdk.c | 155 +- 2 files changed, 156 insertions(+), 1 deletion(-) diff --git a/NEWS b/NEWS index 6f8e62f7f..10b3ab053 100644 --- a/NEWS +++ b/NEWS @@ -19,6 +19,8 @@ Post-v2.15.0 * New debug appctl command 'dpdk/get-malloc-stats'. * Add hardware offload support for tunnel pop action (experimental). Available only if DPDK experimantal APIs enabled during the build. + * Add hardware offload support for VXLAN flows (experimental). + Available only if DPDK experimantal APIs enabled during the build. - ovsdb-tool: * New option '--election-timer' to the 'create-cluster' command to set the leader election timer during cluster creation. diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c index 6220394de..6bd5b6c9f 100644 --- a/lib/netdev-offload-dpdk.c +++ b/lib/netdev-offload-dpdk.c @@ -372,6 +372,20 @@ dump_flow_pattern(struct ds *s, ipv6_mask->hdr.hop_limits); } ds_put_cstr(s, "/ "); +} else if (item->type == RTE_FLOW_ITEM_TYPE_VXLAN) { +const struct rte_flow_item_vxlan *vxlan_spec = item->spec; +const struct rte_flow_item_vxlan *vxlan_mask = item->mask; + +ds_put_cstr(s, "vxlan "); +if (vxlan_spec) { +if (!vxlan_mask) { +vxlan_mask = _flow_item_vxlan_mask; +} +DUMP_PATTERN_ITEM(vxlan_mask->vni, "vni", "%"PRIu32, + ntohl(*(ovs_be32 *) vxlan_spec->vni) >> 8, + ntohl(*(ovs_be32 *) vxlan_mask->vni) >> 8); +} +ds_put_cstr(s, "/ "); } else { ds_put_format(s, "unknown rte flow pattern (%d)\n", item->type); } @@ -865,15 +879,154 @@ out: return ret; } +static int +parse_tnl_ip_match(struct flow_patterns *patterns, + struct match *match, + uint8_t proto) +{ +struct flow *consumed_masks; + +consumed_masks = >wc.masks; +/* IP v4 */ +if (match->wc.masks.tunnel.ip_src || match->wc.masks.tunnel.ip_dst) { +struct rte_flow_item_ipv4 *spec, *mask; + +spec = xzalloc(sizeof *spec); +mask = xzalloc(sizeof *mask); + +spec->hdr.type_of_service = match->flow.tunnel.ip_tos; +spec->hdr.time_to_live= match->flow.tunnel.ip_ttl; +spec->hdr.next_proto_id = proto; +spec->hdr.src_addr= match->flow.tunnel.ip_src; +spec->hdr.dst_addr= match->flow.tunnel.ip_dst; + +mask->hdr.type_of_service = match->wc.masks.tunnel.ip_tos; +mask->hdr.time_to_live= match->wc.masks.tunnel.ip_ttl; +mask->hdr.next_proto_id = UINT8_MAX; +mask->hdr.src_addr= match->wc.masks.tunnel.ip_src; +mask->hdr.dst_addr= match->wc.masks.tunnel.ip_dst; + +consumed_masks->tunnel.ip_tos = 0; +consumed_masks->tunnel.ip_ttl = 0; +consumed_masks->tunnel.ip_src = 0; +consumed_masks->tunnel.ip_dst = 0; + +add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_IPV4, spec, mask); +} else if (!is_all_zeros(>wc.masks.tunnel.ipv6_src, + sizeof(struct in6_addr)) || + !is_all_zeros(>wc.masks.tunnel.ipv6_dst, + sizeof(struct in6_addr))) { +/* IP v6 */ +struct rte_flow_item_ipv6 *spec, *mask; + +spec = xzalloc(sizeof *spec); +mask = xzalloc(sizeof *mask); + +spec->hdr.proto = proto; +spec->hdr.hop_limits = match->flow.tunnel.ip_ttl; +spec->hdr.vtc_flow = htonl((uint32_t) match->flow.tunnel.ip_tos << + RTE_IPV6_HDR_TC_SHIFT); +memcpy(spec->hdr.src_addr, >flow.tunnel.ipv6_src, + sizeof spec->hdr.src_addr); +memcpy(spec->hdr.dst_addr, >flow.tunnel.ipv6_dst, + sizeof spec->hdr.dst_addr); + +mask->hdr.proto = UINT8_MAX; +mask->hdr.hop_limits = match->wc.masks.tunnel.ip_ttl; +mask->hdr.vtc_flow = htonl((uint32_t) match->wc.masks.tunnel.ip_tos << + RTE_IPV6_HDR_TC_SHIFT); +memcpy(mask->hdr.src_addr, >wc.masks.tunnel.ipv6_src, + sizeof mask->hdr.src_addr); +memcpy(mask->hdr.dst_addr, >wc.masks.tunnel.ipv6_dst, + sizeof mask->hdr.dst_addr); + +consumed_masks->tunnel.ip_tos = 0; +consumed_masks->tunnel.ip_ttl = 0; +memset(_masks->tunnel.ipv6_src, 0, + sizeof consumed_masks->tunnel.ipv6_src); +
[ovs-dev] [PATCH V7 02/13] netdev-dpdk: Introduce DPDK tunnel APIs.
As a pre-step towards tunnel offloads, introduce DPDK APIs. Signed-off-by: Eli Britstein Reviewed-by: Gaetan Rivet Acked-by: Sriharsha Basavapatna Tested-by: Emma Finn Tested-by: Marko Kovacevic Signed-off-by: Ilya Maximets --- lib/netdev-dpdk.c | 112 ++ lib/netdev-dpdk.h | 103 +++--- 2 files changed, 209 insertions(+), 6 deletions(-) diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 9d8096668..4dfed1ebe 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -5291,6 +5291,118 @@ netdev_dpdk_rte_flow_query_count(struct netdev *netdev, return ret; } +#ifdef ALLOW_EXPERIMENTAL_API + +int +netdev_dpdk_rte_flow_tunnel_decap_set(struct netdev *netdev, + struct rte_flow_tunnel *tunnel, + struct rte_flow_action **actions, + uint32_t *num_of_actions, + struct rte_flow_error *error) +{ +struct netdev_dpdk *dev; +int ret; + +if (!is_dpdk_class(netdev->netdev_class)) { +return -1; +} + +dev = netdev_dpdk_cast(netdev); +ovs_mutex_lock(>mutex); +ret = rte_flow_tunnel_decap_set(dev->port_id, tunnel, actions, +num_of_actions, error); +ovs_mutex_unlock(>mutex); +return ret; +} + +int +netdev_dpdk_rte_flow_tunnel_match(struct netdev *netdev, + struct rte_flow_tunnel *tunnel, + struct rte_flow_item **items, + uint32_t *num_of_items, + struct rte_flow_error *error) +{ +struct netdev_dpdk *dev; +int ret; + +if (!is_dpdk_class(netdev->netdev_class)) { +return -1; +} + +dev = netdev_dpdk_cast(netdev); +ovs_mutex_lock(>mutex); +ret = rte_flow_tunnel_match(dev->port_id, tunnel, items, num_of_items, +error); +ovs_mutex_unlock(>mutex); +return ret; +} + +int +netdev_dpdk_rte_flow_get_restore_info(struct netdev *netdev, + struct dp_packet *p, + struct rte_flow_restore_info *info, + struct rte_flow_error *error) +{ +struct rte_mbuf *m = (struct rte_mbuf *) p; +struct netdev_dpdk *dev; +int ret; + +if (!is_dpdk_class(netdev->netdev_class)) { +return -1; +} + +dev = netdev_dpdk_cast(netdev); +ovs_mutex_lock(>mutex); +ret = rte_flow_get_restore_info(dev->port_id, m, info, error); +ovs_mutex_unlock(>mutex); +return ret; +} + +int +netdev_dpdk_rte_flow_tunnel_action_decap_release( +struct netdev *netdev, +struct rte_flow_action *actions, +uint32_t num_of_actions, +struct rte_flow_error *error) +{ +struct netdev_dpdk *dev; +int ret; + +if (!is_dpdk_class(netdev->netdev_class)) { +return -1; +} + +dev = netdev_dpdk_cast(netdev); +ovs_mutex_lock(>mutex); +ret = rte_flow_tunnel_action_decap_release(dev->port_id, actions, + num_of_actions, error); +ovs_mutex_unlock(>mutex); +return ret; +} + +int +netdev_dpdk_rte_flow_tunnel_item_release(struct netdev *netdev, + struct rte_flow_item *items, + uint32_t num_of_items, + struct rte_flow_error *error) +{ +struct netdev_dpdk *dev; +int ret; + +if (!is_dpdk_class(netdev->netdev_class)) { +return -1; +} + +dev = netdev_dpdk_cast(netdev); +ovs_mutex_lock(>mutex); +ret = rte_flow_tunnel_item_release(dev->port_id, items, num_of_items, + error); +ovs_mutex_unlock(>mutex); +return ret; +} + +#endif /* ALLOW_EXPERIMENTAL_API */ + #define NETDEV_DPDK_CLASS_COMMON\ .is_pmd = true, \ .alloc = netdev_dpdk_alloc, \ diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h index 848346cb4..699be3fb4 100644 --- a/lib/netdev-dpdk.h +++ b/lib/netdev-dpdk.h @@ -26,12 +26,7 @@ struct netdev; #ifdef DPDK_NETDEV -struct rte_flow; -struct rte_flow_error; -struct rte_flow_attr; -struct rte_flow_item; -struct rte_flow_action; -struct rte_flow_query_count; +#include void netdev_dpdk_register(void); void free_dpdk_buf(struct dp_packet *); @@ -56,6 +51,102 @@ netdev_dpdk_rte_flow_query_count(struct netdev *netdev, int netdev_dpdk_get_port_id(struct netdev *netdev); +#ifdef ALLOW_EXPERIMENTAL_API + +int netdev_dpdk_rte_flow_tunnel_decap_set(struct netdev *, + struct rte_flow_tunnel *, + struct rte_flow_action **, +
[ovs-dev] [PATCH V7 05/13] netdev-offload-dpdk: Implement HW miss packet recover for vport.
A miss in virtual port offloads means the flow with tnl_pop was offloaded, but not the following one. Recover the state and continue with SW processing. Signed-off-by: Eli Britstein Reviewed-by: Gaetan Rivet Acked-by: Sriharsha Basavapatna Tested-by: Emma Finn Tested-by: Marko Kovacevic Signed-off-by: Ilya Maximets --- lib/netdev-offload-dpdk.c | 150 ++ 1 file changed, 150 insertions(+) diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c index f2413f5be..33cd64dc9 100644 --- a/lib/netdev-offload-dpdk.c +++ b/lib/netdev-offload-dpdk.c @@ -1588,6 +1588,155 @@ netdev_offload_dpdk_flow_flush(struct netdev *netdev) return 0; } +struct get_vport_netdev_aux { +struct rte_flow_tunnel *tunnel; +odp_port_t *odp_port; +struct netdev *vport; +}; + +static bool +get_vxlan_netdev_cb(struct netdev *netdev, +odp_port_t odp_port, +void *aux_) +{ +const struct netdev_tunnel_config *tnl_cfg; +struct get_vport_netdev_aux *aux = aux_; + +if (strcmp(netdev_get_type(netdev), "vxlan")) { +return false; +} + +tnl_cfg = netdev_get_tunnel_config(netdev); +if (!tnl_cfg) { +VLOG_ERR_RL(, "Cannot get a tunnel config for netdev %s", +netdev_get_name(netdev)); +return false; +} + +if (tnl_cfg->dst_port == aux->tunnel->tp_dst) { +/* Found the netdev. Store the results and stop the traversing. */ +aux->vport = netdev_ref(netdev); +*aux->odp_port = odp_port; +return true; +} + +return false; +} + +static struct netdev * +get_vxlan_netdev(const char *dpif_type, + struct rte_flow_tunnel *tunnel, + odp_port_t *odp_port) +{ +struct get_vport_netdev_aux aux = { +.tunnel = tunnel, +.odp_port = odp_port, +.vport = NULL, +}; + +netdev_ports_traverse(dpif_type, get_vxlan_netdev_cb, ); +return aux.vport; +} + +static struct netdev * +get_vport_netdev(const char *dpif_type, + struct rte_flow_tunnel *tunnel, + odp_port_t *odp_port) +{ +if (tunnel->type == RTE_FLOW_ITEM_TYPE_VXLAN) { +return get_vxlan_netdev(dpif_type, tunnel, odp_port); +} + +OVS_NOT_REACHED(); +} + +static int +netdev_offload_dpdk_hw_miss_packet_recover(struct netdev *netdev, + struct dp_packet *packet) +{ +struct rte_flow_restore_info rte_restore_info; +struct rte_flow_tunnel *rte_tnl; +struct netdev *vport_netdev; +struct pkt_metadata *md; +struct flow_tnl *md_tnl; +odp_port_t vport_odp; +int ret = 0; + +if (netdev_dpdk_rte_flow_get_restore_info(netdev, packet, + _restore_info, NULL)) { +/* This function is called for every packet, and in most cases there + * will be no restore info from the HW, thus error is expected. + */ +return 0; +} + +if (!(rte_restore_info.flags & RTE_FLOW_RESTORE_INFO_TUNNEL)) { +return EOPNOTSUPP; +} + +rte_tnl = _restore_info.tunnel; +vport_netdev = get_vport_netdev(netdev->dpif_type, rte_tnl, +_odp); +if (!vport_netdev) { +VLOG_WARN_RL(, "Could not find vport netdev"); +return EOPNOTSUPP; +} + +md = >md; +/* For tunnel recovery (RTE_FLOW_RESTORE_INFO_TUNNEL), it is possible + * to have the packet to still be encapsulated, or not. This is reflected + * by the RTE_FLOW_RESTORE_INFO_ENCAPSULATED flag. + * In the case it is on, the packet is still encapsulated, and we do + * the pop in SW. + * In the case it is off, the packet is already decapsulated by HW, and + * the tunnel info is provided in the tunnel struct. For this case we + * take it to OVS metadata. + */ +if (rte_restore_info.flags & RTE_FLOW_RESTORE_INFO_ENCAPSULATED) { +if (!vport_netdev->netdev_class || +!vport_netdev->netdev_class->pop_header) { +VLOG_ERR_RL(, "vport nedtdev=%s with no pop_header method", +netdev_get_name(vport_netdev)); +ret = EOPNOTSUPP; +goto close_vport_netdev; +} +parse_tcp_flags(packet); +if (vport_netdev->netdev_class->pop_header(packet) == NULL) { +/* If there is an error with popping the header, the packet is + * freed. In this case it should not continue SW processing. + */ +ret = EINVAL; +goto close_vport_netdev; +} +} else { +md_tnl = >tunnel; +if (rte_tnl->is_ipv6) { +memcpy(_tnl->ipv6_src, _tnl->ipv6.src_addr, + sizeof md_tnl->ipv6_src); +memcpy(_tnl->ipv6_dst, _tnl->ipv6.dst_addr, + sizeof md_tnl->ipv6_dst); +} else { +md_tnl->ip_src = rte_tnl->ipv4.src_addr; +
[ovs-dev] [PATCH V7 12/13] netdev-offload-dpdk: Support vports flows offload.
Vports are virtual, OVS only logical devices, so rte_flows cannot be applied as is on them. Instead, apply the rules the physical port from which the packet has arrived, provided by orig_in_port field. Signed-off-by: Eli Britstein Reviewed-by: Gaetan Rivet Acked-by: Sriharsha Basavapatna Tested-by: Emma Finn Tested-by: Marko Kovacevic Signed-off-by: Ilya Maximets --- lib/netdev-offload-dpdk.c | 218 -- 1 file changed, 185 insertions(+), 33 deletions(-) diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c index 99c62c906..6220394de 100644 --- a/lib/netdev-offload-dpdk.c +++ b/lib/netdev-offload-dpdk.c @@ -25,6 +25,7 @@ #include "netdev-offload-provider.h" #include "netdev-provider.h" #include "netdev-vport.h" +#include "odp-util.h" #include "openvswitch/match.h" #include "openvswitch/vlog.h" #include "packets.h" @@ -62,6 +63,7 @@ struct ufid_to_rte_flow_data { struct rte_flow *rte_flow; bool actions_offloaded; struct dpif_flow_stats stats; +struct netdev *physdev; }; /* Find rte_flow with @ufid. */ @@ -87,7 +89,8 @@ ufid_to_rte_flow_data_find(const ovs_u128 *ufid, bool warn) static inline struct ufid_to_rte_flow_data * ufid_to_rte_flow_associate(const ovs_u128 *ufid, struct netdev *netdev, - struct rte_flow *rte_flow, bool actions_offloaded) + struct netdev *physdev, struct rte_flow *rte_flow, + bool actions_offloaded) { size_t hash = hash_bytes(ufid, sizeof *ufid, 0); struct ufid_to_rte_flow_data *data = xzalloc(sizeof *data); @@ -106,6 +109,7 @@ ufid_to_rte_flow_associate(const ovs_u128 *ufid, struct netdev *netdev, data->ufid = *ufid; data->netdev = netdev_ref(netdev); +data->physdev = netdev != physdev ? netdev_ref(physdev) : physdev; data->rte_flow = rte_flow; data->actions_offloaded = actions_offloaded; @@ -121,7 +125,10 @@ ufid_to_rte_flow_disassociate(struct ufid_to_rte_flow_data *data) cmap_remove(_to_rte_flow, CONST_CAST(struct cmap_node *, >node), hash); -netdev_close(data->netdev); +if (data->netdev != data->physdev) { +netdev_close(data->netdev); +} +netdev_close(data->physdev); ovsrcu_postpone(free, data); } @@ -134,6 +141,11 @@ struct flow_patterns { struct rte_flow_item *items; int cnt; int current_max; +struct netdev *physdev; +/* tnl_pmd_items is the opaque array of items returned by the PMD. */ +struct rte_flow_item *tnl_pmd_items; +uint32_t tnl_pmd_items_cnt; +struct ds s_tnl; }; struct flow_actions { @@ -154,16 +166,20 @@ struct flow_actions { static void dump_flow_attr(struct ds *s, struct ds *s_extra, const struct rte_flow_attr *attr, + struct flow_patterns *flow_patterns, struct flow_actions *flow_actions) { if (flow_actions->tnl_pmd_actions_cnt) { ds_clone(s_extra, _actions->s_tnl); +} else if (flow_patterns->tnl_pmd_items_cnt) { +ds_clone(s_extra, _patterns->s_tnl); } -ds_put_format(s, "%s%spriority %"PRIu32" group %"PRIu32" %s%s", +ds_put_format(s, "%s%spriority %"PRIu32" group %"PRIu32" %s%s%s", attr->ingress ? "ingress " : "", attr->egress ? "egress " : "", attr->priority, attr->group, attr->transfer ? "transfer " : "", - flow_actions->tnl_pmd_actions_cnt ? "tunnel_set 1 " : ""); + flow_actions->tnl_pmd_actions_cnt ? "tunnel_set 1 " : "", + flow_patterns->tnl_pmd_items_cnt ? "tunnel_match 1 " : ""); } /* Adds one pattern item 'field' with the 'mask' to dynamic string 's' using @@ -177,9 +193,18 @@ dump_flow_attr(struct ds *s, struct ds *s_extra, } static void -dump_flow_pattern(struct ds *s, const struct rte_flow_item *item) +dump_flow_pattern(struct ds *s, + struct flow_patterns *flow_patterns, + int pattern_index) { -if (item->type == RTE_FLOW_ITEM_TYPE_ETH) { +const struct rte_flow_item *item = _patterns->items[pattern_index]; + +if (item->type == RTE_FLOW_ITEM_TYPE_END) { +ds_put_cstr(s, "end "); +} else if (flow_patterns->tnl_pmd_items_cnt && + pattern_index < flow_patterns->tnl_pmd_items_cnt) { +return; +} else if (item->type == RTE_FLOW_ITEM_TYPE_ETH) { const struct rte_flow_item_eth *eth_spec = item->spec; const struct rte_flow_item_eth *eth_mask = item->mask; @@ -569,19 +594,19 @@ dump_flow_action(struct ds *s, struct ds *s_extra, static struct ds * dump_flow(struct ds *s, struct ds *s_extra, const struct rte_flow_attr *attr, - const struct rte_flow_item *items, + struct flow_patterns *flow_patterns, struct flow_actions *flow_actions) { int i; if (attr) { -dump_flow_attr(s, s_extra, attr,
[ovs-dev] [PATCH V7 00/13] Netdev vxlan-decap offload
VXLAN decap in OVS-DPDK configuration consists of two flows: F1: in_port(ens1f0),eth(),ipv4(),udp(), actions:tnl_pop(vxlan_sys_4789) F2: tunnel(),in_port(vxlan_sys_4789),eth(),ipv4(), actions:ens1f0_0 F1 is a classification flow. It has outer headers matches and it classifies the packet as a VXLAN packet, and using tnl_pop action the packet continues processing in F2. F2 is a flow that has matches on tunnel metadata as well as on the inner packet headers (as any other flow). In order to fully offload VXLAN decap path, both F1 and F2 should be offloaded. As there are more than one flow in HW, it is possible that F1 is done by HW but F2 is not. Packet is received by SW, and should be processed starting from F2 as F1 was already done by HW. Rte_flows are applicable only on physical port IDs. Keeping the original physical in port on which the packet is received on enables applying vport flows (e.g. F2) on that physical port. This patch-set makes use of [1] introduced in DPDK 20.11, that adds API for tunnel offloads. Note that MLX5 PMD has a bug that the tnl_pop private actions must be first. In OVS it is not. Fixing this issue is scheduled for 21.05 (and stable 20.11.2). Meanwhile, tests were done with a workaround for it [2]. v2-v1: - Tracking original in_port, and applying vport on that physical port instead of all PFs. v3-v2: - Traversing ports using a new API instead of flow_dump. - Refactor packet state recover logic, with bug fix for error pop_header. - One ref count for netdev in non-tunnel case. - Rename variables, comments, rebase. v4-v3: - Extract orig_in_port from physdev for flow modify. - Miss handling fixes. v5-v4: - Drop refactor offload rule creation commit. - Comment about setting in_port in restore. - Refactor vports flow offload commit. v6-v5: - Fixed duplicate netdev ref bug. v7-v6: - Adopting Ilya's diff, with a minor fix in set_error stub. - Fixed abort (remove OVS_NOT_REACHED()) with tunnels other than vxlan ("netdev-offload-dpdk: Support tunnel pop action."). Travis: v1: https://travis-ci.org/github/elibritstein/OVS/builds/756418552 v2: https://travis-ci.org/github/elibritstein/OVS/builds/758382963 v3: https://travis-ci.org/github/elibritstein/OVS/builds/761089087 v4: https://travis-ci.org/github/elibritstein/OVS/builds/763146966 v5: https://travis-ci.org/github/elibritstein/OVS/builds/765271879 v6: https://travis-ci.org/github/elibritstein/OVS/builds/765816800 v7: Have a problem to run GitHub Actions: v1: https://github.com/elibritstein/OVS/actions/runs/515334647 v2: https://github.com/elibritstein/OVS/actions/runs/554986007 v3: https://github.com/elibritstein/OVS/actions/runs/613226225 v4: https://github.com/elibritstein/OVS/actions/runs/658415274 v5: https://github.com/elibritstein/OVS/actions/runs/704357369 v6: https://github.com/elibritstein/OVS/actions/runs/716304028 v7: https://github.com/elibritstein/OVS/actions/runs/964875737 [1] https://mails.dpdk.org/archives/dev/2020-October/187314.html [2] https://github.com/elibritstein/dpdk-stable/pull/1 Eli Britstein (10): netdev-offload: Add HW miss packet state recover API. netdev-dpdk: Introduce DPDK tunnel APIs. netdev-offload: Introduce an API to traverse ports. netdev-dpdk: Add flow_api support for netdev vxlan vports. netdev-offload-dpdk: Implement HW miss packet recover for vport. dpif-netdev: Add HW miss packet state recover logic. netdev-offload-dpdk: Change log rate limits. netdev-offload-dpdk: Support tunnel pop action. netdev-offload-dpdk: Support vports flows offload. netdev-dpdk-offload: Add vxlan pattern matching function. Ilya Maximets (2): netdev-offload: Allow offloading to netdev without ifindex. netdev-offload: Disallow offloading to unrelated tunneling vports. Sriharsha Basavapatna (1): dpif-netdev: Provide orig_in_port in metadata for tunneled packets. Documentation/howto/dpdk.rst | 1 + NEWS | 4 + lib/dpif-netdev.c | 68 +++- lib/netdev-dpdk.c | 118 ++ lib/netdev-dpdk.h | 103 - lib/netdev-offload-dpdk.c | 708 +++--- lib/netdev-offload-provider.h | 6 + lib/netdev-offload-tc.c | 8 + lib/netdev-offload.c | 47 ++- lib/netdev-offload.h | 10 + lib/packets.h | 8 +- 11 files changed, 1011 insertions(+), 70 deletions(-) -- 2.28.0.2311.g225365fb51 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH V7 08/13] netdev-offload: Disallow offloading to unrelated tunneling vports.
From: Ilya Maximets 'linux_tc' flow API suitable only for tunneling vports with backing linux interfaces. DPDK flow API is not suitable for such ports. With this change we could drop vport restriction from dpif-netdev. This is a prerequisite for enabling vport offloading in DPDK. Signed-off-by: Ilya Maximets Signed-off-by: Eli Britstein Reviewed-by: Gaetan Rivet Acked-by: Sriharsha Basavapatna Tested-by: Emma Finn Tested-by: Marko Kovacevic --- lib/dpif-netdev.c | 3 +-- lib/netdev-offload-dpdk.c | 8 lib/netdev-offload-tc.c | 8 3 files changed, 17 insertions(+), 2 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index d5b7d5b73..0a633c582 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -2699,8 +2699,7 @@ dp_netdev_flow_offload_put(struct dp_flow_offload_item *offload) info.flow_mark = mark; port = netdev_ports_get(in_port, dpif_type_str); -if (!port || netdev_vport_is_vport_class(port->netdev_class)) { -netdev_close(port); +if (!port) { goto err_free; } /* Taking a global 'port_mutex' to fulfill thread safety restrictions for diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c index 33cd64dc9..fa25e059a 100644 --- a/lib/netdev-offload-dpdk.c +++ b/lib/netdev-offload-dpdk.c @@ -24,6 +24,7 @@ #include "dpif-netdev.h" #include "netdev-offload-provider.h" #include "netdev-provider.h" +#include "netdev-vport.h" #include "openvswitch/match.h" #include "openvswitch/vlog.h" #include "packets.h" @@ -1523,6 +1524,13 @@ netdev_offload_dpdk_flow_del(struct netdev *netdev OVS_UNUSED, static int netdev_offload_dpdk_init_flow_api(struct netdev *netdev) { +if (netdev_vport_is_vport_class(netdev->netdev_class) +&& !strcmp(netdev_get_dpif_type(netdev), "system")) { +VLOG_DBG("%s: vport belongs to the system datapath. Skipping.", + netdev_get_name(netdev)); +return EOPNOTSUPP; +} + return netdev_dpdk_flow_api_supported(netdev) ? 0 : EOPNOTSUPP; } diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c index 41acbdeb7..27633e04d 100644 --- a/lib/netdev-offload-tc.c +++ b/lib/netdev-offload-tc.c @@ -31,6 +31,7 @@ #include "netdev-linux.h" #include "netdev-offload-provider.h" #include "netdev-provider.h" +#include "netdev-vport.h" #include "netlink.h" #include "netlink-socket.h" #include "odp-netlink.h" @@ -2226,6 +2227,13 @@ netdev_tc_init_flow_api(struct netdev *netdev) int ifindex; int error; +if (netdev_vport_is_vport_class(netdev->netdev_class) +&& strcmp(netdev_get_dpif_type(netdev), "system")) { +VLOG_DBG("%s: vport doesn't belong to the system datapath. Skipping.", + netdev_get_name(netdev)); +return EOPNOTSUPP; +} + ifindex = netdev_get_ifindex(netdev); if (ifindex < 0) { VLOG_INFO("init: failed to get ifindex for %s: %s", -- 2.28.0.2311.g225365fb51 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH V7 07/13] netdev-offload: Allow offloading to netdev without ifindex.
From: Ilya Maximets Virtual interfaces like vports or dpdk vhost-user ports have no proper ifindex, while still supporting some offloads. This is a prerequisite for tunneling vport offloading with DPDK flow API. Signed-off-by: Ilya Maximets Signed-off-by: Eli Britstein Reviewed-by: Gaetan Rivet Acked-by: Sriharsha Basavapatna Tested-by: Emma Finn Tested-by: Marko Kovacevic --- lib/netdev-offload.c | 17 ++--- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/lib/netdev-offload.c b/lib/netdev-offload.c index 10f543018..8075cfbd8 100644 --- a/lib/netdev-offload.c +++ b/lib/netdev-offload.c @@ -579,10 +579,6 @@ netdev_ports_insert(struct netdev *netdev, const char *dpif_type, struct port_to_netdev_data *data; int ifindex = netdev_get_ifindex(netdev); -if (ifindex < 0) { -return ENODEV; -} - ovs_rwlock_wrlock(_hmap_rwlock); if (netdev_ports_lookup(dpif_port->port_no, dpif_type)) { ovs_rwlock_unlock(_hmap_rwlock); @@ -592,13 +588,18 @@ netdev_ports_insert(struct netdev *netdev, const char *dpif_type, data = xzalloc(sizeof *data); data->netdev = netdev_ref(netdev); dpif_port_clone(>dpif_port, dpif_port); -data->ifindex = ifindex; + +if (ifindex >= 0) { +data->ifindex = ifindex; +hmap_insert(_to_port, >ifindex_node, ifindex); +} else { +data->ifindex = -1; +} netdev_set_dpif_type(netdev, dpif_type); hmap_insert(_to_netdev, >portno_node, netdev_ports_hash(dpif_port->port_no, dpif_type)); -hmap_insert(_to_port, >ifindex_node, ifindex); ovs_rwlock_unlock(_hmap_rwlock); netdev_init_flow_api(netdev); @@ -634,7 +635,9 @@ netdev_ports_remove(odp_port_t port_no, const char *dpif_type) dpif_port_destroy(>dpif_port); netdev_close(data->netdev); /* unref and possibly close */ hmap_remove(_to_netdev, >portno_node); -hmap_remove(_to_port, >ifindex_node); +if (data->ifindex >= 0) { +hmap_remove(_to_port, >ifindex_node); +} free(data); ret = 0; } -- 2.28.0.2311.g225365fb51 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH V7 06/13] dpif-netdev: Add HW miss packet state recover logic.
Recover the packet if it was partially processed by the HW. Fallback to lookup flow by mark association. Signed-off-by: Eli Britstein Reviewed-by: Gaetan Rivet Acked-by: Sriharsha Basavapatna Tested-by: Emma Finn Tested-by: Marko Kovacevic Signed-off-by: Ilya Maximets --- lib/dpif-netdev.c | 45 + 1 file changed, 41 insertions(+), 4 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 8fa7eb6d4..d5b7d5b73 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -114,6 +114,7 @@ COVERAGE_DEFINE(datapath_drop_invalid_port); COVERAGE_DEFINE(datapath_drop_invalid_bond); COVERAGE_DEFINE(datapath_drop_invalid_tnl_port); COVERAGE_DEFINE(datapath_drop_rx_invalid_packet); +COVERAGE_DEFINE(datapath_drop_hw_miss_recover); /* Protects against changes to 'dp_netdevs'. */ static struct ovs_mutex dp_netdev_mutex = OVS_MUTEX_INITIALIZER; @@ -7062,6 +7063,39 @@ smc_lookup_batch(struct dp_netdev_pmd_thread *pmd, pmd_perf_update_counter(>perf_stats, PMD_STAT_SMC_HIT, n_smc_hit); } +static struct tx_port * pmd_send_port_cache_lookup( +const struct dp_netdev_pmd_thread *pmd, odp_port_t port_no); + +static inline int +dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd, + odp_port_t port_no, + struct dp_packet *packet, + struct dp_netdev_flow **flow) +{ +struct tx_port *p; +uint32_t mark; + +/* Restore the packet if HW processing was terminated before completion. */ +p = pmd_send_port_cache_lookup(pmd, port_no); +if (OVS_LIKELY(p)) { +int err = netdev_hw_miss_packet_recover(p->port->netdev, packet); + +if (err && err != EOPNOTSUPP) { +COVERAGE_INC(datapath_drop_hw_miss_recover); +return -1; +} +} + +/* If no mark, no flow to find. */ +if (!dp_packet_has_flow_mark(packet, )) { +*flow = NULL; +return 0; +} + +*flow = mark_to_flow_find(pmd, mark); +return 0; +} + /* Try to process all ('cnt') the 'packets' using only the datapath flow cache * 'pmd->flow_cache'. If a flow is not found for a packet 'packets[i]', the * miniflow is copied into 'keys' and the packet pointer is moved at the @@ -7106,7 +7140,6 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd, DP_PACKET_BATCH_REFILL_FOR_EACH (i, cnt, packet, packets_) { struct dp_netdev_flow *flow; -uint32_t mark; if (OVS_UNLIKELY(dp_packet_size(packet) < ETH_HEADER_LEN)) { dp_packet_delete(packet); @@ -7125,9 +7158,13 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd, pkt_metadata_init(>md, port_no); } -if ((*recirc_depth_get() == 0) && -dp_packet_has_flow_mark(packet, )) { -flow = mark_to_flow_find(pmd, mark); +if (netdev_is_flow_api_enabled() && *recirc_depth_get() == 0) { +if (OVS_UNLIKELY(dp_netdev_hw_flow(pmd, port_no, packet, ))) { +/* Packet restoration failed and it was dropped, do not + * continue processing. + */ +continue; +} if (OVS_LIKELY(flow)) { tcp_flags = parse_tcp_flags(packet); if (OVS_LIKELY(batch_enable)) { -- 2.28.0.2311.g225365fb51 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH V7 04/13] netdev-dpdk: Add flow_api support for netdev vxlan vports.
Add the acceptance of vxlan devices to netdev_dpdk_flow_api_supported() API, to allow offloading of DPDK vxlan devices. Signed-off-by: Eli Britstein Reviewed-by: Gaetan Rivet Acked-by: Sriharsha Basavapatna Tested-by: Emma Finn Tested-by: Marko Kovacevic Signed-off-by: Ilya Maximets --- lib/netdev-dpdk.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 4dfed1ebe..bc5485d60 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -5216,6 +5216,12 @@ netdev_dpdk_flow_api_supported(struct netdev *netdev) struct netdev_dpdk *dev; bool ret = false; +if (!strcmp(netdev_get_type(netdev), "vxlan") && +!strcmp(netdev_get_dpif_type(netdev), "netdev")) { +ret = true; +goto out; +} + if (!is_dpdk_class(netdev->netdev_class)) { goto out; } -- 2.28.0.2311.g225365fb51 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH V7 10/13] netdev-offload-dpdk: Support tunnel pop action.
Support tunnel pop action. Signed-off-by: Eli Britstein Reviewed-by: Gaetan Rivet Acked-by: Sriharsha Basavapatna Tested-by: Emma Finn Tested-by: Marko Kovacevic Signed-off-by: Ilya Maximets --- Documentation/howto/dpdk.rst | 1 + NEWS | 2 + lib/netdev-offload-dpdk.c| 187 --- 3 files changed, 175 insertions(+), 15 deletions(-) diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst index f0d45e47b..36314c06a 100644 --- a/Documentation/howto/dpdk.rst +++ b/Documentation/howto/dpdk.rst @@ -398,6 +398,7 @@ Supported actions for hardware offload are: - VLAN Push/Pop (push_vlan/pop_vlan). - Modification of IPv6 (set_field:->ipv6_src/ipv6_dst/mod_nw_ttl). - Clone/output (tnl_push and output) for encapsulating over a tunnel. +- Tunnel pop, for packets received on physical ports. Further Reading --- diff --git a/NEWS b/NEWS index db3faf4cc..6f8e62f7f 100644 --- a/NEWS +++ b/NEWS @@ -17,6 +17,8 @@ Post-v2.15.0 * OVS validated with DPDK 20.11.1. It is recommended to use this version until further releases. * New debug appctl command 'dpdk/get-malloc-stats'. + * Add hardware offload support for tunnel pop action (experimental). + Available only if DPDK experimantal APIs enabled during the build. - ovsdb-tool: * New option '--election-timer' to the 'create-cluster' command to set the leader election timer during cluster creation. diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c index 2495453af..99c62c906 100644 --- a/lib/netdev-offload-dpdk.c +++ b/lib/netdev-offload-dpdk.c @@ -140,15 +140,30 @@ struct flow_actions { struct rte_flow_action *actions; int cnt; int current_max; +struct netdev *tnl_netdev; +/* tnl_pmd_actions is the opaque array of actions returned by the PMD. */ +struct rte_flow_action *tnl_pmd_actions; +uint32_t tnl_pmd_actions_cnt; +/* tnl_pmd_actions_pos is where the tunnel actions starts within the + * 'actions' field. + */ +int tnl_pmd_actions_pos; +struct ds s_tnl; }; static void -dump_flow_attr(struct ds *s, const struct rte_flow_attr *attr) +dump_flow_attr(struct ds *s, struct ds *s_extra, + const struct rte_flow_attr *attr, + struct flow_actions *flow_actions) { -ds_put_format(s, "%s%spriority %"PRIu32" group %"PRIu32" %s", +if (flow_actions->tnl_pmd_actions_cnt) { +ds_clone(s_extra, _actions->s_tnl); +} +ds_put_format(s, "%s%spriority %"PRIu32" group %"PRIu32" %s%s", attr->ingress ? "ingress " : "", attr->egress ? "egress " : "", attr->priority, attr->group, - attr->transfer ? "transfer " : ""); + attr->transfer ? "transfer " : "", + flow_actions->tnl_pmd_actions_cnt ? "tunnel_set 1 " : ""); } /* Adds one pattern item 'field' with the 'mask' to dynamic string 's' using @@ -395,9 +410,19 @@ dump_vxlan_encap(struct ds *s, const struct rte_flow_item *items) static void dump_flow_action(struct ds *s, struct ds *s_extra, - const struct rte_flow_action *actions) + struct flow_actions *flow_actions, int act_index) { -if (actions->type == RTE_FLOW_ACTION_TYPE_MARK) { +const struct rte_flow_action *actions = _actions->actions[act_index]; + +if (actions->type == RTE_FLOW_ACTION_TYPE_END) { +ds_put_cstr(s, "end"); +} else if (flow_actions->tnl_pmd_actions_cnt && + act_index >= flow_actions->tnl_pmd_actions_pos && + act_index < flow_actions->tnl_pmd_actions_pos + + flow_actions->tnl_pmd_actions_cnt) { +/* Opaque PMD tunnel actions are skipped. */ +return; +} else if (actions->type == RTE_FLOW_ACTION_TYPE_MARK) { const struct rte_flow_action_mark *mark = actions->conf; ds_put_cstr(s, "mark "); @@ -528,6 +553,14 @@ dump_flow_action(struct ds *s, struct ds *s_extra, ds_put_cstr(s, "vxlan_encap / "); dump_vxlan_encap(s_extra, items); ds_put_cstr(s_extra, ";"); +} else if (actions->type == RTE_FLOW_ACTION_TYPE_JUMP) { +const struct rte_flow_action_jump *jump = actions->conf; + +ds_put_cstr(s, "jump "); +if (jump) { +ds_put_format(s, "group %"PRIu32" ", jump->group); +} +ds_put_cstr(s, "/ "); } else { ds_put_format(s, "unknown rte flow action (%d)\n", actions->type); } @@ -537,20 +570,21 @@ static struct ds * dump_flow(struct ds *s, struct ds *s_extra, const struct rte_flow_attr *attr, const struct rte_flow_item *items, - const struct rte_flow_action *actions) + struct flow_actions *flow_actions) { +int i; + if (attr) { -dump_flow_attr(s, attr); +dump_flow_attr(s, s_extra, attr, flow_actions); }
[ovs-dev] [PATCH V7 01/13] netdev-offload: Add HW miss packet state recover API.
When the HW offload involves multiple flows, like in tunnel decap path, it is possible that not all flows in the path are offloaded, resulting in partial processing in HW. In order to proceed with rest of the processing in SW, the packet state has to be recovered as if it was processed in SW from the beginning. In the case of tunnel decap, potential state to recover could be the outer tunneling layer to metadata. Add an API for that. Signed-off-by: Eli Britstein Reviewed-by: Gaetan Rivet Acked-by: Sriharsha Basavapatna Tested-by: Emma Finn Tested-by: Marko Kovacevic Signed-off-by: Ilya Maximets --- lib/netdev-offload-provider.h | 6 ++ lib/netdev-offload.c | 12 lib/netdev-offload.h | 1 + 3 files changed, 19 insertions(+) diff --git a/lib/netdev-offload-provider.h b/lib/netdev-offload-provider.h index cf859d1b4..348ca7081 100644 --- a/lib/netdev-offload-provider.h +++ b/lib/netdev-offload-provider.h @@ -87,6 +87,12 @@ struct netdev_flow_api { * Return 0 if successful, otherwise returns a positive errno value. */ int (*flow_get_n_flows)(struct netdev *, uint64_t *n_flows); +/* Recover the packet state (contents and data) for continued processing + * in software. + * Return 0 if successful, otherwise returns a positive errno value and + * takes ownership of a packet if errno != EOPNOTSUPP. */ +int (*hw_miss_packet_recover)(struct netdev *, struct dp_packet *); + /* Initializies the netdev flow api. * Return 0 if successful, otherwise returns a positive errno value. */ int (*init_flow_api)(struct netdev *); diff --git a/lib/netdev-offload.c b/lib/netdev-offload.c index 6237667c3..e5d24651f 100644 --- a/lib/netdev-offload.c +++ b/lib/netdev-offload.c @@ -253,6 +253,18 @@ netdev_flow_put(struct netdev *netdev, struct match *match, : EOPNOTSUPP; } +int +netdev_hw_miss_packet_recover(struct netdev *netdev, + struct dp_packet *packet) +{ +const struct netdev_flow_api *flow_api = +ovsrcu_get(const struct netdev_flow_api *, >flow_api); + +return (flow_api && flow_api->hw_miss_packet_recover) +? flow_api->hw_miss_packet_recover(netdev, packet) +: EOPNOTSUPP; +} + int netdev_flow_get(struct netdev *netdev, struct match *match, struct nlattr **actions, const ovs_u128 *ufid, diff --git a/lib/netdev-offload.h b/lib/netdev-offload.h index 18b48790f..b063c43a3 100644 --- a/lib/netdev-offload.h +++ b/lib/netdev-offload.h @@ -89,6 +89,7 @@ bool netdev_flow_dump_next(struct netdev_flow_dump *, struct match *, int netdev_flow_put(struct netdev *, struct match *, struct nlattr *actions, size_t actions_len, const ovs_u128 *, struct offload_info *, struct dpif_flow_stats *); +int netdev_hw_miss_packet_recover(struct netdev *, struct dp_packet *); int netdev_flow_get(struct netdev *, struct match *, struct nlattr **actions, const ovs_u128 *, struct dpif_flow_stats *, struct dpif_flow_attrs *, struct ofpbuf *wbuffer); -- 2.28.0.2311.g225365fb51 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH V7 11/13] dpif-netdev: Provide orig_in_port in metadata for tunneled packets.
From: Sriharsha Basavapatna When an encapsulated packet is recirculated through a TUNNEL_POP action, the metadata gets reinitialized and the originating physical port information is lost. When this flow gets processed by the vport and it needs to be offloaded, we can't figure out the physical port through which the tunneled packet was received. Add a new member to the metadata: 'orig_in_port'. This is passed to the next stage during recirculation and the offload layer can use it to offload the flow to this physical port. Signed-off-by: Sriharsha Basavapatna Signed-off-by: Eli Britstein Reviewed-by: Gaetan Rivet Tested-by: Emma Finn Tested-by: Marko Kovacevic Signed-off-by: Ilya Maximets --- lib/dpif-netdev.c| 20 ++-- lib/netdev-offload.h | 1 + lib/packets.h| 8 +++- 3 files changed, 22 insertions(+), 7 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 0a633c582..8766a00ea 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -432,6 +432,7 @@ struct dp_flow_offload_item { struct match match; struct nlattr *actions; size_t actions_len; +odp_port_t orig_in_port; /* Originating in_port for tnl flows. */ struct ovs_list node; }; @@ -2697,11 +2698,13 @@ dp_netdev_flow_offload_put(struct dp_flow_offload_item *offload) } } info.flow_mark = mark; +info.orig_in_port = offload->orig_in_port; port = netdev_ports_get(in_port, dpif_type_str); if (!port) { goto err_free; } + /* Taking a global 'port_mutex' to fulfill thread safety restrictions for * the netdev-offload-dpdk module. */ ovs_mutex_lock(>dp->port_mutex); @@ -2799,7 +2802,8 @@ queue_netdev_flow_del(struct dp_netdev_pmd_thread *pmd, static void queue_netdev_flow_put(struct dp_netdev_pmd_thread *pmd, struct dp_netdev_flow *flow, struct match *match, - const struct nlattr *actions, size_t actions_len) + const struct nlattr *actions, size_t actions_len, + odp_port_t orig_in_port) { struct dp_flow_offload_item *offload; int op; @@ -2825,6 +2829,7 @@ queue_netdev_flow_put(struct dp_netdev_pmd_thread *pmd, offload->actions = xmalloc(actions_len); memcpy(offload->actions, actions, actions_len); offload->actions_len = actions_len; +offload->orig_in_port = orig_in_port; dp_netdev_append_flow_offload(offload); } @@ -3626,7 +3631,8 @@ dp_netdev_get_mega_ufid(const struct match *match, ovs_u128 *mega_ufid) static struct dp_netdev_flow * dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd, struct match *match, const ovs_u128 *ufid, - const struct nlattr *actions, size_t actions_len) + const struct nlattr *actions, size_t actions_len, + odp_port_t orig_in_port) OVS_REQUIRES(pmd->flow_mutex) { struct ds extra_info = DS_EMPTY_INITIALIZER; @@ -3692,7 +3698,8 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd, cmap_insert(>flow_table, CONST_CAST(struct cmap_node *, >node), dp_netdev_flow_hash(>ufid)); -queue_netdev_flow_put(pmd, flow, match, actions, actions_len); +queue_netdev_flow_put(pmd, flow, match, actions, actions_len, + orig_in_port); if (OVS_UNLIKELY(!VLOG_DROP_DBG((_rl { struct ds ds = DS_EMPTY_INITIALIZER; @@ -3763,7 +3770,7 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd, if (!netdev_flow) { if (put->flags & DPIF_FP_CREATE) { dp_netdev_flow_add(pmd, match, ufid, put->actions, - put->actions_len); + put->actions_len, ODPP_NONE); } else { error = ENOENT; } @@ -3779,7 +3786,7 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd, ovsrcu_set(_flow->actions, new_actions); queue_netdev_flow_put(pmd, netdev_flow, match, - put->actions, put->actions_len); + put->actions, put->actions_len, ODPP_NONE); if (stats) { get_dpif_flow_status(pmd->dp, netdev_flow, stats, NULL); @@ -7253,6 +7260,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd, ovs_u128 ufid; int error; uint64_t cycles = cycles_counter_update(>perf_stats); +odp_port_t orig_in_port = packet->md.orig_in_port; match.tun_md.valid = false; miniflow_expand(>mf, ); @@ -7302,7 +7310,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd, if (OVS_LIKELY(!netdev_flow)) { netdev_flow = dp_netdev_flow_add(pmd, , , add_actions->data, - add_actions->size); + add_actions->size, orig_in_port); } ovs_mutex_unlock(>flow_mutex);
[ovs-dev] [PATCH V7 03/13] netdev-offload: Introduce an API to traverse ports.
Introduce an API to traverse the ports added to the offload ports map, with a generic callback for each one. Signed-off-by: Eli Britstein Reviewed-by: Gaetan Rivet Acked-by: Sriharsha Basavapatna Tested-by: Emma Finn Tested-by: Marko Kovacevic Signed-off-by: Ilya Maximets --- lib/netdev-offload.c | 18 ++ lib/netdev-offload.h | 8 2 files changed, 26 insertions(+) diff --git a/lib/netdev-offload.c b/lib/netdev-offload.c index e5d24651f..10f543018 100644 --- a/lib/netdev-offload.c +++ b/lib/netdev-offload.c @@ -458,6 +458,24 @@ netdev_ports_flow_flush(const char *dpif_type) ovs_rwlock_unlock(_hmap_rwlock); } +void +netdev_ports_traverse(const char *dpif_type, + bool (*cb)(struct netdev *, odp_port_t, void *), + void *aux) +{ +struct port_to_netdev_data *data; + +ovs_rwlock_rdlock(_hmap_rwlock); +HMAP_FOR_EACH (data, portno_node, _to_netdev) { +if (netdev_get_dpif_type(data->netdev) == dpif_type) { +if (cb(data->netdev, data->dpif_port.port_no, aux)) { +break; +} +} +} +ovs_rwlock_unlock(_hmap_rwlock); +} + struct netdev_flow_dump ** netdev_ports_flow_dump_create(const char *dpif_type, int *ports, bool terse) { diff --git a/lib/netdev-offload.h b/lib/netdev-offload.h index b063c43a3..5bf89f891 100644 --- a/lib/netdev-offload.h +++ b/lib/netdev-offload.h @@ -113,6 +113,14 @@ struct netdev *netdev_ports_get(odp_port_t port, const char *dpif_type); int netdev_ports_remove(odp_port_t port, const char *dpif_type); odp_port_t netdev_ifindex_to_odp_port(int ifindex); +/* For each of the ports with dpif_type, call cb with the netdev and port + * number of the port, and an opaque user argument. + * The returned value is used to continue traversing upon false or stop if + * true. + */ +void netdev_ports_traverse(const char *dpif_type, + bool (*cb)(struct netdev *, odp_port_t, void *), + void *aux); struct netdev_flow_dump **netdev_ports_flow_dump_create( const char *dpif_type, int *ports, -- 2.28.0.2311.g225365fb51 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V7 05/13] netdev-offload-dpdk: Implement HW miss packet recover for vport.
Bleep bloop. Greetings Eli Britstein, I am a robot and I have tried out your patch. Thanks for your contribution. I encountered some error that I wasn't expecting. See the details below. checkpatch: WARNING: Unexpected sign-offs from developers who are not authors or co-authors or committers: Ilya Maximets Lines checked: 188, Warnings: 1, Errors: 0 Please check this out. If you feel there has been an error, please email acon...@redhat.com Thanks, 0-day Robot ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V7 11/13] dpif-netdev: Provide orig_in_port in metadata for tunneled packets.
Bleep bloop. Greetings Eli Britstein, I am a robot and I have tried out your patch. Thanks for your contribution. I encountered some error that I wasn't expecting. See the details below. checkpatch: WARNING: Unexpected sign-offs from developers who are not authors or co-authors or committers: Eli Britstein , Ilya Maximets Lines checked: 177, Warnings: 1, Errors: 0 Please check this out. If you feel there has been an error, please email acon...@redhat.com Thanks, 0-day Robot ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn] controller: Fix the wrong 'struct' type for 'pflow_output_data'.
On Wed, Jun 23, 2021 at 11:26 AM Mark Michelson wrote: > > Acked-by: Mark Michelson Thanks. Applied to the main branch. Numan > > On 6/22/21 4:10 PM, num...@ovn.org wrote: > > From: Numan Siddique > > > > 'pflow_output_data' should be of type 'struct ed_type_pflow_output' > > and not 'struct ed_type_lflow_output'. > > > > Fixes: e07e397b7ae("ovn-controller: Split logical flow and physical flow > > processing.") > > Signed-off-by: Numan Siddique > > --- > > controller/ovn-controller.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c > > index 1cfe4b713..3968ef059 100644 > > --- a/controller/ovn-controller.c > > +++ b/controller/ovn-controller.c > > @@ -2980,7 +2980,7 @@ main(int argc, char *argv[]) > > > > struct ed_type_lflow_output *lflow_output_data = > > engine_get_internal_data(_lflow_output); > > -struct ed_type_lflow_output *pflow_output_data = > > +struct ed_type_pflow_output *pflow_output_data = > > engine_get_internal_data(_pflow_output); > > struct ed_type_ct_zones *ct_zones_data = > > engine_get_internal_data(_ct_zones); > > > > ___ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] Path MTU discovery on GRE interfaces
Hi Jesse, Hi List I'm currently fighting with issues where TCP/UDP frames that are larger than the MTU of a GRE tunnel are dropped. I'm aware of the whys and how to work around the issue, but while looking for solutions i stumbled over the fact that: * [1] added PMTUD support to OVS * [2] disabled/removed with v1.9.0 respectively v1.10.0 the feature Even after some significant time looking through the history i haven't found a reason why this was removed, just that it was removed. I started some preliminary work to add PMTUD support to OVS (again), but the fact that it was removed 8 years ago seems to me like a red flag to not do it (again). Could someone fluent with the OVS history from 8 years ago shed some light on why PMTUD support was dropped? Any pointers to a thread on this topic? BR Matthias [1] https://mail.openvswitch.org/pipermail/ovs-git/2010-March/009936.html [2] https://www.openvswitch.org/releases/NEWS-2.15.0.txt ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [ovn] did we ever get OVN set up as an official LF project?
I don't see the charter, etc. in the OVN tree. I know we started the process but I don't know whether we finished it. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V7 01/13] netdev-offload: Add HW miss packet state recover API.
Bleep bloop. Greetings Eli Britstein, I am a robot and I have tried out your patch. Thanks for your contribution. I encountered some error that I wasn't expecting. See the details below. checkpatch: WARNING: Unexpected sign-offs from developers who are not authors or co-authors or committers: Ilya Maximets Lines checked: 81, Warnings: 1, Errors: 0 Please check this out. If you feel there has been an error, please email acon...@redhat.com Thanks, 0-day Robot ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V7 03/13] netdev-offload: Introduce an API to traverse ports.
Bleep bloop. Greetings Eli Britstein, I am a robot and I have tried out your patch. Thanks for your contribution. I encountered some error that I wasn't expecting. See the details below. checkpatch: WARNING: Unexpected sign-offs from developers who are not authors or co-authors or committers: Ilya Maximets Lines checked: 70, Warnings: 1, Errors: 0 Please check this out. If you feel there has been an error, please email acon...@redhat.com Thanks, 0-day Robot ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V7 06/13] dpif-netdev: Add HW miss packet state recover logic.
Bleep bloop. Greetings Eli Britstein, I am a robot and I have tried out your patch. Thanks for your contribution. I encountered some error that I wasn't expecting. See the details below. checkpatch: WARNING: Unexpected sign-offs from developers who are not authors or co-authors or committers: Ilya Maximets Lines checked: 98, Warnings: 1, Errors: 0 Please check this out. If you feel there has been an error, please email acon...@redhat.com Thanks, 0-day Robot ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V7 09/13] netdev-offload-dpdk: Change log rate limits.
Bleep bloop. Greetings Eli Britstein, I am a robot and I have tried out your patch. Thanks for your contribution. I encountered some error that I wasn't expecting. See the details below. checkpatch: WARNING: Unexpected sign-offs from developers who are not authors or co-authors or committers: Ilya Maximets Lines checked: 33, Warnings: 1, Errors: 0 Please check this out. If you feel there has been an error, please email acon...@redhat.com Thanks, 0-day Robot ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V7 12/13] netdev-offload-dpdk: Support vports flows offload.
Bleep bloop. Greetings Eli Britstein, I am a robot and I have tried out your patch. Thanks for your contribution. I encountered some error that I wasn't expecting. See the details below. checkpatch: WARNING: Unexpected sign-offs from developers who are not authors or co-authors or committers: Ilya Maximets Lines checked: 458, Warnings: 1, Errors: 0 Please check this out. If you feel there has been an error, please email acon...@redhat.com Thanks, 0-day Robot ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] Path MTU discovery on GRE interfaces
On Wed, 2021-06-23 at 10:06 -0700, Ben Pfaff wrote: > [updating Jesse's email address] > > On Wed, Jun 23, 2021 at 04:48:29PM +0200, Matthias May via dev wrote: > > I'm currently fighting with issues where TCP/UDP frames that are > > larger than the MTU of a GRE tunnel are dropped. > > I'm aware of the whys and how to work around the issue, but while > > looking for solutions i stumbled over the fact that: > > * [1] added PMTUD support to OVS > > * [2] disabled/removed with v1.9.0 respectively v1.10.0 the feature > > > > Even after some significant time looking through the history i > > haven't found a reason why this was removed, just that it > > was removed. > > > > I started some preliminary work to add PMTUD support to OVS > > (again), but the fact that it was removed 8 years ago seems > > to me like a red flag to not do it (again). > > > > Could someone fluent with the OVS history from 8 years ago shed > > some light on why PMTUD support was dropped? > > Any pointers to a thread on this topic? > > It was a layering violation. This caused problems like, for example, > not having a good IP address to send the "frag needed" message from. See also Aaron Conole's recent attempt to do some fragmentation handling when delivering to OVS ports with a smaller MTU. Since the tunnels have a smaller MTU for encapsulated traffic by necessity, things that need to send through the tunnel (like a container) must have a smaller MTU. But when something outside of the container's host sends a large UDP packet to the container, OVS fails to deliver that packet to the container's OVS port because its MTU is too small. We finally landed on using check_pkt_len to detect this condition and punt the ICMP reply to ovn-controller, but check_pkt_len isn't easily hardware offloadable :( And it would be great to just fragment this traffic to the right MTU in the first place, rather than have to send an ICMP reply or punt the fragmentation up to a controller. Dan ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH 3/5] dpif-netdev: Add group rxq scheduling assignment type.
Hey Kevin , Patch looks good to me. Builds fine , all test cases here http://patchwork.ozlabs.org/project/openvswitch/patch/20210316154532.127858-1-ktray...@redhat.com/ pass as well. The groups assignment works fine too. >From vswitchd logs: dpif_netdev|INFO|PMD auto load balance load threshold set to 50% dpif_netdev|INFO|PMD auto load balance is disabled dpif_netdev|INFO|PMD auto load balance improvement threshold set to 5% dpif_netdev|INFO|PMD auto load balance is disabled dpif_netdev|INFO|PMD auto load balance is enabled interval 1 mins, pmd load threshold 50%, improvement threshold 5% dpif_netdev|INFO|Rxq to PMD assignment mode changed to: 'group'. dpif_netdev|INFO|Performing pmd to rx queue assignment using group algorithm. ... ... dpif_netdev|DBG|PMD auto load balance performing dry run. dpif_netdev|DBG|There's no available (non-isolated) pmd thread on numa node 0. Port 'dpdk1' rx queue 0 will be assigned to a pmd on numa node 1. This may lead to reduced performance. dpif_netdev|DBG|Core 26 on numa node 1 assigned port 'dpdk1' rx queue 0. (measured processing cycles 117514888500). dpif_netdev|DBG|There's no available (non-isolated) pmd thread on numa node 0. Port 'dpdk1' rx queue 1 will be assigned to a pmd on numa node 1. This may lead to reduced performance. dpif_netdev|DBG|Core 27 on numa node 1 assigned port 'dpdk1' rx queue 1. (measured processing cycles 115517336048). dpif_netdev|DBG|There's no available (non-isolated) pmd thread on numa node 0. Port 'dpdk0' rx queue 1 will be assigned to a pmd on numa node 1. This may lead to reduced performance. dpif_netdev|DBG|Core 27 on numa node 1 assigned port 'dpdk0' rx queue 1. (measured processing cycles 79988). dpif_netdev|DBG|There's no available (non-isolated) pmd thread on numa node 0. Port 'dpdk0' rx queue 0 will be assigned to a pmd on numa node 1. This may lead to reduced performance. dpif_netdev|DBG|Core 27 on numa node 1 assigned port 'dpdk0' rx queue 0. (measured processing cycles 539222868). dpif_netdev|DBG|There's no available (non-isolated) pmd thread on numa node 0. Port 'dpdk0' rx queue 2 will be assigned to a pmd on numa node 1. This may lead to reduced performance. dpif_netdev|DBG|Core 27 on numa node 1 assigned port 'dpdk0' rx queue 2. (measured processing cycles 538244586). dpif_netdev|DBG|Current variance 1 Estimated variance 0 dpif_netdev|DBG|Variance improvement 100% dpif_netdev|DBG|PMD load variance improvement threshold 5% is met dpif_netdev|INFO|PMD auto load balance dry run. Requesting datapath reconfigure. dpif_netdev|INFO|Performing pmd to rx queue assignment using group algorithm. Some minor nits inline : > +enum sched_assignment_type { > +SCHED_ROUNDROBIN, > +SCHED_CYCLES, /* Default.*/ > +SCHED_GROUP, > +SCHED_MAX > +}; > + > /* Datapath based on the network device interface from netdev.h. > * > @@ -367,5 +374,5 @@ struct dp_netdev { > struct ovs_mutex tx_qid_pool_mutex; > /* Use measured cycles for rxq to pmd assignment. */ > -bool pmd_rxq_assign_cyc; > +enum sched_assignment_type pmd_rxq_assign_cyc; sched_type would be a better name perhaps ? > +static struct sched_pmd * > +get_lowest_num_rxq_pmd(struct sched_numa *numa) { > +struct sched_pmd *lowest_rxqs_sched_pmd = NULL; > +unsigned lowest_rxqs = UINT_MAX; n_lowest_rxqs is a bit more clear perhaps ? > + > +/* find the pmd with lowest number of rxqs */ > +for (unsigned i = 0; i < numa->n_pmds; i++) { > +struct sched_pmd *sched_pmd; > +unsigned num_rxqs; > + > +sched_pmd = >pmds[i]; > +num_rxqs = sched_pmd->n_rxq; > +if (sched_pmd->isolated) { > +continue; > +} > + > +/* If this current load is higher we can go to the next one */ Full stop at the end of the comment missing. May be check this once for the entire patch ? > +if (num_rxqs > lowest_rxqs) { > +continue; > +} > + if (num_rxqs < lowest_rxqs) { > + lowest_rxqs = num_rxqs; > + lowest_rxqs_sched_pmd = sched_pmd; > +} > +} > +return lowest_rxqs_sched_pmd; > +} > + > +static struct sched_pmd * > +get_lowest_proc_pmd(struct sched_numa *numa) { > +struct sched_pmd *lowest_loaded_sched_pmd = NULL; > +uint64_t lowest_load = UINT64_MAX; > + > +/* find the pmd with the lowest load */ > +for (unsigned i = 0; i < numa->n_pmds; i++) { > +struct sched_pmd *sched_pmd; > +uint64_t pmd_load; > + > +sched_pmd = >pmds[i]; > +if (sched_pmd->isolated) { > +continue; > +} > +pmd_load = sched_pmd->pmd_proc_cycles; > +/* If this current load is higher we can go to the next one */ > +if (pmd_load > lowest_load) { > +continue; > +} > + if (pmd_load < lowest_load) { > + lowest_load = pmd_load; > + lowest_loaded_sched_pmd = sched_pmd; > +} > +} > +return
Re: [ovs-dev] [PATCH V7 02/13] netdev-dpdk: Introduce DPDK tunnel APIs.
Bleep bloop. Greetings Eli Britstein, I am a robot and I have tried out your patch. Thanks for your contribution. I encountered some error that I wasn't expecting. See the details below. checkpatch: WARNING: Unexpected sign-offs from developers who are not authors or co-authors or committers: Ilya Maximets Lines checked: 265, Warnings: 1, Errors: 0 Please check this out. If you feel there has been an error, please email acon...@redhat.com Thanks, 0-day Robot ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V7 08/13] netdev-offload: Disallow offloading to unrelated tunneling vports.
Bleep bloop. Greetings Eli Britstein, I am a robot and I have tried out your patch. Thanks for your contribution. I encountered some error that I wasn't expecting. See the details below. checkpatch: WARNING: Unexpected sign-offs from developers who are not authors or co-authors or committers: Eli Britstein Lines checked: 93, Warnings: 1, Errors: 0 Please check this out. If you feel there has been an error, please email acon...@redhat.com Thanks, 0-day Robot ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V7 10/13] netdev-offload-dpdk: Support tunnel pop action.
Bleep bloop. Greetings Eli Britstein, I am a robot and I have tried out your patch. Thanks for your contribution. I encountered some error that I wasn't expecting. See the details below. checkpatch: WARNING: Unexpected sign-offs from developers who are not authors or co-authors or committers: Ilya Maximets Lines checked: 357, Warnings: 1, Errors: 0 Please check this out. If you feel there has been an error, please email acon...@redhat.com Thanks, 0-day Robot ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V7 13/13] netdev-dpdk-offload: Add vxlan pattern matching function.
Bleep bloop. Greetings Eli Britstein, I am a robot and I have tried out your patch. Thanks for your contribution. I encountered some error that I wasn't expecting. See the details below. checkpatch: WARNING: Unexpected sign-offs from developers who are not authors or co-authors or committers: Ilya Maximets Lines checked: 217, Warnings: 1, Errors: 0 Please check this out. If you feel there has been an error, please email acon...@redhat.com Thanks, 0-day Robot ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [v13 05/12] dpif-netdev: Add command to switch dpif implementation.
On Thu, Jun 17, 2021 at 05:18:18PM +0100, Cian Ferriter wrote: > From: Harry van Haaren > > This commit adds a new command to allow the user to switch > the active DPIF implementation at runtime. A probe function > is executed before switching the DPIF implementation, to ensure > the CPU is capable of running the ISA required. For example, the > below code will switch to the AVX512 enabled DPIF assuming > that the runtime CPU is capable of running AVX512 instructions: > > $ ovs-appctl dpif-netdev/dpif-set dpif_avx512 > > A new configuration flag is added to allow selection of the > default DPIF. This is useful for running the unit-tests against > the available DPIF implementations, without modifying each unit test. > > The design of the testing & validation for ISA optimized DPIF > implementations is based around the work already upstream for DPCLS. > Note however that a DPCLS lookup has no state or side-effects, allowing > the auto-validator implementation to perform multiple lookups and > provide consistent statistic counters. > > The DPIF component does have state, so running two implementations in > parallel and comparing output is not a valid testing method, as there > are changes in DPIF statistic counters (side effects). As a result, the > DPIF is tested directly against the unit-tests. > > Signed-off-by: Harry van Haaren > Co-authored-by: Cian Ferriter > Signed-off-by: Cian Ferriter > > --- > > v13: > - Add Docs items about the switch DPIF command here rather than in > later commit. > - Document operation in manpages as well as rST. > - Minor code refactoring to address review comments. > --- > Documentation/topics/dpdk/bridge.rst | 34 + > acinclude.m4 | 15 > configure.ac | 1 + > lib/automake.mk | 1 + > lib/dpif-netdev-avx512.c | 14 > lib/dpif-netdev-private-dpif.c | 103 +++ > lib/dpif-netdev-private-dpif.h | 49 - > lib/dpif-netdev-private-thread.h | 11 +-- > lib/dpif-netdev-unixctl.man | 3 + > lib/dpif-netdev.c| 89 +-- > 10 files changed, 304 insertions(+), 16 deletions(-) > create mode 100644 lib/dpif-netdev-private-dpif.c > > diff --git a/Documentation/topics/dpdk/bridge.rst > b/Documentation/topics/dpdk/bridge.rst > index 526d5c959..fafa8c821 100644 > --- a/Documentation/topics/dpdk/bridge.rst > +++ b/Documentation/topics/dpdk/bridge.rst > @@ -214,3 +214,37 @@ implementation :: > > Compile OVS in debug mode to have `ovs_assert` statements error out if > there is a mis-match in the DPCLS lookup implementation. > + > +Datapath Interface Performance > +-- > + > +The datapath interface (DPIF) or dp_netdev_input() is responsible for taking > +packets through the major components of the userspace datapath; such as > +miniflow_extract, EMC, SMC and DPCLS lookups, and a lot of the performance > +stats associated with the datapath. > + > +Just like with the SIMD DPCLS feature above, SIMD can be applied to the DPIF > to > +improve performance. > + > +By default, dpif_scalar is used. The DPIF implementation can be selected by > +name :: > + > +$ ovs-appctl dpif-netdev/dpif-set dpif_avx512 > +DPIF implementation set to dpif_avx512. > + > +$ ovs-appctl dpif-netdev/dpif-set dpif_scalar > +DPIF implementation set to dpif_scalar. > + > +Running Unit Tests with AVX512 DPIF > +~~~ > + > +Since the AVX512 DPIF is disabled by default, a compile time option is > +available in order to test it with the OVS unit test suite. When building > with > +a CPU that supports AVX512, use the following configure option :: > + > +$ ./configure --enable-dpif-default-avx512 > + > +The following line should be seen in the configure output when the above > option > +is used :: > + > +checking whether DPIF AVX512 is default implementation... yes > diff --git a/acinclude.m4 b/acinclude.m4 > index 15a54d636..5fbcd9872 100644 > --- a/acinclude.m4 > +++ b/acinclude.m4 > @@ -30,6 +30,21 @@ AC_DEFUN([OVS_CHECK_DPCLS_AUTOVALIDATOR], [ >fi > ]) > > +dnl Set OVS DPIF default implementation at configure time for running the > unit > +dnl tests on the whole codebase without modifying tests per DPIF impl > +AC_DEFUN([OVS_CHECK_DPIF_AVX512_DEFAULT], [ > + AC_ARG_ENABLE([dpif-default-avx512], > +[AC_HELP_STRING([--enable-dpif-default-avx512], [Enable DPIF > AVX512 implementation as default.])], > +[dpifavx512=yes],[dpifavx512=no]) > + AC_MSG_CHECKING([whether DPIF AVX512 is default implementation]) > + if test "$dpifavx512" != yes; then > +AC_MSG_RESULT([no]) > + else > +OVS_CFLAGS="$OVS_CFLAGS -DDPIF_AVX512_DEFAULT" > +AC_MSG_RESULT([yes]) > + fi > +]) > + > dnl OVS_ENABLE_WERROR > AC_DEFUN([OVS_ENABLE_WERROR], >[AC_ARG_ENABLE( > diff --git a/configure.ac
Re: [ovs-dev] [PATCH V7 00/13] Netdev vxlan-decap offload
On 6/23/21 5:52 PM, Eli Britstein wrote: > VXLAN decap in OVS-DPDK configuration consists of two flows: > F1: in_port(ens1f0),eth(),ipv4(),udp(), actions:tnl_pop(vxlan_sys_4789) > F2: tunnel(),in_port(vxlan_sys_4789),eth(),ipv4(), actions:ens1f0_0 > > F1 is a classification flow. It has outer headers matches and it > classifies the packet as a VXLAN packet, and using tnl_pop action the > packet continues processing in F2. > F2 is a flow that has matches on tunnel metadata as well as on the inner > packet headers (as any other flow). > > In order to fully offload VXLAN decap path, both F1 and F2 should be > offloaded. As there are more than one flow in HW, it is possible that > F1 is done by HW but F2 is not. Packet is received by SW, and should be > processed starting from F2 as F1 was already done by HW. > Rte_flows are applicable only on physical port IDs. Keeping the original > physical in port on which the packet is received on enables applying > vport flows (e.g. F2) on that physical port. > > This patch-set makes use of [1] introduced in DPDK 20.11, that adds API > for tunnel offloads. > > Note that MLX5 PMD has a bug that the tnl_pop private actions must be > first. In OVS it is not. > Fixing this issue is scheduled for 21.05 (and stable 20.11.2). > Meanwhile, tests were done with a workaround for it [2]. > > v2-v1: > - Tracking original in_port, and applying vport on that physical port instead > of all PFs. > v3-v2: > - Traversing ports using a new API instead of flow_dump. > - Refactor packet state recover logic, with bug fix for error pop_header. > - One ref count for netdev in non-tunnel case. > - Rename variables, comments, rebase. > v4-v3: > - Extract orig_in_port from physdev for flow modify. > - Miss handling fixes. > v5-v4: > - Drop refactor offload rule creation commit. > - Comment about setting in_port in restore. > - Refactor vports flow offload commit. > v6-v5: > - Fixed duplicate netdev ref bug. > v7-v6: > - Adopting Ilya's diff, with a minor fix in set_error stub. > - Fixed abort (remove OVS_NOT_REACHED()) with tunnels other than vxlan > ("netdev-offload-dpdk: Support tunnel pop action."). Thanks! I see the only difference (beside the set_error fix) with what I have locally is following: diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c index 363f32f71..6bd5b6c9f 100644 --- a/lib/netdev-offload-dpdk.c +++ b/lib/netdev-offload-dpdk.c @@ -835,7 +835,9 @@ vport_to_rte_tunnel(struct netdev *vport, netdev_dpdk_get_port_id(netdev)); } } else { -OVS_NOT_REACHED(); +VLOG_DBG_RL(, "vport type '%s' is not supported", +netdev_get_type(vport)); +return -1; } return 0; --- That looks good to me. So, I guess, Harsha, we're waiting for your review/tests here. > > Travis: > v1: https://travis-ci.org/github/elibritstein/OVS/builds/756418552 > v2: https://travis-ci.org/github/elibritstein/OVS/builds/758382963 > v3: https://travis-ci.org/github/elibritstein/OVS/builds/761089087 > v4: https://travis-ci.org/github/elibritstein/OVS/builds/763146966 > v5: https://travis-ci.org/github/elibritstein/OVS/builds/765271879 > v6: https://travis-ci.org/github/elibritstein/OVS/builds/765816800 > v7: Have a problem to run Yes, this thing is non-functional. Even travis-ci.com doesn't work for me for unknown reason (I do have compute credits). Best regards, Ilya Maximets. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2] tests: Add PMD auto load balance unit tests.
On 3/16/21 4:45 PM, Kevin Traynor wrote: > These tests focus on enabling/disabling and user parameters. > > Co-Authored-by: David Marchand > Signed-off-by: David Marchand > Signed-off-by: Kevin Traynor > > --- > v2: > - Remove above max documented interval test > - Add David's code to combine param checks and add as co-author > --- > tests/alb.at | 218 + > tests/automake.mk | 1 + > tests/testsuite.at | 1 + > 3 files changed, 220 insertions(+) > create mode 100644 tests/alb.at Hi, Kevin. While testing these tests I noticed one thing: get_log_line_num() returns current line and not the next one, so if log didn't change, several subsequent get_log_line_num + OVS_WAIT_UNTIL will succeed. Meaning that it maybe unreliable to test for the same text in a log two times in a row with some command in-between, because command may return faster than logs printed to a file and the check will be performed with the previous line in a log. Suggesting to increase the line number by one to avoid that. I understand that you ported this part from the pmd.at, so we, probably, need to fix that there too in a separate change. Suggesting following incremental: diff --git a/tests/alb.at b/tests/alb.at index 0ea1bbdd1..1331b742c 100644 --- a/tests/alb.at +++ b/tests/alb.at @@ -3,7 +3,7 @@ AT_BANNER([PMD Auto Load Balance]) m4_divert_push([PREPARE_TESTS]) get_log_line_num () { -LINENUM=$(cat ovs-vswitchd.log | wc -l | tr -d [[:blank:]]) +LINENUM=$(($(cat ovs-vswitchd.log | wc -l | tr -d [[:blank:]])+1)) } m4_divert_pop([PREPARE_TESTS]) @@ -21,7 +21,8 @@ m4_define([CHECK_ALB_PARAM], [ line_st="+0" fi OVS_WAIT_UNTIL([tail -n $line_st ovs-vswitchd.log | grep "PMD auto load balance $1 set to"]) -AT_CHECK([tail -n $line_st ovs-vswitchd.log | sed -n "s#.*\(PMD auto load balance $1 set to.*\)#\1#p" | tail -1], [0], [dnl +AT_CHECK([tail -n $line_st ovs-vswitchd.log dnl +| sed -n "s#.*\(PMD auto load balance $1 set to.*\)#\1#p" | tail -1], [0], [dnl PMD auto load balance $1 set to $2 ]) ]) @@ -107,7 +108,9 @@ OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD auto load balance get_log_line_num AT_CHECK([ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="false"]) OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD auto load balance is disabled"]) +get_log_line_num AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:pmd-rxq-assign=roundrobin]) +OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "mode changed to: 'roundrobin'"]) get_log_line_num AT_CHECK([ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"]) OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD auto load balance is disabled"]) --- What do you think? The check around 'roundrobin' is just in case, to be sure that log actually updated. Best regards, Ilya Maximets. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] dpif-netdev: apply subtable-lookup-prio-set on any datapath
Currently, if you try to set subtable-lookup-prio-set when you don't have any datapath (for example if an user wants to set AVX512 before creating any bridge) it sets it globally (dpcls_subtable_set_prio), but it returns an error: please specify an existing datapath ovs-appctl: ovs-vswitchd: server returned an error and, in this case, the exit code of ovs-appctl is 2. This commit changes the behaviour by removing the [dp] optional parameter of subtable-lookup-prio-set and by changing the priority level on any datapath and globally. This means if you don't have any datapath or if you have only one datapath, the behaviour is the same as now, but without the confusing error when you don't have any datapath. Signed-off-by: Timothy Redaelli Fixes: 3d018c3ea79d ("dpif-netdev: add subtable lookup prio set command.") Cc: harry.van.haa...@intel.com --- lib/dpif-netdev.c | 78 +++ 1 file changed, 32 insertions(+), 46 deletions(-) --- There is no need to change the documentation, since the manpage is not merged and it's currently part of another series [1], but the optional datapath parameter is not present in the patchset and so there is not need to change it. Currently the only document that contains a reference to subtable-lookup-prio-set (Documentation/topics/dpdk/bridge.rst), doesn't contain any reference to the optional datapath parameter too. [1] https://patchwork.ozlabs.org/project/openvswitch/patch/20210617161825.94741-9-cian.ferri...@intel.com/ diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 8fa7eb6d4..478eb7cf3 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -1345,13 +1345,15 @@ dpif_netdev_subtable_lookup_set(struct unixctl_conn *conn, int argc, /* This function requires 2 parameters (argv[1] and argv[2]) to execute. * argv[1] is subtable name * argv[2] is priority - * argv[3] is the datapath name (optional if only 1 datapath exists) */ const char *func_name = argv[1]; errno = 0; char *err_char; uint32_t new_prio = strtoul(argv[2], _char, 10); +uint32_t lookup_dpcls_changed = 0; +uint32_t lookup_subtable_changed = 0; +struct shash_node *node; if (errno != 0 || new_prio > UINT8_MAX) { unixctl_command_reply_error(conn, "error converting priority, use integer in range 0-255\n"); @@ -1365,58 +1367,42 @@ dpif_netdev_subtable_lookup_set(struct unixctl_conn *conn, int argc, return; } -/* argv[3] is optional datapath instance. If no datapath name is provided - * and only one datapath exists, the one existing datapath is reprobed. - */ ovs_mutex_lock(_netdev_mutex); -struct dp_netdev *dp = NULL; - -if (argc == 4) { -dp = shash_find_data(_netdevs, argv[3]); -} else if (shash_count(_netdevs) == 1) { -dp = shash_first(_netdevs)->data; -} - -if (!dp) { -ovs_mutex_unlock(_netdev_mutex); -unixctl_command_reply_error(conn, -"please specify an existing datapath"); -return; -} - -/* Get PMD threads list, required to get DPCLS instances. */ -size_t n; -uint32_t lookup_dpcls_changed = 0; -uint32_t lookup_subtable_changed = 0; -struct dp_netdev_pmd_thread **pmd_list; -sorted_poll_thread_list(dp, _list, ); +SHASH_FOR_EACH(node, _netdevs) { +struct dp_netdev *dp = node->data; -/* take port mutex as HMAP iters over them. */ -ovs_mutex_lock(>port_mutex); +/* Get PMD threads list, required to get DPCLS instances. */ +size_t n; +struct dp_netdev_pmd_thread **pmd_list; +sorted_poll_thread_list(dp, _list, ); -for (size_t i = 0; i < n; i++) { -struct dp_netdev_pmd_thread *pmd = pmd_list[i]; -if (pmd->core_id == NON_PMD_CORE_ID) { -continue; -} +/* take port mutex as HMAP iters over them. */ +ovs_mutex_lock(>port_mutex); -struct dp_netdev_port *port = NULL; -HMAP_FOR_EACH (port, node, >ports) { -odp_port_t in_port = port->port_no; -struct dpcls *cls = dp_netdev_pmd_lookup_dpcls(pmd, in_port); -if (!cls) { +for (size_t i = 0; i < n; i++) { +struct dp_netdev_pmd_thread *pmd = pmd_list[i]; +if (pmd->core_id == NON_PMD_CORE_ID) { continue; } -uint32_t subtbl_changes = dpcls_subtable_lookup_reprobe(cls); -if (subtbl_changes) { -lookup_dpcls_changed++; -lookup_subtable_changed += subtbl_changes; + +struct dp_netdev_port *port = NULL; +HMAP_FOR_EACH (port, node, >ports) { +odp_port_t in_port = port->port_no; +struct dpcls *cls = dp_netdev_pmd_lookup_dpcls(pmd, in_port); +if (!cls) { +continue; +} +uint32_t
Re: [ovs-dev] [PATCH] dpif-netdev: apply subtable-lookup-prio-set on any datapath
On 6/23/21 8:54 PM, Timothy Redaelli wrote: > Currently, if you try to set subtable-lookup-prio-set when you don't have > any datapath (for example if an user wants to set AVX512 before creating > any bridge) it sets it globally (dpcls_subtable_set_prio), > but it returns an error: > > please specify an existing datapath > ovs-appctl: ovs-vswitchd: server returned an error > > and, in this case, the exit code of ovs-appctl is 2. > > This commit changes the behaviour by removing the [dp] optional > parameter of subtable-lookup-prio-set and by changing the priority > level on any datapath and globally. This means if you don't have any > datapath or if you have only one datapath, the behaviour is the same as > now, but without the confusing error when you don't have any datapath. > > Signed-off-by: Timothy Redaelli > Fixes: 3d018c3ea79d ("dpif-netdev: add subtable lookup prio set command.") > Cc: harry.van.haa...@intel.com > --- > lib/dpif-netdev.c | 78 +++ > 1 file changed, 32 insertions(+), 46 deletions(-) > --- > There is no need to change the documentation, since the manpage is not > merged and it's currently part of another series [1], but the optional > datapath parameter is not present in the patchset and so there is not need > to change it. Currently the only document that contains a reference to > subtable-lookup-prio-set (Documentation/topics/dpdk/bridge.rst), > doesn't contain any reference to the optional datapath parameter too. > > [1] > https://patchwork.ozlabs.org/project/openvswitch/patch/20210617161825.94741-9-cian.ferri...@intel.com/ > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c > index 8fa7eb6d4..478eb7cf3 100644 > --- a/lib/dpif-netdev.c > +++ b/lib/dpif-netdev.c > @@ -1345,13 +1345,15 @@ dpif_netdev_subtable_lookup_set(struct unixctl_conn > *conn, int argc, > /* This function requires 2 parameters (argv[1] and argv[2]) to execute. > * argv[1] is subtable name > * argv[2] is priority > - * argv[3] is the datapath name (optional if only 1 datapath exists) > */ > const char *func_name = argv[1]; > > errno = 0; > char *err_char; > uint32_t new_prio = strtoul(argv[2], _char, 10); > +uint32_t lookup_dpcls_changed = 0; > +uint32_t lookup_subtable_changed = 0; > +struct shash_node *node; > if (errno != 0 || new_prio > UINT8_MAX) { > unixctl_command_reply_error(conn, > "error converting priority, use integer in range 0-255\n"); > @@ -1365,58 +1367,42 @@ dpif_netdev_subtable_lookup_set(struct unixctl_conn > *conn, int argc, > return; > } > > -/* argv[3] is optional datapath instance. If no datapath name is provided > - * and only one datapath exists, the one existing datapath is reprobed. > - */ > ovs_mutex_lock(_netdev_mutex); > -struct dp_netdev *dp = NULL; > - > -if (argc == 4) { > -dp = shash_find_data(_netdevs, argv[3]); > -} else if (shash_count(_netdevs) == 1) { > -dp = shash_first(_netdevs)->data; > -} > - > -if (!dp) { > -ovs_mutex_unlock(_netdev_mutex); > -unixctl_command_reply_error(conn, > -"please specify an existing datapath"); > -return; > -} > - > -/* Get PMD threads list, required to get DPCLS instances. */ > -size_t n; > -uint32_t lookup_dpcls_changed = 0; > -uint32_t lookup_subtable_changed = 0; > -struct dp_netdev_pmd_thread **pmd_list; > -sorted_poll_thread_list(dp, _list, ); > +SHASH_FOR_EACH(node, _netdevs) { > +struct dp_netdev *dp = node->data; > > -/* take port mutex as HMAP iters over them. */ > -ovs_mutex_lock(>port_mutex); > +/* Get PMD threads list, required to get DPCLS instances. */ > +size_t n; > +struct dp_netdev_pmd_thread **pmd_list; > +sorted_poll_thread_list(dp, _list, ); > > -for (size_t i = 0; i < n; i++) { > -struct dp_netdev_pmd_thread *pmd = pmd_list[i]; > -if (pmd->core_id == NON_PMD_CORE_ID) { > -continue; > -} > +/* take port mutex as HMAP iters over them. */ > +ovs_mutex_lock(>port_mutex); > > -struct dp_netdev_port *port = NULL; > -HMAP_FOR_EACH (port, node, >ports) { > -odp_port_t in_port = port->port_no; > -struct dpcls *cls = dp_netdev_pmd_lookup_dpcls(pmd, in_port); > -if (!cls) { > +for (size_t i = 0; i < n; i++) { > +struct dp_netdev_pmd_thread *pmd = pmd_list[i]; > +if (pmd->core_id == NON_PMD_CORE_ID) { > continue; > } > -uint32_t subtbl_changes = dpcls_subtable_lookup_reprobe(cls); > -if (subtbl_changes) { > -lookup_dpcls_changed++; > -lookup_subtable_changed += subtbl_changes; > + > +struct dp_netdev_port *port = NULL; > +
Re: [ovs-dev] [v13 06/12] dpif-netdev: Add command to get dpif implementations.
On Thu, Jun 17, 2021 at 05:18:19PM +0100, Cian Ferriter wrote: > From: Harry van Haaren > > This commit adds a new command to retrieve the list of available > DPIF implementations. This can be used by to check what implementations > of the DPIF are available in any given OVS binary. > > Usage: > $ ovs-appctl dpif-netdev/dpif-get I didn't mention this in the dpif-set but it would be great to have a more targeted command name, like dpif-impl-{get,set}. > > Signed-off-by: Harry van Haaren > > --- > > v13: > - Add NEWS item about DPIF get and set commands here rather than in a > later commit. > - Add documentation items about DPIF set commands here rather than in a > later commit. > --- > Documentation/topics/dpdk/bridge.rst | 8 > NEWS | 1 + > lib/dpif-netdev-private-dpif.c | 8 > lib/dpif-netdev-private-dpif.h | 6 ++ > lib/dpif-netdev-unixctl.man | 3 +++ > lib/dpif-netdev.c| 24 > 6 files changed, 50 insertions(+) > > diff --git a/Documentation/topics/dpdk/bridge.rst > b/Documentation/topics/dpdk/bridge.rst > index fafa8c821..f59e26cbe 100644 > --- a/Documentation/topics/dpdk/bridge.rst > +++ b/Documentation/topics/dpdk/bridge.rst > @@ -226,6 +226,14 @@ stats associated with the datapath. > Just like with the SIMD DPCLS feature above, SIMD can be applied to the DPIF > to > improve performance. > > +OVS provides multiple implementations of the DPIF. The available > +implementations can be listed with the following command :: > + > +$ ovs-appctl dpif-netdev/dpif-get > +Available DPIF implementations: > + dpif_scalar > + dpif_avx512 > + > By default, dpif_scalar is used. The DPIF implementation can be selected by > name :: > > diff --git a/NEWS b/NEWS > index 6a4a7b76d..c47ab349e 100644 > --- a/NEWS > +++ b/NEWS > @@ -12,6 +12,7 @@ Post-v2.15.0 > * Refactor lib/dpif-netdev.c to multiple header files. > * Add avx512 implementation of dpif which can process non recirculated > packets. It supports partial HWOL, EMC, SMC and DPCLS lookups. > + * Add commands to get and set the dpif implementations. > - ovs-ctl: > * New option '--no-record-hostname' to disable hostname configuration > in ovsdb on startup. > diff --git a/lib/dpif-netdev-private-dpif.c b/lib/dpif-netdev-private-dpif.c > index d829a7ee5..3649e775d 100644 > --- a/lib/dpif-netdev-private-dpif.c > +++ b/lib/dpif-netdev-private-dpif.c > @@ -73,6 +73,14 @@ dp_netdev_impl_set_default(dp_netdev_input_func func) > default_dpif_func = func; > } > > +uint32_t > +dp_netdev_impl_get(const struct dpif_netdev_impl_info_t **out_impls) > +{ > +ovs_assert(out_impls); > +*out_impls = dpif_impls; > +return ARRAY_SIZE(dpif_impls); > +} > + This could receive struct ds and fill with the internal details to keep internal details in private-dpif.c > /* This function checks all available DPIF implementations, and selects the > * returns the function pointer to the one requested by "name". > */ > diff --git a/lib/dpif-netdev-private-dpif.h b/lib/dpif-netdev-private-dpif.h > index a6db3c7f2..717e9e2f9 100644 > --- a/lib/dpif-netdev-private-dpif.h > +++ b/lib/dpif-netdev-private-dpif.h > @@ -48,6 +48,12 @@ struct dpif_netdev_impl_info_t { > const char *name; > }; > > +/* This function returns all available implementations to the caller. The > + * quantity of implementations is returned by the int return value. > + */ > +uint32_t > +dp_netdev_impl_get(const struct dpif_netdev_impl_info_t **out_impls); > + > /* This function checks all available DPIF implementations, and selects the > * returns the function pointer to the one requested by "name". > */ > diff --git a/lib/dpif-netdev-unixctl.man b/lib/dpif-netdev-unixctl.man > index b348940b0..534823879 100644 > --- a/lib/dpif-netdev-unixctl.man > +++ b/lib/dpif-netdev-unixctl.man > @@ -227,5 +227,8 @@ When this is the case, the above command prints the > load-balancing information > of the bonds configured in datapath \fIdp\fR showing the interface associated > with each bucket (hash). > . > +.IP "\fBdpif-netdev/dpif-get\fR > +Lists the DPIF implementations that are available. > +. > .IP "\fBdpif-netdev/dpif-set\fR \fIdpif_impl\fR" > Sets the DPIF to be used to \fIdpif_impl\fR. By default "dpif_scalar" is > used. > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c > index 9c234ef3d..59a44a848 100644 > --- a/lib/dpif-netdev.c > +++ b/lib/dpif-netdev.c > @@ -991,6 +991,27 @@ dpif_netdev_subtable_lookup_set(struct unixctl_conn > *conn, int argc, > ds_destroy(); > } > > +static void > +dpif_netdev_impl_get(struct unixctl_conn *conn, int argc OVS_UNUSED, > + const char *argv[] OVS_UNUSED, void *aux OVS_UNUSED) > +{ > +const struct dpif_netdev_impl_info_t *dpif_impls; then here you initialize 'reply', call dp_netdev_impl_get() and reply if it
Re: [ovs-dev] [ovn] did we ever get OVN set up as an official LF project?
On Wed, Jun 23, 2021 at 03:21:37PM -0400, Numan Siddique wrote: > On Wed, Jun 23, 2021 at 1:58 PM Ben Pfaff wrote: > > > > I don't see the charter, etc. in the OVN tree. I know we started the > > process but I don't know whether we finished it. > > I'm not sure about the official status but I presumed that OVN is part of LF. > > Trishan de Lanerolle (CC'ed) has reached out to me and other OVS folks earlier > to get updates about OVS and OVN. > > Maybe he can answer your question. OVS was an LF project and still is. OVN was started as a part of it. Later, we split it off and I believe that we initiated the process to make it a separate LF project. I don't know whether we finished that. If we did, we should add the charter to the OVN repo and the "Linux Foundation Collaborative Project" banner to the OVN website. If not, we should finish it. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] dpif-netdev: apply subtable-lookup-prio-set on any datapath
Bleep bloop. Greetings Timothy Redaelli, I am a robot and I have tried out your patch. Thanks for your contribution. I encountered some error that I wasn't expecting. See the details below. checkpatch: ERROR: Improper whitespace around control block #79 FILE: lib/dpif-netdev.c:1374: SHASH_FOR_EACH(node, _netdevs) { Lines checked: 149, Warnings: 0, Errors: 1 build: /bin/sh ./libtool --tag=CC --mode=compile gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I ./include -I ./include -I ./lib -I ./lib-Wstrict-prototypes -Wall -Wextra -Wno-sign-compare -Wpointer-arith -Wformat -Wformat-security -Wswitch-enum -Wunused-parameter -Wbad-function-cast -Wcast-align -Wstrict-prototypes -Wold-style-definition -Wmissing-prototypes -Wmissing-field-initializers -fno-strict-aliasing -Wshadow -Werror -Werror -g -O2 -DHAVE_LD_AVX512_GOOD -MT lib/dpif-netdev-lookup.lo -MD -MP -MF $depbase.Tpo -c -o lib/dpif-netdev-lookup.lo lib/dpif-netdev-lookup.c &&\ mv -f $depbase.Tpo $depbase.Plo libtool: compile: gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I ./include -I ./include -I ./lib -I ./lib -Wstrict-prototypes -Wall -Wextra -Wno-sign-compare -Wpointer-arith -Wformat -Wformat-security -Wswitch-enum -Wunused-parameter -Wbad-function-cast -Wcast-align -Wstrict-prototypes -Wold-style-definition -Wmissing-prototypes -Wmissing-field-initializers -fno-strict-aliasing -Wshadow -Werror -Werror -g -O2 -DHAVE_LD_AVX512_GOOD -MT lib/dpif-netdev-lookup.lo -MD -MP -MF lib/.deps/dpif-netdev-lookup.Tpo -c lib/dpif-netdev-lookup.c -o lib/dpif-netdev-lookup.o depbase=`echo lib/dpif-netdev-lookup-autovalidator.lo | sed 's|[^/]*$|.deps/&|;s|\.lo$||'`;\ /bin/sh ./libtool --tag=CC --mode=compile gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I ./include -I ./include -I ./lib -I ./lib-Wstrict-prototypes -Wall -Wextra -Wno-sign-compare -Wpointer-arith -Wformat -Wformat-security -Wswitch-enum -Wunused-parameter -Wbad-function-cast -Wcast-align -Wstrict-prototypes -Wold-style-definition -Wmissing-prototypes -Wmissing-field-initializers -fno-strict-aliasing -Wshadow -Werror -Werror -g -O2 -DHAVE_LD_AVX512_GOOD -MT lib/dpif-netdev-lookup-autovalidator.lo -MD -MP -MF $depbase.Tpo -c -o lib/dpif-netdev-lookup-autovalidator.lo lib/dpif-netdev-lookup-autovalidator.c &&\ mv -f $depbase.Tpo $depbase.Plo libtool: compile: gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I ./include -I ./include -I ./lib -I ./lib -Wstrict-prototypes -Wall -Wextra -Wno-sign-compare -Wpointer-arith -Wformat -Wformat-security -Wswitch-enum -Wunused-parameter -Wbad-function-cast -Wcast-align -Wstrict-prototypes -Wold-style-definition -Wmissing-prototypes -Wmissing-field-initializers -fno-strict-aliasing -Wshadow -Werror -Werror -g -O2 -DHAVE_LD_AVX512_GOOD -MT lib/dpif-netdev-lookup-autovalidator.lo -MD -MP -MF lib/.deps/dpif-netdev-lookup-autovalidator.Tpo -c lib/dpif-netdev-lookup-autovalidator.c -o lib/dpif-netdev-lookup-autovalidator.o depbase=`echo lib/dpif-netdev-lookup-generic.lo | sed 's|[^/]*$|.deps/&|;s|\.lo$||'`;\ /bin/sh ./libtool --tag=CC --mode=compile gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I ./include -I ./include -I ./lib -I ./lib-Wstrict-prototypes -Wall -Wextra -Wno-sign-compare -Wpointer-arith -Wformat -Wformat-security -Wswitch-enum -Wunused-parameter -Wbad-function-cast -Wcast-align -Wstrict-prototypes -Wold-style-definition -Wmissing-prototypes -Wmissing-field-initializers -fno-strict-aliasing -Wshadow -Werror -Werror -g -O2 -DHAVE_LD_AVX512_GOOD -MT lib/dpif-netdev-lookup-generic.lo -MD -MP -MF $depbase.Tpo -c -o lib/dpif-netdev-lookup-generic.lo lib/dpif-netdev-lookup-generic.c &&\ mv -f $depbase.Tpo $depbase.Plo libtool: compile: gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I ./include -I ./include -I ./lib -I ./lib -Wstrict-prototypes -Wall -Wextra -Wno-sign-compare -Wpointer-arith -Wformat -Wformat-security -Wswitch-enum -Wunused-parameter -Wbad-function-cast -Wcast-align -Wstrict-prototypes -Wold-style-definition -Wmissing-prototypes -Wmissing-field-initializers -fno-strict-aliasing -Wshadow -Werror -Werror -g -O2 -DHAVE_LD_AVX512_GOOD -MT lib/dpif-netdev-lookup-generic.lo -MD -MP -MF lib/.deps/dpif-netdev-lookup-generic.Tpo -c lib/dpif-netdev-lookup-generic.c -o lib/dpif-netdev-lookup-generic.o depbase=`echo lib/dpif-netdev.lo | sed 's|[^/]*$|.deps/&|;s|\.lo$||'`;\ /bin/sh ./libtool --tag=CC --mode=compile gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I ./include -I ./include -I ./lib -I ./lib-Wstrict-prototypes -Wall -Wextra -Wno-sign-compare -Wpointer-arith -Wformat -Wformat-security -Wswitch-enum -Wunused-parameter -Wbad-function-cast -Wcast-align -Wstrict-prototypes -Wold-style-definition -Wmissing-prototypes -Wmissing-field-initializers -fno-strict-aliasing -Wshadow -Werror -Werror -g -O2 -DHAVE_LD_AVX512_GOOD -MT lib/dpif-netdev.lo -MD -MP -MF $depbase.Tpo -c -o lib/dpif-netdev.lo lib/dpif-netdev.c &&\ mv -f $depbase.Tpo $depbase.Plo libtool: compile: gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I ./include
Re: [ovs-dev] [PATCH V6 00/13] Netdev vxlan-decap offload
On 6/23/21 5:48 PM, Eli Britstein wrote: @@ -1009,7 +1009,7 @@ parse_vxlan_match(struct flow_patterns *patterns, return 0; } -static int +static int OVS_UNUSED > > Note that if experimental is allowed, the OVS_UNUSED attribute is misleading. Yeah, I know. We might introduce OVS_MAY_BE_UNUSED macro someday, but that is really minor. > > Also see below. > parse_flow_tnl_match(struct netdev *tnldev, struct flow_patterns *patterns, odp_port_t orig_in_port, @@ -1031,7 +1031,7 @@ parse_flow_tnl_match(struct netdev *tnldev, static int parse_flow_match(struct netdev *netdev, - odp_port_t orig_in_port, + odp_port_t orig_in_port OVS_UNUSED, struct flow_patterns *patterns, struct match *match) { @@ -1045,10 +1045,12 @@ parse_flow_match(struct netdev *netdev, } patterns->physdev = netdev; +#ifdef ALLOW_EXPERIMENTAL_API /* Packet restoration API required. */ > > In my opinion those should be removed in netdev-offload-dpdk.c, and keep such > #ifdef only in netdev-dpdk (with stubs), so later, when dpdk removes the > experimental attribute, there will be a single place to change. > > This applies both to parse_flow_tnl_match and add_tnl_pop_action. > > However, this is not critical and I would not hold the merge because of this. I agree that it's a bit of an overthinking from my side, but we will need to introduce this kind of guarding here if DPDK APIs will become non-experimental not all at once. I'm not sure if that is a possible scenario, but just in case. Looking more at the code, I agree that they are unnecessary for current version, but let them be, as they will remind us to re-check things once some of APIs will become stable. Best regards, Ilya Maximets. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [ovn] did we ever get OVN set up as an official LF project?
On Wed, Jun 23, 2021 at 1:58 PM Ben Pfaff wrote: > > I don't see the charter, etc. in the OVN tree. I know we started the > process but I don't know whether we finished it. I'm not sure about the official status but I presumed that OVN is part of LF. Trishan de Lanerolle (CC'ed) has reached out to me and other OVS folks earlier to get updates about OVS and OVN. Maybe he can answer your question. Thanks Numan > ___ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v2] dpif-netdev: apply subtable-lookup-prio-set on any datapath
Currently, if you try to set subtable-lookup-prio-set when you don't have any datapath (for example if an user wants to set AVX512 before creating any bridge) it sets it globally (dpcls_subtable_set_prio), but it returns an error: please specify an existing datapath ovs-appctl: ovs-vswitchd: server returned an error and, in this case, the exit code of ovs-appctl is 2. This commit changes the behaviour by removing the [datapath] optional parameter of subtable-lookup-prio-set and by changing the priority level on any datapath and globally. This means if you don't have any datapath or if you have only one datapath, the behaviour is the same as now, but without the confusing error when you don't have any datapath. Fixes: 3d018c3ea79d ("dpif-netdev: add subtable lookup prio set command.") Cc: harry.van.haa...@intel.com Signed-off-by: Timothy Redaelli -- v2: - Fixed one warning and one coding style issue found by 0-day Robot. --- lib/dpif-netdev.c | 80 +++ 1 file changed, 33 insertions(+), 47 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 8fa7eb6d4..c557daa9c 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -1339,19 +1339,21 @@ dpif_netdev_subtable_lookup_get(struct unixctl_conn *conn, int argc OVS_UNUSED, } static void -dpif_netdev_subtable_lookup_set(struct unixctl_conn *conn, int argc, +dpif_netdev_subtable_lookup_set(struct unixctl_conn *conn, int argc OVS_UNUSED, const char *argv[], void *aux OVS_UNUSED) { /* This function requires 2 parameters (argv[1] and argv[2]) to execute. * argv[1] is subtable name * argv[2] is priority - * argv[3] is the datapath name (optional if only 1 datapath exists) */ const char *func_name = argv[1]; errno = 0; char *err_char; uint32_t new_prio = strtoul(argv[2], _char, 10); +uint32_t lookup_dpcls_changed = 0; +uint32_t lookup_subtable_changed = 0; +struct shash_node *node; if (errno != 0 || new_prio > UINT8_MAX) { unixctl_command_reply_error(conn, "error converting priority, use integer in range 0-255\n"); @@ -1365,58 +1367,42 @@ dpif_netdev_subtable_lookup_set(struct unixctl_conn *conn, int argc, return; } -/* argv[3] is optional datapath instance. If no datapath name is provided - * and only one datapath exists, the one existing datapath is reprobed. - */ ovs_mutex_lock(_netdev_mutex); -struct dp_netdev *dp = NULL; - -if (argc == 4) { -dp = shash_find_data(_netdevs, argv[3]); -} else if (shash_count(_netdevs) == 1) { -dp = shash_first(_netdevs)->data; -} - -if (!dp) { -ovs_mutex_unlock(_netdev_mutex); -unixctl_command_reply_error(conn, -"please specify an existing datapath"); -return; -} - -/* Get PMD threads list, required to get DPCLS instances. */ -size_t n; -uint32_t lookup_dpcls_changed = 0; -uint32_t lookup_subtable_changed = 0; -struct dp_netdev_pmd_thread **pmd_list; -sorted_poll_thread_list(dp, _list, ); +SHASH_FOR_EACH (node, _netdevs) { +struct dp_netdev *dp = node->data; -/* take port mutex as HMAP iters over them. */ -ovs_mutex_lock(>port_mutex); +/* Get PMD threads list, required to get DPCLS instances. */ +size_t n; +struct dp_netdev_pmd_thread **pmd_list; +sorted_poll_thread_list(dp, _list, ); -for (size_t i = 0; i < n; i++) { -struct dp_netdev_pmd_thread *pmd = pmd_list[i]; -if (pmd->core_id == NON_PMD_CORE_ID) { -continue; -} +/* take port mutex as HMAP iters over them. */ +ovs_mutex_lock(>port_mutex); -struct dp_netdev_port *port = NULL; -HMAP_FOR_EACH (port, node, >ports) { -odp_port_t in_port = port->port_no; -struct dpcls *cls = dp_netdev_pmd_lookup_dpcls(pmd, in_port); -if (!cls) { +for (size_t i = 0; i < n; i++) { +struct dp_netdev_pmd_thread *pmd = pmd_list[i]; +if (pmd->core_id == NON_PMD_CORE_ID) { continue; } -uint32_t subtbl_changes = dpcls_subtable_lookup_reprobe(cls); -if (subtbl_changes) { -lookup_dpcls_changed++; -lookup_subtable_changed += subtbl_changes; + +struct dp_netdev_port *port = NULL; +HMAP_FOR_EACH (port, node, >ports) { +odp_port_t in_port = port->port_no; +struct dpcls *cls = dp_netdev_pmd_lookup_dpcls(pmd, in_port); +if (!cls) { +continue; +} +uint32_t subtbl_changes = dpcls_subtable_lookup_reprobe(cls); +if (subtbl_changes) { +lookup_dpcls_changed++; +lookup_subtable_changed += subtbl_changes; +
Re: [ovs-dev] [PATCH ovn v3 3/3] ovn-controller: Fix incremental processing for logical port references.
On Wed, Jun 23, 2021 at 7:48 AM Numan Siddique wrote: > > On Mon, Jun 21, 2021 at 2:52 AM Han Zhou wrote: > > > > If a lflow has an lport name in the match, but when the lflow is > > processed the port-binding is not seen by ovn-controller, the > > corresponding openflow will not be created. Later if the port-binding is > > created/monitored by ovn-controller, the lflow is not reprocessed > > because the lflow didn't change and ovn-controller doesn't know that the > > port-binding affects the lflow. This patch fixes the problem by tracking > > the references when parsing the lflow, even if the port-binding is not > > found when the lflow is firstly parsed. A test case is also added to > > cover the scenario. > > > > Signed-off-by: Han Zhou > > Hi Han, > > Thanks for fixing these issues. I've a few questions. I haven't > reviewed the patch completely. > > > > --- > > controller/lflow.c | 63 ++--- > > controller/lflow.h | 3 ++ > > controller/ovn-controller.c | 35 - > > include/ovn/expr.h | 2 +- > > lib/expr.c | 14 +++-- > > tests/ovn.at| 47 +++ > > tests/test-ovn.c| 4 +-- > > utilities/ovn-trace.c | 2 +- > > 8 files changed, 132 insertions(+), 38 deletions(-) > > > > diff --git a/controller/lflow.c b/controller/lflow.c > > index 34eca135a..b7699a309 100644 > > --- a/controller/lflow.c > > +++ b/controller/lflow.c > > @@ -61,6 +61,7 @@ struct lookup_port_aux { > > > > struct condition_aux { > > struct ovsdb_idl_index *sbrec_port_binding_by_name; > > +const struct sbrec_datapath_binding *dp; > > const struct sbrec_chassis *chassis; > > const struct sset *active_tunnels; > > const struct sbrec_logical_flow *lflow; > > @@ -98,6 +99,12 @@ lookup_port_cb(const void *aux_, const char *port_name, unsigned int *portp) > > > > const struct lookup_port_aux *aux = aux_; > > > > +/* Store the name that used to lookup the lport to lflow reference, so that > > + * in the future when the lport's port binding changes, the logical flow > > + * that references this lport can be reprocessed. */ > > +lflow_resource_add(aux->lfrr, REF_TYPE_PORTBINDING, port_name, > > + >lflow->header_.uuid); > > + > > const struct sbrec_port_binding *pb > > = lport_lookup_by_name(aux->sbrec_port_binding_by_name, port_name); > > if (pb && pb->datapath == aux->dp) { > > @@ -149,19 +156,18 @@ is_chassis_resident_cb(const void *c_aux_, const char *port_name) > > { > > const struct condition_aux *c_aux = c_aux_; > > > > +/* Store the port name that used to lookup the lport to lflow reference, so > > + * that in the future when the lport's port-binding changes the logical > > + * flow that references this lport can be reprocessed. */ > > +lflow_resource_add(c_aux->lfrr, REF_TYPE_PORTBINDING, port_name, > > + _aux->lflow->header_.uuid); > > + > > const struct sbrec_port_binding *pb > > = lport_lookup_by_name(c_aux->sbrec_port_binding_by_name, port_name); > > if (!pb) { > > return false; > > } > > > > -/* Store the port_name to lflow reference. */ > > -int64_t dp_id = pb->datapath->tunnel_key; > > -char buf[16]; > > -get_unique_lport_key(dp_id, pb->tunnel_key, buf, sizeof(buf)); > > -lflow_resource_add(c_aux->lfrr, REF_TYPE_PORTBINDING, buf, > > - _aux->lflow->header_.uuid); > > - > > if (strcmp(pb->type, "chassisredirect")) { > > /* for non-chassisredirect ports */ > > return pb->chassis && pb->chassis == c_aux->chassis; > > @@ -623,8 +629,6 @@ add_matches_to_flow_table(const struct sbrec_logical_flow *lflow, > > int64_t dp_id = dp->tunnel_key; > > char buf[16]; > > get_unique_lport_key(dp_id, port_id, buf, sizeof(buf)); > > -lflow_resource_add(l_ctx_out->lfrr, REF_TYPE_PORTBINDING, buf, > > - >header_.uuid); > > if (!sset_contains(l_ctx_in->local_lport_ids, buf)) { > > VLOG_DBG("lflow "UUID_FMT > > " port %s in match is not local, skip", > > @@ -788,6 +792,7 @@ consider_logical_flow__(const struct sbrec_logical_flow *lflow, > > }; > > struct condition_aux cond_aux = { > > .sbrec_port_binding_by_name = l_ctx_in->sbrec_port_binding_by_name, > > +.dp = dp, > > .chassis = l_ctx_in->chassis, > > .active_tunnels = l_ctx_in->active_tunnels, > > .lflow = lflow, > > @@ -805,7 +810,6 @@ consider_logical_flow__(const struct sbrec_logical_flow *lflow, > > struct hmap *matches = NULL; > > size_t matches_size = 0; > > > > -bool is_cr_cond_present = false; > > bool pg_addr_set_ref = false; > > uint32_t n_conjs = 0; > > > > @@
[ovs-dev] [PATCH ovn] controller: set vlan-limit=0
This allows L3+ ACLs to match against double tagged vlan traffic on vlan-passthru switches. The default in OVS is vlan-limit=1 for backwards compatibility. This means packets are not "parsed" deeper than one tag level. This patch sets it to 0, which means "parse as deep as OVS supports". Right now it's effectively the same as setting it to "2", which is the maximum number of tag levels that OVS supports right now. It is already set to 2 in puppet-vswitch that is used in some OpenStack distributions: https://opendev.org/openstack/puppet-vswitch/commit/14011d69c18e628a3466fa71db25cefb7adff425 Signed-off-by: Ihar Hrachyshka --- controller/ovn-controller.c | 9 tests/ovn.at| 91 + 2 files changed, 100 insertions(+) diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c index 3968ef059..9dab694b4 100644 --- a/controller/ovn-controller.c +++ b/controller/ovn-controller.c @@ -885,6 +885,7 @@ ctrl_register_ovs_idl(struct ovsdb_idl *ovs_idl) * their interest explicitly. */ ovsdb_idl_add_table(ovs_idl, _table_open_vswitch); ovsdb_idl_add_column(ovs_idl, _open_vswitch_col_external_ids); +ovsdb_idl_add_column(ovs_idl, _open_vswitch_col_other_config); ovsdb_idl_add_column(ovs_idl, _open_vswitch_col_bridges); ovsdb_idl_add_column(ovs_idl, _open_vswitch_col_datapaths); ovsdb_idl_add_table(ovs_idl, _table_interface); @@ -3130,6 +3131,14 @@ main(int argc, char *argv[]) process_br_int(ovs_idl_txn, bridge_table, ovs_table, _int, _int_dp); +/* Enable ACL matching for double tagged traffic. */ +if (ovs_idl_txn) { +const struct ovsrec_open_vswitch *cfg = +ovsrec_open_vswitch_table_first(ovs_table); +ovsrec_open_vswitch_update_other_config_setkey( +cfg, "vlan-limit", "0"); +} + if (ovsdb_idl_has_ever_connected(ovnsb_idl_loop.idl) && northd_version_match) { diff --git a/tests/ovn.at b/tests/ovn.at index 773b94a83..666c31bd5 100644 --- a/tests/ovn.at +++ b/tests/ovn.at @@ -1907,6 +1907,97 @@ AT_CLEANUP AT_BANNER([OVN end-to-end tests]) +OVN_FOR_EACH_NORTHD([ +AT_SETUP([ovn -- enables vlan-limit=0]) +ovn_start + +net_add n +check ovs-vsctl add-br br-phys +ovn_attach n br-phys 192.168.0.1 + +OVS_WAIT_UNTIL([test x`ovs-vsctl get Open_vSwitch . other_config:vlan-limit | tr -d '""'` = x0]) + +AT_CLEANUP +]) + +OVN_FOR_EACH_NORTHD([ +AT_SETUP([ovn -- allows ACLs to match against vlan-transparent double tagged traffic L3 fields]) +ovn_start + +for i in 1 2; do +check ovn-nbctl ls-add lsw$i +check ovn-nbctl --wait=sb add Logical-Switch lsw$i other_config vlan-passthru=true + +ln_port_name=ln-$i +check ovn-nbctl lsp-add lsw$i $ln_port_name +check ovn-nbctl lsp-set-addresses $ln_port_name unknown +check ovn-nbctl lsp-set-type $ln_port_name localnet +check ovn-nbctl lsp-set-options $ln_port_name network_name=phys +net_add n +done + +# two hypervisors, each connected to the same network +for i in 1 2; do +sim_add hv-$i +as hv-$i +ovs-vsctl add-br br-phys +ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys +ovn_attach n br-phys 192.168.0.$i +done + +check ovs-vsctl add-port br-phys tap +for i in 1 2; do +as hv-$i +check ovs-vsctl add-port br-int vif$i -- set Interface vif$i \ +external-ids:iface-id=lp$i options:tx_pcap=vif$i-tx.pcap options:rxq_pcap=vif$i-rx.pcap +check ovn-nbctl lsp-add lsw$i lp$i +check ovn-nbctl lsp-set-addresses lp$i "f0:00:00:00:00:0$i 10.0.0.$i" +done +for i in 1 2; do +OVS_WAIT_UNTIL([test x`ovn-nbctl lsp-get-up lp$i` = xup]) +> $i.expected +done + +test_tcp_packet() { +local inport=$1 eth_dst=$2 eth_src=$3 ip_dst=$4 ip_src=$5 eout=$6 lout=$7 fail=$8 +tag=81ff +local packet=${eth_dst}${eth_src}${tag}${tag}080045284000ff06${ip_src}${ip_dst}0001000100015000 +as hv-$inport ovs-appctl netdev-dummy/receive vif$inport $packet +if [[ $fail -eq 0 ]]; then +echo $packet >> ${eout#lp}.expected +fi +} + +# first check that acl drop rule works for tagged traffic +for i in 1 2; do +check ovn-nbctl acl-add lsw$i to-lport 1000 'tcp' drop +done +check ovn-nbctl --wait=hv sync + +test_tcp_packet 1 f002 f001 0a02 0a01 lp2 lp2 1 +test_tcp_packet 2 f001 f002 0a01 0a02 lp1 lp1 1 + +for i in 1 2; do +OVN_CHECK_PACKETS_REMOVE_BROADCAST([vif$i-tx.pcap], [$i.expected]) +done + +# now check that with no rule traffic passes through +for i in 1 2; do +check ovn-nbctl acl-del lsw$i to-lport 1000 'tcp' +check ovn-nbctl acl-add lsw$i to-lport 1000 'tcp' allow-stateless +done +check ovn-nbctl --wait=hv sync + +test_tcp_packet 2 f001 f002 0a01 0a02 lp1 lp1 0 +test_tcp_packet 1 f002 f001 0a02 0a01 lp2 lp2 0 +
Re: [ovs-dev] [PATCH v3] ovs-lib: pass optional --election-timer arg to ovsdb-tool
On Wed, 2021-06-09 at 14:11 -0500, Dan Williams wrote: > Signed-off-by: Dan Williams > --- > v3: fix line wrapping > v2: put --election-timer arg before create-cluster per Ilya Ping on this patch? Thanks! Dan > > utilities/ovs-lib.in | 10 -- > 1 file changed, 8 insertions(+), 2 deletions(-) > > diff --git a/utilities/ovs-lib.in b/utilities/ovs-lib.in > index ab38ece458b7b..61a062fa992da 100644 > --- a/utilities/ovs-lib.in > +++ b/utilities/ovs-lib.in > @@ -495,15 +495,21 @@ create_cluster () { > DB_FILE="$1" > DB_SCHEMA="$2" > LOCAL_ADDR="$3" > + ELECTION_TIMER_MS="$4" > + > + election_timer_arg= > + if [ -n "$ELECTION_TIMER_MS" ]; then > + election_timer_arg="--election-timer=$ELECTION_TIMER_MS" > + fi > > if test ! -e "$DB_FILE"; then > - action "Creating cluster database $DB_FILE" ovsdb_tool > create-cluster "$DB_FILE" "$DB_SCHEMA" "$LOCAL_ADDR" > + action "Creating cluster database $DB_FILE" ovsdb_tool > "$election_timer_arg" create-cluster "$DB_FILE" "$DB_SCHEMA" > "$LOCAL_ADDR" > elif ovsdb_tool db-is-standalone "$DB_FILE"; then > # Convert standalone database to clustered. > backup_db || return 1 > rm -f "$DB_FILE" > action "Creating cluster database $DB_FILE from existing > one" \ > - ovsdb_tool create-cluster "$DB_FILE" "$backup" > "$LOCAL_ADDR" > + ovsdb_tool "$election_timer_arg" create-cluster > "$DB_FILE" "$backup" "$LOCAL_ADDR" > fi > } > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [v13 09/12] dpif-netdev/dpcls-avx512: Enable 16 block processing.
On Thu, Jun 17, 2021 at 05:18:22PM +0100, Cian Ferriter wrote: > From: Harry van Haaren > > This commit implements larger subtable searches in avx512. A limitation > of the previous implementation was that up to 8 blocks of miniflow > data could be matched on (so a subtable with 8 blocks was handled > in avx, but 9 blocks or more would fall back to scalar/generic). > This limitation is removed in this patch, where up to 16 blocks > of subtable can be matched on. > > From an implementation perspective, the key to enabling 16 blocks > over 8 blocks was to do bitmask calculation up front, and then use > the pre-calculated bitmasks for 2x passes of the "blocks gather" > routine. The bitmasks need to be shifted for k-mask usage in the > upper (8-15) block range, but it is relatively trivial. This also > helps in case expanding to 24 blocks is desired in future. > > The implementation of the 2nd iteration to handle > 8 blocks is > behind a conditional branch which checks the total number of bits. > This helps the specialized versions of the function that have a > miniflow fingerprint of less-than-or-equal 8 blocks, as the code > can be statically stripped out of those functions. Specialized > functions that do require more than 8 blocks will have the branch > removed and unconditionally execute the 2nd blocks gather routine. > > Lastly, the _any() flavour will have the conditional branch, and > the branch predictor may mispredict a bit, but per burst will > likely get most packets correct (particularly towards the middle > and end of a burst). > > The code has been run with unit tests under autovalidation and > passes all cases, and unit test coverage has been checked to > ensure the 16 block code paths are executing. > > Signed-off-by: Harry van Haaren > > --- The changes look good to me. I also introduced errors on the first 8 blocks and on the second 8 blocks and both caused the autovalidation to fail. Acked-by: Flavio Leitner ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [v13 04/12] dpif-avx512: Add ISA implementation of dpif.
On Thu, Jun 17, 2021 at 05:18:17PM +0100, Cian Ferriter wrote: > From: Harry van Haaren > > This commit adds the AVX512 implementation of DPIF functionality, > specifically the dp_netdev_input_outer_avx512 function. This function > only handles outer (no re-circulations), and is optimized to use the > AVX512 ISA for packet batching and other DPIF work. > > Sparse is not able to handle the AVX512 intrinsics, causing compile > time failures, so it is disabled for this file. > > Signed-off-by: Harry van Haaren > Co-authored-by: Cian Ferriter > Signed-off-by: Cian Ferriter > Co-authored-by: Kumar Amber > Signed-off-by: Kumar Amber > > --- > > v13: > - Squash "Add HWOL support" commit into this commit. > - Add NEWS item about this feature here rather than in a later commit. > - Add #define NUM_U64_IN_ZMM_REG 8. > - Add comment describing operation of while loop handling HWOL->EMC->SMC > lookups in dp_netdev_input_outer_avx512(). > - Add EMC and SMC batch insert functions for better handling of EMC and > SMC in AVX512 DPIF. > - Minor code refactor to address review comments. > --- > NEWS | 2 + > lib/automake.mk | 5 +- > lib/dpif-netdev-avx512.c | 327 +++ > lib/dpif-netdev-private-dfc.h| 25 +++ > lib/dpif-netdev-private-dpif.h | 32 +++ > lib/dpif-netdev-private-thread.h | 11 +- > lib/dpif-netdev-private.h| 25 +++ > lib/dpif-netdev.c| 103 -- > 8 files changed, 514 insertions(+), 16 deletions(-) > create mode 100644 lib/dpif-netdev-avx512.c > create mode 100644 lib/dpif-netdev-private-dpif.h > > diff --git a/NEWS b/NEWS > index 96b3a61c8..6a4a7b76d 100644 > --- a/NEWS > +++ b/NEWS > @@ -10,6 +10,8 @@ Post-v2.15.0 > * Auto load balancing of PMDs now partially supports cross-NUMA polling > cases, e.g if all PMD threads are running on the same NUMA node. > * Refactor lib/dpif-netdev.c to multiple header files. > + * Add avx512 implementation of dpif which can process non recirculated > + packets. It supports partial HWOL, EMC, SMC and DPCLS lookups. > - ovs-ctl: > * New option '--no-record-hostname' to disable hostname configuration > in ovsdb on startup. > diff --git a/lib/automake.mk b/lib/automake.mk > index 3a33cdd5c..660cd07f0 100644 > --- a/lib/automake.mk > +++ b/lib/automake.mk > @@ -33,11 +33,13 @@ lib_libopenvswitchavx512_la_CFLAGS = \ > -mavx512f \ > -mavx512bw \ > -mavx512dq \ > + -mbmi \ > -mbmi2 \ > -fPIC \ > $(AM_CFLAGS) > lib_libopenvswitchavx512_la_SOURCES = \ > - lib/dpif-netdev-lookup-avx512-gather.c > + lib/dpif-netdev-lookup-avx512-gather.c \ > + lib/dpif-netdev-avx512.c > lib_libopenvswitchavx512_la_LDFLAGS = \ > -static > endif > @@ -114,6 +116,7 @@ lib_libopenvswitch_la_SOURCES = \ > lib/dpif-netdev-private-dfc.c \ > lib/dpif-netdev-private-dfc.h \ > lib/dpif-netdev-private-dpcls.h \ > + lib/dpif-netdev-private-dpif.h \ > lib/dpif-netdev-private-flow.h \ > lib/dpif-netdev-private-hwol.h \ > lib/dpif-netdev-private-thread.h \ > diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c > new file mode 100644 > index 0..0e55b0be2 > --- /dev/null > +++ b/lib/dpif-netdev-avx512.c > @@ -0,0 +1,327 @@ > +/* > + * Copyright (c) 2021 Intel Corporation. > + * > + * Licensed under the Apache License, Version 2.0 (the "License"); > + * you may not use this file except in compliance with the License. > + * You may obtain a copy of the License at: > + * > + * http://www.apache.org/licenses/LICENSE-2.0 > + * > + * Unless required by applicable law or agreed to in writing, software > + * distributed under the License is distributed on an "AS IS" BASIS, > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. > + * See the License for the specific language governing permissions and > + * limitations under the License. > + */ > + > +#ifdef __x86_64__ > +/* Sparse cannot handle the AVX512 instructions. */ > +#if !defined(__CHECKER__) > + > +#include > + > +#include "dpif-netdev.h" > +#include "dpif-netdev-perf.h" > + > +#include "dpif-netdev-private.h" > +#include "dpif-netdev-private-dpcls.h" > +#include "dpif-netdev-private-flow.h" > +#include "dpif-netdev-private-thread.h" > +#include "dpif-netdev-private-hwol.h" > + > +#include "dp-packet.h" > +#include "netdev.h" > + > +#include "immintrin.h" > + > +/* Each AVX512 register (zmm register in assembly notation) can contain up to > + * 512 bits, which is equivalent to 8 uint64_t variables. This is the maximum > + * number of miniflow blocks that can be processed in a single pass of the > + * AVX512 code at a time. > + */ > +#define NUM_U64_IN_ZMM_REG (8) > + > +/* Structure to contain per-packet metadata that must be attributed to the > + * dp netdev flow. This is unfortunate to have to track per packet, however > + * it's
Re: [ovs-dev] [v13 11/12] dpdk: Cache result of CPU ISA checks.
On Thu, Jun 17, 2021 at 05:18:24PM +0100, Cian Ferriter wrote: > From: Harry van Haaren > > As a small optimization, this patch caches the result of a CPU ISA > check from DPDK. Particularly in the case of running the DPCLS > autovalidator (which repeatedly probes subtables) this reduces > the amount of CPU ISA lookups from the DPDK level. > > By caching them at the OVS/dpdk.c level, the ISA checks remain > runtime for the CPU where they are executed, but subsequent checks > for the same ISA feature become much cheaper. > > Signed-off-by: Harry van Haaren > Co-authored-by: Cian Ferriter > Signed-off-by: Cian Ferriter > --- The current approach uses a static int8_t per CPU flag. Perhaps using two static int8_t (one for DONE and another for AVAIL) and then use a bit on them for each CPU flag would result in allocating less static variables. Anyways, 2 or 3 CPU flags make no relevant difference now. Acked-by: Flavio Leitner ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [v13 08/12] dpif-netdev-unixctl.man: Document subtable-lookup-* CMDs
On Thu, Jun 17, 2021 at 05:18:21PM +0100, Cian Ferriter wrote: > Signed-off-by: Cian Ferriter > > --- Acked-by: Flavio Leitner ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [v13 10/12] dpif-netdev/dpcls: Specialize more subtable signatures.
On Thu, Jun 17, 2021 at 05:18:23PM +0100, Cian Ferriter wrote: > From: Harry van Haaren > > This commit adds more subtables to be specialized. The traffic > pattern here being matched is VXLAN traffic subtables, which commonly > have (5,3), (9,1) and (9,4) subtable fingerprints. > > Signed-off-by: Harry van Haaren > > --- Acked-by: Flavio Leitner ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [v13 07/12] dpif-netdev: Add a partial HWOL PMD statistic.
On Thu, Jun 17, 2021 at 05:18:20PM +0100, Cian Ferriter wrote: > It is possible for packets traversing the userspace datapath to match a > flow before hitting on EMC by using a mark ID provided by a NIC. Add a > PMD statistic for this hit. > > Signed-off-by: Cian Ferriter > > --- Acked-by: Flavio Leitner ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [v13 12/12] dpcls-avx512: Enable avx512 vector popcount instruction.
On Thu, Jun 17, 2021 at 05:18:25PM +0100, Cian Ferriter wrote: > From: Harry van Haaren > > This commit enables the AVX512-VPOPCNTDQ Vector Popcount > instruction. This instruction is not available on every CPU > that supports the AVX512-F Foundation ISA, hence it is enabled > only when the additional VPOPCNTDQ ISA check is passed. > > The vector popcount instruction is used instead of the AVX512 > popcount emulation code present in the avx512 optimized DPCLS today. > It provides higher performance in the SIMD miniflow processing > as that requires the popcount to calculate the miniflow block indexes. > > Signed-off-by: Harry van Haaren > > --- Acked-by: Flavio Leitner This patch series implements low level optimizations by manually coding instructions. I wonder if gcc couldn't get some relevant level of vectorized optimizations refactoring and enabling compiling flags. I assume the answer is no, but I would appreciate some enlightenment on the matter. Thanks, fbl ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev