Re: [ovs-dev] [PATCH ovn] controller: Fix hairpin SNAT flow explosion for the same SNAT IPs case.

2023-02-27 Thread Dumitru Ceara
Hi Ilya,

Thanks for the patch, you're right the hairpin SNAT IP processing in
ovn-controller is extremely inefficient today.

I didn't review the code closely but I do have some initial remarks below.

On 2/20/23 12:42, Ilya Maximets wrote:
> It's common to have 'hairpin_snat_ip' to be configured exactly the
> same for each and every Load Balancer in a setup.  For example,
> ovn-kubernetes does that, setting '169.254.169.5 fd69::5' value
> unconditionally for every LB:
>   
> https://github.com/ovn-org/ovn-kubernetes/blob/cd78ae1af4657d38bdc41003a8737aa958d62b9d/go-controller/pkg/ovn/controller/services/loadbalancer.go#L314
> 
> Current implementation of ovn-controller will create an OpenFlow rule
> for every datapath for each VIP in case of 'hairpin_snat_ip', because
> we need to handle the case where LBs with the same VIP are applied to
> different datapaths and have different SNAT IPs.

IMO the real issue is that we do this blindly for every datapath.  It
should actually be only for local datapaths.  In ovn-kubernetes (which
AFAIK is the only user of hairpin_snat_ip) there will only be a single
local logical switch with LBs applied on it.

> 
> In large scale setups with tens of thousands of LBs and hundreds of
> nodes, this generates millions of OpenFlow rules, making ovn-controller
> choke and fall into unrecoverable disconnection loop with poll
> intervals up to 200 seconds in ovn-heater's density-heavy scenario with
> just 120 nodes.  It also generates rules with thousands of conjunctions
> that do not fit into a single FLOW_MOD, breaking the OpenFlow
> management communication.

These conjunctive flows were added with:

07467cfac499 ("lflow: Refactor OpenFlow snat hairpin flows")

in order to reduce load in ovn-controller for the (now obsolete) use
case ovn-kubernetes had.

However, Han added:

22298fd37908 ("ovn-controller: Don't flood fill local datapaths beyond
DGP boundary.")

Which allowed ovn-kubernetes to essentially bind a logical switch to a
chassis and which means that we will have at most one local datapath
with load balancers applied to it.  At this point, I suspect the
original refactor patch could just be reverted.  We would have to check
that snat-ip flows (non-conjunctive) are being added only for the local
datapath.

Using the database you shared offline (120 node ovn-heater
density-heavy) I tried this (didn't test too much though) and there's no
visible impact on ovn-controller performance.

> 
> Fix that by checking that all the 'hairpin_snat_ip' options on all the
> Load Balancers are exactly the same and using just one OpenFlow rule
> per VIP if that's the case.  This is possible because even if LBs with
> the same VIP are applied to different datapaths, we're still going to
> SNAT with the exactly same hairpin SNAT IP.
> 
> Ideally, we can only check that 'hairpin_snat_ip' is the same for all
> LBs with the same VIP, but that is not necessary for now, as the only
> common use case for this option today is ovn-kubernetes.  Can be added
> in the future by enhancing the logic in ovn-northd, if necessary.
> 

What worries me about this approach is that in the case when one
hairpin_snat_ip changes value we suddenly disable the whole optimization.

I think I'd like, if possible, to get more performance data on what
happens if we revert 07467cfac499 ("lflow: Refactor OpenFlow snat
hairpin flows") and only add snat ip flows for local datapaths.

What do you think?

> Note:
> 
> There seems to be a separate issue in ovn-controller.  If only Load
> Balancer data is updated, I-P processing is going through the 'lb_data'
> node.  And handler for that node deletes old flows and adds new ones
> for each updated LB.  This triggers OpenFlow flow DEL + ADD.  On DEL
> of hairpin reply learning flows in table 68, OVS cleans up all the
> learned flows in table 69 as well, because of 'delete_learned' flag.

Won't this potentially affect dataplane performance for hairpin traffic?

> 
> However, when changes are happening in higher level nodes and
> en_lflow_output_run() triggers lflow_run(), lflow_run() doesn't delete
> current hairpin flows before re-processing them.  That leads to the
> flow MOD instead of DEL + ADD.  Since the learning flow is not deleted,
> but modified, that doesn't trigger removal of learned rules in OVS.
> This is what happening if 'lb_same_hairpin_ips' flips the value.
> The test adjusted to change the 'hairpin_snat_ip' twice, to trigger
> an 'lb_data' processing.  Otherwise, old learned flow for the VIP
> stays in table 69.
> 
> This should not be a huge issue, as it's still a VIP address and it
> should not make any harm to have that learned flow.  Also, the value
> of 'lb_same_hairpin_ips' doesn't ever flip in current ovn-kubernetes.
> 
> Reported-at: https://bugzilla.redhat.com/2171423
> Signed-off-by: Ilya Maximets 
Regards,
Dumitru

___
dev mailing list
d...@openvswitch.org

Re: [ovs-dev] [PATCH ovn] tests: Decrease the number of zones and switches for interconnection

2023-02-27 Thread Mark Michelson

Thanks Ales.

Acked-by: Mark Michelson 

Since this was already discussed during the upstream OVN IRC meeting, I 
went ahead and applied this to main, branch-23.03, branch-22.12, 
branch-22.09, branch-22.06, and branch-22.03.


On 2/27/23 02:32, Ales Musil wrote:

When we run the tests with address sanitizers enabled
the interconnection test consumes a lot of memory,
around 5 GB (on my env) per permutation. To prevent
any OOM kills on Github CI reduce the number of
zones and transit switches to 3 as this lowers
the memory consumption to around 2 GB per permutation.

Signed-off-by: Ales Musil 
---
  tests/ovn.at | 6 --
  1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tests/ovn.at b/tests/ovn.at
index dc5c5df3f..778d2dbe0 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -25286,8 +25286,10 @@ AT_SETUP([interconnection])
  AT_KEYWORDS([slowtest])
  
  ovn_init_ic_db

-n_az=5
-n_ts=5
+# The number needs to stay relatively low due to high memory consumption
+# with address sanitizers enabled.
+n_az=3
+n_ts=3
  for i in `seq 1 $n_az`; do
  ovn_start az$i
  done


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v4] ipfix: make template and stats interval configurable

2023-02-27 Thread Ilya Maximets
On 2/24/23 12:05, Adrián Moreno wrote:
> From: Adrian Moreno 
> 
> Add options to the IPFIX table configure the interval to send statistics
> and template information.
> 
> Reviewed-by: Simon Horman 
> Signed-off-by: Adrian Moreno 
> 
> ---
> - v4:
>- Bump vswitchd's schema version to 8.4.0.
> - v3:
>- Removed unit tests which generate errors in Intel CI. Will submit in
>  independent series.
> - v2:
>   - Fixed a potential race condition in unit test.
> - v1:
>   - Added unit test.
> ---
>  NEWS |  2 ++
>  ofproto/ofproto-dpif-ipfix.c | 38 ++--
>  ofproto/ofproto.h|  9 +
>  vswitchd/bridge.c| 17 
>  vswitchd/vswitch.ovsschema   | 14 +++--
>  vswitchd/vswitch.xml | 20 +++
>  6 files changed, 88 insertions(+), 12 deletions(-)
> 
> diff --git a/NEWS b/NEWS
> index 85b349621..1794de046 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -56,6 +56,8 @@ v3.1.0 - 16 Feb 2023
>   * Add new experimental PMD load based sleeping feature. PMD threads can
> request to sleep up to a user configured 'pmd-maxsleep' value under
> low load conditions.
> +   - IPFIX template and statistics intervals can be configured through two 
> new
> +   options in the IPFIX table: 'template_interval' and 'stats_interval'.

Thanks, Adrian and Simon!

I moved the NEWS entry to a correct section and applied the change.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 2/3] ovn-util: Optimize is_dynamic_lsp_address.

2023-02-27 Thread 0-day Robot
Bleep bloop.  Greetings Ilya Maximets, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Comment with 'xxx' marker
#52 FILE: lib/ovn-util.c:123:
/* "dynamic x.x.x.x ::" */

WARNING: Comment with 'xxx' marker
#59 FILE: lib/ovn-util.c:130:
/* "dynamic ::" */

Lines checked: 78, Warnings: 2, Errors: 0


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn 3/3] northd: Don't parse LSP addresses twice.

2023-02-27 Thread Ilya Maximets
At the point of IPAM configuration all the addresses are already parsed.
No need to waste time parsing them again.

Signed-off-by: Ilya Maximets 
---
 northd/northd.c | 54 -
 1 file changed, 17 insertions(+), 37 deletions(-)

diff --git a/northd/northd.c b/northd/northd.c
index 6d37ea713..868a8c3b8 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -1497,7 +1497,10 @@ struct ovn_port {
 const struct nbrec_logical_switch_port *nbsp; /* May be NULL. */
 
 struct lport_addresses *lsp_addrs;  /* Logical switch port addresses. */
-unsigned int n_lsp_addrs;
+unsigned int n_lsp_addrs;  /* Total length of lsp_addrs. */
+unsigned int n_lsp_non_router_addrs; /* Number of elements from the
+  * beginning of 'lsp_addrs' extracted
+  * directly from LSP 'addresses'. */
 
 struct lport_addresses *ps_addrs;   /* Port security addresses. */
 unsigned int n_ps_addrs;
@@ -1817,35 +1820,21 @@ ipam_insert_ip_for_datapath(struct ovn_datapath *od, 
uint32_t ip)
 }
 
 static void
-ipam_insert_lsp_addresses(struct ovn_datapath *od, struct ovn_port *op,
-  char *address)
+ipam_insert_lsp_addresses(struct ovn_datapath *od,
+  struct lport_addresses *laddrs)
 {
-if (!od || !op || !address || !strcmp(address, "unknown")
-|| !strcmp(address, "router") || is_dynamic_lsp_address(address)) {
-return;
-}
-
-struct lport_addresses laddrs;
-if (!extract_lsp_addresses(address, )) {
-static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
-VLOG_WARN_RL(, "Extract addresses failed.");
-return;
-}
-ipam_insert_mac(, true);
+ipam_insert_mac(>ea, true);
 
 /* IP is only added to IPAM if the switch's subnet option
  * is set, whereas MAC is always added to MACAM. */
 if (!od->ipam_info.allocated_ipv4s) {
-destroy_lport_addresses();
 return;
 }
 
-for (size_t j = 0; j < laddrs.n_ipv4_addrs; j++) {
-uint32_t ip = ntohl(laddrs.ipv4_addrs[j].addr);
+for (size_t j = 0; j < laddrs->n_ipv4_addrs; j++) {
+uint32_t ip = ntohl(laddrs->ipv4_addrs[j].addr);
 ipam_insert_ip_for_datapath(od, ip);
 }
-
-destroy_lport_addresses();
 }
 
 static void
@@ -1855,29 +1844,21 @@ ipam_add_port_addresses(struct ovn_datapath *od, struct 
ovn_port *op)
 return;
 }
 
-if (op->nbsp) {
+if (op->n_lsp_non_router_addrs) {
 /* Add all the port's addresses to address data structures. */
-for (size_t i = 0; i < op->nbsp->n_addresses; i++) {
-ipam_insert_lsp_addresses(od, op, op->nbsp->addresses[i]);
-}
-} else if (op->nbrp) {
-struct lport_addresses lrp_networks;
-if (!extract_lrp_networks(op->nbrp, _networks)) {
-static struct vlog_rate_limit rl
-= VLOG_RATE_LIMIT_INIT(1, 1);
-VLOG_WARN_RL(, "Extract addresses failed.");
-return;
+for (size_t i = 0; i < op->n_lsp_non_router_addrs; i++) {
+ipam_insert_lsp_addresses(od, >lsp_addrs[i]);
 }
-ipam_insert_mac(_networks.ea, true);
+} else if (op->lrp_networks.ea_s[0]) {
+ipam_insert_mac(>lrp_networks.ea, true);
 
 if (!op->peer || !op->peer->nbsp || !op->peer->od || !op->peer->od->nbs
 || !smap_get(>peer->od->nbs->other_config, "subnet")) {
-destroy_lport_addresses(_networks);
 return;
 }
 
-for (size_t i = 0; i < lrp_networks.n_ipv4_addrs; i++) {
-uint32_t ip = ntohl(lrp_networks.ipv4_addrs[i].addr);
+for (size_t i = 0; i < op->lrp_networks.n_ipv4_addrs; i++) {
+uint32_t ip = ntohl(op->lrp_networks.ipv4_addrs[i].addr);
 /* If the router has the first IP address of the subnet, don't add
  * it to IPAM. We already added this when we initialized IPAM for
  * the datapath. This will just result in an erroneous message
@@ -1887,8 +1868,6 @@ ipam_add_port_addresses(struct ovn_datapath *od, struct 
ovn_port *op)
 ipam_insert_ip_for_datapath(op->peer->od, ip);
 }
 }
-
-destroy_lport_addresses(_networks);
 }
 }
 
@@ -2573,6 +2552,7 @@ join_logical_ports(struct northd_input *input_data,
 }
 op->n_lsp_addrs++;
 }
+op->n_lsp_non_router_addrs = op->n_lsp_addrs;
 
 op->ps_addrs
 = xmalloc(sizeof *op->ps_addrs * nbsp->n_port_security);
-- 
2.39.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn 0/3] northd: Optimize parsing of LSP addresses.

2023-02-27 Thread Ilya Maximets
This patch set optimizes usage of string operations and avoids parsing
same LSP addresses multiple times.

Performance tests with ovn-heater show 5-10% improvement in high-scale
density-light scenarios.

Ilya Maximets (3):
  treewide: Remove unnecessary strlen() calls.
  ovn-util: Optimize is_dynamic_lsp_address.
  northd: Don't parse LSP addresses twice.

 controller/chassis.c  |  4 +--
 controller/encaps.c   |  2 +-
 controller/physical.c |  2 +-
 controller/pinctrl.c  |  6 ++--
 ic/ovn-ic.c   |  4 +--
 lib/ovn-util.c| 52 +--
 northd/northd.c   | 64 +++
 utilities/ovn-nbctl.c | 14 +-
 8 files changed, 75 insertions(+), 73 deletions(-)

-- 
2.39.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn 1/3] treewide: Remove unnecessary strlen() calls.

2023-02-27 Thread Ilya Maximets
In many cases OVN components are using strlen() to check if the string
is not empty.  In that case, unnecessary scan of the whole string is
performed and the length is calculated, while it's enough to just check
the first byte.

Some functions also have completely unnecessary strlen() calls where
needed functionality can be achieved without them.

The most impactful changes in this commit are around extract_addresses()
function, and extract_lsp_addresses() in particular.  It is called by
northd for every logical port and involves 2 unnecessary strlen() calls.

Signed-off-by: Ilya Maximets 
---
 controller/chassis.c  |  4 ++--
 controller/encaps.c   |  2 +-
 controller/physical.c |  2 +-
 controller/pinctrl.c  |  6 +++---
 ic/ovn-ic.c   |  4 ++--
 lib/ovn-util.c|  5 ++---
 northd/northd.c   | 10 +-
 utilities/ovn-nbctl.c | 14 +++---
 8 files changed, 23 insertions(+), 24 deletions(-)

diff --git a/controller/chassis.c b/controller/chassis.c
index acde00849..0484e2fb9 100644
--- a/controller/chassis.c
+++ b/controller/chassis.c
@@ -99,9 +99,9 @@ static const char *
 get_hostname(const struct smap *ext_ids, const char *chassis_id)
 {
 const char *hostname = get_chassis_external_id_value(ext_ids, chassis_id,
- "hostname", "");
+ "hostname", NULL);
 
-if (strlen(hostname) == 0) {
+if (!hostname) {
 static char hostname_[HOST_NAME_MAX + 1];
 
 if (gethostname(hostname_, sizeof(hostname_))) {
diff --git a/controller/encaps.c b/controller/encaps.c
index 5d383401d..2662eaf98 100644
--- a/controller/encaps.c
+++ b/controller/encaps.c
@@ -77,7 +77,7 @@ tunnel_create_name(struct tunnel_ctx *tc, const char 
*chassis_id)
 for (int i = 0; i < UINT16_MAX; i++) {
 const char *idx = get_chassis_idx(tc->ovs_table);
 char *port_name = xasprintf(
-"ovn%s-%.*s-%x", idx, strlen(idx) ? 5 : 6, chassis_id, i);
+"ovn%s-%.*s-%x", idx, idx[0] ? 5 : 6, chassis_id, i);
 
 if (!sset_contains(>port_names, port_name)) {
 return port_name;
diff --git a/controller/physical.c b/controller/physical.c
index e718268ea..ec861f49c 100644
--- a/controller/physical.c
+++ b/controller/physical.c
@@ -505,7 +505,7 @@ populate_remote_chassis_macs(const struct sbrec_chassis 
*my_chassis,
 const char *tokens
 = get_chassis_mac_mappings(>other_config, chassis->name);
 
-if (!strlen(tokens)) {
+if (!tokens[0]) {
 continue;
 }
 
diff --git a/controller/pinctrl.c b/controller/pinctrl.c
index 248b1a0e7..795847729 100644
--- a/controller/pinctrl.c
+++ b/controller/pinctrl.c
@@ -3775,12 +3775,12 @@ ipv6_ra_update_config(const struct sbrec_port_binding 
*pb)
 
 const char *dnssl = smap_get(>options, "ipv6_ra_dnssl");
 if (dnssl) {
-ds_put_buffer(>dnssl, dnssl, strlen(dnssl));
+ds_put_cstr(>dnssl, dnssl);
 }
 
 const char *route_info = smap_get(>options, "ipv6_ra_route_info");
 if (route_info) {
-ds_put_buffer(>route_info, route_info, strlen(route_info));
+ds_put_cstr(>route_info, route_info);
 }
 
 return config;
@@ -5825,7 +5825,7 @@ extract_addresses_with_port(const char *addresses,
 int ofs;
 if (!extract_addresses(addresses, laddrs, )) {
 return false;
-} else if (ofs >= strlen(addresses)) {
+} else if (!addresses[ofs]) {
 return true;
 }
 
diff --git a/ic/ovn-ic.c b/ic/ovn-ic.c
index 9a80a7f68..1d0a062f6 100644
--- a/ic/ovn-ic.c
+++ b/ic/ovn-ic.c
@@ -1162,7 +1162,7 @@ add_static_to_routes_ad(
 ipv6_format_addr(, );
 }
 
-ds_put_format(, ", route_table: %s", strlen(nb_route->route_table)
+ds_put_format(, ", route_table: %s", nb_route->route_table[0]
  ? nb_route->route_table
  : "");
 
@@ -1348,7 +1348,7 @@ sync_learned_routes(struct ic_context *ctx,
 continue;
 }
 
-if (strlen(isb_route->route_table) &&
+if (isb_route->route_table[0] &&
 strcmp(isb_route->route_table, ts_route_table)) {
 if (VLOG_IS_DBG_ENABLED()) {
 VLOG_DBG("Skip learning static route %s -> %s as either "
diff --git a/lib/ovn-util.c b/lib/ovn-util.c
index b12c027b9..47484ef01 100644
--- a/lib/ovn-util.c
+++ b/lib/ovn-util.c
@@ -128,7 +128,6 @@ parse_and_store_addresses(const char *address, struct 
lport_addresses *laddrs,
 const char *buf = address;
 const char *const start = buf;
 int buf_index = 0;
-const char *buf_end = buf + strlen(address);
 
 if (extract_eth_addr) {
 if (!ovs_scan_len(buf, _index, ETH_ADDR_SCAN_FMT,
@@ -151,7 +150,7 @@ parse_and_store_addresses(const char *address, struct 
lport_addresses *laddrs,
  * and store in the 

[ovs-dev] [PATCH ovn 2/3] ovn-util: Optimize is_dynamic_lsp_address.

2023-02-27 Thread Ilya Maximets
For non-dynamic address, current version performs 1 strcmp + 4 attempts
for ovs_scan.  Updated code perfroms 1 strncmp + 1 ovs_scan.

Also, for ports with both IPv4 and IPv6 specified, new version doesn't
re-parse IPv4 twice.

Signed-off-by: Ilya Maximets 
---
 lib/ovn-util.c | 47 +++
 1 file changed, 35 insertions(+), 12 deletions(-)

diff --git a/lib/ovn-util.c b/lib/ovn-util.c
index 47484ef01..e683abb45 100644
--- a/lib/ovn-util.c
+++ b/lib/ovn-util.c
@@ -105,18 +105,41 @@ is_dynamic_lsp_address(const char *address)
 char ipv6_s[IPV6_SCAN_LEN + 1];
 struct eth_addr ea;
 ovs_be32 ip;
-int n;
-return (!strcmp(address, "dynamic")
-|| (ovs_scan(address, "dynamic "IP_SCAN_FMT"%n",
- IP_SCAN_ARGS(), )
- && address[n] == '\0')
-|| (ovs_scan(address, "dynamic "IP_SCAN_FMT" "IPV6_SCAN_FMT"%n",
- IP_SCAN_ARGS(), ipv6_s, )
- && address[n] == '\0')
-|| (ovs_scan(address, "dynamic "IPV6_SCAN_FMT"%n",
- ipv6_s, ) && address[n] == '\0')
-|| (ovs_scan(address, ETH_ADDR_SCAN_FMT" dynamic%n",
- ETH_ADDR_SCAN_ARGS(ea), ) && address[n] == '\0'));
+int n = 0;
+
+if (!strncmp(address, "dynamic", 7)) {
+n = 7;
+if (!address[n]) {
+/* "dynamic" */
+return true;
+}
+if (ovs_scan_len(address, , " "IP_SCAN_FMT, IP_SCAN_ARGS())) {
+if (!address[n]) {
+/* "dynamic x.x.x.x" */
+return true;
+}
+if (ovs_scan_len(address, , " "IPV6_SCAN_FMT, ipv6_s)
+&& !address[n]) {
+/* "dynamic x.x.x.x ::" */
+return true;
+}
+return false;
+}
+if (ovs_scan_len(address, , " "IPV6_SCAN_FMT, ipv6_s)
+&& !address[n]) {
+/* "dynamic ::" */
+return true;
+}
+return false;
+}
+
+if (ovs_scan_len(address, , ETH_ADDR_SCAN_FMT" dynamic",
+ ETH_ADDR_SCAN_ARGS(ea)) && !address[n]) {
+/* "xx:xx:xx:xx:xx:xx dynamic" */
+return true;
+}
+
+return false;
 }
 
 static bool
-- 
2.39.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn] ovn-util: Remove unused ovn_parse_internal_version_minor.

2023-02-27 Thread Ilya Maximets
The only user of this function was removed in a cited commit.  And
it is unlikely to be used in the future, since we have feature flags
in the database instead.

Fixes: 3013c2869696 ("northd: ovn-controller: Use ct_mark.natted only when 
ct_lb_mark is used.")
Signed-off-by: Ilya Maximets 
---
 lib/ovn-util.c | 18 --
 lib/ovn-util.h |  5 -
 2 files changed, 23 deletions(-)

diff --git a/lib/ovn-util.c b/lib/ovn-util.c
index b12c027b9..5cd38c9c4 100644
--- a/lib/ovn-util.c
+++ b/lib/ovn-util.c
@@ -825,24 +825,6 @@ ovn_get_internal_version(void)
  N_OVNACTS, OVN_INTERNAL_MINOR_VER);
 }
 
-unsigned int
-ovn_parse_internal_version_minor(const char *ver)
-{
-const char *p = ver + strlen(ver);
-for (int i = 0; i < strlen(ver); i++) {
-if (*p == '.') {
-break;
-}
-p--;
-}
-
-unsigned int minor;
-if (ovs_scan(p, ".%u", )) {
-return minor;
-}
-return 0;
-}
-
 #ifdef DDLOG
 /* Callbacks used by the ddlog northd code to print warnings and errors. */
 void
diff --git a/lib/ovn-util.h b/lib/ovn-util.h
index cda1fde48..a1a418a24 100644
--- a/lib/ovn-util.h
+++ b/lib/ovn-util.h
@@ -266,11 +266,6 @@ bool ip_address_and_port_from_lb_key(const char *key, char 
**ip_address,
  * value. */
 char *ovn_get_internal_version(void);
 
-/* Parse the provided internal version string and return the "minor" part which
- * is expected to be an unsigned integer followed by the last "." in the
- * string. Returns 0 if the string can't be parsed. */
-unsigned int ovn_parse_internal_version_minor(const char *ver);
-
 /* OVN Packet definitions. These may eventually find a home in OVS's
  * packets.h file. For the time being, they live here because OVN uses them
  * and OVS does not.
-- 
2.39.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC] dpdk: Allow retaining cap_sys_rawio privileges

2023-02-27 Thread Ilya Maximets
On 2/22/23 18:07, Aaron Conole wrote:
> Open vSwitch generally tries to let the underlying operating system
> managed the low level details of hardware, for example DMA mapping,
> bus arbitration, etc.  However, when using DPDK, the underlying
> operating system yields control of many of these details to userspace
> for management.
> 
> In the case of some DPDK port drivers, configuring rte_flow or even
> allocating resources may require access to iopl/ioperm calls, which
> are guarded by the CAP_SYS_RAWIO privilege on linux systems.  These
> calls are dangerous, and can allow a process to completely compromise
> a system.  However, they are needed in the case of some userspace
> driver code which manages the hardware (for example, the mlx
> implementation of backend support for rte_flow).
> 
> Here, we create an opt-in flag passed to the command line to allow
> this access.  We need to do this before ever accessing the database,
> because we want to drop all privileges asap, and cannot wait for
> a connection to the database to be established and functional before
> dropping.  There may be distribution specific ways to do capability
> management as well (using for example, systemd), but they are not
> as universal to the vswitchd as a flag.
> 
> Signed-off-by: Aaron Conole 
> ---
>  NEWS   |  4 
>  lib/daemon-unix.c  | 31 ++-
>  lib/daemon.c   |  2 +-
>  lib/daemon.h   |  4 ++--
>  ovsdb/ovsdb-client.c   |  6 +++---
>  ovsdb/ovsdb-server.c   |  4 ++--
>  tests/test-netflow.c   |  2 +-
>  tests/test-sflow.c |  2 +-
>  tests/test-unixctl.c   |  2 +-
>  utilities/ovs-ofctl.c  |  4 ++--
>  utilities/ovs-testcontroller.c |  4 ++--
>  vswitchd/ovs-vswitchd.8.in |  8 
>  vswitchd/ovs-vswitchd.c| 11 ++-
>  13 files changed, 59 insertions(+), 25 deletions(-)
> 

...

> diff --git a/vswitchd/ovs-vswitchd.c b/vswitchd/ovs-vswitchd.c
> index 407bfc60eb..f62d1ad751 100644
> --- a/vswitchd/ovs-vswitchd.c
> +++ b/vswitchd/ovs-vswitchd.c
> @@ -60,6 +60,9 @@ VLOG_DEFINE_THIS_MODULE(vswitchd);
>   * the kernel from paging any of its memory to disk. */
>  static bool want_mlockall;
>  
> +/* --hw-access: If set, retains CAP_SYS_RAWIO privileges.  */
> +static bool hw_access;

The comment is outdated.  And we may also want to rename the variable
itself to match the option.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn] rhel: pass options to stop daemon command in systemd units

2023-02-27 Thread Vladislav Odintsov
This patch fixes ovn-northd.service and ovn-db@.service systemd units
which didn't pass startup options to ovn-ctl script while stop_* call.

For instance if ovn-northd service was started with option
'--ovn-manage-ovsdb=no' and NB/SB databases are located on the same host
and started by ovn-db@nb / ovn-db@sb or manually with ovn-ctl, so
ovsdb-server processes for these  services will be unexpectedly stopped.

Fixes: 497ec3fdfbdb ("rhel: add ovn-db@.service systemd-unit")
Signed-off-by: Vladislav Odintsov 
---
 rhel/usr_lib_systemd_system_ovn-db@.service| 2 +-
 rhel/usr_lib_systemd_system_ovn-northd.service | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/rhel/usr_lib_systemd_system_ovn-db@.service 
b/rhel/usr_lib_systemd_system_ovn-db@.service
index 98556a673..c835e4967 100644
--- a/rhel/usr_lib_systemd_system_ovn-db@.service
+++ b/rhel/usr_lib_systemd_system_ovn-db@.service
@@ -33,7 +33,7 @@ EnvironmentFile=-/etc/sysconfig/ovn-%i
 ExecStartPre=-/usr/bin/chown -R ${OVN_USER_ID} ${OVN_DBDIR}
 ExecStart=/usr/share/ovn/scripts/ovn-ctl \
   --ovn-user=${OVN_USER_ID} start_%i_ovsdb $OPTIONS $ovn_%i_opts
-ExecStop=/usr/share/ovn/scripts/ovn-ctl stop_%i_ovsdb
+ExecStop=/usr/share/ovn/scripts/ovn-ctl stop_%i_ovsdb $OPTIONS $ovn_%i_opts
 
 [Install]
 WantedBy=multi-user.target
diff --git a/rhel/usr_lib_systemd_system_ovn-northd.service 
b/rhel/usr_lib_systemd_system_ovn-northd.service
index d281f861c..6c4c6621c 100644
--- a/rhel/usr_lib_systemd_system_ovn-northd.service
+++ b/rhel/usr_lib_systemd_system_ovn-northd.service
@@ -26,7 +26,7 @@ EnvironmentFile=-/etc/sysconfig/ovn-northd
 ExecStartPre=-/usr/bin/chown -R ${OVN_USER_ID} ${OVN_DBDIR}
 ExecStart=/usr/share/ovn/scripts/ovn-ctl \
   --ovn-user=${OVN_USER_ID} start_northd $OVN_NORTHD_OPTS
-ExecStop=/usr/share/ovn/scripts/ovn-ctl stop_northd
+ExecStop=/usr/share/ovn/scripts/ovn-ctl stop_northd $OVN_NORTHD_OPTS
 
 [Install]
 WantedBy=multi-user.target
-- 
2.36.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC] dpdk: Allow retaining cap_sys_rawio privileges

2023-02-27 Thread Aaron Conole
Gaetan Rivet  writes:

>> -Original Message-
>> From: Aaron Conole mailto:acon...@redhat.com>>
>> Date: Wednesday 22 February 2023 at 18:11
>> To: "d...@openvswitch.org "
>> mailto:d...@openvswitch.org>>
>> Cc: Eli Britstein mailto:el...@nvidia.com>>,
>> Gaetan Rivet mailto:gaet...@nvidia.com>>, Ilya
>> Maximets mailto:i.maxim...@ovn.org>>, Maxime
>> Coquelin > >, Jason Gunthorpe
>> mailto:j...@nvidia.com>>, Majd Dibbiny
>> mailto:m...@nvidia.com>>, David Marchand
>> mailto:david.march...@redhat.com>>
>> Subject: Re: [ovs-dev] [RFC] dpdk: Allow retaining cap_sys_rawio privileges
>>
>> Apologies - I mis-typed Gaetan's email when I entered it into my mail
>> file. CC'd correctly on this email (but I can resend the patch, if you
>> think it is better).
>>
>> Aaron Conole mailto:acon...@redhat.com>> writes:
>>
>> > Open vSwitch generally tries to let the underlying operating system
>> > managed the low level details of hardware, for example DMA mapping,
>> > bus arbitration, etc. However, when using DPDK, the underlying
>> > operating system yields control of many of these details to userspace
>> > for management.
>> >
>> > In the case of some DPDK port drivers, configuring rte_flow or even
>> > allocating resources may require access to iopl/ioperm calls, which
>> > are guarded by the CAP_SYS_RAWIO privilege on linux systems. These
>> > calls are dangerous, and can allow a process to completely compromise
>> > a system. However, they are needed in the case of some userspace
>> > driver code which manages the hardware (for example, the mlx
>> > implementation of backend support for rte_flow).
>> >
>> > Here, we create an opt-in flag passed to the command line to allow
>> > this access. We need to do this before ever accessing the database,
>> > because we want to drop all privileges asap, and cannot wait for
>> > a connection to the database to be established and functional before
>> > dropping. There may be distribution specific ways to do capability
>> > management as well (using for example, systemd), but they are not
>> > as universal to the vswitchd as a flag.
>> >
>> > Signed-off-by: Aaron Conole > > >
>> > ---
>
> Hello Aaron,
>
> Thank you for proposing this change.
>
> If users want to use mlx5 ports with OVS without being root, this capability 
> will be required.
> Adding a vswitchd option to enable it seems the simplest way to offer some 
> control.

Yes, this seems to be the best way as far as I can tell.

> If vendor-specific logic was allowed, I could add a function to detect 
> Mellanox ports
> and enable this option in that case. Otherwise we can document as much as 
> possible,
> but hopefully the errors will be made clear from the DPDK side because it will
> be hard to explain those errors without vendor-specific code.

Unfortunately, there isn't a good place to put such vendor specific
logic.  I thought about it a bit.  The vswitchd can't assume that just
because an mlx5 card is present that it will be used as part of DPDK
port.  We cannot wait until connecting to DB to preserve this
capability, either, since the privileges drops would happen *after* a
stable DB connection and read (which is far too long).

> Regarding the implementation, I had a few comments:

Great!

>> @@ -877,11 +890,11 @@ daemon_become_new_user__(bool access_datapath)
>>   * However, there in case the user switch needs to be done
>>   * before daemonize_start(), the following API can be used.  */
>>  void
>> -daemon_become_new_user(bool access_datapath)
>> +daemon_become_new_user(bool access_datapath, bool access_hardware_ports)
>>  {
>>  assert_single_threaded();
>>  if (switch_user) {
>> -daemon_become_new_user__(access_datapath);
>> +daemon_become_new_user__(access_datapath, access_hardware_ports);
>>  /* daemonize_start() should not switch user again. */
>>  switch_user = false;
>>  }
>
> Grepping for daemon_become_new_user, I see the following that might need a 
> change:
> lib/daemon-windows.c:529:daemon_become_new_user(bool access_datapath 
> OVS_UNUSED)

Yes, I think it needs to change here as well.  My appveyor run for some
reason passed
(https://ci.appveyor.com/project/orgcandman/ovs/builds/46307713) and
that is concerning.

>> diff --git a/vswitchd/ovs-vswitchd.c b/vswitchd/ovs-vswitchd.c
>> index 407bfc60e..f62d1ad75 100644
>> --- a/vswitchd/ovs-vswitchd.c
>> +++ b/vswitchd/ovs-vswitchd.c
>> @@ -60,6 +60,9 @@ VLOG_DEFINE_THIS_MODULE(vswitchd);
>>   * the kernel from paging any of its memory to disk. */
>>  static bool want_mlockall;
>>
>> +/* --hw-access: If set, retains CAP_SYS_RAWIO privileges.  */
>> +static bool hw_access;
>> +
>>  static unixctl_cb_func ovs_vswitchd_exit;
>>
>>  static char *parse_options(int argc, char *argv[], char **unixctl_path);
>> @@ -89,7 +92,7 @@ main(int argc, char *argv[])
>>  remote = parse_options(argc, argv, _path);
>>  

Re: [ovs-dev] [PATCH] system-traffic.at: Add icmp error tests while dnatting address and port.

2023-02-27 Thread Paolo Valerio
Ilya Maximets  writes:

> On 2/27/23 12:08, Paolo Valerio wrote:
>> The two tests verify, for both icmp and icmpv6, that the correct port
>> translation happen in the inner packet in the case an error is
>> received in the reply direction.
>> 
>> Signed-off-by: Paolo Valerio 
>> ---
>>  tests/system-traffic.at |   72 
>> +++
>>  1 file changed, 72 insertions(+)
>> 
>> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
>> index 3a15b88a2..02fd0ee1b 100644
>> --- a/tests/system-traffic.at
>> +++ b/tests/system-traffic.at
>> @@ -3561,6 +3561,42 @@ AT_CHECK([ovs-appctl dpctl/dump-conntrack | 
>> FORMAT_CT(172.16.0.3)], [0], [dnl
>>  OVS_TRAFFIC_VSWITCHD_STOP
>>  AT_CLEANUP
>>  
>> +AT_SETUP([conntrack - ICMP related NAT with single port])
>> +AT_SKIP_IF([test $HAVE_NC = no])
>> +AT_SKIP_IF([test $HAVE_TCPDUMP = no])
>> +CHECK_CONNTRACK()
>> +CHECK_CONNTRACK_NAT()
>> +OVS_TRAFFIC_VSWITCHD_START()
>> +
>> +ADD_NAMESPACES(at_ns0, at_ns1)
>> +
>> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", "f0:00:00:01:01:01")
>> +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24", "f0:00:00:01:01:02")
>> +
>> +NS_CHECK_EXEC([at_ns0], [ip neigh add 10.1.1.240 lladdr f0:00:00:01:01:02 
>> dev p0])
>> +NS_CHECK_EXEC([at_ns1], [ip neigh add 10.1.1.1 lladdr f0:00:00:01:01:01 dev 
>> p1])
>> +
>> +AT_DATA([flows.txt], [dnl
>> +table=0,ip,ct_state=-trk,actions=ct(table=0,nat)
>> +table=0,in_port=ovs-p0,udp,ct_state=+trk+new,actions=ct(commit,nat(dst=10.1.1.2:8080)),ovs-p1
>> +table=0,in_port=ovs-p1,ct_state=+trk+rel+rpl,icmp,actions=ovs-p0
>> +])
>> +
>> +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
>> +
>> +rm p0.pcap
>> +NETNS_DAEMONIZE([at_ns0], [tcpdump -l -U -i p0 -w p0.pcap 2>tcpdump0_err], 
>> [tcpdump0.pid])
>> +NS_CHECK_EXEC([at_ns0], [bash -c "echo dest_unreach | nc $NC_EOF_OPT -p 
>> 1234 -u 10.1.1.240 80"])
>> +
>> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.1," | 
>> sort], [0], [dnl
>> +udp,orig=(src=10.1.1.1,dst=10.1.1.240,sport=1234,dport=80),reply=(src=10.1.1.2,dst=10.1.1.1,sport=8080,dport=1234)
>> +])
>> +
>> +OVS_WAIT_UNTIL([ovs-pcap p0.pcap | grep -Eq 
>> "f0010101f0010102080045c00045[[[:xdigit:]]]{4}4001[[[:xdigit:]]]{4}0a0101f00a010101030314164529[[[:xdigit:]]]{4}40004011[[[:xdigit:]]]{4}0a0101010a0101f004d2005000156b24646573745f756e72656163680a"])
>> +
>> +OVS_TRAFFIC_VSWITCHD_STOP
>> +AT_CLEANUP
>> +
>>  AT_SETUP([conntrack - IPv4 fragmentation])
>>  CHECK_CONNTRACK()
>>  OVS_TRAFFIC_VSWITCHD_START()
>> @@ -6555,6 +6591,42 @@ 
>> udp,orig=(src=fc00::1,dst=fc00::2,sport=,dport=),reply=(src=fc
>>  OVS_TRAFFIC_VSWITCHD_STOP
>>  AT_CLEANUP
>>  
>> +AT_SETUP([conntrack - ICMPv6 related NAT with single port])
>
> Looks like this test is failing Intel CI.
> Could you, please, check?
>

thanks, I sent a v2. It should fix the problem.

> Best regards, Ilya Maximets.
>
>> +AT_SKIP_IF([test $HAVE_NC = no])
>> +AT_SKIP_IF([test $HAVE_TCPDUMP = no])
>> +CHECK_CONNTRACK()
>> +CHECK_CONNTRACK_NAT()
>> +OVS_TRAFFIC_VSWITCHD_START()
>> +
>> +ADD_NAMESPACES(at_ns0, at_ns1)
>> +
>> +ADD_VETH(p0, at_ns0, br0, "fc00::1/96", "f0:00:00:01:01:01", [], "nodad")
>> +ADD_VETH(p1, at_ns1, br0, "fc00::2/96", "f0:00:00:01:01:02", [], "nodad")
>> +
>> +NS_CHECK_EXEC([at_ns0], [ip -6 neigh add fc00::240 lladdr f0:00:00:01:01:02 
>> dev p0])
>> +NS_CHECK_EXEC([at_ns1], [ip -6 neigh add fc00::1 lladdr f0:00:00:01:01:01 
>> dev p1])
>> +
>> +AT_DATA([flows.txt], [dnl
>> +table=0,ipv6,ct_state=-trk,actions=ct(table=0,nat)
>> +table=0,in_port=ovs-p0,udp6,ct_state=+trk+new,actions=ct(commit,nat(dst=[[fc00::2]]:8080)),ovs-p1
>> +table=0,in_port=ovs-p1,ct_state=+trk+rel+rpl,icmp6,actions=ovs-p0
>> +])
>> +
>> +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
>> +
>> +rm p0.pcap
>> +NETNS_DAEMONIZE([at_ns0], [tcpdump -l -U -i p0 -w p0.pcap 2>tcpdump0_err], 
>> [tcpdump0.pid])
>> +NS_CHECK_EXEC([at_ns0], [bash -c "echo dest_unreach | nc -6 $NC_EOF_OPT -p 
>> 1234 -u fc00::240 80"])
>> +
>> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=fc00::1," | 
>> sort], [0], [dnl
>> +udp,orig=(src=fc00::1,dst=fc00::240,sport=1234,dport=80),reply=(src=fc00::2,dst=fc00::1,sport=8080,dport=1234)
>> +])
>> +
>> +OVS_WAIT_UNTIL([ovs-pcap p0.pcap | grep -Eq 
>> "f0010101f001010286dd60[[[:xdigit:]]]{6}00453a40fc000240fc010104[[[:xdigit:]]]{4}60[[[:xdigit:]]]{6}00151140fc01fc00024004d20050001587d4646573745f756e72656163680a"])
>> +
>> +OVS_TRAFFIC_VSWITCHD_STOP
>> +AT_CLEANUP
>> +
>>  AT_SETUP([conntrack - IPv6 FTP with SNAT])
>>  AT_SKIP_IF([test $HAVE_FTP = no])
>>  CHECK_CONNTRACK()
>> 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2] system-traffic.at: Add icmp error tests while dnatting address and port.

2023-02-27 Thread Paolo Valerio
The two tests verify, for both icmp and icmpv6, that the correct port
translation happen in the inner packet in the case an error is
received in the reply direction.

Signed-off-by: Paolo Valerio 
---
v2:
- added missing OVS_WAIT_UNTIL for tcpdump
- removed nc dependency and replaced with packet-out
---
 tests/system-traffic.at |   74 +++
 1 file changed, 74 insertions(+)

diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 3a15b88a2..380372430 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -3561,6 +3561,43 @@ AT_CHECK([ovs-appctl dpctl/dump-conntrack | 
FORMAT_CT(172.16.0.3)], [0], [dnl
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([conntrack - ICMP related NAT with single port])
+AT_SKIP_IF([test $HAVE_TCPDUMP = no])
+CHECK_CONNTRACK()
+CHECK_CONNTRACK_NAT()
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", "f0:00:00:01:01:01")
+ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24", "f0:00:00:01:01:02")
+
+AT_DATA([flows.txt], [dnl
+table=0,ip,ct_state=-trk,actions=ct(table=0,nat)
+table=0,in_port=ovs-p0,ct_state=+trk+new,udp,actions=ct(commit,nat(dst=10.1.1.2:8080)),ovs-p1
+table=0,in_port=ovs-p1,ct_state=+trk+rel+rpl,icmp,actions=ovs-p0
+])
+
+AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
+
+rm p0.pcap
+OVS_DAEMONIZE([tcpdump -l -U -i ovs-p0 -w p0.pcap 2> tcpdump0_err], 
[tcpdump0.pid])
+OVS_WAIT_UNTIL([grep "listening" tcpdump0_err])
+
+dnl Send UDP packet from 10.1.1.1:1234 to 10.1.1.240:80
+AT_CHECK([ovs-ofctl packet-out br0 
"in_port=ovs-p0,packet=f0010102f00101010800452944c140004011df100a0101010a0101f004d2005000156b24646573745f756e72656163680a,actions=resubmit(,0)"])
+dnl Send "destination unreachable" response
+AT_CHECK([ovs-ofctl packet-out br0 
"in_port=ovs-p1,packet=f0010101f0010102080045c000456a374001f9bc0a0101020a01010103031328452944c140004011dffe0a0101010a01010204d21f9000154cd2646573745f756e72656163680a,actions=resubmit(,0)"])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.1," | 
sort], [0], [dnl
+udp,orig=(src=10.1.1.1,dst=10.1.1.240,sport=1234,dport=80),reply=(src=10.1.1.2,dst=10.1.1.1,sport=8080,dport=1234)
+])
+
+OVS_WAIT_UNTIL([ovs-pcap p0.pcap | grep -q 
"f0010101f0010102080045c000456a374001f8ce0a0101f00a01010103031416452944c140004011df100a0101010a0101f004d2005000156b24646573745f756e72656163680a"])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
 AT_SETUP([conntrack - IPv4 fragmentation])
 CHECK_CONNTRACK()
 OVS_TRAFFIC_VSWITCHD_START()
@@ -6555,6 +6592,43 @@ 
udp,orig=(src=fc00::1,dst=fc00::2,sport=,dport=),reply=(src=fc
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([conntrack - ICMPv6 related NAT with single port])
+AT_SKIP_IF([test $HAVE_TCPDUMP = no])
+CHECK_CONNTRACK()
+CHECK_CONNTRACK_NAT()
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH(p0, at_ns0, br0, "fc00::1/96", "f0:00:00:01:01:01", [], "nodad")
+ADD_VETH(p1, at_ns1, br0, "fc00::2/96", "f0:00:00:01:01:02", [], "nodad")
+
+AT_DATA([flows.txt], [dnl
+table=0,ipv6,ct_state=-trk,actions=ct(table=0,nat)
+table=0,in_port=ovs-p0,ct_state=+trk+new,udp6,actions=ct(commit,nat(dst=[[fc00::2]]:8080)),ovs-p1
+table=0,in_port=ovs-p1,ct_state=+trk+rel+rpl,icmp6,actions=ovs-p0
+])
+
+AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
+
+rm p0.pcap
+OVS_DAEMONIZE([tcpdump -l -U -i ovs-p0 -w p0.pcap 2> tcpdump0_err], 
[tcpdump0.pid])
+OVS_WAIT_UNTIL([grep "listening" tcpdump0_err])
+
+dnl Send UDP packet from [[fc00::1]]:1234 to [[fc00::240]]:80
+AT_CHECK([ovs-ofctl packet-out br0 
"in_port=ovs-p0,packet=f0010102f001010186dd60066ced00151140fc01fc00024004d20050001587d4646573745f756e72656163680a,actions=resubmit(,0)"])
+dnl Send "destination unreachable" response
+AT_CHECK([ovs-ofctl packet-out br0 
"in_port=ovs-p1,packet=f0010101f001010286dd600733ed00453a40fc02fc010104285560066ced00151140fc01fc0204d21f9000156ad2646573745f756e72656163680a,actions=resubmit(,0)"])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=fc00::1," | sort], 
[0], [dnl
+udp,orig=(src=fc00::1,dst=fc00::240,sport=1234,dport=80),reply=(src=fc00::2,dst=fc00::1,sport=8080,dport=1234)
+])
+
+OVS_WAIT_UNTIL([ovs-pcap p0.pcap | grep -q 
"f0010101f001010286dd600733ed00453a40fc000240fc010104261760066ced00151140fc01fc00024004d20050001587d4646573745f756e72656163680a"])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
 AT_SETUP([conntrack - IPv6 FTP with SNAT])
 AT_SKIP_IF([test $HAVE_FTP = no])
 CHECK_CONNTRACK()

___
dev mailing list
d...@openvswitch.org

Re: [ovs-dev] [PATCH v5 2/2] ofproto-dpif-upcall: Include hardware offloaded flows in total flows.

2023-02-27 Thread Simon Horman
On Mon, Feb 27, 2023 at 04:30:11PM +0100, Eelco Chaudron wrote:
> The revalidator process uses the internal call udpif_get_n_flows()
> to get the total number of flows installed in the system. It uses
> this value for various decisions on flow installation and removal.
> With the tc offload this values is incorrect, as the hardware
> offloaded are not included. With rte_flow offload this is not a
> problem as dpif netdev keeps both in sync.
> 
> This patch will include the hardware offloaded flows if the
> underlying dpif implementation is not syncing them.
> 
> Signed-off-by: Eelco Chaudron 

Reviewed-by: Simon Horman 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v5 1/2] ofproto-dpif-upcall: Reset ukey's last stats value if the datapath changed.

2023-02-27 Thread Simon Horman
On Mon, Feb 27, 2023 at 04:29:26PM +0100, Eelco Chaudron wrote:
> When the ukey's action set changes, it could cause the flow to use a
> different datapath, for example, when it moves from tc to kernel.
> This will cause the the cached previous datapath statistics to be used.
> 
> This change will reset the cached statistics when a change in
> datapath is discovered.
> 
> Signed-off-by: Eelco Chaudron 

Reviewed-by: Simon Horman 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] test: move check for tc ingress pps support to test script

2023-02-27 Thread Ilya Maximets
On 2/24/23 14:01, Simon Horman wrote:
> Move check for tc ingress pps support to from aclocal to test script
> 
> This has several problems:
> 
> 1. Stderror from failing commands is output when executing
>various make targets.
> 
> 2. There are various failure conditions that lead
>to veth0 and veth1 being created by not cleaned up.
> 
> 3. The check seems to execute for many make targets.
>And it attempts to temporarily modify system state.
>This seems inappropriate.
> 
> 4. veth0 and veth1 seem far too generic and could easily
>conflict with other parts of the system.
> 
> All these problems are addressed by this patch.
> 
> Signed-off-by: Simon Horman 
> Reviewed-by: Louis Peens 
> ---
> v2
> * As suggested by Ilya, use:
>   - ovs_tc_pps rather than veth as prefix for interface names
>   - on_exit for cleanup, rather than handling each case
>   - dnl rather than '/' for multi-line commands
> ---
>  tests/atlocal.in | 11 ---
>  tests/system-offloads-traffic.at | 14 --
>  2 files changed, 12 insertions(+), 13 deletions(-)
> 
> diff --git a/tests/atlocal.in b/tests/atlocal.in
> index e02248f6f829..859668586299 100644
> --- a/tests/atlocal.in
> +++ b/tests/atlocal.in
> @@ -172,17 +172,6 @@ fi
>  # Set HAVE_TC
>  find_command tc
>  
> -# When HAVE_TC=yes, check if the current tc supports adding pps filter
> -SUPPORT_TC_INGRESS_PPS="no"
> -if test $HAVE_TC="yes"; then
> -ip link add veth0 type veth peer name veth1
> -tc qdisc add dev veth0 handle : ingress
> -if tc filter add dev veth0 parent : u32 match u32 0 0 police 
> pkts_rate 100 pkts_burst 10; then
> -SUPPORT_TC_INGRESS_PPS="yes"
> -fi
> -ip link del veth0
> -fi
> -
>  # Set HAVE_TCPDUMP
>  find_command tcpdump
>  
> diff --git a/tests/system-offloads-traffic.at 
> b/tests/system-offloads-traffic.at
> index f2bf9c0639aa..1d21da5f196d 100644
> --- a/tests/system-offloads-traffic.at
> +++ b/tests/system-offloads-traffic.at
> @@ -18,6 +18,16 @@ m4_define([OVS_CHECK_ACTIONS], [
> [0], [$1])
>  ])
>  
> +m4_define([CHECK_TC_INGRESS_PPS],
> +[
> +AT_SKIP_IF([test $HAVE_TC = "no"])
> +AT_CHECK([ip link add ovs_tc_pps0 type veth peer name ovs_tc_pps1 || dnl
> +  exit 77])

I'd break the line before '||', but that's not very important.

And a capital letter and a period in the subject line might be
good to add just for consistency.

In any case, the change seems correct to me:

Acked-by: Ilya Maximets 

Thanks!
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v5] ofproto-dpif-upcall: Wait for valid hw flow stats before applying min-revalidate-pps

2023-02-27 Thread Simon Horman
On Mon, Feb 27, 2023 at 03:58:04PM +0100, Eelco Chaudron wrote:
> Depending on the driver implementation, it can take from 0.2 seconds
> up to 2 seconds before offloaded flow statistics are updated. This is
> true for both TC and rte_flow-based offloading. This is causing a
> problem with min-revalidate-pps, as old statistic values are used
> during this period.
> 
> This fix will wait for at least 2 seconds, by default, before assuming no
> packets where received during this period.
> 
> Reviewed-by: Simon Horman 
> Signed-off-by: Eelco Chaudron 
> ---
> 
> Changes:
> - v2: Use existing ukey->offloaded so the revalidate_missed_dp_flow case is
>   also covered.
> - v3: Added OVS_REQUIRES(ukey->mutex) to should_revalidate() to keep
>   thread-safety-analysis happy.
> - v4: Add a configurable option.
>   After looking at multiple vendor implementation for both TC and
>   rte_flow I came to the conclusion that the delay is roughly between
>   0.2 and 2 seconds. Updated commit message.
> - v5: Rebased to latest upstream master.
>   Made the key parameter const in should_revalidate().

Thanks, LGTM.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v5 2/2] ofproto-dpif-upcall: Include hardware offloaded flows in total flows.

2023-02-27 Thread Eelco Chaudron
The revalidator process uses the internal call udpif_get_n_flows()
to get the total number of flows installed in the system. It uses
this value for various decisions on flow installation and removal.
With the tc offload this values is incorrect, as the hardware
offloaded are not included. With rte_flow offload this is not a
problem as dpif netdev keeps both in sync.

This patch will include the hardware offloaded flows if the
underlying dpif implementation is not syncing them.

Signed-off-by: Eelco Chaudron 
---
v3: Added this patch to patch set.
v4: No changes to this patch.

 ofproto/ofproto-dpif-upcall.c |   11 +++
 1 file changed, 11 insertions(+)

diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index e24d18802..26044cf71 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -785,6 +785,17 @@ udpif_get_n_flows(struct udpif *udpif)
 atomic_store_relaxed(>n_flows_timestamp, now);
 dpif_get_dp_stats(udpif->dpif, );
 flow_count = stats.n_flows;
+
+if (!dpif_synced_dp_layers(udpif->dpif)) {
+/* If the dpif layer does not sync the flows, we need to include
+ * the hardware offloaded flows separately. */
+uint64_t hw_flows;
+
+if (!dpif_get_n_offloaded_flows(udpif->dpif, _flows)) {
+flow_count += hw_flows;
+}
+}
+
 atomic_store_relaxed(>n_flows, flow_count);
 ovs_mutex_unlock(>n_flows_mutex);
 } else {

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v5 1/2] ofproto-dpif-upcall: Reset ukey's last stats value if the datapath changed.

2023-02-27 Thread Eelco Chaudron
When the ukey's action set changes, it could cause the flow to use a
different datapath, for example, when it moves from tc to kernel.
This will cause the the cached previous datapath statistics to be used.

This change will reset the cached statistics when a change in
datapath is discovered.

Signed-off-by: Eelco Chaudron 
---
v2: Updated commit message to explain behavior.
v3: Added a test case.
Added dpif API to query the dpif for datapath layers synchronization.
Added coverage counter to identify potential problem.
v4: Fixed some spelling, moved common code to a macro, and moved
the synced_dp_layers function pointer to a bool.

 lib/dpif-netdev.c|1 +
 lib/dpif-netlink.c   |1 +
 lib/dpif-provider.h  |8 +
 lib/dpif.c   |5 +++
 lib/dpif.h   |1 +
 ofproto/ofproto-dpif-upcall.c|   41 +-
 tests/system-offloads-traffic.at |   60 ++
 7 files changed, 115 insertions(+), 2 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index c9f7179c3..aed2c8fbb 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -9616,6 +9616,7 @@ dpif_netdev_bond_stats_get(struct dpif *dpif, uint32_t 
bond_id,
 const struct dpif_class dpif_netdev_class = {
 "netdev",
 true,   /* cleanup_required */
+true,   /* synced_dp_layers */
 dpif_netdev_init,
 dpif_netdev_enumerate,
 dpif_netdev_port_open_type,
diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index 026b0daa8..80225e9e7 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -4515,6 +4515,7 @@ dpif_netlink_cache_set_size(struct dpif *dpif_, uint32_t 
level, uint32_t size)
 const struct dpif_class dpif_netlink_class = {
 "system",
 false,  /* cleanup_required */
+false,  /* synced_dp_layers */
 NULL,   /* init */
 dpif_netlink_enumerate,
 NULL,
diff --git a/lib/dpif-provider.h b/lib/dpif-provider.h
index 12477a24f..b8ead8a02 100644
--- a/lib/dpif-provider.h
+++ b/lib/dpif-provider.h
@@ -127,6 +127,14 @@ struct dpif_class {
  * datapaths that can not exist without it (e.g. netdev datapath).  */
 bool cleanup_required;
 
+/* If 'true' the specific dpif implementation synchronizes the various
+ * datapath implementation layers, i.e., the dpif's layer in combination
+ * with the underlying netdev offload layers. For example, dpif-netlink
+ * does not sync its kernel flows with the tc ones, i.e., only one gets
+ * installed. On the other hand, dpif-netdev installs both flows,
+ * internally keeps track of both, and represents them as one. */
+bool synced_dp_layers;
+
 /* Called when the dpif provider is registered, typically at program
  * startup.  Returning an error from this function will prevent any
  * datapath with this class from being created.
diff --git a/lib/dpif.c b/lib/dpif.c
index fe4db83fb..5ed8aef17 100644
--- a/lib/dpif.c
+++ b/lib/dpif.c
@@ -2109,3 +2109,8 @@ dpif_cache_set_size(struct dpif *dpif, uint32_t level, 
uint32_t size)
? dpif->dpif_class->cache_set_size(dpif, level, size)
: EOPNOTSUPP;
 }
+
+bool dpif_synced_dp_layers(struct dpif *dpif)
+{
+return dpif->dpif_class->synced_dp_layers;
+}
diff --git a/lib/dpif.h b/lib/dpif.h
index 6cb4dae6d..129cbf6a1 100644
--- a/lib/dpif.h
+++ b/lib/dpif.h
@@ -939,6 +939,7 @@ int dpif_get_pmds_for_port(const struct dpif * dpif, 
odp_port_t port_no,
 char *dpif_get_dp_version(const struct dpif *);
 bool dpif_supports_tnl_push_pop(const struct dpif *);
 bool dpif_supports_explicit_drop_action(const struct dpif *);
+bool dpif_synced_dp_layers(struct dpif *);
 
 /* Log functions. */
 struct vlog_module;
diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index 31ac02d11..e24d18802 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -48,17 +48,20 @@
 
 #define UPCALL_MAX_BATCH 64
 #define REVALIDATE_MAX_BATCH 50
+#define UINT64_THREE_QUARTERS (UINT64_MAX / 4 * 3)
 
 VLOG_DEFINE_THIS_MODULE(ofproto_dpif_upcall);
 
 COVERAGE_DEFINE(dumped_duplicate_flow);
 COVERAGE_DEFINE(dumped_new_flow);
 COVERAGE_DEFINE(handler_duplicate_upcall);
-COVERAGE_DEFINE(upcall_ukey_contention);
-COVERAGE_DEFINE(upcall_ukey_replace);
 COVERAGE_DEFINE(revalidate_missed_dp_flow);
+COVERAGE_DEFINE(ukey_dp_change);
+COVERAGE_DEFINE(ukey_invalid_stat_reset);
 COVERAGE_DEFINE(upcall_flow_limit_hit);
 COVERAGE_DEFINE(upcall_flow_limit_kill);
+COVERAGE_DEFINE(upcall_ukey_contention);
+COVERAGE_DEFINE(upcall_ukey_replace);
 
 /* A thread that reads upcalls from dpif, forwards each upcall's packet,
  * and possibly sets up a kernel flow as a cache. */
@@ -288,6 +291,7 @@ struct udpif_key {
 
 struct ovs_mutex mutex;   /* Guards the following. */
 struct dpif_flow_stats 

Re: [ovs-dev] [PATCH v3 1/2] ofproto-dpif-upcall: Reset ukey's last stats value if the datapath changed.

2023-02-27 Thread Eelco Chaudron


On 21 Feb 2023, at 12:15, Simon Horman wrote:

Thanks for the review Simon, see responses inline. Will send out a v4 soon.

//Eelco

> On Fri, Feb 03, 2023 at 12:12:12PM +0100, Eelco Chaudron wrote:
>> When the ukey's action set changes, it could caus the flow to use a
>
> nit: s/caus/cause/

Will fix

>> different datapath, for example, when it moves from tc to kernel.
>> This will cause the the cached previous datapath statistics to be used.
>>
>> This change will reset the cached statistics when a change in
>> datapath is discovered.
>>
>> Signed-off-by: Eelco Chaudron 
>
> ...
>
>> diff --git a/lib/dpif-provider.h b/lib/dpif-provider.h
>> index 12477a24f..2795ccc81 100644
>> --- a/lib/dpif-provider.h
>> +++ b/lib/dpif-provider.h
>> @@ -338,6 +338,15 @@ struct dpif_class {
>>  int (*offload_stats_get)(struct dpif *dpif,
>>   struct netdev_custom_stats *stats);
>>
>> +/* This function will return true if the specific dpif implementation
>> + * synchronizes the various datapath implementation layers, i.e., the
>> + * dpif's layer in combination with the underlying netdev offload 
>> layers.
>> + * For example, dpif-netlink does not sync its kernel flows with the tc
>> + * ones, i.e., only one gets installed. On the other hand, dpif-netdev
>> + * installs both flows, and internally keeps track of both, and 
>> represents
>
> nit: s/and internally/internally/

Will fix

>
>> + * them as one. */
>> +bool (*synced_dp_layers)(struct dpif *dpif);
>> +
>
> Both implementations of this callback simply return a boolean,
> without an logic. I don't know if it makes sense, but it occurs
> to me that synced_dp_layers could be a bool rather than a function pointer.

Changed to a bool as you suggested.

> ...
>
>> diff --git a/lib/dpif.h b/lib/dpif.h
>> index 6cb4dae6d..129cbf6a1 100644
>> --- a/lib/dpif.h
>> +++ b/lib/dpif.h
>
> ...
>
>> @@ -54,11 +54,14 @@ VLOG_DEFINE_THIS_MODULE(ofproto_dpif_upcall);
>>  COVERAGE_DEFINE(dumped_duplicate_flow);
>>  COVERAGE_DEFINE(dumped_new_flow);
>>  COVERAGE_DEFINE(handler_duplicate_upcall);
>> -COVERAGE_DEFINE(upcall_ukey_contention);
>> -COVERAGE_DEFINE(upcall_ukey_replace);
>>  COVERAGE_DEFINE(revalidate_missed_dp_flow);
>> +COVERAGE_DEFINE(ukey_dp_change);
>> +COVERAGE_DEFINE(ukey_invalid_stat_reset);
>>  COVERAGE_DEFINE(upcall_flow_limit_hit);
>>  COVERAGE_DEFINE(upcall_flow_limit_kill);
>> +COVERAGE_DEFINE(upcall_ukey_contention);
>> +COVERAGE_DEFINE(upcall_ukey_replace);
>> +
>>
>
> nit: now there are two blank lines here.

Will fix

>>  /* A thread that reads upcalls from dpif, forwards each upcall's packet,
>>   * and possibly sets up a kernel flow as a cache. */
>> @@ -288,6 +291,7 @@ struct udpif_key {
>>
>>  struct ovs_mutex mutex;   /* Guards the following. */
>>  struct dpif_flow_stats stats OVS_GUARDED; /* Last known stats.*/
>> +const char *dp_layer OVS_GUARDED; /* Last known dp_layer. */
>>  long long int created OVS_GUARDED;/* Estimate of creation time. 
>> */
>>  uint64_t dump_seq OVS_GUARDED;/* Tracks udpif->dump_seq. */
>>  uint64_t reval_seq OVS_GUARDED;   /* Tracks udpif->reval_seq. */
>> @@ -1771,6 +1775,7 @@ ukey_create__(const struct nlattr *key, size_t key_len,
>>  ukey->created = ukey->flow_time = time_msec();
>>  memset(>stats, 0, sizeof ukey->stats);
>>  ukey->stats.used = used;
>> +ukey->dp_layer = NULL;
>>  ukey->xcache = NULL;
>>
>>  ukey->offloaded = false;
>> @@ -2356,6 +2361,13 @@ revalidate_ukey(struct udpif *udpif, struct udpif_key 
>> *ukey,
>>  ? stats->n_bytes - ukey->stats.n_bytes
>>  : 0);
>>
>> +if (stats->n_packets < ukey->stats.n_packets &&
>> +ukey->stats.n_packets < (UINT64_MAX / 4 * 3)) {
>
> Is this logic used elsewhere in the code-base?
> Perhaps it warrants a helper.

Added a MACRO for this…

>> +/* Report cases where the packet counter is lower than the previous
>> + * instance, but exclude the potential wrapping of an uint64_t. */
>> +COVERAGE_INC(ukey_invalid_stat_reset);
>> +}
>> +
>>  if (need_revalidate) {
>>  if (should_revalidate(udpif, push.n_packets, ukey->stats.used)) {
>>  if (!ukey->xcache) {
>> @@ -2469,6 +2481,15 @@ push_dp_ops(struct udpif *udpif, struct ukey_op *ops, 
>> size_t n_ops)
>>  push->tcp_flags = stats->tcp_flags | op->ukey->stats.tcp_flags;
>>  push->n_packets = stats->n_packets - op->ukey->stats.n_packets;
>>  push->n_bytes = stats->n_bytes - op->ukey->stats.n_bytes;
>> +
>> +if (stats->n_packets < op->ukey->stats.n_packets &&
>> +op->ukey->stats.n_packets < (UINT64_MAX / 4 * 3)) {
>> +/* Report cases where the packet counter is lower than the
>> + * previous instance, but exclude the potential wrapping of 
>> an
>
> ... yes, here it is :)
>

[ovs-dev] [PATCH v5] ofproto-dpif-upcall: Wait for valid hw flow stats before applying min-revalidate-pps

2023-02-27 Thread Eelco Chaudron
Depending on the driver implementation, it can take from 0.2 seconds
up to 2 seconds before offloaded flow statistics are updated. This is
true for both TC and rte_flow-based offloading. This is causing a
problem with min-revalidate-pps, as old statistic values are used
during this period.

This fix will wait for at least 2 seconds, by default, before assuming no
packets where received during this period.

Reviewed-by: Simon Horman 
Signed-off-by: Eelco Chaudron 
---

Changes:
- v2: Use existing ukey->offloaded so the revalidate_missed_dp_flow case is
  also covered.
- v3: Added OVS_REQUIRES(ukey->mutex) to should_revalidate() to keep
  thread-safety-analysis happy.
- v4: Add a configurable option.
  After looking at multiple vendor implementation for both TC and
  rte_flow I came to the conclusion that the delay is roughly between
  0.2 and 2 seconds. Updated commit message.
- v5: Rebased to latest upstream master.
  Made the key parameter const in should_revalidate().

 ofproto/ofproto-dpif-upcall.c |   26 --
 ofproto/ofproto-provider.h|4 
 ofproto/ofproto.c |9 +
 ofproto/ofproto.h |2 ++
 vswitchd/bridge.c |3 +++
 vswitchd/vswitch.xml  |   12 
 6 files changed, 46 insertions(+), 10 deletions(-)

diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index fc94078cb..60c273a1d 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -2100,10 +2100,12 @@ ukey_delete(struct umap *umap, struct udpif_key *ukey)
 }
 
 static bool
-should_revalidate(const struct udpif *udpif, uint64_t packets,
-  long long int used)
+should_revalidate(const struct udpif *udpif, const struct udpif_key *ukey,
+  uint64_t packets)
+OVS_REQUIRES(ukey->mutex)
 {
 long long int metric, now, duration;
+long long int used = ukey->stats.used;
 
 if (!ofproto_min_revalidate_pps) {
 return true;
@@ -2134,8 +2136,12 @@ should_revalidate(const struct udpif *udpif, uint64_t 
packets,
 duration = now - used;
 metric = duration / packets;
 
-if (metric < 1000 / ofproto_min_revalidate_pps) {
-/* The flow is receiving more than min-revalidate-pps, so keep it. */
+if (metric < 1000 / ofproto_min_revalidate_pps ||
+(ukey->offloaded && duration < ofproto_offloaded_stats_delay)) {
+/* The flow is receiving more than min-revalidate-pps, so keep it.
+ * Or it's a hardware offloaded flow that might take up to X seconds
+ * to update its statistics. Until we are sure the statistics had a
+ * chance to be updated, also keep it. */
 return true;
 }
 return false;
@@ -2339,7 +2345,7 @@ static enum reval_result
 revalidate_ukey(struct udpif *udpif, struct udpif_key *ukey,
 const struct dpif_flow_stats *stats,
 struct ofpbuf *odp_actions, uint64_t reval_seq,
-struct recirc_refs *recircs, bool offloaded)
+struct recirc_refs *recircs)
 OVS_REQUIRES(ukey->mutex)
 {
 bool need_revalidate = ukey->reval_seq != reval_seq;
@@ -2358,7 +2364,7 @@ revalidate_ukey(struct udpif *udpif, struct udpif_key 
*ukey,
 : 0);
 
 if (need_revalidate) {
-if (should_revalidate(udpif, push.n_packets, ukey->stats.used)) {
+if (should_revalidate(udpif, ukey, push.n_packets)) {
 if (!ukey->xcache) {
 ukey->xcache = xlate_cache_new();
 } else {
@@ -2374,7 +2380,7 @@ revalidate_ukey(struct udpif *udpif, struct udpif_key 
*ukey,
 
 /* Stats for deleted flows will be attributed upon flow deletion. Skip. */
 if (result != UKEY_DELETE) {
-xlate_push_stats(ukey->xcache, , offloaded);
+xlate_push_stats(ukey->xcache, , ukey->offloaded);
 ukey->stats = *stats;
 ukey->reval_seq = reval_seq;
 }
@@ -2774,6 +2780,7 @@ revalidate(struct revalidator *revalidator)
 continue;
 }
 
+ukey->offloaded = f->attrs.offloaded;
 already_dumped = ukey->dump_seq == dump_seq;
 if (already_dumped) {
 /* The flow has already been handled during this flow dump
@@ -2805,8 +2812,7 @@ revalidate(struct revalidator *revalidator)
 result = UKEY_DELETE;
 } else {
 result = revalidate_ukey(udpif, ukey, , _actions,
- reval_seq, ,
- f->attrs.offloaded);
+ reval_seq, );
 }
 ukey->dump_seq = dump_seq;
 
@@ -2891,7 +2897,7 @@ revalidator_sweep__(struct revalidator *revalidator, bool 
purge)
 COVERAGE_INC(revalidate_missed_dp_flow);
 memcpy(, >stats, sizeof stats);
 result = revalidate_ukey(udpif, ukey, , 

Re: [ovs-dev] [PATCH v4] ofproto-dpif-upcall: Wait for valid hw flow stats before applying min-revalidate-pps

2023-02-27 Thread Eelco Chaudron


On 21 Feb 2023, at 11:47, Simon Horman wrote:

> On Tue, Feb 14, 2023 at 02:39:25PM +0100, Eelco Chaudron wrote:
>> Depending on the driver implementation, it can take from 0.2 seconds
>> up to 2 seconds before offloaded flow statistics are updated. This is
>> true for both TC and rte_flow-based offloading. This is causing a
>> problem with min-revalidate-pps, as old statistic values are used
>> during this period.
>>
>> This fix will wait for at least 2 seconds, by default, before assuming no
>> packets where received during this period.
>>
>> Signed-off-by: Eelco Chaudron 
>
> Some minor nits inline, but this looks good to me.
>
> Reviewed-by: Simon Horman 

Thanks for the review! I’ve fixed the nits and will send out a v5, keeping your 
reviewed-by.

Cheers,

Eelco

>
>> Changes:
>> - v2: Use existing ukey->offloaded so the revalidate_missed_dp_flow case is
>>   also covered.
>> - v3: Added OVS_REQUIRES(ukey->mutex) to should_revalidate() to keep
>>   thread-safety-analysis happy.
>> - v4: Add a configurable option.
>>   After looking at multiple vendor implementation for both TC and
>>   rte_flow I came to the conclusion that the delay is roughly between
>>   0.2 and 2 seconds. Updated commit message.
>> ---
>>  ofproto/ofproto-dpif-upcall.c |   26 --
>
> nit: there was some fuzz in ofproto-dpif-upcall. when applying this patch.
>
>>  ofproto/ofproto-provider.h|4 
>>  ofproto/ofproto.c |9 +
>>  ofproto/ofproto.h |2 ++
>>  vswitchd/bridge.c |3 +++
>>  vswitchd/vswitch.xml  |   12 
>>  6 files changed, 46 insertions(+), 10 deletions(-)
>>
>> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
>> index db7570ee2..24ac6806c 100644
>> --- a/ofproto/ofproto-dpif-upcall.c
>> +++ b/ofproto/ofproto-dpif-upcall.c
>> @@ -2100,10 +2100,12 @@ ukey_delete(struct umap *umap, struct udpif_key 
>> *ukey)
>>  }
>>
>>  static bool
>> -should_revalidate(const struct udpif *udpif, uint64_t packets,
>> -  long long int used)
>> +should_revalidate(const struct udpif *udpif, struct udpif_key *ukey,
>> +  uint64_t packets)
>
> nit: can ukey be const?
>
>> +OVS_REQUIRES(ukey->mutex)
>>  {
>>  long long int metric, now, duration;
>> +long long int used = ukey->stats.used;
>>
>>  if (!ofproto_min_revalidate_pps) {
>>  return true;
>
> ...

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 1/1] usdt-scripts: use ebpf to track upcall_cost

2023-02-27 Thread Ilya Maximets
On 2/24/23 11:41, Adrián Moreno wrote:
> From: Adrian Moreno 
> 
> This is basically a rewrite of the upcall_cost.py script moving some of
> it's functionality to the ebpf programs. These are the main changes:
> 
> * Correlation of kernel upcall <-> userspace recv upcall.
> The correlation was being done in python by comparing the packet
> contents.
> We still don't have a better way to do it but, instead of doing it
> at post-processing time, compute a hash in the ebpf program so
> correlation is little more than an indirection.
> For the hash library we use jhash.h from the linux-headers package.
> 
> * GSO packets
> We where attaching to the upcall tracepoint which is hit before
> fragmentation. Use a kprobe on kprobe__queue_userspace_packet to
> correlate upcalls to upcall queue events.
> 
> * Userspace operation batching.
> Also move the batc tracking implementation to the ebpf programs so each
> operation (e.g: PUT, EXEC) event  contains the id (hash) of the
> correspondent RECV.
> 
> As a result of this, it's no longer mandatory to send the packet or key
> bytes to userspace, leading to smaller events and less event drops.
> Also, accuracy is improved since we're now considering fragmentation.
> Additionally, there is no need for a time-consuming post-processing
> phase so the script is now much faster.
> 
> Signed-off-by: Adrian Moreno 

Hi, Adrian.  I'll leave the actual review to Eelco, but FYI this change
is failing builds due to PEP8 issues.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] system-traffic.at: Add icmp error tests while dnatting address and port.

2023-02-27 Thread Ilya Maximets
On 2/27/23 12:08, Paolo Valerio wrote:
> The two tests verify, for both icmp and icmpv6, that the correct port
> translation happen in the inner packet in the case an error is
> received in the reply direction.
> 
> Signed-off-by: Paolo Valerio 
> ---
>  tests/system-traffic.at |   72 
> +++
>  1 file changed, 72 insertions(+)
> 
> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> index 3a15b88a2..02fd0ee1b 100644
> --- a/tests/system-traffic.at
> +++ b/tests/system-traffic.at
> @@ -3561,6 +3561,42 @@ AT_CHECK([ovs-appctl dpctl/dump-conntrack | 
> FORMAT_CT(172.16.0.3)], [0], [dnl
>  OVS_TRAFFIC_VSWITCHD_STOP
>  AT_CLEANUP
>  
> +AT_SETUP([conntrack - ICMP related NAT with single port])
> +AT_SKIP_IF([test $HAVE_NC = no])
> +AT_SKIP_IF([test $HAVE_TCPDUMP = no])
> +CHECK_CONNTRACK()
> +CHECK_CONNTRACK_NAT()
> +OVS_TRAFFIC_VSWITCHD_START()
> +
> +ADD_NAMESPACES(at_ns0, at_ns1)
> +
> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", "f0:00:00:01:01:01")
> +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24", "f0:00:00:01:01:02")
> +
> +NS_CHECK_EXEC([at_ns0], [ip neigh add 10.1.1.240 lladdr f0:00:00:01:01:02 
> dev p0])
> +NS_CHECK_EXEC([at_ns1], [ip neigh add 10.1.1.1 lladdr f0:00:00:01:01:01 dev 
> p1])
> +
> +AT_DATA([flows.txt], [dnl
> +table=0,ip,ct_state=-trk,actions=ct(table=0,nat)
> +table=0,in_port=ovs-p0,udp,ct_state=+trk+new,actions=ct(commit,nat(dst=10.1.1.2:8080)),ovs-p1
> +table=0,in_port=ovs-p1,ct_state=+trk+rel+rpl,icmp,actions=ovs-p0
> +])
> +
> +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> +
> +rm p0.pcap
> +NETNS_DAEMONIZE([at_ns0], [tcpdump -l -U -i p0 -w p0.pcap 2>tcpdump0_err], 
> [tcpdump0.pid])
> +NS_CHECK_EXEC([at_ns0], [bash -c "echo dest_unreach | nc $NC_EOF_OPT -p 1234 
> -u 10.1.1.240 80"])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.1," | 
> sort], [0], [dnl
> +udp,orig=(src=10.1.1.1,dst=10.1.1.240,sport=1234,dport=80),reply=(src=10.1.1.2,dst=10.1.1.1,sport=8080,dport=1234)
> +])
> +
> +OVS_WAIT_UNTIL([ovs-pcap p0.pcap | grep -Eq 
> "f0010101f0010102080045c00045[[[:xdigit:]]]{4}4001[[[:xdigit:]]]{4}0a0101f00a010101030314164529[[[:xdigit:]]]{4}40004011[[[:xdigit:]]]{4}0a0101010a0101f004d2005000156b24646573745f756e72656163680a"])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
>  AT_SETUP([conntrack - IPv4 fragmentation])
>  CHECK_CONNTRACK()
>  OVS_TRAFFIC_VSWITCHD_START()
> @@ -6555,6 +6591,42 @@ 
> udp,orig=(src=fc00::1,dst=fc00::2,sport=,dport=),reply=(src=fc
>  OVS_TRAFFIC_VSWITCHD_STOP
>  AT_CLEANUP
>  
> +AT_SETUP([conntrack - ICMPv6 related NAT with single port])

Looks like this test is failing Intel CI.
Could you, please, check?

Best regards, Ilya Maximets.

> +AT_SKIP_IF([test $HAVE_NC = no])
> +AT_SKIP_IF([test $HAVE_TCPDUMP = no])
> +CHECK_CONNTRACK()
> +CHECK_CONNTRACK_NAT()
> +OVS_TRAFFIC_VSWITCHD_START()
> +
> +ADD_NAMESPACES(at_ns0, at_ns1)
> +
> +ADD_VETH(p0, at_ns0, br0, "fc00::1/96", "f0:00:00:01:01:01", [], "nodad")
> +ADD_VETH(p1, at_ns1, br0, "fc00::2/96", "f0:00:00:01:01:02", [], "nodad")
> +
> +NS_CHECK_EXEC([at_ns0], [ip -6 neigh add fc00::240 lladdr f0:00:00:01:01:02 
> dev p0])
> +NS_CHECK_EXEC([at_ns1], [ip -6 neigh add fc00::1 lladdr f0:00:00:01:01:01 
> dev p1])
> +
> +AT_DATA([flows.txt], [dnl
> +table=0,ipv6,ct_state=-trk,actions=ct(table=0,nat)
> +table=0,in_port=ovs-p0,udp6,ct_state=+trk+new,actions=ct(commit,nat(dst=[[fc00::2]]:8080)),ovs-p1
> +table=0,in_port=ovs-p1,ct_state=+trk+rel+rpl,icmp6,actions=ovs-p0
> +])
> +
> +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> +
> +rm p0.pcap
> +NETNS_DAEMONIZE([at_ns0], [tcpdump -l -U -i p0 -w p0.pcap 2>tcpdump0_err], 
> [tcpdump0.pid])
> +NS_CHECK_EXEC([at_ns0], [bash -c "echo dest_unreach | nc -6 $NC_EOF_OPT -p 
> 1234 -u fc00::240 80"])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=fc00::1," | 
> sort], [0], [dnl
> +udp,orig=(src=fc00::1,dst=fc00::240,sport=1234,dport=80),reply=(src=fc00::2,dst=fc00::1,sport=8080,dport=1234)
> +])
> +
> +OVS_WAIT_UNTIL([ovs-pcap p0.pcap | grep -Eq 
> "f0010101f001010286dd60[[[:xdigit:]]]{6}00453a40fc000240fc010104[[[:xdigit:]]]{4}60[[[:xdigit:]]]{6}00151140fc01fc00024004d20050001587d4646573745f756e72656163680a"])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
>  AT_SETUP([conntrack - IPv6 FTP with SNAT])
>  AT_SKIP_IF([test $HAVE_FTP = no])
>  CHECK_CONNTRACK()
> 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ofproto: Fix re-creation of tunnel backing interfaces on restart.

2023-02-27 Thread Ilya Maximets
On 2/24/23 14:06, Simon Horman wrote:
> On Thu, Feb 23, 2023 at 05:39:27PM +0100, Ilya Maximets wrote:
>> Tunnel OpenFlow ports do not exist in the datapath, instead there is a
>> tunnel backing interface that serves all the tunnels of the same type.
>> For example, if the geneve port 'my_tunnel' is added to OVS, it will
>> create 'geneve_sys_6041' datapath port, if it doesn't already exist,
>> and use this port as a tunnel output.
>>
>> However, while creating/opening a new datapath after re-start,
>> ovs-vswitchd only has a list of names of OpenFlow interfaces.  And it
>> thinks that each datapath port, that is not on the list, is a stale
>> port that needs to be removed.  This is obviously not correct for
>> tunnel backing interfaces that can serve multiple tunnel ports and do
>> not match OpenFlow port names.
>>
>> This is causing removal and re-creation of all the tunnel backing
>> interfaces in the datapath on OVS restart, causing disruption in
>> existing connections.
>>
>> It's hard to tell by only having a name of the interface if this
>> interface is a tunnel backing interface, or someone just named a
>> normal interface this way.  So, instead of trying to determine that,
>> not removing any interfaces at all, while we don't know types of
>> actual ports we need.
>>
>> Assuming that all the ports that are currently not in the list of OF
>> ports are tunnel backing ports.  Later, revalidation of tunnel backing
>> ports in type_run() will determine which ports are still needed and
>> which should be removed.
>>
>> It's OK to add even a non-tunnel stale ports into tnl_backers, they
>> will be cleaned up the same way as stale tunnel backers.
>>
>> Reported-at: 
>> https://mail.openvswitch.org/pipermail/ovs-discuss/2023-February/052215.html
>> Signed-off-by: Ilya Maximets 
> 
> Thanks Ilya,
> 
> this looks good to me.
> I have a few comments below, but those notwithstanding,
> 
> Reviewed-by: Simon Horman 
> 
> ...
> 
>> @@ -729,8 +723,6 @@ open_dpif_backer(const char *type, struct dpif_backer 
>> **backerp)
>>  struct dpif_port_dump port_dump;
>>  struct dpif_port port;
>>  struct shash_node *node;
>> -struct ovs_list garbage_list;
>> -struct odp_garbage *garbage;
>>  
>>  struct sset names;
>>  char *backer_name;
>> @@ -792,25 +784,23 @@ open_dpif_backer(const char *type, struct dpif_backer 
>> **backerp)
>>  dpif_flow_flush(backer->dpif);
>>  }
>>  
>> -/* Loop through the ports already on the datapath and remove any
>> - * that we don't need anymore. */
>> -ovs_list_init(_list);
>> +/* Loop through the ports already on the datapath and find ones that are
>> + * not on the initial OpenFlow ports list.  These are stale ports, that 
>> we
>> + * do not need anymore, or tunnel backing interfaces, that do not 
>> generally
>> + * match the name of OpenFlow tunnel ports, or both.  Adding all of 
>> them to
> 
> nit: s/Adding/Add/

OK.

> 
>> + * the list of tunnel backers.  type_run() will garbage collect those 
>> that
>> + * are not active tunnel backing interfaces during revalidation. */
> 
> ...
> 
>> diff --git a/tests/system-interface.at b/tests/system-interface.at
>> index 784bada12..6cd8ae802 100644
>> --- a/tests/system-interface.at
>> +++ b/tests/system-interface.at
>> @@ -63,3 +63,62 @@ AT_CHECK([
>>  [stdout], [Device "br-p1" does not exist.]
>>  )
>>  AT_CLEANUP
>> +
>> +AT_SETUP([interface - datapath ports garbage collection])
>> +OVS_CHECK_GENEVE()
>> +OVS_TRAFFIC_VSWITCHD_START()
>> +
>> +dnl Not relevant for userspace datapath.
>> +AT_SKIP_IF([! ovs-appctl dpctl/show | grep -q ovs-system])
>> +
>> +AT_CHECK([ovs-vsctl add-port br0 tunnel_port dnl
>> +-- set Interface tunnel_port dnl
>> +   type=geneve options:remote_ip=flow options:key=123])
>> +
>> +AT_CHECK([ip link add ovs-veth0 type veth peer name ovs-veth1])
>> +on_exit 'ip link del ovs-veth0'
>> +
>> +AT_CHECK([ovs-vsctl add-port br0 ovs-veth0])
>> +
>> +OVS_WAIT_UNTIL([ip link show | grep ovs-system | grep -q genev])
>> +
>> +dnl Store the output of ip link for geneve port to compare ifindex later.
>> +AT_CHECK([ip link show | grep ovs-system | grep genev > geneve.0])
>> +
>> +AT_CHECK([ovs-appctl dpctl/show | grep port], [0], [dnl
>> +  port 0: ovs-system (internal)
>> +  port 1: br0 (internal)
>> +  port 2: genev_sys_6081 (geneve: packet_type=ptap)
>> +  port 3: ovs-veth0
>> +])
>> +
>> +OVS_APP_EXIT_AND_WAIT_BY_TARGET([ovs-vswitchd], [ovs-vswitchd.pid])
>> +
>> +dnl Check that geneve backing interface is still in the datapath.
>> +AT_CHECK([ip link show | grep ovs-system | grep genev | diff -u - geneve.0])
>> +
>> +dnl Remove the veth port from the database while ovs-vswitchd is down.
>> +AT_CHECK([ovs-vsctl --no-wait del-port ovs-veth0])
>> +
>> +dnl Check that it is still tied to the OVS datapath.
>> +AT_CHECK([ip link show | grep ovs-system | grep -q ovs-veth0])
>> +
>> +dnl Bring ovs-vswitchd back up.
>> 

Re: [ovs-dev] [PATCH v2 ovn 1/2] northd: move defrag router pipeline stage forward

2023-02-27 Thread 0-day Robot
Bleep bloop.  Greetings Lorenzo Bianconi, I am a robot and I have tried out 
your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Line is 101 characters long (recommended limit is 79)
#69 FILE: northd/northd.c:326:
 * | |  (>= IP_INPUT)|   | | E |
NEXT_HOP_IPV6 (>= POST_DEFRAG ) |

Lines checked: 1324, Warnings: 1, Errors: 0


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 ovn 1/2] northd: move defrag router pipeline stage forward

2023-02-27 Thread Lorenzo Bianconi
Introduce a post_defrag router ingress pipeline stage and move current defrag
code in post_defrag stage. This is a preliminary patch to just defrag IP
fragment traffic (without performing DNAT) before accessing L4 info (e.g.
L4 protocol port) since they are not available in all IP fragment.

Signed-off-by: Lorenzo Bianconi 
---
 northd/northd.c |  46 ++--
 northd/ovn-northd.8.xml |  45 ++--
 tests/ovn-northd.at | 526 
 tests/ovn.at|  52 ++--
 tests/system-ovn.at |   8 +-
 5 files changed, 343 insertions(+), 334 deletions(-)

diff --git a/northd/northd.c b/northd/northd.c
index 770a5b50e..97589e31d 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -158,22 +158,23 @@ enum ovn_stage {
 PIPELINE_STAGE(ROUTER, IN,  IP_INPUT,3, "lr_in_ip_input") \
 PIPELINE_STAGE(ROUTER, IN,  UNSNAT,  4, "lr_in_unsnat")   \
 PIPELINE_STAGE(ROUTER, IN,  DEFRAG,  5, "lr_in_defrag")   \
-PIPELINE_STAGE(ROUTER, IN,  LB_AFF_CHECK,6, "lr_in_lb_aff_check") \
-PIPELINE_STAGE(ROUTER, IN,  DNAT,7, "lr_in_dnat") \
-PIPELINE_STAGE(ROUTER, IN,  LB_AFF_LEARN,8, "lr_in_lb_aff_learn") \
-PIPELINE_STAGE(ROUTER, IN,  ECMP_STATEFUL,   9, "lr_in_ecmp_stateful") \
-PIPELINE_STAGE(ROUTER, IN,  ND_RA_OPTIONS,   10, "lr_in_nd_ra_options") \
-PIPELINE_STAGE(ROUTER, IN,  ND_RA_RESPONSE,  11, "lr_in_nd_ra_response") \
-PIPELINE_STAGE(ROUTER, IN,  IP_ROUTING_PRE,  12, "lr_in_ip_routing_pre")  \
-PIPELINE_STAGE(ROUTER, IN,  IP_ROUTING,  13, "lr_in_ip_routing")  \
-PIPELINE_STAGE(ROUTER, IN,  IP_ROUTING_ECMP, 14, "lr_in_ip_routing_ecmp") \
-PIPELINE_STAGE(ROUTER, IN,  POLICY,  15, "lr_in_policy")  \
-PIPELINE_STAGE(ROUTER, IN,  POLICY_ECMP, 16, "lr_in_policy_ecmp") \
-PIPELINE_STAGE(ROUTER, IN,  ARP_RESOLVE, 17, "lr_in_arp_resolve") \
-PIPELINE_STAGE(ROUTER, IN,  CHK_PKT_LEN, 18, "lr_in_chk_pkt_len") \
-PIPELINE_STAGE(ROUTER, IN,  LARGER_PKTS, 19, "lr_in_larger_pkts") \
-PIPELINE_STAGE(ROUTER, IN,  GW_REDIRECT, 20, "lr_in_gw_redirect") \
-PIPELINE_STAGE(ROUTER, IN,  ARP_REQUEST, 21, "lr_in_arp_request") \
+PIPELINE_STAGE(ROUTER, IN,  POST_DEFRAG, 6, "lr_in_post_defrag")  \
+PIPELINE_STAGE(ROUTER, IN,  LB_AFF_CHECK,7, "lr_in_lb_aff_check") \
+PIPELINE_STAGE(ROUTER, IN,  DNAT,8, "lr_in_dnat") \
+PIPELINE_STAGE(ROUTER, IN,  LB_AFF_LEARN,9, "lr_in_lb_aff_learn") \
+PIPELINE_STAGE(ROUTER, IN,  ECMP_STATEFUL,   10, "lr_in_ecmp_stateful") \
+PIPELINE_STAGE(ROUTER, IN,  ND_RA_OPTIONS,   11, "lr_in_nd_ra_options") \
+PIPELINE_STAGE(ROUTER, IN,  ND_RA_RESPONSE,  12, "lr_in_nd_ra_response") \
+PIPELINE_STAGE(ROUTER, IN,  IP_ROUTING_PRE,  13, "lr_in_ip_routing_pre")  \
+PIPELINE_STAGE(ROUTER, IN,  IP_ROUTING,  14, "lr_in_ip_routing")  \
+PIPELINE_STAGE(ROUTER, IN,  IP_ROUTING_ECMP, 15, "lr_in_ip_routing_ecmp") \
+PIPELINE_STAGE(ROUTER, IN,  POLICY,  16, "lr_in_policy")  \
+PIPELINE_STAGE(ROUTER, IN,  POLICY_ECMP, 17, "lr_in_policy_ecmp") \
+PIPELINE_STAGE(ROUTER, IN,  ARP_RESOLVE, 18, "lr_in_arp_resolve") \
+PIPELINE_STAGE(ROUTER, IN,  CHK_PKT_LEN, 19, "lr_in_chk_pkt_len") \
+PIPELINE_STAGE(ROUTER, IN,  LARGER_PKTS, 20, "lr_in_larger_pkts") \
+PIPELINE_STAGE(ROUTER, IN,  GW_REDIRECT, 21, "lr_in_gw_redirect") \
+PIPELINE_STAGE(ROUTER, IN,  ARP_REQUEST, 22, "lr_in_arp_request") \
   \
 /* Logical router egress stages. */   \
 PIPELINE_STAGE(ROUTER, OUT, CHECK_DNAT_LOCAL,   0,   \
@@ -322,7 +323,7 @@ enum ovn_stage {
  * | |  (>= IP_INPUT)| E | INPORT_ETH_ADDR | X |   
 |
  * +-+---+ G |   (< IP_INPUT)  | X |   
 |
  * | R1  |   SRC_IPV4 for ARP-REQ| 0 | | R |   
 |
- * | |  (>= IP_INPUT)|   | | E | 
NEXT_HOP_IPV6 (>= DEFRAG ) |
+ * | |  (>= IP_INPUT)|   | | E |
NEXT_HOP_IPV6 (>= POST_DEFRAG ) |
  * +-+---+---+-+ G |   
 |
  * | R2  |UNUSED | X | | 0 |   
 |
  * | |   | R | |   |   
 |
@@ -10074,13 +10075,13 @@ add_ecmp_symmetric_reply_flows(struct hmap *lflows,
   route->is_src_route ? "dst" : "src",
   cidr);
 free(cidr);
-ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_DEFRAG, 100,
+

[ovs-dev] [PATCH v2 ovn 2/2] northd: add flows to defrag IP traffic

2023-02-27 Thread Lorenzo Bianconi
Introduce a priority-100 flow in the ingress router defrag stage in
order to just perform IP traffic defragmentation without doing any dnat
operation. This change is necessary since the logical flow reported
below fails for IP fragmented traffic since L4 port info is available
just in the first fragment:

table=5 (lr_in_defrag   ), priority=110  , match=(ip && ip4.dst == 
172.16.0.111 && udp), action=(reg0 = 172.16.0.111; reg9[16..31] = udp.dst; 
ct_dnat;)

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2170885
Fixes: d91f359b7694 ("northd: Add VIP port to established flows in DNAT table 
for Load Balancers")
Signed-off-by: Lorenzo Bianconi 
---
 northd/northd.c | 13 +
 northd/ovn-northd.8.xml | 16 ++--
 2 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/northd/northd.c b/northd/northd.c
index 97589e31d..38ccd1f5f 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -10841,19 +10841,26 @@ build_lrouter_defrag_flows_for_lb(struct 
ovn_northd_lb *lb,
 }
 
 struct ds defrag_actions = DS_EMPTY_INITIALIZER;
+struct ds defrag_match = DS_EMPTY_INITIALIZER;
+
 for (size_t i = 0; i < lb->n_vips; i++) {
 struct ovn_lb_vip *lb_vip = >vips[i];
 int prio = 100;
 
 ds_clear(_actions);
+ds_clear(_match);
 ds_clear(match);
 
 if (lb_vip->address_family == AF_INET) {
 ds_put_format(match, "ip && ip4.dst == %s", lb_vip->vip_str);
+ds_put_format(_match, "ip && ip4.dst == %s && ip.is_frag",
+  lb_vip->vip_str);
 ds_put_format(_actions, REG_NEXT_HOP_IPV4" = %s; ",
   lb_vip->vip_str);
 } else {
 ds_put_format(match, "ip && ip6.dst == %s", lb_vip->vip_str);
+ds_put_format(_match, "ip && ip6.dst == %s && ip.is_frag",
+  lb_vip->vip_str);
 ds_put_format(_actions, REG_NEXT_HOP_IPV6" = %s; ",
   lb_vip->vip_str);
 }
@@ -10868,11 +10875,17 @@ build_lrouter_defrag_flows_for_lb(struct 
ovn_northd_lb *lb,
 
 ds_put_format(_actions, "ct_dnat;");
 
+/* Add flow for defrag ip traffic. */
+ovn_lflow_add_with_dp_group(
+lflows, lb->nb_lr_map, S_ROUTER_IN_DEFRAG, 100,
+ds_cstr(_match), "ct_next;", >nlb->header_);
+
 ovn_lflow_add_with_dp_group(
 lflows, lb->nb_lr_map, S_ROUTER_IN_POST_DEFRAG, prio,
 ds_cstr(match), ds_cstr(_actions), >nlb->header_);
 }
 ds_destroy(_actions);
+ds_destroy(_match);
 }
 
 static void
diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
index 03eced0e4..03140ab6c 100644
--- a/northd/ovn-northd.8.xml
+++ b/northd/ovn-northd.8.xml
@@ -3277,10 +3277,22 @@ icmp6 {
 
 
   This is to send packets to connection tracker for tracking and
-  defragmentation.  It contains a priority-0 flow that simply moves traffic
-  to the next table.
+  defragmentation.
 
 
+
+  
+For each load balancer VIP, a priority-100 flow is added with match
+ip  ip.dst == VIP 
+ip.is_frag and action ct_next;
+  
+
+  
+A priority 0 flow is added which matches on all packets and applies
+the action next;.
+  
+
+
 Ingress Table 6: POST_DEFRAG
 
 
-- 
2.39.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 ovn 0/2] fix IP fragmented traffic hitting LB

2023-02-27 Thread Lorenzo Bianconi
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2170885

Changes since v1:
- defrag just ip fragmented traffic for load-balancer VIP instead of all IP
  fragmented traffic

Lorenzo Bianconi (2):
  northd: move defrag router pipeline stage forward
  northd: add flows to defrag IP traffic

 northd/northd.c |  59 +++--
 northd/ovn-northd.8.xml |  61 +++--
 tests/ovn-northd.at | 526 
 tests/ovn.at|  52 ++--
 tests/system-ovn.at |   8 +-
 5 files changed, 370 insertions(+), 336 deletions(-)

-- 
2.39.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v5 0/5] Add support for preffered src address in ovs-router

2023-02-27 Thread Eelco Chaudron



On 22 Feb 2023, at 11:29, Nobuhiro MIKI wrote:

> With this series, the preferred source address in ovs-router is obtained
> from both ovs/route/add command and kernel FIB.

Hi Nobuhiro,

I was on PTO last week, will try to review it later this week, or early next 
week.

//Eelco


> v5:
> - Add patch to fix man page
> v4:
> - Add cleanup patch for ovs/route/add
> - Remove unrelated code
> v3:
> - Fix netdev-dummy to support multiple IP addresses
> - Add validation and unit tests for ovs/route/add
> - Refactor parsing for optional parameters in ovs/route/add command
> v2:
> - Add NEWS
>
> Nobuhiro MIKI (5):
>   netdev-dummy: Support multiple IP addresses.
>   ovs-router: Cleanup parser for ovs/route/add command.
>   ofproto: Fix mam page for tunnel related commands.
>   ovs-router: Introduce src option in ovs/route/add command.
>   route-table: Retrieving the preferred source address from Netlink.
>
>  NEWS|   3 +
>  lib/netdev-dummy.c  |  67 ++--
>  lib/ovs-router.c| 137 
>  lib/ovs-router.h|   3 +-
>  lib/route-table.c   |  16 +++-
>  ofproto/ofproto-tnl-unixctl.man |   9 ++-
>  tests/ovs-router.at | 105 +++-
>  tests/system-route.at   |  39 +
>  8 files changed, 311 insertions(+), 68 deletions(-)
>
> -- 
> 2.31.1
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 1/1] tc: Fix cleaning chains

2023-02-27 Thread Eelco Chaudron



On 22 Feb 2023, at 13:46, Simon Horman wrote:

> On Wed, Feb 22, 2023 at 12:30:17PM +0200, Roi Dayan via dev wrote:
>> Sometimes there is a need to clean empty chains as done in
>> delete_chains_from_netdev().  The cited commit doesn't remove
>> the chain completely which cause adding ingress_block later to fail.
>> This can be reproduced with adding bond as ovs port which makes ovs
>> use ingress_block for it.
>> While at it add the netdev name that fails to the log.
>>
>> Fixes: e1e5eac5b016 ("tc: Add TCA_KIND flower to delete and get operation to 
>> avoid rtnl_lock().")
>> Signed-off-by: Roi Dayan 
>
> I think this needs an ack from Eelco (CCed).
>
> But it looks good to me.
>
> Reviewed-by: Simon Horman 

The changes look good to me. Will it be worth adding a test case?

Acked-by: Eelco Chaudron 


>> ---
>>  lib/netdev-offload-tc.c | 7 ---
>>  lib/tc.c| 4 +++-
>>  2 files changed, 7 insertions(+), 4 deletions(-)
>>
>> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
>> index 4fb9d9f2127a..9dd0aa2e2a85 100644
>> --- a/lib/netdev-offload-tc.c
>> +++ b/lib/netdev-offload-tc.c
>> @@ -524,7 +524,7 @@ delete_chains_from_netdev(struct netdev *netdev, struct 
>> tcf_id *id)
>>   */
>>  HMAP_FOR_EACH_POP (chain_node, node, ) {
>>  id->chain = chain_node->chain;
>> -tc_del_flower_filter(id);
>> +tc_del_filter(id, NULL);
>>  free(chain_node);
>>  }
>>  }
>> @@ -2860,8 +2860,9 @@ netdev_tc_init_flow_api(struct netdev *netdev)
>>  error = tc_add_del_qdisc(ifindex, true, block_id, hook);
>>
>>  if (error && error != EEXIST) {
>> -VLOG_INFO("failed adding ingress qdisc required for offloading: %s",
>> -  ovs_strerror(error));
>> +VLOG_INFO("failed adding ingress qdisc required for offloading "
>> +  "on %s: %s",
>> +  netdev_get_name(netdev), ovs_strerror(error));
>>  return error;
>>  }
>>
>> diff --git a/lib/tc.c b/lib/tc.c
>> index 4c07e22162e7..5c32c6f971da 100644
>> --- a/lib/tc.c
>> +++ b/lib/tc.c
>> @@ -2354,7 +2354,9 @@ tc_del_filter(struct tcf_id *id, const char *kind)
>>  struct ofpbuf request;
>>
>>  request_from_tcf_id(id, 0, RTM_DELTFILTER, NLM_F_ACK, );
>> -nl_msg_put_string(, TCA_KIND, kind);
>> +if (kind) {
>> +nl_msg_put_string(, TCA_KIND, kind);
>> +}
>>  return tc_transact(, NULL);
>>  }
>>
>> -- 
>> 2.38.0
>>
>> ___
>> dev mailing list
>> d...@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [External] Re: [PATCH v3] utilities/ofctl: add-meters for save and restore

2023-02-27 Thread Wan Junjie
Hi Adrian,

Thank you for your review.

On Sat, Feb 25, 2023 at 1:17 AM Adrian Moreno  wrote:
>
> Hi Wan,
>
> Sorry for the delay in the review.
> I've started looking at this patch. Please see some initial comments below. 
> I'll
> continue on Monday.
>
> Thanks.
>
> On 11/15/22 05:16, Wan Junjie wrote:
> > put dump-meters' result in one line so add-meters can handle.
> > save and restore meters when restart ovs.
> > bundle functions are not implemented in this patch.
> >
> > Signed-off-by: Wan Junjie 
> > ---
> > v3:
> > add '--oneline' option for dump-meter(s) command
> >
> > v2:
> > fix failed testcases, as dump-meters format changes
> > ---
> >   include/openvswitch/ofp-meter.h |   9 +-
> >   lib/ofp-meter.c | 103 -
> >   lib/ofp-print.c |   5 +-
> >   tests/dpif-netdev.at|   9 ++
> >   utilities/ovs-ofctl.8.in|  20 ++-
> >   utilities/ovs-ofctl.c   | 263 +---
> >   utilities/ovs-save  |   8 +
> >   7 files changed, 383 insertions(+), 34 deletions(-)
> >
> > diff --git a/include/openvswitch/ofp-meter.h 
> > b/include/openvswitch/ofp-meter.h
> > index 6776eae87..0514b4ec4 100644
> > --- a/include/openvswitch/ofp-meter.h
> > +++ b/include/openvswitch/ofp-meter.h
> > @@ -61,7 +61,8 @@ int ofputil_decode_meter_config(struct ofpbuf *,
> >   struct ofputil_meter_config *,
> >   struct ofpbuf *bands);
> >   void ofputil_format_meter_config(struct ds *,
> > - const struct ofputil_meter_config *);
> > + const struct ofputil_meter_config *,
> > + int);
> >
> >   struct ofputil_meter_mod {
> >   uint16_t command;
> > @@ -79,6 +80,12 @@ char *parse_ofp_meter_mod_str(struct ofputil_meter_mod 
> > *, const char *string,
> >   OVS_WARN_UNUSED_RESULT;
> >   void ofputil_format_meter_mod(struct ds *, const struct ofputil_meter_mod 
> > *);
> >
> > +char *parse_ofp_meter_mod_file(const char *file_name,
> > + int command,
> > + struct ofputil_meter_mod **mms, size_t *n_mms,
> > + enum ofputil_protocol *usable_protocols)
> > +OVS_WARN_UNUSED_RESULT;
> > +
>
> Please align all the arguments one space after the "(". Also I think "int
> command" fits in the first line.
>
>
OK, I'll fix it.

> >   struct ofputil_meter_stats {
> >   uint32_t meter_id;
> >   uint32_t flow_count;
> > diff --git a/lib/ofp-meter.c b/lib/ofp-meter.c
> > index 9ea40a0bf..b94cb6a05 100644
> > --- a/lib/ofp-meter.c
> > +++ b/lib/ofp-meter.c
> > @@ -15,6 +15,7 @@
> >*/
> >
> >   #include 
> > +#include 
> >   #include "openvswitch/ofp-meter.h"
> >   #include "byte-order.h"
> >   #include "nx-match.h"
> > @@ -57,7 +58,7 @@ void
> >   ofputil_format_meter_band(struct ds *s, enum ofp13_meter_flags flags,
> > const struct ofputil_meter_band *mb)
> >   {
> > -ds_put_cstr(s, "\ntype=");
> > +ds_put_cstr(s, "type=");
> >   switch (mb->type) {
> >   case OFPMBT13_DROP:
> >   ds_put_cstr(s, "drop");
> > @@ -343,7 +344,7 @@ ofp_print_meter_flags(struct ds *s, enum 
> > ofp13_meter_flags flags)
> >
> >   void
> >   ofputil_format_meter_config(struct ds *s,
> > -const struct ofputil_meter_config *mc)
> > +const struct ofputil_meter_config *mc, int 
> > oneline)
> >   {
> >   uint16_t i;
> >
> > @@ -354,9 +355,12 @@ ofputil_format_meter_config(struct ds *s,
> >
> >   ds_put_cstr(s, "bands=");
> >   for (i = 0; i < mc->n_bands; ++i) {
> > +ds_put_cstr(s, oneline > 0 ? " ": "\n");
> >   ofputil_format_meter_band(s, mc->flags, >bands[i]);
> >   }
> > -ds_put_char(s, '\n');
> > +if (oneline == 0) {
> > +ds_put_char(s, '\n');
> > +}
> >   }
> >
> >   static enum ofperr
> > @@ -578,6 +582,24 @@ parse_ofp_meter_mod_str__(struct ofputil_meter_mod 
> > *mm, char *string,
> >
> >   /* Meters require at least OF 1.3. */
> >   *usable_protocols = OFPUTIL_P_OF13_UP;
> > +if (command == -2) {
> > +size_t len;
> > +
> > +string += strspn(string, " \t\r\n");   /* Skip white space. */
> > +len = strcspn(string, ", \t\r\n"); /* Get length of the first 
> > token. */
> > +
> > +if (!strncmp(string, "add", len)) {
> > +command = OFPMC13_ADD;
> > +} else if (!strncmp(string, "delete", len)) {
> > +command = OFPMC13_DELETE;
> > +} else if (!strncmp(string, "modify", len)) {
> > +command = OFPMC13_MODIFY;
> > +} else {
> > +len = 0;
> > +command = OFPMC13_ADD;
> > +}
> > +string += len;
> > +}
> >
> >   switch (command) {
> >   case -1:
> > @@ -606,6 +628,11 @@ parse_ofp_meter_mod_str__(struct ofputil_meter_mod 
> 

[ovs-dev] [PATCH ovn v2] northd, lsp: Add additional arp proxy features

2023-02-27 Thread Enrique Llorente
Configure mac address
The mac address returned by ARP/NDP can be configured similar to LSP
addresses where the mac is the first entry on the list

IPv6
Support NDP IPv6 protocol

Use CIDRs
Allow to specify subnets for ipv4 and ipv6, they will match whatever
address is received from ARP/NDP

Signed-off-by: Enrique Llorente 
---
 northd/northd.c   | 141 +++---
 northd/ovn-northd.8.xml   |  12 +-
 ovn-nb.xml|  18 ++-
 tests/ovn.at  | 183 +++
 tests/system-common-macros.at |   5 +-
 tests/system-ovn.at   | 271 ++
 6 files changed, 571 insertions(+), 59 deletions(-)

diff --git a/northd/northd.c b/northd/northd.c
index 770a5b50e..3fc48e71d 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -8644,29 +8644,43 @@ build_lswitch_arp_nd_responder_known_ips(struct 
ovn_port *op,
 }
 }
 }
-
-if (op->peer) {
-const char *arp_proxy = smap_get(>nbsp->options,"arp_proxy");
-
+const char *arp_proxy = smap_get(>nbsp->options,"arp_proxy");
+if (arp_proxy) {
 struct lport_addresses proxy_arp_addrs;
-int i = 0;
+int i, ofs = 0;
+/* Either takes "MAC IP1 IP2" or "IP1 IP2" */
+if (!extract_addresses(arp_proxy, _arp_addrs, ) &&
+!extract_ip_addresses(arp_proxy, _arp_addrs)) {
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
+VLOG_WARN_RL(, "Invalid arp_proxy option: '%s' at lsp '%s'",
+ arp_proxy, op->nbsp->name);
+return;
+}
+
+/* Select the mac address to answer the proxy ARP/NDP */
+char *ea_s = NULL;
+if (!eth_addr_is_zero(proxy_arp_addrs.ea)) {
+ea_s = proxy_arp_addrs.ea_s;
+} else if (op->peer) {
+ea_s = op->peer->lrp_networks.ea_s;
+} else {
+return;
+}
 
-/* Add responses for ARP proxies. */
-if (arp_proxy && extract_ip_addresses(arp_proxy,
-  _arp_addrs) &&
-proxy_arp_addrs.n_ipv4_addrs) {
+/* Add IPv4 responses for ARP proxies. */
+if (proxy_arp_addrs.n_ipv4_addrs) {
 /* Match rule on all proxy ARP IPs. */
 ds_clear(match);
 ds_put_cstr(match, "arp.op == 1 && arp.tpa == {");
 
 for (i = 0; i < proxy_arp_addrs.n_ipv4_addrs; i++) {
-ds_put_format(match, "%s,",
-  proxy_arp_addrs.ipv4_addrs[i].addr_s);
+ds_put_format(match, "%s/%u,",
+  proxy_arp_addrs.ipv4_addrs[i].addr_s,
+  proxy_arp_addrs.ipv4_addrs[i].plen);
 }
 
 ds_chomp(match, ',');
 ds_put_cstr(match, "}");
-destroy_lport_addresses(_arp_addrs);
 
 ds_clear(actions);
 ds_put_format(actions,
@@ -8679,12 +8693,69 @@ build_lswitch_arp_nd_responder_known_ips(struct 
ovn_port *op,
 "outport = inport; "
 "flags.loopback = 1; "
 "output;",
-op->peer->lrp_networks.ea_s,
-op->peer->lrp_networks.ea_s);
+ea_s,
+ea_s);
 
 ovn_lflow_add_with_hint(lflows, op->od, S_SWITCH_IN_ARP_ND_RSP,
 50, ds_cstr(match), ds_cstr(actions), >nbsp->header_);
 }
+
+/* Add IPv6 NDP responses.
+ * For ND solicitations, we need to listen for both the
+ * unicast IPv6 address and its all-nodes multicast address,
+ * but always respond with the unicast IPv6 address. */
+if (proxy_arp_addrs.n_ipv6_addrs) {
+struct ds ip6_dst_match = DS_EMPTY_INITIALIZER;
+struct ds nd_target_match = DS_EMPTY_INITIALIZER;
+for (size_t j = 0; j < proxy_arp_addrs.n_ipv6_addrs; j++) {
+ds_put_format(_dst_match, "%s/%u, %s/%u",
+proxy_arp_addrs.ipv6_addrs[j].addr_s,
+proxy_arp_addrs.ipv6_addrs[j].plen,
+proxy_arp_addrs.ipv6_addrs[j].sn_addr_s,
+proxy_arp_addrs.ipv6_addrs[j].plen);
+ds_put_format(_target_match,
+"%s/%u",
+proxy_arp_addrs.ipv6_addrs[j].addr_s,
+proxy_arp_addrs.ipv6_addrs[j].plen);
+if (j+1 < proxy_arp_addrs.n_ipv6_addrs) {
+ds_put_cstr(_dst_match, ", ");
+ds_put_cstr(_target_match, ", ");
+}
+}
+  

Re: [ovs-dev] [PATCH] treewide: Remove uses of ATOMIC_VAR_INIT

2023-02-27 Thread Ilya Maximets
On 2/24/23 22:34, Fangrui Song via dev wrote:
> ATOMIC_VAR_INIT has a trivial definition
> `#define ATOMIC_VAR_INIT(value) (value)`,
> is deprecated in C17/C++20, and will be removed in newer standards in
> newer GCC/Clang (e.g. https://reviews.llvm.org/D144196).
> 
> Signed-off-by: Fangrui Song 

Hi.  Thanks for the patch.

It is interesting, and I guess, we can just remove the
initialization macro.  But if we're going to do that, we
should also remove definitions for different compilers
in lib/ovs-atomic-*.h and update the documentation in
lib/ovs-atomic.h.

Ideally, add a check to utilities/checkpatch.py to warn
about use of a deprecated macro.

Best regards, Ilya Maximets.

> ---
>  lib/dpdk.c|  2 +-
>  lib/mpsc-queue.h  |  6 +++---
>  lib/ovs-rcu.h |  4 ++--
>  lib/ovs-replay.c  |  2 +-
>  lib/versions.h|  2 +-
>  lib/vlog.c|  2 +-
>  ofproto/ofproto-dpif-upcall.c |  4 ++--
>  tests/test-atomic.c   | 12 ++--
>  8 files changed, 17 insertions(+), 17 deletions(-)
> 
> diff --git a/lib/dpdk.c b/lib/dpdk.c
> index 240babc03..d76d53f8f 100644
> --- a/lib/dpdk.c
> +++ b/lib/dpdk.c
> @@ -47,7 +47,7 @@ VLOG_DEFINE_THIS_MODULE(dpdk);
>  static FILE *log_stream = NULL;   /* Stream for DPDK log redirection */
>  
>  /* Indicates successful initialization of DPDK. */
> -static atomic_bool dpdk_initialized = ATOMIC_VAR_INIT(false);
> +static atomic_bool dpdk_initialized = false;
>  
>  static bool
>  args_contains(const struct svec *args, const char *value)
> diff --git a/lib/mpsc-queue.h b/lib/mpsc-queue.h
> index 8c7109621..70c2d7a01 100644
> --- a/lib/mpsc-queue.h
> +++ b/lib/mpsc-queue.h
> @@ -116,9 +116,9 @@ struct mpsc_queue {
>  };
>  
>  #define MPSC_QUEUE_INITIALIZER(Q) { \
> -.head = ATOMIC_VAR_INIT(&(Q)->stub), \
> -.tail = ATOMIC_VAR_INIT(&(Q)->stub), \
> -.stub = { .next = ATOMIC_VAR_INIT(NULL) }, \
> +.head = &(Q)->stub, \
> +.tail = &(Q)->stub, \
> +.stub = { .next = NULL }, \
>  .read_lock = OVS_MUTEX_INITIALIZER, \
>  }
>  
> diff --git a/lib/ovs-rcu.h b/lib/ovs-rcu.h
> index 8b397b7fb..a1c15c126 100644
> --- a/lib/ovs-rcu.h
> +++ b/lib/ovs-rcu.h
> @@ -175,7 +175,7 @@
>  
>  #if __GNUC__
>  #define OVSRCU_TYPE(TYPE) struct { ATOMIC(TYPE) p; }
> -#define OVSRCU_INITIALIZER(VALUE) { ATOMIC_VAR_INIT(VALUE) }
> +#define OVSRCU_INITIALIZER(VALUE) { VALUE }
>  #define ovsrcu_get__(TYPE, VAR, ORDER)  \
>  ({  \
>  TYPE value__;   \
> @@ -207,7 +207,7 @@
>  #else  /* not GNU C */
>  struct ovsrcu_pointer { ATOMIC(void *) p; };
>  #define OVSRCU_TYPE(TYPE) struct ovsrcu_pointer
> -#define OVSRCU_INITIALIZER(VALUE) { ATOMIC_VAR_INIT(VALUE) }
> +#define OVSRCU_INITIALIZER(VALUE) { VALUE }
>  static inline void *
>  ovsrcu_get__(const struct ovsrcu_pointer *pointer, memory_order order)
>  {
> diff --git a/lib/ovs-replay.c b/lib/ovs-replay.c
> index f386246c7..551c7f56d 100644
> --- a/lib/ovs-replay.c
> +++ b/lib/ovs-replay.c
> @@ -34,7 +34,7 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(10, 
> 25);
>  
>  static struct ovs_mutex replay_mutex = OVS_MUTEX_INITIALIZER;
>  static int replay_seqno OVS_GUARDED_BY(replay_mutex) = 0;
> -static atomic_int replay_state = ATOMIC_VAR_INIT(OVS_REPLAY_NONE);
> +static atomic_int replay_state = OVS_REPLAY_NONE;
>  
>  static char *dirname = NULL;
>  
> diff --git a/lib/versions.h b/lib/versions.h
> index d92f0a319..724880cb7 100644
> --- a/lib/versions.h
> +++ b/lib/versions.h
> @@ -36,7 +36,7 @@ struct versions {
>  };
>  
>  #define VERSIONS_INITIALIZER(ADD, REMOVE) \
> -(struct versions){ ADD, ATOMIC_VAR_INIT(REMOVE) }
> +(struct versions){ ADD, REMOVE }
>  
>  static inline void
>  versions_set_remove_version(struct versions *versions, ovs_version_t version)
> diff --git a/lib/vlog.c b/lib/vlog.c
> index 0a615bb66..9ddea48b8 100644
> --- a/lib/vlog.c
> +++ b/lib/vlog.c
> @@ -118,7 +118,7 @@ static struct ovs_list vlog_modules 
> OVS_GUARDED_BY(log_file_mutex)
>  static int syslog_fd OVS_GUARDED_BY(pattern_rwlock) = -1;
>  
>  /* Log facility configuration. */
> -static atomic_int log_facility = ATOMIC_VAR_INIT(0);
> +static atomic_int log_facility = 0;
>  
>  /* Facility name and its value. */
>  struct vlog_facility {
> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
> index fc94078cb..914cf3f98 100644
> --- a/ofproto/ofproto-dpif-upcall.c
> +++ b/ofproto/ofproto-dpif-upcall.c
> @@ -416,8 +416,8 @@ static int udpif_flow_unprogram(struct udpif *udpif, 
> struct udpif_key *ukey,
>  static upcall_callback upcall_cb;
>  static dp_purge_callback dp_purge_cb;
>  
> -static atomic_bool enable_megaflows = ATOMIC_VAR_INIT(true);
> -static atomic_bool enable_ufid = ATOMIC_VAR_INIT(true);
> +static atomic_bool enable_megaflows = true;
> 

[ovs-dev] [PATCH v2] ovs-dpctl: Add new command dpctl/ct-sweep-next-run.

2023-02-27 Thread Paolo Valerio
Since 3d9c1b855a5f ("conntrack: Replace timeout based expiration lists
with rculists.") the sweep interval changed as well as the constraints
related to the sweeper.
Being able to change the default reschedule time may be convenient in
some conditions, like debugging.
This patch introduces new commands allowing to get and set the sweep
next run in ms.

Signed-off-by: Paolo Valerio 
---
v2:
- resolved conflict in NEWS
- added missing comment
- added missing '\' in dpctl.man
---
 NEWS|4 +++
 lib/conntrack-private.h |1 +
 lib/conntrack.c |   18 +-
 lib/conntrack.h |2 ++
 lib/ct-dpif.c   |   14 +++
 lib/ct-dpif.h   |1 +
 lib/dpctl.c |   61 +++
 lib/dpctl.man   |8 ++
 lib/dpif-netdev.c   |   17 +
 lib/dpif-netlink.c  |2 ++
 lib/dpif-provider.h |4 +++
 11 files changed, 131 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index 85b349621..4c4ef4b2b 100644
--- a/NEWS
+++ b/NEWS
@@ -10,6 +10,10 @@ Post-v3.1.0
in order to create OVSDB sockets with access mode of 0770.
- QoS:
  * Added new configuration option 'jitter' for a linux-netem QoS type.
+   - ovs-appctl:
+ * New commands "dpctl/{ct-get-sweep-next-run,ct-set-sweep-next-run}" that
+   allow to get and set, for the userspace datapath, the next run interval
+   for the conntrack garbage collector.
 
 
 v3.1.0 - 16 Feb 2023
diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index fae8b3a9b..bb326868e 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -224,6 +224,7 @@ struct conntrack {
 struct ipf *ipf; /* Fragmentation handling context. */
 uint32_t zone_limit_seq; /* Used to disambiguate zone limit counts. */
 atomic_bool tcp_seq_chk; /* Check TCP sequence numbers. */
+atomic_uint32_t sweep_ms; /* Next sweep interval. */
 };
 
 /* Lock acquisition order:
diff --git a/lib/conntrack.c b/lib/conntrack.c
index 5029b2cda..9356c1282 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -320,6 +320,7 @@ conntrack_init(void)
 atomic_count_init(>n_conn, 0);
 atomic_init(>n_conn_limit, DEFAULT_N_CONN_LIMIT);
 atomic_init(>tcp_seq_chk, true);
+atomic_init(>sweep_ms, 2);
 latch_init(>clean_thread_exit);
 ct->clean_thread = ovs_thread_create("ct_clean", clean_thread_main, ct);
 ct->ipf = ipf_init();
@@ -1480,6 +1481,21 @@ set_label(struct dp_packet *pkt, struct conn *conn,
 }
 
 
+int
+conntrack_set_sweep_next_run(struct conntrack *ct, uint32_t ms)
+{
+atomic_store_relaxed(>sweep_ms, ms);
+return 0;
+}
+
+uint32_t
+conntrack_get_sweep_next_run(struct conntrack *ct)
+{
+uint32_t ms;
+atomic_read_relaxed(>sweep_ms, );
+return ms;
+}
+
 static size_t
 ct_sweep(struct conntrack *ct, struct rculist *list, long long now)
 OVS_NO_THREAD_SAFETY_ANALYSIS
@@ -1504,7 +1520,7 @@ ct_sweep(struct conntrack *ct, struct rculist *list, long 
long now)
 static long long
 conntrack_clean(struct conntrack *ct, long long now)
 {
-long long next_wakeup = now + 20 * 1000;
+long long next_wakeup = now + conntrack_get_sweep_next_run(ct);
 unsigned int n_conn_limit, i;
 size_t clean_end, count = 0;
 
diff --git a/lib/conntrack.h b/lib/conntrack.h
index b064abc9f..2306cf375 100644
--- a/lib/conntrack.h
+++ b/lib/conntrack.h
@@ -139,6 +139,8 @@ int conntrack_set_maxconns(struct conntrack *ct, uint32_t 
maxconns);
 int conntrack_get_maxconns(struct conntrack *ct, uint32_t *maxconns);
 int conntrack_get_nconns(struct conntrack *ct, uint32_t *nconns);
 int conntrack_set_tcp_seq_chk(struct conntrack *ct, bool enabled);
+int conntrack_set_sweep_next_run(struct conntrack *ct, uint32_t ms);
+uint32_t conntrack_get_sweep_next_run(struct conntrack *ct);
 bool conntrack_get_tcp_seq_chk(struct conntrack *ct);
 struct ipf *conntrack_ipf_ctx(struct conntrack *ct);
 struct conntrack_zone_limit zone_limit_get(struct conntrack *ct,
diff --git a/lib/ct-dpif.c b/lib/ct-dpif.c
index d3b2783ce..0a08eb11c 100644
--- a/lib/ct-dpif.c
+++ b/lib/ct-dpif.c
@@ -368,6 +368,20 @@ ct_dpif_del_limits(struct dpif *dpif, const struct 
ovs_list *zone_limits)
 : EOPNOTSUPP);
 }
 
+int
+ct_dpif_sweep(struct dpif *dpif, uint32_t *ms)
+{
+if (*ms) {
+return (dpif->dpif_class->ct_set_sweep_next_run
+? dpif->dpif_class->ct_set_sweep_next_run(dpif, *ms)
+: EOPNOTSUPP);
+} else {
+return (dpif->dpif_class->ct_get_sweep_next_run
+? dpif->dpif_class->ct_get_sweep_next_run(dpif, ms)
+: EOPNOTSUPP);
+}
+}
+
 int
 ct_dpif_ipf_set_enabled(struct dpif *dpif, bool v6, bool enable)
 {
diff --git a/lib/ct-dpif.h b/lib/ct-dpif.h
index 5edbbfd3b..1e265604f 100644
--- a/lib/ct-dpif.h
+++ b/lib/ct-dpif.h
@@ -298,6 +298,7 @@ int ct_dpif_set_limits(struct dpif *dpif, const uint32_t 
*default_limit,
 int 

Re: [ovs-dev] [PATCH 2/2] WIP

2023-02-27 Thread 0-day Robot
Bleep bloop.  Greetings Paolo Valerio, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


Patch skipped due to previous failure.

Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 1/2] cli: add option to display the version from Cargo.toml.

2023-02-27 Thread 0-day Robot
Bleep bloop.  Greetings Paolo Valerio, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


git-am:
error: sha1 information is lacking or useless (src/cli/cli.rs).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 cli: add option to display the version from Cargo.toml.
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".


Patch skipped due to previous failure.

Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ovs-dpctl: Add new command dpctl/ct-sweep-next-run.

2023-02-27 Thread 0-day Robot
Bleep bloop.  Greetings Paolo Valerio, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


git-am:
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 ovs-dpctl: Add new command dpctl/ct-sweep-next-run.
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".


Patch skipped due to previous failure.

Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 1/2] cli: add option to display the version from Cargo.toml.

2023-02-27 Thread Paolo Valerio
Sorry for the noise, but this local test got sent unintentionally.

Please, ignore it.

Paolo Valerio  writes:

> Signed-off-by: Paolo Valerio 
> ---
>  src/cli/cli.rs |1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/src/cli/cli.rs b/src/cli/cli.rs
> index a5b08e6..f8593e1 100644
> --- a/src/cli/cli.rs
> +++ b/src/cli/cli.rs
> @@ -73,6 +73,7 @@ impl Debug for dyn SubCommand {
>  ///
>  /// packet-tracer is a tool for capturing networking-related events from the 
> system using ebpf and analyzing them.
>  #[derive(Args, Default, Debug)]
> +#[command(version)]
>  pub(crate) struct MainConfig {}
>  
>  /// ThinCli handles the first (a.k.a "thin") round of Command Line Interface 
> parsing.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 2/2] WIP

2023-02-27 Thread Paolo Valerio
Signed-off-by: Paolo Valerio 

Signed-off-by: Paolo Valerio 
---
 src/main.rs |1 +
 1 file changed, 1 insertion(+)

diff --git a/src/main.rs b/src/main.rs
index c922fae..c28a07f 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -2,6 +2,7 @@ use anyhow::Result;
 use log::error;
 use simplelog::{Config, LevelFilter, SimpleLogger};
 
+
 mod cli;
 mod collect;
 mod core;

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 1/2] cli: add option to display the version from Cargo.toml.

2023-02-27 Thread Paolo Valerio
Signed-off-by: Paolo Valerio 
---
 src/cli/cli.rs |1 +
 1 file changed, 1 insertion(+)

diff --git a/src/cli/cli.rs b/src/cli/cli.rs
index a5b08e6..f8593e1 100644
--- a/src/cli/cli.rs
+++ b/src/cli/cli.rs
@@ -73,6 +73,7 @@ impl Debug for dyn SubCommand {
 ///
 /// packet-tracer is a tool for capturing networking-related events from the 
system using ebpf and analyzing them.
 #[derive(Args, Default, Debug)]
+#[command(version)]
 pub(crate) struct MainConfig {}
 
 /// ThinCli handles the first (a.k.a "thin") round of Command Line Interface 
parsing.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] ovs-dpctl: Add new command dpctl/ct-sweep-next-run.

2023-02-27 Thread Paolo Valerio
Since 3d9c1b855a5f ("conntrack: Replace timeout based expiration lists
with rculists.") the sweep interval changed as well as the constraints
related to the sweeper.
Being able to change the default reschedule time may be convenient in
some conditions, like debugging.
This patch introduces new commands allowing to get and set the sweep
next run in ms.

Signed-off-by: Paolo Valerio 
---
 NEWS|4 +++
 lib/conntrack-private.h |1 +
 lib/conntrack.c |   18 +-
 lib/conntrack.h |2 ++
 lib/ct-dpif.c   |   14 +++
 lib/ct-dpif.h   |1 +
 lib/dpctl.c |   61 +++
 lib/dpctl.man   |8 ++
 lib/dpif-netdev.c   |   17 +
 lib/dpif-netlink.c  |2 ++
 lib/dpif-provider.h |4 +++
 11 files changed, 131 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index 391badd7c..c80f44429 100644
--- a/NEWS
+++ b/NEWS
@@ -4,6 +4,10 @@ Post-v3.1.0
  * OVS now collects per-interface upcall statistics that can be obtained
via 'ovs-appctl dpctl/show -s' or the interface's statistics column
in OVSDB.  Available with upstream kernel 6.2+.
+   - ovs-appctl:
+ * New commands "dpctl/{ct-get-sweep-next-run,ct-set-sweep-next-run}" that
+   allow to get and set, for the userspace datapath, the next run interval
+   for the conntrack garbage collector.
 
 
 v3.1.0 - 16 Feb 2023
diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index fae8b3a9b..3438c3554 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -224,6 +224,7 @@ struct conntrack {
 struct ipf *ipf; /* Fragmentation handling context. */
 uint32_t zone_limit_seq; /* Used to disambiguate zone limit counts. */
 atomic_bool tcp_seq_chk; /* Check TCP sequence numbers. */
+atomic_uint32_t sweep_ms;
 };
 
 /* Lock acquisition order:
diff --git a/lib/conntrack.c b/lib/conntrack.c
index 524670e45..e9a37f2c1 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -320,6 +320,7 @@ conntrack_init(void)
 atomic_count_init(>n_conn, 0);
 atomic_init(>n_conn_limit, DEFAULT_N_CONN_LIMIT);
 atomic_init(>tcp_seq_chk, true);
+atomic_init(>sweep_ms, 2);
 latch_init(>clean_thread_exit);
 ct->clean_thread = ovs_thread_create("ct_clean", clean_thread_main, ct);
 ct->ipf = ipf_init();
@@ -1480,6 +1481,21 @@ set_label(struct dp_packet *pkt, struct conn *conn,
 }
 
 
+int
+conntrack_set_sweep_next_run(struct conntrack *ct, uint32_t ms)
+{
+atomic_store_relaxed(>sweep_ms, ms);
+return 0;
+}
+
+uint32_t
+conntrack_get_sweep_next_run(struct conntrack *ct)
+{
+uint32_t ms;
+atomic_read_relaxed(>sweep_ms, );
+return ms;
+}
+
 static size_t
 ct_sweep(struct conntrack *ct, struct rculist *list, long long now)
 OVS_NO_THREAD_SAFETY_ANALYSIS
@@ -1504,7 +1520,7 @@ ct_sweep(struct conntrack *ct, struct rculist *list, long 
long now)
 static long long
 conntrack_clean(struct conntrack *ct, long long now)
 {
-long long next_wakeup = now + 20 * 1000;
+long long next_wakeup = now + conntrack_get_sweep_next_run(ct);
 unsigned int n_conn_limit, i;
 size_t clean_end, count = 0;
 
diff --git a/lib/conntrack.h b/lib/conntrack.h
index b064abc9f..2306cf375 100644
--- a/lib/conntrack.h
+++ b/lib/conntrack.h
@@ -139,6 +139,8 @@ int conntrack_set_maxconns(struct conntrack *ct, uint32_t 
maxconns);
 int conntrack_get_maxconns(struct conntrack *ct, uint32_t *maxconns);
 int conntrack_get_nconns(struct conntrack *ct, uint32_t *nconns);
 int conntrack_set_tcp_seq_chk(struct conntrack *ct, bool enabled);
+int conntrack_set_sweep_next_run(struct conntrack *ct, uint32_t ms);
+uint32_t conntrack_get_sweep_next_run(struct conntrack *ct);
 bool conntrack_get_tcp_seq_chk(struct conntrack *ct);
 struct ipf *conntrack_ipf_ctx(struct conntrack *ct);
 struct conntrack_zone_limit zone_limit_get(struct conntrack *ct,
diff --git a/lib/ct-dpif.c b/lib/ct-dpif.c
index d3b2783ce..0a08eb11c 100644
--- a/lib/ct-dpif.c
+++ b/lib/ct-dpif.c
@@ -368,6 +368,20 @@ ct_dpif_del_limits(struct dpif *dpif, const struct 
ovs_list *zone_limits)
 : EOPNOTSUPP);
 }
 
+int
+ct_dpif_sweep(struct dpif *dpif, uint32_t *ms)
+{
+if (*ms) {
+return (dpif->dpif_class->ct_set_sweep_next_run
+? dpif->dpif_class->ct_set_sweep_next_run(dpif, *ms)
+: EOPNOTSUPP);
+} else {
+return (dpif->dpif_class->ct_get_sweep_next_run
+? dpif->dpif_class->ct_get_sweep_next_run(dpif, ms)
+: EOPNOTSUPP);
+}
+}
+
 int
 ct_dpif_ipf_set_enabled(struct dpif *dpif, bool v6, bool enable)
 {
diff --git a/lib/ct-dpif.h b/lib/ct-dpif.h
index 5edbbfd3b..1e265604f 100644
--- a/lib/ct-dpif.h
+++ b/lib/ct-dpif.h
@@ -298,6 +298,7 @@ int ct_dpif_set_limits(struct dpif *dpif, const uint32_t 
*default_limit,
 int ct_dpif_get_limits(struct dpif *dpif, uint32_t *default_limit,
  

Re: [ovs-dev] [PATCH v22 0/8] Add offload support for sFlow

2023-02-27 Thread Eelco Chaudron


On 23 Feb 2023, at 12:26, Chris Mi wrote:

> v2-v1:
> - Fix robot errors.
> v3-v2:
> - Remove Gerrit Change-Id.
> - Add patch #9 to fix older kernels build issue.
> - Add travis test result.
> v4-v3:
> - Fix offload issue when sampling rate is 1.
> v5-v4:
> - Move polling thread from ofproto to netdev-offload-tc.
> v6-v5:
> - Rebase.
> - Add GitHub Actions test result.
> v7-v6:
> - Remove Gerrit Change-Id.
> - Fix "ERROR: Inappropriate spacing around cast"
> v8-v7
> - Address Eelco Chaudron's comment for patch #11.
> v9-v8
> - Remove sflow_len from struct dpif_sflow_attr.
> - Log a debug message for other userspace actions.
> v10-v9
> - Address Eelco Chaudron's comments on v9.
> v11-v10
> - Fix a bracing error.
> v12-v11
> - Add duplicate sample group id check.
> v13-v12
> - Remove the psample poll thread from netdev-offload-tc and reuse
>   ofproto handler thread according to Ilya's new desgin.
> - Add dpif-offload-provider layer according to Eli's suggestion.
> v14-v13
> - Fix a robot error.
> v15-v14
> - Address Eelco Chaudron's comments on v14.
> v16-v15
> - Address Eelco Chaudron's comments on v15.
> - Add two test cases.
> v17-v16
> - Address Eelco Chaudron's comments on v16.
> - Move struct dpif_offload_api from struct dpif_class to struct dpif.
> v18-v17
> - Rename dpif_offload_api to dpif_offload_class.
> - Add init and destroy callbacks in dpif_offload_class. They are called
>   when registering dpif_offload_class.
> v19-18
> - Fix a bug that psample_sock is destroyed when last bridge is deleted.
> v20-19
> - Move buf_stub to struct dpif_offload_sflow avoid garbage values when
>   ofproto proceses the sampled packet.
> v21-20
> - Remove netdev dummy for dpif-offload according to Eelco's comment.
> v22-21
> - Address Ilya's comments:
>  - Remove dpif-offload-provider layer.
>  - Remove process_offload_sflow and reuse upcall_receive.
>  - Introduce sample id pool.
>  - Introduce netdev_offload_recv.
>
> Chris Mi (8):
>   compat: Add psample and tc sample action defines for older kernels
>   ovs-kmod-ctl: Load kernel module psample
>   netdev-offload-tc: Introduce group ID management API
>   netdev-offload-tc: Add sFlow offload API for TC
>   netdev-offload: Add netdev offload recv and recv_wait APIs
>   dpif-netlink: Add netdev offload recv in normal recv upcalls
>   netdev-offload-tc: Add offload support for sFlow
>   system-offloads-traffic.at: Add sFlow offload test cases
>
>  include/linux/automake.mk|   4 +-
>  include/linux/psample.h  |  62 +++
>  include/linux/tc_act/tc_sample.h |  25 ++
>  lib/dpif-netlink.c   |  46 ++-
>  lib/dpif.h   |   7 +
>  lib/netdev-offload-provider.h|  11 +
>  lib/netdev-offload-tc.c  | 647 ++-
>  lib/netdev-offload.c |  32 ++
>  lib/netdev-offload.h |  13 +
>  lib/tc.c |  62 ++-
>  lib/tc.h |  12 +-
>  ofproto/ofproto-dpif-upcall.c|  23 +-
>  tests/system-offloads-traffic.at | 101 +
>  utilities/ovs-kmod-ctl.in|  14 +
>  14 files changed, 1036 insertions(+), 23 deletions(-)
>  create mode 100644 include/linux/psample.h
>  create mode 100644 include/linux/tc_act/tc_sample.h

As I was on PTO last week, and in addition, I see a lot of comments already 
from Ilya, I’ll wait for the next revision before doing another full review.

//Eelco

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] system-traffic.at: Add icmp error tests while dnatting address and port.

2023-02-27 Thread Paolo Valerio
The two tests verify, for both icmp and icmpv6, that the correct port
translation happen in the inner packet in the case an error is
received in the reply direction.

Signed-off-by: Paolo Valerio 
---
 tests/system-traffic.at |   72 +++
 1 file changed, 72 insertions(+)

diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 3a15b88a2..02fd0ee1b 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -3561,6 +3561,42 @@ AT_CHECK([ovs-appctl dpctl/dump-conntrack | 
FORMAT_CT(172.16.0.3)], [0], [dnl
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([conntrack - ICMP related NAT with single port])
+AT_SKIP_IF([test $HAVE_NC = no])
+AT_SKIP_IF([test $HAVE_TCPDUMP = no])
+CHECK_CONNTRACK()
+CHECK_CONNTRACK_NAT()
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", "f0:00:00:01:01:01")
+ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24", "f0:00:00:01:01:02")
+
+NS_CHECK_EXEC([at_ns0], [ip neigh add 10.1.1.240 lladdr f0:00:00:01:01:02 dev 
p0])
+NS_CHECK_EXEC([at_ns1], [ip neigh add 10.1.1.1 lladdr f0:00:00:01:01:01 dev 
p1])
+
+AT_DATA([flows.txt], [dnl
+table=0,ip,ct_state=-trk,actions=ct(table=0,nat)
+table=0,in_port=ovs-p0,udp,ct_state=+trk+new,actions=ct(commit,nat(dst=10.1.1.2:8080)),ovs-p1
+table=0,in_port=ovs-p1,ct_state=+trk+rel+rpl,icmp,actions=ovs-p0
+])
+
+AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
+
+rm p0.pcap
+NETNS_DAEMONIZE([at_ns0], [tcpdump -l -U -i p0 -w p0.pcap 2>tcpdump0_err], 
[tcpdump0.pid])
+NS_CHECK_EXEC([at_ns0], [bash -c "echo dest_unreach | nc $NC_EOF_OPT -p 1234 
-u 10.1.1.240 80"])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.1," | 
sort], [0], [dnl
+udp,orig=(src=10.1.1.1,dst=10.1.1.240,sport=1234,dport=80),reply=(src=10.1.1.2,dst=10.1.1.1,sport=8080,dport=1234)
+])
+
+OVS_WAIT_UNTIL([ovs-pcap p0.pcap | grep -Eq 
"f0010101f0010102080045c00045[[[:xdigit:]]]{4}4001[[[:xdigit:]]]{4}0a0101f00a010101030314164529[[[:xdigit:]]]{4}40004011[[[:xdigit:]]]{4}0a0101010a0101f004d2005000156b24646573745f756e72656163680a"])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
 AT_SETUP([conntrack - IPv4 fragmentation])
 CHECK_CONNTRACK()
 OVS_TRAFFIC_VSWITCHD_START()
@@ -6555,6 +6591,42 @@ 
udp,orig=(src=fc00::1,dst=fc00::2,sport=,dport=),reply=(src=fc
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([conntrack - ICMPv6 related NAT with single port])
+AT_SKIP_IF([test $HAVE_NC = no])
+AT_SKIP_IF([test $HAVE_TCPDUMP = no])
+CHECK_CONNTRACK()
+CHECK_CONNTRACK_NAT()
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH(p0, at_ns0, br0, "fc00::1/96", "f0:00:00:01:01:01", [], "nodad")
+ADD_VETH(p1, at_ns1, br0, "fc00::2/96", "f0:00:00:01:01:02", [], "nodad")
+
+NS_CHECK_EXEC([at_ns0], [ip -6 neigh add fc00::240 lladdr f0:00:00:01:01:02 
dev p0])
+NS_CHECK_EXEC([at_ns1], [ip -6 neigh add fc00::1 lladdr f0:00:00:01:01:01 dev 
p1])
+
+AT_DATA([flows.txt], [dnl
+table=0,ipv6,ct_state=-trk,actions=ct(table=0,nat)
+table=0,in_port=ovs-p0,udp6,ct_state=+trk+new,actions=ct(commit,nat(dst=[[fc00::2]]:8080)),ovs-p1
+table=0,in_port=ovs-p1,ct_state=+trk+rel+rpl,icmp6,actions=ovs-p0
+])
+
+AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
+
+rm p0.pcap
+NETNS_DAEMONIZE([at_ns0], [tcpdump -l -U -i p0 -w p0.pcap 2>tcpdump0_err], 
[tcpdump0.pid])
+NS_CHECK_EXEC([at_ns0], [bash -c "echo dest_unreach | nc -6 $NC_EOF_OPT -p 
1234 -u fc00::240 80"])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=fc00::1," | sort], 
[0], [dnl
+udp,orig=(src=fc00::1,dst=fc00::240,sport=1234,dport=80),reply=(src=fc00::2,dst=fc00::1,sport=8080,dport=1234)
+])
+
+OVS_WAIT_UNTIL([ovs-pcap p0.pcap | grep -Eq 
"f0010101f001010286dd60[[[:xdigit:]]]{6}00453a40fc000240fc010104[[[:xdigit:]]]{4}60[[[:xdigit:]]]{6}00151140fc01fc00024004d20050001587d4646573745f756e72656163680a"])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
 AT_SETUP([conntrack - IPv6 FTP with SNAT])
 AT_SKIP_IF([test $HAVE_FTP = no])
 CHECK_CONNTRACK()

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn] northd, lsp: Add additional arp proxy features

2023-02-27 Thread Enrique Llorente
Configure mac address
The mac address returned by ARP/NDP can be configured similar to LSP
addresses where the mac is the first entry on the list

IPv6
Support NDP IPv6 protocol

Use CIDRs
Allow to specify subnets for ipv4 and ipv6, they will match whatever
address is received from ARP/NDP

Signed-off-by: Enrique Llorente 
---
 northd/northd.c   | 141 +++---
 northd/ovn-northd.8.xml   |  12 +-
 ovn-nb.xml|  18 ++-
 tests/ovn.at  | 183 +++
 tests/system-common-macros.at |   5 +-
 tests/system-ovn.at   | 271 ++
 6 files changed, 571 insertions(+), 59 deletions(-)

diff --git a/northd/northd.c b/northd/northd.c
index 770a5b50e..f773603c4 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -8644,29 +8644,43 @@ build_lswitch_arp_nd_responder_known_ips(struct 
ovn_port *op,
 }
 }
 }
-
-if (op->peer) {
-const char *arp_proxy = smap_get(>nbsp->options,"arp_proxy");
-
+const char *arp_proxy = smap_get(>nbsp->options,"arp_proxy");
+if (arp_proxy) {
 struct lport_addresses proxy_arp_addrs;
-int i = 0;
+int i, ofs = 0;
+/* Either takes "MAC IP1 IP2" or "IP1 IP2" */
+if (!extract_addresses(arp_proxy, _arp_addrs, ) &&
+!extract_ip_addresses(arp_proxy, _arp_addrs)) {
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
+VLOG_WARN_RL(, "Invalid arp_proxy option: '%s' at lsp '%s'",
+ arp_proxy, op->nbsp->name);
+return;
+}
+
+/* Select the mac address to answer the proxy ARP/NDP */
+char *ea_s = NULL;
+if (!eth_addr_is_zero(proxy_arp_addrs.ea)) {
+ea_s = proxy_arp_addrs.ea_s;
+} else if (op->peer) {
+ea_s = op->peer->lrp_networks.ea_s;
+} else {
+return;
+}
 
-/* Add responses for ARP proxies. */
-if (arp_proxy && extract_ip_addresses(arp_proxy,
-  _arp_addrs) &&
-proxy_arp_addrs.n_ipv4_addrs) {
+/* Add IPv4 responses for ARP proxies. */
+if (proxy_arp_addrs.n_ipv4_addrs) {
 /* Match rule on all proxy ARP IPs. */
 ds_clear(match);
 ds_put_cstr(match, "arp.op == 1 && arp.tpa == {");
 
 for (i = 0; i < proxy_arp_addrs.n_ipv4_addrs; i++) {
-ds_put_format(match, "%s,",
-  proxy_arp_addrs.ipv4_addrs[i].addr_s);
+ds_put_format(match, "%s/%u,",
+  proxy_arp_addrs.ipv4_addrs[i].addr_s,
+  proxy_arp_addrs.ipv4_addrs[i].plen);
 }
 
 ds_chomp(match, ',');
 ds_put_cstr(match, "}");
-destroy_lport_addresses(_arp_addrs);
 
 ds_clear(actions);
 ds_put_format(actions,
@@ -8679,12 +8693,69 @@ build_lswitch_arp_nd_responder_known_ips(struct 
ovn_port *op,
 "outport = inport; "
 "flags.loopback = 1; "
 "output;",
-op->peer->lrp_networks.ea_s,
-op->peer->lrp_networks.ea_s);
+ea_s,
+ea_s);
 
 ovn_lflow_add_with_hint(lflows, op->od, S_SWITCH_IN_ARP_ND_RSP,
 50, ds_cstr(match), ds_cstr(actions), >nbsp->header_);
 }
+
+/* Add IPv6 NDP responses.
+ * For ND solicitations, we need to listen for both the
+ * unicast IPv6 address and its all-nodes multicast address,
+ * but always respond with the unicast IPv6 address. */
+if (proxy_arp_addrs.n_ipv6_addrs) {
+struct ds ip6_dst_match = DS_EMPTY_INITIALIZER;
+struct ds nd_target_match = DS_EMPTY_INITIALIZER;
+for (size_t j = 0; j < proxy_arp_addrs.n_ipv6_addrs; j++) {
+ds_put_format(_dst_match, "%s/%u, %s/%u",
+proxy_arp_addrs.ipv6_addrs[j].addr_s,
+proxy_arp_addrs.ipv6_addrs[j].plen,
+proxy_arp_addrs.ipv6_addrs[j].sn_addr_s,
+proxy_arp_addrs.ipv6_addrs[j].plen);
+ds_put_format(_target_match,
+"%s/%u",
+proxy_arp_addrs.ipv6_addrs[j].addr_s,
+proxy_arp_addrs.ipv6_addrs[j].plen);
+if (j+1 < proxy_arp_addrs.n_ipv6_addrs) {
+ds_put_cstr(_dst_match, ", ");
+ds_put_cstr(_target_match, ", ");
+}
+}
+  

Re: [ovs-dev] [PATCH ovn] MAINTAINERS: Move myself to emeritus status

2023-02-27 Thread Dumitru Ceara
On 2/24/23 05:09, Numan Siddique wrote:
> On Mon, Feb 20, 2023 at 4:41 PM Russell Bryant  wrote:
>>
>> From: Russell Bryant 
>>
>> I have not been involved in OVN development long enough that I should
>> transition to emeritus status.
>>
>> Signed-off-by: Russell Bryant 
> 
> Thanks for all the contributions.
> 

Echoing Numan: thanks for the contributions throughout the years!

Regards,
Dumitru

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev