Re: [ovs-dev] [PATCH v9 4/4] userspace: Enable L4 checksum offloading by default.

2023-07-07 Thread Flavio Leitner

Please ignore this email.
fbl

On 7/7/23 16:21, Flavio Leitner wrote:

From: Ilya Maximets 

On 11/24/22 06:30, Mike Pattrick wrote:

From: Flavio Leitner 

The netdev receiving packets is supposed to provide the flags
indicating whether the L4 checksum was verified and is OK or BAD;
otherwise, the stack verifies it in software when appropriate.

If the packet arrives with a good checksum, then postpone any
needed checksum calculation to the egress device.

When encapsulating a packet with that flag, calculate the checksum
of the inner L4 header right away, since offloading the inner
checksum is not yet supported.

Calculate the L4 checksum when the packet is going to be sent
over a device that doesn't support the feature.
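The flag handling described above comes down to two decisions: whether to verify the checksum in software on ingress, and whether to compute it in software on egress. The sketch below models that with a hypothetical `struct pkt_csum_state` and hypothetical `TX_*_CKSUM` capability bits loosely modelled on the `NETDEV_TX_OFFLOAD_*_CKSUM` flags in the patch; it is an illustration, not the `dp_packet` API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of per-packet L4 checksum state; these
 * names are illustrative and are not part of the dp_packet API. */
enum l4_proto { L4_TCP, L4_UDP, L4_SCTP };

/* Hypothetical egress capability bits, loosely modelled on the
 * NETDEV_TX_OFFLOAD_*_CKSUM flags used in the patch. */
#define TX_TCP_CKSUM  (1u << 0)
#define TX_UDP_CKSUM  (1u << 1)
#define TX_SCTP_CKSUM (1u << 2)

struct pkt_csum_state {
    enum l4_proto proto;
    bool rx_good;       /* Ingress netdev verified the checksum: good. */
    bool rx_bad;        /* Ingress netdev verified the checksum: bad. */
    bool tx_requested;  /* Checksum computation deferred to egress. */
};

/* Ingress side (e.g. conntrack): verify in software only when neither the
 * ingress netdev nor a pending TX offload vouches for the checksum.
 * Packets flagged bad are expected to be handled before this point. */
static bool
needs_sw_verify(const struct pkt_csum_state *s)
{
    assert(!s->rx_bad);
    return !s->rx_good && !s->tx_requested;
}

/* Egress side: true if software must fill in the checksum because the
 * output device cannot. */
static bool
needs_sw_csum_on_tx(const struct pkt_csum_state *s, uint32_t dev_flags)
{
    if (s->rx_good || !s->tx_requested) {
        return false;           /* Already correct, or nothing deferred. */
    }
    switch (s->proto) {
    case L4_TCP:
        return !(dev_flags & TX_TCP_CKSUM);
    case L4_UDP:
        return !(dev_flags & TX_UDP_CKSUM);
    case L4_SCTP:
        return !(dev_flags & TX_SCTP_CKSUM);
    }
    return true;                /* Unknown protocol: be conservative. */
}
```

Keeping the "good" state separate from the TX request is what lets an unmodified packet with a verified checksum skip software checksumming entirely, even when the egress device has no offload support.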

Linux tap devices allow enabling L3 and L4 offload, so this
patch enables the feature for them. However, it remains disabled
for the Linux socket interface because the API doesn't allow
enabling those two features without also enabling TSO.
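On Linux, checksum offload for a tap device is negotiated with the `TUNSETOFFLOAD` ioctl from `<linux/if_tun.h>`. The sketch below is only an illustration, not the code in `lib/netdev-linux.c`; it assumes `tap_fd` was opened from `/dev/net/tun` with `IFF_TAP | IFF_VNET_HDR` (not shown), and the helper name `tap_enable_csum_offload()` is hypothetical. Passing only `TUN_F_CSUM` requests checksum offload without the `TUN_F_TSO4`/`TUN_F_TSO6` segmentation flags.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/if_tun.h>

/* Sketch: enable L4 checksum offload (but not TSO) on a tap fd.
 * Assumes 'tap_fd' came from /dev/net/tun opened with
 * IFF_TAP | IFF_VNET_HDR, so packets carry a virtio-net header that
 * describes any partial checksum.  Returns 0 or a positive errno value. */
static int
tap_enable_csum_offload(int tap_fd)
{
    unsigned int offloads = TUN_F_CSUM;   /* No TUN_F_TSO4/TUN_F_TSO6. */

    if (ioctl(tap_fd, TUNSETOFFLOAD, offloads) < 0) {
        return errno;
    }
    return 0;
}
```

A caller could fall back to computing checksums in software whenever this call fails.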

Signed-off-by: Flavio Leitner 
Co-authored-by: Mike Pattrick 
Signed-off-by: Mike Pattrick 
---

Didn't test this one either.  Only a visual review.

Should we enable checksum offloading in CONFIGURE_VETH_OFFLOADS for
check-system-userspace testsuite since support is enabled by default?

More comments inline.

Best regards, Ilya Maximets.


  lib/conntrack.c |  15 +--
  lib/dp-packet.c |  25 
  lib/dp-packet.h |  78 -
  lib/flow.c  |  23 
  lib/netdev-dpdk.c   | 188 --
  lib/netdev-linux.c  | 252 ++--
  lib/netdev-native-tnl.c |  32 +
  lib/netdev.c|  46 ++--
  lib/packets.c   | 175 ++--
  lib/packets.h   |   3 +
  10 files changed, 580 insertions(+), 257 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 12194cce8..57e6a55e0 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -2118,13 +2118,12 @@ conn_key_extract(struct conntrack *ct, struct dp_packet *pkt, ovs_be16 dl_type,
  }
  
  if (ok) {
-bool hwol_bad_l4_csum = dp_packet_l4_checksum_bad(pkt);
-if (!hwol_bad_l4_csum) {
-bool  hwol_good_l4_csum = dp_packet_l4_checksum_good(pkt)
-  || dp_packet_hwol_tx_l4_checksum(pkt);
+if (!dp_packet_l4_checksum_bad(pkt)) {
  /* Validate the checksum only when hwol is not supported. */
  if (extract_l4(&ctx->key, l4, dp_packet_l4_size(pkt),
-   &ctx->icmp_related, l3, !hwol_good_l4_csum,
+   &ctx->icmp_related, l3,
+   !dp_packet_l4_checksum_good(pkt) &&
+   !dp_packet_hwol_tx_l4_checksum(pkt),
 NULL)) {
  ctx->hash = conn_key_hash(&ctx->key, ct->hash_basis);
  return true;
@@ -3453,8 +3452,10 @@ handle_ftp_ctl(struct conntrack *ct, const struct conn_lookup_ctx *ctx,
  adj_seqnum(&th->tcp_seq, ec->seq_skew);
  }
  
-th->tcp_csum = 0;
-if (!dp_packet_hwol_tx_l4_checksum(pkt)) {
+if (dp_packet_hwol_tx_l4_checksum(pkt)) {
+dp_packet_ol_reset_l4_csum_good(pkt);
+} else {
+th->tcp_csum = 0;
  if (ctx->key.dl_type == htons(ETH_TYPE_IPV6)) {
  th->tcp_csum = packet_csum_upperlayer6(nh6, th, ctx->key.nw_proto,
 dp_packet_l4_size(pkt));
diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 90ef85de3..2cfaf5274 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -38,6 +38,9 @@ dp_packet_init__(struct dp_packet *b, size_t allocated, enum dp_packet_source so
  dp_packet_init_specific(b);
  /* By default assume the packet type to be Ethernet. */
  b->packet_type = htonl(PT_ETH);
+/* Reset csum start and offset. */
+b->csum_start = 0;
+b->csum_offset = 0;
  }
  
  static void
@@ -544,4 +547,26 @@ dp_packet_ol_send_prepare(struct dp_packet *p, const uint64_t flags)
  dp_packet_ol_set_ip_csum_good(p);
  dp_packet_hwol_reset_tx_ip_csum(p);
  }
+
+if (dp_packet_l4_checksum_good(p) || !dp_packet_hwol_tx_l4_checksum(p)) {
+dp_packet_hwol_reset_tx_l4_csum(p);
+return;
+}
+
+if (dp_packet_hwol_l4_is_tcp(p)
+&& !(flags & NETDEV_TX_OFFLOAD_TCP_CKSUM)) {
+packet_tcp_complete_csum(p);
+dp_packet_ol_set_l4_csum_good(p);
+dp_packet_hwol_reset_tx_l4_csum(p);
+} else if (dp_packet_hwol_l4_is_udp(p)
+&& !(flags & NETDEV_TX_OFFLOAD_UDP_CKSUM)) {

Indentation.


+packet_udp_complete_csum(p);
+dp_packet_ol_set_l4_csum_good(p);
+dp_packet_hwol_reset_tx_l4_csum(p);
+} else if (!(flags & NETDEV_TX_OFFLOAD_SCTP_CKSUM)
+&& dp_packet_hwol_l4_is_sctp(p)) {

Indentation.


+packet_sctp_complete_csum(p);
+dp_packet_ol_set_l4_csum_good(p);
+dp_packet_hwol_reset_tx_l4_c

Re: [ovs-dev] [PATCH v9 4/4] userspace: Enable L4 checksum offloading by default.

2023-07-07 Thread Flavio Leitner
From: Ilya Maximets 

On 11/24/22 06:30, Mike Pattrick wrote:
> From: Flavio Leitner 
> 
> The netdev receiving packets is supposed to provide the flags
> indicating whether the L4 checksum was verified and is OK or BAD;
> otherwise, the stack verifies it in software when appropriate.
> 
> If the packet arrives with a good checksum, then postpone any
> needed checksum calculation to the egress device.
> 
> When encapsulating a packet with that flag, calculate the checksum
> of the inner L4 header right away, since offloading the inner
> checksum is not yet supported.
> 
> Calculate the L4 checksum when the packet is going to be sent
> over a device that doesn't support the feature.
> 
> Linux tap devices allow enabling L3 and L4 offload, so this
> patch enables the feature for them. However, it remains disabled
> for the Linux socket interface because the API doesn't allow
> enabling those two features without also enabling TSO.
> 
> Signed-off-by: Flavio Leitner 
> Co-authored-by: Mike Pattrick 
> Signed-off-by: Mike Pattrick 
> ---

Didn't test this one either.  Only a visual review.

Should we enable checksum offloading in CONFIGURE_VETH_OFFLOADS for
check-system-userspace testsuite since support is enabled by default?

More comments inline.

Best regards, Ilya Maximets.

>  lib/conntrack.c |  15 +--
>  lib/dp-packet.c |  25 
>  lib/dp-packet.h |  78 -
>  lib/flow.c  |  23 
>  lib/netdev-dpdk.c   | 188 --
>  lib/netdev-linux.c  | 252 ++--
>  lib/netdev-native-tnl.c |  32 +
>  lib/netdev.c|  46 ++--
>  lib/packets.c   | 175 ++--
>  lib/packets.h   |   3 +
>  10 files changed, 580 insertions(+), 257 deletions(-)
> 
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index 12194cce8..57e6a55e0 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -2118,13 +2118,12 @@ conn_key_extract(struct conntrack *ct, struct dp_packet *pkt, ovs_be16 dl_type,
>  }
>  
>  if (ok) {
> -bool hwol_bad_l4_csum = dp_packet_l4_checksum_bad(pkt);
> -if (!hwol_bad_l4_csum) {
> -bool  hwol_good_l4_csum = dp_packet_l4_checksum_good(pkt)
> -  || dp_packet_hwol_tx_l4_checksum(pkt);
> +if (!dp_packet_l4_checksum_bad(pkt)) {
>  /* Validate the checksum only when hwol is not supported. */
>  if (extract_l4(&ctx->key, l4, dp_packet_l4_size(pkt),
> -   &ctx->icmp_related, l3, !hwol_good_l4_csum,
> +   &ctx->icmp_related, l3,
> +   !dp_packet_l4_checksum_good(pkt) &&
> +   !dp_packet_hwol_tx_l4_checksum(pkt),
> NULL)) {
>  ctx->hash = conn_key_hash(&ctx->key, ct->hash_basis);
>  return true;
> @@ -3453,8 +3452,10 @@ handle_ftp_ctl(struct conntrack *ct, const struct conn_lookup_ctx *ctx,
>  adj_seqnum(&th->tcp_seq, ec->seq_skew);
>  }
>  
> -th->tcp_csum = 0;
> -if (!dp_packet_hwol_tx_l4_checksum(pkt)) {
> +if (dp_packet_hwol_tx_l4_checksum(pkt)) {
> +dp_packet_ol_reset_l4_csum_good(pkt);
> +} else {
> +th->tcp_csum = 0;
>  if (ctx->key.dl_type == htons(ETH_TYPE_IPV6)) {
>  th->tcp_csum = packet_csum_upperlayer6(nh6, th, ctx->key.nw_proto,
> dp_packet_l4_size(pkt));
> diff --git a/lib/dp-packet.c b/lib/dp-packet.c
> index 90ef85de3..2cfaf5274 100644
> --- a/lib/dp-packet.c
> +++ b/lib/dp-packet.c
> @@ -38,6 +38,9 @@ dp_packet_init__(struct dp_packet *b, size_t allocated, enum dp_packet_source so
>  dp_packet_init_specific(b);
>  /* By default assume the packet type to be Ethernet. */
>  b->packet_type = htonl(PT_ETH);
> +/* Reset csum start and offset. */
> +b->csum_start = 0;
> +b->csum_offset = 0;
>  }
>  
>  static void
> @@ -544,4 +547,26 @@ dp_packet_ol_send_prepare(struct dp_packet *p, const uint64_t flags)
>  dp_packet_ol_set_ip_csum_good(p);
>  dp_packet_hwol_reset_tx_ip_csum(p);
>  }
> +
> +if (dp_packet_l4_checksum_good(p) || !dp_packet_hwol_tx_l4_checksum(p)) {
> +dp_packet_hwol_reset_tx_l4_csum(p);
> +return;
> +}
> +
> +if (dp_packet_hwol_l4_is_tcp(p)
> +&& !(flags & NETDEV_TX_OFFLOAD_TCP_CKSUM)) {
> +packet_tcp_complete_csum(p);
> +dp_packet_ol_set_l4_csum_good(p);
> +dp_packet_hwol_reset_tx_l4_csum(p);
> +} else if (dp_packet_hwol_l4_is_udp(p)
> +&

[ovs-dev] [PATCH v3] ovs-vsctl: Exit with error if postdb checks report errors.

2023-07-04 Thread Flavio Leitner
Today the exit code reflects only the execution of the change
in the database. However, when the --no-wait parameter is not
used (the default), ovs-vsctl also checks whether the OVSDB
transactions were successfully recorded and reloaded by
ovs-vswitchd. In this case, an error message is printed if
there is a problem during the reload, such as the one below:

 # ovs-vsctl add-port br0 gre0 -- \
set interface gre0 type=ip6gre options:key=100 \
options:remote_ip=2001::2
ovs-vsctl: Error detected while setting up 'gre0': could not \
add network device gre0 to ofproto (Address family not supported \
by protocol).  See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch".
 # echo $?
0

This patch makes ovs-vsctl exit with error code 160
(ERROR_BAD_ARGUMENTS) on Windows and 65 (EX_DATAERR) on
Linux or BSD if errors were reported during the reload.

This change may break existing scripts, because ovs-vsctl will
start failing in cases where it previously succeeded. However,
if an error is printed, the change was most likely not
functional anyway.
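The platform-specific exit status described above can be chosen at compile time. This is a minimal sketch: `EX_DATAERR` comes from `<sysexits.h>` and `ERROR_BAD_ARGUMENTS` from the Windows headers, while `vsctl_exit_status()` is a hypothetical helper, not a function in `utilities/ovs-vsctl.c`.

```c
#include <stdbool.h>
#include <stdlib.h>

#ifdef _WIN32
#include <windows.h>      /* ERROR_BAD_ARGUMENTS (160), via winerror.h. */
#define EXIT_POSTDB_ERROR ERROR_BAD_ARGUMENTS
#else
#include <sysexits.h>     /* EX_DATAERR (65). */
#define EXIT_POSTDB_ERROR EX_DATAERR
#endif

/* Hypothetical helper: map the result of the post-reload checks to the
 * process exit status.  'postdb_err' is true when ovs-vswitchd reported an
 * error for a change that was otherwise committed to the database. */
static int
vsctl_exit_status(bool postdb_err)
{
    return postdb_err ? EXIT_POSTDB_ERROR : EXIT_SUCCESS;
}
```

A script can then tell this case apart from the existing status 1 (usage, syntax, or configuration file error) by checking for 65, or 160 on Windows.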

Reported-at: https://bugzilla.redhat.com/1731553
Signed-off-by: Flavio Leitner 
---

v3: Fixed the Windows build issue reported by Ilya.
Return ERROR_BAD_ARGUMENTS on Windows.
v2:
Followed Aaron's suggestion to return EX_DATAERR.

 NEWS |  5 +
 tests/ovs-vsctl.at   | 30 --
 tests/ovs-vswitchd.at|  6 +-
 tests/tunnel.at  |  8 +++-
 utilities/ovs-vsctl.8.in |  4 
 utilities/ovs-vsctl.c| 35 +--
 6 files changed, 78 insertions(+), 10 deletions(-)

diff --git a/NEWS b/NEWS
index 6a990c921..8c733e417 100644
--- a/NEWS
+++ b/NEWS
@@ -30,6 +30,11 @@ Post-v3.1.0
  * Added new options --[ovsdb-server|ovs-vswitchd]-umask=MODE to set umask
value when starting OVS daemons.  E.g., use --ovsdb-server-umask=0002
in order to create OVSDB sockets with access mode of 0770.
+   - ovs-vsctl:
+ * Exit with error code 160 (ERROR_BAD_ARGUMENTS) on Windows or
+   65 (EX_DATAERR) on other platforms if errors were reported while
+   checking whether OVSDB transactions were successfully recorded and
+   reloaded by ovs-vswitchd.
- QoS:
  * Added new configuration option 'jitter' for a linux-netem QoS type.
- DPDK:
diff --git a/tests/ovs-vsctl.at b/tests/ovs-vsctl.at
index a368bff6e..a8274734f 100644
--- a/tests/ovs-vsctl.at
+++ b/tests/ovs-vsctl.at
@@ -1522,7 +1522,11 @@ cat >experr <experr <experr << EOF
+ovs-vsctl: Error detected while setting up 'gre0': gre0: bad ip6gre 'remote_ip'
+gre0: ip6gre type requires valid 'remote_ip' argument.  See ovs-vswitchd log for details.
+ovs-vsctl: The default log directory is "$OVS_RUNDIR".
+EOF
+if test "$IS_WIN32" = "yes"; then
+AT_CHECK([ovs-vsctl add-port br0 gre0 -- set interface gre0 type=ip6gre options:key=100 options:remote_ip=192.168.0.300], [160], [], [experr])
+else
+AT_CHECK([ovs-vsctl add-port br0 gre0 -- set interface gre0 type=ip6gre options:key=100 options:remote_ip=192.168.0.300], [65], [], [experr])
+fi
+OVS_VSWITCHD_STOP(["/is not a valid IP address/d
+/netdev_vport|WARN|gre0: bad ip6gre 'remote_ip'/d
+/netdev|WARN|gre0: could not set configuration/d"])
+AT_CLEANUP
diff --git a/tests/ovs-vswitchd.at b/tests/ovs-vswitchd.at
index 977b2eba1..8fcfc6ec1 100644
--- a/tests/ovs-vswitchd.at
+++ b/tests/ovs-vswitchd.at
@@ -222,7 +222,11 @@ cat >experr <
+#define EXIT_POSTDB_ERROR ERROR_BAD_ARGUMENTS
+#else
+#include 
+#define EXIT_POSTDB_ERROR EX_DATAERR
+#endif
+
 struct vsctl_context;
 
 /* --db: The database server to contact. */
@@ -115,7 +124,7 @@ static void parse_options(int argc, char *argv[], struct shash *local_options);
 static void run_prerequisites(struct ctl_command[], size_t n_commands,
   struct ovsdb_idl *);
 static bool do_vsctl(const char *args, struct ctl_command *, size_t n,
- struct ovsdb_idl *);
+ struct ovsdb_idl *, bool *);
 
 /* post_db_reload_check frame work is to allow ovs-vsctl to do additional
  * checks after OVSDB transactions are successfully recorded and reload by
@@ -134,11 +143,13 @@ static bool do_vsctl(const char *args, struct ctl_command *, size_t n,
  * Current implementation only check for Post OVSDB reload failures on new
  * interface additions with 'add-br' and 'add-port' commands.
  *
+ * post_db_reload_do_checks() returns 'true' if a failure is reported.
+ *
  * post_db_reload_expect_iface()
  *
  * keep track of interfaces to be checked post OVSDB reload. */
 static void post_db_reload_check_init(void);
-static void post_db_reload_do_checks(const struct vsctl_context *);
+static bool post_db_reload_do_checks(const struct vsctl_context *);
 static void post_db_reload_expect_iface(const struct ovsrec_interface *);
 
 static struct uuid *neoteric_ifaces;
@@ -200,9 

Re: [ovs-dev] [PATCH v2] ovs-vsctl: Exit with error if postdb checks report errors.

2023-06-30 Thread Flavio Leitner



On 6/30/23 12:31, Aaron Conole wrote:

Flavio Leitner  writes:


Today the exit code reflects only the execution of the change
in the database. However, when the --no-wait parameter is not
used (the default), ovs-vsctl also checks whether the OVSDB
transactions were successfully recorded and reloaded by
ovs-vswitchd. In this case, an error message is printed if
there is a problem during the reload, such as the one below:

  # ovs-vsctl add-port br0 gre0 -- \
 set interface gre0 type=ip6gre options:key=100 \
 options:remote_ip=2001::2
ovs-vsctl: Error detected while setting up 'gre0': could not \
add network device gre0 to ofproto (Address family not supported \
by protocol).  See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch".
  # echo $?
0

This patch makes ovs-vsctl exit with error code 65
(EX_DATAERR) if errors were reported during the reload.

This change may break existing scripts, because ovs-vsctl will
start failing in cases where it previously succeeded. However,
if an error is printed, the change was most likely not
functional anyway.

Reported-at: https://bugzilla.redhat.com/1731553
Signed-off-by: Flavio Leitner 
---

LGTM.  I did a quick double check for FreeBSD and Mac OS X, and the
error code is the same value as on linux systems.

I don't have a windows machine to test with and unfortunately the robot
doesn't build series_* branches on appveyor (maybe something to look
at).  We may need a workaround for windows - but I'll let Alin take a
look.

Acked-by: Aaron Conole 


Thanks for reviewing and testing it.

fbl







v2:
 Followed Aaron's suggestion to return EX_DATAERR.

  NEWS |  4 
  tests/ovs-vsctl.at   | 19 +--
  tests/ovs-vswitchd.at|  2 +-
  tests/tunnel.at  |  2 +-
  utilities/ovs-vsctl.8.in |  2 ++
  utilities/ovs-vsctl.c| 30 --
  6 files changed, 49 insertions(+), 10 deletions(-)

diff --git a/NEWS b/NEWS
index 66d5a4ea3..cb148a09f 100644
--- a/NEWS
+++ b/NEWS
@@ -28,6 +28,10 @@ Post-v3.1.0
   * Added new options --[ovsdb-server|ovs-vswitchd]-umask=MODE to set umask
 value when starting OVS daemons.  E.g., use --ovsdb-server-umask=0002
 in order to create OVSDB sockets with access mode of 0770.
+   - ovs-vsctl:
+ * Exit with error code 65 (EX_DATAERR) if errors were reported while
+   checking whether OVSDB transactions were successfully recorded and
+   reloaded by ovs-vswitchd.
 - QoS:
   * Added new configuration option 'jitter' for a linux-netem QoS type.
 - DPDK:
diff --git a/tests/ovs-vsctl.at b/tests/ovs-vsctl.at
index a368bff6e..b282798cc 100644
--- a/tests/ovs-vsctl.at
+++ b/tests/ovs-vsctl.at
@@ -1522,7 +1522,7 @@ cat >experr <experr <  
  OVS_VSCTL_CLEANUP

  AT_CLEANUP
+
+AT_SETUP([ovs-vsctl -- return error if OVSDB reload issues are reported])
+OVS_VSWITCHD_START
+dnl check if ovs-vsctl returns error 65 if ovs-vswitchd fails to reload.
+
+cat >experr << EOF
+ovs-vsctl: Error detected while setting up 'gre0': gre0: bad ip6gre 'remote_ip'
+gre0: ip6gre type requires valid 'remote_ip' argument.  See ovs-vswitchd log for details.
+ovs-vsctl: The default log directory is "$OVS_RUNDIR".
+EOF
+AT_CHECK([ovs-vsctl add-port br0 gre0 -- set interface gre0 type=ip6gre options:key=100 options:remote_ip=192.168.0.300], [65], [], [experr])
+OVS_VSWITCHD_STOP(["/is not a valid IP address/d
+/netdev_vport|WARN|gre0: bad ip6gre 'remote_ip'/d
+/netdev|WARN|gre0: could not set configuration/d"])
+AT_CLEANUP
diff --git a/tests/ovs-vswitchd.at b/tests/ovs-vswitchd.at
index 977b2eba1..80a748355 100644
--- a/tests/ovs-vswitchd.at
+++ b/tests/ovs-vswitchd.at
@@ -222,7 +222,7 @@ cat >experr <  
  OVS_VSWITCHD_STOP(['/ignoring bridge with invalid name/d'])

diff --git a/tests/tunnel.at b/tests/tunnel.at
index ddeb66bc9..d281c9e6c 100644
--- a/tests/tunnel.at
+++ b/tests/tunnel.at
@@ -1009,7 +1009,7 @@ OVS_VSWITCHD_START([add-port br0 p1 -- set Interface p1 type=vxlan \
  options:remote_ip=flow ofport_request=1])
  
  AT_CHECK([ovs-vsctl add-port br0 p2 -- set Interface p2 type=vxlan \
-options:remote_ip=flow options:exts=gbp options:key=1 ofport_request=2], [0],
+options:remote_ip=flow options:exts=gbp options:key=1 ofport_request=2], [65],
[], [ignore])
  
  AT_CHECK([grep 'p2: could not set configuration (File exists)' ovs-vswitchd.log | sed "s/^.*\(p2:.*\)$/\1/"], [0],

diff --git a/utilities/ovs-vsctl.8.in b/utilities/ovs-vsctl.8.in
index 9e319aa1c..7c7e7bc29 100644
--- a/utilities/ovs-vsctl.8.in
+++ b/utilities/ovs-vsctl.8.in
@@ -892,6 +892,8 @@ Usage, syntax, or configuration file error.
  .IP "2"
  The \fIbridge\fR argument to \fBbr\-exists\fR specified the name of a
  bridge that does not exist.
+.IP "65"
+An error was reported after the OVSDB reload.
  .SH &q

[ovs-dev] [PATCH v2] ovs-vsctl: Exit with error if postdb checks report errors.

2023-06-30 Thread Flavio Leitner
Today the exit code reflects only the execution of the change
in the database. However, when the --no-wait parameter is not
used (the default), ovs-vsctl also checks whether the OVSDB
transactions were successfully recorded and reloaded by
ovs-vswitchd. In this case, an error message is printed if
there is a problem during the reload, such as the one below:

 # ovs-vsctl add-port br0 gre0 -- \
set interface gre0 type=ip6gre options:key=100 \
options:remote_ip=2001::2
ovs-vsctl: Error detected while setting up 'gre0': could not \
add network device gre0 to ofproto (Address family not supported \
by protocol).  See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch".
 # echo $?
0

This patch makes ovs-vsctl exit with error code 65
(EX_DATAERR) if errors were reported during the reload.

This change may break existing scripts, because ovs-vsctl will
start failing in cases where it previously succeeded. However,
if an error is printed, the change was most likely not
functional anyway.

Reported-at: https://bugzilla.redhat.com/1731553
Signed-off-by: Flavio Leitner 
---

v2:
Followed Aaron's suggestion to return EX_DATAERR.

 NEWS |  4 
 tests/ovs-vsctl.at   | 19 +--
 tests/ovs-vswitchd.at|  2 +-
 tests/tunnel.at  |  2 +-
 utilities/ovs-vsctl.8.in |  2 ++
 utilities/ovs-vsctl.c| 30 --
 6 files changed, 49 insertions(+), 10 deletions(-)

diff --git a/NEWS b/NEWS
index 66d5a4ea3..cb148a09f 100644
--- a/NEWS
+++ b/NEWS
@@ -28,6 +28,10 @@ Post-v3.1.0
  * Added new options --[ovsdb-server|ovs-vswitchd]-umask=MODE to set umask
value when starting OVS daemons.  E.g., use --ovsdb-server-umask=0002
in order to create OVSDB sockets with access mode of 0770.
+   - ovs-vsctl:
+ * Exit with error code 65 (EX_DATAERR) if errors were reported while
+   checking whether OVSDB transactions were successfully recorded and
+   reloaded by ovs-vswitchd.
- QoS:
  * Added new configuration option 'jitter' for a linux-netem QoS type.
- DPDK:
diff --git a/tests/ovs-vsctl.at b/tests/ovs-vsctl.at
index a368bff6e..b282798cc 100644
--- a/tests/ovs-vsctl.at
+++ b/tests/ovs-vsctl.at
@@ -1522,7 +1522,7 @@ cat >experr <experr <experr << EOF
+ovs-vsctl: Error detected while setting up 'gre0': gre0: bad ip6gre 'remote_ip'
+gre0: ip6gre type requires valid 'remote_ip' argument.  See ovs-vswitchd log for details.
+ovs-vsctl: The default log directory is "$OVS_RUNDIR".
+EOF
+AT_CHECK([ovs-vsctl add-port br0 gre0 -- set interface gre0 type=ip6gre options:key=100 options:remote_ip=192.168.0.300], [65], [], [experr])
+OVS_VSWITCHD_STOP(["/is not a valid IP address/d
+/netdev_vport|WARN|gre0: bad ip6gre 'remote_ip'/d
+/netdev|WARN|gre0: could not set configuration/d"])
+AT_CLEANUP
diff --git a/tests/ovs-vswitchd.at b/tests/ovs-vswitchd.at
index 977b2eba1..80a748355 100644
--- a/tests/ovs-vswitchd.at
+++ b/tests/ovs-vswitchd.at
@@ -222,7 +222,7 @@ cat >experr <
 #include 
 #include 
+#include 
 #include 
 
 #include "db-ctl-base.h"
@@ -56,6 +57,9 @@
 
 VLOG_DEFINE_THIS_MODULE(vsctl);
 
+/* Post OVSDB reload error reported. */
+#define EXIT_POSTDB_ERROR EX_DATAERR
+
 struct vsctl_context;
 
 /* --db: The database server to contact. */
@@ -115,7 +119,7 @@ static void parse_options(int argc, char *argv[], struct shash *local_options);
 static void run_prerequisites(struct ctl_command[], size_t n_commands,
   struct ovsdb_idl *);
 static bool do_vsctl(const char *args, struct ctl_command *, size_t n,
- struct ovsdb_idl *);
+ struct ovsdb_idl *, bool *);
 
 /* post_db_reload_check frame work is to allow ovs-vsctl to do additional
  * checks after OVSDB transactions are successfully recorded and reload by
@@ -134,11 +138,13 @@ static bool do_vsctl(const char *args, struct ctl_command *, size_t n,
  * Current implementation only check for Post OVSDB reload failures on new
  * interface additions with 'add-br' and 'add-port' commands.
  *
+ * post_db_reload_do_checks() returns 'true' if a failure is reported.
+ *
  * post_db_reload_expect_iface()
  *
  * keep track of interfaces to be checked post OVSDB reload. */
 static void post_db_reload_check_init(void);
-static void post_db_reload_do_checks(const struct vsctl_context *);
+static bool post_db_reload_do_checks(const struct vsctl_context *);
 static void post_db_reload_expect_iface(const struct ovsrec_interface *);
 
 static struct uuid *neoteric_ifaces;
@@ -200,9 +206,15 @@ main(int argc, char *argv[])
 }
 
 if (seqno != ovsdb_idl_get_seqno(idl)) {
+bool postdb_err;
+
 seqno = ovsdb_idl_get_seqno(idl);
-if (do_vsctl(args, commands, n_commands, idl)) {
+if (do_vsctl(args, commands, n_commands, idl, &postdb_err)) {
 free(args);
+

Re: [ovs-dev] [PATCH] ovs-vsctl: Exit with error if postdb checks report errors.

2023-06-27 Thread Flavio Leitner


On 6/26/23 16:48, Aaron Conole wrote:

Flavio Leitner  writes:


Today the exit code reflects only the execution of the change
in the database. However, when the --no-wait parameter is not
used (the default), ovs-vsctl also checks whether the OVSDB
transactions were successfully recorded and reloaded by
ovs-vswitchd. In this case, an error message is printed if
there is a problem during the reload, such as the one below:

  # ovs-vsctl add-port br0 gre0 -- \
 set interface gre0 type=ip6gre options:key=100 \
 options:remote_ip=2001::2
ovs-vsctl: Error detected while setting up 'gre0': could not \
add network device gre0 to ofproto (Address family not supported \
by protocol).  See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch".
  # echo $?
0

This patch makes ovs-vsctl exit with error code 3 if errors
were reported during the reload.

This change may break existing scripts, because ovs-vsctl will
start failing in cases where it previously succeeded. However,
if an error is printed, the change was most likely not
functional anyway.

Reported-at: https://bugzilla.redhat.com/1731553
Signed-off-by: Flavio Leitner 
---
  NEWS |  3 +++
  tests/ovs-vsctl.at   | 19 +--
  tests/ovs-vswitchd.at|  2 +-
  tests/tunnel.at  |  2 +-
  utilities/ovs-vsctl.8.in |  2 ++
  utilities/ovs-vsctl.c| 29 +++--
  6 files changed, 47 insertions(+), 10 deletions(-)

diff --git a/NEWS b/NEWS
index cfd43..f8f7b7655 100644
--- a/NEWS
+++ b/NEWS
@@ -28,6 +28,9 @@ Post-v3.1.0
   * Added new options --[ovsdb-server|ovs-vswitchd]-umask=MODE to set umask
 value when starting OVS daemons.  E.g., use --ovsdb-server-umask=0002
 in order to create OVSDB sockets with access mode of 0770.
+   - ovs-vsctl:
+ * Exit with error code 3 if errors were reported while checking whether
+   OVSDB transactions were successfully recorded and reloaded by ovs-vswitchd.
 - QoS:
   * Added new configuration option 'jitter' for a linux-netem QoS type.
 - DPDK:
diff --git a/tests/ovs-vsctl.at b/tests/ovs-vsctl.at
index a368bff6e..2554152df 100644
--- a/tests/ovs-vsctl.at
+++ b/tests/ovs-vsctl.at
@@ -1522,7 +1522,7 @@ cat >experr <experr <  
  OVS_VSCTL_CLEANUP

  AT_CLEANUP
+
+AT_SETUP([ovs-vsctl -- return error if OVSDB reload issues are reported])
+OVS_VSWITCHD_START
+dnl check if ovs-vsctl returns error 3 if ovs-vswitchd fails to reload.
+
+cat >experr << EOF
+ovs-vsctl: Error detected while setting up 'gre0': gre0: bad ip6gre 'remote_ip'
+gre0: ip6gre type requires valid 'remote_ip' argument.  See ovs-vswitchd log for details.
+ovs-vsctl: The default log directory is "$OVS_RUNDIR".
+EOF
+AT_CHECK([ovs-vsctl add-port br0 gre0 -- set interface gre0 type=ip6gre options:key=100 options:remote_ip=192.168.0.300], [3], [], [experr])
+OVS_VSWITCHD_STOP(["/is not a valid IP address/d
+/netdev_vport|WARN|gre0: bad ip6gre 'remote_ip'/d
+/netdev|WARN|gre0: could not set configuration/d"])
+AT_CLEANUP
diff --git a/tests/ovs-vswitchd.at b/tests/ovs-vswitchd.at
index 977b2eba1..81604111b 100644
--- a/tests/ovs-vswitchd.at
+++ b/tests/ovs-vswitchd.at
@@ -222,7 +222,7 @@ cat >experr <  
  OVS_VSWITCHD_STOP(['/ignoring bridge with invalid name/d'])

diff --git a/tests/tunnel.at b/tests/tunnel.at
index ddeb66bc9..cf66cc085 100644
--- a/tests/tunnel.at
+++ b/tests/tunnel.at
@@ -1009,7 +1009,7 @@ OVS_VSWITCHD_START([add-port br0 p1 -- set Interface p1 type=vxlan \
  options:remote_ip=flow ofport_request=1])
  
  AT_CHECK([ovs-vsctl add-port br0 p2 -- set Interface p2 type=vxlan \
-options:remote_ip=flow options:exts=gbp options:key=1 ofport_request=2], [0],
+options:remote_ip=flow options:exts=gbp options:key=1 ofport_request=2], [3],
[], [ignore])
  
  AT_CHECK([grep 'p2: could not set configuration (File exists)' ovs-vswitchd.log | sed "s/^.*\(p2:.*\)$/\1/"], [0],

diff --git a/utilities/ovs-vsctl.8.in b/utilities/ovs-vsctl.8.in
index 9e319aa1c..929285e66 100644
--- a/utilities/ovs-vsctl.8.in
+++ b/utilities/ovs-vsctl.8.in
@@ -892,6 +892,8 @@ Usage, syntax, or configuration file error.
  .IP "2"
  The \fIbridge\fR argument to \fBbr\-exists\fR specified the name of a
  bridge that does not exist.
+.IP "3"
+An error was reported after the OVSDB reload.
  .SH "SEE ALSO"
  .
  .BR ovsdb\-server (1),
diff --git a/utilities/ovs-vsctl.c b/utilities/ovs-vsctl.c
index 2f5ac1a26..8daa1d409 100644
--- a/utilities/ovs-vsctl.c
+++ b/utilities/ovs-vsctl.c
@@ -56,6 +56,9 @@
  
  VLOG_DEFINE_THIS_MODULE(vsctl);
  
+/* Post OVSDB reload error reported. */

+#define EXIT_POSTDB_ERROR 3
+

Maybe we can use a definition from sysexits.h, like:

   #define EX_SOFTWARE  70  /* internal software error */
  or
   #define EX_PROTOCOL  76  /* remote error in protocol */

WDY

[ovs-dev] [PATCH] ovs-vsctl: Exit with error if postdb checks report errors.

2023-06-17 Thread Flavio Leitner
Today the exit code reflects only the execution of the change
in the database. However, when the --no-wait parameter is not
used (the default), ovs-vsctl also checks whether the OVSDB
transactions were successfully recorded and reloaded by
ovs-vswitchd. In this case, an error message is printed if
there is a problem during the reload, such as the one below:

 # ovs-vsctl add-port br0 gre0 -- \
set interface gre0 type=ip6gre options:key=100 \
options:remote_ip=2001::2
ovs-vsctl: Error detected while setting up 'gre0': could not \
add network device gre0 to ofproto (Address family not supported \
by protocol).  See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch".
 # echo $?
0

This patch makes ovs-vsctl exit with error code 3 if errors
were reported during the reload.

This change may break existing scripts, because ovs-vsctl will
start failing in cases where it previously succeeded. However,
if an error is printed, the change was most likely not
functional anyway.

Reported-at: https://bugzilla.redhat.com/1731553
Signed-off-by: Flavio Leitner 
---
 NEWS |  3 +++
 tests/ovs-vsctl.at   | 19 +--
 tests/ovs-vswitchd.at|  2 +-
 tests/tunnel.at  |  2 +-
 utilities/ovs-vsctl.8.in |  2 ++
 utilities/ovs-vsctl.c| 29 +++--
 6 files changed, 47 insertions(+), 10 deletions(-)

diff --git a/NEWS b/NEWS
index cfd43..f8f7b7655 100644
--- a/NEWS
+++ b/NEWS
@@ -28,6 +28,9 @@ Post-v3.1.0
  * Added new options --[ovsdb-server|ovs-vswitchd]-umask=MODE to set umask
value when starting OVS daemons.  E.g., use --ovsdb-server-umask=0002
in order to create OVSDB sockets with access mode of 0770.
+   - ovs-vsctl:
+ * Exit with error code 3 if errors were reported while checking whether
+   OVSDB transactions were successfully recorded and reloaded by ovs-vswitchd.
- QoS:
  * Added new configuration option 'jitter' for a linux-netem QoS type.
- DPDK:
diff --git a/tests/ovs-vsctl.at b/tests/ovs-vsctl.at
index a368bff6e..2554152df 100644
--- a/tests/ovs-vsctl.at
+++ b/tests/ovs-vsctl.at
@@ -1522,7 +1522,7 @@ cat >experr <experr <experr << EOF
+ovs-vsctl: Error detected while setting up 'gre0': gre0: bad ip6gre 'remote_ip'
+gre0: ip6gre type requires valid 'remote_ip' argument.  See ovs-vswitchd log for details.
+ovs-vsctl: The default log directory is "$OVS_RUNDIR".
+EOF
+AT_CHECK([ovs-vsctl add-port br0 gre0 -- set interface gre0 type=ip6gre options:key=100 options:remote_ip=192.168.0.300], [3], [], [experr])
+OVS_VSWITCHD_STOP(["/is not a valid IP address/d
+/netdev_vport|WARN|gre0: bad ip6gre 'remote_ip'/d
+/netdev|WARN|gre0: could not set configuration/d"])
+AT_CLEANUP
diff --git a/tests/ovs-vswitchd.at b/tests/ovs-vswitchd.at
index 977b2eba1..81604111b 100644
--- a/tests/ovs-vswitchd.at
+++ b/tests/ovs-vswitchd.at
@@ -222,7 +222,7 @@ cat >experr <header_.uuid;
 }
 
-static void
+static bool
 post_db_reload_do_checks(const struct vsctl_context *vsctl_ctx)
 {
 bool print_error = false;
@@ -2707,6 +2718,8 @@ post_db_reload_do_checks(const struct vsctl_context *vsctl_ctx)
 if (print_error) {
 ovs_error(0, "The default log directory is \"%s\".", ovs_logdir());
 }
+
+return print_error;
 }
 
 
@@ -2815,7 +2828,7 @@ vsctl_parent_process_info(void)
 
 static bool
 do_vsctl(const char *args, struct ctl_command *commands, size_t n_commands,
- struct ovsdb_idl *idl)
+ struct ovsdb_idl *idl, bool *postdb_err)
 {
 struct ovsdb_idl_txn *txn;
 const struct ovsrec_open_vswitch *ovs;
@@ -2827,6 +2840,8 @@ do_vsctl(const char *args, struct ctl_command *commands, size_t n_commands,
 int64_t next_cfg = 0;
 char *ppid_info = NULL;
 
+ovs_assert(postdb_err);
+*postdb_err = false;
 txn = the_idl_txn = ovsdb_idl_txn_create(idl);
 if (dry_run) {
 ovsdb_idl_txn_set_dry_run(txn);
@@ -2989,7 +3004,9 @@ do_vsctl(const char *args, struct ctl_command *commands, size_t n_commands,
 ovsdb_idl_run(idl);
 OVSREC_OPEN_VSWITCH_FOR_EACH (ovs, idl) {
 if (ovs->cur_cfg >= next_cfg) {
-post_db_reload_do_checks(&vsctl_ctx);
+if (post_db_reload_do_checks(&vsctl_ctx)) {
+*postdb_err = true;
+}
 goto done;
 }
 }
-- 
2.40.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
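The `do_vsctl()` change in this patch reports the post-reload result back to `main()` through a required `bool *` out-parameter that is cleared on entry and set only when the checks fail. A minimal sketch of that pattern follows; `run_txn()` and `checks_failed()` are hypothetical stand-ins, and plain `assert()` replaces `ovs_assert()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for post_db_reload_do_checks(): each entry is
 * nonzero if that interface reported an error after the reload.  Every
 * entry is visited, mirroring the 'print_error' flag in the patch, so all
 * errors would get reported rather than only the first. */
static bool
checks_failed(const int *iface_errors, size_t n)
{
    bool failed = false;

    for (size_t i = 0; i < n; i++) {
        if (iface_errors[i]) {
            failed = true;
        }
    }
    return failed;
}

/* Hypothetical stand-in for the patched do_vsctl(): returns true when the
 * command finished (false means the caller should try again), and reports
 * post-reload failures separately through 'postdb_err'. */
static bool
run_txn(bool committed, const int *iface_errors, size_t n, bool *postdb_err)
{
    assert(postdb_err);
    *postdb_err = false;        /* Defined on every return path. */

    if (!committed) {
        return false;
    }
    if (checks_failed(iface_errors, n)) {
        *postdb_err = true;
    }
    return true;
}
```

Carrying the error in an out-parameter leaves the meaning of the existing return value (finished versus retry) unchanged, so the retry loop in `main()` only needs to look at the new flag after a successful call.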


Re: [ovs-dev] [PATCH] flow: fix sanity check for unexpected ip header length field

2023-03-27 Thread Flavio Leitner
On Mon, Mar 27, 2023 at 03:34:52PM +0200, Simon Horman wrote:
> On Wed, Mar 15, 2023 at 05:11:01PM +0800, Faicker Mo wrote:
> > Variants of CVE-2020-35498:
> > 1. invalid IPv4 header total-length field
> > 2. invalid IPv6 header payload-length field
> > These may cause unwanted flows to be sent to the datapath.
> > 
> > 
> > Signed-off-by: Faicker Mo 
> 
> I think the immediate question here is how to correctly handle invalid
> packets which claim to have a total length that is greater than the length
> received (size) by OVS.

OVS strives to forward all packets but at the same time it 
can't rely on data from non-conforming packets.

You can see this with checksums: when OVS modifies a packet,
it updates the checksum in a way that preserves its original
state (good or bad).
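Preserving the good/bad state on modification is what an incremental checksum update provides (RFC 1624, eqn. 3): the existing checksum is adjusted by the difference between the old and new field values instead of being recomputed from scratch, so a checksum that was already wrong stays wrong by the same amount. A minimal standalone sketch follows; the function names are hypothetical and it works on host-order 16-bit words for brevity. OVS has its own helpers for this in lib/csum.c (for example recalc_csum16()), which this does not reproduce.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum of 16-bit words (RFC 1071), folded to 16 bits.
 * A 32-bit accumulator is safe for any realistic header length. */
static uint16_t
ones_sum16(const uint16_t *words, size_t n_words)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < n_words; i++) {
        sum += words[i];
    }
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) sum;
}

/* Full recomputation: the checksum slot must hold 0 when this is called. */
static uint16_t
csum_full(const uint16_t *words, size_t n_words)
{
    return (uint16_t) ~ones_sum16(words, n_words);
}

/* Incremental update for one word changing from 'old_w' to 'new_w':
 * HC' = ~(~HC + ~m + m').  The old checksum is adjusted, not replaced,
 * so any pre-existing error in it is carried over unchanged. */
static uint16_t
csum_update16(uint16_t old_csum, uint16_t old_w, uint16_t new_w)
{
    uint32_t sum = (uint16_t) ~old_csum;

    sum += (uint16_t) ~old_w;
    sum += new_w;
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) ~sum;
}
```

With a correct checksum, all words including the checksum sum to 0xffff before and after csum_update16(); with a corrupted one, the sum is off by the same amount before and after.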

> It doesn't seem to me that truncating the total length to size,
> and thus pretending the packets are valid in this regard,
> is the right approach.

+1

> Is there a bigger problem in that any packets that fail the sanity check
> are problematic with regards to megaflow creation? If so, I think we should
> aim for a more comprehensive solution.

+1

fbl


> 
> > ---
> >  lib/flow.c  | 11 +--
> >  tests/classifier.at | 42 ++
> >  tests/ofp-print.at  |  2 +-
> >  3 files changed, 48 insertions(+), 7 deletions(-)
> > 
> > 
> > diff --git a/lib/flow.c b/lib/flow.c
> > index c3a3aa3ce..d96d02213 100644
> > --- a/lib/flow.c
> > +++ b/lib/flow.c
> > @@ -662,9 +662,8 @@ ipv4_sanity_check(const struct ip_header *nh, size_t size,
> >  return false;
> >  }
> >  
> > -tot_len = ntohs(nh->ip_tot_len);
> > -if (OVS_UNLIKELY(tot_len > size || ip_len > tot_len ||
> > -size - tot_len > UINT16_MAX)) {
> > +tot_len = MIN(size, ntohs(nh->ip_tot_len));
> > +if (OVS_UNLIKELY(ip_len > tot_len || size - tot_len > UINT16_MAX)) {
> >  COVERAGE_INC(miniflow_extract_ipv4_pkt_len_error);
> >  return false;
> >  }
> > @@ -700,7 +699,7 @@ ipv6_sanity_check(const struct ovs_16aligned_ip6_hdr *nh, size_t size)
> >  return false;
> >  }
> >  
> > -plen = ntohs(nh->ip6_plen);
> > +plen = MIN(size - sizeof *nh, ntohs(nh->ip6_plen));
> >  if (OVS_UNLIKELY(plen + IPV6_HEADER_LEN > size)) {
> >  COVERAGE_INC(miniflow_extract_ipv6_pkt_len_error);
> >  return false;
> > @@ -920,7 +919,7 @@ miniflow_extract(struct dp_packet *packet, struct miniflow *dst)
> >  }
> >  data_pull(&data, &size, sizeof *nh);
> >  
> > -plen = ntohs(nh->ip6_plen);
> > +plen = MIN(size, ntohs(nh->ip6_plen));
> >  dp_packet_set_l2_pad_size(packet, size - plen);
> >  size = plen;   /* Never pull padding. */
> >  
> > @@ -1197,7 +1196,7 @@ parse_tcp_flags(struct dp_packet *packet,
> >  }
> >  data_pull(&data, &size, sizeof *nh);
> >  
> > -plen = ntohs(nh->ip6_plen); /* Never pull padding. */
> > +plen = MIN(size, ntohs(nh->ip6_plen)); /* Never pull padding. */
> >  dp_packet_set_l2_pad_size(packet, size - plen);
> >  size = plen;
> >  const struct ovs_16aligned_ip6_frag *frag_hdr;
> > diff --git a/tests/classifier.at b/tests/classifier.at
> > index de2705653..1a1615bb5 100644
> > --- a/tests/classifier.at
> > +++ b/tests/classifier.at
> > @@ -418,6 +418,7 @@ ovs-ofctl: "conjunction" actions may be used along with "note" but not any other
> >  OVS_VSWITCHD_STOP
> >  AT_CLEANUP
> >  
> > +AT_BANNER([flow classifier abnormal packet])
> >  # Flow classifier a packet with excess of padding.
> >  AT_SETUP([flow classifier - packet with extra padding])
> >  OVS_VSWITCHD_START
> > @@ -453,3 +454,44 @@ Datapath actions: 2
> >  ])
> >  OVS_VSWITCHD_STOP
> >  AT_CLEANUP
> > +
> > +dnl Flow classifier a packet with invalid total-length field of ipv4 header
> > +AT_SETUP([flow classifier - packet with invalid total-length field of ipv4 header])
> > +OVS_VSWITCHD_START
> > +add_of_ports br0 1 2
> > +AT_DATA([flows.txt], [dnl
> > +priority=5,ip,ip_dst=1.1.1.1,actions=1
> > +priority=5,ip,ip_dst=1.1.1.2,actions=2
> > +priority=0,actions=drop
> > +])
> > +AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
> > +packet=0800453c0001401176ac0101010101010102003500350008fb4f
> > +AT_CHECK([ovs-appctl ofproto/trace br0 in_port=1 $packet] , [0], [stdout])
> > +dnl the problem flow,
> > +dnl Megaflow: recirc_id=0,eth,ip,in_port=1,nw_dst=0.0.0.0/8,nw_frag=no
> > +dnl Datapath actions: drop
> > +AT_CHECK([tail -2 stdout], [0],
> > +  [Megaflow: recirc_id=0,eth,ip,in_port=1,nw_dst=1.1.1.2,nw_frag=no
> > +Datapath actions: 2
> > +])
> > +OVS_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +dnl Flow classifier a packet with invalid payload-length field of ipv6 header
> > +AT_SETUP([flow classifier - packet with invalid payload-length field of ipv6 header])
> > +OVS_VSWITCHD_START
> > +add_of_ports br0 1 2
> > 
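The approach favored in the replies is to reject, rather than clamp, a header whose length field claims more bytes than were received. A minimal sketch of that check for both address families, assuming the caller passes the number of bytes available from the start of the IP header; the helper names are hypothetical and this is not OVS's ipv4_sanity_check()/ipv6_sanity_check(). Once the claimed length is known to fit, the remainder is trailing L2 padding, the quantity the quoted code passes to dp_packet_set_l2_pad_size().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IPV4_MIN_HDR_LEN 20   /* IHL of 5 32-bit words. */
#define IPV6_HDR_LEN 40       /* Fixed header, not counted in Payload Length. */

/* 'size': bytes received from the start of the IPv4 header.
 * 'ip_len': header length derived from IHL, in bytes.
 * 'tot_len': Total Length field in host order.
 * A header claiming more bytes than were received is rejected instead of
 * being clamped to 'size'.  On success '*pad' gets the trailing bytes. */
static bool
ipv4_lengths_ok(size_t size, size_t ip_len, uint16_t tot_len, size_t *pad)
{
    if (ip_len < IPV4_MIN_HDR_LEN || ip_len > size) {
        return false;                   /* Truncated or bogus header. */
    }
    if (tot_len > size || ip_len > tot_len) {
        return false;                   /* Length field is inconsistent. */
    }
    *pad = size - tot_len;              /* No underflow: tot_len <= size. */
    return true;
}

/* Same idea for IPv6, where 'plen' (Payload Length) excludes the fixed
 * 40-byte header. */
static bool
ipv6_lengths_ok(size_t size, uint16_t plen, size_t *pad)
{
    if (size < IPV6_HDR_LEN || (size_t) plen > size - IPV6_HDR_LEN) {
        return false;
    }
    *pad = size - IPV6_HDR_LEN - plen;
    return true;
}
```

IPv6 jumbograms (Payload Length 0 plus a Jumbo Payload option) are not handled by this sketch.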

Re: [ovs-dev] [PATCH v4] dpdk: Allow retaining CAP_SYS_RAWIO privileges

2023-03-16 Thread Flavio Leitner
On Thu, Mar 16, 2023 at 08:00:39AM -0400, Aaron Conole wrote:
> Open vSwitch generally tries to let the underlying operating system
> manage the low-level details of hardware, for example DMA mapping,
> bus arbitration, etc.  However, when using DPDK, the underlying
> operating system yields control of many of these details to userspace
> for management.
> 
> In the case of some DPDK port drivers, configuring rte_flow or even
> allocating resources may require access to iopl/ioperm calls, which
> are guarded by the CAP_SYS_RAWIO privilege on Linux systems.  These
> calls are dangerous, and can allow a process to completely compromise
> a system.  However, they are needed in the case of some userspace
> driver code which manages the hardware (for example, the mlx
> implementation of backend support for rte_flow).
> 
> Here, we create an opt-in flag passed to the command line to allow
> this access.  We need to do this before ever accessing the database,
> because we want to drop all privileges asap, and cannot wait for
> a connection to the database to be established and functional before
> dropping.  There may be distribution specific ways to do capability
> management as well (using for example, systemd), but they are not
> as universal to the vswitchd as a flag.
> 
> Reviewed-by: Simon Horman 
> Signed-off-by: Aaron Conole 
> ---

Works for me and the patch looks good now.
Thanks Aaron!

Acked-by: Flavio Leitner 
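A minimal sketch, assuming Linux and glibc, of an opt-in long option that decides whether CAP_SYS_RAWIO joins the set of capabilities kept after dropping root. Only the option name --hw-rawio-access and the capability names come from the patch; parse_hw_rawio_flag() and build_cap_list() are hypothetical. The real patch applies the set through libcap-ng's capng_update() under #ifdef DPDK_NETDEV, which this sketch does not reproduce, and here CAP_SYS_RAWIO is added independently of the datapath capabilities.

```c
#include <getopt.h>
#include <linux/capability.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical option id for the flag named in the patch's NEWS entry. */
enum { OPT_HW_RAWIO_ACCESS = 256 };

/* Returns true if --hw-rawio-access appears on the command line. */
static bool
parse_hw_rawio_flag(int argc, char *argv[])
{
    static const struct option long_options[] = {
        {"hw-rawio-access", no_argument, NULL, OPT_HW_RAWIO_ACCESS},
        {NULL, 0, NULL, 0},
    };
    bool hw_rawio = false;
    int c;

    optind = 1;                 /* Restart the scan on every call. */
    while ((c = getopt_long(argc, argv, "", long_options, NULL)) != -1) {
        if (c == OPT_HW_RAWIO_ACCESS) {
            hw_rawio = true;
        }
    }
    return hw_rawio;
}

/* Fills 'caps' with the capabilities to keep after switching user and
 * returns how many were written (at most 'max').  CAP_SYS_RAWIO is added
 * only on explicit request, so the default stays least-privileged. */
static size_t
build_cap_list(bool access_datapath, bool access_hardware_ports,
               int *caps, size_t max)
{
    size_t n = 0;

    if (access_datapath && max - n >= 3) {
        caps[n++] = CAP_NET_ADMIN;
        caps[n++] = CAP_NET_RAW;
        caps[n++] = CAP_NET_BROADCAST;
    }
    if (access_hardware_ports && n < max) {
        caps[n++] = CAP_SYS_RAWIO;
    }
    return n;
}
```

Keeping the decision in a pure function makes it easy to check that the default (flag absent) never grants CAP_SYS_RAWIO.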



Re: [ovs-dev] [PATCH v3] dpdk: Allow retaining CAP_SYS_RAWIO privileges

2023-03-14 Thread Flavio Leitner
On Tue, Mar 14, 2023 at 10:46:02AM -0400, Aaron Conole wrote:
> Flavio Leitner  writes:
> 
> > On Wed, Mar 08, 2023 at 05:37:11PM -0500, Aaron Conole wrote:
> >> Open vSwitch generally tries to let the underlying operating system
> >> manage the low-level details of hardware, for example DMA mapping,
> >> bus arbitration, etc.  However, when using DPDK, the underlying
> >> operating system yields control of many of these details to userspace
> >> for management.
> >> 
> >> In the case of some DPDK port drivers, configuring rte_flow or even
> >> allocating resources may require access to iopl/ioperm calls, which
> >> are guarded by the CAP_SYS_RAWIO privilege on Linux systems.  These
> >> calls are dangerous, and can allow a process to completely compromise
> >> a system.  However, they are needed in the case of some userspace
> >> driver code which manages the hardware (for example, the mlx
> >> implementation of backend support for rte_flow).
> >> 
> >> Here, we create an opt-in flag passed to the command line to allow
> >> this access.  We need to do this before ever accessing the database,
> >> because we want to drop all privileges asap, and cannot wait for
> >> a connection to the database to be established and functional before
> >> dropping.  There may be distribution specific ways to do capability
> >> management as well (using for example, systemd), but they are not
> >> as universal to the vswitchd as a flag.
> >> 
> >> Reviewed-by: Simon Horman 
> >> Signed-off-by: Aaron Conole 
> >> ---
> >> v1->v2: update daemon-windows for daemon_become_new_user
> >> 
> >> v2->v3: update daemon-windows for daemon_start
> >> change log messages to be clearer
> >> update the manpage to provide example of why
> >>   one would want the flag
> >> 
> >>  NEWS   |  4 
> >>  lib/daemon-unix.c  | 29 +
> >>  lib/daemon-windows.c   |  6 --
> >>  lib/daemon.c   |  2 +-
> >>  lib/daemon.h   |  4 ++--
> >>  ovsdb/ovsdb-client.c   |  6 +++---
> >>  ovsdb/ovsdb-server.c   |  4 ++--
> >>  tests/test-netflow.c   |  2 +-
> >>  tests/test-sflow.c |  2 +-
> >>  tests/test-unixctl.c   |  2 +-
> >>  utilities/ovs-ofctl.c  |  4 ++--
> >>  utilities/ovs-testcontroller.c |  4 ++--
> >>  vswitchd/ovs-vswitchd.8.in |  9 +
> >>  vswitchd/ovs-vswitchd.c| 11 ++-
> >>  14 files changed, 63 insertions(+), 26 deletions(-)
> >> 
> >> diff --git a/NEWS b/NEWS
> >> index 85b3496214..65f35dcdd5 100644
> >> --- a/NEWS
> >> +++ b/NEWS
> >> @@ -10,6 +10,10 @@ Post-v3.1.0
> >> in order to create OVSDB sockets with access mode of 0770.
> >> - QoS:
> >>   * Added new configuration option 'jitter' for a linux-netem QoS type.
> >> +   - DPDK:
> >> + * ovs-vswitchd will keep the CAP_SYS_RAWIO capability when started
> >> +   with the --hw-rawio-access command line option.  This allows the
> >> +   process extra privileges when mapping physical interconnect memory.
> >>  
> >>  
> >>  v3.1.0 - 16 Feb 2023
> >> diff --git a/lib/daemon-unix.c b/lib/daemon-unix.c
> >> index 1a7ba427d7..dd839015ab 100644
> >> --- a/lib/daemon-unix.c
> >> +++ b/lib/daemon-unix.c
> >> @@ -88,7 +88,8 @@ static bool switch_user = false;
> >>  static uid_t uid;
> >>  static gid_t gid;
> >>  static char *user = NULL;
> >> -static void daemon_become_new_user__(bool access_datapath);
> >> +static void daemon_become_new_user__(bool access_datapath,
> >> + bool access_hardware_ports);
> >>  
> >>  static void check_already_running(void);
> >>  static int lock_pidfile(FILE *, int command);
> >> @@ -443,13 +444,13 @@ monitor_daemon(pid_t daemon_pid)
> >>   * daemonize_complete()) or that it failed to start up (by exiting with a
> >>   * nonzero exit code). */
> >>  void
> >> -daemonize_start(bool access_datapath)
> >> +daemonize_start(bool access_datapath, bool access_hardware_ports)
> >>  {
> >>  assert_single_threaded();
> >>  daemonize_fd = -1;
> >>  
> >>  if (switch_user) {
> >

Re: [ovs-dev] [PATCH v3] dpdk: Allow retaining CAP_SYS_RAWIO privileges

2023-03-14 Thread Flavio Leitner
On Wed, Mar 08, 2023 at 05:37:11PM -0500, Aaron Conole wrote:
> Open vSwitch generally tries to let the underlying operating system
> manage the low-level details of hardware, for example DMA mapping,
> bus arbitration, etc.  However, when using DPDK, the underlying
> operating system yields control of many of these details to userspace
> for management.
> 
> In the case of some DPDK port drivers, configuring rte_flow or even
> allocating resources may require access to iopl/ioperm calls, which
> are guarded by the CAP_SYS_RAWIO privilege on Linux systems.  These
> calls are dangerous, and can allow a process to completely compromise
> a system.  However, they are needed in the case of some userspace
> driver code which manages the hardware (for example, the mlx
> implementation of backend support for rte_flow).
> 
> Here, we create an opt-in flag passed to the command line to allow
> this access.  We need to do this before ever accessing the database,
> because we want to drop all privileges asap, and cannot wait for
> a connection to the database to be established and functional before
> dropping.  There may be distribution specific ways to do capability
> management as well (using for example, systemd), but they are not
> as universal to the vswitchd as a flag.
> 
> Reviewed-by: Simon Horman 
> Signed-off-by: Aaron Conole 
> ---
> v1->v2: update daemon-windows for daemon_become_new_user
> 
> v2->v3: update daemon-windows for daemon_start
> change log messages to be clearer
> update the manpage to provide example of why
>   one would want the flag
> 
>  NEWS   |  4 
>  lib/daemon-unix.c  | 29 +
>  lib/daemon-windows.c   |  6 --
>  lib/daemon.c   |  2 +-
>  lib/daemon.h   |  4 ++--
>  ovsdb/ovsdb-client.c   |  6 +++---
>  ovsdb/ovsdb-server.c   |  4 ++--
>  tests/test-netflow.c   |  2 +-
>  tests/test-sflow.c |  2 +-
>  tests/test-unixctl.c   |  2 +-
>  utilities/ovs-ofctl.c  |  4 ++--
>  utilities/ovs-testcontroller.c |  4 ++--
>  vswitchd/ovs-vswitchd.8.in |  9 +
>  vswitchd/ovs-vswitchd.c| 11 ++-
>  14 files changed, 63 insertions(+), 26 deletions(-)
> 
> diff --git a/NEWS b/NEWS
> index 85b3496214..65f35dcdd5 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -10,6 +10,10 @@ Post-v3.1.0
> in order to create OVSDB sockets with access mode of 0770.
> - QoS:
>   * Added new configuration option 'jitter' for a linux-netem QoS type.
> +   - DPDK:
> + * ovs-vswitchd will keep the CAP_SYS_RAWIO capability when started
> +   with the --hw-rawio-access command line option.  This allows the
> +   process extra privileges when mapping physical interconnect memory.
>  
>  
>  v3.1.0 - 16 Feb 2023
> diff --git a/lib/daemon-unix.c b/lib/daemon-unix.c
> index 1a7ba427d7..dd839015ab 100644
> --- a/lib/daemon-unix.c
> +++ b/lib/daemon-unix.c
> @@ -88,7 +88,8 @@ static bool switch_user = false;
>  static uid_t uid;
>  static gid_t gid;
>  static char *user = NULL;
> -static void daemon_become_new_user__(bool access_datapath);
> +static void daemon_become_new_user__(bool access_datapath,
> + bool access_hardware_ports);
>  
>  static void check_already_running(void);
>  static int lock_pidfile(FILE *, int command);
> @@ -443,13 +444,13 @@ monitor_daemon(pid_t daemon_pid)
>   * daemonize_complete()) or that it failed to start up (by exiting with a
>   * nonzero exit code). */
>  void
> -daemonize_start(bool access_datapath)
> +daemonize_start(bool access_datapath, bool access_hardware_ports)
>  {
>  assert_single_threaded();
>  daemonize_fd = -1;
>  
>  if (switch_user) {
> -daemon_become_new_user__(access_datapath);
> +daemon_become_new_user__(access_datapath, access_hardware_ports);
>  switch_user = false;
>  }
>  
> @@ -807,7 +808,8 @@ daemon_become_new_user_unix(void)
>  /* Linux specific implementation of daemon_become_new_user()
>   * using libcap-ng.   */
>  static void
> -daemon_become_new_user_linux(bool access_datapath OVS_UNUSED)
> +daemon_become_new_user_linux(bool access_datapath OVS_UNUSED,
> + bool access_hardware_ports OVS_UNUSED)
>  {
>  #if defined __linux__ &&  HAVE_LIBCAPNG
>  int ret;
> @@ -827,6 +829,16 @@ daemon_become_new_user_linux(bool access_datapath OVS_UNUSED)
>  ret = capng_update(CAPNG_ADD, cap_sets, CAP_NET_ADMIN)
>|| capng_update(CAPNG_ADD, cap_sets, CAP_NET_RAW)
>|| capng_update(CAPNG_ADD, cap_sets, CAP_NET_BROADCAST);
> +#ifdef DPDK_NETDEV
> +if (access_hardware_ports && !ret) {
> +ret = capng_update(CAPNG_ADD, cap_sets, CAP_SYS_RAWIO);
> +VLOG_INFO("The Linux capability CAP_SYS_RAWIO enabled.");

Shouldn't it be ... 

Re: [ovs-dev] [PATCH v2] dpdk: Allow retaining CAP_SYS_RAWIO privileges

2023-03-07 Thread Flavio Leitner
On Tue, Mar 07, 2023 at 12:24:37PM -0300, Flavio Leitner wrote:
> On Tue, Mar 07, 2023 at 02:32:04PM +0100, David Marchand wrote:
> > On Tue, Mar 7, 2023 at 2:06 PM Aaron Conole  wrote:
> > >
> > > Open vSwitch generally tries to let the underlying operating system
> > > manage the low-level details of hardware, for example DMA mapping,
> > > bus arbitration, etc.  However, when using DPDK, the underlying
> > > operating system yields control of many of these details to userspace
> > > for management.
> > >
> > > In the case of some DPDK port drivers, configuring rte_flow or even
> > > allocating resources may require access to iopl/ioperm calls, which
> > > are guarded by the CAP_SYS_RAWIO privilege on Linux systems.  These
> > > calls are dangerous, and can allow a process to completely compromise
> > > a system.  However, they are needed in the case of some userspace
> > > driver code which manages the hardware (for example, the mlx
> > > implementation of backend support for rte_flow).
> > >
> > > Here, we create an opt-in flag passed to the command line to allow
> > > this access.  We need to do this before ever accessing the database,
> > > because we want to drop all privileges asap, and cannot wait for
> > > a connection to the database to be established and functional before
> > > dropping.  There may be distribution specific ways to do capability
> > > management as well (using for example, systemd), but they are not
> > > as universal to the vswitchd as a flag.
> > >
> > > Reviewed-by: Simon Horman 
> > > Signed-off-by: Aaron Conole 
> > > ---
> > >  NEWS   |  4 
> > >  lib/daemon-unix.c  | 29 +
> > >  lib/daemon-windows.c   |  3 ++-
> > >  lib/daemon.c   |  2 +-
> > >  lib/daemon.h   |  4 ++--
> > >  ovsdb/ovsdb-client.c   |  6 +++---
> > >  ovsdb/ovsdb-server.c   |  4 ++--
> > >  tests/test-netflow.c   |  2 +-
> > >  tests/test-sflow.c |  2 +-
> > >  tests/test-unixctl.c   |  2 +-
> > >  utilities/ovs-ofctl.c  |  4 ++--
> > >  utilities/ovs-testcontroller.c |  4 ++--
> > >  vswitchd/ovs-vswitchd.8.in |  8 
> > >  vswitchd/ovs-vswitchd.c| 11 ++-
> > >  14 files changed, 60 insertions(+), 25 deletions(-)
> > >
> > > diff --git a/NEWS b/NEWS
> > > index 85b3496214..65f35dcdd5 100644
> > > --- a/NEWS
> > > +++ b/NEWS
> > > @@ -10,6 +10,10 @@ Post-v3.1.0
> > > in order to create OVSDB sockets with access mode of 0770.
> > > - QoS:
> > >   * Added new configuration option 'jitter' for a linux-netem QoS type.
> > > +   - DPDK:
> > > + * ovs-vswitchd will keep the CAP_SYS_RAWIO capability when started
> > > +   with the --hw-rawio-access command line option.  This allows the
> > > +   process extra privileges when mapping physical interconnect memory.
> > >
> > >
> > >  v3.1.0 - 16 Feb 2023
> > > diff --git a/lib/daemon-unix.c b/lib/daemon-unix.c
> > > index 1a7ba427d7..a080facddc 100644
> > > --- a/lib/daemon-unix.c
> > > +++ b/lib/daemon-unix.c
> > > @@ -88,7 +88,8 @@ static bool switch_user = false;
> > >  static uid_t uid;
> > >  static gid_t gid;
> > >  static char *user = NULL;
> > > -static void daemon_become_new_user__(bool access_datapath);
> > > +static void daemon_become_new_user__(bool access_datapath,
> > > + bool access_hardware_ports);
> > >
> > >  static void check_already_running(void);
> > >  static int lock_pidfile(FILE *, int command);
> > > @@ -443,13 +444,13 @@ monitor_daemon(pid_t daemon_pid)
> > >   * daemonize_complete()) or that it failed to start up (by exiting with a
> > >   * nonzero exit code). */
> > >  void
> > > -daemonize_start(bool access_datapath)
> > > +daemonize_start(bool access_datapath, bool access_hardware_ports)
> > >  {
> > >  assert_single_threaded();
> > >  daemonize_fd = -1;
> > >
> > >  if (switch_user) {
> > > -daemon_become_new_user__(access_datapath);
> > > +daemon_become_new_user__(access_datapath, access_hardware_ports);
> > >  switch_user = false;

Re: [ovs-dev] [PATCH v2] dpdk: Allow retaining CAP_SYS_RAWIO privileges

2023-03-07 Thread Flavio Leitner
On Tue, Mar 07, 2023 at 02:32:04PM +0100, David Marchand wrote:
> On Tue, Mar 7, 2023 at 2:06 PM Aaron Conole  wrote:
> >
> > Open vSwitch generally tries to let the underlying operating system
> > manage the low-level details of hardware, for example DMA mapping,
> > bus arbitration, etc.  However, when using DPDK, the underlying
> > operating system yields control of many of these details to userspace
> > for management.
> >
> > In the case of some DPDK port drivers, configuring rte_flow or even
> > allocating resources may require access to iopl/ioperm calls, which
> > are guarded by the CAP_SYS_RAWIO privilege on Linux systems.  These
> > calls are dangerous, and can allow a process to completely compromise
> > a system.  However, they are needed in the case of some userspace
> > driver code which manages the hardware (for example, the mlx
> > implementation of backend support for rte_flow).
> >
> > Here, we create an opt-in flag passed to the command line to allow
> > this access.  We need to do this before ever accessing the database,
> > because we want to drop all privileges asap, and cannot wait for
> > a connection to the database to be established and functional before
> > dropping.  There may be distribution specific ways to do capability
> > management as well (using for example, systemd), but they are not
> > as universal to the vswitchd as a flag.
> >
> > Reviewed-by: Simon Horman 
> > Signed-off-by: Aaron Conole 
> > ---
> >  NEWS   |  4 
> >  lib/daemon-unix.c  | 29 +
> >  lib/daemon-windows.c   |  3 ++-
> >  lib/daemon.c   |  2 +-
> >  lib/daemon.h   |  4 ++--
> >  ovsdb/ovsdb-client.c   |  6 +++---
> >  ovsdb/ovsdb-server.c   |  4 ++--
> >  tests/test-netflow.c   |  2 +-
> >  tests/test-sflow.c |  2 +-
> >  tests/test-unixctl.c   |  2 +-
> >  utilities/ovs-ofctl.c  |  4 ++--
> >  utilities/ovs-testcontroller.c |  4 ++--
> >  vswitchd/ovs-vswitchd.8.in |  8 
> >  vswitchd/ovs-vswitchd.c| 11 ++-
> >  14 files changed, 60 insertions(+), 25 deletions(-)
> >
> > diff --git a/NEWS b/NEWS
> > index 85b3496214..65f35dcdd5 100644
> > --- a/NEWS
> > +++ b/NEWS
> > @@ -10,6 +10,10 @@ Post-v3.1.0
> > in order to create OVSDB sockets with access mode of 0770.
> > - QoS:
> >   * Added new configuration option 'jitter' for a linux-netem QoS type.
> > +   - DPDK:
> > + * ovs-vswitchd will keep the CAP_SYS_RAWIO capability when started
> > +   with the --hw-rawio-access command line option.  This allows the
> > +   process extra privileges when mapping physical interconnect memory.
> >
> >
> >  v3.1.0 - 16 Feb 2023
> > diff --git a/lib/daemon-unix.c b/lib/daemon-unix.c
> > index 1a7ba427d7..a080facddc 100644
> > --- a/lib/daemon-unix.c
> > +++ b/lib/daemon-unix.c
> > @@ -88,7 +88,8 @@ static bool switch_user = false;
> >  static uid_t uid;
> >  static gid_t gid;
> >  static char *user = NULL;
> > -static void daemon_become_new_user__(bool access_datapath);
> > +static void daemon_become_new_user__(bool access_datapath,
> > + bool access_hardware_ports);
> >
> >  static void check_already_running(void);
> >  static int lock_pidfile(FILE *, int command);
> > @@ -443,13 +444,13 @@ monitor_daemon(pid_t daemon_pid)
> >   * daemonize_complete()) or that it failed to start up (by exiting with a
> >   * nonzero exit code). */
> >  void
> > -daemonize_start(bool access_datapath)
> > +daemonize_start(bool access_datapath, bool access_hardware_ports)
> >  {
> >  assert_single_threaded();
> >  daemonize_fd = -1;
> >
> >  if (switch_user) {
> > -daemon_become_new_user__(access_datapath);
> > +daemon_become_new_user__(access_datapath, access_hardware_ports);
> >  switch_user = false;
> >  }
> >
> > @@ -807,7 +808,8 @@ daemon_become_new_user_unix(void)
> >  /* Linux specific implementation of daemon_become_new_user()
> >   * using libcap-ng.   */
> >  static void
> > -daemon_become_new_user_linux(bool access_datapath OVS_UNUSED)
> > +daemon_become_new_user_linux(bool access_datapath OVS_UNUSED,
> > + bool access_hardware_ports OVS_UNUSED)
> >  {
> >  #if defined __linux__ &&  HAVE_LIBCAPNG
> >  int ret;
> > @@ -827,6 +829,16 @@ daemon_become_new_user_linux(bool access_datapath OVS_UNUSED)
> >  ret = capng_update(CAPNG_ADD, cap_sets, CAP_NET_ADMIN)
> >|| capng_update(CAPNG_ADD, cap_sets, CAP_NET_RAW)
> >|| capng_update(CAPNG_ADD, cap_sets, CAP_NET_BROADCAST);
> > +#ifdef DPDK_NETDEV
> > +if (access_hardware_ports && !ret) {
> > +ret = capng_update(CAPNG_ADD, cap_sets, CAP_SYS_RAWIO);
> > +VLOG_INFO("CAP_SYS_RAWIO enabled.");

Perhaps "The Linux capability CAP_SYS_RAWIO is 

Re: [ovs-dev] [PATCH v9 0/4] Enhance support for checksum offloading

2022-11-29 Thread Flavio Leitner



Hi,

The code freeze is approaching fast and there are
holidays and PTOs in between. We kindly ask you to
review and/or test if possible as soon as you can
to unblock the second part of it.

Thanks,
fbl


On Thu, Nov 24, 2022 at 12:30:45AM -0500, Mike Pattrick wrote:
> This is a subset of the larger TSO patchset with various checksum
> improvements. This set includes additional documentation, new appctl
> command "dpif-netdev/offload-show" to display interface offload
> support, and improvements to tracking when an updated checksum is
> required.
> 
> In a simple iperf test with traffic flowing from a VM, through a
> virtio interface and out of DPDK PF, this series resulted in an 18%
> improvement in TCP throughput compared to master branch (361 vs 429
> Mbps). When TSO is enabled, this further improved by 10x (429 Mbps vs
> 4.32 Gbps). While TSO isn't introduced in this series, support for
> encapsulation with offload is extended.
> 
> Flavio Leitner (4):
>   Documentation: Document netdev offload.
>   dpif-netdev: Show netdev offloading flags.
>   userspace: Enable IP checksum offloading by default.
>   userspace: Enable L4 csum offloading by default.
> 
>  Documentation/automake.mk|   1 +
>  Documentation/topics/index.rst   |   1 +
>  Documentation/topics/netdev-offloads.rst |  95 +
>  lib/conntrack.c  |  30 +--
>  lib/dp-packet.c  |  39 
>  lib/dp-packet.h  | 138 -
>  lib/dpif-netdev-unixctl.man  |   6 +
>  lib/dpif-netdev.c|  62 ++
>  lib/flow.c   |  38 +++-
>  lib/ipf.c|  11 +-
>  lib/netdev-dpdk.c| 229 +---
>  lib/netdev-dummy.c   |  23 +++
>  lib/netdev-linux.c   | 252 +++
>  lib/netdev-native-tnl.c  |  53 ++---
>  lib/netdev-provider.h|   3 +
>  lib/netdev.c |  87 +---
>  lib/odp-execute.c|  21 +-
>  lib/packets.c| 209 +++
>  lib/packets.h|   3 +
>  ofproto/ofproto-dpif-upcall.c|   2 +-
>  tests/automake.mk|   1 +
>  tests/dpif-netdev.at |  21 ++
>  tests/system-userspace-offload.at|  79 +++
>  tests/system-userspace-testsuite.at  |   1 +
>  24 files changed, 1099 insertions(+), 306 deletions(-)
>  create mode 100644 Documentation/topics/netdev-offloads.rst
>  create mode 100644 tests/system-userspace-offload.at
> 
> -- 
> 2.31.1
> 

-- 
fbl
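The cover letter mentions tracking when an updated checksum is actually required. A minimal sketch of that kind of bookkeeping, assuming a per-packet L4 checksum state and a per-port "can offload the L4 checksum on transmit" bit: a modified packet whose checksum was known good is marked as needing a new checksum, which the egress port then either offloads or has computed in software, while a known-bad checksum is never silently repaired. All names are hypothetical; this is not the dp_packet API from the series.

```c
#include <stdbool.h>

/* Hypothetical per-packet L4 checksum state. */
enum l4_csum_state {
    L4_CSUM_UNKNOWN,  /* Not verified; header rewrites adjust it in place. */
    L4_CSUM_GOOD,     /* Verified good, e.g. by the receiving NIC. */
    L4_CSUM_BAD,      /* Verified bad; must stay bad. */
    L4_CSUM_PARTIAL,  /* Was good, packet modified; fill in before sending. */
};

enum egress_action {
    EGRESS_AS_IS,       /* Transmit the checksum bytes unchanged. */
    EGRESS_HW_OFFLOAD,  /* Let the NIC compute the checksum. */
    EGRESS_SW_CSUM,     /* Compute the checksum in software first. */
};

/* State after a header field covered by the L4 checksum is rewritten. */
static enum l4_csum_state
l4_csum_on_modify(enum l4_csum_state state)
{
    /* Only a known-good checksum can be deferred; recomputing a bad or
     * unverified one from scratch could turn it into a good one. */
    return state == L4_CSUM_GOOD ? L4_CSUM_PARTIAL : state;
}

/* Decision taken once the output port is known. */
static enum egress_action
l4_csum_egress_action(enum l4_csum_state state, bool port_tx_csum_offload)
{
    if (state != L4_CSUM_PARTIAL) {
        return EGRESS_AS_IS;
    }
    return port_tx_csum_offload ? EGRESS_HW_OFFLOAD : EGRESS_SW_CSUM;
}
```

Deferring the work to the egress decision is what lets a packet that is modified several times along the pipeline pay for at most one checksum computation.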


Re: [ovs-dev] [RFC PATCH] lib/ovs-thread: Expand stack when more memory is available.

2022-10-19 Thread Flavio Leitner
On Wed, Oct 19, 2022 at 09:48:18AM -0400, Mike Pattrick wrote:
> On Wed, Oct 19, 2022 at 9:30 AM Flavio Leitner  wrote:
> >
> >
> > Hi Mike,
> >
> > Thanks for the patch.
> >
> > Does this patch need to change this line too?
> > https://github.com/openvswitch/ovs/blob/31db0e043119cf597d720d94f70ec19cf5b8b7d4/rhel/usr_lib_systemd_system_ovs-vswitchd.service.in#L18
> >
> >
> > Wouldn't it be better to have a config option that we
> > can change at runtime? Or perhaps leave it to use the
> > system's default unless configured to cap the amount?
> 
> What I'm worried about here is if OVN is used and a few other things
> like dnat/snat, the resulting openflow rules can cause a segfault with
> the system default 2MB stack. The stack conditions aren't really
> detectable during runtime so increasing the default seemed reasonable
> to me. It also doesn't seem trivial to me to determine if any given
> set of OpenFlow rules will or won't cause an explosion in stack usage.


I agree. The issue is that if there is a crash then the only
option is to recompile and that is super difficult to do when
dealing with production environments.

Another thing to consider is that this may not be OVS' job to
deal with because OVS should rely on OS defaults, when possible.
However, sometimes OS defaults are too high for other reasons,
so we may want to cap it at OVS level.

I am not sure about users upgrading from previous versions
because if before it was limited and now it uses the OS default,
it may use more memory.

fbl
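Two details of the RFC are easy to trip over, sketched below assuming Linux with glibc (link with -pthread on older glibc). First, `system_memory() >> 30 > 4` is true only from 5 GiB upward, because the shift rounds down, so a host with 4.5 GiB still gets 512 kB even though the commit message says "more than 4GB"; comparing bytes avoids that. Second, `page_count * page_size` is computed in `long`, which can overflow on 32-bit targets (where `long` is 32 bits) once memory reaches 2 GiB; widening to 64 bits first avoids it. The helper names are hypothetical; apply_min_stack_size() follows the intent of OVS's set_min_stack_size() by only ever raising the size.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Total physical memory in bytes, or 0 if it cannot be determined.  The
 * multiplication is done in 64 bits so it cannot overflow a 32-bit long. */
static uint64_t
total_memory_bytes(void)
{
    long pages = sysconf(_SC_PHYS_PAGES);
    long page_size = sysconf(_SC_PAGESIZE);

    if (pages <= 0 || page_size <= 0) {
        return 0;
    }
    return (uint64_t) pages * (uint64_t) page_size;
}

/* Policy from the RFC: 4 MiB minimum above 4 GiB of RAM, else 512 KiB.
 * Comparing bytes directly means 4.5 GiB counts as "more than 4 GiB". */
static size_t
pick_min_stack_size(uint64_t mem_bytes)
{
    return mem_bytes > (UINT64_C(4) << 30) ? 4096 * 1024 : 512 * 1024;
}

/* Raises the stack size in 'attr' to at least 'min' without lowering a
 * larger default (on glibc the default usually follows RLIMIT_STACK).
 * Returns 0 or a pthread error code. */
static int
apply_min_stack_size(pthread_attr_t *attr, size_t min)
{
    size_t cur;
    int error = pthread_attr_getstacksize(attr, &cur);

    if (!error && cur < min) {
        error = pthread_attr_setstacksize(attr, min);
    }
    return error;
}
```

Because the helper only raises the size, a larger system default (the behavior discussed above) is left untouched, which addresses the upgrade concern in one direction without adding a runtime knob.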

> > BTW, I think we use Reported-at: tag instead of Bugzilla:.
> 
> You're right! I don't know where I came up with that tag.
> 
> -M
> 
> >
> >
> > fbl
> >
> > On Wed, Oct 19, 2022 at 09:01:46AM -0400, Mike Pattrick wrote:
> > > Previously the minimum thread stack size was always set to 512 kB to
> > > help accommodate smaller OpenWRT-based systems. Often these devices
> > > don't have a lot of total system memory, so such a limit makes sense.
> > >
> > > The default under x86-64 Linux is 2MB; this limit is not always enough
> > > to reach the recursion limits in xlate_resubmit_resource_check(),
> > > resulting in a segfault instead of a recoverable error. This can happen
> > > even when the stack size is set up for unlimited expansion when the
> > > virtual memory areas of handler threads abut each other, preventing any
> > > further expansion.
> > >
> > > The solution proposed here is to set a minimum of 4MB on systems with
> > > more than 4GB of total RAM.
> > >
> > > Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2104779
> > > Signed-off-by: Mike Pattrick 
> > > ---
> > >  lib/ovs-thread.c | 46 --
> > >  lib/ovs-thread.h |  1 +
> > >  2 files changed, 45 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/lib/ovs-thread.c b/lib/ovs-thread.c
> > > index 78ed3e970..dbe4a036f 100644
> > > --- a/lib/ovs-thread.c
> > > +++ b/lib/ovs-thread.c
> > > @@ -478,10 +478,21 @@ ovs_thread_create(const char *name, void *(*start)(void *), void *arg)
> > >   * requires approximately 384 kB according to the following analysis:
> > >   * https://mail.openvswitch.org/pipermail/ovs-dev/2016-January/308592.html
> > >   *
> > > - * We use 512 kB to give us some margin of error. */
> > > + * We use at least 512 kB to give us some margin of error.
> > > + *
> > > + * However, this can cause issues on larger systems with complex
> > > + * OpenFlow tables. A default stack size of 2MB can result in segfaults
> > > + * if a lot of clones and resubmits are used. So if the system memory
> > > + * exceeds some limit then use a 4 MB stack.
> > > + * */
> > >  pthread_attr_t attr;
> > >  pthread_attr_init(&attr);
> > > -set_min_stack_size(&attr, 512 * 1024);
> > > +
> > > +if (system_memory() >> 30 > 4) {
> > > +set_min_stack_size(&attr, 4096 * 1024);
> > > +} else {
> > > +set_min_stack_size(&attr, 512 * 1024);
> > > +}
> > >
> > >  error = pthread_create(&thread, &attr, ovsthread_wrapper, aux);
> > >  if (error) {
> > > @@ -680,6 +691,37 @@ count_total_cores(void)
> > >  return n_cores > 0 ? n_cores : 0;
> > >  }
> > >
> > > +/* Returns the total system memory in bytes, or 0 if the
> > > + * number cannot be determine

Re: [ovs-dev] [RFC PATCH] lib/ovs-thread: Expand stack when more memory is available.

2022-10-19 Thread Flavio Leitner


Hi Mike,

Thanks for the patch.

Does this patch need to change this line too?
https://github.com/openvswitch/ovs/blob/31db0e043119cf597d720d94f70ec19cf5b8b7d4/rhel/usr_lib_systemd_system_ovs-vswitchd.service.in#L18


Wouldn't it be better to have a config option that we
can change at runtime? Or perhaps leave it to use the
system's default unless configured to cap the amount?

BTW, I think we use Reported-at: tag instead of Bugzilla:.


fbl

On Wed, Oct 19, 2022 at 09:01:46AM -0400, Mike Pattrick wrote:
> Previously the minimum thread stack size was always set to 512 kB to
> help accommodate smaller OpenWRT-based systems. Often these devices
> don't have a lot of total system memory, so such a limit makes sense.
> 
> The default under x86-64 Linux is 2MB; this limit is not always enough
> to reach the recursion limits in xlate_resubmit_resource_check(),
> resulting in a segfault instead of a recoverable error. This can happen
> even when the stack size is set up for unlimited expansion when the
> virtual memory areas of handler threads abut each other, preventing any
> further expansion.
> 
> The solution proposed here is to set a minimum of 4MB on systems with
> more than 4GB of total RAM.
> 
> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2104779
> Signed-off-by: Mike Pattrick 
> ---
>  lib/ovs-thread.c | 46 --
>  lib/ovs-thread.h |  1 +
>  2 files changed, 45 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/ovs-thread.c b/lib/ovs-thread.c
> index 78ed3e970..dbe4a036f 100644
> --- a/lib/ovs-thread.c
> +++ b/lib/ovs-thread.c
> @@ -478,10 +478,21 @@ ovs_thread_create(const char *name, void *(*start)(void *), void *arg)
>   * requires approximately 384 kB according to the following analysis:
>   * 
> https://mail.openvswitch.org/pipermail/ovs-dev/2016-January/308592.html
>   *
> - * We use 512 kB to give us some margin of error. */
> + * We use at least 512 kB to give us some margin of error.
> + *
> + * However, this can cause issues on larger systems with complex
> + * OpenFlow tables. A default stack size of 2MB can result in segfaults
> + * if a lot of clones and resubmits are used. So if the system memory
> + * exceeds some limit then use a 4 MB stack.
> + * */
>  pthread_attr_t attr;
>  pthread_attr_init(&attr);
> -set_min_stack_size(&attr, 512 * 1024);
> +
> +if (system_memory() >> 30 > 4) {
> +set_min_stack_size(&attr, 4096 * 1024);
> +} else {
> +set_min_stack_size(&attr, 512 * 1024);
> +}
>  
>  error = pthread_create(&thread, &attr, ovsthread_wrapper, aux);
>  if (error) {
> @@ -680,6 +691,37 @@ count_total_cores(void)
>  return n_cores > 0 ? n_cores : 0;
>  }
>  
> +/* Returns the total system memory in bytes, or 0 if the
> + * number cannot be determined. */
> +uint64_t
> +system_memory(void)
> +{
> +static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
> +static uint64_t memory;
> +
> +if (ovsthread_once_start(&once)) {
> +#if defined(_WIN32)
> +MEMORYSTATUSEX statex;
> +
> +statex.dwLength = sizeof statex;
> +GlobalMemoryStatusEx(&statex);
> +memory = statex.ullTotalPhys;
> +#elif defined(__linux__)
> +long int page_count = sysconf(_SC_PHYS_PAGES);
> +long int page_size = sysconf(_SC_PAGESIZE);
> +
> +if (page_count > 0 && page_size > 0) {
> +memory = page_count * page_size;
> +} else {
> +memory = 0;
> +}
> +#endif
> +ovsthread_once_done(&once);
> +}
> +
> +return memory;
> +}
> +
>  /* Returns 'true' if current thread is PMD thread. */
>  bool
>  thread_is_pmd(void)
> diff --git a/lib/ovs-thread.h b/lib/ovs-thread.h
> index aac5e19c9..2ce66b721 100644
> --- a/lib/ovs-thread.h
> +++ b/lib/ovs-thread.h
> @@ -523,6 +523,7 @@ bool may_fork(void);
>  
>  int count_cpu_cores(void);
>  int count_total_cores(void);
> +uint64_t system_memory(void);
>  bool thread_is_pmd(void);
>  
>  #endif /* ovs-thread.h */
> -- 
> 2.31.1
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
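The threshold in this patch is easy to misread, so here is a minimal standalone sketch of the selection logic. The helper names `choose_min_stack_size()` and `physical_memory_bytes()` are hypothetical, not OVS API, and the sketch assumes, as the patch does, that `sysconf(_SC_PHYS_PAGES)` is available. Note that `system_memory() >> 30 > 4` compares whole GiB, so a host with 4.5 GiB still gets 512 kB and the 4 MB minimum only applies from 5 GiB upward. Also, on a 32-bit build, where `long` is 32 bits, `page_count * page_size` as written in the patch can overflow once RAM reaches 2 GiB; the sketch multiplies in 64 bits instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

#define OLD_MIN_STACK   (512 * 1024)    /* 512 kB, the previous minimum. */
#define LARGE_MIN_STACK (4096 * 1024)   /* 4 MB, the proposed minimum. */

/* Hypothetical helper using the same threshold as the patch.
 * 'mem_bytes >> 30' truncates to whole GiB, so "> 4" means 5 GiB or more. */
static size_t
choose_min_stack_size(uint64_t mem_bytes)
{
    return (mem_bytes >> 30) > 4 ? LARGE_MIN_STACK : OLD_MIN_STACK;
}

/* Hypothetical helper: total physical memory in bytes, or 0 if unknown.
 * The product is computed in 64 bits so it cannot overflow a 32-bit
 * 'long' on small 32-bit systems. */
static uint64_t
physical_memory_bytes(void)
{
    long page_count = sysconf(_SC_PHYS_PAGES);
    long page_size = sysconf(_SC_PAGESIZE);

    if (page_count <= 0 || page_size <= 0) {
        return 0;
    }
    return (uint64_t) page_count * (uint64_t) page_size;
}
```

If the memory size cannot be determined (`physical_memory_bytes()` returns 0), the sketch falls back to 512 kB, which matches what the patch does on platforms where `system_memory()` returns 0.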

-- 
fbl


Re: [ovs-dev] [PATCH branch-3.0 0/2] Release patches for v3.0.1.

2022-10-07 Thread Flavio Leitner
On Fri, Oct 07, 2022 at 01:17:33PM +0200, Ilya Maximets wrote:
> 
> Ilya Maximets (2):
>   Set release date for 3.0.1.
>   Prepare for 3.0.2.
> 
>  NEWS | 6 +-
>  configure.ac | 2 +-
>  debian/changelog | 8 +++-
>  3 files changed, 13 insertions(+), 3 deletions(-)
> 
> -- 
> 2.37.3
> 


To the set:
Acked-by: Flavio Leitner 

Thanks Ilya,
fbl


Re: [ovs-dev] [PATCH branch-2.17 0/2] Release patches for v2.17.3.

2022-10-07 Thread Flavio Leitner
On Fri, Oct 07, 2022 at 01:17:21PM +0200, Ilya Maximets wrote:
> 
> Ilya Maximets (2):
>   Set release date for 2.17.3.
>   Prepare for 2.17.4.
> 
>  NEWS | 6 +-
>  configure.ac | 2 +-
>  debian/changelog | 8 +++-
>  3 files changed, 13 insertions(+), 3 deletions(-)
> 
> -- 
> 2.37.3
> 

To the set:
Acked-by: Flavio Leitner 

Thanks,
fbl


Re: [ovs-dev] [PATCH branch-2.16 0/2] Release patches for v2.16.5.

2022-10-07 Thread Flavio Leitner
On Fri, Oct 07, 2022 at 01:17:06PM +0200, Ilya Maximets wrote:
> 
> Ilya Maximets (2):
>   Set release date for 2.16.5.
>   Prepare for 2.16.6.
> 
>  NEWS | 6 +-
>  configure.ac | 2 +-
>  debian/changelog | 8 +++-
>  3 files changed, 13 insertions(+), 3 deletions(-)
> 
> -- 
> 2.37.3
> 

To the set:
Acked-by: Flavio Leitner 

Thanks Ilya,
fbl



Re: [ovs-dev] [PATCH branch-2.15 0/2] Release patches for v2.15.6.

2022-10-07 Thread Flavio Leitner
On Fri, Oct 07, 2022 at 01:16:54PM +0200, Ilya Maximets wrote:
> 
> Ilya Maximets (2):
>   Set release date for 2.15.6.
>   Prepare for 2.15.7.
> 
>  NEWS | 6 +-
>  configure.ac | 2 +-
>  debian/changelog | 8 +++-
>  3 files changed, 13 insertions(+), 3 deletions(-)
> 
> -- 
> 2.37.3
> 

To the set:
Acked-by: Flavio Leitner 

fbl


Re: [ovs-dev] [PATCH branch-2.14 0/2] Release patches for v2.14.7.

2022-10-07 Thread Flavio Leitner
On Fri, Oct 07, 2022 at 01:16:41PM +0200, Ilya Maximets wrote:
> 
> Ilya Maximets (2):
>   Set release date for 2.14.7.
>   Prepare for 2.14.8.
> 
>  NEWS | 6 +-
>  configure.ac | 2 +-
>  debian/changelog | 8 +++-
>  3 files changed, 13 insertions(+), 3 deletions(-)
> 
> -- 
> 2.37.3
> 

To the set:
Acked-by: Flavio Leitner 

Thanks Ilya,
fbl



Re: [ovs-dev] [PATCH branch-2.13 0/2] Release patches for v2.13.9.

2022-10-07 Thread Flavio Leitner
On Fri, Oct 07, 2022 at 01:16:29PM +0200, Ilya Maximets wrote:
> 
> Ilya Maximets (2):
>   Set release date for 2.13.9.
>   Prepare for 2.13.10.
> 
>  NEWS | 6 +-
>  configure.ac | 2 +-
>  debian/changelog | 8 +++-
>  3 files changed, 13 insertions(+), 3 deletions(-)
> 
> -- 
> 2.37.3
> 

-- 
fbl

To the set:
Acked-by: Flavio Leitner 

Thanks Ilya,
fbl


Re: [ovs-dev] [PATCH] netdev-dpdk: Drop TSO in case of conflicting virtio features.

2022-09-21 Thread Flavio Leitner
On Tue, Sep 20, 2022 at 2:21 PM David Marchand wrote:

> On Mon, Sep 12, 2022 at 4:47 PM Flavio Leitner  wrote:
> > On Mon, Sep 12, 2022 at 5:52 AM David Marchand <david.march...@redhat.com> wrote:
> >> On Fri, Sep 9, 2022 at 7:58 PM Flavio Leitner  wrote:
> >> >
> >> > Thanks for working on this patch!
> >> >
> >> > It seems possible to change this patch later when the other TSO series
> >> > gets merged to disable TSO only on the affected port.
> >> > Mike, any thoughts?
> >>
> >> It should be what this patch already does: disable TSO only for the
> >> affected port as I mentioned in the commit log below.
> >> Can you clarify?
> >
> >
> > Oops, I should have added more context. The issue is that TSO is
> > all-or-nothing because there is no software fallback currently [see
> > netdev_send_prepare_packet()]. So, if the problem happens and the
> > reconnection with disabled TSO succeeds, all TSO packets to that port
> > will be dropped. The question is "should we allow the update but run
> > the risk of dropping packets or fail to update?"
> >
> > The software fallback is provided in the posted TSO series. In that
> > case, we can have a per-port TSO knob because if it gets disabled, the
> > packets will be segmented in software and not dropped.
> > So, the patch makes sense with the posted TSO series applied, but I am
> > not so sure with the current code.
> >
> > What do you think?
>
> Ok, I get your point now.
>
> Btw, I still have some review to finish on the TSO series.
> For the time being, I will mark my patch as "Changes requested",
> rebase it and test on top of TSO series.
>


It sounds like a plan to me, thanks David.
fbl


Re: [ovs-dev] [PATCH] netdev-dpdk: Drop TSO in case of conflicting virtio features.

2022-09-12 Thread Flavio Leitner
On Mon, Sep 12, 2022 at 5:52 AM David Marchand wrote:

> Hi Flavio,
>
> On Fri, Sep 9, 2022 at 7:58 PM Flavio Leitner  wrote:
> >
> > Thanks for working on this patch!
> >
> > It seems possible to change this patch later when the other TSO series
> > gets merged to disable TSO only on the affected port.
> > Mike, any thoughts?
>
> It should be what this patch already does: disable TSO only for the
> affected port as I mentioned in the commit log below.
> Can you clarify?


Oops, I should have added more context. The issue is that TSO is
all-or-nothing because there is no software fallback currently [see
netdev_send_prepare_packet()]. So, if the problem happens and the
reconnection with disabled TSO succeeds, all TSO packets to that port
will be dropped. The question is "should we allow the update but run
the risk of dropping packets or fail to update?"

The software fallback is provided in the posted TSO series. In that
case, we can have a per-port TSO knob because if it gets disabled, the
packets will be segmented in software and not dropped.
So, the patch makes sense with the posted TSO series applied, but I am
not so sure with the current code.

What do you think?

Thanks,
fbl
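The trade-off described above can be sketched as a per-packet decision on egress. This is a simplified model, not OVS code: the enum and `tx_decide()` are hypothetical, and `sw_fallback` stands for the software segmentation that the posted TSO series adds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome for one packet on egress. */
enum tx_action {
    TX_AS_IS,        /* Not a TSO packet: send unchanged. */
    TX_OFFLOAD,      /* Port accepts TSO: let the device segment it. */
    TX_SW_SEGMENT,   /* Port lacks TSO: segment in software first. */
    TX_DROP,         /* Port lacks TSO and there is no fallback. */
};

static enum tx_action
tx_decide(bool is_tso_packet, bool port_tso_enabled, bool sw_fallback)
{
    if (!is_tso_packet) {
        return TX_AS_IS;
    }
    if (port_tso_enabled) {
        return TX_OFFLOAD;
    }
    /* Without a software fallback, TSO is all-or-nothing: disabling it on
     * one port means dropping every TSO packet sent to that port. */
    return sw_fallback ? TX_SW_SEGMENT : TX_DROP;
}
```

With `sw_fallback` false, as in the current code, disabling TSO on a single port turns every TSO packet sent to it into a drop; with it true, the same per-port knob only costs CPU time for software segmentation.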


>
> > I made a couple comments below.
> >
> > On Fri, Sep 9, 2022 at 10:57 AM David Marchand <david.march...@redhat.com> wrote:
> >>
> >> At some point in OVS history, some virtio features were announced as
> >> supported (ECN and UFO virtio features).
> >>
> >> The userspace TSO code, which has been added later, does not support
> >> those features and tries to disable them.
> >>
> >> This breaks OVS upgrades: if an existing VM already negotiated such
> >> features, their lack on reconnection to an upgraded OVS triggers a
> >> vhost socket disconnection by Qemu.
> >> This results in an endless loop because Qemu then retries with the same
> >> set of virtio features.
> >>
> >> This patch proposes to try and detect those vhost socket disconnections
> >> and fall back to restoring the old virtio features (and disabling TSO
> >> for this vhost port).
>
> ^^^
>
>
> >>
> >> Signed-off-by: David Marchand 
> >> ---
> >>  lib/netdev-dpdk.c | 52 +--
> >>  1 file changed, 50 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> >> index 0dd655507b..13d7ed3d62 100644
> >> --- a/lib/netdev-dpdk.c
> >> +++ b/lib/netdev-dpdk.c
> >> @@ -465,6 +465,15 @@ struct netdev_dpdk {
> >>  /* True if vHost device is 'up' and has been reconfigured at least once */
> >>  bool vhost_reconfigured;
> >>
> >> +/* Set on driver start (which means after a vHost connection is
> >> + * accepted), and cleared when the vHost device gets configured. */
> >> +bool vhost_initial_config;
> >> +
> >> +/* Set on disconnection if an initial configuration did not finish.
> >> + * This triggers a workaround for Virtio features negotiation, that
> >> + * makes TSO unavailable. */
> >> +bool vhost_workaround_disable_tso;
> >> +
> >>  atomic_uint8_t vhost_tx_retries_max;
> >>  /* 2 pad bytes here. */
> >>  );
> >> @@ -1293,6 +1302,7 @@ common_construct(struct netdev *netdev, dpdk_port_t port_no,
> >>  dev->requested_lsc_interrupt_mode = 0;
> >>  ovsrcu_index_init(&dev->vid, -1);
> >>  dev->vhost_reconfigured = false;
> >> +dev->vhost_initial_config = false;
> >>  dev->attached = false;
> >>  dev->started = false;
> >>  dev->reset_needed = false;
> >> @@ -3986,6 +3996,7 @@ new_device(int vid)
> >>  } else {
> >>  /* Reconfiguration not required. */
> >>  dev->vhost_reconfigured = true;
> >> +dev->vhost_initial_config = false;
> >>  }
> >>
> >>  ovsrcu_index_set(&dev->vid, vid);
> >> @@ -4154,6 +4165,16 @@ destroy_connection(int vid)
> >>  dev->requested_n_txq = qp_num;
> >>  netdev_request_reconfigure(&dev->up);
> >>  }
> >> +
> >> +if (dev->vhost_initial_config) {
> >> +VLOG_ERR("Connection on socket '%s' seems prematurately "
> >> + "

Re: [ovs-dev] [PATCH] netdev-dpdk: Drop TSO in case of conflicting virtio features.

2022-09-09 Thread Flavio Leitner
Hi David,

Thanks for working on this patch!

It seems possible to change this patch later when the other TSO series
gets merged to disable TSO only on the affected port.
Mike, any thoughts?

I made a couple comments below.

On Fri, Sep 9, 2022 at 10:57 AM David Marchand 
wrote:

> At some point in OVS history, some virtio features were announced as
> supported (ECN and UFO virtio features).
>
> The userspace TSO code, which has been added later, does not support
> those features and tries to disable them.
>
> This breaks OVS upgrades: if an existing VM already negotiated such
> features, their lack on reconnection to an upgraded OVS triggers a
> vhost socket disconnection by Qemu.
> This results in an endless loop because Qemu then retries with the same
> set of virtio features.
>
> This patch proposes to try and detect those vhost socket disconnections
> and fall back to restoring the old virtio features (and disabling TSO for
> this vhost port).
>
> Signed-off-by: David Marchand 
> ---
>  lib/netdev-dpdk.c | 52 +--
>  1 file changed, 50 insertions(+), 2 deletions(-)
>
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 0dd655507b..13d7ed3d62 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -465,6 +465,15 @@ struct netdev_dpdk {
>  /* True if vHost device is 'up' and has been reconfigured at least once */
>  bool vhost_reconfigured;
>
> +/* Set on driver start (which means after a vHost connection is
> + * accepted), and cleared when the vHost device gets configured. */
> +bool vhost_initial_config;
> +
> +/* Set on disconnection if an initial configuration did not finish.
> + * This triggers a workaround for Virtio features negotiation, that
> + * makes TSO unavailable. */
> +bool vhost_workaround_disable_tso;
> +
>  atomic_uint8_t vhost_tx_retries_max;
>  /* 2 pad bytes here. */
>  );
> @@ -1293,6 +1302,7 @@ common_construct(struct netdev *netdev, dpdk_port_t port_no,
>  dev->requested_lsc_interrupt_mode = 0;
>  ovsrcu_index_init(&dev->vid, -1);
>  dev->vhost_reconfigured = false;
> +dev->vhost_initial_config = false;
>  dev->attached = false;
>  dev->started = false;
>  dev->reset_needed = false;
> @@ -3986,6 +3996,7 @@ new_device(int vid)
>  } else {
>  /* Reconfiguration not required. */
>  dev->vhost_reconfigured = true;
> +dev->vhost_initial_config = false;
>  }
>
>  ovsrcu_index_set(&dev->vid, vid);
> @@ -4154,6 +4165,16 @@ destroy_connection(int vid)
>  dev->requested_n_txq = qp_num;
>  netdev_request_reconfigure(&dev->up);
>  }
> +
> +if (dev->vhost_initial_config) {
> +VLOG_ERR("Connection on socket '%s' seems prematurately "
> + "closed.", dev->vhost_id);
>

Perhaps use a more specific message, like the one below?
"Connection on socket '%s' closed during initialization."
Or, if that makes sense, change it to:
"Connection on socket '%s' closed during feature negotiation."



> +dev->vhost_workaround_disable_tso = true;
> +netdev_request_reconfigure(&dev->up);
> +} else {
> +dev->vhost_workaround_disable_tso = false;
> +}
> +
>  ovs_mutex_unlock(&dev->mutex);
>  exists = true;
>  break;
> @@ -5058,6 +5079,7 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev)
>
>  if (dev->vhost_reconfigured == false) {
>  dev->vhost_reconfigured = true;
> +dev->vhost_initial_config = false;
>  /* Carrier status may need updating. */
>  netdev_change_seq_changed(&dev->up);
>  }
> @@ -5086,6 +5108,31 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev)
>  int err;
>  uint64_t vhost_flags = 0;
>  uint64_t vhost_unsup_flags;
> +bool tso_enabled = userspace_tso_enabled();
> +
> +if (tso_enabled) {
> +bool unregister;
> +char *vhost_id;
> +
> +ovs_mutex_lock(&dev->mutex);
> +unregister = dev->vhost_id != NULL;
> +unregister &= dev->vhost_workaround_disable_tso;
> +if (unregister) {
> +/* There was an issue with a previous connection that did not
> + * finish initialising, one common reason is virtio features
> + * negotiation. Disable TSO as a workaround. */
> +tso_enabled = false;
> +dev->vhost_driver_flags &= ~RTE_VHOST_USER_CLIENT;
> +vhost_id = dev->vhost_id;
> +VLOG_WARN("Disabling TSO for %s as a workaround because of "
> +  "previous connection drop.", dev->up.name);
>

Should we update Documentation/topics/userspace-tso.rst with
this new condition, since it is externally visible to users?

Thanks,
fbl



> +}
> +   
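A simplified model of how the two new flags interact, based on the comments and hunks quoted above. The struct and the `model_*()` functions are hypothetical; in the patch these transitions are spread across `new_device()`, `dpdk_vhost_reconfigure_helper()`, `destroy_connection()` and `netdev_dpdk_vhost_client_reconfigure()`. The point where `vhost_initial_config` becomes true (driver start) is taken from the struct comment, since that hunk is not part of the quoted diff, and the model ignores the `vhost_id != NULL` check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the per-port state added by the patch. */
struct vhost_port_model {
    bool initial_config;           /* Models 'vhost_initial_config'. */
    bool workaround_disable_tso;   /* Models 'vhost_workaround_disable_tso'. */
    bool tso_enabled;              /* TSO offered on the next connection. */
};

/* Driver start, i.e. a vhost connection was accepted. */
static void
model_connection_accepted(struct vhost_port_model *p)
{
    p->initial_config = true;
}

/* The device finished its (re)configuration: negotiation succeeded. */
static void
model_device_configured(struct vhost_port_model *p)
{
    p->initial_config = false;
}

/* Disconnection: if configuration never finished, assume the virtio
 * feature negotiation failed and arm the workaround; otherwise clear it. */
static void
model_connection_destroyed(struct vhost_port_model *p)
{
    p->workaround_disable_tso = p->initial_config;
}

/* Client reconfiguration: TSO stays on only if it is globally enabled
 * and the workaround is not armed. */
static void
model_client_reconfigure(struct vhost_port_model *p, bool userspace_tso)
{
    p->tso_enabled = userspace_tso && !p->workaround_disable_tso;
}
```

A connection that completes configuration and later drops leaves TSO enabled; one that drops before configuration completes causes the next reconfiguration to come up without TSO.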

Re: [ovs-dev] [PATCH] rhel: Update the spec file and distribution docs.

2022-09-01 Thread Flavio Leitner


Hi,

If I recall correctly, the openvswitch-dpdk package was obsoleted
in 2016 in favor of having only one package with DPDK enabled.
Of course, upstream can decide whether it makes sense to follow
that or not.

Another thing is that openvswitch-fedora.spec should work much
better with RHEL-8 and RHEL-9. So, it would be better to just
run 'make rpm-fedora'.

One could use 'make rpm-fedora RPMBUILD_OPT="--with dpdk"' if
DPDK is needed.

Maybe these two specs could be deprecated altogether.

fbl



On Thu, Sep 01, 2022 at 04:02:35PM +0200, Ilya Maximets wrote:
> 'openvswitch' and 'openvswitch-dpdk' packages are not available
> starting with RHEL 8.  The only option to get an official OVS
> build is to use FDP repository, which is not available with the
> default RHEL subscription.
> 
> Also, the spec file is outdated.  We need python3, not python2.
> Some headers are removed by the build script, but needed for
> the devel package.  'openflow' headers are there, but not packaged.
> And libunwind-devel is not available on RHEL8.  Just removed from
> the spec, since it's an optional dependency.
> 
> Fixes: 1ca0323e7c29 ("Require Python 3 and remove support for Python 2.")
> Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-August/052033.html
> Signed-off-by: Ilya Maximets 
> ---
>  Documentation/intro/install/distributions.rst | 5 -
>  rhel/openvswitch.spec.in  | 6 ++
>  2 files changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/intro/install/distributions.rst b/Documentation/intro/install/distributions.rst
> index ce75ac45c..3fd29ccb3 100644
> --- a/Documentation/intro/install/distributions.rst
> +++ b/Documentation/intro/install/distributions.rst
> @@ -59,10 +59,13 @@ packages and must be superuser.
>  Red Hat
>  ---
>  
> -RHEL distributes ``openvswitch`` rpm package that supports kernel datapath.
> +RHEL 7 distributes ``openvswitch`` rpm package that supports kernel datapath.
>  DPDK accelerated Open vSwitch can be installed using ``openvswitch-dpdk``
>  package.
>  
> +Starting with RHEL 8, Open vSwitch packages are provided only via Red Hat
> +Fast Datapath repositories, e.g. ``fast-datapath-for-rhel-8-x86_64-rpms``.
> +
>  OpenSuSE
>  
>  
> diff --git a/rhel/openvswitch.spec.in b/rhel/openvswitch.spec.in
> index 9903dd10a..7eac52d63 100644
> --- a/rhel/openvswitch.spec.in
> +++ b/rhel/openvswitch.spec.in
> @@ -32,14 +32,12 @@ License: ASL 2.0
>  Release: %{release_number}%{?dist}
>  Source: openvswitch-%{version}.tar.gz
>  Buildroot: /tmp/openvswitch-rpm
> -Requires: logrotate, hostname, python >= 2.7, python-six
> -BuildRequires: python-six
> +Requires: logrotate, hostname, python3 >= 3.4
>  BuildRequires: openssl-devel
>  BuildRequires: checkpolicy, selinux-policy-devel
>  BuildRequires: autoconf, automake, libtool
>  BuildRequires: python3-sphinx
>  BuildRequires: unbound-devel
> -BuildRequires: libunwind-devel
>  
>  %bcond_without check
>  %bcond_with check_datapath_kernel
> @@ -102,7 +100,6 @@ rm \
>  $RPM_BUILD_ROOT/usr/share/man/man8/ovs-test.8 \
>  $RPM_BUILD_ROOT/usr/share/man/man8/ovs-l3ping.8
>  (cd "$RPM_BUILD_ROOT" && rm -rf usr/%{_lib}/*.la)
> -(cd "$RPM_BUILD_ROOT" && rm -rf usr/include)
>  
>  install -d -m 0755 $RPM_BUILD_ROOT%{_rundir}/openvswitch
>  install -d -m 0755 $RPM_BUILD_ROOT%{_localstatedir}/log/openvswitch
> @@ -255,6 +252,7 @@ exit 0
>  %{_libdir}/lib*.a
>  %{_libdir}/pkgconfig
>  %{_includedir}/openvswitch/*
> +%{_includedir}/openflow/*
>  
>  %files selinux-policy
>  %defattr(-,root,root)
> -- 
> 2.34.3
> 

-- 
fbl


Re: [ovs-dev] [RFC PATCH 0/6] Remove OVS kernel driver

2022-05-25 Thread Flavio Leitner


Hi Greg,


On Mon, May 23, 2022 at 09:10:36PM +0200, Ilya Maximets wrote:
> On 5/19/22 20:04, Gregory Rose wrote:
> > 
> > 
> > On 4/15/2022 2:42 PM, Greg Rose wrote:
> >> It is time to remove support for the OVS kernel driver and push
> >> towards use of the upstream Linux openvswitch kernel driver
> >> in its place [1].
> >>
> >> This patch series represents a first attempt but there are a few
> >> primary remaining issues that I have yet to address.
> >>
> >> A) Removal of Debian packaging support for the dkms kernel driver
> >>     module. The debian/rules are not well known to me - I've never
> >>     actually made any changes in that area and do not have a
> >>     well formed understanding of how Debian packaging works.  I will
> >>     attempt to fix that up in an upcoming patch series.
> >> B) Figuring out how the GitHub workflow works - I removed the tests I
> >>     could find that depend on the Linux kernel (i.e. they use the
> >>     install_kernel() function).  Several other tests are failing
> >>     that would not seem to depend on the Linux kernel.  I need to
> >>     read and understand that code better.
> >> C) There are many Linux specific source modules in the datapath that
> >>     will need eventual removal but some headers are still required for
> >>     the userspace code (which seems counterintuitive but...)
> >>
> >> Reviews, suggestions, etc. are appreciated!
> >>
> >> 1.  https://mail.openvswitch.org/pipermail/ovs-dev/2022-April/393292.html
> > 
> > I would like to suggest at this time that rather than removing the OVS
> > Linux kernel path that we "freeze" it at Linux 5.8. This will make it
> > easier for some consumers of OVS that are continuing to support the
> > Linux kernel datapath in old distributions.
> > 
> > The ultimate goal of shifting toward DPDK and AFXDP datapaths is still
> > preserved but we are placing less burden on some consumers of OVS for
> > older Linux distributions.
> > 
> > Perhaps in suggesting removal of the kernel datapath I was being a bit
> > overly aggressive.
> > 
> > Thoughts? Concerns? Other suggestions?
> 
> Hi.  I think we discussed that before.  Removal from the master branch
> doesn't mean that we will stop supporting the kernel module immediately.
> It will remain in branch 2.17 which will become our new LTS series soon.
> This branch will be supported until 2025.  And we also talked about
> possibility of extending the support just for a kernel module on that
> branch, if required.  It's not necessary to use the kernel module and
> OVS from the same branch, obviously.
> 
> Removal from the master branch will just make it possible to remove
> the maintenance burden eventually, not right away.
> 
> And FWIW, the goal is not to force everyone to use userspace datapath,
> but remove a maintenance burden and push users to use a better supported
> version of a code.  Frankly, we're not doing a great job supporting the
> out-of-tree module these days.  It's getting hard to backport bug fixes.
> And it will be even harder over time since the code drifts away from the
> version in the upstream kernel, mainly because we haven't backported
> new features for a few years already.
> 
> Does that make sense?

Any thoughts on this? The freeze time is approaching, so it would
be great to know your plans for this patch set.

Thanks,
fbl



Re: [ovs-dev] [[PATCH RFC] 13/17] Enable IP checksum offloading by default.

2022-03-10 Thread Flavio Leitner
On Mon, Jan 24, 2022 at 02:21:35PM -0500, Mike Pattrick wrote:
> On Tue, Dec 7, 2021 at 11:54 AM Flavio Leitner  wrote:
> >
> > The netdev receiving packets is supposed to provide the flags
> > indicating if the IP csum was verified and it is OK or BAD,
> > otherwise the stack will check when appropriate by software.
> >
> > If the packet comes with good checksum, then postpone the
> > checksum calculation to the egress device if needed.
> >
> > When encapsulating a packet with that flag, set the checksum
> > of the inner IP header since that is not yet supported.
> >
> > Calculate the IP csum when the packet is going to be sent over
> > a device that doesn't support the feature.
> >
> > Linux devices don't support IP csum offload alone, so the
> > support is not enabled.
> >
> > Signed-off-by: Flavio Leitner 
> > ---
> >  lib/conntrack.c | 12 ++---
> >  lib/dp-packet.c | 12 +
> >  lib/dp-packet.h | 63 ---
> >  lib/dpif.h  |  2 +-
> >  lib/flow.c  | 16 --
> >  lib/ipf.c   |  9 ++--
> >  lib/netdev-dpdk.c   | 78 ++--
> >  lib/netdev-dummy.c  | 21 
> >  lib/netdev-native-tnl.c | 19 +--
> >  lib/netdev.c| 22 
> >  lib/odp-execute.c   | 21 ++--
> >  lib/packets.c   | 34 ++---
> >  ofproto/ofproto-dpif-upcall.c   | 14 +++--
> >  tests/automake.mk   |  1 +
> >  tests/system-userspace-offload.at   | 79 +
> >  tests/system-userspace-testsuite.at |  1 +
> >  16 files changed, 322 insertions(+), 82 deletions(-)
> >  create mode 100644 tests/system-userspace-offload.at
> >
> > diff --git a/lib/conntrack.c b/lib/conntrack.c
> > index 2392a2ea4..5b4ca4dfc 100644
> > --- a/lib/conntrack.c
> > +++ b/lib/conntrack.c
> > @@ -2089,16 +2089,12 @@ conn_key_extract(struct conntrack *ct, struct dp_packet *pkt, ovs_be16 dl_type,
> >  ctx->key.dl_type = dl_type;
> >
> >  if (ctx->key.dl_type == htons(ETH_TYPE_IP)) {
> > -bool hwol_bad_l3_csum = dp_packet_ol_ip_csum_bad(pkt);
> > -if (hwol_bad_l3_csum) {
> > +if (dp_packet_ol_ip_csum_bad(pkt)) {
> >  ok = false;
> >  COVERAGE_INC(conntrack_l3csum_err);
> >  } else {
> > -bool hwol_good_l3_csum = dp_packet_ol_ip_csum_good(pkt)
> > - || dp_packet_ol_tx_ipv4(pkt);
> > -/* Validate the checksum only when hwol is not supported. */
> >  ok = extract_l3_ipv4(&ctx->key, l3, dp_packet_l3_size(pkt), NULL,
> > - !hwol_good_l3_csum);
> > + !dp_packet_ol_ip_csum_good(pkt));
> >  }
> >  } else if (ctx->key.dl_type == htons(ETH_TYPE_IPV6)) {
> >  ok = extract_l3_ipv6(&ctx->key, l3, dp_packet_l3_size(pkt), NULL);
> > @@ -3402,7 +3398,9 @@ handle_ftp_ctl(struct conntrack *ct, const struct conn_lookup_ctx *ctx,
> >  }
> >  if (seq_skew) {
> >  ip_len = ntohs(l3_hdr->ip_tot_len) + seq_skew;
> > -if (!dp_packet_ol_tx_ipv4(pkt)) {
> > +if (dp_packet_ol_tx_ip_csum(pkt)) {
> > +dp_packet_ol_reset_ip_csum_good(pkt);
> > +} else {
> >  l3_hdr->ip_csum = recalc_csum16(l3_hdr->ip_csum,
> >  l3_hdr->ip_tot_len,
> >  htons(ip_len));
> 
> This is more of a general comment for the whole patch series, but I
> see that a lot of the diffs use the motif:
> 
> if (dp_packet_ol_tx_ip_csum(pkt)) {
> dp_packet_ol_reset_ip_csum_good(pkt);
> } else {
> recalc_csumXX()
> }
> 
> Would it make sense instead to simply flag the checksum as tainted in
> the non-offload case, and then make only one call to csum() on packet
> egress?
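A sketch of the receive-side rule in this commit message: a BAD verdict from the netdev rejects the packet, a GOOD verdict skips software verification, and only packets without a verdict are checked in software. The enum, `l3_csum_ok()` and `ipv4_hdr_csum_valid()` are hypothetical stand-ins for the `dp_packet_ol_ip_csum_*()` helpers and the checksum validation done by `extract_l3_ipv4()`, not their actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical RX checksum verdict attached to a packet by the netdev. */
enum rx_csum_state { RX_CSUM_UNKNOWN, RX_CSUM_GOOD, RX_CSUM_BAD };

/* One's-complement sum of big-endian 16-bit words, folded to 16 bits. */
static uint16_t
ones_complement_sum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2) {
        sum += ((uint32_t) data[i] << 8) | data[i + 1];
    }
    if (len & 1) {
        sum += (uint32_t) data[len - 1] << 8;
    }
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) sum;
}

/* Software check: a valid IPv4 header sums to 0xffff, checksum included. */
static bool
ipv4_hdr_csum_valid(const void *hdr)
{
    const uint8_t *h = hdr;
    size_t ihl = (size_t) (h[0] & 0x0f) * 4;

    return ihl >= 20 && ones_complement_sum(h, ihl) == 0xffff;
}

/* Accept or reject the L3 header, verifying in software only when the
 * netdev did not already report a verdict. */
static bool
l3_csum_ok(enum rx_csum_state st, bool (*sw_check)(const void *),
           const void *hdr)
{
    assert(sw_check != NULL);

    switch (st) {
    case RX_CSUM_BAD:
        return false;            /* Reject without touching the header. */
    case RX_CSUM_GOOD:
        return true;             /* Trust the device and skip the work. */
    case RX_CSUM_UNKNOWN:
    default:
        return sw_check(hdr);    /* No verdict: check in software. */
    }
}
```

The design point is that a GOOD verdict is trusted even if the header bytes were never looked at, which is what lets the checksum work be deferred or skipped entirely.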

That's a good point. I see that in most cases it recalculates
only what has changed, so it is supposed to be faster because
1) the cache is hot and 2) there are fewer operations. If we leave
it to the egress port, then we need to calculate the full checksum,
and the headers may no longer be in the cache.
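To make the cost argument concrete: an incremental update reads only the old checksum and the changed 16-bit word, while a full recomputation at egress walks the whole header again. Below is an independent sketch of both with hypothetical names (it is not OVS's `recalc_csum16()`), using the RFC 1624 form HC' = ~(~HC + ~m + m'), where HC is the old checksum, m the old word and m' the new one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
static uint16_t
fold16(uint32_t sum)
{
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) sum;
}

/* Full recomputation over 'n' 16-bit words; the checksum word itself
 * must be 0 in 'words'.  Cost grows with the header length. */
static uint16_t
full_csum16(const uint16_t *words, size_t n)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < n; i++) {
        sum += words[i];
    }
    return (uint16_t) ~fold16(sum);
}

/* Incremental update per RFC 1624: HC' = ~(~HC + ~m + m').
 * Constant cost, touching only the changed word. */
static uint16_t
incremental_csum16(uint16_t old_csum, uint16_t old_word, uint16_t new_word)
{
    uint32_t sum = (uint16_t) ~old_csum;

    sum += (uint16_t) ~old_word;
    sum += new_word;
    return (uint16_t) ~fold16(sum);
}
```

For a 20-byte IPv4 header with checksum 0xb861 whose total-length word changes from 0x0073 to 0x0083, both `incremental_csum16(0xb861, 0x0073, 0x0083)` and a full recomputation yield 0xb851.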

Therefore, to avoid regressions in the non offloaded case,
I left th

Re: [ovs-dev] [[PATCH RFC] 06/17] dp-packet: Use p for packet and b for batch.

2022-03-10 Thread Flavio Leitner
On Mon, Jan 24, 2022 at 01:39:57PM -0500, Mike Pattrick wrote:
> On Tue, Dec 7, 2021 at 11:53 AM Flavio Leitner  wrote:
> >
> > Currently 'p' and 'b' are used for packets, so use
> > a convention that struct dp_packet is 'p' and
> > struct dp_packet_batch is 'b'.
> >
> > Some comments needed new formatting to stay within
> > 80 columns.
> >
> > Some variables that were using 'p' or 'b' were renamed
> > as well.
> >
> > There should be no functional change with this patch.
> >
> > Signed-off-by: Flavio Leitner 
> > ---
> 
> I think this patch goes a long way to improve the readability of Would
> it also make sense to change b to p in lib/packets.c and
> lib/netdev-linux.c as well?

I think 'b' meant buffer before, but IMHO 'b' for batches
and 'p' for packets makes more sense, and then we should
keep it consistent across all files, unless someone else
has a better idea.

-- 
fbl


Re: [ovs-dev] [PATCH branch-2.17 2/2] Prepare for 2.17.1.

2022-02-17 Thread Flavio Leitner
On Thu, Feb 17, 2022 at 07:45:35PM +0100, Ilya Maximets wrote:
> Signed-off-by: Ilya Maximets 
> ---

Acked-by: Flavio Leitner 



Re: [ovs-dev] [PATCH branch-2.17 1/2] Set release date for 2.17.0.

2022-02-17 Thread Flavio Leitner
On Thu, Feb 17, 2022 at 07:45:34PM +0100, Ilya Maximets wrote:
> Added a NEWS entry for OVSDB performance because it is user-visible.
> It was not previously mentioned since it's an aggregated result of
> various commits.
> 
> Signed-off-by: Ilya Maximets 
> ---

Acked-by: Flavio Leitner 



Re: [ovs-dev] Intel AVX-512 CI

2022-02-10 Thread Flavio Leitner
This is great!
Congrats to all of you who worked to get this up and running.
fbl

On Tue, Feb 8, 2022 at 5:01 PM Aaron Conole  wrote:

> Hi Michael,
>
> "Phelan, Michael"  writes:
>
> > Hi all,
> >
> > As presented at the OVS conference in December, the AVX-512 CI is coming
> > online today to test incoming patches to the OVS mailing list with
> > AVX-512 implementations enabled. The CI will download and apply patches
> > before compiling OVS with DPCLS, DPIF and MFEX AVX-512 implementations
> > and then running the OVS DPDK unit tests to verify functionality. We are
> > currently not running the MFEX Autovalidator Fuzzy unit test as there
> > seem to be some issues with warnings that cause the test to fail
> > unexpectedly. The results of the test for a patch will be sent to the
> > ovs-build mailing list, and if there is a failure in the test the author
> > of the patch will be Cc’d also.
> >
> > Please feel free to get in touch with me with any feedback you may have
> > or if you notice any unexpected results.
>
> I see some patches have been tested already.  Thanks for this work, it's
> great to see other labs integrating for public CI.  Kudos to all the
> work!
>
> > Kind regards,
> >
> > Michael Phelan.
> >
>
>


Re: [ovs-dev] [PATCH 2/2] Prepare for post-2.17.0 (2.17.90).

2022-01-18 Thread Flavio Leitner
On Tue, Jan 18, 2022 at 08:24:43PM +0100, Ilya Maximets wrote:
> Signed-off-by: Ilya Maximets 
> ---

Acked-by: Flavio Leitner 



Re: [ovs-dev] [PATCH 1/2] Prepare for 2.17.0.

2022-01-18 Thread Flavio Leitner
On Tue, Jan 18, 2022 at 08:24:42PM +0100, Ilya Maximets wrote:
> Signed-off-by: Ilya Maximets 
> ---

Acked-by: Flavio Leitner 



Re: [ovs-dev] [PATCH] netdev-dpdk: prepare for tso offload for tx copy packet

2022-01-18 Thread Flavio Leitner
On Thu, Jan 13, 2022 at 04:23:17PM +0800, Harold Huang wrote:
> From: Harold Huang 
> 
> When one flow is output to multiple egress ports, OVS copies the packets
> and sends the copies to the intermediate ports. The original packets
> are sent to the last port. If an intermediate port is a DPDK port, the
> copied packets should also be prepared for TSO offload.
> 
> Fixes: 29cf9c1b3b ("userspace: Add TCP Segmentation Offload support")
> Signed-off-by: Harold Huang 
> ---
>  lib/netdev-dpdk.c | 13 ++---
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 6782d3e8f..83029405e 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -2737,10 +2737,11 @@ dpdk_pktmbuf_alloc(struct rte_mempool *mp, uint32_t data_len)
>  }
>  
>  static struct dp_packet *
> -dpdk_copy_dp_packet_to_mbuf(struct rte_mempool *mp, struct dp_packet *pkt_orig)
> +dpdk_copy_dp_packet_to_mbuf(struct netdev_dpdk *dev, struct dp_packet *pkt_orig)
>  {
>  struct rte_mbuf *mbuf_dest;
>  struct dp_packet *pkt_dest;
> +struct rte_mempool *mp = dev->dpdk_mp->mp;
>  uint32_t pkt_len;
>  
>  pkt_len = dp_packet_size(pkt_orig);
> @@ -2761,11 +2762,9 @@ dpdk_copy_dp_packet_to_mbuf(struct rte_mempool *mp, struct dp_packet *pkt_orig)
>  memcpy(_dest->l2_pad_size, _orig->l2_pad_size,
> sizeof(struct dp_packet) - offsetof(struct dp_packet, l2_pad_size));
>  
> -if (mbuf_dest->ol_flags & RTE_MBUF_F_TX_L4_MASK) {
> -mbuf_dest->l2_len = (char *)dp_packet_l3(pkt_dest)
> -- (char *)dp_packet_eth(pkt_dest);
> -mbuf_dest->l3_len = (char *)dp_packet_l4(pkt_dest)
> -- (char *) dp_packet_l3(pkt_dest);
> +if (!netdev_dpdk_prep_hwol_packet(dev, mbuf_dest)) {
> +rte_pktmbuf_free(mbuf_dest);
> +return NULL;

What happens if a packet comes from a non-DPDK port and
goes to a vhost-user port? I think we will get into this:

netdev_dpdk_vhost_send()
\-- (not a DPDK packet)
\-- dpdk_do_tx_copy()
   \-- dpdk_copy_dp_packet_to_mbuf()
   |   \-- netdev_dpdk_prep_hwol_packet() <--
   |
   \-- __netdev_dpdk_vhost_send()
   \-- netdev_dpdk_prep_hwol_batch()
   \-- netdev_dpdk_prep_hwol_packet() <--

I think we will prepare the same packet twice.
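If the prep step only recomputes offload metadata from the header offsets, the second call is redundant work rather than a correctness bug. As a purely hypothetical illustration (not OVS code; `fake_mbuf`, its `prepared` marker and `prep_hwol_packet()` are made-up names), a guard flag turns the second call on the same packet into a no-op:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an mbuf; NOT the real rte_mbuf layout. */
struct fake_mbuf {
    uint16_t l2_len;
    uint16_t l3_len;
    bool prepared;            /* Hypothetical "offload metadata set" marker. */
};

/* Counts how many times the prep work actually ran. */
static unsigned int prep_work_count;

/* Hypothetical prep step: derives header lengths from header offsets.
 * The 'prepared' guard makes a second call on the same packet a no-op,
 * so both the copy path and the batch path could call it. */
static bool
prep_hwol_packet(struct fake_mbuf *m, uint16_t l3_ofs, uint16_t l4_ofs)
{
    assert(l4_ofs >= l3_ofs);
    if (m->prepared) {
        return true;
    }
    prep_work_count++;
    m->l2_len = l3_ofs;
    m->l3_len = (uint16_t) (l4_ofs - l3_ofs);
    m->prepared = true;
    return true;
}
```

The caveat with such a marker is that it would have to be cleared whenever the headers change after preparation, otherwise stale lengths would be used.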

BTW, this bug should be fixed by the patch here:
https://patchwork.ozlabs.org/project/openvswitch/patch/20210110030505.325722-1-...@sysclose.org/#2610119

fbl

>  }
>  
>  return pkt_dest;
> @@ -2813,7 +2812,7 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch)
>  continue;
>  }
>  
> -pkts[txcnt] = dpdk_copy_dp_packet_to_mbuf(dev->dpdk_mp->mp, packet);
> +pkts[txcnt] = dpdk_copy_dp_packet_to_mbuf(dev, packet);
>  if (OVS_UNLIKELY(!pkts[txcnt])) {
>  dropped = cnt - i;
>  break;
> -- 
> 2.27.0
> 

-- 
fbl


Re: [ovs-dev] [PATCH v1] conntrack: Fix FTP NAT when TCP but not IP offload is supported

2022-01-18 Thread Flavio Leitner
On Tue, Jan 18, 2022 at 12:27:21PM -0500, Mike Pattrick wrote:
> On Tue, Jan 18, 2022 at 11:53 AM Flavio Leitner  wrote:
> >
> > On Mon, Jan 17, 2022 at 09:47:18AM -0500, Mike Pattrick wrote:
> > > On Fri, Jan 14, 2022 at 4:13 PM Flavio Leitner  wrote:
> > > >
> > > > On Fri, Jan 14, 2022 at 03:50:52PM -0500, Mike Pattrick wrote:
> > > > > On Fri, Jan 14, 2022 at 3:33 PM Flavio Leitner  
> > > > > wrote:
> > > > > >
> > > > > >
> > > > > > Hi Mike,
> > > > > >
> > > > > > Thanks for working on this issue.
> > > > > >
> > > > > > On Fri, Jan 14, 2022 at 10:45:35AM -0500, Mike Pattrick wrote:
> > > > > > > Formerly, when userspace TSO was enabled with a non-DPDK interface
> > > > > > > without IP checksum offloading support, FTP NAT connections would fail
> > > > > > > if the packet length changed. This can happen when the packet length
> > > > > > > changes during L7 NAT translation, predominantly with FTP.
> > > > > > >
> > > > > > > Now we correct the IP header checksum if hwol is disabled or if DPDK
> > > > > > > will not handle the IP checksum. This fixes the "conntrack - IPv4 FTP
> > > > > > > Passive with DNAT" test when run with check-system-tso.
> > > > > > >
> > > > > > > Reported-by: Flavio Leitner 
> > > > > >
> > > > > > Actually, this was initially reported by Ilya.
> > > > > >
> > > > > > > Signed-off-by: Mike Pattrick 
> > > > > > > ---
> > > > > > >  lib/conntrack.c | 3 ++-
> > > > > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > > > >
> > > > > > > diff --git a/lib/conntrack.c b/lib/conntrack.c
> > > > > > > index 33a1a9295..1b8a26ac2 100644
> > > > > > > --- a/lib/conntrack.c
> > > > > > > +++ b/lib/conntrack.c
> > > > > > > @@ -3402,7 +3402,8 @@ handle_ftp_ctl(struct conntrack *ct, const 
> > > > > > > struct conn_lookup_ctx *ctx,
> > > > > > >  }
> > > > > > >  if (seq_skew) {
> > > > > > >  ip_len = ntohs(l3_hdr->ip_tot_len) + 
> > > > > > > seq_skew;
> > > > > > > -if (!dp_packet_hwol_is_ipv4(pkt)) {
> > > > > > > +if (!dp_packet_hwol_is_ipv4(pkt) ||
> > > > > > > +!dp_packet_ip_checksum_valid(pkt)) {
> > > > > > >  l3_hdr->ip_csum = 
> > > > > > > recalc_csum16(l3_hdr->ip_csum,
> > > > > > >  
> > > > > > > l3_hdr->ip_tot_len,
> > > > > > >  
> > > > > > > htons(ip_len));
> > > > > >
> > > > > > The problem is that the current code doesn't include IPv4 csum
> > > > > > handling as required by the Linux software ports.
> > > > > >
> > > > > > The patch above resolves the unit test issue because non-DPDK
> > > > > > interfaces will not flag the packet with a good IP csum, so the
> > > > > > csum is updated accordingly. However, a packet received on a
> > > > > > physical DPDK port can have that flag set by the PMD. If such a
> > > > > > packet goes through this code path, the IP csum is not updated,
> > > > > > which will cause a problem if the packet is later sent out over
> > > > > > a Linux software port.
> > >
> > > I was curious about the performance impact of using csum() instead of
> > > recalc_csum16(). While setting up this benchmark, I also noticed some
> > > ways that we could improve recalc_csum16(). For example, making the
> > > function inline and unrolling the while loop.
> > >
> > > In my test, I performed 2^32 checksum updates with both the original
> > > and my updated recalc_csum16() function, a

Re: [ovs-dev] [PATCH v1] conntrack: Fix FTP NAT when TCP but not IP offload is supported

2022-01-18 Thread Flavio Leitner
On Mon, Jan 17, 2022 at 09:47:18AM -0500, Mike Pattrick wrote:
> On Fri, Jan 14, 2022 at 4:13 PM Flavio Leitner  wrote:
> >
> > On Fri, Jan 14, 2022 at 03:50:52PM -0500, Mike Pattrick wrote:
> > > On Fri, Jan 14, 2022 at 3:33 PM Flavio Leitner  wrote:
> > > >
> > > >
> > > > Hi Mike,
> > > >
> > > > Thanks for working on this issue.
> > > >
> > > > On Fri, Jan 14, 2022 at 10:45:35AM -0500, Mike Pattrick wrote:
> > > > > Formerly, when userspace TSO was enabled with a non-DPDK interface
> > > > > without IP checksum offloading support, FTP NAT connections would fail
> > > > > if the packet length changed. This can happen when the packet length
> > > > > changes during L7 NAT translation, predominantly with FTP.
> > > > >
> > > > > Now we correct the IP header checksum if hwol is disabled or if DPDK
> > > > > will not handle the IP checksum. This fixes the "conntrack - IPv4 FTP
> > > > > Passive with DNAT" test when run with check-system-tso.
> > > > >
> > > > > Reported-by: Flavio Leitner 
> > > >
> > > > Actually, this was initially reported by Ilya.
> > > >
> > > > > Signed-off-by: Mike Pattrick 
> > > > > ---
> > > > >  lib/conntrack.c | 3 ++-
> > > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/lib/conntrack.c b/lib/conntrack.c
> > > > > index 33a1a9295..1b8a26ac2 100644
> > > > > --- a/lib/conntrack.c
> > > > > +++ b/lib/conntrack.c
> > > > > @@ -3402,7 +3402,8 @@ handle_ftp_ctl(struct conntrack *ct, const 
> > > > > struct conn_lookup_ctx *ctx,
> > > > >  }
> > > > >  if (seq_skew) {
> > > > >  ip_len = ntohs(l3_hdr->ip_tot_len) + seq_skew;
> > > > > -if (!dp_packet_hwol_is_ipv4(pkt)) {
> > > > > +if (!dp_packet_hwol_is_ipv4(pkt) ||
> > > > > +!dp_packet_ip_checksum_valid(pkt)) {
> > > > >  l3_hdr->ip_csum = 
> > > > > recalc_csum16(l3_hdr->ip_csum,
> > > > >  
> > > > > l3_hdr->ip_tot_len,
> > > > >  
> > > > > htons(ip_len));
> > > >
> > > > The problem is that the current code doesn't include IPv4 csum
> > > > handling as required by the Linux software ports.
> > > >
> > > > The patch above resolves the unit test issue because non-DPDK
> > > > interfaces will not flag the packet with a good IP csum, so the
> > > > csum is updated accordingly. However, a packet received on a
> > > > physical DPDK port can have that flag set by the PMD. If such a
> > > > packet goes through this code path, the IP csum is not updated,
> > > > which will cause a problem if the packet is later sent out over
> > > > a Linux software port.
> 
> I was curious about the performance impact of using csum() instead of
> recalc_csum16(). While setting up this benchmark, I also noticed some
> ways that we could improve recalc_csum16(). For example, making the
> function inline and unrolling the while loop.
> 
> In my test, I performed 2^32 checksum updates with both the original
> and my updated recalc_csum16() function, and then 2^32 full IPv4
> header checksum calculations with the csum() function. I also tested
> both with and without clearing the CPU cache between calls.
> 
> I found that the optimized recalc_csum16() was 1.5x faster than the
> current implementation, and the current recalc_csum16() implementation
> was only 2x faster than a full header calculation with csum().
> 
> Given these results, and the fact that any time we update the header,
> we will usually be affecting more than two bytes anyway, I think the
> performance improvement from using recalc_csum16() instead of csum()
> isn't so fantastic.

OK, so that means we could update the checksum there, but then I
wonder if there are other places where this issue can happen.
I mean, if this is an isolated case, then we can fix it using a full
checksum in 2.17 and work on a more robust solution in master.
However, if there are more places with the same issue, then we
may need to go with an approach like the one I posted earlier (implementing
a generic IP csum h

Re: [ovs-dev] [PATCH v1] conntrack: Fix FTP NAT when TCP but not IP offload is supported

2022-01-14 Thread Flavio Leitner
On Fri, Jan 14, 2022 at 03:50:52PM -0500, Mike Pattrick wrote:
> On Fri, Jan 14, 2022 at 3:33 PM Flavio Leitner  wrote:
> >
> >
> > Hi Mike,
> >
> > Thanks for working on this issue.
> >
> > On Fri, Jan 14, 2022 at 10:45:35AM -0500, Mike Pattrick wrote:
> > > Formerly, when userspace TSO was enabled with a non-DPDK interface
> > > without IP checksum offloading support, FTP NAT connections would fail
> > > if the packet length changed. This can happen when the packet length
> > > changes during L7 NAT translation, predominantly with FTP.
> > >
> > > Now we correct the IP header checksum if hwol is disabled or if DPDK
> > > will not handle the IP checksum. This fixes the "conntrack - IPv4 FTP
> > > Passive with DNAT" test when run with check-system-tso.
> > >
> > > Reported-by: Flavio Leitner 
> >
> > Actually, this was initially reported by Ilya.
> >
> > > Signed-off-by: Mike Pattrick 
> > > ---
> > >  lib/conntrack.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/lib/conntrack.c b/lib/conntrack.c
> > > index 33a1a9295..1b8a26ac2 100644
> > > --- a/lib/conntrack.c
> > > +++ b/lib/conntrack.c
> > > @@ -3402,7 +3402,8 @@ handle_ftp_ctl(struct conntrack *ct, const struct 
> > > conn_lookup_ctx *ctx,
> > >  }
> > >  if (seq_skew) {
> > >  ip_len = ntohs(l3_hdr->ip_tot_len) + seq_skew;
> > > -if (!dp_packet_hwol_is_ipv4(pkt)) {
> > > +if (!dp_packet_hwol_is_ipv4(pkt) ||
> > > +!dp_packet_ip_checksum_valid(pkt)) {
> > >  l3_hdr->ip_csum = recalc_csum16(l3_hdr->ip_csum,
> > >  
> > > l3_hdr->ip_tot_len,
> > >  htons(ip_len));
> >
> > The problem is that the current code doesn't include IPv4 csum
> > handling as required by the Linux software ports.
> >
> > The patch above resolves the unit test issue because non-DPDK
> > interfaces will not flag the packet with a good IP csum, so the
> > csum is updated accordingly. However, a packet received on a
> > physical DPDK port can have that flag set by the PMD. If such a
> > packet goes through this code path, the IP csum is not updated,
> > which will cause a problem if the packet is later sent out over
> > a Linux software port.
> 
> This is a good point, I was trying to get to a happy medium without
> repurposing the patchset that you had previously submitted. That way
> this patch could be available immediately, and your more thorough TSO
> patchset could be applied after.

I see what you did, and I appreciate that. My concern is that it's
not unusual to have packets moving between DPDK and Linux ports, so
we might have to revisit this issue, even though the unit test is
no longer failing.

> I had also prepared a larger patch as part of this work, but scrapped
> it because it was duplicating a lot of the work you had previously
> done. But if you prefer the more substantial solution, we can scrap
> this patch and I can submit the larger one.

Feel free to duplicate any parts of that RFC if that makes sense.
I don't think it would be duplicating anything. It's more like
doing the more pressing pieces first.

Let's see the larger one and decide. If it is not suitable, then
as a potential workaround, maybe we can ignore offloading at that
point and always calculate the full IP csum there.
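For reference, the two options being weighed (an incremental update as recalc_csum16() does versus recomputing the whole header as csum() does) can be sketched standalone. This is not the OVS code: full_csum() and incr_csum16() are hypothetical names, the incremental form follows RFC 1624 (HC' = ~(~HC + ~m + m')), and host-order 16-bit words are used only to keep the sketch simple.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Full one's complement checksum over 'n' 16-bit words (RFC 1071 style),
 * as computed over an IPv4 header whose checksum field is zero. */
static uint16_t
full_csum(const uint16_t *words, size_t n)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < n; i++) {
        sum += words[i];
    }
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);   /* Fold carries back in. */
    }
    return (uint16_t) ~sum;
}

/* Incremental update when one 16-bit word changes from 'old_w' to 'new_w'
 * (RFC 1624, eqn. 3): only the changed word is touched. */
static uint16_t
incr_csum16(uint16_t old_csum, uint16_t old_w, uint16_t new_w)
{
    uint32_t sum = (uint16_t) ~old_csum;

    sum += (uint16_t) ~old_w;
    sum += new_w;
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) ~sum;
}
```

For a 20-byte IPv4 header the full form only sums ten words, which is consistent with the gap between the two being small.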

fbl



Re: [ovs-dev] [PATCH v1] conntrack: Fix FTP NAT when TCP but not IP offload is supported

2022-01-14 Thread Flavio Leitner


Hi Mike,

Thanks for working on this issue.

On Fri, Jan 14, 2022 at 10:45:35AM -0500, Mike Pattrick wrote:
> Formerly, when userspace TSO was enabled with a non-DPDK interface
> without IP checksum offloading support, FTP NAT connections would fail
> if the packet length changed. This can happen when the packet length
> changes during L7 NAT translation, predominantly with FTP.
> 
> Now we correct the IP header checksum if hwol is disabled or if DPDK
> will not handle the IP checksum. This fixes the "conntrack - IPv4 FTP
> Passive with DNAT" test when run with check-system-tso.
> 
> Reported-by: Flavio Leitner 

Actually, this was initially reported by Ilya.

> Signed-off-by: Mike Pattrick 
> ---
>  lib/conntrack.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index 33a1a9295..1b8a26ac2 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -3402,7 +3402,8 @@ handle_ftp_ctl(struct conntrack *ct, const struct conn_lookup_ctx *ctx,
>  }
>  if (seq_skew) {
>  ip_len = ntohs(l3_hdr->ip_tot_len) + seq_skew;
> -if (!dp_packet_hwol_is_ipv4(pkt)) {
> +if (!dp_packet_hwol_is_ipv4(pkt) ||
> +!dp_packet_ip_checksum_valid(pkt)) {
>  l3_hdr->ip_csum = recalc_csum16(l3_hdr->ip_csum,
>  l3_hdr->ip_tot_len,
>  htons(ip_len));

The problem is that the current code doesn't include IPv4 csum
handling as required by the Linux software ports.

The patch above resolves the unit test issue because non-DPDK
interfaces will not flag the packet with a good IP csum, so the
csum is updated accordingly. However, a packet received on a
physical DPDK port can have that flag set by the PMD. If such a
packet goes through this code path, the IP csum is not updated,
which will cause a problem if the packet is later sent out over
a Linux software port.

A better solution would be to have the IP csum updated if the
packet requires it and the port doesn't support that, as proposed
in the RFC [1].

[1] https://mail.openvswitch.org/pipermail/ovs-dev/2021-December/389993.html
[See: dp_packet_ol_send_prepare()]
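The idea can be sketched roughly as follows: the IP checksum stays pending on the packet, and software fills it in only at transmit time when the egress port cannot. Everything here (the flag names, struct pkt, send_prepare()) is hypothetical and simplified, not the RFC's actual dp_packet_ol_send_prepare() code; header words are in host order for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits; the real OVS/DPDK flag names and values differ. */
#define PKT_TX_IP_CKSUM   (1u << 0)  /* Packet still needs its IPv4 csum. */
#define PORT_CAP_IP_CKSUM (1u << 0)  /* Egress port can fill the IPv4 csum. */

struct pkt {
    uint32_t ol_flags;
    uint16_t ip_hdr[10];             /* 20-byte IPv4 header, host-order words. */
};

static uint16_t
ipv4_hdr_csum(const uint16_t *w, size_t n)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < n; i++) {
        sum += w[i];
    }
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) ~sum;
}

/* Returns true if the checksum had to be computed in software. */
static bool
send_prepare(struct pkt *p, uint32_t port_caps)
{
    if (!(p->ol_flags & PKT_TX_IP_CKSUM)) {
        return false;                /* Nothing pending. */
    }
    if (port_caps & PORT_CAP_IP_CKSUM) {
        return false;                /* Leave it to the egress device. */
    }
    p->ip_hdr[5] = 0;                /* Checksum field must be zero first. */
    p->ip_hdr[5] = ipv4_hdr_csum(p->ip_hdr, 10);
    p->ol_flags &= ~PKT_TX_IP_CKSUM;
    return true;
}
```

With this shape, code such as handle_ftp_ctl() would only need to mark the checksum as pending after changing the header, regardless of which port the packet came from.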

Perhaps we can find a middle ground with something
like the following (not tested).

What do you think?

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 33a1a9295..9e0364719 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -3403,6 +3403,8 @@ handle_ftp_ctl(struct conntrack *ct, const struct conn_lookup_ctx *ctx,
 if (seq_skew) {
 ip_len = ntohs(l3_hdr->ip_tot_len) + seq_skew;
 if (!dp_packet_hwol_is_ipv4(pkt)) {
+dp_packet_ip_checksum_reset_valid(pkt);
+} else {
 l3_hdr->ip_csum = recalc_csum16(l3_hdr->ip_csum,
 l3_hdr->ip_tot_len,
 htons(ip_len));
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index ee0805ae6..8a880a0ce 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -25,6 +25,7 @@
 #include 
 #endif
 
+#include "csum.h"
 #include "netdev-afxdp.h"
 #include "netdev-dpdk.h"
 #include "openvswitch/list.h"
@@ -1064,6 +1065,20 @@ dp_packet_ip_checksum_valid(const struct dp_packet *p)
 DP_PACKET_OL_RX_IP_CKSUM_GOOD;
 }
 
+/* Sets IP valid checksum flag in packet 'p'. */
+static inline void
+dp_packet_ip_checksum_set_valid(const struct dp_packet *p)
+{
+*dp_packet_ol_flags_ptr(p) |= DP_PACKET_OL_RX_IP_CKSUM_GOOD;
+}
+
+/* Resets IP valid checksum flag in packet 'p'. */
+static inline void
+dp_packet_ip_checksum_reset_valid(const struct dp_packet *p)
+{
+*dp_packet_ol_flags_ptr(p) &= ~DP_PACKET_OL_RX_IP_CKSUM_GOOD;
+}
+
 static inline bool
 dp_packet_ip_checksum_bad(const struct dp_packet *p)
 {
@@ -1071,6 +1086,17 @@ dp_packet_ip_checksum_bad(const struct dp_packet *p)
 DP_PACKET_OL_RX_IP_CKSUM_BAD;
 }
 
+/* Calculate and set the IPv4 header checksum in packet 'p'. */
+static inline void
+dp_packet_ip_set_header_csum(struct dp_packet *p)
+{
+struct ip_header *ip = dp_packet_l3(p);
+
+ovs_assert(ip);
+ip->ip_csum = 0;
+ip->ip_csum = csum(ip, sizeof *ip);
+}
+
 static inline bool
 dp_packet_l4_checksum_valid(const struct dp_packet *p)
 {
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 620a451de..a2a87ab92 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -932,7 +932,6 @@ netdev_linux_common_construct(struct netdev *netdev_)
 netdev_->ol_flags |= NETDEV_TX_OFFLOAD_TCP_CKSUM;
 netdev_->ol_flags |= NETDEV_TX_OFFLOAD_UDP_CKSUM;

Re: [ovs-dev] [PATCH v4 0/9] Actions Infrastructure + Optimizations

2022-01-13 Thread Flavio Leitner


Hi,

On Thu, Jan 13, 2022 at 10:53:57AM +, Van Haaren, Harry wrote:
> > -Original Message-
> > From: Ilya Maximets 
> > Sent: Wednesday, January 12, 2022 6:01 PM
> > To: Van Haaren, Harry ; Finn, Emma
> > ; d...@openvswitch.org; Amber, Kumar
> > 
> > Cc: i.maxim...@ovn.org; Stokes, Ian ; Flavio Leitner
> > ; Kevin Traynor 
> > Subject: Re: [PATCH v4 0/9] Actions Infrastructure + Optimizations
> > 
> > On 1/6/22 14:11, Van Haaren, Harry wrote:
> > >> -Original Message-
> > >> From: Finn, Emma 
> > >> Sent: Wednesday, January 5, 2022 4:54 PM
> > >> To: d...@openvswitch.org; Van Haaren, Harry ;
> > >> Amber, Kumar 
> > >> Cc: Finn, Emma 
> > >> Subject: [PATCH v4 0/9] Actions Infrastructure + Optimizations
> > >>
> > >> ---
> > >> v4:
> > >> - Rebase to master
> > >> - Add ISA implementation of push_vlan action
> > >
> > > Thanks for the updated patchset Emma & Amber.
> > >
> > > Overall, this is working as expected and I've only had some minor
> > > comments throughout the patchset. I've added my Acked-by to most
> > > patches, some small open questions remain to be addressed in a v5.
> > >
> > > +CC Ian/Ilya , I'd like to see the v5 get merged, so let's continue to 
> > > work
> > towards that.
> > 
> > Hi, Harry, Ian, others.
> 
> Hi Ilya,
> 
> > Following up from a brief conversation during today's upstream meeting.
> > It was brought to my attention that you're expecting this series and
> > the 'hash' one to be accepted into 2.17. However, there are a few issues
> > with that:
> > 
> > 1. This review for v4 was actually very first review of the patch set.
> >The other one as of today doesn't have any reviews at all.
> >Looking at the change log for this patch set it doesn't seem that
> >internal reviews behind the closed doors (if there were any) requested
> >any significant changes.  In any case, internal reviews is not the way
> >how open-source projects work.
> 
> Actions & MFEX  were developed internally yes, and hence internal reviews and
> architecture was iterated on. Saying "not many large changes requested" is 
> not relevant,
> it means that internally the architecture was well aligned. If anything, it 
> means that
> reviewers did not have big concerns, hence we should have better confidence 
> to merge.

Yes, but it can also mean that no careful review has been done. That's
why the request here is to do the review in the open; otherwise, others
can't tell what happened.


> > 2. The soft freeze for 2.17 began on Jan 3 in accordance with our
> >release schedule (even a bit later), and as you know, during the soft
> >freeze we're not normally accepting patches that weren't already reviewed
> >before the soft freeze began.
> >https://mail.openvswitch.org/pipermail/ovs-dev/2022-January/390487.html
> 
> Actions and MFEX Hashing were discussed and reviewed in public at OVS Conf;
> https://www.openvswitch.org/support/ovscon2021/#T32
> https://www.openvswitch.org/support/ovscon2021/#T33
> 
> The closing "call to action" slide clearly states 2.17 upstream is intended,
> and welcome community review & comments, no concerns were raised.

There is a difference between giving a talk and posting patches, and,
more importantly, getting the community to review and ACK the patches.

> > That's not the end of a world, but you need to request an exception in
> > reply to the email linked above.
> 
> As both the patches and intent to merge for OVS 2.17 were clearly discussed
> in public at OVS Conf, this "request exception" case does not apply.

The talk happened on Dec 7 & 8 and the soft freeze began on Jan 3rd, so
there was a good chunk of time to work on that. I don't see how one
thing justifies the other.


> > But I have a few high-level concerns regarding the patch set itself,
> > and that's a bigger problem for me:
> > 
> > 1. What are the benefits of these patch sets?  A lot of infrastructure
> >changes are made, but the benefits of them are unclear.  Why these
> >changes are needed in the end?  I believe, that was the main reason
> >why community had no interest in reviewing these patches.
> >2.17 is supposed to be a new LTS, so infrastructure changes without
> >clear benefits might not be a good fit taking into account time
> >constraints and lack of reviews.
> 
> Customers workloads are accelerated, improving OVS datapath performance for 
> their
> workloads. T

Re: [ovs-dev] [PATCH v2] netdev-dpdk: Refactor the DPDK transmit path.

2022-01-12 Thread Flavio Leitner


Hello Sunil, Marko and Ian.

Mike worked on identifying the cause of the performance issue
you reported a while ago, and he summarized his findings below.
Could you also give his patch a try and tell us whether we are
on the right track?

Thanks,
fbl

On Wed, Jan 05, 2022 at 03:01:47PM -0500, Mike Pattrick wrote:
> Hello Flavio,
> 
> Great patch, I think you really did a lot to improve the code here and
> I think that's borne out by the consistent performance improvements
> across multiple tests.
> 
> Regarding the 4% regression that Intel detected, I found the following
> white paper to describe the "scatter" test:
> 
> https://builders.intel.com/docs/networkbuilders/open-vswitch-optimized-deployment-benchmark-technology-guide.pdf
> 
> This document calls out the following key points:
> 
> The original test was summarized as:
> - 32 VMs with one million flows.
> - Test runs on four physical cores for OVS and 10 hyper-threaded cores
> for TestPMD
> - An Ixia pitches traffic at a sub 0.1% loss rate
> - The server catches traffic with a E810-C 100G
> - The traffic's profile is: Ether()/IP()/UDP()/VXLAN()/Ether()/IP() 
> - On the outer IP, the source address changes incrementally across the
> 32 instances
> - The destination address remains the same on the outer IP.
> - The inner source IP remains
> - The inner destination address increments to create the one million
> flows for the test
> - EMC and SMC were disabled
> 
> I could not reproduce this test exactly because I don't have access to
> the same hardware - notably the Intel NIC and an Ixia - and I didn't
> want to create an environment that wouldn't be reproduced in real world
> scenarios. I did pin VM and TXQs/RXQs threads to cores, but I didn't
> optimize the setup nearly to the extent that the white paper described.
> My test setup consisted of two Fedora 35 servers directly connected
> across Mellanox5E cards with Trex pitching traffic and TestPMD
> reflecting it.
> 
> In my test I was still able to reproduce a similar performance penalty.
> I found that the key factor was the combination of VXLAN and a large
> number of flows. So once I had a setup that could reproduce close to
> the 4% penalty I stopped modifying my test framework and started
> searching for the slow code.
> 
> I didn't see any obvious issues in the code that should cause a
> significant slowdown, in fact, most of the code is identical or
> slightly improved. So to help my analysis, I created several variations
> of your patch reverting small aspects of the change and benchmarked
> each variation.
> 
> Because the difference in performance across each variation was so
> minor, I took a lot of samples. I pitched traffic over one million
> flows for 240 seconds and averaged out the throughput, I then repeated
> this process a total of five times for each patch. Finally, I repeated
> the whole process three times to produce 15 data points per patch.
> 
> The best results came from the patch enclosed below, with the code from
> netdev_dpdk_common_send() protected by the spinlock, as it is in the
> pre-patch code. This yielded a 2.7% +/- 0.64 performance boost over the
> master branch.
> 
> 
> Cheers,
> Michael
> 
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index bc1633663..5db5d7e2a 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -2777,13 +2777,13 @@ netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
>  return 0;
>  }
>  
> -cnt = netdev_dpdk_common_send(netdev, batch, );
> -
>  if (OVS_UNLIKELY(!rte_spinlock_trylock(>tx_q[qid].tx_lock))) {
>  COVERAGE_INC(vhost_tx_contention);
>  rte_spinlock_lock(>tx_q[qid].tx_lock);
>  }
>  
> +cnt = netdev_dpdk_common_send(netdev, batch, );
> +
>  pkts = (struct rte_mbuf **) batch->packets;
>  vhost_batch_cnt = cnt;
>  retries = 0;
> @@ -2843,13 +2843,15 @@ netdev_dpdk_eth_send(struct netdev *netdev, int qid,
>  return 0;
>  }
>  
> -cnt = netdev_dpdk_common_send(netdev, batch, );
> -dropped = batch_cnt - cnt;
>  if (OVS_UNLIKELY(concurrent_txq)) {
>  qid = qid % dev->up.n_txq;
>  rte_spinlock_lock(>tx_q[qid].tx_lock);
>  }
>  
> +    cnt = netdev_dpdk_common_send(netdev, batch, );
> +
> +dropped = batch_cnt - cnt;
> +
>  dropped += netdev_dpdk_eth_tx_burst(dev, qid, pkts, cnt);
>  if (OVS_UNLIKELY(dropped)) {
>  struct netdev_dpdk_sw_stats *sw_stats = dev->sw_stats;
> 
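The reordering in the diff above boils down to this pattern: try the TX queue lock, fall back to a blocking lock on contention, and only then prepare and send the batch. In this minimal sketch a C11 atomic_flag spinlock stands in for rte_spinlock_t, and struct txq, prepare_batch() and txq_send() are hypothetical names. It shows the ordering only and makes no claim about why it measured faster.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-queue state; not the netdev_dpdk layout. */
struct txq {
    atomic_flag lock;           /* Tiny spinlock standing in for rte_spinlock_t. */
    unsigned long contention;   /* Plays the role of vhost_tx_contention. */
    unsigned long sent;
};

static bool
spin_trylock(atomic_flag *l)
{
    return !atomic_flag_test_and_set_explicit(l, memory_order_acquire);
}

static void
spin_lock(atomic_flag *l)
{
    while (!spin_trylock(l)) {
        /* Busy-wait until the holder releases the flag. */
    }
}

static void
spin_unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

/* Hypothetical stand-in for netdev_dpdk_common_send(): the real function
 * filters and prepares the batch and returns how many packets remain. */
static int
prepare_batch(int n_pkts)
{
    assert(n_pkts >= 0);
    return n_pkts;
}

/* Lock first, then prepare and send while holding the lock. */
static int
txq_send(struct txq *q, int n_pkts)
{
    int cnt;

    if (!spin_trylock(&q->lock)) {
        spin_lock(&q->lock);
        q->contention++;        /* Updated only while holding the lock. */
    }
    cnt = prepare_batch(n_pkts);
    q->sent += (unsigned long) cnt;
    spin_unlock(&q->lock);
    return cnt;
}
```

One trade-off to keep in mind is that the preparation work now runs inside the critical section.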
> 
> On Sun, 2021-01-10 at 00:05 -0300, Flavio Leitner wrote:
> > This patch split out the common code between vhost and
> > dpdk transmit paths to shared functions to

[ovs-dev] [PATCH] system-tso: Skip encap tests when userspace TSO is enabled.

2022-01-05 Thread Flavio Leitner
It seems the Linux native tunnel configuration changed to enable
checksum by default, and that causes the check-system-tso unit
test below to fail:
 10: datapath - ping over vxlan tunnel    FAILED (system-traffic.at:248)

That happens because userspace TSO doesn't support encapsulation
as mentioned in the current documentation. In this specific case,
udp_extract_tnl_md() checks if the checksum is correct, but since
TSO is enabled, the outer UDP header contains only the pseudo
checksum and not the full packet checksum.
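To make the failure mode concrete: with checksum offload, the sender leaves only the non-complemented pseudo-header sum in the UDP checksum field and expects the device to finish the job, so a receiver that runs a full verification over such a packet reports a bad checksum. This is a standalone sketch, not udp_extract_tnl_md(); it assumes IPv4, even-length datagrams and host-order 16-bit words, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Adds 16-bit words to a one's complement sum and folds the carries.
 * The result is NOT inverted. */
static uint32_t
ones_sum(uint32_t sum, const uint16_t *w, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        sum += w[i];
    }
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return sum;
}

/* IPv4 pseudo-header: source, destination, zero + protocol (17 = UDP),
 * and the UDP length in bytes. */
static uint32_t
pseudo_hdr_sum(uint32_t src, uint32_t dst, uint16_t udp_len)
{
    const uint16_t ph[6] = {
        (uint16_t) (src >> 16), (uint16_t) src,
        (uint16_t) (dst >> 16), (uint16_t) dst,
        17, udp_len,
    };
    return ones_sum(0, ph, 6);
}

/* Full software checksum; 'udp' holds header + payload, checksum at [3]. */
static uint16_t
udp_full_csum(uint32_t src, uint32_t dst, uint16_t *udp, size_t n_words)
{
    uint16_t len = (uint16_t) (n_words * 2);
    uint16_t c;

    udp[3] = 0;
    c = (uint16_t) ~ones_sum(pseudo_hdr_sum(src, dst, len), udp, n_words);
    return c ? c : 0xffff;      /* A computed 0 is sent as 0xffff. */
}

/* What an offloading sender leaves in the field: the pseudo-header sum. */
static uint16_t
udp_partial_csum(uint32_t src, uint32_t dst, uint16_t udp_len)
{
    return (uint16_t) pseudo_hdr_sum(src, dst, udp_len);
}

/* Receiver-side full verification: everything must fold to 0xffff. */
static bool
udp_csum_ok(uint32_t src, uint32_t dst, const uint16_t *udp, size_t n_words)
{
    uint16_t len = (uint16_t) (n_words * 2);

    return ones_sum(pseudo_hdr_sum(src, dst, len), udp, n_words) == 0xffff;
}
```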

Although the packet is correctly marked with the UDP csum offload flag
and the code could use that to verify the pseudo csum, more work
is needed to properly translate the offloading flags from the outer
headers to the inner headers.  For example, if the payload is a
TCP packet, the DP_PACKET_OL_TX_UDP_CKSUM flag most probably doesn't
make sense after decapsulation.

This patch skips the tunnel tests when userspace TSO is enabled.

Fixes: 29bb3093eb8b ("userspace: Enable TSO support for non-DPDK.")
Signed-off-by: Flavio Leitner 
---
 tests/system-common-macros.at |  8 
 tests/system-traffic.at   | 17 +
 tests/system-tso-macros.at|  2 ++
 3 files changed, 27 insertions(+)

diff --git a/tests/system-common-macros.at b/tests/system-common-macros.at
index 19a0b125b..8b9f5c752 100644
--- a/tests/system-common-macros.at
+++ b/tests/system-common-macros.at
@@ -281,6 +281,14 @@ m4_define([OVS_START_L7],
 #
 m4_define([OFPROTO_CLEAR_DURATION_IDLE], [[sed -e 's/duration=.*s,/duration=,/g' -e 's/idle_age=[0-9]*,/idle_age=,/g']])
 
+# OVS_CHECK_TUNNEL_TSO()
+#
+# Macro to be used in general tunneling tests that could be also
+# used by system-tso. In that case, tunneling is not supported and
+# the test should be skipped.
+m4_define([OVS_CHECK_TUNNEL_TSO],
+[m4_ifdef([CHECK_SYSTEM_TSO], [AT_SKIP_IF(:)])])
+
 # OVS_CHECK_VXLAN()
 #
 # Do basic check for vxlan functionality, skip the test if it's not there.
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index d79753a99..6b01f8655 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -218,6 +218,7 @@ OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
 AT_SETUP([datapath - ping over vxlan tunnel])
+OVS_CHECK_TUNNEL_TSO()
 OVS_CHECK_VXLAN()
 
 OVS_TRAFFIC_VSWITCHD_START()
@@ -259,6 +260,7 @@ OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
 AT_SETUP([datapath - ping over vxlan6 tunnel])
+OVS_CHECK_TUNNEL_TSO()
 OVS_CHECK_VXLAN_UDP6ZEROCSUM()
 
 OVS_TRAFFIC_VSWITCHD_START()
@@ -302,6 +304,7 @@ OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
 AT_SETUP([datapath - ping over gre tunnel])
+OVS_CHECK_TUNNEL_TSO()
 OVS_CHECK_KERNEL_EXCL(3, 10, 4, 15)
 OVS_CHECK_GRE()
 
@@ -343,6 +346,7 @@ OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
 AT_SETUP([datapath - ping over ip6gre L2 tunnel])
+OVS_CHECK_TUNNEL_TSO()
 OVS_CHECK_KERNEL_EXCL(3, 10, 4, 15)
 OVS_CHECK_GRE()
 OVS_CHECK_ERSPAN()
@@ -383,6 +387,7 @@ AT_CLEANUP
 
 
 AT_SETUP([datapath - ping over erspan v1 tunnel])
+OVS_CHECK_TUNNEL_TSO()
 OVS_CHECK_KERNEL_EXCL(3, 10, 4, 15)
 OVS_CHECK_GRE()
 OVS_CHECK_ERSPAN()
@@ -419,6 +424,7 @@ OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
 AT_SETUP([datapath - ping over erspan v2 tunnel])
+OVS_CHECK_TUNNEL_TSO()
 OVS_CHECK_KERNEL_EXCL(3, 10, 4, 15)
 OVS_CHECK_GRE()
 OVS_CHECK_ERSPAN()
@@ -455,6 +461,7 @@ OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
 AT_SETUP([datapath - ping over ip6erspan v1 tunnel])
+OVS_CHECK_TUNNEL_TSO()
 OVS_CHECK_KERNEL_EXCL(3, 10, 4, 15)
 OVS_CHECK_GRE()
 OVS_CHECK_ERSPAN()
@@ -494,6 +501,7 @@ OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
 AT_SETUP([datapath - ping over ip6erspan v2 tunnel])
+OVS_CHECK_TUNNEL_TSO()
 OVS_CHECK_KERNEL_EXCL(3, 10, 4, 15)
 OVS_CHECK_GRE()
 OVS_CHECK_ERSPAN()
@@ -534,6 +542,7 @@ OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
 AT_SETUP([datapath - ping over geneve tunnel])
+OVS_CHECK_TUNNEL_TSO()
 OVS_CHECK_GENEVE()
 
 OVS_TRAFFIC_VSWITCHD_START()
@@ -575,6 +584,7 @@ OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
 AT_SETUP([datapath - ping over geneve tunnel, delete flow regression])
+OVS_CHECK_TUNNEL_TSO()
 OVS_CHECK_GENEVE()
 
 OVS_TRAFFIC_VSWITCHD_START()
@@ -629,6 +639,7 @@ OVS_TRAFFIC_VSWITCHD_STOP(["/|ERR|/d
 AT_CLEANUP
 
 AT_SETUP([datapath - flow resume with geneve tun_metadata])
+OVS_CHECK_TUNNEL_TSO()
 OVS_CHECK_GENEVE()
 
 OVS_TRAFFIC_VSWITCHD_START()
@@ -680,6 +691,7 @@ OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
 AT_SETUP([datapath - ping over geneve6 tunnel])
+OVS_CHECK_TUNNEL_TSO()
 OVS_CHECK_GENEVE_UDP6ZEROCSUM()
 
 OVS_TRAFFIC_VSWITCHD_START()
@@ -723,6 +735,7 @@ OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
 AT_SETUP([datapath - ping over gre tunnel by simulated packets])
+OVS_CHECK_TUNNEL_TSO()
 OVS_CHECK_MIN_KERNEL(3, 10)
 
 OVS_TRAFFIC_VSWITCHD_START()
@@ -769,6 +782,7 @@ OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
 AT_SETUP([datapath - ping over erspan v1 tunnel by simulated packets])
+OVS_CHECK_TUNNEL_TSO()
 OVS_CHECK_MIN_KERNEL(3, 10)
 
 OVS_TRAFFIC_VSWITCHD_START()
@@ -817,6 +831,7 @@ OVS_TRAFFI

[ovs-dev] [[PATCH RFC] 17/17] Enable TSO if available.

2021-12-07 Thread Flavio Leitner
Now that there is software segmentation as a fallback in case
a netdev doesn't support TCP segmentation offloading (TSO),
enable it by default on all possible netdevs.
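The core of a software segmentation fallback is splitting one oversized TCP payload into MSS-sized pieces with advancing sequence numbers. This hypothetical sketch (struct seg and tcp_soft_segment() are made-up names, not the RFC's code) only computes segment boundaries, sequence numbers and FIN/PSH placement. Copying headers and fixing the IP length, IP ID and checksums are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TCP_FIN 0x01
#define TCP_PSH 0x08

/* Hypothetical description of one software-generated segment. */
struct seg {
    uint32_t seq;      /* Sequence number of the segment's first byte. */
    size_t offset;     /* Payload offset within the original packet. */
    size_t len;        /* Payload bytes carried (at most 'mss'). */
    uint8_t flags;     /* TCP flags for this segment's header. */
};

/* Splits 'payload_len' bytes starting at 'seq' into segments of at most
 * 'mss' bytes. Returns the number of segments, or 0 if there is no
 * payload or 'max_segs' is too small. */
static size_t
tcp_soft_segment(uint32_t seq, size_t payload_len, size_t mss,
                 uint8_t flags, struct seg *segs, size_t max_segs)
{
    size_t n;

    assert(mss > 0);
    n = (payload_len + mss - 1) / mss;
    if (n == 0 || n > max_segs) {
        return 0;
    }
    for (size_t i = 0; i < n; i++) {
        size_t off = i * mss;
        size_t left = payload_len - off;

        segs[i].seq = seq + (uint32_t) off;   /* Wraps mod 2^32, as TCP does. */
        segs[i].offset = off;
        segs[i].len = left < mss ? left : mss;
        /* FIN and PSH belong only on the final segment. */
        segs[i].flags = (uint8_t) (i + 1 < n
                                   ? flags & ~(TCP_FIN | TCP_PSH)
                                   : flags);
    }
    return n;
}
```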

This patch showcases the idea, but it can't really be applied
because it doesn't support encapsulated packets yet. It would
either have to add that support first or provide a switch to
turn it on/off globally depending on the use case.

This patch is good to also measure performance with P2P and PVP
and check if there are regressions before continue the work.

The encapsulated traffic is challenging because DPDK ports
require pointers to inner headers [1] and OVS doesn't support
them at the moment. We could store the pointers when the packet
is encapsulated, but then any further change in the packet
headers may or may not cause the inner pointers to change too.

Another requirement not present here is the control of the
features (csum, TSO) per port. That can be done, but for
example if a vhost-user port has TSO turned off, then the
software segmentation is used. Currently that allocates
packets from normal memory, so DPDK would have to copy
(dpdk_do_tx_copy) each packet to send out on another DPDK port.

[1] https://doc.dpdk.org/guides/prog_guide/mbuf_lib.html#meta-information

Signed-off-by: Flavio Leitner 
---
 Documentation/topics/userspace-tso.rst |  12 --
 lib/automake.mk|   2 -
 lib/netdev-dpdk.c  |  56 +++--
 lib/netdev-linux.c | 155 -
 lib/netdev.c   |   4 +-
 lib/userspace-tso.c|  48 
 lib/userspace-tso.h|  23 
 vswitchd/bridge.c  |   2 -
 vswitchd/vswitch.xml   |  20 
 9 files changed, 68 insertions(+), 254 deletions(-)
 delete mode 100644 lib/userspace-tso.c
 delete mode 100644 lib/userspace-tso.h

diff --git a/Documentation/topics/userspace-tso.rst b/Documentation/topics/userspace-tso.rst
index bd64e7ed3..a574ae9e3 100644
--- a/Documentation/topics/userspace-tso.rst
+++ b/Documentation/topics/userspace-tso.rst
@@ -27,8 +27,6 @@
 Userspace Datapath - TSO
 ========================
 
-**Note:** This feature is considered experimental.
-
 TCP Segmentation Offload (TSO) enables a network stack to delegate segmentation
 of an oversized TCP segment to the underlying physical NIC. Offload of frame
 segmentation achieves computational savings in the core, freeing up CPU cycles
@@ -51,16 +49,6 @@ __ https://doc.dpdk.org/guides-20.11/nics/overview.html
 Enabling TSO
 ------------
 
-The TSO support may be enabled via a global config value
-``userspace-tso-enable``.  Setting this to ``true`` enables TSO support for
-all ports.::
-
-$ ovs-vsctl set Open_vSwitch . other_config:userspace-tso-enable=true
-
-The default value is ``false``.
-
-Changing ``userspace-tso-enable`` requires restarting the daemon.
-
 When using :doc:`vHost User ports `, TSO may be enabled
 as follows.
 
diff --git a/lib/automake.mk b/lib/automake.mk
index 2ca94e13c..f11c10d9a 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -363,8 +363,6 @@ lib_libopenvswitch_la_SOURCES = \
lib/unicode.h \
lib/unixctl.c \
lib/unixctl.h \
-   lib/userspace-tso.c \
-   lib/userspace-tso.h \
lib/util.c \
lib/util.h \
lib/uuid.c \
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 0d370bda3..1f7443028 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -65,7 +65,6 @@
 #include "timeval.h"
 #include "unaligned.h"
 #include "unixctl.h"
-#include "userspace-tso.h"
 #include "util.h"
 #include "uuid.h"
 
@@ -1180,16 +1179,13 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
 dev->hw_ol_features &= ~NETDEV_TX_SCTP_CKSUM_OFFLOAD;
 }
 
-dev->hw_ol_features &= ~NETDEV_TX_TSO_OFFLOAD;
-if (userspace_tso_enabled()) {
-if (info.tx_offload_capa & DEV_TX_OFFLOAD_TCP_TSO) {
-dev->hw_ol_features |= NETDEV_TX_TSO_OFFLOAD;
-} else {
-VLOG_WARN("%s: Tx TSO offload is not supported.",
-  netdev_get_name(&dev->up));
-}
+if (info.tx_offload_capa & DEV_TX_OFFLOAD_TCP_TSO) {
+dev->hw_ol_features |= NETDEV_TX_TSO_OFFLOAD;
+} else {
+dev->hw_ol_features &= ~NETDEV_TX_TSO_OFFLOAD;
 }
 
+
 n_rxq = MIN(info.max_rx_queues, dev->up.n_rxq);
 n_txq = MIN(info.max_tx_queues, dev->up.n_txq);
 
@@ -1419,16 +1415,13 @@ netdev_dpdk_vhost_construct(struct netdev *netdev)
 goto out;
 }
 
-if (!userspace_tso_enabled()) {
-err = rte_vhost_driver_disable_features(dev->vhost_id,
-1ULL << VIRTIO_NET_F_HOST_TSO4
-| 1ULL << VIRTIO_NET_F_HOST_TSO6
-| 1ULL << VIRTIO_NET_F_

[ovs-dev] [[PATCH RFC] 16/17] Add Generic Segmentation Offloading.

2021-12-07 Thread Flavio Leitner
This provides a software implementation for the case where
the egress netdev doesn't support segmentation in hardware.

This is an _untested_ patch to showcase the proposed solution.

The challenge here is to guarantee packet ordering in the
original batch, which may be full of TSO packets. Each TSO
packet can be up to ~64kB, so with a segment size of 1440
bytes that means about 44 segments per TSO packet. Each batch
has 32 packets, so a full batch can expand to 1408 regular
packets.

The segmentation code first estimates the total number of
segments and then the total number of batches needed. It then
allocates enough memory and performs the segmentation.

Finally, each batch is sent to the netdev in order.
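
The estimate described above can be sketched as follows, assuming the TCP payload length of each packet is already known. `gso_nr_segs()`, `gso_nr_batches()` and `BATCH_SIZE` are hypothetical. The patch's own `dp_packet_gso_nr_segs()` derives the payload length from the packet buffer instead. Rounding up matters because a final partial segment still needs its own packet.

```c
#include <stddef.h>

/* Same rounding as OVS's DIV_ROUND_UP macro. */
#define DIV_ROUND_UP(X, Y) (((X) + ((Y) - 1)) / (Y))

/* Hypothetical batch capacity; 32 matches the batch size quoted above. */
#define BATCH_SIZE 32

/* Number of segments needed to carry 'payload_len' bytes of TCP payload
 * when each segment carries at most 'segsz' bytes. */
size_t
gso_nr_segs(size_t payload_len, size_t segsz)
{
    return DIV_ROUND_UP(payload_len, segsz);
}

/* Total number of batches needed for all segments of 'n' TSO packets,
 * so that memory can be allocated once before segmenting. */
size_t
gso_nr_batches(const size_t *payload_lens, size_t n, size_t segsz)
{
    size_t total_segs = 0;

    for (size_t i = 0; i < n; i++) {
        total_segs += gso_nr_segs(payload_lens[i], segsz);
    }
    return DIV_ROUND_UP(total_segs, BATCH_SIZE);
}
```

For example, 32 packets carrying 63360 bytes of payload each (exactly 44 segments of 1440 bytes) produce 1408 segments, which fill 44 batches of 32.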

Signed-off-by: Flavio Leitner 
---
 lib/automake.mk |   2 +
 lib/dp-packet-gso.c | 153 
 lib/dp-packet-gso.h |  24 +++
 lib/dp-packet.h |   7 ++
 lib/netdev.c| 122 +--
 5 files changed, 259 insertions(+), 49 deletions(-)
 create mode 100644 lib/dp-packet-gso.c
 create mode 100644 lib/dp-packet-gso.h

diff --git a/lib/automake.mk b/lib/automake.mk
index 46f869a33..2ca94e13c 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -107,6 +107,8 @@ lib_libopenvswitch_la_SOURCES = \
lib/dpctl.h \
lib/dp-packet.h \
lib/dp-packet.c \
+   lib/dp-packet-gso.c \
+   lib/dp-packet-gso.h \
lib/dpdk.h \
lib/dpif-netdev-extract-study.c \
lib/dpif-netdev-lookup.h \
diff --git a/lib/dp-packet-gso.c b/lib/dp-packet-gso.c
new file mode 100644
index 0..fcc35b100
--- /dev/null
+++ b/lib/dp-packet-gso.c
@@ -0,0 +1,153 @@
+/*
+ * Copyright (c) 2021 Red Hat, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include 
+#include 
+#include 
+
+#include "coverage.h"
+#include "dp-packet.h"
+#include "dp-packet-gso.h"
+#include "netdev-provider.h"
+
+COVERAGE_DEFINE(soft_seg_good);
+
+/* Returns a new packet that is a segment of packet 'p'.
+ *
+ * The new packet is initialized with 'hdr_len' bytes from the
+ * start of packet 'p' and then appended with 'data_len' bytes
+ * from the 'data' buffer.
+ *
+ * Note: The packet headers are not updated. */
+static struct dp_packet *
+dp_packet_gso_seg_new(const struct dp_packet *p, size_t hdr_len,
+  const char *data, size_t data_len)
+{
+struct dp_packet *seg = dp_packet_new_with_headroom(hdr_len + data_len,
+dp_packet_headroom(p));
+
+/* Append the original packet headers and then the payload. */
+dp_packet_put(seg, dp_packet_data(p), hdr_len);
+dp_packet_put(seg, data, data_len);
+
+/* The new segment should have the same offsets. */
+seg->l2_5_ofs = p->l2_5_ofs;
+seg->l3_ofs = p->l3_ofs;
+seg->l4_ofs = p->l4_ofs;
+
+/* The protocol headers remain the same, so preserve hash and mark. */
+*dp_packet_rss_ptr(seg) = dp_packet_get_rss_hash(p);
+*dp_packet_flow_mark_ptr(seg) = *dp_packet_flow_mark_ptr(p);
+
+/* The segment should inherit all the offloading flags from the
+ * original packet, except for the TCP segmentation flag. */
+*dp_packet_ol_flags_ptr(seg) =  *dp_packet_ol_flags_ptr(p);
+dp_packet_ol_reset_tcp_seg(seg);
+
+return seg;
+}
+
+/* Returns the calculated number of TCP segments in packet 'p'. */
+int
+dp_packet_gso_nr_segs(struct dp_packet *p)
+{
+uint16_t segsz = dp_packet_get_tso_segsz(p);
+const char *data_tail;
+const char *data_pos;
+int n_segs;
+
+data_pos = dp_packet_get_tcp_payload(p);
+data_tail = (char *) dp_packet_tail(p) - dp_packet_l2_pad_size(p);
+data_pos = dp_packet_get_tcp_payload(p);
+n_segs = DIV_ROUND_UP((data_tail - data_pos), segsz);
+
+return n_segs;
+
+}
+
+/* Performs software segmentation on packet 'p'.
+ *
+ * Adds all the segments to the array of preallocated batches in
+ * 'batches', starting at batch position 'batch_pos'. */
+void
+dp_packet_gso(struct dp_packet *p, struct dp_packet_batch *batches,
+  size_t *batch_pos)
+{
+struct tcp_header *tcp_hdr;
+struct ip_header *ip_hdr;
+struct dp_packet *seg;
+uint32_t tcp_seq;
+uint16_t ip_id;
+int hdr_len;
+
+tcp_hdr = dp_packet_l4(p);
+tcp_seq = ntohl(get_16aligned_be32(&tcp_hdr->tcp_seq));
+hdr_len = ((char *)dp_packet_l4(p) - (char *)dp

[ovs-dev] [[PATCH RFC] 15/17] Respect tso/gso segment size.

2021-12-07 Thread Flavio Leitner
Currently OVS calculates the segment size based on the MTU of
the egress port. That is usually correct when the ports share
the same MTU, but that is not always the case.

Therefore, if the segment size is provided, use it and make
sure that oversized packets are dropped.
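
Here is a minimal sketch of the rule described above. It assumes the packet carries a segment size field that is zero when unset. If the size is present it is used. Otherwise the sketch falls back to the old MTU-derived value; that fallback is an assumption, not something this patch states. A packet is rejected when its headers plus one full segment exceed the port's maximum packet length. The function names and parameters are hypothetical. The netdev-dpdk hunk in this patch applies the same headers-plus-segment check to `mbuf->tso_segsz` and drops the packet with a rate-limited warning.

```c
#include <stdbool.h>
#include <stdint.h>

/* Picks the TSO segment size: use the size carried with the packet when
 * it is non-zero, otherwise derive it from the egress MTU (assumed to be
 * larger than the L3 + L4 header lengths). */
uint16_t
tso_pick_segsz(uint16_t pkt_segsz, uint16_t mtu, uint16_t l3_len,
               uint16_t l4_len)
{
    if (pkt_segsz) {
        return pkt_segsz;
    }
    return mtu - l3_len - l4_len;
}

/* Returns true if every segment produced from this packet fits in
 * 'max_packet_len', i.e. the L2/L3/L4 headers plus one full segment. */
bool
tso_segment_fits(uint16_t l2_len, uint16_t l3_len, uint16_t l4_len,
                 uint16_t segsz, uint32_t max_packet_len)
{
    uint32_t hdr_len = (uint32_t) l2_len + l3_len + l4_len;

    return hdr_len + segsz <= max_packet_len;
}
```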

Signed-off-by: Flavio Leitner 
---
 lib/dp-packet.c|  1 +
 lib/dp-packet.h| 27 
 lib/netdev-dpdk.c  | 13 ++--
 lib/netdev-linux.c | 78 +++---
 4 files changed, 98 insertions(+), 21 deletions(-)

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 8a1bf221a..0cfc295b1 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -34,6 +34,7 @@ dp_packet_init__(struct dp_packet *p, size_t allocated, enum dp_packet_source so
 pkt_metadata_init(&p->md, 0);
 dp_packet_reset_cutlen(p);
 dp_packet_ol_reset(p);
+dp_packet_set_tso_segsz(p, 0);
 /* Initialize implementation-specific fields of dp_packet. */
 dp_packet_init_specific(p);
 /* By default assume the packet type to be Ethernet. */
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 51f98ab9a..27529ca87 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -124,6 +124,7 @@ struct dp_packet {
 uint32_t ol_flags;  /* Offloading flags. */
 uint32_t rss_hash;  /* Packet hash. */
 uint32_t flow_mark; /* Packet flow mark. */
+uint16_t tso_segsz;  /* TCP TSO segment size. */
 #endif
 enum dp_packet_source source;  /* Source of memory allocated as 'base'. */
 
@@ -164,6 +165,9 @@ static inline void dp_packet_set_size(struct dp_packet *, uint32_t);
 static inline uint16_t dp_packet_get_allocated(const struct dp_packet *);
 static inline void dp_packet_set_allocated(struct dp_packet *, uint16_t);
 
+static inline uint16_t dp_packet_get_tso_segsz(const struct dp_packet *);
+static inline void dp_packet_set_tso_segsz(struct dp_packet *, uint16_t);
+
 void *dp_packet_resize_l2(struct dp_packet *, int increment);
 void *dp_packet_resize_l2_5(struct dp_packet *, int increment);
 static inline void *dp_packet_eth(const struct dp_packet *);
@@ -635,6 +639,18 @@ dp_packet_set_allocated(struct dp_packet *p, uint16_t s)
 p->mbuf.buf_len = s;
 }
 
+static inline uint16_t
+dp_packet_get_tso_segsz(const struct dp_packet *p)
+{
+return p->mbuf.tso_segsz;
+}
+
+static inline void
+dp_packet_set_tso_segsz(struct dp_packet *p, uint16_t s)
+{
+p->mbuf.tso_segsz = s;
+}
+
 #else /* DPDK_NETDEV */
 
 static inline void
@@ -691,6 +707,17 @@ dp_packet_set_allocated(struct dp_packet *p, uint16_t s)
 p->allocated_ = s;
 }
 
+static inline uint16_t
+dp_packet_get_tso_segsz(const struct dp_packet *p)
+{
+return p->tso_segsz;
+}
+
+static inline void
+dp_packet_set_tso_segsz(struct dp_packet *p, uint16_t s)
+{
+p->tso_segsz = s;
+}
 #endif /* DPDK_NETDEV */
 
 static inline void
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index c7e09b973..0d370bda3 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -,6 +,7 @@ netdev_dpdk_prep_ol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf)
 
 if (mbuf->ol_flags & PKT_TX_TCP_SEG) {
 struct tcp_header *th = dp_packet_l4(pkt);
+int hdr_len;
 
 if (!th) {
 VLOG_WARN_RL(&rl, "%s: TCP Segmentation without L4 header"
@@ -2231,7 +2232,14 @@ netdev_dpdk_prep_ol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf)
 
 mbuf->l4_len = TCP_OFFSET(th->tcp_ctl) * 4;
 mbuf->ol_flags |= PKT_TX_TCP_CKSUM;
-mbuf->tso_segsz = dev->mtu - mbuf->l3_len - mbuf->l4_len;
+hdr_len = mbuf->l2_len + mbuf->l3_len + mbuf->l4_len;
+if (OVS_UNLIKELY((hdr_len + mbuf->tso_segsz) > dev->max_packet_len)) {
+VLOG_WARN_RL(&rl, "%s: Oversized TSO packet. "
+ "hdr: %"PRIu32", gso: %"PRIu32", max len: %"PRIu32"",
+ dev->up.name, hdr_len, mbuf->tso_segsz,
+ dev->max_packet_len);
+return false;
+}
 
 if (mbuf->ol_flags & PKT_TX_IPV4) {
 mbuf->ol_flags |= PKT_TX_IP_CKSUM;
@@ -2597,7 +2605,8 @@ netdev_dpdk_filter_packet_len(struct netdev_dpdk *dev, struct rte_mbuf **pkts,
 int cnt = 0;
 struct rte_mbuf *pkt;
 
-/* Filter oversized packets, unless are marked for TSO. */
+/* Filter oversized packets. The TSO packets are filtered out
+ * during the offloading preparation for performance reasons. */
 for (i = 0; i < pkt_cnt; i++) {
 pkt = pkts[i];
 if (OVS_UNLIKELY((pkt->pkt_len > dev->max_packet_len)
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 48a3cf7d7..8a6f4592b 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -523,7 +523,7 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
 static ato

[ovs-dev] [[PATCH RFC] 14/17] Enable L4 csum offloading by default.

2021-12-07 Thread Flavio Leitner
The netdev receiving packets is expected to provide flags
indicating whether the L4 checksum was verified and is good or
bad; otherwise, the stack verifies it in software when
appropriate.

If the packet arrives with a good checksum, postpone the
checksum calculation to the egress device, if one is needed.

When encapsulating a packet with that flag, calculate the
checksum of the inner L4 header, since offloading inner
checksums is not yet supported.

Calculate the L4 csum when the packet is going to be sent over
a device that doesn't support the feature.
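
The transmit-side decision described above can be sketched as follows. The enum names and `l4_csum_needs_sw()` are hypothetical. They mirror the checks that `dp_packet_ol_send_prepare()` performs later in this patch against the `NETDEV_OFFLOAD_TX_*_CSUM` port flags. The real function also computes the checksum and marks it good.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical egress-port capability bits, modeled on the
 * NETDEV_OFFLOAD_TX_*_CSUM flags used in this series. */
enum port_csum_caps {
    PORT_TX_TCP_CSUM  = 1 << 0,
    PORT_TX_UDP_CSUM  = 1 << 1,
    PORT_TX_SCTP_CSUM = 1 << 2,
};

/* Which L4 checksum, if any, the packet requested to offload. */
enum pkt_l4_csum_req {
    PKT_L4_CSUM_NONE,
    PKT_L4_CSUM_TCP,
    PKT_L4_CSUM_UDP,
    PKT_L4_CSUM_SCTP,
};

/* Returns true if the L4 checksum must be computed in software before the
 * packet is sent on a port with capabilities 'caps'.  Nothing is needed
 * when the checksum is already known to be good or no offload was
 * requested. */
bool
l4_csum_needs_sw(bool csum_good, enum pkt_l4_csum_req req, uint32_t caps)
{
    if (csum_good || req == PKT_L4_CSUM_NONE) {
        return false;
    }

    switch (req) {
    case PKT_L4_CSUM_TCP:
        return !(caps & PORT_TX_TCP_CSUM);
    case PKT_L4_CSUM_UDP:
        return !(caps & PORT_TX_UDP_CSUM);
    case PKT_L4_CSUM_SCTP:
        return !(caps & PORT_TX_SCTP_CSUM);
    default:
        return false;
    }
}
```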

Linux tap devices allow enabling L3 and L4 offload, so this
patch enables the feature. However, it remains disabled for the
Linux socket interface because its API doesn't allow enabling
those features without also enabling TSO.

Signed-off-by: Flavio Leitner 
---
 lib/conntrack.c |  16 +--
 lib/dp-packet.c |  23 +++-
 lib/dp-packet.h |  56 
 lib/flow.c  |  21 +++
 lib/netdev-dpdk.c   | 157 ++---
 lib/netdev-linux.c  | 295 +---
 lib/netdev-native-tnl.c |  32 +
 lib/netdev.c|  40 ++
 lib/packets.c   | 174 +++-
 lib/packets.h   |   3 +
 10 files changed, 527 insertions(+), 290 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 5b4ca4dfc..c12b03538 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -2103,14 +2103,10 @@ conn_key_extract(struct conntrack *ct, struct dp_packet *pkt, ovs_be16 dl_type,
 }
 
 if (ok) {
-bool hwol_bad_l4_csum = dp_packet_ol_l4_csum_bad(pkt);
-if (!hwol_bad_l4_csum) {
-bool  hwol_good_l4_csum = dp_packet_ol_l4_csum_good(pkt)
-  || dp_packet_ol_tx_l4_csum(pkt);
-/* Validate the checksum only when hwol is not supported. */
+if (!dp_packet_ol_l4_csum_bad(pkt)) {
 if (extract_l4(&ctx->key, l4, dp_packet_l4_size(pkt),
-   &ctx->icmp_related, l3, !hwol_good_l4_csum,
-   NULL)) {
+   &ctx->icmp_related, l3,
+   !dp_packet_ol_l4_csum_good(pkt), NULL)) {
 ctx->hash = conn_key_hash(&ctx->key, ct->hash_basis);
 return true;
 }
@@ -3421,8 +3417,10 @@ handle_ftp_ctl(struct conntrack *ct, const struct conn_lookup_ctx *ctx,
 adj_seqnum(&th->tcp_seq, ec->seq_skew);
 adj_seqnum(>tcp_seq, ec->seq_skew);
 }
 
-th->tcp_csum = 0;
-if (!dp_packet_ol_tx_l4_csum(pkt)) {
+if (dp_packet_ol_tx_tcp_csum(pkt)) {
+dp_packet_ol_reset_l4_csum_good(pkt);
+} else {
+th->tcp_csum = 0;
 if (ctx->key.dl_type == htons(ETH_TYPE_IPV6)) {
 th->tcp_csum = packet_csum_upperlayer6(nh6, th, ctx->key.nw_proto,
dp_packet_l4_size(pkt));
diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 369f3561e..8a1bf221a 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -38,6 +38,9 @@ dp_packet_init__(struct dp_packet *p, size_t allocated, enum dp_packet_source so
 dp_packet_init_specific(p);
 /* By default assume the packet type to be Ethernet. */
 p->packet_type = htonl(PT_ETH);
+/* Reset csum start and offset. */
+p->csum_start = 0;
+p->csum_offset = 0;
 }
 
 static void
@@ -188,7 +191,7 @@ dp_packet_clone_with_headroom(const struct dp_packet *p, size_t headroom)
 dp_packet_size(p),
 headroom);
 /* Copy the following fields into the returned buffer: l2_pad_size,
- * l2_5_ofs, l3_ofs, l4_ofs, cutlen, packet_type and md. */
+ * l2_5_ofs, l3_ofs, ..., cutlen, packet_type and md. */
 memcpy(&new_buffer->l2_pad_size, &p->l2_pad_size,
 sizeof(struct dp_packet) -
 offsetof(struct dp_packet, l2_pad_size));
@@ -517,4 +520,22 @@ dp_packet_ol_send_prepare(struct dp_packet *p, const uint64_t flags) {
 dp_packet_ip_set_header_csum(p);
 dp_packet_ol_set_ip_csum_good(p);
 }
+
+if (dp_packet_ol_l4_csum_good(p) || !dp_packet_ol_tx_l4_csum(p)) {
+return;
+}
+
+if (dp_packet_ol_tx_tcp_csum(p)
+&& !(flags & NETDEV_OFFLOAD_TX_TCP_CSUM)) {
+packet_tcp_complete_csum(p);
+dp_packet_ol_set_l4_csum_good(p);
+} else if (dp_packet_ol_tx_udp_csum(p)
+&& !(flags & NETDEV_OFFLOAD_TX_UDP_CSUM)) {
+packet_udp_complete_csum(p);
+dp_packet_ol_set_l4_csum_good(p);
+} else if (!(flags & NETDEV_OFFLOAD_TX_SCTP_CSUM)
+&& dp_packet_ol_tx_sctp_csum(p)) {
+packet_sctp_complete_csum(p);
+dp_packet_ol_set_l4_csum_good(p);
+}
 }
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 278be172e..51f98ab9a 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -138,6 +138,8 @@ struct dp_packet {
   or UINT16_MAX. */
  

[ovs-dev] [[PATCH RFC] 13/17] Enable IP checksum offloading by default.

2021-12-07 Thread Flavio Leitner
The netdev receiving packets is expected to provide flags
indicating whether the IP checksum was verified and is good or
bad; otherwise, the stack verifies it in software when
appropriate.

If the packet arrives with a good checksum, postpone the
checksum calculation to the egress device, if one is needed.

When encapsulating a packet with that flag, calculate the
checksum of the inner IP header, since offloading inner
checksums is not yet supported.

Calculate the IP csum when the packet is going to be sent over
a device that doesn't support the feature.
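
When the egress port cannot offload the IPv4 header checksum, OVS has to compute it in software. Here is a minimal RFC 1071 sketch that works on a plain byte buffer rather than a `dp_packet`. `inet_csum()` is a hypothetical helper, not the OVS `csum()` function used elsewhere in the tree.

```c
#include <stddef.h>
#include <stdint.h>

/* Computes the Internet checksum (RFC 1071) over 'len' bytes of 'data',
 * treating the bytes as big-endian 16-bit words.  For an IPv4 header,
 * call it with the checksum field zeroed and store the result in network
 * byte order. */
uint16_t
inet_csum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2) {
        sum += ((uint32_t) data[i] << 8) | data[i + 1];
    }
    if (len & 1) {
        sum += (uint32_t) data[len - 1] << 8;   /* Pad the odd byte. */
    }
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);     /* Fold carries back in. */
    }
    return (uint16_t) ~sum;
}
```

For the 20-byte header `45 00 00 73 00 00 40 00 40 11 00 00 c0 a8 00 01 c0 a8 00 c7` (checksum field zeroed), the function returns 0xb861. Running it again over the header with that value stored in the checksum field yields 0, which is how a receiver verifies the header.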

Linux devices don't support enabling IP checksum offload on its
own, so support is not enabled for them.

Signed-off-by: Flavio Leitner 
---
 lib/conntrack.c | 12 ++---
 lib/dp-packet.c | 12 +
 lib/dp-packet.h | 63 ---
 lib/dpif.h  |  2 +-
 lib/flow.c  | 16 --
 lib/ipf.c   |  9 ++--
 lib/netdev-dpdk.c   | 78 ++--
 lib/netdev-dummy.c  | 21 
 lib/netdev-native-tnl.c | 19 +--
 lib/netdev.c| 22 
 lib/odp-execute.c   | 21 ++--
 lib/packets.c   | 34 ++---
 ofproto/ofproto-dpif-upcall.c   | 14 +++--
 tests/automake.mk   |  1 +
 tests/system-userspace-offload.at   | 79 +
 tests/system-userspace-testsuite.at |  1 +
 16 files changed, 322 insertions(+), 82 deletions(-)
 create mode 100644 tests/system-userspace-offload.at

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 2392a2ea4..5b4ca4dfc 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -2089,16 +2089,12 @@ conn_key_extract(struct conntrack *ct, struct dp_packet *pkt, ovs_be16 dl_type,
 ctx->key.dl_type = dl_type;
 
 if (ctx->key.dl_type == htons(ETH_TYPE_IP)) {
-bool hwol_bad_l3_csum = dp_packet_ol_ip_csum_bad(pkt);
-if (hwol_bad_l3_csum) {
+if (dp_packet_ol_ip_csum_bad(pkt)) {
 ok = false;
 COVERAGE_INC(conntrack_l3csum_err);
 } else {
-bool hwol_good_l3_csum = dp_packet_ol_ip_csum_good(pkt)
- || dp_packet_ol_tx_ipv4(pkt);
-/* Validate the checksum only when hwol is not supported. */
 ok = extract_l3_ipv4(&ctx->key, l3, dp_packet_l3_size(pkt), NULL,
- !hwol_good_l3_csum);
+ !dp_packet_ol_ip_csum_good(pkt));
 }
 } else if (ctx->key.dl_type == htons(ETH_TYPE_IPV6)) {
 ok = extract_l3_ipv6(>key, l3, dp_packet_l3_size(pkt), NULL);
@@ -3402,7 +3398,9 @@ handle_ftp_ctl(struct conntrack *ct, const struct conn_lookup_ctx *ctx,
 }
 if (seq_skew) {
 ip_len = ntohs(l3_hdr->ip_tot_len) + seq_skew;
-if (!dp_packet_ol_tx_ipv4(pkt)) {
+if (dp_packet_ol_tx_ip_csum(pkt)) {
+dp_packet_ol_reset_ip_csum_good(pkt);
+} else {
 l3_hdr->ip_csum = recalc_csum16(l3_hdr->ip_csum,
 l3_hdr->ip_tot_len,
 htons(ip_len));
diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index a4ca5a052..369f3561e 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -21,6 +21,7 @@
 #include "dp-packet.h"
 #include "netdev-afxdp.h"
 #include "netdev-dpdk.h"
+#include "netdev-provider.h"
 #include "openvswitch/dynamic-string.h"
 #include "util.h"
 
@@ -506,3 +507,14 @@ dp_packet_resize_l2(struct dp_packet *p, int increment)
 dp_packet_adjust_layer_offset(&p->l2_5_ofs, increment);
 return dp_packet_data(p);
 }
+
+/* Checks if the packet 'p' is compatible with netdev_ol_flags 'flags'
+ * and, if not, updates the packet using the software fallback. */
+void
+dp_packet_ol_send_prepare(struct dp_packet *p, const uint64_t flags) {
+if (!dp_packet_ol_ip_csum_good(p) && dp_packet_ol_tx_ip_csum(p)
+&& !(flags & NETDEV_OFFLOAD_TX_IPV4_CSUM)) {
+dp_packet_ip_set_header_csum(p);
+dp_packet_ol_set_ip_csum_good(p);
+}
+}
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index ac160985d..278be172e 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -25,6 +25,7 @@
 #include 
 #endif
 
+#include "csum.h"
 #include "netdev-afxdp.h"
 #include "netdev-dpdk.h"
 #include "openvswitch/list.h"
@@ -75,12 +76,14 @@ enum dp_packet_offload_mask {
 DEF_OL_FLAG(DP_PACKET_OL_TX_IPV4, PKT_TX_IPV4, 0x80),
 /* Offloaded packet is IPv6. */
 DEF_OL_FLAG(DP_PACKET_OL_TX_IPV6, PKT_TX_IPV6, 0x100),
+/* Offload IP checksum. */
+DEF_OL_FLAG(DP_PACKET_OL_TX_IP_CSUM, PKT_TX_IP_CKSUM, 0x200),
 /* Offload TC

[ovs-dev] [[PATCH RFC] 12/17] Show netdev offloading flags.

2021-12-07 Thread Flavio Leitner
Add a new command to show the offloading features of
each data path port.
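
To illustrate what the new command reports, here is a sketch that renders an offload flag bitmask as text. Only the IPv4 checksum (1 << 0), TCP checksum (1 << 1) and TSO (1 << 4) bit values are visible in this series. The UDP and SCTP bit values and all output names are assumptions, because `netdev_ol_flags_to_string()` is truncated in this message. `ol_flags_to_string()` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical flag values; the UDP and SCTP bits are assumed. */
enum ol_flags {
    OL_TX_IPV4_CSUM = 1 << 0,
    OL_TX_TCP_CSUM  = 1 << 1,
    OL_TX_UDP_CSUM  = 1 << 2,
    OL_TX_SCTP_CSUM = 1 << 3,
    OL_TX_TCP_TSO   = 1 << 4,
};

/* Writes a space-separated list of the enabled flag names into 'buf'
 * (e.g. "ip_csum tso"), or "none" when no flag is set.  Output that does
 * not fit in 'size' bytes is truncated. */
void
ol_flags_to_string(uint32_t flags, char *buf, size_t size)
{
    static const struct {
        uint32_t flag;
        const char *name;
    } names[] = {
        { OL_TX_IPV4_CSUM, "ip_csum" },
        { OL_TX_TCP_CSUM,  "tcp_csum" },
        { OL_TX_UDP_CSUM,  "udp_csum" },
        { OL_TX_SCTP_CSUM, "sctp_csum" },
        { OL_TX_TCP_TSO,   "tso" },
    };
    size_t used = 0;

    if (!size) {
        return;
    }
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        if (flags & names[i].flag) {
            int n = snprintf(buf + used, size - used, "%s%s",
                             used ? " " : "", names[i].name);
            if (n < 0 || (size_t) n >= size - used) {
                return;     /* Truncated; keep what fits. */
            }
            used += n;
        }
    }
    if (!used) {
        snprintf(buf, size, "none");
    }
}
```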

Signed-off-by: Flavio Leitner 
---
 lib/dpif-netdev-unixctl.man |  5 
 lib/dpif-netdev.c   | 58 +
 lib/netdev-provider.h   |  3 ++
 lib/netdev.c| 35 ++
 tests/dpif-netdev.at| 21 ++
 5 files changed, 122 insertions(+)

diff --git a/lib/dpif-netdev-unixctl.man b/lib/dpif-netdev-unixctl.man
index 607750bad..da64f89d6 100644
--- a/lib/dpif-netdev-unixctl.man
+++ b/lib/dpif-netdev-unixctl.man
@@ -260,3 +260,8 @@ PMDs in the case where no value is specified.  By default "scalar" is used.
 \fIstudy_cnt\fR defaults to 128 and indicates the number of packets that the
 "study" miniflow implementation must parse before choosing an optimal
 implementation.
+.IP "\fBdpif-netdev/offload-show\fR [\fIdp\fR] [\fInetdev\fR]"
+Prints the hardware offloading features enabled in netdev \fInetdev\fR
+attached to datapath \fIdp\fR. The datapath \fIdp\fR parameter can be
+omitted if there is only one. All netdev ports are printed if the
+parameter \fInetdev\fR is omitted.
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 69d7ec26e..a525ab1e9 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -1491,6 +1491,61 @@ dpif_netdev_bond_show(struct unixctl_conn *conn, int argc,
 ds_destroy(&reply);
 }
 
+static void
+dpif_netdev_offload_show(struct unixctl_conn *conn, int argc,
+ const char *argv[], void *aux OVS_UNUSED)
+{
+struct ds reply = DS_EMPTY_INITIALIZER;
+const char *netdev_name = NULL;
+struct dp_netdev *dp = NULL;
+struct dp_netdev_port *port;
+
+ovs_mutex_lock(&dp_netdev_mutex);
+if (argc == 3) {
+dp = shash_find_data(&dp_netdevs, argv[1]);
+netdev_name = argv[2];
+} else if (argc == 2) {
+dp = shash_find_data(&dp_netdevs, argv[1]);
+if (!dp && shash_count(&dp_netdevs) == 1) {
+/* There's only one datapath. */
+dp = shash_first(&dp_netdevs)->data;
+netdev_name = argv[1];
+}
+} else if (shash_count(&dp_netdevs) == 1) {
+/* There's only one datapath. */
+dp = shash_first(&dp_netdevs)->data;
+}
+
+if (!dp) {
+ovs_mutex_unlock(&dp_netdev_mutex);
+unixctl_command_reply_error(conn,
+"please specify an existing datapath");
+return;
+}
+
+ovs_mutex_lock(&dp->port_mutex);
+HMAP_FOR_EACH (port, node, &dp->ports) {
+if (netdev_name) {
+/* find the port and dump the info */
+if (!strcmp(netdev_get_name(port->netdev), netdev_name)) {
+ds_put_format(&reply, "%s: ", netdev_get_name(port->netdev));
+netdev_ol_flags_to_string(&reply, port->netdev);
+ds_put_format(&reply, "\n");
+break;
+}
+} else {
+ds_put_format(&reply, "%s: ", netdev_get_name(port->netdev));
+netdev_ol_flags_to_string(&reply, port->netdev);
+ds_put_format(&reply, "\n");
+}
+}
+
+ovs_mutex_unlock(&dp->port_mutex);
+ovs_mutex_unlock(&dp_netdev_mutex);
+unixctl_command_reply(conn, ds_cstr(&reply));
+ds_destroy(&reply);
+}
+
 
 static int
 dpif_netdev_init(void)
@@ -1547,6 +1602,9 @@ dpif_netdev_init(void)
 unixctl_command_register("dpif-netdev/miniflow-parser-get", "",
  0, 0, dpif_miniflow_extract_impl_get,
  NULL);
+unixctl_command_register("dpif-netdev/offload-show", "[dp] [netdev]",
+ 0, 2, dpif_netdev_offload_show,
+ NULL);
 return 0;
 }
 
diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h
index 0a8538615..5489ebbb8 100644
--- a/lib/netdev-provider.h
+++ b/lib/netdev-provider.h
@@ -37,6 +37,7 @@ extern "C" {
 struct netdev_tnl_build_header_params;
 #define NETDEV_NUMA_UNSPEC OVS_NUMA_UNSPEC
 
+/* Keep this enum updated with translation to string below. */
 enum netdev_ol_flags {
 NETDEV_OFFLOAD_TX_IPV4_CSUM = 1 << 0,
 NETDEV_OFFLOAD_TX_TCP_CSUM = 1 << 1,
@@ -45,6 +46,8 @@ enum netdev_ol_flags {
 NETDEV_OFFLOAD_TX_TCP_TSO = 1 << 4,
 };
 
+void netdev_ol_flags_to_string(struct ds *, const struct netdev *);
+
 /* A network device (e.g. an Ethernet device).
  *
  * Network device implementations may read these members but should not modify
diff --git a/lib/netdev.c b/lib/netdev.c
index 9043d5aaf..5bde9c1c9 100644
--- a/lib/netdev.c
+++ b/lib/netdev.c
@@ -2298,3 +2298,38 @@ netdev_free_custom_stats_counters(struct netdev_custom_stats *custom_stats)
 }
 }
 }
+
+void
+netdev_ol_flags_to_string(struct ds *string, const struct netdev *netdev)
+{
+/* Sort by dependency, if any. */
+if (netdev->ol_flags & NETDEV_OFFLOAD

[ovs-dev] [[PATCH RFC] 10/17] dp-packet: Add _ol_ to functions using OL flags.

2021-12-07 Thread Flavio Leitner
This makes it clear whether a function operates on the
offload flags or on the packet itself.

Signed-off-by: Flavio Leitner 
---
 lib/conntrack.c |  8 
 lib/dp-packet.c |  2 +-
 lib/dp-packet.h | 10 +-
 lib/ipf.c   |  4 ++--
 lib/netdev-native-tnl.c |  4 ++--
 lib/netdev.c|  2 +-
 lib/packets.c   |  2 +-
 7 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 2f9b17670..2392a2ea4 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -2089,12 +2089,12 @@ conn_key_extract(struct conntrack *ct, struct dp_packet *pkt, ovs_be16 dl_type,
 ctx->key.dl_type = dl_type;
 
 if (ctx->key.dl_type == htons(ETH_TYPE_IP)) {
-bool hwol_bad_l3_csum = dp_packet_ip_csum_bad(pkt);
+bool hwol_bad_l3_csum = dp_packet_ol_ip_csum_bad(pkt);
 if (hwol_bad_l3_csum) {
 ok = false;
 COVERAGE_INC(conntrack_l3csum_err);
 } else {
-bool hwol_good_l3_csum = dp_packet_ip_csum_good(pkt)
+bool hwol_good_l3_csum = dp_packet_ol_ip_csum_good(pkt)
  || dp_packet_ol_tx_ipv4(pkt);
 /* Validate the checksum only when hwol is not supported. */
 ok = extract_l3_ipv4(&ctx->key, l3, dp_packet_l3_size(pkt), NULL,
@@ -2107,9 +2107,9 @@ conn_key_extract(struct conntrack *ct, struct dp_packet *pkt, ovs_be16 dl_type,
 }
 
 if (ok) {
-bool hwol_bad_l4_csum = dp_packet_l4_csum_bad(pkt);
+bool hwol_bad_l4_csum = dp_packet_ol_l4_csum_bad(pkt);
 if (!hwol_bad_l4_csum) {
-bool  hwol_good_l4_csum = dp_packet_l4_csum_good(pkt)
+bool  hwol_good_l4_csum = dp_packet_ol_l4_csum_good(pkt)
   || dp_packet_ol_tx_l4_csum(pkt);
 /* Validate the checksum only when hwol is not supported. */
 if (extract_l4(&ctx->key, l4, dp_packet_l4_size(pkt),
diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index b4ee8c33c..a4ca5a052 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -32,7 +32,7 @@ dp_packet_init__(struct dp_packet *p, size_t allocated, enum dp_packet_source so
 dp_packet_reset_offsets(p);
 pkt_metadata_init(&p->md, 0);
 dp_packet_reset_cutlen(p);
-dp_packet_reset_offload(p);
+dp_packet_ol_reset(p);
 /* Initialize implementation-specific fields of dp_packet. */
 dp_packet_init_specific(p);
 /* By default assume the packet type to be Ethernet. */
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index acb236a7d..ac160985d 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -933,7 +933,7 @@ dp_packet_rss_valid(const struct dp_packet *p)
 }
 
 static inline void
-dp_packet_reset_offload(struct dp_packet *p)
+dp_packet_ol_reset(struct dp_packet *p)
 {
 *dp_packet_ol_flags_ptr(p) &= ~DP_PACKET_OL_SUPPORTED_MASK;
 }
@@ -1049,28 +1049,28 @@ dp_packet_ol_set_tcp_seg(struct dp_packet *p)
 }
 
 static inline bool
-dp_packet_ip_csum_good(const struct dp_packet *p)
+dp_packet_ol_ip_csum_good(const struct dp_packet *p)
 {
 return (*dp_packet_ol_flags_ptr(p) & DP_PACKET_OL_RX_IP_CSUM_MASK) ==
 DP_PACKET_OL_RX_IP_CSUM_GOOD;
 }
 
 static inline bool
-dp_packet_ip_csum_bad(const struct dp_packet *p)
+dp_packet_ol_ip_csum_bad(const struct dp_packet *p)
 {
 return (*dp_packet_ol_flags_ptr(p) & DP_PACKET_OL_RX_IP_CSUM_MASK) ==
 DP_PACKET_OL_RX_IP_CSUM_BAD;
 }
 
 static inline bool
-dp_packet_l4_csum_good(const struct dp_packet *p)
+dp_packet_ol_l4_csum_good(const struct dp_packet *p)
 {
 return (*dp_packet_ol_flags_ptr(p) & DP_PACKET_OL_RX_L4_CSUM_MASK) ==
 DP_PACKET_OL_RX_L4_CSUM_GOOD;
 }
 
 static inline bool
-dp_packet_l4_csum_bad(const struct dp_packet *p)
+dp_packet_ol_l4_csum_bad(const struct dp_packet *p)
 {
 return (*dp_packet_ol_flags_ptr(p) & DP_PACKET_OL_RX_L4_CSUM_MASK) ==
 DP_PACKET_OL_RX_L4_CSUM_BAD;
diff --git a/lib/ipf.c b/lib/ipf.c
index fd40e32c4..e78559491 100644
--- a/lib/ipf.c
+++ b/lib/ipf.c
@@ -574,7 +574,7 @@ ipf_list_state_transition(struct ipf *ipf, struct ipf_list *ipf_list,
 static bool
 ipf_is_valid_v4_frag(struct ipf *ipf, struct dp_packet *pkt)
 {
-if (OVS_UNLIKELY(dp_packet_ip_csum_bad(pkt))) {
+if (OVS_UNLIKELY(dp_packet_ol_ip_csum_bad(pkt))) {
 COVERAGE_INC(ipf_l3csum_err);
 goto invalid_pkt;
 }
@@ -608,7 +608,7 @@ ipf_is_valid_v4_frag(struct ipf *ipf, struct dp_packet *pkt)
 goto invalid_pkt;
 }
 
-if (OVS_UNLIKELY(!dp_packet_ip_csum_good(pkt)
+if (OVS_UNLIKELY(!dp_packet_ol_ip_csum_good(pkt)
  && !dp_packet_ol_tx_ipv4(pkt)
  && csum(l3, ip_hdr_len) != 0)) {
 COVERAGE_INC(ipf_l3csum_err);
diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
index 40705e190..48f13b4bd 100644
--- a/lib/netdev-native-tnl.c
+++ b/lib/netdev-native-tnl.c
@@ -88,7 +88,7 @@ ne

[ovs-dev] [[PATCH RFC] 11/17] Document netdev offload.

2021-12-07 Thread Flavio Leitner
Document the implementation of netdev hardware offloading
in the userspace datapath.
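
The Design section added by this patch asks OVS to update a checksum incrementally when it cannot trust the existing value, so that an error already present in the packet survives the modification. Here is a minimal sketch of the RFC 1624 (eqn. 3) incremental update for one changed 16-bit word, using host-order values. `csum_update16()` is hypothetical, although the conntrack code in this series relies on OVS's `recalc_csum16()` for the same kind of update.

```c
#include <stdint.h>

/* Incrementally updates the one's-complement checksum 'old_csum' after a
 * 16-bit word covered by it changes from 'old_word' to 'new_word', using
 * RFC 1624 eqn. 3: HC' = ~(~HC + ~m + m').  All values are host order. */
uint16_t
csum_update16(uint16_t old_csum, uint16_t old_word, uint16_t new_word)
{
    uint32_t sum = (uint16_t) ~old_csum;

    sum += (uint16_t) ~old_word;
    sum += new_word;
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);     /* Fold carries back in. */
    }
    return (uint16_t) ~sum;
}
```

For example, for a header whose correct checksum is 0xb861, changing one covered word from 0x00c7 to 0x00c8 yields 0xb860, the same result as a full recomputation. Starting from a corrupted 0xb862 instead yields 0xb861, so the checksum stays wrong by the same amount rather than being silently fixed.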

Signed-off-by: Flavio Leitner 
---
 Documentation/automake.mk |  1 +
 Documentation/topics/index.rst|  1 +
 Documentation/topics/nic-offloads.rst | 95 +++
 3 files changed, 97 insertions(+)
 create mode 100644 Documentation/topics/nic-offloads.rst

diff --git a/Documentation/automake.mk b/Documentation/automake.mk
index 137cc57c5..b3da74d4d 100644
--- a/Documentation/automake.mk
+++ b/Documentation/automake.mk
@@ -50,6 +50,7 @@ DOC_SOURCE = \
Documentation/topics/integration.rst \
Documentation/topics/language-bindings.rst \
Documentation/topics/networking-namespaces.rst \
+   Documentation/topics/nic-offloads.rst \
Documentation/topics/openflow.rst \
Documentation/topics/ovs-extensions.rst \
Documentation/topics/ovsdb-relay.rst \
diff --git a/Documentation/topics/index.rst b/Documentation/topics/index.rst
index d8ccbd757..0e402d978 100644
--- a/Documentation/topics/index.rst
+++ b/Documentation/topics/index.rst
@@ -44,6 +44,7 @@ OVS
openflow
bonding
networking-namespaces
+   nic-offloads
ovsdb-relay
ovsdb-replication
dpdk/index
diff --git a/Documentation/topics/nic-offloads.rst b/Documentation/topics/nic-offloads.rst
new file mode 100644
index 0..5959c65ad
--- /dev/null
+++ b/Documentation/topics/nic-offloads.rst
@@ -0,0 +1,95 @@
+..
+  Licensed under the Apache License, Version 2.0 (the "License"); you may
+  not use this file except in compliance with the License. You may obtain
+  a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+  WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+  License for the specific language governing permissions and limitations
+  under the License.
+
+  Convention for heading levels in Open vSwitch documentation:
+
+  ===  Heading 0 (reserved for the title in a document)
+  ---  Heading 1
+  ~~~  Heading 2
+  +++  Heading 3
+  '''  Heading 4
+
+  Avoid deeper levels because they do not render well.
+
+============
+NIC Offloads
+============
+
+This document explains the internals of Open vSwitch support for NIC offloads.
+
+Design
+------
+
+Open vSwitch should strive to forward packets as they arrive, for example
+regardless of whether their checksums are correct.  However, it must not
+fix existing problems.  Therefore, when the checksum has been verified or
+the packet is otherwise known to be good, the checksum calculation can be
+offloaded to the NIC; otherwise, the packet can still be updated as long
+as its previous checksum state does not change.  For example, a packet
+with a corrupted IP checksum can be accepted, and a flow rule can then
+change its IP destination address.  In that case, OVS needs to update the
+checksum incrementally instead of offloading it or recalculating all of
+it, either of which would fix the existing problem.
+
+Drivers can set flags indicating whether the checksum is good or bad.
+The checksum is considered unverified if no flag is set.
+
+When a packet enters the data path with a good checksum, OVS should
+enable checksum offload for it by default.  This allows the data path to
+postpone checksum updates until the packet leaves the data path.
+
+When a packet leaves the data path, the packet flags and the egress
+port flags are checked to make sure that all NIC offload features
+required to send out the packet are available.  If not, the data
+path falls back to an equivalent software implementation.
+
+
+Drivers
+-------
+
+When the driver initializes, it should set flags to tell the data path
+which offload features it supports.  For example, if the driver supports
+IP checksum offloading, it should set the NETDEV_OFFLOAD_TX_IPV4_CSUM
+flag in netdev->ol_flags.
+
+
+Rules
+-----
+
+1) OVS should strive to forward all packets regardless of checksum.
+
+2) OVS must not correct a bad packet/checksum.
+
+3) A packet with the DP_PACKET_OL_RX_IP_CSUM_GOOD flag carries an IP
+   checksum, and that checksum is good.
+
+4) A packet with the DP_PACKET_OL_RX_IP_CSUM_BAD flag carries an IP
+   checksum, and that checksum is bad.  Extra care should be taken not
+   to fix the packet during data path processing.
+
+5) The ingress packet parser can only set DP_PACKET_OL_TX_IP_CSUM
+   if the packet has DP_PACKET_OL_RX_IP_CSUM_GOOD, so as not to
+   violate rule #2.
+
+6) A packet with the DP_PACKET_OL_TX_IPV4 flag is an IPv4 packet.
+
+7) A packet with the DP_PACKET_OL_TX_IPV6 flag is an IPv6 packet.
+
+8) The DP_PACKET_OL_TX_IP_CSUM flag tells the data path
+   to skip updating the IP checksum if the packet is modified. The
+   IP checksum will be calculated by the egress port 

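The design notes in the patch above say that a driver flags each checksum as good or bad and that no flag means unverified. A minimal standalone sketch of that three-way classification, assuming the generic (non-DPDK) RX flag values from patch 02/17 (0x4, 0x8, 0x10, 0x20); the SK_* macros and the rx_*_csum_state() helpers are hypothetical, not OVS functions. Like dp_packet_ip_csum_good() in patch 04/17, it compares the masked bits for equality.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical copies of the generic DP_PACKET_OL_RX_* values. */
#define SK_RX_L4_CSUM_BAD  0x4ULL
#define SK_RX_IP_CSUM_BAD  0x8ULL
#define SK_RX_L4_CSUM_GOOD 0x10ULL
#define SK_RX_IP_CSUM_GOOD 0x20ULL

enum csum_state { CSUM_UNVERIFIED, CSUM_GOOD, CSUM_BAD };

/* Classifies one checksum from the driver-provided flags word.  Only an
 * exact match of the masked bits counts as GOOD or BAD; no bit set, or the
 * contradictory case of both bits set, is reported as UNVERIFIED so that
 * the data path checks the packet in software. */
static enum csum_state
rx_csum_state(uint64_t ol_flags, uint64_t good, uint64_t bad)
{
    uint64_t bits = ol_flags & (good | bad);

    assert(good != bad);
    if (bits == good) {
        return CSUM_GOOD;
    }
    if (bits == bad) {
        return CSUM_BAD;
    }
    return CSUM_UNVERIFIED;
}

static enum csum_state
rx_ip_csum_state(uint64_t ol_flags)
{
    return rx_csum_state(ol_flags, SK_RX_IP_CSUM_GOOD, SK_RX_IP_CSUM_BAD);
}

static enum csum_state
rx_l4_csum_state(uint64_t ol_flags)
{
    return rx_csum_state(ol_flags, SK_RX_L4_CSUM_GOOD, SK_RX_L4_CSUM_BAD);
}
```

Treating "both bits set" as unverified is a choice made in this sketch: contradictory driver input sends the packet down the software check path instead of being trusted.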
[ovs-dev] [[PATCH RFC] 09/17] dp-packet: Rename dp_packet_ol l4 functions.

2021-12-07 Thread Flavio Leitner
Rename to better represent their flags.

Signed-off-by: Flavio Leitner 
---
 lib/dp-packet.h| 21 +++--
 lib/netdev-linux.c | 14 +++---
 lib/netdev.c   | 18 --
 3 files changed, 22 insertions(+), 31 deletions(-)

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index dfa25e095..acb236a7d 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -956,18 +956,11 @@ dp_packet_set_flow_mark(struct dp_packet *p, uint32_t mark)
 *dp_packet_ol_flags_ptr(p) |= DP_PACKET_OL_FLOW_MARK;
 }
 
-/* Returns the L4 cksum offload bitmask. */
-static inline uint64_t
-dp_packet_ol_l4_mask(const struct dp_packet *p)
-{
-return *dp_packet_ol_flags_ptr(p) & DP_PACKET_OL_TX_L4_MASK;
-}
-
 /* Return true if the packet 'p' requested L4 checksum offload. */
 static inline bool
 dp_packet_ol_tx_l4_csum(const struct dp_packet *p)
 {
-return !!dp_packet_ol_l4_mask(p);
+return !!(*dp_packet_ol_flags_ptr(p) & DP_PACKET_OL_TX_L4_MASK);
 }
 
 /* Returns 'true' if packet 'p' is marked for TCP segmentation offloading. */
@@ -986,7 +979,7 @@ dp_packet_ol_tx_ipv4(const struct dp_packet *p)
 
 /* Returns 'true' if packet 'p' is marked for TCP checksum offloading. */
 static inline bool
-dp_packet_ol_l4_is_tcp(const struct dp_packet *p)
+dp_packet_ol_tx_tcp_csum(const struct dp_packet *p)
 {
 return (*dp_packet_ol_flags_ptr(p) & DP_PACKET_OL_TX_L4_MASK) ==
 DP_PACKET_OL_TX_TCP_CSUM;
@@ -994,7 +987,7 @@ dp_packet_ol_l4_is_tcp(const struct dp_packet *p)
 
 /* Returns 'true' if packet 'p' is marked for UDP checksum offloading. */
 static inline bool
-dp_packet_ol_l4_is_udp(struct dp_packet *p)
+dp_packet_ol_tx_udp_csum(struct dp_packet *p)
 {
 return (*dp_packet_ol_flags_ptr(p) & DP_PACKET_OL_TX_L4_MASK) ==
 DP_PACKET_OL_TX_UDP_CSUM;
@@ -1002,7 +995,7 @@ dp_packet_ol_l4_is_udp(struct dp_packet *p)
 
 /* Returns 'true' if packet 'p' is marked for SCTP checksum offloading. */
 static inline bool
-dp_packet_ol_l4_is_sctp(struct dp_packet *p)
+dp_packet_ol_tx_sctp_csum(struct dp_packet *p)
 {
 return (*dp_packet_ol_flags_ptr(p) & DP_PACKET_OL_TX_L4_MASK) ==
 DP_PACKET_OL_TX_SCTP_CSUM;
@@ -1025,7 +1018,7 @@ dp_packet_ol_set_tx_ipv6(struct dp_packet *p)
 /* Mark packet 'p' for TCP checksum offloading.  It implies that either
  * the packet 'p' is marked for IPv4 or IPv6 checksum offloading. */
 static inline void
-dp_packet_ol_set_csum_tcp(struct dp_packet *p)
+dp_packet_ol_set_tx_tcp_csum(struct dp_packet *p)
 {
 *dp_packet_ol_flags_ptr(p) |= DP_PACKET_OL_TX_TCP_CSUM;
 }
@@ -1033,7 +1026,7 @@ dp_packet_ol_set_csum_tcp(struct dp_packet *p)
 /* Mark packet 'p' for UDP checksum offloading.  It implies that either
  * the packet 'p' is marked for IPv4 or IPv6 checksum offloading. */
 static inline void
-dp_packet_ol_set_csum_udp(struct dp_packet *p)
+dp_packet_ol_set_tx_udp_csum(struct dp_packet *p)
 {
 *dp_packet_ol_flags_ptr(p) |= DP_PACKET_OL_TX_UDP_CSUM;
 }
@@ -1041,7 +1034,7 @@ dp_packet_ol_set_csum_udp(struct dp_packet *p)
 /* Mark packet 'p' for SCTP checksum offloading.  It implies that either
  * the packet 'p' is marked for IPv4 or IPv6 checksum offloading. */
 static inline void
-dp_packet_ol_set_csum_sctp(struct dp_packet *p)
+dp_packet_ol_set_tx_sctp_csum(struct dp_packet *p)
 {
 *dp_packet_ol_flags_ptr(p) |= DP_PACKET_OL_TX_SCTP_CSUM;
 }
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 696a86db2..82f9a0758 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -6637,11 +6637,11 @@ netdev_linux_parse_vnet_hdr(struct dp_packet *b)
 
 if (vnet->flags == VIRTIO_NET_HDR_F_NEEDS_CSUM) {
 if (l4proto == IPPROTO_TCP) {
-dp_packet_ol_set_csum_tcp(b);
+dp_packet_ol_set_tx_tcp_csum(b);
 } else if (l4proto == IPPROTO_UDP) {
-dp_packet_ol_set_csum_udp(b);
+dp_packet_ol_set_tx_udp_csum(b);
 } else if (l4proto == IPPROTO_SCTP) {
-dp_packet_ol_set_csum_sctp(b);
+dp_packet_ol_set_tx_sctp_csum(b);
 }
 }
 
@@ -6681,18 +6681,18 @@ netdev_linux_prepend_vnet_hdr(struct dp_packet *b, int mtu)
 vnet->flags = VIRTIO_NET_HDR_GSO_NONE;
 }
 
-if (dp_packet_ol_l4_mask(b)) {
+if (dp_packet_ol_tx_l4_csum(b)) {
 vnet->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
 vnet->csum_start = (OVS_FORCE __virtio16)((char *)dp_packet_l4(b)
   - (char *)dp_packet_eth(b));
 
-if (dp_packet_ol_l4_is_tcp(b)) {
+if (dp_packet_ol_tx_tcp_csum(b)) {
 vnet->csum_offset = (OVS_FORCE __virtio16) __builtin_offsetof(
 struct tcp_header, tcp_csum);
-} else if (dp_packet_ol_l4_is_udp(b)) {
+} else if (dp_packet_ol_tx_udp_csum(b)) {
 vnet->csum_offset = (OVS_FORCE __virtio16) __builtin_offsetof(
 struct 

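Patch 09/17 folds dp_packet_ol_l4_mask() into dp_packet_ol_tx_l4_csum() and renames the per-protocol checks. The two kinds of test can be restated on a bare flags word; this sketch assumes the generic TX values from patch 02/17 (TCP_SEG 0x40, TCP 0x200, UDP 0x400, SCTP 0x800), and all SK_* and sk_* names are hypothetical. "Any L4 offload" is a non-zero test on the L4 mask, while each per-protocol check compares the masked value for equality.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical copies of the generic DP_PACKET_OL_TX_* values. */
#define SK_OL_TX_TCP_SEG   0x40ULL
#define SK_OL_TX_TCP_CSUM  0x200ULL
#define SK_OL_TX_UDP_CSUM  0x400ULL
#define SK_OL_TX_SCTP_CSUM 0x800ULL
#define SK_OL_TX_L4_MASK   (SK_OL_TX_TCP_CSUM | SK_OL_TX_UDP_CSUM \
                            | SK_OL_TX_SCTP_CSUM)

/* Was any L4 checksum offload requested?  (cf. dp_packet_ol_tx_l4_csum) */
static bool
sk_tx_l4_csum(uint64_t flags)
{
    return (flags & SK_OL_TX_L4_MASK) != 0;
}

/* Per-protocol checks compare the masked value for equality, so bits
 * outside the L4 mask (e.g. TCP_SEG) are ignored, while a word with two
 * L4 bits set matches no protocol at all. */
static bool
sk_tx_tcp_csum(uint64_t flags)
{
    return (flags & SK_OL_TX_L4_MASK) == SK_OL_TX_TCP_CSUM;
}

static bool
sk_tx_udp_csum(uint64_t flags)
{
    return (flags & SK_OL_TX_L4_MASK) == SK_OL_TX_UDP_CSUM;
}

static bool
sk_tx_sctp_csum(uint64_t flags)
{
    return (flags & SK_OL_TX_L4_MASK) == SK_OL_TX_SCTP_CSUM;
}

/* Setters just OR in one bit (cf. dp_packet_ol_set_tx_tcp_csum). */
static uint64_t
sk_set_tx_tcp_csum(uint64_t flags)
{
    return flags | SK_OL_TX_TCP_CSUM;
}
```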
[ovs-dev] [[PATCH RFC] 08/17] dp-packet: Rename dp_packet_ol_is_ipv4.

2021-12-07 Thread Flavio Leitner
Rename to dp_packet_ol_tx_ipv4 to align with the flag name.

Signed-off-by: Flavio Leitner 
---
 lib/conntrack.c| 4 ++--
 lib/dp-packet.h| 2 +-
 lib/ipf.c  | 6 +++---
 lib/netdev-linux.c | 2 +-
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 24234e672..2f9b17670 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -2095,7 +2095,7 @@ conn_key_extract(struct conntrack *ct, struct dp_packet *pkt, ovs_be16 dl_type,
 COVERAGE_INC(conntrack_l3csum_err);
 } else {
 bool hwol_good_l3_csum = dp_packet_ip_csum_good(pkt)
- || dp_packet_ol_is_ipv4(pkt);
+ || dp_packet_ol_tx_ipv4(pkt);
 /* Validate the checksum only when hwol is not supported. */
 ok = extract_l3_ipv4(&ctx->key, l3, dp_packet_l3_size(pkt), NULL,
  !hwol_good_l3_csum);
@@ -3402,7 +3402,7 @@ handle_ftp_ctl(struct conntrack *ct, const struct conn_lookup_ctx *ctx,
 }
 if (seq_skew) {
 ip_len = ntohs(l3_hdr->ip_tot_len) + seq_skew;
-if (!dp_packet_ol_is_ipv4(pkt)) {
+if (!dp_packet_ol_tx_ipv4(pkt)) {
 l3_hdr->ip_csum = recalc_csum16(l3_hdr->ip_csum,
 l3_hdr->ip_tot_len,
 htons(ip_len));
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 614ebbb4d..dfa25e095 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -979,7 +979,7 @@ dp_packet_ol_tcp_seg(const struct dp_packet *p)
 
 /* Returns 'true' if packet 'p' is marked for IPv4 checksum offloading. */
 static inline bool
-dp_packet_ol_is_ipv4(const struct dp_packet *p)
+dp_packet_ol_tx_ipv4(const struct dp_packet *p)
 {
 return !!(*dp_packet_ol_flags_ptr(p) & DP_PACKET_OL_TX_IPV4);
 }
diff --git a/lib/ipf.c b/lib/ipf.c
index f290d5d23..fd40e32c4 100644
--- a/lib/ipf.c
+++ b/lib/ipf.c
@@ -433,7 +433,7 @@ ipf_reassemble_v4_frags(struct ipf_list *ipf_list)
 len += rest_len;
 l3 = dp_packet_l3(pkt);
 ovs_be16 new_ip_frag_off = l3->ip_frag_off & ~htons(IP_MORE_FRAGMENTS);
-if (!dp_packet_ol_is_ipv4(pkt)) {
+if (!dp_packet_ol_tx_ipv4(pkt)) {
 l3->ip_csum = recalc_csum16(l3->ip_csum, l3->ip_frag_off,
 new_ip_frag_off);
 l3->ip_csum = recalc_csum16(l3->ip_csum, l3->ip_tot_len, htons(len));
@@ -609,7 +609,7 @@ ipf_is_valid_v4_frag(struct ipf *ipf, struct dp_packet *pkt)
 }
 
 if (OVS_UNLIKELY(!dp_packet_ip_csum_good(pkt)
- && !dp_packet_ol_is_ipv4(pkt)
+ && !dp_packet_ol_tx_ipv4(pkt)
  && csum(l3, ip_hdr_len) != 0)) {
 COVERAGE_INC(ipf_l3csum_err);
 goto invalid_pkt;
@@ -1185,7 +1185,7 @@ ipf_post_execute_reass_pkts(struct ipf *ipf,
 } else {
 struct ip_header *l3_frag = dp_packet_l3(frag_i->pkt);
 struct ip_header *l3_reass = dp_packet_l3(pkt);
-if (!dp_packet_ol_is_ipv4(frag_i->pkt)) {
+if (!dp_packet_ol_tx_ipv4(frag_i->pkt)) {
 ovs_be32 reass_ip =
 get_16aligned_be32(&l3_reass->ip_src);
 ovs_be32 frag_ip =
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 5d0af5a40..696a86db2 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -6671,7 +6671,7 @@ netdev_linux_prepend_vnet_hdr(struct dp_packet *b, int mtu)
 
 vnet->hdr_len = (OVS_FORCE __virtio16)hdr_len;
 vnet->gso_size = (OVS_FORCE __virtio16)(mtu - hdr_len);
-if (dp_packet_ol_is_ipv4(b)) {
+if (dp_packet_ol_tx_ipv4(b)) {
 vnet->gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
 } else {
 vnet->gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

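In the conntrack and ipf hunks of patch 08/17, recalc_csum16() is only called when the packet is not marked for IPv4 checksum offload; when it is, the egress NIC is expected to produce the checksum, so the incremental update is skipped. The incremental update is also what the design notes rely on to keep a bad checksum bad. A minimal sketch of the RFC 1624 update (equation 3, HC' = ~(~HC + ~m + m')), using host-order 16-bit words; csum16(), incr_csum16() and fold32() are hypothetical helpers, and OVS's own recalc_csum16(), which works on ovs_be16 values, is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Folds a 32-bit one's-complement accumulator into 16 bits. */
static uint16_t
fold32(uint32_t sum)
{
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) sum;
}

/* RFC 1071 Internet checksum over 'len' bytes in network order.  Odd
 * lengths are out of scope for this sketch.  A header that already holds
 * a correct checksum sums to 0 here. */
static uint16_t
csum16(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    assert(len % 2 == 0);
    for (size_t i = 0; i < len; i += 2) {
        sum += (uint32_t) data[i] << 8 | data[i + 1];
    }
    return (uint16_t) ~fold32(sum);
}

/* RFC 1624 incremental update: replaces 16-bit word 'old_word' with
 * 'new_word' in data covered by 'old_csum'.  Any error already present
 * in 'old_csum' is carried over unchanged. */
static uint16_t
incr_csum16(uint16_t old_csum, uint16_t old_word, uint16_t new_word)
{
    uint32_t sum = (uint16_t) ~old_csum;

    sum += (uint16_t) ~old_word;
    sum += new_word;
    return (uint16_t) ~fold32(sum);
}
```

With the sample IPv4 header 45 00 00 73 00 00 40 00 40 11 b8 61 c0 a8 00 01 c0 a8 00 c7, rewriting the last destination word from 0x00c7 to 0x00c8 turns the checksum 0xb861 into 0xb860, the same value a full recomputation gives. Starting from a corrupted 0xb862 instead yields 0xb861, still off by one, so the bad checksum is preserved rather than repaired.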

[ovs-dev] [[PATCH RFC] 06/17] dp-packet: Use p for packet and b for batch.

2021-12-07 Thread Flavio Leitner
Currently both 'p' and 'b' are used for packets, so use
a convention that struct dp_packet is 'p' and
struct dp_packet_batch is 'b'.

Some comments needed new formatting to stay within
80 columns.

Some variables that were using 'p' or 'b' were renamed
as well.

There should be no functional change with this patch.

Signed-off-by: Flavio Leitner 
---
 lib/dp-packet.c | 342 
 lib/dp-packet.h | 506 
 2 files changed, 424 insertions(+), 424 deletions(-)

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 72f6d09ac..b4ee8c33c 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -25,58 +25,58 @@
 #include "util.h"
 
 static void
-dp_packet_init__(struct dp_packet *b, size_t allocated, enum dp_packet_source source)
-{
-dp_packet_set_allocated(b, allocated);
-b->source = source;
-dp_packet_reset_offsets(b);
-pkt_metadata_init(&b->md, 0);
-dp_packet_reset_cutlen(b);
-dp_packet_reset_offload(b);
+dp_packet_init__(struct dp_packet *p, size_t allocated, enum dp_packet_source source)
+{
+dp_packet_set_allocated(p, allocated);
+p->source = source;
+dp_packet_reset_offsets(p);
+pkt_metadata_init(&p->md, 0);
+dp_packet_reset_cutlen(p);
+dp_packet_reset_offload(p);
 /* Initialize implementation-specific fields of dp_packet. */
-dp_packet_init_specific(b);
+dp_packet_init_specific(p);
 /* By default assume the packet type to be Ethernet. */
-b->packet_type = htonl(PT_ETH);
+p->packet_type = htonl(PT_ETH);
 }
 
 static void
-dp_packet_use__(struct dp_packet *b, void *base, size_t allocated,
+dp_packet_use__(struct dp_packet *p, void *base, size_t allocated,
  enum dp_packet_source source)
 {
-dp_packet_set_base(b, base);
-dp_packet_set_data(b, base);
-dp_packet_set_size(b, 0);
+dp_packet_set_base(p, base);
+dp_packet_set_data(p, base);
+dp_packet_set_size(p, 0);
 
-dp_packet_init__(b, allocated, source);
+dp_packet_init__(p, allocated, source);
 }
 
-/* Initializes 'b' as an empty dp_packet that contains the 'allocated' bytes of
+/* Initializes 'p' as an empty dp_packet that contains the 'allocated' bytes of
  * memory starting at 'base'.  'base' should be the first byte of a region
- * obtained from malloc().  It will be freed (with free()) if 'b' is resized or
+ * obtained from malloc().  It will be freed (with free()) if 'p' is resized or
  * freed. */
 void
-dp_packet_use(struct dp_packet *b, void *base, size_t allocated)
+dp_packet_use(struct dp_packet *p, void *base, size_t allocated)
 {
-dp_packet_use__(b, base, allocated, DPBUF_MALLOC);
+dp_packet_use__(p, base, allocated, DPBUF_MALLOC);
 }
 
 #if HAVE_AF_XDP
-/* Initialize 'b' as an empty dp_packet that contains
+/* Initialize 'p' as an empty dp_packet that contains
  * memory starting at AF_XDP umem base.
  */
 void
-dp_packet_use_afxdp(struct dp_packet *b, void *data, size_t allocated,
+dp_packet_use_afxdp(struct dp_packet *p, void *data, size_t allocated,
 size_t headroom)
 {
-dp_packet_set_base(b, (char *)data - headroom);
-dp_packet_set_data(b, data);
-dp_packet_set_size(b, 0);
+dp_packet_set_base(p, (char *)data - headroom);
+dp_packet_set_data(p, data);
+dp_packet_set_size(p, 0);
 
-dp_packet_init__(b, allocated, DPBUF_AFXDP);
+dp_packet_init__(p, allocated, DPBUF_AFXDP);
 }
 #endif
 
-/* Initializes 'b' as an empty dp_packet that contains the 'allocated' bytes of
+/* Initializes 'p' as an empty dp_packet that contains the 'allocated' bytes of
  * memory starting at 'base'.  'base' should point to a buffer on the stack.
  * (Nothing actually relies on 'base' being allocated on the stack.  It could
  * be static or malloc()'d memory.  But stack space is the most common use
@@ -91,12 +91,12 @@ dp_packet_use_afxdp(struct dp_packet *b, void *data, size_t allocated,
  * on an dp_packet initialized by this function, so that if it expanded into the
  * heap, that memory is freed. */
 void
-dp_packet_use_stub(struct dp_packet *b, void *base, size_t allocated)
+dp_packet_use_stub(struct dp_packet *p, void *base, size_t allocated)
 {
-dp_packet_use__(b, base, allocated, DPBUF_STUB);
+dp_packet_use__(p, base, allocated, DPBUF_STUB);
 }
 
-/* Initializes 'b' as an dp_packet whose data starts at 'data' and continues for
+/* Initializes 'p' as an dp_packet whose data starts at 'data' and continues for
  * 'size' bytes.  This is appropriate for an dp_packet that will be used to
  * inspect existing data, without moving it around or reallocating it, and
  * generally without modifying it at all.
@@ -104,43 +104,43 @@ dp_packet_use_stub(struct dp_packet *b, void *base, size_t allocated)
  * An dp_packet operation that requires reallocating data will assert-fail if this
  * function was used to initialize it. */
 void
-dp_packet_use_const(struct dp_packet *b, const void *data, size_t size)

[ovs-dev] [[PATCH RFC] 07/17] dp-packet: Rename dp_packet_ol_tcp_seg

2021-12-07 Thread Flavio Leitner
Rename to dp_packet_ol_tcp_seg, because that is less
redundant and allows other protocols.

Signed-off-by: Flavio Leitner 
---
 lib/dp-packet.h| 2 +-
 lib/netdev-linux.c | 2 +-
 lib/netdev.c   | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 8b06e457b..614ebbb4d 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -972,7 +972,7 @@ dp_packet_ol_tx_l4_csum(const struct dp_packet *p)
 
 /* Returns 'true' if packet 'p' is marked for TCP segmentation offloading. */
 static inline bool
-dp_packet_ol_is_tso(const struct dp_packet *p)
+dp_packet_ol_tcp_seg(const struct dp_packet *p)
 {
 return !!(*dp_packet_ol_flags_ptr(p) & DP_PACKET_OL_TX_TCP_SEG);
 }
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 35e3e1e79..5d0af5a40 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -6665,7 +6665,7 @@ netdev_linux_prepend_vnet_hdr(struct dp_packet *b, int mtu)
 {
 struct virtio_net_hdr *vnet = dp_packet_push_zeros(b, sizeof *vnet);
 
-if (dp_packet_ol_is_tso(b)) {
+if (dp_packet_ol_tcp_seg(b)) {
 uint16_t hdr_len = ((char *)dp_packet_l4(b) - (char *)dp_packet_eth(b))
 + TCP_HEADER_LEN;
 
diff --git a/lib/netdev.c b/lib/netdev.c
index d087929e5..fb535ed7c 100644
--- a/lib/netdev.c
+++ b/lib/netdev.c
@@ -794,7 +794,7 @@ netdev_send_prepare_packet(const uint64_t netdev_flags,
 {
 uint64_t l4_mask;
 
-if (dp_packet_ol_is_tso(packet)
+if (dp_packet_ol_tcp_seg(packet)
 && !(netdev_flags & NETDEV_OFFLOAD_TX_TCP_TSO)) {
 /* Fall back to GSO in software. */
 VLOG_ERR_BUF(errormsg, "No TSO support");
@@ -960,7 +960,7 @@ netdev_push_header(const struct netdev *netdev,
 size_t i, size = dp_packet_batch_size(batch);
 
 DP_PACKET_BATCH_REFILL_FOR_EACH (i, size, packet, batch) {
-if (OVS_UNLIKELY(dp_packet_ol_is_tso(packet)
+if (OVS_UNLIKELY(dp_packet_ol_tcp_seg(packet)
  || dp_packet_ol_l4_mask(packet))) {
 COVERAGE_INC(netdev_push_header_drops);
 dp_packet_delete(packet);
-- 
2.31.1



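The netdev_linux_prepend_vnet_hdr() hunks in patches 07/17 and 09/17 fill in the virtio-net header from packet offsets: csum_start is the distance from the Ethernet header to the L4 header, csum_offset is the position of the checksum field inside the L4 header, and for TSO hdr_len is that L4 offset plus TCP_HEADER_LEN while gso_size is mtu - hdr_len. This sketch only mirrors that arithmetic; the sk_* structs are hypothetical stand-ins for struct virtio_net_hdr and OVS's struct tcp_header/udp_header, and TCP options are ignored as in the hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire layouts (RFC 793 / RFC 768).  All fields are naturally
 * aligned, so the compiler introduces no padding. */
struct sk_tcp_header {
    uint16_t src_port;
    uint16_t dst_port;
    uint32_t seq;
    uint32_t ack;
    uint16_t flags;      /* Data offset, reserved bits and control flags. */
    uint16_t window;
    uint16_t csum;
    uint16_t urg_ptr;
};

struct sk_udp_header {
    uint16_t src_port;
    uint16_t dst_port;
    uint16_t len;
    uint16_t csum;
};

#define SK_TCP_HEADER_LEN 20   /* TCP header without options. */

/* Hypothetical subset of the virtio-net header fields set in the hunks. */
struct sk_vnet_fields {
    uint16_t hdr_len;
    uint16_t gso_size;
    uint16_t csum_start;
    uint16_t csum_offset;
};

/* 'l4_ofs' is the distance in bytes from the Ethernet header to the TCP
 * header.  With 'tso' set, hdr_len and gso_size are filled in the way the
 * hunk computes them; otherwise they stay 0. */
static struct sk_vnet_fields
sk_vnet_fields_tcp(uint16_t l4_ofs, uint16_t mtu, int tso)
{
    struct sk_vnet_fields f = {0, 0, 0, 0};

    f.csum_start = l4_ofs;
    f.csum_offset = (uint16_t) offsetof(struct sk_tcp_header, csum);
    if (tso) {
        f.hdr_len = (uint16_t) (l4_ofs + SK_TCP_HEADER_LEN);
        assert(mtu > f.hdr_len);
        f.gso_size = (uint16_t) (mtu - f.hdr_len);
    }
    return f;
}
```

For an untagged IPv4/TCP packet (TCP header 34 bytes after the start of the Ethernet header) and an MTU of 1500, this gives csum_start 34, csum_offset 16, hdr_len 54 and gso_size 1446. For UDP the same pattern would use a csum_offset of 6.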
[ovs-dev] [[PATCH RFC] 05/17] Rename dp_packet_hwol to dp_packet_ol.

2021-12-07 Thread Flavio Leitner
The name correlates better with the flag names.

Signed-off-by: Flavio Leitner 
---
 lib/conntrack.c|  8 
 lib/dp-packet.h| 28 ++--
 lib/ipf.c  |  6 +++---
 lib/netdev-dpdk.c  | 24 
 lib/netdev-linux.c | 24 
 lib/netdev.c   | 14 +++---
 6 files changed, 52 insertions(+), 52 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 907c5ed30..24234e672 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -2095,7 +2095,7 @@ conn_key_extract(struct conntrack *ct, struct dp_packet *pkt, ovs_be16 dl_type,
 COVERAGE_INC(conntrack_l3csum_err);
 } else {
 bool hwol_good_l3_csum = dp_packet_ip_csum_good(pkt)
- || dp_packet_hwol_is_ipv4(pkt);
+ || dp_packet_ol_is_ipv4(pkt);
 /* Validate the checksum only when hwol is not supported. */
 ok = extract_l3_ipv4(&ctx->key, l3, dp_packet_l3_size(pkt), NULL,
  !hwol_good_l3_csum);
@@ -2110,7 +2110,7 @@ conn_key_extract(struct conntrack *ct, struct dp_packet *pkt, ovs_be16 dl_type,
 bool hwol_bad_l4_csum = dp_packet_l4_csum_bad(pkt);
 if (!hwol_bad_l4_csum) {
 bool  hwol_good_l4_csum = dp_packet_l4_csum_good(pkt)
-  || dp_packet_hwol_tx_l4_csum(pkt);
+  || dp_packet_ol_tx_l4_csum(pkt);
 /* Validate the checksum only when hwol is not supported. */
 if (extract_l4(&ctx->key, l4, dp_packet_l4_size(pkt),
 &ctx->icmp_related, l3, !hwol_good_l4_csum,
@@ -3402,7 +3402,7 @@ handle_ftp_ctl(struct conntrack *ct, const struct conn_lookup_ctx *ctx,
 }
 if (seq_skew) {
 ip_len = ntohs(l3_hdr->ip_tot_len) + seq_skew;
-if (!dp_packet_hwol_is_ipv4(pkt)) {
+if (!dp_packet_ol_is_ipv4(pkt)) {
 l3_hdr->ip_csum = recalc_csum16(l3_hdr->ip_csum,
 l3_hdr->ip_tot_len,
 htons(ip_len));
@@ -3424,7 +3424,7 @@ handle_ftp_ctl(struct conntrack *ct, const struct conn_lookup_ctx *ctx,
 }
 
 th->tcp_csum = 0;
-if (!dp_packet_hwol_tx_l4_csum(pkt)) {
+if (!dp_packet_ol_tx_l4_csum(pkt)) {
 if (ctx->key.dl_type == htons(ETH_TYPE_IPV6)) {
 th->tcp_csum = packet_csum_upperlayer6(nh6, th, ctx->key.nw_proto,
dp_packet_l4_size(pkt));
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 5540680cf..82eae87b6 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -958,35 +958,35 @@ dp_packet_set_flow_mark(struct dp_packet *p, uint32_t mark)
 
 /* Returns the L4 cksum offload bitmask. */
 static inline uint64_t
-dp_packet_hwol_l4_mask(const struct dp_packet *b)
+dp_packet_ol_l4_mask(const struct dp_packet *b)
 {
 return *dp_packet_ol_flags_ptr(b) & DP_PACKET_OL_TX_L4_MASK;
 }
 
 /* Return true if the packet 'b' requested L4 checksum offload. */
 static inline bool
-dp_packet_hwol_tx_l4_csum(const struct dp_packet *b)
+dp_packet_ol_tx_l4_csum(const struct dp_packet *b)
 {
-return !!dp_packet_hwol_l4_mask(b);
+return !!dp_packet_ol_l4_mask(b);
 }
 
 /* Returns 'true' if packet 'b' is marked for TCP segmentation offloading. */
 static inline bool
-dp_packet_hwol_is_tso(const struct dp_packet *b)
+dp_packet_ol_is_tso(const struct dp_packet *b)
 {
 return !!(*dp_packet_ol_flags_ptr(b) & DP_PACKET_OL_TX_TCP_SEG);
 }
 
 /* Returns 'true' if packet 'b' is marked for IPv4 checksum offloading. */
 static inline bool
-dp_packet_hwol_is_ipv4(const struct dp_packet *b)
+dp_packet_ol_is_ipv4(const struct dp_packet *b)
 {
 return !!(*dp_packet_ol_flags_ptr(b) & DP_PACKET_OL_TX_IPV4);
 }
 
 /* Returns 'true' if packet 'b' is marked for TCP checksum offloading. */
 static inline bool
-dp_packet_hwol_l4_is_tcp(const struct dp_packet *b)
+dp_packet_ol_l4_is_tcp(const struct dp_packet *b)
 {
 return (*dp_packet_ol_flags_ptr(b) & DP_PACKET_OL_TX_L4_MASK) ==
 DP_PACKET_OL_TX_TCP_CSUM;
@@ -994,7 +994,7 @@ dp_packet_hwol_l4_is_tcp(const struct dp_packet *b)
 
 /* Returns 'true' if packet 'b' is marked for UDP checksum offloading. */
 static inline bool
-dp_packet_hwol_l4_is_udp(struct dp_packet *b)
+dp_packet_ol_l4_is_udp(struct dp_packet *b)
 {
 return (*dp_packet_ol_flags_ptr(b) & DP_PACKET_OL_TX_L4_MASK) ==
 DP_PACKET_OL_TX_UDP_CSUM;
@@ -1002,7 +1002,7 @@ dp_packet_hwol_l4_is_udp(struct dp_packet *b)
 
 /* Returns 'true' if packet 'b' is marked for SCTP checksum offloading. */
 static inline bool
-dp_packet_hwol_l4_is_sctp(struct dp_packet *b)
+dp_packet_ol_l4_is_sctp(struct dp_packet *b)
 {
 return 

[ovs-dev] [[PATCH RFC] 04/17] Rename hwol csum valid to good.

2021-12-07 Thread Flavio Leitner
This better represents the state and uses the same
convention as the flags.

Signed-off-by: Flavio Leitner 
---
 lib/conntrack.c | 4 ++--
 lib/dp-packet.h | 4 ++--
 lib/ipf.c   | 2 +-
 lib/netdev-native-tnl.c | 4 ++--
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index e84ec4aee..907c5ed30 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -2094,7 +2094,7 @@ conn_key_extract(struct conntrack *ct, struct dp_packet *pkt, ovs_be16 dl_type,
 ok = false;
 COVERAGE_INC(conntrack_l3csum_err);
 } else {
-bool hwol_good_l3_csum = dp_packet_ip_csum_valid(pkt)
+bool hwol_good_l3_csum = dp_packet_ip_csum_good(pkt)
  || dp_packet_hwol_is_ipv4(pkt);
 /* Validate the checksum only when hwol is not supported. */
 ok = extract_l3_ipv4(&ctx->key, l3, dp_packet_l3_size(pkt), NULL,
@@ -2109,7 +2109,7 @@ conn_key_extract(struct conntrack *ct, struct dp_packet *pkt, ovs_be16 dl_type,
 if (ok) {
 bool hwol_bad_l4_csum = dp_packet_l4_csum_bad(pkt);
 if (!hwol_bad_l4_csum) {
-bool  hwol_good_l4_csum = dp_packet_l4_csum_valid(pkt)
+bool  hwol_good_l4_csum = dp_packet_l4_csum_good(pkt)
   || dp_packet_hwol_tx_l4_csum(pkt);
 /* Validate the checksum only when hwol is not supported. */
 if (extract_l4(&ctx->key, l4, dp_packet_l4_size(pkt),
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 18faa79c0..5540680cf 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -1056,7 +1056,7 @@ dp_packet_hwol_set_tcp_seg(struct dp_packet *b)
 }
 
 static inline bool
-dp_packet_ip_csum_valid(const struct dp_packet *p)
+dp_packet_ip_csum_good(const struct dp_packet *p)
 {
 return (*dp_packet_ol_flags_ptr(p) & DP_PACKET_OL_RX_IP_CSUM_MASK) ==
 DP_PACKET_OL_RX_IP_CSUM_GOOD;
@@ -1070,7 +1070,7 @@ dp_packet_ip_csum_bad(const struct dp_packet *p)
 }
 
 static inline bool
-dp_packet_l4_csum_valid(const struct dp_packet *p)
+dp_packet_l4_csum_good(const struct dp_packet *p)
 {
 return (*dp_packet_ol_flags_ptr(p) & DP_PACKET_OL_RX_L4_CSUM_MASK) ==
 DP_PACKET_OL_RX_L4_CSUM_GOOD;
diff --git a/lib/ipf.c b/lib/ipf.c
index 013c4cfba..390fbe312 100644
--- a/lib/ipf.c
+++ b/lib/ipf.c
@@ -608,7 +608,7 @@ ipf_is_valid_v4_frag(struct ipf *ipf, struct dp_packet *pkt)
 goto invalid_pkt;
 }
 
-if (OVS_UNLIKELY(!dp_packet_ip_csum_valid(pkt)
+if (OVS_UNLIKELY(!dp_packet_ip_csum_good(pkt)
  && !dp_packet_hwol_is_ipv4(pkt)
  && csum(l3, ip_hdr_len) != 0)) {
 COVERAGE_INC(ipf_l3csum_err);
diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
index 2de424105..40705e190 100644
--- a/lib/netdev-native-tnl.c
+++ b/lib/netdev-native-tnl.c
@@ -88,7 +88,7 @@ netdev_tnl_ip_extract_tnl_md(struct dp_packet *packet, struct flow_tnl *tnl,
 
 ovs_be32 ip_src, ip_dst;
 
-if (OVS_UNLIKELY(!dp_packet_ip_csum_valid(packet))) {
+if (OVS_UNLIKELY(!dp_packet_ip_csum_good(packet))) {
 if (csum(ip, IP_IHL(ip->ip_ihl_ver) * 4)) {
 VLOG_WARN_RL(_rl, "ip packet has invalid checksum");
 return NULL;
@@ -190,7 +190,7 @@ udp_extract_tnl_md(struct dp_packet *packet, struct flow_tnl *tnl,
 }
 
 if (udp->udp_csum) {
-if (OVS_UNLIKELY(!dp_packet_l4_csum_valid(packet))) {
+if (OVS_UNLIKELY(!dp_packet_l4_csum_good(packet))) {
 uint32_t csum;
 if (netdev_tnl_is_header_ipv6(dp_packet_data(packet))) {
 csum = packet_csum_pseudoheader6(dp_packet_l3(packet));
-- 
2.31.1



[ovs-dev] [[PATCH RFC] 03/17] Prefix netdev offload flags with NETDEV_OFFLOAD_.

2021-12-07 Thread Flavio Leitner
Use the 'NETDEV_OFFLOAD_' prefix in the flags to indicate
we are talking about hardware offloading capabilities.

Signed-off-by: Flavio Leitner 
---
 lib/netdev-dpdk.c | 20 ++--
 lib/netdev-linux.c| 10 +-
 lib/netdev-provider.h | 10 +-
 lib/netdev.c  |  8 
 4 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 6fbf19ada..c4618eb22 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -5006,12 +5006,12 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
 
 err = dpdk_eth_dev_init(dev);
 if (dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) {
-netdev->ol_flags |= NETDEV_TX_OFFLOAD_TCP_TSO;
-netdev->ol_flags |= NETDEV_TX_OFFLOAD_TCP_CSUM;
-netdev->ol_flags |= NETDEV_TX_OFFLOAD_UDP_CSUM;
-netdev->ol_flags |= NETDEV_TX_OFFLOAD_IPV4_CSUM;
+netdev->ol_flags |= NETDEV_OFFLOAD_TX_TCP_TSO;
+netdev->ol_flags |= NETDEV_OFFLOAD_TX_TCP_CSUM;
+netdev->ol_flags |= NETDEV_OFFLOAD_TX_UDP_CSUM;
+netdev->ol_flags |= NETDEV_OFFLOAD_TX_IPV4_CSUM;
 if (dev->hw_ol_features & NETDEV_TX_SCTP_CHECKSUM_OFFLOAD) {
-netdev->ol_flags |= NETDEV_TX_OFFLOAD_SCTP_CSUM;
+netdev->ol_flags |= NETDEV_OFFLOAD_TX_SCTP_CSUM;
 }
 }
 
@@ -5153,11 +5153,11 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev)
 }
 
 if (userspace_tso_enabled()) {
-netdev->ol_flags |= NETDEV_TX_OFFLOAD_TCP_TSO;
-netdev->ol_flags |= NETDEV_TX_OFFLOAD_TCP_CSUM;
-netdev->ol_flags |= NETDEV_TX_OFFLOAD_UDP_CSUM;
-netdev->ol_flags |= NETDEV_TX_OFFLOAD_SCTP_CSUM;
-netdev->ol_flags |= NETDEV_TX_OFFLOAD_IPV4_CSUM;
+netdev->ol_flags |= NETDEV_OFFLOAD_TX_TCP_TSO;
+netdev->ol_flags |= NETDEV_OFFLOAD_TX_TCP_CSUM;
+netdev->ol_flags |= NETDEV_OFFLOAD_TX_UDP_CSUM;
+netdev->ol_flags |= NETDEV_OFFLOAD_TX_SCTP_CSUM;
+netdev->ol_flags |= NETDEV_OFFLOAD_TX_IPV4_CSUM;
 vhost_unsup_flags = 1ULL << VIRTIO_NET_F_HOST_ECN
 | 1ULL << VIRTIO_NET_F_HOST_UFO;
 } else {
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index e4b7c72f8..30d552170 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -928,11 +928,11 @@ netdev_linux_common_construct(struct netdev *netdev_)
 ovs_mutex_init(>mutex);
 
 if (userspace_tso_enabled()) {
-netdev_->ol_flags |= NETDEV_TX_OFFLOAD_TCP_TSO;
-netdev_->ol_flags |= NETDEV_TX_OFFLOAD_TCP_CSUM;
-netdev_->ol_flags |= NETDEV_TX_OFFLOAD_UDP_CSUM;
-netdev_->ol_flags |= NETDEV_TX_OFFLOAD_SCTP_CSUM;
-netdev_->ol_flags |= NETDEV_TX_OFFLOAD_IPV4_CSUM;
+netdev_->ol_flags |= NETDEV_OFFLOAD_TX_TCP_TSO;
+netdev_->ol_flags |= NETDEV_OFFLOAD_TX_TCP_CSUM;
+netdev_->ol_flags |= NETDEV_OFFLOAD_TX_UDP_CSUM;
+netdev_->ol_flags |= NETDEV_OFFLOAD_TX_SCTP_CSUM;
+netdev_->ol_flags |= NETDEV_OFFLOAD_TX_IPV4_CSUM;
 }
 
 return 0;
diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h
index 08bf8b871..0a8538615 100644
--- a/lib/netdev-provider.h
+++ b/lib/netdev-provider.h
@@ -38,11 +38,11 @@ struct netdev_tnl_build_header_params;
 #define NETDEV_NUMA_UNSPEC OVS_NUMA_UNSPEC
 
 enum netdev_ol_flags {
-NETDEV_TX_OFFLOAD_IPV4_CSUM = 1 << 0,
-NETDEV_TX_OFFLOAD_TCP_CSUM = 1 << 1,
-NETDEV_TX_OFFLOAD_UDP_CSUM = 1 << 2,
-NETDEV_TX_OFFLOAD_SCTP_CSUM = 1 << 3,
-NETDEV_TX_OFFLOAD_TCP_TSO = 1 << 4,
+NETDEV_OFFLOAD_TX_IPV4_CSUM = 1 << 0,
+NETDEV_OFFLOAD_TX_TCP_CSUM = 1 << 1,
+NETDEV_OFFLOAD_TX_UDP_CSUM = 1 << 2,
+NETDEV_OFFLOAD_TX_SCTP_CSUM = 1 << 3,
+NETDEV_OFFLOAD_TX_TCP_TSO = 1 << 4,
 };
 
 /* A network device (e.g. an Ethernet device).
diff --git a/lib/netdev.c b/lib/netdev.c
index e9b2bbe83..a06138aca 100644
--- a/lib/netdev.c
+++ b/lib/netdev.c
@@ -795,7 +795,7 @@ netdev_send_prepare_packet(const uint64_t netdev_flags,
 uint64_t l4_mask;
 
 if (dp_packet_hwol_is_tso(packet)
-&& !(netdev_flags & NETDEV_TX_OFFLOAD_TCP_TSO)) {
+&& !(netdev_flags & NETDEV_OFFLOAD_TX_TCP_TSO)) {
 /* Fall back to GSO in software. */
 VLOG_ERR_BUF(errormsg, "No TSO support");
 return false;
@@ -804,19 +804,19 @@ netdev_send_prepare_packet(const uint64_t netdev_flags,
 l4_mask = dp_packet_hwol_l4_mask(packet);
 if (l4_mask) {
 if (dp_packet_hwol_l4_is_tcp(packet)) {
-if (!(netdev_flags & NETDEV_TX_OFFLOAD_TCP_CSUM)) {
+if (!(netdev_flags & NETDEV_OFFLOAD_TX_TCP_CSUM)) {
 /* Fall back to TCP csum in software. */
 

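Patch 03/17 only renames the netdev capability flags, but the netdev_send_prepare_packet() hunk shows how they are used: the packet's TX offload requests are compared with what the egress netdev advertises in ol_flags. The design notes say the data path should fall back to software when a feature is missing; in the hunk as shown, a TSO packet on a netdev without TSO support still makes the function return false, and the L4 branch is cut off after its "Fall back to TCP csum in software" comment. This minimal sketch models the documented intent for L4 checksums and keeps the TSO case as a refusal. It assumes the bit values of enum netdev_ol_flags above and the generic DP_PACKET_OL_TX_* values from patch 02/17; every name in it is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical copies of the netdev capability bits (enum netdev_ol_flags). */
#define SK_NETDEV_TX_IPV4_CSUM (1ULL << 0)
#define SK_NETDEV_TX_TCP_CSUM  (1ULL << 1)
#define SK_NETDEV_TX_UDP_CSUM  (1ULL << 2)
#define SK_NETDEV_TX_SCTP_CSUM (1ULL << 3)
#define SK_NETDEV_TX_TCP_TSO   (1ULL << 4)

/* Hypothetical copies of the generic per-packet TX request bits. */
#define SK_PKT_TX_TCP_SEG   0x40ULL
#define SK_PKT_TX_TCP_CSUM  0x200ULL
#define SK_PKT_TX_UDP_CSUM  0x400ULL
#define SK_PKT_TX_SCTP_CSUM 0x800ULL
#define SK_PKT_TX_L4_MASK   (SK_PKT_TX_TCP_CSUM | SK_PKT_TX_UDP_CSUM \
                             | SK_PKT_TX_SCTP_CSUM)

enum egress_action {
    EGRESS_SEND,          /* Hardware handles every requested offload. */
    EGRESS_SW_L4_CSUM,    /* Compute the L4 checksum in software first. */
    EGRESS_DROP_NO_TSO,   /* TSO requested but unsupported (no SW GSO here). */
};

/* Returns the netdev capability bit needed for the packet's L4 request,
 * 0 if none was requested, or UINT64_MAX if the request is malformed
 * (more than one L4 bit set), which no real flags word satisfies. */
static uint64_t
l4_csum_cap_needed(uint64_t pkt_flags)
{
    switch (pkt_flags & SK_PKT_TX_L4_MASK) {
    case 0:
        return 0;
    case SK_PKT_TX_TCP_CSUM:
        return SK_NETDEV_TX_TCP_CSUM;
    case SK_PKT_TX_UDP_CSUM:
        return SK_NETDEV_TX_UDP_CSUM;
    case SK_PKT_TX_SCTP_CSUM:
        return SK_NETDEV_TX_SCTP_CSUM;
    default:
        return UINT64_MAX;
    }
}

static enum egress_action
egress_prepare(uint64_t pkt_flags, uint64_t netdev_flags)
{
    uint64_t cap;

    if ((pkt_flags & SK_PKT_TX_TCP_SEG)
        && !(netdev_flags & SK_NETDEV_TX_TCP_TSO)) {
        return EGRESS_DROP_NO_TSO;
    }

    cap = l4_csum_cap_needed(pkt_flags);
    if (cap && (netdev_flags & cap) != cap) {
        return EGRESS_SW_L4_CSUM;
    }
    return EGRESS_SEND;
}
```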
[ovs-dev] [[PATCH RFC] 02/17] Rename flags with CKSUM to CSUM.

2021-12-07 Thread Flavio Leitner
It seems csum is more common and shorter.

Signed-off-by: Flavio Leitner 
---
 lib/dp-packet.h   | 72 +--
 lib/netdev-dpdk.c | 16 +-
 lib/netdev-linux.c|  8 ++---
 lib/netdev-provider.h |  8 ++---
 lib/netdev.c  |  6 ++--
 5 files changed, 55 insertions(+), 55 deletions(-)

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index ee8451496..18faa79c0 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -62,13 +62,13 @@ enum dp_packet_offload_mask {
 /* Is the 'flow_mark' valid? */
 DEF_OL_FLAG(DP_PACKET_OL_FLOW_MARK, PKT_RX_FDIR_ID, 0x2),
 /* Bad L4 checksum in the packet. */
-DEF_OL_FLAG(DP_PACKET_OL_RX_L4_CKSUM_BAD, PKT_RX_L4_CKSUM_BAD, 0x4),
+DEF_OL_FLAG(DP_PACKET_OL_RX_L4_CSUM_BAD, PKT_RX_L4_CKSUM_BAD, 0x4),
 /* Bad IP checksum in the packet. */
-DEF_OL_FLAG(DP_PACKET_OL_RX_IP_CKSUM_BAD, PKT_RX_IP_CKSUM_BAD, 0x8),
+DEF_OL_FLAG(DP_PACKET_OL_RX_IP_CSUM_BAD, PKT_RX_IP_CKSUM_BAD, 0x8),
 /* Valid L4 checksum in the packet. */
-DEF_OL_FLAG(DP_PACKET_OL_RX_L4_CKSUM_GOOD, PKT_RX_L4_CKSUM_GOOD, 0x10),
+DEF_OL_FLAG(DP_PACKET_OL_RX_L4_CSUM_GOOD, PKT_RX_L4_CKSUM_GOOD, 0x10),
 /* Valid IP checksum in the packet. */
-DEF_OL_FLAG(DP_PACKET_OL_RX_IP_CKSUM_GOOD, PKT_RX_IP_CKSUM_GOOD, 0x20),
+DEF_OL_FLAG(DP_PACKET_OL_RX_IP_CSUM_GOOD, PKT_RX_IP_CKSUM_GOOD, 0x20),
 /* TCP Segmentation Offload. */
 DEF_OL_FLAG(DP_PACKET_OL_TX_TCP_SEG, PKT_TX_TCP_SEG, 0x40),
 /* Offloaded packet is IPv4. */
@@ -76,34 +76,34 @@ enum dp_packet_offload_mask {
 /* Offloaded packet is IPv6. */
 DEF_OL_FLAG(DP_PACKET_OL_TX_IPV6, PKT_TX_IPV6, 0x100),
 /* Offload TCP checksum. */
-DEF_OL_FLAG(DP_PACKET_OL_TX_TCP_CKSUM, PKT_TX_TCP_CKSUM, 0x200),
+DEF_OL_FLAG(DP_PACKET_OL_TX_TCP_CSUM, PKT_TX_TCP_CKSUM, 0x200),
 /* Offload UDP checksum. */
-DEF_OL_FLAG(DP_PACKET_OL_TX_UDP_CKSUM, PKT_TX_UDP_CKSUM, 0x400),
+DEF_OL_FLAG(DP_PACKET_OL_TX_UDP_CSUM, PKT_TX_UDP_CKSUM, 0x400),
 /* Offload SCTP checksum. */
-DEF_OL_FLAG(DP_PACKET_OL_TX_SCTP_CKSUM, PKT_TX_SCTP_CKSUM, 0x800),
+DEF_OL_FLAG(DP_PACKET_OL_TX_SCTP_CSUM, PKT_TX_SCTP_CKSUM, 0x800),
 /* Adding new field requires adding to DP_PACKET_OL_SUPPORTED_MASK. */
 };
 
 #define DP_PACKET_OL_SUPPORTED_MASK (DP_PACKET_OL_RSS_HASH | \
  DP_PACKET_OL_FLOW_MARK| \
- DP_PACKET_OL_RX_L4_CKSUM_BAD  | \
- DP_PACKET_OL_RX_IP_CKSUM_BAD  | \
- DP_PACKET_OL_RX_L4_CKSUM_GOOD | \
- DP_PACKET_OL_RX_IP_CKSUM_GOOD | \
+ DP_PACKET_OL_RX_L4_CSUM_BAD  | \
+ DP_PACKET_OL_RX_IP_CSUM_BAD  | \
+ DP_PACKET_OL_RX_L4_CSUM_GOOD | \
+ DP_PACKET_OL_RX_IP_CSUM_GOOD | \
  DP_PACKET_OL_TX_TCP_SEG   | \
  DP_PACKET_OL_TX_IPV4  | \
  DP_PACKET_OL_TX_IPV6  | \
- DP_PACKET_OL_TX_TCP_CKSUM | \
- DP_PACKET_OL_TX_UDP_CKSUM | \
- DP_PACKET_OL_TX_SCTP_CKSUM)
-
-#define DP_PACKET_OL_TX_L4_MASK (DP_PACKET_OL_TX_TCP_CKSUM | \
- DP_PACKET_OL_TX_UDP_CKSUM | \
- DP_PACKET_OL_TX_SCTP_CKSUM)
-#define DP_PACKET_OL_RX_IP_CKSUM_MASK (DP_PACKET_OL_RX_IP_CKSUM_GOOD | \
-   DP_PACKET_OL_RX_IP_CKSUM_BAD)
-#define DP_PACKET_OL_RX_L4_CKSUM_MASK (DP_PACKET_OL_RX_L4_CKSUM_GOOD | \
-   DP_PACKET_OL_RX_L4_CKSUM_BAD)
+ DP_PACKET_OL_TX_TCP_CSUM | \
+ DP_PACKET_OL_TX_UDP_CSUM | \
+ DP_PACKET_OL_TX_SCTP_CSUM)
+
+#define DP_PACKET_OL_TX_L4_MASK (DP_PACKET_OL_TX_TCP_CSUM | \
+ DP_PACKET_OL_TX_UDP_CSUM | \
+ DP_PACKET_OL_TX_SCTP_CSUM)
+#define DP_PACKET_OL_RX_IP_CSUM_MASK (DP_PACKET_OL_RX_IP_CSUM_GOOD | \
+   DP_PACKET_OL_RX_IP_CSUM_BAD)
+#define DP_PACKET_OL_RX_L4_CSUM_MASK (DP_PACKET_OL_RX_L4_CSUM_GOOD | \
+   DP_PACKET_OL_RX_L4_CSUM_BAD)
 
 /* Buffer for holding packet data.  A dp_packet is automatically reallocated
  * as necessary if it grows too large for the available memory.
@@ -989,7 +989,7 @@ static inline bool
 dp_packet_hwol_l4_is_tcp(const struct dp_packet *b)
 {
 return (*dp_packet_ol_flags_ptr(b) & DP_PACKET_OL_TX_L4_MASK) ==
-DP_PACKET_OL_TX_TCP_CKSUM;
+DP_PACKET_OL_TX_TCP_CSUM;
 }
 
 /* Returns '
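The RX checksum helpers do not test a single bit. They first mask out both the GOOD and BAD bits and then compare the result, so a packet counts as good (or bad) only when exactly that one bit is set. If neither bit is set, or both are, the packet is treated as neither, and callers such as conn_key_extract() fall back to software validation. The sketch below is a minimal standalone version that uses the bit values from the DEF_OL_FLAG() table above. The OL_* names are hypothetical stand-ins for the DP_PACKET_OL_* flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values copied from the DEF_OL_FLAG() entries above; the names are
 * hypothetical stand-ins for DP_PACKET_OL_RX_L4_CSUM_{BAD,GOOD}. */
#define OL_RX_L4_CSUM_BAD   0x4
#define OL_RX_L4_CSUM_GOOD  0x10
#define OL_RX_L4_CSUM_MASK  (OL_RX_L4_CSUM_GOOD | OL_RX_L4_CSUM_BAD)

/* Like dp_packet_l4_csum_valid(): GOOD set and BAD clear. */
static bool
l4_csum_good(uint32_t ol_flags)
{
    return (ol_flags & OL_RX_L4_CSUM_MASK) == OL_RX_L4_CSUM_GOOD;
}

/* Like dp_packet_l4_csum_bad(): BAD set and GOOD clear. */
static bool
l4_csum_bad(uint32_t ol_flags)
{
    return (ol_flags & OL_RX_L4_CSUM_MASK) == OL_RX_L4_CSUM_BAD;
}
```

Unrelated flag bits, such as the IP checksum bits, are ignored by the mask. A value of 0 or 0x14 is therefore neither good nor bad.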

[ovs-dev] [[PATCH RFC] 01/17] Rename checksum to csum in hwol functions.

2021-12-07 Thread Flavio Leitner
It seems csum is more common and shorter.

Signed-off-by: Flavio Leitner 
---
 lib/conntrack.c | 12 ++--
 lib/dp-packet.h | 10 +-
 lib/ipf.c   |  4 ++--
 lib/netdev-native-tnl.c |  4 ++--
 4 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 33a1a9295..e84ec4aee 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -2089,12 +2089,12 @@ conn_key_extract(struct conntrack *ct, struct dp_packet *pkt, ovs_be16 dl_type,
 ctx->key.dl_type = dl_type;
 
 if (ctx->key.dl_type == htons(ETH_TYPE_IP)) {
-bool hwol_bad_l3_csum = dp_packet_ip_checksum_bad(pkt);
+bool hwol_bad_l3_csum = dp_packet_ip_csum_bad(pkt);
 if (hwol_bad_l3_csum) {
 ok = false;
 COVERAGE_INC(conntrack_l3csum_err);
 } else {
-bool hwol_good_l3_csum = dp_packet_ip_checksum_valid(pkt)
+bool hwol_good_l3_csum = dp_packet_ip_csum_valid(pkt)
  || dp_packet_hwol_is_ipv4(pkt);
 /* Validate the checksum only when hwol is not supported. */
 ok = extract_l3_ipv4(&ctx->key, l3, dp_packet_l3_size(pkt), NULL,
@@ -2107,10 +2107,10 @@ conn_key_extract(struct conntrack *ct, struct dp_packet *pkt, ovs_be16 dl_type,
 }
 
 if (ok) {
-bool hwol_bad_l4_csum = dp_packet_l4_checksum_bad(pkt);
+bool hwol_bad_l4_csum = dp_packet_l4_csum_bad(pkt);
 if (!hwol_bad_l4_csum) {
-bool  hwol_good_l4_csum = dp_packet_l4_checksum_valid(pkt)
-  || dp_packet_hwol_tx_l4_checksum(pkt);
+bool  hwol_good_l4_csum = dp_packet_l4_csum_valid(pkt)
+  || dp_packet_hwol_tx_l4_csum(pkt);
 /* Validate the checksum only when hwol is not supported. */
 if (extract_l4(&ctx->key, l4, dp_packet_l4_size(pkt),
 &ctx->icmp_related, l3, !hwol_good_l4_csum,
@@ -3424,7 +3424,7 @@ handle_ftp_ctl(struct conntrack *ct, const struct conn_lookup_ctx *ctx,
 }
 
 th->tcp_csum = 0;
-if (!dp_packet_hwol_tx_l4_checksum(pkt)) {
+if (!dp_packet_hwol_tx_l4_csum(pkt)) {
 if (ctx->key.dl_type == htons(ETH_TYPE_IPV6)) {
 th->tcp_csum = packet_csum_upperlayer6(nh6, th, ctx->key.nw_proto,
dp_packet_l4_size(pkt));
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 3dc582fbf..ee8451496 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -965,7 +965,7 @@ dp_packet_hwol_l4_mask(const struct dp_packet *b)
 
 /* Return true if the packet 'b' requested L4 checksum offload. */
 static inline bool
-dp_packet_hwol_tx_l4_checksum(const struct dp_packet *b)
+dp_packet_hwol_tx_l4_csum(const struct dp_packet *b)
 {
 return !!dp_packet_hwol_l4_mask(b);
 }
@@ -1056,28 +1056,28 @@ dp_packet_hwol_set_tcp_seg(struct dp_packet *b)
 }
 
 static inline bool
-dp_packet_ip_checksum_valid(const struct dp_packet *p)
+dp_packet_ip_csum_valid(const struct dp_packet *p)
 {
 return (*dp_packet_ol_flags_ptr(p) & DP_PACKET_OL_RX_IP_CKSUM_MASK) ==
 DP_PACKET_OL_RX_IP_CKSUM_GOOD;
 }
 
 static inline bool
-dp_packet_ip_checksum_bad(const struct dp_packet *p)
+dp_packet_ip_csum_bad(const struct dp_packet *p)
 {
 return (*dp_packet_ol_flags_ptr(p) & DP_PACKET_OL_RX_IP_CKSUM_MASK) ==
 DP_PACKET_OL_RX_IP_CKSUM_BAD;
 }
 
 static inline bool
-dp_packet_l4_checksum_valid(const struct dp_packet *p)
+dp_packet_l4_csum_valid(const struct dp_packet *p)
 {
 return (*dp_packet_ol_flags_ptr(p) & DP_PACKET_OL_RX_L4_CKSUM_MASK) ==
 DP_PACKET_OL_RX_L4_CKSUM_GOOD;
 }
 
 static inline bool
-dp_packet_l4_checksum_bad(const struct dp_packet *p)
+dp_packet_l4_csum_bad(const struct dp_packet *p)
 {
 return (*dp_packet_ol_flags_ptr(p) & DP_PACKET_OL_RX_L4_CKSUM_MASK) ==
 DP_PACKET_OL_RX_L4_CKSUM_BAD;
diff --git a/lib/ipf.c b/lib/ipf.c
index 507db2aea..013c4cfba 100644
--- a/lib/ipf.c
+++ b/lib/ipf.c
@@ -574,7 +574,7 @@ ipf_list_state_transition(struct ipf *ipf, struct ipf_list *ipf_list,
 static bool
 ipf_is_valid_v4_frag(struct ipf *ipf, struct dp_packet *pkt)
 {
-if (OVS_UNLIKELY(dp_packet_ip_checksum_bad(pkt))) {
+if (OVS_UNLIKELY(dp_packet_ip_csum_bad(pkt))) {
 COVERAGE_INC(ipf_l3csum_err);
 goto invalid_pkt;
 }
@@ -608,7 +608,7 @@ ipf_is_valid_v4_frag(struct ipf *ipf, struct dp_packet *pkt)
 goto invalid_pkt;
 }
 
-if (OVS_UNLIKELY(!dp_packet_ip_checksum_valid(pkt)
+if (OVS_UNLIKELY(!dp_packet_ip_csum_valid(pkt)
  && !dp_packet_hwol_is_ipv4(pkt)
  && csum(l3, ip_hdr_len) != 0)) {
 COVERAGE_INC(ipf_l3csum_err);
diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
index b89dfdd52..2de424105 100644
--- a/lib/netdev-native-tnl.c
+++ b/lib/netdev-native-tnl.c
@@ -88,7 +88,7 @@ ne

[ovs-dev] [[PATCH RFC] 00/17] Enable TSO in userspace by default.

2021-12-07 Thread Flavio Leitner
This patch series is at RFC stage, though some of the renaming
changes could go in independently of the rest of the series.

The goal is to enable NIC csum and segmentation offloading by
default in the OVS userspace datapath, with and without DPDK.
Other Linux software devices, such as tap (br) and socket (veth)
netdevs, are also supported.

The performance gain depends on the use case. For example, if OVS
is just forwarding between two ports, checksum offloading offers
no gain. However, with more complex flow tables that modify
packets, like the ones generated by OVN, checksum offloading
can improve performance.
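Without offload, software has to produce a correct L4 checksum for every modified packet. It does this either with a full ones' complement sum over the segment, as sketched below (RFC 1071 style), or with an incremental update when only a few fields change. Offloading moves that work to the egress NIC. This is a minimal sketch. inet_csum() is a hypothetical name, not OVS's own csum() helper used in ipf_is_valid_v4_frag(), and the sketch omits the TCP/UDP pseudo-header.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: 16-bit ones' complement Internet checksum over
 * 'len' bytes, treating the data as big-endian 16-bit words.  Returns the
 * checksum as a host-order value. */
static uint16_t
inet_csum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {
        sum += (uint32_t) ((data[0] << 8) | data[1]);
        data += 2;
        len -= 2;
    }
    if (len) {
        /* An odd trailing byte is padded with a zero byte. */
        sum += (uint32_t) data[0] << 8;
    }
    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) ~sum;
}
```

For example, the bytes 00 01 f2 03 f4 f5 f6 f7 yield 0x220d. Re-summing the data with that checksum appended yields 0, which is how a receiver verifies it. The cost is linear in the payload size, and offloading avoids exactly this cost.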

TCP Segmentation Offload (TSO) helps regardless of the flow
tables because, instead of handling many MSS-sized frames, OVS
can process one big packet. In some cases, this improves
throughput by up to 6x.
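Here is a back-of-the-envelope illustration of the TSO argument. One 64 KiB TSO packet with a 1448-byte MSS stands in for 46 wire-sized frames, so the datapath does roughly one lookup and one set of actions instead of 46. This is a minimal sketch. tso_segment_count() and tso_last_segment_len() are hypothetical helpers, not the GSO code from this series.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of MSS-sized segments that a TSO packet
 * carrying 'payload_len' bytes of TCP payload expands into. */
static size_t
tso_segment_count(size_t payload_len, size_t mss)
{
    assert(mss > 0);
    return (payload_len + mss - 1) / mss;   /* Round up. */
}

/* Payload bytes carried by the final, possibly shorter, segment. */
static size_t
tso_last_segment_len(size_t payload_len, size_t mss)
{
    size_t n = tso_segment_count(payload_len, mss);

    return n ? payload_len - (n - 1) * mss : 0;
}
```

With a 1448-byte MSS, 65536 bytes of payload become 45 full segments plus a final 376-byte segment.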

Brief documentation is added describing how this is
supposed to work.

An (untested) segmentation implementation is provided to check
whether the approach is good enough. Some of the challenges are
described in its commit message.

The series currently lacks knobs to control each feature per
port. Are they desired?


Flavio Leitner (17):
  Rename checksum to csum in hwol functions.
  Rename flags with CKSUM to CSUM.
  Prefix netdev offload flags with NETDEV_OFFLOAD_.
  Rename hwol csum valid to good.
  Rename dp_packet_hwol to dp_packet_ol.
  dp-packet: Use p for packet and b for batch.
  dp-packet: Rename dp_packet_ol_tcp_seg
  dp-packet: Rename dp_packet_ol_is_ipv4.
  dp-packet: Rename dp_packet_ol l4 functions.
  dp-packet: Add _ol_ to functions using OL flags.
  Document netdev offload.
  Show netdev offloading flags.
  Enable IP checksum offloading by default.
  Enable L4 csum offloading by default.
  Respect tso/gso segment size.
  Add Generic Segmentation Offloading.
  Enable TSO if available.

 Documentation/automake.mk|   1 +
 Documentation/topics/index.rst   |   1 +
 Documentation/topics/nic-offloads.rst|  95 +++
 Documentation/topics/userspace-tso.rst   |  12 -
 lib/automake.mk  |   4 +-
 lib/conntrack.c  |  28 +-
 lib/dp-packet-gso.c  | 153 +
 lib/{userspace-tso.h => dp-packet-gso.h} |  13 +-
 lib/dp-packet.c  | 378 ++--
 lib/dp-packet.h  | 720 ++-
 lib/dpif-netdev-unixctl.man  |   5 +
 lib/dpif-netdev.c|  58 ++
 lib/dpif.h   |   2 +-
 lib/flow.c   |  37 +-
 lib/ipf.c|  13 +-
 lib/netdev-dpdk.c| 288 +
 lib/netdev-dummy.c   |  21 +
 lib/netdev-linux.c   | 430 +++---
 lib/netdev-native-tnl.c  |  53 +-
 lib/netdev-provider.h|  13 +-
 lib/netdev.c | 183 --
 lib/odp-execute.c|  21 +-
 lib/packets.c| 210 +--
 lib/packets.h|   3 +
 lib/userspace-tso.c  |  48 --
 ofproto/ofproto-dpif-upcall.c|  14 +-
 tests/automake.mk|   1 +
 tests/dpif-netdev.at |  21 +
 tests/system-userspace-offload.at|  79 +++
 tests/system-userspace-testsuite.at  |   1 +
 vswitchd/bridge.c|   2 -
 vswitchd/vswitch.xml |  20 -
 32 files changed, 1858 insertions(+), 1070 deletions(-)
 create mode 100644 Documentation/topics/nic-offloads.rst
 create mode 100644 lib/dp-packet-gso.c
 rename lib/{userspace-tso.h => dp-packet-gso.h} (67%)
 delete mode 100644 lib/userspace-tso.c
 create mode 100644 tests/system-userspace-offload.at

-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 2/4] Native tunnel: Add tnl/neigh/aging command.

2021-11-29 Thread Flavio Leitner
On Sat, Nov 27, 2021 at 12:12:51AM +0100, Paolo Valerio wrote:
> With this command, it is now possible to change the aging time of the
> cache entries.
>
> For existing entries, the aging time is updated only if the
> current expiration is greater than the new one. In any case, the next
> refresh will set it to the new value.
>
> This is intended mostly for debugging purposes.
> 
> Signed-off-by: Paolo Valerio 
> ---

Acked-by: Flavio Leitner 

Thanks Paolo,
fbl
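The update rule in the commit message can be sketched in standalone C11. Under that rule, an existing entry's expiration is pulled in only if it currently expires later than now plus the new aging time. This is a minimal sketch with hypothetical helper names. The actual patch parses the value with ovs_scan() and stores it in an OVS atomic, as tnl_neigh_cache_aging() in the v2 posting shows. The _MS/_S suffixes follow the unit-suffix naming discussed in this thread.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names; the unit is carried in the suffix. */
#define NEIGH_ENTRY_DEFAULT_IDLE_TIME_MS (15 * 60 * 1000)
#define NEIGH_ENTRY_MAX_AGING_TIME_S     3600

static _Atomic uint32_t neigh_aging_ms = NEIGH_ENTRY_DEFAULT_IDLE_TIME_MS;

/* Read path: report the aging time in seconds. */
static uint32_t
neigh_aging_get_s(void)
{
    return atomic_load_explicit(&neigh_aging_ms, memory_order_acquire) / 1000;
}

/* Write path: accept 1..NEIGH_ENTRY_MAX_AGING_TIME_S seconds and store
 * the value in milliseconds.  Returns false (and changes nothing) on a
 * malformed or out-of-range argument. */
static bool
neigh_aging_set_s(const char *arg)
{
    char *end;
    unsigned long secs;

    errno = 0;
    secs = strtoul(arg, &end, 10);
    if (errno || end == arg || *end != '\0'
        || secs == 0 || secs > NEIGH_ENTRY_MAX_AGING_TIME_S) {
        return false;
    }
    atomic_store_explicit(&neigh_aging_ms, (uint32_t) (secs * 1000),
                          memory_order_release);
    return true;
}

/* Existing entries: only shorten the expiration, never extend it; the
 * next refresh applies the new aging time anyway. */
static long long
neigh_adjust_expiry(long long cur_exp_ms, long long now_ms)
{
    long long new_exp = now_ms
        + atomic_load_explicit(&neigh_aging_ms, memory_order_acquire);

    return cur_exp_ms > new_exp ? new_exp : cur_exp_ms;
}
```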


Re: [ovs-dev] [PATCH v2 2/4] Native tunnel: Add tnl/neigh/aging command.

2021-11-25 Thread Flavio Leitner
On Thu, Nov 25, 2021 at 05:34:27PM +0100, Paolo Valerio wrote:
> Flavio Leitner  writes:
> 
> > On Wed, Nov 10, 2021 at 11:46:42AM +0100, Paolo Valerio wrote:
> >> With this command, it is now possible to change the aging time of the
> >> cache entries.
> >>
> >> For existing entries, the aging time is updated only if the
> >> current expiration is greater than the new one. In any case, the next
> >> refresh will set it to the new value.
> >>
> >> This is intended mostly for debugging purposes.
> >> 
> >> Signed-off-by: Paolo Valerio 
> >> ---
> >> v2:
> >> - fixed NEIGH_ENTRY_MAX_AGEING_TIME (turned to seconds) correcting a
> >>   leftover.
> >> - turned relaxed atomics to acq/rel.
> >> - added range checks to tunnel-push-pop.at. It was useless to
> >>   duplicate the test for both ipv6 and ipv4, so only the latter
> >>   includes it.
> >> - slightly modified the NEWS entry.
> >> ---
> >>  NEWS|2 +
> >>  lib/tnl-neigh-cache.c   |   79 +++
> >>  ofproto/ofproto-tnl-unixctl.man |9 
> >>  tests/tunnel-push-pop-ipv6.at   |   30 +++
> >>  tests/tunnel-push-pop.at|   47 +++
> >>  5 files changed, 158 insertions(+), 9 deletions(-)
> >> 
> >> diff --git a/NEWS b/NEWS
> >> index 434ee570f..1aa233a0d 100644
> >> --- a/NEWS
> >> +++ b/NEWS
> >> @@ -16,6 +16,8 @@ Post-v2.16.0
> >> - ovs-dpctl and 'ovs-appctl dpctl/':
> >>   * New commands 'cache-get-size' and 'cache-set-size' that allows to
> >> get or configure linux kernel datapath cache sizes.
> >> +   - ovs-appctl:
> >> + * New command tnl/neigh/aging to read/write the neigh aging time.
> >>  
> >>  
> >>  v2.16.0 - 16 Aug 2021
> >> diff --git a/lib/tnl-neigh-cache.c b/lib/tnl-neigh-cache.c
> >> index 1e6cc31db..a4d56e4cc 100644
> >> --- a/lib/tnl-neigh-cache.c
> >> +++ b/lib/tnl-neigh-cache.c
> >> @@ -46,6 +46,7 @@
> >>  
> >>  
> >>  #define NEIGH_ENTRY_DEFAULT_IDLE_TIME_MS  (15 * 60 * 1000)
> >> +#define NEIGH_ENTRY_MAX_AGING_TIME  3600
> >
> > Shouldn't we include the unit suffix here too?
> >
> 
> We could. For consistency, I think we should add "_S".

Yup, that's fine by me.

> Is that OK, or do you prefer something else, like _SEC?

Well, as you pointed out, consistency matters, so in that
case I think _MS should have been _MSEC as well.

fbl


Re: [ovs-dev] [PATCH v1] configure: Allow opt-in to CPU ISA opts at compile time

2021-11-25 Thread Flavio Leitner
On Mon, Sep 13, 2021 at 02:36:41PM +, Van Haaren, Harry wrote:
> > -Original Message-
> > From: Eelco Chaudron 
> > Sent: Friday, September 10, 2021 3:41 PM
> > To: Van Haaren, Harry ; i.maxim...@ovn.org;
> > Stokes, Ian ; f...@sysclose.org
> > Cc: Amber, Kumar ; ovs-dev@openvswitch.org;
> > ktray...@redhat.com
> > Subject: Re: [PATCH v1] configure: Allow opt-in to CPU ISA opts at compile time
> > 
> > 
> > 
> > On 8 Sep 2021, at 17:28, Van Haaren, Harry wrote:
> > 
> > >> -Original Message-
> > >> From: Eelco Chaudron 
> > >> Sent: Wednesday, September 8, 2021 9:16 AM
> > >> To: Amber, Kumar 
> > >> Cc: ovs-dev@openvswitch.org; ktray...@redhat.com; i.maxim...@ovn.org;
> > >> Stokes, Ian ; f...@sysclose.org; Van Haaren, Harry
> > >> 
> > >> Subject: Re: [PATCH v1] configure: Allow opt-in to CPU ISA opts at compile time
> > >>
> > >> Not a real review of the patch, just some comments/questions from glancing
> > >> over the patch.
> > >
> > > Sure, thanks for input.
> > >
> > >> On 3 Sep 2021, at 15:53, Kumar Amber wrote:
> > >>
> > >>> This commit allows "opt-in" to CPU ISA optimized implementations of
> > >>> OVS SW datapath components at compile time. This can be useful in some
> > >>> deployments where the CPU ISA optimized implementation is to be chosen
> > >>> by default.
> > >
> > > 
> > >
> > >>> +Enabling all AVX512 options
> > >>> +---
> > >>> +
> > >>> +A user can enable all three optimized AVX512 options (DPIF, Miniflow
> > >>> +Extract and DPCLS) at build time, if the CPU supports the required AVX512
> > >>> +ISA, by using the following command ::
> > >>> +
> > >>> +./configure --enable-cpu-isa
> > >>
> > >> If we have different ISA architectures, e.g., x86 vs. ARM, we are OK with
> > >> a single option. Have you thought about AMD adding its own AMDXXX
> > >> instructions in addition to AVX512? How would this configuration option
> > >> work? Maybe an optional option to prioritize one over the other.
> > >
> > > The ISA enabling efforts have been generic so far; any reference to a
> > > specific ISA (e.g. AVX512) has been solely in the implementation choice,
> > > never in a general component. The intention here is to stay in line with
> > > that, and "enable CPU ISA" seemed a logical string to achieve that.
> > >
> > > It is of course possible to provide multiple configure command lines, but
> > > I was hoping to avoid creating too many compile-time flags. I think
> > > projects typically try to avoid them because they expand testing &
> > > validation. A single flag keeps that overhead to a minimum...
> > >
> > > Typically ISA sets have a "good - better - best" type relationship, which
> > > could lead to a general acceptance of which ISA is best. We have runtime
> > > functions to switch implementations, so today the code already enables a
> > > lot of runtime/dynamic updating of the implementation. If there's a need
> > > to expose that at compile time too, then that's easy to add, but it comes
> > > with a burden in testing & validation...
> > 
> > The main reason I mention this is the inconsistent behavior across
> > builds/releases. With this flag being as general as it is, if someone
> > decides to add AVX1024, it now gets selected as the default ISA function
> > (assuming the target already supports it). This is a change in behavior
> > that happens without any configure option change. The difference from any
> > other general change is that this is not a global change, but something
> > that changes based on your target.
> 
> Note that the suggested default for this option is "off", so this is an
> entirely *opt-in* strategy that allows people to deploy OVS and
> automatically benefit from CPU ISA.
>
> To be more specific, this feature is a request from folks who intend to
> deploy with CPU ISA enabled by default. A compile-time switch that enables
> it by default suits their CI/CD/QA tooling and eases validation, as the CPU
> ISA implementation will get picked up automatically when available.
>
> Note that it is not a "change in behaviour", because functionally it's
> identical. (The fuzzing, autovalidation & unit tests are there to ensure it
> is functionally identical.) It is correct that there is a change in the
> default *implementation* of the functionality, which I think is what you
> meant (just clarifying that the "change in behaviour" is not a change in
> "functional behaviour", only in the "implementation of behaviour").
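This point separates functional behaviour from the default implementation. The sketch below shows one way to combine a compile-time opt-in with a runtime CPU check, assuming a hypothetical EXAMPLE_ENABLE_CPU_ISA macro that is not what --enable-cpu-isa actually defines. The "AVX512" variant is the same scalar loop compiled with target("avx512f"), which applies to GCC/Clang on x86-64 only, so both paths return identical results. Only the implementation that is selected by default changes.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t (*sum_impl_fn)(const uint32_t *v, size_t n);

/* Portable implementation: always available, always the fallback. */
static uint64_t
sum_scalar(const uint32_t *v, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++) {
        total += v[i];
    }
    return total;
}

#if defined(__GNUC__) && defined(__x86_64__)
/* Same algorithm compiled with AVX512F enabled, so the compiler may
 * vectorize it.  Must only be called on CPUs that support AVX512F. */
__attribute__((target("avx512f"), unused))
static uint64_t
sum_avx512(const uint32_t *v, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++) {
        total += v[i];
    }
    return total;
}
#endif

/* Default selection: the ISA variant is picked only if the build opted in
 * AND the running CPU supports it; otherwise the scalar path is used. */
static sum_impl_fn
sum_select_default(void)
{
#if defined(EXAMPLE_ENABLE_CPU_ISA) && defined(__GNUC__) && defined(__x86_64__)
    if (__builtin_cpu_supports("avx512f")) {
        return sum_avx512;
    }
#endif
    return sum_scalar;
}
```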

It seems to me this comes from a distro release-process requirement:
not going back in the process to change the RPM package.

To give others some background, the release process roughly works
like this: we build an RPM package, send it to QA and, if it gets
approved, ship it to users/customers.

Therefore, there is no way to go back to rebuild/change the RPM to

Re: [ovs-dev] [PATCH v2 1/2] stream-ssl: Fix handling of default ciphers/protocols

2021-11-25 Thread Flavio Leitner
On Thu, Nov 25, 2021 at 09:22:20AM +0100, Frode Nordahl wrote:
> On Wed, Nov 24, 2021 at 9:31 PM Flavio Leitner  wrote:
> >
> > On Mon, Nov 15, 2021 at 10:40:47AM +0100, Frode Nordahl wrote:
> > > On Mon, Sep 13, 2021 at 4:23 AM Frode Nordahl wrote:
> > > >
> > > > On Sat, Sep 11, 2021 at 10:23 PM Flavio Leitner wrote:
> > > > >
> > > > > On Fri, Sep 10, 2021 at 06:20:45PM +0200, Frode Nordahl wrote:
> > > > > > On Thu, Sep 9, 2021 at 9:53 PM Flavio Leitner wrote:
> > > > > > >
> > > > > > >
> > > > > > > Hi Frode,
> > > > > > >
> > > > > > > Thanks for your patch.
> > > > > > > Please see my comments below.
> > > > > >
> > > > > > Flavio, thank you for taking the time to review.
> > > > > >
> > > > > > > On Wed, Aug 25, 2021 at 01:05:13PM +0200, Frode Nordahl wrote:
> > > > > > > > Contrary to what is stated in the documentation, when SSL
> > > > > > > > ciphers or protocols options are omitted the default values
> > > > > > > > will not be set.  The SSL library default settings will be used
> > > > > > > > instead.
> > > > > > > >
> > > > > > > > Fix handling of default ciphers and protocols so that we actually
> > > > > > > > enforce what is listed as defaults.
> > > > > > > >
> > > > > > > > Fixes: e18a1d086133 ("Add support for specifying SSL connection parameters to ovsdb")
> > > > > > > > Signed-off-by: Frode Nordahl 
> > > > > > > > ---
> > > > > > > >  lib/stream-ssl.c | 30 ++
> > > > > > > >  1 file changed, 22 insertions(+), 8 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/lib/stream-ssl.c b/lib/stream-ssl.c
> > > > > > > > index 0ea3f2c08..6b4cf6970 100644
> > > > > > > > --- a/lib/stream-ssl.c
> > > > > > > > +++ b/lib/stream-ssl.c
> > > > > > > > @@ -162,8 +162,10 @@ struct ssl_config_file {
> > > > > > > >  static struct ssl_config_file private_key;
> > > > > > > >  static struct ssl_config_file certificate;
> > > > > > > >  static struct ssl_config_file ca_cert;
> > > > > > > > -static char *ssl_protocols = "TLSv1,TLSv1.1,TLSv1.2";
> > > > > > > > -static char *ssl_ciphers = "HIGH:!aNULL:!MD5";
> > > > > > > > +static char *default_ssl_protocols = "TLSv1,TLSv1.1,TLSv1.2";
> > > > > > > > +static char *default_ssl_ciphers = "HIGH:!aNULL:!MD5";
> > > > > > > > +static char *ssl_protocols = NULL;
> > > > > > > > +static char *ssl_ciphers = NULL;
> > > > > > > >
> > > > > > > >  /* Ordinarily, the SSL client and server verify each other's certificates using
> > > > > > > >   * a CA certificate.  Setting this to false disables this behavior.  (This is a
> > > > > > > > @@ -1225,14 +1227,19 @@ stream_ssl_set_key_and_cert(const char *private_key_file,
> > > > > > > >  void
> > > > > > > >  stream_ssl_set_ciphers(const char *arg)
> > > > > > > >  {
> > > > > > > > -if (ssl_init() || !arg || !strcmp(ssl_ciphers, arg)) {
> > > > > > >
> > > > > > > The ssl_init() calls at least one time do_ssl_init() which then
> > > > > > > calls SSL_CTX_set_cipher_list(ctx, "HIGH:!aNULL:!MD5").
> > > > > > > Those are the defaults in the man-page and not from the library.
> > > > > > >
> > > > > > > The do_ssl_init() also does:
> > > > > > >method = CONST_CAST(SSL_METHOD *, SSLv23_method());
> > > > > > >
> > > > > > > That should return SSLv3, TLSv1, TLSv1.1 and TLSv1.2.
> > > > > > >
&

Re: [ovs-dev] [PATCH v2 4/4] Tunnel: Snoop ingress packets and update neigh cache if needed.

2021-11-24 Thread Flavio Leitner
On Wed, Nov 10, 2021 at 11:46:55AM +0100, Paolo Valerio wrote:
> In the case of a native tunnel with BFD enabled, if the MAC address of the
> remote end's interface changes (e.g. because it got rebooted and the
> MAC address is allocated dynamically), the BFD session will never be
> re-established.
> 
> This happens because the local tunnel neigh entry doesn't get updated,
> and the local end keeps sending BFD packets with the old destination
> MAC address. This was not an issue until
> b23ddcc57d41 ("tnl-neigh-cache: tighten arp and nd snooping.")
> because ARP requests were snooped as well, avoiding the problem.
> 
> Fix this by snooping the incoming packets in the slow path, and
> updating the neigh cache accordingly.
> 
> Signed-off-by: Paolo Valerio 
> Fixes: b23ddcc57d41 ("tnl-neigh-cache: tighten arp and nd snooping.")
> Acked-by: Gaetan Rivet 
> ---

If you happen to respin the series, maybe you could add
the tag Reported-at: .

Acked-by: Flavio Leitner 
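The fix depends on each snooped ARP reply or neighbor advertisement refreshing the cache entry for that IP. Crucially, it must also overwrite the cached MAC when the remote end's address has changed, so that BFD stops using the stale destination. This is a minimal sketch, assuming a hypothetical fixed-size array in place of OVS's cmap and IPv4-only keys. In the series itself, tnl_neigh_set__() in the 2/4 posting refreshes the entry when the MAC matches and replaces it otherwise.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NEIGH_CACHE_SIZE 16   /* Hypothetical fixed-size table. */

struct neigh_entry {
    bool in_use;
    uint32_t ip;              /* IPv4 address, host byte order. */
    uint8_t mac[6];
    long long expires_ms;
};

static struct neigh_entry neigh_cache[NEIGH_CACHE_SIZE];

/* Called for every snooped ARP reply / neighbor advertisement. */
static void
neigh_snoop(uint32_t ip, const uint8_t mac[6], long long now_ms,
            long long aging_ms)
{
    struct neigh_entry *free_slot = NULL;

    for (size_t i = 0; i < NEIGH_CACHE_SIZE; i++) {
        struct neigh_entry *e = &neigh_cache[i];

        if (e->in_use && e->ip == ip) {
            /* Known neighbor: refresh it and pick up a changed MAC. */
            memcpy(e->mac, mac, sizeof e->mac);
            e->expires_ms = now_ms + aging_ms;
            return;
        }
        if (!e->in_use && !free_slot) {
            free_slot = e;
        }
    }
    if (free_slot) {
        free_slot->in_use = true;
        free_slot->ip = ip;
        memcpy(free_slot->mac, mac, sizeof free_slot->mac);
        free_slot->expires_ms = now_ms + aging_ms;
    }
}

/* Returns the cached MAC for 'ip', or NULL if absent or expired. */
static const uint8_t *
neigh_lookup(uint32_t ip, long long now_ms)
{
    for (size_t i = 0; i < NEIGH_CACHE_SIZE; i++) {
        const struct neigh_entry *e = &neigh_cache[i];

        if (e->in_use && e->ip == ip && e->expires_ms > now_ms) {
            return e->mac;
        }
    }
    return NULL;
}
```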



Re: [ovs-dev] [PATCH v2 3/4] Native tunnel: Do not refresh the entry while revalidating.

2021-11-24 Thread Flavio Leitner
On Wed, Nov 10, 2021 at 11:46:49AM +0100, Paolo Valerio wrote:
> This is a minor issue, but it is visible, e.g., when you try to flush the
> neigh cache while the ARP flow is still present in the datapath: this
> triggers the revalidation of the datapath flows, which subsequently
> refreshes/adds the entry in the cache.
> 
> Signed-off-by: Paolo Valerio 
> ---

Acked-by: Flavio Leitner 



Re: [ovs-dev] [PATCH v2 2/4] Native tunnel: Add tnl/neigh/aging command.

2021-11-24 Thread Flavio Leitner
On Wed, Nov 10, 2021 at 11:46:42AM +0100, Paolo Valerio wrote:
> With this command, it is now possible to change the aging time of the
> cache entries.
>
> For existing entries, the aging time is updated only if the
> current expiration is greater than the new one. In any case, the next
> refresh will set it to the new value.
>
> This is intended mostly for debugging purposes.
> 
> Signed-off-by: Paolo Valerio 
> ---
> v2:
> - fixed NEIGH_ENTRY_MAX_AGEING_TIME (turned to seconds) correcting a
>   leftover.
> - turned relaxed atomics to acq/rel.
> - added range checks to tunnel-push-pop.at. It was useless to
>   duplicate the test for both ipv6 and ipv4, so only the latter
>   includes it.
> - slightly modified the NEWS entry.
> ---
>  NEWS|2 +
>  lib/tnl-neigh-cache.c   |   79 +++
>  ofproto/ofproto-tnl-unixctl.man |9 
>  tests/tunnel-push-pop-ipv6.at   |   30 +++
>  tests/tunnel-push-pop.at|   47 +++
>  5 files changed, 158 insertions(+), 9 deletions(-)
> 
> diff --git a/NEWS b/NEWS
> index 434ee570f..1aa233a0d 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -16,6 +16,8 @@ Post-v2.16.0
> - ovs-dpctl and 'ovs-appctl dpctl/':
>   * New commands 'cache-get-size' and 'cache-set-size' that allows to
> get or configure linux kernel datapath cache sizes.
> +   - ovs-appctl:
> + * New command tnl/neigh/aging to read/write the neigh aging time.
>  
>  
>  v2.16.0 - 16 Aug 2021
> diff --git a/lib/tnl-neigh-cache.c b/lib/tnl-neigh-cache.c
> index 1e6cc31db..a4d56e4cc 100644
> --- a/lib/tnl-neigh-cache.c
> +++ b/lib/tnl-neigh-cache.c
> @@ -46,6 +46,7 @@
>  
>  
>  #define NEIGH_ENTRY_DEFAULT_IDLE_TIME_MS  (15 * 60 * 1000)
> +#define NEIGH_ENTRY_MAX_AGING_TIME  3600

Shouldn't we include the unit suffix here too?

fbl

>  
>  struct tnl_neigh_entry {
>  struct cmap_node cmap_node;
> @@ -57,6 +58,7 @@ struct tnl_neigh_entry {
>  
>  static struct cmap table = CMAP_INITIALIZER;
>  static struct ovs_mutex mutex = OVS_MUTEX_INITIALIZER;
> +static atomic_uint32_t neigh_aging;
>  
>  static uint32_t
>  tnl_neigh_hash(const struct in6_addr *ip)
> @@ -74,6 +76,15 @@ tnl_neigh_expired(struct tnl_neigh_entry *neigh)
>  return expires <= time_msec();
>  }
>  
> +static uint32_t
> +tnl_neigh_get_aging(void)
> +{
> +unsigned int aging;
> +
> +atomic_read_explicit(&neigh_aging, &aging, memory_order_acquire);
> +return aging;
> +}
> +
>  static struct tnl_neigh_entry *
>  tnl_neigh_lookup__(const char br_name[IFNAMSIZ], const struct in6_addr *dst)
>  {
> @@ -88,7 +99,7 @@ tnl_neigh_lookup__(const char br_name[IFNAMSIZ], const struct in6_addr *dst)
>  }
>
>  atomic_store_explicit(&neigh->expires, time_msec() +
> -  NEIGH_ENTRY_DEFAULT_IDLE_TIME_MS,
> +  tnl_neigh_get_aging(),
>memory_order_release);
>  return neigh;
>  }
> @@ -134,7 +145,7 @@ tnl_neigh_set__(const char name[IFNAMSIZ], const struct in6_addr *dst,
>  if (neigh) {
>  if (eth_addr_equals(neigh->mac, mac)) {
>  atomic_store_relaxed(&neigh->expires, time_msec() +
> - NEIGH_ENTRY_DEFAULT_IDLE_TIME_MS);
> + tnl_neigh_get_aging());
>  ovs_mutex_unlock(&mutex);
>  return;
>  }
> @@ -147,7 +158,7 @@ tnl_neigh_set__(const char name[IFNAMSIZ], const struct in6_addr *dst,
>  neigh->ip = *dst;
>  neigh->mac = mac;
>  atomic_store_relaxed(&neigh->expires, time_msec() +
> - NEIGH_ENTRY_DEFAULT_IDLE_TIME_MS);
> + tnl_neigh_get_aging());
>  ovs_strlcpy(neigh->br_name, name, sizeof neigh->br_name);
>  cmap_insert(&table, &neigh->cmap_node, tnl_neigh_hash(&neigh->ip));
>  ovs_mutex_unlock(&mutex);
> @@ -273,6 +284,45 @@ tnl_neigh_cache_flush(struct unixctl_conn *conn, int argc OVS_UNUSED,
>  unixctl_command_reply(conn, "OK");
>  }
>  
> +static void
> +tnl_neigh_cache_aging(struct unixctl_conn *conn, int argc,
> +const char *argv[], void *aux OVS_UNUSED)
> +{
> +long long int new_exp, curr_exp;
> +struct tnl_neigh_entry *neigh;
> +uint32_t aging;
> +
> +if (argc == 1) {
> +struct ds ds = DS_EMPTY_INITIALIZER;
> +ds_put_format(&ds, "%"PRIu32, tnl_neigh_get_aging() / 1000);
> +unixctl_command_reply(conn, ds_cstr(&ds));
> +ds_destroy(&ds);
> +
> +return;
> +}
> +
> +if (!ovs_scan(argv[1], "%"SCNu32, &aging) ||
> +!aging || aging > NEIGH_ENTRY_MAX_AGING_TIME) {
> +unixctl_command_reply_error(conn, "bad aging value");
> +return;
> +}
> +
> +aging *= 1000;
> +atomic_store_explicit(&neigh_aging, aging, memory_order_release);
> +new_exp = time_msec() + aging;
> +
> +CMAP_FOR_EACH (neigh, cmap_node, &table) {
> +atomic_read_explicit(&neigh->expires, &curr_exp,

Re: [ovs-dev] [PATCH v2 1/4] Native tunnel: Read/write expires atomically.

2021-11-24 Thread Flavio Leitner
On Wed, Nov 10, 2021 at 11:46:36AM +0100, Paolo Valerio wrote:
> 'expires' is modified in different threads (revalidator, pmd-rx, bfd-tx).
> It's better to use atomics for such potentially parallel writes.
> 
> Signed-off-by: Paolo Valerio 
> ---

Acked-by: Flavio Leitner 
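In C11, a plain long long that one thread writes while another reads it is a data race. On 32-bit targets the access may even tear. The sketch below shows the atomic pattern with C11 <stdatomic.h>, using release stores and acquire loads to match the acq/rel change noted for v2. struct neigh_timer and its helpers are hypothetical. The patch itself uses OVS's atomic wrappers, and the expiry check mirrors tnl_neigh_expired() from the 2/4 posting (expires <= now).

```c
#include <stdatomic.h>
#include <stdbool.h>

struct neigh_timer {                /* Hypothetical, simplified. */
    _Atomic long long expires;      /* Absolute expiry time, in ms. */
};

static void
neigh_timer_init(struct neigh_timer *t, long long expires_ms)
{
    atomic_init(&t->expires, expires_ms);
}

/* Writer side, e.g. a revalidator, pmd-rx or bfd-tx thread refreshing
 * the entry. */
static void
neigh_timer_refresh(struct neigh_timer *t, long long now_ms,
                    long long aging_ms)
{
    atomic_store_explicit(&t->expires, now_ms + aging_ms,
                          memory_order_release);
}

/* Reader side: never observes a torn or racy value. */
static bool
neigh_timer_expired(struct neigh_timer *t, long long now_ms)
{
    long long expires = atomic_load_explicit(&t->expires,
                                             memory_order_acquire);

    return expires <= now_ms;
}
```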



Re: [ovs-dev] [PATCH v2 1/2] stream-ssl: Fix handling of default ciphers/protocols

2021-11-24 Thread Flavio Leitner
On Mon, Nov 15, 2021 at 10:40:47AM +0100, Frode Nordahl wrote:
> On Mon, Sep 13, 2021 at 4:23 AM Frode Nordahl wrote:
> >
> > On Sat, Sep 11, 2021 at 10:23 PM Flavio Leitner  wrote:
> > >
> > > On Fri, Sep 10, 2021 at 06:20:45PM +0200, Frode Nordahl wrote:
> > > > On Thu, Sep 9, 2021 at 9:53 PM Flavio Leitner  wrote:
> > > > >
> > > > >
> > > > > Hi Frode,
> > > > >
> > > > > Thanks for your patch.
> > > > > Please see my comments below.
> > > >
> > > > Flavio, thank you for taking the time to review.
> > > >
> > > > > On Wed, Aug 25, 2021 at 01:05:13PM +0200, Frode Nordahl wrote:
> > > > > > Contrary to what is stated in the documentation, when SSL
> > > > > > ciphers or protocols options are omitted the default values
> > > > > > will not be set.  The SSL library default settings will be used
> > > > > > instead.
> > > > > >
> > > > > > Fix handling of default ciphers and protocols so that we actually
> > > > > > enforce what is listed as defaults.
> > > > > >
> > > > > > Fixes: e18a1d086133 ("Add support for specifying SSL connection parameters to ovsdb")
> > > > > > Signed-off-by: Frode Nordahl 
> > > > > > ---
> > > > > >  lib/stream-ssl.c | 30 ++
> > > > > >  1 file changed, 22 insertions(+), 8 deletions(-)
> > > > > >
> > > > > > diff --git a/lib/stream-ssl.c b/lib/stream-ssl.c
> > > > > > index 0ea3f2c08..6b4cf6970 100644
> > > > > > --- a/lib/stream-ssl.c
> > > > > > +++ b/lib/stream-ssl.c
> > > > > > @@ -162,8 +162,10 @@ struct ssl_config_file {
> > > > > >  static struct ssl_config_file private_key;
> > > > > >  static struct ssl_config_file certificate;
> > > > > >  static struct ssl_config_file ca_cert;
> > > > > > -static char *ssl_protocols = "TLSv1,TLSv1.1,TLSv1.2";
> > > > > > -static char *ssl_ciphers = "HIGH:!aNULL:!MD5";
> > > > > > +static char *default_ssl_protocols = "TLSv1,TLSv1.1,TLSv1.2";
> > > > > > +static char *default_ssl_ciphers = "HIGH:!aNULL:!MD5";
> > > > > > +static char *ssl_protocols = NULL;
> > > > > > +static char *ssl_ciphers = NULL;
> > > > > >
> > > > > >  /* Ordinarily, the SSL client and server verify each other's certificates using
> > > > > >   * a CA certificate.  Setting this to false disables this behavior.  (This is a
> > > > > > @@ -1225,14 +1227,19 @@ stream_ssl_set_key_and_cert(const char *private_key_file,
> > > > > >  void
> > > > > >  stream_ssl_set_ciphers(const char *arg)
> > > > > >  {
> > > > > > -if (ssl_init() || !arg || !strcmp(ssl_ciphers, arg)) {
> > > > >
> > > > > The ssl_init() calls at least one time do_ssl_init() which then
> > > > > calls SSL_CTX_set_cipher_list(ctx, "HIGH:!aNULL:!MD5").
> > > > > Those are the defaults in the man-page and not from the library.
> > > > >
> > > > > The do_ssl_init() also does:
> > > > >method = CONST_CAST(SSL_METHOD *, SSLv23_method());
> > > > >
> > > > > That should return SSLv3, TLSv1, TLSv1.1 and TLSv1.2.
> > > > >
> > > > >ctx = SSL_CTX_new(method);
> > > > >SSL_CTX_set_options(ctx, SSL_OP_NO_SSLv2 | SSL_OP_NO_SSLv3);
> > > > >
> > > > > And there it excludes those SSL v2 and v3.
> > > > >
> > > > > Therefore, the default would be "TLSv1,TLSv1.1,TLSv1.2" which is
> > > > > the same in the man-page.
> > > > >
> > > > > Did I miss something?
> > > >
> > > > Thank you for pointing out that, I did not realize we manipulated
> > > > these options multiple places.
> > > >
> > > > I do need to rephrase the commit message, but there is still a problem
> > > > here. It became apparent when working on the next patch in the series,
> > > > where functional tests behave unexpectedly when pa

Re: [ovs-dev] [PATCH] netlink-socket: Check for null sock in nl_sock_recv__()

2021-11-24 Thread Flavio Leitner
On Tue, Nov 23, 2021 at 07:58:52PM -0300, Murilo Opsfelder Araújo wrote:
> Hi, Flavio.
> 
> On 11/23/21 15:32, Flavio Leitner wrote:
> > On Mon, Nov 22, 2021 at 03:46:28PM -0300, Murilo Opsfelder Araújo wrote:
> > > Hi, Ilya Maximets.
> > > 
> > > On 11/19/21 13:23, Ilya Maximets wrote:
> > > > On 11/18/21 22:19, David Christensen wrote:
> > > > > 
> > > > > 
> > > > > On 11/18/21 11:56 AM, Murilo Opsfelder Araújo wrote:
> > > > > > On 11/16/21 19:31, Ilya Maximets wrote:
> > > > > > > On 10/25/21 19:45, David Christensen wrote:
> > > > > > > > In certain high load situations, such as when creating a large 
> > > > > > > > number of
> > > > > > > > ports on a switch, the parameter 'sock' may be passed to 
> > > > > > > > nl_sock_recv__()
> > > > > > > > as null, resulting in a segmentation fault when 'sock' is later
> > > > > > > > dereferenced, such as when calling recvmsg().
> > > > > > > 
> > > > > > > Hi, David.  Thanks for the patch.
> > > > > > > 
> > > > > > > It's OK to check for a NULL pointer there, I guess.  However,
> > > > > > > do you know from where it was actually called?  This function,
> > > > > > > in general, should not be called without the actual socket,
> > > > > > > so we, probably, should fix the caller instead.
> > > > > > > 
> > > > > > > Best regards, Ilya Maximets.
> > > > > > 
> > > > > > Hi, Ilya Maximets.
> > > > > > 
> > > > > > When I looked at the coredump file, ch->sock was nil and was passed 
> > > > > > to nl_sock_recv():
> > > > > > 
> > > > > > (gdb) l
> > > > > > 2701
> > > > > > 2702    while (handler->event_offset < handler->n_events) {
> > > > > > 2703    int idx = 
> > > > > > handler->epoll_events[handler->event_offset].data.u32;
> > > > > > 2704    struct dpif_channel *ch = &dpif->channels[idx];
> > > > > > 
> > > > > > (gdb) p idx
> > > > > > $26 = 4
> > > > > > (gdb) p *dpif->channels@5
> > > > > > $27 = {{sock = 0x1001ae88240, last_poll = -9223372036854775808}, 
> > > > > > {sock = 0x1001aa9a8a0, last_poll = -9223372036854775808}, {sock = 
> > > > > > 0x1001ae09510, last_poll = 60634070}, {sock = 0x1001a9dbb60, 
> > > > > > last_poll = 60756950}, {sock = 0x0,
> > > > > >    last_poll = 61340749}}
> > > > > > 
> > > > > > 
> > > > > > The above snippet is from lib/dpif-netlink.c and the caller is 
> > > > > > dpif_netlink_recv_vport_dispatch().
> > > > > > 
> > > > > > The channel at idx=4 had sock=0x0, which was passed to 
> > > > > > nl_sock_recv() via ch->sock parameter.
> > > > > > In that function, it tried to access sock->fd when calling 
> > > > > > recvmsg(), causing the segfault.
> > > > > > 
> > > > > > I'm not enough experienced in Open vSwitch to explain why sock was 
> > > > > > nil at that given index.
> > > > > > The fix seems worth, though.
> > > > > 
> > > > > A few other points of note:
> > > > > 
> > > > > - Test system was a very large configuration (2K CPUs, > 1TB RAM)
> > > > > - OVS Switch was configured with 6K ports as follows:
> > > > > 
> > > > > # b=br0; cmds=; for i in {1..6000}; do cmds+=" -- add-port $b p$i -- 
> > > > > set interface p$i type=internal"; done
> > > > > # time sudo ovs-vsctl $cmds
> > > > > 
> > > > > - OVS was installed from RHEL RPM.  Build from source did not exhibit 
> > > > > the same problem.
> > > > > - Unable to reproduce on a different system (128 CPUs, 256GB RAM), 
> > > > > even with 10K ports.
> > > > > 
> > > > > Dave
> > > > 
> > > > Thanks for all the information.  Having a NULL socket in the 'channels'
> > > > array doesn't look a good sign.  This may mean that even if OVS will
>

Re: [ovs-dev] [PATCH] netlink-socket: Check for null sock in nl_sock_recv__()

2021-11-23 Thread Flavio Leitner
On Mon, Nov 22, 2021 at 03:46:28PM -0300, Murilo Opsfelder Araújo wrote:
> Hi, Ilya Maximets.
> 
> On 11/19/21 13:23, Ilya Maximets wrote:
> > On 11/18/21 22:19, David Christensen wrote:
> > > 
> > > 
> > > On 11/18/21 11:56 AM, Murilo Opsfelder Araújo wrote:
> > > > On 11/16/21 19:31, Ilya Maximets wrote:
> > > > > On 10/25/21 19:45, David Christensen wrote:
> > > > > > In certain high load situations, such as when creating a large 
> > > > > > number of
> > > > > > ports on a switch, the parameter 'sock' may be passed to 
> > > > > > nl_sock_recv__()
> > > > > > as null, resulting in a segmentation fault when 'sock' is later
> > > > > > dereferenced, such as when calling recvmsg().
> > > > > 
> > > > > Hi, David.  Thanks for the patch.
> > > > > 
> > > > > It's OK to check for a NULL pointer there, I guess.  However,
> > > > > do you know from where it was actually called?  This function,
> > > > > in general, should not be called without the actual socket,
> > > > > so we, probably, should fix the caller instead.
> > > > > 
> > > > > Best regards, Ilya Maximets.
> > > > 
> > > > Hi, Ilya Maximets.
> > > > 
> > > > When I looked at the coredump file, ch->sock was nil and was passed to 
> > > > nl_sock_recv():
> > > > 
> > > > (gdb) l
> > > > 2701
> > > > 2702    while (handler->event_offset < handler->n_events) {
> > > > 2703    int idx = 
> > > > handler->epoll_events[handler->event_offset].data.u32;
> > > > 2704    struct dpif_channel *ch = &dpif->channels[idx];
> > > > 
> > > > (gdb) p idx
> > > > $26 = 4
> > > > (gdb) p *dpif->channels@5
> > > > $27 = {{sock = 0x1001ae88240, last_poll = -9223372036854775808}, {sock 
> > > > = 0x1001aa9a8a0, last_poll = -9223372036854775808}, {sock = 
> > > > 0x1001ae09510, last_poll = 60634070}, {sock = 0x1001a9dbb60, last_poll 
> > > > = 60756950}, {sock = 0x0,
> > > >   last_poll = 61340749}}
> > > > 
> > > > 
> > > > The above snippet is from lib/dpif-netlink.c and the caller is 
> > > > dpif_netlink_recv_vport_dispatch().
> > > > 
> > > > The channel at idx=4 had sock=0x0, which was passed to nl_sock_recv() 
> > > > via ch->sock parameter.
> > > > In that function, it tried to access sock->fd when calling recvmsg(), 
> > > > causing the segfault.
> > > > 
> > > > I'm not enough experienced in Open vSwitch to explain why sock was nil 
> > > > at that given index.
> > > > The fix seems worth, though.
> > > 
> > > A few other points of note:
> > > 
> > > - Test system was a very large configuration (2K CPUs, > 1TB RAM)
> > > - OVS Switch was configured with 6K ports as follows:
> > > 
> > > # b=br0; cmds=; for i in {1..6000}; do cmds+=" -- add-port $b p$i -- set 
> > > interface p$i type=internal"; done
> > > # time sudo ovs-vsctl $cmds
> > > 
> > > - OVS was installed from RHEL RPM.  Build from source did not exhibit the 
> > > same problem.
> > > - Unable to reproduce on a different system (128 CPUs, 256GB RAM), even 
> > > with 10K ports.
> > > 
> > > Dave
> > 
> > Thanks for all the information.  Having a NULL socket in the 'channels'
> > array doesn't look a good sign.  This may mean that even if OVS will
> > not crash, it may not receive upcalls for certain ports, which is not
> > good either.
> > 
> > Could you provide more info on which RHEL package did you use and which
> > version you built from source?  Maybe we can find the difference.
> 
> When the bug was reported internally, the distro package was actually from 
> Fedora 33 ppc64le.
> 
> One difference I've found at that time was regarding the configure options.
> 
> From the logs at [0], the distro package was configured with these options:
> 
> + ./configure --build=ppc64le-redhat-linux-gnu 
> --host=ppc64le-redhat-linux-gnu --program-prefix= 
> --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr 
> --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share 
> --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec 
> --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man 
> --infodir=/usr/share/info --enable-libcapng --disable-static --enable-shared 
> --enable-ssl --with-dpdk=shared --with-pkidir=/var/lib/openvswitch/pki
> 
> And the built-from-source binaries from Dave were built like this:
> 
> git clone https://github.com/openvswitch/ovs.git
> cd ovs/
> git checkout v2.15.0

Do you mean 2.16.0? The dpif_netlink_recv_vport_dispatch() function
from the snippet above doesn't exist in 2.15.0.

And FC33 ships 2.15.0:
https://koji.fedoraproject.org/koji/buildinfo?buildID=1713250

Can you reproduce the issue?
I'm asking because 2.16.0 added another way to handle upcalls, so I
am wondering whether the bug was introduced by that change in 2.16.0
or was already present in 2.15.0.

Also, could you provide the full backtrace of all threads?
(gdb) thread apply all bt

It would be nice to have a dump of the handler struct too.

Thanks,
fbl



> ./boot.sh
> ./configure
> make
> 

Re: [ovs-dev] [PATCH] TCP Stream: Use TCP keepalive by default

2021-11-23 Thread Flavio Leitner
On Tue, Nov 16, 2021 at 09:54:54PM +0100, Ilya Maximets wrote:
> On 10/25/21 16:36, Michael Santana wrote:
> > In the case that a client disables jsonrpc probes the client would fail
> > to detect if the connection to the server has dropped. To workaround
> > such case TCP keepalive is enabled.
> > 
> > Signed-off-by: Michael Santana 
> > ---
> 
> Hi, Michael.  Thanks for the patch.  But I'm not sure why we need this,
> at least in current form.
> 
> Standard keepalive configuration on modern systems is set to something
> around 2 hours most of the time.  So, the user might have 2 hours of
> downtime and not even notice.
> 
> TCP keepalives might be useful for the case where user knows that
> application may not reply for a long time, so they have to set the
> inactivity probe to a higher value.   In this case, we could detect
> connection failure with TCP keepalive using shorter time interval.
> 
> Having TCP keepalive configured to a very long interval (which is system
> default), IMHO, doesn't make a lot of sense.
> 
> I would also argue that inactivity probes should never be disabled, but
> set to a higher value instead, because TCP keepalive will not be able
> to detect hanged application, e.g. deadlock.

That's a good point. My concern is that we will either need to
stop allowing the inactivity probe to be set to 0, which can break
upgrades, or silently set a large interval instead, which is not
good program behavior IMHO.

Also, it sounds like not allowing the probe to be disabled would
actually prevent users from shooting themselves in the foot, but I
don't know for sure whether disabling it has a valid use case.

Even if we decide to improve the built-in inactivity probes,
it still seems like a good idea to enable TCP keepalive as
a second layer of fault detection.
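As a minimal sketch of what "TCP keepalive with a shorter interval" can look like on Linux: it sets the standard `SO_KEEPALIVE` option plus the Linux-specific `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` options. The helper name and the values used below are illustrative assumptions, not code from the patch under review. For comparison, the Linux defaults are 7200 s idle, 75 s between probes and 9 probes, which is where the roughly two-hour detection window comes from.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: enable TCP keepalive on 'fd' and, where the
 * platform provides the tuning options (Linux), shorten the detection
 * window to roughly idle_s + cnt * intvl_s seconds.
 * Returns 0 on success, -1 on error. */
static int
enable_tcp_keepalive(int fd, int idle_s, int intvl_s, int cnt)
{
    int on = 1;

    assert(fd >= 0);
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0) {
        return -1;
    }
#if defined(TCP_KEEPIDLE) && defined(TCP_KEEPINTVL) && defined(TCP_KEEPCNT)
    /* Idle seconds before the first probe is sent. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
                   &idle_s, sizeof idle_s) < 0) {
        return -1;
    }
    /* Seconds between unanswered probes. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL,
                   &intvl_s, sizeof intvl_s) < 0) {
        return -1;
    }
    /* Unanswered probes before the peer is declared dead. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof cnt) < 0) {
        return -1;
    }
#else
    (void) idle_s;
    (void) intvl_s;
    (void) cnt;
#endif
    return 0;
}
```

With `enable_tcp_keepalive(fd, 60, 10, 5)` a dead peer would be detected after about 60 + 5 * 10 = 110 seconds. As noted earlier in the thread, this still cannot detect a hung (e.g. deadlocked) application, so it complements the inactivity probe rather than replacing it.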

BTW, the patch is missing the tag below:
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1988461

-- 
fbl
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] TCP Stream: Use TCP keepalive by default

2021-10-27 Thread Flavio Leitner
On Mon, Oct 25, 2021 at 10:36:32AM -0400, Michael Santana wrote:
> In the case that a client disables jsonrpc probes the client would fail
> to detect if the connection to the server has dropped. To workaround
> such case TCP keepalive is enabled.
> 
> Signed-off-by: Michael Santana 
> ---

Patch looks good and makes sense to me.
It builds fine on Windows as well.

Acked-by: Flavio Leitner 

fbl


Re: [ovs-dev] [PATCH RFC 5/5] Tunnel: Add self tests for MAC learning and ageing.

2021-10-25 Thread Flavio Leitner
On Thu, Oct 07, 2021 at 02:35:40PM +0200, Paolo Valerio wrote:
> Tests for both ipv4 and ipv6 have been added.

Thanks for writing the unit tests.
Could you please add them as part of the patch adding the commands?

Thanks
fbl

> 
> Signed-off-by: Paolo Valerio 
> ---
>  tests/tunnel-push-pop-ipv6.at |   66 
> +
>  tests/tunnel-push-pop.at  |   65 
>  2 files changed, 131 insertions(+)
> 
> diff --git a/tests/tunnel-push-pop-ipv6.at b/tests/tunnel-push-pop-ipv6.at
> index 59723e63b..5c4dd248b 100644
> --- a/tests/tunnel-push-pop-ipv6.at
> +++ b/tests/tunnel-push-pop-ipv6.at
> @@ -255,6 +255,36 @@ AT_CHECK([cat p0.pcap.txt | grep 
> 93aa55aa5586dd60203aff2001cafe | un
>  
> ff93aa55aa5586dd60203aff2001cafe0088ff020001ff9387004d462001cafe00930101aa55aa55
>  ])
>  
> +dnl Set the ageing time to 5 seconds
> +AT_CHECK([ovs-appctl tnl/neigh/ageing 5], [0], [OK
> +])
> +
> +dnl Read the current ageing time
> +AT_CHECK([ovs-appctl tnl/neigh/ageing], [0], [5
> +])
> +
> +dnl Add an entry
> +AT_CHECK([ovs-appctl tnl/neigh/set br0 2001:cafe::92 aa:bb:cc:00:00:01], 
> [0], [OK
> +])
> +
> +AT_CHECK([ovs-appctl tnl/neigh/show | grep br0 | sort], [0], [dnl
> +2001:cafe::92 aa:bb:cc:00:00:01   br0
> +])
> +
> +ovs-appctl time/warp 5000
> +
> +dnl Check the entry has been removed
> +AT_CHECK([ovs-appctl tnl/neigh/show | grep br0 | sort], [0], [dnl
> +])
> +
> +dnl Restore the ageing time to 900s (default)
> +AT_CHECK([ovs-appctl tnl/neigh/ageing 900], [0], [OK
> +])
> +
> +dnl Read the current ageing time
> +AT_CHECK([ovs-appctl tnl/neigh/ageing], [0], [900
> +])
> +
>  dnl Check ARP Snoop
>  AT_CHECK([ovs-appctl netdev-dummy/receive p0 
> 'in_port(1),eth(src=f8:bc:12:44:34:c8,dst=aa:55:aa:55:00:00),eth_type(0x86dd),ipv6(src=2001:cafe::92,dst=2001:cafe::88,label=0,proto=58,tclass=0,hlimit=255,frag=no),icmpv6(type=136,code=0),nd(target=2001:cafe::92,sll=00:00:00:00:00:00,tll=f8:bc:12:44:34:c8)'])
>  
> @@ -432,6 +462,42 @@ AT_CHECK([ovs-appctl dpif/dump-flows int-br | grep 
> 'in_port(6081)'], [0], [dnl
>  
> tunnel(tun_id=0x7b,ipv6_src=2001:cafe::92,ipv6_dst=2001:cafe::88,geneve({class=0x,type=0x80,len=4,0xa/0xf}{class=0x,type=0,len=4}),flags(-df-csum+key)),recirc_id(0),in_port(6081),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
>  packets:0, bytes:0, used:never, 
> actions:userspace(pid=0,controller(reason=1,dont_send=0,continuation=0,recirc_id=3,rule_cookie=0,controller_id=0,max_len=65535))
>  ])
>  
> +dnl Receive VXLAN with different MAC and verify that the neigh cache gets 
> updated
> +AT_CHECK([ovs-appctl netdev-dummy/receive p0 
> 'aa55aa55f8bc1244cafe86dd603a11402001cafe00922001cafe0088c85312b5003abc700c037b000800451c000140117cce7f017f010035003500080172'])
> +
> +ovs-appctl time/warp 1000
> +ovs-appctl time/warp 1000
> +
> +dnl Check VXLAN tunnel push
> +AT_CHECK([ovs-ofctl add-flow int-br action=2])
> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 
> 'in_port(2),eth(src=36:b1:ee:7c:01:01,dst=36:b1:ee:7c:01:02),eth_type(0x0800),ipv4(src=1.1.3.88,dst=1.1.3.112,proto=47,tos=0,ttl=64,frag=no)'],
>  [0], [stdout])
> +AT_CHECK([tail -1 stdout], [0],
> +  [Datapath actions: 
> clone(tnl_push(tnl_port(4789),header(size=70,type=4,eth(dst=f8:bc:12:44:ca:fe,src=aa:55:aa:55:00:00,dl_type=0x86dd),ipv6(src=2001:cafe::88,dst=2001:cafe::92,label=0,proto=17,tclass=0x0,hlimit=64),udp(src=0,dst=4789,csum=0x),vxlan(flags=0x800,vni=0x7b)),out_port(100)),1)
> +])
> +
> +AT_CHECK([ovs-appctl tnl/arp/show | tail -n+3 | sort], [0], [dnl
> +2001:cafe::92 f8:bc:12:44:ca:fe   br0
> +2001:cafe::93 f8:bc:12:44:34:b7   br0
> +])
> +
> +dnl Restore and check the cache entries
> +AT_CHECK([ovs-appctl netdev-dummy/receive p0 
> 'aa55aa55f8bc124434b686dd603a11402001cafe00922001cafe0088c85312b5003abc700c037b000800451c000140117cce7f017f010035003500080172'])
> +
> +ovs-appctl time/warp 1000
> +ovs-appctl time/warp 1000
> +
> +dnl Check VXLAN tunnel push
> +AT_CHECK([ovs-ofctl add-flow int-br action=2])
> +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 
> 'in_port(2),eth(src=36:b1:ee:7c:01:01,dst=36:b1:ee:7c:01:02),eth_type(0x0800),ipv4(src=1.1.3.88,dst=1.1.3.112,proto=47,tos=0,ttl=64,frag=no)'],
>  [0], [stdout])
> +AT_CHECK([tail -1 stdout], [0],
> +  [Datapath actions: 
> clone(tnl_push(tnl_port(4789),header(size=70,type=4,eth(dst=f8:bc:12:44:34:b6,src=aa:55:aa:55:00:00,dl_type=0x86dd),ipv6(src=2001:cafe::88,dst=2001:cafe::92,label=0,proto=17,tclass=0x0,hlimit=64),udp(src=0,dst=4789,csum=0x),vxlan(flags=0x800,vni=0x7b)),out_port(100)),1)
> 

Re: [ovs-dev] [PATCH RFC 4/5] Native tunnel: Do not refresh the entry while revalidating.

2021-10-25 Thread Flavio Leitner
Hi Paolo,

On Thu, Oct 07, 2021 at 02:35:34PM +0200, Paolo Valerio wrote:
> This is a minor issue but visible e.g. when you try to flush the neigh
> cache while the ARP flow is still present in the datapath, triggering
> the revalidation of the datapath flows which subsequntly

Typo

> refreshes/adds the entry in the cache.

Otherwise it looks ok.
fbl


> 
> Signed-off-by: Paolo Valerio 
> ---
>  lib/tnl-neigh-cache.c|   20 +---
>  lib/tnl-neigh-cache.h|2 +-
>  ofproto/ofproto-dpif-xlate.c |3 ++-
>  3 files changed, 16 insertions(+), 9 deletions(-)
> 
> diff --git a/lib/tnl-neigh-cache.c b/lib/tnl-neigh-cache.c
> index 9d3f00ad9..df8de48eb 100644
> --- a/lib/tnl-neigh-cache.c
> +++ b/lib/tnl-neigh-cache.c
> @@ -173,7 +173,7 @@ tnl_arp_set(const char name[IFNAMSIZ], ovs_be32 dst,
>  
>  static int
>  tnl_arp_snoop(const struct flow *flow, struct flow_wildcards *wc,
> -  const char name[IFNAMSIZ])
> +  const char name[IFNAMSIZ], bool update)
>  {
>  /* Snoop normal ARP replies and gratuitous ARP requests/replies only */
>  if (!is_arp(flow)
> @@ -183,13 +183,17 @@ tnl_arp_snoop(const struct flow *flow, struct 
> flow_wildcards *wc,
>  return EINVAL;
>  }
>  
> -tnl_arp_set(name, FLOW_WC_GET_AND_MASK_WC(flow, wc, nw_src), 
> flow->arp_sha);
> +memset(&wc->masks.nw_src, 0xff, sizeof wc->masks.nw_src);
> +
> +if (update) {
> +tnl_arp_set(name, flow->nw_src, flow->arp_sha);
> +}
>  return 0;
>  }
>  
>  static int
>  tnl_nd_snoop(const struct flow *flow, struct flow_wildcards *wc,
> - const char name[IFNAMSIZ])
> + const char name[IFNAMSIZ], bool update)
>  {
>  if (!is_nd(flow, wc) || flow->tp_src != htons(ND_NEIGHBOR_ADVERT)) {
>  return EINVAL;
> @@ -208,20 +212,22 @@ tnl_nd_snoop(const struct flow *flow, struct 
> flow_wildcards *wc,
>  memset(&wc->masks.ipv6_dst, 0xff, sizeof wc->masks.ipv6_dst);
>  memset(&wc->masks.nd_target, 0xff, sizeof wc->masks.nd_target);
>  
> -tnl_neigh_set(name, &flow->nd_target, flow->arp_tha);
> +if (update) {
> +tnl_neigh_set(name, &flow->nd_target, flow->arp_tha);
> +}
>  return 0;
>  }
>  
>  int
>  tnl_neigh_snoop(const struct flow *flow, struct flow_wildcards *wc,
> -const char name[IFNAMSIZ])
> +const char name[IFNAMSIZ], bool update)
>  {
>  int res;
> -res = tnl_arp_snoop(flow, wc, name);
> +res = tnl_arp_snoop(flow, wc, name, update);
>  if (res != EINVAL) {
>  return res;
>  }
> -return tnl_nd_snoop(flow, wc, name);
> +return tnl_nd_snoop(flow, wc, name, update);
>  }
>  
>  void
> diff --git a/lib/tnl-neigh-cache.h b/lib/tnl-neigh-cache.h
> index 92fdf5a93..a2fd9f4ae 100644
> --- a/lib/tnl-neigh-cache.h
> +++ b/lib/tnl-neigh-cache.h
> @@ -32,7 +32,7 @@
>  #include "util.h"
>  
>  int tnl_neigh_snoop(const struct flow *flow, struct flow_wildcards *wc,
> -const char dev_name[IFNAMSIZ]);
> +const char dev_name[IFNAMSIZ], bool update);
>  void
>  tnl_neigh_set(const char name[IFNAMSIZ], const struct in6_addr *dst,
>const struct eth_addr mac);
> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> index 4430ac073..44a49dae8 100644
> --- a/ofproto/ofproto-dpif-xlate.c
> +++ b/ofproto/ofproto-dpif-xlate.c
> @@ -4097,7 +4097,8 @@ terminate_native_tunnel(struct xlate_ctx *ctx, struct 
> flow *flow,
>  (flow->dl_type == htons(ETH_TYPE_ARP) ||
>   flow->nw_proto == IPPROTO_ICMPV6) &&
>   is_neighbor_reply_correct(ctx, flow)) {
> -tnl_neigh_snoop(flow, wc, ctx->xbridge->name);
> +tnl_neigh_snoop(flow, wc, ctx->xbridge->name,
> +ctx->xin->allow_side_effects);
>  } else if (*tnl_port != ODPP_NONE &&
> ctx->xin->allow_side_effects &&
> (flow->dl_type == htons(ETH_TYPE_IP) ||
> 

-- 
fbl


Re: [ovs-dev] [PATCH RFC 3/5] Tunnel: Snoop ingress packets and update neigh cache if needed.

2021-10-25 Thread Flavio Leitner
On Thu, Oct 07, 2021 at 02:35:28PM +0200, Paolo Valerio wrote:
> In case of native tunnel with bfd enabled, if the MAC address of the
> remote end's interface changes (e.g. because it got rebooted, and the
> MAC address is allocated dynamically), the BFD session will never be
> re-established.
> 
> This happens because the local tunnel neigh entry doesn't get updated,
> and the local end keeps sending BFD packets with the old destination
> MAC address. This was not an issue until
> b23ddcc57d41 ("tnl-neigh-cache: tighten arp and nd snooping.")
> because ARP requests were snooped as well avoiding the problem.
> 
> Fix this by snooping the incoming packets, and updating the neigh
> cache accordingly.


Can we mention that this only affects the slow path?
Otherwise it may suggest a performance impact.


> Signed-off-by: Paolo Valerio 
> Fixes: b23ddcc57d41 ("tnl-neigh-cache: tighten arp and nd snooping.")
> ---
>  lib/tnl-neigh-cache.c|   12 ++--
>  lib/tnl-neigh-cache.h|3 +++
>  ofproto/ofproto-dpif-xlate.c |   14 ++
>  3 files changed, 23 insertions(+), 6 deletions(-)
> 
> diff --git a/lib/tnl-neigh-cache.c b/lib/tnl-neigh-cache.c
> index c8a7b60cd..9d3f00ad9 100644
> --- a/lib/tnl-neigh-cache.c
> +++ b/lib/tnl-neigh-cache.c
> @@ -135,9 +135,9 @@ tnl_neigh_delete(struct tnl_neigh_entry *neigh)
>  ovsrcu_postpone(neigh_entry_free, neigh);
>  }
>  
> -static void
> -tnl_neigh_set__(const char name[IFNAMSIZ], const struct in6_addr *dst,
> -const struct eth_addr mac)
> +void
> +tnl_neigh_set(const char name[IFNAMSIZ], const struct in6_addr *dst,
> +  const struct eth_addr mac)
>  {
>  ovs_mutex_lock(&mutex);
>  struct tnl_neigh_entry *neigh = tnl_neigh_lookup__(name, dst);
> @@ -168,7 +168,7 @@ tnl_arp_set(const char name[IFNAMSIZ], ovs_be32 dst,
>  const struct eth_addr mac)
>  {
>  struct in6_addr dst6 = in6_addr_mapped_ipv4(dst);
> -tnl_neigh_set__(name, &dst6, mac);
> +tnl_neigh_set(name, &dst6, mac);
>  }
>  
>  static int
> @@ -208,7 +208,7 @@ tnl_nd_snoop(const struct flow *flow, struct 
> flow_wildcards *wc,
>  memset(&wc->masks.ipv6_dst, 0xff, sizeof wc->masks.ipv6_dst);
>  memset(&wc->masks.nd_target, 0xff, sizeof wc->masks.nd_target);
>  
> -tnl_neigh_set__(name, &flow->nd_target, flow->arp_tha);
> +tnl_neigh_set(name, &flow->nd_target, flow->arp_tha);
>  return 0;
>  }
>  
> @@ -355,7 +355,7 @@ tnl_neigh_cache_add(struct unixctl_conn *conn, int argc 
> OVS_UNUSED,
>  return;
>  }
>  
> -tnl_neigh_set__(br_name, , mac);
> +tnl_neigh_set(br_name, , mac);
>  unixctl_command_reply(conn, "OK");
>  }
>  
> diff --git a/lib/tnl-neigh-cache.h b/lib/tnl-neigh-cache.h
> index e4b42b059..92fdf5a93 100644
> --- a/lib/tnl-neigh-cache.h
> +++ b/lib/tnl-neigh-cache.h
> @@ -33,6 +33,9 @@
>  
>  int tnl_neigh_snoop(const struct flow *flow, struct flow_wildcards *wc,
>  const char dev_name[IFNAMSIZ]);
> +void
> +tnl_neigh_set(const char name[IFNAMSIZ], const struct in6_addr *dst,
> +  const struct eth_addr mac);
>  int tnl_neigh_lookup(const char dev_name[IFNAMSIZ], const struct in6_addr 
> *dst,
>   struct eth_addr *mac);
>  void tnl_neigh_cache_init(void);
> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> index 8723cb4e8..4430ac073 100644
> --- a/ofproto/ofproto-dpif-xlate.c
> +++ b/ofproto/ofproto-dpif-xlate.c
> @@ -4098,6 +4098,20 @@ terminate_native_tunnel(struct xlate_ctx *ctx, struct 
> flow *flow,
>   flow->nw_proto == IPPROTO_ICMPV6) &&
>   is_neighbor_reply_correct(ctx, flow)) {
>  tnl_neigh_snoop(flow, wc, ctx->xbridge->name);
> +} else if (*tnl_port != ODPP_NONE &&
> +   ctx->xin->allow_side_effects &&
> +   (flow->dl_type == htons(ETH_TYPE_IP) ||
> +flow->dl_type == htons(ETH_TYPE_IPV6))) {
> +struct eth_addr mac = flow->dl_src;
> +struct in6_addr s_ip6;
> +
> +if (flow->nw_src) {

I don't think we will have an all-zeros source IP address at this
point, but this looks odd. Why not repeat the same check?
   if (flow->dl_type == htons(ETH_TYPE_IP)) {
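A minimal sketch of the suggested check, under stated assumptions: `struct mini_flow` is a hypothetical trimmed-down stand-in for OVS's `struct flow`, and `mini_set_mapped_ipv4()` is a local helper that builds the IPv4-mapped address `::ffff:a.b.c.d`, the same form the patch obtains from `in6_addr_set_mapped_ipv4()`.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed-down flow holding only the fields used here. */
struct mini_flow {
    uint16_t dl_type;          /* Ethertype, network byte order. */
    uint32_t nw_src;           /* IPv4 source, network byte order. */
    struct in6_addr ipv6_src;  /* IPv6 source. */
};

#define MINI_ETH_TYPE_IP   0x0800
#define MINI_ETH_TYPE_IPV6 0x86dd

/* Store 'ip4' as the IPv4-mapped IPv6 address ::ffff:a.b.c.d. */
static void
mini_set_mapped_ipv4(struct in6_addr *ip6, uint32_t ip4)
{
    memset(ip6, 0, sizeof *ip6);
    ip6->s6_addr[10] = 0xff;
    ip6->s6_addr[11] = 0xff;
    memcpy(&ip6->s6_addr[12], &ip4, sizeof ip4);
}

/* Choose the source address by ethertype, i.e. the same condition that
 * selected this branch, instead of testing nw_src for zero. */
static void
mini_flow_src_ipv6(const struct mini_flow *flow, struct in6_addr *out)
{
    assert(flow != NULL && out != NULL);
    if (flow->dl_type == htons(MINI_ETH_TYPE_IP)) {
        mini_set_mapped_ipv4(out, flow->nw_src);
    } else {
        *out = flow->ipv6_src;
    }
}
```

Keying the branch on `dl_type` makes the intent explicit and keeps it consistent with the `else if` condition that already restricts the packet to IPv4 or IPv6.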


> +in6_addr_set_mapped_ipv4(&s_ip6, flow->nw_src);
> +} else {
> +s_ip6 = flow->ipv6_src;
> +}
> +
> +tnl_neigh_set(ctx->xbridge->name, &s_ip6, mac);
>  }
>  }
>  
> 

-- 
fbl


Re: [ovs-dev] [PATCH RFC 2/5] Native tunnel: Add tnl/neigh/ageing command.

2021-10-25 Thread Flavio Leitner
On Thu, Oct 07, 2021 at 02:35:21PM +0200, Paolo Valerio wrote:
> with the command is now possible to change the ageing time of the
> cache entries.

Please start with a normal sentence using a capital letter.

> For the existing entries the ageing time is updated only if the
> current expiration is greater than the new one. In any case, the next
> refresh will set it to the new value.
> 
> This is intended mostly for debugging purpose.
> 
> Signed-off-by: Paolo Valerio 
> ---
>  NEWS|3 ++
>  lib/tnl-neigh-cache.c   |   77 
> ++-
>  ofproto/ofproto-tnl-unixctl.man |5 +++
>  3 files changed, 76 insertions(+), 9 deletions(-)
> 
> diff --git a/NEWS b/NEWS
> index 90f4b1590..148dd5d61 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -10,6 +10,9 @@ Post-v2.16.0
> limiting behavior.
>   * Add hardware offload support for matching IPv4/IPv6 frag types
> (experimental).
> +   - Native tunnel:
> + * Added new ovs-appctl tnl/neigh/ageing to read/write the neigh ageing
> +   time.
>  
>  
>  v2.16.0 - 16 Aug 2021
> diff --git a/lib/tnl-neigh-cache.c b/lib/tnl-neigh-cache.c
> index a37456e6d..c8a7b60cd 100644
> --- a/lib/tnl-neigh-cache.c
> +++ b/lib/tnl-neigh-cache.c
> @@ -42,6 +42,7 @@
>  #include "unixctl.h"
>  #include "util.h"
>  #include "openvswitch/vlog.h"
> +#include "ovs-atomic.h"

The previous patch is using atomic ops, so this shouldn't be needed
here.

>  
>  
>  /* In milliseconds */
> @@ -57,6 +58,7 @@ struct tnl_neigh_entry {
>  
>  static struct cmap table = CMAP_INITIALIZER;
>  static struct ovs_mutex mutex = OVS_MUTEX_INITIALIZER;
> +static atomic_uint32_t neigh_ageing;
>  
>  static uint32_t
>  tnl_neigh_hash(const struct in6_addr *ip)
> @@ -74,6 +76,15 @@ tnl_neigh_expired(struct tnl_neigh_entry *neigh)
>  return expired <= time_msec();
>  }
>  
> +static uint32_t
> +tnl_neigh_get_ageing(void)
> +{
> +unsigned int ageing;
> +
> +atomic_read_relaxed(&neigh_ageing, &ageing);
> +return ageing;
> +}
> +
>  static struct tnl_neigh_entry *
>  tnl_neigh_lookup__(const char br_name[IFNAMSIZ], const struct in6_addr *dst)
>  {
> @@ -88,7 +99,7 @@ tnl_neigh_lookup__(const char br_name[IFNAMSIZ], const 
> struct in6_addr *dst)
>  }
>  
>  atomic_store_relaxed(&neigh->expires, time_msec() +
> - NEIGH_ENTRY_DEFAULT_IDLE_TIME);
> + tnl_neigh_get_ageing());
>  return neigh;
>  }
>  }
> @@ -133,7 +144,7 @@ tnl_neigh_set__(const char name[IFNAMSIZ], const struct 
> in6_addr *dst,
>  if (neigh) {
>  if (eth_addr_equals(neigh->mac, mac)) {
>  atomic_store_relaxed(&neigh->expires, time_msec() +
> - NEIGH_ENTRY_DEFAULT_IDLE_TIME);
> + tnl_neigh_get_ageing());
>  ovs_mutex_unlock(&mutex);
>  return;
>  }
> @@ -146,7 +157,7 @@ tnl_neigh_set__(const char name[IFNAMSIZ], const struct 
> in6_addr *dst,
>  neigh->ip = *dst;
>  neigh->mac = mac;
>  atomic_store_relaxed(&neigh->expires, time_msec() +
> - NEIGH_ENTRY_DEFAULT_IDLE_TIME);
> + tnl_neigh_get_ageing());
>  ovs_strlcpy(neigh->br_name, name, sizeof neigh->br_name);
>  cmap_insert(&table, &neigh->cmap_node, tnl_neigh_hash(&neigh->ip));
>  ovs_mutex_unlock(&mutex);
> @@ -272,6 +283,43 @@ tnl_neigh_cache_flush(struct unixctl_conn *conn, int 
> argc OVS_UNUSED,
>  unixctl_command_reply(conn, "OK");
>  }
>  
> +static void
> +tnl_neigh_cache_ageing(struct unixctl_conn *conn, int argc,
> +const char *argv[], void *aux OVS_UNUSED)
> +{
> +long long int new_exp, curr_exp;
> +struct tnl_neigh_entry *neigh;
> +uint32_t ageing;
> +
> +if (argc == 1) {
> +struct ds ds = DS_EMPTY_INITIALIZER;
> +ds_put_format(&ds, "%u", tnl_neigh_get_ageing() / 1000);

Shouldn't that be PRIu32?
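For reference, a minimal sketch of the portable form. It uses `snprintf()` instead of OVS's `ds_put_format()` so it compiles on its own, and `format_ageing_secs()` is a hypothetical helper name. `PRIu32` from `<inttypes.h>` is guaranteed to match `uint32_t`, whereas `%u` is only correct where `uint32_t` happens to be `unsigned int`.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write an ageing time stored in milliseconds as
 * whole seconds, using the conversion specifier that matches uint32_t. */
static int
format_ageing_secs(char *buf, size_t size, uint32_t ageing_ms)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%" PRIu32, ageing_ms / 1000);
}
```

In the patch itself this would presumably become `ds_put_format(&ds, "%"PRIu32, ...)`.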

> +unixctl_command_reply(conn, ds_cstr(&ds));
> +ds_destroy(&ds);
> +
> +return;
> +}
> +
> +if (!ovs_scan(argv[1], "%"SCNu32, &ageing) ||
> +!ageing || ageing > 1) {

Why 1? Perhaps it needs to be defined and mentioned in the
documentation. I suggest 3600 secs (1 hour) as an arbitrary upper
limit.
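One way the suggested bound could look, as a sketch only: `NEIGH_AGEING_MAX_SECS` and `parse_ageing_secs()` are hypothetical names, and `strtoul()` stands in for OVS's `ovs_scan()` so the example is self-contained. Like the patch, it takes seconds from the user, rejects zero, and stores milliseconds.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical named upper bound: one hour, as suggested in review. */
#define NEIGH_AGEING_MAX_SECS 3600u

/* Parse a decimal ageing time in seconds from 'arg'.  On success, store
 * it in milliseconds in '*ageing_ms' and return true.  Reject empty input,
 * trailing garbage, zero, and values above NEIGH_AGEING_MAX_SECS. */
static bool
parse_ageing_secs(const char *arg, uint32_t *ageing_ms)
{
    char *end;
    unsigned long secs;

    assert(arg != NULL && ageing_ms != NULL);
    errno = 0;
    secs = strtoul(arg, &end, 10);
    if (errno || end == arg || *end != '\0'
        || secs == 0 || secs > NEIGH_AGEING_MAX_SECS) {
        return false;
    }
    /* At most 3600 * 1000 = 3600000, which fits in uint32_t. */
    *ageing_ms = (uint32_t) secs * 1000;
    return true;
}
```

Giving the limit a name also makes it easy to reference from the ovs-appctl man page.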

> +unixctl_command_reply_error(conn, "bad ageing value");
> +return;
> +}
> +
> +ageing *= 1000;
> +atomic_store_relaxed(&neigh_ageing, ageing);
> +new_exp = time_msec() + ageing;
> +
> +CMAP_FOR_EACH (neigh, cmap_node, &table) {
> +atomic_read_relaxed(&neigh->expires, &curr_exp);
> +if (new_exp < curr_exp) {
> +atomic_store_relaxed(&neigh->expires, new_exp);
> +}
> +}
> +
> +unixctl_command_reply(conn, "OK");
> +}
> +
>  static int
>  lookup_any(const char *host_name, struct in6_addr *address)
>  {
> @@ -346,10 +394,21 @@ tnl_neigh_cache_show(struct unixctl_conn *conn, int 
> argc OVS_UNUSED,
>  void
>  

Re: [ovs-dev] [PATCH RFC 1/5] Native tunnel: Read/write expires atomically.

2021-10-25 Thread Flavio Leitner


Hi Paolo,

The lookup does not change the cmap, but it does change the entry,
which can be used by multiple threads. Normally that would require
a mutex to modify the entry. However, in this specific case only
'expires' needs to change, and the other fields are static.
Therefore, using atomic ops makes sense to me.

Since you're using atomic ops, it would be good to include
"ovs-atomic.h" explicitly, even though it is already included
indirectly by the thread or RCU headers.

What do you think?

fbl


On Thu, Oct 07, 2021 at 02:35:15PM +0200, Paolo Valerio wrote:
> Signed-off-by: Paolo Valerio 
> ---
>  lib/tnl-neigh-cache.c |   31 ++-
>  1 file changed, 22 insertions(+), 9 deletions(-)
> 
> diff --git a/lib/tnl-neigh-cache.c b/lib/tnl-neigh-cache.c
> index 5bda4af7e..a37456e6d 100644
> --- a/lib/tnl-neigh-cache.c
> +++ b/lib/tnl-neigh-cache.c
> @@ -44,14 +44,14 @@
>  #include "openvswitch/vlog.h"
>  
>  
> -/* In seconds */
> -#define NEIGH_ENTRY_DEFAULT_IDLE_TIME  (15 * 60)
> +/* In milliseconds */
> +#define NEIGH_ENTRY_DEFAULT_IDLE_TIME  (15 * 60 * 1000)
>  
>  struct tnl_neigh_entry {
>  struct cmap_node cmap_node;
>  struct in6_addr ip;
>  struct eth_addr mac;
> -time_t expires; /* Expiration time. */
> +atomic_llong expires;   /* Expiration time in ms. */
>  char br_name[IFNAMSIZ];
>  };
>  
> @@ -64,6 +64,16 @@ tnl_neigh_hash(const struct in6_addr *ip)
>  return hash_bytes(ip->s6_addr, 16, 0);
>  }
>  
> +static bool
> +tnl_neigh_expired(struct tnl_neigh_entry *neigh)
> +{
> +long long expired;
> +
> +atomic_read_relaxed(&neigh->expires, &expired);
> +
> +return expired <= time_msec();
> +}
> +
>  static struct tnl_neigh_entry *
>  tnl_neigh_lookup__(const char br_name[IFNAMSIZ], const struct in6_addr *dst)
>  {
> @@ -73,11 +83,12 @@ tnl_neigh_lookup__(const char br_name[IFNAMSIZ], const 
> struct in6_addr *dst)
>  hash = tnl_neigh_hash(dst);
>  CMAP_FOR_EACH_WITH_HASH (neigh, cmap_node, hash, &table) {
>  if (ipv6_addr_equals(&neigh->ip, dst) && !strcmp(neigh->br_name, 
> br_name)) {
> -if (neigh->expires <= time_now()) {
> +if (tnl_neigh_expired(neigh)) {
>  return NULL;
>  }
>  
> -neigh->expires = time_now() + NEIGH_ENTRY_DEFAULT_IDLE_TIME;
> +atomic_store_relaxed(&neigh->expires, time_msec() +
> + NEIGH_ENTRY_DEFAULT_IDLE_TIME);
>  return neigh;
>  }
>  }
> @@ -121,7 +132,8 @@ tnl_neigh_set__(const char name[IFNAMSIZ], const struct 
> in6_addr *dst,
>  struct tnl_neigh_entry *neigh = tnl_neigh_lookup__(name, dst);
>  if (neigh) {
>  if (eth_addr_equals(neigh->mac, mac)) {
> -neigh->expires = time_now() + NEIGH_ENTRY_DEFAULT_IDLE_TIME;
> +atomic_store_relaxed(&neigh->expires, time_msec() +
> + NEIGH_ENTRY_DEFAULT_IDLE_TIME);
>  ovs_mutex_unlock();
>  return;
>  }
> @@ -133,7 +145,8 @@ tnl_neigh_set__(const char name[IFNAMSIZ], const struct 
> in6_addr *dst,
>  
>  neigh->ip = *dst;
>  neigh->mac = mac;
> -neigh->expires = time_now() + NEIGH_ENTRY_DEFAULT_IDLE_TIME;
> +atomic_store_relaxed(&neigh->expires, time_msec() +
> + NEIGH_ENTRY_DEFAULT_IDLE_TIME);
>  ovs_strlcpy(neigh->br_name, name, sizeof neigh->br_name);
>  cmap_insert(&table, &neigh->cmap_node, tnl_neigh_hash(&neigh->ip));
>  ovs_mutex_unlock(&mutex);
> @@ -208,7 +221,7 @@ tnl_neigh_cache_run(void)
>  
>  ovs_mutex_lock(&mutex);
>  CMAP_FOR_EACH(neigh, cmap_node, &table) {
> -if (neigh->expires <= time_now()) {
> +if (tnl_neigh_expired(neigh)) {
>  tnl_neigh_delete(neigh);
>  changed = true;
>  }
> @@ -319,7 +332,7 @@ tnl_neigh_cache_show(struct unixctl_conn *conn, int argc 
> OVS_UNUSED,
>  
>  ds_put_format(, ETH_ADDR_FMT"   %s",
>ETH_ADDR_ARGS(neigh->mac), neigh->br_name);
> -if (neigh->expires <= time_now()) {
> +if (tnl_neigh_expired(neigh)) {
>  ds_put_format(, " STALE");
>  }
>  ds_put_char(, '\n');
> 

-- 
fbl
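The quoted diff replaces plain writes to `neigh->expires` with `atomic_store_relaxed()`, switches from `time_now()` to `time_msec()`, and routes every expiry check through `tnl_neigh_expired()`, whose body is not shown in the thread. Below is a minimal sketch of that pattern in portable C11. It assumes the helper is a relaxed load compared against the current time. The names `neigh_entry`, `neigh_refresh()`, `neigh_expired()`, `now_msec()` and the 15-second `IDLE_TIME_MS` are hypothetical and do not come from OVS.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical idle time in milliseconds (not OVS's real value). */
#define IDLE_TIME_MS 15000LL

/* Hypothetical cache entry: only the expiry field matters here. */
struct neigh_entry {
    _Atomic long long expires;      /* Absolute expiry time, in ms. */
};

/* Monotonic clock in milliseconds, standing in for time_msec(). */
static long long
now_msec(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long) ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Push the expiry forward on every hit.  Relaxed ordering suffices because
 * 'expires' is a standalone timestamp that does not publish other data. */
static void
neigh_refresh(struct neigh_entry *e, long long now)
{
    assert(e != NULL);
    atomic_store_explicit(&e->expires, now + IDLE_TIME_MS,
                          memory_order_relaxed);
}

/* The single expiry check shared by lookup, aging, and "show". */
static bool
neigh_expired(struct neigh_entry *e, long long now)
{
    return atomic_load_explicit(&e->expires, memory_order_relaxed) <= now;
}
```

Passing `now` in explicitly keeps the check deterministic in tests. A real caller would pass `now_msec()`.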


Re: [ovs-dev] [PATCH v2] netdev-linux: Fix a null pointer dereference in netdev_linux_notify_sock()

2021-09-11 Thread Flavio Leitner
On Sat, Sep 11, 2021 at 02:34:47PM +0800, Yunjian Wang wrote:
> If nl_sock_join_mcgroup() returns an error, the 'sock' is freed and
> set to NULL. This issue will lead to a null pointer dereference in
> nl_sock_listen_all_nsid(). To fix it, we call nl_sock_listen_all_nsid()
> before joining the mcgroups.
> 
> Fixes: cf114a7fce80 ("netlink linux: enable listening to all nsids")
> Cc: Flavio Leitner 
> Signed-off-by: Yunjian Wang 
> ---

Acked-by: Flavio Leitner 
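According to the v2 commit message, a failed multicast-group join leaves 'sock' freed and set to NULL. The later `nl_sock_listen_all_nsid()` call therefore dereferenced NULL, and the fix moves that call ahead of the join loop. Below is a minimal sketch of that ordering. It uses hypothetical stand-ins (`fake_sock`, `fake_join_mcgroup()`, `fake_listen_all_nsid()`) rather than the real netlink API. It also assumes that enabling the all-nsid option only flags the socket, so its position relative to the joins does not matter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct nl_sock. */
struct fake_sock {
    bool all_nsid;                  /* Set by fake_listen_all_nsid(). */
};

/* Stand-in for joining one multicast group; fails for 'failing_group'. */
static int
fake_join_mcgroup(struct fake_sock *sock, int group, int failing_group)
{
    assert(sock != NULL);
    return group == failing_group ? -1 : 0;
}

/* Stand-in for enabling the all-nsid option: it only flags the socket. */
static void
fake_listen_all_nsid(struct fake_sock *sock, bool enable)
{
    assert(sock != NULL);           /* Calling this with NULL was the bug. */
    sock->all_nsid = enable;
}

/* Fixed ordering: configure the socket while it is known to be valid,
 * then join the groups.  Returns NULL if any join fails. */
static struct fake_sock *
notify_sock_create(const int *groups, size_t n_groups, int failing_group)
{
    struct fake_sock *sock = calloc(1, sizeof *sock);

    if (!sock) {
        return NULL;
    }
    fake_listen_all_nsid(sock, true);
    for (size_t i = 0; i < n_groups; i++) {
        if (fake_join_mcgroup(sock, groups[i], failing_group)) {
            /* On error the socket is destroyed and the pointer cleared,
             * so nothing after this loop may dereference it. */
            free(sock);
            sock = NULL;
            break;
        }
    }
    return sock;
}
```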



Re: [ovs-dev] [PATCH v2 2/2] stream-ssl: Update default protocols, enable TLSv1.3

2021-09-11 Thread Flavio Leitner
On Fri, Sep 10, 2021 at 06:23:18PM +0200, Frode Nordahl wrote:
> On Thu, Sep 9, 2021 at 10:05 PM Flavio Leitner  wrote:
> >
> >
> > Hi Frode,
> 
> Hello Flavio, thank you for taking the time to review.
> 
> > On Wed, Aug 25, 2021 at 01:05:14PM +0200, Frode Nordahl wrote:
> > > RFC 8996 [0] deprecates the use of TLSv1 and TLSv1.1 for security
> > > reasons.  Update our default list of protocols to be in compliance.
> > >
> > > Also add TLSv1.3 to the default list of supported protocols.
> > >
> > > 0: https://datatracker.ietf.org/doc/html/rfc8996
> > > Signed-off-by: Frode Nordahl 
> >
> > This patch does two things:
> >   Deprecate TLSv1 and TLSv1.1
> >   Add support for TLSv1.3
> >
> > Can we split them into separate logical patches?
> 
> Yes, that makes sense.
> 
> > Also, shouldn't we first warn the users about deprecating
> > TLSv1 and TLSv1.1 for a release period, and then remove them from
> > the default list in the next release?  I mean, this is a
> > user-visible change, so users might need some time to adapt.
> >
> > What do you think?
> 
> That would indeed be the appropriate thing to do. I guess I felt a bit
> of a rush since we are a bit late on this deprecation, but better do
> it properly regardless!

Cool, thanks!
fbl



> 
> Thanks!
> 
> -- 
> Frode Nordahl
> 
> > fbl
> >
> > > ---
> > >  NEWS  |  7 +++
> > >  lib/ssl-connect.man   |  6 --
> > >  lib/stream-ssl.c  | 35 +--
> > >  m4/openvswitch.m4 | 16 
> > >  tests/ovsdb-server.at |  6 ++
> > >  5 files changed, 58 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/NEWS b/NEWS
> > > index 1f2adf718..502e6693a 100644
> > > --- a/NEWS
> > > +++ b/NEWS
> > > @@ -8,6 +8,13 @@ Post-v2.16.0
> > > by default.  'other_config:dpdk-socket-limit' can be set equal to
> > > the 'other_config:dpdk-socket-mem' to preserve the legacy memory
> > > limiting behavior.
> > > +   - Update default configuration for enabled TLS protocols:
> > > + * RFC 8996 deprecates the use of TLSv1 and TLSv1.1 for security reasons.
> > > + * Add TLSv1.3 to the default list of enabled protocols when built with
> > > +   OpenSSL v1.1.1 and newer.
> > > + * The new default is as such: TLSv1.2,TLSv1.3
> > > + * As a consequence we no longer support building Open vSwitch with OpenSSL
> > > +   versions without TLSv1.2 support.
> > >
> > >
> > >  v2.16.0 - 16 Aug 2021
> > > diff --git a/lib/ssl-connect.man b/lib/ssl-connect.man
> > > index 6e54f77ef..0dd5a29be 100644
> > > --- a/lib/ssl-connect.man
> > > +++ b/lib/ssl-connect.man
> > > @@ -1,10 +1,12 @@
> > >  .IP "\fB\-\-ssl\-protocols=\fIprotocols\fR"
> > >  Specifies, in a comma- or space-delimited list, the SSL protocols
> > >  \fB\*(PN\fR will enable for SSL connections.  Supported
> > > -\fIprotocols\fR include \fBTLSv1\fR, \fBTLSv1.1\fR, and \fBTLSv1.2\fR.
> > > +\fIprotocols\fR include \fBTLSv1.2\fR and \fBTLSv1.3\fR depending on
> > > +which version of OpenSSL Open vSwitch is built with.
> > >  Regardless of order, the highest protocol supported by both sides will
> > >  be chosen when making the connection.  The default when this option is
> > > -omitted is \fBTLSv1,TLSv1.1,TLSv1.2\fR.
> > > +omitted is \fBTLSv1.2,TLSv1.3\fR if built with a version of OpenSSL that
> > > +supports \fBTLSv1.3\fR.
> > >  .
> > >  .IP "\fB\-\-ssl\-ciphers=\fIciphers\fR"
> > >  Specifies, in OpenSSL cipher string format, the ciphers \fB\*(PN\fR will
> > > diff --git a/lib/stream-ssl.c b/lib/stream-ssl.c
> > > index 6b4cf6970..954067787 100644
> > > --- a/lib/stream-ssl.c
> > > +++ b/lib/stream-ssl.c
> > > @@ -162,7 +162,13 @@ struct ssl_config_file {
> > >  static struct ssl_config_file private_key;
> > >  static struct ssl_config_file certificate;
> > >  static struct ssl_config_file ca_cert;
> > > -static char *default_ssl_protocols = "TLSv1,TLSv1.1,TLSv1.2";
> > > +/* RFC 8996 deprecates the use of TLSv1 and TLSv1.1, users may still specify
> > > + * them in their configuration, but our defaults are in compliance. */
> > > +#if OPENSSL_VERSION_NUMBER < 0x10101000L
> > > +static cha

Re: [ovs-dev] [PATCH v2 1/2] stream-ssl: Fix handling of default ciphers/protocols

2021-09-11 Thread Flavio Leitner
On Fri, Sep 10, 2021 at 06:20:45PM +0200, Frode Nordahl wrote:
> On Thu, Sep 9, 2021 at 9:53 PM Flavio Leitner  wrote:
> >
> >
> > Hi Frode,
> >
> > Thanks for your patch.
> > Please see my comments below.
> 
> Flavio, thank you for taking the time to review.
> 
> > On Wed, Aug 25, 2021 at 01:05:13PM +0200, Frode Nordahl wrote:
> > > Contrary to what is stated in the documentation, when SSL
> > > ciphers or protocols options are omitted the default values
> > > will not be set.  The SSL library default settings will be used
> > > instead.
> > >
> > > Fix handling of default ciphers and protocols so that we actually
> > > enforce what is listed as defaults.
> > >
> > > Fixes: e18a1d086133 ("Add support for specifying SSL connection parameters to ovsdb")
> > > Signed-off-by: Frode Nordahl 
> > > ---
> > >  lib/stream-ssl.c | 30 ++
> > >  1 file changed, 22 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/lib/stream-ssl.c b/lib/stream-ssl.c
> > > index 0ea3f2c08..6b4cf6970 100644
> > > --- a/lib/stream-ssl.c
> > > +++ b/lib/stream-ssl.c
> > > @@ -162,8 +162,10 @@ struct ssl_config_file {
> > >  static struct ssl_config_file private_key;
> > >  static struct ssl_config_file certificate;
> > >  static struct ssl_config_file ca_cert;
> > > -static char *ssl_protocols = "TLSv1,TLSv1.1,TLSv1.2";
> > > -static char *ssl_ciphers = "HIGH:!aNULL:!MD5";
> > > +static char *default_ssl_protocols = "TLSv1,TLSv1.1,TLSv1.2";
> > > +static char *default_ssl_ciphers = "HIGH:!aNULL:!MD5";
> > > +static char *ssl_protocols = NULL;
> > > +static char *ssl_ciphers = NULL;
> > >
> > >  /* Ordinarily, the SSL client and server verify each other's certificates using
> > >   * a CA certificate.  Setting this to false disables this behavior.  (This is a
> > > @@ -1225,14 +1227,19 @@ stream_ssl_set_key_and_cert(const char *private_key_file,
> > >  void
> > >  stream_ssl_set_ciphers(const char *arg)
> > >  {
> > > -if (ssl_init() || !arg || !strcmp(ssl_ciphers, arg)) {
> >
> > ssl_init() calls do_ssl_init() at least once, which in turn
> > calls SSL_CTX_set_cipher_list(ctx, "HIGH:!aNULL:!MD5").
> > Those are the defaults in the man-page and not from the library.
> >
> > The do_ssl_init() also does:
> >method = CONST_CAST(SSL_METHOD *, SSLv23_method());
> >
> > That should return a method supporting SSLv3, TLSv1, TLSv1.1 and TLSv1.2.
> >
> >ctx = SSL_CTX_new(method);
> >SSL_CTX_set_options(ctx, SSL_OP_NO_SSLv2 | SSL_OP_NO_SSLv3);
> >
> > And there it excludes SSLv2 and SSLv3.
> >
> > Therefore, the default would be "TLSv1,TLSv1.1,TLSv1.2" which is
> > the same in the man-page.
> >
> > Did I miss something?
> 
> Thank you for pointing out that, I did not realize we manipulated
> these options multiple places.
> 
> I do need to rephrase the commit message, but there is still a problem
> here. It became apparent when working on the next patch in the series,
> where functional tests behave unexpectedly when passing the
> ssl-protocols options. The reason being that when the protocol list
> matches what is already in the static char *ssl_protocols in
> lib/stream-ssl.c stream_ssl_set_protocols returns early without
> setting any option.

If that matches then it is because the default is set, so it
shouldn't make any difference to return early. Do you still
have the next patch without the default_ssl_protocols change
for me to take a look? That might help me to see the issue.

> So I guess the question then becomes, should we stop doing this
> multiple places or do you want me to update the call to
> SSL_CTX_set_options in do_ssl_init and not introduce this change?

Not sure yet, because I haven't seen the problem.

Thanks,
fbl

> 
> -- 
> Frode Nordahl
> 
> > Thanks
> > fbl
> >
> >
> >
> > > +const char *input = arg ? arg : default_ssl_ciphers;
> > > +
> > > +if (ssl_init() || !input || (ssl_ciphers && !strcmp(ssl_ciphers, input))) {
> > >  return;
> > >  }
> > > -if (SSL_CTX_set_cipher_list(ctx,arg) == 0) {
> > > +if (SSL_CTX_set_cipher_list(ctx, input) == 0) {
> > >  VLOG_ERR("SSL_CTX_set_cipher_list: %s",
> > >   ERR_error_string(ERR_get_

Re: [ovs-dev] [PATCH v2 2/2] stream-ssl: Update default protocols, enable TLSv1.3

2021-09-09 Thread Flavio Leitner


Hi Frode,

On Wed, Aug 25, 2021 at 01:05:14PM +0200, Frode Nordahl wrote:
> RFC 8996 [0] deprecates the use of TLSv1 and TLSv1.1 for security
> reasons.  Update our default list of protocols to be in compliance.
> 
> Also add TLSv1.3 to the default list of supported protocols.
> 
> 0: https://datatracker.ietf.org/doc/html/rfc8996
> Signed-off-by: Frode Nordahl 

This patch does two things:
  Deprecate TLSv1 and TLSv1.1
  Add support for TLSv1.3

Can we split them into separate logical patches?

Also, shouldn't we first warn the users about deprecating
TLSv1 and TLSv1.1 for a release period, and then remove them from
the default list in the next release?  I mean, this is a
user-visible change, so users might need some time to adapt.

What do you think?

fbl

> ---
>  NEWS  |  7 +++
>  lib/ssl-connect.man   |  6 --
>  lib/stream-ssl.c  | 35 +--
>  m4/openvswitch.m4 | 16 
>  tests/ovsdb-server.at |  6 ++
>  5 files changed, 58 insertions(+), 12 deletions(-)
> 
> diff --git a/NEWS b/NEWS
> index 1f2adf718..502e6693a 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -8,6 +8,13 @@ Post-v2.16.0
> by default.  'other_config:dpdk-socket-limit' can be set equal to
> the 'other_config:dpdk-socket-mem' to preserve the legacy memory
> limiting behavior.
> +   - Update default configuration for enabled TLS protocols:
> + * RFC 8996 deprecates the use of TLSv1 and TLSv1.1 for security reasons.
> + * Add TLSv1.3 to the default list of enabled protocols when built with
> +   OpenSSL v1.1.1 and newer.
> + * The new default is as such: TLSv1.2,TLSv1.3
> + * As a consequence we no longer support building Open vSwitch with OpenSSL
> +   versions without TLSv1.2 support.
>  
>  
>  v2.16.0 - 16 Aug 2021
> diff --git a/lib/ssl-connect.man b/lib/ssl-connect.man
> index 6e54f77ef..0dd5a29be 100644
> --- a/lib/ssl-connect.man
> +++ b/lib/ssl-connect.man
> @@ -1,10 +1,12 @@
>  .IP "\fB\-\-ssl\-protocols=\fIprotocols\fR"
>  Specifies, in a comma- or space-delimited list, the SSL protocols
>  \fB\*(PN\fR will enable for SSL connections.  Supported
> -\fIprotocols\fR include \fBTLSv1\fR, \fBTLSv1.1\fR, and \fBTLSv1.2\fR.
> +\fIprotocols\fR include \fBTLSv1.2\fR and \fBTLSv1.3\fR depending on
> +which version of OpenSSL Open vSwitch is built with.
>  Regardless of order, the highest protocol supported by both sides will
>  be chosen when making the connection.  The default when this option is
> -omitted is \fBTLSv1,TLSv1.1,TLSv1.2\fR.
> +omitted is \fBTLSv1.2,TLSv1.3\fR if built with a version of OpenSSL that
> +supports \fBTLSv1.3\fR.
>  .
>  .IP "\fB\-\-ssl\-ciphers=\fIciphers\fR"
>  Specifies, in OpenSSL cipher string format, the ciphers \fB\*(PN\fR will 
> diff --git a/lib/stream-ssl.c b/lib/stream-ssl.c
> index 6b4cf6970..954067787 100644
> --- a/lib/stream-ssl.c
> +++ b/lib/stream-ssl.c
> @@ -162,7 +162,13 @@ struct ssl_config_file {
>  static struct ssl_config_file private_key;
>  static struct ssl_config_file certificate;
>  static struct ssl_config_file ca_cert;
> -static char *default_ssl_protocols = "TLSv1,TLSv1.1,TLSv1.2";
> +/* RFC 8996 deprecates the use of TLSv1 and TLSv1.1, users may still specify
> + * them in their configuration, but our defaults are in compliance. */
> +#if OPENSSL_VERSION_NUMBER < 0x10101000L
> +static char *default_ssl_protocols = "TLSv1.2";
> +#else
> +static char *default_ssl_protocols = "TLSv1.2,TLSv1.3";
> +#endif /* OPENSSL_VERSION_NUMBER < 0x10101000L */
>  static char *default_ssl_ciphers = "HIGH:!aNULL:!MD5";
>  static char *ssl_protocols = NULL;
>  static char *ssl_ciphers = NULL;
> @@ -1255,9 +1261,18 @@ stream_ssl_set_protocols(const char *arg)
>  return;
>  }
>  
> +/* TODO: Using SSL_CTX_set_options to enable individual protocol versions
> + * is deprecated as of OpenSSL v1.1.0.  Once we can drop support for builds
> + * with OpenSSL pre v1.1.0 we should use SSL_CTX_set_min_proto_version and
> + * SSL_CTX_set_max_proto_version instead. */
> +
>  /* Start with all the flags off and turn them on as requested. */
>  #ifndef SSL_OP_NO_SSL_MASK
> -/* For old OpenSSL without this macro, this is the correct value.  */
> +/* For old OpenSSL without this macro, this is the correct value.
> + *
> + * NOTE: We deliberately did not extend this compatibility macro to
> + * include SSL_OP_NO_TLSv1_3 because if you do have a version of OpenSSL
> + * with TLSv1.3 support this macro would be defined by OpenSSL. */
>  #define SSL_OP_NO_SSL_MASK (SSL_OP_NO_SSLv2 | SSL_OP_NO_SSLv3 | \
>  SSL_OP_NO_TLSv1 | SSL_OP_NO_TLSv1_1 | \
>  SSL_OP_NO_TLSv1_2)
> @@ -1272,12 +1287,20 @@ stream_ssl_set_protocols(const char *arg)
>  goto exit;
>  }
>  while (word != NULL) {
> -long on_flag;
> -if (!strcasecmp(word, "TLSv1.2")){
> +long 
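The TODO in this hunk notes that `stream_ssl_set_protocols()` still builds its mask from `SSL_OP_NO_*` options. It starts with every protocol disabled (`SSL_OP_NO_SSL_MASK`) and clears one bit per requested name. Below is a self-contained sketch of that parsing step. The `NO_*` bit values and `protocols_to_flags()` are made up for illustration, so the example compiles without OpenSSL. The real constants come from `<openssl/ssl.h>`, and this sketch accepts only TLS names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Hypothetical bit values; OpenSSL's real SSL_OP_NO_* constants differ. */
enum {
    NO_SSLv2   = 1 << 0,
    NO_SSLv3   = 1 << 1,
    NO_TLSv1   = 1 << 2,
    NO_TLSv1_1 = 1 << 3,
    NO_TLSv1_2 = 1 << 4,
    NO_TLSv1_3 = 1 << 5,
    NO_ALL     = NO_SSLv2 | NO_SSLv3 | NO_TLSv1 | NO_TLSv1_1
                 | NO_TLSv1_2 | NO_TLSv1_3,
};

/* Start with every protocol disabled and clear the NO_* bit of each
 * requested protocol.  Returns -1 for an empty list or an unknown name. */
static long
protocols_to_flags(const char *list)
{
    static const struct {
        const char *name;
        long no_flag;
    } protos[] = {
        { "TLSv1",   NO_TLSv1 },
        { "TLSv1.1", NO_TLSv1_1 },
        { "TLSv1.2", NO_TLSv1_2 },
        { "TLSv1.3", NO_TLSv1_3 },
    };
    const size_t n_protos = sizeof protos / sizeof protos[0];
    long flags = NO_ALL;
    int n_words = 0;
    char *save_ptr = NULL;
    char *s;

    assert(list != NULL);
    s = strdup(list);
    if (!s) {
        return -1;
    }
    for (char *word = strtok_r(s, " ,\t", &save_ptr); word;
         word = strtok_r(NULL, " ,\t", &save_ptr)) {
        size_t i;

        for (i = 0; i < n_protos; i++) {
            if (!strcasecmp(word, protos[i].name)) {
                break;
            }
        }
        if (i == n_protos) {
            free(s);
            return -1;              /* Unknown protocol name. */
        }
        flags &= ~protos[i].no_flag;
        n_words++;
    }
    free(s);
    return n_words ? flags : -1;
}
```

With OpenSSL 1.1.0 or later, the TODO suggests a different approach: call `SSL_CTX_set_min_proto_version()` and `SSL_CTX_set_max_proto_version()` instead of assembling such a mask.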

Re: [ovs-dev] [PATCH v2 1/2] stream-ssl: Fix handling of default ciphers/protocols

2021-09-09 Thread Flavio Leitner


Hi Frode,

Thanks for your patch.
Please see my comments below.

On Wed, Aug 25, 2021 at 01:05:13PM +0200, Frode Nordahl wrote:
> Contrary to what is stated in the documentation, when SSL
> ciphers or protocols options are omitted the default values
> will not be set.  The SSL library default settings will be used
> instead.
> 
> Fix handling of default ciphers and protocols so that we actually
> enforce what is listed as defaults.
> 
> Fixes: e18a1d086133 ("Add support for specifying SSL connection parameters to ovsdb")
> Signed-off-by: Frode Nordahl 
> ---
>  lib/stream-ssl.c | 30 ++
>  1 file changed, 22 insertions(+), 8 deletions(-)
> 
> diff --git a/lib/stream-ssl.c b/lib/stream-ssl.c
> index 0ea3f2c08..6b4cf6970 100644
> --- a/lib/stream-ssl.c
> +++ b/lib/stream-ssl.c
> @@ -162,8 +162,10 @@ struct ssl_config_file {
>  static struct ssl_config_file private_key;
>  static struct ssl_config_file certificate;
>  static struct ssl_config_file ca_cert;
> -static char *ssl_protocols = "TLSv1,TLSv1.1,TLSv1.2";
> -static char *ssl_ciphers = "HIGH:!aNULL:!MD5";
> +static char *default_ssl_protocols = "TLSv1,TLSv1.1,TLSv1.2";
> +static char *default_ssl_ciphers = "HIGH:!aNULL:!MD5";
> +static char *ssl_protocols = NULL;
> +static char *ssl_ciphers = NULL;
>  
>  /* Ordinarily, the SSL client and server verify each other's certificates using
>   * a CA certificate.  Setting this to false disables this behavior.  (This is a
> @@ -1225,14 +1227,19 @@ stream_ssl_set_key_and_cert(const char *private_key_file,
>  void
>  stream_ssl_set_ciphers(const char *arg)
>  {
> -if (ssl_init() || !arg || !strcmp(ssl_ciphers, arg)) {

ssl_init() calls do_ssl_init() at least once, which in turn
calls SSL_CTX_set_cipher_list(ctx, "HIGH:!aNULL:!MD5").
Those are the defaults in the man-page and not from the library.

The do_ssl_init() also does:
   method = CONST_CAST(SSL_METHOD *, SSLv23_method());  

That should return a method supporting SSLv3, TLSv1, TLSv1.1 and TLSv1.2.

   ctx = SSL_CTX_new(method); 
   SSL_CTX_set_options(ctx, SSL_OP_NO_SSLv2 | SSL_OP_NO_SSLv3);

And there it excludes SSLv2 and SSLv3.

Therefore, the default would be "TLSv1,TLSv1.1,TLSv1.2" which is
the same in the man-page.

Did I miss something?

Thanks
fbl



> +const char *input = arg ? arg : default_ssl_ciphers;
> +
> +if (ssl_init() || !input || (ssl_ciphers && !strcmp(ssl_ciphers, input))) {
>  return;
>  }
> -if (SSL_CTX_set_cipher_list(ctx,arg) == 0) {
> +if (SSL_CTX_set_cipher_list(ctx, input) == 0) {
>  VLOG_ERR("SSL_CTX_set_cipher_list: %s",
>   ERR_error_string(ERR_get_error(), NULL));
>  }
> -ssl_ciphers = xstrdup(arg);
> +if (ssl_ciphers) {
> +free(ssl_ciphers);
> +}
> +ssl_ciphers = xstrdup(input);
>  }
>  
>  /* Set SSL protocols based on the string input. Aborts with an error message
> @@ -1240,7 +1247,11 @@ stream_ssl_set_ciphers(const char *arg)
>  void
>  stream_ssl_set_protocols(const char *arg)
>  {
> -if (ssl_init() || !arg || !strcmp(arg, ssl_protocols)){
> +const char *input = arg ? arg : default_ssl_protocols;
> +
> +if (ssl_init() || !input
> +|| (ssl_protocols && !strcmp(input, ssl_protocols)))
> +{
>  return;
>  }
>  
> @@ -1253,7 +1264,7 @@ stream_ssl_set_protocols(const char *arg)
>  #endif
>  long protocol_flags = SSL_OP_NO_SSL_MASK;
>  
> -char *s = xstrdup(arg);
> +char *s = xstrdup(input);
>  char *save_ptr = NULL;
>  char *word = strtok_r(s, " ,\t", &save_ptr);
>  if (word == NULL) {
> @@ -1281,7 +1292,10 @@ stream_ssl_set_protocols(const char *arg)
>  /* Set the actual options. */
>  SSL_CTX_set_options(ctx, protocol_flags);
>  
> -ssl_protocols = xstrdup(arg);
> +if (ssl_protocols) {
> +  free(ssl_protocols);
> +}
> +ssl_protocols = xstrdup(input);
>  
>  exit:
>  free(s);
> -- 
> 2.32.0
> 

-- 
fbl
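Patch 1/2 changes the setters in two ways. They fall back to the documented default when no argument is given, and they compare the *effective* value against a cached copy that starts as NULL, so the first call always reaches the library. Below is a minimal sketch of that setter shape with a counting stub. `apply_to_library()`, `set_ciphers()` and `apply_count` are hypothetical stand-ins for `SSL_CTX_set_cipher_list()` and `stream_ssl_set_ciphers()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static const char *default_ciphers = "HIGH:!aNULL:!MD5";
static char *current_ciphers = NULL;    /* NULL until first applied. */
static int apply_count = 0;             /* Library calls, for the demo. */

/* Hypothetical stand-in for SSL_CTX_set_cipher_list(). */
static void
apply_to_library(const char *ciphers)
{
    assert(ciphers != NULL);
    apply_count++;
}

/* Fall back to the default, skip redundant updates, then cache a copy of
 * what was applied.  (OVS would use xstrdup(), which aborts on failure.) */
static void
set_ciphers(const char *arg)
{
    const char *input = arg ? arg : default_ciphers;

    if (current_ciphers && !strcmp(current_ciphers, input)) {
        return;                         /* Already in effect. */
    }
    apply_to_library(input);
    free(current_ciphers);              /* free(NULL) is a no-op. */
    current_ciphers = strdup(input);
}
```

Because `free(NULL)` is a no-op, the `if (ssl_ciphers)` guard around `free()` in the patch is not strictly needed. Likewise, the `!input` test can never be true as long as the defaults are non-NULL string literals.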


Re: [ovs-dev] [PATCH] netdev-linux: Fix a null pointer dereference in netdev_linux_notify_sock()

2021-09-09 Thread Flavio Leitner
On Thu, Sep 09, 2021 at 02:08:50PM +0200, David Marchand wrote:
> On Wed, Sep 8, 2021 at 1:53 PM Yunjian Wang  wrote:
> >
> > If nl_sock_join_mcgroup() returns an error, the 'sock' is freed
> > and set to NULL. So we should add NULL check of 'sock' before calling
> > nl_sock_listen_all_nsid().
> >
> > Fixes: cf114a7fce80 ("netlink linux: enable listening to all nsids")
> > Cc: Flavio Leitner 
> > Signed-off-by: Yunjian Wang 
> > ---
> >  lib/netdev-linux.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
> > index 60dd13891..7fec5f5a6 100644
> > --- a/lib/netdev-linux.c
> > +++ b/lib/netdev-linux.c
> > @@ -636,7 +636,9 @@ netdev_linux_notify_sock(void)
> >  }
> >  }
> >  }
> > -nl_sock_listen_all_nsid(sock, true);
> > +if (sock) {
> > +nl_sock_listen_all_nsid(sock, true);
> > +}
> >  ovsthread_once_done();
> >  }
> >
> 
> Would it make sense to move this call before the loop on groups?
> Something like:

It does to me. The nl_sock_listen_all_nsid() only sets a flag in
the socket, so it should not matter whether it is done before or
after joining the mcgroups.

fbl


> 
> @@ -627,6 +627,7 @@ netdev_linux_notify_sock(void)
>  if (!error) {
>  size_t i;
> 
> +nl_sock_listen_all_nsid(sock, true);
>  for (i = 0; i < ARRAY_SIZE(mcgroups); i++) {
>  error = nl_sock_join_mcgroup(sock, mcgroups[i]);
>  if (error) {
> @@ -636,7 +637,6 @@ netdev_linux_notify_sock(void)
>  }
>  }
>  }
> -nl_sock_listen_all_nsid(sock, true);
>  ovsthread_once_done();
>  }
> 
> 
> -- 
> David Marchand
> 

-- 
fbl



Re: [ovs-dev] [PATCH branch-2.16 0/2] Release patches for v2.16.0.

2021-08-16 Thread Flavio Leitner
On Mon, Aug 16, 2021 at 10:09:28PM +0200, Ilya Maximets wrote:
> There are still a couple of bug fixes that I want to apply before
> tagging v2.16.0, which are fixes for PACKET_OUT crash in userspace
> datapath:
>   
> https://patchwork.ozlabs.org/project/openvswitch/patch/20210816051007.16373-1-tony.vanderp...@alliedtelesis.co.nz/
> and the fix for setting conntrack zone match from frozen metadata:
>   
> https://patchwork.ozlabs.org/project/openvswitch/patch/20210801130911.66655-1-hepeng.0...@bytedance.com/
> 
> Sending release patches now to get them reviewed in time.
> 
> Ilya Maximets (2):
>   Set release date for 2.16.0.
>   Prepare for 2.16.1.
> 
>  NEWS | 5 -
>  configure.ac | 2 +-
>  debian/changelog | 8 +++-
>  3 files changed, 12 insertions(+), 3 deletions(-)
> 
> -- 

To this series.
Acked-by: Flavio Leitner 

Thanks Ilya!
fbl


Re: [ovs-dev] ideas for improving the OVS review process

2021-08-16 Thread Flavio Leitner


Hi,

On Tue, Jul 20, 2021 at 11:41:37AM -0700, Ben Pfaff wrote:
> The OVS review process has greatly slowed over the last few years.  This
> is partly because I haven't been able to spend as much time on review,
> since I was once the most productive reviewer.  Ilya has been able to
> step up the amount of review he does, but that still isn't enough to
> keep up with the workload.
> 
> We need to come up with some way to improve things.  Here are a few
> ideas, mostly from a call earlier today (that was mainly about the OVS
> conference).  I hope they will prompt a discussion.
> 
> * Since patches are coming in, we have people who are knowledgeable about
>   the code.  Those people should be pitching in with reviews as well.
>   It doesn't seem like they or their managers have the right incentives
>   to do that.  Maybe there is some way to improve the incentives.

There is that, and also the fact that even when others have reviewed a
patch, a maintainer still does a careful review, which discourages
reviewers.
"The perfect is the enemy of the good."

We have several emeritus maintainers, but only a couple of them are active
at the moment. It doesn't seem fair to ask them to do more work.  So,
I agree with the other reply to this thread saying that we need more
maintainers.

Although I think we could grow the number of maintainers, it doesn't
seem reasonable to ask each one of them to do an extensive review
by themselves on every patch.  My point is that there should be a
"chain of trust", or a rule that allows a patch to be "blindly
committed" if that patch is reviewed by N different members, for example.
The intention is not to forbid maintainers to review, but to alleviate
the load or pressure on them when the community already helped.

As a side effect, more people could become maintainers/committers
because we wouldn't require them to know OVS deeply or to commit huge
amounts of time to carefully reviewing others' work.

This brings up another point: unpredictability. Today it is not possible
to tell what is going to happen with a patch after it gets
posted to the mailing list. It can be silently ignored for many months,
or be reviewed by others and still not be accepted, etc.  We should
have a way to prevent that; otherwise it is very difficult to align
with the schedules of companies or other upstream projects.

For example, if a company is investing in a feature "X", it most probably
has a delivery date. Even if the work is posted during the development
phase, that doesn't mean the patch will be reviewed, accepted, or
rejected. It's a kind of chicken-and-egg issue. If the process is not
clear, why should managers provide more incentives to participate?


> * The Linux kernel uses something like a "default accept" policy for
>   patches that superficially look good and compile and boot, if there is
>   no review from a specific maintainer within a few days.  The patches
>   do get some small amount of review from a subsystem maintainer.  OVS
>   could adopt a similar policy.

I agree. Aaron is working to improve patchwork's status report for
each patch. Hopefully each community member could report test status
there.  It seems that if the community decides to go with the "default
accept" rule, there will be more incentive to connect tests to OVS
patchwork. We get more automation as a side effect.

However, we would still need to define who the sub-maintainer is,
or perhaps think about the "chain of trust"/"blindly committed" idea
mentioned above.


> * Some lack of review can be attributed to a reluctance to accept a
>   review from a reviewer who is at the same company as the patch
>   submitter.  There is good reason for this, but it is certainly
>   possible to get quality reviews from a coworker, and perhaps we should
>   relax the convention.

I also agree with that. I think OVS had that going on years ago and it
worked well. The same happens with OVN.  It should be OK during the
development phase, when we can revert if something goes wrong. We may be
stricter close to the release date, though, if some are uncomfortable
with this idea.


> * A flip side of the above could be to codify the requirement for review
>   from a non-coworker.  This would have the benefit of being able to
>   push back against requests to commit unreviewed long series on the
>   basis that it hasn't been reviewed by a third party.

Good idea. That goes in the same direction of having rules to help
maintainers accept patches without requiring them to do an
extensive review.

Also, should we change the 0-day bot to send reminders if a patch is
sitting without a review for a certain period of time?

Should we have area-specific rules? For example, for hardware offload,
would an ACK from another hardware offload company be enough to get the
patch accepted?


Thanks,
-- 
fbl


Re: [ovs-dev] [v15 06/10] dpif-netdev: Add a partial HWOL PMD statistic.

2021-07-15 Thread Flavio Leitner
On Thu, Jul 15, 2021 at 01:39:04PM +, Ferriter, Cian wrote:
> 
> 
> > -Original Message-
> > From: Flavio Leitner 
> > Sent: Friday 9 July 2021 18:54
> > To: Ferriter, Cian 
> > Cc: ovs-dev@openvswitch.org; i.maxim...@ovn.org
> > Subject: Re: [ovs-dev] [v15 06/10] dpif-netdev: Add a partial HWOL PMD statistic.
> > 
> > 
> > 
> > Hi,
> > 
> > After rebasing, the performance of the master branch went up in my env
> > from 12Mpps to 13Mpps. However, this specific patch brings it down
> > to 12Mpps. I am using dpif_scalar and generic lookup (no AVX512).
> > 
> 
> Thanks for the investigation. Always great seeing perf numbers and details!
> 
> I just want to check my understanding here with what you're seeing:
> 
> Performance before DPIF patchset: 12Mpps
> Performance at this patch: 12Mpps
> Performance after DPIF patchset: 13Mpps
> 
> So the performance recovers somewhere else in the patchset?


Interesting, which flags are you passing to build OVS?

Thanks for following up!
fbl


> 
> I've checked the performance behaviour in my case. I'm going to report
> relative performance numbers. They are relative to the master branch
> before AVX512 DPIF was applied (c36c8e3).
> I tried to run a similar testcase; I can see from the memcmp in the perf
> top output that you are using EMC. I am also using the scalar DPIF in all
> the testcases below.
> 
> 1.000x (0.0%)   Master before AVX512 DPIF (c36c8e3)
> 1.010x (1.0%)   DPIF patch 3 - dpif-avx512: Add ISA implementation of dpif.
> 1.042x (4.2%)   DPIF patch 4 - dpif-netdev: Add command to switch dpif implementation.
> 1.063x (6.3%)   DPIF patch 5 - dpif-netdev: Add command to get dpif implementations.
> 1.069x (6.9%)   DPIF patch 6 - dpif-netdev: Add a partial HWOL PMD statistic.
> 1.075x (7.5%)   Latest master which has AVX512 DPIF patches (d2e9703)
> 0.983x (-1.7%)  Master before AVX512 DPIF (c36c8e3), with prefetch change
> 1.080x (8.0%)   Latest master which has AVX512 DPIF patches (d2e9703), with prefetch change
> 
> > (I don't think this report should block the patch because the
> > counters are interesting and the analysis below doesn't point
> > directly to the proposed changes.)
> > 
> > This is a diff using all patches applied versus this patch reverted:
> > 21.44% +6.08%  ovs-vswitchd[.] miniflow_extract
> >  8.94% -1.92%  libc-2.28.so[.] __memcmp_avx2_movbe
> > 14.62% +1.44%  ovs-vswitchd[.] dp_netdev_input__
> >  2.80% -1.08%  ovs-vswitchd[.] dp_netdev_pmd_flush_output_on_port
> >  3.44% -0.91%  ovs-vswitchd[.] netdev_send
> > 
> > This is the code side by side, patch applied on the right side:
> > (sorry, long lines)
> > 
> 
> My mail client has wrapped the below lines, sorry for mangling the output!
> 
> 
> Please find it here:
> https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385448.html
> 
> > 
> > 
> > I don't see any relevant optimization difference in the code
> > above, but the "mov %r15w,-0x2(%r13)" on the right side accounts
> > for almost all the difference, though on the left side it seems
> > a bit more spread.
> > 
> > I applied the patch below and it helped to get to 12.7Mpps, so
> > almost at the same levels. I wonder if you see the same result.
> > 
> 
> Since I don't see the drop that you see with this patch, I see a smaller
> benefit when I apply the below patch to the latest master.
> The relative performance after adding the below prefetch, compared to
> before (latest master):
> 1.005x (0.5%)
> 
> When I compare before/after performance (including the prefetch code, on
> latest master), the overall performance difference is 0.5% here.
> 
> > diff --git a/lib/flow.c b/lib/flow.c
> > index 729d59b1b..4572e356b 100644
> > --- a/lib/flow.c
> > +++ b/lib/flow.c
> > @@ -746,6 +746,9 @@ miniflow_extract(struct dp_packet *packet, struct miniflow *dst)
> >  uint8_t *ct_nw_proto_p = NULL;
> >  ovs_be16 ct_tp_src = 0, ct_tp_dst = 0;
> > 
> > +/* dltype will be updated later. */
> > +OVS_PREFETCH_WRITE(miniflow_pointer(mf, dl_type));
> > +
> >  /* Metadata. */
> >  if (flow_tnl_dst_is_set(>tunnel)) {
> >  miniflow_push_words(mf, tunnel, >tunnel,
> > 
> > 
> > fbl
> > 
> 
> 
> 
> Thanks,
> Cian

-- 
fbl
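Flavio's experiment adds `OVS_PREFETCH_WRITE()` on the `dl_type` slot at the top of `miniflow_extract()`. The idea is that the cache line is already on its way by the time the late store happens. Below is a minimal sketch of that idea. It assumes a write prefetch maps to something like the GCC/Clang builtin `__builtin_prefetch(addr, 1)`, since the real OVS macro definition is not shown in the thread. `flow_buf` and `extract()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parse output: 'dl_type' sits after 128 bytes of other data,
 * so it is likely on a different cache line than 'words[0]'. */
struct flow_buf {
    uint64_t words[16];
    uint16_t dl_type;               /* Written late in the parse. */
};

/* Assumed mapping: GCC/Clang builtin, second argument 1 = for write. */
#define PREFETCH_WRITE(addr) __builtin_prefetch((addr), 1)

static void
extract(const uint8_t *pkt, size_t len, struct flow_buf *dst)
{
    assert(pkt != NULL && dst != NULL);

    /* Start fetching the line that holds 'dl_type' before other work. */
    PREFETCH_WRITE(&dst->dl_type);

    /* Earlier parsing work: here just copy the first bytes into 'words'. */
    for (size_t i = 0; i < 16; i++) {
        dst->words[i] = i < len ? pkt[i] : 0;
    }

    /* Late store: EtherType at bytes 12-13, network byte order. */
    dst->dl_type = len >= 14 ? (uint16_t) ((pkt[12] << 8) | pkt[13]) : 0;
}
```

A prefetch is only a hint and never changes results. As the numbers in this thread show, whether it helps depends on the machine and workload.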


Re: [ovs-dev] [PATCH v14 07/11] test/sytem-dpdk: Add unit test for mfex autovalidator

2021-07-15 Thread Flavio Leitner
On Thu, Jul 15, 2021 at 09:36:13PM +0530, kumar Amber wrote:
> From: Kumar Amber 
> 
> Tests:
>   6: OVS-DPDK - MFEX Autovalidator
>   7: OVS-DPDK - MFEX Autovalidator Fuzzy
>   8: OVS-DPDK - MFEX Configuration
> 
> Added a new directory to store the PCAP file used
> in the tests and a script to generate the fuzzy traffic
> type pcap to be used in fuzzy unit test.
> 
> Signed-off-by: Kumar Amber 
> Acked-by: Flavio Leitner 
> 
> ---

It looks good and works for me.

Acked-by: Flavio Leitner 

Without sse4.2:
OVS-DPDK unit tests

  6: OVS-DPDK - MFEX Autovalidator   skipped (system-dpdk.at:248)
  7: OVS-DPDK - MFEX Autovalidator Fuzzy skipped (system-dpdk.at:275)
  8: OVS-DPDK - MFEX Configuration   ok

With sse4.2:
OVS-DPDK unit tests

  6: OVS-DPDK - MFEX Autovalidator   ok
  7: OVS-DPDK - MFEX Autovalidator Fuzzy ok
  8: OVS-DPDK - MFEX Configuration   ok

fbl


Re: [ovs-dev] [PATCH v14 10/11] dpif-netdev/mfex: Add AVX512 based optimized miniflow extract

2021-07-15 Thread Flavio Leitner
On Thu, Jul 15, 2021 at 09:36:16PM +0530, kumar Amber wrote:
> From: Harry van Haaren 
> 
> This commit adds AVX512 implementations of miniflow extract.
> By using the 64 bytes available in an AVX512 register, it is
> possible to convert a packet to a miniflow data-structure in
> a small number of instructions.
> 
> The implementation here probes for Ether()/IP()/UDP() traffic,
> and builds the appropriate miniflow data-structure for packets
> that match the probe.
> 
> The implementation here is auto-validated by the miniflow
> extract autovalidator, hence its correctness can be easily
> tested and verified.
> 
> Note that this commit is designed to easily allow addition of new
> traffic profiles in a scalable way, without code duplication for
> each traffic profile.
> 
> Signed-off-by: Harry van Haaren 
> Acked-by: Eelco Chaudron 
> 
> ---

Acked-by: Flavio Leitner 
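
The probe described in the commit message can be pictured as a masked compare over the first 64 bytes of the packet. The sketch below is a scalar C stand-in for that idea. It assumes an untagged Ether()/IPv4 (no options)/UDP profile. The struct and function names are hypothetical, and the sketch neither uses AVX512 intrinsics nor builds a miniflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PROBE_BYTES 64  /* One AVX512 register worth of packet data. */

/* Hypothetical traffic profile: a byte mask plus the expected value of
 * the masked bytes. */
struct traffic_profile {
    const char *name;
    size_t min_len;
    uint8_t mask[PROBE_BYTES];
    uint8_t value[PROBE_BYTES];
};

static void
profile_init_eth_ipv4_udp(struct traffic_profile *p)
{
    memset(p, 0, sizeof *p);
    p->name = "Ether()/IP()/UDP()";
    p->min_len = 14 + 20 + 8;
    /* EtherType 0x0800 (IPv4) at offset 12. */
    p->mask[12] = 0xff; p->value[12] = 0x08;
    p->mask[13] = 0xff; p->value[13] = 0x00;
    /* Version 4, IHL 5 (no IP options) at offset 14. */
    p->mask[14] = 0xff; p->value[14] = 0x45;
    /* IP protocol 17 (UDP) at offset 14 + 9. */
    p->mask[23] = 0xff; p->value[23] = 17;
}

/* Scalar stand-in for a 64-byte vector masked compare: true when every
 * masked byte of the packet matches the profile. */
static bool
profile_probe(const struct traffic_profile *p, const uint8_t *pkt,
              size_t len)
{
    uint8_t buf[PROBE_BYTES] = {0};

    if (len < p->min_len) {
        return false;
    }
    memcpy(buf, pkt, len < PROBE_BYTES ? len : PROBE_BYTES);
    for (size_t i = 0; i < PROBE_BYTES; i++) {
        if ((buf[i] & p->mask[i]) != p->value[i]) {
            return false;
        }
    }
    return true;
}
```

Adding another traffic profile only means filling in another mask/value pair. This is the kind of per-profile scalability the commit message describes.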



Re: [ovs-dev] [PATCH v14 06/11] dpif-netdev: Add packet count and core id paramters for study

2021-07-15 Thread Flavio Leitner
> +[...] && mfex_name_is_study) {
> +if (!str_to_uint(argv[1], 10, &study_count) ||
> +(study_count == 0)) {
> +ds_put_format(&reply,
> +"Error: Invalid study_pkt_cnt value: %s.\n", argv[1]);

Ian flagged this one, but there is another missing spot below.

> +goto error;
> +}
> +
> +argc -= 1;
> +argv += 1;
> +} else {
> +ds_put_format(&reply, "Error: unknown argument %s.\n", argv[1]);
> +goto error;
> +break;
> +}
> +}
> +
> +/* Ensure user passed an MFEX name. */
> +if (!mfex_name) {
> +ds_put_format(&reply, "Error: no miniflow extract name provided. "
> +"Output of miniflow-parser-get shows implementation list.\n");
> +goto error;
> +}
>  
> -int err = dp_mfex_impl_set_default_by_name(mfex_name);
> +/* If the MFEX name is "study", set the study packet count. */
> +if (mfex_name_is_study) {
> +err = mfex_set_study_pkt_cnt(study_count, mfex_name);
> +if (err) {
> +ds_put_format(&reply, "Error: failed to set study count %d for"
> +  " miniflow extract implementation %s.\n",
> +  study_count, mfex_name);
> +goto error;
> +}
> +}
>  
> +/* Set the default MFEX impl only if the command was applied to all PMD
> + * threads. If a PMD thread was selected, do NOT update the default.
> + */
> +if (pmd_thread_to_change == NON_PMD_CORE_ID) {
> +err = dp_mfex_impl_set_default_by_name(mfex_name);
> +if (err == -ENODEV) {
> +ds_put_format(&reply,
> +  "Miniflow extract not available due to CPU ISA requirements: %s",
> +  mfex_name);

Here too: this message should also start with "Error: miniflow extract...".

AppVeyor is OK and I noticed no regressions. Study works and selects the
correct traffic profile for UDP and TCP, and both the testsuite and the
DPDK testsuite pass.

Acked-by: Flavio Leitner 
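
About the missing "Error:" prefix: one way to keep the prefix consistent is to build every failure reply in a single place. The following is a minimal sketch. It assumes the negative-errno convention visible in the diff, where -ENODEV signals missing CPU ISA support. `mfex_set_error_msg` is a hypothetical helper, and the exact wording is illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: formats the reply for a failed MFEX set so that
 * every message starts with the same "Error: " prefix.  err is a
 * negative errno value, as in the diff. */
static int
mfex_set_error_msg(char *buf, size_t size, int err, const char *name)
{
    const char *reason;

    switch (err) {
    case -ENODEV:
        reason = "miniflow extract not available due to CPU ISA "
                 "requirements";
        break;
    default:
        reason = "unknown miniflow extract implementation";
        break;
    }
    return snprintf(buf, size, "Error: %s: %s.", reason, name);
}
```
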



> +goto error;
> +} else if (err) {
> +ds_put_format(&reply,
> +  "Error: unknown miniflow extract implementation %s.",
> +  mfex_name);
> +goto error;
> +}
> +}
> +
> +/* Get the desired MFEX function pointer and error check its usage. */
> +miniflow_extract_func mfex_func = NULL;
> +err = dp_mfex_impl_get_by_name(mfex_name, &mfex_func);
>  if (err) {
> -struct ds reply = DS_EMPTY_INITIALIZER;
> -char *error_desc = NULL;
> -if (err == -EINVAL) {
> -error_desc = "Unknown miniflow extract implementation:";
> -} else if (err == -ENOENT) {
> -error_desc = "Miniflow extract implementation doesn't exist:";
> -} else if (err == -ENODEV) {
> -error_desc = "Miniflow extract implementation not available:";
> +if (err == -ENODEV) {
> +ds_put_format(&reply,
> +  "Error: miniflow extract not available due to CPU ISA "
> +  "requirements: %s", mfex_name);
>  } else {
> -error_desc = "Miniflow extract implementation Error:";
> +ds_put_format(&reply,
> +   "Error: unknown miniflow extract implementation %s.",
> +   mfex_name);
>  }
> -ds_put_format(&reply, "%s %s.\n", error_desc, mfex_name);
> -const char *reply_str = ds_cstr(&reply);
> -unixctl_command_reply_error(conn, reply_str);
> -VLOG_INFO("%s", reply_str);
> -ds_destroy(&reply);
> -return;
> +goto error;
>  }
>  
> +/* Apply the MFEX pointer to each pmd thread in each netdev, filtering
> + * by the users "-pmd" argument if required.
> + */
>  ovs_mutex_lock(&dp_netdev_mutex);
>  
>  SHASH_FOR_EACH (node, &dp_netdevs) {
> @@ -1114,7 +1205,6 @@ dpif_miniflow_extract_impl_set(struct unixctl_conn *conn, int argc OVS_UNUSED,
>  size_t n;
>  
>  sorted_poll_thread_list(dp, &pmd_list, &n);
> -miniflow_extract_func default_func = dp_mfex_impl_get_default();
>  
>  for (size_t i = 0; i < n; i++) {
>  struct dp_netdev_pmd_thread *pmd = pmd_list[i];
> @@ -1122,23 +1212,51 @@ dpif_miniflow_extract_impl_set(struct unixctl_conn *conn, int argc OVS_UNUSED,
>  continue;
>  }
>  
> -/* Initialize MFEX function pointer to the newly configured
> - * default. */
> +/* If -pmd specified,
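
The argument handling in this diff can be summarized as a standalone parser. The command takes an optional `-pmd core_id`, then a name, then an optional study count that must be non-zero and applies only to study. In the minimal sketch below, `mfex_parse_set_args`, `parse_uint`, `ALL_PMDS` and `struct mfex_set_args` are hypothetical. The default of 128 is the study count quoted in the patch documentation. OVS itself uses `str_to_uint` and `NON_PMD_CORE_ID` rather than these helpers.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define ALL_PMDS UINT_MAX          /* Hypothetical "no -pmd given" marker. */
#define DEFAULT_STUDY_COUNT 128u   /* Default quoted in the patch docs. */

struct mfex_set_args {
    unsigned int pmd;          /* Core id, or ALL_PMDS. */
    const char *name;          /* MFEX implementation name. */
    unsigned int study_count;  /* Only meaningful for "study". */
};

/* Parses a decimal integer that must fit in unsigned int. */
static bool
parse_uint(const char *s, unsigned int *out)
{
    char *end;
    unsigned long v;

    if (!s || !*s || *s == '-') {   /* strtoul would accept a '-'. */
        return false;
    }
    errno = 0;
    v = strtoul(s, &end, 10);
    if (errno || *end || v > UINT_MAX) {
        return false;
    }
    *out = (unsigned int) v;
    return true;
}

/* Parses "[-pmd core_id] name [study_cnt]", where argv[0] is the first
 * argument after the command name.  Returns 0 or -EINVAL. */
static int
mfex_parse_set_args(int argc, const char *argv[],
                    struct mfex_set_args *args, const char **error)
{
    args->pmd = ALL_PMDS;
    args->name = NULL;
    args->study_count = DEFAULT_STUDY_COUNT;

    if (argc >= 1 && !strcmp(argv[0], "-pmd")) {
        if (argc < 2 || !parse_uint(argv[1], &args->pmd)) {
            *error = "invalid PMD core id";
            return -EINVAL;
        }
        argc -= 2;
        argv += 2;
    }
    if (argc < 1) {
        *error = "no miniflow extract name provided";
        return -EINVAL;
    }
    args->name = argv[0];
    argc--;
    argv++;

    if (argc == 1) {
        /* A study count is only accepted for the study implementation. */
        if (strcmp(args->name, "study")) {
            *error = "study count is only valid for study";
            return -EINVAL;
        }
        if (!parse_uint(argv[0], &args->study_count)
            || args->study_count == 0) {
            *error = "invalid study_pkt_cnt value";
            return -EINVAL;
        }
    } else if (argc > 1) {
        *error = "too many arguments";
        return -EINVAL;
    }
    return 0;
}
```
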

Re: [ovs-dev] [PATCH v13 07/11] test/sytem-dpdk: Add unit test for mfex autovalidator

2021-07-15 Thread Flavio Leitner
On Thu, Jul 15, 2021 at 06:12:13PM +0530, kumar Amber wrote:
> From: Kumar Amber 
> 
> Tests:
>   6: OVS-DPDK - MFEX Autovalidator
>   7: OVS-DPDK - MFEX Autovalidator Fuzzy
>   8: OVS-DPDK - MFEX Configuration
> 
> Added a new directory to store the PCAP file used
> in the tests, and a script to generate the fuzzy traffic
> pcap to be used in the fuzzy unit test.
> 
> Signed-off-by: Kumar Amber 
> Acked-by: Flavio Leitner 
> 
> ---
> v13:
> - fix -v in the command
> - added the configuration test case and supporting doc update
> v12:
> - change skip parameter for unit test
> v11:
> - fix comments from Eelco
> v7:
> - fix review comments(Eelco)
> v5:
> - fix review comments(Ian, Flavio, Eelco)
> - remove sleep from first test and added minor 5 sec sleep to fuzzy
> ---
> ---
>  Documentation/topics/dpdk/bridge.rst |  56 
>  tests/.gitignore |   1 +
>  tests/automake.mk|   6 ++
>  tests/mfex_fuzzy.py  |  33 +++
>  tests/pcap/mfex_test.pcap| Bin 0 -> 416 bytes
>  tests/system-dpdk.at | 129 +++
>  6 files changed, 225 insertions(+)
>  create mode 100755 tests/mfex_fuzzy.py
>  create mode 100644 tests/pcap/mfex_test.pcap
> 
> diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst
> index 8c500c504..913b3e6f6 100644
> --- a/Documentation/topics/dpdk/bridge.rst
> +++ b/Documentation/topics/dpdk/bridge.rst
> @@ -346,3 +346,59 @@ A compile time option is available in order to test it with the OVS unit
>  test suite. Use the following configure option ::
>  
>  $ ./configure --enable-mfex-default-autovalidator
> +
> +Unit Test Miniflow Extract
> +++++++++++++++++++++++++++
> +
> +Unit tests can also be used to test the workflow mentioned above by running
> +the following test-case in tests/system-dpdk.at ::
> +
> +make check-dpdk TESTSUITEFLAGS='-k MFEX'
> +OVS-DPDK - MFEX Autovalidator
> +
> +The unit test uses multiple traffic types to test the correctness of the
> +implementations.
> +
> +The MFEX commands can also be tested for negative and positive cases to
> +verify that the MFEX set command does not allow for incorrect parameters.
> +A user can directly run the following configuration test case in
> +tests/system-dpdk.at ::
> +
> +make check-dpdk TESTSUITEFLAGS='-k MFEX'
> +OVS-DPDK - MFEX Configuration
> +
> +Running Fuzzy test with Autovalidator
> ++++++++++++++++++++++++++++++++++++++
> +
> +Fuzzy tests can also be done on miniflow extract with the help of the
> +autovalidator and Scapy. The steps below describe how to reproduce
> +the setup, with IP being fuzzed to generate packets.
> +
> +Scapy is used to create fuzzy IP packets and save them into a PCAP ::
> +
> +pkt = fuzz(Ether()/IP()/TCP())
> +
> +Set the miniflow extract to autovalidator using ::
> +
> +$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
> +
> +OVS is configured to receive the generated packets ::
> +
> +$ ovs-vsctl add-port br0 pcap0 -- \
> +set Interface pcap0 type=dpdk options:dpdk-devargs=net_pcap0
> +"rx_pcap=fuzzy.pcap"
> +
> +With this workflow, the autovalidator will ensure that all MFEX
> +implementations are classifying each packet in exactly the same way.
> +If an optimized MFEX implementation causes a different miniflow to be
> +generated, the autovalidator has ovs_assert and logging statements that
> +will inform about the issue.
> +
> +Unit Fuzzy test with Autovalidator
> +++++++++++++++++++++++++++++++++++
> +
> +Unit tests can also be used to test the workflow mentioned above by running
> +the following test-case in tests/system-dpdk.at ::
> +
> +make check-dpdk TESTSUITEFLAGS='-k MFEX'
> +OVS-DPDK - MFEX Autovalidator Fuzzy
> diff --git a/tests/.gitignore b/tests/.gitignore
> index 45b4f67b2..a3d927e5d 100644
> --- a/tests/.gitignore
> +++ b/tests/.gitignore
> @@ -11,6 +11,7 @@
>  /ovsdb-cluster-testsuite
>  /ovsdb-cluster-testsuite.dir/
>  /ovsdb-cluster-testsuite.log
> +/pcap/
>  /pki/
>  /system-afxdp-testsuite
>  /system-afxdp-testsuite.dir/
> diff --git a/tests/automake.mk b/tests/automake.mk
> index f45f8d76c..a6c15ba55 100644
> --- a/tests/automake.mk
> +++ b/tests/automake.mk
> @@ -143,6 +143,11 @@ $(srcdir)/tests/fuzz-regression-list.at: tests/automake.mk
>   echo "TEST_FUZZ_REGRESSION([$$basename])"; \
>   done > $@.tmp && mv $@.tmp $@
>  
> +EXTRA_DIST += $(MFEX_AUTOVALIDATOR_TESTS)
> +MFEX_AUTOVALIDATOR_TESTS = \
> + tests/pcap/mfex_test.pcap \

Re: [ovs-dev] [PATCH v13 06/11] dpif-netdev: Add packet count and core id paramters for study

2021-07-15 Thread Flavio Leitner
On Thu, Jul 15, 2021 at 04:15:13PM +0200, Eelco Chaudron wrote:
> 
> Some minor changes in output, maybe they can be done during the commit?
> 
> On 15 Jul 2021, at 14:42, kumar Amber wrote:
> 
> > From: Kumar Amber 
> > 
> > This commit introduces an additional command line parameter
> > for the mfex study function. If the user provides a packet count,
> > study uses it as the minimum number of packets that must be
> > processed; otherwise a default value is chosen.
> > It also introduces a third parameter for choosing a particular pmd core.
> > 
> > $ ovs-appctl dpif-netdev/miniflow-parser-set study 500 3
> > 
> > Signed-off-by: Kumar Amber 
> > 
> > ---
> > v13:
> > - reworked the set command as per discussion
> > - fixed the atomic set in study
> > - added bool for handling study mfex to simplify logic and command output
> > - fixed double space in variable declaration and removed static
> > v12:
> > - re-work the set command to sweep
> > - include fixes to study.c and doc changes
> > v11:
> > - include comments from Eelco
> > - reworked set command as per discussion
> > v10:
> > - fix review comments Eelco
> > v9:
> > - fix review comments Flavio
> > v7:
> > - change the command parameters for core_id and study_pkt_cnt
> > v5:
> > - fix review comments(Ian, Flavio, Eelco)
> > - introduce pmd core id parameter
> > ---
> > ---
> >  Documentation/topics/dpdk/bridge.rst |  38 +-
> >  lib/dpif-netdev-extract-study.c  |  27 -
> >  lib/dpif-netdev-private-extract.h|   9 ++
> >  lib/dpif-netdev.c| 173 ++-
> >  4 files changed, 215 insertions(+), 32 deletions(-)
> > 
> > diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst
> > index a47153495..8c500c504 100644
> > --- a/Documentation/topics/dpdk/bridge.rst
> > +++ b/Documentation/topics/dpdk/bridge.rst
> > @@ -284,12 +284,46 @@ command also shows whether the CPU supports each implementation ::
> > 
> >  An implementation can be selected manually by the following command ::
> > 
> > -$ ovs-appctl dpif-netdev/miniflow-parser-set study
> > +$ ovs-appctl dpif-netdev/miniflow-parser-set [-pmd core_id] [name]
> > + [study_cnt]
> > +
> > +The above command has two optional parameters: core_id and study_cnt.
> > +The core_id sets the miniflow extract function for the PMD thread on
> > +that specific core. The study_cnt parameter, which is specific to study
> > +and ignored by other implementations, sets how many packets are needed
> > +to choose the best implementation.
> > 
> >  Also user can select the study implementation which studies the traffic for
> >  a specific number of packets by applying all available implementations of
> >  miniflow extract and then chooses the one with the most optimal result for
> > -that traffic pattern.
> > +that traffic pattern. The user can optionally provide a packet count
> > +[study_cnt] parameter which is the minimum number of packets that OVS must
> > +study before choosing an optimal implementation. If no packet count is
> > +provided, then the default value of 128 is used. Also, as there is no
> > +synchronization point between threads, one PMD thread might still be running
> > +a previous round, and can now decide on earlier data.
> > +
> > +The per packet count is a global value, and parallel study executions with
> > +differing packet counts will use the most recent count value provided by user.
> > +
> > +Study can be selected with packet count by the following command ::
> > +
> > +$ ovs-appctl dpif-netdev/miniflow-parser-set study 1024
> > +
> > +Study can be selected with packet count and explicit PMD selection
> > +by the following command ::
> > +
> > +$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 study 1024
> > +
> > +In the above command the first parameter is the CORE ID of the PMD
> > +thread and this can also be used to explicitly set the miniflow
> > +extraction function pointer on different PMD threads.
> > +
> > +Scalar can be selected on core 3 by the following command where
> > +study count should not be provided for any implementation other
> > +than study ::
> > +
> > +$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 scalar
> > 
> >  Miniflow Extract Validation
> >  ~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > diff --git a/lib/dpif-netdev-extract-study.c b/lib/dpif-netdev-extract-study.c
> > index 02b709f8b..7725c8f6e 100644
> > --- a/lib/dpif-netdev-extract-study.c
> > +++ b/lib/dpif-netdev-extract-study.c
> > @@ -25,7 +25,7 @@
> > 
> >  VLOG_DEFINE_THIS_MODULE(dpif_mfex_extract_study);
> > 
> > -static atomic_uint32_t mfex_study_pkts_count = 0;
> > +static atomic_uint32_t mfex_study_pkts_count = MFEX_MAX_PKT_COUNT;
> > 
> >  /* Struct to hold miniflow study stats. */
> >  struct study_stats {
> > @@ -48,6 +48,26 @@ mfex_study_get_study_stats_ptr(void)
> >  return stats;
> >  }
> > 
> > +int
> > 

Re: [ovs-dev] [v12 00/11] MFEX Infrastructure + Optimizations

2021-07-14 Thread Flavio Leitner


Hi,

I have reviewed more patches, but Eelco requested follow-up fixes for
some of them, so I am going to wait for those as well.

Thanks,
fbl

On Wed, Jul 14, 2021 at 07:44:33PM +0530, kumar Amber wrote:
> v12:
> - re-work the set command to sweep method
> - changes skip for unit-test to true from not-available
> - added acks from Eelco
> - minor doc fixed and typos
> v11:
> - reworked set command in alignment with Eelco and Harry
> - added Acks from Eelco.
> - added skip to unit test if other implementations not available
> - minor typos and fixes
> - clang build fixes
> - removed patch with Scalar DPIF, will send separately
> v10 update:
> - re-worked the default implementation
> - fix comments from Flavio and Eelco
> - Include Acks from Eelco in study
> v9 update:
> - Include review comments from Flavio
> - Rebase onto Master
> - Include Acks from Flavio
> v8 updates:
> - Include documentation on AVX512 MFEX as per Eelco's suggestion on list
> v7 updates:
> - Rebase onto DPIF v15
> - Changed commands to get and set MFEX
> - Fixed comments from Flavio, Eelco
> - Segregated addition of MFEX options to separate patch 12 for Scalar DPIF
> - Removed sleep from auto-validator and added frame counter check
> - Documentation updates
> - Minor bug fixes
> v6 updates:
> - Fix non-ssl build
> v5 updates:
> - rebase onto latest DPIF v14
> - use Enum for mfex impls
> - add pmd core id set parameter in set command
> - get command modified to display the pmd thread for individual mfex functions
> - resolved comments from Eelco, Ian, Flavio
> - Use Atomic to get and set miniflow implementations
> - removed and reduced sleep in unit tests
> - fixed scalar miniflow perf degradation
> v4 updates:
> - rebase on to latest DPIF v13
> - fix fuzzy.py script with random mac/ip
> v3 updates:
> - rebase on to latest DPIF v12
> - add additional AVX512 traffic profiles for tcp and vlan
> - add new command line for study function to add packet count
> - add unit tests for fuzzy testing and auto-validation of mfex
> - add mfex option hit stats to perf-show command
> v2 updates:
> - rebase on to latest DPIF v11
> 
> This patchset introduces miniflow extract infrastructure changes
> which allow the user to choose between different ISA-optimized
> miniflow extract variants, either selected by the user or chosen
> automatically by OVS after studying the packets, using different
> commands. The infrastructure also provides a way to check the
> correctness of the different ISA-optimized miniflow extract variants
> against the scalar version.
> 
> Harry van Haaren (4):
>   dpif/stats: add miniflow extract opt hits counter
>   dpdk: add additional CPU ISA detection strings
>   dpif-netdev/mfex: Add AVX512 based optimized miniflow extract
>   dpif-netdev/mfex: add more AVX512 traffic profiles
> 
> Kumar Amber (7):
>   dpif-netdev: Add command line and function pointer for miniflow
> extract
>   dpif-netdev: Add auto validation function for miniflow extract
>   dpif-netdev: Add study function to select the best mfex function
>   docs/dpdk/bridge: add miniflow extract section.
>   dpif-netdev: Add configure to enable autovalidator at build time.
>   dpif-netdev: Add packet count and core id paramters for study
>   test/sytem-dpdk: Add unit test for mfex autovalidator
> 
>  Documentation/topics/dpdk/bridge.rst | 138 ++
>  NEWS |  11 +
>  acinclude.m4 |  16 +
>  configure.ac |   1 +
>  lib/automake.mk  |   4 +
>  lib/dpdk.c   |   2 +
>  lib/dpif-netdev-avx512.c |  34 +-
>  lib/dpif-netdev-extract-avx512.c | 630 +++
>  lib/dpif-netdev-extract-study.c  | 160 +++
>  lib/dpif-netdev-perf.c   |   3 +
>  lib/dpif-netdev-perf.h   |   1 +
>  lib/dpif-netdev-private-extract.c| 371 
>  lib/dpif-netdev-private-extract.h| 203 +
>  lib/dpif-netdev-private-thread.h |   8 +
>  lib/dpif-netdev-unixctl.man  |   4 +
>  lib/dpif-netdev.c| 241 +-
>  tests/.gitignore |   1 +
>  tests/automake.mk|   6 +
>  tests/mfex_fuzzy.py  |  33 ++
>  tests/pcap/mfex_test.pcap| Bin 0 -> 416 bytes
>  tests/pmd.at |   6 +-
>  tests/system-dpdk.at |  53 +++
>  22 files changed, 1914 insertions(+), 12 deletions(-)
>  create mode 100644 lib/dpif-netdev-extract-avx512.c
>  create mode 100644 lib/dpif-netdev-extract-study.c
>  create mode 100644 lib/dpif-netdev-private-extract.c
>  create mode 100644 lib/dpif-netdev-private-extract.h
>  create mode 100755 tests/mfex_fuzzy.py
>  create mode 100644 tests/pcap/mfex_test.pcap
> 
> -- 
> 2.25.1
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

-- 
fbl

Re: [ovs-dev] [v12 05/11] dpif-netdev: Add configure to enable autovalidator at build time.

2021-07-14 Thread Flavio Leitner
On Wed, Jul 14, 2021 at 07:44:38PM +0530, kumar Amber wrote:
> From: Kumar Amber 
> 
> This commit adds a new configure option that allows the user to enable
> the autovalidator by default at build time, thus allowing the unit
> tests to run with it by default.
> 
>  $ ./configure --enable-mfex-default-autovalidator
> 
> Signed-off-by: Kumar Amber 
> Co-authored-by: Harry van Haaren 
> Signed-off-by: Harry van Haaren 
> Acked-by: Eelco Chaudron 
> 
> ---

Acked-by: Flavio Leitner 
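
A build-time default like this usually reduces to a preprocessor switch. The following is a minimal sketch. It assumes that configure defines a macro when `--enable-mfex-default-autovalidator` is given. The macro name `MFEX_AUTOVALIDATOR_DEFAULT` and the "scalar" fallback are illustrative assumptions, not taken from this patch.

```c
#include <assert.h>

/* Hypothetical macro, assumed to be defined by the build system when
 * configure runs with --enable-mfex-default-autovalidator. */
#ifdef MFEX_AUTOVALIDATOR_DEFAULT
#define MFEX_DEFAULT_IMPL "autovalidator"
#else
#define MFEX_DEFAULT_IMPL "scalar"
#endif

/* Returns the implementation name every PMD thread starts with. */
static const char *
mfex_default_impl_name(void)
{
    return MFEX_DEFAULT_IMPL;
}
```

Compiling with `-DMFEX_AUTOVALIDATOR_DEFAULT` switches the returned name to "autovalidator", so every test then runs through the validator without any runtime command.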



Re: [ovs-dev] [v12 03/11] dpif-netdev: Add study function to select the best mfex function

2021-07-14 Thread Flavio Leitner
On Wed, Jul 14, 2021 at 07:44:36PM +0530, kumar Amber wrote:
> From: Kumar Amber 
> 
> The study function runs all the available implementations
> of miniflow_extract, chooses the one whose hitmask has the
> most hits, and sets the mfex to that function.
> 
> Study can be run at runtime using the following command:
> 
> $ ovs-appctl dpif-netdev/miniflow-parser-set study
> 
> Signed-off-by: Kumar Amber 
> Co-authored-by: Harry van Haaren 
> Signed-off-by: Harry van Haaren 
> Acked-by: Eelco Chaudron 
> 
> ---

Acked-by: Flavio Leitner 
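
The selection rule in the commit message is that the implementation whose hitmask has the most hits wins. This amounts to a popcount followed by an argmax over per-implementation totals. The minimal C sketch below uses hypothetical names and a fixed number of candidates. Breaking ties toward the lower index is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_IMPLS 3   /* Hypothetical number of candidate implementations. */

/* Hypothetical study statistics: total hits per implementation. */
struct study_stats {
    uint32_t hits[N_IMPLS];
};

/* Counts set bits, i.e. the packets an implementation handled. */
static unsigned int
hitmask_count(uint32_t hitmask)
{
    unsigned int n = 0;

    while (hitmask) {
        hitmask &= hitmask - 1;   /* Clear the lowest set bit. */
        n++;
    }
    return n;
}

/* Adds one batch worth of hitmasks (one per implementation). */
static void
study_add_batch(struct study_stats *st, const uint32_t hitmasks[N_IMPLS])
{
    for (size_t i = 0; i < N_IMPLS; i++) {
        st->hits[i] += hitmask_count(hitmasks[i]);
    }
}

/* Index of the implementation with the most hits; ties go to the
 * lower index. */
static size_t
study_best_impl(const struct study_stats *st)
{
    size_t best = 0;

    for (size_t i = 1; i < N_IMPLS; i++) {
        if (st->hits[i] > st->hits[best]) {
            best = i;
        }
    }
    return best;
}
```
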



Re: [ovs-dev] [v12 02/11] dpif-netdev: Add auto validation function for miniflow extract

2021-07-14 Thread Flavio Leitner
On Wed, Jul 14, 2021 at 07:44:35PM +0530, kumar Amber wrote:
> From: Kumar Amber 
> 
> This patch introduces the auto-validation function, which
> allows users to compare the batch of packets obtained from
> different miniflow implementations against the linear
> miniflow extract; the function returns a hitmask.
> 
> The autovalidator function can be triggered at runtime using the
> following command:
> 
> $ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
> 
> Signed-off-by: Kumar Amber 
> Co-authored-by: Harry van Haaren 
> Signed-off-by: Harry van Haaren 
> Acked-by: Eelco Chaudron 
> 
> ---

Acked-by: Flavio Leitner 
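
The following is a minimal sketch of the autovalidation idea. A reference extractor and a candidate run on each packet of a batch, and the result is a hitmask of the packets where the candidate agrees with the reference. The key type, the extractor signature and the two toy extractors are hypothetical. The real autovalidator compares miniflows and, per the documentation in this series, uses ovs_assert and logging on a mismatch. This sketch only counts and prints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BATCH_MAX 32
#define KEY_LEN 16

/* Hypothetical stand-in for a miniflow: a fixed-size key. */
struct key {
    uint8_t bytes[KEY_LEN];
};

/* Hypothetical extractor: fills *key, returns true if it handled pkt. */
typedef bool (*extract_fn)(const uint8_t *pkt, size_t len, struct key *key);

/* Toy reference: copies up to KEY_LEN leading bytes. */
static bool
ref_extract(const uint8_t *pkt, size_t len, struct key *key)
{
    memcpy(key->bytes, pkt, len < KEY_LEN ? len : KEY_LEN);
    return true;
}

/* Toy candidate: handles only full-size packets and, to show a
 * failure, drops the last key byte. */
static bool
toy_candidate(const uint8_t *pkt, size_t len, struct key *key)
{
    if (len < KEY_LEN) {
        return false;
    }
    memcpy(key->bytes, pkt, KEY_LEN - 1);
    return true;
}

/* Bit i of the result is set when the candidate handled packet i and
 * produced exactly the reference key.  Packets the candidate handled
 * but got wrong are counted in *mismatches and reported. */
static uint32_t
autovalidate(extract_fn reference, extract_fn candidate,
             const uint8_t *const pkts[], const size_t lens[], size_t n,
             unsigned int *mismatches)
{
    uint32_t hitmask = 0;

    *mismatches = 0;
    for (size_t i = 0; i < n && i < BATCH_MAX; i++) {
        struct key ref, got;

        memset(&ref, 0, sizeof ref);
        memset(&got, 0, sizeof got);
        reference(pkts[i], lens[i], &ref);
        if (!candidate(pkts[i], lens[i], &got)) {
            continue;   /* Candidate declined: not an error. */
        }
        if (memcmp(&ref, &got, sizeof ref)) {
            (*mismatches)++;
            fprintf(stderr, "autovalidator: packet %zu mismatch\n", i);
        } else {
            hitmask |= UINT32_C(1) << i;
        }
    }
    return hitmask;
}
```
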



Re: [ovs-dev] [v12 01/11] dpif-netdev: Add command line and function pointer for miniflow extract

2021-07-14 Thread Flavio Leitner
On Wed, Jul 14, 2021 at 07:44:34PM +0530, kumar Amber wrote:
> From: Kumar Amber 
> 
> This patch introduces the MFEX function pointers, which allow
> the user to switch between the different miniflow extract
> implementations that OVS provides, optimized for specific CPU ISAs.
> 
> The user can query the miniflow extract variants available
> on that CPU with the following command:
> 
> $ ovs-appctl dpif-netdev/miniflow-parser-get
> 
> Similarly, a user can set the miniflow implementation with the following
> command:
> 
> $ ovs-appctl dpif-netdev/miniflow-parser-set name
> 
> This gives the user more performance and the flexibility to choose
> the miniflow implementation according to their needs.
> 
> Signed-off-by: Kumar Amber 
> Co-authored-by: Harry van Haaren 
> Signed-off-by: Harry van Haaren 
> Acked-by: Eelco Chaudron 
> ---

Acked-by: Flavio Leitner 
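
The following is a minimal sketch of the function-pointer approach. A table maps each implementation name to a CPU-availability probe and an extract function, and setting an implementation by name swaps the active pointer. Everything here is hypothetical, including the names, the signature and the always-unavailable SIMD probe. The error codes mirror the -ENOENT and -ENODEV cases handled in the v14 06/11 diff earlier in this archive. The series changelog says it uses atomics to get and set implementations, whereas this single-threaded sketch uses a plain pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical extractor signature; a real one would fill a miniflow. */
typedef int (*mfex_fn)(const void *pkt);

static int scalar_extract(const void *pkt) { (void) pkt; return 0; }
static int simd_extract(const void *pkt) { (void) pkt; return 1; }

/* CPU probes; the SIMD one is hard-wired to "unsupported" here. */
static bool always_available(void) { return true; }
static bool simd_available(void) { return false; }

struct mfex_impl {
    const char *name;
    bool (*available)(void);   /* Does the running CPU support it? */
    mfex_fn func;
};

static const struct mfex_impl mfex_impls[] = {
    { "scalar", always_available, scalar_extract },
    { "simd_example", simd_available, simd_extract },
};

/* Currently selected implementation. */
static mfex_fn mfex_active = scalar_extract;

/* Returns 0 on success, -ENOENT for an unknown name, or -ENODEV if
 * the CPU lacks the required ISA. */
static int
mfex_set_by_name(const char *name)
{
    for (size_t i = 0; i < sizeof mfex_impls / sizeof mfex_impls[0]; i++) {
        if (!strcmp(mfex_impls[i].name, name)) {
            if (!mfex_impls[i].available()) {
                return -ENODEV;
            }
            mfex_active = mfex_impls[i].func;
            return 0;
        }
    }
    return -ENOENT;
}
```

The "get" command's listing could be produced by walking the same table and printing each name together with the result of its `available()` probe.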



Re: [ovs-dev] [v5] dpif/dpcls: limit count subtable search info logs

2021-07-13 Thread Flavio Leitner
On Tue, Jul 13, 2021 at 07:55:32AM +0530, kumar Amber wrote:
> From: Harry van Haaren 
> 
> This commit avoids many instances of "using subtable X for miniflow (x,y)"
> in the ovs-vswitchd log when using the DPCLS Autovalidator. This occurs
> when no specialized subtable is found and the generic "_any" version of
> the avx512 subtable search implementation is used. This change logs the
> subtable usage only once, avoiding duplicates.
> 
> Signed-off-by: Harry van Haaren 
> Signed-off-by: kumar Amber 
> Co-authored-by: kumar Amber 
> 
> ---

This patch looks good and works for me.

Acked-by: Flavio Leitner 

Thanks Kumar and Harry,
fbl
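
The "log once" behavior can be sketched as remembering which miniflow shapes have already been reported. The minimal C sketch below is single-threaded. It assumes that the "(x,y)" in the message are the two miniflow unit bit counts. The table, its size and the helper names are hypothetical. Code running on multiple PMD threads would need locking or atomics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_LOGGED 64  /* Hypothetical cap on remembered shapes. */

struct mf_shape {
    uint32_t u0, u1;   /* Assumed: miniflow unit bit counts, "(x,y)". */
};

static struct mf_shape logged[MAX_LOGGED];
static size_t n_logged;

/* Returns true, and records the shape, the first time it is seen.
 * Once the table is full, stays quiet rather than flooding the log. */
static bool
shape_first_use(uint32_t u0, uint32_t u1)
{
    for (size_t i = 0; i < n_logged; i++) {
        if (logged[i].u0 == u0 && logged[i].u1 == u1) {
            return false;
        }
    }
    if (n_logged == MAX_LOGGED) {
        return false;
    }
    logged[n_logged++] = (struct mf_shape) { u0, u1 };
    return true;
}

static void
log_subtable_choice(const char *impl, uint32_t u0, uint32_t u1)
{
    if (shape_first_use(u0, u1)) {
        printf("using %s for miniflow (%u,%u)\n", impl,
               (unsigned int) u0, (unsigned int) u1);
    }
}
```
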


Re: [ovs-dev] [PATCH v5 2/2] dpctl: dpif: allow viewing and configuring dp cache sizes

2021-07-13 Thread Flavio Leitner
On Tue, Jun 01, 2021 at 12:39:19PM +0200, Eelco Chaudron wrote:
> This patch adds a general way of viewing/configuring datapath
> cache sizes, with an implementation for the netlink interface.
> 
> The ovs-dpctl/ovs-appctl show commands will display the
> current cache sizes configured:
> 
> ovs-dpctl show
> system@ovs-system:
>   lookups: hit:25 missed:63 lost:0
>   flows: 0
>   masks: hit:282 total:0 hit/pkt:3.20
>   cache: hit:4 hit rate:4.54%
>   caches:
> masks-cache: size: 256
>   port 0: ovs-system (internal)
>   port 1: br-int (internal)
>   port 2: genev_sys_6081 (geneve: packet_type=ptap)
>   port 3: br-ex (internal)
>   port 4: eth2
>   port 5: sw0p1 (internal)
>   port 6: sw0p3 (internal)
> 
> A specific cache can be configured as follows:
> 
> ovs-appctl dpctl/cache-set-size DP CACHE SIZE
> ovs-dpctl cache-set-size DP CACHE SIZE
> 
> For example to disable the cache do:
> 
> $ ovs-dpctl cache-set-size system@ovs-system masks-cache 0
> Setting cache size successful, new size 0.
> 
> Signed-off-by: Eelco Chaudron 
> 
> ---

I reviewed and tested this. It looks good and works for me.

One concern is that OVS uses UINT32_MAX as a reserved value to indicate
missing kernel support, yet the user can try to set the cache size to
that same value, which would otherwise be valid. However, the kernel's
netlink policy restricts the cache size to a range whose maximum (4k)
is far lower than UINT32_MAX, so the kernel rejects such a request.

Acked-by: Flavio Leitner 
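
The concern above can be made concrete. If user input is range-checked against the policy maximum before it is sent, the UINT32_MAX sentinel can never be requested. The following is a minimal sketch. The macro and function names are hypothetical, and the value 4096 comes from the "4k" figure in the review, not from the kernel headers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Sentinel meaning "the kernel does not report this cache". */
#define CACHE_SIZE_UNSUPPORTED UINT32_MAX

/* Hypothetical bound standing in for the kernel netlink policy maximum,
 * taken from the "4k" mentioned in the review. */
#define CACHE_SIZE_MAX 4096u

/* Validates a user-supplied size before it is sent to the kernel.
 * Rejecting values above the policy maximum also guarantees that the
 * user can never request the sentinel itself. */
static int
cache_size_validate(uint64_t requested, uint32_t *out)
{
    if (requested > CACHE_SIZE_MAX) {
        return -ERANGE;
    }
    *out = (uint32_t) requested;
    return 0;
}

/* True if a size reported for display is a real value. */
static bool
cache_size_is_supported(uint32_t reported)
{
    return reported != CACHE_SIZE_UNSUPPORTED;
}
```
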



Re: [ovs-dev] [PATCH v5 1/2] dpctl: dpif: add kernel datapath cache hit output

2021-07-13 Thread Flavio Leitner
On Tue, Jun 01, 2021 at 12:38:43PM +0200, Eelco Chaudron wrote:
> This patch adds cache usage statistics to the output:
> 
> $ ovs-dpctl show
> system@ovs-system:
>   lookups: hit:24 missed:71 lost:0
>   flows: 0
>   masks: hit:334 total:0 hit/pkt:3.52
>   cache: hit:4 hit rate:4.21%
>   port 0: ovs-system (internal)
>   port 1: genev_sys_6081 (geneve: packet_type=ptap)
>   port 2: br-int (internal)
>   port 3: br-ex (internal)
>   port 4: eth2
>   port 5: sw1p1 (internal)
>   port 6: sw0p4 (internal)
> 
> Signed-off-by: Eelco Chaudron 
> 
> ---

Acked-by: Flavio Leitner 
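
The ratios in the output above are consistent with both being taken over lookups hit + missed. That gives 334 / (24 + 71) ≈ 3.52 masks per packet and 4 / 95 ≈ 4.21% cache hit rate. The 2/2 example agrees: 282 / 88 ≈ 3.20 and 4 / 88 ≈ 4.54%. The following is a minimal sketch of that arithmetic. The struct and function names are hypothetical. Leaving lost packets out of the denominator is an assumption, since lost is 0 in both examples.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the counters shown by "ovs-dpctl show". */
struct dp_counters {
    uint64_t n_hit;        /* lookups: hit */
    uint64_t n_missed;     /* lookups: missed */
    uint64_t n_mask_hit;   /* masks: hit */
    uint64_t n_cache_hit;  /* cache: hit */
};

/* Assumption: both ratios use hit + missed as the packet count. */
static double
masks_hit_per_pkt(const struct dp_counters *c)
{
    uint64_t pkts = c->n_hit + c->n_missed;

    return pkts ? (double) c->n_mask_hit / pkts : 0.0;
}

static double
cache_hit_rate_pct(const struct dp_counters *c)
{
    uint64_t pkts = c->n_hit + c->n_missed;

    return pkts ? 100.0 * c->n_cache_hit / pkts : 0.0;
}
```

Printing both values with `"%.2f"` gives the two-decimal figures shown in the output.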


