from:"Mike Pattrick"

[ovs-dev] [PATCH 3/5] dpctl: Fix Clang's static analyzer 'garbage value' warnings.

2024-05-16 Thread Mike Pattrick

Clang's static analyzer will complain about an uninitialized value
because we weren't setting a value for ufid_generated in all code paths.

Now we initialize this on declaration.

Signed-off-by: Mike Pattrick 
---
 lib/dpctl.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/lib/dpctl.c b/lib/dpctl.c
index 3c555a559..9c287d060 100644
--- a/lib/dpctl.c
+++ b/lib/dpctl.c
@@ -1366,12 +1366,11 @@ dpctl_del_flow_dpif(struct dpif *dpif, const char *key_s,
 struct ofpbuf mask; /* To be ignored. */
 
 ovs_u128 ufid;
-bool ufid_generated;
-bool ufid_present;
+bool ufid_generated = false;
+bool ufid_present = false;
 struct simap port_names;
 int n, error;
 
-ufid_present = false;
 n = odp_ufid_from_string(key_s, );
 if (n < 0) {
 dpctl_error(dpctl_p, -n, "parsing flow ufid");
-- 
2.39.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
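
The "initialize on declaration" fix described above can be illustrated with a
minimal sketch. This is not the dpctl code: parse_id() and id_was_generated()
are hypothetical names, and the parser is assumed to leave its out-parameter
untouched on error, which is the situation the analyzer warns about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical parser.  An empty string means "generate an id"; a hex string
 * is parsed; anything else is an error.  On error, '*generated' is left
 * untouched, which is what makes an uninitialized caller variable unsafe. */
static int
parse_id(const char *s, unsigned long *id, bool *generated)
{
    char *end;

    assert(s && id && generated);
    if (*s == '\0') {
        *id = 0x1234;           /* Placeholder for a generated id. */
        *generated = true;
        return 0;
    }
    *id = strtoul(s, &end, 16);
    if (*end != '\0') {
        return -1;              /* Error: '*generated' is not written. */
    }
    *generated = false;
    return 0;
}

/* Reads 'generated' after the call regardless of the result.  Because the
 * flag is initialized on declaration, the read is well-defined on every
 * path, including the error path. */
static bool
id_was_generated(const char *s)
{
    unsigned long id = 0;
    bool generated = false;     /* Initialized on declaration. */
    int error = parse_id(s, &id, &generated);

    (void) error;               /* A real caller would report this. */
    return generated;
}
```

With the initializer removed, the "zz" case below would read an indeterminate
value, which is the pattern the analyzer reports.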

[ovs-dev] [PATCH 1/5] netdev-offload: Fix Clang's static analyzer 'null pointer dereference' warnings.

2024-05-16 Thread Mike Pattrick

Clang's static analyzer will complain about a null pointer dereference
because dumps can be set to NULL and a later loop could then write
through it.

Instead, return early from the netdev_ports_flow_dump_create function if
dumps is NULL.

Signed-off-by: Mike Pattrick 
---
 lib/netdev-offload.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/netdev-offload.c b/lib/netdev-offload.c
index 931d634e1..02b1cf203 100644
--- a/lib/netdev-offload.c
+++ b/lib/netdev-offload.c
@@ -638,7 +638,14 @@ netdev_ports_flow_dump_create(const char *dpif_type, int *ports, bool terse)
 }
 }
 
-dumps = count ? xzalloc(sizeof *dumps * count) : NULL;
+if (count == 0) {
+*ports = 0;
+ovs_rwlock_unlock(_to_netdev_rwlock);
+
+return NULL;
+}
+
+dumps = xzalloc(sizeof *dumps * count);
 
 HMAP_FOR_EACH (data, portno_node, _to_netdev) {
 if (netdev_get_dpif_type(data->netdev) == dpif_type) {
-- 
2.39.3
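
The early-return pattern from this patch can be sketched in isolation. The
function below is a hypothetical stand-in, not the netdev-offload code: it
counts matches, returns NULL with a zero count when there are none, and only
then allocates and fills the array, so the fill loop can never run against a
NULL pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Collects the even values from 'in' into a newly allocated array and stores
 * the number of results in '*n_out'.  Returns NULL, with '*n_out' set to 0,
 * when nothing matches. */
static int *
collect_even(const int *in, size_t n_in, size_t *n_out)
{
    size_t count = 0;
    size_t i, j = 0;
    int *out;

    assert(n_out);
    for (i = 0; i < n_in; i++) {
        if (in[i] % 2 == 0) {
            count++;
        }
    }

    if (count == 0) {
        *n_out = 0;
        return NULL;            /* Early return: no allocation, no writes. */
    }

    out = calloc(count, sizeof *out);
    if (!out) {
        abort();                /* Stand-in for xzalloc()'s abort-on-failure. */
    }
    for (i = 0; i < n_in; i++) {
        if (in[i] % 2 == 0) {
            out[j++] = in[i];
        }
    }
    *n_out = count;
    return out;
}
```

The real function also has to release a lock before returning early, as the
diff above shows; the sketch omits locking.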


[ovs-dev] [PATCH 5/5] netdev-linux: Fix Clang's static analyzer uninitialized values warnings.

2024-05-16 Thread Mike Pattrick

Clang's static analyzer will complain about an uninitialized value
because in some error conditions we wouldn't set all values that are
used later.

Now we initialize more values that are needed later even in error
conditions.

Signed-off-by: Mike Pattrick 
---
 lib/netdev-linux.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 25349c605..66dae3e1a 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -2439,7 +2439,9 @@ netdev_linux_read_definitions(struct netdev_linux *netdev,
 int error = 0;
 
 error = netdev_linux_read_stringset_info(netdev, );
-if (error || !len) {
+if (!len) {
+return -EOPNOTSUPP;
+} else if (error) {
 return error;
 }
 strings = xzalloc(sizeof *strings + len * ETH_GSTRING_LEN);
@@ -2724,6 +2726,7 @@ netdev_linux_get_speed_locked(struct netdev_linux *netdev,
   uint32_t *current, uint32_t *max)
 {
 if (netdev_linux_netnsid_is_remote(netdev)) {
+*current = 0;
 return EOPNOTSUPP;
 }
 
@@ -2733,6 +2736,8 @@ netdev_linux_get_speed_locked(struct netdev_linux *netdev,
? 0 : netdev->current_speed;
 *max = MIN(UINT32_MAX,
netdev_features_to_bps(netdev->supported, 0) / 100ULL);
+} else {
+*current = 0;
 }
 return netdev->get_features_error;
 }
-- 
2.39.3
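
A minimal sketch of the callee-side fix: the output parameter is written on
every return path, so callers may read it even when an error is returned. The
struct and function names are hypothetical and only loosely modeled on the
hunks above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical link state, for illustration only. */
struct link {
    int remote;                 /* Nonzero if the device cannot be queried. */
    int features_error;         /* 0 or a positive errno value. */
    uint32_t speed;             /* Valid only if 'features_error' is 0. */
};

/* Stores the link speed in '*current'.  '*current' is always written, even
 * when a nonzero error is returned. */
static int
link_get_speed(const struct link *l, uint32_t *current)
{
    assert(l && current);
    if (l->remote) {
        *current = 0;           /* Set the output before the early return. */
        return EOPNOTSUPP;
    }

    if (!l->features_error) {
        *current = l->speed;
    } else {
        *current = 0;           /* Error path also defines the output. */
    }
    return l->features_error;
}
```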


[ovs-dev] [PATCH 4/5] socket: Fix Clang's static analyzer 'garbage value' warnings.

2024-05-16 Thread Mike Pattrick

Clang's static analyzer will complain about an uninitialized value
because we weren't setting a value for dns_failure in all code paths.

Now we initialize this on declaration.

Signed-off-by: Mike Pattrick 
---
 lib/socket-util.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/socket-util.c b/lib/socket-util.c
index 2d89fce85..1d21ce01c 100644
--- a/lib/socket-util.c
+++ b/lib/socket-util.c
@@ -711,7 +711,7 @@ inet_open_passive(int style, const char *target, int default_port,
 struct sockaddr_storage ss;
 int fd = 0, error;
 unsigned int yes = 1;
-bool dns_failure;
+bool dns_failure = false;
 
 if (!inet_parse_passive(target, default_port, , true, _failure)) {
 if (dns_failure) {
-- 
2.39.3


[ovs-dev] [PATCH 0/5] clang: Fix Clang's static analyzer detections.

2024-05-16 Thread Mike Pattrick

Clang's static analyzer has identified several instances of uninitialized
variable usage and null pointer dereferencing that, while unlikely, are
possible. The fixes mostly make sure that a variable is properly set or an
error code is properly returned in every error condition.

Signed-off-by: Mike Pattrick 


Mike Pattrick (5):
  netdev-offload: Fix Clang's static analyzer 'null pointer dereference'
warnings.
  netdev-native-tnl: Fix Clang's static analyzer 'uninitialized value'
warnings.
  dpctl: Fix Clang's static analyzer 'garbage value' warnings.
  socket: Fix Clang's static analyzer 'garbage value' warnings.
  netdev-linux: Fix Clang's static analyzer uninitialized values
warnings.

 lib/dpctl.c             | 5 ++---
 lib/netdev-linux.c      | 7 ++++++-
 lib/netdev-native-tnl.c | 4 +++-
 lib/netdev-offload.c    | 9 ++++++++-
 lib/socket-util.c       | 2 +-
 5 files changed, 20 insertions(+), 7 deletions(-)

-- 
2.39.3


[ovs-dev] [PATCH 2/5] netdev-native-tnl: Fix Clang's static analyzer 'uninitialized value' warnings.

2024-05-16 Thread Mike Pattrick

Clang's static analyzer will complain about an uninitialized value
because we weren't properly checking the error code from a function that
would have initialized the value.

Instead, add a check for that return code.

Signed-off-by: Mike Pattrick 
---
 lib/netdev-native-tnl.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
index dee9ab344..6e6b15764 100644
--- a/lib/netdev-native-tnl.c
+++ b/lib/netdev-native-tnl.c
@@ -1068,7 +1068,9 @@ netdev_srv6_pop_header(struct dp_packet *packet)
 }
 
 pkt_metadata_init_tnl(md);
-netdev_tnl_ip_extract_tnl_md(packet, tnl, );
+if (netdev_tnl_ip_extract_tnl_md(packet, tnl, ) == NULL) {
+goto err;
+}
 dp_packet_reset_packet(packet, hlen);
 
 return packet;
-- 
2.39.3
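
The idea of checking the return code before using a value the callee was meant
to initialize can be sketched as follows. extract_header() and payload_len()
are hypothetical; the first byte of the buffer is assumed to hold the header
length, including itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical header parser.  Returns a pointer to the payload and stores
 * the header length in '*hlen', or returns NULL and leaves '*hlen' unset if
 * the buffer is malformed. */
static const unsigned char *
extract_header(const unsigned char *buf, size_t len, size_t *hlen)
{
    if (len < 1 || buf[0] < 1 || buf[0] > len) {
        return NULL;
    }
    *hlen = buf[0];
    return buf + *hlen;
}

/* Returns the payload length, or -1 for a malformed buffer. */
static long
payload_len(const unsigned char *buf, size_t len)
{
    size_t hlen;                /* Only valid if extract_header() succeeds. */

    assert(buf);
    if (extract_header(buf, len, &hlen) == NULL) {
        goto err;               /* 'hlen' is never read on this path. */
    }
    return (long) (len - hlen);

err:
    return -1;
}
```

Without the NULL check, the subtraction would read 'hlen' on the failure path,
which is the uninitialized use the analyzer reports.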


Re: [ovs-dev] [PATCH v2] compiler: Fix errors in Clang 17 ubsan checks.

2024-05-16 Thread Mike Pattrick

Recheck-request: github-robot

On Thu, May 16, 2024 at 9:58 AM Mike Pattrick  wrote:
>
> This patch attempts to fix a large number of ubsan error messages that
> take the following form:
>
> lib/netlink-notifier.c:237:13: runtime error: call to function
> route_table_change through pointer to incorrect function type
> 'void (*)(const void *, void *)'
>
> In Clang 17 the undefined behaviour sanitizer check for function
> pointers was enabled by default, whereas it was previously disabled
> while compiling C code. These warnings are a false positive in the case
> of OVS, as our macros already check to make sure the function parameter
> is the correct size.
>
> So that check is disabled in the single function that is causing all of
> the errors.
>
> Signed-off-by: Mike Pattrick 
> ---
> v2: Changed macro name
> ---
>  include/openvswitch/compiler.h | 11 +++
>  lib/ovs-rcu.c  |  1 +
>  2 files changed, 12 insertions(+)
>
> diff --git a/include/openvswitch/compiler.h b/include/openvswitch/compiler.h
> index 878c5c6a7..f49b23683 100644
> --- a/include/openvswitch/compiler.h
> +++ b/include/openvswitch/compiler.h
> @@ -69,6 +69,17 @@
>  #define OVS_UNLIKELY(CONDITION) (!!(CONDITION))
>  #endif
>
> +/* Clang 17's implementation of ubsan enables checking that function pointers
> + * match the type of the called function. This currently breaks ovs-rcu, which
> + * calls multiple different types of callbacks via a generic void *(void*)
> + * function pointer type. This macro enables disabling that check for specific
> + * functions. */
> +#if __clang__ && __has_feature(undefined_behavior_sanitizer)
> +#define OVS_NO_SANITIZE_FUNCTION __attribute__((no_sanitize("function")))
> +#else
> +#define OVS_NO_SANITIZE_FUNCTION
> +#endif
> +
>  #if __has_feature(c_thread_safety_attributes)
>  /* "clang" annotations for thread safety check.
>   *
> diff --git a/lib/ovs-rcu.c b/lib/ovs-rcu.c
> index 9e07d9bab..597fe6826 100644
> --- a/lib/ovs-rcu.c
> +++ b/lib/ovs-rcu.c
> @@ -327,6 +327,7 @@ ovsrcu_postpone__(void (*function)(void *aux), void *aux)
>  }
>
>  static bool
> +OVS_NO_SANITIZE_FUNCTION
>  ovsrcu_call_postponed(void)
>  {
>  struct ovsrcu_cbset *cbset;
> --
> 2.39.3
>


[ovs-dev] [PATCH v2 2/2] ipf: Handle common case of ipf defragmentation.

2024-05-16 Thread Mike Pattrick

When conntrack is reassembling packet fragments, the same reassembly
context can be shared across multiple threads handling different packets
simultaneously. Once a full packet is assembled, it is added to a packet
batch for processing. When multiple pmd threads access conntrack
simultaneously, there is a race condition where the reassembled packet
may be added to an arbitrary batch even if the current batch is
available.

When this happens, the packet may be handled incorrectly as it is
inserted into a random openflow execution pipeline, instead of the
pipeline for that packet's flow.

This change makes a best-effort attempt to add the defragmented packet
directly to the current batch. This should succeed most of the time.

Fixes: 4ea96698f667 ("Userspace datapath: Add fragmentation handling.")
Reported-at: https://issues.redhat.com/browse/FDP-560
Signed-off-by: Mike Pattrick 
---
 lib/ipf.c | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/lib/ipf.c b/lib/ipf.c
index 3c8960be3..2d715f5e9 100644
--- a/lib/ipf.c
+++ b/lib/ipf.c
@@ -506,13 +506,15 @@ ipf_reassemble_v6_frags(struct ipf_list *ipf_list)
 }
 
 /* Called when a frag list state transitions to another state. This is
- * triggered by new fragment for the list being received.*/
-static void
+* triggered by new fragment for the list being received. Returns a reassembled
+* packet if this fragment has completed one. */
+static struct reassembled_pkt *
 ipf_list_state_transition(struct ipf *ipf, struct ipf_list *ipf_list,
   bool ff, bool lf, bool v6)
 OVS_REQUIRES(ipf->ipf_lock)
 {
 enum ipf_list_state curr_state = ipf_list->state;
+struct reassembled_pkt *ret = NULL;
 enum ipf_list_state next_state;
 switch (curr_state) {
 case IPF_LIST_STATE_UNUSED:
@@ -562,12 +564,15 @@ ipf_list_state_transition(struct ipf *ipf, struct ipf_list *ipf_list,
 ipf_reassembled_list_add(>reassembled_pkt_list, rp);
 ipf_expiry_list_remove(ipf_list);
 next_state = IPF_LIST_STATE_COMPLETED;
+ret = rp;
 } else {
 next_state = IPF_LIST_STATE_REASS_FAIL;
 }
 }
 }
 ipf_list->state = next_state;
+
+return ret;
 }
 
 /* Some sanity checks are redundant, but prudent, in case code paths for
@@ -799,7 +804,8 @@ ipf_is_frag_duped(const struct ipf_frag *frag_list, int last_inuse_idx,
 static bool
 ipf_process_frag(struct ipf *ipf, struct ipf_list *ipf_list,
  struct dp_packet *pkt, uint16_t start_data_byte,
- uint16_t end_data_byte, bool ff, bool lf, bool v6)
+ uint16_t end_data_byte, bool ff, bool lf, bool v6,
+ struct reassembled_pkt **rp)
 OVS_REQUIRES(ipf->ipf_lock)
 {
 bool duped_frag = ipf_is_frag_duped(ipf_list->frag_list,
@@ -820,7 +826,7 @@ ipf_process_frag(struct ipf *ipf, struct ipf_list *ipf_list,
 ipf_list->last_inuse_idx++;
 atomic_count_inc(>nfrag);
 ipf_count(ipf, v6, IPF_NFRAGS_ACCEPTED);
-ipf_list_state_transition(ipf, ipf_list, ff, lf, v6);
+*rp = ipf_list_state_transition(ipf, ipf_list, ff, lf, v6);
 } else {
 OVS_NOT_REACHED();
 }
@@ -853,7 +859,8 @@ ipf_list_init(struct ipf_list *ipf_list, struct ipf_list_key *key,
  * to a list of fragemnts. */
 static bool
 ipf_handle_frag(struct ipf *ipf, struct dp_packet *pkt, ovs_be16 dl_type,
-uint16_t zone, long long now, uint32_t hash_basis)
+uint16_t zone, long long now, uint32_t hash_basis,
+struct reassembled_pkt **rp)
 OVS_REQUIRES(ipf->ipf_lock)
 {
 struct ipf_list_key key;
@@ -922,7 +929,7 @@ ipf_handle_frag(struct ipf *ipf, struct dp_packet *pkt, ovs_be16 dl_type,
 }
 
 return ipf_process_frag(ipf, ipf_list, pkt, start_data_byte,
-end_data_byte, ff, lf, v6);
+end_data_byte, ff, lf, v6, rp);
 }
 
 /* Filters out fragments from a batch of fragments and adjust the batch. */
@@ -941,11 +948,17 @@ ipf_extract_frags_from_batch(struct ipf *ipf, struct dp_packet_batch *pb,
   ||
   (dl_type == htons(ETH_TYPE_IPV6) &&
   ipf_is_valid_v6_frag(ipf, pkt {
+struct reassembled_pkt *rp = NULL;
 
 ovs_mutex_lock(>ipf_lock);
-if (!ipf_handle_frag(ipf, pkt, dl_type, zone, now, hash_basis)) {
+if (!ipf_handle_frag(ipf, pkt, dl_type, zone, now, hash_basis,
+ )) {
 dp_packet_batch_refill(pb, pkt, pb_idx);
 } else {
+if (rp && !dp_packet_batch_is_full(pb)) {
+dp_packet_batch_refill(pb, rp->pkt, pb_idx);
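
The best-effort behaviour described in this commit message can be sketched
with a simplified model. This is an assumption-laden illustration, not the
ipf.c logic: the batch and pending-list types, their sizes, and deliver() are
hypothetical, and the real code keeps additional per-list state.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_MAX 4             /* Hypothetical batch capacity. */
#define PENDING_MAX 8           /* Hypothetical shared-list capacity. */

struct batch {
    int pkts[BATCH_MAX];
    size_t n;
};

struct pending {
    int pkts[PENDING_MAX];
    size_t n;
};

enum delivery {
    DELIVERED_NOW,              /* Placed in the caller's own batch. */
    DEFERRED                    /* Left for whichever thread drains 'shared'. */
};

/* Tries to place a reassembled packet in the current batch.  Falls back to
 * the shared pending list only when the current batch is full. */
static enum delivery
deliver(struct batch *cur, struct pending *shared, int pkt)
{
    assert(cur && shared && shared->n < PENDING_MAX);
    if (cur->n < BATCH_MAX) {
        cur->pkts[cur->n++] = pkt;
        return DELIVERED_NOW;
    }
    shared->pkts[shared->n++] = pkt;
    return DEFERRED;
}
```

Only the deferred case is exposed to the cross-thread race described above,
which is why the commit message expects the common case to succeed.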

[ovs-dev] [PATCH v2 1/2] ipf: Only add fragments to batch of same dl_type.

2024-05-16 Thread Mike Pattrick

When conntrack is reassembling packet fragments, the same reassembly
context can be shared across multiple threads handling different packets
simultaneously. Once a full packet is assembled, it is added to a packet
batch for processing. This is most likely the batch that added it in the
first place, but that isn't guaranteed.

The packets in these batches should be segregated by network protocol
version (ipv4 vs ipv6) for conntrack defragmentation to function
appropriately. However, there are conditions where we would add a
reassembled packet of one type to a batch of another.

This change introduces checks to make sure that reassembled or expired
fragments are only added to packet batches of the same type.

Fixes: 4ea96698f667 ("Userspace datapath: Add fragmentation handling.")
Reported-at: https://issues.redhat.com/browse/FDP-560
Signed-off-by: Mike Pattrick 
---
 lib/ipf.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/lib/ipf.c b/lib/ipf.c
index 7d74e2c13..3c8960be3 100644
--- a/lib/ipf.c
+++ b/lib/ipf.c
@@ -1063,6 +1063,9 @@ ipf_send_completed_frags(struct ipf *ipf, struct dp_packet_batch *pb,
 struct ipf_list *ipf_list;
 
 LIST_FOR_EACH_SAFE (ipf_list, list_node, >frag_complete_list) {
+if ((ipf_list->key.dl_type == htons(ETH_TYPE_IPV6)) != v6) {
+continue;
+}
 if (ipf_send_frags_in_list(ipf, ipf_list, pb, IPF_FRAG_COMPLETED_LIST,
v6, now)) {
 ipf_completed_list_clean(>frag_lists, ipf_list);
@@ -1096,6 +1099,9 @@ ipf_send_expired_frags(struct ipf *ipf, struct dp_packet_batch *pb,
 size_t lists_removed = 0;
 
 LIST_FOR_EACH_SAFE (ipf_list, list_node, >frag_exp_list) {
+if ((ipf_list->key.dl_type == htons(ETH_TYPE_IPV6)) != v6) {
+continue;
+}
 if (now <= ipf_list->expiration ||
 lists_removed >= IPF_FRAG_LIST_MAX_EXPIRED) {
 break;
@@ -1116,7 +1122,8 @@ ipf_send_expired_frags(struct ipf *ipf, struct dp_packet_batch *pb,
 /* Adds a reassmebled packet to a packet batch to be processed by the caller.
  */
 static void
-ipf_execute_reass_pkts(struct ipf *ipf, struct dp_packet_batch *pb)
+ipf_execute_reass_pkts(struct ipf *ipf, struct dp_packet_batch *pb,
+   ovs_be16 dl_type)
 {
 if (ovs_list_is_empty(>reassembled_pkt_list)) {
 return;
@@ -1127,6 +1134,7 @@ ipf_execute_reass_pkts(struct ipf *ipf, struct dp_packet_batch *pb)
 
 LIST_FOR_EACH_SAFE (rp, rp_list_node, >reassembled_pkt_list) {
 if (!rp->list->reass_execute_ctx &&
+rp->list->key.dl_type == dl_type &&
 ipf_dp_packet_batch_add(pb, rp->pkt, false)) {
 rp->list->reass_execute_ctx = rp->pkt;
 }
@@ -1237,7 +1245,7 @@ ipf_preprocess_conntrack(struct ipf *ipf, struct dp_packet_batch *pb,
 }
 
 if (ipf_get_enabled(ipf) || atomic_count_get(>nfrag)) {
-ipf_execute_reass_pkts(ipf, pb);
+ipf_execute_reass_pkts(ipf, pb, dl_type);
 }
 }
 
-- 
2.39.3
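
The family check added in the hunks above, (dl_type == IPv6) != v6, can be
shown on its own. The sketch uses hypothetical types and keeps EtherTypes in
host byte order for simplicity, whereas the patch compares against
htons(ETH_TYPE_IPV6).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETHERTYPE_V4 0x0800
#define ETHERTYPE_V6 0x86dd

/* Hypothetical fragment list, reduced to the fields the check needs. */
struct frag_list {
    uint16_t dl_type;           /* Host byte order in this sketch. */
    int id;
};

/* Copies into 'out' the ids of the lists whose address family matches 'v6'
 * and skips the rest.  Returns the number of ids written. */
static size_t
select_family(const struct frag_list *lists, size_t n, bool v6, int *out)
{
    size_t n_out = 0;

    for (size_t i = 0; i < n; i++) {
        /* True exactly when the list's family differs from the batch's. */
        if ((lists[i].dl_type == ETHERTYPE_V6) != v6) {
            continue;
        }
        out[n_out++] = lists[i].id;
    }
    return n_out;
}
```

Comparing two booleans with != keeps the test symmetric: an IPv4 list is
skipped for an IPv6 batch and an IPv6 list is skipped for an IPv4 batch.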


Re: [ovs-dev] [PATCH] ipf: Only add fragments to batch of same dl_type.

2024-05-16 Thread Mike Pattrick

On Thu, May 16, 2024 at 8:35 AM Simon Horman  wrote:
>
> Hi Mike,
>
> On Wed, May 15, 2024 at 12:24:33PM -0400, Mike Pattrick wrote:
> > When conntrack is reassembling packet fragments, the same reassembly
> > context can be shared across multiple threads handling different packets
> > simultaneously. Once a full packet is assembled, it is added to a packet
> > batch for processing, this is most likely the batch that added it in the
> > first place, but that isn't a guarantee.
> >
> > The packets in these batches should be segregated by network protocol
> > versuib (ipv4 vs ipv6) for conntrack defragmentation to function
>
> nit: version
>
> > appropriately. However, there are conditions where we would add a
> > reassembled packet of one type to a batch of another.
> >
> > This change introduces checks to make sure that reassembled or expired
> > fragments are only added to packet batches of the same type. It also
> > makes a best effort attempt to make sure the defragmented packet is
> > inserted into the current batch.
>
> Would it make any sense to separate these changes into separate patches?

Can do!

>
> >
> > Fixes: 4ea96698f667 ("Userspace datapath: Add fragmentation handling.")
> > Reported-at: https://issues.redhat.com/browse/FDP-560
> > Signed-off-by: Mike Pattrick 
> > ---
> > Note: This solution is far from perfect, ipf.c can still insert packets
> > into more or less arbitrary batches but this bug fix is needed to avoid a
> > memory overrun and should insert packets into the proper batch in the
> > common case. I'm working on a more correct solution but it changes how
> > fragments are fundamentally handled, and couldn't be considered a bug fix.
>
> FWIW, I'm ok with changes that move things to a better, even if not ideal,
> state.
>
> ...
>
> > @@ -943,9 +952,14 @@ ipf_extract_frags_from_batch(struct ipf *ipf, struct dp_packet_batch *pb,
> >ipf_is_valid_v6_frag(ipf, pkt {
> >
> >  ovs_mutex_lock(>ipf_lock);
> > -if (!ipf_handle_frag(ipf, pkt, dl_type, zone, now, hash_basis)) {
> > +if (!ipf_handle_frag(ipf, pkt, dl_type, zone, now, hash_basis,
> > + )) {
> >  dp_packet_batch_refill(pb, pkt, pb_idx);
> >  } else {
> > +if (rp && !dp_packet_batch_is_full(pb)) {
>
> The conditions under which rp is set are complex and buried
> inside the call-chain under ipf_handle_frag(). I am concerned
> that there are cases where it may be used unset here. Or that
> the complexity allows for such cases to be inadvertently added
> later.
>
> Could we make this more robust, f.e. by making sure rp is
> always initialised when ipf_handle_frag returns by setting
> it to NULL towards the top of that function.

Agreed that it's overly complex. I'll change this to initialize rp in
ipf_extract_frags_from_batch(); the functions in between
ipf_list_state_transition() and ipf_extract_frags_from_batch() shouldn't
really touch or care about this value.

-M

>
> > +dp_packet_batch_refill(pb, rp->pkt, pb_idx);
> > +rp->list->reass_execute_ctx = rp->pkt;
> > +}
> >  dp_packet_delete(pkt);
> >  }
> >  ovs_mutex_unlock(>ipf_lock);
>
> ...
>


[ovs-dev] [PATCH v2] compiler: Fix errors in Clang 17 ubsan checks.

2024-05-16 Thread Mike Pattrick

This patch attempts to fix a large number of ubsan error messages that
take the following form:

lib/netlink-notifier.c:237:13: runtime error: call to function
route_table_change through pointer to incorrect function type
'void (*)(const void *, void *)'

In Clang 17 the undefined behaviour sanitizer check for function
pointers was enabled by default, whereas it was previously disabled
while compiling C code. These warnings are a false positive in the case
of OVS, as our macros already check to make sure the function parameter
is the correct size.

So that check is disabled in the single function that is causing all of
the errors.

Signed-off-by: Mike Pattrick 
---
v2: Changed macro name
---
 include/openvswitch/compiler.h | 11 +++++++++++
 lib/ovs-rcu.c                  |  1 +
 2 files changed, 12 insertions(+)

diff --git a/include/openvswitch/compiler.h b/include/openvswitch/compiler.h
index 878c5c6a7..f49b23683 100644
--- a/include/openvswitch/compiler.h
+++ b/include/openvswitch/compiler.h
@@ -69,6 +69,17 @@
 #define OVS_UNLIKELY(CONDITION) (!!(CONDITION))
 #endif
 
+/* Clang 17's implementation of ubsan enables checking that function pointers
+ * match the type of the called function. This currently breaks ovs-rcu, which
+ * calls multiple different types of callbacks via a generic void *(void*)
+ * function pointer type. This macro enables disabling that check for specific
+ * functions. */
+#if __clang__ && __has_feature(undefined_behavior_sanitizer)
+#define OVS_NO_SANITIZE_FUNCTION __attribute__((no_sanitize("function")))
+#else
+#define OVS_NO_SANITIZE_FUNCTION
+#endif
+
 #if __has_feature(c_thread_safety_attributes)
 /* "clang" annotations for thread safety check.
  *
diff --git a/lib/ovs-rcu.c b/lib/ovs-rcu.c
index 9e07d9bab..597fe6826 100644
--- a/lib/ovs-rcu.c
+++ b/lib/ovs-rcu.c
@@ -327,6 +327,7 @@ ovsrcu_postpone__(void (*function)(void *aux), void *aux)
 }
 
 static bool
+OVS_NO_SANITIZE_FUNCTION
 ovsrcu_call_postponed(void)
 {
 struct ovsrcu_cbset *cbset;
-- 
2.39.3
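
A standalone sketch of the macro pattern in this patch follows. The macro is
named NO_SANITIZE_FUNCTION here to make clear it is not the OVS header, and
the __has_feature() fallback is an assumption added so that the snippet also
compiles with compilers that lack that builtin. The callbacks in the sketch
have exactly the pointer's type, so the code is well-defined either way; the
attribute only matters when a callee's real type differs from the pointer it
is called through.

```c
#include <assert.h>
#include <stddef.h>

/* Fallback so that __has_feature() can be used on compilers without it. */
#ifndef __has_feature
#define __has_feature(x) 0
#endif

/* Expands to the attribute only for Clang builds with UBSan enabled. */
#if defined(__clang__) && __has_feature(undefined_behavior_sanitizer)
#define NO_SANITIZE_FUNCTION __attribute__((no_sanitize("function")))
#else
#define NO_SANITIZE_FUNCTION
#endif

struct callback {
    void (*fn)(void *aux);
    void *aux;
};

static void
add_one(void *aux)
{
    int *n = aux;

    (*n)++;
}

/* Runs every callback.  The attribute suppresses UBSan's "function" check
 * inside this one function only. */
static void NO_SANITIZE_FUNCTION
run_callbacks(const struct callback *cbs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        cbs[i].fn(cbs[i].aux);
    }
}
```

Scoping the attribute to the single dispatch function, as the patch does,
leaves the check active everywhere else in the program.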


[ovs-dev] [PATCH] ipf: Only add fragments to batch of same dl_type.

2024-05-15 Thread Mike Pattrick

When conntrack is reassembling packet fragments, the same reassembly
context can be shared across multiple threads handling different packets
simultaneously. Once a full packet is assembled, it is added to a packet
batch for processing. This is most likely the batch that added it in the
first place, but that isn't guaranteed.

The packets in these batches should be segregated by network protocol
version (ipv4 vs ipv6) for conntrack defragmentation to function
appropriately. However, there are conditions where we would add a
reassembled packet of one type to a batch of another.

This change introduces checks to make sure that reassembled or expired
fragments are only added to packet batches of the same type. It also
makes a best effort attempt to make sure the defragmented packet is
inserted into the current batch.

Fixes: 4ea96698f667 ("Userspace datapath: Add fragmentation handling.")
Reported-at: https://issues.redhat.com/browse/FDP-560
Signed-off-by: Mike Pattrick 
---
Note: This solution is far from perfect. ipf.c can still insert packets
into more or less arbitrary batches, but this bug fix is needed to avoid a
memory overrun and should insert packets into the proper batch in the
common case. I'm working on a more correct solution, but it changes how
fragments are fundamentally handled and couldn't be considered a bug fix.
---
 lib/ipf.c | 40 +++++++++++++++++++++++++++++++---------
 1 file changed, 31 insertions(+), 9 deletions(-)

diff --git a/lib/ipf.c b/lib/ipf.c
index 7d74e2c13..90c819d63 100644
--- a/lib/ipf.c
+++ b/lib/ipf.c
@@ -506,13 +506,15 @@ ipf_reassemble_v6_frags(struct ipf_list *ipf_list)
 }
 
 /* Called when a frag list state transitions to another state. This is
- * triggered by new fragment for the list being received.*/
-static void
+* triggered by new fragment for the list being received. Returns a reassembled
+* packet if this fragment has completed one. */
+static struct reassembled_pkt *
 ipf_list_state_transition(struct ipf *ipf, struct ipf_list *ipf_list,
   bool ff, bool lf, bool v6)
 OVS_REQUIRES(ipf->ipf_lock)
 {
 enum ipf_list_state curr_state = ipf_list->state;
+struct reassembled_pkt *ret = NULL;
 enum ipf_list_state next_state;
 switch (curr_state) {
 case IPF_LIST_STATE_UNUSED:
@@ -562,12 +564,15 @@ ipf_list_state_transition(struct ipf *ipf, struct ipf_list *ipf_list,
 ipf_reassembled_list_add(>reassembled_pkt_list, rp);
 ipf_expiry_list_remove(ipf_list);
 next_state = IPF_LIST_STATE_COMPLETED;
+ret = rp;
 } else {
 next_state = IPF_LIST_STATE_REASS_FAIL;
 }
 }
 }
 ipf_list->state = next_state;
+
+return ret;
 }
 
 /* Some sanity checks are redundant, but prudent, in case code paths for
@@ -799,7 +804,8 @@ ipf_is_frag_duped(const struct ipf_frag *frag_list, int last_inuse_idx,
 static bool
 ipf_process_frag(struct ipf *ipf, struct ipf_list *ipf_list,
  struct dp_packet *pkt, uint16_t start_data_byte,
- uint16_t end_data_byte, bool ff, bool lf, bool v6)
+ uint16_t end_data_byte, bool ff, bool lf, bool v6,
+ struct reassembled_pkt **rp)
 OVS_REQUIRES(ipf->ipf_lock)
 {
 bool duped_frag = ipf_is_frag_duped(ipf_list->frag_list,
@@ -820,13 +826,14 @@ ipf_process_frag(struct ipf *ipf, struct ipf_list *ipf_list,
 ipf_list->last_inuse_idx++;
 atomic_count_inc(>nfrag);
 ipf_count(ipf, v6, IPF_NFRAGS_ACCEPTED);
-ipf_list_state_transition(ipf, ipf_list, ff, lf, v6);
+*rp = ipf_list_state_transition(ipf, ipf_list, ff, lf, v6);
 } else {
 OVS_NOT_REACHED();
 }
 } else {
 ipf_count(ipf, v6, IPF_NFRAGS_OVERLAP);
 pkt->md.ct_state = CS_INVALID;
+*rp = NULL;
 return false;
 }
 return true;
@@ -853,7 +860,8 @@ ipf_list_init(struct ipf_list *ipf_list, struct ipf_list_key *key,
  * to a list of fragemnts. */
 static bool
 ipf_handle_frag(struct ipf *ipf, struct dp_packet *pkt, ovs_be16 dl_type,
-uint16_t zone, long long now, uint32_t hash_basis)
+uint16_t zone, long long now, uint32_t hash_basis,
+struct reassembled_pkt **rp)
 OVS_REQUIRES(ipf->ipf_lock)
 {
 struct ipf_list_key key;
@@ -922,7 +930,7 @@ ipf_handle_frag(struct ipf *ipf, struct dp_packet *pkt, ovs_be16 dl_type,
 }
 
 return ipf_process_frag(ipf, ipf_list, pkt, start_data_byte,
-end_data_byte, ff, lf, v6);
+end_data_byte, ff, lf, v6, rp);
 }
 
 /* Filters out fragments from a batch of fragments and adjust the batch. */
@@ -933,6 +941,7 @@ ipf_extract_frags_from_batch(struct ipf *ipf, struct dp_packet_batch *pb,
 {
 const size_t pb_cnt = dp_packet_batch_size(pb);
 int pb_idx; /*

Re: [ovs-dev] [PATCH v2] lib: Fix segfault for tunnel packet.

2024-05-03 Thread Mike Pattrick

On Fri, May 3, 2024 at 6:22 AM Amit Prakash Shukla
 wrote:
>
> Add NULL check to UDP, TCP and SCTP checksum functions. This patch
> also adds changes to populate inner_l3_ofs and inner_l4_ofs for the
> tunneled packets received from ports other than vport which are
> required by the protocol specific checksum function to parse the
> headers.
>
> Thread 22 "pmd-c07/id:15" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x6e70dc00 (LWP 1061)]
> 0x13f61750 in packet_udp_complete_csum at lib/packets.c:2061
> 2061if (!udp->udp_csum) {
>
> 0x13f61750 in packet_udp_complete_csum at lib/packets.c:2061
> 0x13e5126c in dp_packet_ol_send_prepare at lib/dp-packet.c:638
> 0x13eb7d4c in netdev_push_header at lib/netdev.c:1035
> 0x13e69830 in push_tnl_action at lib/dpif-netdev.c:9067
> 0x13e69dac in dp_execute_cb at lib/dpif-netdev.c:9226
> 0x13ec72c4 in odp_execute_actions at lib/odp-execute.c:1008
> 0x13e6a7bc in dp_netdev_execute_actions at lib/dpif-netdev.c:9524
> 0x13e673d0 in packet_batch_per_flow_execute at lib/dpif-netdev.c:8271
> 0x13e69188 in dp_netdev_input__ at lib/dpif-netdev.c:8899
> 0x13e691f8 in dp_netdev_input at lib/dpif-netdev.c:8908
> 0x13e600e4 in dp_netdev_process_rxq_port at lib/dpif-netdev.c:5660
> 0x13e649a8 in pmd_thread_main at lib/dpif-netdev.c:7295
> 0x13f44b2c in ovsthread_wrapper at lib/ovs-thread.c:423
>
> CC: Mike Pattrick 
> Fixes: 85bcbbed839a ("userspace: Enable tunnel tests with TSO.")
>
> Signed-off-by: Amit Prakash Shukla 
> ---
>
> v2:
> - Added Fixes tag and updated commit message.
>
>  lib/netdev.c  |  7 +++
>  lib/packets.c | 10 +-
>  2 files changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/lib/netdev.c b/lib/netdev.c
> index f2d921ed6..19bd87ef7 100644
> --- a/lib/netdev.c
> +++ b/lib/netdev.c
> @@ -1032,6 +1032,13 @@ netdev_push_header(const struct netdev *netdev,
>   netdev_get_name(netdev));
>  continue;
>  }
> +if (packet->l3_ofs != UINT16_MAX) {
> +packet->inner_l3_ofs = packet->l3_ofs + data->header_len;
> +}
> +if (packet->l4_ofs != UINT16_MAX) {
> +packet->inner_l4_ofs = packet->l4_ofs + data->header_len;
> +}
> +
>  dp_packet_ol_send_prepare(packet, 0);
>  }
>  netdev->netdev_class->push_header(netdev, packet, data);
> diff --git a/lib/packets.c b/lib/packets.c
> index 5803d26f4..988c0e41f 100644
> --- a/lib/packets.c
> +++ b/lib/packets.c
> @@ -2011,6 +2011,10 @@ packet_tcp_complete_csum(struct dp_packet *p, bool inner)
>  tcp_sz = dp_packet_l4_size(p);
>  }
>
> +if (!tcp || !ip_hdr) {
> +return;
> +}

This suggests that a packet has NETDEV_TX_OFFLOAD_TCP_CKSUM set but either
has no TCP header or has incorrectly set offsets. If that's the case then
there will be additional issues in netdev-linux, the avx512 code, and
potentially in other DPDK drivers.

As Ilya mentioned, an assert here would be preferable.

-M

> +
>  if (!inner && dp_packet_hwol_is_outer_ipv6(p)) {
>  is_v4 = false;
>  } else if (!inner && dp_packet_hwol_is_outer_ipv4(p)) {
> @@ -2058,7 +2062,7 @@ packet_udp_complete_csum(struct dp_packet *p, bool inner)
>  }
>
>  /* Skip csum calculation if the udp_csum is zero. */
> -if (!udp->udp_csum) {
> +if (!udp || !ip_hdr || !udp->udp_csum) {
>  return;
>  }
>
> @@ -2109,6 +2113,10 @@ packet_sctp_complete_csum(struct dp_packet *p, bool inner)
>  tp_len = dp_packet_l4_size(p);
>  }
>
> +if (!sh) {
> +return;
> +}
> +
>  put_16aligned_be32(>sctp_csum, 0);
>  csum = crc32c((void *) sh, tp_len);
>  put_16aligned_be32(>sctp_csum, csum);
> --
> 2.34.1
>
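
The preference for an assert over a silent early return, raised in the reply
above, can be sketched as follows. The packet struct and the byte-sum
"checksum" are hypothetical simplifications, and standard assert() stands in
for whatever assertion macro the project would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet, reduced to what the sketch needs. */
struct pkt {
    const uint8_t *l4;          /* L4 header, or NULL if offsets are unset. */
    size_t l4_len;
};

/* Computes a toy checksum for a packet that was marked for checksum offload.
 *
 * Invariant: such a packet must have a valid L4 pointer.  A NULL here means
 * the flags or offsets were set incorrectly upstream, so the function fails
 * loudly instead of returning early and hiding the inconsistency. */
static uint32_t
complete_csum(const struct pkt *p)
{
    uint32_t sum = 0;

    assert(p && p->l4);
    for (size_t i = 0; i < p->l4_len; i++) {
        sum += p->l4[i];
    }
    return sum;
}
```

A silent return would leave the packet with an incomplete checksum and move
the failure further down the transmit path, where it is harder to diagnose.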


[ovs-dev] [PATCH v9 1/2] ofproto-dpif-mirror: Reduce number of function parameters.

2024-05-01 Thread Mike Pattrick

Previously the mirror_set() and mirror_get() functions took a large
number of parameters, which was inefficient and difficult to read and
extend. This patch moves most of the parameters into a struct.

Signed-off-by: Mike Pattrick 
Acked-by: Simon Horman 
Acked-by: Eelco Chaudron 
Signed-off-by: Mike Pattrick 
---
 ofproto/ofproto-dpif-mirror.c | 60 ++-
 ofproto/ofproto-dpif-mirror.h | 40 ++-
 ofproto/ofproto-dpif-xlate.c  | 29 -
 ofproto/ofproto-dpif.c| 23 +++---
 4 files changed, 88 insertions(+), 64 deletions(-)

diff --git a/ofproto/ofproto-dpif-mirror.c b/ofproto/ofproto-dpif-mirror.c
index 343b75f0e..4967ecc9a 100644
--- a/ofproto/ofproto-dpif-mirror.c
+++ b/ofproto/ofproto-dpif-mirror.c
@@ -207,19 +207,22 @@ mirror_bundle_dst(struct mbridge *mbridge, struct ofbundle *ofbundle)
 }
 
 int
-mirror_set(struct mbridge *mbridge, void *aux, const char *name,
-   struct ofbundle **srcs, size_t n_srcs,
-   struct ofbundle **dsts, size_t n_dsts,
-   unsigned long *src_vlans, struct ofbundle *out_bundle,
-   uint16_t snaplen,
-   uint16_t out_vlan)
+mirror_set(struct mbridge *mbridge, void *aux,
+   const struct ofproto_mirror_settings *ms,
+   const struct mirror_bundles *mb)
 {
 struct mbundle *mbundle, *out;
 mirror_mask_t mirror_bit;
 struct mirror *mirror;
 struct hmapx srcs_map;  /* Contains "struct ofbundle *"s. */
 struct hmapx dsts_map;  /* Contains "struct ofbundle *"s. */
+uint16_t out_vlan;
 
+if (!ms || !mbridge) {
+return EINVAL;
+}
+
+out_vlan = ms->out_vlan;
 mirror = mirror_lookup(mbridge, aux);
 if (!mirror) {
 int idx;
@@ -227,7 +230,7 @@ mirror_set(struct mbridge *mbridge, void *aux, const char *name,
 idx = mirror_scan(mbridge);
 if (idx < 0) {
 VLOG_WARN("maximum of %d port mirrors reached, cannot create %s",
-  MAX_MIRRORS, name);
+  MAX_MIRRORS, ms->name);
 return EFBIG;
 }
 
@@ -242,8 +245,8 @@ mirror_set(struct mbridge *mbridge, void *aux, const char *name,
 unsigned long *vlans = ovsrcu_get(unsigned long *, &mirror->vlans);
 
 /* Get the new configuration. */
-if (out_bundle) {
-out = mbundle_lookup(mbridge, out_bundle);
+if (mb->out_bundle) {
+out = mbundle_lookup(mbridge, mb->out_bundle);
 if (!out) {
 mirror_destroy(mbridge, mirror->aux);
 return EINVAL;
@@ -252,16 +255,16 @@ mirror_set(struct mbridge *mbridge, void *aux, const char 
*name,
 } else {
 out = NULL;
 }
-mbundle_lookup_multiple(mbridge, srcs, n_srcs, &srcs_map);
-mbundle_lookup_multiple(mbridge, dsts, n_dsts, &dsts_map);
+mbundle_lookup_multiple(mbridge, mb->srcs, mb->n_srcs, &srcs_map);
+mbundle_lookup_multiple(mbridge, mb->dsts, mb->n_dsts, &dsts_map);
 
 /* If the configuration has not changed, do nothing. */
 if (hmapx_equals(&srcs_map, &mirror->srcs)
 && hmapx_equals(&dsts_map, &mirror->dsts)
-&& vlan_bitmap_equal(vlans, src_vlans)
+&& vlan_bitmap_equal(vlans, ms->src_vlans)
 && mirror->out == out
 && mirror->out_vlan == out_vlan
-&& mirror->snaplen == snaplen)
+&& mirror->snaplen == ms->snaplen)
 {
 hmapx_destroy(&srcs_map);
 hmapx_destroy(&dsts_map);
@@ -275,15 +278,15 @@ mirror_set(struct mbridge *mbridge, void *aux, const char 
*name,
 hmapx_swap(&dsts_map, &mirror->dsts);
 hmapx_destroy(&dsts_map);
 
-if (vlans || src_vlans) {
+if (vlans || ms->src_vlans) {
 ovsrcu_postpone(free, vlans);
-vlans = vlan_bitmap_clone(src_vlans);
+vlans = vlan_bitmap_clone(ms->src_vlans);
 ovsrcu_set(&mirror->vlans, vlans);
 }
 
 mirror->out = out;
 mirror->out_vlan = out_vlan;
-mirror->snaplen = snaplen;
+mirror->snaplen = ms->snaplen;
 
 /* Update mbundles. */
 mirror_bit = MIRROR_MASK_C(1) << mirror->idx;
@@ -406,23 +409,22 @@ mirror_update_stats(struct mbridge *mbridge, 
mirror_mask_t mirrors,
 /* Retrieves the mirror numbered 'index' in 'mbridge'.  Returns true if such a
  * mirror exists, false otherwise.
  *
- * If successful, '*vlans' receives the mirror's VLAN membership information,
+ * If successful 'mc->vlans' receives the mirror's VLAN membership information,
  * either a null pointer if the mirror includes all VLANs or a 4096-bit bitmap
  * in which a 1-bit indicates that the mirror includes a particular VLAN,
- * '*dup_mirrors' receives a bitmap of mirrors whose output duplicates mirror
- * 'index', '*out' receives the output ofbundle (if any), and '*out_vlan'
- * receives the output VLAN (if any).
+ * 'mc->dup_mirrors' receives a bitmap of mirrors whose output duplic

[ovs-dev] [PATCH v9 2/2] ofproto-dpif-mirror: Add support for pre-selection filter.

2024-05-01 Thread Mike Pattrick

Currently a bridge mirror will collect all packets and tools like
ovs-tcpdump can apply additional filters after they have already been
duplicated by vswitchd. This can result in inefficient collection.

This patch adds support to apply pre-selection to bridge mirrors, which
can limit which packets are mirrored based on flow metadata. This
significantly improves overall vswitchd performance during mirroring if
only a subset of traffic is required.
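
To illustrate the idea of pre-selection, here is a minimal sketch of a masked flow comparison: a packet is mirrored only if it agrees with the filter on every bit that the mask marks as significant. The struct example_flow and the function example_filter_matches are hypothetical and much smaller than the miniflow/minimask machinery this patch actually uses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily reduced flow key. */
struct example_flow {
    uint32_t ip_src;
    uint32_t ip_dst;
    uint8_t ip_proto;
};

/* Returns true if 'pkt' equals 'filter' on every bit that is set in 'mask'.
 * Bits cleared in 'mask' are wildcarded and never cause a mismatch. */
bool
example_filter_matches(const struct example_flow *pkt,
                       const struct example_flow *filter,
                       const struct example_flow *mask)
{
    return !((pkt->ip_src ^ filter->ip_src) & mask->ip_src)
           && !((pkt->ip_dst ^ filter->ip_dst) & mask->ip_dst)
           && !((pkt->ip_proto ^ filter->ip_proto) & mask->ip_proto);
}
```

A filter that only cares about the IP protocol would set mask.ip_proto to 0xff and leave the address masks at zero, so packets that do not match are skipped before any duplication happens.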

Signed-off-by: Mike Pattrick 
---
v8:
 - Corrected code from v7 related to sequence and in_port. Mirrors
   reject filters with an in_port set as this could cause confusion.
 - Combined ovsrcu pointers into a new struct, minimatch wasn't used
   because the minimatch_* functions didn't fit the usage here.
 - Added a test to check for modifying filters when partially
   overlapping flows already exist.
 - Corrected documentation.
v9:
 - Explicitly cleared mirror_config.filter* when not set
---
 Documentation/ref/ovs-tcpdump.8.rst |   8 +-
 NEWS                                |   6 +
 lib/flow.h                          |   9 ++
 ofproto/ofproto-dpif-mirror.c       | 104 +-
 ofproto/ofproto-dpif-mirror.h       |   9 +-
 ofproto/ofproto-dpif-xlate.c        |  15 ++-
 ofproto/ofproto-dpif.c              |  12 +-
 ofproto/ofproto.h                   |   3 +
 tests/ofproto-dpif.at               | 165
 utilities/ovs-tcpdump.in            |  13 ++-
 vswitchd/bridge.c                   |  13 ++-
 vswitchd/vswitch.ovsschema          |   7 +-
 vswitchd/vswitch.xml                |  16 +++
 13 files changed, 365 insertions(+), 15 deletions(-)

diff --git a/Documentation/ref/ovs-tcpdump.8.rst 
b/Documentation/ref/ovs-tcpdump.8.rst
index b9f8cdf6f..e21e61211 100644
--- a/Documentation/ref/ovs-tcpdump.8.rst
+++ b/Documentation/ref/ovs-tcpdump.8.rst
@@ -61,8 +61,14 @@ Options
 
   If specified, mirror all ports (optional).
 
+* ``--filter ``
+
+  If specified, only mirror flows that match the provided OpenFlow filter.
+  The available fields are documented in ``ovs-fields(7)``.
+
 See Also
 
 
 ``ovs-appctl(8)``, ``ovs-vswitchd(8)``, ``ovs-pcap(1)``,
-``ovs-tcpundump(1)``, ``tcpdump(8)``, ``wireshark(8)``.
+``ovs-fields(7)``, ``ovs-tcpundump(1)``, ``tcpdump(8)``,
+``wireshark(8)``.
diff --git a/NEWS b/NEWS
index b92cec532..f3a4bf076 100644
--- a/NEWS
+++ b/NEWS
@@ -7,6 +7,12 @@ Post-v3.3.0
- The primary development branch has been renamed from 'master' to 'main'.
  The OVS tree remains hosted on GitHub.
  https://github.com/openvswitch/ovs.git
+   - ovs-vsctl:
+ * Added a new filter column in the Mirror table which can be used to
+   apply filters to mirror ports.
+   - ovs-tcpdump:
+ * Added command line parameter --filter to enable filtering the flows
+   that are captured by tcpdump.
 
 
 v3.3.0 - 16 Feb 2024
diff --git a/lib/flow.h b/lib/flow.h
index 75a9be3c1..60ec4b0d7 100644
--- a/lib/flow.h
+++ b/lib/flow.h
@@ -939,6 +939,15 @@ flow_union_with_miniflow(struct flow *dst, const struct 
miniflow *src)
 flow_union_with_miniflow_subset(dst, src, src->map);
 }
 
+/* Perform a bitwise OR of minimask 'src' mask data with the equivalent
+ * fields in 'dst', storing the result in 'dst'. */
+static inline void
+flow_wildcards_union_with_minimask(struct flow_wildcards *dst,
+   const struct minimask *src)
+{
+flow_union_with_miniflow_subset(&dst->masks, &src->masks, src->masks.map);
+}
+
 static inline bool is_ct_valid(const struct flow *flow,
const struct flow_wildcards *mask,
struct flow_wildcards *wc)
diff --git a/ofproto/ofproto-dpif-mirror.c b/ofproto/ofproto-dpif-mirror.c
index 4967ecc9a..6d89d13a5 100644
--- a/ofproto/ofproto-dpif-mirror.c
+++ b/ofproto/ofproto-dpif-mirror.c
@@ -21,6 +21,7 @@
 #include "cmap.h"
 #include "hmapx.h"
 #include "ofproto.h"
+#include "ofproto-dpif-trace.h"
 #include "vlan-bitmap.h"
 #include "openvswitch/vlog.h"
 
@@ -48,6 +49,11 @@ struct mbundle {
 mirror_mask_t mirror_out;   /* Mirrors that output to this mbundle. */
 };
 
+struct filtermask {
+struct miniflow *flow;
+struct minimask *mask;
+};
+
 struct mirror {
 struct mbridge *mbridge;/* Owning ofproto. */
 size_t idx; /* In ofproto's "mirrors" array. */
@@ -57,6 +63,10 @@ struct mirror {
 struct hmapx srcs;  /* Contains "struct mbundle*"s. */
 struct hmapx dsts;  /* Contains "struct mbundle*"s. */
 
+/* Filter criteria. */
+OVSRCU_TYPE(struct filtermask *) filter_mask;
+char *filter_str;
+
 /* This is accessed by handler threads assuming RCU protection (see
  * mirror_get()), but can be manipulated by mirror_set() without any
  * explicit synchronization. */
@@ -83,6 +93,23 @@ static void mbundle_lookup_multiple(const struct mbri

[ovs-dev] [PATCH v8 2/2] ofproto-dpif-mirror: Add support for pre-selection filter.

2024-04-29 Thread Mike Pattrick

Currently a bridge mirror will collect all packets and tools like
ovs-tcpdump can apply additional filters after they have already been
duplicated by vswitchd. This can result in inefficient collection.

This patch adds support to apply pre-selection to bridge mirrors, which
can limit which packets are mirrored based on flow metadata. This
significantly improves overall vswitchd performance during mirroring if
only a subset of traffic is required.

Signed-off-by: Mike Pattrick 
---
v8:
 - Corrected code from v7 related to sequence and in_port. Mirrors
   reject filters with an in_port set as this could cause confusion.
 - Combined ovsrcu pointers into a new struct, minimatch wasn't used
   because the minimatch_* functions didn't fit the usage here.
 - Added a test to check for modifying filters when partially
   overlapping flows already exist.
 - Corrected documentation.
---
 Documentation/ref/ovs-tcpdump.8.rst |   8 +-
 NEWS                                |   6 +
 lib/flow.h                          |   9 ++
 ofproto/ofproto-dpif-mirror.c       | 101 -
 ofproto/ofproto-dpif-mirror.h       |   9 +-
 ofproto/ofproto-dpif-xlate.c        |  15 ++-
 ofproto/ofproto-dpif.c              |  12 +-
 ofproto/ofproto.h                   |   3 +
 tests/ofproto-dpif.at               | 165
 utilities/ovs-tcpdump.in            |  13 ++-
 vswitchd/bridge.c                   |  13 ++-
 vswitchd/vswitch.ovsschema          |   7 +-
 vswitchd/vswitch.xml                |  16 +++
 13 files changed, 362 insertions(+), 15 deletions(-)

diff --git a/Documentation/ref/ovs-tcpdump.8.rst 
b/Documentation/ref/ovs-tcpdump.8.rst
index b9f8cdf6f..e21e61211 100644
--- a/Documentation/ref/ovs-tcpdump.8.rst
+++ b/Documentation/ref/ovs-tcpdump.8.rst
@@ -61,8 +61,14 @@ Options
 
   If specified, mirror all ports (optional).
 
+* ``--filter ``
+
+  If specified, only mirror flows that match the provided OpenFlow filter.
+  The available fields are documented in ``ovs-fields(7)``.
+
 See Also
 
 
 ``ovs-appctl(8)``, ``ovs-vswitchd(8)``, ``ovs-pcap(1)``,
-``ovs-tcpundump(1)``, ``tcpdump(8)``, ``wireshark(8)``.
+``ovs-fields(7)``, ``ovs-tcpundump(1)``, ``tcpdump(8)``,
+``wireshark(8)``.
diff --git a/NEWS b/NEWS
index b92cec532..f3a4bf076 100644
--- a/NEWS
+++ b/NEWS
@@ -7,6 +7,12 @@ Post-v3.3.0
- The primary development branch has been renamed from 'master' to 'main'.
  The OVS tree remains hosted on GitHub.
  https://github.com/openvswitch/ovs.git
+   - ovs-vsctl:
+ * Added a new filter column in the Mirror table which can be used to
+   apply filters to mirror ports.
+   - ovs-tcpdump:
+ * Added command line parameter --filter to enable filtering the flows
+   that are captured by tcpdump.
 
 
 v3.3.0 - 16 Feb 2024
diff --git a/lib/flow.h b/lib/flow.h
index 75a9be3c1..60ec4b0d7 100644
--- a/lib/flow.h
+++ b/lib/flow.h
@@ -939,6 +939,15 @@ flow_union_with_miniflow(struct flow *dst, const struct 
miniflow *src)
 flow_union_with_miniflow_subset(dst, src, src->map);
 }
 
+/* Perform a bitwise OR of minimask 'src' mask data with the equivalent
+ * fields in 'dst', storing the result in 'dst'. */
+static inline void
+flow_wildcards_union_with_minimask(struct flow_wildcards *dst,
+   const struct minimask *src)
+{
+flow_union_with_miniflow_subset(&dst->masks, &src->masks, src->masks.map);
+}
+
 static inline bool is_ct_valid(const struct flow *flow,
const struct flow_wildcards *mask,
struct flow_wildcards *wc)
diff --git a/ofproto/ofproto-dpif-mirror.c b/ofproto/ofproto-dpif-mirror.c
index 4967ecc9a..7020a5a5f 100644
--- a/ofproto/ofproto-dpif-mirror.c
+++ b/ofproto/ofproto-dpif-mirror.c
@@ -21,6 +21,7 @@
 #include "cmap.h"
 #include "hmapx.h"
 #include "ofproto.h"
+#include "ofproto-dpif-trace.h"
 #include "vlan-bitmap.h"
 #include "openvswitch/vlog.h"
 
@@ -48,6 +49,11 @@ struct mbundle {
 mirror_mask_t mirror_out;   /* Mirrors that output to this mbundle. */
 };
 
+struct filtermask {
+struct miniflow *flow;
+struct minimask *mask;
+};
+
 struct mirror {
 struct mbridge *mbridge;/* Owning ofproto. */
 size_t idx; /* In ofproto's "mirrors" array. */
@@ -57,6 +63,10 @@ struct mirror {
 struct hmapx srcs;  /* Contains "struct mbundle*"s. */
 struct hmapx dsts;  /* Contains "struct mbundle*"s. */
 
+/* Filter criteria. */
+OVSRCU_TYPE(struct filtermask *) filter_mask;
+char *filter_str;
+
 /* This is accessed by handler threads assuming RCU protection (see
  * mirror_get()), but can be manipulated by mirror_set() without any
  * explicit synchronization. */
@@ -83,6 +93,23 @@ static void mbundle_lookup_multiple(const struct mbridge *, 
struct ofbundle **,
 static int mirror_scan(struct

[ovs-dev] [PATCH v8 1/2] ofproto-dpif-mirror: Reduce number of function parameters.

2024-04-29 Thread Mike Pattrick

Previously the mirror_set() and mirror_get() functions took a large
number of parameters, which was inefficient and difficult to read and
extend. This patch moves most of the parameters into a struct.

Signed-off-by: Mike Pattrick 
Acked-by: Simon Horman 
Acked-by: Eelco Chaudron 
Signed-off-by: Mike Pattrick 
---
 ofproto/ofproto-dpif-mirror.c | 60 ++-
 ofproto/ofproto-dpif-mirror.h | 40 ++-
 ofproto/ofproto-dpif-xlate.c  | 29 -
 ofproto/ofproto-dpif.c        | 23 +++---
 4 files changed, 88 insertions(+), 64 deletions(-)

diff --git a/ofproto/ofproto-dpif-mirror.c b/ofproto/ofproto-dpif-mirror.c
index 343b75f0e..4967ecc9a 100644
--- a/ofproto/ofproto-dpif-mirror.c
+++ b/ofproto/ofproto-dpif-mirror.c
@@ -207,19 +207,22 @@ mirror_bundle_dst(struct mbridge *mbridge, struct 
ofbundle *ofbundle)
 }
 
 int
-mirror_set(struct mbridge *mbridge, void *aux, const char *name,
-   struct ofbundle **srcs, size_t n_srcs,
-   struct ofbundle **dsts, size_t n_dsts,
-   unsigned long *src_vlans, struct ofbundle *out_bundle,
-   uint16_t snaplen,
-   uint16_t out_vlan)
+mirror_set(struct mbridge *mbridge, void *aux,
+   const struct ofproto_mirror_settings *ms,
+   const struct mirror_bundles *mb)
 {
 struct mbundle *mbundle, *out;
 mirror_mask_t mirror_bit;
 struct mirror *mirror;
 struct hmapx srcs_map;  /* Contains "struct ofbundle *"s. */
 struct hmapx dsts_map;  /* Contains "struct ofbundle *"s. */
+uint16_t out_vlan;
 
+if (!ms || !mbridge) {
+return EINVAL;
+}
+
+out_vlan = ms->out_vlan;
 mirror = mirror_lookup(mbridge, aux);
 if (!mirror) {
 int idx;
@@ -227,7 +230,7 @@ mirror_set(struct mbridge *mbridge, void *aux, const char 
*name,
 idx = mirror_scan(mbridge);
 if (idx < 0) {
 VLOG_WARN("maximum of %d port mirrors reached, cannot create %s",
-  MAX_MIRRORS, name);
+  MAX_MIRRORS, ms->name);
 return EFBIG;
 }
 
@@ -242,8 +245,8 @@ mirror_set(struct mbridge *mbridge, void *aux, const char *name,
 unsigned long *vlans = ovsrcu_get(unsigned long *, &mirror->vlans);
 
 /* Get the new configuration. */
-if (out_bundle) {
-out = mbundle_lookup(mbridge, out_bundle);
+if (mb->out_bundle) {
+out = mbundle_lookup(mbridge, mb->out_bundle);
 if (!out) {
 mirror_destroy(mbridge, mirror->aux);
 return EINVAL;
@@ -252,16 +255,16 @@ mirror_set(struct mbridge *mbridge, void *aux, const char 
*name,
 } else {
 out = NULL;
 }
-mbundle_lookup_multiple(mbridge, srcs, n_srcs, &srcs_map);
-mbundle_lookup_multiple(mbridge, dsts, n_dsts, &dsts_map);
+mbundle_lookup_multiple(mbridge, mb->srcs, mb->n_srcs, &srcs_map);
+mbundle_lookup_multiple(mbridge, mb->dsts, mb->n_dsts, &dsts_map);
 
 /* If the configuration has not changed, do nothing. */
 if (hmapx_equals(&srcs_map, &mirror->srcs)
 && hmapx_equals(&dsts_map, &mirror->dsts)
-&& vlan_bitmap_equal(vlans, src_vlans)
+&& vlan_bitmap_equal(vlans, ms->src_vlans)
 && mirror->out == out
 && mirror->out_vlan == out_vlan
-&& mirror->snaplen == snaplen)
+&& mirror->snaplen == ms->snaplen)
 {
 hmapx_destroy(&srcs_map);
 hmapx_destroy(&dsts_map);
@@ -275,15 +278,15 @@ mirror_set(struct mbridge *mbridge, void *aux, const char 
*name,
 hmapx_swap(&dsts_map, &mirror->dsts);
 hmapx_destroy(&dsts_map);
 
-if (vlans || src_vlans) {
+if (vlans || ms->src_vlans) {
 ovsrcu_postpone(free, vlans);
-vlans = vlan_bitmap_clone(src_vlans);
+vlans = vlan_bitmap_clone(ms->src_vlans);
 ovsrcu_set(&mirror->vlans, vlans);
 }
 
 mirror->out = out;
 mirror->out_vlan = out_vlan;
-mirror->snaplen = snaplen;
+mirror->snaplen = ms->snaplen;
 
 /* Update mbundles. */
 mirror_bit = MIRROR_MASK_C(1) << mirror->idx;
@@ -406,23 +409,22 @@ mirror_update_stats(struct mbridge *mbridge, 
mirror_mask_t mirrors,
 /* Retrieves the mirror numbered 'index' in 'mbridge'.  Returns true if such a
  * mirror exists, false otherwise.
  *
- * If successful, '*vlans' receives the mirror's VLAN membership information,
+ * If successful 'mc->vlans' receives the mirror's VLAN membership information,
  * either a null pointer if the mirror includes all VLANs or a 4096-bit bitmap
  * in which a 1-bit indicates that the mirror includes a particular VLAN,
- * '*dup_mirrors' receives a bitmap of mirrors whose output duplicates mirror
- * 'index', '*out' receives the output ofbundle (if any), and '*out_vlan'
- * receives the output VLAN (if any).
+ * 'mc->dup_mirrors' receives a bitmap of mirrors whose output duplic

Re: [ovs-dev] [PATCH 2/2] ovsdb: raft: Fix probe intervals after install snapshot request.

2024-04-17 Thread Mike Pattrick

On Thu, Apr 11, 2024 at 7:45 PM Ilya Maximets  wrote:
>
> If the new snapshot received with INSTALL_SNAPSHOT request contains
> a different election timer value, the timer is updated, but the
> probe intervals for RAFT connections are not.
>
> Fix that by updating probe intervals whenever we get election timer
> from the log.
>
> Fixes: 14b2b0aad7ae ("raft: Reintroduce jsonrpc inactivity probes.")
> Signed-off-by: Ilya Maximets 

Acked-by: Mike Pattrick 


Re: [ovs-dev] [PATCH 1/2] ovsdb: raft: Fix inability to join a cluster with a large database.

2024-04-17 Thread Mike Pattrick

On Thu, Apr 11, 2024 at 7:44 PM Ilya Maximets  wrote:
>
> The inactivity probe interval on RAFT connections depends on the value of
> the election timer.  However, the actual value is not known until the
> database snapshot with the RAFT information is received by a joining
> server.  A newly joining server uses a default of 1 second until then.
>
> In case a new joining server is trying to join an existing cluster
> with a large database, it may take more than a second to generate and
> send an initial database snapshot.  This is causing an inability to
> actually join this cluster.  Joining server sends ADD_SERVER request,
> waits 1 second, sends a probe, doesn't get a reply within another
> second, because the leader is busy preparing and sending an initial
> snapshot to it, disconnects, repeat.
>
> This is not an issue for the servers that did already join, since
> their probe intervals are larger than election timeout.
> Cooperative multitasking also doesn't fully solve this issue, since
> it depends on election timer, which is likely higher in the existing
> cluster with a very big database.
>
> Fix that by using the maximum election timer value for inactivity
> probes until the actual value is known.  We still shouldn't completely
> disable the probes, because in the rare event the connection is
> established but the other side silently goes away, we still want to
> disconnect and try to re-establish the connection eventually.
>
> Since probe intervals also depend on the joining state now, update
> them when the server joins the cluster.
>
> Fixes: 14b2b0aad7ae ("raft: Reintroduce jsonrpc inactivity probes.")
> Reported-by: Terry Wilson 
> Reported-at: https://issues.redhat.com/browse/FDP-144
> Signed-off-by: Ilya Maximets 
> ---

Acked-by: Mike Pattrick 
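
For readers following the thread, the fix described in the quoted commit message can be sketched roughly as below: while the server is still joining and has not yet learned the cluster's election timer, the probe interval falls back to the largest permitted timer value instead of the 1 second default. The constant EXAMPLE_ELECTION_MAX_MSEC and the function example_probe_interval_msec are hypothetical, and the actual computation in the RAFT code may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical upper bound on the election timer, in milliseconds. */
#define EXAMPLE_ELECTION_MAX_MSEC 600000

/* Returns the inactivity probe interval to use for a connection.
 *
 * While 'joining' is true the real election timer is unknown, so the
 * maximum is used to avoid disconnecting during a slow initial snapshot.
 * Probes are never disabled outright, so a silently dead peer is still
 * detected eventually. */
uint64_t
example_probe_interval_msec(bool joining, uint64_t election_timer_msec)
{
    return joining ? EXAMPLE_ELECTION_MAX_MSEC : election_timer_msec;
}
```

Once the server has joined, the interval would be recomputed from the timer value carried in the snapshot.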


Re: [ovs-dev] [PATCH] Userspace: Software fallback for UDP encapsulated TCP segmentation.

2024-03-27 Thread Mike Pattrick

On Wed, Mar 27, 2024 at 1:39 PM Simon Horman  wrote:
>
> On Tue, Feb 20, 2024 at 11:08:55PM -0500, Mike Pattrick wrote:
> > When sending packets that are flagged as requiring segmentation to an
> > interface that doesn't support this feature, send the packet to the TSO
> > software fallback instead of dropping it.
> >
> > Signed-off-by: Mike Pattrick 
>
> Hi Mike,
>
> Can I confirm that from your PoV this patch is still awaiting review?
> I ask because it's been sitting around for a while now.

I believe this patch now needs to be modified for the recent
recirculation change. I'll update it for that, give it another once
over, and resubmit.

-M


>


[ovs-dev] [PATCH v5] tunnel: Allow UDP zero checksum with IPv6 tunnels.

2024-03-27 Thread Mike Pattrick

This patch adopts the proposed RFC 6935 by allowing null UDP checksums
even if the tunnel protocol is IPv6. This is already supported by Linux
through the udp6zerocsumtx tunnel option. It is disabled by default and
IPv6 tunnels are flagged as requiring a checksum, but this patch enables
the user to set csum=false on IPv6 tunnels.
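
The resulting behaviour can be summarised with a small sketch, assuming a four-state setting like the netdev_tnl_csum enum added further down in this patch. The enum example_tnl_csum and the function example_udp_csum_wanted are hypothetical names for illustration only and are not the code from the patch.

```c
#include <stdbool.h>

/* Hypothetical mirror of the checksum states described in this patch. */
enum example_tnl_csum {
    EXAMPLE_CSUM_DEFAULT = 0,      /* Nothing configured. */
    EXAMPLE_CSUM_ENABLED,          /* csum=true was set explicitly. */
    EXAMPLE_CSUM_DISABLED,         /* csum=false was set explicitly. */
    EXAMPLE_CSUM_DEFAULT_NO_CSUM,  /* Default is "off" for any IP version. */
};

/* Returns true if a UDP checksum should be computed for the outer header.
 * An explicit setting always wins; otherwise IPv6 defaults to a checksum
 * and IPv4 defaults to none. */
bool
example_udp_csum_wanted(enum example_tnl_csum cfg, bool is_ipv6)
{
    switch (cfg) {
    case EXAMPLE_CSUM_ENABLED:
        return true;
    case EXAMPLE_CSUM_DISABLED:
    case EXAMPLE_CSUM_DEFAULT_NO_CSUM:
        return false;
    case EXAMPLE_CSUM_DEFAULT:
    default:
        return is_ipv6;
    }
}
```

The point of the extra states is that "not configured" and "explicitly false" can no longer be confused, which a plain boolean could not express.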

Signed-off-by: Mike Pattrick 
---
v2: Changed documentation, and added a NEWS item
v3: NEWS file merge conflict
v4: Better comments, new test
v5: Addressed identified nits
---
 NEWS                          |  4
 lib/netdev-native-tnl.c       |  2 +-
 lib/netdev-vport.c            | 17 +++--
 lib/netdev.h                  | 18 +-
 ofproto/tunnel.c              | 10 --
 tests/tunnel-push-pop-ipv6.at |  9 +
 tests/tunnel-push-pop.at      |  7 +++
 tests/tunnel.at               |  2 +-
 vswitchd/vswitch.xml          | 12 +---
 9 files changed, 71 insertions(+), 10 deletions(-)

diff --git a/NEWS b/NEWS
index c9e4064e6..6c8c4a2dc 100644
--- a/NEWS
+++ b/NEWS
@@ -4,6 +4,10 @@ Post-v3.3.0
  * Conntrack now supports 'random' flag for selecting ports in a range
while natting and 'persistent' flag for selection of the IP address
from a range.
+ * IPv6 UDP tunnel encapsulation including Geneve and VXLAN will now
+   honour the csum option.  Configuring the interface with
+   "options:csum=false" now has the same effect as the udp6zerocsumtx
+   option has with Linux kernel UDP tunnels.
 
 
 v3.3.0 - 16 Feb 2024
diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
index dee9ab344..e8258bc4e 100644
--- a/lib/netdev-native-tnl.c
+++ b/lib/netdev-native-tnl.c
@@ -424,7 +424,7 @@ udp_build_header(const struct netdev_tunnel_config *tnl_cfg,
 udp = netdev_tnl_ip_build_header(data, params, IPPROTO_UDP, 0);
 udp->udp_dst = tnl_cfg->dst_port;
 
-if (params->is_ipv6 || params->flow->tunnel.flags & FLOW_TNL_F_CSUM) {
+if (params->flow->tunnel.flags & FLOW_TNL_F_CSUM) {
 /* Write a value in now to mark that we should compute the checksum
  * later. 0xffff is handy because it is transparent to the
  * calculation. */
diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
index 60caa02fb..234a4ebe1 100644
--- a/lib/netdev-vport.c
+++ b/lib/netdev-vport.c
@@ -702,7 +702,9 @@ set_tunnel_config(struct netdev *dev_, const struct smap 
*args, char **errp)
 tnl_cfg.dst_port = htons(atoi(node->value));
 } else if (!strcmp(node->key, "csum") && has_csum) {
 if (!strcmp(node->value, "true")) {
-tnl_cfg.csum = true;
+tnl_cfg.csum = NETDEV_TNL_CSUM_ENABLED;
+} else if (!strcmp(node->value, "false")) {
+tnl_cfg.csum = NETDEV_TNL_CSUM_DISABLED;
 }
 } else if (!strcmp(node->key, "seq") && has_seq) {
 if (!strcmp(node->value, "true")) {
@@ -850,6 +852,15 @@ set_tunnel_config(struct netdev *dev_, const struct smap 
*args, char **errp)
 }
 }
 
+/* The default csum state for GRE is special as it does have an optional
+ * checksum but the default configuration isn't correlated with IP version
+ * like UDP tunnels are.  Likewise, tunnels with no checksum at all must be
+ * in this state. */
+if (tnl_cfg.csum == NETDEV_TNL_CSUM_DEFAULT &&
+(!has_csum || strstr(type, "gre"))) {
+tnl_cfg.csum = NETDEV_TNL_DEFAULT_NO_CSUM;
+}
+
 enum tunnel_layers layers = tunnel_supported_layers(type, &tnl_cfg);
 const char *full_type = (strcmp(type, "vxlan") ? type
  : (tnl_cfg.exts & (1 << OVS_VXLAN_EXT_GPE)
@@ -1026,8 +1037,10 @@ get_tunnel_config(const struct netdev *dev, struct smap 
*args)
 }
 }
 
-if (tnl_cfg->csum) {
+if (tnl_cfg->csum == NETDEV_TNL_CSUM_ENABLED) {
 smap_add(args, "csum", "true");
+} else if (tnl_cfg->csum == NETDEV_TNL_CSUM_DISABLED) {
+smap_add(args, "csum", "false");
 }
 
 if (tnl_cfg->set_seq) {
diff --git a/lib/netdev.h b/lib/netdev.h
index 67a8486bd..5d253157c 100644
--- a/lib/netdev.h
+++ b/lib/netdev.h
@@ -111,6 +111,22 @@ enum netdev_srv6_flowlabel {
 SRV6_FLOWLABEL_COMPUTE,
 };
 
+enum netdev_tnl_csum {
+/* Default value for UDP tunnels if no configuration is present.  Enforce
+ * checksum calculation in IPv6 tunnels, disable in IPv4 tunnels. */
+NETDEV_TNL_CSUM_DEFAULT = 0,
+
+/* Checksum explicitly to be calculated. */
+NETDEV_TNL_CSUM_ENABLED,
+
+/* Checksum calculation explicitly disabled. */
+NETDEV_TNL_CSUM_DISABLED,
+
+/* A value for when there is no checksum or the default value is no
+ * checksum regardless of IP version. */
+NETDEV_TNL_DEFAULT_NO_C

Re: [ovs-dev] [PATCH] ofproto-dpif-upcall: Don't mirror packets that aren't modified.

2024-03-26 Thread Mike Pattrick

On Mon, Mar 25, 2024 at 10:04 PM Zhangweiwei  wrote:
>
> The ethernet addresses of two ICMP request packets are indeed different. One
> is original packet and the other is modified. It is an expected behavior 
> according to the code.
> Actually, when a packet sent by port A is changed by flow table and then is 
> sent to itself, we expect to capture this packet. However, when this packet 
> is changed and is sent to another port, should we still capture the packet on 
> port A?

Currently in ovs-tcpdump we capture the packet on ingress and then on
egress if it is modified. You could make the mirror without
ovs-tcpdump, and only set the select_dst_port option but not the
select_src_port one. select_src_port is checked during ingress and
select_dst_port is set during egress.

Hope this helps,
M

>
> [root@localhost infiniband]# ovs-tcpdump -i tapVm71 -nnvve
> 11.11.70.1 > 1.1.70.2: ICMP echo request, id 15498, seq 17, length 64
> 09:36:52.822232 68:05:ca:21:d6:e5 > 52:54:00:67:d5:61, ethertype IPv4 
> (0x0800), length 98: (tos 0x0, ttl 63, id 22101, offset 0, flags [none], 
> proto ICMP (1), length 84)
> 1.1.70.2 > 11.11.70.1: ICMP echo reply, id 15498, seq 17, length 64
> 09:36:53.862137 52:54:00:67:d5:61 > 68:05:ca:21:d6:e5, ethertype IPv4 
> (0x0800), length 98: (tos 0x0, ttl 64, id 26518, offset 0, flags [DF], proto 
> ICMP (1), length 84)
> 11.11.70.1 > 1.1.70.2: ICMP echo request, id 15498, seq 18, length 64
> 09:36:53.862139 68:05:ca:21:d6:e5 > 52:54:00:9a:bf:ed, ethertype IPv4 
> (0x0800), length 98: (tos 0x0, ttl 63, id 26518, offset 0, flags [DF], proto 
> ICMP (1), length 84)
> 11.11.70.1 > 1.1.70.2: ICMP echo request, id 15498, seq 18, length 64
> 09:36:53.862230 68:05:ca:21:d6:e5 > 52:54:00:67:d5:61, ethertype IPv4 
> (0x0800), length 98: (tos 0x0, ttl 63, id 22176, offset 0, flags [none], 
> proto ICMP (1), length 84)
> 1.1.70.2 > 11.11.70.1: ICMP echo reply, id 15498, seq 18, length 64
>
> -----Original Message-----
> From: Mike Pattrick [mailto:m...@redhat.com]
> Sent: March 25, 2024 22:26
> To: zhangweiwei (RD)
> Cc: d...@openvswitch.org
> Subject: Re: [PATCH] ofproto-dpif-upcall: Don't mirror packets that aren't
> modified.
>
> On Mon, Mar 25, 2024 at 3:48 AM Zhangweiwei  wrote:
> >
> > Hi,
> > I have tried this patch, however, there are still some issues when the
> > packet contents are changed across recirculation. In the following example,
> > packets are modified in recirc_id(0) after being mirrored, so the mirror
> > context is reset. Therefore, two ICMP request packets are mirrored on port
> > mitapVm71.
> >
> > In the following example, ICMP packets are sent from port(11) to port(14):
> > [root@localhost ~]# ovs-appctl dpif/dump-flows vds1-br
> > ct_state(-new-est-rel-rpl-inv-trk),recirc_id(0),in_port(11),packet_typ
> > e(ns=0,id=0),eth(src=52:54:00:67:d5:61,dst=68:05:ca:21:d6:e5),eth_type
> > (0x0800),ipv4(src=11.11.70.1,dst=1.1.70.2,proto=1,ttl=64,frag=no),
> > packets:431, bytes:42238, used:0.574s,
> > actions:10,set(eth(src=68:05:ca:21:d6:e5,dst=52:54:00:9a:bf:ed)),set(i
> > pv4(ttl=63)),ct(zone=6),recirc(0x3e8)
> > ct_state(+est-rel-rpl),recirc_id(0x3e8),in_port(11),packet_type(ns=0,i
> > d=0),eth_type(0x0800),ipv4(frag=no), packets:430, bytes:42140,
> > used:0.574s, actions:10,14
> >
> > ct_state(-new+est-rel+rpl-inv+trk),recirc_id(0x3e9),in_port(14),packet
> > _type(ns=0,id=0),eth(src=52:54:00:9a:bf:ed,dst=68:05:ca:21:d6:e5),eth_
> > type(0x0800),ipv4(dst=11.11.70.1,proto=1,ttl=64,frag=no), packets:431,
> > bytes:42238, used:0.574s,
> > actions:set(eth(src=68:05:ca:21:d6:e5,dst=52:54:00:67:d5:61)),set(ipv4
> > (ttl=63)),11,10
> > ct_state(-trk),recirc_id(0),in_port(14),packet_type(ns=0,id=0),eth(src
> > =52:54:00:9a:bf:ed),eth_type(0x0800),ipv4(src=1.1.70.2,proto=1,frag=no
> > ), packets:431, bytes:42238, used:0.574s,
> > actions:ct(zone=6),recirc(0x3e9)
> >
> > [root@localhost ~]# ovs-appctl dpif/show
> > netdev@ovs-netdev: hit:2552 missed:3019
> >   vds1-br:
> > mitapVm71 14/10: (system)
> > tapVm71 5/11: (dpdkvhostuserclient: configured_rx_queues=1, 
> > configured_tx_queues=1, mtu=1500, requested_rx_queues=1, 
> > requested_tx_queues=1)
> > tapVm72 6/14: (dpdkvhostuserclient: configured_rx_queues=1,
> > configured_tx_queues=1, mtu=1500, requested_rx_queues=1,
> > requested_tx_queues=1)
> >
> > [root@localhost ~]# ovs-tcpdump -i tapVm71
> > 14:38:53.702142 IP 11.11.70.1 > 1.1.70.2: ICMP echo request, id 13483,
> > seq 2014, length 64
> > 14:38:53.702143 IP 11.11.70.1 > 1.1.70.2: ICMP echo request, id 13483,
> > seq 2014, length 64

Re: [ovs-dev] [PATCH] ofproto-dpif-upcall: Don't mirror packets that aren't modified.

2024-03-25 Thread Mike Pattrick

On Mon, Mar 25, 2024 at 3:48 AM Zhangweiwei  wrote:
>
> Hi,
> I have tried this patch, however, there are still some issues when the
> packet contents are changed across recirculation. In the following example,
> packets are modified in recirc_id(0) after being mirrored, so the mirror
> context is reset. Therefore, two ICMP request packets are mirrored on port
> mitapVm71.
>
> In the following example, ICMP packets are sent from port(11) to port(14):
> [root@localhost ~]# ovs-appctl dpif/dump-flows vds1-br
> ct_state(-new-est-rel-rpl-inv-trk),recirc_id(0),in_port(11),packet_type(ns=0,id=0),eth(src=52:54:00:67:d5:61,dst=68:05:ca:21:d6:e5),eth_type(0x0800),ipv4(src=11.11.70.1,dst=1.1.70.2,proto=1,ttl=64,frag=no),
>  packets:431, bytes:42238, used:0.574s, 
> actions:10,set(eth(src=68:05:ca:21:d6:e5,dst=52:54:00:9a:bf:ed)),set(ipv4(ttl=63)),ct(zone=6),recirc(0x3e8)
> ct_state(+est-rel-rpl),recirc_id(0x3e8),in_port(11),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
>  packets:430, bytes:42140, used:0.574s, actions:10,14
>
> ct_state(-new+est-rel+rpl-inv+trk),recirc_id(0x3e9),in_port(14),packet_type(ns=0,id=0),eth(src=52:54:00:9a:bf:ed,dst=68:05:ca:21:d6:e5),eth_type(0x0800),ipv4(dst=11.11.70.1,proto=1,ttl=64,frag=no),
>  packets:431, bytes:42238, used:0.574s, 
> actions:set(eth(src=68:05:ca:21:d6:e5,dst=52:54:00:67:d5:61)),set(ipv4(ttl=63)),11,10
> ct_state(-trk),recirc_id(0),in_port(14),packet_type(ns=0,id=0),eth(src=52:54:00:9a:bf:ed),eth_type(0x0800),ipv4(src=1.1.70.2,proto=1,frag=no),
>  packets:431, bytes:42238, used:0.574s, actions:ct(zone=6),recirc(0x3e9)
>
> [root@localhost ~]# ovs-appctl dpif/show
> netdev@ovs-netdev: hit:2552 missed:3019
>   vds1-br:
> mitapVm71 14/10: (system)
> tapVm71 5/11: (dpdkvhostuserclient: configured_rx_queues=1, 
> configured_tx_queues=1, mtu=1500, requested_rx_queues=1, 
> requested_tx_queues=1)
> tapVm72 6/14: (dpdkvhostuserclient: configured_rx_queues=1, 
> configured_tx_queues=1, mtu=1500, requested_rx_queues=1, 
> requested_tx_queues=1)
>
> [root@localhost ~]# ovs-tcpdump -i tapVm71
> 14:38:53.702142 IP 11.11.70.1 > 1.1.70.2: ICMP echo request, id 13483, seq 
> 2014, length 64
> 14:38:53.702143 IP 11.11.70.1 > 1.1.70.2: ICMP echo request, id 13483, seq 
> 2014, length 64
> 14:38:53.702185 IP 1.1.70.2 > 11.11.70.1: ICMP echo reply, id 13483, seq 
> 2014, length 64
> 14:38:54.742141 IP 11.11.70.1 > 1.1.70.2: ICMP echo request, id 13483, seq 
> 2015, length 64
> 14:38:54.742143 IP 11.11.70.1 > 1.1.70.2: ICMP echo request, id 13483, seq 
> 2015, length 64
> 14:38:54.742183 IP 1.1.70.2 > 11.11.70.1: ICMP echo reply, id 13483, seq 
> 2015, length 64
> 14:38:55.782142 IP 11.11.70.1 > 1.1.70.2: ICMP echo request, id 13483, seq 
> 2016, length 64
> 14:38:55.782144 IP 11.11.70.1 > 1.1.70.2: ICMP echo request, id 13483, seq 
> 2016, length 64
> 14:38:55.782186 IP 1.1.70.2 > 11.11.70.1: ICMP echo reply, id 13483, seq 
> 2016, length 64


Hello, thanks for the report. Is it possible to run the command
"ovs-tcpdump -i tapVm71 -ennvv" ? I ask because I see your actions
reset the ethernet address. If the ethernet address is different then
this would be the expected behavior: the packet is collected as it
enters, and then again as it exits modified.


Thank you,

Mike


>
> -----Original Message-----
> From: Mike Pattrick [mailto:m...@redhat.com]
> Sent: March 13, 2024 1:37
> To: d...@openvswitch.org
> Cc: Mike Pattrick; zhangweiwei (RD)
> Subject: [PATCH] ofproto-dpif-upcall: Don't mirror packets that aren't modified.
>
> Previously OVS reset the mirror context when a packet was modified in such a
> way that the packet's contents changed. However, that change incorrectly reset
> the mirror context when only metadata changed as well.
>
> Now we check for all metadata fields, instead of just tunnel metadata, before
> resetting the mirror context.
>
> Fixes: feed7f677505 ("ofproto-dpif-upcall: Mirror packets that are modified.")
> Reported-by: Zhangweiwei 
> Signed-off-by: Mike Pattrick 
> ---
>  include/openvswitch/meta-flow.h |   1 +
>  lib/meta-flow.c | 109 
>  ofproto/ofproto-dpif-xlate.c|   2 +-
>  tests/ofproto-dpif.at   |   5 +-
>  4 files changed, 114 insertions(+), 3 deletions(-)
>
> diff --git a/include/openvswitch/meta-flow.h b/include/openvswitch/meta-flow.h
> index 3b0220aaa..96aad3933 100644
> --- a/include/openvswitch/meta-flow.h
> +++ b/include/openvswitch/meta-flow.h
> @@ -2305,6 +2305,7 @@ void mf_set_flow_value_masked(const struct mf_field *,
>                                const union mf_value *mask,
>                                struct flow *);
>  bool mf_is_tun_metadata(const struct mf_field *);
> +bool

Re: [ovs-dev] [PATCH] dpif-netdev: Fix crash due to tunnel offloading on recirculation.

2024-03-22 Thread Mike Pattrick

On Fri, Mar 22, 2024 at 10:41 AM Ilya Maximets  wrote:
>
> Recirculation involves re-parsing the packet from scratch and that
> process is not aware of multiple header levels nor the inner/outer
> offsets.  So, it overwrites offsets with new ones from the outermost
> headers and sets offloading flags that change their meaning when
> the packet is marked for tunnel offloading.
>
> For example:
>
>  1. TCP packet enters OVS.
>  2. TCP packet gets encapsulated into UDP tunnel.
>  3. Recirculation happens.
>  4. Packet is re-parsed after recirculation with miniflow_extract()
> or similar function.
>  5. Packet is marked for UDP checksumming because we parse the
> outermost set of headers.  But since it is tunneled, it means
> inner UDP checksumming.  And that makes no sense, because the
> inner packet is TCP.
>
> This is causing packet drops due to malformed packets or even
> assertions and crashes in the code that is trying to fixup checksums
> for packets using incorrect metadata:
>
>  SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
>
>  lib/packets.c:2061:15: runtime error:
> member access within null pointer of type 'struct udp_header'
>
>   0 0xbe5221 in packet_udp_complete_csum lib/packets.c:2061:15
>   1 0x7e5662 in dp_packet_ol_send_prepare lib/dp-packet.c:638:9
>   2 0x96ef89 in netdev_send lib/netdev.c:940:9
>   3 0x818e94 in dp_netdev_pmd_flush_output_on_port lib/dpif-netdev.c:5577:9
>   4 0x817606 in dp_netdev_pmd_flush_output_packets lib/dpif-netdev.c:5618:27
>   5 0x81cfa5 in dp_netdev_process_rxq_port lib/dpif-netdev.c:5677:9
>   6 0x7eefe4 in dpif_netdev_run lib/dpif-netdev.c:7001:25
>   7 0x610e87 in type_run ofproto/ofproto-dpif.c:367:9
>   8 0x5b9e80 in ofproto_type_run ofproto/ofproto.c:1879:31
>   9 0x55bbb4 in bridge_run__ vswitchd/bridge.c:3281:9
>  10 0x558b6b in bridge_run vswitchd/bridge.c:3346:5
>  11 0x591dc5 in main vswitchd/ovs-vswitchd.c:130:9
>  12 0x172b89 in __libc_start_call_main (/lib64/libc.so.6+0x27b89)
>  13 0x172c4a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x27c4a)
>  14 0x47eff4 in _start (vswitchd/ovs-vswitchd+0x47eff4)
>
> Tests added for both IPv4 and IPv6 cases.  Though IPv6 test doesn't
> trigger the issue it's better to have a symmetric test.
>
> Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.")
> Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2024-March/053014.html
> Signed-off-by: Ilya Maximets 
> ---

I have tested this, and it does fix the segfault here.

Acked-by: Mike Pattrick 


[ovs-dev] [PATCH v4] tunnel: Allow UDP zero checksum with IPv6 tunnels.

2024-03-21 Thread Mike Pattrick

This patch adopts the proposed RFC 6935 by allowing null UDP checksums
even if the tunnel protocol is IPv6. This is already supported by Linux
through the udp6zerocsumtx tunnel option. It is disabled by default and
IPv6 tunnels are flagged as requiring a checksum, but this patch enables
the user to set csum=false on IPv6 tunnels.

Signed-off-by: Mike Pattrick 
---
v2: Changed documentation, and added a NEWS item
v3: NEWS file merge conflict
v4: Better comments, new test
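
For reviewers, the tri-state behaviour this patch aims for can be sketched
roughly as below. This is an illustration only, not code from the patch: the
Python names TnlCsum and csum_enabled are hypothetical stand-ins for the
netdev_tnl_csum values, and the sketch assumes the unconfigured default keeps
checksums on for IPv6 tunnels and off for IPv4 tunnels.

```python
from enum import Enum, auto


class TnlCsum(Enum):
    """Hypothetical mirror of the netdev_tnl_csum states."""
    DEFAULT = auto()          # No csum option configured.
    ENABLED = auto()          # options:csum=true
    DISABLED = auto()         # options:csum=false
    DEFAULT_NO_CSUM = auto()  # No checksum, or default off for any IP version.


def csum_enabled(csum, is_ipv6):
    """Return True if the outer checksum should be computed."""
    if csum is TnlCsum.ENABLED:
        return True
    if csum in (TnlCsum.DISABLED, TnlCsum.DEFAULT_NO_CSUM):
        return False
    # DEFAULT: checksum for IPv6 tunnels, none for IPv4 tunnels.
    return is_ipv6
```

The point of the extra states is that "not configured" and "explicitly
false" are no longer the same thing, which a plain bool could not express.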
---
 NEWS  |  4 
 lib/netdev-native-tnl.c   |  2 +-
 lib/netdev-vport.c| 17 +++--
 lib/netdev.h  | 18 +-
 ofproto/tunnel.c  | 11 +--
 tests/tunnel-push-pop-ipv6.at |  9 +
 tests/tunnel-push-pop.at  |  7 +++
 tests/tunnel.at   |  2 +-
 vswitchd/vswitch.xml  | 12 +---
 9 files changed, 72 insertions(+), 10 deletions(-)

diff --git a/NEWS b/NEWS
index c9e4064e6..6c8c4a2dc 100644
--- a/NEWS
+++ b/NEWS
@@ -4,6 +4,10 @@ Post-v3.3.0
  * Conntrack now supports 'random' flag for selecting ports in a range
while natting and 'persistent' flag for selection of the IP address
from a range.
+ * IPv6 UDP tunnel encapsulation including Geneve and VXLAN will now
+   honour the csum option.  Configuring the interface with
+   "options:csum=false" now has the same effect as the udp6zerocsumtx
+   option has with Linux kernel UDP tunnels.
 
 
 v3.3.0 - 16 Feb 2024
diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
index dee9ab344..e8258bc4e 100644
--- a/lib/netdev-native-tnl.c
+++ b/lib/netdev-native-tnl.c
@@ -424,7 +424,7 @@ udp_build_header(const struct netdev_tunnel_config *tnl_cfg,
 udp = netdev_tnl_ip_build_header(data, params, IPPROTO_UDP, 0);
 udp->udp_dst = tnl_cfg->dst_port;
 
-if (params->is_ipv6 || params->flow->tunnel.flags & FLOW_TNL_F_CSUM) {
+if (params->flow->tunnel.flags & FLOW_TNL_F_CSUM) {
 /* Write a value in now to mark that we should compute the checksum
  * later. 0x is handy because it is transparent to the
  * calculation. */
diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
index 60caa02fb..e51542e32 100644
--- a/lib/netdev-vport.c
+++ b/lib/netdev-vport.c
@@ -702,7 +702,9 @@ set_tunnel_config(struct netdev *dev_, const struct smap *args, char **errp)
 tnl_cfg.dst_port = htons(atoi(node->value));
 } else if (!strcmp(node->key, "csum") && has_csum) {
 if (!strcmp(node->value, "true")) {
-tnl_cfg.csum = true;
+tnl_cfg.csum = NETDEV_TNL_CSUM_ENABLED;
+} else if (!strcmp(node->value, "false")) {
+tnl_cfg.csum = NETDEV_TNL_CSUM_DISABLED;
 }
 } else if (!strcmp(node->key, "seq") && has_seq) {
 if (!strcmp(node->value, "true")) {
@@ -850,6 +852,15 @@ set_tunnel_config(struct netdev *dev_, const struct smap *args, char **errp)
 }
 }
 
+/* The default csum state for GRE is special as it does have an optional
+ * checksum but the default configuration isn't correlated with IP version
+ * like UDP tunnels are.  Likewise, tunnels with no checksum at all must be in
+ * this state. */
+if (tnl_cfg.csum == NETDEV_TNL_CSUM_DEFAULT &&
+(!has_csum || strstr(type, "gre"))) {
+tnl_cfg.csum = NETDEV_TNL_DEFAULT_NO_CSUM;
+}
+
 enum tunnel_layers layers = tunnel_supported_layers(type, &tnl_cfg);
 const char *full_type = (strcmp(type, "vxlan") ? type
  : (tnl_cfg.exts & (1 << OVS_VXLAN_EXT_GPE)
@@ -1026,8 +1037,10 @@ get_tunnel_config(const struct netdev *dev, struct smap *args)
 }
 }
 
-if (tnl_cfg->csum) {
+if (tnl_cfg->csum == NETDEV_TNL_CSUM_ENABLED) {
 smap_add(args, "csum", "true");
+} else if (tnl_cfg->csum == NETDEV_TNL_CSUM_DISABLED) {
+smap_add(args, "csum", "false");
 }
 
 if (tnl_cfg->set_seq) {
diff --git a/lib/netdev.h b/lib/netdev.h
index 67a8486bd..5d253157c 100644
--- a/lib/netdev.h
+++ b/lib/netdev.h
@@ -111,6 +111,22 @@ enum netdev_srv6_flowlabel {
 SRV6_FLOWLABEL_COMPUTE,
 };
 
+enum netdev_tnl_csum {
+/* Default value for UDP tunnels if no configuration is present.  Enforce
+ * checksum calculation in IPv6 tunnels, disable in IPv4 tunnels. */
+NETDEV_TNL_CSUM_DEFAULT = 0,
+
+/* Checksum explicitly to be calculated. */
+NETDEV_TNL_CSUM_ENABLED,
+
+/* Checksum calculation explicitly disabled. */
+NETDEV_TNL_CSUM_DISABLED,
+
+/* A value for when there is no checksum or the default value is no
+ * checksum regardless of IP version. */
+NETDEV_TNL_DEFAULT_NO_CSUM,
+};
+
 /* Configuration spe

[ovs-dev] [PATCH v3] ovs-monitor-ipsec: LibreSwan autodetect paths.

2024-03-21 Thread Mike Pattrick

In v4.0, LibreSwan changed a default path that had been hardcoded in
ovs-monitor-ipsec, breaking some uses of this script. This patch adds
support for both old and newer versions by auto-detecting the version
of LibreSwan and then choosing the correct path.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1975039
Reported-by: Qijun Ding 
Fixes: d6afbc00d5b3 ("ipsec: Allow custom file locations.")
Signed-off-by: Mike Pattrick 
---
v2: Don't extract variables from ipsec script
v3: Removed use of packaging
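
The selection logic, pulled out of __init__ so it can be tried on its own,
looks roughly like the sketch below. The function name default_nss_dir is
hypothetical, and the sketch uses re.search with a digits-only group rather
than the re.match in the diff, so treat it as an approximation of the idea
and not as the patched code.

```python
import re


def default_nss_dir(version_output):
    """Pick a default NSS directory from `ipsec --version` style output."""
    match = re.search(r"Libreswan (\d+)", version_output or "")
    # Unknown or unparsable versions are treated as pre-4.0.
    major = int(match.group(1)) if match else 0
    return "/var/lib/ipsec/nss" if major >= 4 else "/etc/ipsec.d"
```

An explicit ipsec_d argument would still take priority over whatever this
returns, as it does in the diff.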
---
 ipsec/ovs-monitor-ipsec.in | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/ipsec/ovs-monitor-ipsec.in b/ipsec/ovs-monitor-ipsec.in
index 7945162f9..bc7ac5523 100755
--- a/ipsec/ovs-monitor-ipsec.in
+++ b/ipsec/ovs-monitor-ipsec.in
@@ -457,14 +457,30 @@ conn prevent_unencrypted_vxlan
 CERTKEY_PREFIX = "ovs_certkey_"
 
 def __init__(self, libreswan_root_prefix, args):
+# Collect version information
+self.IPSEC = libreswan_root_prefix + "/usr/sbin/ipsec"
+proc = subprocess.Popen([self.IPSEC, "--version"],
+stdout=subprocess.PIPE,
+encoding="latin1")
+pout, perr = proc.communicate()
+
+v = re.match("^Libreswan (.*)$", pout)
+try:
+version = int(v.group(1).split(".")[0])
+except:
+version = 0
+
+if version >= 4:
+ipsec_d = args.ipsec_d if args.ipsec_d else "/var/lib/ipsec/nss"
+else:
+ipsec_d = args.ipsec_d if args.ipsec_d else "/etc/ipsec.d"
+
 ipsec_conf = args.ipsec_conf if args.ipsec_conf else "/etc/ipsec.conf"
-ipsec_d = args.ipsec_d if args.ipsec_d else "/etc/ipsec.d"
 ipsec_secrets = (args.ipsec_secrets if args.ipsec_secrets
 else "/etc/ipsec.secrets")
 ipsec_ctl = (args.ipsec_ctl if args.ipsec_ctl
 else "/run/pluto/pluto.ctl")
 
-self.IPSEC = libreswan_root_prefix + "/usr/sbin/ipsec"
 self.IPSEC_CONF = libreswan_root_prefix + ipsec_conf
 self.IPSEC_SECRETS = libreswan_root_prefix + ipsec_secrets
 self.IPSEC_D = "sql:" + libreswan_root_prefix + ipsec_d
-- 
2.39.3


Re: [ovs-dev] [PATCH v2] ovs-monitor-ipsec: LibreSwan autodetect paths.

2024-03-20 Thread Mike Pattrick

On Wed, Mar 20, 2024 at 2:05 PM Mike Pattrick  wrote:
>
> In v4.0, LibreSwan changed a default path that had been hardcoded in
> ovs-monitor-ipsec, breaking some uses of this script. This patch adds
> support for both old and newer versions by auto detecting the version
> of LibreSwan and then choosing the correct path.
>
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1975039
> Reported-by: Qijun Ding 
> Fixes: d6afbc00d5b3 ("ipsec: Allow custom file locations.")
> Signed-off-by: Mike Pattrick 
> ---
> v2: Don't extract variables from ipsec script
> ---

Failed with 503 Service Unavailable

Recheck-request: github-robot


[ovs-dev] [PATCH v2] ovs-monitor-ipsec: LibreSwan autodetect paths.

2024-03-20 Thread Mike Pattrick

In v4.0, LibreSwan changed a default path that had been hardcoded in
ovs-monitor-ipsec, breaking some uses of this script. This patch adds
support for both old and newer versions by auto-detecting the version
of LibreSwan and then choosing the correct path.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1975039
Reported-by: Qijun Ding 
Fixes: d6afbc00d5b3 ("ipsec: Allow custom file locations.")
Signed-off-by: Mike Pattrick 
---
v2: Don't extract variables from ipsec script
---
 ipsec/ovs-monitor-ipsec.in | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/ipsec/ovs-monitor-ipsec.in b/ipsec/ovs-monitor-ipsec.in
index 7945162f9..6a71d4f2f 100755
--- a/ipsec/ovs-monitor-ipsec.in
+++ b/ipsec/ovs-monitor-ipsec.in
@@ -21,6 +21,7 @@ import re
 import subprocess
 import sys
 from string import Template
+from packaging.version import parse
 
 import ovs.daemon
 import ovs.db.idl
@@ -457,14 +458,25 @@ conn prevent_unencrypted_vxlan
 CERTKEY_PREFIX = "ovs_certkey_"
 
 def __init__(self, libreswan_root_prefix, args):
+# Collect version information
+self.IPSEC = libreswan_root_prefix + "/usr/sbin/ipsec"
+proc = subprocess.Popen([self.IPSEC, "--version"],
+stdout=subprocess.PIPE,
+encoding="latin1")
+pout, perr = proc.communicate()
+
+v = re.match("^Libreswan (.*)$", pout)
+if v and parse(v.group(1)) >= parse("4.0"):
+ipsec_d = args.ipsec_d if args.ipsec_d else "/var/lib/ipsec/nss"
+else:
+ipsec_d = args.ipsec_d if args.ipsec_d else "/etc/ipsec.d"
+
 ipsec_conf = args.ipsec_conf if args.ipsec_conf else "/etc/ipsec.conf"
-ipsec_d = args.ipsec_d if args.ipsec_d else "/etc/ipsec.d"
 ipsec_secrets = (args.ipsec_secrets if args.ipsec_secrets
 else "/etc/ipsec.secrets")
 ipsec_ctl = (args.ipsec_ctl if args.ipsec_ctl
 else "/run/pluto/pluto.ctl")
 
-self.IPSEC = libreswan_root_prefix + "/usr/sbin/ipsec"
 self.IPSEC_CONF = libreswan_root_prefix + ipsec_conf
 self.IPSEC_SECRETS = libreswan_root_prefix + ipsec_secrets
 self.IPSEC_D = "sql:" + libreswan_root_prefix + ipsec_d
-- 
2.39.3


Re: [ovs-dev] [PATCH] ovs-monitor-ipsec: LibreSwan autodetect paths.

2024-03-20 Thread Mike Pattrick

On Tue, Mar 19, 2024 at 5:35 PM Ilya Maximets  wrote:
>
> On 3/13/24 22:54, Mike Pattrick wrote:
> > In v4.0, LibreSwan changed a default path that had been hardcoded in
> > ovs-monitor-ipsec, breaking some uses of this script. This patch adds
> > support for both old and newer versions by auto detecting the location
> > of these paths from LibreSwan shell script environment variables.
> >
> > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1975039
> > Reported-by: Qijun Ding 
> > Fixes: d6afbc00d5b3 ("ipsec: Allow custom file locations.")
> > Signed-off-by: Mike Pattrick 
> > ---
> >  ipsec/ovs-monitor-ipsec.in | 31 +++
> >  1 file changed, 27 insertions(+), 4 deletions(-)
> >
>
> Hi, Mike.  Thanks for working on this!
>
> Though using the knowledge that /usr/sbin/ipsec is a shell script
> and that it defines particular variables inside seems like a hack.
>
> Maybe we can just check the version instead?  We know that default
> nss path changed in 4.0.

My motivation for this method was that these paths could be changed, by
the maintainers or downstream distributions, more easily than the ipsec
script could be reimplemented. But there's nothing stopping us from
addressing any future changes as they happen, so I'll resend with just a
fix for 4.0.

-M

>
> Best regards, Ilya Maximets.
>


Re: [ovs-dev] [PATCH 3/3] netdev-dpdk: Fix tunnel type check during Tx offload preparation.

2024-03-14 Thread Mike Pattrick

On Wed, Mar 13, 2024 at 1:29 PM Ilya Maximets  wrote:
>
> Tunnel types are not flags, but 4-bit fields, so checking them with
> a simple binary 'and' is incorrect and may produce false-positive
> matches.
>
> While the current implementation is unlikely to cause any issues today,
> since both RTE_MBUF_F_TX_TUNNEL_VXLAN and RTE_MBUF_F_TX_TUNNEL_GENEVE
> only have 1 bit set, it is risky to have this code and it may lead
> to problems if we add support for other tunnel types in the future.
>
> Use proper field checks instead.  Also adding a warning for unexpected
> tunnel types in case something goes wrong.
>
> Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.")
> Signed-off-by: Ilya Maximets 

Acked-by: Mike Pattrick 


Re: [ovs-dev] [PATCH 2/3] netdev-dpdk: Fix TCP check during Tx offload preparation.

2024-03-14 Thread Mike Pattrick

On Wed, Mar 13, 2024 at 1:29 PM Ilya Maximets  wrote:
>
> RTE_MBUF_F_TX_TCP_CKSUM is not a flag, but a 2-bit field, so checking
> it with a simple binary 'and' is incorrect.  For example, this check
> will succeed for a packet with UDP checksum requested as well.
>
> Fix the check to avoid wrongly initializing tso_segz and potentially
> accessing UDP header via TCP structure pointer.
>
> The IPv4 checksum flag has to be set for any L4 checksum request,
> regardless of the type, so moving this check out of the TCP condition.
>
> Fixes: 8b5fe2dc6080 ("userspace: Add Generic Segmentation Offloading.")
> Signed-off-by: Ilya Maximets 

Acked-by: Mike Pattrick 
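
For anyone following along, the difference between the flag-style test and
the field-style test can be sketched as below. The constant names and bit
positions are assumptions chosen to model a 2-bit L4 checksum field; they
are not copied from DPDK headers or from this patch.

```python
# Illustrative 2-bit field layout (values are assumptions).
L4_SHIFT = 52
TX_TCP_CKSUM = 1 << L4_SHIFT
TX_SCTP_CKSUM = 2 << L4_SHIFT
TX_UDP_CKSUM = 3 << L4_SHIFT
TX_L4_MASK = 3 << L4_SHIFT


def is_tcp_cksum_naive(ol_flags):
    """Flag-style check: also true for UDP, since 3 & 1 != 0."""
    return bool(ol_flags & TX_TCP_CKSUM)


def is_tcp_cksum(ol_flags):
    """Field-style check: mask the field, then compare the whole value."""
    return (ol_flags & TX_L4_MASK) == TX_TCP_CKSUM
```

With this layout a UDP checksum request passes the naive check, which is the
false positive described in the commit message.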


Re: [ovs-dev] [PATCH 1/3] netdev-dpdk: Clear inner packet marks if no inner offloads requested.

2024-03-14 Thread Mike Pattrick

On Wed, Mar 13, 2024 at 1:29 PM Ilya Maximets  wrote:
>
> In some cases only outer offloads may be requested for a tunneled
> packet.  In this case there is no need to mark the type of an
> inner packet.  Clean these flags up to avoid potential confusion
> of DPDK drivers.
>
> Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.")
> Signed-off-by: Ilya Maximets 

Acked-by: Mike Pattrick 


[ovs-dev] [PATCH] ovs-monitor-ipsec: LibreSwan autodetect paths.

2024-03-13 Thread Mike Pattrick

In v4.0, LibreSwan changed a default path that had been hardcoded in
ovs-monitor-ipsec, breaking some uses of this script. This patch adds
support for both old and newer versions by auto-detecting the location
of these paths from LibreSwan shell script environment variables.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1975039
Reported-by: Qijun Ding 
Fixes: d6afbc00d5b3 ("ipsec: Allow custom file locations.")
Signed-off-by: Mike Pattrick 
---
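To show what the regular expression in collect_environment() is expected to
pick up, here is a standalone sketch. The SAMPLE text is a made-up excerpt in
the ${VAR:-default} style, not the contents of any real ipsec script, and
collect_defaults is a hypothetical free-function version of the method.

```python
import re

# Hypothetical excerpt; a real ipsec script may look different.
SAMPLE = '''\
IPSEC_CONF="${IPSEC_CONF:-/etc/ipsec.conf}"
IPSEC_NSSDIR="${IPSEC_NSSDIR:-/var/lib/ipsec/nss}"
echo "not an assignment"
'''


def collect_defaults(text):
    """Overlay ${VAR:-default} values found in text on built-in defaults."""
    env = {
        "IPSEC_CONF": "/etc/ipsec.conf",
        "IPSEC_NSSDIR": "/etc/ipsec.d",
        "IPSEC_RUNDIR": "/run/pluto",
    }
    pattern = r"^([A-Z_]+)=.*:-(.*)}"
    for key, value in re.findall(pattern, text, re.MULTILINE):
        env[key] = value
    return env
```

Variables that never appear in the script keep their built-in defaults, so a
missing or unreadable file degrades to the old hardcoded behavior.
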
 ipsec/ovs-monitor-ipsec.in | 31 +++
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/ipsec/ovs-monitor-ipsec.in b/ipsec/ovs-monitor-ipsec.in
index 7945162f9..6c28f30f4 100755
--- a/ipsec/ovs-monitor-ipsec.in
+++ b/ipsec/ovs-monitor-ipsec.in
@@ -456,15 +456,38 @@ conn prevent_unencrypted_vxlan
 CERT_PREFIX = "ovs_cert_"
 CERTKEY_PREFIX = "ovs_certkey_"
 
+def collect_environment(self):
+"""Extract important paths from ipsec file."""
+env = {
+"IPSEC_CONF": "/etc/ipsec.conf",
+"IPSEC_NSSDIR": "/etc/ipsec.d",
+"IPSEC_RUNDIR": "/run/pluto"
+}
+try:
+with open(self.IPSEC) as fh:
+e_list = re.findall("^([A-Z_]+)=.*:-(.*)}",
+fh.read(),
+re.MULTILINE)
+except:
+return env
+
+for k, v in e_list:
+env[k] = v
+
+return env
+
 def __init__(self, libreswan_root_prefix, args):
-ipsec_conf = args.ipsec_conf if args.ipsec_conf else "/etc/ipsec.conf"
-ipsec_d = args.ipsec_d if args.ipsec_d else "/etc/ipsec.d"
+self.IPSEC = libreswan_root_prefix + "/usr/sbin/ipsec"
+
+env = self.collect_environment()
+
+ipsec_conf = args.ipsec_conf if args.ipsec_conf else env["IPSEC_CONF"]
+ipsec_d = args.ipsec_d if args.ipsec_d else env["IPSEC_NSSDIR"]
 ipsec_secrets = (args.ipsec_secrets if args.ipsec_secrets
 else "/etc/ipsec.secrets")
 ipsec_ctl = (args.ipsec_ctl if args.ipsec_ctl
-else "/run/pluto/pluto.ctl")
+else os.path.join(env["IPSEC_RUNDIR"], "pluto.ctl"))
 
-self.IPSEC = libreswan_root_prefix + "/usr/sbin/ipsec"
 self.IPSEC_CONF = libreswan_root_prefix + ipsec_conf
 self.IPSEC_SECRETS = libreswan_root_prefix + ipsec_secrets
 self.IPSEC_D = "sql:" + libreswan_root_prefix + ipsec_d
-- 
2.39.3


[ovs-dev] [PATCH] ofproto-dpif-upcall: Don't mirror packets that aren't modified.

2024-03-12 Thread Mike Pattrick

Previously OVS reset the mirror context when a packet was modified in
such a way that the packet's contents changed. However, that change
incorrectly reset the mirror context when only metadata changed as
well.

Now we check for all metadata fields, instead of just tunnel metadata,
before resetting the mirror context.

Fixes: feed7f677505 ("ofproto-dpif-upcall: Mirror packets that are modified.")
Reported-by: Zhangweiwei 
Signed-off-by: Mike Pattrick 
---
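The decision this patch changes can be summarized with the small sketch
below. It is a simplification under stated assumptions: METADATA_FIELDS is an
illustrative subset with made-up lowercase names, whereas the patch itself
classifies every MFF_* id in mf_is_metadata(), and should_reset_mirrors is a
hypothetical helper standing in for the check in reset_mirror_ctx().

```python
# Illustrative subset only; the real classification covers every field id.
METADATA_FIELDS = {
    "metadata", "in_port", "reg0", "recirc_id",
    "pkt_mark", "ct_state", "ct_zone", "ct_mark",
}


def should_reset_mirrors(field, prereqs_ok=True):
    """Reset the mirror context only if a set_field changes packet contents."""
    return prereqs_ok and field not in METADATA_FIELDS
```

Setting eth_src would reset the mirror context and allow the modified packet
to be mirrored again, while setting ct_zone would not.
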
 include/openvswitch/meta-flow.h |   1 +
 lib/meta-flow.c | 109 
 ofproto/ofproto-dpif-xlate.c|   2 +-
 tests/ofproto-dpif.at   |   5 +-
 4 files changed, 114 insertions(+), 3 deletions(-)

diff --git a/include/openvswitch/meta-flow.h b/include/openvswitch/meta-flow.h
index 3b0220aaa..96aad3933 100644
--- a/include/openvswitch/meta-flow.h
+++ b/include/openvswitch/meta-flow.h
@@ -2305,6 +2305,7 @@ void mf_set_flow_value_masked(const struct mf_field *,
   const union mf_value *mask,
   struct flow *);
 bool mf_is_tun_metadata(const struct mf_field *);
+bool mf_is_metadata(const struct mf_field *);
 bool mf_is_frozen_metadata(const struct mf_field *);
 bool mf_is_pipeline_field(const struct mf_field *);
 bool mf_is_set(const struct mf_field *, const struct flow *);
diff --git a/lib/meta-flow.c b/lib/meta-flow.c
index aa7cf1fcb..7ecec334e 100644
--- a/lib/meta-flow.c
+++ b/lib/meta-flow.c
@@ -1788,6 +1788,115 @@ mf_is_tun_metadata(const struct mf_field *mf)
mf->id < MFF_TUN_METADATA0 + TUN_METADATA_NUM_OPTS;
 }
 
+bool
+mf_is_metadata(const struct mf_field *mf)
+{
+switch (mf->id) {
+CASE_MFF_TUN_METADATA:
+case MFF_METADATA:
+case MFF_IN_PORT:
+case MFF_IN_PORT_OXM:
+CASE_MFF_REGS:
+CASE_MFF_XREGS:
+CASE_MFF_XXREGS:
+case MFF_PACKET_TYPE:
+case MFF_DP_HASH:
+case MFF_RECIRC_ID:
+case MFF_CONJ_ID:
+case MFF_ACTSET_OUTPUT:
+case MFF_SKB_PRIORITY:
+case MFF_PKT_MARK:
+case MFF_CT_STATE:
+case MFF_CT_ZONE:
+case MFF_CT_MARK:
+case MFF_CT_LABEL:
+case MFF_CT_NW_PROTO:
+case MFF_CT_NW_SRC:
+case MFF_CT_NW_DST:
+case MFF_CT_IPV6_SRC:
+case MFF_CT_IPV6_DST:
+case MFF_CT_TP_SRC:
+case MFF_CT_TP_DST:
+case MFF_N_IDS:
+return true;
+
+case MFF_TUN_ID:
+case MFF_TUN_SRC:
+case MFF_TUN_DST:
+case MFF_TUN_IPV6_SRC:
+case MFF_TUN_IPV6_DST:
+case MFF_TUN_FLAGS:
+case MFF_TUN_GBP_ID:
+case MFF_TUN_GBP_FLAGS:
+case MFF_TUN_ERSPAN_VER:
+case MFF_TUN_ERSPAN_IDX:
+case MFF_TUN_ERSPAN_DIR:
+case MFF_TUN_ERSPAN_HWID:
+case MFF_TUN_GTPU_FLAGS:
+case MFF_TUN_GTPU_MSGTYPE:
+case MFF_TUN_TTL:
+case MFF_TUN_TOS:
+case MFF_ETH_SRC:
+case MFF_ETH_DST:
+case MFF_ETH_TYPE:
+case MFF_VLAN_TCI:
+case MFF_DL_VLAN:
+case MFF_VLAN_VID:
+case MFF_DL_VLAN_PCP:
+case MFF_VLAN_PCP:
+case MFF_MPLS_LABEL:
+case MFF_MPLS_TC:
+case MFF_MPLS_BOS:
+case MFF_MPLS_TTL:
+case MFF_IPV4_SRC:
+case MFF_IPV4_DST:
+case MFF_IPV6_SRC:
+case MFF_IPV6_DST:
+case MFF_IPV6_LABEL:
+case MFF_IP_PROTO:
+case MFF_IP_DSCP:
+case MFF_IP_DSCP_SHIFTED:
+case MFF_IP_ECN:
+case MFF_IP_TTL:
+case MFF_IP_FRAG:
+case MFF_ARP_OP:
+case MFF_ARP_SPA:
+case MFF_ARP_TPA:
+case MFF_ARP_SHA:
+case MFF_ARP_THA:
+case MFF_TCP_SRC:
+case MFF_TCP_DST:
+case MFF_TCP_FLAGS:
+case MFF_UDP_SRC:
+case MFF_UDP_DST:
+case MFF_SCTP_SRC:
+case MFF_SCTP_DST:
+case MFF_ICMPV4_TYPE:
+case MFF_ICMPV4_CODE:
+case MFF_ICMPV6_TYPE:
+case MFF_ICMPV6_CODE:
+case MFF_ND_TARGET:
+case MFF_ND_SLL:
+case MFF_ND_TLL:
+case MFF_ND_RESERVED:
+case MFF_ND_OPTIONS_TYPE:
+case MFF_NSH_FLAGS:
+case MFF_NSH_TTL:
+case MFF_NSH_MDTYPE:
+case MFF_NSH_NP:
+case MFF_NSH_SPI:
+case MFF_NSH_SI:
+case MFF_NSH_C1:
+case MFF_NSH_C2:
+case MFF_NSH_C3:
+case MFF_NSH_C4:
+return false;
+
+default:
+OVS_NOT_REACHED();
+}
+}
+
 bool
 mf_is_frozen_metadata(const struct mf_field *mf)
 {
diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 89f183182..faa364ec8 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -7141,7 +7141,7 @@ reset_mirror_ctx(struct xlate_ctx *ctx, const struct flow *flow,
 
 set_field = ofpact_get_SET_FIELD(a);
 mf = set_field->field;
-if (mf_are_prereqs_ok(mf, flow, NULL) && !mf_is_tun_metadata(mf)) {
+if (mf_are_prereqs_ok(mf, flow, NULL) && !mf_is_metadata(mf)) {
 ctx->mirrors = 0;
 }
 return;
diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index a1393f7f8..245e209c3 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -5443,7 +5443,8 @@ AT_CLEANU

Re: [ovs-dev] [PATCH] netdev-dpdk: Clean up all marker flags if no offloads requested.

2024-03-12 Thread Mike Pattrick

On Mon, Mar 11, 2024 at 2:31 PM Ilya Maximets  wrote:
>
> Some drivers (primarily, Intel ones) do not expect any marking flags
> being set if no offloads are requested.  If these flags are present,
> driver will fail Tx preparation or behave abnormally.
>
> For example, ixgbe driver will refuse to process the packet with
> only RTE_MBUF_F_TX_TUNNEL_GENEVE and RTE_MBUF_F_TX_OUTER_IPV4 set.
> This pretty much breaks Geneve tunnels on these cards.
>
> An extra check is added to make sure we don't have any unexpected
> Tx offload flags set.
>
> Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.")
> Reported-at: https://github.com/openvswitch/ovs-issues/issues/321
> Signed-off-by: Ilya Maximets 

Acked-by: Mike Pattrick 


[ovs-dev] [PATCH v3] ovsdb: Don't iterate over rows on empty mutation.

2024-03-06 Thread Mike Pattrick

Previously when an empty mutation was used to count the number of rows
in a table, OVSDB would iterate over all rows twice. First to perform an
RBAC check, and then to perform the no-operation.

This change adds a short circuit to mutate operations with no conditions
and an empty mutation set, returning immediately. One notable change in
functionality is not performing the RBAC check in this condition, as no
mutation actually takes place.

Reported-by: Terry Wilson 
Reported-at: https://issues.redhat.com/browse/FDP-359
Signed-off-by: Mike Pattrick 
---
v2: Added additional non-rbac tests, and support for conditional
counting without the rbac check
v3: Changed a struct to a size_t.
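
As a rough model of the new control flow, consider the sketch below. It is
not OVSDB code: rows are plain dicts, conditions are predicates, the RBAC
check is left out entirely, and execute_mutate and mutate_row are
hypothetical names used only for illustration.

```python
def execute_mutate(rows, conditions, mutations, mutate_row):
    """Return {"count": n}; skip per-row mutation work when there is none."""
    def matches(row):
        return all(cond(row) for cond in conditions)

    if not mutations:
        if not conditions:
            # No filter and nothing to change: the table size is the answer.
            return {"count": len(rows)}
        # Filter only: count matching rows without touching them.
        return {"count": sum(1 for row in rows if matches(row))}

    count = 0
    for row in rows:
        if matches(row):
            for mutation in mutations:
                mutate_row(row, mutation)
            count += 1
    return {"count": count}
```

The unconditional case never iterates over rows at all, which is where the
saving for "count the table" style requests comes from.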
---
 ovsdb/execution.c| 23 +-
 ovsdb/mutation.h |  6 +
 tests/ovsdb-execution.at | 51 
 tests/ovsdb-rbac.at  | 23 ++
 4 files changed, 102 insertions(+), 1 deletion(-)

diff --git a/ovsdb/execution.c b/ovsdb/execution.c
index 8c20c3b54..f4cc9e802 100644
--- a/ovsdb/execution.c
+++ b/ovsdb/execution.c
@@ -585,6 +585,16 @@ mutate_row_cb(const struct ovsdb_row *row, void *mr_)
 return *mr->error == NULL;
 }
 
+static bool
+count_row_cb(const struct ovsdb_row *row OVS_UNUSED, void *rc)
+{
+size_t *row_count = rc;
+
+(*row_count)++;
+
+return true;
+}
+
 static struct ovsdb_error *
 ovsdb_execute_mutate(struct ovsdb_execution *x, struct ovsdb_parser *parser,
  struct json *result)
@@ -609,7 +619,18 @@ ovsdb_execute_mutate(struct ovsdb_execution *x, struct ovsdb_parser *parser,
 error = ovsdb_condition_from_json(table->schema, where, x->symtab,
   &condition);
 }
-if (!error) {
+if (!error && ovsdb_mutation_set_empty(&mutations)) {
+/* Special case with no mutations, just return the row count. */
+if (ovsdb_condition_empty(&condition)) {
+json_object_put(result, "count",
+json_integer_create(hmap_count(&table->rows)));
+} else {
+size_t row_count = 0;
+ovsdb_query(table, &condition, count_row_cb, &row_count);
+json_object_put(result, "count",
+json_integer_create(row_count));
+}
+} else if (!error) {
 mr.n_matches = 0;
 mr.txn = x->txn;
 mr.mutations = &mutations;
diff --git a/ovsdb/mutation.h b/ovsdb/mutation.h
index 7566ef199..05d4a262a 100644
--- a/ovsdb/mutation.h
+++ b/ovsdb/mutation.h
@@ -69,4 +69,10 @@ void ovsdb_mutation_set_destroy(struct ovsdb_mutation_set *);
 struct ovsdb_error *ovsdb_mutation_set_execute(
 struct ovsdb_row *, const struct ovsdb_mutation_set *) OVS_WARN_UNUSED_RESULT;
 
+static inline bool ovsdb_mutation_set_empty(
+const struct ovsdb_mutation_set *ms)
+{
+return ms->n_mutations == 0;
+}
+
 #endif /* ovsdb/mutation.h */
diff --git a/tests/ovsdb-execution.at b/tests/ovsdb-execution.at
index fd1c7a239..1ffa2b738 100644
--- a/tests/ovsdb-execution.at
+++ b/tests/ovsdb-execution.at
@@ -1201,4 +1201,55 @@ OVSDB_CHECK_EXECUTION([garbage collection],
 [{"rows":[]}]
 ]])])
 
+OVSDB_CHECK_EXECUTION([insert rows, count with mutation],
+  [ordinal_schema],
+  "ordinals",
+  {"op": "insert",
+   "table": "ordinals",
+   "row": {"number": 0, "name": "zero"},
+   "uuid-name": "first"}]]],
+   [[["ordinals",
+  {"op": "insert",
+   "table": "ordinals",
+   "row": {"number": 1, "name": "one"},
+   "uuid-name": "first"}]]],
+   [[["ordinals",
+  {"op": "mutate",
+   "table": "ordinals",
+   "where": [["name", "==", "zero"]],
+   "mutations": []}]]],
+   [[["ordinals",
+  {"op": "mutate",
+   "table": "ordinals",
+   "where": [["name", "==", "one"]],
+   "mutations": []}]]],
+   [[["ordinals",
+  {"op": "insert",
+   "table": "ordinals",
+   "row": {"number": 2, "name": "one"},
+   "uuid-name": "first"}]]],
+   [[["ordinals",
+  {"op": "mutate",
+   "table": "ordinals",
+   "where": [["name", "==", "one"]],
+   "mutations": []}]]],
+   [[["ordinals",
+  {"op": "delete",
+   "table": "ordinals",
+   "where": [["name", "==", "zero"]]}]]],
+   [[["

Re: [ovs-dev] [PATCH v6 2/2] netlink-conntrack: Optimize flushing ct zone.

2024-03-06 Thread Mike Pattrick

On Mon, Mar 4, 2024 at 3:22 AM Felix Huettner via dev
 wrote:
>
> Previously the kernel did not provide a netlink interface to flush/list
> only conntrack entries matching a specific zone. With [1] and [2] it is now
> possible to flush and list conntrack entries filtered by zone. Older
> kernels not yet supporting this feature will ignore the filter.
> For the list request that means just returning all entries (which we can
> then filter in userspace as before).
> For the flush request that means deleting all conntrack entries.
>
> The implementation is now identical to the windows one, so we combine
> them.
>
> This significantly improves the performance of flushing conntrack zones
> when the conntrack table is large. Since flushing a conntrack zone is
> normally triggered via an openflow command it blocks the main ovs thread
> and thereby also blocks new flows from being applied. Using this new
> feature we can reduce the flushing time for zones by around 93%.
>
> In combination with OVN the creation of a Logical_Router (which causes
> the flushing of a ct zone) could block other operations, e.g. the
> failover of Logical_Routers (as they cause new flows to be created).
> This is visible from a user perspective as a ovn-controller that is idle
> (as it waits for vswitchd) and vswitchd reporting:
> "blocked 1000 ms waiting for main to quiesce" (potentially with ever
> increasing times).
>
> The following performance tests were run in a qemu vm with 500.000
> conntrack entries distributed evenly over 500 ct zones using `ovstest
> test-netlink-conntrack flush zone=`.
>
>           |  flush zone with 1000 entries  |    flush zone with no entry    |
>           +---------------------+----------+---------------------+----------|
>           |   with the patch    | without  |   with the patch    | without  |
>           +----------+----------+----------+----------+----------+----------|
>           | v6.8-rc4 |  v6.7.1  | v6.8-rc4 | v6.8-rc4 |  v6.7.1  | v6.8-rc4 |
> +---------+----------+----------+----------+----------+----------+----------|
> | Min     |  0.260   |  3.946   |  3.497   |  0.228   |  3.462   |  3.212   |
> | Median  |  0.319   |  4.237   |  4.349   |  0.298   |  4.460   |  4.010   |
> | 90%ile  |  0.335   |  4.367   |  4.522   |  0.325   |  4.662   |  4.572   |
> | 99%ile  |  0.348   |  4.495   |  4.773   |  0.340   |  4.931   |  6.003   |
> | Max     |  0.362   |  4.543   |  5.054   |  0.348   |  5.390   |  6.396   |
> | Mean    |  0.320   |  4.236   |  4.331   |  0.296   |  4.430   |  4.071   |
> | Total   |  80.02   |  1058    |  1082    |  73.93   |  1107    |  1017    |
>
> [1]: https://github.com/torvalds/linux/commit/eff3c558bb7e61c41b53e4c8130e514a5a4df9ba
> [2]: https://github.com/torvalds/linux/commit/fa173a1b4e3fd1ab5451cbc57de6fc624c824b0a
>
> Co-Authored-By: Luca Czesla 
> Signed-off-by: Luca Czesla 
> Co-Authored-By: Max Lamprecht 
> Signed-off-by: Max Lamprecht 
> Signed-off-by: Felix Huettner 
> ---

Acked-by: Mike Pattrick 


Re: [ovs-dev] [PATCH v6 1/2] util: Support checking for kernel versions.

2024-03-05 Thread Mike Pattrick

On Mon, Mar 4, 2024 at 3:22 AM Felix Huettner via dev
 wrote:
>
> Extract checking for a given kernel version to a separate function.
> It will be used also in the next patch.
>
> Signed-off-by: Felix Huettner 
> ---

Acked-by: Mike Pattrick 

Re: [ovs-dev] [PATCH v3] conntrack: Fix flush not flushing all elements.

2024-03-04 Thread Mike Pattrick

On Mon, Mar 4, 2024 at 10:22 AM Xavier Simonart  wrote:
>
> On netdev datapath, when a ct element was cleaned, the cmap
> could be shrunk, potentially causing some elements to be skipped
> in the flush iteration.
>
> Fixes: 967bb5c5cd90 ("conntrack: Add rcu support.")
> Signed-off-by: Xavier Simonart 

Acked-by: Mike Pattrick 

Re: [ovs-dev] [PATCH v2] ovsdb: Don't iterate over rows on empty mutation.

2024-02-28 Thread Mike Pattrick

On Wed, Feb 28, 2024 at 8:41 AM Ilya Maximets  wrote:
>
> On 2/22/24 17:37, Mike Pattrick wrote:
> > Previously when an empty mutation was used to count the number of rows
> > in a table, OVSDB would iterate over all rows twice. First to perform an
> > RBAC check, and then to perform the no-operation.
> >
> > This change adds a short circuit to mutate operations with no conditions
> > and an empty mutation set, returning immediately. One notable change in
> > functionality is not performing the RBAC check in this condition, as no
> > mutation actually takes place.
> >
> > Reported-by: Terry Wilson 
> > Reported-at: https://issues.redhat.com/browse/FDP-359
> > Signed-off-by: Mike Pattrick 
> > ---
> > v2: Added additional non-rbac tests, and support for conditional
> > counting without the rbac check
> > ---
> >  ovsdb/execution.c| 26 +++-
> >  ovsdb/mutation.h |  6 +
> >  tests/ovsdb-execution.at | 51 
> >  tests/ovsdb-rbac.at  | 23 ++
> >  4 files changed, 105 insertions(+), 1 deletion(-)
>
> Hi, Mike.  Thanks for v2!  I didn't test, but it looks good in general.
> See one comment inline.
>
> Best regards, Ilya Maximets.
>
> >
> > diff --git a/ovsdb/execution.c b/ovsdb/execution.c
> > index 8c20c3b54..7ed700632 100644
> > --- a/ovsdb/execution.c
> > +++ b/ovsdb/execution.c
> > @@ -585,6 +585,19 @@ mutate_row_cb(const struct ovsdb_row *row, void *mr_)
> >  return *mr->error == NULL;
> >  }
> >
> > +struct count_row_cbdata {
> > +size_t n_matches;
> > +};
>
> Do we actually need this structure?  It only has one element.
> We should be able to just pass a counter around directly.

It seemed more thematic to me at the time, but I can change this.

-M
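
The suggestion above, passing a bare counter through the callback's aux pointer instead of a one-member struct, could look roughly like the following. This is a self-contained sketch, not the OVSDB code: `row_cb`, `for_each_row`, and `count_rows` are hypothetical stand-ins for the real query machinery, and rows are modelled as plain integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a row-visiting callback type. */
typedef bool (*row_cb)(const int *row, void *aux);

/* Hypothetical stand-in for a query: calls 'cb' on each row until it
 * returns false. */
static void
for_each_row(const int *rows, size_t n, row_cb cb, void *aux)
{
    assert(cb != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!cb(&rows[i], aux)) {
            break;
        }
    }
}

/* Counts every visited row.  'n_' points directly at a size_t counter,
 * so no wrapper struct is needed. */
static bool
count_row_cb(const int *row, void *n_)
{
    size_t *n = n_;

    (void) row;
    (*n)++;
    return true;
}

size_t
count_rows(const int *rows, size_t n)
{
    size_t n_matches = 0;

    for_each_row(rows, n, count_row_cb, &n_matches);
    return n_matches;
}
```

A wrapper struct only starts to pay off once the callback needs more than one piece of state, which is presumably why it looked "thematic" next to the existing mutate callback data.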

>
> > +
> > +static bool
> > +count_row_cb(const struct ovsdb_row *row OVS_UNUSED, void *cr_)
> > +{
> > +struct count_row_cbdata *cr = cr_;
> > +
> > +cr->n_matches++;
> > +return true;
> > +}
> > +
> >  static struct ovsdb_error *
> >  ovsdb_execute_mutate(struct ovsdb_execution *x, struct ovsdb_parser 
> > *parser,
> >   struct json *result)
> > @@ -609,7 +622,18 @@ ovsdb_execute_mutate(struct ovsdb_execution *x, struct 
> > ovsdb_parser *parser,
> >  error = ovsdb_condition_from_json(table->schema, where, x->symtab,
> >  &condition);
> >  }
> > -if (!error) {
> > +if (!error && ovsdb_mutation_set_empty(&mutations)) {
> > +/* Special case with no mutations, just return the row count. */
> > +if (ovsdb_condition_empty(&condition)) {
> > +json_object_put(result, "count",
> > +json_integer_create(hmap_count(&table->rows)));
> > +} else {
> > +struct count_row_cbdata cr = {};
> > +ovsdb_query(table, &condition, count_row_cb, &cr);
> > +json_object_put(result, "count",
> > +json_integer_create(cr.n_matches));
> > +}
> > +} else if (!error) {
> >  mr.n_matches = 0;
> >  mr.txn = x->txn;
> >  mr.mutations = &mutations;
> > diff --git a/ovsdb/mutation.h b/ovsdb/mutation.h
> > index 7566ef199..05d4a262a 100644
> > --- a/ovsdb/mutation.h
> > +++ b/ovsdb/mutation.h
> > @@ -69,4 +69,10 @@ void ovsdb_mutation_set_destroy(struct 
> > ovsdb_mutation_set *);
> >  struct ovsdb_error *ovsdb_mutation_set_execute(
> >  struct ovsdb_row *, const struct ovsdb_mutation_set *) 
> > OVS_WARN_UNUSED_RESULT;
> >
> > +static inline bool ovsdb_mutation_set_empty(
> > +const struct ovsdb_mutation_set *ms)
> > +{
> > +return ms->n_mutations == 0;
> > +}
> > +
> >  #endif /* ovsdb/mutation.h */
> > diff --git a/tests/ovsdb-execution.at b/tests/ovsdb-execution.at
> > index fd1c7a239..1ffa2b738 100644
> > --- a/tests/ovsdb-execution.at
> > +++ b/tests/ovsdb-execution.at
> > @@ -1201,4 +1201,55 @@ OVSDB_CHECK_EXECUTION([garbage collection],
> >  [{"rows":[]}]
> >  ]])])
> >
> > +OVSDB_CHECK_EXECUTION([insert rows, count with mutation],
> > +  [ordinal_schema],
> > +  "ordinals",
> > +  {"op": "insert",
> > +   "table": "ordinals",
> > +   "row": {"number": 0, "name": "zero"},
> > +   "uuid-n

Re: [ovs-dev] [PATCH v5 1/2] util: Support checking for kernel versions.

2024-02-27 Thread Mike Pattrick

On Mon, Feb 26, 2024 at 4:22 AM Felix Huettner via dev
 wrote:
>
> Extract checking for a given kernel version to a separate function.
> It will be used also in the next patch.
>
> Signed-off-by: Felix Huettner 
> ---
> v4->v5:
> - fix wrong ifdef that broke on macos
> - fix ovs_kernel_is_version_or_newer working in reverse than desired
> - ovs_kernel_is_version_or_newer now always returns false if uname
>   errors (Thanks Eelco)
> v4:
> - extract function to check kernel version
>  lib/netdev-linux.c | 14 +++---
>  lib/util.c | 27 +++
>  lib/util.h |  4 
>  3 files changed, 34 insertions(+), 11 deletions(-)
>
> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
> index bf91ef462..51bd71ae3 100644
> --- a/lib/netdev-linux.c
> +++ b/lib/netdev-linux.c
> @@ -6427,18 +6427,10 @@ getqdisc_is_safe(void)
>  static bool safe = false;
>
>  if (ovsthread_once_start(&once)) {
> -struct utsname utsname;
> -int major, minor;
> -
> -if (uname(&utsname) == -1) {
> -VLOG_WARN("uname failed (%s)", ovs_strerror(errno));
> -} else if (!ovs_scan(utsname.release, "%d.%d", &major, &minor)) {
> -VLOG_WARN("uname reported bad OS release (%s)", utsname.release);
> -} else if (major < 2 || (major == 2 && minor < 35)) {
> -VLOG_INFO("disabling unsafe RTM_GETQDISC in Linux kernel %s",
> -  utsname.release);
> -} else {
> +if (ovs_kernel_is_version_or_newer(2, 35)) {
>  safe = true;
> +} else {
> +VLOG_INFO("disabling unsafe RTM_GETQDISC in Linux kernel");
>  }
>  ovsthread_once_done(&once);
>  }
> diff --git a/lib/util.c b/lib/util.c
> index 3fb3a4b40..f5b2da095 100644
> --- a/lib/util.c
> +++ b/lib/util.c
> @@ -27,6 +27,7 @@
>  #include 
>  #ifdef __linux__
>  #include 
> +#include <sys/utsname.h>

This import can now be removed from netdev-linux (I believe).

>  #endif
>  #include 
>  #include 
> @@ -2500,3 +2501,29 @@ OVS_CONSTRUCTOR(winsock_start) {
> }
>  }
>  #endif
> +
> +#ifdef __linux__
> +bool
> +ovs_kernel_is_version_or_newer(int target_major, int target_minor)
> +{
> +static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
> +static int current_major, current_minor = -1;
> +
> +if (ovsthread_once_start(&once)) {
> +struct utsname utsname;
> +
> +if (uname(&utsname) == -1) {
> +VLOG_WARN("uname failed (%s)", ovs_strerror(errno));
> +} else if (!ovs_scan(utsname.release, "%d.%d",
> +&current_major, &current_minor)) {
> +VLOG_WARN("uname reported bad OS release (%s)", utsname.release);
> +}
> +ovsthread_once_done(&once);
> +}
> +if (current_major == -1 || current_minor == -1) {
> +return false;
> +}
> +return current_major > target_major || (
> +current_major == target_major && current_minor > target_minor);

Shouldn't this be "current_minor >= target_minor" ?

-M

> +}
> +#endif
> diff --git a/lib/util.h b/lib/util.h
> index f2d45bcac..55718fd87 100644
> --- a/lib/util.h
> +++ b/lib/util.h
> @@ -611,4 +611,8 @@ int ftruncate(int fd, off_t length);
>  }
>  #endif
>
> +#ifdef __linux__
> +bool ovs_kernel_is_version_or_newer(int target_major, int target_minor);
> +#endif
> +
>  #endif /* util.h */
>
> base-commit: 166ee41d282c506d100bc2185d60af277121b55b
> --
> 2.43.2
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
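
The boundary case raised in the review above (a running kernel whose version equals the target) can be illustrated with a small stand-alone comparison. This is a sketch under assumptions, not the code from the patch: both function names are hypothetical, and the release string is parsed with `sscanf` rather than `ovs_scan`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Parses the leading "major.minor" from a release string such as
 * "6.8.0-rc4".  Returns false if the string does not start that way. */
bool
parse_kernel_release(const char *release, int *major, int *minor)
{
    assert(release && major && minor);
    return sscanf(release, "%d.%d", major, minor) == 2;
}

/* Returns true if cur_major.cur_minor is the target version or newer.
 * Note the ">=" on the minor number: an exact match counts as
 * "version or newer". */
bool
version_is_at_least(int cur_major, int cur_minor,
                    int target_major, int target_minor)
{
    return cur_major > target_major
           || (cur_major == target_major && cur_minor >= target_minor);
}
```

With a strict ">" on the minor number, a 6.8 kernel checked against a 6.8 target would be reported as too old, which is the case the review question is about.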

Re: [ovs-dev] [PATCH 5/5] Documentation: Update links to upstream Kernel documentation.

2024-02-27 Thread Mike Pattrick

On Tue, Feb 27, 2024 at 10:37 AM Simon Horman  wrote:
>
> This updates links to several upstream Kernel documents.
>
> 1. Lore is now the canonical archive for the netdev mailing list
>
> 2. net-next is now maintained by the netdev team,
>of which David Miller is currently a member,
>rather than only by David.
>
>Also, use HTTPS rather than HTTP.
>
> 3. The Netdev FAQ has evolved into the Netdev Maintainer Handbook.
>
> 4. The Kernel security document link was dead,
>provide the current canonical location for this document instead.
>
> 1., 2. & 3. Found by inspection
> 4. Flagged by check-docs
>
> Signed-off-by: Simon Horman 

Acked-by: Mike Pattrick 

Re: [ovs-dev] [PATCH 4/5] Documentatoin: Update Pacemaker link.

2024-02-27 Thread Mike Pattrick

On Tue, Feb 27, 2024 at 10:36 AM Simon Horman  wrote:
>
> Update link to OCF Resource Agents documentation as the existing link
> is broken. Also, use HTTPS.
>
> Broken link flagged by make check-docs
>
> Signed-off-by: Simon Horman 

Acked-by: Mike Pattrick 

Re: [ovs-dev] [PATCH 3/5] Documentation: Anuket project updates.

2024-02-27 Thread Mike Pattrick

On Tue, Feb 27, 2024 at 10:36 AM Simon Horman  wrote:
>
> The Anuket project was formed by a merger of OPNFV and CNTT [1].
>
> Also, VswitchPerf, aka vsperf, formerly an OPNFV project,
> has been renamed ViNePerf [2].
>
> Update links and documentation accordingly.
>
> The old links were broken, this was flagged by make check-docs
>
> [1] https://anuket.io/news/2021/01/27/lf-networking-launches-anuket-an-open-source-project-to-accelerate-infrastructure-compliance-interoperability-and-5g-deployments/
> [2] https://docs.opnfv.org/projects/vineperf/en/latest/release/release-notes/release-notes.html
>
> Signed-off-by: Simon Horman 
> ---

Acked-by: Mike Pattrick 

Re: [ovs-dev] [PATCH 2/5] Documentation: Correct spelling errors.

2024-02-27 Thread Mike Pattrick

On Tue, Feb 27, 2024 at 10:37 AM Simon Horman  wrote:
>
> Correct spelling errors in .rst files flagged by codespell.
>
> Signed-off-by: Simon Horman 
> ---

These look correct to me.

Acked-by: Mike Pattrick 

Re: [ovs-dev] [PATCH 1/5] Documentation: Extend copyright to 2024.

2024-02-27 Thread Mike Pattrick

On Tue, Feb 27, 2024 at 10:36 AM Simon Horman  wrote:
>
> IANAL, but I think we can extend the copyright attached
> to documentation to cover the current year: we are still
> actively working on the documentation.
>
> Signed-off-by: Simon Horman 

Acked-by: Mike Pattrick 

I wonder if it's valid to set the end date to datetime.now().year

Re: [ovs-dev] [PATCH v2] conntrack: Fix flush not flushing all elements.

2024-02-27 Thread Mike Pattrick

On Mon, Feb 26, 2024 at 5:50 AM Xavier Simonart  wrote:
>
> On netdev datapath, when a ct element was cleaned, the cmap
> could be shrunk, potentially causing some elements to be skipped
> in the flush iteration.
>
> Fixes: 967bb5c5cd90 ("conntrack: Add rcu support.")
> Signed-off-by: Xavier Simonart 

Thank you for the patch, I was able to test this out, verify the issue
is as you described, and that your patch fixes the problem.

> ---
> v2: - Updated commit message.
> - Use compose-packet instead of hex packet content.
> - Use dnl for comments.
> - Remove unnecessary errors in OVS_TRAFFIC_VSWITCHD_STOP.
> - Rebased on origin/master.
> ---
>  lib/conntrack.c | 14 
>  lib/conntrack.h |  1 +
>  tests/system-traffic.at | 47 +
>  3 files changed, 52 insertions(+), 10 deletions(-)
>
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index 8a7056bac..5786424f6 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -2651,25 +2651,19 @@ conntrack_dump_start(struct conntrack *ct, struct 
> conntrack_dump *dump,
>
>  dump->ct = ct;
>  *ptot_bkts = 1; /* Need to clean up the callers. */
> +dump->cursor = cmap_cursor_start(&ct->conns);
>  return 0;
>  }
>
>  int
>  conntrack_dump_next(struct conntrack_dump *dump, struct ct_dpif_entry *entry)
>  {
> -struct conntrack *ct = dump->ct;
>  long long now = time_msec();
>
> -for (;;) {
> -struct cmap_node *cm_node = cmap_next_position(&ct->conns,
> -   &dump->cm_pos);
> -if (!cm_node) {
> -break;
> -}
> -struct conn_key_node *keyn;
> -struct conn *conn;
> +struct conn_key_node *keyn;
> +struct conn *conn;
>
> -INIT_CONTAINER(keyn, cm_node, cm_node);
> +CMAP_CURSOR_FOR_EACH_CONTINUE (keyn, cm_node, &dump->cursor) {
>  if (keyn->dir != CT_DIR_FWD) {
>  continue;
>  }
> diff --git a/lib/conntrack.h b/lib/conntrack.h
> index ee7da099e..aa12a1847 100644
> --- a/lib/conntrack.h
> +++ b/lib/conntrack.h
> @@ -109,6 +109,7 @@ struct conntrack_dump {
>  union {
>  struct cmap_position cm_pos;

cm_pos is now dead code.

>  struct hmap_position hmap_pos;
> +struct cmap_cursor cursor;
>  };
>  bool filter_zone;
>  uint16_t zone;
> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> index 98e494abf..34f93b2e5 100644
> --- a/tests/system-traffic.at
> +++ b/tests/system-traffic.at
> @@ -8389,6 +8389,53 @@ AT_CHECK([ovs-pcap client.pcap | grep 
> 20102000], [0], [dnl
>  OVS_TRAFFIC_VSWITCHD_STOP
>  AT_CLEANUP
>
> +AT_SETUP([conntrack - Flush many conntrack entries by port])
> +CHECK_CONNTRACK()
> +OVS_TRAFFIC_VSWITCHD_START()
> +
> +ADD_NAMESPACES(at_ns0, at_ns1)
> +
> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
> +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24")
> +
> +AT_DATA([flows.txt], [dnl
> +priority=100,in_port=1,udp,action=ct(zone=1,commit),2
> +])
> +
> +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> +
> +dnl 20 packets from port 1 and 1 packet from port 2.
> +flow_l3="\
> +eth_src=50:54:00:00:00:09,eth_dst=50:54:00:00:00:0a,dl_type=0x0800,\
> +nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_proto=17,nw_ttl=64,nw_frag=no"
> +
> +for i in $(seq 1 20); do
> +frame=$(ovs-ofctl compose-packet --bare "$flow_l3, udp_src=1,udp_dst=$i")
> +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 
> packet=$frame actions=resubmit(,0)"])
> +done
> +frame=$(ovs-ofctl compose-packet --bare "$flow_l3, udp_src=2,udp_dst=1")
> +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 packet=$frame 
> actions=resubmit(,0)"])
> +
> +: > conntrack
> +
> +for i in $(seq 1 20); do
> +echo 
> "udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=${i}),reply=(src=10.1.1.2,dst=10.1.1.1,sport=${i},dport=1),zone=1"
>  >> conntrack
> +done
> +echo 
> "udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=2,dport=1),reply=(src=10.1.1.2,dst=10.1.1.1,sport=1,dport=2),zone=1"
>  >> conntrack
> +
> +sort conntrack > expout
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep -F "src=10.1.1.1," | sort 
> ], [0], [expout])
> +
> +dnl Check that flushing conntrack by port 1 flush all ct for port 1 but 
> keeps ct for port 2.
> +AT_CHECK([ovs-appctl dpctl/flush-conntrack 'ct_nw_proto=17,ct_tp_src=1'])
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep -F "src=10.1.1.1," | sort 
> ], [0], [dnl
> +udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=2,dport=1),reply=(src=10.1.1.2,dst=10.1.1.1,sport=1,dport=2),zone=1
> +])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
>  AT_BANNER([IGMP])
>
>  AT_SETUP([IGMP - flood under normal action])
> --
> 2.41.0
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
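
The shape of the new `conntrack_dump_next()` in the quoted patch, resuming from a saved cursor on each call and skipping entries that are not in the forward direction, can be sketched over a plain array. This is an analogy only: it does not use or model OVS's cmap, it says nothing about concurrent resizing, and every type and function name here is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum dir { DIR_FWD, DIR_REV };      /* Hypothetical direction tag. */

struct entry {
    enum dir dir;
    int id;
};

/* Saved iteration state, kept by the caller between calls. */
struct dump_cursor {
    const struct entry *entries;
    size_t n;
    size_t next;                    /* Index of the next entry to visit. */
};

void
dump_start(struct dump_cursor *c, const struct entry *entries, size_t n)
{
    assert(c != NULL && (n == 0 || entries != NULL));
    c->entries = entries;
    c->n = n;
    c->next = 0;
}

/* Stores the id of the next forward-direction entry in '*id' and returns
 * true, or returns false once the entries are exhausted. */
bool
dump_next(struct dump_cursor *c, int *id)
{
    while (c->next < c->n) {
        const struct entry *e = &c->entries[c->next++];

        if (e->dir != DIR_FWD) {
            continue;               /* Report each connection only once. */
        }
        *id = e->id;
        return true;
    }
    return false;
}
```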

[ovs-dev] [PATCH v7 2/2] ofproto-dpif-mirror: Add support for pre-selection filter.

2024-02-26 Thread Mike Pattrick

Currently a bridge mirror will collect all packets and tools like
ovs-tcpdump can apply additional filters after they have already been
duplicated by vswitchd. This can result in inefficient collection.

This patch adds support to apply pre-selection to bridge mirrors, which
can limit which packets are mirrored based on flow metadata. This
significantly improves overall vswitchd performance during mirroring if
only a subset of traffic is required.
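
The core of the pre-selection check is a masked comparison: two sets of field values count as matching if they agree on every bit that the mask marks as significant. The `miniflow_equal_flow_in_flow_wc()` function in the diff that follows does this with `(a ^ b) & mask` per 64-bit word. A minimal stand-alone sketch of the same idea over plain arrays, with a hypothetical function name and none of the miniflow machinery, might look like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if 'a' and 'b' agree on every bit that is set in 'mask',
 * word by word.  Bits that are clear in 'mask' are ignored. */
bool
words_equal_in_mask(const uint64_t *a, const uint64_t *b,
                    const uint64_t *mask, size_t n)
{
    assert(a && b && mask);
    for (size_t i = 0; i < n; i++) {
        /* XOR leaves a 1 wherever the inputs differ; AND keeps only the
         * differences the mask cares about. */
        if ((a[i] ^ b[i]) & mask[i]) {
            return false;
        }
    }
    return true;
}
```

For example, two IPv4 addresses in the same /24 compare equal under a mask of 0xffffff00 but differ under 0xffffffff.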

Signed-off-by: Mike Pattrick 
---
v7:
 - Make sure filter mask is added to masks of non-matching flows.
 - Added additional tests.
---
 Documentation/ref/ovs-tcpdump.8.rst |   8 +-
 NEWS|   3 +
 lib/flow.c  |  21 +++-
 lib/flow.h  |  12 +++
 ofproto/ofproto-dpif-mirror.c   |  78 ++-
 ofproto/ofproto-dpif-mirror.h   |  12 ++-
 ofproto/ofproto-dpif-xlate.c|  26 -
 ofproto/ofproto-dpif.c  |   9 +-
 ofproto/ofproto-dpif.h  |   6 ++
 ofproto/ofproto.c   |   4 +-
 ofproto/ofproto.h   |   3 +
 tests/ofproto-dpif.at   | 142 
 utilities/ovs-tcpdump.in|  13 ++-
 vswitchd/bridge.c   |  13 ++-
 vswitchd/vswitch.ovsschema  |   5 +-
 vswitchd/vswitch.xml|  13 +++
 16 files changed, 343 insertions(+), 25 deletions(-)

diff --git a/Documentation/ref/ovs-tcpdump.8.rst 
b/Documentation/ref/ovs-tcpdump.8.rst
index b9f8cdf6f..e21e61211 100644
--- a/Documentation/ref/ovs-tcpdump.8.rst
+++ b/Documentation/ref/ovs-tcpdump.8.rst
@@ -61,8 +61,14 @@ Options
 
   If specified, mirror all ports (optional).
 
+* ``--filter ``
+
+  If specified, only mirror flows that match the provided OpenFlow filter.
+  The available fields are documented in ``ovs-fields(7)``.
+
 See Also
 
 
 ``ovs-appctl(8)``, ``ovs-vswitchd(8)``, ``ovs-pcap(1)``,
-``ovs-tcpundump(1)``, ``tcpdump(8)``, ``wireshark(8)``.
+``ovs-fields(7)``, ``ovs-tcpundump(1)``, ``tcpdump(8)``,
+``wireshark(8)``.
diff --git a/NEWS b/NEWS
index c9e4064e6..35f7eb0c7 100644
--- a/NEWS
+++ b/NEWS
@@ -4,6 +4,9 @@ Post-v3.3.0
  * Conntrack now supports 'random' flag for selecting ports in a range
while natting and 'persistent' flag for selection of the IP address
from a range.
+   - OVSDB:
+ * Added a new filter column in the Mirror table which can be used to
+   apply filters to mirror ports.
 
 
 v3.3.0 - 16 Feb 2024
diff --git a/lib/flow.c b/lib/flow.c
index 8e3402388..a088bdc86 100644
--- a/lib/flow.c
+++ b/lib/flow.c
@@ -3569,7 +3569,7 @@ miniflow_equal_in_minimask(const struct miniflow *a, 
const struct miniflow *b,
 return true;
 }
 
-/* Returns true if 'a' and 'b' are equal at the places where there are 1-bits
+/* Returns true if 'a' and 'b' are equal at the places where there are 0-bits
  * in 'mask', false if they differ. */
 bool
 miniflow_equal_flow_in_minimask(const struct miniflow *a, const struct flow *b,
@@ -3587,6 +3587,25 @@ miniflow_equal_flow_in_minimask(const struct miniflow 
*a, const struct flow *b,
 return true;
 }
 
+/* Returns false if 'a' and 'b' differ in places where there are 1-bits in
+ * 'wc', true otherwise. */
+bool
+miniflow_equal_flow_in_flow_wc(const struct miniflow *a, const struct flow *b,
+   const struct flow_wildcards *wc)
+{
+const struct flow *wc_masks = &wc->masks;
+size_t idx;
+
+FLOWMAP_FOR_EACH_INDEX (idx, a->map) {
+if ((miniflow_get(a, idx) ^ flow_u64_value(b, idx)) &
+flow_u64_value(wc_masks, idx)) {
+return false;
+}
+}
+
+return true;
+}
+
 
 void
 minimask_init(struct minimask *mask, const struct flow_wildcards *wc)
diff --git a/lib/flow.h b/lib/flow.h
index 75a9be3c1..a644be39d 100644
--- a/lib/flow.h
+++ b/lib/flow.h
@@ -748,6 +748,9 @@ bool miniflow_equal_in_minimask(const struct miniflow *a,
 bool miniflow_equal_flow_in_minimask(const struct miniflow *a,
  const struct flow *b,
  const struct minimask *);
+bool miniflow_equal_flow_in_flow_wc(const struct miniflow *a,
+const struct flow *b,
+const struct flow_wildcards *);
 uint32_t miniflow_hash_5tuple(const struct miniflow *flow, uint32_t basis);
 
 
@@ -939,6 +942,15 @@ flow_union_with_miniflow(struct flow *dst, const struct 
miniflow *src)
 flow_union_with_miniflow_subset(dst, src, src->map);
 }
 
+/* Perform a bitwise OR of minimask 'src' mask data with the equivalent
+ * fields in 'dst', storing the result in 'dst'. */
+static inline void
+flow_wildcards_union_with_minimask(struct flow_wildcards *dst,
+   const struct minimask *src)
+{
+flow_union_with_miniflow_subset(&dst->masks, &src->masks, src->masks.map);
+}
+
 static inline bool is_ct_valid(c

[ovs-dev] [PATCH v7 1/2] ofproto-dpif-mirror: Reduce number of function parameters.

2024-02-26 Thread Mike Pattrick

Previously the mirror_set() and mirror_get() functions took a large
number of parameters, which was inefficient and difficult to read and
extend. This patch moves most of the parameters into a struct.
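
The refactoring pattern described above, replacing a long positional parameter list with a pointer to a settings struct, can be sketched as follows. The struct and function names are hypothetical and much smaller than the `ofproto_mirror_settings` and `mirror_bundles` structs used in the actual patch; the sketch only shows the calling convention and the up-front NULL check.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bundle of mirror settings passed as one argument. */
struct mirror_cfg {
    const char *name;
    const int *srcs;        /* Source port ids, 'n_srcs' of them. */
    size_t n_srcs;
    uint16_t snaplen;
    uint16_t out_vlan;
};

/* Hypothetical applied state. */
struct mirror_state {
    size_t n_srcs;
    uint16_t snaplen;
    uint16_t out_vlan;
};

/* Applies 'cfg' to 'state'.  Returns 0 on success or EINVAL if either
 * pointer is missing. */
int
mirror_apply(struct mirror_state *state, const struct mirror_cfg *cfg)
{
    if (!state || !cfg) {
        return EINVAL;
    }
    assert(cfg->n_srcs == 0 || cfg->srcs != NULL);

    state->n_srcs = cfg->n_srcs;
    state->snaplen = cfg->snaplen;
    state->out_vlan = cfg->out_vlan;
    return 0;
}
```

Adding a new setting then means adding a struct member rather than touching every call site, which is what makes the follow-up filter patch in this series easier to write.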

Signed-off-by: Mike Pattrick 
Acked-by: Simon Horman 
Acked-by: Eelco Chaudron 
---
 ofproto/ofproto-dpif-mirror.c | 61 ++-
 ofproto/ofproto-dpif-mirror.h | 42 +++-
 ofproto/ofproto-dpif-xlate.c  | 29 -
 ofproto/ofproto-dpif.c| 23 ++---
 4 files changed, 91 insertions(+), 64 deletions(-)

diff --git a/ofproto/ofproto-dpif-mirror.c b/ofproto/ofproto-dpif-mirror.c
index 343b75f0e..a84c843b3 100644
--- a/ofproto/ofproto-dpif-mirror.c
+++ b/ofproto/ofproto-dpif-mirror.c
@@ -207,19 +207,23 @@ mirror_bundle_dst(struct mbridge *mbridge, struct 
ofbundle *ofbundle)
 }
 
 int
-mirror_set(struct mbridge *mbridge, void *aux, const char *name,
-   struct ofbundle **srcs, size_t n_srcs,
-   struct ofbundle **dsts, size_t n_dsts,
-   unsigned long *src_vlans, struct ofbundle *out_bundle,
-   uint16_t snaplen,
-   uint16_t out_vlan)
+mirror_set(struct mbridge *mbridge, void *aux,
+   const struct ofproto_mirror_settings *ms,
+   const struct mirror_bundles *mb)
+
 {
 struct mbundle *mbundle, *out;
 mirror_mask_t mirror_bit;
 struct mirror *mirror;
 struct hmapx srcs_map;  /* Contains "struct ofbundle *"s. */
 struct hmapx dsts_map;  /* Contains "struct ofbundle *"s. */
+uint16_t out_vlan;
+
+if (!ms || !mbridge) {
+return EINVAL;
+}
 
+out_vlan = ms->out_vlan;
 mirror = mirror_lookup(mbridge, aux);
 if (!mirror) {
 int idx;
@@ -227,7 +231,7 @@ mirror_set(struct mbridge *mbridge, void *aux, const char 
*name,
 idx = mirror_scan(mbridge);
 if (idx < 0) {
 VLOG_WARN("maximum of %d port mirrors reached, cannot create %s",
-  MAX_MIRRORS, name);
+  MAX_MIRRORS, ms->name);
 return EFBIG;
 }
 
@@ -242,8 +246,8 @@ mirror_set(struct mbridge *mbridge, void *aux, const char 
*name,
 unsigned long *vlans = ovsrcu_get(unsigned long *, &mirror->vlans);
 
 /* Get the new configuration. */
-if (out_bundle) {
-out = mbundle_lookup(mbridge, out_bundle);
+if (mb->out_bundle) {
+out = mbundle_lookup(mbridge, mb->out_bundle);
 if (!out) {
 mirror_destroy(mbridge, mirror->aux);
 return EINVAL;
@@ -252,16 +256,16 @@ mirror_set(struct mbridge *mbridge, void *aux, const char 
*name,
 } else {
 out = NULL;
 }
-mbundle_lookup_multiple(mbridge, srcs, n_srcs, &srcs_map);
-mbundle_lookup_multiple(mbridge, dsts, n_dsts, &dsts_map);
+mbundle_lookup_multiple(mbridge, mb->srcs, mb->n_srcs, &srcs_map);
+mbundle_lookup_multiple(mbridge, mb->dsts, mb->n_dsts, &dsts_map);
 
 /* If the configuration has not changed, do nothing. */
 if (hmapx_equals(&srcs_map, &mirror->srcs)
 && hmapx_equals(&dsts_map, &mirror->dsts)
-&& vlan_bitmap_equal(vlans, src_vlans)
+&& vlan_bitmap_equal(vlans, ms->src_vlans)
 && mirror->out == out
 && mirror->out_vlan == out_vlan
-&& mirror->snaplen == snaplen)
+&& mirror->snaplen == ms->snaplen)
 {
 hmapx_destroy(&srcs_map);
 hmapx_destroy(&dsts_map);
@@ -275,15 +279,15 @@ mirror_set(struct mbridge *mbridge, void *aux, const char 
*name,
 hmapx_swap(&dsts_map, &mirror->dsts);
 hmapx_destroy(&dsts_map);
 
-if (vlans || src_vlans) {
+if (vlans || ms->src_vlans) {
 ovsrcu_postpone(free, vlans);
-vlans = vlan_bitmap_clone(src_vlans);
+vlans = vlan_bitmap_clone(ms->src_vlans);
 ovsrcu_set(&mirror->vlans, vlans);
 }
 
 mirror->out = out;
 mirror->out_vlan = out_vlan;
-mirror->snaplen = snaplen;
+mirror->snaplen = ms->snaplen;
 
 /* Update mbundles. */
 mirror_bit = MIRROR_MASK_C(1) << mirror->idx;
@@ -406,23 +410,22 @@ mirror_update_stats(struct mbridge *mbridge, 
mirror_mask_t mirrors,
 /* Retrieves the mirror numbered 'index' in 'mbridge'.  Returns true if such a
  * mirror exists, false otherwise.
  *
- * If successful, '*vlans' receives the mirror's VLAN membership information,
+ * If successful 'mc->vlans' receives the mirror's VLAN membership information,
  * either a null pointer if the mirror includes all VLANs or a 4096-bit bitmap
  * in which a 1-bit indicates that the mirror includes a particular VLAN,
- * '*dup_mirrors' receives a bitmap of mirrors whose output duplicates mirror
- * 'index', '*out' receives the output ofbundle (if any), and '*out_vlan'
- * receives the output VLAN (if any).
+ * 'mc->dup_mirrors' receives a bitmap of mirrors whose output duplicates
+ * mirror 'index

[ovs-dev] [PATCH v2] dp-packet: Don't offload inner csum if outer isn't supported.

2024-02-26 Thread Mike Pattrick

Some network cards support inner checksum offloading but not outer
checksum offloading. Currently OVS will resolve that outer checksum but
allows the network card to resolve the inner checksum, invalidating the
outer checksum in the process.

Now if we can't offload outer checksums, we don't offload inner either.
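
The rule stated above can be sketched as a pure function over capability flags. The flag names and bit positions below are hypothetical placeholders, not the real `NETDEV_TX_OFFLOAD_*` values, and the function is only an illustration of the decision, not the code in `dp_packet_ol_send_prepare()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bits for a transmit interface. */
#define TX_OFFLOAD_IPV4_CKSUM      (UINT64_C(1) << 0)
#define TX_OFFLOAD_TCP_CKSUM       (UINT64_C(1) << 1)
#define TX_OFFLOAD_UDP_CKSUM       (UINT64_C(1) << 2)
#define TX_OFFLOAD_SCTP_CKSUM      (UINT64_C(1) << 3)
#define TX_OFFLOAD_OUTER_UDP_CKSUM (UINT64_C(1) << 4)

#define TX_INNER_CKSUM_MASK (TX_OFFLOAD_IPV4_CKSUM | TX_OFFLOAD_TCP_CKSUM \
                             | TX_OFFLOAD_UDP_CKSUM | TX_OFFLOAD_SCTP_CKSUM)

/* Returns the offloads that may still be used for a packet.  If the packet
 * is tunneled, needs an outer UDP checksum, and the interface cannot
 * compute it, inner checksum offloads are cleared as well, so that the
 * hardware does not modify the inner packet after software has computed
 * the outer checksum. */
uint64_t
effective_tx_offloads(uint64_t flags, bool is_tunnel,
                      bool needs_outer_udp_cksum)
{
    /* An outer checksum only makes sense for tunneled packets. */
    assert(is_tunnel || !needs_outer_udp_cksum);

    if (is_tunnel && needs_outer_udp_cksum
        && !(flags & TX_OFFLOAD_OUTER_UDP_CKSUM)) {
        flags &= ~TX_INNER_CKSUM_MASK;
    }
    return flags;
}
```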

Reported-at: https://issues.redhat.com/browse/FDP-363
Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.")
Signed-off-by: Mike Pattrick 
---
nb: I also tested a more complex patch that only resolved the inner
checksum and offloaded the UDP layer. This didn't noticeably improve
performance.
v2: Added IPv4 flag
---
 lib/dp-packet.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 305822293..df7bf8e6b 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -592,6 +592,18 @@ dp_packet_ol_send_prepare(struct dp_packet *p, uint64_t 
flags)
 if (dp_packet_hwol_is_tunnel_geneve(p) ||
 dp_packet_hwol_is_tunnel_vxlan(p)) {
 tnl_inner = true;
+
+/* If the TX interface doesn't support UDP tunnel offload but does
+ * support inner checksum offload and an outer UDP checksum is
+ * required, then we can't offload inner checksum either. As that would
+ * invalidate the outer checksum. */
+if (!(flags & NETDEV_TX_OFFLOAD_OUTER_UDP_CKSUM) &&
+dp_packet_hwol_is_outer_udp_cksum(p)) {
+flags &= ~(NETDEV_TX_OFFLOAD_TCP_CKSUM |
+   NETDEV_TX_OFFLOAD_UDP_CKSUM |
+   NETDEV_TX_OFFLOAD_SCTP_CKSUM |
+   NETDEV_TX_OFFLOAD_IPV4_CKSUM);
+}
 }
 
 if (dp_packet_hwol_tx_ip_csum(p)) {
-- 
2.39.3

[ovs-dev] [PATCH v2] ovsdb: Don't iterate over rows on empty mutation.

2024-02-22 Thread Mike Pattrick

Previously when an empty mutation was used to count the number of rows
in a table, OVSDB would iterate over all rows twice. First to perform an
RBAC check, and then to perform the no-operation.

This change adds a short circuit to mutate operations with no conditions
and an empty mutation set, returning immediately. One notable change in
functionality is not performing the RBAC check in this condition, as no
mutation actually takes place.

Reported-by: Terry Wilson 
Reported-at: https://issues.redhat.com/browse/FDP-359
Signed-off-by: Mike Pattrick 
---
v2: Added additional non-rbac tests, and support for conditional
counting without the rbac check
---
 ovsdb/execution.c| 26 +++-
 ovsdb/mutation.h |  6 +
 tests/ovsdb-execution.at | 51 
 tests/ovsdb-rbac.at  | 23 ++
 4 files changed, 105 insertions(+), 1 deletion(-)

diff --git a/ovsdb/execution.c b/ovsdb/execution.c
index 8c20c3b54..7ed700632 100644
--- a/ovsdb/execution.c
+++ b/ovsdb/execution.c
@@ -585,6 +585,19 @@ mutate_row_cb(const struct ovsdb_row *row, void *mr_)
 return *mr->error == NULL;
 }
 
+struct count_row_cbdata {
+size_t n_matches;
+};
+
+static bool
+count_row_cb(const struct ovsdb_row *row OVS_UNUSED, void *cr_)
+{
+struct count_row_cbdata *cr = cr_;
+
+cr->n_matches++;
+return true;
+}
+
 static struct ovsdb_error *
 ovsdb_execute_mutate(struct ovsdb_execution *x, struct ovsdb_parser *parser,
  struct json *result)
@@ -609,7 +622,18 @@ ovsdb_execute_mutate(struct ovsdb_execution *x, struct 
ovsdb_parser *parser,
 error = ovsdb_condition_from_json(table->schema, where, x->symtab,
   &condition);
 }
-if (!error) {
+if (!error && ovsdb_mutation_set_empty(&mutations)) {
+/* Special case with no mutations, just return the row count. */
+if (ovsdb_condition_empty(&condition)) {
+json_object_put(result, "count",
+json_integer_create(hmap_count(&table->rows)));
+} else {
+struct count_row_cbdata cr = {};
+ovsdb_query(table, &condition, count_row_cb, &cr);
+json_object_put(result, "count",
+json_integer_create(cr.n_matches));
+}
+} else if (!error) {
 mr.n_matches = 0;
 mr.txn = x->txn;
 mr.mutations = &mutations;
diff --git a/ovsdb/mutation.h b/ovsdb/mutation.h
index 7566ef199..05d4a262a 100644
--- a/ovsdb/mutation.h
+++ b/ovsdb/mutation.h
@@ -69,4 +69,10 @@ void ovsdb_mutation_set_destroy(struct ovsdb_mutation_set *);
 struct ovsdb_error *ovsdb_mutation_set_execute(
 struct ovsdb_row *, const struct ovsdb_mutation_set *) 
OVS_WARN_UNUSED_RESULT;
 
+static inline bool ovsdb_mutation_set_empty(
+const struct ovsdb_mutation_set *ms)
+{
+return ms->n_mutations == 0;
+}
+
 #endif /* ovsdb/mutation.h */
diff --git a/tests/ovsdb-execution.at b/tests/ovsdb-execution.at
index fd1c7a239..1ffa2b738 100644
--- a/tests/ovsdb-execution.at
+++ b/tests/ovsdb-execution.at
@@ -1201,4 +1201,55 @@ OVSDB_CHECK_EXECUTION([garbage collection],
 [{"rows":[]}]
 ]])])
 
+OVSDB_CHECK_EXECUTION([insert rows, count with mutation],
+  [ordinal_schema],
+  "ordinals",
+  {"op": "insert",
+   "table": "ordinals",
+   "row": {"number": 0, "name": "zero"},
+   "uuid-name": "first"}]]],
+   [[["ordinals",
+  {"op": "insert",
+   "table": "ordinals",
+   "row": {"number": 1, "name": "one"},
+   "uuid-name": "first"}]]],
+   [[["ordinals",
+  {"op": "mutate",
+   "table": "ordinals",
+   "where": [["name", "==", "zero"]],
+   "mutations": []}]]],
+   [[["ordinals",
+  {"op": "mutate",
+   "table": "ordinals",
+   "where": [["name", "==", "one"]],
+   "mutations": []}]]],
+   [[["ordinals",
+  {"op": "insert",
+   "table": "ordinals",
+   "row": {"number": 2, "name": "one"},
+   "uuid-name": "first"}]]],
+   [[["ordinals",
+  {"op": "mutate",
+   "table": "ordinals",
+   "where": [["name", "==", "one"]],
+   "mutations": []}]]],
+   [[["ordinals",
+  {"op": "delete",
+   "table": "ordinals",
+   "where": [["name", &

Re: [ovs-dev] [PATCH] Userspace: Software fallback for UDP encapsulated TCP segmentation.

2024-02-21 Thread Mike Pattrick

On Tue, Feb 20, 2024 at 11:09 PM Mike Pattrick  wrote:
>
> When sending packets that are flagged as requiring segmentation to an
> interface that doesn't support this feature, send the packet to the TSO
> software fallback instead of dropping it.
>
> Signed-off-by: Mike Pattrick 

Recheck-request: github-robot
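
For context on the software fallback, the number of segments produced for one oversized packet is the TCP payload length divided by the segment size, rounded up; the quoted `dp_packet_gso_nr_segs()` below expresses this with `DIV_ROUND_UP(data_tail - data_pos, segsz)`. A stand-alone sketch of that arithmetic, using a hypothetical function name and plain lengths instead of packet pointers:

```c
#include <assert.h>
#include <stddef.h>

/* Returns how many segments of at most 'segsz' payload bytes are needed to
 * carry 'payload_len' bytes.  Equivalent to a round-up integer division. */
size_t
gso_segment_count(size_t payload_len, size_t segsz)
{
    assert(segsz > 0);
    return (payload_len + segsz - 1) / segsz;
}
```

For a tunneled packet the payload in question is the inner TCP payload, which is why the patch switches to the inner payload pointer for VXLAN and Geneve.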

> ---
>  lib/dp-packet-gso.c | 73 +
>  lib/dp-packet.h | 26 +++
>  lib/netdev-native-tnl.c |  8 +
>  lib/netdev.c| 37 +
>  tests/system-traffic.at | 58 
>  5 files changed, 167 insertions(+), 35 deletions(-)
>
> diff --git a/lib/dp-packet-gso.c b/lib/dp-packet-gso.c
> index 847685ad9..f25abf436 100644
> --- a/lib/dp-packet-gso.c
> +++ b/lib/dp-packet-gso.c
> @@ -47,6 +47,8 @@ dp_packet_gso_seg_new(const struct dp_packet *p, size_t 
> hdr_len,
>  seg->l2_5_ofs = p->l2_5_ofs;
>  seg->l3_ofs = p->l3_ofs;
>  seg->l4_ofs = p->l4_ofs;
> +seg->inner_l3_ofs = p->inner_l3_ofs;
> +seg->inner_l4_ofs = p->inner_l4_ofs;
>
>  /* The protocol headers remain the same, so preserve hash and mark. */
>  *dp_packet_rss_ptr(seg) = *dp_packet_rss_ptr(p);
> @@ -71,7 +73,12 @@ dp_packet_gso_nr_segs(struct dp_packet *p)
>  const char *data_tail;
>  const char *data_pos;
>
> -data_pos = dp_packet_get_tcp_payload(p);
> +if (dp_packet_hwol_is_tunnel_vxlan(p) ||
> +dp_packet_hwol_is_tunnel_geneve(p)) {
> +data_pos = dp_packet_get_inner_tcp_payload(p);
> +} else {
> +data_pos = dp_packet_get_tcp_payload(p);
> +}
>  data_tail = (char *) dp_packet_tail(p) - dp_packet_l2_pad_size(p);
>
>  return DIV_ROUND_UP(data_tail - data_pos, segsz);
> @@ -91,12 +98,15 @@ dp_packet_gso(struct dp_packet *p, struct dp_packet_batch 
> **batches)
>  struct tcp_header *tcp_hdr;
>  struct ip_header *ip_hdr;
>  struct dp_packet *seg;
> +const char *data_pos;
>  uint16_t tcp_offset;
>  uint16_t tso_segsz;
> +uint16_t ip_id = 0;
>  uint32_t tcp_seq;
> -uint16_t ip_id;
> +bool outer_ipv4;
>  int hdr_len;
>  int seg_len;
> +bool tnl;
>
>  tso_segsz = dp_packet_get_tso_segsz(p);
>  if (!tso_segsz) {
> @@ -105,20 +115,35 @@ dp_packet_gso(struct dp_packet *p, struct 
> dp_packet_batch **batches)
>  return false;
>  }
>
> -tcp_hdr = dp_packet_l4(p);
> -tcp_offset = TCP_OFFSET(tcp_hdr->tcp_ctl);
> -tcp_seq = ntohl(get_16aligned_be32(&tcp_hdr->tcp_seq));
> -hdr_len = ((char *) dp_packet_l4(p) - (char *) dp_packet_eth(p))
> -  + tcp_offset * 4;
> -ip_id = 0;
> -if (dp_packet_hwol_is_ipv4(p)) {
> +if (dp_packet_hwol_is_tunnel_vxlan(p) ||
> +dp_packet_hwol_is_tunnel_geneve(p)) {
> +data_pos =  dp_packet_get_inner_tcp_payload(p);
> +outer_ipv4 = dp_packet_hwol_is_outer_ipv4(p);
> +tcp_hdr = dp_packet_inner_l4(p);
> +ip_hdr = dp_packet_inner_l3(p);
> +tnl = true;
> +if (outer_ipv4) {
> +ip_id = ntohs(((struct ip_header *) dp_packet_l3(p))->ip_id);
> +} else if (dp_packet_hwol_is_ipv4(p)) {
> +ip_id = ntohs(ip_hdr->ip_id);
> +}
> +} else {
> +data_pos = dp_packet_get_tcp_payload(p);
> +outer_ipv4 = dp_packet_hwol_is_ipv4(p);
> +tcp_hdr = dp_packet_l4(p);
>  ip_hdr = dp_packet_l3(p);
> -ip_id = ntohs(ip_hdr->ip_id);
> +tnl = false;
> +if (outer_ipv4) {
> +ip_id = ntohs(ip_hdr->ip_id);
> +}
>  }
>
> +tcp_offset = TCP_OFFSET(tcp_hdr->tcp_ctl);
> +tcp_seq = ntohl(get_16aligned_be32(&tcp_hdr->tcp_seq));
> +hdr_len = ((char *) tcp_hdr - (char *) dp_packet_eth(p))
> +  + tcp_offset * 4;
>  const char *data_tail = (char *) dp_packet_tail(p)
>  - dp_packet_l2_pad_size(p);
> -const char *data_pos = dp_packet_get_tcp_payload(p);
>  int n_segs = dp_packet_gso_nr_segs(p);
>
>  for (int i = 0; i < n_segs; i++) {
> @@ -130,8 +155,26 @@ dp_packet_gso(struct dp_packet *p, struct 
> dp_packet_batch **batches)
>  seg = dp_packet_gso_seg_new(p, hdr_len, data_pos, seg_len);
>  data_pos += seg_len;
>
> +if (tnl) {
> +/* Update tunnel L3 header. */
> +if (dp_packet_hwol_is_ipv4(seg)) {
> +ip_hdr = dp_packet_inner_l3(seg);
> +ip_hdr->ip_tot_len = htons(sizeof *ip_hdr +
> +   dp_packet_inn

[ovs-dev] [PATCH v3] userspace: Allow UDP zero checksum with IPv6 tunnels.

2024-02-21 Thread Mike Pattrick

This patch adopts the proposed RFC 6935 by allowing null UDP checksums
even if the tunnel protocol is IPv6. This is already supported by Linux
through the udp6zerocsumtx tunnel option. It is disabled by default and
IPv6 tunnels are flagged as requiring a checksum, but this patch enables
the user to set csum=false on IPv6 tunnels.

Signed-off-by: Mike Pattrick 
---
v2: Changed documentation, and added a NEWS item
v3: NEWS file merge conflict
---
 NEWS|  3 +++
 lib/netdev-native-tnl.c |  2 +-
 lib/netdev-vport.c  | 13 +++--
 lib/netdev.h|  9 -
 ofproto/tunnel.c| 11 +--
 tests/tunnel.at |  6 +++---
 vswitchd/vswitch.xml| 11 ---
 7 files changed, 43 insertions(+), 12 deletions(-)

diff --git a/NEWS b/NEWS
index c9e4064e6..3a75d3850 100644
--- a/NEWS
+++ b/NEWS
@@ -4,6 +4,9 @@ Post-v3.3.0
  * Conntrack now supports 'random' flag for selecting ports in a range
while natting and 'persistent' flag for selection of the IP address
from a range.
+ * IPv6 UDP tunnels will now honour the csum option. Configuring the
+   interface with "options:csum=false" now has the same effect in OVS
+   as the udp6zerocsumtx option has with kernel UDP tunnels.
 
 
 v3.3.0 - 16 Feb 2024
diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
index dee9ab344..e8258bc4e 100644
--- a/lib/netdev-native-tnl.c
+++ b/lib/netdev-native-tnl.c
@@ -424,7 +424,7 @@ udp_build_header(const struct netdev_tunnel_config *tnl_cfg,
 udp = netdev_tnl_ip_build_header(data, params, IPPROTO_UDP, 0);
 udp->udp_dst = tnl_cfg->dst_port;
 
-if (params->is_ipv6 || params->flow->tunnel.flags & FLOW_TNL_F_CSUM) {
+if (params->flow->tunnel.flags & FLOW_TNL_F_CSUM) {
 /* Write a value in now to mark that we should compute the checksum
  * later. 0xffff is handy because it is transparent to the
  * calculation. */
diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
index 60caa02fb..f9a778988 100644
--- a/lib/netdev-vport.c
+++ b/lib/netdev-vport.c
@@ -702,7 +702,9 @@ set_tunnel_config(struct netdev *dev_, const struct smap 
*args, char **errp)
 tnl_cfg.dst_port = htons(atoi(node->value));
 } else if (!strcmp(node->key, "csum") && has_csum) {
 if (!strcmp(node->value, "true")) {
-tnl_cfg.csum = true;
+tnl_cfg.csum = NETDEV_TNL_CSUM_ENABLED;
+} else if (!strcmp(node->value, "false")) {
+tnl_cfg.csum = NETDEV_TNL_CSUM_DISABLED;
 }
 } else if (!strcmp(node->key, "seq") && has_seq) {
 if (!strcmp(node->value, "true")) {
@@ -850,6 +852,11 @@ set_tunnel_config(struct netdev *dev_, const struct smap 
*args, char **errp)
 }
 }
 
+/* The default csum state for GRE is special. */
+if (tnl_cfg.csum == NETDEV_TNL_CSUM_DEFAULT && strstr(type, "gre")) {
+tnl_cfg.csum = NETDEV_TNL_CSUM_DEFAULT_GRE;
+}
+
 enum tunnel_layers layers = tunnel_supported_layers(type, &tnl_cfg);
 const char *full_type = (strcmp(type, "vxlan") ? type
  : (tnl_cfg.exts & (1 << OVS_VXLAN_EXT_GPE)
@@ -1026,8 +1033,10 @@ get_tunnel_config(const struct netdev *dev, struct smap 
*args)
 }
 }
 
-if (tnl_cfg->csum) {
+if (tnl_cfg->csum == NETDEV_TNL_CSUM_ENABLED) {
 smap_add(args, "csum", "true");
+} else if (tnl_cfg->csum == NETDEV_TNL_CSUM_DISABLED) {
+smap_add(args, "csum", "false");
 }
 
 if (tnl_cfg->set_seq) {
diff --git a/lib/netdev.h b/lib/netdev.h
index 67a8486bd..a79531e6d 100644
--- a/lib/netdev.h
+++ b/lib/netdev.h
@@ -111,6 +111,13 @@ enum netdev_srv6_flowlabel {
 SRV6_FLOWLABEL_COMPUTE,
 };
 
+enum netdev_tnl_csum {
+NETDEV_TNL_CSUM_DEFAULT,
+NETDEV_TNL_CSUM_ENABLED,
+NETDEV_TNL_CSUM_DISABLED,
+NETDEV_TNL_CSUM_DEFAULT_GRE,
+};
+
 /* Configuration specific to tunnels. */
 struct netdev_tunnel_config {
 ovs_be64 in_key;
@@ -139,7 +146,7 @@ struct netdev_tunnel_config {
 uint8_t tos;
 bool tos_inherit;
 
-bool csum;
+enum netdev_tnl_csum csum;
 bool dont_fragment;
 enum netdev_pt_mode pt_mode;
 
diff --git a/ofproto/tunnel.c b/ofproto/tunnel.c
index 80ddee78a..6f462874e 100644
--- a/ofproto/tunnel.c
+++ b/ofproto/tunnel.c
@@ -465,9 +465,14 @@ tnl_port_send(const struct ofport_dpif *ofport, struct 
flow *flow,
 
 flow->tunnel.flags &= ~(FLOW_TNL_F_MASK & ~FLOW_TNL_PUB_F_MASK);
 flow->tunnel.flags |= (cfg->dont_fragment ? FLOW_TNL_F_DONT_FRAGMENT : 0)
-| (cfg->csum ? FLOW_TNL_F_CSUM : 0)
 | (cfg->out_key_present ? FLOW_TNL_F_KEY : 0);
 
+if (cfg->csum == NETDEV_TNL_CSUM_ENABLED) {
+flo

[ovs-dev] [PATCH v2] userspace: Allow UDP zero checksum with IPv6 tunnels.

2024-02-21 Thread Mike Pattrick

This patch adopts the proposed RFC 6935 by allowing null UDP checksums
even if the tunnel protocol is IPv6. This is already supported by Linux
through the udp6zerocsumtx tunnel option. It is disabled by default and
IPv6 tunnels are flagged as requiring a checksum, but this patch enables
the user to set csum=false on IPv6 tunnels.

Signed-off-by: Mike Pattrick 
---
v2: Changed documentation, and added a NEWS item
---
 NEWS|  5 -
 lib/netdev-native-tnl.c |  2 +-
 lib/netdev-vport.c  | 13 +++--
 lib/netdev.h|  9 -
 ofproto/tunnel.c| 11 +--
 tests/tunnel.at |  6 +++---
 vswitchd/vswitch.xml| 11 ---
 7 files changed, 44 insertions(+), 13 deletions(-)

diff --git a/NEWS b/NEWS
index 0789dc0c6..84402ff8f 100644
--- a/NEWS
+++ b/NEWS
@@ -1,6 +1,9 @@
 Post-v3.3.0
 
-
+   - Userspace datapath:
+ * IPv6 UDP tunnels will now honour the csum option. Configuring the
+   interface with "options:csum=false" now has the same effect in OVS
+   as the udp6zerocsumtx option has with kernel UDP tunnels.
 
 v3.3.0 - 16 Feb 2024
 
diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
index dee9ab344..e8258bc4e 100644
--- a/lib/netdev-native-tnl.c
+++ b/lib/netdev-native-tnl.c
@@ -424,7 +424,7 @@ udp_build_header(const struct netdev_tunnel_config *tnl_cfg,
 udp = netdev_tnl_ip_build_header(data, params, IPPROTO_UDP, 0);
 udp->udp_dst = tnl_cfg->dst_port;
 
-if (params->is_ipv6 || params->flow->tunnel.flags & FLOW_TNL_F_CSUM) {
+if (params->flow->tunnel.flags & FLOW_TNL_F_CSUM) {
 /* Write a value in now to mark that we should compute the checksum
  * later. 0xffff is handy because it is transparent to the
  * calculation. */
diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
index 60caa02fb..f9a778988 100644
--- a/lib/netdev-vport.c
+++ b/lib/netdev-vport.c
@@ -702,7 +702,9 @@ set_tunnel_config(struct netdev *dev_, const struct smap 
*args, char **errp)
 tnl_cfg.dst_port = htons(atoi(node->value));
 } else if (!strcmp(node->key, "csum") && has_csum) {
 if (!strcmp(node->value, "true")) {
-tnl_cfg.csum = true;
+tnl_cfg.csum = NETDEV_TNL_CSUM_ENABLED;
+} else if (!strcmp(node->value, "false")) {
+tnl_cfg.csum = NETDEV_TNL_CSUM_DISABLED;
 }
 } else if (!strcmp(node->key, "seq") && has_seq) {
 if (!strcmp(node->value, "true")) {
@@ -850,6 +852,11 @@ set_tunnel_config(struct netdev *dev_, const struct smap 
*args, char **errp)
 }
 }
 
+/* The default csum state for GRE is special. */
+if (tnl_cfg.csum == NETDEV_TNL_CSUM_DEFAULT && strstr(type, "gre")) {
+tnl_cfg.csum = NETDEV_TNL_CSUM_DEFAULT_GRE;
+}
+
 enum tunnel_layers layers = tunnel_supported_layers(type, &tnl_cfg);
 const char *full_type = (strcmp(type, "vxlan") ? type
  : (tnl_cfg.exts & (1 << OVS_VXLAN_EXT_GPE)
@@ -1026,8 +1033,10 @@ get_tunnel_config(const struct netdev *dev, struct smap 
*args)
 }
 }
 
-if (tnl_cfg->csum) {
+if (tnl_cfg->csum == NETDEV_TNL_CSUM_ENABLED) {
 smap_add(args, "csum", "true");
+} else if (tnl_cfg->csum == NETDEV_TNL_CSUM_DISABLED) {
+smap_add(args, "csum", "false");
 }
 
 if (tnl_cfg->set_seq) {
diff --git a/lib/netdev.h b/lib/netdev.h
index 67a8486bd..a79531e6d 100644
--- a/lib/netdev.h
+++ b/lib/netdev.h
@@ -111,6 +111,13 @@ enum netdev_srv6_flowlabel {
 SRV6_FLOWLABEL_COMPUTE,
 };
 
+enum netdev_tnl_csum {
+NETDEV_TNL_CSUM_DEFAULT,
+NETDEV_TNL_CSUM_ENABLED,
+NETDEV_TNL_CSUM_DISABLED,
+NETDEV_TNL_CSUM_DEFAULT_GRE,
+};
+
 /* Configuration specific to tunnels. */
 struct netdev_tunnel_config {
 ovs_be64 in_key;
@@ -139,7 +146,7 @@ struct netdev_tunnel_config {
 uint8_t tos;
 bool tos_inherit;
 
-bool csum;
+enum netdev_tnl_csum csum;
 bool dont_fragment;
 enum netdev_pt_mode pt_mode;
 
diff --git a/ofproto/tunnel.c b/ofproto/tunnel.c
index 80ddee78a..6f462874e 100644
--- a/ofproto/tunnel.c
+++ b/ofproto/tunnel.c
@@ -465,9 +465,14 @@ tnl_port_send(const struct ofport_dpif *ofport, struct 
flow *flow,
 
 flow->tunnel.flags &= ~(FLOW_TNL_F_MASK & ~FLOW_TNL_PUB_F_MASK);
 flow->tunnel.flags |= (cfg->dont_fragment ? FLOW_TNL_F_DONT_FRAGMENT : 0)
-| (cfg->csum ? FLOW_TNL_F_CSUM : 0)
 | (cfg->out_key_present ? FLOW_TNL_F_KEY : 0);
 
+if (cfg->csum == NETDEV_TNL_CSUM_ENABLED) {
+flow->tunnel.flags |= FLOW_TNL_F_CSUM;
+} else if (cfg->csum == NETDEV_TNL_CSUM_DEFAULT && !flow->tunnel.ip_dst

Re: [ovs-dev] [PATCH 1/3] tests: Move the non-local port as tunnel endpoint test.

2024-02-20 Thread Mike Pattrick

On Tue, Feb 20, 2024 at 5:35 PM Ilya Maximets  wrote:
>
> It's not a system test as it runs with dummy datapath and ports
> and it has nothing to do with layer 3 tunnels.
>
> It should be with other userspace tunnel tests.
>
> While moving also making it a little nicer visually and less error
> prone by requesting port numbers for all the ports.
>
> Signed-off-by: Ilya Maximets 

Acked-by: Mike Pattrick 


Re: [ovs-dev] [PATCH] userspace: Allow UDP zero checksum with IPv6 tunnels.

2024-02-20 Thread Mike Pattrick

On Tue, Feb 20, 2024 at 8:56 PM Mike Pattrick  wrote:
>
> This patch adopts the proposed RFC 6935 by allowing null UDP checksums
> even if the tunnel protocol is IPv6. This is already supported by Linux
> through the udp6zerocsumtx tunnel option. It is disabled by default and
> IPv6 tunnels are flagged as requiring a checksum, but this patch enables
> the user to set csum=false on IPv6 tunnels.
>
> Signed-off-by: Mike Pattrick 

One of the GitHub CI runners failed this patch in the test "bfd - bfd decay".
I believe this is a false negative.

-M


Re: [ovs-dev] [PATCH] dp-packet: Don't offload inner csum if outer isn't supported.

2024-02-20 Thread Mike Pattrick

On Tue, Feb 20, 2024 at 10:07 AM Mike Pattrick  wrote:
>
> Some network cards support inner checksum offloading but not outer
> checksum offloading. Currently OVS will resolve that outer checksum but
> allows the network card to resolve the inner checksum, invalidating the
> outer checksum in the process.
>
> Now if we can't offload outer checksums, we don't offload inner either.
>
> Reported-at: https://issues.redhat.com/browse/FDP-363
> Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.")
> Signed-off-by: Mike Pattrick 

Intel CI failed this patch at "conntrack - invalid", with error
message "upcall_cb failure: ukey installation fails".

I believe this is a false negative.

-M


[ovs-dev] [PATCH] Userspace: Software fallback for UDP encapsulated TCP segmentation.

2024-02-20 Thread Mike Pattrick

When sending packets that are flagged as requiring segmentation to an
interface that doesn't support this feature, send the packet to the TSO
software fallback instead of dropping it.

Signed-off-by: Mike Pattrick 
---
 lib/dp-packet-gso.c | 73 +
 lib/dp-packet.h | 26 +++
 lib/netdev-native-tnl.c |  8 +
 lib/netdev.c| 37 +
 tests/system-traffic.at | 58 
 5 files changed, 167 insertions(+), 35 deletions(-)

diff --git a/lib/dp-packet-gso.c b/lib/dp-packet-gso.c
index 847685ad9..f25abf436 100644
--- a/lib/dp-packet-gso.c
+++ b/lib/dp-packet-gso.c
@@ -47,6 +47,8 @@ dp_packet_gso_seg_new(const struct dp_packet *p, size_t 
hdr_len,
 seg->l2_5_ofs = p->l2_5_ofs;
 seg->l3_ofs = p->l3_ofs;
 seg->l4_ofs = p->l4_ofs;
+seg->inner_l3_ofs = p->inner_l3_ofs;
+seg->inner_l4_ofs = p->inner_l4_ofs;
 
 /* The protocol headers remain the same, so preserve hash and mark. */
 *dp_packet_rss_ptr(seg) = *dp_packet_rss_ptr(p);
@@ -71,7 +73,12 @@ dp_packet_gso_nr_segs(struct dp_packet *p)
 const char *data_tail;
 const char *data_pos;
 
-data_pos = dp_packet_get_tcp_payload(p);
+if (dp_packet_hwol_is_tunnel_vxlan(p) ||
+dp_packet_hwol_is_tunnel_geneve(p)) {
+data_pos = dp_packet_get_inner_tcp_payload(p);
+} else {
+data_pos = dp_packet_get_tcp_payload(p);
+}
 data_tail = (char *) dp_packet_tail(p) - dp_packet_l2_pad_size(p);
 
 return DIV_ROUND_UP(data_tail - data_pos, segsz);
@@ -91,12 +98,15 @@ dp_packet_gso(struct dp_packet *p, struct dp_packet_batch 
**batches)
 struct tcp_header *tcp_hdr;
 struct ip_header *ip_hdr;
 struct dp_packet *seg;
+const char *data_pos;
 uint16_t tcp_offset;
 uint16_t tso_segsz;
+uint16_t ip_id = 0;
 uint32_t tcp_seq;
-uint16_t ip_id;
+bool outer_ipv4;
 int hdr_len;
 int seg_len;
+bool tnl;
 
 tso_segsz = dp_packet_get_tso_segsz(p);
 if (!tso_segsz) {
@@ -105,20 +115,35 @@ dp_packet_gso(struct dp_packet *p, struct dp_packet_batch 
**batches)
 return false;
 }
 
-tcp_hdr = dp_packet_l4(p);
-tcp_offset = TCP_OFFSET(tcp_hdr->tcp_ctl);
-tcp_seq = ntohl(get_16aligned_be32(&tcp_hdr->tcp_seq));
-hdr_len = ((char *) dp_packet_l4(p) - (char *) dp_packet_eth(p))
-  + tcp_offset * 4;
-ip_id = 0;
-if (dp_packet_hwol_is_ipv4(p)) {
+if (dp_packet_hwol_is_tunnel_vxlan(p) ||
+dp_packet_hwol_is_tunnel_geneve(p)) {
+data_pos =  dp_packet_get_inner_tcp_payload(p);
+outer_ipv4 = dp_packet_hwol_is_outer_ipv4(p);
+tcp_hdr = dp_packet_inner_l4(p);
+ip_hdr = dp_packet_inner_l3(p);
+tnl = true;
+if (outer_ipv4) {
+ip_id = ntohs(((struct ip_header *) dp_packet_l3(p))->ip_id);
+} else if (dp_packet_hwol_is_ipv4(p)) {
+ip_id = ntohs(ip_hdr->ip_id);
+}
+} else {
+data_pos = dp_packet_get_tcp_payload(p);
+outer_ipv4 = dp_packet_hwol_is_ipv4(p);
+tcp_hdr = dp_packet_l4(p);
 ip_hdr = dp_packet_l3(p);
-ip_id = ntohs(ip_hdr->ip_id);
+tnl = false;
+if (outer_ipv4) {
+ip_id = ntohs(ip_hdr->ip_id);
+}
 }
 
+tcp_offset = TCP_OFFSET(tcp_hdr->tcp_ctl);
+tcp_seq = ntohl(get_16aligned_be32(&tcp_hdr->tcp_seq));
+hdr_len = ((char *) tcp_hdr - (char *) dp_packet_eth(p))
+  + tcp_offset * 4;
 const char *data_tail = (char *) dp_packet_tail(p)
 - dp_packet_l2_pad_size(p);
-const char *data_pos = dp_packet_get_tcp_payload(p);
 int n_segs = dp_packet_gso_nr_segs(p);
 
 for (int i = 0; i < n_segs; i++) {
@@ -130,8 +155,26 @@ dp_packet_gso(struct dp_packet *p, struct dp_packet_batch 
**batches)
 seg = dp_packet_gso_seg_new(p, hdr_len, data_pos, seg_len);
 data_pos += seg_len;
 
+if (tnl) {
+/* Update tunnel L3 header. */
+if (dp_packet_hwol_is_ipv4(seg)) {
+ip_hdr = dp_packet_inner_l3(seg);
+ip_hdr->ip_tot_len = htons(sizeof *ip_hdr +
+   dp_packet_inner_l4_size(seg));
+ip_hdr->ip_id = htons(ip_id);
+ip_hdr->ip_csum = 0;
+ip_id++;
+} else {
+struct ovs_16aligned_ip6_hdr *ip6_hdr;
+
+ip6_hdr = dp_packet_inner_l3(seg);
+ip6_hdr->ip6_ctlun.ip6_un1.ip6_un1_plen
+= htons(dp_packet_inner_l3_size(seg) - sizeof *ip6_hdr);
+}
+}
+
 /* Update L3 header. */
-if (dp_packet_hwol_is_ipv4(seg)) {
+if (outer_ipv4) {
 ip_hdr = dp_packet_l3(seg);
 ip_hdr->ip_tot_len = htons(sizeof *ip_hdr +
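
For illustration only, and not part of the patch: the segment-count
arithmetic in dp_packet_gso_nr_segs() is a round-up division of the TCP
payload length by the segment size. The sketch below restates that with plain
integers. The helper names are hypothetical, and clamping the final segment to
the remaining bytes is an assumption about the loop body, which is only partly
visible in the hunks above.

```c
#include <stddef.h>

/* Number of segments needed to carry 'payload_len' bytes when each segment
 * holds at most 'segsz' bytes.  Returns 0 when 'segsz' is 0, in the spirit
 * of the early return taken when no segment size is set. */
static size_t
sketch_gso_nr_segs(size_t payload_len, size_t segsz)
{
    if (!segsz) {
        return 0;
    }
    /* Same result as DIV_ROUND_UP(payload_len, segsz). */
    return (payload_len + segsz - 1) / segsz;
}

/* Payload length of segment 'i' (0-based).  Every segment is full except
 * possibly the last one.  Returns 0 if 'i' is out of range. */
static size_t
sketch_gso_seg_len(size_t payload_len, size_t segsz, size_t i)
{
    size_t n_segs = sketch_gso_nr_segs(payload_len, segsz);
    size_t remaining;

    if (i >= n_segs) {
        return 0;
    }
    remaining = payload_len - i * segsz;
    return remaining < segsz ? remaining : segsz;
}
```

With a 3000 byte payload and a 1400 byte segment size this gives three
segments of 1400, 1400 and 200 bytes. For tunneled packets the payload length
has to be measured from the inner TCP payload, which is why the patch switches
data_pos to dp_packet_get_inner_tcp_payload() in that case.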

[ovs-dev] [PATCH] userspace: Allow UDP zero checksum with IPv6 tunnels.

2024-02-20 Thread Mike Pattrick

This patch adopts the proposed RFC 6935 by allowing null UDP checksums
even if the tunnel protocol is IPv6. This is already supported by Linux
through the udp6zerocsumtx tunnel option. It is disabled by default and
IPv6 tunnels are flagged as requiring a checksum, but this patch enables
the user to set csum=false on IPv6 tunnels.

Signed-off-by: Mike Pattrick 
---
 lib/netdev-native-tnl.c |  2 +-
 lib/netdev-vport.c  | 13 +++--
 lib/netdev.h|  9 -
 ofproto/tunnel.c| 11 +--
 tests/tunnel.at |  6 +++---
 vswitchd/vswitch.xml|  8 +---
 6 files changed, 37 insertions(+), 12 deletions(-)

diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
index dee9ab344..e8258bc4e 100644
--- a/lib/netdev-native-tnl.c
+++ b/lib/netdev-native-tnl.c
@@ -424,7 +424,7 @@ udp_build_header(const struct netdev_tunnel_config *tnl_cfg,
 udp = netdev_tnl_ip_build_header(data, params, IPPROTO_UDP, 0);
 udp->udp_dst = tnl_cfg->dst_port;
 
-if (params->is_ipv6 || params->flow->tunnel.flags & FLOW_TNL_F_CSUM) {
+if (params->flow->tunnel.flags & FLOW_TNL_F_CSUM) {
 /* Write a value in now to mark that we should compute the checksum
  * later. 0xffff is handy because it is transparent to the
  * calculation. */
diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
index 60caa02fb..f9a778988 100644
--- a/lib/netdev-vport.c
+++ b/lib/netdev-vport.c
@@ -702,7 +702,9 @@ set_tunnel_config(struct netdev *dev_, const struct smap 
*args, char **errp)
 tnl_cfg.dst_port = htons(atoi(node->value));
 } else if (!strcmp(node->key, "csum") && has_csum) {
 if (!strcmp(node->value, "true")) {
-tnl_cfg.csum = true;
+tnl_cfg.csum = NETDEV_TNL_CSUM_ENABLED;
+} else if (!strcmp(node->value, "false")) {
+tnl_cfg.csum = NETDEV_TNL_CSUM_DISABLED;
 }
 } else if (!strcmp(node->key, "seq") && has_seq) {
 if (!strcmp(node->value, "true")) {
@@ -850,6 +852,11 @@ set_tunnel_config(struct netdev *dev_, const struct smap 
*args, char **errp)
 }
 }
 
+/* The default csum state for GRE is special. */
+if (tnl_cfg.csum == NETDEV_TNL_CSUM_DEFAULT && strstr(type, "gre")) {
+tnl_cfg.csum = NETDEV_TNL_CSUM_DEFAULT_GRE;
+}
+
 enum tunnel_layers layers = tunnel_supported_layers(type, &tnl_cfg);
 const char *full_type = (strcmp(type, "vxlan") ? type
  : (tnl_cfg.exts & (1 << OVS_VXLAN_EXT_GPE)
@@ -1026,8 +1033,10 @@ get_tunnel_config(const struct netdev *dev, struct smap 
*args)
 }
 }
 
-if (tnl_cfg->csum) {
+if (tnl_cfg->csum == NETDEV_TNL_CSUM_ENABLED) {
 smap_add(args, "csum", "true");
+} else if (tnl_cfg->csum == NETDEV_TNL_CSUM_DISABLED) {
+smap_add(args, "csum", "false");
 }
 
 if (tnl_cfg->set_seq) {
diff --git a/lib/netdev.h b/lib/netdev.h
index 67a8486bd..a79531e6d 100644
--- a/lib/netdev.h
+++ b/lib/netdev.h
@@ -111,6 +111,13 @@ enum netdev_srv6_flowlabel {
 SRV6_FLOWLABEL_COMPUTE,
 };
 
+enum netdev_tnl_csum {
+NETDEV_TNL_CSUM_DEFAULT,
+NETDEV_TNL_CSUM_ENABLED,
+NETDEV_TNL_CSUM_DISABLED,
+NETDEV_TNL_CSUM_DEFAULT_GRE,
+};
+
 /* Configuration specific to tunnels. */
 struct netdev_tunnel_config {
 ovs_be64 in_key;
@@ -139,7 +146,7 @@ struct netdev_tunnel_config {
 uint8_t tos;
 bool tos_inherit;
 
-bool csum;
+enum netdev_tnl_csum csum;
 bool dont_fragment;
 enum netdev_pt_mode pt_mode;
 
diff --git a/ofproto/tunnel.c b/ofproto/tunnel.c
index 80ddee78a..6f462874e 100644
--- a/ofproto/tunnel.c
+++ b/ofproto/tunnel.c
@@ -465,9 +465,14 @@ tnl_port_send(const struct ofport_dpif *ofport, struct 
flow *flow,
 
 flow->tunnel.flags &= ~(FLOW_TNL_F_MASK & ~FLOW_TNL_PUB_F_MASK);
 flow->tunnel.flags |= (cfg->dont_fragment ? FLOW_TNL_F_DONT_FRAGMENT : 0)
-| (cfg->csum ? FLOW_TNL_F_CSUM : 0)
 | (cfg->out_key_present ? FLOW_TNL_F_KEY : 0);
 
+if (cfg->csum == NETDEV_TNL_CSUM_ENABLED) {
+flow->tunnel.flags |= FLOW_TNL_F_CSUM;
+} else if (cfg->csum == NETDEV_TNL_CSUM_DEFAULT && !flow->tunnel.ip_dst) {
+flow->tunnel.flags |= FLOW_TNL_F_CSUM;
+}
+
 if (cfg->set_egress_pkt_mark) {
 flow->pkt_mark = cfg->egress_pkt_mark;
 wc->masks.pkt_mark = UINT32_MAX;
@@ -706,8 +711,10 @@ tnl_port_format(const struct tnl_port *tnl_port, struct ds 
*ds)
 ds_put_cstr(ds, ", df=false");
 }
 
-if (cfg->csum) {
+if (cfg->csum == NETDEV_TNL_CSUM_ENABLED) {
 ds_put_cstr(ds, ", csum=true");
+} else if (cfg->csum == NETDEV_TNL_CSUM_DISABL
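
For illustration only, and not part of the patch: the tnl_port_send() hunk
above turns the old boolean into a small decision table. The sketch below
restates that decision as a standalone function. The enum is copied from the
lib/netdev.h hunk; the function name and the 'dst_is_ipv6' parameter are
hypothetical, and treating a zero flow->tunnel.ip_dst as "the tunnel
destination is IPv6" is an assumption made for this sketch.

```c
#include <stdbool.h>

/* Copied from the lib/netdev.h hunk in the patch. */
enum netdev_tnl_csum {
    NETDEV_TNL_CSUM_DEFAULT,
    NETDEV_TNL_CSUM_ENABLED,
    NETDEV_TNL_CSUM_DISABLED,
    NETDEV_TNL_CSUM_DEFAULT_GRE,
};

/* Hypothetical helper: returns true if the checksum flag would be set for
 * an outgoing tunneled packet, following the hunks shown in the patch. */
static bool
sketch_tnl_wants_csum(enum netdev_tnl_csum csum, bool dst_is_ipv6)
{
    switch (csum) {
    case NETDEV_TNL_CSUM_ENABLED:
        /* The user asked for a checksum explicitly. */
        return true;
    case NETDEV_TNL_CSUM_DEFAULT:
        /* Nothing configured: IPv6 keeps its checksum, IPv4 does not. */
        return dst_is_ipv6;
    case NETDEV_TNL_CSUM_DISABLED:
    case NETDEV_TNL_CSUM_DEFAULT_GRE:
    default:
        /* Explicit csum=false, or the special GRE default. */
        return false;
    }
}
```

The point of the third state is that "not configured" and "configured false"
can no longer be confused, which is what lets csum=false take effect on IPv6
tunnels while the default behaviour stays unchanged.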

Re: [ovs-dev] [PATCH v2] dpif-netdev: Do not create handler threads.

2024-02-20 Thread Mike Pattrick

On Tue, Feb 20, 2024 at 4:32 AM Eelco Chaudron  wrote:
>
> Avoid unnecessary thread creation as no upcalls are generated,
> resulting in idle threads waiting for process termination.
>
> This optimization significantly reduces memory usage, cutting it
> by half on a 128 CPU/thread system during testing, with the number
> of threads reduced from 95 to 0.
>
> Signed-off-by: Eelco Chaudron 
> ---

I tested this out a bit, and it seems to work well.

Acked-by: Mike Pattrick 


Re: [ovs-dev] [PATCH v2] bond: Reset stats when deleting post recirc rule.

2024-02-20 Thread Mike Pattrick

On Tue, Feb 20, 2024 at 1:50 AM Adrian Moreno  wrote:
>
> In order to properly balance bond traffic, ofproto/bond periodically
> reads usage statistics of the post-recirculation rules (which are added
> to a hidden internal table).
>
> To do that, each "struct bond_entry" (which represents a hash within a
> bond) stores the last seen statistics for its rule. When a hash is moved
> to another member (due to a bond rebalance or the previous member going
> down), the rule is typically just modified, i.e: same match different
> actions. In this case, statistics are preserved and accounting continues
> to work.
>
> However, if the rule gets completely deleted (e.g: when all bond members
> go down) and then re-created, the new rule will have 0 tx_bytes but its
> associated entry will still store a non-zero last-seen value.
> This situation leads to an overflow of the delta calculation (computed
> as [current_stats_value - last_seen_value]), which can affect traffic
> as the hash will be considered to carry a lot of traffic and rebalancing
> will kick in.
>
> In order to fix this situation, reset the value of last seen statistics
> on rule deletion.
>
> Implementation notes:
> Modifying pr_tx_bytes requires write-locking the global rwlock but a
> lockless version of update_recirc_rules was being maintained to avoid locking
> on bond_unref().
> Considering the small impact of locking during bond removal, removing the
> lockless version and relying on clang's thread safety analysis is preferred.
>
> Also, folding Ilya's [1], i.e: fixing thread safety annotation in
> update_recirc_rules() to require holding write-lock.
>
> [1]
> https://patchwork.ozlabs.org/project/openvswitch/patch/20240209161718.1149494-1-i.maxim...@ovn.org/
>
> Reported-at: https://github.com/openvswitch/ovs-issues/issues/319
> Co-authored-by: Ilya Maximets 
> Signed-off-by: Ilya Maximets 
> Signed-off-by: Adrian Moreno 
> ---

Looks reasonable to me.

Acked-by: Mike Pattrick 
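
For illustration only, and not code from the patch: the overflow described in
the commit message is plain unsigned wraparound. The sketch below uses a
hypothetical stand-in struct and helper names rather than the real
"struct bond_entry" and its fields.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-hash accounting state. */
struct sketch_entry_stats {
    uint64_t last_seen_bytes;   /* Counter value read on the previous poll. */
};

/* Returns the bytes sent since the previous poll and remembers the new
 * counter value.  If the counter went backwards, the unsigned subtraction
 * wraps around and yields a huge delta. */
static uint64_t
sketch_stats_delta(struct sketch_entry_stats *e, uint64_t current_bytes)
{
    uint64_t delta = current_bytes - e->last_seen_bytes;

    e->last_seen_bytes = current_bytes;
    return delta;
}

/* What the fix amounts to: forget the last seen value when the rule that
 * owns the counter is deleted, so a re-created rule starts from zero. */
static void
sketch_stats_reset(struct sketch_entry_stats *e)
{
    e->last_seen_bytes = 0;
}
```

Without the reset, a last seen value of 1500 followed by a fresh rule
reporting 100 bytes produces a delta close to 2^64, which is the bogus load
that triggers rebalancing. With the reset, the same poll reports 100.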


[ovs-dev] [PATCH] dp-packet: Don't offload inner csum if outer isn't supported.

2024-02-20 Thread Mike Pattrick

Some network cards support inner checksum offloading but not outer
checksum offloading. Currently OVS will resolve that outer checksum but
allows the network card to resolve the inner checksum, invalidating the
outer checksum in the process.

Now if we can't offload outer checksums, we don't offload inner either.

Reported-at: https://issues.redhat.com/browse/FDP-363
Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.")
Signed-off-by: Mike Pattrick 
---
nb: I also tested a more complex patch that only resolved the inner
checksum and offloaded the UDP layer. This didn't noticeably improve
performance.
---
 lib/dp-packet.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 305822293..0fc352cce 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -592,6 +592,17 @@ dp_packet_ol_send_prepare(struct dp_packet *p, uint64_t 
flags)
 if (dp_packet_hwol_is_tunnel_geneve(p) ||
 dp_packet_hwol_is_tunnel_vxlan(p)) {
 tnl_inner = true;
+
+/* If the TX interface doesn't support UDP tunnel offload but does
+ * support inner checksum offload and an outer UDP checksum is
+ * required, then we can't offload inner checksum either. As that would
+ * invalidate the outer checksum. */
+if (!(flags & NETDEV_TX_OFFLOAD_OUTER_UDP_CKSUM) &&
+dp_packet_hwol_is_outer_udp_cksum(p)) {
+flags &= ~(NETDEV_TX_OFFLOAD_TCP_CKSUM |
+   NETDEV_TX_OFFLOAD_UDP_CKSUM |
+   NETDEV_TX_OFFLOAD_SCTP_CKSUM);
+}
 }
 
 if (dp_packet_hwol_tx_ip_csum(p)) {
-- 
2.39.3
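
For illustration only, and not part of the patch: the new check masks out the
inner L4 checksum offload capabilities whenever an outer UDP checksum is needed
but cannot be offloaded. The sketch below restates that with hypothetical bit
values and a hypothetical function name; the real NETDEV_TX_OFFLOAD_* values
and the dp_packet_hwol_*() helpers are not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bits; the real NETDEV_TX_OFFLOAD_* values are
 * not shown in the patch and may differ. */
#define SKETCH_TX_OL_TCP_CKSUM        (UINT64_C(1) << 0)
#define SKETCH_TX_OL_UDP_CKSUM        (UINT64_C(1) << 1)
#define SKETCH_TX_OL_SCTP_CKSUM       (UINT64_C(1) << 2)
#define SKETCH_TX_OL_OUTER_UDP_CKSUM  (UINT64_C(1) << 3)

/* Returns the offload capabilities that may actually be used for a packet.
 * If the packet is VXLAN/Geneve encapsulated, needs an outer UDP checksum,
 * and the device cannot offload that outer checksum, the inner L4 checksum
 * offloads are dropped so everything is resolved in software. */
static uint64_t
sketch_effective_tx_flags(uint64_t flags, bool is_udp_tunnel,
                          bool outer_udp_csum_needed)
{
    if (is_udp_tunnel && outer_udp_csum_needed
        && !(flags & SKETCH_TX_OL_OUTER_UDP_CKSUM)) {
        flags &= ~(SKETCH_TX_OL_TCP_CKSUM
                   | SKETCH_TX_OL_UDP_CKSUM
                   | SKETCH_TX_OL_SCTP_CKSUM);
    }
    return flags;
}
```

The ordering matters because the outer UDP checksum covers the inner packet:
if software computes the outer checksum first and the card then rewrites the
inner checksum field, the outer checksum no longer matches the bytes on the
wire.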


Re: [ovs-dev] [PATCH v4 4/4] Userspace: Add system test with UDP tunneling of UDP traffic.

2024-02-16 Thread Mike Pattrick

On Fri, Feb 16, 2024 at 3:23 PM Ilya Maximets  wrote:
>
> On 2/15/24 23:53, Mike Pattrick wrote:
> > Previously a gap existed in the tunnel system tests where only ICMP and
> > TCP traffic was tested. However, the code paths used for UDP traffic are
> > different than either of those and should also be tested.
> >
> > Some of the modified tests had previously checked for TCP with ncat but
> > didn't include an appropriate check for ncat support. That check was
> > added to these tests.
> >
> > Signed-off-by: Mike Pattrick 
> > ---
> >  tests/system-traffic.at | 118 +---
> >  1 file changed, 111 insertions(+), 7 deletions(-)
> >
>
> Hi, Mike.  Thanks for the fixes.  They look mostly fine to me and I
> tested them also with David's triple-tunnel test and it seems to work.
>
> The code becomes excessively hard to read though.  I think the dp-packet
> API has to be re-done before userspace-tso can be declared not
> experimental.  I don't have any good ideas on how to actually do that
> though.

I strongly agree, I can send a proposal later.

> But this should not be addressed in this patch set.
>
> See a few comments for the tests below.
>
> > diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> > index e68fe7e18..ca54a0f73 100644
> > --- a/tests/system-traffic.at
> > +++ b/tests/system-traffic.at
> > @@ -292,6 +292,7 @@ OVS_TRAFFIC_VSWITCHD_STOP
> >  AT_CLEANUP
> >
> >  AT_SETUP([datapath - ping over vxlan tunnel])
> > +AT_SKIP_IF([test $HAVE_NC = no])
> >  OVS_CHECK_VXLAN()
> >
> >  OVS_TRAFFIC_VSWITCHD_START()
> > @@ -318,6 +319,10 @@ NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -W 2 
> > 172.31.1.100 | FORMAT_PING], [
> >  3 packets transmitted, 3 received, 0% packet loss, time 0ms
> >  ])
> >
> > +dnl Start ncat listeners.
> > +OVS_DAEMONIZE([nc -l 10.1.1.100 1234 > tcp_data], [nc.pid])
> > +NETNS_DAEMONIZE([at_ns0], [nc -l -u 10.1.1.1 4321 > udp_data], [nc2.pid])
> > +
>
> This should not be here.  We should start them below, close to their usage.
>
> >  dnl Okay, now check the overlay with different packet sizes
> >  NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -W 2 10.1.1.100 | 
> > FORMAT_PING], [0], [dnl
> >  3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > @@ -329,15 +334,29 @@ NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 
> > -W 2 10.1.1.100 | FORMAT_PI
> >  3 packets transmitted, 3 received, 0% packet loss, time 0ms
> >  ])
> >
> > +dnl Verify that ncat is ready.
> > +OVS_WAIT_UNTIL([netstat -ln | grep :1234])
> > +OVS_WAIT_UNTIL([NS_EXEC([at_ns0], [netstat -ln | grep :4321])])
> > +
> >  dnl Check large bidirectional TCP.
> >  AT_CHECK([dd if=/dev/urandom of=payload.bin bs=6 count=1 2> /dev/null])
> > -OVS_DAEMONIZE([nc -l 10.1.1.100 1234 > data], [nc.pid])
> >  NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT 10.1.1.100 1234 < payload.bin])
> >
> >  dnl Wait until transfer completes before checking.
> >  OVS_WAIT_WHILE([kill -0 $(cat nc.pid)])
> > -AT_CHECK([diff -q payload.bin data], [0])
> > +AT_CHECK([diff -q payload.bin tcp_data], [0])
> > +
> > +dnl Check UDP
>
> Period at the end.
>
> > +AT_CHECK([dd if=/dev/urandom of=payload.bin bs=600 count=1 2> /dev/null])
> > +AT_CHECK([nc $NC_EOF_OPT -u 10.1.1.1 4321 < payload.bin])
> >
> > +dnl The UDP listener will just listen forever if not terminated.
> > +OVS_WAIT_UNTIL([kill -0 $(cat nc2.pid)])
> > +AT_CHECK([kill $(cat nc2.pid)])
> > +
> > +dnl Wait until transfer completes before checking.
> > +OVS_WAIT_WHILE([kill -0 $(cat nc2.pid)])
>
> I don't think 6 lines above do anything useful.
> We check that the process is running, kill it, wait until it dies.
> But if it didn't finish writing the data, we will kill it and end
> up with incomplete data...
>
> > +AT_CHECK([diff -q payload.bin udp_data], [0])
>
> ... so instead of all these manipulations we can just use
> OVS_WAIT_UNTIL here and wait until the file is fully transferred.
>
> The server will be stopped by the on_exit hook registered by the
> NETNS_DAEMONIZE.
>
> We can also apply the same strategy to the TCP check, we can just
> wait on a content and not wait for the process to exit.
>
> The same applies to all other tests here.
>
> I already tried these changes here:
>   
> https://github.com/igsilya/ovs/commit/f544d397250714b7ba8c1a3f81eada29ffd59142
>
> I did several iterations of GHA runs with these changes and they
> work fine in CI.
>
> If you agree, I can fold those in while applying the set.
>
> Please, let me know as soon as possible as we need to proceed with
> the release.

I agree with those changes.

-M

>
> Best regards, Ilya Maximets.
>


[ovs-dev] [PATCH v4 3/4] netdev-native-tnl: Mark all vxlan/geneve packets as tunneled.

2024-02-15 Thread Mike Pattrick

Previously some packets were excluded from the tunnel mark if they
weren't L4. However, this causes problems with multi-encapsulated
packets like ARP.

Due to these flags being set, additional checks are required in checksum
modification code.

Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.")
Reported-by: David Marchand 
Signed-off-by: Mike Pattrick 
---
v4: Corrected packet to pkt
---
 lib/dp-packet.h | 19 +--
 lib/netdev-native-tnl.c | 10 --
 lib/packets.c   |  8 
 3 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 770ddc1b9..2fa17d814 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -1386,8 +1386,8 @@ dp_packet_ip_checksum_bad(const struct dp_packet *p)
 
 /* Return 'true' if packet 'b' is not encapsulated and is marked for IPv4
  * checksum offload, or if 'b' is encapsulated and the outer layer is marked
- * for IPv4 checksum offload. IPv6 packets and non offloaded packets return
- * 'false'. */
+ * for IPv4 checksum offload. IPv6 packets, non offloaded packets, and IPv4
+ * packets that are marked as good return 'false'. */
 static inline bool
 dp_packet_hwol_l3_csum_ipv4_ol(const struct dp_packet *b)
 {
@@ -1400,6 +1400,21 @@ dp_packet_hwol_l3_csum_ipv4_ol(const struct dp_packet *b)
 return false;
 }
 
+/* Return 'true' if packet 'b' is not encapsulated and is marked for IPv4
+ * checksum offload, or if 'b' is encapsulated and the outer layer is marked
+ * for IPv4 checksum offload. IPv6 packets and non offloaded packets return
+ * 'false'. */
+static inline bool
+dp_packet_hwol_l3_ipv4(const struct dp_packet *b)
+{
+if (dp_packet_hwol_is_outer_ipv4(b)) {
+return true;
+} else if (!dp_packet_hwol_is_outer_ipv6(b)) {
+return dp_packet_hwol_tx_ip_csum(b);
+}
+return false;
+}
+
 /* Calculate and set the IPv4 header checksum in packet 'p'. */
 static inline void
 dp_packet_ip_set_header_csum(struct dp_packet *p, bool inner)
diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
index 0d6d803fe..dee9ab344 100644
--- a/lib/netdev-native-tnl.c
+++ b/lib/netdev-native-tnl.c
@@ -91,8 +91,7 @@ netdev_tnl_ip_extract_tnl_md(struct dp_packet *packet, struct flow_tnl *tnl,
 
 /* A packet coming from a network device might have the
  * csum already checked. In this case, skip the check. */
-if (OVS_UNLIKELY(!dp_packet_ip_checksum_good(packet))
-&& !dp_packet_hwol_tx_ip_csum(packet)) {
+if (OVS_UNLIKELY(!dp_packet_hwol_l3_csum_ipv4_ol(packet))) {
 if (csum(ip, IP_IHL(ip->ip_ihl_ver) * 4)) {
 VLOG_WARN_RL(&err_rl, "ip packet has invalid checksum");
 return NULL;
@@ -299,6 +298,13 @@ dp_packet_tnl_ol_process(struct dp_packet *packet,
  (char *) dp_packet_eth(packet) +
  VXLAN_HLEN);
 }
+} else {
+/* Mark non-l4 packets as tunneled. */
+if (data->tnl_type == OVS_VPORT_TYPE_GENEVE) {
+dp_packet_hwol_set_tunnel_geneve(packet);
+} else if (data->tnl_type == OVS_VPORT_TYPE_VXLAN) {
+dp_packet_hwol_set_tunnel_vxlan(packet);
+}
 }
 }
 
diff --git a/lib/packets.c b/lib/packets.c
index 36c6692e5..5803d26f4 100644
--- a/lib/packets.c
+++ b/lib/packets.c
@@ -1149,7 +1149,7 @@ packet_set_ipv4_addr(struct dp_packet *packet,
 }
 }
 
-if (dp_packet_hwol_tx_ip_csum(packet)) {
+if (dp_packet_hwol_l3_ipv4(packet)) {
 dp_packet_ol_reset_ip_csum_good(packet);
 } else {
 nh->ip_csum = recalc_csum32(nh->ip_csum, old_addr, new_addr);
@@ -1328,7 +1328,7 @@ packet_set_ipv4(struct dp_packet *packet, ovs_be32 src, ovs_be32 dst,
 if (nh->ip_tos != tos) {
 uint8_t *field = &nh->ip_tos;
 
-if (dp_packet_hwol_tx_ip_csum(packet)) {
+if (dp_packet_hwol_l3_ipv4(packet)) {
 dp_packet_ol_reset_ip_csum_good(packet);
 } else {
 nh->ip_csum = recalc_csum16(nh->ip_csum, htons((uint16_t) *field),
@@ -1341,7 +1341,7 @@ packet_set_ipv4(struct dp_packet *packet, ovs_be32 src, ovs_be32 dst,
 if (nh->ip_ttl != ttl) {
 uint8_t *field = &nh->ip_ttl;
 
-if (dp_packet_hwol_tx_ip_csum(packet)) {
+if (dp_packet_hwol_l3_ipv4(packet)) {
 dp_packet_ol_reset_ip_csum_good(packet);
 } else {
 nh->ip_csum = recalc_csum16(nh->ip_csum, htons(*field << 8),
@@ -1979,7 +1979,7 @@ IP_ECN_set_ce(struct dp_packet *pkt, bool is_ipv6)
 
 tos |= IP_ECN_CE;
 if (nh->ip_tos != tos) {
-if (dp_packet_hwol_tx_ip_csum(pkt)) {
+if (dp_packet_hwol_l3_ipv4(pkt)) {
 dp_packet_ol_reset_ip_csum_good(pkt);
 } else {
 nh->ip_csum = recalc_csum16(nh->ip_csum, htons(nh->ip

[ovs-dev] [PATCH v4 4/4] Userspace: Add system test with UDP tunneling of UDP traffic.

2024-02-15 Thread Mike Pattrick

Previously a gap existed in the tunnel system tests where only ICMP and
TCP traffic was tested. However, the code paths used for UDP traffic are
different than either of those and should also be tested.

Some of the modified tests had previously checked for TCP with ncat but
didn't include an appropriate check for ncat support. That check was
added to these tests.

Signed-off-by: Mike Pattrick 
---
 tests/system-traffic.at | 118 +---
 1 file changed, 111 insertions(+), 7 deletions(-)

diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index e68fe7e18..ca54a0f73 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -292,6 +292,7 @@ OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
 AT_SETUP([datapath - ping over vxlan tunnel])
+AT_SKIP_IF([test $HAVE_NC = no])
 OVS_CHECK_VXLAN()
 
 OVS_TRAFFIC_VSWITCHD_START()
@@ -318,6 +319,10 @@ NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -W 2 172.31.1.100 | FORMAT_PING], [
 3 packets transmitted, 3 received, 0% packet loss, time 0ms
 ])
 
+dnl Start ncat listeners.
+OVS_DAEMONIZE([nc -l 10.1.1.100 1234 > tcp_data], [nc.pid])
+NETNS_DAEMONIZE([at_ns0], [nc -l -u 10.1.1.1 4321 > udp_data], [nc2.pid])
+
 dnl Okay, now check the overlay with different packet sizes
 NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -W 2 10.1.1.100 | FORMAT_PING], [0], [dnl
 3 packets transmitted, 3 received, 0% packet loss, time 0ms
@@ -329,15 +334,29 @@ NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -W 2 10.1.1.100 | FORMAT_PI
 3 packets transmitted, 3 received, 0% packet loss, time 0ms
 ])
 
+dnl Verify that ncat is ready.
+OVS_WAIT_UNTIL([netstat -ln | grep :1234])
+OVS_WAIT_UNTIL([NS_EXEC([at_ns0], [netstat -ln | grep :4321])])
+
 dnl Check large bidirectional TCP.
 AT_CHECK([dd if=/dev/urandom of=payload.bin bs=6 count=1 2> /dev/null])
-OVS_DAEMONIZE([nc -l 10.1.1.100 1234 > data], [nc.pid])
 NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT 10.1.1.100 1234 < payload.bin])
 
 dnl Wait until transfer completes before checking.
 OVS_WAIT_WHILE([kill -0 $(cat nc.pid)])
-AT_CHECK([diff -q payload.bin data], [0])
+AT_CHECK([diff -q payload.bin tcp_data], [0])
+
+dnl Check UDP
+AT_CHECK([dd if=/dev/urandom of=payload.bin bs=600 count=1 2> /dev/null])
+AT_CHECK([nc $NC_EOF_OPT -u 10.1.1.1 4321 < payload.bin])
 
+dnl The UDP listener will just listen forever if not terminated.
+OVS_WAIT_UNTIL([kill -0 $(cat nc2.pid)])
+AT_CHECK([kill $(cat nc2.pid)])
+
+dnl Wait until transfer completes before checking.
+OVS_WAIT_WHILE([kill -0 $(cat nc2.pid)])
+AT_CHECK([diff -q payload.bin udp_data], [0])
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
@@ -389,6 +408,7 @@ OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
 AT_SETUP([datapath - ping over vxlan6 tunnel])
+AT_SKIP_IF([test $HAVE_NC = no])
 OVS_CHECK_VXLAN_UDP6ZEROCSUM()
 
 OVS_TRAFFIC_VSWITCHD_START()
@@ -412,6 +432,9 @@ ADD_NATIVE_TUNNEL6([vxlan], [at_vxlan1], [at_ns0], [fc00::100], [10.1.1.1/24],
 
 OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::100])
 
+OVS_DAEMONIZE([nc -l 10.1.1.100 1234 > data], [nc.pid])
+NETNS_DAEMONIZE([at_ns0], [nc -l -u 10.1.1.1 4321 > udp_data], [nc2.pid])
+
 dnl First, check the underlay
 NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -W 2 fc00::100 | FORMAT_PING], [0], [dnl
 3 packets transmitted, 3 received, 0% packet loss, time 0ms
@@ -428,14 +451,29 @@ NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -W 2 10.1.1.100 | FORMAT_PI
 3 packets transmitted, 3 received, 0% packet loss, time 0ms
 ])
 
+dnl Verify that ncat is ready.
+OVS_WAIT_UNTIL([netstat -ln | grep :1234])
+OVS_WAIT_UNTIL([NS_EXEC([at_ns0], [netstat -ln | grep :4321])])
+
 dnl Check large bidirectional TCP.
 AT_CHECK([dd if=/dev/urandom of=payload.bin bs=6 count=1 2> /dev/null])
-OVS_DAEMONIZE([nc -l 10.1.1.100 1234 > data], [nc.pid])
 NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT 10.1.1.100 1234 < payload.bin])
 
 dnl Wait until transfer completes before checking.
 OVS_WAIT_WHILE([kill -0 $(cat nc.pid)])
 AT_CHECK([diff -q payload.bin data], [0])
+
+dnl Check UDP
+AT_CHECK([dd if=/dev/urandom of=payload.bin bs=600 count=1 2> /dev/null])
+AT_CHECK([nc $NC_EOF_OPT -u 10.1.1.1 4321 < payload.bin])
+
+dnl The UDP listener will just listen forever if not terminated.
+OVS_WAIT_UNTIL([kill -0 $(cat nc2.pid)])
+AT_CHECK([kill $(cat nc2.pid)])
+
+dnl Wait until transfer completes before checking.
+OVS_WAIT_WHILE([kill -0 $(cat nc2.pid)])
+AT_CHECK([diff -q payload.bin udp_data], [0])
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
@@ -466,6 +504,10 @@ NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -W 2 172.31.1.100 | FORMAT_PING], [
 3 packets transmitted, 3 received, 0% packet loss, time 0ms
 ])
 
+dnl Start ncat listeners.
+OVS_DAEMONIZE([nc -l 10.1.1.100 1234 > tcp_data], [nc.pid])
+NETNS_DAEMONIZE([at_ns0], [nc -l -u 10.1.1.1 4321 > udp_data], [nc2.pid])
+
 dnl Okay, now check the overlay with different packet sizes
 NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -W 2

[ovs-dev] [PATCH v4 2/4] netdev-linux: Only repair IP checksum in IPv4.

2024-02-15 Thread Mike Pattrick

Previously a change was added to the vnet prepend code to solve for the
case where no L4 checksum offloading was needed but the L3 checksum
hadn't been calculated. But the added check didn't properly account
for IPv6 traffic.

Fixes: 85bcbbed839a ("userspace: Enable tunnel tests with TSO.")
Reported-by: David Marchand 
Signed-off-by: Mike Pattrick 
---
 lib/dp-packet.h| 18 +-
 lib/netdev-linux.c |  9 +
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 802d3f385..770ddc1b9 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -1184,7 +1184,7 @@ dp_packet_hwol_is_tunnel_vxlan(struct dp_packet *b)
 
 /* Returns 'true' if packet 'b' is marked for outer IPv4 checksum offload. */
 static inline bool
-dp_packet_hwol_is_outer_ipv4_cksum(struct dp_packet *b)
+dp_packet_hwol_is_outer_ipv4_cksum(const struct dp_packet *b)
 {
 return !!(*dp_packet_ol_flags_ptr(b) & DP_PACKET_OL_TX_OUTER_IP_CKSUM);
 }
@@ -1384,6 +1384,22 @@ dp_packet_ip_checksum_bad(const struct dp_packet *p)
 DP_PACKET_OL_RX_IP_CKSUM_BAD;
 }
 
+/* Return 'true' if packet 'b' is not encapsulated and is marked for IPv4
+ * checksum offload, or if 'b' is encapsulated and the outer layer is marked
+ * for IPv4 checksum offload. IPv6 packets and non offloaded packets return
+ * 'false'. */
+static inline bool
+dp_packet_hwol_l3_csum_ipv4_ol(const struct dp_packet *b)
+{
+if (dp_packet_hwol_is_outer_ipv4(b)) {
+return dp_packet_hwol_is_outer_ipv4_cksum(b);
+} else if (!dp_packet_hwol_is_outer_ipv6(b)) {
+return dp_packet_hwol_tx_ip_csum(b) &&
+   !dp_packet_ip_checksum_good(b);
+}
+return false;
+}
+
 /* Calculate and set the IPv4 header checksum in packet 'p'. */
 static inline void
 dp_packet_ip_set_header_csum(struct dp_packet *p, bool inner)
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 8964cd670..bf91ef462 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -7199,10 +7199,11 @@ netdev_linux_prepend_vnet_hdr(struct dp_packet *b, int mtu)
 /* The packet has good L4 checksum. No need to validate again. */
 vnet->csum_start = vnet->csum_offset = (OVS_FORCE __virtio16) 0;
 vnet->flags = VIRTIO_NET_HDR_F_DATA_VALID;
-if (!dp_packet_ip_checksum_good(b)) {
-/* It is possible that L4 is good but the IP checksum isn't
- * complete. For example in the case of UDP encapsulation of an ARP
- * packet where the UDP checksum is 0. */
+
+/* It is possible that L4 is good but the IPv4 checksum isn't
+ * complete. For example in the case of UDP encapsulation of an ARP
+ * packet where the UDP checksum is 0. */
+if (dp_packet_hwol_l3_csum_ipv4_ol(b)) {
 dp_packet_ip_set_header_csum(b, false);
 }
 } else if (dp_packet_hwol_tx_l4_checksum(b)) {
-- 
2.39.3
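
Note for readers of the archive: the decision made by
dp_packet_hwol_l3_csum_ipv4_ol() in this patch can be modelled in
isolation. The sketch below is only an illustration under stated
assumptions: the OL_* flag names and bit values are hypothetical
stand-ins rather than the real DP_PACKET_OL_* definitions, and the
"checksum good" test is simplified to a single bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only.  The real DP_PACKET_OL_*
 * values are defined in lib/dp-packet.h and are not reproduced here. */
enum {
    OL_TX_IP_CKSUM       = 1u << 0, /* IPv4 header checksum offload wanted. */
    OL_RX_IP_CKSUM_GOOD  = 1u << 1, /* IPv4 header checksum known good. */
    OL_TX_OUTER_IPV4     = 1u << 2, /* Encapsulated, outer header is IPv4. */
    OL_TX_OUTER_IPV6     = 1u << 3, /* Encapsulated, outer header is IPv6. */
    OL_TX_OUTER_IP_CKSUM = 1u << 4  /* Outer IPv4 checksum offload wanted. */
};

/* Returns true when software still has to fill in an IPv4 header checksum:
 *  - outer IPv4: only if the outer checksum offload bit is set;
 *  - outer IPv6: never, since IPv6 has no header checksum;
 *  - not encapsulated: if offload was requested and the checksum is not
 *    already marked good. */
static bool
needs_ipv4_csum_fill(uint32_t flags)
{
    if (flags & OL_TX_OUTER_IPV4) {
        return (flags & OL_TX_OUTER_IP_CKSUM) != 0;
    } else if (!(flags & OL_TX_OUTER_IPV6)) {
        return (flags & OL_TX_IP_CKSUM) && !(flags & OL_RX_IP_CKSUM_GOOD);
    }
    return false;
}
```

The point of the three-way split is that a packet with an IPv6 outer
header never reaches the IPv4 checksum repair, which is the case the
commit message says the earlier check did not account for.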

[ovs-dev] [PATCH v4 1/4] netdev-linux: Favour inner packet for multi-encapsulated TSO.

2024-02-15 Thread Mike Pattrick

Previously if an OVS configuration nested multiple layers of UDP tunnels
like VXLAN or GENEVE on top of each other through netdev-linux
interfaces, the vnet header would be incorrectly set to the outermost
UDP tunnel layer instead of the intermediary tunnel layer.

This resulted in the middle UDP tunnel not checksum offloading properly.

Fixes: 85bcbbed839a ("userspace: Enable tunnel tests with TSO.")
Reported-by: David Marchand 
Signed-off-by: Mike Pattrick 
---
nb: The first patch of the v2 version of this series, which was not
explicitly a bug fix, was removed.
---
 lib/netdev-linux.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 00df7f634..8964cd670 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -7247,14 +7247,23 @@ netdev_linux_prepend_vnet_hdr(struct dp_packet *b, int mtu)
 vnet->csum_offset = (OVS_FORCE __virtio16) __builtin_offsetof(
 struct tcp_header, tcp_csum);
 } else if (dp_packet_hwol_l4_is_udp(b)) {
-struct udp_header *udp_hdr = dp_packet_l4(b);
+/* Favour the inner packet when indicating checksum offsets. */
+void *l3_off = dp_packet_inner_l3(b);
+void *l4_off = dp_packet_inner_l4(b);
+
+if (!l3_off || !l4_off) {
+l3_off = dp_packet_l3(b);
+l4_off = dp_packet_l4(b);
+}
+struct udp_header *udp_hdr = l4_off;
+
 ovs_be16 csum = 0;
 
 if (dp_packet_hwol_is_ipv4(b)) {
-const struct ip_header *ip_hdr = dp_packet_l3(b);
+const struct ip_header *ip_hdr = l3_off;
 csum = ~csum_finish(packet_csum_pseudoheader(ip_hdr));
 } else if (dp_packet_hwol_tx_ipv6(b)) {
-const struct ovs_16aligned_ip6_hdr *ip6_hdr = dp_packet_l3(b);
+const struct ovs_16aligned_ip6_hdr *ip6_hdr = l3_off;
 csum = ~csum_finish(packet_csum_pseudoheader6(ip6_hdr));
 }
 
-- 
2.39.3
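
Note for readers of the archive: the "favour the inner packet" rule
from this patch, together with the pseudo-header sum that seeds the
UDP checksum field, can be sketched outside of OVS. Everything named
below (struct pkt_view, OFS_UNSET, pick_csum_headers, ipv4_pseudo_sum)
is hypothetical and exists only for this illustration; it is not the
OVS API. The sum is returned un-inverted, which is what
~csum_finish(...) in the patch appears to produce.

```c
#include <stddef.h>
#include <stdint.h>

#define OFS_UNSET SIZE_MAX   /* Hypothetical "header not present" marker. */

/* Hypothetical, simplified view of a packet's header offsets. */
struct pkt_view {
    const uint8_t *data;
    size_t l3_ofs, l4_ofs;             /* Outer headers. */
    size_t inner_l3_ofs, inner_l4_ofs; /* OFS_UNSET when not encapsulated. */
};

struct hdr_pair {
    const uint8_t *l3;
    const uint8_t *l4;
};

/* Prefer the inner L3/L4 headers; fall back to the outer ones when either
 * inner offset is missing. */
static struct hdr_pair
pick_csum_headers(const struct pkt_view *p)
{
    struct hdr_pair h;

    if (p->inner_l3_ofs != OFS_UNSET && p->inner_l4_ofs != OFS_UNSET) {
        h.l3 = p->data + p->inner_l3_ofs;
        h.l4 = p->data + p->inner_l4_ofs;
    } else {
        h.l3 = p->data + p->l3_ofs;
        h.l4 = p->data + p->l4_ofs;
    }
    return h;
}

/* Ones' complement sum of the IPv4 pseudo-header (source, destination,
 * protocol, L4 length), folded to 16 bits and NOT inverted.  The result is
 * in host byte order; a caller storing it in a header would convert it. */
static uint16_t
ipv4_pseudo_sum(const uint8_t *ip, uint16_t l4_len)
{
    uint32_t sum = 0;

    for (int i = 12; i < 20; i += 2) {      /* Source and destination. */
        sum += (uint32_t) ((ip[i] << 8) | ip[i + 1]);
    }
    sum += ip[9];                           /* Protocol, zero-padded. */
    sum += l4_len;
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) sum;
}
```

Both the IPv4 and the IPv6 pseudo-header are built from the L3 header,
so in a sketch like this both branches would be fed the 'l3' pointer of
the selected pair.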

Re: [ovs-dev] [PATCH v3 3/5] netdev-native-tnl: Mark all vxlan/geneve packets as tunneled.

2024-02-15 Thread Mike Pattrick

On Thu, Feb 15, 2024 at 4:59 PM Mike Pattrick  wrote:
>
> Previously some packets were excluded from the tunnel mark if they
> weren't L4. However, this causes problems with multi encapsulated
> packets like arp.
>
> Due to these flags being set, additional checks are required in checksum
> modification code.
>
> Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.")
> Reported-by: David Marchand 
> Signed-off-by: Mike Pattrick 
> ---
>  lib/dp-packet.h | 19 +--
>  lib/netdev-native-tnl.c | 10 --
>  lib/packets.c   |  8 
>  3 files changed, 29 insertions(+), 8 deletions(-)
>
> diff --git a/lib/dp-packet.h b/lib/dp-packet.h
> index 770ddc1b9..2fa17d814 100644
> --- a/lib/dp-packet.h
> +++ b/lib/dp-packet.h
> @@ -1386,8 +1386,8 @@ dp_packet_ip_checksum_bad(const struct dp_packet *p)
>
>  /* Return 'true' is packet 'b' is not encapsulated and is marked for IPv4
>   * checksum offload, or if 'b' is encapsulated and the outer layer is marked
> - * for IPv4 checksum offload. IPv6 packets and non offloaded packets return
> - * 'false'. */
> + * for IPv4 checksum offload. IPv6 packets, non offloaded packets, and IPv4
> + * packets that are marked as good return 'false'. */
>  static inline bool
>  dp_packet_hwol_l3_csum_ipv4_ol(const struct dp_packet *b)
>  {
> @@ -1400,6 +1400,21 @@ dp_packet_hwol_l3_csum_ipv4_ol(const struct dp_packet *b)
>  return false;
>  }
>
> +/* Return 'true' is packet 'b' is not encapsulated and is marked for IPv4
> + * checksum offload, or if 'b' is encapsulated and the outer layer is marked
> + * for IPv4 checksum offload. IPv6 packets and non offloaded packets return
> + * 'false'. */
> +static inline bool
> +dp_packet_hwol_l3_ipv4(const struct dp_packet *b)
> +{
> +if (dp_packet_hwol_is_outer_ipv4(b)) {
> +return true;
> +} else if (!dp_packet_hwol_is_outer_ipv6(b)) {
> +return dp_packet_hwol_tx_ip_csum(b);
> +}
> +return false;
> +}
> +
>  /* Calculate and set the IPv4 header checksum in packet 'p'. */
>  static inline void
>  dp_packet_ip_set_header_csum(struct dp_packet *p, bool inner)
> diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
> index 0d6d803fe..dee9ab344 100644
> --- a/lib/netdev-native-tnl.c
> +++ b/lib/netdev-native-tnl.c
> @@ -91,8 +91,7 @@ netdev_tnl_ip_extract_tnl_md(struct dp_packet *packet, struct flow_tnl *tnl,
>
>  /* A packet coming from a network device might have the
>   * csum already checked. In this case, skip the check. */
> -if (OVS_UNLIKELY(!dp_packet_ip_checksum_good(packet))
> -&& !dp_packet_hwol_tx_ip_csum(packet)) {
> +if (OVS_UNLIKELY(!dp_packet_hwol_l3_csum_ipv4_ol(packet))) {
>  if (csum(ip, IP_IHL(ip->ip_ihl_ver) * 4)) {
>  VLOG_WARN_RL(&err_rl, "ip packet has invalid checksum");
>  return NULL;
> @@ -299,6 +298,13 @@ dp_packet_tnl_ol_process(struct dp_packet *packet,
>   (char *) dp_packet_eth(packet) +
>   VXLAN_HLEN);
>  }
> +} else {
> +/* Mark non-l4 packets as tunneled. */
> +if (data->tnl_type == OVS_VPORT_TYPE_GENEVE) {
> +dp_packet_hwol_set_tunnel_geneve(packet);
> +} else if (data->tnl_type == OVS_VPORT_TYPE_VXLAN) {
> +dp_packet_hwol_set_tunnel_vxlan(packet);
> +}
>  }
>  }
>
> diff --git a/lib/packets.c b/lib/packets.c
> index 36c6692e5..ed185b4ec 100644
> --- a/lib/packets.c
> +++ b/lib/packets.c
> @@ -1149,7 +1149,7 @@ packet_set_ipv4_addr(struct dp_packet *packet,
>  }
>  }
>
> -if (dp_packet_hwol_tx_ip_csum(packet)) {
> +if (dp_packet_hwol_l3_ipv4(packet)) {
>  dp_packet_ol_reset_ip_csum_good(packet);
>  } else {
>  nh->ip_csum = recalc_csum32(nh->ip_csum, old_addr, new_addr);
> @@ -1328,7 +1328,7 @@ packet_set_ipv4(struct dp_packet *packet, ovs_be32 src, ovs_be32 dst,
>  if (nh->ip_tos != tos) {
>  uint8_t *field = &nh->ip_tos;
>
> -if (dp_packet_hwol_tx_ip_csum(packet)) {
> +if (dp_packet_hwol_l3_ipv4(packet)) {
>  dp_packet_ol_reset_ip_csum_good(packet);
>  } else {
>  nh->ip_csum = recalc_csum16(nh->ip_csum, htons((uint16_t) *field),
> @@ -1341,7 +1341,7 @@ packet_set_ipv4(struct dp_packet *packet, ovs_be32 src, ovs_be32 dst,
>  if (nh->ip_ttl != ttl) {
>  uint8_t *field = &nh->ip_ttl;
>
> -if (dp_packet_hwol_tx_ip_csum(packet)) {
> +if (dp_packet_hwol_l

[ovs-dev] [PATCH v3 3/5] netdev-native-tnl: Mark all vxlan/geneve packets as tunneled.

2024-02-15 Thread Mike Pattrick

Previously some packets were excluded from the tunnel mark if they
weren't L4. However, this causes problems with multi-encapsulated
packets like ARP.

Due to these flags being set, additional checks are required in checksum
modification code.

Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.")
Reported-by: David Marchand 
Signed-off-by: Mike Pattrick 
---
 lib/dp-packet.h | 19 +--
 lib/netdev-native-tnl.c | 10 --
 lib/packets.c   |  8 
 3 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 770ddc1b9..2fa17d814 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -1386,8 +1386,8 @@ dp_packet_ip_checksum_bad(const struct dp_packet *p)
 
 /* Return 'true' if packet 'b' is not encapsulated and is marked for IPv4
  * checksum offload, or if 'b' is encapsulated and the outer layer is marked
- * for IPv4 checksum offload. IPv6 packets and non offloaded packets return
- * 'false'. */
+ * for IPv4 checksum offload. IPv6 packets, non offloaded packets, and IPv4
+ * packets that are marked as good return 'false'. */
 static inline bool
 dp_packet_hwol_l3_csum_ipv4_ol(const struct dp_packet *b)
 {
@@ -1400,6 +1400,21 @@ dp_packet_hwol_l3_csum_ipv4_ol(const struct dp_packet *b)
 return false;
 }
 
+/* Return 'true' if packet 'b' is not encapsulated and is marked for IPv4
+ * checksum offload, or if 'b' is encapsulated and the outer layer is marked
+ * for IPv4 checksum offload. IPv6 packets and non offloaded packets return
+ * 'false'. */
+static inline bool
+dp_packet_hwol_l3_ipv4(const struct dp_packet *b)
+{
+if (dp_packet_hwol_is_outer_ipv4(b)) {
+return true;
+} else if (!dp_packet_hwol_is_outer_ipv6(b)) {
+return dp_packet_hwol_tx_ip_csum(b);
+}
+return false;
+}
+
 /* Calculate and set the IPv4 header checksum in packet 'p'. */
 static inline void
 dp_packet_ip_set_header_csum(struct dp_packet *p, bool inner)
diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
index 0d6d803fe..dee9ab344 100644
--- a/lib/netdev-native-tnl.c
+++ b/lib/netdev-native-tnl.c
@@ -91,8 +91,7 @@ netdev_tnl_ip_extract_tnl_md(struct dp_packet *packet, struct flow_tnl *tnl,
 
 /* A packet coming from a network device might have the
  * csum already checked. In this case, skip the check. */
-if (OVS_UNLIKELY(!dp_packet_ip_checksum_good(packet))
-&& !dp_packet_hwol_tx_ip_csum(packet)) {
+if (OVS_UNLIKELY(!dp_packet_hwol_l3_csum_ipv4_ol(packet))) {
 if (csum(ip, IP_IHL(ip->ip_ihl_ver) * 4)) {
 VLOG_WARN_RL(&err_rl, "ip packet has invalid checksum");
 return NULL;
@@ -299,6 +298,13 @@ dp_packet_tnl_ol_process(struct dp_packet *packet,
  (char *) dp_packet_eth(packet) +
  VXLAN_HLEN);
 }
+} else {
+/* Mark non-l4 packets as tunneled. */
+if (data->tnl_type == OVS_VPORT_TYPE_GENEVE) {
+dp_packet_hwol_set_tunnel_geneve(packet);
+} else if (data->tnl_type == OVS_VPORT_TYPE_VXLAN) {
+dp_packet_hwol_set_tunnel_vxlan(packet);
+}
 }
 }
 
diff --git a/lib/packets.c b/lib/packets.c
index 36c6692e5..ed185b4ec 100644
--- a/lib/packets.c
+++ b/lib/packets.c
@@ -1149,7 +1149,7 @@ packet_set_ipv4_addr(struct dp_packet *packet,
 }
 }
 
-if (dp_packet_hwol_tx_ip_csum(packet)) {
+if (dp_packet_hwol_l3_ipv4(packet)) {
 dp_packet_ol_reset_ip_csum_good(packet);
 } else {
 nh->ip_csum = recalc_csum32(nh->ip_csum, old_addr, new_addr);
@@ -1328,7 +1328,7 @@ packet_set_ipv4(struct dp_packet *packet, ovs_be32 src, ovs_be32 dst,
 if (nh->ip_tos != tos) {
 uint8_t *field = &nh->ip_tos;
 
-if (dp_packet_hwol_tx_ip_csum(packet)) {
+if (dp_packet_hwol_l3_ipv4(packet)) {
 dp_packet_ol_reset_ip_csum_good(packet);
 } else {
 nh->ip_csum = recalc_csum16(nh->ip_csum, htons((uint16_t) *field),
@@ -1341,7 +1341,7 @@ packet_set_ipv4(struct dp_packet *packet, ovs_be32 src, ovs_be32 dst,
 if (nh->ip_ttl != ttl) {
 uint8_t *field = &nh->ip_ttl;
 
-if (dp_packet_hwol_tx_ip_csum(packet)) {
+if (dp_packet_hwol_l3_ipv4(packet)) {
 dp_packet_ol_reset_ip_csum_good(packet);
 } else {
 nh->ip_csum = recalc_csum16(nh->ip_csum, htons(*field << 8),
@@ -1979,7 +1979,7 @@ IP_ECN_set_ce(struct dp_packet *pkt, bool is_ipv6)
 
 tos |= IP_ECN_CE;
 if (nh->ip_tos != tos) {
-if (dp_packet_hwol_tx_ip_csum(pkt)) {
+if (dp_packet_hwol_l3_ipv4(packet)) {
 dp_packet_ol_reset_ip_csum_good(pkt);
 } else {
 nh->ip_csum = recalc_csum16(nh->ip_csum, htons(nh->ip_tos),
-- 
2.39.3
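
Note for readers of the archive: the 'else' branches touched in
lib/packets.c patch the IPv4 header checksum incrementally instead of
recomputing it. A minimal sketch of that technique, following RFC 1624
(equation 3), is shown below. The helper names full_csum and
update_csum16 are made up for this example and are not the OVS
recalc_csum16()/recalc_csum32() implementations; values are handled as
host-order 16-bit words read big-endian from the buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Full ones' complement checksum over 'len' bytes ('len' must be even).
 * The checksum field inside the buffer is expected to be zero. */
static uint16_t
full_csum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2) {
        sum += (uint32_t) ((p[i] << 8) | p[i + 1]);
    }
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) ~sum;
}

/* Incremental update, RFC 1624 eqn. 3: HC' = ~(~HC + ~m + m'), where 'm'
 * is the old 16-bit word and m' the new one. */
static uint16_t
update_csum16(uint16_t old_csum, uint16_t old_word, uint16_t new_word)
{
    uint32_t sum = (uint16_t) ~old_csum;

    sum += (uint16_t) ~old_word;
    sum += new_word;
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) ~sum;
}
```

For a TTL change, the 16-bit word that changes is the one holding TTL
and protocol, so the old and new values of that whole word are passed
in. When checksum offload is in effect this arithmetic is skipped
entirely, which is why the patch only resets the "good" flag in that
case.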

[ovs-dev] [PATCH v3 4/5] Userspace: Add system test with UDP tunneling of UDP traffic.

2024-02-15 Thread Mike Pattrick

Previously a gap existed in the tunnel system tests where only ICMP and
TCP traffic was tested. However, the code paths used for UDP traffic are
different than either of those and should also be tested.

Some of the modified tests had previously checked for TCP with ncat but
didn't include an appropriate check for ncat support. That check was
added to these tests.

Signed-off-by: Mike Pattrick 
---
 tests/system-traffic.at | 118 +---
 1 file changed, 111 insertions(+), 7 deletions(-)

diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index e68fe7e18..ca54a0f73 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -292,6 +292,7 @@ OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
 AT_SETUP([datapath - ping over vxlan tunnel])
+AT_SKIP_IF([test $HAVE_NC = no])
 OVS_CHECK_VXLAN()
 
 OVS_TRAFFIC_VSWITCHD_START()
@@ -318,6 +319,10 @@ NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -W 2 172.31.1.100 | FORMAT_PING], [
 3 packets transmitted, 3 received, 0% packet loss, time 0ms
 ])
 
+dnl Start ncat listeners.
+OVS_DAEMONIZE([nc -l 10.1.1.100 1234 > tcp_data], [nc.pid])
+NETNS_DAEMONIZE([at_ns0], [nc -l -u 10.1.1.1 4321 > udp_data], [nc2.pid])
+
 dnl Okay, now check the overlay with different packet sizes
 NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -W 2 10.1.1.100 | FORMAT_PING], [0], [dnl
 3 packets transmitted, 3 received, 0% packet loss, time 0ms
@@ -329,15 +334,29 @@ NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -W 2 10.1.1.100 | FORMAT_PI
 3 packets transmitted, 3 received, 0% packet loss, time 0ms
 ])
 
+dnl Verify that ncat is ready.
+OVS_WAIT_UNTIL([netstat -ln | grep :1234])
+OVS_WAIT_UNTIL([NS_EXEC([at_ns0], [netstat -ln | grep :4321])])
+
 dnl Check large bidirectional TCP.
 AT_CHECK([dd if=/dev/urandom of=payload.bin bs=6 count=1 2> /dev/null])
-OVS_DAEMONIZE([nc -l 10.1.1.100 1234 > data], [nc.pid])
 NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT 10.1.1.100 1234 < payload.bin])
 
 dnl Wait until transfer completes before checking.
 OVS_WAIT_WHILE([kill -0 $(cat nc.pid)])
-AT_CHECK([diff -q payload.bin data], [0])
+AT_CHECK([diff -q payload.bin tcp_data], [0])
+
+dnl Check UDP
+AT_CHECK([dd if=/dev/urandom of=payload.bin bs=600 count=1 2> /dev/null])
+AT_CHECK([nc $NC_EOF_OPT -u 10.1.1.1 4321 < payload.bin])
 
+dnl The UDP listener will just listen forever if not terminated.
+OVS_WAIT_UNTIL([kill -0 $(cat nc2.pid)])
+AT_CHECK([kill $(cat nc2.pid)])
+
+dnl Wait until transfer completes before checking.
+OVS_WAIT_WHILE([kill -0 $(cat nc2.pid)])
+AT_CHECK([diff -q payload.bin udp_data], [0])
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
@@ -389,6 +408,7 @@ OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
 AT_SETUP([datapath - ping over vxlan6 tunnel])
+AT_SKIP_IF([test $HAVE_NC = no])
 OVS_CHECK_VXLAN_UDP6ZEROCSUM()
 
 OVS_TRAFFIC_VSWITCHD_START()
@@ -412,6 +432,9 @@ ADD_NATIVE_TUNNEL6([vxlan], [at_vxlan1], [at_ns0], [fc00::100], [10.1.1.1/24],
 
 OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::100])
 
+OVS_DAEMONIZE([nc -l 10.1.1.100 1234 > data], [nc.pid])
+NETNS_DAEMONIZE([at_ns0], [nc -l -u 10.1.1.1 4321 > udp_data], [nc2.pid])
+
 dnl First, check the underlay
 NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -W 2 fc00::100 | FORMAT_PING], [0], [dnl
 3 packets transmitted, 3 received, 0% packet loss, time 0ms
@@ -428,14 +451,29 @@ NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -W 2 10.1.1.100 | FORMAT_PI
 3 packets transmitted, 3 received, 0% packet loss, time 0ms
 ])
 
+dnl Verify that ncat is ready.
+OVS_WAIT_UNTIL([netstat -ln | grep :1234])
+OVS_WAIT_UNTIL([NS_EXEC([at_ns0], [netstat -ln | grep :4321])])
+
 dnl Check large bidirectional TCP.
 AT_CHECK([dd if=/dev/urandom of=payload.bin bs=6 count=1 2> /dev/null])
-OVS_DAEMONIZE([nc -l 10.1.1.100 1234 > data], [nc.pid])
 NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT 10.1.1.100 1234 < payload.bin])
 
 dnl Wait until transfer completes before checking.
 OVS_WAIT_WHILE([kill -0 $(cat nc.pid)])
 AT_CHECK([diff -q payload.bin data], [0])
+
+dnl Check UDP
+AT_CHECK([dd if=/dev/urandom of=payload.bin bs=600 count=1 2> /dev/null])
+AT_CHECK([nc $NC_EOF_OPT -u 10.1.1.1 4321 < payload.bin])
+
+dnl The UDP listener will just listen forever if not terminated.
+OVS_WAIT_UNTIL([kill -0 $(cat nc2.pid)])
+AT_CHECK([kill $(cat nc2.pid)])
+
+dnl Wait until transfer completes before checking.
+OVS_WAIT_WHILE([kill -0 $(cat nc2.pid)])
+AT_CHECK([diff -q payload.bin udp_data], [0])
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
@@ -466,6 +504,10 @@ NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -W 2 172.31.1.100 | FORMAT_PING], [
 3 packets transmitted, 3 received, 0% packet loss, time 0ms
 ])
 
+dnl Start ncat listeners.
+OVS_DAEMONIZE([nc -l 10.1.1.100 1234 > tcp_data], [nc.pid])
+NETNS_DAEMONIZE([at_ns0], [nc -l -u 10.1.1.1 4321 > udp_data], [nc2.pid])
+
 dnl Okay, now check the overlay with different packet sizes
 NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -W 2

[ovs-dev] [PATCH v3 2/5] netdev-linux: Only repair IP checksum in IPv4.

2024-02-15 Thread Mike Pattrick

Previously a change was added to the vnet prepend code to solve for the
case where no L4 checksum offloading was needed but the L3 checksum
hadn't been calculated. But the added check didn't properly account
for IPv6 traffic.

Fixes: 85bcbbed839a ("userspace: Enable tunnel tests with TSO.")
Reported-by: David Marchand 
Signed-off-by: Mike Pattrick 
---
 lib/dp-packet.h| 18 +-
 lib/netdev-linux.c |  9 +
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 802d3f385..770ddc1b9 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -1184,7 +1184,7 @@ dp_packet_hwol_is_tunnel_vxlan(struct dp_packet *b)
 
 /* Returns 'true' if packet 'b' is marked for outer IPv4 checksum offload. */
 static inline bool
-dp_packet_hwol_is_outer_ipv4_cksum(struct dp_packet *b)
+dp_packet_hwol_is_outer_ipv4_cksum(const struct dp_packet *b)
 {
 return !!(*dp_packet_ol_flags_ptr(b) & DP_PACKET_OL_TX_OUTER_IP_CKSUM);
 }
@@ -1384,6 +1384,22 @@ dp_packet_ip_checksum_bad(const struct dp_packet *p)
 DP_PACKET_OL_RX_IP_CKSUM_BAD;
 }
 
+/* Return 'true' if packet 'b' is not encapsulated and is marked for IPv4
+ * checksum offload, or if 'b' is encapsulated and the outer layer is marked
+ * for IPv4 checksum offload. IPv6 packets and non offloaded packets return
+ * 'false'. */
+static inline bool
+dp_packet_hwol_l3_csum_ipv4_ol(const struct dp_packet *b)
+{
+if (dp_packet_hwol_is_outer_ipv4(b)) {
+return dp_packet_hwol_is_outer_ipv4_cksum(b);
+} else if (!dp_packet_hwol_is_outer_ipv6(b)) {
+return dp_packet_hwol_tx_ip_csum(b) &&
+   !dp_packet_ip_checksum_good(b);
+}
+return false;
+}
+
 /* Calculate and set the IPv4 header checksum in packet 'p'. */
 static inline void
 dp_packet_ip_set_header_csum(struct dp_packet *p, bool inner)
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 8964cd670..bf91ef462 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -7199,10 +7199,11 @@ netdev_linux_prepend_vnet_hdr(struct dp_packet *b, int mtu)
 /* The packet has good L4 checksum. No need to validate again. */
 vnet->csum_start = vnet->csum_offset = (OVS_FORCE __virtio16) 0;
 vnet->flags = VIRTIO_NET_HDR_F_DATA_VALID;
-if (!dp_packet_ip_checksum_good(b)) {
-/* It is possible that L4 is good but the IP checksum isn't
- * complete. For example in the case of UDP encapsulation of an ARP
- * packet where the UDP checksum is 0. */
+
+/* It is possible that L4 is good but the IPv4 checksum isn't
+ * complete. For example in the case of UDP encapsulation of an ARP
+ * packet where the UDP checksum is 0. */
+if (dp_packet_hwol_l3_csum_ipv4_ol(b)) {
 dp_packet_ip_set_header_csum(b, false);
 }
 } else if (dp_packet_hwol_tx_l4_checksum(b)) {
-- 
2.39.3

[ovs-dev] [PATCH v3 1/5] netdev-linux: Favour inner packet for multi-encapsulated TSO.

2024-02-15 Thread Mike Pattrick

Previously if an OVS configuration nested multiple layers of UDP tunnels
like VXLAN or GENEVE on top of each other through netdev-linux
interfaces, the vnet header would be incorrectly set to the outermost
UDP tunnel layer instead of the intermediary tunnel layer.

This resulted in the middle UDP tunnel not checksum offloading properly.

Fixes: 85bcbbed839a ("userspace: Enable tunnel tests with TSO.")
Reported-by: David Marchand 
Signed-off-by: Mike Pattrick 
---
nb: The first patch of the v2 version of this series, which was not
explicitly a bug fix, was removed.
---
 lib/netdev-linux.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 00df7f634..8964cd670 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -7247,14 +7247,23 @@ netdev_linux_prepend_vnet_hdr(struct dp_packet *b, int mtu)
 vnet->csum_offset = (OVS_FORCE __virtio16) __builtin_offsetof(
 struct tcp_header, tcp_csum);
 } else if (dp_packet_hwol_l4_is_udp(b)) {
-struct udp_header *udp_hdr = dp_packet_l4(b);
+/* Favour the inner packet when indicating checksum offsets. */
+void *l3_off = dp_packet_inner_l3(b);
+void *l4_off = dp_packet_inner_l4(b);
+
+if (!l3_off || !l4_off) {
+l3_off = dp_packet_l3(b);
+l4_off = dp_packet_l4(b);
+}
+struct udp_header *udp_hdr = l4_off;
+
 ovs_be16 csum = 0;
 
 if (dp_packet_hwol_is_ipv4(b)) {
-const struct ip_header *ip_hdr = dp_packet_l3(b);
+const struct ip_header *ip_hdr = l3_off;
 csum = ~csum_finish(packet_csum_pseudoheader(ip_hdr));
 } else if (dp_packet_hwol_tx_ipv6(b)) {
-const struct ovs_16aligned_ip6_hdr *ip6_hdr = dp_packet_l3(b);
+const struct ovs_16aligned_ip6_hdr *ip6_hdr = l3_off;
 csum = ~csum_finish(packet_csum_pseudoheader6(ip6_hdr));
 }
 
-- 
2.39.3
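
For readers following the thread, here is a minimal sketch of the
selection rule the hunk above introduces: use the inner L3/L4 headers when
both are recorded, otherwise fall back to the outer pair. Everything in it
(struct pkt_sketch, struct hdr_pair, pkt_at(), pkt_csum_headers(), and the
use of UINT16_MAX as the "not set" marker) is hypothetical and only stands
in for struct dp_packet and its accessors; it is not the OVS code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct dp_packet: header positions are kept
 * as byte offsets from the start of the frame, and UINT16_MAX means the
 * header was never recorded. */
struct pkt_sketch {
    uint8_t *data;
    uint16_t l3_ofs, l4_ofs;
    uint16_t inner_l3_ofs, inner_l4_ofs;
};

struct hdr_pair {
    void *l3;
    void *l4;
};

/* Translate an offset into a pointer, or NULL if the offset is unset. */
static void *
pkt_at(const struct pkt_sketch *p, uint16_t ofs)
{
    return ofs == UINT16_MAX ? NULL : p->data + ofs;
}

/* Prefer the inner L3/L4 headers.  Fall back to the outer pair unless
 * both inner headers are present, so that L3 and L4 always come from the
 * same encapsulation layer. */
static struct hdr_pair
pkt_csum_headers(const struct pkt_sketch *p)
{
    assert(p && p->data);

    struct hdr_pair h = {
        pkt_at(p, p->inner_l3_ofs),
        pkt_at(p, p->inner_l4_ofs),
    };

    if (!h.l3 || !h.l4) {
        h.l3 = pkt_at(p, p->l3_ofs);
        h.l4 = pkt_at(p, p->l4_ofs);
    }
    return h;
}
```

Taking both pointers from the same layer matters because the UDP checksum
seed is derived from the IP header that directly precedes that UDP header.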


Re: [ovs-dev] [PATCH v2 2/4] netdev-linux: Favour inner packet for multi-encapsulated TSO.

2024-02-14 Thread Mike Pattrick

On Wed, Feb 14, 2024 at 12:09 PM David Marchand
 wrote:
>
> Hello Mike,
>
> On Mon, Feb 12, 2024 at 8:50 PM Mike Pattrick  wrote:
> >
> > Previously if an OVS configuration nested multiple layers of UDP tunnels
> > like VXLAN or GENEVE on top of each other through netdev-linux
> > interfaces, the vnet header would be incorrectly set to the outermost
> > UDP tunnel layer instead of the intermediary tunnel layer.
> >
> > This resulted in the middle UDP tunnel not checksum offloading properly.
> >
> > Fixes: 3337e6d91c5b ("userspace: Enable L4 checksum offloading by default.")
> > Reported-by: David Marchand 
> > Signed-off-by: Mike Pattrick 
>
> I have some trouble relating this patch to the issue I faced :-).
> Could you detail a test that shows the issue you fix here?

I made a slight modification to the test you provided, adding a UDP
traffic check, which unveiled this issue. But you are
correct, I got so distracted by one issue that I missed other issues.

>
> After applying (only this patch), I still reproduce an issue with
> inner checksums.
> As I reported this issue to you offlist, let me put the details in public here.
>
> I wrote a system-traffic.at unit test that stacks 3 vxlan tunnels
> (separate topic, but for the context, my goal was to stress DPDK
> dp-packets wrt headroom).
> If I try this unit test before commit 084c8087292c ("userspace:
> Support VXLAN and GENEVE TSO."), I have no issue.
>
> The topology is as follows:
> ##
> #
> # at_ns0. init_net
> #   .
> # at_vxlan1 (10.1.1.1/24)   . br0 (10.1.1.100/24)
> # (remote 172.31.1.100) . |
> #   . at_vxlan0
> #   . (remote 172.31.1.1)
> #   .
> # at_vxlan3 (172.31.1.1/24) . br-underlay0 (172.31.1.100/24)
> # (remote 172.31.2.100) . |
> #   . at_vxlan2
> #   . (remote 172.31.2.1)
> #   .
> # at_vxlan5 (172.31.2.1/24) . br-underlay1 (172.31.2.100/24)
> # (remote 172.31.3.100) . |
> #   . at_vxlan4
> #   . (remote 172.31.3.1)
> #   .
> # p0 (172.31.3.1/24). br-underlay2 (172.31.3.100/24)
> # | . |
> # \-.-ovs-p0
> #
> ##
>
> (gmail will probably bust this copy/paste, so putting a link to the
> actual test: 
> https://github.com/david-marchand/ovs/commit/manyvxlan~2#diff-45a77f85f9679bc66ac97300392c0d5d9f5c53264fa8a82d735a553246e71faeR400)
>
> With this setup, I try to ping, from at_ns0 netns, the ip address of
> the br tap iface plugged with the other side of each tunnel:
>
> - Outermost level, no encapsulation, all good:
> 16:24:51.590966 a6:0a:bf:e2:f3:f2 > 82:cf:78:de:ed:46, ethertype IPv4
> (0x0800), length 98: (tos 0x0, ttl 64, id 63550, offset 0, flags [DF],
> proto ICMP (1), length 84)
> 172.31.3.1 > 172.31.3.100: ICMP echo request, id 26707, seq 1, length 64
>
> 16:24:51.591084 82:cf:78:de:ed:46 > a6:0a:bf:e2:f3:f2, ethertype IPv4
> (0x0800), length 98: (tos 0x0, ttl 64, id 28720, offset 0, flags
> [none], proto ICMP (1), length 84)
> 172.31.3.100 > 172.31.3.1: ICMP echo reply, id 26707, seq 1, length 64
>
> - One tunnel encap all good:
> 16:24:54.140629 a6:0a:bf:e2:f3:f2 > 82:cf:78:de:ed:46, ethertype IPv4
> (0x0800), length 148: (tos 0x0, ttl 64, id 61052, offset 0, flags
> [none], proto UDP (17), length 134)
> 172.31.3.1.36831 > 172.31.3.100.vxlan: [udp sum ok] VXLAN, flags
> [I] (0x08), vni 0
> 1e:db:ec:e5:28:6d > 9a:39:be:e8:18:4b, ethertype IPv4 (0x0800), length
> 98: (tos 0x0, ttl 64, id 54399, offset 0, flags [DF], proto ICMP (1),
> length 84)
> 172.31.2.1 > 172.31.2.100: ICMP echo request, id 51488, seq 1, length 64
>
> 16:24:54.140772 82:cf:78:de:ed:46 > a6:0a:bf:e2:f3:f2, ethertype IPv4
> (0x0800), length 148: (tos 0x0, ttl 64, id 0, offset 0, flags [DF],
> proto UDP (17), length 134)
> 172.31.3.100.39912 > 172.31.3.1.vxlan: [no cksum] VXLAN, flags [I]
> (0x08), vni 0
> 9a:39:be:e8:18:4b > 1e:db:ec:e5:28:6d, ethertype IPv4 (0x0800), length
> 98: (tos 0x0, ttl 64, id 29701, offset 0, flags [none], proto ICMP
> (1), length 84)
> 172.31.2.100 > 172.31.2.1: ICMP echo reply, id 51488, seq 1, length 64
>
> - Two tunnels encap:
> 16:24:58.578900 a6:0a:bf:e2:f3:f2 > 82:cf:78:de:ed:46, ethertype IPv4
> (0x0800), length 142: (tos 0x0, ttl 64, id 61719, offset 0, flags
>

Re: [ovs-dev] [PATCH v2 2/4] netdev-linux: Favour inner packet for multi-encapsulated TSO.

2024-02-14 Thread Mike Pattrick

Note, this failed github CI with the following error message:

../../tests/testsuite: line 4034: ./atconfig: No such file or directory

This appears to be a false negative.


-M

On Mon, Feb 12, 2024 at 2:50 PM Mike Pattrick  wrote:
>
> Previously if an OVS configuration nested multiple layers of UDP tunnels
> like VXLAN or GENEVE on top of each other through netdev-linux
> interfaces, the vnet header would be incorrectly set to the outermost
> UDP tunnel layer instead of the intermediary tunnel layer.
>
> This resulted in the middle UDP tunnel not checksum offloading properly.
>
> Fixes: 3337e6d91c5b ("userspace: Enable L4 checksum offloading by default.")
> Reported-by: David Marchand 
> Signed-off-by: Mike Pattrick 
> ---
>  lib/netdev-linux.c | 15 ---
>  1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
> index 1b2e5b6c2..7a156cc28 100644
> --- a/lib/netdev-linux.c
> +++ b/lib/netdev-linux.c
> @@ -7239,14 +7239,23 @@ netdev_linux_prepend_vnet_hdr(struct dp_packet *b, int mtu)
>  vnet->csum_offset = (OVS_FORCE __virtio16) __builtin_offsetof(
>  struct tcp_header, tcp_csum);
>  } else if (dp_packet_hwol_l4_is_udp(b)) {
> -struct udp_header *udp_hdr = dp_packet_l4(b);
> +/* Favour the inner packet when indicating checksum offsets. */
> +void *l3_off = dp_packet_inner_l3(b);
> +void *l4_off = dp_packet_inner_l4(b);
> +
> +if (!l3_off || !l4_off) {
> +l3_off = dp_packet_l3(b);
> +l4_off = dp_packet_l4(b);
> +}
> +struct udp_header *udp_hdr = l4_off;
> +
>  ovs_be16 csum = 0;
>
>  if (dp_packet_hwol_is_ipv4(b)) {
> -const struct ip_header *ip_hdr = dp_packet_l3(b);
> +const struct ip_header *ip_hdr = l3_off;
>  csum = ~csum_finish(packet_csum_pseudoheader(ip_hdr));
>  } else if (dp_packet_hwol_tx_ipv6(b)) {
> -const struct ovs_16aligned_ip6_hdr *ip6_hdr = dp_packet_l3(b);
> +const struct ovs_16aligned_ip6_hdr *ip6_hdr = l3_off;
>  csum = ~csum_finish(packet_csum_pseudoheader6(ip6_hdr));
>  }
>
> --
> 2.39.3
>


Re: [ovs-dev] [PATCH] ovs-pki: Remove umask trick for self-signing.

2024-02-13 Thread Mike Pattrick

On Tue, Feb 13, 2024 at 2:44 PM Ilya Maximets  wrote:
>
> The output file of this openssl command is a certificate signed with
> pre-existing private key.  It doesn't create a private key.   The
> restricted permissions are explicitly removed from the resulted
> certificate right after its generation.  So, there is no point in
> creating it with restricted permissions in the first place.
>
> Fixes: 99e5e05db37a ("ovs-pki: Create private keys with restricted permissions.")
> Signed-off-by: Ilya Maximets 

Acked-by: Mike Pattrick 


Re: [ovs-dev] [PATCH] ovs-pki: Remove executable bit from private/cakey.pem.

2024-02-13 Thread Mike Pattrick

On Tue, Feb 13, 2024 at 2:42 PM Ilya Maximets  wrote:
>
> It's not an executable file.
>
> Signed-off-by: Ilya Maximets 

Acked-by: Mike Pattrick 


Re: [ovs-dev] [PATCH 1/4] Userspace: Software fallback for UDP encapsulated TCP segmentation.

2024-02-12 Thread Mike Pattrick

On Mon, Feb 12, 2024 at 10:04 AM Ilya Maximets  wrote:
>
> On 2/12/24 15:19, Mike Pattrick wrote:
> > On Mon, Feb 12, 2024 at 7:52 AM Ilya Maximets  wrote:
> >>
> >> On 2/12/24 09:13, Mike Pattrick wrote:
> >>> When sending packets that are flagged as requiring segmentation to an
> >>> interface that doesn't support this feature, send the packet to the TSO
> >>> software fallback instead of dropping it.
> >>>
> >>> Signed-off-by: Mike Pattrick 
> >>>
> >>> ---
> >>> Note: Previously this patch failed GitHub CI; however, I was not able to
> >>> reproduce that failure in my VM. I'm resubmitting to see if the failure
> >>> was a fluke.
> >>
> >> Doesn't look like a fluke.  Tests seem to fail consistently...
> >
> > Yes. I'll continue to look into this. I've tried different
> > kernel/compiler versions but it consistently passes for me.
>
> There is always an option to add some debug stuff into the test and
> run GHA in your fork.  But that's the most annoying way of debugging,
> I agree.

That was a good idea, I never would have guessed the issue on my own.

-M

>
> Best regards, Ilya Maximets.
>


[ovs-dev] [PATCH v2 2/4] netdev-linux: Favour inner packet for multi-encapsulated TSO.

2024-02-12 Thread Mike Pattrick

Previously if an OVS configuration nested multiple layers of UDP tunnels
like VXLAN or GENEVE on top of each other through netdev-linux
interfaces, the vnet header would be incorrectly set to the outermost
UDP tunnel layer instead of the intermediary tunnel layer.

This resulted in the middle UDP tunnel not checksum offloading properly.

Fixes: 3337e6d91c5b ("userspace: Enable L4 checksum offloading by default.")
Reported-by: David Marchand 
Signed-off-by: Mike Pattrick 
---
 lib/netdev-linux.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 1b2e5b6c2..7a156cc28 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -7239,14 +7239,23 @@ netdev_linux_prepend_vnet_hdr(struct dp_packet *b, int mtu)
 vnet->csum_offset = (OVS_FORCE __virtio16) __builtin_offsetof(
 struct tcp_header, tcp_csum);
 } else if (dp_packet_hwol_l4_is_udp(b)) {
-struct udp_header *udp_hdr = dp_packet_l4(b);
+/* Favour the inner packet when indicating checksum offsets. */
+void *l3_off = dp_packet_inner_l3(b);
+void *l4_off = dp_packet_inner_l4(b);
+
+if (!l3_off || !l4_off) {
+l3_off = dp_packet_l3(b);
+l4_off = dp_packet_l4(b);
+}
+struct udp_header *udp_hdr = l4_off;
+
 ovs_be16 csum = 0;
 
 if (dp_packet_hwol_is_ipv4(b)) {
-const struct ip_header *ip_hdr = dp_packet_l3(b);
+const struct ip_header *ip_hdr = l3_off;
 csum = ~csum_finish(packet_csum_pseudoheader(ip_hdr));
 } else if (dp_packet_hwol_tx_ipv6(b)) {
-const struct ovs_16aligned_ip6_hdr *ip6_hdr = dp_packet_l3(b);
+const struct ovs_16aligned_ip6_hdr *ip6_hdr = l3_off;
 csum = ~csum_finish(packet_csum_pseudoheader6(ip6_hdr));
 }
 
-- 
2.39.3
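
As background for the checksum lines in this hunk: with checksum offload,
the UDP checksum field is seeded with the uncomplemented ones'-complement
sum of the pseudo-header, and that pseudo-header is built from the IP
header's addresses. Below is a minimal sketch of that seed for IPv4. The
names csum_fold() and ipv4_pseudo_sum() are hypothetical, and the arguments
are taken in host byte order to keep the arithmetic easy to follow; this is
not the OVS packet_csum_pseudoheader() implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Fold a 32-bit running sum into 16 bits with end-around carry. */
static uint16_t
csum_fold(uint32_t sum)
{
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) sum;
}

/* Ones'-complement sum (not complemented) of the IPv4 pseudo-header:
 * source address, destination address, protocol, and L4 length.
 * All arguments are in host byte order for this sketch. */
static uint16_t
ipv4_pseudo_sum(uint32_t src, uint32_t dst, uint8_t proto, uint16_t l4_len)
{
    uint32_t sum = 0;

    assert(l4_len >= 8 || proto != 17);   /* A UDP header is 8 bytes. */

    sum += src >> 16;
    sum += src & 0xffff;
    sum += dst >> 16;
    sum += dst & 0xffff;
    sum += proto;
    sum += l4_len;
    return csum_fold(sum);
}
```

For example, 192.168.0.1 to 192.168.0.199, protocol 17 (UDP), length 10
sums to 0x18233 before folding, which folds to 0x8234.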


[ovs-dev] [PATCH v2 4/4] Userspace: Add system test with UDP tunneling of UDP traffic.

2024-02-12 Thread Mike Pattrick

Previously a gap existed in the tunnel system tests where only ICMP and
TCP traffic was tested. However, the code paths used for UDP traffic are
different than either of those and should also be tested.

Some of the modified tests had previously checked for TCP with ncat but
didn't include an appropriate check for ncat support. That check was
added to these tests.

Signed-off-by: Mike Pattrick 
---
v2: Start the ncat listener before pings, so the socket has a better
chance of being ready to accept connections when the ncat client starts.
Signed-off-by: Mike Pattrick 
---
 tests/system-traffic.at | 78 -
 1 file changed, 70 insertions(+), 8 deletions(-)

diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index d36a1b5ea..9b12e374f 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -292,6 +292,7 @@ OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
 AT_SETUP([datapath - ping over vxlan tunnel])
+AT_SKIP_IF([test $HAVE_NC = no])
 OVS_CHECK_VXLAN()
 
 OVS_TRAFFIC_VSWITCHD_START()
@@ -318,6 +319,10 @@ NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -W 2 172.31.1.100 | FORMAT_PING], [
 3 packets transmitted, 3 received, 0% packet loss, time 0ms
 ])
 
+dnl Start ncat listeners.
+OVS_DAEMONIZE([nc -l 10.1.1.100 1234 > tcp_data], [nc.pid])
+NETNS_DAEMONIZE([at_ns0], [nc -l -u 10.1.1.1 4321 > udp_data], [nc2.pid])
+
 dnl Okay, now check the overlay with different packet sizes
 NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -W 2 10.1.1.100 | FORMAT_PING], [0], [dnl
 3 packets transmitted, 3 received, 0% packet loss, time 0ms
@@ -331,13 +336,23 @@ NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -W 2 10.1.1.100 | FORMAT_PI
 
 dnl Check large bidirectional TCP.
 AT_CHECK([dd if=/dev/urandom of=payload.bin bs=6 count=1 2> /dev/null])
-OVS_DAEMONIZE([nc -l 10.1.1.100 1234 > data], [nc.pid])
 NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT 10.1.1.100 1234 < payload.bin])
 
 dnl Wait until transfer completes before checking.
 OVS_WAIT_WHILE([kill -0 $(cat nc.pid)])
-AT_CHECK([diff -q payload.bin data], [0])
+AT_CHECK([diff -q payload.bin tcp_data], [0])
+
+dnl Check UDP
+AT_CHECK([dd if=/dev/urandom of=payload.bin bs=600 count=1 2> /dev/null])
+AT_CHECK([nc $NC_EOF_OPT -u 10.1.1.1 4321 < payload.bin])
 
+dnl The UDP listener will just listen forever if not terminated.
+OVS_WAIT_UNTIL([kill -0 $(cat nc2.pid)])
+AT_CHECK([kill $(cat nc2.pid)])
+
+dnl Wait until transfer completes before checking.
+OVS_WAIT_WHILE([kill -0 $(cat nc2.pid)])
+AT_CHECK([diff -q payload.bin udp_data], [0])
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
@@ -444,6 +459,7 @@ OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
 AT_SETUP([datapath - ping over vxlan6 tunnel])
+AT_SKIP_IF([test $HAVE_NC = no])
 OVS_CHECK_VXLAN_UDP6ZEROCSUM()
 
 OVS_TRAFFIC_VSWITCHD_START()
@@ -472,6 +488,10 @@ NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -W 2 fc00::100 | FORMAT_PING], [0]
 3 packets transmitted, 3 received, 0% packet loss, time 0ms
 ])
 
+dnl Start ncat listeners.
+OVS_DAEMONIZE([nc -l 10.1.1.100 1234 > tcp_data], [nc.pid])
+NETNS_DAEMONIZE([at_ns0], [nc -l -u 10.1.1.1 4321 > udp_data], [nc2.pid])
+
 dnl Okay, now check the overlay with different packet sizes
 NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -W 2 10.1.1.100 | FORMAT_PING], [0], [dnl
 3 packets transmitted, 3 received, 0% packet loss, time 0ms
@@ -485,12 +505,23 @@ NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -W 2 10.1.1.100 | FORMAT_PI
 
 dnl Check large bidirectional TCP.
 AT_CHECK([dd if=/dev/urandom of=payload.bin bs=6 count=1 2> /dev/null])
-OVS_DAEMONIZE([nc -l 10.1.1.100 1234 > data], [nc.pid])
 NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT 10.1.1.100 1234 < payload.bin])
 
 dnl Wait until transfer completes before checking.
 OVS_WAIT_WHILE([kill -0 $(cat nc.pid)])
-AT_CHECK([diff -q payload.bin data], [0])
+AT_CHECK([diff -q payload.bin tcp_data], [0])
+
+dnl Check UDP
+AT_CHECK([dd if=/dev/urandom of=payload.bin bs=600 count=1 2> /dev/null])
+AT_CHECK([nc $NC_EOF_OPT -u 10.1.1.1 4321 < payload.bin])
+
+dnl The UDP listener will just listen forever if not terminated.
+OVS_WAIT_UNTIL([kill -0 $(cat nc2.pid)])
+AT_CHECK([kill $(cat nc2.pid)])
+
+dnl Wait until transfer completes before checking.
+OVS_WAIT_WHILE([kill -0 $(cat nc2.pid)])
+AT_CHECK([diff -q payload.bin udp_data], [0])
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
@@ -727,6 +758,7 @@ OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
 AT_SETUP([datapath - ping over geneve tunnel])
+AT_SKIP_IF([test $HAVE_NC = no])
 OVS_CHECK_GENEVE()
 
 OVS_TRAFFIC_VSWITCHD_START()
@@ -753,6 +785,10 @@ NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -W 2 172.31.1.100 | FORMAT_PING], [
 3 packets transmitted, 3 received, 0% packet loss, time 0ms
 ])
 
+dnl Start ncat listeners.
+OVS_DAEMONIZE([nc -l 10.1.1.100 1234 > tcp_data], [nc.pid])
+NETNS_DAEMONIZE([at_ns0], [nc -l -u 10.1.1.1 4321 > udp_data], [nc2.pid])
+
 dnl Okay, now check the overlay with different p

[ovs-dev] [PATCH v2 3/4] netdev-linux: Only repair IP checksum in IPv4.

2024-02-12 Thread Mike Pattrick

Previously a change was added to the vnet prepend code to solve for the
case where no L4 checksum offloading was needed but the L3 checksum
hadn't been calculated. But the added check didn't properly account
for IPv6 traffic.

Fixes: 85bcbbed839a ("userspace: Enable tunnel tests with TSO.")
Signed-off-by: Mike Pattrick 
---
 lib/netdev-linux.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 7a156cc28..51517854b 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -7191,8 +7191,8 @@ netdev_linux_prepend_vnet_hdr(struct dp_packet *b, int mtu)
 /* The packet has good L4 checksum. No need to validate again. */
 vnet->csum_start = vnet->csum_offset = (OVS_FORCE __virtio16) 0;
 vnet->flags = VIRTIO_NET_HDR_F_DATA_VALID;
-if (!dp_packet_ip_checksum_good(b)) {
-/* It is possible that L4 is good but the IP checksum isn't
+if (dp_packet_hwol_tx_ip_csum(b) && !dp_packet_ip_checksum_good(b)) {
+/* It is possible that L4 is good but the IPv4 checksum isn't
  * complete. For example in the case of UDP encapsulation of an ARP
  * packet where the UDP checksum is 0. */
 dp_packet_ip_set_header_csum(b, false);
-- 
2.39.3
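
To illustrate why the repair has to be limited to IPv4: only the IPv4
header carries a checksum, while the IPv6 header has no such field, so
writing one would corrupt the source address bytes. The sketch below is an
assumption-laden illustration, not the OVS code. The helper names
ipv4_header_csum() and repair_ip_csum() are hypothetical, and the version
nibble is used as a stand-in for the dp_packet_hwol_tx_ip_csum() flag that
the patch actually checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 checksum over an IPv4 header of 'len' bytes ('len' is even).
 * The checksum field inside 'hdr' must be zero when this is called. */
static uint16_t
ipv4_header_csum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2) {
        sum += (uint32_t) hdr[i] << 8 | hdr[i + 1];
    }
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) ~sum;
}

/* Recompute the header checksum only when 'l3' is an IPv4 header.
 * Returns 1 if the header was rewritten, 0 if it was left untouched. */
static int
repair_ip_csum(uint8_t *l3, size_t len)
{
    assert(l3 && len >= 20);

    if ((l3[0] >> 4) != 4) {
        return 0;               /* Not IPv4: nothing to repair. */
    }
    l3[10] = l3[11] = 0;        /* Checksum field lives at bytes 10-11. */
    uint16_t csum = ipv4_header_csum(l3, len);
    l3[10] = csum >> 8;
    l3[11] = csum & 0xff;
    return 1;
}
```

With the commonly used example header 4500 0073 0000 4000 4011 .... c0a8
0001 c0a8 00c7, the recomputed checksum is 0xb861.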


[ovs-dev] [PATCH v2 1/4] Userspace: Software fallback for UDP encapsulated TCP segmentation.

2024-02-12 Thread Mike Pattrick

When sending packets that are flagged as requiring segmentation to an
interface that doesn't support this feature, send the packet to the TSO
software fallback instead of dropping it.

Signed-off-by: Mike Pattrick 

---
v2: Start the ncat listener before pings, so the socket has a better
chance of being ready to accept connections when the ncat client starts.

Signed-off-by: Mike Pattrick 
---
 lib/dp-packet-gso.c | 73 +
 lib/dp-packet.h | 26 +++
 lib/netdev-native-tnl.c |  8 +
 lib/netdev.c| 19 ---
 tests/system-traffic.at | 55 +++
 5 files changed, 149 insertions(+), 32 deletions(-)

diff --git a/lib/dp-packet-gso.c b/lib/dp-packet-gso.c
index 847685ad9..f25abf436 100644
--- a/lib/dp-packet-gso.c
+++ b/lib/dp-packet-gso.c
@@ -47,6 +47,8 @@ dp_packet_gso_seg_new(const struct dp_packet *p, size_t hdr_len,
 seg->l2_5_ofs = p->l2_5_ofs;
 seg->l3_ofs = p->l3_ofs;
 seg->l4_ofs = p->l4_ofs;
+seg->inner_l3_ofs = p->inner_l3_ofs;
+seg->inner_l4_ofs = p->inner_l4_ofs;
 
 /* The protocol headers remain the same, so preserve hash and mark. */
 *dp_packet_rss_ptr(seg) = *dp_packet_rss_ptr(p);
@@ -71,7 +73,12 @@ dp_packet_gso_nr_segs(struct dp_packet *p)
 const char *data_tail;
 const char *data_pos;
 
-data_pos = dp_packet_get_tcp_payload(p);
+if (dp_packet_hwol_is_tunnel_vxlan(p) ||
+dp_packet_hwol_is_tunnel_geneve(p)) {
+data_pos = dp_packet_get_inner_tcp_payload(p);
+} else {
+data_pos = dp_packet_get_tcp_payload(p);
+}
 data_tail = (char *) dp_packet_tail(p) - dp_packet_l2_pad_size(p);
 
 return DIV_ROUND_UP(data_tail - data_pos, segsz);
@@ -91,12 +98,15 @@ dp_packet_gso(struct dp_packet *p, struct dp_packet_batch **batches)
 struct tcp_header *tcp_hdr;
 struct ip_header *ip_hdr;
 struct dp_packet *seg;
+const char *data_pos;
 uint16_t tcp_offset;
 uint16_t tso_segsz;
+uint16_t ip_id = 0;
 uint32_t tcp_seq;
-uint16_t ip_id;
+bool outer_ipv4;
 int hdr_len;
 int seg_len;
+bool tnl;
 
 tso_segsz = dp_packet_get_tso_segsz(p);
 if (!tso_segsz) {
@@ -105,20 +115,35 @@ dp_packet_gso(struct dp_packet *p, struct dp_packet_batch **batches)
 return false;
 }
 
-tcp_hdr = dp_packet_l4(p);
-tcp_offset = TCP_OFFSET(tcp_hdr->tcp_ctl);
-tcp_seq = ntohl(get_16aligned_be32(&tcp_hdr->tcp_seq));
-hdr_len = ((char *) dp_packet_l4(p) - (char *) dp_packet_eth(p))
-  + tcp_offset * 4;
-ip_id = 0;
-if (dp_packet_hwol_is_ipv4(p)) {
+if (dp_packet_hwol_is_tunnel_vxlan(p) ||
+dp_packet_hwol_is_tunnel_geneve(p)) {
+data_pos =  dp_packet_get_inner_tcp_payload(p);
+outer_ipv4 = dp_packet_hwol_is_outer_ipv4(p);
+tcp_hdr = dp_packet_inner_l4(p);
+ip_hdr = dp_packet_inner_l3(p);
+tnl = true;
+if (outer_ipv4) {
+ip_id = ntohs(((struct ip_header *) dp_packet_l3(p))->ip_id);
+} else if (dp_packet_hwol_is_ipv4(p)) {
+ip_id = ntohs(ip_hdr->ip_id);
+}
+} else {
+data_pos = dp_packet_get_tcp_payload(p);
+outer_ipv4 = dp_packet_hwol_is_ipv4(p);
+tcp_hdr = dp_packet_l4(p);
 ip_hdr = dp_packet_l3(p);
-ip_id = ntohs(ip_hdr->ip_id);
+tnl = false;
+if (outer_ipv4) {
+ip_id = ntohs(ip_hdr->ip_id);
+}
 }
 
+tcp_offset = TCP_OFFSET(tcp_hdr->tcp_ctl);
+tcp_seq = ntohl(get_16aligned_be32(&tcp_hdr->tcp_seq));
+hdr_len = ((char *) tcp_hdr - (char *) dp_packet_eth(p))
+  + tcp_offset * 4;
 const char *data_tail = (char *) dp_packet_tail(p)
 - dp_packet_l2_pad_size(p);
-const char *data_pos = dp_packet_get_tcp_payload(p);
 int n_segs = dp_packet_gso_nr_segs(p);
 
 for (int i = 0; i < n_segs; i++) {
@@ -130,8 +155,26 @@ dp_packet_gso(struct dp_packet *p, struct dp_packet_batch **batches)
 seg = dp_packet_gso_seg_new(p, hdr_len, data_pos, seg_len);
 data_pos += seg_len;
 
+if (tnl) {
+/* Update tunnel L3 header. */
+if (dp_packet_hwol_is_ipv4(seg)) {
+ip_hdr = dp_packet_inner_l3(seg);
+ip_hdr->ip_tot_len = htons(sizeof *ip_hdr +
+   dp_packet_inner_l4_size(seg));
+ip_hdr->ip_id = htons(ip_id);
+ip_hdr->ip_csum = 0;
+ip_id++;
+} else {
+struct ovs_16aligned_ip6_hdr *ip6_hdr;
+
+ip6_hdr = dp_packet_inner_l3(seg);
+ip6_hdr->ip6_ctlun.ip6_un1.ip6_un1_plen
+= htons(dp_packet_inner_l3_size(seg) - sizeof *ip6_hdr);
+}
+}
+
 /* Update L3 header. */
-if (dp_packet_hwol_is_ipv4(seg)
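
The dp_packet_gso_nr_segs() hunk above changes where the TCP payload is
taken to start for tunnelled packets. A minimal sketch of why that matters
for the segment count follows. The function names div_round_up() and
gso_nr_segs() and the byte offsets in the usage note are hypothetical; the
rounding mirrors the DIV_ROUND_UP() call in the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Integer division rounding up, the same rounding DIV_ROUND_UP() does. */
static size_t
div_round_up(size_t x, size_t y)
{
    return (x + y - 1) / y;
}

/* Number of segments needed for a frame of 'frame_len' bytes whose TCP
 * payload starts 'payload_ofs' bytes into the frame, given a segment
 * size of 'segsz' payload bytes. */
static size_t
gso_nr_segs(size_t frame_len, size_t payload_ofs, size_t segsz)
{
    assert(segsz > 0 && payload_ofs <= frame_len);
    return div_round_up(frame_len - payload_ofs, segsz);
}
```

As an illustration, take a 3000 byte frame with a 1448 byte segment size.
If the payload really begins at byte 104 (after a 50 byte tunnel header and
a 54 byte inner header stack), gso_nr_segs(3000, 104, 1448) gives 2.
Measuring from byte 54 instead counts the tunnel header as payload and
gives 3, which is the kind of miscount the inner payload lookup avoids.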

Re: [ovs-dev] [PATCH 1/4] Userspace: Software fallback for UDP encapsulated TCP segmentation.

2024-02-12 Thread Mike Pattrick

On Mon, Feb 12, 2024 at 7:52 AM Ilya Maximets  wrote:
>
> On 2/12/24 09:13, Mike Pattrick wrote:
> > When sending packets that are flagged as requiring segmentation to an
> > interface that doesn't support this feature, send the packet to the TSO
> > software fallback instead of dropping it.
> >
> > Signed-off-by: Mike Pattrick 
> >
> > ---
> > Note: Previously this patch failed GitHub CI; however, I was not able to
> > reproduce that failure in my VM. I'm resubmitting to see if the failure
> > was a fluke.
>
> Doesn't look like a fluke.  Tests seem to fail consistently...

Yes. I'll continue to look into this. I've tried different
kernel/compiler versions but it consistently passes for me.

-M

>
> >
> > ---
> >  lib/dp-packet-gso.c | 73 +
> >  lib/dp-packet.h | 26 +++
> >  lib/netdev-native-tnl.c |  8 +
> >  lib/netdev.c| 19 ---
> >  tests/system-traffic.at | 48 +++
> >  5 files changed, 142 insertions(+), 32 deletions(-)
> >
> > diff --git a/lib/dp-packet-gso.c b/lib/dp-packet-gso.c
> > index 847685ad9..f25abf436 100644
> > --- a/lib/dp-packet-gso.c
> > +++ b/lib/dp-packet-gso.c
> > @@ -47,6 +47,8 @@ dp_packet_gso_seg_new(const struct dp_packet *p, size_t hdr_len,
> >  seg->l2_5_ofs = p->l2_5_ofs;
> >  seg->l3_ofs = p->l3_ofs;
> >  seg->l4_ofs = p->l4_ofs;
> > +seg->inner_l3_ofs = p->inner_l3_ofs;
> > +seg->inner_l4_ofs = p->inner_l4_ofs;
> >
> >  /* The protocol headers remain the same, so preserve hash and mark. */
> >  *dp_packet_rss_ptr(seg) = *dp_packet_rss_ptr(p);
> > @@ -71,7 +73,12 @@ dp_packet_gso_nr_segs(struct dp_packet *p)
> >  const char *data_tail;
> >  const char *data_pos;
> >
> > -data_pos = dp_packet_get_tcp_payload(p);
> > +if (dp_packet_hwol_is_tunnel_vxlan(p) ||
> > +dp_packet_hwol_is_tunnel_geneve(p)) {
> > +data_pos = dp_packet_get_inner_tcp_payload(p);
> > +} else {
> > +data_pos = dp_packet_get_tcp_payload(p);
> > +}
> >  data_tail = (char *) dp_packet_tail(p) - dp_packet_l2_pad_size(p);
> >
> >  return DIV_ROUND_UP(data_tail - data_pos, segsz);
> > @@ -91,12 +98,15 @@ dp_packet_gso(struct dp_packet *p, struct dp_packet_batch **batches)
> >  struct tcp_header *tcp_hdr;
> >  struct ip_header *ip_hdr;
> >  struct dp_packet *seg;
> > +const char *data_pos;
> >  uint16_t tcp_offset;
> >  uint16_t tso_segsz;
> > +uint16_t ip_id = 0;
> >  uint32_t tcp_seq;
> > -uint16_t ip_id;
> > +bool outer_ipv4;
> >  int hdr_len;
> >  int seg_len;
> > +bool tnl;
> >
> >  tso_segsz = dp_packet_get_tso_segsz(p);
> >  if (!tso_segsz) {
> > @@ -105,20 +115,35 @@ dp_packet_gso(struct dp_packet *p, struct dp_packet_batch **batches)
> >  return false;
> >  }
> >
> > -tcp_hdr = dp_packet_l4(p);
> > -tcp_offset = TCP_OFFSET(tcp_hdr->tcp_ctl);
> > -tcp_seq = ntohl(get_16aligned_be32(&tcp_hdr->tcp_seq));
> > -hdr_len = ((char *) dp_packet_l4(p) - (char *) dp_packet_eth(p))
> > -  + tcp_offset * 4;
> > -ip_id = 0;
> > -if (dp_packet_hwol_is_ipv4(p)) {
> > +if (dp_packet_hwol_is_tunnel_vxlan(p) ||
> > +dp_packet_hwol_is_tunnel_geneve(p)) {
> > +data_pos =  dp_packet_get_inner_tcp_payload(p);
> > +outer_ipv4 = dp_packet_hwol_is_outer_ipv4(p);
> > +tcp_hdr = dp_packet_inner_l4(p);
> > +ip_hdr = dp_packet_inner_l3(p);
> > +tnl = true;
> > +if (outer_ipv4) {
> > +ip_id = ntohs(((struct ip_header *) dp_packet_l3(p))->ip_id);
> > +} else if (dp_packet_hwol_is_ipv4(p)) {
> > +ip_id = ntohs(ip_hdr->ip_id);
> > +}
> > +} else {
> > +data_pos = dp_packet_get_tcp_payload(p);
> > +outer_ipv4 = dp_packet_hwol_is_ipv4(p);
> > +tcp_hdr = dp_packet_l4(p);
> >  ip_hdr = dp_packet_l3(p);
> > -ip_id = ntohs(ip_hdr->ip_id);
> > +tnl = false;
> > +if (outer_ipv4) {
> > +ip_id = ntohs(ip_hdr->ip_id);
> > +}
> >  }
> >
> > +tcp_offset = TCP_OFFSET(tcp_hdr->tcp_ctl);
> > +tcp_seq = ntohl(get_16aligned_be32(&tcp_hdr->tcp_seq));
> > +hdr_len =

[ovs-dev] [PATCH 4/4] Userspace: Add system test with udp tunneling of udp traffic

2024-02-12 Thread Mike Pattrick

Previously a gap existed in the tunnel system tests where only ICMP and
TCP traffic was tested. However, the code paths used for UDP traffic are
different than either of those and should also be tested.

Signed-off-by: Mike Pattrick 
---
 tests/system-traffic.at | 50 +
 1 file changed, 50 insertions(+)

diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 07af87143..d4c4cbe84 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -338,6 +338,18 @@ dnl Wait until transfer completes before checking.
 OVS_WAIT_WHILE([kill -0 $(cat nc.pid)])
 AT_CHECK([diff -q payload.bin data], [0])
 
+dnl Check UDP
+AT_CHECK([dd if=/dev/urandom of=payload.bin bs=600 count=1 2> /dev/null])
+NETNS_DAEMONIZE([at_ns0], [nc -l -u 10.1.1.1 4321 > data], [nc.pid])
+AT_CHECK([nc $NC_EOF_OPT -u 10.1.1.1 4321 < payload.bin])
+
+dnl The UDP listener will just listen forever if not terminated.
+OVS_WAIT_UNTIL([kill -0 $(cat nc.pid)])
+AT_CHECK([kill $(cat nc.pid)])
+
+dnl Wait until transfer completes before checking.
+OVS_WAIT_WHILE([kill -0 $(cat nc.pid)])
+AT_CHECK([diff -q payload.bin data], [0])
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
@@ -481,6 +493,19 @@ AT_CHECK([dd if=/dev/urandom of=payload.bin bs=6 count=1 2> /dev/null])
 OVS_DAEMONIZE([nc -l 10.1.1.100 1234 > data], [nc.pid])
 NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT 10.1.1.100 1234 < payload.bin])
 
+dnl Wait until transfer completes before checking.
+OVS_WAIT_WHILE([kill -0 $(cat nc.pid)])
+AT_CHECK([diff -q payload.bin data], [0])
+
+dnl Check UDP
+AT_CHECK([dd if=/dev/urandom of=payload.bin bs=600 count=1 2> /dev/null])
+NETNS_DAEMONIZE([at_ns0], [nc -l -u 10.1.1.1 4321 > data], [nc.pid])
+AT_CHECK([nc $NC_EOF_OPT -u 10.1.1.1 4321 < payload.bin])
+
+dnl The UDP listener will just listen forever if not terminated.
+OVS_WAIT_UNTIL([kill -0 $(cat nc.pid)])
+AT_CHECK([kill $(cat nc.pid)])
+
 dnl Wait until transfer completes before checking.
 OVS_WAIT_WHILE([kill -0 $(cat nc.pid)])
 AT_CHECK([diff -q payload.bin data], [0])
@@ -766,6 +791,18 @@ dnl Wait until transfer completes before checking.
 OVS_WAIT_WHILE([kill -0 $(cat nc.pid)])
 AT_CHECK([diff -q payload.bin data], [0])
 
+dnl Check UDP
+AT_CHECK([dd if=/dev/urandom of=payload.bin bs=600 count=1 2> /dev/null])
+NETNS_DAEMONIZE([at_ns0], [nc -l -u 10.1.1.1 4321 > data], [nc.pid])
+AT_CHECK([nc $NC_EOF_OPT -u 10.1.1.1 4321 < payload.bin])
+
+dnl The UDP listener will just listen forever if not terminated.
+OVS_WAIT_UNTIL([kill -0 $(cat nc.pid)])
+AT_CHECK([kill $(cat nc.pid)])
+
+dnl Wait until transfer completes before checking.
+OVS_WAIT_WHILE([kill -0 $(cat nc.pid)])
+AT_CHECK([diff -q payload.bin data], [0])
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
@@ -919,6 +956,19 @@ AT_CHECK([dd if=/dev/urandom of=payload.bin bs=6 count=1 2> /dev/null])
 OVS_DAEMONIZE([nc -l 10.1.1.100 1234 > data], [nc.pid])
 NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT 10.1.1.100 1234 < payload.bin])
 
+dnl Wait until transfer completes before checking.
+OVS_WAIT_WHILE([kill -0 $(cat nc.pid)])
+AT_CHECK([diff -q payload.bin data], [0])
+
+dnl Check UDP
+AT_CHECK([dd if=/dev/urandom of=payload.bin bs=600 count=1 2> /dev/null])
+NETNS_DAEMONIZE([at_ns0], [nc -l -u 10.1.1.1 4321 > data], [nc.pid])
+AT_CHECK([nc $NC_EOF_OPT -u 10.1.1.1 4321 < payload.bin])
+
+dnl The UDP listener will just listen forever if not terminated.
+OVS_WAIT_UNTIL([kill -0 $(cat nc.pid)])
+AT_CHECK([kill $(cat nc.pid)])
+
 dnl Wait until transfer completes before checking.
 OVS_WAIT_WHILE([kill -0 $(cat nc.pid)])
 AT_CHECK([diff -q payload.bin data], [0])
-- 
2.39.3


[ovs-dev] [PATCH 3/4] netdev-linux: Only repair IP checksum in IPv4.

2024-02-12 Thread Mike Pattrick

Previously a change was added to the vnet prepend code to solve for the
case where no L4 checksum offloading was needed but the L3 checksum
hadn't been calculated. But the added check didn't properly account
for IPv6 traffic.

Fixes: 85bcbbed839a ("userspace: Enable tunnel tests with TSO.")
Signed-off-by: Mike Pattrick 
---
 lib/netdev-linux.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 7a156cc28..51517854b 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -7191,8 +7191,8 @@ netdev_linux_prepend_vnet_hdr(struct dp_packet *b, int mtu)
 /* The packet has good L4 checksum. No need to validate again. */
 vnet->csum_start = vnet->csum_offset = (OVS_FORCE __virtio16) 0;
 vnet->flags = VIRTIO_NET_HDR_F_DATA_VALID;
-if (!dp_packet_ip_checksum_good(b)) {
-/* It is possible that L4 is good but the IP checksum isn't
+if (dp_packet_hwol_tx_ip_csum(b) && !dp_packet_ip_checksum_good(b)) {
+/* It is possible that L4 is good but the IPv4 checksum isn't
  * complete. For example in the case of UDP encapsulation of an ARP
  * packet where the UDP checksum is 0. */
 dp_packet_ip_set_header_csum(b, false);
-- 
2.39.3


[ovs-dev] [PATCH 2/4] netdev-linux: Favour inner packet for multi-encapsulated tso

2024-02-12 Thread Mike Pattrick

Previously if an OVS configuration nested multiple layers of UDP tunnels
like VXLAN or GENEVE on top of each other through netdev-linux
interfaces, the vnet header would be incorrectly set to the outermost
UDP tunnel layer instead of the intermediary tunnel layer.

This resulted in the middle UDP tunnel not checksum offloading properly.

Fixes: 3337e6d91c5b ("userspace: Enable L4 checksum offloading by default.")
Reported-by: David Marchand 
Signed-off-by: Mike Pattrick 
---
 lib/netdev-linux.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 1b2e5b6c2..7a156cc28 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -7239,14 +7239,23 @@ netdev_linux_prepend_vnet_hdr(struct dp_packet *b, int mtu)
 vnet->csum_offset = (OVS_FORCE __virtio16) __builtin_offsetof(
 struct tcp_header, tcp_csum);
 } else if (dp_packet_hwol_l4_is_udp(b)) {
-struct udp_header *udp_hdr = dp_packet_l4(b);
+/* Favour the inner packet when indicating checksum offsets. */
+void *l3_off = dp_packet_inner_l3(b);
+void *l4_off = dp_packet_inner_l4(b);
+
+if (!l3_off || !l4_off) {
+l3_off = dp_packet_l3(b);
+l4_off = dp_packet_l4(b);
+}
+struct udp_header *udp_hdr = l4_off;
+
 ovs_be16 csum = 0;
 
 if (dp_packet_hwol_is_ipv4(b)) {
-const struct ip_header *ip_hdr = dp_packet_l3(b);
+const struct ip_header *ip_hdr = l3_off;
 csum = ~csum_finish(packet_csum_pseudoheader(ip_hdr));
 } else if (dp_packet_hwol_tx_ipv6(b)) {
-const struct ovs_16aligned_ip6_hdr *ip6_hdr = dp_packet_l3(b);
+const struct ovs_16aligned_ip6_hdr *ip6_hdr = l3_off;
 csum = ~csum_finish(packet_csum_pseudoheader6(ip6_hdr));
 }
 
-- 
2.39.3


[ovs-dev] [PATCH 1/4] Userspace: Software fallback for UDP encapsulated TCP segmentation.

2024-02-12 Thread Mike Pattrick

When sending packets that are flagged as requiring segmentation to an
interface that doesn't support this feature, send the packet to the TSO
software fallback instead of dropping it.
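As a rough illustration of what the software fallback has to compute, the
sketch below splits a payload into segments of at most the segment size,
mirroring the DIV_ROUND_UP() calculation in dp_packet_gso_nr_segs(). The
helper names gso_nr_segs_sketch() and gso_seg_len_sketch() are hypothetical
and work on plain lengths rather than on a struct dp_packet.

```c
#include <assert.h>
#include <stddef.h>

/* Number of segments needed to carry 'payload_len' bytes when each segment
 * holds at most 'segsz' bytes (rounded up). */
static size_t
gso_nr_segs_sketch(size_t payload_len, size_t segsz)
{
    assert(segsz > 0);
    return (payload_len + segsz - 1) / segsz;
}

/* Payload length of segment 'i' (0-based).  Every segment is 'segsz' bytes
 * long except possibly the last one. */
static size_t
gso_seg_len_sketch(size_t payload_len, size_t segsz, size_t i)
{
    size_t start = i * segsz;
    size_t left;

    assert(segsz > 0 && start < payload_len);
    left = payload_len - start;
    return left < segsz ? left : segsz;
}
```

For a tunneled packet the payload length would be measured from the inner
TCP payload, which is why the patch switches to
dp_packet_get_inner_tcp_payload() for VXLAN and GENEVE.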

Signed-off-by: Mike Pattrick 

---
Note: Previously this patch failed GitLab CI; however, I was not able to
reproduce that failure in my VM. I'm resubmitting to see if the failure
was a fluke.

---
 lib/dp-packet-gso.c | 73 +
 lib/dp-packet.h | 26 +++
 lib/netdev-native-tnl.c |  8 +
 lib/netdev.c| 19 ---
 tests/system-traffic.at | 48 +++
 5 files changed, 142 insertions(+), 32 deletions(-)

diff --git a/lib/dp-packet-gso.c b/lib/dp-packet-gso.c
index 847685ad9..f25abf436 100644
--- a/lib/dp-packet-gso.c
+++ b/lib/dp-packet-gso.c
@@ -47,6 +47,8 @@ dp_packet_gso_seg_new(const struct dp_packet *p, size_t 
hdr_len,
 seg->l2_5_ofs = p->l2_5_ofs;
 seg->l3_ofs = p->l3_ofs;
 seg->l4_ofs = p->l4_ofs;
+seg->inner_l3_ofs = p->inner_l3_ofs;
+seg->inner_l4_ofs = p->inner_l4_ofs;
 
 /* The protocol headers remain the same, so preserve hash and mark. */
 *dp_packet_rss_ptr(seg) = *dp_packet_rss_ptr(p);
@@ -71,7 +73,12 @@ dp_packet_gso_nr_segs(struct dp_packet *p)
 const char *data_tail;
 const char *data_pos;
 
-data_pos = dp_packet_get_tcp_payload(p);
+if (dp_packet_hwol_is_tunnel_vxlan(p) ||
+dp_packet_hwol_is_tunnel_geneve(p)) {
+data_pos = dp_packet_get_inner_tcp_payload(p);
+} else {
+data_pos = dp_packet_get_tcp_payload(p);
+}
 data_tail = (char *) dp_packet_tail(p) - dp_packet_l2_pad_size(p);
 
 return DIV_ROUND_UP(data_tail - data_pos, segsz);
@@ -91,12 +98,15 @@ dp_packet_gso(struct dp_packet *p, struct dp_packet_batch 
**batches)
 struct tcp_header *tcp_hdr;
 struct ip_header *ip_hdr;
 struct dp_packet *seg;
+const char *data_pos;
 uint16_t tcp_offset;
 uint16_t tso_segsz;
+uint16_t ip_id = 0;
 uint32_t tcp_seq;
-uint16_t ip_id;
+bool outer_ipv4;
 int hdr_len;
 int seg_len;
+bool tnl;
 
 tso_segsz = dp_packet_get_tso_segsz(p);
 if (!tso_segsz) {
@@ -105,20 +115,35 @@ dp_packet_gso(struct dp_packet *p, struct dp_packet_batch 
**batches)
 return false;
 }
 
-tcp_hdr = dp_packet_l4(p);
-tcp_offset = TCP_OFFSET(tcp_hdr->tcp_ctl);
-tcp_seq = ntohl(get_16aligned_be32(&tcp_hdr->tcp_seq));
-hdr_len = ((char *) dp_packet_l4(p) - (char *) dp_packet_eth(p))
-  + tcp_offset * 4;
-ip_id = 0;
-if (dp_packet_hwol_is_ipv4(p)) {
+if (dp_packet_hwol_is_tunnel_vxlan(p) ||
+dp_packet_hwol_is_tunnel_geneve(p)) {
+data_pos =  dp_packet_get_inner_tcp_payload(p);
+outer_ipv4 = dp_packet_hwol_is_outer_ipv4(p);
+tcp_hdr = dp_packet_inner_l4(p);
+ip_hdr = dp_packet_inner_l3(p);
+tnl = true;
+if (outer_ipv4) {
+ip_id = ntohs(((struct ip_header *) dp_packet_l3(p))->ip_id);
+} else if (dp_packet_hwol_is_ipv4(p)) {
+ip_id = ntohs(ip_hdr->ip_id);
+}
+} else {
+data_pos = dp_packet_get_tcp_payload(p);
+outer_ipv4 = dp_packet_hwol_is_ipv4(p);
+tcp_hdr = dp_packet_l4(p);
 ip_hdr = dp_packet_l3(p);
-ip_id = ntohs(ip_hdr->ip_id);
+tnl = false;
+if (outer_ipv4) {
+ip_id = ntohs(ip_hdr->ip_id);
+}
 }
 
+tcp_offset = TCP_OFFSET(tcp_hdr->tcp_ctl);
+tcp_seq = ntohl(get_16aligned_be32(&tcp_hdr->tcp_seq));
+hdr_len = ((char *) tcp_hdr - (char *) dp_packet_eth(p))
+  + tcp_offset * 4;
 const char *data_tail = (char *) dp_packet_tail(p)
 - dp_packet_l2_pad_size(p);
-const char *data_pos = dp_packet_get_tcp_payload(p);
 int n_segs = dp_packet_gso_nr_segs(p);
 
 for (int i = 0; i < n_segs; i++) {
@@ -130,8 +155,26 @@ dp_packet_gso(struct dp_packet *p, struct dp_packet_batch 
**batches)
 seg = dp_packet_gso_seg_new(p, hdr_len, data_pos, seg_len);
 data_pos += seg_len;
 
+if (tnl) {
+/* Update tunnel L3 header. */
+if (dp_packet_hwol_is_ipv4(seg)) {
+ip_hdr = dp_packet_inner_l3(seg);
+ip_hdr->ip_tot_len = htons(sizeof *ip_hdr +
+   dp_packet_inner_l4_size(seg));
+ip_hdr->ip_id = htons(ip_id);
+ip_hdr->ip_csum = 0;
+ip_id++;
+} else {
+struct ovs_16aligned_ip6_hdr *ip6_hdr;
+
+ip6_hdr = dp_packet_inner_l3(seg);
+ip6_hdr->ip6_ctlun.ip6_un1.ip6_un1_plen
+= htons(dp_packet_inner_l3_size(seg) - sizeof *ip6_hdr);
+}
+}
+
 /* Update L3 header. */
-if (dp_packet_hwol_is_ipv4(seg)

[ovs-dev] [PATCH v4 4/4] ofproto-dpif-monitor: Remove unneeded calls to clear packets.

2024-02-11 Thread Mike Pattrick

Currently the monitor will call dp_packet_clear() on the dp_packet that
is shared amongst BFD, LLDP, and CFM. However, all of these packets are
created with eth_compose(), which already calls dp_packet_clear().

Reviewed-by: David Marchand 
Signed-off-by: Mike Pattrick 
---
 ofproto/ofproto-dpif-monitor.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/ofproto/ofproto-dpif-monitor.c b/ofproto/ofproto-dpif-monitor.c
index bb0e49091..5132f9c95 100644
--- a/ofproto/ofproto-dpif-monitor.c
+++ b/ofproto/ofproto-dpif-monitor.c
@@ -275,19 +275,16 @@ monitor_mport_run(struct mport *mport, struct dp_packet 
*packet)
 long long int lldp_wake_time = LLONG_MAX;
 
 if (mport->cfm && cfm_should_send_ccm(mport->cfm)) {
-dp_packet_clear(packet);
 cfm_compose_ccm(mport->cfm, packet, mport->hw_addr);
 ofproto_dpif_send_packet(mport->ofport, false, packet);
 }
 if (mport->bfd && bfd_should_send_packet(mport->bfd)) {
 bool oam;
 
-dp_packet_clear(packet);
 bfd_put_packet(mport->bfd, packet, mport->hw_addr, &oam);
 ofproto_dpif_send_packet(mport->ofport, oam, packet);
 }
 if (mport->lldp && lldp_should_send_packet(mport->lldp)) {
-dp_packet_clear(packet);
 lldp_put_packet(mport->lldp, packet, mport->hw_addr);
 ofproto_dpif_send_packet(mport->ofport, false, packet);
 }
-- 
2.39.3


[ovs-dev] [PATCH v4 3/4] dp-packet: Include inner offsets in adjustments and checks.

2024-02-11 Thread Mike Pattrick

Include inner offsets in functions where l3 and l4 offsets are either
modified or checked.
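The kind of adjustment involved can be sketched as follows. This is a
simplified model under stated assumptions: struct offsets_sketch and
shift_after_l2_5() are hypothetical, and UINT16_MAX is taken to mean "offset
not set", as the dp_packet_inner_l4_size() check in this series suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the layer offsets kept in struct dp_packet. */
struct offsets_sketch {
    uint16_t l3_ofs;
    uint16_t l4_ofs;
    uint16_t inner_l3_ofs;
    uint16_t inner_l4_ofs;
};

/* Shift one offset by 'increment', leaving unset offsets untouched. */
static void
adjust_ofs_sketch(uint16_t *ofs, int increment)
{
    if (*ofs != UINT16_MAX) {
        *ofs = (uint16_t) (*ofs + increment);
    }
}

/* After growing or shrinking the L2.5 area, every header behind it moves,
 * including the inner headers of a tunneled packet. */
static void
shift_after_l2_5(struct offsets_sketch *o, int increment)
{
    assert(o != NULL);
    adjust_ofs_sketch(&o->l3_ofs, increment);
    adjust_ofs_sketch(&o->l4_ofs, increment);
    adjust_ofs_sketch(&o->inner_l3_ofs, increment);
    adjust_ofs_sketch(&o->inner_l4_ofs, increment);
}
```

If the inner offsets were skipped, a tunneled packet would keep stale inner
offsets after a resize, which is the situation this patch addresses.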

Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.")
Signed-off-by: Mike Pattrick 
---
v2:
 - Prints out new offsets in autovalidator
 - Extends resize_l2 change to avx512
v3:
 - Reordered fields in dp_packet_compare_offsets error print message
 - Updated and simplified comments in avx512_dp_packet_resize_l2()
v4:
 - Removed comment about three asserts
---
 lib/dp-packet.c  | 18 +-
 lib/odp-execute-avx512.c | 31 ---
 2 files changed, 33 insertions(+), 16 deletions(-)

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 0e23c766e..305822293 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -507,6 +507,8 @@ dp_packet_resize_l2_5(struct dp_packet *b, int increment)
 /* Adjust layer offsets after l2_5. */
 dp_packet_adjust_layer_offset(&b->l3_ofs, increment);
 dp_packet_adjust_layer_offset(&b->l4_ofs, increment);
+dp_packet_adjust_layer_offset(&b->inner_l3_ofs, increment);
+dp_packet_adjust_layer_offset(&b->inner_l4_ofs, increment);
 
 return dp_packet_data(b);
 }
@@ -529,17 +531,23 @@ dp_packet_compare_offsets(struct dp_packet *b1, struct 
dp_packet *b2,
 if ((b1->l2_pad_size != b2->l2_pad_size) ||
 (b1->l2_5_ofs != b2->l2_5_ofs) ||
 (b1->l3_ofs != b2->l3_ofs) ||
-(b1->l4_ofs != b2->l4_ofs)) {
+(b1->l4_ofs != b2->l4_ofs) ||
+(b1->inner_l3_ofs != b2->inner_l3_ofs) ||
+(b1->inner_l4_ofs != b2->inner_l4_ofs)) {
 if (err_str) {
 ds_put_format(err_str, "Packet offset comparison failed\n");
 ds_put_format(err_str, "Buffer 1 offsets: l2_pad_size %u,"
-  " l2_5_ofs : %u l3_ofs %u, l4_ofs %u\n",
+  " l2_5_ofs : %u l3_ofs %u, l4_ofs %u,"
+  " inner_l3_ofs %u, inner_l4_ofs %u\n",
   b1->l2_pad_size, b1->l2_5_ofs,
-  b1->l3_ofs, b1->l4_ofs);
+  b1->l3_ofs, b1->l4_ofs,
+  b1->inner_l3_ofs, b1->inner_l4_ofs);
 ds_put_format(err_str, "Buffer 2 offsets: l2_pad_size %u,"
-  " l2_5_ofs : %u l3_ofs %u, l4_ofs %u\n",
+  " l2_5_ofs : %u l3_ofs %u, l4_ofs %u,"
+  " inner_l3_ofs %u, inner_l4_ofs %u\n",
   b2->l2_pad_size, b2->l2_5_ofs,
-  b2->l3_ofs, b2->l4_ofs);
+  b2->l3_ofs, b2->l4_ofs,
+  b2->inner_l3_ofs, b2->inner_l4_ofs);
 }
 return false;
 }
diff --git a/lib/odp-execute-avx512.c b/lib/odp-execute-avx512.c
index 747e04014..103ff2d0d 100644
--- a/lib/odp-execute-avx512.c
+++ b/lib/odp-execute-avx512.c
@@ -35,10 +35,10 @@
 
 VLOG_DEFINE_THIS_MODULE(odp_execute_avx512);
 
-/* The below three build asserts make sure that l2_5_ofs, l3_ofs, and l4_ofs
- * fields remain in the same order and offset to l2_padd_size. This is needed
- * as the avx512_dp_packet_resize_l2() function will manipulate those fields at
- * a fixed memory index based on the l2_padd_size offset. */
+/* The below build asserts make sure that the below fields remain in the same
+ * order and offset to l2_pad_size. This is needed as the
+ * avx512_dp_packet_resize_l2() function will manipulate those fields at a
+ * fixed memory index based on the l2_pad_size offset. */
 BUILD_ASSERT_DECL(offsetof(struct dp_packet, l2_pad_size) +
   MEMBER_SIZEOF(struct dp_packet, l2_pad_size) ==
   offsetof(struct dp_packet, l2_5_ofs));
@@ -51,6 +51,14 @@ BUILD_ASSERT_DECL(offsetof(struct dp_packet, l3_ofs) +
MEMBER_SIZEOF(struct dp_packet, l3_ofs) ==
offsetof(struct dp_packet, l4_ofs));
 
+BUILD_ASSERT_DECL(offsetof(struct dp_packet, l4_ofs) +
+   MEMBER_SIZEOF(struct dp_packet, l4_ofs) ==
+   offsetof(struct dp_packet, inner_l3_ofs));
+
+BUILD_ASSERT_DECL(offsetof(struct dp_packet, inner_l3_ofs) +
+   MEMBER_SIZEOF(struct dp_packet, inner_l3_ofs) ==
+   offsetof(struct dp_packet, inner_l4_ofs));
+
 /* The below build assert makes sure it's safe to read/write 128-bits starting
  * at the l2_pad_size location. */
 BUILD_ASSERT_DECL(sizeof(struct dp_packet) -
@@ -112,7 +120,7 @@ avx512_dp_packet_resize_l2(struct dp_packet *b, int 
resize_by_bytes)
 dp_packet_pull(b, -resize_by_bytes);
 }
 
-/* The next step is to update the l2_5_ofs, l3_ofs and l4_ofs fields which
+/* The next step is to update the l2_5_ofs to inner_l4_ofs fields which
  * the scalar implement

[ovs-dev] [PATCH v4 2/4] bfd: Set proper offsets and flags in BFD packets.

2024-02-11 Thread Mike Pattrick

Previously the BFD packet creation code did not appropriately set
offsets or flags. This contributed to issues involving encapsulation and
the TSO code.

The transition to using standard functions also means some other
metadata like packet_type are set appropriately.

Fixes: ccc096898c46 ("bfd: Implement Bidirectional Forwarding Detection.")
Signed-off-by: Mike Pattrick 
---
v2: Corrected formatting, and just calculate checksum up front
v3: Extended patch comment
---
 lib/bfd.c | 23 +++
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/lib/bfd.c b/lib/bfd.c
index 9698576d0..9af258917 100644
--- a/lib/bfd.c
+++ b/lib/bfd.c
@@ -586,7 +586,6 @@ bfd_put_packet(struct bfd *bfd, struct dp_packet *p,
 {
 long long int min_tx, min_rx;
 struct udp_header *udp;
-struct eth_header *eth;
 struct ip_header *ip;
 struct msg *msg;
 
@@ -605,15 +604,13 @@ bfd_put_packet(struct bfd *bfd, struct dp_packet *p,
  * set. */
 ovs_assert(!(bfd->flags & FLAG_POLL) || !(bfd->flags & FLAG_FINAL));
 
-dp_packet_reserve(p, 2); /* Properly align after the ethernet header. */
-eth = dp_packet_put_uninit(p, sizeof *eth);
-eth->eth_src = eth_addr_is_zero(bfd->local_eth_src)
-? eth_src : bfd->local_eth_src;
-eth->eth_dst = eth_addr_is_zero(bfd->local_eth_dst)
-? eth_addr_bfd : bfd->local_eth_dst;
-eth->eth_type = htons(ETH_TYPE_IP);
+ip = eth_compose(p,
+ eth_addr_is_zero(bfd->local_eth_dst)
+ ? eth_addr_bfd : bfd->local_eth_dst,
+ eth_addr_is_zero(bfd->local_eth_src)
+ ? eth_src : bfd->local_eth_src,
+ ETH_TYPE_IP, sizeof *ip + sizeof *udp + sizeof *msg);
 
-ip = dp_packet_put_zeros(p, sizeof *ip);
 ip->ip_ihl_ver = IP_IHL_VER(5, 4);
 ip->ip_tot_len = htons(sizeof *ip + sizeof *udp + sizeof *msg);
 ip->ip_ttl = MAXTTL;
@@ -621,15 +618,17 @@ bfd_put_packet(struct bfd *bfd, struct dp_packet *p,
 ip->ip_proto = IPPROTO_UDP;
 put_16aligned_be32(&ip->ip_src, bfd->ip_src);
 put_16aligned_be32(&ip->ip_dst, bfd->ip_dst);
-/* Checksum has already been zeroed by put_zeros call. */
+/* Checksum has already been zeroed by eth_compose call. */
 ip->ip_csum = csum(ip, sizeof *ip);
+dp_packet_set_l4(p, ip + 1);
 
-udp = dp_packet_put_zeros(p, sizeof *udp);
+udp = dp_packet_l4(p);
 udp->udp_src = htons(bfd->udp_src);
 udp->udp_dst = htons(BFD_DEST_PORT);
 udp->udp_len = htons(sizeof *udp + sizeof *msg);
+/* Checksum already zero from eth_compose. */
 
-msg = dp_packet_put_uninit(p, sizeof *msg);
+msg = (struct msg *)(udp + 1);
 msg->vers_diag = (BFD_VERSION << 5) | bfd->diag;
 msg->flags = (bfd->state & STATE_MASK) | bfd->flags;
 
-- 
2.39.3


[ovs-dev] [PATCH v4 1/4] dp-packet: Validate correct offset for L4 inner size.

2024-02-11 Thread Mike Pattrick

This patch fixes the correctness of dp_packet_inner_l4_size() when
checking for the existence of an inner L4 header. Previously it checked
for the outer L4 header.

This function is currently only used when a packet is already flagged
for tunneling, so an incorrect determination isn't possible as long as
the flags of the packet are correct.
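A small model of the check may make the difference concrete. The names
below (struct size_sketch, inner_l4_size_sketch()) are hypothetical, and
returning 0 when the inner offset is unset is an assumption, since the
quoted hunk does not show that branch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the fields the size helper uses. */
struct size_sketch {
    size_t len;             /* Total bytes in the packet. */
    uint16_t l2_pad_size;   /* Trailing L2 padding. */
    uint16_t l4_ofs;        /* Outer L4 offset, UINT16_MAX if unset. */
    uint16_t inner_l4_ofs;  /* Inner L4 offset, UINT16_MAX if unset. */
};

/* Size of the inner L4 header plus payload.  The existence check looks at
 * the inner offset, not the outer one. */
static size_t
inner_l4_size_sketch(const struct size_sketch *p)
{
    assert(p != NULL);
    if (p->inner_l4_ofs == UINT16_MAX) {
        return 0;   /* Assumed result when there is no inner L4 header. */
    }
    return p->len - p->inner_l4_ofs - p->l2_pad_size;
}
```

With the old check, a packet that has an outer L4 header but no inner one
would pass the test and compute a size from an unset inner offset.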

Fixes: 85bcbbed839a ("userspace: Enable tunnel tests with TSO.")
Reviewed-by: David Marchand 
Signed-off-by: Mike Pattrick 
---
v2: Corrected patch subject
---
 lib/dp-packet.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index dceb701e8..802d3f385 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -540,7 +540,7 @@ dp_packet_inner_l4(const struct dp_packet *b)
 static inline size_t
 dp_packet_inner_l4_size(const struct dp_packet *b)
 {
-return OVS_LIKELY(b->l4_ofs != UINT16_MAX)
+return OVS_LIKELY(b->inner_l4_ofs != UINT16_MAX)
? (const char *) dp_packet_tail(b)
- (const char *) dp_packet_inner_l4(b)
- dp_packet_l2_pad_size(b)
-- 
2.39.3


[ovs-dev] [PATCH v3 1/4] dp-packet: Validate correct offset for L4 inner size.

2024-02-06 Thread Mike Pattrick

This patch fixes the correctness of dp_packet_inner_l4_size() when
checking for the existence of an inner L4 header. Previously it checked
for the outer L4 header.

This function is currently only used when a packet is already flagged
for tunneling, so an incorrect determination isn't possible as long as
the flags of the packet are correct.

Fixes: 85bcbbed839a ("userspace: Enable tunnel tests with TSO.")
Reviewed-by: David Marchand 
Signed-off-by: Mike Pattrick 
---
v2: Corrected patch subject
---
 lib/dp-packet.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index dceb701e8..802d3f385 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -540,7 +540,7 @@ dp_packet_inner_l4(const struct dp_packet *b)
 static inline size_t
 dp_packet_inner_l4_size(const struct dp_packet *b)
 {
-return OVS_LIKELY(b->l4_ofs != UINT16_MAX)
+return OVS_LIKELY(b->inner_l4_ofs != UINT16_MAX)
? (const char *) dp_packet_tail(b)
- (const char *) dp_packet_inner_l4(b)
- dp_packet_l2_pad_size(b)
-- 
2.39.3


[ovs-dev] [PATCH v3 2/4] bfd: Set proper offsets and flags in BFD packets.

2024-02-06 Thread Mike Pattrick

Previously the BFD packet creation code did not appropriately set
offsets or flags. This contributed to issues involving encapsulation and
the TSO code.

The transition to using standard functions also means some other
metadata like packet_type are set appropriately.

Fixes: ccc096898c46 ("bfd: Implement Bidirectional Forwarding Detection.")
Signed-off-by: Mike Pattrick 
---
v2: Corrected formatting, and just calculate checksum up front
v3: Extended patch comment
---
 lib/bfd.c | 23 +++
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/lib/bfd.c b/lib/bfd.c
index 9698576d0..9af258917 100644
--- a/lib/bfd.c
+++ b/lib/bfd.c
@@ -586,7 +586,6 @@ bfd_put_packet(struct bfd *bfd, struct dp_packet *p,
 {
 long long int min_tx, min_rx;
 struct udp_header *udp;
-struct eth_header *eth;
 struct ip_header *ip;
 struct msg *msg;
 
@@ -605,15 +604,13 @@ bfd_put_packet(struct bfd *bfd, struct dp_packet *p,
  * set. */
 ovs_assert(!(bfd->flags & FLAG_POLL) || !(bfd->flags & FLAG_FINAL));
 
-dp_packet_reserve(p, 2); /* Properly align after the ethernet header. */
-eth = dp_packet_put_uninit(p, sizeof *eth);
-eth->eth_src = eth_addr_is_zero(bfd->local_eth_src)
-? eth_src : bfd->local_eth_src;
-eth->eth_dst = eth_addr_is_zero(bfd->local_eth_dst)
-? eth_addr_bfd : bfd->local_eth_dst;
-eth->eth_type = htons(ETH_TYPE_IP);
+ip = eth_compose(p,
+ eth_addr_is_zero(bfd->local_eth_dst)
+ ? eth_addr_bfd : bfd->local_eth_dst,
+ eth_addr_is_zero(bfd->local_eth_src)
+ ? eth_src : bfd->local_eth_src,
+ ETH_TYPE_IP, sizeof *ip + sizeof *udp + sizeof *msg);
 
-ip = dp_packet_put_zeros(p, sizeof *ip);
 ip->ip_ihl_ver = IP_IHL_VER(5, 4);
 ip->ip_tot_len = htons(sizeof *ip + sizeof *udp + sizeof *msg);
 ip->ip_ttl = MAXTTL;
@@ -621,15 +618,17 @@ bfd_put_packet(struct bfd *bfd, struct dp_packet *p,
 ip->ip_proto = IPPROTO_UDP;
 put_16aligned_be32(&ip->ip_src, bfd->ip_src);
 put_16aligned_be32(&ip->ip_dst, bfd->ip_dst);
-/* Checksum has already been zeroed by put_zeros call. */
+/* Checksum has already been zeroed by eth_compose call. */
 ip->ip_csum = csum(ip, sizeof *ip);
+dp_packet_set_l4(p, ip + 1);
 
-udp = dp_packet_put_zeros(p, sizeof *udp);
+udp = dp_packet_l4(p);
 udp->udp_src = htons(bfd->udp_src);
 udp->udp_dst = htons(BFD_DEST_PORT);
 udp->udp_len = htons(sizeof *udp + sizeof *msg);
+/* Checksum already zero from eth_compose. */
 
-msg = dp_packet_put_uninit(p, sizeof *msg);
+msg = (struct msg *)(udp + 1);
 msg->vers_diag = (BFD_VERSION << 5) | bfd->diag;
 msg->flags = (bfd->state & STATE_MASK) | bfd->flags;
 
-- 
2.39.3


[ovs-dev] [PATCH v3 4/4] ofproto-dpif-monitor: Remove unneeded calls to clear packets.

2024-02-06 Thread Mike Pattrick

Currently the monitor will call dp_packet_clear() on the dp_packet that
is shared amongst BFD, LLDP, and CFM. However, all of these packets are
created with eth_compose(), which already calls dp_packet_clear().

Reviewed-by: David Marchand 
Signed-off-by: Mike Pattrick 
---
 ofproto/ofproto-dpif-monitor.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/ofproto/ofproto-dpif-monitor.c b/ofproto/ofproto-dpif-monitor.c
index bb0e49091..5132f9c95 100644
--- a/ofproto/ofproto-dpif-monitor.c
+++ b/ofproto/ofproto-dpif-monitor.c
@@ -275,19 +275,16 @@ monitor_mport_run(struct mport *mport, struct dp_packet 
*packet)
 long long int lldp_wake_time = LLONG_MAX;
 
 if (mport->cfm && cfm_should_send_ccm(mport->cfm)) {
-dp_packet_clear(packet);
 cfm_compose_ccm(mport->cfm, packet, mport->hw_addr);
 ofproto_dpif_send_packet(mport->ofport, false, packet);
 }
 if (mport->bfd && bfd_should_send_packet(mport->bfd)) {
 bool oam;
 
-dp_packet_clear(packet);
 bfd_put_packet(mport->bfd, packet, mport->hw_addr, &oam);
 ofproto_dpif_send_packet(mport->ofport, oam, packet);
 }
 if (mport->lldp && lldp_should_send_packet(mport->lldp)) {
-dp_packet_clear(packet);
 lldp_put_packet(mport->lldp, packet, mport->hw_addr);
 ofproto_dpif_send_packet(mport->ofport, false, packet);
 }
-- 
2.39.3


[ovs-dev] [PATCH v3 3/4] dp-packet: Include inner offsets in adjustments and checks.

2024-02-06 Thread Mike Pattrick

Include inner offsets in functions where l3 and l4 offsets are either
modified or checked.

Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.")
Signed-off-by: Mike Pattrick 
---
v2:
 - Prints out new offsets in autovalidator
 - Extends resize_l2 change to avx512
v3:
 - Reordered fields in dp_packet_compare_offsets error print message
 - Updated and simplified comments in avx512_dp_packet_resize_l2()
---
 lib/dp-packet.c  | 18 +-
 lib/odp-execute-avx512.c | 31 ---
 2 files changed, 33 insertions(+), 16 deletions(-)

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 0e23c766e..305822293 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -507,6 +507,8 @@ dp_packet_resize_l2_5(struct dp_packet *b, int increment)
 /* Adjust layer offsets after l2_5. */
 dp_packet_adjust_layer_offset(&b->l3_ofs, increment);
 dp_packet_adjust_layer_offset(&b->l4_ofs, increment);
+dp_packet_adjust_layer_offset(&b->inner_l3_ofs, increment);
+dp_packet_adjust_layer_offset(&b->inner_l4_ofs, increment);
 
 return dp_packet_data(b);
 }
@@ -529,17 +531,23 @@ dp_packet_compare_offsets(struct dp_packet *b1, struct 
dp_packet *b2,
 if ((b1->l2_pad_size != b2->l2_pad_size) ||
 (b1->l2_5_ofs != b2->l2_5_ofs) ||
 (b1->l3_ofs != b2->l3_ofs) ||
-(b1->l4_ofs != b2->l4_ofs)) {
+(b1->l4_ofs != b2->l4_ofs) ||
+(b1->inner_l3_ofs != b2->inner_l3_ofs) ||
+(b1->inner_l4_ofs != b2->inner_l4_ofs)) {
 if (err_str) {
 ds_put_format(err_str, "Packet offset comparison failed\n");
 ds_put_format(err_str, "Buffer 1 offsets: l2_pad_size %u,"
-  " l2_5_ofs : %u l3_ofs %u, l4_ofs %u\n",
+  " l2_5_ofs : %u l3_ofs %u, l4_ofs %u,"
+  " inner_l3_ofs %u, inner_l4_ofs %u\n",
   b1->l2_pad_size, b1->l2_5_ofs,
-  b1->l3_ofs, b1->l4_ofs);
+  b1->l3_ofs, b1->l4_ofs,
+  b1->inner_l3_ofs, b1->inner_l4_ofs);
 ds_put_format(err_str, "Buffer 2 offsets: l2_pad_size %u,"
-  " l2_5_ofs : %u l3_ofs %u, l4_ofs %u\n",
+  " l2_5_ofs : %u l3_ofs %u, l4_ofs %u,"
+  " inner_l3_ofs %u, inner_l4_ofs %u\n",
   b2->l2_pad_size, b2->l2_5_ofs,
-  b2->l3_ofs, b2->l4_ofs);
+  b2->l3_ofs, b2->l4_ofs,
+  b2->inner_l3_ofs, b2->inner_l4_ofs);
 }
 return false;
 }
diff --git a/lib/odp-execute-avx512.c b/lib/odp-execute-avx512.c
index 747e04014..4a3396fbd 100644
--- a/lib/odp-execute-avx512.c
+++ b/lib/odp-execute-avx512.c
@@ -35,10 +35,10 @@
 
 VLOG_DEFINE_THIS_MODULE(odp_execute_avx512);
 
-/* The below three build asserts make sure that l2_5_ofs, l3_ofs, and l4_ofs
- * fields remain in the same order and offset to l2_padd_size. This is needed
- * as the avx512_dp_packet_resize_l2() function will manipulate those fields at
- * a fixed memory index based on the l2_padd_size offset. */
+/* The below three build asserts make sure that the below fields remain in the
+ * same order and offset to l2_pad_size. This is needed as the
+ * avx512_dp_packet_resize_l2() function will manipulate those fields at a
+ * fixed memory index based on the l2_pad_size offset. */
 BUILD_ASSERT_DECL(offsetof(struct dp_packet, l2_pad_size) +
   MEMBER_SIZEOF(struct dp_packet, l2_pad_size) ==
   offsetof(struct dp_packet, l2_5_ofs));
@@ -51,6 +51,14 @@ BUILD_ASSERT_DECL(offsetof(struct dp_packet, l3_ofs) +
MEMBER_SIZEOF(struct dp_packet, l3_ofs) ==
offsetof(struct dp_packet, l4_ofs));
 
+BUILD_ASSERT_DECL(offsetof(struct dp_packet, l4_ofs) +
+   MEMBER_SIZEOF(struct dp_packet, l4_ofs) ==
+   offsetof(struct dp_packet, inner_l3_ofs));
+
+BUILD_ASSERT_DECL(offsetof(struct dp_packet, inner_l3_ofs) +
+   MEMBER_SIZEOF(struct dp_packet, inner_l3_ofs) ==
+   offsetof(struct dp_packet, inner_l4_ofs));
+
 /* The below build assert makes sure it's safe to read/write 128-bits starting
  * at the l2_pad_size location. */
 BUILD_ASSERT_DECL(sizeof(struct dp_packet) -
@@ -112,7 +120,7 @@ avx512_dp_packet_resize_l2(struct dp_packet *b, int 
resize_by_bytes)
 dp_packet_pull(b, -resize_by_bytes);
 }
 
-/* The next step is to update the l2_5_ofs, l3_ofs and l4_ofs fields which
+/* The next step is to update the l2_5_ofs to inner_l4_ofs fields which
  * the scalar implementation does with the  dp_packet_adjus

[ovs-dev] [PATCH] ovsdb: Don't iterate over rows on empty mutation.

2024-02-04 Thread Mike Pattrick

Previously, when an empty mutation was used to count the number of rows
in a table, OVSDB would iterate over all rows twice: first to perform an
RBAC check, and then to perform the no-operation.

This change adds a short circuit to mutate operations with no conditions
and an empty mutation set, returning immediately. One notable change in
functionality is not performing the RBAC check in this condition, as no
mutation actually takes place.
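The short circuit can be sketched as below. This is an illustration only:
struct table_sketch and mutate_count_sketch() are hypothetical, the
rows_visited counter exists purely to make the skipped iteration visible,
and the slow path pretends that every row matches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a table: just a row count plus a counter that
 * records how many rows were actually visited. */
struct table_sketch {
    size_t n_rows;
    size_t rows_visited;
};

/* Returns the "count" result of a mutate operation. */
static size_t
mutate_count_sketch(struct table_sketch *t, size_t n_conditions,
                    size_t n_mutations)
{
    size_t matches = 0;

    assert(t != NULL);
    if (n_conditions == 0 && n_mutations == 0) {
        /* Nothing to match and nothing to change: report the row count
         * without touching any row. */
        return t->n_rows;
    }

    for (size_t i = 0; i < t->n_rows; i++) {
        t->rows_visited++;
        matches++;      /* Assumption: every row matches in this sketch. */
    }
    return matches;
}
```

The result is the same in both paths here; the difference is that the fast
path performs no per-row work, which also means no per-row RBAC check.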

Reported-by: Terry Wilson 
Reported-at: https://issues.redhat.com/browse/FDP-359
Signed-off-by: Mike Pattrick 
---
 ovsdb/execution.c   |  9 -
 ovsdb/mutation.h|  5 +
 tests/ovsdb-rbac.at | 19 +++
 3 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/ovsdb/execution.c b/ovsdb/execution.c
index 8c20c3b54..26bc49641 100644
--- a/ovsdb/execution.c
+++ b/ovsdb/execution.c
@@ -609,7 +609,14 @@ ovsdb_execute_mutate(struct ovsdb_execution *x, struct 
ovsdb_parser *parser,
 error = ovsdb_condition_from_json(table->schema, where, x->symtab,
   &condition);
 }
-if (!error) {
+if (!error &&
+ovsdb_condition_empty(&condition) &&
+ovsdb_mutation_set_empty(&mutations)) {
+/* Special case with no conditions or mutations, just return the row
+ * count. */
+json_object_put(result, "count",
+json_integer_create(hmap_count(&table->rows)));
+} else if (!error) {
 mr.n_matches = 0;
 mr.txn = x->txn;
 mr.mutations = &mutations;
diff --git a/ovsdb/mutation.h b/ovsdb/mutation.h
index 7566ef199..3989c7b8a 100644
--- a/ovsdb/mutation.h
+++ b/ovsdb/mutation.h
@@ -68,5 +68,10 @@ struct json *ovsdb_mutation_set_to_json(const struct 
ovsdb_mutation_set *);
 void ovsdb_mutation_set_destroy(struct ovsdb_mutation_set *);
 struct ovsdb_error *ovsdb_mutation_set_execute(
 struct ovsdb_row *, const struct ovsdb_mutation_set *) 
OVS_WARN_UNUSED_RESULT;
+static inline bool ovsdb_mutation_set_empty(
+const struct ovsdb_mutation_set *ms)
+{
+return ms->n_mutations == 0;
+}
 
 #endif /* ovsdb/mutation.h */
diff --git a/tests/ovsdb-rbac.at b/tests/ovsdb-rbac.at
index 3172e4bf5..741651723 100644
--- a/tests/ovsdb-rbac.at
+++ b/tests/ovsdb-rbac.at
@@ -355,6 +355,25 @@ AT_CHECK([uuidfilt stdout], [0], [[[{"details":"RBAC rules 
for client \"client-2
 ], [ignore])
 
 # Test 14:
+# Count the rows in other_colors. This should pass even though the RBAC
+# authorization would fail because "client-2" does not match the
+# "creator" column for this row. Because the RBAC check is bypassed when
+# where and mutations are both empty.
+AT_CHECK([ovsdb-client transact ssl:127.0.0.1:$SSL_PORT \
+--private-key=$RBAC_PKIDIR/client-2-privkey.pem \
+--certificate=$RBAC_PKIDIR/client-2-cert.pem \
+--ca-cert=$RBAC_PKIDIR/pki/switchca/cacert.pem \
+['["mydb",
+ {"op": "mutate",
+  "table": "other_colors",
+  "where": [],
+  "mutations": []}
+ ]']], [0], [stdout], [ignore])
+cat stdout >> output
+AT_CHECK([uuidfilt stdout], [0], [[[{"count":1}]]
+], [ignore])
+
+# Test 15:
 # Attempt to delete a row from the "other_colors" table. This should pass
 # the RBAC authorization test because "client-1" does matches the
 # "creator" column for this row.
-- 
2.39.3


Re: [ovs-dev] [PATCH v2 3/4] dp-packet: Include inner offsets in adjustments and checks.

2024-01-31 Thread Mike Pattrick

On Wed, Jan 31, 2024 at 10:04 AM David Marchand
 wrote:
>
> On Tue, Jan 30, 2024 at 11:15 PM Mike Pattrick  wrote:
> >
> > Include inner offsets in functions where l3 and l4 offsets are either
> > modified or checked.
> >
> > Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.")
> > Signed-off-by: Mike Pattrick 
> > ---
> > v2:
> >
> >  - Prints out new offsets in autovalidator
> >  - Extends resize_l2 change to avx512
> >
> > Signed-off-by: Mike Pattrick 
> > ---
> >  lib/dp-packet.c  | 18 +-
> >  lib/odp-execute-avx512.c | 19 ++-
> >  2 files changed, 27 insertions(+), 10 deletions(-)
> >
> > diff --git a/lib/dp-packet.c b/lib/dp-packet.c
> > index 0e23c766e..640b1dfeb 100644
> > --- a/lib/dp-packet.c
> > +++ b/lib/dp-packet.c
> > @@ -507,6 +507,8 @@ dp_packet_resize_l2_5(struct dp_packet *b, int 
> > increment)
> >  /* Adjust layer offsets after l2_5. */
> >  dp_packet_adjust_layer_offset(&b->l3_ofs, increment);
> >  dp_packet_adjust_layer_offset(&b->l4_ofs, increment);
> > +dp_packet_adjust_layer_offset(&b->inner_l3_ofs, increment);
> > +dp_packet_adjust_layer_offset(&b->inner_l4_ofs, increment);
> >
> >  return dp_packet_data(b);
> >  }
> > @@ -529,17 +531,23 @@ dp_packet_compare_offsets(struct dp_packet *b1, 
> > struct dp_packet *b2,
> >  if ((b1->l2_pad_size != b2->l2_pad_size) ||
> >  (b1->l2_5_ofs != b2->l2_5_ofs) ||
> >  (b1->l3_ofs != b2->l3_ofs) ||
> > -(b1->l4_ofs != b2->l4_ofs)) {
> > +(b1->l4_ofs != b2->l4_ofs) ||
> > +(b1->inner_l3_ofs != b2->inner_l3_ofs) ||
> > +(b1->inner_l4_ofs != b2->inner_l4_ofs)) {
> >  if (err_str) {
> >  ds_put_format(err_str, "Packet offset comparison failed\n");
> >  ds_put_format(err_str, "Buffer 1 offsets: l2_pad_size %u,"
> > -  " l2_5_ofs : %u l3_ofs %u, l4_ofs %u\n",
> > +  " l2_5_ofs : %u l3_ofs %u, inner_l3_ofs %u,"
> > +  " l4_ofs %u, inner_l4_ofs %u\n",
> >b1->l2_pad_size, b1->l2_5_ofs,
> > -  b1->l3_ofs, b1->l4_ofs);
> > +  b1->l3_ofs, b1->inner_l3_ofs,
> > +  b1->l4_ofs, b1->inner_l4_ofs);
> >  ds_put_format(err_str, "Buffer 2 offsets: l2_pad_size %u,"
> > -  " l2_5_ofs : %u l3_ofs %u, l4_ofs %u\n",
> > +  " l2_5_ofs : %u l3_ofs %u, inner_l3_ofs %u,"
> > +  " l4_ofs %u, inner_l4_ofs %u\n",
> >b2->l2_pad_size, b2->l2_5_ofs,
> > -  b2->l3_ofs, b2->l4_ofs);
> > +  b2->l3_ofs, b2->inner_l3_ofs,
> > +  b2->l4_ofs, b2->inner_l4_ofs);
> >  }
> >  return false;
> >  }
>
> Not a strong opinion, but I prefer keeping those offsets in the same
> order as a real packet layout, rather than mixing l3 / l4 outer/inner
> offsets.
> IOW:
> -  " l2_5_ofs : %u l3_ofs %u, l4_ofs %u\n",
> +  " l2_5_ofs : %u l3_ofs %u, l4_ofs %u,"
> +  " inner_l3_ofs : %u, inner_l4_ofs %u\n",
>
>
> > diff --git a/lib/odp-execute-avx512.c b/lib/odp-execute-avx512.c
> > index 747e04014..7f9870669 100644
> > --- a/lib/odp-execute-avx512.c
> > +++ b/lib/odp-execute-avx512.c
> > @@ -35,10 +35,11 @@
> >
> >  VLOG_DEFINE_THIS_MODULE(odp_execute_avx512);
> >
> > -/* The below three build asserts make sure that l2_5_ofs, l3_ofs, and 
> > l4_ofs
> > - * fields remain in the same order and offset to l2_padd_size. This is 
> > needed
> > - * as the avx512_dp_packet_resize_l2() function will manipulate those 
> > fields at
> > - * a fixed memory index based on the l2_padd_size offset. */
> > +/* The below three build asserts make sure that l2_5_ofs, l3_ofs, l4_ofs,
>
> Counting build asserts in a comment is useless, and here the count
> becomes wrong after the change.
> I suggest a simple: "The below build asserts".
>
>
> > + * inner_l3_ofs, and inner_l4_ofs fields remain in the same order and 
> > offset to
> > + * l2_padd_size. This is needed as the avx512_d

[ovs-dev] [PATCH v2 3/4] dp-packet: Include inner offsets in adjustments and checks.

2024-01-30 Thread Mike Pattrick

Include inner offsets in functions where l3 and l4 offsets are either
modified or checked.

Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.")
Signed-off-by: Mike Pattrick 
---
v2:

 - Prints out new offsets in autovalidator
 - Extends resize_l2 change to avx512

Signed-off-by: Mike Pattrick 
---
 lib/dp-packet.c  | 18 +-
 lib/odp-execute-avx512.c | 19 ++-
 2 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 0e23c766e..640b1dfeb 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -507,6 +507,8 @@ dp_packet_resize_l2_5(struct dp_packet *b, int increment)
 /* Adjust layer offsets after l2_5. */
 dp_packet_adjust_layer_offset(&b->l3_ofs, increment);
 dp_packet_adjust_layer_offset(&b->l4_ofs, increment);
+dp_packet_adjust_layer_offset(&b->inner_l3_ofs, increment);
+dp_packet_adjust_layer_offset(&b->inner_l4_ofs, increment);
 
 return dp_packet_data(b);
 }
@@ -529,17 +531,23 @@ dp_packet_compare_offsets(struct dp_packet *b1, struct dp_packet *b2,
 if ((b1->l2_pad_size != b2->l2_pad_size) ||
 (b1->l2_5_ofs != b2->l2_5_ofs) ||
 (b1->l3_ofs != b2->l3_ofs) ||
-(b1->l4_ofs != b2->l4_ofs)) {
+(b1->l4_ofs != b2->l4_ofs) ||
+(b1->inner_l3_ofs != b2->inner_l3_ofs) ||
+(b1->inner_l4_ofs != b2->inner_l4_ofs)) {
 if (err_str) {
 ds_put_format(err_str, "Packet offset comparison failed\n");
 ds_put_format(err_str, "Buffer 1 offsets: l2_pad_size %u,"
-  " l2_5_ofs : %u l3_ofs %u, l4_ofs %u\n",
+  " l2_5_ofs : %u l3_ofs %u, inner_l3_ofs %u,"
+  " l4_ofs %u, inner_l4_ofs %u\n",
   b1->l2_pad_size, b1->l2_5_ofs,
-  b1->l3_ofs, b1->l4_ofs);
+  b1->l3_ofs, b1->inner_l3_ofs,
+  b1->l4_ofs, b1->inner_l4_ofs);
 ds_put_format(err_str, "Buffer 2 offsets: l2_pad_size %u,"
-  " l2_5_ofs : %u l3_ofs %u, l4_ofs %u\n",
+  " l2_5_ofs : %u l3_ofs %u, inner_l3_ofs %u,"
+  " l4_ofs %u, inner_l4_ofs %u\n",
   b2->l2_pad_size, b2->l2_5_ofs,
-  b2->l3_ofs, b2->l4_ofs);
+  b2->l3_ofs, b2->inner_l3_ofs,
+  b2->l4_ofs, b2->inner_l4_ofs);
 }
 return false;
 }
diff --git a/lib/odp-execute-avx512.c b/lib/odp-execute-avx512.c
index 747e04014..7f9870669 100644
--- a/lib/odp-execute-avx512.c
+++ b/lib/odp-execute-avx512.c
@@ -35,10 +35,11 @@
 
 VLOG_DEFINE_THIS_MODULE(odp_execute_avx512);
 
-/* The below three build asserts make sure that l2_5_ofs, l3_ofs, and l4_ofs
- * fields remain in the same order and offset to l2_padd_size. This is needed
- * as the avx512_dp_packet_resize_l2() function will manipulate those fields at
- * a fixed memory index based on the l2_padd_size offset. */
+/* The below three build asserts make sure that l2_5_ofs, l3_ofs, l4_ofs,
+ * inner_l3_ofs, and inner_l4_ofs fields remain in the same order and offset to
+ * l2_padd_size. This is needed as the avx512_dp_packet_resize_l2() function
+ * will manipulate those fields at a fixed memory index based on the
+ * l2_padd_size offset. */
 BUILD_ASSERT_DECL(offsetof(struct dp_packet, l2_pad_size) +
   MEMBER_SIZEOF(struct dp_packet, l2_pad_size) ==
   offsetof(struct dp_packet, l2_5_ofs));
@@ -51,6 +52,14 @@ BUILD_ASSERT_DECL(offsetof(struct dp_packet, l3_ofs) +
MEMBER_SIZEOF(struct dp_packet, l3_ofs) ==
offsetof(struct dp_packet, l4_ofs));
 
+BUILD_ASSERT_DECL(offsetof(struct dp_packet, l4_ofs) +
+   MEMBER_SIZEOF(struct dp_packet, l4_ofs) ==
+   offsetof(struct dp_packet, inner_l3_ofs));
+
+BUILD_ASSERT_DECL(offsetof(struct dp_packet, inner_l3_ofs) +
+   MEMBER_SIZEOF(struct dp_packet, inner_l3_ofs) ==
+   offsetof(struct dp_packet, inner_l4_ofs));
+
 /* The below build assert makes sure it's safe to read/write 128-bits starting
  * at the l2_pad_size location. */
 BUILD_ASSERT_DECL(sizeof(struct dp_packet) -
@@ -125,7 +134,7 @@ avx512_dp_packet_resize_l2(struct dp_packet *b, int resize_by_bytes)
 /* Each lane represents 16 bits in a 12-bit register. In this case the
  * first three 16-bit values, which will map to the l2_5_ofs, l3_ofs and
  * l4_ofs fields. */
-const uint8_t k_lanes = 0b1110;
+const uint8_t k_lanes = 0b10;
 
 /* Set all 16-bit words in the 128-bits v_offset register to the value we
  * nee
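
A minimal standalone sketch of the two ideas in this patch, not the OVS code
itself: compile-time checks that the offset fields sit back to back, and a
resize helper that shifts the inner offsets together with the outer ones.
The struct pkt_offsets type, the FIELD_END macro and the function names are
hypothetical stand-ins, and the rule that an offset of UINT16_MAX means
"unset" and is left untouched is an assumption about how the real helper
behaves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the offset fields of struct dp_packet. */
struct pkt_offsets {
    uint16_t l2_pad_size;
    uint16_t l2_5_ofs;
    uint16_t l3_ofs;
    uint16_t l4_ofs;
    uint16_t inner_l3_ofs;
    uint16_t inner_l4_ofs;
};

/* Offset of the first byte after 'member' (hypothetical helper macro). */
#define FIELD_END(type, member) \
    (offsetof(type, member) + sizeof(((type *) 0)->member))

/* Each field must start exactly where the previous one ends, so that a
 * vector operation can treat them as consecutive 16-bit lanes. */
static_assert(FIELD_END(struct pkt_offsets, l2_pad_size)
              == offsetof(struct pkt_offsets, l2_5_ofs), "l2_5_ofs moved");
static_assert(FIELD_END(struct pkt_offsets, l2_5_ofs)
              == offsetof(struct pkt_offsets, l3_ofs), "l3_ofs moved");
static_assert(FIELD_END(struct pkt_offsets, l3_ofs)
              == offsetof(struct pkt_offsets, l4_ofs), "l4_ofs moved");
static_assert(FIELD_END(struct pkt_offsets, l4_ofs)
              == offsetof(struct pkt_offsets, inner_l3_ofs),
              "inner_l3_ofs moved");
static_assert(FIELD_END(struct pkt_offsets, inner_l3_ofs)
              == offsetof(struct pkt_offsets, inner_l4_ofs),
              "inner_l4_ofs moved");

/* Shift one offset; an unset offset (assumed to be UINT16_MAX) stays unset. */
static void
adjust_layer_offset(uint16_t *offset, int increment)
{
    if (*offset != UINT16_MAX) {
        *offset += increment;
    }
}

/* Scalar equivalent of the resize: every layer after l2_5 moves, including
 * the inner l3 and l4 offsets added by this patch. */
static void
resize_l2_5(struct pkt_offsets *p, int increment)
{
    adjust_layer_offset(&p->l3_ofs, increment);
    adjust_layer_offset(&p->l4_ofs, increment);
    adjust_layer_offset(&p->inner_l3_ofs, increment);
    adjust_layer_offset(&p->inner_l4_ofs, increment);
}
```

If a field is inserted between two of these members, the build fails at the
matching static_assert instead of the vector code silently adjusting the
wrong bytes.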

[ovs-dev] [PATCH v2 4/4] ofproto-dpif-monitor: Remove unneeded calls to clear packets.

2024-01-30 Thread Mike Pattrick

Currently the monitor will call dp_packet_clear() on the dp_packet that
is shared amongst BFD, LLDP, and CFM. However, all of these packets are
created with eth_compose(), which already calls dp_packet_clear().

Signed-off-by: Mike Pattrick 
---
 ofproto/ofproto-dpif-monitor.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/ofproto/ofproto-dpif-monitor.c b/ofproto/ofproto-dpif-monitor.c
index bb0e49091..5132f9c95 100644
--- a/ofproto/ofproto-dpif-monitor.c
+++ b/ofproto/ofproto-dpif-monitor.c
@@ -275,19 +275,16 @@ monitor_mport_run(struct mport *mport, struct dp_packet *packet)
 long long int lldp_wake_time = LLONG_MAX;
 
 if (mport->cfm && cfm_should_send_ccm(mport->cfm)) {
-dp_packet_clear(packet);
 cfm_compose_ccm(mport->cfm, packet, mport->hw_addr);
 ofproto_dpif_send_packet(mport->ofport, false, packet);
 }
 if (mport->bfd && bfd_should_send_packet(mport->bfd)) {
 bool oam;
 
-dp_packet_clear(packet);
 bfd_put_packet(mport->bfd, packet, mport->hw_addr, &oam);
 ofproto_dpif_send_packet(mport->ofport, oam, packet);
 }
 if (mport->lldp && lldp_should_send_packet(mport->lldp)) {
-dp_packet_clear(packet);
 lldp_put_packet(mport->lldp, packet, mport->hw_addr);
 ofproto_dpif_send_packet(mport->ofport, false, packet);
 }
-- 
2.39.3


[ovs-dev] [PATCH v2 2/4] bfd: Set proper offsets and flags in BFD packets.

2024-01-30 Thread Mike Pattrick

Previously the BFD packet creation code did not appropriately set
offsets or flags. This contributed to issues involving encapsulation and
the TSO code.

Fixes: ccc096898c46 ("bfd: Implement Bidirectional Forwarding Detection.")
Signed-off-by: Mike Pattrick 
---
v2: Corrected formatting, and just calculate checksum up front
---
 lib/bfd.c | 23 +++
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/lib/bfd.c b/lib/bfd.c
index 9698576d0..9af258917 100644
--- a/lib/bfd.c
+++ b/lib/bfd.c
@@ -586,7 +586,6 @@ bfd_put_packet(struct bfd *bfd, struct dp_packet *p,
 {
 long long int min_tx, min_rx;
 struct udp_header *udp;
-struct eth_header *eth;
 struct ip_header *ip;
 struct msg *msg;
 
@@ -605,15 +604,13 @@ bfd_put_packet(struct bfd *bfd, struct dp_packet *p,
  * set. */
 ovs_assert(!(bfd->flags & FLAG_POLL) || !(bfd->flags & FLAG_FINAL));
 
-dp_packet_reserve(p, 2); /* Properly align after the ethernet header. */
-eth = dp_packet_put_uninit(p, sizeof *eth);
-eth->eth_src = eth_addr_is_zero(bfd->local_eth_src)
-? eth_src : bfd->local_eth_src;
-eth->eth_dst = eth_addr_is_zero(bfd->local_eth_dst)
-? eth_addr_bfd : bfd->local_eth_dst;
-eth->eth_type = htons(ETH_TYPE_IP);
+ip = eth_compose(p,
+ eth_addr_is_zero(bfd->local_eth_dst)
+ ? eth_addr_bfd : bfd->local_eth_dst,
+ eth_addr_is_zero(bfd->local_eth_src)
+ ? eth_src : bfd->local_eth_src,
+ ETH_TYPE_IP, sizeof *ip + sizeof *udp + sizeof *msg);
 
-ip = dp_packet_put_zeros(p, sizeof *ip);
 ip->ip_ihl_ver = IP_IHL_VER(5, 4);
 ip->ip_tot_len = htons(sizeof *ip + sizeof *udp + sizeof *msg);
 ip->ip_ttl = MAXTTL;
@@ -621,15 +618,17 @@ bfd_put_packet(struct bfd *bfd, struct dp_packet *p,
 ip->ip_proto = IPPROTO_UDP;
 put_16aligned_be32(&ip->ip_src, bfd->ip_src);
 put_16aligned_be32(&ip->ip_dst, bfd->ip_dst);
-/* Checksum has already been zeroed by put_zeros call. */
+/* Checksum has already been zeroed by eth_compose call. */
 ip->ip_csum = csum(ip, sizeof *ip);
+dp_packet_set_l4(p, ip + 1);
 
-udp = dp_packet_put_zeros(p, sizeof *udp);
+udp = dp_packet_l4(p);
 udp->udp_src = htons(bfd->udp_src);
 udp->udp_dst = htons(BFD_DEST_PORT);
 udp->udp_len = htons(sizeof *udp + sizeof *msg);
+/* Checksum already zero from eth_compose. */
 
-msg = dp_packet_put_uninit(p, sizeof *msg);
+msg = (struct msg *)(udp + 1);
 msg->vers_diag = (BFD_VERSION << 5) | bfd->diag;
 msg->flags = (bfd->state & STATE_MASK) | bfd->flags;
 
-- 
2.39.3
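
The comment about the checksum field already being zero matters because the
IPv4 header checksum is computed over the header with that field set to
zero. A standalone sketch of the one's-complement Internet checksum
(RFC 1071) is shown below to illustrate this. The inet_csum name is
hypothetical and this is not the OVS csum() implementation; in particular,
this sketch returns the checksum as a host-order number rather than in
network byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement Internet checksum (RFC 1071) over 'len' bytes.
 * Bytes are combined as big-endian 16-bit words; the result is returned
 * as a host-order number. */
static uint16_t
inet_csum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        sum += ((uint32_t) data[i] << 8) | data[i + 1];
    }
    if (len & 1) {
        /* A trailing odd byte is padded with a zero low byte. */
        sum += (uint32_t) data[len - 1] << 8;
    }
    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) ~sum;
}
```

Running it over a header whose checksum bytes are zero yields the value to
store; running it again over the header with that value stored yields zero,
which is how a receiver verifies the header.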


[ovs-dev] [PATCH v2 1/4] dp-packet: Validate correct offset for L4 inner size.

2024-01-30 Thread Mike Pattrick

This patch fixes the correctness of dp_packet_inner_l4_size() when
checking for the existence of an inner L4 header. Previously it checked
for the outer L4 header.

This function is currently only used when a packet is already flagged
for tunneling, so an incorrect determination isn't possible as long as
the flags of the packet are correct.

Fixes: 85bcbbed839a ("userspace: Enable tunnel tests with TSO.")
Signed-off-by: Mike Pattrick 
---
v2: Corrected patch subject
---
 lib/dp-packet.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index dceb701e8..802d3f385 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -540,7 +540,7 @@ dp_packet_inner_l4(const struct dp_packet *b)
 static inline size_t
 dp_packet_inner_l4_size(const struct dp_packet *b)
 {
-return OVS_LIKELY(b->l4_ofs != UINT16_MAX)
+return OVS_LIKELY(b->inner_l4_ofs != UINT16_MAX)
? (const char *) dp_packet_tail(b)
- (const char *) dp_packet_inner_l4(b)
- dp_packet_l2_pad_size(b)
-- 
2.39.3
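
To illustrate the fix, here is a minimal standalone sketch of the sentinel
check with a simplified packet model. The struct pkt type and the
pkt_inner_l4_size name are hypothetical, and returning 0 when the offset is
unset is an assumption about the part of the function not shown in the hunk.
The point is that the guard has to test the same offset that the size
calculation uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified packet model: offsets are relative to the start
 * of the data, and UINT16_MAX marks an offset as unset. */
struct pkt {
    size_t size;            /* Total bytes of packet data. */
    uint16_t l2_pad_size;   /* Trailing L2 padding, in bytes. */
    uint16_t l4_ofs;        /* Outer L4 header offset. */
    uint16_t inner_l4_ofs;  /* Inner L4 header offset. */
};

/* Bytes from the inner L4 header to the end of the packet, excluding L2
 * padding. The guard checks inner_l4_ofs, the same field used in the
 * subtraction. Checking l4_ofs instead would let an unset inner offset
 * reach the arithmetic whenever the outer offset happens to be set. */
static size_t
pkt_inner_l4_size(const struct pkt *p)
{
    return p->inner_l4_ofs != UINT16_MAX
           ? p->size - p->inner_l4_ofs - p->l2_pad_size
           : 0;
}
```

For a packet with an outer L4 offset but no inner one, this version returns
0, whereas a guard on l4_ofs would subtract UINT16_MAX from the size.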

