Re: [ovs-dev] [PATCH v2] ovsdb: Use table indexes if available for ovsdb_query().

2024-06-17 Thread Mike Pattrick
On Mon, Jun 17, 2024 at 12:54 PM Mike Pattrick  wrote:
>
> Currently all OVSDB database queries except for UUID lookups all result
> in linear lookups over the entire table, even if an index is present.
>
> This patch modifies ovsdb_query() to attempt an index lookup first, if
> possible. If no matching indexes are present then a linear index is
> still conducted.
>
> To test this, I set up an ovsdb database with a variable number of rows
> and timed the average of how long ovsdb-client took to query a single
> row. The first two tests involved a linear scan that didn't match any
> rows, so there was no overhead associated with sending or encoding
> output. The post-patch linear scan was a worst case scenario where the
> table did have an appropriate index but the conditions made its usage
> impossible. The indexed lookup test was for a matching row, which did
> also include overhead associated with a match. The results are included
> in the table below.
>
> Rows   | 100k | 200k | 300k | 400k | 500k
> ---+--+--+--+--+-
> Pre-patch linear scan  |  9ms | 24ms | 37ms | 49ms | 61ms
> Post-patch linear scan |  9ms | 24ms | 38ms | 49ms | 61ms
> Indexed lookup |  3ms |  3ms |  3ms |  3ms |  3ms
>
> I also tested the performance of ovsdb_query() by wrapping it in a loop
> and measuring the time it took to perform 1000 linear scans on 1, 10,
> 100k, and 200k rows. This test showed that the new index checking code
> did not slow down worst case lookups to a statistically detectable
> degree.
>
> Reported-at: https://issues.redhat.com/browse/FDP-590
> Signed-off-by: Mike Pattrick 
>
> ---
>
> v2:
>  - Included txn in index code
>  - Added benchmarks
>  - Refactored code
>  - Added more tests
>  - Now a mock row is created to perform the search with standard
>  functions
> Signed-off-by: Mike Pattrick 

Recheck-request: github-robot

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] ovsdb: Use table indexes if available for ovsdb_query().

2024-06-17 Thread Mike Pattrick
On Mon, Jun 3, 2024 at 2:01 PM Ilya Maximets  wrote:
>
> On 6/3/24 06:20, Mike Pattrick wrote:
> > Currently all OVSDB database queries except for UUID lookups all result
> > in linear lookups over the entire table, even if an index is present.
> >
> > This patch modifies ovsdb_query() to attempt an index lookup first, if
> > possible. If no matching indexes are present then a linear index is
> > still conducted.
> >
> > Reported-at: https://issues.redhat.com/browse/FDP-590
> > Signed-off-by: Mike Pattrick 
> > ---
> >  NEWS |   3 ++
> >  ovsdb/query.c| 102 +++
> >  ovsdb/row.h  |  28 +++
> >  ovsdb/transaction.c  |  27 ---
> >  tests/ovsdb-execution.at |  34 -
> >  tests/ovsdb-server.at|   2 +-
> >  tests/ovsdb-tool.at  |   2 +-
> >  7 files changed, 159 insertions(+), 39 deletions(-)
>
> Hi, Mike.  Thanks for the patch.
>
> Besides what Simon asked, the patch has a few other issues:
>
> 1. Lookup is performed only on the committed index and it doesn't include
>rows that are in-flight in the current transaction.
>
>Unlike rows in a hash table, indexes are updated only after the whole
>transaction is committed.  With this change we'll not be able to find
>newly added rows.
>
>Another thing related to this is that it is allowed to have duplicates
>within a transaction as long as they are removed before the transaction
>ends.  So it is possible that multiple rows will satisfy the condition
>on indexed columns while the transaction is in-flight.
>
>Consider the following commands executed in a sandbox:
>
># ovs-vsctl set-manager "tcp:my-first-target"
># ovsdb-client transact unix:$(pwd)/sandbox/db.sock '
>["Open_vSwitch",
> {"op": "select",
>  "table": "Manager",
>  "columns": ["_uuid", "target"],
>  "where": [["target", "==", "tcp:my-first-target"]]},
> {"op": "insert",
>  "table": "Manager",
>  "uuid-name": "duplicate",
>  "row": {"target": "tcp:my-first-target"}},
> {"op": "select",
>  "table": "Manager",
>  "columns": ["_uuid", "target"],
>  "where": [["target", "==", "tcp:my-first-target"]]},
> {"op": "delete",
>  "table": "Manager",
>  "where":[["_uuid","==",["named-uuid","duplicate"]]]},
> {"op": "select",
>  "table": "Manager",
>  "columns": ["_uuid", "target"],
>  "where": [["target", "==", "tcp:my-first-target"]]}]'
>
>Transaction must succeed.  The first selection should return 1 row,
>the second should return both duplicates and the third should again
>return one row.

This is a good point, I hadn't anticipated this use-case but it does
have a large impact on this change. After working through a few
implementations, I wasn't able to find a solution that wasn't overly
complex. For the next version, I've instead opted to exclude indexed
lookups from transactions that modify the associated row.

The next version should address this and the other feedback.

Cheers,
M

>
>Ideally, implementation should not leak the transaction details to
>the query module, though I'm not sure if that is 100% achievable.
>
> 2. Taking above case into account, this change needs way more unit tests
>with different order of operations and complex data updates.
>
> 3. Since this is a performance-oriented change, please, include some
>performance numbers in the commit message as well, including impact
>on non-indexed lookups, if any.
>
> 4. There seems to be a lot of logic overlap with existing functions like
>ovsdb_condition_match_every_clause(), ovsdb_index_search() and
>ovsdb_row_hash_columns().  Can we re-use those instead?  For example,
>by creating a row from the conditions before the lookup?  What a
>performance impact will look like?
>
> Best regards, Ilya Maximets.
>

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2] ovsdb: Use table indexes if available for ovsdb_query().

2024-06-17 Thread Mike Pattrick
Currently all OVSDB database queries except for UUID lookups all result
in linear lookups over the entire table, even if an index is present.

This patch modifies ovsdb_query() to attempt an index lookup first, if
possible. If no matching indexes are present then a linear index is
still conducted.

To test this, I set up an ovsdb database with a variable number of rows
and timed the average of how long ovsdb-client took to query a single
row. The first two tests involved a linear scan that didn't match any
rows, so there was no overhead associated with sending or encoding
output. The post-patch linear scan was a worst case scenario where the
table did have an appropriate index but the conditions made its usage
impossible. The indexed lookup test was for a matching row, which did
also include overhead associated with a match. The results are included
in the table below.

Rows   | 100k | 200k | 300k | 400k | 500k
---+--+--+--+--+-
Pre-patch linear scan  |  9ms | 24ms | 37ms | 49ms | 61ms
Post-patch linear scan |  9ms | 24ms | 38ms | 49ms | 61ms
Indexed lookup |  3ms |  3ms |  3ms |  3ms |  3ms

I also tested the performance of ovsdb_query() by wrapping it in a loop
and measuring the time it took to perform 1000 linear scans on 1, 10,
100k, and 200k rows. This test showed that the new index checking code
did not slow down worst case lookups to a statistically detectable
degree.

Reported-at: https://issues.redhat.com/browse/FDP-590
Signed-off-by: Mike Pattrick 

---

v2:
 - Included txn in index code
 - Added benchmarks
 - Refactored code
 - Added more tests
 - Now a mock row is created to perform the search with standard
 functions
Signed-off-by: Mike Pattrick 
---
 ovsdb/execution.c|  20 +++--
 ovsdb/query.c| 174 +++
 ovsdb/query.h|   6 +-
 ovsdb/rbac.c |  15 ++--
 ovsdb/rbac.h |  10 ++-
 ovsdb/row.h  |  28 +++
 ovsdb/transaction.c  |  29 +--
 ovsdb/transaction.h  |   5 ++
 tests/ovsdb-execution.at | 108 +++-
 tests/ovsdb-macros.at|  10 +++
 tests/ovsdb-query.at |  18 ++--
 tests/ovsdb-server.at|   2 +-
 tests/ovsdb-tool.at  |   2 +-
 tests/test-ovsdb.c   |  15 +++-
 14 files changed, 363 insertions(+), 79 deletions(-)

diff --git a/ovsdb/execution.c b/ovsdb/execution.c
index f4cc9e802..212839bca 100644
--- a/ovsdb/execution.c
+++ b/ovsdb/execution.c
@@ -459,7 +459,7 @@ ovsdb_execute_select(struct ovsdb_execution *x, struct 
ovsdb_parser *parser,
 if (!error) {
 struct ovsdb_row_set rows = OVSDB_ROW_SET_INITIALIZER;
 
-ovsdb_query_distinct(table, , , );
+ovsdb_query_distinct(table, , , , x->txn);
 ovsdb_row_set_sort(, );
 json_object_put(result, "rows",
 ovsdb_row_set_to_json(, ));
@@ -545,8 +545,8 @@ ovsdb_execute_update(struct ovsdb_execution *x, struct 
ovsdb_parser *parser,
 ur.row = row;
 ur.columns = 
 if (ovsdb_rbac_update(x->db, table, , , x->role,
-  x->id)) {
-ovsdb_query(table, , update_row_cb, );
+  x->id, x->txn)) {
+ovsdb_query(table, , update_row_cb, , x->txn);
 } else {
 error = ovsdb_perm_error("RBAC rules for client \"%s\" role "
  "\"%s\" prohibit modification of "
@@ -626,7 +626,7 @@ ovsdb_execute_mutate(struct ovsdb_execution *x, struct 
ovsdb_parser *parser,
 json_integer_create(hmap_count(>rows)));
 } else {
 size_t row_count = 0;
-ovsdb_query(table, , count_row_cb, _count);
+ovsdb_query(table, , count_row_cb, _count, x->txn);
 json_object_put(result, "count",
 json_integer_create(row_count));
 }
@@ -636,8 +636,8 @@ ovsdb_execute_mutate(struct ovsdb_execution *x, struct 
ovsdb_parser *parser,
 mr.mutations = 
 mr.error = 
 if (ovsdb_rbac_mutate(x->db, table, , , x->role,
-  x->id)) {
-ovsdb_query(table, , mutate_row_cb, );
+  x->id, x->txn)) {
+ovsdb_query(table, , mutate_row_cb, , x->txn);
 } else {
 error = ovsdb_perm_error("RBAC rules for client \"%s\" role "
  "\"%s\" prohibit mutate operation on "
@@ -693,8 +693,9 @@ ovsdb_execute_delete(struct ovsdb_execution *x, struct 
ovsdb_parser *parser,
 dr.table = table;
 dr.txn = x->txn;
 
-if (ovsdb_rbac_delete(x->db, table, , x->role, x->id)) {
-ovsdb_query(table, , delete_row_cb, );
+if (ovsdb_rbac_delete(x->db, tab

Re: [ovs-dev] [PATCH v2] netdev-dpdk: Use LSC interrupt mode.

2024-06-14 Thread Mike Pattrick
On Fri, Jun 14, 2024 at 2:48 AM David Marchand
 wrote:
>
> Querying link status may get delayed for an undeterministic (long) time
> with mlx5 ports. This is a consequence of the mlx5 driver calling ethtool
> kernel API and getting stuck on the kernel RTNL lock while some other
> operation is in progress under this lock.
>
> One impact for long link status query is that it is called under the bond
> lock taken in write mode periodically in bond_run().
> In parallel, datapath threads may block requesting to read bonding related
> info (like for example in bond_check_admissibility()).
>
> The LSC interrupt mode is available with many DPDK drivers and is used by
> default with testpmd.
>
> It seems safe enough to switch on this feature by default in OVS.
> We keep the per interface option to disable this feature in case of an
> unforeseen bug.
>
> Signed-off-by: David Marchand 
> ---
> Changes since v1:
> - (early) fail when interrupt lsc is requested by user but not supported
>   by the driver,
> - otherwise, log a debug message if user did not request interrupt mode,
>
> ---
>  Documentation/topics/dpdk/phy.rst |  4 ++--
>  NEWS  |  3 +++
>  lib/netdev-dpdk.c | 13 -
>  vswitchd/vswitch.xml  |  8 
>  4 files changed, 21 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/topics/dpdk/phy.rst 
> b/Documentation/topics/dpdk/phy.rst
> index efd168cba8..eefc25613d 100644
> --- a/Documentation/topics/dpdk/phy.rst
> +++ b/Documentation/topics/dpdk/phy.rst
> @@ -546,8 +546,8 @@ the firmware every time to fulfil this request.
>
>  Note that not all PMD drivers support LSC interrupts.
>
> -The default configuration is polling mode. To set interrupt mode, option
> -``dpdk-lsc-interrupt`` has to be set to ``true``.
> +The default configuration is interrupt mode. To set polling mode, option
> +``dpdk-lsc-interrupt`` has to be set to ``false``.
>
>  Command to set interrupt mode for a specific interface::
>  $ ovs-vsctl set interface  options:dpdk-lsc-interrupt=true
> diff --git a/NEWS b/NEWS
> index 5ae0108d55..1e19beb793 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -9,6 +9,9 @@ Post-v3.3.0
>   https://github.com/openvswitch/ovs.git
> - DPDK:
>   * OVS validated with DPDK 23.11.1.
> + * Link status changes are now handled via interrupt mode if the DPDK
> +   driver supports it.  It is possible to revert to polling mode by 
> setting
> +   per interface 'dpdk-lsc-interrupt' other_config to 'false'.

As Robin points out, other_config should be changed to options.But the
rest looks good.

Acked-by: Mike Pattrick 

>
>
>  v3.3.0 - 16 Feb 2024
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 0fa37d5145..a260bc8485 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -2397,7 +2397,18 @@ netdev_dpdk_set_config(struct netdev *netdev, const 
> struct smap *args,
>  }
>  }
>
> -lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", false);
> +lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", true);
> +if (lsc_interrupt_mode && !(*info.dev_flags & RTE_ETH_DEV_INTR_LSC)) {
> +if (smap_get(args, "dpdk-lsc-interrupt")) {
> +VLOG_ERR("interface '%s': link status interrupt is not 
> supported.",
> + netdev_get_name(netdev));
> +err = EINVAL;
> +goto out;
> +}
> +VLOG_DBG("interface '%s': not enabling link status interrupt.",
> + netdev_get_name(netdev));
> +lsc_interrupt_mode = false;
> +}
>  if (dev->requested_lsc_interrupt_mode != lsc_interrupt_mode) {
>  dev->requested_lsc_interrupt_mode = lsc_interrupt_mode;
>  netdev_request_reconfigure(netdev);
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index 8a1b607d71..e3afb78a4e 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -4647,12 +4647,12 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 
> type=patch options:peer=p1 \
>type='{"type": "boolean"}'>
>  
> -  Set this value to true to configure interrupt mode for
> -  Link State Change (LSC) detection instead of poll mode for the DPDK
> -  interface.
> +  Set this value to false to configure poll mode for
> +  Link State Change (LSC) detection instead of interrupt mode for the
> +  DPDK interface.
>  
>  
> -  If this value is not set, poll mode is configured.
> +  If this value is not set, interrupt mode is configured.
>  
>  
>This parameter has an effect only on netdev dpdk interfaces.
> --
> 2.44.0
>

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] ofproto: Add upcall/dump-ufid-rules command to map UFIDs to OpenFlow.

2024-06-13 Thread Mike Pattrick
On Tue, Jun 4, 2024 at 7:29 AM Dumitru Ceara  wrote:
>
> It improves the debugging experience if we can easily get a list of
> OpenFlow rules and groups that contribute to the creation of a datapath
> flow.
>
> The suggested workflow is:
> a. dump datapath flows (along with UUIDs), this also prints the core IDs
> (PMD IDs) when applicable.
>
>   $ ovs-appctl dpctl/dump-flows -m
>   flow-dump from pmd on cpu core: 7
>   ufid:7460db8f..., recirc_id(0), 
>
> b. dump related OpenFlow rules and groups:
>   $ ovs-appctl upcall/dump-ufid-rules ufid:7460db8f... pmd=7
>   cookie=0x12345678, table=0 
> priority=100,ip,in_port=2,nw_dst=10.0.0.2,actions=resubmit(,1)
>   cookie=0x0, table=1 priority=200,actions=group:1
>   group_id=1,bucket=bucket_id:0,actions=ct(commit,table=2,nat(dst=20.0.0.2))
>   cookie=0x0, table=2 actions=output:1
>
> The new command only shows rules and groups attached to ukeys that are
> in states UKEY_VISIBLE or UKEY_OPERATIONAL.  That should be fine as all
> other ukeys should not be relevant for the use case presented above.
>
> For ukeys that don't have an xcache populated yet, the command goes
> ahead and populates one.  In theory this is creates a slight overhead as
> those ukeys might not need an xcache until they see traffic (and get
> revalidated) but in practice the overhead should be minimal.
>
> This commit tries to mimic the output format of the ovs-ofctl
> dump-flows/dump-groups commands.  For groups it actually uses
> ofputil_group/_bucket functions for formatting.  For rules it uses
> flow_stats_ds() (the original function was exported and renamed to
> ofproto_rule_stats_ds()).
>
> Signed-off-by: Dumitru Ceara 
> ---
> V2:
> - Addressed Adrian's comments:
>   - check return value of populate_xcache()
>   - use flow_stats_ds() (renamed to ofproto_rule_stats_ds()) instead of
> custom printing
>   - move ukey->state check to caller
>   - handle case when group bucket is not available
>   - update test case to cover the point above
> ---
>  NEWS|   3 +
>  include/openvswitch/ofp-group.h |   7 ++
>  lib/ofp-group.c | 110 +++-
>  ofproto/ofproto-dpif-upcall.c   |  85 
>  ofproto/ofproto-dpif.c  |  24 +++
>  ofproto/ofproto-dpif.h  |   2 +
>  ofproto/ofproto-provider.h  |   1 +
>  ofproto/ofproto.c   |  85 
>  tests/ofproto-dpif.at   |  47 ++
>  tests/ofproto-macros.at |   4 ++
>  10 files changed, 282 insertions(+), 86 deletions(-)
>
> diff --git a/NEWS b/NEWS
> index 5ae0108d552b..1bc085f97045 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -9,6 +9,9 @@ Post-v3.3.0
>   https://github.com/openvswitch/ovs.git
> - DPDK:
>   * OVS validated with DPDK 23.11.1.
> +   - ovs-appctl:
> + * Added 'upcall/dump-ufid-rules' to output the set of OpenFlow rules and
> +   groups that contributed to the creation of a specific datapath flow.
>
>
>  v3.3.0 - 16 Feb 2024
> diff --git a/include/openvswitch/ofp-group.h b/include/openvswitch/ofp-group.h
> index cd7af0ebff9c..79fcb3a4c0d1 100644
> --- a/include/openvswitch/ofp-group.h
> +++ b/include/openvswitch/ofp-group.h
> @@ -70,6 +70,11 @@ struct ofputil_bucket *ofputil_bucket_find(const struct 
> ovs_list *,
>  bool ofputil_bucket_check_duplicate_id(const struct ovs_list *);
>  struct ofputil_bucket *ofputil_bucket_list_front(const struct ovs_list *);
>  struct ofputil_bucket *ofputil_bucket_list_back(const struct ovs_list *);
> +void ofputil_bucket_format(const struct ofputil_bucket *,
> +   enum ofp11_group_type, enum ofp_version,
> +   const struct ofputil_port_map *,
> +   const struct ofputil_table_map *,
> +   struct ds *);
>
>  static inline bool
>  ofputil_bucket_has_liveness(const struct ofputil_bucket *bucket)
> @@ -88,6 +93,8 @@ struct ofputil_group_props {
>  void ofputil_group_properties_destroy(struct ofputil_group_props *);
>  void ofputil_group_properties_copy(struct ofputil_group_props *to,
> const struct ofputil_group_props *from);
> +void ofputil_group_properties_format(const struct ofputil_group_props *,
> + struct ds *);
>  /* Protocol-independent group_mod. */
>  struct ofputil_group_mod {
>  uint16_t command; /* One of OFPGC15_*. */
> diff --git a/lib/ofp-group.c b/lib/ofp-group.c
> index 737f48047b10..28504c068c60 100644
> --- a/lib/ofp-group.c
> +++ b/lib/ofp-group.c
> @@ -58,14 +58,16 @@ ofputil_group_from_string(const char *s, uint32_t 
> *group_idp)
>  return true;
>  }
>
> -/* Appends to 's' a string representation of the OpenFlow group ID 
> 'group_id'.
> - * Most groups' string representation is just the number, but for special
> - * groups, e.g. OFPG_ALL, it is the name, e.g. "ALL". */
> +/* Appends to 's' a string representation of the 

Re: [ovs-dev] [PATCH] odp-execute: Set IPv6 traffic class in AVX implementation.

2024-06-13 Thread Mike Pattrick
On Wed, Jun 12, 2024 at 6:44 AM Emma Finn  wrote:
>
> The AVX implementation for the IPv6 action did not set
> traffic class field. Adding support for this field to
> the AVX implementation.
>
> Signed-off-by: Emma Finn 
> Reported-by: Eelco Chaudron 
> ---
>  lib/odp-execute-avx512.c | 8 
>  lib/packets.c| 2 +-
>  lib/packets.h| 1 +
>  3 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/lib/odp-execute-avx512.c b/lib/odp-execute-avx512.c
> index a74a85dc1..569ea789e 100644
> --- a/lib/odp-execute-avx512.c
> +++ b/lib/odp-execute-avx512.c
> @@ -741,6 +741,14 @@ action_avx512_set_ipv6(struct dp_packet_batch *batch, 
> const struct nlattr *a)
>  }
>  /* Write back the modified IPv6 addresses. */
>  _mm512_mask_storeu_epi64((void *) nh, 0x1F, v_new_hdr);
> +
> +/* Scalar method for setting IPv6 tclass field. */
> +if (key->ipv6_tclass) {
> +uint8_t old_tc = ntohl(get_16aligned_be32(>ip6_flow)) >> 20;
> +uint8_t key_tc = (key->ipv6_tclass |
> + (old_tc & ~mask->ipv6_tclass));
> +packet_set_ipv6_tc(>ip6_flow, key_tc);
> +}

Hello,

I'm wondering if we also need to set the flow label?

Thanks,
M

>  }
>  }
>  #endif /* HAVE_AVX512VBMI */
> diff --git a/lib/packets.c b/lib/packets.c
> index ebf516d67..91c28daf0 100644
> --- a/lib/packets.c
> +++ b/lib/packets.c
> @@ -1299,7 +1299,7 @@ packet_set_ipv6_flow_label(ovs_16aligned_be32 
> *flow_label, ovs_be32 flow_key)
>  put_16aligned_be32(flow_label, new_label);
>  }
>
> -static void
> +void
>  packet_set_ipv6_tc(ovs_16aligned_be32 *flow_label, uint8_t tc)
>  {
>  ovs_be32 old_label = get_16aligned_be32(flow_label);
> diff --git a/lib/packets.h b/lib/packets.h
> index 8b6994809..a102f8163 100644
> --- a/lib/packets.h
> +++ b/lib/packets.h
> @@ -1635,6 +1635,7 @@ void packet_set_ipv6_addr(struct dp_packet *packet, 
> uint8_t proto,
>bool recalculate_csum);
>  void packet_set_ipv6_flow_label(ovs_16aligned_be32 *flow_label,
>  ovs_be32 flow_key);
> +void packet_set_ipv6_tc(ovs_16aligned_be32 *flow_label, uint8_t tc);
>  void packet_set_tcp_port(struct dp_packet *, ovs_be16 src, ovs_be16 dst);
>  void packet_set_udp_port(struct dp_packet *, ovs_be16 src, ovs_be16 dst);
>  void packet_set_sctp_port(struct dp_packet *, ovs_be16 src, ovs_be16 dst);
> --
> 2.34.1
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] odp-execute: Check IPv4 checksum offload flag in AVX.

2024-06-13 Thread Mike Pattrick
On Wed, Jun 12, 2024 at 6:45 AM Emma Finn  wrote:
>
> The AVX implementation for IPv4 action did not check whether
> the IPv4 checksum offload flag has been set and was incorrectly
> calculating checksums in software. Adding a check to skip AVX
> checksum claculation when offload flags are set.

Nit: calculation

>
> Signed-off-by: Emma Finn 
> Reported-by: Eelco Chaudron 
> ---
>  lib/odp-execute-avx512.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/lib/odp-execute-avx512.c b/lib/odp-execute-avx512.c
> index 569ea789e..3ddbfcd15 100644
> --- a/lib/odp-execute-avx512.c
> +++ b/lib/odp-execute-avx512.c
> @@ -473,7 +473,8 @@ action_avx512_ipv4_set_addrs(struct dp_packet_batch 
> *batch,
>   * (v_pkt_masked). */
>  __m256i v_new_hdr = _mm256_or_si256(v_key_shuf, v_pkt_masked);
>
> -if (dp_packet_hwol_tx_ip_csum(packet)) {
> +if (dp_packet_hwol_tx_ip_csum(packet) ||
> +dp_packet_hwol_l3_ipv4(packet)) {

dp_packet_hwol_tx_ip_csum() should be a subset of
dp_packet_hwol_l3_ipv4(). If it's not set then the good flag may not
be checked.


Cheers,
M

>  dp_packet_ol_reset_ip_csum_good(packet);
>  } else {
>  ovs_be16 old_csum = ~nh->ip_csum;
> --
> 2.34.1
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] netdev-dpdk: Use LSC interrupt mode.

2024-06-13 Thread Mike Pattrick
On Thu, Jun 13, 2024 at 5:59 AM David Marchand
 wrote:
>
> Querying link status may get delayed for an undeterministic (long) time
> with mlx5 ports. This is a consequence of the mlx5 driver calling ethtool
> kernel API and getting stuck on the kernel RTNL lock while some other
> operation is in progress under this lock.
>
> One impact for long link status query is that it is called under the bond
> lock taken in write mode periodically in bond_run().
> In parallel, datapath threads may block requesting to read bonding related
> info (like for example in bond_check_admissibility()).
>
> The LSC interrupt mode is available with many DPDK drivers and is used by
> default with testpmd.
>
> It seems safe enough to switch on this feature by default in OVS.
> We keep the per interface option to disable this feature in case of an
> unforeseen bug.
>
> Signed-off-by: David Marchand 
> ---
>  Documentation/topics/dpdk/phy.rst | 4 ++--
>  NEWS  | 3 +++
>  lib/netdev-dpdk.c | 7 ++-
>  vswitchd/vswitch.xml  | 8 
>  4 files changed, 15 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/topics/dpdk/phy.rst 
> b/Documentation/topics/dpdk/phy.rst
> index efd168cba8..eefc25613d 100644
> --- a/Documentation/topics/dpdk/phy.rst
> +++ b/Documentation/topics/dpdk/phy.rst
> @@ -546,8 +546,8 @@ the firmware every time to fulfil this request.
>
>  Note that not all PMD drivers support LSC interrupts.
>
> -The default configuration is polling mode. To set interrupt mode, option
> -``dpdk-lsc-interrupt`` has to be set to ``true``.
> +The default configuration is interrupt mode. To set polling mode, option
> +``dpdk-lsc-interrupt`` has to be set to ``false``.
>
>  Command to set interrupt mode for a specific interface::
>  $ ovs-vsctl set interface  options:dpdk-lsc-interrupt=true
> diff --git a/NEWS b/NEWS
> index 5ae0108d55..1e19beb793 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -9,6 +9,9 @@ Post-v3.3.0
>   https://github.com/openvswitch/ovs.git
> - DPDK:
>   * OVS validated with DPDK 23.11.1.
> + * Link status changes are now handled via interrupt mode if the DPDK
> +   driver supports it.  It is possible to revert to polling mode by 
> setting
> +   per interface 'dpdk-lsc-interrupt' other_config to 'false'.
>
>
>  v3.3.0 - 16 Feb 2024
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 0fa37d5145..833d852319 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -2397,7 +2397,12 @@ netdev_dpdk_set_config(struct netdev *netdev, const 
> struct smap *args,
>  }
>  }
>
> -lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", false);
> +lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", true);
> +if (lsc_interrupt_mode && !(*info.dev_flags & RTE_ETH_DEV_INTR_LSC)) {
> +VLOG_INFO("interface '%s': link status interrupt is not supported.",
> +  netdev_get_name(netdev));
> +lsc_interrupt_mode = false;
> +}

I did a quick grep of the dpdk source tree and noticed the following
drivers had this define:

 bnxt dpaa dpaa2 ena failsafe hns3 mlx4 mlx5 netvsc tap vhost virtio

This may have been an invalid methodology, but I'm noticing some
drivers for very common network cards are missing. Is there a risk
here of starting to print these info messages all the time for a large
number of users who never set this feature in the first place?

It may be preferable to continue  defaulting to true, but only print
the error if the flag was explicitly set.

Cheers,
M

>  if (dev->requested_lsc_interrupt_mode != lsc_interrupt_mode) {
>  dev->requested_lsc_interrupt_mode = lsc_interrupt_mode;
>  netdev_request_reconfigure(netdev);
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index 8a1b607d71..e3afb78a4e 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -4647,12 +4647,12 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 
> type=patch options:peer=p1 \
>type='{"type": "boolean"}'>
>  
> -  Set this value to true to configure interrupt mode for
> -  Link State Change (LSC) detection instead of poll mode for the DPDK
> -  interface.
> +  Set this value to false to configure poll mode for
> +  Link State Change (LSC) detection instead of interrupt mode for the
> +  DPDK interface.
>  
>  
> -  If this value is not set, poll mode is configured.
> +  If this value is not set, interrupt mode is configured.
>  
>  
>This parameter has an effect only on netdev dpdk interfaces.
> --
> 2.44.0
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>

___
dev mailing list
d...@openvswitch.org

[ovs-dev] [PATCH v2] Userspace: Software fallback for UDP encapsulated TCP segmentation.

2024-06-12 Thread Mike Pattrick
When sending packets that are flagged as requiring segmentation to an
interface that doens't support this feature, send the packet to the TSO
software fallback instead of dropping it.

Signed-off-by: Mike Pattrick 

---
v2:
 - Fixed udp tunnel length
 - Added test that UDP headers are correct
 - Split inner and outer ip_id into different counters
 - Set tunnel flags in reset_tcp_seg

Signed-off-by: Mike Pattrick 
---
 lib/dp-packet-gso.c | 89 +
 lib/dp-packet.h | 34 
 lib/netdev-native-tnl.c |  8 
 lib/netdev.c| 37 +++--
 tests/system-traffic.at | 87 
 5 files changed, 216 insertions(+), 39 deletions(-)

diff --git a/lib/dp-packet-gso.c b/lib/dp-packet-gso.c
index 847685ad9..dc43ad662 100644
--- a/lib/dp-packet-gso.c
+++ b/lib/dp-packet-gso.c
@@ -47,6 +47,8 @@ dp_packet_gso_seg_new(const struct dp_packet *p, size_t 
hdr_len,
 seg->l2_5_ofs = p->l2_5_ofs;
 seg->l3_ofs = p->l3_ofs;
 seg->l4_ofs = p->l4_ofs;
+seg->inner_l3_ofs = p->inner_l3_ofs;
+seg->inner_l4_ofs = p->inner_l4_ofs;
 
 /* The protocol headers remain the same, so preserve hash and mark. */
 *dp_packet_rss_ptr(seg) = *dp_packet_rss_ptr(p);
@@ -71,7 +73,12 @@ dp_packet_gso_nr_segs(struct dp_packet *p)
 const char *data_tail;
 const char *data_pos;
 
-data_pos = dp_packet_get_tcp_payload(p);
+if (dp_packet_hwol_is_tunnel_vxlan(p) ||
+dp_packet_hwol_is_tunnel_geneve(p)) {
+data_pos = dp_packet_get_inner_tcp_payload(p);
+} else {
+data_pos = dp_packet_get_tcp_payload(p);
+}
 data_tail = (char *) dp_packet_tail(p) - dp_packet_l2_pad_size(p);
 
 return DIV_ROUND_UP(data_tail - data_pos, segsz);
@@ -89,14 +96,19 @@ dp_packet_gso(struct dp_packet *p, struct dp_packet_batch 
**batches)
 static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
 struct dp_packet_batch *curr_batch = *batches;
 struct tcp_header *tcp_hdr;
+struct udp_header *tnl_hdr;
 struct ip_header *ip_hdr;
+uint16_t inner_ip_id = 0;
+uint16_t outer_ip_id = 0;
 struct dp_packet *seg;
+const char *data_pos;
 uint16_t tcp_offset;
 uint16_t tso_segsz;
 uint32_t tcp_seq;
-uint16_t ip_id;
+bool outer_ipv4;
 int hdr_len;
 int seg_len;
+bool tnl;
 
 tso_segsz = dp_packet_get_tso_segsz(p);
 if (!tso_segsz) {
@@ -105,20 +117,38 @@ dp_packet_gso(struct dp_packet *p, struct dp_packet_batch 
**batches)
 return false;
 }
 
-tcp_hdr = dp_packet_l4(p);
-tcp_offset = TCP_OFFSET(tcp_hdr->tcp_ctl);
-tcp_seq = ntohl(get_16aligned_be32(_hdr->tcp_seq));
-hdr_len = ((char *) dp_packet_l4(p) - (char *) dp_packet_eth(p))
-  + tcp_offset * 4;
-ip_id = 0;
-if (dp_packet_hwol_is_ipv4(p)) {
+if (dp_packet_hwol_is_tunnel_vxlan(p) ||
+dp_packet_hwol_is_tunnel_geneve(p)) {
+data_pos =  dp_packet_get_inner_tcp_payload(p);
+outer_ipv4 = dp_packet_hwol_is_outer_ipv4(p);
+tcp_hdr = dp_packet_inner_l4(p);
+ip_hdr = dp_packet_inner_l3(p);
+tnl = true;
+
+if (outer_ipv4) {
+outer_ip_id = ntohs(((struct ip_header *) dp_packet_l3(p))->ip_id);
+}
+if (dp_packet_hwol_is_ipv4(p)) {
+inner_ip_id = ntohs(ip_hdr->ip_id);
+}
+} else {
+data_pos = dp_packet_get_tcp_payload(p);
+outer_ipv4 = dp_packet_hwol_is_ipv4(p);
+tcp_hdr = dp_packet_l4(p);
 ip_hdr = dp_packet_l3(p);
-ip_id = ntohs(ip_hdr->ip_id);
+tnl = false;
+
+if (outer_ipv4) {
+outer_ip_id = ntohs(ip_hdr->ip_id);
+}
 }
 
+tcp_offset = TCP_OFFSET(tcp_hdr->tcp_ctl);
+tcp_seq = ntohl(get_16aligned_be32(_hdr->tcp_seq));
+hdr_len = ((char *) tcp_hdr - (char *) dp_packet_eth(p))
+  + tcp_offset * 4;
 const char *data_tail = (char *) dp_packet_tail(p)
 - dp_packet_l2_pad_size(p);
-const char *data_pos = dp_packet_get_tcp_payload(p);
 int n_segs = dp_packet_gso_nr_segs(p);
 
 for (int i = 0; i < n_segs; i++) {
@@ -130,14 +160,35 @@ dp_packet_gso(struct dp_packet *p, struct dp_packet_batch 
**batches)
 seg = dp_packet_gso_seg_new(p, hdr_len, data_pos, seg_len);
 data_pos += seg_len;
 
+if (tnl) {
+/* Update tunnel inner L3 header. */
+if (dp_packet_hwol_is_ipv4(seg)) {
+ip_hdr = dp_packet_inner_l3(seg);
+ip_hdr->ip_tot_len = htons(sizeof *ip_hdr +
+   dp_packet_inner_l4_size(seg));
+ip_hdr->ip_id = htons(inner_ip_id);
+ip_hdr->ip_csum = 0;
+inner_ip_id++;
+} else {
+struct ovs_16aligned_ip6_hdr *ip6_hdr;
+
+ 

Re: [ovs-dev] [PATCH] Userspace: Software fallback for UDP encapsulated TCP segmentation.

2024-06-12 Thread Mike Pattrick
On Tue, Jun 4, 2024 at 10:29 AM David Marchand
 wrote:
>
> On Wed, Feb 21, 2024 at 5:09 AM Mike Pattrick  wrote:
> > @@ -105,20 +115,35 @@ dp_packet_gso(struct dp_packet *p, struct 
> > dp_packet_batch **batches)
> >  return false;
> >  }
> >
> > -tcp_hdr = dp_packet_l4(p);
> > -tcp_offset = TCP_OFFSET(tcp_hdr->tcp_ctl);
> > -tcp_seq = ntohl(get_16aligned_be32(_hdr->tcp_seq));
> > -hdr_len = ((char *) dp_packet_l4(p) - (char *) dp_packet_eth(p))
> > -  + tcp_offset * 4;
> > -ip_id = 0;
> > -if (dp_packet_hwol_is_ipv4(p)) {
> > +if (dp_packet_hwol_is_tunnel_vxlan(p) ||
> > +dp_packet_hwol_is_tunnel_geneve(p)) {
> > +data_pos =  dp_packet_get_inner_tcp_payload(p);
> > +outer_ipv4 = dp_packet_hwol_is_outer_ipv4(p);
> > +tcp_hdr = dp_packet_inner_l4(p);
> > +ip_hdr = dp_packet_inner_l3(p);
> > +tnl = true;
> > +if (outer_ipv4) {
> > +ip_id = ntohs(((struct ip_header *) dp_packet_l3(p))->ip_id);
> > +} else if (dp_packet_hwol_is_ipv4(p)) {
> > +ip_id = ntohs(ip_hdr->ip_id);
> > +}
> > +} else {
> > +data_pos = dp_packet_get_tcp_payload(p);
> > +outer_ipv4 = dp_packet_hwol_is_ipv4(p);
> > +tcp_hdr = dp_packet_l4(p);
> >  ip_hdr = dp_packet_l3(p);
> > -ip_id = ntohs(ip_hdr->ip_id);
> > +tnl = false;
> > +if (outer_ipv4) {
> > +ip_id = ntohs(ip_hdr->ip_id);
> > +}
> >  }
> >
> > +tcp_offset = TCP_OFFSET(tcp_hdr->tcp_ctl);
> > +tcp_seq = ntohl(get_16aligned_be32(_hdr->tcp_seq));
> > +hdr_len = ((char *) tcp_hdr - (char *) dp_packet_eth(p))
> > +  + tcp_offset * 4;
> >  const char *data_tail = (char *) dp_packet_tail(p)
> >  - dp_packet_l2_pad_size(p);
> > -const char *data_pos = dp_packet_get_tcp_payload(p);
> >  int n_segs = dp_packet_gso_nr_segs(p);
> >
> >  for (int i = 0; i < n_segs; i++) {
> > @@ -130,8 +155,26 @@ dp_packet_gso(struct dp_packet *p, struct 
> > dp_packet_batch **batches)
> >  seg = dp_packet_gso_seg_new(p, hdr_len, data_pos, seg_len);
> >  data_pos += seg_len;
> >
> > +if (tnl) {
> > +/* Update tunnel L3 header. */
> > +if (dp_packet_hwol_is_ipv4(seg)) {
> > +ip_hdr = dp_packet_inner_l3(seg);
> > +ip_hdr->ip_tot_len = htons(sizeof *ip_hdr +
> > +   dp_packet_inner_l4_size(seg));
> > +ip_hdr->ip_id = htons(ip_id);
> > +ip_hdr->ip_csum = 0;
> > +ip_id++;
>
> Hum, it seems with this change, we are tying outer and inner (in the
> ipv4 in ipv4 case) ip id together.
> I am unclear what the Linux kernel does or what is acceptable, but I
> prefer to mention.

The Linux kernel does +1 as well. The sending OS should be
sufficiently randomizing IP ID's so this shouldn't be an issue. If
that isn't happening then at least we aren't making things worse.

> I also noticed ip_id is incremented a second time later in this loop
> which I find suspicious.
>
> I wonder if OVS should instead increment outer and inner ip ids
> (copied from original packet data) by 'i'.
> Like ip_hdr->ip_id = htons(ntohs(ip_hdr->ip_id) + i) ?

I thought the current solution was simpler. We rely on the sending
host to pick a random initial value, and just need some way to avoid
picking a value that overlaps with another fragment in flight between
the two hosts. That said, keeping them seperate shouldn't hurt
anything, I can make that change.

> And adjusting the outer udp seems missing (length and checksum if
> needed), which is probably what Ilya meant.

Yes, I  missed this.

Thanks,
M

>
>
> --
> David Marchand
>

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ofp-prop: Fix unaligned 128 bit access.

2024-06-12 Thread Mike Pattrick
On Wed, Jun 12, 2024 at 9:50 AM Ales Musil  wrote:

>
>
> On Wed, Jun 12, 2024 at 3:32 PM Mike Pattrick  wrote:
>
>> When compiling with '-fsanitize=address,undefined', the "ovs-ofctl
>> ct-flush" test will yield the following undefined behavior flagged
>> by UBSan. This patch uses memcpy to move the 128bit value into the
>> stack before reading it.
>>
>> lib/ofp-prop.c:277:14: runtime error: load of misaligned address
>> for type 'union ovs_be128', which requires 8 byte alignment
>>   ^
>> #0 0x7735d4 in ofpprop_parse_u128 lib/ofp-prop.c:277
>> #1 0x6c6c83 in ofp_ct_match_decode lib/ofp-ct.c:529
>> #2 0x76f3b5 in ofp_print_nxt_ct_flush lib/ofp-print.c:959
>> #3 0x76f3b5 in ofp_to_string__ lib/ofp-print.c:1206
>> #4 0x76f3b5 in ofp_to_string lib/ofp-print.c:1264
>> #5 0x770c0d in ofp_print lib/ofp-print.c:1308
>> #6 0x484a9d in ofctl_ofp_print utilities/ovs-ofctl.c:4899
>> #7 0x4ddb77 in ovs_cmdl_run_command__ lib/command-line.c:247
>> #8 0x47f6b3 in main utilities/ovs-ofctl.c:186
>>
>> Signed-off-by: Mike Pattrick 
>> ---
>>
>
> Hi Mike,
>
> this is interesting, do you have an idea why it didn't fail in CI by now?
> Also AFAIR the ofprops is supposed to be aligned to 8 bytes so unless the
> buffer itself isn't allocated at an address that is not itself 8 bytes
> aligned it shouldn't happen. In that case we might actually have a problem
> with other sizes.
>

Report is seen with gcc + ubsan, but not clang + ubsan. It is possible that
this is only seen due the test, this warning wasn't seen live.

Cheers,
M


>
>
>>  lib/ofp-prop.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/ofp-prop.c b/lib/ofp-prop.c
>> index 0a685750c..ed6365414 100644
>> --- a/lib/ofp-prop.c
>> +++ b/lib/ofp-prop.c
>> @@ -271,10 +271,12 @@ enum ofperr
>>  ofpprop_parse_u128(const struct ofpbuf *property, ovs_u128 *value)
>>  {
>>  ovs_be128 *p = property->msg;
>> +ovs_be128 aligned;
>>  if (ofpbuf_msgsize(property) != sizeof *p) {
>>  return OFPERR_OFPBPC_BAD_LEN;
>>  }
>> -*value = ntoh128(*p);
>> +memcpy(, p, sizeof aligned);
>> +*value = ntoh128(aligned);
>>  return 0;
>>  }
>>
>> --
>> 2.39.3
>>
>> ___
>> dev mailing list
>> d...@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>>
> Thanks,
> Ales
>
> --
>
> Ales Musil
>
> Senior Software Engineer - OVN Core
>
> Red Hat EMEA <https://www.redhat.com>
>
> amu...@redhat.com
> <https://red.ht/sig>
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] ofp-prop: Fix unaligned 128 bit access.

2024-06-12 Thread Mike Pattrick
When compiling with '-fsanitize=address,undefined', the "ovs-ofctl
ct-flush" test will yield the following undefined behavior flagged
by UBSan. This patch uses memcpy to move the 128bit value into the
stack before reading it.

lib/ofp-prop.c:277:14: runtime error: load of misaligned address
for type 'union ovs_be128', which requires 8 byte alignment
  ^
#0 0x7735d4 in ofpprop_parse_u128 lib/ofp-prop.c:277
#1 0x6c6c83 in ofp_ct_match_decode lib/ofp-ct.c:529
#2 0x76f3b5 in ofp_print_nxt_ct_flush lib/ofp-print.c:959
#3 0x76f3b5 in ofp_to_string__ lib/ofp-print.c:1206
#4 0x76f3b5 in ofp_to_string lib/ofp-print.c:1264
#5 0x770c0d in ofp_print lib/ofp-print.c:1308
#6 0x484a9d in ofctl_ofp_print utilities/ovs-ofctl.c:4899
#7 0x4ddb77 in ovs_cmdl_run_command__ lib/command-line.c:247
#8 0x47f6b3 in main utilities/ovs-ofctl.c:186

Signed-off-by: Mike Pattrick 
---
 lib/ofp-prop.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/ofp-prop.c b/lib/ofp-prop.c
index 0a685750c..ed6365414 100644
--- a/lib/ofp-prop.c
+++ b/lib/ofp-prop.c
@@ -271,10 +271,12 @@ enum ofperr
 ofpprop_parse_u128(const struct ofpbuf *property, ovs_u128 *value)
 {
 ovs_be128 *p = property->msg;
+ovs_be128 aligned;
 if (ofpbuf_msgsize(property) != sizeof *p) {
 return OFPERR_OFPBPC_BAD_LEN;
 }
-*value = ntoh128(*p);
+memcpy(, p, sizeof aligned);
+*value = ntoh128(aligned);
 return 0;
 }
 
-- 
2.39.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] checkpatch: Extend and move extra_keywords list to file.

2024-06-07 Thread Mike Pattrick
On Fri, Jun 7, 2024 at 2:35 AM Eelco Chaudron  wrote:
>
>
>
> On 6 Jun 2024, at 3:07, Mike Pattrick wrote:
>
> > This patch extends the extra_keywords list from 324 to 747 keywords and
> > moves this list to a separate file. The methodology used to create this
> > list was running the spell checker on a large volume of historical
> > patches and selecting any words that appeared multiple times.
>
> Thanks Mike,
>
> I like the idea of having this in a separate file (I would add the .txt 
> extension to it), however, just blindly taking the last x errors does not 
> seem to be the right approach.
>
> Last time I took the words from the last 1000 commits that made sense. For 
> example, things like countersfn, deviceiocontrol, etc. do not make sense to 
> me to add.

Why wouldn't we want something like deviceiocontrol in an exclusion
list? It's a common Windows function name, any commit that touches the
windows code has a high likelihood of including it.

-M

>
> //Eelco
>
> > The rational for using a separate file is to make management of this
> > list simpler by decoupling the code from the keywords.
> >
> > Signed-off-by: Mike Pattrick 
> > ---
> > v2: Included new file in distfiles
> > ---
> >  utilities/automake.mk|   1 +
> >  utilities/checkpatch.py  |  67 +---
> >  utilities/extra_keywords | 747 +++
> >  3 files changed, 751 insertions(+), 64 deletions(-)
> >  create mode 100644 utilities/extra_keywords
> >
> > diff --git a/utilities/automake.mk b/utilities/automake.mk
> > index 146b8c37f..3f14c0fef 100644
> > --- a/utilities/automake.mk
> > +++ b/utilities/automake.mk
> > @@ -65,6 +65,7 @@ EXTRA_DIST += \
> >   utilities/ovs-vlan-test.in \
> >   utilities/ovs-vsctl-bashcomp.bash \
> >   utilities/checkpatch.py \
> > + utilities/extra_keywords \
> >   utilities/docker/Makefile \
> >   utilities/docker/ovs-override.conf \
> >   utilities/docker/start-ovs \
> > diff --git a/utilities/checkpatch.py b/utilities/checkpatch.py
> > index 6b293770d..08b5870d3 100755
> > --- a/utilities/checkpatch.py
> > +++ b/utilities/checkpatch.py
> > @@ -49,70 +49,9 @@ def open_spell_check_dict():
> >  codespell_file = ''
> >
> >  try:
> > -extra_keywords = ['ovs', 'vswitch', 'vswitchd', 'ovs-vswitchd',
> > -  'netdev', 'selinux', 'ovs-ctl', 'dpctl', 'ofctl',
> > -  'openvswitch', 'dpdk', 'hugepage', 'hugepages',
> > -  'pmd', 'upcall', 'vhost', 'rx', 'tx', 
> > 'vhostuser',
> > -  'openflow', 'qsort', 'rxq', 'txq', 'perf', 
> > 'stats',
> > -  'struct', 'int', 'char', 'bool', 'upcalls', 
> > 'nicira',
> > -  'bitmask', 'ipv4', 'ipv6', 'tcp', 'tcp4', 
> > 'tcpv4',
> > -  'udp', 'udp4', 'udpv4', 'icmp', 'icmp4', 
> > 'icmpv6',
> > -  'vlan', 'vxlan', 'cksum', 'csum', 'checksum',
> > -  'ofproto', 'numa', 'mempool', 'mempools', 'mbuf',
> > -  'mbufs', 'hmap', 'cmap', 'smap', 'dhcpv4', 
> > 'dhcp',
> > -  'dhcpv6', 'opts', 'metadata', 'geneve', 'mutex',
> > -  'netdev', 'netdevs', 'subtable', 'virtio', 'qos',
> > -  'policer', 'datapath', 'tunctl', 'attr', 
> > 'ethernet',
> > -  'ether', 'defrag', 'defragment', 'loopback', 
> > 'sflow',
> > -  'acl', 'initializer', 'recirc', 'xlated', 
> > 'unclosed',
> > -  'netlink', 'msec', 'usec', 'nsec', 'ms', 'us', 
> > 'ns',
> > -  'kilobits', 'kbps', 'kilobytes', 'megabytes', 
> > 'mbps',
> > -  'gigabytes', 'gbps', 'megabits', 'gigabits', 
> > 'pkts',
> > -  'tuple', 'miniflow', 'megaflow', 'conntrack',
> > -  'vlans', 'vxlans', 'arg', 'tpid', 'xbundle',
> > -  'xbundles', 'mbundle', 'mbundles', 'netflow',
> > -  'localnet', 'odp', 'pre', 'dst', 'dest', 'src',
> > -  'ethertype', 'cvlan', 'ips', 'msg', 'msgs',
> > -  'liveness', 'userspace', 'eventmask', 
> > 'datapaths',
> > -  'slowpath', 'fastpath', 'multicast', 'unicast',
> > -  'revalidation', 'namespace', 'qdisc', 

[ovs-dev] [PATCH v2] checkpatch: Extend and move extra_keywords list to file.

2024-06-05 Thread Mike Pattrick
This patch extends the extra_keywords list from 324 to 747 keywords and
moves this list to a separate file. The methodology used to create this
list was running the spell checker on a large volume of historical
patches and selecting any words that appeared multiple times.

The rational for using a separate file is to make management of this
list simpler by decoupling the code from the keywords.

Signed-off-by: Mike Pattrick 
---
v2: Included new file in distfiles
---
 utilities/automake.mk|   1 +
 utilities/checkpatch.py  |  67 +---
 utilities/extra_keywords | 747 +++
 3 files changed, 751 insertions(+), 64 deletions(-)
 create mode 100644 utilities/extra_keywords

diff --git a/utilities/automake.mk b/utilities/automake.mk
index 146b8c37f..3f14c0fef 100644
--- a/utilities/automake.mk
+++ b/utilities/automake.mk
@@ -65,6 +65,7 @@ EXTRA_DIST += \
utilities/ovs-vlan-test.in \
utilities/ovs-vsctl-bashcomp.bash \
utilities/checkpatch.py \
+   utilities/extra_keywords \
utilities/docker/Makefile \
utilities/docker/ovs-override.conf \
utilities/docker/start-ovs \
diff --git a/utilities/checkpatch.py b/utilities/checkpatch.py
index 6b293770d..08b5870d3 100755
--- a/utilities/checkpatch.py
+++ b/utilities/checkpatch.py
@@ -49,70 +49,9 @@ def open_spell_check_dict():
 codespell_file = ''
 
 try:
-extra_keywords = ['ovs', 'vswitch', 'vswitchd', 'ovs-vswitchd',
-  'netdev', 'selinux', 'ovs-ctl', 'dpctl', 'ofctl',
-  'openvswitch', 'dpdk', 'hugepage', 'hugepages',
-  'pmd', 'upcall', 'vhost', 'rx', 'tx', 'vhostuser',
-  'openflow', 'qsort', 'rxq', 'txq', 'perf', 'stats',
-  'struct', 'int', 'char', 'bool', 'upcalls', 'nicira',
-  'bitmask', 'ipv4', 'ipv6', 'tcp', 'tcp4', 'tcpv4',
-  'udp', 'udp4', 'udpv4', 'icmp', 'icmp4', 'icmpv6',
-  'vlan', 'vxlan', 'cksum', 'csum', 'checksum',
-  'ofproto', 'numa', 'mempool', 'mempools', 'mbuf',
-  'mbufs', 'hmap', 'cmap', 'smap', 'dhcpv4', 'dhcp',
-  'dhcpv6', 'opts', 'metadata', 'geneve', 'mutex',
-  'netdev', 'netdevs', 'subtable', 'virtio', 'qos',
-  'policer', 'datapath', 'tunctl', 'attr', 'ethernet',
-  'ether', 'defrag', 'defragment', 'loopback', 'sflow',
-  'acl', 'initializer', 'recirc', 'xlated', 'unclosed',
-  'netlink', 'msec', 'usec', 'nsec', 'ms', 'us', 'ns',
-  'kilobits', 'kbps', 'kilobytes', 'megabytes', 'mbps',
-  'gigabytes', 'gbps', 'megabits', 'gigabits', 'pkts',
-  'tuple', 'miniflow', 'megaflow', 'conntrack',
-  'vlans', 'vxlans', 'arg', 'tpid', 'xbundle',
-  'xbundles', 'mbundle', 'mbundles', 'netflow',
-  'localnet', 'odp', 'pre', 'dst', 'dest', 'src',
-  'ethertype', 'cvlan', 'ips', 'msg', 'msgs',
-  'liveness', 'userspace', 'eventmask', 'datapaths',
-  'slowpath', 'fastpath', 'multicast', 'unicast',
-  'revalidation', 'namespace', 'qdisc', 'uuid',
-  'ofport', 'subnet', 'revalidation', 'revalidator',
-  'revalidate', 'l2', 'l3', 'l4', 'openssl', 'mtu',
-  'ifindex', 'enum', 'enums', 'http', 'https', 'num',
-  'vconn', 'vconns', 'conn', 'nat', 'memset', 'memcmp',
-  'strcmp', 'strcasecmp', 'tc', 'ufid', 'api',
-  'ofpbuf', 'ofpbufs', 'hashmaps', 'hashmap', 'deref',
-  'dereference', 'hw', 'prio', 'sendmmsg', 'sendmsg',
-  'malloc', 'free', 'alloc', 'pid', 'ppid', 'pgid',
-  'uid', 'gid', 'sid', 'utime', 'stime', 'cutime',
-  'cstime', 'vsize', 'rss', 'rsslim', 'whcan', 'gtime',
-  'eip', 'rip', 'cgtime', 'dbg', 'gw', 'sbrec', 'bfd',
-  'sizeof', 'pmds', 'nic', 'nics', 'hwol', 'encap',
-  'decap', 'tlv', 'tlvs', 'decapsulation', 'fd',
-  'cacheline', 'xlate', 'skiplist', 'idl',
-  'comparator', 'natting', 'alg', 'pasv', 'epasv',
-  'wildcard', 'nated', 'amd64', 'x86_64',
-  'recirculation', 'linux', 'afxdp', 'promisc', 'goto',
-  'misconfigured', 'misconfiguration', 'checkpatch',
-  'debian', 'travis', 'cirrus', 'appveyor', 'faq',
-  'erspan', 'const', 'hotplug

[ovs-dev] [PATCH] checkpatch: Extend and move extra_keywords list to file.

2024-06-05 Thread Mike Pattrick
This patch extends the extra_keywords list from 324 to 747 keywords and
moves this list to a separate file. The methodology used to create this
list was running the spell checker on a large volume of historical
patches and selecting any words that appeared multiple times.

The rational for using a separate file is to make management of this
list simpler by decoupling the code from the keywords.

Signed-off-by: Mike Pattrick 
---
 utilities/checkpatch.py  |  67 +---
 utilities/extra_keywords | 747 +++
 2 files changed, 750 insertions(+), 64 deletions(-)
 create mode 100644 utilities/extra_keywords

diff --git a/utilities/checkpatch.py b/utilities/checkpatch.py
index 6b293770d..08b5870d3 100755
--- a/utilities/checkpatch.py
+++ b/utilities/checkpatch.py
@@ -49,70 +49,9 @@ def open_spell_check_dict():
 codespell_file = ''
 
 try:
-extra_keywords = ['ovs', 'vswitch', 'vswitchd', 'ovs-vswitchd',
-  'netdev', 'selinux', 'ovs-ctl', 'dpctl', 'ofctl',
-  'openvswitch', 'dpdk', 'hugepage', 'hugepages',
-  'pmd', 'upcall', 'vhost', 'rx', 'tx', 'vhostuser',
-  'openflow', 'qsort', 'rxq', 'txq', 'perf', 'stats',
-  'struct', 'int', 'char', 'bool', 'upcalls', 'nicira',
-  'bitmask', 'ipv4', 'ipv6', 'tcp', 'tcp4', 'tcpv4',
-  'udp', 'udp4', 'udpv4', 'icmp', 'icmp4', 'icmpv6',
-  'vlan', 'vxlan', 'cksum', 'csum', 'checksum',
-  'ofproto', 'numa', 'mempool', 'mempools', 'mbuf',
-  'mbufs', 'hmap', 'cmap', 'smap', 'dhcpv4', 'dhcp',
-  'dhcpv6', 'opts', 'metadata', 'geneve', 'mutex',
-  'netdev', 'netdevs', 'subtable', 'virtio', 'qos',
-  'policer', 'datapath', 'tunctl', 'attr', 'ethernet',
-  'ether', 'defrag', 'defragment', 'loopback', 'sflow',
-  'acl', 'initializer', 'recirc', 'xlated', 'unclosed',
-  'netlink', 'msec', 'usec', 'nsec', 'ms', 'us', 'ns',
-  'kilobits', 'kbps', 'kilobytes', 'megabytes', 'mbps',
-  'gigabytes', 'gbps', 'megabits', 'gigabits', 'pkts',
-  'tuple', 'miniflow', 'megaflow', 'conntrack',
-  'vlans', 'vxlans', 'arg', 'tpid', 'xbundle',
-  'xbundles', 'mbundle', 'mbundles', 'netflow',
-  'localnet', 'odp', 'pre', 'dst', 'dest', 'src',
-  'ethertype', 'cvlan', 'ips', 'msg', 'msgs',
-  'liveness', 'userspace', 'eventmask', 'datapaths',
-  'slowpath', 'fastpath', 'multicast', 'unicast',
-  'revalidation', 'namespace', 'qdisc', 'uuid',
-  'ofport', 'subnet', 'revalidation', 'revalidator',
-  'revalidate', 'l2', 'l3', 'l4', 'openssl', 'mtu',
-  'ifindex', 'enum', 'enums', 'http', 'https', 'num',
-  'vconn', 'vconns', 'conn', 'nat', 'memset', 'memcmp',
-  'strcmp', 'strcasecmp', 'tc', 'ufid', 'api',
-  'ofpbuf', 'ofpbufs', 'hashmaps', 'hashmap', 'deref',
-  'dereference', 'hw', 'prio', 'sendmmsg', 'sendmsg',
-  'malloc', 'free', 'alloc', 'pid', 'ppid', 'pgid',
-  'uid', 'gid', 'sid', 'utime', 'stime', 'cutime',
-  'cstime', 'vsize', 'rss', 'rsslim', 'whcan', 'gtime',
-  'eip', 'rip', 'cgtime', 'dbg', 'gw', 'sbrec', 'bfd',
-  'sizeof', 'pmds', 'nic', 'nics', 'hwol', 'encap',
-  'decap', 'tlv', 'tlvs', 'decapsulation', 'fd',
-  'cacheline', 'xlate', 'skiplist', 'idl',
-  'comparator', 'natting', 'alg', 'pasv', 'epasv',
-  'wildcard', 'nated', 'amd64', 'x86_64',
-  'recirculation', 'linux', 'afxdp', 'promisc', 'goto',
-  'misconfigured', 'misconfiguration', 'checkpatch',
-  'debian', 'travis', 'cirrus', 'appveyor', 'faq',
-  'erspan', 'const', 'hotplug', 'addresssanitizer',
-  'ovsdb', 'dpif', 'veth', 'rhel', 'jsonrpc', 'json',
-  'syscall', 'lacp', 'ipf', 'skb', 'valgrind',
-  'appctl', 'arp', 'asan', 'backport', 'backtrace',
-  'chmod', 'ci', 'cpu', 'cpus', 'dnat', 'dns', 'dpcls',
-  'eol', 'ethtool', 'fdb', 'freebsd', 'gcc', 'github',
-  'glibc', 'gre', 'inlined', 'ip', 'ipfix', 'ipsec',
-  'ixgbe

Re: [ovs-dev] [PATCH] python: idl: Fix index not being updated on row modification.

2024-06-05 Thread Mike Pattrick
On Mon, May 27, 2024 at 5:39 PM Ilya Maximets  wrote:
>
> When a row is modified, python IDL doesn't perform any operations on
> existing client-side indexes.  This means that if the column on which
> index is created changes, the old value will remain in the index and
> the new one will not be added to the index.  Beside lookup failures
> this is also causing inability to remove modified rows, because the
> new column value doesn't exist in the index causing an exception on
> attempt to remove it:
>
>  Traceback (most recent call last):
>File "ovsdbapp/backend/ovs_idl/connection.py", line 110, in run
>  self.idl.run()
>File "ovs/db/idl.py", line 465, in run
>  self.__parse_update(msg.params[2], OVSDB_UPDATE3)
>File "ovs/db/idl.py", line 924, in __parse_update
>  self.__do_parse_update(update, version, self.tables)
>File "ovs/db/idl.py", line 964, in __do_parse_update
>  changes = self.__process_update2(table, uuid, row_update)
>File "ovs/db/idl.py", line 991, in __process_update2
>  del table.rows[uuid]
>File "ovs/db/custom_index.py", line 102, in __delitem__
>  index.remove(val)
>File "ovs/db/custom_index.py", line 66, in remove
>  self.values.remove(self.index_entry_from_row(row))
>File "sortedcontainers/sortedlist.py", line 2015, in remove
>  raise ValueError('{0!r} not in list'.format(value))
>  ValueError: Datapath_Binding(
>uuid=UUID('498e66a2-70bc-4587-a66f-0433baf82f60'),
>tunnel_key=16711683, load_balancers=[], external_ids={}) not in list
>
> Fix that by always removing an existing row from indexes before
> modification and adding back afterwards.  This ensures that old
> values are removed from the index and new ones are added.
>
> This behavior is consistent with the C implementation.
>
> The new test that reproduces the removal issue is added.  Some extra
> testing infrastructure added to be able to handle and print out the
> 'indexed' table from the idltest schema.
>
> Fixes: 13973bc41524 ("Add multi-column index support for the Python IDL")
> Reported-at: 
> https://mail.openvswitch.org/pipermail/ovs-discuss/2024-May/053159.html
> Reported-by: Roberto Bartzen Acosta 
> Signed-off-by: Ilya Maximets 

I've tested this a bit and it seems like a reasonable solution.

Acked-by: Mike Pattrick 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] ovsdb: Use table indexes if available for ovsdb_query().

2024-06-03 Thread Mike Pattrick
On Mon, Jun 3, 2024 at 9:56 AM Simon Horman  wrote:
>
> On Mon, Jun 03, 2024 at 12:20:36AM -0400, Mike Pattrick wrote:
> > Currently all OVSDB database queries except for UUID lookups all result
> > in linear lookups over the entire table, even if an index is present.
> >
> > This patch modifies ovsdb_query() to attempt an index lookup first, if
> > possible. If no matching indexes are present then a linear index is
> > still conducted.
> >
> > Reported-at: https://issues.redhat.com/browse/FDP-590
> > Signed-off-by: Mike Pattrick 
>
> ...
>
> > diff --git a/ovsdb/query.c b/ovsdb/query.c
> > index eebe56412..e3e50a034 100644
> > --- a/ovsdb/query.c
> > +++ b/ovsdb/query.c
> > @@ -21,32 +21,116 @@
> >  #include "condition.h"
> >  #include "row.h"
> >  #include "table.h"
> > +#include "transaction.h"
> > +
> > +static bool
> > +ovsdb_query_index(struct ovsdb_table *table,
> > +  const struct ovsdb_condition *cnd,
> > +  const struct ovsdb_row **out)
> > +{
> > +for (size_t idx = 0; idx < table->schema->n_indexes; idx++) {
> > +const struct ovsdb_column_set *index = 
> > >schema->indexes[idx];
> > +struct hmap_node *node;
> > +size_t matches = 0;
> > +uint32_t hash = 0;
> > +
> > +if (index->n_columns != cnd->n_clauses) {
> > +continue;
> > +}
> > +
> > +/* The conditions may not be in the same order as the index. */
> > +for (size_t c = 0; c < cnd->n_clauses; c++) {
> > +const struct ovsdb_clause *cnd_cls = >clauses[c];
> > +
> > +if (cnd_cls->function != OVSDB_F_EQ) {
> > +return false;
> > +}
> > +
> > +for (size_t i = 0; i < index->n_columns; i++) {
> > +const struct ovsdb_column *idx_col = index->columns[i];
> > +
> > +if (cnd_cls->index == idx_col->index) {
> > +hash = ovsdb_datum_hash(_cls->arg, _col->type,
> > +hash);
> > +matches++;
> > +break;
> > +}
> > +}
> > +
> > +/* If none of the indexed columns match, continue to the next
> > + * index. */
> > +if (matches == c) {
> > +break;
> > +}
> > +}
> > +
> > +if (matches != cnd->n_clauses) {
> > +continue;
> > +}
> > +
> > +for (node = hmap_first_with_hash(>indexes[idx], hash); node;
> > + node = hmap_next_with_hash(node)) {
> > +struct ovsdb_row *irow = ovsdb_row_from_index_node(node, table,
> > +   idx);
> > +
> > +for (size_t c = 0; c < cnd->n_clauses; c++) {
> > +const struct ovsdb_clause *cnd_cls = >clauses[c];
> > +
> > +if (!ovsdb_datum_equals(_cls->arg,
> > +>fields[cnd_cls->index],
> > +_cls->column->type)) {
> > +irow = NULL;
> > +break;
> > +}
> > +}
> > +
> > +if (irow) {
> > +*out = irow;
> > +return true;
> > +}
> > +}
> > +
> > +/* In the case that there was a matching index but no matching 
> > row, the
> > + * index check is still considered to be a success. */
> > +return true;
>
> Hi Mike,
>
> Maybe I misread it, but it seems that the code above implements:
> 1. If a row is found, return true
> 2. Otherwise, returns true
>
> If so, then is there a need 1?

That's true, I could just break out of the loop. Github-ci also
pointed out another issue with the patch that I'm working on. When I
resubmit I'll correct this too.

-M

>
> > +}
> > +return false;
> > +}
>
> ...
>

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2] ovsdb: Use table indexes if available for ovsdb_query().

2024-06-02 Thread Mike Pattrick
Currently all OVSDB database queries except for UUID lookups all result
in linear lookups over the entire table, even if an index is present.

This patch modifies ovsdb_query() to attempt an index lookup first, if
possible. If no matching indexes are present then a linear index is
still conducted.

Reported-at: https://issues.redhat.com/browse/FDP-590
Signed-off-by: Mike Pattrick 
---
 NEWS |   3 ++
 ovsdb/query.c| 102 +++
 ovsdb/row.h  |  28 +++
 ovsdb/transaction.c  |  27 ---
 tests/ovsdb-execution.at |  34 -
 tests/ovsdb-server.at|   2 +-
 tests/ovsdb-tool.at  |   2 +-
 7 files changed, 159 insertions(+), 39 deletions(-)

diff --git a/NEWS b/NEWS
index 5ae0108d5..616fd19a4 100644
--- a/NEWS
+++ b/NEWS
@@ -9,6 +9,9 @@ Post-v3.3.0
  https://github.com/openvswitch/ovs.git
- DPDK:
  * OVS validated with DPDK 23.11.1.
+   - OVSDB:
+ * Added support to lookup rows based on arbitrary indexes, instead of
+   only UUID indexes.
 
 
 v3.3.0 - 16 Feb 2024
diff --git a/ovsdb/query.c b/ovsdb/query.c
index eebe56412..e3e50a034 100644
--- a/ovsdb/query.c
+++ b/ovsdb/query.c
@@ -21,32 +21,116 @@
 #include "condition.h"
 #include "row.h"
 #include "table.h"
+#include "transaction.h"
+
+static bool
+ovsdb_query_index(struct ovsdb_table *table,
+  const struct ovsdb_condition *cnd,
+  const struct ovsdb_row **out)
+{
+for (size_t idx = 0; idx < table->schema->n_indexes; idx++) {
+const struct ovsdb_column_set *index = >schema->indexes[idx];
+struct hmap_node *node;
+size_t matches = 0;
+uint32_t hash = 0;
+
+if (index->n_columns != cnd->n_clauses) {
+continue;
+}
+
+/* The conditions may not be in the same order as the index. */
+for (size_t c = 0; c < cnd->n_clauses; c++) {
+const struct ovsdb_clause *cnd_cls = >clauses[c];
+
+if (cnd_cls->function != OVSDB_F_EQ) {
+return false;
+}
+
+for (size_t i = 0; i < index->n_columns; i++) {
+const struct ovsdb_column *idx_col = index->columns[i];
+
+if (cnd_cls->index == idx_col->index) {
+hash = ovsdb_datum_hash(_cls->arg, _col->type,
+hash);
+matches++;
+break;
+}
+}
+
+/* If none of the indexed columns match, continue to the next
+ * index. */
+if (matches == c) {
+break;
+}
+}
+
+if (matches != cnd->n_clauses) {
+continue;
+}
+
+for (node = hmap_first_with_hash(>indexes[idx], hash); node;
+ node = hmap_next_with_hash(node)) {
+struct ovsdb_row *irow = ovsdb_row_from_index_node(node, table,
+   idx);
+
+for (size_t c = 0; c < cnd->n_clauses; c++) {
+const struct ovsdb_clause *cnd_cls = >clauses[c];
+
+if (!ovsdb_datum_equals(_cls->arg,
+>fields[cnd_cls->index],
+_cls->column->type)) {
+irow = NULL;
+break;
+}
+}
+
+if (irow) {
+*out = irow;
+return true;
+}
+}
+
+/* In the case that there was a matching index but no matching row, the
+ * index check is still considered to be a success. */
+return true;
+}
+return false;
+}
 
 void
 ovsdb_query(struct ovsdb_table *table, const struct ovsdb_condition *cnd,
 bool (*output_row)(const struct ovsdb_row *, void *aux), void *aux)
 {
+const struct ovsdb_row *row = NULL;
+
 if (cnd->n_clauses > 0
 && cnd->clauses[0].column->index == OVSDB_COL_UUID
 && cnd->clauses[0].function == OVSDB_F_EQ) {
 /* Optimize the case where the query has a clause of the form "uuid ==
  * ", since we have an index on UUID. */
-const struct ovsdb_row *row;
 
 row = ovsdb_table_get_row(table, >clauses[0].arg.keys[0].uuid);
 if (row && row->table == table &&
 ovsdb_condition_match_every_clause(row, cnd)) {
 output_row(row, aux);
 }
-} else {
-/* Linear scan. */
-const struct ovsdb_row *row;
+return;
+}
 
-HMAP_FOR_EACH_SAFE (row, hmap_node, >rows) {
-if (ovsdb_condition_match_every_clause(row, cnd) &&
-!output_row(row, aux)) {
-  

[ovs-dev] [PATCH] ovsdb: Use table indexes if available for ovsdb_query().

2024-06-02 Thread Mike Pattrick
Currently all OVSDB database queries except for UUID lookups all result
in linear lookups over the entire table, even if an index is present.

This patch modifies ovsdb_query() to attempt an index lookup first, if
possible. If no matching indexes are present then a linear index is
still conducted.

Reported-at: https://issues.redhat.com/browse/FDP-590
Signed-off-by: Mike Pattrick 
---
 NEWS |   3 ++
 ovsdb/query.c| 102 +++
 ovsdb/row.h  |  28 +++
 ovsdb/transaction.c  |  27 ---
 tests/ovsdb-execution.at |  34 -
 tests/ovsdb-server.at|   2 +-
 tests/ovsdb-tool.at  |   2 +-
 7 files changed, 159 insertions(+), 39 deletions(-)

diff --git a/NEWS b/NEWS
index b92cec532..89f64d6f6 100644
--- a/NEWS
+++ b/NEWS
@@ -7,6 +7,9 @@ Post-v3.3.0
- The primary development branch has been renamed from 'master' to 'main'.
  The OVS tree remains hosted on GitHub.
  https://github.com/openvswitch/ovs.git
+   - OVSDB:
+ * Added support to lookup rows based on arbitrary indexes, instead of
+   only UUID indexes.
 
 
 v3.3.0 - 16 Feb 2024
diff --git a/ovsdb/query.c b/ovsdb/query.c
index eebe56412..e3e50a034 100644
--- a/ovsdb/query.c
+++ b/ovsdb/query.c
@@ -21,32 +21,116 @@
 #include "condition.h"
 #include "row.h"
 #include "table.h"
+#include "transaction.h"
+
+static bool
+ovsdb_query_index(struct ovsdb_table *table,
+  const struct ovsdb_condition *cnd,
+  const struct ovsdb_row **out)
+{
+for (size_t idx = 0; idx < table->schema->n_indexes; idx++) {
+const struct ovsdb_column_set *index = >schema->indexes[idx];
+struct hmap_node *node;
+size_t matches = 0;
+uint32_t hash = 0;
+
+if (index->n_columns != cnd->n_clauses) {
+continue;
+}
+
+/* The conditions may not be in the same order as the index. */
+for (size_t c = 0; c < cnd->n_clauses; c++) {
+const struct ovsdb_clause *cnd_cls = >clauses[c];
+
+if (cnd_cls->function != OVSDB_F_EQ) {
+return false;
+}
+
+for (size_t i = 0; i < index->n_columns; i++) {
+const struct ovsdb_column *idx_col = index->columns[i];
+
+if (cnd_cls->index == idx_col->index) {
+hash = ovsdb_datum_hash(_cls->arg, _col->type,
+hash);
+matches++;
+break;
+}
+}
+
+/* If none of the indexed columns match, continue to the next
+ * index. */
+if (matches == c) {
+break;
+}
+}
+
+if (matches != cnd->n_clauses) {
+continue;
+}
+
+for (node = hmap_first_with_hash(>indexes[idx], hash); node;
+ node = hmap_next_with_hash(node)) {
+struct ovsdb_row *irow = ovsdb_row_from_index_node(node, table,
+   idx);
+
+for (size_t c = 0; c < cnd->n_clauses; c++) {
+const struct ovsdb_clause *cnd_cls = >clauses[c];
+
+if (!ovsdb_datum_equals(_cls->arg,
+>fields[cnd_cls->index],
+_cls->column->type)) {
+irow = NULL;
+break;
+}
+}
+
+if (irow) {
+*out = irow;
+return true;
+}
+}
+
+/* In the case that there was a matching index but no matching row, the
+ * index check is still considered to be a success. */
+return true;
+}
+return false;
+}
 
 void
 ovsdb_query(struct ovsdb_table *table, const struct ovsdb_condition *cnd,
 bool (*output_row)(const struct ovsdb_row *, void *aux), void *aux)
 {
+const struct ovsdb_row *row = NULL;
+
 if (cnd->n_clauses > 0
 && cnd->clauses[0].column->index == OVSDB_COL_UUID
 && cnd->clauses[0].function == OVSDB_F_EQ) {
 /* Optimize the case where the query has a clause of the form "uuid ==
  * ", since we have an index on UUID. */
-const struct ovsdb_row *row;
 
 row = ovsdb_table_get_row(table, >clauses[0].arg.keys[0].uuid);
 if (row && row->table == table &&
 ovsdb_condition_match_every_clause(row, cnd)) {
 output_row(row, aux);
 }
-} else {
-/* Linear scan. */
-const struct ovsdb_row *row;
+return;
+}
 
-HMAP_FOR_EACH_SAFE (row, hmap_node, >rows) {
-if (ovsdb_condition_match_every_clause(row, cnd) &

Re: [ovs-dev] [PATCH 1/1] ofproto-dpif-upcall: Avoid stale ukeys leaks.

2024-05-29 Thread Mike Pattrick
On Thu, May 23, 2024 at 6:47 AM Roi Dayan via dev
 wrote:
>
> It is observed in some environments that there are much more ukeys than
> actual DP flows. For example:
>
> $ ovs-appctl upcall/show
> system@ovs-system:
> flows : (current 7) (avg 6) (max 117) (limit 2125)
> offloaded flows : 525
> dump duration : 1063ms
> ufid enabled : true
>
> 23: (keys 3612)
> 24: (keys 3625)
> 25: (keys 3485)
>
> The revalidator threads are busy revalidating the stale ukeys leading to
> high CPU and long dump duration.
>
> There are some possible situations that may result in stale ukeys that
> have no corresponding DP flows.
>
> In revalidator, push_dp_ops() doesn't check error if the op type is not
> DEL. It is possible that a PUT(MODIFY) fails, especially for tc offload
> case, where the old flow is deleted first and then the new one is
> created. If the creation fails, the ukey will be stale (no corresponding
> DP flow). This patch adds a warning in such case.
>
> Another possible scenario is in handle_upcalls() if a PUT operation did
> not succeed and op->error attribute was not set correctly it can lead to
> stale ukey in operational state.
>
> This patch adds checks in the sweep phase for such ukeys and move them
> to DELETE so that they can be cleared eventually.
>
> Co-authored-by: Han Zhou 
> Signed-off-by: Han Zhou 
> Signed-off-by: Roi Dayan 
> ---
>  ofproto/ofproto-dpif-upcall.c | 15 +++
>  1 file changed, 15 insertions(+)
>
> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
> index 83609ec62b63..e9520ebdf910 100644
> --- a/ofproto/ofproto-dpif-upcall.c
> +++ b/ofproto/ofproto-dpif-upcall.c
> @@ -57,6 +57,7 @@ COVERAGE_DEFINE(dumped_inconsistent_flow);
>  COVERAGE_DEFINE(dumped_new_flow);
>  COVERAGE_DEFINE(handler_duplicate_upcall);
>  COVERAGE_DEFINE(revalidate_missed_dp_flow);
> +COVERAGE_DEFINE(revalidate_missed_dp_flow_del);
>  COVERAGE_DEFINE(ukey_dp_change);
>  COVERAGE_DEFINE(ukey_invalid_stat_reset);
>  COVERAGE_DEFINE(ukey_replace_contention);
> @@ -278,6 +279,7 @@ enum flow_del_reason {
>  FDR_BAD_ODP_FIT,/* Bad ODP flow fit. */
>  FDR_FLOW_IDLE,  /* Flow idle timeout. */
>  FDR_FLOW_LIMIT, /* Kill all flows condition reached. */
> +FDR_FLOW_STALE, /* Flow stale detected. */
>  FDR_FLOW_WILDCARDED,/* Flow needs a narrower wildcard mask. */
>  FDR_NO_OFPROTO, /* Bridge not found. */
>  FDR_PURGE,  /* User requested flow deletion. */
> @@ -2557,6 +2559,10 @@ push_dp_ops(struct udpif *udpif, struct ukey_op *ops, 
> size_t n_ops)
>
>  if (op->dop.type != DPIF_OP_FLOW_DEL) {
>  /* Only deleted flows need their stats pushed. */
> +if (op->dop.error) {
> +VLOG_WARN_RL(, "push_dp_ops: error %d in op type %d, ukey 
> "
> + "%p", op->dop.error, op->dop.type, op->ukey);
> +}
>  continue;
>  }
>
> @@ -3027,6 +3033,15 @@ revalidator_sweep__(struct revalidator *revalidator, 
> bool purge)
>  del_reason = purge ? FDR_PURGE : FDR_UPDATE_FAIL;
>  } else if (!seq_mismatch) {
>  result = UKEY_KEEP;
> +} else if (!ukey->stats.used &&

Would it be possible for stats.used to be set but the dp flow to be
deleted? For example, if a flow is offloaded to TC, but something
external to OVS clears it?


Thanks,
Mike

> +   udpif_flow_time_delta(udpif, ukey) * 1000 >
> +   ofproto_max_idle) {
> +COVERAGE_INC(revalidate_missed_dp_flow_del);
> +VLOG_WARN_RL(, "revalidator_sweep__: Remove stale 
> ukey "
> + "%p delta %llus", ukey,
> + udpif_flow_time_delta(udpif, ukey));
> +result = UKEY_DELETE;
> +del_reason = FDR_FLOW_STALE;
>  } else {
>  struct dpif_flow_stats stats;
>  COVERAGE_INC(revalidate_missed_dp_flow);
> --
> 2.21.0
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 6/6] netdev-linux: Initialize link speed in error conditions.

2024-05-28 Thread Mike Pattrick
On Tue, May 28, 2024 at 4:26 PM Ilya Maximets  wrote:
>
> On 5/27/24 21:08, Mike Pattrick wrote:
> > Clang's static analyzer noted that the output from
> > netdev_linux_get_speed_locked can be checked even if this function
> > doesn't set any values.
> >
> > Now we always set those values to a sane default in all cases.
> >
> > Fixes: b8f8fad86435 ("netdev-linux: Use speed as max rate in tc classes.")
>
> This is still an incorrect Fixes tag.  The correct one is:
>
> Fixes: 19cffe30cfda ("netdev-linux: Avoid deadlock in netdev_get_speed.")

Sorry about that, I thought I had corrected it when resubmitting, but
it looks like this slipped through.

Cheers,
M

>
> The original netdev_get_speed() call was fine, because it ensures that values
> are zeroed out even on errors.  That is defined in netdev-provider API.  But
> the new static netdev_linux_get_speed_locked() function didn't do the same.
>
> I fixed that and applied the set.  Individual patches backported according to
> their Fixes tags.  Thanks!
>
> Best regards, Ilya Maximets.
>

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 6/8] ofproto-dpif: Define age as time_t in ofproto_unixctl_fdb_add().

2024-05-28 Thread Mike Pattrick
On Tue, May 28, 2024 at 7:46 AM Eelco Chaudron  wrote:
>
> Fix the warning from Coverity about potential truncation of the
> time_t value when copying to a local variable by changing the
> local variable's type to time_t.
>
> ccc24fc88d59 ("ofproto-dpif: APIs and CLI option to add/delete static fdb 
> entry.")
> Signed-off-by: Eelco Chaudron 
> ---

Acked-by: Mike Pattrick 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 4/8] sflow: Use uint32_t instead of time_t for tick handling in the poller.

2024-05-28 Thread Mike Pattrick
On Tue, May 28, 2024 at 7:46 AM Eelco Chaudron  wrote:
>
> The sFlow library uses a uint32_t to configure timeout ticks, but
> stores this value as a time_t. Although this doesn't cause functional
> issues, it wastes space and confuses Coverity, potentially indicating
> a Y2K38 problem when storing uint32_t values in time_t. This patch
> updates the internal data structures to use uint32_t variables.
>
> Fixes: c72e245a0e2c ("Add InMon's sFlow Agent library to the build system.")
> Signed-off-by: Eelco Chaudron 
> ---

Acked-by: Mike Pattrick 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 3/8] sflow: Replace libc's random() function with the OVS's random_range().

2024-05-28 Thread Mike Pattrick
On Tue, May 28, 2024 at 7:46 AM Eelco Chaudron  wrote:
>
> Coverity has flagged the use of a potentially unsafe function.
> Although this is not a concern in this case since it's not used for
> encryption, we should replace it with the OVS implementation to
> achieve better randomness.
>
> Fixes: c72e245a0e2c ("Add InMon's sFlow Agent library to the build system.")
> Signed-off-by: Eelco Chaudron 

Acked-by: Mike Pattrick 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 2/8] cfm: Fix possible integer overflow in tc_add_matchall_policer().

2024-05-28 Thread Mike Pattrick
On Tue, May 28, 2024 at 7:46 AM Eelco Chaudron  wrote:
>
> Fix unintentional integer overflow reported by Coverity by adding
> the LL suffix to the numerical literals used in the multiplication.
>
> Fixes: 5767a79a4059 ("cfm: Require ccm received in demand mode.")
> Signed-off-by: Eelco Chaudron 

Acked-by: Mike Pattrick 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 1/8] netdev-linux: Fix possible int overflow in tc_add_matchall_policer().

2024-05-28 Thread Mike Pattrick
On Tue, May 28, 2024 at 7:46 AM Eelco Chaudron  wrote:
>
> Fix unintentional integer overflow reported by Coverity by adding
> the ULL suffix to the numerical literals used in the multiplications.
>
> Fixes: ed2300cca0d3 ("netdev-linux: Refactor put police action netlink 
> message")
> Signed-off-by: Eelco Chaudron 

Acked-by: Mike Pattrick 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v3 6/6] netdev-linux: Initialize link speed in error conditions.

2024-05-27 Thread Mike Pattrick
Clang's static analyzer noted that the output from
netdev_linux_get_speed_locked can be checked even if this function
doesn't set any values.

Now we always set those values to a sane default in all cases.

Fixes: b8f8fad86435 ("netdev-linux: Use speed as max rate in tc classes.")
Signed-off-by: Mike Pattrick 
---
 lib/netdev-linux.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index c89a85a38..ff5e94856 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -2727,6 +2727,7 @@ netdev_linux_get_speed_locked(struct netdev_linux *netdev,
   uint32_t *current, uint32_t *max)
 {
 if (netdev_linux_netnsid_is_remote(netdev)) {
+*current = *max = 0;
 return EOPNOTSUPP;
 }
 
@@ -2736,6 +2737,8 @@ netdev_linux_get_speed_locked(struct netdev_linux *netdev,
? 0 : netdev->current_speed;
 *max = MIN(UINT32_MAX,
netdev_features_to_bps(netdev->supported, 0) / 100ULL);
+} else {
+*current = *max = 0;
 }
 return netdev->get_features_error;
 }
-- 
2.39.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v3 5/6] netdev-linux: Return an error if device feature names are empty.

2024-05-27 Thread Mike Pattrick
When retrieving a list of features supported by a network card, return
with an error code if the request completed without an error but the
list contains zero entries.

In practice this should never happen, but it does contribute to a
detection in Clang's static analyzer.

Fixes: 6c59c195266c ("netdev-linux: Use ethtool to detect offload support.")
Signed-off-by: Mike Pattrick 
---
 lib/netdev-linux.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 25349c605..c89a85a38 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -2439,9 +2439,12 @@ netdev_linux_read_definitions(struct netdev_linux 
*netdev,
 int error = 0;
 
 error = netdev_linux_read_stringset_info(netdev, );
-if (error || !len) {
+if (error) {
 return error;
+} else if (!len) {
+return -EOPNOTSUPP;
 }
+
 strings = xzalloc(sizeof *strings + len * ETH_GSTRING_LEN);
 
 strings->cmd = ETHTOOL_GSTRINGS;
-- 
2.39.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v3 1/6] netdev-offload: Fix null pointer dereference warning on dump creation.

2024-05-27 Thread Mike Pattrick
Clang's static analyzer will complain about a null pointer dereference
because dumps can be set to null and then there is a loop where it could
have been written to. This is a false positive, but only because the
netdev dpif type won't change during this loop.

Instead, return early from the netdev_ports_flow_dump_create function if
dumps is NULL.

Signed-off-by: Mike Pattrick 
---
 lib/netdev-offload.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/lib/netdev-offload.c b/lib/netdev-offload.c
index 931d634e1..8a9d36555 100644
--- a/lib/netdev-offload.c
+++ b/lib/netdev-offload.c
@@ -626,8 +626,8 @@ netdev_ports_traverse(const char *dpif_type,
 struct netdev_flow_dump **
 netdev_ports_flow_dump_create(const char *dpif_type, int *ports, bool terse)
 {
+struct netdev_flow_dump **dumps = NULL;
 struct port_to_netdev_data *data;
-struct netdev_flow_dump **dumps;
 int count = 0;
 int i = 0;
 
@@ -638,7 +638,11 @@ netdev_ports_flow_dump_create(const char *dpif_type, int 
*ports, bool terse)
 }
 }
 
-dumps = count ? xzalloc(sizeof *dumps * count) : NULL;
+if (!count) {
+goto unlock;
+}
+
+dumps = xzalloc(sizeof *dumps * count);
 
 HMAP_FOR_EACH (data, portno_node, _to_netdev) {
 if (netdev_get_dpif_type(data->netdev) == dpif_type) {
@@ -650,6 +654,8 @@ netdev_ports_flow_dump_create(const char *dpif_type, int 
*ports, bool terse)
 i++;
 }
 }
+
+unlock:
 ovs_rwlock_unlock(_to_netdev_rwlock);
 
 *ports = i;
-- 
2.39.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v3 0/6] clang: Fix Clang's static analyzer detections.

2024-05-27 Thread Mike Pattrick
Clang's static analyzer has identified several instances of uninitialized
variable usage and null pointer dereferences that - while not likely - are
possible. These mostly included making sure that a variable is properly set
or error code properly returned in every error condition.

Signed-off-by: Mike Pattrick 

Mike Pattrick (6):
  netdev-offload: Fix null pointer dereference warning on dump creation.
  netdev-native-tnl: Fix use of uninitialized offset on SRv6 header pop.
  dpctl: Fix uninitialized value when deleting flows.
  socket: Fix uninitialized values in inet_parse_ functions.
  netdev-linux: Return an error if device feature names are empty.
  netdev-linux: Initialize link speed in error conditions.

 lib/dpctl.c | 12 +---
 lib/netdev-linux.c  |  8 +++-
 lib/netdev-native-tnl.c |  5 -
 lib/netdev-offload.c| 10 --
 lib/socket-util.c   |  9 +
 5 files changed, 33 insertions(+), 11 deletions(-)

-- 
2.39.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v3 3/6] dpctl: Fix uninitialized value when deleting flows.

2024-05-27 Thread Mike Pattrick
Clang's static analyzer will complain about an uninitialized value
because we weren't setting a value for ufid_generated in all code paths.

Now we initialize this on declaration. This patch also corrects the
reverse x-mass of variable declaration.

Fixes: bbe2e3928747 ("dpctl: Fix broken flow deletion via ovs-dpctl due to 
missing ufid.")
Signed-off-by: Mike Pattrick 
---
 lib/dpctl.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/lib/dpctl.c b/lib/dpctl.c
index 3c555a559..a70df5342 100644
--- a/lib/dpctl.c
+++ b/lib/dpctl.c
@@ -1359,19 +1359,17 @@ static int
 dpctl_del_flow_dpif(struct dpif *dpif, const char *key_s,
 struct dpctl_params *dpctl_p)
 {
+struct dpif_port_dump port_dump;
 struct dpif_flow_stats stats;
+bool ufid_generated = false;
 struct dpif_port dpif_port;
-struct dpif_port_dump port_dump;
-struct ofpbuf key;
+bool ufid_present = false;
+struct simap port_names;
 struct ofpbuf mask; /* To be ignored. */
-
+struct ofpbuf key;
 ovs_u128 ufid;
-bool ufid_generated;
-bool ufid_present;
-struct simap port_names;
 int n, error;
 
-ufid_present = false;
 n = odp_ufid_from_string(key_s, );
 if (n < 0) {
 dpctl_error(dpctl_p, -n, "parsing flow ufid");
-- 
2.39.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v3 2/6] netdev-native-tnl: Fix use of uninitialized offset on SRv6 header pop.

2024-05-27 Thread Mike Pattrick
Clang's static analyzer will complain about uninitialized value 'hlen'
because we weren't properly checking the error code from a function that
would have initialized the value.

Instead, add a check for that return code.

Fixes: 03fc1ad78521 ("userspace: Add SRv6 tunnel support.")
Signed-off-by: Mike Pattrick 
---
 lib/netdev-native-tnl.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
index b21176037..d6f46ac4a 100644
--- a/lib/netdev-native-tnl.c
+++ b/lib/netdev-native-tnl.c
@@ -1067,7 +1067,10 @@ netdev_srv6_pop_header(struct dp_packet *packet)
 }
 
 pkt_metadata_init_tnl(md);
-netdev_tnl_ip_extract_tnl_md(packet, tnl, );
+if (!netdev_tnl_ip_extract_tnl_md(packet, tnl, )) {
+goto err;
+}
+
 dp_packet_reset_packet(packet, hlen);
 
 return packet;
-- 
2.39.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v3 4/6] socket: Fix uninitialized values in inet_parse_ functions.

2024-05-27 Thread Mike Pattrick
Clang's static analyzer will complain about uninitialized value
dns_failure because we weren't setting a value for dns_failure in all
code paths.

Now we initialize this in the error conditions of inet_parse_passive and
inet_parse_active.

Fixes: 08e9e5337383 ("ovsdb: raft: Fix inability to read the database with DNS 
host names.")
Fixes: 5f219af8b3c7 ("ovsdb-server: Fix handling of DNS name for listener 
configuration.")
Signed-off-by: Mike Pattrick 
---
 lib/socket-util.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/lib/socket-util.c b/lib/socket-util.c
index 2d89fce85..c569b7d16 100644
--- a/lib/socket-util.c
+++ b/lib/socket-util.c
@@ -546,9 +546,15 @@ inet_parse_active(const char *target_, int default_port,
 if (!host) {
 VLOG_ERR("%s: host must be specified", target_);
 ok = false;
+if (dns_failure) {
+*dns_failure = false;
+}
 } else if (!port && default_port < 0) {
 VLOG_ERR("%s: port must be specified", target_);
 ok = false;
+if (dns_failure) {
+*dns_failure = false;
+}
 } else {
 ok = parse_sockaddr_components(ss, host, port, default_port,
target_, resolve_host, dns_failure);
@@ -671,6 +677,9 @@ inet_parse_passive(const char *target_, int default_port,
 if (!port && default_port < 0) {
 VLOG_ERR("%s: port must be specified", target_);
 ok = false;
+if (dns_failure) {
+*dns_failure = false;
+}
 } else {
 ok = parse_sockaddr_components(ss, host, port, default_port,
target_, resolve_host, dns_failure);
-- 
2.39.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 5/6] netdev-linux: Return an error if device feature names are empty.

2024-05-23 Thread Mike Pattrick
When retrieving a list of features supported by a network card, return
with an error code if the request completed without an error but the
list contains zero entries.

In practice this should never happen, but it does contribute to a
detection in Clang's static analyzer.

Fixes: 6c59c195266c ("netdev-linux: Use ethtool to detect offload support.")
Signed-off-by: Mike Pattrick 
---
 lib/netdev-linux.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 25349c605..8b855bdc4 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -2439,7 +2439,9 @@ netdev_linux_read_definitions(struct netdev_linux *netdev,
 int error = 0;
 
 error = netdev_linux_read_stringset_info(netdev, );
-if (error || !len) {
+if (!len) {
+return -EOPNOTSUPP;
+} else if (error) {
 return error;
 }
 strings = xzalloc(sizeof *strings + len * ETH_GSTRING_LEN);
-- 
2.39.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 4/6] socket: Fix uninitialized values in inet_parse_ functions.

2024-05-23 Thread Mike Pattrick
Clang's static analyzer will complain about uninitialized value
dns_failure because we weren't setting a value for dns_failure in all
code paths.

Now we initialize this in the error conditions of inet_parse_passive and
inet_parse_active.

Fixes: 08e9e5337383 ("ovsdb: raft: Fix inability to read the database with DNS 
host names.")
Fixes: 5f219af8b3c7 ("ovsdb-server: Fix handling of DNS name for listener 
configuration.")
Signed-off-by: Mike Pattrick 
---
 lib/socket-util.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/lib/socket-util.c b/lib/socket-util.c
index 2d89fce85..c569b7d16 100644
--- a/lib/socket-util.c
+++ b/lib/socket-util.c
@@ -546,9 +546,15 @@ inet_parse_active(const char *target_, int default_port,
 if (!host) {
 VLOG_ERR("%s: host must be specified", target_);
 ok = false;
+if (dns_failure) {
+*dns_failure = false;
+}
 } else if (!port && default_port < 0) {
 VLOG_ERR("%s: port must be specified", target_);
 ok = false;
+if (dns_failure) {
+*dns_failure = false;
+}
 } else {
 ok = parse_sockaddr_components(ss, host, port, default_port,
target_, resolve_host, dns_failure);
@@ -671,6 +677,9 @@ inet_parse_passive(const char *target_, int default_port,
 if (!port && default_port < 0) {
 VLOG_ERR("%s: port must be specified", target_);
 ok = false;
+if (dns_failure) {
+*dns_failure = false;
+}
 } else {
 ok = parse_sockaddr_components(ss, host, port, default_port,
target_, resolve_host, dns_failure);
-- 
2.39.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 6/6] netdev-linux: Initialize link speed in error conditions.

2024-05-23 Thread Mike Pattrick
Clang's static analyzer noted that the output from
netdev_linux_get_speed_locked can be checked even if this function
doesn't set any values.

Now we always set those values to a sane default in all cases.

Fixes: 6240c0b4c80e ("netdev: Add netdev_get_speed() to netdev API.")
Signed-off-by: Mike Pattrick 
---
 lib/netdev-linux.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 8b855bdc4..83c19618c 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -2726,6 +2726,8 @@ netdev_linux_get_speed_locked(struct netdev_linux *netdev,
   uint32_t *current, uint32_t *max)
 {
 if (netdev_linux_netnsid_is_remote(netdev)) {
+*current = 0;
+*max = 0;
 return EOPNOTSUPP;
 }
 
@@ -2735,6 +2737,9 @@ netdev_linux_get_speed_locked(struct netdev_linux *netdev,
? 0 : netdev->current_speed;
 *max = MIN(UINT32_MAX,
netdev_features_to_bps(netdev->supported, 0) / 100ULL);
+} else {
+*current = 0;
+*max = 0;
 }
 return netdev->get_features_error;
 }
-- 
2.39.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 3/6] dpctl: Fix uninitialized value when deleting flows.

2024-05-23 Thread Mike Pattrick
Clang's static analyzer will complain about an uninitialized value
because we weren't setting a value for ufid_generated in all code paths.

Now we initialize this on declaration. This patch also corrects the
reverse x-mass of variable declaration.

Fixes: bbe2e3928747 ("dpctl: Fix broken flow deletion via ovs-dpctl due to 
missing ufid.")
Signed-off-by: Mike Pattrick 
---
 lib/dpctl.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/lib/dpctl.c b/lib/dpctl.c
index 3c555a559..a70df5342 100644
--- a/lib/dpctl.c
+++ b/lib/dpctl.c
@@ -1359,19 +1359,17 @@ static int
 dpctl_del_flow_dpif(struct dpif *dpif, const char *key_s,
 struct dpctl_params *dpctl_p)
 {
+struct dpif_port_dump port_dump;
 struct dpif_flow_stats stats;
+bool ufid_generated = false;
 struct dpif_port dpif_port;
-struct dpif_port_dump port_dump;
-struct ofpbuf key;
+bool ufid_present = false;
+struct simap port_names;
 struct ofpbuf mask; /* To be ignored. */
-
+struct ofpbuf key;
 ovs_u128 ufid;
-bool ufid_generated;
-bool ufid_present;
-struct simap port_names;
 int n, error;
 
-ufid_present = false;
 n = odp_ufid_from_string(key_s, );
 if (n < 0) {
 dpctl_error(dpctl_p, -n, "parsing flow ufid");
-- 
2.39.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 2/6] netdev-native-tnl: Fix use of uninitialized offset on SRv6 header pop.

2024-05-23 Thread Mike Pattrick
Clang's static analyzer will complain about uninitialized value 'hlen'
because we weren't properly checking the error code from a function that
would have initialized the value.

Instead, add a check for that return code.

Signed-off-by: Mike Pattrick 
Fixes: 03fc1ad78521 ("userspace: Add SRv6 tunnel support.")
---
 lib/netdev-native-tnl.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
index dee9ab344..6bcc00d8c 100644
--- a/lib/netdev-native-tnl.c
+++ b/lib/netdev-native-tnl.c
@@ -1068,7 +1068,10 @@ netdev_srv6_pop_header(struct dp_packet *packet)
 }
 
 pkt_metadata_init_tnl(md);
-netdev_tnl_ip_extract_tnl_md(packet, tnl, );
+if (!netdev_tnl_ip_extract_tnl_md(packet, tnl, )) {
+goto err;
+}
+
 dp_packet_reset_packet(packet, hlen);
 
 return packet;
-- 
2.39.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 1/6] netdev-offload: Fix null pointer dereference' warnings on dump creation.

2024-05-23 Thread Mike Pattrick
Clang's static analyzer will complain about a null pointer dereference
because dumps can be set to null and then there is a loop where it could
have been written to. This is a false positive, but only because the
netdev dpif type won't change during this loop.

Instead, return early from the netdev_ports_flow_dump_create function if
dumps is NULL.

Signed-off-by: Mike Pattrick 
---
 lib/netdev-offload.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/lib/netdev-offload.c b/lib/netdev-offload.c
index 931d634e1..8a9d36555 100644
--- a/lib/netdev-offload.c
+++ b/lib/netdev-offload.c
@@ -626,8 +626,8 @@ netdev_ports_traverse(const char *dpif_type,
 struct netdev_flow_dump **
 netdev_ports_flow_dump_create(const char *dpif_type, int *ports, bool terse)
 {
+struct netdev_flow_dump **dumps = NULL;
 struct port_to_netdev_data *data;
-struct netdev_flow_dump **dumps;
 int count = 0;
 int i = 0;
 
@@ -638,7 +638,11 @@ netdev_ports_flow_dump_create(const char *dpif_type, int 
*ports, bool terse)
 }
 }
 
-dumps = count ? xzalloc(sizeof *dumps * count) : NULL;
+if (!count) {
+goto unlock;
+}
+
+dumps = xzalloc(sizeof *dumps * count);
 
 HMAP_FOR_EACH (data, portno_node, _to_netdev) {
 if (netdev_get_dpif_type(data->netdev) == dpif_type) {
@@ -650,6 +654,8 @@ netdev_ports_flow_dump_create(const char *dpif_type, int 
*ports, bool terse)
 i++;
 }
 }
+
+unlock:
 ovs_rwlock_unlock(_to_netdev_rwlock);
 
 *ports = i;
-- 
2.39.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH]ipf: Fix ovs ipf crash.

2024-05-22 Thread Mike Pattrick
Hello laixaingwu,

I happen to have a patch on the list right now for a similar sounding issue:

https://patchwork.ozlabs.org/project/openvswitch/patch/20240516153832.153496-1-...@redhat.com/

Do you happen to have a stack trace available for this crash? That
could help determine if the crash is triggered at the same location.

I'm not sure I like the approach of not adding any fragments to the
batch during expiration, as this changes how packets are processed
currently and may be an unexpected behaviour for end users. Would it
be better to only add a few to the current batch and add the remaining
to later batches?

Thanks,
M



On Tue, May 21, 2024 at 10:20 PM laixiangwu <15310488...@163.com> wrote:
>
> Description:
>
> when 1) The fragment timeout is between 15 seconds and 25 seconds; 2)
> The ipf_list currently has received more than 32 fragments, and there
> are other fragments of same big packet that have not been received.
>
> When the above two scenario conditions are met, due to exceeding the
> capacity of the packet batch(here is 32), ipf_dp_packet_batch_add
> returns false, and ipf_list will not be cleared. However, the 32
> fragments packets added to the packet batch will be processed normally.
> When receiving the subsequent fragments of the ipf_list, because the
> first 32 fragments have been processed, when processing subsequent
> fragment packets, relevant information about the processed fragment
> packets will be read,therefore will occur carsh.
> One solution is do not forward timeout fragment packets from the above
> scenarios, that is, do not add them to the packet batch, and handle
> other scenarios according to the original logic.
> ---
>  lib/ipf.c | 10 --
>  1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/lib/ipf.c b/lib/ipf.c
> index d45266374..9258173ab 100644
> --- a/lib/ipf.c
> +++ b/lib/ipf.c
> @@ -1011,7 +1011,7 @@ ipf_purge_list_check(struct ipf *ipf, struct ipf_list 
> *ipf_list,
>  }
>
>  /* Does the packet batch management and common accounting work associated
> - * with 'ipf_send_completed_frags()' and 'ipf_send_expired_frags()'. */
> + * with 'ipf_send_completed_frags()'. */
>  static bool
>  ipf_send_frags_in_list(struct ipf *ipf, struct ipf_list *ipf_list,
> struct dp_packet_batch *pb,
> @@ -1076,8 +1076,7 @@ ipf_send_completed_frags(struct ipf *ipf, struct 
> dp_packet_batch *pb,
>   * a packet batch to be processed by the calling application, typically
>   * conntrack. Also cleans up the list context when it is empty.*/
>  static void
> -ipf_send_expired_frags(struct ipf *ipf, struct dp_packet_batch *pb,
> -   long long now, bool v6)
> +ipf_clean_expired_frags(struct ipf *ipf, long long now)
>  {
>  enum {
>  /* Very conservative, due to DOS probability. */
> @@ -1099,8 +1098,7 @@ ipf_send_expired_frags(struct ipf *ipf, struct 
> dp_packet_batch *pb,
>  break;
>  }
>
> -if (ipf_send_frags_in_list(ipf, ipf_list, pb, IPF_FRAG_EXPIRY_LIST,
> -   v6, now)) {
> +if (ipf_purge_list_check(ipf, ipf_list, now)) {
>  ipf_expiry_list_clean(>frag_lists, ipf_list);
>  lists_removed++;
>  } else {
> @@ -1249,7 +1247,7 @@ ipf_postprocess_conntrack(struct ipf *ipf, struct 
> dp_packet_batch *pb,
>  bool v6 = dl_type == htons(ETH_TYPE_IPV6);
>  ipf_post_execute_reass_pkts(ipf, pb, v6);
>  ipf_send_completed_frags(ipf, pb, now, v6);
> -ipf_send_expired_frags(ipf, pb, now, v6);
> +ipf_clean_expired_frags(ipf, now);
>  }
>  }
>
> --
> 2.31.1.windows.1
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] fix ovs ipf crash

2024-05-21 Thread Mike Pattrick
On Mon, May 20, 2024 at 2:56 AM laixiangwu <15310488...@163.com> wrote:
>
> Description:
>
> when 1) The fragment timeout is between 15 seconds and 25 seconds; 2) The 
> ipf_list currently has received more than 32 fragments, and there are other 
> fragments of same big packet that have not been received.
>
> When the above two scenario conditions are met, due to exceeding the capacity 
> of the packet batch(here is 32), ipf_dp_packet_batch_add returns false, and 
> ipf_list will not be cleared. However, the 32 fragments packets added to the 
> packet batch will be processed normally. When receiving the subsequent 
> fragments of the ipf_list, because the first 32 fragments have been 
> processed, when processing subsequent fragment packets, relevant information 
> about the processed fragment packets will be read,therefore will occur carsh.
> One solution is do not forward timeout fragment packets from the above 
> scenarios, that is, do not add them to the packet batch, and handle other 
> scenarios according to the original logic.

Hello laixaingwu,

I happen to have a patch on the list right now for a similar sounding issue:

https://patchwork.ozlabs.org/project/openvswitch/patch/20240516153832.153496-1-...@redhat.com/

Do you happen to have a stack trace available for this crash? That
could help determine if the crash is triggered at the same location.

I'm not sure I like the approach of not adding any fragments to the
batch during expiration, as this changes how packets are processed
currently and may be an unexpected behaviour for end users. Would it
be better to only add a few to the current batch and add the remaining
to later batches?

Thanks,
M

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 3/5] dpctl: Fix Clang's static analyzer 'garbage value' warnings.

2024-05-16 Thread Mike Pattrick
Clang's static analyzer will complain about an uninitialized value
because we weren't setting a value for ufid_generated in all code paths.

Now we initialize this on declaration.

Signed-off-by: Mike Pattrick 
---
 lib/dpctl.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/lib/dpctl.c b/lib/dpctl.c
index 3c555a559..9c287d060 100644
--- a/lib/dpctl.c
+++ b/lib/dpctl.c
@@ -1366,12 +1366,11 @@ dpctl_del_flow_dpif(struct dpif *dpif, const char 
*key_s,
 struct ofpbuf mask; /* To be ignored. */
 
 ovs_u128 ufid;
-bool ufid_generated;
-bool ufid_present;
+bool ufid_generated = false;
+bool ufid_present = false;
 struct simap port_names;
 int n, error;
 
-ufid_present = false;
 n = odp_ufid_from_string(key_s, );
 if (n < 0) {
 dpctl_error(dpctl_p, -n, "parsing flow ufid");
-- 
2.39.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 1/5] netdev-offload: Fix Clang's static analyzer 'null pointer dereference' warnings.

2024-05-16 Thread Mike Pattrick
Clang's static analyzer will complain about a null pointer dereference
because dumps can be set to null and then there is a loop where it could
have been written to.

Instead, return early from the netdev_ports_flow_dump_create function if
dumps is NULL.

Signed-off-by: Mike Pattrick 
---
 lib/netdev-offload.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/netdev-offload.c b/lib/netdev-offload.c
index 931d634e1..02b1cf203 100644
--- a/lib/netdev-offload.c
+++ b/lib/netdev-offload.c
@@ -638,7 +638,14 @@ netdev_ports_flow_dump_create(const char *dpif_type, int 
*ports, bool terse)
 }
 }
 
-dumps = count ? xzalloc(sizeof *dumps * count) : NULL;
+if (count == 0) {
+*ports = 0;
+ovs_rwlock_unlock(_to_netdev_rwlock);
+
+return NULL;
+}
+
+dumps = xzalloc(sizeof *dumps * count);
 
 HMAP_FOR_EACH (data, portno_node, _to_netdev) {
 if (netdev_get_dpif_type(data->netdev) == dpif_type) {
-- 
2.39.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 5/5] netdev-linux: Fix Clang's static analyzer uninitialized values warnings.

2024-05-16 Thread Mike Pattrick
Clang's static analyzer will complain about an uninitialized value
because in some error conditions we wouldn't set all values that are
used later.

Now we initialize more values that are needed later even in error
conditions.

Signed-off-by: Mike Pattrick 
---
 lib/netdev-linux.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 25349c605..66dae3e1a 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -2439,7 +2439,9 @@ netdev_linux_read_definitions(struct netdev_linux *netdev,
 int error = 0;
 
 error = netdev_linux_read_stringset_info(netdev, );
-if (error || !len) {
+if (!len) {
+return -EOPNOTSUPP;
+} else if (error) {
 return error;
 }
 strings = xzalloc(sizeof *strings + len * ETH_GSTRING_LEN);
@@ -2724,6 +2726,7 @@ netdev_linux_get_speed_locked(struct netdev_linux *netdev,
   uint32_t *current, uint32_t *max)
 {
 if (netdev_linux_netnsid_is_remote(netdev)) {
+*current = 0;
 return EOPNOTSUPP;
 }
 
@@ -2733,6 +2736,8 @@ netdev_linux_get_speed_locked(struct netdev_linux *netdev,
? 0 : netdev->current_speed;
 *max = MIN(UINT32_MAX,
netdev_features_to_bps(netdev->supported, 0) / 100ULL);
+} else {
+*current = 0;
 }
 return netdev->get_features_error;
 }
-- 
2.39.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 4/5] socket: Fix Clang's static analyzer 'garbage value' warnings.

2024-05-16 Thread Mike Pattrick
Clang's static analyzer will complain about an uninitialized value
because we weren't setting a value for dns_failure in all code paths.

Now we initialize this on declaration.

Signed-off-by: Mike Pattrick 
---
 lib/socket-util.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/socket-util.c b/lib/socket-util.c
index 2d89fce85..1d21ce01c 100644
--- a/lib/socket-util.c
+++ b/lib/socket-util.c
@@ -711,7 +711,7 @@ inet_open_passive(int style, const char *target, int 
default_port,
 struct sockaddr_storage ss;
 int fd = 0, error;
 unsigned int yes = 1;
-bool dns_failure;
+bool dns_failure = false;
 
 if (!inet_parse_passive(target, default_port, , true, _failure)) {
 if (dns_failure) {
-- 
2.39.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 0/5] clang: Fix Clang's static analyzer detections.

2024-05-16 Thread Mike Pattrick
Clang's static analyzer has identified several instances of uninitialized
variable usage and null pointer dereferencing that while not likely, is
possible. These mostly included making sure that a variable is properly set
or error code properly returned in every error condition.

Signed-off-by: Mike Pattrick 


Mike Pattrick (5):
  netdev-offload: Fix Clang's static analyzer 'null pointer dereference'
warnings.
  netdev-native-tnl: Fix Clang's static analyzer 'uninitialized value'
warnings.
  dpctl: Fix Clang's static analyzer 'garbage value' warnings.
  socket: Fix Clang's static analyzer 'garbage value' warnings.
  netdev-linux: Fix Clang's static analyzer uninitialized values
warnings.

 lib/dpctl.c | 5 ++---
 lib/netdev-linux.c  | 7 ++-
 lib/netdev-native-tnl.c | 4 +++-
 lib/netdev-offload.c| 9 -
 lib/socket-util.c   | 2 +-
 5 files changed, 20 insertions(+), 7 deletions(-)

-- 
2.39.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 2/5] netdev-native-tnl: Fix Clang's static analyzer 'uninitialized value' warnings.

2024-05-16 Thread Mike Pattrick
Clang's static analyzer will complain about an uninitialized value
because we weren't properly checking the error code from a function that
would have initialized the value.

Instead, add a check for that return code.

Signed-off-by: Mike Pattrick 
---
 lib/netdev-native-tnl.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
index dee9ab344..6e6b15764 100644
--- a/lib/netdev-native-tnl.c
+++ b/lib/netdev-native-tnl.c
@@ -1068,7 +1068,9 @@ netdev_srv6_pop_header(struct dp_packet *packet)
 }
 
 pkt_metadata_init_tnl(md);
-netdev_tnl_ip_extract_tnl_md(packet, tnl, );
+if (netdev_tnl_ip_extract_tnl_md(packet, tnl, ) == NULL) {
+goto err;
+}
 dp_packet_reset_packet(packet, hlen);
 
 return packet;
-- 
2.39.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] compiler: Fix errors in Clang 17 ubsan checks.

2024-05-16 Thread Mike Pattrick
Recheck-request: github-robot

On Thu, May 16, 2024 at 9:58 AM Mike Pattrick  wrote:
>
> This patch attempts to fix a large number of ubsan error messages that
> take the following form:
>
> lib/netlink-notifier.c:237:13: runtime error: call to function
> route_table_change through pointer to incorrect function type
> 'void (*)(const void *, void *)'
>
> In Clang 17 the undefined behaviour sanatizer check for function
> pointers was enabled by default, whereas it was previously disabled
> while compiling C code. These warnings are a false positive in the case
> of OVS, as our macros already check to make sure the function parameter
> is the correct size.
>
> So that check is disabled in the single function that is causing all of
> the errors.
>
> Signed-off-by: Mike Pattrick 
> ---
> v2: Changed macro name
> ---
>  include/openvswitch/compiler.h | 11 +++
>  lib/ovs-rcu.c  |  1 +
>  2 files changed, 12 insertions(+)
>
> diff --git a/include/openvswitch/compiler.h b/include/openvswitch/compiler.h
> index 878c5c6a7..f49b23683 100644
> --- a/include/openvswitch/compiler.h
> +++ b/include/openvswitch/compiler.h
> @@ -69,6 +69,17 @@
>  #define OVS_UNLIKELY(CONDITION) (!!(CONDITION))
>  #endif
>
> +/* Clang 17's implementation of ubsan enables checking that function pointers
> + * match the type of the called function. This currently breaks ovs-rcu, 
> which
> + * calls multiple different types of callbacks via a generic void *(void*)
> + * function pointer type. This macro enables disabling that check for 
> specific
> + * functions. */
> +#if __clang__ && __has_feature(undefined_behavior_sanitizer)
> +#define OVS_NO_SANITIZE_FUNCTION __attribute__((no_sanitize("function")))
> +#else
> +#define OVS_NO_SANITIZE_FUNCTION
> +#endif
> +
>  #if __has_feature(c_thread_safety_attributes)
>  /* "clang" annotations for thread safety check.
>   *
> diff --git a/lib/ovs-rcu.c b/lib/ovs-rcu.c
> index 9e07d9bab..597fe6826 100644
> --- a/lib/ovs-rcu.c
> +++ b/lib/ovs-rcu.c
> @@ -327,6 +327,7 @@ ovsrcu_postpone__(void (*function)(void *aux), void *aux)
>  }
>
>  static bool
> +OVS_NO_SANITIZE_FUNCTION
>  ovsrcu_call_postponed(void)
>  {
>  struct ovsrcu_cbset *cbset;
> --
> 2.39.3
>

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 2/2] ipf: Handle common case of ipf defragmentation.

2024-05-16 Thread Mike Pattrick
When conntrack is reassembling packet fragments, the same reassembly
context can be shared across multiple threads handling different packets
simultaneously. Once a full packet is assembled, it is added to a packet
batch for processing, in the case where there are multiple different pmd
threads accessing conntrack simultaneously, there is a race condition
where the reassembled packet may be added to an arbitrary batch even if
the current batch is available.

When this happens, the packet may be handled incorrectly as it is
inserted into a random openflow execution pipeline, instead of the
pipeline for that packets flow.

This change makes a best effort attempt to try to add the defragmented
packet to the current batch. directly. This should succeed most of the
time.

Fixes: 4ea96698f667 ("Userspace datapath: Add fragmentation handling.")
Reported-at: https://issues.redhat.com/browse/FDP-560
Signed-off-by: Mike Pattrick 
---
 lib/ipf.c | 27 ---
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/lib/ipf.c b/lib/ipf.c
index 3c8960be3..2d715f5e9 100644
--- a/lib/ipf.c
+++ b/lib/ipf.c
@@ -506,13 +506,15 @@ ipf_reassemble_v6_frags(struct ipf_list *ipf_list)
 }
 
 /* Called when a frag list state transitions to another state. This is
- * triggered by new fragment for the list being received.*/
-static void
+* triggered by new fragment for the list being received. Returns a reassembled
+* packet if this fragment has completed one. */
+static struct reassembled_pkt *
 ipf_list_state_transition(struct ipf *ipf, struct ipf_list *ipf_list,
   bool ff, bool lf, bool v6)
 OVS_REQUIRES(ipf->ipf_lock)
 {
 enum ipf_list_state curr_state = ipf_list->state;
+struct reassembled_pkt *ret = NULL;
 enum ipf_list_state next_state;
 switch (curr_state) {
 case IPF_LIST_STATE_UNUSED:
@@ -562,12 +564,15 @@ ipf_list_state_transition(struct ipf *ipf, struct 
ipf_list *ipf_list,
 ipf_reassembled_list_add(>reassembled_pkt_list, rp);
 ipf_expiry_list_remove(ipf_list);
 next_state = IPF_LIST_STATE_COMPLETED;
+ret = rp;
 } else {
 next_state = IPF_LIST_STATE_REASS_FAIL;
 }
 }
 }
 ipf_list->state = next_state;
+
+return ret;
 }
 
 /* Some sanity checks are redundant, but prudent, in case code paths for
@@ -799,7 +804,8 @@ ipf_is_frag_duped(const struct ipf_frag *frag_list, int 
last_inuse_idx,
 static bool
 ipf_process_frag(struct ipf *ipf, struct ipf_list *ipf_list,
  struct dp_packet *pkt, uint16_t start_data_byte,
- uint16_t end_data_byte, bool ff, bool lf, bool v6)
+ uint16_t end_data_byte, bool ff, bool lf, bool v6,
+ struct reassembled_pkt **rp)
 OVS_REQUIRES(ipf->ipf_lock)
 {
 bool duped_frag = ipf_is_frag_duped(ipf_list->frag_list,
@@ -820,7 +826,7 @@ ipf_process_frag(struct ipf *ipf, struct ipf_list *ipf_list,
 ipf_list->last_inuse_idx++;
 atomic_count_inc(>nfrag);
 ipf_count(ipf, v6, IPF_NFRAGS_ACCEPTED);
-ipf_list_state_transition(ipf, ipf_list, ff, lf, v6);
+*rp = ipf_list_state_transition(ipf, ipf_list, ff, lf, v6);
 } else {
 OVS_NOT_REACHED();
 }
@@ -853,7 +859,8 @@ ipf_list_init(struct ipf_list *ipf_list, struct 
ipf_list_key *key,
  * to a list of fragemnts. */
 static bool
 ipf_handle_frag(struct ipf *ipf, struct dp_packet *pkt, ovs_be16 dl_type,
-uint16_t zone, long long now, uint32_t hash_basis)
+uint16_t zone, long long now, uint32_t hash_basis,
+struct reassembled_pkt **rp)
 OVS_REQUIRES(ipf->ipf_lock)
 {
 struct ipf_list_key key;
@@ -922,7 +929,7 @@ ipf_handle_frag(struct ipf *ipf, struct dp_packet *pkt, 
ovs_be16 dl_type,
 }
 
 return ipf_process_frag(ipf, ipf_list, pkt, start_data_byte,
-end_data_byte, ff, lf, v6);
+end_data_byte, ff, lf, v6, rp);
 }
 
 /* Filters out fragments from a batch of fragments and adjust the batch. */
@@ -941,11 +948,17 @@ ipf_extract_frags_from_batch(struct ipf *ipf, struct 
dp_packet_batch *pb,
   ||
   (dl_type == htons(ETH_TYPE_IPV6) &&
   ipf_is_valid_v6_frag(ipf, pkt {
+struct reassembled_pkt *rp = NULL;
 
 ovs_mutex_lock(>ipf_lock);
-if (!ipf_handle_frag(ipf, pkt, dl_type, zone, now, hash_basis)) {
+if (!ipf_handle_frag(ipf, pkt, dl_type, zone, now, hash_basis,
+ )) {
 dp_packet_batch_refill(pb, pkt, pb_idx);
 } else {
+if (rp && !dp_packet_batch_is_full(pb)) {
+dp_packet_batch_refill(pb, rp->pkt, pb_idx);

[ovs-dev] [PATCH v2 1/2] ipf: Only add fragments to batch of same dl_type.

2024-05-16 Thread Mike Pattrick
When conntrack is reassembling packet fragments, the same reassembly
context can be shared across multiple threads handling different packets
simultaneously. Once a full packet is assembled, it is added to a packet
batch for processing, this is most likely the batch that added it in the
first place, but that isn't a guarantee.

The packets in these batches should be segregated by network protocol
version (ipv4 vs ipv6) for conntrack defragmentation to function
appropriately. However, there are conditions where we would add a
reassembled packet of one type to a batch of another.

This change introduces checks to make sure that reassembled or expired
fragments are only added to packet batches of the same type.

Fixes: 4ea96698f667 ("Userspace datapath: Add fragmentation handling.")
Reported-at: https://issues.redhat.com/browse/FDP-560
Signed-off-by: Mike Pattrick 
---
 lib/ipf.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/lib/ipf.c b/lib/ipf.c
index 7d74e2c13..3c8960be3 100644
--- a/lib/ipf.c
+++ b/lib/ipf.c
@@ -1063,6 +1063,9 @@ ipf_send_completed_frags(struct ipf *ipf, struct 
dp_packet_batch *pb,
 struct ipf_list *ipf_list;
 
 LIST_FOR_EACH_SAFE (ipf_list, list_node, >frag_complete_list) {
+if ((ipf_list->key.dl_type == htons(ETH_TYPE_IPV6)) != v6) {
+continue;
+}
 if (ipf_send_frags_in_list(ipf, ipf_list, pb, IPF_FRAG_COMPLETED_LIST,
v6, now)) {
 ipf_completed_list_clean(>frag_lists, ipf_list);
@@ -1096,6 +1099,9 @@ ipf_send_expired_frags(struct ipf *ipf, struct 
dp_packet_batch *pb,
 size_t lists_removed = 0;
 
 LIST_FOR_EACH_SAFE (ipf_list, list_node, >frag_exp_list) {
+if ((ipf_list->key.dl_type == htons(ETH_TYPE_IPV6)) != v6) {
+continue;
+}
 if (now <= ipf_list->expiration ||
 lists_removed >= IPF_FRAG_LIST_MAX_EXPIRED) {
 break;
@@ -1116,7 +1122,8 @@ ipf_send_expired_frags(struct ipf *ipf, struct 
dp_packet_batch *pb,
 /* Adds a reassmebled packet to a packet batch to be processed by the caller.
  */
 static void
-ipf_execute_reass_pkts(struct ipf *ipf, struct dp_packet_batch *pb)
+ipf_execute_reass_pkts(struct ipf *ipf, struct dp_packet_batch *pb,
+   ovs_be16 dl_type)
 {
 if (ovs_list_is_empty(>reassembled_pkt_list)) {
 return;
@@ -1127,6 +1134,7 @@ ipf_execute_reass_pkts(struct ipf *ipf, struct 
dp_packet_batch *pb)
 
 LIST_FOR_EACH_SAFE (rp, rp_list_node, >reassembled_pkt_list) {
 if (!rp->list->reass_execute_ctx &&
+rp->list->key.dl_type == dl_type &&
 ipf_dp_packet_batch_add(pb, rp->pkt, false)) {
 rp->list->reass_execute_ctx = rp->pkt;
 }
@@ -1237,7 +1245,7 @@ ipf_preprocess_conntrack(struct ipf *ipf, struct 
dp_packet_batch *pb,
 }
 
 if (ipf_get_enabled(ipf) || atomic_count_get(>nfrag)) {
-ipf_execute_reass_pkts(ipf, pb);
+ipf_execute_reass_pkts(ipf, pb, dl_type);
 }
 }
 
-- 
2.39.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ipf: Only add fragments to batch of same dl_type.

2024-05-16 Thread Mike Pattrick
On Thu, May 16, 2024 at 8:35 AM Simon Horman  wrote:
>
> Hi Mike,
>
> On Wed, May 15, 2024 at 12:24:33PM -0400, Mike Pattrick wrote:
> > When conntrack is reassembling packet fragments, the same reassembly
> > context can be shared across multiple threads handling different packets
> > simultaneously. Once a full packet is assembled, it is added to a packet
> > batch for processing, this is most likely the batch that added it in the
> > first place, but that isn't a guarantee.
> >
> > The packets in these batches should be segregated by network protocol
> > versuib (ipv4 vs ipv6) for conntrack defragmentation to function
>
> nit: version
>
> > appropriately. However, there are conditions where we would add a
> > reassembled packet of one type to a batch of another.
> >
> > This change introduces checks to make sure that reassembled or expired
> > fragments are only added to packet batches of the same type. It also
> > makes a best effort attempt to make sure the defragmented packet is
> > inserted into the current batch.
>
> Would it make any sense to separate these changes into separate patches?

Can do!

>
> >
> > Fixes: 4ea96698f667 ("Userspace datapath: Add fragmentation handling.")
> > Reported-at: https://issues.redhat.com/browse/FDP-560
> > Signed-off-by: Mike Pattrick 
> > ---
> > Note: This solution is far from perfect, ipf.c can still insert packets
> > into more or less arbitrary batches but this bug fix is needed to avoid a
> > memory overrun and should insert packets into the proper batch in the
> > common case. I'm working on a more correct solution but it changes how
> > fragments are fundimentally handled, and couldn't be considered a bug fix.
>
> FWIIW, I'm ok with changes that move things to a better, even if not ideal,
> state.
>
> ...
>
> > @@ -943,9 +952,14 @@ ipf_extract_frags_from_batch(struct ipf *ipf, struct 
> > dp_packet_batch *pb,
> >ipf_is_valid_v6_frag(ipf, pkt {
> >
> >  ovs_mutex_lock(>ipf_lock);
> > -if (!ipf_handle_frag(ipf, pkt, dl_type, zone, now, 
> > hash_basis)) {
> > +if (!ipf_handle_frag(ipf, pkt, dl_type, zone, now, hash_basis,
> > + )) {
> >  dp_packet_batch_refill(pb, pkt, pb_idx);
> >  } else {
> > +if (rp && !dp_packet_batch_is_full(pb)) {
>
> The conditions under which rp are set are complex and buried
> inside the call-chain under ipf_handle_frag(). I am concerned
> that there are cases where it may be used unset here. Or that
> the complexity allows for such cases to be inadvertently added
> later.
>
> Could we make this more robust, f.e. by making sure rp is
> always initialised when ipf_handle_frag returns by setting
> it to NULL towards the top of that function.

Agreed that it's overly complex. I'll change this to initialize this
in ipf_extract_frags_from_batch(), the functions in between
ipf_list_state_transition and ipf_extract_frags_from_batch shouldn't
really touch or care about this value.

-M

>
> > +dp_packet_batch_refill(pb, rp->pkt, pb_idx);
> > +rp->list->reass_execute_ctx = rp->pkt;
> > +}
> >  dp_packet_delete(pkt);
> >  }
> >  ovs_mutex_unlock(>ipf_lock);
>
> ...
>

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2] compiler: Fix errors in Clang 17 ubsan checks.

2024-05-16 Thread Mike Pattrick
This patch attempts to fix a large number of ubsan error messages that
take the following form:

lib/netlink-notifier.c:237:13: runtime error: call to function
route_table_change through pointer to incorrect function type
'void (*)(const void *, void *)'

In Clang 17 the undefined behaviour sanatizer check for function
pointers was enabled by default, whereas it was previously disabled
while compiling C code. These warnings are a false positive in the case
of OVS, as our macros already check to make sure the function parameter
is the correct size.

So that check is disabled in the single function that is causing all of
the errors.

Signed-off-by: Mike Pattrick 
---
v2: Changed macro name
---
 include/openvswitch/compiler.h | 11 +++
 lib/ovs-rcu.c  |  1 +
 2 files changed, 12 insertions(+)

diff --git a/include/openvswitch/compiler.h b/include/openvswitch/compiler.h
index 878c5c6a7..f49b23683 100644
--- a/include/openvswitch/compiler.h
+++ b/include/openvswitch/compiler.h
@@ -69,6 +69,17 @@
 #define OVS_UNLIKELY(CONDITION) (!!(CONDITION))
 #endif
 
+/* Clang 17's implementation of ubsan enables checking that function pointers
+ * match the type of the called function. This currently breaks ovs-rcu, which
+ * calls multiple different types of callbacks via a generic void *(void*)
+ * function pointer type. This macro enables disabling that check for specific
+ * functions. */
+#if __clang__ && __has_feature(undefined_behavior_sanitizer)
+#define OVS_NO_SANITIZE_FUNCTION __attribute__((no_sanitize("function")))
+#else
+#define OVS_NO_SANITIZE_FUNCTION
+#endif
+
 #if __has_feature(c_thread_safety_attributes)
 /* "clang" annotations for thread safety check.
  *
diff --git a/lib/ovs-rcu.c b/lib/ovs-rcu.c
index 9e07d9bab..597fe6826 100644
--- a/lib/ovs-rcu.c
+++ b/lib/ovs-rcu.c
@@ -327,6 +327,7 @@ ovsrcu_postpone__(void (*function)(void *aux), void *aux)
 }
 
 static bool
+OVS_NO_SANITIZE_FUNCTION
 ovsrcu_call_postponed(void)
 {
 struct ovsrcu_cbset *cbset;
-- 
2.39.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] ipf: Only add fragments to batch of same dl_type.

2024-05-15 Thread Mike Pattrick
When conntrack is reassembling packet fragments, the same reassembly
context can be shared across multiple threads handling different packets
simultaneously. Once a full packet is assembled, it is added to a packet
batch for processing, this is most likely the batch that added it in the
first place, but that isn't a guarantee.

The packets in these batches should be segregated by network protocol
versuib (ipv4 vs ipv6) for conntrack defragmentation to function
appropriately. However, there are conditions where we would add a
reassembled packet of one type to a batch of another.

This change introduces checks to make sure that reassembled or expired
fragments are only added to packet batches of the same type. It also
makes a best effort attempt to make sure the defragmented packet is
inserted into the current batch.

Fixes: 4ea96698f667 ("Userspace datapath: Add fragmentation handling.")
Reported-at: https://issues.redhat.com/browse/FDP-560
Signed-off-by: Mike Pattrick 
---
Note: This solution is far from perfect, ipf.c can still insert packets
into more or less arbitrary batches but this bug fix is needed to avoid a
memory overrun and should insert packets into the proper batch in the
common case. I'm working on a more correct solution but it changes how
fragments are fundimentally handled, and couldn't be considered a bug fix.
---
 lib/ipf.c | 40 +++-
 1 file changed, 31 insertions(+), 9 deletions(-)

diff --git a/lib/ipf.c b/lib/ipf.c
index 7d74e2c13..90c819d63 100644
--- a/lib/ipf.c
+++ b/lib/ipf.c
@@ -506,13 +506,15 @@ ipf_reassemble_v6_frags(struct ipf_list *ipf_list)
 }
 
 /* Called when a frag list state transitions to another state. This is
- * triggered by new fragment for the list being received.*/
-static void
+* triggered by new fragment for the list being received. Returns a reassembled
+* packet if this fragment has completed one. */
+static struct reassembled_pkt *
 ipf_list_state_transition(struct ipf *ipf, struct ipf_list *ipf_list,
   bool ff, bool lf, bool v6)
 OVS_REQUIRES(ipf->ipf_lock)
 {
 enum ipf_list_state curr_state = ipf_list->state;
+struct reassembled_pkt *ret = NULL;
 enum ipf_list_state next_state;
 switch (curr_state) {
 case IPF_LIST_STATE_UNUSED:
@@ -562,12 +564,15 @@ ipf_list_state_transition(struct ipf *ipf, struct 
ipf_list *ipf_list,
 ipf_reassembled_list_add(>reassembled_pkt_list, rp);
 ipf_expiry_list_remove(ipf_list);
 next_state = IPF_LIST_STATE_COMPLETED;
+ret = rp;
 } else {
 next_state = IPF_LIST_STATE_REASS_FAIL;
 }
 }
 }
 ipf_list->state = next_state;
+
+return ret;
 }
 
 /* Some sanity checks are redundant, but prudent, in case code paths for
@@ -799,7 +804,8 @@ ipf_is_frag_duped(const struct ipf_frag *frag_list, int 
last_inuse_idx,
 static bool
 ipf_process_frag(struct ipf *ipf, struct ipf_list *ipf_list,
  struct dp_packet *pkt, uint16_t start_data_byte,
- uint16_t end_data_byte, bool ff, bool lf, bool v6)
+ uint16_t end_data_byte, bool ff, bool lf, bool v6,
+ struct reassembled_pkt **rp)
 OVS_REQUIRES(ipf->ipf_lock)
 {
 bool duped_frag = ipf_is_frag_duped(ipf_list->frag_list,
@@ -820,13 +826,14 @@ ipf_process_frag(struct ipf *ipf, struct ipf_list 
*ipf_list,
 ipf_list->last_inuse_idx++;
 atomic_count_inc(>nfrag);
 ipf_count(ipf, v6, IPF_NFRAGS_ACCEPTED);
-ipf_list_state_transition(ipf, ipf_list, ff, lf, v6);
+*rp = ipf_list_state_transition(ipf, ipf_list, ff, lf, v6);
 } else {
 OVS_NOT_REACHED();
 }
 } else {
 ipf_count(ipf, v6, IPF_NFRAGS_OVERLAP);
 pkt->md.ct_state = CS_INVALID;
+*rp = NULL;
 return false;
 }
 return true;
@@ -853,7 +860,8 @@ ipf_list_init(struct ipf_list *ipf_list, struct 
ipf_list_key *key,
  * to a list of fragemnts. */
 static bool
 ipf_handle_frag(struct ipf *ipf, struct dp_packet *pkt, ovs_be16 dl_type,
-uint16_t zone, long long now, uint32_t hash_basis)
+uint16_t zone, long long now, uint32_t hash_basis,
+struct reassembled_pkt **rp)
 OVS_REQUIRES(ipf->ipf_lock)
 {
 struct ipf_list_key key;
@@ -922,7 +930,7 @@ ipf_handle_frag(struct ipf *ipf, struct dp_packet *pkt, 
ovs_be16 dl_type,
 }
 
 return ipf_process_frag(ipf, ipf_list, pkt, start_data_byte,
-end_data_byte, ff, lf, v6);
+end_data_byte, ff, lf, v6, rp);
 }
 
 /* Filters out fragments from a batch of fragments and adjust the batch. */
@@ -933,6 +941,7 @@ ipf_extract_frags_from_batch(struct ipf *ipf, struct 
dp_packet_batch *pb,
 {
 const size_t pb_cnt = dp_packet_batch_size(pb);
 int pb_idx; /*

Re: [ovs-dev] [PATCH v2] lib: Fix segfault for tunnel packet.

2024-05-03 Thread Mike Pattrick
On Fri, May 3, 2024 at 6:22 AM Amit Prakash Shukla
 wrote:
>
> Add NULL check to UDP, TCP and SCTP checksum functions. This patch
> also adds changes to populate inner_l3_ofs and inner_l4_ofs for the
> tunneled packets received from ports other than vport which are
> required by the protocol specific checksum function to parse the
> headers.
>
> Thread 22 "pmd-c07/id:15" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x6e70dc00 (LWP 1061)]
> 0x13f61750 in packet_udp_complete_csum at lib/packets.c:2061
> 2061if (!udp->udp_csum) {
>
> 0x13f61750 in packet_udp_complete_csum at lib/packets.c:2061
> 0x13e5126c in dp_packet_ol_send_prepare at lib/dp-packet.c:638
> 0x13eb7d4c in netdev_push_header at lib/netdev.c:1035
> 0x13e69830 in push_tnl_action at lib/dpif-netdev.c:9067
> 0x13e69dac in dp_execute_cb at lib/dpif-netdev.c:9226
> 0x13ec72c4 in odp_execute_actions at lib/odp-execute.c:1008
> 0x13e6a7bc in dp_netdev_execute_actions at lib/dpif-netdev.c:9524
> 0x13e673d0 in packet_batch_per_flow_execute at lib/dpif-netdev.c:8271
> 0x13e69188 in dp_netdev_input__ at lib/dpif-netdev.c:8899
> 0x13e691f8 in dp_netdev_input at lib/dpif-netdev.c:8908
> 0x13e600e4 in dp_netdev_process_rxq_port at lib/dpif-netdev.c:5660
> 0x13e649a8 in pmd_thread_main at lib/dpif-netdev.c:7295
> 0x13f44b2c in ovsthread_wrapper at lib/ovs-thread.c:423
>
> CC: Mike Pattrick 
> Fixes: 85bcbbed839a ("userspace: Enable tunnel tests with TSO.")
>
> Signed-off-by: Amit Prakash Shukla 
> ---
>
> v2:
> - Added Fixes tag and updated commit message.
>
>  lib/netdev.c  |  7 +++
>  lib/packets.c | 10 +-
>  2 files changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/lib/netdev.c b/lib/netdev.c
> index f2d921ed6..19bd87ef7 100644
> --- a/lib/netdev.c
> +++ b/lib/netdev.c
> @@ -1032,6 +1032,13 @@ netdev_push_header(const struct netdev *netdev,
>   netdev_get_name(netdev));
>  continue;
>  }
> +if (packet->l3_ofs != UINT16_MAX) {
> +packet->inner_l3_ofs = packet->l3_ofs + data->header_len;
> +}
> +if (packet->l4_ofs != UINT16_MAX) {
> +packet->inner_l4_ofs = packet->l4_ofs + data->header_len;
> +}
> +
>  dp_packet_ol_send_prepare(packet, 0);
>  }
>  netdev->netdev_class->push_header(netdev, packet, data);
> diff --git a/lib/packets.c b/lib/packets.c
> index 5803d26f4..988c0e41f 100644
> --- a/lib/packets.c
> +++ b/lib/packets.c
> @@ -2011,6 +2011,10 @@ packet_tcp_complete_csum(struct dp_packet *p, bool 
> inner)
>  tcp_sz = dp_packet_l4_size(p);
>  }
>
> +if (!tcp || !ip_hdr) {
> +return;
> +}

This suggests a packet has NETDEV_TX_OFFLOAD_TCP_CKSUM set but no TCP
header or the offsets are set incorrectly. If that's the case then
there will be additional issues in netdev-linux, the avx512 code, and
potentially in other DPDK drivers.

As Ilya mentioned, an assert here would be preferable.

-M

> +
>  if (!inner && dp_packet_hwol_is_outer_ipv6(p)) {
>  is_v4 = false;
>  } else if (!inner && dp_packet_hwol_is_outer_ipv4(p)) {
> @@ -2058,7 +2062,7 @@ packet_udp_complete_csum(struct dp_packet *p, bool 
> inner)
>  }
>
>  /* Skip csum calculation if the udp_csum is zero. */
> -if (!udp->udp_csum) {
> +if (!udp || !ip_hdr || !udp->udp_csum) {
>  return;
>  }
>
> @@ -2109,6 +2113,10 @@ packet_sctp_complete_csum(struct dp_packet *p, bool 
> inner)
>  tp_len = dp_packet_l4_size(p);
>  }
>
> +if (!sh) {
> +return;
> +}
> +
>  put_16aligned_be32(>sctp_csum, 0);
>  csum = crc32c((void *) sh, tp_len);
>  put_16aligned_be32(>sctp_csum, csum);
> --
> 2.34.1
>

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v9 1/2] ofproto-dpif-mirror: Reduce number of function parameters.

2024-05-01 Thread Mike Pattrick
Previously the mirror_set() and mirror_get() functions took a large
number of parameters, which was inefficient and difficult to read and
extend. This patch moves most of the parameters into a struct.

Signed-off-by: Mike Pattrick 
Acked-by: Simon Horman 
Acked-by: Eelco Chaudron 
Signed-off-by: Mike Pattrick 
---
 ofproto/ofproto-dpif-mirror.c | 60 ++-
 ofproto/ofproto-dpif-mirror.h | 40 ++-
 ofproto/ofproto-dpif-xlate.c  | 29 -
 ofproto/ofproto-dpif.c| 23 +++---
 4 files changed, 88 insertions(+), 64 deletions(-)

diff --git a/ofproto/ofproto-dpif-mirror.c b/ofproto/ofproto-dpif-mirror.c
index 343b75f0e..4967ecc9a 100644
--- a/ofproto/ofproto-dpif-mirror.c
+++ b/ofproto/ofproto-dpif-mirror.c
@@ -207,19 +207,22 @@ mirror_bundle_dst(struct mbridge *mbridge, struct 
ofbundle *ofbundle)
 }
 
 int
-mirror_set(struct mbridge *mbridge, void *aux, const char *name,
-   struct ofbundle **srcs, size_t n_srcs,
-   struct ofbundle **dsts, size_t n_dsts,
-   unsigned long *src_vlans, struct ofbundle *out_bundle,
-   uint16_t snaplen,
-   uint16_t out_vlan)
+mirror_set(struct mbridge *mbridge, void *aux,
+   const struct ofproto_mirror_settings *ms,
+   const struct mirror_bundles *mb)
 {
 struct mbundle *mbundle, *out;
 mirror_mask_t mirror_bit;
 struct mirror *mirror;
 struct hmapx srcs_map;  /* Contains "struct ofbundle *"s. */
 struct hmapx dsts_map;  /* Contains "struct ofbundle *"s. */
+uint16_t out_vlan;
 
+if (!ms || !mbridge) {
+return EINVAL;
+}
+
+out_vlan = ms->out_vlan;
 mirror = mirror_lookup(mbridge, aux);
 if (!mirror) {
 int idx;
@@ -227,7 +230,7 @@ mirror_set(struct mbridge *mbridge, void *aux, const char 
*name,
 idx = mirror_scan(mbridge);
 if (idx < 0) {
 VLOG_WARN("maximum of %d port mirrors reached, cannot create %s",
-  MAX_MIRRORS, name);
+  MAX_MIRRORS, ms->name);
 return EFBIG;
 }
 
@@ -242,8 +245,8 @@ mirror_set(struct mbridge *mbridge, void *aux, const char 
*name,
 unsigned long *vlans = ovsrcu_get(unsigned long *, >vlans);
 
 /* Get the new configuration. */
-if (out_bundle) {
-out = mbundle_lookup(mbridge, out_bundle);
+if (mb->out_bundle) {
+out = mbundle_lookup(mbridge, mb->out_bundle);
 if (!out) {
 mirror_destroy(mbridge, mirror->aux);
 return EINVAL;
@@ -252,16 +255,16 @@ mirror_set(struct mbridge *mbridge, void *aux, const char 
*name,
 } else {
 out = NULL;
 }
-mbundle_lookup_multiple(mbridge, srcs, n_srcs, _map);
-mbundle_lookup_multiple(mbridge, dsts, n_dsts, _map);
+mbundle_lookup_multiple(mbridge, mb->srcs, mb->n_srcs, _map);
+mbundle_lookup_multiple(mbridge, mb->dsts, mb->n_dsts, _map);
 
 /* If the configuration has not changed, do nothing. */
 if (hmapx_equals(_map, >srcs)
 && hmapx_equals(_map, >dsts)
-&& vlan_bitmap_equal(vlans, src_vlans)
+&& vlan_bitmap_equal(vlans, ms->src_vlans)
 && mirror->out == out
 && mirror->out_vlan == out_vlan
-&& mirror->snaplen == snaplen)
+&& mirror->snaplen == ms->snaplen)
 {
 hmapx_destroy(_map);
 hmapx_destroy(_map);
@@ -275,15 +278,15 @@ mirror_set(struct mbridge *mbridge, void *aux, const char 
*name,
 hmapx_swap(_map, >dsts);
 hmapx_destroy(_map);
 
-if (vlans || src_vlans) {
+if (vlans || ms->src_vlans) {
 ovsrcu_postpone(free, vlans);
-vlans = vlan_bitmap_clone(src_vlans);
+vlans = vlan_bitmap_clone(ms->src_vlans);
 ovsrcu_set(>vlans, vlans);
 }
 
 mirror->out = out;
 mirror->out_vlan = out_vlan;
-mirror->snaplen = snaplen;
+mirror->snaplen = ms->snaplen;
 
 /* Update mbundles. */
 mirror_bit = MIRROR_MASK_C(1) << mirror->idx;
@@ -406,23 +409,22 @@ mirror_update_stats(struct mbridge *mbridge, 
mirror_mask_t mirrors,
 /* Retrieves the mirror numbered 'index' in 'mbridge'.  Returns true if such a
  * mirror exists, false otherwise.
  *
- * If successful, '*vlans' receives the mirror's VLAN membership information,
+ * If successful 'mc->vlans' receives the mirror's VLAN membership information,
  * either a null pointer if the mirror includes all VLANs or a 4096-bit bitmap
  * in which a 1-bit indicates that the mirror includes a particular VLAN,
- * '*dup_mirrors' receives a bitmap of mirrors whose output duplicates mirror
- * 'index', '*out' receives the output ofbundle (if any), and '*out_vlan'
- * receives the output VLAN (if any).
+ * 'mc->dup_mirrors' receives a bitmap of mirrors whose output duplic

[ovs-dev] [PATCH v9 2/2] ofproto-dpif-mirror: Add support for pre-selection filter.

2024-05-01 Thread Mike Pattrick
Currently a bridge mirror will collect all packets and tools like
ovs-tcpdump can apply additional filters after they have already been
duplicated by vswitchd. This can result in inefficient collection.

This patch adds support to apply pre-selection to bridge mirrors, which
can limit which packets are mirrored based on flow metadata. This
significantly improves overall vswitchd performance during mirroring if
only a subset of traffic is required.

Signed-off-by: Mike Pattrick 
---
v8:
 - Corrected code from v7 related to sequence and in_port. Mirrors
   reject filters with an in_port set as this could cause confusion.
 - Combined ovsrcu pointers into a new struct, minimatch wasn't used
   because the minimatch_* functions didn't fit the usage here.
 - Added a test to check for modifying filters when partially
   overlapping flows already exist.
 - Corrected documentation.
v9:
 - Explicitly cleared mirror_config.filter* when not set
---
 Documentation/ref/ovs-tcpdump.8.rst |   8 +-
 NEWS|   6 +
 lib/flow.h  |   9 ++
 ofproto/ofproto-dpif-mirror.c   | 104 +-
 ofproto/ofproto-dpif-mirror.h   |   9 +-
 ofproto/ofproto-dpif-xlate.c|  15 ++-
 ofproto/ofproto-dpif.c  |  12 +-
 ofproto/ofproto.h   |   3 +
 tests/ofproto-dpif.at   | 165 
 utilities/ovs-tcpdump.in|  13 ++-
 vswitchd/bridge.c   |  13 ++-
 vswitchd/vswitch.ovsschema  |   7 +-
 vswitchd/vswitch.xml|  16 +++
 13 files changed, 365 insertions(+), 15 deletions(-)

diff --git a/Documentation/ref/ovs-tcpdump.8.rst 
b/Documentation/ref/ovs-tcpdump.8.rst
index b9f8cdf6f..e21e61211 100644
--- a/Documentation/ref/ovs-tcpdump.8.rst
+++ b/Documentation/ref/ovs-tcpdump.8.rst
@@ -61,8 +61,14 @@ Options
 
   If specified, mirror all ports (optional).
 
+* ``--filter ``
+
+  If specified, only mirror flows that match the provided OpenFlow filter.
+  The available fields are documented in ``ovs-fields(7)``.
+
 See Also
 
 
 ``ovs-appctl(8)``, ``ovs-vswitchd(8)``, ``ovs-pcap(1)``,
-``ovs-tcpundump(1)``, ``tcpdump(8)``, ``wireshark(8)``.
+``ovs-fields(7)``, ``ovs-tcpundump(1)``, ``tcpdump(8)``,
+``wireshark(8)``.
diff --git a/NEWS b/NEWS
index b92cec532..f3a4bf076 100644
--- a/NEWS
+++ b/NEWS
@@ -7,6 +7,12 @@ Post-v3.3.0
- The primary development branch has been renamed from 'master' to 'main'.
  The OVS tree remains hosted on GitHub.
  https://github.com/openvswitch/ovs.git
+   - ovs-vsctl:
+ * Added a new filter column in the Mirror table which can be used to
+   apply filters to mirror ports.
+   - ovs-tcpdump:
+ * Added command line parameter --filter to enable filtering the flows
+   that are captured by tcpdump.
 
 
 v3.3.0 - 16 Feb 2024
diff --git a/lib/flow.h b/lib/flow.h
index 75a9be3c1..60ec4b0d7 100644
--- a/lib/flow.h
+++ b/lib/flow.h
@@ -939,6 +939,15 @@ flow_union_with_miniflow(struct flow *dst, const struct 
miniflow *src)
 flow_union_with_miniflow_subset(dst, src, src->map);
 }
 
+/* Perform a bitwise OR of minimask 'src' mask data with the equivalent
+ * fields in 'dst', storing the result in 'dst'. */
+static inline void
+flow_wildcards_union_with_minimask(struct flow_wildcards *dst,
+   const struct minimask *src)
+{
+flow_union_with_miniflow_subset(>masks, >masks, src->masks.map);
+}
+
 static inline bool is_ct_valid(const struct flow *flow,
const struct flow_wildcards *mask,
struct flow_wildcards *wc)
diff --git a/ofproto/ofproto-dpif-mirror.c b/ofproto/ofproto-dpif-mirror.c
index 4967ecc9a..6d89d13a5 100644
--- a/ofproto/ofproto-dpif-mirror.c
+++ b/ofproto/ofproto-dpif-mirror.c
@@ -21,6 +21,7 @@
 #include "cmap.h"
 #include "hmapx.h"
 #include "ofproto.h"
+#include "ofproto-dpif-trace.h"
 #include "vlan-bitmap.h"
 #include "openvswitch/vlog.h"
 
@@ -48,6 +49,11 @@ struct mbundle {
 mirror_mask_t mirror_out;   /* Mirrors that output to this mbundle. */
 };
 
+struct filtermask {
+struct miniflow *flow;
+struct minimask *mask;
+};
+
 struct mirror {
 struct mbridge *mbridge;/* Owning ofproto. */
 size_t idx; /* In ofproto's "mirrors" array. */
@@ -57,6 +63,10 @@ struct mirror {
 struct hmapx srcs;  /* Contains "struct mbundle*"s. */
 struct hmapx dsts;  /* Contains "struct mbundle*"s. */
 
+/* Filter criteria. */
+OVSRCU_TYPE(struct filtermask *) filter_mask;
+char *filter_str;
+
 /* This is accessed by handler threads assuming RCU protection (see
  * mirror_get()), but can be manipulated by mirror_set() without any
  * explicit synchronization. */
@@ -83,6 +93,23 @@ static void mbundle_lookup_multiple(const struct mbri

[ovs-dev] [PATCH v8 2/2] ofproto-dpif-mirror: Add support for pre-selection filter.

2024-04-29 Thread Mike Pattrick
Currently a bridge mirror will collect all packets and tools like
ovs-tcpdump can apply additional filters after they have already been
duplicated by vswitchd. This can result in inefficient collection.

This patch adds support to apply pre-selection to bridge mirrors, which
can limit which packets are mirrored based on flow metadata. This
significantly improves overall vswitchd performance during mirroring if
only a subset of traffic is required.

Signed-off-by: Mike Pattrick 
---
v8:
 - Corrected code from v7 related to sequence and in_port. Mirrors
   reject filters with an in_port set as this could cause confusion.
 - Combined ovsrcu pointers into a new struct, minimatch wasn't used
   because the minimatch_* functions didn't fit the usage here.
 - Added a test to check for modifying filters when partially
   overlapping flows already exist.
 - Corrected documentation.
---
 Documentation/ref/ovs-tcpdump.8.rst |   8 +-
 NEWS|   6 +
 lib/flow.h  |   9 ++
 ofproto/ofproto-dpif-mirror.c   | 101 -
 ofproto/ofproto-dpif-mirror.h   |   9 +-
 ofproto/ofproto-dpif-xlate.c|  15 ++-
 ofproto/ofproto-dpif.c  |  12 +-
 ofproto/ofproto.h   |   3 +
 tests/ofproto-dpif.at   | 165 
 utilities/ovs-tcpdump.in|  13 ++-
 vswitchd/bridge.c   |  13 ++-
 vswitchd/vswitch.ovsschema  |   7 +-
 vswitchd/vswitch.xml|  16 +++
 13 files changed, 362 insertions(+), 15 deletions(-)

diff --git a/Documentation/ref/ovs-tcpdump.8.rst 
b/Documentation/ref/ovs-tcpdump.8.rst
index b9f8cdf6f..e21e61211 100644
--- a/Documentation/ref/ovs-tcpdump.8.rst
+++ b/Documentation/ref/ovs-tcpdump.8.rst
@@ -61,8 +61,14 @@ Options
 
   If specified, mirror all ports (optional).
 
+* ``--filter ``
+
+  If specified, only mirror flows that match the provided OpenFlow filter.
+  The available fields are documented in ``ovs-fields(7)``.
+
 See Also
 
 
 ``ovs-appctl(8)``, ``ovs-vswitchd(8)``, ``ovs-pcap(1)``,
-``ovs-tcpundump(1)``, ``tcpdump(8)``, ``wireshark(8)``.
+``ovs-fields(7)``, ``ovs-tcpundump(1)``, ``tcpdump(8)``,
+``wireshark(8)``.
diff --git a/NEWS b/NEWS
index b92cec532..f3a4bf076 100644
--- a/NEWS
+++ b/NEWS
@@ -7,6 +7,12 @@ Post-v3.3.0
- The primary development branch has been renamed from 'master' to 'main'.
  The OVS tree remains hosted on GitHub.
  https://github.com/openvswitch/ovs.git
+   - ovs-vsctl:
+ * Added a new filter column in the Mirror table which can be used to
+   apply filters to mirror ports.
+   - ovs-tcpdump:
+ * Added command line parameter --filter to enable filtering the flows
+   that are captured by tcpdump.
 
 
 v3.3.0 - 16 Feb 2024
diff --git a/lib/flow.h b/lib/flow.h
index 75a9be3c1..60ec4b0d7 100644
--- a/lib/flow.h
+++ b/lib/flow.h
@@ -939,6 +939,15 @@ flow_union_with_miniflow(struct flow *dst, const struct 
miniflow *src)
 flow_union_with_miniflow_subset(dst, src, src->map);
 }
 
+/* Perform a bitwise OR of minimask 'src' mask data with the equivalent
+ * fields in 'dst', storing the result in 'dst'. */
+static inline void
+flow_wildcards_union_with_minimask(struct flow_wildcards *dst,
+   const struct minimask *src)
+{
+flow_union_with_miniflow_subset(>masks, >masks, src->masks.map);
+}
+
 static inline bool is_ct_valid(const struct flow *flow,
const struct flow_wildcards *mask,
struct flow_wildcards *wc)
diff --git a/ofproto/ofproto-dpif-mirror.c b/ofproto/ofproto-dpif-mirror.c
index 4967ecc9a..7020a5a5f 100644
--- a/ofproto/ofproto-dpif-mirror.c
+++ b/ofproto/ofproto-dpif-mirror.c
@@ -21,6 +21,7 @@
 #include "cmap.h"
 #include "hmapx.h"
 #include "ofproto.h"
+#include "ofproto-dpif-trace.h"
 #include "vlan-bitmap.h"
 #include "openvswitch/vlog.h"
 
@@ -48,6 +49,11 @@ struct mbundle {
 mirror_mask_t mirror_out;   /* Mirrors that output to this mbundle. */
 };
 
+struct filtermask {
+struct miniflow *flow;
+struct minimask *mask;
+};
+
 struct mirror {
 struct mbridge *mbridge;/* Owning ofproto. */
 size_t idx; /* In ofproto's "mirrors" array. */
@@ -57,6 +63,10 @@ struct mirror {
 struct hmapx srcs;  /* Contains "struct mbundle*"s. */
 struct hmapx dsts;  /* Contains "struct mbundle*"s. */
 
+/* Filter criteria. */
+OVSRCU_TYPE(struct filtermask *) filter_mask;
+char *filter_str;
+
 /* This is accessed by handler threads assuming RCU protection (see
  * mirror_get()), but can be manipulated by mirror_set() without any
  * explicit synchronization. */
@@ -83,6 +93,23 @@ static void mbundle_lookup_multiple(const struct mbridge *, 
struct ofbundle **,
 static int mirror_scan(struct

[ovs-dev] [PATCH v8 1/2] ofproto-dpif-mirror: Reduce number of function parameters.

2024-04-29 Thread Mike Pattrick
Previously the mirror_set() and mirror_get() functions took a large
number of parameters, which was inefficient and difficult to read and
extend. This patch moves most of the parameters into a struct.

Signed-off-by: Mike Pattrick 
Acked-by: Simon Horman 
Acked-by: Eelco Chaudron 
Signed-off-by: Mike Pattrick 
---
 ofproto/ofproto-dpif-mirror.c | 60 ++-
 ofproto/ofproto-dpif-mirror.h | 40 ++-
 ofproto/ofproto-dpif-xlate.c  | 29 -
 ofproto/ofproto-dpif.c| 23 +++---
 4 files changed, 88 insertions(+), 64 deletions(-)

diff --git a/ofproto/ofproto-dpif-mirror.c b/ofproto/ofproto-dpif-mirror.c
index 343b75f0e..4967ecc9a 100644
--- a/ofproto/ofproto-dpif-mirror.c
+++ b/ofproto/ofproto-dpif-mirror.c
@@ -207,19 +207,22 @@ mirror_bundle_dst(struct mbridge *mbridge, struct 
ofbundle *ofbundle)
 }
 
 int
-mirror_set(struct mbridge *mbridge, void *aux, const char *name,
-   struct ofbundle **srcs, size_t n_srcs,
-   struct ofbundle **dsts, size_t n_dsts,
-   unsigned long *src_vlans, struct ofbundle *out_bundle,
-   uint16_t snaplen,
-   uint16_t out_vlan)
+mirror_set(struct mbridge *mbridge, void *aux,
+   const struct ofproto_mirror_settings *ms,
+   const struct mirror_bundles *mb)
 {
 struct mbundle *mbundle, *out;
 mirror_mask_t mirror_bit;
 struct mirror *mirror;
 struct hmapx srcs_map;  /* Contains "struct ofbundle *"s. */
 struct hmapx dsts_map;  /* Contains "struct ofbundle *"s. */
+uint16_t out_vlan;
 
+if (!ms || !mbridge) {
+return EINVAL;
+}
+
+out_vlan = ms->out_vlan;
 mirror = mirror_lookup(mbridge, aux);
 if (!mirror) {
 int idx;
@@ -227,7 +230,7 @@ mirror_set(struct mbridge *mbridge, void *aux, const char 
*name,
 idx = mirror_scan(mbridge);
 if (idx < 0) {
 VLOG_WARN("maximum of %d port mirrors reached, cannot create %s",
-  MAX_MIRRORS, name);
+  MAX_MIRRORS, ms->name);
 return EFBIG;
 }
 
@@ -242,8 +245,8 @@ mirror_set(struct mbridge *mbridge, void *aux, const char 
*name,
 unsigned long *vlans = ovsrcu_get(unsigned long *, >vlans);
 
 /* Get the new configuration. */
-if (out_bundle) {
-out = mbundle_lookup(mbridge, out_bundle);
+if (mb->out_bundle) {
+out = mbundle_lookup(mbridge, mb->out_bundle);
 if (!out) {
 mirror_destroy(mbridge, mirror->aux);
 return EINVAL;
@@ -252,16 +255,16 @@ mirror_set(struct mbridge *mbridge, void *aux, const char 
*name,
 } else {
 out = NULL;
 }
-mbundle_lookup_multiple(mbridge, srcs, n_srcs, _map);
-mbundle_lookup_multiple(mbridge, dsts, n_dsts, _map);
+mbundle_lookup_multiple(mbridge, mb->srcs, mb->n_srcs, _map);
+mbundle_lookup_multiple(mbridge, mb->dsts, mb->n_dsts, _map);
 
 /* If the configuration has not changed, do nothing. */
 if (hmapx_equals(_map, >srcs)
 && hmapx_equals(_map, >dsts)
-&& vlan_bitmap_equal(vlans, src_vlans)
+&& vlan_bitmap_equal(vlans, ms->src_vlans)
 && mirror->out == out
 && mirror->out_vlan == out_vlan
-&& mirror->snaplen == snaplen)
+&& mirror->snaplen == ms->snaplen)
 {
 hmapx_destroy(_map);
 hmapx_destroy(_map);
@@ -275,15 +278,15 @@ mirror_set(struct mbridge *mbridge, void *aux, const char 
*name,
 hmapx_swap(_map, >dsts);
 hmapx_destroy(_map);
 
-if (vlans || src_vlans) {
+if (vlans || ms->src_vlans) {
 ovsrcu_postpone(free, vlans);
-vlans = vlan_bitmap_clone(src_vlans);
+vlans = vlan_bitmap_clone(ms->src_vlans);
 ovsrcu_set(>vlans, vlans);
 }
 
 mirror->out = out;
 mirror->out_vlan = out_vlan;
-mirror->snaplen = snaplen;
+mirror->snaplen = ms->snaplen;
 
 /* Update mbundles. */
 mirror_bit = MIRROR_MASK_C(1) << mirror->idx;
@@ -406,23 +409,22 @@ mirror_update_stats(struct mbridge *mbridge, 
mirror_mask_t mirrors,
 /* Retrieves the mirror numbered 'index' in 'mbridge'.  Returns true if such a
  * mirror exists, false otherwise.
  *
- * If successful, '*vlans' receives the mirror's VLAN membership information,
+ * If successful 'mc->vlans' receives the mirror's VLAN membership information,
  * either a null pointer if the mirror includes all VLANs or a 4096-bit bitmap
  * in which a 1-bit indicates that the mirror includes a particular VLAN,
- * '*dup_mirrors' receives a bitmap of mirrors whose output duplicates mirror
- * 'index', '*out' receives the output ofbundle (if any), and '*out_vlan'
- * receives the output VLAN (if any).
+ * 'mc->dup_mirrors' receives a bitmap of mirrors whose output duplic

Re: [ovs-dev] [PATCH 2/2] ovsdb: raft: Fix probe intervals after install snapshot request.

2024-04-17 Thread Mike Pattrick
On Thu, Apr 11, 2024 at 7:45 PM Ilya Maximets  wrote:
>
> If the new snapshot received with INSTALL_SNAPSHOT request contains
> a different election timer value, the timer is updated, but the
> probe intervals for RAFT connections are not.
>
> Fix that by updating probe intervals whenever we get election timer
> from the log.
>
> Fixes: 14b2b0aad7ae ("raft: Reintroduce jsonrpc inactivity probes.")
> Signed-off-by: Ilya Maximets 

Acked-by: Mike Pattrick 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 1/2] ovsdb: raft: Fix inability to join a cluster with a large database.

2024-04-17 Thread Mike Pattrick
On Thu, Apr 11, 2024 at 7:44 PM Ilya Maximets  wrote:
>
> Inactivity probe interval on RAFT connections depend on a value of the
> election timer.  However, the actual value is not known until the
> database snapshot with the RAFT information is received by a joining
> server.  New joining server is using a default 1 second until then.
>
> In case a new joining server is trying to join an existing cluster
> with a large database, it may take more than a second to generate and
> send an initial database snapshot.  This is causing an inability to
> actually join this cluster.  Joining server sends ADD_SERVER request,
> waits 1 second, sends a probe, doesn't get a reply within another
> second, because the leader is busy preparing and sending an initial
> snapshot to it, disconnects, repeat.
>
> This is not an issue for the servers that did already join, since
> their probe intervals are larger than election timeout.
> Cooperative multitasking also doesn't fully solve this issue, since
> it depends on election timer, which is likely higher in the existing
> cluster with a very big database.
>
> Fix that by using the maximum election timer value for inactivity
> probes until the actual value is known.  We still shouldn't completely
> disable the probes, because in the rare event the connection is
> established but the other side silently goes away, we still want to
> disconnect and try to re-establish the connection eventually.
>
> Since probe intervals also depend on the joining state now, update
> them when the server joins the cluster.
>
> Fixes: 14b2b0aad7ae ("raft: Reintroduce jsonrpc inactivity probes.")
> Reported-by: Terry Wilson 
> Reported-at: https://issues.redhat.com/browse/FDP-144
> Signed-off-by: Ilya Maximets 
> ---

Acked-by: Mike Pattrick 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] Userspace: Software fallback for UDP encapsulated TCP segmentation.

2024-03-27 Thread Mike Pattrick
On Wed, Mar 27, 2024 at 1:39 PM Simon Horman  wrote:
>
> On Tue, Feb 20, 2024 at 11:08:55PM -0500, Mike Pattrick wrote:
> > When sending packets that are flagged as requiring segmentation to an
> > interface that doens't support this feature, send the packet to the TSO
> > software fallback instead of dropping it.
> >
> > Signed-off-by: Mike Pattrick 
>
> Hi Mike,
>
> Can I confirm that from your PoV this patch is still awaiting review?
> I ask because it's been sitting around for a while now.

I believe this patch now needs to be modified for the recent
recirculation change. I'll update it for that, give it another once
over, and resubmit.

-M


>

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v5] tunnel: Allow UDP zero checksum with IPv6 tunnels.

2024-03-27 Thread Mike Pattrick
This patch adopts the proposed RFC 6935 by allowing null UDP checksums
even if the tunnel protocol is IPv6. This is already supported by Linux
through the udp6zerocsumtx tunnel option. It is disabled by default and
IPv6 tunnels are flagged as requiring a checksum, but this patch enables
the user to set csum=false on IPv6 tunnels.

Signed-off-by: Mike Pattrick 
---
v2: Changed documentation, and added a NEWS item
v3: NEWS file merge conflict
v4: Better comments, new test
v5: Addressed identified nit's
---
 NEWS  |  4 
 lib/netdev-native-tnl.c   |  2 +-
 lib/netdev-vport.c| 17 +++--
 lib/netdev.h  | 18 +-
 ofproto/tunnel.c  | 10 --
 tests/tunnel-push-pop-ipv6.at |  9 +
 tests/tunnel-push-pop.at  |  7 +++
 tests/tunnel.at   |  2 +-
 vswitchd/vswitch.xml  | 12 +---
 9 files changed, 71 insertions(+), 10 deletions(-)

diff --git a/NEWS b/NEWS
index c9e4064e6..6c8c4a2dc 100644
--- a/NEWS
+++ b/NEWS
@@ -4,6 +4,10 @@ Post-v3.3.0
  * Conntrack now supports 'random' flag for selecting ports in a range
while natting and 'persistent' flag for selection of the IP address
from a range.
+ * IPv6 UDP tunnel encapsulation including Geneve and VXLAN will now
+   honour the csum option.  Configuring the interface with
+   "options:csum=false" now has the same effect as the udp6zerocsumtx
+   option has with Linux kernel UDP tunnels.
 
 
 v3.3.0 - 16 Feb 2024
diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
index dee9ab344..e8258bc4e 100644
--- a/lib/netdev-native-tnl.c
+++ b/lib/netdev-native-tnl.c
@@ -424,7 +424,7 @@ udp_build_header(const struct netdev_tunnel_config *tnl_cfg,
 udp = netdev_tnl_ip_build_header(data, params, IPPROTO_UDP, 0);
 udp->udp_dst = tnl_cfg->dst_port;
 
-if (params->is_ipv6 || params->flow->tunnel.flags & FLOW_TNL_F_CSUM) {
+if (params->flow->tunnel.flags & FLOW_TNL_F_CSUM) {
 /* Write a value in now to mark that we should compute the checksum
  * later. 0x is handy because it is transparent to the
  * calculation. */
diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
index 60caa02fb..234a4ebe1 100644
--- a/lib/netdev-vport.c
+++ b/lib/netdev-vport.c
@@ -702,7 +702,9 @@ set_tunnel_config(struct netdev *dev_, const struct smap 
*args, char **errp)
 tnl_cfg.dst_port = htons(atoi(node->value));
 } else if (!strcmp(node->key, "csum") && has_csum) {
 if (!strcmp(node->value, "true")) {
-tnl_cfg.csum = true;
+tnl_cfg.csum = NETDEV_TNL_CSUM_ENABLED;
+} else if (!strcmp(node->value, "false")) {
+tnl_cfg.csum = NETDEV_TNL_CSUM_DISABLED;
 }
 } else if (!strcmp(node->key, "seq") && has_seq) {
 if (!strcmp(node->value, "true")) {
@@ -850,6 +852,15 @@ set_tunnel_config(struct netdev *dev_, const struct smap 
*args, char **errp)
 }
 }
 
+/* The default csum state for GRE is special as it does have an optional
+ * checksum but the default configuration isn't correlated with IP version
+ * like UDP tunnels are.  Likewise, tunnels with no checksum at all must be
+ * in this state. */
+if (tnl_cfg.csum == NETDEV_TNL_CSUM_DEFAULT &&
+(!has_csum || strstr(type, "gre"))) {
+tnl_cfg.csum = NETDEV_TNL_DEFAULT_NO_CSUM;
+}
+
 enum tunnel_layers layers = tunnel_supported_layers(type, _cfg);
 const char *full_type = (strcmp(type, "vxlan") ? type
  : (tnl_cfg.exts & (1 << OVS_VXLAN_EXT_GPE)
@@ -1026,8 +1037,10 @@ get_tunnel_config(const struct netdev *dev, struct smap 
*args)
 }
 }
 
-if (tnl_cfg->csum) {
+if (tnl_cfg->csum == NETDEV_TNL_CSUM_ENABLED) {
 smap_add(args, "csum", "true");
+} else if (tnl_cfg->csum == NETDEV_TNL_CSUM_DISABLED) {
+smap_add(args, "csum", "false");
 }
 
 if (tnl_cfg->set_seq) {
diff --git a/lib/netdev.h b/lib/netdev.h
index 67a8486bd..5d253157c 100644
--- a/lib/netdev.h
+++ b/lib/netdev.h
@@ -111,6 +111,22 @@ enum netdev_srv6_flowlabel {
 SRV6_FLOWLABEL_COMPUTE,
 };
 
+enum netdev_tnl_csum {
+/* Default value for UDP tunnels if no configurations is present.  Enforce
+ * checksum calculation in IPv6 tunnels, disable in IPv4 tunnels. */
+NETDEV_TNL_CSUM_DEFAULT = 0,
+
+/* Checksum explicitly to be calculated. */
+NETDEV_TNL_CSUM_ENABLED,
+
+/* Checksum calculation explicitly disabled. */
+NETDEV_TNL_CSUM_DISABLED,
+
+/* A value for when there is no checksum or the default value is no
+ * checksum reguardless of IP version. */
+NETDEV_TNL_DEFAULT_NO_C

Re: [ovs-dev] [PATCH] ofproto-dpif-upcall: Don't mirror packets that aren't modified.

2024-03-26 Thread Mike Pattrick
On Mon, Mar 25, 2024 at 10:04 PM Zhangweiwei  wrote:
>
> The ethernet addresses of two ICMP request packets are indeed different . One 
> is original packet and the other is modified. It is an expected behavior 
> according to the code.
> Actually, when a packet sent by port A is changed by flow table and then is 
> sent to itself, we expect to capture this packet. However, when this packet 
> is changed and is sent to another port, should we still capture the packet on 
> port A?

Currently in ovs-tcpdump we capture the packet on ingress and then on
egress if it is modified. You could make the mirror without
ovs-tcpdump, and only set the select_dst_port option but not the
select_src_port one. select_src_port is checked during ingress and
select_dst_port is set during egress.

Hope this helps,
M

>
> [root@localhost infiniband]# ovs-tcpdump -i tapVm71 -nnvve
> 11.11.70.1 > 1.1.70.2: ICMP echo request, id 15498, seq 17, length 64
> 09:36:52.822232 68:05:ca:21:d6:e5 > 52:54:00:67:d5:61, ethertype IPv4 
> (0x0800), length 98: (tos 0x0, ttl 63, id 22101, offset 0, flags [none], 
> proto ICMP (1), length 84)
> 1.1.70.2 > 11.11.70.1: ICMP echo reply, id 15498, seq 17, length 64
> 09:36:53.862137 52:54:00:67:d5:61 > 68:05:ca:21:d6:e5, ethertype IPv4 
> (0x0800), length 98: (tos 0x0, ttl 64, id 26518, offset 0, flags [DF], proto 
> ICMP (1), length 84)
> 11.11.70.1 > 1.1.70.2: ICMP echo request, id 15498, seq 18, length 64
> 09:36:53.862139 68:05:ca:21:d6:e5 > 52:54:00:9a:bf:ed, ethertype IPv4 
> (0x0800), length 98: (tos 0x0, ttl 63, id 26518, offset 0, flags [DF], proto 
> ICMP (1), length 84)
> 11.11.70.1 > 1.1.70.2: ICMP echo request, id 15498, seq 18, length 64
> 09:36:53.862230 68:05:ca:21:d6:e5 > 52:54:00:67:d5:61, ethertype IPv4 
> (0x0800), length 98: (tos 0x0, ttl 63, id 22176, offset 0, flags [none], 
> proto ICMP (1), length 84)
> 1.1.70.2 > 11.11.70.1: ICMP echo reply, id 15498, seq 18, length 64
>
> -邮件原件-
> 发件人: Mike Pattrick [mailto:m...@redhat.com]
> 发送时间: 2024年3月25日 22:26
> 收件人: zhangweiwei (RD) 
> 抄送: d...@openvswitch.org
> 主题: Re: [PATCH] ofproto-dpif-upcall: Don't mirror packets that aren't 
> modified.
>
> On Mon, Mar 25, 2024 at 3:48 AM Zhangweiwei  wrote:
> >
> > Hi,
> > I have tried this patch, however, there are still some issues when the 
> > packets contents are changed across recirculation. On the follow example, 
> > packets are modified in recirc_id(0) after mirror, the mirror context 
> > reset. Therefore, there are two ICMP request packets are mirrored on port 
> > mitapVm71.
> >
> > In the following example, ICMP packets ared sent from port(11) to
> > port(14), [root@localhost ~]# ovs-appctl dpif/dump-flows vds1-br
> > ct_state(-new-est-rel-rpl-inv-trk),recirc_id(0),in_port(11),packet_typ
> > e(ns=0,id=0),eth(src=52:54:00:67:d5:61,dst=68:05:ca:21:d6:e5),eth_type
> > (0x0800),ipv4(src=11.11.70.1,dst=1.1.70.2,proto=1,ttl=64,frag=no),
> > packets:431, bytes:42238, used:0.574s,
> > actions:10,set(eth(src=68:05:ca:21:d6:e5,dst=52:54:00:9a:bf:ed)),set(i
> > pv4(ttl=63)),ct(zone=6),recirc(0x3e8)
> > ct_state(+est-rel-rpl),recirc_id(0x3e8),in_port(11),packet_type(ns=0,i
> > d=0),eth_type(0x0800),ipv4(frag=no), packets:430, bytes:42140,
> > used:0.574s, actions:10,14
> >
> > ct_state(-new+est-rel+rpl-inv+trk),recirc_id(0x3e9),in_port(14),packet
> > _type(ns=0,id=0),eth(src=52:54:00:9a:bf:ed,dst=68:05:ca:21:d6:e5),eth_
> > type(0x0800),ipv4(dst=11.11.70.1,proto=1,ttl=64,frag=no), packets:431,
> > bytes:42238, used:0.574s,
> > actions:set(eth(src=68:05:ca:21:d6:e5,dst=52:54:00:67:d5:61)),set(ipv4
> > (ttl=63)),11,10
> > ct_state(-trk),recirc_id(0),in_port(14),packet_type(ns=0,id=0),eth(src
> > =52:54:00:9a:bf:ed),eth_type(0x0800),ipv4(src=1.1.70.2,proto=1,frag=no
> > ), packets:431, bytes:42238, used:0.574s,
> > actions:ct(zone=6),recirc(0x3e9)
> >
> > [root@localhost ~]# ovs-appctl dpif/show
> > netdev@ovs-netdev: hit:2552 missed:3019
> >   vds1-br:
> > mitapVm71 14/10: (system)
> > tapVm71 5/11: (dpdkvhostuserclient: configured_rx_queues=1, 
> > configured_tx_queues=1, mtu=1500, requested_rx_queues=1, 
> > requested_tx_queues=1)
> > tapVm72 6/14: (dpdkvhostuserclient: configured_rx_queues=1,
> > configured_tx_queues=1, mtu=1500, requested_rx_queues=1,
> > requested_tx_queues=1)
> >
> > [root@localhost ~]# ovs-tcpdump -i tapVm71
> > 14:38:53.702142 IP 11.11.70.1 > 1.1.70.2: ICMP echo request, id 13483,
> > seq 2014, length 64
> > 14:38:53.702143 IP 11.11.70.1 > 1.1.70.2: ICMP echo request, id 13483,
> > seq 2014, length 64

Re: [ovs-dev] [PATCH] ofproto-dpif-upcall: Don't mirror packets that aren't modified.

2024-03-25 Thread Mike Pattrick
On Mon, Mar 25, 2024 at 3:48 AM Zhangweiwei  wrote:
>
> Hi,
> I have tried this patch, however, there are still some issues when the 
> packets contents are changed across recirculation. On the follow example, 
> packets are modified in recirc_id(0) after mirror, the mirror context reset. 
> Therefore, there are two ICMP request packets are mirrored on port mitapVm71.
>
> In the following example, ICMP packets ared sent from port(11) to port(14),
> [root@localhost ~]# ovs-appctl dpif/dump-flows vds1-br
> ct_state(-new-est-rel-rpl-inv-trk),recirc_id(0),in_port(11),packet_type(ns=0,id=0),eth(src=52:54:00:67:d5:61,dst=68:05:ca:21:d6:e5),eth_type(0x0800),ipv4(src=11.11.70.1,dst=1.1.70.2,proto=1,ttl=64,frag=no),
>  packets:431, bytes:42238, used:0.574s, 
> actions:10,set(eth(src=68:05:ca:21:d6:e5,dst=52:54:00:9a:bf:ed)),set(ipv4(ttl=63)),ct(zone=6),recirc(0x3e8)
> ct_state(+est-rel-rpl),recirc_id(0x3e8),in_port(11),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
>  packets:430, bytes:42140, used:0.574s, actions:10,14
>
> ct_state(-new+est-rel+rpl-inv+trk),recirc_id(0x3e9),in_port(14),packet_type(ns=0,id=0),eth(src=52:54:00:9a:bf:ed,dst=68:05:ca:21:d6:e5),eth_type(0x0800),ipv4(dst=11.11.70.1,proto=1,ttl=64,frag=no),
>  packets:431, bytes:42238, used:0.574s, 
> actions:set(eth(src=68:05:ca:21:d6:e5,dst=52:54:00:67:d5:61)),set(ipv4(ttl=63)),11,10
> ct_state(-trk),recirc_id(0),in_port(14),packet_type(ns=0,id=0),eth(src=52:54:00:9a:bf:ed),eth_type(0x0800),ipv4(src=1.1.70.2,proto=1,frag=no),
>  packets:431, bytes:42238, used:0.574s, actions:ct(zone=6),recirc(0x3e9)
>
> [root@localhost ~]# ovs-appctl dpif/show
> netdev@ovs-netdev: hit:2552 missed:3019
>   vds1-br:
> mitapVm71 14/10: (system)
> tapVm71 5/11: (dpdkvhostuserclient: configured_rx_queues=1, 
> configured_tx_queues=1, mtu=1500, requested_rx_queues=1, 
> requested_tx_queues=1)
> tapVm72 6/14: (dpdkvhostuserclient: configured_rx_queues=1, 
> configured_tx_queues=1, mtu=1500, requested_rx_queues=1, 
> requested_tx_queues=1)
>
> [root@localhost ~]# ovs-tcpdump -i tapVm71
> 14:38:53.702142 IP 11.11.70.1 > 1.1.70.2: ICMP echo request, id 13483, seq 
> 2014, length 64
> 14:38:53.702143 IP 11.11.70.1 > 1.1.70.2: ICMP echo request, id 13483, seq 
> 2014, length 64
> 14:38:53.702185 IP 1.1.70.2 > 11.11.70.1: ICMP echo reply, id 13483, seq 
> 2014, length 64
> 14:38:54.742141 IP 11.11.70.1 > 1.1.70.2: ICMP echo request, id 13483, seq 
> 2015, length 64
> 14:38:54.742143 IP 11.11.70.1 > 1.1.70.2: ICMP echo request, id 13483, seq 
> 2015, length 64
> 14:38:54.742183 IP 1.1.70.2 > 11.11.70.1: ICMP echo reply, id 13483, seq 
> 2015, length 64
> 14:38:55.782142 IP 11.11.70.1 > 1.1.70.2: ICMP echo request, id 13483, seq 
> 2016, length 64
> 14:38:55.782144 IP 11.11.70.1 > 1.1.70.2: ICMP echo request, id 13483, seq 
> 2016, length 64
> 14:38:55.782186 IP 1.1.70.2 > 11.11.70.1: ICMP echo reply, id 13483, seq 
> 2016, length 64


Hello, thanks for the report. Is it possible to run the command
"ovs-tcpdump -i tapVm71 -ennvv" ? I ask because I see your actions
reset the ethernet address. If the ethernet address is different then
this would be the expected behavior, the collection of the packet as
it enters, and then as it exists modified.


Thank you,

Mike


>
> -邮件原件-
> 发件人: Mike Pattrick [mailto:m...@redhat.com]
> 发送时间: 2024年3月13日 1:37
> 收件人: d...@openvswitch.org
> 抄送: Mike Pattrick ; zhangweiwei (RD) 
> 主题: [PATCH] ofproto-dpif-upcall: Don't mirror packets that aren't modified.
>
> Previously OVS reset the mirror contents when a packet is modified in such a 
> way that the packets contents changes. However, this change incorrectly reset 
> that mirror context when only metadata changes as well.
>
> Now we check for all metadata fields, instead of just tunnel metadata, before 
> resetting the mirror context.
>
> Fixes: feed7f677505 ("ofproto-dpif-upcall: Mirror packets that are modified.")
> Reported-by: Zhangweiwei 
> Signed-off-by: Mike Pattrick 
> ---
>  include/openvswitch/meta-flow.h |   1 +
>  lib/meta-flow.c | 109 
>  ofproto/ofproto-dpif-xlate.c|   2 +-
>  tests/ofproto-dpif.at   |   5 +-
>  4 files changed, 114 insertions(+), 3 deletions(-)
>
> diff --git a/include/openvswitch/meta-flow.h 
> b/include/openvswitch/meta-flow.h index 3b0220aaa..96aad3933 100644
> --- a/include/openvswitch/meta-flow.h
> +++ b/include/openvswitch/meta-flow.h
> @@ -2305,6 +2305,7 @@ void mf_set_flow_value_masked(const struct mf_field *,
>const union mf_value *mask,
>struct flow *);  bool mf_is_tun_metadata(const 
> struct mf_field *);
> +bool 

Re: [ovs-dev] [PATCH] dpif-netdev: Fix crash due to tunnel offloading on recirculation.

2024-03-22 Thread Mike Pattrick
On Fri, Mar 22, 2024 at 10:41 AM Ilya Maximets  wrote:
>
> Recirculation involves re-parsing the packet from scratch and that
> process is not aware of multiple header levels nor the inner/outer
> offsets.  So, it overwrites offsets with new ones from the outermost
> headers and sets offloading flags that change their meaning when
> the packet is marked for tunnel offloading.
>
> For example:
>
>  1. TCP packet enters OVS.
>  2. TCP packet gets encapsulated into UDP tunnel.
>  3. Recirculation happens.
>  4. Packet is re-parsed after recirculation with miniflow_extract()
> or similar function.
>  5. Packet is marked for UDP checksumming because we parse the
> outermost set of headers.  But since it is tunneled, it means
> inner UDP checksumming.  And that makes no sense, because the
> inner packet is TCP.
>
> This is causing packet drops due to malformed packets or even
> assertions and crashes in the code that is trying to fixup checksums
> for packets using incorrect metadata:
>
>  SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
>
>  lib/packets.c:2061:15: runtime error:
> member access within null pointer of type 'struct udp_header'
>
>   0 0xbe5221 in packet_udp_complete_csum lib/packets.c:2061:15
>   1 0x7e5662 in dp_packet_ol_send_prepare lib/dp-packet.c:638:9
>   2 0x96ef89 in netdev_send lib/netdev.c:940:9
>   3 0x818e94 in dp_netdev_pmd_flush_output_on_port lib/dpif-netdev.c:5577:9
>   4 0x817606 in dp_netdev_pmd_flush_output_packets lib/dpif-netdev.c:5618:27
>   5 0x81cfa5 in dp_netdev_process_rxq_port lib/dpif-netdev.c:5677:9
>   6 0x7eefe4 in dpif_netdev_run lib/dpif-netdev.c:7001:25
>   7 0x610e87 in type_run ofproto/ofproto-dpif.c:367:9
>   8 0x5b9e80 in ofproto_type_run ofproto/ofproto.c:1879:31
>   9 0x55bbb4 in bridge_run__ vswitchd/bridge.c:3281:9
>  10 0x558b6b in bridge_run vswitchd/bridge.c:3346:5
>  11 0x591dc5 in main vswitchd/ovs-vswitchd.c:130:9
>  12 0x172b89 in __libc_start_call_main (/lib64/libc.so.6+0x27b89)
>  13 0x172c4a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x27c4a)
>  14 0x47eff4 in _start (vswitchd/ovs-vswitchd+0x47eff4)
>
> Tests added for both IPv4 and IPv6 cases.  Though IPv6 test doesn't
> trigger the issue it's better to have a symmetric test.
>
> Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.")
> Reported-at: 
> https://mail.openvswitch.org/pipermail/ovs-discuss/2024-March/053014.html
> Signed-off-by: Ilya Maximets 
> ---

I have tested this, and it does fix the segfault here.

Acked-by: Mike Pattrick 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v4] tunnel: Allow UDP zero checksum with IPv6 tunnels.

2024-03-21 Thread Mike Pattrick
This patch adopts the proposed RFC 6935 by allowing null UDP checksums
even if the tunnel protocol is IPv6. This is already supported by Linux
through the udp6zerocsumtx tunnel option. It is disabled by default and
IPv6 tunnels are flagged as requiring a checksum, but this patch enables
the user to set csum=false on IPv6 tunnels.

Signed-off-by: Mike Pattrick 
---
v2: Changed documentation, and added a NEWS item
v3: NEWS file merge conflict
v4: Better comments, new test
---
 NEWS  |  4 
 lib/netdev-native-tnl.c   |  2 +-
 lib/netdev-vport.c| 17 +++--
 lib/netdev.h  | 18 +-
 ofproto/tunnel.c  | 11 +--
 tests/tunnel-push-pop-ipv6.at |  9 +
 tests/tunnel-push-pop.at  |  7 +++
 tests/tunnel.at   |  2 +-
 vswitchd/vswitch.xml  | 12 +---
 9 files changed, 72 insertions(+), 10 deletions(-)

diff --git a/NEWS b/NEWS
index c9e4064e6..6c8c4a2dc 100644
--- a/NEWS
+++ b/NEWS
@@ -4,6 +4,10 @@ Post-v3.3.0
  * Conntrack now supports 'random' flag for selecting ports in a range
while natting and 'persistent' flag for selection of the IP address
from a range.
+ * IPv6 UDP tunnel encapsulation including Geneve and VXLAN will now
+   honour the csum option.  Configuring the interface with
+   "options:csum=false" now has the same effect as the udp6zerocsumtx
+   option has with Linux kernel UDP tunnels.
 
 
 v3.3.0 - 16 Feb 2024
diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
index dee9ab344..e8258bc4e 100644
--- a/lib/netdev-native-tnl.c
+++ b/lib/netdev-native-tnl.c
@@ -424,7 +424,7 @@ udp_build_header(const struct netdev_tunnel_config *tnl_cfg,
 udp = netdev_tnl_ip_build_header(data, params, IPPROTO_UDP, 0);
 udp->udp_dst = tnl_cfg->dst_port;
 
-if (params->is_ipv6 || params->flow->tunnel.flags & FLOW_TNL_F_CSUM) {
+if (params->flow->tunnel.flags & FLOW_TNL_F_CSUM) {
 /* Write a value in now to mark that we should compute the checksum
  * later. 0x is handy because it is transparent to the
  * calculation. */
diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
index 60caa02fb..e51542e32 100644
--- a/lib/netdev-vport.c
+++ b/lib/netdev-vport.c
@@ -702,7 +702,9 @@ set_tunnel_config(struct netdev *dev_, const struct smap 
*args, char **errp)
 tnl_cfg.dst_port = htons(atoi(node->value));
 } else if (!strcmp(node->key, "csum") && has_csum) {
 if (!strcmp(node->value, "true")) {
-tnl_cfg.csum = true;
+tnl_cfg.csum = NETDEV_TNL_CSUM_ENABLED;
+} else if (!strcmp(node->value, "false")) {
+tnl_cfg.csum = NETDEV_TNL_CSUM_DISABLED;
 }
 } else if (!strcmp(node->key, "seq") && has_seq) {
 if (!strcmp(node->value, "true")) {
@@ -850,6 +852,15 @@ set_tunnel_config(struct netdev *dev_, const struct smap 
*args, char **errp)
 }
 }
 
+/* The default csum state for GRE is special as it does have an optional
+ * checksum but the default configuration isn't correlated with IP version
+ * like UDP tunnels are.  Likewise, tunnels with checksum at all must be in
+ * this state. */
+if (tnl_cfg.csum == NETDEV_TNL_CSUM_DEFAULT &&
+(!has_csum || strstr(type, "gre"))) {
+tnl_cfg.csum = NETDEV_TNL_DEFAULT_NO_CSUM;
+}
+
 enum tunnel_layers layers = tunnel_supported_layers(type, _cfg);
 const char *full_type = (strcmp(type, "vxlan") ? type
  : (tnl_cfg.exts & (1 << OVS_VXLAN_EXT_GPE)
@@ -1026,8 +1037,10 @@ get_tunnel_config(const struct netdev *dev, struct smap 
*args)
 }
 }
 
-if (tnl_cfg->csum) {
+if (tnl_cfg->csum == NETDEV_TNL_CSUM_ENABLED) {
 smap_add(args, "csum", "true");
+} else if (tnl_cfg->csum == NETDEV_TNL_CSUM_DISABLED) {
+smap_add(args, "csum", "false");
 }
 
 if (tnl_cfg->set_seq) {
diff --git a/lib/netdev.h b/lib/netdev.h
index 67a8486bd..5d253157c 100644
--- a/lib/netdev.h
+++ b/lib/netdev.h
@@ -111,6 +111,22 @@ enum netdev_srv6_flowlabel {
 SRV6_FLOWLABEL_COMPUTE,
 };
 
+enum netdev_tnl_csum {
+/* Default value for UDP tunnels if no configurations is present.  Enforce
+ * checksum calculation in IPv6 tunnels, disable in IPv4 tunnels. */
+NETDEV_TNL_CSUM_DEFAULT = 0,
+
+/* Checksum explicitly to be calculated. */
+NETDEV_TNL_CSUM_ENABLED,
+
+/* Checksum calculation explicitly disabled. */
+NETDEV_TNL_CSUM_DISABLED,
+
+/* A value for when there is no checksum or the default value is no
+ * checksum reguardless of IP version. */
+NETDEV_TNL_DEFAULT_NO_CSUM,
+};
+
 /* Configuration spe

[ovs-dev] [PATCH v3] ovs-monitor-ipsec: LibreSwan autodetect paths.

2024-03-21 Thread Mike Pattrick
In v4.0, LibreSwan changed a default paths that had been hardcoded in
ovs-monitor-ipsec, breaking some uses of this script. This patch adds
support for both old and newer versions by auto detecting the version
of LibreSwan and then choosing the correct path.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1975039
Reported-by: Qijun Ding 
Fixes: d6afbc00d5b3 ("ipsec: Allow custom file locations.")
Signed-off-by: Mike Pattrick 
---
v2: Don't extract variables from ipsec script
v3: Removed use of packaging
---
 ipsec/ovs-monitor-ipsec.in | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/ipsec/ovs-monitor-ipsec.in b/ipsec/ovs-monitor-ipsec.in
index 7945162f9..bc7ac5523 100755
--- a/ipsec/ovs-monitor-ipsec.in
+++ b/ipsec/ovs-monitor-ipsec.in
@@ -457,14 +457,30 @@ conn prevent_unencrypted_vxlan
 CERTKEY_PREFIX = "ovs_certkey_"
 
 def __init__(self, libreswan_root_prefix, args):
+# Collect version infromation
+self.IPSEC = libreswan_root_prefix + "/usr/sbin/ipsec"
+proc = subprocess.Popen([self.IPSEC, "--version"],
+stdout=subprocess.PIPE,
+encoding="latin1")
+pout, perr = proc.communicate()
+
+v = re.match("^Libreswan (.*)$", pout)
+try:
+version = int(v.group(1).split(".")[0])
+except:
+version = 0
+
+if version >= 4:
+ipsec_d = args.ipsec_d if args.ipsec_d else "/var/lib/ipsec/nss"
+else:
+ipsec_d = args.ipsec_d if args.ipsec_d else "/etc/ipsec.d"
+
 ipsec_conf = args.ipsec_conf if args.ipsec_conf else "/etc/ipsec.conf"
-ipsec_d = args.ipsec_d if args.ipsec_d else "/etc/ipsec.d"
 ipsec_secrets = (args.ipsec_secrets if args.ipsec_secrets
 else "/etc/ipsec.secrets")
 ipsec_ctl = (args.ipsec_ctl if args.ipsec_ctl
 else "/run/pluto/pluto.ctl")
 
-self.IPSEC = libreswan_root_prefix + "/usr/sbin/ipsec"
 self.IPSEC_CONF = libreswan_root_prefix + ipsec_conf
 self.IPSEC_SECRETS = libreswan_root_prefix + ipsec_secrets
 self.IPSEC_D = "sql:" + libreswan_root_prefix + ipsec_d
-- 
2.39.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] ovs-monitor-ipsec: LibreSwan autodetect paths.

2024-03-20 Thread Mike Pattrick
On Wed, Mar 20, 2024 at 2:05 PM Mike Pattrick  wrote:
>
> In v4.0, LibreSwan changed a default paths that had been hardcoded in
> ovs-monitor-ipsec, breaking some uses of this script. This patch adds
> support for both old and newer versions by auto detecting the version
> of LibreSwan and then choosing the correct path.
>
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1975039
> Reported-by: Qijun Ding 
> Fixes: d6afbc00d5b3 ("ipsec: Allow custom file locations.")
> Signed-off-by: Mike Pattrick 
> ---
> v2: Don't extract variables from ipsec script
> ---

Failed with 503 Service Unavailable

Recheck-request: github-robot

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2] ovs-monitor-ipsec: LibreSwan autodetect paths.

2024-03-20 Thread Mike Pattrick
In v4.0, LibreSwan changed a default paths that had been hardcoded in
ovs-monitor-ipsec, breaking some uses of this script. This patch adds
support for both old and newer versions by auto detecting the version
of LibreSwan and then choosing the correct path.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1975039
Reported-by: Qijun Ding 
Fixes: d6afbc00d5b3 ("ipsec: Allow custom file locations.")
Signed-off-by: Mike Pattrick 
---
v2: Don't extract variables from ipsec script
---
 ipsec/ovs-monitor-ipsec.in | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/ipsec/ovs-monitor-ipsec.in b/ipsec/ovs-monitor-ipsec.in
index 7945162f9..6a71d4f2f 100755
--- a/ipsec/ovs-monitor-ipsec.in
+++ b/ipsec/ovs-monitor-ipsec.in
@@ -21,6 +21,7 @@ import re
 import subprocess
 import sys
 from string import Template
+from packaging.version import parse
 
 import ovs.daemon
 import ovs.db.idl
@@ -457,14 +458,25 @@ conn prevent_unencrypted_vxlan
 CERTKEY_PREFIX = "ovs_certkey_"
 
 def __init__(self, libreswan_root_prefix, args):
+# Collect version infromation
+self.IPSEC = libreswan_root_prefix + "/usr/sbin/ipsec"
+proc = subprocess.Popen([self.IPSEC, "--version"],
+stdout=subprocess.PIPE,
+encoding="latin1")
+pout, perr = proc.communicate()
+
+v = re.match("^Libreswan (.*)$", pout)
+if v and parse(v.group(1)) >= parse("4.0"):
+ipsec_d = args.ipsec_d if args.ipsec_d else "/var/lib/ipsec/nss"
+else:
+ipsec_d = args.ipsec_d if args.ipsec_d else "/etc/ipsec.d"
+
 ipsec_conf = args.ipsec_conf if args.ipsec_conf else "/etc/ipsec.conf"
-ipsec_d = args.ipsec_d if args.ipsec_d else "/etc/ipsec.d"
 ipsec_secrets = (args.ipsec_secrets if args.ipsec_secrets
 else "/etc/ipsec.secrets")
 ipsec_ctl = (args.ipsec_ctl if args.ipsec_ctl
 else "/run/pluto/pluto.ctl")
 
-self.IPSEC = libreswan_root_prefix + "/usr/sbin/ipsec"
 self.IPSEC_CONF = libreswan_root_prefix + ipsec_conf
 self.IPSEC_SECRETS = libreswan_root_prefix + ipsec_secrets
 self.IPSEC_D = "sql:" + libreswan_root_prefix + ipsec_d
-- 
2.39.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ovs-monitor-ipsec: LibreSwan autodetect paths.

2024-03-20 Thread Mike Pattrick
On Tue, Mar 19, 2024 at 5:35 PM Ilya Maximets  wrote:
>
> On 3/13/24 22:54, Mike Pattrick wrote:
> > In v4.0, LibreSwan changed a default paths that had been hardcoded in
> > ovs-monitor-ipsec, breaking some uses of this script. This patch adds
> > support for both old and newer versions by auto detecting the location
> > of these paths from LibreSwan shell script environment variables.
> >
> > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1975039
> > Reported-by: Qijun Ding 
> > Fixes: d6afbc00d5b3 ("ipsec: Allow custom file locations.")
> > Signed-off-by: Mike Pattrick 
> > ---
> >  ipsec/ovs-monitor-ipsec.in | 31 +++
> >  1 file changed, 27 insertions(+), 4 deletions(-)
> >
>
> Hi, Mike.  Thanks for working on this!
>
> Though using the knowledge that /usr/sbin/ipsec is a shell script
> and that it defines particular variables inside seems like a hack.
>
> Maybe we can just check the version instead?  We know that default
> nss path changed in 4.0.

My motivation for this method was because these paths could be changed
easier than reimplementing the ipsec script, by the maintainers or
downstream distributions. But there's nothing stopping us from
addressing any future changes as they happen, I'll resend with just a
fix for 4.0.

-M

>
> Best regards, Ilya Maximets.
>

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 3/3] netdev-dpdk: Fix tunnel type check during Tx offload preparation.

2024-03-14 Thread Mike Pattrick
On Wed, Mar 13, 2024 at 1:29 PM Ilya Maximets  wrote:
>
> Tunnel types are not flags, but 4-bit fields, so checking them with
> a simple binary 'and' is incorrect and may produce false-positive
> matches.
>
> While the current implementation is unlikely to cause any issues today,
> since both RTE_MBUF_F_TX_TUNNEL_VXLAN and RTE_MBUF_F_TX_TUNNEL_GENEVE
> only have 1 bit set, it is risky to have this code and it may lead
> to problems if we add support for other tunnel types in the future.
>
> Use proper field checks instead.  Also adding a warning for unexpected
> tunnel types in case something goes wrong.
>
> Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.")
> Signed-off-by: Ilya Maximets 

Acked-by: Mike Pattrick 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 2/3] netdev-dpdk: Fix TCP check during Tx offload preparation.

2024-03-14 Thread Mike Pattrick
On Wed, Mar 13, 2024 at 1:29 PM Ilya Maximets  wrote:
>
> RTE_MBUF_F_TX_TCP_CKSUM is not a flag, but a 2-bit field, so checking
> it with a simple binary 'and' is incorrect.  For example, this check
> will succeed for a packet with UDP checksum requested as well.
>
> Fix the check to avoid wrongly initializing tso_segz and potentially
> accessing UDP header via TCP structure pointer.
>
> The IPv4 checksum flag has to be set for any L4 checksum request,
> regardless of the type, so moving this check out of the TCP condition.
>
> Fixes: 8b5fe2dc6080 ("userspace: Add Generic Segmentation Offloading.")
> Signed-off-by: Ilya Maximets 

Acked-by: Mike Pattrick 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 1/3] netdev-dpdk: Clear inner packet marks if no inner offloads requested.

2024-03-14 Thread Mike Pattrick
On Wed, Mar 13, 2024 at 1:29 PM Ilya Maximets  wrote:
>
> In some cases only outer offloads may be requested for a tunneled
> packet.  In this case there is no need to mark the type of an
> inner packet.  Clean these flags up to avoid potential confusion
> of DPDK drivers.
>
> Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.")
> Signed-off-by: Ilya Maximets 

Acked-by: Mike Pattrick 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] ovs-monitor-ipsec: LibreSwan autodetect paths.

2024-03-13 Thread Mike Pattrick
In v4.0, LibreSwan changed a default paths that had been hardcoded in
ovs-monitor-ipsec, breaking some uses of this script. This patch adds
support for both old and newer versions by auto detecting the location
of these paths from LibreSwan shell script environment variables.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1975039
Reported-by: Qijun Ding 
Fixes: d6afbc00d5b3 ("ipsec: Allow custom file locations.")
Signed-off-by: Mike Pattrick 
---
 ipsec/ovs-monitor-ipsec.in | 31 +++
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/ipsec/ovs-monitor-ipsec.in b/ipsec/ovs-monitor-ipsec.in
index 7945162f9..6c28f30f4 100755
--- a/ipsec/ovs-monitor-ipsec.in
+++ b/ipsec/ovs-monitor-ipsec.in
@@ -456,15 +456,38 @@ conn prevent_unencrypted_vxlan
 CERT_PREFIX = "ovs_cert_"
 CERTKEY_PREFIX = "ovs_certkey_"
 
+def collect_environment(self):
+"""Extract important paths from ipsec file."""
+env = {
+"IPSEC_CONF": "/etc/ipsec.conf",
+"IPSEC_NSSDIR": "/etc/ipsec.d",
+"IPSEC_RUNDIR": "/run/pluto"
+}
+try:
+with open(self.IPSEC) as fh:
+e_list = re.findall("^([A-Z_]+)=.*:-(.*)}",
+fh.read(),
+re.MULTILINE)
+except:
+return env
+
+for k, v in e_list:
+env[k] = v
+
+return env
+
 def __init__(self, libreswan_root_prefix, args):
-ipsec_conf = args.ipsec_conf if args.ipsec_conf else "/etc/ipsec.conf"
-ipsec_d = args.ipsec_d if args.ipsec_d else "/etc/ipsec.d"
+self.IPSEC = libreswan_root_prefix + "/usr/sbin/ipsec"
+
+env = self.collect_environment()
+
+ipsec_conf = args.ipsec_conf if args.ipsec_conf else env["IPSEC_CONF"]
+ipsec_d = args.ipsec_d if args.ipsec_d else env["IPSEC_NSSDIR"]
 ipsec_secrets = (args.ipsec_secrets if args.ipsec_secrets
 else "/etc/ipsec.secrets")
 ipsec_ctl = (args.ipsec_ctl if args.ipsec_ctl
-else "/run/pluto/pluto.ctl")
+else os.path.join(env["IPSEC_RUNDIR"], "pluto.ctl"))
 
-self.IPSEC = libreswan_root_prefix + "/usr/sbin/ipsec"
 self.IPSEC_CONF = libreswan_root_prefix + ipsec_conf
 self.IPSEC_SECRETS = libreswan_root_prefix + ipsec_secrets
 self.IPSEC_D = "sql:" + libreswan_root_prefix + ipsec_d
-- 
2.39.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] ofproto-dpif-upcall: Don't mirror packets that aren't modified.

2024-03-12 Thread Mike Pattrick
Previously OVS reset the mirror contents when a packet is modified in
such a way that the packets contents changes. However, this change
incorrectly reset that mirror context when only metadata changes as
well.

Now we check for all metadata fields, instead of just tunnel metadata,
before resetting the mirror context.

Fixes: feed7f677505 ("ofproto-dpif-upcall: Mirror packets that are modified.")
Reported-by: Zhangweiwei 
Signed-off-by: Mike Pattrick 
---
 include/openvswitch/meta-flow.h |   1 +
 lib/meta-flow.c | 109 
 ofproto/ofproto-dpif-xlate.c|   2 +-
 tests/ofproto-dpif.at   |   5 +-
 4 files changed, 114 insertions(+), 3 deletions(-)

diff --git a/include/openvswitch/meta-flow.h b/include/openvswitch/meta-flow.h
index 3b0220aaa..96aad3933 100644
--- a/include/openvswitch/meta-flow.h
+++ b/include/openvswitch/meta-flow.h
@@ -2305,6 +2305,7 @@ void mf_set_flow_value_masked(const struct mf_field *,
   const union mf_value *mask,
   struct flow *);
 bool mf_is_tun_metadata(const struct mf_field *);
+bool mf_is_metadata(const struct mf_field *);
 bool mf_is_frozen_metadata(const struct mf_field *);
 bool mf_is_pipeline_field(const struct mf_field *);
 bool mf_is_set(const struct mf_field *, const struct flow *);
diff --git a/lib/meta-flow.c b/lib/meta-flow.c
index aa7cf1fcb..7ecec334e 100644
--- a/lib/meta-flow.c
+++ b/lib/meta-flow.c
@@ -1788,6 +1788,115 @@ mf_is_tun_metadata(const struct mf_field *mf)
mf->id < MFF_TUN_METADATA0 + TUN_METADATA_NUM_OPTS;
 }
 
+bool
+mf_is_metadata(const struct mf_field *mf)
+{
+switch (mf->id) {
+CASE_MFF_TUN_METADATA:
+case MFF_METADATA:
+case MFF_IN_PORT:
+case MFF_IN_PORT_OXM:
+CASE_MFF_REGS:
+CASE_MFF_XREGS:
+CASE_MFF_XXREGS:
+case MFF_PACKET_TYPE:
+case MFF_DP_HASH:
+case MFF_RECIRC_ID:
+case MFF_CONJ_ID:
+case MFF_ACTSET_OUTPUT:
+case MFF_SKB_PRIORITY:
+case MFF_PKT_MARK:
+case MFF_CT_STATE:
+case MFF_CT_ZONE:
+case MFF_CT_MARK:
+case MFF_CT_LABEL:
+case MFF_CT_NW_PROTO:
+case MFF_CT_NW_SRC:
+case MFF_CT_NW_DST:
+case MFF_CT_IPV6_SRC:
+case MFF_CT_IPV6_DST:
+case MFF_CT_TP_SRC:
+case MFF_CT_TP_DST:
+case MFF_N_IDS:
+return true;
+
+case MFF_TUN_ID:
+case MFF_TUN_SRC:
+case MFF_TUN_DST:
+case MFF_TUN_IPV6_SRC:
+case MFF_TUN_IPV6_DST:
+case MFF_TUN_FLAGS:
+case MFF_TUN_GBP_ID:
+case MFF_TUN_GBP_FLAGS:
+case MFF_TUN_ERSPAN_VER:
+case MFF_TUN_ERSPAN_IDX:
+case MFF_TUN_ERSPAN_DIR:
+case MFF_TUN_ERSPAN_HWID:
+case MFF_TUN_GTPU_FLAGS:
+case MFF_TUN_GTPU_MSGTYPE:
+case MFF_TUN_TTL:
+case MFF_TUN_TOS:
+case MFF_ETH_SRC:
+case MFF_ETH_DST:
+case MFF_ETH_TYPE:
+case MFF_VLAN_TCI:
+case MFF_DL_VLAN:
+case MFF_VLAN_VID:
+case MFF_DL_VLAN_PCP:
+case MFF_VLAN_PCP:
+case MFF_MPLS_LABEL:
+case MFF_MPLS_TC:
+case MFF_MPLS_BOS:
+case MFF_MPLS_TTL:
+case MFF_IPV4_SRC:
+case MFF_IPV4_DST:
+case MFF_IPV6_SRC:
+case MFF_IPV6_DST:
+case MFF_IPV6_LABEL:
+case MFF_IP_PROTO:
+case MFF_IP_DSCP:
+case MFF_IP_DSCP_SHIFTED:
+case MFF_IP_ECN:
+case MFF_IP_TTL:
+case MFF_IP_FRAG:
+case MFF_ARP_OP:
+case MFF_ARP_SPA:
+case MFF_ARP_TPA:
+case MFF_ARP_SHA:
+case MFF_ARP_THA:
+case MFF_TCP_SRC:
+case MFF_TCP_DST:
+case MFF_TCP_FLAGS:
+case MFF_UDP_SRC:
+case MFF_UDP_DST:
+case MFF_SCTP_SRC:
+case MFF_SCTP_DST:
+case MFF_ICMPV4_TYPE:
+case MFF_ICMPV4_CODE:
+case MFF_ICMPV6_TYPE:
+case MFF_ICMPV6_CODE:
+case MFF_ND_TARGET:
+case MFF_ND_SLL:
+case MFF_ND_TLL:
+case MFF_ND_RESERVED:
+case MFF_ND_OPTIONS_TYPE:
+case MFF_NSH_FLAGS:
+case MFF_NSH_TTL:
+case MFF_NSH_MDTYPE:
+case MFF_NSH_NP:
+case MFF_NSH_SPI:
+case MFF_NSH_SI:
+case MFF_NSH_C1:
+case MFF_NSH_C2:
+case MFF_NSH_C3:
+case MFF_NSH_C4:
+return false;
+
+default:
+OVS_NOT_REACHED();
+}
+}
+
 bool
 mf_is_frozen_metadata(const struct mf_field *mf)
 {
diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 89f183182..faa364ec8 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -7141,7 +7141,7 @@ reset_mirror_ctx(struct xlate_ctx *ctx, const struct flow 
*flow,
 
 set_field = ofpact_get_SET_FIELD(a);
 mf = set_field->field;
-if (mf_are_prereqs_ok(mf, flow, NULL) && !mf_is_tun_metadata(mf)) {
+if (mf_are_prereqs_ok(mf, flow, NULL) && !mf_is_metadata(mf)) {
 ctx->mirrors = 0;
 }
 return;
diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index a1393f7f8..245e209c3 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -5443,7 +5443,8 @@ AT_CLEANU

Re: [ovs-dev] [PATCH] netdev-dpdk: Clean up all marker flags if no offloads requested.

2024-03-12 Thread Mike Pattrick
On Mon, Mar 11, 2024 at 2:31 PM Ilya Maximets  wrote:
>
> Some drivers (primarily, Intel ones) do not expect any marking flags
> being set if no offloads are requested.  If these flags are present,
> driver will fail Tx preparation or behave abnormally.
>
> For example, ixgbe driver will refuse to process the packet with
> only RTE_MBUF_F_TX_TUNNEL_GENEVE and RTE_MBUF_F_TX_OUTER_IPV4 set.
> This pretty much breaks Geneve tunnels on these cards.
>
> An extra check is added to make sure we don't have any unexpected
> Tx offload flags set.
>
> Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.")
> Reported-at: https://github.com/openvswitch/ovs-issues/issues/321
> Signed-off-by: Ilya Maximets 

Acked-by: Mike Pattrick 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v3] ovsdb: Don't iterate over rows on empty mutation.

2024-03-06 Thread Mike Pattrick
Previously when an empty mutation was used to count the number of rows
in a table, OVSDB would iterate over all rows twice. First to perform an
RBAC check, and then to perform the no-operation.

This change adds a short circuit to mutate operations with no conditions
and an empty mutation set, returning immediately. One notable change in
functionality is not performing the RBAC check in this condition, as no
mutation actually takes place.

Reported-by: Terry Wilson 
Reported-at: https://issues.redhat.com/browse/FDP-359
Signed-off-by: Mike Pattrick 
---
v2: Added additional non-rbac tests, and support for conditional
counting without the rbac check
v3: Changed a struct to a size_t.
---
 ovsdb/execution.c| 23 +-
 ovsdb/mutation.h |  6 +
 tests/ovsdb-execution.at | 51 
 tests/ovsdb-rbac.at  | 23 ++
 4 files changed, 102 insertions(+), 1 deletion(-)

diff --git a/ovsdb/execution.c b/ovsdb/execution.c
index 8c20c3b54..f4cc9e802 100644
--- a/ovsdb/execution.c
+++ b/ovsdb/execution.c
@@ -585,6 +585,16 @@ mutate_row_cb(const struct ovsdb_row *row, void *mr_)
 return *mr->error == NULL;
 }
 
+static bool
+count_row_cb(const struct ovsdb_row *row OVS_UNUSED, void *rc)
+{
+size_t *row_count = rc;
+
+(*row_count)++;
+
+return true;
+}
+
 static struct ovsdb_error *
 ovsdb_execute_mutate(struct ovsdb_execution *x, struct ovsdb_parser *parser,
  struct json *result)
@@ -609,7 +619,18 @@ ovsdb_execute_mutate(struct ovsdb_execution *x, struct 
ovsdb_parser *parser,
 error = ovsdb_condition_from_json(table->schema, where, x->symtab,
   );
 }
-if (!error) {
+if (!error && ovsdb_mutation_set_empty()) {
+/* Special case with no mutations, just return the row count. */
+if (ovsdb_condition_empty()) {
+json_object_put(result, "count",
+json_integer_create(hmap_count(>rows)));
+} else {
+size_t row_count = 0;
+ovsdb_query(table, , count_row_cb, _count);
+json_object_put(result, "count",
+json_integer_create(row_count));
+}
+} else if (!error) {
 mr.n_matches = 0;
 mr.txn = x->txn;
 mr.mutations = 
diff --git a/ovsdb/mutation.h b/ovsdb/mutation.h
index 7566ef199..05d4a262a 100644
--- a/ovsdb/mutation.h
+++ b/ovsdb/mutation.h
@@ -69,4 +69,10 @@ void ovsdb_mutation_set_destroy(struct ovsdb_mutation_set *);
 struct ovsdb_error *ovsdb_mutation_set_execute(
 struct ovsdb_row *, const struct ovsdb_mutation_set *) 
OVS_WARN_UNUSED_RESULT;
 
+static inline bool ovsdb_mutation_set_empty(
+const struct ovsdb_mutation_set *ms)
+{
+return ms->n_mutations == 0;
+}
+
 #endif /* ovsdb/mutation.h */
diff --git a/tests/ovsdb-execution.at b/tests/ovsdb-execution.at
index fd1c7a239..1ffa2b738 100644
--- a/tests/ovsdb-execution.at
+++ b/tests/ovsdb-execution.at
@@ -1201,4 +1201,55 @@ OVSDB_CHECK_EXECUTION([garbage collection],
 [{"rows":[]}]
 ]])])
 
+OVSDB_CHECK_EXECUTION([insert rows, count with mutation],
+  [ordinal_schema],
+  "ordinals",
+  {"op": "insert",
+   "table": "ordinals",
+   "row": {"number": 0, "name": "zero"},
+   "uuid-name": "first"}]]],
+   [[["ordinals",
+  {"op": "insert",
+   "table": "ordinals",
+   "row": {"number": 1, "name": "one"},
+   "uuid-name": "first"}]]],
+   [[["ordinals",
+  {"op": "mutate",
+   "table": "ordinals",
+   "where": [["name", "==", "zero"]],
+   "mutations": []}]]],
+   [[["ordinals",
+  {"op": "mutate",
+   "table": "ordinals",
+   "where": [["name", "==", "one"]],
+   "mutations": []}]]],
+   [[["ordinals",
+  {"op": "insert",
+   "table": "ordinals",
+   "row": {"number": 2, "name": "one"},
+   "uuid-name": "first"}]]],
+   [[["ordinals",
+  {"op": "mutate",
+   "table": "ordinals",
+   "where": [["name", "==", "one"]],
+   "mutations": []}]]],
+   [[["ordinals",
+  {"op": "delete",
+   "table": "ordinals",
+   "where": [["name", "==", "zero"]]}]]],
+   [[["

Re: [ovs-dev] [PATCH v6 2/2] netlink-conntrack: Optimize flushing ct zone.

2024-03-06 Thread Mike Pattrick
On Mon, Mar 4, 2024 at 3:22 AM Felix Huettner via dev
 wrote:
>
> Previously the kernel did not provide a netlink interface to flush/list
> only conntrack entries matching a specific zone. With [1] and [2] it is now
> possible to flush and list conntrack entries filtered by zone. Older
> kernels not yet supporting this feature will ignore the filter.
> For the list request that means just returning all entries (which we can
> then filter in userspace as before).
> For the flush request that means deleting all conntrack entries.
>
> The implementation is now identical to the windows one, so we combine
> them.
>
> These significantly improves the performance of flushing conntrack zones
> when the conntrack table is large. Since flushing a conntrack zone is
> normally triggered via an openflow command it blocks the main ovs thread
> and thereby also blocks new flows from being applied. Using this new
> feature we can reduce the flushing time for zones by around 93%.
>
> In combination with OVN the creation of a Logical_Router (which causes
> the flushing of a ct zone) could block other operations, e.g. the
> failover of Logical_Routers (as they cause new flows to be created).
> This is visible from a user perspective as a ovn-controller that is idle
> (as it waits for vswitchd) and vswitchd reporting:
> "blocked 1000 ms waiting for main to quiesce" (potentially with ever
> increasing times).
>
> The following performance tests where run in a qemu vm with 500.000
> conntrack entries distributed evenly over 500 ct zones using `ovstest
> test-netlink-conntrack flush zone=`.
>
>   |  flush zone with 1000 entries  |   flush zone with no entry |
>   +-+--+-+--|
>   |   with the patch| without  |   with the patch| without  |
>   +--+--+--+--+--+--|
>   | v6.8-rc4 |  v6.7.1  | v6.8-rc4 | v6.8-rc4 |  v6.7.1  | v6.8-rc4 |
> +-+--+--+--+--+--+--|
> | Min |  0.260   |  3.946   |  3.497   |  0.228   |  3.462   |  3.212   |
> | Median  |  0.319   |  4.237   |  4.349   |  0.298   |  4.460   |  4.010   |
> | 90%ile  |  0.335   |  4.367   |  4.522   |  0.325   |  4.662   |  4.572   |
> | 99%ile  |  0.348   |  4.495   |  4.773   |  0.340   |  4.931   |  6.003   |
> | Max |  0.362   |  4.543   |  5.054   |  0.348   |  5.390   |  6.396   |
> | Mean|  0.320   |  4.236   |  4.331   |  0.296   |  4.430   |  4.071   |
> | Total   |  80.02   |  1058|  1082|  73.93   |  1107|  1017|
>
> [1]: 
> https://github.com/torvalds/linux/commit/eff3c558bb7e61c41b53e4c8130e514a5a4df9ba
> [2]: 
> https://github.com/torvalds/linux/commit/fa173a1b4e3fd1ab5451cbc57de6fc624c824b0a
>
> Co-Authored-By: Luca Czesla 
> Signed-off-by: Luca Czesla 
> Co-Authored-By: Max Lamprecht 
> Signed-off-by: Max Lamprecht 
> Signed-off-by: Felix Huettner 
> ---

Acked-by: Mike Pattrick 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v6 1/2] util: Support checking for kernel versions.

2024-03-05 Thread Mike Pattrick
On Mon, Mar 4, 2024 at 3:22 AM Felix Huettner via dev
 wrote:
>
> Extract checking for a given kernel version to a separate function.
> It will be used also in the next patch.
>
> Signed-off-by: Felix Huettner 
> ---

Acked-by: Mike Pattrick 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3] conntrack: Fix flush not flushing all elements.

2024-03-04 Thread Mike Pattrick
On Mon, Mar 4, 2024 at 10:22 AM Xavier Simonart  wrote:
>
> On netdev datapath, when a ct element was cleaned, the cmap
> could be shrinked, potentially causing some elements to be skipped
> in the flush iteration.
>
> Fixes: 967bb5c5cd90 ("conntrack: Add rcu support.")
> Signed-off-by: Xavier Simonart 

Acked-by: Mike Pattrick 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] ovsdb: Don't iterate over rows on empty mutation.

2024-02-28 Thread Mike Pattrick
On Wed, Feb 28, 2024 at 8:41 AM Ilya Maximets  wrote:
>
> On 2/22/24 17:37, Mike Pattrick wrote:
> > Previously when an empty mutation was used to count the number of rows
> > in a table, OVSDB would iterate over all rows twice. First to perform an
> > RBAC check, and then to perform the no-operation.
> >
> > This change adds a short circuit to mutate operations with no conditions
> > and an empty mutation set, returning immediately. One notable change in
> > functionality is not performing the RBAC check in this condition, as no
> > mutation actually takes place.
> >
> > Reported-by: Terry Wilson 
> > Reported-at: https://issues.redhat.com/browse/FDP-359
> > Signed-off-by: Mike Pattrick 
> > ---
> > v2: Added additional non-rbac tests, and support for conditional
> > counting without the rbac check
> > ---
> >  ovsdb/execution.c| 26 +++-
> >  ovsdb/mutation.h |  6 +
> >  tests/ovsdb-execution.at | 51 
> >  tests/ovsdb-rbac.at  | 23 ++
> >  4 files changed, 105 insertions(+), 1 deletion(-)
>
> Hi, Mike.  Thanks for v2!  I didn't test, but it looks good in general.
> See one comment inline.
>
> Best regards, Ilya Maximets.
>
> >
> > diff --git a/ovsdb/execution.c b/ovsdb/execution.c
> > index 8c20c3b54..7ed700632 100644
> > --- a/ovsdb/execution.c
> > +++ b/ovsdb/execution.c
> > @@ -585,6 +585,19 @@ mutate_row_cb(const struct ovsdb_row *row, void *mr_)
> >  return *mr->error == NULL;
> >  }
> >
> > +struct count_row_cbdata {
> > +size_t n_matches;
> > +};
>
> Do we actually need this structure?  It only has one element.
> We should be able to just pass a counter around directly.

It seemed more thematic to me at the time, but I can change this.

-M

>
> > +
> > +static bool
> > +count_row_cb(const struct ovsdb_row *row OVS_UNUSED, void *cr_)
> > +{
> > +struct count_row_cbdata *cr = cr_;
> > +
> > +cr->n_matches++;
> > +return true;
> > +}
> > +
> >  static struct ovsdb_error *
> >  ovsdb_execute_mutate(struct ovsdb_execution *x, struct ovsdb_parser 
> > *parser,
> >   struct json *result)
> > @@ -609,7 +622,18 @@ ovsdb_execute_mutate(struct ovsdb_execution *x, struct 
> > ovsdb_parser *parser,
> >  error = ovsdb_condition_from_json(table->schema, where, x->symtab,
> >);
> >  }
> > -if (!error) {
> > +if (!error && ovsdb_mutation_set_empty()) {
> > +/* Special case with no mutations, just return the row count. */
> > +if (ovsdb_condition_empty()) {
> > +json_object_put(result, "count",
> > +json_integer_create(hmap_count(>rows)));
> > +} else {
> > +struct count_row_cbdata cr = {};
> > +ovsdb_query(table, , count_row_cb, );
> > +json_object_put(result, "count",
> > +json_integer_create(cr.n_matches));
> > +}
> > +} else if (!error) {
> >  mr.n_matches = 0;
> >  mr.txn = x->txn;
> >  mr.mutations = 
> > diff --git a/ovsdb/mutation.h b/ovsdb/mutation.h
> > index 7566ef199..05d4a262a 100644
> > --- a/ovsdb/mutation.h
> > +++ b/ovsdb/mutation.h
> > @@ -69,4 +69,10 @@ void ovsdb_mutation_set_destroy(struct 
> > ovsdb_mutation_set *);
> >  struct ovsdb_error *ovsdb_mutation_set_execute(
> >  struct ovsdb_row *, const struct ovsdb_mutation_set *) 
> > OVS_WARN_UNUSED_RESULT;
> >
> > +static inline bool ovsdb_mutation_set_empty(
> > +const struct ovsdb_mutation_set *ms)
> > +{
> > +return ms->n_mutations == 0;
> > +}
> > +
> >  #endif /* ovsdb/mutation.h */
> > diff --git a/tests/ovsdb-execution.at b/tests/ovsdb-execution.at
> > index fd1c7a239..1ffa2b738 100644
> > --- a/tests/ovsdb-execution.at
> > +++ b/tests/ovsdb-execution.at
> > @@ -1201,4 +1201,55 @@ OVSDB_CHECK_EXECUTION([garbage collection],
> >  [{"rows":[]}]
> >  ]])])
> >
> > +OVSDB_CHECK_EXECUTION([insert rows, count with mutation],
> > +  [ordinal_schema],
> > +  "ordinals",
> > +  {"op": "insert",
> > +   "table": "ordinals",
> > +   "row": {"number": 0, "name": "zero"},
> > +   "uuid-n

Re: [ovs-dev] [PATCH v5 1/2] util: Support checking for kernel versions.

2024-02-27 Thread Mike Pattrick
On Mon, Feb 26, 2024 at 4:22 AM Felix Huettner via dev
 wrote:
>
> Extract checking for a given kernel version to a separate function.
> It will be used also in the next patch.
>
> Signed-off-by: Felix Huettner 
> ---
> v4->v5:
> - fix wrong ifdef that broke on macos
> - fix ovs_kernel_is_version_or_newer working in reverse than desired
> - ovs_kernel_is_version_or_newer now always returns false if uname
>   errors (Thanks Eelco)
> v4:
> - extract function to check kernel version
>  lib/netdev-linux.c | 14 +++---
>  lib/util.c | 27 +++
>  lib/util.h |  4 
>  3 files changed, 34 insertions(+), 11 deletions(-)
>
> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
> index bf91ef462..51bd71ae3 100644
> --- a/lib/netdev-linux.c
> +++ b/lib/netdev-linux.c
> @@ -6427,18 +6427,10 @@ getqdisc_is_safe(void)
>  static bool safe = false;
>
>  if (ovsthread_once_start()) {
> -struct utsname utsname;
> -int major, minor;
> -
> -if (uname() == -1) {
> -VLOG_WARN("uname failed (%s)", ovs_strerror(errno));
> -} else if (!ovs_scan(utsname.release, "%d.%d", , )) {
> -VLOG_WARN("uname reported bad OS release (%s)", utsname.release);
> -} else if (major < 2 || (major == 2 && minor < 35)) {
> -VLOG_INFO("disabling unsafe RTM_GETQDISC in Linux kernel %s",
> -  utsname.release);
> -} else {
> +if (ovs_kernel_is_version_or_newer(2, 35)) {
>  safe = true;
> +} else {
> +VLOG_INFO("disabling unsafe RTM_GETQDISC in Linux kernel");
>  }
>  ovsthread_once_done();
>  }
> diff --git a/lib/util.c b/lib/util.c
> index 3fb3a4b40..f5b2da095 100644
> --- a/lib/util.c
> +++ b/lib/util.c
> @@ -27,6 +27,7 @@
>  #include 
>  #ifdef __linux__
>  #include 
> +#include 

This import can now be removed from netdev-linux (I believe).

>  #endif
>  #include 
>  #include 
> @@ -2500,3 +2501,29 @@ OVS_CONSTRUCTOR(winsock_start) {
> }
>  }
>  #endif
> +
> +#ifdef __linux__
> +bool
> +ovs_kernel_is_version_or_newer(int target_major, int target_minor)
> +{
> +static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
> +static int current_major, current_minor = -1;
> +
> +if (ovsthread_once_start()) {
> +struct utsname utsname;
> +
> +if (uname() == -1) {
> +VLOG_WARN("uname failed (%s)", ovs_strerror(errno));
> +} else if (!ovs_scan(utsname.release, "%d.%d",
> +_major, _minor)) {
> +VLOG_WARN("uname reported bad OS release (%s)", utsname.release);
> +}
> +ovsthread_once_done();
> +}
> +if (current_major == -1 || current_minor == -1) {
> +return false;
> +}
> +return current_major > target_major || (
> +current_major == target_major && current_minor > target_minor);

Shouldn't this be "current_minor >= target_minor" ?

-M

> +}
> +#endif
> diff --git a/lib/util.h b/lib/util.h
> index f2d45bcac..55718fd87 100644
> --- a/lib/util.h
> +++ b/lib/util.h
> @@ -611,4 +611,8 @@ int ftruncate(int fd, off_t length);
>  }
>  #endif
>
> +#ifdef __linux__
> +bool ovs_kernel_is_version_or_newer(int target_major, int target_minor);
> +#endif
> +
>  #endif /* util.h */
>
> base-commit: 166ee41d282c506d100bc2185d60af277121b55b
> --
> 2.43.2
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 5/5] Documentation: Update links to upstream Kernel documentation.

2024-02-27 Thread Mike Pattrick
On Tue, Feb 27, 2024 at 10:37 AM Simon Horman  wrote:
>
> This updates links to several upstream Kernel documents.
>
> 1. Lore is now the canonical archive for the netdev mailing list
>
> 2. net-next is now maintained by the netdev team,
>of which David Miller is currently a member,
>rather than only by David.
>
>Also, use HTTPS rather than HTTP.
>
> 3. The Netdev FAQ has evolved into the Netdev Maintainer Handbook.
>
> 4. The Kernel security document link was dead,
>provide the current canonical location for this document instead.
>
> 1., 2. & 3. Found by inspection
> 4. Flagged by check-docs
>
> Signed-off-by: Simon Horman 

Acked-by: Mike Pattrick 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 4/5] Documentatoin: Update Pacemaker link.

2024-02-27 Thread Mike Pattrick
On Tue, Feb 27, 2024 at 10:36 AM Simon Horman  wrote:
>
> Update link to OCF Resource Agents documentation as the existing link
> is broken. Also, use HTTPS.
>
> Broken link flagged by make check-docs
>
> Signed-off-by: Simon Horman 

Acked-by: Mike Pattrick 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 3/5] Documentation: Anuket project updates.

2024-02-27 Thread Mike Pattrick
On Tue, Feb 27, 2024 at 10:36 AM Simon Horman  wrote:
>
> The Anuket was formed by a merger of OPNFV and CNTT [1].
>
> Also, VswitchPerf, aka vsperf, formerly an OPNFV project,
> has been renamed ViNePerf [2].
>
> Update links and documentation accordingly.
>
> The old links were broken, this was flagged by make check-docs
>
> [1] 
> https://anuket.io/news/2021/01/27/lf-networking-launches-anuket-an-open-source-project-to-accelerate-infrastructure-compliance-interoperability-and-5g-deployments/
> [2] 
> https://docs.opnfv.org/projects/vineperf/en/latest/release/release-notes/release-notes.html
>
> Signed-off-by: Simon Horman 
> ---

Acked-by: Mike Pattrick 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 2/5] Documentation: Correct spelling errors.

2024-02-27 Thread Mike Pattrick
On Tue, Feb 27, 2024 at 10:37 AM Simon Horman  wrote:
>
> Correct spelling errors in .rst files flagged by codespell.
>
> Signed-off-by: Simon Horman 
> ---

These look correct to me.

Acked-by: Mike Pattrick 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 1/5] Documentation: Extend copyright to 2024.

2024-02-27 Thread Mike Pattrick
On Tue, Feb 27, 2024 at 10:36 AM Simon Horman  wrote:
>
> IANAL, but I think we can extend the copyright attached
> to documentation to cover the current year: we are still
> actively working on the documentation.
>
> Signed-off-by: Simon Horman 

Acked-by: Mike Pattrick 

I wonder if it's valid to set the end date to datetime.now().year

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] conntrack: Fix flush not flushing all elements.

2024-02-27 Thread Mike Pattrick
On Mon, Feb 26, 2024 at 5:50 AM Xavier Simonart  wrote:
>
> On netdev datapath, when a ct element was cleaned, the cmap
> could be shrinked, potentially causing some elements to be skipped
> in the flush iteration.
>
> Fixes: 967bb5c5cd90 ("conntrack: Add rcu support.")
> Signed-off-by: Xavier Simonart 

Thank you for the patch, I was able to test this out, verify the issue
is as you described, and that your patch fixes the problem.

> ---
> v2: - Updated commit message.
> - Use compose-packet instead of hex packet content.
> - Use dnl for comments.
> - Remove unnecessary errors in OVS_TRAFFIC_VSWITCHD_STOP.
> - Rebased on origin/master.
> ---
>  lib/conntrack.c | 14 
>  lib/conntrack.h |  1 +
>  tests/system-traffic.at | 47 +
>  3 files changed, 52 insertions(+), 10 deletions(-)
>
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index 8a7056bac..5786424f6 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -2651,25 +2651,19 @@ conntrack_dump_start(struct conntrack *ct, struct 
> conntrack_dump *dump,
>
>  dump->ct = ct;
>  *ptot_bkts = 1; /* Need to clean up the callers. */
> +dump->cursor = cmap_cursor_start(>conns);
>  return 0;
>  }
>
>  int
>  conntrack_dump_next(struct conntrack_dump *dump, struct ct_dpif_entry *entry)
>  {
> -struct conntrack *ct = dump->ct;
>  long long now = time_msec();
>
> -for (;;) {
> -struct cmap_node *cm_node = cmap_next_position(>conns,
> -   >cm_pos);
> -if (!cm_node) {
> -break;
> -}
> -struct conn_key_node *keyn;
> -struct conn *conn;
> +struct conn_key_node *keyn;
> +struct conn *conn;
>
> -INIT_CONTAINER(keyn, cm_node, cm_node);
> +CMAP_CURSOR_FOR_EACH_CONTINUE (keyn, cm_node, >cursor) {
>  if (keyn->dir != CT_DIR_FWD) {
>  continue;
>  }
> diff --git a/lib/conntrack.h b/lib/conntrack.h
> index ee7da099e..aa12a1847 100644
> --- a/lib/conntrack.h
> +++ b/lib/conntrack.h
> @@ -109,6 +109,7 @@ struct conntrack_dump {
>  union {
>  struct cmap_position cm_pos;

cm_pos is now dead code.

>  struct hmap_position hmap_pos;
> +struct cmap_cursor cursor;
>  };
>  bool filter_zone;
>  uint16_t zone;
> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> index 98e494abf..34f93b2e5 100644
> --- a/tests/system-traffic.at
> +++ b/tests/system-traffic.at
> @@ -8389,6 +8389,53 @@ AT_CHECK([ovs-pcap client.pcap | grep 
> 20102000], [0], [dnl
>  OVS_TRAFFIC_VSWITCHD_STOP
>  AT_CLEANUP
>
> +AT_SETUP([conntrack - Flush many conntrack entries by port])
> +CHECK_CONNTRACK()
> +OVS_TRAFFIC_VSWITCHD_START()
> +
> +ADD_NAMESPACES(at_ns0, at_ns1)
> +
> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
> +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24")
> +
> +AT_DATA([flows.txt], [dnl
> +priority=100,in_port=1,udp,action=ct(zone=1,commit),2
> +])
> +
> +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> +
> +dnl 20 packets from port 1 and 1 packet from port 2.
> +flow_l3="\
> +eth_src=50:54:00:00:00:09,eth_dst=50:54:00:00:00:0a,dl_type=0x0800,\
> +nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_proto=17,nw_ttl=64,nw_frag=no"
> +
> +for i in $(seq 1 20); do
> +frame=$(ovs-ofctl compose-packet --bare "$flow_l3, udp_src=1,udp_dst=$i")
> +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 
> packet=$frame actions=resubmit(,0)"])
> +done
> +frame=$(ovs-ofctl compose-packet --bare "$flow_l3, udp_src=2,udp_dst=1")
> +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 packet=$frame 
> actions=resubmit(,0)"])
> +
> +: > conntrack
> +
> +for i in $(seq 1 20); do
> +echo 
> "udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=${i}),reply=(src=10.1.1.2,dst=10.1.1.1,sport=${i},dport=1),zone=1"
>  >> conntrack
> +done
> +echo 
> "udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=2,dport=1),reply=(src=10.1.1.2,dst=10.1.1.1,sport=1,dport=2),zone=1"
>  >> conntrack
> +
> +sort conntrack > expout
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep -F "src=10.1.1.1," | sort 
> ], [0], [expout])
> +
> +dnl Check that flushing conntrack by port 1 flush all ct for port 1 but 
> keeps ct for port 2.
> +AT_CHECK([ovs-appctl dpctl/flush-conntrack 'ct_nw_proto=17,ct_tp_src=1'])
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep -F "src=10.1.1.1," | sort 
> ], [0], [dnl
> +udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=2,dport=1),reply=(src=10.1.1.2,dst=10.1.1.1,sport=1,dport=2),zone=1
> +])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
>  AT_BANNER([IGMP])
>
>  AT_SETUP([IGMP - flood under normal action])
> --
> 2.41.0
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>

___
dev mailing list
d...@openvswitch.org

[ovs-dev] [PATCH v7 2/2] ofproto-dpif-mirror: Add support for pre-selection filter.

2024-02-26 Thread Mike Pattrick
Currently a bridge mirror will collect all packets and tools like
ovs-tcpdump can apply additional filters after they have already been
duplicated by vswitchd. This can result in inefficient collection.

This patch adds support to apply pre-selection to bridge mirrors, which
can limit which packets are mirrored based on flow metadata. This
significantly improves overall vswitchd performance during mirroring if
only a subset of traffic is required.

Signed-off-by: Mike Pattrick 
---
v7:
 - Make sure filter mask is added to masks of non-matching flows.
 - Added additional tests.
---
 Documentation/ref/ovs-tcpdump.8.rst |   8 +-
 NEWS|   3 +
 lib/flow.c  |  21 +++-
 lib/flow.h  |  12 +++
 ofproto/ofproto-dpif-mirror.c   |  78 ++-
 ofproto/ofproto-dpif-mirror.h   |  12 ++-
 ofproto/ofproto-dpif-xlate.c|  26 -
 ofproto/ofproto-dpif.c  |   9 +-
 ofproto/ofproto-dpif.h  |   6 ++
 ofproto/ofproto.c   |   4 +-
 ofproto/ofproto.h   |   3 +
 tests/ofproto-dpif.at   | 142 
 utilities/ovs-tcpdump.in|  13 ++-
 vswitchd/bridge.c   |  13 ++-
 vswitchd/vswitch.ovsschema  |   5 +-
 vswitchd/vswitch.xml|  13 +++
 16 files changed, 343 insertions(+), 25 deletions(-)

diff --git a/Documentation/ref/ovs-tcpdump.8.rst 
b/Documentation/ref/ovs-tcpdump.8.rst
index b9f8cdf6f..e21e61211 100644
--- a/Documentation/ref/ovs-tcpdump.8.rst
+++ b/Documentation/ref/ovs-tcpdump.8.rst
@@ -61,8 +61,14 @@ Options
 
   If specified, mirror all ports (optional).
 
+* ``--filter ``
+
+  If specified, only mirror flows that match the provided OpenFlow filter.
+  The available fields are documented in ``ovs-fields(7)``.
+
 See Also
 
 
 ``ovs-appctl(8)``, ``ovs-vswitchd(8)``, ``ovs-pcap(1)``,
-``ovs-tcpundump(1)``, ``tcpdump(8)``, ``wireshark(8)``.
+``ovs-fields(7)``, ``ovs-tcpundump(1)``, ``tcpdump(8)``,
+``wireshark(8)``.
diff --git a/NEWS b/NEWS
index c9e4064e6..35f7eb0c7 100644
--- a/NEWS
+++ b/NEWS
@@ -4,6 +4,9 @@ Post-v3.3.0
  * Conntrack now supports 'random' flag for selecting ports in a range
while natting and 'persistent' flag for selection of the IP address
from a range.
+   - OVSDB:
+ * Added a new filter column in the Mirror table which can be used to
+   apply filters to mirror ports.
 
 
 v3.3.0 - 16 Feb 2024
diff --git a/lib/flow.c b/lib/flow.c
index 8e3402388..a088bdc86 100644
--- a/lib/flow.c
+++ b/lib/flow.c
@@ -3569,7 +3569,7 @@ miniflow_equal_in_minimask(const struct miniflow *a, 
const struct miniflow *b,
 return true;
 }
 
-/* Returns true if 'a' and 'b' are equal at the places where there are 1-bits
+/* Returns true if 'a' and 'b' are equal at the places where there are 0-bits
  * in 'mask', false if they differ. */
 bool
 miniflow_equal_flow_in_minimask(const struct miniflow *a, const struct flow *b,
@@ -3587,6 +3587,25 @@ miniflow_equal_flow_in_minimask(const struct miniflow 
*a, const struct flow *b,
 return true;
 }
 
+/* Returns false if 'a' and 'b' differ in places where there are 1-bits in
+ * 'wc', true otherwise. */
+bool
+miniflow_equal_flow_in_flow_wc(const struct miniflow *a, const struct flow *b,
+   const struct flow_wildcards *wc)
+{
+const struct flow *wc_masks = >masks;
+size_t idx;
+
+FLOWMAP_FOR_EACH_INDEX (idx, a->map) {
+if ((miniflow_get(a, idx) ^ flow_u64_value(b, idx)) &
+flow_u64_value(wc_masks, idx)) {
+return false;
+}
+}
+
+return true;
+}
+
 
 void
 minimask_init(struct minimask *mask, const struct flow_wildcards *wc)
diff --git a/lib/flow.h b/lib/flow.h
index 75a9be3c1..a644be39d 100644
--- a/lib/flow.h
+++ b/lib/flow.h
@@ -748,6 +748,9 @@ bool miniflow_equal_in_minimask(const struct miniflow *a,
 bool miniflow_equal_flow_in_minimask(const struct miniflow *a,
  const struct flow *b,
  const struct minimask *);
+bool miniflow_equal_flow_in_flow_wc(const struct miniflow *a,
+const struct flow *b,
+const struct flow_wildcards *);
 uint32_t miniflow_hash_5tuple(const struct miniflow *flow, uint32_t basis);
 
 
@@ -939,6 +942,15 @@ flow_union_with_miniflow(struct flow *dst, const struct 
miniflow *src)
 flow_union_with_miniflow_subset(dst, src, src->map);
 }
 
+/* Perform a bitwise OR of minimask 'src' mask data with the equivalent
+ * fields in 'dst', storing the result in 'dst'. */
+static inline void
+flow_wildcards_union_with_minimask(struct flow_wildcards *dst,
+   const struct minimask *src)
+{
+flow_union_with_miniflow_subset(>masks, >masks, src->masks.map);
+}
+
 static inline bool is_ct_valid(c

[ovs-dev] [PATCH v7 1/2] ofproto-dpif-mirror: Reduce number of function parameters.

2024-02-26 Thread Mike Pattrick
Previously the mirror_set() and mirror_get() functions took a large
number of parameters, which was inefficient and difficult to read and
extend. This patch moves most of the parameters into a struct.

Signed-off-by: Mike Pattrick 
Acked-by: Simon Horman 
Acked-by: Eelco Chaudron 
---
 ofproto/ofproto-dpif-mirror.c | 61 ++-
 ofproto/ofproto-dpif-mirror.h | 42 +++-
 ofproto/ofproto-dpif-xlate.c  | 29 -
 ofproto/ofproto-dpif.c| 23 ++---
 4 files changed, 91 insertions(+), 64 deletions(-)

diff --git a/ofproto/ofproto-dpif-mirror.c b/ofproto/ofproto-dpif-mirror.c
index 343b75f0e..a84c843b3 100644
--- a/ofproto/ofproto-dpif-mirror.c
+++ b/ofproto/ofproto-dpif-mirror.c
@@ -207,19 +207,23 @@ mirror_bundle_dst(struct mbridge *mbridge, struct 
ofbundle *ofbundle)
 }
 
 int
-mirror_set(struct mbridge *mbridge, void *aux, const char *name,
-   struct ofbundle **srcs, size_t n_srcs,
-   struct ofbundle **dsts, size_t n_dsts,
-   unsigned long *src_vlans, struct ofbundle *out_bundle,
-   uint16_t snaplen,
-   uint16_t out_vlan)
+mirror_set(struct mbridge *mbridge, void *aux,
+   const struct ofproto_mirror_settings *ms,
+   const struct mirror_bundles *mb)
+
 {
 struct mbundle *mbundle, *out;
 mirror_mask_t mirror_bit;
 struct mirror *mirror;
 struct hmapx srcs_map;  /* Contains "struct ofbundle *"s. */
 struct hmapx dsts_map;  /* Contains "struct ofbundle *"s. */
+uint16_t out_vlan;
+
+if (!ms || !mbridge) {
+return EINVAL;
+}
 
+out_vlan = ms->out_vlan;
 mirror = mirror_lookup(mbridge, aux);
 if (!mirror) {
 int idx;
@@ -227,7 +231,7 @@ mirror_set(struct mbridge *mbridge, void *aux, const char 
*name,
 idx = mirror_scan(mbridge);
 if (idx < 0) {
 VLOG_WARN("maximum of %d port mirrors reached, cannot create %s",
-  MAX_MIRRORS, name);
+  MAX_MIRRORS, ms->name);
 return EFBIG;
 }
 
@@ -242,8 +246,8 @@ mirror_set(struct mbridge *mbridge, void *aux, const char 
*name,
 unsigned long *vlans = ovsrcu_get(unsigned long *, >vlans);
 
 /* Get the new configuration. */
-if (out_bundle) {
-out = mbundle_lookup(mbridge, out_bundle);
+if (mb->out_bundle) {
+out = mbundle_lookup(mbridge, mb->out_bundle);
 if (!out) {
 mirror_destroy(mbridge, mirror->aux);
 return EINVAL;
@@ -252,16 +256,16 @@ mirror_set(struct mbridge *mbridge, void *aux, const char 
*name,
 } else {
 out = NULL;
 }
-mbundle_lookup_multiple(mbridge, srcs, n_srcs, _map);
-mbundle_lookup_multiple(mbridge, dsts, n_dsts, _map);
+mbundle_lookup_multiple(mbridge, mb->srcs, mb->n_srcs, _map);
+mbundle_lookup_multiple(mbridge, mb->dsts, mb->n_dsts, _map);
 
 /* If the configuration has not changed, do nothing. */
 if (hmapx_equals(_map, >srcs)
 && hmapx_equals(_map, >dsts)
-&& vlan_bitmap_equal(vlans, src_vlans)
+&& vlan_bitmap_equal(vlans, ms->src_vlans)
 && mirror->out == out
 && mirror->out_vlan == out_vlan
-&& mirror->snaplen == snaplen)
+&& mirror->snaplen == ms->snaplen)
 {
 hmapx_destroy(_map);
 hmapx_destroy(_map);
@@ -275,15 +279,15 @@ mirror_set(struct mbridge *mbridge, void *aux, const char 
*name,
 hmapx_swap(_map, >dsts);
 hmapx_destroy(_map);
 
-if (vlans || src_vlans) {
+if (vlans || ms->src_vlans) {
 ovsrcu_postpone(free, vlans);
-vlans = vlan_bitmap_clone(src_vlans);
+vlans = vlan_bitmap_clone(ms->src_vlans);
 ovsrcu_set(>vlans, vlans);
 }
 
 mirror->out = out;
 mirror->out_vlan = out_vlan;
-mirror->snaplen = snaplen;
+mirror->snaplen = ms->snaplen;
 
 /* Update mbundles. */
 mirror_bit = MIRROR_MASK_C(1) << mirror->idx;
@@ -406,23 +410,22 @@ mirror_update_stats(struct mbridge *mbridge, 
mirror_mask_t mirrors,
 /* Retrieves the mirror numbered 'index' in 'mbridge'.  Returns true if such a
  * mirror exists, false otherwise.
  *
- * If successful, '*vlans' receives the mirror's VLAN membership information,
+ * If successful 'mc->vlans' receives the mirror's VLAN membership information,
  * either a null pointer if the mirror includes all VLANs or a 4096-bit bitmap
  * in which a 1-bit indicates that the mirror includes a particular VLAN,
- * '*dup_mirrors' receives a bitmap of mirrors whose output duplicates mirror
- * 'index', '*out' receives the output ofbundle (if any), and '*out_vlan'
- * receives the output VLAN (if any).
+ * 'mc->dup_mirrors' receives a bitmap of mirrors whose output duplicates
+ * mirror 'index

[ovs-dev] [PATCH v2] dp-packet: Don't offload inner csum if outer isn't supported.

2024-02-26 Thread Mike Pattrick
Some network cards support inner checksum offloading but not outer
checksum offloading. Currently OVS will resolve that outer checksum but
allows the network card to resolve the inner checksum, invalidating the
outer checksum in the process.

Now if we can't offload outer checksums, we don't offload inner either.

Reported-at: https://issues.redhat.com/browse/FDP-363
Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.")
Signed-off-by: Mike Pattrick 
---
nb: I also tested a more complex patch that only resolved the inner
checksum and offloaded the UDP layer. This didn't noticably improve
performance.
v2: Added IPv4 flag
---
 lib/dp-packet.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 305822293..df7bf8e6b 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -592,6 +592,18 @@ dp_packet_ol_send_prepare(struct dp_packet *p, uint64_t 
flags)
 if (dp_packet_hwol_is_tunnel_geneve(p) ||
 dp_packet_hwol_is_tunnel_vxlan(p)) {
 tnl_inner = true;
+
+/* If the TX interface doesn't support UDP tunnel offload but does
+ * support inner checksum offload and an outer UDP checksum is
+ * required, then we can't offload inner checksum either. As that would
+ * invalidate the outer checksum. */
+if (!(flags & NETDEV_TX_OFFLOAD_OUTER_UDP_CKSUM) &&
+dp_packet_hwol_is_outer_udp_cksum(p)) {
+flags &= ~(NETDEV_TX_OFFLOAD_TCP_CKSUM |
+   NETDEV_TX_OFFLOAD_UDP_CKSUM |
+   NETDEV_TX_OFFLOAD_SCTP_CKSUM |
+   NETDEV_TX_OFFLOAD_IPV4_CKSUM);
+}
 }
 
 if (dp_packet_hwol_tx_ip_csum(p)) {
-- 
2.39.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2] ovsdb: Don't iterate over rows on empty mutation.

2024-02-22 Thread Mike Pattrick
Previously when an empty mutation was used to count the number of rows
in a table, OVSDB would iterate over all rows twice. First to perform an
RBAC check, and then to perform the no-operation.

This change adds a short circuit to mutate operations with no conditions
and an empty mutation set, returning immediately. One notable change in
functionality is not performing the RBAC check in this condition, as no
mutation actually takes place.

Reported-by: Terry Wilson 
Reported-at: https://issues.redhat.com/browse/FDP-359
Signed-off-by: Mike Pattrick 
---
v2: Added additional non-rbac tests, and support for conditional
counting without the rbac check
---
 ovsdb/execution.c| 26 +++-
 ovsdb/mutation.h |  6 +
 tests/ovsdb-execution.at | 51 
 tests/ovsdb-rbac.at  | 23 ++
 4 files changed, 105 insertions(+), 1 deletion(-)

diff --git a/ovsdb/execution.c b/ovsdb/execution.c
index 8c20c3b54..7ed700632 100644
--- a/ovsdb/execution.c
+++ b/ovsdb/execution.c
@@ -585,6 +585,19 @@ mutate_row_cb(const struct ovsdb_row *row, void *mr_)
 return *mr->error == NULL;
 }
 
+struct count_row_cbdata {
+size_t n_matches;
+};
+
+static bool
+count_row_cb(const struct ovsdb_row *row OVS_UNUSED, void *cr_)
+{
+struct count_row_cbdata *cr = cr_;
+
+cr->n_matches++;
+return true;
+}
+
 static struct ovsdb_error *
 ovsdb_execute_mutate(struct ovsdb_execution *x, struct ovsdb_parser *parser,
  struct json *result)
@@ -609,7 +622,18 @@ ovsdb_execute_mutate(struct ovsdb_execution *x, struct 
ovsdb_parser *parser,
 error = ovsdb_condition_from_json(table->schema, where, x->symtab,
   );
 }
-if (!error) {
+if (!error && ovsdb_mutation_set_empty()) {
+/* Special case with no mutations, just return the row count. */
+if (ovsdb_condition_empty()) {
+json_object_put(result, "count",
+json_integer_create(hmap_count(>rows)));
+} else {
+struct count_row_cbdata cr = {};
+ovsdb_query(table, , count_row_cb, );
+json_object_put(result, "count",
+json_integer_create(cr.n_matches));
+}
+} else if (!error) {
 mr.n_matches = 0;
 mr.txn = x->txn;
 mr.mutations = 
diff --git a/ovsdb/mutation.h b/ovsdb/mutation.h
index 7566ef199..05d4a262a 100644
--- a/ovsdb/mutation.h
+++ b/ovsdb/mutation.h
@@ -69,4 +69,10 @@ void ovsdb_mutation_set_destroy(struct ovsdb_mutation_set *);
 struct ovsdb_error *ovsdb_mutation_set_execute(
 struct ovsdb_row *, const struct ovsdb_mutation_set *) 
OVS_WARN_UNUSED_RESULT;
 
+static inline bool ovsdb_mutation_set_empty(
+const struct ovsdb_mutation_set *ms)
+{
+return ms->n_mutations == 0;
+}
+
 #endif /* ovsdb/mutation.h */
diff --git a/tests/ovsdb-execution.at b/tests/ovsdb-execution.at
index fd1c7a239..1ffa2b738 100644
--- a/tests/ovsdb-execution.at
+++ b/tests/ovsdb-execution.at
@@ -1201,4 +1201,55 @@ OVSDB_CHECK_EXECUTION([garbage collection],
 [{"rows":[]}]
 ]])])
 
+OVSDB_CHECK_EXECUTION([insert rows, count with mutation],
+  [ordinal_schema],
+  "ordinals",
+  {"op": "insert",
+   "table": "ordinals",
+   "row": {"number": 0, "name": "zero"},
+   "uuid-name": "first"}]]],
+   [[["ordinals",
+  {"op": "insert",
+   "table": "ordinals",
+   "row": {"number": 1, "name": "one"},
+   "uuid-name": "first"}]]],
+   [[["ordinals",
+  {"op": "mutate",
+   "table": "ordinals",
+   "where": [["name", "==", "zero"]],
+   "mutations": []}]]],
+   [[["ordinals",
+  {"op": "mutate",
+   "table": "ordinals",
+   "where": [["name", "==", "one"]],
+   "mutations": []}]]],
+   [[["ordinals",
+  {"op": "insert",
+   "table": "ordinals",
+   "row": {"number": 2, "name": "one"},
+   "uuid-name": "first"}]]],
+   [[["ordinals",
+  {"op": "mutate",
+   "table": "ordinals",
+   "where": [["name", "==", "one"]],
+   "mutations": []}]]],
+   [[["ordinals",
+  {"op": "delete",
+   "table": "ordinals",
+   "where": [["name", &

Re: [ovs-dev] [PATCH] Userspace: Software fallback for UDP encapsulated TCP segmentation.

2024-02-21 Thread Mike Pattrick
On Tue, Feb 20, 2024 at 11:09 PM Mike Pattrick  wrote:
>
> When sending packets that are flagged as requiring segmentation to an
> interface that doens't support this feature, send the packet to the TSO
> software fallback instead of dropping it.
>
> Signed-off-by: Mike Pattrick 

Recheck-request: github-robot

> ---
>  lib/dp-packet-gso.c | 73 +
>  lib/dp-packet.h | 26 +++
>  lib/netdev-native-tnl.c |  8 +
>  lib/netdev.c| 37 +
>  tests/system-traffic.at | 58 
>  5 files changed, 167 insertions(+), 35 deletions(-)
>
> diff --git a/lib/dp-packet-gso.c b/lib/dp-packet-gso.c
> index 847685ad9..f25abf436 100644
> --- a/lib/dp-packet-gso.c
> +++ b/lib/dp-packet-gso.c
> @@ -47,6 +47,8 @@ dp_packet_gso_seg_new(const struct dp_packet *p, size_t 
> hdr_len,
>  seg->l2_5_ofs = p->l2_5_ofs;
>  seg->l3_ofs = p->l3_ofs;
>  seg->l4_ofs = p->l4_ofs;
> +seg->inner_l3_ofs = p->inner_l3_ofs;
> +seg->inner_l4_ofs = p->inner_l4_ofs;
>
>  /* The protocol headers remain the same, so preserve hash and mark. */
>  *dp_packet_rss_ptr(seg) = *dp_packet_rss_ptr(p);
> @@ -71,7 +73,12 @@ dp_packet_gso_nr_segs(struct dp_packet *p)
>  const char *data_tail;
>  const char *data_pos;
>
> -data_pos = dp_packet_get_tcp_payload(p);
> +if (dp_packet_hwol_is_tunnel_vxlan(p) ||
> +dp_packet_hwol_is_tunnel_geneve(p)) {
> +data_pos = dp_packet_get_inner_tcp_payload(p);
> +} else {
> +data_pos = dp_packet_get_tcp_payload(p);
> +}
>  data_tail = (char *) dp_packet_tail(p) - dp_packet_l2_pad_size(p);
>
>  return DIV_ROUND_UP(data_tail - data_pos, segsz);
> @@ -91,12 +98,15 @@ dp_packet_gso(struct dp_packet *p, struct dp_packet_batch 
> **batches)
>  struct tcp_header *tcp_hdr;
>  struct ip_header *ip_hdr;
>  struct dp_packet *seg;
> +const char *data_pos;
>  uint16_t tcp_offset;
>  uint16_t tso_segsz;
> +uint16_t ip_id = 0;
>  uint32_t tcp_seq;
> -uint16_t ip_id;
> +bool outer_ipv4;
>  int hdr_len;
>  int seg_len;
> +bool tnl;
>
>  tso_segsz = dp_packet_get_tso_segsz(p);
>  if (!tso_segsz) {
> @@ -105,20 +115,35 @@ dp_packet_gso(struct dp_packet *p, struct 
> dp_packet_batch **batches)
>  return false;
>  }
>
> -tcp_hdr = dp_packet_l4(p);
> -tcp_offset = TCP_OFFSET(tcp_hdr->tcp_ctl);
> -tcp_seq = ntohl(get_16aligned_be32(_hdr->tcp_seq));
> -hdr_len = ((char *) dp_packet_l4(p) - (char *) dp_packet_eth(p))
> -  + tcp_offset * 4;
> -ip_id = 0;
> -if (dp_packet_hwol_is_ipv4(p)) {
> +if (dp_packet_hwol_is_tunnel_vxlan(p) ||
> +dp_packet_hwol_is_tunnel_geneve(p)) {
> +data_pos =  dp_packet_get_inner_tcp_payload(p);
> +outer_ipv4 = dp_packet_hwol_is_outer_ipv4(p);
> +tcp_hdr = dp_packet_inner_l4(p);
> +ip_hdr = dp_packet_inner_l3(p);
> +tnl = true;
> +if (outer_ipv4) {
> +ip_id = ntohs(((struct ip_header *) dp_packet_l3(p))->ip_id);
> +} else if (dp_packet_hwol_is_ipv4(p)) {
> +ip_id = ntohs(ip_hdr->ip_id);
> +}
> +} else {
> +data_pos = dp_packet_get_tcp_payload(p);
> +outer_ipv4 = dp_packet_hwol_is_ipv4(p);
> +tcp_hdr = dp_packet_l4(p);
>  ip_hdr = dp_packet_l3(p);
> -ip_id = ntohs(ip_hdr->ip_id);
> +tnl = false;
> +if (outer_ipv4) {
> +ip_id = ntohs(ip_hdr->ip_id);
> +}
>  }
>
> +tcp_offset = TCP_OFFSET(tcp_hdr->tcp_ctl);
> +tcp_seq = ntohl(get_16aligned_be32(_hdr->tcp_seq));
> +hdr_len = ((char *) tcp_hdr - (char *) dp_packet_eth(p))
> +  + tcp_offset * 4;
>  const char *data_tail = (char *) dp_packet_tail(p)
>  - dp_packet_l2_pad_size(p);
> -const char *data_pos = dp_packet_get_tcp_payload(p);
>  int n_segs = dp_packet_gso_nr_segs(p);
>
>  for (int i = 0; i < n_segs; i++) {
> @@ -130,8 +155,26 @@ dp_packet_gso(struct dp_packet *p, struct 
> dp_packet_batch **batches)
>  seg = dp_packet_gso_seg_new(p, hdr_len, data_pos, seg_len);
>  data_pos += seg_len;
>
> +if (tnl) {
> +/* Update tunnel L3 header. */
> +if (dp_packet_hwol_is_ipv4(seg)) {
> +ip_hdr = dp_packet_inner_l3(seg);
> +ip_hdr->ip_tot_len = htons(sizeof *ip_hdr +
> +   dp_packet_inn

[ovs-dev] [PATCH v3] userspace: Allow UDP zero checksum with IPv6 tunnels.

2024-02-21 Thread Mike Pattrick
This patch adopts the proposed RFC 6935 by allowing null UDP checksums
even if the tunnel protocol is IPv6. This is already supported by Linux
through the udp6zerocsumtx tunnel option. It is disabled by default and
IPv6 tunnels are flagged as requiring a checksum, but this patch enables
the user to set csum=false on IPv6 tunnels.

Signed-off-by: Mike Pattrick 
---
v2: Changed documentation, and added a NEWS item
v3: NEWS file merge conflict
---
 NEWS|  3 +++
 lib/netdev-native-tnl.c |  2 +-
 lib/netdev-vport.c  | 13 +++--
 lib/netdev.h|  9 -
 ofproto/tunnel.c| 11 +--
 tests/tunnel.at |  6 +++---
 vswitchd/vswitch.xml| 11 ---
 7 files changed, 43 insertions(+), 12 deletions(-)

diff --git a/NEWS b/NEWS
index c9e4064e6..3a75d3850 100644
--- a/NEWS
+++ b/NEWS
@@ -4,6 +4,9 @@ Post-v3.3.0
  * Conntrack now supports 'random' flag for selecting ports in a range
while natting and 'persistent' flag for selection of the IP address
from a range.
+ * IPv6 UDP tunnels will now honour the csum option. Configuring the
+   interface with "options:csum=false" now has the same effect in OVS
+   as the udp6zerocsumtx option has with kernel UDP tunnels.
 
 
 v3.3.0 - 16 Feb 2024
diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
index dee9ab344..e8258bc4e 100644
--- a/lib/netdev-native-tnl.c
+++ b/lib/netdev-native-tnl.c
@@ -424,7 +424,7 @@ udp_build_header(const struct netdev_tunnel_config *tnl_cfg,
 udp = netdev_tnl_ip_build_header(data, params, IPPROTO_UDP, 0);
 udp->udp_dst = tnl_cfg->dst_port;
 
-if (params->is_ipv6 || params->flow->tunnel.flags & FLOW_TNL_F_CSUM) {
+if (params->flow->tunnel.flags & FLOW_TNL_F_CSUM) {
 /* Write a value in now to mark that we should compute the checksum
  * later. 0x is handy because it is transparent to the
  * calculation. */
diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
index 60caa02fb..f9a778988 100644
--- a/lib/netdev-vport.c
+++ b/lib/netdev-vport.c
@@ -702,7 +702,9 @@ set_tunnel_config(struct netdev *dev_, const struct smap 
*args, char **errp)
 tnl_cfg.dst_port = htons(atoi(node->value));
 } else if (!strcmp(node->key, "csum") && has_csum) {
 if (!strcmp(node->value, "true")) {
-tnl_cfg.csum = true;
+tnl_cfg.csum = NETDEV_TNL_CSUM_ENABLED;
+} else if (!strcmp(node->value, "false")) {
+tnl_cfg.csum = NETDEV_TNL_CSUM_DISABLED;
 }
 } else if (!strcmp(node->key, "seq") && has_seq) {
 if (!strcmp(node->value, "true")) {
@@ -850,6 +852,11 @@ set_tunnel_config(struct netdev *dev_, const struct smap 
*args, char **errp)
 }
 }
 
+/* The default csum state for GRE is special. */
+if (tnl_cfg.csum == NETDEV_TNL_CSUM_DEFAULT && strstr(type, "gre")) {
+tnl_cfg.csum = NETDEV_TNL_CSUM_DEFAULT_GRE;
+}
+
 enum tunnel_layers layers = tunnel_supported_layers(type, _cfg);
 const char *full_type = (strcmp(type, "vxlan") ? type
  : (tnl_cfg.exts & (1 << OVS_VXLAN_EXT_GPE)
@@ -1026,8 +1033,10 @@ get_tunnel_config(const struct netdev *dev, struct smap 
*args)
 }
 }
 
-if (tnl_cfg->csum) {
+if (tnl_cfg->csum == NETDEV_TNL_CSUM_ENABLED) {
 smap_add(args, "csum", "true");
+} else if (tnl_cfg->csum == NETDEV_TNL_CSUM_DISABLED) {
+smap_add(args, "csum", "false");
 }
 
 if (tnl_cfg->set_seq) {
diff --git a/lib/netdev.h b/lib/netdev.h
index 67a8486bd..a79531e6d 100644
--- a/lib/netdev.h
+++ b/lib/netdev.h
@@ -111,6 +111,13 @@ enum netdev_srv6_flowlabel {
 SRV6_FLOWLABEL_COMPUTE,
 };
 
+enum netdev_tnl_csum {
+NETDEV_TNL_CSUM_DEFAULT,
+NETDEV_TNL_CSUM_ENABLED,
+NETDEV_TNL_CSUM_DISABLED,
+NETDEV_TNL_CSUM_DEFAULT_GRE,
+};
+
 /* Configuration specific to tunnels. */
 struct netdev_tunnel_config {
 ovs_be64 in_key;
@@ -139,7 +146,7 @@ struct netdev_tunnel_config {
 uint8_t tos;
 bool tos_inherit;
 
-bool csum;
+enum netdev_tnl_csum csum;
 bool dont_fragment;
 enum netdev_pt_mode pt_mode;
 
diff --git a/ofproto/tunnel.c b/ofproto/tunnel.c
index 80ddee78a..6f462874e 100644
--- a/ofproto/tunnel.c
+++ b/ofproto/tunnel.c
@@ -465,9 +465,14 @@ tnl_port_send(const struct ofport_dpif *ofport, struct 
flow *flow,
 
 flow->tunnel.flags &= ~(FLOW_TNL_F_MASK & ~FLOW_TNL_PUB_F_MASK);
 flow->tunnel.flags |= (cfg->dont_fragment ? FLOW_TNL_F_DONT_FRAGMENT : 0)
-| (cfg->csum ? FLOW_TNL_F_CSUM : 0)
 | (cfg->out_key_present ? FLOW_TNL_F_KEY : 0);
 
+if (cfg->csum == NETDEV_TNL_CSUM_ENABLED) {
+flo

[ovs-dev] [PATCH v2] userspace: Allow UDP zero checksum with IPv6 tunnels.

2024-02-21 Thread Mike Pattrick
This patch adopts the proposed RFC 6935 by allowing null UDP checksums
even if the tunnel protocol is IPv6. This is already supported by Linux
through the udp6zerocsumtx tunnel option. It is disabled by default and
IPv6 tunnels are flagged as requiring a checksum, but this patch enables
the user to set csum=false on IPv6 tunnels.

Signed-off-by: Mike Pattrick 
---
v2: Changed documentation, and added a NEWS item
---
 NEWS|  5 -
 lib/netdev-native-tnl.c |  2 +-
 lib/netdev-vport.c  | 13 +++--
 lib/netdev.h|  9 -
 ofproto/tunnel.c| 11 +--
 tests/tunnel.at |  6 +++---
 vswitchd/vswitch.xml| 11 ---
 7 files changed, 44 insertions(+), 13 deletions(-)

diff --git a/NEWS b/NEWS
index 0789dc0c6..84402ff8f 100644
--- a/NEWS
+++ b/NEWS
@@ -1,6 +1,9 @@
 Post-v3.3.0
 
-
+   - Userspace datapath:
+ * IPv6 UDP tunnels will now honour the csum option. Configuring the
+   interface with "options:csum=false" now has the same effect in OVS
+   as the udp6zerocsumtx option has with kernel UDP tunnels.
 
 v3.3.0 - 16 Feb 2024
 
diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
index dee9ab344..e8258bc4e 100644
--- a/lib/netdev-native-tnl.c
+++ b/lib/netdev-native-tnl.c
@@ -424,7 +424,7 @@ udp_build_header(const struct netdev_tunnel_config *tnl_cfg,
 udp = netdev_tnl_ip_build_header(data, params, IPPROTO_UDP, 0);
 udp->udp_dst = tnl_cfg->dst_port;
 
-if (params->is_ipv6 || params->flow->tunnel.flags & FLOW_TNL_F_CSUM) {
+if (params->flow->tunnel.flags & FLOW_TNL_F_CSUM) {
 /* Write a value in now to mark that we should compute the checksum
  * later. 0x is handy because it is transparent to the
  * calculation. */
diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
index 60caa02fb..f9a778988 100644
--- a/lib/netdev-vport.c
+++ b/lib/netdev-vport.c
@@ -702,7 +702,9 @@ set_tunnel_config(struct netdev *dev_, const struct smap 
*args, char **errp)
 tnl_cfg.dst_port = htons(atoi(node->value));
 } else if (!strcmp(node->key, "csum") && has_csum) {
 if (!strcmp(node->value, "true")) {
-tnl_cfg.csum = true;
+tnl_cfg.csum = NETDEV_TNL_CSUM_ENABLED;
+} else if (!strcmp(node->value, "false")) {
+tnl_cfg.csum = NETDEV_TNL_CSUM_DISABLED;
 }
 } else if (!strcmp(node->key, "seq") && has_seq) {
 if (!strcmp(node->value, "true")) {
@@ -850,6 +852,11 @@ set_tunnel_config(struct netdev *dev_, const struct smap 
*args, char **errp)
 }
 }
 
+/* The default csum state for GRE is special. */
+if (tnl_cfg.csum == NETDEV_TNL_CSUM_DEFAULT && strstr(type, "gre")) {
+tnl_cfg.csum = NETDEV_TNL_CSUM_DEFAULT_GRE;
+}
+
 enum tunnel_layers layers = tunnel_supported_layers(type, _cfg);
 const char *full_type = (strcmp(type, "vxlan") ? type
  : (tnl_cfg.exts & (1 << OVS_VXLAN_EXT_GPE)
@@ -1026,8 +1033,10 @@ get_tunnel_config(const struct netdev *dev, struct smap 
*args)
 }
 }
 
-if (tnl_cfg->csum) {
+if (tnl_cfg->csum == NETDEV_TNL_CSUM_ENABLED) {
 smap_add(args, "csum", "true");
+} else if (tnl_cfg->csum == NETDEV_TNL_CSUM_DISABLED) {
+smap_add(args, "csum", "false");
 }
 
 if (tnl_cfg->set_seq) {
diff --git a/lib/netdev.h b/lib/netdev.h
index 67a8486bd..a79531e6d 100644
--- a/lib/netdev.h
+++ b/lib/netdev.h
@@ -111,6 +111,13 @@ enum netdev_srv6_flowlabel {
 SRV6_FLOWLABEL_COMPUTE,
 };
 
+enum netdev_tnl_csum {
+NETDEV_TNL_CSUM_DEFAULT,
+NETDEV_TNL_CSUM_ENABLED,
+NETDEV_TNL_CSUM_DISABLED,
+NETDEV_TNL_CSUM_DEFAULT_GRE,
+};
+
 /* Configuration specific to tunnels. */
 struct netdev_tunnel_config {
 ovs_be64 in_key;
@@ -139,7 +146,7 @@ struct netdev_tunnel_config {
 uint8_t tos;
 bool tos_inherit;
 
-bool csum;
+enum netdev_tnl_csum csum;
 bool dont_fragment;
 enum netdev_pt_mode pt_mode;
 
diff --git a/ofproto/tunnel.c b/ofproto/tunnel.c
index 80ddee78a..6f462874e 100644
--- a/ofproto/tunnel.c
+++ b/ofproto/tunnel.c
@@ -465,9 +465,14 @@ tnl_port_send(const struct ofport_dpif *ofport, struct 
flow *flow,
 
 flow->tunnel.flags &= ~(FLOW_TNL_F_MASK & ~FLOW_TNL_PUB_F_MASK);
 flow->tunnel.flags |= (cfg->dont_fragment ? FLOW_TNL_F_DONT_FRAGMENT : 0)
-| (cfg->csum ? FLOW_TNL_F_CSUM : 0)
 | (cfg->out_key_present ? FLOW_TNL_F_KEY : 0);
 
+if (cfg->csum == NETDEV_TNL_CSUM_ENABLED) {
+flow->tunnel.flags |= FLOW_TNL_F_CSUM;
+} else if (cfg->csum == NETDEV_TNL_CSUM_DEFAULT && !flow->tunnel.ip_dst

Re: [ovs-dev] [PATCH 1/3] tests: Move the non-local port as tunnel endpoint test.

2024-02-20 Thread Mike Pattrick
On Tue, Feb 20, 2024 at 5:35 PM Ilya Maximets  wrote:
>
> It's not a system test as it runs with dummy datapath and ports
> and it has nothing to do with layer 3 tunnels.
>
> It should be with other userspace tunnel tests.
>
> While moving also making it a little nicer visually and less error
> prone by requesting port numbers for all the ports.
>
> Signed-off-by: Ilya Maximets 

Acked-by: Mike Pattrick 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] userspace: Allow UDP zero checksum with IPv6 tunnels.

2024-02-20 Thread Mike Pattrick
On Tue, Feb 20, 2024 at 8:56 PM Mike Pattrick  wrote:
>
> This patch adopts the proposed RFC 6935 by allowing null UDP checksums
> even if the tunnel protocol is IPv6. This is already supported by Linux
> through the udp6zerocsumtx tunnel option. It is disabled by default and
> IPv6 tunnels are flagged as requiring a checksum, but this patch enables
> the user to set csum=false on IPv6 tunnels.
>
> Signed-off-by: Mike Pattrick 

One of the github CI runners failed this in test "bfd - bfd decay". I
believe this is a false negative.

-M

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] dp-packet: Don't offload inner csum if outer isn't supported.

2024-02-20 Thread Mike Pattrick
On Tue, Feb 20, 2024 at 10:07 AM Mike Pattrick  wrote:
>
> Some network cards support inner checksum offloading but not outer
> checksum offloading. Currently OVS will resolve that outer checksum but
> allows the network card to resolve the inner checksum, invalidating the
> outer checksum in the process.
>
> Now if we can't offload outer checksums, we don't offload inner either.
>
> Reported-at: https://issues.redhat.com/browse/FDP-363
> Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.")
> Signed-off-by: Mike Pattrick 

Intel CI failed this patch at "conntrack - invalid", with error
message "upcall_cb failure: ukey installation fails".

I believe this is a false negative.

-M

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] Userspace: Software fallback for UDP encapsulated TCP segmentation.

2024-02-20 Thread Mike Pattrick
When sending packets that are flagged as requiring segmentation to an
interface that doens't support this feature, send the packet to the TSO
software fallback instead of dropping it.

Signed-off-by: Mike Pattrick 
---
 lib/dp-packet-gso.c | 73 +
 lib/dp-packet.h | 26 +++
 lib/netdev-native-tnl.c |  8 +
 lib/netdev.c| 37 +
 tests/system-traffic.at | 58 
 5 files changed, 167 insertions(+), 35 deletions(-)

diff --git a/lib/dp-packet-gso.c b/lib/dp-packet-gso.c
index 847685ad9..f25abf436 100644
--- a/lib/dp-packet-gso.c
+++ b/lib/dp-packet-gso.c
@@ -47,6 +47,8 @@ dp_packet_gso_seg_new(const struct dp_packet *p, size_t 
hdr_len,
 seg->l2_5_ofs = p->l2_5_ofs;
 seg->l3_ofs = p->l3_ofs;
 seg->l4_ofs = p->l4_ofs;
+seg->inner_l3_ofs = p->inner_l3_ofs;
+seg->inner_l4_ofs = p->inner_l4_ofs;
 
 /* The protocol headers remain the same, so preserve hash and mark. */
 *dp_packet_rss_ptr(seg) = *dp_packet_rss_ptr(p);
@@ -71,7 +73,12 @@ dp_packet_gso_nr_segs(struct dp_packet *p)
 const char *data_tail;
 const char *data_pos;
 
-data_pos = dp_packet_get_tcp_payload(p);
+if (dp_packet_hwol_is_tunnel_vxlan(p) ||
+dp_packet_hwol_is_tunnel_geneve(p)) {
+data_pos = dp_packet_get_inner_tcp_payload(p);
+} else {
+data_pos = dp_packet_get_tcp_payload(p);
+}
 data_tail = (char *) dp_packet_tail(p) - dp_packet_l2_pad_size(p);
 
 return DIV_ROUND_UP(data_tail - data_pos, segsz);
@@ -91,12 +98,15 @@ dp_packet_gso(struct dp_packet *p, struct dp_packet_batch 
**batches)
 struct tcp_header *tcp_hdr;
 struct ip_header *ip_hdr;
 struct dp_packet *seg;
+const char *data_pos;
 uint16_t tcp_offset;
 uint16_t tso_segsz;
+uint16_t ip_id = 0;
 uint32_t tcp_seq;
-uint16_t ip_id;
+bool outer_ipv4;
 int hdr_len;
 int seg_len;
+bool tnl;
 
 tso_segsz = dp_packet_get_tso_segsz(p);
 if (!tso_segsz) {
@@ -105,20 +115,35 @@ dp_packet_gso(struct dp_packet *p, struct dp_packet_batch 
**batches)
 return false;
 }
 
-tcp_hdr = dp_packet_l4(p);
-tcp_offset = TCP_OFFSET(tcp_hdr->tcp_ctl);
-tcp_seq = ntohl(get_16aligned_be32(_hdr->tcp_seq));
-hdr_len = ((char *) dp_packet_l4(p) - (char *) dp_packet_eth(p))
-  + tcp_offset * 4;
-ip_id = 0;
-if (dp_packet_hwol_is_ipv4(p)) {
+if (dp_packet_hwol_is_tunnel_vxlan(p) ||
+dp_packet_hwol_is_tunnel_geneve(p)) {
+data_pos =  dp_packet_get_inner_tcp_payload(p);
+outer_ipv4 = dp_packet_hwol_is_outer_ipv4(p);
+tcp_hdr = dp_packet_inner_l4(p);
+ip_hdr = dp_packet_inner_l3(p);
+tnl = true;
+if (outer_ipv4) {
+ip_id = ntohs(((struct ip_header *) dp_packet_l3(p))->ip_id);
+} else if (dp_packet_hwol_is_ipv4(p)) {
+ip_id = ntohs(ip_hdr->ip_id);
+}
+} else {
+data_pos = dp_packet_get_tcp_payload(p);
+outer_ipv4 = dp_packet_hwol_is_ipv4(p);
+tcp_hdr = dp_packet_l4(p);
 ip_hdr = dp_packet_l3(p);
-ip_id = ntohs(ip_hdr->ip_id);
+tnl = false;
+if (outer_ipv4) {
+ip_id = ntohs(ip_hdr->ip_id);
+}
 }
 
+tcp_offset = TCP_OFFSET(tcp_hdr->tcp_ctl);
+tcp_seq = ntohl(get_16aligned_be32(_hdr->tcp_seq));
+hdr_len = ((char *) tcp_hdr - (char *) dp_packet_eth(p))
+  + tcp_offset * 4;
 const char *data_tail = (char *) dp_packet_tail(p)
 - dp_packet_l2_pad_size(p);
-const char *data_pos = dp_packet_get_tcp_payload(p);
 int n_segs = dp_packet_gso_nr_segs(p);
 
 for (int i = 0; i < n_segs; i++) {
@@ -130,8 +155,26 @@ dp_packet_gso(struct dp_packet *p, struct dp_packet_batch 
**batches)
 seg = dp_packet_gso_seg_new(p, hdr_len, data_pos, seg_len);
 data_pos += seg_len;
 
+if (tnl) {
+/* Update tunnel L3 header. */
+if (dp_packet_hwol_is_ipv4(seg)) {
+ip_hdr = dp_packet_inner_l3(seg);
+ip_hdr->ip_tot_len = htons(sizeof *ip_hdr +
+   dp_packet_inner_l4_size(seg));
+ip_hdr->ip_id = htons(ip_id);
+ip_hdr->ip_csum = 0;
+ip_id++;
+} else {
+struct ovs_16aligned_ip6_hdr *ip6_hdr;
+
+ip6_hdr = dp_packet_inner_l3(seg);
+ip6_hdr->ip6_ctlun.ip6_un1.ip6_un1_plen
+= htons(dp_packet_inner_l3_size(seg) - sizeof *ip6_hdr);
+}
+}
+
 /* Update L3 header. */
-if (dp_packet_hwol_is_ipv4(seg)) {
+if (outer_ipv4) {
 ip_hdr = dp_packet_l3(seg);
 ip_hdr->ip_tot_len = htons(sizeof *ip_hdr +
  

[ovs-dev] [PATCH] userspace: Allow UDP zero checksum with IPv6 tunnels.

2024-02-20 Thread Mike Pattrick
This patch adopts the proposed RFC 6935 by allowing null UDP checksums
even if the tunnel protocol is IPv6. This is already supported by Linux
through the udp6zerocsumtx tunnel option. It is disabled by default and
IPv6 tunnels are flagged as requiring a checksum, but this patch enables
the user to set csum=false on IPv6 tunnels.

Signed-off-by: Mike Pattrick 
---
 lib/netdev-native-tnl.c |  2 +-
 lib/netdev-vport.c  | 13 +++--
 lib/netdev.h|  9 -
 ofproto/tunnel.c| 11 +--
 tests/tunnel.at |  6 +++---
 vswitchd/vswitch.xml|  8 +---
 6 files changed, 37 insertions(+), 12 deletions(-)

diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
index dee9ab344..e8258bc4e 100644
--- a/lib/netdev-native-tnl.c
+++ b/lib/netdev-native-tnl.c
@@ -424,7 +424,7 @@ udp_build_header(const struct netdev_tunnel_config *tnl_cfg,
 udp = netdev_tnl_ip_build_header(data, params, IPPROTO_UDP, 0);
 udp->udp_dst = tnl_cfg->dst_port;
 
-if (params->is_ipv6 || params->flow->tunnel.flags & FLOW_TNL_F_CSUM) {
+if (params->flow->tunnel.flags & FLOW_TNL_F_CSUM) {
 /* Write a value in now to mark that we should compute the checksum
  * later. 0x is handy because it is transparent to the
  * calculation. */
diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
index 60caa02fb..f9a778988 100644
--- a/lib/netdev-vport.c
+++ b/lib/netdev-vport.c
@@ -702,7 +702,9 @@ set_tunnel_config(struct netdev *dev_, const struct smap 
*args, char **errp)
 tnl_cfg.dst_port = htons(atoi(node->value));
 } else if (!strcmp(node->key, "csum") && has_csum) {
 if (!strcmp(node->value, "true")) {
-tnl_cfg.csum = true;
+tnl_cfg.csum = NETDEV_TNL_CSUM_ENABLED;
+} else if (!strcmp(node->value, "false")) {
+tnl_cfg.csum = NETDEV_TNL_CSUM_DISABLED;
 }
 } else if (!strcmp(node->key, "seq") && has_seq) {
 if (!strcmp(node->value, "true")) {
@@ -850,6 +852,11 @@ set_tunnel_config(struct netdev *dev_, const struct smap 
*args, char **errp)
 }
 }
 
+/* The default csum state for GRE is special. */
+if (tnl_cfg.csum == NETDEV_TNL_CSUM_DEFAULT && strstr(type, "gre")) {
+tnl_cfg.csum = NETDEV_TNL_CSUM_DEFAULT_GRE;
+}
+
 enum tunnel_layers layers = tunnel_supported_layers(type, _cfg);
 const char *full_type = (strcmp(type, "vxlan") ? type
  : (tnl_cfg.exts & (1 << OVS_VXLAN_EXT_GPE)
@@ -1026,8 +1033,10 @@ get_tunnel_config(const struct netdev *dev, struct smap 
*args)
 }
 }
 
-if (tnl_cfg->csum) {
+if (tnl_cfg->csum == NETDEV_TNL_CSUM_ENABLED) {
 smap_add(args, "csum", "true");
+} else if (tnl_cfg->csum == NETDEV_TNL_CSUM_DISABLED) {
+smap_add(args, "csum", "false");
 }
 
 if (tnl_cfg->set_seq) {
diff --git a/lib/netdev.h b/lib/netdev.h
index 67a8486bd..a79531e6d 100644
--- a/lib/netdev.h
+++ b/lib/netdev.h
@@ -111,6 +111,13 @@ enum netdev_srv6_flowlabel {
 SRV6_FLOWLABEL_COMPUTE,
 };
 
+enum netdev_tnl_csum {
+NETDEV_TNL_CSUM_DEFAULT,
+NETDEV_TNL_CSUM_ENABLED,
+NETDEV_TNL_CSUM_DISABLED,
+NETDEV_TNL_CSUM_DEFAULT_GRE,
+};
+
 /* Configuration specific to tunnels. */
 struct netdev_tunnel_config {
 ovs_be64 in_key;
@@ -139,7 +146,7 @@ struct netdev_tunnel_config {
 uint8_t tos;
 bool tos_inherit;
 
-bool csum;
+enum netdev_tnl_csum csum;
 bool dont_fragment;
 enum netdev_pt_mode pt_mode;
 
diff --git a/ofproto/tunnel.c b/ofproto/tunnel.c
index 80ddee78a..6f462874e 100644
--- a/ofproto/tunnel.c
+++ b/ofproto/tunnel.c
@@ -465,9 +465,14 @@ tnl_port_send(const struct ofport_dpif *ofport, struct 
flow *flow,
 
 flow->tunnel.flags &= ~(FLOW_TNL_F_MASK & ~FLOW_TNL_PUB_F_MASK);
 flow->tunnel.flags |= (cfg->dont_fragment ? FLOW_TNL_F_DONT_FRAGMENT : 0)
-| (cfg->csum ? FLOW_TNL_F_CSUM : 0)
 | (cfg->out_key_present ? FLOW_TNL_F_KEY : 0);
 
+if (cfg->csum == NETDEV_TNL_CSUM_ENABLED) {
+flow->tunnel.flags |= FLOW_TNL_F_CSUM;
+} else if (cfg->csum == NETDEV_TNL_CSUM_DEFAULT && !flow->tunnel.ip_dst) {
+flow->tunnel.flags |= FLOW_TNL_F_CSUM;
+}
+
 if (cfg->set_egress_pkt_mark) {
 flow->pkt_mark = cfg->egress_pkt_mark;
 wc->masks.pkt_mark = UINT32_MAX;
@@ -706,8 +711,10 @@ tnl_port_format(const struct tnl_port *tnl_port, struct ds 
*ds)
 ds_put_cstr(ds, ", df=false");
 }
 
-if (cfg->csum) {
+if (cfg->csum == NETDEV_TNL_CSUM_ENABLED) {
 ds_put_cstr(ds, ", csum=true");
+} else if (cfg->csum == NETDEV_TNL_CSUM_DISABL

  1   2   3   4   5   6   7   >