On 17 Jun 2026, at 12:25, Ilya Maximets wrote:

> TCA_ACT_MAX_NUM limits the actions number from the OVS perspective,
> i.e., the size of the actions array in tc_flower structure.  However,
> these actions can be represented by multiple actual TC actions in the
> netlink message.  For example, output to internal ports requires extra
> "skbedit" action to change the packet type before sending it to the
> "mirred" action.  We also emit extra "csum" actions and put tunnel
> attributes, etc.
>
> In the end, the number of actual TC actions is frequently larger
> than the number of actions in tc_flower structure.  Meaning that the
> number of priorities in use can be larger than TCA_ACT_MAX_NUM.
>
> If that happens, we fail to parse the full ECHO reply for the netlink
> transaction and report a mismatch between requested and acknowledged
> configuration.  The traffic should work fine, but we have the annoying
> warning in the logs and the revalidators will try updating that one
> datapath flow triggering more warnings.
>
> We need to always expect the full range of priorities while parsing
> kernel replies.  And we need to check that the number of used
> priorities doesn't actually exceed the TCA_ACT_MAX_PRIO before sending
> requests to the kernel, otherwise they can be "silently" truncated in
> the absence of the kernel commit:
>   5f5f92912b43 ("tc: Return an error if filters try to attach too many 
> actions")
>
> Also adding some comments around the indexing, as it is confusing
> that priorities for TC flower actions are counted from 1, but the
> action table indexes for the police action are starting from zero.
> Both using the same constant for the upper bound, but it is inclusive
> in one case and not in the other.
>
> Notes:
>
> Technically, there should be no need for the artificial limit imposed
> by TCA_ACT_MAX_NUM, since the issue it was trying to work around was
> fixed in the kernel commit:
>   369609fc6272 ("tc: Ensure we have enough buffer space when sending filter 
> netlink notifications")
> But that fix is not available in 5.x kernels at the moment.
>
> TCA_ACT_MAX_NUM by itself is a wrong concept.  It supposed to limit
> the number of actual TC actions, but it is applied in two different
> places limiting tc_flower actions array instead, while there is
> not a 1 to 1 mapping between them and the actual TC actions generated.
> It sort of works for the most part, but fails when the mapping is
> significantly imbalanced or the number of actions is on the edge of
> the message size.  To be fair, it was not meant as a precise limit.
> And we may actually want some limit at offload-tc-netdev level for
> performance reasons.  But the concept is flawed.
>
> Fixing the logic around TCA_ACT_MAX_NUM or removing it outright is
> a little orthogonal to this patch though, as we'd still need both the
> TCA_ACT_MAX_PRIO check to avoid sending more than 32 actions to the
> kernel, and the full range of parsing for actions in replies.
>
> Fixes: d0fbb09f907b ("tc: Limit the max action number to 16")
> Fixes: 0c70132cd288 ("tc: Make the actions order consistent")
> Reported-at: https://redhat.atlassian.net/browse/FDP-4005
> Signed-off-by: Ilya Maximets <[email protected]>

Thanks for the quick fix, the change looks good to me.  One small nit
about not using the ovs-p1 naming convention, see below.

Acked-by: Eelco Chaudron <[email protected]>

> ---
>  lib/tc.c                         |  9 +++++--
>  tests/system-offloads-traffic.at | 43 ++++++++++++++++++++++++++++++++
>  2 files changed, 50 insertions(+), 2 deletions(-)
>
> diff --git a/lib/tc.c b/lib/tc.c
> index 4a9c6c267..1ae026b0e 100644
> --- a/lib/tc.c
> +++ b/lib/tc.c
> @@ -2133,7 +2133,8 @@ nl_parse_flower_actions(struct nlattr **attrs, struct 
> tc_flower *flower,
>                          bool terse)
>  {
>      const struct nlattr *actions = attrs[TCA_FLOWER_ACT];
> -    static struct nl_policy actions_orders_policy[TCA_ACT_MAX_NUM + 1] = {};
> +    /* Priorities of flower actions start at one: [1, TCA_ACT_MAX_PRIO]. */
> +    static struct nl_policy actions_orders_policy[TCA_ACT_MAX_PRIO + 1] = {};
>      struct nlattr *actions_orders[ARRAY_SIZE(actions_orders_policy)];
>      const int max_size = ARRAY_SIZE(actions_orders_policy);
>      int previous_action_count = 0;
> @@ -2402,6 +2403,7 @@ tc_dump_tc_action_start(char *name, struct nl_dump 
> *dump)
>  int
>  parse_netlink_to_tc_policer(struct ofpbuf *reply, uint32_t police_idx[])
>  {
> +    /* Action tables are indexed from zero: [0, TCA_ACT_MAX_PRIO - 1]. */
>      static struct nl_policy actions_orders_policy[TCA_ACT_MAX_PRIO] = {};
>      struct ofpbuf b = ofpbuf_const_initializer(reply->data, reply->size);
>      struct nlattr *actions_orders[ARRAY_SIZE(actions_orders_policy)];
> @@ -3540,7 +3542,10 @@ nl_msg_put_flower_acts(struct ofpbuf *request, struct 
> tc_flower *flower)
>      }
>      nl_msg_end_nested(request, offset);
>
> -    return 0;
> +    /* Some tc_actions generate more than one actual TC action.  Make sure
> +     * we're within the kernel's limit.  'act_index - 1' is the last used
> +     * priority / index. */
> +    return act_index - 1 <= TCA_ACT_MAX_PRIO ? 0 : -EOPNOTSUPP;
>  }
>
>  static void
> diff --git a/tests/system-offloads-traffic.at 
> b/tests/system-offloads-traffic.at
> index 13a8054a4..55c26a0ce 100644
> --- a/tests/system-offloads-traffic.at
> +++ b/tests/system-offloads-traffic.at
> @@ -1245,3 +1245,46 @@ AT_CHECK([ovs-appctl --format json --pretty 
> dpif/offload/show \
>
>  OVS_TRAFFIC_VSWITCHD_STOP
>  AT_CLEANUP
> +
> +AT_SETUP([offloads - many output ports - actions limit])
> +OVS_TRAFFIC_VSWITCHD_START([], [],
> +  [-- set Open_vSwitch . other_config:hw-offload=true])
> +
> +AT_CHECK([ovs-appctl vlog/set tc:file:dbg dpif_netlink:file:dbg 
> vconn:file:info])
> +
> +m4_for([id], [1], [17], [1], [
> +  ADD_NAMESPACES(atns-id)
> +  ADD_VETH(p-id, atns-id, br0, "10.1.1.id/24")
> +])
> +
> +AT_CHECK([ovs-ofctl add-flow br0 "arp,actions=NORMAL"])
> +dnl Forward all non-matching IP traffic towards ovs-p-1 (ICMP replies).
> +AT_CHECK([ovs-ofctl add-flow br0 "priority=0,ip,actions=output:ovs-p-1"])

All other tests use the ovs-p1 naming convention, it would be nice to
do the same here (not a blocker).  The m4 empty-quote trick makes the
macros look a little odd, but it works correctly.  Here is an example
diff:

diff --git a/tests/system-offloads-traffic.at b/tests/system-offloads-traffic.at
index 55c26a0ce..d3abe12df 100644
--- a/tests/system-offloads-traffic.at
+++ b/tests/system-offloads-traffic.at
@@ -1253,18 +1253,18 @@ OVS_TRAFFIC_VSWITCHD_START([], [],
 AT_CHECK([ovs-appctl vlog/set tc:file:dbg dpif_netlink:file:dbg 
vconn:file:info])

 m4_for([id], [1], [17], [1], [
-  ADD_NAMESPACES(atns-id)
-  ADD_VETH(p-id, atns-id, br0, "10.1.1.id/24")
+  ADD_NAMESPACES(atns[]id)
+  ADD_VETH(p[]id, atns[]id, br0, "10.1.1.id/24")
 ])

 AT_CHECK([ovs-ofctl add-flow br0 "arp,actions=NORMAL"])
-dnl Forward all non-matching IP traffic towards ovs-p-1 (ICMP replies).
-AT_CHECK([ovs-ofctl add-flow br0 "priority=0,ip,actions=output:ovs-p-1"])
+dnl Forward all non-matching IP traffic towards ovs-p1 (ICMP replies).
+AT_CHECK([ovs-ofctl add-flow br0 "priority=0,ip,actions=output:ovs-p1"])

 m4_define([OUTPUT_ACTIONS],
-  [m4_for([n], [2], [$1], [1], [,output:ovs-p-n])])
+  [m4_for([n], [2], [$1], [1], [,output:ovs-p[]n])])

-m4_define([DP_FLOW], ["in_port(ovs-p-1),eth(),eth_type(0x0800)"])
+m4_define([DP_FLOW], ["in_port(ovs-p1),eth(),eth_type(0x0800)"])

 dnl Gradually checking broadcast to more and more ports.  The last one should
 dnl hit the actions limit and fall back to kernel datapath.  Actions include
@@ -1272,9 +1272,9 @@ dnl LOCAL output to check the mismatch between the number 
of OVS and TC flower
 dnl actions (output to internal port requires additional skbedit).
 m4_for([id], [2], [17], [1], [
   AT_CHECK([ovs-ofctl add-flow br0 \
-    "priority=id,in_port=ovs-p-1,ip,actions=LOCAL[]OUTPUT_ACTIONS(id)"])
+    "priority=id,in_port=ovs-p1,ip,actions=LOCAL[]OUTPUT_ACTIONS(id)"])
   AT_CHECK([ovs-appctl revalidator/wait])
-  NS_CHECK_EXEC([atns-1], [ping -q -c 3 -i 0.1 -W 2 10.1.1.id | FORMAT_PING], 
[0], [dnl
+  NS_CHECK_EXEC([atns1], [ping -q -c 3 -i 0.1 -W 2 10.1.1.id | FORMAT_PING], 
[0], [dnl
 3 packets transmitted, 3 received, 0% packet loss, time 0ms
 ])
   m4_if(id, [17], [

> +
> +m4_define([OUTPUT_ACTIONS],
> +  [m4_for([n], [2], [$1], [1], [,output:ovs-p-n])])
> +
> +m4_define([DP_FLOW], ["in_port(ovs-p-1),eth(),eth_type(0x0800)"])
> +
> +dnl Gradually checking broadcast to more and more ports.  The last one should
> +dnl hit the actions limit and fall back to kernel datapath.  Actions include
> +dnl LOCAL output to check the mismatch between the number of OVS and TC 
> flower
> +dnl actions (output to internal port requires additional skbedit).
> +m4_for([id], [2], [17], [1], [
> +  AT_CHECK([ovs-ofctl add-flow br0 \
> +    "priority=id,in_port=ovs-p-1,ip,actions=LOCAL[]OUTPUT_ACTIONS(id)"])
> +  AT_CHECK([ovs-appctl revalidator/wait])
> +  NS_CHECK_EXEC([atns-1], [ping -q -c 3 -i 0.1 -W 2 10.1.1.id | 
> FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +  m4_if(id, [17], [
> +    AT_CHECK([ovs-appctl dpctl/dump-flows --names type=tc  | grep DP_FLOW], 
> [1])
> +    AT_CHECK([ovs-appctl dpctl/dump-flows --names type=ovs | grep -q 
> DP_FLOW])
> +  ], [
> +    AT_CHECK([ovs-appctl dpctl/dump-flows --names type=tc  | grep -q 
> DP_FLOW])
> +    AT_CHECK([ovs-appctl dpctl/dump-flows --names type=ovs | grep DP_FLOW], 
> [1])
> +  ])
> +])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> -- 
> 2.54.0

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to