TCA_ACT_MAX_NUM limits the actions number from the OVS perspective,
i.e., the size of the actions array in tc_flower structure.  However,
these actions can be represented by multiple actual TC actions in the
netlink message.  For example, output to internal ports requires extra
"skbedit" action to change the packet type before sending it to the
"mirred" action.  We also emit extra "csum" actions and put tunnel
attributes, etc.

In the end, the number of actual TC actions is frequently larger
than the number of actions in tc_flower structure.  Meaning that the
number of priorities in use can be larger than TCA_ACT_MAX_NUM.

If that happens, we fail to parse the full ECHO reply for the netlink
transaction and report a mismatch between requested and acknowledged
configuration.  The traffic should work fine, but we have the annoying
warning in the logs and the revalidators will try updating that one
datapath flow triggering more warnings.

We need to always expect the full range of priorities while parsing
kernel replies.  And we need to check that the number of used
priorities doesn't actually exceed the TCA_ACT_MAX_PRIO before sending
requests to the kernel, otherwise they can be "silently" truncated in
the absence of the kernel commit:
  5f5f92912b43 ("tc: Return an error if filters try to attach too many actions")

Also adding some comments around the indexing, as it is confusing
that priorities for TC flower actions are counted from 1, but the
action table indexes for the police action are starting from zero.
Both using the same constant for the upper bound, but it is inclusive
in one case and not in the other.

Notes:

Technically, there should be no need for the artificial limit imposed
by TCA_ACT_MAX_NUM, since the issue it was trying to work around was
fixed in the kernel commit:
  369609fc6272 ("tc: Ensure we have enough buffer space when sending filter 
netlink notifications")
But that fix is not available in 5.x kernels at the moment.

TCA_ACT_MAX_NUM by itself is a wrong concept.  It supposed to limit
the number of actual TC actions, but it is applied in two different
places limiting tc_flower actions array instead, while there is
not a 1 to 1 mapping between them and the actual TC actions generated.
It sort of works for the most part, but fails when the mapping is
significantly imbalanced or the number of actions is on the edge of
the message size.  To be fair, it was not meant as a precise limit.
And we may actually want some limit at offload-tc-netdev level for
performance reasons.  But the concept is flawed.

Fixing the logic around TCA_ACT_MAX_NUM or removing it outright is
a little orthogonal to this patch though, as we'd still need both the
TCA_ACT_MAX_PRIO check to avoid sending more than 32 actions to the
kernel, and the full range of parsing for actions in replies.

Fixes: d0fbb09f907b ("tc: Limit the max action number to 16")
Fixes: 0c70132cd288 ("tc: Make the actions order consistent")
Reported-at: https://redhat.atlassian.net/browse/FDP-4005
Signed-off-by: Ilya Maximets <[email protected]>
---
 lib/tc.c                         |  9 +++++--
 tests/system-offloads-traffic.at | 43 ++++++++++++++++++++++++++++++++
 2 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/lib/tc.c b/lib/tc.c
index 4a9c6c267..1ae026b0e 100644
--- a/lib/tc.c
+++ b/lib/tc.c
@@ -2133,7 +2133,8 @@ nl_parse_flower_actions(struct nlattr **attrs, struct 
tc_flower *flower,
                         bool terse)
 {
     const struct nlattr *actions = attrs[TCA_FLOWER_ACT];
-    static struct nl_policy actions_orders_policy[TCA_ACT_MAX_NUM + 1] = {};
+    /* Priorities of flower actions start at one: [1, TCA_ACT_MAX_PRIO]. */
+    static struct nl_policy actions_orders_policy[TCA_ACT_MAX_PRIO + 1] = {};
     struct nlattr *actions_orders[ARRAY_SIZE(actions_orders_policy)];
     const int max_size = ARRAY_SIZE(actions_orders_policy);
     int previous_action_count = 0;
@@ -2402,6 +2403,7 @@ tc_dump_tc_action_start(char *name, struct nl_dump *dump)
 int
 parse_netlink_to_tc_policer(struct ofpbuf *reply, uint32_t police_idx[])
 {
+    /* Action tables are indexed from zero: [0, TCA_ACT_MAX_PRIO - 1]. */
     static struct nl_policy actions_orders_policy[TCA_ACT_MAX_PRIO] = {};
     struct ofpbuf b = ofpbuf_const_initializer(reply->data, reply->size);
     struct nlattr *actions_orders[ARRAY_SIZE(actions_orders_policy)];
@@ -3540,7 +3542,10 @@ nl_msg_put_flower_acts(struct ofpbuf *request, struct 
tc_flower *flower)
     }
     nl_msg_end_nested(request, offset);
 
-    return 0;
+    /* Some tc_actions generate more than one actual TC action.  Make sure
+     * we're within the kernel's limit.  'act_index - 1' is the last used
+     * priority / index. */
+    return act_index - 1 <= TCA_ACT_MAX_PRIO ? 0 : -EOPNOTSUPP;
 }
 
 static void
diff --git a/tests/system-offloads-traffic.at b/tests/system-offloads-traffic.at
index 13a8054a4..55c26a0ce 100644
--- a/tests/system-offloads-traffic.at
+++ b/tests/system-offloads-traffic.at
@@ -1245,3 +1245,46 @@ AT_CHECK([ovs-appctl --format json --pretty 
dpif/offload/show \
 
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
+
+AT_SETUP([offloads - many output ports - actions limit])
+OVS_TRAFFIC_VSWITCHD_START([], [],
+  [-- set Open_vSwitch . other_config:hw-offload=true])
+
+AT_CHECK([ovs-appctl vlog/set tc:file:dbg dpif_netlink:file:dbg 
vconn:file:info])
+
+m4_for([id], [1], [17], [1], [
+  ADD_NAMESPACES(atns-id)
+  ADD_VETH(p-id, atns-id, br0, "10.1.1.id/24")
+])
+
+AT_CHECK([ovs-ofctl add-flow br0 "arp,actions=NORMAL"])
+dnl Forward all non-matching IP traffic towards ovs-p-1 (ICMP replies).
+AT_CHECK([ovs-ofctl add-flow br0 "priority=0,ip,actions=output:ovs-p-1"])
+
+m4_define([OUTPUT_ACTIONS],
+  [m4_for([n], [2], [$1], [1], [,output:ovs-p-n])])
+
+m4_define([DP_FLOW], ["in_port(ovs-p-1),eth(),eth_type(0x0800)"])
+
+dnl Gradually checking broadcast to more and more ports.  The last one should
+dnl hit the actions limit and fall back to kernel datapath.  Actions include
+dnl LOCAL output to check the mismatch between the number of OVS and TC flower
+dnl actions (output to internal port requires additional skbedit).
+m4_for([id], [2], [17], [1], [
+  AT_CHECK([ovs-ofctl add-flow br0 \
+    "priority=id,in_port=ovs-p-1,ip,actions=LOCAL[]OUTPUT_ACTIONS(id)"])
+  AT_CHECK([ovs-appctl revalidator/wait])
+  NS_CHECK_EXEC([atns-1], [ping -q -c 3 -i 0.1 -W 2 10.1.1.id | FORMAT_PING], 
[0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+  m4_if(id, [17], [
+    AT_CHECK([ovs-appctl dpctl/dump-flows --names type=tc  | grep DP_FLOW], 
[1])
+    AT_CHECK([ovs-appctl dpctl/dump-flows --names type=ovs | grep -q DP_FLOW])
+  ], [
+    AT_CHECK([ovs-appctl dpctl/dump-flows --names type=tc  | grep -q DP_FLOW])
+    AT_CHECK([ovs-appctl dpctl/dump-flows --names type=ovs | grep DP_FLOW], 
[1])
+  ])
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
-- 
2.54.0

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to