On 4/29/24 16:48, Eelco Chaudron wrote:
> While offloading header modifications to TC, OVS is using {TCA_PEDIT} +
> {TCA_CSUM} combination as that it the only way to represent header
> rewrite.  However, {TCA_CSUM} is unable to calculate L4 checksums for
> IP fragments.
> 
> Since TC already applies fragmentation bit masking, this patch simply
> needs to prevent these packets from being processed through TC.
> 
> Signed-off-by: Eelco Chaudron <[email protected]>
> ---

Thanks, Eelco.  Beside the avx512 failure, see a few comments below.

Should this also have a Fixes tag?

>  lib/netdev-offload-tc.c | 32 +++++++++++++++++++++
>  tests/system-traffic.at | 62 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 94 insertions(+)
> 
> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
> index 921d52317..7e915d419 100644
> --- a/lib/netdev-offload-tc.c
> +++ b/lib/netdev-offload-tc.c
> @@ -1488,6 +1488,30 @@ parse_put_flow_ct_action(struct tc_flower *flower,
>          return 0;
>  }
>  
> +static bool
> +will_tc_add_l4_checksum(struct tc_flower *flower, int type)
> +{
> +    /* This function returns true if the tc layer will add a l4 checksum 
> action
> +     * for this set action. Refer to the csum_update_flag() function for
> +     * detailed logic. Note that even the kernel only supports updating TCP,
> +     * UDP and ICMPv6. */

Nit: double spaces between sentences.

We should add a similar comment to the csum_update_flag() prompting
anyone changing this function to update this one as well.

Ideally we would just fail in csum_update_flag() itself to avoid the
logic duplication in different places.  Can it be done or is it a
performance concern?

> +    switch (type) {
> +    case OVS_KEY_ATTR_IPV4:
> +    case OVS_KEY_ATTR_IPV6:
> +    case OVS_KEY_ATTR_TCP:
> +    case OVS_KEY_ATTR_UDP:
> +        switch (flower->key.ip_proto) {
> +        case IPPROTO_TCP:
> +        case IPPROTO_UDP:
> +        case IPPROTO_ICMPV6:
> +        case IPPROTO_UDPLITE:
> +            return true;
> +        }
> +        break;
> +    }
> +    return false;
> +}
> +
>  static int
>  parse_put_flow_set_masked_action(struct tc_flower *flower,
>                                   struct tc_action *action,
> @@ -1520,6 +1544,14 @@ parse_put_flow_set_masked_action(struct tc_flower 
> *flower,
>          return EOPNOTSUPP;
>      }
>  
> +    if (flower->key.flags & TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT
> +        && will_tc_add_l4_checksum(flower, type)) {
> +        VLOG_DBG_RL(&rl, "set action type %d not supported on fragments "
> +                    "due to checksum limitation", type);
> +        ofpbuf_uninit(&set_buf);
> +        return EOPNOTSUPP;
> +    }
> +
>      for (i = 0; i < ARRAY_SIZE(set_flower_map[type]); i++) {
>          struct netlink_field *f = &set_flower_map[type][i];
>  
> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> index bd7647cbe..d06d2f66a 100644
> --- a/tests/system-traffic.at
> +++ b/tests/system-traffic.at
> @@ -8977,3 +8977,65 @@ OVS_WAIT_UNTIL([cat p2.pcap | grep -E "0x0050: *0000 
> *0000 *5002 *2000 *b85e *00
>  
>  OVS_TRAFFIC_VSWITCHD_STOP
>  AT_CLEANUP
> +
> +

One line is enough.

> +AT_SETUP([datapath - IP mod_nw_src/set_field on fragments])

Should 'IP' be glued to 'fragments' instead?

> +AT_SKIP_IF([test $HAVE_TCPDUMP = no])
> +OVS_TRAFFIC_VSWITCHD_START()
> +
> +ADD_NAMESPACES(at_ns0, at_ns1)
> +
> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", 36:b1:ee:7c:01:03)
> +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24", 36:b1:ee:7c:01:02)
> +
> +AT_DATA([flows.txt], [dnl
> +  in_port=ovs-p0,ip,nw_src=10.1.1.1 actions=mod_nw_src=11.1.1.1,ovs-p1
> +  in_port=ovs-p0,ipv6,ipv6_src=fc00::1 
> actions=set_field:fc00::100->ipv6_src,ovs-p1
> +])
> +
> +AT_CHECK([ovs-ofctl del-flows br0])
> +AT_CHECK([ovs-ofctl -Oopenflow13 add-flows br0 flows.txt])
> +
> +NETNS_DAEMONIZE([at_ns1], [tcpdump -l -nn -xx -U -i p1 > p1.out],
> +  [tcpdump.pid])
> +sleep 1

Instead of sleeping we should wait for tcpdump to start listening.
Look for the following check in some other tests:

OVS_WAIT_UNTIL([grep "listening" tcpdump0_err])

> +
> +dnl We send each packet multiple times, ones for learning which will flow
> +dnl through the used datapath for learning, and the others will go through 
> the
> +dnl actuall datapath.

typo: actual

> +for i in 1 2 3 4 5; do
> +  NS_CHECK_EXEC([at_ns0], [$PYTHON3 $srcdir/sendpkt.py p0 \
> +    36 b1 ee 7c 01 02 36 b1 ee 7c 01 03 08 00 45 00 00 26 00 01 20 00 40 11 \
> +    44 c2 0a 01 01 01 0a 01 01 02 0b c4 08 84 00 26 e9 64 01 02 03 04 05 06 \
> +    07 08 09 0a > /dev/null])

Ideally we would use compose-packet for this, but it can't generate such
packets today, so it's fine to just plain-code the packet.  But, please,
add a description on what this packet is.  It's very inconvenient to
copy-paste this into some packet decoder.

> +done
> +
> +OVS_WAIT_UNTIL([test $(grep -c -E \
> +  "0x0000:  *36b1 *ee7c *0102 *36b1 *ee7c *0103 *0800 *4500" p1.out) -eq 5])
> +OVS_WAIT_UNTIL([test $(grep -c -E \
> +  "0x0010:  *0026 *0001 *2000 *4011 *43c2 *0b01 *0101 *0a01" p1.out) -eq 5])
> +OVS_WAIT_UNTIL([test $(grep -c -E \
> +  "0x0020:  *0102 *0bc4 *0884 *0026 *e864 *0102 *0304 *0506" p1.out) -eq 5])

Maybe it's better to use pcap output on tcpdump and compare with ovs-pcap?
Maybe also split the hex in parts like we do in tunnel tests here:
  
https://github.com/openvswitch/ovs/blob/bd8e9f48f180800292c10e12f26824833f18506a/tests/tunnel-push-pop.at#L834

Same for the IPv6 check below.

> +
> +dnl Repeat similar test with IPv6.
> +for i in 1 2 3 4 5; do
> +  NS_CHECK_EXEC([at_ns0], [$PYTHON3 $srcdir/sendpkt.py p0 \
> +    36 b1 ee 7c 01 02 36 b1 ee 7c 01 03 86 dd 60 00 00 00 00 18 2c 40 fc 00 \
> +    00 00 00 00 00 00 00 00 00 00 00 00 00 01 fc 00 00 00 00 00 00 00 00 00 \
> +    00 00 00 00 00 02 11 00 00 01 23 16 ab 36 0b c4 08 84 00 26 07 65 01 02 \
> +    03 04 05 06 07 08 > /dev/null])
> +done
> +
> +OVS_WAIT_UNTIL([test $(grep -c -E \
> +  "0x0000:  *36b1 *ee7c *0102 *36b1 *ee7c *0103 *86dd *6000" p1.out) -eq 5])
> +OVS_WAIT_UNTIL([test $(grep -c -E \
> +  "0x0010:  *0000 *0018 *2c40 *fc00 *0000 *0000 *0000 *0000" p1.out) -eq 5])
> +OVS_WAIT_UNTIL([test $(grep -c -E \
> +  "0x0020:  *0000 *0000 *0100 *fc00 *0000 *0000 *0000 *0000" p1.out) -eq 5])
> +OVS_WAIT_UNTIL([test $(grep -c -E \
> +  "0x0030:  *0000 *0000 *0002 *1100 *0001 *2316 *ab36 *0bc4" p1.out) -eq 5])
> +OVS_WAIT_UNTIL([test $(grep -c -E \
> +  "0x0040:  *0884 *0026 *0666 *0102 *0304 *0506 *0708" p1.out) -eq 5])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to