On 6 Dec 2024, at 23:25, Ilya Maximets wrote:

> On 8/15/24 12:02, Eelco Chaudron wrote:
>> While offloading header modifications to TC, OVS is using {TCA_PEDIT} +
>> {TCA_CSUM} combination as that it the only way to represent header
>> rewrite.  However, {TCA_CSUM} is unable to calculate L4 checksums for
>> IP fragments.
>>
>> Since TC already applies fragmentation bit masking, this patch simply
>> needs to prevent these packets from being processed through TC.
>>
>> Reported-at: https://issues.redhat.com/browse/FDP-545
>> Signed-off-by: Eelco Chaudron <[email protected]>
>> ---
>> v3: - Fixed some comment style issue.
>>     - Add the nw_frag mask if not set.
>> v2: - Fixed and added some comments.
>>     - Use ovs-pcap to compare packets.
>> ---
>>  lib/netdev-offload-tc.c | 35 ++++++++++++++++
>>  lib/tc.c                |  5 ++-
>>  tests/system-traffic.at | 93 +++++++++++++++++++++++++++++++++++++++++
>>  3 files changed, 132 insertions(+), 1 deletion(-)
>
> Hi, Eelco.  Sorry for the long delay.  See some comments below.

Thanks for the review. I’ve addressed your comments in the v4 which I just sent 
out.

Cheers,

Eelco


> Best regards, Ilya Maximets.
>
>>
>> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
>> index 3be1c08d2..9395f784a 100644
>> --- a/lib/netdev-offload-tc.c
>> +++ b/lib/netdev-offload-tc.c
>> @@ -1490,6 +1490,31 @@ parse_put_flow_ct_action(struct tc_flower *flower,
>>          return 0;
>>  }
>>
>> +/* This function returns true if the tc layer will add a l4 checksum action
>> + * for this set action.  Refer to the csum_update_flag() function for
>> + * detailed logic.  Note that even the kernel only supports updating TCP,
>> + * UDP and ICMPv6.
>> + */
>> +static bool
>> +will_tc_add_l4_checksum(struct tc_flower *flower, int type)
>
> I'd change s/will_tc/tc_will/.  Just reads a little better, IMO.
>
> if "it is a fragment and tc will add l4 checksum"; then
> if "it is a fragment and will tc add l4 checksum"; then
>
>
>> +{
>> +    switch (type) {
>> +    case OVS_KEY_ATTR_IPV4:
>> +    case OVS_KEY_ATTR_IPV6:
>> +    case OVS_KEY_ATTR_TCP:
>> +    case OVS_KEY_ATTR_UDP:
>> +        switch (flower->key.ip_proto) {
>> +        case IPPROTO_TCP:
>> +        case IPPROTO_UDP:
>> +        case IPPROTO_ICMPV6:
>> +        case IPPROTO_UDPLITE:
>> +            return true;
>> +        }
>> +        break;
>> +    }
>> +    return false;
>> +}
>> +
>>  static int
>>  parse_put_flow_set_masked_action(struct tc_flower *flower,
>>                                   struct tc_action *action,
>> @@ -1522,6 +1547,14 @@ parse_put_flow_set_masked_action(struct tc_flower 
>> *flower,
>>          return EOPNOTSUPP;
>>      }
>>
>> +    if (flower->key.flags & TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT
>> +        && will_tc_add_l4_checksum(flower, type)) {
>> +        VLOG_DBG_RL(&rl, "set action type %d not supported on fragments "
>> +                    "due to checksum limitation", type);
>> +        ofpbuf_uninit(&set_buf);
>> +        return EOPNOTSUPP;
>> +    }
>> +
>>      for (i = 0; i < ARRAY_SIZE(set_flower_map[type]); i++) {
>>          struct netlink_field *f = &set_flower_map[type][i];
>>
>> @@ -2447,6 +2480,8 @@ netdev_tc_flow_put(struct netdev *netdev, struct match 
>> *match,
>>              }
>>
>>              mask->nw_frag = 0;
>> +        } else {
>> +            mask->nw_frag = FLOW_NW_FRAG_MASK;
>
> This doesn't do much.  We need the datapath to have a match on IS_FRAGMENT
> bit instead.  Updating mask will make offload fail, since it's not consumed.
>
> What we want is:
>
>  if the packet IS NOT fragmented AND tc_will_add_l4_checksum:
>      Add explicit match on packet not being fragmented
>
>  if the packet IS fragmented AND tc_will_add_l4_checksum:
>      Do not offload
>
> What we want to avoid is if we add a flow that changes L34 fields and doesn't
> match on fragmentation bit, fragments may hit it and have incorrect checkums.
> So, every flower rule that modifies L34 should have IS_FRAGMENT set in the 
> mask.
>
>>          }
>>
>>          if (key->nw_proto == IPPROTO_TCP) {
>> diff --git a/lib/tc.c b/lib/tc.c
>> index e55ba3b1b..22960a13e 100644
>> --- a/lib/tc.c
>> +++ b/lib/tc.c
>> @@ -2958,7 +2958,10 @@ csum_update_flag(struct tc_flower *flower,
>>       * eth(dst=<mac>),eth_type(0x0800) actions=set(ipv4(src=<new_ip>))
>>       * we need to force a more specific flow as this can, for example,
>>       * need a recalculation of icmp checksum if the packet that passes
>> -     * is ICMPv6 and tcp checksum if its tcp. */
>> +     * is ICMPv6 and tcp checksum if its tcp.
>> +     *
>> +     * Ensure that you keep the pre-check function in netdev-offload-tc.c,
>> +     * will_tc_add_l4_checksum(), in sync if you make any changes here. */
>
> Comments should not talk to the reader, generally, they should describe
> things instead.  So, we shouldn't use words like 'you'.
>
> E.g. "This section of the code must be kept in sync with ..."
>
>>
>>      switch (htype) {
>>      case TCA_PEDIT_KEY_EX_HDR_TYPE_IP4:
>> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
>> index 202ff0492..add0ab3a2 100644
>> --- a/tests/system-traffic.at
>> +++ b/tests/system-traffic.at
>> @@ -2436,6 +2436,99 @@ 
>> recirc_id(<recirc>),in_port(2),eth_type(0x86dd),ipv6(proto=58,frag=no),icmpv6(ty
>>  OVS_TRAFFIC_VSWITCHD_STOP
>>  AT_CLEANUP
>>
>> +AT_SETUP([datapath - mod_nw_src/set_field on IP fragments])
>> +AT_SKIP_IF([test $HAVE_TCPDUMP = no])
>> +OVS_TRAFFIC_VSWITCHD_START()
>> +
>> +ADD_NAMESPACES(at_ns0, at_ns1)
>> +
>> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", 36:b1:ee:7c:01:03)
>> +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24", 36:b1:ee:7c:01:02)
>> +
>> +AT_DATA([flows.txt], [dnl
>> +  in_port=ovs-p0,ip,nw_src=10.1.1.1 actions=mod_nw_src=11.1.1.1,ovs-p1
>> +  in_port=ovs-p0,ipv6,ipv6_src=fc00::1 
>> actions=set_field:fc00::100->ipv6_src,ovs-p1
>> +])
>> +
>> +AT_CHECK([ovs-ofctl del-flows br0])
>> +AT_CHECK([ovs-ofctl -Oopenflow13 add-flows br0 flows.txt])
>> +
>> +NETNS_DAEMONIZE([at_ns1],
>> +                [tcpdump -l -nn -xx -U -i p1 -w p1.pcap 2> tcpdump.err],
>> +                [tcpdump.pid])
>> +OVS_WAIT_UNTIL([grep "listening" tcpdump.err])
>> +
>> +dnl IPv4 Packet content:
>> +dnl   Ethernet II, Src: 36:b1:ee:7c:01:03, Dst: 36:b1:ee:7c:01:02
>> +dnl       Type: IPv4 (0x0800)
>> +dnl   Internet Protocol Version 4, Src: 10.1.1.1, Dst: 10.1.1.2
>> +dnl       0100 .... = Version: 4
>> +dnl       .... 0101 = Header Length: 20 bytes (5)
>> +dnl       Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
>> +dnl       Total Length: 38
>> +dnl       Identification: 0x0001 (1)
>> +dnl       001. .... = Flags: 0x1, More fragments
>> +dnl           0... .... = Reserved bit: Not set
>> +dnl           .0.. .... = Don't fragment: Not set
>> +dnl           ..1. .... = More fragments: Set
>> +dnl       ...0 0000 0000 0000 = Fragment Offset: 0
>> +dnl       Time to Live: 64
>> +dnl       Protocol: UDP (17)
>> +dnl       Header Checksum: 0x44c2
>> +dnl   Data (18 bytes)
>> +eth="36 b1 ee 7c 01 02 36 b1 ee 7c 01 03 08 00"
>> +ip="45 00 00 26 00 01 20 00 40 11 44 c2 0a 01 01 01 0a 01 01 02"
>> +data="0b c4 08 84 00 26 e9 64 01 02 03 04 05 06 07 08 09 0a"
>> +packet="${eth} ${ip} ${data}"
>> +
>> +dnl We send each packet multiple times, ones for learning which will flow
>> +dnl through the used datapath for learning, and the others will go through 
>> the
>> +dnl actual datapath.
>
> I'm not sure I understand what "ones for learning which will flow through
> the used datapath for learning" means.  May need some re-wording. :)
>
>> +for i in 1 2 3 4 5; do
>> +  NS_CHECK_EXEC([at_ns0],
>> +                [$PYTHON3 $srcdir/sendpkt.py p0 ${packet} > /dev/null])
>> +done
>> +
>> +dnl Update source address, and checksums in original packet for comparison.
>
> A comma may not be needed.
>
>> +packet=$(echo "$packet" | sed -e 's/ //g' \
>> +         -e 's/0a010101/0b010101/g' -e 's/44c2/43c2/g' -e 's/e964/e864/g')
>> +OVS_WAIT_UNTIL([test $(ovs-pcap p1.pcap | grep -c "${packet}") -eq 5])
>> +
>> +dnl Repeat similar test with IPv6.
>> +dnl Packet content:
>> +dnl   Ethernet II, Src: 36:b1:ee:7c:01:03, Dst: 36:b1:ee:7c:01:02
>> +dnl       Type: IPv6 (0x86dd)
>> +dnl   Internet Protocol Version 6, Src: fc00::1, Dst: fc00::2
>> +dnl       Payload Length: 24
>> +dnl       Next Header: Fragment Header for IPv6 (44)
>> +dnl       Hop Limit: 64
>> +dnl       Fragment Header for IPv6
>> +dnl           Next header: UDP (17)
>> +dnl           Reserved octet: 0x00
>> +dnl           0000 0000 0000 0... = Offset: 0 (0 bytes)
>> +dnl           .... .... .... .00. = Reserved bits: 0
>> +dnl           .... .... .... ...1 = More Fragments: Yes
>> +dnl           Identification: 0x2316ab36
>> +dnl   Data (16 bytes)
>> +eth="36 b1 ee 7c 01 02 36 b1 ee 7c 01 03 86 dd"
>> +ip="60 00 00 00 00 18 2c 40 fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 
>> \
>> +    fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 11 00 00 01 23 16 ab 36"
>> +data="0b c4 08 84 00 26 07 65 01 02 03 04 05 06 07 08"
>> +packet="${eth} ${ip} ${data}"
>> +
>> +for i in 1 2 3 4 5; do
>> +  NS_CHECK_EXEC([at_ns0],
>> +                [$PYTHON3 $srcdir/sendpkt.py p0 ${packet} > /dev/null])
>> +done
>> +
>> +dnl Update checksum and source address in original packet for comparison.
>> +packet=$(echo "$packet" | sed -e 's/ //g' -e 's/0765/0666/g' -e \
>> +       
>> 's/fc000000000000000000000000000001/fc000000000000000000000000000100/g')
>> +OVS_WAIT_UNTIL([test $(ovs-pcap p1.pcap | grep -c "${packet}") -eq 5])
>> +
>> +OVS_TRAFFIC_VSWITCHD_STOP
>> +AT_CLEANUP
>> +
>>  AT_BANNER([MPLS])
>>
>>  AT_SETUP([mpls - encap header dp-support])

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to