On 6 Dec 2024, at 23:25, Ilya Maximets wrote:
> On 8/15/24 12:02, Eelco Chaudron wrote:
>> While offloading header modifications to TC, OVS is using {TCA_PEDIT} +
>> {TCA_CSUM} combination as that it the only way to represent header
>> rewrite. However, {TCA_CSUM} is unable to calculate L4 checksums for
>> IP fragments.
>>
>> Since TC already applies fragmentation bit masking, this patch simply
>> needs to prevent these packets from being processed through TC.
>>
>> Reported-at: https://issues.redhat.com/browse/FDP-545
>> Signed-off-by: Eelco Chaudron <[email protected]>
>> ---
>> v3: - Fixed some comment style issue.
>> - Add the nw_frag mask if not set.
>> v2: - Fixed and added some comments.
>> - Use ovs-pcap to compare packets.
>> ---
>> lib/netdev-offload-tc.c | 35 ++++++++++++++++
>> lib/tc.c | 5 ++-
>> tests/system-traffic.at | 93 +++++++++++++++++++++++++++++++++++++++++
>> 3 files changed, 132 insertions(+), 1 deletion(-)
>
> Hi, Eelco. Sorry for the long delay. See some comments below.
Thanks for the review. I’ve addressed your comments in the v4 which I just sent
out.
Cheers,
Eelco
> Best regards, Ilya Maximets.
>
>>
>> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
>> index 3be1c08d2..9395f784a 100644
>> --- a/lib/netdev-offload-tc.c
>> +++ b/lib/netdev-offload-tc.c
>> @@ -1490,6 +1490,31 @@ parse_put_flow_ct_action(struct tc_flower *flower,
>> return 0;
>> }
>>
>> +/* This function returns true if the tc layer will add a l4 checksum action
>> + * for this set action. Refer to the csum_update_flag() function for
>> + * detailed logic. Note that even the kernel only supports updating TCP,
>> + * UDP and ICMPv6.
>> + */
>> +static bool
>> +will_tc_add_l4_checksum(struct tc_flower *flower, int type)
>
> I'd change s/will_tc/tc_will/. Just reads a little better, IMO.
>
> if "it is a fragment and tc will add l4 checksum"; then
> if "it is a fragment and will tc add l4 checksum"; then
>
>
>> +{
>> + switch (type) {
>> + case OVS_KEY_ATTR_IPV4:
>> + case OVS_KEY_ATTR_IPV6:
>> + case OVS_KEY_ATTR_TCP:
>> + case OVS_KEY_ATTR_UDP:
>> + switch (flower->key.ip_proto) {
>> + case IPPROTO_TCP:
>> + case IPPROTO_UDP:
>> + case IPPROTO_ICMPV6:
>> + case IPPROTO_UDPLITE:
>> + return true;
>> + }
>> + break;
>> + }
>> + return false;
>> +}
>> +
>> static int
>> parse_put_flow_set_masked_action(struct tc_flower *flower,
>> struct tc_action *action,
>> @@ -1522,6 +1547,14 @@ parse_put_flow_set_masked_action(struct tc_flower
>> *flower,
>> return EOPNOTSUPP;
>> }
>>
>> + if (flower->key.flags & TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT
>> + && will_tc_add_l4_checksum(flower, type)) {
>> + VLOG_DBG_RL(&rl, "set action type %d not supported on fragments "
>> + "due to checksum limitation", type);
>> + ofpbuf_uninit(&set_buf);
>> + return EOPNOTSUPP;
>> + }
>> +
>> for (i = 0; i < ARRAY_SIZE(set_flower_map[type]); i++) {
>> struct netlink_field *f = &set_flower_map[type][i];
>>
>> @@ -2447,6 +2480,8 @@ netdev_tc_flow_put(struct netdev *netdev, struct match
>> *match,
>> }
>>
>> mask->nw_frag = 0;
>> + } else {
>> + mask->nw_frag = FLOW_NW_FRAG_MASK;
>
> This doesn't do much. We need the datapath to have a match on IS_FRAGMENT
> bit instead. Updating mask will make offload fail, since it's not consumed.
>
> What we want is:
>
> if the packet IS NOT fragmented AND tc_will_add_l4_checksum:
> Add explicit match on packet not being fragmented
>
> if the packet IS fragmented AND tc_will_add_l4_checksum:
> Do not offload
>
> What we want to avoid is if we add a flow that changes L34 fields and doesn't
> match on fragmentation bit, fragments may hit it and have incorrect checkums.
> So, every flower rule that modifies L34 should have IS_FRAGMENT set in the
> mask.
>
>> }
>>
>> if (key->nw_proto == IPPROTO_TCP) {
>> diff --git a/lib/tc.c b/lib/tc.c
>> index e55ba3b1b..22960a13e 100644
>> --- a/lib/tc.c
>> +++ b/lib/tc.c
>> @@ -2958,7 +2958,10 @@ csum_update_flag(struct tc_flower *flower,
>> * eth(dst=<mac>),eth_type(0x0800) actions=set(ipv4(src=<new_ip>))
>> * we need to force a more specific flow as this can, for example,
>> * need a recalculation of icmp checksum if the packet that passes
>> - * is ICMPv6 and tcp checksum if its tcp. */
>> + * is ICMPv6 and tcp checksum if its tcp.
>> + *
>> + * Ensure that you keep the pre-check function in netdev-offload-tc.c,
>> + * will_tc_add_l4_checksum(), in sync if you make any changes here. */
>
> Comments should not talk to the reader, generally, they should describe
> things instead. So, we shouldn't use words like 'you'.
>
> E.g. "This section of the code must be kept in sync with ..."
>
>>
>> switch (htype) {
>> case TCA_PEDIT_KEY_EX_HDR_TYPE_IP4:
>> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
>> index 202ff0492..add0ab3a2 100644
>> --- a/tests/system-traffic.at
>> +++ b/tests/system-traffic.at
>> @@ -2436,6 +2436,99 @@
>> recirc_id(<recirc>),in_port(2),eth_type(0x86dd),ipv6(proto=58,frag=no),icmpv6(ty
>> OVS_TRAFFIC_VSWITCHD_STOP
>> AT_CLEANUP
>>
>> +AT_SETUP([datapath - mod_nw_src/set_field on IP fragments])
>> +AT_SKIP_IF([test $HAVE_TCPDUMP = no])
>> +OVS_TRAFFIC_VSWITCHD_START()
>> +
>> +ADD_NAMESPACES(at_ns0, at_ns1)
>> +
>> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", 36:b1:ee:7c:01:03)
>> +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24", 36:b1:ee:7c:01:02)
>> +
>> +AT_DATA([flows.txt], [dnl
>> + in_port=ovs-p0,ip,nw_src=10.1.1.1 actions=mod_nw_src=11.1.1.1,ovs-p1
>> + in_port=ovs-p0,ipv6,ipv6_src=fc00::1
>> actions=set_field:fc00::100->ipv6_src,ovs-p1
>> +])
>> +
>> +AT_CHECK([ovs-ofctl del-flows br0])
>> +AT_CHECK([ovs-ofctl -Oopenflow13 add-flows br0 flows.txt])
>> +
>> +NETNS_DAEMONIZE([at_ns1],
>> + [tcpdump -l -nn -xx -U -i p1 -w p1.pcap 2> tcpdump.err],
>> + [tcpdump.pid])
>> +OVS_WAIT_UNTIL([grep "listening" tcpdump.err])
>> +
>> +dnl IPv4 Packet content:
>> +dnl Ethernet II, Src: 36:b1:ee:7c:01:03, Dst: 36:b1:ee:7c:01:02
>> +dnl Type: IPv4 (0x0800)
>> +dnl Internet Protocol Version 4, Src: 10.1.1.1, Dst: 10.1.1.2
>> +dnl 0100 .... = Version: 4
>> +dnl .... 0101 = Header Length: 20 bytes (5)
>> +dnl Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
>> +dnl Total Length: 38
>> +dnl Identification: 0x0001 (1)
>> +dnl 001. .... = Flags: 0x1, More fragments
>> +dnl 0... .... = Reserved bit: Not set
>> +dnl .0.. .... = Don't fragment: Not set
>> +dnl ..1. .... = More fragments: Set
>> +dnl ...0 0000 0000 0000 = Fragment Offset: 0
>> +dnl Time to Live: 64
>> +dnl Protocol: UDP (17)
>> +dnl Header Checksum: 0x44c2
>> +dnl Data (18 bytes)
>> +eth="36 b1 ee 7c 01 02 36 b1 ee 7c 01 03 08 00"
>> +ip="45 00 00 26 00 01 20 00 40 11 44 c2 0a 01 01 01 0a 01 01 02"
>> +data="0b c4 08 84 00 26 e9 64 01 02 03 04 05 06 07 08 09 0a"
>> +packet="${eth} ${ip} ${data}"
>> +
>> +dnl We send each packet multiple times, ones for learning which will flow
>> +dnl through the used datapath for learning, and the others will go through
>> the
>> +dnl actual datapath.
>
> I'm not sure I understand what "ones for learning which will flow through
> the used datapath for learning" means. May need some re-wording. :)
>
>> +for i in 1 2 3 4 5; do
>> + NS_CHECK_EXEC([at_ns0],
>> + [$PYTHON3 $srcdir/sendpkt.py p0 ${packet} > /dev/null])
>> +done
>> +
>> +dnl Update source address, and checksums in original packet for comparison.
>
> A comma may not be needed.
>
>> +packet=$(echo "$packet" | sed -e 's/ //g' \
>> + -e 's/0a010101/0b010101/g' -e 's/44c2/43c2/g' -e 's/e964/e864/g')
>> +OVS_WAIT_UNTIL([test $(ovs-pcap p1.pcap | grep -c "${packet}") -eq 5])
>> +
>> +dnl Repeat similar test with IPv6.
>> +dnl Packet content:
>> +dnl Ethernet II, Src: 36:b1:ee:7c:01:03, Dst: 36:b1:ee:7c:01:02
>> +dnl Type: IPv6 (0x86dd)
>> +dnl Internet Protocol Version 6, Src: fc00::1, Dst: fc00::2
>> +dnl Payload Length: 24
>> +dnl Next Header: Fragment Header for IPv6 (44)
>> +dnl Hop Limit: 64
>> +dnl Fragment Header for IPv6
>> +dnl Next header: UDP (17)
>> +dnl Reserved octet: 0x00
>> +dnl 0000 0000 0000 0... = Offset: 0 (0 bytes)
>> +dnl .... .... .... .00. = Reserved bits: 0
>> +dnl .... .... .... ...1 = More Fragments: Yes
>> +dnl Identification: 0x2316ab36
>> +dnl Data (16 bytes)
>> +eth="36 b1 ee 7c 01 02 36 b1 ee 7c 01 03 86 dd"
>> +ip="60 00 00 00 00 18 2c 40 fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01
>> \
>> + fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 11 00 00 01 23 16 ab 36"
>> +data="0b c4 08 84 00 26 07 65 01 02 03 04 05 06 07 08"
>> +packet="${eth} ${ip} ${data}"
>> +
>> +for i in 1 2 3 4 5; do
>> + NS_CHECK_EXEC([at_ns0],
>> + [$PYTHON3 $srcdir/sendpkt.py p0 ${packet} > /dev/null])
>> +done
>> +
>> +dnl Update checksum and source address in original packet for comparison.
>> +packet=$(echo "$packet" | sed -e 's/ //g' -e 's/0765/0666/g' -e \
>> +
>> 's/fc000000000000000000000000000001/fc000000000000000000000000000100/g')
>> +OVS_WAIT_UNTIL([test $(ovs-pcap p1.pcap | grep -c "${packet}") -eq 5])
>> +
>> +OVS_TRAFFIC_VSWITCHD_STOP
>> +AT_CLEANUP
>> +
>> AT_BANNER([MPLS])
>>
>> AT_SETUP([mpls - encap header dp-support])
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev