On 4 Jun 2024, at 13:52, Ilya Maximets wrote:

> On 6/4/24 13:42, Eelco Chaudron wrote:
>>
>>
>> On 1 Jun 2024, at 0:08, Ilya Maximets wrote:
>>
>>> On 5/7/24 15:52, Eelco Chaudron wrote:
>>>> While offloading header modifications to TC, OVS is using {TCA_PEDIT} +
>>>> {TCA_CSUM} combination as that it the only way to represent header
>>>> rewrite.  However, {TCA_CSUM} is unable to calculate L4 checksums for
>>>> IP fragments.
>>>>
>>>> Since TC already applies fragmentation bit masking, this patch simply
>>>> needs to prevent these packets from being processed through TC.
>>>>
>>>> Signed-off-by: Eelco Chaudron <[email protected]>
>>>> ---
>>>> v2: - Fixed and added some comments.
>>>>     - Use ovs-pcap to compare packets.
>>>>
>>>> NOTE: This patch needs an AVX512 fix before it can be applied.
>>>>       Intel is working on this.
>>>> ---
>>>>  lib/netdev-offload-tc.c | 32 ++++++++++++++
>>>>  lib/tc.c                |  5 ++-
>>>>  tests/system-traffic.at | 93 +++++++++++++++++++++++++++++++++++++++++
>>>>  3 files changed, 129 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
>>>> index 921d52317..bdd307933 100644
>>>> --- a/lib/netdev-offload-tc.c
>>>> +++ b/lib/netdev-offload-tc.c
>>>> @@ -1488,6 +1488,30 @@ parse_put_flow_ct_action(struct tc_flower *flower,
>>>>          return 0;
>>>>  }
>>>>
>>>> +static bool
>>>> +will_tc_add_l4_checksum(struct tc_flower *flower, int type)
>>>> +{
>>>> +    /* This function returns true if the tc layer will add a l4 checksum 
>>>> action
>>>> +     * for this set action.  Refer to the csum_update_flag() function for
>>>> +     * detailed logic.  Note that even the kernel only supports updating 
>>>> TCP,
>>>> +     * UDP and ICMPv6. */
>>>
>>> This comment should be outside of the function, I think.  It's strange
>>> to have it here.  I see csum_update_flag() has a comment inside, but
>>> that's strange as well.  That function has other style issues as well,
>>> there is no need to copy them.
>>
>> ACK, will clean this up in the next rev.
>>
>>>> +    switch (type) {
>>>> +    case OVS_KEY_ATTR_IPV4:
>>>> +    case OVS_KEY_ATTR_IPV6:
>>>> +    case OVS_KEY_ATTR_TCP:
>>>> +    case OVS_KEY_ATTR_UDP:
>>>> +        switch (flower->key.ip_proto) {
>>>> +        case IPPROTO_TCP:
>>>> +        case IPPROTO_UDP:
>>>> +        case IPPROTO_ICMPV6:
>>>> +        case IPPROTO_UDPLITE:
>>>> +            return true;
>>>> +        }
>>>> +        break;
>>>> +    }
>>>> +    return false;
>>>> +}
>>>> +
>>>>  static int
>>>>  parse_put_flow_set_masked_action(struct tc_flower *flower,
>>>>                                   struct tc_action *action,
>>>> @@ -1520,6 +1544,14 @@ parse_put_flow_set_masked_action(struct tc_flower 
>>>> *flower,
>>>>          return EOPNOTSUPP;
>>>>      }
>>>>
>>>> +    if (flower->key.flags & TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT
>>>
>>> We're using this field to make an offloading decision.  We must
>>> also set in the mask.  If for some reason we're not matching on
>>> the fragment bits, we may first receive a non-fragmented packet
>>> and offload it, then fragmented traffic may match and fail the
>>> checksumming.  So, we need to enable the bit in the mask.
>>
>> The dp always matches on the fragment bit for IPv4 packets, I did some tests 
>> with this.
>> So if we sent a non-fragment packet first the dp rule will match on fragment 
>> = 0. Or
>> did I miss something?
>
> It is true today, but nothing ensures that on the netdev-offload-tc level.
> Moreover, the netdev-offload-tc is written in a way that it expects the
> frag bits to potentially not be in the mask:
>  
> https://github.com/openvswitch/ovs/blob/1d681ffe3b208a0db4945b6389142ab18404a4d1/lib/netdev-offload-tc.c#L2432-L2448
> So, it is going to be internally inconsistent if we do not set the bit
> explicitly.
>
> And if someday we'll stop adding the frag bit, we'll never know if we
> forget to set it in netdev-offload-tc.  At the very least we'll need an
> assertion that it is set.  But having an assertion will still be internally
> inconsistent with the code linked above.  So, it may be better to just fix
> it instead anyway.

Ok we could just move the ‘mask->nw_frag = 0’ out of the if branch ;) I’ll take 
another look before sending a v3.

>>
>>>> +        && will_tc_add_l4_checksum(flower, type)) {
>>>> +        VLOG_DBG_RL(&rl, "set action type %d not supported on fragments "
>>>> +                    "due to checksum limitation", type);
>>>> +        ofpbuf_uninit(&set_buf);
>>>> +        return EOPNOTSUPP;
>>>> +    }
>>>> +
>>>>      for (i = 0; i < ARRAY_SIZE(set_flower_map[type]); i++) {
>>>>          struct netlink_field *f = &set_flower_map[type][i];
>>>>
>>>> diff --git a/lib/tc.c b/lib/tc.c
>>>> index e9bcae4e4..20472d13c 100644
>>>> --- a/lib/tc.c
>>>> +++ b/lib/tc.c
>>>> @@ -2958,7 +2958,10 @@ csum_update_flag(struct tc_flower *flower,
>>>>       * eth(dst=<mac>),eth_type(0x0800) actions=set(ipv4(src=<new_ip>))
>>>>       * we need to force a more specific flow as this can, for example,
>>>>       * need a recalculation of icmp checksum if the packet that passes
>>>> -     * is ICMPv6 and tcp checksum if its tcp. */
>>>> +     * is ICMPv6 and tcp checksum if its tcp.
>>>> +     *
>>>> +     * Ensure that you keep the pre-check function in netdev-offload-tc.c,
>>>> +     * will_tc_add_l4_checksum(), in sync if you make any changes here. */
>>>>
>>>>      switch (htype) {
>>>>      case TCA_PEDIT_KEY_EX_HDR_TYPE_IP4:
>>>> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
>>>> index bd7647cbe..8f392abd5 100644
>>>> --- a/tests/system-traffic.at
>>>> +++ b/tests/system-traffic.at
>>>> @@ -8977,3 +8977,96 @@ OVS_WAIT_UNTIL([cat p2.pcap | grep -E "0x0050: 
>>>> *0000 *0000 *5002 *2000 *b85e *00
>>>>
>>>>  OVS_TRAFFIC_VSWITCHD_STOP
>>>>  AT_CLEANUP
>>>> +
>>>> +AT_SETUP([datapath - mod_nw_src/set_field on IP fragments])
>>>> +AT_SKIP_IF([test $HAVE_TCPDUMP = no])
>>>> +OVS_TRAFFIC_VSWITCHD_START()
>>>> +
>>>> +ADD_NAMESPACES(at_ns0, at_ns1)
>>>> +
>>>> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", 36:b1:ee:7c:01:03)
>>>> +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24", 36:b1:ee:7c:01:02)
>>>> +
>>>> +AT_DATA([flows.txt], [dnl
>>>> +  in_port=ovs-p0,ip,nw_src=10.1.1.1 actions=mod_nw_src=11.1.1.1,ovs-p1
>>>> +  in_port=ovs-p0,ipv6,ipv6_src=fc00::1 
>>>> actions=set_field:fc00::100->ipv6_src,ovs-p1
>>>> +])
>>>> +
>>>> +AT_CHECK([ovs-ofctl del-flows br0])
>>>> +AT_CHECK([ovs-ofctl -Oopenflow13 add-flows br0 flows.txt])
>>>> +
>>>> +NETNS_DAEMONIZE([at_ns1],
>>>> +                [tcpdump -l -nn -xx -U -i p1 -w p1.pcap 2> tcpdump.err],
>>>> +                [tcpdump.pid])
>>>> +OVS_WAIT_UNTIL([grep "listening" tcpdump.err])
>>>> +
>>>> +dnl IPv4 Packet content:
>>>> +dnl   Ethernet II, Src: 36:b1:ee:7c:01:03, Dst: 36:b1:ee:7c:01:02
>>>> +dnl       Type: IPv4 (0x0800)
>>>> +dnl   Internet Protocol Version 4, Src: 10.1.1.1, Dst: 10.1.1.2
>>>> +dnl       0100 .... = Version: 4
>>>> +dnl       .... 0101 = Header Length: 20 bytes (5)
>>>> +dnl       Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
>>>> +dnl       Total Length: 38
>>>> +dnl       Identification: 0x0001 (1)
>>>> +dnl       001. .... = Flags: 0x1, More fragments
>>>> +dnl           0... .... = Reserved bit: Not set
>>>> +dnl           .0.. .... = Don't fragment: Not set
>>>> +dnl           ..1. .... = More fragments: Set
>>>> +dnl       ...0 0000 0000 0000 = Fragment Offset: 0
>>>> +dnl       Time to Live: 64
>>>> +dnl       Protocol: UDP (17)
>>>> +dnl       Header Checksum: 0x44c2
>>>> +dnl   Data (18 bytes)
>>>> +eth="36 b1 ee 7c 01 02 36 b1 ee 7c 01 03 08 00"
>>>> +ip="45 00 00 26 00 01 20 00 40 11 44 c2 0a 01 01 01 0a 01 01 02"
>>>> +data="0b c4 08 84 00 26 e9 64 01 02 03 04 05 06 07 08 09 0a"
>>>> +packet="${eth} ${ip} ${data}"
>>>
>>> Since you're backporting the compose-packet functionality now, it's better
>>> if we use it here instead.  We may want to get my sendpkt patch first:
>>> https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/
>>>
>>> And then follow a pattern from the other patches from the same set:
>>> https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/
>>>
>>> What do you think?
>>
>> Yes this will be a good next step. I will review your patch series soon.
>>>
>>> Best regards, Ilya Maximets.
>>

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to