Hi Aaron,
Thank you for reviewing the changes. I will format it better in patch v3 which 
I am planning to post today. Let me briefly explain you the problem that we 
faced for which this patch is proposed. When we use balance-tcp in bond for 
dpdk ports. Say, a 4500 length packet is fragmented(1500 each) at source, 
originates from VM towards ovs , as the first fragment contains the L4 header 
and rest 2 fragments do not have the L4 header, the hash calculation is 
different for first fragment than the rest. Thus, these fragments of the same 
packet choose different dpdk port(as hash calculation is different) to egress 
out after ovs processing, which could result in out-of-sequence fragments 
received in the receiver. This patch fixes this issue calculates same hash for 
all fragments of the same packet(be it first or subsequent fragments). The 
patch ignores L4 header for hash calculation for all fragments of a fragmented 
packet.
Regards,
Somnath

-----Original Message-----
From: Aaron Conole <acon...@redhat.com> 
Sent: 13 June 2025 23:56
To: Somnath Chatterjee via dev <ovs-dev@openvswitch.org>
Cc: Somnath Chatterjee B <somnath.b.chatter...@ericsson.com>
Subject: Re: [ovs-dev] [PATCH v2] flow:Ovs shouldn't consider L4 header for 
frag pkt.

[You don't often get email from acon...@redhat.com. Learn why this is important 
at https://aka.ms/LearnAboutSenderIdentification ]

Hi Somnath,

Somnath Chatterjee via dev <ovs-dev@openvswitch.org> writes:

>   When IP layer fragments packets, L4 header
>   is only present in the first packet. The next fragmented
>   packets donot contain L4 headers. So hash calculation based
>   on 5tuple should ignore the L4 headers for all fragmented
>   packets.
>
> Signed-off-by: Somnath Chatterjee <somnath.b.chatter...@ericsson.com>
> ---

The formatting of your subject could be improved.  I think it would work
better as::

  flow: Ignore later fragments when calculating hashes.

That said, comments below

>  lib/flow.c | 31 +++++++++++++++++++------------
>  1 file changed, 19 insertions(+), 12 deletions(-)
>
> diff --git a/lib/flow.c b/lib/flow.c
> index c2e1e4ad6..299ed6d0c 100644
> --- a/lib/flow.c
> +++ b/lib/flow.c
> @@ -1075,10 +1075,12 @@ miniflow_extract(struct dp_packet *packet, struct 
> miniflow *dst)
>                      miniflow_push_be16(mf, tp_dst, tcp->tcp_dst);
>                      miniflow_push_be16(mf, ct_tp_src, ct_tp_src);
>                      miniflow_push_be16(mf, ct_tp_dst, ct_tp_dst);
> -                    if (dl_type == htons(ETH_TYPE_IP)) {
> -                        dp_packet_update_rss_hash_ipv4_tcp_udp(packet);
> -                    } else if (dl_type == htons(ETH_TYPE_IPV6)) {
> -                        dp_packet_update_rss_hash_ipv6_tcp_udp(packet);
> +                    if (OVS_LIKELY(!nw_frag)) {
> +                        if (dl_type == htons(ETH_TYPE_IP)) {
> +                            dp_packet_update_rss_hash_ipv4_tcp_udp(packet);
> +                        } else if (dl_type == htons(ETH_TYPE_IPV6)) {
> +                            dp_packet_update_rss_hash_ipv6_tcp_udp(packet);
> +                        }

Am I wrong that this check will not even work for the first fragment?
nw_frag should have FLOW_NW_FRAG_ANY set and that means we will bail
this test even for first frags.

That said, this block is guarded by:

    if (OVS_LIKELY(!(nw_frag & FLOW_NW_FRAG_LATER))) {
        if (OVS_LIKELY(nw_proto == IPPROTO_TCP)) {
            if (OVS_LIKELY(size >= TCP_HEADER_LEN)) {
               ....

So I'm confused what this change is doing other than not updating rss
hash for any fragments.  Is that your intent?

>                      }
>                      dp_packet_ol_l4_csum_check_partial(packet);
>                      if (dp_packet_l4_checksum_good(packet)
> @@ -1095,10 +1097,12 @@ miniflow_extract(struct dp_packet *packet, struct 
> miniflow *dst)
>                  miniflow_push_be16(mf, tp_dst, udp->udp_dst);
>                  miniflow_push_be16(mf, ct_tp_src, ct_tp_src);
>                  miniflow_push_be16(mf, ct_tp_dst, ct_tp_dst);
> -                if (dl_type == htons(ETH_TYPE_IP)) {
> -                    dp_packet_update_rss_hash_ipv4_tcp_udp(packet);
> -                } else if (dl_type == htons(ETH_TYPE_IPV6)) {
> -                    dp_packet_update_rss_hash_ipv6_tcp_udp(packet);
> +                if (OVS_LIKELY(!nw_frag)) {
> +                    if (dl_type == htons(ETH_TYPE_IP)) {
> +                        dp_packet_update_rss_hash_ipv4_tcp_udp(packet);
> +                    } else if (dl_type == htons(ETH_TYPE_IPV6)) {
> +                        dp_packet_update_rss_hash_ipv6_tcp_udp(packet);
> +                    }

Same comment as above.

>                  }
>                  dp_packet_ol_l4_csum_check_partial(packet);
>                  if (dp_packet_l4_checksum_good(packet)
> @@ -2426,10 +2430,13 @@ flow_hash_5tuple(const struct flow *flow, uint32_t 
> basis)
>              goto out;
>          }
>
> -        /* Add both ports at once. */
> -        hash = hash_add(hash,
> -                        ((const uint32_t *)flow)[offsetof(struct flow, 
> tp_src)
> -                                                 / sizeof(uint32_t)]);
> +        /* Add both L4 ports, only if the packet is not fragmented. */
> +        if (OVS_LIKELY(!(flow->nw_frag & FLOW_NW_FRAG_MASK))) {
> +            hash = hash_add(hash,
> +                            ((const uint32_t *)flow)
> +                            [offsetof(struct flow, tp_src) /
> +                            sizeof(uint32_t)]);
> +        }

I'm not sure about this.  I think if there is fragmented data, tp_src
should be 0, and therefore the hash shouldn't change anyway.

Do you encounter something different?

>      }
>  out:
>      return hash_finish(hash, 42); /* Arbitrary number. */

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to