Hi Aaron, Thank you for reviewing the changes. I will format it better in patch v3 which I am planning to post today. Let me briefly explain you the problem that we faced for which this patch is proposed. When we use balance-tcp in bond for dpdk ports. Say, a 4500 length packet is fragmented(1500 each) at source, originates from VM towards ovs , as the first fragment contains the L4 header and rest 2 fragments do not have the L4 header, the hash calculation is different for first fragment than the rest. Thus, these fragments of the same packet choose different dpdk port(as hash calculation is different) to egress out after ovs processing, which could result in out-of-sequence fragments received in the receiver. This patch fixes this issue calculates same hash for all fragments of the same packet(be it first or subsequent fragments). The patch ignores L4 header for hash calculation for all fragments of a fragmented packet. Regards, Somnath
-----Original Message----- From: Aaron Conole <acon...@redhat.com> Sent: 13 June 2025 23:56 To: Somnath Chatterjee via dev <ovs-dev@openvswitch.org> Cc: Somnath Chatterjee B <somnath.b.chatter...@ericsson.com> Subject: Re: [ovs-dev] [PATCH v2] flow:Ovs shouldn't consider L4 header for frag pkt. [You don't often get email from acon...@redhat.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] Hi Somnath, Somnath Chatterjee via dev <ovs-dev@openvswitch.org> writes: > When IP layer fragments packets, L4 header > is only present in the first packet. The next fragmented > packets donot contain L4 headers. So hash calculation based > on 5tuple should ignore the L4 headers for all fragmented > packets. > > Signed-off-by: Somnath Chatterjee <somnath.b.chatter...@ericsson.com> > --- The formatting of your subject could be improved. I think it would work better as:: flow: Ignore later fragments when calculating hashes. That said, comments below > lib/flow.c | 31 +++++++++++++++++++------------ > 1 file changed, 19 insertions(+), 12 deletions(-) > > diff --git a/lib/flow.c b/lib/flow.c > index c2e1e4ad6..299ed6d0c 100644 > --- a/lib/flow.c > +++ b/lib/flow.c > @@ -1075,10 +1075,12 @@ miniflow_extract(struct dp_packet *packet, struct > miniflow *dst) > miniflow_push_be16(mf, tp_dst, tcp->tcp_dst); > miniflow_push_be16(mf, ct_tp_src, ct_tp_src); > miniflow_push_be16(mf, ct_tp_dst, ct_tp_dst); > - if (dl_type == htons(ETH_TYPE_IP)) { > - dp_packet_update_rss_hash_ipv4_tcp_udp(packet); > - } else if (dl_type == htons(ETH_TYPE_IPV6)) { > - dp_packet_update_rss_hash_ipv6_tcp_udp(packet); > + if (OVS_LIKELY(!nw_frag)) { > + if (dl_type == htons(ETH_TYPE_IP)) { > + dp_packet_update_rss_hash_ipv4_tcp_udp(packet); > + } else if (dl_type == htons(ETH_TYPE_IPV6)) { > + dp_packet_update_rss_hash_ipv6_tcp_udp(packet); > + } Am I wrong that this check will not even work for the first fragment? nw_frag should have FLOW_NW_FRAG_ANY set and that means we will bail this test even for first frags. That said, this block is guarded by: if (OVS_LIKELY(!(nw_frag & FLOW_NW_FRAG_LATER))) { if (OVS_LIKELY(nw_proto == IPPROTO_TCP)) { if (OVS_LIKELY(size >= TCP_HEADER_LEN)) { .... So I'm confused what this change is doing other than not updating rss hash for any fragments. Is that your intent? > } > dp_packet_ol_l4_csum_check_partial(packet); > if (dp_packet_l4_checksum_good(packet) > @@ -1095,10 +1097,12 @@ miniflow_extract(struct dp_packet *packet, struct > miniflow *dst) > miniflow_push_be16(mf, tp_dst, udp->udp_dst); > miniflow_push_be16(mf, ct_tp_src, ct_tp_src); > miniflow_push_be16(mf, ct_tp_dst, ct_tp_dst); > - if (dl_type == htons(ETH_TYPE_IP)) { > - dp_packet_update_rss_hash_ipv4_tcp_udp(packet); > - } else if (dl_type == htons(ETH_TYPE_IPV6)) { > - dp_packet_update_rss_hash_ipv6_tcp_udp(packet); > + if (OVS_LIKELY(!nw_frag)) { > + if (dl_type == htons(ETH_TYPE_IP)) { > + dp_packet_update_rss_hash_ipv4_tcp_udp(packet); > + } else if (dl_type == htons(ETH_TYPE_IPV6)) { > + dp_packet_update_rss_hash_ipv6_tcp_udp(packet); > + } Same comment as above. > } > dp_packet_ol_l4_csum_check_partial(packet); > if (dp_packet_l4_checksum_good(packet) > @@ -2426,10 +2430,13 @@ flow_hash_5tuple(const struct flow *flow, uint32_t > basis) > goto out; > } > > - /* Add both ports at once. */ > - hash = hash_add(hash, > - ((const uint32_t *)flow)[offsetof(struct flow, > tp_src) > - / sizeof(uint32_t)]); > + /* Add both L4 ports, only if the packet is not fragmented. */ > + if (OVS_LIKELY(!(flow->nw_frag & FLOW_NW_FRAG_MASK))) { > + hash = hash_add(hash, > + ((const uint32_t *)flow) > + [offsetof(struct flow, tp_src) / > + sizeof(uint32_t)]); > + } I'm not sure about this. I think if there is fragmented data, tp_src should be 0, and therefore the hash shouldn't change anyway. Do you encounter something different? > } > out: > return hash_finish(hash, 42); /* Arbitrary number. */ _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev