On 3/15/22 15:30, Hemanth Aramadaka wrote:
> Issue:
> 
> The src-port for UDP is based on RSS hash in the packet metadata.
> In case of packets coming from VM it will be 5-tuple, if available,
> otherwise just IP addresses.If the VM fragments a large IP packet
> and sends the fragments to ovs, only the first fragment will contain
> the L4 header. Therefore, the first fragment and subsequent fragments
> get different UDP src ports in the outgoing VXLAN header.This can
> lead to fragment re-ordering in the fabric as packet will take
> different paths.
> 
> Fix:
> 
> Intention of this is to avoid fragment packets taking different paths.
> For example, due to presence of firewalls, fragment packets will take
> different paths and will get dropped.To avoid this we ignore the L4
> header during hash calculation only in the case of fragmented packets.
> 
> P.S: There is already a review request raised for the same.
> https://mail.openvswitch.org/pipermail/ovs-dev/2021-January/379395.html.
> Re-iterating the same patch request on the latest master code.
> 
> Signed-off-by: Hemanth Aramadaka <[email protected]>

Hi, Hemanth.  Sorry for the late reply.

I had some questions for the previous version of that patch, but now
I think it's the best approach for the problem that we could have.

A few comments for the patch though:

The code changed a bit since the last version of the patch and new we
have a new function to calculate RSS hash for packets that doesn't
have one - dp_packet_update_rss_hash_ipv4_tcp_udp().  In order to
be consistent we should not call this function if the packet is
fragmented.  So you need to add extra checks like this:

  !(nw_frag & FLOW_NW_FRAG_MASK)

before calling dp_packet_update_rss_hash_ipv4_tcp_udp() inside of the
miniflow_extract().

Also, this area is very tricky.  We have a lot of different hash
functions which are used in different cases.  It would be great to
add a unit test.  You may send 2 fragments and check if they have
the same source port after encapsulation.  You may look at the
'tunnel_push_pop - packet_out debug_slow' test in tests/tunnel-push-pop.at
for example.

Some more comments inline.

Best regards, Ilya Maximets.

> ---
>  lib/flow.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/flow.c b/lib/flow.c
> index dd523c889..17bd47724 100644
> --- a/lib/flow.c
> +++ b/lib/flow.c
> @@ -2238,7 +2238,7 @@ miniflow_hash_5tuple(const struct miniflow *flow, 
> uint32_t basis)

In the comment for this function it's stated that it returns the
same value as flow_hash_5tuple().  So, you need to make similar
changes in flow_hash_5tuple() too.

>  
>      if (flow) {
>          ovs_be16 dl_type = MINIFLOW_GET_BE16(flow, dl_type);
> -        uint8_t nw_proto;
> +        uint8_t nw_proto,nw_frag;

Missing space between variables.

>  
>          if (dl_type == htons(ETH_TYPE_IPV6)) {
>              struct flowmap map = FLOWMAP_EMPTY_INITIALIZER;
> @@ -2260,6 +2260,14 @@ miniflow_hash_5tuple(const struct miniflow *flow, 
> uint32_t basis)
>  
>          nw_proto = MINIFLOW_GET_U8(flow, nw_proto);
>          hash = hash_add(hash, nw_proto);

I'd add an empty line here.

> +        /* Skip l4 header fields if IP packet is fragmented since
> +         * only first fragment will carry l4 header.
> +         */
> +        nw_frag = MINIFLOW_GET_U8(flow, nw_frag);
> +        if ((nw_frag)) {

Extra '()' are not needed.
Also, we, probably do not need the 'nw_frag' variable at all.
We can just check the output of the MINIFLOW_GET_U8() directly.

One more thing here is that we should not check the entire
u8 value, but only FLOW_NW_FRAG_MASK bits.

> +            goto out;
> +        }
> +
>          if (nw_proto != IPPROTO_TCP && nw_proto != IPPROTO_UDP
>              && nw_proto != IPPROTO_SCTP && nw_proto != IPPROTO_ICMP
>              && nw_proto != IPPROTO_ICMPV6) {

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to