Re: [ovs-dev] [PATCH v4] Enable VXLAN TSO for DPDK datapath

yang_y_yi Thu, 04 Feb 2021 18:13:15 -0800

At 2021-02-05 01:38:14, "Flavio Leitner" <[email protected]> wrote:
>
>
>Hi Yi,
>
>Again, sorry the delay to review the patch.
>
>The patch is using the outer length fields from DPDK
>which seems to be a problem in OVS because most of the
>packet transformation functions are not aware of that.
>
>Therefore, after the packet gets encapsulated most of
>existing functions will stop working.

Flavio, thank you so much for reviewing, is this your guess? I think ovs will 
parse outter header to
set l2, l3, l4. in our openstack enviroment, it works well, so I'm not sure 
which existing functions will
stop working. Can you elaborate which specific functions won't work?

>
>I think it makes more sense to keep inner headers lengths,
>which are unlikely to change after encapsulation and use
>struct dp_packet fields to hold outer headers. Doing so,
>we can re-use all the existing functions.

Only DPDK is consuming those outer headers, rte_mbuf has held them, I think 
struct dp_packet is very
case sensitive, non-DPDK cases won't use them.

For DPDK, inner l2_len must include outer UDP length plus VXLAN header because 
DPDK PMD driver uses it
in this way, so encapsulation will change inner l2_len. It will be great if you 
can elaborate it in a function
example.

>
>Another thing is that this patch pushes to the user to
>set a global tso_segsz which will not work well when
>bridges use different MTU sizes, encapsulations, etc.
>
>I think the tso_segsz comes in some form or another from
>each port and after the encapsulation we need adjust that
>value.

Yes, this is a good proposal, but this tso_segsz means other side VM MTU, it is 
not MTU of output
device port, for exmaple if VM MTU is 1450, underlay DPDK port MTU is 9000, we 
must use 1450 to do
TSO even if output port MTU is 9000. If target receiver has various MTUs, it is 
a big issue how to set them
per different target receivers. So a tradeoff way is to set a smaller 
tso_segsz, say VM MTU is 1450, floating IP
MTU is 1500/9000, then we set tso_segsz is 1450, this will enable both of them 
to work normally, this is why
I use a common tso_segze to handle it, it is a suboptimal way althougth it 
isn't perfect.

>
>I am working on a PoC with those changes in mind to see
>how it looks like, so if you wait a few days, I can follow
>up with results.

no problem, waiting for your results.

>
>More comments inline.
>
>On Thu, Nov 26, 2020 at 03:26:17PM +0800, [email protected] wrote:
>> From: Yi Yang <[email protected]>
>> 
>> @@ -95,7 +103,8 @@ enum dp_packet_offload_mask {
>>                                       DP_PACKET_OL_TX_IPV6          | \
>>                                       DP_PACKET_OL_TX_TCP_CKSUM     | \
>>                                       DP_PACKET_OL_TX_UDP_CKSUM     | \
>> -                                     DP_PACKET_OL_TX_SCTP_CKSUM)
>> +                                     DP_PACKET_OL_TX_SCTP_CKSUM    | \
>> +                                     DP_PACKET_OL_TX_UDP_SEG)
>
>Isn't this missing DP_PACKET_OL_TX_OUTER_IPV4 and
>DP_PACKET_OL_TX_OUTER_IPV6?

Good catch, will add them.

>
>
>>  #define DP_PACKET_OL_TX_L4_MASK (DP_PACKET_OL_TX_TCP_CKSUM | \
>>                                   DP_PACKET_OL_TX_UDP_CKSUM | \
>> @@ -954,6 +963,13 @@ dp_packet_hwol_is_tso(const struct dp_packet *b)
>>      return !!(*dp_packet_ol_flags_ptr(b) & DP_PACKET_OL_TX_TCP_SEG);
>>  }
>>  
>> +/* Returns 'true' if packet 'b' is marked for UDP fragmentation offloading. 
>> */
>> +static inline bool
>> +dp_packet_hwol_is_ufo(const struct dp_packet *b)
>> +{
>> +    return !!(*dp_packet_ol_flags_ptr(b) & DP_PACKET_OL_TX_UDP_SEG);
>> +}
>> +
>>  /* Returns 'true' if packet 'b' is marked for IPv4 checksum offloading. */
>>  static inline bool
>>  dp_packet_hwol_is_ipv4(const struct dp_packet *b)
>> @@ -992,6 +1008,13 @@ dp_packet_hwol_set_tx_ipv4(struct dp_packet *b)
>>      *dp_packet_ol_flags_ptr(b) |= DP_PACKET_OL_TX_IPV4;
>>  }
>>  
>> +/* Reset packet 'b' for IPv4 checksum offloading. */
>> +static inline void
>> +dp_packet_hwol_reset_tx_ipv4(struct dp_packet *b)
>> +{
>> +    *dp_packet_ol_flags_ptr(b) &= ~DP_PACKET_OL_TX_IPV4;
>> +}
>> +
>>  /* Mark packet 'b' for IPv6 checksum offloading. */
>>  static inline void
>>  dp_packet_hwol_set_tx_ipv6(struct dp_packet *b)
>> @@ -999,6 +1022,27 @@ dp_packet_hwol_set_tx_ipv6(struct dp_packet *b)
>>      *dp_packet_ol_flags_ptr(b) |= DP_PACKET_OL_TX_IPV6;
>>  }
>>  
>> +/* Reset packet 'b' for IPv6 checksum offloading. */
>> +static inline void
>> +dp_packet_hwol_reset_tx_ipv6(struct dp_packet *b)
>> +{
>> +    *dp_packet_ol_flags_ptr(b) &= ~DP_PACKET_OL_TX_IPV6;
>> +}
>> +
>> +/* Mark packet 'b' for Outer IPv4 checksum offloading. */
>> +static inline void
>> +dp_packet_hwol_set_tx_outer_ipv4(struct dp_packet *b)
>> +{
>
>Although I see you wanted to keep the API complete, this
>is not used, so I think it's fine to remove this.

Ok, GRO code used them, I will remove it, then add it in following GRO patch 
series.

>
>> +    *dp_packet_ol_flags_ptr(b) |= DP_PACKET_OL_TX_OUTER_IPV4;
>> +}
>> +
>> +/* Mark packet 'b' for Outer IPv6 checksum offloading. */
>> +static inline void
>> +dp_packet_hwol_set_tx_outer_ipv6(struct dp_packet *b)
>
>Same comment here.

Ok, GRO code used them, I will remove it, then add it in following GRO patch 
series.

>
>
>> +{
>> +    *dp_packet_ol_flags_ptr(b) |= DP_PACKET_OL_TX_OUTER_IPV6;
>> +}
>> @@ -2173,37 +2188,267 @@ netdev_dpdk_rxq_dealloc(struct netdev_rxq *rxq)
>>      rte_free(rx);
>>  }
>>  
>> +static inline bool
>> +is_local_to_local(uint16_t src_port_id, struct netdev_dpdk *dev)
>> +{
>
>
>I don't see how this works. Even if the packet is coming from a
>another local DPDK port, then it would still be required to translate
>from OVS generic offloading to, in this case, DPDK device specific.

If target receiver dev is local port (i.e. veth, tap, dpdkvhotsuserclient) and 
src_port_id is also local,
then we don't need to do extra processing because all the flags and l*_len have 
been set correctly.

Comments in code explained this:

"""
    /* Return directly if source and destitation of mbuf are local ports
     * because mbuf has already set ol_flags and l*_len correctly.
     */
"""

>> @@ -2781,17 +3026,26 @@ dpdk_copy_dp_packet_to_mbuf(struct rte_mempool *mp, 
>> struct dp_packet *pkt_orig)
>>      mbuf_dest->packet_type = pkt_orig->mbuf.packet_type;
>>      mbuf_dest->ol_flags |= (pkt_orig->mbuf.ol_flags &
>>                              ~(EXT_ATTACHED_MBUF | IND_ATTACHED_MBUF));
>> +    mbuf_dest->l2_len = pkt_orig->mbuf.l2_len;
>> +    mbuf_dest->l3_len = pkt_orig->mbuf.l3_len;
>> +    mbuf_dest->l4_len = pkt_orig->mbuf.l4_len;
>> +    mbuf_dest->outer_l2_len = pkt_orig->mbuf.outer_l2_len;
>> +    mbuf_dest->outer_l3_len = pkt_orig->mbuf.outer_l3_len;
>
>Although this is used before actually calling the send function
>which will update tso_segs, it seems this should copy that too.
>In that case, maybe it worth to use memcpy() as below to copy
>all fields at once.

Yes, memcpy can replace them, 

                struct {
                        uint64_t l2_len:RTE_MBUF_L2_LEN_BITS;
                        /**< L2 (MAC) Header Length for non-tunneling pkt.
                         * Outer_L4_len + ... + Inner_L2_len for tunneling pkt.
                         */
                        uint64_t l3_len:RTE_MBUF_L3_LEN_BITS;
                        /**< L3 (IP) Header Length. */
                        uint64_t l4_len:RTE_MBUF_L4_LEN_BITS;
                        /**< L4 (TCP/UDP) Header Length. */
                        uint64_t tso_segsz:RTE_MBUF_TSO_SEGSZ_BITS;
                        /**< TCP TSO segment size */

                        /*
                         * Fields for Tx offloading of tunnels.
                         * These are undefined for packets which don't request
                         * any tunnel offloads (outer IP or UDP checksum,
                         * tunnel TSO).
                         *
                         * PMDs should not use these fields unconditionally
                         * when calculating offsets.
                         *
                         * Applications are expected to set appropriate tunnel
                         * offload flags when they fill in these fields.
                         */
                        uint64_t outer_l3_len:RTE_MBUF_OUTL3_LEN_BITS;
                        /**< Outer L3 (IP) Hdr Length. */
                        uint64_t outer_l2_len:RTE_MBUF_OUTL2_LEN_BITS;
                        /**< Outer L2 (MAC) Hdr Length. */

                        /* uint64_t unused:RTE_MBUF_TXOFLD_UNUSED_BITS; */
                };

>> @@ -6527,6 +6531,8 @@ netdev_linux_parse_l2(struct dp_packet *b, uint16_t 
>> *l4proto)
>>          l2_len += VLAN_HEADER_LEN;
>>      }
>>  
>> +    dp_packet_hwol_set_l2_len(b, l2_len);
>
>This is not needed as the other like calls below because
>the packet is going to through the datapath yet and there
>is a packet parser at the beginning  which will set length
>values again.
>
>The purpose of this function is to identify the offloading
>features used and translate that to OVS generic. For example,
>if TSO was used, then we need to flag the packet properly.

This is a function I added, it is also DPDK specific, it will consider
vlan header when pop_vlan and push_vlan, so at any point, I can
use dp_packet_hwol_get_l2_len to get correct l2_len, code after this call
in lib/netdev-linux.c used dp_packet_hwol_get_l2_len , so call it
here is necessary. 

>> @@ -6561,10 +6593,6 @@ netdev_linux_parse_vnet_hdr(struct dp_packet *b)
>>          return -EINVAL;
>>      }
>>  
>> -    if (vnet->flags == 0 && vnet->gso_type == VIRTIO_NET_HDR_GSO_NONE) {
>> -        return 0;
>> -    }
>> -
>
>Why is that required? It seems that if flags are zero and gso_type
>is none, it doesn't need to continue.

Because it skipped l2 header parse, removing it is to ensure 
netdev_linux_parse_l2 can be called. If the packet
is from DPDK port or dpdkvhostuserclient, some flags and l*_len aren't set 
correctly.

>
>
>>      if (netdev_linux_parse_l2(b, &l4proto)) {
>>          return -EINVAL;
>>      }
>> @@ -6595,22 +6623,130 @@ netdev_linux_parse_vnet_hdr(struct dp_packet *b)
>>  }
>>  
>>  static void
>> -netdev_linux_prepend_vnet_hdr(struct dp_packet *b, int mtu)
>> +netdev_linux_set_ol_flags_and_cksum(struct dp_packet *b, int mtu)
>> +{
>> +    struct eth_header *eth_hdr;
>> +    struct ip_header *ip_hdr = NULL;
>> +    struct ovs_16aligned_ip6_hdr *nh6 = NULL;
>> +    uint16_t l4proto = 0;
>> +    ovs_be16 eth_type;
>> +    int l2_len;
>> +    int l3_len = 0;
>> +    int l4_len = 0;
>
>
>At this point the packet and struct dp_packet should be enough
>to do the translation without requiring another packet parsing.

This is for GRO case, flags and l*_len must be recalculated after GRO.

>> @@ -785,6 +787,64 @@ netdev_get_pt_mode(const struct netdev *netdev)
>>              : NETDEV_PT_LEGACY_L2);
>>  }
>>  
>> +static inline void
>> +calculate_tcpudp_checksum(struct dp_packet *p)
>> +{
>
>Perhaps this could be in packet.c instead? 

Good idea, will move it to lib/packet.c.



_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v4] Enable VXLAN TSO for DPDK datapath

Reply via email to