On 1/17/24 11:09, Dexia Li wrote: > > > On 1/14/24 00:34, Ilya Maximets wrote: >> On 12/27/23 19:55, Mike Pattrick wrote: >>> From: Dexia Li <[email protected]> >>> >>> For userspace datapath, this patch provides vxlan and geneve tunnel tso. >>> Only support userspace vxlan or geneve tunnel, meanwhile support >>> tunnel outter and inner csum offload. If netdev do not support >>> offload features, there is a software fallback.If netdev do not >>> support vxlan and geneve tso,packets will drop. Front-end devices can >>> close offload features by ethtool also. >>> >>> Signed-off-by: Dexia Li <[email protected]> >>> Co-authored-by: Mike Pattrick <[email protected]> >>> Signed-off-by: Mike Pattrick <[email protected]> >>> --- >>> v9: Rebased patch >>> --- >>> lib/dp-packet.c | 41 +++++++- >>> lib/dp-packet.h | 216 ++++++++++++++++++++++++++++++++++++---- >>> lib/dpif-netdev.c | 4 +- >>> lib/flow.c | 2 +- >>> lib/netdev-dpdk.c | 87 ++++++++++++++-- >>> lib/netdev-dummy.c | 2 +- >>> lib/netdev-native-tnl.c | 106 ++++++++++++++++++-- >>> lib/netdev-provider.h | 4 + >>> lib/netdev.c | 33 ++++-- >>> lib/packets.c | 12 +-- >>> lib/packets.h | 6 +- >>> tests/dpif-netdev.at | 4 +- >>> 12 files changed, 458 insertions(+), 59 deletions(-) >> > > Hi, Ilya. > > Thanks for your comments. I have fixed the style issues and packet leak error. > The new v12 version patch has been pushed. > I left one explain after your comments about the mbuf data_len setting. > > Best regards, > Dexia > > > > >> Hi, Dexia and Mike. Thanks for the change! >> >> See a few comments inline. Mostly style nitpicks, but there is also a >> packet leak and a missed case of double encapsulation. >> >> I didn't comment on a few things that you fixed in the second patch, >> so that should be fine to keep. >> >>> >>> diff --git a/lib/dp-packet.c b/lib/dp-packet.c index >>> 920402369..cb20608d7 100644 >>> --- a/lib/dp-packet.c >>> +++ b/lib/dp-packet.c >>> @@ -546,16 +546,47 @@ dp_packet_compare_offsets(struct dp_packet *b1, >>> struct dp_packet *b2, >>> return true; >>> } >>> >>> +void >>> +dp_packet_tnl_outer_ol_send_prepare(struct dp_packet *p, >>> + uint64_t flags) >> >> Indentation. Should be 4 spaces to the right. >> >>> +{ >>> + if (dp_packet_hwol_is_outer_ipv4_cksum(p)) { >>> + if (!(flags & NETDEV_TX_OFFLOAD_OUTER_IP_CKSUM)) { >>> + dp_packet_ip_set_header_csum(p, false); >>> + dp_packet_ol_set_ip_csum_good(p); >>> + dp_packet_hwol_reset_outer_ipv4_csum(p); >>> + } >>> + } >>> + >>> + if (!dp_packet_hwol_is_outer_UDP_cksum(p)) { >> >> Capital letters in a function name. >> >>> + return; >>> + } >>> + >>> + if (!(flags & NETDEV_TX_OFFLOAD_OUTER_UDP_CKSUM)) { >>> + packet_udp_complete_csum(p, false); >>> + dp_packet_ol_set_l4_csum_good(p); >>> + dp_packet_hwol_reset_outer_udp_csum(p); >>> + } >>> +} >>> + >>> /* Checks if the packet 'p' is compatible with netdev_ol_flags 'flags' >>> * and if not, updates the packet with the software fall back. */ >>> void dp_packet_ol_send_prepare(struct dp_packet *p, uint64_t flags) >>> { >>> + bool tnl_inner = false; >>> + >>> + if (dp_packet_hwol_is_tunnel_geneve(p) || >>> + dp_packet_hwol_is_tunnel_vxlan(p)) { >>> + dp_packet_tnl_outer_ol_send_prepare(p, flags); >>> + tnl_inner = true; >>> + } >>> + >>> if (dp_packet_hwol_tx_ip_csum(p)) { >>> if (dp_packet_ip_checksum_good(p)) { >>> dp_packet_hwol_reset_tx_ip_csum(p); >>> } else if (!(flags & NETDEV_TX_OFFLOAD_IPV4_CKSUM)) { >>> - dp_packet_ip_set_header_csum(p); >>> + dp_packet_ip_set_header_csum(p, tnl_inner); >>> dp_packet_ol_set_ip_csum_good(p); >>> dp_packet_hwol_reset_tx_ip_csum(p); >>> } >>> @@ -565,24 +596,24 @@ dp_packet_ol_send_prepare(struct dp_packet *p, >>> uint64_t flags) >>> return; >>> } >>> >>> - if (dp_packet_l4_checksum_good(p)) { >>> + if (dp_packet_l4_checksum_good(p) && (!tnl_inner)) { >> >> Nit: No need to parethesise the !tnl_inner. >> >>> dp_packet_hwol_reset_tx_l4_csum(p); >>> return; >>> } >>> >>> if (dp_packet_hwol_l4_is_tcp(p) >>> && !(flags & NETDEV_TX_OFFLOAD_TCP_CKSUM)) { >>> - packet_tcp_complete_csum(p); >>> + packet_tcp_complete_csum(p, tnl_inner); >>> dp_packet_ol_set_l4_csum_good(p); >>> dp_packet_hwol_reset_tx_l4_csum(p); >>> } else if (dp_packet_hwol_l4_is_udp(p) >>> && !(flags & NETDEV_TX_OFFLOAD_UDP_CKSUM)) { >>> - packet_udp_complete_csum(p); >>> + packet_udp_complete_csum(p, tnl_inner); >>> dp_packet_ol_set_l4_csum_good(p); >>> dp_packet_hwol_reset_tx_l4_csum(p); >>> } else if (!(flags & NETDEV_TX_OFFLOAD_SCTP_CKSUM) >>> && dp_packet_hwol_l4_is_sctp(p)) { >>> - packet_sctp_complete_csum(p); >>> + packet_sctp_complete_csum(p, tnl_inner); >>> dp_packet_ol_set_l4_csum_good(p); >>> dp_packet_hwol_reset_tx_l4_csum(p); >>> } >>> diff --git a/lib/dp-packet.h b/lib/dp-packet.h index >>> 11aa00723..3b16b2a54 100644 >>> --- a/lib/dp-packet.h >>> +++ b/lib/dp-packet.h >>> @@ -86,22 +86,47 @@ enum dp_packet_offload_mask { >>> DEF_OL_FLAG(DP_PACKET_OL_TX_SCTP_CKSUM, RTE_MBUF_F_TX_SCTP_CKSUM, >>> 0x800), >>> /* Offload IP checksum. */ >>> DEF_OL_FLAG(DP_PACKET_OL_TX_IP_CKSUM, RTE_MBUF_F_TX_IP_CKSUM, >>> 0x1000), >>> + /* Offload packet is tunnel GENEVE. */ >>> + DEF_OL_FLAG(DP_PACKET_OL_TX_TUNNEL_GENEVE, >>> + RTE_MBUF_F_TX_TUNNEL_GENEVE, 0x2000), >>> + /* Offload packet is tunnel VXLAN. */ >>> + DEF_OL_FLAG(DP_PACKET_OL_TX_TUNNEL_VXLAN, >>> + RTE_MBUF_F_TX_TUNNEL_VXLAN, 0x4000), >>> + /* Offload tunnel packet, out is ipv4 */ >>> + DEF_OL_FLAG(DP_PACKET_OL_TX_OUTER_IPV4, >>> + RTE_MBUF_F_TX_OUTER_IPV4, 0x8000), >>> + /* Offload TUNNEL out ipv4 checksum */ >>> + DEF_OL_FLAG(DP_PACKET_OL_TX_OUTER_IP_CKSUM, >>> + RTE_MBUF_F_TX_OUTER_IP_CKSUM, 0x10000), >>> + /* Offload TUNNEL out udp checksum */ >>> + DEF_OL_FLAG(DP_PACKET_OL_TX_OUTER_UDP_CKSUM, >>> + RTE_MBUF_F_TX_OUTER_UDP_CKSUM, 0x20000), >>> + /* Offload tunnel packet, out is ipv6 */ >>> + DEF_OL_FLAG(DP_PACKET_OL_TX_OUTER_IPV6, >>> + RTE_MBUF_F_TX_OUTER_IPV6, 0x40000), >>> + >> >> Nit: Upper/lower-casing of protocol names is all over the place in the >> comments here. Please, use IPv4, IPv6, UDP, GENEVE, VXLAN (VxLAN is >> also fine). A word 'tunnel' should be in lowercase. >> >>> /* Adding new field requires adding to >>> DP_PACKET_OL_SUPPORTED_MASK. */ }; >>> >>> -#define DP_PACKET_OL_SUPPORTED_MASK (DP_PACKET_OL_RSS_HASH | \ >>> - DP_PACKET_OL_FLOW_MARK | \ >>> - DP_PACKET_OL_RX_L4_CKSUM_BAD | \ >>> - DP_PACKET_OL_RX_IP_CKSUM_BAD | \ >>> - DP_PACKET_OL_RX_L4_CKSUM_GOOD | \ >>> - DP_PACKET_OL_RX_IP_CKSUM_GOOD | \ >>> - DP_PACKET_OL_TX_TCP_SEG | \ >>> - DP_PACKET_OL_TX_IPV4 | \ >>> - DP_PACKET_OL_TX_IPV6 | \ >>> - DP_PACKET_OL_TX_TCP_CKSUM | \ >>> - DP_PACKET_OL_TX_UDP_CKSUM | \ >>> - DP_PACKET_OL_TX_SCTP_CKSUM | \ >>> - DP_PACKET_OL_TX_IP_CKSUM) >>> +#define DP_PACKET_OL_SUPPORTED_MASK (DP_PACKET_OL_RSS_HASH | \ >>> + DP_PACKET_OL_FLOW_MARK | \ >>> + DP_PACKET_OL_RX_L4_CKSUM_BAD | \ >>> + DP_PACKET_OL_RX_IP_CKSUM_BAD | \ >>> + DP_PACKET_OL_RX_L4_CKSUM_GOOD | \ >>> + DP_PACKET_OL_RX_IP_CKSUM_GOOD | \ >>> + DP_PACKET_OL_TX_TCP_SEG | \ >>> + DP_PACKET_OL_TX_IPV4 | \ >>> + DP_PACKET_OL_TX_IPV6 | \ >>> + DP_PACKET_OL_TX_TCP_CKSUM | \ >>> + DP_PACKET_OL_TX_UDP_CKSUM | \ >>> + DP_PACKET_OL_TX_SCTP_CKSUM | \ >>> + DP_PACKET_OL_TX_IP_CKSUM | \ >>> + DP_PACKET_OL_TX_TUNNEL_GENEVE | \ >>> + DP_PACKET_OL_TX_TUNNEL_VXLAN | \ >>> + DP_PACKET_OL_TX_OUTER_IPV4 | \ >>> + DP_PACKET_OL_TX_OUTER_IP_CKSUM | \ >>> + DP_PACKET_OL_TX_OUTER_UDP_CKSUM | \ >>> + DP_PACKET_OL_TX_OUTER_IPV6) >>> >>> #define DP_PACKET_OL_TX_L4_MASK (DP_PACKET_OL_TX_TCP_CKSUM | \ >>> DP_PACKET_OL_TX_UDP_CKSUM | \ @@ >>> -139,6 +164,10 @@ struct dp_packet { >>> * or UINT16_MAX. */ >>> uint16_t l4_ofs; /* Transport-level header offset, >>> or UINT16_MAX. */ >>> + uint16_t inner_l3_ofs; /* inner Network-level header offset, >>> + * or UINT16_MAX. */ >>> + uint16_t inner_l4_ofs; /* inner Transport-level header offset, >>> + or UINT16_MAX. */ >> >> Comments should start with a capital letter. >> >>> uint32_t cutlen; /* length in bytes to cut from the end. >>> */ >>> ovs_be32 packet_type; /* Packet type as defined in OpenFlow */ >>> uint16_t csum_start; /* Position to start checksumming from. >>> */ >>> @@ -250,6 +279,9 @@ bool dp_packet_compare_offsets(struct dp_packet *good, >>> struct dp_packet *test, >>> struct ds *err_str); void >>> dp_packet_ol_send_prepare(struct dp_packet *, uint64_t); >>> +void dp_packet_tnl_outer_ol_send_prepare(struct dp_packet *p, >> >> Nit: prototype doesn't need a variable name 'p'. >> >>> + uint64_t flags); >>> + >>> >>> >>> /* Frees memory that 'b' points to, as well as 'b' itself. */ @@ >>> -482,6 +514,22 @@ dp_packet_l4_size(const struct dp_packet *b) >>> : 0; >>> } >>> >>> +static inline void * >>> +dp_packet_inner_l3(const struct dp_packet *b) { >>> + return b->inner_l3_ofs != UINT16_MAX >>> + ? (char *) dp_packet_data(b) + b->inner_l3_ofs >>> + : NULL; >>> +} >>> + >>> +static inline void * >>> +dp_packet_inner_l4(const struct dp_packet *b) { >>> + return b->inner_l4_ofs != UINT16_MAX >>> + ? (char *) dp_packet_data(b) + b->inner_l4_ofs >>> + : NULL; >>> +} >>> + >>> static inline const void * >>> dp_packet_get_tcp_payload(const struct dp_packet *b) { @@ -539,6 >>> +587,25 @@ dp_packet_get_nd_payload(const struct dp_packet *b) } >>> >>> #ifdef DPDK_NETDEV >>> +static inline void >>> +dp_packet_set_l2_len(struct dp_packet *b, size_t l2_len) { >>> + b->mbuf.l2_len = l2_len; >>> +} >>> + >>> +static inline void >>> +dp_packet_set_l3_len(struct dp_packet *b, size_t l3_len) { >>> + b->mbuf.l3_len = l3_len; >>> +} >>> + >>> +static inline void >>> +dp_packet_set_l4_len(struct dp_packet *b, size_t l4_len) { >>> + b->mbuf.l4_len = l4_len; >>> +} >>> + >>> + >>> static inline uint64_t * >>> dp_packet_ol_flags_ptr(const struct dp_packet *b) { @@ -558,6 >>> +625,24 @@ dp_packet_flow_mark_ptr(const struct dp_packet *b) } >>> >>> #else >>> +static inline void >>> +dp_packet_set_l2_len(struct dp_packet *b OVS_UNUSED, size_t l2_len >>> +OVS_UNUSED) { >>> + /* There are no implementation */ >> >> Nit: s/are/is/ >> And a period at the end of a sentence. >> >>> +} >>> + >>> +static inline void >>> +dp_packet_set_l3_len(struct dp_packet *b OVS_UNUSED, size_t l3_len >>> +OVS_UNUSED) { >>> + /* There are no implementation */ } >>> + >>> +static inline void >>> +dp_packet_set_l4_len(struct dp_packet *b OVS_UNUSED, size_t l4_len >>> +OVS_UNUSED) { >>> + /* There are no implementation */ } >>> + >>> static inline uint32_t * >>> dp_packet_ol_flags_ptr(const struct dp_packet *b) { @@ -619,9 >>> +704,10 @@ dp_packet_set_size(struct dp_packet *b, uint32_t v) >>> * (and thus 'v') will always be <= UINT16_MAX; this means that there >>> is no >>> * loss of accuracy in assigning 'v' to 'data_len'. >>> */ >>> - b->mbuf.data_len = (uint16_t)v; /* Current seg length. */ >>> - b->mbuf.pkt_len = v; /* Total length of all segments >>> linked to >>> - * this segment. */ >>> + /* Current seg length. */ >>> + b->mbuf.data_len += (uint16_t)(v - b->mbuf.pkt_len); >> >> I don't really understand what is happening here. Could you elaborate? >> The result seems likley a negative value turned into unsigned, since >> packet length should always be larger than a length of a current segment. >> >> Also the comment in this function likley needs an update. >> >> And is that correct to have them have different values? We're not >> using multi-segment packets... or do we? > > > We are using multi-segment packets when dp packet using dpdk mbuf. > The old version will set first seg data_len to pkt_len which will result in > Tso error.
To be clear, I meant multi-segment packets in terms of mbuf->next pointer. I don't think we expect mbufs with the 'next' pointer anywhere in OVS. However, it looks like for most packets the calculation above will give the same results as just assigning 'v' to a 'data_len', so it should be fine to have a proposed change if it streamlines some work down the line. Best regards, Ilya Maximets. _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
