Thanks all for review/testing, pushed to master. Regards Ian
-----Original Message----- From: dev <[email protected]> On Behalf Of Stokes, Ian Sent: Friday, January 17, 2020 10:56 PM To: Flavio Leitner <[email protected]>; [email protected] Cc: Ilya Maximets <[email protected]>; txfh2007 <[email protected]> Subject: Re: [ovs-dev] [PATCH v5] userspace: Add TCP Segmentation Offload support On 1/17/2020 9:54 PM, Stokes, Ian wrote: > > > On 1/17/2020 9:47 PM, Flavio Leitner wrote: >> Abbreviated as TSO, TCP Segmentation Offload is a feature which enables >> the network stack to delegate the TCP segmentation to the NIC, reducing >> the per-packet CPU overhead. >> >> A guest using a vhostuser interface with TSO enabled can send TCP packets >> much bigger than the MTU, which saves CPU cycles normally used to break >> the packets down to MTU size and to calculate checksums. >> >> It also saves CPU cycles used to parse multiple packets/headers during >> the packet processing inside the virtual switch. >> >> If the destination of the packet is another guest on the same host, then >> the same big packet can be sent through a vhostuser interface, skipping >> the segmentation completely. However, if the destination is not local, >> the NIC hardware is instructed to do the TCP segmentation and checksum >> calculation. >> >> It is recommended to check if the NIC hardware supports TSO before enabling >> the feature, which is off by default. For additional information please >> check the userspace-tso.rst document. >> >> Signed-off-by: Flavio Leitner <[email protected]> > > Fantastic work here Flavio, quick turnaround when needed. > > Acked Are there any objections to merging this? There's been nothing so far. If there are no further objections I will merge this at the end of the hour. BR Ian > > BR > Ian >> --- >> Documentation/automake.mk | 1 + >> Documentation/topics/index.rst | 1 + >> Documentation/topics/userspace-tso.rst | 98 +++++++ >> NEWS | 1 + >> lib/automake.mk | 2 + >> lib/conntrack.c | 29 +- >> lib/dp-packet.h | 176 ++++++++++- >> lib/ipf.c | 32 +- >> lib/netdev-dpdk.c | 348 +++++++++++++++++++--- >> lib/netdev-linux-private.h | 5 + >> lib/netdev-linux.c | 386 ++++++++++++++++++++++--- >> lib/netdev-provider.h | 9 + >> lib/netdev.c | 78 ++++- >> lib/userspace-tso.c | 53 ++++ >> lib/userspace-tso.h | 23 ++ >> vswitchd/bridge.c | 2 + >> vswitchd/vswitch.xml | 20 ++ >> 17 files changed, 1140 insertions(+), 124 deletions(-) >> create mode 100644 Documentation/topics/userspace-tso.rst >> create mode 100644 lib/userspace-tso.c >> create mode 100644 lib/userspace-tso.h >> >> Testing: >> - Travis, Cirrus, AppVeyor, testsuite passed OK. >> - noticed no changes since v4 with regard to performance. 
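As a quick functional check, the feature can be exercised end to end using the commands from the userspace-tso.rst document added by this patch (the interface name eth0 and the use of iperf3 between two vhost-user guests on the same host are illustrative assumptions, not requirements of the patch):

    # On the host; changing this option requires restarting the daemon.
    $ ovs-vsctl set Open_vSwitch . other_config:userspace-tso-enable=true

    # Inside each guest; scatter-gather is a prerequisite for TSO.
    $ ethtool -K eth0 sg on
    $ ethtool -K eth0 tso on
    $ ethtool -k eth0 | grep tcp-segmentation-offload

    # Any bulk TCP transfer between the two guests then exercises the path,
    # e.g. iperf3 -s in one guest and iperf3 -c <server address> in the
    # other, comparing CPU usage with the option set to false.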
>> >> Changelog: >> - v5 >> * rebased on top of master (NEWS conflict) >> * added missing periods at the end of comments >> * mention DPDK requirement at vswitch.xml >> * restricted tso feature to OvS built with dpdk >> * headers in alphabetical order >> * removed unneeded call to initialize pkt >> * used OVS_UNLIKELY instead of unlikely >> * removed parenthesis from sizeof() >> * removed blank line at dp_packet_hwol_tx_l4_checksum() >> * removed redundant dp_packet_hwol_tx_ipv4_checksum() >> * updated function comments as suggested >> >> - v4 >> * rebased on top of master (recvmmsg) >> * fixed URL in doc to point to 19.11 >> * renamed tso to userspace-tso >> * renamed the option to userspace-tso-enable >> * removed prototype left over from v2 >> * fixed function style declaration >> * renamed dp_packet_hwol_tx_ip_checksum to >> dp_packet_hwol_tx_ipv4_checksum >> * dp_packet_hwol_tx_ipv4_checksum now checks for PKT_TX_IPV4. >> * account for drops while prepping the batch for TX. >> * don't prep the batch for TX if TSO is disabled. >> * simplified setsockopt error checking >> * fixed af_packet_sock error checking to not call setsockopt on >> closed sockets. >> * fixed ol_flags comment. >> * used VLOG_ERR_BUF() to pass error messages. >> * fixed packet leak at netdev_send_prepare_batch() >> * added a coverage counter to account drops while preparing a batch >> at netdev.c >> * fixed netdev_send() to not call ->send() if the batch is empty. >> * fixed packet leak at netdev_push_header and account for the drops. >> * removed DPDK requirement to enable userspace TSO support. >> * fixed parameter documentation in vswitch.xml. >> * renamed tso.rst to userspace-tso.rst and moved to topics/ >> * added comments documenting the functions in dp-packet.h >> * fixed dp_packet_hwol_is_tso to check only PKT_TX_TCP_SEG >> >> - v3 >> * Improved the documentation. >> * Updated copyright year to 2020. >> * TSO offloaded msg now includes the netdev's name. >> * Added period at the end of all code comments. >> * Warn and drop encapsulation of TSO packets. >> * Fixed Travis issue with restricted virtio types. >> * Fixed double headroom allocation in dpdk_copy_dp_packet_to_mbuf() >> which caused packet corruption. >> * Fixed netdev_dpdk_prep_hwol_packet() to unconditionally set >> PKT_TX_IP_CKSUM only for IPv4 packets. >> >> diff --git a/Documentation/automake.mk b/Documentation/automake.mk >> index f2ca17bad..22976a3cd 100644 >> --- a/Documentation/automake.mk >> +++ b/Documentation/automake.mk >> @@ -57,6 +57,7 @@ DOC_SOURCE = \ >> Documentation/topics/ovsdb-replication.rst \ >> Documentation/topics/porting.rst \ >> Documentation/topics/tracing.rst \ >> + Documentation/topics/userspace-tso.rst \ >> Documentation/topics/windows.rst \ >> Documentation/howto/index.rst \ >> Documentation/howto/dpdk.rst \ >> diff --git a/Documentation/topics/index.rst >> b/Documentation/topics/index.rst >> index 34c4b10e0..08af3a24d 100644 >> --- a/Documentation/topics/index.rst >> +++ b/Documentation/topics/index.rst >> @@ -50,5 +50,6 @@ OVS >> language-bindings >> testing >> tracing >> + userspace-tso >> idl-compound-indexes >> ovs-extensions >> diff --git a/Documentation/topics/userspace-tso.rst >> b/Documentation/topics/userspace-tso.rst >> new file mode 100644 >> index 000000000..893c64839 >> --- /dev/null >> +++ b/Documentation/topics/userspace-tso.rst >> @@ -0,0 +1,98 @@ >> +.. >> + Copyright 2020, Red Hat, Inc. 
>> + >> + Licensed under the Apache License, Version 2.0 (the "License"); >> you may >> + not use this file except in compliance with the License. You >> may obtain >> + a copy of the License at >> + >> + http://www.apache.org/licenses/LICENSE-2.0 >> + >> + Unless required by applicable law or agreed to in writing, >> software >> + distributed under the License is distributed on an "AS IS" >> BASIS, WITHOUT >> + WARRANTIES OR CONDITIONS OF ANY KIND, either express or >> implied. See the >> + License for the specific language governing permissions and >> limitations >> + under the License. >> + >> + Convention for heading levels in Open vSwitch documentation: >> + >> + ======= Heading 0 (reserved for the title in a document) >> + ------- Heading 1 >> + ~~~~~~~ Heading 2 >> + +++++++ Heading 3 >> + ''''''' Heading 4 >> + >> + Avoid deeper levels because they do not render well. >> + >> +======================== >> +Userspace Datapath - TSO >> +======================== >> + >> +**Note:** This feature is considered experimental. >> + >> +TCP Segmentation Offload (TSO) enables a network stack to delegate >> segmentation >> +of an oversized TCP segment to the underlying physical NIC. Offload >> of frame >> +segmentation achieves computational savings in the core, freeing up >> CPU cycles >> +for more useful work. >> + >> +A common use case for TSO is when using virtualization, where traffic >> that's >> +coming in from a VM can offload the TCP segmentation, thus avoiding the >> +fragmentation in software. Additionally, if the traffic is headed to >> a VM >> +within the same host, further optimization can be expected. As the >> traffic never >> +leaves the machine, no MTU needs to be accounted for, and thus no >> segmentation >> +and checksum calculations are required, which saves yet more cycles. >> Only when >> +the traffic actually leaves the host does the segmentation need to >> happen, in which >> +case it will be performed by the egress NIC. First, consult your >> controller's >> +datasheet for compatibility. Second, the NIC must have an >> associated DPDK >> +Poll Mode Driver (PMD) which supports `TSO`. For a list of features >> per PMD, >> +refer to the `DPDK documentation`__. >> + >> +__ https://doc.dpdk.org/guides-19.11/nics/overview.html >> + >> +Enabling TSO >> +~~~~~~~~~~~~ >> + >> +TSO support may be enabled via a global config value >> +``userspace-tso-enable``. Setting this to ``true`` enables TSO >> support for >> +all ports. >> + >> + $ ovs-vsctl set Open_vSwitch . >> other_config:userspace-tso-enable=true >> + >> +The default value is ``false``. >> + >> +Changing ``userspace-tso-enable`` requires restarting the daemon. >> + >> +When using :doc:`vHost User ports <dpdk/vhost-user>`, TSO may be enabled >> +as follows. >> + >> +`TSO` is enabled in OvS by the DPDK vHost User backend; when a new guest >> +connection is established, `TSO` is thus advertised to the guest as an >> +available feature: >> + >> +1. QEMU Command Line Parameter:: >> + >> + $ sudo $QEMU_DIR/x86_64-softmmu/qemu-system-x86_64 \ >> + ... >> + -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,\ >> + csum=on,guest_csum=on,guest_tso4=on,guest_tso6=on\ >> + ... >> + >> +2. Ethtool. 
Assuming that the guest's OS also supports `TSO`, ethtool >> can be >> +used to enable the same:: >> + >> + $ ethtool -K eth0 sg on # scatter-gather is a prerequisite >> for TSO >> + $ ethtool -K eth0 tso on >> + $ ethtool -k eth0 >> + >> +Limitations >> +~~~~~~~~~~~ >> + >> +The current OvS userspace `TSO` implementation supports flat and VLAN >> networks >> +only (i.e. no support for `TSO` over tunneled connections [VxLAN, GRE, >> IPinIP, >> +etc.]). >> + >> +There is no software implementation of TSO, so all ports attached to the >> +datapath must support TSO or packets using that feature will be dropped >> +on ports without TSO support. That also means guests using vhost-user >> +in client mode will receive TSO packets regardless of TSO being enabled >> +or disabled within the guest. >> diff --git a/NEWS b/NEWS >> index 579e91c89..c6d3b6053 100644 >> --- a/NEWS >> +++ b/NEWS >> @@ -30,6 +30,7 @@ Post-v2.12.0 >> * Add support for DPDK 19.11. >> * Add hardware offload support for output, drop, set of MAC, >> IPv4 and >> TCP/UDP ports actions (experimental). >> + * Add experimental support for TSO. >> - RSTP: >> * The rstp_statistics column in Port table will only be updated >> every >> stats-update-interval configured in Open_vSwitch table. >> diff --git a/lib/automake.mk b/lib/automake.mk >> index ebf714501..95925b57c 100644 >> --- a/lib/automake.mk >> +++ b/lib/automake.mk >> @@ -314,6 +314,8 @@ lib_libopenvswitch_la_SOURCES = \ >> lib/unicode.h \ >> lib/unixctl.c \ >> lib/unixctl.h \ >> + lib/userspace-tso.c \ >> + lib/userspace-tso.h \ >> lib/util.c \ >> lib/util.h \ >> lib/uuid.c \ >> diff --git a/lib/conntrack.c b/lib/conntrack.c >> index b80080e72..60222ca53 100644 >> --- a/lib/conntrack.c >> +++ b/lib/conntrack.c >> @@ -2022,7 +2022,8 @@ conn_key_extract(struct conntrack *ct, struct >> dp_packet *pkt, ovs_be16 dl_type, >> if (hwol_bad_l3_csum) { >> ok = false; >> } else { >> - bool hwol_good_l3_csum = dp_packet_ip_checksum_valid(pkt); >> + bool hwol_good_l3_csum = dp_packet_ip_checksum_valid(pkt) >> + || dp_packet_hwol_is_ipv4(pkt); >> /* Validate the checksum only when hwol is not >> supported. */ >> ok = extract_l3_ipv4(&ctx->key, l3, >> dp_packet_l3_size(pkt), NULL, >> !hwol_good_l3_csum); >> @@ -2036,7 +2037,8 @@ conn_key_extract(struct conntrack *ct, struct >> dp_packet *pkt, ovs_be16 dl_type, >> if (ok) { >> bool hwol_bad_l4_csum = dp_packet_l4_checksum_bad(pkt); >> if (!hwol_bad_l4_csum) { >> - bool hwol_good_l4_csum = dp_packet_l4_checksum_valid(pkt); >> + bool hwol_good_l4_csum = dp_packet_l4_checksum_valid(pkt) >> + || >> dp_packet_hwol_tx_l4_checksum(pkt); >> /* Validate the checksum only when hwol is not >> supported. 
*/ >> if (extract_l4(&ctx->key, l4, dp_packet_l4_size(pkt), >> &ctx->icmp_related, l3, !hwol_good_l4_csum, >> @@ -3237,8 +3239,11 @@ handle_ftp_ctl(struct conntrack *ct, const >> struct conn_lookup_ctx *ctx, >> } >> if (seq_skew) { >> ip_len = ntohs(l3_hdr->ip_tot_len) + seq_skew; >> - l3_hdr->ip_csum = recalc_csum16(l3_hdr->ip_csum, >> - l3_hdr->ip_tot_len, >> htons(ip_len)); >> + if (!dp_packet_hwol_is_ipv4(pkt)) { >> + l3_hdr->ip_csum = recalc_csum16(l3_hdr->ip_csum, >> + >> l3_hdr->ip_tot_len, >> + htons(ip_len)); >> + } >> l3_hdr->ip_tot_len = htons(ip_len); >> } >> } >> @@ -3256,13 +3261,15 @@ handle_ftp_ctl(struct conntrack *ct, const >> struct conn_lookup_ctx *ctx, >> } >> th->tcp_csum = 0; >> - if (ctx->key.dl_type == htons(ETH_TYPE_IPV6)) { >> - th->tcp_csum = packet_csum_upperlayer6(nh6, th, >> ctx->key.nw_proto, >> - dp_packet_l4_size(pkt)); >> - } else { >> - uint32_t tcp_csum = packet_csum_pseudoheader(l3_hdr); >> - th->tcp_csum = csum_finish( >> - csum_continue(tcp_csum, th, dp_packet_l4_size(pkt))); >> + if (!dp_packet_hwol_tx_l4_checksum(pkt)) { >> + if (ctx->key.dl_type == htons(ETH_TYPE_IPV6)) { >> + th->tcp_csum = packet_csum_upperlayer6(nh6, th, >> ctx->key.nw_proto, >> + dp_packet_l4_size(pkt)); >> + } else { >> + uint32_t tcp_csum = packet_csum_pseudoheader(l3_hdr); >> + th->tcp_csum = csum_finish( >> + csum_continue(tcp_csum, th, dp_packet_l4_size(pkt))); >> + } >> } >> if (seq_skew) { >> diff --git a/lib/dp-packet.h b/lib/dp-packet.h >> index 133942155..69ae5dfac 100644 >> --- a/lib/dp-packet.h >> +++ b/lib/dp-packet.h >> @@ -456,7 +456,7 @@ dp_packet_init_specific(struct dp_packet *p) >> { >> /* This initialization is needed for packets that do not come >> from DPDK >> * interfaces, when vswitchd is built with --with-dpdk. */ >> - p->mbuf.tx_offload = p->mbuf.packet_type = 0; >> + p->mbuf.ol_flags = p->mbuf.tx_offload = p->mbuf.packet_type = 0; >> p->mbuf.nb_segs = 1; >> p->mbuf.next = NULL; >> } >> @@ -519,6 +519,95 @@ dp_packet_set_allocated(struct dp_packet *b, >> uint16_t s) >> b->mbuf.buf_len = s; >> } >> +/* Returns 'true' if packet 'b' is marked for TCP segmentation >> offloading. */ >> +static inline bool >> +dp_packet_hwol_is_tso(const struct dp_packet *b) >> +{ >> + return !!(b->mbuf.ol_flags & PKT_TX_TCP_SEG); >> +} >> + >> +/* Returns 'true' if packet 'b' is marked for IPv4 checksum >> offloading. */ >> +static inline bool >> +dp_packet_hwol_is_ipv4(const struct dp_packet *b) >> +{ >> + return !!(b->mbuf.ol_flags & PKT_TX_IPV4); >> +} >> + >> +/* Returns the L4 cksum offload bitmask. */ >> +static inline uint64_t >> +dp_packet_hwol_l4_mask(const struct dp_packet *b) >> +{ >> + return b->mbuf.ol_flags & PKT_TX_L4_MASK; >> +} >> + >> +/* Returns 'true' if packet 'b' is marked for TCP checksum >> offloading. */ >> +static inline bool >> +dp_packet_hwol_l4_is_tcp(const struct dp_packet *b) >> +{ >> + return (b->mbuf.ol_flags & PKT_TX_L4_MASK) == PKT_TX_TCP_CKSUM; >> +} >> + >> +/* Returns 'true' if packet 'b' is marked for UDP checksum >> offloading. */ >> +static inline bool >> +dp_packet_hwol_l4_is_udp(struct dp_packet *b) >> +{ >> + return (b->mbuf.ol_flags & PKT_TX_L4_MASK) == PKT_TX_UDP_CKSUM; >> +} >> + >> +/* Returns 'true' if packet 'b' is marked for SCTP checksum >> offloading. */ >> +static inline bool >> +dp_packet_hwol_l4_is_sctp(struct dp_packet *b) >> +{ >> + return (b->mbuf.ol_flags & PKT_TX_L4_MASK) == PKT_TX_SCTP_CKSUM; >> +} >> + >> +/* Mark packet 'b' for IPv4 checksum offloading. 
*/ >> +static inline void >> +dp_packet_hwol_set_tx_ipv4(struct dp_packet *b) >> +{ >> + b->mbuf.ol_flags |= PKT_TX_IPV4; >> +} >> + >> +/* Mark packet 'b' for IPv6 checksum offloading. */ >> +static inline void >> +dp_packet_hwol_set_tx_ipv6(struct dp_packet *b) >> +{ >> + b->mbuf.ol_flags |= PKT_TX_IPV6; >> +} >> + >> +/* Mark packet 'b' for TCP checksum offloading. It implies that >> + * the packet 'b' is also marked for either IPv4 or IPv6 checksum offloading. */ >> +static inline void >> +dp_packet_hwol_set_csum_tcp(struct dp_packet *b) >> +{ >> + b->mbuf.ol_flags |= PKT_TX_TCP_CKSUM; >> +} >> + >> +/* Mark packet 'b' for UDP checksum offloading. It implies that >> + * the packet 'b' is also marked for either IPv4 or IPv6 checksum offloading. */ >> +static inline void >> +dp_packet_hwol_set_csum_udp(struct dp_packet *b) >> +{ >> + b->mbuf.ol_flags |= PKT_TX_UDP_CKSUM; >> +} >> + >> +/* Mark packet 'b' for SCTP checksum offloading. It implies that >> + * the packet 'b' is also marked for either IPv4 or IPv6 checksum offloading. */ >> +static inline void >> +dp_packet_hwol_set_csum_sctp(struct dp_packet *b) >> +{ >> + b->mbuf.ol_flags |= PKT_TX_SCTP_CKSUM; >> +} >> + >> +/* Mark packet 'b' for TCP segmentation offloading. It implies that >> + * the packet 'b' is marked for either IPv4 or IPv6 checksum offloading, >> + * and also for TCP checksum offloading. */ >> +static inline void >> +dp_packet_hwol_set_tcp_seg(struct dp_packet *b) >> +{ >> + b->mbuf.ol_flags |= PKT_TX_TCP_SEG; >> +} >> + >> /* Returns the RSS hash of the packet 'p'. Note that the returned >> value is >> * correct only if 'dp_packet_rss_valid(p)' returns true */ >> static inline uint32_t >> @@ -648,6 +737,84 @@ dp_packet_set_allocated(struct dp_packet *b, >> uint16_t s) >> b->allocated_ = s; >> } >> +/* There is no implementation when the datapath is not DPDK enabled. */ >> +static inline bool >> +dp_packet_hwol_is_tso(const struct dp_packet *b OVS_UNUSED) >> +{ >> + return false; >> +} >> + >> +/* There is no implementation when the datapath is not DPDK enabled. */ >> +static inline bool >> +dp_packet_hwol_is_ipv4(const struct dp_packet *b OVS_UNUSED) >> +{ >> + return false; >> +} >> + >> +/* There is no implementation when the datapath is not DPDK enabled. */ >> +static inline uint64_t >> +dp_packet_hwol_l4_mask(const struct dp_packet *b OVS_UNUSED) >> +{ >> + return 0; >> +} >> + >> +/* There is no implementation when the datapath is not DPDK enabled. */ >> +static inline bool >> +dp_packet_hwol_l4_is_tcp(const struct dp_packet *b OVS_UNUSED) >> +{ >> + return false; >> +} >> + >> +/* There is no implementation when the datapath is not DPDK enabled. */ >> +static inline bool >> +dp_packet_hwol_l4_is_udp(const struct dp_packet *b OVS_UNUSED) >> +{ >> + return false; >> +} >> + >> +/* There is no implementation when the datapath is not DPDK enabled. */ >> +static inline bool >> +dp_packet_hwol_l4_is_sctp(const struct dp_packet *b OVS_UNUSED) >> +{ >> + return false; >> +} >> + >> +/* There is no implementation when the datapath is not DPDK enabled. */ >> +static inline void >> +dp_packet_hwol_set_tx_ipv4(struct dp_packet *b OVS_UNUSED) >> +{ >> +} >> + >> +/* There is no implementation when the datapath is not DPDK enabled. */ >> +static inline void >> +dp_packet_hwol_set_tx_ipv6(struct dp_packet *b OVS_UNUSED) >> +{ >> +} >> + >> +/* There is no implementation when the datapath is not DPDK enabled. */ >> +static inline void >> +dp_packet_hwol_set_csum_tcp(struct dp_packet *b OVS_UNUSED) >> +{ >> +} >> + >> +/* There is no implementation when the datapath is not DPDK enabled. */ >> +static inline void >> +dp_packet_hwol_set_csum_udp(struct dp_packet *b OVS_UNUSED) >> +{ >> +} >> + >> +/* There is no implementation when the datapath is not DPDK enabled. */ >> +static inline void >> +dp_packet_hwol_set_csum_sctp(struct dp_packet *b OVS_UNUSED) >> +{ >> +} >> + >> +/* There is no implementation when the datapath is not DPDK enabled. */ >> +static inline void >> +dp_packet_hwol_set_tcp_seg(struct dp_packet *b OVS_UNUSED) >> +{ >> +} >> + >> /* Returns the RSS hash of the packet 'p'. Note that the returned >> value is >> * correct only if 'dp_packet_rss_valid(p)' returns true */ >> static inline uint32_t >> @@ -939,6 +1106,13 @@ dp_packet_batch_reset_cutlen(struct >> dp_packet_batch *batch) >> } >> } >> +/* Return true if the packet 'b' requested L4 checksum offload. */ >> +static inline bool >> +dp_packet_hwol_tx_l4_checksum(const struct dp_packet *b) >> +{ >> + return !!dp_packet_hwol_l4_mask(b); >> +} >> + >> #ifdef __cplusplus >> } >> #endif >> diff --git a/lib/ipf.c b/lib/ipf.c >> index 45c489122..446e89d13 100644 >> --- a/lib/ipf.c >> +++ b/lib/ipf.c >> @@ -433,9 +433,11 @@ ipf_reassemble_v4_frags(struct ipf_list *ipf_list) >> len += rest_len; >> l3 = dp_packet_l3(pkt); >> ovs_be16 new_ip_frag_off = l3->ip_frag_off & >> ~htons(IP_MORE_FRAGMENTS); >> - l3->ip_csum = recalc_csum16(l3->ip_csum, l3->ip_frag_off, >> - new_ip_frag_off); >> - l3->ip_csum = recalc_csum16(l3->ip_csum, l3->ip_tot_len, >> htons(len)); >> + if (!dp_packet_hwol_is_ipv4(pkt)) { >> + l3->ip_csum = recalc_csum16(l3->ip_csum, l3->ip_frag_off, >> + new_ip_frag_off); >> + l3->ip_csum = recalc_csum16(l3->ip_csum, l3->ip_tot_len, >> htons(len)); >> + } >> l3->ip_tot_len = htons(len); >> l3->ip_frag_off = new_ip_frag_off; >> dp_packet_set_l2_pad_size(pkt, 0); >> @@ -606,6 +608,7 @@ ipf_is_valid_v4_frag(struct ipf *ipf, struct >> dp_packet *pkt) >> } >> if (OVS_UNLIKELY(!dp_packet_ip_checksum_valid(pkt) >> + && !dp_packet_hwol_is_ipv4(pkt) >> && csum(l3, ip_hdr_len) != 0)) { >> goto invalid_pkt; >> } >> @@ -1181,16 +1184,21 @@ ipf_post_execute_reass_pkts(struct ipf *ipf, >> } else { >> struct ip_header *l3_frag = >> dp_packet_l3(frag_0->pkt); >> struct ip_header *l3_reass = dp_packet_l3(pkt); >> - ovs_be32 reass_ip = >> get_16aligned_be32(&l3_reass->ip_src); >> - ovs_be32 frag_ip = >> get_16aligned_be32(&l3_frag->ip_src); >> - l3_frag->ip_csum = recalc_csum32(l3_frag->ip_csum, >> - frag_ip, reass_ip); >> - l3_frag->ip_src = l3_reass->ip_src; >> + if (!dp_packet_hwol_is_ipv4(frag_0->pkt)) { >> + ovs_be32 reass_ip = >> + get_16aligned_be32(&l3_reass->ip_src); >> + ovs_be32 frag_ip = >> + get_16aligned_be32(&l3_frag->ip_src); >> + >> + l3_frag->ip_csum = >> recalc_csum32(l3_frag->ip_csum, >> + frag_ip, >> reass_ip); >> + reass_ip = >> get_16aligned_be32(&l3_reass->ip_dst); >> + frag_ip = get_16aligned_be32(&l3_frag->ip_dst); >> + l3_frag->ip_csum = >> recalc_csum32(l3_frag->ip_csum, >> + frag_ip, >> reass_ip); >> + } >> - reass_ip = get_16aligned_be32(&l3_reass->ip_dst); >> - frag_ip = get_16aligned_be32(&l3_frag->ip_dst); >> - l3_frag->ip_csum = recalc_csum32(l3_frag->ip_csum, >> - frag_ip, reass_ip); >> + l3_frag->ip_src = l3_reass->ip_src; >> l3_frag->ip_dst = l3_reass->ip_dst; >> } >> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c >> index d1469f6f2..b108cbd6b 100644 >> --- a/lib/netdev-dpdk.c >> +++ b/lib/netdev-dpdk.c >> @@ -72,6 +72,7 @@ >> #include "timeval.h" >> #include "unaligned.h" >> #include "unixctl.h" >> +#include "userspace-tso.h" >> #include "util.h" >> #include "uuid.h" >> @@ -201,6 +202,8 
@@ struct netdev_dpdk_sw_stats { >> uint64_t tx_qos_drops; >> /* Packet drops in ingress policer processing. */ >> uint64_t rx_qos_drops; >> + /* Packet drops in HWOL processing. */ >> + uint64_t tx_invalid_hwol_drops; >> }; >> enum { DPDK_RING_SIZE = 256 }; >> @@ -410,7 +413,8 @@ struct ingress_policer { >> enum dpdk_hw_ol_features { >> NETDEV_RX_CHECKSUM_OFFLOAD = 1 << 0, >> NETDEV_RX_HW_CRC_STRIP = 1 << 1, >> - NETDEV_RX_HW_SCATTER = 1 << 2 >> + NETDEV_RX_HW_SCATTER = 1 << 2, >> + NETDEV_TX_TSO_OFFLOAD = 1 << 3, >> }; >> /* >> @@ -992,6 +996,12 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, >> int n_rxq, int n_txq) >> conf.rxmode.offloads |= DEV_RX_OFFLOAD_KEEP_CRC; >> } >> + if (dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) { >> + conf.txmode.offloads |= DEV_TX_OFFLOAD_TCP_TSO; >> + conf.txmode.offloads |= DEV_TX_OFFLOAD_TCP_CKSUM; >> + conf.txmode.offloads |= DEV_TX_OFFLOAD_IPV4_CKSUM; >> + } >> + >> /* Limit configured rss hash functions to only those supported >> * by the eth device. */ >> conf.rx_adv_conf.rss_conf.rss_hf &= info.flow_type_rss_offloads; >> @@ -1093,6 +1103,9 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev) >> uint32_t rx_chksm_offload_capa = DEV_RX_OFFLOAD_UDP_CKSUM | >> DEV_RX_OFFLOAD_TCP_CKSUM | >> DEV_RX_OFFLOAD_IPV4_CKSUM; >> + uint32_t tx_tso_offload_capa = DEV_TX_OFFLOAD_TCP_TSO | >> + DEV_TX_OFFLOAD_TCP_CKSUM | >> + DEV_TX_OFFLOAD_IPV4_CKSUM; >> rte_eth_dev_info_get(dev->port_id, &info); >> @@ -1119,6 +1132,14 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev) >> dev->hw_ol_features &= ~NETDEV_RX_HW_SCATTER; >> } >> + if (info.tx_offload_capa & tx_tso_offload_capa) { >> + dev->hw_ol_features |= NETDEV_TX_TSO_OFFLOAD; >> + } else { >> + dev->hw_ol_features &= ~NETDEV_TX_TSO_OFFLOAD; >> + VLOG_WARN("Tx TSO offload is not supported on %s port " >> + DPDK_PORT_ID_FMT, netdev_get_name(&dev->up), >> dev->port_id); >> + } >> + >> n_rxq = MIN(info.max_rx_queues, dev->up.n_rxq); >> n_txq = MIN(info.max_tx_queues, dev->up.n_txq); >> @@ -1369,14 +1390,16 @@ netdev_dpdk_vhost_construct(struct netdev >> *netdev) >> goto out; >> } >> - err = rte_vhost_driver_disable_features(dev->vhost_id, >> - 1ULL << VIRTIO_NET_F_HOST_TSO4 >> - | 1ULL << VIRTIO_NET_F_HOST_TSO6 >> - | 1ULL << VIRTIO_NET_F_CSUM); >> - if (err) { >> - VLOG_ERR("rte_vhost_driver_disable_features failed for vhost >> user " >> - "port: %s\n", name); >> - goto out; >> + if (!userspace_tso_enabled()) { >> + err = rte_vhost_driver_disable_features(dev->vhost_id, >> + 1ULL << VIRTIO_NET_F_HOST_TSO4 >> + | 1ULL << VIRTIO_NET_F_HOST_TSO6 >> + | 1ULL << VIRTIO_NET_F_CSUM); >> + if (err) { >> + VLOG_ERR("rte_vhost_driver_disable_features failed for >> vhost user " >> + "port: %s\n", name); >> + goto out; >> + } >> } >> err = rte_vhost_driver_start(dev->vhost_id); >> @@ -1711,6 +1734,11 @@ netdev_dpdk_get_config(const struct netdev >> *netdev, struct smap *args) >> } else { >> smap_add(args, "rx_csum_offload", "false"); >> } >> + if (dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) { >> + smap_add(args, "tx_tso_offload", "true"); >> + } else { >> + smap_add(args, "tx_tso_offload", "false"); >> + } >> smap_add(args, "lsc_interrupt_mode", >> dev->lsc_interrupt_mode ? "true" : "false"); >> } >> @@ -2138,6 +2166,67 @@ netdev_dpdk_rxq_dealloc(struct netdev_rxq *rxq) >> rte_free(rx); >> } >> +/* Prepare the packet for HWOL. >> + * Return True if the packet is OK to continue. 
*/ >> +static bool >> +netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf >> *mbuf) >> +{ >> + struct dp_packet *pkt = CONTAINER_OF(mbuf, struct dp_packet, mbuf); >> + >> + if (mbuf->ol_flags & PKT_TX_L4_MASK) { >> + mbuf->l2_len = (char *)dp_packet_l3(pkt) - (char >> *)dp_packet_eth(pkt); >> + mbuf->l3_len = (char *)dp_packet_l4(pkt) - (char >> *)dp_packet_l3(pkt); >> + mbuf->outer_l2_len = 0; >> + mbuf->outer_l3_len = 0; >> + } >> + >> + if (mbuf->ol_flags & PKT_TX_TCP_SEG) { >> + struct tcp_header *th = dp_packet_l4(pkt); >> + >> + if (!th) { >> + VLOG_WARN_RL(&rl, "%s: TCP Segmentation without L4 header" >> + " pkt len: %"PRIu32"", dev->up.name, >> mbuf->pkt_len); >> + return false; >> + } >> + >> + mbuf->l4_len = TCP_OFFSET(th->tcp_ctl) * 4; >> + mbuf->ol_flags |= PKT_TX_TCP_CKSUM; >> + mbuf->tso_segsz = dev->mtu - mbuf->l3_len - mbuf->l4_len; >> + >> + if (mbuf->ol_flags & PKT_TX_IPV4) { >> + mbuf->ol_flags |= PKT_TX_IP_CKSUM; >> + } >> + } >> + return true; >> +} >> + >> +/* Prepare a batch for HWOL. >> + * Return the number of good packets in the batch. */ >> +static int >> +netdev_dpdk_prep_hwol_batch(struct netdev_dpdk *dev, struct rte_mbuf >> **pkts, >> + int pkt_cnt) >> +{ >> + int i = 0; >> + int cnt = 0; >> + struct rte_mbuf *pkt; >> + >> + /* Prepare and filter bad HWOL packets. */ >> + for (i = 0; i < pkt_cnt; i++) { >> + pkt = pkts[i]; >> + if (!netdev_dpdk_prep_hwol_packet(dev, pkt)) { >> + rte_pktmbuf_free(pkt); >> + continue; >> + } >> + >> + if (OVS_UNLIKELY(i != cnt)) { >> + pkts[cnt] = pkt; >> + } >> + cnt++; >> + } >> + >> + return cnt; >> +} >> + >> /* Tries to transmit 'pkts' to txq 'qid' of device 'dev'. Takes >> ownership of >> * 'pkts', even in case of failure. >> * >> @@ -2147,11 +2236,22 @@ netdev_dpdk_eth_tx_burst(struct netdev_dpdk >> *dev, int qid, >> struct rte_mbuf **pkts, int cnt) >> { >> uint32_t nb_tx = 0; >> + uint16_t nb_tx_prep = cnt; >> + >> + if (userspace_tso_enabled()) { >> + nb_tx_prep = rte_eth_tx_prepare(dev->port_id, qid, pkts, cnt); >> + if (nb_tx_prep != cnt) { >> + VLOG_WARN_RL(&rl, "%s: Output batch contains invalid >> packets. " >> + "Only %u/%u are valid: %s", dev->up.name, >> nb_tx_prep, >> + cnt, rte_strerror(rte_errno)); >> + } >> + } >> - while (nb_tx != cnt) { >> + while (nb_tx != nb_tx_prep) { >> uint32_t ret; >> - ret = rte_eth_tx_burst(dev->port_id, qid, pkts + nb_tx, cnt - >> nb_tx); >> + ret = rte_eth_tx_burst(dev->port_id, qid, pkts + nb_tx, >> + nb_tx_prep - nb_tx); >> if (!ret) { >> break; >> } >> @@ -2437,11 +2537,14 @@ netdev_dpdk_filter_packet_len(struct >> netdev_dpdk *dev, struct rte_mbuf **pkts, >> int cnt = 0; >> struct rte_mbuf *pkt; >> + /* Filter oversized packets, unless they are marked for TSO. */ >> for (i = 0; i < pkt_cnt; i++) { >> pkt = pkts[i]; >> - if (OVS_UNLIKELY(pkt->pkt_len > dev->max_packet_len)) { >> - VLOG_WARN_RL(&rl, "%s: Too big size %" PRIu32 " >> max_packet_len %d", >> - dev->up.name, pkt->pkt_len, >> dev->max_packet_len); >> + if (OVS_UNLIKELY((pkt->pkt_len > dev->max_packet_len) >> + && !(pkt->ol_flags & PKT_TX_TCP_SEG))) { >> + VLOG_WARN_RL(&rl, "%s: Too big size %" PRIu32 " " >> + "max_packet_len %d", dev->up.name, >> pkt->pkt_len, >> + dev->max_packet_len); >> rte_pktmbuf_free(pkt); >> continue; >> } >> @@ -2463,7 +2566,8 @@ netdev_dpdk_vhost_update_tx_counters(struct >> netdev_dpdk *dev, >> { >> int dropped = sw_stats_add->tx_mtu_exceeded_drops + >> sw_stats_add->tx_qos_drops + >> - sw_stats_add->tx_failure_drops; >> + sw_stats_add->tx_failure_drops + >> + sw_stats_add->tx_invalid_hwol_drops; >> struct netdev_stats *stats = &dev->stats; >> int sent = attempted - dropped; >> int i; >> @@ -2482,6 +2586,7 @@ netdev_dpdk_vhost_update_tx_counters(struct >> netdev_dpdk *dev, >> sw_stats->tx_failure_drops += >> sw_stats_add->tx_failure_drops; >> sw_stats->tx_mtu_exceeded_drops += >> sw_stats_add->tx_mtu_exceeded_drops; >> sw_stats->tx_qos_drops += sw_stats_add->tx_qos_drops; >> + sw_stats->tx_invalid_hwol_drops += >> sw_stats_add->tx_invalid_hwol_drops; >> } >> } >> @@ -2513,8 +2618,15 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, >> int qid, >> rte_spinlock_lock(&dev->tx_q[qid].tx_lock); >> } >> + sw_stats_add.tx_invalid_hwol_drops = cnt; >> + if (userspace_tso_enabled()) { >> + cnt = netdev_dpdk_prep_hwol_batch(dev, cur_pkts, cnt); >> + } >> + >> + sw_stats_add.tx_invalid_hwol_drops -= cnt; >> + sw_stats_add.tx_mtu_exceeded_drops = cnt; >> cnt = netdev_dpdk_filter_packet_len(dev, cur_pkts, cnt); >> - sw_stats_add.tx_mtu_exceeded_drops = total_packets - cnt; >> + sw_stats_add.tx_mtu_exceeded_drops -= cnt; >> /* Check if QoS has been configured for the netdev */ >> sw_stats_add.tx_qos_drops = cnt; >> @@ -2562,6 +2674,120 @@ out: >> } >> } >> +static void >> +netdev_dpdk_extbuf_free(void *addr OVS_UNUSED, void *opaque) >> +{ >> + rte_free(opaque); >> +} >> + >> +static struct rte_mbuf * >> +dpdk_pktmbuf_attach_extbuf(struct rte_mbuf *pkt, uint32_t data_len) >> +{ >> + uint32_t total_len = RTE_PKTMBUF_HEADROOM + data_len; >> + struct rte_mbuf_ext_shared_info *shinfo = NULL; >> + uint16_t buf_len; >> + void *buf; >> + >> + if (rte_pktmbuf_tailroom(pkt) >= sizeof *shinfo) { >> + shinfo = rte_pktmbuf_mtod(pkt, struct >> rte_mbuf_ext_shared_info *); >> + } else { >> + total_len += sizeof *shinfo + sizeof(uintptr_t); >> + total_len = RTE_ALIGN_CEIL(total_len, sizeof(uintptr_t)); >> + } >> + >> + if (OVS_UNLIKELY(total_len > UINT16_MAX)) { >> + VLOG_ERR("Can't copy packet: too big %u", total_len); >> + return NULL; >> + } >> + >> + buf_len = total_len; >> + buf = rte_malloc(NULL, buf_len, RTE_CACHE_LINE_SIZE); >> + if (OVS_UNLIKELY(buf == NULL)) { >> + VLOG_ERR("Failed to allocate memory using rte_malloc: %u", >> buf_len); >> + return NULL; >> + } >> + >> + /* Initialize shinfo. 
*/ >> + if (shinfo) { >> + shinfo->free_cb = netdev_dpdk_extbuf_free; >> + shinfo->fcb_opaque = buf; >> + rte_mbuf_ext_refcnt_set(shinfo, 1); >> + } else { >> + shinfo = rte_pktmbuf_ext_shinfo_init_helper(buf, &buf_len, >> + >> netdev_dpdk_extbuf_free, >> + buf); >> + if (OVS_UNLIKELY(shinfo == NULL)) { >> + rte_free(buf); >> + VLOG_ERR("Failed to initialize shared info for mbuf while " >> + "attempting to attach an external buffer."); >> + return NULL; >> + } >> + } >> + >> + rte_pktmbuf_attach_extbuf(pkt, buf, rte_malloc_virt2iova(buf), >> buf_len, >> + shinfo); >> + rte_pktmbuf_reset_headroom(pkt); >> + >> + return pkt; >> +} >> + >> +static struct rte_mbuf * >> +dpdk_pktmbuf_alloc(struct rte_mempool *mp, uint32_t data_len) >> +{ >> + struct rte_mbuf *pkt = rte_pktmbuf_alloc(mp); >> + >> + if (OVS_UNLIKELY(!pkt)) { >> + return NULL; >> + } >> + >> + if (rte_pktmbuf_tailroom(pkt) >= data_len) { >> + return pkt; >> + } >> + >> + if (dpdk_pktmbuf_attach_extbuf(pkt, data_len)) { >> + return pkt; >> + } >> + >> + rte_pktmbuf_free(pkt); >> + >> + return NULL; >> +} >> + >> +static struct dp_packet * >> +dpdk_copy_dp_packet_to_mbuf(struct rte_mempool *mp, struct dp_packet >> *pkt_orig) >> +{ >> + struct rte_mbuf *mbuf_dest; >> + struct dp_packet *pkt_dest; >> + uint32_t pkt_len; >> + >> + pkt_len = dp_packet_size(pkt_orig); >> + mbuf_dest = dpdk_pktmbuf_alloc(mp, pkt_len); >> + if (OVS_UNLIKELY(mbuf_dest == NULL)) { >> + return NULL; >> + } >> + >> + pkt_dest = CONTAINER_OF(mbuf_dest, struct dp_packet, mbuf); >> + memcpy(dp_packet_data(pkt_dest), dp_packet_data(pkt_orig), pkt_len); >> + dp_packet_set_size(pkt_dest, pkt_len); >> + >> + mbuf_dest->tx_offload = pkt_orig->mbuf.tx_offload; >> + mbuf_dest->packet_type = pkt_orig->mbuf.packet_type; >> + mbuf_dest->ol_flags |= (pkt_orig->mbuf.ol_flags & >> + ~(EXT_ATTACHED_MBUF | IND_ATTACHED_MBUF)); >> + >> + memcpy(&pkt_dest->l2_pad_size, &pkt_orig->l2_pad_size, >> + sizeof(struct dp_packet) - offsetof(struct dp_packet, >> l2_pad_size)); >> + >> + if (mbuf_dest->ol_flags & PKT_TX_L4_MASK) { >> + mbuf_dest->l2_len = (char *)dp_packet_l3(pkt_dest) >> + - (char *)dp_packet_eth(pkt_dest); >> + mbuf_dest->l3_len = (char *)dp_packet_l4(pkt_dest) >> + - (char *) dp_packet_l3(pkt_dest); >> + } >> + >> + return pkt_dest; >> +} >> + >> /* Tx function. 
Transmit packets indefinitely */ >> static void >> dpdk_do_tx_copy(struct netdev *netdev, int qid, struct >> dp_packet_batch *batch) >> @@ -2575,7 +2801,7 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, >> struct dp_packet_batch *batch) >> enum { PKT_ARRAY_SIZE = NETDEV_MAX_BURST }; >> #endif >> struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); >> - struct rte_mbuf *pkts[PKT_ARRAY_SIZE]; >> + struct dp_packet *pkts[PKT_ARRAY_SIZE]; >> struct netdev_dpdk_sw_stats *sw_stats = dev->sw_stats; >> uint32_t cnt = batch_cnt; >> uint32_t dropped = 0; >> @@ -2596,34 +2822,30 @@ dpdk_do_tx_copy(struct netdev *netdev, int >> qid, struct dp_packet_batch *batch) >> struct dp_packet *packet = batch->packets[i]; >> uint32_t size = dp_packet_size(packet); >> - if (OVS_UNLIKELY(size > dev->max_packet_len)) { >> - VLOG_WARN_RL(&rl, "Too big size %u max_packet_len %d", >> - size, dev->max_packet_len); >> - >> + if (size > dev->max_packet_len >> + && !(packet->mbuf.ol_flags & PKT_TX_TCP_SEG)) { >> + VLOG_WARN_RL(&rl, "Too big size %u max_packet_len %d", size, >> + dev->max_packet_len); >> mtu_drops++; >> continue; >> } >> - pkts[txcnt] = rte_pktmbuf_alloc(dev->dpdk_mp->mp); >> + pkts[txcnt] = dpdk_copy_dp_packet_to_mbuf(dev->dpdk_mp->mp, >> packet); >> if (OVS_UNLIKELY(!pkts[txcnt])) { >> dropped = cnt - i; >> break; >> } >> - /* We have to do a copy for now */ >> - memcpy(rte_pktmbuf_mtod(pkts[txcnt], void *), >> - dp_packet_data(packet), size); >> - dp_packet_set_size((struct dp_packet *)pkts[txcnt], size); >> - >> txcnt++; >> } >> if (OVS_LIKELY(txcnt)) { >> if (dev->type == DPDK_DEV_VHOST) { >> - __netdev_dpdk_vhost_send(netdev, qid, (struct dp_packet >> **) pkts, >> - txcnt); >> + __netdev_dpdk_vhost_send(netdev, qid, pkts, txcnt); >> } else { >> - tx_failure = netdev_dpdk_eth_tx_burst(dev, qid, pkts, >> txcnt); >> + tx_failure += netdev_dpdk_eth_tx_burst(dev, qid, >> + (struct rte_mbuf >> **)pkts, >> + txcnt); >> } >> } >> @@ -2676,26 +2898,33 @@ netdev_dpdk_send__(struct netdev_dpdk *dev, >> int qid, >> dp_packet_delete_batch(batch, true); >> } else { >> struct netdev_dpdk_sw_stats *sw_stats = dev->sw_stats; >> - int tx_cnt, dropped; >> - int tx_failure, mtu_drops, qos_drops; >> + int dropped; >> + int tx_failure, mtu_drops, qos_drops, hwol_drops; >> int batch_cnt = dp_packet_batch_size(batch); >> struct rte_mbuf **pkts = (struct rte_mbuf **) batch->packets; >> - tx_cnt = netdev_dpdk_filter_packet_len(dev, pkts, batch_cnt); >> - mtu_drops = batch_cnt - tx_cnt; >> - qos_drops = tx_cnt; >> - tx_cnt = netdev_dpdk_qos_run(dev, pkts, tx_cnt, true); >> - qos_drops -= tx_cnt; >> + hwol_drops = batch_cnt; >> + if (userspace_tso_enabled()) { >> + batch_cnt = netdev_dpdk_prep_hwol_batch(dev, pkts, >> batch_cnt); >> + } >> + hwol_drops -= batch_cnt; >> + mtu_drops = batch_cnt; >> + batch_cnt = netdev_dpdk_filter_packet_len(dev, pkts, batch_cnt); >> + mtu_drops -= batch_cnt; >> + qos_drops = batch_cnt; >> + batch_cnt = netdev_dpdk_qos_run(dev, pkts, batch_cnt, true); >> + qos_drops -= batch_cnt; >> - tx_failure = netdev_dpdk_eth_tx_burst(dev, qid, pkts, tx_cnt); >> + tx_failure = netdev_dpdk_eth_tx_burst(dev, qid, pkts, >> batch_cnt); >> - dropped = tx_failure + mtu_drops + qos_drops; >> + dropped = tx_failure + mtu_drops + qos_drops + hwol_drops; >> if (OVS_UNLIKELY(dropped)) { >> rte_spinlock_lock(&dev->stats_lock); >> dev->stats.tx_dropped += dropped; >> sw_stats->tx_failure_drops += tx_failure; >> sw_stats->tx_mtu_exceeded_drops += mtu_drops; >> sw_stats->tx_qos_drops += qos_drops; >> + 
sw_stats->tx_invalid_hwol_drops += hwol_drops; >> rte_spinlock_unlock(&dev->stats_lock); >> } >> } >> @@ -3011,7 +3240,8 @@ netdev_dpdk_get_sw_custom_stats(const struct >> netdev *netdev, >> SW_CSTAT(tx_failure_drops) \ >> SW_CSTAT(tx_mtu_exceeded_drops) \ >> SW_CSTAT(tx_qos_drops) \ >> - SW_CSTAT(rx_qos_drops) >> + SW_CSTAT(rx_qos_drops) \ >> + SW_CSTAT(tx_invalid_hwol_drops) >> #define SW_CSTAT(NAME) + 1 >> custom_stats->size = SW_CSTATS; >> @@ -4874,6 +5104,12 @@ netdev_dpdk_reconfigure(struct netdev *netdev) >> rte_free(dev->tx_q); >> err = dpdk_eth_dev_init(dev); >> + if (dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) { >> + netdev->ol_flags |= NETDEV_TX_OFFLOAD_TCP_TSO; >> + netdev->ol_flags |= NETDEV_TX_OFFLOAD_TCP_CKSUM; >> + netdev->ol_flags |= NETDEV_TX_OFFLOAD_IPV4_CKSUM; >> + } >> + >> dev->tx_q = netdev_dpdk_alloc_txq(netdev->n_txq); >> if (!dev->tx_q) { >> err = ENOMEM; >> @@ -4903,6 +5139,11 @@ dpdk_vhost_reconfigure_helper(struct >> netdev_dpdk *dev) >> dev->tx_q[0].map = 0; >> } >> + if (userspace_tso_enabled()) { >> + dev->hw_ol_features |= NETDEV_TX_TSO_OFFLOAD; >> + VLOG_DBG("%s: TSO enabled on vhost port", >> netdev_get_name(&dev->up)); >> + } >> + >> netdev_dpdk_remap_txqs(dev); >> err = netdev_dpdk_mempool_configure(dev); >> @@ -4975,6 +5216,11 @@ netdev_dpdk_vhost_client_reconfigure(struct >> netdev *netdev) >> vhost_flags |= RTE_VHOST_USER_DEQUEUE_ZERO_COPY; >> } >> + /* Enable External Buffers if TCP Segmentation Offload is >> enabled. */ >> + if (userspace_tso_enabled()) { >> + vhost_flags |= RTE_VHOST_USER_EXTBUF_SUPPORT; >> + } >> + >> err = rte_vhost_driver_register(dev->vhost_id, vhost_flags); >> if (err) { >> VLOG_ERR("vhost-user device setup failure for device %s\n", >> @@ -4999,14 +5245,20 @@ netdev_dpdk_vhost_client_reconfigure(struct >> netdev *netdev) >> goto unlock; >> } >> - err = rte_vhost_driver_disable_features(dev->vhost_id, >> - 1ULL << VIRTIO_NET_F_HOST_TSO4 >> - | 1ULL << VIRTIO_NET_F_HOST_TSO6 >> - | 1ULL << VIRTIO_NET_F_CSUM); >> - if (err) { >> - VLOG_ERR("rte_vhost_driver_disable_features failed for >> vhost user " >> - "client port: %s\n", dev->up.name); >> - goto unlock; >> + if (userspace_tso_enabled()) { >> + netdev->ol_flags |= NETDEV_TX_OFFLOAD_TCP_TSO; >> + netdev->ol_flags |= NETDEV_TX_OFFLOAD_TCP_CKSUM; >> + netdev->ol_flags |= NETDEV_TX_OFFLOAD_IPV4_CKSUM; >> + } else { >> + err = rte_vhost_driver_disable_features(dev->vhost_id, >> + 1ULL << VIRTIO_NET_F_HOST_TSO4 >> + | 1ULL << VIRTIO_NET_F_HOST_TSO6 >> + | 1ULL << VIRTIO_NET_F_CSUM); >> + if (err) { >> + VLOG_ERR("rte_vhost_driver_disable_features failed for " >> + "vhost user client port: %s\n", dev->up.name); >> + goto unlock; >> + } >> } >> err = rte_vhost_driver_start(dev->vhost_id); >> diff --git a/lib/netdev-linux-private.h b/lib/netdev-linux-private.h >> index f08159aa7..9dbc67658 100644 >> --- a/lib/netdev-linux-private.h >> +++ b/lib/netdev-linux-private.h >> @@ -27,6 +27,7 @@ >> #include <stdint.h> >> #include <stdbool.h> >> +#include "dp-packet.h" >> #include "netdev-afxdp.h" >> #include "netdev-afxdp-pool.h" >> #include "netdev-provider.h" >> @@ -37,10 +38,13 @@ >> struct netdev; >> +#define LINUX_RXQ_TSO_MAX_LEN 65536 >> + >> struct netdev_rxq_linux { >> struct netdev_rxq up; >> bool is_tap; >> int fd; >> + char *aux_bufs[NETDEV_MAX_BURST]; /* Batch of preallocated TSO >> buffers. 
*/ >> }; >> int netdev_linux_construct(struct netdev *); >> @@ -92,6 +96,7 @@ struct netdev_linux { >> int tap_fd; >> bool present; /* If the device is present in the >> namespace */ >> uint64_t tx_dropped; /* tap device can drop if the iface >> is down */ >> + uint64_t rx_dropped; /* Packets dropped while recv from >> kernel. */ >> /* LAG information. */ >> bool is_lag_master; /* True if the netdev is a LAG >> master. */ >> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c >> index 41d1e9273..a4a666657 100644 >> --- a/lib/netdev-linux.c >> +++ b/lib/netdev-linux.c >> @@ -29,16 +29,18 @@ >> #include <linux/filter.h> >> #include <linux/gen_stats.h> >> #include <linux/if_ether.h> >> +#include <linux/if_packet.h> >> #include <linux/if_tun.h> >> #include <linux/types.h> >> #include <linux/ethtool.h> >> #include <linux/mii.h> >> #include <linux/rtnetlink.h> >> #include <linux/sockios.h> >> +#include <linux/virtio_net.h> >> #include <sys/ioctl.h> >> #include <sys/socket.h> >> +#include <sys/uio.h> >> #include <sys/utsname.h> >> -#include <netpacket/packet.h> >> #include <net/if.h> >> #include <net/if_arp.h> >> #include <net/route.h> >> @@ -75,6 +77,7 @@ >> #include "timer.h" >> #include "unaligned.h" >> #include "openvswitch/vlog.h" >> +#include "userspace-tso.h" >> #include "util.h" >> VLOG_DEFINE_THIS_MODULE(netdev_linux); >> @@ -237,6 +240,16 @@ enum { >> VALID_DRVINFO = 1 << 6, >> VALID_FEATURES = 1 << 7, >> }; >> + >> +/* Use one for the packet buffer and another for the aux buffer to >> receive >> + * TSO packets. */ >> +#define IOV_STD_SIZE 1 >> +#define IOV_TSO_SIZE 2 >> + >> +enum { >> + IOV_PACKET = 0, >> + IOV_AUXBUF = 1, >> +}; >> >> struct linux_lag_slave { >> uint32_t block_id; >> @@ -501,6 +514,8 @@ static struct vlog_rate_limit rl = >> VLOG_RATE_LIMIT_INIT(5, 20); >> * changes in the device miimon status, so we can use atomic_count. */ >> static atomic_count miimon_cnt = ATOMIC_COUNT_INIT(0); >> +static int netdev_linux_parse_vnet_hdr(struct dp_packet *b); >> +static void netdev_linux_prepend_vnet_hdr(struct dp_packet *b, int mtu); >> static int netdev_linux_do_ethtool(const char *name, struct >> ethtool_cmd *, >> int cmd, const char *cmd_name); >> static int get_flags(const struct netdev *, unsigned int *flags); >> @@ -902,6 +917,13 @@ netdev_linux_common_construct(struct netdev >> *netdev_) >> /* The device could be in the same network namespace or in >> another one. */ >> netnsid_unset(&netdev->netnsid); >> ovs_mutex_init(&netdev->mutex); >> + >> + if (userspace_tso_enabled()) { >> + netdev_->ol_flags |= NETDEV_TX_OFFLOAD_TCP_TSO; >> + netdev_->ol_flags |= NETDEV_TX_OFFLOAD_TCP_CKSUM; >> + netdev_->ol_flags |= NETDEV_TX_OFFLOAD_IPV4_CKSUM; >> + } >> + >> return 0; >> } >> @@ -961,6 +983,10 @@ netdev_linux_construct_tap(struct netdev *netdev_) >> /* Create tap device. */ >> get_flags(&netdev->up, &netdev->ifi_flags); >> ifr.ifr_flags = IFF_TAP | IFF_NO_PI; >> + if (userspace_tso_enabled()) { >> + ifr.ifr_flags |= IFF_VNET_HDR; >> + } >> + >> ovs_strzcpy(ifr.ifr_name, name, sizeof ifr.ifr_name); >> if (ioctl(netdev->tap_fd, TUNSETIFF, &ifr) == -1) { >> VLOG_WARN("%s: creating tap device failed: %s", name, >> @@ -1024,6 +1050,15 @@ static struct netdev_rxq * >> netdev_linux_rxq_alloc(void) >> { >> struct netdev_rxq_linux *rx = xzalloc(sizeof *rx); >> + if (userspace_tso_enabled()) { >> + int i; >> + >> + /* Allocate auxiliary buffers to receive TSO packets. 
*/ >> + for (i = 0; i < NETDEV_MAX_BURST; i++) { >> + rx->aux_bufs[i] = xmalloc(LINUX_RXQ_TSO_MAX_LEN); >> + } >> + } >> + >> return &rx->up; >> } >> @@ -1069,6 +1104,15 @@ netdev_linux_rxq_construct(struct netdev_rxq >> *rxq_) >> goto error; >> } >> + if (userspace_tso_enabled() >> + && setsockopt(rx->fd, SOL_PACKET, PACKET_VNET_HDR, &val, >> + sizeof val)) { >> + error = errno; >> + VLOG_ERR("%s: failed to enable vnet hdr in txq raw >> socket: %s", >> + netdev_get_name(netdev_), ovs_strerror(errno)); >> + goto error; >> + } >> + >> /* Set non-blocking mode. */ >> error = set_nonblocking(rx->fd); >> if (error) { >> @@ -1119,10 +1163,15 @@ static void >> netdev_linux_rxq_destruct(struct netdev_rxq *rxq_) >> { >> struct netdev_rxq_linux *rx = netdev_rxq_linux_cast(rxq_); >> + int i; >> if (!rx->is_tap) { >> close(rx->fd); >> } >> + >> + for (i = 0; i < NETDEV_MAX_BURST; i++) { >> + free(rx->aux_bufs[i]); >> + } >> } >> static void >> @@ -1159,12 +1208,14 @@ auxdata_has_vlan_tci(const struct >> tpacket_auxdata *aux) >> * It also used recvmmsg to reduce multiple syscalls overhead; >> */ >> static int >> -netdev_linux_batch_rxq_recv_sock(int fd, int mtu, >> +netdev_linux_batch_rxq_recv_sock(struct netdev_rxq_linux *rx, int mtu, >> struct dp_packet_batch *batch) >> { >> - size_t size; >> + int iovlen; >> + size_t std_len; >> ssize_t retval; >> - struct iovec iovs[NETDEV_MAX_BURST]; >> + int virtio_net_hdr_size; >> + struct iovec iovs[NETDEV_MAX_BURST][IOV_TSO_SIZE]; >> struct cmsghdr *cmsg; >> union { >> struct cmsghdr cmsg; >> @@ -1174,41 +1225,87 @@ netdev_linux_batch_rxq_recv_sock(int fd, int mtu, >> struct dp_packet *buffers[NETDEV_MAX_BURST]; >> int i; >> + if (userspace_tso_enabled()) { >> + /* Use the buffer from the allocated packet below to receive MTU >> + * sized packets and an aux_buf for extra TSO data. */ >> + iovlen = IOV_TSO_SIZE; >> + virtio_net_hdr_size = sizeof(struct virtio_net_hdr); >> + } else { >> + /* Use only the buffer from the allocated packet. 
*/ >> + iovlen = IOV_STD_SIZE; >> + virtio_net_hdr_size = 0; >> + } >> + >> + std_len = VLAN_ETH_HEADER_LEN + mtu + virtio_net_hdr_size; >> for (i = 0; i < NETDEV_MAX_BURST; i++) { >> - buffers[i] = dp_packet_new_with_headroom(VLAN_ETH_HEADER_LEN >> + mtu, >> - DP_NETDEV_HEADROOM); >> - /* Reserve headroom for a single VLAN tag */ >> - dp_packet_reserve(buffers[i], VLAN_HEADER_LEN); >> - size = dp_packet_tailroom(buffers[i]); >> - iovs[i].iov_base = dp_packet_data(buffers[i]); >> - iovs[i].iov_len = size; >> + buffers[i] = dp_packet_new_with_headroom(std_len, >> DP_NETDEV_HEADROOM); >> + iovs[i][IOV_PACKET].iov_base = dp_packet_data(buffers[i]); >> + iovs[i][IOV_PACKET].iov_len = std_len; >> + iovs[i][IOV_AUXBUF].iov_base = rx->aux_bufs[i]; >> + iovs[i][IOV_AUXBUF].iov_len = LINUX_RXQ_TSO_MAX_LEN; >> mmsgs[i].msg_hdr.msg_name = NULL; >> mmsgs[i].msg_hdr.msg_namelen = 0; >> - mmsgs[i].msg_hdr.msg_iov = &iovs[i]; >> - mmsgs[i].msg_hdr.msg_iovlen = 1; >> + mmsgs[i].msg_hdr.msg_iov = iovs[i]; >> + mmsgs[i].msg_hdr.msg_iovlen = iovlen; >> mmsgs[i].msg_hdr.msg_control = &cmsg_buffers[i]; >> mmsgs[i].msg_hdr.msg_controllen = sizeof cmsg_buffers[i]; >> mmsgs[i].msg_hdr.msg_flags = 0; >> } >> do { >> - retval = recvmmsg(fd, mmsgs, NETDEV_MAX_BURST, MSG_TRUNC, NULL); >> + retval = recvmmsg(rx->fd, mmsgs, NETDEV_MAX_BURST, MSG_TRUNC, >> NULL); >> } while (retval < 0 && errno == EINTR); >> if (retval < 0) { >> - /* Save -errno to retval temporarily */ >> - retval = -errno; >> - i = 0; >> - goto free_buffers; >> + retval = errno; >> + for (i = 0; i < NETDEV_MAX_BURST; i++) { >> + dp_packet_delete(buffers[i]); >> + } >> + >> + return retval; >> } >> for (i = 0; i < retval; i++) { >> if (mmsgs[i].msg_len < ETH_HEADER_LEN) { >> - break; >> + struct netdev *netdev_ = netdev_rxq_get_netdev(&rx->up); >> + struct netdev_linux *netdev = netdev_linux_cast(netdev_); >> + >> + dp_packet_delete(buffers[i]); >> + netdev->rx_dropped += 1; >> + VLOG_WARN_RL(&rl, "%s: Dropped packet: less than ether >> hdr size", >> + netdev_get_name(netdev_)); >> + continue; >> + } >> + >> + if (mmsgs[i].msg_len > std_len) { >> + /* Build a single linear TSO packet by expanding the >> current packet >> + * to append the data received in the aux_buf. */ >> + size_t extra_len = mmsgs[i].msg_len - std_len; >> + >> + dp_packet_set_size(buffers[i], dp_packet_size(buffers[i]) >> + + std_len); >> + dp_packet_prealloc_tailroom(buffers[i], extra_len); >> + memcpy(dp_packet_tail(buffers[i]), rx->aux_bufs[i], >> extra_len); >> + dp_packet_set_size(buffers[i], dp_packet_size(buffers[i]) >> + + extra_len); >> + } else { >> + dp_packet_set_size(buffers[i], dp_packet_size(buffers[i]) >> + + mmsgs[i].msg_len); >> } >> - dp_packet_set_size(buffers[i], >> - dp_packet_size(buffers[i]) + >> mmsgs[i].msg_len); >> + if (virtio_net_hdr_size && >> netdev_linux_parse_vnet_hdr(buffers[i])) { >> + struct netdev *netdev_ = netdev_rxq_get_netdev(&rx->up); >> + struct netdev_linux *netdev = netdev_linux_cast(netdev_); >> + >> + /* Unexpected error situation: the virtio header is not >> present >> + * or corrupted. Drop the packet but continue in case >> next ones >> + * are correct. 
*/ >> + dp_packet_delete(buffers[i]); >> + netdev->rx_dropped += 1; >> + VLOG_WARN_RL(&rl, "%s: Dropped packet: Invalid virtio net >> header", >> + netdev_get_name(netdev_)); >> + continue; >> + } >> for (cmsg = CMSG_FIRSTHDR(&mmsgs[i].msg_hdr); cmsg; >> cmsg = CMSG_NXTHDR(&mmsgs[i].msg_hdr, cmsg)) { >> @@ -1238,22 +1335,11 @@ netdev_linux_batch_rxq_recv_sock(int fd, int mtu, >> dp_packet_batch_add(batch, buffers[i]); >> } >> -free_buffers: >> - /* Free unused buffers, including buffers whose size is less than >> - * ETH_HEADER_LEN. >> - * >> - * Note: i has been set correctly by the above for loop, so don't >> - * try to re-initialize it. >> - */ >> + /* Delete unused buffers. */ >> for (; i < NETDEV_MAX_BURST; i++) { >> dp_packet_delete(buffers[i]); >> } >> - /* netdev_linux_rxq_recv needs it to return 0 or positive errno */ >> - if (retval < 0) { >> - return -retval; >> - } >> - >> return 0; >> } >> @@ -1263,20 +1349,40 @@ free_buffers: >> * packets are added into *batch. The return value is 0 or errno. >> */ >> static int >> -netdev_linux_batch_rxq_recv_tap(int fd, int mtu, struct >> dp_packet_batch *batch) >> +netdev_linux_batch_rxq_recv_tap(struct netdev_rxq_linux *rx, int mtu, >> + struct dp_packet_batch *batch) >> { >> struct dp_packet *buffer; >> + int virtio_net_hdr_size; >> ssize_t retval; >> - size_t size; >> + size_t std_len; >> + int iovlen; >> int i; >> + if (userspace_tso_enabled()) { >> + /* Use the buffer from the allocated packet below to receive MTU >> + * sized packets and an aux_buf for extra TSO data. */ >> + iovlen = IOV_TSO_SIZE; >> + virtio_net_hdr_size = sizeof(struct virtio_net_hdr); >> + } else { >> + /* Use only the buffer from the allocated packet. */ >> + iovlen = IOV_STD_SIZE; >> + virtio_net_hdr_size = 0; >> + } >> + >> + std_len = VLAN_ETH_HEADER_LEN + mtu + virtio_net_hdr_size; >> for (i = 0; i < NETDEV_MAX_BURST; i++) { >> + struct iovec iov[IOV_TSO_SIZE]; >> + >> /* Assume Ethernet port. No need to set packet_type. */ >> - buffer = dp_packet_new_with_headroom(VLAN_ETH_HEADER_LEN + mtu, >> - DP_NETDEV_HEADROOM); >> - size = dp_packet_tailroom(buffer); >> + buffer = dp_packet_new_with_headroom(std_len, >> DP_NETDEV_HEADROOM); >> + iov[IOV_PACKET].iov_base = dp_packet_data(buffer); >> + iov[IOV_PACKET].iov_len = std_len; >> + iov[IOV_AUXBUF].iov_base = rx->aux_bufs[i]; >> + iov[IOV_AUXBUF].iov_len = LINUX_RXQ_TSO_MAX_LEN; >> + >> do { >> - retval = read(fd, dp_packet_data(buffer), size); >> + retval = readv(rx->fd, iov, iovlen); >> } while (retval < 0 && errno == EINTR); >> if (retval < 0) { >> @@ -1284,7 +1390,33 @@ netdev_linux_batch_rxq_recv_tap(int fd, int >> mtu, struct dp_packet_batch *batch) >> break; >> } >> - dp_packet_set_size(buffer, dp_packet_size(buffer) + retval); >> + if (retval > std_len) { >> + /* Build a single linear TSO packet by expanding the >> current packet >> + * to append the data received in the aux_buf. 
*/ >> + size_t extra_len = retval - std_len; >> + >> + dp_packet_set_size(buffer, dp_packet_size(buffer) + >> std_len); >> + dp_packet_prealloc_tailroom(buffer, extra_len); >> + memcpy(dp_packet_tail(buffer), rx->aux_bufs[i], extra_len); >> + dp_packet_set_size(buffer, dp_packet_size(buffer) + >> extra_len); >> + } else { >> + dp_packet_set_size(buffer, dp_packet_size(buffer) + retval); >> + } >> + >> + if (virtio_net_hdr_size && >> netdev_linux_parse_vnet_hdr(buffer)) { >> + struct netdev *netdev_ = netdev_rxq_get_netdev(&rx->up); >> + struct netdev_linux *netdev = netdev_linux_cast(netdev_); >> + >> + /* Unexpected error situation: the virtio header is not >> present >> + * or corrupted. Drop the packet but continue in case >> next ones >> + * are correct. */ >> + dp_packet_delete(buffer); >> + netdev->rx_dropped += 1; >> + VLOG_WARN_RL(&rl, "%s: Dropped packet: Invalid virtio net >> header", >> + netdev_get_name(netdev_)); >> + continue; >> + } >> + >> dp_packet_batch_add(batch, buffer); >> } >> @@ -1310,8 +1442,8 @@ netdev_linux_rxq_recv(struct netdev_rxq *rxq_, >> struct dp_packet_batch *batch, >> dp_packet_batch_init(batch); >> retval = (rx->is_tap >> - ? netdev_linux_batch_rxq_recv_tap(rx->fd, mtu, batch) >> - : netdev_linux_batch_rxq_recv_sock(rx->fd, mtu, batch)); >> + ? netdev_linux_batch_rxq_recv_tap(rx, mtu, batch) >> + : netdev_linux_batch_rxq_recv_sock(rx, mtu, batch)); >> if (retval) { >> if (retval != EAGAIN && retval != EMSGSIZE) { >> @@ -1353,7 +1485,7 @@ netdev_linux_rxq_drain(struct netdev_rxq *rxq_) >> } >> static int >> -netdev_linux_sock_batch_send(int sock, int ifindex, >> +netdev_linux_sock_batch_send(int sock, int ifindex, bool tso, int mtu, >> struct dp_packet_batch *batch) >> { >> const size_t size = dp_packet_batch_size(batch); >> @@ -1367,6 +1499,10 @@ netdev_linux_sock_batch_send(int sock, int >> ifindex, >> struct dp_packet *packet; >> DP_PACKET_BATCH_FOR_EACH (i, packet, batch) { >> + if (tso) { >> + netdev_linux_prepend_vnet_hdr(packet, mtu); >> + } >> + >> iov[i].iov_base = dp_packet_data(packet); >> iov[i].iov_len = dp_packet_size(packet); >> mmsg[i].msg_hdr = (struct msghdr) { .msg_name = &sll, >> @@ -1399,7 +1535,7 @@ netdev_linux_sock_batch_send(int sock, int ifindex, >> * on other interface types because we attach a socket filter to the rx >> * socket. */ >> static int >> -netdev_linux_tap_batch_send(struct netdev *netdev_, >> +netdev_linux_tap_batch_send(struct netdev *netdev_, bool tso, int mtu, >> struct dp_packet_batch *batch) >> { >> struct netdev_linux *netdev = netdev_linux_cast(netdev_); >> @@ -1416,10 +1552,15 @@ netdev_linux_tap_batch_send(struct netdev >> *netdev_, >> } >> DP_PACKET_BATCH_FOR_EACH (i, packet, batch) { >> - size_t size = dp_packet_size(packet); >> + size_t size; >> ssize_t retval; >> int error; >> + if (tso) { >> + netdev_linux_prepend_vnet_hdr(packet, mtu); >> + } >> + >> + size = dp_packet_size(packet); >> do { >> retval = write(netdev->tap_fd, dp_packet_data(packet), >> size); >> error = retval < 0 ? 
errno : 0; >> @@ -1454,9 +1595,15 @@ netdev_linux_send(struct netdev *netdev_, int >> qid OVS_UNUSED, >> struct dp_packet_batch *batch, >> bool concurrent_txq OVS_UNUSED) >> { >> + bool tso = userspace_tso_enabled(); >> + int mtu = ETH_PAYLOAD_MAX; >> int error = 0; >> int sock = 0; >> + if (tso) { >> + netdev_linux_get_mtu__(netdev_linux_cast(netdev_), &mtu); >> + } >> + >> if (!is_tap_netdev(netdev_)) { >> if >> (netdev_linux_netnsid_is_remote(netdev_linux_cast(netdev_))) { >> error = EOPNOTSUPP; >> @@ -1475,9 +1622,9 @@ netdev_linux_send(struct netdev *netdev_, int >> qid OVS_UNUSED, >> goto free_batch; >> } >> - error = netdev_linux_sock_batch_send(sock, ifindex, batch); >> + error = netdev_linux_sock_batch_send(sock, ifindex, tso, mtu, >> batch); >> } else { >> - error = netdev_linux_tap_batch_send(netdev_, batch); >> + error = netdev_linux_tap_batch_send(netdev_, tso, mtu, batch); >> } >> if (error) { >> if (error == ENOBUFS) { >> @@ -2045,6 +2192,7 @@ netdev_tap_get_stats(const struct netdev >> *netdev_, struct netdev_stats *stats) >> stats->collisions += dev_stats.collisions; >> } >> stats->tx_dropped += netdev->tx_dropped; >> + stats->rx_dropped += netdev->rx_dropped; >> ovs_mutex_unlock(&netdev->mutex); >> return error; >> @@ -6223,6 +6371,17 @@ af_packet_sock(void) >> if (error) { >> close(sock); >> sock = -error; >> + } else if (userspace_tso_enabled()) { >> + int val = 1; >> + error = setsockopt(sock, SOL_PACKET, PACKET_VNET_HDR, >> &val, >> + sizeof val); >> + if (error) { >> + error = errno; >> + VLOG_ERR("failed to enable vnet hdr in raw >> socket: %s", >> + ovs_strerror(errno)); >> + close(sock); >> + sock = -error; >> + } >> } >> } else { >> sock = -errno; >> @@ -6234,3 +6393,136 @@ af_packet_sock(void) >> return sock; >> } >> + >> +static int >> +netdev_linux_parse_l2(struct dp_packet *b, uint16_t *l4proto) >> +{ >> + struct eth_header *eth_hdr; >> + ovs_be16 eth_type; >> + int l2_len; >> + >> + eth_hdr = dp_packet_at(b, 0, ETH_HEADER_LEN); >> + if (!eth_hdr) { >> + return -EINVAL; >> + } >> + >> + l2_len = ETH_HEADER_LEN; >> + eth_type = eth_hdr->eth_type; >> + if (eth_type_vlan(eth_type)) { >> + struct vlan_header *vlan = dp_packet_at(b, l2_len, >> VLAN_HEADER_LEN); >> + >> + if (!vlan) { >> + return -EINVAL; >> + } >> + >> + eth_type = vlan->vlan_next_type; >> + l2_len += VLAN_HEADER_LEN; >> + } >> + >> + if (eth_type == htons(ETH_TYPE_IP)) { >> + struct ip_header *ip_hdr = dp_packet_at(b, l2_len, >> IP_HEADER_LEN); >> + >> + if (!ip_hdr) { >> + return -EINVAL; >> + } >> + >> + *l4proto = ip_hdr->ip_proto; >> + dp_packet_hwol_set_tx_ipv4(b); >> + } else if (eth_type == htons(ETH_TYPE_IPV6)) { >> + struct ovs_16aligned_ip6_hdr *nh6; >> + >> + nh6 = dp_packet_at(b, l2_len, IPV6_HEADER_LEN); >> + if (!nh6) { >> + return -EINVAL; >> + } >> + >> + *l4proto = nh6->ip6_ctlun.ip6_un1.ip6_un1_nxt; >> + dp_packet_hwol_set_tx_ipv6(b); >> + } >> + >> + return 0; >> +} >> + >> +static int >> +netdev_linux_parse_vnet_hdr(struct dp_packet *b) >> +{ >> + struct virtio_net_hdr *vnet = dp_packet_pull(b, sizeof *vnet); >> + uint16_t l4proto = 0; >> + >> + if (OVS_UNLIKELY(!vnet)) { >> + return -EINVAL; >> + } >> + >> + if (vnet->flags == 0 && vnet->gso_type == VIRTIO_NET_HDR_GSO_NONE) { >> + return 0; >> + } >> + >> + if (netdev_linux_parse_l2(b, &l4proto)) { >> + return -EINVAL; >> + } >> + >> + if (vnet->flags == VIRTIO_NET_HDR_F_NEEDS_CSUM) { >> + if (l4proto == IPPROTO_TCP) { >> + dp_packet_hwol_set_csum_tcp(b); >> + } else if (l4proto == IPPROTO_UDP) { >> + 
>> @@ -6234,3 +6393,136 @@ af_packet_sock(void)
>>
>>      return sock;
>>  }
>> +
>> +static int
>> +netdev_linux_parse_l2(struct dp_packet *b, uint16_t *l4proto)
>> +{
>> +    struct eth_header *eth_hdr;
>> +    ovs_be16 eth_type;
>> +    int l2_len;
>> +
>> +    eth_hdr = dp_packet_at(b, 0, ETH_HEADER_LEN);
>> +    if (!eth_hdr) {
>> +        return -EINVAL;
>> +    }
>> +
>> +    l2_len = ETH_HEADER_LEN;
>> +    eth_type = eth_hdr->eth_type;
>> +    if (eth_type_vlan(eth_type)) {
>> +        struct vlan_header *vlan = dp_packet_at(b, l2_len, VLAN_HEADER_LEN);
>> +
>> +        if (!vlan) {
>> +            return -EINVAL;
>> +        }
>> +
>> +        eth_type = vlan->vlan_next_type;
>> +        l2_len += VLAN_HEADER_LEN;
>> +    }
>> +
>> +    if (eth_type == htons(ETH_TYPE_IP)) {
>> +        struct ip_header *ip_hdr = dp_packet_at(b, l2_len, IP_HEADER_LEN);
>> +
>> +        if (!ip_hdr) {
>> +            return -EINVAL;
>> +        }
>> +
>> +        *l4proto = ip_hdr->ip_proto;
>> +        dp_packet_hwol_set_tx_ipv4(b);
>> +    } else if (eth_type == htons(ETH_TYPE_IPV6)) {
>> +        struct ovs_16aligned_ip6_hdr *nh6;
>> +
>> +        nh6 = dp_packet_at(b, l2_len, IPV6_HEADER_LEN);
>> +        if (!nh6) {
>> +            return -EINVAL;
>> +        }
>> +
>> +        *l4proto = nh6->ip6_ctlun.ip6_un1.ip6_un1_nxt;
>> +        dp_packet_hwol_set_tx_ipv6(b);
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static int
>> +netdev_linux_parse_vnet_hdr(struct dp_packet *b)
>> +{
>> +    struct virtio_net_hdr *vnet = dp_packet_pull(b, sizeof *vnet);
>> +    uint16_t l4proto = 0;
>> +
>> +    if (OVS_UNLIKELY(!vnet)) {
>> +        return -EINVAL;
>> +    }
>> +
>> +    if (vnet->flags == 0 && vnet->gso_type == VIRTIO_NET_HDR_GSO_NONE) {
>> +        return 0;
>> +    }
>> +
>> +    if (netdev_linux_parse_l2(b, &l4proto)) {
>> +        return -EINVAL;
>> +    }
>> +
>> +    if (vnet->flags == VIRTIO_NET_HDR_F_NEEDS_CSUM) {
>> +        if (l4proto == IPPROTO_TCP) {
>> +            dp_packet_hwol_set_csum_tcp(b);
>> +        } else if (l4proto == IPPROTO_UDP) {
>> +            dp_packet_hwol_set_csum_udp(b);
>> +        } else if (l4proto == IPPROTO_SCTP) {
>> +            dp_packet_hwol_set_csum_sctp(b);
>> +        }
>> +    }
>> +
>> +    if (l4proto && vnet->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
>> +        uint8_t allowed_mask = VIRTIO_NET_HDR_GSO_TCPV4
>> +                               | VIRTIO_NET_HDR_GSO_TCPV6
>> +                               | VIRTIO_NET_HDR_GSO_UDP;
>> +        uint8_t type = vnet->gso_type & allowed_mask;
>> +
>> +        if (type == VIRTIO_NET_HDR_GSO_TCPV4
>> +            || type == VIRTIO_NET_HDR_GSO_TCPV6) {
>> +            dp_packet_hwol_set_tcp_seg(b);
>> +        }
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static void
>> +netdev_linux_prepend_vnet_hdr(struct dp_packet *b, int mtu)
>> +{
>> +    struct virtio_net_hdr *vnet = dp_packet_push_zeros(b, sizeof *vnet);
>> +
>> +    if (dp_packet_hwol_is_tso(b)) {
>> +        uint16_t hdr_len = ((char *)dp_packet_l4(b) - (char *)dp_packet_eth(b))
>> +                           + TCP_HEADER_LEN;
>> +
>> +        vnet->hdr_len = (OVS_FORCE __virtio16)hdr_len;
>> +        vnet->gso_size = (OVS_FORCE __virtio16)(mtu - hdr_len);
>> +        if (dp_packet_hwol_is_ipv4(b)) {
>> +            vnet->gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
>> +        } else {
>> +            vnet->gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
>> +        }
>> +
>> +    } else {
>> +        vnet->flags = VIRTIO_NET_HDR_GSO_NONE;
>> +    }
>> +
>> +    if (dp_packet_hwol_l4_mask(b)) {
>> +        vnet->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
>> +        vnet->csum_start = (OVS_FORCE __virtio16)((char *)dp_packet_l4(b)
>> +                                                  - (char *)dp_packet_eth(b));
>> +
>> +        if (dp_packet_hwol_l4_is_tcp(b)) {
>> +            vnet->csum_offset = (OVS_FORCE __virtio16) __builtin_offsetof(
>> +                                    struct tcp_header, tcp_csum);
>> +        } else if (dp_packet_hwol_l4_is_udp(b)) {
>> +            vnet->csum_offset = (OVS_FORCE __virtio16) __builtin_offsetof(
>> +                                    struct udp_header, udp_csum);
>> +        } else if (dp_packet_hwol_l4_is_sctp(b)) {
>> +            vnet->csum_offset = (OVS_FORCE __virtio16) __builtin_offsetof(
>> +                                    struct sctp_header, sctp_csum);
>> +        } else {
>> +            VLOG_WARN_RL(&rl, "Unsupported L4 protocol");
>> +        }
>> +    }
>> +}
>> diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h
>> index f109c4e66..22f4cde33 100644
>> --- a/lib/netdev-provider.h
>> +++ b/lib/netdev-provider.h
>> @@ -37,6 +37,12 @@ extern "C" {
>>  struct netdev_tnl_build_header_params;
>>  #define NETDEV_NUMA_UNSPEC OVS_NUMA_UNSPEC
>>
>> +enum netdev_ol_flags {
>> +    NETDEV_TX_OFFLOAD_IPV4_CKSUM = 1 << 0,
>> +    NETDEV_TX_OFFLOAD_TCP_CKSUM = 1 << 1,
>> +    NETDEV_TX_OFFLOAD_TCP_TSO = 1 << 2,
>> +};
>> +
>>  /* A network device (e.g. an Ethernet device).
>>   *
>>   * Network device implementations may read these members but should not modify
>> @@ -51,6 +57,9 @@ struct netdev {
>>       * opening this device, and therefore got assigned to the "system" class */
>>      bool auto_classified;
>>
>> +    /* A bitmask of the offloading features enabled by the netdev. */
>> +    uint64_t ol_flags;
>> +
>>      /* If this is 'true', the user explicitly specified an MTU for this
>>       * netdev.  Otherwise, Open vSwitch is allowed to override it. */
>>      bool mtu_user_config;
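The enum above is meant to be filled in by each netdev implementation, and
netdev_send_prepare_packet() in the next hunk checks these bits per packet.
As a hedged sketch of how a provider might advertise them at construction
time (example_construct and device_supports_tso are invented names, not
functions from the patch):

    /* Hypothetical provider construct() excerpt: advertise only the TX
     * offloads the underlying device can actually perform; packets that
     * need more are then dropped in netdev_send_prepare_batch(). */
    static int
    example_construct(struct netdev *netdev)
    {
        if (userspace_tso_enabled() && device_supports_tso(netdev)) {
            netdev->ol_flags |= NETDEV_TX_OFFLOAD_TCP_TSO
                                | NETDEV_TX_OFFLOAD_TCP_CKSUM
                                | NETDEV_TX_OFFLOAD_IPV4_CKSUM;
        }
        return 0;
    }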
>> diff --git a/lib/netdev.c b/lib/netdev.c
>> index 405c98c68..f95b19af4 100644
>> --- a/lib/netdev.c
>> +++ b/lib/netdev.c
>> @@ -66,6 +66,8 @@ COVERAGE_DEFINE(netdev_received);
>>  COVERAGE_DEFINE(netdev_sent);
>>  COVERAGE_DEFINE(netdev_add_router);
>>  COVERAGE_DEFINE(netdev_get_stats);
>> +COVERAGE_DEFINE(netdev_send_prepare_drops);
>> +COVERAGE_DEFINE(netdev_push_header_drops);
>>
>>  struct netdev_saved_flags {
>>      struct netdev *netdev;
>> @@ -782,6 +784,54 @@ netdev_get_pt_mode(const struct netdev *netdev)
>>              : NETDEV_PT_LEGACY_L2);
>>  }
>>
>> +/* Check if a 'packet' is compatible with 'netdev_flags'.
>> + * If a packet is incompatible, return 'false' with the 'errormsg'
>> + * pointing to a reason. */
>> +static bool
>> +netdev_send_prepare_packet(const uint64_t netdev_flags,
>> +                           struct dp_packet *packet, char **errormsg)
>> +{
>> +    if (dp_packet_hwol_is_tso(packet)
>> +        && !(netdev_flags & NETDEV_TX_OFFLOAD_TCP_TSO)) {
>> +        /* Fall back to GSO in software. */
>> +        VLOG_ERR_BUF(errormsg, "No TSO support");
>> +        return false;
>> +    }
>> +
>> +    if (dp_packet_hwol_l4_mask(packet)
>> +        && !(netdev_flags & NETDEV_TX_OFFLOAD_TCP_CKSUM)) {
>> +        /* Fall back to L4 csum in software. */
>> +        VLOG_ERR_BUF(errormsg, "No L4 checksum support");
>> +        return false;
>> +    }
>> +
>> +    return true;
>> +}
>> +
>> +/* Check if each packet in 'batch' is compatible with 'netdev' features,
>> + * otherwise either fall back to software implementation or drop it. */
>> +static void
>> +netdev_send_prepare_batch(const struct netdev *netdev,
>> +                          struct dp_packet_batch *batch)
>> +{
>> +    struct dp_packet *packet;
>> +    size_t i, size = dp_packet_batch_size(batch);
>> +
>> +    DP_PACKET_BATCH_REFILL_FOR_EACH (i, size, packet, batch) {
>> +        char *errormsg = NULL;
>> +
>> +        if (netdev_send_prepare_packet(netdev->ol_flags, packet, &errormsg)) {
>> +            dp_packet_batch_refill(batch, packet, i);
>> +        } else {
>> +            dp_packet_delete(packet);
>> +            COVERAGE_INC(netdev_send_prepare_drops);
>> +            VLOG_WARN_RL(&rl, "%s: Packet dropped: %s",
>> +                         netdev_get_name(netdev), errormsg);
>> +            free(errormsg);
>> +        }
>> +    }
>> +}
>> +
>>  /* Sends 'batch' on 'netdev'.  Returns 0 if successful (for every packet),
>>   * otherwise a positive errno value.  Returns EAGAIN without blocking if
>>   * at least one of the packets cannot be queued immediately.  Returns EMSGSIZE
>> @@ -811,8 +861,14 @@ int
>>  netdev_send(struct netdev *netdev, int qid, struct dp_packet_batch *batch,
>>              bool concurrent_txq)
>>  {
>> -    int error = netdev->netdev_class->send(netdev, qid, batch,
>> -                                           concurrent_txq);
>> +    int error;
>> +
>> +    netdev_send_prepare_batch(netdev, batch);
>> +    if (OVS_UNLIKELY(dp_packet_batch_is_empty(batch))) {
>> +        return 0;
>> +    }
>> +
>> +    error = netdev->netdev_class->send(netdev, qid, batch, concurrent_txq);
>>      if (!error) {
>>          COVERAGE_INC(netdev_sent);
>>      }
>> @@ -878,9 +934,21 @@ netdev_push_header(const struct netdev *netdev,
>>                     const struct ovs_action_push_tnl *data)
>>  {
>>      struct dp_packet *packet;
>> -    DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {
>> -        netdev->netdev_class->push_header(netdev, packet, data);
>> -        pkt_metadata_init(&packet->md, data->out_port);
>> +    size_t i, size = dp_packet_batch_size(batch);
>> +
>> +    DP_PACKET_BATCH_REFILL_FOR_EACH (i, size, packet, batch) {
>> +        if (OVS_UNLIKELY(dp_packet_hwol_is_tso(packet)
>> +                         || dp_packet_hwol_l4_mask(packet))) {
>> +            COVERAGE_INC(netdev_push_header_drops);
>> +            dp_packet_delete(packet);
>> +            VLOG_WARN_RL(&rl, "%s: Tunneling packets with HW offload flags is "
>> +                         "not supported: packet dropped",
>> +                         netdev_get_name(netdev));
>> +        } else {
>> +            netdev->netdev_class->push_header(netdev, packet, data);
>> +            pkt_metadata_init(&packet->md, data->out_port);
>> +            dp_packet_batch_refill(batch, packet, i);
>> +        }
>>      }
>>
>>      return 0;
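For anyone unfamiliar with the DP_PACKET_BATCH_REFILL_FOR_EACH idiom used in
both functions above: it rewinds the batch's count and re-adds only the
surviving packets, i.e. an in-place filter with no second buffer. In plain C
terms it boils down to something like the sketch below, where keep() and
drop() are placeholder names for the per-packet check and the
delete-plus-counter path:

    /* Compact an array in place, keeping only elements that pass keep(). */
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (keep(pkts[i])) {
            pkts[kept++] = pkts[i];   /* Survivor slides forward. */
        } else {
            drop(pkts[i]);            /* e.g. dp_packet_delete() + counter. */
        }
    }
    n = kept;                         /* Batch now holds only survivors. */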
>> diff --git a/lib/userspace-tso.c b/lib/userspace-tso.c
>> new file mode 100644
>> index 000000000..6a4a0149b
>> --- /dev/null
>> +++ b/lib/userspace-tso.c
>> @@ -0,0 +1,53 @@
>> +/*
>> + * Copyright (c) 2020 Red Hat, Inc.
>> + *
>> + * Licensed under the Apache License, Version 2.0 (the "License");
>> + * you may not use this file except in compliance with the License.
>> + * You may obtain a copy of the License at:
>> + *
>> + *     http://www.apache.org/licenses/LICENSE-2.0
>> + *
>> + * Unless required by applicable law or agreed to in writing, software
>> + * distributed under the License is distributed on an "AS IS" BASIS,
>> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>> + * See the License for the specific language governing permissions and
>> + * limitations under the License.
>> + */
>> +
>> +#include <config.h>
>> +
>> +#include "smap.h"
>> +#include "ovs-thread.h"
>> +#include "openvswitch/vlog.h"
>> +#include "dpdk.h"
>> +#include "userspace-tso.h"
>> +#include "vswitch-idl.h"
>> +
>> +VLOG_DEFINE_THIS_MODULE(userspace_tso);
>> +
>> +static bool userspace_tso = false;
>> +
>> +void
>> +userspace_tso_init(const struct smap *ovs_other_config)
>> +{
>> +    if (smap_get_bool(ovs_other_config, "userspace-tso-enable", false)) {
>> +        static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
>> +
>> +        if (ovsthread_once_start(&once)) {
>> +#ifdef DPDK_NETDEV
>> +            VLOG_INFO("Userspace TCP Segmentation Offloading support enabled");
>> +            userspace_tso = true;
>> +#else
>> +            VLOG_WARN("Userspace TCP Segmentation Offloading can not be "
>> +                      "enabled since OVS is built without DPDK support.");
>> +#endif
>> +            ovsthread_once_done(&once);
>> +        }
>> +    }
>> +}
>> +
>> +bool
>> +userspace_tso_enabled(void)
>> +{
>> +    return userspace_tso;
>> +}
>> diff --git a/lib/userspace-tso.h b/lib/userspace-tso.h
>> new file mode 100644
>> index 000000000..0758274c0
>> --- /dev/null
>> +++ b/lib/userspace-tso.h
>> @@ -0,0 +1,23 @@
>> +/*
>> + * Copyright (c) 2020 Red Hat Inc.
>> + *
>> + * Licensed under the Apache License, Version 2.0 (the "License");
>> + * you may not use this file except in compliance with the License.
>> + * You may obtain a copy of the License at:
>> + *
>> + *     http://www.apache.org/licenses/LICENSE-2.0
>> + *
>> + * Unless required by applicable law or agreed to in writing, software
>> + * distributed under the License is distributed on an "AS IS" BASIS,
>> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>> + * See the License for the specific language governing permissions and
>> + * limitations under the License.
>> + */
>> +
>> +#ifndef USERSPACE_TSO_H
>> +#define USERSPACE_TSO_H 1
>> +
>> +void userspace_tso_init(const struct smap *ovs_other_config);
>> +bool userspace_tso_enabled(void);
>> +
>> +#endif /* userspace-tso.h */
>> diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
>> index 86c7b10a9..e591c26a6 100644
>> --- a/vswitchd/bridge.c
>> +++ b/vswitchd/bridge.c
>> @@ -65,6 +65,7 @@
>>  #include "system-stats.h"
>>  #include "timeval.h"
>>  #include "tnl-ports.h"
>> +#include "userspace-tso.h"
>>  #include "util.h"
>>  #include "unixctl.h"
>>  #include "lib/vswitch-idl.h"
>> @@ -3285,6 +3286,7 @@ bridge_run(void)
>>      if (cfg) {
>>          netdev_set_flow_api_enabled(&cfg->other_config);
>>          dpdk_init(&cfg->other_config);
>> +        userspace_tso_init(&cfg->other_config);
>>      }
>>
>>      /* Initialize the ofproto library.  This only needs to run once, but
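For anyone wanting to try this once it lands, the knob is an ordinary
Open_vSwitch table setting, e.g.:

    $ ovs-vsctl set Open_vSwitch . other_config:userspace-tso-enable=true

followed by a restart of ovs-vswitchd, since the value is only read at
initialization time (the ovsthread_once guard above makes that explicit).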
>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>> index c43cb1aa4..3ddaaefda 100644
>> --- a/vswitchd/vswitch.xml
>> +++ b/vswitchd/vswitch.xml
>> @@ -690,6 +690,26 @@
>>            once in few hours or a day or a week.
>>          </p>
>>        </column>
>> +      <column name="other_config" key="userspace-tso-enable"
>> +              type='{"type": "boolean"}'>
>> +        <p>
>> +          Set this value to <code>true</code> to enable userspace support for
>> +          TCP Segmentation Offloading (TSO). When it is enabled, the interfaces
>> +          can provide an oversized TCP segment to the datapath and the datapath
>> +          will offload the TCP segmentation and checksum calculation to the
>> +          interfaces when necessary.
>> +        </p>
>> +        <p>
>> +          The default value is <code>false</code>. Changing this value requires
>> +          restarting the daemon.
>> +        </p>
>> +        <p>
>> +          The feature only works if Open vSwitch is built with DPDK support.
>> +        </p>
>> +        <p>
>> +          The feature is considered experimental.
>> +        </p>
>> +      </column>
>>      </group>
>>
>>      <group title="Status">
>>        <column name="next_cfg">

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
