On 7/11/2018 7:23 PM, Tiago Lam wrote:
Overview
========
This patchset introduces support for multi-segment mbufs to OvS-DPDK.
Multi-segment mbufs are typically used when the size of an mbuf is
insufficient to contain the entirety of a packet's data. Instead, the
data is split across numerous mbufs, each carrying a portion, or
'segment', of the packet data. mbufs are chained via their 'next'
attribute (an mbuf pointer).


Thanks to all for the work on this series. I've pushed this to dpdk_merge and it will be part of the pull request this week.

Ian
Use Cases
=========
i.  Handling oversized (guest-originated) frames, which are marked
     for hardware accelration/offload (TSO, for example).

     Packets which originate from a non-DPDK source may be marked for
     offload; as such, they may be larger than the permitted ingress
     interface's MTU, and may be stored in an oversized dp-packet. In
     order to transmit such packets over a DPDK port, their contents
     must be copied to a DPDK mbuf (via dpdk_do_tx_copy). However, in
     its current implementation, that function only copies data into
     a single mbuf; if the space available in the mbuf is exhausted,
     but not all packet data has been copied, then it is lost.
     Similarly, when cloning a DPDK mbuf, it must be considered
     whether that mbuf contains multiple segments. Both issues are
     resolved within this patchset.

ii. Handling jumbo frames.

     While OvS already supports jumbo frames, it does so by increasing
     mbuf size, such that the entirety of a jumbo frame may be handled
     in a single mbuf. This is certainly the preferred, and most
     performant approach (and remains the default).

Enabling multi-segment mbufs
============================
Multi-segment and single-segment mbufs are mutually exclusive, and the
user must decide on which approach to adopt on init. The introduction
of a new OVSDB field, 'dpdk-multi-seg-mbufs', facilitates this.

This is a global boolean value, which determines how jumbo frames are
represented across all DPDK ports. In the absence of a user-supplied
value, 'dpdk-multi-seg-mbufs' defaults to false, i.e. multi-segment
mbufs must be explicitly enabled / single-segment mbufs remain the
default.

Setting the field is identical to setting existing DPDK-specific OVSDB
fields:

     ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
     ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x10
     ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem=4096,0
==> ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true

Performance notes (based on v8)
=================
In order to test for regressions in performance, tests were run on top
of master 88125d6 and v8 of this patchset, both with the multi-segment
mbufs option enabled and disabled.

VSperf was used to run the phy2phy_cont and pvp_cont tests with varying
packet sizes of 64B, 1500B and 7000B, on a 10Gbps interface.

Test | Size | Master | Multi-seg disabled | Multi-seg enabled
-------------------------------------------------------------
p2p  |  64  | ~22.7  |      ~22.65        |       ~18.3
p2p  | 1500 |  ~1.6  |        ~1.6        |        ~1.6
p2p  | 7000 | ~0.36  |       ~0.36        |       ~0.36
pvp  |  64  |  ~6.7  |        ~6.7        |        ~6.3
pvp  | 1500 |  ~1.6  |        ~1.6        |        ~1.6
pvp  | 7000 | ~0.36  |       ~0.36        |       ~0.36

Packet size is in bytes, while all packet rates are reported in mpps
(aggregated).

No noticeable regression has been observed (certainly everything is
within the ± 5% margin of existing performance), aside from the 64B
packet size case when multi-segment mbuf is enabled. This is
expected, however, because of how Tx vectoriszed functions are
incompatible with multi-segment mbufs on some PMDs. The PMD under
use during these tests was the i40e (on a Intel X710 NIC), which
indeed doesn't support vectorized Tx functions with multi-segment
mbufs.

---
v5: - Rebase on master 030958a0cc ("conntrack: Fix conn_update_state_alg use
       after free.");
     - Address Eelco's comments:
       - Remove dpdk_mp_sweep() call in netdev_dpdk_mempool_configure(), a
         leftover from rebase. Only call should be in dpdk_mp_get();
       - Remove NEWS line added by mistake during rebase (about adding
         experimental vhost zero copy support).
     - Address Ian's comments:
       - Drop patch 01 from previous series entirely;
       - Patch (now) 01/14 adds a new call to dpdk_buf_size() inside
         dpdk_mp_create() to get the correct "mbuf_size" to be used;
       - Patch (now) 11/14 modifies dpdk_mp_create() to check if multi-segment
         mbufs is enabled, in which case it calculates the new "mbuf_size" to be
         used;
       - In free_dpdk_buf() and dpdk_buf_alloc(), don't lock and unlock
         conditionally.
     - Add "per-port-memory=true" to test "Multi-segment mbufs Tx" as the 
current
       DPDK set up in system-dpdk-testsuite can't handle higher MTU sizes using
       the shared mempool model (runs out of memory);
     - Add new examples for when multi-segment mbufs are enabled in
       topics/dpdk/memory.rst, and a reference to topics/dpdk/jumbo-frames.rst
       (now patch 11/14).

v4: - Rebase on master b22fb00 ("ovn-nbctl: Clarify error messages in qos-add
       command."):
       - A new patch (now 01/15) has been introduced to differentiate between
         MTU and mbuf size when creating the mempool. This is because as part of
         the new support for both per port and shared mempools, mempools were
         being reused based on the mbuf size, and for multi-segment mbufs the
         mbuf size can end up being the same for all mbufs;
       - A couple of other patches (now 02/15 and 12/15) have been modified as
         part of the rebase, but only to adapt to the code changes to "Support
         both shared and per port mempools.", no functionality should have been
         changed.

v3:
     - Address Eelco's comment:
         - Fix the ovs_assert() introduced in v2 in __packet_set_data(), which
           wasn't correctly asserting that the passed 'v' was smaller than the
           first mbuf's buf_len.

v2:
     - Rebase on master e7cd8cf ("db-ctl-base: Don't die in cmd_destroy() on
       error.");
     - Address Eelco's comments:
         - In free_dpdk_buf(), use mbuf's struct address in dp_packet instead of
           casting;
         - Remove unneeded variable in dp_packet_set_size(), pointing to the
           first mbuf in the chain;
         - Assert in dp_packet_set_size() to enforce that "pkt_len == v" is
           always true for DPBUF_DPDK packets;
         - Assert in __packet_set_data() to enforce that data_off never goes
           beyond the first mbuf in the chain.

v1:
     - v8 should have been sent as v1 really, as that's usually the approach
       followed in OvS. That clearly didn't happen, so restarting the series
       now. This also helps making it clear it is no longer an RFC series;
     - Rebase on master e461481 ("ofp-meter: Fix ofp_print_meter_flags()
       output.");
     - Address Eelco's comments:
         - Change dp_packet_size() so that for `DPBUF_DPDK` packets their
           `pkt_len` and `data_len` can't be set to values bigger than the
           available space. Also fix assigment to `data_len` which was
           incorrectly being set to just`pkt_len`;
         - Improve `nonpmd_mp_mutex` comment with a better explanation as to
           why the mutex is needed;
         - Fix dp_packet_clear() to not call rte_pktmbuf_reset() for non
           `DPBUF_DPDK` packets;
         - Dropped `if` clause in dp_packet_l4_size(), keep just the `else`;
         - Change dp_packet_clone_with_headroom() to use rte_pktmbuf_read() for
           copying `DPBUF_DPDK` packets' data. Also, change it to return
           appropriate and meaningful errors, instead of just "0" or "1";
         - Change dpdk_prep_tx_buf() name's to dpdk_clone_dp_packet_to_mbuf(),
           and reuse dp_packet_mbuf_write() instead of manual copy;
         - Add note vswitchd/vswitch.xml to make it clear the enabling of
           dpdk-multi-seg-mbufs requires a restart;
     - Change dpdk_mp_create() to increase # mbufs used under the multi-segment
       mbufs approach;
     - Increase test coverage by adding "end-to-end" tests that verify that
       "dpdk-multi-seg-mbufs" is disabled by default and that a packet is
       successfully sent out;
     - Some helper funcs such as dp_packet_tail() and dp_packet_end() were moved
       back to be common between `DPBUF_DPDK` and non `DPBUF_DPDK` packets, to
       minimise changes;
     - Add some extra notes to "Performance notes" in jumbo-frames.rst doc,
       after further testing;
     - Structure changes:
       - Drop patch 07/13 which is now unneeded;
       - Two more patches added for extra test coverage. This is what accounts
         for the increase in size (+1 patch) in the series.

v8 (non-RFC):
     - Rebase on master 88125d6 ("rhel: remove ovs-sim man page from
       temporary directory (also for RHEL)");
     - Address Ciara's and Ilya's comments:
         - Drop the dp_packet_mbuf_tail() function and use only the
           already existing dp_packet_tail();
         - Fix bug in dpdk_do_tx_copy() where the error return from
           dpdk_prep_tx_buf() was being wrongly checked;
         - Use dpdk_buf_alloc() and free_dpdk_buf() instead of
           rte_pktmbuf_alloc() and rte_pktmbuf_free();
         - Fix some other code style and duplication issues pointed out.
     - Refactored dp_packet_shift(), dp_packet_resize__() and
       dp_packet_put*() functions to work within the bounds of existing
       mbufs only;
     - Fix dp_packet_clear() which wasn't correctly clearing / freeing
       other mbufs in the chain for chains with more than a single mbuf;
     - dp_packet layer functions (such as dp_packet_l3()) now check if
       the header is within the first mbuf, when using mbufs;
     - Move patch 08/13 to before patch 04/13, since dp_packet_set_size()
       was refactored to use free_dpdk_buf();
     - Fix wrong rte_memcpy() when performing dp_packet_clone() which was
       leading to memory corruption;
     - Modified the added tests to account for some of the above changes;
     - Run performance tests, compiling results and adding them to the
       cover letter;
     - Add a multi-seg mbufs explanation to the jumbo-frames.rst doc,
       together with a "Performance notes" sub-section reflecting the
       findings mentioned above in the cover letter.
v7: - Rebase on master 5e720da ("erspan: fix invalid erspan version.");
      - Address Ilya comments;
         - Fix non-DPDK build;
         - Serialise the access of non pmds to allocation and free of mbufs by
           using a newly introduced mutex.
      - Add a new set of tests that integrates with the recently added DPDK
        testsuite. These focus on allocating dp_packets, with a single or
        multiple mbufs, from an instantiated mempool and performing several
        operations on those, verifying if the data at the end matches what's
        expected;
      - Fix bugs found by the new tests:
         - dp_packet_shift() wasn't taking into account shift lefts;
         - dp_packet_resize__() was misusing and miscalculating the tailrooms
           and headrooms, ending up calculating the wrong number of segments
           that needed allocation;
         - An mbuf's end was being miscalculated both in dp_packet_tail,
           dp_packet_mbuf_tail() and dp_packet_end();
         - dp_packet_set_size() was not updating the number of chained segments
           'nb_segs';
      - Add support for multi-seg mbufs in dp_packet_clear().

v6:  - Rebase on master 7c0cb29 ("conntrack-tcp: Handle tcp session
        reuse.");
      - Further improve dp_packet_put_uninit() and dp_packet_shift() to
        support multi-seg mbufs;
      - Add support for multi-seg mbufs in dp_packet_l4_size() and
        improve other helper funcs, such as dp_packet_tail() and dp_
        packet_tailroom().
      - Add support for multi-seg mbufs in dp_packet_put(), dp_packet_
        put_zeros(), as well as dp_packet_resize__() - allocating new
        mbufs and linking them together;
      Restructured patchset:
      - Squash patch 5 into patch 6, since they were both related to
        copying data while handling multi-seg mbufs;
      - Split patch 4 into two separate patches - one that introduces the
        changes in helper functions to deal with multi-seg mbufs and
        two others for the shift() and put_uninit() functionality;
      - Move patch 4 to before patch 3, so that ihelper functions come
        before functionality improvement that rely on those helpers.

v5: - Rebased on master e5e22dc ("datapath-windows: Prevent ct-counters
       from getting redundantly incremented");
     - Sugesh's comments have been addressed:
       - Changed dp_packet_set_data() and dp_packet_set_size() logic to
         make them independent of each other;
       - Dropped patch 3 now that dp_packet_set_data() and dp_packet_set_
         size() are independent;
       - dp_packet_clone_with_headroom() now has split functions for
         handling DPDK sourced packets and non-DPDK packets;
     - Modified various functions in dp-packet.h to account for multi-seg
       mbufs - dp_packet_put_uninit(), dp_packet_tail(), dp_packet_tail()
       and dp_packet_at();
     - Added support for shifting packet data in multi-seg mbufs, using
       dp_packet_shift();
     - Fixed some minor inconsistencies.

     Note that some of the changes in v5 have been contributed by Mark
     Kavanagh as well.

v4: - restructure patchset
     - account for 128B ARM cacheline when sizing mbufs

Mark Kavanagh (4):
   netdev-dpdk: fix mbuf sizing
   dp-packet: Init specific mbuf fields.
   netdev-dpdk: copy large packet to multi-seg. mbufs
   netdev-dpdk: support multi-segment jumbo frames

Michael Qiu (1):
   dp-packet: copy data from multi-seg. DPDK mbuf

Tiago Lam (9):
   dp-packet: Fix allocated size on DPDK init.
   netdev-dpdk: Serialise non-pmds mbufs' alloc/free.
   dp-packet: Fix data_len handling multi-seg mbufs.
   dp-packet: Handle multi-seg mbufs in helper funcs.
   dp-packet: Handle multi-seg mubfs in shift() func.
   dp-packet: Handle multi-seg mbufs in resize__().
   dpdk-tests: Add uni-tests for multi-seg mbufs.
   dpdk-tests: Accept other configs in OVS_DPDK_START
   dpdk-tests: End-to-end tests for multi-seg mbufs.

  Documentation/topics/dpdk/jumbo-frames.rst |  52 +++
  Documentation/topics/dpdk/memory.rst       |  36 ++
  NEWS                                       |   1 +
  lib/dp-packet.c                            | 221 +++++++++++-
  lib/dp-packet.h                            | 214 ++++++++++--
  lib/dpdk.c                                 |   8 +
  lib/netdev-dpdk.c                          | 244 +++++++++++---
  lib/netdev-dpdk.h                          |   2 +
  tests/automake.mk                          |  10 +-
  tests/dpdk-packet-mbufs.at                 |   7 +
  tests/system-dpdk-macros.at                |   6 +-
  tests/system-dpdk-testsuite.at             |   1 +
  tests/system-dpdk.at                       |  65 ++++
  tests/test-dpdk-mbufs.c                    | 518 +++++++++++++++++++++++++++++
  vswitchd/vswitch.xml                       |  22 ++
  15 files changed, 1328 insertions(+), 79 deletions(-)
  create mode 100644 tests/dpdk-packet-mbufs.at
  create mode 100644 tests/test-dpdk-mbufs.c


_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to