Re: [ovs-dev] [PATCH v5 00/14] Support multi-segment mbufs

2018-07-13 Thread Ian Stokes

On 7/11/2018 7:23 PM, Tiago Lam wrote:

Overview
========
This patchset introduces support for multi-segment mbufs to OvS-DPDK.
Multi-segment mbufs are typically used when the size of an mbuf is
insufficient to contain the entirety of a packet's data. Instead, the
data is split across numerous mbufs, each carrying a portion, or
'segment', of the packet data. mbufs are chained via their 'next'
attribute (an mbuf pointer).



Thanks to all for the work on this series. I've pushed this to 
dpdk_merge and it will be part of the pull request this week.


Ian

[ovs-dev] [PATCH v5 00/14] Support multi-segment mbufs

2018-07-11 Thread Tiago Lam
Overview
========
This patchset introduces support for multi-segment mbufs to OvS-DPDK.
Multi-segment mbufs are typically used when the size of an mbuf is
insufficient to contain the entirety of a packet's data. Instead, the
data is split across numerous mbufs, each carrying a portion, or
'segment', of the packet data. mbufs are chained via their 'next'
attribute (an mbuf pointer).
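
For reference, here is a minimal sketch (not code from this series; the
helper name is illustrative) of how a consumer walks such a chain using
the standard DPDK rte_mbuf fields:

    #include <rte_mbuf.h>

    /* Count the bytes held across every segment of a (possibly
     * multi-segment) mbuf chain.  For a well-formed chain the result
     * equals the head mbuf's pkt_len. */
    static uint32_t
    count_chain_bytes(const struct rte_mbuf *m)
    {
        uint32_t total = 0;

        for (; m != NULL; m = m->next) {   /* 'next' links the segments. */
            total += m->data_len;          /* Bytes in this segment only. */
        }
        return total;
    }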

Use Cases
=========
i.  Handling oversized (guest-originated) frames, which are marked
    for hardware acceleration/offload (TSO, for example).

    Packets which originate from a non-DPDK source may be marked for
    offload; as such, they may be larger than the ingress interface's
    permitted MTU, and may be stored in an oversized dp-packet. In
    order to transmit such packets over a DPDK port, their contents
    must be copied to a DPDK mbuf (via dpdk_do_tx_copy). However, in
    its current implementation, that function only copies data into a
    single mbuf; if the space available in the mbuf is exhausted
    before all packet data has been copied, the remainder is lost.
    Similarly, when cloning a DPDK mbuf, it must be considered whether
    that mbuf contains multiple segments. Both issues are resolved
    within this patchset (see the sketch after this list).

ii. Handling jumbo frames.

    While OvS already supports jumbo frames, it does so by increasing
    mbuf size, such that the entirety of a jumbo frame may be handled
    in a single mbuf. This is the preferred, and most performant,
    approach (and it remains the default).
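
To illustrate use case (i) above, here is a hedged sketch of copying a
flat buffer into a chain of mbufs. The helper name 'copy_to_mbuf_chain'
and the mempool argument are hypothetical, not taken from the patchset
(which handles this inside dpdk_do_tx_copy); only standard DPDK mbuf
APIs are used:

    #include <string.h>
    #include <rte_mbuf.h>

    /* Copy 'size' bytes from 'data' into a chain of mbufs allocated
     * from 'mp', linking segments via 'next'.  Error handling is kept
     * minimal for brevity. */
    static struct rte_mbuf *
    copy_to_mbuf_chain(struct rte_mempool *mp, const char *data,
                       uint32_t size)
    {
        struct rte_mbuf *head = NULL, *prev = NULL;

        while (size > 0) {
            struct rte_mbuf *seg = rte_pktmbuf_alloc(mp);
            if (!seg) {
                rte_pktmbuf_free(head);   /* Frees the whole chain. */
                return NULL;
            }

            uint16_t room = rte_pktmbuf_tailroom(seg);
            uint16_t len = size < room ? size : room;

            memcpy(rte_pktmbuf_mtod(seg, void *), data, len);
            seg->data_len = len;

            if (!head) {
                head = seg;
            } else {
                prev->next = seg;         /* Chain the new segment. */
                head->nb_segs++;
            }
            head->pkt_len += len;         /* Totals live in the head mbuf. */

            prev = seg;
            data += len;
            size -= len;
        }
        return head;
    }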

Enabling multi-segment mbufs
============================
Multi-segment and single-segment mbufs are mutually exclusive, and the
user must decide which approach to adopt at init time. A new OVSDB
field, 'dpdk-multi-seg-mbufs', facilitates this.

This is a global boolean value, which determines how jumbo frames are
represented across all DPDK ports. In the absence of a user-supplied
value, 'dpdk-multi-seg-mbufs' defaults to false, i.e. multi-segment
mbufs must be explicitly enabled / single-segment mbufs remain the
default.

Setting the field is identical to setting existing DPDK-specific OVSDB
fields:

    ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
    ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x10
    ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem=4096,0
==> ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true
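
Presumably the flag is read like the other DPDK other_config knobs
above; a minimal sketch, assuming OVS's smap_get_bool() helper (the
function and variable names here are hypothetical, not taken from the
patch):

    #include "smap.h"

    static bool dpdk_multi_seg_mbufs = false;

    static void
    read_multi_seg_config(const struct smap *ovs_other_config)
    {
        /* Defaults to false when the key is absent, i.e. single-segment
         * mbufs remain the default. */
        dpdk_multi_seg_mbufs = smap_get_bool(ovs_other_config,
                                             "dpdk-multi-seg-mbufs",
                                             false);
    }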

Performance notes (based on v8)
===============================
In order to test for regressions in performance, tests were run on top
of master 88125d6 and v8 of this patchset, both with the multi-segment
mbufs option enabled and disabled.

VSperf was used to run the phy2phy_cont and pvp_cont tests with varying
packet sizes of 64B, 1500B and 7000B, on a 10Gbps interface.

Test | Size | Master | Multi-seg disabled | Multi-seg enabled
-----+------+--------+--------------------+------------------
p2p  |   64 |  ~22.7 |             ~22.65 |             ~18.3
p2p  | 1500 |   ~1.6 |               ~1.6 |              ~1.6
p2p  | 7000 |  ~0.36 |              ~0.36 |             ~0.36
pvp  |   64 |   ~6.7 |               ~6.7 |              ~6.3
pvp  | 1500 |   ~1.6 |               ~1.6 |              ~1.6
pvp  | 7000 |  ~0.36 |              ~0.36 |             ~0.36

Packet size is in bytes, while all packet rates are reported in mpps
(aggregated).

No noticeable regression has been observed (certainly everything is
within the ±5% margin of existing performance), aside from the 64B
packet size case when multi-segment mbufs are enabled. This is
expected, however, because vectorized Tx functions are incompatible
with multi-segment mbufs on some PMDs. The PMD used during these
tests was i40e (on an Intel X710 NIC), which indeed doesn't support
vectorized Tx functions with multi-segment mbufs.

---
v5: - Rebase on master 030958a0cc ("conntrack: Fix conn_update_state_alg
      use after free.");
    - Address Eelco's comments:
      - Remove dpdk_mp_sweep() call in netdev_dpdk_mempool_configure(), a
        leftover from rebase. The only call should be in dpdk_mp_get();
      - Remove a NEWS line added by mistake during rebase (about adding
        experimental vhost zero-copy support).
    - Address Ian's comments:
      - Drop patch 01 from the previous series entirely;
      - Patch (now) 01/14 adds a new call to dpdk_buf_size() inside
        dpdk_mp_create() to get the correct "mbuf_size" to be used;
      - Patch (now) 11/14 modifies dpdk_mp_create() to check if
        multi-segment mbufs are enabled, in which case it calculates the
        new "mbuf_size" to be used;
      - In free_dpdk_buf() and dpdk_buf_alloc(), don't lock and unlock
        conditionally.
    - Add "per-port-memory=true" to the "Multi-segment mbufs Tx" test, as
      the current DPDK setup in system-dpdk-testsuite can't handle higher
      MTU sizes using the shared mempool model (it runs out of memory);
    - Add new examples for when multi-segment mbufs are enabled in
      topics/dpdk/memory.rst, and a reference to