Re: [ovs-dev] [PATCH v5 00/14] Support multi-segment mbufs
On 7/11/2018 7:23 PM, Tiago Lam wrote:
> Overview
> ========
> This patchset introduces support for multi-segment mbufs to OvS-DPDK.
> Multi-segment mbufs are typically used when the size of an mbuf is
> insufficient to contain the entirety of a packet's data. Instead, the
> data is split across numerous mbufs, each carrying a portion, or
> 'segment', of the packet data. mbufs are chained via their 'next'
> attribute (an mbuf pointer).

Thanks to all for the work on this series. I've pushed this to dpdk_merge
and it will be part of the pull request this week.

Ian
[ovs-dev] [PATCH v5 00/14] Support multi-segment mbufs
Overview
========
This patchset introduces support for multi-segment mbufs to OvS-DPDK.
Multi-segment mbufs are typically used when the size of an mbuf is
insufficient to contain the entirety of a packet's data. Instead, the data
is split across numerous mbufs, each carrying a portion, or 'segment', of
the packet data. mbufs are chained via their 'next' attribute (an mbuf
pointer). (A short illustrative sketch of walking such a chain is included
after the performance notes below.)

Use Cases
=========
i. Handling oversized (guest-originated) frames, which are marked for
   hardware acceleration/offload (TSO, for example).

   Packets which originate from a non-DPDK source may be marked for
   offload; as such, they may be larger than the permitted ingress
   interface's MTU, and may be stored in an oversized dp-packet. In order
   to transmit such packets over a DPDK port, their contents must be
   copied to a DPDK mbuf (via dpdk_do_tx_copy). However, in its current
   implementation, that function only copies data into a single mbuf; if
   the space available in the mbuf is exhausted, but not all packet data
   has been copied, then it is lost. Similarly, when cloning a DPDK mbuf,
   it must be considered whether that mbuf contains multiple segments.
   Both issues are resolved within this patchset.

ii. Handling jumbo frames.

   While OvS already supports jumbo frames, it does so by increasing mbuf
   size, such that the entirety of a jumbo frame may be handled in a
   single mbuf. This is certainly the preferred, and most performant,
   approach (and remains the default). With this series, a jumbo frame
   may alternatively be represented as a chain of standard-sized mbufs.

Enabling multi-segment mbufs
============================
Multi-segment and single-segment mbufs are mutually exclusive, and the
user must decide which approach to adopt on init. The introduction of a
new OVSDB field, 'dpdk-multi-seg-mbufs', facilitates this. This is a
global boolean value, which determines how jumbo frames are represented
across all DPDK ports. In the absence of a user-supplied value,
'dpdk-multi-seg-mbufs' defaults to false, i.e. multi-segment mbufs must be
explicitly enabled, and single-segment mbufs remain the default. (A sketch
of how such a flag can be read from other_config also follows the
performance notes below.)

Setting the field is identical to setting existing DPDK-specific OVSDB
fields:

    ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
    ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x10
    ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem=4096,0
==> ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true

Performance notes (based on v8)
===============================
In order to test for regressions in performance, tests were run on top of
master 88125d6 and v8 of this patchset, both with the multi-segment mbufs
option enabled and disabled. VSperf was used to run the phy2phy_cont and
pvp_cont tests with varying packet sizes of 64B, 1500B and 7000B, on a
10Gbps interface.

    Test | Size (B) | Master | Multi-seg disabled | Multi-seg enabled
    -----+----------+--------+--------------------+------------------
    p2p  |       64 |  ~22.7 |             ~22.65 |             ~18.3
    p2p  |     1500 |   ~1.6 |               ~1.6 |              ~1.6
    p2p  |     7000 |  ~0.36 |              ~0.36 |             ~0.36
    pvp  |       64 |   ~6.7 |               ~6.7 |              ~6.3
    pvp  |     1500 |   ~1.6 |               ~1.6 |              ~1.6
    pvp  |     7000 |  ~0.36 |              ~0.36 |             ~0.36

Packet size is in bytes, while all packet rates are reported in Mpps
(aggregated). No noticeable regression has been observed (everything is
within the ±5% margin of existing performance), aside from the 64B packet
size case when multi-segment mbufs are enabled. This is expected, however,
because Tx vectorized functions are incompatible with multi-segment mbufs
on some PMDs. The PMD in use during these tests was the i40e (on an Intel
X710 NIC), which indeed doesn't support vectorized Tx functions with
multi-segment mbufs.
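To make the chaining described in the overview concrete, below is a minimal
illustrative sketch (not code from this series) of how a multi-segment mbuf
chain is walked via the 'next' pointer to gather a packet's data into a flat
buffer; dpdk_do_tx_copy() needs to handle the inverse scatter, from a flat
buffer into a chain of mbufs. The helper name copy_out_mbuf_chain() is
hypothetical.

    #include <string.h>
    #include <rte_mbuf.h>

    /* Gather the full packet out of a (possibly multi-segment) mbuf chain
     * into 'buf'.  Returns the number of bytes copied, or -1 if 'buf' is
     * too small to hold the whole packet. */
    static int
    copy_out_mbuf_chain(const struct rte_mbuf *m, char *buf, size_t buf_len)
    {
        size_t copied = 0;

        if (rte_pktmbuf_pkt_len(m) > buf_len) {
            return -1;
        }

        /* Each segment holds 'data_len' bytes of the packet's payload;
         * the segments are linked through the 'next' pointer. */
        for (; m != NULL; m = m->next) {
            memcpy(buf + copied, rte_pktmbuf_mtod(m, const char *),
                   rte_pktmbuf_data_len(m));
            copied += rte_pktmbuf_data_len(m);
        }

        return (int) copied;
    }

DPDK itself also provides rte_pktmbuf_read() for this kind of gather copy;
the sketch above only spells out the traversal that the series has to account
for wherever a single-mbuf assumption used to hold.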
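For the configuration knob, the following is a hedged sketch (again, not the
patch's actual code) of how a global boolean such as 'dpdk-multi-seg-mbufs'
is typically read from the Open_vSwitch table's other_config column during
DPDK initialisation, using OvS's existing smap_get_bool() helper. The
function names dpdk_read_multi_seg_config() and
dpdk_multi_segment_mbufs_enabled() are hypothetical.

    #include <stdbool.h>
    #include "smap.h"           /* OvS lib/smap.h: smap_get_bool(). */

    /* Defaults to single-segment mbufs when the key is absent. */
    static bool multi_seg_mbufs_enabled = false;

    static void
    dpdk_read_multi_seg_config(const struct smap *ovs_other_config)
    {
        multi_seg_mbufs_enabled
            = smap_get_bool(ovs_other_config, "dpdk-multi-seg-mbufs", false);
    }

    bool
    dpdk_multi_segment_mbufs_enabled(void)
    {
        return multi_seg_mbufs_enabled;
    }

Since such a value would be read once at initialisation, this is consistent
with the "decide which approach to adopt on init" constraint described above.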
---
v5:
- Rebase on master 030958a0cc ("conntrack: Fix conn_update_state_alg use
  after free.");
- Address Eelco's comments:
  - Remove the dpdk_mp_sweep() call in netdev_dpdk_mempool_configure(), a
    leftover from rebase. The only call should be in dpdk_mp_get();
  - Remove a NEWS line added by mistake during rebase (about adding
    experimental vhost zero copy support).
- Address Ian's comments:
  - Drop patch 01 from the previous series entirely;
  - Patch (now) 01/14 adds a new call to dpdk_buf_size() inside
    dpdk_mp_create() to get the correct "mbuf_size" to be used;
  - Patch (now) 11/14 modifies dpdk_mp_create() to check if multi-segment
    mbufs is enabled, in which case it calculates the new "mbuf_size" to
    be used;
  - In free_dpdk_buf() and dpdk_buf_alloc(), don't lock and unlock
    conditionally.
- Add "per-port-memory=true" to the "Multi-segment mbufs Tx" test, as the
  current DPDK setup in system-dpdk-testsuite can't handle higher MTU
  sizes using the shared mempool model (it runs out of memory);
- Add new examples for when multi-segment mbufs are enabled in
  topics/dpdk/memory.rst, and a reference to