On 6/26/2018 3:33 PM, Kevin Traynor wrote:
On 06/25/2018 12:56 PM, Ian Stokes wrote:This commit re-introduces the concept of shared mempools as the default memory model for DPDK devices. Per port mempools are still available but must be enabled explicitly by a user.OVS previously used a shared mempool model for ports with the same MTU and socket configuration. This was replaced by a per port mempool model to address issues flagged by users such as: https://mail.openvswitch.org/pipermail/ovs-discuss/2016-September/042560.html However the per port model potentially requires an increase in memory resource requirements to support the same number of ports and configuration as the shared port model. This is considered a blocking factor for current deployments of OVS when upgrading to future OVS releases as a user may have to redimension memory for the same deployment configuration. This may not be possible for users. This commit resolves the issue by re-introducing shared mempools as the default memory behaviour in OVS DPDK but also refactors the memory configuration code to allow for per port mempools. This patch adds a new global config option, per-port-memory, that controls the enablement of per port mempools for DPDK devices. ovs-vsctl set Open_vSwitch . other_config:per-port-memory=true This value defaults to false; to enable per port memory support, this field should be set to true when setting other global parameters on init (such as "dpdk-socket-mem", for example). Changing the value at runtime is not supported, and requires restarting the vswitch daemon. The mempool sweep functionality is also replaced with the sweep functionality from OVS 2.9 found in commits c77f692 (netdev-dpdk: Free mempool only when no in-use mbufs.) a7fb0a4 (netdev-dpdk: Add mempool reuse/free debug.) A new document to discuss the specifics of the memory models and example memory requirement calculations is also added. Signed-off-by: Ian Stokes <[email protected]> --- Documentation/automake.mk | 1 + Documentation/intro/install/dpdk.rst | 6 + Documentation/topics/dpdk/index.rst | 1 + Documentation/topics/dpdk/memory.rst | 216 +++++++++++++++++++++++++ NEWS | 1 + lib/dpdk-stub.c | 6 + lib/dpdk.c | 12 ++ lib/dpdk.h | 1 + lib/netdev-dpdk.c | 296 +++++++++++++++++++++++------------ vswitchd/vswitch.xml | 17 ++ 10 files changed, 457 insertions(+), 100 deletions(-) create mode 100644 Documentation/topics/dpdk/memory.rst diff --git a/Documentation/automake.mk b/Documentation/automake.mk index bc728df..2444794 100644 --- a/Documentation/automake.mk +++ b/Documentation/automake.mk @@ -36,6 +36,7 @@ DOC_SOURCE = \ Documentation/topics/dpdk/index.rst \ Documentation/topics/dpdk/bridge.rst \ Documentation/topics/dpdk/jumbo-frames.rst \ + Documentation/topics/dpdk/memory.rst \ Documentation/topics/dpdk/pdump.rst \ Documentation/topics/dpdk/phy.rst \ Documentation/topics/dpdk/pmd.rst \ diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst index 085e479..7dc75ef 100644 --- a/Documentation/intro/install/dpdk.rst +++ b/Documentation/intro/install/dpdk.rst @@ -170,6 +170,12 @@ Mount the hugepages, if not already mounted by default::$ mount -t hugetlbfs none /dev/hugepages`` +.. note::+ + The amount of hugepage memory required can be affected by various + aspects of the datapath and device configuration. Refer to + :doc:`/topics/dpdk/memory` for more details. + .. _dpdk-vfio:Setup DPDK devices using VFIOdiff --git a/Documentation/topics/dpdk/index.rst b/Documentation/topics/dpdk/index.rst index 181f61a..cf24a7b 100644 --- a/Documentation/topics/dpdk/index.rst +++ b/Documentation/topics/dpdk/index.rst @@ -40,3 +40,4 @@ The DPDK Datapath /topics/dpdk/qos /topics/dpdk/pdump /topics/dpdk/jumbo-frames + /topics/dpdk/memory diff --git a/Documentation/topics/dpdk/memory.rst b/Documentation/topics/dpdk/memory.rst new file mode 100644 index 0000000..2e0b3f5 --- /dev/null +++ b/Documentation/topics/dpdk/memory.rst @@ -0,0 +1,216 @@ +.. + Copyright (c) 2018 Intel Corporation + + Licensed under the Apache License, Version 2.0 (the "License"); you may + not use this file except in compliance with the License. You may obtain + a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, WITHOUT + WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the + License for the specific language governing permissions and limitations + under the License. + + Convention for heading levels in Open vSwitch documentation: + + ======= Heading 0 (reserved for the title in a document) + ------- Heading 1 + ~~~~~~~ Heading 2 + +++++++ Heading 3 + ''''''' Heading 4 + + Avoid deeper levels because they do not render well. + +========================= +DPDK Device Memory Models +========================= + +DPDK device memory can be allocated in one of two ways in OVS DPDK, +**shared memory** or **per port memory**. The specifics of both are +detailed below. + +Shared Memory +------------- + +By default OVS DPDK uses a shared memory model. This means that multiple +ports can share the same mempool. For example when a port is added it will +have a given MTU and socket ID associated with it. If a mempool has been +created previously for an existing port that has the same MTU and socket ID, +that mempool is used for both ports. If there is no existing mempool +supporting these parameters then a new mempool is created. + +Per Port Memory +--------------- + +In the per port memory model, mempools are created per device and are not +shared. The benefit of this is a more transparent memory model where mempools +will not be exhausted by other DPDK devices. However this comes at a potential +increase in cost for memory dimensioning for a given deployment. Users should +be aware of the memory requirements for their deployment before using this +model and allocate the required hugepage memory. + +Per port mempool support may be enabled via a global config value, +```per-port-memory```. Setting this to true enables the per port memory +model for all DPDK devices in OVS:: + + $ ovs-vsctl set Open_vSwitch . other_config:per-port-memory=true + +.. important:: + + This value should be set before setting dpdk-init=true. If set after + dpdk-init=true then the daemon must be restarted to use per-port-memory. + +Calculating Memory Requirements +------------------------------- + +The amount of memory required for a given mempool can be calculated by the +**number mbufs in the mempool \* mbuf size**. + +Users should be aware of the following: + +* The **number of mbufs** per mempool will differ between memory models. + +* The **size of each mbuf** will be affected by the requested **MTU** size. + +.. important:: + + An mbuf size in bytes can be larger than the requested MTU size due tos/can be/is always/+ allignment and rounding needed in OVS DPDK.alignment+ +Below are a number of examples of memory requirement calculations for both +shared and per port memory models. +-----8<------/* Tries to allocate a new mempool - or re-use an existing one where- * appropriate - on requested_socket_id with a size determined by - * requested_mtu and requested Rx/Tx queues. - * On success - or when re-using an existing mempool - the new configuration - * will be applied. +/* Depending on the memory model being used this function triestries to+ * identify and reuse an existing mempool or tries to allocate new + * mempool on requested_socket_id with mbuf size corresponding to + * requested_mtu. On success new configuration will be applied. * On error, device will be left unchanged. */ static int netdev_dpdk_mempool_configure(struct netdev_dpdk *dev) OVS_REQUIRES(dev->mutex) { uint32_t buf_size = dpdk_buf_size(dev->requested_mtu); - struct rte_mempool *mp; + struct dpdk_mp *dmp; int ret = 0; + bool per_port_mp = dpdk_per_port_memory();- dpdk_mp_sweep();+ /* With shared mempools we do not need to configure a mempool if the MTU + * and socket ID have not changed, the previous configuration is still + * valid so return 0 */ + if (!per_port_mp && dev->mtu == dev->requested_mtu + && dev->socket_id == dev->requested_socket_id) { + return ret; + }- mp = dpdk_mp_create(dev, FRAME_LEN_TO_MTU(buf_size));- if (!mp) { + dmp = dpdk_mp_get(dev, FRAME_LEN_TO_MTU(buf_size), per_port_mp); + if (!dmp) { VLOG_ERR("Failed to create memory pool for netdev " "%s, with MTU %d on socket %d: %s\n", dev->up.name, dev->requested_mtu, dev->requested_socket_id, rte_strerror(rte_errno)); - ret = rte_errno; + return rte_errno;No need to return here and have the else below. You could set ret or remove the else.
Sure I'll just set ret = rte_errno and return below.
} else { - /* If a new MTU was requested and its rounded value equals the one - * that is currently used, then the existing mempool is returned. */ - if (dev->mp != mp) { - /* A new mempool was created, release the previous one. */ - dpdk_mp_release(dev->mp); - } else { - ret = EEXIST; + /* Check for any pre-existing dpdk_mp for the device before accessing + * the associated mempool. + */ + if (dev->dpdk_mp != NULL) { + /* A new MTU was requested and its rounded value does not + * equal the rounded value of the current mempool, decrement + * the reference count to the devices current dpdk_mp as its + * associated mempool will not be used for this device. + */The comment needs updating. You could be reusing the existing mempool, in which case it's still ok to mp_put because you have just incremented it...
Good catch, will update for the v2.
About that, I know it was my comment to remove an unnecessary check here :-/ but i'm wondering if the refcount inc and then dec for the EEXISTS case is unintuitive and it would better the way you had it with set refcount =1 in mp_get, and then have the check for "if (dev->dpdk_mp != dmp)" before you mp_put. What do you think?
I think the approach we have at the moment is better because it handles the per port model of EEXIST and handles the reuse case for a single port for the shared_memory model described below.
Add a port 'dpdk0' with default MTU 1500. The associated dpdk mempool will have a refcount of 1.
Now set the mtu of 'dpdk0' to 1800. The existing dpdk mempool will be reused and incremented in mp_get. refcount will now be 2 but this is incorrect as there is only 1 port and it's reusing the same mempool, we have to decrement the refcount by 1 for it to be correct.
The code as is handles that situation. If you checked for "if (dev->dpdk_mp != dmp)" you wouldn't decrement in this case.
Maybe it needs a comment to this effect? What are your thoughts? Ian
+ dpdk_mp_put(dev>dpdk_mp);} - dev->mp = mp; + dev->dpdk_mp = dmp; dev->mtu = dev->requested_mtu; dev->socket_id = dev->requested_socket_id; dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu); @@ -855,7 +950,8 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int n_rxq, int n_txq)for (i = 0; i < n_rxq; i++) {diag = rte_eth_rx_queue_setup(dev->port_id, i, dev->rxq_size, - dev->socket_id, NULL, dev->mp); + dev->socket_id, NULL, + dev->dpdk_mp->mp); if (diag) { VLOG_INFO("Interface %s unable to setup rxq(%d): %s", dev->up.name, i, rte_strerror(-diag)); @@ -950,7 +1046,7 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev) memcpy(dev->hwaddr.ea, eth_addr.addr_bytes, ETH_ADDR_LEN); rte_eth_link_get_nowait(dev->port_id, &dev->link);- mbp_priv = rte_mempool_get_priv(dev->mp);+ mbp_priv = rte_mempool_get_priv(dev->dpdk_mp->mp); dev->buf_size = mbp_priv->mbuf_data_room_size - RTE_PKTMBUF_HEADROOM;/* Get the Flow control configuration for DPDK-ETH */@@ -1204,7 +1300,7 @@ common_destruct(struct netdev_dpdk *dev) OVS_EXCLUDED(dev->mutex) { rte_free(dev->tx_q); - dpdk_mp_release(dev->mp); + dpdk_mp_put(dev->dpdk_mp);ovs_list_remove(&dev->list_node);free(ovsrcu_get_protected(struct ingress_policer *, @@ -1957,7 +2053,7 @@ netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq, return EAGAIN; }- nb_rx = rte_vhost_dequeue_burst(vid, qid, dev->mp,+ nb_rx = rte_vhost_dequeue_burst(vid, qid, dev->dpdk_mp->mp, (struct rte_mbuf **) batch->packets, NETDEV_MAX_BURST); if (!nb_rx) { @@ -2196,7 +2292,7 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch) continue; }- pkts[txcnt] = rte_pktmbuf_alloc(dev->mp);+ pkts[txcnt] = rte_pktmbuf_alloc(dev->dpdk_mp->mp); if (OVS_UNLIKELY(!pkts[txcnt])) { dropped += cnt - i; break; @@ -3075,7 +3171,7 @@ netdev_dpdk_get_mempool_info(struct unixctl_conn *conn, ovs_mutex_lock(&dev->mutex); ovs_mutex_lock(&dpdk_mp_mutex);- rte_mempool_dump(stream, dev->mp);+ rte_mempool_dump(stream, dev->dpdk_mp->mp);ovs_mutex_unlock(&dpdk_mp_mutex);ovs_mutex_unlock(&dev->mutex); @@ -3772,7 +3868,7 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev)err = netdev_dpdk_mempool_configure(dev);if (!err) { - /* A new mempool was created. */ + /* A new mempool was created or re-used. */ netdev_change_seq_changed(&dev->up); } else if (err != EEXIST){ return err; diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index b93da69..b9aba8c 100644 --- a/vswitchd/vswitch.xml +++ b/vswitchd/vswitch.xml @@ -360,6 +360,23 @@ </p> </column>+ <column name="other_config" key="per-port-memory"+ type='{"type": "boolean"}'> + <p> + By default OVS DPDK uses a shared memory model wherein devices + that have the same MTU and socket values can share the same + mempool. Setting this value to <code>true</code> changes this + behaviour. Per port memory allow DPDK devices to use private + memory per device. This can provide greater transapraency as + regards memory usage but potentially at the cost of greater memory + requirements. + </p> + <p> + Changing this value requires restarting the daemon if dpdk-init has + already been set to true. + </p> + </column> + <column name="other_config" key="tx-flush-interval" type='{"type": "integer", "minInteger": 0, "maxInteger": 1000000}'>
_______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
