Shared memory mempools may be currently be shared between DPDK ports based on port MTU and NUMA. With some hint from the user we can increase the sharing on MTU and hence reduce memory consumption in many cases.
For example, a port with MTU 9000, uses a mempool with an mbuf size based on 9000 MTU. A port with MTU 1500, uses a different mempool with an mbuf size based on 1500 MTU. In this case, assuming same NUMA, both these ports could share the 9000 MTU mempool. The user must give a hint as order of creation of ports and setting of MTUs may vary and we need to ensure that upgrades from older OVS versions do not require more memory. This scheme can also prevent multiple mempools being created for cases where a port is added picking up a default MTU and an appropriate mempool, but later has it's MTU changed to a different value requiring a different mempool. Example usage: $ ovs-vsctl --no-wait set Open_vSwitch . \ other_config:dpdk-shared-memory-config=9000,1500:1,6000:1 Port added on NUMA 0: * MTU 1500, use mempool based on 9000 MTU * MTU 6000, use mempool based on 9000 MTU * MTU 9000, use mempool based on 9000 MTU * MTU 9300, use mempool based on 9300 MTU (existing behaviour) Port added on NUMA 1: * MTU 1500, use mempool based on 1500 MTU * MTU 6000, use mempool based on 6000 MTU * MTU 9000, use mempool based on 9000 MTU * MTU 9300, use mempool based on 9300 MTU (existing behaviour) Default behaviour is unchanged and mempools are still only created when needed. Signed-off-by: Kevin Traynor <[email protected]> --- Documentation/topics/dpdk/memory.rst | 43 +++++++++++ lib/dpdk-stub.c | 10 ++- lib/dpdk.c | 104 +++++++++++++++++++++++++++ lib/dpdk.h | 2 + lib/netdev-dpdk.c | 4 ++ vswitchd/vswitch.xml | 37 ++++++++++ 6 files changed, 199 insertions(+), 1 deletion(-) diff --git a/Documentation/topics/dpdk/memory.rst b/Documentation/topics/dpdk/memory.rst index 7a9d02c18..8bf216fc6 100644 --- a/Documentation/topics/dpdk/memory.rst +++ b/Documentation/topics/dpdk/memory.rst @@ -214,2 +214,45 @@ Example 3: (2 rxq, 2 PMD, 9000 MTU) Mbuf size = 10176 Bytes Memory required = 26656 * 10176 = 271 MB + +Shared Memory Configuration +--------------------------- +In order to increase sharing of mempools, a user can configure the MTUs which +mempools are based on by using ``dpdk-shared-memory-config``. + +A MTU configured bt the user is adjusted to an mbuf size used for mempool +creation and stored. If a port is subsequently added that has an MTU which can +be accommodated by this mbuf size, it will be used for mempool creation/reuse. + +This can increase sharing by consolidating mempools for ports with different +MTUs which would otherwise use separate mempools. It can also help to remove +the need for mempools being created after a port is added but before it's MTU +is changed to a different value. + +For example, on a 2 NUMA system:: + + $ ovs-vsctl ovs-vsctl --no-wait set Open_vSwitch . \ + other_config:dpdk-shared-memory-config=9000,1500:1,6000:1 + + +In this case, OVS stores the mbuf sizes based on the following MTUs. + +* NUMA 0: 9000 +* NUMA 1: 1500, 6000, 9000 + +Ports added will use mempools with the mbuf sizes based on the above MTUs where +possible. If there is more than one suitable, the one closest to the MTU will +be selected. + +Port added on NUMA 0: + +* MTU 1500, use mempool based on 9000 MTU +* MTU 6000, use mempool based on 9000 MTU +* MTU 9000, use mempool based on 9000 MTU +* MTU 9300, use mempool based on 9300 MTU (existing behaviour) + +Port added on NUMA 1: + +* MTU 1500, use mempool based on 1500 MTU +* MTU 6000, use mempool based on 6000 MTU +* MTU 9000, use mempool based on 9000 MTU +* MTU 9300, use mempool based on 9300 MTU (existing behaviour) diff --git a/lib/dpdk-stub.c b/lib/dpdk-stub.c index 3513d5640..272357ac3 100644 --- a/lib/dpdk-stub.c +++ b/lib/dpdk-stub.c @@ -97,5 +97,13 @@ dpdk_status(const struct ovsrec_open_vswitch *cfg) uint32_t -dpdk_buf_size(int mtu) +dpdk_buf_size(int mtu OVS_UNUSED) +{ + return 0; +} + +int +dpdk_get_user_adjusted_mtu(int port_adj_mtu OVS_UNUSED, + int port_mtu OVS_UNUSED, + int port_socket_id OVS_UNUSED) { return 0; diff --git a/lib/dpdk.c b/lib/dpdk.c index e1836a168..90897437d 100644 --- a/lib/dpdk.c +++ b/lib/dpdk.c @@ -36,4 +36,5 @@ #include "netdev-offload-provider.h" #include "openvswitch/dynamic-string.h" +#include "openvswitch/ofp-parse.h" #include "openvswitch/vlog.h" #include "ovs-atomic.h" @@ -58,4 +59,12 @@ static bool per_port_memory = false; /* Status of per port memory support */ static atomic_bool dpdk_initialized = ATOMIC_VAR_INIT(false); +struct user_mempool_config { + int adj_mtu; + int socket_id; +}; + +static struct user_mempool_config *user_mempools = NULL; +static int n_user_mempools; + static int process_vhost_flags(char *flag, const char *default_val, int size, @@ -344,4 +353,61 @@ malloc_dump_stats_wrapper(FILE *stream) } +static int +parse_user_mempools_list(const char *mtus) +{ + char *list, *key, *value; + int error = 0; + + if (!mtus) { + return 0; + } + + n_user_mempools = 0; + list = xstrdup(mtus); + + while (ofputil_parse_key_value(&list, &key, &value)) { + int socket_id, mtu, i, adj_mtu; + + if (!str_to_int(key, 0, &mtu) || mtu < 0) { + error = EINVAL; + break; + } + + if (!str_to_int(value, 0, &socket_id)) { + /* No socket specified. It will apply for all numas. */ + socket_id = INT_MAX; + } else if (socket_id < 0) { + error = EINVAL; + break; + } + + adj_mtu = FRAME_LEN_TO_MTU(dpdk_buf_size(mtu)); + + for (i = 0; i < n_user_mempools; i++) { + if (user_mempools[i].adj_mtu == adj_mtu && + (user_mempools[i].socket_id == INT_MAX || + user_mempools[i].socket_id == socket_id)) { + /* Already have this adjusted mtu. */ + VLOG_INFO("User configured shared memory mempool with " + "mbuf size of %d already set that caters for: " + "MTU %d, NUMA %s." ,adj_mtu, mtu, + socket_id == INT_MAX ? "ALL" : value); + break; + } + } + if (i == n_user_mempools) { + /* Existing entry does not cover this mtu, create a new one. */ + user_mempools = xrealloc(user_mempools, (n_user_mempools + 1) * + sizeof(struct user_mempool_config)); + user_mempools[n_user_mempools].adj_mtu = adj_mtu; + user_mempools[n_user_mempools++].socket_id = socket_id; + VLOG_INFO("User configured shared memory mempool set for: " + "MTU %d, NUMA %s.",mtu, + socket_id == INT_MAX ? "ALL" : value); + } + } + return error; +} + static bool dpdk_init__(const struct smap *ovs_other_config) @@ -354,4 +420,6 @@ dpdk_init__(const struct smap *ovs_other_config) struct ovs_numa_dump *affinity = NULL; struct svec args = SVEC_EMPTY_INITIALIZER; + const char *mempoolcfg = smap_get(ovs_other_config, + "dpdk-shared-memory-config"); log_stream = fopencookie(NULL, "w+", dpdk_log_func); @@ -406,4 +474,6 @@ dpdk_init__(const struct smap *ovs_other_config) per_port_memory ? "enabled" : "disabled"); + parse_user_mempools_list(mempoolcfg); + svec_add(&args, ovs_get_program_name()); construct_dpdk_args(ovs_other_config, &args); @@ -653,2 +723,36 @@ dpdk_buf_size(int mtu) + RTE_PKTMBUF_HEADROOM; } + +int +dpdk_get_user_adjusted_mtu(int port_adj_mtu, int port_mtu, int port_socket_id) +{ + int best_adj_user_mtu = INT_MAX; + + for (unsigned i = 0; i < n_user_mempools; i++) { + int user_adj_mtu, user_socket_id; + + user_adj_mtu = user_mempools[i].adj_mtu; + user_socket_id = user_mempools[i].socket_id; + if (port_adj_mtu > user_adj_mtu || + (user_socket_id != INT_MAX && + user_socket_id != port_socket_id)) { + continue; + } + if (user_adj_mtu - port_adj_mtu < best_adj_user_mtu - port_adj_mtu) { + /* This is the is the lowest valid user MTU. */ + best_adj_user_mtu = user_adj_mtu; + } + } + + if (best_adj_user_mtu == INT_MAX) { + VLOG_DBG("No user configured shared memory mempool mbuf sizes found " + "suitable for port with MTU %d, NUMA %d.", port_mtu, + port_socket_id); + best_adj_user_mtu = port_adj_mtu; + } else { + VLOG_DBG("Found user configured shared memory mempool with mbufs " + "of size %d, suitable for port with MTU %d, NUMA %d.", + best_adj_user_mtu, port_mtu, port_socket_id); + } + return best_adj_user_mtu; +} diff --git a/lib/dpdk.h b/lib/dpdk.h index 8ec43f57c..f43b80ac0 100644 --- a/lib/dpdk.h +++ b/lib/dpdk.h @@ -64,4 +64,6 @@ void print_dpdk_version(void); void dpdk_status(const struct ovsrec_open_vswitch *); uint32_t dpdk_buf_size(int mtu); +int dpdk_get_user_adjusted_mtu(int port_adj_mtu, int port_mtu, + int port_socket_id); #endif /* dpdk.h */ diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index c9877addb..fbf8fea24 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -766,4 +766,8 @@ dpdk_mp_get(struct netdev_dpdk *dev, int mtu, bool per_port_mp) * to see if reuse is possible. */ if (!per_port_mp) { + /* If user has provided defined mempools, check if one is suitable + * and get new buffer size.*/ + mtu = dpdk_get_user_adjusted_mtu(mtu, dev->requested_mtu, + dev->requested_socket_id); LIST_FOR_EACH (dmp, list_node, &dpdk_mp_list) { if (dmp->socket_id == dev->requested_socket_id diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index 0c6632617..925bc97bc 100644 --- a/vswitchd/vswitch.xml +++ b/vswitchd/vswitch.xml @@ -491,4 +491,41 @@ </column> + <column name="other_config" key="dpdk-shared-memory-config"> + <p>Specifies dpdk shared memory mempool config.</p> + <p>Value should be set in the following form:</p> + <p> + <code>other_config:dpdk-shared-mempory-config=< + user-shared-mempool-mtu-list></code> + </p> + <p>where</p> + <p> + <ul> + <li> + <user-shared-mempool-mtu-list> ::= + NULL | <non-empty-list> + </li> + <li> + <non-empty-list> ::= <user-mtus> | + <user-mtus> , + <non-empty-list> + </li> + <li> + <user-mtus> ::= <mtu-all-socket> | + <mtu-socket-pair> + </li> + <li> + <mtu-all-socket> ::= <mtu> + </li> + <li> + <mtu-socket-pair> ::= <mtu> : <socket-id> + </li> + </ul> + </p> + <p> + Changing this value requires restarting the daemon if dpdk-init has + already been set to true. + </p> + </column> + <column name="other_config" key="tx-flush-interval" type='{"type": "integer", -- 2.34.1 _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
