On Wed, Apr 13, 2022 at 3:24 PM Kevin Traynor <[email protected]> wrote:
>
> Shared memory mempools may be currently be shared between DPDK
> ports based on port MTU and NUMA. With some hint from the user
> we can increase the sharing on MTU and hence reduce memory
> consumption in many cases.
>
> For example, a port with MTU 9000, uses a mempool with an
> mbuf size based on 9000 MTU. A port with MTU 1500, uses a
> different mempool with an mbuf size based on 1500 MTU.
>
> In this case, assuming same NUMA, both these ports could
> share the 9000 MTU mempool.
>
> The user must give a hint as order of creation of ports and
> setting of MTUs may vary and we need to ensure that upgrades
> from older OVS versions do not require more memory.
>
> This scheme can also prevent multiple mempools being created
> for cases where a port is added picking up a default MTU and
> an appropriate mempool, but later has it's MTU changed to a
> different value requiring a different mempool.
>
> Example usage:
>
>  $ ovs-vsctl --no-wait set Open_vSwitch . \
>    other_config:dpdk-shared-memory-config=9000,1500:1,6000:1
>
> Port added on NUMA 0:
> * MTU 1500, use mempool based on 9000 MTU
> * MTU 6000, use mempool based on 9000 MTU
> * MTU 9000, use mempool based on 9000 MTU
> * MTU 9300, use mempool based on 9300 MTU (existing behaviour)
>
> Port added on NUMA 1:
> * MTU 1500, use mempool based on 1500 MTU
> * MTU 6000, use mempool based on 6000 MTU
> * MTU 9000, use mempool based on 9000 MTU
> * MTU 9300, use mempool based on 9300 MTU (existing behaviour)
>
> Default behaviour is unchanged and mempools are still only created
> when needed.

That's an interesting feature.

I have a first found of comments.


My first concern is on moving the code to lib/dpdk*.
The first patch of the series moves netdev notions to the "generic"
dpdk code while there was an effort (from the introduction of the
lib/dpdk* files) to keep things separated.
I tried squashing both patches and moving everything in
lib/netdev-dpdk.c, it builds fine though I did not test it yet.


On the parameter name itself, dpdk-shared-memory-config is confusing.
There is a shared memory in DPDK (for multiprocess support) but this
parameter has nothing to do with it.
We need at least a mention of mempool in the name, but simply going
with "dpdk-shared-mempool-config" is still strange, because there can
be mempools internally of dpdk.
Maybe netdev-dpdk-shared-mempool-config?


More comments inline:

>
> Signed-off-by: Kevin Traynor <[email protected]>
> ---
>  Documentation/topics/dpdk/memory.rst |  43 +++++++++++
>  lib/dpdk-stub.c                      |   8 +++
>  lib/dpdk.c                           | 104 +++++++++++++++++++++++++++
>  lib/dpdk.h                           |   2 +
>  lib/netdev-dpdk.c                    |   4 ++
>  vswitchd/vswitch.xml                 |  37 ++++++++++
>  6 files changed, 198 insertions(+)
>
> diff --git a/Documentation/topics/dpdk/memory.rst 
> b/Documentation/topics/dpdk/memory.rst
> index 7a9d02c18..8bf216fc6 100644
> --- a/Documentation/topics/dpdk/memory.rst
> +++ b/Documentation/topics/dpdk/memory.rst
> @@ -214,2 +214,45 @@ Example 3: (2 rxq, 2 PMD, 9000 MTU)
>   Mbuf size = 10176 Bytes
>   Memory required = 26656 * 10176 = 271 MB
> +
> +Shared Memory Configuration
> +---------------------------

Not sure it is needed, but in the rest of this doc, sections start
with an empty line after their title.

> +In order to increase sharing of mempools, a user can configure the MTUs which
> +mempools are based on by using ``dpdk-shared-memory-config``.
> +
> +A MTU configured bt the user is adjusted to an mbuf size used for mempool

by* a*


> +creation and stored. If a port is subsequently added that has an MTU which 
> can
> +be accommodated by this mbuf size, it will be used for mempool 
> creation/reuse.
> +
> +This can increase sharing by consolidating mempools for ports with different
> +MTUs which would otherwise use separate mempools. It can also help to remove
> +the need for mempools being created after a port is added but before it's MTU
> +is changed to a different value.
> +
> +For example, on a 2 NUMA system::
> +
> +  $ ovs-vsctl ovs-vsctl --no-wait set Open_vSwitch . \
> +  other_config:dpdk-shared-memory-config=9000,1500:1,6000:1
> +
> +
> +In this case, OVS stores the mbuf sizes based on the following MTUs.
> +
> +* NUMA 0: 9000
> +* NUMA 1: 1500, 6000, 9000
> +
> +Ports added will use mempools with the mbuf sizes based on the above MTUs 
> where
> +possible. If there is more than one suitable, the one closest to the MTU will
> +be selected.
> +
> +Port added on NUMA 0:
> +
> +* MTU 1500, use mempool based on 9000 MTU
> +* MTU 6000, use mempool based on 9000 MTU
> +* MTU 9000, use mempool based on 9000 MTU
> +* MTU 9300, use mempool based on 9300 MTU (existing behaviour)
> +
> +Port added on NUMA 1:
> +
> +* MTU 1500, use mempool based on 1500 MTU
> +* MTU 6000, use mempool based on 6000 MTU
> +* MTU 9000, use mempool based on 9000 MTU
> +* MTU 9300, use mempool based on 9300 MTU (existing behaviour)
> diff --git a/lib/dpdk-stub.c b/lib/dpdk-stub.c
> index 947af0abd..272357ac3 100644
> --- a/lib/dpdk-stub.c
> +++ b/lib/dpdk-stub.c
> @@ -101,2 +101,10 @@ dpdk_buf_size(int mtu OVS_UNUSED)
>      return 0;
>  }
> +
> +int
> +dpdk_get_user_adjusted_mtu(int port_adj_mtu OVS_UNUSED,
> +                           int port_mtu OVS_UNUSED,
> +                           int port_socket_id OVS_UNUSED)
> +{
> +    return 0;
> +}
> diff --git a/lib/dpdk.c b/lib/dpdk.c
> index e1836a168..90897437d 100644
> --- a/lib/dpdk.c
> +++ b/lib/dpdk.c
> @@ -36,4 +36,5 @@
>  #include "netdev-offload-provider.h"
>  #include "openvswitch/dynamic-string.h"
> +#include "openvswitch/ofp-parse.h"
>  #include "openvswitch/vlog.h"
>  #include "ovs-atomic.h"
> @@ -58,4 +59,12 @@ static bool per_port_memory = false; /* Status of per port 
> memory support */
>  static atomic_bool dpdk_initialized = ATOMIC_VAR_INIT(false);
>
> +struct user_mempool_config {
> +    int adj_mtu;
> +    int socket_id;
> +};
> +
> +static struct user_mempool_config *user_mempools = NULL;
> +static int n_user_mempools;
> +
>  static int
>  process_vhost_flags(char *flag, const char *default_val, int size,
> @@ -344,4 +353,61 @@ malloc_dump_stats_wrapper(FILE *stream)
>  }
>
> +static int

No strong opinion, but nothing is done with return value in caller.
I would make this function void and add logs where parsing fails.


> +parse_user_mempools_list(const char *mtus)
> +{
> +    char *list, *key, *value;
> +    int error = 0;
> +
> +    if (!mtus) {
> +        return 0;
> +    }
> +
> +    n_user_mempools = 0;
> +    list = xstrdup(mtus);

list is leaked at the end of the function.


> +
> +    while (ofputil_parse_key_value(&list, &key, &value)) {
> +        int socket_id, mtu, i, adj_mtu;
> +
> +        if (!str_to_int(key, 0, &mtu) || mtu < 0) {
> +            error = EINVAL;
> +            break;

Breaking means we keep in user_mempools[] what had been parsed of the
passed string so far.
With a half setup configuration, the behavior might be hard to
understand for users.
Inconsistency could be detected a long time after configuring too.

I'd rather log an error here, and reset user_mempools[] and/or n_user_mempools.


> +        }
> +
> +        if (!str_to_int(value, 0, &socket_id)) {
> +            /* No socket specified. It will apply for all numas. */
> +            socket_id = INT_MAX;
> +        } else if (socket_id < 0) {
> +            error = EINVAL;

Same comment here about break;


> +            break;
> +        }
> +
> +        adj_mtu = FRAME_LEN_TO_MTU(dpdk_buf_size(mtu));
> +
> +        for (i = 0; i < n_user_mempools; i++) {
> +            if (user_mempools[i].adj_mtu == adj_mtu &&
> +                    (user_mempools[i].socket_id == INT_MAX ||
> +                    user_mempools[i].socket_id == socket_id)) {

On this "duplicate" detection, do we need it?
It won't handle the following case:

If passing other_config:dpdk-shared-memory-config=9000,9000:0, we get:
2022-05-04T12:23:16.654Z|00028|netdev_dpdk|INFO|User configured shared
memory mempool set for: MTU 9000, NUMA ALL.
2022-05-04T12:23:16.654Z|00029|netdev_dpdk|INFO|User configured shared
memory mempool with mbuf size of 9326 already set that caters for: MTU
9000, NUMA 0.

Otoh, passing other_config:dpdk-shared-memory-config=9000:0,9000:
2022-05-04T12:18:17.525Z|00028|netdev_dpdk|INFO|User configured shared
memory mempool set for: MTU 9000, NUMA 0.
2022-05-04T12:18:17.525Z|00029|netdev_dpdk|INFO|User configured shared
memory mempool set for: MTU 9000, NUMA ALL.


If we keep it and, iirc the coding style guidelines, should it be?
+            if (user_mempools[i].adj_mtu == adj_mtu &&
+                && (user_mempools[i].socket_id == INT_MAX
+                    || user_mempools[i].socket_id == socket_id)) {


> +                /* Already have this adjusted mtu. */
> +                VLOG_INFO("User configured shared memory mempool with "
> +                          "mbuf size of %d already set that caters for: "
> +                          "MTU %d, NUMA %s." ,adj_mtu, mtu,

s/ ,/, /

That's an info level log which means users will see it.
Displaying the adjusted mtu could be confusing to a user who only
feeds "normal" (from his pov) mtu.


> +                          socket_id == INT_MAX ? "ALL" : value);
> +                break;
> +            }
> +        }
> +        if (i == n_user_mempools) {
> +            /* Existing entry does not cover this mtu, create a new one. */
> +            user_mempools = xrealloc(user_mempools, (n_user_mempools + 1) *
> +                                     sizeof(struct user_mempool_config));
> +            user_mempools[n_user_mempools].adj_mtu = adj_mtu;
> +            user_mempools[n_user_mempools++].socket_id = socket_id;

No strong opinion, but I prefer to put the increment aside from
filling the current entry.

If a field is added later by an inattentive developer as:
             user_mempools[n_user_mempools++].socket_id = socket_id;
+            user_mempools[n_user_mempools++].new_field = new_field;

It would be such a pity if reviewers missed that :-).


> +            VLOG_INFO("User configured shared memory mempool set for: "
> +                      "MTU %d, NUMA %s.",mtu,
> +                      socket_id == INT_MAX ? "ALL" : value);
> +        }
> +    }
> +    return error;
> +}
> +
>  static bool
>  dpdk_init__(const struct smap *ovs_other_config)
> @@ -354,4 +420,6 @@ dpdk_init__(const struct smap *ovs_other_config)
>      struct ovs_numa_dump *affinity = NULL;
>      struct svec args = SVEC_EMPTY_INITIALIZER;
> +    const char *mempoolcfg = smap_get(ovs_other_config,
> +                                      "dpdk-shared-memory-config");
>
>      log_stream = fopencookie(NULL, "w+", dpdk_log_func);
> @@ -406,4 +474,6 @@ dpdk_init__(const struct smap *ovs_other_config)
>                per_port_memory ? "enabled" : "disabled");
>
> +    parse_user_mempools_list(mempoolcfg);
> +
>      svec_add(&args, ovs_get_program_name());
>      construct_dpdk_args(ovs_other_config, &args);
> @@ -653,2 +723,36 @@ dpdk_buf_size(int mtu)
>              + RTE_PKTMBUF_HEADROOM;
>  }
> +
> +int
> +dpdk_get_user_adjusted_mtu(int port_adj_mtu, int port_mtu, int 
> port_socket_id)
> +{
> +    int best_adj_user_mtu = INT_MAX;
> +
> +    for (unsigned i = 0; i < n_user_mempools; i++) {
> +        int user_adj_mtu, user_socket_id;
> +
> +        user_adj_mtu = user_mempools[i].adj_mtu;
> +        user_socket_id = user_mempools[i].socket_id;
> +        if (port_adj_mtu > user_adj_mtu ||
> +                (user_socket_id != INT_MAX &&
> +                 user_socket_id != port_socket_id)) {

Same comment than above about coding style.


> +            continue;
> +        }
> +        if (user_adj_mtu - port_adj_mtu < best_adj_user_mtu - port_adj_mtu) {

I must be missing something here.
What is the point of substracting port_adj_mtu to both sides of the comparison?


> +            /* This is the is the lowest valid user MTU. */
> +            best_adj_user_mtu  = user_adj_mtu;
> +        }
> +    }
> +
> +    if (best_adj_user_mtu == INT_MAX) {
> +        VLOG_DBG("No user configured shared memory mempool mbuf sizes found "
> +                 "suitable for port with MTU %d, NUMA %d.", port_mtu,
> +                 port_socket_id);
> +        best_adj_user_mtu = port_adj_mtu;
> +    } else {
> +        VLOG_DBG("Found user configured shared memory mempool with mbufs "
> +                 "of size %d, suitable for port with MTU %d, NUMA %d.",
> +                 best_adj_user_mtu, port_mtu, port_socket_id);
> +    }
> +    return best_adj_user_mtu;
> +}
> diff --git a/lib/dpdk.h b/lib/dpdk.h
> index 8ec43f57c..f43b80ac0 100644
> --- a/lib/dpdk.h
> +++ b/lib/dpdk.h
> @@ -64,4 +64,6 @@ void print_dpdk_version(void);
>  void dpdk_status(const struct ovsrec_open_vswitch *);
>  uint32_t dpdk_buf_size(int mtu);
> +int dpdk_get_user_adjusted_mtu(int port_adj_mtu, int port_mtu,
> +                               int port_socket_id);
>
>  #endif /* dpdk.h */
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index c9877addb..fbf8fea24 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -766,4 +766,8 @@ dpdk_mp_get(struct netdev_dpdk *dev, int mtu, bool 
> per_port_mp)
>       * to see if reuse is possible. */
>      if (!per_port_mp) {
> +        /* If user has provided defined mempools, check if one is suitable
> +         * and get new buffer size.*/
> +        mtu = dpdk_get_user_adjusted_mtu(mtu, dev->requested_mtu,
> +                                         dev->requested_socket_id);
>          LIST_FOR_EACH (dmp, list_node, &dpdk_mp_list) {
>              if (dmp->socket_id == dev->requested_socket_id
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index 0c6632617..925bc97bc 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -491,4 +491,41 @@
>        </column>
>
> +      <column name="other_config" key="dpdk-shared-memory-config">
> +              <p>Specifies dpdk shared memory mempool config.</p>
> +              <p>Value should be set in the following form:</p>
> +              <p>
> +                <code>other_config:dpdk-shared-mempory-config=&lt;
> +                  user-shared-mempool-mtu-list&gt;</code>
> +              </p>
> +              <p>where</p>
> +              <p>
> +                <ul>
> +                  <li>
> +                    &lt;user-shared-mempool-mtu-list&gt; ::=
> +                    NULL | &lt;non-empty-list&gt;
> +                  </li>
> +                  <li>
> +                    &lt;non-empty-list&gt; ::= &lt;user-mtus&gt; |
> +                                               &lt;user-mtus&gt; ,
> +                                               &lt;non-empty-list&gt;
> +                  </li>
> +                  <li>
> +                    &lt;user-mtus&gt; ::= &lt;mtu-all-socket&gt; |
> +                                          &lt;mtu-socket-pair&gt;
> +                  </li>
> +                  <li>
> +                    &lt;mtu-all-socket&gt; ::= &lt;mtu&gt;
> +                  </li>
> +                  <li>
> +                    &lt;mtu-socket-pair&gt; ::= &lt;mtu&gt; : 
> &lt;socket-id&gt;
> +                  </li>
> +                </ul>
> +              </p>
> +        <p>
> +          Changing this value requires restarting the daemon if dpdk-init has
> +          already been set to true.
> +        </p>
> +      </column>
> +
>        <column name="other_config" key="tx-flush-interval"
>                type='{"type": "integer",
> --
> 2.34.1
>


-- 
David Marchand

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to