Miguel Angel Ajo <[email protected]> writes:

> What would be the memory impact of this patch? 

There would be additional threads, and these threads will occupy
the stack size / thread amount of data.  Although, without mlockall() as
you noted below, there isn't much additional data that they really do
consume.  And without packets, they won't really do much.

> Context: we have been exploring the use of ovn in SoCs for edge environments 
> where memory is a strong
> constraint and performance is not our biggest concern in all cases. One of 
> the optimizations we had to make
> (thanks +Zenghui Shi for this investigation) was to pin ovs-vswitchd to a 
> single core to reduce the memory
> impact (along with disabling mlock-all ), we were able to reduce ovs-vswitchd 
> footprint from 100MB+ down to
> 16MB 

Well, that will probably grow a bit depending on the amount of traffic.

> Is it possible to make this a configurable parameter defaulting to what you 
> explain here?

I'm not sure about adding such a knob.  It already is difficult to get
this right, and such a knob is incredibly deceptive (since it really
will just restore a 'bad' behavior of kernel having packets go to an
arbitrary handler thread).

> El El mar, 19 jul 2022 a las 9:02, Michael Santana <[email protected]> 
> escribió:
>
>  Additional threads are required to service upcalls when we have CPU
>  isolation (in per-cpu dispatch mode). The reason additional threads
>  are required is because it creates a more fair distribution. With more
>  threads we decrease the load of each thread as more threads would
>  decrease the number of cores each threads is assigned.
>
>  Adding additional threads also increases the chance OVS utilizes all
>  cores available to use. Some RPS schemas might make some handler
>  threads get all the workload while others get no workload. This tends
>  to happen when the handler thread count is low.
>
>  An example would be an RPS that sends traffic on all even cores on a
>  system with only the lower half of the cores available for OVS to use.
>  In this example we have as many handlers threads as there are
>  available cores. In this case 50% of the handler threads get all the
>  workload while the other 50% get no workload. Not only that, but OVS
>  is only utilizing half of the cores that it can use. This is the worst
>  case scenario.
>
>  The ideal scenario is to have as many threads as there are cores - in
>  this case we guarantee that all cores OVS can use are utilized
>
>  But, adding as many threads are there are cores could have a performance
>  hit when the number of active cores (which all threads have to share) is
>  very low. For this reason we avoid creating as many threads as there
>  are cores and instead meet somewhere in the middle.
>
>  The formula used to calculate the number of handler threads to create
>  is as follows:
>
>  handlers_n = min(next_prime(active_cores+1), total_cores)
>
>  Where next_prime is a function that returns the next numeric prime
>  number that is strictly greater than active_cores
>
>  Assume default behavior when total_cores <= 2, that is do not create
>  additional threads when we have less than 2 total cores on the system
>
>  Fixes: b1e517bd2f81 ("dpif-netlink: Introduce per-cpu upcall dispatch.")
>  Signed-off-by: Michael Santana <[email protected]>
>  ---
>   lib/dpif-netlink.c | 83 ++++++++++++++++++++++++++++++++++++++++++++--
>   lib/ovs-thread.c   | 17 ++++++++++
>   lib/ovs-thread.h   |  1 +
>   3 files changed, 99 insertions(+), 2 deletions(-)
>
>  diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
>  index 06e1e8ca0..21f149867 100644
>  --- a/lib/dpif-netlink.c
>  +++ b/lib/dpif-netlink.c
>  @@ -2506,6 +2506,85 @@ dpif_netlink_handler_uninit(struct dpif_handler 
> *handler)
>   }
>   #endif
>
>  +/*
>  + * Returns true if num is a prime number,
>  + * Otherwise return false
>  + */
>  +static bool
>  +is_prime(uint32_t num)
>  +{
>  +    if (num == 2) {
>  +        return true;
>  +    }
>  +
>  +    if (num < 2) {
>  +        return false;
>  +    }
>  +
>  +    if (num % 2 == 0) {
>  +        return false;
>  +    }
>  +
>  +    for (uint64_t i = 3; i * i <= num; i += 2) {
>  +        if (num % i == 0) {
>  +            return false;
>  +        }
>  +    }
>  +
>  +    return true;
>  +}
>  +
>  +/*
>  + * Returns num if num is a prime number.  Otherwise returns the next
>  + * prime greater than num.  Search is limited by UINT32_MAX.
>  + *
>  + * Returns 0 if no prime has been found between num and UINT32_MAX
>  + */
>  +static uint32_t
>  +next_prime(uint32_t num)
>  +{
>  +    if (num <= 2) {
>  +        return 2;
>  +    }
>  +
>  +    for (uint32_t i = num; i < UINT32_MAX; i++) {
>  +        if (is_prime(i)) {
>  +            return i;
>  +        }
>  +    }
>  +
>  +    return 0;
>  +}
>  +
>  +/*
>  + * Calculates and returns the number of handler threads needed based
>  + * the following formula:
>  + *
>  + * handlers_n = min(next_prime(active_cores+1), total_cores)
>  + *
>  + * Where next_prime is a function that returns the next numeric prime
>  + * number that is strictly greater than active_cores
>  + */
>  +static uint32_t
>  +dpif_netlink_calculate_n_handlers(void)
>  +{
>  +    uint32_t total_cores = count_total_cores();
>  +    uint32_t n_handlers = count_cpu_cores();
>  +    uint32_t next_prime_num;
>  +
>  +    /*
>  +     * If we have isolated cores, add additional handler threads to
>  +     * minimize the number of cores assigned to individual handler threads
>  +     */
>  +    if (n_handlers < total_cores && total_cores > 2) {
>  +        next_prime_num = next_prime(n_handlers + 1);
>  +        n_handlers = next_prime_num >= total_cores ?
>  +                                            total_cores : next_prime_num;
>  +    }
>  +
>  +    return n_handlers;
>  +}
>  +
>   static int
>   dpif_netlink_refresh_handlers_cpu_dispatch(struct dpif_netlink *dpif)
>       OVS_REQ_WRLOCK(dpif->upcall_lock)
>  @@ -2515,7 +2594,7 @@ dpif_netlink_refresh_handlers_cpu_dispatch(struct 
> dpif_netlink *dpif)
>       uint32_t n_handlers;
>       uint32_t *upcall_pids;
>
>  -    n_handlers = count_cpu_cores();
>  +    n_handlers = dpif_netlink_calculate_n_handlers();
>       if (dpif->n_handlers != n_handlers) {
>           VLOG_DBG("Dispatch mode(per-cpu): initializing %d handlers",
>                      n_handlers);
>  @@ -2755,7 +2834,7 @@ dpif_netlink_number_handlers_required(struct dpif 
> *dpif_, uint32_t
>  *n_handlers)
>       struct dpif_netlink *dpif = dpif_netlink_cast(dpif_);
>
>       if (dpif_netlink_upcall_per_cpu(dpif)) {
>  -        *n_handlers = count_cpu_cores();
>  +        *n_handlers = dpif_netlink_calculate_n_handlers();
>           return true;
>       }
>
>  diff --git a/lib/ovs-thread.c b/lib/ovs-thread.c
>  index 805cba622..78ed3e970 100644
>  --- a/lib/ovs-thread.c
>  +++ b/lib/ovs-thread.c
>  @@ -663,6 +663,23 @@ count_cpu_cores(void)
>       return n_cores > 0 ? n_cores : 0;
>   }
>
>  +/* Returns the total number of cores on the system, or 0 if the
>  + * number cannot be determined. */
>  +int
>  +count_total_cores(void)
>  +{
>  +    long int n_cores;
>  +
>  +#ifndef _WIN32
>  +    n_cores = sysconf(_SC_NPROCESSORS_CONF);
>  +#else
>  +    n_cores = 0;
>  +    errno = ENOTSUP;
>  +#endif
>  +
>  +    return n_cores > 0 ? n_cores : 0;
>  +}
>  +
>   /* Returns 'true' if current thread is PMD thread. */
>   bool
>   thread_is_pmd(void)
>  diff --git a/lib/ovs-thread.h b/lib/ovs-thread.h
>  index 3b444ccdc..aac5e19c9 100644
>  --- a/lib/ovs-thread.h
>  +++ b/lib/ovs-thread.h
>  @@ -522,6 +522,7 @@ bool may_fork(void);
>   /* Useful functions related to threading. */
>
>   int count_cpu_cores(void);
>  +int count_total_cores(void);
>   bool thread_is_pmd(void);
>
>   #endif /* ovs-thread.h */
>  -- 
>  2.33.1
>
>  _______________________________________________
>  dev mailing list
>  [email protected]
>  https://mail.openvswitch.org/mailman/listinfo/ovs-dev

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to