Miguel Angel Ajo <[email protected]> writes: > What would be the memory impact of this patch?
There would be additional threads, and these threads will occupy the stack size / thread amount of data. Although, without mlockall() as you noted below, there isn't much additional data that they really do consume. And without packets, they won't really do much. > Context: we have been exploring the use of ovn in SoCs for edge environments > where memory is a strong > constraint and performance is not our biggest concern in all cases. One of > the optimizations we had to make > (thanks +Zenghui Shi for this investigation) was to pin ovs-vswitchd to a > single core to reduce the memory > impact (along with disabling mlock-all ), we were able to reduce ovs-vswitchd > footprint from 100MB+ down to > 16MB Well, that will probably grow a bit depending on the amount of traffic. > Is it possible to make this a configurable parameter defaulting to what you > explain here? I'm not sure about adding such a knob. It already is difficult to get this right, and such a knob is incredibly deceptive (since it really will just restore a 'bad' behavior of kernel having packets go to an arbitrary handler thread). > El El mar, 19 jul 2022 a las 9:02, Michael Santana <[email protected]> > escribió: > > Additional threads are required to service upcalls when we have CPU > isolation (in per-cpu dispatch mode). The reason additional threads > are required is because it creates a more fair distribution. With more > threads we decrease the load of each thread as more threads would > decrease the number of cores each threads is assigned. > > Adding additional threads also increases the chance OVS utilizes all > cores available to use. Some RPS schemas might make some handler > threads get all the workload while others get no workload. This tends > to happen when the handler thread count is low. > > An example would be an RPS that sends traffic on all even cores on a > system with only the lower half of the cores available for OVS to use. > In this example we have as many handlers threads as there are > available cores. In this case 50% of the handler threads get all the > workload while the other 50% get no workload. Not only that, but OVS > is only utilizing half of the cores that it can use. This is the worst > case scenario. > > The ideal scenario is to have as many threads as there are cores - in > this case we guarantee that all cores OVS can use are utilized > > But, adding as many threads are there are cores could have a performance > hit when the number of active cores (which all threads have to share) is > very low. For this reason we avoid creating as many threads as there > are cores and instead meet somewhere in the middle. > > The formula used to calculate the number of handler threads to create > is as follows: > > handlers_n = min(next_prime(active_cores+1), total_cores) > > Where next_prime is a function that returns the next numeric prime > number that is strictly greater than active_cores > > Assume default behavior when total_cores <= 2, that is do not create > additional threads when we have less than 2 total cores on the system > > Fixes: b1e517bd2f81 ("dpif-netlink: Introduce per-cpu upcall dispatch.") > Signed-off-by: Michael Santana <[email protected]> > --- > lib/dpif-netlink.c | 83 ++++++++++++++++++++++++++++++++++++++++++++-- > lib/ovs-thread.c | 17 ++++++++++ > lib/ovs-thread.h | 1 + > 3 files changed, 99 insertions(+), 2 deletions(-) > > diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c > index 06e1e8ca0..21f149867 100644 > --- a/lib/dpif-netlink.c > +++ b/lib/dpif-netlink.c > @@ -2506,6 +2506,85 @@ dpif_netlink_handler_uninit(struct dpif_handler > *handler) > } > #endif > > +/* > + * Returns true if num is a prime number, > + * Otherwise return false > + */ > +static bool > +is_prime(uint32_t num) > +{ > + if (num == 2) { > + return true; > + } > + > + if (num < 2) { > + return false; > + } > + > + if (num % 2 == 0) { > + return false; > + } > + > + for (uint64_t i = 3; i * i <= num; i += 2) { > + if (num % i == 0) { > + return false; > + } > + } > + > + return true; > +} > + > +/* > + * Returns num if num is a prime number. Otherwise returns the next > + * prime greater than num. Search is limited by UINT32_MAX. > + * > + * Returns 0 if no prime has been found between num and UINT32_MAX > + */ > +static uint32_t > +next_prime(uint32_t num) > +{ > + if (num <= 2) { > + return 2; > + } > + > + for (uint32_t i = num; i < UINT32_MAX; i++) { > + if (is_prime(i)) { > + return i; > + } > + } > + > + return 0; > +} > + > +/* > + * Calculates and returns the number of handler threads needed based > + * the following formula: > + * > + * handlers_n = min(next_prime(active_cores+1), total_cores) > + * > + * Where next_prime is a function that returns the next numeric prime > + * number that is strictly greater than active_cores > + */ > +static uint32_t > +dpif_netlink_calculate_n_handlers(void) > +{ > + uint32_t total_cores = count_total_cores(); > + uint32_t n_handlers = count_cpu_cores(); > + uint32_t next_prime_num; > + > + /* > + * If we have isolated cores, add additional handler threads to > + * minimize the number of cores assigned to individual handler threads > + */ > + if (n_handlers < total_cores && total_cores > 2) { > + next_prime_num = next_prime(n_handlers + 1); > + n_handlers = next_prime_num >= total_cores ? > + total_cores : next_prime_num; > + } > + > + return n_handlers; > +} > + > static int > dpif_netlink_refresh_handlers_cpu_dispatch(struct dpif_netlink *dpif) > OVS_REQ_WRLOCK(dpif->upcall_lock) > @@ -2515,7 +2594,7 @@ dpif_netlink_refresh_handlers_cpu_dispatch(struct > dpif_netlink *dpif) > uint32_t n_handlers; > uint32_t *upcall_pids; > > - n_handlers = count_cpu_cores(); > + n_handlers = dpif_netlink_calculate_n_handlers(); > if (dpif->n_handlers != n_handlers) { > VLOG_DBG("Dispatch mode(per-cpu): initializing %d handlers", > n_handlers); > @@ -2755,7 +2834,7 @@ dpif_netlink_number_handlers_required(struct dpif > *dpif_, uint32_t > *n_handlers) > struct dpif_netlink *dpif = dpif_netlink_cast(dpif_); > > if (dpif_netlink_upcall_per_cpu(dpif)) { > - *n_handlers = count_cpu_cores(); > + *n_handlers = dpif_netlink_calculate_n_handlers(); > return true; > } > > diff --git a/lib/ovs-thread.c b/lib/ovs-thread.c > index 805cba622..78ed3e970 100644 > --- a/lib/ovs-thread.c > +++ b/lib/ovs-thread.c > @@ -663,6 +663,23 @@ count_cpu_cores(void) > return n_cores > 0 ? n_cores : 0; > } > > +/* Returns the total number of cores on the system, or 0 if the > + * number cannot be determined. */ > +int > +count_total_cores(void) > +{ > + long int n_cores; > + > +#ifndef _WIN32 > + n_cores = sysconf(_SC_NPROCESSORS_CONF); > +#else > + n_cores = 0; > + errno = ENOTSUP; > +#endif > + > + return n_cores > 0 ? n_cores : 0; > +} > + > /* Returns 'true' if current thread is PMD thread. */ > bool > thread_is_pmd(void) > diff --git a/lib/ovs-thread.h b/lib/ovs-thread.h > index 3b444ccdc..aac5e19c9 100644 > --- a/lib/ovs-thread.h > +++ b/lib/ovs-thread.h > @@ -522,6 +522,7 @@ bool may_fork(void); > /* Useful functions related to threading. */ > > int count_cpu_cores(void); > +int count_total_cores(void); > bool thread_is_pmd(void); > > #endif /* ovs-thread.h */ > -- > 2.33.1 > > _______________________________________________ > dev mailing list > [email protected] > https://mail.openvswitch.org/mailman/listinfo/ovs-dev _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
