Re: [ovs-dev] [PATCH] dpdk: Support running PMD threads on cores > RTE_MAX_LCORE.
On 03.12.2019 14:00, David Marchand wrote:
> On Tue, Dec 3, 2019 at 12:34 PM Ilya Maximets wrote:
>>
>> On 02.12.2019 17:03, David Marchand wrote:
>>> Most DPDK components make the assumption that rte_lcore_id() returns
>>> a valid lcore_id in [0..RTE_MAX_LCORE] range (with the exception of
>>> the LCORE_ID_ANY special value).
>>
>> Do you think that makes practical sense? In real virtualization
>> environments users usually use the CPUs with lower numbers, and it's
>> most likely possible to change the NUMA layout to have CPUs from all
>> the nodes enumerated from the low CPU numbers.
>
> That is if you can reconfigure the NUMA layout.
> How do you achieve this?
> Bios/firmware configuration?

It's usually a BIOS configuration. Modern servers provide configurations
for 2-4 types of core enumeration along with choosing the NUMA topology.

> Kernel boot options (did not find)?
>
> I imagine it would be a pain to enforce this on a fleet of servers
> with different hw specs.

In my experience, you can almost never just take a new server and run
your virtualization software. You'll have to enable some HW capabilities
or enable a "Highly Optimized Virtualization Configuration Set" in the
BIOS anyway.

>>
>> It's also uncommon to use all the CPUs for just OVS on a big system
>> with a huge number of cores.
>
> Yes, obviously, you don't want to use all your cores for OVS :-).
>
>>
>> Another thing is: can we really do this on the DPDK level? Maybe it'll
>> be a step towards dynamic registering/unregistering of non-EAL threads?
>
> I'd like to ultimately get to a dynamic register mechanism.
> But I first wanted to fix the existing code...
>
>> Since you're wiping out the meaning of an lcore as a CPU core, it
>> becomes just a thread_id from a DPDK point of view. So, maybe the
>> application could call a new API function instead of managing these
>> strange mappings by itself?
>
> ... as it is unlikely we will backport an experimental API to previous
> dpdk versions.
>
>
>>
>> One comment inline (not a full review).
>>
>> Best regards, Ilya Maximets.
>>
>>>
>>> OVS does not currently check which value is set in
>>> RTE_PER_LCORE(_lcore_id) which exposes us to potential crashes on DPDK
>>> side.
>>>
>>> Introduce a lcore allocator in OVS for PMD threads and map them to
>>> unused lcores from DPDK à la --lcores.
>>>
>>> The physical cores on which the PMD threads are running still
>>> constitute important information when debugging, so still keep
>>> those in the PMD thread names but add a new debug log when starting
>>> them.
>>>
>>> Synchronize DPDK internals on numa and cpuset for the PMD threads by
>>> registering them via the rte_thread_set_affinity() helper.
>>>
>>> Signed-off-by: David Marchand
>>> ---
>>>  lib/dpdk-stub.c   |  8 +-
>>>  lib/dpdk.c        | 69 +++
>>>  lib/dpdk.h        |  3 ++-
>>>  lib/dpif-netdev.c |  3 ++-
>>>  4 files changed, 75 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/lib/dpdk-stub.c b/lib/dpdk-stub.c
>>> index c332c217c..90473bc8e 100644
>>> --- a/lib/dpdk-stub.c
>>> +++ b/lib/dpdk-stub.c
>>> @@ -39,7 +39,13 @@ dpdk_init(const struct smap *ovs_other_config)
>>>  }
>>>
>>>  void
>>> -dpdk_set_lcore_id(unsigned cpu OVS_UNUSED)
>>> +dpdk_init_thread_context(unsigned cpu OVS_UNUSED)
>>> +{
>>> +    /* Nothing */
>>> +}
>>> +
>>> +void
>>> +dpdk_uninit_thread_context(void)
>>>  {
>>>      /* Nothing */
>>>  }
>>> diff --git a/lib/dpdk.c b/lib/dpdk.c
>>> index 21dd47e80..771baa413 100644
>>> --- a/lib/dpdk.c
>>> +++ b/lib/dpdk.c
>>> @@ -33,6 +33,7 @@
>>>
>>>  #include "dirs.h"
>>>  #include "fatal-signal.h"
>>> +#include "id-pool.h"
>>>  #include "netdev-dpdk.h"
>>>  #include "netdev-offload-provider.h"
>>>  #include "openvswitch/dynamic-string.h"
>>> @@ -55,6 +56,9 @@ static bool dpdk_initialized = false; /* Indicates successful initialization
>>>                                         * of DPDK. */
>>>  static bool per_port_memory = false; /* Status of per port memory support */
>>>
>>> +static struct id_pool *lcore_id_pool;

OVS_GUARDED_BY() would be nice.
>>> +static struct ovs_mutex lcore_id_pool_mutex = OVS_MUTEX_INITIALIZER;
>>> +
>>>  static int
>>>  process_vhost_flags(char *flag, const char *default_val, int size,
>>>                      const struct smap *ovs_other_config,
>>> @@ -346,7 +350,8 @@ dpdk_init__(const struct smap *ovs_other_config)
>>>          }
>>>      }
>>>
>>> -    if (args_contains(&args, "-c") || args_contains(&args, "-l")) {
>>> +    if (args_contains(&args, "-c") || args_contains(&args, "-l") ||
>>> +        args_contains(&args, "--lcores")) {
>>>          auto_determine = false;
>>>      }
>>>
>>> @@ -372,8 +377,8 @@ dpdk_init__(const struct smap *ovs_other_config)
>>>               * thread affintity - default to core #0 */
>>>              VLOG_ERR("Thread getaffinity failed. Using core #0");
>>>          }
>>> -        svec_add(&args, "-l");
>>> -        svec_add_nocopy(&args, xasprintf("%d", cpu));
>>> +
Re: [ovs-dev] [PATCH] dpdk: Support running PMD threads on cores > RTE_MAX_LCORE.
On Wed, Dec 4, 2019 at 9:21 PM Flavio Leitner wrote:
> BTW, it would be nice to have an ovs-appctl command to dump a list
> of IDs and CPUs in case the logs are rotated, config changed and
> whatnot. For instance, sos could get an accurate report dumping a
> list.

Good idea.
I will look at this once I get to the other point raised by Ilya.

Thanks.

--
David Marchand
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] dpdk: Support running PMD threads on cores > RTE_MAX_LCORE.
On Wed, Dec 04, 2019 at 09:09:42PM +0100, David Marchand wrote:
> On Wed, Dec 4, 2019 at 8:52 PM Flavio Leitner wrote:
> >
> > On Mon, Dec 02, 2019 at 05:03:30PM +0100, David Marchand wrote:
> > > Most DPDK components make the assumption that rte_lcore_id() returns
> > > a valid lcore_id in [0..RTE_MAX_LCORE] range (with the exception of
> > > the LCORE_ID_ANY special value).
> > >
> > > OVS does not currently check which value is set in
> > > RTE_PER_LCORE(_lcore_id) which exposes us to potential crashes on DPDK
> > > side.
> > >
> > > Introduce a lcore allocator in OVS for PMD threads and map them to
> > > unused lcores from DPDK à la --lcores.
> > >
> > > The physical cores on which the PMD threads are running still
> > > constitute important information when debugging, so still keep
> > > those in the PMD thread names but add a new debug log when starting
> > > them.
> > >
> > > Synchronize DPDK internals on numa and cpuset for the PMD threads by
> > > registering them via the rte_thread_set_affinity() helper.
> > >
> > > Signed-off-by: David Marchand
> > > ---
> >
> > I liked the idea because maybe we could lower RTE_MAX_LCORE and save
> > some memory. It's unusual to require the default amount anyways.
>
> About decreasing RTE_MAX_LCORE, I had that in mind, but the problem is
> existing deployments with -c/-l in ovsdb.
> We could try to parse this, reformat and only pass --lcores to dpdk.
> Still feels a bit dangerous.

Always the legacy stuff... :)

> > I changed RTE_MAX_LCORE to 4, and ran some tests with CPU 8:
> > 2019-12-04T18:27:22.797Z|1|dpdk(pmd-c08/id:3)|INFO|Initialised lcore 1
> > for core 8
> >
> > I didn't find any issues with either the patch or with the tests.
> >
> > Acked-by: Flavio Leitner
>
> Thanks for the review and test!

BTW, it would be nice to have an ovs-appctl command to dump a list
of IDs and CPUs in case the logs are rotated, config changed and
whatnot. For instance, sos could get an accurate report dumping a
list.
--
fbl
Re: [ovs-dev] [PATCH] dpdk: Support running PMD threads on cores > RTE_MAX_LCORE.
On Wed, Dec 4, 2019 at 8:52 PM Flavio Leitner wrote:
>
> On Mon, Dec 02, 2019 at 05:03:30PM +0100, David Marchand wrote:
> > Most DPDK components make the assumption that rte_lcore_id() returns
> > a valid lcore_id in [0..RTE_MAX_LCORE] range (with the exception of
> > the LCORE_ID_ANY special value).
> >
> > OVS does not currently check which value is set in
> > RTE_PER_LCORE(_lcore_id) which exposes us to potential crashes on DPDK
> > side.
> >
> > Introduce a lcore allocator in OVS for PMD threads and map them to
> > unused lcores from DPDK à la --lcores.
> >
> > The physical cores on which the PMD threads are running still
> > constitute important information when debugging, so still keep
> > those in the PMD thread names but add a new debug log when starting
> > them.
> >
> > Synchronize DPDK internals on numa and cpuset for the PMD threads by
> > registering them via the rte_thread_set_affinity() helper.
> >
> > Signed-off-by: David Marchand
> > ---
>
> I liked the idea because maybe we could lower RTE_MAX_LCORE and save
> some memory. It's unusual to require the default amount anyways.

About decreasing RTE_MAX_LCORE, I had that in mind, but the problem is
existing deployments with -c/-l in ovsdb.
We could try to parse this, reformat and only pass --lcores to dpdk.
Still feels a bit dangerous.

> I changed RTE_MAX_LCORE to 4, and ran some tests with CPU 8:
> 2019-12-04T18:27:22.797Z|1|dpdk(pmd-c08/id:3)|INFO|Initialised lcore 1 for core 8
>
> I didn't find any issues with either the patch or with the tests.
>
> Acked-by: Flavio Leitner

Thanks for the review and test!

--
David Marchand
Re: [ovs-dev] [PATCH] dpdk: Support running PMD threads on cores > RTE_MAX_LCORE.
On Mon, Dec 02, 2019 at 05:03:30PM +0100, David Marchand wrote:
> Most DPDK components make the assumption that rte_lcore_id() returns
> a valid lcore_id in [0..RTE_MAX_LCORE] range (with the exception of
> the LCORE_ID_ANY special value).
>
> OVS does not currently check which value is set in
> RTE_PER_LCORE(_lcore_id) which exposes us to potential crashes on DPDK
> side.
>
> Introduce a lcore allocator in OVS for PMD threads and map them to
> unused lcores from DPDK à la --lcores.
>
> The physical cores on which the PMD threads are running still
> constitute important information when debugging, so still keep
> those in the PMD thread names but add a new debug log when starting
> them.
>
> Synchronize DPDK internals on numa and cpuset for the PMD threads by
> registering them via the rte_thread_set_affinity() helper.
>
> Signed-off-by: David Marchand
> ---

I liked the idea because maybe we could lower RTE_MAX_LCORE and save
some memory. It's unusual to require the default amount anyways.

I changed RTE_MAX_LCORE to 4, and ran some tests with CPU 8:
2019-12-04T18:27:22.797Z|1|dpdk(pmd-c08/id:3)|INFO|Initialised lcore 1 for core 8

I didn't find any issues with either the patch or with the tests.

Acked-by: Flavio Leitner

Thanks
fbl
Re: [ovs-dev] [PATCH] dpdk: Support running PMD threads on cores > RTE_MAX_LCORE.
On Tue, Dec 3, 2019 at 12:34 PM Ilya Maximets wrote:
>
> On 02.12.2019 17:03, David Marchand wrote:
> > Most DPDK components make the assumption that rte_lcore_id() returns
> > a valid lcore_id in [0..RTE_MAX_LCORE] range (with the exception of
> > the LCORE_ID_ANY special value).
>
> Do you think that makes practical sense? In real virtualization
> environments users usually use the CPUs with lower numbers, and it's
> most likely possible to change the NUMA layout to have CPUs from all
> the nodes enumerated from the low CPU numbers.

That is if you can reconfigure the NUMA layout.
How do you achieve this?
Bios/firmware configuration?
Kernel boot options (did not find)?

I imagine it would be a pain to enforce this on a fleet of servers
with different hw specs.

> It's also uncommon to use all the CPUs for just OVS on a big system
> with a huge number of cores.

Yes, obviously, you don't want to use all your cores for OVS :-).

> Another thing is: can we really do this on the DPDK level? Maybe it'll
> be a step towards dynamic registering/unregistering of non-EAL threads?

I'd like to ultimately get to a dynamic register mechanism.
But I first wanted to fix the existing code...

> Since you're wiping out the meaning of an lcore as a CPU core, it
> becomes just a thread_id from a DPDK point of view. So, maybe the
> application could call a new API function instead of managing these
> strange mappings by itself?

... as it is unlikely we will backport an experimental API to previous
dpdk versions.

>
> One comment inline (not a full review).
>
> Best regards, Ilya Maximets.
>
> >
> > OVS does not currently check which value is set in
> > RTE_PER_LCORE(_lcore_id) which exposes us to potential crashes on DPDK
> > side.
> >
> > Introduce a lcore allocator in OVS for PMD threads and map them to
> > unused lcores from DPDK à la --lcores.
> >
> > The physical cores on which the PMD threads are running still
> > constitute important information when debugging, so still keep
> > those in the PMD thread names but add a new debug log when starting
> > them.
> >
> > Synchronize DPDK internals on numa and cpuset for the PMD threads by
> > registering them via the rte_thread_set_affinity() helper.
> >
> > Signed-off-by: David Marchand
> > ---
> >  lib/dpdk-stub.c   |  8 +-
> >  lib/dpdk.c        | 69 +++
> >  lib/dpdk.h        |  3 ++-
> >  lib/dpif-netdev.c |  3 ++-
> >  4 files changed, 75 insertions(+), 8 deletions(-)
> >
> > diff --git a/lib/dpdk-stub.c b/lib/dpdk-stub.c
> > index c332c217c..90473bc8e 100644
> > --- a/lib/dpdk-stub.c
> > +++ b/lib/dpdk-stub.c
> > @@ -39,7 +39,13 @@ dpdk_init(const struct smap *ovs_other_config)
> >  }
> >
> >  void
> > -dpdk_set_lcore_id(unsigned cpu OVS_UNUSED)
> > +dpdk_init_thread_context(unsigned cpu OVS_UNUSED)
> > +{
> > +    /* Nothing */
> > +}
> > +
> > +void
> > +dpdk_uninit_thread_context(void)
> >  {
> >      /* Nothing */
> >  }
> > diff --git a/lib/dpdk.c b/lib/dpdk.c
> > index 21dd47e80..771baa413 100644
> > --- a/lib/dpdk.c
> > +++ b/lib/dpdk.c
> > @@ -33,6 +33,7 @@
> >
> >  #include "dirs.h"
> >  #include "fatal-signal.h"
> > +#include "id-pool.h"
> >  #include "netdev-dpdk.h"
> >  #include "netdev-offload-provider.h"
> >  #include "openvswitch/dynamic-string.h"
> > @@ -55,6 +56,9 @@ dpdk_init__ static bool dpdk_initialized = false; /* Indicates successful initialization
> >                                         * of DPDK. */
> >  static bool per_port_memory = false; /* Status of per port memory support */
> >
> > +static struct id_pool *lcore_id_pool;
> > +static struct ovs_mutex lcore_id_pool_mutex = OVS_MUTEX_INITIALIZER;
> > +
> >  static int
> >  process_vhost_flags(char *flag, const char *default_val, int size,
> >                      const struct smap *ovs_other_config,
> > @@ -346,7 +350,8 @@ dpdk_init__(const struct smap *ovs_other_config)
> >          }
> >      }
> >
> > -    if (args_contains(&args, "-c") || args_contains(&args, "-l")) {
> > +    if (args_contains(&args, "-c") || args_contains(&args, "-l") ||
> > +        args_contains(&args, "--lcores")) {
> >          auto_determine = false;
> >      }
> >
> > @@ -372,8 +377,8 @@ dpdk_init__(const struct smap *ovs_other_config)
> >               * thread affintity - default to core #0 */
> >              VLOG_ERR("Thread getaffinity failed. Using core #0");
> >          }
> > -        svec_add(&args, "-l");
> > -        svec_add_nocopy(&args, xasprintf("%d", cpu));
> > +        svec_add(&args, "--lcores");
> > +        svec_add_nocopy(&args, xasprintf("0@%d", cpu));
> >      }
> >
> >      svec_terminate(&args);
> > @@ -429,6 +434,23 @@ dpdk_init__(const struct smap *ovs_other_config)
> >          }
> >      }
> >
> > +    ovs_mutex_lock(&lcore_id_pool_mutex);
> > +    lcore_id_pool = id_pool_create(0, RTE_MAX_LCORE);
> > +    /* Empty the whole pool... */
> > +    for (uint32_t lcore = 0; lcore < RTE_MAX_LCORE; lcore++) {
> > +        uint32_t lcore_id;
> > +
> > +        id_pool_alloc_id(lcore_id_pool,
Re: [ovs-dev] [PATCH] dpdk: Support running PMD threads on cores > RTE_MAX_LCORE.
On 02.12.2019 17:03, David Marchand wrote:
> Most DPDK components make the assumption that rte_lcore_id() returns
> a valid lcore_id in [0..RTE_MAX_LCORE] range (with the exception of
> the LCORE_ID_ANY special value).

Do you think that makes practical sense? In real virtualization
environments users usually use the CPUs with lower numbers, and it's
most likely possible to change the NUMA layout to have CPUs from all
the nodes enumerated from the low CPU numbers.

It's also uncommon to use all the CPUs for just OVS on a big system
with a huge number of cores.

Another thing is: can we really do this on the DPDK level? Maybe it'll
be a step towards dynamic registering/unregistering of non-EAL threads?

Since you're wiping out the meaning of an lcore as a CPU core, it
becomes just a thread_id from a DPDK point of view. So, maybe the
application could call a new API function instead of managing these
strange mappings by itself?

One comment inline (not a full review).

Best regards, Ilya Maximets.

>
> OVS does not currently check which value is set in
> RTE_PER_LCORE(_lcore_id) which exposes us to potential crashes on DPDK
> side.
>
> Introduce a lcore allocator in OVS for PMD threads and map them to
> unused lcores from DPDK à la --lcores.
>
> The physical cores on which the PMD threads are running still
> constitute important information when debugging, so still keep
> those in the PMD thread names but add a new debug log when starting
> them.
>
> Synchronize DPDK internals on numa and cpuset for the PMD threads by
> registering them via the rte_thread_set_affinity() helper.
>
> Signed-off-by: David Marchand
> ---
>  lib/dpdk-stub.c   |  8 +-
>  lib/dpdk.c        | 69 +++
>  lib/dpdk.h        |  3 ++-
>  lib/dpif-netdev.c |  3 ++-
>  4 files changed, 75 insertions(+), 8 deletions(-)
>
> diff --git a/lib/dpdk-stub.c b/lib/dpdk-stub.c
> index c332c217c..90473bc8e 100644
> --- a/lib/dpdk-stub.c
> +++ b/lib/dpdk-stub.c
> @@ -39,7 +39,13 @@ dpdk_init(const struct smap *ovs_other_config)
>  }
>
>  void
> -dpdk_set_lcore_id(unsigned cpu OVS_UNUSED)
> +dpdk_init_thread_context(unsigned cpu OVS_UNUSED)
> +{
> +    /* Nothing */
> +}
> +
> +void
> +dpdk_uninit_thread_context(void)
>  {
>      /* Nothing */
>  }
> diff --git a/lib/dpdk.c b/lib/dpdk.c
> index 21dd47e80..771baa413 100644
> --- a/lib/dpdk.c
> +++ b/lib/dpdk.c
> @@ -33,6 +33,7 @@
>
>  #include "dirs.h"
>  #include "fatal-signal.h"
> +#include "id-pool.h"
>  #include "netdev-dpdk.h"
>  #include "netdev-offload-provider.h"
>  #include "openvswitch/dynamic-string.h"
> @@ -55,6 +56,9 @@ static bool dpdk_initialized = false; /* Indicates successful initialization
>                                         * of DPDK. */
>  static bool per_port_memory = false; /* Status of per port memory support */
>
> +static struct id_pool *lcore_id_pool;
> +static struct ovs_mutex lcore_id_pool_mutex = OVS_MUTEX_INITIALIZER;
> +
>  static int
>  process_vhost_flags(char *flag, const char *default_val, int size,
>                      const struct smap *ovs_other_config,
> @@ -346,7 +350,8 @@ dpdk_init__(const struct smap *ovs_other_config)
>          }
>      }
>
> -    if (args_contains(&args, "-c") || args_contains(&args, "-l")) {
> +    if (args_contains(&args, "-c") || args_contains(&args, "-l") ||
> +        args_contains(&args, "--lcores")) {
>          auto_determine = false;
>      }
>
> @@ -372,8 +377,8 @@ dpdk_init__(const struct smap *ovs_other_config)
>               * thread affintity - default to core #0 */
>              VLOG_ERR("Thread getaffinity failed. Using core #0");
>          }
> -        svec_add(&args, "-l");
> -        svec_add_nocopy(&args, xasprintf("%d", cpu));
> +        svec_add(&args, "--lcores");
> +        svec_add_nocopy(&args, xasprintf("0@%d", cpu));
>      }
>
>      svec_terminate(&args);
> @@ -429,6 +434,23 @@ dpdk_init__(const struct smap *ovs_other_config)
>          }
>      }
>
> +    ovs_mutex_lock(&lcore_id_pool_mutex);
> +    lcore_id_pool = id_pool_create(0, RTE_MAX_LCORE);
> +    /* Empty the whole pool... */
> +    for (uint32_t lcore = 0; lcore < RTE_MAX_LCORE; lcore++) {
> +        uint32_t lcore_id;
> +
> +        id_pool_alloc_id(lcore_id_pool, &lcore_id);
> +    }
> +    /* ...and release the unused spots. */
> +    for (uint32_t lcore = 0; lcore < RTE_MAX_LCORE; lcore++) {
> +        if (rte_eal_lcore_role(lcore) != ROLE_OFF) {
> +            continue;
> +        }
> +        id_pool_free_id(lcore_id_pool, lcore);
> +    }
> +    ovs_mutex_unlock(&lcore_id_pool_mutex);
> +
>      /* We are called from the main thread here */
>      RTE_PER_LCORE(_lcore_id) = NON_PMD_CORE_ID;
>
> @@ -522,11 +544,48 @@ dpdk_available(void)
>  }
>
>  void
> -dpdk_set_lcore_id(unsigned cpu)
> +dpdk_init_thread_context(unsigned cpu)
>  {
> +    cpu_set_t cpuset;
> +    unsigned lcore;
> +    int err;
> +
>      /* NON_PMD_CORE_ID is reserved for use by non pmd threads. */
>      ovs_assert(cpu != NON_PMD_CORE_ID);
> -