On 12/01/2021 16:24, Stokes, Ian wrote: > Hi David, > > Thanks for the patch, a few comments and queries inline. > >> DPDK 20.08 introduced a new API that associates a non-EAL thread to a free >> lcore. This new API does not change the thread characteristics (like CPU >> affinity). >> Using this new API, there is no assumption on lcore X running on cpu X >> anymore which leaves OVS free from running its PMD thread on any cpu. > > Possibly just the wording above, " which leaves OVS free from running it's > PMD on any cpu". > I thought OVS was allowed to run its PMDs on any core anyway as long as they > were <= RTE_MAX_LCORE. > Does above mean free to run on any core greater than RTE_MAX_LCORE?. > >> The DPDK multiprocess feature is not compatible with this new API and is >> disabled. >> > > I think that's OK, I'm trying to think where this is a use case, since PDUMP > was removed I'm not aware of any other case. > I would assume we are still allowed to run another DPDK applications on the > same host as long as the --file-prefix was specified? > > For example I've come across people testing with TESTPMD and virtio vdevs on > the same host as OVS DPDK in the past. > > Out of interest, did you give this patch a run with the OVS DPDK unit tests? > >> DPDK still limits the number of lcores to RTE_MAX_LCORE (128 on x86_64) >> which should be enough for OVS pmd threads (hopefully). > > I'm hard pressed to think of a case where it would not be sufficient tbh. > >> >> DPDK lcore/OVS pmd threads mapping are logged at threads creation and >> destruction. >> A new command is added to help get DPDK point of view of the DPDK lcores: >> >> $ ovs-appctl dpdk/lcores-list >> lcore 0, socket 0, role RTE, cpuset 0 >> lcore 1, socket 0, role NON_EAL, cpuset 1 >> lcore 2, socket 0, role NON_EAL, cpuset 15 >> >> Signed-off-by: David Marchand <[email protected]>
I didn't re-test this time due to a local issue, but the additional explicit call to disable mp and documentation LGTM. Acked-by: Kevin Traynor <[email protected]> Comment about experimental API below. >> --- >> Changes since v4: >> - rebased on the master branch, >> - disabled DPDK mp feature, >> - updated DPDK documentation and manual with the new command, >> - added notes in NEWS, >> >> Changes since v3: >> - rebased on current HEAD, >> - switched back to simple warning rather than abort when registering a >> thread fails, >> >> Changes since v2: >> - introduced a new api in DPDK 20.08 (still being discussed), inbox thread at >> http://inbox.dpdk.org/dev/20200610144506.30505-1- >> [email protected]/T/#t >> - this current patch depends on a patch on master I sent: >> >> https://patchwork.ozlabs.org/project/openvswitch/patch/20200626122738.2 >> [email protected]/ >> - dropped 'dpdk-lcore-mask' compat handling, >> >> Changes since v1: >> - rewired existing configuration 'dpdk-lcore-mask' to use --lcores, >> - switched to a bitmap to track lcores, >> - added a command to dump current mapping (Flavio): used an experimental >> API to get DPDK lcores cpuset since it is the most reliable/portable >> information, >> - used the same code for the logs when starting DPDK/PMD threads, >> - addressed Ilya comments, >> >> --- >> Documentation/howto/dpdk.rst | 5 +++++ >> NEWS | 2 ++ >> lib/dpdk-stub.c | 8 ++++++- >> lib/dpdk-unixctl.man | 2 ++ >> lib/dpdk.c | 42 ++++++++++++++++++++++++++++++++++-- >> lib/dpdk.h | 3 ++- >> lib/dpif-netdev.c | 3 ++- >> 7 files changed, 60 insertions(+), 5 deletions(-) >> >> diff --git a/Documentation/howto/dpdk.rst >> b/Documentation/howto/dpdk.rst >> index f0d45e47b6..8492f354ce 100644 >> --- a/Documentation/howto/dpdk.rst >> +++ b/Documentation/howto/dpdk.rst >> @@ -399,6 +399,11 @@ Supported actions for hardware offload are: >> - Modification of IPv6 (set_field:<ADDR>->ipv6_src/ipv6_dst/mod_nw_ttl). >> - Clone/output (tnl_push and output) for encapsulating over a tunnel. >> >> +Multiprocess >> +------------ >> + >> +This DPDK feature is not supported and disabled during OVS initialization. >> + >> Further Reading >> --------------- >> >> diff --git a/NEWS b/NEWS >> index d357da31d8..f59311ab63 100644 >> --- a/NEWS >> +++ b/NEWS >> @@ -12,6 +12,8 @@ Post-v2.14.0 >> - DPDK: >> * Removed support for vhost-user dequeue zero-copy. >> * Add support for DPDK 20.11. >> + * Forbid use of DPDK multiprocess feature. >> + * Add support for running threads on cores > RTE_MAX_LCORE. >> - Userspace datapath: >> * Add the 'pmd' option to "ovs-appctl dpctl/dump-flows", which >> restricts a flow dump to a single PMD thread if set. >> diff --git a/lib/dpdk-stub.c b/lib/dpdk-stub.c >> index b7d577870d..5bc996b665 100644 >> --- a/lib/dpdk-stub.c >> +++ b/lib/dpdk-stub.c >> @@ -39,7 +39,13 @@ dpdk_init(const struct smap *ovs_other_config) >> } >> >> void >> -dpdk_set_lcore_id(unsigned cpu OVS_UNUSED) >> +dpdk_init_thread_context(unsigned cpu OVS_UNUSED) >> +{ >> + /* Nothing */ >> +} >> + >> +void >> +dpdk_uninit_thread_context(void) >> { >> /* Nothing */ >> } >> diff --git a/lib/dpdk-unixctl.man b/lib/dpdk-unixctl.man >> index 2d6d576f24..c28bb4ef20 100644 >> --- a/lib/dpdk-unixctl.man >> +++ b/lib/dpdk-unixctl.man >> @@ -10,5 +10,7 @@ list of words separated by spaces: a word can be either a >> logging \fBlevel\fR >> \fBnotice\fR, \fBinfo\fR or \fBdebug\fR) or a \fBpattern\fR matching DPDK >> components (see \fBdpdk/log-list\fR command on \fBovs\-appctl\fR(8)) >> separated >> by a colon from the logging \fBlevel\fR to apply. >> +.IP "\fBdpdk/lcore-list\fR" >> +Lists the DPDK lcores and their cpu affinity. > > Minor nit, is it worth mentioning "role type" as well in the description > above? > >> .RE >> . >> diff --git a/lib/dpdk.c b/lib/dpdk.c >> index 319540394b..defac3f7de 100644 >> --- a/lib/dpdk.c >> +++ b/lib/dpdk.c >> @@ -14,6 +14,10 @@ >> * limitations under the License. >> */ >> >> +#ifndef ALLOW_EXPERIMENTAL_API >> +#define ALLOW_EXPERIMENTAL_API >> +#endif >> + > > So a bit of a wider question, how does the community feel about allowing > experimental API into the code base? > > What is the feeling with the API in DPDK, will there be further modification > to calls such as rte_mp_disable? > > Will the experimental tag be removed in a future releases (e.g. 21.02). > > In the past we made an exception for the rte_meter library and experimental > API because it was replacing original rte_meter functionality OVS already > used and I think it was an honest oversight to remove the experimental tag > for the 19.11 release for those functions. > > But this could open up the door to accepting experimental APIs in other areas > of the OVS code base from DPDK also. > > I raised this at the conference in December, genuinely interested to hear > peoples thoughts? > I think it should be considered case by case: - What is the benefit of the feature it enables? - Is the API likely to change? - What is the likely impact to OVS of it changing? in terms of code churn and features. - Is the author willing to rework OVS if it changes? It is also not just a one way thing. OVS using the experimental API can be feedback to either solidify the API or identify gaps. I think OVS using the API successfully can be a strong influence on it becoming stable in DPDK. >> #include <config.h> >> #include "dpdk.h" >> >> @@ -502,6 +506,12 @@ dpdk_init__(const struct smap *ovs_other_config) >> return false; >> } >> >> + if (!rte_mp_disable()) { >> + VLOG_EMER("Could not disable multiprocess, DPDK won't be >> available."); >> + rte_eal_cleanup(); >> + return false; >> + } >> + >> if (VLOG_IS_DBG_ENABLED()) { >> size_t size; >> char *response = NULL; >> @@ -525,6 +535,8 @@ dpdk_init__(const struct smap *ovs_other_config) >> dpdk_unixctl_mem_stream, rte_log_dump); >> unixctl_command_register("dpdk/log-set", "{level | pattern:level}", 0, >> INT_MAX, dpdk_unixctl_log_set, NULL); >> + unixctl_command_register("dpdk/lcores-list", "", 0, 0, >> + dpdk_unixctl_mem_stream, rte_lcore_dump); >> >> /* We are called from the main thread here */ >> RTE_PER_LCORE(_lcore_id) = NON_PMD_CORE_ID; >> @@ -601,11 +613,37 @@ dpdk_available(void) >> } >> >> void >> -dpdk_set_lcore_id(unsigned cpu) >> +dpdk_init_thread_context(unsigned cpu) >> { >> /* NON_PMD_CORE_ID is reserved for use by non pmd threads. */ >> ovs_assert(cpu != NON_PMD_CORE_ID); >> - RTE_PER_LCORE(_lcore_id) = cpu; >> + >> + if (!dpdk_available()) { >> + return; >> + } >> + >> + if (rte_thread_register() < 0) { >> + VLOG_WARN("This OVS pmd thread will share resources with the non- >> pmd " >> + "thread: %s.", rte_strerror(rte_errno)); >> + } else { >> + VLOG_INFO("PMD thread uses DPDK lcore %u.", rte_lcore_id()); >> + } >> +} > > I'd like to do a little more testing around the case where the PMD thread is > shared with a non-pmd thread above. > > Do you typically see this usecase? > >> + >> +void >> +dpdk_uninit_thread_context(void) >> +{ >> + unsigned int lcore_id; >> + >> + if (!dpdk_available()) { >> + return; >> + } >> + >> + lcore_id = rte_lcore_id(); >> + rte_thread_unregister(); >> + if (lcore_id != LCORE_ID_ANY) { >> + VLOG_INFO("PMD thread released DPDK lcore %u.", lcore_id); >> + } >> } >> >> void >> diff --git a/lib/dpdk.h b/lib/dpdk.h >> index 445a51d065..1bd16b31db 100644 >> --- a/lib/dpdk.h >> +++ b/lib/dpdk.h >> @@ -36,7 +36,8 @@ struct smap; >> struct ovsrec_open_vswitch; >> >> void dpdk_init(const struct smap *ovs_other_config); >> -void dpdk_set_lcore_id(unsigned cpu); >> +void dpdk_init_thread_context(unsigned cpu); >> +void dpdk_uninit_thread_context(void); >> const char *dpdk_get_vhost_sock_dir(void); >> bool dpdk_vhost_iommu_enabled(void); >> bool dpdk_vhost_postcopy_enabled(void); >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c >> index 300861ca59..da9e1e8fcf 100644 >> --- a/lib/dpif-netdev.c >> +++ b/lib/dpif-netdev.c >> @@ -5926,7 +5926,7 @@ pmd_thread_main(void *f_) >> /* Stores the pmd thread's 'pmd' to 'per_pmd_key'. */ >> ovsthread_setspecific(pmd->dp->per_pmd_key, pmd); >> ovs_numa_thread_setaffinity_core(pmd->core_id); >> - dpdk_set_lcore_id(pmd->core_id); >> + dpdk_init_thread_context(pmd->core_id); >> poll_cnt = pmd_load_queues_and_ports(pmd, &poll_list); >> dfc_cache_init(&pmd->flow_cache); >> pmd_alloc_static_tx_qid(pmd); >> @@ -6061,6 +6061,7 @@ reload: >> dfc_cache_uninit(&pmd->flow_cache); >> free(poll_list); >> pmd_free_cached_ports(pmd); >> + dpdk_uninit_thread_context(); >> return NULL; >> } >> > > Again, I've a small bit more testing to do, so far it seems OK to me. Just > would like to get the experimental conversation going as I see it as one of > the potential blockers. > > Regards > Ian >> -- >> 2.23.0 > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
