> > Looks good to me.
> >
> > Acked-by: Ilya Maximets <[email protected]>
>
I'll pull this in with this week's pull request, thanks all for the effort.

Thanks
Ian

> On 31.08.2018 11:47, Kevin Traynor wrote:
> > Prior to OVS 2.9 automatic assignment of Rxqs to PMDs (i.e. CPUs) was
> > done by round-robin.
> >
> > That was changed in OVS 2.9 to ordering the Rxqs based on their
> > measured processing cycles. This was to assign the busiest Rxqs to
> > different PMDs, improving aggregate throughput.
> >
> > For the most part the new scheme should be better, but there could be
> > situations where a user prefers a simple round-robin scheme because
> > Rxqs from a single port are more likely to be spread across multiple
> > PMDs, and/or traffic is very bursty/unpredictable.
> >
> > Add 'pmd-rxq-assign' config to allow a user to select round-robin
> > based assignment.
> >
> > Signed-off-by: Kevin Traynor <[email protected]>
> > Acked-by: Eelco Chaudron <[email protected]>
> > ---
> >
> > V5:
> > - Squashed NEWS into main patch (Ben)
> >
> > V4:
> > - Modified warning log (Ilya)
> >
> > V3:
> > - Rolled in some style and vswitch.xml changes (Ilya)
> > - Set cycles mode by default on wrong config (Ilya)
> >
> > V2:
> > - simplified nextpmd change (Eelco)
> > - removed confusing doc sentence (Eelco)
> > - fixed commit msg (Ilya)
> > - made change in pmd-rxq-assign value also perform re-assignment (Ilya)
> > - renamed to roundrobin mode (Ilya)
> > - moved vswitch.xml changes to right config section (Ilya)
> > - comment/log updates
> > - moved NEWS update to separate patch as it's been changing on master
> >
> >  Documentation/topics/dpdk/pmd.rst | 33 +++++++++++++---
> >  NEWS                              |  4 +-
> >  lib/dpif-netdev.c                 | 83 +++++++++++++++++++++++++++++----------
> >  tests/pmd.at                      | 12 +++++-
> >  vswitchd/vswitch.xml              | 24 +++++++++++
> >  5 files changed, 126 insertions(+), 30 deletions(-)
> >
> > diff --git a/Documentation/topics/dpdk/pmd.rst b/Documentation/topics/dpdk/pmd.rst
> > index 5f0671e..dd9172d 100644
> > --- a/Documentation/topics/dpdk/pmd.rst
> > +++ b/Documentation/topics/dpdk/pmd.rst
> > @@ -113,10 +113,15 @@ means that this thread will only poll the *pinned* Rx queues.
> >
> >  If ``pmd-rxq-affinity`` is not set for Rx queues, they will be assigned to PMDs
> > -(cores) automatically. Where known, the processing cycles that have been stored
> > -for each Rx queue will be used to assign Rx queue to PMDs based on a round
> > -robin of the sorted Rx queues. For example, take the following example, where
> > -there are five Rx queues and three cores - 3, 7, and 8 - available and the
> > -measured usage of core cycles per Rx queue over the last interval is seen to
> > -be:
> > +(cores) automatically.
> > +
> > +The algorithm used to automatically assign Rxqs to PMDs can be set by::
> > +
> > +    $ ovs-vsctl set Open_vSwitch . other_config:pmd-rxq-assign=<assignment>
> > +
> > +By default, ``cycles`` assignment is used where the Rxqs will be ordered by
> > +their measured processing cycles, and then be evenly assigned in descending
> > +order to PMDs based on an up/down walk of the PMDs. For example, where there
> > +are five Rx queues and three cores - 3, 7, and 8 - available and the measured
> > +usage of core cycles per Rx queue over the last interval is seen to be:
> >
> >  - Queue #0: 30%
> > @@ -132,4 +137,20 @@ The Rx queues will be assigned to the cores in the following order::
> >      Core 8: Q3 (60%) | Q0 (30%)
> >
> > +Alternatively, ``roundrobin`` assignment can be used, where the Rxqs are
> > +assigned to PMDs in a round-robined fashion. This algorithm was used by
> > +default prior to OVS 2.9.
> > +For example, given the following ports and queues:
> > +
> > +- Port #0 Queue #0 (P0Q0)
> > +- Port #0 Queue #1 (P0Q1)
> > +- Port #1 Queue #0 (P1Q0)
> > +- Port #1 Queue #1 (P1Q1)
> > +- Port #1 Queue #2 (P1Q2)
> > +
> > +The Rx queues may be assigned to the cores in the following order::
> > +
> > +    Core 3: P0Q0 | P1Q1
> > +    Core 7: P0Q1 | P1Q2
> > +    Core 8: P1Q0 |
> > +
> >  To see the current measured usage history of PMD core cycles for each Rx
> >  queue::
> >
> > diff --git a/NEWS b/NEWS
> > index 33b4d8a..33b3638 100644
> > --- a/NEWS
> > +++ b/NEWS
> > @@ -9,5 +9,7 @@ Post-v2.10.0
> >     - ovn:
> >       * ovn-ctl: allow passing user:group ids to the OVN daemons.
> > -
> > +   - DPDK:
> > +     * Add option for simple round-robin based Rxq to PMD assignment.
> > +       It can be set with pmd-rxq-assign.
> >
> >  v2.10.0 - xx xxx xxxx
> >
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index 807a462..ae24e6a 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -349,4 +349,6 @@ struct dp_netdev {
> >      struct id_pool *tx_qid_pool;
> >      struct ovs_mutex tx_qid_pool_mutex;
> > +    /* Use measured cycles for rxq to pmd assignment. */
> > +    bool pmd_rxq_assign_cyc;
> >
> >      /* Protects the access of the 'struct dp_netdev_pmd_thread'
> > @@ -1500,4 +1502,5 @@ create_dp_netdev(const char *name, const struct dpif_class *class,
> >
> >      cmap_init(&dp->poll_threads);
> > +    dp->pmd_rxq_assign_cyc = true;
> >
> >      ovs_mutex_init(&dp->tx_qid_pool_mutex);
> > @@ -3724,4 +3727,6 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
> >      struct dp_netdev *dp = get_dp_netdev(dpif);
> >      const char *cmask = smap_get(other_config, "pmd-cpu-mask");
> > +    const char *pmd_rxq_assign = smap_get_def(other_config, "pmd-rxq-assign",
> > +                                              "cycles");
> >      unsigned long long insert_prob =
> >          smap_get_ullong(other_config, "emc-insert-inv-prob",
> > @@ -3786,4 +3791,18 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
> >          }
> >      }
> > +
> > +    bool pmd_rxq_assign_cyc = !strcmp(pmd_rxq_assign, "cycles");
> > +    if (!pmd_rxq_assign_cyc && strcmp(pmd_rxq_assign, "roundrobin")) {
> > +        VLOG_WARN("Unsupported Rxq to PMD assignment mode in pmd-rxq-assign. "
> > +                  "Defaulting to 'cycles'.");
> > +        pmd_rxq_assign_cyc = true;
> > +        pmd_rxq_assign = "cycles";
> > +    }
> > +    if (dp->pmd_rxq_assign_cyc != pmd_rxq_assign_cyc) {
> > +        dp->pmd_rxq_assign_cyc = pmd_rxq_assign_cyc;
> > +        VLOG_INFO("Rxq to PMD assignment mode changed to: \'%s\'.",
> > +                  pmd_rxq_assign);
> > +        dp_netdev_request_reconfigure(dp);
> > +    }
> >      return 0;
> >  }
> > @@ -4256,8 +4275,16 @@ rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr)
> >  }
> >
> > -/* Returns the next pmd from the numa node in
> > - * incrementing or decrementing order. */
> > +/*
> > + * Returns the next pmd from the numa node.
> > + *
> > + * If 'updown' is 'true' it will alternate between selecting the next pmd in
> > + * either an up or down walk, switching between up/down when the first or last
> > + * core is reached. e.g. 1,2,3,3,2,1,1,2...
> > + *
> > + * If 'updown' is 'false' it will select the next pmd wrapping around when last
> > + * core reached. e.g. 1,2,3,1,2,3,1,2...
> > + */
> >  static struct dp_netdev_pmd_thread *
> > -rr_numa_get_pmd(struct rr_numa *numa)
> > +rr_numa_get_pmd(struct rr_numa *numa, bool updown)
> >  {
> >      int numa_idx = numa->cur_index;
> > @@ -4267,5 +4294,9 @@ rr_numa_get_pmd(struct rr_numa *numa)
> >      if (numa->cur_index == numa->n_pmds-1) {
> >          /* Reached the last pmd. */
> > -        numa->idx_inc = false;
> > +        if (updown) {
> > +            numa->idx_inc = false;
> > +        } else {
> > +            numa->cur_index = 0;
> > +        }
> >      } else {
> >          numa->cur_index++;
> > @@ -4330,7 +4361,4 @@ compare_rxq_cycles(const void *a, const void *b)
> >   * pmds to unpinned queues.
> >   *
> > - * If 'pinned' is false queues will be sorted by processing cycles they are
> > - * consuming and then assigned to pmds in round robin order.
> > - *
> >   * The function doesn't touch the pmd threads, it just stores the assignment
> >   * in the 'pmd' member of each rxq. */
> > @@ -4345,4 +4373,5 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
> >      struct rr_numa *numa = NULL;
> >      int numa_id;
> > +    bool assign_cyc = dp->pmd_rxq_assign_cyc;
> >
> >      HMAP_FOR_EACH (port, node, &dp->ports) {
> > @@ -4375,10 +4404,13 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
> >                      rxqs = xrealloc(rxqs, sizeof *rxqs * (n_rxqs + 1));
> >                  }
> > -                /* Sum the queue intervals and store the cycle history. */
> > -                for (unsigned i = 0; i < PMD_RXQ_INTERVAL_MAX; i++) {
> > -                    cycle_hist += dp_netdev_rxq_get_intrvl_cycles(q, i);
> > -                }
> > -                dp_netdev_rxq_set_cycles(q, RXQ_CYCLES_PROC_HIST, cycle_hist);
> >
> > +                if (assign_cyc) {
> > +                    /* Sum the queue intervals and store the cycle history. */
> > +                    for (unsigned i = 0; i < PMD_RXQ_INTERVAL_MAX; i++) {
> > +                        cycle_hist += dp_netdev_rxq_get_intrvl_cycles(q, i);
> > +                    }
> > +                    dp_netdev_rxq_set_cycles(q, RXQ_CYCLES_PROC_HIST,
> > +                                             cycle_hist);
> > +                }
> >                  /* Store the queue. */
> >                  rxqs[n_rxqs++] = q;
> > @@ -4387,5 +4419,5 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
> >      }
> >
> > -    if (n_rxqs > 1) {
> > +    if (n_rxqs > 1 && assign_cyc) {
> >          /* Sort the queues in order of the processing cycles
> >           * they consumed during their last pmd interval. */
> > @@ -4411,5 +4443,5 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
> >                  continue;
> >              }
> > -            rxqs[i]->pmd = rr_numa_get_pmd(non_local_numa);
> > +            rxqs[i]->pmd = rr_numa_get_pmd(non_local_numa, assign_cyc);
> >              VLOG_WARN("There's no available (non-isolated) pmd thread "
> >                        "on numa node %d. Queue %d on port \'%s\' will "
> > @@ -4420,11 +4452,20 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
> >                        rxqs[i]->pmd->core_id, rxqs[i]->pmd->numa_id);
> >          } else {
> > -            rxqs[i]->pmd = rr_numa_get_pmd(numa);
> > -            VLOG_INFO("Core %d on numa node %d assigned port \'%s\' "
> > -                      "rx queue %d (measured processing cycles %"PRIu64").",
> > -                      rxqs[i]->pmd->core_id, numa_id,
> > -                      netdev_rxq_get_name(rxqs[i]->rx),
> > -                      netdev_rxq_get_queue_id(rxqs[i]->rx),
> > -                      dp_netdev_rxq_get_cycles(rxqs[i], RXQ_CYCLES_PROC_HIST));
> > +            rxqs[i]->pmd = rr_numa_get_pmd(numa, assign_cyc);
> > +            if (assign_cyc) {
> > +                VLOG_INFO("Core %d on numa node %d assigned port \'%s\' "
> > +                          "rx queue %d "
> > +                          "(measured processing cycles %"PRIu64").",
> > +                          rxqs[i]->pmd->core_id, numa_id,
> > +                          netdev_rxq_get_name(rxqs[i]->rx),
> > +                          netdev_rxq_get_queue_id(rxqs[i]->rx),
> > +                          dp_netdev_rxq_get_cycles(rxqs[i],
> > +                                                   RXQ_CYCLES_PROC_HIST));
> > +            } else {
> > +                VLOG_INFO("Core %d on numa node %d assigned port \'%s\' "
> > +                          "rx queue %d.", rxqs[i]->pmd->core_id, numa_id,
> > +                          netdev_rxq_get_name(rxqs[i]->rx),
> > +                          netdev_rxq_get_queue_id(rxqs[i]->rx));
> > +            }
> >          }
> >      }
> > diff --git a/tests/pmd.at b/tests/pmd.at
> > index 4cae6c8..1f952f3 100644
> > --- a/tests/pmd.at
> > +++ b/tests/pmd.at
> > @@ -62,5 +62,6 @@ m4_define([CHECK_PMD_THREADS_CREATED], [
> >
> >  m4_define([SED_NUMA_CORE_PATTERN], ["s/\(numa_id \)[[0-9]]*\( core_id \)[[0-9]]*:/\1<cleared>\2<cleared>:/"])
> > -m4_define([SED_NUMA_CORE_QUEUE_PATTERN], ["s/1 2 5 6/<group>/;s/0 3 4 7/<group>/"])
> > +m4_define([SED_NUMA_CORE_QUEUE_CYC_PATTERN], ["s/1 2 5 6/<group>/;s/0 3 4 7/<group>/"])
> > +m4_define([SED_NUMA_CORE_QUEUE_PQ_PATTERN], ["s/1 3 5 7/<group>/;s/0 2 4 6/<group>/"])
> >  m4_define([DUMMY_NUMA], [--dummy-numa="0,0,0,0"])
> >
> > @@ -146,9 +147,16 @@ pmd thread numa_id <cleared> core_id <cleared>:
> >  ])
> >
> > +AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:pmd-rxq-assign=cycles])
> >  TMP=$(cat ovs-vswitchd.log | wc -l | tr -d [[:blank:]])
> >  AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x3])
> >  CHECK_PMD_THREADS_CREATED([2], [], [+$TMP])
> >
> > -AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | sed ':a;/AVAIL$/{N;s/\n//;ba;}' | parse_pmd_rxq_show_group | sed SED_NUMA_CORE_QUEUE_PATTERN], [0], [dnl
> > +AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | sed ':a;/AVAIL$/{N;s/\n//;ba;}' | parse_pmd_rxq_show_group | sed SED_NUMA_CORE_QUEUE_CYC_PATTERN], [0], [dnl
> > +port: p0 queue-id: <group>
> > +port: p0 queue-id: <group>
> > +])
> > +
> > +AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:pmd-rxq-assign=roundrobin])
> > +AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | sed ':a;/AVAIL$/{N;s/\n//;ba;}' | parse_pmd_rxq_show_group | sed SED_NUMA_CORE_QUEUE_PQ_PATTERN], [0], [dnl
> >  port: p0 queue-id: <group>
> >  port: p0 queue-id: <group>
> > diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> > index e318151..91d132d 100644
> > --- a/vswitchd/vswitch.xml
> > +++ b/vswitchd/vswitch.xml
> > @@ -433,4 +433,28 @@
> >        </column>
> >
> > +      <column name="other_config" key="pmd-rxq-assign"
> > +              type='{"type": "string",
> > +                     "enum": ["set", ["cycles", "roundrobin"]]}'>
> > +        <p>
> > +          Specifies how RX queues will be automatically assigned to CPU cores.
> > +          Options:
> > +          <dl>
> > +            <dt><code>cycles</code></dt>
> > +            <dd>Rxqs will be sorted by order of measured processing cycles
> > +                before being assigned to CPU cores.</dd>
> > +            <dt><code>roundrobin</code></dt>
> > +            <dd>Rxqs will be round-robined across CPU cores.</dd>
> > +          </dl>
> > +        </p>
> > +        <p>
> > +          The default value is <code>cycles</code>.
> > +        </p>
> > +        <p>
> > +          Changing this value will affect an automatic re-assignment of Rxqs to
> > +          CPUs. Note: Rxqs mapped to CPU cores with
> > +          <code>pmd-rxq-affinity</code> are unaffected.
> > +        </p>
> > +      </column>
> >
> >        <column name="other_config" key="n-handler-threads"
> >                type='{"type": "integer", "minInteger": 1}'>

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
