On 08/27/2018 04:04 PM, Ilya Maximets wrote:
> On 27.08.2018 17:19, Kevin Traynor wrote:
>> On 08/27/2018 02:30 PM, Ilya Maximets wrote:
>>> On 25.08.2018 13:00, Kevin Traynor wrote:
>>>> Prior to OVS 2.9 automatic assignment of Rxqs to PMDs
>>>> (i.e. CPUs) was done by round-robin.
>>>>
>>>> That was changed in OVS 2.9 to ordering the Rxqs based on
>>>> their measured processing cycles. This was to assign the
>>>> busiest Rxqs to different PMDs, improving aggregate
>>>> throughput.
>>>>
>>>> For the most part the new scheme should be better, but
>>>> there could be situations where a user prefers a simple
>>>> round-robin scheme because Rxqs from a single port are
>>>> more likely to be spread across multiple PMDs, and/or
>>>> traffic is very bursty/unpredictable.
>>>>
>>>> Add 'pmd-rxq-assign' config to allow a user to select
>>>> round-robin based assignment.
>>>>
>>>> Signed-off-by: Kevin Traynor <[email protected]>
>>>> Acked-by: Eelco Chaudron <[email protected]>
>>>> ---
>>>>
>>>> V3:
>>>> - Rolled in some style and vswitch.xml changes (Ilya)
>>>> - Set cycles mode by default on wrong config (Ilya)
>>>>
>>>> V2:
>>>> - simplified nextpmd change (Eelco)
>>>> - removed confusing doc sentence (Eelco)
>>>> - fixed commit msg (Ilya)
>>>> - made change in pmd-rxq-assign value also perform re-assignment (Ilya)
>>>> - renamed to roundrobin mode (Ilya)
>>>> - moved vswitch.xml changes to right config section (Ilya)
>>>> - comment/log updates
>>>> - moved NEWS update to separate patch as it's been changing on master
>>>>
>>>> Documentation/topics/dpdk/pmd.rst | 33 +++++++++++++---
>>>> lib/dpif-netdev.c | 83 +++++++++++++++++++++++++++++----------
>>>> tests/pmd.at | 12 +++++-
>>>> vswitchd/vswitch.xml | 24 +++++++++++
>>>> 4 files changed, 123 insertions(+), 29 deletions(-)
>>>>
>>>> diff --git a/Documentation/topics/dpdk/pmd.rst b/Documentation/topics/dpdk/pmd.rst
>>>> index 5f0671e..dd9172d 100644
>>>> --- a/Documentation/topics/dpdk/pmd.rst
>>>> +++ b/Documentation/topics/dpdk/pmd.rst
>>>> @@ -113,10 +113,15 @@ means that this thread will only poll the *pinned* Rx queues.
>>>>
>>>> If ``pmd-rxq-affinity`` is not set for Rx queues, they will be assigned to PMDs
>>>> -(cores) automatically. Where known, the processing cycles that have been stored
>>>> -for each Rx queue will be used to assign Rx queue to PMDs based on a round
>>>> -robin of the sorted Rx queues. For example, take the following example, where
>>>> -there are five Rx queues and three cores - 3, 7, and 8 - available and the
>>>> -measured usage of core cycles per Rx queue over the last interval is seen to
>>>> -be:
>>>> +(cores) automatically.
>>>> +
>>>> +The algorithm used to automatically assign Rxqs to PMDs can be set by::
>>>> +
>>>> + $ ovs-vsctl set Open_vSwitch . other_config:pmd-rxq-assign=<assignment>
>>>>
>>>> +By default, ``cycles`` assignment is used where the Rxqs will be ordered by
>>>> +their measured processing cycles, and then be evenly assigned in descending
>>>> +order to PMDs based on an up/down walk of the PMDs. For example, where there
>>>> +are five Rx queues and three cores - 3, 7, and 8 - available and the measured
>>>> +usage of core cycles per Rx queue over the last interval is seen to be:
>>>>
>>>> - Queue #0: 30%
>>>> @@ -132,4 +137,20 @@ The Rx queues will be assigned to the cores in the following order::
>>>> Core 8: Q3 (60%) | Q0 (30%)
>>>>
>>>> +Alternatively, ``roundrobin`` assignment can be used, where the Rxqs are
>>>> +assigned to PMDs in a round-robined fashion. This algorithm was used by
>>>> +default prior to OVS 2.9. For example, given the following ports and queues:
>>>> +
>>>> +- Port #0 Queue #0 (P0Q0)
>>>> +- Port #0 Queue #1 (P0Q1)
>>>> +- Port #1 Queue #0 (P1Q0)
>>>> +- Port #1 Queue #1 (P1Q1)
>>>> +- Port #1 Queue #2 (P1Q2)
>>>> +
>>>> +The Rx queues may be assigned to the cores in the following order::
>>>> +
>>>> + Core 3: P0Q0 | P1Q1
>>>> + Core 7: P0Q1 | P1Q2
>>>> + Core 8: P1Q0 |
>>>> +
>>>> To see the current measured usage history of PMD core cycles for each Rx queue::
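
As an aside (not part of the patch), the round-robin mapping above boils down
to a plain wrap-around walk of the available cores. A minimal standalone C
sketch that reproduces the documented example; 'cycles' mode differs in that
the queues are first sorted by their measured load and the core list is
walked up and down instead of wrapping:

#include <stdio.h>

int
main(void)
{
    const char *rxqs[] = { "P0Q0", "P0Q1", "P1Q0", "P1Q1", "P1Q2" };
    const int cores[] = { 3, 7, 8 };
    const int n_rxqs = sizeof rxqs / sizeof rxqs[0];
    const int n_cores = sizeof cores / sizeof cores[0];

    for (int i = 0; i < n_rxqs; i++) {
        /* Wrap-around (round-robin) walk of the core list. */
        printf("%s -> core %d\n", rxqs[i], cores[i % n_cores]);
    }
    return 0;
}

Running it prints P0Q0/P1Q1 on core 3, P0Q1/P1Q2 on core 7 and P1Q0 on core 8,
matching the table above.
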
>>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>>> index 7f836bb..8f004c5 100644
>>>> --- a/lib/dpif-netdev.c
>>>> +++ b/lib/dpif-netdev.c
>>>> @@ -342,4 +342,6 @@ struct dp_netdev {
>>>> struct id_pool *tx_qid_pool;
>>>> struct ovs_mutex tx_qid_pool_mutex;
>>>> + /* Use measured cycles for rxq to pmd assignment. */
>>>> + bool pmd_rxq_assign_cyc;
>>>>
>>>> /* Protects the access of the 'struct dp_netdev_pmd_thread'
>>>> @@ -1493,4 +1495,5 @@ create_dp_netdev(const char *name, const struct dpif_class *class,
>>>>
>>>> cmap_init(&dp->poll_threads);
>>>> + dp->pmd_rxq_assign_cyc = true;
>>>>
>>>> ovs_mutex_init(&dp->tx_qid_pool_mutex);
>>>> @@ -3717,4 +3720,6 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
>>>> struct dp_netdev *dp = get_dp_netdev(dpif);
>>>> const char *cmask = smap_get(other_config, "pmd-cpu-mask");
>>>> + const char *pmd_rxq_assign = smap_get_def(other_config, "pmd-rxq-assign",
>>>> + "cycles");
>>>> unsigned long long insert_prob =
>>>> smap_get_ullong(other_config, "emc-insert-inv-prob",
>>>> @@ -3779,4 +3784,18 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
>>>> }
>>>> }
>>>> +
>>>> + bool pmd_rxq_assign_cyc = !strcmp(pmd_rxq_assign, "cycles");
>>>> + if (!pmd_rxq_assign_cyc && strcmp(pmd_rxq_assign, "roundrobin")) {
>>>> + VLOG_WARN("Unsupported Rxq to PMD assignment mode: "
>>>> + "defaulting to 'cycles'.");
>>>
>>> Have you missed actually printing the wrong value?
>>> Or is that what you wanted?
>>>
>>
>> It was what I intended. If someone puts in junk, I didn't want to add
>> it to the log.
>>
>
> Many functions in OVS print the invalid argument. For example, LACP mode,
> bond_mode, port type, getopt, VLAN mode, tc_set_policy and others.
> IMHO, it's not bad to print exactly what the issue is, but I will
> not insist.
>
> One thing is that the colon makes me feel like you wanted to print
> the exact string, but missed it. I think you should either replace
> the colon with a period, or add a real value.
>
Yeah, I can see how the colon might mislead. I got rid of the colon and
mentioned pmd-rxq-assign, so the user knows which setting it is (similar
to dpdk-init).
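Something along these lines; a rough standalone sketch with fprintf standing
in for VLOG_WARN, and the exact wording in the next revision may still differ:

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static bool
parse_pmd_rxq_assign(const char *pmd_rxq_assign)
{
    bool pmd_rxq_assign_cyc = !strcmp(pmd_rxq_assign, "cycles");

    if (!pmd_rxq_assign_cyc && strcmp(pmd_rxq_assign, "roundrobin")) {
        /* Name the knob instead of echoing the bogus value. */
        fprintf(stderr, "Unsupported Rxq to PMD assignment mode in "
                        "pmd-rxq-assign. Defaulting to 'cycles'.\n");
        pmd_rxq_assign_cyc = true;
    }
    return pmd_rxq_assign_cyc;
}

int
main(void)
{
    parse_pmd_rxq_assign("roundrobin");   /* Valid: no warning. */
    parse_pmd_rxq_assign("junk");         /* Warns, falls back to cycles. */
    return 0;
}
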
Kevin.
>>>> + pmd_rxq_assign_cyc = true;
>>>> + pmd_rxq_assign = "cycles";
>>>> + }
>>>> + if (dp->pmd_rxq_assign_cyc != pmd_rxq_assign_cyc) {
>>>> + dp->pmd_rxq_assign_cyc = pmd_rxq_assign_cyc;
>>>> + VLOG_INFO("Rxq to PMD assignment mode changed to: \'%s\'.",
>>>> + pmd_rxq_assign);
>>>> + dp_netdev_request_reconfigure(dp);
>>>> + }
>>>> return 0;
>>>> }
>>>> @@ -4249,8 +4268,16 @@ rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr)
>>>> }
>>>>
>>>> -/* Returns the next pmd from the numa node in
>>>> - * incrementing or decrementing order. */
>>>> +/*
>>>> + * Returns the next pmd from the numa node.
>>>> + *
>>>> + * If 'updown' is 'true' it will alternate between selecting the next pmd in
>>>> + * either an up or down walk, switching between up/down when the first or last
>>>> + * core is reached. e.g. 1,2,3,3,2,1,1,2...
>>>> + *
>>>> + * If 'updown' is 'false' it will select the next pmd wrapping around when last
>>>> + * core reached. e.g. 1,2,3,1,2,3,1,2...
>>>> + */
>>>> static struct dp_netdev_pmd_thread *
>>>> -rr_numa_get_pmd(struct rr_numa *numa)
>>>> +rr_numa_get_pmd(struct rr_numa *numa, bool updown)
>>>> {
>>>> int numa_idx = numa->cur_index;
>>>> @@ -4260,5 +4287,9 @@ rr_numa_get_pmd(struct rr_numa *numa)
>>>> if (numa->cur_index == numa->n_pmds-1) {
>>>> /* Reached the last pmd. */
>>>> - numa->idx_inc = false;
>>>> + if (updown) {
>>>> + numa->idx_inc = false;
>>>> + } else {
>>>> + numa->cur_index = 0;
>>>> + }
>>>> } else {
>>>> numa->cur_index++;
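
For reference, a tiny standalone sketch (a mock struct rather than the real
rr_numa code) of the two walks described in the comment above. With three
pmds the up/down walk yields 0,1,2,2,1,0,0,1,... while the plain round-robin
walk wraps: 0,1,2,0,1,2,...:

#include <stdbool.h>
#include <stdio.h>

struct walk { int cur; bool inc; };

static int
next_idx(struct walk *w, int n_pmds, bool updown)
{
    int idx = w->cur;

    if (w->inc) {
        if (w->cur == n_pmds - 1) {       /* Reached the last pmd. */
            if (updown) {
                w->inc = false;           /* Turn around and walk down. */
            } else {
                w->cur = 0;               /* Plain round robin: wrap. */
            }
        } else {
            w->cur++;
        }
    } else {
        if (w->cur == 0) {                /* Reached the first pmd. */
            w->inc = true;                /* Turn around and walk up. */
        } else {
            w->cur--;
        }
    }
    return idx;
}

int
main(void)
{
    struct walk w = { 0, true };

    for (int i = 0; i < 8; i++) {
        printf("%d ", next_idx(&w, 3, true));    /* 0 1 2 2 1 0 0 1 */
    }
    printf("\n");

    w = (struct walk) { 0, true };
    for (int i = 0; i < 8; i++) {
        printf("%d ", next_idx(&w, 3, false));   /* 0 1 2 0 1 2 0 1 */
    }
    printf("\n");
    return 0;
}
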
>>>> @@ -4323,7 +4354,4 @@ compare_rxq_cycles(const void *a, const void *b)
>>>> * pmds to unpinned queues.
>>>> *
>>>> - * If 'pinned' is false queues will be sorted by processing cycles they are
>>>> - * consuming and then assigned to pmds in round robin order.
>>>> - *
>>>> * The function doesn't touch the pmd threads, it just stores the assignment
>>>> * in the 'pmd' member of each rxq. */
>>>> @@ -4338,4 +4366,5 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
>>>> struct rr_numa *numa = NULL;
>>>> int numa_id;
>>>> + bool assign_cyc = dp->pmd_rxq_assign_cyc;
>>>>
>>>> HMAP_FOR_EACH (port, node, &dp->ports) {
>>>> @@ -4368,10 +4397,13 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
>>>> rxqs = xrealloc(rxqs, sizeof *rxqs * (n_rxqs + 1));
>>>> }
>>>> - /* Sum the queue intervals and store the cycle history. */
>>>> - for (unsigned i = 0; i < PMD_RXQ_INTERVAL_MAX; i++) {
>>>> - cycle_hist += dp_netdev_rxq_get_intrvl_cycles(q, i);
>>>> - }
>>>> - dp_netdev_rxq_set_cycles(q, RXQ_CYCLES_PROC_HIST, cycle_hist);
>>>>
>>>> + if (assign_cyc) {
>>>> + /* Sum the queue intervals and store the cycle history. */
>>>> + for (unsigned i = 0; i < PMD_RXQ_INTERVAL_MAX; i++) {
>>>> + cycle_hist += dp_netdev_rxq_get_intrvl_cycles(q, i);
>>>> + }
>>>> + dp_netdev_rxq_set_cycles(q, RXQ_CYCLES_PROC_HIST,
>>>> + cycle_hist);
>>>> + }
>>>> /* Store the queue. */
>>>> rxqs[n_rxqs++] = q;
>>>> @@ -4380,5 +4412,5 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
>>>> }
>>>>
>>>> - if (n_rxqs > 1) {
>>>> + if (n_rxqs > 1 && assign_cyc) {
>>>> /* Sort the queues in order of the processing cycles
>>>> * they consumed during their last pmd interval. */
>>>> @@ -4404,5 +4436,5 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
>>>> continue;
>>>> }
>>>> - rxqs[i]->pmd = rr_numa_get_pmd(non_local_numa);
>>>> + rxqs[i]->pmd = rr_numa_get_pmd(non_local_numa, assign_cyc);
>>>> VLOG_WARN("There's no available (non-isolated) pmd thread "
>>>> "on numa node %d. Queue %d on port \'%s\' will "
>>>> @@ -4413,11 +4445,20 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
>>>> rxqs[i]->pmd->core_id, rxqs[i]->pmd->numa_id);
>>>> } else {
>>>> - rxqs[i]->pmd = rr_numa_get_pmd(numa);
>>>> - VLOG_INFO("Core %d on numa node %d assigned port \'%s\' "
>>>> - "rx queue %d (measured processing cycles %"PRIu64").",
>>>> - rxqs[i]->pmd->core_id, numa_id,
>>>> - netdev_rxq_get_name(rxqs[i]->rx),
>>>> - netdev_rxq_get_queue_id(rxqs[i]->rx),
>>>> - dp_netdev_rxq_get_cycles(rxqs[i], RXQ_CYCLES_PROC_HIST));
>>>> + rxqs[i]->pmd = rr_numa_get_pmd(numa, assign_cyc);
>>>> + if (assign_cyc) {
>>>> + VLOG_INFO("Core %d on numa node %d assigned port \'%s\' "
>>>> + "rx queue %d "
>>>> + "(measured processing cycles %"PRIu64").",
>>>> + rxqs[i]->pmd->core_id, numa_id,
>>>> + netdev_rxq_get_name(rxqs[i]->rx),
>>>> + netdev_rxq_get_queue_id(rxqs[i]->rx),
>>>> + dp_netdev_rxq_get_cycles(rxqs[i],
>>>> + RXQ_CYCLES_PROC_HIST));
>>>> + } else {
>>>> + VLOG_INFO("Core %d on numa node %d assigned port \'%s\' "
>>>> + "rx queue %d.", rxqs[i]->pmd->core_id, numa_id,
>>>> + netdev_rxq_get_name(rxqs[i]->rx),
>>>> + netdev_rxq_get_queue_id(rxqs[i]->rx));
>>>> + }
>>>> }
>>>> }
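
For completeness, the 'cycles' side of the scheduling change amounts to a
conditional sort of the collected rxqs by their summed cycle history before
they are handed out to pmds. A standalone sketch with mock types (not the OVS
structures), loosely mirroring compare_rxq_cycles():

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct mock_rxq {
    const char *name;
    uint64_t proc_cycles;       /* Summed measured intervals. */
};

/* Busiest queue first. */
static int
compare_by_cycles(const void *a, const void *b)
{
    const struct mock_rxq *qa = a;
    const struct mock_rxq *qb = b;

    return (qa->proc_cycles < qb->proc_cycles)
           - (qa->proc_cycles > qb->proc_cycles);
}

int
main(void)
{
    struct mock_rxq rxqs[] = {
        { "p0q0", 30 }, { "p0q1", 80 }, { "p1q0", 60 },
    };
    size_t n_rxqs = sizeof rxqs / sizeof rxqs[0];
    bool assign_cyc = true;     /* i.e. pmd-rxq-assign=cycles */

    if (n_rxqs > 1 && assign_cyc) {
        /* In roundrobin mode this sort is skipped and the queues keep
         * the order in which they were collected. */
        qsort(rxqs, n_rxqs, sizeof rxqs[0], compare_by_cycles);
    }
    for (size_t i = 0; i < n_rxqs; i++) {
        printf("%s (%llu)\n", rxqs[i].name,
               (unsigned long long) rxqs[i].proc_cycles);
    }
    return 0;
}
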
>>>> diff --git a/tests/pmd.at b/tests/pmd.at
>>>> index 4cae6c8..1f952f3 100644
>>>> --- a/tests/pmd.at
>>>> +++ b/tests/pmd.at
>>>> @@ -62,5 +62,6 @@ m4_define([CHECK_PMD_THREADS_CREATED], [
>>>>
>>>> m4_define([SED_NUMA_CORE_PATTERN], ["s/\(numa_id \)[[0-9]]*\( core_id \)[[0-9]]*:/\1<cleared>\2<cleared>:/"])
>>>> -m4_define([SED_NUMA_CORE_QUEUE_PATTERN], ["s/1 2 5 6/<group>/;s/0 3 4 7/<group>/"])
>>>> +m4_define([SED_NUMA_CORE_QUEUE_CYC_PATTERN], ["s/1 2 5 6/<group>/;s/0 3 4 7/<group>/"])
>>>> +m4_define([SED_NUMA_CORE_QUEUE_PQ_PATTERN], ["s/1 3 5 7/<group>/;s/0 2 4 6/<group>/"])
>>>> m4_define([DUMMY_NUMA], [--dummy-numa="0,0,0,0"])
>>>>
>>>> @@ -146,9 +147,16 @@ pmd thread numa_id <cleared> core_id <cleared>:
>>>> ])
>>>>
>>>> +AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:pmd-rxq-assign=cycles])
>>>> TMP=$(cat ovs-vswitchd.log | wc -l | tr -d [[:blank:]])
>>>> AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x3])
>>>> CHECK_PMD_THREADS_CREATED([2], [], [+$TMP])
>>>>
>>>> -AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | sed ':a;/AVAIL$/{N;s/\n//;ba;}' | parse_pmd_rxq_show_group | sed SED_NUMA_CORE_QUEUE_PATTERN], [0], [dnl
>>>> +AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | sed ':a;/AVAIL$/{N;s/\n//;ba;}' | parse_pmd_rxq_show_group | sed SED_NUMA_CORE_QUEUE_CYC_PATTERN], [0], [dnl
>>>> +port: p0 queue-id: <group>
>>>> +port: p0 queue-id: <group>
>>>> +])
>>>> +
>>>> +AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:pmd-rxq-assign=roundrobin])
>>>> +AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | sed ':a;/AVAIL$/{N;s/\n//;ba;}' | parse_pmd_rxq_show_group | sed SED_NUMA_CORE_QUEUE_PQ_PATTERN], [0], [dnl
>>>> port: p0 queue-id: <group>
>>>> port: p0 queue-id: <group>
>>>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>>>> index 0cd8520..2439b24 100644
>>>> --- a/vswitchd/vswitch.xml
>>>> +++ b/vswitchd/vswitch.xml
>>>> @@ -425,4 +425,28 @@
>>>> </column>
>>>>
>>>> + <column name="other_config" key="pmd-rxq-assign"
>>>> + type='{"type": "string",
>>>> + "enum": ["set", ["cycles", "roundrobin"]]}'>
>>>> + <p>
>>>> + Specifies how RX queues will be automatically assigned to CPU cores.
>>>> + Options:
>>>> + <dl>
>>>> + <dt><code>cycles</code></dt>
>>>> + <dd>Rxqs will be sorted by order of measured processing cycles
>>>> + before being assigned to CPU cores.</dd>
>>>> + <dt><code>roundrobin</code></dt>
>>>> + <dd>Rxqs will be round-robined across CPU cores.</dd>
>>>> + </dl>
>>>> + </p>
>>>> + <p>
>>>> + The default value is <code>cycles</code>.
>>>> + </p>
>>>> + <p>
>>>> + Changing this value will trigger an automatic re-assignment of Rxqs to
>>>> + CPUs. Note: Rxqs mapped to CPU cores with
>>>> + <code>pmd-rxq-affinity</code> are unaffected.
>>>> + </p>
>>>> + </column>
>>>> +
>>>> <column name="other_config" key="n-handler-threads"
>>>> type='{"type": "integer", "minInteger": 1}'>
>>>>
>>
>>
>>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev