On 12/18/17, 2:41 PM, "Jan Scheurich" <[email protected]> wrote:
Hi Aaron,
Thanks for the review. Answers in-line.
Regards, Jan
> -----Original Message-----
> From: Aaron Conole [mailto:[email protected]]
> Sent: Monday, 18 December, 2017 19:41
> To: Jan Scheurich <[email protected]>
> Cc: [email protected]
> Subject: Re: [ovs-dev] [PATCH v4 1/3] dpif-netdev: Refactor PMD performance into dpif-netdev-perf
>
> Hi Jan,
>
> Jan Scheurich <[email protected]> writes:
>
> > Add module dpif-netdev-perf to host all PMD performance-related
> > data structures and functions in dpif-netdev. Refactor the PMD
> > stats handling in dpif-netdev and delegate whatever possible into
> > the new module, using clean interfaces to shield dpif-netdev from
> > the implementation details. Accordingly, all PMD statistics
> > members are moved from the main struct dp_netdev_pmd_thread into
> > a dedicated member of type struct pmd_perf_stats.
> >
> > Include Darrel's prior refactoring of PMD stats contained in
> > [PATCH v5,2/3] dpif-netdev: Refactor some pmd stats:
> >
> > 1. The cycles per packet counts are now based on packets
> > received rather than packet passes through the datapath.
> >
> > 2. Packet counters are now kept for packets received and
> > packets recirculated. These are kept as separate counters for
> > maintainability reasons. The cost of incrementing these counters
> > is negligible. These new counters are also displayed to the user.
> >
> > 3. A display statistic is added for the average number of
> > datapath passes per packet. This should be useful for user
> > debugging and understanding of packet processing.
> >
> > 4. The user visible 'miss' counter is used for successful upcalls,
> > rather than the sum of successful and unsuccessful upcalls. Hence,
> > this becomes what the user historically understands by OVS 'miss upcall'.
> > The user display is annotated to make this clear as well.
> >
> > 5. The user visible 'lost' counter remains as failed upcalls, but
> > is annotated to make it clear what the meaning is.
> >
> > 6. The enum pmd_stat_type is annotated to make the usage of the
> > stats counters clear.
> >
> > 7. The subtable lookup stat is renamed to make it clear that it
> > relates to masked lookups.
> >
> > 8. The PMD stats test is updated to handle the new user stats of
> > packets received, packets recirculated and average number of datapath
> > passes per packet.
> >
> > On top of that introduce a "-pmd <core>" option to the PMD info
> > commands to filter the output for a single PMD.
> >
> > Signed-off-by: Jan Scheurich <[email protected]>
> > Co-authored-by: Darrell Ball <[email protected]>
> > Signed-off-by: Darrell Ball <[email protected]>
> > ---
>
> I'm really happy to see this series making progress. I think it's
> extremely useful.
>
> Some comments inline.
>
> > lib/automake.mk | 2 +
> > lib/dpif-netdev-perf.c | 66 ++++++++++
> > lib/dpif-netdev-perf.h | 123 ++++++++++++++++++
> > lib/dpif-netdev.c | 330 ++++++++++++++++++++-----------------------------
> > tests/pmd.at | 22 ++--
> > 5 files changed, 339 insertions(+), 204 deletions(-)
> > create mode 100644 lib/dpif-netdev-perf.c
> > create mode 100644 lib/dpif-netdev-perf.h
> >
> > diff --git a/lib/automake.mk b/lib/automake.mk
> > index effe5b5..0b8df0e 100644
> > --- a/lib/automake.mk
> > +++ b/lib/automake.mk
> > @@ -80,6 +80,8 @@ lib_libopenvswitch_la_SOURCES = \
> > lib/dpdk.h \
> > lib/dpif-netdev.c \
> > lib/dpif-netdev.h \
> > + lib/dpif-netdev-perf.c \
> > + lib/dpif-netdev-perf.h \
> > lib/dpif-provider.h \
> > lib/dpif.c \
> > lib/dpif.h \
> > diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c
> > new file mode 100644
> > index 0000000..b198fd3
> > --- /dev/null
> > +++ b/lib/dpif-netdev-perf.c
> > @@ -0,0 +1,66 @@
> > +/*
> > + * Copyright (c) 2017 Ericsson AB.
> > + *
> > + * Licensed under the Apache License, Version 2.0 (the "License");
> > + * you may not use this file except in compliance with the License.
> > + * You may obtain a copy of the License at:
> > + *
> > + *     http://www.apache.org/licenses/LICENSE-2.0
> > + *
> > + * Unless required by applicable law or agreed to in writing, software
> > + * distributed under the License is distributed on an "AS IS" BASIS,
> > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> > + * See the License for the specific language governing permissions and
> > + * limitations under the License.
> > + */
> > +
> > +#include <config.h>
> > +
> > +#ifdef DPDK_NETDEV
> > +#include <rte_cycles.h>
> > +#endif
> > +
> > +#include "openvswitch/dynamic-string.h"
> > +#include "openvswitch/vlog.h"
> > +#include "dpif-netdev-perf.h"
> > +#include "timeval.h"
> > +
> > +VLOG_DEFINE_THIS_MODULE(pmd_perf);
> > +
> > +void
> > +pmd_perf_stats_init(struct pmd_perf_stats *s) {
> > + memset(s, 0 , sizeof(*s));
> > + s->start_ms = time_msec();
> > +}
> > +
> > +void
> > +pmd_perf_read_counters(struct pmd_perf_stats *s,
> > + uint64_t stats[PMD_N_STATS])
> > +{
> > + uint64_t val;
> > +
> > +    /* These loops subtracts reference values ('*_zero') from the counters.
> > +     * Since loads and stores are relaxed, it might be possible for a '*_zero'
>
> Kind of a nit, but I think that should read .zero[*] or ->zero[*]. No
> big deal one way or the other.
Yes, the comment is a leftover. Will fix.
>
> > +     * value to be more recent than the current value we're reading from the
> > +     * counter. This is not a big problem, since these numbers are not
> > +     * supposed to be too accurate, but we should at least make sure that
> > +     * the result is not negative. */
> > + for (int i = 0; i < PMD_N_STATS; i++) {
> > + atomic_read_relaxed(&s->counters.n[i], &val);
> > + if (val > s->counters.zero[i]) {
> > + stats[i] = val - s->counters.zero[i];
> > + } else {
> > + stats[i] = 0;
> > + }
> > + }
> > +}
> > +
> > +void
> > +pmd_perf_stats_clear(struct pmd_perf_stats *s)
> > +{
> > + s->start_ms = 0; /* Marks start of clearing. */
> > + for (int i = 0; i < PMD_N_STATS; i++) {
> > + atomic_read_relaxed(&s->counters.n[i], &s->counters.zero[i]);
> > + }
> > + s->start_ms = time_msec(); /* Clearing finished. */
> > +}
> > diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
> > new file mode 100644
> > index 0000000..f55f7a2
> > --- /dev/null
> > +++ b/lib/dpif-netdev-perf.h
> > @@ -0,0 +1,123 @@
> > +/*
> > + * Copyright (c) 2017 Ericsson AB.
> > + *
> > + * Licensed under the Apache License, Version 2.0 (the "License");
> > + * you may not use this file except in compliance with the License.
> > + * You may obtain a copy of the License at:
> > + *
> > + *     http://www.apache.org/licenses/LICENSE-2.0
> > + *
> > + * Unless required by applicable law or agreed to in writing, software
> > + * distributed under the License is distributed on an "AS IS" BASIS,
> > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> > + * See the License for the specific language governing permissions and
> > + * limitations under the License.
> > + */
> > +
> > +#ifndef PMD_PERF_H
> > +#define PMD_PERF_H 1
> > +
> > +#include <stdbool.h>
> > +#include <stddef.h>
> > +#include <stdint.h>
> > +#include <string.h>
> > +#include <math.h>
> > +
> > +#include "openvswitch/vlog.h"
> > +#include "ovs-atomic.h"
> > +#include "timeval.h"
> > +#include "unixctl.h"
> > +#include "util.h"
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +enum pmd_stat_type {
> > +    PMD_STAT_EXACT_HIT,     /* Packets that had an exact match (emc). */
> > +    PMD_STAT_MASKED_HIT,    /* Packets that matched in the flow table. */
> > +    PMD_STAT_MISS,          /* Packets that did not match and upcall was ok. */
> > +    PMD_STAT_LOST,          /* Packets that did not match and upcall failed. */
> > +                            /* The above statistics account for the total
> > +                             * number of packet passes through the datapath
> > +                             * pipeline and should not be overlapping with each
> > +                             * other. */
> > +    PMD_STAT_MASKED_LOOKUP, /* Number of subtable lookups for flow table
> > +                               hits. Each MASKED_HIT hit will have >= 1
> > +                               MASKED_LOOKUP(s). */
> > +    PMD_STAT_RECV,          /* Packets entering the datapath pipeline from an
> > +                             * interface. */
> > +    PMD_STAT_RECIRC,        /* Packets reentering the datapath pipeline due to
> > +                             * recirculation. */
> > + PMD_CYCLES_POLL_IDLE, /* Cycles spent unsuccessful polling. */
> > + PMD_CYCLES_POLL_BUSY, /* Cycles spent successfully polling and
> > + * processing polled packets. */
> > + PMD_CYCLES_OVERHEAD, /* Cycles spent for other tasks. */
> > + PMD_CYCLES_UPCALL, /* Cycles spent in upcalls. Included in
> > + PMD_CYCLES_POLL_BUSY. */
> > + PMD_CYCLES_ITER_IDLE, /* Cycles spent in idle iterations. */
> > + PMD_CYCLES_ITER_BUSY, /* Cycles spent in busy iterations. */
> > + PMD_N_STATS
> > +};
> > +
> > +struct pmd_counters {
> > + atomic_uint64_t n[PMD_N_STATS]; /* Indexed by PMD_STAT_*. */
> > + uint64_t zero[PMD_N_STATS];
> > +};
> > +
> > +struct pmd_perf_stats {
> > + uint64_t start_ms;
> > + uint64_t last_tsc;
> > + struct pmd_counters counters;
> > +};
> > +
> > +enum pmd_info_type;
> > +struct dp_netdev_pmd_thread;
> > +
> > +struct pmd_perf_params {
> > + int command_type;
> > + bool histograms;
> > + size_t iter_hist_len;
> > + size_t ms_hist_len;
> > +};
> > +
> > +void pmd_perf_stats_init(struct pmd_perf_stats *s);
> > +void pmd_perf_stats_clear(struct pmd_perf_stats *s);
> > +void pmd_perf_read_counters(struct pmd_perf_stats *s,
> > + uint64_t stats[PMD_N_STATS]);
> > +
> > +static inline void
> > +pmd_perf_update_counter(struct pmd_perf_stats *s,
> > + enum pmd_stat_type counter, int delta)
> > +{
> > + uint64_t tmp;
> > + atomic_read_relaxed(&s->counters.n[counter], &tmp);
> > + tmp += delta;
> > + atomic_store_relaxed(&s->counters.n[counter], tmp);
> > +}
> > +
> > +static inline void
> > +pmd_perf_start_iteration(struct pmd_perf_stats *s, uint64_t now_tsc)
> > +{
> > + if (OVS_UNLIKELY(s->start_ms == 0)) {
> > + /* Stats not yet fully initialised. */
> > + return;
> > + }
> > + s->last_tsc = now_tsc;
> > +}
> > +
> > +static inline void
> > +pmd_perf_end_iteration(struct pmd_perf_stats *s, uint64_t now_tsc,
> > + int packets)
> > +{
> > + uint64_t cycles = now_tsc - s->last_tsc;
> > +
> > + /* No need for atomic updates as only invoked within PMD thread. */
> > + if (packets > 0) {
> > + s->counters.n[PMD_CYCLES_ITER_BUSY] += cycles;
> > + } else {
> > + s->counters.n[PMD_CYCLES_ITER_IDLE] += cycles;
> > + }
> > +}
> > +
>
> This is missing a closing block for the cplusplus block above.
Sure, it ended up in patch 2/3 by mistake. Will fix.
>
> > +#endif /* pmd-perf.h */
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index 55be632..6ae1a3f 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -43,6 +43,7 @@
> > #include "csum.h"
> > #include "dp-packet.h"
> > #include "dpif.h"
> > +#include "dpif-netdev-perf.h"
> > #include "dpif-provider.h"
> > #include "dummy.h"
> > #include "fat-rwlock.h"
> > @@ -330,23 +331,6 @@ static struct dp_netdev_port *dp_netdev_lookup_port(const struct dp_netdev *dp,
> > odp_port_t)
> > OVS_REQUIRES(dp->port_mutex);
> >
> > -enum dp_stat_type {
> > -    DP_STAT_EXACT_HIT,          /* Packets that had an exact match (emc). */
> > -    DP_STAT_MASKED_HIT,         /* Packets that matched in the flow table. */
> > -    DP_STAT_MISS,               /* Packets that did not match. */
> > -    DP_STAT_LOST,               /* Packets not passed up to the client. */
> > -    DP_STAT_LOOKUP_HIT,         /* Number of subtable lookups for flow table
> > -                                   hits */
> > - DP_N_STATS
> > -};
> > -
> > -enum pmd_cycles_counter_type {
> > -    PMD_CYCLES_IDLE,            /* Cycles spent idle or unsuccessful polling */
> > -    PMD_CYCLES_PROCESSING,      /* Cycles spent successfully polling and
> > -                                 * processing polled packets */
> > - PMD_N_CYCLES
> > -};
> > -
> > enum rxq_cycles_counter_type {
> >      RXQ_CYCLES_PROC_CURR,       /* Cycles spent successfully polling and
> >                                     processing packets during the current
> > @@ -496,18 +480,6 @@ struct dp_netdev_actions *dp_netdev_flow_get_actions(
> > const struct dp_netdev_flow *);
> > static void dp_netdev_actions_free(struct dp_netdev_actions *);
> >
> > -/* Contained by struct dp_netdev_pmd_thread's 'stats' member. */
> > -struct dp_netdev_pmd_stats {
> > - /* Indexed by DP_STAT_*. */
> > - atomic_ullong n[DP_N_STATS];
> > -};
> > -
> > -/* Contained by struct dp_netdev_pmd_thread's 'cycle' member. */
> > -struct dp_netdev_pmd_cycles {
> > - /* Indexed by PMD_CYCLES_*. */
> > - atomic_ullong n[PMD_N_CYCLES];
> > -};
> > -
> > struct polled_queue {
> > struct dp_netdev_rxq *rxq;
> > odp_port_t port_no;
> > @@ -577,12 +549,6 @@ struct dp_netdev_pmd_thread {
> >                                     are stored for each polled rxq. */
> > long long int rxq_next_cycle_store;
> >
> > - /* Statistics. */
> > - struct dp_netdev_pmd_stats stats;
> > -
> > - /* Cycles counters */
> > - struct dp_netdev_pmd_cycles cycles;
> > -
> > /* Used to count cicles. See 'cycles_counter_end()' */
> > unsigned long long last_cycles;
> >
> > @@ -620,12 +586,8 @@ struct dp_netdev_pmd_thread {
> > struct hmap tnl_port_cache;
> > struct hmap send_port_cache;
> >
> > - /* Only a pmd thread can write on its own 'cycles' and 'stats'.
> > - * The main thread keeps 'stats_zero' and 'cycles_zero' as base
> > - * values and subtracts them from 'stats' and 'cycles' before
> > - * reporting to the user */
> > - unsigned long long stats_zero[DP_N_STATS];
> > - uint64_t cycles_zero[PMD_N_CYCLES];
> > + /* Keep track of detailed PMD performance statistics. */
> > + struct pmd_perf_stats perf_stats;
> >
> > /* Set to true if the pmd thread needs to be reloaded. */
> > bool need_reload;
> > @@ -791,46 +753,11 @@ enum pmd_info_type {
> > };
> >
> > static void
> > -pmd_info_show_stats(struct ds *reply,
> > - struct dp_netdev_pmd_thread *pmd,
> > - unsigned long long stats[DP_N_STATS],
> > - uint64_t cycles[PMD_N_CYCLES])
> > +format_pmd_thread(struct ds *reply,
> > + struct dp_netdev_pmd_thread *pmd)
>
> Why not put this on one line? Just a nit.
Leftover from earlier version. Will join the lines.
>
> > {
> > - unsigned long long total_packets;
> > - uint64_t total_cycles = 0;
> > - int i;
> > -
> > -    /* These loops subtracts reference values ('*_zero') from the counters.
> > -     * Since loads and stores are relaxed, it might be possible for a '*_zero'
> > -     * value to be more recent than the current value we're reading from the
> > -     * counter. This is not a big problem, since these numbers are not
> > -     * supposed to be too accurate, but we should at least make sure that
> > -     * the result is not negative. */
> > - for (i = 0; i < DP_N_STATS; i++) {
> > - if (stats[i] > pmd->stats_zero[i]) {
> > - stats[i] -= pmd->stats_zero[i];
> > - } else {
> > - stats[i] = 0;
> > - }
> > - }
> > -
> > -    /* Sum of all the matched and not matched packets gives the total. */
> > -    total_packets = stats[DP_STAT_EXACT_HIT] + stats[DP_STAT_MASKED_HIT]
> > -                    + stats[DP_STAT_MISS];
> > -
> > - for (i = 0; i < PMD_N_CYCLES; i++) {
> > - if (cycles[i] > pmd->cycles_zero[i]) {
> > - cycles[i] -= pmd->cycles_zero[i];
> > - } else {
> > - cycles[i] = 0;
> > - }
> > -
> > - total_cycles += cycles[i];
> > - }
> > -
> > ds_put_cstr(reply, (pmd->core_id == NON_PMD_CORE_ID)
> > ? "main thread" : "pmd thread");
> > -
>
> Why was this line trimmed?
Probably just my personal aesthetics. There is no particular reason why
there should be a blank line here but not six lines further down in the same
function.
I have factored this code segment out into a dedicated function, and in
this context I believe it is justified to align white-space usage.
>
> > if (pmd->numa_id != OVS_NUMA_UNSPEC) {
> > ds_put_format(reply, " numa_id %d", pmd->numa_id);
> > }
> > @@ -838,16 +765,39 @@ pmd_info_show_stats(struct ds *reply,
> > ds_put_format(reply, " core_id %u", pmd->core_id);
> > }
> > ds_put_cstr(reply, ":\n");
> > +}
> > +
> > +static void
> > +pmd_info_show_stats(struct ds *reply,
> > + struct dp_netdev_pmd_thread *pmd)
> > +{
> > + uint64_t stats[PMD_N_STATS];
> > + uint64_t total_cycles = 0;
> > +
> > + pmd_perf_read_counters(&pmd->perf_stats, stats);
> > + total_cycles = stats[PMD_CYCLES_ITER_IDLE]
> > + + stats[PMD_CYCLES_ITER_BUSY];
> > +
> > + format_pmd_thread(reply, pmd);
> >
> > ds_put_format(reply,
> > - "\temc hits:%llu\n\tmegaflow hits:%llu\n"
> > - "\tavg. subtable lookups per hit:%.2f\n"
> > - "\tmiss:%llu\n\tlost:%llu\n",
> > - stats[DP_STAT_EXACT_HIT], stats[DP_STAT_MASKED_HIT],
> > - stats[DP_STAT_MASKED_HIT] > 0
> > -                  ? (1.0*stats[DP_STAT_LOOKUP_HIT])/stats[DP_STAT_MASKED_HIT]
> > - : 0,
> > - stats[DP_STAT_MISS], stats[DP_STAT_LOST]);
> > + "\tpackets received:%"PRIu64"\n"
> > + "\tpacket recirculations:%"PRIu64"\n"
> > + "\tavg. datapath passes per packet:%.2f\n"
> > + "\temc hits:%"PRIu64"\n"
> > + "\tmegaflow hits:%"PRIu64"\n"
> > + "\tavg. subtable lookups per megaflow hit:%.2f\n"
> > +                  "\tmiss(i.e. lookup miss with success upcall):%"PRIu64"\n"
> > +                  "\tlost(i.e. lookup miss with failed upcall):%"PRIu64"\n",
> > + stats[PMD_STAT_RECV], stats[PMD_STAT_RECIRC],
> > + stats[PMD_STAT_RECV] > 0
> > +                  ? (1.0 * (stats[PMD_STAT_RECV] + stats[PMD_STAT_RECIRC]))
> > + / stats[PMD_STAT_RECV] : 0,
> > +                  stats[PMD_STAT_EXACT_HIT], stats[PMD_STAT_MASKED_HIT],
> > + stats[PMD_STAT_MASKED_HIT] > 0
> > + ? (1.0 * stats[PMD_STAT_MASKED_LOOKUP])
> > + / stats[PMD_STAT_MASKED_HIT]
> > + : 0, stats[PMD_STAT_MISS], stats[PMD_STAT_LOST]);
> >
> > if (total_cycles == 0) {
> > return;
> > @@ -856,46 +806,26 @@ pmd_info_show_stats(struct ds *reply,
> > ds_put_format(reply,
> > "\tidle cycles:%"PRIu64" (%.02f%%)\n"
> > "\tprocessing cycles:%"PRIu64" (%.02f%%)\n",
> > - cycles[PMD_CYCLES_IDLE],
> > - cycles[PMD_CYCLES_IDLE] / (double)total_cycles * 100,
> > - cycles[PMD_CYCLES_PROCESSING],
> > -                  cycles[PMD_CYCLES_PROCESSING] / (double)total_cycles * 100);
> > +                  stats[PMD_CYCLES_ITER_IDLE],
> > +                  stats[PMD_CYCLES_ITER_IDLE] / (double)total_cycles * 100,
> > +                  stats[PMD_CYCLES_ITER_BUSY],
> > +                  stats[PMD_CYCLES_ITER_BUSY] / (double)total_cycles * 100);
> >
> > + uint64_t total_packets = stats[PMD_STAT_RECV];
> > if (total_packets == 0) {
> > return;
> > }
> >
> > ds_put_format(reply,
> > - "\tavg cycles per packet: %.02f (%"PRIu64"/%llu)\n",
> > +                  "\tavg cycles per packet: %.02f (%"PRIu64"/%"PRIu64")\n",
> > total_cycles / (double)total_packets,
> > total_cycles, total_packets);
> >
> > ds_put_format(reply,
> > "\tavg processing cycles per packet: "
> > - "%.02f (%"PRIu64"/%llu)\n",
> > -                  cycles[PMD_CYCLES_PROCESSING] / (double)total_packets,
> > -                  cycles[PMD_CYCLES_PROCESSING], total_packets);
> > -}
> > -
> > -static void
> > -pmd_info_clear_stats(struct ds *reply OVS_UNUSED,
> > - struct dp_netdev_pmd_thread *pmd,
> > - unsigned long long stats[DP_N_STATS],
> > - uint64_t cycles[PMD_N_CYCLES])
> > -{
> > - int i;
> > -
> > -    /* We cannot write 'stats' and 'cycles' (because they're written by other
> > -     * threads) and we shouldn't change 'stats' (because they're used to count
> > -     * datapath stats, which must not be cleared here). Instead, we save the
> > -     * current values and subtract them from the values to be displayed in the
> > -     * future */
> > - for (i = 0; i < DP_N_STATS; i++) {
> > - pmd->stats_zero[i] = stats[i];
> > - }
> > - for (i = 0; i < PMD_N_CYCLES; i++) {
> > - pmd->cycles_zero[i] = cycles[i];
> > - }
> > + "%.02f (%"PRIu64"/%"PRIu64")\n",
> > + stats[PMD_CYCLES_POLL_BUSY] / (double)total_packets,
> > + stats[PMD_CYCLES_POLL_BUSY], total_packets);
> > }
> >
> > static int
> > @@ -1055,23 +985,34 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[],
> > struct ds reply = DS_EMPTY_INITIALIZER;
> > struct dp_netdev_pmd_thread **pmd_list;
> > struct dp_netdev *dp = NULL;
> > - size_t n;
> > enum pmd_info_type type = *(enum pmd_info_type *) aux;
> > + int core_id = -1;
> > + size_t n;
> >
> > ovs_mutex_lock(&dp_netdev_mutex);
> >
> > - if (argc == 2) {
> > - dp = shash_find_data(&dp_netdevs, argv[1]);
> > - } else if (shash_count(&dp_netdevs) == 1) {
> > - /* There's only one datapath */
> > - dp = shash_first(&dp_netdevs)->data;
> > + while (argc > 1) {
> > + if (!strcmp(argv[1], "-pmd")) {
> > + core_id = strtol(argv[2], NULL, 10);
> > + argc -= 2;
> > + argv += 2;
> > + } else {
> > + dp = shash_find_data(&dp_netdevs, argv[1]);
> > + argc -= 1;
> > + argv += 1;
> > + }
> > }
> >
> > if (!dp) {
> > - ovs_mutex_unlock(&dp_netdev_mutex);
> > - unixctl_command_reply_error(conn,
> > -                                    "please specify an existing datapath");
> > - return;
> > + if (shash_count(&dp_netdevs) == 1) {
> > + /* There's only one datapath */
> > + dp = shash_first(&dp_netdevs)->data;
> > + } else {
> > + ovs_mutex_unlock(&dp_netdev_mutex);
> > + unixctl_command_reply_error(conn,
> > +                                        "please specify an existing datapath");
> > + return;
> > + }
> > }
> >
> > sorted_poll_thread_list(dp, &pmd_list, &n);
> > @@ -1080,26 +1021,15 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[],
> > if (!pmd) {
> > break;
> > }
> > -
> > + if (core_id != -1 && pmd->core_id != core_id) {
> > + continue;
> > + }
>
> Did I read this right? If I omit the core_id it will skip displaying
> stats now? I didn't get a chance to test - maybe I misread and just
> need coffee.
If you don't specify "-pmd <core_id>", core_id == -1 and all PMDs are
displayed as in the past. If you specify "-pmd <core_id>", core_id != -1 and
all PMDs not equal to core_id are skipped.
>
> > if (type == PMD_INFO_SHOW_RXQ) {
> > pmd_info_show_rxq(&reply, pmd);
> > - } else {
> > - unsigned long long stats[DP_N_STATS];
> > - uint64_t cycles[PMD_N_CYCLES];
> > -
> > - /* Read current stats and cycle counters */
> > - for (size_t j = 0; j < ARRAY_SIZE(stats); j++) {
> > - atomic_read_relaxed(&pmd->stats.n[j], &stats[j]);
> > - }
> > - for (size_t j = 0; j < ARRAY_SIZE(cycles); j++) {
> > - atomic_read_relaxed(&pmd->cycles.n[j], &cycles[j]);
> > - }
> > -
> > - if (type == PMD_INFO_CLEAR_STATS) {
> > - pmd_info_clear_stats(&reply, pmd, stats, cycles);
> > - } else if (type == PMD_INFO_SHOW_STATS) {
> > - pmd_info_show_stats(&reply, pmd, stats, cycles);
> > - }
> > + } else if (type == PMD_INFO_CLEAR_STATS) {
> > + pmd_perf_stats_clear(&pmd->perf_stats);
> > + } else if (type == PMD_INFO_SHOW_STATS) {
> > + pmd_info_show_stats(&reply, pmd);
> > }
> > }
> > free(pmd_list);
> > @@ -1117,14 +1047,14 @@ dpif_netdev_init(void)
> > clear_aux = PMD_INFO_CLEAR_STATS,
> > poll_aux = PMD_INFO_SHOW_RXQ;
> >
> > - unixctl_command_register("dpif-netdev/pmd-stats-show", "[dp]",
> > - 0, 1, dpif_netdev_pmd_info,
> > +    unixctl_command_register("dpif-netdev/pmd-stats-show", "[-pmd core | dp]",
> > + 0, 2, dpif_netdev_pmd_info,
> > (void *)&show_aux);
> > - unixctl_command_register("dpif-netdev/pmd-stats-clear", "[dp]",
> > - 0, 1, dpif_netdev_pmd_info,
> > +    unixctl_command_register("dpif-netdev/pmd-stats-clear", "[-pmd core | dp]",
> > + 0, 2, dpif_netdev_pmd_info,
> > (void *)&clear_aux);
> > - unixctl_command_register("dpif-netdev/pmd-rxq-show", "[dp]",
> > - 0, 1, dpif_netdev_pmd_info,
> > +    unixctl_command_register("dpif-netdev/pmd-rxq-show", "[-pmd core | dp]",
> > + 0, 2, dpif_netdev_pmd_info,
> > (void *)&poll_aux);
> > unixctl_command_register("dpif-netdev/pmd-rxq-rebalance", "[dp]",
> > 0, 1, dpif_netdev_pmd_rebalance,
> > @@ -1460,23 +1390,19 @@ dpif_netdev_get_stats(const struct dpif *dpif, struct dpif_dp_stats *stats)
> > {
> > struct dp_netdev *dp = get_dp_netdev(dpif);
> > struct dp_netdev_pmd_thread *pmd;
> > + uint64_t pmd_stats[PMD_N_STATS];
> >
> > -    stats->n_flows = stats->n_hit = stats->n_missed = stats->n_lost = 0;
> > + stats->n_flows = stats->n_hit =
> > + stats->n_mask_hit = stats->n_missed = stats->n_lost = 0;
> > CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
> > - unsigned long long n;
> > stats->n_flows += cmap_count(&pmd->flow_table);
> > -
> > - atomic_read_relaxed(&pmd->stats.n[DP_STAT_MASKED_HIT], &n);
> > - stats->n_hit += n;
> > - atomic_read_relaxed(&pmd->stats.n[DP_STAT_EXACT_HIT], &n);
> > - stats->n_hit += n;
> > - atomic_read_relaxed(&pmd->stats.n[DP_STAT_MISS], &n);
> > - stats->n_missed += n;
> > - atomic_read_relaxed(&pmd->stats.n[DP_STAT_LOST], &n);
> > - stats->n_lost += n;
> > + pmd_perf_read_counters(&pmd->perf_stats, pmd_stats);
> > + stats->n_hit += pmd_stats[PMD_STAT_EXACT_HIT];
> > + stats->n_mask_hit += pmd_stats[PMD_STAT_MASKED_HIT];
> > + stats->n_missed += pmd_stats[PMD_STAT_MISS];
> > + stats->n_lost += pmd_stats[PMD_STAT_LOST];
> > }
> > stats->n_masks = UINT32_MAX;
> > - stats->n_mask_hit = UINT64_MAX;
> >
> > return 0;
> > }
> > @@ -3155,28 +3081,28 @@ cycles_count_start(struct dp_netdev_pmd_thread *pmd)
> > /* Stop counting cycles and add them to the counter 'type' */
> > static inline void
> > cycles_count_end(struct dp_netdev_pmd_thread *pmd,
> > - enum pmd_cycles_counter_type type)
> > + enum pmd_stat_type type)
> > OVS_RELEASES(&cycles_counter_fake_mutex)
> > OVS_NO_THREAD_SAFETY_ANALYSIS
> > {
> > unsigned long long interval = cycles_counter() - pmd->last_cycles;
> >
> > - non_atomic_ullong_add(&pmd->cycles.n[type], interval);
> > + pmd_perf_update_counter(&pmd->perf_stats, type, interval);
> > }
> >
> >  /* Calculate the intermediate cycle result and add to the counter 'type' */
> > static inline void
> > cycles_count_intermediate(struct dp_netdev_pmd_thread *pmd,
> > struct dp_netdev_rxq *rxq,
> > - enum pmd_cycles_counter_type type)
> > + enum pmd_stat_type type)
> > OVS_NO_THREAD_SAFETY_ANALYSIS
> > {
> > unsigned long long new_cycles = cycles_counter();
> > unsigned long long interval = new_cycles - pmd->last_cycles;
> > pmd->last_cycles = new_cycles;
> >
> > - non_atomic_ullong_add(&pmd->cycles.n[type], interval);
> > - if (rxq && (type == PMD_CYCLES_PROCESSING)) {
> > + pmd_perf_update_counter(&pmd->perf_stats, type, interval);
> > + if (rxq && (type == PMD_CYCLES_POLL_BUSY)) {
> > /* Add to the amount of current processing cycles. */
> >          non_atomic_ullong_add(&rxq->cycles[RXQ_CYCLES_PROC_CURR], interval);
> > }
> > @@ -3879,12 +3805,12 @@ dpif_netdev_run(struct dpif *dpif)
> > port->port_no);
> > cycles_count_intermediate(non_pmd, NULL,
> > process_packets
> > - ? PMD_CYCLES_PROCESSING
> > - : PMD_CYCLES_IDLE);
> > + ? PMD_CYCLES_POLL_BUSY
> > + : PMD_CYCLES_POLL_IDLE);
> > }
> > }
> > }
> > - cycles_count_end(non_pmd, PMD_CYCLES_IDLE);
> > + cycles_count_end(non_pmd, PMD_CYCLES_POLL_IDLE);
> > dpif_netdev_xps_revalidate_pmd(non_pmd, time_msec(), false);
> > ovs_mutex_unlock(&dp->non_pmd_mutex);
> >
> > @@ -4028,6 +3954,7 @@ static void *
> > pmd_thread_main(void *f_)
> > {
> > struct dp_netdev_pmd_thread *pmd = f_;
> > + struct pmd_perf_stats *s = &pmd->perf_stats;
> > unsigned int lc = 0;
> > struct polled_queue *poll_list;
> > bool exiting;
> > @@ -4063,13 +3990,17 @@ reload:
> >
> > cycles_count_start(pmd);
> > for (;;) {
> > + uint64_t iter_packets = 0;
> > + pmd_perf_start_iteration(s, pmd->last_cycles);
> > for (i = 0; i < poll_cnt; i++) {
> > process_packets =
> > dp_netdev_process_rxq_port(pmd, poll_list[i].rxq->rx,
> > poll_list[i].port_no);
> > cycles_count_intermediate(pmd, poll_list[i].rxq,
> > -                                      process_packets ? PMD_CYCLES_PROCESSING
> > -                                                      : PMD_CYCLES_IDLE);
> > + process_packets
> > + ? PMD_CYCLES_POLL_BUSY
> > + : PMD_CYCLES_POLL_IDLE);
> > + iter_packets += process_packets;
> > }
> >
> > if (lc++ > 1024) {
> > @@ -4087,10 +4018,12 @@ reload:
> > if (reload) {
> > break;
> > }
> > + cycles_count_intermediate(pmd, NULL, PMD_CYCLES_OVERHEAD);
> > }
> > + pmd_perf_end_iteration(s, pmd->last_cycles, iter_packets);
> > }
> >
> > - cycles_count_end(pmd, PMD_CYCLES_IDLE);
> > + cycles_count_end(pmd, PMD_CYCLES_OVERHEAD);
> >
> > poll_cnt = pmd_load_queues_and_ports(pmd, &poll_list);
> > exiting = latch_is_set(&pmd->exit_latch);
> > @@ -4543,6 +4476,7 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
> > }
> >      cmap_insert(&dp->poll_threads, CONST_CAST(struct cmap_node *, &pmd->node),
> > hash_int(core_id, 0));
> > + pmd_perf_stats_init(&pmd->perf_stats);
> > }
> >
> > static void
> > @@ -4740,13 +4674,6 @@ dp_netdev_flow_used(struct dp_netdev_flow *netdev_flow, int cnt, int size,
> > atomic_store_relaxed(&netdev_flow->stats.tcp_flags, flags);
> > }
> >
> > -static void
> > -dp_netdev_count_packet(struct dp_netdev_pmd_thread *pmd,
> > - enum dp_stat_type type, int cnt)
> > -{
> > - non_atomic_ullong_add(&pmd->stats.n[type], cnt);
> > -}
> > -
> > static int
> >  dp_netdev_upcall(struct dp_netdev_pmd_thread *pmd, struct dp_packet *packet_,
> >                   struct flow *flow, struct flow_wildcards *wc, ovs_u128 *ufid,
> > @@ -4920,6 +4847,9 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
> > int i;
> >
> > atomic_read_relaxed(&pmd->dp->emc_insert_min, &cur_min);
> > + pmd_perf_update_counter(&pmd->perf_stats,
> > +                            md_is_valid ? PMD_STAT_RECIRC : PMD_STAT_RECV,
> > + cnt);
> >
> > DP_PACKET_BATCH_REFILL_FOR_EACH (i, cnt, packet, packets_) {
> > struct dp_netdev_flow *flow;
> > @@ -4968,24 +4898,24 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
> > }
> > }
> >
> > - dp_netdev_count_packet(pmd, DP_STAT_EXACT_HIT,
> > - cnt - n_dropped - n_missed);
> > + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_EXACT_HIT,
> > + cnt - n_dropped - n_missed);
> >
> > return dp_packet_batch_size(packets_);
> > }
> >
> > -static inline void
> > +static inline int
> > handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
> > struct dp_packet *packet,
> > const struct netdev_flow_key *key,
> >                       struct ofpbuf *actions, struct ofpbuf *put_actions,
> > - int *lost_cnt, long long now)
> > + long long now)
> > {
> > struct ofpbuf *add_actions;
> > struct dp_packet_batch b;
> > struct match match;
> > ovs_u128 ufid;
> > - int error;
> > + int error = 0;
>
> This shouldn't be needed - error is unconditionally assigned.
You are right. Will fix.
>
> > match.tun_md.valid = false;
> > miniflow_expand(&key->mf, &match.flow);
> > @@ -4999,8 +4929,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
> > put_actions);
> > if (OVS_UNLIKELY(error && error != ENOSPC)) {
> > dp_packet_delete(packet);
> > - (*lost_cnt)++;
> > - return;
> > + return error;
> > }
> >
> > /* The Netlink encoding of datapath flow keys cannot express
> > @@ -5040,6 +4969,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread
*pmd,
> > ovs_mutex_unlock(&pmd->flow_mutex);
> > emc_probabilistic_insert(pmd, key, netdev_flow);
> > }
>
> Should this have some additional cover for the ENOSPC case? Meaning, we
> will now increment a failure condition, but I'm not sure it should be
> considered an 'upcall' failure?  After all, we'll still perform the actions.
Good question. I inherited this change from Darrell's original patch set.
Looking at ofproto, there are a number of different conditions that return
an ENOSPC error from the upcall: reached ukey limit, recirculation without
matching recirc_ref, ukey installation failed, possibly more. I guess the
common denominator is that we execute the dp actions for the packet just
without installing a datapath flow. So we can consider the upcall as
successful. I suggest returning zero when we get to the end of the function, as
the only error possible at that stage is ENOSPC.
@Darrell. OK with you?
[Darrell] No, I don’t think we can just translate an upcall that fails to
install a flow due to lack of space into ‘success/ok’.
One of the two main purposes of a miss upcall is to install a flow in
the fast path, and I think that is also the common expectation of success.
The problem may also be persistent, or even a bug.
Two possible options are to keep the upcall_fail_cnt++
increment, or to differentiate the two cases with something like
upcall_fail_to_install_flow alongside plain upcall_fail_cnt and
refactor accordingly.
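Darrell's second option could be sketched roughly as follows (the struct
and counter names are hypothetical, following the names suggested above,
not the actual OVS counters):

```c
#include <assert.h>
#include <errno.h>

/* Rough sketch of the two-counter alternative: differentiate a hard
 * upcall failure (packet dropped) from an upcall that executed the
 * actions but could not install a datapath flow. */
struct upcall_counters {
    unsigned int upcall_ok_cnt;               /* Flow installed, actions run. */
    unsigned int upcall_fail_to_install_flow; /* Actions run, no flow (ENOSPC). */
    unsigned int upcall_fail_cnt;             /* Hard failure, packet dropped. */
};

static void
count_upcall_result(struct upcall_counters *c, int error)
{
    if (!error) {
        c->upcall_ok_cnt++;
    } else if (error == ENOSPC) {
        c->upcall_fail_to_install_flow++;
    } else {
        c->upcall_fail_cnt++;
    }
}
```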
>
> > + return error;
> > }
> >
> > static inline void
> > @@ -5061,7 +4991,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
> > struct dpcls *cls;
> > struct dpcls_rule *rules[PKT_ARRAY_SIZE];
> > struct dp_netdev *dp = pmd->dp;
> > - int miss_cnt = 0, lost_cnt = 0;
> > + int upcall_ok_cnt = 0, upcall_fail_cnt = 0;
> > int lookup_cnt = 0, add_lookup_cnt;
> > bool any_miss;
> > size_t i;
> > @@ -5103,9 +5033,14 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
> > continue;
> > }
> >
> > - miss_cnt++;
> > - handle_packet_upcall(pmd, packet, &keys[i], &actions,
> > - &put_actions, &lost_cnt, now);
> > + int error = handle_packet_upcall(pmd, packet, &keys[i],
> > + &actions, &put_actions, now);
> > +
> > + if (OVS_UNLIKELY(error)) {
> > + upcall_fail_cnt++;
> > + } else {
> > + upcall_ok_cnt++;
> > + }
> > }
> >
> > ofpbuf_uninit(&actions);
> > @@ -5115,8 +5050,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
> > DP_PACKET_BATCH_FOR_EACH (packet, packets_) {
> > if (OVS_UNLIKELY(!rules[i])) {
> > dp_packet_delete(packet);
> > - lost_cnt++;
> > - miss_cnt++;
> > + upcall_fail_cnt++;
> > }
> > }
> > }
> > @@ -5134,10 +5068,14 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
> > dp_netdev_queue_batches(packet, flow, &keys[i].mf, batches,
n_batches);
> > }
> >
> > - dp_netdev_count_packet(pmd, DP_STAT_MASKED_HIT, cnt - miss_cnt);
> > - dp_netdev_count_packet(pmd, DP_STAT_LOOKUP_HIT, lookup_cnt);
> > - dp_netdev_count_packet(pmd, DP_STAT_MISS, miss_cnt);
> > - dp_netdev_count_packet(pmd, DP_STAT_LOST, lost_cnt);
> > + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MASKED_HIT,
> > + cnt - upcall_ok_cnt - upcall_fail_cnt);
> > + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MASKED_LOOKUP,
> > + lookup_cnt);
> > + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MISS,
> > + upcall_ok_cnt);
> > + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_LOST,
> > + upcall_fail_cnt);
> > }
> >
> > /* Packets enter the datapath from a port (or from recirculation) here.
> > diff --git a/tests/pmd.at b/tests/pmd.at
> > index e39a23a..0356f87 100644
> > --- a/tests/pmd.at
> > +++ b/tests/pmd.at
> > @@ -170,13 +170,16 @@ dummy@ovs-dummy: hit:0 missed:0
> > p0 7/1: (dummy-pmd: configured_rx_queues=4, configured_tx_queues=<cleared>, requested_rx_queues=4, requested_tx_queues=<cleared>)
> > ])
> >
> > -AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 5], [0], [dnl
> > +AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 8], [0], [dnl
> > pmd thread numa_id <cleared> core_id <cleared>:
> > + packets received:0
> > + packet recirculations:0
> > + avg. datapath passes per packet:0.00
> > emc hits:0
> > megaflow hits:0
> > - avg. subtable lookups per hit:0.00
> > - miss:0
> > - lost:0
> > + avg. subtable lookups per megaflow hit:0.00
> > + miss(i.e. lookup miss with success upcall):0
> > + lost(i.e. lookup miss with failed upcall):0
> > ])
> >
> > ovs-appctl time/stop
> > @@ -197,13 +200,16 @@ AT_CHECK([cat ovs-vswitchd.log | filter_flow_install | strip_xout], [0], [dnl
> > recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(src=50:54:00:00:00:77,dst=50:54:00:00:01:78),eth_type(0x0800),ipv4(frag=no), actions: <del>
> > ])
> >
> > -AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 5], [0], [dnl
> > +AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 8], [0], [dnl
> > pmd thread numa_id <cleared> core_id <cleared>:
> > + packets received:20
> > + packet recirculations:0
> > + avg. datapath passes per packet:1.00
> > emc hits:19
> > megaflow hits:0
> > - avg. subtable lookups per hit:0.00
> > - miss:1
> > - lost:0
> > + avg. subtable lookups per megaflow hit:0.00
> > + miss(i.e. lookup miss with success upcall):1
> > + lost(i.e. lookup miss with failed upcall):0
> > ])
> >
> > OVS_VSWITCHD_STOP
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev