Comments inline.

Best regards, Ilya Maximets.
On 09.01.2018 12:26, Jan Scheurich wrote:
> Add module dpif-netdev-perf to host all PMD performance-related
> data structures and functions in dpif-netdev. Refactor the PMD
> stats handling in dpif-netdev and delegate whatever possible into
> the new module, using clean interfaces to shield dpif-netdev from
> the implementation details. Accordingly, all PMD statistics
> members are moved from the main struct dp_netdev_pmd_thread into
> a dedicated member of type struct pmd_perf_stats.
>
> Include Darrell's prior refactoring of PMD stats contained in
> [PATCH v5,2/3] dpif-netdev: Refactor some pmd stats:
>
> 1. The cycles per packet counts are now based on packets
> received rather than packet passes through the datapath.
>
> 2. Packet counters are now kept for packets received and
> packets recirculated. These are kept as separate counters for
> maintainability reasons. The cost of incrementing these counters
> is negligible. These new counters are also displayed to the user.
>
> 3. A display statistic is added for the average number of
> datapath passes per packet. This should be useful for user
> debugging and understanding of packet processing.
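(For illustration, an example of my own: with 20 packets received, of
which 5 are recirculated once, the datapath performs 20 + 5 = 25 passes,
so this statistic would read 25/20 = 1.25.)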
> 4. The user visible 'miss' counter is used for successful upcalls,
> rather than the sum of successful and unsuccessful upcalls. Hence,
> this becomes what the user historically understands by OVS 'miss upcall'.
> The user display is annotated to make this clear as well.
>
> 5. The user visible 'lost' counter remains as failed upcalls, but
> is annotated to make it clear what the meaning is.
>
> 6. The enum pmd_stat_type is annotated to make the usage of the
> stats counters clear.
>
> 7. The subtable lookup stats are renamed to make it clear that they
> relate to masked lookups.
>
> 8. The PMD stats test is updated to handle the new user stats of
> packets received, packets recirculated and average number of datapath
> passes per packet.
>
> On top of that, introduce a "-pmd <core>" option to the PMD info
> commands to filter the output for a single PMD.
>
> Signed-off-by: Jan Scheurich <[email protected]>
> Co-authored-by: Darrell Ball <[email protected]>
> Signed-off-by: Darrell Ball <[email protected]>
> ---
>  lib/automake.mk        |   2 +
>  lib/dpif-netdev-perf.c |  66 +++++++++
>  lib/dpif-netdev-perf.h | 139 +++++++++++++++++++
>  lib/dpif-netdev.c      | 357 +++++++++++++++++++++----------------------
>  tests/pmd.at           |  22 +--
>  5 files changed, 370 insertions(+), 216 deletions(-)
>  create mode 100644 lib/dpif-netdev-perf.c
>  create mode 100644 lib/dpif-netdev-perf.h
>
> diff --git a/lib/automake.mk b/lib/automake.mk
> index 4b38a11..159319f 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -80,6 +80,8 @@ lib_libopenvswitch_la_SOURCES = \
>      lib/dpdk.h \
>      lib/dpif-netdev.c \
>      lib/dpif-netdev.h \
> +    lib/dpif-netdev-perf.c \
> +    lib/dpif-netdev-perf.h \
>      lib/dpif-provider.h \
>      lib/dpif.c \
>      lib/dpif.h \
> diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c
> new file mode 100644
> index 0000000..1d5db6b
> --- /dev/null
> +++ b/lib/dpif-netdev-perf.c
> @@ -0,0 +1,66 @@
> +/*
> + * Copyright (c) 2017 Ericsson AB.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + *     http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#include <config.h>
> +
> +#ifdef DPDK_NETDEV
> +#include <rte_cycles.h>
> +#endif
> +
> +#include "openvswitch/dynamic-string.h"
> +#include "openvswitch/vlog.h"
> +#include "dpif-netdev-perf.h"
> +#include "timeval.h"
> +
> +VLOG_DEFINE_THIS_MODULE(pmd_perf);
> +
> +void
> +pmd_perf_stats_init(struct pmd_perf_stats *s) {

'{' should be on the next line.

> +    memset(s, 0 , sizeof(*s));
> +    s->start_ms = time_msec();
> +}
> +
> +void
> +pmd_perf_read_counters(struct pmd_perf_stats *s,
> +                       uint64_t stats[PMD_N_STATS])
> +{
> +    uint64_t val;
> +
> +    /* This loop subtracts reference values (.zero[*]) from the counters.
> +     * Since loads and stores are relaxed, it might be possible for a .zero[*]
> +     * value to be more recent than the current value we're reading from the
> +     * counter. This is not a big problem, since these numbers are not
> +     * supposed to be 100% accurate, but we should at least make sure that
> +     * the result is not negative. */
> +    for (int i = 0; i < PMD_N_STATS; i++) {
> +        atomic_read_relaxed(&s->counters.n[i], &val);
> +        if (val > s->counters.zero[i]) {
> +            stats[i] = val - s->counters.zero[i];
> +        } else {
> +            stats[i] = 0;
> +        }

How about:

    stats[i] = (val > s->counters.zero[i]) ? val - s->counters.zero[i] : 0;

> +    }
> +}
> +
> +void
> +pmd_perf_stats_clear(struct pmd_perf_stats *s)
> +{
> +    s->start_ms = 0; /* Marks start of clearing. */
> +    for (int i = 0; i < PMD_N_STATS; i++) {
> +        atomic_read_relaxed(&s->counters.n[i], &s->counters.zero[i]);
> +    }
> +    s->start_ms = time_msec(); /* Clearing finished. */
> +}
> diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
> new file mode 100644
> index 0000000..b31bf49
> --- /dev/null
> +++ b/lib/dpif-netdev-perf.h
> @@ -0,0 +1,139 @@
> +/*
> + * Copyright (c) 2017 Ericsson AB.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + *     http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#ifndef DPIF_NETDEV_PERF_H
> +#define DPIF_NETDEV_PERF_H 1
> +
> +#include <stdbool.h>
> +#include <stddef.h>
> +#include <stdint.h>
> +#include <string.h>
> +#include <math.h>
> +
> +#include "openvswitch/vlog.h"
> +#include "ovs-atomic.h"
> +#include "timeval.h"
> +#include "unixctl.h"
> +#include "util.h"
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +/* This module encapsulates data structures and functions to maintain PMD
> + * performance metrics such as packet counters and execution cycles. It
> + * provides a clean API for dpif-netdev to initialize, update, read and
> + * reset these metrics.
> + */
> +
> +/* Set of counter types maintained in pmd_perf_stats.
> + */
> +
> +enum pmd_stat_type {
> +    PMD_STAT_EXACT_HIT,     /* Packets that had an exact match (emc). */
> +    PMD_STAT_MASKED_HIT,    /* Packets that matched in the flow table. */
> +    PMD_STAT_MISS,          /* Packets that did not match and upcall was ok. */
> +    PMD_STAT_LOST,          /* Packets that did not match and upcall failed. */
> +                            /* The above statistics account for the total
> +                             * number of packet passes through the datapath
> +                             * pipeline and should not be overlapping with each
> +                             * other. */
> +    PMD_STAT_MASKED_LOOKUP, /* Number of subtable lookups for flow table
> +                               hits. Each MASKED_HIT hit will have >= 1
> +                               MASKED_LOOKUP(s). */
> +    PMD_STAT_RECV,          /* Packets entering the datapath pipeline from an
> +                             * interface. */
> +    PMD_STAT_RECIRC,        /* Packets reentering the datapath pipeline due to
> +                             * recirculation. */
> +    PMD_STAT_SENT_PKTS,     /* Packets that have been sent. */
> +    PMD_STAT_SENT_BATCHES,  /* Number of batches sent. */
> +    PMD_CYCLES_POLL_IDLE,   /* Cycles spent unsuccessfully polling. */
> +    PMD_CYCLES_POLL_BUSY,   /* Cycles spent successfully polling and
> +                             * processing polled packets. */
> +    PMD_CYCLES_OVERHEAD,    /* Cycles spent for other tasks. */
> +    PMD_CYCLES_UPCALL,      /* Cycles spent in upcalls. Included in
> +                               PMD_CYCLES_POLL_BUSY. */
> +    PMD_CYCLES_ITER_IDLE,   /* Cycles spent in idle iterations. */
> +    PMD_CYCLES_ITER_BUSY,   /* Cycles spent in busy iterations. */
> +    PMD_N_STATS
> +};
> +
> +/* Array of PMD counters indexed by enum pmd_stat_type.
> + * The n[] array contains the actual counter values since initialization
> + * of the PMD. Counters are atomically updated from the PMD but are
> + * read and cleared also from other processes. To clear the counters at
> + * PMD run-time, the current counter values are copied over to the zero[]
> + * array. To read counters we subtract zero[] value from n[]. */
> +
> +struct pmd_counters {
> +    atomic_uint64_t n[PMD_N_STATS];  /* Value since _init(). */
> +    uint64_t zero[PMD_N_STATS];      /* Value at last _clear(). */
> +};
> +
> +/* Container for all performance metrics of a PMD.
> + * Part of the struct dp_netdev_pmd_thread. */
> +
> +struct pmd_perf_stats {
> +    uint64_t start_ms;  /* Start of the current performance measurement
> +                           period. */

'start_ms' is used only as a flag. I see no reason to store the current
time in it. Maybe it's better to convert it to bool and give it a
meaningful name?
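For example, something like this (the name 'clearing' is only a
placeholder of mine):

    struct pmd_perf_stats {
        /* True while pmd_perf_stats_clear() is copying the current
         * counter values into zero[]. */
        bool clearing;
        ...
    };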
> +    uint64_t last_tsc;  /* Start of the current PMD iteration in TSC
> +                           cycles. */

The comments in this structure don't look so pretty. I'd prefer to place
them above the structure members for readability:

    struct pmd_perf_stats {
        /* Start of the current performance measurement period. */
        uint64_t start_ms;
        /* Start of the current PMD iteration in TSC cycles. */
        uint64_t last_tsc;

> +    struct pmd_counters counters;
> +};
> +
> +void pmd_perf_stats_init(struct pmd_perf_stats *s);
> +void pmd_perf_stats_clear(struct pmd_perf_stats *s);
> +void pmd_perf_read_counters(struct pmd_perf_stats *s,
> +                            uint64_t stats[PMD_N_STATS]);
> +
> +static inline void
> +pmd_perf_update_counter(struct pmd_perf_stats *s,
> +                        enum pmd_stat_type counter, int delta)
> +{
> +    uint64_t tmp;
> +    atomic_read_relaxed(&s->counters.n[counter], &tmp);
> +    tmp += delta;

I think a comment is required here explaining why we're doing this
non-atomically.
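Something along these lines, perhaps (wording is mine, based on the
comment in pmd_perf_end_iteration() below):

    /* A non-atomic read-modify-write is OK because only the owning PMD
     * thread modifies its counters; the relaxed atomic load and store
     * only keep readers in other threads from seeing torn values. */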
> +    atomic_store_relaxed(&s->counters.n[counter], tmp);
> +}
> +
> +static inline void
> +pmd_perf_start_iteration(struct pmd_perf_stats *s, uint64_t now_tsc)
> +{
> +    if (OVS_UNLIKELY(s->start_ms == 0)) {
> +        /* Stats not yet fully initialized. */
> +        return;

I don't fully understand this. 'pmd_perf_start_iteration' cannot fail.
And what will happen if we just return here? We will use an outdated
'last_tsc', and the cycles statistics could be broken after that. What
is the purpose of 'start_ms'? It looks like it protects nothing.

> +    }
> +    s->last_tsc = now_tsc;
> +}
> +
> +static inline void
> +pmd_perf_end_iteration(struct pmd_perf_stats *s, uint64_t now_tsc,
> +                       int packets)
> +{
> +    uint64_t cycles = now_tsc - s->last_tsc;
> +
> +    /* No need for atomic updates as only invoked within PMD thread. */
> +    if (packets > 0) {
> +        s->counters.n[PMD_CYCLES_ITER_BUSY] += cycles;
> +    } else {
> +        s->counters.n[PMD_CYCLES_ITER_IDLE] += cycles;
> +    }
> +}
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* DPIF_NETDEV_PERF_H */
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index d06f585..ad050f2 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -44,6 +44,7 @@
>  #include "csum.h"
>  #include "dp-packet.h"
>  #include "dpif.h"
> +#include "dpif-netdev-perf.h"
>  #include "dpif-provider.h"
>  #include "dummy.h"
>  #include "fat-rwlock.h"
> @@ -331,25 +332,6 @@ static struct dp_netdev_port *dp_netdev_lookup_port(const struct dp_netdev *dp,
>                                                      odp_port_t)
>      OVS_REQUIRES(dp->port_mutex);
>
> -enum dp_stat_type {
> -    DP_STAT_EXACT_HIT,      /* Packets that had an exact match (emc). */
> -    DP_STAT_MASKED_HIT,     /* Packets that matched in the flow table. */
> -    DP_STAT_MISS,           /* Packets that did not match. */
> -    DP_STAT_LOST,           /* Packets not passed up to the client. */
> -    DP_STAT_LOOKUP_HIT,     /* Number of subtable lookups for flow table
> -                               hits */
> -    DP_STAT_SENT_PKTS,      /* Packets that has been sent. */
> -    DP_STAT_SENT_BATCHES,   /* Number of batches sent. */
> -    DP_N_STATS
> -};
> -
> -enum pmd_cycles_counter_type {
> -    PMD_CYCLES_IDLE,        /* Cycles spent idle or unsuccessful polling */
> -    PMD_CYCLES_PROCESSING,  /* Cycles spent successfully polling and
> -                             * processing polled packets */
> -    PMD_N_CYCLES
> -};
> -
>  enum rxq_cycles_counter_type {
>      RXQ_CYCLES_PROC_CURR,   /* Cycles spent successfully polling and
>                                 processing packets during the current
> @@ -499,21 +481,6 @@ struct dp_netdev_actions *dp_netdev_flow_get_actions(
>      const struct dp_netdev_flow *);
>  static void dp_netdev_actions_free(struct dp_netdev_actions *);
>
> -/* Contained by struct dp_netdev_pmd_thread's 'stats' member. */
> -struct dp_netdev_pmd_stats {
> -    /* Indexed by DP_STAT_*. */
> -    atomic_ullong n[DP_N_STATS];
> -};
> -
> -/* Contained by struct dp_netdev_pmd_thread's 'cycle' member. */
> -struct dp_netdev_pmd_cycles {
> -    /* Indexed by PMD_CYCLES_*. */
> -    atomic_ullong n[PMD_N_CYCLES];
> -};
> -
> -static void dp_netdev_count_packet(struct dp_netdev_pmd_thread *,
> -                                   enum dp_stat_type type, int cnt);
> -
>  struct polled_queue {
>      struct dp_netdev_rxq *rxq;
>      odp_port_t port_no;
> @@ -595,12 +562,6 @@ struct dp_netdev_pmd_thread {
>                                     are stored for each polled rxq. */
>      long long int rxq_next_cycle_store;
>
> -    /* Statistics. */
> -    struct dp_netdev_pmd_stats stats;
> -
> -    /* Cycles counters */
> -    struct dp_netdev_pmd_cycles cycles;
> -
>      /* Current context of the PMD thread. */
>      struct dp_netdev_pmd_thread_ctx ctx;
>
> @@ -638,12 +599,8 @@ struct dp_netdev_pmd_thread {
>      struct hmap tnl_port_cache;
>      struct hmap send_port_cache;
>
> -    /* Only a pmd thread can write on its own 'cycles' and 'stats'.
> -     * The main thread keeps 'stats_zero' and 'cycles_zero' as base
> -     * values and subtracts them from 'stats' and 'cycles' before
> -     * reporting to the user */
> -    unsigned long long stats_zero[DP_N_STATS];
> -    uint64_t cycles_zero[PMD_N_CYCLES];
> +    /* Keep track of detailed PMD performance statistics. */
> +    struct pmd_perf_stats perf_stats;
>
>      /* Set to true if the pmd thread needs to be reloaded. */
>      bool need_reload;
> @@ -833,47 +790,10 @@ enum pmd_info_type {
>  };
>
>  static void
> -pmd_info_show_stats(struct ds *reply,
> -                    struct dp_netdev_pmd_thread *pmd,
> -                    unsigned long long stats[DP_N_STATS],
> -                    uint64_t cycles[PMD_N_CYCLES])
> +format_pmd_thread(struct ds *reply, struct dp_netdev_pmd_thread *pmd)
>  {
> -    unsigned long long total_packets;
> -    uint64_t total_cycles = 0;
> -    double lookups_per_hit = 0, packets_per_batch = 0;
> -    int i;
> -
> -    /* These loops subtracts reference values ('*_zero') from the counters.
> -     * Since loads and stores are relaxed, it might be possible for a '*_zero'
> -     * value to be more recent than the current value we're reading from the
> -     * counter. This is not a big problem, since these numbers are not
> -     * supposed to be too accurate, but we should at least make sure that
> -     * the result is not negative. */
> -    for (i = 0; i < DP_N_STATS; i++) {
> -        if (stats[i] > pmd->stats_zero[i]) {
> -            stats[i] -= pmd->stats_zero[i];
> -        } else {
> -            stats[i] = 0;
> -        }
> -    }
> -
> -    /* Sum of all the matched and not matched packets gives the total. */
> -    total_packets = stats[DP_STAT_EXACT_HIT] + stats[DP_STAT_MASKED_HIT]
> -                    + stats[DP_STAT_MISS];
> -
> -    for (i = 0; i < PMD_N_CYCLES; i++) {
> -        if (cycles[i] > pmd->cycles_zero[i]) {
> -            cycles[i] -= pmd->cycles_zero[i];
> -        } else {
> -            cycles[i] = 0;
> -        }
> -
> -        total_cycles += cycles[i];
> -    }
> -
>      ds_put_cstr(reply, (pmd->core_id == NON_PMD_CORE_ID)
>                         ? "main thread" : "pmd thread");
> -
>      if (pmd->numa_id != OVS_NUMA_UNSPEC) {
>          ds_put_format(reply, " numa_id %d", pmd->numa_id);
>      }
> @@ -881,23 +801,52 @@ pmd_info_show_stats(struct ds *reply,
>          ds_put_format(reply, " core_id %u", pmd->core_id);
>      }
>      ds_put_cstr(reply, ":\n");
> +}
> +
> +static void
> +pmd_info_show_stats(struct ds *reply,
> +                    struct dp_netdev_pmd_thread *pmd)
> +{
> +    uint64_t stats[PMD_N_STATS];
> +    uint64_t total_cycles = 0;

No need to clear 'total_cycles'.

> +    double passes_per_pkt = 0;
> +    double lookups_per_hit = 0;
> +    double packets_per_batch = 0;
> +
> +    pmd_perf_read_counters(&pmd->perf_stats, stats);
> +    total_cycles = stats[PMD_CYCLES_ITER_IDLE]
> +                   + stats[PMD_CYCLES_ITER_BUSY];
> +    uint64_t total_packets = stats[PMD_STAT_RECV];

IMHO, this should be declared at the top near 'total_cycles'.

>
> -    if (stats[DP_STAT_MASKED_HIT] > 0) {
> -        lookups_per_hit = stats[DP_STAT_LOOKUP_HIT]
> -                          / (double) stats[DP_STAT_MASKED_HIT];
> +    format_pmd_thread(reply, pmd);
> +
> +    if (total_packets > 0) {
> +        passes_per_pkt = (total_packets + stats[PMD_STAT_RECIRC])
> +                         / (double) total_packets;
> +    }
> +    if (stats[PMD_STAT_MASKED_HIT] > 0) {
> +        lookups_per_hit = stats[PMD_STAT_MASKED_LOOKUP]
> +                          / (double) stats[PMD_STAT_MASKED_HIT];
>      }
> -    if (stats[DP_STAT_SENT_BATCHES] > 0) {
> -        packets_per_batch = stats[DP_STAT_SENT_PKTS]
> -                            / (double) stats[DP_STAT_SENT_BATCHES];
> +    if (stats[PMD_STAT_SENT_BATCHES] > 0) {
> +        packets_per_batch = stats[PMD_STAT_SENT_PKTS]
> +                            / (double) stats[PMD_STAT_SENT_BATCHES];
>      }
>
>      ds_put_format(reply,
> -                  "\temc hits:%llu\n\tmegaflow hits:%llu\n"
> -                  "\tavg. subtable lookups per hit:%.2f\n"
> -                  "\tmiss:%llu\n\tlost:%llu\n"
> +                  "\tpackets received:%"PRIu64"\n"
> +                  "\tpacket recirculations:%"PRIu64"\n"
> +                  "\tavg. datapath passes per packet:%.2f\n"
> +                  "\temc hits:%"PRIu64"\n"
> +                  "\tmegaflow hits:%"PRIu64"\n"
> +                  "\tavg. subtable lookups per megaflow hit:%.2f\n"
> +                  "\tmiss(i.e. lookup miss with success upcall):%"PRIu64"\n"
> +                  "\tlost(i.e. lookup miss with failed upcall):%"PRIu64"\n"

Could you add a space before '(': "miss (i.e.", "lost (i.e." ?

>                    "\tavg. packets per output batch: %.2f\n",
> -                  stats[DP_STAT_EXACT_HIT], stats[DP_STAT_MASKED_HIT],
> -                  lookups_per_hit, stats[DP_STAT_MISS], stats[DP_STAT_LOST],
> +                  total_packets, stats[PMD_STAT_RECIRC],
> +                  passes_per_pkt, stats[PMD_STAT_EXACT_HIT],
> +                  stats[PMD_STAT_MASKED_HIT], lookups_per_hit,
> +                  stats[PMD_STAT_MISS], stats[PMD_STAT_LOST],
>                    packets_per_batch);

Can we use the same format '%.02f' for all the floating point numbers?
The format in the code above now differs from the format in the code
below.

One more thing: as we're already changing the format of the output,
could we add an additional space between ':' and the values? This has
annoyed me for a long time, I just didn't want to break someone's
parsing utilities. I mean:

    emc hits: 38

instead of

    emc hits:38

Same for the code below in this function.

>
>      if (total_cycles == 0) {
> @@ -907,46 +856,25 @@ pmd_info_show_stats(struct ds *reply,
>      ds_put_format(reply,
>                    "\tidle cycles:%"PRIu64" (%.02f%%)\n"
>                    "\tprocessing cycles:%"PRIu64" (%.02f%%)\n",
> -                  cycles[PMD_CYCLES_IDLE],
> -                  cycles[PMD_CYCLES_IDLE] / (double)total_cycles * 100,
> -                  cycles[PMD_CYCLES_PROCESSING],
> -                  cycles[PMD_CYCLES_PROCESSING] / (double)total_cycles * 100);
> +                  stats[PMD_CYCLES_ITER_IDLE],
> +                  stats[PMD_CYCLES_ITER_IDLE] / (double) total_cycles * 100,
> +                  stats[PMD_CYCLES_ITER_BUSY],
> +                  stats[PMD_CYCLES_ITER_BUSY] / (double) total_cycles * 100);
>
>      if (total_packets == 0) {
>          return;
>      }
>
>      ds_put_format(reply,
> -                  "\tavg cycles per packet: %.02f (%"PRIu64"/%llu)\n",
> -                  total_cycles / (double)total_packets,
> +                  "\tavg cycles per packet: %.02f (%"PRIu64"/%"PRIu64")\n",
> +                  total_cycles / (double) total_packets,
>                    total_cycles, total_packets);
>
>      ds_put_format(reply,
>                    "\tavg processing cycles per packet: "
> -                  "%.02f (%"PRIu64"/%llu)\n",
> -                  cycles[PMD_CYCLES_PROCESSING] / (double)total_packets,
> -                  cycles[PMD_CYCLES_PROCESSING], total_packets);
> -}
> -
> -static void
> -pmd_info_clear_stats(struct ds *reply OVS_UNUSED,
> -                     struct dp_netdev_pmd_thread *pmd,
> -                     unsigned long long stats[DP_N_STATS],
> -                     uint64_t cycles[PMD_N_CYCLES])
> -{
> -    int i;
> -
> -    /* We cannot write 'stats' and 'cycles' (because they're written by other
> -     * threads) and we shouldn't change 'stats' (because they're used to count
> -     * datapath stats, which must not be cleared here). Instead, we save the
> -     * current values and subtract them from the values to be displayed in the
> -     * future */
> -    for (i = 0; i < DP_N_STATS; i++) {
> -        pmd->stats_zero[i] = stats[i];
> -    }
> -    for (i = 0; i < PMD_N_CYCLES; i++) {
> -        pmd->cycles_zero[i] = cycles[i];
> -    }
> +                  "%.02f (%"PRIu64"/%"PRIu64")\n",
> +                  stats[PMD_CYCLES_POLL_BUSY] / (double) total_packets,
> +                  stats[PMD_CYCLES_POLL_BUSY], total_packets);
>  }
>
>  static int
> @@ -1106,23 +1034,36 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[],
>      struct ds reply = DS_EMPTY_INITIALIZER;
>      struct dp_netdev_pmd_thread **pmd_list;
>      struct dp_netdev *dp = NULL;
> -    size_t n;
>      enum pmd_info_type type = *(enum pmd_info_type *) aux;
> +    unsigned long core_id;
> +    bool filter_on_pmd = false;
> +    size_t n;
>
>      ovs_mutex_lock(&dp_netdev_mutex);
>
> -    if (argc == 2) {
> -        dp = shash_find_data(&dp_netdevs, argv[1]);
> -    } else if (shash_count(&dp_netdevs) == 1) {
> -        /* There's only one datapath */
> -        dp = shash_first(&dp_netdevs)->data;
> +    while (argc > 1) {
> +        if (!strcmp(argv[1], "-pmd")) {
> +            core_id = strtoul(argv[2], NULL, 10);

Error checking is required here. IMHO, you need to use 'str_to_int' or
a similar library function.
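A rough, untested sketch of what I mean, with 'core_id' declared as
unsigned int:

    if (!str_to_uint(argv[2], 10, &core_id)) {
        ovs_mutex_unlock(&dp_netdev_mutex);
        unixctl_command_reply_error(conn, "invalid core_id");
        return;
    }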
> +            filter_on_pmd = true;
> +            argc -= 2;
> +            argv += 2;
> +        } else {
> +            dp = shash_find_data(&dp_netdevs, argv[1]);
> +            argc -= 1;
> +            argv += 1;
> +        }
> +    }
>
>      if (!dp) {
> -        ovs_mutex_unlock(&dp_netdev_mutex);
> -        unixctl_command_reply_error(conn,
> -                                    "please specify an existing datapath");
> -        return;
> +        if (shash_count(&dp_netdevs) == 1) {
> +            /* There's only one datapath */
> +            dp = shash_first(&dp_netdevs)->data;
> +        } else {
> +            ovs_mutex_unlock(&dp_netdev_mutex);
> +            unixctl_command_reply_error(conn,
> +                                        "please specify an existing datapath");
> +            return;
> +        }
>      }
>
>      sorted_poll_thread_list(dp, &pmd_list, &n);
> @@ -1131,26 +1072,15 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[],
>          if (!pmd) {
>              break;
>          }
> -
> +        if (filter_on_pmd && pmd->core_id != core_id) {
> +            continue;
> +        }
>          if (type == PMD_INFO_SHOW_RXQ) {
>              pmd_info_show_rxq(&reply, pmd);
> -        } else {
> -            unsigned long long stats[DP_N_STATS];
> -            uint64_t cycles[PMD_N_CYCLES];
> -
> -            /* Read current stats and cycle counters */
> -            for (size_t j = 0; j < ARRAY_SIZE(stats); j++) {
> -                atomic_read_relaxed(&pmd->stats.n[j], &stats[j]);
> -            }
> -            for (size_t j = 0; j < ARRAY_SIZE(cycles); j++) {
> -                atomic_read_relaxed(&pmd->cycles.n[j], &cycles[j]);
> -            }
> -
> -            if (type == PMD_INFO_CLEAR_STATS) {
> -                pmd_info_clear_stats(&reply, pmd, stats, cycles);
> -            } else if (type == PMD_INFO_SHOW_STATS) {
> -                pmd_info_show_stats(&reply, pmd, stats, cycles);
> -            }
> +        } else if (type == PMD_INFO_CLEAR_STATS) {
> +            pmd_perf_stats_clear(&pmd->perf_stats);
> +        } else if (type == PMD_INFO_SHOW_STATS) {
> +            pmd_info_show_stats(&reply, pmd);
>          }
>      }
>      free(pmd_list);
> @@ -1168,14 +1098,14 @@ dpif_netdev_init(void)
>                           clear_aux = PMD_INFO_CLEAR_STATS,
>                           poll_aux = PMD_INFO_SHOW_RXQ;
>
> -    unixctl_command_register("dpif-netdev/pmd-stats-show", "[dp]",
> -                             0, 1, dpif_netdev_pmd_info,
> +    unixctl_command_register("dpif-netdev/pmd-stats-show", "[-pmd core | dp]",

It should be "[-pmd core] [dp]". The current version means "optionally,
only one of the arguments". Same in all other cases below.
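For reference, the resulting usage would be something like:

    ovs-appctl dpif-netdev/pmd-stats-show -pmd 3

where 3 is the core id of the PMD to show.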
> +                             0, 2, dpif_netdev_pmd_info,
>                               (void *)&show_aux);
> -    unixctl_command_register("dpif-netdev/pmd-stats-clear", "[dp]",
> -                             0, 1, dpif_netdev_pmd_info,
> +    unixctl_command_register("dpif-netdev/pmd-stats-clear", "[-pmd core | dp]",
> +                             0, 2, dpif_netdev_pmd_info,
>                               (void *)&clear_aux);
> -    unixctl_command_register("dpif-netdev/pmd-rxq-show", "[dp]",
> -                             0, 1, dpif_netdev_pmd_info,
> +    unixctl_command_register("dpif-netdev/pmd-rxq-show", "[-pmd core | dp]",
> +                             0, 2, dpif_netdev_pmd_info,
>                               (void *)&poll_aux);
>      unixctl_command_register("dpif-netdev/pmd-rxq-rebalance", "[dp]",
>                               0, 1, dpif_netdev_pmd_rebalance,
> @@ -1511,23 +1441,19 @@ dpif_netdev_get_stats(const struct dpif *dpif, struct dpif_dp_stats *stats)
>  {
>      struct dp_netdev *dp = get_dp_netdev(dpif);
>      struct dp_netdev_pmd_thread *pmd;
> +    uint64_t pmd_stats[PMD_N_STATS];
>
> -    stats->n_flows = stats->n_hit = stats->n_missed = stats->n_lost = 0;
> +    stats->n_flows = stats->n_hit =
> +        stats->n_mask_hit = stats->n_missed = stats->n_lost = 0;
>      CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
> -        unsigned long long n;
>          stats->n_flows += cmap_count(&pmd->flow_table);
> -
> -        atomic_read_relaxed(&pmd->stats.n[DP_STAT_MASKED_HIT], &n);
> -        stats->n_hit += n;
> -        atomic_read_relaxed(&pmd->stats.n[DP_STAT_EXACT_HIT], &n);
> -        stats->n_hit += n;
> -        atomic_read_relaxed(&pmd->stats.n[DP_STAT_MISS], &n);
> -        stats->n_missed += n;
> -        atomic_read_relaxed(&pmd->stats.n[DP_STAT_LOST], &n);
> -        stats->n_lost += n;
> +        pmd_perf_read_counters(&pmd->perf_stats, pmd_stats);
> +        stats->n_hit += pmd_stats[PMD_STAT_EXACT_HIT];
> +        stats->n_mask_hit += pmd_stats[PMD_STAT_MASKED_HIT];
> +        stats->n_missed += pmd_stats[PMD_STAT_MISS];
> +        stats->n_lost += pmd_stats[PMD_STAT_LOST];
>      }
>      stats->n_masks = UINT32_MAX;
> -    stats->n_mask_hit = UINT64_MAX;
>
>      return 0;
>  }
> @@ -3209,28 +3135,28 @@ cycles_count_start(struct dp_netdev_pmd_thread *pmd)
>  /* Stop counting cycles and add them to the counter 'type' */
>  static inline void
>  cycles_count_end(struct dp_netdev_pmd_thread *pmd,
> -                 enum pmd_cycles_counter_type type)
> +                 enum pmd_stat_type type)
>      OVS_RELEASES(&cycles_counter_fake_mutex)
>      OVS_NO_THREAD_SAFETY_ANALYSIS
>  {
>      unsigned long long interval = cycles_counter() - pmd->ctx.last_cycles;
>
> -    non_atomic_ullong_add(&pmd->cycles.n[type], interval);
> +    pmd_perf_update_counter(&pmd->perf_stats, type, interval);
>  }
>
>  /* Calculate the intermediate cycle result and add to the counter 'type' */
>  static inline void
>  cycles_count_intermediate(struct dp_netdev_pmd_thread *pmd,
>                            struct dp_netdev_rxq *rxq,
> -                          enum pmd_cycles_counter_type type)
> +                          enum pmd_stat_type type)
>      OVS_NO_THREAD_SAFETY_ANALYSIS
>  {
>      unsigned long long new_cycles = cycles_counter();
>      unsigned long long interval = new_cycles - pmd->ctx.last_cycles;
>      pmd->ctx.last_cycles = new_cycles;
>
> -    non_atomic_ullong_add(&pmd->cycles.n[type], interval);
> -    if (rxq && (type == PMD_CYCLES_PROCESSING)) {
> +    pmd_perf_update_counter(&pmd->perf_stats, type, interval);
> +    if (rxq && (type == PMD_CYCLES_POLL_BUSY)) {
>          /* Add to the amount of current processing cycles. */
>          non_atomic_ullong_add(&rxq->cycles[RXQ_CYCLES_PROC_CURR], interval);
>      }
> @@ -3289,8 +3215,10 @@ dp_netdev_pmd_flush_output_on_port(struct dp_netdev_pmd_thread *pmd,
>      netdev_send(p->port->netdev, tx_qid, &p->output_pkts, dynamic_txqs);
>      dp_packet_batch_init(&p->output_pkts);
>
> -    dp_netdev_count_packet(pmd, DP_STAT_SENT_PKTS, output_cnt);
> -    dp_netdev_count_packet(pmd, DP_STAT_SENT_BATCHES, 1);
> +    pmd_perf_update_counter(&pmd->perf_stats,
> +                            PMD_STAT_SENT_PKTS, output_cnt);

One line, please.

> +    pmd_perf_update_counter(&pmd->perf_stats,
> +                            PMD_STAT_SENT_BATCHES, 1);

Same.

>  }
>
>  static void
> @@ -3971,12 +3899,12 @@ dpif_netdev_run(struct dpif *dpif)
>                                                                port->port_no);
>                      cycles_count_intermediate(non_pmd, NULL,
>                                                process_packets
> -                                              ? PMD_CYCLES_PROCESSING
> -                                              : PMD_CYCLES_IDLE);
> +                                              ? PMD_CYCLES_POLL_BUSY
> +                                              : PMD_CYCLES_POLL_IDLE);
>                  }
>              }
>          }
> -        cycles_count_end(non_pmd, PMD_CYCLES_IDLE);
> +        cycles_count_end(non_pmd, PMD_CYCLES_POLL_IDLE);
>          pmd_thread_ctx_time_update(non_pmd);
>          dpif_netdev_xps_revalidate_pmd(non_pmd, false);
>          ovs_mutex_unlock(&dp->non_pmd_mutex);
> @@ -4121,6 +4049,7 @@ static void *
>  pmd_thread_main(void *f_)
>  {
>      struct dp_netdev_pmd_thread *pmd = f_;
> +    struct pmd_perf_stats *s = &pmd->perf_stats;
>      unsigned int lc = 0;
>      struct polled_queue *poll_list;
>      bool exiting;
> @@ -4156,13 +4085,17 @@ reload:
>
>      cycles_count_start(pmd);
>      for (;;) {
> +        uint64_t iter_packets = 0;
> +        pmd_perf_start_iteration(s, pmd->ctx.last_cycles);
>          for (i = 0; i < poll_cnt; i++) {
>              process_packets =
>                  dp_netdev_process_rxq_port(pmd, poll_list[i].rxq->rx,
>                                             poll_list[i].port_no);
>              cycles_count_intermediate(pmd, poll_list[i].rxq,
> -                                      process_packets ? PMD_CYCLES_PROCESSING
> -                                                      : PMD_CYCLES_IDLE);
> +                                      process_packets
> +                                      ? PMD_CYCLES_POLL_BUSY
> +                                      : PMD_CYCLES_POLL_IDLE);
> +            iter_packets += process_packets;
>          }
>
>          if (lc++ > 1024) {
> @@ -4183,10 +4116,12 @@ reload:
>              if (reload) {
>                  break;
>              }
> +            cycles_count_intermediate(pmd, NULL, PMD_CYCLES_OVERHEAD);
>          }
> +        pmd_perf_end_iteration(s, pmd->ctx.last_cycles, iter_packets);

Why is 'iter_packets' needed? Can we just use a flag for this purpose,
like 'need_to_flush' in my v9? Of course, it should be renamed, but it
has the same logic.
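Something like this, I suppose (the name is mine):

    bool iter_was_busy = false;
    ...
    iter_was_busy = iter_was_busy || process_packets;
    ...
    pmd_perf_end_iteration(s, pmd->ctx.last_cycles, iter_was_busy);

with 'pmd_perf_end_iteration()' taking a bool instead of the packet
count.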
>      }
>
> -    cycles_count_end(pmd, PMD_CYCLES_IDLE);
> +    cycles_count_end(pmd, PMD_CYCLES_OVERHEAD);
>
>      poll_cnt = pmd_load_queues_and_ports(pmd, &poll_list);
>      exiting = latch_is_set(&pmd->exit_latch);
> @@ -4638,6 +4573,7 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
>          emc_cache_init(&pmd->flow_cache);
>          pmd_alloc_static_tx_qid(pmd);
>      }
> +    pmd_perf_stats_init(&pmd->perf_stats);
>      cmap_insert(&dp->poll_threads, CONST_CAST(struct cmap_node *, &pmd->node),
>                  hash_int(core_id, 0));
>  }
> @@ -4838,13 +4774,6 @@ dp_netdev_flow_used(struct dp_netdev_flow *netdev_flow, int cnt, int size,
>      atomic_store_relaxed(&netdev_flow->stats.tcp_flags, flags);
>  }
>
> -static void
> -dp_netdev_count_packet(struct dp_netdev_pmd_thread *pmd,
> -                       enum dp_stat_type type, int cnt)
> -{
> -    non_atomic_ullong_add(&pmd->stats.n[type], cnt);
> -}
> -
>  static int
>  dp_netdev_upcall(struct dp_netdev_pmd_thread *pmd, struct dp_packet *packet_,
>                   struct flow *flow, struct flow_wildcards *wc, ovs_u128 *ufid,
> @@ -5017,6 +4946,9 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
>      int i;
>
>      atomic_read_relaxed(&pmd->dp->emc_insert_min, &cur_min);
> +    pmd_perf_update_counter(&pmd->perf_stats,
> +                            md_is_valid ? PMD_STAT_RECIRC : PMD_STAT_RECV,
> +                            cnt);
>
>      DP_PACKET_BATCH_REFILL_FOR_EACH (i, cnt, packet, packets_) {
>          struct dp_netdev_flow *flow;
> @@ -5065,18 +4997,17 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
>          }
>      }
>
> -    dp_netdev_count_packet(pmd, DP_STAT_EXACT_HIT,
> -                           cnt - n_dropped - n_missed);
> +    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_EXACT_HIT,
> +                            cnt - n_dropped - n_missed);
>
>      return dp_packet_batch_size(packets_);
>  }
>
> -static inline void
> +static inline int
>  handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
>                       struct dp_packet *packet,
>                       const struct netdev_flow_key *key,
> -                     struct ofpbuf *actions, struct ofpbuf *put_actions,
> -                     int *lost_cnt)
> +                     struct ofpbuf *actions, struct ofpbuf *put_actions)
>  {
>      struct ofpbuf *add_actions;
>      struct dp_packet_batch b;
> @@ -5096,8 +5027,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
>                               put_actions);
>      if (OVS_UNLIKELY(error && error != ENOSPC)) {
>          dp_packet_delete(packet);
> -        (*lost_cnt)++;
> -        return;
> +        return error;
>      }
>
>      /* The Netlink encoding of datapath flow keys cannot express
> @@ -5137,6 +5067,9 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
>          ovs_mutex_unlock(&pmd->flow_mutex);
>          emc_probabilistic_insert(pmd, key, netdev_flow);
>      }
> +    /* Only error ENOSPC can reach here. We process the packet but do not
> +     * install a datapath flow. Treat as successful. */
> +    return 0;
>  }
>
>  static inline void
> @@ -5158,7 +5091,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
>      struct dpcls *cls;
>      struct dpcls_rule *rules[PKT_ARRAY_SIZE];
>      struct dp_netdev *dp = pmd->dp;
> -    int miss_cnt = 0, lost_cnt = 0;
> +    int upcall_ok_cnt = 0, upcall_fail_cnt = 0;
>      int lookup_cnt = 0, add_lookup_cnt;
>      bool any_miss;
>      size_t i;
> @@ -5200,9 +5133,14 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
>                  continue;
>              }
>
> -            miss_cnt++;
> -            handle_packet_upcall(pmd, packet, &keys[i], &actions,
> -                                 &put_actions, &lost_cnt);
> +            int error = handle_packet_upcall(pmd, packet, &keys[i],
> +                                             &actions, &put_actions);
> +
> +            if (OVS_UNLIKELY(error)) {
> +                upcall_fail_cnt++;
> +            } else {
> +                upcall_ok_cnt++;
> +            }
>          }
>
>          ofpbuf_uninit(&actions);
> @@ -5212,8 +5150,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
>          DP_PACKET_BATCH_FOR_EACH (packet, packets_) {
>              if (OVS_UNLIKELY(!rules[i])) {
>                  dp_packet_delete(packet);
> -                lost_cnt++;
> -                miss_cnt++;
> +                upcall_fail_cnt++;
>              }
>          }
>      }
> @@ -5231,10 +5168,14 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
>          dp_netdev_queue_batches(packet, flow, &keys[i].mf, batches,
>                                  n_batches);
>      }
>
> -    dp_netdev_count_packet(pmd, DP_STAT_MASKED_HIT, cnt - miss_cnt);
> -    dp_netdev_count_packet(pmd, DP_STAT_LOOKUP_HIT, lookup_cnt);
> -    dp_netdev_count_packet(pmd, DP_STAT_MISS, miss_cnt);
> -    dp_netdev_count_packet(pmd, DP_STAT_LOST, lost_cnt);
> +    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MASKED_HIT,
> +                            cnt - upcall_ok_cnt - upcall_fail_cnt);
> +    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MASKED_LOOKUP,
> +                            lookup_cnt);
> +    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MISS,
> +                            upcall_ok_cnt);
> +    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_LOST,
> +                            upcall_fail_cnt);

The above two could be one-liners.

>  }
>
>  /* Packets enter the datapath from a port (or from recirculation) here.
> diff --git a/tests/pmd.at b/tests/pmd.at
> index e39a23a..0356f87 100644
> --- a/tests/pmd.at
> +++ b/tests/pmd.at
> @@ -170,13 +170,16 @@ dummy@ovs-dummy: hit:0 missed:0
>    p0 7/1: (dummy-pmd: configured_rx_queues=4, configured_tx_queues=<cleared>, requested_rx_queues=4, requested_tx_queues=<cleared>)
>  ])
>
> -AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 5], [0], [dnl
> +AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 8], [0], [dnl
>  pmd thread numa_id <cleared> core_id <cleared>:
> +  packets received:0
> +  packet recirculations:0
> +  avg. datapath passes per packet:0.00
>    emc hits:0
>    megaflow hits:0
> -  avg. subtable lookups per hit:0.00
> -  miss:0
> -  lost:0
> +  avg. subtable lookups per megaflow hit:0.00
> +  miss(i.e. lookup miss with success upcall):0
> +  lost(i.e. lookup miss with failed upcall):0
>  ])
>
>  ovs-appctl time/stop
> @@ -197,13 +200,16 @@ AT_CHECK([cat ovs-vswitchd.log | filter_flow_install | strip_xout], [0], [dnl
>
>  recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(src=50:54:00:00:00:77,dst=50:54:00:00:01:78),eth_type(0x0800),ipv4(frag=no), actions: <del>
>  ])
>
> -AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 5], [0], [dnl
> +AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 8], [0], [dnl
>  pmd thread numa_id <cleared> core_id <cleared>:
> +  packets received:20
> +  packet recirculations:0
> +  avg. datapath passes per packet:1.00
>    emc hits:19
>    megaflow hits:0
> -  avg. subtable lookups per hit:0.00
> -  miss:1
> -  lost:0
> +  avg. subtable lookups per megaflow hit:0.00
> +  miss(i.e. lookup miss with success upcall):1
> +  lost(i.e. lookup miss with failed upcall):0
>  ])
>
>  OVS_VSWITCHD_STOP
