> -----Original Message-----
> From: Jan Scheurich [mailto:[email protected]]
> Sent: Tuesday, January 16, 2018 1:51 AM
> To: [email protected]
> Cc: [email protected]; Stokes, Ian <[email protected]>;
> [email protected]; O Mahony, Billy <[email protected]>;
> Jan Scheurich <[email protected]>
> Subject: [PATCH v7 3/3] dpif-netdev: Detection and logging of suspicious
> PMD iterations
>
> This patch enhances dpif-netdev-perf to detect iterations with suspicious
> statistics according to the following criteria:
>
> - iteration lasts longer than US_THR microseconds (default 250).
> This can be used to capture events where a PMD is blocked or
> interrupted for such a period of time that there is a risk for
> dropped packets on any of its Rx queues.
>
> - max vhost qlen exceeds a threshold Q_THR (default 128). This can
> be used to infer virtio queue overruns and dropped packets inside
> a VM, which are not visible in OVS otherwise.
>
> Such suspicious iterations can be logged together with their iteration
> statistics to be able to correlate them to packet drop or other events outside
> OVS.
>
> A new command is introduced to enable/disable logging at run-time and to
> adjust the above thresholds for suspicious iterations:
>
> ovs-appctl dpif-netdev/pmd-perf-log-set on | off
> [-b before] [-a after] [-us usec] [-q qlen]
>
> Turn logging on or off at run-time (on|off).
>
> -b before: The number of iterations before the suspicious iteration to
> be logged (default 5).
[[BO'M]] I've found the before value includes the suspicious iteration itself.
I think it's fine to just mention that here rather than change the behaviour.
> -a after: The number of iterations after the suspicious iteration to
> be logged (default 5).
> -q qlen: Suspicious vhost queue fill level threshold. Increase this
> to 512 if the Qemu supports 1024 virtio queue length.
> (default 128).
> -us usec: change the duration threshold for a suspicious iteration
> (default 250 us).
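For example (values purely illustrative), supervision could be enabled with a
wider logging window and the higher queue threshold for 1024-deep virtio
queues, and disabled again, with:

    ovs-appctl dpif-netdev/pmd-perf-log-set on -b 10 -a 20 -us 500 -q 512
    ovs-appctl dpif-netdev/pmd-perf-log-set off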
>
> If more than 100 iterations before or after a suspicious iteration have been
> logged once, OVS falls back to the safe default values (5/5) to prevent the
> logging itself from causing continuous further logging.
>
> Signed-off-by: Jan Scheurich <[email protected]>
> ---
> NEWS | 2 +
> lib/dpif-netdev-perf.c | 142 ++++++++++++++++++++++++++++++++++++++++++++
> lib/dpif-netdev-perf.h | 40 ++++++++++++-
> lib/dpif-netdev.c | 7 ++-
> lib/netdev-dpif-unixctl.man | 47 ++++++++++++++-
> 5 files changed, 233 insertions(+), 5 deletions(-)
>
> diff --git a/NEWS b/NEWS
> index 743528e..7d40374 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -47,6 +47,8 @@ Post-v2.8.0
> * Commands ovs-appctl dpif-netdev/pmd-*-show can now work on a
> single PMD
> * Detailed PMD performance metrics available with new command
> ovs-appctl dpif-netdev/pmd-perf-show
> + * Supervision of PMD performance metrics and logging of suspicious
> + iterations
> - vswitchd:
> * Datapath IDs may now be specified as 0x1 (etc.) instead of 16 digits.
> * Configuring a controller, or unconfiguring all controllers, now
> deletes
> diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c
> index e0ef15d..259a6c8 100644
> --- a/lib/dpif-netdev-perf.c
> +++ b/lib/dpif-netdev-perf.c
> @@ -24,6 +24,23 @@
>
> VLOG_DEFINE_THIS_MODULE(pmd_perf);
>
> +#define ITER_US_THRESHOLD 250   /* Warning threshold for iteration duration
> +                                    in microseconds. */
> +#define VHOST_QUEUE_FULL 128     /* Size of the virtio TX queue. */
> +#define LOG_IT_BEFORE 5          /* Number of iterations to log before
> +                                    suspicious iteration. */
> +#define LOG_IT_AFTER 5           /* Number of iterations to log after
> +                                    suspicious iteration. */
> +
> +bool log_on = false;
> +static uint32_t log_it_before = LOG_IT_BEFORE;
> +static uint32_t log_it_after = LOG_IT_AFTER;
> +static uint32_t log_us_thr = ITER_US_THRESHOLD;
> +uint32_t log_q_thr = VHOST_QUEUE_FULL;
> +uint64_t iter_cycle_threshold;
> +
> +static struct vlog_rate_limit latency_rl = VLOG_RATE_LIMIT_INIT(600, 600);
> +
> #ifdef DPDK_NETDEV
> static uint64_t
> get_tsc_hz(void)
> @@ -124,6 +141,8 @@ pmd_perf_stats_init(struct pmd_perf_stats *s)
> histogram_walls_set_log(&s->cycles_per_upcall, 1000, 1000000);
> histogram_walls_set_log(&s->max_vhost_qfill, 0, 512);
> s->start_ms = time_msec();
> + s->log_begin_it = UINT64_MAX;
> + s->log_end_it = UINT64_MAX;
> }
>
> void
> @@ -362,6 +381,8 @@ pmd_perf_stats_clear__(struct pmd_perf_stats *s)
> history_init(&s->milliseconds);
> s->start_ms = time_msec();
> s->milliseconds.sample[0].timestamp = s->start_ms;
> + s->log_begin_it = UINT64_MAX;
> + s->log_end_it = UINT64_MAX;
> /* Clearing finished. */
> s->clear = false;
> }
> @@ -387,3 +408,124 @@ non_pmd_perf_stats_clear(struct pmd_perf_stats *s)
> {
>     pmd_perf_stats_clear__(s);
> }
> +
> +void
> +pmd_perf_log_suspicious_iteration(struct pmd_perf_stats *s,
> + uint64_t cycles,
> + char *reason) {
> + VLOG_WARN_RL(&latency_rl,
> + "Suspicious iteration (%s): tsc=%"PRIu64
> + " duration=%"PRIu64" us\n",
> + reason, s->current.timestamp,
> + (1000000L * cycles) / get_tsc_hz());
> + if (log_it_before + log_it_after > 0) {
> + if (s->log_begin_it == UINT64_MAX) {
> + s->log_begin_it =
> + (s->iterations.idx + HISTORY_LEN - log_it_before)
> + % HISTORY_LEN;
> + }
> + s->log_end_it =
> + (s->iterations.idx + log_it_after) % HISTORY_LEN;
> + }
> +}
> +
> +void
> +pmd_perf_log_iteration_neighborhood(struct pmd_perf_stats *s) {
> + struct ds log = DS_EMPTY_INITIALIZER;
> + pmd_perf_format_iteration_history(&log, s,
> + (s->log_end_it + HISTORY_LEN - s->log_begin_it) % HISTORY_LEN);
> + VLOG_WARN_RL(&latency_rl,
> + "Neighborhood of suspicious iteration:\n"
> + "%s", ds_cstr(&log));
> + ds_destroy(&log);
> + s->log_end_it = s->log_begin_it = UINT64_MAX;
> + if (log_it_before > 100 || log_it_after > 100) {
> + /* Reset to safe default values to avoid disturbance. */
> + log_it_before = LOG_IT_BEFORE;
> + log_it_after = LOG_IT_AFTER;
> + }
> +}
> +
> +void
> +pmd_perf_log_set_cmd(struct unixctl_conn *conn,
> + int argc, const char *argv[],
> + void *aux OVS_UNUSED)
> +{
> + int it_before, it_after, us_thr, q_thr;
> + bool on;
> + bool usage = false;
> +
> + on = log_on;
> + it_before = log_it_before;
> + it_after = log_it_after;
> + q_thr = log_q_thr;
> + us_thr = log_us_thr;
> +
> + while (argc > 1) {
> + if (!strcmp(argv[1], "on")) {
> + on = true;
> + argc--;
> + argv++;
> + } else if (!strcmp(argv[1], "off")) {
> + on = false;
> + argc--;
> + argv++;
> + } else if (!strcmp(argv[1], "-a")) {
> + it_after = strtol(argv[2], NULL, 10);
> + if (it_after < 0) {
> + it_after = 0;
> + } else if (it_before + it_after > HISTORY_LEN-1) {
> + it_after = HISTORY_LEN-1 - it_before;
> + }
> + argc -= 2;
> + argv += 2;
> + } else if (!strcmp(argv[1], "-b")) {
> + it_before = strtol(argv[2], NULL, 10);
> + if (it_before < 0) {
> + it_before = 0;
> + } else if (it_before > HISTORY_LEN-10) {
> + it_before = HISTORY_LEN-10;
> + }
[[BO'M]] The logic here assumes -b is given before -a. If you say
    ovs-appctl dpif-netdev/pmd-perf-log-set on -a 1500 -b 1500
then you get before+after > HISTORY_LEN:
2018-01-19T13:57:48.953Z|00141|pmd_perf|INFO|pmd-perf-log-set: on, before=990, after=994, us_thr=250, q_thr=128
If the cli values were parsed first and then fixed up once after the argv loop,
the inter-dependencies causing this would be avoided, e.g. along the lines of
the sketch below.
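A minimal sketch of that rework (untested; it just reuses the clamping limits
already in the patch, applied once after the argv loop instead of inside it):

    /* Raw option values collected in the argv loop above; clamp them here
     * so that the result does not depend on the order of -a and -b. */
    if (it_before < 0) {
        it_before = 0;
    } else if (it_before > HISTORY_LEN - 10) {
        it_before = HISTORY_LEN - 10;
    }
    if (it_after < 0) {
        it_after = 0;
    }
    if (it_before + it_after > HISTORY_LEN - 1) {
        it_after = HISTORY_LEN - 1 - it_before;
    }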
> + argc -= 2;
> + argv += 2;
> + } else if (!strcmp(argv[1], "-q")) {
> + q_thr = strtol(argv[2], NULL, 10);
> + if (q_thr < 0) {
> + q_thr = 0;
> + }
> + argc -= 2;
> + argv += 2;
> + } else if (!strcmp(argv[1], "-us")) {
> + us_thr = strtol(argv[2], NULL, 10);
> + if (us_thr < 0) {
> + us_thr = 0;
> + }
> + argc -= 2;
> + argv += 2;
> + } else {
> + usage = true;
> + break;
> + }
> + }
> +
> + if (usage) {
> + unixctl_command_reply_error(conn,
> + "Usage: ovs-appctl dpif-netdev/pmd-perf-log-set "
> + "[on|off] [-b before] [-a after] [-us usec] [-q qlen]");
> + return;
> + }
> +
> + VLOG_INFO("pmd-perf-log-set: %s, before=%d, after=%d, us_thr=%d,"
> + " q_thr=%d\n",
> + on ? "on" : "off", it_before, it_after, us_thr, q_thr);
> + log_on = on;
> + log_it_before = it_before;
> + log_it_after = it_after;
> + log_q_thr = q_thr;
> + log_us_thr = us_thr;
> + iter_cycle_threshold = (log_us_thr * get_tsc_hz()) / 1000000L;
> +
> + unixctl_command_reply(conn, "");
> +}
> diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
> index 7a89c40..aa55c68 100644
> --- a/lib/dpif-netdev-perf.h
> +++ b/lib/dpif-netdev-perf.h
> @@ -160,8 +160,12 @@ struct pmd_perf_stats {
> struct histogram max_vhost_qfill;
> /* Iteration history buffer. */
> struct history iterations;
> - /* Millisecond hitory buffer. */
> + /* Millisecond history buffer. */
> struct history milliseconds;
> + /* Start of iteration range to log. */
> + uint64_t log_begin_it;
> + /* End of iteration range to log. */
> + uint64_t log_end_it;
> };
>
> /* Support for accurate timing of PMD execution on TSC clock cycle level.
> @@ -322,6 +326,17 @@ history_store(struct history *h, struct iter_stats *is)
> return history_next(h);
> }
>
> +/* Data and function related to logging of suspicious iterations. */
> +
> +extern bool log_on;
> +extern uint32_t log_q_thr;
> +extern uint64_t iter_cycle_threshold;
> +
> +void pmd_perf_log_suspicious_iteration(struct pmd_perf_stats *s,
> +                                       uint64_t cycles,
> +                                       char *reason);
> +void pmd_perf_log_iteration_neighborhood(struct pmd_perf_stats *s);
> +
> /* Functions recording PMD metrics per iteration. */
>
> static inline void
> @@ -351,6 +366,7 @@ pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets,
> uint64_t now_tsc = cycles_counter_update(s);
> struct iter_stats *cum_ms;
> uint64_t cycles, cycles_per_pkt = 0;
> + char *reason = NULL;
>
> if (OVS_UNLIKELY(s->current.timestamp == 0)) {
> /* Stats were cleared during the ongoing iteration. */
> @@ -404,6 +420,23 @@ pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets,
> cum_ms->batches += s->current.batches;
> cum_ms->max_vhost_qfill += s->current.max_vhost_qfill;
>
> + if (log_on) {
> + /* Log suspicious iterations. */
> + if (cycles > iter_cycle_threshold) {
> + reason = "Excessive total cycles";
> + } else if (s->current.max_vhost_qfill >= log_q_thr) {
> + reason = "Vhost RX queue full";
> + }
> + if (OVS_UNLIKELY(reason)) {
> + pmd_perf_log_suspicious_iteration(s, cycles, reason);
> + }
> +
> + /* Log iteration stats interval when required. */
> + if (OVS_UNLIKELY(s->log_end_it == s->iterations.idx)) {
> + pmd_perf_log_iteration_neighborhood(s);
> + }
> + }
> +
> /* Store in iteration history. */
> history_store(&s->iterations, &s->current);
> if (now_tsc > s->next_check_tsc) {
> @@ -426,7 +459,7 @@ pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets,
> }
> }
[[BO'M]]
I tested the logging by randomly deciding (about every 30s or so) that an iter
was suspicious (I did "if (cycles > iter_cycle_threshold || random_uint32() <
200) { ... }").
In that case it seems that nearly always the iter *after* the suspicious iter
is also quite a long iter. I believe this is because the cost of the vlog in
pmd_perf_log_suspicious_iteration gets assigned to the iter after the
suspicious iter (the cycles for the current iteration having already been
finalized earlier in the fn by "s->current.cycles = cycles;"):

2018-01-19T14:46:13.280Z|00497|pmd_perf(pmd60)|WARN|Suspicious iteration
(Excessive total cycles): tsc=8512846813946109 duration=269 us
2018-01-19T14:46:13.280Z|00498|pmd_perf(pmd60)|WARN|Neighborhood of suspicious
iteration:
   tsc                cycles  packets  cycles/pkt  pkts/batch  vhost qlen  upcalls  cycles/upcall
   8512846814787757    30208      128         236          32           0        0              0
   8512846814757449    30308      128         236          32           0        0              0
   8512846814727185    30264      128         236          32           0        0              0
   8512846814564817   162368      128        1268          32           0        0              0
   8512846813946109   618708      128        4833          32           0        0              0  <<<
   8512846813916321    29788      128         232          32           0        0              0
   8512846813886577    29744      128         232          32           0        0              0
   8512846813856937    29640      128         231          32           0        0              0
   8512846813827497    29440      128         230          32           0        0              0
   8512846813798061    29436      128         229          32           0        0              0

It would be good to acknowledge this in the docs or, ideally, to move the
"WARN Suspicious ..." vlog so that it is emitted at the same time as the iter
dump vlog. However, I can't figure out why, when I use random_uint32() alone to
trigger a suspicious iter, the 'suspicious' iteration still shows an increased
cycle count.
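A rough sketch of the latter option (untested; the log_reason, log_susp_tsc and
log_susp_duration_us fields are hypothetical additions to struct pmd_perf_stats,
not part of this patch, and the reset of the 100-iteration fallback is omitted):

void
pmd_perf_log_suspicious_iteration(struct pmd_perf_stats *s, uint64_t cycles,
                                  char *reason)
{
    /* Only record why and when; emit nothing here, so the vlog cost does not
     * get attributed to the iteration following the suspicious one. */
    s->log_reason = reason;
    s->log_susp_tsc = s->current.timestamp;
    s->log_susp_duration_us = (1000000L * cycles) / get_tsc_hz();
    if (log_it_before + log_it_after > 0) {
        if (s->log_begin_it == UINT64_MAX) {
            s->log_begin_it =
                (s->iterations.idx + HISTORY_LEN - log_it_before) % HISTORY_LEN;
        }
        s->log_end_it = (s->iterations.idx + log_it_after) % HISTORY_LEN;
    }
}

void
pmd_perf_log_iteration_neighborhood(struct pmd_perf_stats *s)
{
    struct ds log = DS_EMPTY_INITIALIZER;

    pmd_perf_format_iteration_history(&log, s,
            (s->log_end_it + HISTORY_LEN - s->log_begin_it) % HISTORY_LEN);
    /* Both messages are emitted together, well after the suspicious
     * iteration itself has completed. */
    VLOG_WARN_RL(&latency_rl,
                 "Suspicious iteration (%s): tsc=%"PRIu64
                 " duration=%"PRIu64" us\n"
                 "Neighborhood of suspicious iteration:\n%s",
                 s->log_reason, s->log_susp_tsc, s->log_susp_duration_us,
                 ds_cstr(&log));
    ds_destroy(&log);
    s->log_end_it = s->log_begin_it = UINT64_MAX;
}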
>
> -/* Functions for formatting the output of commands. */
> +/* Formatting the output of commands. */
>
> struct pmd_perf_params {
> int command_type;
> @@ -443,6 +476,9 @@ void pmd_perf_format_iteration_history(struct ds *str,
>                                        int n_iter);
> void pmd_perf_format_ms_history(struct ds *str, struct pmd_perf_stats *s,
>                                 int n_ms);
> +void pmd_perf_log_set_cmd(struct unixctl_conn *conn,
> + int argc, const char *argv[],
> + void *aux OVS_UNUSED);
>
> #ifdef __cplusplus
> }
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index f5931bf..b493216 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -1095,7 +1095,7 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[],
> ovs_mutex_lock(&dp_netdev_mutex);
>
> while (argc > 1) {
> - if (!strcmp(argv[1], "-pmd") && argc >= 3) {
> + if (!strcmp(argv[1], "-pmd") && argc > 2) {
> if (str_to_uint(argv[2], 10, &core_id)) {
> filter_on_pmd = true;
> }
> @@ -1218,6 +1218,11 @@ dpif_netdev_init(void)
> unixctl_command_register("dpif-netdev/pmd-rxq-rebalance", "[dp]",
> 0, 1, dpif_netdev_pmd_rebalance,
> NULL);
> + unixctl_command_register("dpif-netdev/pmd-perf-log-set",
> + "on|off [-b before] [-a after] "
> + "[-us usec] [-q qlen]",
> + 0, 10, pmd_perf_log_set_cmd,
> + NULL);
> return 0;
> }
>
> diff --git a/lib/netdev-dpif-unixctl.man b/lib/netdev-dpif-unixctl.man
> index 53f4c51..82e372f 100644
> --- a/lib/netdev-dpif-unixctl.man
> +++ b/lib/netdev-dpif-unixctl.man
> @@ -70,7 +70,7 @@ This raw recorded data is used threefold:
>
> The command options are
>
> - \fB-nh\fR: Suppress the histograms
> + \fB-nh\fR: Suppress the histograms
> \fB-it\fR \fIiter_len\fR: Display the last iter_len iteration stats
> \fB-ms\fR \fIms_len\fR: Display the last ms_len millisecond stats
>
> @@ -101,10 +101,53 @@ To reset the counters and start a new measurement use
> .
> .IP "\fBdpif-netdev/pmd-stats-clear\fR [\fIdp\fR]"
> Resets to zero the per pmd thread performance numbers shown by the
> -\fBdpif-netdev/pmd-stats-show\fR and \Bdpif-netdev/pmd-perf-show \fR commands.
> +\fBdpif-netdev/pmd-stats-show\fR and \fBdpif-netdev/pmd-perf-show\fR commands.
> It will NOT reset datapath or bridge statistics, only the values shown by the
> above commands.
> .
> +.IP "\fBdpif-netdev/pmd-perf-log-set\fR \fBon\fR|\fBoff\fR \
> +[\fB-b\fR \fIbefore\fR] [\fB-a\fR \fIafter\fR] [\fB-us\fR \fIusec\fR] \
> +[\fB-q\fR \fIqlen\fR]"
> +.
> +The userspace "netdev" datapath is able to supervise the PMD
> +performance metrics and detect iterations with suspicious statistics
> +according to the following criteria:
> +
> +- The iteration lasts longer than \fIusec\fR microseconds (default 250).
> + This can be used to capture events where a PMD is blocked or
> + interrupted for such a period of time that there is a risk for
> + dropped packets on any of its Rx queues.
> +
> +- The max vhost qlen exceeds a threshold \fIqlen\fR (default 128). This can
> +  be used to infer virtio queue overruns and dropped packets inside a VM,
> +  which are not visible in OVS otherwise.
> +
> +Such suspicious iterations can be logged together with their iteration
> +statistics in the \fBovs-vswitchd.log\fR to be able to correlate them
> +to packet drop or other events outside OVS.
> +
> +The above command enables (\fBon\fR) or disables (\fBoff\fR)
> +supervision and logging at run-time and can be used to adjust the above
> +thresholds for detecting suspicious iterations. By default supervision
> +and logging is disabled.
> +
> +The command options are:
> +
> +  \fB-b\fR \fIbefore\fR:
> +  The number of iterations before the suspicious iteration to be logged
> +  (default 5).
> +  \fB-a\fR \fIafter\fR:
> +  The number of iterations after the suspicious iteration to be logged
> +  (default 5).
> +  \fB-q\fR \fIqlen\fR:
> +  Suspicious vhost queue fill level threshold. Increase this to 512 if
> +  the Qemu supports 1024 virtio queue length (default 128).
> +  \fB-us\fR \fIusec\fR:
> +  Change the duration threshold for a suspicious iteration (default 250 us).
> +
> +If more than 100 iterations before or after a suspicious iteration have
> +been logged once, OVS falls back to the safe default values (5/5) to
> +prevent the logging itself from causing continuous further suspicious
> +iterations and continuous logging.
> +.
> .IP "\fBdpif-netdev/pmd-rxq-show\fR [\fB-pmd\fR \fIcore\fR] [\fIdp\fR]"
> For one or all pmd threads of the datapath \fIdp\fR show the list of queue-
> ids with port names, which this thread polls.
> --
> 1.9.1