Re: [ovs-dev] [PATCH v10 3/3] dpif-netdev: Detection and logging of suspicious PMD iterations

2018-04-12 Thread Jan Scheurich
> > I would not say this is expected behavior.
> >
> > It seems that you are executing on a somewhat slower system (tsc clock 
> > seems to be 100/us = 0.1 GHz) and that, even with only 5
> lines logged before and after,  the logging output is causing so much slow 
> down of the PMD that it continues to cause iterations using
> excessive cycles (362000 = 3.62 ms!) due to logging.
> 
> The system is slower than usual, but not so much.
> This behaviour captured on ARMv8. The TSC frequency on ARM is usually around 
> 100MHz
> without using PMU which is not available from userspace by default.
> Meantime, CPU frequency is 2GHz.

On my x86_64 server (2.4 GHz) I could reproduce the periodic logging if I go 
down with the us_thr to values as low as 50 us under high PMD load. But it 
always stops when I increase the threshold to a value higher than 80 us. It 
seems you ARM system is much slower when logging to file. The threshold that 
should be reasonably applied may depend on the system under test.

> 
> >
> > The actual iteration with logging is not flagged as suspicious, but the 
> > subsequent iteration gets the hit of the massive cycles that have
> passed on the TSC clock. The "phantom" duration of 0 us shown is probably a 
> side effect of this.
> 
> I guess, you just have some bug in your calculation of execution time.
> Possibly you're mixing up the TSC and CPU frequencies.
> Zero ms duration is normal for printing so small amount of data.

It was actually a bug in the code: The iteration logged as suspicious was the 
one directly after the actually suspicious iteration. That bug is fixed in v11 
I just sent out.

BR, Jan
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev



Re: [ovs-dev] [PATCH v10 3/3] dpif-netdev: Detection and logging of suspicious PMD iterations

2018-04-12 Thread Ilya Maximets
On 11.04.2018 20:45, Jan Scheurich wrote:
> Hi Ilya,
> 
> I would not say this is expected behavior.
> 
> It seems that you are executing on a somewhat slower system (tsc clock seems 
> to be 100/us = 0.1 GHz) and that, even with only 5 lines logged before and 
> after,  the logging output is causing so much slow down of the PMD that it 
> continues to cause iterations using excessive cycles (362000 = 3.62 ms!) due 
> to logging.

The system is slower than usual, but not so much.
This behaviour captured on ARMv8. The TSC frequency on ARM is usually around 
100MHz
without using PMU which is not available from userspace by default.
Meantime, CPU frequency is 2GHz.

> 
> The actual iteration with logging is not flagged as suspicious, but the 
> subsequent iteration gets the hit of the massive cycles that have passed on 
> the TSC clock. The "phantom" duration of 0 us shown is probably a side effect 
> of this. 

I guess, you just have some bug in your calculation of execution time.
Possibly you're mixing up the TSC and CPU frequencies.
Zero ms duration is normal for printing so small amount of data.

> 
> I will try to reproduce and investigate. I will have a look at the detection 
> logic to see if this can be avoided.
> 
> BR, Jan
> 
> 
>> -Original Message-
>> From: Ilya Maximets [mailto:i.maxim...@samsung.com]
>> Sent: Tuesday, 27 March, 2018 16:05
>> To: Jan Scheurich ; d...@openvswitch.org
>> Cc: ktray...@redhat.com; ian.sto...@intel.com; billy.o.mah...@intel.com
>> Subject: Re: [PATCH v10 3/3] dpif-netdev: Detection and logging of 
>> suspicious PMD iterations
>>
>> I see following behaviour:
>>
>> 1. Configure low -us (like 100)
>> 2. After that I see many logs about suspicious iterations (expected).
>>
>> 2018-03-27T13:58:27Z|03574|pmd_perf(pmd7)|WARN|Suspicious iteration 
>> (Excessive total cycles): tsc=520415762246435
>> duration=106 us
>> 2018-03-27T13:58:27Z|03575|pmd_perf(pmd7)|WARN|Neighborhood of suspicious 
>> iteration:
>>tsc cycles   packets  cycles/pkt   pkts/batch   
>> vhost qlen   upcalls  cycles/upcall
>>520415762297985 9711 32   303  32   
>> 424  00
>>520415762287041 1066732   333  32   
>> 419  00
>>520415762277319 9722 32   303  32   
>> 429  00
>>520415762267083 9971 32   311  32   
>> 443  00
>>520415762257413 9670 32   302  32   
>> 451  00
>>520415762246435 1069932   334  32   
>> 448  00
>>520415762235033 1110932   347  32   
>> 455  00
>>520415762180220 9826 32   307  32   
>> 399  00
>>520415762169792 1022932   319  32   
>> 413  00
>>520415762160385 9407 32   293  32   
>> 408  00
>>520415762150221 9891 32   309  32   
>> 434  00
>> 2018-03-27T13:58:27Z|03576|pmd_perf(pmd7)|WARN|Suspicious iteration 
>> (Excessive total cycles): tsc=520415762469997
>> duration=104 us
>> 2018-03-27T13:58:27Z|03577|pmd_perf(pmd7)|WARN|Neighborhood of suspicious 
>> iteration:
>>tsc cycles   packets  cycles/pkt   pkts/batch   
>> vhost qlen   upcalls  cycles/upcall
>>520415762519119 9462 32   295  32   
>> 505  00
>>520415762509595 9319 32   291  32   
>> 537  00
>>520415762500154 9283 32   290  32   
>> 569  00
>>520415762490585 9287 32   290  32   
>> 601  00
>>520415762480693 9730 32   304  32   
>> 633  00
>>520415762469997 1041432   325  32   
>> 665  00
>>520415762459348 1034232   323  32   
>> 697  00
>>520415762297985 9711 32   303  32   
>> 424  00
>>520415762287041 1066732   333  32   
>> 419  00
>>520415762277319 9722 32   303  32   
>> 429  00
>>520415762267083 9971 32   311  32   
>> 443  00
>>
>> 3. Configure back high -us (like 1000).
>> 4. Logs are still there with zero duration. 

Re: [ovs-dev] [PATCH v10 3/3] dpif-netdev: Detection and logging of suspicious PMD iterations

2018-04-11 Thread Jan Scheurich
Hi Ilya,

I would not say this is expected behavior.

It seems that you are executing on a somewhat slower system (tsc clock seems to 
be 100/us = 0.1 GHz) and that, even with only 5 lines logged before and after,  
the logging output is causing so much slow down of the PMD that it continues to 
cause iterations using excessive cycles (362000 = 3.62 ms!) due to logging.

The actual iteration with logging is not flagged as suspicious, but the 
subsequent iteration gets the hit of the massive cycles that have passed on the 
TSC clock. The "phantom" duration of 0 us shown is probably a side effect of 
this. 

I will try to reproduce and investigate. I will have a look at the detection 
logic to see if this can be avoided.

BR, Jan


> -Original Message-
> From: Ilya Maximets [mailto:i.maxim...@samsung.com]
> Sent: Tuesday, 27 March, 2018 16:05
> To: Jan Scheurich ; d...@openvswitch.org
> Cc: ktray...@redhat.com; ian.sto...@intel.com; billy.o.mah...@intel.com
> Subject: Re: [PATCH v10 3/3] dpif-netdev: Detection and logging of suspicious 
> PMD iterations
> 
> I see following behaviour:
> 
> 1. Configure low -us (like 100)
> 2. After that I see many logs about suspicious iterations (expected).
> 
> 2018-03-27T13:58:27Z|03574|pmd_perf(pmd7)|WARN|Suspicious iteration 
> (Excessive total cycles): tsc=520415762246435
> duration=106 us
> 2018-03-27T13:58:27Z|03575|pmd_perf(pmd7)|WARN|Neighborhood of suspicious 
> iteration:
>tsc cycles   packets  cycles/pkt   pkts/batch   
> vhost qlen   upcalls  cycles/upcall
>520415762297985 9711 32   303  32   
> 424  00
>520415762287041 1066732   333  32   
> 419  00
>520415762277319 9722 32   303  32   
> 429  00
>520415762267083 9971 32   311  32   
> 443  00
>520415762257413 9670 32   302  32   
> 451  00
>520415762246435 1069932   334  32   
> 448  00
>520415762235033 1110932   347  32   
> 455  00
>520415762180220 9826 32   307  32   
> 399  00
>520415762169792 1022932   319  32   
> 413  00
>520415762160385 9407 32   293  32   
> 408  00
>520415762150221 9891 32   309  32   
> 434  00
> 2018-03-27T13:58:27Z|03576|pmd_perf(pmd7)|WARN|Suspicious iteration 
> (Excessive total cycles): tsc=520415762469997
> duration=104 us
> 2018-03-27T13:58:27Z|03577|pmd_perf(pmd7)|WARN|Neighborhood of suspicious 
> iteration:
>tsc cycles   packets  cycles/pkt   pkts/batch   
> vhost qlen   upcalls  cycles/upcall
>520415762519119 9462 32   295  32   
> 505  00
>520415762509595 9319 32   291  32   
> 537  00
>520415762500154 9283 32   290  32   
> 569  00
>520415762490585 9287 32   290  32   
> 601  00
>520415762480693 9730 32   304  32   
> 633  00
>520415762469997 1041432   325  32   
> 665  00
>520415762459348 1034232   323  32   
> 697  00
>520415762297985 9711 32   303  32   
> 424  00
>520415762287041 1066732   333  32   
> 419  00
>520415762277319 9722 32   303  32   
> 429  00
>520415762267083 9971 32   311  32   
> 443  00
> 
> 3. Configure back high -us (like 1000).
> 4. Logs are still there with zero duration. Logs printed every second like 
> this:
> 
> 2018-03-27T14:02:08Z|04140|pmd_perf(pmd7)|WARN|Suspicious iteration 
> (Excessive total cycles): tsc=520437806368099 duration=0
> us
> [Thread 0x7fb56f2910 (LWP 19754) exited]
> [New Thread 0x7fb56f2910 (LWP 19755)]
> 2018-03-27T14:02:08Z|04141|pmd_perf(pmd7)|WARN|Neighborhood of suspicious 
> iteration:
>tsc cycles   packets  cycles/pkt   pkts/batch   
> vhost qlen   upcalls  cycles/upcall
>520437806368309 44   0000  
>   00
>

Re: [ovs-dev] [PATCH v10 3/3] dpif-netdev: Detection and logging of suspicious PMD iterations

2018-03-27 Thread Stokes, Ian
> This patch enhances dpif-netdev-perf to detect iterations with suspicious
> statistics according to the following criteria:
> 
> - iteration lasts longer than US_THR microseconds (default 250).
>   This can be used to capture events where a PMD is blocked or
>   interrupted for such a period of time that there is a risk for
>   dropped packets on any of its Rx queues.
> 
> - max vhost qlen exceeds a threshold Q_THR (default 128). This can
>   be used to infer virtio queue overruns and dropped packets inside
>   a VM, which are not visible in OVS otherwise.
> 
> Such suspicious iterations can be logged together with their iteration
> statistics to be able to correlate them to packet drop or other events
> outside OVS.
> 
> A new command is introduced to enable/disable logging at run-time and to
> adjust the above thresholds for suspicious iterations:
> 
> ovs-appctl dpif-netdev/pmd-perf-log-set on | off
> [-b before] [-a after] [-e|-ne] [-us usec] [-q qlen]
> 
> Turn logging on or off at run-time (on|off).
> 
> -b before:  The number of iterations before the suspicious iteration to
> be logged (default 5).
> -a after:   The number of iterations after the suspicious iteration to
> be logged (default 5).
> -e: Extend logging interval if another suspicious iteration is
> detected before logging occurs.
> -ne:Do not extend logging interval (default).
> -q qlen:Suspicious vhost queue fill level threshold. Increase this
> to 512 if the Qemu supports 1024 virtio queue length.
> (default 128).
> -us usec:   change the duration threshold for a suspicious iteration
> (default 250 us).
> 
> Note: Logging of suspicious iterations itself consumes a considerable
> amount of processing cycles of a PMD which may be visible in the iteration
> history. In the worst case this can lead OVS to detect another suspicious
> iteration caused by logging.
> 
> If more than 100 iterations around a suspicious iteration have been logged
> once, OVS falls back to the safe default values (-b 5/-a 5/-ne) to avoid
> that logging itself causes continuos further logging.
> 
> Signed-off-by: Jan Scheurich 
> Acked-by: Billy O'Mahony 
> ---
>  NEWS|   2 +
>  lib/dpif-netdev-perf.c  | 201
> 
>  lib/dpif-netdev-perf.h  |  42 +
>  lib/dpif-netdev-unixctl.man |  59 +
>  lib/dpif-netdev.c   |   5 ++
>  5 files changed, 309 insertions(+)
> 
> diff --git a/NEWS b/NEWS
> index 8f66fd3..61148b1 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -76,6 +76,8 @@ v2.9.0 - 19 Feb 2018
>   * Commands ovs-appctl dpif-netdev/pmd-*-show can now work on a
> single PMD
>   * Detailed PMD performance metrics available with new command
>   ovs-appctl dpif-netdev/pmd-perf-show
> + * Supervision of PMD performance metrics and logging of suspicious
> +   iterations
> - vswitchd:
>   * Datapath IDs may now be specified as 0x1 (etc.) instead of 16
> digits.
>   * Configuring a controller, or unconfiguring all controllers, now
> deletes diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c index
> 2b36410..410a209 100644
> --- a/lib/dpif-netdev-perf.c
> +++ b/lib/dpif-netdev-perf.c
> @@ -25,6 +25,24 @@
> 
>  VLOG_DEFINE_THIS_MODULE(pmd_perf);
> 
> +#define ITER_US_THRESHOLD 250   /* Warning threshold for iteration
> duration
> +   in microseconds. */
> +#define VHOST_QUEUE_FULL 128/* Size of the virtio TX queue. */
> +#define LOG_IT_BEFORE 5 /* Number of iterations to log before
> +   suspicious iteration. */
> +#define LOG_IT_AFTER 5  /* Number of iterations to log after
> +   suspicious iteration. */
> +
> +bool log_enabled = false;
> +bool log_extend = false;

Will cause compilation error 'error: symbol 'log_extend' was not declared. 
Should it be static?'

You could declare it as an extern similar to 'bool log_enabled' in 
dpif-netdev-perf.h.

Thanks
Ian
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v10 3/3] dpif-netdev: Detection and logging of suspicious PMD iterations

2018-03-27 Thread Ilya Maximets
I see following behaviour:

1. Configure low -us (like 100)
2. After that I see many logs about suspicious iterations (expected).

2018-03-27T13:58:27Z|03574|pmd_perf(pmd7)|WARN|Suspicious iteration (Excessive 
total cycles): tsc=520415762246435 duration=106 us
2018-03-27T13:58:27Z|03575|pmd_perf(pmd7)|WARN|Neighborhood of suspicious 
iteration:
   tsc cycles   packets  cycles/pkt   pkts/batch   
vhost qlen   upcalls  cycles/upcall
   520415762297985 9711 32   303  32   424  
00  
   520415762287041 1066732   333  32   419  
00  
   520415762277319 9722 32   303  32   429  
00  
   520415762267083 9971 32   311  32   443  
00  
   520415762257413 9670 32   302  32   451  
00  
   520415762246435 1069932   334  32   448  
00  
   520415762235033 1110932   347  32   455  
00  
   520415762180220 9826 32   307  32   399  
00  
   520415762169792 1022932   319  32   413  
00  
   520415762160385 9407 32   293  32   408  
00  
   520415762150221 9891 32   309  32   434  
00  
2018-03-27T13:58:27Z|03576|pmd_perf(pmd7)|WARN|Suspicious iteration (Excessive 
total cycles): tsc=520415762469997 duration=104 us
2018-03-27T13:58:27Z|03577|pmd_perf(pmd7)|WARN|Neighborhood of suspicious 
iteration:
   tsc cycles   packets  cycles/pkt   pkts/batch   
vhost qlen   upcalls  cycles/upcall
   520415762519119 9462 32   295  32   505  
00  
   520415762509595 9319 32   291  32   537  
00  
   520415762500154 9283 32   290  32   569  
00  
   520415762490585 9287 32   290  32   601  
00  
   520415762480693 9730 32   304  32   633  
00  
   520415762469997 1041432   325  32   665  
00  
   520415762459348 1034232   323  32   697  
00  
   520415762297985 9711 32   303  32   424  
00  
   520415762287041 1066732   333  32   419  
00  
   520415762277319 9722 32   303  32   429  
00  
   520415762267083 9971 32   311  32   443  
00

3. Configure back high -us (like 1000).
4. Logs are still there with zero duration. Logs printed every second like this:

2018-03-27T14:02:08Z|04140|pmd_perf(pmd7)|WARN|Suspicious iteration (Excessive 
total cycles): tsc=520437806368099 duration=0 us
[Thread 0x7fb56f2910 (LWP 19754) exited]
[New Thread 0x7fb56f2910 (LWP 19755)]
2018-03-27T14:02:08Z|04141|pmd_perf(pmd7)|WARN|Neighborhood of suspicious 
iteration:
   tsc cycles   packets  cycles/pkt   pkts/batch   
vhost qlen   upcalls  cycles/upcall
   520437806368309 44   0000
00  
   520437806368266 43   0000
00  
   520437806368223 43   0000
00  
   520437806368179 44   0000
00  
   520437806368134 45   0000
00  
   520437806368099 35   0000
00  
   520437806005193 362819   0000
00  
   520437806005149 44   0000
00  
   520437806005105 44   0000
00  
   520437806005061 44   0000
00  
   520437806005017 44  

Re: [ovs-dev] [PATCH v10 3/3] dpif-netdev: Detection and logging of suspicious PMD iterations

2018-03-18 Thread O Mahony, Billy
Acked-by: Billy O'Mahony 

> -Original Message-
> From: Jan Scheurich [mailto:jan.scheur...@ericsson.com]
> Sent: Sunday, March 18, 2018 5:55 PM
> To: d...@openvswitch.org
> Cc: ktray...@redhat.com; Stokes, Ian ;
> i.maxim...@samsung.com; O Mahony, Billy ; Jan
> Scheurich 
> Subject: [PATCH v10 3/3] dpif-netdev: Detection and logging of suspicious PMD
> iterations
> 
> This patch enhances dpif-netdev-perf to detect iterations with suspicious
> statistics according to the following criteria:
> 
> - iteration lasts longer than US_THR microseconds (default 250).
>   This can be used to capture events where a PMD is blocked or
>   interrupted for such a period of time that there is a risk for
>   dropped packets on any of its Rx queues.
> 
> - max vhost qlen exceeds a threshold Q_THR (default 128). This can
>   be used to infer virtio queue overruns and dropped packets inside
>   a VM, which are not visible in OVS otherwise.
> 
> Such suspicious iterations can be logged together with their iteration 
> statistics
> to be able to correlate them to packet drop or other events outside OVS.
> 
> A new command is introduced to enable/disable logging at run-time and to
> adjust the above thresholds for suspicious iterations:
> 
> ovs-appctl dpif-netdev/pmd-perf-log-set on | off
> [-b before] [-a after] [-e|-ne] [-us usec] [-q qlen]
> 
> Turn logging on or off at run-time (on|off).
> 
> -b before:  The number of iterations before the suspicious iteration to
> be logged (default 5).
> -a after:   The number of iterations after the suspicious iteration to
> be logged (default 5).
> -e: Extend logging interval if another suspicious iteration is
> detected before logging occurs.
> -ne:Do not extend logging interval (default).
> -q qlen:Suspicious vhost queue fill level threshold. Increase this
> to 512 if the Qemu supports 1024 virtio queue length.
> (default 128).
> -us usec:   change the duration threshold for a suspicious iteration
> (default 250 us).
> 
> Note: Logging of suspicious iterations itself consumes a considerable amount 
> of
> processing cycles of a PMD which may be visible in the iteration history. In 
> the
> worst case this can lead OVS to detect another suspicious iteration caused by
> logging.
> 
> If more than 100 iterations around a suspicious iteration have been logged 
> once,
> OVS falls back to the safe default values (-b 5/-a 5/-ne) to avoid that 
> logging
> itself causes continuos further logging.
> 
> Signed-off-by: Jan Scheurich 
> Acked-by: Billy O'Mahony 
> ---
>  NEWS|   2 +
>  lib/dpif-netdev-perf.c  | 201
> 
>  lib/dpif-netdev-perf.h  |  42 +
>  lib/dpif-netdev-unixctl.man |  59 +
>  lib/dpif-netdev.c   |   5 ++
>  5 files changed, 309 insertions(+)
> 
> diff --git a/NEWS b/NEWS
> index 8f66fd3..61148b1 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -76,6 +76,8 @@ v2.9.0 - 19 Feb 2018
>   * Commands ovs-appctl dpif-netdev/pmd-*-show can now work on a single
> PMD
>   * Detailed PMD performance metrics available with new command
>   ovs-appctl dpif-netdev/pmd-perf-show
> + * Supervision of PMD performance metrics and logging of suspicious
> +   iterations
> - vswitchd:
>   * Datapath IDs may now be specified as 0x1 (etc.) instead of 16 digits.
>   * Configuring a controller, or unconfiguring all controllers, now 
> deletes diff --
> git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c index 2b36410..410a209
> 100644
> --- a/lib/dpif-netdev-perf.c
> +++ b/lib/dpif-netdev-perf.c
> @@ -25,6 +25,24 @@
> 
>  VLOG_DEFINE_THIS_MODULE(pmd_perf);
> 
> +#define ITER_US_THRESHOLD 250   /* Warning threshold for iteration duration
> +   in microseconds. */
> +#define VHOST_QUEUE_FULL 128/* Size of the virtio TX queue. */
> +#define LOG_IT_BEFORE 5 /* Number of iterations to log before
> +   suspicious iteration. */
> +#define LOG_IT_AFTER 5  /* Number of iterations to log after
> +   suspicious iteration. */
> +
> +bool log_enabled = false;
> +bool log_extend = false;
> +static uint32_t log_it_before = LOG_IT_BEFORE; static uint32_t
> +log_it_after = LOG_IT_AFTER; static uint32_t log_us_thr =
> +ITER_US_THRESHOLD; uint32_t log_q_thr = VHOST_QUEUE_FULL; uint64_t
> +iter_cycle_threshold;
> +
> +static struct vlog_rate_limit latency_rl = VLOG_RATE_LIMIT_INIT(600,
> +600);
> +
>  #ifdef DPDK_NETDEV
>  static uint64_t
>  get_tsc_hz(void)
> @@ -127,6 +145,10 @@ pmd_perf_stats_init(struct pmd_perf_stats *s)
>  histogram_walls_set_log(>cycles_per_upcall, 1000, 100);
>  

[ovs-dev] [PATCH v10 3/3] dpif-netdev: Detection and logging of suspicious PMD iterations

2018-03-18 Thread Jan Scheurich
This patch enhances dpif-netdev-perf to detect iterations with
suspicious statistics according to the following criteria:

- iteration lasts longer than US_THR microseconds (default 250).
  This can be used to capture events where a PMD is blocked or
  interrupted for such a period of time that there is a risk for
  dropped packets on any of its Rx queues.

- max vhost qlen exceeds a threshold Q_THR (default 128). This can
  be used to infer virtio queue overruns and dropped packets inside
  a VM, which are not visible in OVS otherwise.

Such suspicious iterations can be logged together with their iteration
statistics to be able to correlate them to packet drop or other events
outside OVS.

A new command is introduced to enable/disable logging at run-time and
to adjust the above thresholds for suspicious iterations:

ovs-appctl dpif-netdev/pmd-perf-log-set on | off
[-b before] [-a after] [-e|-ne] [-us usec] [-q qlen]

Turn logging on or off at run-time (on|off).

-b before:  The number of iterations before the suspicious iteration to
be logged (default 5).
-a after:   The number of iterations after the suspicious iteration to
be logged (default 5).
-e: Extend logging interval if another suspicious iteration is
detected before logging occurs.
-ne:Do not extend logging interval (default).
-q qlen:Suspicious vhost queue fill level threshold. Increase this
to 512 if the Qemu supports 1024 virtio queue length.
(default 128).
-us usec:   change the duration threshold for a suspicious iteration
(default 250 us).

Note: Logging of suspicious iterations itself consumes a considerable
amount of processing cycles of a PMD which may be visible in the iteration
history. In the worst case this can lead OVS to detect another
suspicious iteration caused by logging.

If more than 100 iterations around a suspicious iteration have been
logged once, OVS falls back to the safe default values (-b 5/-a 5/-ne)
to avoid that logging itself causes continuos further logging.

Signed-off-by: Jan Scheurich 
Acked-by: Billy O'Mahony 
---
 NEWS|   2 +
 lib/dpif-netdev-perf.c  | 201 
 lib/dpif-netdev-perf.h  |  42 +
 lib/dpif-netdev-unixctl.man |  59 +
 lib/dpif-netdev.c   |   5 ++
 5 files changed, 309 insertions(+)

diff --git a/NEWS b/NEWS
index 8f66fd3..61148b1 100644
--- a/NEWS
+++ b/NEWS
@@ -76,6 +76,8 @@ v2.9.0 - 19 Feb 2018
  * Commands ovs-appctl dpif-netdev/pmd-*-show can now work on a single PMD
  * Detailed PMD performance metrics available with new command
  ovs-appctl dpif-netdev/pmd-perf-show
+ * Supervision of PMD performance metrics and logging of suspicious
+   iterations
- vswitchd:
  * Datapath IDs may now be specified as 0x1 (etc.) instead of 16 digits.
  * Configuring a controller, or unconfiguring all controllers, now deletes
diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c
index 2b36410..410a209 100644
--- a/lib/dpif-netdev-perf.c
+++ b/lib/dpif-netdev-perf.c
@@ -25,6 +25,24 @@
 
 VLOG_DEFINE_THIS_MODULE(pmd_perf);
 
+#define ITER_US_THRESHOLD 250   /* Warning threshold for iteration duration
+   in microseconds. */
+#define VHOST_QUEUE_FULL 128/* Size of the virtio TX queue. */
+#define LOG_IT_BEFORE 5 /* Number of iterations to log before
+   suspicious iteration. */
+#define LOG_IT_AFTER 5  /* Number of iterations to log after
+   suspicious iteration. */
+
+bool log_enabled = false;
+bool log_extend = false;
+static uint32_t log_it_before = LOG_IT_BEFORE;
+static uint32_t log_it_after = LOG_IT_AFTER;
+static uint32_t log_us_thr = ITER_US_THRESHOLD;
+uint32_t log_q_thr = VHOST_QUEUE_FULL;
+uint64_t iter_cycle_threshold;
+
+static struct vlog_rate_limit latency_rl = VLOG_RATE_LIMIT_INIT(600, 600);
+
 #ifdef DPDK_NETDEV
 static uint64_t
 get_tsc_hz(void)
@@ -127,6 +145,10 @@ pmd_perf_stats_init(struct pmd_perf_stats *s)
 histogram_walls_set_log(>cycles_per_upcall, 1000, 100);
 histogram_walls_set_log(>max_vhost_qfill, 0, 512);
 s->start_ms = time_msec();
+s->log_susp_it = UINT32_MAX;
+s->log_begin_it = UINT32_MAX;
+s->log_end_it = UINT32_MAX;
+s->log_reason = NULL;
 }
 
 void
@@ -382,6 +404,10 @@ pmd_perf_stats_clear_lock(struct pmd_perf_stats *s)
 history_init(>milliseconds);
 s->start_ms = time_msec();
 s->milliseconds.sample[0].timestamp = s->start_ms;
+s->log_susp_it = UINT32_MAX;
+s->log_begin_it = UINT32_MAX;
+s->log_end_it = UINT32_MAX;
+s->log_reason = NULL;
 /* Clearing finished. */
 s->clear = false;
 ovs_mutex_unlock(>clear_mutex);
@@ -402,3 +428,178 @@ pmd_perf_stats_clear(struct pmd_perf_stats *s)
 s->clear =