Martin Kennelly <[email protected]> writes:

> Hey ovs community,
>
> I am a developer working on ovn-kubernetes and I want to programmatically
> consume long poll information, i.e.:
> ovs|00211|timeval(handler25)|WARN|Unreasonably long 52388ms poll interval (752ms user, 209ms system)
>
> This is currently exposed via journal logs, but it's not practical to consume
> it there programmatically, and I was hoping you could add it to coverage
> metrics.

I think it could be useful.  I do want to be careful about exposing
this kind of data in a way that could be misinterpreted.  That log
message in particular already gets misinterpreted quite a bit, and
Red Hat gets customers claiming OVS is misbehaving when they have
actually oversubscribed the system.

Mechanically, it would be pretty simple to do something like:

---
diff --git a/lib/timeval.c b/lib/timeval.c
index 193c7bab17..00e5f2a74d 100644
--- a/lib/timeval.c
+++ b/lib/timeval.c
@@ -40,6 +40,7 @@
 #include "openvswitch/vlog.h"
 
 VLOG_DEFINE_THIS_MODULE(timeval);
+COVERAGE_DEFINE(long_poll_interval);
 
 #if !defined(HAVE_CLOCK_GETTIME)
 typedef unsigned int clockid_t;
@@ -645,6 +646,8 @@ log_poll_interval(long long int last_wakeup)
         struct rusage rusage;
 
         if (!getrusage_thread(&rusage)) {
+            COVERAGE_INC(long_poll_interval);
+
             VLOG_WARN("Unreasonably long %lldms poll interval"
                       " (%lldms user, %lldms system)",
                       interval,
---

This would at least expose the condition through the coverage framework,
where it can be queried via ovs-appctl (for example, with coverage/show).
The real advantage is that a coverage counter reports the average rate
of events per second over the last 5 seconds, minute, and hour, in
addition to the running total, so we can see whether the condition is
ongoing.
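
For the programmatic-consumption side, here is a rough sketch of how a
consumer might pick the counter out of the coverage output.  It assumes
each counter is printed as one line of the form
"NAME  X/sec  Y/sec  Z/sec  total: N" (the three rates being the
5-second, minute, and hour averages); that layout should be checked
against the OVS version in use.  The struct and function names are
hypothetical, and long_poll_interval only exists if a patch like the
one above is applied.

```c
#include <stdbool.h>
#include <stdio.h>

/* One parsed counter line (hypothetical helper type, not part of OVS). */
struct coverage_sample {
    char name[64];             /* Counter name, e.g. "long_poll_interval". */
    double rate_5s;            /* Average events/sec over the last 5 seconds. */
    double rate_1m;            /* Average events/sec over the last minute. */
    double rate_1h;            /* Average events/sec over the last hour. */
    unsigned long long total;  /* Running total since the daemon started. */
};

/* Parses a single counter line in the assumed layout
 * "NAME  X/sec  Y/sec  Z/sec  total: N".
 *
 * Returns true and fills in 'out' on success.  Returns false for lines
 * that do not match, such as the header line, in which case the contents
 * of 'out' are unspecified. */
bool
parse_coverage_line(const char *line, struct coverage_sample *out)
{
    int n = sscanf(line, "%63s %lf/sec %lf/sec %lf/sec total: %llu",
                   out->name, &out->rate_5s, &out->rate_1m, &out->rate_1h,
                   &out->total);

    return n == 5;
}
```

A consumer could feed each line of the ovs-appctl output through
parse_coverage_line(), match on the counter name, and treat a nonzero
5-second rate as "the condition is happening right now", while the
total only says that it happened at some point.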

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
