Adrian Moreno <[email protected]> writes:
> On 6/19/23 10:36, Eelco Chaudron wrote:
>> On 16 Jun 2023, at 19:19, Aaron Conole wrote:
>>
>>> Martin Kennelly <[email protected]> writes:
>>>
>>>> Hey ovs community,
>>>>
>>>> I am a developer working on ovn-kubernetes and I want to
>>>> programmatically consume long poll information
>>>> i.e:
>>>> ovs|00211|timeval(handler25)|WARN|Unreasonably long 52388ms poll
>>>> interval (752ms user, 209ms system)
>>>>
>>>> This is currently exposed via journal logs but it's not practical
>>>> to consume it there programmatically and I was
>>>> hoping you could add it to coverage metrics.
>>>
>>> I think it could be useful. I do want to be careful about exposing
>>> these kinds of data in a way that could be misinterpreted. Already,
>>> that log in particular gets misinterpreted quite a bit, and RH gets
>>> customers claiming OVS is misbehaving when they've oversubscribed the
>>> system.
>> +1
>>
>
> Maybe it's a good time to start documenting coverage counters?
I agree - we should have at least some kind of documentation. Actually,
it would be really nice if we could do something during
COVERAGE_DEFINE() that would be like:
COVERAGE_DEFINE(ctr, "description")
and then we can generate documentation from the COVERAGE_DEFINE()s as
well as querying for it with an ovs-appctl command.
That might be trying to be too fancy, though.
>>> Mechanically, it would be pretty simple to do something like:
>>>
>>> ---
>>> diff --git a/lib/timeval.c b/lib/timeval.c
>>> index 193c7bab17..00e5f2a74d 100644
>>> --- a/lib/timeval.c
>>> +++ b/lib/timeval.c
>>> @@ -40,6 +40,7 @@
>>> #include "openvswitch/vlog.h"
>>>
>>> VLOG_DEFINE_THIS_MODULE(timeval);
>>> +COVERAGE_DEFINE(long_poll_interval);
>>>
>>> #if !defined(HAVE_CLOCK_GETTIME)
>>> typedef unsigned int clockid_t;
>>> @@ -645,6 +646,8 @@ log_poll_interval(long long int last_wakeup)
>>> struct rusage rusage;
>>>
>>> if (!getrusage_thread(&rusage)) {
>>> + COVERAGE_INC(long_poll_interval);
>>> +
>>> VLOG_WARN("Unreasonably long %lldms poll interval"
>>> " (%lldms user, %lldms system)",
>>> interval,
>>> ---
>>>
>>> This would at least expose the coverage data via the coverage framework
>>> and it can be queried via ovs-appctl. Actually, the advantage here is
>>> that the coverage counter can track some details about X/sec over the
>>> last 5 seconds, minute, hour, in addition to the total, so we can see
>>> whether the condition is ongoing.
>> _______________________________________________
>> dev mailing list
>> [email protected]
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev