On 6/20/23 16:10, Aaron Conole wrote:
> Adrian Moreno <[email protected]> writes:
> 
>> On 6/19/23 10:36, Eelco Chaudron wrote:
>>> On 16 Jun 2023, at 19:19, Aaron Conole wrote:
>>>
>>>> Martin Kennelly <[email protected]> writes:
>>>>
>>>>> Hey ovs community,
>>>>>
>>>>> I am a developer working on ovn-kubernetes and I want to
>>>>> programmatically consume long poll information
>>>>> i.e:
>>>>> ovs|00211|timeval(handler25)|WARN|Unreasonably long 52388ms poll
>>>>> interval (752ms user, 209ms system)
>>>>>
>>>>> This is currently exposed via journal logs but it's not practical
>>>>> to consume it there programmatically and I was
>>>>> hoping you could add it to coverage metrics.
>>>>
>>>> I think it could be useful.  I do want to be careful about exposing
>>>> these kinds of data in a way that could be misinterpreted.  Already,
>>>> that log in particular gets misinterpreted quite a bit, and RH gets
>>>> customers claiming OVS is misbehaving when they've oversubscribed the
>>>> system.
>>> +1
>>>
>>
>> Maybe it's a good time to start documenting coverage counters?
> 
> I agree - we should have at least some kind of documentation.  Actually,
> it would be really nice if we could do something during
> COVERAGE_DEFINE() that would be like:
> 
>   COVERAGE_DEFINE(ctr, "description")
> 
> and then we can generate documentation from the COVERAGE_DEFINE()s as
> well as querying for it with an ovs-appctl command.
> 
> That might be trying to be too fancy, though.


The problem with documenting coverage counters is that we can't
and shouldn't claim that these are in any way a stable API.
They are mainly for developers.  Code can change and there might
be no meaningful way preserve current counters or their names.
They can change in each release including minor ones without
prior notice.

Best regards, Ilya Maximets.

> 
>>>> Mechanically, it would be pretty simple to do something like:
>>>>
>>>> ---
>>>> diff --git a/lib/timeval.c b/lib/timeval.c
>>>> index 193c7bab17..00e5f2a74d 100644
>>>> --- a/lib/timeval.c
>>>> +++ b/lib/timeval.c
>>>> @@ -40,6 +40,7 @@
>>>>   #include "openvswitch/vlog.h"
>>>>
>>>>   VLOG_DEFINE_THIS_MODULE(timeval);
>>>> +COVERAGE_DEFINE(long_poll_interval);
>>>>
>>>>   #if !defined(HAVE_CLOCK_GETTIME)
>>>>   typedef unsigned int clockid_t;
>>>> @@ -645,6 +646,8 @@ log_poll_interval(long long int last_wakeup)
>>>>           struct rusage rusage;
>>>>
>>>>           if (!getrusage_thread(&rusage)) {
>>>> +            COVERAGE_INC(long_poll_interval);
>>>> +
>>>>               VLOG_WARN("Unreasonably long %lldms poll interval"
>>>>                         " (%lldms user, %lldms system)",
>>>>                         interval,
>>>> ---
>>>>
>>>> This would at least expose the coverage data via the coverage framework
>>>> and it can be queried via ovs-appctl.  Actually, the advantage here is
>>>> that the coverage counter can track some details about X/sec over the
>>>> last 5 seconds, minute, hour, in addition to the total, so we can see
>>>> whether the condition is ongoing.
>>> _______________________________________________
>>> dev mailing list
>>> [email protected]
>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>
> 
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to