On 20 Jun 2023, at 16:57, Ilya Maximets wrote:

> On 6/20/23 16:10, Aaron Conole wrote:
>> Adrian Moreno <[email protected]> writes:
>>
>>> On 6/19/23 10:36, Eelco Chaudron wrote:
>>>> On 16 Jun 2023, at 19:19, Aaron Conole wrote:
>>>>
>>>>> Martin Kennelly <[email protected]> writes:
>>>>>
>>>>>> Hey ovs community,
>>>>>>
>>>>>> I am a developer working on ovn-kubernetes and I want to
>>>>>> programmatically consume long poll information
>>>>>> i.e:
>>>>>> ovs|00211|timeval(handler25)|WARN|Unreasonably long 52388ms poll
>>>>>> interval (752ms user, 209ms system)
>>>>>>
>>>>>> This is currently exposed via journal logs but it's not practical
>>>>>> to consume it there programmatically and I was
>>>>>> hoping you could add it to coverage metrics.
>>>>>
>>>>> I think it could be useful.  I do want to be careful about exposing
>>>>> these kinds of data in a way that could be misinterpreted.  Already,
>>>>> that log in particular gets misinterpreted quite a bit, and RH gets
>>>>> customers claiming OVS is misbehaving when they've oversubscribed the
>>>>> system.
>>>> +1
>>>>
>>>
>>> Maybe it's a good time to start documenting coverage counters?
>>
>> I agree - we should have at least some kind of documentation.  Actually,
>> it would be really nice if we could do something during
>> COVERAGE_DEFINE() that would be like:
>>
>>   COVERAGE_DEFINE(ctr, "description")
>>
>> and then we can generate documentation from the COVERAGE_DEFINE()s as
>> well as querying for it with an ovs-appctl command.

I think this might be the only way to keep the doc in sync, however, who is 
going to document the 320+ existing ones?

>> That might be trying to be too fancy, though.
>
>
> The problem with documenting coverage counters is that we can't
> and shouldn't claim that these are in any way a stable API.
> They are mainly for developers.  Code can change and there might
> be no meaningful way preserve current counters or their names.
> They can change in each release including minor ones without
> prior notice.

+1000 as even if the direct code near the counter does not changes, it could 
still be impacted the actual counter. Which I have seen before.

> Best regards, Ilya Maximets.
>
>>
>>>>> Mechanically, it would be pretty simple to do something like:
>>>>>
>>>>> ---
>>>>> diff --git a/lib/timeval.c b/lib/timeval.c
>>>>> index 193c7bab17..00e5f2a74d 100644
>>>>> --- a/lib/timeval.c
>>>>> +++ b/lib/timeval.c
>>>>> @@ -40,6 +40,7 @@
>>>>>   #include "openvswitch/vlog.h"
>>>>>
>>>>>   VLOG_DEFINE_THIS_MODULE(timeval);
>>>>> +COVERAGE_DEFINE(long_poll_interval);
>>>>>
>>>>>   #if !defined(HAVE_CLOCK_GETTIME)
>>>>>   typedef unsigned int clockid_t;
>>>>> @@ -645,6 +646,8 @@ log_poll_interval(long long int last_wakeup)
>>>>>           struct rusage rusage;
>>>>>
>>>>>           if (!getrusage_thread(&rusage)) {
>>>>> +            COVERAGE_INC(long_poll_interval);
>>>>> +
>>>>>               VLOG_WARN("Unreasonably long %lldms poll interval"
>>>>>                         " (%lldms user, %lldms system)",
>>>>>                         interval,
>>>>> ---
>>>>>
>>>>> This would at least expose the coverage data via the coverage framework
>>>>> and it can be queried via ovs-appctl.  Actually, the advantage here is
>>>>> that the coverage counter can track some details about X/sec over the
>>>>> last 5 seconds, minute, hour, in addition to the total, so we can see
>>>>> whether the condition is ongoing.
>>>> _______________________________________________
>>>> dev mailing list
>>>> [email protected]
>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>
>>
>> _______________________________________________
>> dev mailing list
>> [email protected]
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to