On 20 Jun 2023, at 16:57, Ilya Maximets wrote:
> On 6/20/23 16:10, Aaron Conole wrote:
>> Adrian Moreno <[email protected]> writes:
>>
>>> On 6/19/23 10:36, Eelco Chaudron wrote:
>>>> On 16 Jun 2023, at 19:19, Aaron Conole wrote:
>>>>
>>>>> Martin Kennelly <[email protected]> writes:
>>>>>
>>>>>> Hey ovs community,
>>>>>>
>>>>>> I am a developer working on ovn-kubernetes and I want to
>>>>>> programmatically consume long poll information
>>>>>> i.e:
>>>>>> ovs|00211|timeval(handler25)|WARN|Unreasonably long 52388ms poll
>>>>>> interval (752ms user, 209ms system)
>>>>>>
>>>>>> This is currently exposed via journal logs but it's not practical
>>>>>> to consume it there programmatically and I was
>>>>>> hoping you could add it to coverage metrics.
>>>>>
>>>>> I think it could be useful. I do want to be careful about exposing
>>>>> these kinds of data in a way that could be misinterpreted. Already,
>>>>> that log in particular gets misinterpreted quite a bit, and RH gets
>>>>> customers claiming OVS is misbehaving when they've oversubscribed the
>>>>> system.
>>>> +1
>>>>
>>>
>>> Maybe it's a good time to start documenting coverage counters?
>>
>> I agree - we should have at least some kind of documentation. Actually,
>> it would be really nice if we could do something during
>> COVERAGE_DEFINE() that would be like:
>>
>> COVERAGE_DEFINE(ctr, "description")
>>
>> and then we can generate documentation from the COVERAGE_DEFINE()s as
>> well as querying for it with an ovs-appctl command.
I think this might be the only way to keep the doc in sync, however, who is
going to document the 320+ existing ones?
>> That might be trying to be too fancy, though.
>
>
> The problem with documenting coverage counters is that we can't
> and shouldn't claim that these are in any way a stable API.
> They are mainly for developers. Code can change and there might
> be no meaningful way preserve current counters or their names.
> They can change in each release including minor ones without
> prior notice.
+1000 as even if the direct code near the counter does not changes, it could
still be impacted the actual counter. Which I have seen before.
> Best regards, Ilya Maximets.
>
>>
>>>>> Mechanically, it would be pretty simple to do something like:
>>>>>
>>>>> ---
>>>>> diff --git a/lib/timeval.c b/lib/timeval.c
>>>>> index 193c7bab17..00e5f2a74d 100644
>>>>> --- a/lib/timeval.c
>>>>> +++ b/lib/timeval.c
>>>>> @@ -40,6 +40,7 @@
>>>>> #include "openvswitch/vlog.h"
>>>>>
>>>>> VLOG_DEFINE_THIS_MODULE(timeval);
>>>>> +COVERAGE_DEFINE(long_poll_interval);
>>>>>
>>>>> #if !defined(HAVE_CLOCK_GETTIME)
>>>>> typedef unsigned int clockid_t;
>>>>> @@ -645,6 +646,8 @@ log_poll_interval(long long int last_wakeup)
>>>>> struct rusage rusage;
>>>>>
>>>>> if (!getrusage_thread(&rusage)) {
>>>>> + COVERAGE_INC(long_poll_interval);
>>>>> +
>>>>> VLOG_WARN("Unreasonably long %lldms poll interval"
>>>>> " (%lldms user, %lldms system)",
>>>>> interval,
>>>>> ---
>>>>>
>>>>> This would at least expose the coverage data via the coverage framework
>>>>> and it can be queried via ovs-appctl. Actually, the advantage here is
>>>>> that the coverage counter can track some details about X/sec over the
>>>>> last 5 seconds, minute, hour, in addition to the total, so we can see
>>>>> whether the condition is ongoing.
>>>> _______________________________________________
>>>> dev mailing list
>>>> [email protected]
>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>
>>
>> _______________________________________________
>> dev mailing list
>> [email protected]
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev