On 10 Jan 2024, at 20:34, Ilya Maximets wrote:
> On 1/10/24 12:25, Eelco Chaudron wrote:
>> Martin Kennelly observes that even though this data is available to
>> humans via the journal/log files, these aren't exactly easy for a
>> developer to make any kind of behavioral inferences. This kind of
>> log and counter would be useful when checking on system health to
>> let us know that an Open vSwitch component is noticing some kind of
>> system level hiccup.
>>
>> Add a new coverage counter to track information on these events, and
>> let a developer or system engineer know how long these events have
>> occurred with some historical context.
>>
>> Reported-at:
>> https://lists.linuxfoundation.org/pipermail/ovs-discuss/2023-June/052523.html
>> Suggested-by: Martin Kennelly <[email protected]>
>> Co-Authored-By: Aaron Conole <[email protected]>
>> Signed-off-by: Aaron Conole <[email protected]>
>> Signed-off-by: Eelco Chaudron <[email protected]>
>> Acked-by: Simon Horman <[email protected]>
>> ---
>> v2: Updated commit message, based on similar patch upstream.
>>
>> lib/timeval.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/lib/timeval.c b/lib/timeval.c
>> index 193c7bab1..0abe7e555 100644
>> --- a/lib/timeval.c
>> +++ b/lib/timeval.c
>> @@ -41,6 +41,8 @@
>>
>> VLOG_DEFINE_THIS_MODULE(timeval);
>>
>> +COVERAGE_DEFINE(long_poll_interval);
>> +
>> #if !defined(HAVE_CLOCK_GETTIME)
>> typedef unsigned int clockid_t;
>> static int clock_gettime(clock_t id, struct timespec *ts);
>> @@ -644,6 +646,8 @@ log_poll_interval(long long int last_wakeup)
>> const struct rusage *last_rusage = get_recent_rusage();
>> struct rusage rusage;
>>
>> + COVERAGE_INC(long_poll_interval);
>> +
>> if (!getrusage_thread(&rusage)) {
>> VLOG_WARN("Unreasonably long %lldms poll interval"
>> " (%lldms user, %lldms system)",
>
> Potentially interesting side effect of this change would be that
> coverage counters will likely be logged on every long poll interval,
> since every long poll interval will change them. Before this change
> many coverage dumps are skipped, because they are the same (see the
> coverage_log() function). Did you notice such a behavior change?
I did not notice this, as I only forced a single instance, but this is a good
catch. I could exclude this new counter from the coverage_hash() function.
However, there will probably be a lot of malloc ones anyway. What do you think?
//Eelco
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev