On 11 Jan 2024, at 0:34, Eelco Chaudron wrote:
> On 10 Jan 2024, at 20:34, Ilya Maximets wrote:
>
>> On 1/10/24 12:25, Eelco Chaudron wrote:
>>> Martin Kennelly observes that even though this data is available to
>>> humans via the journal/log files, it is not easy for a developer to
>>> make any kind of behavioral inference from them. This kind of log
>>> and counter is useful when checking on system health, letting us
>>> know that an Open vSwitch component is noticing some kind of
>>> system-level hiccup.
>>>
>>> Add a new coverage counter to track these events, and let a
>>> developer or system engineer know how often they have occurred,
>>> with some historical context.
>>>
>>> Reported-at:
>>> https://lists.linuxfoundation.org/pipermail/ovs-discuss/2023-June/052523.html
>>> Suggested-by: Martin Kennelly <[email protected]>
>>> Co-Authored-By: Aaron Conole <[email protected]>
>>> Signed-off-by: Aaron Conole <[email protected]>
>>> Signed-off-by: Eelco Chaudron <[email protected]>
>>> Acked-by: Simon Horman <[email protected]>
>>> ---
>>> v2: Updated commit message, based on similar patch upstream.
>>>
>>> lib/timeval.c | 4 ++++
>>> 1 file changed, 4 insertions(+)
>>>
>>> diff --git a/lib/timeval.c b/lib/timeval.c
>>> index 193c7bab1..0abe7e555 100644
>>> --- a/lib/timeval.c
>>> +++ b/lib/timeval.c
>>> @@ -41,6 +41,8 @@
>>>
>>> VLOG_DEFINE_THIS_MODULE(timeval);
>>>
>>> +COVERAGE_DEFINE(long_poll_interval);
>>> +
>>> #if !defined(HAVE_CLOCK_GETTIME)
>>> typedef unsigned int clockid_t;
>>> static int clock_gettime(clock_t id, struct timespec *ts);
>>> @@ -644,6 +646,8 @@ log_poll_interval(long long int last_wakeup)
>>> const struct rusage *last_rusage = get_recent_rusage();
>>> struct rusage rusage;
>>>
>>> + COVERAGE_INC(long_poll_interval);
>>> +
>>> if (!getrusage_thread(&rusage)) {
>>> VLOG_WARN("Unreasonably long %lldms poll interval"
>>> " (%lldms user, %lldms system)",
>>
>> Potentially interesting side effect of this change would be that
>> coverage counters will likely be logged on every long poll interval,
>> since every long poll interval will change them. Before this change
>> many coverage dumps are skipped, because they are the same (see the
>> coverage_log() function). Did you notice such a behavior change?
>
> I did not notice this, as I only forced a single instance, but this is a good
> catch. I could exclude this new counter from the coverage_hash() function.
> However, the malloc counters will probably keep changing the hash anyway.
> What do you think?
Ok, after adding some code to exclude this new event from the hash (roughly
sketched below), I did not see any difference in behavior. The dump happens
each time the total values are updated (around every 5 seconds), because
seq_change and util_xalloc are constantly increasing anyway. The only thing
that stops this from happening all the time in my setup is the rate-limiting
in coverage_hash().

So I guess this patch does not change the current behavior, and adding the
exclusion to the hash makes no sense. Thoughts?
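
To illustrate what I mean by excluding the counter from the hash, here is a
stand-alone sketch (the struct layout, hash_counters(), and the FNV hash are
made up for illustration only, not the actual lib/coverage.c code or the diff
I tested; only the counter names come from the real code). It just shows the
idea of leaving one named counter out of the duplicate-detection hash:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct counter {
    const char *name;          /* Counter name, e.g. "long_poll_interval". */
    unsigned long long total;  /* Running total of events. */
};

/* FNV-1a hash over all name/total pairs, skipping the counter named 'skip'
 * so that changes to it do not influence duplicate detection. */
static uint32_t
hash_counters(const struct counter *counters, size_t n, const char *skip)
{
    uint32_t hash = 2166136261u;
    size_t i;

    for (i = 0; i < n; i++) {
        const unsigned char *p;

        if (skip && !strcmp(counters[i].name, skip)) {
            continue;          /* Leave this counter out of the hash. */
        }
        for (p = (const unsigned char *) counters[i].name; *p; p++) {
            hash = (hash ^ *p) * 16777619u;
        }
        for (p = (const unsigned char *) &counters[i].total;
             p < (const unsigned char *) (&counters[i].total + 1); p++) {
            hash = (hash ^ *p) * 16777619u;
        }
    }
    return hash;
}

int
main(void)
{
    struct counter counters[] = {
        { "seq_change", 1024 },
        { "util_xalloc", 4096 },
        { "long_poll_interval", 3 },
    };
    size_t n = sizeof counters / sizeof counters[0];
    uint32_t before, after;

    before = hash_counters(counters, n, "long_poll_interval");
    counters[2].total++;                   /* Another long poll happened. */
    after = hash_counters(counters, n, "long_poll_interval");

    /* Same hash, so a dump where only this counter changed would be
     * considered a duplicate and skipped. */
    printf("hash unchanged: %s\n", before == after ? "yes" : "no");
    return 0;
}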
//Eelco