Re: [vpp-dev] clocks stored in vlib_node_runtime_t #vpp #counters

Dave Barach via Lists.Fd.Io Fri, 27 Mar 2020 14:33:01 -0700

Running for 2.5 seconds in a single graph node is certainly a problem.

See if you can use the vppinfra “file” infrastructure for non-blocking file 
I/O. Search for clib_file_add(...). You can use a VLIB_NODE_TYPE_PROCESS node, 
suspend the process node after a realistic amount of runtime [measured in 
microseconds!], and cause the process to resume by sending it an event from a 
clib file “read_ready” callback.


To see how long a given graph node invocation runs, you can use the vpp event 
logger. It will deal with nodes which run for a year, but only if you’re 
willing to wait that long. 😉...

“elog trace dispatch”
“event-logger save <filename>”

Refer to this page to see how to view the resulting event log file: 
https://fd.io/docs/vpp/master/gettingstarted/developers/eventviewer.html

HTH... Dave

From: Nagaraju Vemuri <nagarajuiit...@gmail.com>
Sent: Friday, March 27, 2020 4:41 PM
To: Dave Barach (dbarach) <dbar...@cisco.com>
Subject: Re: [vpp-dev] clocks stored in vlib_node_runtime_t #vpp #counters

We are registering a node, where we do little heavy file operations and it is 
taking 2.5sec or so.
We are going to fix it.

This infra is supposed to catch any such wrongly coded node.

If max_clock logic is wrong, we will never catch nodes that (are incorrectly 
coded) take more than 2 seconds.
As per current code, we are always storing max_clock from 32 bit into 64 bit, 
so effectively it is always within 32 bit value.

My suggestion would solve the infra issue and catch such culprits?

Thanks,
Nagaraju

On Fri, Mar 27, 2020 at 1:32 PM Dave Barach (dbarach) 
<dbar...@cisco.com<mailto:dbar...@cisco.com>> wrote:
The assumption in step 1 won’t hold true in real life. I’m not claiming that 
it’s “impossible,” just that it will not happen with anything like 
properly-coded node dispatch functions.

A full dispatch circuit across multiple graph nodes - with full frames - 
doesn’t exceed 1ms or so. None of the individual nodes run for anywhere near 
that long.

HTH... Dave

From: Nagaraju Vemuri 
<nagarajuiit...@gmail.com<mailto:nagarajuiit...@gmail.com>>
Sent: Friday, March 27, 2020 3:09 PM
To: Dave Barach (dbarach) <dbar...@cisco.com<mailto:dbar...@cisco.com>>
Cc: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>
Subject: Re: [vpp-dev] clocks stored in vlib_node_runtime_t #vpp #counters

Hi Dave,

I saw that code. Please excuse me if I'm wrong with below issue:
Step #1: vlib_node_runtime_update_stats() receives n_clocks (uword), assume 
this n_clocks has value bigger than 32 bit can hold.
Step #2: We are setting node->max_clock (note that here max_clock already 
overflowed as n_clocks is big value)
Step #3: Now we call vlib_node_runtime_sync_stats() and pass node as argument 
along with n_clocks.
Step #4: In vlib_node_runtime_sync_stats(), we set n->stats_total.max_clock = 
r->max_clock; from node(32 bit field) which already overflowed.

I think we should have the logic like below inside 
vlib_node_runtime_sync_stats().
n->stats_total.max_clock = n_clocks > n->stats_total.max_clock ? n_clocks : 
n->stats_total.max_clock;   # all variables are 64 bit here

Thanks,
Nagaraju

On Fri, Mar 27, 2020 at 11:11 AM Dave Barach (dbarach) 
<dbar...@cisco.com<mailto:dbar...@cisco.com>> wrote:
vlib_node_runtime_update_stats(...)

From: Nagaraju Vemuri 
<nagarajuiit...@gmail.com<mailto:nagarajuiit...@gmail.com>>
Sent: Friday, March 27, 2020 12:54 PM
To: Dave Barach (dbarach) <dbar...@cisco.com<mailto:dbar...@cisco.com>>
Cc: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>
Subject: Re: [vpp-dev] clocks stored in vlib_node_runtime_t #vpp #counters

Thanks for your reply Dave.

Can you please point me to the code where we copy into 64 bit counters when 32 
bit overflows?

When we issue the command "show runtime max", do we print the data from 64 bit 
counters?

I made max_clock variable 64 bit type locally to experiment.
And found different output from "show runtime max".
Max clock was 7.49e9 with 64 bit field, whereas it was within 4e9(because of 32 
bit limitation) with 32 bit field.

Regards,
Nagaraju

On Fri, Mar 27, 2020 at 4:52 AM Dave Barach (dbarach) 
<dbar...@cisco.com<mailto:dbar...@cisco.com>> wrote:
We scrape the 32-bit node-runtime counters back into the 64 bit node counters 
when they overflow.

It should take a good long time for a 32-bit counter to overflow – O(1 seconds’ 
worth of node runtime) – we do this to reduce the cache footprint of the 
node-runtime.

HTH... Dave

From: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io> 
<vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>> On Behalf Of 
nagarajuiit...@gmail.com<mailto:nagarajuiit...@gmail.com>
Sent: Friday, March 27, 2020 1:59 AM
To: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>
Subject: [vpp-dev] clocks stored in vlib_node_runtime_t #vpp #counters

Hi,

I see that clib_cpu_time_now() returns u64 type data.
And we are updating max clocks spent by a node in vlib_node_runtime_t using 
these timestamps.

Why are we using u32 types in vlib_node_runtime_t?
Because of this, it can only hold data within 2 seconds and anything above 32 
bit range overflows in our system.

Really appreciate your help.

Thanks,
Nagaraju


--
Thanks,
Nagaraju Vemuri


--
Thanks,
Nagaraju Vemuri


--
Thanks,
Nagaraju Vemuri

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#15908): https://lists.fd.io/g/vpp-dev/message/15908
Mute This Topic: https://lists.fd.io/mt/72581795/21656
Mute #vpp: https://lists.fd.io/mk?hashtag=vpp&subid=1480452
Mute #counters: https://lists.fd.io/mk?hashtag=counters&subid=1480452
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-

Re: [vpp-dev] clocks stored in vlib_node_runtime_t #vpp #counters

Reply via email to