Thanks for pointing out elog feature. May I understand why my suggestion is not good enough? (for my understanding only). "show runtime max" is very useful if we can fix this.
On Fri, Mar 27, 2020 at 2:32 PM Dave Barach (dbarach) <dbar...@cisco.com> wrote: > Running for 2.5 seconds in a single graph node is certainly a problem. > > > > See if you can use the vppinfra “file” infrastructure for non-blocking > file I/O. Search for clib_file_add(...). You can use a > VLIB_NODE_TYPE_PROCESS node, suspend the process node after a realistic > amount of runtime [measured in microseconds!], and cause the process to > resume by sending it an event from a clib file “read_ready” callback. > > > > To see how long a given graph node invocation runs, you can use the vpp > event logger. It will deal with nodes which run for a year, but only if > you’re willing to wait that long. 😉... > > > > “elog trace dispatch” > > “event-logger save <filename>” > > > > Refer to this page to see how to view the resulting event log file: > https://fd.io/docs/vpp/master/gettingstarted/developers/eventviewer.html > > > > HTH... Dave > > > > *From:* Nagaraju Vemuri <nagarajuiit...@gmail.com> > *Sent:* Friday, March 27, 2020 4:41 PM > *To:* Dave Barach (dbarach) <dbar...@cisco.com> > *Subject:* Re: [vpp-dev] clocks stored in vlib_node_runtime_t #vpp > #counters > > > > We are registering a node, where we do little heavy file operations and it > is taking 2.5sec or so. > > We are going to fix it. > > > > This infra is supposed to catch any such wrongly coded node. > > > > If max_clock logic is wrong, we will never catch nodes that (are > incorrectly coded) take more than 2 seconds. > > As per current code, we are always storing max_clock from 32 bit into 64 > bit, so effectively it is always within 32 bit value. > > > > My suggestion would solve the infra issue and catch such culprits? > > > > Thanks, > > Nagaraju > > > > On Fri, Mar 27, 2020 at 1:32 PM Dave Barach (dbarach) <dbar...@cisco.com> > wrote: > > The assumption in step 1 won’t hold true in real life. I’m not claiming > that it’s “impossible,” just that it will not happen with anything like > properly-coded node dispatch functions. > > > > A full dispatch circuit across multiple graph nodes - with full frames - > doesn’t exceed 1ms or so. None of the individual nodes run for anywhere > near that long. > > > > HTH... Dave > > > > *From:* Nagaraju Vemuri <nagarajuiit...@gmail.com> > *Sent:* Friday, March 27, 2020 3:09 PM > *To:* Dave Barach (dbarach) <dbar...@cisco.com> > *Cc:* vpp-dev@lists.fd.io > *Subject:* Re: [vpp-dev] clocks stored in vlib_node_runtime_t #vpp > #counters > > > > Hi Dave, > > > > I saw that code. Please excuse me if I'm wrong with below issue: > > Step #1: vlib_node_runtime_update_stats() receives n_clocks (uword), > assume this n_clocks has value bigger than 32 bit can hold. > > Step #2: We are setting node->max_clock (note that here max_clock already > overflowed as n_clocks is big value) > > Step #3: Now we call vlib_node_runtime_sync_stats() and pass node as > argument along with n_clocks. > > Step #4: In vlib_node_runtime_sync_stats(), we > set n->stats_total.max_clock = r->max_clock; from node(32 bit field) which > already overflowed. > > > > I think we should have the logic like below > inside vlib_node_runtime_sync_stats(). > > n->stats_total.max_clock = n_clocks > n->stats_total.max_clock ? n_clocks > : n->stats_total.max_clock; # all variables are 64 bit here > > > > Thanks, > > Nagaraju > > > > On Fri, Mar 27, 2020 at 11:11 AM Dave Barach (dbarach) <dbar...@cisco.com> > wrote: > > vlib_node_runtime_update_stats(...) > > > > *From:* Nagaraju Vemuri <nagarajuiit...@gmail.com> > *Sent:* Friday, March 27, 2020 12:54 PM > *To:* Dave Barach (dbarach) <dbar...@cisco.com> > *Cc:* vpp-dev@lists.fd.io > *Subject:* Re: [vpp-dev] clocks stored in vlib_node_runtime_t #vpp > #counters > > > > Thanks for your reply Dave. > > > > Can you please point me to the code where we copy into 64 bit counters > when 32 bit overflows? > > > > When we issue the command "show runtime max", do we print the data from 64 > bit counters? > > > > I made max_clock variable 64 bit type locally to experiment. > > And found different output from "show runtime max". > > Max clock was 7.49e9 with 64 bit field, whereas it was within 4e9(because > of 32 bit limitation) with 32 bit field. > > > > Regards, > > Nagaraju > > > > On Fri, Mar 27, 2020 at 4:52 AM Dave Barach (dbarach) <dbar...@cisco.com> > wrote: > > We scrape the 32-bit node-runtime counters back into the 64 bit node > counters when they overflow. > > > > It should take a good long time for a 32-bit counter to overflow – O(1 > seconds’ worth of node runtime) – we do this to reduce the cache footprint > of the node-runtime. > > > > HTH... Dave > > > > *From:* vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> *On Behalf Of * > nagarajuiit...@gmail.com > *Sent:* Friday, March 27, 2020 1:59 AM > *To:* vpp-dev@lists.fd.io > *Subject:* [vpp-dev] clocks stored in vlib_node_runtime_t #vpp #counters > > > > Hi, > > I see that clib_cpu_time_now() returns u64 type data. > And we are updating max clocks spent by a node in vlib_node_runtime_t > using these timestamps. > > Why are we using u32 types in vlib_node_runtime_t? > Because of this, it can only hold data within 2 seconds and anything above > 32 bit range overflows in our system. > > Really appreciate your help. > > Thanks, > Nagaraju > > > > > -- > > Thanks, > Nagaraju Vemuri > > > > > -- > > Thanks, > Nagaraju Vemuri > > > > > -- > > Thanks, > Nagaraju Vemuri > -- Thanks, Nagaraju Vemuri
-=-=-=-=-=-=-=-=-=-=-=- Links: You receive all messages sent to this group. View/Reply Online (#15909): https://lists.fd.io/g/vpp-dev/message/15909 Mute This Topic: https://lists.fd.io/mt/72581795/21656 Mute #vpp: https://lists.fd.io/mk?hashtag=vpp&subid=1480452 Mute #counters: https://lists.fd.io/mk?hashtag=counters&subid=1480452 Group Owner: vpp-dev+ow...@lists.fd.io Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com] -=-=-=-=-=-=-=-=-=-=-=-