Monitoring CPU time

Stuart Nelson Sun, 18 Sep 2016 09:47:46 -0700

Hey all,

I'm following up on some work from this old thread for monitoring CPU time:
https://www.dragonflybsd.org/mailarchive/users/2010-04/msg00056.html

The code I have is essentially the one shown in the link, but I'm
attempting to find the actual number of seconds spent in each state. I'm
doing this by dividing each value by clockrate.stathz, e.g.:

user += cp_t[cpu].cp_user / clockrate.stathz;
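
To make the arithmetic concrete, here is a minimal sketch of that conversion, not the exporter's actual code. It assumes the counters really are statclock ticks, which is the very assumption in question, and `ticks_to_seconds` is a hypothetical helper name. It divides in floating point because in C an integer counter divided by an integer stathz truncates toward zero and drops the fractional seconds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a tick counter to seconds.
 * ASSUMPTION: `ticks` counts statclock ticks at `stathz` ticks per second.
 * If the kernel reports the counter in some other unit, the result is wrong. */
static double ticks_to_seconds(uint64_t ticks, int stathz)
{
    assert(stathz > 0);
    /* Cast before dividing so the fractional part survives. */
    return (double)ticks / (double)stathz;
}
```

With stathz = 100, a counter of 250 comes out as 2.5 seconds, where plain integer division would give 2.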

Relevant code is here:
https://github.com/stuartnelson3/node_exporter/blob/2b5a581942ac31b501438d402274100df1f7d3d6/collector/cpu_dragonfly.go#L50-L98

My question is about the units on struct members in kinfo_cputime (the
source of cp_user et al.). The values I'm getting out seem to be growing at
a rate that indicates I'm not looking at seconds, but something smaller.

I'm comparing the rate of change of CPU time on my personal machine
running DragonFly with that of a machine running Linux. The implementation
is the same: get user time, divide by 100Hz to get the value in seconds, and
find the rate of change between two collections in a fixed time window. The
DragonFly rate of change seems to be larger by about 2 orders of magnitude,
which is why I'm asking about the units.

For reference, the DragonFly node I'm looking at is reporting an increase of
~200 per second in CPU time for user and sys with loadavg ~0.1%, whereas the
Linux node is reporting values <10 with loadavg ~15%.
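
One way to sanity-check the units is to note that, if the values really are seconds, a single state summed over all CPUs cannot accumulate more than ncpu CPU-seconds per wall-clock second. The sketch below illustrates that check. The helper names and the CPU count used in the note after it are hypothetical, and a real check would probably want a small tolerance for sampling jitter.

```c
#include <assert.h>

/* CPU-seconds accumulated per wall-clock second between two samples.
 * Both samples are assumed to already be converted to seconds. */
static double cpu_rate(double prev_seconds, double cur_seconds,
                       double interval_seconds)
{
    assert(interval_seconds > 0.0);
    return (cur_seconds - prev_seconds) / interval_seconds;
}

/* Returns 1 if `rate` is possible for values measured in seconds:
 * one state, summed over all CPUs, grows by at most `ncpu`
 * CPU-seconds per wall-clock second. Returns 0 otherwise. */
static int rate_is_plausible(double rate, int ncpu)
{
    return rate >= 0.0 && rate <= (double)ncpu;
}
```

On a hypothetical 4-CPU machine, a rate of 0.5 passes this check while a rate of ~200 does not, which points at a unit mismatch rather than real load.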

I'm improving DragonFly support for the node_exporter for Prometheus, a
metrics and monitoring solution that is used mostly in the Linux community.
I'm assuming the Linux implementation for finding CPU time in seconds is
correct, and it's also the implementation used for finding CPU seconds for
FreeBSD. It just struck me as unlikely that my old Dell running DragonFly,
at a fraction of the load, would show such a drastically different rate of
change.

If there is anything I can clarify, don't hesitate to write back!

Thanks,
Stuart
