Re: Monitoring CPU time

2016-09-18 Thread Sepherosa Ziehau
Hi,

The bump upon each statclock is:
((cur_systimer - prev_systimer) * systimer_freq) >> 32

systimer_freq can be read from the following sysctl in userspace:
sysctl kern.cputimer.freq

statclock is called at stathz frequency.
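
As a minimal sketch, the formula above can be transcribed literally into C. The function and parameter names are hypothetical, systimer_freq is assumed to be the value reported by kern.cputimer.freq, and the unit of the result is not stated in this message, so the sketch makes no claim about it.

```c
#include <stdint.h>

/*
 * Literal transcription of the per-statclock bump formula quoted above.
 * All names are hypothetical; systimer_freq stands for the value
 * reported by `sysctl kern.cputimer.freq`.
 */
uint64_t
statclock_bump(uint64_t prev_systimer, uint64_t cur_systimer,
    uint64_t systimer_freq)
{
	/* Unsigned subtraction stays correct if the counter wraps once. */
	uint64_t delta = cur_systimer - prev_systimer;

	/* Note: the product can overflow 64 bits for very large inputs. */
	return (delta * systimer_freq) >> 32;
}
```

Since the bump is derived from the elapsed systimer count rather than being a fixed 1 per call, the counters would not be expected to advance by exactly stathz per second.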

Thanks,
sephe


On Mon, Sep 19, 2016 at 12:46 AM, Stuart Nelson wrote:
> Hey all,
>
> I'm following up on some work from this old thread for monitoring CPU time:
> https://www.dragonflybsd.org/mailarchive/users/2010-04/msg00056.html
>
> The code I have is essentially the one shown in the link, but I'm attempting
> to find the actual number of seconds spent in each state. I'm doing this by
> dividing each value by clockrate.stathz, e.g.:
>
>
> user += cp_t[cpu].cp_user / clockrate.stathz;
>
>
> Relevant code is here:
> https://github.com/stuartnelson3/node_exporter/blob/2b5a581942ac31b501438d402274100df1f7d3d6/collector/cpu_dragonfly.go#L50-L98
>
> My question is about the units on struct members in kinfo_cputime (the
> source of cp_user et al.). The values I'm getting out seem to be growing at
> a rate that indicates I'm not looking at seconds, but something smaller.
>
> I'm looking at the rate of change of cpu time on my personal machine running
> dragonfly vs. a machine running linux. The implementation is the same: get
> user time, divide by 100Hz to get the value in seconds, and find the rate of
> change between two collections in a fixed time window. The dragonfly rate of
> change seems to be larger by about 2 orders of magnitude, which is why I'm
> asking about the units.
>
> For reference, the dragonfly node I'm looking at is reporting ~200 increase
> per second in cpu time for user and sys with loadavg ~0.1%, whereas the
> linux node is reporting values <10 with loadavg ~15%.
>
> I'm improving dragonfly support for the node_exporter for Prometheus, a
> metrics and monitoring solution that is used mostly in the linux community.
> I'm assuming the linux implementation for finding cpu time in seconds is
> correct, and it's also the implementation used for finding cpu seconds for
> freebsd. It just struck me as unlikely that my old dell running dragonfly
> would have a rate of change at a fraction of the load that was so
> drastically different.
>
> If there is anything I can clarify don't hesitate to write back!
>
> Thanks,
> Stuart
>
>



-- 
Tomorrow Will Never Die


Vinum troubleshooting

2016-09-18 Thread afreedma
Hi all,

I'm having a rough time trying to get vinum going. After a few failed starts I
got my disks in a format that appeared to work, but vinum won't bring the
volume up after creation.


loki# vinum lv
V storage   State: down Plexes:   1 Size:   9315 GB


Broadly: I have six disks of 2 TB or more. I want to bundle them all up and be
able to survive a single drive failure. RAID-5 with vinum seems my best
option; I gave up on LVM2 when I got lost trying to set up the disks in a way
it approved of. I'm open to solutions other than vinum if they exist, though.

Any guidance gratefully received.

(PS: vinum(8) states "vinum maintains a log file, by default
/var/tmp/vinum_history", but it's actually in /var/log/vinum_history.)

- Andrew


Diagnostics:

loki# uname -v
DragonFly v4.6.0-RELEASE #0: Mon Aug  1 12:46:25 EDT 2016 r...@www.shiningsilence.com:/usr/obj/build/sys/X86_64_GENERIC

loki# vinum info -V
Flags: 0x80004
Can't get information: Invalid argument

loki# cat /etc/vinum.conf 
drive d1 device /dev/da0s1a
drive d2 device /dev/da1s1a
drive d3 device /dev/da3s1a
drive d4 device /dev/da4s1a
drive d5 device /dev/da5s1a
drive d6 device /dev/da6s1a
volume storage
 plex org raid5 512k
   sd length 1863G drive d1
   sd length 1863G drive d2
   sd length 1863G drive d3
   sd length 1863G drive d4
   sd length 1863G drive d5
   sd length 1863G drive d6

loki# vinum l
6 drives:
D d1                    State: up       Device /dev/da0s1a      Avail: 13/1907725 MB (0%)
D d2                    State: up       Device /dev/da1s1a      Avail: 189438/2097150 MB (9%)
D d3                    State: up       Device /dev/da3s1a      Avail: 189438/2097150 MB (9%)
D d4                    State: up       Device /dev/da4s1a      Avail: 13/1907725 MB (0%)
D d5                    State: up       Device /dev/da5s1a      Avail: 13/1907725 MB (0%)
D d6                    State: up       Device /dev/da6s1a      Avail: 13/1907725 MB (0%)

1 volumes:
V storage   State: down Plexes:   1 Size:   9315 GB

1 plexes:
P storage.p0 R5 State: init Subdisks: 6 Size:   9315 GB

6 subdisks:
S storage.p0.s0         State: empty    PO:        0  B Size:       1863 GB
S storage.p0.s1         State: empty    PO:      512 kB Size:       1863 GB
S storage.p0.s2         State: empty    PO:     1024 kB Size:       1863 GB
S storage.p0.s3         State: empty    PO:     1536 kB Size:       1863 GB
S storage.p0.s4         State: empty    PO:     2048 kB Size:       1863 GB
S storage.p0.s5         State: empty    PO:     2560 kB Size:       1863 GB
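
The figures in this listing are consistent with each other, which a little arithmetic can confirm. The following is a minimal sketch in C, assuming vinum reports binary units here (1 GB = 1024 MB); the helper names are hypothetical and are not part of vinum.

```c
#include <stdint.h>

#define MB_PER_GB 1024ULL	/* assumption: sizes are in binary units */

/* Space left on a drive after carving out one subdisk, in MB. */
uint64_t
drive_avail_mb(uint64_t drive_mb, uint64_t subdisk_gb)
{
	return drive_mb - subdisk_gb * MB_PER_GB;
}

/* Usable RAID-5 capacity: one subdisk's worth of space holds parity. */
uint64_t
raid5_usable_gb(unsigned nsubdisks, uint64_t subdisk_gb)
{
	return (uint64_t)(nsubdisks - 1) * subdisk_gb;
}
```

With the values above, drive_avail_mb(1907725, 1863) gives 13 and drive_avail_mb(2097150, 1863) gives 189438, matching the Avail column, and raid5_usable_gb(6, 1863) gives 9315, matching the reported volume and plex size. So the configuration was parsed as intended, and the sizing itself does not look like the cause of the "down" state.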

loki# vinum start
** no drives found: No such file or directory
Warning: defective objects

V storage   State: down Plexes:   1 Size:   9315 GB
P storage.p0 R5 State: init Subdisks: 6 Size:   9315 GB
S storage.p0.s0         State: empty    PO:        0  B Size:       1863 GB
S storage.p0.s1         State: empty    PO:      512 kB Size:       1863 GB
S storage.p0.s2         State: empty    PO:     1024 kB Size:       1863 GB
S storage.p0.s3         State: empty    PO:     1536 kB Size:       1863 GB
S storage.p0.s4         State: empty    PO:     2048 kB Size:       1863 GB
S storage.p0.s5         State: empty    PO:     2560 kB Size:       1863 GB


da0 through da6 (excluding da2) all give identical results for the commands
below (aside from da1 and da3 being a little bigger):

loki# fdisk /dev/da0
*** Working on device /dev/da0 ***
parameters extracted from device are:
cylinders=242251 heads=256 sectors/track=63 (16128 blks/cyl)

Figures below won't work with BIOS for partitions not in cyl 1
parameters to be used for BIOS calculations are:
cylinders=242251 heads=256 sectors/track=63 (16128 blks/cyl)

Media sector size is 512
Warning: BIOS sector numbering starts with sector 1
Information from DOS bootblock is:
The data for partition 1 is:
sysid 165,(DragonFly/FreeBSD/NetBSD/386BSD)
start 63, size 3907024065 (1907726 Meg), flag 80 (active)
beg: cyl 0/ head 1/ sector 1;
end: cyl 1023/ head 255/ sector 63
The data for partition 2 is:
<UNUSED>
The data for partition 3 is:
<UNUSED>
The data for partition 4 is:
<UNUSED>

loki# gpt show /dev/da0
       start        size  index  contents
           0           1      -  MBR
           1          62      -
          63  3907024065      0  MBR part 165
  3907024128        5007      -
  3907029135          32      -  Sec GPT table
  3907029167           1      -  Sec GPT header
  
loki# disklabel64 /dev/da0s1
# /dev/da0s1:
#
# Informational fields calculated from the above
# All byte equivalent offsets must be aligned
#
# boot space:1044992 bytes
# data space: 1953511003 blocks # 1907725.59 MB (2000395267584 bytes)
#
# NOTE: If the partition data base looks odd it may be
#   physically aligned instead of slice-aligned
#
diskid: f413c18e-7d78-11e6-ae3b-0100
label: 
boot2 data base:  0x1000
partitions data base: 

Monitoring CPU time

2016-09-18 Thread Stuart Nelson
Hey all,

I'm following up on some work from this old thread for monitoring CPU time:
https://www.dragonflybsd.org/mailarchive/users/2010-04/msg00056.html

The code I have is essentially the one shown in the link, but I'm
attempting to find the actual number of seconds spent in each state. I'm
doing this by dividing each value by clockrate.stathz, e.g.:


user += cp_t[cpu].cp_user / clockrate.stathz;
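
One detail worth checking in a line like this: if cp_user and stathz are both integer types, C performs integer division and discards the fractional seconds before the sum. A minimal sketch of a floating-point conversion follows; ticks_to_seconds() is a hypothetical helper, and dividing by stathz is only correct if the counter really advances once per statclock tick.

```c
#include <stdint.h>

/*
 * Convert a tick counter to seconds using floating-point division.
 * Hypothetical helper: hz would be clockrate.stathz only if the
 * counter advances by exactly one per statclock call.
 */
double
ticks_to_seconds(uint64_t ticks, int hz)
{
	return (double)ticks / (double)hz;
}
```

For example, with integer operands 250 / 100 evaluates to 2, whereas ticks_to_seconds(250, 100) returns 2.5.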


Relevant code is here:
https://github.com/stuartnelson3/node_exporter/blob/2b5a581942ac31b501438d402274100df1f7d3d6/collector/cpu_dragonfly.go#L50-L98

My question is about the units on struct members in kinfo_cputime (the
source of cp_user et al.). The values I'm getting out seem to be growing at
a rate that indicates I'm not looking at seconds, but something smaller.

I'm looking at the rate of change of cpu time on my personal machine
running dragonfly vs. a machine running linux. The implementation is the
same: get user time, divide by 100Hz to get the value in seconds, and find
the rate of change between two collections in a fixed time window. The
dragonfly rate of change seems to be larger by about 2 orders of magnitude,
which is why I'm asking about the units.
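
The comparison described here can be sketched as a small C function. The name and parameters are hypothetical, and hz is whatever tick rate the counter is assumed to use (100 in the description above).

```c
#include <stdint.h>

/*
 * CPU-seconds accumulated per wall-clock second between two samples
 * of one counter. Hypothetical helper; hz is the assumed tick rate
 * of the counter and interval_s the time between the two samples.
 */
double
cpu_seconds_per_second(uint64_t prev_ticks, uint64_t cur_ticks, int hz,
    double interval_s)
{
	double cpu_seconds = (double)(cur_ticks - prev_ticks) / (double)hz;

	return cpu_seconds / interval_s;
}
```

A single CPU cannot accumulate more than one CPU-second per wall-clock second, so a per-CPU result far above 1.0 is a sign that the counter's unit is smaller than the assumed 1/hz seconds.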

For reference, the dragonfly node I'm looking at is reporting ~200 increase
per second in cpu time for user and sys with loadavg ~0.1%, whereas the
linux node is reporting values <10 with loadavg ~15%.

I'm improving dragonfly support for the node_exporter for Prometheus, a
metrics and monitoring solution that is used mostly in the linux community.
I'm assuming the linux implementation for finding cpu time in seconds is
correct, and it's also the implementation used for finding cpu seconds for
freebsd. It just struck me as unlikely that my old dell running dragonfly
would have a rate of change at a fraction of the load that was so
drastically different.

If there is anything I can clarify don't hesitate to write back!

Thanks,
Stuart