[Linux-HA] Antw: Re: pacemaker/HealthCPU

2011-02-04 Thread Ulrich Windl
Lars,

you are right, and I saw that my guess to use /proc/stat was wrong. top is slow 
in getting the current CPU usage. So basically I wondered if you need the CPU 
usage at all. If you'd switch to load, you could get it a lot faster.
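
For illustration, a minimal sketch of reading the load instead of sampling CPU usage. It assumes Linux's /proc/loadavg, whose first field is the 1-minute load average; load_1min is a made-up helper name, not something the HealthCPU agent provides.

```shell
#!/bin/sh
# load_1min is a hypothetical helper, not part of the HealthCPU agent.
# It prints the first field of a loadavg-style file, which on Linux is
# the 1-minute load average. /proc/loadavg is Linux-specific.
load_1min() {
    awk '{ print $1; exit }' "${1:-/proc/loadavg}"
}
```

Calling load_1min without an argument reads /proc/loadavg directly; no waiting between samples is needed, which is why it returns so much faster than top.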

To be honest: I wondered what HealthCPU would monitor about the CPU's 
health when initially looking into it. I was kind of disappointed to see that 
it simply inspects the CPU usage (A CPU that is 100% busy (0% idle) may be 
quite healthy) ;-)

Regards,
Ulrich

 Lars Ellenberg <lars.ellenb...@linbit.com> wrote on 04.02.2011 at 13:26 in
message <20110204122642.GG10069@barkeeper1-xen.linbit>:
 On Thu, Feb 03, 2011 at 01:09:04PM +0100, Michael Schwartzkopff wrote:
  On Thursday 03 February 2011 12:35:34 Ulrich Windl wrote:
   Hi!
   
   I'm starting to explore Linux-HA. Examining one of the monitors, I think
   things could be made much more efficient. For example: To get the percent
   of idle CPU the monitor uses 4 processes:
   top -b -n2 | grep Cpu | tail -1 | awk -F",|\.[0-9]%id" '{ print $4 }'
   
   However awk can do the effect of grep and tail as well. My first attempt
   is this:
   top -b -n2 | awk -F",|\.[0-9]%id" '/^Cpu/{ print $4; exit }'
   
   My second attempt uses /proc/stat instead, avoiding the slow top process:
   awk '$1 == "cpu" { print $7; exit }' /proc/stat
   
   time (top -b -n2 | grep Cpu | tail -1 | awk -F",|\.[0-9]%id" '{ print $4 }')
   awk: warning: escape sequence `\.' treated as plain `.'
   99
   
   real    0m3.533s
   user    0m0.008s
   sys     0m0.008s
   
   time (top -b -n2 | awk -F",|\.[0-9]%id" '/^Cpu/{ print $4; exit }')
   awk: warning: escape sequence `\.' treated as plain `.'
   99
 
 Ouch. Big FAIL here already ;-)
 
   
   real    0m0.518s
   user    0m0.000s
   sys     0m0.008s
   
   time awk '$1 == "cpu" { print $7; exit }' /proc/stat
   98
 
 And you actually believe that this was the equivalent of the above
 top | etc pipe ?
 
 See below.
 
   real    0m0.004s
   user    0m0.000s
   sys     0m0.000s
   
   Regards,
   Ulrich
  
  Hi,
  
  good idea. The only problem is that the information in /proc/stat is in
  ticks and does not give you an absolute value. So you would have to
  calculate the difference yourself, which makes the task much more difficult.
 
 1) /proc/stat is Linux-specific.
 2) /proc/stat is what top samples on Linux ;-)
 3) It is in USER_HZ, so it's in centiseconds,
    which makes it easy enough to calculate a meaningful difference.
    Besides, as long as it is _any_ consistent unit,
    the unit does not matter, as it is in both numerator and denominator
    ;-)
 
 4) time top -b -n2 | pipe
    vs. time cat /proc/stat
    ...
    You do realize that to get any meaningful measure of current
    CPU usage, while the only measure readily available is CPU usage
    since system boot, you need to watch it for a while?
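
Point 3 can be written out as a formula. Here idle(t) and total(t) stand for the cumulative idle and overall tick counts read from /proc/stat at time t; the symbol names are chosen for illustration only.

```latex
\[
\text{idle}\,\% \;=\; 100 \cdot
\frac{\text{idle}(t_2) - \text{idle}(t_1)}{\text{total}(t_2) - \text{total}(t_1)}
\]
```

Both differences are in the same tick unit, so the unit cancels, and the result only describes the interval between t_1 and t_2, which is why the two samples need some delay between them.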
 
 $ strace -tt -T -e read,select top -b -n2 2>&1 1>/dev/null |
   grep -Ee ' read[(][0-9]*, "cpu | select[(]'
   12:27:55.985830 read(3, "cpu  2146371 14 1345585 66007476"..., 8192) = 1586 <0.000395>
   12:27:56.065141 select(0, NULL, NULL, NULL, {0, 50}) = 0 (Timeout) <0.500579>
   12:27:56.644309 read(5, "cpu  2146377 14 1345595 66007497"..., 1024) = 1024 <0.000164>
   12:27:56.645109 select(0, NULL, NULL, NULL, {3, 0}) = 0 (Timeout) <3.002688>
   12:27:59.730617 read(5, "cpu  2146379 14 1345604 66007624"..., 1024) = 1024 <0.000164>
 
 [lars@soda:~/DRBD/drbd-8.3]$ strace -tt -T -e read,select top -b -n2 -d 0.1 2>&1 1>/dev/null |
   grep -Ee ' read[(][0-9]*, "cpu | select[(]'
   12:28:14.834497 read(3, "cpu  2146386 14 1345657 66008224"..., 8192) = 1586 <0.000395>
   12:28:14.919021 select(0, NULL, NULL, NULL, {0, 50}) = 0 (Timeout) <0.500582>
   12:28:15.501349 read(5, "cpu  2146391 14 1345669 66008247"..., 1024) = 1024 <0.000165>
   12:28:15.502149 select(0, NULL, NULL, NULL, {0, 10}) = 0 (Timeout) <0.100165>
   12:28:15.681910 read(5, "cpu  2146394 14 1345674 66008252"..., 1024) = 1024 <0.000162>
 
 Notice something? (mind the timestamp, and the -d option...)
 
 Default delay of top on my setup seems to be 3 seconds.  If you want the
 same accuracy, then any replacement would need to sleep the same three
 seconds between two samplings of /proc/stat.
 
 
 So what you measure with your first variant (the print $4; exit)
 is the first estimate of top, after sampling /proc/stat twice with
 a hardcoded delay of 0.5 seconds, not waiting for the better, 3 seconds
 sampling period.  While this may be good enough, it is certainly not
 equivalent.
 
 You could drop the exit from awk, and do top -b -n1 to achieve the same.
 
 And exactly what do you think you measure with
   awk '$1 == "cpu" { print $7; exit }' /proc/stat ?
 column 7 is cumulative centiseconds spent in (hard)irq since system boot.
 What would the HealthCPU agent do with that?
 
 What you would need to do is sample /proc/stat twice, with a delay of,
 say, 1 to 3 seconds, calculate the relative cpu time spent in idle,
 and get 
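
The two-sample approach described above could be sketched roughly as follows. This is only an illustration: the function names are made up, the default delay of 3 seconds simply mirrors the top default observed above, and the script assumes the usual Linux layout of the "cpu" line (user, nice, system, idle, iowait, irq, softirq, steal in fields 2 to 9).

```shell
#!/bin/sh
# Hypothetical helpers, not taken from the HealthCPU agent.

# Print the aggregate "cpu" line of a stat file (default: /proc/stat).
read_cpu_line() {
    awk '$1 == "cpu" { print; exit }' "${1:-/proc/stat}"
}

# Given two "cpu" lines, print the idle share (in percent) of the time
# that passed between them. Only fields 2..9 are summed, because guest
# time (fields 10 and 11) is already included in user and nice.
idle_pct_between() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            n = (NF < 9) ? NF : 9
            total[NR] = 0
            for (i = 2; i <= n; i++) total[NR] += $i
            idle[NR] = $5
        }
        END {
            dt = total[2] - total[1]
            if (dt <= 0) exit 1   # no time passed, nothing to report
            printf "%.1f\n", 100 * (idle[2] - idle[1]) / dt
        }'
}

# Sample /proc/stat twice, DELAY seconds apart (assumed default: 3).
cpu_idle_pct() {
    delay=${1:-3}
    first=$(read_cpu_line)
    sleep "$delay"
    second=$(read_cpu_line)
    idle_pct_between "$first" "$second"
}
```

Calling cpu_idle_pct 1 would take about one second and print a single number; a shorter delay is faster but gives a noisier estimate.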

Re: [Linux-HA] Antw: Re: pacemaker/HealthCPU

2011-02-04 Thread Michael Schwartzkopff
On Friday 04 February 2011 14:35:35 Ulrich Windl wrote:
 Lars,
 
 you are right, and I saw that my guess to use /proc/stat was wrong. top is
 slow in getting the current CPU usage. So basically I wondered if you need
 the CPU usage at all. If you'd switch to load, you could get it a lot
 faster.
 
 To be honest: I wondered what HealthCPU would monitor about the CPU's
 health when initially looking into it. I was kind of disappointed to see
 that it simply inspects the CPU usage (A CPU that is 100% busy (0% idle)
 may be quite healthy) ;-)
 
 Regards,
 Ulrich

Please consider the CPUHealth RA as a first try. Useful patches are always 
welcome.

Greetings,

-- 
Dr. Michael Schwartzkopff
Guardinistr. 63
81375 München

Tel: (0163) 172 50 98


___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Re: [Linux-HA] Antw: Re: pacemaker/HealthCPU

2011-02-04 Thread Dimitri Maziuk
Lars Ellenberg wrote:
 On Fri, Feb 04, 2011 at 02:35:35PM +0100, Ulrich Windl wrote:
 Lars,

 you are right, and I saw that my guess to use /proc/stat was wrong. top is 
 slow in getting the current CPU usage. So basically I wondered if you need 
 the CPU usage at all. If you'd switch to load, you could get it a lot 
 faster.

 To be honest: I wondered what HealthCPU would monitor about the
 CPU's health when initially looking into it. I was kind of
 disappointed to see that it simply inspects the CPU usage (A CPU that
 is 100% busy (0% idle) may be quite healthy) ;-)
 
 "Health" was arguably a bad choice, "Utilization" may have been more
 appropriate.
 But try to define cpu health...

Of course, with 4+-core CPUs, you'd very rarely see all of them at 100% 
busy. Especially when it only takes one to saturate your i/o bus.

Dima
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu


[Linux-HA] Antw: Re: pacemaker/HealthCPU

2011-02-03 Thread Ulrich Windl
 Michael Schwartzkopff <mi...@clusterbau.com> wrote on 03.02.2011 at 13:09 in
message <201102031309.04931.mi...@clusterbau.com>:
 Hi,
 
 good idea. The only problem is that the information in /proc/stat is in
 ticks and does not give you an absolute value. So you would have to
 calculate the difference yourself, which makes the task much more difficult.

OK,

what about this:
time (procinfo | awk '$1 == "idle" && $2 == ":" {
    if (sub("%", "", $5)) { print $5 } else { sub("%", "", $4); print $4 }
}')
99.4

real    0m0.010s
user    0m0.000s
sys     0m0.000s
# Maybe use print int($x) to see an integer
(I didn't know the details of /proc/stat. Those who want to might read: 
/usr/src/linux/kernel/sched.c, /usr/src/linux/include/linux/kernel_stat.h, 
/usr/src/linux/include/asm-generic/cputime.h)

Regards,
Ulrich
P.S: The lines that are processed look like this:
system:   1:21:29.98   0.6%  page act:2141840
IOwait:   3:58:58.89   1.9%  page dea:1635314
hw irq:   0:02:17.38   0.0%  page flt:  766785934
sw irq:   0:13:44.90   0.1%  swap in :562
idle  :   7d 21:37:23.52  90.4%  swap out:899
uptime:   2d  4:25:34.97 context :  119011606
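
As a small sketch of how the awk filter treats lines like these, the function below wraps it so that it reads procinfo-style text on stdin. procinfo_idle_pct is a made-up name, and the sketch assumes the "idle" line has the shape shown in the sample, with or without a leading day count.

```shell
#!/bin/sh
# procinfo_idle_pct is a hypothetical wrapper around the awk filter.
# It reads procinfo-style text on stdin and prints the idle percentage
# without the trailing percent sign.
procinfo_idle_pct() {
    awk '$1 == "idle" && $2 == ":" {
        # With a day count ("7d 21:37:23.52") the percentage is field 5,
        # without a day count it is field 4.
        if (sub("%", "", $5)) { print $5 } else { sub("%", "", $4); print $4 }
    }'
}
```

Usage would be procinfo | procinfo_idle_pct; piping in the sample "idle" line above prints 90.4.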




Re: [Linux-HA] Antw: Re: pacemaker/HealthCPU

2011-02-03 Thread Michael Schwartzkopff
On Thursday 03 February 2011 16:00:54 Ulrich Windl wrote:
 OK,
 
 what about this:
 time (procinfo | awk '$1 == "idle" && $2 == ":" {
     if (sub("%", "", $5)) { print $5 } else { sub("%", "", $4); print $4 }
 }')
 99.4
 
 real    0m0.010s
 user    0m0.000s
 sys     0m0.000s
 # Maybe use print int($x) to see an integer
 (I didn't know the details of /proc/stat. Those who want to might read:
 /usr/src/linux/kernel/sched.c, /usr/src/linux/include/linux/kernel_stat.h,
 /usr/src/linux/include/asm-generic/cputime.h)
 
 Regards,
 Ulrich
 P.S: The lines that are processed look like this:
 system:   1:21:29.98   0.6%  page act:2141840
 IOwait:   3:58:58.89   1.9%  page dea:1635314
 hw irq:   0:02:17.38   0.0%  page flt:  766785934
 sw irq:   0:13:44.90   0.1%  swap in :562
 idle  :   7d 21:37:23.52  90.4%  swap out:899
 uptime:   2d  4:25:34.97 context :  119011606

OK. That looks better. The problem I see here is that the procinfo package
might not be installed on all cluster nodes, so the resource-agent package
would have to depend on it.


-- 
Dr. Michael Schwartzkopff
Guardinistr. 63
81375 München

Tel: (0163) 172 50 98



[Linux-HA] Antw: Re: pacemaker/HealthCPU

2011-02-03 Thread Ulrich Windl
 Soffen, Matthew <msof...@iso-ne.com> wrote on 03.02.2011 at 16:35 in
message <e847bfef193361409d48010ec8ace3bc01703...@exchangebe.iso-ne.com>:
 Morning All,
 
 Please also keep in mind that /proc/stat exists ONLY on Linux, and Linux-HA,
 despite the name, is also used on FreeBSD and Solaris.

Hi!

Good thought: I was wondering whether the output format on other systems 
matches that of Linux. Anyway, the other solutions are much faster than the 
original. Maybe a $(uname) will help.
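
A rough sketch of what such a $(uname) switch might look like. cpu_source_for is a hypothetical helper, and the data sources it names for the non-Linux systems are only placeholders; their output formats differ and would each need their own parsing.

```shell
#!/bin/sh
# cpu_source_for is a hypothetical helper: given the output of `uname -s`,
# it prints which data source a monitor could use for CPU figures.
# The non-Linux choices are illustrative assumptions, not tested recipes.
cpu_source_for() {
    case "$1" in
        Linux)          echo "/proc/stat" ;;
        FreeBSD|SunOS)  echo "vmstat" ;;     # Solaris reports itself as SunOS
        *)              echo "top" ;;        # fall back to the original method
    esac
}
```

It would be called as cpu_source_for "$(uname -s)", so the OS check costs one extra process per monitor run.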

For example a Perl script that is supposed to get the list of running processes 
on HP-UX and Linux (SLES10 SP3) uses a hash table like this:

use constant PS_CONF => {
    # PID  TTY   TIME COMMAND
    # 6714 ?   223:15 dw.sapC11_DVEBMGS00 pf=/usr/sap/C11/... bla bla
    'hpux' => ['ps -ex', qr/^\s*(\d+)\s+\S+\s+\d+:\d+\s+(\S+)(.*)$/,
               qr/^init$/],
    #   PID TTY  STAT   TIME COMMAND
    # 15046 ?    S    110:29 dw.sapC11_D09 pf=/usr/sap/C11/... bla bla
    'linux' => ['ps ax', qr/^\s*(\d+)\s+\S+\s+\S+\s+\d+:\d+\s+(\S+)(.*)$/,
                qr/^init$/],
};  # OS($^O)-dependent ps configuration
my $ps_conf = PS_CONF->{$^O};

Regards,
Ulrich

