[gentoo-user] Re: Advice on system monitoring

2011-12-05 Thread James
Michael Mol mikemol at gmail.com writes:


 Let's start with that dual-xeon box I was using to benchmark emerge
 -e @world, figure I'm looking for how better to tune my MAKEOPTS and
 EMERGE_DEFAULT_OPTS variables, and assume I'd like to get more
 information about the following factors:

Complex and never finished, imho.


 * What were the 1m, 5m and 15m load averages?
 * What were the similar averages for CPU spent in user time, system
 time and I/O wait?

sys-process/iotop
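
For the load averages and the user/system/iowait split, the kernel already exposes the raw counters in /proc/loadavg and /proc/stat, so a small sampler may be enough to feed a graph. A minimal Python sketch, assuming a Linux /proc; the function names are made up for illustration and are not part of any existing tool:

```python
import time

# Leading counters on the aggregate "cpu" line of /proc/stat, in order.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def parse_cpu_line(line):
    """Map the aggregate 'cpu' line of /proc/stat to named tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return dict(zip(FIELDS, (int(p) for p in parts[1:1 + len(FIELDS)])))


def cpu_shares(before, after):
    """Percentage of elapsed ticks spent in each state between two samples."""
    delta = {k: after.get(k, 0) - before.get(k, 0) for k in FIELDS}
    total = sum(delta.values())
    if total == 0:
        return {k: 0.0 for k in FIELDS}
    return {k: 100.0 * v / total for k, v in delta.items()}


def sample(interval=1.0):
    """Take two /proc/stat readings `interval` seconds apart (Linux only)."""
    def first_line(path):
        with open(path) as handle:
            return handle.readline()

    before = parse_cpu_line(first_line("/proc/stat"))
    time.sleep(interval)
    after = parse_cpu_line(first_line("/proc/stat"))
    return parse_loadavg(first_line("/proc/loadavg")), cpu_shares(before, after)
```

Calling sample() in a loop and appending the result to a log file gives a time series to plot afterwards. The counters in /proc/stat are cumulative since boot, which is why the shares are computed from the difference of two readings.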

 * What was network usage like? (I have a caching proxy server on the
 network

Lots of different tools to look at network performance:

wireshark, for one (look around /usr/portage/net-analyzer for others)


 so even if distfiles are lost on-system, well, a cache hit
 transfers at up to around 50MB/s. It'd be better, except for read
 performance limitations on the router box, and write performance
 limitations on the local machine)


bonnie++ (or bonnie)


 * What was the temperature of each CPU core, RAM module and hard
 drive? (Not so relevant for improving system performance, but still of
 interest.)

app-admin/hddtemp (for drives)

dunno on individual cpu cores...
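
On the per-core question: where the kernel has a suitable sensor driver loaded, temperatures show up under /sys/class/hwmon as tempN_input files in millidegrees Celsius, with an optional tempN_label next to each. Which chips and labels appear depends entirely on the hardware and drivers, so treat this Python sketch as an assumption-laden starting point (read_hwmon_temps is a made-up name):

```python
from pathlib import Path


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {"chip/label": degrees_celsius} for every readable hwmon sensor."""
    readings = {}
    for chip in sorted(Path(root).glob("hwmon*")):
        name_file = chip / "name"
        chip_name = name_file.read_text().strip() if name_file.exists() else chip.name
        for temp_input in sorted(chip.glob("temp*_input")):
            # tempN_label is optional; fall back to "tempN" when it is missing.
            label_file = temp_input.with_name(
                temp_input.name.replace("_input", "_label"))
            if label_file.exists():
                label = label_file.read_text().strip()
            else:
                label = temp_input.name[:-len("_input")]
            try:
                millidegrees = int(temp_input.read_text().strip())
            except (OSError, ValueError):
                continue  # some sensors refuse reads or report garbage; skip them
            readings[f"{chip_name}/{label}"] = millidegrees / 1000.0
    return readings
```

The root directory is a parameter only so the function can be pointed at a test tree; on a live box the default should do. RAM module temperatures may or may not be exposed this way, depending on the board.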

 I'd like to have a web interface I could navigate to which would show
 graphs of these counters.


Now all of that in one gui tool?  Do post back when you get it working,
as I'd like to use it too!

hth,
James






Re: [gentoo-user] Re: Advice on system monitoring

2011-12-05 Thread Michael Mol
On Mon, Dec 5, 2011 at 12:01 PM, James wirel...@tampabay.rr.com wrote:
 Michael Mol mikemol at gmail.com writes:
 Let's start with that dual-xeon box I was using to benchmark emerge
 -e @world, figure I'm looking for how better to tune my MAKEOPTS and
 EMERGE_DEFAULT_OPTS variables, and assume I'd like to get more
 information about the following factors:

 Complex and never finished, imho.


 * What were the 1m, 5m and 15m load averages?
 * What were the similar averages for CPU spent in user time, system
 time and I/O wait?

 sys-process/iotop

 * What was network usage like? (I have a caching proxy server on the
 network

 Lots of different tools to look at network performance:

 wireshark, for one (look around /usr/portage/net-analyzer for others)


 so even if distfiles are lost on-system, well, a cache hit
 transfers at up to around 50MB/s. It'd be better, except for read
 performance limitations on the router box, and write performance
 limitations on the local machine)


 bonnie++ (or bonnie)


 * What was the temperature of each CPU core, RAM module and hard
 drive? (Not so relevant for improving system performance, but still of
 interest.)

 app-admin/hddtemp (for drives)

 dunno on individual cpu cores...

 I'd like to have a web interface I could navigate to which would show
 graphs of these counters.


 Now all of that in one gui tool?  Do post back when you get it working,
 as I'd like to use it too!

The approach I'd like to take is to have all the monitoring set up,
launch emerge -e @world, and see what's going on around (and just
prior to) stalls and CPU waste. I'm defining a stall as where my
operating load falls below my number of CPU cores, and I'm defining
CPU waste as CPU time spent anywhere but 'user'. I'd like to look at
graphs of the metrics from over the course of the emerge.
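
To make those two definitions concrete, here is a rough Python sketch of how logged samples could be classified. The Sample layout (timestamp, 1-minute load, percent of CPU time in 'user') is an assumption about what the monitoring would record, not the output of any particular tool:

```python
from dataclasses import dataclass


@dataclass
class Sample:
    timestamp: float   # seconds since the emerge started
    load1: float       # 1-minute load average at that moment
    user_pct: float    # percent of CPU time spent in 'user' since the last sample


def is_stall(sample, cores):
    """A stall: operating load has fallen below the number of CPU cores."""
    return sample.load1 < cores


def waste_pct(sample):
    """CPU waste: everything that was not 'user' time (idle, system, iowait...)."""
    return 100.0 - sample.user_pct


def stall_windows(samples, cores):
    """Group consecutive stalled samples into (start, end) timestamp pairs."""
    windows = []
    start = None
    last = None
    for s in samples:
        if is_stall(s, cores):
            if start is None:
                start = s.timestamp
            last = s.timestamp
        elif start is not None:
            windows.append((start, last))
            start = None
    if start is not None:
        windows.append((start, last))
    return windows
```

With the windows in hand, the interesting part is lining them up against the emerge log to see which package or build phase was running just before each one.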

My chief thought is this: I have both 'make' and 'emerge' trying to
reach a specific load average, which means that this particular
dynamic system is going to have feedback as they go back and forth. I
expect that I'll want to duck one of them under the other, but I don't
know which one yet, and I don't know how far.
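
As a way of generating candidate settings to benchmark, something like the following Python sketch could produce the two variables with one load target ducked under the other. The flags themselves are real (make's -j and -l, emerge's --jobs and --load-average); the job counts and the margin of 1.0 are pure guesses to be tuned, and load_limited_opts is a hypothetical helper:

```python
def load_limited_opts(cores, duck="emerge", margin=1.0):
    """Build MAKEOPTS / EMERGE_DEFAULT_OPTS with one load target set lower.

    `duck` names the tool whose load-average target is lowered by `margin`;
    the other tool keeps a target equal to the core count.
    """
    if duck not in ("emerge", "make"):
        raise ValueError("duck must be 'emerge' or 'make'")
    ceiling = float(cores)
    lower = max(ceiling - margin, 1.0)
    make_load = lower if duck == "make" else ceiling
    emerge_load = lower if duck == "emerge" else ceiling
    return {
        # -jN caps make's parallel jobs, -lL stops new jobs above load L.
        "MAKEOPTS": f"-j{cores + 1} -l{make_load:.1f}",
        # --jobs caps parallel package builds, --load-average gates new ones.
        "EMERGE_DEFAULT_OPTS": f"--jobs={cores} --load-average={emerge_load:.1f}",
    }
```

Running the same emerge once with duck="emerge" and once with duck="make", at a couple of margins, would show which direction reduces the oscillation.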

I should also look to see if pbzip2 supports load awareness. Having
eight cores suddenly start churning through BWT blocks is great if
your load average is something like 0.24, but not so great if it
launches your load average up to around 12.
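
Whatever pbzip2 turns out to support, the behaviour I have in mind looks roughly like this Python sketch: take the core count, subtract what the current load already accounts for, and never go below one thread. This is only an illustration of the idea, not a description of how pbzip2 works; threads_for_load is a made-up name:

```python
import math
import os


def threads_for_load(cores, load1):
    """Threads to start so the new work roughly fills the idle cores."""
    busy = math.floor(load1)      # cores' worth of work already runnable
    return max(1, cores - busy)   # always leave at least one worker


def current_thread_budget():
    """Apply the same rule to this machine's live numbers (Unix only)."""
    cores = os.cpu_count() or 1
    load1, _, _ = os.getloadavg()
    return threads_for_load(cores, load1)
```

On the eight-core box this gives all eight threads at a load of 0.24 and a single thread at a load of 12. The result could be handed to pbzip2 through its -p option from a wrapper script, if nothing built in does the job.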

-- 
:wq