On Thu, 2006-06-22 at 09:35 +0700, Alain Fauconnet wrote:
> One point I wish to make in this discussion is that load average
> includes more than pure CPU load (on most systems I've worked on
> anyway). It also takes other factors into account e.g. disk I/O rate.
> Using 'nice' is certainly the best way to make a process run only when
> CPU load is low, but if that process is a disk I/O hog, it will still
> impact the overall load average and compete with other processes
> making disk I/Os. Same goes for memory hogs and swapping induced disk
> I/O.

The script in question seemed to be looking only at CPU load, not I/O
or swap load.  "Load", in general, could be measured against any
resource there is contention for, but usually it is measured against
processor resources.

Load is merely a count of how many processes are in the "ready to
run/waiting for CPU" state when the measurement is taken.  (On Linux,
processes in uninterruptible sleep are also counted toward the load
average, which is part of why heavy disk I/O can raise it.)  Processes
that are waiting on disk I/O are not ready to run (they will have a D in
the state field in top).  A process doing a lot of I/O to a fast or
heavily cached medium will be "ready to run" more often than one doing
I/O to a slow medium, but even it will contribute less to the load than
a process that never makes a blocking system call and is still ready to
run at the end of its timeslice.
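
As a rough illustration (assuming a Linux system with a procps-style
ps and /proc mounted), you can take your own point-in-time sample of
the "ready to run" count:

```shell
# Count processes whose state starts with R (runnable) -- a one-shot
# approximation of the instantaneous run queue length.
ps -eo stat= | grep -c '^R'

# On Linux, the fourth field of /proc/loadavg is runnable/total tasks,
# the kernel's own snapshot of the same idea.
cat /proc/loadavg
```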

You can observe the length of the run queue at a much finer granularity
than the one-, five-, and fifteen-minute "load average" numbers by
running vmstat with a small delay (1 or 2 seconds).  As an experiment,
observe 'vmstat 2' on

      * an idle system
      * a system that is running "perl -e 'while(1){}'"
      * a system that is running "while [ 0 ]; do find / -type f -exec
        cat {} \; > /dev/null 2>&1 ; done"

Note the "r" and "io" columns in vmstat's output.  When the tight perl
loop is running, the CPU is 100% busy and the load is 1: there is always
one process wanting more CPU time (run the perl one-liner multiple times
in different terminals and note how r changes).  When the continuous
find is running, there is a lot of disk I/O (bi = blocks in, at least
until the entire filesystem is cached, if you have a lot of RAM), but
few processes are waiting for CPU time (the r column).  The run queue
stays short with the continuous find (which is usually waiting on disk),
but long with the tight loop.
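
For reference, here is one way to pull just the run-queue numbers out
of the experiment above (this assumes the procps vmstat column layout,
where r and b are the first two columns and the first two output lines
are headers):

```shell
# Print the runnable (r) and blocked (b) counts once per second,
# five samples.
vmstat 1 5 | awk 'NR > 2 { print "runnable:", $1, "blocked:", $2 }'
```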

Swapping processes fall under I/O-related waiting in general, but
because heavy swap traffic (the si and so columns in vmstat) can keep
the CPU from running programs, the CPU can end up spending most of its
time doing memory and swap management, which makes the run queue grow
as more processes become ready to run but have not yet been run.
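
You can watch the swap traffic the same way (again assuming the default
procps vmstat layout, where si and so are columns 7 and 8):

```shell
# Sustained nonzero si (swap-in) and so (swap-out) rates indicate
# active swapping, not just swap space in use.
vmstat 1 5 | awk 'NR > 2 { print "si:", $7, "so:", $8 }'
```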

High r values (or a sustained r value greater than the number of
processors in the system) mean your processes are CPU bound -- it might
be time to upgrade the CPU (a faster CPU, so more work can be done in
less time, or more cores, so more work can be done in parallel).
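
A quick sketch of that rule of thumb (assuming Linux's /proc/loadavg
and the coreutils nproc command are available):

```shell
# Compare the 1-minute load average to the number of processors;
# sustained load above the CPU count suggests processes are queueing
# for CPU time.
load=$(cut -d ' ' -f 1 /proc/loadavg)
cpus=$(nproc)
awk -v l="$load" -v c="$cpus" \
    'BEGIN { print ((l > c) ? "sustained CPU contention" : "CPU headroom") }'
```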

High bi and bo values mean your processes are disk bound.  If the r
value is low, the b column (blocked processes -- uninterruptible sleep)
is non-zero, and there is a lot of swap activity, faster disks may help.
It may also help to put more RAM in the system, to lessen swap usage
and/or to provide more disk/filesystem cache (thereby going to the
physical disks less often).

High si and so values most likely mean the machine needs more RAM.  It's
not bad to have things swapped out unless they are continuously being
used (which is why you may see gettys completely swapped out if you have
not logged into the console in a long while -- no big deal, since they
are not being used).  What's important is the swap activity (si and so),
not necessarily the amount of swap in use (the swpd column).  In some
cases, the system might swap out processes that are not running in order
to make room for filesystem cache; this is often good.  Some of the
settings that influence swap usage and cache sizes can be tweaked
via /proc, if you want to play with that.  I've found the defaults
reasonably acceptable across a wide range of systems and services
(databases, webservers, samba, postfix/imap) -- not to say that there
isn't room for customized improvement, though!  Most of my systems are
relatively small, so YMMV.
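
One concrete example of those /proc knobs (assuming a 2.6 or later
kernel that exposes it) is vm.swappiness:

```shell
# vm.swappiness (0-100, default 60) biases the kernel between swapping
# out process memory and shrinking the filesystem cache.
cat /proc/sys/vm/swappiness

# To experiment (as root); the change is immediate and lost on reboot:
# echo 10 > /proc/sys/vm/swappiness
```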

The ideal situation, with several processes executing, is that when
process A is on the CPU, processes B, C, and D are waiting on disk
activity, and when B's disk activity completes, A is off the CPU
(waiting on I/O) and the CPU is ready to continue with process B.  That
gives a sustained load of 1 with the CPU busy all the time, even though
multiple processes are "executing".

vmstat is installed with the procps package.

-- 
Andy Bakun <[EMAIL PROTECTED]>

_______________________________________________
tsl-discuss mailing list
[email protected]
http://lists.trustix.org/mailman/listinfo/tsl-discuss
