One of the more difficult performance monitoring problems that I have
come across is determining the impact of multiple workloads running on
a server.  Consider a server that has about 1000 database processes
that are long running - many minutes to many months - mixed with batch
jobs written in Bourne shell.  Largely due to the batch jobs, it is
not uncommon for sar to report hundreds of forks and execs per second.

The knee-jerk reaction is to move the batch jobs off of the database
server.  However, quantifying how much of an impact this would have is
hard to do.  Neither "prstat -a" nor "prstat -J" seems to give a very
accurate picture.  My guess is that prstat tends to miss the processes
that were very short lived.
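
To illustrate that guess, here is a toy Python model of a sampling
monitor.  It is only a sketch under my own assumptions, not a
description of how prstat is implemented: the sampler credits a
process's CPU time only if the process is alive at a sampling instant,
and the workload numbers (one long-running process plus 600 short
children) are invented for the example.  All times are in
milliseconds.

```python
def sampled_cpu(processes, interval, duration):
    """CPU time a periodic sampler would attribute to the workload.

    processes: list of (start, end, cpu) tuples, hypothetical data.
    A process is counted only if it is alive at one of the sampling
    instants interval, 2*interval, ..., up to duration.
    """
    seen = set()
    t = interval
    while t <= duration:
        for i, (start, end, _cpu) in enumerate(processes):
            if start <= t < end:
                seen.add(i)
        t += interval
    return sum(processes[i][2] for i in seen)


def true_cpu(processes):
    """CPU time actually consumed by every process, seen or not."""
    return sum(cpu for _start, _end, cpu in processes)


# One long-running database process: alive for 60 s, 6 s of CPU.
db = [(0, 60000, 6000)]
# 600 short-lived batch children: each lives 50 ms and uses 40 ms of CPU.
batch = [(100 * k + 10, 100 * k + 60, 40) for k in range(600)]
procs = db + batch

print("sampled:", sampled_cpu(procs, interval=5000, duration=60000))
print("actual: ", true_cpu(procs))
```

In this made-up workload the short children account for most of the
CPU time, yet none of them happens to be alive at a 5 second sampling
instant, so the sampler only ever sees the database process.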

The best solution that I have come up with is to write extended
accounting (task) records every few minutes, then to process the
exacct file afterwards.  Writing the code to write exacct records
periodically and make sense of them later is far from trivial.  It is
also impractical for multiple users (monitoring frameworks,
administrators, etc.) to use this approach on the same machine at the
same time, because the exacct records need to be written and writing
them is presumably too expensive to do very often.
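
The post-processing step can be sketched roughly as follows.  This
does not parse a real exacct file.  It assumes the records have
already been decoded into (timestamp, project, task id, cumulative CPU
seconds) tuples, which is a hypothetical layout I made up for the
example, and that each record carries the task's usage since the task
started.  The function name per_project_deltas is likewise invented.

```python
from collections import defaultdict


def per_project_deltas(records):
    """Turn cumulative per-task records into per-interval project usage.

    records: iterable of (timestamp, project, task_id, cumulative_cpu)
    tuples, sorted by timestamp (hypothetical, already-decoded data).
    Returns {timestamp: {project: cpu used since the task's previous
    record}}.  A task seen for the first time is charged in full.
    """
    last_seen = {}  # task_id -> cumulative CPU at its previous record
    usage = defaultdict(lambda: defaultdict(int))
    for ts, project, task_id, cpu in records:
        usage[ts][project] += cpu - last_seen.get(task_id, 0)
        last_seen[task_id] = cpu
    return {ts: dict(projects) for ts, projects in usage.items()}


# Two snapshots taken five minutes apart (invented numbers).
records = [
    (300, "db", 1, 120),
    (300, "batch", 2, 30),
    (600, "db", 1, 150),
    (600, "batch", 2, 90),
    (600, "batch", 3, 45),  # task that started after the first snapshot
]
for ts, projects in sorted(per_project_deltas(records).items()):
    print(ts, projects)
```

Even this simplified version has to remember the previous value for
every task, which hints at why the real thing is not trivial.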

It seems as though it should be possible for the kernel to maintain
per-user, per-project, and per-zone statistics.  Perhaps collecting
them all the time is not desirable, but it seems as though updating
the three sets of statistics for each context switch would be lighter
weight than writing accounting records then post processing them.  The
side effect of having this data available would be that tools like
prstat could report accurate data.  Other tools could likely get this
data through kstat or a similar interface.
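
As a user-level model of the bookkeeping I have in mind (not kernel
code, and not an existing Solaris interface), the sketch below charges
the time a thread just spent on CPU to its user, project, and zone
each time it is switched out.  The class name, method name, and the
sample events are all hypothetical.

```python
from collections import defaultdict


class CpuAccounting:
    """Running CPU totals per user, per project, and per zone."""

    def __init__(self):
        self.by_user = defaultdict(int)
        self.by_project = defaultdict(int)
        self.by_zone = defaultdict(int)

    def switch_out(self, user, project, zone, on_cpu_ns):
        """Charge one on-CPU stint to the three owners of the thread."""
        self.by_user[user] += on_cpu_ns
        self.by_project[project] += on_cpu_ns
        self.by_zone[zone] += on_cpu_ns


acct = CpuAccounting()
# Invented context-switch events: (user, project, zone, nanoseconds).
events = [
    ("oracle", "db", "global", 4_000_000),
    ("batch", "nightly", "global", 250_000),  # short-lived shell child
    ("batch", "nightly", "global", 300_000),  # another one, already gone
    ("oracle", "db", "global", 6_000_000),
]
for user, project, zone, ns in events:
    acct.switch_out(user, project, zone, ns)

print(dict(acct.by_project))
print(dict(acct.by_zone))
```

Because the totals are updated as the time is consumed, the work done
by processes that have already exited stays in the counters, and any
number of readers could look at the totals without forcing records to
be written.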

Maybe this already exists but is not exposed.  If it looks like a
reasonable thing to do, is there a good place to start looking to add
such functionality?  My guess is that it would be hidden somewhere in
/usr/src/uts/common/disp, but an admittedly short perusal of that
directory hasn't turned up anything obvious.

Mike
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
