Hi, Jean
Thank you for your explanation and your helpful advice :)
It seems you're right. Your guess reminded me of something I read in a
book, Solaris Performance and Tools, where the author discusses this
decaying-average problem.
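To make the decay effect concrete, here is a minimal sketch in Python. It is a toy model, not the actual Solaris pcpu computation: the 60-second time constant, the one-second update step, and the function name decayed_cpu are assumptions chosen only for illustration.

```python
import math

def decayed_cpu(samples, tau=60.0, dt=1.0):
    """Exponentially weighted average of per-interval CPU usage (0.0 to 1.0).

    tau (time constant) and dt (update step) are assumed values,
    not taken from the Solaris source.
    """
    alpha = 1.0 - math.exp(-dt / tau)  # weight given to the newest interval
    avg = 0.0
    history = []
    for usage in samples:
        avg += alpha * (usage - avg)
        history.append(avg)
    return history

short = decayed_cpu([1.0] * 1)     # fully busy for one second, then exits
long_ = decayed_cpu([1.0] * 120)   # fully busy for two minutes

print(f"reported after 1 s busy:   {short[-1]:.1%}")
print(f"reported after 120 s busy: {long_[-1]:.1%}")
```

In this model a process that burns a full CPU for one second and then exits peaks at under 2%, while only a process that stays busy for minutes approaches its real usage.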
2007/6/18, Jean-Francois Richard <[EMAIL PROTECTED]>:
Hi
I don't know enough about the specifics of OpenSolaris, so the
following is based on the guess that it behaves the same as regular Solaris (the
fact that your vmstat r queue is at values like 7 and idle CPU is at 0%
while prstat shows only about 4% CPU makes this feel like a good guess).
Historically, Solaris used the "pcpu" parameter to display CPU against
individual processes in prstat (or as part of ps -ef with the -o option). As
pcpu is based on a slowly decaying average (equivalent to exponential
weighting over the last minute) it deals poorly with short-lived
processes such as the ones you have (the snapshot in your message below shows
that the CPU times are all very low, suggesting processes which don't stay
up for long). Typically, changing the prstat refresh time doesn't change
anything because the %cpu it is displaying remains the decaying average over
the last minute. Alternatives could be prstat with the -m option if it is
supported. The TOP tool
Yes, the -m option works partially. It shows the whole system's user and system
CPU percentages correctly, but the individual processes all still look idle,
and the short-lived processes are not displayed either.
changed how it calculates per process CPU% for Solaris (it stopped
displaying the pcpu parameter and started to calculate %cpu based on the
increases in cpu time) as of version 3.6, so you might want to give that a
try. Because TOP calculates CPU% of the processes based on CPU time, going
to a shorter refresh rate actually does help to improve accuracy for short
processes... but it can't go below 1 second. Many short lived processes
live less than one second. I don't know
TOP doesn't work correctly either: like prstat, it misses all the short-lived
processes. But the system-wide user and system CPU percentages are correct in
TOP.
DTrace that well but look forward to having a good tool to deal with this
type of situation.
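The sampling limit mentioned above (a refresh interval no shorter than one second, with many processes living less than that) can be sketched as follows. This is a simplified model of a snapshot-based monitor, not top's actual code: the function name attributed_ms, the workload of 100 back-to-back 100 ms processes, and the assumption that each process is on CPU for its whole life are all hypothetical.

```python
def attributed_ms(procs, interval_ms, duration_ms):
    """CPU milliseconds a snapshot-based monitor can attribute to processes.

    procs is a list of (start_ms, end_ms). Each process is assumed to be
    on CPU for its whole life, so its cumulative CPU time at time t is
    t - start_ms while it is alive.
    """
    seen = 0
    prev = {}  # pid -> cumulative CPU time at the previous snapshot
    for t in range(0, duration_ms + 1, interval_ms):
        # Only processes alive at the snapshot instant are visible.
        cur = {pid: t - start
               for pid, (start, end) in enumerate(procs)
               if start <= t < end}
        for pid, cpu in cur.items():
            seen += cpu - prev.get(pid, 0)
        prev = cur
    return seen

# Hypothetical workload: 100 back-to-back processes of 100 ms each.
procs = [(50 + 100 * i, 150 + 100 * i) for i in range(100)]
actual = sum(end - start for start, end in procs)
seen = attributed_ms(procs, interval_ms=1000, duration_ms=10000)

print(f"actual CPU: {actual} ms, attributed by sampling: {seen} ms")
```

Processes that start and exit between two snapshots never appear at all, and those that happen to straddle a snapshot are credited only with the CPU time they had used up to that instant.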
Yes, DTrace works! I used the DTrace script shortlived.d, which Matty
introduced in his email, and the result is:
[EMAIL PROTECTED]:~/performance/DTraceToolkit-0.96/Bin# ./shortlived.d
Tracing... Hit Ctrl-C to stop.
^C
short lived processes: 15.941 secs
total sample duration: 20.880 secs
Total time by process name,
mkdir 26 ms
mv 36 ms
lint2 37 ms
date 40 ms
lint 642 ms
sh 678 ms
dmake 1422 ms
lint1 12690 ms
Total time by PPID,
26628 5 ms
26637 5 ms
26643 5 ms
26659 5 ms
26665 5 ms
26682 5 ms
26691 5 ms
26697 5 ms
26715 5 ms
26721 5 ms
26739 5 ms
26748 5 ms
26754 6 ms
26653 7 ms
26678 7 ms
26699 7 ms
26709 7 ms
26723 7 ms
26727 7 ms
26733 7 ms
24185 8 ms
26621 8 ms
26624 8 ms
26675 8 ms
26703 8 ms
26768 8 ms
26780 8 ms
26784 8 ms
26788 8 ms
26792 8 ms
26796 8 ms
26800 8 ms
26804 8 ms
26808 8 ms
26812 8 ms
26820 8 ms
26828 8 ms
26840 8 ms
26844 8 ms
26860 8 ms
26872 8 ms
26756 9 ms
26757 9 ms
26764 9 ms
26776 9 ms
26816 9 ms
26824 9 ms
26832 9 ms
26836 9 ms
26848 9 ms
26852 9 ms
26856 9 ms
26864 9 ms
26873 9 ms
26772 10 ms
26845 10 ms
26861 10 ms
26667 11 ms
26765 11 ms
26837 11 ms
26865 11 ms
26613 12 ms
26645 12 ms
26773 12 ms
26777 12 ms
26797 12 ms
26801 12 ms
26825 12 ms
26833 12 ms
26841 12 ms
26607 13 ms
26648 13 ms
26670 13 ms
26758 13 ms
26759 13 ms
26769 13 ms
26785 13 ms
26789 13 ms
26793 13 ms
26805 13 ms
26809 13 ms
26813 13 ms
26817 13 ms
26821 13 ms
26829 13 ms
26853 13 ms
26616 14 ms
26700 14 ms
26704 14 ms
26728 14 ms
26781 14 ms
26849 14 ms
26857 14 ms
26610 15 ms
26674 15 ms
26724 15 ms
26652 16 ms
26708 23 ms
26732 23 ms
26737 24 ms
26874 33 ms
26626 39 ms
26680 39 ms
26862 46 ms
26620 48 ms
26590 51 ms
26713 60 ms
26689 61 ms
26635 67 ms
26657 67 ms
26766 144 ms
26866 147 ms
26838 152 ms
26846 155 ms
26625 163 ms
26679 163 ms
26746 177 ms
26842 180 ms
26668 205 ms
26646 206 ms
26656 211 ms
26634 214 ms
26688 250 ms
26712 252 ms
26761 255 ms
26790 263 ms
26798 263 ms
26806 264 ms
26814 265 ms
26818 265 ms
26826 266 ms
26810 267 ms
26822 267 ms
26778 270 ms
26834 270 ms
26770 271 ms
26774 271 ms
26671 273 ms
26649 275 ms
26802 275 ms
26854 292 ms
26608 351 ms
26830 360 ms
26614 366 ms
26794 374 ms
26850 391 ms
26705 410 ms
26729 412 ms
26760 430 ms
26786 442 ms
26858 459 ms
26617 510 ms
26782 546 ms
26725 549 ms
26701 556 ms
26611 694 ms
[EMAIL PROTECTED]:~/performance/DTraceToolkit-0.96/Bin#
YES! DTRACE CAUGHT THESE SHORT-LIVED PROCESSES!
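For a sense of scale, a quick calculation on the figures printed by shortlived.d above. The numbers are copied from that output; the variable names are mine, and the script's internals are not assumed here.

```python
short_lived_secs = 15.941   # "short lived processes" line
sample_secs = 20.880        # "total sample duration" line

# "Total time by process name" section, in milliseconds.
by_name_ms = {
    "mkdir": 26, "mv": 36, "lint2": 37, "date": 40,
    "lint": 642, "sh": 678, "dmake": 1422, "lint1": 12690,
}

share = short_lived_secs / sample_secs
top_name, top_ms = max(by_name_ms.items(), key=lambda kv: kv[1])
per_name_total = sum(by_name_ms.values())

print(f"short-lived share of the sample: {share:.1%}")
print(f"largest consumer: {top_name}, {top_ms / per_name_total:.1%} of the per-name total")
```

So roughly three quarters of the sample duration is reported against short-lived processes, most of it against lint1, which is the work that prstat and top were not showing.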
Hope it helps and others, please correct me if I am saying anything wrong
or that doesn't apply to OpenSolaris,
JF.
------------------------------
*From:* [EMAIL PROTECTED] [mailto:
[EMAIL PROTECTED] *On Behalf Of [EMAIL PROTECTED]
*Sent:* Monday, June 18, 2007 8:28 AM
*To:* [email protected]; [EMAIL PROTECTED];
[EMAIL PROTECTED]
*Subject:* [perf-discuss] cpu performance counters obtained by vmstat and
prstat look conflict
Dear all:
I'm compiling the ON build 65 source code now, using the "nightly
opensolaris.sh" command.
prstat reports that the system is very idle, but the load average tells me
that the system is very busy. -,-
I then checked the vmstat report, and it shows that the system is busy, too.
The reports follow. What's the problem?
P.S. My system is a Dell workstation with a 1.7 GHz P4 CPU and 512 MB of memory.
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
6673 root 13M 12M sleep 35 0 0:00:02 2.9% dmake/1
9086 root 5672K 3396K run 35 0 0:00:00 0.5% acomp/1
6634 root 2080K 1344K sleep 59 0 0:00:00 0.2% vmstat/1
9080 root 9640K 6576K run 15 0 0:00:00 0.2% ube/1
8383 root 4320K 2764K cpu0 59 0 0:00:00 0.1% prstat/1
7720 root 8112K 3928K sleep 59 0 0:00:00 0.0% sshd/1
9083 root 1140K 876K sleep 35 0 0:00:00 0.0% sh/1
9071 root 1200K 920K run 15 0 0:00:00 0.0% cc/1
9069 root 1140K 876K sleep 45 0 0:00:00 0.0% sh/1
9085 root 1192K 916K sleep 35 0 0:00:00 0.0% cc/1
9070 root 996K 688K sleep 35 0 0:00:00 0.0% cw/1
9084 root 996K 668K run 25 0 0:00:00 0.0% cw/1
9082 root 13M 1320K sleep 35 0 0:00:00 0.0% dmake/1
9068 root 13M 1316K sleep 45 0 0:00:00 0.0% dmake/1
7979 root 7836K 2040K sleep 59 0 0:00:00 0.0% sshd/1
7984 root 2588K 1820K sleep 59 0 0:00:00 0.0% bash/1
7918 root 7836K 2044K sleep 59 0 0:00:01 0.0% sshd/1
117 daemon 4008K 1972K sleep 59 0 0:00:01 0.0% kcfd/3
9742 root 4012K 2552K sleep 59 0 0:00:11 0.0% nscd/25
5309 root 12M 6428K sleep 59 0 0:10:35 0.0% smbd/1
7598 root 4600K 3860K sleep 59 0 0:00:02 0.0% dmake/1
7597 root 972K 688K sleep 59 0 0:00:00 0.0% time/1
27340 root 3152K 2396K sleep 59 0 0:00:00 0.0% dmake/1
26430 root 3176K 2436K sleep 59 0 0:00:00 0.0% dmake/1
NPROC USERNAME SIZE RSS MEMORY TIME CPU
63 root 289M 138M 27% 0:21:04 4.3%
2 daemon 6332K 2960K 0.6% 0:00:01 0.0%
1 smmsp 6996K 1452K 0.3% 0:00:05 0.0%
Total: 66 processes, 176 lwps, load averages: 2.97, 3.04, 3.03
[EMAIL PROTECTED]:~#vmstat 1
kthr      memory                page                disk        faults       cpu
r b w    swap   free   re    mf pi  po  fr de sr cd f0 s0 --  in    sy  cs us sy id
0 0 0 1238652 245720   15    22 40   0   0  0  0  2  0  0  0 419   153 155  1  1 97
1 0 0 1201976 225460 1445  6769  0  67  67  0  0  7  0  0  0 317  5544 115 73 27  0
3 0 0 1190896 224480 6842 18811  0  72  72  0  0 10  0  0  0 332 11701 223 35 65  0
7 0 0 1196188 231540 4681 13909  0  99  99  0  0 10  0  0  0 340 10901 292 47 53  0
4 0 0 1196072 230308 4168 11179  0  67  67  0  0  8  0  0  0 329 11525 171 58 42  0
3 0 0 1183820 218020 1415  7525  0  28  28  0  0  6  0  0  0 325  5082 135 74 26  0
2 0 0 1189328 222504 4544 12530  0 139 139  0  0 16  0  0  0 348 10556 309 50 50  0
8 0 0 1194864 229620 5194 15550  0  36  36  0  0 11  0  0  0 333 10805 204 43 57  0
7 0 0 1195228 229536 5172 14077  0  67  67  0  0 13  0  0  0 338 10306 196 50 50  0
7 0 0 1187484 226696 4699 14620  0 115 115  0  0 11  0  0  0 336 10514 205 45 55  0
^C
[EMAIL PROTECTED]:~#
It seems that prstat doesn't report the correct CPU usage percentage for some
processes.
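The mismatch can be put into numbers with a short Python sketch over the two reports above. The vmstat rows are copied from the output (the first row is skipped because vmstat's first line is an average since boot), and the prstat figure is the sum of the per-user CPU column; the variable names are only illustrative.

```python
# vmstat 1 samples from the report above, first (since-boot) row omitted.
vmstat_rows = """\
1 0 0 1201976 225460 1445 6769 0 67 67 0 0 7 0 0 0 317 5544 115 73 27 0
3 0 0 1190896 224480 6842 18811 0 72 72 0 0 10 0 0 0 332 11701 223 35 65 0
7 0 0 1196188 231540 4681 13909 0 99 99 0 0 10 0 0 0 340 10901 292 47 53 0
4 0 0 1196072 230308 4168 11179 0 67 67 0 0 8 0 0 0 329 11525 171 58 42 0
3 0 0 1183820 218020 1415 7525 0 28 28 0 0 6 0 0 0 325 5082 135 74 26 0
2 0 0 1189328 222504 4544 12530 0 139 139 0 0 16 0 0 0 348 10556 309 50 50 0
8 0 0 1194864 229620 5194 15550 0 36 36 0 0 11 0 0 0 333 10805 204 43 57 0
7 0 0 1195228 229536 5172 14077 0 67 67 0 0 13 0 0 0 338 10306 196 50 50 0
7 0 0 1187484 226696 4699 14620 0 115 115 0 0 11 0 0 0 336 10514 205 45 55 0
"""

rows = [[int(x) for x in line.split()] for line in vmstat_rows.splitlines()]
busy = [r[-3] + r[-2] for r in rows]          # us + sy for each sample
run_queue = [r[0] for r in rows]              # kthr "r" column

avg_busy = sum(busy) / len(busy)
avg_runq = sum(run_queue) / len(run_queue)
prstat_total = 4.3 + 0.0 + 0.0                # root + daemon + smmsp

print(f"vmstat average busy (us+sy): {avg_busy:.0f}%")
print(f"vmstat average run queue:    {avg_runq:.1f}")
print(f"prstat total across users:   {prstat_total:.1f}%")
```

Every one-second vmstat sample after the first adds up to a fully busy CPU, while prstat accounts for only a few percent of it.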
Regards
TJ
Regards
TJ