+1 Mike
On Thu, 2011-10-20 at 11:47 -0700, Rennie Allen wrote:
> I'd like to see a run of the script I sent earlier. I don't trust
> intrstat (not for any particular reason, other than that I have never
> used it)...
>
> On 10/20/11 11:33 AM, "Michael Stapleton"
> <michael.staple...@techsologic.com> wrote:
>
>> Don't know. I prefer not to troubleshoot by guessing if I can avoid
>> it. I'd rather follow the evidence to capture the culprit: use what we
>> know to discover what we do not know.
>>
>> We know the CS rate in vmstat is high, we know sys time is high, we
>> know the syscall rate is low, and we know it is not a user process;
>> therefore it is the kernel. Likely a driver.
>>
>> So what kernel code is running the most?
>>
>> What's causing that code to run?
>>
>> Does that code belong to a driver?
>>
>> Mike
>>
>> On Thu, 2011-10-20 at 20:25 +0200, Michael Schuster wrote:
>>
>>> Hi,
>>>
>>> just found this:
>>> http://download.oracle.com/docs/cd/E19253-01/820-5245/ghgoc/index.html
>>>
>>> does it help?
>>>
>>> On Thu, Oct 20, 2011 at 20:23, Michael Stapleton
>>> <michael.staple...@techsologic.com> wrote:
>>>> My understanding is that it is not supposed to be a loaded system.
>>>> We want to know what the load is.
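One way to answer "what kernel code is running the most?" is a profile-provider sample (a sketch, not the script mentioned earlier in the thread; it needs root, and the 997 Hz rate and 10 s window are arbitrary choices):

```
# Sample the kernel program counter at 997 Hz for 10 seconds. arg0 is
# non-zero only when the CPU was in kernel mode at the sample, and
# func(arg0) maps the sampled PC to a kernel function name.
dtrace -n 'profile-997 /arg0/ { @[func(arg0)] = count(); } tick-10s { exit(0); }'
```

The functions with the highest counts point at the module, driver or otherwise, that is burning the sys time.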
>>>>
>>>> gernot@tintenfass:~# intrstat 30
>>>>
>>>>       device |      cpu0 %tim      cpu1 %tim
>>>> -------------+------------------------------
>>>>     e1000g#0 |         1  0,0         0  0,0
>>>>       ehci#0 |         0  0,0         4  0,0
>>>>       ehci#1 |         3  0,0         0  0,0
>>>>    hci1394#0 |         0  0,0         2  0,0
>>>>      i8042#1 |         0  0,0         4  0,0
>>>>       i915#1 |         0  0,0         2  0,0
>>>>    pci-ide#0 |        15  0,1         0  0,0
>>>>       uhci#0 |         0  0,0         2  0,0
>>>>       uhci#1 |         0  0,0         0  0,0
>>>>       uhci#2 |         3  0,0         0  0,0
>>>>       uhci#3 |         0  0,0         2  0,0
>>>>       uhci#4 |         0  0,0         4  0,0
>>>>
>>>>       device |      cpu0 %tim      cpu1 %tim
>>>> -------------+------------------------------
>>>>     e1000g#0 |         1  0,0         0  0,0
>>>>       ehci#0 |         0  0,0         3  0,0
>>>>       ehci#1 |         3  0,0         0  0,0
>>>>    hci1394#0 |         0  0,0         1  0,0
>>>>      i8042#1 |         0  0,0         6  0,0
>>>>       i915#1 |         0  0,0         1  0,0
>>>>    pci-ide#0 |         3  0,0         0  0,0
>>>>       uhci#0 |         0  0,0         1  0,0
>>>>       uhci#1 |         0  0,0         0  0,0
>>>>       uhci#2 |         3  0,0         0  0,0
>>>>       uhci#3 |         0  0,0         1  0,0
>>>>       uhci#4 |         0  0,0         3  0,0
>>>>
>>>> gernot@tintenfass:~# vmstat 5 10
>>>>  kthr      memory            page            disk          faults      cpu
>>>>  r b w   swap    free   re mf pi po fr de sr cd s0 s1 s2   in   sy    cs us sy id
>>>>  0 0 0 4243840 1145720  1  6  0  0  0  0  2  0  1  1  1 9767  121 37073  0 54 46
>>>>  0 0 0 4157824 1059796  4 11  0  0  0  0  0  0  0  0  0 9752  119 37132  0 54 46
>>>>  0 0 0 4157736 1059752  0  0  0  0  0  0  0  0  0  0  0 9769  113 37194  0 54 46
>>>>  0 0 0 4157744 1059788  0  0  0  0  0  0  0  0  0  0  0 9682  104 36941  0 54 46
>>>>  0 0 0 4157744 1059788  0  0  0  0  0  0  0  0  0  0  0 9769  105 37208  0 54 46
>>>>  0 0 0 4157728 1059772  0  1  0  0  0  0  0  0  0  0  0 9741  159 37104  0 54 46
>>>>  0 0 0 4157728 1059772  0  0  0  0  0  0  0  0  0  0  0 9695  127 36931  0 54 46
>>>>  0 0 0 4157744 1059788  0  0  0  0  0  0  0  0  0  0  0 9762  105 37188  0 54 46
>>>>  0 0 0 4157744 1059788  0  0  0  0  0  0  0  0  0  0  0 9723  102 37058  0 54 46
>>>>  0 0 0 4157744 1059788  0  0  0  0  0  0  0  0  0  0  0 9774  105 37263  0 54 46
>>>>
>>>> Mike
>>>>
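Some quick arithmetic on the first vmstat interval above puts those rates in perspective (vmstat columns are per-second rates; per the hardware list later in the thread, the box is a two-CPU Core 2 Duo):

```python
# Per-second rates taken from the first vmstat interval quoted above.
cs = 37073        # context switches/s ("cs" column)
intr = 9767       # device interrupts/s ("in" column)
syscalls = 121    # system calls/s ("sy" column)
ncpu = 2          # Core 2 Duo: two CPUs in the vmstat/intrstat output

print(cs // ncpu)        # 18536 switches/s on each CPU
print(round(cs / intr))  # about 4 switches per interrupt
print(cs // syscalls)    # 306: hundreds of switches per syscall
```

Tens of thousands of switches per second on an idle box, with barely a hundred syscalls per second, is consistent with the earlier reasoning: whatever is switching lives in the kernel, not in a user process.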
>>>> On Thu, 2011-10-20 at 11:02 -0700, Rennie Allen wrote:
>>>>
>>>>> Sched is the scheduler itself. How long did you let this run? If
>>>>> only for a couple of seconds, then that number is high, but not
>>>>> ridiculous for a loaded system, so I think that this output rules
>>>>> out a high context switch rate.
>>>>>
>>>>> Try this command to see if some process is making an excessive
>>>>> number of syscalls:
>>>>>
>>>>> dtrace -n 'syscall:::entry { @[execname]=count()}'
>>>>>
>>>>> If not, then I'd try looking at interrupts...
>>>>>
>>>>> On 10/20/11 10:52 AM, "Gernot Wolf" <gw.i...@chello.at> wrote:
>>>>>
>>>>>> Yeah, I've been able to run these diagnostics on another OI box
>>>>>> (at my office, so much for OI not being used in production ;)),
>>>>>> and noticed that several values were quite different. I just don't
>>>>>> have any idea of the meaning of these figures...
>>>>>>
>>>>>> Anyway, here are the results of the dtrace command (I executed the
>>>>>> command twice, hence two result sets):
>>>>>>
>>>>>> gernot@tintenfass:~# dtrace -n 'sched:::off-cpu { @[execname]=count()}'
>>>>>> dtrace: description 'sched:::off-cpu ' matched 3 probes
>>>>>> ^C
>>>>>>
>>>>>>   ipmgmtd                1
>>>>>>   gconfd-2               2
>>>>>>   gnome-settings-d       2
>>>>>>   idmapd                 2
>>>>>>   inetd                  2
>>>>>>   miniserv.pl            2
>>>>>>   netcfgd                2
>>>>>>   nscd                   2
>>>>>>   ospm-applet            2
>>>>>>   ssh-agent              2
>>>>>>   sshd                   2
>>>>>>   svc.startd             2
>>>>>>   intrd                  3
>>>>>>   afpd                   4
>>>>>>   mdnsd                  4
>>>>>>   gnome-power-mana       5
>>>>>>   clock-applet           7
>>>>>>   sendmail               7
>>>>>>   xscreensaver           7
>>>>>>   fmd                    9
>>>>>>   fsflush               11
>>>>>>   ntpd                  11
>>>>>>   updatemanagernot      13
>>>>>>   isapython2.6          14
>>>>>>   devfsadm              20
>>>>>>   gnome-terminal        20
>>>>>>   dtrace                23
>>>>>>   mixer_applet2         25
>>>>>>   smbd                  39
>>>>>>   nwam-manager          60
>>>>>>   svc.configd           79
>>>>>>   Xorg                 100
>>>>>>   sched             394078
>>>>>>
>>>>>> gernot@tintenfass:~# dtrace -n 'sched:::off-cpu { @[execname]=count()}'
>>>>>> dtrace: description 'sched:::off-cpu ' matched 3 probes
>>>>>> ^C
>>>>>>
>>>>>>   automountd             1
>>>>>>   ipmgmtd                1
>>>>>>   idmapd                 2
>>>>>>   in.routed              2
>>>>>>   init                   2
>>>>>>   miniserv.pl            2
>>>>>>   netcfgd                2
>>>>>>   ssh-agent              2
>>>>>>   sshd                   2
>>>>>>   svc.startd             2
>>>>>>   fmd                    3
>>>>>>   hald                   3
>>>>>>   inetd                  3
>>>>>>   intrd                  3
>>>>>>   hald-addon-acpi        4
>>>>>>   nscd                   4
>>>>>>   gnome-power-mana       5
>>>>>>   sendmail               5
>>>>>>   mdnsd                  6
>>>>>>   devfsadm               8
>>>>>>   xscreensaver           9
>>>>>>   fsflush               10
>>>>>>   ntpd                  14
>>>>>>   updatemanagernot      16
>>>>>>   mixer_applet2         21
>>>>>>   isapython2.6          22
>>>>>>   dtrace                24
>>>>>>   gnome-terminal        24
>>>>>>   smbd                  39
>>>>>>   nwam-manager          58
>>>>>>   zpool-rpool           65
>>>>>>   svc.configd           79
>>>>>>   Xorg                  82
>>>>>>   sched             369939
>>>>>>
>>>>>> So, quite obviously there is one executable standing out here,
>>>>>> "sched". Now what is the meaning of these figures?
>>>>>>
>>>>>> Regards,
>>>>>> Gernot Wolf
>>>>>>
>>>>>> On 20.10.11 19:22, Michael Stapleton wrote:
>>>>>>> Hi Gernot,
>>>>>>>
>>>>>>> You have a high context switch rate.
>>>>>>>
>>>>>>> Try
>>>>>>>
>>>>>>> #dtrace -n 'sched:::off-cpu { @[execname]=count()}'
>>>>>>>
>>>>>>> for a few seconds to see if you can get the name of an executable.
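Since kernel threads are all charged to "sched" (PID 0), the ~370,000-390,000 off-cpu events per run above only say that the switching happens in the kernel, not which code path is doing it. A follow-up sketch (a standard DTrace idiom, not part of the suggestions quoted above; needs root, and the 10 s window is arbitrary) aggregates on kernel stacks instead of execnames:

```
# Count the kernel stack trace at each off-cpu event charged to PID 0,
# keep only the ten hottest stacks, and print them at exit.
dtrace -n 'sched:::off-cpu /pid == 0/ { @[stack()] = count(); } tick-10s { trunc(@, 10); exit(0); }'
```

The module names at the top of the hottest stacks should identify the driver (or other kernel subsystem) responsible.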
>>>>>>>
>>>>>>> Mike
>>>>>>>
>>>>>>> On Thu, 2011-10-20 at 18:44 +0200, Gernot Wolf wrote:
>>>>>>>
>>>>>>>> Hello all,
>>>>>>>>
>>>>>>>> I have a machine here at my home running OpenIndiana oi_151a,
>>>>>>>> which serves as a NAS on my home network. The original install
>>>>>>>> was OpenSolaris 2009.6, which was later upgraded to snv_134b,
>>>>>>>> and recently to oi_151a.
>>>>>>>>
>>>>>>>> So far this OSOL (now OI) box has performed excellently, with
>>>>>>>> one major exception: sometimes, after a reboot, the CPU load was
>>>>>>>> about 50-60%, although the system was doing nothing. Until
>>>>>>>> recently, another reboot solved the issue.
>>>>>>>>
>>>>>>>> That no longer works. The system now always has a CPU load of
>>>>>>>> 50-60% when idle (and higher, of course, when there is actually
>>>>>>>> some work to do).
>>>>>>>>
>>>>>>>> I've already googled the symptoms. That didn't turn up much
>>>>>>>> useful info, and the few things I found didn't apply to my
>>>>>>>> problem. Most notable was a problem that could be solved by
>>>>>>>> disabling cpupm in /etc/power.conf, but trying that didn't have
>>>>>>>> any effect on my system.
>>>>>>>>
>>>>>>>> So I'm finally out of my depth. I have to admit that my
>>>>>>>> knowledge of Unix is superficial at best, so I decided to try
>>>>>>>> looking for help here.
>>>>>>>>
>>>>>>>> I've run several diagnostic commands like top, powertop,
>>>>>>>> lockstat etc. and attached the results to this email (I've
>>>>>>>> zipped the results of kstat because they were >1MB).
>>>>>>>>
>>>>>>>> One important thing: when I boot into the oi_151a live DVD
>>>>>>>> instead of the installed system, I also get the high CPU load. I
>>>>>>>> mention this because I have installed several things on my OI
>>>>>>>> box like vsftpd, svn, netstat etc. I first thought that this
>>>>>>>> problem might be caused by some of this extra stuff, but seeing
>>>>>>>> the same behavior when booting the live DVD ruled that out (I
>>>>>>>> think).
>>>>>>>>
>>>>>>>> The machine is a custom-built medium tower:
>>>>>>>> S-775 Intel DG965WHMKR ATX mainboard
>>>>>>>> Intel Core 2 Duo E4300 CPU 1.8GHz
>>>>>>>> 1x IDE DVD recorder
>>>>>>>> 1x IDE HD 200GB (serves as system drive)
>>>>>>>> 6x SATA II 1.5TB HD (configured as a ZFS raidz2 array)
>>>>>>>>
>>>>>>>> I have to solve this problem. Although the system runs fine and
>>>>>>>> absolutely serves its purpose, having the CPU at 50-60% load
>>>>>>>> constantly is a waste of energy and surely a rather unhealthy
>>>>>>>> stress on the hardware.
>>>>>>>>
>>>>>>>> Anyone any ideas...?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Gernot Wolf
>>>>>>>> _______________________________________________
>>>>>>>> OpenIndiana-discuss mailing list
>>>>>>>> OpenIndiana-discuss@openindiana.org
>>>>>>>> http://openindiana.org/mailman/listinfo/openindiana-discuss