Re: [dtrace-discuss] Web server(s) fails

2009-09-22 Thread Anil
... error, correction... I believe I was using the PID provider, not the syscall provider. -- This message posted from opensolaris.org ___ dtrace-discuss mailing list dtrace-discuss@opensolaris.org

Re: [dtrace-discuss] Web server(s) fails

2009-09-22 Thread Anil
The syscall:::entry probe shows the output below. Is it stuck doing something? How can I dig deeper? All I get is this when sending "GET /index.htm HTTP/1.1" through telnet to port 80: ... 3 83394 ___errno:entry 3 80993 apr_atomic_dec32:entry 3 83062 atomic_dec_32_nv:

[dtrace-discuss] Web server(s) fails

2009-09-22 Thread Anil
I have a couple of zones. On one zone, I have lighttpd. On another, I have a customer using Apache. This is on a 2009.06 release. Every once in a while, maybe every other day, the server becomes unstable. HTTP requests no longer get a response. Sometimes, it works after a while without a problem. Bu

Re: [dtrace-discuss] How to drill down cause of cross-calls in the kernel? (output provided)

2009-09-22 Thread Jim Mauro
It would also be interesting to see some snapshots of the ZFS ARC kstats: kstat -n arcstats Thanks Jim Leonard wrote: Thanks for the awesome two-liner, I'd been struggling with 1-second intervals without a full-blown script. I modified it to output walltime so that I could zoom in on the prob

Re: [dtrace-discuss] How to drill down cause of cross-calls in the kernel? (output provided)

2009-09-22 Thread Jim Mauro
As Dan said, it looks like ZFS is busy. How much RAM is on this system? What release of Solaris? Do you have any ZFS tweaks in /etc/system? (like clamping the ARC size...) Is the system memory constrained? The xcalls are due to the page unmaps out of what I'm assuming is the ZFS ARC (although I'

Re: [dtrace-discuss] How to drill down cause of cross-calls in the kernel? (output provided)

2009-09-22 Thread Dan Mick
zfs is busy? Jim Leonard wrote: Thanks for the awesome two-liner, I'd been struggling with 1-second intervals without a full-blown script. I modified it to output walltime so that I could zoom in on the problem, and here it is: unix`xc_do_call+0x8f unix`xc_wait_sy

Re: [dtrace-discuss] How to drill down cause of cross-calls in the kernel? (output provided)

2009-09-22 Thread Jim Leonard
Thanks for the awesome two-liner, I'd been struggling with 1-second intervals without a full-blown script. I modified it to output walltime so that I could zoom in on the problem, and here it is: unix`xc_do_call+0x8f unix`xc_wait_sync+0x36 unix`x86pte_i

Re: [dtrace-discuss] How to drill down cause of cross-calls in the kernel? (output provided)

2009-09-22 Thread James Litchfield
Jim Mauro has provided an excellent starting point. Keep in mind that kernel threads will show up as pid 0, so you may be seeing a kernel thread causing the activity. Jim L --Original Message-- From: Jim Leonard Sent: Tue, September 22, 2009 11:31 AM To: dtrace-discuss@opensolar

Re: [dtrace-discuss] How to drill down cause of cross-calls in the kernel? (output provided)

2009-09-22 Thread Jim Mauro
dtrace -n ':::xcalls { @s[stack()] = count() } tick-1sec { trunc(@s,10); printa(@s); clear(@s); }' That will tell us where the xcalls are coming from in the kernel, and we can go from there. Thanks, /jim Jim Leonard wrote: We have a 16-core x86 system that, at seemingly random intervals,

[dtrace-discuss] How to drill down cause of cross-calls in the kernel? (output provided)

2009-09-22 Thread Jim Leonard
We have a 16-core x86 system that, at seemingly random intervals, will completely stop responding for several seconds. Running mpstat 1 showed something horrifying: CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl 0 0 0 1004691 397 170 0 0 0 5 0 0 0 100 0 0 (rest of C
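
A spike like the one in the mpstat output quoted above can be picked out of a captured log with a few lines of Python. The following is only a sketch and not part of the original thread: the function name `high_xcall_rows` and the default threshold of 100,000 cross-calls per interval are assumptions chosen for illustration, and the sample text contains only the header and the single row that survive in the preview.

```python
# Sketch: flag mpstat rows whose cross-call (xcal) count is unusually high.
# The function name and the threshold are hypothetical, chosen for illustration.

def high_xcall_rows(mpstat_text, threshold=100_000):
    """Return (cpu, xcal) pairs for rows whose xcal value exceeds threshold."""
    header = None
    hits = []
    for line in mpstat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # mpstat repeats its header for every interval; remember the latest one.
        if fields[0] == "CPU":
            header = fields
            continue
        # Skip anything that does not line up with the header columns.
        if header is None or "xcal" not in header or len(fields) != len(header):
            continue
        row = dict(zip(header, fields))
        try:
            cpu = int(row["CPU"])
            xcal = int(row["xcal"])
        except ValueError:
            continue
        if xcal > threshold:
            hits.append((cpu, xcal))
    return hits


# Header and first data row as quoted in the message preview.
SAMPLE = """CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 1004691 397 170 0 0 0 5 0 0 0 100 0 0
"""

if __name__ == "__main__":
    for cpu, xcal in high_xcall_rows(SAMPLE):
        print(f"CPU {cpu}: {xcal} cross-calls in this interval")
```

Feeding it the saved output of mpstat 1 gives a list of the CPUs and intervals worth lining up against the DTrace stack aggregation suggested elsewhere in the thread; the threshold would need tuning to whatever is normal for the machine.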