I'm not sure I understand what is being asked here, but I'll take a shot...
Note that it is virtually impossible to write a piece of software that is guaranteed
to have sufficient space to buffer a given amount of data when the rate
and size of the data flow are unknown. This is one of the robustness
(I know, I know - shameless).
FYI - ebook deal of the day on Informit:
http://www.informit.com/store/index.aspx
Also FYI, the printing process is completed and boxes are
being shipped to retailers. Amazon should be shipping
next week.
www.dtracebook.com will be up and running very, very soon.
On Tue, Dec 7, 2010 at 12:27 PM, Jim Mauro james.ma...@oracle.com wrote:
When you say "straight from the dtrace book", I assume you mean
the soon-to-be-published book?
If that is the case, please allow me to clarify. Not every invocation of
dtrace,
both one-liners
Mike is correct. Pretty much every time I've seen this, it's
VM (VM = virtual memory = swap) related.
There's a DTrace script below you can run when you hit this
problem that will show us which system call is failing with an
EAGAIN error. It is most likely fork(2) (and yes, I know printing
the
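The script referenced here didn't survive in this archive, so here's a minimal
sketch of the idea (my construction, not the original): count every system call
returning failure with EAGAIN, keyed by process and syscall.
#!/usr/sbin/dtrace -s
/* Sketch: EAGAIN is an inline defined in /usr/lib/dtrace/errno.d on Solaris */
syscall:::return
/ (int)arg0 == -1 && errno == EAGAIN /
{
        @[execname, probefunc] = count();
}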
wrote:
Sorry guys. Swap is not the issue. We've had this confirmed by Oracle and I
can clearly see there is 96GB of swap available on the system and ~50GB of
main memory.
By who at Oracle? Not everyone is equally qualified. I would tend to
trust Jim Mauro (who co-wrote the books[1] on Solaris
You could first filter out the target file system for the file IO by
doing a count() aggregation on;
@[fds[arg0].fi_fs] = count();
NOTE - this will only work for those system calls that take a file
descriptor as arg0.
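As a concrete sketch of that aggregation (the read/write probes here are just
examples; any system call that takes an fd as arg0 works):
# dtrace -n 'syscall::read:entry,syscall::write:entry { @[fds[arg0].fi_fs] = count(); }'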
Once you know the FS target for the file IO (ufs? zfs? whatever), use it
in a
Hi Steffen - I actually think that code did change quite a bit
from s10 to nv.
I'm not sure what you need to do, but you may want to grab
Brendan's DTraceToolkit and have a look at tcptop and
tcpsnoop, and see how Brendan did it for s10.
Thanks,
/jim
On May 5, 2010, at 8:21 AM,
Check out slides 17 and 18 of DTrace Tips, which you can find
here;
http://blogs.sun.com/bmc/entry/dtrace_tips_tricks_and_gotchas
On May 4, 2010, at 6:23 PM, Jianhua Yang wrote:
I wanted to find out what the single-threaded process was doing with dtrace
but it returned with "Not enough
fire. Fine-tune your probe spec from there.
On Apr 28, 2010, at 10:14 AM, Steve Gonczi wrote:
Do these posts have some connection to the thread topic?
BTW, the mdb settings recommended by Jim Mauro actually did not make a
difference. I jumped to the wrong conclusion, based on an incorrect test.
Groan. I'm such an idiot. I should have been more
precise in terms of where and when you can set this.
Sorry folks.
On Apr 22, 2010, at 11:11 AM, Richard Skelton wrote:
Hi Jim,
If I set idle_cpu_prefer_mwait = 0 in /etc/system on an X2270 running
Solaris 10 10/09 s10x_u8wos_08a X86
I
It's used to put threads to sleep that are blocking on user locks
(at least that's my recollection).
Run prstat -Lmp PID_OF_APP - what does the LCK column look like?
Try running plockstat -A -p PID_OF_APP.
Thanks,
/jim
Dtrace Email wrote:
Hi, when doing dtrace on an application, do we have a way to know what
causes the interrupt (this may be obtained via intrstat) and which
pid/execname is interrupted?
Thanks
Daniel
On Thu, Jan 21, 2010 at 10:43 PM, Jim Mauro james.ma...@sun.com wrote:
sched is the execname of the PID 0 process (run ps -e).
The string sched gets plugged into the DTrace execname variable
if the CPU is in an interrupt handler when a probe fires.
CPU 0 is very likely taking the clock interrupts, which by default
occur every 10 milliseconds.
HTH,
/jim
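To dig into what that interrupt/kernel context is actually doing, a hedged
sketch using the profile provider (sampling kernel stacks whenever execname
reads sched):
# dtrace -n 'profile-997 /execname == "sched"/ { @[stack()] = count(); } tick-10sec { exit(0); }'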
Qihua
iostat -Cx 1 is your friend.
The -C flag will provide a rollup per controller
(c1, c2, etc) so you can determine the IO rate
on a per-controller basis (IOPS and bandwidth).
I'd start there. DTrace rocks, but you should be able
to answer this question with iostat.
/jim
Michael Brian - IL
the next few months. You can arrange
for automatic email notification when there
are changes.
We welcome any and all feedback.
Thanks,
Jim Mauro
Brendan Gregg
Chad Mynhier
Tariq Magdon-Ismail
?
Dave
On 11/23/09 10:43, Jim Mauro wrote:
(shameless plug).
We have a DTrace book underway, and while the
final product won't appear until Summer, 2010,
we're leveraging the Safari Books OnLine
Rough Cuts facility to make early drafts of chapters
generally available.
Right now, there are 3
configuration file, not in /etc/system.
Thanks.
Best Regards,
Simon
On Sun, Nov 22, 2009 at 11:33 AM, Jim Mauro james.ma...@sun.com
mailto:james.ma...@sun.com wrote:
You have about 9GB of shared memory (on a 16GB machine).
From the prstat output, we found 3 Sybase processes, and each process
spawned 12 threads; the Java process (launched by the customer application)
spawned 370 threads in total. I think it's too many threads (especially from
the Java program) that
Right. All your memory appears to be anon segments -
13.4GB worth. About 9GB of that is the shared memory
segments. That leaves 4.4GB.
I see 13 Java processes listed. Assuming they have a similar
memory footprint as the one pmap example, which shows
about 40MB of RSS, that's (40MB x 13) about 520MB.
If you're running out of memory, which it appears you are,
you need to profile the memory consumers, and determine if
you have either a memory leak somewhere, or an under-configured
system. Note 16GB is really tiny by today's standards, especially for
an M5000-class server. It's like putting an
D'oh!
Thanks
Jonathan Adams wrote:
On Tue, Oct 27, 2009 at 02:00:52PM -0400, Jim Mauro wrote:
I've run into this from time to time.
Simple example;
#dtrace -n 'hotspot27563::: { @[probename]=count(); } tick-1sec {
printa(@); clear(@); }'
The sample output (below) shows a couple
I'm cross-posting to zfs-discuss, as this is now more of a ZFS
query than a dtrace query at this point, and I'm not sure if all the ZFS
experts are listening on dtrace-discuss (although they probably
are... :^).
The only thing that jumps out at me is the ARC size - 53.4GB, or
most of your 64GB
dtrace -n ':::xcalls { @s[stack()] = count() } tick-1sec { trunc(@s,10);
printa(@s); clear(@s); }'
That will tell us where the xcalls are coming from in the kernel,
and we can go from there.
Thanks,
/jim
Jim Leonard wrote:
We have a 16-core x86 system that, at seemingly random intervals,
As Dan said, it looks like ZFS is busy.
How much RAM is on this system?
What release of Solaris?
Do you have any ZFS tweaks in /etc/system?
(like clamping the ARC size...)
Is the system memory constrained?
The xcalls are due to the page unmaps out of
what I'm assuming is the ZFS ARC (although
It would also be interesting to see some snapshots
of the ZFS ARC kstats;
# kstat -n arcstats
Thanks
Jim Leonard wrote:
Thanks for the awesome two-liner, I'd been struggling with 1-second intervals
without a full-blown script.
I modified it to output walltime so that I could zoom in on the
From http://wikis.sun.com/display/DTrace/Scripting;
If you want your D macro arguments to be interpreted as string tokens
even if they match the form of an integer or identifier, prefix the
macro variable or argument name with two leading dollar signs (for
example, $$1) to force the D
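For example (a sketch; "bash" is just an illustrative macro argument):
# dtrace -n 'syscall::open*:entry /execname == $$1/ { @[copyinstr(arg0)] = count(); }' bash
Without the double dollar sign, an argument that parses as an integer or
identifier would not be treated as a string.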
In the shameless plug category
I have two tutorials scheduled for the Usenix LISA '09
conference, running in Baltimore, Md, Nov 1-6, 2009.
Sunday, Nov 1, is a full day DTrace tutorial.
Monday, Nov 2, is a full day Solaris/OpenSolaris Performance,
Observability and Tools tutorial.
And of
You're actually asking multiple questions here, because in order
to verify if a particular operating system zero-fills memory pages
when they are freed from an address space, you'd need to first know
which kernel function is called to zero-fill the pages, right?
I created a simple DTrace script
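That script didn't make it into this fragment; a hedged sketch of what such a
script might look like, assuming the kernel zero-fills via a routine named
pagezero() (the function name is an assumption - verify with dtrace -l):
#!/usr/sbin/dtrace -s
/* Sketch: count calls into the (assumed) page-zeroing routine, by kernel stack */
fbt::pagezero:entry
{
        @[stack()] = count();
}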
The easiest way to do this is using the sleep/wakeup
probes in the sched provider. From the process/thread
perspective, once they issue an IO, they sit on a sleep
queue until the IO is completed, at which point they're
issued a wakeup.
io:::start/io:::done is useful for a system view, but that's
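A minimal sketch of the sleep/wakeup approach (the target execname "dd" is
illustrative):
#!/usr/sbin/dtrace -s
/* Sketch: time from going to sleep to running again, via the sched provider */
sched:::sleep
/execname == "dd"/
{
        self->ts = timestamp;
}
sched:::on-cpu
/self->ts/
{
        @["sleep-to-run (ns)"] = quantize(timestamp - self->ts);
        self->ts = 0;
}
Note this includes run-queue wait after the wakeup, not just the IO time.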
It's the ID of the probe, not the provider.
/jim
Andrea Cucciarre' wrote:
I guess the ID you see is the ID of the provider, not the PID
On 07/10/09 16:01, Robert Alatalo wrote:
Hello,
Trying to track down what application is causing the system to
reboot by turning the uadmin
Try this;
#!/usr/sbin/dtrace -s
#pragma D option quiet
/* errno is a built-in D variable; no declaration needed */
syscall::forkall:return,
syscall::vfork:return,
syscall::forksys:return,
syscall::fork1:return
/ arg0 == -1 || arg1 == -1 /
{
        printf("FORK FAILED, errno: %d, arg0: %d, arg1: %d\n", errno, arg0,
            arg1);
}
"not enough space" indicates errno 28 (ENOSPC), which isn't
listed in the fork(2) man page under ERRORS. Are you sure it's
fork(2) that's failing?
It may be errno 12, ENOMEM.
So what does a general memory health profile of the system
look like? Lots of free memory? Plenty of swap space?
How about
D'oh!
Disregard that last question (address space) - my brain
was thinking thread create failures - it's not applicable
to fork failures. My bad.
The system memory and swap space health checks
still apply, as well as process count -
grab some sar -v 1 60 samples
/jim
Jim Mauro wrote
Which example are you using - specopen.d, the script
that instruments every fbt probe?
Please post or be more precise about which script you're using.
If you're using specopen.d, then you're enabling on the
order of 30,000 probes. That's going to add up, even at
the very reasonable cost of
I'm sorry, but I am unable to parse this.
What is the question here?
Thanks,
/jim
tester wrote:
counting system calls per process during this interval, dtrace came out on top:
ioctl dtrace 10609
I am not sure if that is from the speculative dtrace script or the script used to
count the system calls.
Ah, OK - I think I get it.
tester wrote:
counting system calls per process during this interval, dtrace came out on top:
ioctl dtrace 10609
Got it. DTrace executed 10,609 system calls during your sampling period,
more than any other process. I often filter dtrace out in a predicate;
/ execname != "dtrace" /
I would start with lockstat to determine if there's RW lock contention
(and lockstat is a DTrace consumer).
# lockstat -e4-7,34-35 sleep 60 > /var/tmp/rwlocks.out
The above will collect events on kernel reader/writer locks
(run lockstat -h to get a description of each event).
With that data, we
FYI - I just tried this in OpenSolaris 2008.11, running in a
Vbox (2.2) virtual machine. It's noisy without the predicate
for the fbt probes (naturally :^), but it doesn't hang.
(Vbox on a Mac host, FWIW).
Thanks,
/jim
Michael Ernest wrote:
I've been playing with a follow script example and
Hey Paul - Add this predicate;
/ arg0 != (conn_t *)0 /
Talk soon...
/jim
Paul Mininni wrote:
Hey everyone-
With this dtrace script;
#!/usr/sbin/dtrace -qs
#pragma D option aggsize=512k
#pragma D option bufsize=512k
fbt::tcp_conn_request:entry
{
this->connp = (conn_t *)arg0;
You're tripping over the fact that these disk IOs are happening
asynchronously to the process/thread that initiated them.
The dd(1) process has long since been placed on a sleep
queue by the time you're hitting the ARC code, which is why
execname is sched (the execname of PID 0 - the user
process
Cross-posted to perf-discuss.
You can't change the write behavior of the app without
changing the app itself. The code would need to be modified
to issue fsync() calls on the file(s), or open the files for
synchronous writes (O_SYNC | O_DSYNC flags).
fsflush will run, by default, once per
# dtrace -qn 'syscall:::exec-success { trace (execname); }'
The exec-success probe is managed by the proc provider, not
the syscall provider. So the probe designation should be;
proc:::exec-success (or just 'exec-success').
(for
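Putting the correction together, the working one-liner would be:
# dtrace -qn 'proc:::exec-success { trace(execname); }'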
You're looking at byte counts, not block sizes.
56kb sounds typical for UFS, which uses an 8k block size,
with 1k frags (default), so you'll typically see IO sizes
to/from UFS in multiples of 8k. The actual amount of
IO depends, of course, on several factors.
You can also just use iostat data.
http://www.youtube.com/watch?v=tDacjrSCeq4
I wonder if the inverse is true. If I whisper soothing
words of encouragement at my JBODs, will I get
more IOPS with reduced latency?
:^)
This is all very odd... iostat is historically extremely reliable.
I've never observed stats like that before - zero reads and writes
with a non-zero value in the wait queue (forget utilization when
it comes to disk - it's a useless metric).
IO rates per process are best measured at the VOP
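A hedged sketch at that layer, using fbt on the genunix VOP entry points
(verify the function names on your release with dtrace -l):
# dtrace -n 'fbt::fop_read:entry,fbt::fop_write:entry { @[execname, probefunc] = count(); }'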
No bug here - we can absolutely use DTrace on MP systems,
reliably and with confidence.
The script output shows some nasty outliers for a small percentage
of the reads and writes happening on the server. Time to take a closer
look at the IO subsystem. I'd start with iostat -znx 1, and see what
Hmmm... Something is certainly wrong. 11 writes at 137k - 275k seconds
(which is where your 1.5M seconds sum is coming from) is bogus.
What version of Solaris is this ('uname -a' and 'cat /etc/release')?
You're running this on an NFS server, right (not client)?
Is this a benchmark? I ask because
Also (I meant to ask) - are you having performance problems, or
just monitoring with the NFS provider scripts?
Thanks,
/jim
Marcelo Leal wrote:
Hello Jim, this is not a benchmark. I changed the filenames for privacy...
This is a NFS server, yes.
# uname -a
SunOS test 5.11 snv_89 i86pc
That way, you don't have the done probe clause executing
for IDs where the start has not fired first. (This still does not
match start/done for a given xid.)
But what do I know...
max
Jim Mauro wrote:
Also (I meant to ask) - are you having performance problems, or
just monitoring
Are you referring to nfsv3rwsnoop.d?
The TIME(us) value from that script is not a latency measurement,
it's just a time stamp.
If you're referring to a different script, let us know specifically
which script.
/jim
Marcelo Leal wrote:
Hello there,
Ten minutes of trace (latency), using the
The problem you're running into is disk IO operations tend to occur
asynchronously to the thread that initiated the IO, so when the IO
provider probe fires, execname shows the process name for PID 0.
This is not uncommon when chasing disk and network IOs. You
need to capture the write further up
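For example, at the system-call layer the issuing thread is still on CPU, so
execname is meaningful (a sketch, summing requested write bytes per process):
# dtrace -n 'syscall::write:entry,syscall::pwrite:entry { @[execname] = sum(arg2); }'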
Do you have directio enabled on UFS?
Especially for the redo logs?
With directio enabled, UFS writes to the log do not
serialize on the RW lock for the log file(s).
directio will also bypass the memory cache, so you need
to increase the Oracle db_block_buffers when enabling
UFS directio.
For the record, my friend Phil Harman reminded me that
it's not the log files we care about for directio in terms
of single-writer lock break-up. We care about directio for
redo logs to avoid read-modify-write, which happens when
the write is not memory-page aligned.
Sorry about that.
%busy is meaningless unless you're looking at a single disk that
can only have 1 outstanding IO in its active queue, which is to
say %busy is a useless metric for any disk that's been designed
and built in the last decade.
Ignore %busy. Focus on queue depths and queue service times,
both of
The sysinfo provider isn't the best choice for measuring disk IO
times.
Run;
#dtrace -s /usr/demo/dtrace/iotime.d
/jim
Hans-Peter wrote:
Hello all,
I added a clause to my script.
sysinfo:::
/self->traceme == 1 && pid == $1/
{
trace(execname);
printf("sysinfo: timestamp: %d
Also - since this is Oracle, are the Oracle files
on a file system, or raw devices?
If a file system, which one?
/jim
Hans-Peter wrote:
Hello
Start with iostat. It's simple, and provides an average of service times
for disk IOs (iostat -xnz 1, the asvc_t column is average service times
in milliseconds). As Jim Litchfield pointed out in a previous thread,
keep in mind it is an average, so you won't see nasty peaks, but if the
average is
Hi Paul -
One thing I have been puzzled with a lot this weekend is the information and
plot in Figure 4.7. This section, if I understand it correctly, offers the
means to track the actual times from when an IO starts in the kernel to when
it completes, implying the time to either read or
Hey Paul - I should add that iostat -xnz 1 is a great method
for determining how well the SAN is performing.
The asvc_t times are disk IO service times in milliseconds.
I usually start there to sanity check disk IO times...
Thanks,
/jim
Paul Clayton wrote:
Hello..
Due to growing performance
What kind of system is this, and what release of Solaris?
Enabling all the probes for all the function entry points in a process
(pid$1:::entry) can take some time, and may make your terminal
window appear hung, but it should not almost hang your system
(unless you did this on a laptop or small,
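One way to keep the enabling cost down is to narrow the probe spec to a single
module first (a sketch; libc is illustrative):
# dtrace -n 'pid$1:libc::entry { @[probefunc] = count(); }' PID_OF_APP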
That's not a query that can be answered in a forum like this.
You need to do some reading. Starting with the docs on the
Wiki site (wikis.sun.com/dtrace). Go to blogs.sun.com, and
search for dtrace, and read. Go to
http://www.solarisinternals.com/wiki/index.php/DTrace_Topics_Intro.
Read through
Check out;
http://opensolaris.org/jive/thread.jspa?messageID=267138#267138
You may be tripping over bug 6507659 (tsc differences between CPU's give
dtrace_gethrtime() serious problems.). It looks like the fix went into
127112-03 (you're running -02).
Best to install the latest patch, but you
To All (This is mainly for the Mac DTrace 3, Adam Leventhal, Bryan
Cantrill, Mike Shapiro)..
We appreciate your desire to go to the source. It's a lot like posting a
question on
relativity, and indicating you'd really like an answer from Einstein :^)
That said, there's great news -
and there are many, many DTrace experts that can help. So Brian, Mike and
Sorry; s/Brian/Bryan
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org
for the help.
-Blake
On Sep 2, 2008, at 1:06 PM, Jim Mauro wrote:
You could use curpsinfo->pr_dmodel in a predicate;
probe
/ curpsinfo->pr_dmodel == 1 /
{
/* 32-bit process */
}
probe
/ curpsinfo->pr_dmodel == 2 /
{
/* 64-bit process */
}
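As a runnable sketch of the same test (counting system calls by data model):
# dtrace -n 'syscall:::entry { @[curpsinfo->pr_dmodel == 1 ? "32-bit" : "64-bit", execname] = count(); }'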
/jim
Bruce Chapman wrote:
I had a simple D script that looks for use of the DP_POLL ioctl with a long
timeout for any process
prstat -s rss
The RSS column is the resident set size, or roughly the amount of memory
being used by the process. The -s rss tells prstat to sort based on
RSS size.
HTH,
/jim
YOUNSI RIADH wrote:
Hi,
Using vmstat I noticed that free memory is getting lower during a
certain period of
I don't understand the question.
I see forcedirectio set as a mount option, so I would expect
ufs:directio_start:entry to fire
/jim
Sébastien Bouchex Bellomié wrote:
Hi,
The following script is working fine, as it displays the directio start
message
[...]
#!/usr/sbin/dtrace -s
disp_getwork() is a kernel function that gets called from the idle loop
(dispatcher - go get work, meaning find me a runnable thread).
(usermode) simply means that the CPU(s) were running in usermode,
not kernel mode during the profile, and lockstat has no visibility into
what user functions
Read this thread:
http://www.opensolaris.org/jive/thread.jspa?messageID=10250
Thanks,
/jim
Jianhua Yang wrote:
Hello,
new to dtrace, need help here:
when I run the following dtrace command, it produces dtrace errors. Why do I
get such errors?
# dtrace -n 'pid$target:libc:malloc:entry {
You may want to cross-post to a Java alias, but I've been down this
road before.
Java will call into malloc() for buffers for network reads and writes
that are larger than 2k bytes (the 2k is from memory, and I think it
was a 1.5 JVM). A large number of malloc calls,
and resulting contention on
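To check that size distribution against a running JVM, a sketch (JVM_PID is a
placeholder):
# dtrace -n 'pid$target:libc:malloc:entry { @["malloc bytes"] = quantize(arg0); }' -p JVM_PID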
Try linking the JVM to libumem.so.1
(export LD_PRELOAD=/usr/lib/libumem.so.1)
I have, in the past, been able to reduce lock contention in malloc
using libumem malloc.
If that does not help, you need to see where the mallocs are coming
from in the code, and see if there's opportunity to change
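To see where the mallocs come from, a sketch keyed by user stack (the 2048-byte
cutoff mirrors the 2k figure mentioned earlier; JVM_PID is a placeholder):
# dtrace -n 'pid$target:libc:malloc:entry /arg0 > 2048/ { @[ustack()] = count(); }' -p JVM_PID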
Hi Travis - Your first clue here is the backtick operator (`) used to
extract hp_avenrun[0]. The backtick operator is used to read the
value of kernel variables, which will be specific to the running kernel.
That is, Solaris, Mac OS X (Darwin), FreeBSD and all other kernels
with DTrace will not
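For example, on Solaris (a sketch; hp_avenrun is exactly the kind of
Solaris-specific kernel variable being described):
# dtrace -qn 'BEGIN { trace(`hp_avenrun[0]); exit(0); }'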
For kernel modules, _if_ the supplied kernel code (driver, file system,
whatever) was compiled with symbolic information, you can determine the
argument list and types with mdb:
# mdb -k
Loading modules: [ unix genunix specfs dtrace uppc pcplusmp scsi_vhci
ufs mpt ip hook neti sctp arp usba
I hate to beat this to death, but it really sounds like you can make your
life simple and use prstat and pmap to track the information you want.
Heap memory usage tracking is tricky, since libc malloc may or may not
call into the kernel (sbrk) to satisfy a memory request. Also, tracking
malloc
Great question. As a dtrace user and documentation reader, I would not
want to
need to flip to another chapter, or another section, to read about
platform differences
for a particular provider, function, etc. I'm not saying you suggested
that, I'm just
thinking out loud...
I think a
http://blogs.sun.com/jonh/
The esteemed Jon Haslam posted some slides with examples of new features
that went into DTrace since Solaris 10 was released.
Good stuff, and a tip-o-the-hat to Jon.
/jim