Re: [dtrace-discuss] CPU dispatcher and buffer questions

2011-07-01 Thread Jim Mauro
I'm not sure I understand what is being asked here, but I'll take a shot...

Note it is virtually impossible to write a piece of software that is guaranteed
to have sufficient space to buffer a given amount of data when the rate
and size of the data flow is unknown. This is one of the robustness features
of dtrace - it's smart enough to know that, and smart enough to let the user
know when data can not be buffered.

Yes, buffers are allocated per-CPU. There are several buffer types, depending
on the dtrace invocation. Minimally, principle buffers are allocated per CPU
when a dtrace consumer (dtrace(1M)) is executed. Read;
http://wikis.sun.com/display/DTrace/Buffers+and+Buffering

The self-read describes a thread local variable, one of several variable
types available in DTrace. It defines the variable scope - each kernel thread
that's on the CPU when the  probe(s) fires will have it's own copy of a
self- variable. 

There is only one kernel dispatcher, not one per CPU. There are per-CPU run
queues managed by the dispatcher.

As for running a DTrace script for hours/days/weeks, I have never been down that
road. It is theoretically possible of course, and seems to be a good use of 
speculative buffers or a ring buffer policy.

We can not guarantee it will execute without errors (dynamic variable drops, 
etc).
We can guarantee you'll know when errors occur. 

How can such guarantees be made with a dynamic tool like dtrace?
Does your customer know up-front how much data will be traced/processed/
consumed, and at what rate?

Read this;
http://blogs.oracle.com/bmc/resource/dtrace_tips.pdf

Thanks
/jim

On Jul 1, 2011, at 9:30 AM, Scott Shurr wrote:

 Hello,
 I have a customer who has some dtrace questions.  I am guessing that someone 
 knows the answer to these, so I am asking here.  Here are the questions:
 
 **
 In this document, we will describe how we assume that DTrace uses its memory. 
 Most assumptions result from [1]. We want these assumptions to be validated 
 by a DTrace expert from Oracle. This validation is necessary to provide us 
 confidence that DTrace can execute for a long period of time (in the order of 
 weeks) along with our software, without introducing errors due to e.g. 
 “dynamic variable drops”. In addition, we described a problem we experience 
 with our DTrace script, for which we want to have support from you.
 
 [1] Sun Microsystems inc, “Solaris Dynamic Tracing Guide”, September 2008.
 Quotes from Solaris Dynamic Tracing Guide [1], with interpretation:
 •“Each time the variable self-read is referenced in your D program, the 
 data object referenced is the one associated with the operating system thread 
 that was executing when the corresponding DTrace probe fired.”
 oInterpretation: Per CPU there is a dispatcher that has its own thread, 
 when it executes the sched:::on-cpu and sched:::off probes.
 •“At that time, the ring buffer is consumed and processed. dtrace 
 processes each ring buffer in CPU order. Within a CPU's buffer, trace records 
 will be displayed in order from oldest to youngest.”
 Interpretation: There is a principal buffer per CPU
 
 1) Impact on Business
 We have a number of assumptions that we would like to verify about DTrace.
 
 2) What is the OS version and the kernel patch level of the system?
 SunOS nlvdhe321 5.10 Generic_141444-09 sun4v sparc SUNW,T5240
 
 3) What is the Firmware level of the system?
 SP firmware 3.0.10.2.b
 SP firmware build number: 56134
 SP firmware date: Tue May 25 13:02:56 PDT 2010
 SP filesystem version: 0.1.22 
 **
 Thanks
 oracle.jpg
 
 Scott Shurr| Solaris and Network Domain, Global Systems Support
 Email: scott.sh...@oracle.com
 Phone: 781-442-1352
 Oracle Global Customer Services
 
 Log, update, and monitor your Service Request online using My Oracle Support
 
 
 ___
 dtrace-discuss mailing list
 dtrace-discuss@opensolaris.org

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org

[dtrace-discuss] DTrace Book - Deal Of The Day

2011-04-07 Thread Jim Mauro
(I know, I know - shameless).

FYI - ebook deal of the day on Informit:

http://www.informit.com/store/index.aspx


Also FYI, the printing process is completed and boxes are
being shipped to retailers. Amazon should be shipping
next week.

www.dtracebook.com will be up and running very, very soon,
making all the scripts in the book available for download.

Working on a feedback alias, but in the meantime please feel
free to send feedback, comments, errata, etc to;

jim.ma...@gmail.com

Feel free to post positive feedback publically.

:^)

Thanks
/jim

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org

Re: [dtrace-discuss] pid probes fail on OSX

2010-12-07 Thread Jim Mauro
  28728 pthread_key_delete:entry 
  0 126037__tcf_0:entry 
  0  29716   _cleanup:entry 
  0  29717 _fwalk:entry 
  0  29643   __sflush:entry 
  0  29643   __sflush:entry 
  0  29643   __sflush:entry 
  0  28729  _exit:entry 
  2  50753 cerror:entry 
  2  28752 cthread_set_errno_self:entry 
  2  28614__error:entry 
  2  28614__error:entry 
  2  28463   stub helpers:entry 
  2  51054   dyld_stub_binder:entry 
  2  28697 _dyld_fast_stub_entry(void*, long):entry 
  2  27717 dyld::fastBindLazySymbol(ImageLoader**, unsigned long):entry 
  2  28106 ImageLoaderMachOCompressed::doBindFastLazySymbol(unsigned int, 
ImageLoader::LinkContext const):entry 
  2  28010 ImageLoaderMachO::segActualLoadAddress(unsigned int) const:entry 
  2  28074 ImageLoaderMachOCompressed::segmentCommandOffsets() const:entry 
  2  28090 ImageLoaderMachOCompressed::resolve(ImageLoader::LinkContext const, 
char const*, unsigned char, int, ImageLoader const**, Imag:entry 
  2  28075 ImageLoaderMachOCompressed::libImage(unsigned int) const:entry 
  2  28088 ImageLoaderMachOCompressed::resolveTwolevel(ImageLoader::LinkContext 
const, ImageLoader const*, bool, char const*, ImageLoader:entry 
  2  28002 ImageLoaderMachO::findExportedSymbol(char const*, bool, ImageLoader 
const**) const:entry 
  2  28102 ImageLoaderMachOCompressed::findExportedSymbol(char const*, 
ImageLoader const**) const:entry 

On Dec 7, 2010, at 5:09 PM, Hans Stimer wrote:

 Thank you for the quick response. I tried a number of things and got some 
 interesting results. In short:
 
 1) pid doesn't work with -c date, but does seem to work with other 
 executables. On osx date is an executable and not a script or built in.
 2) pid doesn't let you instrument all entries; I created a hello world 
 program and dtrace didn't return for over 5 minutes
 
 * Modify to point to hello world program:
 $ sudo dtrace -n 'pid$target:::entry { trace(arg0); }' -c ./out
 dtrace: description 'pid$target:::entry ' matched 7947 probes
 ^C
 I gave it over 5 minutes and it didn't return
 
 * Trace everything connected to date:
 $ sudo dtrace -n 'pid$target:::entry { trace(arg0); }' -c date
 Tue Dec  7 13:49:34 PST 2010
 dtrace: invalid probe specifier pid$target:::entry { trace(arg0); }: failed 
 to create entry probe for 'stub_binding_helper': No such process
 
 * Find entry points for date
 $ sudo dtrace -ln 'pid$target:::entry,pid$target:::return' -c date
 Tue Dec  7 13:52:55 PST 2010
 dtrace: invalid probe specifier pid$target:::entry,pid$target:::return: 
 failed to create entry probe for 'stub_binding_helper': No such process
 
 * Find entry points for hello world program:
 $ sudo dtrace -ln 'pid$target:::entry,pid$target:::return' -c ./out
ID   PROVIDERMODULE  FUNCTION NAME
 144623   pid95634  dyld   stub_binding_helper entry
 144624   pid95634  dyld  dyld_func_lookup entry
 144625   pid95634  dyldoffset_to_dyld_all_image_infos entry
 144626   pid95634  dyld   _dyld_start entry
 144627   pid95634  dyld  dyld_fatal_error entry
 .
 .
 .
 
 * Find entry points for top:
 $ sudo dtrace -ln 'pid$target:::entry,pid$target:::return' -c top
ID   PROVIDERMODULE  FUNCTION NAME
 36684   pid95691  dyld   _dyld_start entry
 36685   pid95691  dyld std::vectordyld_image_info, 
 std::allocatordyld_image_info 
 ::erase(__gnu_cxx::__normal_iteratordyld_image_info*, std::vecto entry
 36686   pid95691  dyld std::_Vector_basedyld_image_info, 
 std::allocatordyld_image_info ::~_Vector_base() entry
 36687   pid95691  dyld  dyld_fatal_error entry
 .
 .
 .
 
 
 
 On Tue, Dec 7, 2010 at 12:27 PM, Jim Mauro james.ma...@oracle.com wrote:
 
 When you say straight from the dtrace book, I assume you mean
 the soon-to-be-published book?
 
 If that is the case, please allow me to clarify. Not every invocation of 
 dtrace,
 both one-liners and scripts, that are in the book (and there are hundreds) is
 tested and guaranteed to work under all three operating systems
 (Solaris, OS X, FreeBSD). In fact, some will not work across different Solaris
 releases, depending on (for example) which provider is used.
 
 We are very clear in the book with our examples about the differences in
 the underlying implementations. There are a great many scripts and one-liners
 in the book that will only work on Solaris.
 
 The good news is that, in addition to the actual examples, there is a lot of
 emphasis on methodology in the book, such that you can use DTrace
 to figure out which probes/modules, etc, you should be instrumenting
 based on the problem

Re: [dtrace-discuss] Solaris Internals Resource Threshold being hit

2010-10-29 Thread Jim Mauro
Mike is correct. Pretty much every time I've seen this, it's
VM (VM = virtual memory = swap) related.

There's a DTrace script below you can run when you hit this
problem that will show us which system call is failing with an
EAGAIN error. It is most likely fork(2) (and yes, I know printing
the errno in the return action is superfluous given we use it
in the predicate - it's me being OCD and sanity checking).

A second DTrace script further down should provide a kernel
stack trace if it is a fork(2) failure.

Or(disk is cheap) swap -a (add swap space) and see if the
problem goes away.

Thanks
/jim


#!/usr/sbin/dtrace -s

#pragma D option quiet

syscall:::entry
{
self-flag[probefunc] = 1;
}
syscall:::return
/self-flag[probefunc]  errno == 11/
{
printf(syscall: %s, arg0: %d, arg1: %d, errno: 
%d\n\n,probefunc,arg0,arg1,errno);
self-flag[probefunc] = 0;
}




#!/usr/sbin/dtrace -s

#pragma D option quiet

syscall::forksys:entry
{
self-flag = 1;
@ks[stack(),ustack()] = count();
}
syscall::forksys:return
/self-flag  arg0 == -1  errno != 0/
{
printf(fork failed, errno: %d\n,errno);
printa(@ks);
clear(@ks);
exit(0);
}


On Oct 29, 2010, at 12:00 PM, Robin Cotgrove wrote:

 I need some assistance and guidance in writing a DTRACE script or even 
 better, finding an example one which would help me identify what's going on 
 our system. Intermittently, and we think it might be happening after about 60 
 days, on a E2900, 192GB, 24 core, Solaris 10 11.06 system with a fairly new 
 patch cluster (Generic_142900-13) we are running into a problem whereby we 
 suddenly hit a problem which results in processes failing to start and 
 getting the error message 'resource temporarily unavailable' error. This is 
 leading to Oracle crash/startup issues.
 
 I ran a simple du command at the time it was happening at got the following 
 response.
 
 ‘du: No more processes: Resource temporarily unavailable’ 
 
 Approximately 6500 TCP connections on server at time. 6000 unix processes. 
 The max UNIX processes per user is set to 29995. 60GB free physical memory 
 and no swap being used. Absolutely baffling us at mo. 
 
 Not managed to truss a failing command when it happened yet because it's so 
 intermitttent in it's nature. 
 
 We've checked all the usual suspects including max processes per users and 
 cannot find the cause. Need a way to monitor all the internal kernel 
 resources to see what we're hitting. Suggestions please on a postcard. All 
 welcome. 
 
 Robin Cotgrove
 -- 
 This message posted from opensolaris.org
 ___
 dtrace-discuss mailing list
 dtrace-discuss@opensolaris.org

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] Solaris Internals Resource Threshold being hit

2010-10-29 Thread Jim Mauro
Thanks Mike. Good point on the script.

Indeed, use of speculative tracing would be a better
fit here. I'll see if I can get something together and 
send it out.

Thanks,
/jim

On Oct 29, 2010, at 4:45 PM, Mike Gerdts wrote:

 On Fri, Oct 29, 2010 at 2:50 PM, Robin Cotgrove ro...@rjcnet.co.uk wrote:
 Sorry guys. Swap is not the issue. We've had this confirmed by Oracle and I 
 can clearly see there is 96GB of swap awailable on the system and ~50GB of 
 main memory.
 
 By who at Oracle?  Not everyone is equally qualified.  I would tend to
 trust Jim Mauro (who co-wrote the books[1] on Solaris internals,
 performance,  dtrace) over most of the people you will get to through
 normal support channels.
 
 1. http://www.amazon.com/Jim-Mauro/e/B001ILM8NC/
 
 How do you know that available swap doesn't momentarily drop?  I've
 run into plenty of instances where a system has tens of gigabytes of
 free memory but is woefully short on reservable swap (virtual memory,
 as Jim approximates).  Usually vmstat 1 is helpful in observing
 spikes, but as I said before this could miss very short spikes.  If
 you've already done this to see that swap is unlikely to be an issue,
 knowing that would be useful to know.  If you are measuring the amount
 of reservable swap with swap -l, you are doing it wrong.
 
 I do agree that there can be other shortfalls that can cause this.
 This may call for speculative tracing of stacks across the fork entry
 and return calls, displaying results only when the fork fails with
 EAGAIN.  Jim's second script is similar to what I suggest, except that
 it doesn't show the code path taken between syscall::forksys:entry and
 syscall::forksys:return.
 
 Also, I would be a little careful running the second script as is for
 long periods of time if you have a lot of forksys activity with unique
 stacks.  I think that as it is @ks may grow rather large over time
 because the successful forks are not cleared.
 
 -- 
 Mike Gerdts
 http://mgerdts.blogspot.com/
 ___
 dtrace-discuss mailing list
 dtrace-discuss@opensolaris.org

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] Distinguish filesystem from UNIX domain socket activity in syscall provider

2010-07-13 Thread Jim Mauro

You could first filter out the target file system for the file IO by
doing a count() aggregation on;

[fds[arg0].fi_fs] = count();

NOTE - this will only work for those system calls that take a file
descriptor as arg0.

Once you know the FS target for the file IO (ufs? zfs? whatever), use it
in a predicate to capture systems calls of interest;

/ fds[arg0].fi_fs == ufs /

(for example - replace ufs with whatever came up in the previous 
result.

And, again, this will only work for the system calls that take a
file descriptor as arg0.

HTH
/jim

On Jul 13, 2010, at 7:10 AM, Ralph Böhme wrote:

 Dtracers,
 
 I'm investigating some performance issues using a simple dtrace script which 
 collects read,write,stat,... syscall data. This works fine and gives me some 
 good data.
 Unfortunately the app I'm tracing is concurrently doing file IO and IO on a 
 UNIX domain socket. As I'm only interested in data for file IO, I'm 
 investigating how I can modify my syscall probes. I'm currenty doing stuff 
 like this:
 
 /*
 * Save syscall entry info
 */
 syscall::*stat*:entry, 
 syscall::*open*:entry,
 syscall::*read*:entry,
 syscall::*write*:entry,
 syscall::getdents*:entry,
 syscall::*sync*:entry
 /((OPT_command || OPT_pid)  pid == $target) || (OPT_name  execname == 
 NAME)/
 {
/* set start details */
self-start = timestamp;
self-vstart = vtimestamp;
 }
 
 /*
 * Print return data
 */
 
 syscall::*stat*:return, 
 syscall::*open*:return,
 syscall::*read*:return,
 syscall::*write*:return,
 syscall::getdents*:return,
 syscall::*sync*:return
 /self-start/
 {
/* calculate elapsed time */
this-elapsed = (timestamp - self-start) / 1000;
self-start = 0;
this-cpu = (vtimestamp - self-vstart) / 1000;
self-vstart = 0;
 
@count[probefunc] = count();
@sum[probefunc] = sum(this-elapsed);
@elapsed[probefunc] = quantize(this-elapsed);
@oncpu[probefunc] = quantize(this-cpu);
@avg[probefunc] = avg(this-elapsed);
 
 }
 
 Any hint for me? Thanks!
 -Ralph
 -- 
 This message posted from opensolaris.org
 ___
 dtrace-discuss mailing list
 dtrace-discuss@opensolaris.org

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] ?: DTrace of connection establishment attempts for destination address/port

2010-05-05 Thread Jim Mauro

Hi Steffen - I actually think that code did change quite a bit
from s10 to nv.

I'm not sure what you need to do, but you may want to grab
Brendan's DTraceToolKit and have a look at tcptop and
tcpsnoop, and have a look at how Brendan did it for s10.

Thanks,
/jim

On May 5, 2010, at 8:21 AM, Steffen Weiberle wrote:

 Using Solaris 10, so no IP provider, I am trying to figure out when a
 connection attempt is failing in a client. The connect() statement takes
 a file descriptor, which is an integer and has no connection details.
 Since I don't know if the connect is closely preceded with a bind() or a new 
 connection may be attempted with the same socket, and thus I won't be 
 capturing the bin, I can't rely on it to get the sockaddr structure.
 
 (The event happens very infrequently, maybe once a day)
 
 So I am trying tcp_connect FBT probe.
 
 fbt::tcp_connect:entry
 {
   printf(execname: %s \n, execname);
   self-sockaddr = args[1];
 
   printf(family: %x \n, self-sockaddr-sa_family);
 }
 
 However, I get
 
 # /var/tmp/connect.d
 dtrace: failed to compile script /var/tmp/connect.d: line 32: sa_family
 is not a member of struct msgb
 
 Note is says 'msgb', not 'sockaddr'.
 
 I got the args[1] from the OpenSolaris source code (and maybe that is my
 problem, since I am running on Solaris 10--did the code change that
 much?--seems unlikely)
 
 http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/inet/tcp/tcp_socket.c#232
 
 Any good way to grab the IPv4 address and port number on an entry so I
 can have it in the situation the connect() fails?
 
 Thanks
 Steffen
 
 
 ___
 dtrace-discuss mailing list
 dtrace-discuss@opensolaris.org

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] pid$target: Not enough space

2010-05-04 Thread Jim Mauro

Check out slides 17 and 18 of DTrace Tips, which you can find
here;

http://blogs.sun.com/bmc/entry/dtrace_tips_tricks_and_gotchas


On May 4, 2010, at 6:23 PM, Jianhua Yang wrote:

 I wanted to find out what the single threaded process was doing with dtrace 
 but it returned with Not enough spaceerror:
 
 # dtrace -ln 'pid$target:::entry,pid$target:::return {trace(timestamp);}' -p 
 25050
 dtrace: invalid probe specifier pid$target:::entry,pid$target:::return 
 {trace(timestamp);}: failed to create return probe for 'kdstf1011000km': Not 
 enough space
 
 questions are:  
what does Not enough space means ?
how to avoid such errors ?
 
 Thanks for your help !
 
 Jianhua Yang,
 -- 
 This message posted from opensolaris.org
 ___
 dtrace-discuss mailing list
 dtrace-discuss@opensolaris.org

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] matching the -module-

2010-04-28 Thread Jim Mauro
interrupts not being serviced?

Do you mean device interrupts - NIC and HBA?

Or higher-level interrupts, like errors?

That seems a wee bit extreme, but intrstat should help
here. Or use dtrace and determine which interrupt services
routines are running. If you're referring to device interrupts,
the service routines all have intr in the function name.

j...@pae-g4-nv:~# dtrace -n 'fbt::*intr*:entry { @[probefunc] = count(); }'
dtrace: description 'fbt::*intr*:entry ' matched 339 probes
^C


The above probably seems extreme, but just run it for
a few seconds and see which probes actually fire. Fine
tune your probe spec from there.


On Apr 28, 2010, at 10:14 AM, Steve Gonczi wrote:

 Do these post have some connection to the thread topic?
 
 BTW. the mdb settings recommended by Jim Mauro actually did not make a 
 difference, I
 jumped to to wrong conclusion, based on an incorrect test setup.
 
 My sense is that interrupts are not being serviced.  Assume that most cpu-s 
 have been given a cli instruction, and just ignoring interrupts most of the 
 time.
 
 Anyone has a suggestion on how to prove/disprove this?
 -- 
 This message posted from opensolaris.org
 ___
 dtrace-discuss mailing list
 dtrace-discuss@opensolaris.org

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] Very sluggish system, yet mpstat/vmstat shows little

2010-04-22 Thread Jim Mauro
Groan. I'm such an idiot. I should have been more
precise in terms of  where and when you get set this.

Sorry folks.
On Apr 22, 2010, at 11:11 AM, Richard Skelton wrote:

 Hi Jim,
 If I set set idle_cpu_prefer_mwait = 0 in /etc/system on a X2270 running 
 Solaris 10 10/09 s10x_u8wos_08a X86
 I get a kernel panic upon reboot :-(
 Is this setting only for recent versions of OpenSolaris ?
 
 Cheers
 Richard.
 -- 
 This message posted from opensolaris.org
 ___
 dtrace-discuss mailing list
 dtrace-discuss@opensolaris.org

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] __lwp_park() showing up as most time-consuming when dtrace an application

2010-02-03 Thread Jim Mauro

It's used to put threads to sleep that are blocking on user locks
(at least that's my recollection).

Run prstat -Lmp PID_OF_APP) - what does  the LCK column look like?

Try running plockstat -A -p PID_OF_APP.

Thanks,
/jim


Dtrace Email wrote:

Hi,  when doing dtrace on an appliction, __lwp_park() seems to be taking a lot 
of time.  What does it really do?   is it waiting for threads?

Thanks,
  

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] about sched interrupt

2010-01-24 Thread Jim Mauro
PID 0 isn't really a process. It's the kernel - that's something of an 
oversimplification,
but for the purposes of this discussion, just know that pid 0 is not a 
process.


Interrupts do not target processes. Devices bind interrupts to CPU at 
initialization
time. Use 'echo ::interrupts | mdb -k to determine which devices have 
interrupts
bound to which CPUs, and/or intrstat. mpstat will show you interrupt 
rates per

CPU as well.

As I said, if you think interrupts are getting in the way, create a 
processor set
with the CPU(s) handling high rates of interrupts, thereby fencing it 
off from

running application code.

Thanks,
/jim


Qihua Wu wrote:

Hi Max/Jim,

Based on your script, I created the following script to print which 
pid is interrupted how many times every 2 second. The result shows pid 
0 is interrupted most: about 50k every 2 second, that's 25k per 
second. How could it be interrupted so frequently? For the clock 
interrupt, it only interrupts 100 times per second, which is much less 
than 25k per second. Where could it come from?


The box has lots of network traffic from mysql, I think the network 
interrupt will cause the mysqld (pid 10490) to be interrupted instead 
of pid 0.


#!/usr/sbin/dtrace -s

interrupt-start
{
  self-intr = curthread-t_intr;
}

interrupt-start
/self-intr/
{
  @inter[self-intr-t_procp-p_pidp-pid_id, cpu] = count();
}

interrupt-complete
/self-intr/
{
  self-intr = 0;
}

profile:::tick-2sec
{
printf(report completed at time:%Y\n,walltimestamp);
printf(***\n);
 printa(%d-%d: %...@d\n, @inter);  
trunc(@inter,0);

}

 0  89152   :tick-2sec report completed at 
time:2010 Jan 24 02:48:56

***
0-0: 1
0-2: 1
0-8: 1
0-12: 1
0-14: 1
0-16: 1
0-20: 1
0-21: 1
0-22: 1
1095-7: 4
0-6: 5
0-5: 8
264-7: 8
10490-7: 4962
0-7: 49131

  0  89152   :tick-2sec report completed at 
time:2010 Jan 24 02:48:58

***
0-0: 1
0-5: 4
1095-7: 4
10490-7: 4200
0-7: 50305

  0  89152   :tick-2sec report completed at 
time:2010 Jan 24 02:49:00

***
0-3: 4
1095-7: 4
0-5: 8
25388-7: 818
10490-7: 1582
0-7: 51507


On Fri, Jan 22, 2010 at 4:21 PM, m...@bruningsystems.com 
mailto:m...@bruningsystems.com m...@bruningsystems.com 
mailto:m...@bruningsystems.com wrote:


Hi Qihua Wu,

Here is a script that shows process args for process that was
interrupted, and cpu.  It is easily modified
to add pid.

#!/usr/sbin/dtrace -s

interrupt-start
{
  self-intr = curthread-t_intr;
}

interrupt-start
/self-intr/
{
  @[stringof(self-intr-t_procp-p_user.u_psargs), cpu] = count();
}

interrupt-complete
/self-intr/
{
  self-intr = 0;
}


max


Qihua Wu wrote:

So do we have a way to know what causes the interrupt (this
may be got by
intrstat) and which pid/execname is interrupted?

Thanks
Daniel

On Thu, Jan 21, 2010 at 10:43 PM, Jim Mauro
james.ma...@sun.com mailto:james.ma...@sun.com wrote:

 


sched is the execname of the PID 0 process (run ps -e).

The string sched gets plugged into the DTrace execname
variable
if the CPU is in an interrupt handler when a probe fires.

CPU 0 is very likely taking the clock interrupts, which by
default
occur every 10 milliseconds.

HTH,
/jim




Qihua Wu wrote:

   


Why sched is often interrupt on cpu 0 instead of
distribution evenly on
all cpu?

dtrace -n 'sdt:::interrupt-start { @num[execname,cpu]
= count(); }'
dtrace: description 'sdt:::interrupt-start ' matched 1
probe
^C

 sched
4

 355
 sched
   18

 358
 sched
8

 369
 sched
   10

 370
 sched
   11

 370
 sched
   12

 408
 sched
   13

 408
 sched
   15

 408
 sched

Re: [dtrace-discuss] about sched interrupt

2010-01-22 Thread Jim Mauro

According to your intrstat data, it looks like a nge network
device and disk sata controller both have interrupts bound
to cpu 7.

Max provided a DTrace script to determine which process is having
their thread(s) pinned when an interrupt comes in.

If you're concerned about the possibility of a CPU handling a high
rate of interrupts, and running application threads, causing a performance
issue, just fence the CPU off. It's easy, and dynamic.

For example, if cpu7 is handling most of the interrupts, just put it
in a processor set;

psrset -c 7

Once you've done that, the kernel will not schedule user threads on
cpu7 unless you explicitly bind processes/threads to the processor
set. By NOT doing any binding, you've effectively fenced off cpu7 -
all it will do is handle interrupts.

If it doesn't help, or makes things worse, just psradm -d 1 to
delete the set.

/jim


Qihua Wu wrote:
So do we have a way to know what causes the interrupt (this may be got 
by intrstat) and which pid/execname is interrupted?


Thanks
Daniel

On Thu, Jan 21, 2010 at 10:43 PM, Jim Mauro james.ma...@sun.com 
mailto:james.ma...@sun.com wrote:


sched is the execname of the PID 0 process (run ps -e).

The string sched gets plugged into the DTrace execname variable
if the CPU is in an interrupt handler when a probe fires.

CPU 0 is very likely taking the clock interrupts, which by default
occur every 10 milliseconds.

HTH,
/jim




Qihua Wu wrote:

Why sched is often interrupt on cpu 0 instead of distribution
evenly on all cpu?

dtrace -n 'sdt:::interrupt-start { @num[execname,cpu] =
count(); }'
dtrace: description 'sdt:::interrupt-start ' matched 1 probe
^C

 sched 4  
   355
 sched18  
   358
 sched 8  
   369
 sched10  
   370
 sched11  
   370
 sched12  
   408
 sched13  
   408
 sched15  
   408
 sched19  
   408
 sched20  
   408
 sched21  
   408
 sched22  
   408
 sched23  
   408
 sched14  
   409
 sched 9  
   410
 sched 2  
   411
 sched16  
   411
 sched 6  
   422
 sched17  
   456
 sched 3  
   469
 sched 5  
   510
 sched 7  
   564
 sched 1  
  3079
 sched 0  
535071


Thanks,
Daniel


___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org
mailto:dtrace-discuss@opensolaris.org
 




___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] about sched interrupt

2010-01-21 Thread Jim Mauro

sched is the execname of the PID 0 process (run ps -e).

The string sched gets plugged into the DTrace execname variable
if the CPU is in an interrupt handler when a probe fires.

CPU 0 is very likely taking the clock interrupts, which by default
occur every 10 milliseconds.

HTH,
/jim




Qihua Wu wrote:
Why sched is often interrupt on cpu 0 instead of distribution evenly 
on all cpu?


dtrace -n 'sdt:::interrupt-start { @num[execname,cpu] = count(); }'
dtrace: description 'sdt:::interrupt-start ' matched 1 probe
^C

  sched 
4  355
  sched
18  358
  sched 
8  369
  sched
10  370
  sched
11  370
  sched
12  408
  sched
13  408
  sched
15  408
  sched
19  408
  sched
20  408
  sched
21  408
  sched
22  408
  sched
23  408
  sched
14  409
  sched 
9  410
  sched 
2  411
  sched
16  411
  sched 
6  422
  sched
17  456
  sched 
3  469
  sched 
5  510
  sched 
7  564
  sched 
1 3079
  sched 
0   535071


Thanks,
Daniel


___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org
  

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] Looking for help on 2 items...

2009-12-17 Thread Jim Mauro

iostat -Cx 1 is your friend.

The -C flag will provide a rollup per controller
(c1, c2, etc) so you can determine the IO rate
on a per-controller basis (IOPS and bandwidth).

I'd start there. DTrace rocks, but you should be able
to answer this question with iostat.

/jim


Michael Brian - IL wrote:

I am pretty new to Dtrace but use the Dtrace Toolkit when trying to
troubleshoot I/O issues
On Oracle.

I am looking for help on how to do the following:
I am trying to answer whether adding more HBA Cards/ports
would be effective.

To do this, I need to know the i/o's per second
As well as total bandwidth per second.

Has anyone done this before?

Does anyone have any other ideas on how to attack this problem?

I have been tuning Oracle for quite some time now, and I am continually
Asked to prove what I tend to know naturally, that the classic 1 HBA, 2
port card
Isn't cutting it. 


I also have similar discussions on whether I am saturating the BUS on a
particular box.






 
Brian
 
--

Brian P Michael
Technical Management Consultant
Rolta TUSC, Inc.
micha...@tusc.com
630-960-2909 x1181
http://www.tusc.com
 
The information contained in this transmission is privileged and

confidential information intended for the use of the individual or
entity named above.  If the reader of this message is not the intended
recipient, you are hereby notified that any dissemination, distribution
or copying of this communication is strictly prohibited.  If you have
received this transmission in error, do not read it.  Please immediately
reply to the sender that you have received this communication in error
and then delete it.  Thank you.
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org
  

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


[dtrace-discuss] DTrace Book - rough cuts

2009-11-23 Thread Jim Mauro

(shameless plug).

We have a DTrace book  underway, and while the
final product won't appear until Summer, 2010,
we're leveraging the Safari Books OnLine
Rough Cuts facility to make early drafts of chapters
generally available.

Right now, there are 3 chapters available, with more
coming over the next few months. You can arrange
for auto email notification for updates when there
are changes.

We welcome any and all feedback.

Thanks,
Jim Mauro
Brendan Gregg
Chad Mynhier
Tariq Magdon-Ismail




___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] DTrace Book - rough cuts

2009-11-23 Thread Jim Mauro

Hi Dave - yes, we plan on covering Java extensively,
perhaps even in a dedicated chapter. That bit is not
underway yet, but it's in the plan.

Thanks,
/jim


David Blasingame, Area Systems Support Engineer wrote:
Great plug.  Are you considering having information on DTrace and JAVA 
in the book?


Dave

On 11/23/09 10:43, Jim Mauro wrote:

(shameless plug).

We have a DTrace book  underway, and while the
final product won't appear until Summer, 2010,
we're leveraging the Safari Books OnLine
Rough Cuts facility to make early drafts of chapters
generally available.

Right now, there are 3 chapters available, with more
coming over the next few months. You can arrange
for auto email notification for updates when there
are changes.

We welcome any and all feedback.

Thanks,
Jim Mauro
Brendan Gregg
Chad Mynhier
Tariq Magdon-Ismail




___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org



___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] Who're stealing memory ?

2009-11-22 Thread Jim Mauro

Have to grabbed periodic snapshots with ps, or prstat?
These will gave you a sense of which processes have
large physical memory footprints, and you can pmap
from there

Thanks,
/jim

Simon wrote:

Hi Jim,

In order to be sure, you need to so some additional memory
accounting and determine how much RAM you need to support
the shared segments for Sybase, and the JVMs.


It's difficult for me now since I don't kwow what is the really 
troublemaker to cause this issue,I guess the JVMs,and I will suggest 
reduce the share memory allocation for sybase,that control it in 
sybase configuration file,not in /etc/system.



Thanks.
Best Regards,
Simon


On Sun, Nov 22, 2009 at 11:33 AM, Jim Mauro james.ma...@sun.com 
mailto:james.ma...@sun.com wrote:



You have about 9GB of shared memory (on a 16GB machine).



From the prstat output,we found 3 sybase process,and each
process derived 12 threads,the java process(launched by
customer application) derived total 370 threads, I think it's
too many threads(especially of java program)  that generate
excessive stack/heaps,and finally used up the RAM ?


Java can consume a lot of memory. Need to see the memory sizes,
but it's certainly a possibility.


So I think decrease the share memory used by sybase(defined at
sybase configuration layer,not in /etc/system file) would be
helpful ?


Sure. If you take memory away from one consumer, it leaves
more for the others. Whether or not it actually solves your
problem, meaning after such a change the system has sufficient
memory to run without paging, remains to be seen.

In order to be sure, you need to so some additional memory
accounting and determine how much RAM you need to support
the shared segments for Sybase, and the JVMs.

Thanks,
/jim



___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] Who're stealing memory ?

2009-11-21 Thread Jim Mauro


You have about 9GB of shared memory (on a 16GB machine).



From the prstat output,we found 3 sybase process,and each process 
derived 12 threads,the java process(launched by customer application) 
derived total 370 threads, I think it's too many threads(especially of 
java program)  that generate excessive stack/heaps,and finally used 
up the RAM ?


Java can consume a lot of memory. Need to see the memory sizes,
but it's certainly a possibility.


So I think decrease the share memory used by sybase(defined at sybase 
configuration layer,not in /etc/system file) would be helpful ?


Sure. If you take memory away from one consumer, it leaves
more for the others. Whether or not it actually solves your
problem, meaning after such a change the system has sufficient
memory to run without paging, remains to be seen.

In order to be sure, you need to so some additional memory
accounting and determine how much RAM you need to support
the shared segments for Sybase, and the JVMs.

Thanks,
/jim

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] [perf-discuss] Who're stealing memory ?

2009-11-21 Thread Jim Mauro


Right. All your memory appears to be anon segments -
13.4GB worth. About 9GB of that is the shared memory
segments. That leaves 4.4GB.

I see 13 Java processes listed. Assuming they have a similar
memory footprint as the one pmap example, which shows
about 40MB of RSS, that's (40MB x 13) about 500MB.

It's of course possible that the other JVMs are using more
memory than the 1 pmap example. The ps output or
prstat would be helpful here. You still need to account
for about 3.9GB of anon usage...

Thanks,
/jim


Simon wrote:

Hi Michael,

Now the system been reset seems that process disappear,but I found 
some similar processes which launched by the same user kplus,as:


kplus20905  0.1  1.1464984180288 ?S 08:18:38  2:17 
/usr/java/bin/java -Dprogram.name=run.sh -server -Xms128m -Xmx512m 
-Dsun.rmi.dgc.client.gcInterval=360 
-Dsun.rmi.dgc.server.gcInterval=360 -Djava.awt.headless=true 
-Djava.endorsed.dirs=/export/home1/jboss/jboss-4.0.5.GA/lib/endorsed 
http://jboss-4.0.5.GA/lib/endorsed -classpath 
/export/home1/jboss/jboss-4.0.5.GA/bin/run.jar:/usr/java/lib/tools.jar 
http://jboss-4.0.5.GA/bin/run.jar:/usr/java/lib/tools.jar org.jboss.Main


Thanks.
Best Regards,
Simon


On Sat, Nov 21, 2009 at 4:22 PM, Michael Schulte 
mschu...@sunspezialist.de mailto:mschu...@sunspezialist.de wrote:


Hey Simon,


# pmap -x 28447
28447:  /export/home1/bea/jdk160_05//bin/java
-Djava.awt.headless=true -Xbootc

Can you give the full argument list of this java?

Memory de-allocation is Java is fully asynchronous in Garbage
Collection and can be
tuned by command line options when starting the application. Just
Google for
the exakt syntax.

Michael


Simon wrote:

Hi Jim,

Thanks for your reply,here's my update:

   Did you check for additional virtual space usage in /tmp?


df -k shows only 1% used in /tmp filesystem:
swap 10943720 968 10942752 1%/tmp
swap 10942832  80 10942752 1%/var/run

   Are you using ZFS (ARC space needed for that)?

 No any zfs used,all filesystems are UFS.

   You can also try using the ::memstat mdb dcmd to break
out kernel
   memory further.


  ::memstat

Page SummaryPagesMB  %Tot
     
Kernel 111925   8745%
Anon  1715077 13399   83%
Exec and libs   64697   5053%
Page cache  71828   5613%
Free (cachelist)51148   3992%
Free (freelist) 43872   3422%

Total 2058547 16082
Physical  2037012 15914

As above,the Anonymous memory is very high,I think some user
thread using the memory in an abnormal way,I checked one of
process with pmap -x and found many of stack/heap,as:

# ps -ef |grep bea |grep -v grep
 kplustp 28447 1   0 07:01:37 ?   0:26
/export/home1/bea/jdk160_05//bin/java -Djava.awt.headless=true
-Xbootclasspath/
 kplustp 28447 1   0 07:01:37 ?   0:26
/export/home1/bea/jdk160_05//bin/java -Djava.awt.headless=true
-Xbootclasspath/
 kplustp 28443 1   0 07:01:37 ?   2:29
/export/home1/bea/jdk160_05//bin/java -Djava.awt.headless=true
-Xbootclasspath/
 kplustp 28445 1   0 07:01:37 ?   1:24
/export/home1/bea/jdk160_05//bin/java -Djava.awt.headless=true
-Xbootclasspath/
 kplustp 28457 1   0 07:01:38 ?   0:50
/export/home1/bea/jdk160_05//bin/java -Xms512m -Xmx1024m
-Djava.awt.headless=tr
 kplustp 28453 1   0 07:01:37 ?   1:55
/export/home1/bea/jdk160_05//bin/java -Xms512m -Xmx1024m
-Xbootclasspath/p:./..
 kplustp 28449 1   0 07:01:37 ?   0:25
/export/home1/bea/jdk160_05//bin/java -Djava.awt.headless=true
-Xbootclasspath/
 kplustp 28508 1   0 07:01:44 ?   1:15
/export/home1/bea/jdk160_05//bin/java -Djava.awt.headless=true
-classpath ./../
 kplustp 28451 1   0 07:01:37 ?   1:25
/export/home1/bea/jdk160_05//bin/java -Djava.awt.headless=true
-Xbootclasspath/
 kplustp 28455 1   0 07:01:37 ?   1:27
/export/home1/bea/jdk160_05//bin/java -Djava.awt.headless=true
-Xbootclasspath/
 kplustp 28439 1   0 07:01:36 ?   0:28
/export/home1/bea/jdk160_05//bin/java -Djava.awt.headless=true
-Xbootclasspath/
 kplustp 28441 1   0 07:01:36 ?   0:26

Re: [dtrace-discuss] Who're stealing memory ?

2009-11-20 Thread Jim Mauro

If you're running out of memory, which it appears you are,
you need to profile the memory consumers, and determine if
you have either a memory leak somewhere, or an under-configured
system. Note 16GB is really tiny by todays standards, especially for
an M5000-class server. It's like putting an engine from a Ford sedan
into an 18-wheel truck - the capacity to do work is severely limited
by a lack of towing power. Laptops ship with 8GB these days...

Back to memory consumers. We have;
- The kernel
- User processes
- The file system cache (which is technically part of the kernel,
   but significant enough such that it should be measured
   seperately.

If the database on a file system, and if so, which one (UFS? ZFS,
VxFS?). How much shared memory is really being used
(ipcs -a)?

If the system starts off well, and degrades over time, then you need
to capture memory data over time and see what area is growing.
Based on that data, we can determine if something is leaking memory,
or you have an underconfigured machine.

I would start with;
echo ::memstat | mdb -k
ipcs -a
ps -eo pid,vsz,rss,class,pri,fname,args
prstat -c 1 30
kstat -n system_pages

You need to collect that data and some regular interval
with timestamps. The interval depends on how long it takes
the machine to degrade. If the systems goes from fresh boot to
degraded state in 1 hour, I'd collect the data every second.
If the machine goes from fresh boot to degraded state in 1 week,
I'd grab the data every 2 hours or so.

/jim


Simon wrote:

Hi Experts,

Here's the performance related question,please help to review what can I
do to get the issue fixed ?

IHAC who has one M5000 with Solaris 10 10/08(KJP: 13-01) installed
and 16GB RAM configured,running sybase ASE 12.5 and JBOSS
application,recently,they felt the OS got very slow after OS running for
some sime,collected vmstat data points out memory shortage,as:

# vmstat 5
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr m0 m1 m4 m5 in sy cs us sy id
0 0 153 6953672 254552 228 228 1843 1218 1687 0 685 3 2 0 0 2334 32431 3143 1 1 
97
0 0 153 6953672 259888 115 115 928 917 917 0 264 0 35 0 2 2208 62355 3332 7 3 90
0 0 153 6953672 255688 145 145 1168 1625 1625 0 1482 0 6 1 0 2088 40113 3070 2 
1 96
0 0 153 6953640 256144 111 111 894 1371 1624 0 1124 0 6 0 0 2080 55278 3106 3 3 
94
0 0 153 6953640 256048 241 241 1935 2585 3035 0 1009 0 18 0 0 2392 40643 3164 2 
2 96
0 0 153 6953648 257112 236 235 1916 1710 1710 0 1223 0 7 0 0 2672 62582 3628 3 
4 93

As above,the w column is very high all time,and sr column also kept
very high,which indicates the page scanner is activated and busying for
page out,but the CPU is very idle,checked /etc/system,found one
improper entry:
set shmsys:shminfo_shmmax = 0x

So I think it's the improper share memory setting to cause too many
physical RAM was reserved by application and suggest to adjustment the
share memory to 8GB(0x2),but as customer feedback,seems it got
worst result based on new vmstat output:

kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr m0 m1 m4 m5 in sy cs us sy id
0 6 762 3941344 515848 18 29 4544 0 0 0 0 4 562 0 1 2448 25687 3623 1 2 97
0 6 762 4235016 749616 66 21 4251 2 2 0 0 0 528 0 0 2508 50540 3733 2 5 93
0 6 762 4428080 889864 106 299 4694 0 0 0 0 1 573 0 7 2741 182274 3907 10 4 86
0 5 762 4136400 664888 19 174 4126 0 0 0 0 6 511 0 0 2968 241186 4417 18 9 73
0 7 762 3454280 193776 103 651 2526 3949 4860 0 121549 11 543 0 5 2808 149820 
4164 10 12 78
0 9 762 3160424 186016 61 440 1803 7362 15047 0 189720 12 567 0 5 3101 119895 
4125 6 13 81
0 6 762 3647456 403056 44 279 4260 331 331 0 243 10 540 0 3 2552 38374 3847 5 3 
92

the w  sr value increased instead,why ?

And I also attached the prstat outout,it's a prstat snapshot after
share memory adjustment,please help to have a look ? what can I do next
to get the issue solved ? what's the possible factors to cause memory
shortage again and again,even they have 16GB RAM + 16GB Swap the physical RAM 
really shortage?
Or is there any useful dtrace script to trace the problem ? 
Thanks very much !


Best Regards,
Simon

  







___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org
  

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


[dtrace-discuss] value of zero from count()

2009-10-27 Thread Jim Mauro

I've run into this from time to time.

Simple example;

#dtrace -n 'hotspot27563::: { @[probename]=count(); } tick-1sec { 
printa(@); clear(@); }'


The sample output (below) shows a couple functions (probename)
with a value of zero. By virtue of the fact that it appears in the
aggregation at all implies that minimal value should be 1 (I would think).

(FYI - this is on OSOL 2009.06, x86 system).

Thanks,
/jim


gc-begin  0
 gc-end0
 mem-pool-gc-begin 0
 mem-pool-gc-end   0
 thread-start  3
 thread-stop   3

 gc-begin  0
 gc-end0
 mem-pool-gc-begin 0
 mem-pool-gc-end   0
 thread-start  4
 thread-stop   4

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] value of zero from count()

2009-10-27 Thread Jim Mauro

D'oh!

Thanks

Jonathan Adams wrote:

On Tue, Oct 27, 2009 at 02:00:52PM -0400, Jim Mauro wrote:
  

I've run into this from time to time.

Simple example;

#dtrace -n 'hotspot27563::: { @[probename]=count(); } tick-1sec { 
printa(@); clear(@); }'


The sample output (below) shows a couple functions (probename)
with a value of zero. By virtue of the fact that it appears in the
aggregation at all implies that minimal value should be 1 (I would think).



Clear clears all values in the aggregation to 0; to remove all elements from
the aggregation, you want trunc(@a).

(clear() can be useful if you want a running list of everything seen, but
count only for the most recent hits)

Cheers,
- jonathan

  

(FYI - this is on OSOL 2009.06, x86 system).

Thanks,
/jim


gc-begin  0
 gc-end0
 mem-pool-gc-begin 0
 mem-pool-gc-end   0
 thread-start  3
 thread-stop   3

 gc-begin  0
 gc-end0
 mem-pool-gc-begin 0
 mem-pool-gc-end   0
 thread-start  4
 thread-stop   4

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] How to drill down cause of cross-calls in the kernel? (output provided)

2009-09-23 Thread Jim Mauro

I'm cross-posting to zfs-discuss, as this is now more of a ZFS
query than a dtrace query at this point, and I'm  not sure if all the ZFS
experts are listening on dtrace-discuss (although they probably
are... :^).

The only thing that jumps out at me is the ARC size - 53.4GB, or
most of your 64GB of RAM. This in-and-of-itself is not necessarily
a bad thing - if there are no other memory consumers, let ZFS cache
data in the ARC. But if something is coming along to flush dirty
ARC pages periodically

I found what looked to be an applicable bug;
CR 6699438 zfs induces crosscall storm under heavy mapped sequential 
read workload

but the stack signature for the above bug is different than yours, and
it doesn't sound like your workload is doing mmap'd sequential reads.
That said, I would be curious to know if your workload used mmap(),
versus read/write?

For the ZFS folks just seeing this, here's the stack frame;

 unix`xc_do_call+0x8f
 unix`xc_wait_sync+0x36
 unix`x86pte_invalidate_pfn+0x135
 unix`hat_pte_unmap+0xa9
 unix`hat_unload_callback+0x109
 unix`hat_unload+0x2a
 unix`segkmem_free_vn+0x82
 unix`segkmem_zio_free+0x10
 genunix`vmem_xfree+0xee
 genunix`vmem_free+0x28
 genunix`kmem_slab_destroy+0x80
 genunix`kmem_slab_free+0x1be
 genunix`kmem_magazine_destroy+0x54
 genunix`kmem_depot_ws_reap+0x4d
 genunix`taskq_thread+0xbc
 unix`thread_start+0x8

Let's see what the fsstat and zpool iostat data looks like when this
starts happening..

Thanks,
/jim


Jim Leonard wrote:

It would also be interesting to see some snapshots
of the ZFS arc kstats

kstat -n arcstats



Here you go, although I didn't see anything jump out (massive amounts of cache 
misses or something).  Any immediate trouble spot?

# kstat -n arcstats
module: zfs instance: 0
name:   arcstatsclass:misc
c   53490612870
c_max   67636535296
c_min   8454566912
crtime  212.955493179
deleted 7267003
demand_data_hits179708165
demand_data_misses  189797
demand_metadata_hits9959277
demand_metadata_misses  194228
evict_skip  1709
hash_chain_max  9
hash_chains 205513
hash_collisions 9372169
hash_elements   851634
hash_elements_max   886509
hdr_size143082240
hits198822346
l2_abort_lowmem 0
l2_cksum_bad0
l2_evict_lock_retry 0
l2_evict_reading0
l2_feeds0
l2_free_on_write0
l2_hdr_size 0
l2_hits 0
l2_io_error 0
l2_misses   0
l2_rw_clash 0
l2_size 0
l2_writes_done  0
l2_writes_error 0
l2_writes_hdr_miss  0
l2_writes_sent  0
memory_throttle_count   0
mfu_ghost_hits  236508
mfu_hits165895558
misses  388618
mru_ghost_hits  70149
mru_hits24777390
mutex_miss  6094
p   49175731760
prefetch_data_hits  7993639
prefetch_data_misses370
prefetch_metadata_hits  1161265
prefetch_metadata_misses4223
recycle_miss7149
size53490565328
snaptime5759009.53378144
  

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] How to drill down cause of cross-calls in the kernel? (output provided)

2009-09-23 Thread Jim Mauro

(posted to zfs-discuss)

Hmmm...this is nothing in terms of load.
So you say that the system becomes sluggish/unresponsive
periodically, and you noticed the xcall storm when that
happens, correct?

Refresh my memory - what is the frequency and duration
of the sluggish cycles?

Could you capture a kernel profile during a sluggish cycle;

#dtrace -n 'profile-997hz / arg0  curthread-t_pri != -1 / { 
@[stack()]=count(); } tick-1sec { trunc(@,10); printa(@); clear(@); }'


And/or -

#lockstat -i997 -kIW -s 10 sleep 30  lockstat.kprof.out

And

#lockstat -Cc sleep 30  lockstat.locks.out

Thanks,
/jim

Jim Leonard wrote:

Can you gather some ZFS IO statistics, like
fsstat zfs 1 for a minute or so.
  


Here is a snapshot from when it is exhibiting the behavior:

 new  name   name  attr  attr lookup rddir  read read  write write
 file remov  chng   get   setops   ops   ops bytes   ops bytes
0 0 075 0  0 0 0 010 1.25M zfs
0 0 083 0  0 0 0 0 7  896K zfs
0 0 078 0  0 0 0 013 1.62M zfs
0 0 0   229 0  0 0 0 029 3.62M zfs
0 0 0   217 0  0 0 0 028 3.37M zfs
0 0 0   212 0  0 0 0 026 3.03M zfs
0 0 0   151 0  0 0 0 018 2.07M zfs
0 0 0   184 0  0 0 0 031 3.41M zfs
0 0 0   187 0  0 0 0 032 2.74M zfs
0 0 0   219 0  0 0 0 024 2.61M zfs
0 0 0   222 0  0 0 0 029 3.29M zfs
0 0 0   206 0  0 0 0 029 3.26M zfs
0 0 0   205 0  0 0 0 019 2.26M zfs

Unless attr_get is ludicrously costly, I can't see any issues...?
  

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] How to drill down cause of cross-calls in the kernel? (output provided)

2009-09-22 Thread Jim Mauro


dtrace -n ':::xcalls { @s[stack()] = count() } tick-1sec { trunc(@s,10); 
printa(@s); clear(@s); }'



That will tell us where the xcalls are coming from in the kernel,
and we can go from there.

Thanks,
/jim


Jim Leonard wrote:

We have a 16-core x86 system that, at seemingly random intervals, will 
completely stop responding for several seconds.  Running an mpstat 1 showed 
something horrifiying:

CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 1004691 397 170 0 0 0 5 0 0 0 100 0 0
(rest of CPUs omitted)

That's over a million cross-calls a second.  Seeing them on CPU0 made me 
nervous that they were kernel-related, so I wrote a dtrace to print out xcalls 
per second aggregated by PID to see if a specific process was the culprit.  
Here's the output during another random system outage:

2009 Sep 22 12:51:49, load average: 5.90, 5.35, 5.39   xcalls: 637511

   PIDXCALLCOUNT
   616415
   616515
   28339   26
   0   637455

PID 0 is sched (aka the kernel).

At this point I'm completely stumped as to what could be causing this.  Any 
hints or ideas?
  

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] How to drill down cause of cross-calls in the kernel? (output provided)

2009-09-22 Thread Jim Mauro

As Dan said, it looks like ZFS is busy.

How much RAM is on this system?
What release of Solaris?
Do you have any ZFS tweaks in /etc/system?
(like clamping the ARC size...)
Is the system memory constrained?

The xcalls are due to the page unmaps out of
what I'm assuming is the ZFS ARC (although
I'm not 100% sure about that - I need to check
the source for segkmem_zio_free())

Can you gather some ZFS IO statistics, like
fsstat zfs 1 for a minute or so.

Thanks,
/jim


Jim Leonard wrote:

Thanks for the awesome two-liner, I'd been struggling with 1-second intervals 
without a full-blown script.

I modified it to output walltime so that I could zoom in on the problem, and 
here it is:

  unix`xc_do_call+0x8f
  unix`xc_wait_sync+0x36
  unix`x86pte_invalidate_pfn+0x135
  unix`hat_pte_unmap+0xa9
  unix`hat_unload_callback+0x109
  unix`hat_unload+0x2a
  unix`segkmem_free_vn+0x82
  unix`segkmem_zio_free+0x10
  genunix`vmem_xfree+0xee
  genunix`vmem_free+0x28
  genunix`kmem_slab_destroy+0x80
  genunix`kmem_slab_free+0x1be
  genunix`kmem_magazine_destroy+0x54
  genunix`kmem_depot_ws_reap+0x4d
  genunix`taskq_thread+0xbc
  unix`thread_start+0x8
   418560

Any ideas?
  

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] How to drill down cause of cross-calls in the kernel? (output provided)

2009-09-22 Thread Jim Mauro

It would also be interesting to see some snapshots
of the ZFS arc kstats

kstat -n arcstats

Thanks

Jim Leonard wrote:

Thanks for the awesome two-liner, I'd been struggling with 1-second intervals 
without a full-blown script.

I modified it to output walltime so that I could zoom in on the problem, and 
here it is:

  unix`xc_do_call+0x8f
  unix`xc_wait_sync+0x36
  unix`x86pte_invalidate_pfn+0x135
  unix`hat_pte_unmap+0xa9
  unix`hat_unload_callback+0x109
  unix`hat_unload+0x2a
  unix`segkmem_free_vn+0x82
  unix`segkmem_zio_free+0x10
  genunix`vmem_xfree+0xee
  genunix`vmem_free+0x28
  genunix`kmem_slab_destroy+0x80
  genunix`kmem_slab_free+0x1be
  genunix`kmem_magazine_destroy+0x54
  genunix`kmem_depot_ws_reap+0x4d
  genunix`taskq_thread+0xbc
  unix`thread_start+0x8
   418560

Any ideas?
  

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] Pass argument to dtrace script for use as predicate

2009-09-12 Thread Jim Mauro

From http://wikis.sun.com/display/DTrace/Scripting;

If you want your D macro arguments to be interpreted as string tokens 
even if they match the form of an integer or identifier, prefix the 
macro variable or argument name with two leading dollar signs (for 
example, $$1) to force the D compiler to interpret the argument value as 
if it were a string surrounded by double quotes. All the usual D string 
escape sequences (see Table 2–5) are expanded inside of any string macro 
arguments, regardless of whether they are referenced using the $arg or 
$$arg form of the macro. If the defaultargs option is set, unspecified 
arguments that are referenced with the $$arg form have the value of the 
empty string ().



If you're passing a string, use $$1.

/jim

Stathis Kamperis wrote:

2009/9/12 Stathis Kamperis ekamp...@gmail.com:
  

2. If I leave $1 without double quotes  and the execname doesn't have
a dot in it, it fails with:
$ pfexec ./lala.d temp
dtrace: failed to compile script ./lala.d: line 11: failed to resolve
temp: Unknown variable name


Ok, I got lucky with google this time:
http://solaris.reys.net/passing-command-line-parameters-to-dtrace-scripts

I have to write:
$ pfexec ./lala.d 'temp'

for dtrace to treat it as string argument.

  

3. If I surround $1 with double quotes, it fails not matter what the
supplied argument is:
$ pfexec ./lala.d temp.sh
dtrace: failed to compile script ./lala.d: line 31: extraneous
argument 'temp.sh' ($1 is not referenced)
$ pfexec ./lala.d temp
dtrace: failed to compile script ./lala.d: line 31: extraneous
argument 'temp' ($1 is not referenced



I still can't figure this out. Does $1 inside a script make dtrace
perceive it as a literal string '$1' ?

Cheers,
Stathis
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org
  

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


[dtrace-discuss] LISA '09 Tutorials

2009-08-21 Thread Jim Mauro

In the shameless plug category

I have two tutorials scheduled for the Usenix LISA '09
conference, running in Baltimore, Md, Nov 1-6, 2009.

Sunday, Nov 1, is a full day DTrace tutorial.
Monday, Nov 2, is a full day Solaris/OpenSolaris Performance,
Observability and Tools tutorial.

And of course, Richard Elling has a full day ZFS
tutorial on Monday, Nov 2 (unfortunately, it does
overlap with the Solaris performance tutorial).

Jeff Victor is doing a full day of Resource Management
with Solaris Containers on Thursday, Nov 5.

Come one, come all!

Thanks,
/jim

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] Check that memory is filled by zero or not after process dies

2009-08-18 Thread Jim Mauro

You're actually asking multiple questions here, because in order
to verify if a particular operating system zero-fills memory pages
when they are freed from an address space, you'd need to first know
which kernel function is called to zero-fill the pages, right?

I created a simple DTrace script to trace all kernel functions
executed when a process exits;

-- ex.d ---
#!/usr/sbin/dtrace -s

#pragma D option flowindent

proc::proc_exit:exit
/ pid == $target /
{
   self-trace = 1;
}

fbt:::
/ self-trace /
{
}
--

This script uses the DTrace proc provider and enables the exit
probe, which fires when a process exits. The predicate following
the probe, / pid == $target /, tests that the PID is one I'm interested in.
In this case, I'm running this script as (the script name is ex.d);

#ex.d -c ./mem

The -c flag instructs DTrace to plug the PID of the command
into the $target macro. Very handy. The mem program is a
quick 10 line C program I wrote that malloc's 50MB of RAM,
touches every page, and exits.

So when the proc::proc_exit:exit probe fires, and
the predicate evaluates TRUE, meaning it's the PID
I'm interested in, we set a thread local variable called
trace to 1.

The second probe entry, fbt:::, basically enables every fbt
provider probe. The fbt provider allows us to instrument the
entry and return point for pretty much every kernel function.

With the trace flag set, and the flowindent option set in the script,
when the process exits, I get an easy-to-read list of kernel functions
called.

None of this answers the question are memory pages zero filled
when they are freed, because I don't (or at least did not) know the
name of the kernel function the zero-fills memory pages.

When I went through the kernel calls, nothing really jumped out at
me as a zero-fill function. It was clear I was freeing memory -
I saw as_free() (free an address space) which loops through the
segments in the address space, calling the segment-specific
unmap code, and eventually page_free(). But nothing that looked
like it was zero'ing out freed pages. So...

I know for sure that allocation of anonymous memory uses a ZFOD
(zero fill on demand) mechanism, so I decided to have a look at that.
DTrace makes this so easy - I have three probes called zfod available,
that each reside in a different kernel function - anon_map_createpages(),
anon_zero() and anon_map_getpages(). So I did some dtrace'ing on these
functions to see what they called, and discovered pagezero(), which sounded
a lot to me like a zero fill a memory page function. A quick look at
the source code verified this. So

I know empirically that memory pages are zero-filled when allocated
(anon memory pages that is);

# dtrace -n 'fbt::pagezero:entry / pid == $target / { @c = count(); }' 
-c ./mem

dtrace: description 'fbt::pagezero:entry ' matched 1 probe
dtrace: pid 5016 has exited

6642

In the dtrace above, I enabled the entry point to the kernel
pagezero() function, and used the DTrace count() aggregating
function to count the number of times I call it. I know my
mem program allocates 50MB, which is 6400 8k memory pages.
I see I called pagezero() 6642 times, which aligns pretty well with
what I expected (other anon allocations happen when you start
a process beyond my 50MB heap segment).

As for process exit and freed pages, I have not found any
indication that freed pages are zero'd. Mind you, this does not
represent an exhaustive study (I only spent a few minutes on it).

HTH,
/jim



Dmitry Afanasyev wrote:

Hello.

I've got a scrip for systemtap and Linux kernel, that check if physical memory 
is filled by zero after process dies. The scrip is attached.

The probe is set in the free_pages_bulk function, which frees number of memory 
pages using _free_one_page. I can't set probe in _free_one_page, because it is 
inline and systemtap doesn't support it. The result is the address of first 
dirty(not filled by zero) page in the bulk.

I need the same script for Solaris kernel. I guess that DTrace provides the 
same functionality as systemtap, but I do not familiar with Solaris kernel at 
all.
  



___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org
  

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] CPU time and IO time

2009-08-17 Thread Jim Mauro


The easiest way to do this is using the sleep/wakeup
probes in the sched provider. From the process/thread
perspective, once they issue an IO, they sit on a sleep
queue until the IO is completed, at which point they're
issued a wakeup.

io:start/io:::done is usefull for a system view, but that's
not what you're interested in.

You could also enable probes at the system call level,
and time things likd syscall::read:entry to syscall::read:return,
using predicates as needed to make sure you're measuring
what you expect.

I find your last sentence confusing;

because the io event might take a long time to finish, before
which, the program would have gone back to the cpu

The thread that issued the IO will not go back on-cpu until
the IO has been completed, unless the thread is using the
AIO (async IO) facility.

Thanks,
/jim


Hi,

  I am trying to characterize some benchmark programs using dtrace. The 
first step is to determine the time that a program spends on the CPU (doing 
computations), and the time it spends doing IO.  The cpu time can be easily 
determined using the on-cpu and off-cpu probes. I am having trouble figuring 
out the io time. Using the io:start and io:done probes will not give me the 
correct numbers, because the io event might take a long time to finish, before 
which, the program would have gone back to the cpu..Any ideas?

Thanks.
  

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] track program time - need some guidelines

2009-07-13 Thread Jim Mauro

Threads are put to sleep when they issue blocking system calls.
I'm not sure exactly what you mean by only know synchronous
app wait from an app perspective, but the threads will be in the
sleep state (SLP) after issuing the read/write until the data is
available, at which point a wakeup is issued and the thread goes to
runnable.

You're conspicuously avoiding answering the what problem are you
trying to solve question, which has been asked multiple times.

This is no longer a DTrace question, so any further queries should get
posted to the performance forum.

/jim


tester wrote:

Thanks everyone. A quick question, where would be the time be accounted for a 
thread that wait for a write/read syscall to a block device? I would only like 
to know the synchronous app wait from an app  perspective irrespective of all 
the async activities that supports it. I havn't looked at Sun studio yet since 
I don't think I can install it our env. Time for me go and re-read the thread 
again,I guess.

Thanks again.
  

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] Question about ID from trace

2009-07-10 Thread Jim Mauro

It's the ID of the probe, not the provider.

/jim


Andrea Cucciarre' wrote:

I guess that the ID you see it's the ID of the provider not the PID

On 07/10/09 16:01, Robert Alatalo wrote:

Hello,

Trying to track down what application is causing the system to 
reboot by turning the uadmin system call's reboot into a panic, we 
are running the following script:


-x-x- start of uadmin.d -x-x-

#!/usr/sbin/dtrace -s

#pragma D option destructive

syscall::uadmin:entry
/arg0 == 1  uid == 0/
{
  panic();
}

syscall::uadmin:entry
{
   trace(I am not root)
}
-x-x- end of uadmin.d -x-x-


We are getting a string of output:


dtrace: script '/usr/local/bin/uadmin.d' matched 2 probes
dtrace: allowing destructive actions
CPU IDFUNCTION:NAME
258   8063 uadmin:entry   I am not root
258   8063 uadmin:entry   I am not root
258   8063 uadmin:entry   I am not root
258   8063 uadmin:entry   I am not root
258   8063 uadmin:entry   I am not root
261   8063 uadmin:entry   I am not root
256   8063 uadmin:entry   I am not root
256   8063 uadmin:entry   I am not root
259   8063 uadmin:entry   I am not root
257   8063 uadmin:entry   I am not root
257   8063 uadmin:entry   I am not root
256   8063 uadmin:entry   I am not root
256   8063 uadmin:entry   I am not root
256   8063 uadmin:entry   I am not root
257   8063 uadmin:entry   I am not root
258   8063 uadmin:entry   I am not root
...
...

I thought that the ID should be a PID, but
# ps -ef|grep 8063

dores not find it

Any suggestions on how to track down this rouge entity which is 
futilely  trying to do something with uadmin?


thanks,
Robert







___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] dtrace performance overhead

2009-06-03 Thread Jim Mauro


Hi (ummm, Tester?) -

First and foremost, what's the errno on the fork failure?
99% of the time, the errno information is enough to figure out
why forks are failing.

Second, make sure you look in /var/adm/messages - if fork
is failing because of a system resource issue, you'll often
get a syslog message that tells you what's wrong.

You really should not need to generate a kernel function call
trace to root-cause the failure of any system call, including fork.
It's like using an MRI and studying the resuling film when the
problem is a small cut that just needs some disinfectint and
a band-aid. You're looking much deeper than you need to.

Thanks,
/jim


tester wrote:

Jim,

Thanks. You are right, I was using the specopen.d, but looking for fork errors instead of open. I did not know that probe has to fire before predicate gets evaluated. It now makes sense for 40% increase in load during dtracing.  I would like to see the code path during a fork failure (and ultimately the reason), do you have any suggestion without incurring a significant overhead. We have an application that forks around 1000+ processes in 5-10 seconds and occassionaly the fork fails. I am think it is from lack of VM, but also have a feeling that there are some hardcoded limits in the kernel. There was plenty of RAM available duiring the fork failure. 


Thanks
  

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] dtrace performance overhead

2009-06-03 Thread Jim Mauro

Try this;

#!/usr/sbin/dtrace -s

#pragma D option quiet

extern int errno;

syscall::forkall:return,
syscall::vfork:return,
syscall::forksys:return,
syscall::fork1:return
/ arg0 == -1 || arg1 == -1 /
{
   printf(FORKED FAILED, errno: %d, arg0: %d, arg1: %d\n,errno, arg0, 
arg1);

}
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] dtrace performance overhead

2009-06-03 Thread Jim Mauro

not enough space indicates an errno 28 ENOSPC, which isn't
listed is the fork man page under ERRORS. Are you sure it's
fork(2) that's failing?

It may be errno 12, ENOMEM.

So what does a general memory health profile of the system
look like? Lots of free memory? Plenty of swap space?

How about the process executing the forks - is it a
32-bit or 64-bit process? If 32-bit, is it hitting the limits
of it's address space?

/jim

tester wrote:

Hi Jim,

The app software doesn't poduce a errno  in its logs (bad software, although from a 
leading vendor, I think they inherited it, but a error string says not enough 
space I tried grepping some of the header files but could not find a match.

/var/adm/messages: that's the first thing I looked at, but no help there either.

I tried to match the error string in src.opensolaris.org, some hits turned up 
but I wasn't confident they were a result of this.

Thank you.
  

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] dtrace performance overhead

2009-06-03 Thread Jim Mauro

D'oh!

Disregard that last question (address space) - my brain
was thinking thread create failures - it's not applicable
to fork failures. My bad.

The system memory and swap space health checks
still apply, as well as process count -
grab some sar -v 1 60 samples

/jim


Jim Mauro wrote:

not enough space indicates an errno 28 ENOSPC, which isn't
listed is the fork man page under ERRORS. Are you sure it's
fork(2) that's failing?

It may be errno 12, ENOMEM.

So what does a general memory health profile of the system
look like? Lots of free memory? Plenty of swap space?

How about the process executing the forks - is it a
32-bit or 64-bit process? If 32-bit, is it hitting the limits
of it's address space?

/jim

tester wrote:

Hi Jim,

The app software doesn't poduce a errno  in its logs (bad software, 
although from a leading vendor, I think they inherited it, but a 
error string says not enough space I tried grepping some of the 
header files but could not find a match.


/var/adm/messages: that's the first thing I looked at, but no help 
there either.


I tried to match the error string in src.opensolaris.org, some hits 
turned up but I wasn't confident they were a result of this.


Thank you.
  

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] dtrace performance overhead

2009-06-02 Thread Jim Mauro

Which example are you using, specopen.d, /*the script
that instruments every fbt probe*/?

Please post or be more precise about which script you're using.

If you're using specopen.d, than you're enabling on the
order of 30,000 probes. That's going to add up, even at
the very reasonable cost of about 1 microsecond per probe
(rough approximation).

In general, the cost of using DTrace in terms of probe effect
is directly proprotional to two things;
- the number of probes you enable.
- the activity of the code path of the enabled probe(s).

In other words, if you instrument the fork system call,
and the system is not executing forks, you won't feel a thing.
If, on the other hand, the system is going 1000 forks per second,
you'll feel it.

The specopen.d script enables all the fbt probes. Even with
the predicate ( / self-spec / ), the probe still has to fire before
the predicate gets evaluated.

HTH,
/jim


tester wrote:

Hi,

On a T5220, when using the speculative tracing there is a signifcant increase on system load. I am using the examples from 


http://wikis.sun.com/display/DTrace/Speculative+Tracing

The system  call traced is fork instead of open64. Can that script cause such a 
load? The system itself without the script is moderately loaded (usr+sys 
~30-40).

Also, I have to bump up the number of speculative buffers to 10 to avoid some 
of the warnings. What is a good rule to set the number of speculative buffers? 
In this case is it based on the number of fork calls that could be 
simultaneously traced?

Thanks
  

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] dtrace performance overhead

2009-06-02 Thread Jim Mauro

I'm sorry, but I am unable to parse this.

What is the question here?

Thanks,
/jim


tester wrote:

counting system call process during this interval: Dtrace came on top

ioctl dtrace 10609

I am sure if that is from the speculative dtrace script or the script used to 
count the system calls.

Thanks
  

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] dtrace performance overhead

2009-06-02 Thread Jim Mauro

Ah, OK - I think I get it.

tester wrote:

counting system call process during this interval: Dtrace came on top

ioctl dtrace 10609
  


Got  it. DTrace executed 10,609 system calls during your sampling period,
more than any other process. I often filter dtrace out in a predicate;

/ execname != dtrace /


I am sure if that is from the speculative dtrace script or the script used to 
count the system calls.
  


Both. If you want to know precisely which one, you can use pid
as an aggregation key in the count aggregation, and correlate the
pid to the script you're starting. I'm not sure what you're chasing
here though...

/jim


Thanks
  

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] DBWR write performance

2009-05-26 Thread Jim Mauro

I would start with lockstat to determine if there's RW lock contention
(and lockstat is a DTrace consumer).

#lockstat -e4-7,34-35 sleep 60  /var/tmp/rwlocks.out

The above will collect events on kernel reader/writer locks
(run lockstat -h to get a description of each event).

With that data, we can see if there are long RW lock hold
times, and the VxFS kernel routine issuing the call.
From that output, we should be able to write a D script
that does what you're interested in doing - per-file RW lock
hold time tracking. I'm thinking about this now, and
will try to get this written for UFS (since I do not have
VxFS available).

Let us know what the lockstat output looks like, and
we'll go from there.

Thanks,
/jim


Michael Jessen wrote:
I'm an Oracle DBA, a dtrace noob, and I'm attempting to resolve an Oracle write performance problem on Solaris 10, Oracle 9iR2. 

The Oracle DBWR processes have poor write throughput writing to VxFS files. I suspect that the problem may be with POSIX readers/writer file locking. 

Ideally I'd like to identify, for a given DBWR process, readers/writer file lock contention on a datafile by datafile basis. 

Any guidance would be greatly appreciated, 


Kind regards, Michael
  

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] fbt with predicates and [O]Sol on VBox

2009-05-18 Thread Jim Mauro

FYI - I just tried this in OpenSolaris 2008.11, running in a
Vbox (2.2) virtual machine. It's noisy without the predicate
for the fbt probes (naturally :^), but it doesn't hang.

(Vbox on a Mac host, FWIW).

Thanks,
/jim


Michael Ernest wrote:

I've been playing with a follow script example and getting a system hang every 
time. Turns out when I try:

syscall::read:entry /execname == ls / { self-tr = 1; }
fbt::: {}
syscall::read:return /self-tr / { self-tr = 0; }

I get (unusable) output.  But when I add /self-tr / to my fbt::: clause, it hangs the instance every time. I've tried this on Sol u6, Sol u7, OSol b110, OSol Nov2008, anywhere from 1-2 GB RAM. I don't a native Solaris instance available today, although I have no reason to believe I'd see this issue on it. 


Any help?
  

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] tcp listen backlog queue

2009-04-14 Thread Jim Mauro

Hey Paul - Add this predicate;

/ arg0 != (conn_t *)0 /

Talk soon...

/jim


Paul Mininni wrote:

Hey everyone-
With this dtrace script;

#!/usr/sbin/dtrace -qs

#pragma D option aggsize=512k
#pragma D option bufsize=512k

fbt::tcp_conn_request:entry
{
this-connp = (conn_t *)arg0;
this-tcpp = (tcp_t *)this-connp-conn_tcp;
printf(tcp q: %d  max: %d\n, this-tcpp-tcp_conn_req_cnt_q, 
this-tcpp-tcp_conn_req_max );
}

I get this error upon incoming tcp connection;

dtrace: error on enabled probe ID 1 (ID 36240: fbt:ip:tcp_conn_request:entry): 
invalid address (0x0) in action #3 at DIF offset 12

Source indicates this...
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/inet/tcp/tcp.c
   5275 tcp_conn_request(void *arg, mblk_t *mp, void *arg2)
   5276 {
...
   5288 conn_t  *connp = (conn_t *)arg;
   5289 tcp_t   *tcp = connp-conn_tcp;
...
   5300 if (tcp-tcp_conn_req_cnt_q = tcp-tcp_conn_req_max) {
...

Anyone have an idea why this error shows up?
Thanx
-Paul
  

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] Applications I/O

2009-04-02 Thread Jim Mauro

You're tripping over the fact the these disk IOs are happening
asynchronously to the process/thread that initiated them.
The dd(1) process has long since been placed on a sleep
queue by the time you're hitting the ARC code, which is why
execname is sched (the execname of PID 0  - the user
process is not running - it's not on-cpu when the probe fires).

If you want to look at the disk IO load being generated by
an application, use the syscall provider, or the fsinfo
provider. You can't chase IO's and correlate them to
user processes once you get below fsinfo - they're off
CPU and sleeping while the kernel goes off and deals with
the disk IOs.

The IO size will typically change as you move down the
stack, unless the application issues IOs that align with the
underlying file system record size, but even then things
like prefetching will result in the physical IO size not necessarily
aligning with the number of bytes the application tried to
read or write.

Starting at the bottom - if you want to monitor physical disk
IO rates, use the DTrace io provider, and/or iostat.

If you want to track IO rates through a file system,
use the DTrace fsinfo provider, and/or the fsstat command.

If you want to track disk IO's generating by applications,
start with the syscall provider. It's possible to connect
the dots up and down the stack, and there should be
examples of that in iotop in Brendan's DTraceToolkit.

Thanks,
/jim


Ferrand wrote:

Hye,

I still try to access application side I/O size on a ZFS file system.

( example : if /var/zfs_fs is a ZFS file system
/var/zfs_fs:# dd if=/dev/urandom of=file bs=4k count=100

i try to get an information simple as
execname == dd
block_size = 4096

in order to trace a specific application I/O profile

I've tried with both fbt::arc_access, fbt::arc_write, fbt::arc_buf_evict, 
fbt::arc_change_state, fbt::zio_create, ..., but i can't retrieve this information.

Some times, block_size doesn't match 4k (e.g 128k, ..., or 0 with the 
zio_create probe)
Some times, execname is sched and hide the application name wich initiate the 
I/O.

I've read a lot of the ZFS ans ARC source code, but can't find any DTrace probe 
that provides me those informations.
Please, need some Help

R.
  

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] Applications I/O

2009-04-02 Thread Jim Mauro




provider. You can't chase IO's and correlate them to
user processes once you get below fsinfo - they're off
CPU and sleeping while the kernel goes off and deals with
the disk IOs.


I need to make this clearer. With DTrace, you can correlate
disk IO load to applications, it's just not as straight-forward
as tracking execname in the io provider, or somewhere
else down in the bowels of the kernel's IO stack, because
of the aforementioned asynchronous nature of disk IO.

Thanks,
/jim

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] fsflush writes very slow

2009-03-30 Thread Jim Mauro

Cross-posted to perf-discuss.

You can't change the write behavior of the app without
changing the app itself. The code would need to be modified
to issue fsync() calls on the file(s), or open the files for
synchronous writes (O_SYNC | O_DSYNC flags).

fsflush will run, by default, once per second, which is the
fastest rate it can be tuned for.

The actual size of the IO is driven by several factors. fsflush
is is writing 4kB or 8kB because the base page size on x86
is 4k, so fsflush is flushing 1 or 2 pages at a time.

You can use truss or dtrace to determine what size writes the
application is issuing;

#dtrace -n 'syscall::write:entry { @sizes = quantize(arg2); }'

I assume the application is issuing 32k writes, which would correspond
to what you observed with dtrace.

If you're looking to optimize write performance, and need to
ensure data is commited to disk, you could turn directio on
for the file system, which will bypass the page cache and
write straight to disk. This can really help, or it can really hurt,
depending on to what extent hitting in the page cache is
helping performance (like on the read side).

Getting to your specific issue, 6-8MB/sec does not sound like much,
but what if that's all the application is generating? In other words,
using a different load (dd), you observed much higher bandwidth
with a large, sequential write, so you know the underlying platform
is capable of delivering more bandwidth. At the application level,
the app is doing smaller, random writes.

What does the iostat -zxnd 1 data look like? Are you getting
good latency? If the app is generating 32k IOs, and you're getting
6-8MB/sec (lets call it 7), that means your generating (7MB / 32k)
about 220 IOPS. Depending on what the underlying disk storage is,
this may be a very good number, but I need to know more about
the disk(s) the UFS is sitting on.

Thanks,
/jim




K Kelley wrote:

I'm troubleshooting an I/O performance problem with one of our applications that does a 
lot of writing, generally blocks just over 32K, sequentially writing large files.  It's a 
Solaris 10 x86 system with UFS disk.  We're often only seeing disk write throughput of 
around 6-8MB/s, even when there is minimal read activity.  Running iosnoop shows that 
most of the physical writes are made by the actual app and average around 32KB.  About 
15% of the data, however, is done by fsflush and only 4 or 8KB at a time.  The write 
throughput for the fsflush writes is about 10% that of the app writes (using the 
DTIME values and aggregating the results to get totals).  CPU resources are 
not a bottleneck.

If I turn off dopageflush the overall rate jumps to 18-20MB/s.  However, this 
would mean that the file data may not get flushed for a very long time, so is 
not a suitable option for production environments.

If I do a dd (to create a file of 1GB, the amount of system memory), even with 
a block size that matches our app, it does much larger block writes, often 1M, 
the overall rate is around 60MB/s, and there were very few writes by fsflush.

Is there any way to cause more of the physical writes to be done by the app 
rather than fsflush (and for that matter, what determines when fsflush does the 
flushing rather than the app?)?
  

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] Introducing DTrace to a newbie audience

2009-03-26 Thread Jim Mauro




# dtrace -qn 'syscall:::exec-success { trace (execname); }'
  

   ^


The exec-success probe is managed by the proc provider, not
the syscall provider. So the probe designation should be;

proc:::exec-success (or just 'exec-success').

(for the newbies).

Thanks,
/jim
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] DTrace and I/O

2009-03-17 Thread Jim Mauro

You're looking at byte counts, not block sizes.

56kb sounds typical for UFS, which uses an 8k block size,
with 1k frags (default), so you'll typically see IO sizes
to/from UFS in multiples of 8k. The actually amount of
IO depends of course on several factors.

You can also just use iostat data. For reads, divide kbytes reads by
reads-per-second to get the physical IO size for a given row of
data. (same for writes).

/jim


Ferrand wrote:

Hye everyboy,

I try to obtain the REAL size of I/O in block or Ko in order to know how ZFS 
manage the block size.

I've tried the io provider with

this-byte_size = args[0]-b_bcount;
this-kibi_byte_size = this-byte_size/1024;

but my results seems strange ( block of 56 ko with UFS ... )

How can i retrieve REAL block size for I/O ?
Thanks
R.
  

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


[dtrace-discuss] Don't Shout at your JBODs

2009-01-05 Thread Jim Mauro

http://www.youtube.com/watch?v=tDacjrSCeq4

I wonder if the inverse is true. If I whisper soothing
words of encouragement at my JBODs, will I get
more IOPS with reduced latency?

:^)

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] disk utilization is over 200%

2008-12-17 Thread Jim Mauro
This is all very oddiostat is historically extremely reliable.
I've never observed stats like that before - zero reads and writes
with a non-zero value in the wait queue (forget utilization when
it comes to disk - it's a useless metric).

IO rates per process are best measured at the VOP layer.
Depending on what version of Solaris you're running, you
can use the fsinfo provider (fsinfo::fop_read:entry,
fsinfo::fop_write:entry). If you don't have the fsinfo
provider, instrument the syscall layer to track reads and writes.

Can we get another sample, using iostat -zxnd 1 20?

Does the application recover from the hang, or does it
remain hung and require kill/restart?

Thanks,
/jim


Jianhua Yang wrote:

 Hello,

 I use Brendan's sysperfstat script to see the overall system 
 performance and found the the disk utilization is over 100:

 15:51:38 14.52 15.01 200.00 24.42 0.00 0.00 83.53 0.00
 15:51:42 11.37 15.01 200.00 25.48 0.00 0.00 88.43 0.00
 -- Utilisation -- -- Saturation --
 Time %CPU %Mem *%Disk* %Net CPU Mem *Disk* Net
 15:51:45 11.01 15.01* 200.00* 12.02 0.00 0.00 *95.03* 0.00
 15:51:48 13.80 15.01 *200.00* 24.87 0.00 0.00 *98.86* 0.00
 15:51:51 9.44 15.01 *200.00* 17.02 0.00 0.00 *102.64* 0.00
 15:51:54 9.49 15.01 *164.59* 9.10 0.00 0.00 *83.75* 0.00
 15:51:57 16.58 15.01 *2.83* 20.46 0.00 0.00 0.00 0.00

 how can I fix this ? is there new verion of this script ?

 my system is X4600-M1 with hardware RAID of
 0+1 = OS disk =72 GB = d0
 2+3 = apps data disk = 146 GB = d2, SVM soft partition with one UFS 
 file system is active
 at that time, iostat showed strange output:
 cpu
 us sy wt id
 13 9 0 78
 extended device statistics
 r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
 0.0 0.0 0.0 0.0 0.0 335.0 0.0 0.0 0 100 md/d30
 0.0 0.0 0.0 0.0 0.0 335.0 0.0 0.0 0 100 md/d40
 0.0 0.0 0.0 0.0 0.0 335.0 0.0 0.0 0 100 md/d52
 0.0 0.0 0.0 0.0 334.0 1.0 0.0 0.0 100 100 c3t2d0
 cpu
 us sy wt id
 10 5 0 85
 extended device statistics
 r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
 0.0 0.0 0.0 0.0 0.0 335.0 0.0 0.0 0 100 md/d30
 0.0 0.0 0.0 0.0 0.0 335.0 0.0 0.0 0 100 md/d40
 0.0 0.0 0.0 0.0 0.0 335.0 0.0 0.0 0 100 md/d52
 0.0 0.0 0.0 0.0 334.0 1.0 0.0 0.0 100 100 c3t2d0
 kr/s  kw/s show 0, but wait is 334,

 at this, the application always hang.

 # dtrace -n 'io:::start { @files[pid, execname, args[2]-fi_pathname] 
 = sum(args[0]-b_bcount); } tick-5sec { exit(); }'
 dtrace: description 'io:::start ' matched 7 probes
 CPU ID FUNCTION:NAME
 8 49675 :tick-5sec

 16189 nTrade 
 /export/data/dbxpt3/logs/ledgers/arinapt3.NTRPT3-MOCA.trans_outmsg.ledger 
 32768
 25456 pt_chmod /export/data/dbxpt3/logs/NTRPT3-MOCA.log 32768
 3 fsflush none 38912
 25418 pt_chmod /export/data/dbxpt3/logs/NTRPT3-MOCA.log 49152
 21372 tail /export/data/dbxpt3/logs/NTRPT3-MOCA.log 65536
 16189 nTrade 
 /export/data/dbxpt3/logs/ledgers/arinapt3.NTRPT3-MOCA.trans_exerep.ledger 
 81920
 16189 nTrade /export/data/dbxpt3/logs/ntrade.imbalances.log 114688
 25419 iostat /export/data/dbxpt3/logs/NTRPT3-MOCA.log 114688
 8018 tail /export/data/dbxpt3/logs/NTRPT3-MOCA.log 131072
 24915 tail /export/data/dbxpt3/logs/NTRPT3-MOCA.log 147456
 16189 nTrade none 207872
 20900 tail /export/data/dbxpt3/logs/NTRPT3-MOCA.log 270336
 0 sched none 782336
 16189 nTrade /export/data/dbxpt3/logs/NTRPT3-MOCA.log 2162688

 the write is about 10MB/s, did the above dtrace script tell the real 
 IO going on at that time ?
 is there a way to find how many IO generate by processes, and how many 
 IO are in the IO wait queue ?
 is there a way to find out the disk RPM besides checking the physical 
 drive ?

 Thanks,

 James Yang
 ---
 This communication may contain confidential and/or privileged information.
 If you are not the intended recipient (or have received this communication
 in error) please notify the sender immediately and destroy this
 communication. Any unauthorized copying, disclosure or distribution of the
 material in this communication is strictly forbidden.

 Deutsche Bank does not render legal or tax advice, and the information
 contained in this communication should not be regarded as such.

 

 ___
 dtrace-discuss mailing list
 dtrace-discuss@opensolaris.org
   
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] Is the nfs dtrace script right (from nfsv3 provider wiki)?

2008-12-10 Thread Jim Mauro
No bug here - we can absolutely use DTrace on MP systems,
reliably and with confidence.

The script output shows some nasty outliers for a small percentage
of the reads and writes happening on the server. Time to take a closer
look at the IO subsystem. I'd start with iostat -znx 1, and see what
the queues look like, IO rates, and service times.

I'd also have a look at the network. Download nicstat and run it
(go to blogs.sun.com and search for nicstat - it's easy to find).

What are you using for an on-disk file system? UFS or ZFS?

/jim



Marcelo Leal wrote:
 Ok, but that is a bug, or should work like that? 
  We can not use dtrace on multiple processors systems?
  Sorry, but i don't get it...
 __
  Leal
 [http://www.eall.com.br/blog]
   
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] Is the nfs dtrace script right (from nfsv3 provider wiki)?

2008-12-09 Thread Jim Mauro
HmmmSomething is certainly wrong. 11 writes at 137k - 275k seconds
(which is where your 1.5M seconds sum is coming from) is bogus.

What version of Solaris is this ('uname -a' and 'cat /etc/release')?

Your running this on an NFS server, right (not client)?

Is this a benchmark? I ask because the file names (file1, file2, etc)
seem like names used in synthetic benchmarks.

I need to take a closer look at the code and figure out how to chase
this. Either there's a bug somewhere, or we're missing something.

Thanks,
/jim


Marcelo Leal wrote:
 Hello Jim!
  Actually i can repeat it... every time i did run some d script to collect 
 some data i got some (how do you call it? nasty :) values. Look:

 Fri Dec  5 10:19:32 BRST 2008
 Fri Dec  5 10:29:34 BRST 2008


 NFSv3 read/write distributions (us):

   read
value  - Distribution - count
2 | 0
4 | 1092
8 |@93773
   16 | 64481
   32 |@@   11713
   64 |@7590
  128 | 1156
  256 | 698
  512 | 1394
 1024 | 1729
 2048 | 805
 4096 |@2732
 8192 |@@@  14893
16384 |@@   9351
32768 |@2988
65536 | 647
   131072 | 119
   262144 | 29
   524288 | 30
  1048576 | 28
  2097152 | 0

   write
value  - Distribution - count
   64 | 0
  128 | 8
  256 | 2418
  512 |@@@  22679
 1024 | 28442
 2048 | 59887
 4096 |@68852
 8192 |@65152
16384 | 32224
32768 |@@   11554
65536 | 3162
   131072 | 1100
   262144 | 446
   524288 | 105
  1048576 | 70
  2097152 | 11
  4194304 | 0
  8388608 | 0
 16777216 | 0
 33554432 | 0
 67108864 | 0
134217728 | 0
268435456 | 0
536870912 | 0
   1073741824 | 0
   2147483648 | 0
   4294967296 | 0
   8589934592 | 0
  17179869184 | 0
  34359738368 | 0
  68719476736 | 0
 137438953472 | 11
 274877906944 | 0

 NFSv3 read/write by host (total us):

   x.16.0.x   4707246
   x.16.0.x  28397213
   x.16.0.x  40901275
   x.16.0.x  68333664
   x.16.0.x  89357734
   x.16.0.x 125890329
   x.16.0.x 127848295
   x.16.0.x 

Re: [dtrace-discuss] Is the nfs dtrace script right (from nfsv3 provider wiki)?

2008-12-09 Thread Jim Mauro
Also (I meant to ask) - are you having performance problems, or
just monitoring with the NFS provider scripts?

Thanks,
/jim


Marcelo Leal wrote:
 Hello Jim, this is not a benchmark. The filenames i did change for privacy...
  This is a NFS server, yes.

 # uname -a
 SunOS test 5.11 snv_89 i86pc i386 i86pc

 # cat /etc/release
Solaris Express Community Edition snv_89 X86
Copyright 2008 Sun Microsystems, Inc.  All Rights Reserved.
 Use is subject to license terms.
   Assembled 06 May 2008
   
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] Is the nfs dtrace script right (from nfsv3 provider wiki)?

2008-12-09 Thread Jim Mauro
D'oh! Good spot Max - feeling sheepish that I missed that.

Marcelo - add the predicate to the done probes as per Max's
message, and let's see where that takes us

Thanks,
/jim


[EMAIL PROTECTED] wrote:
 Hi,
 I have looked at the script, and there is no correspondence between 
 start and done.
 So, I am not sure how this script is supposed to work.
 I think there should be a predicate in the done probes...
 The way the script is written, it assumes that for any start, the done
 that fires after it is for the same noi_xid.

 Current script:

 nfsv3:::op-read-start,
 nfsv3:::op-write-start
 {
start[args[1]-noi_xid] = timestamp;
 }

 nfsv3:::op-read-done,
 nfsv3:::op-write-done
 {
 ...

 Changed script:

 nfsv3:::op-read-start,
 nfsv3:::op-write-start
 {
start[args[1]-noi_xid] = timestamp;
 }

 nfsv3:::op-read-done,
 nfsv3:::op-write-done
 /start[args[1]-noi_xid] != 0/
 {

 That way, you don't have the done probe clause executing
 for id's where the start has not fired first.  (This still does not
 match start/done for a given xid).
 But what do I know...

 max

 Jim Mauro wrote:
 Also (I meant to ask) - are you having performance problems, or
 just monitoring with the NFS provider scripts?

 Thanks,
 /jim


 Marcelo Leal wrote:
  
 Hello Jim, this is not a benchmark. The filenames i did change for 
 privacy...
  This is a NFS server, yes.

 # uname -a
 SunOS test 5.11 snv_89 i86pc i386 i86pc

 # cat /etc/release
Solaris Express Community Edition snv_89 X86
Copyright 2008 Sun Microsystems, Inc.  All Rights Reserved.
 Use is subject to license terms.
   Assembled 06 May 2008
   
 ___
 dtrace-discuss mailing list
 dtrace-discuss@opensolaris.org

   

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] Is the nfs dtrace script right (from nfsv3 provider wiki)?

2008-12-05 Thread Jim Mauro
Are you referring to nfsv3rwsnoop.d?

The TIME(us) value from that script is not a latency measurement,
it's just a time stamp.

If you're referring to a different script, let us know specifically
which script.

/jim


Marcelo Leal wrote:
 Hello there,
  Ten minutes of trace (latency), using the nfs dtrace script from nfsv3 
 provider wiki page, i got total numbers (us) like:
  131175486635
   ???

  thanks!
   
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] Trying to identify writer and/or reason for iowrite.

2008-12-05 Thread Jim Mauro
The problem you're running into is disk IO operations tend to occur
asynchronously to the thread that initiated the IO, so when the IO
provider probe fires, execname shows the process name for PID 0.
This is not uncommon when chasing disk and network IOs. You
need to capture the write further up the stack.

The easiest method for determining which process(es) are writing is
to use either the syscall provider, or the fsinfo provider (depending
on which release of Solaris you're running, fsinfo may not be there).

Use the syscall provider and see which system calls are being used to
generate disk writes - it's probably write(2), but it may be any of;

nv98 dtrace -l -P syscall | grep write
76295syscall   write entry
76296syscall   write return
76497syscall  writev entry
76498syscall  writev return
76597syscall  pwrite entry
76598syscall  pwrite return
76691syscallpwrite64 entry
76692syscallpwrite64 return

#!/usr/sbin/dtrace -s

syscall::write:entry,syscall::writev:entry,syscall::pwrite:entry,syscall::pwrite64:entry
{
@[pid,probefunc] = count();
}

Once you have the correct system call and process name(s), fine tune
the DTrace and grab a user stack();

#!/usr/sbin/dtrace -s

syscall::write:entry
/ pid == PID_OF_INTEREST /
{
@[ustack()] = count();
}

The above assumes it's all write(2) system calls.

You can determine which files by grabbing arg0 when the
probe fires. Depending on which release of Solaris you're
running, you can use the fds array to get the file path;

#!/usr/sbin/dtrace -s

syscall::write:entry
{
@[execname, fds[arg0].fi_pathname] = count();
}


If your version of Solaris is older, and does not have fds available, 
just track arg0
(the file descriptor), and run pfiles on the process to map the file 
descriptor to
the file.

HTH,
/jim



Robert Alatalo wrote:
 Hello,

   Running the iotop dtrace script, we see many entries like

 UIDPID   PPID CMD  DEVICE  MAJ MIN DBYTES
 0  3  0 fsflush  md3  85   3 W  512
 ...
 ...
 0  0  0 schedssd22   118 176 W 80538112
 0  0  0 schedssd18   118 144 W 80585728
 0  0  0 schedssd24   118 192 W 80730624
 0  0  0 schedssd19   118 152 W 80762880
 0  0  0 schedssd23   118 184 W 80764928
 0  0  0 schedssd25   118 200 W 80965632
 0  0  0 schedssd20   118 160 W 81029120
 0  0  0 schedssd21   118 168 W 81132032

 In the iostat we see things like

 device   r/sw/s   kr/s   kw/s wait actv  svc_t  %w  %b
 ...
 ...
 ssd180.0  157.20.0 17914.5  9.2 13.3  142.9  63  75
 ssd190.0  161.40.0 17887.7  9.6 13.7  144.6  65  76
 ssd200.0  166.40.0 17922.0  8.3 12.7  126.1  58  74
 ssd210.0  157.60.0 17880.0  9.3 13.4  144.1  64  75
 ssd220.0  153.30.0 17867.7  9.2 13.4  147.3  63  75
 ssd230.0  154.30.0 17905.6  8.8 13.0  141.5  61  74
 ssd240.0  160.42.1 17915.4  9.2 13.4  141.3  63  75
 ssd250.0  160.70.0 17934.5  9.7 13.8  145.8  66  76


   Can anyone suggest a different script, or help modify the iotop as 
 included in the tookkit so as to either print stack traces or better 
 yet, print stack traces when the %busy goes above some threshhold, or 
 other way of limiting the data to go though?

 thanks in advance,
 Robert
 ___
 dtrace-discuss mailing list
 dtrace-discuss@opensolaris.org
   
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] How to dig deeper

2008-12-04 Thread Jim Mauro

Do you have directio enabled on UFS?
Especially for the redo logs?

With directio enabled, UFS writes to the log do not
serialize on the RW lock for the log file(s).

directio will also bypass the memory cache, so you need
to increase the Oracle db_block_buffers when enabling
UFS directio.

Thanks,
/jim


Hans-Peter wrote:
 Hi Mauro,

 Yes I understand why sysinfo is not the best to measure IO.
 But I just wanted to see when in the whole trace the actual physical write 
 was being done.
 So it seems to me that, because the sysinfo:::pwrite is right at the end the 
 performance bottle neck is more because of the locking etc.

 The database files are on ufs filesystems.
 Is zfs better?

 Regards Hans-Peter
   
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] How to dig deeper

2008-12-04 Thread Jim Mauro
For the record, my friend Phil Harman reminded me that
it's not the log files we care about for directio in terms
of single-writer lock break-up. We care about directio for
redo logs to avoid read-modify-write, which happens when
the write is not memory-page aligned.

Sorry about that.

Nonetheless, you must still pursue getting UFS directio enabled
and working on your system (dealing with the single-writer
lock problem is still important, it's just not directly applicable
to the redo logs).

Thanks,
/jim


Hans-Peter wrote:
 Do you have directio enabled on UFS?
 Especially for the redo logs?
 

 That is the strange thing.
 filesystemio_options has been set to setall which is asynch and directio.
 But when I dtrace the fbt calls I see  only directio calls for the database 
 writer
 but not for the log writer.

 Perhaps we should mount the filesystems with forcedirectio.
 Would noatime also help?

 Regards

   
 With directio enabled, UFS writes to the log do not
 serialize on the RW lock for the log file(s).

 directio will also bypass the memory cache, so you
 need
 to increase the Oracle db_block_buffers when enabling
 UFS directio.

 Thanks,
 /jim


 Hans-Peter wrote:
 
 Hi Mauro,

 Yes I understand why sysinfo is not the best to
   
 measure IO.
 
 But I just wanted to see when in the whole trace
   
 the actual physical write was being done.
 
 So it seems to me that, because the
   
 sysinfo:::pwrite is right at the end the performance
 bottle neck is more because of the locking etc.
 
 The database files are on ufs filesystems.
 Is zfs better?

 Regards Hans-Peter
   
   
 ___
 dtrace-discuss mailing list
 dtrace-discuss@opensolaris.org
 
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] How to dig deeper

2008-12-03 Thread Jim Mauro
%busy is meaningless unless you're looking at a single disk that
can only have 1 outstanding IO in it's active queue, which is to
say %busy is a useless metric for anything disk that's been designed
and built in the last decade.

Ignore %busy. Focus on queue depths and queue service times,
both of which are reported in iotstat.

And again, averages can be misleading. Use the iostat data
as a starting point, but if you really think you have some disk
IO issues, use DTrace and aggregations to get more precise
data on disk IO times.

Thanks,
/jim


Hans-Peter wrote:
 Hello Jim,

 The iostat average are not too bad.
 But the busy% is 100% most of the time.

 regards Hans-Peter
   
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] How to dig deeper

2008-12-03 Thread Jim Mauro

The sysinfo provider isn't the best choice for measuring disk IO
times.

Run;
#dtrace -s /usr/demo/dtrace/iotime.d

/jim


Hans-Peter wrote:
 Hello all,

 I added a clause to my script.
 sysinfo:::
 /self-traceme==1  pid == $1/
 {
 trace(execname);
 printf(sysinfo: timestamp : %d , timestamp);
 }
 A subsequent trace created a file of about 19000 lines.
 I loaded in Excel to be able to subtrace timestamps etc.
 The longest jump in timestamp is between the first pair of savectx and 
 restorectx at line 70.
 I count 50 savectx/restorectx calls in my trace.
 But the actual physical write as indicated by the sysinfo is almost at the 
 end of the file directly after the ufs_write and fop_write call end (line 
 18746).

   24  -  tsd_agent_get   oracle  timestamp   
 1795469946839100
   24  -  ufs_lockfs_top_vop_return   oracle  timestamp   
 1795469946841500
   24  -  ufs_lockfs_top_vop_return   oracle  timestamp   
 1795469946844300
   24  -  ufs_lockfs_end  oracle  timestamp   
 1795469946846700
   24  -  ufs_write   oracle  timestamp   
 1795469946849600
   24  -  fop_write   oracle  timestamp   
 179546994685350057,365,700
   24  |   pwrite  syswriteoracle  sysinfo timestamp   
 1795469946856800
   24  |   pwrite  writech oracle  sysinfo timestamp   
 1795469946860200

 So it seems that the actual write not the bottle neck but 
 I attached a zip file with the excel document.

 Am I right in thinking that is is more an OS issue than a storage issue?

 Regards Hans-Peter
   
 

 ___
 dtrace-discuss mailing list
 dtrace-discuss@opensolaris.org
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] How to dig deeper

2008-12-03 Thread Jim Mauro
Also - since this is Oracle, are the Oracle files
on a file system, or raw devices?
If a file system, which one?

/jim


Jim Mauro wrote:
 The sysinfo provider isn't the best choice for measuring disk IO
 times.

 Run;
 #dtrace -s /usr/demo/dtrace/iotime.d

 /jim


 Hans-Peter wrote:
   
 Hello all,

 I added a clause to my script.
 sysinfo:::
 /self-traceme==1  pid == $1/
 {
 trace(execname);
 printf(sysinfo: timestamp : %d , timestamp);
 }
 A subsequent trace created a file of about 19000 lines.
 I loaded in Excel to be able to subtrace timestamps etc.
 The longest jump in timestamp is between the first pair of savectx and 
 restorectx at line 70.
 I count 50 savectx/restorectx calls in my trace.
 But the actual physical write as indicated by the sysinfo is almost at the 
 end of the file directly after the ufs_write and fop_write call end (line 
 18746).

  24  -  tsd_agent_get   oracle  timestamp   
 1795469946839100
  24  -  ufs_lockfs_top_vop_return   oracle  timestamp   
 1795469946841500
  24  -  ufs_lockfs_top_vop_return   oracle  timestamp   
 1795469946844300
  24  -  ufs_lockfs_end  oracle  timestamp   
 1795469946846700
  24  -  ufs_write   oracle  timestamp   
 1795469946849600
  24  -  fop_write   oracle  timestamp   
 179546994685350057,365,700
  24  |   pwrite  syswriteoracle  sysinfo timestamp   
 1795469946856800
  24  |   pwrite  writech oracle  sysinfo timestamp   
 1795469946860200

 So it seems that the actual write not the bottle neck but 
 I attached a zip file with the excel document.

 Am I right in thinking that is is more an OS issue than a storage issue?

 Regards Hans-Peter
   
 

 ___
 dtrace-discuss mailing list
 dtrace-discuss@opensolaris.org
 
 ___
 dtrace-discuss mailing list
 dtrace-discuss@opensolaris.org
   
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] How to dig deeper

2008-12-02 Thread Jim Mauro
Start with iostat. It's simple, and provides an average of service times
for disk IOs (iostat -xnz 1, the asvc_t column is average service times
in milliseconds). As Jim Litchfield pointed out in a previous thread,
keep in mind it is an average, so you won't see nasty peaks, but if the
average is bad, it's time to talk to the SAN/Storage folks.

Use DTrace with the IO provider to get more precise measurements.
You can borrow from existing scripts in /usr/demo/dtrace to add
time stamps, and use aggregations to capture IO times.

If you need assistance with such a script, please let us know and
we'll be glad to post one (it looks like you're already pretty comfortable
with DTrace...).

Thanks,
/jim

Hans-Peter wrote:
 In order to get more information on IO performance problems I created the 
 script below:
 #!/usr/sbin/dtrace -s
 #pragma D option flowindent
 syscall::*write*:entry
 /pid == $1  guard++ == 0/
 {
 self - ts = timestamp;
 self-traceme = 1;
 printf(fd: %d, arg0);
 }
 fbt:::
 /self-traceme/
 {
 /*  elapsd =timestamp - self - ts;
 printf( elapsed : %d , elapsd); */
 printf( timestamp : %d , timestamp);
 }
 syscall::*write*:return
 /self-traceme/
 {
 self-traceme = 0;
 elapsed=timestamp - self - ts;
 printf( timestamp : %d , timestamp);
 printf(\telapsed : %d , elapsed);
 exit(0);
 }

 I gives me the timestamp for every fbt call during a write system call.
 A snippet is here below
   8  - schedctl_save  timestamp : 1627201334052600
   8- savectx  timestamp : 1627201334053000
   0  - restorectx timestamp : 1627202741110300 
  difference = 1.407.057.300
   0- schedctl_restore timestamp : 1627202741115100
   0- schedctl_restore timestamp : 1627202741116100
 Visible is that the thread of for 1.4 s off cpu.
 Storage is on SAN with fibers between the system and the storage.

 Is it possible to dig deeper with dtrace to see how the HBA's are performing.
 Other suggestions are welcome too.

 Regards Hans-Peter
   
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] Concurrent Disk Activity from Solaris Performance Tools Chap. 4.17.4

2008-11-30 Thread Jim Mauro

Hi Paul -

 One thing I have been puzzled with a lot this weekend is the information and 
 plot in Figure 4.7. This section if I understand it correctly, offers the 
 means to track the actual times from when an IO starts in the kernel to when 
 it completes, implying the time to either read or write from disk or memory 
 cache. 

 I have been using a data file for an Oracle database as the test subject for 
 this work.

 I have several confusion points with this section.

 • The graph mentions ‘strategy’ and ‘biodone’ which seem to imply TNF 
 based data, not output from the DTrace script above Figure 4.7. 
   

The DTrace script uses the IO provider io:::start and io:::done probes.
io:genunix::start enables several probes, including the bdev_strategy 
kernel routine;

nv98 dtrace -n 'io:genunix::start'
dtrace: description 'io:genunix::start' matched 3 probes

CPU ID FUNCTION:NAME
0 22920 bdev_strategy:start
0 22920 bdev_strategy:start
0 22920 bdev_strategy:start

io:genunix::done enables a probe in biodone;

nv98 dtrace -n 'io:genunix::done'
dtrace: description 'io:genunix::done' matched 1 probe
CPU ID FUNCTION:NAME
0 22908 biodone:done


The actions in the dtrace script gather timestamps and block numbers at 
each firing
(assuming the predicates evaluate true).

 • In looking at the data gotten from the DTrace script I see no way to 
 generate the graph of Figure 4.7. Specifically the time difference between 
 ‘0’ and the points for ‘strategy’. With the DTrace script we have the start 
 time of the IO. I see no way to determine some amount of time between ‘start’ 
 and something earlier. The time values on the ‘X’ axis also don’t fall out of 
 the data generated by the DTrace script.
   
The script in the book generates output that looks like this;
nv98 ./iot.d
122065039977,2100,
122065039980,,2100
122070310632,72,
122070310637,,72
122070310833,81,
122070310836,,81
. . .

The value on the left is the timestamp, the value on the right is the 
block number.
The data was imported into a spreadsheet, and the math was done on the 
start time
and stop (done) time for each block, resulting in IO times on a 
per-block basis.

 • How can it be determined if the IO completed from a memory cache or 
 required an I/O to a physical disk? I have a lot of times less than 0.5 ms 
 but also have a fair number that are in the range of 1 ms to 300 ms.
   

The lookup in the file system memory cache happens above the bio layer. 
Put another way,
if we're hitting bdev_strategy, we need to do a disk IO to get the data 
(we already missed
in the cache).

HTH,

Thanks,
/jim


 I modified the script to dump out the size of the I/O being done and that was 
 interesting to see some unexpected lengths.  Also added ‘start’ and ‘end’ to 
 the appropriate lines as a sanity check to make it easier to pair up the 
 entries. Should always have one start/end pair for a block address.

 I have attached an Excel spreadsheet with what I was able to create based on 
 the data collected.

 My thanks for any clarifications to these confusions.

 pdc
   
 

 ___
 dtrace-discuss mailing list
 dtrace-discuss@opensolaris.org
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] Concurrent Disk Activity from Solaris Performance Tools Chap. 4.17.4

2008-11-30 Thread Jim Mauro
Hey Paul - I should add that iostat -xnz 1 is a great method
for determine how well the SAN is performance.
The asvc_t times are disk IO service times in milliseconds.
I usually start there to sanity check disk IO times...

Thanks,
/jim


Paul Clayton wrote:
 Hello..

 Due to  growing performance problems on numerous systems, I have been reading 
 through a lot of information about DTrace and what it can find out about a 
 system. It is a great tool, and while there is a learning curve to using it 
 successfully, getting useful information quickly is helped by such books as 
 the ‘Solaris Performance  Tools’ book.

 In light of us also ramping up new SANs and ever larger SAN fabrics, I have 
 long been wondering what the times are for getting data in/out of the SANs. 
 If we talk with the SAN vendor they do the say the SAN not a problem and we 
 should look elsewhere. Talk with the fabric folks and they say no problems, 
 look elsewhere.

 So, it was with high interest that I have been reading Chapter 4 in the book 
 about disk activity multiple times and trying various commands out on systems 
 here.

 One thing I have been puzzled with a lot this weekend is the information and 
 plot in Figure 4.7. This section if I understand it correctly, offers the 
 means to track the actual times from when an IO starts in the kernel to when 
 it completes, implying the time to either read or write from disk or memory 
 cache. 

 I have been using a data file for an Oracle database as the test subject for 
 this work.

 I have several confusion points with this section.

 • The graph mentions ‘strategy’ and ‘biodone’ which seem to imply TNF 
 based data, not output from the DTrace script above Figure 4.7. 

 • In looking at the data gotten from the DTrace script I see no way to 
 generate the graph of Figure 4.7. Specifically the time difference between 
 ‘0’ and the points for ‘strategy’. With the DTrace script we have the start 
 time of the IO. I see no way to determine some amount of time between ‘start’ 
 and something earlier. The time values on the ‘X’ axis also don’t fall out of 
 the data generated by the DTrace script.

 • How can it be determined if the IO completed from a memory cache or 
 required an I/O to a physical disk? I have a lot of times less than 0.5 ms 
 but also have a fair number that are in the range of 1 ms to 300 ms.

 I modified the script to dump out the size of the I/O being done and that was 
 interesting to see some unexpected lengths.  Also added ‘start’ and ‘end’ to 
 the appropriate lines as a sanity check to make it easier to pair up the 
 entries. Should always have one start/end pair for a block address.

 I have attached an Excel spreadsheet with what I was able to create based on 
 the data collected.

 My thanks for any clarifications to these confusions.

 pdc
   
 

 ___
 dtrace-discuss mailing list
 dtrace-discuss@opensolaris.org
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] truss -fall equivalent in DTrace

2008-11-16 Thread Jim Mauro
What kind of system is this, and what release of Solaris?
Enabling all the probes for all the function entry points in a process
(pid$1:::entry) can take some time, and may make your terminal
window appear hung, but it should not almost hang your system
(unless you did this on a laptop or small, single CPU machine).

If you have thread contention, use the plockstat provider.
Run plockstat -V  and save the output. The -V
flag will generate the actual D that plockstat uses (to stderr if
I remember right). You can save that and use it as a building
block for chasing with threads are hitting your contended locks.

Of course, you may find the output of plockstat -A -p PID is all you
need...

Thanks,
/jim


[EMAIL PROTECTED] wrote:
 I tried this script (attached below) but it printed only 1 thread (shown 
 below). When I tried :::entry and my system was almost hung. I think pstack 
 is good enough for my purpose. :-)

 BTW any script to find out which two threads are causing lock contention 
 /deadlock?

 #!/usr/sbin/dtrace -s
 #pragma D option quiet
 #pragma D option defaultargs

 pid$1:::entry
 / this-thread_is_already_printed != 1 /
 {
this-thread_is_already_printed = 1;
printf(thread %d: \n, tid);
ustack(50);
 }

 $./pstack.d 16028
 thread 11:

  libc.so.1`_lwp_mutex_lock
  libc.so.1`_lwp_cond_reltimedwait+0x78
  libc.so.1`_lwp_cond_timedwait+0x1c
  libjvm.so`__1cHMonitorEwait6Mil_i_+0x328
  libjvm.so`__1cIVMThreadDrun6M_v_+0x1b4
  libjvm.so`__1cG_start6Fpv_0_+0x208
  libc.so.1`_lwp_start 


 --- On Fri, 11/14/08, Adam Leventhal [EMAIL PROTECTED] wrote:

   
 From: Adam Leventhal [EMAIL PROTECTED]
 Subject: Re: [dtrace-discuss] truss -fall equivalent in DTrace
 To: [EMAIL PROTECTED] [EMAIL PROTECTED]
 Cc: Mark Plotnick [EMAIL PROTECTED], dtrace-discuss@opensolaris.org
 Date: Friday, November 14, 2008, 7:32 PM
 On Fri, Nov 14, 2008 at 12:40:55AM -0800,
 [EMAIL PROTECTED] wrote:
 
 Can I get pstack equivalent script using DTrace?
   
 You can use ustack() at any probe.

 Adam

 -- 
 Adam Leventhal, Fishworks
 http://blogs.sun.com/ahl
 


   
 ___
 dtrace-discuss mailing list
 dtrace-discuss@opensolaris.org
   
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] Dtrace command

2008-11-10 Thread Jim Mauro
That's not a query that can be answered in a forum like this.

You need to do some reading. Starting with the docs on the
Wiki site (wikis.sun.com/dtrace). Go to blogs.sun.com, and
search for dtrace, and read. Go to
http://www.solarisinternals.com/wiki/index.php/DTrace_Topics_Intro.
Read through the material posted there. It's designed specifically to
get folks like yourself started on DTrace.

Invest some time, learn what DTrace is and what it can do.
Once you've done that, if you have specific questions about
the use of DTrace in a specific context or even a general question,
we are all happy to help. But we can't post an email response
to showing you the simple way to use DTrace.

Thanks,
/jim


muthamilan wrote:
 Hi All,

 Thanks for your response.
 I’m working on an environment, which having more then 50 Solaris 10 servers. 
 I want use this Dtrace tool for trouble shooting and performance issue .So 
 guide me ,the simple way how  to use this Dtrace.

 Advance thanks.
   
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org

Re: [dtrace-discuss] plockstat: processing aborted: Abort due to systemic unresponsiveness

2008-11-05 Thread Jim Mauro
Check out;

http://opensolaris.org/jive/thread.jspa?messageID=267138#267138

You may be tripping over bug 6507659 (tsc differences between CPU's give
dtrace_gethrtime() serious problems.). It looks like the fix went into
127112-03 (you're running -02).

Best to install the latest patch, but you can work around this
by enabling destructive actions. For plockstat, I think that's done
as;

plockstat -x destructive=1 -A -p 

Thanks,
/jim


Jianhua Yang wrote:

 Hello,

 I need help here about plockstat on X86 platform (Sun X4600 AMD)
 # plockstat -A -p 20034
 *plockstat: processing aborted: Abort due to systemic unresponsiveness*
 # plockstat -e 5 -s 10 -A -x bufsize=100k -x aggsize=20m -p 20034
 *plockstat: processing aborted: Abort due to systemic unresponsiveness*
 # ps -ef | grep 20034
 algodev 20034 1 2 07:00:54 ? 86:17 
 /data/algodev/arina/PCM/7.6.3d/bin/galaxy/pcmsrv_wombat -INSTANCE 
 INST_P_6 -fee
 # uname -a
 SunOS nygeqptsoruat1 5.10 Generic_127112-02 i86pc i386 i86pc

 Thanks,

 James Yang
 Email : [EMAIL PROTECTED]
 ---

 This e-mail may contain confidential and/or privileged information. If 
 you
 are not the intended recipient (or have received this e-mail in error)
 please notify the sender immediately and destroy this e-mail. Any
 unauthorized copying, disclosure or distribution of the material in this
 e-mail is strictly forbidden.

 

 ___
 dtrace-discuss mailing list
 dtrace-discuss@opensolaris.org
   
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] DTrace on Mac OS X

2008-09-02 Thread Jim Mauro

 To All (This is mainly for the Mac DTrace 3, Adam Leventhal, Bryan  
 Cantrill, Mike Shapiro)..
   
We appreciate your desire to go to the source. It's a lot like posting a 
question on
relativity, and indicating you'd really like an answer from Einstein :^)

That said, there's great news - the DTrace community is growing all the 
time,
and there are many, many DTrace experts that can help. So Brian, Mike and
Adam (and Eric, and Brendan, and Jon), can spend more time catching up
on People magazine, tracking alien landings in the National Enquirer, and
playing FishPong

:^)
  From what I have learned from DTrace is that you can monitor the  
 system calls of either the OS probes or individual processes.  
Let's start with terminology here. Systems calls are not issued by the 
OS - application
processes (threads) issue system calls to enter the OS for a privileged 
operation (e.g.
open a file). Please clarify what you mean by OS probes - are you 
referring to DTrace
probes, or OS X IOKit probes?

Are you interested in dtrace'ing dtrace?

I'm not breaking your you-know-whatsI'm sincerely interested in 
making sure
we agree on terminology. The notion of OS probes issuing system calls 
is not
clear to me.
 Also,  
 on the Mac you are able to monitor Objective-C calls by specifying an  
 individual Cocoa application.  For my particular project I need to  
 track 3 tasks:  when any arbitrary application accesses a file, when  
 any web page is accessed, and when any mail client assesses an email.   
 My knowledge of DTrace tells me that I need to create probes for every  
 application that can do these three types of tasks.  Is there another  
 approach that does not need to know every type of application or  
 process that can do these 3 tasks?
   
... any arbitrary application accesses a file - Are you interested in 
tracking access
to one specific file by any possible process?

I assume the machine you're monitoring is running a WEB server, and an 
email server?

There's a couple ways to do this, depending somewhat on your configuration.
For tracking access to a file, you can use the system call provider and 
a predicate
on arg0, which is a pointer to a pathname string.

For the second two items, I'd start with monitoring the httpd process 
and mail server
process with the syscall provider to determine the syscall path to the 
patterns you're
interested in. Once you've established that, you can whittle all this 
down to a
relatively simple script that does what you want. Make sense?

(In case it does not) - In other words, as a general methodology, I 
sometimes use
dtrace to get a broad view of what a particular application process is 
doing, e.g.
grab a system call profile of a httpd process, to establish the specific 
calls (and, in
some cases, args) used when something of interest is happening (e.g. 
accessing
a static WEB page). From there I use that information to create a dtrace 
script
that is intended to hone-in on that particular flow of activity.

HTH,
/jim


 I appreciate any feedback one could give.

 Thanks in advanced,

 Blake
 ___
 dtrace-discuss mailing list
 dtrace-discuss@opensolaris.org
   
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] DTrace on Mac OS X

2008-09-02 Thread Jim Mauro

 and there are many, many DTrace experts that can help. So Brian, Mike and
   
Sorry; s/Brian/Bryan


___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] DTrace on Mac OS X

2008-09-02 Thread Jim Mauro
I'm not sure I can add anything beyond Bryan's response.

I'm not an expert in WEB servers or the multimedia server stack
(other software components that get integrated for building these
cool sites), but I would first look at add-ons, compiler flags, options,
etc, for the actual software components running the service. It's not
unusual to find additional logging/tracking features that can be enabled.

I see DTrace as a fit here to help you understand how to build the
software you're interested in building (which is what Bryan suggested
in his second reason response), but not as a component of the final
application.

If you have specific questions on how to do that, let's continue
the dialogue.

Thanks,
/jim


Blake Sawyer wrote:
 Jim,

 Thanks so much for your help.  Also thanks for understanding my 
 terminology mistakes, I am new to a lot of this.  Let me clear some of 
 these things up by explaining the idea of my project.

 I want to create an indexing application that will work with a 
 location/context aware system.  This application will automatically 
 index media files(documents, video, audio, webpages, mail, etc.) 
 socially and episodically.  For example, in a meeting scenario, the 
 system will track media files accessed and tag them with who attended 
 the meeting and the time and location of the meeting.  I am trying to 
 prove that giving a user the ability to reference files socially and 
 episodically, they will be better at re-contextualization.  You can 
 think of it as a new personal information manager.

 Therefore, the purposes of using DTrace is not to run any kind of test 
 on a web server or email server.  I want this to run on an 
 individual's computer that determines when meaningful media access 
 occurs.  It will then query the location/context aware system to tag 
 the media appropriately.  I am just unsure if DTrace is the right 
 approach here.  This is not really the main purpose of DTrace, but my 
 professors and I agree that this may be a good starting point.  As I 
 mentioned Mac supports AppleScript that can do this, though it is 
 quite inefficient.

 It seems to me that if I want to create an application like this, I 
 would need to monitor each application using DTrace.  This approach 
 would not be very extensible for future versions.

 I hope this clears up what I am trying to accomplish.

 Again, thanks so much for the help.

 -Blake

 On Sep 2, 2008, at  [Sep 2]1:06 PM, Jim Mauro wrote:


 To All (This is mainly for the Mac DTrace 3, Adam Leventhal, Bryan  
 Cantrill, Mike Shapiro)..

 We appreciate your desire to go to the source. It's a lot like 
 posting a question on
 relativity, and indicating you'd really like an answer from 
 Einstein :^)

 That said, there's great news - the DTrace community is growing all 
 the time,
 and there are many, many DTrace experts that can help. So Brian, Mike 
 and
 Adam (and Eric, and Brendan, and Jon), can spend more time catching up
 on People magazine, tracking alien landings in the National Enquirer, 
 and
 playing FishPong

 :^)
 From what I have learned from DTrace is that you can monitor the  
 system calls of either the OS probes or individual processes.
 Let's start with terminology here. Systems calls are not issued by 
 the OS - application
 processes (threads) issue system calls to enter the OS for a 
 privileged operation (e.g.
 open a file). Please clarify what you mean by OS probes - are you 
 referring to DTrace
 probes, or OS X IOKit probes?

 Are you interested in dtrace'ing dtrace?

 I'm not breaking your you-know-whatsI'm sincerely interested in 
 making sure
 we agree on terminology. The notion of OS probes issuing system 
 calls is not
 clear to me.
 Also,  on the Mac you are able to monitor Objective-C calls by 
 specifying an  individual Cocoa application.  For my particular 
 project I need to  track 3 tasks:  when any arbitrary application 
 accesses a file, when  any web page is accessed, and when any mail 
 client assesses an email.   My knowledge of DTrace tells me that I 
 need to create probes for every  application that can do these three 
 types of tasks.  Is there another  approach that does not need to 
 know every type of application or  process that can do these 3 tasks?

 ... any arbitrary application accesses a file - Are you interested 
 in tracking access
 to one specific file by any possible process?

 I assume the machine you're monitoring is running a WEB server, and 
 an email server?

 There's a couple ways to do this, depending somewhat on your 
 configuration.
 For tracking access to a file, you can use the system call provider 
 and a predicate
 on arg0, which is a pointer to a pathname string.

 For the second two items, I'd start with monitoring the httpd process 
 and mail server
 process with the syscall provider to determine the syscall path to 
 the patterns you're
 interested in. Once you've established that, you can whittle all this 
 down to a
 relatively

Re: [dtrace-discuss] Can DTrace tell me if my process is 32 or 64 bit?

2008-07-23 Thread Jim Mauro
You could use curpsinfo-pr_dmodel is a predicate.

probe
/ curpsinfo-pr_dmodel == 1/
{
32-bit process
}

probe
/ curpsinfo-pr_dmodel == 2/
{
64-bit process
}


/jim


Bruce Chapman wrote:
 I had a simple D script that looks for use of the DP_POLL ioctl with a long 
 timeout for any process on the system, but I could not find an easy way to 
 determine whether to use dvpoll or dvpoll32 structures to do so.  An ugly 
 hack based upon pointer location/value works, but is clearly unacceptable 
 (this was in a single probe syscall::ioctl:entry/arg1==0xd001/ ):

 nfds = (dvpoll-dp_nfds  100) ?dvpoll32-dp_nfds : dvpoll-dp_nfds;

 Since I could find no way for DTrace to help me out here I looked at source 
 code and found that dpioctl takes a mode argument that has a bitmask which 
 will indicate DATA_MODEL, so I modified the script to use that, which works 
 and should be stable, but it seems DTrace should provide coders with it's own 
 way of determining the data_model to save people such gyrations.  Also, 
 though this script works fine on S10 FCS SPARC/x86 up to snv_86 (Indiana), it 
 fails on snv_93 (image updated Indiana - it works on a 32 kernel snv_95 
 machine but I haven't found a 64 bit snv_95 to try it on) with :

 dtrace: failed to compile script ./ioctl_DP_POLL.d: line 25: operands have 
 incompatible types: dblk_t != union

 Is this a known DTrace bug in snv_93 or just more restrictive type checking 
 (seems like a bug...if not, can someone get the script to work in snv_93?)

 Anyway, here's the script that works on all S10 versions through snv_86 but 
 is more complicated than it would be if DTrace had a mechanism of letting me 
 know if the binary being traced was 32 or 64 bit :

 #!/usr/sbin/dtrace -s
 #pragma D option quiet

 struct dvpoll*  dvpoll;
 struct dvpoll32  *  dvpoll32;

 BEGIN {
  printf(If timeout is -1 or big, you could get hit by bugid 6724237\n);
  prev_timeout = -2;
  prev_fd = prev_nfds = prev_pid = -1;
 }
 syscall::ioctl:entry /arg1 == 0xd001/ {
  self-fd=arg0;
 }
 fbt:poll:dpioctl:entry {
self-bitsize64 = (arg3  0x0020); /* model.h: DATA_MODEL_LP64 */
 }
 fbt:poll:dpioctl:entry /self-bitsize64/ {
dvpoll=copyin(arg2, sizeof(*dvpoll));
self-nfds = dvpoll-dp_nfds;
self-timeout = dvpoll-dp_timeout;
 }
 fbt:poll:dpioctl:entry /!(self-bitsize64)/ {
dvpoll32=copyin(arg2, sizeof(*dvpoll32));
self-nfds = dvpoll32-dp_nfds; /* DTrace broken in SNV_93? */
self-timeout = dvpoll32-dp_timeout;
 }
 fbt:poll:dpioctl:entry
 / (self-fd != prev_fd) || (self-nfds != prev_nfds) ||
  (self-timeout != prev_timeout) || (prev_pid != pid) /{
printf(%s pid=%d/%d calling ioctl(%d, DP_POLL, nfds=%d timeout=%d)\n,
   execname, pid, tid, self-fd, self-nfds, self-timeout);
prev_pid = pid;
prev_fd = self-fd;
prev_nfds = self-nfds;
prev_timeout = self-timeout;
self-fd = self-nfds = self-timeout = self-bitsize64 = 0;
 }


 --
 This message posted from opensolaris.org
 ___
 dtrace-discuss mailing list
 dtrace-discuss@opensolaris.org
   
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] About memory usage by a specific process

2008-07-13 Thread Jim Mauro
Soif the RSS value of the process reaches a certain size and remains
more or less at the same size (which is typically the case once a workload
quiesces), are you saying to know specific processes should be releasing
memory at some point? Why do you think the memory should be getting
released - specific application code?

Are these heap segments (I assume yes)?

Chasing memory allocations and frees is doable, but not trivial.
You can instrument malloc/free calls with the PID provider, or link
to libumem.so, and use it's feature set for debugging memory allocations
with mdb.

http://access1.sun.com/techarticles/libumem.html

That aside, there are several more detailed responsed on using
dtrace for chasing memory allocations and frees in the archives
on the opensolaris.org page (discussions - dtrace - search).

I would encourage you to make sure you have a good sense for what
your chasing before you go down this path. It sounds almost like a
memory leak, but you do not describe it as such.

All I'm saying is have a good sense of the magnitude of the problem.
If you're looking for segments in the 100's of MBytes or GByte range,
you should be able to find those. But if you're chasing a few megabytes
here and thereyou really need to be using a debugger and working
with the folks that own the code.

Good luck,
/jim

 Thanks Jim
 Prstat is useful for looking at what processes are using the memory. 
 I am looking for the processes that use memory but do not release it 
 Thanks
 Riadh 
   
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] About memory usage by a specific process

2008-07-11 Thread Jim Mauro
prstat -s rss

The RSS column is the resident set size, or roughly the amount of memory
being used be the process. The -s rss tells prstat to sort based on 
rss size.

HTH,
/jim


YOUNSI RIADH wrote:

 *Hi *

 *Using vmstat I noticed that free memory is getting lower during a 
 certain period of time (2 or 3 days) then that memory is recovered. Is 
 there a**_ dtrace script_** that would help to identify the process 
 that uses that** memory?** *

 **

 **

 *Thanks*

 ***Riadh YOUNSI*

 

 ___
 dtrace-discuss mailing list
 dtrace-discuss@opensolaris.org
   
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] IO probes and forcedirectio

2008-05-23 Thread Jim Mauro
I don't understand the question.
I see forcedirectio set as a mount option, so I would expect
ufs:directio_start:entry to fire

/jim


Sébastien Bouchex Bellomié wrote:
 Hi,

 The following script is working fine as it display the  directio start  
 message

 [...]
 #!/usr/sbin/dtrace -s
 #pragma D option quiet
 fbt:ufs:directio_start:entry
 {
 printf(directio start\n);
 }
 [...]

 However, taking my mount line, I don't see anything special :

 /rdata on /dev/dsk/c2t1d1s0 
 read/write/setuid/devices/intr/forcedirectio/largefiles/logging/xattr/onerror=panic/dev=1d80008
  on Thu Apr 24 15:41:27 2008

 Thanks
 Seb

 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
 Sent: vendredi 23 mai 2008 17:26
 To: Sébastien Bouchex Bellomié
 Cc: dtrace-discuss@opensolaris.org
 Subject: Re: [dtrace-discuss] IO probes and forcedirectio


   
 That's why I came to the conclusion that the io provided ignore 
 forcedirectio mounted partition, but I wanted to be sure.

 

 io:genunix::start is in bdev_strategy(), which is lower than
 the file system level. It is not concerned with any file system,
 or its mount options. I suspect that you may not be recognizing the
 probes as belonging to this file system.

 Also, if there is a small number of UFS file systems with this option
 set, you may try to use directio_start:entry probe as a quick sanity
 check.

 HIH
 Bhaskar
 ___
 dtrace-discuss mailing list
 dtrace-discuss@opensolaris.org
   
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] disp_getwork question

2008-04-30 Thread Jim Mauro
disp_getwork() is a kernel function that gets called from the idle loop
(dispatcher - go get work, meaning find me a runnable thread).

(usermode) simply means that the CPU(s) were running in usermode,
not kernel mode during the profile, and lockstat has not visibility into
what user functions where running during the sample. The best it can
do it tell you the cpus(s) were not running in the kernel.

Try;

#dtrace -n 'profile-997hz / arg0 != 0  curthread-t_pri != -1 / 
@[func(arg0)] = count() }'

for a kernel profile that filters out the idle loop.

/jim


Jianhua Yang wrote:
 Hello,

 I collected  lockstat statistics when CPU idle is less 10, see below:

 Profiling interrupt: 120600 events in 5.038 seconds (23940 events/sec)

 Count indv cuml rcnt nsec CPU+PILCaller
 ---
  3430   3%   3% 0.00 1225 cpu[3]+11  disp_getwork
  3378   3%   6% 0.00 1429 cpu[512]+11disp_getwork
  3319   3%   8% 0.00 1254 cpu[515]+11disp_getwork
  
  1407   1%  61% 0.00 1238 cpu[11](usermode)
   972   1%  62% 0.00 2664 cpu[18](usermode)
   953   1%  63% 0.00  833 cpu[528]   (usermode)

 question:
   1.   what is disp_getwork ?I guess this function comes from shced
 ?  if trace sched procss functions,
 say 5 seconds, what is overhead of performance?
   2.   caller = (usermode),   what is usermode mean here ? why did it not
 trace out the functions ?

 Thanks,

 James Yang
 ---

 This e-mail may contain confidential and/or privileged information. If you
 are not the intended recipient (or have received this e-mail in error)
 please notify the sender immediately and destroy this e-mail. Any
 unauthorized copying, disclosure or distribution of the material in this
 e-mail is strictly forbidden.

 ___
 dtrace-discuss mailing list
 dtrace-discuss@opensolaris.org
   
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] dtrace: error invalid address (0x1f257c000) in action #2

2008-04-23 Thread Jim Mauro
Read this thread:

http://www.opensolaris.org/jive/thread.jspa?messageID=10250

Thanks,
/jim


Jianhua Yang wrote:
 Hello,

 new to dtrace,  need help here

 when run the following dtrace,  it produced dtrace errors,   why it get such
 errors ?

 # dtrace -n 'pid$target:libc:malloc:entry { @j[jstack()] = count(); }'  -p
 11539
 dtrace: description 'pid$target:libc:malloc:entry ' matched 1 probe
 dtrace: error on enabled probe ID 1 (ID 112884:
 pid11539:libc.so.1:malloc:entry): invalid address (0x1f257c000) in action #2
 dtrace: error on enabled probe ID 1 (ID 112884:
 pid11539:libc.so.1:malloc:entry): invalid address (0x1f257c000) in action #2
 dtrace: error on enabled probe ID 1 (ID 112884:
 pid11539:libc.so.1:malloc:entry): invalid address (0x1f257c000) in action #2
 dtrace: error on enabled probe ID 1 (ID 112884:
 pid11539:libc.so.1:malloc:entry): invalid address (0x1f257c000) in action #2
 dtrace: error on enabled probe ID 1 (ID 112884:
 pid11539:libc.so.1:malloc:entry): invalid address (0x1f257c000) in action #2
 dtrace: error on enabled probe ID 1 (ID 112884:
 pid11539:libc.so.1:malloc:entry): invalid address (0x1f257c000) in action #2
 dtrace: error on enabled probe ID 1 (ID 112884:
 pid11539:libc.so.1:malloc:entry): invalid address (0x1f257c000) in action #2
 dtrace: error on enabled probe ID 1 (ID 112884:
 pid11539:libc.so.1:malloc:entry): invalid address (0x1f257c000) in action #2
 dtrace: error on enabled probe ID 1 (ID 112884:
 pid11539:libc.so.1:malloc:entry): invalid address (0x1f257c000) in action #2
 dtrace: error on enabled probe ID 1 (ID 112884:
 pid11539:libc.so.1:malloc:entry): invalid address (0x1f257c000) in action #2
 dtrace: error on enabled probe ID 1 (ID 112884:
 pid11539:libc.so.1:malloc:entry): invalid address (0x1f24f8000) in action #2
 dtrace: error on enabled probe ID 1 (ID 112884:
 pid11539:libc.so.1:malloc:entry): invalid address (0x1f24f8000) in action #2
 dtrace: error on enabled probe ID 1 (ID 112884:
 pid11539:libc.so.1:malloc:entry): invalid address (0x1f24f8000) in action #2
 dtrace: error on enabled probe ID 1 (ID 112884:
 pid11539:libc.so.1:malloc:entry): invalid address (0x1f24f8000) in action #2
 dtrace: error on enabled probe ID 1 (ID 112884:
 pid11539:libc.so.1:malloc:entry): invalid address (0x1f24f8000) in action #2
 dtrace: error on enabled probe ID 1 (ID 112884:
 pid11539:libc.so.1:malloc:entry): invalid address (0x1f24f8000) in action #2
 dtrace: error on enabled probe ID 1 (ID 112884:
 pid11539:libc.so.1:malloc:entry): invalid address (0x1f24f8000) in action #2
 dtrace: error on enabled probe ID 1 (ID 112884:
 pid11539:libc.so.1:malloc:entry): invalid address (0x1f24f8000) in action #2
 dtrace: error on enabled probe ID 1 (ID 112884:
 pid11539:libc.so.1:malloc:entry): invalid address (0x1f257c000) in action #2
 dtrace: error on enabled probe ID 1 (ID 112884:
 pid11539:libc.so.1:malloc:entry): invalid address (0x1f257c000) in action #2
 dtrace: error on enabled probe ID 1 (ID 112884:
 pid11539:libc.so.1:malloc:entry): invalid address (0x1f257c000) in action #2
 dtrace: error on enabled probe ID 1 (ID 112884:
 pid11539:libc.so.1:malloc:entry): invalid address (0x1f24f8000) in action #2
 dtrace: error on enabled probe ID 1 (ID 112884:
 pid11539:libc.so.1:malloc:entry): invalid address (0x1f24f8000) in action #2
 dtrace: error on enabled probe ID 1 (ID 112884:
 pid11539:libc.so.1:malloc:entry): invalid address (0x1f24f8000) in action #2
 dtrace: error on enabled probe ID 1 (ID 112884:
 pid11539:libc.so.1:malloc:entry): invalid address (0x1f24f8000) in action #2
 dtrace: error on enabled probe ID 1 (ID 112884:
 pid11539:libc.so.1:malloc:entry): invalid address (0x1f24f8000) in action #2
 ^C
 dtrace: error on enabled probe ID 1 (ID 112884:
 pid11539:libc.so.1:malloc:entry): invalid address (0x1f257c000) in action #2
 dtrace: error on enabled probe ID 1 (ID 112884:
 pid11539:libc.so.1:malloc:entry): invalid address (0x1f24f8000) in action #2


   libc.so.1`malloc
   libESPD12d.so`__1cPCFETUByteStream2t5B6MIIHHpc_v_+0xa4
   libESPD12d.so`__1cMCFETPMessageNCreateInbound6Fpci_p0_+0x618
   libESPD12d.so`__1cMCFETISessionFRXMsg6M_pnMCFETPMessage__+0x60
   libESPD12d.so`__1cMCFETIManagerPInboundDataFlow6M_v_+0x178
   libESPD12d.so`__1cYStartInboundDataFlowFuncDrun6M_v_+0xc
   libESPD12d.so`PosixThreadFunction+0x24
   libc.so.1`_lwp_start
 1
 .


 Thanks,

 James Yang
 ---

 This e-mail may contain confidential and/or privileged information. If you
 are not the intended recipient (or have received this e-mail in error)
 please notify the sender immediately and destroy this e-mail. Any
 unauthorized copying, disclosure or distribution of the material in this
 e-mail is strictly forbidden.

 ___
 dtrace-discuss mailing list
 dtrace-discuss@opensolaris.org
   
___

Re: [dtrace-discuss] RES: RES: Process in LCK / SLP (Please)

2008-04-22 Thread Jim Mauro
You may want to cross-post to a Java alias, but I've been down this
road before.

Java will call into malloc() for buffers for network reads and writes
that are larger than 2k bytes (the 2k is from memory, and I think it
was a 1.5 JVM). A large number of malloc calls,
and resulting contention on locks in the library, are due to the application
doing network writes of larger than 2k.

Newer JVMs (1.6) may improve this, but I'm not sure. There's also
an alternative set of classes and methods, NIO, which also can
help (although I've heard tell that NIO brings other problems along
with it, but I can not speak from personal experience).

At this point, I think you need to consult with Java experts to determine
what options you have for buffer allocation for network IO from the
Java heap, versus the current behavior of the JVM dropping back
to malloc for allocating buffers for network IO.

The other option of course is determining if the code can be changed to
use buffers smaller than 2k.

Thanks,
/jim




Kleyson Rios wrote:
 OK jonathan,

 I understand.

 So, looking on right place now, i can see few locks and sometimes no locks
 (just Mutex Hold). But I still have many threads in 100% LCK.

 If I don't have a lot of locks, where is my problem ?

 Running rickey c weisner's script a get:

 (...)
 25736
   libc.so.1`_so_send+0x15
   libjvm.so`__1cDhpiEsend6Fipcii_i_+0x67
   libjvm.so`JVM_Send+0x32
   libnet.so`Java_java_net_SocketOutputStream_socketWrite0+0x131
   0xc3c098d3
10
 25736
   0xc3d2a33a
14
 25736
   libc.so.1`_write+0x15
   libjvm.so`__1cDhpiFwrite6FipkvI_I_+0x5d
   libjvm.so`JVM_Write+0x30
   libjava.so`0xc8f7c04b
16
 25736
   libc.so.1`stat64+0x15
21
 25736
   libc.so.1`_write+0x15
   libjvm.so`__1cDhpiFwrite6FipkvI_I_+0x5d
   libjvm.so`JVM_Write+0x30
   libjava.so`0xc8f80ce9
76
   java   25736  kernel-level lock  1
   java   25736  shuttle6
   java   25736  preempted  7
   java   25736  user-level lock511
   java   25736  condition variable 748

  
 Atenciosamente,
  
 --
  
 Kleyson Rios.
 Gerência de Suporte Técnico
 Analista de Suporte / Líder de Equipe
  

 -Mensagem original-
 De: Jonathan Adams [mailto:[EMAIL PROTECTED] 
 Enviada em: sexta-feira, 18 de abril de 2008 15:40
 Para: Kleyson Rios
 Cc: dtrace-discuss@opensolaris.org
 Assunto: Re: [dtrace-discuss] RES: Process in LCK / SLP (Please)


 On Apr 18, 2008, at 1:03 PM, Kleyson Rios wrote:
   
 Hi przemol,

 Bellow output of plockstat for malloc and libumem. Both many locks.
 Why changing to libumem I didn't get less locks ?

 

 You're looking at Mutex hold statistics, which don't mean a lot  
 (unless contention is caused by long hold times)

 The important thing for multi-threaded performance is *contention*.   
 (Spinning and blocking)  Those are the statistics you should be  
 looking at.

 Both malloc and libumem use locks to protect their state;  libumem  
 just uses many locks, in order to reduce contention.

 Cheers,
 - jonathan




 ___
 dtrace-discuss mailing list
 dtrace-discuss@opensolaris.org
   
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] Java Process in LCK/SLP

2008-04-16 Thread Jim Mauro
Try linking the JVM to libumem.so.1
(export LD_PRELOAD=/usr/lib/libumem.so.1)

I have, in the past, been able to reduce lock contention in malloc
using libumem malloc.

If that does not help, you need to see where the mallocs are coming
from in the code, and see if there's opportunity to change the app
such that the JVM makes fewer malloc calls.

dtrace -n 'pid$target:libc:malloc:entry { @s[jstack()] = count() }'

500 threads - does your need that many threads?

Are the locks mostly associated with condition variables?

Typically, the only way to solve lock contention problems is by
changing the code...

HTH,
/jim


Kleyson Rios wrote:

 Hi,

 I am a newbie in dtrace.

 There is a multi-thread application running in my box. But looking 
 prstat –Lmp java i saw that almost all time is LCK and SLP.

 A ran plockstat and saw the most locks were malloc, so I changed my 
 application to use mtmalloc, but didn’t resolv the problem.

 After changed malloc to mtmalloc, i ran again plockstat and i’m still 
 getting many locks.

 How can I use better dtrace to find out the problem ?

 PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG 
 PROCESS/LWPID

 5637 root 94 0,0 0,0 0,0 0,0 4,2 0,0 1,4 19 96 61 0 java/7

 5637 root 1,3 0,2 0,0 0,0 0,0 98 0,0 0,2 2K 7 2K 0 java/5

 5637 root 1,1 0,2 0,0 0,0 0,0 98 0,0 0,3 3K 16 3K 0 java/3

 5637 root 1,1 0,1 0,0 0,0 0,0 98 0,1 0,4 1K 6 1K 0 java/4

 5637 root 1,1 0,1 0,0 0,0 0,0 98 0,0 0,4 1K 10 1K 0 java/6

 5637 root 0,7 0,1 0,0 0,0 0,0 33 65 0,8 11 17 129 0 java/347

 5637 root 0,7 0,1 0,0 0,0 0,0 65 34 0,0 20 0 129 0 java/134

 5637 root 0,6 0,1 0,0 0,0 0,0 93 5,7 0,2 21 1 133 0 java/84

 5637 root 0,4 0,1 0,0 0,0 0,0 49 51 0,1 20 1 135 0 java/435

 5637 root 0,5 0,1 0,0 0,0 0,0 14 85 0,1 16 0 87 0 java/162

 5637 root 0,5 0,0 0,0 0,0 0,0 23 76 0,8 3 2 44 0 java/291

 5637 root 0,4 0,1 0,0 0,0 0,0 36 63 0,6 9 5 81 0 java/412

 5637 root 0,4 0,1 0,0 0,0 0,0 52 46 1,0 9 0 70 0 java/370

 5637 root 0,3 0,1 0,0 0,0 0,0 81 17 1,2 9 4 75 0 java/376

 5637 root 0,3 0,1 0,0 0,0 0,0 37 63 0,0 8 0 77 0 java/174

 5637 root 0,3 0,1 0,0 0,0 0,0 42 57 0,1 7 1 61 0 java/145

 5637 root 0,3 0,1 0,0 0,0 0,0 43 56 0,9 12 11 53 0 java/114

 5637 root 0,2 0,1 0,0 0,0 0,0 2,6 96 0,8 8 0 38 0 java/269

 5637 root 0,3 0,0 0,0 0,0 0,0 98 0,1 1,1 10 0 38 0 java/361

 5637 root 0,2 0,1 0,0 0,0 0,0 2,4 97 0,0 6 2 37 0 java/81

 5637 root 0,2 0,1 0,0 0,0 0,0 84 14 1,2 6 7 57 0 java/292

 5637 root 0,2 0,1 0,0 0,0 0,0 2,5 97 0,3 5 1 33 0 java/192

 5637 root 0,2 0,1 0,0 0,0 0,0 85 14 0,8 7 8 54 0 java/212

 5637 root 0,2 0,1 0,0 0,0 0,0 3,2 96 0,2 6 6 36 0 java/149

 5637 root 0,2 0,0 0,0 0,0 0,0 42 58 0,1 7 1 50 1 java/411

 5637 root 0,2 0,1 0,0 0,0 0,0 2,3 97 0,3 8 3 38 0 java/236

 5637 root 0,2 0,1 0,0 0,0 0,0 43 56 0,9 10 4 44 0 java/438

 5637 root 0,2 0,0 0,0 0,0 0,0 2,5 97 0,1 5 1 34 0 java/137

 5637 root 0,2 0,0 0,0 0,0 0,0 2,2 98 0,0 7 1 34 0 java/179

 5637 root 0,1 0,1 0,0 0,0 0,0 2,5 97 0,6 1 4 111 0 java/20

 5637 root 0,2 0,0 0,0 0,0 0,0 91 8,2 0,2 4 8 48 0 java/72

 5637 root 0,2 0,0 0,0 0,0 0,0 3,1 97 0,2 6 2 35 0 java/75

 5637 root 0,2 0,0 0,0 0,0 0,0 21 78 0,2 3 2 38 0 java/109

 5637 root 0,2 0,0 0,0 0,0 0,0 89 9,9 0,9 3 6 36 0 java/256

 5637 root 0,2 0,0 0,0 0,0 0,0 99 0,0 1,0 4 8 39 1 java/458

 Total: 1 processes, 489 lwps, load averages: 1,92, 1,74, 1,65

 Atenciosamente,

 --

 Kleyson Rios.

 Gerência de Suporte Técnico

 Analista de Suporte / Líder de Equipe

 Governo do Estado de Goiás

 Agência de Administração

 Diretoria de Informática

 +55 62 3201-6582

 Você sabe a diferença entre pessoas bem sucedidas e pessoas mal 
 sucedidas ?

 Pessoas bem sucedidas estão dispostas a fazer as coisas que as pessoas 
 mal sucedidas não estão.

 

 ___
 dtrace-discuss mailing list
 dtrace-discuss@opensolaris.org
   
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] howto measure CPU load

2008-03-30 Thread Jim Mauro
Hi Travis - Your first clue here is the backtick operator (`) used to
extract hp_avenrun[0]. The backtick operator is used to read the
value of kernel variables, which will be specific to the running kernel.
That is, Solaris, Mac OS X (Darwin), FreeBSD and all other kernels
with DTrace will not have portability when it comes to dtrace scripts
that use the backtick variable, since by definition they will be kernel
specific.

hp_avenrun[] is a Solaris kernel variable:
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/os/clock.c#97

For some kernel variables (like avenrun), you could look around for 
something
similar in your target kernel:

kilroy nm -x /mach_kernel | grep avenrun
004d25ac 0f 08  0002dbb9 _avenrun
kilroy

Of course, without the source, it's not clear what the variable type is

Soyour example below is specific to Solaris. If you see the backtick
operator, think kernel specific.

HTH,
/jim

Travis Crawford wrote:
 Hi dtrace-discuss,

 Having recently upgraded to OS X 10.5 I'm relatively new to DTrace, so
 apologies if this has been covered before; I did search the archives
 and didn't find anything though.

 As a DTrace starter project I'd like to understand how to extract
 common performance metrics before diving too far in. One of these
 common metrics is load average, which I thought would be pretty easy.
 However, I'm confused by this code snippet (from
 http://www.brendangregg.com/DTrace/iotop):

 
 /* fetch 1 min load average */
 this-load1a  = `hp_avenrun[0] / 65536;
 this-load1b  = ((`hp_avenrun[0] % 65536) * 100) / 65536;
 

 I can't figure out where hp_avenrun is defined. I searched
 http://wikis.sun.com/display/DTrace as well as looking at all the
 probes on my system ($ sudo dtrace -l | grep hp_avenrun) and came up
 empty.

 So my question is, when reading d scripts, how should I go about
 finding out what these mysterious variables are?

 Thanks,
 Travis
 ___
 dtrace-discuss mailing list
 dtrace-discuss@opensolaris.org
   
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] Can DTrace display non-instrumented function argument counts and types?

2008-02-23 Thread Jim Mauro
For kernel modules, _if_ the supplied kernel code (driver, file system, 
whatever),
was compiled with symbolic information, you can determine the argument list
and types with mdb:

# mdb -k
Loading modules: [ unix genunix specfs dtrace uppc pcplusmp scsi_vhci 
ufs mpt ip hook neti sctp arp usba fctl nca lofs zfs random audiosup 
sppp ptm crypto ipc md cpc fcip fcp logindmux nsctl sdbc sv ii rdc ]
 ufs_read::nm -f ctype
C Type   
int (*)(struct vnode *, struct uio *, int, struct cred *, struct 
caller_context *)

The example above displays the arg list for the kernel ufs_read() 
function. You can also
determine structure members:

  ::print -t struct vnode
{
kmutex_t v_lock {
void *[2] _opaque
}
uint_t v_flag
uint_t v_count
void *v_data
struct vfs *v_vfsp
. . .

If that works (meaning the kernel module was indeed compiled with 
symbolic type information),
you can use dtrace to instrument the entry points in the kernel 
functions of interest,
and grab arguments using either the args[] array (typed data) or 
args0...args1...etc (int64_t's).

HTH,
/jim


Gordon Marler wrote:
 Is it possible to use DTrace to display the number of arguments and their 
 types for userland or kernel functions that are not explicitly instrumented 
 or documented by their authors?  We're talking about functions provided by 
 Veritas VxVM/VxFS and the like, and for example, we might want to try tracing 
 I/Os from App = VxFS = VxVM = ssd driver to see where I/Os get aggregated 
 or broken up.  But without being able to determine function signatures, it 
 will be very hard to even begin such a process.

 It would seem that this is possible, but I haven't seen any examples of such 
 a use.  I'm presuming that this would be done via fbt.

 Gordon Marler
 [EMAIL PROTECTED]


 --
 This message posted from opensolaris.org
 ___
 dtrace-discuss mailing list
 dtrace-discuss@opensolaris.org
   
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] Getting total memory allocated so far

2008-01-04 Thread Jim Mauro

Thanks for the update. You probably already know this, but...

 2. I'm trying to track the memory allocations/frees relative to
 specific events in the code (e.g. someone getting hit by a projectile,
 etc).  The granularity for this stuff is pretty tight.  Some state
 variables linked to the place in the code where the program is (single
 threaded) when the malloc/free occurs tells me a lot.  Makes it easy
 to tell which portions of the program are responsible for most of the
 memory allocations.
Grabbing a ustack() in the pidPID:libc:malloc:entry and 
pidPID:libc:free:entry
probes is your friend here...

Thanks,
/jim


___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] Getting total memory allocated so far

2008-01-03 Thread Jim Mauro

I hate to beat this to death, but it really sounds like you can make your
life simple and use prstat and pmap to track the information you want.

Heap memory usage tracking is tricky, since libc malloc may or may not
call into the kernel (sbrk) to satisfy a memory request. Also, tracking
malloc and keeping a running total in dtrace is dead easy, but tracking
free and correlating to how much space was freed is tricky, since
free takes an address, not a size. You could build an array of addresses
in a pidPID:libc:malloc:return probe clause, but making use of it to
track free's and sizes is tricky...

It really all gets down to the level of granularity and accuracy that you
require. prstat and pmap will provide the information you need with
reasonable accuracy - pmap will allow you to track heap segment sizes
(pmap -x PID | grep heap) - you could easily script something with
pmap and generate a summary report.

Finally, as Rickey pointed out, you could also write some C code that
uses procfs and grabs the desired data.

Thanks,
/jim


Lally Singh wrote:
 @Jeremy: yeah, I'm tracking how much memory it's actually using right
 now.. sbrk makes it complicated, but I figure that more advanced
 memory management may help, once I've measured how it's getting used
 today.  Via mmap() for large transient allocations and/or some
 smartness about allocating smartly to keep the working set of pages
 reasonable.

 @James: I saw that, but could I use it on a running program without
 stopping it?  The dtrace scripts don't do much damage, and the program
 still gets all its work done at ~60 Hz for its clients.  Single
 threaded, no less.

 Some more details: I was trying to keep this all a simple question,
 but with memory there rarely are simple questions.  My research is on
 the scalability of massively multiplayer online games.  So, I've
 written a load simulator which logs in from remote computers and
 starts playing the game.  I've got a bunch of dtrace scripts that
 measure the network, memory, and cpu usage as I add more simulated
 clients.  The scripts haven't shown themselves to interfere
 significantly with performance (yet), but in exchange they put out
 data once a second, which another script reads  shoves into a db for
 me.

 I'd like to monitor memory usage without interfering with my other
 measurements.  Hence, I'd love to know something close to a global var
 (which'd be great, malloc() already has to do some locking (for normal
 malloc, haven't seen umem), so why not?) or some way to quickly
 assemble a running total.  Right now all I've got is a pid provider
 probe for malloc  free that reads the headers on the memory blocks
 they use.

 Thanks in advance everyone!

 On Jan 2, 2008 1:25 PM, James Carlson [EMAIL PROTECTED] wrote:
   
 Lally Singh writes:
 
 I'd like to track to total amount of memory the application's actually
 using right now.   More specifically, I guess these rules would do it:

 1. start at M=0
 2. for each call to malloc(): set M += size of block returned by malloc
 3. for each call to free(): set M -= size of block freed.
   
 Using env LD_PRELOAD=libumem.so myapp with mdb's umem support should
 get you that (plus quite a bit more).

 --
 James Carlson, Solaris Networking  [EMAIL PROTECTED]
 Sun Microsystems / 35 Network Drive71.232W   Vox +1 781 442 2084
 MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677

 



   
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] Solaris Dynamic Tracing Guide gets Wikified!

2007-12-13 Thread Jim Mauro

Great question. As a dtrace user and documentation reader, I would not 
want to
need to flip to another chapter, or another section, to read about 
platform differences
for a particular provider, function, etc. I'm not saying you suggested 
that, I'm just
thinking out loud...

I think a reasonable way to do this, and maintain consistency, is to add 
a Platform Notes
heading to each section - or perhaps at a Chapter granularity.

Within Platform Notes, we have subsections for Mac OS X, FreeBSD, etc,
and add the platform specific notes there.

It may also be useful to add a chapter that discusses platform 
differences at a
high level.

Just some thoughts...if the platform differences are not vast enough to 
warrant
adding format changes to the document, perhaps they can be handled in 
footnotes.
(keeping in mind footnotes are considered poor writing style).

Thanks,
/jim


Quinn wrote:
 At 10:23 -0800 28/11/07, Adam Leventhal wrote:
   
 On Tue, Nov 27, 2007 at 07:46:37PM +, Jon Haslam wrote:
 
  To gain access to the DTrace wiki space contact Adam
  ([EMAIL PROTECTED]). If you are working on a bug
  from the bug list make sure you update the owner column of
  the bug to reflect the fact. When you've finished please
  update the column in the bug list to reflect that fact.
   
 Actually, there's no need to contact me or anyone: the wiki is open for all
 registered users to modify. We'll tighted up security if we have any 
 problems.
 

 How do you want to handle platform-specific changes?  As I stumble 
 across Mac OS X specific stuff, I'd like to add it to the book.  To 
 me that makes a lot more sense that maintaining a separate DTrace on 
 Mac OS X document.

 For example, the discussion of the stop action says that you can 
 start a process using prun, but that's not available on Mac OS X. 
 On Mac OS X you can use kill -CONT pid.  I'd love to just go in 
 there and add a note to that effect, but I figured I'd ask about 
 formatting conventions first.

 S+E
   
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


[dtrace-discuss] DTrace updates

2007-09-26 Thread Jim Mauro
http://blogs.sun.com/jonh/

The esteemed Jon Haslam posted some slides with examples of new features
that went into DTrace since Solaris 10 was released.

Good stuff, and a tip-o-the-hat to Jon.

/jim

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org