Re: [dtrace-discuss] Why can't I use stack as index to an array but I can for an aggregation?
On Fri, Dec 17, 2010 at 03:44:00PM -0500, Brian Utterback wrote:
> I am aggregating using the stack function as the index:
>
>     @s[stack()] = count();
>
> This works great. However, I found that in addition to how many times each stack appears, it would also be useful to know which stacks were called shortly before the script exited. So I thought to make an associative array with the same indexes, and store the timestamp the last time they were set:
>
>     ts[stack()] = timestamp;
>
> This gives me an error:
>
>     tracing function stack( ) may not be called from a D expression (D program context required)
>
> Why can't I use stack as the index to an associative array when I can use it as the index to an aggregation? Am I doing something wrong? And if I can't use it, does anyone have a suggestion how to store the time of the last call of each stack?

While I don't know the answer to your question, a workaround would be to use max() as the aggregation function...

Nico
--
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org
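A minimal sketch of the max() workaround suggested in the reply above. The probe used here (syscall:::entry) and the aggregation name @last are placeholders, not something taken from the original script; the idea is only that, because timestamp never decreases, the per-stack maximum is the time that stack was last seen.

```d
#!/usr/sbin/dtrace -s

/*
 * Sketch only: syscall:::entry is an arbitrary placeholder probe;
 * substitute whatever the original script was enabling.
 */
syscall:::entry
{
	/* How many times each kernel stack was seen. */
	@s[stack()] = count();

	/*
	 * timestamp only increases, so the maximum recorded for a
	 * given stack is the time at which it was most recently seen.
	 */
	@last[stack()] = max(timestamp);
}
```

Both aggregations are printed when the script exits; stacks with the largest @last values are the ones seen closest to the end of the run.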
Re: [dtrace-discuss] Supporting the return of an array...
On Fri, Oct 15, 2010 at 04:08:25PM -0700, Darren Reed wrote:
> At present, dtrace does not support conditional execution, so no if's and no loops.

There's the ? : operator...

> With the current design, all collection of statistics is about something that is happening. This means I can't present information about objects that have no activity alongside those that do. Well, I suppose I can, but I have to either provide the information from outside of the D script or cause some iterative function to occur that the D script can monitor and learn from - both of which seem like kludges to me.

You might be better off using the Java interface to DTrace... That way you'd have better access to raw data (i.e., without having to parse text output of dtrace(1) in a shell or Python script).

> Therefore what I'd like to do is be able to return an array of objects and assign them the value 0. For example:
>
>     @fsactivity[zfs_list()] = zero();
>
> ... where zfs_list() would return an array of strings that are the names of all the zfs filesystems and zero() is a function that would be considered an aggregation friendly function that assigned 0 to every element in fsactivity.

You can create SDT (and/or USDT) is-enabled probes that gather the relevant data and make it available as probe arguments. That's the typical answer for this sort of thing.

But remember that DTrace probe context is very limited. You can't grab locks to build a list of the names of all the zfs filesystems -- if you'd need to grab locks then you'd better maintain that list at runtime and keep a per-CPU copy/reference, or something along those lines.

> Now I know that doing zero() is possible, and specifics of zfs_list() aside, but is it possible to support a function returning an array or list that is used to populate an array in D? Can the internals of D be easily modified to work in that fashion or would that require an extensive rewrite?

All functions that can be called in D code are provided by DTrace itself. You can't define new functions from the host OS outside the context of DTrace itself. The limitations of DTrace context (which are purposeful, not accidental) also necessarily limit the extensibility of the language itself. If you find yourself wanting to create a variety of D functions unrelated to core DTrace functionality, then you may need to reconsider your approach.

Nico
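As an illustration of the "? :" remark above: D has no if/else or loops, but a probe predicate plus the conditional operator covers many simple cases. The sketch below is not from the thread; the execname filter and the 4096-byte threshold are arbitrary assumptions chosen only to show the shape.

```d
#!/usr/sbin/dtrace -s

/*
 * Sketch: classify write(2) request sizes without if/else.
 * The predicate selects which firings run the clause at all;
 * the ?: expression picks the aggregation key.
 */
syscall::write:entry
/execname == "cat"/
{
	@sizes[arg2 > 4096 ? "large" : "small"] = count();
}
```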
Re: [dtrace-discuss] stack() for specified CPU
On Mon, Aug 30, 2010 at 10:17:53AM -0700, Jonathan Adams wrote:
> On Fri, Aug 27, 2010 at 04:44:31PM -0700, Rafael Vanoni wrote:
> > I was trying to aggregate on the stack trace of a CPU's current thread from a cpu_t * and the best I could do was aggregating with sym() on the t_startpc field of the CPU's current thread. It doesn't look like it would be very difficult to write a new action to get stack() from a given CPU id. Would that be of interest to anyone else? If so, please let me know and I'll write it up.
>
> It is extremely difficult to do that from another CPU on SPARC, since register windows mean that much of the stack state is kept in-CPU. Even on x86, you can't get anything better than the last place the thread updated its t_pcb (e.g. in cv_wait()), and there's a good chance the stack won't line up.

Can't this be done with dtrace_xcall()?

BTW, another backtrace-like action I'd love to see would be something like ptree(), which would output ptree(1)-like output (but without program arguments) that could then be aggregated on.

Nico
Re: [dtrace-discuss] stack() for specified CPU
On Mon, Aug 30, 2010 at 10:42:27AM -0700, Jonathan Adams wrote:
> On Mon, Aug 30, 2010 at 12:29:18PM -0500, Nicolas Williams wrote:
> > Can't this be done with dtrace_xcall()?
>
> You can't cross-call from probe context, let alone wait for one to complete. dtrace_xcall() is used by the dtrace infrastructure to ensure that any in-flight dtrace_probe()s while a state change was happening have completed before continuing on to the next step.

Ah, thanks for the tip.

> > BTW, another backtrace-like action I'd love to see would be something like ptree(), which would output ptree(1)-like output (but without program arguments) that could then be aggregated on.
>
> That could be handy.

Yes, I think it would be, particularly for profiling apps with lots of short-lived processes, such as ON's build :) Something like:

    profile-2
    /uid == 12345/
    {
        /* Need cwd since we don't get program args */
        @ptrees[ptree(), cwd] = count();
    }

Can one safely chase proc_t pointers in dtrace context?

Nico
Re: [dtrace-discuss] struct padding on x64
On Wed, May 12, 2010 at 06:53:47PM +0200, Mark Phalan wrote:
> Are the padding rules supposed to be consistent between the compiler and dtrace on x64?
>
> # cat /tmp/s.d
> ...
> # cat /tmp/s.d
> ...

Did you mean to cat /tmp/x.c and /tmp/s.d?

> $ cc -m64 /tmp/x.c -o /tmp/x

Dunno what x.c contains... I'm assuming something very similar to what s.d contains.

> $ /tmp/x
> my_data: 16
> more_data: 24
> offset: 8
> # dtrace -qs /tmp/s.d
> my_data: 16
> more_data: 20
> offset: 4
>
> Bug?

Probably, I think. The code here is all 64-bit, so differences in alignment in ILP32 vs. LP64 cannot be the problem.

Nico
Re: [dtrace-discuss] dtrace / c++ / lots of threads
On Tue, May 11, 2010 at 07:38:17AM +0200, Michael Schuster wrote:
> On 10.05.10 21:01, William Reich wrote:
> > only this process fails -
> >
> > ./adc.d 24930
> > dtrace: failed to compile script ./adc.d: line 7: failed to grab process 24930

Isn't this what happens if a process is already being traced (e.g., by truss)?

Nico
Re: [dtrace-discuss] accessing frame pointer
On Fri, May 07, 2010 at 12:38:02PM -0700, tester wrote:
> Hello,
>
> How can I access frame pointer? I am trying to get a variable value. Here is the disassembly of the function.
>
>     function+0x208: st %l0, [%fp - 0xc]
>
> I am trying to get the value at [%fp - 0xc]

You can access the registers via the uregs[] array. See:

http://wikis.sun.com/display/DTrace/User+Process+Tracing

I don't think you can get at the registers of code executing in kernel-mode. Also, uregs[] only works for the current user-land stack frame.

Nico
Re: [dtrace-discuss] accessing frame pointer
On Fri, May 07, 2010 at 12:56:25PM -0700, tester wrote:
> Do I need to add stack bias 2047 to fp to get actual data?

On 64-bit SPARC, yes.

Nico
Re: [dtrace-discuss] Invalid address error when using stringof to convert a pointer to a string
On Wed, May 05, 2010 at 09:11:27AM -0700, Yossi Lev wrote:
> I just found a function that (sort of) does what I need: it is called lltostr and it takes a long long integer, and returns a string that represents it. (Unfortunately I didn't find an option to do the translation in hexadecimal, but I can live with that...)
>
> As for your question, I need to use the numbers as a key to an aggregation, but I need to concatenate a few of those. In particular, I have a sequence of N numbers where N 9, and I need to count how many times I'm getting each sequence. The value of N may be different in separate invocations of the probe function, so I would like to use an unrolled loop to concatenate the N numbers to single string representing the sequence, and then use the string as the key to my aggregation.
>
> Do you see any other option but to use the lltostr function and concatenate the resulting strings?

Yes: aggregations take multiple keys, not just one, so rather than format a string with all the key content that you need, use a comma-separated list of expressions of various types as the key. See:

http://docs.sun.com/app/docs/doc/819-5488/gcggh?a=view

Nico
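A rough sketch of the multi-key suggestion above. Everything specific here is hypothetical: the probe (myfunc), the assumption that arg0 carries N and arg1..arg3 carry the sequence values, and the use of -1 as a padding value. Because every use of a given aggregation needs the same number and types of keys, a variable-length sequence is padded out to a fixed width with the conditional operator rather than built with a loop.

```d
#!/usr/sbin/dtrace -s

/*
 * Sketch only: "myfunc" and the argument layout are hypothetical
 * stand-ins for the real probe and sequence values.
 */
pid$target::myfunc:entry
{
	/* arg0 is assumed to hold N; arg1..arg3 the sequence values. */
	this->n = arg0;

	/* Unused positions are padded with -1 so the tuple width is fixed. */
	@seqs[this->n,
	    this->n > 0 ? arg1 : -1,
	    this->n > 1 ? arg2 : -1,
	    this->n > 2 ? arg3 : -1] = count();
}
```

Extending this to a longer sequence means adding more key expressions of the same form; no string formatting with lltostr() is involved.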
Re: [dtrace-discuss] Kerberos provider specifications for review
On Fri, Apr 30, 2010 at 05:39:13PM +0200, Mark Phalan wrote:
> http://wikis.sun.com/display/DTrace/kerberos+Provider

I've reviewed already, but I'd like to add one more thing: probes for KDB lookup start/end, with end getting the KDB record fields, could be very useful as well (e.g., to help with KDB LDAP backend issues, and to check where oddness in KDC replies might come from).

Nico
Re: [dtrace-discuss] translating compound structs
On Wed, Apr 07, 2010 at 05:08:07PM +0200, Mark Phalan wrote:
> I'm working on a set of USDT probes and need to expose fairly complex data in a reasonable way. I'd like to build up some compound structures but dtrace doesn't seem to like translating them :(

The PID provider and USDT do not give you nice syntactic sugar for handling user-land data. In DTrace probe context all you can do is copyin and interpret as necessary. It's annoying.

In theory the D compiler could generate the necessary DOF code by consuming CTF/DWARF/STABS from the relevant user-land programs and objects; in practice that's a very difficult thing to pull off (this is the standard answer, and I believe it).

Nico
Re: [dtrace-discuss] translating compound structs
On Wed, Apr 07, 2010 at 06:15:22PM +0200, Mark Phalan wrote:
> But this isn't about user-land data. The structs I'm talking about are dtrace native structs defined in the d script. In the example I gave I wasn't even tracing a program - it was pure d. CTF data doesn't come into it...

Duh. Well, this works:

    typedef struct complex {
        uint32_t simplevalue;
    } complex_t;

    translator complex_t < uint32_t x > {
        simplevalue = x;
    };

    dtrace:::BEGIN {
        printf("simplevalue: %d\n", xlate <complex_t> (2).simplevalue);
    }

I couldn't get your example to work.
Re: [dtrace-discuss] translating compound structs
On Wed, Apr 07, 2010 at 06:41:35PM +0200, Mark Phalan wrote:
> I guess I wasn't too clear.. I don't have any problems writing translators for simple structs. My problem is when the struct contains another struct which I'd like to initialize/set in the translator.

No, it was clear the second time.

> > I couldn't get your example to work.
>
> Right. I was wondering if someone knew if it was possible to modify my example to get it to work. I think this should be possible and if it's not currently then I'll probably open a CR for it.

It may well be that translator output types have to have simple members. That's my impression anyways. The examples in /usr/lib/dtrace/ that I looked at have simple members.

Nico
Re: [dtrace-discuss] Using DTrace in 32-bit to handle 64-bit parameters [72631230]
On Fri, Mar 19, 2010 at 12:31:04PM +0000, Mark R. Bowyer wrote:
> Now, first glance, I asked why they were monitoring a 64-bit application with 32-bit code. But they do, and everywhere else they manage it.

You can't do that. What they are trying to do is observe a 32-bit user-land application that uses 64-bit types.

DTrace's PID provider can't possibly know that some argument is 64 bits because it doesn't consume user-land CTF data. (Many of us have asked for DTrace to be able to use user-land CTF data and make plenty of syntactic sugar available for chasing user-land pointers, accessing struct fields, etcetera, with automatically generated D code that does whatever copyins are required, but the answer is always that that would be a huge project.)

> [...] is somewhat confusing, as now that 64-bit value is taking up *2* args, not the one you'd expect.

Yes. I believe you just have to be aware of this and deal with it.

> Dtrace probe:
>
>     adv$1:::myprobe
>     {
>         /* translation section */
>         this->param1 = (curpsinfo->pr_dmodel == PR_MODEL_LP64) ? arg0 :
>             (`utsname.machine == "i86pc") ? ((arg1 << 32) | arg0) :
>             ((arg0 << 32) | arg1);

I'd do the curpsinfo->pr_dmodel check in the predicate instead. It makes the D code much more readable.

> But this isn't documented anywhere I've seen, or more to the point that the customer has seen. Is this an oversight, or are we missing something? or should we just avoid doing this at all costs?

You don't have to avoid it. You just need to be aware that to interpret user-land data using PID provider probes requires care. Chasing pointers results in ugly, explicit copyins. Handling 64-bit values in 32-bit apps requires joining two 32-bit values in the probe actions. This is just a result of the PID provider's relative simplicity.

Nico
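A sketch of what moving the data-model check into predicates might look like. The provider and probe names (adv$1, myprobe) and the word order on each platform are taken from the quoted expression, not verified independently; the explicit cast and 32-bit mask are additions made here to keep the two halves from interfering, and the final printing clause is only illustrative.

```d
/* 64-bit process: the value arrives in a single argument. */
adv$1:::myprobe
/curpsinfo->pr_dmodel == PR_MODEL_LP64/
{
	this->param1 = arg0;
}

/* 32-bit process on x86: low word in arg0, high word in arg1. */
adv$1:::myprobe
/curpsinfo->pr_dmodel == PR_MODEL_ILP32 && `utsname.machine == "i86pc"/
{
	this->param1 = ((uint64_t)arg1 << 32) | (arg0 & 0xffffffff);
}

/* 32-bit process elsewhere (e.g. SPARC): high word in arg0. */
adv$1:::myprobe
/curpsinfo->pr_dmodel == PR_MODEL_ILP32 && `utsname.machine != "i86pc"/
{
	this->param1 = ((uint64_t)arg0 << 32) | (arg1 & 0xffffffff);
}

/* Clauses for one probe run in program order, so the value is set here. */
adv$1:::myprobe
{
	printf("param1 = %d\n", this->param1);
}
```

Exactly one of the first three clauses runs for any given firing, so each stays a one-line assignment instead of a nested conditional.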
Re: [dtrace-discuss] Need some help
On Tue, Mar 16, 2010 at 08:55:04AM -0700, tester wrote:
> It is somewhat complex for someone like me with limited C programming to unearth the actual data (args and result) from the door call API. I believe it has custom NSS headers packed along with actual data. Tracemem for most part is not that meaningful. If anyone has already done this, please share it.

Ok, so you're trying to observe name service calls. (There are other doors in use in Solaris besides the one for nscd.)

What you need is a way to parse the door call argument data buffer and the door return rbuf too. I don't see where that's documented in the source (the protocol itself is private, but it should be documented in source), so I'm cc'ing two people who can help you with that. (The relevant source code is in $SRC/lib/libc/port/gen/nss_common.c.)

> I am starting to look at other places where I can get this data in a structured form. I started looking at pid$target:libc:_nsc_try1door:entry

That function receives the call data/results already packed.

Also, on the client side there are many more processes to trace than on the server side. I recommend tracing the server side if at all possible. Besides, if you're tracing the server you don't need to worry about 32-bit vs. 64-bit -- the server in this case is always 32-bit because nscd is only built in 32-bit.

Nico
Re: [dtrace-discuss] Need some help
On Mon, Mar 15, 2010 at 12:20:35PM -0700, tester wrote:
> #!/usr/sbin/dtrace -qs
>
> typedef struct door_arg {
>     char        *data_ptr;   /* Argument/result buf ptr */
>     size_t      data_size;   /* Argument/result buf size */
>     door_desc_t *desc_ptr;   /* Argument/result descriptors */
>     uint_t      desc_num;    /* Argument/result num desc */
>     char        *rbuf;       /* Result buffer */
>     size_t      rsize;       /* Result buffer size */
> } door_arg_t;

These declarations will use the sizes of those types as these declarations are to be used in kernel-land. That means that data_ptr is a 64-bit field in kernel-land and in DTrace context. That disagrees with 32-bit user-land.

Use door_arg32_t as Michael Bergknoff suggests. You'll see that door_arg32_t uses caddr32_t instead of "base type *":

    typedef struct door_arg32 {
        caddr32_t   data_ptr;    /* Argument/result data */
        size32_t    data_size;   /* Argument/result data size */
        caddr32_t   desc_ptr;    /* Argument/result descriptors */
        uint32_t    desc_num;    /* Argument/result num descriptors */
        caddr32_t   rbuf;        /* Result area */
        size32_t    rsize;       /* Result size */
    } door_arg32_t;

caddr32_t is (in 64-bit land) an unsigned integer type rather than a pointer type, and its size matches that of 32-bit pointer types.

Nico
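A sketch of how the door_arg32_t advice above might be applied. It assumes that door_arg32_t is visible to DTrace through the kernel's type data and that arg1 of fbt::door_call:entry is the user address of the door argument structure, as in the scripts quoted elsewhere in this thread; neither assumption has been checked here.

```d
#!/usr/sbin/dtrace -qs

/*
 * Sketch only: copy in the 32-bit layout of the door argument
 * structure and print its fields. The predicate restricts this
 * clause to 32-bit callers, where that layout applies.
 */
fbt::door_call:entry
/curpsinfo->pr_dmodel == PR_MODEL_ILP32/
{
	this->da = (door_arg32_t *)copyin(arg1, sizeof (door_arg32_t));

	/* The caddr32_t fields are integers holding user addresses. */
	printf("data_ptr=0x%x data_size=%u rbuf=0x%x rsize=%u\n",
	    this->da->data_ptr, this->da->data_size,
	    this->da->rbuf, this->da->rsize);
}
```

To read the argument data itself, a second copyin() on this->da->data_ptr with this->da->data_size would be needed, since data_ptr is still a user-land address.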
Re: [dtrace-discuss] Need some help
On Sun, Mar 14, 2010 at 07:42:08PM -0700, tester wrote:
> Ouch, copyin bites again.

Remember: copyin() returns the kernel address of the buffer into which it copied the data that you asked for. To get to the data you have to dereference that pointer.

> BTW, can you please explain the 8 byte difference in o/p
>
>     fbt::door_call:entry
>     {
>         printf("entry args are %x and %a \n", arg0, copyin(arg1,4));
>         door_arg = (curpsinfo->pr_dmodel == PR_MODEL_ILP32 ? copyin(arg1, 4) : copyin(arg1, 8));
>         printf("argument pointer %a \n", door_arg);
>     }
>
> 8 bytes difference in o/p
>
>     entry args are 3 and 0x3022640
>     argument pointer 0x3022648
>
> I was expecting either a 32bit or 64 bit address. I am getting 44 bit address.

That's printing the kernel address returned by copyin(). Try this:

    fbt::door_call:entry
    /curpsinfo->pr_dmodel == PR_MODEL_ILP32/
    {
        printf("entry args are %x and %x \n", arg0, *(uint32_t *)copyin(arg1, 4));
    }

    fbt::door_call:entry
    /curpsinfo->pr_dmodel == PR_MODEL_LP64/
    {
        printf("entry args are %lx and %lx \n", arg0, *(uint64_t *)copyin(arg1, 8));
    }

Nico
Re: [dtrace-discuss] USDT probes and CTF
On Tue, Feb 16, 2010 at 02:12:35PM +0100, Mark Phalan wrote:
> I'm working on adding USDT probes to a library. The library is built in ON and hence includes CTF data. I've set up a number of translators for the probes to use to avoid exposing the internals of the library. Unfortunately it appears as though dtrace can't take advantage of the CTF data in userspace libraries to walk library data structures. The library data-structures are complex and cannot simply be included in the provider support file so I have to resort to teaching the library to setup the arguments for dtrace to consume with an intermediate data structure. This data structure can be then included in the provider support file (in /usr/lib/dtrace).
>
> I couldn't find a CR/RFE tracking this. Is there one?

What you're looking for is called is-enabled probes. These are probes that do have disabled probe effects (a branch, basically). You get to gather the data needed for the probes only when the probes are enabled.

User-land pointer chasing and structured type handling via simple syntax that relies on CTF is not a feature of DTrace right now; you can only do that in kernel-land.

Nico
Re: [dtrace-discuss] probe order
On Mon, Dec 21, 2009 at 07:19:35AM -0800, tester wrote:
> Yes, a T5220.

You're printing the thread ID, which is good. But, suppose the thread in question gets re-scheduled on a different processor (or hw thread). DTrace has a per-CPU trace log... You can see how such re-ordering might happen :)

Nico
Re: [dtrace-discuss] emulate pstack using DTrace
On Thu, Dec 17, 2009 at 09:16:24AM -0800, GGallagher wrote:
> On Solaris 10, running 'pstack' against our JVMs (java version 1.5.0_17) can take as long as 60 seconds to complete. During that time apparently, the JVM is unable to do anything else. This causes applications to crash. So they said to me, "Please write a DTrace pstack." Using the syscall provider, I can print out the PID and TID and then a nice jstack() every time a system call is made by the PID, but that shows only the stack for the LWP that ran the system call. That's not what 'pstack' does. I want to stop the process, and then walk through each LWP and generate a stack trace. Can I do this in DTrace?

No, you can't. To be exact, you can stop a process (with the stop() destructive action), and you can then get the stack traces for all its threads (using the system() action to invoke pstack!), but not in a way that's distinguishable from actually using pstack.

The problem is the "stop the process" part of "stop the process, and then walk through each LWP and generate a stack trace." The stopping of the process is what makes the JVM stop making progress, which in turn sounds like the reason that your apps are crashing (but why are they so sensitive to timing?).

DTrace can only give you a stack trace for threads that cause some probe to fire. If a thread is blocking inside a system call at the time that your D script starts running, then DTrace will not be able to give you a stack trace for that thread until that thread wakes up and does something that causes one of your probes to fire.

Perhaps a tool could be built that does what pstack does, but without stopping the process, and only for threads that are blocking in the kernel, while using DTrace to get stack traces for active threads (probably using a profile provider probe)?

Nico
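A sketch of the "profile provider probe" half of that idea: sampling the stacks of whichever threads of the target happen to be on CPU. The sampling rate (97 Hz) and the ten-second run time are arbitrary choices made here, and, as the message above explains, threads that stay blocked in the kernel for the whole run will not appear at all.

```d
#!/usr/sbin/dtrace -s

/*
 * Sketch only: sample on-CPU threads of the target process and
 * count how often each (thread, Java stack) pair is seen.
 */
profile-97
/pid == $target/
{
	@stacks[tid, jstack()] = count();
}

/* Stop after an arbitrary ten seconds and print the aggregation. */
tick-10s
{
	exit(0);
}
```

This would be run with something like "dtrace -s sample.d -p PID", where sample.d is a hypothetical file name for the script above.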
Re: [dtrace-discuss] where do kernel data types come from?
On Wed, Oct 28, 2009 at 06:55:28PM +0000, Joel Reymont wrote:
> I have a script where I can freely reference struct nameidata*, struct vnode*, etc. on Snow Leopard. How does DTrace know about these data types?
>
> I understand things like
>
>     #pragma D depends_on library darwin.d
>
> where darwin.d has typedefs. I can't find definitions of nameidata and vnode in any D scripts, though. How does it work?

On Solaris these come from CTF data embedded in the ELF files that make up the kernel and loadable modules. The CTF data comes from ctfconvert and other build utilities, which in turn get the data from the .o files originally output by the compilers.

Nico
Re: [dtrace-discuss] Dtracing influence functionality (of PCFS)
On Sat, Oct 10, 2009 at 07:56:14AM -0700, Martin Cerveny wrote:
> PS: I understand that there is the error in PCFS (I will probably find and correct the problem), but I am very disappointed by dtracing side effect. This breaks non intrusiveness of dtrace.

All DTrace did here was have some enabled-probe performance impact, which is what it normally does and is unavoidable. The fact that there is a race condition which can be triggered by that performance impact is a bug, not in DTrace, but in the affected sub-system (pcfs).

The fact that DTrace can have such impact is sometimes a very useful side-effect of DTrace, and it's why there's a chill() [destructive] function that you can use in probe actions: race conditions can be very difficult to trigger, making the ability to play with timing via DTrace very useful.

Also, as others have pointed out, you enabled all FBT probes; when you do that the per-enabled-probe performance impact of DTrace adds up quickly. You might argue that enabling too many probes is itself as destructive as chill(), but unlike chill(), DTrace cannot judge the overall performance impact of your D scripts; failing to distinguish between destructive and non-destructive actions on the argument that all performance impact of enabled probes, no matter how tiny, is destructive would make it harder to use DTrace safely, not easier.

Nico
Re: [dtrace-discuss] putting a running app into trace mode
On Fri, Aug 28, 2009 at 05:13:41PM -0400, Chad Mynhier wrote:
> On Fri, Aug 28, 2009 at 4:44 PM, Nicolas Williams <nicolas.willi...@sun.com> wrote:
> > Don't forget to have a system("prun $pid"); action in the BEGIN probe of the second script...
>
> Nope, DTrace will actually take care of that for you. Consider this script, /tmp/foo.d:

Is this new? It's very nice...

Thanks,

Nico
Re: [dtrace-discuss] Possible proposal?
On Wed, Jun 10, 2009 at 05:09:38PM -0700, Randy Fishel wrote: Where I eventually landed, is that I need to have (or implement) a mechanism to send dtrace information to an arbitrarily defined streaming device. I was considering it as part of a 'power management' provider, but it seemed this might be useful outside of a pm provider. I agree. Simply provide some encoding of the trace data and write it wherever you can. ___ dtrace-discuss mailing list dtrace-discuss@opensolaris.org
Re: [dtrace-discuss] Possible proposal?
On Thu, Jun 11, 2009 at 01:15:14PM -0400, Colin Burgess wrote:
> Do you mean that the raw data would be sent (rather than being placed into buffers?) or that the dtrace util would spit out its info to the port?

The user-land part of dtrace wouldn't be running, so I'm sure Randy meant the former.

> If the former, you will be limiting the acquisition of data to the baud rate/buffering capability of the serial driver, no?

Wouldn't that be true regardless?
Re: [dtrace-discuss] Possible proposal?
On Wed, Jun 10, 2009 at 04:15:30PM -0700, Randy Fishel wrote:
> This morning I had the thought that maybe some or all of this functionality could be handled by having a 'logger' provider in dtrace. My needs are to have a lightweight non-volatile logging mechanism that will continue to function while normal hardware I/O channels are being stopped or powered off. But I considered that this feature might be useful to other consumers.

Have you looked at anonymous DTrace scripts? Those are used for tracing early in boot, and their results end up in a kernel-land buffer until retrieved from user-land once the system is up. You could use the same approach.

One of the nice things about this is that if there's a panic you can always get at the buffered data using kmdb (if you can't use kmdb but can force a dump then you can always look at the data in the dump).

Nico
Re: [dtrace-discuss] file offset monitoring weirdness
On Thu, Jan 29, 2009 at 01:48:05PM -0800, Glenn Skinner wrote:
>     syscall::write:entry
>     /pid == $target && arg0 == 1/
>     {
>         interesting = 1;
>         printf("writing 0x%x bytes to stdout at 0x%x\n", arg2, fds[1].fi_offset);
>     }
>
>     syscall::write:return
>     /pid == $target && interesting == 1/
>     {
>         printf("wrote 0x%x bytes\n", arg0);
>     }
>
>     syscall:::entry
>     /pid == $target && interesting == 1/
>     {
>     }

The last probe does very little. And the second one catches write() returns other than the ones you want. Perhaps you meant something like:

    syscall::write:entry
    /pid == $target && arg0 == 1/
    {
        self->interesting = 1;
        printf("writing 0x%x bytes to stdout at 0x%x\n", arg2, fds[1].fi_offset);
    }

    syscall::write:return
    /pid == $target && self->interesting == 1/
    {
        printf("wrote 0x%x bytes\n", arg0);
    }

Nico
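One refinement that is not in the reply above, offered as a sketch: the thread-local flag is never cleared, so once a thread has written to stdout, every later write() return on that thread also matches. Resetting the flag in the return clause ties each "wrote" line to the write that set it, and assigning zero also lets DTrace release the variable's storage.

```d
syscall::write:entry
/pid == $target && arg0 == 1/
{
	/* Mark this thread as being inside a write to stdout. */
	self->interesting = 1;
	printf("writing 0x%x bytes to stdout at 0x%x\n",
	    arg2, fds[1].fi_offset);
}

syscall::write:return
/self->interesting/
{
	printf("wrote 0x%x bytes\n", arg0);

	/* Clear the flag so unrelated write() returns do not match. */
	self->interesting = 0;
}
```

The pid test is dropped from the second predicate because the flag can only have been set by a thread of the target process.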
Re: [dtrace-discuss] Systemtap and Dtrace Comparison
On Tue, Jan 20, 2009 at 09:33:42AM -0800, Rob Clark wrote:
> Summary comparing systemtap and dtrace
> http://sourceware.org/systemtap/wiki/SystemtapDtraceComparison
>
> It looks like we are ahead in a few spots also.

We being DTrace?

That comparison is wrong or misleading in a number of areas. For example: DTrace can instrument every instruction in user-land, whereas that page says SystemTap can instrument "zillions (statements, functions)" and DTrace only "millions (functions, markers)".

The contention that SystemTap can access "any context-visible variable as preserved by compiler" is misleading unless it is customary to ship production object code with low optimization levels and with full debug information. The same observation applies to the ability to set probes on statements; production code does not normally ship with optimization turned off and debug information.

Another misleading item: "full control structures (conditionals, loops, functions)" -- DTrace does not provide loops and functions in D _by design_, _on purpose_, _for good reason_, but no discussion is given.

Yet another: "kernel coupling - interdependent development/schedule" -- methinks that the number of operating systems to which DTrace has been ported is proof that the level of such coupling is not "[a] lot."

There are other such problems.

Finally, I'm not sure what is meant by "binary tracing" in SystemTap. If that means tracing arbitrary memory buffers then that is definitely supported by DTrace. If it means tracing arbitrary instructions then that is true only of the DTrace pid provider (user-land).

Much bandwidth has been spilled on this topic. If you want more you can find it in the mailing list archives for DTrace, and in SystemTap's, as well as in plenty of blog entries.

Nico
Re: [dtrace-discuss] Programmatic interface to DTrace?
On Fri, Jan 16, 2009 at 02:03:48PM -0500, Chip Bennett wrote: and for the C API (libdtrace): http://dev.lrem.net/tcldtrace/wiki/LibDtrace Oh, and there's a tantalizing link to a Tcl interface, not yet populated, and an even more tantalizing link to what a TclDTrace script would look like! ___ dtrace-discuss mailing list dtrace-discuss@opensolaris.org
Re: [dtrace-discuss] [Fwd: Why exece syscall on 64-bit OS returns 32-bit address]
On Wed, Dec 17, 2008 at 10:53:28PM +0800, Jason Zhao wrote:
> I have a quick question, I run the following dtrace command on my Lenovo T61 which is 64-bit kernel running. But after that, I found the dtrace return 32-bit address, other than 64-bit address, why does that happen? Does that mean 64-bit kernel still use 32-bit syscall?

No, 32-bit user processes run in 32-bit mode. A 64-bit platform running a 64-bit kernel can run both 32- and 64-bit user processes.

What's happening is that you're catching the *entry* to exece(), and that's from your shell (after fork()ing the child that is to run ls(1)):

> # dtrace -n 'syscall::exece:entry { stack(); ustack() }'
                               ^
> dtrace: description 'syscall::exece:entry ' matched 1 probe
> <--- run /usr/bin/amd64/ls command on other term.
> CPU     ID                    FUNCTION:NAME
>   1  59502                      exece:entry
>               unix`_sys_sysenter_post_swapgs+0x14b
>
>               0xfedb0b85    <- 32-bit address

exece() normally does not return to the caller (it returns only if there's an error), and ls(1) doesn't call exece().

What are you looking for?
Re: [dtrace-discuss] lsof vs Dtrace
On Fri, Dec 12, 2008 at 09:45:08AM -0500, James Carlson wrote:
> Chip Bennett writes:
> > Pfiles should be rewritten to not stop processes. I had to go look at the code to make sure you were right on this. If lsof can gather open file info without stopping processes, why can't pfiles do that.
>
> lsof does it because it reads the volatile kernel structures on the running system. Often that works because things aren't changing right at the moment when you look at them. But it's also possible that you get back garbage.
>
> pfiles stops the process because it uses the debugging interfaces, just as (say) mdb or gdb.

But not kmdb. mdb -k can do everything that pfiles can do but without stopping processes.

The only problem with the kmdb approach is that not all sockets are (were, now that Volo has integrated?) associated with file structs, so finding open sockets for kernel-land services required quite a bit more work than merely walking the process table. I have a very out of date script lying around that did just that.

But the point is that kmdb scripting and lsof are on the same footing.

> > Or, if there is some additional reliability/consistency to be gained by stopping the process, perhaps pfiles could be modified to have an option that causes it stop the process, but by default doesn't, just to be safe.
>
> The long-term answer, I think, is that we need stable kernel interfaces that will provide the information that lsof needs to work.

I agree.

Nico
Re: [dtrace-discuss] lsof vs Dtrace
On Fri, Dec 12, 2008 at 02:00:25PM -0800, Dan Mick wrote: For the record: 'kmdb' and 'mdb -k' are different beasts. kmdb == mdb -K == boot with -k == console only == stop the kernel in its tracks. mdb -k is have a gander at the still-running kernel. I knew I'd get in trouble over that nit :-) I should have said mdb -k. (The script does use mdb -k, of course -- you'd not want it to drop the system into kmdb!)
Re: [dtrace-discuss] dtrace missing 'unlinkat'? showing
On Wed, Oct 01, 2008 at 12:59:15PM -0700, Adam Leventhal wrote: On Wed, Oct 01, 2008 at 12:16:07AM -0500, Nicolas Williams wrote: pid$1:libsomelibrary:some_function:entry { self->some_item = copyin_expr(arg[0]->field1->field2->field3); ... } and the D *compiler* looking up the CTF for this and figuring out what sequence of copyins to translate that into, including, possibly, defining a useful struct type for self->some_item. The bitness of the target would have to be known at compile-time, else the compiler would have to generate DOF code for both 32- and 64-bit targets and branch at run-time according to which the running target is. We don't even need the copyin_expr() part -- there's no reason we couldn't know that copyins were implicitly required. Oh, I know, I didn't mean that it'd have to look like a function. Native compiler support might look completely different, better integrated, something like pid$1:libsomelibrary:some_function:entry { self->some_item = arg[0]->field1->field2->field3; trace(self->some_item.field4); ... } If I were building a pre-processor to generate D code to replace such an expression I'd probably just add a copyin directives section between a probe's predicate and actions. That would make it easy to extract the copyin expressions without having to parse the actions.
Re: [dtrace-discuss] dtrace missing 'unlinkat'? showing
On Wed, Oct 01, 2008 at 01:00:52PM -0700, Adam Leventhal wrote: On Wed, Oct 01, 2008 at 12:02:04PM -0700, Paul Macknee wrote: The alternative I see, is to need to know the usage about each syscall (all hundreds of them) and write hundreds of separate probes that know how the arguments are to be parsed. As it is, if I even try to do intelligent preprocessing: That would already be necessary: there's no type information for system calls and no typed arguments. For a DTrace-based truss you'd need to replicate the gigantic tables in the truss source code. Usually there's a kernel function that gets all the copied-in arguments of a syscall (e.g., copen(), in the case of open(), open64(), openat()...), so judicious use of the FBT provider can get you what you want. But that's not stable. It'd be nice if for every syscall there was a stable function that got the copied-in args. Nico
Re: [dtrace-discuss] mblk chains?
On Wed, Jun 11, 2008 at 11:45:05AM -0400, James Carlson wrote: Andrew Gallatin writes: Just that the second snippet seems to be rather recursive (this->mp = this->mp->b_cont) and it seems to work, and is a nice alternative to enumerating all the potential chains as was suggested. Maybe I'm thinking of things incorrectly. It's not recursive; it's sequential. The instructions are executed in the order they appear. There are no places where it jumps backwards or reconsiders an earlier probe point based on what a later one says. What might seem recursive is that the same variable names work in each case. Think of this as unrolled recursion.
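For readers without the earlier snippets to hand, the "unrolled" pattern being discussed looks roughly like the sketch below. The choice of fbt::putnext:entry (whose second argument is an mblk_t *) and the depth of three are assumptions made for illustration, not the code from the thread. Clause-local variables keep their values across clauses of the same probe firing, which is what lets each repeated clause pick up where the previous one left off.

```d
fbt::putnext:entry
{
	this->mp = args[1];	/* head of the mblk chain */
	this->blocks = 0;
}

/*
 * The same clause is written out once per link we are willing to follow.
 * Clauses run in program order, so this is sequential, not recursive.
 */
fbt::putnext:entry
/this->mp != NULL/
{
	this->blocks++;
	this->mp = this->mp->b_cont;
}

fbt::putnext:entry
/this->mp != NULL/
{
	this->blocks++;
	this->mp = this->mp->b_cont;
}

fbt::putnext:entry
/this->mp != NULL/
{
	this->blocks++;
	this->mp = this->mp->b_cont;
}

fbt::putnext:entry
{
	/* chains longer than three blocks are all counted as three */
	@ = lquantize(this->blocks, 0, 4, 1);
}
```

The cost of this approach is that the maximum chain length is fixed when the script is written; following longer chains means adding more copies of the middle clause.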
Re: [dtrace-discuss] whatfor.d -- where's null pointer?
On Mon, Apr 21, 2008 at 11:00:29AM -0700, Adam Leventhal wrote: So curlwpsinfo->pr_stype can work and later fail. Looking at the translator for that field we see that it looks like this: pr_stype = T->t_sobj_ops ? T->t_sobj_ops->sobj_type : 0; This compiles to this DIF code: [...] We can see that we load the t_sobj_ops member once at offset 07 and then again at offset 17 (right before we load sobj_type at offset 18). The t_sobj_ops member can be set to NULL asynchronously from other threads so this double load introduces a window for the failure that you're seeing. Either we need to use some temporary, probe-local variable (one that can't conflict with a user-defined variable), or we need to perform some element of optimization to the generated DIF. In the world of LISP macros the macro writer would gensym a local variable (probe-local in this case) to deal with this sort of issue. Perhaps the DTrace translator facility needs a probe-local gensym feature. Nico
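Until the translator is fixed, a script can apply the "temporary variable" idea by hand rather than going through curlwpsinfo. The following is a sketch only: the sched:::off-cpu probe and the variable names are illustrative choices, and it assumes the kernel's kthread_t still has a t_sobj_ops member as described above.

```d
sched:::off-cpu
{
	/* load the pointer from the thread structure exactly once ... */
	this->ops = curthread->t_sobj_ops;

	/* ... then test and dereference only the saved copy */
	this->stype = this->ops != NULL ? this->ops->sobj_type : 0;

	@[this->stype] = count();
}
```

Because the NULL test and the dereference both use this->ops, another thread clearing t_sobj_ops in between should no longer matter for this clause.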
Re: [dtrace-discuss] Iterating over all LWPs
On Mon, Mar 24, 2008 at 10:35:43AM -0700, Roman Shaposhnik wrote: Any suggestions? Either don't use DTrace if you need to iterate, or start the target via DTrace so you can keep track of all the LWPs in your D script.
Re: [dtrace-discuss] Iterating over all LWPs
What I meant by the "don't use DTrace if you need to iterate" option is this: try using MDB. Given what you say you might find it a lot easier to sample the items you need via MDB.
Re: [dtrace-discuss] Missing syscall provider probes for unlinkat(2) and friends?
On Sun, Feb 03, 2008 at 01:34:02PM -0800, Adam Leventhal wrote: I'm not sure why we don't include the evaluation in the public version, but here it is: ---8--- Evaluation [ahl 8.9.2007] unlink(2) and unlinkat(2) are different system calls. unlinkat(2) is actually a subcode (number 5) of the SYS_fsat system call. While it might be confusing to users, DTrace isn't going to slap lipstick on that particular pig. [...] ---8--- This pig isn't very attractive. Is a system call number shortage the underlying problem? And is the fix to this ultimately about fixing the syscall number shortage? Nico
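Given that evaluation, unlinkat(2) calls can still be picked out of the fsat probe by testing the subcode. This is a rough sketch: subcode 5 comes from the evaluation above, but the layout of the remaining arguments (directory fd in arg1, path in arg2) is an assumption that should be checked against the system being traced.

```d
#!/usr/sbin/dtrace -qs

/* arg0 is the fsat subcode; 5 is unlinkat per the CR evaluation */
syscall::fsat:entry
/arg0 == 5/
{
	/* assumed layout: arg1 = directory fd, arg2 = user pointer to the path */
	printf("%s[%d] unlinkat(%d, \"%s\")\n", execname, pid,
	    (int)arg1, copyinstr(arg2));
}
```

Running this while using rm(1) in another terminal should show one line per removed file, if the assumed argument order holds.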
Re: [dtrace-discuss] Missing syscall provider probes for unlinkat(2) and friends?
On Mon, Feb 04, 2008 at 05:44:05PM +0100, Joerg Schilling wrote: Nicolas Williams wrote: This pig isn't very attractive. Is a system call number shortage the underlying problem? And is the fix to this ultimately about fixing the syscall number shortage? Grouping several syscalls under a single entry is nothing new. I am aware. However, I think, e.g., unlinkat(2) should be just like unlink(2), from a DTrace syscall provider p.o.v. Adam called this a pig for a reason; I agree with that characterization. Nico
[dtrace-discuss] Missing syscall provider probes for unlinkat(2) and friends?
I see that rm(1) uses unlinkat(2), but I don't see a syscall provider probe for unlinkat(2). That's... annoying (but there's always the fbt provider). Actually, I don't see any syscall provider probes for any of the open/unlink/rename/...at[64]() system calls. Is there a CR for this? Nico
Re: [dtrace-discuss] Missing syscall provider probes for unlinkat(2) and friends?
On Fri, Feb 01, 2008 at 01:09:22PM -0500, James Carlson wrote: Nicolas Williams writes: I see that rm(1) uses unlinkat(2), but I don't see a syscall provider probe for unlinkat(2). That's... annoying (but there's always the fbt provider). Actually, I don't see any syscall provider probes for any of the open/unlink/rename/...at[64]() system calls. Is there a CR for this? It's not actually missing. A quick sunsolve search will get you CR 6590548, which explains that fsat is the actual syscall involved. Ah. Thanks!
Re: [dtrace-discuss] DNS profiling
On Thu, Jan 24, 2008 at 09:58:32AM -0800, Fletcher Cocquyt wrote: Fellow dtracers - I am diagnosing a busy spamassassin/mailman/sendmail system - the CPU and memory usage seems normal, but iotop shows I/O and network are very busy (and initiating TCP connections is quite slow sometimes, even after tuning the stack). I suspect DNS latency - how can I use DTrace to quantify latency due to DNS lookups on this server? Er, DNS issues don't cause TCP connections to take a long time to complete. DNS issues definitely can cause applications to take a long time to get around to initiating TCP connections. You can use snoop to find DNS requests that are going unanswered, though the find part is very manual. Otherwise you can do as Brendan said, and for applications using the resolver directly, you may want to use pid provider probes on res_nsend(). Applications that speak DNS without using the resolver will be more difficult to trace -- you'll have to trace calls to connect/write/read/sendto/sendmsg/... system calls and figure out how to tell which such calls are interesting based on TCP/UDP port number 53. Nico
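For the resolver case, the pid provider suggestion could look something like the sketch below. It assumes the process of interest is attached with -p (or started with -c), so that $target is set, and that the process resolves names through res_nsend() in the system resolver library; the aggregation label is arbitrary.

```d
#!/usr/sbin/dtrace -s

/* remember when this thread entered the resolver */
pid$target::res_nsend:entry
{
	self->ts = timestamp;
}

/* on return, record how long the query took on this thread */
pid$target::res_nsend:return
/self->ts/
{
	@["res_nsend latency (ns)"] = quantize(timestamp - self->ts);
	self->ts = 0;
}
```

Something like `dtrace -s dnslat.d -p <pid of a sendmail or spamd process>` (the script name is hypothetical) would then print a latency distribution on exit. A long tail in that distribution would point at slow or unanswered DNS queries.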
Re: [dtrace-discuss] Missing struct?
On Wed, Dec 05, 2007 at 12:05:04PM -0500, Brian Utterback wrote: And just to make sure, if I want to see the values in the struct when ntp_adjtime returns, I need to do the copyin again, right? Or use fbt.
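The "copyin again" approach amounts to saving the user address at entry and copying the buffer in a second time at return, since the kernel fills it in between the two probes. The sketch below is hedged accordingly: it dumps raw bytes with tracemem() instead of assuming the struct type is known to DTrace, and the 64-byte length is a placeholder rather than the real size of the structure.

```d
syscall::ntp_adjtime:entry
{
	self->tx = arg0;	/* user address of the caller's struct */
}

syscall::ntp_adjtime:return
/self->tx/
{
	/* copy the struct in again: it has been updated since entry */
	tracemem(copyin(self->tx, 64), 64);	/* 64 is a placeholder size */
	self->tx = 0;
}
```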
Re: [dtrace-discuss] Can you trace a library across all pids?
On Wed, Nov 21, 2007 at 09:58:49AM -0800, Bryan Cantrill wrote: We wanted to develop a dtrace script that would capture the raw data that would be used to modify the internal database just before it was actually used to update the database. I have written this script, and it does dump out the record. However, it turns out that the database can be modified by three different methods, and each of these can be run many times under different pid numbers of course. The three methods are kadmin, kadmin.local and kpasswd. Thanks for the use case! One thing to investigate: add an SDT probe in the common library where the database is modified. Especially with is-enabled probes, you can construct this such that it provides rich arguments (e.g., the nature of the query or transaction) -- and then the ability to dynamically instrument all processes that make the call just falls out. And to be honest, it's such a nasty problem to allow instrumentation of arbitrary functions across arbitrarily many (and arbitrarily dynamic) processes that we're unlikely to solve it (at least in the near future) -- especially with SDT probes providing such a reasonable solution... I've wanted that myself, but it does seem like a difficult problem! One way to deal with this particular use case might be to use the syscall provider to catch opens for write of that DB, stop the caller and start a new instance of dtrace (via system()) that traces the stopped process and pruns it. I've done this sort of thing before and it works OK. It slows down the DB open operation a lot (since system() involves fork/execing), and it requires destructive actions, but it's livable. Nico
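The outer script for that stop-and-respawn trick might be sketched as follows. Everything specific here is a stand-in: the database path, the per-process script name dbtrace.d, and the assumption that dbtrace.d itself runs prun on its target once its probes are enabled. The write test assumes the usual O_WRONLY (1) and O_RDWR (2) flag values.

```d
#!/usr/sbin/dtrace -s

#pragma D option destructive
#pragma D option quiet

syscall::open:entry,
syscall::open64:entry
/(arg1 & 3) != 0 && copyinstr(arg0) == "/path/to/the/database"/
{
	/* freeze the caller before it touches the database */
	stop();

	/*
	 * Hand the stopped process to a second dtrace instance.
	 * dbtrace.d is hypothetical and is expected to prun its target.
	 */
	system("dtrace -q -s dbtrace.d -p %d &", pid);
}
```

As noted above, this needs destructive actions and adds a fork/exec to every matching open, so it is only suitable where that slowdown is acceptable.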
Re: [dtrace-discuss] provider proposal: NFS v4
On Thu, Nov 15, 2007 at 06:43:07PM -0600, Spencer Shepler wrote: On Nov 15, 2007, at 4:27 PM, Adam Leventhal wrote: Does 4.1 differ so much from 4 that the provider name itself must differ? That's funny. Yes. How funny is it? Will we want to support both the nfsv4 and nfsv41 providers? Will the probes overlap in a way that would create incompatibilities? Funny == flippant, in the sense that I would have expected Nico to know some of the details and was poking him a little on the issue (Hi, Nico). Yes, _I_ got it :) [...] The larger issue is that there are a couple of arguments (OPEN mainly) that have been extended through the NFSv4 minor version rules to include additional parameters in effect. This means that an NFSv4 and NFSv4.1 provider should not share the argument definition. I don't see that as much of a problem either if the providers were per minor version (e.g. NFSv4.0 and NFSv4.1). Certainly there's no problem if we have separate providers for each version, but I did think (and you confirm) that a large subset of ops are common to both versions. If the implementation shares that commonality then it makes sense to at least try for one provider. Architecturally, if the differences between the two versions are sufficiently large then separate providers would be better. I'll be happy either way though :) Nico
Re: [dtrace-discuss] provider proposal: NFS v4
On Fri, Nov 16, 2007 at 01:53:05PM -0800, Adam Leventhal wrote: On Fri, Nov 16, 2007 at 03:20:02PM -0600, Spencer Shepler wrote: On Fri, Nov 16, 2007 at 12:58:19PM -0600, Nicolas Williams wrote: (I.e., drop the op- in the client-server compound direction.) We chose to include 'op-' because we thought that it created better symmetry with 'compound-cb'. The 'op-' doesn't indicate that it's an operation (that would be 'op-compound-start'); rather it indicates that it's the operation compound rather than the callback compound. nfsv4:::compound-proc-start nfsv4:::compound-proc-done nfsv4:::compound-proc-cb-start nfsv4:::compound-proc-cb-done and to be complete nfsv4:::null-proc-start nfsv4:::null-proc-done Why is that better? Can you explain a bit? It still seems as though you're still losing the symmetry between the compound operations and callbacks, but perhaps that's intentional. I don't see that. I think what Spencer and I are saying is that COMPOUND is not an operation, but a procedure. And folks who know the protocol would expect the probe naming to reflect the procedure vs. operation distinction. It's mostly six of one and half-a-dozen of the other, but suppose v4.2 added a COMPOUND OP (no, I'm sure that wouldn't happen)... I think the simplest thing to do is to let the probe names reflect the protocol element names wherever probes relate directly to protocol elements. It may be wordier, but it will be less surprising to those who know the protocol (and yes, I realize that the proponents here know the protocol, much better than I know it too). Nico
Re: [dtrace-discuss] How do I look up syscall name
On Wed, Aug 29, 2007 at 11:31:13AM -0700, Martin Englund wrote: Thanks for the suggestion. This is how the probes fire: fbt:c2audit:audit_start:entry fbt:c2audit:audit_start:return syscall:::entry syscall:::return fbt:c2audit:audit_finish:entry fbt:c2audit:audit_finish:return I guess I can just save the timing data I calculate in fbt:c2audit:audit_start:return and add it to my aggregation when I'm in fbt:c2audit:audit_finish:return. And you don't even need to use a speculation -- thread locals should suffice, no?
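With thread-local variables, the plan described above could be sketched like this. It relies on the probe firing order listed in the message; the variable names and the choice of sum() as the aggregating function are illustrative only.

```d
fbt:c2audit:audit_start:entry
{
	self->t0 = timestamp;
}

/* keep the audit_start cost on the thread until audit_finish returns */
fbt:c2audit:audit_start:return
/self->t0/
{
	self->cost = timestamp - self->t0;
	self->t0 = 0;
}

/* the syscall name is only known once the syscall probe fires */
syscall:::entry
{
	self->name = probefunc;
}

fbt:c2audit:audit_finish:entry
{
	self->t1 = timestamp;
}

/* total audit overhead (ns) per syscall name */
fbt:c2audit:audit_finish:return
/self->t1/
{
	@overhead[self->name] = sum(self->cost + (timestamp - self->t1));
	self->cost = 0;
	self->t1 = 0;
	self->name = 0;
}
```

Since every value lives on the thread that made the system call, no speculation buffer is needed to carry the data from the first probe to the last.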