Re: [dtrace-discuss] How to dig deeper
%busy is meaningless unless you're looking at a single disk that can only have 1 outstanding IO in it's active queue, which is to say %busy is a useless metric for anything disk that's been designed and built in the last decade. Ignore %busy. Focus on queue depths and queue service times, both of which are reported in iotstat. And again, averages can be misleading. Use the iostat data as a starting point, but if you really think you have some disk IO issues, use DTrace and aggregations to get more precise data on disk IO times. Thanks, /jim Hans-Peter wrote: Hello Jim, The iostat average are not too bad. But the busy% is 100% most of the time. regards Hans-Peter ___ dtrace-discuss mailing list dtrace-discuss@opensolaris.org
Re: [dtrace-discuss] Tracing application-specific data structures
Adam, thanks for your reply. I'm trying to do a root-cause analysis on a troublesome customer bug and single-stepping with a debugger is making the context difficult to track. I've been searching the forums before and after the posting. The problem has to do with what terms to search for. I did find a thread in volume 42 which covered why conditionals weren't included in the language, but alternative patterns for traversing data structures seem to be lacking. Again, it probably comes down to knowing well enough the taxonomy to employ in searching in the DTrace forum. Questions: 1) By tick probe are you meaning a probe that fires on a timer tick, used to trigger the next iteration through the data structure? If so, I'd thought the data structure would be morphing underneath the traversal. If not, can you point me to an example? 2) Before writing to the alias, I was trying to get my head around something like the first approach you mention, of unrolling the loop and determining a stopping point. Having dealt with Prolog in the distant past, this at least occurred to me, but I'm missing how one can express this with DTrace. Any pointers are appreciated. Cheers, Matthew -- This message posted from opensolaris.org ___ dtrace-discuss mailing list dtrace-discuss@opensolaris.org
Re: [dtrace-discuss] How to dig deeper
It seems VOP_RWLOCK takes a long time. ufs_rwlock starts at line 12, but ends at line 15611. Is there another application keeps reading/writing in your system? -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Hans-Peter Sent: 2008年12月3日 21:41 To: dtrace-discuss@opensolaris.org Subject: Re: [dtrace-discuss] How to dig deeper Hello all, I added a clause to my script. sysinfo::: /self-traceme==1 pid == $1/ { trace(execname); printf(sysinfo: timestamp : %d , timestamp); } A subsequent trace created a file of about 19000 lines. I loaded in Excel to be able to subtrace timestamps etc. The longest jump in timestamp is between the first pair of savectx and restorectx at line 70. I count 50 savectx/restorectx calls in my trace. But the actual physical write as indicated by the sysinfo is almost at the end of the file directly after the ufs_write and fop_write call end (line 18746). 24 - tsd_agent_get oracle timestamp 1795469946839100 24 - ufs_lockfs_top_vop_return oracle timestamp 1795469946841500 24 - ufs_lockfs_top_vop_return oracle timestamp 1795469946844300 24 - ufs_lockfs_end oracle timestamp 1795469946846700 24 - ufs_write oracle timestamp 1795469946849600 24 - fop_write oracle timestamp 179546994685350057,365,700 24 | pwrite syswriteoracle sysinfo timestamp 1795469946856800 24 | pwrite writech oracle sysinfo timestamp 1795469946860200 So it seems that the actual write not the bottle neck but I attached a zip file with the excel document. Am I right in thinking that is is more an OS issue than a storage issue? Regards Hans-Peter -- This message posted from opensolaris.org ___ dtrace-discuss mailing list dtrace-discuss@opensolaris.org
Re: [dtrace-discuss] How to dig deeper
The sysinfo provider isn't the best choice for measuring disk IO times. Run; #dtrace -s /usr/demo/dtrace/iotime.d /jim Hans-Peter wrote: Hello all, I added a clause to my script. sysinfo::: /self-traceme==1 pid == $1/ { trace(execname); printf(sysinfo: timestamp : %d , timestamp); } A subsequent trace created a file of about 19000 lines. I loaded in Excel to be able to subtrace timestamps etc. The longest jump in timestamp is between the first pair of savectx and restorectx at line 70. I count 50 savectx/restorectx calls in my trace. But the actual physical write as indicated by the sysinfo is almost at the end of the file directly after the ufs_write and fop_write call end (line 18746). 24 - tsd_agent_get oracle timestamp 1795469946839100 24 - ufs_lockfs_top_vop_return oracle timestamp 1795469946841500 24 - ufs_lockfs_top_vop_return oracle timestamp 1795469946844300 24 - ufs_lockfs_end oracle timestamp 1795469946846700 24 - ufs_write oracle timestamp 1795469946849600 24 - fop_write oracle timestamp 179546994685350057,365,700 24 | pwrite syswriteoracle sysinfo timestamp 1795469946856800 24 | pwrite writech oracle sysinfo timestamp 1795469946860200 So it seems that the actual write not the bottle neck but I attached a zip file with the excel document. Am I right in thinking that is is more an OS issue than a storage issue? Regards Hans-Peter ___ dtrace-discuss mailing list dtrace-discuss@opensolaris.org ___ dtrace-discuss mailing list dtrace-discuss@opensolaris.org
Re: [dtrace-discuss] How to dig deeper
Also - since this is Oracle, are the Oracle files on a file system, or raw devices? If a file system, which one? /jim Jim Mauro wrote: The sysinfo provider isn't the best choice for measuring disk IO times. Run; #dtrace -s /usr/demo/dtrace/iotime.d /jim Hans-Peter wrote: Hello all, I added a clause to my script. sysinfo::: /self-traceme==1 pid == $1/ { trace(execname); printf(sysinfo: timestamp : %d , timestamp); } A subsequent trace created a file of about 19000 lines. I loaded in Excel to be able to subtrace timestamps etc. The longest jump in timestamp is between the first pair of savectx and restorectx at line 70. I count 50 savectx/restorectx calls in my trace. But the actual physical write as indicated by the sysinfo is almost at the end of the file directly after the ufs_write and fop_write call end (line 18746). 24 - tsd_agent_get oracle timestamp 1795469946839100 24 - ufs_lockfs_top_vop_return oracle timestamp 1795469946841500 24 - ufs_lockfs_top_vop_return oracle timestamp 1795469946844300 24 - ufs_lockfs_end oracle timestamp 1795469946846700 24 - ufs_write oracle timestamp 1795469946849600 24 - fop_write oracle timestamp 179546994685350057,365,700 24 | pwrite syswriteoracle sysinfo timestamp 1795469946856800 24 | pwrite writech oracle sysinfo timestamp 1795469946860200 So it seems that the actual write not the bottle neck but I attached a zip file with the excel document. Am I right in thinking that is is more an OS issue than a storage issue? Regards Hans-Peter ___ dtrace-discuss mailing list dtrace-discuss@opensolaris.org ___ dtrace-discuss mailing list dtrace-discuss@opensolaris.org ___ dtrace-discuss mailing list dtrace-discuss@opensolaris.org
[dtrace-discuss] Round four: Re: code review req: 6750659 drti.o crashes app due to corrupt environment
I believe that I have incorporated all of the feedback given (thanks!). Changes since the 2008-11-16 version include: - ksh style coding standards compliance in test script (Roland) - dof_init_debug == B_FALSE vs. !dof_init_debug (Adam) The updated webrev is at: http://cr.opensolaris.org/~mgerdts/6750659-2008-12-03/ -- Mike Gerdts http://mgerdts.blogspot.com/ ___ dtrace-discuss mailing list dtrace-discuss@opensolaris.org
Re: [dtrace-discuss] Tracing application-specific data structures
2) Before writing to the alias, I was trying to get my head around something like the first approach you mention, of unrolling the loop and determining a stopping point. Having dealt with Prolog in the distant past, this at least occurred to me, but I'm missing how one can express this with DTrace. For example, you could do something like this: struct foo { int data; struct foo *next; }; probe { this-lst = arg1; } probe /this-list != NULL/ { trace(this-lst-data); this-lst = this-lst-next; } /* ... */ probe /this-list != NULL/ { trace(this-lst-data); this-lst = this-lst-next; } I blogged about something like this awhile ago: http://blogs.sun.com/ahl/entry/pid2proc_for_dtrace Questions: 1) By tick probe are you meaning a probe that fires on a timer tick, used to trigger the next iteration through the data structure? If so, I'd thought the data structure would be morphing underneath the traversal. If not, can you point me to an example? You could do something like this: probe { lst = arg1; } tick-10ms /lst != NULL/ { trace(lst-data); lst = lst-next; } But remember, that the use of global variables means that you can only have one of these at a time. Adam -- Adam Leventhal, Fishworks http://blogs.sun.com/ahl ___ dtrace-discuss mailing list dtrace-discuss@opensolaris.org
Re: [dtrace-discuss] Tracing application-specific data structures
Thanks, Adam! This is what I was starting to converge on, but was fumbling with ordering of clauses. Your examples make things a lot clearer. Cheers, Matthew On Dec 3, 2008, at 6:15 PM, Adam Leventhal wrote: 2) Before writing to the alias, I was trying to get my head around something like the first approach you mention, of unrolling the loop and determining a stopping point. Having dealt with Prolog in the distant past, this at least occurred to me, but I'm missing how one can express this with DTrace. For example, you could do something like this: struct foo { int data; struct foo *next; }; probe { this-lst = arg1; } probe /this-list != NULL/ { trace(this-lst-data); this-lst = this-lst-next; } /* ... */ probe /this-list != NULL/ { trace(this-lst-data); this-lst = this-lst-next; } I blogged about something like this awhile ago: http://blogs.sun.com/ahl/entry/pid2proc_for_dtrace Questions: 1) By tick probe are you meaning a probe that fires on a timer tick, used to trigger the next iteration through the data structure? If so, I'd thought the data structure would be morphing underneath the traversal. If not, can you point me to an example? You could do something like this: probe { lst = arg1; } tick-10ms /lst != NULL/ { trace(lst-data); lst = lst-next; } But remember, that the use of global variables means that you can only have one of these at a time. Adam -- Adam Leventhal, Fishworks http://blogs.sun.com/ahl ___ dtrace-discuss mailing list dtrace-discuss@opensolaris.org
Re: [dtrace-discuss] How to dig deeper
Hi Lejun, No there is only a 1 Oracle database running on this system. regards Hans-Peter It seems VOP_RWLOCK takes a long time. ufs_rwlock starts at line 12, but ends at line 15611. Is there another application keeps reading/writing in your system? -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Hans-Peter Sent: 2008年12月3日 21:41 To: dtrace-discuss@opensolaris.org Subject: Re: [dtrace-discuss] How to dig deeper Hello all, I added a clause to my script. sysinfo::: /self-traceme==1 pid == $1/ { trace(execname); printf(sysinfo: timestamp : %d , timestamp); } A subsequent trace created a file of about 19000 lines. I loaded in Excel to be able to subtrace timestamps etc. The longest jump in timestamp is between the first pair of savectx and restorectx at line 70. I count 50 savectx/restorectx calls in my trace. But the actual physical write as indicated by the sysinfo is almost at the end of the file directly after the ufs_write and fop_write call end (line 18746). 24 - tsd_agent_get oracle timestamp 1795469946839100 24 - ufs_lockfs_top_vop_return oracle timestamp 1795 469946841500 24 - ufs_lockfs_top_vop_return oracle timestamp 1795 469946844300 24 - ufs_lockfs_end oracle timestamp 179546994684670 0 24 - ufs_write oracle timestamp 1795469946849600 24 - fop_write oracle timestamp 179546994685350057 ,365,700 24 | pwrite syswriteoracle sysinfo timestamp 1795469 946856800 24 | pwrite writech oracle sysinfo timestamp 17954699 46860200 So it seems that the actual write not the bottle neck but I attached a zip file with the excel document. Am I right in thinking that is is more an OS issue than a storage issue? Regards Hans-Peter -- This message posted from opensolaris.org ___ dtrace-discuss mailing list dtrace-discuss@opensolaris.org -- This message posted from opensolaris.org ___ dtrace-discuss mailing list dtrace-discuss@opensolaris.org
Re: [dtrace-discuss] How to dig deeper
Hi Mauro, Yes I understand why sysinfo is not the best to measure IO. But I just wanted to see when in the whole trace the actual physical write was being done. So it seems to me that, because the sysinfo:::pwrite is right at the end the performance bottle neck is more because of the locking etc. The database files are on ufs filesystems. Is zfs better? Regards Hans-Peter -- This message posted from opensolaris.org ___ dtrace-discuss mailing list dtrace-discuss@opensolaris.org