Re: [dtrace-discuss] How to dig deeper

2008-12-03 Thread Jim Mauro
%busy is meaningless unless you're looking at a single disk that
can only have 1 outstanding IO in it's active queue, which is to
say %busy is a useless metric for anything disk that's been designed
and built in the last decade.

Ignore %busy. Focus on queue depths and queue service times,
both of which are reported in iotstat.

And again, averages can be misleading. Use the iostat data
as a starting point, but if you really think you have some disk
IO issues, use DTrace and aggregations to get more precise
data on disk IO times.

Thanks,
/jim


Hans-Peter wrote:
 Hello Jim,

 The iostat average are not too bad.
 But the busy% is 100% most of the time.

 regards Hans-Peter
   
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] Tracing application-specific data structures

2008-12-03 Thread Matthew Barker
Adam, thanks for your reply.

I'm trying to do a root-cause analysis on a troublesome customer bug and 
single-stepping with a debugger is making the context difficult to track. 

I've been searching the forums before and after the posting.  The problem has 
to do with what terms to search for.   I did find a thread in volume 42 which 
covered why conditionals weren't included in the language, but alternative 
patterns for traversing data structures seem to be lacking.  Again, it probably 
comes down to knowing well enough the taxonomy to employ in searching in the 
DTrace forum.

Questions: 1) By tick probe are you meaning a probe that fires on a timer 
tick, used to trigger the next iteration through the data structure?  If so, 
I'd thought the data structure would be morphing underneath the traversal.  If 
not, can you point me to an example?

2) Before writing to the alias, I was trying to get my head around something 
like the first approach you mention, of unrolling the loop and determining a 
stopping point.  Having dealt with Prolog in the distant past, this at least 
occurred to me, but I'm missing how one can express this with DTrace.

Any pointers are appreciated.

Cheers,
Matthew
-- 
This message posted from opensolaris.org
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] How to dig deeper

2008-12-03 Thread Zhu, Lejun
It seems VOP_RWLOCK takes a long time. ufs_rwlock starts at line 12, but ends 
at line 15611. Is there another application keeps reading/writing in your 
system?

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Hans-Peter
Sent: 2008年12月3日 21:41
To: dtrace-discuss@opensolaris.org
Subject: Re: [dtrace-discuss] How to dig deeper

Hello all,

I added a clause to my script.
sysinfo:::
/self-traceme==1  pid == $1/
{
trace(execname);
printf(sysinfo: timestamp : %d , timestamp);
}
A subsequent trace created a file of about 19000 lines.
I loaded in Excel to be able to subtrace timestamps etc.
The longest jump in timestamp is between the first pair of savectx and 
restorectx at line 70.
I count 50 savectx/restorectx calls in my trace.
But the actual physical write as indicated by the sysinfo is almost at the end 
of the file directly after the ufs_write and fop_write call end (line 18746).

24  -  tsd_agent_get   oracle  timestamp   
1795469946839100
24  -  ufs_lockfs_top_vop_return   oracle  timestamp   
1795469946841500
24  -  ufs_lockfs_top_vop_return   oracle  timestamp   
1795469946844300
24  -  ufs_lockfs_end  oracle  timestamp   
1795469946846700
24  -  ufs_write   oracle  timestamp   
1795469946849600
24  -  fop_write   oracle  timestamp   
179546994685350057,365,700
24  |   pwrite  syswriteoracle  sysinfo timestamp   
1795469946856800
24  |   pwrite  writech oracle  sysinfo timestamp   
1795469946860200

So it seems that the actual write not the bottle neck but 
I attached a zip file with the excel document.

Am I right in thinking that is is more an OS issue than a storage issue?

Regards Hans-Peter
-- 
This message posted from opensolaris.org
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] How to dig deeper

2008-12-03 Thread Jim Mauro

The sysinfo provider isn't the best choice for measuring disk IO
times.

Run;
#dtrace -s /usr/demo/dtrace/iotime.d

/jim


Hans-Peter wrote:
 Hello all,

 I added a clause to my script.
 sysinfo:::
 /self-traceme==1  pid == $1/
 {
 trace(execname);
 printf(sysinfo: timestamp : %d , timestamp);
 }
 A subsequent trace created a file of about 19000 lines.
 I loaded in Excel to be able to subtrace timestamps etc.
 The longest jump in timestamp is between the first pair of savectx and 
 restorectx at line 70.
 I count 50 savectx/restorectx calls in my trace.
 But the actual physical write as indicated by the sysinfo is almost at the 
 end of the file directly after the ufs_write and fop_write call end (line 
 18746).

   24  -  tsd_agent_get   oracle  timestamp   
 1795469946839100
   24  -  ufs_lockfs_top_vop_return   oracle  timestamp   
 1795469946841500
   24  -  ufs_lockfs_top_vop_return   oracle  timestamp   
 1795469946844300
   24  -  ufs_lockfs_end  oracle  timestamp   
 1795469946846700
   24  -  ufs_write   oracle  timestamp   
 1795469946849600
   24  -  fop_write   oracle  timestamp   
 179546994685350057,365,700
   24  |   pwrite  syswriteoracle  sysinfo timestamp   
 1795469946856800
   24  |   pwrite  writech oracle  sysinfo timestamp   
 1795469946860200

 So it seems that the actual write not the bottle neck but 
 I attached a zip file with the excel document.

 Am I right in thinking that is is more an OS issue than a storage issue?

 Regards Hans-Peter
   
 

 ___
 dtrace-discuss mailing list
 dtrace-discuss@opensolaris.org
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] How to dig deeper

2008-12-03 Thread Jim Mauro
Also - since this is Oracle, are the Oracle files
on a file system, or raw devices?
If a file system, which one?

/jim


Jim Mauro wrote:
 The sysinfo provider isn't the best choice for measuring disk IO
 times.

 Run;
 #dtrace -s /usr/demo/dtrace/iotime.d

 /jim


 Hans-Peter wrote:
   
 Hello all,

 I added a clause to my script.
 sysinfo:::
 /self-traceme==1  pid == $1/
 {
 trace(execname);
 printf(sysinfo: timestamp : %d , timestamp);
 }
 A subsequent trace created a file of about 19000 lines.
 I loaded in Excel to be able to subtrace timestamps etc.
 The longest jump in timestamp is between the first pair of savectx and 
 restorectx at line 70.
 I count 50 savectx/restorectx calls in my trace.
 But the actual physical write as indicated by the sysinfo is almost at the 
 end of the file directly after the ufs_write and fop_write call end (line 
 18746).

  24  -  tsd_agent_get   oracle  timestamp   
 1795469946839100
  24  -  ufs_lockfs_top_vop_return   oracle  timestamp   
 1795469946841500
  24  -  ufs_lockfs_top_vop_return   oracle  timestamp   
 1795469946844300
  24  -  ufs_lockfs_end  oracle  timestamp   
 1795469946846700
  24  -  ufs_write   oracle  timestamp   
 1795469946849600
  24  -  fop_write   oracle  timestamp   
 179546994685350057,365,700
  24  |   pwrite  syswriteoracle  sysinfo timestamp   
 1795469946856800
  24  |   pwrite  writech oracle  sysinfo timestamp   
 1795469946860200

 So it seems that the actual write not the bottle neck but 
 I attached a zip file with the excel document.

 Am I right in thinking that is is more an OS issue than a storage issue?

 Regards Hans-Peter
   
 

 ___
 dtrace-discuss mailing list
 dtrace-discuss@opensolaris.org
 
 ___
 dtrace-discuss mailing list
 dtrace-discuss@opensolaris.org
   
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


[dtrace-discuss] Round four: Re: code review req: 6750659 drti.o crashes app due to corrupt environment

2008-12-03 Thread Mike Gerdts
I believe that I have incorporated all of the feedback given
(thanks!).  Changes since the 2008-11-16 version include:

- ksh style  coding standards compliance in test script (Roland)
- dof_init_debug == B_FALSE vs. !dof_init_debug (Adam)

The updated webrev is at:

http://cr.opensolaris.org/~mgerdts/6750659-2008-12-03/

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] Tracing application-specific data structures

2008-12-03 Thread Adam Leventhal
 2) Before writing to the alias, I was trying to get my head around
 something like the first approach you mention, of unrolling the loop and
 determining a stopping point.  Having dealt with Prolog in the distant
 past, this at least occurred to me, but I'm missing how one can express
 this with DTrace.

For example, you could do something like this:

struct foo {
int data;
struct foo *next;
};

probe
{
this-lst = arg1;
}

probe
/this-list != NULL/
{
trace(this-lst-data);
this-lst = this-lst-next;
}

/* ... */

probe
/this-list != NULL/
{
trace(this-lst-data);
this-lst = this-lst-next;
}

I blogged about something like this awhile ago:

  http://blogs.sun.com/ahl/entry/pid2proc_for_dtrace

 Questions: 1) By tick probe are you meaning a probe that fires on a timer
 tick, used to trigger the next iteration through the data structure?  If
 so, I'd thought the data structure would be morphing underneath the
 traversal.  If not, can you point me to an example?

You could do something like this:

probe
{
lst = arg1;
}

tick-10ms
/lst != NULL/
{
trace(lst-data);
lst = lst-next;
}

But remember, that the use of global variables means that you can only have
one of these at a time.

Adam

-- 
Adam Leventhal, Fishworks http://blogs.sun.com/ahl
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] Tracing application-specific data structures

2008-12-03 Thread Matthew Barker
Thanks, Adam!
This is what  I was starting to converge on, but was fumbling with  
ordering of clauses.
Your examples make things a lot clearer.
Cheers,
Matthew

On Dec 3, 2008, at 6:15 PM, Adam Leventhal wrote:

 2) Before writing to the alias, I was trying to get my head around
 something like the first approach you mention, of unrolling the  
 loop and
 determining a stopping point.  Having dealt with Prolog in the  
 distant
 past, this at least occurred to me, but I'm missing how one can  
 express
 this with DTrace.

 For example, you could do something like this:

 struct foo {
   int data;
   struct foo *next;
 };

 probe
 {
   this-lst = arg1;
 }

 probe
 /this-list != NULL/
 {
   trace(this-lst-data);
   this-lst = this-lst-next;
 }

 /* ... */

 probe
 /this-list != NULL/
 {
   trace(this-lst-data);
   this-lst = this-lst-next;
 }

 I blogged about something like this awhile ago:

  http://blogs.sun.com/ahl/entry/pid2proc_for_dtrace

 Questions: 1) By tick probe are you meaning a probe that fires on  
 a timer
 tick, used to trigger the next iteration through the data  
 structure?  If
 so, I'd thought the data structure would be morphing underneath the
 traversal.  If not, can you point me to an example?

 You could do something like this:

 probe
 {
   lst = arg1;
 }

 tick-10ms
 /lst != NULL/
 {
   trace(lst-data);
   lst = lst-next;
 }

 But remember, that the use of global variables means that you can  
 only have
 one of these at a time.

 Adam

 -- 
 Adam Leventhal, Fishworks http://blogs.sun.com/ahl

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] How to dig deeper

2008-12-03 Thread Hans-Peter
Hi Lejun,

No there is only a 1 Oracle database running on this system.

regards Hans-Peter

 It seems VOP_RWLOCK takes a long time. ufs_rwlock
 starts at line 12, but ends at line 15611. Is there
 another application keeps reading/writing in your
 system?
 
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On
 Behalf Of Hans-Peter
 Sent: 2008年12月3日 21:41
 To: dtrace-discuss@opensolaris.org
 Subject: Re: [dtrace-discuss] How to dig deeper
 
 Hello all,
 
 I added a clause to my script.
 sysinfo:::
 /self-traceme==1  pid == $1/
 {
 trace(execname);
 printf(sysinfo: timestamp : %d , timestamp);
 }
 A subsequent trace created a file of about 19000
 lines.
 I loaded in Excel to be able to subtrace timestamps
 etc.
 The longest jump in timestamp is between the first
 pair of savectx and restorectx at line 70.
 I count 50 savectx/restorectx calls in my trace.
 But the actual physical write as indicated by the
 sysinfo is almost at the end of the file directly
 after the ufs_write and fop_write call end (line
 18746).
 
   24  -  tsd_agent_get   oracle  timestamp   1795469946839100
   
   24  -  ufs_lockfs_top_vop_return   oracle  timestamp   
 1795
 469946841500  
   24  -  ufs_lockfs_top_vop_return   oracle  timestamp   
 1795
 469946844300  
   24  -  ufs_lockfs_end  oracle  timestamp   179546994684670
 0 
   24  -  ufs_write   oracle  timestamp   
 1795469946849600
   24  -  fop_write   oracle  timestamp   
 179546994685350057
 ,365,700
   24  |   pwrite  syswriteoracle  sysinfo timestamp   
 1795469
 946856800
   24  |   pwrite  writech oracle  sysinfo timestamp   17954699
 46860200
 
 So it seems that the actual write not the bottle neck
 but 
 I attached a zip file with the excel document.
 
 Am I right in thinking that is is more an OS issue
 than a storage issue?
 
 Regards Hans-Peter
 -- 
 This message posted from opensolaris.org
 ___
 dtrace-discuss mailing list
 dtrace-discuss@opensolaris.org
-- 
This message posted from opensolaris.org
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org

Re: [dtrace-discuss] How to dig deeper

2008-12-03 Thread Hans-Peter
Hi Mauro,

Yes I understand why sysinfo is not the best to measure IO.
But I just wanted to see when in the whole trace the actual physical write was 
being done.
So it seems to me that, because the sysinfo:::pwrite is right at the end the 
performance bottle neck is more because of the locking etc.

The database files are on ufs filesystems.
Is zfs better?

Regards Hans-Peter
-- 
This message posted from opensolaris.org
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org