Re: [dtrace-discuss] How to dig deeper

2008-12-08 Thread Hans-Peter
The buffer cache was already huge.
So I decided not to increase it.
There is a KEEP pool of 5G which is hardly used.
If needed I will sacrifice this cache and add it to the DEFAULT cache.

Until now it looks promising:
the average log file sync wait time has dropped from about 70 ms to 7 ms.
But we will see how things develop.

Thanks so far to everybody who helped.

Regards Hans-Peter
-- 
This message posted from opensolaris.org
___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org


Re: [dtrace-discuss] How to dig deeper

2008-12-05 Thread przemolicc
On Fri, Dec 05, 2008 at 05:40:19AM -0800, Hans-Peter wrote:
 Ok, thanks a lot so far.
 
 We have planned down time for remounting the filesystems this weekend.
 We will see what happens.

You can remount it online:

mount -o remount,forcedirectio /mount_point
mount -o remount,noforcedirectio /mount_point

Regards
Przemyslaw Bak (przemol)
--
http://przemol.blogspot.com/



Re: [dtrace-discuss] How to dig deeper

2008-12-04 Thread Jim Mauro

Do you have directio enabled on UFS?
Especially for the redo logs?

With directio enabled, UFS writes to the log do not
serialize on the RW lock for the log file(s).

directio will also bypass the memory cache, so you need
to increase the Oracle db_block_buffers when enabling
UFS directio.

Thanks,
/jim
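One quick way to check whether writes are actually taking the UFS directio path is to count the direct and buffered write entry points per process. This is a sketch only: the fbt probe names below (ufs_directio_write() for the direct path, wrip() for the buffered path) are assumptions based on common Solaris 10 UFS internals; confirm them on your build with `dtrace -l -m ufs`.

```d
#!/usr/sbin/dtrace -qs

/* Count UFS direct vs. buffered write entries, per process. */
fbt:ufs:ufs_directio_write:entry
{
        @counts[execname, "directio"] = count();
}

fbt:ufs:wrip:entry
{
        @counts[execname, "buffered"] = count();
}

tick-10sec
{
        printa("%-16s %-10s %@d\n", @counts);
        exit(0);
}
```

If the log writer only ever shows up under "buffered", the redo logs are not getting directio regardless of what filesystemio_options says.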


Hans-Peter wrote:
 Hi Mauro,

 Yes I understand why sysinfo is not the best to measure IO.
 But I just wanted to see when in the whole trace the actual physical write 
 was being done.
 So it seems to me that, because the sysinfo:::pwrite is right at the end, the
 performance bottleneck is more because of the locking etc.

 The database files are on ufs filesystems.
 Is zfs better?

 Regards Hans-Peter
   


Re: [dtrace-discuss] How to dig deeper

2008-12-04 Thread przemolicc
On Thu, Dec 04, 2008 at 04:55:04AM -0800, Hans-Peter wrote:
  
  Do you have directio enabled on UFS?
  Especially for the redo logs?
 
 That is the strange thing.
 filesystemio_options has been set to setall, which means async I/O plus directio.
 But when I dtrace the fbt calls I see only directio calls for the database writer,
 but not for the log writer.
 
 Perhaps we should mount the filesystems with forcedirectio.

forcedirectio forces direct I/O on _all_ files, while 'setall' lets the database
decide which files should be accessed with direct I/O. setall means the database
still has the option of buffered access for some files, e.g. log files (which is
better). forcedirectio removes that choice: the database is forced to use direct
access for all files.
So if you have Oracle, use setall. If you have a non-Oracle database without a
setall-like option, use forcedirectio.

 Would noatime also help?

Usually setall/forcedirectio gives you enough.
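The two approaches above can be sketched as follows; the paths and device names here are placeholders for illustration, not values from this thread:

```
# init.ora -- let Oracle choose direct vs. buffered I/O per file:
filesystemio_options = setall

# /etc/vfstab -- force direct I/O on every file in the filesystem:
/dev/dsk/c1t0d0s6  /dev/rdsk/c1t0d0s6  /u01  ufs  2  yes  forcedirectio
```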

Regards
Przemyslaw Bak (przemol)
--
http://przemol.blogspot.com/



Re: [dtrace-discuss] How to dig deeper

2008-12-04 Thread David Miller
Hi Hans-Peter,

There is an Oracle bug, not fixed until 10.2.0.4 (if I remember right),
where filesystemio_options is broken and so it won't do the correct thing
with directio and UFS.  The locks you are seeing are classic single-writer-lock
issues and indicate that you aren't getting directio.  The Oracle bug number
is Oracle Bug ID 5752399 (Oracle setall w/ UFS).

You might check to be sure you have the fix for that.  The only workaround is
to use forcedirectio in the mount options, which, as has been pointed out, has
other issues for some activities, so getting the bug fixed would be a better
idea.

Regards,

Dave Miller

Hans-Peter wrote, On 12/04/08 06:55:
 Do you have directio enabled on UFS?
 Especially for the redo logs?
 
 That is the strange thing.
 filesystemio_options has been set to setall, which means async I/O plus directio.
 But when I dtrace the fbt calls I see only directio calls for the database writer,
 but not for the log writer.
 
 Perhaps we should mount the filesystems with forcedirectio.
 Would noatime also help?
 
 Regards
 
 Regards
 
 With directio enabled, UFS writes to the log do not
 serialize on the RW lock for the log file(s).

 directio will also bypass the memory cache, so you
 need
 to increase the Oracle db_block_buffers when enabling
 UFS directio.

 Thanks,
 /jim


 Hans-Peter wrote:
 Hi Mauro,

 Yes I understand why sysinfo is not the best to measure IO.
 But I just wanted to see when in the whole trace the actual physical write was being done.
 So it seems to me that, because the sysinfo:::pwrite is right at the end, the performance bottleneck is more because of the locking etc.

 The database files are on ufs filesystems.
 Is zfs better?

 Regards Hans-Peter


Re: [dtrace-discuss] How to dig deeper

2008-12-04 Thread Jim Mauro
For the record, my friend Phil Harman reminded me that
it's not the log files we care about for directio in terms
of single-writer lock break-up. We care about directio for
redo logs to avoid read-modify-write, which happens when
the write is not memory-page aligned.

Sorry about that.

Nonetheless, you must still pursue getting UFS directio enabled
and working on your system (dealing with the single-writer
lock problem is still important, it's just not directly applicable
to the redo logs).

Thanks,
/jim


Hans-Peter wrote:
 Do you have directio enabled on UFS?
 Especially for the redo logs?
 

 That is the strange thing.
 filesystemio_options has been set to setall, which means async I/O plus directio.
 But when I dtrace the fbt calls I see only directio calls for the database writer,
 but not for the log writer.

 Perhaps we should mount the filesystems with forcedirectio.
 Would noatime also help?

 Regards

   
 With directio enabled, UFS writes to the log do not
 serialize on the RW lock for the log file(s).

 directio will also bypass the memory cache, so you
 need
 to increase the Oracle db_block_buffers when enabling
 UFS directio.

 Thanks,
 /jim


 Hans-Peter wrote:
 Hi Mauro,

 Yes I understand why sysinfo is not the best to measure IO.
 But I just wanted to see when in the whole trace the actual physical write was being done.
 So it seems to me that, because the sysinfo:::pwrite is right at the end, the performance bottleneck is more because of the locking etc.

 The database files are on ufs filesystems.
 Is zfs better?

 Regards Hans-Peter


Re: [dtrace-discuss] How to dig deeper

2008-12-03 Thread Jim Mauro
%busy is meaningless unless you're looking at a single disk that
can only have one outstanding IO in its active queue, which is to
say %busy is a useless metric for any disk that's been designed
and built in the last decade.

Ignore %busy. Focus on queue depths and queue service times,
both of which are reported in iostat.

And again, averages can be misleading. Use the iostat data
as a starting point, but if you really think you have some disk
IO issues, use DTrace and aggregations to get more precise
data on disk IO times.
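As a starting point for that, here is a sketch of such an aggregation using the io provider, in the style of the /usr/demo/dtrace scripts (keying on the buf pointer, arg0, and printing a per-device latency distribution; the 30-second run length is an arbitrary choice):

```d
#!/usr/sbin/dtrace -qs

/* Distribution of physical I/O times, per device. */
io:::start
{
        start[arg0] = timestamp;
}

io:::done
/start[arg0]/
{
        @iotime[args[1]->dev_statname] =
            quantize(timestamp - start[arg0]);
        start[arg0] = 0;
}

tick-30sec
{
        printa(@iotime);
        exit(0);
}
```

Unlike the iostat averages, the quantize() output will show the nasty peaks in the tail of the distribution.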

Thanks,
/jim


Hans-Peter wrote:
 Hello Jim,

 The iostat averages are not too bad.
 But %busy is 100% most of the time.

 regards Hans-Peter
   


Re: [dtrace-discuss] How to dig deeper

2008-12-03 Thread Zhu, Lejun
It seems VOP_RWLOCK takes a long time: ufs_rwlock starts at line 12 but does not
return until line 15611. Is there another application that keeps reading/writing
on your system?
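To put numbers on that, one could time how long threads sit inside the UFS rwlock path. A sketch (the fbt probe name ufs_rwlock is an assumption based on the trace output above; confirm with `dtrace -l -f ufs_rwlock`):

```d
#!/usr/sbin/dtrace -qs

/* Time spent inside ufs_rwlock() (the UFS VOP_RWLOCK), per process. */
fbt::ufs_rwlock:entry
{
        self->rw = timestamp;
}

fbt::ufs_rwlock:return
/self->rw/
{
        @lockwait[execname] = quantize(timestamp - self->rw);
        self->rw = 0;
}
```

If another process shows up with long hold times here, it is the writer the log writer is serializing behind.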

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Hans-Peter
Sent: 2008-12-03 21:41
To: dtrace-discuss@opensolaris.org
Subject: Re: [dtrace-discuss] How to dig deeper

Hello all,

I added a clause to my script:
sysinfo:::
/self->traceme == 1 && pid == $1/
{
        trace(execname);
        printf("sysinfo: timestamp : %d ", timestamp);
}
A subsequent trace created a file of about 19000 lines.
I loaded it into Excel to be able to subtract timestamps etc.
The longest jump in timestamp is between the first pair of savectx and
restorectx at line 70.
I count 50 savectx/restorectx calls in my trace.
But the actual physical write, as indicated by sysinfo, is almost at the end
of the file, directly after the ufs_write and fop_write calls end (line 18746).

24  <- tsd_agent_get              oracle  timestamp : 1795469946839100
24  <- ufs_lockfs_top_vop_return  oracle  timestamp : 1795469946841500
24  <- ufs_lockfs_top_vop_return  oracle  timestamp : 1795469946844300
24  <- ufs_lockfs_end             oracle  timestamp : 1795469946846700
24  <- ufs_write                  oracle  timestamp : 1795469946849600
24  <- fop_write                  oracle  timestamp : 1795469946853500   (diff: 57,365,700)
24   | pwrite  syswrite           oracle  sysinfo timestamp : 1795469946856800
24   | pwrite  writech            oracle  sysinfo timestamp : 1795469946860200

So it seems that the actual write is not the bottleneck but ...
I attached a zip file with the Excel document.

Am I right in thinking that this is more an OS issue than a storage issue?

Regards Hans-Peter


Re: [dtrace-discuss] How to dig deeper

2008-12-03 Thread Jim Mauro

The sysinfo provider isn't the best choice for measuring disk IO
times.

Run:
#dtrace -s /usr/demo/dtrace/iotime.d

/jim


Hans-Peter wrote:
 Hello all,

 I added a clause to my script:
 sysinfo:::
 /self->traceme == 1 && pid == $1/
 {
         trace(execname);
         printf("sysinfo: timestamp : %d ", timestamp);
 }
 A subsequent trace created a file of about 19000 lines.
 I loaded it into Excel to be able to subtract timestamps etc.
 The longest jump in timestamp is between the first pair of savectx and
 restorectx at line 70.
 I count 50 savectx/restorectx calls in my trace.
 But the actual physical write, as indicated by sysinfo, is almost at the end
 of the file, directly after the ufs_write and fop_write calls end (line 18746).

 24  <- tsd_agent_get              oracle  timestamp : 1795469946839100
 24  <- ufs_lockfs_top_vop_return  oracle  timestamp : 1795469946841500
 24  <- ufs_lockfs_top_vop_return  oracle  timestamp : 1795469946844300
 24  <- ufs_lockfs_end             oracle  timestamp : 1795469946846700
 24  <- ufs_write                  oracle  timestamp : 1795469946849600
 24  <- fop_write                  oracle  timestamp : 1795469946853500   (diff: 57,365,700)
 24   | pwrite  syswrite           oracle  sysinfo timestamp : 1795469946856800
 24   | pwrite  writech            oracle  sysinfo timestamp : 1795469946860200

 So it seems that the actual write is not the bottleneck but ...
 I attached a zip file with the Excel document.

 Am I right in thinking that this is more an OS issue than a storage issue?

 Regards Hans-Peter


Re: [dtrace-discuss] How to dig deeper

2008-12-03 Thread Jim Mauro
Also - since this is Oracle, are the Oracle files
on a file system, or raw devices?
If a file system, which one?

/jim


Jim Mauro wrote:
 The sysinfo provider isn't the best choice for measuring disk IO
 times.

 Run;
 #dtrace -s /usr/demo/dtrace/iotime.d

 /jim


 Hans-Peter wrote:
   
 Hello all,

 I added a clause to my script:
 sysinfo:::
 /self->traceme == 1 && pid == $1/
 {
         trace(execname);
         printf("sysinfo: timestamp : %d ", timestamp);
 }
 A subsequent trace created a file of about 19000 lines.
 I loaded it into Excel to be able to subtract timestamps etc.
 The longest jump in timestamp is between the first pair of savectx and
 restorectx at line 70.
 I count 50 savectx/restorectx calls in my trace.
 But the actual physical write, as indicated by sysinfo, is almost at the end
 of the file, directly after the ufs_write and fop_write calls end (line 18746).

 24  <- tsd_agent_get              oracle  timestamp : 1795469946839100
 24  <- ufs_lockfs_top_vop_return  oracle  timestamp : 1795469946841500
 24  <- ufs_lockfs_top_vop_return  oracle  timestamp : 1795469946844300
 24  <- ufs_lockfs_end             oracle  timestamp : 1795469946846700
 24  <- ufs_write                  oracle  timestamp : 1795469946849600
 24  <- fop_write                  oracle  timestamp : 1795469946853500   (diff: 57,365,700)
 24   | pwrite  syswrite           oracle  sysinfo timestamp : 1795469946856800
 24   | pwrite  writech            oracle  sysinfo timestamp : 1795469946860200

 So it seems that the actual write is not the bottleneck but ...
 I attached a zip file with the Excel document.

 Am I right in thinking that this is more an OS issue than a storage issue?

 Regards Hans-Peter
   


Re: [dtrace-discuss] How to dig deeper

2008-12-03 Thread Hans-Peter
Hi Lejun,

No there is only a 1 Oracle database running on this system.

regards Hans-Peter

 It seems VOP_RWLOCK takes a long time: ufs_rwlock starts at line 12 but does
 not return until line 15611. Is there another application that keeps
 reading/writing on your system?
 
 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Hans-Peter
 Sent: 2008-12-03 21:41
 To: dtrace-discuss@opensolaris.org
 Subject: Re: [dtrace-discuss] How to dig deeper
 
 Hello all,
 
 I added a clause to my script:
 sysinfo:::
 /self->traceme == 1 && pid == $1/
 {
         trace(execname);
         printf("sysinfo: timestamp : %d ", timestamp);
 }
 A subsequent trace created a file of about 19000 lines.
 I loaded it into Excel to be able to subtract timestamps etc.
 The longest jump in timestamp is between the first pair of savectx and
 restorectx at line 70.
 I count 50 savectx/restorectx calls in my trace.
 But the actual physical write, as indicated by sysinfo, is almost at the end
 of the file, directly after the ufs_write and fop_write calls end (line 18746).
 
 24  <- tsd_agent_get              oracle  timestamp : 1795469946839100
 24  <- ufs_lockfs_top_vop_return  oracle  timestamp : 1795469946841500
 24  <- ufs_lockfs_top_vop_return  oracle  timestamp : 1795469946844300
 24  <- ufs_lockfs_end             oracle  timestamp : 1795469946846700
 24  <- ufs_write                  oracle  timestamp : 1795469946849600
 24  <- fop_write                  oracle  timestamp : 1795469946853500   (diff: 57,365,700)
 24   | pwrite  syswrite           oracle  sysinfo timestamp : 1795469946856800
 24   | pwrite  writech            oracle  sysinfo timestamp : 1795469946860200
 
 So it seems that the actual write is not the bottleneck but ...
 I attached a zip file with the Excel document.
 
 Am I right in thinking that this is more an OS issue than a storage issue?
 
 Regards Hans-Peter

Re: [dtrace-discuss] How to dig deeper

2008-12-03 Thread Hans-Peter
Hi Mauro,

Yes, I understand why sysinfo is not the best way to measure IO.
But I just wanted to see when, in the whole trace, the actual physical write was
being done.
So it seems to me that, because the sysinfo:::pwrite is right at the end, the
performance bottleneck is more because of the locking etc.

The database files are on ufs filesystems.
Is zfs better?

Regards Hans-Peter


Re: [dtrace-discuss] How to dig deeper

2008-12-02 Thread Jim Mauro
Start with iostat. It's simple, and provides an average of service times
for disk IOs (iostat -xnz 1, the asvc_t column is average service times
in milliseconds). As Jim Litchfield pointed out in a previous thread,
keep in mind it is an average, so you won't see nasty peaks, but if the
average is bad, it's time to talk to the SAN/Storage folks.

Use DTrace with the IO provider to get more precise measurements.
You can borrow from existing scripts in /usr/demo/dtrace to add
time stamps, and use aggregations to capture IO times.

If you need assistance with such a script, please let us know and
we'll be glad to post one (it looks like you're already pretty comfortable
with DTrace...).

Thanks,
/jim
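For the off-CPU gaps like the one between savectx and restorectx, the sched provider can quantify how long the traced process spends off CPU. A sketch, taking the pid as $1 like your script does:

```d
#!/usr/sbin/dtrace -qs

/* Distribution of off-CPU intervals for one process ($1 = pid). */
sched:::off-cpu
/pid == $1/
{
        self->off = timestamp;
}

sched:::on-cpu
/self->off/
{
        @offcpu["off-cpu (ns)"] = quantize(timestamp - self->off);
        self->off = 0;
}
```

A long tail here, combined with clean io-provider latencies, points at scheduling/locking rather than the storage.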

Hans-Peter wrote:
 In order to get more information on IO performance problems I created the 
 script below:
 #!/usr/sbin/dtrace -s
 #pragma D option flowindent

 syscall::*write*:entry
 /pid == $1 && guard++ == 0/
 {
         self->ts = timestamp;
         self->traceme = 1;
         printf("fd: %d", arg0);
 }

 fbt:::
 /self->traceme/
 {
         /* this->elapsed = timestamp - self->ts;
            printf(" elapsed : %d ", this->elapsed); */
         printf(" timestamp : %d ", timestamp);
 }

 syscall::*write*:return
 /self->traceme/
 {
         self->traceme = 0;
         this->elapsed = timestamp - self->ts;
         printf(" timestamp : %d ", timestamp);
         printf("\telapsed : %d ", this->elapsed);
         exit(0);
 }

 It gives me the timestamp for every fbt call during a write system call.
 A snippet is here below:
   8  -> schedctl_save     timestamp : 1627201334052600
   8  -> savectx           timestamp : 1627201334053000
   0  <- restorectx        timestamp : 1627202741110300   difference = 1,407,057,300
   0  -> schedctl_restore  timestamp : 1627202741115100
   0  -> schedctl_restore  timestamp : 1627202741116100
 Visible is that the thread is off CPU for 1.4 s.
 Storage is on a SAN with fibre links between the system and the storage.
 Storage is on SAN with fibers between the system and the storage.

 Is it possible to dig deeper with dtrace to see how the HBAs are performing?
 Other suggestions are welcome too.

 Regards Hans-Peter
   