Re: [zfs-discuss] How to find poor performing disks

2009-09-04 Thread Roch

Scott Lawson writes:
 > You may also wish to look at the output of 'iostat -xnce 1'.
 > 
 > You can post those to the list if you have a specific problem.
 > 
 > You want to be looking for error counts increasing, and specifically 'asvc_t'
 > for the service times on the disks. A higher number for asvc_t may help to
 > isolate poorly performing individual disks.
 > 
 > 

I blast the pool with dd and look for drives that are
*always* active while others in the same group have
completed their transaction group and get no more activity.
Within a group, drives should be getting the same amount of
data per 5-second interval (zfs_txg_synctime), and the ones
that are always active are the ones slowing you down.
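
For reference, a rough sketch of that kind of test (the pool name, file
path and sizes are only placeholders; leave compression off or the
zero-filled writes may be collapsed):

  # stream writes into the pool in one window...
  dd if=/dev/zero of=/tank/scratch/ddfile bs=1M count=20000 &
  # ...and watch per-disk activity in another; disks that stay busy (%b)
  # after their vdev peers have gone idle are the ones to suspect
  iostat -xn 5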

If whole groups are unbalanced, that's a sign that they have
different amounts of free space, and the expectation is that
you will be gated by the speed of the group that needs to
catch up.
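
A quick way to check whether the top-level vdevs really do differ in free
space is the per-vdev capacity columns of zpool iostat (the pool name here
is just a placeholder):

  zpool iostat -v tank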

-r

 > 
 > Scott Meilicke wrote:
 > > You can try:
 > >
 > > zpool iostat -v pool_name 1
 > >
 > > This will show you IO on each vdev at one second intervals. Perhaps you 
 > > will see different IO behavior on any suspect drive.
 > >
 > > -Scott
 > >   
 > 
 > 



Re: [zfs-discuss] How to find poor performing disks

2009-08-26 Thread Simon Gao
Running "iostat -nxce 1", I saw write sizes alternate between two raidz groups 
in the same pool.  

At one point, drives on controller 1 have writes 3-10 times larger than the
ones on controller 2:

                    extended device statistics       ---- errors ---
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 fd0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   2   0   0   2 c1t1d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   1   0   0   1 c0t10d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   1   0   0   1 c0t11d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   2   0   0   2 c3t0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   2   0   0   2 c4t0d0
    0.0    9.0    0.0    4.0  0.0  0.0    0.0    0.5   0   0   1   0   0   1 c0t12d0
    0.0    9.0    0.0    4.0  0.0  0.0    0.0    0.1   0   0   1   0   0   1 c0t13d0
    0.0    9.0    0.0    4.5  0.0  0.0    0.0    0.1   0   0   1   0   0   1 c0t14d0
    0.0    8.0    0.0    4.5  0.0  0.0    0.0    0.2   0   0   1   0   0   1 c0t15d0
    0.0    9.0    0.0    3.5  0.0  0.0    0.0    0.1   0   0   1   0   0   1 c0t16d0
    0.0    9.0    0.0    3.5  0.0  0.0    0.0    0.1   0   0   1   0   0   1 c0t17d0
    0.0   20.0    0.0   56.5  0.0  0.0    0.0    0.2   0   0   1   0   0   1 c2t6d0
    0.0   20.0    0.0   55.0  0.0  0.0    0.0    0.3   0   0   1   0   0   1 c2t7d0
    0.0   20.0    0.0   53.5  0.0  0.0    0.0    0.2   0   0   1   0   0   1 c2t8d0
    0.0   20.0    0.0   53.0  0.0  0.0    0.0    0.3   0   0   1   0   0   1 c2t9d0
    0.0   20.0    0.0   55.5  0.0  0.0    0.0    0.2   0   0   1   0   0   1 c2t10d0
    0.0   20.0    0.0   55.0  0.0  0.0    0.0    0.3   0   0   1   0   0   1 c2t11d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   1   0   0   1 c2t12d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   1   0   0   1 c2t13d0
     cpu
 us sy wt id
  0 47  0 53

                    extended device statistics       ---- errors ---
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 fd0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   2   0   0   2 c1t1d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   1   0   0   1 c0t10d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   1   0   0   1 c0t11d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   2   0   0   2 c3t0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   2   0   0   2 c4t0d0
    0.0    8.0    0.0   18.5  0.0  0.0    0.0    0.2   0   0   1   0   0   1 c0t12d0
    0.0    8.0    0.0   18.5  0.0  0.0    0.0    0.3   0   0   1   0   0   1 c0t13d0
    0.0   11.0    0.0   20.5  0.0  0.0    0.0    0.3   0   0   1   0   0   1 c0t14d0
    0.0   12.0    0.0   20.5  0.0  0.0    0.0    0.3   0   0   1   0   0   1 c0t15d0
    0.0    8.0    0.0   19.0  0.0  0.0    0.0    0.2   0   0   1   0   0   1 c0t16d0
    0.0    8.0    0.0   18.5  0.0  0.0    0.0    0.2   0   0   1   0   0   1 c0t17d0
    0.0   21.0    0.0   66.0  0.0  0.0    0.0    0.4   0   1   1   0   0   1 c2t6d0
    0.0   21.0    0.0   66.0  0.0  0.0    0.0    0.3   0   0   1   0   0   1 c2t7d0
    0.0   21.0    0.0   65.5  0.0  0.0    0.0    0.3   0   0   1   0   0   1 c2t8d0
    0.0   20.0    0.0   64.0  0.0  0.0    0.0    0.4   0   0   1   0   0   1 c2t9d0
    0.0   21.0    0.0   65.0  0.0  0.0    0.0    0.4   0   0   1   0   0   1 c2t10d0
    0.0   21.0    0.0   64.0  0.0  0.0    0.0    0.3   0   0   1   0   0   1 c2t11d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   1   0   0   1 c2t12d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   1   0   0   1 c2t13d0
     cpu
 us sy wt id
  0 23  0 77




At another point, drives on controller 2 have writes 3-10 times larger than
the ones on controller 1:
                    extended device statistics       ---- errors ---
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 fd0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   2   0   0   2 c1t1d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0

Re: [zfs-discuss] How to find poor performing disks

2009-08-26 Thread Dave Koelmeyer
Maybe you can run a DTrace probe using Chime?

http://blogs.sun.com/observatory/entry/chime

Initial Traces -> Device IO
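
If a GUI is more than you need, the same sort of data can be pulled straight
from the DTrace io provider; for example, this one-liner (run as root) counts
physical I/Os per device, and swapping count() for sum(args[0]->b_bcount)
would total bytes instead:

  dtrace -n 'io:::start { @[args[1]->dev_statname] = count(); }'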


Re: [zfs-discuss] How to find poor performing disks

2009-08-26 Thread Scott Lawson

You may also wish to look at the output of 'iostat -xnce 1'.

You can post those to the list if you have a specific problem.

You want to be looking for error counts increasing, and specifically 'asvc_t'
for the service times on the disks. A higher number for asvc_t may help to
isolate poorly performing individual disks.
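
One rough way to pick the stragglers out of that output is to filter on the
asvc_t column; a sketch (the 30 ms threshold is arbitrary, and the field
positions assume the -xnce column order):

  iostat -xnce 1 | awk '$1 ~ /^[0-9]/ && $8 > 30 { print $NF, "asvc_t =", $8 }'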



Scott Meilicke wrote:

You can try:

zpool iostat -v pool_name 1

This will show you IO on each vdev at one second intervals. Perhaps you will 
see different IO behavior on any suspect drive.

-Scott
  





Re: [zfs-discuss] How to find poor performing disks

2009-08-26 Thread Scott Meilicke
You can try:

zpool iostat -v pool_name 1

This will show you IO on each vdev at one second intervals. Perhaps you will 
see different IO behavior on any suspect drive.
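
A concrete (hypothetical) invocation, with a one-second interval and a count
so it stops on its own, would look like:

  zpool iostat -v pool_name 1 10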

-Scott


[zfs-discuss] How to find poor performing disks

2009-08-26 Thread Simon Gao
Hi,

I'd appreciate it if anyone could point me to a way of identifying
poor-performing disks that might have dragged down the performance of the
pool. Also, the system logged the following error about one of the drives.
Does it show the disk was having a problem?

Aug 17 13:45:56 zfs1.domain.com scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,2...@6/pci1000,3...@0 (mpt1):
Aug 17 13:45:56 zfs1.domain.com  Disconnected command timeout for Target 10
Aug 17 13:45:56 zfs1.domain.com scsi: [ID 365881 kern.info] /p...@0,0/pci8086,2...@6/pci1000,3...@0 (mpt1):
Aug 17 13:45:56 zfs1.domain.com  Log info 3114 received for target 10.
Aug 17 13:45:56 zfs1.domain.com  scsi_status=0, ioc_status=8048, scsi_state=c
Aug 17 13:45:56 zfs1.domain.com scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,2...@6/pci1000,3...@0/s...@a,0 (sd15):
Aug 17 13:45:56 zfs1.domain.com  SCSI transport failed: reason 'reset': retrying command
Aug 17 13:45:59 zfs1.domain.com scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,2...@6/pci1000,3...@0/s...@a,0 (sd15):
Aug 17 13:45:59 zfs1.domain.com  Error for Command: read(10)    Error Level: Retryable
Aug 17 13:45:59 zfs1.domain.com scsi: [ID 107833 kern.notice]    Requested Block: 715872929    Error Block: 715872929
Aug 17 13:45:59 zfs1.domain.com scsi: [ID 107833 kern.notice]    Vendor: ATA    Serial Number: WD-WCAP
Aug 17 13:45:59 zfs1.domain.com scsi: [ID 107833 kern.notice]    Sense Key: Unit Attention
Aug 17 13:45:59 zfs1.domain.com scsi: [ID 107833 kern.notice]    ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
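
For what it's worth, the cumulative per-device error counters and the fault
manager are also worth checking alongside messages like this (these are just
the stock Solaris commands, nothing pool-specific):

  iostat -En       # soft/hard/transport error totals per device since boot
  fmdump -e        # error telemetry FMA has logged
  fmadm faulty     # anything FMA has actually diagnosed as faulty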