more below...

On Oct 24, 2009, at 2:49 AM, Adam Cheal wrote:

The iostat output I posted previously was from a system on which we had already tuned zfs:zfs_vdev_max_pending down to 10 (visible as the cap of roughly 10 in actv per disk).
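
For reference, the only tunable involved on our side is this per-vdev queue depth cap; the /etc/system entry is of the form below, shown here with the value 7 used in the test described next:

   * cap on the number of I/Os ZFS keeps outstanding per vdev (read at boot)
   set zfs:zfs_vdev_max_pending = 7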

I reset this value to 7 in /etc/system, rebooted, and started a scrub. iostat then showed busier disks (%b was higher, which seemed odd) but a cap of about 7 queued I/Os per disk, confirming the tuning had taken effect. iostat at a high-water mark during the test looked like this:

                   extended device statistics
   r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c8
   0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
   0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c8t1d0
   0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c8t2d0
   0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c8t3d0
8344.5    0.0 359640.4    0.0  0.1 300.5    0.0   36.0   0 4362 c9
 190.0    0.0 6800.4    0.0  0.0  6.6    0.0   34.8   0  99 c9t8d0
 185.0    0.0 6917.1    0.0  0.0  6.1    0.0   32.9   0  94 c9t9d0
 187.0    0.0 6640.9    0.0  0.0  6.5    0.0   34.6   0  98 c9t10d0
 186.5    0.0 6543.4    0.0  0.0  7.0    0.0   37.5   0 100 c9t11d0
 180.5    0.0 7203.1    0.0  0.0  6.7    0.0   37.2   0 100 c9t12d0
 195.5    0.0 7352.4    0.0  0.0  7.0    0.0   35.8   0 100 c9t13d0
 188.0    0.0 6884.9    0.0  0.0  6.6    0.0   35.2   0  99 c9t14d0
 204.0    0.0 6990.1    0.0  0.0  7.0    0.0   34.3   0 100 c9t15d0
 199.0    0.0 7336.7    0.0  0.0  7.0    0.0   35.2   0 100 c9t16d0
 180.5    0.0 6837.9    0.0  0.0  7.0    0.0   38.8   0 100 c9t17d0
 198.0    0.0 7668.9    0.0  0.0  7.0    0.0   35.3   0 100 c9t18d0
 203.0    0.0 7983.2    0.0  0.0  7.0    0.0   34.5   0 100 c9t19d0
   0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c9t20d0
 195.5    0.0 7096.4    0.0  0.0  6.7    0.0   34.1   0  98 c9t21d0
 189.5    0.0 7757.2    0.0  0.0  6.4    0.0   33.9   0  97 c9t22d0
 195.5    0.0 7645.9    0.0  0.0  6.6    0.0   33.8   0  99 c9t23d0
 194.5    0.0 7925.9    0.0  0.0  7.0    0.0   36.0   0 100 c9t24d0
 188.5    0.0 6725.6    0.0  0.0  6.2    0.0   32.8   0  94 c9t25d0
 188.5    0.0 7199.6    0.0  0.0  6.5    0.0   34.6   0  98 c9t26d0
 196.0    0.0 6666.9    0.0  0.0  6.3    0.0   32.1   0  95 c9t27d0
 193.5    0.0 7455.4    0.0  0.0  6.2    0.0   32.0   0  95 c9t28d0
 189.0    0.0 7400.9    0.0  0.0  6.3    0.0   33.2   0  96 c9t29d0
 182.5    0.0 9397.0    0.0  0.0  7.0    0.0   38.3   0 100 c9t30d0
 192.5    0.0 9179.5    0.0  0.0  7.0    0.0   36.3   0 100 c9t31d0
 189.5    0.0 9431.8    0.0  0.0  7.0    0.0   36.9   0 100 c9t32d0
 187.5    0.0 9082.0    0.0  0.0  7.0    0.0   37.3   0 100 c9t33d0
 188.5    0.0 9368.8    0.0  0.0  7.0    0.0   37.1   0 100 c9t34d0
 180.5    0.0 9332.8    0.0  0.0  7.0    0.0   38.8   0 100 c9t35d0
 183.0    0.0 9690.3    0.0  0.0  7.0    0.0   38.2   0 100 c9t36d0
 186.0    0.0 9193.8    0.0  0.0  7.0    0.0   37.6   0 100 c9t37d0
 180.5    0.0 8233.4    0.0  0.0  7.0    0.0   38.8   0 100 c9t38d0
 175.5    0.0 9085.2    0.0  0.0  7.0    0.0   39.9   0 100 c9t39d0
 177.0    0.0 9340.0    0.0  0.0  7.0    0.0   39.5   0 100 c9t40d0
 175.5    0.0 8831.0    0.0  0.0  7.0    0.0   39.9   0 100 c9t41d0
 190.5    0.0 9177.8    0.0  0.0  7.0    0.0   36.7   0 100 c9t42d0
   0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c9t43d0
 196.0    0.0 9180.5    0.0  0.0  7.0    0.0   35.7   0 100 c9t44d0
 193.5    0.0 9496.8    0.0  0.0  7.0    0.0   36.2   0 100 c9t45d0
 187.0    0.0 8699.5    0.0  0.0  7.0    0.0   37.4   0 100 c9t46d0
 198.5    0.0 9277.0    0.0  0.0  7.0    0.0   35.2   0 100 c9t47d0
 185.5    0.0 9778.3    0.0  0.0  7.0    0.0   37.7   0 100 c9t48d0
 192.0    0.0 8384.2    0.0  0.0  7.0    0.0   36.4   0 100 c9t49d0
 198.5    0.0 8864.7    0.0  0.0  7.0    0.0   35.2   0 100 c9t50d0
 192.0    0.0 9369.8    0.0  0.0  7.0    0.0   36.4   0 100 c9t51d0
 182.5    0.0 8825.7    0.0  0.0  7.0    0.0   38.3   0 100 c9t52d0
 202.0    0.0 7387.9    0.0  0.0  7.0    0.0   34.6   0 100 c9t55d0

...and sure enough, about 20 minutes into the scrub, I got this (bus reset?):

scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,6...@4/pci1000,3...@0/s...@34,0 (sd49):
      incomplete read- retrying
scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,6...@4/pci1000,3...@0/s...@21,0 (sd30):
      incomplete read- retrying
scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,6...@4/pci1000,3...@0/s...@1e,0 (sd27):
      incomplete read- retrying
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
      Rev. 8 LSI, Inc. 1068E found.
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
      mpt0 supports power management.
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
      mpt0: IOC Operational.

During the "bus reset", iostat output looked like this:

                            extended device statistics       ---- errors ---
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w   %b s/w h/w trn tot device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0    0   0   0   0   0 c8
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0    0   0   0   0   0 c8t0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0    0   0   0   0   0 c8t1d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0    0   0   0   0   0 c8t2d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0    0   0   0   0   0 c8t3d0
    0.0    0.0    0.0    0.0  0.0 88.0    0.0    0.0   0 2200   0   3   0   3 c9
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0    0   0   0   0   0 c9t8d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0    0   0   0   0   0 c9t9d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0    0   0   0   0   0 c9t10d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0    0   0   0   0   0 c9t11d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0    0   0   0   0   0 c9t12d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0    0   0   0   0   0 c9t13d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0    0   0   0   0   0 c9t14d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0    0   0   0   0   0 c9t15d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0    0   0   0   0   0 c9t16d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0    0   0   0   0   0 c9t17d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0    0   0   0   0   0 c9t18d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0    0   0   0   0   0 c9t19d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0    0   0   0   0   0 c9t20d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0    0   0   0   0   0 c9t21d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0    0   0   0   0   0 c9t22d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0    0   0   0   0   0 c9t23d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0    0   0   0   0   0 c9t24d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0    0   0   0   0   0 c9t25d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0    0   0   0   0   0 c9t26d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0    0   0   0   0   0 c9t27d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0    0   0   0   0   0 c9t28d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0    0   0   0   0   0 c9t29d0
    0.0    0.0    0.0    0.0  0.0  4.0    0.0    0.0   0  100   0   1   0   1 c9t30d0

OK, here we see 4 I/Os pending outside of the host.  The host has
sent them on and is waiting for them to return. This means they are
getting dropped either at the disk or somewhere between the disk
and the controller.

When this happens, the sd driver will time them out, try to clear
the fault by reset, and retry. In other words, the resets you see
are when the system tries to recover.
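
For reference, the timeout involved is the sd command timeout, which
defaults to 60 seconds; if you want to confirm or change it, the usual
/etc/system syntax is (shown with the default, purely as an illustration):

   * sd command timeout in seconds (default is 0x3c = 60)
   set sd:sd_io_time = 0x3c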

Since there are many disks with 4 stuck I/Os, I would lean towards
a common cause. What do these disks have in common?  Firmware?
Do they share a SAS expander?
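A quick way to start answering those questions from the host side is to
dump each disk's firmware revision and error counters, and to look at the
FMA ereports generated around the reset; for example:

   iostat -En     (vendor/product/revision plus soft, hard, and transport error counts per device)
   fmdump -eV     (verbose dump of the error reports logged when the retries/resets occur)
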
 -- richard

    0.0    0.0    0.0    0.0  0.0  4.0    0.0    0.0   0  100   0   0   0   0 c9t31d0
    0.0    0.0    0.0    0.0  0.0  4.0    0.0    0.0   0  100   0   0   0   0 c9t32d0
    0.0    0.0    0.0    0.0  0.0  4.0    0.0    0.0   0  100   0   1   0   1 c9t33d0
    0.0    0.0    0.0    0.0  0.0  4.0    0.0    0.0   0  100   0   0   0   0 c9t34d0
    0.0    0.0    0.0    0.0  0.0  4.0    0.0    0.0   0  100   0   0   0   0 c9t35d0
    0.0    0.0    0.0    0.0  0.0  4.0    0.0    0.0   0  100   0   0   0   0 c9t36d0
    0.0    0.0    0.0    0.0  0.0  4.0    0.0    0.0   0  100   0   0   0   0 c9t37d0
    0.0    0.0    0.0    0.0  0.0  4.0    0.0    0.0   0  100   0   0   0   0 c9t38d0
    0.0    0.0    0.0    0.0  0.0  4.0    0.0    0.0   0  100   0   0   0   0 c9t39d0
    0.0    0.0    0.0    0.0  0.0  4.0    0.0    0.0   0  100   0   0   0   0 c9t40d0
    0.0    0.0    0.0    0.0  0.0  4.0    0.0    0.0   0  100   0   0   0   0 c9t41d0
    0.0    0.0    0.0    0.0  0.0  4.0    0.0    0.0   0  100   0   0   0   0 c9t42d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0    0   0   0   0   0 c9t43d0
    0.0    0.0    0.0    0.0  0.0  4.0    0.0    0.0   0  100   0   0   0   0 c9t44d0
    0.0    0.0    0.0    0.0  0.0  4.0    0.0    0.0   0  100   0   0   0   0 c9t45d0
    0.0    0.0    0.0    0.0  0.0  4.0    0.0    0.0   0  100   0   0   0   0 c9t46d0
    0.0    0.0    0.0    0.0  0.0  4.0    0.0    0.0   0  100   0   0   0   0 c9t47d0
    0.0    0.0    0.0    0.0  0.0  4.0    0.0    0.0   0  100   0   0   0   0 c9t48d0
    0.0    0.0    0.0    0.0  0.0  4.0    0.0    0.0   0  100   0   0   0   0 c9t49d0
    0.0    0.0    0.0    0.0  0.0  4.0    0.0    0.0   0  100   0   0   0   0 c9t50d0
    0.0    0.0    0.0    0.0  0.0  4.0    0.0    0.0   0  100   0   0   0   0 c9t51d0
    0.0    0.0    0.0    0.0  0.0  4.0    0.0    0.0   0  100   0   1   0   1 c9t52d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0    0   0   0   0   0 c9t55d0

During our previous testing we had even tried setting this max_pending value down to 1, but we still hit the problem (although it took a little longer to appear), and I couldn't find anything else I could set to throttle I/O to the disks, hence the frustration.
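
(Side note for anyone repeating the experiment: the same knob can also be changed on a live system with mdb instead of editing /etc/system and rebooting each time, which makes it much quicker to sweep through values. Using 1 as an example:

   echo zfs_vdev_max_pending/W0t1 | mdb -kw

The /etc/system entry is still needed if you want the setting to survive a reboot.)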

If you hadn't seen this output, would you say that 7 is a "reasonable" value for the max_pending queue on our architecture, one that should give the LSI controller enough breathing room to operate in this situation? If so, I *should* be able to scrub the disks successfully (i.e. ZFS isn't to blame), and the finger points instead at the mpt driver, the LSI firmware, or the disk firmware.
--
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
