The iostat output I posted previously was from a system where we had already tuned the zfs:zfs_vdev_max_pending queue depth down to 10 (visible as the cap of roughly 10 in the actv column for each disk).
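For reference, the tunable is set through /etc/system (it takes effect at boot). The entry we use looks roughly like this, shown here with the value used for the test below:

* Cap the number of I/Os ZFS keeps queued to each vdev
* (the built-in default varies by release).
set zfs:zfs_vdev_max_pending = 7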
I set the value to 7 via that entry, rebooted, and started a scrub. iostat showed busier disks (%b is higher, which seemed odd) but a cap of about 7 queued commands per disk, confirming the tuning had taken effect. iostat at a high-water mark during the test looked like this:
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c8
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c8t0d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c8t1d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c8t2d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c8t3d0
8344.5 0.0 359640.4 0.0 0.1 300.5 0.0 36.0 0 4362 c9
190.0 0.0 6800.4 0.0 0.0 6.6 0.0 34.8 0 99 c9t8d0
185.0 0.0 6917.1 0.0 0.0 6.1 0.0 32.9 0 94 c9t9d0
187.0 0.0 6640.9 0.0 0.0 6.5 0.0 34.6 0 98 c9t10d0
186.5 0.0 6543.4 0.0 0.0 7.0 0.0 37.5 0 100 c9t11d0
180.5 0.0 7203.1 0.0 0.0 6.7 0.0 37.2 0 100 c9t12d0
195.5 0.0 7352.4 0.0 0.0 7.0 0.0 35.8 0 100 c9t13d0
188.0 0.0 6884.9 0.0 0.0 6.6 0.0 35.2 0 99 c9t14d0
204.0 0.0 6990.1 0.0 0.0 7.0 0.0 34.3 0 100 c9t15d0
199.0 0.0 7336.7 0.0 0.0 7.0 0.0 35.2 0 100 c9t16d0
180.5 0.0 6837.9 0.0 0.0 7.0 0.0 38.8 0 100 c9t17d0
198.0 0.0 7668.9 0.0 0.0 7.0 0.0 35.3 0 100 c9t18d0
203.0 0.0 7983.2 0.0 0.0 7.0 0.0 34.5 0 100 c9t19d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c9t20d0
195.5 0.0 7096.4 0.0 0.0 6.7 0.0 34.1 0 98 c9t21d0
189.5 0.0 7757.2 0.0 0.0 6.4 0.0 33.9 0 97 c9t22d0
195.5 0.0 7645.9 0.0 0.0 6.6 0.0 33.8 0 99 c9t23d0
194.5 0.0 7925.9 0.0 0.0 7.0 0.0 36.0 0 100 c9t24d0
188.5 0.0 6725.6 0.0 0.0 6.2 0.0 32.8 0 94 c9t25d0
188.5 0.0 7199.6 0.0 0.0 6.5 0.0 34.6 0 98 c9t26d0
196.0 0.0 6666.9 0.0 0.0 6.3 0.0 32.1 0 95 c9t27d0
193.5 0.0 7455.4 0.0 0.0 6.2 0.0 32.0 0 95 c9t28d0
189.0 0.0 7400.9 0.0 0.0 6.3 0.0 33.2 0 96 c9t29d0
182.5 0.0 9397.0 0.0 0.0 7.0 0.0 38.3 0 100 c9t30d0
192.5 0.0 9179.5 0.0 0.0 7.0 0.0 36.3 0 100 c9t31d0
189.5 0.0 9431.8 0.0 0.0 7.0 0.0 36.9 0 100 c9t32d0
187.5 0.0 9082.0 0.0 0.0 7.0 0.0 37.3 0 100 c9t33d0
188.5 0.0 9368.8 0.0 0.0 7.0 0.0 37.1 0 100 c9t34d0
180.5 0.0 9332.8 0.0 0.0 7.0 0.0 38.8 0 100 c9t35d0
183.0 0.0 9690.3 0.0 0.0 7.0 0.0 38.2 0 100 c9t36d0
186.0 0.0 9193.8 0.0 0.0 7.0 0.0 37.6 0 100 c9t37d0
180.5 0.0 8233.4 0.0 0.0 7.0 0.0 38.8 0 100 c9t38d0
175.5 0.0 9085.2 0.0 0.0 7.0 0.0 39.9 0 100 c9t39d0
177.0 0.0 9340.0 0.0 0.0 7.0 0.0 39.5 0 100 c9t40d0
175.5 0.0 8831.0 0.0 0.0 7.0 0.0 39.9 0 100 c9t41d0
190.5 0.0 9177.8 0.0 0.0 7.0 0.0 36.7 0 100 c9t42d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c9t43d0
196.0 0.0 9180.5 0.0 0.0 7.0 0.0 35.7 0 100 c9t44d0
193.5 0.0 9496.8 0.0 0.0 7.0 0.0 36.2 0 100 c9t45d0
187.0 0.0 8699.5 0.0 0.0 7.0 0.0 37.4 0 100 c9t46d0
198.5 0.0 9277.0 0.0 0.0 7.0 0.0 35.2 0 100 c9t47d0
185.5 0.0 9778.3 0.0 0.0 7.0 0.0 37.7 0 100 c9t48d0
192.0 0.0 8384.2 0.0 0.0 7.0 0.0 36.4 0 100 c9t49d0
198.5 0.0 8864.7 0.0 0.0 7.0 0.0 35.2 0 100 c9t50d0
192.0 0.0 9369.8 0.0 0.0 7.0 0.0 36.4 0 100 c9t51d0
182.5 0.0 8825.7 0.0 0.0 7.0 0.0 38.3 0 100 c9t52d0
202.0 0.0 7387.9 0.0 0.0 7.0 0.0 34.6 0 100 c9t55d0
...and sure enough, about 20 minutes into the scrub I get this (a bus reset?):
scsi: [ID 107833 kern.warning] WARNING:
/p...@0,0/pci8086,6...@4/pci1000,3...@0/s...@34,0 (sd49):
incomplete read- retrying
scsi: [ID 107833 kern.warning] WARNING:
/p...@0,0/pci8086,6...@4/pci1000,3...@0/s...@21,0 (sd30):
incomplete read- retrying
scsi: [ID 107833 kern.warning] WARNING:
/p...@0,0/pci8086,6...@4/pci1000,3...@0/s...@1e,0 (sd27):
incomplete read- retrying
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
Rev. 8 LSI, Inc. 1068E found.
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
mpt0 supports power management.
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
mpt0: IOC Operational.
During the "bus reset", iostat output looked like this:
extended device statistics ---- errors ---
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b s/w h/w trn tot device
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c8
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c8t0d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c8t1d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c8t2d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c8t3d0
0.0 0.0 0.0 0.0 0.0 88.0 0.0 0.0 0 2200 0 3 0 3 c9
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t8d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t9d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t10d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t11d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t12d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t13d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t14d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t15d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t16d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t17d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t18d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t19d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t20d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t21d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t22d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t23d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t24d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t25d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t26d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t27d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t28d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t29d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 1 0 1 c9t30d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 0 0 0 c9t31d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 0 0 0 c9t32d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 1 0 1 c9t33d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 0 0 0 c9t34d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 0 0 0 c9t35d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 0 0 0 c9t36d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 0 0 0 c9t37d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 0 0 0 c9t38d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 0 0 0 c9t39d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 0 0 0 c9t40d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 0 0 0 c9t41d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 0 0 0 c9t42d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t43d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 0 0 0 c9t44d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 0 0 0 c9t45d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 0 0 0 c9t46d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 0 0 0 c9t47d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 0 0 0 c9t48d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 0 0 0 c9t49d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 0 0 0 c9t50d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 0 0 0 c9t51d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 1 0 1 c9t52d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t55d0
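As a side note, when one of these resets happens I also page through the FMA error-report log to see what the transport layer recorded around that time. Something along these lines (the exact ereport classes vary by build, so I don't grep for anything specific):

fmdump -eV | less   # dump the error-report (ereport) log in full detail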
During our previous testing we had even tried setting this max_pending value down to 1, but we still hit the problem (although it took a little longer to appear), and I couldn't find anything else I could set to throttle I/O to the disks, hence the frustration.
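(For what it's worth, rebooting for each experiment isn't strictly necessary: the live value can be read and changed with mdb. A rough sketch, with the usual caution about writing kernel variables with -kw:

echo zfs_vdev_max_pending/D | mdb -k      # read the current value (decimal)
echo zfs_vdev_max_pending/W0t1 | mdb -kw  # set it to 1 on the running kernel (0t = decimal)

For this particular test, though, we went through /etc/system and a reboot as described above.)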
If you hadn't seen this output, would you say that 7 is a "reasonable" value for the max_pending queue on our architecture, one that should give the LSI controller enough breathing room to operate? If so, I *should* be able to scrub the disks successfully (i.e. ZFS isn't to blame), and I have to point the finger at the mpt driver, LSI firmware, or disk firmware instead.