The iostat output I posted previously was from a system where we had already tuned the zfs:zfs_vdev_max_pending queue depth down to 10 (visible as the cap of roughly 10 in the actv column for each disk).
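For reference, the tunable is set through /etc/system (it takes effect at boot). The entry we use looks roughly like this, shown here with the value used for the test below:

* Cap the number of I/Os ZFS keeps queued to each vdev
* (the built-in default varies by release).
set zfs:zfs_vdev_max_pending = 7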
I set the value to 7 via that entry, rebooted, and started a scrub. iostat showed busier disks (%b is higher, which seemed odd) but a cap of about 7 queued commands per disk, confirming the tuning had taken effect. iostat at a high-water mark during the test looked like this:
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c8
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c8t0d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c8t1d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c8t2d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c8t3d0
8344.5 0.0 359640.4 0.0 0.1 300.5 0.0 36.0 0 4362 c9
190.0 0.0 6800.4 0.0 0.0 6.6 0.0 34.8 0 99 c9t8d0
185.0 0.0 6917.1 0.0 0.0 6.1 0.0 32.9 0 94 c9t9d0
187.0 0.0 6640.9 0.0 0.0 6.5 0.0 34.6 0 98 c9t10d0
186.5 0.0 6543.4 0.0 0.0 7.0 0.0 37.5 0 100 c9t11d0
180.5 0.0 7203.1 0.0 0.0 6.7 0.0 37.2 0 100 c9t12d0
195.5 0.0 7352.4 0.0 0.0 7.0 0.0 35.8 0 100 c9t13d0
188.0 0.0 6884.9 0.0 0.0 6.6 0.0 35.2 0 99 c9t14d0
204.0 0.0 6990.1 0.0 0.0 7.0 0.0 34.3 0 100 c9t15d0
199.0 0.0 7336.7 0.0 0.0 7.0 0.0 35.2 0 100 c9t16d0
180.5 0.0 6837.9 0.0 0.0 7.0 0.0 38.8 0 100 c9t17d0
198.0 0.0 7668.9 0.0 0.0 7.0 0.0 35.3 0 100 c9t18d0
203.0 0.0 7983.2 0.0 0.0 7.0 0.0 34.5 0 100 c9t19d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c9t20d0
195.5 0.0 7096.4 0.0 0.0 6.7 0.0 34.1 0 98 c9t21d0
189.5 0.0 7757.2 0.0 0.0 6.4 0.0 33.9 0 97 c9t22d0
195.5 0.0 7645.9 0.0 0.0 6.6 0.0 33.8 0 99 c9t23d0
194.5 0.0 7925.9 0.0 0.0 7.0 0.0 36.0 0 100 c9t24d0
188.5 0.0 6725.6 0.0 0.0 6.2 0.0 32.8 0 94 c9t25d0
188.5 0.0 7199.6 0.0 0.0 6.5 0.0 34.6 0 98 c9t26d0
196.0 0.0 6666.9 0.0 0.0 6.3 0.0 32.1 0 95 c9t27d0
193.5 0.0 7455.4 0.0 0.0 6.2 0.0 32.0 0 95 c9t28d0
189.0 0.0 7400.9 0.0 0.0 6.3 0.0 33.2 0 96 c9t29d0
182.5 0.0 9397.0 0.0 0.0 7.0 0.0 38.3 0 100 c9t30d0
192.5 0.0 9179.5 0.0 0.0 7.0 0.0 36.3 0 100 c9t31d0
189.5 0.0 9431.8 0.0 0.0 7.0 0.0 36.9 0 100 c9t32d0
187.5 0.0 9082.0 0.0 0.0 7.0 0.0 37.3 0 100 c9t33d0
188.5 0.0 9368.8 0.0 0.0 7.0 0.0 37.1 0 100 c9t34d0
180.5 0.0 9332.8 0.0 0.0 7.0 0.0 38.8 0 100 c9t35d0
183.0 0.0 9690.3 0.0 0.0 7.0 0.0 38.2 0 100 c9t36d0
186.0 0.0 9193.8 0.0 0.0 7.0 0.0 37.6 0 100 c9t37d0
180.5 0.0 8233.4 0.0 0.0 7.0 0.0 38.8 0 100 c9t38d0
175.5 0.0 9085.2 0.0 0.0 7.0 0.0 39.9 0 100 c9t39d0
177.0 0.0 9340.0 0.0 0.0 7.0 0.0 39.5 0 100 c9t40d0
175.5 0.0 8831.0 0.0 0.0 7.0 0.0 39.9 0 100 c9t41d0
190.5 0.0 9177.8 0.0 0.0 7.0 0.0 36.7 0 100 c9t42d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c9t43d0
196.0 0.0 9180.5 0.0 0.0 7.0 0.0 35.7 0 100 c9t44d0
193.5 0.0 9496.8 0.0 0.0 7.0 0.0 36.2 0 100 c9t45d0
187.0 0.0 8699.5 0.0 0.0 7.0 0.0 37.4 0 100 c9t46d0
198.5 0.0 9277.0 0.0 0.0 7.0 0.0 35.2 0 100 c9t47d0
185.5 0.0 9778.3 0.0 0.0 7.0 0.0 37.7 0 100 c9t48d0
192.0 0.0 8384.2 0.0 0.0 7.0 0.0 36.4 0 100 c9t49d0
198.5 0.0 8864.7 0.0 0.0 7.0 0.0 35.2 0 100 c9t50d0
192.0 0.0 9369.8 0.0 0.0 7.0 0.0 36.4 0 100 c9t51d0
182.5 0.0 8825.7 0.0 0.0 7.0 0.0 38.3 0 100 c9t52d0
202.0 0.0 7387.9 0.0 0.0 7.0 0.0 34.6 0 100 c9t55d0
...and sure enough, about 20 minutes into the scrub I get this (a bus reset?):
scsi: [ID 107833 kern.warning] WARNING:
/p...@0,0/pci8086,6...@4/pci1000,3...@0/s...@34,0 (sd49):
incomplete read- retrying
scsi: [ID 107833 kern.warning] WARNING:
/p...@0,0/pci8086,6...@4/pci1000,3...@0/s...@21,0 (sd30):
incomplete read- retrying
scsi: [ID 107833 kern.warning] WARNING:
/p...@0,0/pci8086,6...@4/pci1000,3...@0/s...@1e,0 (sd27):
incomplete read- retrying
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
Rev. 8 LSI, Inc. 1068E found.
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
mpt0 supports power management.
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
mpt0: IOC Operational.
During the "bus reset", iostat output looked like this:
extended device statistics ---- errors ---
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b s/w h/w trn tot device
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c8
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c8t0d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c8t1d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c8t2d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c8t3d0
0.0 0.0 0.0 0.0 0.0 88.0 0.0 0.0 0 2200 0 3 0 3 c9
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t8d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t9d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t10d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t11d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t12d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t13d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t14d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t15d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t16d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t17d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t18d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t19d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t20d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t21d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t22d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t23d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t24d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t25d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t26d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t27d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t28d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t29d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 1 0 1 c9t30d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 0 0 0 c9t31d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 0 0 0 c9t32d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 1 0 1 c9t33d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 0 0 0 c9t34d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 0 0 0 c9t35d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 0 0 0 c9t36d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 0 0 0 c9t37d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 0 0 0 c9t38d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 0 0 0 c9t39d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 0 0 0 c9t40d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 0 0 0 c9t41d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 0 0 0 c9t42d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t43d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 0 0 0 c9t44d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 0 0 0 c9t45d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 0 0 0 c9t46d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 0 0 0 c9t47d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 0 0 0 c9t48d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 0 0 0 c9t49d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 0 0 0 c9t50d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 0 0 0 c9t51d0
0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0 100 0 1 0 1 c9t52d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t55d0
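As a side note, when one of these resets happens I also page through the FMA error-report log to see what the transport layer recorded around that time. Something along these lines (the exact ereport classes vary by build, so I don't grep for anything specific):

fmdump -eV | less   # dump the error-report (ereport) log in full detail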
During our previous testing we had even tried setting this max_pending value down to 1, but we still hit the problem (although it took a little longer to appear), and I couldn't find anything else I could set to throttle I/O to the disks, hence the frustration.
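(For what it's worth, rebooting for each experiment isn't strictly necessary: the live value can be read and changed with mdb. A rough sketch, with the usual caution about writing kernel variables with -kw:

echo zfs_vdev_max_pending/D | mdb -k      # read the current value (decimal)
echo zfs_vdev_max_pending/W0t1 | mdb -kw  # set it to 1 on the running kernel (0t = decimal)

For this particular test, though, we went through /etc/system and a reboot as described above.)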
If you hadn't seen this output, would you say that 7 is a "reasonable" value for the max_pending queue on our architecture, one that should give the LSI controller enough breathing room to operate? If so, I *should* be able to scrub the disks successfully (i.e. ZFS isn't to blame), and I have to point the finger at the mpt driver, LSI firmware, or disk firmware instead.