On Tue, Aug 24, 2010 at 04:46:23PM -0700, Andrew Gabriel wrote:
> Ray Van Dolson wrote:
> > I posted a thread on this once long ago[1] -- but we're still
> > fighting with this problem and I wanted to throw it out here again.
> >
> > All of our hardware is from Silicon Mechanics (SuperMicro chassis
> > and motherboards).
> >
> > Up until now, all of the hardware has had a single 24-disk
> > expander / backplane -- but we recently got one of the new
> > SC847-based models with 24 disks up front and 12 in the back -- a
> > dual backplane setup.
> >
> > We're using two SSDs in the front backplane as mirrored ZIL/OS (I
> > don't think we have the 4K alignment set up correctly) and two
> > drives in the back as L2ARC.
> >
> > The rest of the disks are 1TB SATA disks which make up a single
> > large zpool via three 8-disk RAIDZ2s. As you can see, we don't
> > have the server maxed out on drives...
> >
> > In any case, this new server gets between 400 and 600 of these
> > timeout errors an hour:
> >
> > Aug 21 03:10:17 dev-zfs1 scsi: [ID 365881 kern.info]
> > /p...@0,0/pci8086,3...@8/pci15d9,1...@0 (mpt0):
> > Aug 21 03:10:17 dev-zfs1   Log info 31126000 received for target 8.
> > Aug 21 03:10:17 dev-zfs1   scsi_status=0, ioc_status=804b, scsi_state=c
> > Aug 21 03:10:17 dev-zfs1 scsi: [ID 365881 kern.info]
> > /p...@0,0/pci8086,3...@8/pci15d9,1...@0 (mpt0):
> > Aug 21 03:10:17 dev-zfs1   Log info 31126000 received for target 8.
> > Aug 21 03:10:17 dev-zfs1   scsi_status=0, ioc_status=804b, scsi_state=c
> > Aug 21 03:10:17 dev-zfs1 scsi: [ID 107833 kern.warning] WARNING:
> > /p...@0,0/pci8086,3...@8/pci15d9,1...@0/s...@8,0 (sd0):
> > Aug 21 03:10:17 dev-zfs1   Error for Command: write(10)   Error Level: Retryable
> > Aug 21 03:10:17 dev-zfs1 scsi: [ID 107833 kern.notice]
> >   Requested Block: 21230708   Error Block: 21230708
> > Aug 21 03:10:17 dev-zfs1 scsi: [ID 107833 kern.notice]
> >   Vendor: ATA   Serial Number: CVEM002600EW
> > Aug 21 03:10:17 dev-zfs1 scsi: [ID 107833 kern.notice]
> >   Sense Key: Unit Attention
> > Aug 21 03:10:17 dev-zfs1 scsi: [ID 107833 kern.notice]
> >   ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
> > Aug 21 03:10:21 dev-zfs1 scsi: [ID 365881 kern.info]
> > /p...@0,0/pci8086,3...@8/pci15d9,1...@0 (mpt0):
> >
> > iostat -xnMCez shows that the first of the two ZIL drives receives
> > about twice the number of "errors" as the second drive.
> >
> > There are no other errors on any other drives -- including the
> > L2ARC SSDs -- and the asvc_t times seem reasonably low and don't
> > indicate a bad drive to my eyes...
> >
> > The timeouts above exact a rather large performance penalty on the
> > system, both in I/O and in general usage from an SSH console --
> > obvious pauses and glitches when accessing the filesystem.
>
> This isn't a timeout. "Unit Attention" is the drive saying back to
> the computer that it's been reset and has forgotten any negotiation
> which happened with the controller. It's a couple of decades since I
> was working on SCSI at this level, but IIRC, a drive will return a
> "Unit Attention" error for the first command issued to it after a
> reset/power-up, except for a Test Unit Ready command. As it says,
> this might be caused by a power on, reset, or bus reset.

Interesting. Thanks for the insight.

> > The problem _follows_ the ZIL and isn't tied to hardware. IOW, if
> > I switch to using the L2ARC drives as ZIL, those drives suddenly
> > exhibit the timeout problems...
>
> A possibility is that the problem is related to the nature of the
> load a ZIL drive attracts. One scenario could be that you are
> crashing the drive firmware, causing it to reset and reinitialize
> itself, and therefore to return "Unit Attention" to the next
> command. (I don't know if X25-Es can behave this way.)
>
> I would try and correct the 4K alignment on the ZIL at least -- that
> does significantly affect the work the drive has to do internally
> (as well as its performance), although I've no idea if that's
> related to the issue you're seeing.

Will definitely give this a go -- certainly can't hurt.
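For anyone who wants to check theirs: my understanding is that with
512-byte sectors, a slice is 4K-aligned when its starting sector is a
multiple of 8. A rough sketch of a check -- c0t8d0 is a placeholder
here, not necessarily the actual ZIL device:

  # print whether slice 0 starts on a 4K boundary
  # (c0t8d0 is hypothetical -- substitute the real ZIL device)
  prtvtoc /dev/rdsk/c0t8d0s2 | \
      awk '$1 == "0" { print ($4 % 8) ? "misaligned" : "4K-aligned" }'

If it's off, re-creating the slice in format(1M) with a starting
sector that's a multiple of 8 should take care of it.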
> > If we connect the SSD drives directly to the LSI controller
> > instead of hanging off the hot-swap backplane, the timeouts go
> > away.
>
> Again, may be related to some combination of the load type and
> physical characteristics.
>
> > If we use SSDs attached to the SATA controllers as ZIL, there are
> > also no performance issues or timeout errors.
>
> Why not do this then? It also avoids running the SATA Tunneling
> Protocol (STP) across the SAS links and port expanders.

We may -- however, the main reason we'd gone with the port expander
was convenient hot-swappability. Though I guess SATA is technically
hot-swappable, it's not as convenient :)

> > So the problem only occurs with SSD drives acting as ZIL attached
> > to the backplane.
> >
> > This is leading me to believe we have a driver issue of some sort
> > in the mpt subsystem, unable to cope with the longer command path
> > of multiple backplanes. Someone alluded to this in [1] as well,
> > and it makes sense to me.
> >
> > One quick fix to me would seem to be upping the SCSI timeout
> > values.
>
> The error you included isn't a timeout.
>
> > The SSDs themselves are all Intel X25-Es (32GB) with firmware
> > 8860, and the LSI 1068 is a SAS1068E B3 with firmware 011c0200
> > (1.28.02.00).
>
> I'm not intimately familiar with the firmware versions, but if
> you're having problems, making sure you have the latest firmware is
> probably a good thing to do.

Appreciate the response, Gabriel. We also plan to compare Solaris
10U8 against OpenSolaris / Nexenta at some point when this hardware
is freed up...

Ray
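P.S. If we end up experimenting with longer SCSI timeouts anyway
(despite the above apparently not being a timeout), my understanding
is that the sd target driver's per-command timeout can be raised via
/etc/system. A sketch only -- the value below is illustrative, not a
recommendation, and a reboot is required:

  * /etc/system: raise sd's per-command timeout (in seconds);
  * 0x78 = 120s is illustrative only
  set sd:sd_io_time = 0x78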