On Tue, Aug 24, 2010 at 04:46:23PM -0700, Andrew Gabriel wrote:
> Ray Van Dolson wrote:
> > I posted a thread on this once long ago[1] -- but we're still fighting
> > with this problem and I wanted to throw it out here again.
> >
> > All of our hardware is from Silicon Mechanics (SuperMicro chassis and
> > motherboards).
> >
> > Up until now, all of the hardware has had a single 24-disk expander /
> > backplane -- but we recently got one of the new SC847-based models with
> > 24 disks up front and 12 in the back -- a dual backplane setup.
> >
> > We're using two SSD's in the front backplane as mirrored ZIL/OS (I
> > don't think we have the 4K alignment set up correctly) and two drives
> > in the back as L2ARC.
> >
> > The rest of the disks are 1TB SATA disks which make up a single large
> > zpool via three 8-disk RAIDZ2's.  As you can see, we don't have the
> > server maxed out on drives...
> >
> > In any case, this new server gets between 400 and 600 of these timeout
> > errors an hour:
> >
> > Aug 21 03:10:17 dev-zfs1 scsi: [ID 365881 kern.info] /p...@0,0/pci8086,3...@8/pci15d9,1...@0 (mpt0):
> > Aug 21 03:10:17 dev-zfs1        Log info 31126000 received for target 8.
> > Aug 21 03:10:17 dev-zfs1        scsi_status=0, ioc_status=804b, scsi_state=c
> > Aug 21 03:10:17 dev-zfs1 scsi: [ID 365881 kern.info] /p...@0,0/pci8086,3...@8/pci15d9,1...@0 (mpt0):
> > Aug 21 03:10:17 dev-zfs1        Log info 31126000 received for target 8.
> > Aug 21 03:10:17 dev-zfs1        scsi_status=0, ioc_status=804b, scsi_state=c
> > Aug 21 03:10:17 dev-zfs1 scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,3...@8/pci15d9,1...@0/s...@8,0 (sd0):
> > Aug 21 03:10:17 dev-zfs1        Error for Command: write(10)    Error Level: Retryable
> > Aug 21 03:10:17 dev-zfs1 scsi: [ID 107833 kern.notice]  Requested Block: 21230708    Error Block: 21230708
> > Aug 21 03:10:17 dev-zfs1 scsi: [ID 107833 kern.notice]  Vendor: ATA    Serial Number: CVEM002600EW
> > Aug 21 03:10:17 dev-zfs1 scsi: [ID 107833 kern.notice]  Sense Key: Unit Attention
> > Aug 21 03:10:17 dev-zfs1 scsi: [ID 107833 kern.notice]  ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
> > Aug 21 03:10:21 dev-zfs1 scsi: [ID 365881 kern.info] /p...@0,0/pci8086,3...@8/pci15d9,1...@0 (mpt0):
> >
> > iostat -xnMCez shows that the first of the two ZIL drives receives
> > about twice the number of "errors" as the second drive.
> >
> > There are no other errors on any other drives -- including the L2ARC
> > SSD's -- and the asvc_t times seem reasonably low and don't indicate
> > a bad drive to my eyes...
> >
> > The timeouts above exact a rather large performance penalty on the
> > system, both in I/O and in general usage from an SSH console: there
> > are obvious pauses and glitches when accessing the filesystem.
> >   
> 
> This isn't a timeout. "Unit Attention" is the drive saying back to the 
> computer that it's been reset and has forgotten any negotiation which 
> happened with the controller. It's a couple of decades since I was 
> working on SCSI at this level, but IIRC, a drive will return a "Unit 
> Attention" error to the first command issued to it after a 
> reset/powerup, except for a Test Unit Ready command. As the ASC text 
> says, this can be caused by a power-on, a reset, or a bus reset.

Interesting.  Thanks for the insight.
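
To double-check that these really are reset events rather than
timeouts, I'll cross-reference the FMA error telemetry -- something
along these lines (off the top of my head, untested):

    # dump FMA ereports verbosely and look for reset-related events
    fmdump -eV | grep -i reset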

> 
> > The problem _follows_ the ZIL and isn't tied to hardware.  IOW, if I
> > switch to using the L2ARC drives as ZIL, those drives suddenly exhibit
> > the timeout problems...
> >   
> 
> A possibility is that the problem is related to the nature of the load a 
> ZIL drive attracts. One scenario could be that you are crashing the 
> drive firmware, causing it to reset and reinitialize itself, and 
> therefore to return "Unit Attention" to the next command. (I don't know 
> if X25-E's can behave this way.)
> 
> I would try to correct the 4K alignment on the ZIL at least -- that does 
> significantly affect the work the drive has to do internally (as well as 
> its performance), although I've no idea if that's related to the issue 
> you're seeing.

Will definitely give this a go -- certainly can't hurt.
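
First step will be confirming where the current slices start.  With
512-byte sectors, a slice is 4K-aligned only if its first sector is a
multiple of 8 -- something like this quick check should tell (device
name below is just an example, not our actual ZIL device):

    # print each slice and whether its starting sector is 4K-aligned
    prtvtoc /dev/rdsk/c1t8d0s2 | \
        awk '!/^\*/ { print $1, (($4 % 8) == 0 ? "4K-aligned" : "misaligned") }'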

> 
> > If we connect the SSD drives directly to the LSI controller instead of
> > hanging off the hot-swap backplane, the timeouts go away.
> >   
> 
> Again, may be related to some combination of the load type and physical 
> characteristics.
> 
> > If we use SSD's attached to the SATA controllers as ZIL, there are also
> > no performance issues or timeout errors.
> >   
> 
> Why not do this then? It also avoids running the SATA Tunneling 
> Protocol (STP) across the SAS links and expanders.

We may -- however, the main reason we'd gone with the port expander was
for convenient hot-swappability.  Though I guess SATA is technically
hot-swappable, it's not as convenient :)
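
For reference, SATA hot swap on Solaris goes through cfgadm, which is
the "less convenient" part -- roughly like this (port name below is
made up):

    # take the disk offline before physically pulling it
    cfgadm -c unconfigure sata1/3
    # ...swap the drive, then bring the replacement online
    cfgadm -c configure sata1/3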

> 
> > So the problem only occurs with SSD drives acting as ZIL attached to
> > the backplane.
> >
> > This is leading me to believe we have a driver issue of some sort in
> > the mpt subsystem that is unable to cope with the longer command path
> > of multiple backplanes.  Someone alluded to this in [1] as well, and
> > it makes sense to me.
> >
> > One quick fix to me would seem to be upping the SCSI timeout values.
> >   
> 
> The error you included isn't a timeout.
> 
> > The SSD's themselves are all Intel X25-E's (32GB) with firmware 8860
> > and the LSI 1068 is a SAS1068E B3 with firmware 011c0200 (1.28.02.00).
> >   
> 
> I'm not intimately familiar with the firmware versions, but if you're 
> having problems, making sure you have the latest firmware is probably a 
> good thing to do.
> 

Appreciate the response, Gabriel.

We also do plan to compare between Solaris 10U8 and OpenSolaris /
Nexenta at some point when this hardware is freed up...

Ray