Hi all,

Looks like we've run into a *really* fun problem.

We saw performance degrade over the evening: a filesystem became 
unresponsive and then unwritable. We took it offline, and samfsck reported 
this:

samfsck: NOTICE: Filesystem insdata requires fsck
name:     insdata       version:     2A    shared
First pass
NOTICE: ino 1.1,        Repaired file size from 17968922624 to 17968988160
Second pass
samfsck: Write failed on eq 203 at block 0x15dfc90, length = 16: I/O error
samfsck: Write failed in .inodes on eq 203 at block 0x15dfc90

Has anyone seen this behaviour before?

Writes to other filesystems on the same SSD/flash-based disk over FC work 
just fine.

Some errors have shown up:

Apr 11 15:55:46 mdc scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/disk@g600144f016cec70000004c22ce7c0008 (sd53):
Apr 11 15:55:46 mdc         Error for Command: read(10)                Error Level: Retryable
Apr 11 15:55:46 mdc scsi: [ID 107833 kern.notice]   Requested Block: 2928896                   Error Block: 2928896
Apr 11 15:55:46 mdc scsi: [ID 107833 kern.notice]   Vendor: SUN                                Serial Number:             
Apr 11 15:55:46 mdc scsi: [ID 107833 kern.notice]   Sense Key: Unit Attention
Apr 11 15:55:46 mdc scsi: [ID 107833 kern.notice]   ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
Apr 11 15:55:46 mdc scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/disk@g600144f016cec70000004c22ce620002 (sd47):
Apr 11 15:55:46 mdc         Error for Command: read(10)                Error Level: Retryable
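
To see whether these warnings cluster on one LUN or are spread across several 
(which might point at the fabric rather than a single device), a rough sketch 
like the one below could tally them per device. It is only an illustration: 
the function names are made up for this example, the regular expression 
assumes the "WARNING: <path> (sdNN):" layout seen in the excerpt above, and 
/var/adm/messages is assumed to be where the messages are logged.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt above: "WARNING: <device path> (sdNN):".
# \s+ also tolerates a line wrap between "WARNING:" and the device path.
WARNING_RE = re.compile(r"WARNING:\s+(/\S+)\s+\((sd\d+)\)")


def count_scsi_warnings(log_text):
    """Return a Counter keyed by (sd instance, device path)."""
    return Counter(
        (match.group(2), match.group(1))
        for match in WARNING_RE.finditer(log_text)
    )


def report(path="/var/adm/messages"):
    """Print warning counts per device, most frequent first.

    The default path is an assumption; pass another file if needed.
    """
    with open(path, errors="replace") as handle:
        counts = count_scsi_warnings(handle.read())
    for (sd, device), total in counts.most_common():
        print(f"{total:6d}  {sd}  {device}")
```

Calling report() on the metadata server would print one line per device. If 
only the LUNs behind the affected filesystem show up, that would suggest a 
device-side problem; warnings on many unrelated LUNs at the same timestamp 
would look more like a path or switch event.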

We're racking our brains trying to figure out if this is hardware or software 
related. We've swapped out a couple of SFPs already, but to no avail thus 
far.

The FC environment and switching *seem* sane, but whenever SAM tries to write 
to this particular filesystem, and this particular part of the mm block, we 
get an I/O error.

I'm very hesitant to blow it away and start again, because there might be 
something in the hardware that I'm not seeing.

All Oracle F20 cards are also showing as healthy, with no failed NAND modules.

Thanks all.

JC

_______________________________________________
sam-qfs-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/sam-qfs-discuss