Ok, 

I have now waited 30 minutes - the system was still hung. After that I also 
pulled the SATA cable from the L2ARC device - still no success (another 10 
minutes of waiting). 

Then I reconnected the L2ARC device (SATA + power), and about 20 seconds 
later the system resumed running. 
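
(Side note: I yanked the cable while the box was already hung, so there was 
nothing to lose. For a clean hot-remove one would normally unconfigure the 
SATA port first - assuming the controller runs in AHCI mode, which mine 
apparently does not, judging by the ata/gda messages below. Roughly, with a 
made-up port name: 

# cfgadm -al                      (list attachment points) 
# cfgadm -c unconfigure sata1/1   (offline the port before pulling the disk) 
)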

dmesg shows: 

Jan  8 15:41:57 nexenta scsi: [ID 107833 kern.warning] WARNING: 
/p...@0,0/pci-...@14,1/i...@1 (ata9):
Jan  8 15:41:57 nexenta         timeout: early timeout, target=1 lun=0
Jan  8 15:41:57 nexenta gda: [ID 107833 kern.warning] WARNING: 
/p...@0,0/pci-...@14,1/i...@1/c...@1,0 (Disk6):
Jan  8 15:41:57 nexenta         Error for command 'write sector'        Error 
Level: Informational
Jan  8 15:41:57 nexenta gda: [ID 107833 kern.notice]    Sense Key: aborted 
command
Jan  8 15:41:57 nexenta gda: [ID 107833 kern.notice]    Vendor 'Gen-ATA ' error 
code: 0x3
Jan  8 15:42:01 nexenta fmd: [ID 441519 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, 
TYPE: Fault, VER: 1, SEVERITY: Major
Jan  8 15:42:01 nexenta EVENT-TIME: Fri Jan  8 15:41:59 CET 2010
Jan  8 15:42:01 nexenta PLATFORM: GA-MA770-UD3, CSN:  , HOSTNAME: nexenta
Jan  8 15:42:01 nexenta SOURCE: zfs-diagnosis, REV: 1.0
Jan  8 15:42:01 nexenta EVENT-ID: aca93a91-e013-c1b8-a5b7-fff547b2a61e
Jan  8 15:42:01 nexenta DESC: The number of I/O errors associated with a ZFS 
device exceeded
Jan  8 15:42:01 nexenta              acceptable levels.  Refer to 
http://sun.com/msg/ZFS-8000-FD for more information.
Jan  8 15:42:01 nexenta AUTO-RESPONSE: The device has been offlined and marked 
as faulted.  An attempt
Jan  8 15:42:01 nexenta              will be made to activate a hot spare if 
available.
Jan  8 15:42:01 nexenta IMPACT: Fault tolerance of the pool may be compromised.
Jan  8 15:42:01 nexenta REC-ACTION: Run 'zpool status -x' and replace the bad 
device.
Jan  8 15:42:13 nexenta fmd: [ID 441519 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, 
TYPE: Fault, VER: 1, SEVERITY: Major
Jan  8 15:42:13 nexenta EVENT-TIME: Fri Jan  8 15:42:12 CET 2010
Jan  8 15:42:13 nexenta PLATFORM: GA-MA770-UD3, CSN:  , HOSTNAME: nexenta
Jan  8 15:42:13 nexenta SOURCE: zfs-diagnosis, REV: 1.0
Jan  8 15:42:13 nexenta EVENT-ID: 781fa01d-394f-c24d-b900-c114d1cd9d06
Jan  8 15:42:13 nexenta DESC: The number of I/O errors associated with a ZFS 
device exceeded
Jan  8 15:42:13 nexenta              acceptable levels.  Refer to 
http://sun.com/msg/ZFS-8000-FD for more information.
Jan  8 15:42:13 nexenta AUTO-RESPONSE: The device has been offlined and marked 
as faulted.  An attempt
Jan  8 15:42:13 nexenta              will be made to activate a hot spare if 
available.
Jan  8 15:42:13 nexenta IMPACT: Fault tolerance of the pool may be compromised.
Jan  8 15:42:13 nexenta REC-ACTION: Run 'zpool status -x' and replace the bad 
device.
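
(The full fault record should be retrievable from FMA via the EVENT-ID 
above - untested on my side, but something like: 

# fmadm faulty 
# fmdump -v -u aca93a91-e013-c1b8-a5b7-fff547b2a61e 
)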

.. the device is seen as faulted: 

  pool: data
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Fri Jan  8 15:42:03 2010
config:

        NAME        STATE     READ WRITE CKSUM
        data        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c3d0    ONLINE       0     0     0
            c6d0    ONLINE       0     0     0  512 resilvered
          mirror    ONLINE       0     0     0
            c3d1    ONLINE       0     0     0
            c4d0    ONLINE       0     0     0
        cache
          c6d1      FAULTED      0   499     0  too many errors
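
My plan is to drop the faulted cache device from the pool and re-add it once 
the hardware checks out - if I read the man page correctly, cache (L2ARC) 
devices can be removed online: 

# zpool remove data c6d1 
# zpool add data cache c6d1      (later, after testing/replacing the disk) 

Alternatively, 'zpool clear data c6d1' should reset the error counters if the 
device turns out to be fine.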

.. however, zpool iostat -v still shows the device: 

r...@nexenta:/export/home/admin# zpool iostat -v 1
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
data         209G  1.61T      0    129      0  4.64M
  mirror     104G   824G      0     64      0  2.34M
    c3d0        -      -      0     64      0  2.34M
    c6d0        -      -      0     64      0  2.34M
  mirror     104G   824G      0     64      0  2.31M
    c3d1        -      -      0     64      0  2.31M
    c4d0        -      -      0     64      0  2.31M
cache           -      -      -      -      -      -
  c6d1       137M   149G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----
syspool     2.18G   462G      0      0      0      0
  c4d1s0    2.18G   462G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----

So this seems to be a hardware issue. 
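
To double-check, I will look at the per-device error counters - the hard and 
transport error counts there should be climbing if the device or link is 
really at fault: 

# iostat -En c6d1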

I would expect there to be some general in-kernel timeout for I/Os, so that 
a device that fails in strange ways and stops responding (and real-world 
failures look exactly like this) gets kicked out. 

Did I miss something? Is there a tunable in /etc/system? 
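
The only candidate I have found so far is sd_io_time, and I am not sure it 
even applies here, since the cache device sits on the legacy ata/gda stack 
rather than on sd(7D). Untested sketch: 

* /etc/system: lower the per-command timeout for sd-attached disks 
* from the default 60 seconds to 10 (needs a reboot; probably a no-op 
* for disks behind the legacy ata driver like mine) 
set sd:sd_io_time = 10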

Thanks for your responses :)