OK, I have now waited 30 minutes - still hung. After that I also pulled the SATA cable from the L2ARC device - still no success (I waited another 10 minutes).
After 10 minutes I put the L2ARC device back (SATA + power); about 20 seconds later the system continued running. dmesg shows:

Jan 8 15:41:57 nexenta scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci-...@14,1/i...@1 (ata9):
Jan 8 15:41:57 nexenta    timeout: early timeout, target=1 lun=0
Jan 8 15:41:57 nexenta gda: [ID 107833 kern.warning] WARNING: /p...@0,0/pci-...@14,1/i...@1/c...@1,0 (Disk6):
Jan 8 15:41:57 nexenta    Error for command 'write sector'    Error Level: Informational
Jan 8 15:41:57 nexenta gda: [ID 107833 kern.notice]    Sense Key: aborted command
Jan 8 15:41:57 nexenta gda: [ID 107833 kern.notice]    Vendor 'Gen-ATA ' error code: 0x3
Jan 8 15:42:01 nexenta fmd: [ID 441519 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
Jan 8 15:42:01 nexenta EVENT-TIME: Fri Jan 8 15:41:59 CET 2010
Jan 8 15:42:01 nexenta PLATFORM: GA-MA770-UD3, CSN: , HOSTNAME: nexenta
Jan 8 15:42:01 nexenta SOURCE: zfs-diagnosis, REV: 1.0
Jan 8 15:42:01 nexenta EVENT-ID: aca93a91-e013-c1b8-a5b7-fff547b2a61e
Jan 8 15:42:01 nexenta DESC: The number of I/O errors associated with a ZFS device exceeded
Jan 8 15:42:01 nexenta acceptable levels. Refer to http://sun.com/msg/ZFS-8000-FD for more information.
Jan 8 15:42:01 nexenta AUTO-RESPONSE: The device has been offlined and marked as faulted. An attempt
Jan 8 15:42:01 nexenta will be made to activate a hot spare if available.
Jan 8 15:42:01 nexenta IMPACT: Fault tolerance of the pool may be compromised.
Jan 8 15:42:01 nexenta REC-ACTION: Run 'zpool status -x' and replace the bad device.
Jan 8 15:42:13 nexenta fmd: [ID 441519 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
Jan 8 15:42:13 nexenta EVENT-TIME: Fri Jan 8 15:42:12 CET 2010
Jan 8 15:42:13 nexenta PLATFORM: GA-MA770-UD3, CSN: , HOSTNAME: nexenta
Jan 8 15:42:13 nexenta SOURCE: zfs-diagnosis, REV: 1.0
Jan 8 15:42:13 nexenta EVENT-ID: 781fa01d-394f-c24d-b900-c114d1cd9d06
Jan 8 15:42:13 nexenta DESC: The number of I/O errors associated with a ZFS device exceeded
Jan 8 15:42:13 nexenta acceptable levels. Refer to http://sun.com/msg/ZFS-8000-FD for more information.
Jan 8 15:42:13 nexenta AUTO-RESPONSE: The device has been offlined and marked as faulted. An attempt
Jan 8 15:42:13 nexenta will be made to activate a hot spare if available.
Jan 8 15:42:13 nexenta IMPACT: Fault tolerance of the pool may be compromised.
Jan 8 15:42:13 nexenta REC-ACTION: Run 'zpool status -x' and replace the bad device.

... the device is seen as faulted:

  pool: data
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Fri Jan 8 15:42:03 2010
config:

        NAME        STATE     READ WRITE CKSUM
        data        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c3d0    ONLINE       0     0     0
            c6d0    ONLINE       0     0     0  512 resilvered
          mirror    ONLINE       0     0     0
            c3d1    ONLINE       0     0     0
            c4d0    ONLINE       0     0     0
        cache
          c6d1      FAULTED      0   499     0  too many errors

... however zpool iostat -v still shows the device:

r...@nexenta:/export/home/admin# zpool iostat -v 1
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
data         209G  1.61T      0    129      0  4.64M
  mirror     104G   824G      0     64      0  2.34M
    c3d0        -      -      0     64      0  2.34M
    c6d0        -      -      0     64      0  2.34M
  mirror     104G   824G      0     64      0  2.31M
    c3d1        -      -      0     64      0  2.31M
    c4d0        -      -      0     64      0  2.31M
cache           -      -      -      -      -      -
  c6d1       137M   149G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----
syspool     2.18G   462G      0      0      0      0
  c4d1s0    2.18G   462G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----

So this seems to be a hardware issue.
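Assuming the SSD itself (or its cabling) is to blame, my plan is to pull the cache device out of the pool, swap the hardware, and add it back. Something along these lines is what I have in mind (untested on my side so far; c6d1 is simply the name the device currently has here):

   # drop the faulted L2ARC device from the pool (cache vdevs can be removed online)
   zpool remove data c6d1
   # ... replace the SSD / reseat the cables ...
   # add it back as a cache device
   zpool add data cache c6d1
   # or, if the device itself turns out to be fine, just clear the error counters
   zpool clear data c6d1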
I would expect there to be some general in-kernel timeout for I/Os, so that strangely failing or unresponsive devices (and real failures look exactly like this) get kicked out of the pool. Did I miss something? Is there a tunable for this (/etc/system)? Thanks for your responses :)
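PS: to make it concrete what kind of tunable I am hoping for - something along these lines in /etc/system (only a sketch/guess on my part; sd_io_time is the per-command timeout of the sd driver, and I am not sure it applies at all to the legacy ata/gda path my disks show up under in the dmesg output above):

   * sketch only: lower the per-command disk I/O timeout from the default 60s
   * (sd driver only, value in seconds, takes effect after a reboot)
   set sd:sd_io_time = 30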
