I have a large pool and I started getting the following errors on one
of the LUNs:
Mar 13 17:52:36 gdo-node-2 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/[EMAIL PROTECTED] (sd337):
Mar 13 17:52:36 gdo-node-2      Error for Command: write(10)  Error Level: Retryable
Mar 13 17:52:36 gdo-node-2 scsi: [ID 107833 kern.notice]      Requested Block: 15782  Error Block: 15782
Mar 13 17:52:36 gdo-node-2 scsi: [ID 107833 kern.notice]      Vendor: STK  Serial Number:
Mar 13 17:52:36 gdo-node-2 scsi: [ID 107833 kern.notice]      Sense Key: Hardware Error
Mar 13 17:52:36 gdo-node-2 scsi: [ID 107833 kern.notice]      ASC: 0x84 (<vendor unique code 0x84>), ASCQ: 0x0, FRU: 0x0
Mar 13 17:52:37 gdo-node-2 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/[EMAIL PROTECTED] (sd337):
Mar 13 17:52:37 gdo-node-2      Error for Command: write(10)  Error Level: Retryable
Mar 13 17:52:37 gdo-node-2 scsi: [ID 107833 kern.notice]      Requested Block: 885894  Error Block: 885894
Mar 13 17:52:37 gdo-node-2 scsi: [ID 107833 kern.notice]      Vendor: STK  Serial Number:
Mar 13 17:52:37 gdo-node-2 scsi: [ID 107833 kern.notice]      Sense Key: Hardware Error
Mar 13 17:52:37 gdo-node-2 scsi: [ID 107833 kern.notice]      ASC: 0x84 (<vendor unique code 0x84>), ASCQ: 0x0, FRU: 0x0
Mar 13 17:52:37 gdo-node-2 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/[EMAIL PROTECTED] (sd337):
Mar 13 17:52:37 gdo-node-2      Error for Command: write(10)  Error Level: Retryable
Mar 13 17:52:37 gdo-node-2 scsi: [ID 107833 kern.notice]      Requested Block: 15779  Error Block: 15779
Mar 13 17:52:37 gdo-node-2 scsi: [ID 107833 kern.notice]      Vendor: STK  Serial Number:
Mar 13 17:52:37 gdo-node-2 scsi: [ID 107833 kern.notice]      Sense Key: Hardware Error
There were other entries at a "Fatal" error level.
From the hardware side of things, this LUN has failed as well. The LUN
is actually a single disk, with the entire disk made into the LUN:
1 LUN / volume / disk. I'm testing various configurations on the
hardware side, from R5 volumes to these single-disk volumes. Back to
the issue...
I was hoping the hot spare would kick in, but since that didn't seem to
happen, I tried to replace the disk manually. I ran the following
against this disk, but the errors just keep coming:
zpool replace -f gdo-pool-01 c8t600A0B8000115EA20000FEDD45E81306d0 \
c8t600A0B800011399600007D6F45E8149Bd0
Originally the replacement disk was part of the spares for this pool,
hence I think I had to use -f. I had removed the disk from the spares
just prior to the above zpool replace:
zpool remove gdo-pool-01 c8t600A0B800011399600007D6F45E8149Bd0
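So, for clarity, the full sequence in the order I actually ran it was:

```shell
# Take the disk out of the pool's hot-spare list first, so it is
# free to be used as an ordinary replacement device
zpool remove gdo-pool-01 c8t600A0B800011399600007D6F45E8149Bd0

# Then force it in as the replacement for the failing LUN
# (-f because it had previously been a spare of this same pool)
zpool replace -f gdo-pool-01 c8t600A0B8000115EA20000FEDD45E81306d0 \
    c8t600A0B800011399600007D6F45E8149Bd0
```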
After the replacement the raidz2 group looked like:
bash-3.00# zpool status gdo-pool-01
  pool: gdo-pool-01
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are
        unaffected.
action: Determine if the device needs to be replaced, and clear the
        errors using 'zpool clear' or replace the device with
        'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed with 0 errors on Tue Mar 13 17:38:21 2007
config:
...
<several raidz2 group listings deleted to make this email shorter >
...
          raidz2                                     ONLINE       0     0     0
            c8t600A0B800011399600007CDD45E80D31d0    ONLINE       0     0     0
            spare                                    ONLINE       0     0     0
              c8t600A0B8000115EA20000FEDD45E81306d0  ONLINE      14 121.9     0
              c8t600A0B800011399600007D6F45E8149Bd0  ONLINE       0     0     0
            c8t600A0B800011399600007D0745E80F03d0    ONLINE       0     0     0
            c8t600A0B8000115EA20000FEF945E814DEd0    ONLINE       0     0     0
            c8t600A0B800011399600007D3145E810E9d0    ONLINE       0     0     0
            c8t600A0B800011399600007D4F45E81263d0    ONLINE       0     0     0
            c8t600A0B8000115EA20000FF1F45E8183Ed0    ONLINE       0     0     0
            c8t600A0B800011399600007D6B45E81471d0    ONLINE       0     0     0
            c8t600A0B8000115EA20000FE8B45E80D46d0    ONLINE       0     0     0
            c8t600A0B800011399600007C6F45E80927d0    ONLINE       0     0     0
            c8t600A0B8000115EA20000FEA745E80ED4d0    ONLINE       0     0     0
            c8t600A0B800011399600007C9945E80ABDd0    ONLINE       0     0     0
            c8t600A0B800011399600007CB545E80B81d0    ONLINE       0     0     0
            c8t600A0B8000115EA20000FEC345E8114Ed0    ONLINE       0     0     0
            c8t600A0B800011399600007CDF45E80D3Fd0    ONLINE       0     0     0
            c8t600A0B8000115EA20000FEDF45E81316d0    ONLINE       0     0     0
So even after the replace, the read and write errors continue to
accumulate in the zpool status output, and I continue to see errors
in /var/adm/messages.
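In case the details help, this is how I've been watching the errors
(assuming the standard Solaris 10 tooling; nothing exotic):

```shell
# Per-vdev READ/WRITE/CKSUM counters, plus any files with known damage
zpool status -v gdo-pool-01

# FMA error telemetry, which usually carries the underlying SCSI
# sense data for these events
fmdump -eV | less

# Cumulative soft/hard/transport error counts on the suspect LUN
iostat -En c8t600A0B8000115EA20000FEDD45E81306d0
```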
This system is an x4600 running Solaris 10 update 3, with fairly recent
patches applied.
Any advice on what I should have done, or what I can do to make the
system stop using the bad LUN, would be appreciated.
Thank you,
David
_______________________________________________
zfs-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss