x4540 snv_117
We lost an HDD last night, and it seemed to take out most of the bus or
something, which forced us to reboot. (We have yet to lose a disk
without being forced to reboot, mind you.)
So today I'm looking at replacing the broken HDD, but nothing I try will
"turn on the blue LED". After an hour of trying, we just replaced the
HDD anyway. But nothing I do will make the system use/recognise the
new disk. (We tried more than one working spare HDD too.)
For example:
# zpool status
raidz1 DEGRADED 0 0 0
c5t1d0 ONLINE 0 0 0
c0t5d0 ONLINE 0 0 0
spare DEGRADED 0 0 285K
c1t5d0 UNAVAIL 0 0 0 cannot open
c4t7d0 ONLINE 0 0 0 4.13G resilvered
c2t5d0 ONLINE 0 0 0
c3t5d0 ONLINE 0 0 0
spares
c4t7d0 INUSE currently in use
# zpool offline zpool1 c1t5d0
raidz1 DEGRADED 0 0 0
c5t1d0 ONLINE 0 0 0
c0t5d0 ONLINE 0 0 0
spare DEGRADED 0 0 285K
c1t5d0 OFFLINE 0 0 0
c4t7d0 ONLINE 0 0 0 4.13G resilvered
c2t5d0 ONLINE 0 0 0
c3t5d0 ONLINE 0 0 0
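As an aside, a filter along these lines could pick the unhealthy rows out of the status output instead of eyeballing it. This is only a sketch: `not_online` is a made-up helper name, the list of states is my assumption about what counts as unhealthy, and it assumes a POSIX awk (on Solaris that probably means nawk or /usr/xpg4/bin/awk rather than the old /usr/bin/awk).

```shell
# Print name and state for rows whose STATE column reports a problem.
# Reads `zpool status`-style text on stdin; not_online is a hypothetical helper name.
# NF >= 5 skips the short "state: DEGRADED" summary line of the real output.
not_online() {
    awk 'NF >= 5 && $2 ~ /^(DEGRADED|FAULTED|OFFLINE|REMOVED|UNAVAIL)$/ { print $1, $2 }'
}

# Sample rows copied from the status output above.
# On the real host this would be: zpool status zpool1 | not_online
zpool_sample='raidz1 DEGRADED 0 0 0
c5t1d0 ONLINE 0 0 0
spare DEGRADED 0 0 285K
c1t5d0 UNAVAIL 0 0 0 cannot open
c4t7d0 ONLINE 0 0 0 4.13G resilvered'
printf '%s\n' "$zpool_sample" | not_online
```

On the sample rows it prints the raidz1 and spare rows (both DEGRADED) and c1t5d0 UNAVAIL.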
# cfgadm -al
Ap_Id Type Receptacle Occupant Condition
c1 scsi-bus connected configured unknown
c1::dsk/c1t5d0 disk connected configured failed
# cfgadm -c unconfigure c1::dsk/c1t5d0
# cfgadm -al
c1::dsk/c1t5d0 disk connected configured failed
# cfgadm -c unconfigure c1::dsk/c1t5d0
# cfgadm -c unconfigure c1::dsk/c1t5d0
# cfgadm -fc unconfigure c1::dsk/c1t5d0
# cfgadm -fc unconfigure c1::dsk/c1t5d0
# cfgadm -al
c1::dsk/c1t5d0 disk connected configured failed
# hdadm offline slot 13
1: 5: 9: 13: 17: 21: 25: 29: 33: 37: 41: 45:
c0t1 c0t5 c1t1 c1t5 c2t1 c2t5 c3t1 c3t5 c4t1 c4t5 c5t1 c5t5
^b+ ^++ ^b+ ^-- ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++
# cfgadm -al
c1::dsk/c1t5d0 disk connected configured failed
# fmadm faulty
FRU : "HD_ID_47"
(hc://:product-id=Sun-Fire-X4540:chassis-id=0915AMR048:server-id=x4500-10.unix:serial=9QMB024K:part=SEAGATE-ST35002NSSUN500G-09107B024K:revision=SU0D/chassis=0/bay=47/disk=0)
faulty
# fmadm repair HD_ID_47
fmadm: recorded repair to HD_ID_47
# format | grep c1t5d0
#
# hdadm offline slot 13
1: 5: 9: 13: 17: 21: 25: 29: 33: 37: 41: 45:
c0t1 c0t5 c1t1 c1t5 c2t1 c2t5 c3t1 c3t5 c4t1 c4t5 c5t1 c5t5
^b+ ^++ ^b+ ^-- ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++
# cfgadm -al
c1::dsk/c1t5d0 disk connected configured failed
# ipmitool sunoem led get|grep 13
hdd13.fail.led | ON
hdd13.ok2rm.led | OFF
# zpool online zpool1 c1t5d0
warning: device 'c1t5d0' onlined, but remains in faulted state
use 'zpool replace' to replace devices that are no longer present
# cfgadm -c disconnect c1::dsk/c1t5d0
cfgadm: Hardware specific failure: operation not supported for SCSI device
Bah, why were they changed to SCSI? Increasing the size of the hammer...
# cfgadm -x replace_device c1::sd37
Replacing SCSI device: /devices/p...@0,0/pci10de,3...@b/pci1000,1...@0/s...@5,0
This operation will suspend activity on SCSI bus: c1
Continue (yes/no)? y
SCSI bus quiesced successfully.
It is now safe to proceed with hotplug operation.
Enter y if operation is complete or n to abort (yes/no)? y
# cfgadm -al
c1::dsk/c1t5d0 disk connected configured failed
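Since I keep re-running cfgadm just to read one column, here is a rough sketch of a helper that prints only the Condition field for a given attachment point. `ap_condition` is a hypothetical name, it assumes Condition is the last column as in the output above, and it needs an awk that supports -v (nawk or /usr/xpg4/bin/awk on Solaris, I believe).

```shell
# Print the Condition column (last field) for one attachment point.
# Reads `cfgadm -al`-style output on stdin; ap_condition is a hypothetical helper name.
ap_condition() {
    awk -v ap="$1" '$1 == ap { print $NF }'
}

# Sample rows copied from the cfgadm output above.
# On the real host this would be: cfgadm -al | ap_condition c1::dsk/c1t5d0
cfgadm_sample='c1 scsi-bus connected configured unknown
c1::dsk/c1t5d0 disk connected configured failed'
printf '%s\n' "$cfgadm_sample" | ap_condition c1::dsk/c1t5d0
```

On the sample rows this prints "failed", which is what every attempt above still shows.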
I am fairly certain that if I reboot, it will all come back OK again.
But I would like to believe that I should be able to replace a disk
without rebooting on an X4540.
Any other commands I should try?
Lund
--
Jorgen Lundman | <lund...@lundman.net>
Unix Administrator | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo | +81 (0)90-5578-8500 (cell)
Japan | +81 (0)3 -3375-1767 (home)