X4540 snv_117

We lost an HDD last night, and it seemed to take out most of the bus (or something similar), forcing us to reboot. (Mind you, we have yet to lose a disk without being forced to reboot.)

So today I'm replacing the broken HDD, but nothing I try will "turn on the blue LED" (the OK-to-remove LED). After an hour of trying, we swapped the HDD anyway. However, nothing will make the system recognise and use the new disk. (We tried more than one known-good spare HDD, too.)

For example:

# zpool status

          raidz1      DEGRADED     0     0     0
            c5t1d0    ONLINE       0     0     0
            c0t5d0    ONLINE       0     0     0
            spare     DEGRADED     0     0  285K
              c1t5d0  UNAVAIL      0     0     0  cannot open
              c4t7d0  ONLINE       0     0     0  4.13G resilvered
            c2t5d0    ONLINE       0     0     0
            c3t5d0    ONLINE       0     0     0
        spares
          c4t7d0      INUSE     currently in use



# zpool offline zpool1 c1t5d0

          raidz1      DEGRADED     0     0     0
            c5t1d0    ONLINE       0     0     0
            c0t5d0    ONLINE       0     0     0
            spare     DEGRADED     0     0  285K
              c1t5d0  OFFLINE      0     0     0
              c4t7d0  ONLINE       0     0     0  4.13G resilvered
            c2t5d0    ONLINE       0     0     0
            c3t5d0    ONLINE       0     0     0


# cfgadm -al
Ap_Id                          Type         Receptacle   Occupant     Condition
c1                             scsi-bus     connected    configured   unknown
c1::dsk/c1t5d0                 disk         connected    configured   failed

# cfgadm -c unconfigure c1::dsk/c1t5d0
# cfgadm -al
c1::dsk/c1t5d0                 disk         connected    configured   failed
# cfgadm -c unconfigure c1::dsk/c1t5d0
# cfgadm -c unconfigure c1::dsk/c1t5d0
# cfgadm -fc unconfigure c1::dsk/c1t5d0
# cfgadm -fc unconfigure c1::dsk/c1t5d0
# cfgadm -al
c1::dsk/c1t5d0                 disk         connected    configured   failed
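
Re-running 'cfgadm -al' after every attempt and scanning for one line gets tedious. Below is a minimal sketch of a hypothetical helper, ap_condition (my own name, not a Solaris command). It assumes the column layout shown above (Ap_Id first, Condition last) and prints only the Condition of a single attachment point:

```shell
#!/bin/sh
# ap_condition (hypothetical helper, not part of Solaris):
# reads `cfgadm -al` output on stdin and prints the last column
# (Condition) of the line whose first column equals the given Ap_Id.
ap_condition() {
    awk -v ap="$1" '$1 == ap { print $NF }'
}

# Example use on a live system:
#   cfgadm -al | ap_condition c1::dsk/c1t5d0
```

In the state above, 'cfgadm -al | ap_condition c1::dsk/c1t5d0' would print just "failed".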

# hdadm offline slot 13
 1:    5:    9:   13:   17:   21:   25:   29:   33:   37:   41:   45:
c0t1  c0t5  c1t1  c1t5  c2t1  c2t5  c3t1  c3t5  c4t1  c4t5  c5t1  c5t5
^b+   ^++   ^b+   ^--   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++
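
The hdadm map pairs each slot with a controller/target: slot 13 is c1t5, slot 45 is c5t5. Assuming that pattern (controller = slot / 8, target = slot % 8) holds for all 48 bays, which I have only checked against the slots printed here, two tiny hypothetical helpers can translate between slot numbers and cXtY names before running hdadm or ipmitool:

```shell
#!/bin/sh
# Hypothetical helpers (not Sun tools). They assume the X4540 slot
# numbering seen in the hdadm map: controller = slot / 8, target = slot % 8.

# slot_to_ct SLOT: prints the cXtY name for a slot number
slot_to_ct() {
    echo "c$(( $1 / 8 ))t$(( $1 % 8 ))"
}

# ct_to_slot CONTROLLER TARGET: prints the slot number for cXtY
ct_to_slot() {
    echo $(( $1 * 8 + $2 ))
}
```

For the failed disk, 'ct_to_slot 1 5' gives 13, which matches both 'hdadm offline slot 13' and the hdd13 LEDs that ipmitool reports.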

# cfgadm -al
c1::dsk/c1t5d0                 disk         connected    configured   failed

# fmadm faulty
FRU : "HD_ID_47" (hc://:product-id=Sun-Fire-X4540:chassis-id=0915AMR048:server-id=x4500-10.unix:serial=9QMB024K:part=SEAGATE-ST35002NSSUN500G-09107B024K:revision=SU0D/chassis=0/bay=47/disk=0)
                  faulty

# fmadm repair HD_ID_47
fmadm: recorded repair to HD_ID_47

# format | grep c1t5d0
#

# hdadm offline slot 13
 1:    5:    9:   13:   17:   21:   25:   29:   33:   37:   41:   45:
c0t1  c0t5  c1t1  c1t5  c2t1  c2t5  c3t1  c3t5  c4t1  c4t5  c5t1  c5t5
^b+   ^++   ^b+   ^--   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++

# cfgadm -al
c1::dsk/c1t5d0                 disk         connected    configured   failed

# ipmitool sunoem led get|grep 13
 hdd13.fail.led   | ON
 hdd13.ok2rm.led  | OFF
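
'grep 13' matches any line that contains "13" anywhere. A stricter filter is sketched below as a hypothetical helper, disk_leds (my own name). It assumes the "name | state" layout shown above and keeps only LEDs whose name begins with "hddN.", so asking for slot 1 does not also return hdd13 or hdd31:

```shell
#!/bin/sh
# disk_leds (hypothetical helper): read `ipmitool sunoem led get` output on
# stdin and print only the LEDs belonging to one disk slot, e.g. hdd13.*
disk_leds() {
    awk -v prefix="hdd$1." '{
        name = $1                       # first field is the LED name
        if (index(name, prefix) == 1)   # keep names starting with "hddN."
            print
    }'
}

# Example use:
#   ipmitool sunoem led get | disk_leds 13
```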

# zpool online zpool1 c1t5d0
warning: device 'c1t5d0' onlined, but remains in faulted state
use 'zpool replace' to replace devices that are no longer present

# cfgadm -c disconnect c1::dsk/c1t5d0
cfgadm: Hardware specific failure: operation not supported for SCSI device


Bah, why were these changed to SCSI devices? Time to increase the size of the hammer...


# cfgadm -x replace_device c1::sd37
Replacing SCSI device: /devices/p...@0,0/pci10de,3...@b/pci1000,1...@0/s...@5,0
This operation will suspend activity on SCSI bus: c1
Continue (yes/no)? y
SCSI bus quiesced successfully.
It is now safe to proceed with hotplug operation.
Enter y if operation is complete or n to abort (yes/no)? y

# cfgadm -al
c1::dsk/c1t5d0                 disk         connected    configured   failed


I am fairly certain that if I reboot, it will all come back OK. But I would like to believe that I can replace a disk on an X4540 without rebooting.

Any other commands I should try?

Lund

--
Jorgen Lundman       | <lund...@lundman.net>
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500          (cell)
Japan                | +81 (0)3 -3375-1767          (home)
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss