On 01/22/2013 06:26 PM, Len Zaifman wrote:
> We have just had a major system meltdown and it took several days to fix.
> 
> What we would have liked is 2 things we had on thumpers (Old SUN ZFS systems)
> 
> 1) A tool to show the mapping of a solaris device name to a physical location
> 2) A tool to turn on the light on a disk via its solaris device name.
> 
> The process below is too painful, and we have other devices whose disks may 
> go bad. Does either 1 or 2 above exist in openindiana? I could not find it, 
> if it does.
> 
> Thanks.
> 
> The issue was:
> 
> OI (OpenIndiana Development oi_151a X86) reported:
> 
> 
> Jan 22 10:57:43 archivea scsi: [ID 107833 kern.warning] WARNING: 
> /pci@7a,0/pci8086,3408@1/pci1000,3040@0 (mpt_sas10):
> Jan 22 10:57:43 archivea        Disconnected command timeout for Target 18
> Jan 22 10:57:43 archivea scsi: [ID 365881 kern.info] 
> /pci@7a,0/pci8086,3408@1/pci1000,3040@0 (mpt_sas10):
> Jan 22 10:57:43 archivea        Log info 0x31140000 received for target 18.
> Jan 22 10:57:43 archivea        scsi_status=0x0, ioc_status=0x8048, 
> scsi_state=0xc
> 
> zfs performance went through the floor  and was intolerable(< 1 mb/sec where 
> we had hundreds of MB/sec for resilver/scrubs and 100 MB/sec through the 
> filesystem).
> 
> The defective disk was one of 45 disks in a Supermicro Jbod system 
> (SC847E26-RJBOD1)
> 
> We finally found which disk it was by comparing serial numbers reported by 
> iostat, disks that reported errors and the actual disk serial number (we 
> pulled all 45 disks out to do this mapping). we do not want to repeat this 
> process for our other devices.

The things you describe are hardware-specific. If your enclosures are
SES-2 compatible, then the fault manager should automatically blink the
appropriate LED.

You can easily map the affected FRU from a fault report in fmadm. For
example, I have one drive right now with a predictive failure:
# fmadm faulty
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Jan 14 19:11:12 29661ec9-5747-4466-f241-c96ac9f7954f  DISK-8000-0X   Major

Host        : vod1
Platform    : SUN-FIRE-X2250    Chassis_id  : 0948QBN009
Product_sn  :

Fault class : fault.io.disk.predictive-failure
Affects     : dev:///:devid=id1,sd@n5000c50015ae9c51//pci@0,0/pci8086,4021@1/pci1000,3150@0/sd@9,0
                  faulted but still in service
FRU         : "SCSI Device  9" (hc://:product-id=SUN-Storage-J4200:server-id=:chassis-id=0946QGJ007:serial=9QJ4VY4X:part=SEAGATE-ST31000NSSUN1.0T-093354VY4X:revision=SU0D/ses-enclosure=0/bay=9/disk=0)
                  faulty

Description : SMART health-monitoring firmware reported that a disk
              failure is imminent.
              Refer to http://sun.com/msg/DISK-8000-0X for more information.

Now we can take the FRU ID and find out which logical drive it
corresponds to.

# /usr/lib/fm/fmd/fmtopo -V 'hc://:product-id=SUN-Storage-J4200:server-id=:chassis-id=0946QGJ007:serial=9QJ4VY4X:part=SEAGATE-ST31000NSSUN1.0T-093354VY4X:revision=SU0D/ses-enclosure=0/bay=9/disk=0'
... [snip] ...
    logical-disk      string    c7t9d0    <<< here's the logical ID
    manufacturer      string    SEAGATE
    model             string    ST31000NSSUN1.0T 093354VY4X
    serial-number     string    9QJ4VY4X
    firmware-revision string    SU0D
    capacity-in-bytes string    1000204886016
    target-port-l0ids string[]  [ "w5001636000207501" ]
... [snip] ...
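
If you want to script this lookup, the property lines in the excerpt
above are easy to pick apart. The following is only a rough sketch: it
assumes every scalar property is printed as "name string value", which
is all the excerpt shows, and the function name parse_string_props is
made up for illustration. Array properties such as target-port-l0ids
are deliberately skipped.

```python
import re

# Property lines copied from the fmtopo -V excerpt above.
SAMPLE = """\
    logical-disk      string    c7t9d0
    manufacturer      string    SEAGATE
    model             string    ST31000NSSUN1.0T 093354VY4X
    serial-number     string    9QJ4VY4X
    firmware-revision string    SU0D
    capacity-in-bytes string    1000204886016
"""

# Assumed layout: indentation, property name, the literal type "string",
# then the value (which may contain spaces). "string[]" does not match.
PROP_RE = re.compile(r"^\s+([\w-]+)\s+string\s+(.*\S)\s*$")


def parse_string_props(text):
    """Return a dict of scalar string properties found in fmtopo -V text."""
    props = {}
    for line in text.splitlines():
        match = PROP_RE.match(line)
        if match:
            props[match.group(1)] = match.group(2)
    return props


props = parse_string_props(SAMPLE)
print(props["serial-number"], "->", props["logical-disk"])
```

With a dict like this per disk node, the serial number from a fault
report can be matched to the cXtYdZ name without pulling any drives.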

If you don't know your FRU, just run /usr/lib/fm/fmd/fmtopo without any
arguments; it will print the FRUs for all the machine components it
knows about.
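
The FRU string itself already encodes the physical location
(enclosure and bay), so it can be split mechanically. This is a minimal
sketch, assuming the "hc://" strings always look like the one in the
fault report above: colon-separated key=value pairs, then
slash-separated name=instance components. The helper name
parse_hc_fmri is hypothetical and not part of any FMA tool.

```python
# FRU string copied from the fmadm report above.
FRU = ("hc://:product-id=SUN-Storage-J4200:server-id=:chassis-id=0946QGJ007"
       ":serial=9QJ4VY4X:part=SEAGATE-ST31000NSSUN1.0T-093354VY4X"
       ":revision=SU0D/ses-enclosure=0/bay=9/disk=0")


def parse_hc_fmri(fmri):
    """Split an hc:// string into (authority dict, path dict).

    Assumes values contain neither ':' nor '/', as in the example.
    """
    prefix = "hc://"
    if not fmri.startswith(prefix):
        raise ValueError("not an hc:// string: %r" % fmri)
    authority, _, path = fmri[len(prefix):].partition("/")

    auth = {}
    for field in authority.split(":"):
        if field:  # skip the empty piece before the leading ':'
            key, _, value = field.partition("=")
            auth[key] = value

    nodes = {}
    for component in path.split("/"):
        if component:
            name, _, instance = component.partition("=")
            nodes[name] = instance
    return auth, nodes


auth, nodes = parse_hc_fmri(FRU)
print("chassis", auth["chassis-id"],
      "enclosure", nodes["ses-enclosure"],
      "bay", nodes["bay"],
      "serial", auth["serial"])
```

Running the same function over each hc:// line of the full listing
would give a serial-to-bay table for the whole enclosure.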

If you are running a recent LSI HBA, you can also install the sas2ircu
and diskmap.py utilities, which will map out your physical
infrastructure and tell you what lies where:

# diskmap.py
Diskmap - npvr1> help

Documented commands (type help <topic>):
========================================
EOF    controllers  disks       enclosures  ledon   quit     sd_timeout
alias  discover     drawletter  ledoff      mangle  refresh

Diskmap - npvr1> disks
1:02:00    c8t50000393E8CAF2A4d0        MK2001TRKB    2.0T  Ready (RDY) content: raidz1-0
1:02:01    c8t50000393E8CAF53Cd0        MK2001TRKB    2.0T  Ready (RDY) content: raidz1-1
... [snip] ...

The first column is your <ctrl>:<enclosureid>:<drivenumber> ID. The
"ledon" and "ledoff" commands control LED blinking.
See https://github.com/swacquie/DiskMap for more info.
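
If you save the "disks" listing to a file, turning it into a
device-to-slot table takes only a few lines. This is a sketch that
assumes the column layout shown in the sample above (ID, device name,
then the rest); parse_disks is a made-up helper, not something diskmap
ships. Lines that do not start with an ID, such as a wrapped
"content:" line, are ignored.

```python
import re

# Sample lines in the layout of the "disks" output above; the second
# entry has its "content:" field wrapped onto its own line.
SAMPLE = """\
1:02:00    c8t50000393E8CAF2A4d0        MK2001TRKB    2.0T  Ready (RDY) content: raidz1-0
1:02:01    c8t50000393E8CAF53Cd0        MK2001TRKB    2.0T  Ready (RDY)
content: raidz1-1
"""

# <ctrl>:<enclosureid>:<drivenumber>, whitespace, then the device name.
DISK_RE = re.compile(r"^(\d+):(\d+):(\d+)\s+(\S+)")


def parse_disks(text):
    """Map device name -> (controller, enclosure, slot) as integers."""
    slots = {}
    for line in text.splitlines():
        match = DISK_RE.match(line.strip())
        if match:
            ctrl, encl, slot, device = match.groups()
            slots[device] = (int(ctrl), int(encl), int(slot))
    return slots


slots = parse_disks(SAMPLE)
for device, (ctrl, encl, slot) in sorted(slots.items()):
    print("%s  controller %d, enclosure %d, slot %d" % (device, ctrl, encl, slot))
```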

Hope this helps.

Cheers,
--
Saso

_______________________________________________
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss
