On 01/22/2013 06:26 PM, Len Zaifman wrote:
> We have just had a major system meltdown and it took several days to fix.
>
> What we would have liked is 2 things we had on thumpers (Old SUN ZFS systems)
>
> 1) A tool to show the mapping of a solaris device name to a physical location
> 2) A tool to turn on the light on a disk via its solaris device name.
>
> The process below is too painful, and we have other devices whose disks may
> go bad. Does either 1 or 2 above exist in openindiana? I could not find it,
> if it does.
>
> Thanks.
>
> The issue was:
>
> OI (OpenIndiana Development oi_151a X86) reported:
>
> Jan 22 10:57:43 archivea scsi: [ID 107833 kern.warning] WARNING: /pci@7a,0/pci8086,3408@1/pci1000,3040@0 (mpt_sas10):
> Jan 22 10:57:43 archivea        Disconnected command timeout for Target 18
> Jan 22 10:57:43 archivea scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3408@1/pci1000,3040@0 (mpt_sas10):
> Jan 22 10:57:43 archivea        Log info 0x31140000 received for target 18.
> Jan 22 10:57:43 archivea        scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
>
> zfs performance went through the floor and was intolerable(< 1 mb/sec where
> we had hundreds of MB/sec for resilver/scrubs and 100 MB/sec through the
> filesystem).
>
> The defective disk was one of 45 disks in a Supermicro Jbod system
> (SC847E26-RJBOD1)
>
> We finally found which disk it was by comparing serial numbers reported by
> iostat, disks that reported errors and the actual disk serial number (we
> pulled all 45 disks out to do this mapping). we do not want to repeat this
> process for our other devices.

The things you describe are hardware-specific. If your enclosures are
SES-2 compatible, then the fault manager should automatically blink the
appropriate LED. You can easily map the affected FRU from a fault report
in fmadm. For example, I have one drive right now with a predictive
failure:

# fmadm faulty
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Jan 14 19:11:12 29661ec9-5747-4466-f241-c96ac9f7954f  DISK-8000-0X   Major

Host        : vod1
Platform    : SUN-FIRE-X2250    Chassis_id  : 0948QBN009
Product_sn  :

Fault class : fault.io.disk.predictive-failure
Affects     : dev:///:devid=id1,sd@n5000c50015ae9c51//pci@0,0/pci8086,4021@1/pci1000,3150@0/sd@9,0
                  faulted but still in service
FRU         : "SCSI Device 9" (hc://:product-id=SUN-Storage-J4200:server-id=:chassis-id=0946QGJ007:serial=9QJ4VY4X:part=SEAGATE-ST31000NSSUN1.0T-093354VY4X:revision=SU0D/ses-enclosure=0/bay=9/disk=0)
                  faulty

Description : SMART health-monitoring firmware reported that a disk
              failure is imminent.
              Refer to http://sun.com/msg/DISK-8000-0X for more information.

Now we can take the FRU ID and find out which logical drive it
corresponds to:

# /usr/lib/fm/fmd/fmtopo -V 'hc://:product-id=SUN-Storage-J4200:server-id=:chassis-id=0946QGJ007:serial=9QJ4VY4X:part=SEAGATE-ST31000NSSUN1.0T-093354VY4X:revision=SU0D/ses-enclosure=0/bay=9/disk=0'
... [snip] ...
    logical-disk      string    c7t9d0    <<< here's the logical ID
    manufacturer      string    SEAGATE
    model             string    ST31000NSSUN1.0T 093354VY4X
    serial-number     string    9QJ4VY4X
    firmware-revision string    SU0D
    capacity-in-bytes string    1000204886016
    target-port-l0ids string[]  [ "w5001636000207501" ]
... [snip] ...

If you don't know your FRU, just run /usr/lib/fm/fmd/fmtopo without any
arguments. It will print out the FRUs for all the machine components it
knows about.
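
If you want to script the two lookups above, something along the lines of
the following Python sketch might do. It only parses text that you have
already captured from "fmadm faulty" and "fmtopo -V"; it does not run
either command. The function names (extract_frus, parse_properties) are
made up for this example, and the regular expressions assume the output
looks like the excerpts in this message, which may differ between
releases.

```python
import re

# Assumption: the FRU's hc:// identifier appears in parentheses,
# as on the "FRU :" line of the fmadm output quoted above.
FRU_RE = re.compile(r'\((hc://[^)\s]+)\)')

# Assumption: fmtopo -V prints properties as "name  type  value",
# where type is "string" or "string[]", as in the excerpt above.
PROP_RE = re.compile(r'^\s*([A-Za-z0-9_-]+)\s+(string(?:\[\])?)\s+(.*?)\s*$')


def extract_frus(fmadm_text):
    """Return every hc:// FRU identifier found in `fmadm faulty` output."""
    return FRU_RE.findall(fmadm_text)


def parse_properties(fmtopo_text):
    """Return {property name: value} for string properties in fmtopo -V output."""
    props = {}
    for line in fmtopo_text.splitlines():
        match = PROP_RE.match(line)
        if match:
            props[match.group(1)] = match.group(3)
    return props


if __name__ == "__main__":
    # Sample text copied from the fmtopo excerpt in this message.
    sample = """
    logical-disk      string    c7t9d0
    serial-number     string    9QJ4VY4X
    """
    props = parse_properties(sample)
    print(props["serial-number"], "->", props["logical-disk"])
```

Matching on the serial-number property is handy because it is the same
value printed on the drive label, so you can confirm you are pulling the
right disk.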

If you are running a recent LSI HBA, you can also install the sas2ircu
and diskmap.py utilities, which will map out your physical infrastructure
and tell you what lies where:

# diskmap.py
Diskmap - npvr1> help

Documented commands (type help <topic>):
========================================
EOF    controllers  disks       enclosures  ledon   quit     sd_timeout
alias  discover     drawletter  ledoff      mangle  refresh

Diskmap - npvr1> disks
1:02:00  c8t50000393E8CAF2A4d0  MK2001TRKB  2.0T  Ready (RDY)  content: raidz1-0
1:02:01  c8t50000393E8CAF53Cd0  MK2001TRKB  2.0T  Ready (RDY)  content: raidz1-1
... [snip] ...

The first column is your <ctrl>:<enclosureid>:<drivenumber> ID. The
"ledon" and "ledoff" commands turn LED blinking on and off. See
https://github.com/swacquie/DiskMap for more info.

Hope this helps.

Cheers,
--
Saso

_______________________________________________
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss