Re: [Linux-PowerEdge] R6515 with unbalanced memory config?

2020-07-13 Thread Stephen Dowdy




On 7/13/20 3:18 PM, Tim Mooney wrote:


But the full whitepaper is no longer present on Dell's download site.
Even finding the 1 page

 
http://downloads.dell.com/manuals/common/Direct_from_Development_-_Balanced_Memory_on_2nd_Generation_AMD_EPYC_Processors_Reference_Guide.pdf

Still links to the full whitepaper under a download URL that no longer
works.

Anyone know either where the full whitepaper can be found or what kind of
performance penalty we're looking at for this type of NUMA memory config?


FWIW, i strongly discourage my users from configuring a machine that's not fully 
"balanced".
Unfortunately, that's hard here on the Epyc Gen II, where you have to use 8 
DIMMs per socket and 8GB is the smallest DIMM
(so a 2P server has at least 128GB of RAM, even if you don't need it).
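
The arithmetic behind that 128GB floor, as a rough sketch. The function name is made up for illustration, and "8 DIMMs per socket" is inferred from the 2P figure above rather than taken from the whitepaper:

```shell
# Minimum RAM (in GB) for a fully populated, balanced configuration.
# Arguments: sockets, DIMMs per socket, smallest available DIMM size in GB.
min_balanced_gb() {
    echo $(( $1 * $2 * $3 ))
}
```

For example, `min_balanced_gb 2 8 8` prints 128, and a 1P box would bottom out at 64.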

Amazingly the site is not hiding dirindexes, so, just head on over to:
https://downloads.dell.com/manuals/common/

Quality Control is Job None in the world today :-(
(for renaming files and not validating links)

How's this?

https://downloads.dell.com/manuals/common/dellemc-balanced-memory-2ndgen-amd-epyc-poweredge.pdf

The Epyc "Chiplet" arrangement makes things even more weird for configuring 
systems.
I haven't read the WP yet, but thanks for pointing it out.

Search for "epyc" to find other docs.

dellemc-balanced-memory-2ndgen-amd-epyc-poweredge.pdf
dell-emc-dfd-advantage-four-channel-memory-poweredge-amd-epyc.pdf
dellemc-dfd-balanced-memory-2ndgen-amd-epyc-summary.pdf
dell-emc-dfd-numa-amd-epyc-2ndgen.pdf
dellemc_readysol_hpc_digimanufacturing_amdepyc_altairperf.pdf
dellemc_readysol_hpc_digimanufacturing_epyc_ansys.pdf
dellemc_readysol_hpc_digimanufacturing_epyc_simcenter_starccm.pdf
Direct_from_Development_-_Balanced_Memory_on_2nd_Generation_AMD_EPYC_Processors_Reference_Guide.pdf
poweredge_perf_amdepyc7002series.pdf
security_poweredge_amd_epyc_gen2.pdf

___
Linux-PowerEdge mailing list
Linux-PowerEdge@dell.com
https://lists.us.dell.com/mailman/listinfo/linux-poweredge


Re: [Linux-PowerEdge] BIOS update fails on Dell PowerEdge R720xd

2019-11-07 Thread Stephen Dowdy

On 11/7/19 6:35 PM, Mauricio Tavares wrote:

terminate called after throwing an instance of 'unsigned char'


We had this recently here.  Using an interim BIOS (a few revs back) showed a 
much more useful error message:

 System Services is disabled

Can you re-boot the system to see if that's the case?

I dunno if there's a way to programmatically re-enable System-Services when the 
system is running (perhaps through the iDRAC)

--stephen



Re: [Linux-PowerEdge] Disk temperature

2019-10-10 Thread Stephen Dowdy




On 10/10/19 11:32 AM, vinc...@cojot.name wrote:



You could use megaclisas-status, here a T440 with a PERC H730P.
It reports temps for drives and for the LSI cards.

# megaclisas-status
-- Controller information --
-- ID | H/W Model  | RAM    | Temp | BBU    | Firmware
c0    | PERC H730P Adapter | 2048MB | 79C  | Good   | FW: 25.5.6.0009

-- Array information --
-- ID | Type   |    Size |  Strpsz |   Flags | DskCache |   Status |  OS Path | CacheCade |InProgress
c0u0  | RAID-0 |   1818G |  512 KB | ADRA,WB |  Enabled |  Optimal | /dev/sda | None  |None
c0u1  | RAID-0 |   7276G |  512 KB | ADRA,WB |  Enabled |  Optimal | /dev/sdb | None  |None
c0u2  | RAID-0 |   1818G |  512 KB | ADRA,WB |  Enabled |  Optimal | /dev/sdc | None  |None

-- Disk information --
-- ID   | Type | Drive Model  | Size | Status  | Speed    | Temp | Slot ID  | LSI ID
c0u0p0  | SSD  | S3YUNB0KC09340D Samsung SSD 860 EVO 2TB RVT03B6Q | 1.818 TB | Online, Spun Up | 6.0Gb/s  | 25C  | [32:0]   | 0
c0u1p0  | HDD  | R6GP40KY WDC WD80EFZX-68UW8N0 83.H0A83   | 7.276 TB | Online, Spun Up | 6.0Gb/s  | 33C  | [32:4]   | 4
c0u2p0  | SSD  | S3YUNB0KC09173D Samsung SSD 860 EVO 2TB RVT03B6Q | 1.818 TB | Online, Spun Up | 6.0Gb/s  | 24C  | [32:1]   | 1

Ref:
https://raw.githubusercontent.com/ElCoyote27/hwraid/master/wrapper-scripts/megaclisas-status


Vincent, thanks for this.  This is similar to a script i wrote 15 years ago 
(output below),
but megaclisas-status is MUCH faster (mine is bash and full of ugliness, but 
that's the nature of parsing MegaCLI output :-/ ).
I've added 'megaclisas-status' to my LSI-Tools kit.

FWIW, you have info i don't have, but it'd be nice to get PredFault (PF) and 
MediaErrors (ME), which are much more useful status predictors, IMHO, than Temp 
values, plus Foreign Cfg info.
Of course, figuring out where to stuff that info and not have lines too long is 
fun.

--stephen

# /usr/local/LSI-Tools/megacli-overview
[Adapter 0]
ADP[0]=( PERC 5/E Adapter),FWPkg( 5.2.2-0076),FWVer( 1.03.50-0461)
BBU[0]=(Learning?=No, charge=100 %, status=Complete, isSOHGood=Yes)
PD[17:0]=(WDC/WD1002FBYS-18W8B1/0C12/X) ME=0,OE=6,PF=0,FW=Online, Spun Up,F=None
PD[17:1]=(SATA/ST3750640AS/E/X) ME=0,OE=6,PF=0,FW=Online, Spun Up,F=None
...
PD[17:14]=(Hitachi/HUA721075KLA330/A74A/X) ME=0,OE=6,PF=0,FW=Online, Spun Up,F=None
VD[0:0]=("d4",R(5,0,3),SZ= 4.089 TB,SS=128 KB,CP=(W=WB,R=ReadADP,IO=D),# 7,Optimal)
VD[0:1]=("d3",R(5,0,3),SZ= 4.089 TB,SS=128 KB,CP=(W=WB,R=ReadADP,IO=D),# 7,Optimal)
[Adapter 1]
ADP[1]=( PERC 6/i Integrated),FWPkg( 6.3.3.0002),FWVer( 1.22.52-1909)
BBU[1]=(Learning?=No, charge=88 %, status=Complete, isSOHGood=Yes)
PD[32:0]=(SEAGATE/ST9300603SS/FS66/X) ME=0,OE=0,PF=0,FW=Online, Spun Up,F=None
PD[32:1]=(SEAGATE/ST9300603SS/FS66/X) ME=0,OE=0,PF=0,FW=Online, Spun Up,F=None
PD[32:2]=(SEAGATE/ST9300603SS/FS66/X) ME=0,OE=0,PF=0,FW=Online, Spun Up,F=None
PD[32:3]=(SEAGATE/ST9300603SS/FS66/X) ME=0,OE=0,PF=0,FW=Online, Spun Up,F=None
VD[1:0]=("opsys",R(1,3,0),SZ= 557.75 GB,SS=64 KB,CP=(W=WB,R=ReadADP,IO=D),#2,Optimal)



Re: [Linux-PowerEdge] Disk temperature

2019-10-10 Thread Stephen Dowdy
#!/bin/sh
# Title:    scsi-inq-all
# Purpose:  use smartctl to list out INQUIRY data and important stats for all drives it finds.
# Author:   Stephen Dowdy (sdo...@ucar.edu)

# Ref: https://www.backblaze.com/blog/hard-drive-smart-stats/
# Backblaze uses:
#SMART 5 – Reallocated_Sector_Count.
#SMART 187 – Reported_Uncorrectable_Errors.
#SMART 188 – Command_Timeout.
#SMART 197 – Current_Pending_Sector_Count.
#SMART 198 – Offline_Uncorrectable.

# Note that this 'awk' script is designed to only process ONE disk at a time (vars are not rezeroed on end-of-records)
# Warn: this script is extremely dependent upon parsing exact output strings from 'smartctl', which are subject to change.

# SATA and SAS are reported VERY differently by 'smartctl', so we have to do several attempts to string match/parse

#awk=gawk
#awk=mawk
awk=awk

smartctl --scan-open | sed -ne '/^\//{s/[ \t]*#.*$//;p}' | \
while read disk; do
echo "[ ${disk} ]"
#smartctl -a ${disk} | ${awk} -v 'FS=:[[:space:]][[:space:]]*' '
smartctl -x ${disk} | ${awk} -v 'FS=[ \t]*:[ \t]*' '
/^(Vendor|Model Family):/{vendor=$2}
/^(Product|Device Model):/{model=$2}
/^(Revision|Firmware Version):/{firmware=$2}
/^(Serial number|Serial Number):/{serial=$2}
/^Elements in grown defect list:/{gdf=$2}
/Reallocated_Sector_Ct/{tmp=split($0,foo,"[ \t][ \t]*"); gdf=foo[tmp]}
/Offline_Uncorrectable/{tmp=split($0,foo,"[ \t][ \t]*"); uco=foo[tmp]}
/Reported_Uncorrect/{tmp=split($0,foo,"[ \t][ \t]*"); ucr=foo[tmp]}
/Current_Pending_Sector/{tmp=split($0,foo,"[ \t][ \t]*"); pen=foo[tmp]}
/^Manufactured in week/{nf=split($0,foo,"[ \t][ \t]*");mdt=sprintf("%d(w)/%d(y)",foo[4],foo[7]);}
/Power_On_Hours/{tmp=split($0,foo,"[ \t][ \t]*"); poh=foo[tmp]; }
/number of hours powered up/{tmp=split($0,foo,"[ \t][ \t]*=[ \t][ \t]*");poh=foo[tmp]}
/Device Error Count/{tmp=split($0,foo,"[ \t][ \t]*"); elc=foo[4]}
/Temperature_Celsius/{tmp=split($0,foo,"[ \t][ \t]*"); cdt=foo[8]; }
/Current Drive Temperature/{tmp=split($0,foo,"[ \t][ \t]*"); cdt=foo[4]}
/Current Temperature:/{tmp=split($0,foo,"[ \t][ \t]*"); cdt=foo[3]}
END{printf("%-60s : %8s %15s (gdf=%d,ucr=%d,uco=%d,pen=%d,cdt=%d,elc=%d) [mdt=%s, poh=%d]\n",vendor " " model,firmware,serial,gdf,ucr,uco,pen,cdt,elc,mdt,poh)}
'
done

exit 0



Re: [Linux-PowerEdge] Disk temperature

2019-10-10 Thread Stephen Dowdy





On 10/10/19 9:58 AM, Onno Zweers wrote:

Hi all,

Does anyone know how to get the temperature of the disks connected to a Perc 
H730?


Onno,

You should be able to modify this script to also report Temperatures.
(smartmontools' smartctl past some 6.5(?) version supports a MegaRAID passthru)

e.g.  SAS drive might report something like:

Current Drive Temperature: 54 C
(SATA will likely report in an Attribute table element)
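
A minimal sketch of pulling just the temperature out of smartctl output, in case that's all you're after. The function name is made up, and the two line shapes in the comments are assumptions (the SATA one is the attribute table layout printed by 'smartctl -x'); they differ between smartctl versions and drive types, so check against your own output first:

```shell
# Print the drive temperature (degrees C) found in smartctl output on stdin.
# Assumed line shapes:
#   SAS :  Current Drive Temperature:     54 C
#   SATA:  194 Temperature_Celsius  -O---K  067  053  000  -  33 (Min/Max 20/47)
drive_temp() {
    awk '
        /^Current Drive Temperature:/ { temp = $4 }   # 4th word is the value
        /Temperature_Celsius/         { temp = $8 }   # 8th column is RAW_VALUE
        END { if (temp != "") print temp }
    '
}
```

Something like `smartctl -x -d megaraid,0 /dev/sda | drive_temp` would be the intended use behind a PERC, where the device number after `megaraid,` depends on your controller's numbering.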



#!/bin/sh
# Title:scsi-inq-all
# Purpose:  use smartctl to list out INQUIRY data and important stats for all drives it finds.
# Author:   Stephen Dowdy (sdo...@ucar.edu)

# Ref: https://www.backblaze.com/blog/hard-drive-smart-stats/
# Backblaze uses:
#SMART 5 – Reallocated_Sector_Count.
#SMART 187 – Reported_Uncorrectable_Errors.
#SMART 188 – Command_Timeout.
#SMART 197 – Current_Pending_Sector_Count.
#SMART 198 – Offline_Uncorrectable.

# Note that this 'awk' script is designed to only process ONE disk at a time (vars are not rezeroed on end-of-records)
# Warn: this script is extremely dependent upon parsing exact output strings from 'smartctl', which are subject to change.

# SATA and SAS are reported VERY differently by 'smartctl', so we have to do several attempts to string match/parse

#awk=gawk
#awk=mawk
awk=awk

smartctl --scan-open | sed -ne '/^\//{s/[ \t]*#.*$//;p}' | \
while read disk; do
echo "[ ${disk} ]"
#smartctl -a ${disk} | ${awk} -v 'FS=:[[:space:]][[:space:]]*' '
smartctl -x ${disk} | ${awk} -v 'FS=[ \t]*:[ \t]*' '
/^(Vendor|Model Family):/{vendor=$2}
/^(Product|Device Model):/{model=$2}
/^(Revision|Firmware Version):/{firmware=$2}
/^(Serial number|Serial Number):/{serial=$2}
/^Elements in grown defect list:/{gdf=$2}
/Reallocated_Sector_Ct/{tmp=split($0,foo,"[ \t][ \t]*"); gdf=foo[tmp]}
/Offline_Uncorrectable/{tmp=split($0,foo,"[ \t][ \t]*"); uco=foo[tmp]}
/Reported_Uncorrect/{tmp=split($0,foo,"[ \t][ \t]*"); ucr=foo[tmp]}
/Current_Pending_Sector/{tmp=split($0,foo,"[ \t][ \t]*"); pen=foo[tmp]}
/^Manufactured in week/{nf=split($0,foo,"[ \t][ \t]*");mdt=sprintf("%d(w)/%d(y)",foo[4],foo[7]);}
/Power_On_Hours/{tmp=split($0,foo,"[ \t][ \t]*"); poh=foo[tmp]; }
/number of hours powered up/{tmp=split($0,foo,"[ \t][ \t]*=[ \t][ \t]*");poh=foo[tmp]}
/Device Error Count/{tmp=split($0,foo,"[ \t][ \t]*"); elc=foo[4]}
END{printf("%-60s : %8s %15s (gdf=%d,ucr=%d,uco=%d,pen=%d,elc=%d) [mdt=%s, poh=%d]\n",vendor " " model,firmware,serial,gdf,ucr,uco,pen,elc,mdt,poh)}
'
done

exit 0



Re: [Linux-PowerEdge] R710 not iDRAC upgradeable?

2019-10-08 Thread Stephen Dowdy





On 10/8/19 11:58 AM, Mauricio Tavares wrote:


More fun:

# ./IDRAC_FRMW_LX_R218238.BIN
/tmp/IDRAC_FRMW_LX_R218238.BIN-26250-16229/spsetup.sh: line 124:
source: buildVer.sh: file not found
#

I take that buildVer.sh should be in the package but not being extracted?

# fgrep buildVer.sh IDRAC_FRMW_LX_R218238.BIN
#


You can extract a .BIN via:


./{FOO}.BIN --extract FOO to see what's in the package, but be aware, 
DUP Kits like to have embedded RPMs that extract also, and i think i've 
seen THREE or more levels of archiving :-/



But, you may also be running into some systems not allowing executable 
use in /tmp/   (fstab/mount option 'noexec').


I'd try running it from somewhere else, also, or see if:

export TMPDIR=/home/tmp

./IDRAC_FRMW_LX_R218238.BIN


works ?
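
To check the 'noexec' theory before fiddling with TMPDIR, something along these lines might do. It's only a sketch: the function name is invented, and it assumes mount-table text in the /proc/mounts layout (device, mount point, fstype, options, ...) on stdin:

```shell
# Succeed (exit 0) if the given mount point is mounted with 'noexec'.
# Reads /proc/mounts-style text on stdin; the last matching entry wins.
mount_is_noexec() {
    awk -v mp="$1" '
        $2 == mp { opts = $4 }
        END {
            # wrap in commas so "noexec" only matches as a whole option
            if (("," opts ",") ~ /,noexec,/) exit 0
            exit 1
        }
    '
}
```

e.g. `mount_is_noexec /tmp < /proc/mounts && echo "/tmp is noexec"`. If /tmp isn't a separate mount at all, the function just reports failure, which is the answer you'd want anyway.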


--stephen



Re: [Linux-PowerEdge] R710 not iDRAC upgradeable?

2019-10-08 Thread Stephen Dowdy



Tangential to the conversation: if anyone's interested in shell code 
that gets the BMC/IDRAC full firmware revision and type using IPMITool 
and the ipmi kernel modules, it follows.


This has a number of go-back revisions over time and hasn't been 
refactored, but it does the job.


I needed to update it when Dell started releasing iDRAC firmwares that 
were in the "release/fix" series ({major}.{minor}.{release}.{fix}), 
where {major}.{minor} stayed the same and most tools only report 
{major}.{minor}, like:


   # ipmitool bmc info | grep 'Firmware Revision'
    Firmware Revision : 2.61


Note that if you perform a firmware upgrade, you'll have to 'rmmod' all 
the IPMI modules, then reload them to get updated results.


    # something like...

    while lsmod | grep -q '^ipmi_'; do modprobe --remove --all $(lsmod | awk '/^ipmi_/{print$1}'); done; modprobe --all  ipmi_si ipmi_devintf ipmi_msghandler



# get_sys_bmc_version
2.61.60.60 {iDRAC8}


get_sys_bmc_version() {

# XXX /sys/class/ipmi won't update when the firmware is updated.
# XXX find a way to reload it if necessary

  # kernel class data doesn't expose name/type (e.g. iDRAC)
  if [ -f /sys/class/ipmi/ipmi0/device/bmc/firmware_revision ]; then
    fwver=$(cat /sys/class/ipmi/ipmi0/device/bmc/firmware_revision)
    if [ -f /sys/class/ipmi/ipmi0/device/bmc/aux_firmware_revision ]; then
  fwver_rev=$(printf "%d" "$(cut -d' ' -f2 /sys/class/ipmi/ipmi0/device/bmc/aux_firmware_revision)")
  fwver_fix=$(printf "%d" "$(cut -d' ' -f1 /sys/class/ipmi/ipmi0/device/bmc/aux_firmware_revision)")

    fi
    if type ipmitool >/dev/null 2>&1; then
  bmctype="$(ipmitool sdr elist mcloc|sed -e 's/[[:space:]].*$//')"
  tmp="$(ipmitool mc info 2>/dev/null | grep 'Firmware Revision' \
    | sed -e 's/^[^:]*:[[:space:]]*\(.*[^ ]\)\([[:space:]]*$\)/\1/')"
  if [ "${fwver}" != "${tmp}" ]; then
    printf "WARNING: /sys/class/ipmi/ipmi0/device/bmc/firmware_revision (=${fwver}) differs from ipmitool mc info (=${tmp})\n" 1>&2
    printf " Perhaps firmware was updated and system not rebooted (or IPMI not reloaded)\n" 1>&2

  fi
    fi
    echo "${fwver}.${fwver_rev}.${fwver_fix}${bmctype:+ {${bmctype}}}"
  else
    if type ipmitool >/dev/null 2>&1; then
  bmctype="$(ipmitool sdr elist mcloc|sed -e 's/[[:space:]].*$//')"
  tmp="$(ipmitool mc info 2>/dev/null | grep 'Firmware Revision' \
    | sed -e 's/^[^:]*:[[:space:]]*\(.*[^ ]\)\([[:space:]]*$\)/\1/')"
  echo "${tmp}${bmctype:+ {${bmctype}}}"
    else
  echo ""
    fi
  fi
}
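
The version-assembly step on its own, as a sketch you can try without IPMI hardware. The helper name is made up, and the sample aux value in the comment is an assumption about what the sysfs file holds (space-separated hex bytes); the field order simply mirrors the function above:

```shell
# Build {major}.{minor}.{release}.{fix} from the two sysfs values.
#   $1: firmware_revision, e.g. "2.61"
#   $2: aux_firmware_revision, assumed to look like "0x3c 0x3c 0x00 0x00"
# release = 2nd aux byte, fix = 1st aux byte (same order as above).
full_bmc_version() {
    set -- "$1" $2                     # split the aux bytes into $2, $3, ...
    printf '%s.%d.%d\n' "$1" "$3" "$2" # printf %d converts the 0x.. values
}
```

`full_bmc_version 2.61 "0x3c 0x3c 0x00 0x00"` prints 2.61.60.60, matching the sample output earlier in this message.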




Re: [Linux-PowerEdge] Raid errors

2019-10-01 Thread Stephen Dowdy



On 10/1/19 10:41 AM, vi...@vheuser.com wrote:
> Sep 26 15:25:47 Debian9 kernel: megaraid_sas :01:00.0: 19474006
> (622841147s/0x0004/CRIT) - Enclosure PD 20(c None/p0) hardware error 


"Enclosure PD 20": the 20 is hexadecimal, i.e. decimal 32.  This is the typical
enclosure address for Dell server internal enclosures.

So, this appears to be reporting a hardware error on the enclosure,
rather than with any of the drives.
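
If you want to map the hex device IDs in those kernel messages onto the decimal IDs that the CLI tools print, the shell can do the conversion. A trivial sketch (the function name is just for illustration):

```shell
# Convert a hexadecimal device ID from a megaraid_sas log line to decimal.
pd_hex_to_dec() {
    printf '%d\n' "0x$1"
}
```

`pd_hex_to_dec 20` prints 32.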


Probably want to get megacli or perccli and do :

    megacli fwtermlog dsply a0

    perccli /c0 show termlog

to get more details

--stephen



Re: [Linux-PowerEdge] FYI: BIOS Update fails due to CONFIG_IO_STRICT_DEVMEM=y in linux kernel 4.16+

2018-07-12 Thread Stephen Dowdy

SUMMARY:
  - CONFIG_IO_STRICT_DEVMEM is culprit (not CONFIG_STRICT_DEVMEM)
  - iomem=relaxed allows Dell DUP Kits to correctly report and update some 
firmware that previously failed.
  - some Dell DUP Kits work fine in locked-down environment, others fail.  some 
fail with error reports, others silently claim to work and fail fully.
  - Dell should hopefully fix the DUP kit components to error check and 
work-around restrictions.


On 07/12/2018 02:40 PM, Stephen Dowdy wrote:

Update:  adding 'iomem=relaxed' to the kernel bootparams allows 'biosie' (from 
the BIOS*.BIN DUPkit) to update the BIOS.


Subject line changed to reflect the accurate kernel knob 
(CONFIG_IO_STRICT_DEVMEM)

Ref: https://outflux.net/blog/archives/2016/09/28/security-things-in-linux-v4-5/
Ref: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=90a545e981267e917b9d698ce07affd69787db87


Okay, this (iomem=relaxed) also fixed the issue i've been having with the BCM NetXtreme 
(5720) firmware update DUP kits, which failed to report the "installed/running" 
version and failed to actually update the firmware:

~# ral-superinv
[System Board]
Hostname : 
Manufacturer : Dell Inc.
   Model : PowerEdge T640
  Memory : 196608 MB
   Serial Number : 
BIOS Version : 1.3.7 (!=1.4.8)
   BMC/iDRAC Version : 3.21 {iDRAC9}

[Network]

  Broadcom Adv. Dual 10GBASE-T Ethernet (BCM957416)@18:00.0: 20.08.04.04
  Broadcom Adv. Dual 10GBASE-T Ethernet (BCM957416)@18:00.1: 20.08.04.04
-->   Broadcom NetXtreme Gigabit Ethernet (BCM95720)@b1:00.0   : 20.6.52
-->   Broadcom NetXtreme Gigabit Ethernet (BCM95720)@b1:00.1   : 20.6.52
[RAID/PERC] :
PERC H740P Adapter = 50.3.0-1022


We try to run the Broadcom NetXtreme updater (Network_Firmware_R4HKW_LN_20.8.4.BIN) to 
get from 20.6 to 20.8, but Dell's DUP Kit can't inventory the "Installed 
version".  It runs and CLAIMS it updated and wants to reboot, but on reboot, the 
firmware's still 20.6 :

.../local/DellUpdates/NETW# ../dup_run_debian 14GEN_Broadcom_NetXtreme -q
/bin/sh appears to be dash, setting it to use BASH, because Dell's sh scripts are non-POSIX and break systems
Collecting inventory...

Running validation...

NetXtreme BCM5720 Gigabit Ethernet PCIe (enp177s0f0)

The version of this Update Package is newer than the currently installed version.

Software application name: NetXtreme BCM5720 Gigabit Ethernet PCIe (enp177s0f0)
Package version: 20.8.4
--> Installed version:
...
Executing update...
WARNING: DO NOT STOP THIS PROCESS OR INSTALL OTHER PRODUCTS WHILE UPDATE IS IN PROGRESS.
THESE ACTIONS MAY CAUSE YOUR SYSTEM TO BECOME UNSTABLE!

The system should be restarted for the update to take effect.
ERROR: Network_Firmware_R4HKW_LN_20.8.4.BIN execution MAY have failed, or you answered NO to reboot, sorry (return_code=2)
(my wrapper script reports return error codes, so this was not a reported error 
from the DUP kit, but i believe DELL uses 'returncode=2' to indicate user chose 
to NOT reboot after update, but i'm not going to blindly trust that given that 
many DUPs behave differently)

Clearly, there's some failure to do error checks and report in this particular 
DUP kit.

BUT, The equivalent 20.8 Broadcom NetXtreme-E Updater 
(Network_Firmware_3VXHM_LN64_20.08.04.04.BIN) works *flawlessly* under the same 
conditions. (that's why my BCM957416's are updated to 20.08).

So, *something* can be done on Dell's side to make these DUP kits actually work 
under these conditions.

I verified that booting with 'iomem=relaxed' allows the 
'Network_Firmware_R4HKW_LN_20.8.4.BIN' to actually correctly query the 
installed version, *AND* update the firmware properly.  (but there's a security 
side-effect of running this way)
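
A wrapper script could at least warn up front when the running kernel wasn't booted that way. A small sketch, with an invented function name, that looks for the parameter in kernel command-line text on stdin:

```shell
# Succeed if the kernel command line on stdin contains iomem=relaxed
# as a whole, separate parameter.
has_iomem_relaxed() {
    tr ' ' '\n' | grep -qx 'iomem=relaxed'
}
```

e.g. `has_iomem_relaxed < /proc/cmdline || echo "warning: iomem=relaxed not set, some DUPs may silently fail"`.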

thanks,
--stephen
--
Stephen Dowdy  -  Systems Administrator  -  NCAR/RAL
303.497.2869   -  sdo...@ucar.edu-  http://www.ral.ucar.edu/~sdowdy/



Re: [Linux-PowerEdge] FYI: BIOS Update fails due to CONFIG_STRICT_DEVMEM=y in linux kernel 4.16+

2018-07-12 Thread Stephen Dowdy

Update:  adding 'iomem=relaxed' to the kernel bootparams allows 'biosie' (from 
the BIOS*.BIN DUPkit) to update the BIOS.

The docs on this suck, though:

.../src/linux-source-4.16/Documentation# sed -ne '/iomem=/,/^$/p' ./admin-guide/kernel-parameters.txt
        iomem=          Disable strict checking of access to MMIO memory
                strict  regions from userspace.
                relaxed

No indications of the risks involved (okay, i presume this is a little better 
than all /dev/mem, as it would still immediately protect against memory 
scraping sensitive user process data and such, but still lets a potential 
cracker control/manipulate devices)

But unless i run that way always, rebooting an alternate grub line just to BIOS 
update adds an extra reboot.
(the goal is to not have to sit at the console, take long periods in lifecycle 
controller, ... or do multiple reboot hacks)
(I'm getting the feeling there's going to be no easy workaround anymore past 
Kernel 4.16 because the SMBIOS table is iomapped outside 1MB)

--stephen

--
Stephen Dowdy  -  Systems Administrator  -  NCAR/RAL
303.497.2869   -  sdo...@ucar.edu-  http://www.ral.ucar.edu/~sdowdy/



[Linux-PowerEdge] FYI: BIOS Update fails due to CONFIG_STRICT_DEVMEM=y in linux kernel 4.16+

2018-07-12 Thread Stephen Dowdy

New PowerEdge T640 server needs BIOS update, however, it borks out with:

terminate called after throwing an instance of 'smbios::InternalErrorImpl'
  what():  Could not instantiate SMBIOS table.
/opt/dell/updatepackage/BIOS_4F4K0_LN_1.4.8.BIN-64420.wnGVr8/spsetup.sh: line 1331: 65177 Aborted $cmd "$@"
.
The update failed to complete
ERROR: BIOS_4F4K0_LN_1.4.8.BIN execution MAY have failed, or you answered NO to reboot, sorry (return_code=1)

(NOTE: the last message is from my 'dup_run_debian' wrapper script that works 
around various DUP kit issues related to non-POSIX /bin/sh invocations, 
RHEL-presumptions, xterm forced uses, blah, blah, blah of various DUP kits that 
i've had to struggle with in the past.  THANKFULLY, Dell's getting serious 
about fixing some of these things in the past couple years)

i found the error is in spsetup.sh calling _execCmd() on 'biosie -u', and 
strace sez:
3359  open("/dev/mem", O_RDONLY)= 3
3359  mmap(NULL, 65536, PROT_READ, MAP_PRIVATE, 3, 0xe) = 0x7fb6f4c63000
3359  munmap(0x7fb6f4c63000, 65536) = 0
3359  mmap(NULL, 65536, PROT_READ, MAP_PRIVATE, 3, 0xf) = 0x7fb6f4c63000
3359  munmap(0x7fb6f4c63000, 65536) = 0
3359  mmap(NULL, 65536, PROT_READ, MAP_PRIVATE, 3, 0x6ca0) = -1 EPERM (Operation not permitted)
3359  futex(0x7fb6f3bf01a0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
3359  write(2, "terminate called after throwing "..., 48) = 48
3359  write(2, "smbios::InternalErrorImpl", 25) = 25
3359  write(2, "'\n", 2)= 2
3359  write(2, "  what():  ", 11)   = 11
3359  write(2, "Could not instantiate SMBIOS tab"..., 35) = 35
3359  write(2, "\n", 1)= 1

ran across this:

https://www.phoronix.com/scan.php?page=news_item&px=Linux-4.16-Def-Strict-Dev-Mem
"...Enabling CONFIG_STRICT_DEVMEM implements strict access to /dev/mem so that 
it only allows user-space access to memory mapped peripherals"

~/BIOS# grep STRICT_DEVMEM /boot/config-$(uname -r)
CONFIG_STRICT_DEVMEM=y
CONFIG_IO_STRICT_DEVMEM=y

Sigh.

~/BIOS# cat /dev/mem | wc -c
cat: /dev/mem: Operation not permitted
1048576

So, we have a 1MB access limit, as expected with CONFIG_STRICT_DEVMEM

and sure enough, 'biosie -u' is trying to read offset 0x6ca0, which is 
definitely > 1MB  (the first two are below 1MB and succeed)
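
For checking offsets pulled out of an strace like this, a quick sketch of the 1MB comparison. The function name is invented, and the sample offsets mentioned below are illustrative values, not the ones from the trace above:

```shell
# Succeed if a /dev/mem offset lies at or beyond the first 1 MiB (0x100000),
# i.e. outside the region that 'cat /dev/mem' could still read above.
beyond_first_mib() {
    [ "$(( $1 ))" -ge "$(( 0x100000 ))" ]
}
```

`beyond_first_mib 0xfffff` fails (still inside the first 1MB), while `beyond_first_mib 0x100000` succeeds.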

AFAICT, This requires rebuilding a kernel from scratch with 
'CONFIG_STRICT_DEVMEM=n'

Can anyone from Dell identify if there's something that can be done either in 
'biosie' or by arm-twisting Linux kernel devs to allow other regions to be 
mapped? (if that's necessary).  (is there ANY way, i as a normal customer, can 
use a formal channel to report bugs/misfeatures like this?)

thanks,
--stephen

--
Stephen Dowdy  -  Systems Administrator  -  NCAR/RAL
303.497.2869   -  sdo...@ucar.edu-  http://www.ral.ucar.edu/~sdowdy/



Re: [Linux-PowerEdge] MegaCLI compatibility in StorCli (with added typos) ?

2018-02-27 Thread Stephen Dowdy
On 02/27/2018 03:47 PM, vinc...@cojot.name wrote:
> 
> Hi,
> Has anyone noticed that storcli (version 1.23.02 tested) appears to support 
> MegaCli's syntax? Does anyone know if this is here to stay?

I've written about this several times in the past, e.g.:

http://lists.us.dell.com/pipermail/linux-poweredge/2015-January/049533.html

https://www.mail-archive.com/search?l=linux-poweredge@dell.com&q=subject:%22%5C%5BLinux%5C-PowerEdge%5C%5D+PE+r710+%5C-+MegaCli+not+working%22&o=newest&f=1

storcli/perccli help legacy


--stephen
-- 
Stephen Dowdy  -  Systems Administrator  -  NCAR/RAL
303.497.2869   -  sdo...@ucar.edu-  http://www.ral.ucar.edu/~sdowdy/



Re: [Linux-PowerEdge] PE r710 - MegaCli not working

2018-01-22 Thread Stephen Dowdy
On 01/22/2018 10:11 AM, George Machitidze wrote:
> megacli/storcli won't work, use perccli instead

As i said, early storcli versions work fine. Also, MegaCLI has worked just 
fine, as well, but i haven't updated since 8.07.10, so maybe they added the same 
SubVendorID-based disablement to later versions (but LSI/Avago stopped updating 
MegaCLI a while back in favor of storcli).

The change to storcli was made sometime between 1.03.11 and 1.04.07.

finds my PERC in 1.03.11
# storcli64-1.03.11 help | grep Ver
 Storage Command Line Tool  Ver 1.03.11 Jan 30, 2013
# storcli64-1.03.11 show ctrlcount | grep Count
Controller Count = 1

doesn't in 1.04.07
# storcli64-1.04.07 help | grep Ver  
 Storage Command Line Tool  Ver 1.04.07 Apr 2, 2013

# storcli64-1.04.07 show ctrlcount | grep Count
Controller Count = 0

Example of much older MegaCLI using a relatively recent PERC H730 Mini :

# /usr/local/LSI-Tools/megacli -V | grep Ver
  MegaCLI SAS RAID Management Tool  Ver 8.07.10 May 28, 2013

# /usr/local/LSI-Tools/megacli-overview
[Adapter 0]
ADP[0]=( PERC H730 Mini),FWPkg( 25.5.0.0018),FWVer( 4.270.00-8112)
BBU[0]=(Learning?=No, charge=93 %, status=Complete, isSOHGood=Yes)
PD[32:0]=(SEAGATE/ST300MM0006/LS0B/S0KX) ME=0,OE=0,PF=0,FW=Online, Spun Up,F=None
PD[32:1]=(SEAGATE/ST300MM0006/LS0B/S0KX) ME=0,OE=0,PF=0,FW=Online, Spun Up,F=None
PD[32:2]=(SEAGATE/ST300MM0006/LS0B/S0KX) ME=0,OE=0,PF=0,FW=Online, Spun Up,F=None
PD[32:3]=(SEAGATE/ST300MM0006/LS0B/S0KX) ME=0,OE=0,PF=0,FW=Online, Spun Up,F=None
PD[32:4]=(SEAGATE/ST300MM0006/LS0B/S0KX) ME=0,OE=0,PF=0,FW=Online, Spun Up,F=None
PD[32:5]=(SEAGATE/ST300MM0006/LS0B/S0KX) ME=0,OE=0,PF=0,FW=Online, Spun Up,F=None
PD[32:6]=(SEAGATE/ST300MM0006/LS0B/S0KX) ME=0,OE=0,PF=0,FW=Online, Spun Up,F=None
PD[32:7]=(SEAGATE/ST300MM0006/LS0B/S0KX) ME=0,OE=0,PF=0,FW=Online, Spun Up,F=None
VD[0:0]=("OpSys",R(6,0,3),SZ= 249.999 GB,SS=128 KB,CP=(W=WriteBack,R=RANone,IO=D),# 8,Optimal)
VD[0:1]=("Exports",R(6,0,3),SZ= 1.389 TB,SS=128 KB,CP=(W=WriteBack,R=RANone,IO=D),# 8,Optimal)

If you can't get your head around 'storcli/perccli' syntax or have existing 
scripts that parse MegaCLI (so much fun), realize it has the entire MegaCLI 
parser secretly hidden inside ;)
(oh, i actually just noticed, it *IS* "documented" with:
storcli/perccli help legacy
)

(not super current version, but haven't needed newer)
# /usr/local/LSI-Tools/perccli version   
 Storage Command Line Tool  Ver 1.17.10 October 21, 2015

 (c)Copyright 2015, AVAGO Corporation, All Rights Reserved.
 
storcli/perccli command format   
# /usr/local/LSI-Tools/perccli /c0 show termlog | head -5
Firmware Term Log Information on controller 0:
ateChannelBankInfo: Available Channels: 1 
T4: C0:onfiUpdateChannelBankInfo: Available Banks   : 1 
T4: C0:**DEBUG:LUN Interleave: 0x11 
T4: C0:onfiDeviceInfo[0].mainAreaSize:8192

MegaCli format:
# /usr/local/LSI-Tools/perccli fwtermlog dsply a0 | head -5
Firmware Term Log Information on controller 0:
ateChannelBankInfo: Available Channels: 1 
T4: C0:onfiUpdateChannelBankInfo: Available Banks   : 1 
T4: C0:**DEBUG:LUN Interleave: 0x11 
T4: C0:onfiDeviceInfo[0].mainAreaSize:8192 

--stephen
-- 
Stephen Dowdy  -  Systems Administrator  -  NCAR/RAL
303.497.2869   -  sdo...@ucar.edu-  http://www.ral.ucar.edu/~sdowdy/



Re: [Linux-PowerEdge] PE r710 - MegaCli not working

2018-01-22 Thread Stephen Dowdy
On 01/22/2018 09:53 AM, Howard, Chris wrote:
> I don't use it often but I think it was working.
> Now it is not working.  It always says there are no controllers.

FWIW, sometimes the MegaRAID kernel drivers get wonked (esp when something is 
quite screwed up on the RAID itself), requiring a reboot for megacli to work, 
if you haven't done that.

There was also a split a few years back in LSI/Avago/Broadcom/whatever land 
where 'storcli' and 'perccli' split functionality depending upon whether it was 
a pure LSI device or Dell/other sub-brand.  ('storcli' used to work for Dell 
controllers, but then LSI/Ava... switched it to (i presume) look at the PCI 
SubVendor ID, and it will report "No Controller" if it is a Dell model.)

I continue to use MegaCli64-8.07.10  for all my PERC controllers and haven't 
had an issue with anything even current. (It Works For Me)

--stephen
-- 
Stephen Dowdy  -  Systems Administrator  -  NCAR/RAL
303.497.2869   -  sdo...@ucar.edu-  http://www.ral.ucar.edu/~sdowdy/



[Linux-PowerEdge] FYI using pdsh for mass execution (Re: dsu woes)

2018-01-10 Thread Stephen Dowdy
On 01/10/2018 03:27 PM, Patrick Boutilier wrote:
> What I would do is download the BIOS update file once and then create a 
> script to rsync it to the 1500 hosts and run the update. I presume you would 
> have a script to run dsu on the 1500 hosts anyway?

First:  i gave up on using Dell's enterprise management tools due to constant 
heartache/headache/frustration.  It definitely makes me sad that these tools 
continually change, but never actually get much better.  (okay, it looks like 
they are finally attacking the BASHisms in some of their scripts that borked 
Debian/Ubuntu systems badly, but the continual lack of correctness/current-cy, 
etc just pains me).

Second:

FYI, LLNL's 'pdsh' works great for this.  requires ssh public key trust for 
'root' in my following examples to do the following (running DUPs requires 
'root'):
** note that your private key should be offline until loaded in your ssh-agent 
(oof, Meltdown/Spectre, sigh)
WARNING: as any tool that allows mass-execution, if you screw up, you've now 
multiplied that screwup to a large number of systems, so always be careful.

If you have a file of hostnames already:

   pdsh -lroot -g {file} 'command-to-run-on-all-remote-systems'
(file is usually dshgroup module selected so ~/.dsh/group/{file} or 
/etc/dsh/group/{file})

If you wanna hit everything in /etc/hosts, instead:

   awk '$0!~/^(#.*| *)$/{print$2}' /etc/hosts | WCOLL=- pdsh -Rexec scp FOO-2.7.0.BIN  root@%h:/dev/shm/
explained: take all non-comment/non-blank lines, print the hostname (field 2), set 
pdsh's WCOLL envvar so the list of hostnames is read from stdin (-), and use the 
exec module to scp your DUP.BIN file, substituting %h with each hostname in turn.

   awk '$0!~/^(#.*| *)$/{print$2}' /etc/hosts | WCOLL=- pdsh -lroot '/dev/shm/FOO-2.7.0.BIN -q'
now, issue an 'ssh' session to all hosts to run the DUP.BIN with '-q'.  ('-q' 
doesn't display the changelog or prompt to run, and also won't reboot automatically 
after completion)
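
Given the warning about multiplied screwups, it's worth eyeballing the host list before feeding it to pdsh. A sketch that wraps the same awk filter in a function (the name is made up) so you can inspect or further filter its output first:

```shell
# Print the first hostname (field 2) of every non-comment, non-blank line
# of hosts-format text on stdin, using the same awk filter as above.
hosts_list() {
    awk '$0 !~ /^(#.*| *)$/ { print $2 }'
}
```

e.g. `hosts_list < /etc/hosts | grep -v '^localhost$' | WCOLL=- pdsh -lroot uptime`. Note that a stock /etc/hosts also lists localhost and often IPv6 aliases, which you probably don't want in a mass run, hence the extra grep.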

Note that 'pdsh' fans out commands, running "N" jobs simultaneously (default 
32).  I limit mine to 8 so i can use some special gateway-hop syntax (e.g. 
cluster1-admin!!node3) with custom ~/.ssh/config rules to bounce past the 
admin nodes on clusters into the backend compute nodes.  Keeping the fanout 
at 8 avoids the default 'sshd' connection throttling limits (usually 10 
simultaneous connections).  The ssh-config rule:

  Host *!!*
    GatewayPorts no
    ProxyCommand $(h="%h";p="%p" ; echo ssh -W ${h##*\!\!}:%p -l root ${h%%\!\!*})
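
The two parameter expansions in that ProxyCommand do all the work of splitting 
a 'gateway!!target' name.  A small sketch of just that step, runnable on its 
own ('split_hop' is a hypothetical name used only for illustration):

```shell
# Split "gateway!!target" the same way the ProxyCommand rule does:
#   ${h%%\!\!*}  strips the longest suffix starting at "!!"  -> gateway
#   ${h##*\!\!}  strips the longest prefix ending in "!!"    -> target
split_hop() {
    h=$1
    gateway=${h%%\!\!*}
    target=${h##*\!\!}
    echo "gateway=$gateway target=$target"
}

split_hop 'cluster1-admin!!node3'   # prints: gateway=cluster1-admin target=node3
```

ssh then connects to the gateway as root and uses '-W' to forward stdio to 
the target host on the requested port.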

It's much easier to use libgenders or dshgroup style files for this kind of 
thing (than /etc/hosts and awk, etc), so you can use attribute selectors 
(genders) like:

pdsh -lroot -g 'model=poweredge_r730' 'do-something'
(it's up to you to create a genders file with the right attributes filled in)

records in my genders file, as created from a scripted MySQL asset database 
extraction look like:

host99 name=host99,manu=dell,model=precision_t3400,hwtype=desktop,sn=XXX,os=debian_linux,status=in_use,user=godot,responsible=godot,purpose=user_room_linux,sa1=sdowdy,project=unknown,location=fl2-2094

Unfortunately, 'genders' doesn't support REGEX :-(   but you can use regex 
selection on hostnames in pdsh (just not attributes), like:

pdsh -lroot -g 'hwtype=desktop' -w '/engr-.*/' ...

to only hit the systems that are desktops and filter-down to only names with 
"engr-" in them.
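
To preview which hosts such a selection would hit without touching pdsh at 
all, the same "attribute match, then hostname regex" logic can be approximated 
against a genders-style file with plain awk.  This is only a rough sketch: 
'genders_select' is a hypothetical helper, it understands just the simple 
"host attr=val,attr=val" record form shown above, and it is not a substitute 
for the real genders query tools.

```shell
# Usage: genders_select FILE ATTR=VALUE [HOSTNAME_REGEX]
# Prints each hostname whose attribute list contains ATTR=VALUE exactly and
# whose name matches HOSTNAME_REGEX (default: any name).
genders_select() {
    file=$1
    want=$2
    name_re=${3:-.}
    awk -v want="$want" -v re="$name_re" '
        $1 ~ re {
            n = split($2, attrs, ",")
            for (i = 1; i <= n; i++)
                if (attrs[i] == want) { print $1; break }
        }' "$file"
}
```

For example, 'genders_select /etc/genders hwtype=desktop "engr-"' lists the 
desktops with "engr-" in the name, which is the set the pdsh command above 
would act on.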

--stephen

-- 
Stephen Dowdy  -  Systems Administrator  -  NCAR/RAL
303.497.2869   -  sdo...@ucar.edu-  http://www.ral.ucar.edu/~sdowdy/



Re: [Linux-PowerEdge] 2 predicted failure disks and RAID5

2017-11-14 Thread Stephen Dowdy
On 11/14/2017 11:52 AM, Grzegorz Bakalarski wrote:
> Thanks for valuable input.
> Regarding punctured block:  from fwtermlog I got several (not much) lines of 
> type:
> 
> 11/13/17  3:24:45: EVT#08603-11/13/17  3:24:45:  97=Puncturing bad block on 
> PD 02(e0x20/s2) at 9ecd
that's bad.  You have a punctured stripe.

> T35:     maintainPdFailHistory=0 disablePuncturing=0 
> zeroBasedEnclEnumeration=1 disableBootCLI=1
This is an informational line indicating that the controller doesn't have the 
disablePuncturing config option set.

> All the same PD, the same bad block (different time)
> 
> Is my raid useless?

No, it's good enough to recover what data you can before you rebuild it.  
However, you can't trust the data that uses the bad block.   You'll get a read 
error from any object that maps to it.

Here's a good doc Dell put out:

https://www.dell.com/support/article/us/en/4/438291#2
   "...If the data within a punctured stripe is accessed errors will continue 
to be reported against the affected bad LBAs with no possible correction 
available. Eventually (this could be minutes, days, weeks, months, etc.), the 
Bad Block Management (BBM) Table will fill up causing one or more drives to 
become flagged as predictive failure..."

> BTW: why do think raid level migration to raid-6 with 2 additional disk would 
> be better than with one disk. I would keep VD size the same.

I'm not talking about a migration, i'm talking about a complete WIPE of what 
you have, and a recreation from scratch.  At this point, you can recover what 
you can to a staging location, rebuild, then restore.
Keep track of data with I/O errors, because it's going to have a corrupted 
block at the punctured block address.  This could (if you're lucky) be in 
unallocated space.  It could also be in filesystem structures and lead to 
widescale corruption of the filesystem.

I would mount it all READONLY and do a file-level dump (not a 'dd' or anything 
like that, which would migrate corrupted filesystem structures).  (i typically 
'rsync' data to another machine.)  You don't want any backup tool that does 
infinite retries, as it'll likely result in another disk failure (per the 
Dell doc quoted above).
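
One way to combine "file-level copy", "no retries" and "keep track of what hit 
I/O errors" is a loop that tries each file exactly once and logs the failures. 
A minimal sketch, assuming a POSIX shell and filenames without embedded 
newlines; 'salvage_tree' is a hypothetical helper, not an existing tool, and 
rsync to another machine remains the more practical choice for large trees:

```shell
# Usage: salvage_tree SRC_DIR DST_DIR FAILED_LIST
# Copies every regular file under SRC_DIR to the same relative path under
# DST_DIR.  Each file gets a single attempt; paths that fail to copy are
# appended to FAILED_LIST for later review.
salvage_tree() {
    src=$1
    dst=$2
    log=$3
    ( cd "$src" && find . -type f -print ) | while IFS= read -r f; do
        mkdir -p "$dst/$(dirname "$f")"
        cp -p "$src/$f" "$dst/$f" 2>/dev/null || echo "$f" >> "$log"
    done
}
```

The list of failed paths is then the set of files to restore from an older 
backup or treat as suspect.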

> Anyway will migration too raid-6 fail with this  "awful Puncturing)???

RAID-6 is going to lessen the likelihood of a puncture, with 2 parity drives.  
While you're rebuilding a RAID5, any unrecoverable bad block event on any of 
the "good" drives during the rebuild will result in a puncture.  With RAID6, 
you still have parity to cope with an uncorrectable error.

The above is especially true of some of the less reliable seagate drives from 
past years.  You can't count on them not throwing UCEs during a rebuild (or 
before you get the replacement drive installed), thereby puncturing the RAID.  
:-(
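
For a feel of the numbers, here is a back-of-the-envelope estimate of hitting 
at least one unrecoverable read error while a degraded RAID5 reads its 
surviving members in full.  Every input is an illustrative assumption (the 
thread gives neither drive sizes nor error rates), and the model treats bit 
errors as independent, which real drives do not strictly obey:

```shell
# Illustrative assumptions only, not measurements from this thread.
drives_read=3          # surviving members read during a 4-disk RAID5 rebuild
bytes_per_drive=1e12   # assumed 1 TB per drive
ure_per_bit=1e-14      # assumed spec-sheet unrecoverable read error rate

# P(at least one URE) ~= 1 - exp(-bits_read * rate_per_bit)
rebuild_ure_probability() {
    awk -v n="$1" -v b="$2" -v p="$3" \
        'BEGIN { printf "%.3f\n", 1 - exp(-(n * b * 8 * p)) }'
}

rebuild_ure_probability "$drives_read" "$bytes_per_drive" "$ure_per_bit"   # prints 0.213
```

Under those assumptions roughly one rebuild in five would hit an error that 
single parity can no longer correct, which is the case the second parity 
drive of RAID6 is there to absorb.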

--stephen
-- 
Stephen Dowdy  -  Systems Administrator  -  NCAR/RAL
303.497.2869   -  sdo...@ucar.edu-  http://www.ral.ucar.edu/~sdowdy/



Re: [Linux-PowerEdge] 2 predicted failure disks and RAID5

2017-11-14 Thread Stephen Dowdy
On 11/14/2017 09:52 AM, Grzegorz Bakalarski wrote:
> I have a server (R815) with PERC H710. I have 4 disks and RAID-5 on them. The 
> server have 2 empty disk slots.
> 
> After night I noticed 2 disks got predicted failure state - LEDs on disk 
> flash yellow (not green) on them (2 of them). MegaCLI shows that 2 disks have 
> high "Predictive Failure Count:" - some few thousands. 
GB,

*IF* this all rebuilds fine and goes Optimal again, you really want to:

   megacli fwtermlog dsply aall | grep -i punct

If you see a punctured block, you're gonna have to back off what you can, 
rebuild the RAID from scratch and restore, because there's no good way to fix a 
punctured stripe.
(ignore the lines that say the controller supports puncturing). 
If you have to rebuild, i'd go RAID6 with your 6 drives.
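
Since the plain grep also matches the controller's option dump (the line 
containing "disablePuncturing="), a slightly stricter filter saves some 
squinting.  A sketch, with 'puncture_events' as a hypothetical helper name 
that reads the firmware log text on stdin:

```shell
# Keep lines that mention puncturing, but drop the informational line that
# only reports the disablePuncturing=<n> controller setting.
puncture_events() {
    grep -i 'punctur' | grep -vi 'disablePuncturing='
}
```

Usage would be 'megacli fwtermlog dsply aall | puncture_events'; any output at 
all means a real puncture event was logged.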

--stephen


-- 
Stephen Dowdy  -  Systems Administrator  -  NCAR/RAL
303.497.2869   -  sdo...@ucar.edu-  http://www.ral.ucar.edu/~sdowdy/



Re: [Linux-PowerEdge] Upgrading firmware under CentOS7

2016-10-24 Thread Stephen Dowdy
On Mon, Oct 24, 2016 at 9:04 AM, Stephen Dowdy <sdo...@ucar.edu> wrote:

> SUMMARY: you could use linux namespaces (see proof-of-concept below)


Since i failed to explicitly state WHY you'd use this over 'mount -o
remount,exec /tmp': the point would be to NOT enable a potential GLOBAL
/tmp trojan/drop attack (the main point behind NOEXEC use on /tmp), even
during a short window (where "short" can be as long as 30 minutes with
an iDRAC update).

--stephen



-- 
Stephen Dowdy  -  Systems Administrator  -  NCAR/RAL
303.497.2869   -  sdo...@ucar.edu-  http://www.ral.ucar.edu/~sdowdy/


Re: [Linux-PowerEdge] Upgrading firmware under CentOS7

2016-10-24 Thread Stephen Dowdy
ty tmp folder

chown "root:$TARGET_USER" "$NEWTMP"
chmod 770 "$NEWTMP"

unshare --mount -- /bin/bash -c "mount -o bind,noexec,nosuid,nodev '$NEWTMP' /tmp && sudo -u '$TARGET_USER' $TARGET_CMD"



--stephen



On Mon, Oct 24, 2016 at 7:21 AM, Robert Jacobson <teri...@gmail.com> wrote:

>
> This should be considered a bug in DUP.  It should be easily resolvable by
> allowing users to customizing the temporary working directory; e.g. using
> an environment variable.
>
>
> > -Original Message-
> > From: linux-poweredge-boun...@dell.com [mailto:linux-poweredge-
> > boun...@dell.com] On Behalf Of Thibaut Pouzet
> > Sent: Monday, October 24, 2016 3:47 AM
> > To: linux-poweredge@dell.com
> > Subject: Re: [Linux-PowerEdge] Upgrading firmware under CentOS7
> >
> >
> > Le 21/10/2016 à 17:57, Davide Ferrari a écrit :
> >
> >
> >   Hello,
> >
> >
> >   CentOS7 comes with /tmp with no exec permissions by default, but
> all
> > Dell furmware upgrade pacakges uses /tmp as the default (and
> > uncustomizable) path to unpack and execute the actual FW upgrade binary
> > blob. Is there any official way to do it properly without remounting /tmp
> > with "exec" (or replacing /tmp with /var/tmp in the bash wrapper code
> > inside the package)?
> >
> >
> >   Thanks
> >
> >
> >   --
> >
> >   Davide Ferrari
> >
> >   Senior Systems Engineer
> >
> >
> >
> >
> >   ___
> >   Linux-PowerEdge mailing list
> >   Linux-PowerEdge@dell.com <mailto:Linux-PowerEdge@dell.com>
> >   https://lists.us.dell.com/mailman/listinfo/linux-poweredge
> >
> > Hi,
> >
> > Not that I've heard of sorry. I do this and this works just fine : before
> > launching the update, I run :
> > sudo mount -o remount,exec /tmp
> >
> > Once I'm done :
> > sudo mount -o remount,noexec /tmp
> >
> > I'm not aware of any other magical solution
> >
> > Cheers
> >
> >
> > --
> > Thibaut Pouzet
> > Lyra Network
> > Expert Sécurité
> > (+33) 5 31 22 40 08
> > www.lyra-network.com <http://www.lyra-network.com>
>
>
> ___
> Linux-PowerEdge mailing list
> Linux-PowerEdge@dell.com
> https://lists.us.dell.com/mailman/listinfo/linux-poweredge
>



-- 
Stephen Dowdy  -  Systems Administrator  -  NCAR/RAL
303.497.2869   -  sdo...@ucar.edu-  http://www.ral.ucar.edu/~sdowdy/


Re: [Linux-PowerEdge] Perc H700 firmware upgrade Debian

2016-05-24 Thread Stephen Dowdy
Helmut,

perccli /c${adp} download file=${fwfile}
-or-
megacli adpfwflash -f ${fwfile} -a${adp}

using the .ROM file from the DUP payload directory will actually work,
whereas the RHEL-specific Dell stuff may/usually doesn't (on Debian).

NOTE: 'perccli' is avail from Dell, 'megacli' is avail (though deprecated)
from LSI/Avago (harder to find)
--stephen



On Tue, May 24, 2016 at 5:55 AM, Helmut Wollmersdorfer <
helmut.wollmersdor...@fixpunkt.de> wrote:

> Hi,
>
> tried to upgrade Perc H700 Integrated firmware on Debian (Wheezy).
>
> # /opt/dell/srvadmin/bin/omreport about
>
> Product name : Dell OpenManage Server Administrator
> Version  : 7.4.0-1
> Copyright: Copyright (C) Dell Inc. 1995-2013 All rights reserved.
> Company  : Dell Inc.
>
>
> # /opt/dell/srvadmin/bin/omreport storage controller
> Controller  PERC H700 Integrated (Slot 4)
>
> Controllers
> ID: 0
> Status: Non-Critical
> Name  : PERC H700 Integrated
> Slot ID   : PCI Slot 4
> State : Degraded
> Firmware Version  : 12.10.1-0001
> Latest Available Firmware Version : 12.10.5-0001
> Driver Version: 00.00.06.12-rc1
> Minimum Required Driver Version   : Not Applicable
> Storport Driver Version   : Not Applicable
> Minimum Required Storport Driver Version  : Not Applicable
> […]
>
> # cat /etc/debian_version
> 7.9
>
>
> # wget
> http://downloads.dell.com/FOLDER03292779M/4/SAS-RAID_Firmware_2948G_LN_12.10.7-0001_A13_02.BIN
>
> # chmod 777 SAS-RAID_Firmware_2948G_LN_12.10.7-0001_A13_02.BIN
>
> # bash ./SAS-RAID_Firmware_2948G_LN_12.10.7-0001_A13_02.BIN --extract
> RAIDFW
> # cd RAIDFW/
> # ./sasdupie -u -s payload/
>  lang="en">0
>
> There is no “The operation was successful” and
> “1” as answer of the last step. So it went
> wrong.
>
>
> Is it possible to upgrade from the shell? How can I upgrade else?
>
> TIA
>
> Helmut Wollmersdorfer
>
> ___
> Linux-PowerEdge mailing list
> Linux-PowerEdge@dell.com
> https://lists.us.dell.com/mailman/listinfo/linux-poweredge
>
>


-- 
Stephen Dowdy  -  Systems Administrator  -  NCAR/RAL
303.497.2869   -  sdo...@ucar.edu-  http://www.ral.ucar.edu/~sdowdy/


Re: [Linux-PowerEdge] FW:

2016-04-15 Thread Stephen Dowdy
The thing to be careful with, is that there is not just *ONE* PERC 6/i.
There's:

   PERC 6/i Adapter (regular PCI card that goes in PCI slots)
   PERC 6/i Integrated (it's really about the same card, but the PCI
bracket is removed and it's in a server's Integrated RAID card housing)
(IIRC, the latest version i could find for Integrated was 6.3.3, and 6.3.1
for the Adapter.  That may be because Dell didn't care to update the
Adapter form, or because the specific fwupdate for the Integrated only
addressed specific sets of disks and such that Dell sold in those
configurations)

There are different firmware updates for these based upon the device's
PCI Vendor/Device SubVendor/SubDevice identifiers.

btw, For H700, etc, there's 3 or 4 different versions (Adapter, Integrated,
Mini, MiniP, ...) Again, distinguished by the PCI identifiers.  The
firmware images refuse to be burned to the wrong card even though they're
all "H700" family, e.g.

I have a script that identifies which particular device(s) you have in your
machine, and uses a table mapping the LSI Firmware ROM names to each
Vendor_Device_SubV_SubD  string that i yoinked out of the .ROM images
themselves.

It uses LSI's 'megacli' to flash those ROM images to all discovered
devices.  MUCH, MUCH, MUCH less error prone than Dell's overengineered
DUP/OpenManage, and it runs w/o modification on Debian (BIG WIN!)
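
The script itself isn't posted here, but the identification step it describes 
can be sketched roughly as follows.  This is an illustration under stated 
assumptions, not the author's code: the helper names are hypothetical, the 
four ID values are read from the standard sysfs files of a PCI device 
directory, and the ID-to-ROM table is a plain two-column text file that you 
would have to build yourself from the .ROM images.

```shell
# Build "vendor_device_subvendor_subdevice" from one sysfs PCI device
# directory, e.g. /sys/bus/pci/devices/0000:03:00.0
pci_id_string() {
    for f in vendor device subsystem_vendor subsystem_device; do
        sed -e 's/^0x//' "$1/$f"
    done | paste -s -d_ -
}

# Look the id string up in a table of "<id-string> <rom-file>" lines.
# Prints nothing for an unknown id, so an unrecognized card is never
# matched to a ROM by accident.
rom_for_id() {
    awk -v id="$1" '$1 == id { print $2; exit }' "$2"
}
```

A caller would loop over the RAID controllers it finds, call 'rom_for_id' for 
each, and hand only non-empty results to megacli for flashing.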

--stephen

On Fri, Apr 15, 2016 at 10:55 AM, Karsten Suehring <suehringli...@gmail.com>
wrote:

> Hi,
>
> I remember having trouble with the PERC6/i update on old R1950 machines a
> while ago. I noticed that there existed two Linux update packages with the
> same version number, but a different cryptic string part in the file name.
> One worked, the other did not. The two versions showed up on different
> server types on the dell support site. I think the one that worked was
> named "SAS-RAID_Firmware_W83M2_LN32_6.3.1-0003_A14.BIN" (at least that's
> the file that I still have). If you Google the name, it will point you to
> the Dell download servers.
>
> Best regards,
> Karsten
>
> On Fri, Apr 1, 2016 at 3:03 PM, Andrew Barkley <abark...@crawfordtech.com>
> wrote:
>
>> kernel: sasdupie[8998]: segfault at 20 ip 7f3c3ecee00d sp
>> 7ffc73039a20 error 4 in sasdupie[7f3c3ecc9000+11]
>> --
>> *From:* soorej_ponna...@dell.com [soorej_ponna...@dell.com]
>> *Sent:* Thursday, March 31, 2016 10:55 PM
>> *To:* Andrew Barkley; linux-powere...@lists.us.dell.com
>> *Subject:* RE: [Linux-PowerEdge] FW:
>>
>> *Dell - Internal Use - Confidential *
>>
>>
>>
>> Hi,
>>
>>
>>
>> DSU does not support PE2950 server. For the firmware
>> segfault, please attach the log messages so that the device team can have a
>> look.
>>
>>
>>
>> *Soorej Ponnandi*
>>
>> *Dell* | Change Management
>>
>> *office* +  91 80 2807 7759 Extn: 78469
>>
>>
>>
>> *From:* linux-poweredge-bounces-Lists *On Behalf Of *Andrew Barkley
>> *Sent:* Friday, April 1, 2016 12:49 AM
>> *To:* linux-poweredge-Lists <linux-powere...@lists.us.dell.com>
>> *Subject:* [Linux-PowerEdge] FW:
>>
>>
>>
>> I am using the new repository, found through this site:
>> http://linux.dell.com/repo/hardware/dsu/
>> <http://redir.aspx?REF=d6Jgeo73qRVnmpNJgZl04FeKUlyNwTuxupig2xuCAUagu36mLVrTCAFodHRwOi8vbGludXguZGVsbC5jb20vcmVwby9oYXJkd2FyZS9kc3Uv>
>> --
>>
>> I am trying to update the PERC 6/i firmware in a PE 2950 running CentOS
>> 6.7 and the update package is segfaulting.
>>
>> ___
>> Linux-PowerEdge mailing list
>> Linux-PowerEdge@dell.com
>> https://lists.us.dell.com/mailman/listinfo/linux-poweredge
>>
>>
>
> ___
> Linux-PowerEdge mailing list
> Linux-PowerEdge@dell.com
> https://lists.us.dell.com/mailman/listinfo/linux-poweredge
>
>


-- 
Stephen Dowdy  -  Systems Administrator  -  NCAR/RAL
303.497.2869   -  sdo...@ucar.edu-  http://www.ral.ucar.edu/~sdowdy/


Re: [Linux-PowerEdge] R720XD dead after IDRAC update trial

2016-04-08 Thread Stephen Dowdy
Grzegorz,

I've had a motherboard replacement needed due to iDRAC updates before.
It's pretty frustrating.

The only "emergency" procedure you might try:

- disconnect power cords to PSUs
- press and HOLD the power button to "drain the Flea power" (i presume
this is a capacitor store residuals?)
   Hold for 30 seconds
- reconnect PSUs to power
- power on and enable CAPS LOCK, SCROLL LOCK and NUM LOCK (may not be
able to do immediately)
  This *MAY* require you get into the BIOS "Setup" menu () (which
you indicate seems unlikely)
- Press  , , 
  I can only imagine this stands for "Emergency F*ing Boot"

If you are lucky, then the system will be reset.  If not, you might
want to give up and have the Motherboard replaced (sigh)

Ref:
http://en.community.dell.com/techcenter/systems-management/w/wiki/3464.troubleshooting-idrac6-issues

Good Luck,
--stephen



-- 
Stephen Dowdy  -  Systems Administrator  -  NCAR/RAL
303.497.2869   -  sdo...@ucar.edu-  http://www.ral.ucar.edu/~sdowdy/



Re: PERC H800 Firmware update from Linux failing?

2010-08-02 Thread Stephen Dowdy
(sorry for the attribution depth, deleted the original message).

Anyway, i have no H800 series controllers yet, so i'm just throwing
this out, in case it applies, but the PERC5/6
can be updated via MegaCLI with:

MegaCli -AdpFwFlash -f filename [-NoSigChk] [-NoVerChk] -aN|-a0,1,2|-aALL

I'd extract the DUP kit below with:

RAID_FRMW_LX_R269683.BIN --extract ./FOO

and look in FOO/payload for a .img or .fw or whatever file to
apply directly.

If anyone does try this, finds that it works/doesn't, please respond.

--stephen


> On 31-7-2010 18:38, Tom Rockwell wrote:
> Hi,
>
> I tried to apply the latest firmware update to a PERC H800 controller
> using the Linux .BIN package.  It gave me the message that the update
> doesn't match the system (forget the exact wording) and wouldn't apply.
> I have updated the controller on this system in the past (prior release).
>
> I ended up applying the update using a PXE booted DOS image, and that
> went fine.
>
> Anyone having problems installing using the file:
>
> http://ftp.us.dell.com/sas-raid/RAID_FRMW_LX_R269683.BIN


-- 
Stephen Dowdy  -  Systems Administrator  -  NCAR/RAL
303.497.2869   -  sdo...@ucar.edu-  http://www.ral.ucar.edu/~sdowdy/

___
Linux-PowerEdge mailing list
Linux-PowerEdge@dell.com
https://lists.us.dell.com/mailman/listinfo/linux-poweredge
Please read the FAQ at http://lists.us.dell.com/faq


Re: Blew away my partition table

2010-06-29 Thread Stephen Dowdy
Jefferson Ogata wrote, On 06/29/2010 02:06 PM:
  Lots of really good info ...

Also, take a look at :

http://www.cgsecurity.org/wiki/TestDisk
-
TestDisk can

* Fix partition table, recover deleted partition
* Recover FAT32 boot sector from its backup
* Rebuild FAT12/FAT16/FAT32 boot sector
* Fix FAT tables
* Rebuild NTFS boot sector
* Recover NTFS boot sector from its backup
* Fix MFT using MFT mirror
* Locate ext2/ext3 Backup SuperBlock
* Undelete files from FAT, NTFS and ext2 filesystem
* Copy files from deleted FAT, NTFS and ext2/ext3 partitions. 
-

I'm not sure if there's a simple command to scan the device,
presuming only the partition table is borked and to recreate
just the PT, but i think so.  At least it can do Superblock
scan lookups.

There's another tool i've run across that'll scan a block dev
for superblock backups, but i can't recall the name...

btw, cgsecurity has photorec, which was originally designed
to recover lost photos off digital camera media.  It's been
enhanced to recover a large number of file types off any
damaged media and write what it can to auxiliary storage.


--stephen

-- 
Stephen Dowdy  -  Systems Administrator  -  NCAR/RAL
303.497.2869   -  sdo...@ucar.edu-  http://www.ral.ucar.edu/~sdowdy/

___
Linux-PowerEdge mailing list
Linux-PowerEdge@dell.com
https://lists.us.dell.com/mailman/listinfo/linux-poweredge
Please read the FAQ at http://lists.us.dell.com/faq


Re: Checking servers for firmware/driver updates etc

2010-02-27 Thread Stephen Dowdy
Matt Domsch wrote, On 02/27/2010 06:48 PM:

> For PowerEdge, you could parse the XML Catalog, which is what
> Repository Manager, DMC, and other 3rd party commercial update tools
> use.
>
> ftp://ftp.dell.com/catalog/Catalog.xml.gz
> ftp://ftp.dell.com/catalog/Catalog.xml.gz.sign

Matt,

Thanks! I didn't know that was quickly accessible.  There does appear
to be some historical unmaintained cruft there
(e.g. Catalog.xml.tar.gz), but if that file is reliable (and i presume
from your description of it being used by the above tools that it won't
disappear anytime soon), then that's DEFINITELY good news ;).

Yeah, it doesn't help me with the hundreds of desktop (Precision WS) and
laptop (Latitude) systems i also manage, but this is definitely useful.

> This does not include history, but it does include the current release
> block's set of files.  This file is updated as part of the block
> release process, when a whole set of tested updates are published at
> once.
>
> The next obvious question is - where can I find the schema for this
> XML?  I believe this is available as part of Dell's PartnerDirect
> program: http://dell.com/partnerdirect/.  You may be able to tease the
> bits you want out of the XML directly without having the full schema
> though.

Yeah, it definitely (at this point) looks pretty easily parsable
(sans the UTF16 stuff, which is easily coped with).
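
As a rough illustration of "teasing the bits out" without the schema: convert 
the UTF-16 text to UTF-8, split on element starts, and pull one attribute. 
This is a sketch only.  'catalog_attr' is a hypothetical helper, the attribute 
name you pass is whatever you find by eyeballing the file (nothing here 
asserts which attributes the real catalog uses), and a proper XML parser is 
the safer route for anything beyond a quick look.

```shell
# Usage: gzip -dc Catalog.xml.gz | catalog_attr ATTRNAME
# Reads UTF-16 XML on stdin and prints the value of ATTRNAME for every
# element that carries it, one value per line.
catalog_attr() {
    attr=$1
    iconv -f UTF-16 -t UTF-8 | tr '<' '\n' |
        sed -n "s/.*[[:space:]]$attr=\"\([^\"]*\)\".*/\1/p"
}
```

Splitting on '<' puts each element on its own line, so the sed expression 
only ever has to find one occurrence of the attribute per line.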

Thanks,
--stephen



Re: Checking servers for firmware/driver updates etc

2010-02-26 Thread Stephen Dowdy
 difficult trying to figure
out which system is the one to use (note the ITA vs DMC, SBUU vs SUU vs ..)
Even things like the Dell Capacity Planner/Calculator have gone from
a Windows-only executable to a Java app to a Flash app to ???  and often
the new servers aren't included.   And, STILL:
www.dell.com/calc points to ESSA for the Blades, and when you click that
link you get "can't find the server at solutions.dell.com".  (it's been
that way for MONTHS)

Thanks,
--stephen Cranky Old Man Ramblin' dowdy
#!/bin/sh
# Title:ral-superinv
# Purpose:  obtain System Inventory info (firmware, etc)
# Author:   Stephen Dowdy (sdo...@ucar.edu)
# RCS:  $Header$
# Note: See RCS/CVS Log info at End of File
# Requirements:  dmidecode, ipmitool, ddcprobe, xresprobe
# Caveats:
# Todo:
#
# Copyright UCAR (c) 2006-2009.
# University Corporation for Atmospheric Research (UCAR),
# National Center for Atmospheric Research (NCAR),
# Research Applications Laboratory (RAL),
# P.O. Box 3000, Boulder, Colorado, 80307-3000, USA.

is_debug() { [ ${DEBUG:-0} -ge 1 ] ;}
debug() { is_debug && echo "DEBUG: " "$@" 1>&2 ;}

preload() {

if [ -f /etc/debian_version ]; then
    #if ! type dmidecode > /dev/null; then echo "Missing dmidecode";  exit 1; fi
    if ! dpkg-query -W --showformat='${Status}\n' dmidecode | grep -q '^install'; then
      echo "Installing 'dmidecode'..."
      apt-get install dmidecode > /dev/null
    fi

    #if ! type ddcprobe > /dev/null; then echo "Missing ddcprobe";  exit 1; fi
    if ! dpkg-query -W --showformat='${Status}\n' xresprobe | grep -q '^install'; then
      echo "Installing 'xresprobe'..."
      apt-get install xresprobe > /dev/null
    fi
fi

# lame attempt to get IPMI functional
if type -p ipmitool > /dev/null; then
    if [ $(lsmod | grep ipmi | wc -l) != 3 ]; then
        modprobe ipmi_si
        modprobe ipmi_devintf
        modprobe ipmi_msghandler
    fi
fi
}

inv_sys_hostname=$(hostname -s)

# The dmidecode statements here are from earlier dmidecode releases
# that didn't support such niceties as dmidecode -t bios
get_sys_manufacturer() {
  dmidecode | egrep -A8 '^Handle (0x0100|0x0001|0x0005)' | grep 'Manufacturer:' \
    | sed -e 's/^[^:]*:[[:space:]]*\(.*[^ ]\)\([[:space:]]*$\)/\1/'
}

get_sys_model() {
  dmidecode | egrep -A8 '^Handle (0x0100|0x0001|0x0005)' | grep 'Product Name:' \
    | sed -e 's/^[^:]*:[[:space:]]*\(.*[^ ]\)\([[:space:]]*$\)/\1/'
}

get_sys_serialnumber() {
  dmidecode | egrep -A8 '^Handle (0x0100|0x0001|0x0005)' \
    | sed -ne '/Serial Number:/s/^[^:]*:[[:space:]]*\(.*[^ ]\)\([[:space:]]*$\)/\1/p'
}

get_sys_bios() {
  dmidecode | egrep -A8 '^Handle (0x)' | grep 'Version:' \
    | sed -e 's/^[^:]*:[[:space:]]*\(.*[^ ]\)\([[:space:]]*$\)/\1/'
}

get_sys_bmc() {
  if type -p ipmitool >/dev/null 2>&1; then
    tmp=$(ipmitool mc info 2>/dev/null | grep 'Firmware Revision' \
      | sed -e 's/^[^:]*:[[:space:]]*\(.*[^ ]\)\([[:space:]]*$\)/\1/')
    #echo ${tmp:-N/A}
    echo ${tmp}
  else
    #echo N/A
    echo
  fi
}

# XXX this doesn't work reliably much, but Xorg.0.log
# XXX often has the monitor model correctly probed?
# XXX - nvidia proprietary drivers tend to disable this capability :-(
get_monitor_model() {
  tmp=$(ddcprobe | grep monitorname: | cut -d: -f2 \
      | sed -e 's/^[[:space:]]*\(.*[^ ]\)\([[:space:]]*$\)/\1/')
  debug "monitor_model tmp=[${tmp}]"
  #echo ${tmp:-UNKNOWN}
  echo ${tmp}
}

get_monitor_serialnumber() {
  tmp=$(ddcprobe | grep monitorserial: \
      | sed -e 's/^[^:]*:[[:space:]]*\(.*[^ ]\)\([[:space:]]*$\)/\1/')
  debug "monitor_serialnumber tmp=[${tmp}]"
  #echo ${tmp:-UNKNOWN}
  echo ${tmp}
}

# XXX - We don't rely on external LSI tools at this point
# XXX - we're trying to be lean, rely on well-established utilities
get_perc4() {
## Host: scsi0 Channel: 01 Id: 00 Lun: 00
##   Vendor: MegaRAID Model: LD 0 RAID1   69G Rev: 522D
##   Type:   Direct-Access                    ANSI SCSI revision: 02
awk -F: '
$1 ~ /Vendor/ && $2 ~ /MegaRAID/ && $3 ~ /LD [0-9] RAID/ {print $4; }
' /proc/scsi/scsi | tr -d ' ' | fmt -128 | sed -e 's/ /, /g'
}

get_perc5i() {
## Host: scsi0 Channel: 02 Id: 00 Lun: 00
##   Vendor: DELL Model: PERC 5/i Rev: 1.03
##   Type:   Direct-Access                    ANSI SCSI revision: 05
awk -F: '
$1 ~ /Vendor/ && $2 ~ /DELL/ && $3 ~ /PERC 5\/i/ {print $4; }
' /proc/scsi/scsi | tr -d ' ' | fmt -128 | sed -e 's/ /, /g'
}

get_perc5E() {
## Host: scsi2 Channel: 02 Id: 00 Lun: 00
##   Vendor: DELL Model: PERC 5/E Adapter Rev: 1.03
##   Type:   Direct-Access                    ANSI SCSI revision: 05
awk -F: '
$1 ~ /Vendor/ && $2 ~ /DELL/ && $3 ~ /PERC 5\/E/ {print $4; }
' /proc/scsi/scsi | tr -d ' ' | fmt -128 | sed -e 's/ /, /g'
}

get_perc6i() {
## Host: scsi0 Channel: 02 Id: 00 Lun: 00
##   Vendor: DELL Model: PERC 6/i Rev: 1.11
##   Type:   Direct-Access                    ANSI SCSI revision: 05
awk -F: '
$1 ~ /Vendor/ && $2 ~ /DELL/ && $3 ~ /PERC 6\/i/ {print $4; }
' /proc/scsi/scsi | tr -d ' ' | fmt -128 | sed -e 's/ /, /g'
}

get_perc6E

Re: Automatic Detection of PowerEdge Servers

2010-01-20 Thread Stephen Dowdy
Erinn Looney-Triggs wrote, On 01/20/2010 01:46 PM:
> I run an automatic provisioning and installation service via cobbler for
> our RedHat installs. Now as my laziness increases,

there's some saying about how *much* work sysadmins are willing
to invest due to laziness ;)

> can't figure out is a way to reliably know that the system is a Dell
> PowerEdge system. I could do something like dmidecode -s
> system-product-name and grep -i for poweredge (in loose terms), but are
> all PowerEdge servers supported by OpenManage? Is there a better way to
> do this?

My guess is any Dell equipment with a BMC is covered by OM, but that's
just a guess.

This should then work for that...
[r...@foo ~]# ipmitool mc info
Device ID : 32
Device Revision   : 0
Firmware Revision : 2.28
IPMI Version  : 2.0
Manufacturer ID   : 674   <--- This may be good to key off
Manufacturer Name : DELL Inc
Product ID: 256 (0x0100)
Product Name  : Unknown (0x100)
Device Available  : yes
Provides Device SDRs  : yes
Additional Device Support :
Sensor Device
SDR Repository Device
SEL Device
FRU Inventory Device
IPMB Event Receiver
Bridge
Chassis Device
Aux Firmware Rev Info :
0x00
0x00
0x00
0x00
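
Keying off that Manufacturer ID could look something like the sketch below. 
'is_dell_bmc' is a hypothetical helper that reads 'ipmitool mc info' output on 
stdin; whether "BMC reports manufacturer 674" really implies "OpenManage is 
supported" is exactly the guess stated above, not something this check proves.

```shell
# Exit 0 if the "Manufacturer ID" line on stdin reports 674 (Dell),
# exit 1 otherwise (including when no such line is present).
is_dell_bmc() {
    awk -F: '
        /^Manufacturer ID/ { id = $2 + 0 }
        END { exit (id == 674 ? 0 : 1) }'
}
```

In a provisioning script this would be used as 
'ipmitool mc info 2>/dev/null | is_dell_bmc && install-the-dell-bits', where 
the last command stands in for whatever your site does.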

> On a related note I am trying to do the same thing for the DRAC cards in
> the systems, I would like a way to detect that a DRAC card is present
> and if so run a set of commands. I can do this for a DRAC 5 pretty
> reliably using lsusb -d 0x413c: which correlates to the DRAC 5
> cards, however the iDRAC6 cards don't seem to have this option
> available. Does anyone know of a way to automagically detect their
> presence?

yeah, that's something that's bugged me too.  'lspci' would show up
a DRAC4, but thanks for the clue on 'lsusb' for the DRAC5!
(following this logic...  lspcmcia ??  heh ;) darn :-(

'ipmitool fru list' doesn't show it, and i'm pretty sure dmidecode
doesn't show it (unless there's encoding in one of the OEM specific
types, and i wouldn't rule that out)

--stephen
-- 
Stephen Dowdy  -  Systems Administrator  -  NCAR/RAL
303.497.2869   -  sdo...@ucar.edu-  http://www.ral.ucar.edu/~sdowdy/



Re: Automatic Detection of PowerEdge Servers

2010-01-20 Thread Stephen Dowdy
Alexander Dupuy wrote, On 01/20/2010 03:22 PM:
> You may get some joy from ipmitool sdr elist mcloc:
>
> On a 1950 with a BMC (but no DRAC 5, despite the output):
> # ipmitool sdr elist mcloc
> BMC  | 00h | ok  |  7.1 | Dynamic MC @ 20h
> DRAC 5   | 00h | ok  | 11.1 | Dynamic MC @ 26h
>
> On a R710 with iDRAC6:
> # ipmitool sdr elist mcloc
> iDRAC6   | 00h | ok  |  7.1 | Dynamic MC @ 20h

Alex,

Cool, thanks.  FWIW, here's what an M610 blade shows:
# ipmitool sdr elist mcloc
iDRAC| 00h | ok  |  7.1 | Dynamic MC @ 20h

And a PE1850
# ipmitool sdr elist mcloc
DRAC4| 00h | ok  | 11.5 | Dynamic MC @ 26h
BMC  | 00h | ok  |  7.1 | Dynamic MC @ 20h
Primary BP   | 00h | ok  | 26.2 | Dynamic MC @ C0h

As you say, this seems to only be representative of
an interface address for that system, not the existence
of the card.  I'm guessing there must be some IPMI
call that can be issued to those addresses to detect
presence, but...  (i'm afraid to start pushing random
events there and don't have time to read the IPMI docs)
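
For scripting against that output, the controller names are simply the first 
'|'-separated column.  A small sketch ('mcloc_names' is a hypothetical helper 
reading 'ipmitool sdr elist mcloc' output on stdin); as noted, a name showing 
up only says the management-controller locator record exists, not that the 
card is physically installed:

```shell
# Print the trimmed first column (the controller name) of each record.
mcloc_names() {
    awk -F'|' '{
        gsub(/^[ \t]+|[ \t]+$/, "", $1)
        if ($1 != "") print $1
    }'
}
```

Something like 'ipmitool sdr elist mcloc | mcloc_names | grep -i drac' then 
gives a quick yes/maybe answer per host.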

however, at least for a DRAC5...

System *with* a DRAC5 installed
# ipmitool sdr elist | grep -i drac
DRAC5 Conn 2 Cbl | 59h | ok  |  7.1 | Connected
(this command takes *minutes* to complete, but
can be shortened by
# ipmitool sdr entity 7.1 | grep -i drac
DRAC5 Conn 2 Cbl | 59h | ok  |  7.1 | Connected
)

System w/o DRAC5 installed, note the 'ns' (no sense) state:
# ipmitool sdr elist | grep -i drac
DRAC5 Conn 2 Cbl | 59h | ns  |  7.1 | Disabled

This entity may be an indicator of Console
Serial redirection from the name, so i'm not sure if it
still indicates the existence of the card.

Unfortunately, on my M610, nothing matches the 'drac' string
nor on my PE1850.

For Brandon Ooi, 

# lshw | egrep -i '(drac|remote|access|mc)'
#

While 'lshw' is useful, it also doesn't answer this question :-(

--stephen
-- 
Stephen Dowdy  -  Systems Administrator  -  NCAR/RAL
303.497.2869   -  sdo...@ucar.edu-  http://www.ral.ucar.edu/~sdowdy/



Re: Dell Poweredge SC1425 Open Manage

2009-12-07 Thread Stephen Dowdy
jeffrey_l_mend...@dell.com wrote, On 12/07/2009 08:47 AM:

> OMSA is only available for certain Dell PowerEdge server models. Notably,
> SC-class systems do not have OMSA available.
> To check to see if OMSA is available on your system, use the 'getSystemId'
> executable to look up your System ID.

If you don't have 'getSystemId' (part of libsmbios), this works in a pinch:
(assuming you have a reasonably current 'dmidecode')

sh-3.2# dell_systemid() { dmidecode -t 208 | awk '/Header and Data/ {getline; print "0x"$10$9}' ;}
sh-3.2# dell_systemid
0x0162
sh-3.2# dmidecode -s system-product-name
PowerEdge 400SC

ssh-in2:~/cib # dell_systemid
0x016D
ssh-in2:~/cib # dmidecode -s system-product-name
PowerEdge 2850
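
Judging from the one-liner, the System ID sits in bytes 9 and 10 of the 
type-208 structure in little-endian order, which is why the two fields are 
printed swapped.  The same parse, written to read a saved 'dmidecode -t 208' 
dump on stdin so it can be run away from the machine 
('dell_systemid_from_dump' is a hypothetical name):

```shell
# Take the first hex line after "Header and Data" and print bytes 10 and 9
# (in that order) as a 16-bit id, e.g. bytes "62 01" become 0x0162.
dell_systemid_from_dump() {
    awk '/Header and Data/ { getline; print "0x" $10 $9; exit }'
}
```

Usage: 'dmidecode -t 208 > dump.txt' on the server, then 
'dell_systemid_from_dump < dump.txt' wherever is convenient.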

--stephen



Re: PE T610 - ipmitool reports a 'cr'

2009-10-23 Thread Stephen Dowdy
Per Jensen wrote, On 10/23/2009 06:52 AM:
> List,
>
> I have just received a T610 which is being setup for Xen dom0 backup use.
...
> When running 'ipmitool sdr' I get a 'cr' on one of the reported temperatures,
> as shown in the snip below.
...
> Temp | 62 degrees C  | cr
...
> How do I find out what the cr is about, should I be concerned ?

Per,

You need to get the Entity ID via:

# ipmitool sdr type temp
Temp             | 01h | ok  |  3.1 | -56 degrees C
Temp             | 02h | ok  |  3.2 | -54 degrees C
Temp             | 05h | ok  | 10.1 | 37 degrees C
Temp             | 06h | ok  | 10.2 | 36 degrees C
Ambient Temp     | 0Eh | ok  |  7.1 | 25 degrees C
Planar Temp      | 0Fh | ok  |  7.1 | 42 degrees C
IOH THERMTRIP    | 5Dh | ns  |  7.1 | Disabled
CPU Temp Interf  | 76h | ns  |  7.1 | Disabled
Temp             | 0Ah | ok  |  8.1 | 34 degrees C
Temp             | 0Bh | ok  |  8.1 | 39 degrees C
Temp             | 0Ch | ucr |  8.1 | 50 degrees C

I also appear to have an Upper Critical on entity 8.1 on an R710.
I'll have to check my other systems to see if this is an anomaly
or a bug/deficiency in Dell's implementation.
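
If you just want the sensors that are flagged, a filter along these lines may help. It is only a sketch: the function name flag_sensors is made up, and it assumes the status is the third pipe-separated column, which holds for both the plain 'ipmitool sdr' line quoted above and the 'sdr type temp' listing. The sample lines are copied from that listing.

```shell
# Print sensor lines whose status column is neither "ok" nor "ns".
# Assumption: input is pipe-separated with the status in column 3,
# as in the 'ipmitool sdr' and 'ipmitool sdr type temp' output above.
flag_sensors() {
    awk -F'|' 'NF >= 3 {
        status = $3
        gsub(/[[:space:]]/, "", status)   # strip padding around the status
        if (status != "ok" && status != "ns") print
    }'
}

# Three lines taken from the listing above.
sample='Temp | 0Bh | ok  |  8.1 | 39 degrees C
IOH THERMTRIP| 5Dh | ns  |  7.1 | Disabled
Temp | 0Ch | ucr |  8.1 | 50 degrees C'

printf '%s\n' "$sample" | flag_sensors
```

On a live box that would be 'ipmitool sdr type temp | flag_sensors'.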

# ipmitool -v sdr entity 8.1

Sensor ID  : Temp (0xc)
 Entity ID : 8.1 (Memory Module)
 Sensor Type (Analog)  : Temperature
 Sensor Reading: 49 (+/- 1) degrees C
 Status: Upper Critical
 Nominal Reading   : 23.000
 Normal Minimum: 11.000
 Normal Maximum: 69.000
 Upper critical: 47.000
 Upper non-critical: 42.000
 Lower critical: 3.000
 Lower non-critical: 8.000
 Positive Hysteresis   : 1.000
 Negative Hysteresis   : 1.000
 Minimum sensor range  : Unspecified
 Maximum sensor range  : Unspecified
 Event Message Control : Per-threshold
 Readable Thresholds   : lcr lnc unc ucr
 Settable Thresholds   : lcr lnc unc ucr
 Threshold Read Mask   : lcr lnc unc ucr
 Event Status  : Event Messages Disabled
 Assertion Events  : unc+ ucr+
 Event Enable  : Event Messages Disabled
 Assertions Enabled:

This is a Memory Module.  Not sure how to map that to any particular
DIMM/slot/cpu/sensor-location, though, as i have 6 DIMMs (3/cpu)

# dmidecode -t memory | sed -ne '/Memory Device/,/Part Number/ {
    /Size:/h; /^[[:space:]]*Locator:/ {p;x;p}; /Speed:/p}' |
    paste - - - | tr -s '\t ' | expand -t 1,20,50
 Locator: DIMM_A1   Size: 4096 MB Speed: 1333 MHz (0.8 ns)
 Locator: DIMM_A2   Size: 4096 MB Speed: 1333 MHz (0.8 ns)
 Locator: DIMM_A3   Size: 4096 MB Speed: 1333 MHz (0.8 ns)
 Locator: DIMM_A4   Size: No Module Installed Speed: Unknown
 Locator: DIMM_A5   Size: No Module Installed Speed: Unknown
 Locator: DIMM_A6   Size: No Module Installed Speed: Unknown
 Locator: DIMM_A7   Size: No Module Installed Speed: Unknown
 Locator: DIMM_A8   Size: No Module Installed Speed: Unknown
 Locator: DIMM_A9   Size: No Module Installed Speed: Unknown
 Locator: DIMM_B1   Size: 4096 MB Speed: 1333 MHz (0.8 ns)
 Locator: DIMM_B2   Size: 4096 MB Speed: 1333 MHz (0.8 ns)
 Locator: DIMM_B3   Size: 4096 MB Speed: 1333 MHz (0.8 ns)
 Locator: DIMM_B4   Size: No Module Installed Speed: Unknown
 Locator: DIMM_B5   Size: No Module Installed Speed: Unknown
 Locator: DIMM_B6   Size: No Module Installed Speed: Unknown
 Locator: DIMM_B7   Size: No Module Installed Speed: Unknown
 Locator: DIMM_B8   Size: No Module Installed Speed: Unknown
 Locator: DIMM_B9   Size: No Module Installed Speed: Unknown
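
To tally the populated slots from a listing like the one above, something like this sketch could be used. The function name summarize_dimms is made up, and it assumes the letter in the locator (A or B) identifies the bank belonging to one CPU, which fits the 3/cpu layout here but is not confirmed anywhere. The sample is a subset of the lines above.

```shell
# Count populated DIMMs and total size per bank letter from lines of the
# form "Locator: DIMM_A1 Size: 4096 MB Speed: ..." read on stdin.
# Assumption: the letter after "DIMM_" names the bank (one per CPU).
summarize_dimms() {
    awk '$1 == "Locator:" && $3 == "Size:" && $5 == "MB" {
        bank = substr($2, 6, 1)           # "DIMM_A1" -> "A"
        count[bank]++
        mb[bank] += $4
    }
    END {
        for (b in count)
            printf "%s: %d DIMMs, %d MB\n", b, count[b], mb[b]
    }' | sort
}

# A subset of the listing above; empty slots are skipped by the filter.
sample=' Locator: DIMM_A1   Size: 4096 MB Speed: 1333 MHz (0.8 ns)
 Locator: DIMM_A2   Size: 4096 MB Speed: 1333 MHz (0.8 ns)
 Locator: DIMM_A3   Size: 4096 MB Speed: 1333 MHz (0.8 ns)
 Locator: DIMM_A4   Size: No Module Installed Speed: Unknown
 Locator: DIMM_B1   Size: 4096 MB Speed: 1333 MHz (0.8 ns)
 Locator: DIMM_B2   Size: 4096 MB Speed: 1333 MHz (0.8 ns)
 Locator: DIMM_B3   Size: 4096 MB Speed: 1333 MHz (0.8 ns)'

printf '%s\n' "$sample" | summarize_dimms
```

For this sample it reports 3 DIMMs and 12288 MB for each of A and B.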

The presumption, then, is that the memory is overheating,
*IF* this isn't just an incomplete implementation in the BMC.

Well, to confirm, this seems to be common on the R710s i've checked.

lager:~# ipmitool  sdr entity 8.1.0
Temp | 0Ah | ok  |  8.1 | 27 degrees C
Temp | 0Bh | ok  |  8.1 | 24 degrees C
Temp | 0Ch | ucr |  8.1 | 59 degrees C

pub:~# ipmitool  sdr entity 8.1.0
Temp | 0Ah | ok  |  8.1 | 32 degrees C
Temp | 0Bh | ok  |  8.1 | 32 degrees C
Temp | 0Ch | unc |  8.1 | 45 degrees C

The last sensor is MUCH higher than the other two.
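
For what it's worth, the status codes line up with the thresholds in the verbose output earlier (unc 42, ucr 47). Here is a rough sketch of that comparison. The function name classify_temp is made up, it assumes the other R710s use the same thresholds as the one i dumped, and it assumes a threshold counts as crossed once the reading reaches it.

```shell
# Classify a temperature reading against the four thresholds from the
# verbose sensor output (defaults: lcr=3, lnc=8, unc=42, ucr=47).
# Assumptions: same thresholds on every box, and a threshold is treated
# as crossed when the reading reaches it.
classify_temp() {
    reading=$1 lcr=${2:-3} lnc=${3:-8} unc=${4:-42} ucr=${5:-47}
    if   [ "$reading" -ge "$ucr" ]; then echo ucr
    elif [ "$reading" -ge "$unc" ]; then echo unc
    elif [ "$reading" -le "$lcr" ]; then echo lcr
    elif [ "$reading" -le "$lnc" ]; then echo lnc
    else echo ok
    fi
}

# Readings taken from the 'lager' and 'pub' listings above.
for t in 27 45 59; do
    printf '%s degrees C -> %s\n' "$t" "$(classify_temp "$t")"
done
```

That gives ok, unc and ucr for 27, 45 and 59, matching what ipmitool reports on those two machines.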

I think someone from Dell needs to chime in on this
--stephen

-- 
Stephen Dowdy  -  Systems Administrator  -  NCAR/RAL
303.497.2869   -  sdo...@ucar.edu-  http://www.ral.ucar.edu/~sdowdy/
