Re: [Linux-PowerEdge] R6515 with unbalanced memory config?
On 7/13/20 3:18 PM, Tim Mooney wrote:
> But the full whitepaper is no longer present on Dell's download site. Even finding the 1 page
> http://downloads.dell.com/manuals/common/Direct_from_Development_-_Balanced_Memory_on_2nd_Generation_AMD_EPYC_Processors_Reference_Guide.pdf
> Still links to the full whitepaper under a download URL that no longer works.
> Anyone know either where the full whitepaper can be found or what kind of performance penalty we're looking at for this type of NUMA memory config?

FWIW, i strongly deter any of my users from configuring a machine that's not fully "balanced". Unfortunately, that's hard here on the Epyc Gen II where you have to use 8 DIMMs and 8GB is the smallest. (so a 2P server has at least 128GB of RAM, even if you don't need it)

Amazingly the site is not hiding dirindexes, so, just head on over to:
https://downloads.dell.com/manuals/common/

Quality Control is Job None in the world today :-( (for renaming files and not validating links)

How's this?
https://downloads.dell.com/manuals/common/dellemc-balanced-memory-2ndgen-amd-epyc-poweredge.pdf

The Epyc "Chiplet" arrangement makes things even more weird for configuring systems. I haven't read the WP yet, but thanks for pointing it out.

Search for "epyc" to find other docs.
dellemc-balanced-memory-2ndgen-amd-epyc-poweredge.pdf
dell-emc-dfd-advantage-four-channel-memory-poweredge-amd-epyc.pdf
dellemc-dfd-balanced-memory-2ndgen-amd-epyc-summary.pdf
dell-emc-dfd-numa-amd-epyc-2ndgen.pdf
dellemc_readysol_hpc_digimanufacturing_amdepyc_altairperf.pdf
dellemc_readysol_hpc_digimanufacturing_epyc_ansys.pdf
dellemc_readysol_hpc_digimanufacturing_epyc_simcenter_starccm.pdf
Direct_from_Development_-_Balanced_Memory_on_2nd_Generation_AMD_EPYC_Processors_Reference_Guide.pdf
poweredge_perf_amdepyc7002series.pdf
security_poweredge_amd_epyc_gen2.pdf

___
Linux-PowerEdge mailing list
Linux-PowerEdge@dell.com
https://lists.us.dell.com/mailman/listinfo/linux-poweredge
Re: [Linux-PowerEdge] BIOS update fails on Dell PowerEdge R720xd
On 11/7/19 6:35 PM, Mauricio Tavares wrote:
> terminate called after throwing an instance of 'unsigned char'

We had this recently here. Using an interim BIOS (a few revs back) showed a much more useful error message:

System Services is disabled

Can you reboot the system to see if that's the case? I dunno if there's a way to programmatically re-enable System Services while the system is running (perhaps through the iDRAC).

--stephen
Re: [Linux-PowerEdge] Disk temperature
On 10/10/19 11:32 AM, vinc...@cojot.name wrote:
> You could use megaclisas-status, here a T440 with a PERC H730P. It reports temps for drives and for the LSI cards.
>
> # megaclisas-status
> -- Controller information --
> -- ID | H/W Model          | RAM    | Temp | BBU  | Firmware
> c0    | PERC H730P Adapter | 2048MB | 79C  | Good | FW: 25.5.6.0009
>
> -- Array information --
> -- ID | Type   | Size  | Strpsz | Flags   | DskCache | Status  | OS Path  | CacheCade | InProgress
> c0u0  | RAID-0 | 1818G | 512 KB | ADRA,WB | Enabled  | Optimal | /dev/sda | None      | None
> c0u1  | RAID-0 | 7276G | 512 KB | ADRA,WB | Enabled  | Optimal | /dev/sdb | None      | None
> c0u2  | RAID-0 | 1818G | 512 KB | ADRA,WB | Enabled  | Optimal | /dev/sdc | None      | None
>
> -- Disk information --
> -- ID  | Type | Drive Model                                       | Size     | Status          | Speed   | Temp | Slot ID | LSI ID
> c0u0p0 | SSD  | S3YUNB0KC09340D Samsung SSD 860 EVO 2TB RVT03B6Q  | 1.818 TB | Online, Spun Up | 6.0Gb/s | 25C  | [32:0]  | 0
> c0u1p0 | HDD  | R6GP40KY WDC WD80EFZX-68UW8N0 83.H0A83            | 7.276 TB | Online, Spun Up | 6.0Gb/s | 33C  | [32:4]  | 4
> c0u2p0 | SSD  | S3YUNB0KC09173D Samsung SSD 860 EVO 2TB RVT03B6Q  | 1.818 TB | Online, Spun Up | 6.0Gb/s | 24C  | [32:1]  | 1
>
> Ref: https://raw.githubusercontent.com/ElCoyote27/hwraid/master/wrapper-scripts/megaclisas-status

Vincent, thanks for this.

This is similar to a script i wrote 15 years ago (output below), but megaclisas-status is MUCH faster (mine is bash and full of ugliness, but that's the nature of parsing MegaCLI output :-/ ). I've added 'megaclisas-status' to my LSI-Tools kit.

FWIW, you have info i don't have, but it'd be nice to get PredFault(PF), MediaErrors(ME) (much more useful status predictors, IMHO, than Temp values) and Foreign Cfg info. Of course, figuring out where to stuff that info and not have lines too long is fun.

--stephen

# /usr/local/LSI-Tools/megacli-overview
[Adapter 0]
ADP[0]=( PERC 5/E Adapter),FWPkg( 5.2.2-0076),FWVer( 1.03.50-0461)
BBU[0]=(Learning?=No, charge=100 %, status=Complete, isSOHGood=Yes)
PD[17:0]=(WDC/WD1002FBYS-18W8B1/0C12/X) ME=0,OE=6,PF=0,FW=Online, Spun Up,F=None
PD[17:1]=(SATA/ST3750640AS/E/X) ME=0,OE=6,PF=0,FW=Online, Spun Up,F=None
...
PD[17:14]=(Hitachi/HUA721075KLA330/A74A/X) ME=0,OE=6,PF=0,FW=Online, Spun Up,F=None
VD[0:0]=("d4",R(5,0,3),SZ= 4.089 TB,SS=128 KB,CP=(W=WB,R=ReadADP,IO=D),# 7,Optimal)
VD[0:1]=("d3",R(5,0,3),SZ= 4.089 TB,SS=128 KB,CP=(W=WB,R=ReadADP,IO=D),# 7,Optimal)
[Adapter 1]
ADP[1]=( PERC 6/i Integrated),FWPkg( 6.3.3.0002),FWVer( 1.22.52-1909)
BBU[1]=(Learning?=No, charge=88 %, status=Complete, isSOHGood=Yes)
PD[32:0]=(SEAGATE/ST9300603SS/FS66/X) ME=0,OE=0,PF=0,FW=Online, Spun Up,F=None
PD[32:1]=(SEAGATE/ST9300603SS/FS66/X) ME=0,OE=0,PF=0,FW=Online, Spun Up,F=None
PD[32:2]=(SEAGATE/ST9300603SS/FS66/X) ME=0,OE=0,PF=0,FW=Online, Spun Up,F=None
PD[32:3]=(SEAGATE/ST9300603SS/FS66/X) ME=0,OE=0,PF=0,FW=Online, Spun Up,F=None
VD[1:0]=("opsys",R(1,3,0),SZ= 557.75 GB,SS=64 KB,CP=(W=WB,R=ReadADP,IO=D),#2,Optimal)
Re: [Linux-PowerEdge] Disk temperature
#!/bin/sh
# Title:   scsi-inq-all
# Purpose: use smartctl to list out INQUIRY data and important stats for all drives it finds.
# Author:  Stephen Dowdy (sdo...@ucar.edu)
# Ref:     https://www.backblaze.com/blog/hard-drive-smart-stats/
# Backblaze uses:
#   SMART 5   – Reallocated_Sector_Count.
#   SMART 187 – Reported_Uncorrectable_Errors.
#   SMART 188 – Command_Timeout.
#   SMART 197 – Current_Pending_Sector_Count.
#   SMART 198 – Offline_Uncorrectable.
# Note that this 'awk' script is designed to only process ONE disk at a time (vars are not rezeroed on end-of-records)
# Warn: this script is extremely dependent upon parsing exact output strings from 'smartctl', which are subject to change.
# SATA and SAS are reported VERY differently by 'smartctl', so we have to do several attempts to string match/parse

#awk=gawk
#awk=mawk
awk=awk

smartctl --scan-open | sed -ne '/^\//{s/[ \t]*#.*$//;p}' | \
while read disk; do
    echo "[ ${disk} ]"
    # ${disk} stays unquoted on purpose: it carries the device-type options from the scan (e.g. "-d scsi")
    #smartctl -a ${disk} | ${awk} -v 'FS=:[[:space:]][[:space:]]*' '
    smartctl -x ${disk} | ${awk} -v 'FS=[ \t]*:[ \t]*' '
        /^(Vendor|Model Family):/{vendor=$2}
        /^(Product|Device Model):/{model=$2}
        /^(Revision|Firmware Version):/{firmware=$2}
        /^(Serial number|Serial Number):/{serial=$2}
        /^Elements in grown defect list:/{gdf=$2}
        /Reallocated_Sector_Ct/{tmp=split($0,foo,"[ \t][ \t]*"); gdf=foo[tmp]}
        /Offline_Uncorrectable/{tmp=split($0,foo,"[ \t][ \t]*"); uco=foo[tmp]}
        /Reported_Uncorrect/{tmp=split($0,foo,"[ \t][ \t]*"); ucr=foo[tmp]}
        /Current_Pending_Sector/{tmp=split($0,foo,"[ \t][ \t]*"); pen=foo[tmp]}
        /^Manufactured in week/{nf=split($0,foo,"[ \t][ \t]*");mdt=sprintf("%d(w)/%d(y)",foo[4],foo[7]);}
        /Power_On_Hours/{tmp=split($0,foo,"[ \t][ \t]*"); poh=foo[tmp]; }
        /number of hours powered up/{tmp=split($0,foo,"[ \t][ \t]*=[ \t][ \t]*");poh=foo[tmp]}
        /Device Error Count/{tmp=split($0,foo,"[ \t][ \t]*"); elc=foo[4]}
        /Temperature_Celsius/{tmp=split($0,foo,"[ \t][ \t]*"); cdt=foo[8]; }
        /Current Drive Temperature/{tmp=split($0,foo,"[ \t][ \t]*"); cdt=foo[4]}
        /Current Temperature:/{tmp=split($0,foo,"[ \t][ \t]*"); cdt=foo[3]}
        END{printf("%-60s : %8s %15s (gdf=%d,ucr=%d,uco=%d,pen=%d,cdt=%d,elc=%d) [mdt=%s, poh=%d]\n",vendor " " model,firmware,serial,gdf,ucr,uco,pen,cdt,elc,mdt,poh)}
    '
done
exit 0
Re: [Linux-PowerEdge] Disk temperature
On 10/10/19 9:58 AM, Onno Zweers wrote:
> Hi all,
> Does anyone know how to get the temperature of the disks connected to a Perc H730?

Onno,

You should be able to modify this script to also report Temperatures. (smartmontools' smartctl past some 6.5(?) version supports a MegaRAID passthru)

e.g. SAS drive might report something like:

Current Drive Temperature: 54 C

(SATA will likely report in an Attribute table element)

#!/bin/sh
# Title:   scsi-inq-all
# Purpose: use smartctl to list out INQUIRY data and important stats for all drives it finds.
# Author:  Stephen Dowdy (sdo...@ucar.edu)
# Ref:     https://www.backblaze.com/blog/hard-drive-smart-stats/
# Backblaze uses:
#   SMART 5   – Reallocated_Sector_Count.
#   SMART 187 – Reported_Uncorrectable_Errors.
#   SMART 188 – Command_Timeout.
#   SMART 197 – Current_Pending_Sector_Count.
#   SMART 198 – Offline_Uncorrectable.
# Note that this 'awk' script is designed to only process ONE disk at a time (vars are not rezeroed on end-of-records)
# Warn: this script is extremely dependent upon parsing exact output strings from 'smartctl', which are subject to change.
# SATA and SAS are reported VERY differently by 'smartctl', so we have to do several attempts to string match/parse

#awk=gawk
#awk=mawk
awk=awk

smartctl --scan-open | sed -ne '/^\//{s/[ \t]*#.*$//;p}' | \
while read disk; do
    echo "[ ${disk} ]"
    # ${disk} stays unquoted on purpose: it carries the device-type options from the scan (e.g. "-d scsi")
    #smartctl -a ${disk} | ${awk} -v 'FS=:[[:space:]][[:space:]]*' '
    smartctl -x ${disk} | ${awk} -v 'FS=[ \t]*:[ \t]*' '
        /^(Vendor|Model Family):/{vendor=$2}
        /^(Product|Device Model):/{model=$2}
        /^(Revision|Firmware Version):/{firmware=$2}
        /^(Serial number|Serial Number):/{serial=$2}
        /^Elements in grown defect list:/{gdf=$2}
        /Reallocated_Sector_Ct/{tmp=split($0,foo,"[ \t][ \t]*"); gdf=foo[tmp]}
        /Offline_Uncorrectable/{tmp=split($0,foo,"[ \t][ \t]*"); uco=foo[tmp]}
        /Reported_Uncorrect/{tmp=split($0,foo,"[ \t][ \t]*"); ucr=foo[tmp]}
        /Current_Pending_Sector/{tmp=split($0,foo,"[ \t][ \t]*"); pen=foo[tmp]}
        /^Manufactured in week/{nf=split($0,foo,"[ \t][ \t]*");mdt=sprintf("%d(w)/%d(y)",foo[4],foo[7]);}
        /Power_On_Hours/{tmp=split($0,foo,"[ \t][ \t]*"); poh=foo[tmp]; }
        /number of hours powered up/{tmp=split($0,foo,"[ \t][ \t]*=[ \t][ \t]*");poh=foo[tmp]}
        /Device Error Count/{tmp=split($0,foo,"[ \t][ \t]*"); elc=foo[4]}
        END{printf("%-60s : %8s %15s (gdf=%d,ucr=%d,uco=%d,pen=%d,elc=%d) [mdt=%s, poh=%d]\n",vendor " " model,firmware,serial,gdf,ucr,uco,pen,elc,mdt,poh)}
    '
done
exit 0
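The temperature rules suggested above can be sketched on their own, without touching a real controller. This is only an illustration: the function name extract_temp is made up, the SAS sample line is the one quoted in the message, and the SATA row layout (raw value in the 8th column of 'smartctl -x' output) is an assumption that may not hold for every smartctl version.

```sh
#!/bin/sh
# Sketch: pull a drive temperature out of text shaped like 'smartctl -x' output.
# extract_temp is a hypothetical helper; it reads the text on stdin.

extract_temp() {
    awk '
        # SAS/SCSI style line: "Current Drive Temperature:     54 C"
        /^Current Drive Temperature:/ { n = split($0, f, "[ \t][ \t]*"); cdt = f[4] }
        # SATA style attribute row; raw value assumed to sit in column 8
        /Temperature_Celsius/         { n = split($0, f, "[ \t][ \t]*"); cdt = f[8] }
        # print nothing at all when no temperature line was seen
        END { if (cdt != "") print cdt + 0 }
    '
}
```

Something like `smartctl -x /dev/sda | extract_temp` would be the intended use; for drives behind a PERC the device arguments come from `smartctl --scan-open`, as in the script above.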
Re: [Linux-PowerEdge] R710 not iDRAC upgradeable?
On 10/8/19 11:58 AM, Mauricio Tavares wrote:
> More fun:
>
> # ./IDRAC_FRMW_LX_R218238.BIN
> /tmp/IDRAC_FRMW_LX_R218238.BIN-26250-16229/spsetup.sh: line 124: source: buildVer.sh: file not found
> #
>
> I take that buildVer.sh should be in the package but not being extracted?
>
> # fgrep buildVer.sh IDRAC_FRMW_LX_R218238.BIN
> #

You can extract a .BIN via:

./{FOO}.BIN --extract FOO

to see what's in the package, but be aware, DUP Kits like to have embedded RPMs that extract also, and i think i've seen THREE or more levels of archiving :-/

But, you may also be running into some systems not allowing executable use in /tmp/ (fstab/mount option 'noexec'). I'd try running it from somewhere else, also, or see if:

export TMPDIR=/home/tmp
./IDRAC_FRMW_LX_R218238.BIN

works ?

--stephen
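A quick way to test the 'noexec' theory before rerunning the DUP could look like the sketch below. The helper name is_noexec is hypothetical, and it assumes the usual /proc/mounts layout (mount point in field 2, options in field 4).

```sh
#!/bin/sh
# Sketch: report whether a mount point carries the 'noexec' option.
# is_noexec MOUNTPOINT [MOUNTS_FILE]   (MOUNTS_FILE defaults to /proc/mounts)
# Exit status 0 means noexec is set; 1 means it is not set or the mount point is not listed.

is_noexec() {
    awk -v mp="$1" '
        $2 == mp { opts = $4 }      # if listed more than once, the last entry wins
        END {
            # wrap in commas so "noexec" only matches as a whole option
            if (("," opts ",") ~ /,noexec,/) exit 0
            exit 1
        }
    ' "${2:-/proc/mounts}"
}
```

For example, `is_noexec /tmp && export TMPDIR=/home/tmp` would only redirect the extraction when /tmp really is mounted noexec. Note this only looks at /tmp as its own mount point; if /tmp is just a directory on another filesystem, that filesystem's entry is the one to check.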
Re: [Linux-PowerEdge] R710 not iDRAC upgradeable?
Tangential to the conversation, if anyone's interested in shell code that gets the BMC/iDRAC full firmware revision and type using IPMITool and the ipmi kernel modules, it follows.

This has a number of go-back revisions over time and hasn't been refactored, but it does the job. I needed to update it when Dell started releasing iDRAC firmwares that were in the "release/fix" series ({major}.{minor}.{release}.{fix}), where {major}.{minor} stayed the same, and most tools only report {major}.{minor}, like:

# ipmitool bmc info | grep 'Firmware Revision'
Firmware Revision : 2.61

Note that if you perform a firmware upgrade, you'll have to 'rmmod' all the IPMI modules, then reload them to get updated results.

# something like...
while lsmod | grep -q '^ipmi_'; do modprobe --remove --all $(lsmod | awk '/^ipmi_/{print$1}'); done; modprobe --all ipmi_si ipmi_devintf ipmi_msghandler

# get_sys_bmc_version
2.61.60.60 {iDRAC8}
#

get_sys_bmc_version() {
    # XXX /sys/class/ipmi won't update when the firmware is updated.
    # XXX find a way to reload it if necessary
    # kernel class data doesn't expose name/type (e.g. iDRAC)
    if [ -f /sys/class/ipmi/ipmi0/device/bmc/firmware_revision ]; then
        fwver=$(cat /sys/class/ipmi/ipmi0/device/bmc/firmware_revision)
        if [ -f /sys/class/ipmi/ipmi0/device/bmc/aux_firmware_revision ]; then
            fwver_rev=$(printf "%d" "$(cut -d' ' -f2 /sys/class/ipmi/ipmi0/device/bmc/aux_firmware_revision)")
            fwver_fix=$(printf "%d" "$(cut -d' ' -f1 /sys/class/ipmi/ipmi0/device/bmc/aux_firmware_revision)")
        fi
        if type ipmitool >/dev/null 2>&1; then
            bmctype="$(ipmitool sdr elist mcloc|sed -e 's/[[:space:]].*$//')"
            tmp="$(ipmitool mc info 2>/dev/null | grep 'Firmware Revision' \
                | sed -e 's/^[^:]*:[[:space:]]*\(.*[^ ]\)\([[:space:]]*$\)/\1/')"
            if [ "${fwver}" != "${tmp}" ]; then
                printf "WARNING: /sys/class/ipmi/ipmi0/device/bmc/firmware_revision (=${fwver}) differs from ipmitool mc info (=${tmp})\n" 1>&2
                printf " Perhaps firmware was updated and system not rebooted (or IPMI not reloaded)\n" 1>&2
            fi
        fi
        echo "${fwver}.${fwver_rev}.${fwver_fix}${bmctype:+ {${bmctype}}}"
    else
        if type ipmitool >/dev/null 2>&1; then
            bmctype="$(ipmitool sdr elist mcloc|sed -e 's/[[:space:]].*$//')"
            tmp="$(ipmitool mc info 2>/dev/null | grep 'Firmware Revision' \
                | sed -e 's/^[^:]*:[[:space:]]*\(.*[^ ]\)\([[:space:]]*$\)/\1/')"
            echo "${tmp}${bmctype:+ {${bmctype}}}"
        else
            echo ""
        fi
    fi
}
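The version assembly in the function above can be tried without IPMI hardware by feeding it strings. A minimal sketch, assuming the aux_firmware_revision file holds space-separated hex bytes such as "0x3c 0x3c 0x00 0x00" (that layout and the helper name bmc_full_version are assumptions, not something the post states):

```sh
#!/bin/sh
# Sketch: build {major}.{minor}.{release}.{fix} from the two sysfs strings.
# bmc_full_version FIRMWARE_REVISION AUX_FIRMWARE_REVISION
# Mirrors the field order used above: release = 2nd aux byte, fix = 1st aux byte.

bmc_full_version() {
    fwver=$1
    # printf '%d' converts a "0x.." hex byte to decimal
    rev=$(printf '%d' "$(echo "$2" | cut -d' ' -f2)")
    fix=$(printf '%d' "$(echo "$2" | cut -d' ' -f1)")
    echo "${fwver}.${rev}.${fix}"
}
```

With firmware_revision "2.61" and the aux bytes above, this prints 2.61.60.60, which matches the sample output in the post; whether other iDRAC generations use the same byte order would need checking on real hardware.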
Re: [Linux-PowerEdge] Raid errors
On 10/1/19 10:41 AM, vi...@vheuser.com wrote:
> Sep 26 15:25:47 Debian9 kernel: megaraid_sas :01:00.0: 19474006
> (622841147s/0x0004/CRIT) - Enclosure PD 20(c None/p0) hardware error

The "20" in "Enclosure PD 20" is hex, i.e. 32 decimal. This is the typical enclosure address for Dell server internal enclosures. So, this appears to be reporting a hardware error on the enclosure, rather than with any of the drives.

Probably want to get megacli or perccli and do:

megacli fwtermlog dsply a0
perccli /c0 show termlog

to get more details.

--stephen
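For anyone matching kernel log lines against the [32:N] slot IDs that megacli prints, the hex-to-decimal step can be sketched as below. Both helper names are hypothetical, and the sed pattern assumes the "Enclosure PD NN(" wording from the log line quoted above.

```sh
#!/bin/sh
# Sketch: map the hex enclosure ID from a megaraid_sas log line to decimal.

# log_enclosure_id: read a log line on stdin, print the hex ID after "Enclosure PD"
log_enclosure_id() {
    sed -n 's/.*Enclosure PD \([0-9a-fA-F][0-9a-fA-F]*\)(.*/\1/p'
}

# enclosure_dec HEX: print the decimal value of a bare hex ID (no 0x prefix)
enclosure_dec() {
    printf '%d\n' "0x$1"
}
```

Usage would be something like `enclosure_dec "$(dmesg | log_enclosure_id | head -1)"`, which for the line above gives 32.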
Re: [Linux-PowerEdge] FYI: BIOS Update fails due to CONFIG_IO_STRICT_DEVMEM=y in linux kernel 4.16+
SUMMARY:
- CONFIG_IO_STRICT_DEVMEM is the culprit (not CONFIG_STRICT_DEVMEM)
- iomem=relaxed allows Dell DUP Kits to correctly report and update some firmware that previously failed.
- some Dell DUP Kits work fine in a locked-down environment, others fail. Some fail with error reports, others silently claim to work and fail fully.
- Dell should hopefully fix the DUP kit components to error check and work around restrictions.

On 07/12/2018 02:40 PM, Stephen Dowdy wrote:
> Update: adding 'iomem=relaxed' to the kernel bootparams allows 'biosie' (from the BIOS*.BIN DUPkit) to update the BIOS.

Subject line changed to reflect the accurate kernel knob (CONFIG_IO_STRICT_DEVMEM)

Ref: https://outflux.net/blog/archives/2016/09/28/security-things-in-linux-v4-5/
Ref: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=90a545e981267e917b9d698ce07affd69787db87

Okay, this (iomem=relaxed) also fixed the issue i've been having with the BCM NetXtreme (5720) firmware update DUP kits failing to report the "installed/running" version and failing to actually REALLY update the firmware:

~# ral-superinv
[System Board]
Hostname          :
Manufacturer      : Dell Inc.
Model             : PowerEdge T640
Memory            : 196608 MB
Serial Number     :
BIOS Version      : 1.3.7 (!=1.4.8)
BMC/iDRAC Version : 3.21 {iDRAC9}
[Network]
    Broadcom Adv. Dual 10GBASE-T Ethernet (BCM957416)@18:00.0: 20.08.04.04
    Broadcom Adv. Dual 10GBASE-T Ethernet (BCM957416)@18:00.1: 20.08.04.04
--> Broadcom NetXtreme Gigabit Ethernet (BCM95720)@b1:00.0 : 20.6.52
--> Broadcom NetXtreme Gigabit Ethernet (BCM95720)@b1:00.1 : 20.6.52
[RAID/PERC] : PERC H740P Adapter = 50.3.0-1022

We try to run the Broadcom NetXtreme updater (Network_Firmware_R4HKW_LN_20.8.4.BIN) to get from 20.6 to 20.8, but Dell's DUP Kit can't inventory the "Installed version". It runs and CLAIMS it updated and wants to reboot, but on reboot, the firmware's still 20.6:

.../local/DellUpdates/NETW# ../dup_run_debian 14GEN_Broadcom_NetXtreme -q
/bin/sh appears to be dash, setting it to use BASH, because Dell's sh scripts are non-POSIX and break systems
Collecting inventory...
Running validation...
NetXtreme BCM5720 Gigabit Ethernet PCIe (enp177s0f0)
The version of this Update Package is newer than the currently installed version.
Software application name: NetXtreme BCM5720 Gigabit Ethernet PCIe (enp177s0f0)
Package version: 20.8.4
--> Installed version:
...
Executing update...
WARNING: DO NOT STOP THIS PROCESS OR INSTALL OTHER PRODUCTS WHILE UPDATE IS IN PROGRESS.
THESE ACTIONS MAY CAUSE YOUR SYSTEM TO BECOME UNSTABLE!
The system should be restarted for the update to take effect.
ERROR: Network_Firmware_R4HKW_LN_20.8.4.BIN execution MAY have failed, or you answered NO to reboot, sorry (return_code=2)

(my wrapper script reports return error codes, so this was not a reported error from the DUP kit, but i believe DELL uses 'returncode=2' to indicate user chose to NOT reboot after update, but i'm not going to blindly trust that given that many DUPs behave differently)

Clearly, there's some failure to do error checks and report in this particular DUP kit.

BUT, the equivalent 20.8 Broadcom NetXtreme-E Updater (Network_Firmware_3VXHM_LN64_20.08.04.04.BIN) works *flawlessly* under the same conditions. (that's why my BCM957416's are updated to 20.08). So, *something* can be done on Dell's side to make these DUP kits actually work under these conditions.

I verified that booting with 'iomem=relaxed' allows the 'Network_Firmware_R4HKW_LN_20.8.4.BIN' to actually correctly query the installed version, *AND* update the firmware properly. (but there's a security side-effect of running this way)

thanks,
--stephen
--
Stephen Dowdy - Systems Administrator - NCAR/RAL
303.497.2869 - sdo...@ucar.edu - http://www.ral.ucar.edu/~sdowdy/
Re: [Linux-PowerEdge] FYI: BIOS Update fails due to CONFIG_STRICT_DEVMEM=y in linux kernel 4.16+
Update: adding 'iomem=relaxed' to the kernel bootparams allows 'biosie' (from the BIOS*.BIN DUPkit) to update the BIOS.

The docs on this suck, though:

.../src/linux-source-4.16/Documentation# sed -ne '/iomem=/,/^$/p' ./admin-guide/kernel-parameters.txt
        iomem=          Disable strict checking of access to MMIO memory
                strict  regions from userspace.
                relaxed

No indications of the risks involved (okay, i presume this is a little better than all of /dev/mem, as it would still immediately protect against memory scraping sensitive user process data and such, but it still lets a potential cracker control/manipulate devices)

But unless i run that way always, rebooting an alternate grub line just to BIOS update adds an extra reboot. (the goal is to not have to sit at the console, take long periods in lifecycle controller, ... or do multiple reboot hacks)

(I'm getting the feeling there's going to be no easy workaround anymore past Kernel 4.16 because the SMBIOS table is iomapped outside 1MB)

--stephen
--
Stephen Dowdy - Systems Administrator - NCAR/RAL
303.497.2869 - sdo...@ucar.edu - http://www.ral.ucar.edu/~sdowdy/
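A wrapper that runs DUPs could check up front whether the running kernel was actually booted with the workaround. A minimal sketch, assuming the parameter appears as a standalone word on the kernel command line (the helper name has_iomem_relaxed is made up):

```sh
#!/bin/sh
# Sketch: succeed (exit 0) if the kernel command line contains iomem=relaxed.
# has_iomem_relaxed [CMDLINE_FILE]   (CMDLINE_FILE defaults to /proc/cmdline)

has_iomem_relaxed() {
    # one parameter per line, then require an exact whole-line match
    tr ' ' '\n' < "${1:-/proc/cmdline}" | grep -qx 'iomem=relaxed'
}
```

For example: `has_iomem_relaxed || echo "booted without iomem=relaxed, BIOS DUP will likely fail" 1>&2`.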
[Linux-PowerEdge] FYI: BIOS Update fails due to CONFIG_STRICT_DEVMEM=y in linux kernel 4.16+
New PowerEdge T640 server needs a BIOS update, however, it borks out with:

terminate called after throwing an instance of 'smbios::InternalErrorImpl'
  what(): Could not instantiate SMBIOS table.
/opt/dell/updatepackage/BIOS_4F4K0_LN_1.4.8.BIN-64420.wnGVr8/spsetup.sh: line 1331: 65177 Aborted $cmd "$@"
. The update failed to complete
ERROR: BIOS_4F4K0_LN_1.4.8.BIN execution MAY have failed, or you answered NO to reboot, sorry (return_code=1)

(NOTE: the last message is from my 'dup_run_debian' wrapper script that works around various DUP kit issues related to non-POSIX /bin/sh invocations, RHEL-presumptions, xterm forced uses, blah, blah, blah of various DUP kits that i've had to struggle with in the past. THANKFULLY, Dell's getting serious about fixing some of these things in the past couple years)

i found the error is in spsetup.sh calling _execCmd() on 'biosie -u', and strace sez:

3359 open("/dev/mem", O_RDONLY)= 3
3359 mmap(NULL, 65536, PROT_READ, MAP_PRIVATE, 3, 0xe) = 0x7fb6f4c63000
3359 munmap(0x7fb6f4c63000, 65536) = 0
3359 mmap(NULL, 65536, PROT_READ, MAP_PRIVATE, 3, 0xf) = 0x7fb6f4c63000
3359 munmap(0x7fb6f4c63000, 65536) = 0
3359 mmap(NULL, 65536, PROT_READ, MAP_PRIVATE, 3, 0x6ca0) = -1 EPERM (Operation not permitted)
3359 futex(0x7fb6f3bf01a0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
3359 write(2, "terminate called after throwing "..., 48) = 48
3359 write(2, "smbios::InternalErrorImpl", 25) = 25
3359 write(2, "'\n", 2)= 2
3359 write(2, " what(): ", 11) = 11
3359 write(2, "Could not instantiate SMBIOS tab"..., 35) = 35
3359 write(2, "\n", 1)= 1

ran across this:
https://www.phoronix.com/scan.php?page=news_item&px=Linux-4.16-Def-Strict-Dev-Mem
"...Enabling CONFIG_STRICT_DEVMEM implements strict access to /dev/mem so that it only allows user-space access to memory mapped peripherals"

~/BIOS# grep STRICT_DEVMEM /boot/config-$(uname -r)
CONFIG_STRICT_DEVMEM=y
CONFIG_IO_STRICT_DEVMEM=y

Sigh.

~/BIOS# cat /dev/mem | wc -c
cat: /dev/mem: Operation not permitted
1048576

So, we have a 1MB access limit, as expected with CONFIG_STRICT_DEVMEM, and sure enough, 'biosie -u' is trying to read offset 0x6ca0, which is definitely > 1MB (the first two are below 1MB and succeed)

AFAICT, this requires rebuilding a kernel from scratch with 'CONFIG_STRICT_DEVMEM=n'

Can anyone from Dell identify if there's something that can be done either in 'biosie' or by arm-twisting Linux kernel devs to allow other regions to be mapped? (if that's necessary).

(is there ANY way i, as a normal customer, can use a formal channel to report bugs/misfeatures like this?)

thanks,
--stephen
--
Stephen Dowdy - Systems Administrator - NCAR/RAL
303.497.2869 - sdo...@ucar.edu - http://www.ral.ucar.edu/~sdowdy/
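The grep above can be wrapped so a pre-flight script reports both knobs separately, since the thread later pins the failure on the IO variant. A rough sketch, assuming a distro-style config file under /boot (the helper name strict_devmem_status is hypothetical):

```sh
#!/bin/sh
# Sketch: report which of the two /dev/mem restriction knobs are built in.
# strict_devmem_status [CONFIG_FILE]   (defaults to /boot/config-$(uname -r))

strict_devmem_status() {
    cfg=${1:-/boot/config-$(uname -r)}
    for knob in CONFIG_STRICT_DEVMEM CONFIG_IO_STRICT_DEVMEM; do
        # anchor at line start so the IO variant is not counted for the plain one
        if grep -q "^${knob}=y\$" "$cfg"; then
            echo "${knob}=y"
        else
            echo "${knob} not set"
        fi
    done
}
```

This only reads the build-time config; it says nothing about whether 'iomem=relaxed' was passed at boot.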
Re: [Linux-PowerEdge] MegaCLI compatibility in StorCli (with added typos) ?
On 02/27/2018 03:47 PM, vinc...@cojot.name wrote:
>
> Hi,
> Has anyone noticed that storcli (version 1.23.02 tested) appears to support
> MegaCli's syntax? Does anyone know if this is here to stay?

I've written about this several times in the past, e.g.:

http://lists.us.dell.com/pipermail/linux-poweredge/2015-January/049533.html
https://www.mail-archive.com/search?l=linux-poweredge@dell.com&q=subject:%22%5C%5BLinux%5C-PowerEdge%5C%5D+PE+r710+%5C-+MegaCli+not+working%22&o=newest&f=1

storcli/perccli help legacy

--stephen
--
Stephen Dowdy - Systems Administrator - NCAR/RAL
303.497.2869 - sdo...@ucar.edu - http://www.ral.ucar.edu/~sdowdy/
Re: [Linux-PowerEdge] PE r710 - MegaCli not working
On 01/22/2018 10:11 AM, George Machitidze wrote:
> megacli/storcli won't work, use perccli instead

As i said, early storcli versions work fine. Also, MegaCLI has worked just fine, as well, but i haven't updated since 8.07.10, so maybe they did the same disablement on SubVendorID in a later version (but LSI/Avago stopped updating MegaCLI a while back in favor of storcli)

The change to storcli was made sometime between 1.03.11 and 1.04.07.

finds my PERC in 1.03.11

# storcli64-1.03.11 help | grep Ver
Storage Command Line Tool Ver 1.03.11 Jan 30, 2013
# storcli64-1.03.11 show ctrlcount | grep Count
Controller Count = 1

doesn't in 1.04.07

# storcli64-1.04.07 help | grep Ver
Storage Command Line Tool Ver 1.04.07 Apr 2, 2013
# storcli64-1.04.07 show ctrlcount | grep Count
Controller Count = 0

Example of much older MegaCLI using a relatively recent PERC H730 Mini:

# /usr/local/LSI-Tools/megacli -V | grep Ver
MegaCLI SAS RAID Management Tool Ver 8.07.10 May 28, 2013
# /usr/local/LSI-Tools/megacli-overview
[Adapter 0]
ADP[0]=( PERC H730 Mini),FWPkg( 25.5.0.0018),FWVer( 4.270.00-8112)
BBU[0]=(Learning?=No, charge=93 %, status=Complete, isSOHGood=Yes)
PD[32:0]=(SEAGATE/ST300MM0006/LS0B/S0KX) ME=0,OE=0,PF=0,FW=Online, Spun Up,F=None
PD[32:1]=(SEAGATE/ST300MM0006/LS0B/S0KX) ME=0,OE=0,PF=0,FW=Online, Spun Up,F=None
PD[32:2]=(SEAGATE/ST300MM0006/LS0B/S0KX) ME=0,OE=0,PF=0,FW=Online, Spun Up,F=None
PD[32:3]=(SEAGATE/ST300MM0006/LS0B/S0KX) ME=0,OE=0,PF=0,FW=Online, Spun Up,F=None
PD[32:4]=(SEAGATE/ST300MM0006/LS0B/S0KX) ME=0,OE=0,PF=0,FW=Online, Spun Up,F=None
PD[32:5]=(SEAGATE/ST300MM0006/LS0B/S0KX) ME=0,OE=0,PF=0,FW=Online, Spun Up,F=None
PD[32:6]=(SEAGATE/ST300MM0006/LS0B/S0KX) ME=0,OE=0,PF=0,FW=Online, Spun Up,F=None
PD[32:7]=(SEAGATE/ST300MM0006/LS0B/S0KX) ME=0,OE=0,PF=0,FW=Online, Spun Up,F=None
VD[0:0]=("OpSys",R(6,0,3),SZ= 249.999 GB,SS=128 KB,CP=(W=WriteBack,R=RANone,IO=D),# 8,Optimal)
VD[0:1]=("Exports",R(6,0,3),SZ= 1.389 TB,SS=128 KB,CP=(W=WriteBack,R=RANone,IO=D),# 8,Optimal)

If you can't get your head around 'storcli/perccli' syntax or have existing scripts that parse MegaCLI (so much fun), realize it has the entire MegaCLI parser secretly hidden inside ;)

(oh, i actually just noticed, it *IS* "documented" with: storcli/perccli help legacy )

(not super current version, but haven't needed newer)

# /usr/local/LSI-Tools/perccli version
Storage Command Line Tool Ver 1.17.10 October 21, 2015
(c)Copyright 2015, AVAGO Corporation, All Rights Reserved.

storcli/perccli command format:

# /usr/local/LSI-Tools/perccli /c0 show termlog | head -5
Firmware Term Log Information on controller 0:
ateChannelBankInfo: Available Channels: 1
T4: C0:onfiUpdateChannelBankInfo: Available Banks : 1
T4: C0:**DEBUG:LUN Interleave: 0x11
T4: C0:onfiDeviceInfo[0].mainAreaSize:8192

MegaCli format:

# /usr/local/LSI-Tools/perccli fwtermlog dsply a0 | head -5
Firmware Term Log Information on controller 0:
ateChannelBankInfo: Available Channels: 1
T4: C0:onfiUpdateChannelBankInfo: Available Banks : 1
T4: C0:**DEBUG:LUN Interleave: 0x11
T4: C0:onfiDeviceInfo[0].mainAreaSize:8192

--stephen
--
Stephen Dowdy - Systems Administrator - NCAR/RAL
303.497.2869 - sdo...@ucar.edu - http://www.ral.ucar.edu/~sdowdy/
Re: [Linux-PowerEdge] PE r710 - MegaCli not working
On 01/22/2018 09:53 AM, Howard, Chris wrote:
> I don't use it often but I think it was working.
> Now it is not working. It always says there are no controllers.

FWIW, sometimes the MegaRAID kernel drivers get wonked (esp when something is quite screwed up on the RAID itself), requiring a reboot for megacli to work, if you haven't done that.

There was also a split a few years back in LSI/Avago/Broadcom/whatever land where 'storcli' and 'perccli' split functionality depending upon whether it was a pure LSI device or a Dell/other sub-brand. ('storcli' used to work for Dell controllers, but then LSI/Ava... switched it to (i presume) look at the PCI SubVendor ID, and it will report "No Controller" if it is a Dell model.)

I continue to use MegaCli64-8.07.10 for all my PERC controllers and haven't had an issue with anything even current. (It Works For Me)

--stephen
--
Stephen Dowdy - Systems Administrator - NCAR/RAL
303.497.2869 - sdo...@ucar.edu - http://www.ral.ucar.edu/~sdowdy/
[Linux-PowerEdge] FYI using pdsh for mass execution (Re: dsu woes)
On 01/10/2018 03:27 PM, Patrick Boutilier wrote:
> What I would do is download the BIOS update file once and then create a
> script to rsync it to the 1500 hosts and run the update. I presume you would
> have a script to run dsu on the 1500 hosts anyway?

First: i gave up on using Dell's enterprise management tools due to constant heartache/headache/frustration. It definitely makes me sad that these tools continually change, but never actually get much better. (okay, it looks like they are finally attacking the BASHisms in some of their scripts that borked Debian/Ubuntu systems badly, but the continual lack of correctness/current-cy, etc just pains me).

Second: FYI, LLNL's 'pdsh' works great for this. My following examples require ssh public key trust for 'root' (running DUPs requires 'root'):

** note that your private key should be offline until loaded in your ssh-agent (oof, Meltdown/Spectre, sigh)

WARNING: as with any tool that allows mass-execution, if you screw up, you've now multiplied that screwup to a large number of systems, so always be careful.

If you have a file of hostnames already:

pdsh -lroot -g {file} 'command-to-run-on-all-remote-systems'

(file is usually dshgroup module selected so ~/.dsh/group/{file} or /etc/dsh/group/{file})

If you wanna hit everything in /etc/hosts, instead:

awk '$0!~/^(#.*| *)$/{print$2}' /etc/hosts | WCOLL=- pdsh -Rexec scp FOO-2.7.0.BIN root@%h:/dev/shm/

explained: for all non-comment/non-blank lines, print the hostname (field 2), set pdsh's WCOLL envvar so the file containing hostnames is stdin (-), and use the exec module to scp your DUP.BIN file, substituting %h for each hostname successively.

awk '$0!~/^(#.*| *)$/{print$2}' /etc/hosts | WCOLL=- pdsh -lroot '/dev/shm/FOO-2.7.0.BIN -q'

now, issue an 'ssh' session to all hosts to run the DUP.BIN with '-q'. ('-q' doesn't display changelog or prompt to run, also won't reboot after completion automatically)

Note that 'pdsh' fans-out commands, running "N" jobs simultaneously (default 32). I limit mine to 8 so i can use some special gateway-hop syntax (e.g. cluster1-admin!!node3) with custom ~/.ssh/config rules to bounce past the admin nodes on clusters into the backend compute nodes. This avoids the default 'sshd' connection throttling limits (usually 10 simultaneous connections). The ssh-config rule:

Host *!!*
    GatewayPorts no
    ProxyCommand $(h="%h";p="%p" ; echo ssh -W ${h##*\!\!}:%p -l root ${h%%\!\!*})

It's much easier to use libgenders or dshgroup style files for this kind of thing (than /etc/hosts and awk, etc), so you can use attribute selectors (genders) like:

gpsh -lroot -g 'model=poweredge_r730' 'do-something'

(it's up to you to create a genders file with the right attributes filled in)

records in my genders file, as created from a scripted MySQL asset database extraction, look like:

host99 name=host99,manu=dell,model=precision_t3400,hwtype=desktop,sn=XXX,os=debian_linux,status=in_use,user=godot,responsible=godot,purpose=user_room_linux,sa1=sdowdy,project=unknown,location=fl2-2094

Unfortunately, 'genders' doesn't support REGEX :-( but you can use regex selection on hostnames in pdsh (just not attributes), like:

pdsh -lroot -g 'hwtype=desktop' -w '/engr-.*/' ...

to only hit the systems that are desktops and filter-down to only names with "engr-" in them.

--stephen
--
Stephen Dowdy - Systems Administrator - NCAR/RAL
303.497.2869 - sdo...@ucar.edu - http://www.ral.ucar.edu/~sdowdy/
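Before pointing either trick at 1500 hosts, the two text-munging pieces can be dry-run on their own. The sketch below wraps the same awk filter and the same parameter expansions used above; the function names list_hosts and split_hop are made up for illustration, and no ssh or pdsh is involved.

```sh
#!/bin/sh
# Sketch: dry-run helpers for the host list and the gateway!!target syntax.

# list_hosts: read hosts-file text on stdin, print field 2 of every
# line that is neither a comment nor blank (same awk filter as above)
list_hosts() {
    awk '$0!~/^(#.*| *)$/{print$2}'
}

# split_hop 'gateway!!target': print "gateway target"
split_hop() {
    h=$1
    sep='!!'                 # kept in a variable so the pattern is matched literally
    gw=${h%%"$sep"*}         # everything before the first !!
    tgt=${h##*"$sep"}        # everything after the last !!
    echo "$gw $tgt"
}
```

Running `list_hosts < /etc/hosts` shows exactly which names pdsh would be handed, and `split_hop 'cluster1-admin!!node3'` shows which side of the hop the ProxyCommand treats as the gateway.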
Re: [Linux-PowerEdge] 2 predicted failure disks and RAID5
On 11/14/2017 11:52 AM, Grzegorz Bakalarski wrote:
> Thanks for valuable input.
> Regarding punctured block: from fwtermlog I got several (not much) lines of
> type:
>
> 11/13/17 3:24:45: EVT#08603-11/13/17 3:24:45: 97=Puncturing bad block on
> PD 02(e0x20/s2) at 9ecd

That's bad. You have a punctured stripe.

> T35: maintainPdFailHistory=0 disablePuncturing=0
> zeroBasedEnclEnumeration=1 disableBootCLI=1

This is an informational line indicating that the controller doesn't have the disablePuncturing config option set.

> All the same PD, the same bad block (different time)
>
> Is my raid useless?

No, it's good enough to recover what data you can before you rebuild it. However, you can't trust the data that uses the bad block. You'll get a read error from any object that maps to it.

Here's a good doc Dell put out: https://www.dell.com/support/article/us/en/4/438291#2

"...If the data within a punctured stripe is accessed errors will continue to be reported against the affected badLBAs with no possible correction available. Eventually (this could be minutes, days, weeks, months, etc.), the Bad Block Management (BBM) Table will fill up causing one or more drives to become flagged as predictive failure...."

> BTW: why do think raid level migration to raid-6 with 2 additional disk would
> be better than with one disk. I would keep VD size the same.

I'm not talking about a migration, i'm talking about a complete WIPE of what you have, and a recreation from scratch. At this point, you can recover what you can to a staging location, rebuild, then restore. Keep track of data with I/O errors, because it's going to have a corrupted block at the punctured block address. This could (if you're lucky) be in unallocated space. It could also be in filesystem structures and lead to widescale corruption of the filesystem.

I would mount it all READONLY and do a file-level dump (not a 'dd' or anything like that, which would migrate corrupted filesystem structures). (i typically 'rsync' data to another machine.) You don't want any backup tool that does infinite retries, as it'll likely result in another disk failure. (from the above)

> Anyway will migration too raid-6 fail with this "awful Puncturing)???

RAID-6 is going to lessen the likelihood of a puncture, with 2 parity drives. While you're rebuilding a RAID5, any unrecoverable bad block event on any of the "good" drives during the rebuild will result in a puncture. With RAID6, you still have parity to cope with an uncorrectable error.

The above is especially true of some of the less reliable seagate drives from past years. You can't count on them not throwing UCEs during a rebuild (or before you get the replacement drive installed), thereby puncturing the RAID. :-(

--stephen
--
Stephen Dowdy - Systems Administrator - NCAR/RAL 303.497.2869 - sdo...@ucar.edu - http://www.ral.ucar.edu/~sdowdy/
Re: [Linux-PowerEdge] 2 predicted failure disks and RAID5
On 11/14/2017 09:52 AM, Grzegorz Bakalarski wrote:
> I have a server (R815) with PERC H710. I have 4 disks and RAID-5 on them. The
> server have 2 empty disk slots.
>
> After night I noticed 2 disks got predicted failure state - LEDs on disk
> flash yellow (not green) on them (2 of them). MegaCLI shows that 2 disks have
> high "Predictive Failure Count:" - some few thousands.

GB,

*IF* this all rebuilds fine and goes Optimal again, you really want to:

  megacli fwtermlog dsply aall | grep -i punct

If you see a punctured block, you're gonna have to back off what you can, rebuild the RAID from scratch and restore, because there's no good way to fix a punctured stripe. (ignore the lines that say the controller supports puncturing).

If you have to rebuild, i'd go RAID6 with your 6 drives.

--stephen
--
Stephen Dowdy - Systems Administrator - NCAR/RAL 303.497.2869 - sdo...@ucar.edu - http://www.ral.ucar.edu/~sdowdy/
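Since the plain 'grep -i punct' also matches the harmless settings line, a slightly narrower check may be handy. This is only a sketch: 'count_punctures' is a hypothetical helper name, and the match string is taken from the single event line quoted elsewhere in this thread, so other firmware versions may word it differently:

```shell
#!/bin/sh
# count_punctures: hypothetical helper; reads controller log text on stdin
# and prints the number of lines that report a puncture event.
# Assumption: events contain the literal text "Puncturing bad block".
count_punctures() {
    # The "disablePuncturing=0" settings line does not contain this
    # phrase, so it is not counted.
    grep -c 'Puncturing bad block'
}

# Example with two sample log lines (one event, one settings line):
printf '%s\n' \
    '11/13/17 3:24:45: EVT#08603-11/13/17 3:24:45: 97=Puncturing bad block on PD 02(e0x20/s2) at 9ecd' \
    'T35: maintainPdFailHistory=0 disablePuncturing=0 zeroBasedEnclEnumeration=1 disableBootCLI=1' \
    | count_punctures
```

On a live system the input would come from 'megacli fwtermlog dsply aall | count_punctures'.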
Re: [Linux-PowerEdge] Upgrading firmware under CentOS7
On Mon, Oct 24, 2016 at 9:04 AM, Stephen Dowdy <sdo...@ucar.edu> wrote: > SUMMARY: you could use linux namespaces (see proof-of-concept below) Since i failed to explicitly state WHY using this over 'mount -o remount,exec /tmp', the point would be to NOT enable a potential GLOBAL /tmp trojan/drop attack (the main point behind NOEXEC use on /tmp) even during a short window (where "short" can be as long as like 30 minutes with an iDRAC update) --stephen -- Stephen Dowdy - Systems Administrator - NCAR/RAL 303.497.2869 - sdo...@ucar.edu- http://www.ral.ucar.edu/~sdowdy/ ___ Linux-PowerEdge mailing list Linux-PowerEdge@dell.com https://lists.us.dell.com/mailman/listinfo/linux-poweredge
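The whole question only matters when /tmp really is mounted noexec, so a wrapper could check that first. A minimal sketch, assuming a Linux-style mounts table; 'tmp_is_noexec' is a hypothetical helper name, and the commented commands are an illustration of the namespace idea (they need root and are not the proof-of-concept from the earlier message):

```shell
#!/bin/sh
# tmp_is_noexec: hypothetical helper; succeeds (exit 0) if the mounts table
# given as $1 (default /proc/mounts) lists /tmp with the noexec option.
tmp_is_noexec() {
    awk '$2 == "/tmp" { opts = $4 }   # the last /tmp entry wins
         END { exit ((("," opts ",") ~ /,noexec,/) ? 0 : 1) }' \
        "${1:-/proc/mounts}"
}

# Illustration only (requires root; ./update.BIN is a placeholder name):
#   if tmp_is_noexec; then
#       unshare --mount -- sh -c '
#           mount --make-rprivate / &&
#           mount -t tmpfs -o exec,nosuid,nodev tmpfs /tmp &&
#           exec ./update.BIN -q'
#   fi
```

The exec-enabled /tmp is then visible only inside the private mount namespace, so the global /tmp stays noexec for everything else.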
Re: [Linux-PowerEdge] Upgrading firmware under CentOS7
ty tmp folder chown "root:$TARGET_USER" "$NEWTMP" chmod 770 "$NEWTMP" unshare --mount -- /bin/bash -c "mount -o bind,noexec,nosuid,nodev '$NEWTMP' /tmp && sudo -u '$TARGET_USER' $TARGET_CMD" --stephen On Mon, Oct 24, 2016 at 7:21 AM, Robert Jacobson <teri...@gmail.com> wrote: > > This should be considered a bug in DUP. It should be easily resolvable by > allowing users to customizing the temporary working directory; e.g. using > an environment variable. > > > > -Original Message- > > From: linux-poweredge-boun...@dell.com [mailto:linux-poweredge- > > boun...@dell.com] On Behalf Of Thibaut Pouzet > > Sent: Monday, October 24, 2016 3:47 AM > > To: linux-poweredge@dell.com > > Subject: Re: [Linux-PowerEdge] Upgrading firmware under CentOS7 > > > > > > Le 21/10/2016 à 17:57, Davide Ferrari a écrit : > > > > > > Hello, > > > > > > CentOS7 comes with /tmp with no exec permissions by default, but > all > > Dell furmware upgrade pacakges uses /tmp as the default (and > > uncustomizable) path to unpack and execute the actual FW upgrade binary > > blob. Is there any official way to do it properly without remounting /tmp > > with "exec" (or replacing /tmp with /var/tmp in the bash wrapper code > > inside the package)? > > > > > > Thanks > > > > > > -- > > > > Davide Ferrari > > > > Senior Systems Engineer > > > > > > > > > > ___ > > Linux-PowerEdge mailing list > > Linux-PowerEdge@dell.com <mailto:Linux-PowerEdge@dell.com> > > https://lists.us.dell.com/mailman/listinfo/linux-poweredge > > > > Hi, > > > > Not that I've heard of sorry. 
I do this and this works just fine : before > > launching the update, I run : > > sudo mount -o remount,exec /tmp > > > > Once I'm done : > > sudo mount -o remount,noexec /tmp > > > > I'm not aware of any other magical solution > > > > Cheers > > > > > > -- > > Thibaut Pouzet > > Lyra Network > > Expert Sécurité > > (+33) 5 31 22 40 08 > > www.lyra-network.com <http://www.lyra-network.com> > > > ___ > Linux-PowerEdge mailing list > Linux-PowerEdge@dell.com > https://lists.us.dell.com/mailman/listinfo/linux-poweredge > -- Stephen Dowdy - Systems Administrator - NCAR/RAL 303.497.2869 - sdo...@ucar.edu- http://www.ral.ucar.edu/~sdowdy/ ___ Linux-PowerEdge mailing list Linux-PowerEdge@dell.com https://lists.us.dell.com/mailman/listinfo/linux-poweredge
Re: [Linux-PowerEdge] Perc H700 firmware upgrade Debian
Helmut, perccli /c${adp} download file=${fwfile} -or- megacli adpfwflash -f ${fwfile} -a${adp} using the .ROM file from the DUP payload directory will actually work, whereas the RHEL-specific Dell stuff may/usually doesn't (on Debian). NOTE: 'perccli' is avail from Dell, 'megacli' is avail (though deprecated) from LSI/Avago (harder to find) --stephen On Tue, May 24, 2016 at 5:55 AM, Helmut Wollmersdorfer < helmut.wollmersdor...@fixpunkt.de> wrote: > Hi, > > tried to upgrade Perc H700 Integrated firmware on Debian (Wheezy). > > # /opt/dell/srvadmin/bin/omreport about > > Product name : Dell OpenManage Server Administrator > Version : 7.4.0-1 > Copyright: Copyright (C) Dell Inc. 1995-2013 All rights reserved. > Company : Dell Inc. > > > # /opt/dell/srvadmin/bin/omreport storage controller > Controller PERC H700 Integrated (Slot 4) > > Controllers > ID: 0 > Status: Non-Critical > Name : PERC H700 Integrated > Slot ID : PCI Slot 4 > State : Degraded > Firmware Version : 12.10.1-0001 > Latest Available Firmware Version : 12.10.5-0001 > Driver Version: 00.00.06.12-rc1 > Minimum Required Driver Version : Not Applicable > Storport Driver Version : Not Applicable > Minimum Required Storport Driver Version : Not Applicable > […] > > # cat /etc/debian_version > 7.9 > > > # wget > http://downloads.dell.com/FOLDER03292779M/4/SAS-RAID_Firmware_2948G_LN_12.10.7-0001_A13_02.BIN > > # chmod 777 SAS-RAID_Firmware_2948G_LN_12.10.7-0001_A13_02.BIN > > # bash ./SAS-RAID_Firmware_2948G_LN_12.10.7-0001_A13_02.BIN --extract > RAIDFW > # cd RAIDFW/ > # ./sasdupie -u -s payload/ > lang="en">0 > > There is no “The operation was successful” and > “1” as answer of the last step. So it went > wrong. > > > Is it possible to upgrade from the shell? How can I upgrade else? 
> > TIA > > Helmut Wollmersdorfer > > ___ > Linux-PowerEdge mailing list > Linux-PowerEdge@dell.com > https://lists.us.dell.com/mailman/listinfo/linux-poweredge > > -- Stephen Dowdy - Systems Administrator - NCAR/RAL 303.497.2869 - sdo...@ucar.edu- http://www.ral.ucar.edu/~sdowdy/ ___ Linux-PowerEdge mailing list Linux-PowerEdge@dell.com https://lists.us.dell.com/mailman/listinfo/linux-poweredge
Re: [Linux-PowerEdge] FW:
The thing to be careful with, is that there is not just *ONE* PERC 6/i. There's: PERC 6/i Adapter (regular PCI card that goes in PCI slots) PERC 6/i Integrated (it's really about the same card, but the PCI bracket is removed and it's in a servers' Integrated RAID card housing) (IIRC, the latest version i could find for Integrated was 6.3.3, and 6.3.1 for the Adapter. That may be because Dell didn't care to update the Adapter form, or the specific fwupdate for the integrated only addressed specific sets of disks and such that Dell sold in those configurations) There are different firmware updates for these based upon the device's PCI Vendor/Device SubVendor/SubDevice identifiers. btw, For H700, etc, there's 3 or 4 different versions (Adapter, Integrated, Mini, MiniP, ...) Again, distinguished by the PCI identifiers. The firmware images refuse to be burned to the wrong card even though they're all "H700" family, e.g. I have a script that identifies which particular device(s) you have in your machine, and uses a table mapping the LSI Firmware ROM names to each Vendor_Device_SubV_SubD string that i yoinked out of the .ROM images themselves. It uses LSI's 'megacli' to flash those ROM images to all discovered devices. MUCH, MUCH, MUCH less error prone than Dell's overengineered DUP/OpenManage, and it runs w/o modification on Debian (BIG WIN!) --stephen On Fri, Apr 15, 2016 at 10:55 AM, Karsten Suehring <suehringli...@gmail.com> wrote: > Hi, > > I remember having trouble with the PERC6/i update on old R1950 machines a > while ago. I noticed that there existed two Linux update packages with the > same version number, but a different cryptic string part in the file name. > One worked, the other did not. The two versions showed up on different > server types on the dell support site. I think the one that worked was > named "SAS-RAID_Firmware_W83M2_LN32_6.3.1-0003_A14.BIN" (at least that's > the file that I still have). 
If you Google the name, it will point you to > the Dell download servers. > > Best regards, > Karsten > > On Fri, Apr 1, 2016 at 3:03 PM, Andrew Barkley <abark...@crawfordtech.com> > wrote: > >> kernel: sasdupie[8998]: segfault at 20 ip 7f3c3ecee00d sp >> 7ffc73039a20 error 4 in sasdupie[7f3c3ecc9000+11] >> -- >> *From:* soorej_ponna...@dell.com [soorej_ponna...@dell.com] >> *Sent:* Thursday, March 31, 2016 10:55 PM >> *To:* Andrew Barkley; linux-powere...@lists.us.dell.com >> *Subject:* RE: [Linux-PowerEdge] FW: >> >> *Dell - Internal Use - Confidential * >> >> >> >> Hi, >> >> >> >> DSU does not support PE2950 server. For the firmware >> segfault, please attach the log messages so that the device team can have a >> look. >> >> >> >> *Soorej Ponnandi* >> >> *Dell* | Change Management >> >> *office* + 91 80 2807 7759 Extn: 78469 >> >> >> >> *From:* linux-poweredge-bounces-Lists *On Behalf Of *Andrew Barkley >> *Sent:* Friday, April 1, 2016 12:49 AM >> *To:* linux-poweredge-Lists <linux-powere...@lists.us.dell.com> >> *Subject:* [Linux-PowerEdge] FW: >> >> >> >> I am using the new repository, found through this site: >> http://linux.dell.com/repo/hardware/dsu/ >> <http://redir.aspx?REF=d6Jgeo73qRVnmpNJgZl04FeKUlyNwTuxupig2xuCAUagu36mLVrTCAFodHRwOi8vbGludXguZGVsbC5jb20vcmVwby9oYXJkd2FyZS9kc3Uv> >> -- >> >> I am trying to update the PERC 6/i firmware in a PE 2950 running CentOS >> 6.7 and the update package is segfaulting. >> >> ___ >> Linux-PowerEdge mailing list >> Linux-PowerEdge@dell.com >> https://lists.us.dell.com/mailman/listinfo/linux-poweredge >> >> > > ___ > Linux-PowerEdge mailing list > Linux-PowerEdge@dell.com > https://lists.us.dell.com/mailman/listinfo/linux-poweredge > > -- Stephen Dowdy - Systems Administrator - NCAR/RAL 303.497.2869 - sdo...@ucar.edu- http://www.ral.ucar.edu/~sdowdy/ ___ Linux-PowerEdge mailing list Linux-PowerEdge@dell.com https://lists.us.dell.com/mailman/listinfo/linux-poweredge
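The script itself is not shown in this thread, but the identifier-building step it describes can be sketched. This assumes the Linux sysfs layout for PCI devices (files named vendor, device, subsystem_vendor, subsystem_device); 'pci_id_string' is a hypothetical helper name, and the table that maps the resulting string to a ROM file is not reproduced here:

```shell
#!/bin/sh
# pci_id_string: hypothetical helper; prints Vendor_Device_SubV_SubD for one
# PCI device directory laid out like /sys/bus/pci/devices/0000:03:00.0.
pci_id_string() {
    dev=$1
    ids=
    for f in vendor device subsystem_vendor subsystem_device; do
        v=$(cat "$dev/$f") || return 1
        ids="${ids}${ids:+_}${v#0x}"   # append, dropping the leading "0x"
    done
    printf '%s\n' "$ids"
}

# Illustration: print the identifier of every PCI device on this host.
#   for d in /sys/bus/pci/devices/*; do pci_id_string "$d"; done
```

Each printed string could then be looked up in a table keyed the same way to pick the matching firmware image.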
Re: [Linux-PowerEdge] R720XD dead after IDRAC update trial
Grzegorz,

I've had a motherboard replacement needed due to iDRAC updates before. It's pretty frustrating.

The only "emergency" procedure you might try:

 - disconnect power cords to PSUs
 - press and HOLD the power button to "drain the Flea power" (i presume this is a capacitor store residuals?) Hold for 30 seconds
 - reconnect PSUs to power
 - power on and enable CAPS LOCK, SCROLL LOCK and NUM LOCK (may not be able to do immediately). This *MAY* require you get into the BIOS "Setup" menu () (which you indicate seems unlikely)
 - Press , ,
   I can only imagine this stands for "Emergency F*ing Boot"

If you are lucky, then the system will be reset. If not, you might want to give up and have the Motherboard replaced (sigh)

Ref: http://en.community.dell.com/techcenter/systems-management/w/wiki/3464.troubleshooting-idrac6-issues

Good Luck,
--stephen
--
Stephen Dowdy - Systems Administrator - NCAR/RAL 303.497.2869 - sdo...@ucar.edu - http://www.ral.ucar.edu/~sdowdy/
Re: PERC H800 Firmware update from Linux failing?
(sorry for the attribution depth, deleted the original message). Anyway, i have no H800 series controllers yet, so i'm just throwing this out, in case it applies, but the PERC5/6 can be updated via MegaCLI with: MegaCli -AdpFwFlash -f filename [-NoSigChk] [-NoVerChk] -aN|-a0,1,2|-aALL I'd extract the DUP kit below with: RAID_FRMW_LX_R269683.BIN --extract ./FOO and look in FOO/payload for a .img or .fw or whatever file to apply directly. If anyone does try this, finds that it works/doesn't, please respond. --stephen Op 31-7-2010 18:38, Tom Rockwell schreef: Hi, I tried to apply the latest firmware update to a PERC H800 controller using the Linux .BIN package. It gave me the message that the update doesn't match the system (forget the exact wording) and wouldn't apply. I have updated the controller on this system in the past (prior release). I ended up applying the update using a PXE booted DOS image, and that went fine. Anyone having problems installing using the file: http://ftp.us.dell.com/sas-raid/RAID_FRMW_LX_R269683.BIN -- Stephen Dowdy - Systems Administrator - NCAR/RAL 303.497.2869 - sdo...@ucar.edu- http://www.ral.ucar.edu/~sdowdy/ ___ Linux-PowerEdge mailing list Linux-PowerEdge@dell.com https://lists.us.dell.com/mailman/listinfo/linux-poweredge Please read the FAQ at http://lists.us.dell.com/faq
Re: Blew away my partition table
Jefferson Ogata wrote, On 06/29/2010 02:06 PM: Lots of really good info ... Also, take a look at : http://www.cgsecurity.org/wiki/TestDisk - TestDisk can * Fix partition table, recover deleted partition * Recover FAT32 boot sector from its backup * Rebuild FAT12/FAT16/FAT32 boot sector * Fix FAT tables * Rebuild NTFS boot sector * Recover NTFS boot sector from its backup * Fix MFT using MFT mirror * Locate ext2/ext3 Backup SuperBlock * Undelete files from FAT, NTFS and ext2 filesystem * Copy files from deleted FAT, NTFS and ext2/ext3 partitions. - I'm not sure if there's a simple command to scan the device, presuming only the partition table is borked and to recreate just the PT, but i think so. At least it can do Superblock scan lookups. There's another tool i've run across that'll scan a block dev for superblock backups, but i can't recall the name... btw, cgsecurity has photorec, which was originally designed to recover lost photos off digital camera media. It's been enhanced to recover a large number of file types off any damaged media and write what it can to auxiliary storage. --stephen -- Stephen Dowdy - Systems Administrator - NCAR/RAL 303.497.2869 - sdo...@ucar.edu- http://www.ral.ucar.edu/~sdowdy/ ___ Linux-PowerEdge mailing list Linux-PowerEdge@dell.com https://lists.us.dell.com/mailman/listinfo/linux-poweredge Please read the FAQ at http://lists.us.dell.com/faq
Re: Checking servers for firmware/driver updates etc
Matt Domsch wrote, On 02/27/2010 06:48 PM:
> For PowerEdge, you could parse the XML Catalog, which is what
> Repository Manager, DMC, and other 3rd party commercial update tools use.
>
> ftp://ftp.dell.com/catalog/Catalog.xml.gz
> ftp://ftp.dell.com/catalog/Catalog.xml.gz.sign

Matt,

Thanks! I didn't know that was quickly accessible. There does appear to be some historical unmaintained cruft there (e.g. Catalog.xml.tar.gz), but if that file is reliable (and i presume from your description of it being used by the above tools that it won't disappear anytime soon), then that is DEFINITELY good news ;). Yeah, it doesn't help me with the hundreds of desktop (Precision WS) and laptop (Latitude) systems i also manage, but this is definitely useful.

> This does not include history, but it does include the current release
> block's set of files. This file is updated as part of the block release
> process, when a whole set of tested updates are published at once.
>
>> The next obvious question is - where can I find the schema for this XML?
>
> I believe this is available as part of Dell's PartnerDirect program:
> http://dell.com/partnerdirect/. You may be able to tease the bits you want
> out of the XML directly without having the full schema though.

Yeah, it definitely (at this point) looks pretty easily parsable (sans the UTF16 stuff, which is easily coped with).

Thanks,
--stephen
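One way the UTF16 part could be coped with is to decompress and re-encode before handing the text to line-oriented tools. A minimal sketch, assuming 'gzip' and 'iconv' are installed; 'catalog_to_utf8' is a hypothetical helper name, and nothing here depends on the catalog's element names, which are not described in this thread:

```shell
#!/bin/sh
# catalog_to_utf8: hypothetical helper; decompresses a gzip'd UTF-16 file
# given as $1 and writes it to stdout as UTF-8.
catalog_to_utf8() {
    gzip -dc "$1" | iconv -f UTF-16 -t UTF-8
}

# Illustration (after downloading the catalog file named above):
#   catalog_to_utf8 Catalog.xml.gz | grep -i 'poweredge' | head
```

After conversion the usual grep/sed/awk tooling can be used to tease out the bits of interest.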
Re: Checking servers for firmware/driver updates etc
difficult trying to figure out which system is the one to use (note the ITA vs DMC, SBUU vs SUU vs ..) Even things like the Dell Capacity Planner/Calculator have gone from a Windows only executable to a Java app to a flash app to ??? and often the new servers aren't included. And, STILL: www.dell.com/calc points to ESSA for the Blades, and you click that link and get can't find the server at solutions.dell.com. (it's been that way for MONTHS)

Thanks,
--stephen
Cranky Old Man Ramblin' dowdy

#!/bin/sh
# Title:        ral-superinv
# Purpose:      obtain System Inventory info (firmware, etc)
# Author:       Stephen Dowdy (sdo...@ucar.edu)
# RCS:          $Header$
# Note:         See RCS/CVS Log info at End of File
# Requirements: dmidecode, ipmitool, ddcprobe, xresprobe
# Caveats:
# Todo:
#
# Copyright UCAR (c) 2006-2009.
# University Corporation for Atmospheric Research (UCAR),
# National Center for Atmospheric Research (NCAR),
# Research Applications Laboratory (RAL),
# P.O. Box 3000, Boulder, Colorado, 80307-3000, USA.

is_debug() { [ ${DEBUG:-0} -ge 1 ] ;}
debug() { is_debug && echo "DEBUG: $@" 1>&2 ;}

preload() {
    if [ -f /etc/debian_version ]; then
        #if ! type dmidecode >/dev/null; then echo "Missing dmidecode"; exit 1; fi
        if ! dpkg-query -W --showformat='${Status}\n' dmidecode | grep -q '^install'; then
            echo "Installing 'dmidecode'..."
            apt-get install dmidecode >/dev/null
        fi
        #if ! type ddcprobe >/dev/null; then echo "Missing ddcprobe"; exit 1; fi
        if ! dpkg-query -W --showformat='${Status}\n' xresprobe | grep -q '^install'; then
            echo "Installing 'xresprobe'..."
            apt-get install xresprobe >/dev/null
        fi
    fi
    # lame attempt to get IPMI functional
    if command -v ipmitool >/dev/null; then
        if [ $(lsmod | grep ipmi | wc -l) != 3 ]; then
            modprobe ipmi_si
            modprobe ipmi_devintf
            modprobe ipmi_msghandler
        fi
    fi
}

inv_sys_hostname=$(hostname -s)

# The dmidecode statements here are from earlier dmidecode releases
# that didn't support such niceties as "dmidecode -t bios"
get_sys_manufacturer() {
    dmidecode | egrep -A8 '^Handle (0x0100|0x0001|0x0005)' | grep 'Manufacturer:' \
        | sed -e 's/^[^:]*:[[:space:]]*\(.*[^ ]\)\([[:space:]]*$\)/\1/'
}

get_sys_model() {
    dmidecode | egrep -A8 '^Handle (0x0100|0x0001|0x0005)' | grep 'Product Name:' \
        | sed -e 's/^[^:]*:[[:space:]]*\(.*[^ ]\)\([[:space:]]*$\)/\1/'
}

get_sys_serialnumber() {
    dmidecode | egrep -A8 '^Handle (0x0100|0x0001|0x0005)' \
        | sed -ne '/Serial Number:/s/^[^:]*:[[:space:]]*\(.*[^ ]\)\([[:space:]]*$\)/\1/p'
}

get_sys_bios() {
    dmidecode | egrep -A8 '^Handle (0x)' | grep 'Version:' \
        | sed -e 's/^[^:]*:[[:space:]]*\(.*[^ ]\)\([[:space:]]*$\)/\1/'
}

get_sys_bmc() {
    if command -v ipmitool >/dev/null 2>&1; then
        tmp=$(ipmitool mc info 2>/dev/null | grep 'Firmware Revision' \
            | sed -e 's/^[^:]*:[[:space:]]*\(.*[^ ]\)\([[:space:]]*$\)/\1/')
        #echo "${tmp:-N/A}"
        echo "${tmp}"
    else
        #echo "N/A"
        echo
    fi
}

# XXX this doesn't work reliably much, but Xorg.0.log
# XXX often has the monitor model correctly probed?
# XXX - nvidia proprietary drivers tend to disable this capability :-(
get_monitor_model() {
    tmp=$(ddcprobe | grep monitorname: | cut -d: -f2 \
        | sed -e 's/^[[:space:]]*\(.*[^ ]\)\([[:space:]]*$\)/\1/')
    debug "monitor_model tmp=[${tmp}]"
    #echo "${tmp:-UNKNOWN}"
    echo "${tmp}"
}

get_monitor_serialnumber() {
    tmp=$(ddcprobe | grep monitorserial: \
        | sed -e 's/^[^:]*:[[:space:]]*\(.*[^ ]\)\([[:space:]]*$\)/\1/')
    debug "monitor_serialnumber tmp=[${tmp}]"
    #echo "${tmp:-UNKNOWN}"
    echo "${tmp}"
}

# XXX - We don't rely on external LSI tools at this point
# XXX - we're trying to be lean, rely on well-established utilities
get_perc4() {
    ## Host: scsi0 Channel: 01 Id: 00 Lun: 00
    ##   Vendor: MegaRAID Model: LD 0 RAID1   69G Rev: 522D
    ##   Type:   Direct-Access                ANSI SCSI revision: 02
    awk -F: '
        $1 ~ /Vendor/ && $2 ~ /MegaRAID/ && $3 ~ /LD [0-9] RAID/ {print $4; }
    ' /proc/scsi/scsi | tr -d ' ' | fmt -128 | sed -e 's/ /, /g'
}

get_perc5i() {
    ## Host: scsi0 Channel: 02 Id: 00 Lun: 00
    ##   Vendor: DELL Model: PERC 5/i Rev: 1.03
    ##   Type:   Direct-Access                ANSI SCSI revision: 05
    awk -F: '
        $1 ~ /Vendor/ && $2 ~ /DELL/ && $3 ~ /PERC 5\/i/ {print $4; }
    ' /proc/scsi/scsi | tr -d ' ' | fmt -128 | sed -e 's/ /, /g'
}

get_perc5E() {
    ## Host: scsi2 Channel: 02 Id: 00 Lun: 00
    ##   Vendor: DELL Model: PERC 5/E Adapter Rev: 1.03
    ##   Type:   Direct-Access                ANSI SCSI revision: 05
    awk -F: '
        $1 ~ /Vendor/ && $2 ~ /DELL/ && $3 ~ /PERC 5\/E/ {print $4; }
    ' /proc/scsi/scsi | tr -d ' ' | fmt -128 | sed -e 's/ /, /g'
}

get_perc6i() {
    ## Host: scsi0 Channel: 02 Id: 00 Lun: 00
    ##   Vendor: DELL Model: PERC 6/i Rev: 1.11
    ##   Type:   Direct-Access                ANSI SCSI revision: 05
    awk -F: '
        $1 ~ /Vendor/ && $2 ~ /DELL/ && $3 ~ /PERC 6\/i/ {print $4; }
    ' /proc/scsi/scsi | tr -d ' ' | fmt -128 | sed -e 's/ /, /g'
}

get_perc6E
Re: Automatic Detection of PowerEdge Servers
Erinn Looney-Triggs wrote, On 01/20/2010 01:46 PM: I run an automatic provisioning and installation service via cobbler for our RedHat installs. Now as my laziness increases, there's some saying about how *much* work sysadmins are willing to invest due to laziness ;) can't figure out is a way to reliably know that the system is a Dell PowerEdge system. I could do something like dmidecode -s system-product-name and grep -i for poweredge (in loose terms), but are all PowerEdge servers supported by OpenManage? Is there a better way to do this? My guess is any Dell equipment with a BMC is covered by OM, but that's just a guess. This should then work for that... [r...@foo ~]# ipmitool mc info Device ID : 32 Device Revision : 0 Firmware Revision : 2.28 IPMI Version : 2.0 Manufacturer ID : 674 --- This may be good to key off Manufacturer Name : DELL Inc Product ID: 256 (0x0100) Product Name : Unknown (0x100) Device Available : yes Provides Device SDRs : yes Additional Device Support : Sensor Device SDR Repository Device SEL Device FRU Inventory Device IPMB Event Receiver Bridge Chassis Device Aux Firmware Rev Info : 0x00 0x00 0x00 0x00 On a related note I am trying to do the same thing for the DRAC cards in the systems, I would like a way to detect that a DRAC card is present and if so run a set of commands. I can do this for a a DRAC 5 pretty reliably using lsusb -d 0x413c: which correlates to the DRAC 5 cards, however the iDRAC6 cards don't seem to have this option available. Does anyone know of a way to automagically detect their presence? yeah, that's something that's bugged me too. 'lspci' would show up a DRAC4, but thanks for the clue on 'lsusb' for the DRAC5! (following this logic... lspcmcia ?? 
heh ;) darn :-( 'ipmitool fru list' doesn't show it, and i'm pretty sure dmidecode doesn't show it (unless there's encoding in one of the OEM specific types, and i wouldn't rule that out) --stephen -- Stephen Dowdy - Systems Administrator - NCAR/RAL 303.497.2869 - sdo...@ucar.edu- http://www.ral.ucar.edu/~sdowdy/ ___ Linux-PowerEdge mailing list Linux-PowerEdge@dell.com https://lists.us.dell.com/mailman/listinfo/linux-poweredge Please read the FAQ at http://lists.us.dell.com/faq
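Keying off the Manufacturer ID could look something like the sketch below. It only parses text shaped like the 'ipmitool mc info' output quoted above; 'bmc_manufacturer_id' and 'is_dell_bmc' are hypothetical helper names, and whether ID 674 really implies OpenManage support is the same guess as in the message:

```shell
#!/bin/sh
# bmc_manufacturer_id: hypothetical helper; reads "ipmitool mc info" style
# text on stdin and prints the numeric Manufacturer ID.
bmc_manufacturer_id() {
    awk -F: '/^Manufacturer ID/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# is_dell_bmc: hypothetical helper; succeeds if the ID on stdin is 674,
# the value shown in the sample output in this thread.
is_dell_bmc() {
    [ "$(bmc_manufacturer_id)" = "674" ]
}

# Illustration on a live system:
#   ipmitool mc info 2>/dev/null | is_dell_bmc && echo "Dell BMC present"
```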
Re: Automatic Detection of PowerEdge Servers
Alexander Dupuy wrote, On 01/20/2010 03:22 PM: You may get some joy from ipmitool sdr elist mcloc: On a 1950 with a BMC (but no DRAC 5, despite the output): # ipmitool sdr elist mcloc BMC | 00h | ok | 7.1 | Dynamic MC @ 20h DRAC 5 | 00h | ok | 11.1 | Dynamic MC @ 26h On a R710 with iDRAC6: # ipmitool sdr elist mcloc iDRAC6 | 00h | ok | 7.1 | Dynamic MC @ 20h Alex, Cool, thanks. FWIW, here's what an M610 blade shows: # ipmitool sdr elist mcloc iDRAC| 00h | ok | 7.1 | Dynamic MC @ 20h And a PE1850 # ipmitool sdr elist mcloc DRAC4| 00h | ok | 11.5 | Dynamic MC @ 26h BMC | 00h | ok | 7.1 | Dynamic MC @ 20h Primary BP | 00h | ok | 26.2 | Dynamic MC @ C0h As you say, this seems to only be representative of an interface address for that system, not the existence of the card. I'm guessing there must be some IPMI call that can be issued to those addresses to detect presence, but... (i'm afraid to start pushing random events there and don't have time to read the IPMI docs) however, at least for a DRAC5... System *with* a DRAC5 installed # ipmitool sdr elist | grep -i drac DRAC5 Conn 2 Cbl | 59h | ok | 7.1 | Connected (this command takes *minutes* to complete, but can be shortened by # ipmitool sdr entity 7.1 | grep -i drac DRAC5 Conn 2 Cbl | 59h | ok | 7.1 | Connected ) System w/o DRAC5 installed, note the 'ns' (no sense) state: # ipmitool sdr elist | grep -i drac DRAC5 Conn 2 Cbl | 59h | ns | 7.1 | Disabled This entity may be an indicator of Console Serial redirection from the name, so i'm not sure if it still indicates the existence of the card. Unfortunately, on my M610, nothing matches the 'drac' string nor on my PE1850. 
For Brandon Ooi, # lshw | egrep -i '(drac|remote|access|mc)' # While 'lshw' is useful, it also doesn't answer this question :-( --stephen -- Stephen Dowdy - Systems Administrator - NCAR/RAL 303.497.2869 - sdo...@ucar.edu- http://www.ral.ucar.edu/~sdowdy/ ___ Linux-PowerEdge mailing list Linux-PowerEdge@dell.com https://lists.us.dell.com/mailman/listinfo/linux-poweredge Please read the FAQ at http://lists.us.dell.com/faq
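For provisioning scripts, the 'mcloc' listing can at least be reduced to a list of controller names. A minimal sketch that parses text shaped like the samples above; 'mc_names' and 'has_drac_locator' are hypothetical helper names, and, as noted in this thread, a locator entry only suggests an interface address, not that the card is physically installed:

```shell
#!/bin/sh
# mc_names: hypothetical helper; reads "ipmitool sdr elist mcloc" style
# text on stdin and prints the first column (controller name), trimmed.
mc_names() {
    awk -F'|' 'NF >= 5 {
        name = $1
        gsub(/^[ \t]+|[ \t]+$/, "", name)
        print name
    }'
}

# has_drac_locator: hypothetical helper; succeeds if any name mentions DRAC.
has_drac_locator() {
    mc_names | grep -qi 'drac'
}

# Illustration on a live system:
#   ipmitool sdr elist mcloc | mc_names
```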
Re: Dell Poweredge SC1425 Open Manage
jeffrey_l_mend...@dell.com wrote, On 12/07/2009 08:47 AM:
> OMSA is only available for certain Dell PowerEdge server models. Notably,
> SC-class systems do not have OMSA available. To check to see if OMSA is
> available on your system, use the 'getSystemId' executable to look up
> your System ID.

If you don't have 'getSystemId' (part of libsmbios), this works in a pinch (assuming you have a reasonably current 'dmidecode'):

  sh-3.2# dell_systemid() { dmidecode -t 208 | awk '/Header and Data/ {getline; print "0x" $10 $9}' ;}
  sh-3.2# dell_systemid
  0x0162
  sh-3.2# dmidecode -s system-product-name
  PowerEdge 400SC

  ssh-in2:~/cib # dell_systemid
  0x016D
  ssh-in2:~/cib # dmidecode -s system-product-name
  PowerEdge 2850

--stephen
Re: PE T610 - ipmitool reports a 'cr'
Per Jensen wrote, On 10/23/2009 06:52 AM: List, I have just received a T610 which is being setup for Xen dom0 backup use. ... When running 'ipmitool sdr' I get a 'cr' on one of the reported temperatures, as shown in the snip below. ... Temp | 62 degrees C | cr ... How do I find out what the cr is about, should I be concerned ? Per, You need to get the Entity ID via: # ipmitool sdr type temp Temp | 01h | ok | 3.1 | -56 degrees C Temp | 02h | ok | 3.2 | -54 degrees C Temp | 05h | ok | 10.1 | 37 degrees C Temp | 06h | ok | 10.2 | 36 degrees C Ambient Temp | 0Eh | ok | 7.1 | 25 degrees C Planar Temp | 0Fh | ok | 7.1 | 42 degrees C IOH THERMTRIP| 5Dh | ns | 7.1 | Disabled CPU Temp Interf | 76h | ns | 7.1 | Disabled Temp | 0Ah | ok | 8.1 | 34 degrees C Temp | 0Bh | ok | 8.1 | 39 degrees C Temp | 0Ch | ucr | 8.1 | 50 degrees C I also appear to have an Upper Critical on entity 8.1 on an R710. I'll have to check my other systems to see if this is an anomoly or a bug/deficiency in Dell's implementation. # ipmitool -v sdr entity 8.1 Sensor ID : Temp (0xc) Entity ID : 8.1 (Memory Module) Sensor Type (Analog) : Temperature Sensor Reading: 49 (+/- 1) degrees C Status: Upper Critical Nominal Reading : 23.000 Normal Minimum: 11.000 Normal Maximum: 69.000 Upper critical: 47.000 Upper non-critical: 42.000 Lower critical: 3.000 Lower non-critical: 8.000 Positive Hysteresis : 1.000 Negative Hysteresis : 1.000 Minimum sensor range : Unspecified Maximum sensor range : Unspecified Event Message Control : Per-threshold Readable Thresholds : lcr lnc unc ucr Settable Thresholds : lcr lnc unc ucr Threshold Read Mask : lcr lnc unc ucr Event Status : Event Messages Disabled Assertion Events : unc+ ucr+ Event Enable : Event Messages Disabled Assertions Enabled: This is a Memory Module. 
Not sure how to map that to any particular DIMM/slot/cpu/sensor-location, though, as i have 6 DIMMs (3/cpu) # dmidecode -t memory | sed -ne '/Memory Device/,/Part Number/ { /Size:/h; /^[[:space:]]*Locator:/ {p;x;p}; /Speed:/p}' | paste - - - | tr -s '\t ' | expand -t 1,20,50 Locator: DIMM_A1 Size: 4096 MB Speed: 1333 MHz (0.8 ns) Locator: DIMM_A2 Size: 4096 MB Speed: 1333 MHz (0.8 ns) Locator: DIMM_A3 Size: 4096 MB Speed: 1333 MHz (0.8 ns) Locator: DIMM_A4 Size: No Module Installed Speed: Unknown Locator: DIMM_A5 Size: No Module Installed Speed: Unknown Locator: DIMM_A6 Size: No Module Installed Speed: Unknown Locator: DIMM_A7 Size: No Module Installed Speed: Unknown Locator: DIMM_A8 Size: No Module Installed Speed: Unknown Locator: DIMM_A9 Size: No Module Installed Speed: Unknown Locator: DIMM_B1 Size: 4096 MB Speed: 1333 MHz (0.8 ns) Locator: DIMM_B2 Size: 4096 MB Speed: 1333 MHz (0.8 ns) Locator: DIMM_B3 Size: 4096 MB Speed: 1333 MHz (0.8 ns) Locator: DIMM_B4 Size: No Module Installed Speed: Unknown Locator: DIMM_B5 Size: No Module Installed Speed: Unknown Locator: DIMM_B6 Size: No Module Installed Speed: Unknown Locator: DIMM_B7 Size: No Module Installed Speed: Unknown Locator: DIMM_B8 Size: No Module Installed Speed: Unknown Locator: DIMM_B9 Size: No Module Installed Speed: Unknown But, the presumption from this, then, is that the memory is overheating *IF* it's not some incomplete function of the BMC. Well, to confirm, this seems to be common on the R710s i've checked. lager:~# ipmitool sdr entity 8.1.0 Temp | 0Ah | ok | 8.1 | 27 degrees C Temp | 0Bh | ok | 8.1 | 24 degrees C Temp | 0Ch | ucr | 8.1 | 59 degrees C pub:~# ipmitool sdr entity 8.1.0 Temp | 0Ah | ok | 8.1 | 32 degrees C Temp | 0Bh | ok | 8.1 | 32 degrees C Temp | 0Ch | unc | 8.1 | 45 degrees C The last sensor is MUCH higher than the other two. 
I think someone from Dell needs to chime in on this --stephen -- Stephen Dowdy - Systems Administrator - NCAR/RAL 303.497.2869 - sdo...@ucar.edu- http://www.ral.ucar.edu/~sdowdy/ ___ Linux-PowerEdge mailing list Linux-PowerEdge@dell.com https://lists.us.dell.com/mailman/listinfo/linux-poweredge Please read the FAQ at http://lists.us.dell.com/faq
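To spot this kind of sensor across many hosts, the status column can be filtered mechanically. A minimal sketch that parses text shaped like the 'ipmitool sdr type temp' output above; 'temp_alerts' is a hypothetical helper name, and treating every status other than 'ok' and 'ns' as worth a look is an assumption made for the example:

```shell
#!/bin/sh
# temp_alerts: hypothetical helper; reads "ipmitool sdr type temp" style
# text on stdin and prints only lines whose status column is neither
# "ok" nor "ns" (e.g. unc, ucr, cr).
temp_alerts() {
    awk -F'|' 'NF >= 5 {
        status = $3
        gsub(/[ \t]/, "", status)
        if (status != "ok" && status != "ns") print
    }'
}

# Illustration on a live system:
#   ipmitool sdr type temp | temp_alerts
```

Any line it prints can then be followed up with 'ipmitool -v sdr entity <id>' as shown above to see the thresholds.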