In confluent, a new command was added:
# nodehealth n1
n1: critical (Mezz Exp 2 Fault:Critical)
# nodehealth r1
r1: ok
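Since the output above is one "node: status" line per node, a quick way to see only the nodes that need attention might look like the sketch below. It assumes every node is reported in that same format; the function name unhealthy_nodes is made up for illustration and is not part of confluent.

```shell
# Print only the lines whose status is not "ok".
# Assumes nodehealth-style input on stdin: "node: status (optional details)".
unhealthy_nodes() {
    # Split on ": " so the node name is $1 and the status text is $2.
    awk -F': ' '$2 != "ok" { print }'
}

# Sample input copied from the output above.
printf '%s\n' 'n1: critical (Mezz Exp 2 Fault:Critical)' 'r1: ok' | unhealthy_nodes
```

On a management node the same function would presumably be fed from the real command, e.g. nodehealth <noderange> | unhealthy_nodes.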
In xCAT:
# rvitals <noderange> leds
can do a serviceable job of showing the error lights:
# rvitals n1 leds
n1: LED 0x0000 (Fault) active to indicate system error condition.
n1: LED 0271 (Mezz Exp 2) active to indicate Sensor 0x62 (Mezz Exp 2 Fault) error.
From: Rundall, Jacob D [mailto:rund...@illinois.edu]
Sent: Wednesday, May 17, 2017 3:13 PM
To: xcat-user@lists.sourceforge.net
Subject: [xcat-user] using xCAT to view "Active Events" for Lenovo System x servers
I’m curious if anybody can help me figure out how to use xCAT to view “Active
Events” for Lenovo System x servers, as shown in the web interface of the IMM.
Using pasu gets me somewhere, as follows:
pasu mynode immapp showimmlog | grep "Severity:5"
There are a few shortcomings, though, as compared to the web interface of the
IMM:
1. pasu shows me past events that are no longer active (and the recovery
events are lower severity so they don’t make it through the grep, so it’s not
obvious that the events have been recovered from, at least not with this
command).
2. pasu only returns items with some kind of sequence number rather than a
date and time.
3. The web interface also sometimes has “Additional Information for Event”
as well, which I cannot figure out how to view using pasu.
Here is an example of what I can see in the IMM web interface:
Error | System | 25 June 2016, 03:14:40.788 AM | An Uncorrectable Error has occurred on PCIs.
Error | System | 25 June 2016, 03:15:13.638 AM | Fault in slot 3 on system System x3650 M5. <more>
Clicking “more” on the latter provides the following additional information:
[S.68005] An error has been detected by the the IIO core logic on CPU 1. The
Global Fatal Error Status register contains 0x0. The Global Non-Fatal Error
Status register contains 0x40. Please check error logs for the presence of
additional downstream device error data.
And here’s the output that I get using my pasu command shown above (with grep):
monitor01: 19 | Severity:5 | Message:Redundancy Lost for Power Unit has asserted.
monitor01: 22 | Severity:5 | Message:Redundancy Lost for Power Unit has asserted.
monitor01: 27 | Severity:5 | Message:Redundancy Lost for Power Unit has asserted.
monitor01: 49 | Severity:5 | Message:Redundancy Lost for Power Unit has asserted.
monitor01: 56 | Severity:5 | Message:Redundancy Lost for Power Unit has asserted.
monitor01: 125 | Severity:5 | Message:A Fatal Bus Error has occurred on bus CPU 2 PECI.
monitor01: 126 | Severity:5 | Message:An Uncorrectable Error has occurred on PCIs.
monitor01: 128 | Severity:5 | Message:Fault in slot 3 on system System x3650 M5.
monitor01: 138 | Severity:5 | Message:A Fatal Bus Error has occurred on bus CPU 2 PECI.
monitor01: 164 | Severity:5 | Message:A Fatal Bus Error has occurred on bus CPU 2 PECI.
Events 126 and 128 clearly correspond to what is shown as “Active Events” in
the web interface. But it’s not obvious that the others are not active unless I
dig deeper into the IMM log (e.g., without filtering through grep). When I do
that, I can eventually find subsequent recovery events for the other severity 5
events, which shows why they are not considered “active”.
On a related note, does anyone know of a way with xCAT (pasu or otherwise) to
view status/info about the following via the command-line from an xCAT
management node:
1. IMM web interface: System Status -> System Information -> Check Log LED
[I suspect the status here corresponds to the status of the “Check log LED” on
the front of the server].
2. Front of the server: “System-error LED”
3. IMM web interface: System Status -> Hardware Health: status of each
component type (i.e., “Cooling Devices”, “Power Modules”, “Local Storage”,
“Processors”, “Memory”, “System”)
Thanks very much,
Jake Rundall