Have you looked at 'rvitals mynode leds'?

Regards,
Christian Caruthers
Lenovo Professional Services
Mobile: 757-289-9872

From: Rundall, Jacob D [mailto:rund...@illinois.edu]
Sent: Wednesday, May 17, 2017 3:13 PM
To: xcat-user@lists.sourceforge.net
Subject: [xcat-user] using xCAT to view "Active Events" for Lenovo System x 
servers

I’m curious if anybody can help me figure out how to use xCAT to view “Active 
Events” for Lenovo System x servers, as shown in the web interface of the IMM. 
Using pasu gets me somewhere, as follows:
pasu mynode immapp showimmlog | grep "Severity:5"
There are a few shortcomings, though, as compared to the web interface of the 
IMM:

  1.  pasu shows me past events that are no longer active (and the recovery 
events are lower severity so they don’t make it through the grep, so it’s not 
obvious that the events have been recovered from, at least not with this 
command).
  2.  pasu only returns items with some kind of sequence number rather than a 
date and time.
  3.  The web interface also sometimes has “Additional Information for Event” 
as well, which I cannot figure out how to view using pasu.

Here is an example of what I can see in the IMM web interface:
Error      System   25 June 2016, 03:14:40.788 AM     An Uncorrectable Error 
has occurred on PCIs.
Error      System   25 June 2016, 03:15:13.638 AM     Fault in slot 3 on system 
System x3650 M5. <more>

Clicking “more” on the latter provides the following additional information:
[S.68005] An error has been detected by the the IIO core logic on CPU 1. The 
Global Fatal Error Status register contains 0x0. The Global Non-Fatal Error 
Status register contains 0x40. Please check error logs for the presence of 
additional downstream device error data.

And here’s the output that I get using my pasu command shown above (with grep):
monitor01: 19 | Severity:5 | Message:Redundancy Lost for Power Unit has 
asserted.
monitor01: 22 | Severity:5 | Message:Redundancy Lost for Power Unit has 
asserted.
monitor01: 27 | Severity:5 | Message:Redundancy Lost for Power Unit has 
asserted.
monitor01: 49 | Severity:5 | Message:Redundancy Lost for Power Unit has 
asserted.
monitor01: 56 | Severity:5 | Message:Redundancy Lost for Power Unit has 
asserted.
monitor01: 125 | Severity:5 | Message:A Fatal Bus Error has occurred on bus CPU 
2 PECI.
monitor01: 126 | Severity:5 | Message:An Uncorrectable Error has occurred on 
PCIs.
monitor01: 128 | Severity:5 | Message:Fault in slot 3 on system System x3650 M5.
monitor01: 138 | Severity:5 | Message:A Fatal Bus Error has occurred on bus CPU 
2 PECI.
monitor01: 164 | Severity:5 | Message:A Fatal Bus Error has occurred on bus CPU 
2 PECI.

Events 126 and 128 clearly correspond to what is shown as “Active Events” in 
the web interface. But it’s not obvious that the others are no longer active 
unless I dig deeper into the IMM log (e.g., without filtering through grep). 
When I do that, I can eventually find a subsequent recovery event for each of 
the other severity 5 events, which explains why they are not considered 
“active”.
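
The manual matching of errors to later recovery entries could be scripted. The following is only a minimal sketch in Python, not something xCAT or pasu provides. It assumes that each log entry arrives on a single line in the "node: seq | Severity:N | Message:text" shape shown in the sample output (the line wraps above come from the mail client), that higher sequence numbers are later entries, and that a recovery entry repeats the original message with "has deasserted." in place of "has asserted." That last point is a guess about the IMM wording and should be checked against the unfiltered log. The names parse, active_events, and min_severity are hypothetical.

```python
import re

# Assumed line shape, based on the sample pasu output:
#   node: seq | Severity:N | Message:text
LINE_RE = re.compile(
    r"^(?P<node>\S+): (?P<seq>\d+) \| Severity:(?P<sev>\d+) \| Message:(?P<msg>.*)$"
)

ASSERT_SUFFIX = " has asserted."
# Assumption: recovery entries use this wording; verify against a real IMM log.
RECOVER_SUFFIX = " has deasserted."


def parse(lines):
    """Turn raw log lines into (node, seq, severity, message) tuples."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:  # silently skip lines that do not fit the assumed shape
            events.append(
                (m["node"], int(m["seq"]), int(m["sev"]), m["msg"].strip())
            )
    return events


def active_events(lines, min_severity=5):
    """Return high-severity events with no later matching recovery entry."""
    open_events = {}  # (node, subject) -> unrecovered events for that subject
    # Assumption: sequence numbers increase over time.
    for node, seq, sev, msg in sorted(parse(lines), key=lambda e: (e[0], e[1])):
        if msg.endswith(RECOVER_SUFFIX):
            subject = msg[: -len(RECOVER_SUFFIX)]
            open_events.pop((node, subject), None)  # clears earlier assertions
        elif sev >= min_severity:
            if msg.endswith(ASSERT_SUFFIX):
                subject = msg[: -len(ASSERT_SUFFIX)]
            else:
                subject = msg  # no known recovery wording; stays active
            open_events.setdefault((node, subject), []).append(
                (node, seq, sev, msg)
            )
    remaining = [e for group in open_events.values() for e in group]
    return sorted(remaining, key=lambda e: (e[0], e[1]))
```

The function would be fed the unfiltered output of the pasu command (for example, the lines read from standard input), since the grep for "Severity:5" drops the lower-severity recovery entries it needs. Messages that do not follow the asserted/deasserted pattern, such as "has occurred" entries, are always reported as active in this sketch, so a rule for their recovery wording would have to be added once it is known.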


On a related note, does anyone know of a way with xCAT (pasu or otherwise) to 
view status/info about the following via the command-line from an xCAT 
management node:

  1.  IMM web interface: System Status -> System Information -> Check Log LED 
[I suspect the status here corresponds to the status of the “Check log LED” on 
the front of the server].
  2.  Front of the server: “System-error LED”
  3.  IMM web interface: System Status -> Hardware Health: status of each 
component type (i.e., “Cooling Devices”, “Power Modules”, “Local Storage”, 
“Processors”, “Memory”, “System”)


Thanks very much,

Jake Rundall