Thanks, Jarrod and Christian. I was not aware of rvitals but that seems to do 
exactly what I need. And it also shows me that I need to spend some more time 
reading the xCAT docs, including the Hardware Management section. Doh!
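
For anyone who wants to script this, here is a minimal sketch of how the rvitals output might be reduced to a list of nodes showing an active LED. It assumes the output lines look like the sample Jarrod posted below ("n1: LED ... active to indicate ..."). The function name and the "compute" noderange are placeholders of mine, not anything from xCAT. The thread does not show what rvitals prints for a healthy node, so lines that do not match are simply ignored.

```shell
# Sketch only: list nodes with at least one LED reported active.
# Reads `rvitals <noderange> leds` output on stdin and assumes lines shaped like
#   n1: LED 0x0000 (Fault) active to indicate system error condition.
nodes_with_active_leds() {
    # Split on ": " so field 1 is the node name; keep only "LED ... active" lines,
    # then collapse repeated node names.
    awk -F': ' '/: LED .* active / { print $1 }' | sort -u
}

# Hypothetical usage ("compute" is a placeholder noderange):
#   rvitals compute leds | nodes_with_active_leds
```

If the real output differs on a given firmware level, the awk pattern would need adjusting.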

Jake

On 5/17/17, 2:26 PM, "xcat-user-requ...@lists.sourceforge.net" 
<xcat-user-requ...@lists.sourceforge.net> wrote:

    Send xCAT-user mailing list submissions to
        xcat-user@lists.sourceforge.net
    
    To subscribe or unsubscribe via the World Wide Web, visit
        https://lists.sourceforge.net/lists/listinfo/xcat-user
    or, via email, send a message with subject or body 'help' to
        xcat-user-requ...@lists.sourceforge.net
    
    You can reach the person managing the list at
        xcat-user-ow...@lists.sourceforge.net
    
    When replying, please edit your Subject line so it is more specific
    than "Re: Contents of xCAT-user digest..."
    
    
    Today's Topics:
    
       1. Re: using xCAT to view "Active Events" for Lenovo System x
          servers (Jarrod Johnson)
       2. Re: using xCAT to view "Active Events" for Lenovo System x
          servers (Christian Caruthers)
    
    
    ----------------------------------------------------------------------
    
    Message: 1
    Date: Wed, 17 May 2017 19:18:38 +0000
    From: Jarrod Johnson <jjohns...@lenovo.com>
    Subject: Re: [xcat-user] using xCAT to view "Active Events" for Lenovo
        System x        servers
    To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
    Message-ID: <520F6194F0DCCD4C97924999FB8247476A8CD61B@USMAILMBX01>
    Content-Type: text/plain; charset="utf-8"
    
    In confluent, a new command was added:
    
    # nodehealth n1
    n1: critical (Mezz Exp 2 Fault:Critical)
    [root@odin ~]# nodehealth r1
    r1: ok
    
    In xcat:
    # rvitals <noderange> leds
    
    Can do a serviceable job of showing the error lights:
    # rvitals n1 leds
    n1: LED 0x0000 (Fault) active to indicate system error condition.
    n1: LED 0271 (Mezz Exp 2) active to indicate Sensor 0x62 (Mezz Exp 2 Fault) 
error.
    
    
    From: Rundall, Jacob D [mailto:rund...@illinois.edu]
    Sent: Wednesday, May 17, 2017 3:13 PM
    To: xcat-user@lists.sourceforge.net
    Subject: [xcat-user] using xCAT to view "Active Events" for Lenovo System x 
servers
    
    I'm curious if anybody can help me figure out how to use xCAT to view 
"Active Events" for Lenovo System x servers, as shown in the web interface of 
the IMM. Using pasu gets me somewhere, as follows:
    pasu mynode immapp showimmlog | grep 'Severity:5'
    There are a few shortcomings, though, as compared to the web interface of 
the IMM:
    
      1.  pasu shows me past events that are no longer active (and the recovery 
events are lower severity so they don't make it through the grep, so it's not 
obvious that the events have been recovered from, at least not with this 
command).
      2.  pasu only returns items with some kind of sequence number rather than 
a date and time.
      3.  The web interface also sometimes has "Additional Information for 
Event" as well, which I cannot figure out how to view using pasu.
    
    Here is an example of what I can see in the IMM web interface:
    Error      System   25 June 2016, 03:14:40.788 AM     An Uncorrectable 
Error has occurred on PCIs.
    Error      System   25 June 2016, 03:15:13.638 AM     Fault in slot 3 on 
system System x3650 M5. <more>
    
    Clicking "more" on the latter provides the following additional information:
    [S.68005] An error has been detected by the the IIO core logic on CPU 1. 
The Global Fatal Error Status register contains 0x0. The Global Non-Fatal Error 
Status register contains 0x40. Please check error logs for the presence of 
additional downstream device error data.
    
    And here's the output that I get using my pasu command shown above (with 
grep):
    monitor01: 19 | Severity:5 | Message:Redundancy Lost for Power Unit has 
asserted.
    monitor01: 22 | Severity:5 | Message:Redundancy Lost for Power Unit has 
asserted.
    monitor01: 27 | Severity:5 | Message:Redundancy Lost for Power Unit has 
asserted.
    monitor01: 49 | Severity:5 | Message:Redundancy Lost for Power Unit has 
asserted.
    monitor01: 56 | Severity:5 | Message:Redundancy Lost for Power Unit has 
asserted.
    monitor01: 125 | Severity:5 | Message:A Fatal Bus Error has occurred on bus 
CPU 2 PECI.
    monitor01: 126 | Severity:5 | Message:An Uncorrectable Error has occurred 
on PCIs.
    monitor01: 128 | Severity:5 | Message:Fault in slot 3 on system System 
x3650 M5.
    monitor01: 138 | Severity:5 | Message:A Fatal Bus Error has occurred on bus 
CPU 2 PECI.
    monitor01: 164 | Severity:5 | Message:A Fatal Bus Error has occurred on bus 
CPU 2 PECI.
    
    Events 126 and 128 clearly correspond to what is shown as "Active Events" 
in the web interface. But it's not obvious that the others are not active 
unless I dig deeper in the IMM log (e.g., without filtering through grep). When 
I do that I can eventually find subsequent recovery events for the other sev 5 
events, which show why they are not considered "active".
    
    
    On a related note, does anyone know of a way with xCAT (pasu or otherwise) 
to view status/info about the following via the command-line from an xCAT 
management node:
    
      1.  IMM web interface: System Status -> System Information -> Check Log 
LED [I suspect the status here corresponds to the status of the "Check log LED" 
on the front of the server].
      2.  Front of the server: "System-error LED"
      3.  IMM web interface: System Status -> Hardware Health: status of each 
component type (i.e., "Cooling Devices", "Power Modules", "Local Storage", 
"Processors", "Memory", "System")
    
    
    Thanks very much,
    
    Jake Rundall
    ------------------------------
    
    Message: 2
    Date: Wed, 17 May 2017 19:25:04 +0000
    From: Christian Caruthers <ccaruth...@lenovo.com>
    Subject: Re: [xcat-user] using xCAT to view "Active Events" for Lenovo
        System x        servers
    To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
    Message-ID: <0CDF2C7DEB37E244AAB7648109AFC6EDB2EED76F@USMAILMBX01>
    Content-Type: text/plain; charset="utf-8"
    
    Have you looked at 'rvitals mynode leds' ?
    
    Regards,
    Christian Caruthers
    Lenovo Professional Services
    Mobile: 757-289-9872
    
    
    ------------------------------
    
    
------------------------------------------------------------------------------
    Check out the vibrant tech community on one of the world's most
    engaging tech sites, Slashdot.org! http://sdm.link/slashdot
    
    ------------------------------
    
    _______________________________________________
    xCAT-user mailing list
    xCAT-user@lists.sourceforge.net
    https://lists.sourceforge.net/lists/listinfo/xcat-user
    
    
    End of xCAT-user Digest, Vol 93, Issue 35
    *****************************************
    


