Re: [Freeipmi-users] Interpretation of IPMI events for monitoring

2021-07-15 Thread FRANK Michael
Hello Al,


Many thanks for your quick feedback. I will forward it to the monitoring
software vendor.

Mike

-----Original Message-----
From: Al Chu  
Sent: Wednesday, July 14, 2021 7:55 PM
To: FRANK Michael ; freeipmi-users@gnu.org
Subject: Re: [Freeipmi-users] Interpretation of IPMI events for monitoring

Hi Frank,

> Is it correct that the meaning of "transition to" is, that the sensor
> was before in a different state and now has the reported state?

This is my interpretation of the specification.

> Current situation is that the monitoring software interprets the event
> "transition to OK" as a CRITICAL state, which I think is not correct.

With the caveat that every manufacturer / sensor could have alternate
interpretations, I've always gone with the belief that "transition to
OK" is "nominal" / "ok" as a monitoring state.  It's what I set as the
default within the libipmimonitoring interpretations (i.e. ipmi-sensors 
--output-sensor-state).

Al


Re: [Freeipmi-users] Interpretation of IPMI events for monitoring

2021-07-14 Thread Al Chu via Freeipmi-users
Hi Frank,

> Is it correct that the meaning of "transition to" is, that the sensor
> was before in a different state and now has the reported state?

This is my interpretation of the specification.

> Current situation is that the monitoring software interprets the event
> "transition to OK" as a CRITICAL state, which I think is not correct.

With the caveat that every manufacturer / sensor could have alternate
interpretations, I've always gone with the belief that "transition to
OK" is "nominal" / "ok" as a monitoring state.  It's what I set as the
default within the libipmimonitoring interpretations (i.e. ipmi-sensors 
--output-sensor-state).
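
The interpretation described above can be sketched as a small lookup. This is only an illustration, not the libipmimonitoring implementation: the offset names are those of generic Event/Reading Type Code 07h in the IPMI specification, while the function name `interpret_severity_event` and the state assigned to each offset (in particular to "Monitor" and "Informational") are assumptions made for this example.

```python
# Offsets of generic Event/Reading Type Code 07h (severity transitions),
# as named in the IPMI specification's generic event/reading type table.
SEVERITY_OFFSETS = {
    0x00: "transition to OK",
    0x01: "transition to Non-Critical from OK",
    0x02: "transition to Critical from less severe",
    0x03: "transition to Non-recoverable from less severe",
    0x04: "transition to Non-Critical from more severe",
    0x05: "transition to Critical from Non-recoverable",
    0x06: "transition to Non-recoverable",
    0x07: "Monitor",
    0x08: "Informational",
}

# ASSUMPTION: an illustrative mapping to monitoring states. Only the
# "transition to OK" -> Nominal entry is taken from the thread; the rest
# is a plausible choice that a vendor or site may configure differently.
STATE_BY_OFFSET = {
    0x00: "Nominal",
    0x01: "Warning",
    0x02: "Critical",
    0x03: "Critical",
    0x04: "Warning",
    0x05: "Critical",
    0x06: "Critical",
    0x07: "Nominal",
    0x08: "Nominal",
}


def interpret_severity_event(event_reading_type_code, offset):
    """Return (event name, monitoring state) for a type 07h event offset."""
    if event_reading_type_code != 0x07:
        raise ValueError("only Event/Reading Type Code 07h is handled here")
    return SEVERITY_OFFSETS[offset], STATE_BY_OFFSET[offset]


# The sensor from the question: Event/Reading Type Code 7h, offset 00h.
name, state = interpret_severity_event(0x07, 0x00)
print(name, "->", state)
```

With this mapping, the "PS2 12V UV Fault" sensor reporting 'transition to OK' would be shown as Nominal rather than Critical.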

Al

On Wed, 2021-07-14 at 15:13 +, FRANK Michael wrote:
> Hello,
> 
> I am currently in the discussion with monitoring software developers
> about the interpretation of sensor events.
> In general, what I understood is: an event only changes when the
> state of a sensor changes, but the state of the sensor is not
> "really" checked each time we read the sensor, e.g. with
> ipmi-sensors. My mental picture is that the BMC has a table of
> sensors and their latest events, and if the state of a sensor changes,
> the BMC updates the corresponding entry in this table.
> 
> Main discussion is about the event "transition to..." and especially
> the event "transition to OK" (Type code 07h, Offset 00h).
> Unfortunately, there are no further explanations in the IPMI Specs
> Sensor and Event Code Tables.
> 
> So here is my question:
> Is it correct that the meaning of "transition to" is, that the sensor
> was before in a different state and now has the reported state?
> For example, the event "transition to OK" means the sensor had, before
> the event changed, a state other than OK, like Critical, Non-Critical
> etc., and its present state is now OK.
> 
> I would much appreciate it if an expert could verify the above
> statement.
> Current situation is that the monitoring software interprets the event
> "transition to OK" as a CRITICAL state, which I think is not correct.
> 
> ID | Name             | Type         | Reading | Units | Event
> 80 | PS2 12V UV Fault | Power Supply | N/A     | N/A   | 'transition to OK'
> 
> Record ID: 80
> Record Type: Compact Sensor Record (2h)
> ID String: PS2 12V UV Fault
> Sensor Type: Power Supply (8h)
> Sensor Number: 122
> IPMB Slave Address: 10h
> Sensor Owner ID: 20h
> Sensor Owner LUN: 0h
> Channel Number: 0h
> Entity ID: power supply (10)
> Entity Instance: 2
> Entity Instance Type: Physical Entity
> Event/Reading Type Code: 7h
> Sensor Direction: Unspecified
> Assertion Event Enabled: 'transition to OK'
> Assertion Event Enabled: 'transition to Non-Critical from OK'
> Assertion Event Enabled: 'transition to Critical from less severe'
> Assertion Event Enabled: 'transition to Critical from Non-recoverable'
> Assertion Event Enabled: 'Monitor'
> Deassertion Event Enabled: 'transition to OK'
> Deassertion Event Enabled: 'transition to Non-Critical from OK'
> Deassertion Event Enabled: 'transition to Critical from less severe'
> Deassertion Event Enabled: 'transition to Critical from Non-recoverable'
> Deassertion Event Enabled: 'Monitor'
> Share Count: 1
> ID String Instance Modifier Type: Numeric
> ID String Instance Modifier Offset: 0
> Entity Instance Sharing: Same for all records
> Sensor Event: 'transition to OK'
> 
> 10.240.4.12: =
> 10.240.4.12: IPMI 1.5 Get Channel Authentication Capabilities Request
> 10.240.4.12: =
> 10.240.4.12: RMCP Header:
> 10.240.4.12: 
> 10.240.4.12: [   6h] = version[ 8b]
> 10.240.4.12: [   0h] = reserved[ 8b]
> 10.240.4.12: [  FFh] = sequence_number[ 8b]
> 10.240.4.12: [   7h] = message_class.class[ 5b]
> 10.240.4.12: [   0h] = message_class.reserved[ 2b]
> 10.240.4.12: [   0h] = message_class.ack[ 1b]
> 10.240.4.12: IPMI Session Header:
> 10.240.4.12: 
> 10.240.4.12: [   0h] = authentication_type[ 8b]
> 10.240.4.12: [   0h] = session_sequence_number[32b]
> 10.240.4.12: [   0h] = session_id[32b]
> 10.240.4.12: [   9h] = ipmi_msg_len[ 8b]
> 10.240.4.12: IPMI Message Header:
> 10.240.4.12: 
> 10.240.4.12: [  20h] = rs_addr[ 8b]
> 10.240.4.12: [   0h] = rs_lun[ 2b]
> 10.240.4.12: [   6h] = net_fn[ 6b]
> 10.240.4.12: [  C8h] = checksum1[ 8b]
> 10.240.4.12: [  81h] = rq_addr[ 8b]
> 10.240.4.12: [   0h] = rq_lun[ 2b]
> 10.240.4.12: [  31h] = rq_seq[ 6b]
> 10.240.4.12: IPMI Command Data:
> 10.240.4.12: --
> 10.240.4.12: [  38h] = cmd[ 8b]
> 10.240.4.12: [   Eh] = channel_number[ 4b]
> 10.240.4.12: [   0h] = reserved1[ 3b]
> 10.240.4.12: [   1h] = get_ipmi_v2.0_extended_data[ 1b]
>
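
As a cross-check of the debug dump quoted above, the following minimal sketch recomputes checksum1 and the packed bytes from the individual fields. It assumes the standard IPMI message checksum (the two's complement of the 8-bit sum of the covered bytes); the helper name `ipmi_checksum` and the variable names are illustrative and are not FreeIPMI API names.

```python
def ipmi_checksum(data):
    """Two's-complement checksum: the covered bytes plus the checksum sum to 0 mod 256."""
    return (-sum(data)) & 0xFF


# Field values copied from the dump above.
rs_addr = 0x20
net_fn, rs_lun = 0x06, 0x0
rq_seq, rq_lun = 0x31, 0x0
channel_number, get_v2_extended_data = 0xE, 0x1

# net_fn occupies the upper 6 bits and the LUN the lower 2 bits of one byte.
netfn_lun = (net_fn << 2) | rs_lun
rqseq_lun = (rq_seq << 2) | rq_lun

# checksum1 covers rs_addr and the netFn/LUN byte.
checksum1 = ipmi_checksum(bytes([rs_addr, netfn_lun]))

# First command data byte: channel number in bits 3:0, the
# "get IPMI v2.0 extended data" flag in bit 7, reserved bits zero.
channel_byte = (get_v2_extended_data << 7) | channel_number

print(f"netfn/lun={netfn_lun:02X}h checksum1={checksum1:02X}h "
      f"rqseq/lun={rqseq_lun:02X}h channel={channel_byte:02X}h")
```

The computed checksum1 is C8h, which agrees with the value shown in the IPMI Message Header of the dump.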