Re: [Freeipmi-users] Interpretation of IPMI events for monitoring

2021-07-15 Thread FRANK Michael
Hello Al,


Many thanks for your quick feedback. I will forward it to the monitoring 
software vendor.

Mike

Re: [Freeipmi-users] Interpretation of IPMI events for monitoring

2021-07-14 Thread Al Chu via Freeipmi-users
Hi Frank,

> Is it correct that the meaning of "transition to" is that the sensor
> was previously in a different state and now has the reported state?

This is my interpretation of the specification.

> Current situation is that the monitoring software interprets the event
> "transition to OK" as a CRITICAL state, which I think is not correct.

With the caveat that every manufacturer / sensor could have alternate
interpretations, I've always gone with the belief that "transition to
OK" is "nominal" / "ok" as a monitoring state.  It's what I set as the
default within the libipmimonitoring interpretations (i.e. ipmi-sensors 
--output-sensor-state).
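
As a rough illustration of the default described above, the sketch below maps
offsets of Event/Reading Type Code 07h to a monitoring state. Only the mapping
of "transition to OK" to a nominal state comes from this thread; the other
severities and all names (STATE_BY_OFFSET, monitoring_state) are illustrative
assumptions, not the libipmimonitoring API or its shipped defaults, and a
vendor may interpret a given sensor differently.

```python
NOMINAL, WARNING, CRITICAL, UNKNOWN = "Nominal", "Warning", "Critical", "Unknown"

# Offsets of generic Event/Reading Type Code 07h (severity transitions).
STATE_BY_OFFSET = {
    0x00: NOMINAL,   # transition to OK (the default stated in this thread)
    0x01: WARNING,   # transition to Non-Critical from OK (assumed)
    0x02: CRITICAL,  # transition to Critical from less severe (assumed)
    0x03: CRITICAL,  # transition to Non-recoverable from less severe (assumed)
    0x04: WARNING,   # transition to Non-Critical from more severe (assumed)
    0x05: CRITICAL,  # transition to Critical from Non-recoverable (assumed)
    0x06: CRITICAL,  # transition to Non-recoverable (assumed)
}


def monitoring_state(event_reading_type_code, offset):
    """Return a monitoring state for a Type 07h event offset."""
    if event_reading_type_code != 0x07:
        return UNKNOWN  # this sketch only covers Type 07h
    return STATE_BY_OFFSET.get(offset, UNKNOWN)


if __name__ == "__main__":
    # The sensor in question: Type 07h, offset 00h ("transition to OK").
    print(monitoring_state(0x07, 0x00))
```

Under these assumptions the "PS2 12V UV Fault" sensor would be reported as
Nominal rather than Critical.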

Al


[Freeipmi-users] Interpretation of IPMI events for monitoring

2021-07-14 Thread FRANK Michael
Hello,

I am currently in a discussion with monitoring software developers about the 
interpretation of sensor events.
My general understanding is that an event changes only when the state of a 
sensor changes, and that the state of the sensor is not "really" checked each 
time we read the sensor, e.g. with ipmi-sensors. My mental model is that the 
BMC keeps a table of sensors and their latest events, and when the state of a 
sensor changes, the BMC updates the corresponding entry in this table.

The main discussion is about the "transition to ..." events, and especially 
the event "transition to OK" (Type code 07h, Offset 00h). Unfortunately, the 
Sensor and Event Code Tables in the IPMI specification give no further 
explanation.

So here is my question:
Is it correct that the meaning of "transition to" is that the sensor was 
previously in a different state and now has the reported state?
For example, does the event "transition to OK" mean that, before the event, 
the sensor had a state other than OK, such as Critical or Non-Critical, and 
that its present state is now OK?

I would much appreciate it if an expert could verify the above statement. 
The current situation is that the monitoring software interprets the event 
"transition to OK" as a CRITICAL state, which I think is not correct.

ID | Name             | Type         | Reading | Units | Event
80 | PS2 12V UV Fault | Power Supply | N/A     | N/A   | 'transition to OK'

Record ID: 80
Record Type: Compact Sensor Record (2h)
ID String: PS2 12V UV Fault
Sensor Type: Power Supply (8h)
Sensor Number: 122
IPMB Slave Address: 10h
Sensor Owner ID: 20h
Sensor Owner LUN: 0h
Channel Number: 0h
Entity ID: power supply (10)
Entity Instance: 2
Entity Instance Type: Physical Entity
Event/Reading Type Code: 7h
Sensor Direction: Unspecified
Assertion Event Enabled: 'transition to OK'
Assertion Event Enabled: 'transition to Non-Critical from OK'
Assertion Event Enabled: 'transition to Critical from less severe'
Assertion Event Enabled: 'transition to Critical from Non-recoverable'
Assertion Event Enabled: 'Monitor'
Deassertion Event Enabled: 'transition to OK'
Deassertion Event Enabled: 'transition to Non-Critical from OK'
Deassertion Event Enabled: 'transition to Critical from less severe'
Deassertion Event Enabled: 'transition to Critical from Non-recoverable'
Deassertion Event Enabled: 'Monitor'
Share Count: 1
ID String Instance Modifier Type: Numeric
ID String Instance Modifier Offset: 0
Entity Instance Sharing: Same for all records
Sensor Event: 'transition to OK'
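
The event names in the record above correspond to offsets of the generic
Event/Reading Type Code 07h in the IPMI specification's event code tables. A
minimal sketch of decoding an event bitmask into those names follows. The
function name and the example mask 0xA7 (offsets 0, 1, 2, 5 and 7,
reconstructed from the listing above) are assumptions for illustration, not
values read from the BMC or produced by FreeIPMI.

```python
# Offsets of generic Event/Reading Type Code 07h (severity transitions).
TYPE_07H_OFFSETS = {
    0x00: "transition to OK",
    0x01: "transition to Non-Critical from OK",
    0x02: "transition to Critical from less severe",
    0x03: "transition to Non-recoverable from less severe",
    0x04: "transition to Non-Critical from more severe",
    0x05: "transition to Critical from Non-recoverable",
    0x06: "transition to Non-recoverable",
    0x07: "Monitor",
    0x08: "Informational",
}


def decode_event_mask(mask):
    """Return the Type 07h event names whose offset bit is set in mask."""
    return [
        name
        for offset, name in sorted(TYPE_07H_OFFSETS.items())
        if mask & (1 << offset)
    ]


if __name__ == "__main__":
    # Hypothetical mask with bits 0, 1, 2, 5 and 7 set.
    for name in decode_event_mask(0xA7):
        print(name)
```

A mask with only bit 0 set decodes to 'transition to OK', which names the
state the sensor is in now, not the state it left.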

10.240.4.12: =
10.240.4.12: IPMI 1.5 Get Channel Authentication Capabilities Request
10.240.4.12: =
10.240.4.12: RMCP Header:
10.240.4.12: 
10.240.4.12: [   6h] = version[ 8b]
10.240.4.12: [   0h] = reserved[ 8b]
10.240.4.12: [  FFh] = sequence_number[ 8b]
10.240.4.12: [   7h] = message_class.class[ 5b]
10.240.4.12: [   0h] = message_class.reserved[ 2b]
10.240.4.12: [   0h] = message_class.ack[ 1b]
10.240.4.12: IPMI Session Header:
10.240.4.12: 
10.240.4.12: [   0h] = authentication_type[ 8b]
10.240.4.12: [   0h] = session_sequence_number[32b]
10.240.4.12: [   0h] = session_id[32b]
10.240.4.12: [   9h] = ipmi_msg_len[ 8b]
10.240.4.12: IPMI Message Header:
10.240.4.12: 
10.240.4.12: [  20h] = rs_addr[ 8b]
10.240.4.12: [   0h] = rs_lun[ 2b]
10.240.4.12: [   6h] = net_fn[ 6b]
10.240.4.12: [  C8h] = checksum1[ 8b]
10.240.4.12: [  81h] = rq_addr[ 8b]
10.240.4.12: [   0h] = rq_lun[ 2b]
10.240.4.12: [  31h] = rq_seq[ 6b]
10.240.4.12: IPMI Command Data:
10.240.4.12: --
10.240.4.12: [  38h] = cmd[ 8b]
10.240.4.12: [   Eh] = channel_number[ 4b]
10.240.4.12: [   0h] = reserved1[ 3b]
10.240.4.12: [   1h] = get_ipmi_v2.0_extended_data[ 1b]
10.240.4.12: [   2h] = maximum_privilege_level[ 4b]
10.240.4.12: [   0h] = reserved2[ 4b]
10.240.4.12: IPMI Trailer:
10.240.4.12: --
10.240.4.12: [  F3h] = checksum2[ 8b]
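
As a cross-check when reading the request dump above, the two checksum fields
can be recomputed from the other fields. The sketch assumes the standard IPMI
two's-complement checksum and that the dump lists the low-order bit field of
each byte first; the variable and function names are made up for this example.

```python
def ipmi_checksum(data):
    """Two's-complement checksum: data bytes plus checksum sum to 0 mod 256."""
    return (-sum(data)) & 0xFF


# Pack the bit fields from the dump into wire bytes.
rs_addr = 0x20
netfn_rslun = (0x06 << 2) | 0x0          # net_fn[6b] above rs_lun[2b]
rq_addr = 0x81
rqseq_rqlun = (0x31 << 2) | 0x0          # rq_seq[6b] above rq_lun[2b]
cmd = 0x38
data1 = (0x1 << 7) | (0x0 << 4) | 0xE    # extended-data bit, reserved1, channel_number
data2 = (0x0 << 4) | 0x2                 # reserved2, maximum_privilege_level

# checksum1 covers rs_addr and net_fn/rs_lun; checksum2 covers the rest.
checksum1 = ipmi_checksum(bytes([rs_addr, netfn_rslun]))
checksum2 = ipmi_checksum(bytes([rq_addr, rqseq_rqlun, cmd, data1, data2]))

print(f"{checksum1:02X}h {checksum2:02X}h")  # prints "C8h F3h"
```

Both values agree with the checksum1 and checksum2 fields in the dump, and the
nine bytes involved (seven packed bytes plus two checksums) match ipmi_msg_len.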
10.240.4.12: =
10.240.4.12: IPMI 1.5 Get Channel Authentication Capabilities Response
10.240.4.12: =
10.240.4.12: RMCP Header:
10.240.4.12: 
10.240.4.12: [   6h] = version[ 8b]
10.240.4.12: [   0h] = reserved[ 8b]
10.240.4.12: [  FFh] = sequence_number[ 8b]
10.240.4.12: [   7h] = message_class.class[ 5b]
10.240.4.12: [   0h] = message_class.reserved[ 2b]
10.240.4.12: [   0h] = message_class.ack[ 1b]
10.240.4.12: IPMI Session Header:
10.240.4.12: 
10.240.4.12: [