On 17/07/19 19:07 +0000, Gershman, Vladimir wrote:
> This would be for the Pacemaker.
>
> Seems like the alerts in the link you sent, refer to numeric codes,
> so where would I see all the codes and their meanings ? This would
> allow a way to select what I need to monitor.

Unfortunately, there is currently no way around referring to the code
directly, but keep in mind that we are already in expert territory
once the usefulness of alerts is to be maxed out. One is then also
responsible for updating such a tight wrapping whenever incompatible
changes arrive in new versions. Presumably, any more disruptive
release would signal that in the versioning scheme (a non-minor
component being incremented), and perhaps it would be noted in the
release notes as well.

That being said, more detailed documentation, perhaps accompanied by
firmer assurances about the so far vaguely specified data items
attached to each alert "unit", may arrive in the future. I bet
contributions to make that happen faster are warmly welcome,
especially when driven by real production needs.

> For example:

[intentionally reordered]

> CRM_alert_status:
> A numerical code used by Pacemaker to represent the operation
> result (resource alerts only)

See
https://github.com/ClusterLabs/pacemaker/blob/Pacemaker-2.0.2/include/crm/services.h#L118-L129

> CRM_alert_desc:
> Detail about event. For node alerts, this is the node's current
> state (member or lost).

That's literal, see
https://github.com/ClusterLabs/pacemaker/blob/Pacemaker-2.0.2/include/crm/cluster.h#L30-L31

> For fencing alerts, this is a summary of the requested fencing
> operation, including origin, target, and fencing operation error
> code, if any.

This would indeed require extensive parsing of the generated string
for any field that is not available as a standalone variable (the node
to be fenced, for one, is also available separately via
CRM_alert_node):
https://github.com/ClusterLabs/pacemaker/blob/Pacemaker-2.0.2/daemons/controld/controld_execd_state.c#L805-L809

> For resource alerts, this is a readable string equivalent of
> CRM_alert_status.

See the first link above. The translation from numeric codes is
rather symbolic, though:
https://github.com/ClusterLabs/pacemaker/blob/Pacemaker-2.0.2/include/crm/services.h#L331-L340
(Judging simply by its coverage, this may indicate that some codes in
the full enumeration are strictly internal, but I am not sure.)

Plus, there's an exception for operations already known to have
finished: for those, the exit status of the actual agent's execution
is reproduced here in words, and luckily, that is actually documented:
https://github.com/ClusterLabs/resource-agents/blob/v4.3.0/doc/dev-guides/ra-dev-guide.asc#return-codes

> CRM_alert_target_rc:
> The expected numerical return code of the operation (resource
> alerts only)

This appears to be primarily bound to the OCF codes referred to just
above.

* * *

Hopefully that's enough to get you started with your own exploration.
Initially, I'd also suggest attaching your own dump-all alert handler
to get hands-on experience with the data at your disposal, which you
can then leverage in your real handler.

--
Jan (Poki)
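
A dump-all alert handler of the kind suggested above might look like
the following minimal sketch in Python. It rests on a few assumptions:
that Pacemaker hands alert data to the agent as environment variables
prefixed with CRM_alert_ (as with the variables quoted above), that
CRM_alert_recipient, when set, holds a writable file path, and that
CRM_alert_rc carries the actual return code to compare against
CRM_alert_target_rc. The fallback log path and all function names are
hypothetical. The OCF code names are taken from the resource agent
developer's guide linked above.

```python
#!/usr/bin/env python3
"""Dump-all alert agent sketch: record every CRM_alert_* variable received."""
import os
from datetime import datetime, timezone

# Hypothetical fallback; choose a path the alert agent's user can write to.
DEFAULT_LOG = "/var/log/pacemaker_alert_dump.log"

# OCF return codes as listed in the resource agent developer's guide.
OCF_RC_NAMES = {
    0: "OCF_SUCCESS",
    1: "OCF_ERR_GENERIC",
    2: "OCF_ERR_ARGS",
    3: "OCF_ERR_UNIMPLEMENTED",
    4: "OCF_ERR_PERM",
    5: "OCF_ERR_INSTALLED",
    6: "OCF_ERR_CONFIGURED",
    7: "OCF_NOT_RUNNING",
    8: "OCF_RUNNING_MASTER",
    9: "OCF_FAILED_MASTER",
}


def describe_rc(value):
    """Return 'N (NAME)' for a known OCF code, else the value unchanged."""
    try:
        return f"{value} ({OCF_RC_NAMES[int(value)]})"
    except (KeyError, ValueError, TypeError):
        return str(value)


def format_alert(environ):
    """Build a text record from all CRM_alert_* entries, sorted by name."""
    lines = []
    for key in sorted(environ):
        if not key.startswith("CRM_alert_"):
            continue
        value = environ[key]
        # Assumption: both return-code variables use OCF numbering.
        if key in ("CRM_alert_rc", "CRM_alert_target_rc"):
            value = describe_rc(value)
        lines.append(f"{key}={value}")
    return "\n".join(lines)


def main(environ=None, log_path=None):
    """Append one timestamped record to the log; do nothing without alert data."""
    environ = os.environ if environ is None else environ
    record = format_alert(environ)
    if not record:
        return
    # Assumption: the configured recipient, if any, is a file path.
    path = log_path or environ.get("CRM_alert_recipient") or DEFAULT_LOG
    stamp = datetime.now(timezone.utc).isoformat()
    with open(path, "a", encoding="utf-8") as log:
        log.write(f"--- alert received {stamp} ---\n{record}\n")


if __name__ == "__main__":
    main()
```

Running a few test events through such a handler and reading the
resulting file shows which variables are populated for each kind of
alert, before any filtering logic is committed to in the real handler.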