Right now in a test cluster on CentOS 7 I'm occasionally seeing resource monitoring failures and, just today, a failure to start a fencing agent. While I need to track those problems down, the issue I want to discuss here is how to be notified when there is a problem with the cluster when no Nagios-type monitoring system is in place.

On an older CentOS 5 cluster I have a cron job that periodically runs 'crm_verify -LV'. If the return code is non-zero, the output of that command (and some other info) is mailed to the operator. That mechanism has been working well for years.

On CentOS 7, however, when the cluster gets into this state 'crm_verify -LV' returns zero and its output claims there is no problem, even though 'crm_mon -f' shows resource failures and nonzero failcounts.

I also tried 'pcs cluster status', but even when the cluster is working properly (no failures) that command has a return code of '1', probably because of the 'Error: no nodes found in corosync.conf' message, which is an ignorable condition per <https://access.redhat.com/solutions/663283>.

Is there a command in the current cluster tools that I can run from cron to get the simple answer of whether *anything* has failed in the cluster, preferably based on its return code?

The CentOS 7 cluster is running:
    corosync 2.3.4
    pacemaker 1.1.13

The CentOS 5 cluster is running:
    corosync 1.2.7
    pacemaker 1.0.12

The corosync.conf is included below:

--------- cut here and be careful of pointy scissors ---------
totem {
    version: 2
    #secauth: off
    cluster_name: somecluster
    #transport: udpu
    rrp_mode: passive
    crypto_hash: sha256
    clear_node_high_bit: yes

    interface {
        ringnumber: 0
        bindnetaddr: 192.168.1.0
        mcastaddr: 239.192.0.5
        mcastport: 5406
    }

    interface {
        ringnumber: 1
        bindnetaddr: 192.168.2.0
        mcastaddr: 239.192.0.6
        mcastport: 5408
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
    expected_votes: 2
}

logging {
    to_syslog: yes
}
--------- cut here and be careful of pointy scissors ---------

Devin
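
For reference, the cron check described for the CentOS 5 cluster could be sketched roughly as follows. This is an illustration only, not the actual script: the function name `check_cluster`, the `OPERATOR`, `CHECK_CMD` and `MAILER` variables, the subject line, and the use of a mailx-style `mail -s` command are all assumptions. It mails only the command output, not the "other info" mentioned above. Because 'crm_verify -LV' returns zero on the CentOS 7 cluster even with failed resources, this shows the old mechanism and does not answer the question.

```shell
#!/bin/sh
# Sketch of a cron-driven check: run a verification command and mail its
# output to the operator only when it exits non-zero.
# OPERATOR, CHECK_CMD and MAILER are illustrative assumptions.

OPERATOR="${OPERATOR:-root}"
CHECK_CMD="${CHECK_CMD:-crm_verify -LV}"
MAILER="${MAILER:-mail}"

check_cluster() {
    # Capture stdout and stderr together so the mail contains everything
    # the command printed.
    output=$($CHECK_CMD 2>&1)
    status=$?

    if [ "$status" -ne 0 ]; then
        # Assumes a mailx-style interface: MAILER -s SUBJECT RECIPIENT
        printf '%s\n' "$output" |
            $MAILER -s "Cluster check failed on $(uname -n)" "$OPERATOR"
    fi

    # Propagate the check's exit status to the caller.
    return "$status"
}
```

A script built on this sketch would end with a call to `check_cluster` and be run from a crontab entry at whatever interval suits the site. The check command is kept in a variable so that a different command with a meaningful exit status could be substituted without touching the mailing logic.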