Right now in a test cluster on CentOS 7 I'm occasionally seeing
resource monitoring failures and, just today, a failure to start
a fencing agent.  While I need to track down those problems, the
issue I want to discuss here is how to be notified when there is a
problem with the cluster when no Nagios-type monitoring system is
in place.

On an older CentOS 5 cluster I have a cron job that periodically runs
'crm_verify -LV'.  If the return code is non-zero, the output of
that command (and some other info) is mailed to the operator.  That
mechanism has been working well for years.
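
For reference, the idea behind that cron check is roughly the sketch
below. It is not the exact script: the function names, the OPERATOR
variable, its default of root, and the subject line are illustrative
assumptions, and it assumes a working mail(1) on the node.

```shell
#!/bin/sh
# Sketch of a cron-driven check: run a command and mail its output
# to the operator only when the command exits non-zero.

# Assumed recipient (hypothetical default, not from the real setup).
OPERATOR="${OPERATOR:-root}"

# Send stdin as a mail message with the given subject via mail(1).
notify() {
    mail -s "$1" "$OPERATOR"
}

# Run the given command; on a non-zero exit, pass its combined
# stdout/stderr to notify. Returns the command's exit status.
check_and_notify() {
    output=$("$@" 2>&1)
    status=$?
    if [ "$status" -ne 0 ]; then
        printf '%s\n' "$output" |
            notify "cluster check failed: $* (rc=$status)"
    fi
    return "$status"
}

# From cron one would call, for example:
#   check_and_notify crm_verify -LV
```

The whole thing hinges on the check command's return code, which is
why it stops being useful once the command returns zero regardless.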

On CentOS 7, however, when the cluster gets into this state
'crm_verify -LV' returns zero, and its output claims there is no
problem.  Yet in 'crm_mon -f' I can see that I've got resource
failures and nonzero failcounts.

I tried 'pcs cluster status', but even when the cluster is working
properly (no failures), that command has a return code of '1',
probably because I get the 'Error: no nodes found in corosync.conf'
message, which is an ignorable condition per
<https://access.redhat.com/solutions/663283>.

Is there a command that I can run from cron in the current cluster
tools to tell me the simple answer of whether there is *anything*
failed in the cluster, preferably based on its return code?

The CentOS 7 cluster is running:
   corosync 2.3.4
   pacemaker 1.1.13

The CentOS 5 cluster is running:
   corosync 1.2.7
   pacemaker 1.0.12

The corosync.conf is included below:

--------- cut here and be careful of pointy scissors ---------
totem {
        version: 2
        #secauth: off
        cluster_name: somecluster
        #transport: udpu
        rrp_mode: passive
        crypto_hash: sha256
        clear_node_high_bit: yes

        interface {
                ringnumber: 0
                bindnetaddr: 192.168.1.0
                mcastaddr: 239.192.0.5
                mcastport: 5406
        }
        interface {
                ringnumber: 1
                bindnetaddr: 192.168.2.0
                mcastaddr: 239.192.0.6
                mcastport: 5408
        }
}

quorum {
        provider: corosync_votequorum
        two_node: 1
        expected_votes: 2
}

logging {
        to_syslog: yes
}

--------- cut here and be careful of pointy scissors ---------

Devin

