Re: [Pacemaker] ClusterMon Resource starting multiple instances of crm_mon

Steven Bambling Fri, 10 May 2013 02:41:27 -0700

On May 9, 2013, at 8:05 PM, Andrew Beekhof <and...@beekhof.net> wrote:


> 
> On 10/05/2013, at 12:40 AM, Steven Bambling <smbambl...@arin.net> wrote:
> 
>> I'm having some issues getting cluster monitoring set up and 
>> configured on a 3-node multi-state cluster.  I'm using Florian's blog as an 
>> example: 
>> http://floriancrouzat.net/2013/01/monitor-a-pacemaker-cluster-with-ocfpacemakerclustermon-andor-external-agent/
>> 
>> When I create the primitive resource it starts on one of my nodes but spawns 
>> multiple instances of crm_mon.  I don't see any reason that would cause it 
>> to spawn multiple instances; it's very odd behavior.
> 
> If you run:
> 
> /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMPMon.pid -d -i 0 -E 
> /usr/local/bin/pcmk_snmp_helper.sh -e zen.arin.net -h 
> /tmp/ClusterMon_SNMPMon.html
> 
> manually a few times, what happens?  Multiple processes?

Yep, for some reason it's spawning multiple processes.

[root@pgdb3 ~]# ps aux | grep crm_mon
root     30678  0.0  0.0 103244   856 pts/0    S+   05:30   0:00 grep crm_mon
[root@pgdb3 ~]# /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMPMon.pid -d -i 0 -E 
/usr/local/bin/pcmk_snmp_helper.sh -e zen.arin.net -h 
/tmp/ClusterMon_SNMPMon.html
[root@pgdb3 ~]# /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMPMon.pid -d -i 0 -E 
/usr/local/bin/pcmk_snmp_helper.sh -e zen.arin.net -h 
/tmp/ClusterMon_SNMPMon.html
[root@pgdb3 ~]# /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMPMon.pid -d -i 0 -E 
/usr/local/bin/pcmk_snmp_helper.sh -e zen.arin.net -h 
/tmp/ClusterMon_SNMPMon.html
[root@pgdb3 ~]# /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMPMon.pid -d -i 0 -E 
/usr/local/bin/pcmk_snmp_helper.sh -e zen.arin.net -h 
/tmp/ClusterMon_SNMPMon.html
[root@pgdb3 ~]# ps aux | grep crm_mon
root     30772  0.0  0.0  82744  2816 ?        S    05:30   0:00 
/usr/sbin/crm_mon -p /tmp/ClusterMon_SNMPMon.pid -d -i 0 -E 
/usr/local/bin/pcmk_snmp_helper.sh -e zen.arin.net -h 
/tmp/ClusterMon_SNMPMon.html
root     30781  0.0  0.0  82744  2668 ?        S    05:30   0:00 
/usr/sbin/crm_mon -p /tmp/ClusterMon_SNMPMon.pid -d -i 0 -E 
/usr/local/bin/pcmk_snmp_helper.sh -e zen.arin.net -h 
/tmp/ClusterMon_SNMPMon.html
root     30784  0.0  0.0  82744  2476 ?        S    05:30   0:00 
/usr/sbin/crm_mon -p /tmp/ClusterMon_SNMPMon.pid -d -i 0 -E 
/usr/local/bin/pcmk_snmp_helper.sh -e zen.arin.net -h 
/tmp/ClusterMon_SNMPMon.html
root     31134  0.0  0.0 103244   856 pts/0    S+   05:30   0:00 grep crm_mon

But the .pid file in /tmp lists only one PID:
[root@pgdb3 ~]# cat /tmp/ClusterMon_SNMPMon.pid
    30772
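
To make the mismatch easier to spot, here is a rough sketch of how the PID file could be compared against the running processes. The stray_pids function is a hypothetical helper of mine, not part of Pacemaker; it just prints every PID passed to it that does not match the one recorded in the PID file.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pacemaker): print each candidate PID
# that differs from the single PID recorded in the pid file.
#   $1        path to the pid file
#   $2 ...    candidate PIDs, e.g. the output of pgrep
stray_pids() {
    pidfile=$1
    shift
    # The pid file may contain leading whitespace, so strip it.
    recorded=$(tr -d '[:space:]' < "$pidfile")
    for pid in "$@"; do
        [ "$pid" = "$recorded" ] || printf '%s\n' "$pid"
    done
}

# On a live node this could be called as (assumes pgrep is installed):
#   stray_pids /tmp/ClusterMon_SNMPMon.pid $(pgrep -x crm_mon)
```

With the processes shown above, this should list the two crm_mon PIDs that the PID file does not track.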

> 
>> 
>> I was also looking for some clarification on what this resource provides. It 
>> looks to me like it kicks off crm_mon in daemon mode, which updates an 
>> .html file and, with -E, runs an external script.  But the resource 
>> itself doesn't trigger anything if another resource changes state; it acts 
>> only if the crm_mon process (monitored by PID) fails and has to be restarted.
> 
> Correct, it just updates the html file which you can see in your browser.
> Or, with -E, it can send an email or snmp alert.
> 
>> If this is correct, what is the best practice for monitoring additional 
>> resource states?
> 
> Define "additional"?
> If the resource fails we'll normally recover it automatically.
An example of an additional resource would be a VIP using IPaddr2.  I also 
have a multi-state pgsql resource, so if that resource fails the cluster will 
either try to restart it or promote another node in the cluster to Master.
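
For what it's worth, here is a minimal sketch of the kind of external agent I have in mind for -E. It assumes crm_mon passes the event details to the script through CRM_notify_* environment variables, which is my understanding of the external-agent interface and should be checked against the installed version. The format_event function name and the output format are my own choices for illustration.

```shell
#!/bin/sh
# Sketch of an external agent for crm_mon -E.
# Assumption: crm_mon exports CRM_notify_rsc, CRM_notify_node,
# CRM_notify_task, CRM_notify_desc and CRM_notify_rc for each event.
# format_event is a hypothetical helper that turns them into one line.
format_event() {
    printf 'resource=%s node=%s task=%s desc=%s rc=%s\n' \
        "${CRM_notify_rsc:-unknown}" \
        "${CRM_notify_node:-unknown}" \
        "${CRM_notify_task:-unknown}" \
        "${CRM_notify_desc:-unknown}" \
        "${CRM_notify_rc:-unknown}"
}

# A real agent would forward the line somewhere, for example:
#   format_event | logger -t pcmk_event
```

If that assumption holds, a start of PG_CLI_VIP or a promote of PGSQL would each produce one line, which could then be turned into an SNMP trap or an email.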

v/r

STEVE

> 
>> 
>> v/r
>> 
>> STEVE
>> 
>> 
>> Below are some additional data points. 
>> 
>> 
>> Creating the Resource
>> 
>> [root@pgdb2 tmp]# crm configure primitive SNMPMon ocf:pacemaker:ClusterMon \
>>>       params user="root" update="30" extra_options="-E 
>>> /usr/local/bin/pcmk_snmp_helper.sh -e zen.arin.net" \
>>>       op monitor on-fail="restart" interval="60"
>> 
>> 
>> Manual crm_mon output
>> 
>> Last updated: Thu May  9 10:24:30 2013
>> Last change: Thu May  9 10:20:49 2013 via cibadmin on pgdb2.example.com
>> Stack: cman
>> Current DC: pgdb1.example.com - partition with quorum
>> Version: 1.1.8-7.el6-394e906
>> 3 Nodes configured, unknown expected votes
>> 6 Resources configured.
>> 
>> 
>> Node pgdb1.example.com: standby
>> Online: [ pgdb2.example.com pgdb3.example.com ]
>> 
>> PG_REP_VIP   (ocf::heartbeat:IPaddr2):       Started pgdb2.example.com
>> PG_CLI_VIP   (ocf::heartbeat:IPaddr2):       Started pgdb2.example.com
>> Master/Slave Set: msPGSQL [PGSQL]
>>    Masters: [ pgdb2.example.com ]
>>    Slaves: [ pgdb3.example.com ]
>>    Stopped: [ PGSQL:2 ]
>> SNMPMon      (ocf::pacemaker:ClusterMon):    Started pgdb3.example.com
>> 
>> ps output checking for crm_mon processes on pgdb3
>> 
>> [root@pgdb3 tmp]# ps aux | grep crm_mon
>> root     16097  0.0  0.0  82624  2784 ?        S    10:20   0:00 
>> /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMPMon.pid -d -i 0 -E 
>> /usr/local/bin/pcmk_snmp_helper.sh -e zen.arin.net -h 
>> /tmp/ClusterMon_SNMPMon.html
>> root     16099  0.0  0.0  82624  2660 ?        S    10:20   0:00 
>> /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMPMon.pid -d -i 0 -E 
>> /usr/local/bin/pcmk_snmp_helper.sh -e zen.arin.net -h 
>> /tmp/ClusterMon_SNMPMon.html
>> root     16104  0.0  0.0  82624  2448 ?        S    10:20   0:00 
>> /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMPMon.pid -d -i 0 -E 
>> /usr/local/bin/pcmk_snmp_helper.sh -e zen.arin.net -h 
>> /tmp/ClusterMon_SNMPMon.html
>> root     16515  0.0  0.0 103244   852 pts/0    S+   10:21   0:00 grep crm_mon
>> 
>> Output from corosync.log
>> 
>> May 09 10:20:51 [3100] pgdb3.cha.arin.net       lrmd:     info: 
>> process_lrmd_get_rsc_info:      Resource 'SNMPMon' not found (3 active 
>> resources)
>> May 09 10:20:51 [3100] pgdb3.cha.arin.net       lrmd:     info: 
>> process_lrmd_rsc_register:      Added 'SNMPMon' to the rsc list (4 active 
>> resources)
>> May 09 10:20:52 [3103] pgdb3.cha.arin.net       crmd:     info: 
>> services_os_action_execute:     Managed ClusterMon_meta-data_0 process 16010 
>> exited with rc=0
>> May 09 10:20:52 [3103] pgdb3.cha.arin.net       crmd:   notice: 
>> process_lrm_event:      LRM operation SNMPMon_monitor_0 (call=61, rc=7, 
>> cib-update=28, confirmed=true) not running
>> May 09 10:20:52 [3103] pgdb3.cha.arin.net       crmd:   notice: 
>> process_lrm_event:      LRM operation SNMPMon_start_0 (call=64, rc=0, 
>> cib-update=29, confirmed=true) ok
>> May 09 10:20:52 [3103] pgdb3.cha.arin.net       crmd:   notice: 
>> process_lrm_event:      LRM operation SNMPMon_monitor_60000 (call=67, rc=0, 
>> cib-update=30, confirmed=false) ok
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> 
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
