On May 9, 2013, at 8:05 PM, Andrew Beekhof <and...@beekhof.net> wrote:
> On 10/05/2013, at 12:40 AM, Steven Bambling <smbambl...@arin.net> wrote:
>
>> I'm having some issues with getting some cluster monitoring set up and
>> configured on a 3 node multi-state cluster. I'm using Florian's blog as an
>> example
>> http://floriancrouzat.net/2013/01/monitor-a-pacemaker-cluster-with-ocfpacemakerclustermon-andor-external-agent/.
>>
>> When I create the primitive resource it starts on one of my nodes but spawns
>> multiple instances of crm_mon. I don't see any reason that would cause it
>> to spawn multiple instances; it's very odd behavior.
>
> If you run:
>
> /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMPMon.pid -d -i 0 -E
> /usr/local/bin/pcmk_snmp_helper.sh -e zen.arin.net -h
> /tmp/ClusterMon_SNMPMon.html
>
> manually a few times, what happens? Multiple processes?

Yep, for some reason it's spawning multiple processes.

[root@pgdb3 ~]# ps aux | grep crm_mon
root     30678  0.0  0.0 103244   856 pts/0    S+   05:30   0:00 grep crm_mon
[root@pgdb3 ~]# /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMPMon.pid -d -i 0 -E /usr/local/bin/pcmk_snmp_helper.sh -e zen.arin.net -h /tmp/ClusterMon_SNMPMon.html
[root@pgdb3 ~]# /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMPMon.pid -d -i 0 -E /usr/local/bin/pcmk_snmp_helper.sh -e zen.arin.net -h /tmp/ClusterMon_SNMPMon.html
[root@pgdb3 ~]# /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMPMon.pid -d -i 0 -E /usr/local/bin/pcmk_snmp_helper.sh -e zen.arin.net -h /tmp/ClusterMon_SNMPMon.html
[root@pgdb3 ~]# /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMPMon.pid -d -i 0 -E /usr/local/bin/pcmk_snmp_helper.sh -e zen.arin.net -h /tmp/ClusterMon_SNMPMon.html
[root@pgdb3 ~]# ps aux | grep crm_mon
root     30772  0.0  0.0  82744  2816 ?        S    05:30   0:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMPMon.pid -d -i 0 -E /usr/local/bin/pcmk_snmp_helper.sh -e zen.arin.net -h /tmp/ClusterMon_SNMPMon.html
root     30781  0.0  0.0  82744  2668 ?        S    05:30   0:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMPMon.pid -d -i 0 -E /usr/local/bin/pcmk_snmp_helper.sh -e zen.arin.net -h /tmp/ClusterMon_SNMPMon.html
root     30784  0.0  0.0  82744  2476 ?        S    05:30   0:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMPMon.pid -d -i 0 -E /usr/local/bin/pcmk_snmp_helper.sh -e zen.arin.net -h /tmp/ClusterMon_SNMPMon.html
root     31134  0.0  0.0 103244   856 pts/0    S+   05:30   0:00 grep crm_mon

But the .pid file in the tmp dir only lists 1 pid:

[root@pgdb3 ~]# cat /tmp/ClusterMon_SNMPMon.pid
30772

>> I was also looking for some clarification on what this resource provides… it
>> looks to me that it kicks off a crm_mon in daemon mode that will update a
>> .html file and with -E it will run an external script. But the resource
>> itself doesn't trigger anything if another resource changes state, only if
>> the crm_mon process (monitored with PID) fails and it has to restart.
>
> Correct, it just updates the html file which you can see in your browser.
> Or, with -E, it can send an email or snmp alert.
>
>> If this is correct what is the best practice for monitoring additional
>> resource states?
>
> Define "additional"?
> If the resource fails we'll normally recover it automatically.

An example of an additional resource would be a VIP using IPaddr2. Also, I have a multi-state pgsql resource, so if the resource fails it will either try to restart or promote another node in the cluster to Master.

v/r

STEVE

>> v/r
>>
>> STEVE
>>
>> Below are some additional data points.
>>
>> Creating the Resource
>>
>> [root@pgdb2 tmp]# crm configure primitive SNMPMon ocf:pacemaker:ClusterMon \
>>> params user="root" update="30" extra_options="-E
>>> /usr/local/bin/pcmk_snmp_helper.sh -e zen.arin.net" \
>>> op monitor on-fail="restart" interval="60"
>>
>> Manual crm_mon output
>>
>> Last updated: Thu May 9 10:24:30 2013
>> Last change: Thu May 9 10:20:49 2013 via cibadmin on pgdb2.example.com
>> Stack: cman
>> Current DC: pgdb1.example.com - partition with quorum
>> Version: 1.1.8-7.el6-394e906
>> 3 Nodes configured, unknown expected votes
>> 6 Resources configured.
>>
>> Node pgdb1.example.com: standby
>> Online: [ pgdb2.example.com pgdb3.example.com ]
>>
>> PG_REP_VIP (ocf::heartbeat:IPaddr2): Started pgdb2.example.com
>> PG_CLI_VIP (ocf::heartbeat:IPaddr2): Started pgdb2.example.com
>> Master/Slave Set: msPGSQL [PGSQL]
>>     Masters: [ pgdb2.example.com ]
>>     Slaves: [ pgdb3.example.com ]
>>     Stopped: [ PGSQL:2 ]
>> SNMPMon (ocf::pacemaker:ClusterMon): Started pgdb3.example.com
>>
>> PS to check for process on pgdb3
>>
>> [root@pgdb3 tmp]# ps aux | grep crm_mon
>> root 16097 0.0 0.0 82624 2784 ? S 10:20 0:00
>> /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMPMon.pid -d -i 0 -E
>> /usr/local/bin/pcmk_snmp_helper.sh -e zen.arin.net -h
>> /tmp/ClusterMon_SNMPMon.html
>> root 16099 0.0 0.0 82624 2660 ? S 10:20 0:00
>> /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMPMon.pid -d -i 0 -E
>> /usr/local/bin/pcmk_snmp_helper.sh -e zen.arin.net -h
>> /tmp/ClusterMon_SNMPMon.html
>> root 16104 0.0 0.0 82624 2448 ? S 10:20 0:00
>> /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMPMon.pid -d -i 0 -E
>> /usr/local/bin/pcmk_snmp_helper.sh -e zen.arin.net -h
>> /tmp/ClusterMon_SNMPMon.html
>> root 16515 0.0 0.0 103244 852 pts/0 S+ 10:21 0:00 grep crm_mon
>>
>> Output from corosync.log
>>
>> May 09 10:20:51 [3100] pgdb3.cha.arin.net lrmd: info:
>> process_lrmd_get_rsc_info: Resource 'SNMPMon' not found (3 active
>> resources)
>> May 09 10:20:51 [3100] pgdb3.cha.arin.net lrmd: info:
>> process_lrmd_rsc_register: Added 'SNMPMon' to the rsc list (4 active
>> resources)
>> May 09 10:20:52 [3103] pgdb3.cha.arin.net crmd: info:
>> services_os_action_execute: Managed ClusterMon_meta-data_0 process 16010
>> exited with rc=0
>> May 09 10:20:52 [3103] pgdb3.cha.arin.net crmd: notice:
>> process_lrm_event: LRM operation SNMPMon_monitor_0 (call=61, rc=7,
>> cib-update=28, confirmed=true) not running
>> May 09 10:20:52 [3103] pgdb3.cha.arin.net crmd: notice:
>> process_lrm_event: LRM operation SNMPMon_start_0 (call=64, rc=0,
>> cib-update=29, confirmed=true) ok
>> May 09 10:20:52 [3103] pgdb3.cha.arin.net crmd: notice:
>> process_lrm_event: LRM operation SNMPMon_monitor_60000 (call=67, rc=0,
>> cib-update=30, confirmed=false) ok
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
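
The mismatch reported above (several crm_mon processes running, but only one PID recorded in the pidfile) can be checked mechanically. What follows is a minimal sketch, assuming a POSIX shell. The function name stray_pids is hypothetical and not part of Pacemaker, and the pgrep pattern in the commented usage line is only an assumption about how the processes appear on the command line.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pacemaker): print every PID given on the
# command line that is NOT the one recorded in the pidfile.
# Usage: stray_pids PIDFILE PID [PID...]
stray_pids() {
    pidfile=$1
    shift
    # An unreadable or missing pidfile yields an empty value, so every PID
    # passed in is then reported as stray.
    recorded=$(cat "$pidfile" 2>/dev/null)
    for pid in "$@"; do
        [ "$pid" = "$recorded" ] || printf '%s\n' "$pid"
    done
}

# Possible use on a live node (assumed pattern; pgrep -f matches against the
# full command line):
# stray_pids /tmp/ClusterMon_SNMPMon.pid $(pgrep -f 'crm_mon .*ClusterMon_SNMPMon')
```

With the numbers from the ps output in this thread, the pidfile holds 30772, so the helper would list 30781 and 30784 as the processes the resource agent cannot track through the pidfile.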
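
On the question of reacting when other resources change state: with -E, crm_mon runs the external program for cluster events and hands it the details in environment variables. Below is a minimal sketch of such an agent, assuming the CRM_notify_* variables described for Pacemaker 1.1.x in the blog post linked in this thread. The function name format_event and the logger tag are hypothetical, and this is not the pcmk_snmp_helper.sh script used above.

```shell
#!/bin/sh
# Hypothetical external agent for "crm_mon -E /path/to/this/script".
# Assumption: crm_mon exports CRM_notify_rsc, CRM_notify_node,
# CRM_notify_task, CRM_notify_rc and CRM_notify_desc for each event.

# Format one event as a single log line; unset variables get placeholders.
format_event() {
    printf '%s on %s: %s rc=%s (%s)\n' \
        "${CRM_notify_rsc:-unknown}" \
        "${CRM_notify_node:-unknown}" \
        "${CRM_notify_task:-unknown}" \
        "${CRM_notify_rc:-?}" \
        "${CRM_notify_desc:-no description}"
}

# Only act when an event was actually passed in, so the script can be
# run by hand without side effects.
if [ -n "${CRM_notify_task:-}" ]; then
    format_event | logger -t pcmk_event   # tag name is arbitrary
fi
```

A script along these lines would see events for any resource in the cluster (for example the IPaddr2 VIPs or the msPGSQL promote/demote operations), which may be enough to forward state changes without defining one monitor per resource.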