On Fri, Dec 17, 2010 at 10:56 AM, Chris Picton <ch...@ecntelecoms.com> wrote:
> On Thu, 16 Dec 2010 08:27:51 +0100, Andrew Beekhof wrote:
>
>> On Wed, Dec 15, 2010 at 8:30 AM, Chris Picton
>
>>> Why would a resource cleanup remove the resource from the lrm, even
>>> though it is still running correctly,
>>
>> That's what cleanup does.
>> What is supposed to happen next, however, is that the cluster runs a
>> non-recurring monitor operation to re-determine the current state of the
>> cluster and go from there.
>> Also, any recurring actions should have been cancelled at the point the
>> resource was removed from the lrm.
>>
>> What versions of pacemaker and cluster-glue do you have? Distro?
>>
>
> I am using the clusterlabs rpms
> pacemaker-1.0.9.1-1.15.el5
> cluster-glue-1.0.6-1.6.el5
>
> I see the following in the output of crm_mon -rf1t (I'm only showing the
> resources which are showing rc != 0)
> * Node sbc-tpna2-06.ecntelecoms.za.net: pingd=100
>    megaswitch:5: migration-threshold=1000000
>     + (53) probe: last-rc-change='Fri Nov 26 09:17:38 2010' last-run='Fri
> Nov 26 09:17:38 2010' exec-time=30ms queue-time=0ms rc=1 (unknown error)
>     + (55) stop: last-rc-change='Fri Nov 26 09:17:41 2010' last-run='Fri
> Nov 26 09:17:41 2010' exec-time=20ms queue-time=0ms rc=0 (ok)
>     + (56) start: last-rc-change='Fri Nov 26 09:17:42 2010' last-run='Fri
> Nov 26 09:17:42 2010' exec-time=1040ms queue-time=0ms rc=0 (ok)
>     + (57) monitor: interval=8000ms last-rc-change='Fri Nov 26 09:17:44
> 2010' last-run='Fri Nov 26 09:17:44 2010' exec-time=260ms queue-time=0ms
> rc=0 (ok)
> * Node sbc-tpna2-05.ecntelecoms.za.net: pingd=100
>    megaswitch:4: migration-threshold=1000000
>     + (58) probe: last-rc-change='Fri Nov 26 09:17:38 2010' last-run='Fri
> Nov 26 09:17:38 2010' exec-time=30ms queue-time=0ms rc=1 (unknown error)
>     + (60) stop: last-rc-change='Fri Nov 26 09:17:41 2010' last-run='Fri
> Nov 26 09:17:41 2010' exec-time=20ms queue-time=0ms rc=0 (ok)
>     + (61) start: last-rc-change='Fri Nov 26 09:17:42 2010' last-run='Fri
> Nov 26 09:17:42 2010' exec-time=1040ms queue-time=0ms rc=0 (ok)
>     + (62) monitor: interval=8000ms last-rc-change='Fri Nov 26 09:17:44
> 2010' last-run='Fri Nov 26 09:17:44 2010' exec-time=260ms queue-time=0ms
> rc=0 (ok)
>
> Would this affect the result of the 'non-recurring monitor
> operation' (the probe operations having rc=1)?

Definitely.
They tell us the resource is unhealthy and needs to be stopped.

>
> I am not 100% sure why the errors are there - the log on the server for
> that day shows:
> ----
> Nov 26 09:17:39 sbc-tpna2-06 crmd: [29893]: info: do_lrm_rsc_op:
> Performing key=36:2184:7:c83a06e0-913e-4546-92e5-19f784dcaf5c
> op=megaswitch:5_monitor_0 )
> Nov 26 09:17:39 sbc-tpna2-06 lrmd: [29890]: info: rsc:megaswitch:5:53:
> probe
> Nov 26 09:17:39 sbc-tpna2-06 lrmd: [29890]: WARN: Managed
> megaswitch:5:monitor process 24823 exited with return code 1.
> Nov 26 09:17:39 sbc-tpna2-06 lrmd: [29890]: WARN: Managed
> megaswitch:5:monitor process 24823 exited with return code 1.
> Nov 26 09:17:39 sbc-tpna2-06 crmd: [29893]: info: process_lrm_event: LRM
> operation megaswitch:5_monitor_0 (call=53, rc=1, cib-update=68,
> confirmed=true) unknown error
> ----
>
> If they are affecting it, how would I clear them, so pacemaker sees
> everything as OK?

Clearing them won't help, because we'll just go and check the status
again - which will fail again.
You need to fix the agent.

>
> Thanks for the help
>
> Chris
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs:
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
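
As a rough aid for spotting the problem described above, here is a minimal sketch in Python that scans the text printed by crm_mon -rf1t and lists probe operations whose return code is not one a probe would normally report. It assumes the OCF convention that a probe returns 0 when the resource is running, 7 when it is cleanly stopped, and 8 when it is running as master; anything else (such as the rc=1 shown in this thread) points at the agent's monitor action. The function name failed_probes, the regular expression, and the SAMPLE text are illustrative only and are not part of Pacemaker.

```python
import re

# Return codes a probe may legitimately report under the OCF convention
# (assumption stated in the text above): 0 = running, 7 = not running,
# 8 = running as master.
HEALTHY_PROBE_RCS = {0, 7, 8}

# Matches one operation-history entry such as
#   + (53) probe: ... rc=1 (unknown error)
OP_RE = re.compile(
    r"\+ \((?P<call>\d+)\) (?P<op>\w+):.*?rc=(?P<rc>\d+) \((?P<msg>[^)]*)\)"
)


def failed_probes(crm_mon_text):
    """Return (call id, rc, message) for every probe with an unexpected rc."""
    # Entries may be wrapped across lines, so collapse all whitespace first.
    text = " ".join(crm_mon_text.split())
    return [
        (int(m["call"]), int(m["rc"]), m["msg"])
        for m in OP_RE.finditer(text)
        if m["op"] == "probe" and int(m["rc"]) not in HEALTHY_PROBE_RCS
    ]


# Hypothetical input, shaped like the output quoted in this thread.
SAMPLE = """
megaswitch:5: migration-threshold=1000000
 + (53) probe: last-rc-change='Fri Nov 26 09:17:38 2010' last-run='Fri
Nov 26 09:17:38 2010' exec-time=30ms queue-time=0ms rc=1 (unknown error)
 + (55) stop: last-rc-change='Fri Nov 26 09:17:41 2010' last-run='Fri
Nov 26 09:17:41 2010' exec-time=20ms queue-time=0ms rc=0 (ok)
"""

print(failed_probes(SAMPLE))
```

Any entry this reports is a probe the cluster will treat as a failure, so it will come back after a cleanup until the agent's monitor action returns one of the expected codes.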