Hi Dominik

The problem is that the cluster does not run the monitor action every 10s. The last time it ran the action was at 09:21, and it is now 10:37:

MySQL_MonitorAgent_Resource: migration-threshold=3
    + (479) stop: last-rc-change='Wed Mar 17 09:21:28 2010' last-run='Wed Mar 17 09:21:28 2010' exec-time=3010ms queue-time=0ms rc=0 (ok)
    + (480) start: last-rc-change='Wed Mar 17 09:21:31 2010' last-run='Wed Mar 17 09:21:31 2010' exec-time=3010ms queue-time=0ms rc=0 (ok)
    + (481) monitor: interval=10000ms last-rc-change='Wed Mar 17 09:21:34 2010' last-run='Wed Mar 17 09:21:34 2010' exec-time=20ms queue-time=0ms rc=0 (ok)

If I restart the whole cluster, the cluster monitor does see the new return code (exit 99 or exit 4).

2010/3/17 Dominik Klein <d...@in-telegence.net>:
> Hi Tom
>
> Have a look at the logs and see whether the monitor op really returns
> 99 (grep for the resource-id). If so, I'm not sure what the cluster
> does with rc=99. As far as I know, rc=4 would be status=failed (unknown,
> actually).
>
> Regards
> Dominik
>
> Tom Tux wrote:
>> Thanks for your hint.
>>
>> I've configured an lsb-resource like this (with migration-threshold):
>>
>> primitive MySQL_MonitorAgent_Resource lsb:mysql-monitor-agent \
>>     meta target-role="Started" migration-threshold="3" \
>>     op monitor interval="10s" timeout="20s" on-fail="restart"
>>
>> I have now modified the init script "/etc/init.d/mysql-monitor-agent"
>> to exit with a return code other than "0" (for example, exit 99) when
>> the monitor operation queries the status. But the cluster does not
>> recognise a failed monitor action. Why does it behave this way? For
>> the cluster, everything seems OK.
>>
>> node1:/ # showcores.sh MySQL_MonitorAgent_Resource
>> Resource                     Score     Node   Stickiness  #Fail  Migration-Threshold
>> MySQL_MonitorAgent_Resource  -1000000  node1  100         0      3
>> MySQL_MonitorAgent_Resource  100       node2  100         0      3
>>
>> I also saw that the "last-run" entry (crm_mon -fort1) for this
>> resource is not up to date. It seems to me that the monitor action
>> does not occur every 10 seconds. Why? Any hints about this behaviour?
>>
>> Thanks a lot.
>> Tom
>>
>>
>> 2010/3/16 Dominik Klein <d...@in-telegence.net>:
>>> Tom Tux wrote:
>>>> Hi
>>>>
>>>> I have a question about resource monitoring:
>>>> I'm monitoring an IP resource every 20 seconds. I have configured the
>>>> "On Fail" action with "restart". This works fine. If the
>>>> "monitor" operation fails, the resource is restarted.
>>>>
>>>> But how can I configure this resource to migrate to the other node if
>>>> it still fails after 10 restarts? Is this possible? How does
>>>> the "failcount" interact with this scenario?
>>>>
>>>> In the documentation I read that the resource "fail_count"
>>>> increases every time the resource restarts. But I can't see this
>>>> fail_count.
>>> Look at the meta attribute "migration-threshold".
>>>
>>> Regards
>>> Dominik
>
>
> _______________________________________________
> Pacemaker mailing list
> Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
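
For reference, the kind of status check discussed in this thread could be sketched roughly as below. This is only a minimal sketch under stated assumptions, not the actual mysql-monitor-agent init script: the PID file path and the function name lsb_status are hypothetical. It follows the LSB convention for the "status" action (0 = running, 1 = dead but PID file exists, 3 = not running), which is what the lsb resource class uses for its monitor operation.

```shell
#!/bin/sh
# Hypothetical helper reporting service state with LSB "status" exit codes.
# PIDFILE is an assumed location, not taken from the real init script.
PIDFILE="${PIDFILE:-/var/run/mysql-monitor-agent.pid}"

lsb_status() {
    # No PID file at all: the service is not running (LSB status code 3).
    [ -f "$PIDFILE" ] || return 3

    pid=$(cat "$PIDFILE" 2>/dev/null)

    # kill -0 sends no signal; it only tests whether the process exists
    # (and whether we are allowed to signal it).
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        return 0    # running
    fi

    # PID file is present but the process is gone (LSB status code 1).
    return 1
}
```

In an init script, the "status)" branch would call lsb_status and exit with its return value. To simulate a failure for testing, it is probably safer to make the status branch return one of these defined codes than an arbitrary value such as 99.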