Could someone please reply?

On Thu, Nov 19, 2015 at 10:28 PM, Pritam Kharat <pritam.kha...@oneconvergence.com> wrote:
> Hi All,
>
> I have a 2-node HA setup. I have set migration_threshold=5 and
> failure-timeout=120s on my resources. When the fail-count reaches the
> migration threshold of 5, the resources are migrated to the other node.
> However, on one occasion the fail-count was not reset to zero after
> 2 minutes. The setup stayed in that state for almost 3 hours, and the
> fail-count still did not reset.
>
> I then ran the same test again but could not reproduce the problem.
> Comparing the logs of the successful run with those of the failed run,
> I found that pengine took no action to clear the failcount in the
> failed run.
>
> Success logs
> *Nov 19 15:27:08 [16409] sc-node-1 pengine: notice: unpack_rsc_op: Clearing expired failcount for oc-service-manager on sc-node-1*
> Nov 19 15:27:08 [16409] sc-node-1 pengine: info: get_failcount_full: oc-service-manager has failed 5 times on sc-node-1
> Nov 19 15:27:08 [16409] sc-node-1 pengine: notice: unpack_rsc_op: Clearing expired failcount for oc-service-manager on sc-node-1
> Nov 19 15:27:08 [16409] sc-node-1 pengine: notice: unpack_rsc_op: Re-initiated expired calculated failure oc-service-manager_last_failure_0 (rc=7, magic=0:7;3:145:0:258ae879-832f-4126-a7d7-e57bd3fdcdb1) on sc-node-1
>
> Failure logs
> Nov 04 22:23:39 [6831] sc-HA2 pengine: warning: unpack_rsc_op: Processing failed op monitor for oc-service-manager on sc-HA1: not running (7)
> Nov 04 22:23:39 [6831] sc-HA2 pengine: info: native_print: oc-service-manager (upstart:oc-service-manager): Started sc-HA2
> *Nov 04 22:23:39 [6831] sc-HA2 pengine: info: get_failcount_full: oc-service-manager has failed 5 times on sc-HA1*
> Nov 04 22:23:39 [6831] sc-HA2 pengine: warning: common_apply_stickiness: Forcing oc-service-manager away from sc-HA1 after 5 failures (max=5)
> Nov 04 22:23:39 [6831] sc-HA2 pengine: info: rsc_merge_weights: oc-service-manager: Rolling back scores from oc-fw-agent
> Nov 04 22:23:39 [6831] sc-HA2 pengine: info: LogActions: Leave oc-service-manager (Started sc-HA2)
>
> What might be the reason that, in the failure case, this action did
> not take place?
> *notice: unpack_rsc_op: Clearing expired failcount for oc-service-manager*
>
> --
> Thanks and Regards,
> Pritam Kharat.

--
Thanks and Regards,
Pritam Kharat.
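
P.S. In case it helps anyone compare the two runs, here is a minimal sketch of the log comparison described above. It is not part of Pacemaker: the function name count_failcount_clears is hypothetical, and it assumes the pengine messages look exactly like the excerpts quoted in this mail. It only counts how often "Clearing expired failcount" appears per resource and node.

```python
import re
from collections import Counter

# Pattern taken from the pengine message quoted in this thread, e.g.
# "... unpack_rsc_op: Clearing expired failcount for oc-service-manager on sc-node-1"
CLEARED = re.compile(r"Clearing expired failcount for (\S+) on (\S+)")


def count_failcount_clears(log_text):
    """Count 'Clearing expired failcount' messages per (resource, node).

    Hypothetical helper for comparing two log excerpts; the message
    format is assumed from the lines quoted in this mail.
    """
    # Drop mail emphasis asterisks and collapse whitespace so that
    # log lines wrapped by the mail client still match.
    flat = " ".join(log_text.replace("*", " ").split())
    return Counter(CLEARED.findall(flat))


if __name__ == "__main__":
    import sys

    # Usage: python count_clears.py success.log failure.log
    for path in sys.argv[1:]:
        with open(path) as handle:
            print(path, dict(count_failcount_clears(handle.read())))
```

An empty result for one excerpt and a non-empty result for the other only confirms that the message is missing from that run; it does not say why pengine skipped the action.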
_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org