[Pacemaker] Expired fail-count doesn't get cleaned up.

David Coulson Tue, 31 Jul 2012 02:45:24 -0700

I'm running RHEL6 with the tech preview of Pacemaker that ships with it. I have a number of resources with failure-timeout="60", which most of the time does what it is supposed to.

Last night a resource that is part of a clone failed. While the resource recovered, its fail-count never got cleaned up. Roughly every second, the DC logged the pengine message below. I manually ran a resource cleanup, and it seems happy now. Is there something I should be looking for in the logs to indicate that it 'missed' expiring this?


Version: 1.1.6-3.el6-a02c0f19a00c1eb2527ad38f146ebc0834814558

Migration summary:
* Node dresproddns01:

re-openfire-lsb:0: migration-threshold=1000000 fail-count=1 last-failure='Mon Jul 30 21:57:53 2012'

* Node dresproddns02:

Jul 31 05:32:34 dresproddns02 pengine: [2860]: notice: get_failcount: Failcount for cl-openfire on dresproddns01 has expired (limit was 60s)
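To put numbers on how far past the limit this was, here is a small sketch of the arithmetic using the timestamps above. It assumes both timestamps are in the same timezone and that the syslog line (which omits the year) is from 2012; `failcount_expired` is a hypothetical helper modelling "more than failure-timeout seconds since last-failure", not Pacemaker's actual code.

```python
from datetime import datetime, timedelta

# last-failure as shown in the crm_mon migration summary above
last_failure = datetime.strptime("Mon Jul 30 21:57:53 2012", "%a %b %d %H:%M:%S %Y")
# Time of the pengine log line; the year is an assumption (syslog omits it)
log_time = datetime(2012, 7, 31, 5, 32, 34)
# failure-timeout="60" on the resource
failure_timeout = timedelta(seconds=60)

def failcount_expired(last_failure, now, timeout):
    """Hypothetical model: the fail-count is treated as expired once more
    than `timeout` has elapsed since the last recorded failure."""
    return now - last_failure > timeout

elapsed = log_time - last_failure
print(int(elapsed.total_seconds()))  # 27281 seconds, about 7.6 hours
print(failcount_expired(last_failure, log_time, failure_timeout))  # True
```

So by the time of that log line the failure was several hours older than the 60s limit, which matches pengine reporting it as expired even though the fail-count was still shown.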



_______________________________________________
Pacemaker mailing list: [email protected]
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
