On Tue, Jul 31, 2012 at 7:36 PM, David Coulson <[email protected]> wrote:
> I'm running RHEL6 with the tech preview of pacemaker it ships with. I have a
> number of resources with failure-timeout="60", which most of the time does
> what it is supposed to.
>
> Last night a resource that is part of a clone failed. While the resource
> recovered, the fail-count never got cleaned up, and roughly every second the
> DC logged the pengine message below. I manually did a resource cleanup, and
> it seems happy now. Is there something I should be looking for in the logs
> to indicate that it 'missed' expiring this?
You might be experiencing:

+ David Vossel (5 months ago) 9263480: Low: pengine: cl#5025 - Automatically clear failures when resource configuration changes.

But if you send us a crm_report tarball covering the period during which you had problems, we can check.

> Version: 1.1.6-3.el6-a02c0f19a00c1eb2527ad38f146ebc0834814558
>
> Migration summary:
> * Node dresproddns01:
>    re-openfire-lsb:0: migration-threshold=1000000 fail-count=1
>      last-failure='Mon Jul 30 21:57:53 2012'
> * Node dresproddns02:
>
> Jul 31 05:32:34 dresproddns02 pengine: [2860]: notice: get_failcount:
> Failcount for cl-openfire on dresproddns01 has expired (limit was 60s)

_______________________________________________
Pacemaker mailing list: [email protected]
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
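For the archives, a rough sketch of the manual workaround and of generating the report (resource and node names taken from the migration summary above; the time window and output path are illustrative, and exact option letters vary between Pacemaker versions, so check your man pages):

```shell
# Query the current fail count for the failed clone instance on that node
# (older shell-based crm_failcount takes the node via -U; newer builds use -N)
crm_failcount -G -r re-openfire-lsb -U dresproddns01

# Manually clear the stale fail count and failed-op history, as was done here
crm_resource --cleanup -r re-openfire-lsb -N dresproddns01

# Generate a crm_report tarball covering the failure window
# (-f/-t are the from/to timestamps; last argument names the output tarball)
crm_report -f "2012-07-30 21:00:00" -t "2012-07-31 06:00:00" /tmp/openfire-failcount
```

These commands act on a live cluster, so they only make sense run on a cluster node with appropriate privileges.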
