On Wednesday 31 March 2021 at 16:58:30, Antony Stone wrote:

> I'm only interested in the most recent failure. I'm saying that once that
> failure is more than "failure-timeout" seconds old, I want the fact that
> the resource failed to be forgotten, so that it can be restarted or moved
> between nodes as normal, and not either be moved to another node just
> because (a) there were two failures last Friday and then one today, or (b)
> get stuck and not run on any nodes at all because all three nodes had
> three failures sometime in the past month.
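
The expectation quoted above can be written down as a small model. This is only a sketch of the behaviour being asked for, not Pacemaker's implementation: the function name `effective_fail_count` and the 180-second timeout are hypothetical, chosen for illustration, and the model ignores when the cluster actually re-evaluates failures.

```python
from datetime import datetime, timedelta


def effective_fail_count(fail_count, last_failure, failure_timeout, now):
    """Return the fail count that should still apply at `now`.

    Models the expectation only: once the most recent failure is older
    than failure_timeout, all recorded failures are forgotten.
    """
    if fail_count == 0 or last_failure is None:
        return 0
    if now - last_failure > failure_timeout:
        return 0  # most recent failure has expired; forget the history
    return fail_count


# Hypothetical values: the real failure-timeout is not given in this thread.
failure_timeout = timedelta(seconds=180)
last_failure = datetime(2021, 3, 31, 17, 0, 0)

# One minute after the failure it should still be counted.
soon = effective_fail_count(1, last_failure, failure_timeout,
                            last_failure + timedelta(minutes=1))

# Five minutes after the failure it should have been forgotten.
later = effective_fail_count(1, last_failure, failure_timeout,
                             last_failure + timedelta(minutes=5))
```

In these terms, the report that follows is that 1.1.16 matches `later` after five minutes, while 2.0.1 still shows the count an hour on.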

I've just confirmed that this is working as expected on pacemaker 1.1.16
(Debian 9) and is not working on pacemaker 2.0.1 (Debian 10).

I have one cluster of 3 machines running pacemaker 1.1.16 and I have another
cluster of 3 machines running pacemaker 2.0.1.

They are both running the same set of resources.

I just deliberately killed the same resource on each cluster, and sure enough
"crm status -f" on both told me it had a fail-count of 1, with a last-failure
timestamp.

I waited 5 minutes (well above my failure-timeout value) and asked for
"crm status -f" again.

On pacemaker 1.1.16 there was simply a list of resources; no mention of
failures. Just what I want.

On pacemaker 2.0.1 there was a list of resources plus a fail-count=1 and a
last-failure timestamp of 5 minutes earlier.

To be sure I'm not being impatient, I've left it an hour (I did this test
earlier, while I was still trying to understand the timing interactions) and
the fail-count does not go away.

Does anyone have suggestions on how to debug this difference in behaviour
between pacemaker 1.1.16 and 2.0.1? At present it prevents me from upgrading
an operational cluster, as the result is simply unusable.

Thanks,

Antony.

-- 
Perfection in design is achieved not when there is nothing left to add, but
rather when there is nothing left to take away.

 - Antoine de Saint-Exupery

                                   Please reply to the list;
                                         please *don't* CC me.