Re: [ClusterLabs] cluster-recheck-interval and failure-timeout

2021-03-31 Thread Ken Gaillot
On Wed, 2021-03-31 at 17:38 +0200, Antony Stone wrote:
> On Wednesday 31 March 2021 at 16:58:30, Antony Stone wrote:
>
> > I'm only interested in the most recent failure. I'm saying that once that
> > failure is more than "failure-timeout" seconds old, I want the fact that
> > the

Re: [ClusterLabs] cluster-recheck-interval and failure-timeout

2021-03-31 Thread Antony Stone
On Wednesday 31 March 2021 at 16:58:30, Antony Stone wrote:
> I'm only interested in the most recent failure. I'm saying that once that
> failure is more than "failure-timeout" seconds old, I want the fact that
> the resource failed to be forgotten, so that it can be restarted or moved
> between

Re: [ClusterLabs] cluster-recheck-interval and failure-timeout

2021-03-31 Thread Antony Stone
On Wednesday 31 March 2021 at 15:48:15, Ken Gaillot wrote:
> On Wed, 2021-03-31 at 14:32 +0200, Antony Stone wrote:
> >
> > So, what am I misunderstanding about "failure-timeout", and what
> > configuration setting do I need to use to tell pacemaker that "provided the
> > resource hasn't failed

Re: [ClusterLabs] cluster-recheck-interval and failure-timeout

2021-03-31 Thread Antony Stone
On Wednesday 31 March 2021 at 15:48:15, Ken Gaillot wrote:
> On Wed, 2021-03-31 at 14:32 +0200, Antony Stone wrote:
>
> > So, what am I misunderstanding about "failure-timeout", and what
> > configuration setting do I need to use to tell pacemaker that "provided the
> > resource hasn't failed

Re: [ClusterLabs] cluster-recheck-interval and failure-timeout

2021-03-31 Thread Ken Gaillot
On Wed, 2021-03-31 at 14:32 +0200, Antony Stone wrote:
> Hi.
>
> I'm trying to understand what looks to me like incorrect behaviour between
> cluster-recheck-interval and failure-timeout, under pacemaker 2.0.1
>
> I have three machines in a corosync (3.0.1 if it matters) cluster,
> managing

[ClusterLabs] cluster-recheck-interval and failure-timeout

2021-03-31 Thread Antony Stone
Hi.

I'm trying to understand what looks to me like incorrect behaviour between cluster-recheck-interval and failure-timeout, under pacemaker 2.0.1

I have three machines in a corosync (3.0.1 if it matters) cluster, managing 12 resources in a single group.

I'm following documentation from: