Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-10-05 Thread Ken Gaillot
On 10/04/2016 05:34 PM, Andrew Beekhof wrote: > > > On Wed, Oct 5, 2016 at 7:03 AM, Ken Gaillot wrote: > > On 10/02/2016 10:02 PM, Andrew Beekhof wrote: > >> Take a > >> look at all of nagios' options for deciding when a failure becomes > "real". >

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-10-04 Thread Andrew Beekhof
On Wed, Oct 5, 2016 at 7:03 AM, Ken Gaillot wrote: > On 10/02/2016 10:02 PM, Andrew Beekhof wrote: > >> Take a > >> look at all of nagios' options for deciding when a failure becomes > "real". > > > > I used to take a very hard line on this: if you don't want the cluster > > to do anything about

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-10-04 Thread Ken Gaillot
On 10/02/2016 10:02 PM, Andrew Beekhof wrote: >> Take a >> look at all of nagios' options for deciding when a failure becomes "real". > > I used to take a very hard line on this: if you don't want the cluster > to do anything about an error, don't tell us about it. > However I'm slowly changing my

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-10-02 Thread Andrew Beekhof
On Fri, Sep 30, 2016 at 10:28 AM, Ken Gaillot wrote: > On 09/28/2016 10:54 PM, Andrew Beekhof wrote: >> On Sat, Sep 24, 2016 at 9:12 AM, Ken Gaillot wrote: "Ignore" is theoretically possible to escalate, e.g. "ignore 3 failures then migrate", but I can't think of a real-world si

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-29 Thread Ken Gaillot
On 09/28/2016 10:54 PM, Andrew Beekhof wrote: > On Sat, Sep 24, 2016 at 9:12 AM, Ken Gaillot wrote: >>> "Ignore" is theoretically possible to escalate, e.g. "ignore 3 failures >>> then migrate", but I can't think of a real-world situation where that >>> makes sense, >>> >>> >>> really?

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-28 Thread Klaus Wenninger
On 09/29/2016 05:57 AM, Andrew Beekhof wrote: > On Mon, Sep 26, 2016 at 7:39 PM, Klaus Wenninger wrote: >> On 09/24/2016 01:12 AM, Ken Gaillot wrote: >>> On 09/22/2016 05:58 PM, Andrew Beekhof wrote: On Fri, Sep 23, 2016 at 1:58 AM, Ken Gaillot wrote: >>

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-28 Thread Andrew Beekhof
On Mon, Sep 26, 2016 at 7:39 PM, Klaus Wenninger wrote: > On 09/24/2016 01:12 AM, Ken Gaillot wrote: >> On 09/22/2016 05:58 PM, Andrew Beekhof wrote: >>> >>> On Fri, Sep 23, 2016 at 1:58 AM, Ken Gaillot wrote: >>> >>> On 09/22/2016 09:53 AM, Jan Pokorný wrote:

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-28 Thread Andrew Beekhof
On Sat, Sep 24, 2016 at 9:12 AM, Ken Gaillot wrote: > On 09/22/2016 05:58 PM, Andrew Beekhof wrote: >> >> >> On Fri, Sep 23, 2016 at 1:58 AM, Ken Gaillot wrote: >> >> On 09/22/2016 09:53 AM, Jan Pokorný wrote: >> > On 22/09/16 08:42 +0200, Kristoffer Grönlun

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-26 Thread Klaus Wenninger
On 09/24/2016 01:12 AM, Ken Gaillot wrote: > On 09/22/2016 05:58 PM, Andrew Beekhof wrote: >> >> On Fri, Sep 23, 2016 at 1:58 AM, Ken Gaillot wrote: >> >> On 09/22/2016 09:53 AM, Jan Pokorný wrote: >> > On 22/09/16 08:42 +0200, Kristoffer Grönlund wrote: >>

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-23 Thread Ken Gaillot
On 09/22/2016 05:58 PM, Andrew Beekhof wrote: > > > On Fri, Sep 23, 2016 at 1:58 AM, Ken Gaillot wrote: > > On 09/22/2016 09:53 AM, Jan Pokorný wrote: > > On 22/09/16 08:42 +0200, Kristoffer Grönlund wrote: > >> Ken Gaillot <kgail...@redhat.com>

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-22 Thread Andrew Beekhof
On Fri, Sep 23, 2016 at 1:58 AM, Ken Gaillot wrote: > On 09/22/2016 09:53 AM, Jan Pokorný wrote: > > On 22/09/16 08:42 +0200, Kristoffer Grönlund wrote: > >> Ken Gaillot writes: > >> > >>> I'm not saying it's a bad idea, just that it's more complicated than it > >>> first sounds, so it's worth t

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-22 Thread Ken Gaillot
On 09/22/2016 12:58 PM, Kristoffer Grönlund wrote: > Ken Gaillot writes: >> >> "restart" is the only on-fail value that it makes sense to escalate. >> >> block/stop/fence/standby are final. Block means "don't touch the >> resource again", so there can't be any further response to failures. >> Stop

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-22 Thread Kristoffer Grönlund
Ken Gaillot writes: > > "restart" is the only on-fail value that it makes sense to escalate. > > block/stop/fence/standby are final. Block means "don't touch the > resource again", so there can't be any further response to failures. > Stop/fence/standby move the resource off the local node, so fai

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-22 Thread Ken Gaillot
On 09/22/2016 09:53 AM, Jan Pokorný wrote: > On 22/09/16 08:42 +0200, Kristoffer Grönlund wrote: >> Ken Gaillot writes: >> >>> I'm not saying it's a bad idea, just that it's more complicated than it >>> first sounds, so it's worth thinking through the implications. >> >> Thinking about it and look

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-22 Thread Ken Gaillot
On 09/22/2016 10:43 AM, Jan Pokorný wrote: > On 21/09/16 10:51 +1000, Andrew Beekhof wrote: >> On Wed, Sep 21, 2016 at 6:25 AM, Ken Gaillot wrote: >>> Our first proposed approach would add a new hard-fail-threshold >>> operation property. If specified, the cluster would first try restarting >>> th

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-22 Thread Jan Pokorný
On 21/09/16 10:51 +1000, Andrew Beekhof wrote: > On Wed, Sep 21, 2016 at 6:25 AM, Ken Gaillot wrote: >> Our first proposed approach would add a new hard-fail-threshold >> operation property. If specified, the cluster would first try restarting >> the resource on the same node, > > > Well, just a

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-22 Thread Jan Pokorný
On 22/09/16 08:42 +0200, Kristoffer Grönlund wrote: > Ken Gaillot writes: > >> I'm not saying it's a bad idea, just that it's more complicated than it >> first sounds, so it's worth thinking through the implications. > > Thinking about it and looking at how complicated it gets, maybe what > you'

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-22 Thread Kristoffer Grönlund
Ken Gaillot writes: > I'm not saying it's a bad idea, just that it's more complicated than it > first sounds, so it's worth thinking through the implications. Thinking about it and looking at how complicated it gets, maybe what you'd really want, to make it clearer for the user, is the ability t

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-21 Thread Ken Gaillot
On 09/20/2016 07:51 PM, Andrew Beekhof wrote: > > > On Wed, Sep 21, 2016 at 6:25 AM, Ken Gaillot wrote: > > Hi everybody, > > Currently, Pacemaker's on-fail property allows you to configure how the > cluster reacts to operation failures. The default "r

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-21 Thread Ken Gaillot
On 09/21/2016 02:23 AM, Kristoffer Grönlund wrote: > First of all, is there a use case for when fence-after-3-failures is a > useful behavior? I seem to recall some case where someone expected that > to be the behavior and were surprised by how pacemaker works, but that > problem wouldn't be helped

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-21 Thread Klaus Wenninger
On 09/20/2016 10:25 PM, Ken Gaillot wrote: > Hi everybody, > > Currently, Pacemaker's on-fail property allows you to configure how the > cluster reacts to operation failures. The default "restart" means try to > restart on the same node, optionally moving to another node once > migration-threshold

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-21 Thread Kristoffer Grönlund
Kristoffer Grönlund writes: > If implementing the first option, I would prefer to keep the behavior of > migration-threshold of counting all failures, not just > monitors. Otherwise there would be two closely related thresholds with > subtly divergent behavior, which seems confusing indeed. I se

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-21 Thread Kristoffer Grönlund
Ken Gaillot writes: > Hi everybody, > > Currently, Pacemaker's on-fail property allows you to configure how the > cluster reacts to operation failures. The default "restart" means try to > restart on the same node, optionally moving to another node once > migration-threshold is reached. Other pos

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-20 Thread Andrew Beekhof
On Wed, Sep 21, 2016 at 6:25 AM, Ken Gaillot wrote: > Hi everybody, > > Currently, Pacemaker's on-fail property allows you to configure how the > cluster reacts to operation failures. The default "restart" means try to > restart on the same node, optionally moving to another node once > migration