Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-10-04 Andrew Beekhof
On Wed, Oct 5, 2016 at 7:03 AM, Ken Gaillot wrote: > On 10/02/2016 10:02 PM, Andrew Beekhof wrote: > >> Take a > >> look at all of nagios' options for deciding when a failure becomes > "real". > > > > I used to take a very hard line on this: if you don't want the cluster > >

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-10-04 Ken Gaillot
On 10/02/2016 10:02 PM, Andrew Beekhof wrote: >> Take a >> look at all of nagios' options for deciding when a failure becomes "real". > > I used to take a very hard line on this: if you don't want the cluster > to do anything about an error, don't tell us about it. > However I'm slowly changing
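
The Nagios options mentioned here revolve around its "soft" versus "hard" problem states: a failing check only becomes a hard, actionable problem after `max_check_attempts` consecutive non-OK results. The Python sketch below is a simplified model of that idea for illustration only. It ignores Nagios's retry intervals, notifications and flap detection, and `classify_checks` is a hypothetical helper, not a Nagios API.

```python
from typing import List


def classify_checks(results: List[bool], max_check_attempts: int) -> List[str]:
    """Label each check result as "OK", "SOFT" or "HARD".

    results: True for a passing check, False for a failing one.
    A failure stays SOFT until max_check_attempts consecutive failures
    have been seen; any passing check resets the count.
    """
    if max_check_attempts < 1:
        raise ValueError("max_check_attempts must be at least 1")
    states = []
    consecutive_failures = 0
    for passed in results:
        if passed:
            consecutive_failures = 0
            states.append("OK")
        else:
            consecutive_failures += 1
            if consecutive_failures >= max_check_attempts:
                states.append("HARD")
            else:
                states.append("SOFT")
    return states


# Three failures in a row become a hard problem; a recovery resets the count.
print(classify_checks([False, False, False, True, False], max_check_attempts=3))
# ['SOFT', 'SOFT', 'HARD', 'OK', 'SOFT']
```

The soft-attempt window is essentially what the RFC proposes for Pacemaker: tolerate a few failures with a gentle response before the configured, harsher reaction applies.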

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-10-02 Andrew Beekhof
On Fri, Sep 30, 2016 at 10:28 AM, Ken Gaillot wrote: > On 09/28/2016 10:54 PM, Andrew Beekhof wrote: >> On Sat, Sep 24, 2016 at 9:12 AM, Ken Gaillot wrote: "Ignore" is theoretically possible to escalate, e.g. "ignore 3 failures then

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-29 Ken Gaillot
On 09/28/2016 10:54 PM, Andrew Beekhof wrote: > On Sat, Sep 24, 2016 at 9:12 AM, Ken Gaillot wrote: >>> "Ignore" is theoretically possible to escalate, e.g. "ignore 3 failures >>> then migrate", but I can't think of a real-world situation where that >>> makes

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-23 Ken Gaillot
On 09/22/2016 05:58 PM, Andrew Beekhof wrote: > > > On Fri, Sep 23, 2016 at 1:58 AM, Ken Gaillot > wrote: > > On 09/22/2016 09:53 AM, Jan Pokorný wrote: > > On 22/09/16 08:42 +0200, Kristoffer Grönlund wrote: > >> Ken Gaillot

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-22 Andrew Beekhof
On Fri, Sep 23, 2016 at 1:58 AM, Ken Gaillot wrote: > On 09/22/2016 09:53 AM, Jan Pokorný wrote: > > On 22/09/16 08:42 +0200, Kristoffer Grönlund wrote: > >> Ken Gaillot writes: > >> > >>> I'm not saying it's a bad idea, just that it's more complicated

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-22 Ken Gaillot
On 09/22/2016 12:58 PM, Kristoffer Grönlund wrote: > Ken Gaillot writes: >> >> "restart" is the only on-fail value that it makes sense to escalate. >> >> block/stop/fence/standby are final. Block means "don't touch the >> resource again", so there can't be any further

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-22 Kristoffer Grönlund
Ken Gaillot writes: > > "restart" is the only on-fail value that it makes sense to escalate. > > block/stop/fence/standby are final. Block means "don't touch the > resource again", so there can't be any further response to failures. > Stop/fence/standby move the resource off
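
The argument quoted here, that only "restart" leaves room for a further response while block/stop/fence/standby are final, can be captured as a small classification. This is a hypothetical Python sketch mirroring the reasoning in the thread, not a Pacemaker data structure; "ignore" is deliberately left unclassified because the thread treats its escalation as only theoretical.

```python
# Hypothetical classification mirroring the quoted argument; these sets
# are an illustration, not part of Pacemaker.
ESCALATABLE = {"restart"}
FINAL = {"block", "stop", "fence", "standby"}


def can_escalate(on_fail: str) -> bool:
    """Return True if a further response could still follow this on-fail action."""
    if on_fail in ESCALATABLE:
        return True
    if on_fail in FINAL:
        # block: the resource is never touched again;
        # stop/fence/standby: the resource is taken off the node.
        return False
    raise ValueError(f"on-fail value not covered by this sketch: {on_fail!r}")


print(can_escalate("restart"))  # True
print(can_escalate("block"))    # False
```

Under this view, a soft-recovery proposal only needs to describe how repeated "restart" responses hand over to one of the final actions.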

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-22 Ken Gaillot
On 09/22/2016 09:53 AM, Jan Pokorný wrote: > On 22/09/16 08:42 +0200, Kristoffer Grönlund wrote: >> Ken Gaillot writes: >> >>> I'm not saying it's a bad idea, just that it's more complicated than it >>> first sounds, so it's worth thinking through the implications. >> >>

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-22 Ken Gaillot
On 09/22/2016 10:43 AM, Jan Pokorný wrote: > On 21/09/16 10:51 +1000, Andrew Beekhof wrote: >> On Wed, Sep 21, 2016 at 6:25 AM, Ken Gaillot wrote: >>> Our first proposed approach would add a new hard-fail-threshold >>> operation property. If specified, the cluster would first

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-22 Jan Pokorný
On 21/09/16 10:51 +1000, Andrew Beekhof wrote: > On Wed, Sep 21, 2016 at 6:25 AM, Ken Gaillot wrote: >> Our first proposed approach would add a new hard-fail-threshold >> operation property. If specified, the cluster would first try restarting >> the resource on the same
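
The first proposed approach, a `hard-fail-threshold` operation property, is only described in truncated form here. The Python sketch below assumes the semantics implied by the RFC's title: while the failure count is below the threshold the cluster restarts the resource on the same node, and once the threshold is reached the configured on-fail action applies. `OperationConfig` and `response_to_failure` are hypothetical names, and `hard-fail-threshold` was a proposal under discussion, not an existing Pacemaker property.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OperationConfig:
    on_fail: str                                # e.g. "block", "stop", "fence"
    hard_fail_threshold: Optional[int] = None   # proposed property (hypothetical)


def response_to_failure(cfg: OperationConfig, fail_count: int) -> str:
    """Return the response to the fail_count-th failure (1-based).

    Assumed semantics: below hard_fail_threshold, restart on the same
    node; at or above it, apply the configured on-fail action.
    """
    if fail_count < 1:
        raise ValueError("fail_count is 1-based")
    if cfg.hard_fail_threshold is None:
        return cfg.on_fail  # no threshold: act on the first failure
    if fail_count < cfg.hard_fail_threshold:
        return "restart"    # soft recovery attempt on the same node
    return cfg.on_fail


fence_after_3 = OperationConfig(on_fail="fence", hard_fail_threshold=3)
print([response_to_failure(fence_after_3, n) for n in range(1, 5)])
# ['restart', 'restart', 'fence', 'fence']
```

With `on_fail="fence"` and a threshold of 3, this yields the "fence-after-3-failures" behavior whose use case is questioned elsewhere in the thread.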

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-22 Kristoffer Grönlund
Ken Gaillot writes: > I'm not saying it's a bad idea, just that it's more complicated than it > first sounds, so it's worth thinking through the implications. Thinking about it and looking at how complicated it gets, maybe what you'd really want, to make it clearer for the

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-21 Ken Gaillot
On 09/20/2016 07:51 PM, Andrew Beekhof wrote: > > > On Wed, Sep 21, 2016 at 6:25 AM, Ken Gaillot > wrote: > > Hi everybody, > > Currently, Pacemaker's on-fail property allows you to configure how the > cluster reacts to operation

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-21 Ken Gaillot
On 09/21/2016 02:23 AM, Kristoffer Grönlund wrote: > First of all, is there a use case for when fence-after-3-failures is a > useful behavior? I seem to recall some case where someone expected that > to be the behavior and were surprised by how pacemaker works, but that > problem wouldn't be

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-21 Klaus Wenninger
On 09/20/2016 10:25 PM, Ken Gaillot wrote: > Hi everybody, > > Currently, Pacemaker's on-fail property allows you to configure how the > cluster reacts to operation failures. The default "restart" means try to > restart on the same node, optionally moving to another node once > migration-threshold

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-21 Kristoffer Grönlund
Kristoffer Grönlund writes: > If implementing the first option, I would prefer to keep the behavior of > migration-threshold of counting all failures, not just > monitors. Otherwise there would be two closely related thresholds with > subtly divergent behavior, which seems
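
Kristoffer's concern is that a threshold counting only monitor failures would diverge from one counting all failures, which is how he describes migration-threshold. The hypothetical Python sketch below shows the same failure history tripping one threshold but not the other. `counted_failures` and the `monitors_only` flag are illustrative assumptions, not Pacemaker code.

```python
from typing import List


def counted_failures(failed_ops: List[str], monitors_only: bool) -> int:
    """Count failures that would feed a threshold.

    failed_ops holds the names of failed operations, e.g. "monitor",
    "start", "stop". monitors_only=True models a hypothetical threshold
    that counts only monitor failures.
    """
    if monitors_only:
        return sum(1 for op in failed_ops if op == "monitor")
    return len(failed_ops)


history = ["monitor", "start", "monitor"]
threshold = 3
print(counted_failures(history, monitors_only=False) >= threshold)  # True
print(counted_failures(history, monitors_only=True) >= threshold)   # False
```

The same three failures reach a threshold of 3 when every operation counts, but not when only monitors do. This is the kind of subtle divergence between two closely related thresholds that the message warns about.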

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-21 Kristoffer Grönlund
Ken Gaillot writes: > Hi everybody, > > Currently, Pacemaker's on-fail property allows you to configure how the > cluster reacts to operation failures. The default "restart" means try to > restart on the same node, optionally moving to another node once > migration-threshold

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-20 Andrew Beekhof
On Wed, Sep 21, 2016 at 6:25 AM, Ken Gaillot wrote: > Hi everybody, > > Currently, Pacemaker's on-fail property allows you to configure how the > cluster reacts to operation failures. The default "restart" means try to > restart on the same node, optionally moving to another
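
The default on-fail behavior described in the RFC is to restart on the same node, optionally moving to another node once migration-threshold is reached. The Python sketch below models only that node-choice rule under simplifying assumptions. Real placement also depends on scores, constraints and stickiness, which are ignored here. `restart_node` and the node names are hypothetical, and `None` stands for "no threshold configured".

```python
from typing import Dict, List, Optional


def restart_node(current: str, nodes: List[str], fail_counts: Dict[str, int],
                 migration_threshold: Optional[int]) -> Optional[str]:
    """Choose where to restart a failed resource under on-fail=restart.

    A node is eligible while its fail count is below migration_threshold
    (None means no threshold). Prefer the current node; otherwise take
    the first other eligible node; return None if no node is eligible.
    """
    def eligible(node: str) -> bool:
        if migration_threshold is None:
            return True
        return fail_counts.get(node, 0) < migration_threshold

    if eligible(current):
        return current
    for node in nodes:
        if node != current and eligible(node):
            return node
    return None


nodes = ["node1", "node2"]
print(restart_node("node1", nodes, {"node1": 1}, migration_threshold=3))  # node1
print(restart_node("node1", nodes, {"node1": 3}, migration_threshold=3))  # node2
```

This is the only escalation path that exists today: repeated restarts in place, then a move. The RFC asks whether a similar "soft attempts first" step should precede ignore, block and the other on-fail actions.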