Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-22 Thread Andrew Beekhof
On Fri, Sep 23, 2016 at 1:58 AM, Ken Gaillot  wrote:

> On 09/22/2016 09:53 AM, Jan Pokorný wrote:
> > On 22/09/16 08:42 +0200, Kristoffer Grönlund wrote:
> >> Ken Gaillot  writes:
> >>
> >>> I'm not saying it's a bad idea, just that it's more complicated than it
> >>> first sounds, so it's worth thinking through the implications.
> >>
> >> Thinking about it and looking at how complicated it gets, maybe what
> >> you'd really want, to make it clearer for the user, is the ability to
> >> explicitly configure the behavior, either globally or per-resource. So
> >> instead of having to tweak a set of variables that interact in complex
> >> ways, you'd configure something like rule expressions,
> >>
> >> <recovery>
> >>   <restart attempts="3"/>
> >>   <migrate/>
> >>   <fence/>
> >> </recovery>
> >>
> >> So, try to restart the service 3 times, if that fails migrate the
> >> service, if it still fails, fence the node.
> >>
> >> (obviously the details and XML syntax are just an example)
> >>
> >> This would then replace on-fail, migration-threshold, etc.
> >
> > I must admit that in previous emails in this thread, I wasn't able to
> > follow during the first pass, which is not the case with this procedural
> > (sequence-ordered) approach.  Though someone can argue it doesn't take
> > type of operation into account, which might again open the door for
> > non-obvious interactions.
>
> "restart" is the only on-fail value that it makes sense to escalate.
>
> block/stop/fence/standby are final. Block means "don't touch the
> resource again", so there can't be any further response to failures.
> Stop/fence/standby move the resource off the local node, so failure
> handling is reset (there are 0 failures on the new node to begin with).
>
> "Ignore" is theoretically possible to escalate, e.g. "ignore 3 failures
> then migrate", but I can't think of a real-world situation where that
> makes sense,


really?

it is not uncommon to hear "I know it's failed, but I don't want the cluster
to do anything until it's _really_ failed"


> and it would be a significant re-implementation of "ignore"
> (which currently ignores the state of having failed, as opposed to a
> particular instance of failure).
>

agreed


>
> What the interface needs to express is: "If this operation fails,
> optionally try a soft recovery [always stop+start], but if N failures
> occur on the same node, proceed to a [configurable] hard recovery".
>
> And of course the interface will need to be different depending on how
> certain details are decided, e.g. whether any failures count toward N
> or just failures of one particular operation type, and whether the hard
> recovery type can vary depending on what operation failed.
>

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-22 Thread Ken Gaillot
On 09/22/2016 12:58 PM, Kristoffer Grönlund wrote:
> Ken Gaillot  writes:
>>
>> "restart" is the only on-fail value that it makes sense to escalate.
>>
>> block/stop/fence/standby are final. Block means "don't touch the
>> resource again", so there can't be any further response to failures.
>> Stop/fence/standby move the resource off the local node, so failure
>> handling is reset (there are 0 failures on the new node to begin with).
> 
> Hrm. If a restart potentially migrates the resource to a different node,
> is the failcount reset then as well? If so, wouldn't that complicate the
> hard-fail-threshold variable too, since potentially, the resource could
> keep migrating between nodes and since the failcount is reset on each
> migration, it would never reach the hard-fail-threshold. (or am I
> missing something?)

The failure count is specific to each node. By "failure handling is
reset" I mean that when the resource moves to a different node, the
failure count of the original node no longer matters -- the new node's
failure count is now what matters.

A node's failure count is reset only when the user manually clears it,
or the node is rebooted. Also, resources may have a failure-timeout
configured, in which case the count will go down as failures expire.

So, a resource with on-fail=restart would never go back to a node where
it had previously reached the threshold, unless that node's fail count
were cleared in one of those ways.
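
(Purely as an illustration of the existing behaviour described above --
resource and node names are invented, and exact tool options can differ
slightly between Pacemaker versions:

    # show current fail counts for all resources
    crm_mon --failcounts

    # manually clear the fail count for "my-rsc" on "node1"
    crm_resource --cleanup --resource my-rsc --node node1

    # let failures expire automatically after 10 minutes
    pcs resource meta my-rsc failure-timeout=10min
)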


Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-22 Thread Kristoffer Grönlund
Ken Gaillot  writes:
>
> "restart" is the only on-fail value that it makes sense to escalate.
>
> block/stop/fence/standby are final. Block means "don't touch the
> resource again", so there can't be any further response to failures.
> Stop/fence/standby move the resource off the local node, so failure
> handling is reset (there are 0 failures on the new node to begin with).

Hrm. If a restart potentially migrates the resource to a different node,
is the failcount reset then as well? If so, wouldn't that complicate the
hard-fail-threshold variable too, since potentially, the resource could
keep migrating between nodes and since the failcount is reset on each
migration, it would never reach the hard-fail-threshold. (or am I
missing something?)

-- 
// Kristoffer Grönlund
// kgronl...@suse.com


Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-22 Thread Ken Gaillot
On 09/22/2016 09:53 AM, Jan Pokorný wrote:
> On 22/09/16 08:42 +0200, Kristoffer Grönlund wrote:
>> Ken Gaillot  writes:
>>
>>> I'm not saying it's a bad idea, just that it's more complicated than it
>>> first sounds, so it's worth thinking through the implications.
>>
>> Thinking about it and looking at how complicated it gets, maybe what
>> you'd really want, to make it clearer for the user, is the ability to
>> explicitly configure the behavior, either globally or per-resource. So
>> instead of having to tweak a set of variables that interact in complex
>> ways, you'd configure something like rule expressions,
>>
>> <recovery>
>>   <restart attempts="3"/>
>>   <migrate/>
>>   <fence/>
>> </recovery>
>>
>> So, try to restart the service 3 times, if that fails migrate the
>> service, if it still fails, fence the node.
>>
>> (obviously the details and XML syntax are just an example)
>>
>> This would then replace on-fail, migration-threshold, etc.
> 
> I must admit that in previous emails in this thread, I wasn't able to
> follow during the first pass, which is not the case with this procedural
> (sequence-ordered) approach.  Though someone can argue it doesn't take
> type of operation into account, which might again open the door for
> non-obvious interactions.

"restart" is the only on-fail value that it makes sense to escalate.

block/stop/fence/standby are final. Block means "don't touch the
resource again", so there can't be any further response to failures.
Stop/fence/standby move the resource off the local node, so failure
handling is reset (there are 0 failures on the new node to begin with).

"Ignore" is theoretically possible to escalate, e.g. "ignore 3 failures
then migrate", but I can't think of a real-world situation where that
makes sense, and it would be a significant re-implementation of "ignore"
(which currently ignores the state of having failed, as opposed to a
particular instance of failure).

What the interface needs to express is: "If this operation fails,
optionally try a soft recovery [always stop+start], but if N failures
occur on the same node, proceed to a [configurable] hard recovery".

And of course the interface will need to be different depending on how
certain details are decided, e.g. whether any failures count toward N
or just failures of one particular operation type, and whether the hard
recovery type can vary depending on what operation failed.
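
To make that concrete, here is a purely illustrative sketch of how such a
per-operation setting might look in the CIB -- "hard-fail-threshold" is
only the proposal being discussed in this thread, not an existing
Pacemaker option, and the final name and syntax are undecided:

    <op id="db-monitor-10s" name="monitor" interval="10s"
        on-fail="fence" hard-fail-threshold="3"/>

i.e. on a monitor failure, try a soft recovery (stop+start), and only
after 3 failures on the same node escalate to the configured hard
recovery (here, fencing).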


Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-22 Thread Ken Gaillot
On 09/22/2016 10:43 AM, Jan Pokorný wrote:
> On 21/09/16 10:51 +1000, Andrew Beekhof wrote:
>> On Wed, Sep 21, 2016 at 6:25 AM, Ken Gaillot  wrote:
>>> Our first proposed approach would add a new hard-fail-threshold
>>> operation property. If specified, the cluster would first try restarting
>>> the resource on the same node,
>>
>>
>> Well, just as now, it would be _allowed_ to start on the same node, but
>> this is not guaranteed.
> 
> Yeah, I should attend doublethink classes to understand "the same
> node" term better:
> 
> https://github.com/ClusterLabs/pacemaker/pull/1146/commits/3b3fc1fd8f2c95d8ab757711cf096cf231f27941

"Same node" is really a shorthand to hand-wave some details, because
that's what will typically happen.

The exact behavior is: "If the fail-count on this node reaches N, ban
this node from running the resource."

That's not the same as *requiring* the resource to restart on the same
node before N is reached. As in any situation, Pacemaker will
re-evaluate the current state of the cluster, and choose the best node
to try starting the resource on.

For example, consider a failed resource with on-fail=restart that is
colocated with another resource with on-fail=standby which also failed:
the whole node will be put in standby, and the original resource will of
course move away. It will be restarted, but the start will happen on
another node.

There are endless such scenarios, so "try restarting on the same node"
is not really accurate. To be accurate, I should have said something
like "try restarting without banning the node with the failure".
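
(The same mechanism can be seen with today's migration-threshold: once
the fail count reaches the threshold, the policy engine effectively
gives the resource a -INFINITY score on that node, which you can inspect
with something like

    crm_simulate --live-check --show-scores

-- the exact output format varies by version.)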


Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-22 Thread Jan Pokorný
On 21/09/16 10:51 +1000, Andrew Beekhof wrote:
> On Wed, Sep 21, 2016 at 6:25 AM, Ken Gaillot  wrote:
>> Our first proposed approach would add a new hard-fail-threshold
>> operation property. If specified, the cluster would first try restarting
>> the resource on the same node,
> 
> 
> Well, just as now, it would be _allowed_ to start on the same node, but
> this is not guaranteed.

Yeah, I should attend doublethink classes to understand "the same
node" term better:

https://github.com/ClusterLabs/pacemaker/pull/1146/commits/3b3fc1fd8f2c95d8ab757711cf096cf231f27941


-- 
Jan (Poki)



Re: [ClusterLabs] kind=Optional order constraint not working at startup

2016-09-22 Thread Auer, Jens
Hi,

> >> shared_fs has to wait for the DRBD promotion, but the other resources
> >> have no such limitation, so they are free to start before shared_fs.
> >
> > Isn't there an implicit limitation by the ordering constraint? I have
> > drbd_promote < shared_fs < snmpAgent-clone, and I would expect this to
> > be a transitive relationship.
>
> Yes, but shared_fs < snmpAgent-Clone is optional, so snmpAgent-Clone is
> free to start without it.
I was probably confused by the description in the manual. It says
"* Optional - Only applies if both resources are starting and/or stopping"
(from the RedHat HA documentation). I assumed this means, for example,
that when all resources are started as I start the cluster, the
constraint holds.
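
For reference, the constraint under discussion is an ordering with
kind=Optional; in CIB XML it would look roughly like this, using the
resource names from this thread:

    <rsc_order id="order-shared_fs-then-snmpAgent"
               first="shared_fs" then="snmpAgent-clone" kind="Optional"/>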

> > What is the meaning of "transition"? Is there any way I can force
> > resource actions into transitions?
>
> A transition is simply the cluster's response to the current cluster
> state, as directed by the configuration. The easiest way to think of it
> is as the "steps" as described above.
>
> If the configuration says a service should be running, but the service
> is not currently running, then the cluster will schedule a start action
> (if possible considering constraints, etc.). All such actions that may
> be scheduled together at one time is a "transition".
>
> You can't really control transitions; you can only control the
> configuration, and transitions result from configuration+state.
>
> The only way to force actions to take place in a certain order is to
> use mandatory constraints.
>
> The problem here is that you want the constraint to be mandatory only
> at "start-up". But there really is no such thing. Consider the case
> where the cluster stays up, and for whatever maintenance purpose, you
> stop all the resources, then start them again later. Is that the same
> as start-up or not? What if you restart all but one resource?
I think start-up is just a special case of what I see as a dependency for
starting a resource. My current understanding is that a mandatory
constraint means "if you start/stop resource A, then you have to
start/stop resource B". An optional constraint says that the constraint
only holds when you start/stop the two resources together in a single
transition. What I want to express is more like a dependency: "don't
start resource A before resource B has been started at all; later state
changes of resource B should not impact resource A". I realize this is
kind of odd, but if A can tolerate outages of its dependency B, e.g. by
reconnecting, this makes sense. In principle this is what an optional
constraint does, just not restricted to a single transition.

> I can imagine one possible (but convoluted) way to do something like
> this, using node attributes and rules:
>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm140521751827232
>
> With a rule, you can specify a location constraint that applies, not to
> a particular node, but to any node with a particular value of a
> particular node attribute.
>
> You would need a custom resource agent that sets a node attribute.
> Let's say it takes three parameters, the node attribute name, the value
> to set when starting (or do nothing), and the value to set when
> stopping (or do nothing). (That might actually be a good idea for a new
> ocf:pacemaker: agent.)
>
> You'd have an instance of this resource grouped with shared-fs, that
> would set the attribute to some magic value when started (say, "1").
> You'd have another instance grouped with snmpAgent-clone that would set
> it differently when stopped ("0"). Then, you'd have a location
> constraint for snmpAgent-clone with a rule that says it is only allowed
> on nodes with the attribute set to "1".
>
> With that, snmpAgent-clone would be unable to start until shared-fs had
> started at least once. shared-fs could stop without affecting
> snmpAgent-clone. If snmpAgent-clone stopped, it would reset, so it
> would require shared-fs again.
>
> I haven't thought through all possible scenarios, but I think it would
> give the behavior you want.
That sounds interesting... I think we will explore a solution which could
accept restarting our resources. We only used the cloned resource set
because we want our processes up and running, to minimize the outage when
doing a failover. Currently, the second server is a passive backup which
has everything up and running, ready to take over. After the fs switches,
it resyncs and then is ready to go. We can probably accept the additional
delay of starting the resources completely, but we have to explore this.

Thanks,

  Jens


Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely

2016-09-22 Thread Lars Ellenberg
On Thu, Sep 22, 2016 at 08:01:44AM +0200, Klaus Wenninger wrote:
> On 09/22/2016 06:34 AM, renayama19661...@ybb.ne.jp wrote:
> > Hi Klaus,
> >
> > Thank you for comment.
> >
> > Okay!
> >
> > Will it mean that this improvement will be considered in the
> > community in the future?
> 
> Speaking for myself, I'd like some feedback on whether we might have
> overlooked something, so that it is rather a config issue.
>
> One of my current projects is to introduce improved observation of
> pacemaker_remoted by sbd. (Saying improved here because there is already
> something when you enable pacemaker-watcher on remote-nodes, but it
> creates unneeded watchdog-reboots in a couple of cases ...)
> It looks as if some additional (direct) communication (heartbeat - the
> principle, not the communication & membership for clusters) between
> pacemaker_remoted (very similar to lrmd) and sbd would come in handy
> for that.
>
> So in this light it might make sense to consider expanding that to crmd
> as well ...
>
> If we are indeed facing an issue, I'd herewith like to ask for input.

In a somewhat extended context, there used to be "apphbd",
which itself would register with some watchdog to "monitor" itself,
and which "applications" would register with to negotiate their own
"application heartbeat".
Not necessarily only components of the cluster manager,
but cluster-aware "resources" as well.

If they fail to feed their app hb, apphbd would then "trigger a
notification", and some other entity would react to that based on
yet another configuration.  And plugins.
Didn't old heartbeat like the concept of plugins...

Anyways, you get the idea.

Currently, we have SBD chosen as such a "watchdog proxy",
maybe we can generalize it?

All of that would require cooperation within the node itself, though.

In this scenario, the cluster is not trusting the "sanity"
of the "commander in chief".

So maybe in addition of this "in-node application heartbeat",
all non-DCs should periodically actively challenge the sanity
of the DC from the outside, and trigger re-election if they have
"reasonable doubt"?


-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker
: R&D, Integration, Ops, Consulting, Support

DRBD® and LINBIT® are registered trademarks of LINBIT


Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-22 Thread Kristoffer Grönlund
Ken Gaillot  writes:

> I'm not saying it's a bad idea, just that it's more complicated than it
> first sounds, so it's worth thinking through the implications.

Thinking about it and looking at how complicated it gets, maybe what
you'd really want, to make it clearer for the user, is the ability to
explicitly configure the behavior, either globally or per-resource. So
instead of having to tweak a set of variables that interact in complex
ways, you'd configure something like rule expressions,

<recovery>
  <restart attempts="3"/>
  <migrate/>
  <fence/>
</recovery>

So, try to restart the service 3 times, if that fails migrate the
service, if it still fails, fence the node.

(obviously the details and XML syntax are just an example)

This would then replace on-fail, migration-threshold, etc.

-- 
// Kristoffer Grönlund
// kgronl...@suse.com


Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely

2016-09-22 Thread Klaus Wenninger
On 09/22/2016 06:34 AM, renayama19661...@ybb.ne.jp wrote:
> Hi Klaus,
>
> Thank you for comment.
>
> Okay!
>
> Will it mean that this improvement will be considered in the community
> in the future?

Speaking for myself, I'd like some feedback on whether we might have
overlooked something, so that it is rather a config issue.

One of my current projects is to introduce improved observation of
pacemaker_remoted by sbd. (Saying improved here because there is already
something when you enable pacemaker-watcher on remote-nodes, but it
creates unneeded watchdog-reboots in a couple of cases ...)
It looks as if some additional (direct) communication (heartbeat - the
principle, not the communication & membership for clusters) between
pacemaker_remoted (very similar to lrmd) and sbd would come in handy for
that.

So in this light it might make sense to consider expanding that to crmd
as well ...

If we are indeed facing an issue, I'd herewith like to ask for input.

>
> Best Regards,
> Hideo Yamauchi.
>
>
>
> - Original Message -
>> From: Klaus Wenninger 
>> To: users@clusterlabs.org
>> Cc: 
>> Date: 2016/9/21, Wed 19:09
>> Subject: Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster 
>> decisions are delayed infinitely
>>
>> My observation results still stand:
>>
>> - the cluster doesn't autonomously find its way out of this situation
>> - sbd doesn't help because the cib is still readable
>>
>> Regards,
>> Klaus
>>
>> On 09/21/2016 11:52 AM, renayama19661...@ybb.ne.jp wrote:
>>>  Hi All,
>>>
>>>  Was a final conclusion reached about this problem?
>>>
>>>  If a user uses sbd, can the cluster avoid the problem of crmd
>>>  receiving SIGSTOP?
>>>
>>>  We are interested in this problem, too.
>>>
>>>  Best Regards,
>>>
>>>  Hideo Yamauchi.
>>>