Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.
On Fri, Sep 23, 2016 at 1:58 AM, Ken Gaillot wrote:
> On 09/22/2016 09:53 AM, Jan Pokorný wrote:
> > On 22/09/16 08:42 +0200, Kristoffer Grönlund wrote:
> >> Ken Gaillot writes:
> >>> I'm not saying it's a bad idea, just that it's more complicated than it first sounds, so it's worth thinking through the implications.
> >>
> >> Thinking about it and looking at how complicated it gets, maybe what you'd really want, to make it clearer for the user, is the ability to explicitly configure the behavior, either globally or per-resource. So instead of having to tweak a set of variables that interact in complex ways, you'd configure something like rule expressions:
> >>
> >>
> >> So, try to restart the service 3 times; if that fails, migrate the service; if it still fails, fence the node.
> >>
> >> (obviously the details and XML syntax are just an example)
> >>
> >> This would then replace on-fail, migration-threshold, etc.
> >
> > I must admit that in previous emails in this thread, I wasn't able to follow during the first pass, which is not the case with this procedural (sequence-ordered) approach. Though someone could argue it doesn't take the type of operation into account, which might again open the door to non-obvious interactions.
>
> "restart" is the only on-fail value that it makes sense to escalate.
>
> block/stop/fence/standby are final. Block means "don't touch the resource again", so there can't be any further response to failures. Stop/fence/standby move the resource off the local node, so failure handling is reset (there are 0 failures on the new node to begin with).
>
> "Ignore" is theoretically possible to escalate, e.g. "ignore 3 failures then migrate", but I can't think of a real-world situation where that makes sense,

really?
it is not uncommon to hear "I know it's failed, but I don't want the cluster to do anything until it's _really_ failed"

> and it would be a significant re-implementation of "ignore" (which currently ignores the state of having failed, as opposed to a particular instance of failure).

agreed

> What the interface needs to express is: "If this operation fails, optionally try a soft recovery [always stop+start], but if failures occur on the same node, proceed to a [configurable] hard recovery".
>
> And of course the interface will need to be different depending on how certain details are decided, e.g. whether any failures count toward or just failures of one particular operation type, and whether the hard recovery type can vary depending on what operation failed.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
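The escalating behavior described above might be expressed on an operation roughly as follows. This is purely a sketch of the proposal being discussed: `hard-fail-threshold` and `on-hard-fail` are proposed names from this thread, the attribute placement is invented for illustration, and none of it is valid in any released Pacemaker.

```xml
<!-- Hypothetical syntax: hard-fail-threshold / on-hard-fail are proposals,
     not real Pacemaker attributes -->
<primitive id="my-db" class="ocf" provider="heartbeat" type="pgsql">
  <operations>
    <op id="my-db-monitor" name="monitor" interval="10s"
        on-fail="restart" hard-fail-threshold="3" on-hard-fail="fence"/>
  </operations>
</primitive>
```

Read as: each monitor failure triggers a soft recovery (restart); after 3 failures on the same node, escalate to the configured hard recovery (here, fencing).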
Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.
On 09/22/2016 12:58 PM, Kristoffer Grönlund wrote:
> Ken Gaillot writes:
>> "restart" is the only on-fail value that it makes sense to escalate.
>>
>> block/stop/fence/standby are final. Block means "don't touch the resource again", so there can't be any further response to failures. Stop/fence/standby move the resource off the local node, so failure handling is reset (there are 0 failures on the new node to begin with).
>
> Hrm. If a restart potentially migrates the resource to a different node, is the failcount reset then as well? If so, wouldn't that complicate the hard-fail-threshold variable too, since potentially, the resource could keep migrating between nodes, and since the failcount is reset on each migration, it would never reach the hard-fail-threshold. (Or am I missing something?)

The failure count is specific to each node. By "failure handling is reset" I mean that when the resource moves to a different node, the failure count of the original node no longer matters -- the new node's failure count is now what matters.

A node's failure count is reset only when the user manually clears it, or the node is rebooted. Also, resources may have a failure-timeout configured, in which case the count will go down as failures expire.

So, a resource with on-fail=restart would never go back to a node where it had previously reached the threshold, unless that node's fail count were cleared in one of those ways.
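The existing per-node escalation Ken describes is driven by resource meta-attributes. A minimal sketch using today's syntax (resource name and values invented for illustration):

```xml
<primitive id="my-ip" class="ocf" provider="heartbeat" type="IPaddr2">
  <instance_attributes id="my-ip-params">
    <nvpair id="my-ip-addr" name="ip" value="192.168.122.100"/>
  </instance_attributes>
  <meta_attributes id="my-ip-meta">
    <!-- after 3 failures on a node, ban the resource from that node -->
    <nvpair id="my-ip-mt" name="migration-threshold" value="3"/>
    <!-- individual failures expire from the fail count after 10 minutes -->
    <nvpair id="my-ip-ft" name="failure-timeout" value="10min"/>
  </meta_attributes>
</primitive>
```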
Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.
Ken Gaillot writes:
> "restart" is the only on-fail value that it makes sense to escalate.
>
> block/stop/fence/standby are final. Block means "don't touch the resource again", so there can't be any further response to failures. Stop/fence/standby move the resource off the local node, so failure handling is reset (there are 0 failures on the new node to begin with).

Hrm. If a restart potentially migrates the resource to a different node, is the failcount reset then as well? If so, wouldn't that complicate the hard-fail-threshold variable too, since potentially, the resource could keep migrating between nodes, and since the failcount is reset on each migration, it would never reach the hard-fail-threshold. (Or am I missing something?)

--
// Kristoffer Grönlund
// kgronl...@suse.com
Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.
On 09/22/2016 09:53 AM, Jan Pokorný wrote:
> On 22/09/16 08:42 +0200, Kristoffer Grönlund wrote:
>> Ken Gaillot writes:
>>> I'm not saying it's a bad idea, just that it's more complicated than it first sounds, so it's worth thinking through the implications.
>>
>> Thinking about it and looking at how complicated it gets, maybe what you'd really want, to make it clearer for the user, is the ability to explicitly configure the behavior, either globally or per-resource. So instead of having to tweak a set of variables that interact in complex ways, you'd configure something like rule expressions:
>>
>>
>> So, try to restart the service 3 times; if that fails, migrate the service; if it still fails, fence the node.
>>
>> (obviously the details and XML syntax are just an example)
>>
>> This would then replace on-fail, migration-threshold, etc.
>
> I must admit that in previous emails in this thread, I wasn't able to follow during the first pass, which is not the case with this procedural (sequence-ordered) approach. Though someone could argue it doesn't take the type of operation into account, which might again open the door to non-obvious interactions.

"restart" is the only on-fail value that it makes sense to escalate.

block/stop/fence/standby are final. Block means "don't touch the resource again", so there can't be any further response to failures. Stop/fence/standby move the resource off the local node, so failure handling is reset (there are 0 failures on the new node to begin with).

"Ignore" is theoretically possible to escalate, e.g. "ignore 3 failures then migrate", but I can't think of a real-world situation where that makes sense, and it would be a significant re-implementation of "ignore" (which currently ignores the state of having failed, as opposed to a particular instance of failure).
What the interface needs to express is: "If this operation fails, optionally try a soft recovery [always stop+start], but if failures occur on the same node, proceed to a [configurable] hard recovery".

And of course the interface will need to be different depending on how certain details are decided, e.g. whether any failures count toward or just failures of one particular operation type, and whether the hard recovery type can vary depending on what operation failed.
Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.
On 09/22/2016 10:43 AM, Jan Pokorný wrote:
> On 21/09/16 10:51 +1000, Andrew Beekhof wrote:
>> On Wed, Sep 21, 2016 at 6:25 AM, Ken Gaillot wrote:
>>> Our first proposed approach would add a new hard-fail-threshold operation property. If specified, the cluster would first try restarting the resource on the same node,
>>
>> Well, just as now, it would be _allowed_ to start on the same node, but this is not guaranteed.
>
> Yeah, I should attend doublethink classes to understand "the same node" term better:
>
> https://github.com/ClusterLabs/pacemaker/pull/1146/commits/3b3fc1fd8f2c95d8ab757711cf096cf231f27941

"Same node" is really a shorthand that hand-waves some details, because that's what will typically happen. The exact behavior is: "If the fail-count on this node reaches the threshold, ban this node from running the resource."

That's not the same as *requiring* the resource to restart on the same node before the threshold is reached. As in any situation, Pacemaker will re-evaluate the current state of the cluster and choose the best node to try starting the resource on.

For example, if the failed resource with on-fail=restart is colocated with another resource with on-fail=standby that also failed, then the whole node will be put in standby, and the original resource will of course move away. It will be restarted, but the start will happen on another node.

There are endless such scenarios, so "try restarting on the same node" is not really accurate. To be accurate, I should have said something like "try restarting without banning the node with the failure".
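The "ban" described above is effectively a -INFINITY location score for the resource on the failing node, equivalent to a constraint like the following (resource and node names invented for illustration):

```xml
<!-- node1 can never run my-rsc; placement among the remaining nodes
     is re-scored by the scheduler as usual -->
<rsc_location id="ban-my-rsc-on-node1" rsc="my-rsc" node="node1"
              score="-INFINITY"/>
```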
Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.
On 21/09/16 10:51 +1000, Andrew Beekhof wrote:
> On Wed, Sep 21, 2016 at 6:25 AM, Ken Gaillot wrote:
>> Our first proposed approach would add a new hard-fail-threshold operation property. If specified, the cluster would first try restarting the resource on the same node,
>
> Well, just as now, it would be _allowed_ to start on the same node, but this is not guaranteed.

Yeah, I should attend doublethink classes to understand "the same node" term better:

https://github.com/ClusterLabs/pacemaker/pull/1146/commits/3b3fc1fd8f2c95d8ab757711cf096cf231f27941

--
Jan (Poki)
Re: [ClusterLabs] kind=Optional order constraint not working at startup
Hi,

>>> shared_fs has to wait for the DRBD promotion, but the other resources have no such limitation, so they are free to start before shared_fs.
>>
>> Isn't there an implicit limitation by the ordering constraint? I have drbd_promote < shared_fs < snmpAgent-clone, and I would expect this to be a transitive relationship.
>
> Yes, but shared_fs < snmpAgent-clone is optional, so snmpAgent-clone is free to start without it.

I was probably confused by the description in the manual. It says: "* Optional - Only applies if both resources are starting and/or stopping." (from the Red Hat HA documentation). I assumed that this means, e.g., that when all resources are started when I start the cluster, the constraint holds.

>> What is the meaning of "transition"? Is there any way I can force resource actions into transitions?
>
> A transition is simply the cluster's response to the current cluster state, as directed by the configuration. The easiest way to think of it is as the "steps" described above.
>
> If the configuration says a service should be running, but the service is not currently running, then the cluster will schedule a start action (if possible considering constraints, etc.). All such actions that may be scheduled together at one time is a "transition".
>
> You can't really control transitions; you can only control the configuration, and transitions result from configuration+state.
>
> The only way to force actions to take place in a certain order is to use mandatory constraints.
>
> The problem here is that you want the constraint to be mandatory only at "start-up". But there really is no such thing. Consider the case where the cluster stays up, and for whatever maintenance purpose, you stop all the resources, then start them again later. Is that the same as start-up or not? What if you restart all but one resource?
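For reference, the two ordering constraints under discussion would look roughly like this in the CIB (resource names taken from the thread; a sketch, not the poster's actual configuration):

```xml
<constraints>
  <!-- mandatory: shared_fs may only start after the DRBD resource is promoted -->
  <rsc_order id="promote-drbd-then-fs" first="drbd" first-action="promote"
             then="shared_fs" then-action="start" kind="Mandatory"/>
  <!-- optional: ordered only when both happen to start/stop in the same transition -->
  <rsc_order id="fs-then-snmp" first="shared_fs" then="snmpAgent-clone"
             kind="Optional"/>
</constraints>
```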
I think start-up is just a special case of what I think of as a dependency for starting a resource. My current understanding is that a mandatory constraint means "if you start/stop resource A, then you have to start/stop resource B". An optional constraint says that the constraint only holds when you start/stop the two resources together in a single transition.

What I want to express is more like a dependency: "don't start resource A before resource B has been started at all; state changes of resource B should not impact resource A". I realize this is kind of odd, but if A can tolerate outages of its dependency B, e.g. by reconnecting, this makes sense. In principle this is what an optional constraint does, but not restricted to a single transition.

> I can imagine one possible (but convoluted) way to do something like this, using node attributes and rules:
>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm140521751827232
>
> With a rule, you can specify a location constraint that applies, not to a particular node, but to any node with a particular value of a particular node attribute.
>
> You would need a custom resource agent that sets a node attribute. Let's say it takes three parameters: the node attribute name, the value to set when starting (or do nothing), and the value to set when stopping (or do nothing). (That might actually be a good idea for a new ocf:pacemaker: agent.)
>
> You'd have an instance of this resource grouped with shared-fs, that would set the attribute to some magic value when started (say, "1"). You'd have another instance grouped with snmpAgent-clone that would set it differently when stopped ("0"). Then, you'd have a location constraint for snmpAgent-clone with a rule that says it is only allowed on nodes with the attribute set to "1".
>
> With that, snmpAgent-clone would be unable to start until shared-fs had started at least once.
> shared-fs could stop without affecting snmpAgent-clone. If snmpAgent-clone stopped, it would reset, so it would require shared-fs again.
>
> I haven't thought through all possible scenarios, but I think it would give the behavior you want.

That sounds interesting... I think we'll explore a solution which could accept restarting our resources. We only used the cloned resource set because we want our processes up and running to minimize the outage when doing a failover. Currently, the second server is a passive backup which has everything up and running, ready to take over. After the fs switches, it resynchs and then is ready to go. We probably can accept the additional timeout for starting the resources completely, but we have to explore this.

Thanks,
Jens
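Ken's workaround would combine a node attribute (set by the hypothetical helper agent) with a rule-based location constraint, roughly like this (the attribute name fs-ready and all ids are invented for illustration):

```xml
<!-- snmpAgent-clone is banned from any node where the helper
     has not set fs-ready=1 -->
<rsc_location id="snmp-needs-fs" rsc="snmpAgent-clone">
  <rule id="snmp-needs-fs-rule" score="-INFINITY" boolean-op="or">
    <!-- nodes where the attribute was never set -->
    <expression id="snmp-fs-undef" attribute="fs-ready"
                operation="not_defined"/>
    <!-- nodes where it is set to anything other than 1 -->
    <expression id="snmp-fs-ne" attribute="fs-ready"
                operation="ne" value="1"/>
  </rule>
</rsc_location>
```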
Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
On Thu, Sep 22, 2016 at 08:01:44AM +0200, Klaus Wenninger wrote:
> On 09/22/2016 06:34 AM, renayama19661...@ybb.ne.jp wrote:
>> Hi Klaus,
>>
>> Thank you for comment.
>>
>> Okay!
>>
>> Will it mean that improvement is considered in community in future?
>
> Speaking for me I'd like to have some feedback if we might have overseen something so that it is rather a config issue.
>
> One of my current projects is to introduce improved observation of pacemaker_remoted by sbd. (Saying improved here because there is already something when you enable pacemaker-watcher on remote-nodes, but it creates unneeded watchdog-reboots in a couple of cases ...) Looks as if some additional (direct) communication (heartbeat -- the principle, not the communication & membership for clusters) between pacemaker_remoted (very similar to lrmd) and sbd would come in handy for that.
>
> So in this light it might make sense to consider expanding that for crmd as well ...
>
> If we are finally facing an issue I'd herewith like to ask for input.

In a somewhat extended context, there used to be "apphbd", which itself would register with some watchdog to "monitor" itself, and which "applications" would register with to negotiate their own "application heartbeat". Not necessarily only components of the cluster manager, but cluster-aware "resources" as well.

If they failed to feed their app hb, apphbd would then "trigger a notification", and some other entity would react on that based on yet another configuration. And plugins. Didn't old heartbeat like the concept of plugins... Anyways, you get the idea.

Currently, we have SBD chosen as such a "watchdog proxy"; maybe we can generalize it? All of that would require cooperation within the node itself, though. In this scenario, the cluster is not trusting the "sanity" of the "commander in chief".
So maybe, in addition to this "in-node application heartbeat", all non-DCs should periodically actively challenge the sanity of the DC from the outside, and trigger re-election if they have "reasonable doubt"?

--
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker
: R&D, Integration, Ops, Consulting, Support

DRBD® and LINBIT® are registered trademarks of LINBIT
Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.
Ken Gaillot writes:
> I'm not saying it's a bad idea, just that it's more complicated than it first sounds, so it's worth thinking through the implications.

Thinking about it and looking at how complicated it gets, maybe what you'd really want, to make it clearer for the user, is the ability to explicitly configure the behavior, either globally or per-resource. So instead of having to tweak a set of variables that interact in complex ways, you'd configure something like rule expressions:

So, try to restart the service 3 times; if that fails, migrate the service; if it still fails, fence the node.

(obviously the details and XML syntax are just an example)

This would then replace on-fail, migration-threshold, etc.

--
// Kristoffer Grönlund
// kgronl...@suse.com
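The XML example did not survive the list archiver; it may have looked something like the following. This is entirely invented here to illustrate the idea of a sequence-ordered failure policy -- no claim is made about the syntax actually proposed, and none of it is valid Pacemaker XML.

```xml
<!-- Hypothetical escalating failure policy; illustrative only -->
<failure-policy id="my-rsc-failures">
  <response action="restart" max-attempts="3"/>
  <response action="migrate" max-attempts="1"/>
  <response action="fence"/>
</failure-policy>
```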
Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
On 09/22/2016 06:34 AM, renayama19661...@ybb.ne.jp wrote:
> Hi Klaus,
>
> Thank you for comment.
>
> Okay!
>
> Will it mean that improvement is considered in community in future?

Speaking for me I'd like to have some feedback if we might have overseen something so that it is rather a config issue.

One of my current projects is to introduce improved observation of pacemaker_remoted by sbd. (Saying improved here because there is already something when you enable pacemaker-watcher on remote-nodes, but it creates unneeded watchdog-reboots in a couple of cases ...) Looks as if some additional (direct) communication (heartbeat -- the principle, not the communication & membership for clusters) between pacemaker_remoted (very similar to lrmd) and sbd would come in handy for that.

So in this light it might make sense to consider expanding that for crmd as well ...

If we are finally facing an issue I'd herewith like to ask for input.

> Best Regards,
> Hideo Yamauchi.
>
> ----- Original Message -----
>> From: Klaus Wenninger
>> To: users@clusterlabs.org
>> Date: 2016/9/21, Wed 19:09
>> Subject: Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>
>> My observation results still stand:
>>
>> - cluster doesn't autonomously find out of this situation
>> - sbd doesn't help because the cib is still readable
>>
>> Regards,
>> Klaus
>>
>> On 09/21/2016 11:52 AM, renayama19661...@ybb.ne.jp wrote:
>>> Hi All,
>>>
>>> Was the final conclusion given about this problem?
>>>
>>> If a user uses sbd, can the cluster evade a problem of SIGSTOP of crmd?
>>>
>>> We are interested in this problem, too.
>>>
>>> Best Regards,
>>> Hideo Yamauchi.