Re: [ClusterLabs] best practice fencing with ipmi in 2node-setups / cloneresource/monitor/timeout
On 09/21/2016 01:51 AM, Stefan Bauer wrote:
> Hi Ken,
>
> let me sum it up:
>
> Pacemaker in recent versions is smart enough to run (trigger, execute) the
> fence operation on the node that is not the target.
>
> If I have an external stonith device that can fence multiple nodes, a single
> primitive is enough in pacemaker.
>
> If with external/ipmi I can only address a single node, I need to have
> multiple primitives - one for each node.
>
> In this case it's recommended to let the primitive always run on the opposite
> node - right?

Yes, exactly :-)

In terms of implementation, I'd use a +INFINITY location constraint to
tie the device to the opposite node. This approach (as opposed to a
-INFINITY constraint on the target node) allows the target node to run
the fence device when the opposite node is unavailable.

> Thank you.
>
> Stefan
>
> -----Original Message-----
>> From: Ken Gaillot <kgail...@redhat.com>
>> Sent: Tue 20 September 2016 16:49
>> To: users@clusterlabs.org
>> Subject: Re: [ClusterLabs] best practice fencing with ipmi in 2node-setups /
>> cloneresource/monitor/timeout
>>
>> On 09/20/2016 06:42 AM, Digimer wrote:
>>> On 20/09/16 06:59 AM, Stefan Bauer wrote:
>>>> Hi,
>>>>
>>>> I run a 2 node cluster and want to be safe in split-brain scenarios. For
>>>> this I set up external/ipmi to stonith the other node.
>>>
>>> Please use 'fence_ipmilan'. I believe that the older external/ipmi are
>>> deprecated (someone correct me if I am wrong on this).
>>
>> It's just an alternative. The "external/" agents come with the
>> cluster-glue package, which isn't provided by some distributions (such
>> as RHEL and its derivatives), so it's "deprecated" on those only.
>>
>>>> Some possible issues jumped to my mind and I would like to find the best
>>>> practice solution:
>>>>
>>>> - I have a primitive for each node to stonith. Many documents and guides
>>>> recommend to never let them run on the host it should fence. I would
>>>> set up clone resources to avoid dealing with locations that would also
>>>> influence scoring. Does that make sense?
>>>
>>> Since v1.1.10 of pacemaker, you don't have to worry about this.
>>> Pacemaker is smart enough to know where to run a fence call from in
>>> order to terminate a target.
>>
>> Right, fence devices can run anywhere now, and in fact they don't even
>> have to be "running" for pacemaker to use them -- as long as they are
>> configured and not intentionally disabled, pacemaker will use them.
>>
>> There is still a slight advantage to not running a fence device on a
>> node it can fence. "Running" a fence device in pacemaker really means
>> running the recurring monitor for it. Since the node that runs the
>> monitor has "verified" access to the device, pacemaker will prefer to
>> use it to execute that device. However, pacemaker will not use a node to
>> fence itself, except as a last resort if no other node is available. So,
>> running a fence device on a node it can fence means that the preference
>> is lost.
>>
>> That's a very minor detail, not worth worrying about. It's more a matter
>> of personal preference.
>>
>> In this particular case, a more relevant concern is that you need
>> different configurations for the different targets (the IPMI address is
>> different).
>>
>> One approach is to define two different fence devices, each with one
>> IPMI address. In that case, it makes sense to use the location
>> constraints to ensure the device prefers the node that's not its target.
>>
>> Another approach (if the fence agent supports it) is to use
>> pcmk_host_map to provide a different "port" (IPMI address) depending on
>> which host is being fenced. In this case, you need only one fence device
>> to be able to fence both hosts. You don't need a clone. (Remember, the
>> node "running" the device merely refers to its monitor, so the cluster
>> can still use the fence device, even if that node crashes.)
>>
>>>> - Monitoring operation on the stonith primitive is dangerous. I read
>>>> that if monitor operations fail for the stonith device, stonith action
>>>> is triggered. I think it's not clever to give the cluster the option to
>>>> fence a node just because it has an issue to monitor a fence device.
>>>> That should not be a reason to shut down a node. What is your opinion on
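
For illustration, the two-device approach with +INFINITY location constraints described in the reply above might look roughly like this in crm shell syntax. This is only a sketch: the node names (node1, node2), resource IDs, IPMI addresses, and credentials are placeholders that do not come from the thread, and the external/ipmi parameter names should be checked against the installed agent (for example with `crm ra info stonith:external/ipmi`) before use.

```
# Hypothetical names, addresses, and credentials throughout.
# One fence device per target node, each with its own IPMI address.
primitive fence-node1 stonith:external/ipmi \
    params hostname=node1 ipaddr=192.0.2.11 userid=admin passwd=secret interface=lanplus \
    op monitor interval=60s
primitive fence-node2 stonith:external/ipmi \
    params hostname=node2 ipaddr=192.0.2.12 userid=admin passwd=secret interface=lanplus \
    op monitor interval=60s

# +INFINITY ties each device to the opposite node, but (unlike a -INFINITY
# constraint on the target) still lets the target run it if the peer is down.
location l-fence-node1 fence-node1 inf: node2
location l-fence-node2 fence-node2 inf: node1
```

The monitor operation is left enabled here, in line with the advice elsewhere in the thread that monitoring is how a failed or unplugged IPMI connection gets noticed.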
Re: [ClusterLabs] best practice fencing with ipmi in 2node-setups / cloneresource/monitor/timeout
On 09/20/2016 06:42 AM, Digimer wrote:
> On 20/09/16 06:59 AM, Stefan Bauer wrote:
>> Hi,
>>
>> I run a 2 node cluster and want to be safe in split-brain scenarios. For
>> this I set up external/ipmi to stonith the other node.
>
> Please use 'fence_ipmilan'. I believe that the older external/ipmi are
> deprecated (someone correct me if I am wrong on this).

It's just an alternative. The "external/" agents come with the
cluster-glue package, which isn't provided by some distributions (such
as RHEL and its derivatives), so it's "deprecated" on those only.

>> Some possible issues jumped to my mind and I would like to find the best
>> practice solution:
>>
>> - I have a primitive for each node to stonith. Many documents and guides
>> recommend to never let them run on the host it should fence. I would
>> set up clone resources to avoid dealing with locations that would also
>> influence scoring. Does that make sense?
>
> Since v1.1.10 of pacemaker, you don't have to worry about this.
> Pacemaker is smart enough to know where to run a fence call from in
> order to terminate a target.

Right, fence devices can run anywhere now, and in fact they don't even
have to be "running" for pacemaker to use them -- as long as they are
configured and not intentionally disabled, pacemaker will use them.

There is still a slight advantage to not running a fence device on a
node it can fence. "Running" a fence device in pacemaker really means
running the recurring monitor for it. Since the node that runs the
monitor has "verified" access to the device, pacemaker will prefer to
use it to execute that device. However, pacemaker will not use a node to
fence itself, except as a last resort if no other node is available. So,
running a fence device on a node it can fence means that the preference
is lost.

That's a very minor detail, not worth worrying about. It's more a matter
of personal preference.

In this particular case, a more relevant concern is that you need
different configurations for the different targets (the IPMI address is
different).

One approach is to define two different fence devices, each with one
IPMI address. In that case, it makes sense to use the location
constraints to ensure the device prefers the node that's not its target.

Another approach (if the fence agent supports it) is to use
pcmk_host_map to provide a different "port" (IPMI address) depending on
which host is being fenced. In this case, you need only one fence device
to be able to fence both hosts. You don't need a clone. (Remember, the
node "running" the device merely refers to its monitor, so the cluster
can still use the fence device, even if that node crashes.)

>> - Monitoring operation on the stonith primitive is dangerous. I read
>> that if monitor operations fail for the stonith device, stonith action
>> is triggered. I think it's not clever to give the cluster the option to
>> fence a node just because it has an issue to monitor a fence device.
>> That should not be a reason to shut down a node. What is your opinion on
>> this? Can I just set the primitive monitor operation to disabled?
>
> Monitoring is how you will detect that, for example, the IPMI cable
> failed or was unplugged. I do not believe the node will get fenced on
> fence agent monitor failing... At least not by default.

I am not aware of any situation in which a failing fence monitor
triggers a fence. Monitoring is good -- it verifies that the fence
device is still working.

One concern particular to on-board IPMI devices is that they typically
share the same power supply as their host. So if the machine loses
power, the cluster can't contact the IPMI to fence it -- which means it
will be unable to recover any resources from the lost node. (It can't
assume the node lost power -- it's possible just network connectivity
between the two nodes was lost.)

The only way around that is to have a second fence device (such as an
intelligent power switch). If the cluster can't reach the IPMI, it will
try the second device.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
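
The fallback to a second fence device mentioned above is what Pacemaker calls a fencing topology. A rough sketch in crm shell syntax follows; it assumes per-node IPMI devices named fence-node1 and fence-node2 and a power-switch device named fence-pdu are already defined and able to fence the listed nodes. All of these names are hypothetical and not taken from the thread.

```
# Hypothetical resource IDs; fence-pdu stands in for an intelligent power switch.
# For each node, level 1 is tried first; if it fails, level 2 is tried.
fencing_topology \
    node1: fence-node1 fence-pdu \
    node2: fence-node2 fence-pdu
```

With this layout the cluster would attempt IPMI first and only fall back to the power switch when the IPMI device cannot be reached, which covers the case where the host and its on-board IPMI lose power together.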
Re: [ClusterLabs] best practice fencing with ipmi in 2node-setups / cloneresource/monitor/timeout
On 20/09/16 06:59 AM, Stefan Bauer wrote:
> Hi,
>
> I run a 2 node cluster and want to be safe in split-brain scenarios. For
> this I set up external/ipmi to stonith the other node.

Please use 'fence_ipmilan'. I believe that the older external/ipmi are
deprecated (someone correct me if I am wrong on this).

> Some possible issues jumped to my mind and I would like to find the best
> practice solution:
>
> - I have a primitive for each node to stonith. Many documents and guides
> recommend to never let them run on the host it should fence. I would
> set up clone resources to avoid dealing with locations that would also
> influence scoring. Does that make sense?

Since v1.1.10 of pacemaker, you don't have to worry about this.
Pacemaker is smart enough to know where to run a fence call from in
order to terminate a target.

> - Monitoring operation on the stonith primitive is dangerous. I read
> that if monitor operations fail for the stonith device, stonith action
> is triggered. I think it's not clever to give the cluster the option to
> fence a node just because it has an issue to monitor a fence device.
> That should not be a reason to shut down a node. What is your opinion on
> this? Can I just set the primitive monitor operation to disabled?

Monitoring is how you will detect that, for example, the IPMI cable
failed or was unplugged. I do not believe the node will get fenced on
fence agent monitor failing... At least not by default.

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
[ClusterLabs] best practice fencing with ipmi in 2node-setups / cloneresource/monitor/timeout
Hi,

I run a 2 node cluster and want to be safe in split-brain scenarios. For
this I set up external/ipmi to stonith the other node.

Some possible issues jumped to my mind and I would like to find the best
practice solution:

- I have a primitive for each node to stonith. Many documents and guides
recommend to never let them run on the host it should fence. I would
set up clone resources to avoid dealing with locations that would also
influence scoring. Does that make sense?

- Monitoring operation on the stonith primitive is dangerous. I read
that if monitor operations fail for the stonith device, stonith action
is triggered. I think it's not clever to give the cluster the option to
fence a node just because it has an issue to monitor a fence device.
That should not be a reason to shut down a node. What is your opinion on
this? Can I just set the primitive monitor operation to disabled?

Thank you.

Stefan