Re: [ClusterLabs] Pacemaker startup-fencing
Andrei Borzenkov writes:

> On Wed, Mar 16, 2016 at 2:22 PM, Ferenc Wágner wrote:
>
>> Pacemaker explained says about this cluster option:
>>
>>   Advanced Use Only: Should the cluster shoot unseen nodes? Not using
>>   the default is very unsafe!
>>
>> 1. What are those "unseen" nodes?
>
> Nodes that lost communication with other nodes (think of unplugging cables)

Translating to node status, does it mean UNCLEAN (offline) nodes which
suddenly return? Can Pacemaker tell these apart from abruptly power
cycled nodes (when the reboot happens before the comeback)? I guess if a
node was successfully fenced at the time, it won't be considered
UNCLEAN, but is that the only way to avoid that?

>> And a possibly related question:
>>
>> 2. If I've got UNCLEAN (offline) nodes, is there a way to clean them up,
>>    so that they don't get fenced when I switch them on? I mean without
>>    removing the node altogether, to keep its capacity settings for
>>    example.
>
> You can declare node as down using "crm node clearstate". You should
> not really do it unless you ascertained that node is actually
> physically down.

Great. Is there an equivalent in bare bones Pacemaker, that is, not
involving the CRM shell? Like deleting some status or LRMD history
element of the node, for example?

>> And some more about fencing:
>>
>> 3. What's the difference in cluster behavior between
>>    - stonith-enabled=FALSE (9.3.2: how often will the stop operation be
>>      retried?)
>>    - having no configured STONITH devices (resources won't be started,
>>      right?)
>>    - failing to STONITH with some error (on every node)
>>    - timing out the STONITH operation
>>    - manual fencing
>
> I do not think there is much difference. Without fencing pacemaker
> cannot make decision to relocate resources so cluster will be stuck.

Then I wonder why I hear the "must have working fencing if you value
your data" mantra so often (and always without explanation). After all,
it does not risk the data, only the automatic cluster recovery, right?

>> 4. What's the modern way to do manual fencing? (stonith_admin
>>    --confirm + what?
>
> node name.

:) I worded that question really poorly. I meant to ask what kind of
cluster (STONITH) configuration makes the cluster sit patiently until I
do the manual fencing, then carry on without timeouts or other errors.
Just as if some automatic fencing agent did the job, but letting me
investigate the node status beforehand.

-- 
Thanks,
Feri

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
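
Editorial note on the "stonith_admin --confirm + node name" answer above. The following is a minimal sketch in Python, not something posted in the thread. It assumes a hypothetical node name "node2" and that stonith_admin is on the PATH. It defaults to a dry run that only prints the command, because confirming a fencing that did not really happen is as unsafe as clearing the state of a node that is not physically down. It does not answer the configuration question of how to make the cluster wait for the operator.

```python
import shlex
import subprocess
from typing import List


def confirm_fencing_argv(node: str) -> List[str]:
    """Build the argument vector for manually confirming that `node` is down."""
    if not node or node.startswith("-"):
        # Refuse empty names and anything that would be parsed as an option.
        raise ValueError("expected a node name")
    return ["stonith_admin", "--confirm", node]


def confirm_fencing(node: str, dry_run: bool = True) -> str:
    """Return the command line; execute it only when dry_run is False."""
    argv = confirm_fencing_argv(node)
    if not dry_run:
        # Only do this after verifying by hand that the node is really off.
        subprocess.run(argv, check=True)
    return shlex.join(argv)


if __name__ == "__main__":
    # "node2" is a hypothetical node name used for illustration.
    print(confirm_fencing("node2"))
```

Calling confirm_fencing("node2", dry_run=False) would actually run the command, so it belongs at the end of a manual check of the node, not in an automated path.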
Re: [ClusterLabs] Pacemaker startup-fencing
On Wed, Mar 16, 2016 at 4:18 PM, Lars Ellenberg wrote:
> On Wed, Mar 16, 2016 at 01:47:52PM +0100, Ferenc Wágner wrote:
>> >> And some more about fencing:
>> >>
>> >> 3. What's the difference in cluster behavior between
>> >>    - stonith-enabled=FALSE (9.3.2: how often will the stop operation be
>> >>      retried?)
>> >>    - having no configured STONITH devices (resources won't be started,
>> >>      right?)
>> >>    - failing to STONITH with some error (on every node)
>> >>    - timing out the STONITH operation
>> >>    - manual fencing
>> >
>> > I do not think there is much difference. Without fencing pacemaker
>> > cannot make decision to relocate resources so cluster will be stuck.
>>
>> Then I wonder why I hear the "must have working fencing if you value
>> your data" mantra so often (and always without explanation). After all,
>> it does not risk the data, only the automatic cluster recovery, right?
>
> stonith-enabled=false
> means:
> if some node becomes unresponsive,
> it is immediately *assumed* it was "clean" dead.
> no fencing takes place,
> resource takeover happens without further protection.
>

Oh! Actually it is not quite clear from documentation; documentation
does not explain what happens in case of stonith-enabled=false at all.
[ClusterLabs] Pacemaker startup-fencing
Hi,

Pacemaker explained says about this cluster option:

  Advanced Use Only: Should the cluster shoot unseen nodes? Not using
  the default is very unsafe!

1. What are those "unseen" nodes?

And a possibly related question:

2. If I've got UNCLEAN (offline) nodes, is there a way to clean them up,
   so that they don't get fenced when I switch them on? I mean without
   removing the node altogether, to keep its capacity settings for
   example.

And some more about fencing:

3. What's the difference in cluster behavior between
   - stonith-enabled=FALSE (9.3.2: how often will the stop operation be
     retried?)
   - having no configured STONITH devices (resources won't be started,
     right?)
   - failing to STONITH with some error (on every node)
   - timing out the STONITH operation
   - manual fencing

4. What's the modern way to do manual fencing? (stonith_admin
   --confirm + what? I ask because meatware.so comes from cluster-glue
   and uses the old API).

-- 
Thanks,
Feri
Re: [ClusterLabs] Pacemaker startup-fencing
On Wed, Mar 16, 2016 at 01:47:52PM +0100, Ferenc Wágner wrote:
> >> And some more about fencing:
> >>
> >> 3. What's the difference in cluster behavior between
> >>    - stonith-enabled=FALSE (9.3.2: how often will the stop operation be
> >>      retried?)
> >>    - having no configured STONITH devices (resources won't be started,
> >>      right?)
> >>    - failing to STONITH with some error (on every node)
> >>    - timing out the STONITH operation
> >>    - manual fencing
> >
> > I do not think there is much difference. Without fencing pacemaker
> > cannot make decision to relocate resources so cluster will be stuck.
>
> Then I wonder why I hear the "must have working fencing if you value
> your data" mantra so often (and always without explanation). After all,
> it does not risk the data, only the automatic cluster recovery, right?

stonith-enabled=false
means:
  if some node becomes unresponsive,
  it is immediately *assumed* it was "clean" dead.
  no fencing takes place,
  resource takeover happens without further protection.

That very much risks at least data divergence
(replicas evolving independently),
if not data corruption
(shared disks and the like).

-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker
: R&D, Integration, Ops, Consulting, Support
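
Editorial note: the distinction Lars draws can be written down as a small decision table. The following Python sketch is only a toy model of the behavior described in this thread, not Pacemaker's scheduler logic; the enum and function names are made up for illustration. It shows why stonith-enabled=false puts data at risk, while enabled-but-failing fencing only stalls recovery.

```python
from enum import Enum


class Action(Enum):
    TAKE_OVER_UNPROTECTED = "recover resources elsewhere, no fencing"
    TAKE_OVER_AFTER_FENCING = "recover resources elsewhere after confirmed fencing"
    WAIT = "leave resources alone until fencing is confirmed"


def react_to_unresponsive_node(stonith_enabled: bool,
                               fencing_confirmed: bool) -> Action:
    """Toy decision table for a node that stopped responding."""
    if not stonith_enabled:
        # The node is *assumed* to be cleanly dead; nothing verifies that.
        return Action.TAKE_OVER_UNPROTECTED
    if fencing_confirmed:
        # Fencing succeeded (automatically or by manual confirmation).
        return Action.TAKE_OVER_AFTER_FENCING
    # Fencing failed, timed out, or has no device: the cluster is stuck.
    return Action.WAIT


def data_at_risk(action: Action, node_really_dead: bool) -> bool:
    """Data is at risk only if resources may run on two nodes at once."""
    return action is Action.TAKE_OVER_UNPROTECTED and not node_really_dead


if __name__ == "__main__":
    for enabled in (False, True):
        for confirmed in (False, True):
            action = react_to_unresponsive_node(enabled, confirmed)
            print(enabled, confirmed, action.name,
                  data_at_risk(action, node_really_dead=False))
```

In this model the only combination that can harm data is an unprotected takeover while the "dead" node is in fact still running, which is the data divergence or corruption case described above.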
Re: [ClusterLabs] Pacemaker startup-fencing
Andrei Borzenkov writes:

> On Wed, Mar 16, 2016 at 4:18 PM, Lars Ellenberg wrote:
>
>> On Wed, Mar 16, 2016 at 01:47:52PM +0100, Ferenc Wágner wrote:
>>
>>>>> And some more about fencing:
>>>>>
>>>>> 3. What's the difference in cluster behavior between
>>>>>    - stonith-enabled=FALSE (9.3.2: how often will the stop operation be
>>>>>      retried?)
>>>>>    - having no configured STONITH devices (resources won't be started,
>>>>>      right?)
>>>>>    - failing to STONITH with some error (on every node)
>>>>>    - timing out the STONITH operation
>>>>>    - manual fencing
>>>>
>>>> I do not think there is much difference. Without fencing pacemaker
>>>> cannot make decision to relocate resources so cluster will be stuck.
>>>
>>> Then I wonder why I hear the "must have working fencing if you value
>>> your data" mantra so often (and always without explanation). After all,
>>> it does not risk the data, only the automatic cluster recovery, right?
>>
>> stonith-enabled=false
>> means:
>> if some node becomes unresponsive,
>> it is immediately *assumed* it was "clean" dead.
>> no fencing takes place,
>> resource takeover happens without further protection.
>
> Oh! Actually it is not quite clear from documentation; documentation
> does not explain what happens in case of stonith-enabled=false at all.

Yes, this is a crucially important piece of information, which should be
prominently announced in the documentation. Thanks for spelling it out,
Lars. Hope you don't mind that I turned your text into
https://github.com/ClusterLabs/pacemaker/pull/960.

-- 
Feri