Re: [ClusterLabs] questions about startup fencing
Ken Gaillot wrote: On Thu, 2017-11-30 at 11:58 +, Adam Spiers wrote: Ken Gaillot wrote: On Wed, 2017-11-29 at 14:22 +, Adam Spiers wrote: [snipped] Let's suppose further that the cluster configuration is such that no stateful resources which could potentially conflict with other nodes will ever get launched on that 5th node. For example it might only host stateless clones, or resources with requires=nothing set, or it might not even host any resources at all due to some temporary constraints which have been applied. In those cases, what is to be gained from fencing? The only thing I can think of is that using (say) IPMI to power-cycle the node *might* fix whatever issue was preventing it from joining the cluster. Are there any other reasons for fencing in this case? It wouldn't help avoid any data corruption, at least. Just because constraints are telling the node it can't run a resource doesn't mean the node isn't malfunctioning and running it anyway. If the node can't tell us it's OK, we have to assume it's not. Sure, but even if it *is* running it, if it's not conflicting with anything or doing any harm, is it really always better to fence regardless? There's a resource meta-attribute "requires" that says what a resource needs to start. If it can't do any harm if it runs awry, you can set requires="quorum" (or even "nothing"). So, that's sort of a way to let the cluster know that, but it doesn't currently do what you're suggesting, since start-up fencing is purely about the node and not about the resources. I suppose if the cluster had no resources requiring fencing (or, to push it further, no such resources that will be probed on that node), we could disable start-up fencing, but that's not done currently. Yeah, that's the kind of thing I was envisaging.
Disclaimer: to a certain extent I'm playing devil's advocate here to stimulate a closer (re-)examination of the axiom we've grown so used to over the years that if we don't know what a node is doing, we should fence it. I'm not necessarily arguing that fencing is wrong here, but I think it's healthy to occasionally go back to first principles and re-question why we are doing things a certain way, to make sure that the original assumptions still hold true. I'm familiar with the pain that our customers experience when nodes are fenced for less than very compelling reasons, so I think it's worth looking for opportunities to reduce fencing to when it's really needed. The fundamental purpose of a high-availability cluster is to keep the desired service functioning, above all other priorities (including, unfortunately, making sysadmins' lives easier). If a service requires an HA cluster, it's a safe bet it will have problems in a split-brain situation (otherwise, why bother with the overhead). Even something as simple as an IP address will render a service useless if it's brought up on two machines on a network. Fencing is really the only hammer we have in that situation. At that point, we have zero information about what the node is doing. If it's powered off (or cut off from disk/network), we know it's not doing anything. Fencing may not always help the situation, but it's all we've got. Sure, but I'm not (necessarily) even talking about a split-brain situation. For example what if a cluster with remote nodes is shut down cleanly, and then all the core nodes boot up cleanly but none of the remote nodes are powered on till hours or even days later? If I understand Yan correctly, in this situation all the remotes will be marked as needing fencing, and this is the bit that doesn't make sense to me. 
If Pacemaker can't reach *any* remotes, it can't start any resources on those remotes, so (in the case where resources are partitioned cleanly into those which run on remotes vs. those which don't) there is no danger of any concurrency violation. So fencing remotes before you can use them is totally pointless. Surely fencing of node A should only happen when Pacemaker is ready to start resource X on node B which might already be running on node A. But if no such node B exists then fencing is overkill. It would be better to wait until the first remote joins the cluster, at which point Pacemaker can assess its current state and decide the best course of action. Otherwise it's like cutting off your nose to spite your face. In fact, in the particular scenario which caused me to trigger this whole discussion, I suspect the above also applies even if some remotes joined the newly booted cluster quickly whilst others still take hours or days to boot - because in that scenario it is additionally safe to assume that none of the resources managed on those remotes by pacemaker_remoted would ever be started by anything other than pacemaker_remoted, since a) the whole system is configured automatically in a way which
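The "requires" meta-attribute discussed above is set per resource; a minimal sketch in crm shell syntax (the resource names are hypothetical, and ocf:heartbeat:Dummy merely stands in for a harmless stateless agent):

```shell
# A resource that can do no damage if it runs awry only needs quorum
# before it may start, rather than a completed fencing operation:
crm configure primitive web-clone-rsc ocf:heartbeat:Dummy \
    meta requires=quorum

# A resource that is safe to start unconditionally:
crm configure primitive info-rsc ocf:heartbeat:Dummy \
    meta requires=nothing
```

As Ken notes in the thread, this only changes what the resource needs before starting; it does not currently disable start-up fencing for the node hosting it.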
Re: [ClusterLabs] questions about startup fencing
On Fri, 2017-12-01 at 16:21 -0600, Ken Gaillot wrote: > On Thu, 2017-11-30 at 11:58 +, Adam Spiers wrote: > > Ken Gaillot wrote: > > > On Wed, 2017-11-29 at 14:22 +, Adam Spiers wrote: > > > > Hi all, > > > > > > > > A colleague has been valiantly trying to help me belatedly > > > > learn > > > > about > > > > the intricacies of startup fencing, but I'm still not fully > > > > understanding some of the finer points of the behaviour. > > > > > > > > The documentation on the "startup-fencing" option[0] says > > > > > > > > Advanced Use Only: Should the cluster shoot unseen nodes? > > > > Not > > > > using the default is very unsafe! > > > > > > > > and that it defaults to TRUE, but doesn't elaborate any > > > > further: > > > > > > > > https://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html/Pacemaker_Explained/s-cluster-options.html > > > > > > > > Let's imagine the following scenario: > > > > > > > > - We have a 5-node cluster, with all nodes running cleanly. > > > > > > > > - The whole cluster is shut down cleanly. > > > > > > > > - The whole cluster is then started up again. (Side question: > > > > what > > > > happens if the last node to shut down is not the first to > > > > start > > > > up? > > > > How will the cluster ensure it has the most recent version of > > > > the > > > > CIB? Without that, how would it know whether the last man > > > > standing > > > > was shut down cleanly or not?) > > > > > > Of course, the cluster can't know what CIB version nodes it > > > doesn't > > > see > > > have, so if a set of nodes is started with an older version, it > > > will go > > > with that. > > > > Right, that's what I expected. > > > > > However, a node can't do much without quorum, so it would be > > > difficult > > > to get in a situation where CIB changes were made with quorum > > > before > > > shutdown, but none of those nodes are present at the next start-up > > > with > > > quorum.
> > > > > > In any case, when a new node joins a cluster, the nodes do > > > compare > > > CIB > > > versions. If the new node has a newer CIB, the cluster will use > > > it. > > > If > > > other changes have been made since then, the newest CIB wins, so > > > one or > > > the other's changes will be lost. > > > > Ahh, that's interesting. Based on reading > > > > https://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html/Pacemaker_Explained/ch03.html#_cib_properties > > > > whichever node has the highest (admin_epoch, epoch, num_updates) > > tuple > > will win, so normally in this scenario it would be the epoch which > > decides it, i.e. whichever node had the most changes since the last > > time the conflicting nodes shared the same config - right? > > Correct ... assuming the code for that is working properly, which I > haven't confirmed :) > > > > > And if that would choose the wrong node, admin_epoch can be set > > manually to override that decision? > > Correct again, with same caveat > > > > > > Whether missing nodes were shut down cleanly or not relates to > > > your > > > next question ... > > > > > > > - 4 of the nodes boot up fine and rejoin the cluster within the > > > > dc-deadtime interval, forming a quorum, but the 5th doesn't. > > > > > > > > IIUC, with startup-fencing enabled, this will result in that > > > > 5th > > > > node > > > > automatically being fenced. If I'm right, is that really > > > > *always* > > > > necessary? > > > > > > It's always safe. :-) As you mentioned, if the missing node was > > > the > > > last one alive in the previous run, the cluster can't know > > > whether > > > it > > > shut down cleanly or not. Even if the node was known to shut down > > > cleanly in the last run, the cluster still can't know whether the > > > node > > > was started since then and is now merely unreachable. So, fencing > > > is > > > necessary to ensure it's not accessing resources.
> > > > I get that, but I was questioning the "necessary to ensure it's not > > accessing resources" part of this statement. My point is that > > sometimes this might be overkill, because sometimes we might be > > able > > to > > discern through other methods that there are no resources we need > > to > > worry about potentially conflicting with what we want to > > run. That's > > why I gave the stateless clones example. > > > > > The same scenario is why a single node can't have quorum at > > > start- > > > up in > > > a cluster with "two_node" set. Both nodes have to see each other > > > at > > > least once before they can assume it's safe to do anything. > > > > Yep. > > > > > > Let's suppose further that the cluster configuration is such > > > > that > > > > no > > > > stateful resources which could potentially conflict with other > > > > nodes > > > > will ever get launched on that 5th node. For example it might > > > > only > > > > host stateless clones, or resources with
Re: [ClusterLabs] questions about startup fencing
On Thu, 2017-11-30 at 11:58 +, Adam Spiers wrote: > Ken Gaillot wrote: > > On Wed, 2017-11-29 at 14:22 +, Adam Spiers wrote: > > > Hi all, > > > > > > A colleague has been valiantly trying to help me belatedly learn > > > about > > > the intricacies of startup fencing, but I'm still not fully > > > understanding some of the finer points of the behaviour. > > > > > > The documentation on the "startup-fencing" option[0] says > > > > > > Advanced Use Only: Should the cluster shoot unseen nodes? Not > > > using the default is very unsafe! > > > > > > and that it defaults to TRUE, but doesn't elaborate any further: > > > > > > https://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html/Pacemaker_Explained/s-cluster-options.html > > > > > > Let's imagine the following scenario: > > > > > > - We have a 5-node cluster, with all nodes running cleanly. > > > > > > - The whole cluster is shut down cleanly. > > > > > > - The whole cluster is then started up again. (Side question: > > > what > > > happens if the last node to shut down is not the first to start > > > up? > > > How will the cluster ensure it has the most recent version of > > > the > > > CIB? Without that, how would it know whether the last man > > > standing > > > was shut down cleanly or not?) > > > > Of course, the cluster can't know what CIB version nodes it doesn't > > see > > have, so if a set of nodes is started with an older version, it > > will go > > with that. > > Right, that's what I expected. > > > However, a node can't do much without quorum, so it would be > > difficult > > to get in a situation where CIB changes were made with quorum > > before > > shutdown, but none of those nodes are present at the next start-up > > with > > quorum. > > > > In any case, when a new node joins a cluster, the nodes do compare > > CIB > > versions. If the new node has a newer CIB, the cluster will use it.
> > If > > other changes have been made since then, the newest CIB wins, so > > one or > > the other's changes will be lost. > > Ahh, that's interesting. Based on reading > > https://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html/Pacemaker_Explained/ch03.html#_cib_properties > > whichever node has the highest (admin_epoch, epoch, num_updates) > tuple > will win, so normally in this scenario it would be the epoch which > decides it, i.e. whichever node had the most changes since the last > time the conflicting nodes shared the same config - right? Correct ... assuming the code for that is working properly, which I haven't confirmed :) > > And if that would choose the wrong node, admin_epoch can be set > manually to override that decision? Correct again, with same caveat > > > Whether missing nodes were shut down cleanly or not relates to your > > next question ... > > > > > - 4 of the nodes boot up fine and rejoin the cluster within the > > > dc-deadtime interval, forming a quorum, but the 5th doesn't. > > > > > > IIUC, with startup-fencing enabled, this will result in that 5th > > > node > > > automatically being fenced. If I'm right, is that really > > > *always* > > > necessary? > > > > It's always safe. :-) As you mentioned, if the missing node was the > > last one alive in the previous run, the cluster can't know whether > > it > > shut down cleanly or not. Even if the node was known to shut down > > cleanly in the last run, the cluster still can't know whether the > > node > > was started since then and is now merely unreachable. So, fencing > > is > > necessary to ensure it's not accessing resources. > > I get that, but I was questioning the "necessary to ensure it's not > accessing resources" part of this statement. My point is that > sometimes this might be overkill, because sometimes we might be able > to > discern through other methods that there are no resources we need to > worry about potentially conflicting with what we want to run.
That's > why I gave the stateless clones example. > > > The same scenario is why a single node can't have quorum at start- > > up in > > a cluster with "two_node" set. Both nodes have to see each other at > > least once before they can assume it's safe to do anything. > > Yep. > > > > Let's suppose further that the cluster configuration is such that > > > no > > > stateful resources which could potentially conflict with other > > > nodes > > > will ever get launched on that 5th node. For example it might > > > only > > > host stateless clones, or resources with require=nothing set, or > > > it > > > might not even host any resources at all due to some temporary > > > constraints which have been applied. > > > > > > In those cases, what is to be gained from fencing? The only > > > thing I > > > can think of is that using (say) IPMI to power-cycle the node > > > *might* > > > fix whatever issue was preventing it from joining the > > > cluster. Are > > > there any other reasons for fencing in this
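The precedence discussed above can be made concrete with a small sketch (illustrative only; cib_newer is a made-up helper, not a Pacemaker tool): the three CIB version fields are compared most-significant first, exactly like a lexicographic sort, so a higher admin_epoch beats any number of ordinary configuration changes.

```shell
# Compare two CIB version tuples (admin_epoch, epoch, num_updates).
# Prints "A" if the first tuple wins, "B" if the second wins, "tie" otherwise.
cib_newer() {
    local a=("$1" "$2" "$3") b=("$4" "$5" "$6")
    local i
    for i in 0 1 2; do
        if (( a[i] > b[i] )); then echo "A"; return; fi
        if (( a[i] < b[i] )); then echo "B"; return; fi
    done
    echo "tie"
}

cib_newer 0 42 3  0 41 99   # epoch decides: prints "A"
cib_newer 1 0 0   0 99 99   # admin_epoch overrides everything: prints "A"
```

This is why manually bumping admin_epoch, as mentioned in the thread, is enough to force a particular node's configuration to win.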
Re: [ClusterLabs] questions about startup fencing
Ken Gaillot wrote: On Wed, 2017-11-29 at 14:22 +, Adam Spiers wrote: Hi all, A colleague has been valiantly trying to help me belatedly learn about the intricacies of startup fencing, but I'm still not fully understanding some of the finer points of the behaviour. The documentation on the "startup-fencing" option[0] says Advanced Use Only: Should the cluster shoot unseen nodes? Not using the default is very unsafe! and that it defaults to TRUE, but doesn't elaborate any further: https://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html/Pacemaker_Explained/s-cluster-options.html Let's imagine the following scenario: - We have a 5-node cluster, with all nodes running cleanly. - The whole cluster is shut down cleanly. - The whole cluster is then started up again. (Side question: what happens if the last node to shut down is not the first to start up? How will the cluster ensure it has the most recent version of the CIB? Without that, how would it know whether the last man standing was shut down cleanly or not?) Of course, the cluster can't know what CIB version nodes it doesn't see have, so if a set of nodes is started with an older version, it will go with that. Right, that's what I expected. However, a node can't do much without quorum, so it would be difficult to get in a situation where CIB changes were made with quorum before shutdown, but none of those nodes are present at the next start-up with quorum. In any case, when a new node joins a cluster, the nodes do compare CIB versions. If the new node has a newer CIB, the cluster will use it. If other changes have been made since then, the newest CIB wins, so one or the other's changes will be lost. Ahh, that's interesting. Based on reading https://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html/Pacemaker_Explained/ch03.html#_cib_properties whichever node has the highest (admin_epoch, epoch, num_updates) tuple will win, so normally in this scenario it would be the epoch which decides it, i.e.
whichever node had the most changes since the last time the conflicting nodes shared the same config - right? And if that would choose the wrong node, admin_epoch can be set manually to override that decision? Whether missing nodes were shut down cleanly or not relates to your next question ... - 4 of the nodes boot up fine and rejoin the cluster within the dc-deadtime interval, forming a quorum, but the 5th doesn't. IIUC, with startup-fencing enabled, this will result in that 5th node automatically being fenced. If I'm right, is that really *always* necessary? It's always safe. :-) As you mentioned, if the missing node was the last one alive in the previous run, the cluster can't know whether it shut down cleanly or not. Even if the node was known to shut down cleanly in the last run, the cluster still can't know whether the node was started since then and is now merely unreachable. So, fencing is necessary to ensure it's not accessing resources. I get that, but I was questioning the "necessary to ensure it's not accessing resources" part of this statement. My point is that sometimes this might be overkill, because sometimes we might be able to discern through other methods that there are no resources we need to worry about potentially conflicting with what we want to run. That's why I gave the stateless clones example. The same scenario is why a single node can't have quorum at start-up in a cluster with "two_node" set. Both nodes have to see each other at least once before they can assume it's safe to do anything. Yep. Let's suppose further that the cluster configuration is such that no stateful resources which could potentially conflict with other nodes will ever get launched on that 5th node. For example it might only host stateless clones, or resources with requires=nothing set, or it might not even host any resources at all due to some temporary constraints which have been applied. In those cases, what is to be gained from fencing?
The only thing I can think of is that using (say) IPMI to power-cycle the node *might* fix whatever issue was preventing it from joining the cluster. Are there any other reasons for fencing in this case? It wouldn't help avoid any data corruption, at least. Just because constraints are telling the node it can't run a resource doesn't mean the node isn't malfunctioning and running it anyway. If the node can't tell us it's OK, we have to assume it's not. Sure, but even if it *is* running it, if it's not conflicting with anything or doing any harm, is it really always better to fence regardless? Disclaimer: to a certain extent I'm playing devil's advocate here to stimulate a closer (re-)examination of the axiom we've grown so used to over the years that if we don't know what a node is doing, we should fence it. I'm not necessarily arguing that fencing is wrong here, but I think it's healthy to occasionally go back to first
Re: [ClusterLabs] questions about startup fencing
On Thu, Nov 30, 2017 at 1:39 PM, Gao,Yan wrote: > On 11/30/2017 09:14 AM, Andrei Borzenkov wrote: >> >> On Wed, Nov 29, 2017 at 6:54 PM, Ken Gaillot wrote: >>> >>> >>> The same scenario is why a single node can't have quorum at start-up in >>> a cluster with "two_node" set. Both nodes have to see each other at >>> least once before they can assume it's safe to do anything. >>> >> >> Unless we set no-quorum-policy=ignore in which case it will proceed >> after fencing another node. As far as I understand, this is the only >> way to get the number of active cluster nodes below quorum, right? > > To be safe, "two_node: 1" automatically enables "wait_for_all". Of course > one can explicitly disable "wait_for_all" if they know what they are doing. > Well ... ha1:~ # crm corosync status Printing ring status. Local node ID 1084766299 RING ID 0 id = 192.168.56.91 status = ring 0 active with no faults Quorum information -- Date: Thu Nov 30 19:09:57 2017 Quorum provider: corosync_votequorum Nodes: 1 Node ID: 1084766299 Ring ID: 412 Quorate: No Votequorum information -- Expected votes: 2 Highest expected: 2 Total votes: 1 Quorum: 1 Activity blocked Flags: 2Node WaitForAll Membership information -- Nodeid Votes Name 1084766299 1 ha1 (local) ha1:~ # ha1:~ # crm_mon -1r Stack: corosync Current DC: ha1 (version 1.1.16-4.8-77ea74d) - partition WITHOUT quorum Last updated: Thu Nov 30 19:08:03 2017 Last change: Thu Nov 30 11:05:03 2017 by root via cibadmin on ha1 2 nodes configured 3 resources configured Online: [ ha1 ] OFFLINE: [ ha2 ] Full list of resources: stonith-sbd (stonith:external/sbd): Started ha1 Master/Slave Set: ms_Stateful_1 [rsc_Stateful_1] Masters: [ ha1 ] Stopped: [ ha2 ] ha1:~ # ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] questions about startup fencing
On 11/30/2017 09:14 AM, Andrei Borzenkov wrote: On Wed, Nov 29, 2017 at 6:54 PM, Ken Gaillot wrote: The same scenario is why a single node can't have quorum at start-up in a cluster with "two_node" set. Both nodes have to see each other at least once before they can assume it's safe to do anything. Unless we set no-quorum-policy=ignore in which case it will proceed after fencing another node. As far as I understand, this is the only way to get the number of active cluster nodes below quorum, right? To be safe, "two_node: 1" automatically enables "wait_for_all". Of course one can explicitly disable "wait_for_all" if they know what they are doing. Regards, Yan
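The `two_node`/`wait_for_all` interaction Yan describes lives in corosync.conf; an illustrative excerpt (shown for reference, not as a recommendation):

```shell
# corosync.conf (excerpt)
quorum {
    provider: corosync_votequorum
    two_node: 1
    # two_node implies wait_for_all, so a freshly booted node will not
    # claim quorum until it has seen its peer at least once.
    # wait_for_all: 0   # explicit opt-out -- only if you know what you
                        # are doing, as Yan cautions
}
```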
Re: [ClusterLabs] questions about startup fencing
On Wed, Nov 29, 2017 at 6:54 PM, Ken Gaillot wrote: > > The same scenario is why a single node can't have quorum at start-up in > a cluster with "two_node" set. Both nodes have to see each other at > least once before they can assume it's safe to do anything. > Unless we set no-quorum-policy=ignore in which case it will proceed after fencing another node. As far as I understand, this is the only way to get the number of active cluster nodes below quorum, right?
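For reference, the setting Andrei refers to is an ordinary cluster property, shown here in crm shell syntax:

```shell
# Let resource management proceed even without quorum -- generally
# unsafe outside of special setups that rely entirely on fencing:
crm configure property no-quorum-policy=ignore
```

As the exchange above notes, combining this with a disabled wait_for_all is what makes the below-quorum situation reachable in a two-node cluster.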
Re: [ClusterLabs] questions about startup fencing
On 11/29/2017 09:09 PM, Kristoffer Grönlund wrote: > Adam Spiers writes: > >> OK, so reading between the lines, if we don't want our cluster's >> latest config changes accidentally discarded during a complete cluster >> reboot, we should ensure that the last man standing is also the first >> one booted up - right? > That would make sense to me, but I don't know if it's the only > solution. If you separately ensure that they all have the same > configuration first, you could start them in any order I guess. I guess it is not that bad as after the last man standing has left the stage it would take a quorate number (actually depending on how many you allow to survive) of nodes till anything happens again (equivalent to wait-for-all in 2-node clusters). And one of these should have a reasonably current cib. > >> If so, I think that's a perfectly reasonable thing to ask for, but >> maybe it should be documented explicitly somewhere? Apologies if it >> is already and I missed it. > Yeah, maybe a section discussing both starting and stopping a whole > cluster would be helpful, but I don't know if I feel like I've thought > about it enough myself. Regarding the HP Service Guard commands that > Ulrich Windl mentioned, the very idea of such commands offends me on > some level but I don't know if I can clearly articulate why. :D >
Re: [ClusterLabs] questions about startup fencing
Adam Spiers writes: > > OK, so reading between the lines, if we don't want our cluster's > latest config changes accidentally discarded during a complete cluster > reboot, we should ensure that the last man standing is also the first > one booted up - right? That would make sense to me, but I don't know if it's the only solution. If you separately ensure that they all have the same configuration first, you could start them in any order I guess. > > If so, I think that's a perfectly reasonable thing to ask for, but > maybe it should be documented explicitly somewhere? Apologies if it > is already and I missed it. Yeah, maybe a section discussing both starting and stopping a whole cluster would be helpful, but I don't know if I feel like I've thought about it enough myself. Regarding the HP Service Guard commands that Ulrich Windl mentioned, the very idea of such commands offends me on some level but I don't know if I can clearly articulate why. :D -- // Kristoffer Grönlund // kgronl...@suse.com
Re: [ClusterLabs] questions about startup fencing
Kristoffer Gronlund wrote: Adam Spiers writes: Kristoffer Gronlund wrote: Adam Spiers writes: - The whole cluster is shut down cleanly. - The whole cluster is then started up again. (Side question: what happens if the last node to shut down is not the first to start up? How will the cluster ensure it has the most recent version of the CIB? Without that, how would it know whether the last man standing was shut down cleanly or not?) This is my opinion, I don't really know what the "official" pacemaker stance is: There is no such thing as shutting down a cluster cleanly. A cluster is a process stretching over multiple nodes - if they all shut down, the process is gone. When you start up again, you effectively have a completely new cluster. Sorry, I don't follow you at all here. When you start the cluster up again, the cluster config from before the shutdown is still there. That's very far from being a completely new cluster :-) You have a new cluster with (possibly fragmented) memories of a previous life ;) Well yeah, that's another way of describing it :-) Yes, exactly. If the first node to start up was not the last man standing, the CIB history is effectively being forked. So how is this issue avoided? The only way to bring up a cluster from being completely stopped is to treat it as creating a completely new cluster. The first node to start "creates" the cluster and later nodes join that cluster. That's ignoring the cluster config, which persists even when the cluster's down. There could be a command in pacemaker which resets a set of nodes to a common known state, basically to pick the CIB from one of the nodes as the survivor and copy that to all of them. But in the end, that's just the same thing as just picking one node as the first node, and telling the others to join that one and to discard their configurations. So, treating it as a new cluster.
OK, so reading between the lines, if we don't want our cluster's latest config changes accidentally discarded during a complete cluster reboot, we should ensure that the last man standing is also the first one booted up - right? If so, I think that's a perfectly reasonable thing to ask for, but maybe it should be documented explicitly somewhere? Apologies if it is already and I missed it.
Re: [ClusterLabs] questions about startup fencing
Klaus Wenninger wrote: On 11/29/2017 04:23 PM, Kristoffer Grönlund wrote: Adam Spiers writes: - The whole cluster is shut down cleanly. - The whole cluster is then started up again. (Side question: what happens if the last node to shut down is not the first to start up? How will the cluster ensure it has the most recent version of the CIB? Without that, how would it know whether the last man standing was shut down cleanly or not?) This is my opinion, I don't really know what the "official" pacemaker stance is: There is no such thing as shutting down a cluster cleanly. A cluster is a process stretching over multiple nodes - if they all shut down, the process is gone. When you start up again, you effectively have a completely new cluster. When starting up, how is the cluster, at any point, to know if the cluster it has knowledge of is the "latest" cluster? The next node could have a newer version of the CIB which adds yet more nodes to the cluster. To make it even clearer imagine a node being reverted to a previous state by recovering it from a backup. Yes, I'm asking how this kind of scenario is dealt with :-) Another example is a config change being made after one or more of the cluster nodes had already been shut down.
Re: [ClusterLabs] questions about startup fencing
Kristoffer Gronlund wrote: Adam Spiers writes: - The whole cluster is shut down cleanly. - The whole cluster is then started up again. (Side question: what happens if the last node to shut down is not the first to start up? How will the cluster ensure it has the most recent version of the CIB? Without that, how would it know whether the last man standing was shut down cleanly or not?) This is my opinion, I don't really know what the "official" pacemaker stance is: There is no such thing as shutting down a cluster cleanly. A cluster is a process stretching over multiple nodes - if they all shut down, the process is gone. When you start up again, you effectively have a completely new cluster. Sorry, I don't follow you at all here. When you start the cluster up again, the cluster config from before the shutdown is still there. That's very far from being a completely new cluster :-) When starting up, how is the cluster, at any point, to know if the cluster it has knowledge of is the "latest" cluster? That was exactly my question. The next node could have a newer version of the CIB which adds yet more nodes to the cluster. Yes, exactly. If the first node to start up was not the last man standing, the CIB history is effectively being forked. So how is this issue avoided? The only way to bring up a cluster from being completely stopped is to treat it as creating a completely new cluster. The first node to start "creates" the cluster and later nodes join that cluster. That's ignoring the cluster config, which persists even when the cluster's down. But to be clear, you picked a small side question from my original post and answered that. The main questions I had were about startup fencing :-)
Re: [ClusterLabs] questions about startup fencing
On 11/29/2017 04:54 PM, Ken Gaillot wrote:
> On Wed, 2017-11-29 at 14:22 +, Adam Spiers wrote:
>> The same questions apply if this troublesome node was actually a
>> remote node running pacemaker_remoted, rather than the 5th node in
>> the cluster.
>
> Remote nodes don't join at the crmd level as cluster nodes do, so they
> don't "start up" in the same sense, and start-up fencing doesn't apply
> to them. Instead, the cluster initiates the connection when called for
> (I don't remember for sure whether it fences the remote node if the
> connection fails, but that would make sense).

According to link_rsc2remotenode() and handle_startup_fencing(), similar
"startup-fencing" applies to remote nodes too. So if a remote resource
fails to start, the remote node will be fenced. A global setting of
startup-fencing=false will change the behavior for remote nodes too.

Regards,
Yan
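[For reference, the global setting Yan mentions is an ordinary cluster property. A sketch of disabling it via either crmsh or pcs (with the documentation's own caveat that a non-default value is unsafe):]

```sh
# crmsh:
crm configure property startup-fencing=false

# pcs equivalent:
pcs property set startup-fencing=false
```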
Re: [ClusterLabs] questions about startup fencing
On Wed, 2017-11-29 at 14:22 +, Adam Spiers wrote:
> Hi all,
>
> A colleague has been valiantly trying to help me belatedly learn about
> the intricacies of startup fencing, but I'm still not fully
> understanding some of the finer points of the behaviour.
>
> The documentation on the "startup-fencing" option[0] says
>
>     Advanced Use Only: Should the cluster shoot unseen nodes? Not
>     using the default is very unsafe!
>
> and that it defaults to TRUE, but doesn't elaborate any further:
>
> https://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html/Pacemaker_Explained/s-cluster-options.html
>
> Let's imagine the following scenario:
>
> - We have a 5-node cluster, with all nodes running cleanly.
>
> - The whole cluster is shut down cleanly.
>
> - The whole cluster is then started up again. (Side question: what
>   happens if the last node to shut down is not the first to start up?
>   How will the cluster ensure it has the most recent version of the
>   CIB? Without that, how would it know whether the last man standing
>   was shut down cleanly or not?)

Of course, the cluster can't know what CIB version nodes it doesn't see
have, so if a set of nodes is started with an older version, it will go
with that. However, a node can't do much without quorum, so it would be
difficult to get into a situation where CIB changes were made with
quorum before shutdown, but none of those nodes are present at the next
start-up with quorum.

In any case, when a new node joins a cluster, the nodes do compare CIB
versions. If the new node has a newer CIB, the cluster will use it. If
other changes have been made since then, the newest CIB wins, so one or
the other's changes will be lost.

Whether missing nodes were shut down cleanly or not relates to your next
question ...

> - 4 of the nodes boot up fine and rejoin the cluster within the
>   dc-deadtime interval, forming a quorum, but the 5th doesn't.
> IIUC, with startup-fencing enabled, this will result in that 5th node
> automatically being fenced. If I'm right, is that really *always*
> necessary?

It's always safe. :-)

As you mentioned, if the missing node was the last one alive in the
previous run, the cluster can't know whether it shut down cleanly or
not. Even if the node was known to shut down cleanly in the last run,
the cluster still can't know whether the node was started since then and
is now merely unreachable. So, fencing is necessary to ensure it's not
accessing resources.

The same scenario is why a single node can't have quorum at start-up in
a cluster with "two_node" set. Both nodes have to see each other at
least once before they can assume it's safe to do anything.

> Let's suppose further that the cluster configuration is such that no
> stateful resources which could potentially conflict with other nodes
> will ever get launched on that 5th node. For example it might only
> host stateless clones, or resources with requires=nothing set, or it
> might not even host any resources at all due to some temporary
> constraints which have been applied.
>
> In those cases, what is to be gained from fencing? The only thing I
> can think of is that using (say) IPMI to power-cycle the node *might*
> fix whatever issue was preventing it from joining the cluster. Are
> there any other reasons for fencing in this case? It wouldn't help
> avoid any data corruption, at least.

Just because constraints are telling the node it can't run a resource
doesn't mean the node isn't malfunctioning and running it anyway. If the
node can't tell us it's OK, we have to assume it's not.

> Now let's imagine the same scenario, except rather than a clean full
> cluster shutdown, all nodes were affected by a power cut, but also
> this time the whole cluster is configured to *only* run stateless
> clones, so there is no risk of conflict between two nodes accidentally
> running the same resource.
> On startup, the 4 nodes in the quorum have no way of knowing that the
> 5th node was also affected by the power cut, so in theory from their
> perspective it could still be running a stateless clone. Again, is
> there anything to be gained from fencing the 5th node once it exceeds
> the dc-deadtime threshold for joining, other than the chance that a
> reboot might fix whatever was preventing it from joining, and get the
> cluster back to full strength?

If a cluster runs only services that have no potential to conflict, then
you don't need a cluster. :-) Unique clones require communication even
if they're stateless (think IPaddr2). I'm pretty sure even some
anonymous stateless clones require communication to avoid issues.

> Also, when exactly does the dc-deadtime timer start ticking? Is it
> reset to zero after a node is fenced, so that potentially that node
> could go into a reboot loop if dc-deadtime is set too low?

A node's crmd starts the timer at start-up and whenever a new election
starts; the timer is stopped when the DC makes it a join offer. I don't
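[For reference, the two settings Ken mentions above live in different places. two_node is a corosync votequorum option; a sketch of the relevant corosync.conf section:]

```
quorum {
    provider: corosync_votequorum
    # two_node special-cases a 2-node cluster; it implies wait_for_all,
    # i.e. both nodes must see each other once at start-up before the
    # cluster will do anything.
    two_node: 1
}
```

[dc-deadtime is an ordinary Pacemaker cluster property, so the join timeout can be raised for slow-booting nodes; a sketch, with an illustrative value:]

```sh
# Give slow nodes more time to join before being treated as unseen
# (value is an example; pick one suited to the hardware's boot time):
crm configure property dc-deadtime=120s
# or: pcs property set dc-deadtime=120s
```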
Re: [ClusterLabs] questions about startup fencing
On 11/29/2017 04:23 PM, Kristoffer Grönlund wrote:
> Adam Spiers writes:
>
>> - The whole cluster is shut down cleanly.
>>
>> - The whole cluster is then started up again. (Side question: what
>>   happens if the last node to shut down is not the first to start up?
>>   How will the cluster ensure it has the most recent version of the
>>   CIB? Without that, how would it know whether the last man standing
>>   was shut down cleanly or not?)
>
> This is my opinion, I don't really know what the "official" pacemaker
> stance is: There is no such thing as shutting down a cluster cleanly. A
> cluster is a process stretching over multiple nodes - if they all shut
> down, the process is gone. When you start up again, you effectively have
> a completely new cluster.
>
> When starting up, how is the cluster, at any point, to know if the
> cluster it has knowledge of is the "latest" cluster? The next node could
> have a newer version of the CIB which adds yet more nodes to the
> cluster.

To make it even clearer, imagine a node being reverted to a previous
state by recovering it from a backup.

Regards,
Klaus

> The only way to bring up a cluster from being completely stopped is to
> treat it as creating a completely new cluster. The first node to start
> "creates" the cluster and later nodes join that cluster.
>
> Cheers,
> Kristoffer
Re: [ClusterLabs] questions about startup fencing
Adam Spiers writes:
> - The whole cluster is shut down cleanly.
>
> - The whole cluster is then started up again. (Side question: what
>   happens if the last node to shut down is not the first to start up?
>   How will the cluster ensure it has the most recent version of the
>   CIB? Without that, how would it know whether the last man standing
>   was shut down cleanly or not?)

This is my opinion, I don't really know what the "official" pacemaker
stance is: There is no such thing as shutting down a cluster cleanly. A
cluster is a process stretching over multiple nodes - if they all shut
down, the process is gone. When you start up again, you effectively have
a completely new cluster.

When starting up, how is the cluster, at any point, to know if the
cluster it has knowledge of is the "latest" cluster? The next node could
have a newer version of the CIB which adds yet more nodes to the
cluster.

The only way to bring up a cluster from being completely stopped is to
treat it as creating a completely new cluster. The first node to start
"creates" the cluster and later nodes join that cluster.

Cheers,
Kristoffer
--
// Kristoffer Grönlund // kgronl...@suse.com
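[As Adam notes elsewhere in the thread, one piece of cluster state does survive a full stop: the CIB persists on disk. A sketch of inspecting it, assuming a standard Pacemaker install (the on-disk path may vary by distribution):]

```sh
# Show the root <cib> element of the live CIB; its admin_epoch, epoch
# and num_updates attributes form the version tuple nodes compare on join.
cibadmin --query | head -n1

# The same CIB persists on disk across a full cluster shutdown:
ls -l /var/lib/pacemaker/cib/cib.xml
```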