Re: [ClusterLabs] Single-node automated startup question
If it's a VM or container, it should be at a third location. Using a VM hosted on one of the nodes is like giving that node more votes in a two-node cluster. A cheap third node for quorum makes more sense to me.

Best Regards,
Strahil Nikolov

On Wed, Apr 14, 2021 at 21:19, Antony Stone wrote:
> On Wednesday 14 April 2021 at 19:33:39, Strahil Nikolov wrote:
> > What about a small form factor device to serve as a quorum maker?
> > Best Regards,
> > Strahil Nikolov
>
> If you're going to take that approach, why not a virtual machine or two, hosted inside the physical machine which is your single real node?
>
> I'm not necessarily advocating this method of achieving quorum, but it's probably an idea worth considering for specific situations.
>
> Antony.

___
Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users
ClusterLabs home: https://www.clusterlabs.org/
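The vote arithmetic behind this point can be sketched in a few lines of Python. This is only an illustration, not corosync code: the voter and machine names, the `hosts` mapping, and both helper functions are hypothetical, and the sketch assumes one vote per voter with quorum requiring a strict majority.

```python
def surviving_votes(hosts, failed_machine):
    """Count the votes left when one physical machine fails.

    `hosts` maps each voter (hypothetical names) to the physical
    machine it runs on; every voter is assumed to carry one vote.
    """
    return sum(1 for machine in hosts.values() if machine != failed_machine)


def is_quorate(votes, total_votes):
    """Strict majority: more than half of all configured votes."""
    return votes > total_votes // 2


# Quorum VM hosted on node A's hardware: losing machine-a loses two votes.
vm_on_node_a = {"node-a": "machine-a", "node-b": "machine-b", "quorum-vm": "machine-a"}

# Quorum voter at a third location: any single failure loses one vote.
third_location = {"node-a": "machine-a", "node-b": "machine-b", "quorum-vm": "machine-c"}

for name, layout in (("VM on node A", vm_on_node_a), ("third location", third_location)):
    for failed in ("machine-a", "machine-b"):
        votes = surviving_votes(layout, failed)
        print(name, "-", failed, "fails:", votes, "of 3 votes, quorate:", is_quorate(votes, 3))
```

With the VM on node A's hardware, node B can never keep quorum on its own after machine-a fails, so the layout behaves like a two-node cluster in which node A holds two votes.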
Re: [ClusterLabs] Single-node automated startup question
What about a small form factor device to serve as a quorum maker?

Best Regards,
Strahil Nikolov
Re: [ClusterLabs] Single-node automated startup question
On Wednesday 14 April 2021 at 19:33:39, Strahil Nikolov wrote:
> What about a small form factor device to serve as a quorum maker?
> Best Regards,
> Strahil Nikolov

If you're going to take that approach, why not a virtual machine or two, hosted inside the physical machine which is your single real node?

I'm not necessarily advocating this method of achieving quorum, but it's probably an idea worth considering for specific situations.

Antony.

--
Someone has stolen all the toilets from New Scotland Yard. Police say they have absolutely nothing to go on.

Please reply to the list; please *don't* CC me.
Re: [ClusterLabs] Single-node automated startup question
On Wed, 2021-04-14 at 18:00 +0300, Andrei Borzenkov wrote:
> On 14.04.2021 17:50, Digimer wrote:
> > Hi all,
> >
> > As we get close to finishing our Anvil! switch to pacemaker, I'm trying to tie up loose ends. One that I want feedback on is the pacemaker version of cman's old 'post_join_delay' feature.
> >
> > Use case example:
> >
> > A common use for the Anvil! is remote deployments where there are no (IT) humans available. Think cargo ships, field data collection, etc. So it's entirely possible that a node could fail and not be repaired for weeks or even months. With this in mind, it's also feasible that a solo node later loses power, and then reboots. In such a case, 'pcs cluster start' would never go quorate as the peer is dead.
> >
> > In cman, during startup, if there was no reply from the peer after post_join_delay seconds, the peer would get fenced and then the cluster would finish coming up. Being two_node, it would also become quorate and start hosting services. Of course, this opens the risk of a fence loop, but we have other protections in place to prevent that, so a fence loop is not a concern.
> >
> > My question then is two-fold:
> >
> > 1. Is there a pacemaker equivalent to 'post_join_delay'? (Fence the peer and, if successful, become quorate)?
>
> Startup fencing is pacemaker default (startup-fencing cluster option).

Start-up fencing will have the desired effect in a cluster with more than two nodes, but in a two-node cluster the corosync wait_for_all option is key.

If wait_for_all is true (which is the default when two_node is set), then a node that comes up alone will wait until it sees the other node at least once before becoming quorate. This prevents an isolated node from coming up and fencing a node that's happily running.

Setting wait_for_all to false will make an isolated node immediately become quorate. It will do what you want, which is fence the other node and take over resources, but the danger is that this node is the one that's having trouble (e.g. it can't see the other node due to a network card issue). The healthy node could fence the unhealthy node, which might then reboot, come up, and shoot the healthy node.

There's no direct equivalent of a delay before becoming quorate, but I don't think that helps -- the boot time acts as a sort of random delay, and a delay doesn't help the issue of an unhealthy node shooting a healthy one.

My recommendation would be to set wait_for_all to true as long as both nodes are known to be healthy. Once an unhealthy node is down and expected to stay down, set wait_for_all to false on the healthy node so it can reboot and bring the cluster up. (The unhealthy node will still have wait_for_all=true, so it won't cause any trouble even if it comes up.)

> > 2. If not, was this a conscious decision not to add it for some reason, or was it simply never added? If it was consciously decided to not have it, what was the reasoning behind it?
> >
> > I can replicate this behaviour in our code, but I don't want to do that if there is a compelling reason that I am not aware of.
> >
> > So,
> >
> > A) is there a pacemaker version of post_join_delay?
> > B) is there a compelling argument NOT to use post_join_delay behaviour in pacemaker I am not seeing?
> >
> > Thanks!

--
Ken Gaillot
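The wait_for_all behaviour described above can be modelled with a small Python sketch. It is a toy model under stated assumptions, not corosync's implementation: the `TwoNodeQuorum` class, its fields, and its methods are hypothetical names, and it only covers one node's view in a cluster that has two_node set.

```python
from dataclasses import dataclass


@dataclass
class TwoNodeQuorum:
    """Toy model of one node's quorum view in a two-node cluster."""

    # True is the default when two_node is set, per the explanation above.
    wait_for_all: bool = True
    # Has this node seen its peer at least once since it started?
    peer_seen: bool = False

    def peer_joined(self):
        """Record that the peer has been seen at least once."""
        self.peer_seen = True

    def is_quorate(self):
        # With wait_for_all, a freshly started node is not quorate until it
        # has seen the peer once. Without it, the node is quorate alone.
        return self.peer_seen or not self.wait_for_all


# A node booting alone with the default setting waits for its peer.
alone_default = TwoNodeQuorum()
print("default, peer never seen:", alone_default.is_quorate())

# The same node after the peer has been seen once.
alone_default.peer_joined()
print("default, peer seen once:", alone_default.is_quorate())

# The surviving healthy node with wait_for_all turned off.
survivor = TwoNodeQuorum(wait_for_all=False)
print("wait_for_all off, peer never seen:", survivor.is_quorate())
```

The last case is the recommendation in model form: only the node known to be healthy has wait_for_all turned off, so it can come up alone, while a peer left at the default still waits until it has seen the other node.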
Re: [ClusterLabs] Single-node automated startup question
On 14.04.2021 17:50, Digimer wrote:
> Hi all,
>
> As we get close to finishing our Anvil! switch to pacemaker, I'm trying to tie up loose ends. One that I want feedback on is the pacemaker version of cman's old 'post_join_delay' feature.
>
> Use case example:
>
> A common use for the Anvil! is remote deployments where there are no (IT) humans available. Think cargo ships, field data collection, etc. So it's entirely possible that a node could fail and not be repaired for weeks or even months. With this in mind, it's also feasible that a solo node later loses power, and then reboots. In such a case, 'pcs cluster start' would never go quorate as the peer is dead.
>
> In cman, during startup, if there was no reply from the peer after post_join_delay seconds, the peer would get fenced and then the cluster would finish coming up. Being two_node, it would also become quorate and start hosting services. Of course, this opens the risk of a fence loop, but we have other protections in place to prevent that, so a fence loop is not a concern.
>
> My question then is two-fold:
>
> 1. Is there a pacemaker equivalent to 'post_join_delay'? (Fence the peer and, if successful, become quorate)?

Startup fencing is pacemaker default (startup-fencing cluster option).

> 2. If not, was this a conscious decision not to add it for some reason, or was it simply never added? If it was consciously decided to not have it, what was the reasoning behind it?
>
> I can replicate this behaviour in our code, but I don't want to do that if there is a compelling reason that I am not aware of.
>
> So,
>
> A) is there a pacemaker version of post_join_delay?
> B) is there a compelling argument NOT to use post_join_delay behaviour in pacemaker I am not seeing?
>
> Thanks!
[ClusterLabs] Single-node automated startup question
Hi all,

As we get close to finishing our Anvil! switch to pacemaker, I'm trying to tie up loose ends. One that I want feedback on is the pacemaker version of cman's old 'post_join_delay' feature.

Use case example:

A common use for the Anvil! is remote deployments where there are no (IT) humans available. Think cargo ships, field data collection, etc. So it's entirely possible that a node could fail and not be repaired for weeks or even months. With this in mind, it's also feasible that a solo node later loses power, and then reboots. In such a case, 'pcs cluster start' would never go quorate as the peer is dead.

In cman, during startup, if there was no reply from the peer after post_join_delay seconds, the peer would get fenced and then the cluster would finish coming up. Being two_node, it would also become quorate and start hosting services. Of course, this opens the risk of a fence loop, but we have other protections in place to prevent that, so a fence loop is not a concern.

My question then is two-fold:

1. Is there a pacemaker equivalent to 'post_join_delay'? (Fence the peer and, if successful, become quorate)?

2. If not, was this a conscious decision not to add it for some reason, or was it simply never added? If it was consciously decided to not have it, what was the reasoning behind it?

I can replicate this behaviour in our code, but I don't want to do that if there is a compelling reason that I am not aware of.

So,

A) is there a pacemaker version of post_join_delay?
B) is there a compelling argument NOT to use post_join_delay behaviour in pacemaker I am not seeing?

Thanks!

--
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of Einstein’s brain than in the near certainty that people of equal talent have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
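For reference, the cman startup behaviour described in this message (wait up to post_join_delay seconds for the peer, then fence it and, if the fence succeeds, become quorate) could be replicated roughly as in the Python sketch below. It is only a sketch under stated assumptions: the function name, the `peer_replied` and `fence_peer` callables, the polling interval, and the returned strings are all hypothetical and do not correspond to any cman, corosync, or pacemaker API.

```python
import time


def start_with_post_join_delay(peer_replied, fence_peer, post_join_delay,
                               poll_interval=1.0,
                               clock=time.monotonic, sleep=time.sleep):
    """Wait for the peer, then fence it if it never answers.

    peer_replied: hypothetical callable, True once the peer has answered.
    fence_peer:   hypothetical callable, True if fencing the peer succeeded.
    """
    deadline = clock() + post_join_delay
    while True:
        if peer_replied():
            # The peer is alive, so both nodes form the cluster normally.
            return "quorate-with-peer"
        if clock() >= deadline:
            break
        sleep(poll_interval)

    # No reply within post_join_delay seconds: fence the peer and only
    # continue if the fence action is confirmed.
    if fence_peer():
        return "quorate-after-fence"
    return "not-quorate"


# Example: the peer never answers and fencing succeeds (zero delay for brevity).
print(start_with_post_join_delay(lambda: False, lambda: True, post_join_delay=0))
```

The sketch deliberately refuses to go quorate when fencing fails, matching the "if successful" condition in question 1; fence-loop protection is outside its scope.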