Re: [ClusterLabs] Single-node automated startup question

2021-04-14 Thread Strahil Nikolov
If it's a VM or container, it should be at a third location. Using a VM hosted 
on one of the nodes is like giving that node more votes in a two-node cluster.
A cheap third node for quorum makes more sense to me.
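
A minimal sketch of the vote arithmetic behind that point, assuming one vote
per member and a strict-majority quorum rule. The node names "a", "b" and "c"
and both helper functions are made up for illustration; this is not corosync's
own code.

```python
# Hypothetical layout: two real nodes plus one quorum VM, one vote each.
TOTAL_VOTES = 3


def has_quorum(votes_present: int, total_votes: int = TOTAL_VOTES) -> bool:
    """Strict majority: more than half of all votes must be present."""
    return votes_present > total_votes // 2


def surviving_votes(failed_host: str, vm_host: str) -> int:
    """Votes left after failed_host dies; vm_host is where the quorum VM runs."""
    votes = TOTAL_VOTES - 1      # the failed node's own vote is gone
    if vm_host == failed_host:
        votes -= 1               # the quorum VM dies together with its host
    return votes


# Quorum VM hosted on node "a": losing "a" removes two of the three votes.
print(has_quorum(surviving_votes(failed_host="a", vm_host="a")))
# Quorum VM at a third location "c": losing either real node leaves two votes.
print(has_quorum(surviving_votes(failed_host="a", vm_host="c")))
```

With the VM on node "a", the cluster survives the loss of "b" but not the loss
of "a", which is the same as "a" holding two votes.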
Best Regards,
Strahil Nikolov

On Wed, Apr 14, 2021 at 21:19, Antony Stone wrote:
> On Wednesday 14 April 2021 at 19:33:39, Strahil Nikolov wrote:
>
> > What about a small form factor device to serve as a quorum maker?
> > Best Regards,
> > Strahil Nikolov
>
> If you're going to take that approach, why not a virtual machine or two,
> hosted inside the physical machine which is your single real node?
>
> I'm not necessarily advocating this method of achieving quorum, but it's
> probably an idea worth considering for specific situations.
>
>
> Antony.
>
> -- 
> Someone has stolen all the toilets from New Scotland Yard.  Police say they
> have absolutely nothing to go on.
>
> Please reply to the list;
> please *don't* CC me.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
  


Re: [ClusterLabs] Single-node automated startup question

2021-04-14 Thread Strahil Nikolov
What about a small form factor device to serve as a quorum maker?
Best Regards,
Strahil Nikolov


Re: [ClusterLabs] Single-node automated startup question

2021-04-14 Thread Antony Stone
On Wednesday 14 April 2021 at 19:33:39, Strahil Nikolov wrote:

> What about a small form factor device to serve as a quorum maker?
> Best Regards,
> Strahil Nikolov

If you're going to take that approach, why not a virtual machine or two, 
hosted inside the physical machine which is your single real node?

I'm not necessarily advocating this method of achieving quorum, but it's 
probably an idea worth considering for specific situations.


Antony.

-- 
Someone has stolen all the toilets from New Scotland Yard.  Police say they 
have absolutely nothing to go on.

   Please reply to the list;
 please *don't* CC me.


Re: [ClusterLabs] Single-node automated startup question

2021-04-14 Thread Ken Gaillot
On Wed, 2021-04-14 at 18:00 +0300, Andrei Borzenkov wrote:
> On 14.04.2021 17:50, Digimer wrote:
> > Hi all,
> > 
> >   As we get close to finishing our Anvil! switch to pacemaker, I'm
> > trying to tie up loose ends. One that I want feedback on is the
> > pacemaker version of cman's old 'post_join_delay' feature.
> >
> > Use case example:
> >
> >   A common use for the Anvil! is remote deployments where there are
> > no (IT) humans available. Think cargo ships, field data collection,
> > etc. So it's entirely possible that a node could fail and not be
> > repaired for weeks or even months. With this in mind, it's also
> > feasible that a solo node later loses power, and then reboots. In
> > such a case, 'pcs cluster start' would never go quorate as the peer
> > is dead.
> >
> >   In cman, during startup, if there was no reply from the peer after
> > post_join_delay seconds, the peer would get fenced and then the
> > cluster would finish coming up. Being two_node, it would also become
> > quorate and start hosting services. Of course, this opens the risk
> > of a fence loop, but we have other protections in place to prevent
> > that, so a fence loop is not a concern.
> >
> >   My question then is two-fold:
> >
> > 1. Is there a pacemaker equivalent to 'post_join_delay'? (Fence the
> > peer and, if successful, become quorate)?
> > 
> 
> Startup fencing is the pacemaker default (the startup-fencing cluster
> option).

Start-up fencing will have the desired effect in a cluster with more than
two nodes, but in a two-node cluster the corosync wait_for_all option is key.

If wait_for_all is true (which is the default when two_node is set),
then a node that comes up alone will wait until it sees the other node
at least once before becoming quorate. This prevents an isolated node
from coming up and fencing a node that's happily running.

Setting wait_for_all to false will make an isolated node immediately
become quorate. It will do what you want, which is fence the other node
and take over resources, but the danger is that this node is the one
that's having trouble (e.g. can't see the other node due to a network
card issue). The healthy node could fence the unhealthy node, which
might then reboot and come up and shoot the healthy node.
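
A minimal sketch of the two cases above, assuming a simplified model of a
two_node cluster. The NodeState fields and the is_quorate function are made-up
names for illustration, not corosync's API or its actual implementation.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    """Simplified, hypothetical view of one node in a two_node cluster."""
    wait_for_all: bool = True            # default when two_node is set
    seen_peer_since_start: bool = False  # peer seen at least once since startup
    peer_visible: bool = False           # peer reachable right now


def is_quorate(state: NodeState) -> bool:
    """Return True if this node would consider itself quorate in this model."""
    if state.peer_visible:
        # Both nodes can see each other: quorate.
        return True
    if state.wait_for_all and not state.seen_peer_since_start:
        # Came up alone and has never seen the peer: keep waiting.
        return False
    # two_node lets a lone node stay quorate (it must then fence the peer).
    return True


# A node that boots alone with the default wait_for_all=true stays inquorate.
print(is_quorate(NodeState(wait_for_all=True)))
# With wait_for_all=false the same lone node is quorate immediately.
print(is_quorate(NodeState(wait_for_all=False)))
```

The second case is the one that carries the risk described above: the lone
node cannot tell a dead peer from its own broken network link.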

There's no direct equivalent of a delay before becoming quorate, but I
don't think that helps -- the boot time acts as a sort of random delay,
and a delay doesn't help the issue of an unhealthy node shooting a
healthy one.

My recommendation would be to set wait_for_all to true as long as both
nodes are known to be healthy. Once an unhealthy node is down and
expected to stay down, set wait_for_all to false on the healthy node so
it can reboot and bring the cluster up. (The unhealthy node will still
have wait_for_all=true, so it won't cause any trouble even if it comes
up.) 

> 
> > 2. If not, was this a conscious decision not to add it for some
> > reason, or was it simply never added? If it was consciously decided
> > to not have it, what was the reasoning behind it?
> >
> >   I can replicate this behaviour in our code, but I don't want to do
> > that if there is a compelling reason that I am not aware of.
> >
> > So,
> >
> > A) is there a pacemaker version of post_join_delay?
> > B) is there a compelling argument NOT to use post_join_delay
> > behaviour in pacemaker I am not seeing?
> > 
> > Thanks!
> > 
> 
-- 
Ken Gaillot 



Re: [ClusterLabs] Single-node automated startup question

2021-04-14 Thread Andrei Borzenkov
On 14.04.2021 17:50, Digimer wrote:
> Hi all,
> 
>   As we get close to finishing our Anvil! switch to pacemaker, I'm trying
> to tie up loose ends. One that I want feedback on is the pacemaker
> version of cman's old 'post_join_delay' feature.
> 
> Use case example:
> 
>   A common use for the Anvil! is remote deployments where there are no
> (IT) humans available. Think cargo ships, field data collection, etc. So
> it's entirely possible that a node could fail and not be repaired for
> weeks or even months. With this in mind, it's also feasible that a solo
> node later loses power, and then reboots. In such a case, 'pcs cluster
> start' would never go quorate as the peer is dead.
> 
>   In cman, during startup, if there was no reply from the peer after
> post_join_delay seconds, the peer would get fenced and then the cluster
> would finish coming up. Being two_node, it would also become quorate and
> start hosting services. Of course, this opens the risk of a fence loop,
> but we have other protections in place to prevent that, so a fence loop
> is not a concern.
> 
>   My question then is two-fold:
> 
> 1. Is there a pacemaker equivalent to 'post_join_delay'? (Fence the peer
> and, if successful, become quorate)?
> 

Startup fencing is the pacemaker default (the startup-fencing cluster option).

> 2. If not, was this a conscious decision not to add it for some reason,
> or was it simply never added? If it was consciously decided to not have
> it, what was the reasoning behind it?
> 
>   I can replicate this behaviour in our code, but I don't want to do
> that if there is a compelling reason that I am not aware of.
> 
> So,
> 
> A) is there a pacemaker version of post_join_delay?
> B) is there a compelling argument NOT to use post_join_delay behaviour
> in pacemaker I am not seeing?
> 
> Thanks!
> 



[ClusterLabs] Single-node automated startup question

2021-04-14 Thread Digimer
Hi all,

  As we get close to finishing our Anvil! switch to pacemaker, I'm trying
to tie up loose ends. One that I want feedback on is the pacemaker
version of cman's old 'post_join_delay' feature.

Use case example:

  A common use for the Anvil! is remote deployments where there are no
(IT) humans available. Think cargo ships, field data collection, etc. So
it's entirely possible that a node could fail and not be repaired for
weeks or even months. With this in mind, it's also feasible that a solo
node later loses power, and then reboots. In such a case, 'pcs cluster
start' would never go quorate as the peer is dead.

  In cman, during startup, if there was no reply from the peer after
post_join_delay seconds, the peer would get fenced and then the cluster
would finish coming up. Being two_node, it would also become quorate and
start hosting services. Of course, this opens the risk of a fence loop,
but we have other protections in place to prevent that, so a fence loop
is not a concern.
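
A minimal sketch of the cman-style startup sequence described above, assuming
the caller supplies two hypothetical callbacks: peer_replied() to check for
the peer and fence_peer() to fence it and report success. Neither is a real
cman or pacemaker API, and the returned strings are illustrative labels only.

```python
import time
from typing import Callable


def start_with_post_join_delay(
    peer_replied: Callable[[], bool],
    fence_peer: Callable[[], bool],
    post_join_delay: float,
    poll_interval: float = 0.5,
) -> str:
    """Wait up to post_join_delay seconds for the peer, then fence it."""
    deadline = time.monotonic() + post_join_delay
    while time.monotonic() < deadline:
        if peer_replied():
            return "joined-peer"      # normal two-node startup
        time.sleep(poll_interval)
    # No reply within the delay: fence the peer before going any further.
    if fence_peer():
        return "quorate-alone"        # safe to start hosting services
    return "blocked"                  # fencing failed, so stay inquorate


# Peer is dead and fencing succeeds: the lone node comes up by itself.
print(start_with_post_join_delay(lambda: False, lambda: True, post_join_delay=0))
```

Becoming quorate only after a confirmed fence is what keeps this sequence from
starting services while the peer might still be running.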

  My question then is two-fold:

1. Is there a pacemaker equivalent to 'post_join_delay'? (Fence the peer
and, if successful, become quorate)?

2. If not, was this a conscious decision not to add it for some reason,
or was it simply never added? If it was consciously decided to not have
it, what was the reasoning behind it?

  I can replicate this behaviour in our code, but I don't want to do
that if there is a compelling reason that I am not aware of.

So,

A) is there a pacemaker version of post_join_delay?
B) is there a compelling argument NOT to use post_join_delay behaviour
in pacemaker I am not seeing?

Thanks!

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould