Re: [ClusterLabs] HA domain controller fences newly joined node after fence_ipmilan delay even if transition was aborted.

2018-12-18 Thread Vitaly Zolotusky
Chris,
Thanks a lot for the info. I'll explore both options.
_Vitaly

> On December 18, 2018 at 11:13 AM Chris Walker  wrote:
> 
> 
> Looks like rhino66-left was scheduled for fencing because it was not present 
> 20 seconds (the dc-deadtime parameter) after rhino66-right started Pacemaker 
> (startup fencing).  I can think of a couple of ways to allow all nodes to 
> survive if they come up far apart in time (i.e., farther apart than 
> dc-deadtime):
> 
> 1.  Increase dc-deadtime.  Unfortunately, the cluster always waits for 
> dc-deadtime to expire before starting resources, so this can delay your 
> cluster's startup.
> 
> 2.  As Ken mentioned, synchronize the starting of Corosync and Pacemaker.  I 
> did this with a simple ExecStartPre systemd script:
> 
> [root@bug0 ~]# cat /etc/systemd/system/corosync.service.d/ha_wait.conf
> [Service]
> ExecStartPre=/sbin/ha_wait.sh
> TimeoutStartSec=11min
> [root@bug0 ~]#
> 
> where ha_wait.sh has something like:
> 
> #!/bin/bash
> 
> timeout=600
> 
> peer=
> 
> echo "Waiting for ${peer}"
> peerup() {
>   systemctl -H ${peer} show -p ActiveState corosync.service 2> /dev/null | \
> egrep -q "=active|=reloading|=failed|=activating|=deactivating" && return 0
>   return 1
> }
> 
> now=${SECONDS}
> while ! peerup && [ $((SECONDS-now)) -lt ${timeout} ]; do
>   echo -n .
>   sleep 5
> done
> 
> peerup && echo "${peer} is up starting HA" || echo "${peer} not up after ${timeout} starting HA alone"
> 
> 
> This will cause corosync startup to block for 10 minutes waiting for the 
> partner node to come up, after which both nodes will start corosync/pacemaker 
> close in time.  If one node never comes up, then it will wait 10 minutes 
> before starting, after which the other node will be fenced (startup fencing 
> and subsequent resource startup will only occur if no-quorum-policy is set to ignore).
> 
> HTH,
> 
> Chris
> 
> On 12/17/18 6:25 PM, Vitaly Zolotusky wrote:
> 
> Ken, Thank you very much for the quick response!
> I do have "two_node: 1" in the corosync.conf. I have attached it to this 
> email (not from the same system as original messages, but they are all the 
> same).
> Syncing startup of corosync and pacemaker on different nodes would be a 
> problem for us.
> I suspect that the problem is that corosync assumes quorum is reached as soon 
> as corosync is started on both nodes, but pacemaker does not abort fencing 
> until pacemaker starts on the other node.
> 
> I will try to work around this issue by moving the corosync and pacemaker 
> startups on a single node as close to each other as possible.
> Thanks again!
> _Vitaly
> 
> 
> 
> On December 17, 2018 at 6:01 PM Ken Gaillot  wrote:
> 
> 
> On Mon, 2018-12-17 at 15:43 -0500, Vitaly Zolotusky wrote:
> 
> 
> Hello,
> I have a 2 node cluster and stonith is configured for SBD and
> fence_ipmilan.
> fence_ipmilan for node 1 is configured for 0 delay and for node 2 for
> 30 sec delay so that nodes do not start killing each other during
> startup.
> 
> 
> 
> If you're using corosync 2 or later, you can set "two_node: 1" in
> corosync.conf. That implies the wait_for_all option, so that at start-
> up, both nodes must be present before quorum can be reached the first
> time. (After that point, one node can go away and quorum will be
> retained.)
> 
> Another way to avoid this is to start corosync on all nodes, then start
> pacemaker on all nodes.
> 
> 
> 
> In some cases (usually right after installation and when node 1 comes
> up first and node 2 second) the node that comes up first (node 1)
> states that node 2 is unclean, but can't fence it until quorum is
> reached.
> Then as soon as quorum is reached after startup of corosync on node 2
> it sends a fence request for node 2.
> Fence_ipmilan gets into 30 sec delay.
> Pacemaker gets started on node 2.
> While fence_ipmilan is still waiting out the delay, node 1's crmd aborts
> the transition that requested the fence.
> Even though the transition was aborted, node 2 still gets fenced when the
> delay expires.
> 
> 
> 
> Currently, pacemaker has no way of cancelling fencing once it's been
> initiated. Technically, it would be possible to cancel an operation
> that's in the delay stage (assuming that no other fence device has
> already been attempted, if there are more than one), but that hasn't
> been implemented.
> 
> 
> 
> Excerpts from messages are below. I also attached messages from both
> nodes and pe-input files from node 1.
> Any suggestions would be appreciated.
> Thank you very much for your help!
> Vitaly Zolotusky
> 
> Here are excerpts from the messages:
> 
> Node 1 - controller - rhino66-right 172.18.51.81 - came up
> first  *
> 
> Nov 29 16:47:54 rhino66-right pengine[22183]:  warning: Fencing and
> resource management disabled due to lack of quorum
> Nov 29 16:47:54 rhino66-right pengine[22183]:  warning: Node rhino66-
> left.lab.archivas.com is unclean!
> Nov 29 16:47:54 rhino66-right pengine[22183]:   

Re: [ClusterLabs] HA domain controller fences newly joined node after fence_ipmilan delay even if transition was aborted.

2018-12-18 Thread Chris Walker
Looks like rhino66-left was scheduled for fencing because it was not present 20 
seconds (the dc-deadtime parameter) after rhino66-right started Pacemaker 
(startup fencing).  I can think of a couple of ways to allow all nodes to 
survive if they come up far apart in time (i.e., farther apart than dc-deadtime):

1.  Increase dc-deadtime.  Unfortunately, the cluster always waits for 
dc-deadtime to expire before starting resources, so this can delay your 
cluster's startup.
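
For example, raising it with pcs (just a sketch; the exact value is
site-specific, and the default is 20s):

pcs property set dc-deadtime=2min

Whatever value you pick needs to cover the worst-case gap between the two nodes
starting Pacemaker.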

2.  As Ken mentioned, synchronize the starting of Corosync and Pacemaker.  I 
did this with a simple ExecStartPre systemd script:

[root@bug0 ~]# cat /etc/systemd/system/corosync.service.d/ha_wait.conf
[Service]
ExecStartPre=/sbin/ha_wait.sh
TimeoutStartSec=11min
[root@bug0 ~]#

where ha_wait.sh has something like:

#!/bin/bash

timeout=600

peer=

echo "Waiting for ${peer}"
peerup() {
  systemctl -H ${peer} show -p ActiveState corosync.service 2> /dev/null | \
egrep -q "=active|=reloading|=failed|=activating|=deactivating" && return 0
  return 1
}

now=${SECONDS}
while ! peerup && [ $((SECONDS-now)) -lt ${timeout} ]; do
  echo -n .
  sleep 5
done

peerup && echo "${peer} is up starting HA" || echo "${peer} not up after 
${timeout} starting HA alone"


This will cause corosync startup to block for 10 minutes waiting for the 
partner node to come up, after which both nodes will start corosync/pacemaker 
close in time.  If one node never comes up, then it will wait 10 minutes before 
starting, after which the other node will be fenced (startup fencing and 
subsequent resource startup will only occur if no-quorum-policy is set to ignore).
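
Note that the systemctl -H call in the script goes over ssh, so root on each
node needs ssh access to its peer for the check to see the remote corosync
state. Assuming local copies named ha_wait.sh and ha_wait.conf (the local file
names are only illustrative; the target paths are the ones used above), rolling
the drop-in out would look roughly like:

# sketch: install the wait script and the systemd drop-in, then reload systemd
install -m 0755 ha_wait.sh /sbin/ha_wait.sh
install -D -m 0644 ha_wait.conf /etc/systemd/system/corosync.service.d/ha_wait.conf
systemctl daemon-reload
systemctl start corosync pacemaker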

HTH,

Chris

On 12/17/18 6:25 PM, Vitaly Zolotusky wrote:

Ken, Thank you very much for the quick response!
I do have "two_node: 1" in the corosync.conf. I have attached it to this email 
(not from the same system as original messages, but they are all the same).
Syncing startup of corosync and pacemaker on different nodes would be a problem 
for us.
I suspect that the problem is that corosync assumes quorum is reached as soon 
as corosync is started on both nodes, but pacemaker does not abort fencing 
until pacemaker starts on the other node.

I will try to work around this issue by moving the corosync and pacemaker startups 
on a single node as close to each other as possible.
Thanks again!
_Vitaly



On December 17, 2018 at 6:01 PM Ken Gaillot  wrote:


On Mon, 2018-12-17 at 15:43 -0500, Vitaly Zolotusky wrote:


Hello,
I have a 2 node cluster and stonith is configured for SBD and
fence_ipmilan.
fence_ipmilan for node 1 is configured for 0 delay and for node 2 for
30 sec delay so that nodes do not start killing each other during
startup.



If you're using corosync 2 or later, you can set "two_node: 1" in
corosync.conf. That implies the wait_for_all option, so that at start-
up, both nodes must be present before quorum can be reached the first
time. (After that point, one node can go away and quorum will be
retained.)

Another way to avoid this is to start corosync on all nodes, then start
pacemaker on all nodes.



In some cases (usually right after installation and when node 1 comes
up first and node 2 second) the node that comes up first (node 1)
states that node 2 is unclean, but can't fence it until quorum is
reached.
Then as soon as quorum is reached after startup of corosync on node 2
it sends a fence request for node 2.
Fence_ipmilan gets into 30 sec delay.
Pacemaker gets started on node 2.
While fence_ipmilan is still waiting out the delay, node 1's crmd aborts
the transition that requested the fence.
Even though the transition was aborted, node 2 still gets fenced when the
delay expires.



Currently, pacemaker has no way of cancelling fencing once it's been
initiated. Technically, it would be possible to cancel an operation
that's in the delay stage (assuming that no other fence device has
already been attempted, if there are more than one), but that hasn't
been implemented.



Excerpts from messages are below. I also attached messages from both
nodes and pe-input files from node 1.
Any suggestions would be appreciated.
Thank you very much for your help!
Vitaly Zolotusky

Here are excerpts from the messages:

Node 1 - controller - rhino66-right 172.18.51.81 - came up
first  *

Nov 29 16:47:54 rhino66-right pengine[22183]:  warning: Fencing and
resource management disabled due to lack of quorum
Nov 29 16:47:54 rhino66-right pengine[22183]:  warning: Node rhino66-
left.lab.archivas.com is unclean!
Nov 29 16:47:54 rhino66-right pengine[22183]:   notice: Cannot fence
unclean nodes until quorum is attained (or no-quorum-policy is set to
ignore)
.
Nov 29 16:48:38 rhino66-right corosync[6677]:   [TOTEM ] A new
membership (172.16.1.81:60) was formed. Members joined: 2
Nov 29 16:48:38 rhino66-right corosync[6677]:   [VOTEQ ] Waiting for
all cluster members. Current votes: 1 expected_votes: 2
Nov 29 16:48:38 rhino66-right corosync[6677]:   [QUORUM] This 

Re: [ClusterLabs] HA domain controller fences newly joined node after fence_ipmilan delay even if transition was aborted.

2018-12-17 Thread Vitaly Zolotusky
Ken, Thank you very much for the quick response!
I do have "two_node: 1" in the corosync.conf. I have attached it to this email 
(not from the same system as original messages, but they are all the same).
Syncing startup of corosync and pacemaker on different nodes would be a problem 
for us.
I suspect that the problem is that corosync assumes quorum is reached as soon 
as corosync is started on both nodes, but pacemaker does not abort fencing 
until pacemaker starts on the other node.

I will try to work around this issue by moving the corosync and pacemaker startups 
on a single node as close to each other as possible.
Thanks again!
_Vitaly

> On December 17, 2018 at 6:01 PM Ken Gaillot  wrote:
> 
> 
> On Mon, 2018-12-17 at 15:43 -0500, Vitaly Zolotusky wrote:
> > Hello,
> > I have a 2 node cluster and stonith is configured for SBD and
> > fence_ipmilan.
> > fence_ipmilan for node 1 is configured for 0 delay and for node 2 for
> > 30 sec delay so that nodes do not start killing each other during
> > startup.
> 
> If you're using corosync 2 or later, you can set "two_node: 1" in
> corosync.conf. That implies the wait_for_all option, so that at start-
> up, both nodes must be present before quorum can be reached the first
> time. (After that point, one node can go away and quorum will be
> retained.)
> 
> Another way to avoid this is to start corosync on all nodes, then start
> pacemaker on all nodes.
> 
> > In some cases (usually right after installation and when node 1 comes
> > up first and node 2 second) the node that comes up first (node 1)
> > states that node 2 is unclean, but can't fence it until quorum is
> > reached.
> > Then as soon as quorum is reached after startup of corosync on node 2
> > it sends a fence request for node 2. 
> > Fence_ipmilan gets into 30 sec delay.
> > Pacemaker gets started on node 2.
> > While fence_ipmilan is still waiting out the delay, node 1's crmd aborts
> > the transition that requested the fence.
> > Even though the transition was aborted, node 2 still gets fenced when the
> > delay expires.
> 
> Currently, pacemaker has no way of cancelling fencing once it's been
> initiated. Technically, it would be possible to cancel an operation
> that's in the delay stage (assuming that no other fence device has
> already been attempted, if there are more than one), but that hasn't
> been implemented.
> 
> > Excerpts from messages are below. I also attached messages from both
> > nodes and pe-input files from node 1.
> > Any suggestions would be appreciated.
> > Thank you very much for your help!
> > Vitaly Zolotusky
> > 
> > Here are excerpts from the messages:
> > 
> > Node 1 - controller - rhino66-right 172.18.51.81 - came up
> > first  *
> > 
> > Nov 29 16:47:54 rhino66-right pengine[22183]:  warning: Fencing and
> > resource management disabled due to lack of quorum
> > Nov 29 16:47:54 rhino66-right pengine[22183]:  warning: Node rhino66-
> > left.lab.archivas.com is unclean!
> > Nov 29 16:47:54 rhino66-right pengine[22183]:   notice: Cannot fence
> > unclean nodes until quorum is attained (or no-quorum-policy is set to
> > ignore)
> > .
> > Nov 29 16:48:38 rhino66-right corosync[6677]:   [TOTEM ] A new
> > membership (172.16.1.81:60) was formed. Members joined: 2
> > Nov 29 16:48:38 rhino66-right corosync[6677]:   [VOTEQ ] Waiting for
> > all cluster members. Current votes: 1 expected_votes: 2
> > Nov 29 16:48:38 rhino66-right corosync[6677]:   [QUORUM] This node is
> > within the primary component and will provide service.
> > Nov 29 16:48:38 rhino66-right corosync[6677]:   [QUORUM] Members[2]:
> > 1 2
> > Nov 29 16:48:38 rhino66-right corosync[6677]:   [MAIN  ] Completed
> > service synchronization, ready to provide service.
> > Nov 29 16:48:38 rhino66-right crmd[22184]:   notice: Quorum acquired
> > Nov 29 16:48:38 rhino66-right pacemakerd[22152]:   notice: Quorum
> > acquired
> > Nov 29 16:48:38 rhino66-right crmd[22184]:   notice: Could not obtain
> > a node name for corosync nodeid 2
> > Nov 29 16:48:38 rhino66-right pacemakerd[22152]:   notice: Could not
> > obtain a node name for corosync nodeid 2
> > Nov 29 16:48:38 rhino66-right crmd[22184]:   notice: Could not obtain
> > a node name for corosync nodeid 2
> > Nov 29 16:48:38 rhino66-right crmd[22184]:   notice: Node (null)
> > state is now member
> > Nov 29 16:48:38 rhino66-right pacemakerd[22152]:   notice: Could not
> > obtain a node name for corosync nodeid 2
> > Nov 29 16:48:38 rhino66-right pacemakerd[22152]:   notice: Node
> > (null) state is now member
> > Nov 29 16:48:54 rhino66-right crmd[22184]:   notice: State transition
> > S_IDLE -> S_POLICY_ENGINE
> > Nov 29 16:48:54 rhino66-right pengine[22183]:   notice: Watchdog will
> > be used via SBD if fencing is required
> > Nov 29 16:48:54 rhino66-right pengine[22183]:  warning: Scheduling
> > Node rhino66-left.lab.archivas.com for STONITH
> > Nov 29 16:48:54 rhino66-right pengine[22183]:   notice:  * Fence
> > (reboot) rhino66-left.lab.archivas.com 'node 

Re: [ClusterLabs] HA domain controller fences newly joined node after fence_ipmilan delay even if transition was aborted.

2018-12-17 Thread Ken Gaillot
On Mon, 2018-12-17 at 15:43 -0500, Vitaly Zolotusky wrote:
> Hello,
> I have a 2 node cluster and stonith is configured for SBD and
> fence_ipmilan.
> fence_ipmilan for node 1 is configured for 0 delay and for node 2 for
> 30 sec delay so that nodes do not start killing each other during
> startup.

If you're using corosync 2 or later, you can set "two_node: 1" in
corosync.conf. That implies the wait_for_all option, so that at start-
up, both nodes must be present before quorum can be reached the first
time. (After that point, one node can go away and quorum will be
retained.)
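
For reference, the relevant corosync.conf stanza looks something like this
(illustrative; the rest of the file is omitted):

quorum {
    provider: corosync_votequorum
    two_node: 1
}

Setting two_node implicitly enables wait_for_all unless it is overridden
explicitly.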

Another way to avoid this is to start corosync on all nodes, then start
pacemaker on all nodes.
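
If the nodes are reachable over ssh, that ordering can be scripted along these
lines (hostnames are placeholders):

# start corosync everywhere first, then pacemaker everywhere
for n in node1 node2; do ssh "$n" systemctl start corosync; done
for n in node1 node2; do ssh "$n" systemctl start pacemaker; done

(pcs users would typically just run "pcs cluster start --all".)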

> In some cases (usually right after installation and when node 1 comes
> up first and node 2 second) the node that comes up first (node 1)
> states that node 2 is unclean, but can't fence it until quorum is
> reached. 
> Then as soon as quorum is reached after startup of corosync on node 2
> it sends a fence request for node 2. 
> Fence_ipmilan gets into 30 sec delay.
> Pacemaker gets started on node 2.
> While fence_ipmilan is still waiting out the delay, node 1's crmd aborts
> the transition that requested the fence.
> Even though the transition was aborted, node 2 still gets fenced when the
> delay expires.

Currently, pacemaker has no way of cancelling fencing once it's been
initiated. Technically, it would be possible to cancel an operation
that's in the delay stage (assuming that no other fence device has
already been attempted, if there are more than one), but that hasn't
been implemented.

> Excerpts from messages are below. I also attached messages from both
> nodes and pe-input files from node 1.
> Any suggestions would be appreciated.
> Thank you very much for your help!
> Vitaly Zolotusky
> 
> Here are excerpts from the messages:
> 
> Node 1 - controller - rhino66-right 172.18.51.81 - came up
> first  *
> 
> Nov 29 16:47:54 rhino66-right pengine[22183]:  warning: Fencing and
> resource management disabled due to lack of quorum
> Nov 29 16:47:54 rhino66-right pengine[22183]:  warning: Node rhino66-
> left.lab.archivas.com is unclean!
> Nov 29 16:47:54 rhino66-right pengine[22183]:   notice: Cannot fence
> unclean nodes until quorum is attained (or no-quorum-policy is set to
> ignore)
> .
> Nov 29 16:48:38 rhino66-right corosync[6677]:   [TOTEM ] A new
> membership (172.16.1.81:60) was formed. Members joined: 2
> Nov 29 16:48:38 rhino66-right corosync[6677]:   [VOTEQ ] Waiting for
> all cluster members. Current votes: 1 expected_votes: 2
> Nov 29 16:48:38 rhino66-right corosync[6677]:   [QUORUM] This node is
> within the primary component and will provide service.
> Nov 29 16:48:38 rhino66-right corosync[6677]:   [QUORUM] Members[2]:
> 1 2
> Nov 29 16:48:38 rhino66-right corosync[6677]:   [MAIN  ] Completed
> service synchronization, ready to provide service.
> Nov 29 16:48:38 rhino66-right crmd[22184]:   notice: Quorum acquired
> Nov 29 16:48:38 rhino66-right pacemakerd[22152]:   notice: Quorum
> acquired
> Nov 29 16:48:38 rhino66-right crmd[22184]:   notice: Could not obtain
> a node name for corosync nodeid 2
> Nov 29 16:48:38 rhino66-right pacemakerd[22152]:   notice: Could not
> obtain a node name for corosync nodeid 2
> Nov 29 16:48:38 rhino66-right crmd[22184]:   notice: Could not obtain
> a node name for corosync nodeid 2
> Nov 29 16:48:38 rhino66-right crmd[22184]:   notice: Node (null)
> state is now member
> Nov 29 16:48:38 rhino66-right pacemakerd[22152]:   notice: Could not
> obtain a node name for corosync nodeid 2
> Nov 29 16:48:38 rhino66-right pacemakerd[22152]:   notice: Node
> (null) state is now member
> Nov 29 16:48:54 rhino66-right crmd[22184]:   notice: State transition
> S_IDLE -> S_POLICY_ENGINE
> Nov 29 16:48:54 rhino66-right pengine[22183]:   notice: Watchdog will
> be used via SBD if fencing is required
> Nov 29 16:48:54 rhino66-right pengine[22183]:  warning: Scheduling
> Node rhino66-left.lab.archivas.com for STONITH
> Nov 29 16:48:54 rhino66-right pengine[22183]:   notice:  * Fence
> (reboot) rhino66-left.lab.archivas.com 'node is unclean'
> Nov 29 16:48:54 rhino66-right pengine[22183]:   notice:  *
> Start  fence_sbd ( rhino66-right.lab.archivas.com )
> Nov 29 16:48:54 rhino66-right pengine[22183]:   notice:  *
> Start  ipmi-82   ( rhino66-right.lab.archivas.com )
> Nov 29 16:48:54 rhino66-right pengine[22183]:   notice:  *
> Start  S_IP  ( rhino66-right.lab.archivas.com )
> Nov 29 16:48:54 rhino66-right pengine[22183]:   notice:  *
> Start  postgres:0( rhino66-right.lab.archivas.com )
> Nov 29 16:48:54 rhino66-right pengine[22183]:   notice:  *
> Start  ethmonitor:0  ( rhino66-right.lab.archivas.com )
> Nov 29 16:48:54 rhino66-right pengine[22183]:   notice:  *
> Start  fs_monitor:0  ( rhino66-right.lab.archivas.com
> )   due to unrunnable DBMaster running (blocked)
> Nov 29 16:48:54 rhino66-right