>>> Chris Walker <[email protected]> wrote on 18.12.2018 at 17:13 in message
<[email protected]>:

[...]
> 2.  As Ken mentioned, synchronize the starting of Corosync and Pacemaker.  I 
> did this with a simple ExecStartPre systemd script:
> 
> [root@bug0 ~]# cat /etc/systemd/system/corosync.service.d/ha_wait.conf
> [Service]
> ExecStartPre=/sbin/ha_wait.sh
> TimeoutStartSec=11min
> [root@bug0 ~]#
> 
> where ha_wait.sh has something like:
> 
> #!/bin/bash
> 
> timeout=600
> 
> peer=<hostname of HA peer>
> 
> echo "Waiting for ${peer}"
> peerup() {
>   systemctl -H "${peer}" show -p ActiveState corosync.service 2> /dev/null | \
>     grep -Eq "=active|=reloading|=failed|=activating|=deactivating" && return 0
>   return 1
> }
> 
> now=${SECONDS}
> while ! peerup && [ $((SECONDS-now)) -lt ${timeout} ]; do
>   echo -n .
>   sleep 5
> done
> 
> peerup && echo "${peer} is up, starting HA" || echo "${peer} not up after ${timeout} seconds, starting HA alone"
> 
> 
> This will cause corosync startup to block for up to 10 minutes waiting for the 
> partner node to come up, after which both nodes will start corosync/pacemaker 
> close together in time.  If one node never comes up, the other will wait 10 
> minutes before starting alone, after which the missing node will be fenced 
> (startup fencing and subsequent resource startup will only occur if 
> no-quorum-policy is set to ignore).

Hi!

I have also missed such an option, because I knew it from HP-UX ServiceGuard: 
there was a delay to wait "for the cluster to form", meaning that if all nodes 
are down, there is no cluster, and a new cluster "has to form". The first node 
to boot would not simply become a one-node cluster, but would wait a 
configurable time for other nodes to join (come up). The cluster would form as 
soon as either all configured cluster nodes had come up or the configured time 
had elapsed. The advantage is that unnecessary resource movements are avoided 
when other nodes come up shortly after a new cluster has formed...

That makes sense to me.
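
As a rough illustration of that "wait for the cluster to form" behaviour, the 
ha_wait.sh idea quoted above could be extended from one peer to a list of 
peers. This is only a sketch under assumptions: wait_for_peers, probe_corosync 
and the PEER_PROBE variable are names made up here, not ServiceGuard or 
Pacemaker features, and the default probe simply reuses the systemctl check 
from the quoted script.

```shell
#!/bin/bash
# Sketch only: wait until every listed peer passes a probe, or a timeout expires.
# wait_for_peers, probe_corosync and PEER_PROBE are hypothetical names.

# Default probe: same check as the quoted ha_wait.sh (peer's systemd reports
# some state for corosync.service). Set PEER_PROBE to use a different test.
probe_corosync() {
  systemctl -H "$1" show -p ActiveState corosync.service 2>/dev/null |
    grep -Eq '=(active|reloading|failed|activating|deactivating)'
}

# wait_for_peers TIMEOUT INTERVAL PEER...
# Returns 0 as soon as all peers pass the probe, 1 if TIMEOUT seconds pass first.
wait_for_peers() {
  local timeout=$1 interval=$2
  shift 2
  local probe=${PEER_PROBE:-probe_corosync}
  local start=$SECONDS peer all_up
  while :; do
    all_up=1
    for peer in "$@"; do
      "$probe" "$peer" || { all_up=0; break; }
    done
    [ "$all_up" -eq 1 ] && return 0
    [ $((SECONDS - start)) -ge "$timeout" ] && return 1
    sleep "$interval"
  done
}
```

It could be called from an ExecStartPre script as, for example, 
"wait_for_peers 600 5 node2 node3" (the node names are placeholders); the 
return code tells whether all peers were seen or the timeout expired.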

(ServiceGuard also had an option I miss in pacemaker: I could configure a 
timeout below which network outages were ignored. So when I quickly replugged 
a cable (which would cause an "interface down" for at least five seconds) the 
cluster would NOT trigger any corrective actions. We had this timeout set to 
30 seconds or so, as most resource operations would take much longer to 
complete...)
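
The grace-period idea can be sketched in a few lines of bash: a failure is 
only reported if a probe keeps failing for the whole grace period. This is an 
illustration only; link_failed_after_grace is a hypothetical helper, not a 
Pacemaker or ServiceGuard option, and the probe command is whatever check 
suits the setup.

```shell
#!/bin/bash
# Sketch only: ignore outages shorter than a grace period.
# link_failed_after_grace is a hypothetical helper name.

# link_failed_after_grace GRACE INTERVAL PROBE [ARGS...]
# Returns 0 (treat as a real failure) only if PROBE fails continuously for
# GRACE seconds; returns 1 (no action needed) as soon as PROBE succeeds.
link_failed_after_grace() {
  local grace=$1 interval=$2
  shift 2
  local start=$SECONDS
  while ! "$@"; do
    [ $((SECONDS - start)) -ge "$grace" ] && return 0
    sleep "$interval"
  done
  return 1
}
```

For example, "link_failed_after_grace 30 1 ping -c 1 -W 1 peer" (with "peer" 
as a placeholder host name) would report a failure only after about 30 seconds 
without a reply, so a quickly replugged cable would go unnoticed.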

Regards,
Ulrich

> 
> HTH,
> 
> Chris
[...]


