>>> Chris Walker <[email protected]> wrote on 18 Dec 2018 at 17:13 in message
<[email protected]>:
[...]
> 2. As Ken mentioned, synchronize the startup of Corosync and Pacemaker. I
> did this with a simple systemd ExecStartPre script in a drop-in file:
>
> [root@bug0 ~]# cat /etc/systemd/system/corosync.service.d/ha_wait.conf
> [Service]
> ExecStartPre=/sbin/ha_wait.sh
> TimeoutStartSec=11min
> [root@bug0 ~]#
>
> where ha_wait.sh has something like:
>
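> #!/bin/bash
>
> # Maximum time (in seconds) to wait for the peer before starting alone
> timeout=600
>
> peer="<hostname of HA peer>"
>
> echo "Waiting for ${peer}"
>
> # The peer counts as up if its corosync unit reports one of the states below.
> # "activating" matters: while the peer is itself blocked in this
> # ExecStartPre script, its corosync.service is still activating.
> # systemctl -H reaches the peer over SSH; if it is unreachable, nothing
> # matches and the peer counts as down.
> peerup() {
>     systemctl -H "${peer}" show -p ActiveState corosync.service 2> /dev/null |
>         grep -Eq "=(active|reloading|failed|activating|deactivating)$"
> }
>
> now=${SECONDS}
> while ! peerup && [ $((SECONDS - now)) -lt "${timeout}" ]; do
>     echo -n .
>     sleep 5
> done
> echo
>
> if peerup; then
>     echo "${peer} is up, starting HA"
> else
>     echo "${peer} not up after ${timeout}s, starting HA alone"
> fi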
>
>
> This will cause corosync startup to block for up to 10 minutes waiting for
> the partner node to come up, so that both nodes start corosync/pacemaker
> close together in time. If one node never comes up, the other waits the full
> 10 minutes and then starts alone, after which the missing node will be
> fenced (startup fencing and subsequent resource startup will only occur if
> no-quorum-policy is set to ignore).
Hi!
I have also missed such an option, because I knew it from HP-UX ServiceGuard:
there was a delay to wait "for the cluster to form". If all nodes are down,
there is no cluster, and a new cluster "has to form". The first node to boot
would not simply become a one-node cluster; instead it would wait a
configurable time for the other nodes to join (come up). As soon as either all
configured cluster nodes had come up or the configured time had elapsed, the
"cluster would form". The advantage is that unneeded resource movements are
avoided when other nodes come up shortly after a new cluster has formed.
That makes sense to me.
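The "wait for the cluster to form" behaviour can be roughly approximated outside the cluster stack by generalizing ha_wait.sh from one peer to a list of nodes. This is only a sketch under stated assumptions, not a ServiceGuard or Pacemaker feature: the function names `wait_for_cluster` and `missing_nodes` are hypothetical, and the caller supplies the probe command (in practice it could wrap the `systemctl -H <node> show -p ActiveState corosync.service` check from Chris's script).

```shell
#!/bin/bash
# Hypothetical sketch: ServiceGuard-style "wait for the cluster to form",
# generalizing ha_wait.sh from one peer to a list of nodes.
# Assumption: PROBE is a caller-supplied command that takes a node name and
# succeeds if that node counts as up.

# Usage: missing_nodes PROBE NODE...
# Prints every node for which "PROBE NODE" fails, one per line.
missing_nodes() {
    local probe=$1 node
    shift
    for node in "$@"; do
        "${probe}" "${node}" || echo "${node}"
    done
}

# Usage: wait_for_cluster TIMEOUT INTERVAL PROBE NODE...
# Returns 0 as soon as all nodes pass the probe. If TIMEOUT seconds elapse
# first, prints the nodes still missing and returns 1, so the caller can
# decide to "form" the cluster without them.
wait_for_cluster() {
    local timeout=$1 interval=$2 probe=$3
    shift 3
    local start=${SECONDS} missing
    while :; do
        missing=$(missing_nodes "${probe}" "$@")
        [ -z "${missing}" ] && return 0
        if (( SECONDS - start >= timeout )); then
            echo "${missing}"
            return 1
        fi
        sleep "${interval}"
    done
}
```

For example, `wait_for_cluster 600 5 node_up node2 node3` (with a hypothetical `node_up` probe that takes the host name as its argument) would block for at most ten minutes and then report any node that never showed up.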
(ServiceGuard also had an option I miss in Pacemaker: I could configure a
network timeout during which network outages were ignored. So when I quickly
replugged a cable (which would cause an "interface down" for at least five
seconds), the cluster would NOT trigger any corrective actions. We had that
timeout set to about 30 seconds, as most resource operations would take much
longer to complete anyway.)
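Such a grace period amounts to debouncing link failures: an outage only counts once it has lasted continuously for the configured time. A minimal sketch of the idea, assuming a hypothetical `outage_exceeds` helper and a caller-supplied probe (for a NIC this could be a check that /sys/class/net/<interface>/carrier reads 1); this is an illustration, not an existing Pacemaker or Corosync option.

```shell
#!/bin/bash
# Hypothetical sketch of a grace period for network outages: a failure is
# only reported if the probe keeps failing for GRACE seconds in a row, so a
# short outage (e.g. a replugged cable) is ignored.
# Usage: outage_exceeds GRACE INTERVAL PROBE [ARGS...]
# Returns 0 if the outage lasted at least GRACE seconds (take action),
# 1 if the probe succeeded again before that (ignore the outage).
outage_exceeds() {
    local grace=$1 interval=$2
    shift 2
    local start=${SECONDS}
    while ! "$@"; do
        if (( SECONDS - start >= grace )); then
            return 0
        fi
        sleep "${interval}"
    done
    return 1
}
```

A monitor would call this only after first seeing the link down, e.g. `outage_exceeds 30 1 link_up eth0` with a hypothetical `link_up` probe, and trigger corrective action only if it returns 0.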
Regards,
Ulrich
>
> HTH,
>
> Chris
[...]
_______________________________________________
Users mailing list: [email protected]
https://lists.clusterlabs.org/mailman/listinfo/users
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org