05.05.2020 06:39, Nickle, Richard writes:
> I have a two node cluster managing a VIP. The service is an SMTP service.
> This could be active/active; it doesn't matter which node accepts the SMTP
> connection, but I wanted to make sure that a VIP was in place so that there
> was a well-known address.
>
> This service has been running for quite a while with no problems. All of a
> sudden it partitioned, and now I can't work out a good way to get the two
> partitions to merge back into one cluster. Right now one partition takes
> the resource and starts the VIP, but doesn't see the other node. The other
> node doesn't create a resource, and can't seem to see the first node
> either.
>
> At this point I am perfectly willing to create another node and make an
> odd-numbered cluster, the arguments for this being fairly persuasive. But
> I'm not sure why they are blocking.
>
> Surely there must be some manual way to get a partitioned cluster to
> merge?
It happens automatically if the nodes can communicate with each other. You
seem to have a network connectivity issue which you need to investigate and
resolve.

> Some trick? I also had a scenario several weeks ago where an
> odd-numbered cluster configured in a similar way partitioned into a 3 and
> 2 node cluster, and I was unable to work out how to get them to merge,
> until all of a sudden they seemed to fix themselves after doing a
> 'pcs node remove/pcs node add' which had failed many times before. I have
> tried that here but with no success so far.
>
> I ruled out some common causes I've seen in discussions and threads, such
> as having my host name defined in /etc/hosts as localhost, etc.
>
> Corosync 2.4.3, pcs 0.9.164, Pacemaker 1.1.18 (Ubuntu 18.04).
>
> Output from pcs status for both nodes:
>
> Cluster name: mail
> Stack: corosync
> Current DC: mail2 (version 1.1.18-2b07d5c5a9) - partition with quorum
> Last updated: Mon May  4 23:28:53 2020
> Last change: Mon May  4 21:50:04 2020 by hacluster via crmd on mail2
>
> 2 nodes configured
> 1 resource configured
>
> Online: [ mail2 ]
> OFFLINE: [ mail3 ]
>
> Full list of resources:
>
>  mail_vip   (ocf::heartbeat:IPaddr2):   Started mail2
>
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
>
> Cluster name: mail
> Stack: corosync
> Current DC: mail3 (version 1.1.18-2b07d5c5a9) - partition with quorum
> Last updated: Mon May  4 22:13:10 2020
> Last change: Mon May  4 22:10:34 2020 by root via cibadmin on mail3
>
> 2 nodes configured
> 0 resources configured
>
> Online: [ mail3 ]
> OFFLINE: [ mail2 ]
>
> No resources
>
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
>
> /etc/corosync/corosync.conf:
>
> totem {
>     version: 2
>     cluster_name: mail
>     clear_node_high_bit: yes
>     crypto_cipher: none
>     crypto_hash: none
>
>     interface {
>         ringnumber: 0
>         bindnetaddr: 192.168.80.128
>         mcastport: 5405
>     }
> }

Is the interconnect attached to LAN switches, or
is it a direct cable between the two hosts?

> logging {
>     fileline: off
>     to_stderr: no
>     to_logfile: no
>     to_syslog: yes
>     syslog_facility: daemon
>     debug: off
>     timestamp: on
> }
>
> quorum {
>     provider: corosync_votequorum
>     wait_for_all: 0
>     two_node: 1
> }
>
> nodelist {
>     node {
>         ring0_addr: mail2
>         name: mail2
>         nodeid: 1
>     }
>
>     node {
>         ring0_addr: mail3
>         name: mail3
>         nodeid: 2
>     }
> }
>
> Thanks!
>
> Rick

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
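To expand on the connectivity suggestion, here is a sketch of checks worth running on both nodes. The node names and port 5405 are taken from the config in the thread; the totem section sets no transport or mcastaddr, so I am assuming the default UDP/multicast transport, and omping is an extra package you may need to install:

```shell
# Does corosync itself see both members? Pacemaker cannot merge the
# partitions until the corosync membership is whole again.
corosync-quorumtool -s                              # votes and membership
corosync-cmapctl runtime.totem.pg.mrp.srp.members   # per-node join status (corosync 2.x)
corosync-cfgtool -s                                 # ring status; "no faults" when healthy

# With multicast transport, a switch that stopped forwarding multicast
# (e.g. an IGMP snooping/querier change) is a classic cause of a sudden
# partition. omping tests multicast between the nodes; start it on both
# nodes at roughly the same time:
omping -p 5405 mail2 mail3

# Once traffic flows again, corosync normally re-forms a single membership
# on its own; if it does not, restarting the stack on one node forces a
# rejoin:
systemctl restart corosync pacemaker
```

With `two_node: 1` and `wait_for_all: 0` in the quorum section, each half keeps quorum after losing its peer, which is why both of your pcs status outputs say "partition with quorum" and both halves act independently.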