Thanks Honza and Andrei (and Strahil? I might have missed a message in the thread...)
I'm running this in a VM cluster, so the nodes are on a VLAN and there is
switched routing.  I tried enabling the 'transport: udpu' unicast option,
but I had mixed results: corosync seemed to fault and not come up, and even
that wasn't consistent.  (A sketch of what I tried is below, after the
iperf output.)  I can't fool around with it right now because it is
production, so I will try udpu in a test environment instead.

Is it possible for me to rule multicast in or out?  I tried using iperf
to do this:

rnickle@mail3:~$ iperf -s -u -B 239.192.226.65 -i 1
------------------------------------------------------------
Server listening on UDP port 5001
Binding to local address 239.192.226.65
Joining multicast group 239.192.226.65
Receiving 1470 byte datagrams
UDP buffer size:  208 KByte (default)
------------------------------------------------------------

rnickle@mail2:~$ iperf -c 239.192.226.65 -u -T 32 -t 3 -i 1
------------------------------------------------------------
Client connecting to 239.192.226.65, UDP port 5001
Sending 1470 byte datagrams, IPG target: 11215.21 us (kalman adjust)
Setting multicast TTL to 32
UDP buffer size:  208 KByte (default)
------------------------------------------------------------
[  3] local 192.133.83.146 port 46033 connected with 239.192.226.65 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec   131 KBytes  1.07 Mbits/sec
[  3]  1.0- 2.0 sec   128 KBytes  1.05 Mbits/sec
[  3]  2.0- 3.0 sec   128 KBytes  1.05 Mbits/sec
[  3]  0.0- 3.0 sec   386 KBytes  1.05 Mbits/sec
[  3] Sent 269 datagrams
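If the iperf test turns out not to be conclusive, my understanding is that
omping is the usual tool for validating the multicast path corosync
actually uses, since it reports unicast and multicast responses
separately.  Something like the following, run on both nodes at the same
time (the flags are from the man page as I remember it, so treat this as
a sketch rather than something I've verified here):

# quick check with omping's own default group/port; safe to run
# while corosync is still up
omping mail2 mail3

# or, with corosync stopped (it holds port 5405), against the exact
# group and port from corosync.conf:
omping -m 239.192.226.65 -p 5405 mail2 mail3

If the unicast replies come back but the multicast ones never do, that
would point at IGMP snooping/querier behaviour on the switch rather than
at corosync itself.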
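And for reference, this is roughly the totem change I attempted for
unicast -- a minimal sketch against my existing config, so take it as
illustrative rather than exactly what I had.  My understanding is that
with udpu the peer addresses come from the nodelist (which I already
have) and no multicast group is used:

totem {
    version: 2
    cluster_name: mail
    clear_node_high_bit: yes
    crypto_cipher: none
    crypto_hash: none
    # switch from multicast to UDP unicast; peers are taken from the
    # ring0_addr entries in the nodelist section
    transport: udpu

    interface {
        ringnumber: 0
        bindnetaddr: 192.168.80.128
        mcastport: 5405
    }
}

I also believe the transport can't be changed on a live ring, so corosync
has to be fully stopped and restarted on both nodes, which is the other
reason I'm deferring this to a test environment.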
Thanks,

Rick

On Tue, May 5, 2020 at 1:54 AM Andrei Borzenkov <arvidj...@gmail.com> wrote:

> 05.05.2020 06:39, Nickle, Richard writes:
> > I have a two node cluster managing a VIP.  The service is an SMTP
> > service.  This could be active/active; it doesn't matter which node
> > accepts the SMTP connection, but I wanted a VIP in place so that
> > there was a well-known address.
> >
> > This service has been running for quite a while with no problems.
> > All of a sudden it partitioned, and now I can't work out a good way
> > to get the partitions to merge back into one cluster.  Right now one
> > partition takes the resource and starts the VIP, but doesn't see the
> > other node.  The other node doesn't start the resource, and can't
> > seem to see the first node either.
> >
> > At this point I am perfectly willing to create another node and make
> > an odd-numbered cluster, the arguments for this being fairly
> > persuasive.  But I'm not sure why they are blocking each other.
> >
> > Surely there must be some manual way to get a partitioned cluster to
> > merge?
>
> It does that automatically if the nodes can communicate with each
> other.  You seem to have a network connectivity issue which you need
> to investigate and resolve.
>
> > Some trick?  I also had a scenario several weeks ago where an
> > odd-numbered cluster configured in a similar way partitioned into a
> > three-node and a two-node partition, and I was unable to work out how
> > to get them to merge, until all of a sudden they seemed to fix
> > themselves after a 'pcs node remove / pcs node add' that had failed
> > many times before.  I have tried that here, but with no success so
> > far.
> >
> > I ruled out some common causes I've seen in discussions and threads,
> > such as having my host name mapped to localhost in /etc/hosts, etc.
> >
> > Corosync 2.4.3, Pacemaker 1.1.18, pcs 0.9.164 (Ubuntu 18.04).
> >
> > Output from pcs status for both nodes:
> >
> > Cluster name: mail
> > Stack: corosync
> > Current DC: mail2 (version 1.1.18-2b07d5c5a9) - partition with quorum
> > Last updated: Mon May 4 23:28:53 2020
> > Last change: Mon May 4 21:50:04 2020 by hacluster via crmd on mail2
> >
> > 2 nodes configured
> > 1 resource configured
> >
> > Online: [ mail2 ]
> > OFFLINE: [ mail3 ]
> >
> > Full list of resources:
> >
> >  mail_vip   (ocf::heartbeat:IPaddr2):   Started mail2
> >
> > Daemon Status:
> >   corosync: active/enabled
> >   pacemaker: active/enabled
> >   pcsd: active/enabled
> >
> > Cluster name: mail
> > Stack: corosync
> > Current DC: mail3 (version 1.1.18-2b07d5c5a9) - partition with quorum
> > Last updated: Mon May 4 22:13:10 2020
> > Last change: Mon May 4 22:10:34 2020 by root via cibadmin on mail3
> >
> > 2 nodes configured
> > 0 resources configured
> >
> > Online: [ mail3 ]
> > OFFLINE: [ mail2 ]
> >
> > No resources
> >
> > Daemon Status:
> >   corosync: active/enabled
> >   pacemaker: active/enabled
> >   pcsd: active/enabled
> >
> > /etc/corosync/corosync.conf:
> >
> > totem {
> >     version: 2
> >     cluster_name: mail
> >     clear_node_high_bit: yes
> >     crypto_cipher: none
> >     crypto_hash: none
> >
> >     interface {
> >         ringnumber: 0
> >         bindnetaddr: 192.168.80.128
> >         mcastport: 5405
> >     }
> > }
>
> Is the interconnect attached to LAN switches, or is it a direct cable
> between the two hosts?
>
> > logging {
> >     fileline: off
> >     to_stderr: no
> >     to_logfile: no
> >     to_syslog: yes
> >     syslog_facility: daemon
> >     debug: off
> >     timestamp: on
> > }
> >
> > quorum {
> >     provider: corosync_votequorum
> >     wait_for_all: 0
> >     two_node: 1
> > }
> >
> > nodelist {
> >     node {
> >         ring0_addr: mail2
> >         name: mail2
> >         nodeid: 1
> >     }
> >
> >     node {
> >         ring0_addr: mail3
> >         name: mail3
> >         nodeid: 2
> >     }
> > }
> >
> > Thanks!
> >
> > Rick
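P.S.  For anyone following along, these are the checks I've been using
to see whether the two nodes ever form a single membership (corosync 2.x
tooling as shipped on Ubuntu 18.04):

# ring status as corosync sees it on this node
corosync-cfgtool -s

# current membership; a healthy cluster should list both node IDs
corosync-cmapctl | grep members

# quorum view; in the current split each node reports itself quorate
corosync-quorumtool -s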
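Also, one thing I noticed while re-reading my quorum section above:
votequorum normally turns on wait_for_all automatically when two_node: 1
is set, and I have it explicitly disabled.  If I'm reading votequorum(5)
correctly, that is exactly what lets each node declare 'partition with
quorum' on its own and start the VIP independently after a split.  A
sketch of what I'll probably change it to, once the connectivity problem
itself is sorted out:

quorum {
    provider: corosync_votequorum
    two_node: 1
    # let a node that starts alone wait until it has seen its peer
    # at least once before claiming quorum
    wait_for_all: 1
}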