2013/4/17 Fabio M. Di Nitto <fdini...@redhat.com>

> On 4/17/2013 3:57 PM, eXeC001er wrote:
> > Hello.
> >
> > I have tried to create the following demo cluster to check how the
> > MasterWins logic works:
> >
> > NODE1 (VM)
> >    |========== tap0 (host)
> > NODE2 (VM)
> >                  |============= br0 (host)
> > NODE3 (VM)
> >    |========== tap1 (host)
> > NODE4 (VM)
> >
> >
> > To simulate a 50/50 split I just remove "tap1" from "br0".
> >
> > Before the split I have the following on all nodes:
> >
> > ----------------------
> > Quorate: Yes
> > Nodeid  Votes  Qdevice   Name
> >      1      1  A,V,MW    172.18.251.41
> >      2      1  A,NV,MW   172.18.251.42 (local)
> >      3      1  NA,NV,MW  172.18.251.43
> >      4      1  A,NV,MW   172.18.251.44
> >      0      3            QDEV
> >
> > ----------------------
> >
> > After the split:
> >
> > On NODE1 and NODE2 I see
> >
> > ----------------------
> > Quorate: Yes
> > Nodeid  Votes  Qdevice   Name
> >      1      1  A,V,MW    172.18.251.41 (local)
> >      2      1  A,NV,MW   172.18.251.42
> >      0      3            QDEV
> > ----------------------
> >
> > On NODE3 and NODE4 I see
> >
> > ----------------------
> > Quorate: No
> > Nodeid  Votes  Qdevice   Name
> >      3      1  A,NV,MW   172.18.251.43
> >      4      1  A,NV,MW   172.18.251.44 (local)
> >      0      3            QDEV
> > ----------------------
> >
> > So everything is fine and MasterWins works as designed.
> >
> > But after this check I tried to restore the network connection and added
> > "tap1" back to "br0". I see that all nodes can ping each other, but
> > corosync still shows me the 50/50 split.
> >
> > tcpdump:
> > .....................
> > 17:49:36.387217 IP 172.18.251.43.5404 > 172.18.251.44.5405: UDP, length 74
> > 17:49:36.387441 IP 172.18.251.44.5404 > 172.18.251.43.5405: UDP, length 74
> > 17:49:36.447590 IP 172.18.251.41.5404 > 172.18.251.42.5405: UDP, length 74
> > 17:49:36.447811 IP 172.18.251.42.5404 > 172.18.251.41.5405: UDP, length 74
> > 17:49:36.568557 IP 172.18.251.43.5404 > 172.18.251.44.5405: UDP, length 74
> > 17:49:36.568804 IP 172.18.251.44.5404 > 172.18.251.43.5405: UDP, length 74
> > 17:49:36.587829 IP 172.18.251.43.5404 > 239.255.1.1.5405: UDP, length 87
> > 17:49:36.628254 IP 172.18.251.41.5404 > 172.18.251.42.5405: UDP, length 74
> > 17:49:36.628442 IP 172.18.251.42.5404 > 172.18.251.41.5405: UDP, length 74
> > 17:49:36.648323 IP 172.18.251.41.5404 > 239.255.1.1.5405: UDP, length 87
> > ........................
> >
> >
> > Any ideas?
> >
>
> Besides the missing logs, which might show something, I have tested this
> scenario plenty of times, but using iptables instead.
>
> I wonder if you have found a bug in the bridging code.
>
> I suggest you try the following tests instead:
>
> 4 nodes, without qdisk, try to repeat your bridge remove/add test
>
> 4 nodes, without qdisk, use iptables instead (make sure to block mcast
> traffic too)
>
> then again with qdisk + iptables.
>
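For anyone repeating the iptables variant of the test suggested above, here is a minimal sketch of how the 50/50 split could be scripted. It is not the exact rule set used in this thread: the helper name partition_rules and the two address lists are assumptions made for illustration, based on the node addresses shown earlier. It only builds the command strings; they still have to be run as root on each node. Because the INPUT rules match on the peer's source address, they should also drop that peer's multicast packets to 239.255.1.1.

```python
# Hypothetical helper (not part of corosync): builds the iptables commands
# that cut one node off from the other half of the cluster.
SIDE_A = ["172.18.251.41", "172.18.251.42"]  # assumed: NODE1, NODE2
SIDE_B = ["172.18.251.43", "172.18.251.44"]  # assumed: NODE3, NODE4


def partition_rules(local_ip, side_a, side_b, delete=False):
    """Return iptables commands for local_ip that drop traffic to and from
    every node on the opposite side. With delete=True the same rules are
    emitted with -D, which removes them and heals the split."""
    if local_ip in side_a:
        peers = side_b
    elif local_ip in side_b:
        peers = side_a
    else:
        raise ValueError(f"{local_ip} is on neither side")

    action = "-D" if delete else "-A"
    rules = []
    for peer in peers:
        # Matching on the source address also catches multicast from the peer.
        rules.append(f"iptables {action} INPUT -s {peer} -j DROP")
        rules.append(f"iptables {action} OUTPUT -d {peer} -j DROP")
    return rules


if __name__ == "__main__":
    # Print the rules NODE1 would need to create the split.
    for line in partition_rules("172.18.251.41", SIDE_A, SIDE_B):
        print(line)
```

Calling the helper again with delete=True gives the matching commands to remove the rules once the split has been observed.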
I have tried with iptables, and everything works fine.

But in any case it is very strange, because after the network connection
has been restored and I restart corosync on ALL nodes, my "cluster" works.

The logs do not contain anything interesting. The latest records after the
50/50 split just say that some members have left. After restoring the
connection, no new records appear in the logfile.

It is also very strange that to restore the whole cluster I need to restart
corosync on ALL nodes. If I restart corosync only on 3/4 nodes, then
corosync on each node does not see any other nodes.

Thanks.

>
> But also collect the logs.. otherwise tcpdump doesn't say enough.
>
> Fabio
> _______________________________________________
> Openais mailing list
> Openais@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/openais
>