2013/4/17 Fabio M. Di Nitto <fdini...@redhat.com>

> On 4/17/2013 3:57 PM, eXeC001er wrote:
> > Hello.
> >
> > I have tried to create the following demo cluster to check how the
> > MasterWins logic works:
> >
> > NODE1 (VM)
> >            |========== tap0 (host)
> > NODE2 (VM)
> >                                     |=============br0(host)
> > NODE3 (VM)
> >            |========== tap1 (host)
> > NODE4 (VM)
> >
> >
> > To simulate a 50/50 split I just removed "tap1" from "br0".
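
For reference, a rough sketch of how that split can be produced on the
host, assuming the taps are enslaved to the bridge with bridge-utils
(tap1/br0 are the names from the diagram above; the exact tooling on the
host is an assumption):

    # break the link between the two halves of the cluster
    brctl delif br0 tap1      # iproute2 equivalent: ip link set tap1 nomaster

    # re-attach it later to heal the split
    brctl addif br0 tap1      # iproute2 equivalent: ip link set tap1 master br0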
> >
> > Before the split I see the following on all nodes:
> >
> > ----------------------
> > Quorate:          Yes
> >     Nodeid      Votes    Qdevice Name
> >          1          1     A,V,MW 172.18.251.41
> >          2          1    A,NV,MW 172.18.251.42 (local)
> >          3          1   NA,NV,MW 172.18.251.43
> >          4          1    A,NV,MW 172.18.251.44
> >          0          3            QDEV
> >
> > ----------------------
> >
> > After the split,
> >
> > on NODE1 and NODE2 I see:
> >
> > ----------------------
> > Quorate:          Yes
> >     Nodeid      Votes    Qdevice Name
> >          1          1     A,V,MW 172.18.251.41 (local)
> >          2          1    A,NV,MW 172.18.251.42
> >          0          3            QDEV
> > ----------------------
> >
> > on NODE3 and NODE4 I see:
> >
> > ----------------------
> > Quorate:          No
> >     Nodeid      Votes    Qdevice Name
> >          3          1    A,NV,MW 172.18.251.43
> >          4          1    A,NV,MW 172.18.251.44 (local)
> >          0          3            QDEV
> > ----------------------
> >
> > So everything is fine and MasterWins works as designed.
> >
> > But after the check I tried to restore the network connection and added
> > "tap1" back to "br0". I can see that all nodes ping each other, but
> > corosync still shows me a 50/50 split.
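
A quick way to check that state, assuming the membership output above
comes from corosync-quorumtool (a sketch; node addresses are taken from
the output above):

    # run on each node after re-adding tap1 to br0
    for ip in 172.18.251.41 172.18.251.42 172.18.251.43 172.18.251.44; do
        ping -c 1 $ip         # all nodes answer once the bridge is restored
    done
    corosync-quorumtool       # ...yet the membership still shows the 50/50 split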
> >
> > tcpdump:
> > .....................
> > 17:49:36.387217 IP 172.18.251.43.5404 > 172.18.251.44.5405: UDP, length 74
> > 17:49:36.387441 IP 172.18.251.44.5404 > 172.18.251.43.5405: UDP, length 74
> > 17:49:36.447590 IP 172.18.251.41.5404 > 172.18.251.42.5405: UDP, length 74
> > 17:49:36.447811 IP 172.18.251.42.5404 > 172.18.251.41.5405: UDP, length 74
> > 17:49:36.568557 IP 172.18.251.43.5404 > 172.18.251.44.5405: UDP, length 74
> > 17:49:36.568804 IP 172.18.251.44.5404 > 172.18.251.43.5405: UDP, length 74
> > 17:49:36.587829 IP 172.18.251.43.5404 > 239.255.1.1.5405: UDP, length 87
> > 17:49:36.628254 IP 172.18.251.41.5404 > 172.18.251.42.5405: UDP, length 74
> > 17:49:36.628442 IP 172.18.251.42.5404 > 172.18.251.41.5405: UDP, length 74
> > 17:49:36.648323 IP 172.18.251.41.5404 > 239.255.1.1.5405: UDP, length 87
> > ........................
> >
> >
> > Any ideas?
> >
>
> Besides the missing logs, which might show something, I have tested this
> scenario plenty of times, but using iptables instead.
>
> I wonder if you have found a bug in the bridging code.
>
> I suggest you try the following test instead:
>
> 4 nodes, without qdisk, try to repeat your bridge remove/add test
>
> 4 nodes, without qdisk, use iptables instead (make sure to block mcast
> traffic too, as sketched below)
>
> then again with qdisk + iptables.
>
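
A possible iptables variant of the same split, as suggested above, would
be to drop all traffic between the two halves on NODE3 and NODE4 (the
exact rules are an assumption, not the ones actually used in the test):

    # run on NODE3 and NODE4: drop all traffic to/from the other half,
    # which also covers their multicast (239.255.1.1) corosync packets
    iptables -A INPUT  -s 172.18.251.41 -j DROP
    iptables -A INPUT  -s 172.18.251.42 -j DROP
    iptables -A OUTPUT -d 172.18.251.41 -j DROP
    iptables -A OUTPUT -d 172.18.251.42 -j DROP

    # heal the split again (note: this flushes the whole chains)
    iptables -F INPUT
    iptables -F OUTPUT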

I have tried with iptables, and everything works fine.

But in any case it is very strange, because after the network connection has
been restored and I restart corosync on ALL nodes, my "cluster" works.

The logs do not contain anything interesting. The latest records after the
50/50 split just say that some members have left. After restoring the
connection there are no new records in the logfile.

Also, it is very strange that to restore the whole cluster I need to restart
corosync on ALL nodes. If I restart corosync only on nodes 3/4, then corosync
on each node does not see any other nodes.
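
For completeness, the full restart that does bring the cluster back looks
roughly like this (node addresses are from the output above; the use of
ssh and the init-script invocation are assumptions):

    for ip in 172.18.251.41 172.18.251.42 172.18.251.43 172.18.251.44; do
        ssh root@$ip 'service corosync restart'
    done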


Thanks.


>
> But also collect the logs... otherwise tcpdump doesn't say enough.
>
> Fabio
_______________________________________________
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/openais
