Hi Steve,

The network is like this:

  A  (blocks all packets from src C)
  B
  C  (blocks all packets from src A)
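To make the sequence below concrete, here is a toy Python model of the join
exchange under these ACLs. It is not the totem implementation, just the
reachability argument: the BLOCKED pairs mirror the ACLs above, the names are
made up, and the consensus rule is deliberately simplified (a node has
consensus once it has seen a join from every node in its proc_list).

# Toy model of the join exchange under the ACLs above (not the real
# totem SRP; the consensus rule here is a simplification).

BLOCKED = {("C", "A"), ("A", "C")}   # (src, dst): dst's ACL drops packets from src

nodes = ["A", "B", "C"]
proc_list = {n: {n} for n in nodes}    # who each node believes is gathering
joins_seen = {n: {n} for n in nodes}   # whose join each node has received

def multicast_join(src):
    """src multicasts a join carrying its current proc_list."""
    for dst in nodes:
        if dst != src and (src, dst) not in BLOCKED:
            proc_list[dst] |= proc_list[src]   # merge membership info
            joins_seen[dst].add(src)

for _ in range(3):            # a few rounds reach a fixed point
    for n in nodes:
        multicast_join(n)

for n in nodes:
    ok = proc_list[n] <= joins_seen[n]
    print(f"{n}: proc_list={sorted(proc_list[n])} "
          f"joins from={sorted(joins_seen[n])} consensus={ok}")

Running this, only B ever collects joins from all of {A, B, C}; A never sees
C's join and C never sees A's, which matches the walkthrough below.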
Nodes A, B, C:

A sends a join (multicast). Only B receives it (C drops it because of the ACL).
B sends a join (multicast) with {A,B}. A and C receive it.
C sends a join with {A,B,C}. Only B receives it.
B sends a join with {A,B,C}. A and C send joins with {A,B,C}.
B gets consensus, but suppose A has the smallest node id: A never gets
consensus, as it can never receive a join from C.

Am I correct till this point? (The simulation above sketches this sequence.)

Regards,
Ranjith

On Thu, Sep 30, 2010 at 11:49 PM, Steven Dake <[email protected]> wrote:
> On 09/30/2010 10:40 AM, Ranjith wrote:
>> Hi Steve,
>>
>> I believe you mean that the same ACL rules should be applied on the
>> outgoing side as well. But since the nodes here are not receiving any
>> packets (multicast or unicast) from each other, I believe they will
>> also not send to each other... is that right?
>>
> That assumption is incorrect. Example:
>
> Nodes A, B, C
> A sends join (multicast); B and C receive it.
> B sends join (multicast); A and C receive it.
> C sends join (with A,B,C); now A rejects that message.
>
> As a result, the nodes can never come to consensus.
>
> Regards
> -steve
>
>> Regards,
>> Ranjith
>>
>> On Thu, Sep 30, 2010 at 10:41 PM, Steven Dake <[email protected]> wrote:
>>
>>> On 09/30/2010 03:47 AM, Ranjith wrote:
>>>
>>>> Hi all,
>>>>
>>>> Kindly let me know whether corosync considers the network below a
>>>> byzantine failure, i.e. the case where N1 and N3 do not have
>>>> connectivity. I am testing such scenarios as I believe such
>>>> behaviour can happen due to some misbehaviour in the switch
>>>> (stale ARP entries).
>>>
>>> What makes the fault byzantine is that only incoming packets are
>>> blocked. If you block both incoming and outgoing packets on the
>>> nodes, the fault is not byzantine and totem will behave properly.
>>>
>>> Regards
>>> -steve
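(An aside on why this topology can never settle: totem needs every member of
a ring to be able to exchange packets with every other member. A toy check
over the delivered connectivity, not corosync code, with the link sets taken
from the diagram at the top:)

# Which memberships are even viable once A<->C delivery is dead both ways?
# Toy check: a viable ring is a set of nodes with full pairwise connectivity.
from itertools import combinations

nodes = ["A", "B", "C"]
connected = {frozenset(("A", "B")), frozenset(("B", "C"))}   # delivered links

def fully_connected(group):
    return all(frozenset(pair) in connected for pair in combinations(group, 2))

viable = [set(g) for r in range(2, len(nodes) + 1)
          for g in combinations(nodes, r) if fully_connected(g)]
print(viable)   # [{'A', 'B'}, {'B', 'C'}]: both need B, neither includes everyone

Both viable rings share the middle node, and whichever node is left out keeps
multicasting joins that drag the others back into gather, which is why the
cluster oscillates instead of settling.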
>>>> Regards,
>>>> Ranjith
>>>>
>>>> [Untitled.png]
>>>>
>>>> On Sat, Sep 25, 2010 at 9:47 AM, Ranjith <[email protected]> wrote:
>>>>
>>>> Hi Steve,
>>>>
>>>> Just to make it clear: do you mean that in the above case, if N3 is
>>>> part of the network, it should have connectivity to both N2 and N1,
>>>> and if it so happens that N3 has connectivity to N2 only, corosync
>>>> does not take care of that?
>>>>
>>>> Regards,
>>>> Ranjith
>>>>
>>>> On Sat, Sep 25, 2010 at 9:39 AM, Steven Dake <[email protected]> wrote:
>>>>
>>>> On 09/24/2010 08:20 PM, Ranjith wrote:
>>>>
>>>> Hi,
>>>>
>>>> "It is hard to tell what is happening without logs from all 3 nodes.
>>>> Does this only happen at system start, or can you duplicate it 5
>>>> minutes after the systems have started?"
>>>>
>>>> The cluster is never stabilizing. It keeps switching between the
>>>> membership and operational states. Below is the test network I am
>>>> using:
>>>>
>>>> [Untitled.png]
>>>>
>>>> N1 and N3 do not receive any packets from each other. What I
>>>> expected was that either (N1,N2) or (N2,N3) forms a two-node
>>>> cluster and stabilizes. But the cluster never stabilizes: even
>>>> though two-node clusters are forming, it keeps going back to
>>>> membership [I checked the logs, and this seems to be happening
>>>> because of the steps I mentioned in the previous mail].
>>>>
>>>> ...... Where did you say you were testing a byzantine fault in your
>>>> original bug report? Please be more forthcoming in the future.
>>>> Corosync does not protect against byzantine faults. Allowing one-way
>>>> connectivity in a network connection = this fault scenario. You can
>>>> try coro-netctl (the attached script), which will atomically block a
>>>> network IP in the network, to test split-brain scenarios without
>>>> actually pulling network cables.
>>>>
>>>> Regards
>>>> -steve
>>>>
>>>> Regards,
>>>> Ranjith
>>>>
>>>> On Fri, Sep 24, 2010 at 11:36 PM, Steven Dake <[email protected]> wrote:
>>>>
>>>> It is hard to tell what is happening without logs from all 3 nodes.
>>>> Does this only happen at system start, or can you duplicate it 5
>>>> minutes after the systems have started?
>>>>
>>>> If it is at system start, you may need to enable "fast STP" on your
>>>> switch. It looks to me like node 3 gets some messages through but is
>>>> then blocked. STP will do this in its default state on most
>>>> switches.
>>>>
>>>> Another option, if you can't enable STP, is to use broadcast mode
>>>> (man openais.conf for details).
>>>>
>>>> Also verify that firewalls are properly configured on all nodes. You
>>>> can join us on the IRC server freenode in #linux-cluster for
>>>> real-time assistance.
>>>>
>>>> Regards
>>>> -steve
>>>>
>>>> On 09/22/2010 11:33 PM, Ranjith wrote:
>>>>
>>>> Hi Steve,
>>>>
>>>> I am running corosync 1.2.8.
>>>>
>>>> I didn't get what you meant by blackbox; I suppose it is
>>>> logs/debugs. I just checked the logs/debugs and I am able to
>>>> understand the below:
>>>>
>>>> 1--------------2--------------3
>>>>
>>>> 1) Node 1 and Node 2 are already in a 2-node cluster.
>>>> 2) Now Node 3 sends a join with ({1}, {}) (proc_list / fail_list).
>>>> 3) Node 2 sends a join ({1,2,3}, {}) and Nodes 1 and 3 update to
>>>>    ({1,2,3}, {}).
>>>> 4) Now Node 2 gets consensus after some messages [but 1 is the rep].
>>>> 5) The consensus timeout fires at Node 1 for Node 3; Node 1 sends a
>>>>    join as ({1,2}, {3}).
>>>> 6) Node 2 updates to ({1,2}, {3}) because of the above message and
>>>>    sends out a join. This join, received by Node 3, causes it to
>>>>    update to ({1,3}, {2}).
>>>> 7) Node 1 and Node 2 enter operational (the fail list is cleared by
>>>>    Node 2), but Node 3's join timeout fires and it is in the
>>>>    membership state again.
>>>> 8) This continues to happen until the consensus timeout fires at
>>>>    Node 3 for Node 1 and it moves to ({3}, {1,2}).
>>>> 9) Now Node 1 and Node 2 form a 2-node cluster and Node 3 forms a
>>>>    single-node cluster.
>>>> 10) Now Node 2 broadcasts a normal message.
>>>> 11) This message is received by Node 3 as a foreign message, which
>>>>     forces it to go to the gather state.
>>>> 12) The above steps repeat......
>>>>
>>>> The cluster is never stabilizing. (A sketch of this loop follows
>>>> below.)
>>>>
>>>> I have attached the debugs for Node 2
>>>> (1 - 10.102.33.115, 2 - 10.102.33.150, 3 - 10.102.33.180).
>>>>
>>>> Regards,
>>>> Ranjith
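(Restating steps 9) to 12) above as a loop. This is a toy trace, not corosync
code; it assumes 2<->3 delivery works while 1<->3 is dead in both directions:)

# The "foreign message -> gather" feedback cycle from steps 9)-12).

state = {1: "OPERATIONAL", 2: "OPERATIONAL", 3: "OPERATIONAL"}
membership = {1: {1, 2}, 2: {1, 2}, 3: {3}}

for cycle in range(3):
    # Node 2 multicasts a regular message. Node 3 hears it, but node 2
    # is not in node 3's membership, so node 3 treats it as a foreign
    # message and re-enters the gather state.
    if 2 not in membership[3]:
        state[3] = "GATHER"

    # Node 3's join multicast reaches node 2, which must also re-enter
    # gather, dragging node 1 along and breaking the (1,2) ring.
    state[1] = state[2] = "GATHER"
    print(f"cycle {cycle}: states = {state}")

    # Consensus between 1 and 3 is impossible (no connectivity), so after
    # the consensus/join timeouts the same two rings simply re-form.
    state = {n: "OPERATIONAL" for n in state}

The memberships that re-form at the bottom of the loop are exactly the ones
that triggered it, so the cycle has no exit condition.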
>>>> On Wed, Sep 22, 2010 at 10:53 PM, Steven Dake <[email protected]> wrote:
>>>>
>>>> On 09/21/2010 11:15 PM, Ranjith wrote:
>>>>
>>>> Hi all,
>>>>
>>>> Kindly comment on the above behaviour.
>>>>
>>>> Regards,
>>>> Ranjith
>>>>
>>>> On Tue, Sep 21, 2010 at 9:52 PM, Ranjith <[email protected]> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I was testing the corosync cluster engine using the testcpg
>>>> executable provided with the release. I am seeing the behaviour
>>>> below while testing some specific scenarios. Kindly comment on the
>>>> expected behaviour.
>>>>
>>>> 1) 3-node cluster: 1---------2---------3
>>>>
>>>>    a) Suppose I bring nodes 1 & 2 up; they form a ring (1,2).
>>>>    b) Now bring up node 3.
>>>>    c) Node 3 sends a join, which restarts the membership process.
>>>>    d) (1,2) again forms the ring; node 3 forms a cluster by itself.
>>>>    e) Now node 3 sends a join (due to the join or another timeout).
>>>>    f) The membership protocol is started again, as node 2 responds
>>>>       to this by going to the gather state (I believe 2 should not
>>>>       accept this, as 2 would have earlier decided that 3 had
>>>>       failed).
>>>>
>>>>    I am seeing a continuous loop of the above behaviour
>>>>    (operational -> membership -> operational -> ...), due to which
>>>>    the cluster is not becoming stabilized.
>>>>
>>>> 2) 3-node cluster: 1---------2-----------3
>>>>
>>>>    a) Bring up all three nodes at the same time (none of the nodes
>>>>       have seen each other before this).
>>>>    b) Now each node forms a cluster by itself. (Here I think it
>>>>       should form either a (1,2) or a (2,3) ring; a sketch of why
>>>>       it may not follows below.)
>>>>
>>>> Regards,
>>>> Ranjith
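(On scenario 2: a toy view comparison, not the actual totem state machine,
with the connectivity sets taken from the 1---2---3 diagram, showing why
simultaneous startup can leave all three nodes in singleton rings before a
(1,2) or (2,3) pair has a chance to settle:)

# Each node's two-way connectivity view at simultaneous startup.
reachable = {1: {1, 2}, 2: {1, 2, 3}, 3: {2, 3}}

for n, view in reachable.items():
    # Simplified consensus rule: n can settle on `view` only if every
    # member of `view` gathers that same view before any timeout fires.
    agreed = all(reachable[m] == view for m in view)
    print(f"node {n}: candidate membership {sorted(view)}, "
          f"immediately agreeable: {agreed}")

# All three print False: no candidate set is shared by all of its members,
# so each node's consensus timeout can fire first, leaving three singleton
# rings. A (1,2) or (2,3) pair only forms later, once node 2's timeout
# shrinks its view to match one neighbour.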
>>>> Ranjith,
>>>>
>>>> Which version of corosync are you running?
>>>>
>>>> Can you run corosync-blackbox and attach the output?
>>>>
>>>> Thanks
>>>> -steve
_______________________________________________
Openais mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/openais
