On 09/30/2010 10:40 AM, Ranjith wrote:
> Hi Steve,
>
> I believe you mean to say that the same acl rules should be applied in
> the outgoing side also.
> But since here the nodes are not receiving any packet (both multicast
> and unicast) from the other, i believe it will also not send to the
> other....Is that right?
>
That assumption is incorrect.

Example:
Nodes A,B,C
A sends join (multicast)
B,C receive join
B sends join (multicast)
A,C receive join
C sends join (with A,B,C)
now A rejects that message.

As a result, the nodes can never come to consensus.

Regards
-steve

> Regards,
> Ranjith
>
> On Thu, Sep 30, 2010 at 10:41 PM, Steven Dake <[email protected]> wrote:
>
>   On 09/30/2010 03:47 AM, Ranjith wrote:
>
>     Hi all,
>
>     Kindly let me know whether corosync considers the below network
>     as a byzantine failure, i.e. the case where N1 and N3 do not
>     have connectivity?
>     I am testing such scenarios as I believe such behaviour can
>     happen due to some misbehaviour in the switch (stale arp entries).
>
>   What makes the fault byzantine is that only incoming packets are
>   blocked. If you block both incoming and outgoing packets on the
>   nodes, the fault is not byzantine and totem will behave properly.
>
>   Regards
>   -steve
>
>     Regards,
>     Ranjith
>
>     Untitled.png
>
>     On Sat, Sep 25, 2010 at 9:47 AM, Ranjith <[email protected]> wrote:
>
>       Hi Steve,
>       Just to make it clear. Do you mean that in the above case, if
>       N3 is part of the network, it should have connectivity to both
>       N2 and N1, and if it happens that N3 has connectivity to N2
>       only, corosync does not take care of the same?
>       Regards,
>       Ranjith
>
>       On Sat, Sep 25, 2010 at 9:39 AM, Steven Dake <[email protected]> wrote:
>
>         On 09/24/2010 08:20 PM, Ranjith wrote:
>
>           Hi ,
>           It is hard to tell what is happening without logs from
>           all 3 nodes. Does this only happen at system start, or
>           can you duplicate 5 minutes after systems have started?
>
>           >> The cluster is never stabilizing. It keeps on switching
>           between the membership and operational state.
>           Below is the test network which I am using:
>
>           Untitled.png
>
>           >> N1 and N3 do not receive any packets from each other.
>           Here what I expected was that either (N1,N2) or (N2,N3)
>           forms a two node cluster and stabilizes. But the cluster
>           is never stabilizing. Even though 2 node clusters are
>           forming, it is going back to membership [I checked the
>           logs and it looks like this is happening because of the
>           steps I mentioned in the previous mail]
>
>         ...... Where did you say you were testing a byzantine fault
>         in your original bug report? Please be more forthcoming in
>         the future. Corosync does not protect against byzantine
>         faults. Allowing one way connectivity in network connection
>         = this fault scenario. You can try coro-netctl (the attached
>         script) which will atomically block a network ip in the
>         network to test split brain scenarios without actually
>         pulling network cables.
>
>         Regards
>         -steve
>
>           Regards,
>           Ranjith
>
>           On Fri, Sep 24, 2010 at 11:36 PM, Steven Dake <[email protected]> wrote:
>
>             It is hard to tell what is happening without logs from
>             all 3 nodes. Does this only happen at system start, or
>             can you duplicate 5 minutes after systems have started?
>
>             If it is at system start, you may need to enable "fast
>             STP" on your switch. It looks to me like node 3 gets
>             some messages through but then is blocked. STP will do
>             this in its default state on most switches.
>
>             Another option if you can't enable STP is to use
>             broadcast mode (man openais.conf for details).
>
>             Also verify firewalls are properly configured on all
>             nodes. You can join us on the irc server freenode on
>             #linux-cluster for real-time assistance.
>
>             Regards
>             -steve
>
>             On 09/22/2010 11:33 PM, Ranjith wrote:
>
>               Hi Steve,
>               I am running corosync 1.2.8
>               I didn't get what you meant by blackbox. I suppose it
>               is logs/debugs.
>               I just checked logs/debugs and I am able to understand
>               the below:
>
>               1--------------2--------------3
>
>               1) Node1 and Node2 are already in a 2 node cluster
>               2) Now Node3 sends join with ({1} , {} )
>                  (proc_list/fail_list)
>               3) Node2 sends join ({1,2,3} , {}) and Node 1/3 updates
>                  to ({1,2,3}, {})
>               4) Now Node 2 gets consensus after some messages
>                  [But 1 is the rep]
>               5) Consensus timeout fires at node 1 for node 3, node1
>                  sends join as ({1,2}, {3})
>               6) Node2 updates because of the above message to
>                  ({1,2}, {3}) and sends out join. This join received
>                  by node 3 causes it to update ({1,3}, {2})
>               7) Node1 and Node2 enter operational (fail list cleared
>                  by node2) but node 3 join timeout fires and again
>                  membership state.
>               8) This will continue to happen until consensus fires
>                  at node3 for node1 and it moves to ({3}, {1,2})
>               9) Now Node1 and Node2 form a 2 node cluster and 3 forms
>                  a single node cluster
>               10) Now node 2 broadcasts a Normal message
>               11) This message is received by Node3 as a foreign
>                   message which forces it to go to gather state
>               12) Again above steps ....
>
>               The cluster is never stabilizing.
>               I have attached the debugs for Node2:
>               (1 - 10.102.33.115, 2 - 10.102.33.150, 3 - 10.102.33.180)
>               Regards,
>               Ranjith
>
>               On Wed, Sep 22, 2010 at 10:53 PM, Steven Dake <[email protected]> wrote:
>
>                 On 09/21/2010 11:15 PM, Ranjith wrote:
>
>                   Hi all,
>                   Kindly comment on the above behaviour
>                   Regards,
>                   Ranjith
>
>                   On Tue, Sep 21, 2010 at 9:52 PM, Ranjith <[email protected]> wrote:
>
>                     Hi all,
>                     I was testing the corosync cluster engine by
>                     using the testcpg exec provided along with the
>                     release. I am getting the below behaviour while
>                     testing some specific scenarios. Kindly comment
>                     on the expected behaviour.
>
>                     1) 3 Node cluster
>
>                        1---------2---------3
>
>                        a) suppose I bring the nodes 1&2 up, it will
>                           form a ring (1,2)
>                        b) now bring up 3
>                        c) 3 sends join which restarts the membership
>                           process
>                        d) (1,2) again forms the ring, 3 forms self
>                           cluster
>                        e) now 3 sends a join (due to join or other
>                           timeout)
>                        f) again membership protocol is started as 2
>                           responds to this by going to gather state
>                           (I believe 2 should not accept this as 2
>                           would have earlier decided that 3 is failed)
>
>                        I am seeing a continuous loop of the above
>                        behaviour (operational -> membership ->
>                        operational -> ) due to which the cluster is
>                        not becoming stabilized
>
>                     2) 3 Node Cluster
>
>                        1---------2-----------3
>
>                        a) bring up all the three nodes at the same
>                           time (None of the nodes have seen each
>                           other before this)
>                        b) Now each node forms a cluster by itself ..
>                           (Here I think it should form either a (1,2)
>                           or (2,3) ring)
>
>                     Regards,
>                     Ranjith
>
>                 Ranjith,
>
>                 Which version of corosync are you running?
>
>                 can you run corosync-blackbox and attach the output?
>
>                 Thanks
>                 -steve
>
>                 _______________________________________________
>                 Openais mailing list
>                 [email protected]
>                 https://lists.linux-foundation.org/mailman/listinfo/openais
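
The A/B/C example at the top of this thread can be sketched as a toy model in Python. This is not corosync or totem code: `heard_from`, `views_agree`, and the `dropped` set are hypothetical names made up for illustration, and the model ignores fail lists, timeouts, and the token entirely. It only shows that when A discards C's join while C still accepts A's, the three nodes end up with different membership views.

```python
from itertools import product


def heard_from(nodes, dropped):
    """Return, for each node, the set of nodes whose join it accepts.

    `dropped` holds (sender, receiver) pairs whose packets the receiver
    discards. Every node always counts itself.
    """
    heard = {n: {n} for n in nodes}
    for sender, receiver in product(nodes, repeat=2):
        if sender != receiver and (sender, receiver) not in dropped:
            heard[receiver].add(sender)
    return heard


def views_agree(heard):
    """True when every node ends up with the same membership view."""
    views = list(heard.values())
    return all(v == views[0] for v in views)


nodes = ["A", "B", "C"]

# The example from the reply: A rejects C's join, but C still gets A's join.
one_way = heard_from(nodes, dropped={("C", "A")})
for node in nodes:
    print(node, "accepts joins from", sorted(one_way[node]))
print("views agree:", views_agree(one_way))

# No filtering at all: all three views match.
print("views agree:", views_agree(heard_from(nodes, dropped=set())))
```

In the one-way case A's view is {A, B} while B and C both see {A, B, C}, which is the disagreement described in the example. Whether a real cluster recovers from a given filter setup depends on protocol details this sketch leaves out.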
