Re: [Openais] Corosync 1.2.8 totem membership behaviour

2010-09-26 Thread Ranjith
Hi Steve,

Please comment on the same.

Regards,
Ranjith

On Sat, Sep 25, 2010 at 9:47 AM, Ranjith ranjith.nath...@gmail.com wrote:

 Hi Steve,

 Just to be clear: do you mean that, in the above case, if N3 is part of the
 network it must have connectivity to both N2 and N1, and that if N3 happens
 to have connectivity to N2 only, corosync does not handle that case?

 Regards,
 Ranjith
   On Sat, Sep 25, 2010 at 9:39 AM, Steven Dake sd...@redhat.com wrote:

 On 09/24/2010 08:20 PM, Ranjith wrote:

 Hi ,
 It is hard to tell what is happening without logs from all 3 nodes. Does
 this only happen at system start, or can you duplicate 5 minutes after
 systems have started?

  The cluster never stabilizes. It keeps switching between the
 membership and operational states.
 Below is the test network which i am using:

 [attached image: Untitled.png]

  N1 and N3 do not receive any packets from each other. What I expected
 was that either (N1,N2) or (N2,N3) would form a two-node cluster and
 stabilize. But the cluster never stabilizes: even though two-node clusters
 do form, it keeps going back to membership. [I checked the logs, and this
 seems to happen because of the steps I described in my previous mail.]



 ..  Where did you say you were testing a Byzantine fault in your
 original bug report?  Please be more forthcoming in the future. Corosync
 does not protect against Byzantine faults.  Allowing one-way connectivity in
 a network connection is exactly this fault scenario.  You can try coro-netctl
 (the attached script), which will atomically block a network IP in the
 network to test split-brain scenarios without actually pulling network cables.
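
 The connectivity problem in this test network can be made concrete with a
 small sketch. It is only an illustration in Python, not corosync code: the
 node names come from this thread, while LINKS, is_fully_connected and
 largest_fully_connected_groups are hypothetical names introduced for the
 example. It assumes the network described above, in which N1 and N3 cannot
 exchange packets but both can reach N2, and checks which groups of nodes
 can all hear each other.

```python
from itertools import combinations

# Assumed test network from this thread: N1 <-> N2 and N2 <-> N3 work,
# but N1 and N3 never receive each other's packets.
NODES = {"N1", "N2", "N3"}
LINKS = {frozenset(("N1", "N2")), frozenset(("N2", "N3"))}


def is_fully_connected(group, links):
    """True if every pair of nodes in `group` can exchange packets."""
    return all(
        frozenset((a, b)) in links
        for a, b in combinations(sorted(group), 2)
    )


def largest_fully_connected_groups(nodes, links):
    """Return all groups of maximum size whose members all hear each other."""
    for size in range(len(nodes), 0, -1):
        groups = [
            set(group)
            for group in combinations(sorted(nodes), size)
            if is_fully_connected(group, links)
        ]
        if groups:
            return groups
    return []


full_ring_ok = is_fully_connected(NODES, LINKS)
candidates = largest_fully_connected_groups(NODES, LINKS)
print("all three nodes fully connected:", full_ring_ok)
print("largest fully connected groups:", candidates)
```

 Under these assumptions the three-node group is not fully connected, and
 there are two equally large candidate groups, (N1,N2) and (N2,N3), both of
 which contain N2.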

 Regards
 -steve


 Regards,
 Ranjith
 On Fri, Sep 24, 2010 at 11:36 PM, Steven Dake sd...@redhat.com wrote:

It is hard to tell what is happening without logs from all 3 nodes.
Does this only happen at system start, or can you duplicate 5
minutes after systems have started?

If it is at system start, you may need to enable fast STP on your
switch.  It looks to me like node 3 gets some messages through but
then is blocked.  STP will do this in its default state on most
switches.

Another option, if you can't enable fast STP, is to use broadcast mode
(see man openais.conf for details).

Also verify firewalls are properly configured on all nodes.  You can
join us on the irc server freenode on #linux-cluster for real-time
assistance.

Regards
-steve


On 09/22/2010 11:33 PM, Ranjith wrote:

Hi Steve,
  I am running corosync 1.2.8.
  I didn't get what you meant by blackbox. I suppose it means the logs/debugs.
  I just checked the logs/debugs, and this is what I understand from them:
   1--2--3
1) Node1 and Node2 are already in a two-node cluster.
2) Now Node3 sends a join with ({1}, {}) (proc_list/fail_list).
3) Node2 sends a join with ({1,2,3}, {}), and Node1 and Node3 update to
   ({1,2,3}, {}).
4) Now Node2 gets consensus after some messages [but Node1 is the rep].
5) The consensus timeout fires at Node1 for Node3, and Node1 sends a join
   with ({1,2}, {3}).
6) Because of the above message, Node2 updates to ({1,2}, {3}) and sends
   out a join. This join, received by Node3, causes it to update to
   ({1,3}, {2}).
7) Node1 and Node2 enter operational (fail list cleared by Node2), but the
   join timeout fires at Node3 and it goes back to the membership state.
8) This continues to happen until consensus fires at Node3 for Node1 and
   it moves to ({3}, {1,2}).
9) Now Node1 and Node2 form a two-node cluster and Node3 forms a
   single-node cluster.
10) Now Node2 broadcasts a Normal message.
11) This message is received by Node3 as a foreign message, which forces
    it to go to the gather state.
12) The above steps then repeat.
The cluster is never stabilizing.
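
The state after step 6 can be sketched in a few lines of Python. This is a
simplified model, not corosync's implementation: it assumes that a node has
consensus once every node in its own proc_list has sent a join carrying
exactly the same (proc_list, fail_list) pair, and it assumes that Node1 and
Node3 never hear each other's joins. The names view, has_consensus and
heard_by_* are hypothetical and exist only for this example; the list
values are the ones from the steps above.

```python
from typing import Dict, FrozenSet, Tuple

# A node's view, written as in the steps above: (proc_list, fail_list).
View = Tuple[FrozenSet[int], FrozenSet[int]]


def view(procs, fails) -> View:
    """Build an immutable (proc_list, fail_list) pair."""
    return (frozenset(procs), frozenset(fails))


def has_consensus(my_view: View, joins_heard: Dict[int, View]) -> bool:
    """Simplified rule (an assumption of this sketch): every node in my
    proc_list must have sent a join identical to my own view."""
    proc_list, _fail_list = my_view
    return all(joins_heard.get(node) == my_view for node in proc_list)


# Views after step 6, taken from the list above.
n1 = view({1, 2}, {3})
n2 = view({1, 2}, {3})
n3 = view({1, 3}, {2})

# Joins each node has heard, assuming Node1 and Node3 cannot hear each other.
heard_by_1 = {1: n1, 2: n2}
heard_by_2 = {1: n1, 2: n2, 3: n3}
heard_by_3 = {2: n2, 3: n3}

consensus = {
    1: has_consensus(n1, heard_by_1),
    2: has_consensus(n2, heard_by_2),
    3: has_consensus(n3, heard_by_3),
}
print(consensus)
```

In this model Node1 and Node2 agree with each other and can go operational,
while Node3 waits for a matching join from Node1 that never arrives, which
is consistent with the join timeout described in step 7.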
I have attached the debugs for Node2:
(1 - 10.102.33.115, 2 - 10.102.33.150, 3 -10.102.33.180)
Regards,
Ranjith

On Wed, Sep 22, 2010 at 10:53 PM, Steven Dake sd...@redhat.com wrote:

On 09/21/2010 11:15 PM, Ranjith wrote:

Hi all,
Kindly comment on the above behaviour
Regards,
Ranjith

On Tue, Sep 21, 2010 at 9:52 PM, Ranjith ranjith.nath...@gmail.com wrote:

Hi all,
I was testing the corosync cluster engine

Re: [Openais] Corosync 1.2.8 totem membership behaviour

2010-09-22 Thread Ranjith
Hi all,

Kindly comment on the above behaviour

Regards,
Ranjith

On Tue, Sep 21, 2010 at 9:52 PM, Ranjith ranjith.nath...@gmail.com wrote:

 Hi all,

 I was testing the corosync cluster engine by using the testcpg exec
 provided along with the release. I am getting the below behaviour
 while testing some specific scenarios. Kindly comment on the expected
 behaviour.

 1)   3 Node cluster

   1-2-3

 a) Suppose I bring nodes 1 and 2 up; they form a ring (1,2).
 b) Now bring up 3.
 c) 3 sends a join, which restarts the membership process.
 d) (1,2) again forms the ring, and 3 forms a single-node cluster by itself.
 e) Now 3 sends a join (due to the join timeout or another timeout).
 f) The membership protocol starts again, as 2 responds to this by going
 to the gather state. (I believe 2 should not accept this, as 2 would have
 decided earlier that 3 has failed.)

 I am seeing a continuous loop of the above behaviour (operational -
 membership - operational - ...), because of which the cluster never
 stabilizes.


 2)   3 Node Cluster

   1-2---3

  a) bring up all the three nodes at the same time (None of the nodes
 have seen each other before this)
  b) Now each node forms a cluster by itself. (Here I think they should
 form either a (1,2) or a (2,3) ring.)


 Regards,
 Ranjith

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais

[Openais] Corosync 1.2.8 totem membership behaviour

2010-09-21 Thread Ranjith
Hi all,

I was testing the corosync cluster engine by using the testcpg exec provided
along with the release. I am getting the below behaviour
while testing some specific scenarios. Kindly comment on the expected
behaviour.

1)   3 Node cluster

  1-2-3

a) Suppose I bring nodes 1 and 2 up; they form a ring (1,2).
b) Now bring up 3.
c) 3 sends a join, which restarts the membership process.
d) (1,2) again forms the ring, and 3 forms a single-node cluster by itself.
e) Now 3 sends a join (due to the join timeout or another timeout).
f) The membership protocol starts again, as 2 responds to this by going
to the gather state. (I believe 2 should not accept this, as 2 would have
decided earlier that 3 has failed.)

I am seeing a continuous loop of the above behaviour (operational -
membership - operational - ...), because of which the cluster never
stabilizes.
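
The loop can be illustrated with a small Python sketch. It is a simplified
model and not corosync's algorithm. It assumes that the diagram 1-2-3 means
nodes 1 and 3 cannot hear each other (as described elsewhere in this
thread), that a ring is only viable if all of its members hear each other,
and that any node hearing traffic from a node outside its own ring goes back
to the gather state. The names partitions, stable_configurations, CHAIN and
FULL_MESH are hypothetical and exist only for this example.

```python
from itertools import combinations


def partitions(items):
    """Yield every way to split `items` into non-empty groups (rings)."""
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # Put `first` into each existing group in turn...
        for i, group in enumerate(smaller):
            yield smaller[:i] + [group | {first}] + smaller[i + 1:]
        # ...or into a new group of its own.
        yield smaller + [{first}]


def ring_is_fully_connected(ring, links):
    """True if every pair of ring members can hear each other."""
    return all((a, b) in links for a, b in combinations(sorted(ring), 2))


def hears_foreign_traffic(rings, links):
    """True if some link joins two nodes that sit in different rings."""
    ring_of = {node: i for i, ring in enumerate(rings) for node in ring}
    return any(ring_of[a] != ring_of[b] for a, b in links)


def stable_configurations(nodes, links):
    """Ring layouts that are viable and receive no foreign traffic."""
    return [
        rings
        for rings in partitions(sorted(nodes))
        if all(ring_is_fully_connected(r, links) for r in rings)
        and not hears_foreign_traffic(rings, links)
    ]


NODES = {1, 2, 3}
CHAIN = {(1, 2), (2, 3)}              # assumed 1-2-3 topology: no 1 <-> 3 link
FULL_MESH = {(1, 2), (1, 3), (2, 3)}  # every node hears every other node

stable_chain = stable_configurations(NODES, CHAIN)
stable_full = stable_configurations(NODES, FULL_MESH)
print("stable layouts, 1-2-3 chain:", stable_chain)
print("stable layouts, full mesh:  ", stable_full)
```

Under these assumptions the chain topology has no stable layout at all,
because node 2 is always heard by a node outside its ring, whereas the full
mesh settles on the single ring (1,2,3).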


2)   3 Node Cluster

  1-2---3

 a) bring up all the three nodes at the same time (None of the nodes
have seen each other before this)
 b) Now each node forms a cluster by itself. (Here I think they should
form either a (1,2) or a (2,3) ring.)


Regards,
Ranjith

Re: [Openais] Rgding Corosync 1.2.8 installation on freebsd

2010-09-20 Thread Ranjith
Hi all,

I was able to install it successfully. Thanks for your help.
I had to change some of the commands (such as cp -a, because of compatibility
issues) to get it installed on FreeBSD 6.3.


Rgds,
Ranjith



On Mon, Sep 20, 2010 at 1:45 PM, Jerome Flesch jerome.fle...@netasq.com wrote:

 Hello,

 I think you have to specify MAKE=gmake also when running gmake:
  MAKE=gmake ./configure && MAKE=gmake gmake

 Regards,

  Hi Honza,
 Now ./configure goes through successfully,
 but make install fails with the following problem:
 Built Live Component Replacement System
 test -z /usr/lib || .././install-sh -c -d /usr/lib
  /usr/bin/install -c -m 644  liblcr.a '/usr/lib'
  ( cd '/usr/lib' && ranlib liblcr.a )
 Making install in lib
 Error expanding embedded variable.
 *** Error code 1
 Regards,
 Ranjith


 2010/9/17 Jan Friesse jfrie...@redhat.com


Hi Ranjith,

Ranjith napsal(a):

Hi,


I am trying to install Corosync 1.2.8 on FreeBSD 6.3 (from the Corosync
1.2.8 tarball).

I get the following error when I run ./configure:

configure: error: you don't seem to have GNU make; it is required



I'm pretty sure that make from ports is new enough. Your problem is
elsewhere: BSD make comes first in your PATH (the configure script
searches for make by default, not gmake).

I recommend the following line to get compilation to succeed:
MAKE=gmake ./configure && gmake

Another question is whether corosync works on FreeBSD 6.x. To be
honest, I really don't know. I'm testing corosync on FreeBSD 7.x and
8.x, but you can give it a try.

Regards,
  Honza


But pkg_info shows the following package:
gmake-3.81_2    GNU version of 'make' utility

Does corosync require any particular version of gmake?


Regards,
Ranjith









[Openais] Linking Corosync to corosync-quorum tool

2010-09-20 Thread Ranjith
Hi all,


How can I link the corosync-quorum tool to corosync to avoid split brain?


Regards,
Ranjith

[Openais] Rgding Corosync 1.2.8 installation on freebsd

2010-09-17 Thread Ranjith
Hi,


I am trying to install Corosync 1.2.8 on FreeBSD 6.3 (from the Corosync
1.2.8 tarball).

I get the following error when I run ./configure:
configure: error: you don't seem to have GNU make; it is required

But pkg_info shows the following package:
gmake-3.81_2    GNU version of 'make' utility

Does corosync require any particular version of gmake?


Regards,
Ranjith

Re: [Openais] Rgding Corosync 1.2.8 installation on freebsd

2010-09-17 Thread Ranjith
Hi Honza,

Now ./configure goes through successfully,

but make install fails with the following problem:
Built Live Component Replacement System
test -z /usr/lib || .././install-sh -c -d /usr/lib
 /usr/bin/install -c -m 644  liblcr.a '/usr/lib'
 ( cd '/usr/lib' && ranlib liblcr.a )
Making install in lib
Error expanding embedded variable.
*** Error code 1

Regards,
Ranjith



2010/9/17 Jan Friesse jfrie...@redhat.com

 Hi Ranjith,

 Ranjith napsal(a):

 Hi,


 I am trying to install Corosync 1.2.8 on FreeBSD 6.3 (from the Corosync
 1.2.8 tarball).

 I get the following error when I run ./configure:

 configure: error: you don't seem to have GNU make; it is required



 I'm pretty sure that make from ports is new enough. Your problem is
 elsewhere: BSD make comes first in your PATH (the configure script
 searches for make by default, not gmake).

 I recommend the following line to get compilation to succeed:
 MAKE=gmake ./configure && gmake

 Another question is whether corosync works on FreeBSD 6.x. To be honest, I
 really don't know. I'm testing corosync on FreeBSD 7.x and 8.x, but you can
 give it a try.

 Regards,
  Honza


 But pkg_info shows the following package:
 gmake-3.81_2    GNU version of 'make' utility

 Does corosync require any particular version of gmake?


 Regards,
 Ranjith

