Re: Clustering not working

MHT Mon, 23 Aug 2010 15:44:23 -0700

Hi,

I followed the starting/running a cluster instructions when building this new
cluster, as I had done for the others.



Some output from a working pair of clustered servers:

cat /proc/net/igmp

Idx     Device    : Count Querier       Group    Users Timer    Reporter
1       lo        :     0      V3
                                010000E0     1 0:00000000               0
2       eth0      :     2      V2
                                0A016FEF     1 0:00000000               1
                                010000E0     1 0:00000000               0

netstat -g
IPv6/IPv4 Group Memberships
Interface       RefCnt Group
--------------- ------ ---------------------
lo              1      ALL-SYSTEMS.MCAST.NET
eth0            1      239.111.1.10
eth0            1      ALL-SYSTEMS.MCAST.NET
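
For reference, the Group column in /proc/net/igmp appears to be the same 
address that netstat -g prints, shown as a raw 32-bit hex value. A minimal 
sketch of the conversion, assuming a little-endian host such as x86 (the 
function name igmp_group_to_ip is made up for illustration):

```python
import socket
import struct


def igmp_group_to_ip(hex_group: str) -> str:
    """Convert a Group value from /proc/net/igmp to dotted-quad form.

    Assumption: the kernel prints the network-order address as a host
    integer, so on a little-endian machine the bytes appear reversed.
    """
    value = int(hex_group, 16)
    # Packing as little-endian restores the original network byte order.
    return socket.inet_ntoa(struct.pack("<I", value))


if __name__ == "__main__":
    for group in ("0A016FEF", "04016FEF", "010000E0"):
        print(group, "->", igmp_group_to_ip(group))
```

With the values from the listings in this message, 0A016FEF maps to 
239.111.1.10, 04016FEF to 239.111.1.4, and 010000E0 to 224.0.0.1 
(ALL-SYSTEMS.MCAST.NET).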

tcpdump -i eth0 dst port 5405
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
23:25:37.952006 IP [server1].5149 > [server2].netsupport: UDP, length 70
23:25:37.952481 IP [server2].5149 > [server1].netsupport: UDP, length 70
23:25:38.141940 IP [server1].5149 > [server2].netsupport: UDP, length 70
23:25:38.142372 IP [server2].5149 > [server1].netsupport: UDP, length 70
23:25:38.161920 IP [server1].5149 > 239.111.1.10.netsupport: UDP, length 82
23:25:38.331888 IP [server1].5149 > [server2].netsupport: UDP, length 70
23:25:38.332256 IP [server2].5149 > [server1].netsupport: UDP, length 70
23:25:38.521795 IP [server1].5149 > [server2].netsupport: UDP, length 70
23:25:38.522190 IP [server2].5149 > [server1].netsupport: UDP, length 70


For the pair that isn't working I get:

Idx     Device    : Count Querier       Group    Users Timer    Reporter
1       lo        :     0      V3
                                010000E0     1 0:00000000               0
2       eth0      :     2      V3
                                04016FEF     1 0:00000000               0
                                010000E0     1 0:00000000               0

netstat -g
IPv6/IPv4 Group Memberships
Interface       RefCnt Group
--------------- ------ ---------------------
lo              1      ALL-SYSTEMS.MCAST.NET
eth0            1      239.111.1.4
eth0            1      ALL-SYSTEMS.MCAST.NET

tcpdump -i eth0 dst port 5405
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
23:31:13.663055 IP [server1].5149 > 239.111.1.4.netsupport: UDP, length 82
23:31:14.042033 IP [server1].5149 > 239.111.1.4.netsupport: UDP, length 82
23:31:14.421028 IP [server1].5149 > 239.111.1.4.netsupport: UDP, length 82
23:31:14.799024 IP [server1].5149 > 239.111.1.4.netsupport: UDP, length 82

So I presume the tcpdump shows that the packets are sent to the multicast 
address but nothing is receiving them?  Sorry, I'm a numpty when it comes to 
networks.  Also, on the failing pair the Reporter column in /proc/net/igmp 
shows all zeros, whereas the working pair shows 1 for its cluster group. The 
servers that host the problem nodes are on the same vlan with no firewall 
between them, and selinux is disabled. 
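
One way to check whether multicast traffic from the other node actually 
arrives is to join the group by hand and list the senders. The following is 
only a sketch under stated assumptions: the helper names (build_mreq, 
open_listener, report_senders) are made up for illustration, and it assumes 
the port can be bound, which may require stopping the cluster software on 
that box first.

```python
import socket
import struct


def build_mreq(group: str, iface_ip: str = "0.0.0.0") -> bytes:
    """Build the 8-byte ip_mreq structure: group address + local interface."""
    return socket.inet_aton(group) + socket.inet_aton(iface_ip)


def open_listener(group: str, port: int, iface_ip: str = "0.0.0.0") -> socket.socket:
    """Open a UDP socket bound to `port` and join multicast `group`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # Allow sharing the port where the OS permits it (assumption: the other
    # user of the port, if any, also set SO_REUSEADDR).
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    build_mreq(group, iface_ip))
    return sock


def report_senders(sock: socket.socket, count: int = 10, timeout: float = 5.0) -> list:
    """Receive up to `count` datagrams and return the source addresses seen."""
    sock.settimeout(timeout)
    seen = []
    for _ in range(count):
        try:
            _data, (addr, _port) = sock.recvfrom(2048)
        except socket.timeout:
            break  # nothing arrived within the timeout
        seen.append(addr)
    return seen
```

Calling report_senders(open_listener("239.111.1.4", 5405)) on one server 
while the other is running would be expected to list the other server's 
address if multicast is being delivered between them; an empty list would 
suggest the packets are not getting through.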







----- Original Message ----
From: Alan Conway <[email protected]>
To: [email protected]
Cc: MHT <[email protected]>
Sent: Mon, 23 August, 2010 14:50:29
Subject: Re: Clustering not working

On 08/20/2010 11:56 AM, MHT wrote:
> Hi,
>
> On a working cluster I see the expected node joins in logs for both boxes:
>   [TOTEM] entering OPERATIONAL state.
>   [CLM  ] got nodejoin message<serverIP1>
>   [CLM  ] got nodejoin message<serverIP2>
>
>
>
> But on this problem one I only see the local instance on both boxes:
>   [TOTEM] entering OPERATIONAL state.
>   [CLM  ] got nodejoin message<serverIP1>
>
> I've got logging on the brokers set to trace, but so far still not seeing any
> obvious errors in the mass (other than the missing node join).  A diff on the
> config file on each box shows only cluster-url is different, as expected
> because it starts with the local broker address:port.
>

There are some troubleshooting tips for configuring openais and qpidd at 
https://cwiki.apache.org/qpid/starting-a-cluster.html. If you're not seeing all 
the expected nodejoin messages then it sounds like a problem with openais 
configuration.

---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:[email protected]



