Hi,
I followed the "starting/running a cluster" instructions when building this
new cluster, just as I had done for the others.
Some output from a working pair of clustered servers:
cat /proc/net/igmp
Idx Device : Count Querier Group Users Timer Reporter
1 lo : 0 V3
010000E0 1 0:00000000 0
2 eth0 : 2 V2
0A016FEF 1 0:00000000 1
010000E0 1 0:00000000 0
netstat -g
IPv6/IPv4 Group Memberships
Interface RefCnt Group
--------------- ------ ---------------------
lo 1 ALL-SYSTEMS.MCAST.NET
eth0 1 239.111.1.10
eth0 1 ALL-SYSTEMS.MCAST.NET
tcpdump -i eth0 dst port 5405
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
23:25:37.952006 IP [server1].5149 > [server2].netsupport: UDP, length 70
23:25:37.952481 IP [server2].5149 > [server1].netsupport: UDP, length 70
23:25:38.141940 IP [server1].5149 > [server2].netsupport: UDP, length 70
23:25:38.142372 IP [server2].5149 > [server1].netsupport: UDP, length 70
23:25:38.161920 IP [server1].5149 > 239.111.1.10.netsupport: UDP, length 82
23:25:38.331888 IP [server1].5149 > [server2].netsupport: UDP, length 70
23:25:38.332256 IP [server2].5149 > [server1].netsupport: UDP, length 70
23:25:38.521795 IP [server1].5149 > [server2].netsupport: UDP, length 70
23:25:38.522190 IP [server2].5149 > [server1].netsupport: UDP, length 70
For the pair that isn't working I get:
cat /proc/net/igmp
Idx Device : Count Querier Group Users Timer Reporter
1 lo : 0 V3
010000E0 1 0:00000000 0
2 eth0 : 2 V3
04016FEF 1 0:00000000 0
010000E0 1 0:00000000 0
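For reference, the Group column in /proc/net/igmp is the multicast address printed as a hex integer in host byte order, so on a little-endian (x86) box it reads byte-reversed. Below is a minimal sketch for decoding it, assuming little-endian hosts. The helper name igmp_group_to_ip is made up for illustration and is not part of any tool.

```python
import socket


def igmp_group_to_ip(hex_group, byteorder="little"):
    """Convert a Group value from /proc/net/igmp to dotted-quad form.

    Assumption: the host is little-endian (e.g. x86), so the hex value
    is the address with its bytes reversed. Pass byteorder="big" on a
    big-endian host.
    """
    raw = int(hex_group, 16).to_bytes(4, byteorder)
    return socket.inet_ntoa(raw)


# Group values copied from the outputs in this message.
for group in ("0A016FEF", "04016FEF", "010000E0"):
    print(group, "->", igmp_group_to_ip(group))
```

Decoded this way, the values line up with what netstat -g reports for each pair, so both pairs have joined their configured group on eth0.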
netstat -g
IPv6/IPv4 Group Memberships
Interface RefCnt Group
--------------- ------ ---------------------
lo 1 ALL-SYSTEMS.MCAST.NET
eth0 1 239.111.1.4
eth0 1 ALL-SYSTEMS.MCAST.NET
tcpdump -i eth0 dst port 5405
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
23:31:13.663055 IP [server1].5149 > 239.111.1.4.netsupport: UDP, length 82
23:31:14.042033 IP [server1].5149 > 239.111.1.4.netsupport: UDP, length 82
23:31:14.421028 IP [server1].5149 > 239.111.1.4.netsupport: UDP, length 82
23:31:14.799024 IP [server1].5149 > 239.111.1.4.netsupport: UDP, length 82
So I presume the tcpdump shows that the packets only get as far as the multicast
address, with nothing receiving them? Sorry, I'm a numpty when it comes to networks.
Also, the Reporter column in /proc/net/igmp shows all zeros. The servers that
host the nodes where there's an issue are on the same vlan with no firewall
between them, and selinux is disabled.
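One way to check whether multicast actually passes between the two boxes, independent of openais, is a small send/receive probe. The sketch below is only an illustration under stated assumptions: the group is the one from netstat -g above, the port 15405 is an arbitrary spare port chosen so it does not clash with a running openais on 5405, and the function names are hypothetical.

```python
import socket

GROUP = "239.111.1.4"  # group joined by the non-working pair (from netstat -g)
PORT = 15405           # assumption: any spare UDP port, not the openais port


def membership_request(group, iface_ip="0.0.0.0"):
    """Build the 8-byte ip_mreq value expected by IP_ADD_MEMBERSHIP."""
    return socket.inet_aton(group) + socket.inet_aton(iface_ip)


def open_receiver(group=GROUP, port=PORT, iface_ip="0.0.0.0"):
    """Return a UDP socket bound to `port` that has joined `group`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group, iface_ip))
    return sock


def send_probe(group=GROUP, port=PORT, payload=b"mcast-probe", ttl=1):
    """Send one datagram to the multicast group (TTL 1 stays on the local subnet)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    sock.sendto(payload, (group, port))
    sock.close()
```

Usage would be to run `print(open_receiver().recvfrom(1024))` on one server and then call `send_probe()` on the other. If the receiver never prints anything, multicast is probably being dropped somewhere between the hosts rather than by openais or qpidd.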
----- Original Message ----
From: Alan Conway <[email protected]>
To: [email protected]
Cc: MHT <[email protected]>
Sent: Mon, 23 August, 2010 14:50:29
Subject: Re: Clustering not working
On 08/20/2010 11:56 AM, MHT wrote:
> Hi,
>
> On a working cluster I see the expected node joins in logs for both boxes:
> [TOTEM] entering OPERATIONAL state.
> [CLM ] got nodejoin message<serverIP1>
> [CLM ] got nodejoin message<serverIP2>
>
>
>
> But on this problem one I only see the local instance on both boxes:
> [TOTEM] entering OPERATIONAL state.
> [CLM ] got nodejoin message<serverIP1>
>
> I've got logging on the brokers set to trace, but so far still not seeing any
> obvious errors in the mass (other than the missing node join). A diff on the
> config file on each box shows only cluster-url is different, as expected
> because it starts with the local broker address:port.
>
There are some troubleshooting tips for configuring openais and qpidd at
https://cwiki.apache.org/qpid/starting-a-cluster.html. If you're not seeing all
the expected nodejoin messages then it sounds like a problem with openais
configuration.
---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project: http://qpid.apache.org
Use/Interact: mailto:[email protected]