Hi! We've spent the past couple of weeks trying to set up an ActiveMQ cluster that has both high availability of brokers and high availability of messages (i.e. messages don't get stuck on crashed brokers).
It seems that this configuration is missing from the documentation and the mailing list (or we were just unable to find it), so I wanted to document it here in case it's helpful for others. Also, if you have any comments on what we could have done more simply, that would be helpful as well! (We're using ActiveMQ 5.5.)

We started with a pretty standard "Pure Master/Slave" configuration (http://activemq.apache.org/pure-master-slave.html). This allows a backup of the messages to exist if the Master dies (in our configuration, only the Master takes part in the network of brokers or in client communication).

One change we had to make, due to an open ActiveMQ bug (https://issues.apache.org/jira/browse/AMQ-3364), was to have the init script kill -9 the java process on "stop" instead of attempting a controlled shutdown. After this change, we no longer saw any missing messages on master shutdown.

Additionally, I wrote a restart script which inspects the states of the master and slave nodes, shuts everything down, copies the data directories over from the "more recently running" node, and restarts the master and slave in the right order (pausing for the master to start its transportConnectors so that the slave doesn't try to start up before the master has finished). A rough sketch of that ordering is included below, just before the config.

With the pure Master/Slave setup out of the way, we went on to try to set up a network of (pure master/slave) brokers. This configuration was much more difficult to get right. In most of the configurations we tried, after restarting random nodes in the cluster and testing various message delivery paths, messages would get stuck or lost, or networkConnections would reconnect but not pass messages.

The configuration we did get working (note that we will primarily be using STOMP on 61613 for client connections) was this:

- The loadbalancer VIP load balances over the Master01 and Master02 servers; slaves do not accept client connections (see the client sketch below).
- Master01 and Master02 have transportConnectors configured, but Slave01 and Slave02 do not have any transports in their configuration.
- The master connections from Slave01 -> Master01 and Slave02 -> Master02 are configured on a separate port (61618) from the network-of-brokers connections (61616). We're not sure this separation ended up being required for things to work, but it helped us identify traffic and failed reconnections during testing.
- Master01 <---> Master02: a duplex networkConnection between the master servers, configured using multicast discovery (explicit configuration of the servers seemed to result in stuck networkConnections on server failures).
- Slave01 --> Master02 and Slave02 --> Master01: a non-duplex networkConnection from each slave to the other pair's master, also configured using multicast discovery. This connection lets a slave drain the message queues it holds for its master if that master dies (clients never connect to the slaves, only to the masters, in our configuration).

We use puppet to configure activemq.xml, and the ERB templates use a naming convention in which (name)-mq(number)-(letter) identifies a queue cluster, e.g. clustername-mq01-a: -a indicates the master and -b the slave, so the configuration is 100% automated.
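Here's a rough sketch (in Python) of the ordering logic in that restart script. It's purely illustrative, not the script we actually run: the hostnames, data directory path, init script invocation, and the "most recently running" check (here just a comparison of data directory mtimes over ssh) are all hypothetical stand-ins.

#!/usr/bin/env python
# Illustrative sketch of the restart ordering -- not the real script.
# Hostnames, paths, and the "most recently running" check are hypothetical.
import subprocess
import time

MASTER = "clustername-mq01-a"    # hypothetical master hostname
SLAVE = "clustername-mq01-b"     # hypothetical slave hostname
DATA_DIR = "/opt/activemq/data"  # hypothetical dataDirectory location

def ssh(host, *cmd):
    """Run a command on a remote node and return its exit status."""
    return subprocess.call(["ssh", host] + list(cmd))

def data_dir_mtime(host):
    """Very rough proxy for which node was running most recently: the
    mtime of its data directory (the real script inspects broker state)."""
    out = subprocess.check_output(["ssh", host, "stat", "-c", "%Y", DATA_DIR])
    return int(out.strip())

def wait_for_port(host, port, timeout=120):
    """Poll until the master's transportConnector is actually listening."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if subprocess.call(["nc", "-z", host, str(port)]) == 0:
            return True
        time.sleep(2)
    return False

# 1. Shut everything down (our init script kill -9s the broker on "stop",
#    per AMQ-3364).
for host in (MASTER, SLAVE):
    ssh(host, "service", "activemq", "stop")

# 2. Copy the data directory from the more recently running node to the
#    other one, so neither side starts from a stale message store
#    (assumes the nodes can ssh/rsync to each other).
if data_dir_mtime(MASTER) >= data_dir_mtime(SLAVE):
    src, dst = MASTER, SLAVE
else:
    src, dst = SLAVE, MASTER
ssh(src, "rsync", "-a", "--delete", DATA_DIR + "/", "%s:%s/" % (dst, DATA_DIR))

# 3. Restart in the right order: master first, then wait for its
#    transportConnector to come up before starting the slave.
ssh(MASTER, "service", "activemq", "start")
if not wait_for_port(MASTER, 61616):
    raise SystemExit("master transport never came up")
ssh(SLAVE, "service", "activemq", "start")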
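And for reference, a minimal sketch of what a client connection looks like from the application side, assuming the stomp.py library (the VIP hostname, credentials, and queue name are made up). Clients only ever talk to the VIP, which fronts Master01 and Master02 on 61613:

import stomp  # assumes the stomp.py client library

# Hypothetical VIP hostname -- the load balancer only fronts Master01 and
# Master02; the slaves never accept client connections in our setup.
conn = stomp.Connection([("clustername-mq-vip.example.com", 61613)])
conn.connect("client_user", "client_password", wait=True)

conn.send(destination="/queue/test", body="hello through the VIP")
conn.disconnect()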
Here are the relevant portions of the activemq.xml.erb file:

[...snip...]

<!-- The <broker> element is used to configure the ActiveMQ broker.
     Note that for masters we have waitForSlave set to true, and we set
     a unique name for each brokerName. -->
<% hostname =~ /^([^.]*)-mq([0-9]+)-([ab])/
   group_name = $~[1]
   pair_name  = "#{$~[1]}-mq#{$~[2]}"
   letter     = $~[3]
-%>
<broker xmlns="http://activemq.apache.org/schema/core"
        <% if letter == "a" %>
        waitForSlave="true"
        shutdownOnSlaveFailure="false"
        <% end -%>
        brokerName="<%= hostname %>"
        dataDirectory="${activemq.base}/data"
        networkConnectorStartAsync="true">

[...snip...]

  <!-- For network connections, master-master is duplex and slave-master is
       non-duplex. The username/password is given an ACL for all queues and
       topics (">"). -->
  <networkConnectors>
    <networkConnector uri="multicast://224.1.2.3:6255?group=<%= group_name %>"
                      name="<%= hostname -%>-<%= group_name %>"
                      userName="<%= network_connection_username %>"
                      password="<%= network_connection_password %>"
                      networkTTL="3"
                      duplex="<%= letter == "a" ? "true" : "false" %>">
    </networkConnector>
  </networkConnectors>

[...snip...]

  <services>
    <% if letter == "b" %>
    <!-- slaves initiate a masterConnector to their master for replication -->
    <masterConnector remoteURI="nio://<%= pair_name -%>-a.<%= domain %>:61618"
                     userName="<%= network_connection_username %>"
                     password="<%= network_connection_password %>"/>
    <% end -%>
  </services>

  <transportConnectors>
    <% if letter == "a" %>
    <!-- note that we only start up transports for masters, not for slaves -->
    <transportConnector name="openwire" uri="nio://0.0.0.0:61616"
                        discoveryUri="multicast://224.1.2.3:6255?group=<%= group_name %>"/>
    <transportConnector name="replication" uri="nio://0.0.0.0:61618"/>
    <transportConnector name="stomp+nio" uri="stomp+nio://0.0.0.0:61613"/>
    <% end -%>
  </transportConnectors>

Hope this helps someone!!

Keith Minkler