Hi folks,

This is along the lines of what Franz was asking in "Master/Slave
Configuration With Non-Persistence - 2 Brokers Starting Problem", but I
didn't want to hijack his thread in case it wasn't related.

What I'm looking for is:
* Pure master/slave, shared nothing
* Transparent failover in the event of a broker failure
* Transparent failback when the master is restored (I realize this isn't in
AMQ yet)
* Clean startup when bringing up servers while clients are trying to send

Client usage pattern is:
* Some persistent messages to certain queues (maybe 10-50 a minute)
* Lots of non-persistent messages with 10 sec TTL on other queues (200 /
second)

I realize I can do the pure master/slave bit by making the slave point to
the master.
Transparent failover works fine, even when there are temp queues in play
(which is awesome given that there's only one other broker out there that
can do this, afaik).
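
For reference, a minimal sketch of the client-side broker URL this kind of
failover relies on. The host names "master" and "slave" are hypothetical,
and the FailoverUri helper is made up for illustration. The failover:
scheme and its randomize option are ActiveMQ's; the resulting string is
what would be handed to the connection factory.

```java
import java.util.Arrays;
import java.util.List;

final class FailoverUri {
    // Builds a failover: transport URI from the given broker URIs, in order.
    static String build(List<String> brokerUris) {
        if (brokerUris.isEmpty()) {
            throw new IllegalArgumentException("at least one broker URI is required");
        }
        // randomize=false keeps the listed order, so the master is tried first.
        return "failover:(" + String.join(",", brokerUris) + ")?randomize=false";
    }

    public static void main(String[] args) {
        // Hypothetical host names; 61616 is ActiveMQ's default OpenWire port.
        System.out.println(build(Arrays.asList("tcp://master:61616", "tcp://slave:61616")));
    }
}
```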

Restoring the master node / startup is where I'm concerned.  Considering
that the 200/sec messages are 24 hours a day, there's no time where I can:
* stop the slave
* copy state to master
* restart the master
* restart the slave

All without a client trying to send messages.

If I read the docs correctly, there is no initial state sync at all between
brokers, only a means of transferring state events as they occur.

Given that, if I stop the slave, the clients will block waiting for a broker
to come back up. When I then start the master, the clients will connect to
it and immediately start sending messages.

When the slave comes up, the master will start propagating changes, but the
slave's store won't contain any of the messages the master accepted between
when it started accepting and when the slave came up.

Is this assumption correct?
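
To make the scenario concrete, here is a toy model of the behaviour
described above. It is plain Java, not ActiveMQ code: ToyBroker and its
methods are invented for illustration, and it assumes replication forwards
events as they occur with no initial state sync.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: mimics "replicate events as they occur, no initial sync".
final class ToyBroker {
    final List<String> store = new ArrayList<>();
    private ToyBroker slave; // null until a slave attaches

    // Attaching only starts event forwarding; nothing already stored is copied.
    void attachSlave(ToyBroker s) {
        this.slave = s;
    }

    void send(String msg) {
        store.add(msg);
        if (slave != null) {
            slave.store.add(msg); // forward the event as it happens
        }
    }

    // Messages the master holds that the slave never saw.
    static List<String> missingOnSlave(ToyBroker master, ToyBroker slave) {
        List<String> missing = new ArrayList<>(master.store);
        missing.removeAll(slave.store);
        return missing;
    }

    public static void main(String[] args) {
        ToyBroker master = new ToyBroker();
        ToyBroker slave = new ToyBroker();
        master.send("persistent-1"); // arrives before the slave is up
        master.send("persistent-2");
        master.attachSlave(slave);   // slave comes up late
        master.send("persistent-3"); // replicated from here on
        System.out.println(missingOnSlave(master, slave));
    }
}
```

In this model the first two messages exist only on the master, so a master
failure after the slave attaches would lose them.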

This isn't as critical for the non-persistent messages, as we don't expect a
master to come up and fail again within 10 seconds (the TTL of the messages).
The persistent messages, however, may live a very long time (14 hours in a
queue until something reads them), and it's easy to imagine hitting that
failure if you unknowingly have faulty hardware.
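
A rough sizing of the exposure, using only the figures quoted in this
message. It assumes the worst case where nothing is consumed for the full
14 hours; the BacklogEstimate class is made up for illustration.

```java
final class BacklogEstimate {
    // Persistent messages that can pile up at ratePerMinute over `hours` with no consumer.
    static long persistentBacklog(long ratePerMinute, long hours) {
        return ratePerMinute * 60 * hours;
    }

    // Upper bound on unexpired non-persistent messages: arrival rate times TTL.
    static long liveNonPersistent(long ratePerSecond, long ttlSeconds) {
        return ratePerSecond * ttlSeconds;
    }

    public static void main(String[] args) {
        System.out.println("persistent, low:  " + persistentBacklog(10, 14));
        System.out.println("persistent, high: " + persistentBacklog(50, 14));
        System.out.println("non-persistent:   " + liveNonPersistent(200, 10));
    }
}
```

That works out to between 8,400 and 42,000 persistent messages potentially
at risk, against at most 2,000 live non-persistent ones.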

I can imagine a few ways around this issue (if it is indeed an issue):
1) Start up brokers with acceptors 'disabled', then use JMX to enable
acceptance on the master. (Is this possible?)

2) Create a second set of brokers for persistent messages pointing to our
RAC database (we don't want to run 200/sec through RAC, but 50 a min is
fine).  This is kind of a pain in that it's an EJB3 MDB based app and using
multiple brokers requires extra configuration.

3) We add (and contribute back) state sync between brokers.  The idea being
that when a slave connects, we pause all connectors, transfer state, and then
resume the connectors.  Probably a lot harder done than said considering
it's not already implemented that way.
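
A toy sketch of idea 3, in plain Java rather than ActiveMQ code.
SyncingBroker and all of its methods are hypothetical; holding the object
monitor stands in for pausing the connectors, and a list copy stands in
for the state transfer.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: sketches "pause connectors, copy state, resume".
final class SyncingBroker {
    private final List<String> store = new ArrayList<>();
    private SyncingBroker slave; // null until a slave attaches

    // Senders block while attachSlave holds the monitor (the "paused connectors").
    synchronized void send(String msg) {
        store.add(msg);
        if (slave != null) {
            slave.replicate(msg);
        }
    }

    // With senders blocked, copy the existing store to the slave, then start forwarding.
    synchronized void attachSlave(SyncingBroker s) {
        s.loadSnapshot(new ArrayList<>(store));
        this.slave = s;
    }

    private synchronized void loadSnapshot(List<String> snapshot) {
        store.clear();
        store.addAll(snapshot);
    }

    private synchronized void replicate(String msg) {
        store.add(msg);
    }

    synchronized List<String> contents() {
        return new ArrayList<>(store);
    }

    public static void main(String[] args) {
        SyncingBroker master = new SyncingBroker();
        SyncingBroker slave = new SyncingBroker();
        master.send("persistent-1"); // accepted before the slave is up
        master.attachSlave(slave);   // snapshot transferred while sends are blocked
        master.send("persistent-2"); // forwarded as a normal event
        System.out.println(slave.contents().equals(master.contents()));
    }
}
```

The real work would be in doing the transfer against the actual message
store and keeping the pause short at 200 messages a second; the sketch
only shows the ordering of the steps.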

Any suggestions?  How are other people using AMQ for an HA lose-nothing /
shared-nothing solution?

Thanks,
-David
