On 07/15/2011 11:43 AM, Jakub Scholz wrote:
Hi,
We are running the Qpid/MRG broker in a cluster usually consisting of
2-4 nodes. Our broker configuration contains many persistent queues
(currently several GB, in the future even more). When starting the
cluster (with an already existing store), the nodes are started one
after another. Due to the way the clustering is designed, the first
broker starts and recovers the store from the disk. All the other
brokers which start later have to discard the existing store and get
all data freshly from the broker(s) which is already running. This
initial synchronization of the store can be quite long if the store
contains several GB of persistent queues (that's something to be
expected).
Unfortunately, during this initial synchronization, the cluster seems
to be in some kind of "half running" state. Clients seem to be able to
connect, but they cannot do anything.
Yes, during update the cluster accepts connections but doesn't read from them
until update is complete.
Would it improve things for you if a cluster rejected connections during update
- allowing the client to fail over to a different broker?
In particular, all Python-based
tools using QMF (e.g. qpid-config, qpid-cluster) seem to time out
during this period instead of waiting for the cluster synchronization
to finish. The --timeout parameter for qpid-config (which is
passed to the QMF API in the session->addBroker method) doesn't really
seem to help here.
Does someone know whether ...
- There is some way to detect that the cluster is currently
synchronizing a new node, or that the synchronization has already finished?
Not presently. What would you do with that information if you could get it?
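In the meantime, a rough external check is possible because an updating
broker accepts connections but doesn't read from them. The sketch below is
only an illustration of that idea, not an existing Qpid tool: probe_broker
is a hypothetical helper, and the header bytes assume AMQP 0-10. It connects,
sends the protocol header and waits for any reply. A timeout on the reply is
consistent with an update in progress, but it cannot distinguish an update
from any other kind of hang.

```python
import socket

# Assumption: AMQP 0-10 protocol header. Adjust for the version in use.
AMQP_0_10_HEADER = b"AMQP\x01\x01\x00\x0a"


def probe_broker(host, port, timeout=5.0, header=AMQP_0_10_HEADER):
    """Classify a broker as 'down', 'unresponsive' or 'responsive'.

    'unresponsive' means the TCP connection was accepted but nothing came
    back within `timeout` seconds, which is what a broker that accepts
    connections without reading from them would look like.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Refused, unreachable, or the TCP connect itself timed out.
        return "down"
    try:
        sock.settimeout(timeout)
        sock.sendall(header)
        reply = sock.recv(len(header))
    except socket.timeout:
        return "unresponsive"
    except OSError:
        return "down"
    finally:
        sock.close()
    # An empty read means the peer closed the connection without answering.
    return "responsive" if reply else "down"
```

A script could poll probe_broker("localhost", 5672) until it stops returning
"unresponsive" before running qpid-config or similar tools (the host and port
here are placeholders).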
- In case I stopped all my brokers cleanly using qpid-cluster, is
there some way to start all brokers synchronously from their own
stores (which are clean at the moment)? It would save a lot of time
for the initial store synchronization.
That should work. You need to configure cluster-size so the brokers can all
synchronize their initial state. If it doesn't work raise a JIRA and mail me so
I don't miss it.
- Is the QMF problem described above known (I didn't find anything in
JIRA)? If not, is it worth filing an issue report?
I'm guessing that the Python client sets up its timeout after the connection is
open, so it's not taking effect if the connection opening itself takes a long
time (which happens because of the "half running" state). If that's the case,
it's worth a JIRA.
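To illustrate the guess above: if a timeout only applies once the connection
is open, a blocking open is never bounded by it. One generic way to bound the
whole call from the outside is sketched below. This is not how the QMF Python
client works and not a proposed patch; call_with_deadline and DeadlineExceeded
are hypothetical names, and the wrapped function stands in for whatever
blocking connect call a tool makes.

```python
import threading


class DeadlineExceeded(Exception):
    """Raised when the wrapped call does not finish in time."""


def call_with_deadline(fn, timeout, *args, **kwargs):
    """Run fn(*args, **kwargs), giving up after `timeout` seconds.

    The call runs in a daemon thread so that a blocked connection attempt
    cannot keep the process alive. The thread is abandoned rather than
    cancelled when the deadline passes.
    """
    outcome = {}

    def runner():
        try:
            outcome["value"] = fn(*args, **kwargs)
        except BaseException as exc:  # hand any failure back to the caller
            outcome["error"] = exc

    worker = threading.Thread(target=runner, daemon=True)
    worker.start()
    worker.join(timeout)
    if worker.is_alive():
        raise DeadlineExceeded("call did not finish within %s s" % timeout)
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

With this, the deadline covers the connection phase as well, so a tool would
fail after the requested number of seconds instead of hanging while the
broker is still updating. The abandoned thread may still complete later, so
this suits short-lived command-line tools better than long-running services.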