Hi,

We are running the Qpid/MRG broker in a cluster that usually consists of 2-4 nodes. Our broker configuration contains many persistent queues (currently several GB, in the future even more). When starting the cluster (with an already existing store), the nodes are started one after another. Due to the way clustering is designed, the first broker starts and recovers the store from disk. All brokers which start later have to discard their existing store and fetch all data fresh from the broker(s) already running. This initial synchronization of the store can take quite long if the store contains several GB of persistent queues, which is to be expected.
Unfortunately, during this initial synchronization, the cluster seems to be in some kind of "half running" state. Clients seem to be able to connect, but they cannot do anything. In particular, all Python-based tools using QMF (e.g. qpid-config, qpid-cluster) seem to time out during this period instead of waiting for the cluster synchronization to finish. The --timeout parameter of qpid-config (which is passed to the QMF API in the session->addBroker method) doesn't really seem to help here.

Does someone know whether ...
- There is some way to detect that the cluster is currently synchronizing a new node, or that the synchronization has already finished?
- In case I stopped all my brokers cleanly using qpid-cluster, is there some way to start all brokers simultaneously from their own stores (which are clean at that moment)? It would save a lot of time otherwise spent on the initial store synchronization.
- Is the QMF problem described above known (I didn't find anything in JIRA)? If not, is it worth filing an issue report?

Thanks & Regards
Jakub
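
For illustration, one possible stopgap for the timeouts described above is to keep re-running a management command until it succeeds. This is only a minimal sketch: the helper name wait_until_responsive is hypothetical, it assumes the tool exits with a non-zero status when it times out, and a successful call does not necessarily prove that the store synchronization has finished.

```python
import subprocess
import time


def wait_until_responsive(cmd, attempts=30, delay=10.0,
                          run=subprocess.run, sleep=time.sleep):
    """Re-run cmd until it exits with status 0.

    Returns True as soon as one run succeeds, False if all attempts fail.
    Assumption: the command signals a timeout with a non-zero exit status.
    """
    for attempt in range(1, attempts + 1):
        try:
            result = run(cmd,
                         stdout=subprocess.DEVNULL,
                         stderr=subprocess.DEVNULL)
            if result.returncode == 0:
                return True
        except OSError:
            # The command could not be started at all (e.g. not installed);
            # treat it like a failed attempt and retry.
            pass
        if attempt < attempts:
            sleep(delay)
    return False


# Hypothetical usage; the exact qpid-config arguments are an assumption:
#   wait_until_responsive(["qpid-config", "--timeout", "60", "queues"])
```

The run and sleep parameters exist only so the loop can be exercised without a broker; in normal use the defaults apply.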
