Ah, okie, got it :) I was wondering whether you were using some replication software on top of BDB that I wasn't aware of.
A SAN explains your architecture. Thanks a lot for writing back :)

On Fri, Jan 20, 2012 at 8:29 AM, Rob Godfrey <[email protected]> wrote:

> On 20 January 2012 17:13, Praveen M <[email protected]> wrote:
>
> > Hi Rob,
> >
> > Thanks for writing. Please see inline.
> >
> > On Fri, Jan 20, 2012 at 1:35 AM, Rob Godfrey <[email protected]> wrote:
> >
> > > Hi Praveen,
> > >
> > > On 14 January 2012 02:47, Praveen M <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > Are there any high availability/clustering solutions currently available for the Java broker? I tried googling around but had no luck finding anything.
> > > >
> > > > Can you please suggest an HA strategy that you've used with the Qpid Java Broker?
> > >
> > > Where I work we have two separate strategies, one for "HA" and one for disaster recovery.
> > >
> > > For HA we use synchronous replication of the BDB store, with external software monitoring the availability of the primary broker machine. If the primary broker machine goes down, the external software starts up the secondary broker machine, which points to the synchronously replicated instance of the store... it can also handle reassignment of the IP address / DNS name.
> >
> > *Is there a reason that you use external software to monitor the availability of the primary broker machine?*
> > *Shouldn't the connection failover model be sufficient for this? Or does the failover model have any limitations?*
>
> The JMS clients fail over automatically; the architectural design was not driven by limits in the failover model... however, the HA solution is not focused solely on Qpid and aims to provide a service that is as seamless as possible to end-user applications.
>
> > *Also, you mention synchronous replication of BDB. Can you please write a bit about how you go about doing this? I think with syncCommit false, sync replication could be something that could work for us too without really jeopardizing the enqueue latencies.*
>
> The synchronous replication in our case is done at the "hardware" level. The storage attached to the machines provides this replication.
>
> > > For DR we take regular snapshots of the BDB store files and ship these using an FTP-like mechanism to a DR site. Clearly with this solution you run the risk of loss, as you only have a snapshot from a known point in time, not from the very moment the system went down.
> >
> > *Ah yes, this runs the risk of losing messages. Did you not consider synchronous replication in this case too?*
>
> DR sites are necessarily far enough away from primary sites to make synchronous replication (at least at the storage level) impractical.
>
> > *Or is it because the distance to the DR site could contribute to high-latency round trips? Just curious.*
>
> Exactly.
>
> In general the message broker forms only one part of an application; in a DR scenario many different components with their own stores will have to be restarted. At this point the application design needs to be able to recover. Most importantly, applications need to tolerate duplicates caused by replaying from a point earlier in time than the point at which failure occurred.
>
> > In our model our transaction store, which contains a copy of the message, will be DR'ed.
> >
> > > > I found a Message Federation design proposal document, but I'm guessing it's not implemented yet (please correct me if I'm wrong).
> > >
> > > There is an alpha/beta implementation of Message Federation in the Java Broker, which follows the same design as that in the C++ broker and uses the same toolset to create routes. This code is broken in the most recent releases of the Java Broker, but should work "better" from trunk... however, I'm not going to give any guarantees on its suitability for a production system right now (I hope to be doing some serious testing/fixing over the next couple of months).
> > >
> > > > I plan to spin up two brokers on two different machines and use a failover connection model to route messages to one if the other goes down. This works well for message enqueues. But I'd still run the risk of not being able to process the messages in the broker that just went down (until it's back up). It would be nice to know if someone has solved a similar problem with other strategies/solutions available with the broker.
> > > >
> > > > Also, has anyone tried replicating the database used for the persistent store (BDB/Derby?) to solve this problem?
> > >
> > > As above, we use replication, but managed by hardware/external software. I've not yet tried using BDB's own HA solutions to provide replication.
> >
> > *Well, is the replication also driven by external software? I'm curious how you go about doing synchronous replication with BDB (as this is the route that we might want to take). Any tips here would be useful.*
>
> As above, the replication I describe is at the storage level. Essentially we're talking about facilities offered by certain Storage Area Network products :-)
>
> > *If you are allowed to talk about the hardware/external software piece, I'd love to hear more about your HA architecture. (I do understand that NDAs might sometimes stop you. If so, it's okie.)*
>
> We use standard commercial High Availability Cluster software for this purpose. I'm not really at liberty to say which of these products we use, but I imagine that all are equally functional in this area.
>
> Cheers,
> Rob

-- 
-Praveen
