So... don't get your hopes up too high... but I am going to look at utilising BDB's HA capabilities to implement some sort of Active-Passive HA solution... it looks like it shouldn't be *too* much work at first glance (non-master nodes block on startup waiting to be elected master, and then configure themselves from the now-master BDB instance).

At best this is going to be a bit of a hobby project for me, as it's not
something that is strictly necessary for my personal end users.

Cheers,
Rob

On 20 January 2012 17:34, Praveen M <[email protected]> wrote:

> Ah. okie, got it :) I was wondering if you were using some replication
> software that augments BDB that I wasn't aware of.
>
> A SAN explains your architecture. Thanks a lot for writing back :)
>
> On Fri, Jan 20, 2012 at 8:29 AM, Rob Godfrey <[email protected]> wrote:
>
> > On 20 January 2012 17:13, Praveen M <[email protected]> wrote:
> >
> > > Hi Rob,
> > >
> > > Thanks for writing. Please see inline.
> > >
> > > On Fri, Jan 20, 2012 at 1:35 AM, Rob Godfrey <[email protected]> wrote:
> > >
> > > > Hi Praveen,
> > > >
> > > > On 14 January 2012 02:47, Praveen M <[email protected]> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Are there any java broker high availability/clustering
> > > > > solutions that are currently present? I tried googling around
> > > > > and didn't find anything to my luck.
> > > > >
> > > > > Can you please suggest a HA strategy that you've used working
> > > > > with the Qpid Java Broker?
> > > >
> > > > So where I work we have two separate strategies for "HA" and
> > > > disaster recovery.
> > > >
> > > > For HA we use synchronous replication of the BDB store, with
> > > > external software monitoring the availability of the primary
> > > > broker machine. If the primary broker machine goes down, the
> > > > external software starts up the secondary broker machine, which
> > > > points to the synchronously replicated instance of the store...
> > > > it can also handle reassignment of the IP address / DNS name.
> > >
> > > *Is there a reason that you use an external software to monitor
> > > the availability of the primary broker machine?*
> > > *Shouldn't the connection failover model be sufficient enough for
> > > this? Or does the failover model have any limitations?*
> >
> > The JMS clients fail over automatically; the architectural design
> > was not driven by limits in the failover model... however the HA
> > solution is not focused solely on Qpid and aims to provide a service
> > which is as seamless as possible to end user applications.
> >
> > > *Also, you mention synchronous replication of BDB. Can you please
> > > write a bit about how you go about doing this? I think with
> > > syncCommit false, sync replication could be something that could
> > > work for us too without really jeopardizing the enqueue
> > > latencies.*
> >
> > The synchronous replication in our case is done at the "hardware"
> > level. The storage attached to the machines provides this
> > replication.
> >
> > > > For DR we take regular snapshots of the BDB store files and ship
> > > > these using an FTP-like mechanism to a DR site. Clearly with
> > > > this solution you run the risk of loss as you only have a
> > > > snapshot from a known point in time, not from the very moment
> > > > the system went down.
> > >
> > > *Ah yes, this runs the risk of losing messages. Did you not
> > > consider a synchronous replication in this case too?*
> >
> > DR sites are necessarily far enough away from primary sites to make
> > synchronous replication (at least at the storage level) impractical.
> >
> > > *Or is it because of the distance of the DR site that could
> > > contribute to high latency round trips. Just curious.*
> >
> > Exactly.
> >
> > In general the message broker forms only one part of an application;
> > in a DR scenario many different components with their own stores
> > will have to be restarted. At this point the application design
> > needs to be able to recover - most importantly applications need to
> > tolerate duplicates caused by replaying from a point earlier in time
> > than the point at which failure occurred.
> >
> > > In our model our transaction store which contains a copy of the
> > > message will be DR'ed.
> > >
> > > > > I found a Message Federation design proposal document, but I'm
> > > > > guessing it's not implemented yet (Please correct me if I'm
> > > > > wrong).
> > > >
> > > > There is an alpha/beta implementation of Message Federation in
> > > > the Java Broker, which follows the same design as that in the
> > > > C++ broker and uses the same toolset to create routes. This
> > > > code is broken in the most recent releases of the Java Broker,
> > > > but should work "better" from trunk... however I'm not going to
> > > > give any guarantees on its suitability for a production system
> > > > right now (I hope to be doing some serious testing/fixing over
> > > > the next couple of months).
> > > >
> > > > > I plan to spin off two brokers on two different machines and
> > > > > use a failover connection model to route messages to one if
> > > > > the other goes down. This works well for message enqueues.
> > > > > But still, I'd run the risk of not being able to process the
> > > > > messages in the broker that just went down (until it's back
> > > > > up). It will be nice to know if someone had solved a similar
> > > > > problem by other strategies/solutions available with the
> > > > > broker.
> > > > >
> > > > > Also, has someone tried replicating the database used for
> > > > > the persistent store to solve this problem (BDB/Derby ?)
> > > >
> > > > As above, we use replication, but managed by hardware/external
> > > > software. I've not yet tried using BDB's own HA solutions to
> > > > provide replication.
> > >
> > > *well. Is the replication too driven by an external software. I'm
> > > curious on how you go about doing a synchronous*
> > > *replication with BDB (as this is the route that we might want to
> > > take). Any tips here will be useful.*
> >
> > As above the replication I describe is at the storage level.
> > Essentially we're talking about facilities offered by certain
> > Storage Area Network products :-)
> >
> > > *If you are allowed to talk about the hardware/external software
> > > piece I'd love to hear more about your HA*
> > > *architecture. (I do understand sometimes NDAs might stop you. If
> > > so, it's okie).*
> >
> > We use a standard commercial High Availability Cluster software for
> > this purpose. I'm not really at liberty to say which of these
> > products we use - but I imagine that all are equally functional in
> > this area.
> >
> > Cheers,
> > Rob
>
> --
> -Praveen
