Hello Vidar,

On Tue, Sep 14, 2010 at 11:27 AM, Vidar Ramdal <[email protected]> wrote:
> We're setting up a clustered Jackrabbit application.
> The application has high traffic, so we're concerned that the Journal
> table will be very large. This, in turn, will make setting up new
> nodes a time-consuming task, when the new node starts replaying the
> journal to get up to date.
>
> At [1], the concept of the janitor is described, which cleans the
> journal table at certain intervals. However, the list of caveats
> states that "If the janitor is enabled then you lose the possibility
> to easily add cluster nodes. (It is still possible but takes detailed
> knowledge of Jackrabbit.)"
>
> What detailed knowledge does this take? Can anyone give me some hints
> of what we need to look into?
>
> Also, we're not 100% sure we know what happens when a new node is
> added. We understand that the journal needs to be replayed so that the
> Lucene index can be updated. But is the Lucene index the only thing
> that needs modification when a new node is started?
> If so, should this procedure work:
> 1. Take a complete snapshot (disk image) of one of the live nodes -
> including the Lucene index

Although I am not too familiar with the clustered setup (others at my company are), I know that this is unfortunately not possible. The problem is that the most recent Lucene index is an in-memory one, so you cannot get a correct snapshot of the index. It is something I'd love to see improved in Jackrabbit at some point.

More generally, and as part of the same thing (though it is also a very large job), it would be interesting to see how clustering would work out with optionally using Infinispan to keep a clustered in-memory Lucene index, and using 1 or 2 repository nodes to store Lucene segments in the database. This way, I think the journals largely become redundant, adding repository nodes to a cluster becomes trivial, and the database can also contain the persisted Lucene segments. I am confident this can work, as some people are using Hibernate clustered in this way. It does, however, imply a large refactoring of the Jackrabbit query package: for example, moving from the multi-index to a single one and just using re-open on the index reader.

Unfortunately, this does not help you right now; I am just giving a brain dump.

Regards
Ard

> 2. Use the disk image to setup a new node
> 4. Assign a new, unique cluster node ID to the new node
>
