> On Tue, Sep 14, 2010 at 11:27 AM, Vidar Ramdal <[email protected]> wrote:
>> We're setting up a clustered Jackrabbit application.
>> The application has high traffic, so we're concerned that the Journal
>> table will be very large. This, in turn, will make setting up new
>> nodes a time-consuming task, when the new node starts replaying the
>> journal to get up to date.
>>
>> At [1], the concept of the janitor is described, which cleans the
>> journal table at certain intervals. However, the list of caveats
>> states that "If the janitor is enabled then you lose the possibility
>> to easily add cluster nodes. (It is still possible but takes detailed
>> knowledge of Jackrabbit.)"
>>
>> What detailed knowledge does this take? Can anyone give me some hints
>> of what we need to look into?
>>
>> Also, we're not 100% sure we know what happens when a new node is
>> added. We understand that the journal needs to be replayed so that the
>> Lucene index can be updated. But is the Lucene index the only thing
>> that needs modification when a new node is started?
>> If so, should this procedure work:
>> 1. Take a complete snapshot (disk image) of one of the live nodes -
>> including the Lucene index
On Tue, Sep 14, 2010 at 11:41 AM, Ard Schrijvers <[email protected]> wrote:
> Although I am not too familiar with the clustered setup (others at my
> company are), I know that this is not possible unfortunately. The
> problem is that the most recent Lucene index is an in-memory one. You
> cannot get correct snapshots from the index. It is something I'd love
> to get improved at some point in Jackrabbit.

OK, but what if we shut down the application before taking the snapshot?
Will this give us a usable starting point?
The procedure would then be:
1. Shut down the live node A
2. Take a disk image snapshot of A
3. Use the disk image to create a new instance B
4. Alter the cluster node ID in B's repository.xml
5. Restart A
6. Start B

> A more general part of the same thing, but also a very large job,
> would be to see how clustering would work out with optionally
> using Infinispan to keep a clustered in-memory Lucene index, and using 1
> or 2 repository nodes to store Lucene segments into the database. This
> way, I think the journals largely become redundant, adding repository
> nodes to a cluster is trivial, and the database can also contain the
> persisted Lucene segments. I am confident this can work as some people
> are using Hibernate clustered in this way. It does however imply large
> refactoring in the Jackrabbit query package: for example moving from
> the multi-index to a single one and just using re-open on the index
> reader.

+1 :)

--
Vidar S. Ramdal <[email protected]> - http://www.idium.no
Sommerrogata 13-15, N-0255 Oslo, Norway
+47 22 00 84 00 / +47 22 00 84 76
Quando omni flunkus moritatus!
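
Step 4 of the procedure above (altering the cluster node ID in B's repository.xml) could be scripted roughly as follows. This is only a minimal sketch, assuming the node ID is declared as the id attribute of the <Cluster> element in repository.xml; if the ID is supplied some other way, the function simply raises. The function name set_cluster_id, the sample XML and the node names nodeA/nodeB are made up for illustration. The sketch only rewrites the ID; it says nothing about whether the cloned node's journal revision ends up correct, which is the open question in this thread.

```python
import re

# Matches the id attribute on the <Cluster ...> start tag.
# Plain-text substitution is used (rather than an XML parser) so that the
# DOCTYPE, comments and formatting of the file are left untouched.
_CLUSTER_ID = re.compile(r'(<Cluster\b[^>]*?\bid=")([^"]*)(")')


def set_cluster_id(xml_text: str, new_id: str) -> str:
    """Return xml_text with the <Cluster> element's id set to new_id."""
    new_text, count = _CLUSTER_ID.subn(
        lambda m: m.group(1) + new_id + m.group(3), xml_text, count=1
    )
    if count == 0:
        # No <Cluster id="..."> found: refuse to guess.
        raise ValueError("no <Cluster id=...> element found")
    return new_text


# Hypothetical, heavily trimmed stand-in for node A's repository.xml.
SAMPLE = """<Repository>
  <Cluster id="nodeA" syncDelay="2000">
    <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal"/>
  </Cluster>
</Repository>"""

if __name__ == "__main__":
    print(set_cluster_id(SAMPLE, "nodeB"))
```

In practice one would read B's repository.xml from the cloned disk image, pass its contents through set_cluster_id, and write the result back before starting B for the first time.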
