We're setting up a clustered Jackrabbit application. The application has high traffic, so we're concerned that the Journal table will grow very large. This, in turn, will make setting up new nodes time-consuming, because each new node starts by replaying the journal to get up to date.
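To make the cost concrete, here is a minimal, hypothetical model of journal replay. It is not Jackrabbit's API. It assumes, as a simplification, that the journal is an ordered list of revisions and that each cluster node records the last revision it has processed under its cluster node ID. A node ID with no recorded revision starts from zero, so it replays every entry. The class and method names (`JournalReplaySketch`, `append`, `sync`) are illustrative only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model of a shared cluster journal (not Jackrabbit's API). */
public class JournalReplaySketch {
    // Shared journal: the entry at index i has revision i + 1.
    private final List<String> journal = new ArrayList<>();
    // Assumption: last revision each cluster node has processed, keyed by cluster node ID.
    private final Map<String, Integer> localRevisions = new HashMap<>();

    void append(String change) {
        journal.add(change);
    }

    /** Replays every entry newer than the node's recorded revision; returns how many were replayed. */
    int sync(String clusterNodeId) {
        // A node ID with no recorded revision starts from 0, i.e. it replays the whole journal.
        int from = localRevisions.getOrDefault(clusterNodeId, 0);
        int replayed = 0;
        for (int rev = from + 1; rev <= journal.size(); rev++) {
            // A real node would apply journal.get(rev - 1) to its caches and search index here.
            replayed++;
        }
        localRevisions.put(clusterNodeId, journal.size());
        return replayed;
    }

    public static void main(String[] args) {
        JournalReplaySketch sketch = new JournalReplaySketch();
        for (int i = 0; i < 1000; i++) {
            sketch.append("change-" + i);
        }
        System.out.println(sketch.sync("node1")); // 1000: first sync replays everything
        sketch.append("change-1000");
        System.out.println(sketch.sync("node1")); // 1: only the new entry
        System.out.println(sketch.sync("node2")); // 1001: an unseen node ID replays the full journal
    }
}
```

In this model the replay cost for a new node grows linearly with the journal length, which is the concern described above.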
At [1], the concept of the janitor is described: it cleans the journal table at certain intervals. However, the list of caveats states that "If the janitor is enabled then you loose the possibility to easily add cluster nodes. (It is still possible but takes detailed knowledge of Jackrabbit.)"

What detailed knowledge does this take? Can anyone give me some hints on what we need to look into?

Also, we're not 100% sure we understand what happens when a new node is added. We understand that the journal needs to be replayed so that the Lucene index can be updated. But is the Lucene index the only thing that needs modification when a new node is started? If so, should this procedure work:

1. Take a complete snapshot (disk image) of one of the live nodes, including the Lucene index.
2. Use the disk image to set up a new node.
3. Assign a new, unique cluster node ID to the new node.

However, when we tried this procedure, the new node still replayed the entire journal. Is there anything more we need to add to the procedure so that we can add new nodes without having to replay all (perhaps a year's worth of) journal entries?

[1] http://wiki.apache.org/jackrabbit/Clustering#Removing_Old_Revisions

-- 
Vidar S. Ramdal - http://www.idium.no
Sommerrogata 13-15, N-0255 Oslo, Norway
+47 22 00 84 00 / +47 22 00 84 76
Quando omni flunkus moritatus!
