On Wed, Sep 15, 2010 at 2:27 PM, Vidar Ramdal <[email protected]> wrote:
> On Tue, Sep 14, 2010 at 11:48 AM, Ian Boston <[email protected]> wrote:
> >
> > On 14 Sep 2010, at 19:27, Vidar Ramdal wrote:
> >
> >> We're setting up a clustered Jackrabbit application. The
> >> application has high traffic, so we're concerned that the Journal
> >> table will be very large. This, in turn, will make setting up new
> >> nodes a time-consuming task, when the new node starts replaying
> >> the journal to get up to date.
> >>
> >> At [1], the concept of the janitor is described, which cleans the
> >> journal table at certain intervals. However, the list of caveats
> >> states that "If the janitor is enabled then you lose the
> >> possibility to easily add cluster nodes. (It is still possible but
> >> takes detailed knowledge of Jackrabbit.)"
> >>
> >> What detailed knowledge does this take? Can anyone give me some
> >> hints about what we need to look into?
> >
> > Sure,
> > you need to take a good copy of the local state of a node and, for
> > good measure, extract the journal log number for that state from
> > the journal revision file. (Have a look at the ClusterNode impl to
> > locate it and the format; IIRC it's 2 binary longs.)
> >
> > Getting a good local state means one that is consistent with
> > itself and wasn't written to between the start and the end of the
> > snapshot operation. If you have high write traffic you almost
> > certainly won't be able to snapshot the local state live, and will
> > have to take a node offline before taking a snapshot. If it's high
> > read / low write you might be able to use repeated rsync
> > operations to get a good snapshot.
>
> Ian, thanks for your answer.
>
> So a "good copy of the local state" should be possible by shutting
> down the source node before taking a snapshot. That is fine with
> us, at least from the third node onwards, as long as we can leave
> one node online.
>
> >> Also, we're not 100% sure we know what happens when a new node is
> >> added.
> >
> > If there is no snapshot to start from, it will replay all journal
> > records since record 0 to build the search index and anything
> > else. If there is a snapshot it will read the record number of the
> > snapshot and replay from that point forwards.
> >
> > Before using a snapshot you must make certain that all references
> > to the server name are correct in the snapshot (look in
> > repository.xml after startup).
>
> Yes, I know about the cluster node ID in repository.xml - but is
> that the only place the ID is held?

It seems we also have to update the LOCAL_REVISIONS table, with the
cluster id as JOURNAL_ID and the revision at the time of the snapshot
as REVISION_ID (see the SQL sketch at the bottom of this mail).

> >> We understand that the journal needs to be replayed so that the
> >> Lucene index can be updated. But is the Lucene index the only
> >> thing that needs modification when a new node is started?
> >
> > AFAIK yes.
> >
> >> If so, should this procedure work:
> >> 1. Take a complete snapshot (disk image) of one of the live
> >>    nodes - including the Lucene index
> >> 2. Use the disk image to set up a new node
> >> 3. Assign a new, unique cluster node ID to the new node
> >
> > Yes (didn't need to write all that I did above :) )
> >
> >> However, when trying this procedure, we still experienced that
> >> the new node replayed the entire journal.
> >
> > Hmm, did you get the local journal record number with the
> > snapshot?
>
> Will have to double check that.
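
A few sketches from our side, in case they help others following this
thread. These are pieced together from this discussion and are not
verified against a running cluster, so treat them as assumptions
rather than a recipe.

To note down the journal record number that goes with a snapshot, one
could dump the local revision file (the file the "revision" parameter
of the Journal element points to). The format below - plain 8-byte
binary longs - is an assumption based on Ian's description; check the
ClusterNode implementation in your Jackrabbit version first:

  import java.io.DataInputStream;
  import java.io.FileInputStream;

  // Dump the binary long(s) stored in the cluster revision file.
  // Usage: java DumpRevision /path/to/revision.log
  public class DumpRevision {
      public static void main(String[] args) throws Exception {
          DataInputStream in =
              new DataInputStream(new FileInputStream(args[0]));
          try {
              while (in.available() >= 8) {
                  System.out.println(in.readLong());
              }
          } finally {
              in.close();
          }
      }
  }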
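For completeness, the cluster node ID we keep talking about is the id
attribute of the Cluster element in repository.xml. A trimmed,
hypothetical example for a third node (all values are placeholders;
the id must be unique across the cluster, and user/password params
are omitted):

  <Cluster id="node3" syncDelay="2000">
    <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
      <param name="revision" value="${rep.home}/revision.log"/>
      <param name="driver" value="org.postgresql.Driver"/>
      <param name="url" value="jdbc:postgresql://dbhost/jackrabbit"/>
      <param name="schemaObjectPrefix" value="JOURNAL_"/>
    </Journal>
  </Cluster>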
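And the LOCAL_REVISIONS change mentioned above would be something
like the following, assuming the JOURNAL_ schemaObjectPrefix from the
example above ('node3' and 123456 are placeholders - the revision
should be the number read from the snapshot's revision file):

  INSERT INTO JOURNAL_LOCAL_REVISIONS (JOURNAL_ID, REVISION_ID)
  VALUES ('node3', 123456);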

--
Erik Buene
Senior Developer
Idium AS