On Wed, Sep 15, 2010 at 2:27 PM, Vidar Ramdal <[email protected]> wrote:
> On Tue, Sep 14, 2010 at 11:48 AM, Ian Boston <[email protected]> wrote:
> >
> > On 14 Sep 2010, at 19:27, Vidar Ramdal wrote:
> >
> >> We're setting up a clustered Jackrabbit application. The
> >> application has high traffic, so we're concerned that the Journal
> >> table will be very large. This, in turn, will make setting up new
> >> nodes a time-consuming task, when the new node starts replaying
> >> the journal to get up to date.
> >>
> >> At [1], the concept of the janitor is described, which cleans the
> >> journal table at certain intervals. However, the list of caveats
> >> states that "If the janitor is enabled then you lose the
> >> possibility to easily add cluster nodes. (It is still possible but
> >> takes detailed knowledge of Jackrabbit.)"
> >>
> >> What detailed knowledge does this take? Can anyone give me some
> >> hints about what we need to look into?
> >
> > Sure,
> > you need to take a good copy of the local state of a node and, for
> > good measure, extract the journal log number for that state from
> > the journal revision file. (Have a look at the ClusterNode impl to
> > locate it and the format; IIRC it's 2 binary longs.)
> >
> > Getting a good local state means one that is consistent with
> > itself and wasn't written to between the start and the end of the
> > snapshot operation. If you have high write traffic you almost
> > certainly won't be able to snapshot the local state live, and will
> > have to take a node offline before taking a snapshot. If it's high
> > read / low write you might be able to use repeated rsync
> > operations to get a good snapshot.
>
> Ian, thanks for your answer.
>
> So a "good copy of the local state" should be possible by shutting
> down the source node before taking a snapshot. That is fine with
> us, at least from the third node onwards, as long as we can leave
> one node online.
>
> >> Also, we're not 100% sure we know what happens when a new node is
> >> added.
> >
> > If there is no snapshot to start from, it will replay all journal
> > records since record 0 to build the search index and anything
> > else. If there is a snapshot it will read the record number of the
> > snapshot and replay from that point forwards.
> >
> > Before using a snapshot you must make certain that all references
> > to the server name are correct in the snapshot (look in
> > repository.xml after startup).
>
> Yes, I know about the cluster node ID in repository.xml - but is
> that the only place the ID is held?

It seems we also have to update the LOCAL_REVISIONS table, with the
cluster id as JOURNAL_ID and the revision at the time of the snapshot
as REVISION_ID (see the SQL sketch at the bottom of this mail).

> >> We understand that the journal needs to be replayed so that the
> >> Lucene index can be updated. But is the Lucene index the only
> >> thing that needs modification when a new node is started?
> >
> > AFAIK yes.
> >
> >> If so, should this procedure work:
> >> 1. Take a complete snapshot (disk image) of one of the live
> >>    nodes - including the Lucene index
> >> 2. Use the disk image to set up a new node
> >> 3. Assign a new, unique cluster node ID to the new node
> >
> > Yes (didn't need to write all that I did above :) )
> >
> >> However, when trying this procedure, we still experienced that
> >> the new node replayed the entire journal.
> >
> > Hmm, did you get the local journal record number with the
> > snapshot?
>
> Will have to double check that.
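
A few sketches from our side, in case they help others following this
thread. These are pieced together from this discussion and are not
verified against a running cluster, so treat them as assumptions
rather than a recipe.

To note down the journal record number that goes with a snapshot, one
could dump the local revision file (the file the "revision" parameter
of the Journal element points to). The format below - plain 8-byte
binary longs - is an assumption based on Ian's description; check the
ClusterNode implementation in your Jackrabbit version first:

  import java.io.DataInputStream;
  import java.io.FileInputStream;

  // Dump the binary long(s) stored in the cluster revision file.
  // Usage: java DumpRevision /path/to/revision.log
  public class DumpRevision {
      public static void main(String[] args) throws Exception {
          DataInputStream in =
              new DataInputStream(new FileInputStream(args[0]));
          try {
              while (in.available() >= 8) {
                  System.out.println(in.readLong());
              }
          } finally {
              in.close();
          }
      }
  }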
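For completeness, the cluster node ID we keep talking about is the id
attribute of the Cluster element in repository.xml. A trimmed,
hypothetical example for a third node (all values are placeholders;
the id must be unique across the cluster, and user/password params
are omitted):

  <Cluster id="node3" syncDelay="2000">
    <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
      <param name="revision" value="${rep.home}/revision.log"/>
      <param name="driver" value="org.postgresql.Driver"/>
      <param name="url" value="jdbc:postgresql://dbhost/jackrabbit"/>
      <param name="schemaObjectPrefix" value="JOURNAL_"/>
    </Journal>
  </Cluster>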
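And the LOCAL_REVISIONS change mentioned above would be something
like the following, assuming the JOURNAL_ schemaObjectPrefix from the
example above ('node3' and 123456 are placeholders - the revision
should be the number read from the snapshot's revision file):

  INSERT INTO JOURNAL_LOCAL_REVISIONS (JOURNAL_ID, REVISION_ID)
  VALUES ('node3', 123456);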

--
Erik Buene
Senior Developer
Idium AS