On 27 Oct 2009, at 16:38, Thomas Müller wrote:
Hi,
- two cluster nodes working for a while
- 100000 revisions in the datastore
- add a third cluster node
- it's replaying 100000 journal entries

Is there a way of having the third (new) cluster node start at the latest
Global-Revision immediately?
There seems to be a related feature:
https://issues.apache.org/jira/browse/JCR-1087 - I'm not sure if this
will solve the problem however (I don't really know this feature)
We have been running in production with a similar solution to
JCR-1087. We have a Perl script that creates a consistent snapshot of
the local disk state (through repetitive rsyncs) and stores that
snapshot on a central server.
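The "repetitive rsync" part is essentially a loop that keeps copying until one
pass finds nothing left to transfer; roughly like the sketch below (Java here
just for illustration, our real script is Perl, and the paths, the retry limit
and the rsync options are assumptions, with the push to the central server left
out):

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;

    // Rough sketch of "repeat rsync until a pass copies nothing".
    // Paths, retry limit and rsync options are illustrative assumptions,
    // not taken from our actual Perl script.
    public class SnapshotSketch {

        public static void main(String[] args) throws IOException, InterruptedException {
            String repoHome = "/opt/jackrabbit/repository/";   // hypothetical node home
            String snapshotDir = "/snapshots/node1/current/";  // hypothetical staging area

            // Repeat the copy until one pass itemizes no changes (or we give up);
            // the last pass is then our "consistent enough" snapshot.
            for (int pass = 0; pass < 10; pass++) {
                if (rsyncPass(repoHome, snapshotDir) == 0) {
                    System.out.println("quiet pass after " + (pass + 1) + " rsync run(s)");
                    return;
                }
            }
            throw new IllegalStateException("repository never settled; snapshot discarded");
        }

        // Runs one rsync pass and returns the number of itemized changes.
        // Assumes "rsync -ai" prints one line per changed file and nothing otherwise.
        private static int rsyncPass(String from, String to)
                throws IOException, InterruptedException {
            Process p = new ProcessBuilder("rsync", "-a", "-i", "--delete", from, to)
                    .redirectErrorStream(true)
                    .start();
            int changes = 0;
            BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()));
            while (r.readLine() != null) {
                changes++;
            }
            r.close();
            p.waitFor();
            return changes;
        }
    }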
When a new node comes up, it pulls the snapshot from the central
server, adjusts some of the settings and starts the JVM up. At this
point Jackrabbit replays the part of the journal written since the snapshot
was taken.
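The restore side amounts to little more than starting Jackrabbit on top of the
restored files; a minimal sketch (the paths and the node id system property
are assumptions, not our actual startup script):

    import org.apache.jackrabbit.core.RepositoryImpl;
    import org.apache.jackrabbit.core.config.RepositoryConfig;

    // Rough sketch of what our startup script does after the snapshot has been
    // rsynced into the local repository home. Paths are hypothetical, and the
    // cluster node id property is how I understand Jackrabbit picks up the id
    // when it is not hard-coded in repository.xml.
    public class RestoreSketch {

        public static void main(String[] args) throws Exception {
            String repoHome = "/opt/jackrabbit/repository";   // restored snapshot lives here
            String repoXml = repoHome + "/repository.xml";

            // The snapshot was taken on another node, so this node needs its own id;
            // everything else (including the local revision file from the snapshot)
            // is used as-is.
            System.setProperty("org.apache.jackrabbit.core.cluster.node_id", "node3");

            // Starting the repository makes the cluster journal catch up: only the
            // entries written after the snapshot's revision are replayed.
            RepositoryConfig config = RepositoryConfig.create(repoXml, repoHome);
            RepositoryImpl repository = RepositoryImpl.create(config);

            // ... serve requests ...

            repository.shutdown();
        }
    }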
When the snapshots are stored, we look into the local revisions file,
extract the revision, and store it. A separate process then deletes
journal records from the database prior to the earliest snapshot,
which keeps both the size of the journal and the startup time down.
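That cleanup boils down to one DELETE keyed on the oldest stored snapshot
revision; a rough sketch of the step, assuming the default DatabaseJournal
table and column names and that the instance revision file holds a single
64-bit value (the connection details are obviously made up):

    import java.io.DataInputStream;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    // Rough sketch of the journal cleanup: read the revision the oldest kept
    // snapshot was taken at and delete older journal rows. The revision file
    // format (one big-endian long) and the JOURNAL/REVISION_ID names are
    // assumptions based on the default DatabaseJournal setup; adjust for your
    // schemaObjectPrefix. The JDBC URL is hypothetical.
    public class JournalPruneSketch {

        public static void main(String[] args) throws IOException, SQLException {
            long earliestSnapshotRevision = readLocalRevision("/snapshots/node1/revision");

            Connection con = DriverManager.getConnection(
                    "jdbc:mysql://db-host/jackrabbit", "jackrabbit", "secret");
            try {
                PreparedStatement stmt = con.prepareStatement(
                        "DELETE FROM JOURNAL WHERE REVISION_ID < ?");
                stmt.setLong(1, earliestSnapshotRevision);
                int removed = stmt.executeUpdate();
                System.out.println("removed " + removed
                        + " journal records older than revision " + earliestSnapshotRevision);
                stmt.close();
            } finally {
                con.close();
            }
        }

        // The instance revision file appears to contain a single big-endian long.
        private static long readLocalRevision(String path) throws IOException {
            DataInputStream in = new DataInputStream(new FileInputStream(path));
            try {
                return in.readLong();
            } finally {
                in.close();
            }
        }
    }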
The JVMs (we use between 3 and 8, depending on the load at the time) are
hosted on Xen-based Linux virtual machines, and over the past 18
months in production I believe we have recreated them many times
with no problems (or at least none that I've been told about).
Although the approach is a little agricultural and the repetitive
rsync can take a while to get a solid snapshot (and we do sometimes
get a bad one when the indexes are halfway through some optimization),
it works with JR 1.4, and we get at least 3 parallel snapshots of the
local node state at any one time (in fact we keep several old versions
for each node). The nice part is that the JR startup script always starts
from a snapshot, so the startup time is always acceptable.
Looking at the comments on JCR-1087, it does some of the same things.
Ian
If I temporarily shut down the second cluster node, I receive the following
error messages during synchronization when restarting this second node:
I am not sure; it sounds like a bug. Could you create a Jira issue
for this, together with a simple reproducible test case?
Regards,
Thomas