Re: Revision cleanup

Ian Boston Tue, 09 Feb 2010 01:29:37 -0800

On 8 Feb 2010, at 20:26, Michael Yin wrote:

> Our jackrabbit 1.4.x db has 80000 revisions. If we don't care about
> version history, but also want to add new 'cluster nodes' at any point
> but don't want to sit waiting for jackrabbit to process 80,000
> revisions, is there any way to reset the revision counter to speed that
> up? Currently we tend to copy around local repo folder, but that is just
> asking for corruption.
>



We have been running in production for about 18 months in an 8-node cluster with 
JR 1.4. Our app servers are hosted on Xen VMs, and we drop and recreate them to 
adjust for load. Here is what we do.

1. We rsync the local repo to a shared server, repeating the rsync until a 
complete pass, from beginning to end, finds no modified files on disk.
1a. Once we have a stable copy, we tar it up and send it to a central backup 
server as a "snapshot" of the local node.
2. To determine whether the snapshot is stable, we read the local revision file 
from the local repo and compare it to the revision in the central DB. If they 
are the same, we know nothing was pending in the local state, so if there are 
also no rsync changes, the snapshot is stable and in sync with the DB.
3. We store the local revision numbers of all the snapshots in one place, 
and periodically clean the revision history in the DB up to the lowest of those 
revision numbers.
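
A minimal sketch of the checks in steps 1 to 3, written in Python rather than 
the Perl we actually used. It is an illustration only: the function names are 
hypothetical, run_rsync stands for whatever performs one rsync pass and reports 
how many files it changed, and reading the revision numbers (local revision 
file, central DB) is left out entirely.

```python
from typing import Callable, Iterable


def sync_until_stable(run_rsync: Callable[[], int], max_passes: int = 10) -> bool:
    """Repeat rsync passes until one pass changes nothing.

    run_rsync is a hypothetical callable that performs one rsync pass and
    returns the number of files it transferred or modified.
    Returns False if no clean pass happened within max_passes.
    """
    for _ in range(max_passes):
        if run_rsync() == 0:
            return True
    return False


def snapshot_is_stable(changed_in_last_pass: int,
                       local_revision: int,
                       db_revision: int) -> bool:
    """Step 2: stable means no rsync changes AND local revision == DB revision."""
    return changed_in_last_pass == 0 and local_revision == db_revision


def safe_cleanup_revision(snapshot_revisions: Iterable[int]) -> int:
    """Step 3: the DB history may be cleaned up to the lowest snapshot revision."""
    revisions = list(snapshot_revisions)
    if not revisions:
        # Without any snapshot there is no safe cutoff, so refuse to guess.
        raise ValueError("no snapshot revisions recorded")
    return min(revisions)
```

The cutoff is the minimum because any snapshot older than the cleaned history 
could no longer catch up when it is restored.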

On creation of a VM to join the cluster, we:
1. Find the latest snapshot from any node.
2. Unpack the snapshot.
3. Modify local settings (server ID etc.).
4. Bring the node up, at which point it catches up with the rest of the 
cluster, usually with a delay of < 1 min.

This was all implemented as Perl scripts and, as I say, has been good for about 
18 months. The nice part is that at any one time we have about 8 good snapshots, 
so if for any reason one is bad, there are 7 more to try.
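
The snapshot selection with fallback could look roughly like the sketch below. 
Again this is Python for illustration, not our Perl scripts. The Snapshot 
record, the function names and the unpack callable are all hypothetical, and 
"latest" is assumed here to mean the snapshot with the highest recorded 
revision number.

```python
import tarfile
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Snapshot:
    node_id: str   # node the snapshot was taken from
    revision: int  # local revision recorded when the snapshot was taken
    path: str      # location of the tarball on the backup server


def snapshots_newest_first(snapshots: Iterable[Snapshot]) -> List[Snapshot]:
    """Order snapshots so the one closest to the cluster's state comes first."""
    return sorted(snapshots, key=lambda s: s.revision, reverse=True)


def restore_first_good(snapshots: Iterable[Snapshot],
                       unpack: Callable[[Snapshot], None]) -> Snapshot:
    """Try snapshots newest first; fall back to the next one if unpacking fails.

    unpack is a hypothetical callable that extracts the tarball into the new
    node's repo directory and raises OSError or tarfile.TarError on failure.
    """
    for snap in snapshots_newest_first(snapshots):
        try:
            unpack(snap)
            return snap
        except (OSError, tarfile.TarError):
            continue  # bad snapshot, try the next newest
    raise RuntimeError("no usable snapshot found")
```

After a snapshot is restored, the local settings (server ID etc.) still have to 
be changed before the node is started.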

The critical part is to get the snapshot stable before taking it. Unfortunately, 
there is no way of pausing JR to allow this to happen, although we could have 
put something into the ClusterNode implementation to trigger a snapshot. I 
suspect that under really heavy load this would not work.

Ian


> 
> 
> I was thinking about exporting to XML then reimporting into a clean
> repo, but there must be a better way than that. 
> 
> 
> 
> -mike
>
