Hi Fabrice,

We are using a similar architecture, but with some different backend components:

JackRabbit: 2.7 (soon to move to 2.8)
Persistence: MySQL
Datastore: S3

The JackRabbit JAR is included in the WAR. There is a cluster of Tomcats hosting the WAR with a load balancer in front.

For the global / root file system, we use DbFileSystem so it is shared by all nodes. I'm not sure if this is wrong, but it works for us. The only anomaly is warnings or errors on startup after changes to NodeTypes; despite these errors, JackRabbit does start up and function correctly.

The workspace file system is a LocalFileSystem stored in the workspace home (in the local repository folder). Currently, each Tomcat instance will create a new local repository folder on startup and generate a unique cluster id.

We have plans to change this to something similar to what you describe: have a background process that runs periodically. After starting and shutting down JackRabbit to create an updated repository folder, the process would remove the cluster_node.id file and archive the folder. When a new node starts up, it would make a local copy of the archived repository folder before starting JackRabbit. My tests indicate this should work, but we have not yet gone ahead with a full implementation.
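Roughly what we have in mind, as a minimal sketch only; the paths are placeholders for whatever your deployment uses, and the "archive" here is a plain directory copy (a tarball would work just as well):

import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;

public class RepositoryArchiver {

    /** Copies repoHome to archiveDir, minus the node-specific cluster id. */
    public static void archive(final Path repoHome, final Path archiveDir) throws IOException {
        // The cluster id must not be shared between nodes; with the file
        // removed, each node gets its own id on first startup.
        Files.deleteIfExists(repoHome.resolve("cluster_node.id"));

        // Recursively copy the repository folder to the archive location.
        Files.walkFileTree(repoHome, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException {
                Files.createDirectories(archiveDir.resolve(repoHome.relativize(dir)));
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                Files.copy(file, archiveDir.resolve(repoHome.relativize(file)),
                        StandardCopyOption.REPLACE_EXISTING);
                return FileVisitResult.CONTINUE;
            }
        });
    }
}

A new node would do the reverse (copy the archive into its local repository folder) before starting JackRabbit, and pick up a fresh cluster id on startup.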
I'm not sure if you have a reason for assigning the cluster id yourself; if there is no cluster id, JackRabbit will generate a unique value (not sure if this is true in 2.2).

We've only run into a couple of issues running JackRabbit in this configuration:

* we need to call Session.refresh(true) before calling Session.save(); we were not doing this initially and would get occasional errors (see the sketch after this list).
* every once in a while - we have not been able to determine the conditions, but the frequency is something like every few months - the search indices will become corrupt on one of the tomcat instances. Our current fix is to stop tomcat, discard the local repository folder, then restart tomcat; JackRabbit will take some time to start up, but it rebuilds all local data in the repository folder, eliminating any corruption.
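The pattern from the first bullet looks like this; nothing here beyond the plain JCR API, and the helper wrapper is just for illustration:

import javax.jcr.RepositoryException;
import javax.jcr.Session;

public final class ClusterSafeSave {

    /**
     * refresh(true) pulls in changes persisted by other cluster nodes while
     * keeping this session's own transient (unsaved) changes, which avoided
     * the occasional save() errors we were seeing.
     */
    public static void refreshAndSave(Session session) throws RepositoryException {
        session.refresh(true); // true = keep our pending changes
        session.save();
    }
}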
I think your approach should work, although I'm not sure you need to manually assign cluster ids (unless you have some reason for controlling them). Let me know if you have any other questions about our setup.

Morrell

On May 15, 2014, at 1:09 PM, Fabrice Aupert <[email protected]> wrote:

> Hi,
>
> We're building a 'document manager' for an existing J2EE (Java 5 / WebSphere
> 6.1) webapp deployed in a cluster. This manager has to be fully integrated
> into the webapp. Due to production constraints, storing data in a shared
> filesystem is not an option. All data/metadata must be stored in an Oracle
> 10g DB.
>
> We have a working prototype based on Jackrabbit 2.2.13. On each node of the
> cluster, the webapp embeds the Jackrabbit JAR and owns a dedicated repository
> directory on the local filesystem. This repo contains a repository.xml file
> which is pretty much the same on all nodes except for <Cluster id=""> (see
> attached file). Once the webapp has started, the local 'repository' directory
> contains only a few files, essentially indexes.
>
> Example:
>
> ./repository
> ./repository/repository.xml
> ./repository/workspaces
> ./repository/workspaces/security
> ./repository/workspaces/security/workspace.xml
> ./repository/workspaces/security/index
> ./repository/workspaces/security/index/indexes_2
> ./repository/workspaces/security/index/_0
> ./repository/workspaces/security/index/_0/cache.inSegmentParents
> ./repository/workspaces/security/index/_0/segments_1
> ./repository/workspaces/security/index/_0/segments.gen
> ./repository/workspaces/security/index/_0/segments_2
> ./repository/workspaces/security/index/_0/_0.cfs
> ./repository/workspaces/myrepo
> ./repository/workspaces/myrepo/workspace.xml
> ./repository/workspaces/myrepo/index
> ./repository/workspaces/myrepo/index/indexes_2
> ./repository/workspaces/myrepo/index/_0
> ./repository/workspaces/myrepo/index/_0/_2.cfs
> ./repository/workspaces/myrepo/index/_0/segments_4
> ./repository/workspaces/myrepo/index/_0/cache.inSegmentParents
> ./repository/workspaces/myrepo/index/_0/segments_1
> ./repository/workspaces/myrepo/index/_0/segments.gen
> ./repository/revision.log
>
> From what we've seen, a thread is started on each node by Jackrabbit to
> refresh the indexes periodically, allowing synchronization inside the
> cluster.
>
> This architectural layout seems to work but, as we lack any real-world
> experience with Jackrabbit in this context, we would like to check with the
> community that we're not bending Jackrabbit's capabilities in the wrong
> direction. Could it lead to silent data corruption/inconsistencies?
>
> The second point is about giving operations decent tooling to manage the
> Jackrabbit repo:
>
> - Admin console: as our embedded Jackrabbit does not expose an RMI or WebDAV
> interface, we were thinking of relying on jackrabbit-standalone (in either
> CLI or server mode): by copying a repository.xml, changing its cluster id,
> and starting a new session with the standalone version from this file, we
> could manage our nodes and search (using JackrabbitExplorer on top of it,
> for example). Could this be a viable solution?
>
> - Repo inconsistencies: does the OraclePersistenceManager really support the
> <param name="consistencyFix" value="true" /> parameter? It does not seem so.
> Are there other tools we could use to investigate and fix problems inside
> the repo data?
>
> Any input on this matter would be extremely valuable to us.
>
> Thanks.
>
> Fabrice Aupert
