Hi Fabrice,

We are using a similar architecture, but with some different backend components:

JackRabbit: 2.7 (soon to move to 2.8)
Persistence: MySQL
Datastore: S3

The JackRabbit JAR is included in the WAR. There is a cluster of Tomcats hosting the WAR with a load balancer in front.

For the global / root file system, we use DbFileSystem so it is shared by all nodes. I'm not sure if this is wrong, but it works for us. The only anomaly is warnings or errors on startup after changes to NodeTypes; despite these errors, JackRabbit does start up and function correctly.

The workspace file system is a LocalFileSystem stored in the workspace home (in the local repository folder). Currently, each Tomcat instance will create a new local repository folder on startup and generate a unique cluster id.

We have plans to change this to something similar to what you describe: have a background process that runs periodically. After starting and shutting down JackRabbit to create an updated repository folder, the process would remove the cluster_node.id file and archive the folder. When a new node starts up, it would make a local copy of the archived repository folder before starting JackRabbit. My tests indicate this should work, but we have not yet gone ahead with a full implementation.
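Roughly what we have in mind, as a minimal sketch only; the paths are placeholders for whatever your deployment uses, and the "archive" here is a plain directory copy (a tarball would work just as well):

import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;

public class RepositoryArchiver {

    /** Copies repoHome to archiveDir, minus the node-specific cluster id. */
    public static void archive(final Path repoHome, final Path archiveDir) throws IOException {
        // The cluster id must not be shared between nodes; with the file
        // removed, each node gets its own id on first startup.
        Files.deleteIfExists(repoHome.resolve("cluster_node.id"));

        // Recursively copy the repository folder to the archive location.
        Files.walkFileTree(repoHome, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException {
                Files.createDirectories(archiveDir.resolve(repoHome.relativize(dir)));
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                Files.copy(file, archiveDir.resolve(repoHome.relativize(file)),
                        StandardCopyOption.REPLACE_EXISTING);
                return FileVisitResult.CONTINUE;
            }
        });
    }
}

A new node would do the reverse (copy the archive into its local repository folder) before starting JackRabbit, and pick up a fresh cluster id on startup.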
I'm not sure if you have a reason for assigning the cluster id yourself; if there is no cluster id, JackRabbit will generate a unique value (not sure if this is true in 2.2).

We've only run into a couple of issues running JackRabbit in this configuration:

* we need to call Session.refresh(true) before calling Session.save(); we were not doing this initially and would get occasional errors (see the sketch after this list).
* every once in a while - we have not been able to determine the conditions, but the frequency is something like every few months - the search indices will become corrupt on one of the tomcat instances. Our current fix is to stop tomcat, discard the local repository folder, then restart tomcat; JackRabbit will take some time to start up, but it rebuilds all local data in the repository folder, eliminating any corruption.
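The pattern from the first bullet looks like this; nothing here beyond the plain JCR API, and the helper wrapper is just for illustration:

import javax.jcr.RepositoryException;
import javax.jcr.Session;

public final class ClusterSafeSave {

    /**
     * refresh(true) pulls in changes persisted by other cluster nodes while
     * keeping this session's own transient (unsaved) changes, which avoided
     * the occasional save() errors we were seeing.
     */
    public static void refreshAndSave(Session session) throws RepositoryException {
        session.refresh(true); // true = keep our pending changes
        session.save();
    }
}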
I think your approach should work, although I'm not sure you need to manually assign cluster ids (unless you have some reason for controlling them). Let me know if you have any other questions about our setup.

Morrell

On May 15, 2014, at 1:09 PM, Fabrice Aupert <[email protected]> wrote:

> Hi,
>
> We're building a 'document manager' for an existing J2EE (Java 5 / WebSphere
> 6.1) webapp deployed in a cluster. This manager has to be fully integrated
> into the webapp. Due to production constraints, storing data in a shared
> filesystem is not an option. All data/metadata must be stored in an Oracle
> 10g DB.
>
> We have a working prototype based on Jackrabbit 2.2.13. On each node of the
> cluster, the webapp embeds the Jackrabbit JAR and owns a dedicated repository
> directory on the local filesystem. This repo contains a repository.xml file
> which is pretty much the same on all nodes except for <Cluster id=""> (see
> attached file). Once the webapp has started, the local 'repository' directory
> contains only a few files, essentially indexes.
>
> Example:
>
> ./repository
> ./repository/repository.xml
> ./repository/workspaces
> ./repository/workspaces/security
> ./repository/workspaces/security/workspace.xml
> ./repository/workspaces/security/index
> ./repository/workspaces/security/index/indexes_2
> ./repository/workspaces/security/index/_0
> ./repository/workspaces/security/index/_0/cache.inSegmentParents
> ./repository/workspaces/security/index/_0/segments_1
> ./repository/workspaces/security/index/_0/segments.gen
> ./repository/workspaces/security/index/_0/segments_2
> ./repository/workspaces/security/index/_0/_0.cfs
> ./repository/workspaces/myrepo
> ./repository/workspaces/myrepo/workspace.xml
> ./repository/workspaces/myrepo/index
> ./repository/workspaces/myrepo/index/indexes_2
> ./repository/workspaces/myrepo/index/_0
> ./repository/workspaces/myrepo/index/_0/_2.cfs
> ./repository/workspaces/myrepo/index/_0/segments_4
> ./repository/workspaces/myrepo/index/_0/cache.inSegmentParents
> ./repository/workspaces/myrepo/index/_0/segments_1
> ./repository/workspaces/myrepo/index/_0/segments.gen
> ./repository/revision.log
>
> From what we've seen, a thread is started on each node by Jackrabbit to
> refresh the indexes periodically, allowing synchronization inside the
> cluster.
>
> This architectural layout seems to work but, as we lack any real-world
> experience with Jackrabbit in this context, we would like to check with the
> community that we're not bending Jackrabbit's capabilities in the wrong
> direction. Could it lead to silent data corruption/inconsistencies?
>
> The second point is about giving operations decent tooling to manage the
> Jackrabbit repo:
>
> - Admin console: as our embedded Jackrabbit does not expose an RMI or WebDAV
> interface, we were thinking of relying on jackrabbit-standalone (in either
> CLI or server mode): by copying a repository.xml, changing its cluster id,
> and starting a new session with the standalone version from this file, we
> could manage our nodes and search (using JackrabbitExplorer on top of it,
> for example). Could this be a viable solution?
>
> - Repo inconsistencies: does the OraclePersistenceManager really support the
> <param name="consistencyFix" value="true" /> parameter? It does not seem so.
> Are there other tools we could use to investigate and fix problems inside
> the repo data?
>
> Any input on this matter would be extremely valuable to us.
>
> Thanks.
>
> Fabrice Aupert
