Hi,

We're building a 'document manager' for an existing J2EE (Java 5 / WebSphere 6.1) webapp deployed in a cluster. This manager has to be fully integrated into the webapp. Due to production constraints, storing data in a shared filesystem is not an option. All data and metadata must be stored in an Oracle 10g DB.
We have a working prototype based on Jackrabbit 2.2.13. On each node of the cluster, the webapp embeds the Jackrabbit JAR and owns a dedicated repository directory on the local filesystem. This directory contains a repository.xml file which is pretty much the same on all nodes except for <Cluster id=""> (see attached file).

Once the webapp has started, the local 'repository' directory contains only a few files, mostly index files. Example:

    ./repository
    ./repository/repository.xml
    ./repository/workspaces
    ./repository/workspaces/security
    ./repository/workspaces/security/workspace.xml
    ./repository/workspaces/security/index
    ./repository/workspaces/security/index/indexes_2
    ./repository/workspaces/security/index/_0
    ./repository/workspaces/security/index/_0/cache.inSegmentParents
    ./repository/workspaces/security/index/_0/segments_1
    ./repository/workspaces/security/index/_0/segments.gen
    ./repository/workspaces/security/index/_0/segments_2
    ./repository/workspaces/security/index/_0/_0.cfs
    ./repository/workspaces/myrepo
    ./repository/workspaces/myrepo/workspace.xml
    ./repository/workspaces/myrepo/index
    ./repository/workspaces/myrepo/index/indexes_2
    ./repository/workspaces/myrepo/index/_0
    ./repository/workspaces/myrepo/index/_0/_2.cfs
    ./repository/workspaces/myrepo/index/_0/segments_4
    ./repository/workspaces/myrepo/index/_0/cache.inSegmentParents
    ./repository/workspaces/myrepo/index/_0/segments_1
    ./repository/workspaces/myrepo/index/_0/segments.gen
    ./repository/revision.log

From what we've seen, Jackrabbit starts a thread on each node that periodically refreshes the indexes, which keeps the nodes of the cluster synchronized.

This architectural layout seems to work but, as we lack any real-world experience with Jackrabbit in this context, we would like to check with the community that we're not bending Jackrabbit's capabilities in the wrong direction. Could it lead to silent data corruption or inconsistencies?
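Since repository.xml differs between nodes only in the <Cluster id=""> attribute, one way to produce each node's copy is to fill that attribute into a shared template. The following is only a minimal sketch under that assumption: the class name ClusterIdRewriter, the method withClusterId and the sample template are hypothetical and are not part of Jackrabbit or of our attached configuration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a Jackrabbit API): fills the id attribute of the
// <Cluster> element in a repository.xml template shared by all nodes.
public class ClusterIdRewriter {

    // Captures the current value (possibly empty) of the id attribute of <Cluster ...>.
    private static final Pattern CLUSTER_ID =
            Pattern.compile("<Cluster\\b[^>]*?\\bid=\"([^\"]*)\"");

    // Returns a copy of templateXml whose <Cluster id="..."> value is nodeId.
    // nodeId is inserted as-is, so it must not contain XML special characters.
    static String withClusterId(String templateXml, String nodeId) {
        Matcher m = CLUSTER_ID.matcher(templateXml);
        if (!m.find()) {
            throw new IllegalArgumentException("no <Cluster id=\"...\"> element found");
        }
        // Splice the new id over the captured value, leaving everything else untouched.
        return templateXml.substring(0, m.start(1)) + nodeId + templateXml.substring(m.end(1));
    }

    public static void main(String[] args) {
        // Illustrative template only; a real repository.xml contains much more.
        String template =
                "<Repository>\n  <Cluster id=\"\" syncDelay=\"2000\"/>\n</Repository>";
        System.out.println(withClusterId(template, "node1"));
    }
}
```

As far as we understand, Jackrabbit can also take the node id from the system property org.apache.jackrabbit.core.cluster.node_id when the id attribute is omitted, which would avoid rewriting the file at all. We have not verified this on 2.2.13.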
The second point is about giving operations decent tooling to manage the Jackrabbit repo:

- Admin console: our embedded Jackrabbit does not expose an RMI or WebDAV interface, so we were thinking of relying on jackrabbit-standalone (in either CLI or server mode). By copying a repository.xml, changing its cluster id, and starting a new session with the standalone version from this file, we could manage and search our nodes (using jackrabbitexplorer on top of it, for example). Could it be a viable solution?

- Repo inconsistencies: does the OraclePersistenceManager really support <param name="consistencyFix" value="true" />? It does not seem so. Are there other tools we could use to investigate and fix problems inside repo data?

Any input on this matter would be extremely valuable to us.

Thanks.

Fabrice Aupert
