Hi all,

We are setting up a clustered Jackrabbit environment as the data store for our (custom) CMS. We are using Jackrabbit 1.6.0 with an Oracle 10g database, the bundle persistence manager and fine-grained ISM locking. Whenever the repository is accessed through a JcrSession, we first call session.refresh().

We assumed clustering would add some overhead, but given the much-talked-about performance bottleneck in the PersistenceManager and/or SharedItemStateManager, we expected the performance gain from spreading the load over several nodes to compensate for this overhead.
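A minimal sketch of the refresh-before-use access pattern described above, not the actual CMS code. RefreshableSession is a stand-in for the single refresh(boolean keepChanges) method of javax.jcr.Session, so the example compiles without the JCR API on the classpath. RefreshingAccess, CountingSession and RefreshBeforeUseDemo are hypothetical names, and the keepChanges value passed in the demo is an assumption.

```java
import java.util.concurrent.Callable;

// Stand-in for the one method of javax.jcr.Session used in this sketch.
// In a real setup the JCR Session itself would be used instead.
interface RefreshableSession {
    void refresh(boolean keepChanges) throws Exception;
}

// Hypothetical helper: refresh the session first, then run the caller's work.
final class RefreshingAccess {
    private RefreshingAccess() {}

    static <T> T withFreshSession(RefreshableSession session,
                                  boolean keepChanges,
                                  Callable<T> work) throws Exception {
        // In a clustered Jackrabbit this is the call that ends up waiting in
        // ClusterNode.sync(), according to the observation in this post.
        session.refresh(keepChanges);
        return work.call();
    }
}

// Fake session that only counts refresh calls, to make the cost visible:
// one refresh per repository access.
final class CountingSession implements RefreshableSession {
    int refreshCount = 0;

    @Override
    public void refresh(boolean keepChanges) {
        refreshCount++;
    }
}

final class RefreshBeforeUseDemo {
    public static void main(String[] args) throws Exception {
        CountingSession session = new CountingSession();
        for (int i = 0; i < 3; i++) {
            final int n = i;
            // keepChanges=false is an assumption made for this demo only.
            String result = RefreshingAccess.withFreshSession(
                    session, false, () -> "upload " + n);
            System.out.println(result);
        }
        // Three accesses lead to three refresh calls.
        System.out.println("refresh calls: " + session.refreshCount);
    }
}
```

With this pattern every repository access pays for one refresh, so any cost inside refresh() is multiplied by the number of accesses.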
What we have observed is that adding more cluster nodes to the same Jackrabbit repository has a significant impact on performance, with almost linear degradation. We ran a performance test that uploads files to just one machine in a two-machine cluster, and then ran the same test in a six-machine cluster. The first test was about 3x faster than the second. Most of the time, the threads are waiting on a Mutex in org.apache.jackrabbit.core.cluster.ClusterNode.sync() (called from SessionImpl.refresh()). Uploading to multiple cluster machines at the same time only seemed to make the performance impact worse, probably because the Mutex is held longer when other nodes in the cluster have updates, too.

So my questions are:

- Is it a good idea to call session.refresh() every time we use the session?
- Is there a difference between calling session.refresh() and the automatic sync done by the ClusterNode thread?
- Why is a refresh more expensive when there are more cluster nodes?

Thanks for any information!

Dennis

--
Dennis van der Laan, MSc
Centre for Information Technology
University of Groningen
