Performance question when clustering

Dennis van der Laan Mon, 19 Apr 2010 02:47:56 -0700

Hi all,

We are setting up a clustered Jackrabbit environment as the data storage
for our (custom) CMS. We are using Jackrabbit 1.6.0 with an Oracle 10g
database, a bundle persistence manager and fine-grained ISM locking.
Whenever the repository is accessed through a JcrSession, we first do a
session.refresh(). We assumed clustering would add some overhead, but
that, given the much-talked-about performance bottleneck in the
PersistenceManager and/or SharedItemStateManager, the performance boost
from clustering would compensate for this overhead.
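
To make the access pattern concrete, here is a minimal sketch of "refresh
before every use". RepoSession is a hypothetical stand-in for
javax.jcr.Session (whose real method is refresh(boolean keepChanges)), so
that the snippet compiles without the JCR API on the classpath. The helper
names withFreshSession and countRefreshes are made up for illustration and
are not part of Jackrabbit or of our CMS.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class RefreshBeforeUse {
    /** Hypothetical stand-in for javax.jcr.Session; only refresh is modelled. */
    interface RepoSession {
        void refresh(boolean keepChanges);
    }

    /** Refreshes the session first, then runs the given work against it. */
    static <T> T withFreshSession(RepoSession session, Function<RepoSession, T> work) {
        // With a real clustered session, this call ends up in ClusterNode.sync().
        session.refresh(false);
        return work.apply(session);
    }

    /** Performs the given number of accesses and returns how many refreshes they caused. */
    static int countRefreshes(int accesses) {
        AtomicInteger refreshes = new AtomicInteger();
        RepoSession session = keepChanges -> refreshes.incrementAndGet();
        for (int i = 0; i < accesses; i++) {
            withFreshSession(session, s -> "ok");
        }
        return refreshes.get();
    }

    public static void main(String[] args) {
        System.out.println("refreshes for 5 accesses: " + countRefreshes(5));
    }
}
```

The point of the sketch is only that every repository access pays for one
refresh, and therefore for one cluster sync.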


What we have observed is that having more cluster nodes using the same
Jackrabbit repository has a significant impact on performance, with
degradation that is almost linear in the number of nodes. We ran a
performance test that uploads files to just one machine in a two-machine
cluster, and then ran the same test on a six-machine cluster. The first
test was about 3x faster than the second. Most of the time, the threads
are waiting on a Mutex in
org.apache.jackrabbit.core.cluster.ClusterNode.sync() (called from
SessionImpl.refresh()). Uploading to multiple cluster machines at the
same time only seemed to increase the performance impact, probably
because the Mutex is held longer when other nodes in the cluster have
updates, too.
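
As a back-of-the-envelope illustration of that suspicion, the toy model
below computes how long threads wait on a single mutex when the hold time
grows with the number of pending updates from other nodes. It is not
Jackrabbit code: the class SyncWaitModel, the linear cost formula and all
numbers are assumptions made up for the example, not measurements.

```java
public class SyncWaitModel {
    /**
     * Mean time a thread waits when `threads` threads request the same mutex
     * at once and each holds it for holdMillis, served one at a time.
     */
    static double meanWaitMillis(int threads, double holdMillis) {
        double totalWait = 0;
        for (int position = 0; position < threads; position++) {
            // A thread waits for every thread that is ahead of it in the queue.
            totalWait += position * holdMillis;
        }
        return totalWait / threads;
    }

    /** Assumed hold time: a fixed cost plus a cost per update from other nodes. */
    static double holdMillis(double baseMillis, double perUpdateMillis, int pendingUpdates) {
        return baseMillis + perUpdateMillis * pendingUpdates;
    }

    public static void main(String[] args) {
        double quiet = meanWaitMillis(4, holdMillis(10, 2, 0));
        double busy = meanWaitMillis(4, holdMillis(10, 2, 5));
        System.out.println("mean wait, no remote updates: " + quiet + " ms");
        System.out.println("mean wait, 5 remote updates:  " + busy + " ms");
    }
}
```

Under these assumptions the mean wait scales directly with the hold time,
which would match what we see if each sync has more remote updates to
process.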

So my questions are:
- Is it a good idea to call session.refresh() every time we use the
session?
- Is there a difference between calling session.refresh() and the
automatic sync done by the ClusterNode thread?
- Why is a refresh more expensive when there are more cluster nodes?

Thanks for any information!

Dennis

-- 
Dennis van der Laan, MSc
Centre for Information Technology
University of Groningen
