Re: JackRabbit maturity?- robustness, performance and scalability

Miro Walker Wed, 15 Nov 2006 00:49:22 -0800

Hi Shaun,

Our experience with production systems has largely been with Day's
commercially licensed version of Jackrabbit, CRX, which contains some
proprietary extensions. However, it's sufficiently similar that many
of the points you raise have similar answers across both systems.


Our experiences to date have indicated that there isn't a straight
answer to the questions you ask - much depends upon what you are
trying to do with the system. For example:

 * performance with lots of nodes - any comments on the best persistence
manager/config to use over and above the FAQ comments.


Key factors here are:
* your data model - Jackrabbit does not handle large flat node
hierarchies well, so it is sometimes necessary to artificially deepen
the hierarchy to address this.
* the persistence manager - the way in which JR stores data in the
underlying database has a big effect on performance (e.g. remote vs.
local db, persistence manager mapping to database tables).
* use of versioning / transactions - use of these features carries a
performance overhead (in some cases significant).
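As a rough illustration of the data model point above, one common way to
deepen a flat hierarchy is to derive intermediate "bucket" nodes from a hash
of the node name. The sketch below only computes the relative path; the class
and method names (BucketedPath, bucketedPath) are hypothetical and not part of
Jackrabbit or the JCR API, and the choice of MD5 and two-hex-digit buckets is
an assumption made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Jackrabbit or the JCR API: spreads many
// sibling nodes across intermediate "bucket" nodes derived from a hash.
class BucketedPath {

    // Returns a relative path of the form "<xx>/<yy>/<name>", where each
    // bucket segment is one byte of the MD5 hash of the name, in hex.
    static String bucketedPath(String name, int levels) {
        if (levels < 0 || levels > 16) {
            // An MD5 digest has 16 bytes, so at most 16 bucket levels.
            throw new IllegalArgumentException("levels must be between 0 and 16");
        }
        byte[] digest = md5(name);
        StringBuilder path = new StringBuilder();
        for (int i = 0; i < levels; i++) {
            path.append(String.format("%02x", digest[i] & 0xff)).append('/');
        }
        return path.append(name).toString();
    }

    private static byte[] md5(String value) {
        try {
            return MessageDigest.getInstance("MD5")
                    .digest(value.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Two bucket levels give up to 256 * 256 parents for the leaf nodes.
        System.out.println(bucketedPath("article-42", 2));
    }
}
```

With two levels, each parent holds at most 256 bucket children, and the leaf
nodes are spread across up to 65,536 buckets instead of sitting under a single
parent. In a real repository the intermediate nodes would still need to be
created (for example with Node.addNode) before the leaf is added.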

Reliability

 * reliability of the persistence - how likely is corruption of the
persisted objects?


Again this depends... Use of versionable nodes seems to be a problem
at the moment. We've seen significant issues with data loss and
corruption in live environments because of the current transaction
handling when storing versionable nodes. This is to do with the fact
that JR does not have support for true distributed transactions, but
maintains separate connections to the workspace and the version
storage. If one of these fails and rolls back you can end up with a
corrupt repository that then needs to be fixed "by hand" with possible
loss of data.
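The failure mode described above can be shown with a toy model. This is not
how Jackrabbit is implemented internally; the Store class and the save and
consistent methods are hypothetical stand-ins for two independently committed
resources with no transaction coordinator between them.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Toy model only: two stores that commit independently of each other.
class SplitCommitDemo {

    // Stand-in for one independently committed store (e.g. workspace or
    // version storage). All names here are hypothetical.
    static class Store {
        final Map<String, String> committed = new HashMap<>();
        boolean failNextCommit;

        void commit(String key, String value) {
            if (failNextCommit) {
                throw new IllegalStateException("connection lost");
            }
            committed.put(key, value);
        }
    }

    // Writes to both stores in sequence. Returns true only if both commits
    // succeeded; nothing undoes the first commit when the second one fails.
    static boolean save(Store workspace, Store versions, String nodeId, String state) {
        workspace.commit(nodeId, state);
        try {
            versions.commit(nodeId, state);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    // True when both stores hold the same state for the given node.
    static boolean consistent(Store workspace, Store versions, String nodeId) {
        return Objects.equals(workspace.committed.get(nodeId),
                              versions.committed.get(nodeId));
    }

    public static void main(String[] args) {
        Store workspace = new Store();
        Store versions = new Store();
        versions.failNextCommit = true; // simulate losing the second connection
        boolean saved = save(workspace, versions, "node-1", "v2");
        System.out.println("saved=" + saved
                + " consistent=" + consistent(workspace, versions, "node-1"));
    }
}
```

A true distributed transaction would prepare both stores before committing
either, so a failure on one side could still be rolled back on the other.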

There are other issues, such as current lack of failover support,
search-indexes not being transactional (afaik still?), the need to
restart jackrabbit in the event of transient loss of connectivity to
the database, etc., but these are comparatively more minor.

 * scalability - has JackRabbit been proven to handle lots of concurrent
access? Can it yet be clustered? Any equivalent to the replication provided
by Day?


There's some work Dominique's doing now on clustering - see JCR-623
(http://issues.apache.org/jira/browse/JCR-623). In terms of concurrent
simple read access, JR is pretty damned fast, so handling lots (how
much are you thinking of here?) of concurrent access is unlikely to be
a problem even without clustering support. For write access or
versioning, etc.


Any insight from developers with live systems based on JackRabbit would be
gratefully received and provide reassurance that JackRabbit is a suitable
choice.


Hope that's useful and hasn't put you off too much :-).

Miro
