RE: JackRabbit maturity?- robustness, performance and scalability

Shaun Barriball Wed, 15 Nov 2006 01:37:23 -0800

Hi Miro et al,
Thanks for the detailed insight. To pick up on some key points:


 * versioning - our current model makes use of nt:hierarchyNode,
mix:referenceable, mix:lockable and mix:versionable. From your comments it
sounds like using mix:versionable will significantly reduce the reliability
of JackRabbit. Would you recommend NOT using mix:versionable therefore?

 * persistence - we'd prefer to use the SimpleDbPersistenceManager with
MySql. Is this a popular/reliable combination?

 * Fix "by hand" - Given that some persistence managers use binary
serialization, how do you go about correcting the integrity of the database?
The prospect scares me but it's not uncommon with applications operating
on top of schemas with complex referential integrity.

 * you mentioned Day CRX. We also installed this and we're initially
impressed with the polished package; however, since then we've found some
significant problems with the Content Explorer etc which sow the seed of
doubt that there are potentially bigger issues under the covers. It's
important for us to have a commercial alternative so I'd welcome any
comments/experiences on using Day versus JackRabbit - for example, is
mix:versionable viable with Day?

Overall, your comments haven't 'put me off'. All persistence tiers have
their problems as they mature - this doesn't negate the value-add JackRabbit
provides over and above building a custom OR/RDBMS solution. I'm happy to
share our results with this list as we perform various tests.

Regards,
Shaun.

-----Original Message-----
From: Miro Walker [mailto:[EMAIL PROTECTED] 
Sent: 15 November 2006 08:47
To: [email protected]
Subject: Re: JackRabbit maturity?- robustness, performance and scalability

Hi Shaun,

Our experience with production systems has largely been with Day's
commercially licensed version of Jackrabbit, CRX, which contains some
proprietary extensions. However, it's sufficiently similar that many of the
points you raise have similar answers across both systems.

Our experiences to date have indicated that there isn't a straight answer to
the questions you ask - much depends upon what you are trying to do with
the system. For example:

>  * performance with lots of nodes - any comments on the best 
> persistence manager/config to use over and above the FAQ comments.

Key factors here are:
* your data model - Jackrabbit does not handle large flat node hierarchies
well, so it is sometimes necessary to artificially deepen the hierarchy to
address this.
* the persistence manager - the way in which JR stores data in the
underlying database has a big effect on performance (e.g. remote vs.
local db, persistence manager mapping to database tables).
* use of versioning / transactions - use of these features carries a
performance overhead (in some cases significant).
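
As a rough illustration of the "artificially deepen the hierarchy" point
above: one common approach is to derive intermediate "bucket" nodes from a
hash of the node name, so that no single parent ends up with a huge number
of children. The sketch below only computes the path; the class and method
names are hypothetical and are not part of the JCR or Jackrabbit API, and
the two-level, 256-way fan-out is an arbitrary assumption to tune against
your own data.

```java
// Hypothetical helper for illustration only; not part of JCR or Jackrabbit.
class BucketedPath {

    /**
     * Returns a path that places nodeName under two intermediate bucket
     * levels below parentPath, e.g. /content/0c/21/ab instead of /content/ab.
     */
    public static String bucketedPath(String parentPath, String nodeName) {
        // String.hashCode() is defined by the Java specification, so the
        // same name maps to the same bucket on every JVM.
        int hash = nodeName.hashCode() & 0xFFFF;   // keep the low 16 bits
        String hex = String.format("%04x", hash);  // four hex digits
        return parentPath
                + "/" + hex.substring(0, 2)        // first bucket level
                + "/" + hex.substring(2)           // second bucket level
                + "/" + nodeName;
    }

    public static void main(String[] args) {
        System.out.println(bucketedPath("/content", "a"));
        System.out.println(bucketedPath("/content", "ab"));
    }
}
```

The intermediate nodes would still need to be created before the leaf is
added, and lookups have to recompute the same path from the name, so this
only suits content that is addressed by a known key.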

Reliability
>  * reliability of the persistence - how likely is corruption of the 
> persisted objects?

Again this depends... Use of versionable nodes seems to be a problem at the
moment. We've seen significant issues with data loss and corruption in live
environments because of the current transaction handling when storing
versionable nodes. This is to do with the fact that JR does not have support
for true distributed transactions, but maintains separate connections to the
workspace and the version storage. If one of these fails and rolls-back you
can end up with a corrupt repository that then needs to be fixed "by hand"
with possible loss of data.

There are other issues, such as current lack of failover support,
search-indexes not being transactional (afaik still?), the need to restart
jackrabbit in the event of transient loss of connectivity to the database,
etc., but these are comparatively more minor.

>  * scalability - has JackRabbit been proven to handle lots of 
> concurrent access? Can it yet be clustered? Any equivalent to the 
> replication provided by Day?

There's some work Dominique's doing now on clustering - see JCR-623
(http://issues.apache.org/jira/browse/JCR-623). In terms of concurrent
simple read access, JR is pretty damned fast, so handling lots (how much are
you thinking of here?) of concurrent access is unlikely to be a problem even
without clustering support. For write access or versioning, etc.

>
> Any insight from developers with live systems based on JackRabbit 
> would be gratefully received and provide reassurance that JackRabbit 
> is a suitable choice.
>

Hope that's useful and hasn't put you off too much :-).

Miro
