Hi Shaun,

Our experience with production systems has largely been with Day's commercially licensed version of Jackrabbit, CRX, which contains some proprietary extensions. However, it's sufficiently similar that many of the points you raise have similar answers across both systems.
Our experience to date indicates that there isn't a straight answer to the questions you ask - much depends upon what you are trying to do with the system. For example:
* performance with lots of nodes - any comments on the best persistence manager/config to use over and above the FAQ comments.
Key factors here are:
* your data model - Jackrabbit does not handle large flat node hierarchies well (i.e. a single node with a very large number of direct children), so it is sometimes necessary to artificially deepen the hierarchy to address this.
* the persistence manager - the way in which JR stores data in the underlying database has a big effect on performance (e.g. remote vs. local db, how the persistence manager maps to database tables).
* use of versioning / transactions - use of these features carries a performance overhead (in some cases significant).
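To make the "artificially deepen the hierarchy" point concrete, here is a minimal sketch of one way to do it: derive a couple of intermediate bucket names from a hash of the node name, so that no single parent ends up with a huge number of direct children. The class name BucketedPath, the method bucketedPath and the choice of SHA-1 with two hex characters per level are all my own assumptions for illustration, not anything Jackrabbit or CRX provides. It also assumes parentPath has no trailing slash.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Jackrabbit or the JCR API.
public final class BucketedPath {

    private BucketedPath() {}

    /**
     * Returns parentPath/<b1>/<b2>/.../nodeName, where each bucket <bN> is two
     * hex characters taken from the SHA-1 hash of nodeName. With two levels
     * this spreads children over up to 256 * 256 intermediate folders.
     */
    public static String bucketedPath(String parentPath, String nodeName, int levels) {
        if (levels < 0 || levels > 8) {
            throw new IllegalArgumentException("levels must be between 0 and 8");
        }
        String hex = sha1Hex(nodeName);
        StringBuilder path = new StringBuilder(parentPath);
        for (int i = 0; i < levels; i++) {
            // Characters [2i, 2i+2) of the hash form the bucket name at level i.
            path.append('/').append(hex, i * 2, i * 2 + 2);
        }
        return path.append('/').append(nodeName).toString();
    }

    private static String sha1Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on every Java platform.
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }
}
```

The same name always maps to the same path, so lookups don't need an index of where things were put. In a real repository you would still have to create the intermediate bucket nodes (if they don't exist yet) before adding the leaf; that part is left out here. Whether hashing, date-based folders or something derived from your content is the better bucketing scheme depends on how you need to browse and query the data.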
* reliability of the persistence - how likely is corruption of the persisted objects?
Again this depends... Use of versionable nodes seems to be a problem at the moment. We've seen significant issues with data loss and corruption in live environments because of the current transaction handling when storing versionable nodes. This is because JR does not have support for true distributed transactions, but maintains separate connections to the workspace and the version storage. If one of these fails and rolls back while the other commits, you can end up with a corrupt repository that then needs to be fixed "by hand", with possible loss of data. There are other issues, such as the current lack of failover support, search indexes not being transactional (afaik still?), the need to restart Jackrabbit in the event of transient loss of connectivity to the database, etc., but these are comparatively minor.
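To illustrate the failure mode (and only that), here is a toy model of two stores that are committed one after the other with nothing coordinating them. Store, checkin and the order of the two commits are hypothetical stand-ins I've made up for this sketch; this is not Jackrabbit code and makes no claim about how its internals are actually structured.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these classes exist in Jackrabbit.
public final class SplitCommitDemo {

    /** Stand-in for one independent storage connection. */
    public static final class Store {
        private final List<String> committed = new ArrayList<>();
        private final List<String> pending = new ArrayList<>();
        private final boolean failOnCommit;

        public Store(boolean failOnCommit) {
            this.failOnCommit = failOnCommit;
        }

        public void write(String record) {
            pending.add(record);
        }

        public void commit() {
            if (failOnCommit) {
                pending.clear(); // rolls back this store only
                throw new IllegalStateException("commit failed");
            }
            committed.addAll(pending);
            pending.clear();
        }

        public List<String> committed() {
            return committed;
        }
    }

    /** Writes to both stores and returns true if they ended up in agreement. */
    public static boolean checkin(Store workspace, Store versions, String nodeId) {
        workspace.write(nodeId + " -> base version 1.1");
        versions.write(nodeId + " version 1.1");

        workspace.commit(); // once this succeeds it cannot be undone
        try {
            versions.commit();
        } catch (IllegalStateException e) {
            // No two-phase commit: the workspace change stays committed
            // even though the version record was rolled back.
        }
        return workspace.committed().size() == versions.committed().size();
    }

    public static void main(String[] args) {
        boolean consistent = checkin(new Store(false), new Store(true), "node-42");
        System.out.println("consistent = " + consistent); // prints "consistent = false"
    }
}
```

After the run the workspace refers to a version that was never stored, which is the kind of dangling reference that then has to be repaired by hand. A true distributed (two-phase) commit would have both stores prepare first and only commit once both had agreed.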
* scalability - has JackRabbit been proven to handle lots of concurrent access? Can it yet be clustered? Any equivalent to the replication provided by Day?
There's some work Dominique's doing now on clustering - see JCR-623 (http://issues.apache.org/jira/browse/JCR-623). In terms of concurrent simple read access, JR is pretty damned fast, so handling lots (how much are you thinking of here?) of concurrent access is unlikely to be a problem even without clustering support. For write access or versioning, etc.
Any insight from developers with live systems based on JackRabbit would be gratefully received and provide reassurance that JackRabbit is a suitable choice.
Hope that's useful and hasn't put you off too much :-).

Miro
