Jackrabbit = Big Trouble??

Bruce Li Thu, 26 Jul 2007 15:23:22 -0700

I have been in the Jackrabbit community for a couple of months, since I joined 
a repository project two months ago.


 

First, I respect and appreciate all the hard work contributed to the current 
Jackrabbit project, and I am sure a lot of developers benefit from it. Some 
people share their Jackrabbit working experience, like David Nuescheler, who 
collected the "7 DR Rules". These are precious given the current lack of 
Jackrabbit documentation, and they are "real" working experiences. 

 

However, I have also heard some negative voices in this community, such as 
"JackRabbit is dead (for us)" from Frédéric Esnault. I have had some trouble 
with Jackrabbit myself, and the problems seem to be fundamental. I would like 
to share my experience with you, and any feedback or suggestions are welcome.

 

Since these are "big" troubles for enterprise use of Jackrabbit 1.3, let's 
discuss them from the beginning.

 

Question 1: 

Why do you select Jackrabbit rather than a database as your repository solution?

 

There are many answers to this question, and it seems that everybody who joins 
this community already knows them (perhaps from a formal document approved by 
your CTO). However, in my opinion, this is the basic question that really 
needs to be discussed here.

 

To answer this question, some technical keywords in support of Jackrabbit may 
be "JCR API", "Lucene Search Engine" and so on. However, as a user of 
Jackrabbit, I would like to list the two key reasons why I selected Jackrabbit 
as a repository solution from a product point of view:

 

1.      Quick and effective data search/fetch from a large-volume content repository
2.      Built-in content version/revision control without extra code

 

Now let me describe the big troubles I ran into: 

1.      Quick and effective data search or fetch from a large-volume content repository

 

Experience: There is not much data in my repository: it contains a few hundred 
nodes of two major object types. Each node (object) has fewer than 20 
properties (fields) plus 5 child nodes (nested small objects), and one of the 
two major node types carries one binary value (up to 1 megabyte). 
Unfortunately, the performance is not acceptable when I navigate the major 
nodes. The main problem is that Jackrabbit's built-in Cache Manager resizes 
itself, which takes an unpredictable amount of time and sometimes makes 
operations very slow. It is not easy to read the code when debugging 
Jackrabbit for performance tuning, because there is no documentation of the 
logic behind the resizing. 

 

2.      Content version/revision control

Experience: This function works well in Jackrabbit v1.3. The main problem is 
that all revisions of a node (except the base revision) are lost when 
exporting/importing data from one repository to another. I am raising this 
issue because it affects repository backup.

 

I just found that in Jackrabbit v1.3 there is no way to back up a repository 
that uses a DB as persistence manager. I mean that there is no way to re-index 
based on the data in the DB. The following is my case:

 

On one repository server, the index (in the file system) became corrupt, which 
caused all searches to fail. However, all the data (in the DB) was still 
intact and could be iterated. After I cleaned the whole repository file system 
(most of it index information), Jackrabbit could not correctly rebuild the 
index from the data in the DB. If this happens on a production repository, it 
means: "My God, I am going to be fired". As far as I know, Jackrabbit v1.1 can 
successfully re-index (creating a totally new repository index (file system) 
based on the DB data).

 

As an alternative way to back up the repository, I tried to export/import all 
nodes from one repository to another using the JCR export API 
(exportSystemView). The good news is that Jackrabbit v1.3 successfully builds 
the index (the whole file system) during the import; the bad news is that it 
loses all revisions of all versioned nodes. Can you imagine how frustrated I 
was when I realized there was no way to back up a repository based on the DB 
data?

 

I just got the answer to the re-index issue for Jackrabbit v1.3: you CAN NOT 
delete the whole file system. Delete only the indexes and keep the other 
folders. Jackrabbit can then re-index successfully when it starts up. 
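
The index-only cleanup described above can be sketched as follows. This is 
only a sketch under assumptions: it assumes the usual Jackrabbit layout where 
the search indexes live in <repository home>/repository/index and 
<repository home>/workspaces/<name>/index, the names IndexCleaner and 
deleteIndexes are hypothetical (they are not part of Jackrabbit), and it 
should only be run while the repository is shut down.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: removes only index directories, keeps everything else. */
public final class IndexCleaner {

    private IndexCleaner() {}

    /** Deletes the assumed index directories under repoHome and returns those removed. */
    public static List<Path> deleteIndexes(Path repoHome) throws IOException {
        List<Path> removed = new ArrayList<>();

        // Assumed location of the index for the system/version storage.
        deleteIfPresent(repoHome.resolve("repository").resolve("index"), removed);

        // Assumed location of the per-workspace indexes: workspaces/<name>/index.
        Path workspaces = repoHome.resolve("workspaces");
        if (Files.isDirectory(workspaces)) {
            try (DirectoryStream<Path> dirs = Files.newDirectoryStream(workspaces)) {
                for (Path workspace : dirs) {
                    deleteIfPresent(workspace.resolve("index"), removed);
                }
            }
        }
        return removed;
    }

    private static void deleteIfPresent(Path dir, List<Path> removed) throws IOException {
        if (!Files.isDirectory(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
        removed.add(dir);
    }
}
```

Everything outside the index directories (workspace.xml, namespace and node 
type registries, and so on) is left untouched, which is the point of the 
answer above: on the next startup Jackrabbit finds no index and rebuilds it 
from the persisted data.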

 

Question 2: 

How can developers correctly use Jackrabbit (JCR) as their repository solution?

 

Jackrabbit experts may notice that I use objects to describe nodes, and you 
may think this is not the pattern you follow with Jackrabbit. So the question 
becomes: "What is the best practice (pattern) for using Jackrabbit (JCR) as a 
repository solution?"

 

From this community, I see that a lot of developers use Jackrabbit by fetching 
content by path. It means that they do not need to treat a node as an object; 
instead, they put content in the repository as assets, which can be easily and 
effectively retrieved by a given path. This pattern exactly matches the truth 
that "simplicity is the best".

 

My use of Jackrabbit is driven by a business requirement: I need to navigate 
most of the nodes and referenced nodes, and check child nodes and properties, 
to find the proper content according to a couple of business rules. I would 
say that all my performance issues come from this node iteration. Furthermore, 
I have created generic classes using the Java reflection package for 
bi-directional mapping between nodes and objects. To improve performance, the 
mapping supports generic lazy loading of child nodes. However, it seems that 
all this work does not solve the performance problem, although it sounds 
pretty "professional".  You may ask me: if you have such a business 
requirement, why not go to a DB and build the full relationships for your 
business model? J2EE developers all know how powerful the Java-DB world is: 
mature ORM tools (e.g. Hibernate), transaction management, batch data 
fetching, performance tuning and so on. However, my question is: "Is there any 
good pattern in the current Jackrabbit to effectively handle data fetching 
with weak relationships?" 
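
To make the lazy loading mentioned above concrete, here is a minimal sketch of 
the idea, not my actual mapping code. The class name LazyChildren is 
hypothetical, and the loader is just a plain Supplier; in a real mapping it 
would be the code that reads the child nodes from the repository.

```java
import java.util.Objects;
import java.util.function.Supplier;

/** Hypothetical holder that loads its value on first access and then caches it. */
public final class LazyChildren<T> implements Supplier<T> {

    private Supplier<? extends T> loader;
    private T value;
    private boolean loaded;

    public LazyChildren(Supplier<? extends T> loader) {
        this.loader = Objects.requireNonNull(loader, "loader");
    }

    /** Runs the loader at most once; later calls return the cached value. */
    @Override
    public synchronized T get() {
        if (!loaded) {
            value = loader.get();
            loaded = true;
            loader = null; // let the loader be garbage collected
        }
        return value;
    }

    /** True once the loader has been run. */
    public synchronized boolean isLoaded() {
        return loaded;
    }
}
```

A mapped object would hold a LazyChildren field per child collection, so the 
children are only read when a business rule actually asks for them. This 
avoids loading unused children, but it does not help when the rules end up 
touching most of the nodes anyway, which is my case.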

 

Now it is time to say to the Jackrabbit developers and contributors what I 
really want to say to the whole community:

 

My requests:

Guides, documentation and sample code are king for any open source project. 
How frustrating it is for Jackrabbit developers to find that users apply 
incorrect patterns in their projects. On the other hand, how frustrating it is 
for Jackrabbit users who cannot find a good pattern to follow, which could 
save them a lot of time. From a product point of view, whether search is done 
by XPath, XQuery or SQL is not the fundamental issue. The fundamental issue is 
to have one effective means of search that covers most of the important 
real-world requirements, with documentation that can be found on the 
Jackrabbit web site.

 

 

I do believe Jackrabbit is a quality project, and I really hope all its "best 
features" get documented, demoed and used by the whole community.

 

Thanks 

 

Bruce
