Hi Bruce, thanks for your comment.
> I am not fired by index problems. -)
> I just want everybody to realize it is a very critical issue to back up your
> repository.
> Currently, the solution is:
> 1) Backup DB data.
> 2) Backup your file system and you can delete all indexes of them.
> However, it is still a bug that JackRabbit v1.3 can not rebuild everything
> from DB, in case your hard drive dies with all your repository file system.

Shouldn't that be solved by the DbFileSystem?
http://yukatan.fi/2007/1.4/org/apache/jackrabbit/core/fs/db/DbFileSystem.html
This allows you to store everything that is necessary for a complete
restore in the DB, which means your DB backup is the only thing
(beyond the repository.xml) that you need to restore a complete JR instance.

> My concerns are two:
> 1) Performance of navigation of Nodes which relates cache manager resizing

I appreciate the performance issue. I am still not convinced that this is
related to the cache manager resizing...

> 2) Logic backup repository using JCR export/import API.

I agree that it would be desirable to have a built-in backup/restore
mechanism on a higher level. The JCR export/import is probably not the
right layer, since it only covers the content in a single workspace and
has no means to address things like nodetypes, versions or the
namespace registry.

And I think your most pressing issue should be addressed by the DbFileSystem.

regards,
david

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Bertrand
> Delacretaz
> Sent: Friday, July 27, 2007 3:15 AM
> To: [email protected]
> Subject: Jackrabbit = Kick Ass Tool (was: Jackrabbit = Big Trouble??)
>
> Hi,
>
> I hate to play grumpy old man once again, but the recent trend towards
> Loud Subjects That Catch Peoples Attention does not really help the
> discussion, so let's rename this thread ;-)
>
> Bruce, if I read your message correctly, it looks like you have three
> problems with Jackrabbit:
>
> 1) Cache Manager resizes seem to slow your app down
> 2) You're going to be fired because you lost your index (or Jackrabbit did)
> 3) You're not sure about which application pattern/content model to use
>
> So let's please tackle these one at a time, ideally in separate
> threads so that people can contribute efficiently to the discussion.
>
> Sorry if I'm being a bit harsh, but IMHO you started it with the
> choice of your message's subject ;-)
> -Bertrand
>
> On 7/27/07, Bruce Li <[EMAIL PROTECTED]> wrote:
> > I have been in this Jackrabbit Community for a couple of months since I
> > joined repository project two months ago.
> >
> > First, I respect and appreciate all hard works contributed in current
> > JackRabbit project and definitely I am sure a lot of developers benefit
> > from this project. There are some people contribute their JackRabbit
> > working experience like David Nuescheler, who collects "7 DR Rules", which
> > is precious since current lack of document of JackRabbit, and they are
> > "real" working experiences.
> >
> > However, I also heard some negative voice from this community like
> > "JackRabbit is dead (for us)" from Frédéric Esnault. I suffer some troubles
> > from JackRabbit and it seems foundational problems. I would like to share
> > all my experience with you, and any feedback or good suggestion is
> > definitely what I want.
> >
> > Since these troubles are "big" troubles for enterprise use of JackRabbit
> > 1.3, let's discuss it from beginning.
> >
> > Question 1:
> >
> > Why do you select JackRabbit rather than Database as your repository
> > solution?
> >
> > There are a lot of answers for this question and it seems that everybody
> > who joins this community has already known the answers (It may be formal
> > document which was approved by your CTO). However, my opinion, this is the
> > basic question really need to be discussed here.
> >
> > To answer this question, some technical key words to support Jackrabbit may
> > be "JCR API", "Lucene Search Engine" and so on. However, as the user of
> > JackRabbit, I would like to list the two key concerns why I select
> > JackRabbit as repository solution from Product Point of View:
> >
> > 1. Quick and effective data search/fetch from volume content repository
> > 2. Built-in content version/revision control without extra code
> >
> > Now let me describe the big troubles I met in my use:
> >
> > 1. Quick and effective data search or fetch from volume content
> > repository
> >
> > Experience: There are not many data on my repository which contains
> > hundreds of two major object nodes, each node (object) contains less than
> > 20 properties (fields), including the other 5 child nodes (nested small
> > objects) and one of two major nodes (object) has one binary data (up to 1
> > megabyte). Unfortunately, the performance is not acceptable when I navigate
> > nodes of the major nodes. The main problem is the built-in Cache Manager of
> > JackRabbit resizes which costs uncertain time, which makes the operation
> > very slow sometimes. It is not easy to read those codes when debugging
> > Jackrabbit for performance tuning because there is no document about the
> > logic behind the index resizing.
> >
> > 2. Content version/revision control
> >
> > Experience: This function works well on Jackrabbit v1.3. The main problem
> > is that all revisions (except the base revision) of a node are lost when
> > exporting/importing data from one repository to another repository.
> > I am discussing this issue because it concerns the repository backup.
> >
> > I just found in JackRabbit v1.3, there is no way to backup repository using
> > DB as persistence manager. I mean that there is no way to re-index based on
> > data on DB. The following is my case:
> >
> > In one repository server, the index (in file system) is corrupt which
> > causes all search failure. However, all data (in DB) is still alive, where
> > you can iterate all of them. After cleaning the whole repository file system
> > (most of them are index information), Jackrabbit can not correctly re-build
> > index based on the data on DB. If it happens on production repository, it
> > means: "My God, I am going to be fired". As far as I know, Jackrabbit v1.1
> > can successfully re-index (creating totally new repository index (file
> > system) based on DB data).
> >
> > As the alternative solution to backup repository, I try to export/import
> > all nodes from repository to another repository using JCR Export API
> > (exportSystemView). The good news is that JackRabbit v1.3 successfully
> > builds index (the whole file system) during the importing process; the bad
> > news is that it lost all revision of all versioning nodes. Can you imagine
> > how frustrated I am when I realize there is no way to backup repository
> > based on DB data?
> >
> > I just got the answer for the re-index issue for Jackrabbit v1.3: You CAN
> > NOT delete all file system. Only delete all indexes but keep the other
> > folders. Jackrabbit can re-index successfully when it starts up.
> >
> > Question 2:
> >
> > How can developer correctly use Jackrabbit (JCR) as their repository
> > solution?
> >
> > The expert of jackrabbit may see that I use object to describe node and you
> > may think it is not the pattern you are using Jackrabbit. So the question
> > is raised as "Which is the best practices (pattern) to use Jackrabbit (JCR)
> > as repository solution."
> >
> > From this community, I see a lot of developers use Jackrabbit by fetching
> > contents by path. It means that they do not need treat node as object,
> > instead, they put content on repository as asset, which can be easily and
> > effectively retrieved by a given path. This pattern exactly meets the truth
> > of "The simplicity is the best".
> >
> > My use of Jackrabbit is based on the business requirement, which need to
> > navigate most of nodes and reference nodes, check child nodes and
> > properties to find the proper content by a couple of business rules. I
> > would like to say that all performance issues are raised by nodes iteration
> > process. Even more, I have created generic classes using java reflect
> > package for bidirectional mapping between nodes and objects. For performance
> > improvement, the mapping supports generic child nodes lazy loading.
> > However, it seems all these jobs do not solve the performance problem
> > although they sound pretty "professional". You may ask me: if you have
> > such business requirement, why not go to DB and build the full relationship
> > for your business model? J2EE developers all know how powerful java-db
> > world is: the mature ORM tool (e.g. Hibernate), transaction management,
> > batch data fetching, performance tuning and so on. However, my question is:
> > "Is there any good pattern in current jackrabbit to effectively handle data
> > fetching with weak relationship?"
> >
> > Now it is time to say some words to the jackrabbit developers and
> > contributors what I really want to say for the whole community:
> >
> > My begs:
> >
> > Guide, document and sample code is the king for any open source. How
> > frustrating for Jackrabbit developers to find the incorrect pattern is
> > applied by users on their projects. On the other hand, how frustrating for
> > JackRabbit users who can not find the good pattern to follow, which can
> > save their bunch of time.
> > From product point of view, the search by XPath or
> > XQuery or SQL is not foundational issue. The foundational issue is one
> > effective search means covers most of important requirements from real
> > world and the document can be found in jackrabbit web site.
> >
> > I do believe Jackrabbit is qualified project and I really hope all "best
> > features" are documented, demoed and used by the whole community.
> >
> > Thanks
> >
> > Bruce
