Re: Jackrabbit = Kick Ass Tool (was: Jackrabbit = Big Trouble??)

Mark Waschkowski Tue, 07 Aug 2007 09:24:27 -0700

OK, great, happy to help.

Yes, a lot of people will want to do this, and I've tried unsuccessfully in
the past to get the details surrounding backup techniques, so when I got
back from vacation and saw your post, I didn't want to lose the info!


Would you please confirm that I got the information correct, as I haven't
tried this yet? After you confirm, I will update my config, test it, and
then provide my new config as a blueprint and put it in the wiki.
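
As a starting point, a minimal and untested sketch of such a FileSystem
entry might look like the following. It assumes a MySQL database; the JDBC
URL, user, password and table prefix are placeholders, not values taken
from this thread, and the rest of repository.xml is left out.

```xml
<Repository>
    <!-- Sketch only: repository-level file system stored in the database
         instead of on local disk. All values below are placeholders. -->
    <FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
        <param name="driver" value="com.mysql.jdbc.Driver"/>
        <param name="url" value="jdbc:mysql://localhost:3306/jackrabbit"/>
        <param name="user" value="jcr_user"/>
        <param name="password" value="changeme"/>
        <!-- selects which database flavour the table definitions are for -->
        <param name="schema" value="mysql"/>
        <!-- assumed prefix for the tables created by this file system -->
        <param name="schemaObjectPrefix" value="fs_repo_"/>
    </FileSystem>
    <!-- Remaining elements omitted. The assumption here is that the
         PersistenceManager entries point at the same JDBC url, so that one
         database backup covers everything except repository.xml. -->
</Repository>
```

The workspace and versioning sections of repository.xml carry their own
FileSystem entries, so presumably those would need the same treatment
(each with its own table prefix) for the database backup to be complete.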

Best,

Mark

On 8/7/07, David Nuescheler <[EMAIL PROTECTED]> wrote:
>
> Hi Mark,
>
> I think this is an excellent idea, thanks a lot for putting in the effort.
>
> I think the case that someone would like to store all their content
> within the same RDBMS is common enough that we should even
> have a blueprint example config in the documentation.
>
> thanks again,
> david
>
>
> On 8/7/07, Mark Waschkowski <[EMAIL PROTECTED]> wrote:
> > Hi David,
> >
> > I would like to update the wiki with the information below, as I think
> > it's quite valuable and would help new users without having to scour the
> > mailing list. If you verify the following, I will update the wiki.
> >
> > -----For wiki:
> > Using DBFileSystem as specified in the repository.xml:
> > <Repository>
> >         <FileSystem ...>
> >
> > and using the same database as any of the PersistenceManager entries, the
> > only things that need to be backed up are:
> > 1) repository.xml
> > 2) the database
> >
> > Then, to restore from a backup, all that would need to be done is to use
> > the backed-up repository.xml and restore the database from the backup; the
> > indexes will rebuild themselves when the system restarts. This will
> > properly handle versioning as well.
> >
> > Note: rebuilding of indexes may take a significant amount of time
> > ----end
> >
> > If all that looks correct, I'll fill in an example FileSystem and update
> > the wiki. As well, any suggestions for the 'significant amount of time'
> > part?
> >
> > Thanks,
> >
> > Mark
> >
> > On 7/30/07, David Nuescheler <[EMAIL PROTECTED]> wrote:
> > >
> > > Hi Bruce,
> > >
> > > thanks for your comment.
> > >
> > > > I am not fired by index problems. -)
> > > > I just want everybody to realize that backing up your repository
> > > > is a very critical issue.
> > > > Currently, the solution is:
> > > > 1) Back up the DB data.
> > > > 2) Back up your file system; you can delete all the indexes from it.
> > > > However, it is still a bug that JackRabbit v1.3 can not rebuild
> > > > everything from the DB in case your hard drive dies with all your
> > > > repository file system.
> > > Shouldn't that be solved by the DBFileSystem?
> > >
> > > http://yukatan.fi/2007/1.4/org/apache/jackrabbit/core/fs/db/DbFileSystem.html
> > >
> > >
> > > This allows you to store everything that is necessary for a complete
> > > restore
> > > in the DB, which means your DB backup is the only thing (beyond the
> > > repository.xml) that you need to restore a complete JR instance.
> > >
> > > > My concerns are two:
> > > > 1) Performance of navigating nodes, which relates to cache manager
> > > > resizing
> > > I appreciate the performance issue. I am still not convinced that this
> > > is related to the cache manager resizing...
> > >
> > > > 2) Logical backup of the repository using the JCR export/import API.
> > > I agree that it would be desirable to have a built-in backup/restore
> > > mechanism on a higher level.
> > >
> > > The JCR export/import is probably not the right layer,
> > > since it only covers the content in a single workspace and has no
> > > means to address things like nodetypes, versions or the
> > > namespace registry.
> > > And I think your most pressing issue should be addressed
> > > by the DBFileSystem.
> > >
> > > regards,
> > > david
> > >
> > > > -----Original Message-----
> > > > From: [EMAIL PROTECTED] [mailto: [EMAIL PROTECTED] On Behalf Of Bertrand Delacretaz
> > > > Sent: Friday, July 27, 2007 3:15 AM
> > > > To: [email protected]
> > > > Subject: Jackrabbit = Kick Ass Tool (was: Jackrabbit = Big Trouble??)
> > > >
> > > > Hi,
> > > >
> > > > I hate to play grumpy old man once again, but the recent trend towards
> > > > Loud Subjects That Catch People's Attention does not really help the
> > > > discussion, so let's rename this thread ;-)
> > > >
> > > > Bruce, if I read your message correctly, it looks like you have three
> > > > problems with Jackrabbit:
> > > >
> > > > 1) Cache Manager resizes seem to slow your app down
> > > > 2) You're going to be fired because you lost your index (or Jackrabbit did)
> > > > 3) You're not sure about which application pattern/content model to use
> > > >
> > > > So let's please tackle these one at a time, ideally in separate
> > > > threads so that people can contribute efficiently to the discussion.
> > > >
> > > > Sorry if I'm being a bit harsh, but IMHO you started it with the
> > > > choice of your message's subject ;-)
> > > > -Bertrand
> > > >
> > > >
> > > > On 7/27/07, Bruce Li < [EMAIL PROTECTED]> wrote:
> > > > > I have been in this Jackrabbit community for a couple of months,
> > > > > since I joined a repository project two months ago.
> > > > >
> > > > >
> > > > >
> > > > > First, I respect and appreciate all the hard work contributed to the
> > > > > current JackRabbit project, and I am sure a lot of developers benefit
> > > > > from this project. Some people contribute their JackRabbit working
> > > > > experience, like David Nuescheler, who collected the "7 DR Rules", which
> > > > > are precious given the current lack of JackRabbit documentation, and
> > > > > they are "real" working experiences.
> > > > >
> > > > >
> > > > >
> > > > > However, I have also heard some negative voices from this community, like
> > > > > "JackRabbit is dead (for us)" from Frédéric Esnault. I have suffered some
> > > > > troubles with JackRabbit, and they seem to be foundational problems. I
> > > > > would like to share all my experience with you, and any feedback or good
> > > > > suggestion is definitely what I want.
> > > > >
> > > > >
> > > > >
> > > > > Since these troubles are "big" troubles for enterprise use of
> > > > > JackRabbit 1.3, let's discuss them from the beginning.
> > > > >
> > > > >
> > > > >
> > > > > Question 1:
> > > > >
> > > > > Why do you select JackRabbit rather than a database as your repository
> > > > > solution?
> > > > >
> > > > >
> > > > >
> > > > > There are a lot of answers to this question, and it seems that
> > > > > everybody who joins this community already knows the answers (it may be
> > > > > a formal document approved by your CTO). However, in my opinion, this
> > > > > is the basic question that really needs to be discussed here.
> > > > >
> > > > >
> > > > >
> > > > > To answer this question, some technical keywords in support of
> > > > > Jackrabbit may be "JCR API", "Lucene Search Engine" and so on. However, as
> > > > > a user of JackRabbit, I would like to list the two key concerns that led
> > > > > me to select JackRabbit as a repository solution from a product point of
> > > > > view:
> > > > >
> > > > >
> > > > >
> > > > > 1.      Quick and effective data search/fetch from a large-volume
> > > > >         content repository
> > > > > 2.      Built-in content version/revision control without extra code
> > > > >
> > > > >
> > > > >
> > > > > Now let me describe the big troubles I met in my use:
> > > > >
> > > > > 1.      Quick and effective data search or fetch from a large-volume
> > > > >         content repository
> > > > >
> > > > >
> > > > >
> > > > > Experience: There is not much data in my repository, which contains
> > > > > hundreds of nodes of two major object types. Each node (object) contains
> > > > > fewer than 20 properties (fields), including 5 other child nodes (nested
> > > > > small objects), and one of the two major node types has one binary
> > > > > property (up to 1 megabyte). Unfortunately, the performance is not
> > > > > acceptable when I navigate the major nodes. The main problem is that the
> > > > > built-in Cache Manager of JackRabbit resizes, which takes an
> > > > > unpredictable amount of time and sometimes makes the operation very
> > > > > slow. It is not easy to read that code when debugging Jackrabbit for
> > > > > performance tuning, because there is no documentation about the logic
> > > > > behind the index resizing.
> > > > >
> > > > >
> > > > >
> > > > > 2.      Content version/revision control
> > > > >
> > > > > Experience: This function works well on Jackrabbit v1.3. The main
> > > > > problem is that all revisions of a node (except the base revision) are
> > > > > lost when exporting/importing data from one repository to another. I am
> > > > > discussing this issue because it concerns repository backup.
> > > > >
> > > > >
> > > > >
> > > > > I just found that in JackRabbit v1.3 there is no way to back up a
> > > > > repository using a DB as persistence manager. I mean that there is no way
> > > > > to re-index based on the data in the DB. The following is my case:
> > > > >
> > > > >
> > > > >
> > > > > On one repository server, the index (in the file system) is corrupt, which
> > > > > causes all searches to fail. However, all data (in the DB) is still alive,
> > > > > and you can iterate over all of it. After cleaning the whole repository
> > > > > file system (most of it is index information), Jackrabbit can not
> > > > > correctly rebuild the index based on the data in the DB. If that happens
> > > > > on a production repository, it means: "My God, I am going to be fired". As
> > > > > far as I know, Jackrabbit v1.1 can successfully re-index (creating a
> > > > > totally new repository index (file system) based on DB data).
> > > > >
> > > > >
> > > > >
> > > > > As an alternative solution to back up the repository, I tried to
> > > > > export/import all nodes from one repository to another using the JCR
> > > > > Export API (exportSystemView). The good news is that JackRabbit v1.3
> > > > > successfully builds the index (the whole file system) during the import
> > > > > process; the bad news is that it loses all revisions of all versioned
> > > > > nodes. Can you imagine how frustrated I was when I realized there is no
> > > > > way to back up a repository based on DB data?
> > > > >
> > > > >
> > > > >
> > > > > I just got the answer to the re-index issue for Jackrabbit v1.3: You
> > > > > CAN NOT delete the whole file system. Only delete the indexes but keep the
> > > > > other folders. Jackrabbit can then re-index successfully when it starts up.
> > > > >
> > > > >
> > > > >
> > > > > Question 2:
> > > > >
> > > > > How can developers correctly use Jackrabbit (JCR) as their repository
> > > > > solution?
> > > > >
> > > > >
> > > > >
> > > > > Jackrabbit experts may notice that I use "object" to describe a node,
> > > > > and you may think that is not the pattern in which you use Jackrabbit. So
> > > > > the question is raised: "Which is the best practice (pattern) for using
> > > > > Jackrabbit (JCR) as a repository solution?"
> > > > >
> > > > >
> > > > >
> > > > > From this community, I see a lot of developers use Jackrabbit by
> > > > > fetching content by path. It means that they do not need to treat a node
> > > > > as an object; instead, they put content in the repository as assets, which
> > > > > can be easily and effectively retrieved by a given path. This pattern
> > > > > exactly matches the truth that "simplicity is best".
> > > > >
> > > > >
> > > > >
> > > > > My use of Jackrabbit is based on a business requirement, which needs
> > > > > to navigate most nodes and referenced nodes, and check child nodes and
> > > > > properties to find the proper content according to a couple of business
> > > > > rules. I would say that all the performance issues are raised by the node
> > > > > iteration process. Moreover, I have created generic classes using the
> > > > > Java reflection package for bidirectional mapping between nodes and
> > > > > objects. For performance improvement, the mapping supports generic lazy
> > > > > loading of child nodes. However, it seems all this work does not solve the
> > > > > performance problem, although it sounds pretty "professional". You may
> > > > > ask me: if you have such a business requirement, why not go to a DB and
> > > > > build the full relationships for your business model? J2EE developers all
> > > > > know how powerful the Java-DB world is: mature ORM tools (e.g.
> > > > > Hibernate), transaction management, batch data fetching, performance
> > > > > tuning and so on. However, my question is: "Is there any good pattern in
> > > > > current Jackrabbit to effectively handle data fetching with weak
> > > > > relationships?"
> > > > >
> > > > >
> > > > >
> > > > > Now it is time to say to the Jackrabbit developers and contributors
> > > > > what I really want to say to the whole community:
> > > > >
> > > > >
> > > > >
> > > > > My pleas:
> > > > >
> > > > > Guides, documentation and sample code are king for any open source
> > > > > project. How frustrating it is for Jackrabbit developers to find that an
> > > > > incorrect pattern is applied by users in their projects. On the other
> > > > > hand, how frustrating it is for JackRabbit users not to find a good
> > > > > pattern to follow, which could save them a lot of time. From a product
> > > > > point of view, whether search is done by XPath, XQuery or SQL is not the
> > > > > foundational issue. The foundational issue is that one effective means of
> > > > > search covers most of the important real-world requirements and that its
> > > > > documentation can be found on the Jackrabbit web site.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > I do believe Jackrabbit is a quality project, and I really hope all
> > > > > "best features" are documented, demoed and used by the whole community.
> > > > >
> > > > >
> > > > >
> > > > > Thanks
> > > > >
> > > > >
> > > > >
> > > > > Bruce
> > > >
> > >
> >
> >
> >
> > --
> > Best,
> >
> > Mark Waschkowski
> >
>



-- 
Best,

Mark Waschkowski
