Re: eXist

Marcin Nowak Mon, 23 Apr 2007 01:38:48 -0700

Hi,

First of all, my intention was definitely not to troll. I am looking for the best solution for XML storage. My favourite is Jackrabbit, but I've found something that in my opinion performs better, and I am only asking why. I really want to use Jackrabbit; I like its versioning and referencing features, but I need it to be a high-performance XML storage.

In fact my question was based on short testing, but not just 5 minutes :) I created a repository containing three collections nested in each other, each with three 4.5 MB XML files. Then I launched a query. (By the way, import times are impressive: a 4.5 MB XML file in ca. 10 seconds, will you agree? If not, show me how to configure Jackrabbit to perform that well; the same import in Jackrabbit took ca. 16 minutes on the same machine. Again, please don't take it as trolling - **I really want to know how to configure Jackrabbit to be high-performance**.) The query was really simple:


for $x in //type where $x='STRING_SINGLE'
return $x

and was performed on the whole DB - correct me if I am wrong. I received the query results in less than 4 seconds.
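
A timing like the one above is easier to compare across systems when it is taken over several runs after a warm-up. The following is only a minimal sketch, assuming the query call can be wrapped in a `Runnable`; `QueryTimer`, `measureMillis` and `median` are hypothetical names and not part of Jackrabbit or eXist.

```java
import java.util.Arrays;

/** Hypothetical micro-benchmark helper; not part of Jackrabbit or eXist. */
final class QueryTimer {

    /**
     * Runs the task `warmup` times untimed, then `runs` times timed.
     * Returns the timed durations in milliseconds, sorted ascending.
     */
    static long[] measureMillis(Runnable task, int warmup, int runs) {
        for (int i = 0; i < warmup; i++) {
            task.run(); // discarded: JIT compilation and cold caches distort early runs
        }
        long[] millis = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            millis[i] = (System.nanoTime() - start) / 1_000_000;
        }
        Arrays.sort(millis);
        return millis;
    }

    /** Middle element of an already sorted array (upper median for even lengths). */
    static long median(long[] sorted) {
        return sorted[sorted.length / 2];
    }
}
```

Reporting the median together with the fastest and slowest run, the data set size, and the backend configuration makes the figures reproducible by others.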

I know very well how Jackrabbit performs in its default configuration and on Derby, MySQL, and Oracle databases. You can see the results of my tests somewhere in the mailing list archives: I published a detailed report some time ago, and after that report I ran the tests again because of changes made in the Jackrabbit source code. The results were better but, in comparison to eXist, again not too optimistic.

My main question is: is there anything that can speed up Jackrabbit to get close to the performance achieved in eXist? Please take this question seriously - performance is one of the main requirements for the XML storage I need.


BR,
Marcin Nowak

Jean-Baptiste Quenot wrote:

* Marcin Nowak:

Recently I've discovered an XML database quite similar in general
concepts to Jackrabbit,  in fact it does  not provide versioning
and  referencing  between  nodes  but   it  is  really  fast  as
I  compared  it  with  Jackrabbit, especially  in  querying  and
importing nodes, question is why Jackrabbit performs so badly in
comparison to eXist?


You're asking  for a troll very  obviously, so I won't  comment on
it, but there are a few things that are worth mentioning:

1. eXist  is  an XML  database,  Jackrabbit  is  not, so  you  are
   comparing two  unrelated things.   Moreover, even if  the query
   syntax can look similar, eXist returns XML, whereas JCR returns
   Java objects.  You need to understand the implications of this,
   namely parsing the resulting XML and working with it can quickly
   lead to  memory and CPU  starvation, especially when  the query
   returns a lot of documents.  JCR  plays nicely with this, as it
   returns an iterator on the data set.

2. Jackrabbit is  mostly seen  as a Java-API,  whereas eXist  is a
   standalone beast with specific servlets that talk xmlrpc, REST,
   and  so  on mostly  accessed  using  HTTP requests  causing  an
   additional  overhead.  eXist  even  has a  front-end  based  on
   Cocoon.  A  *lot* of caching is  done on the eXist  side, while
   with Jackrabbit you will need  a second-level cache in your own
   code to address that.

3. In my  book, eXist is not  designed to let you  query the whole
   database at  once, whereas  Jackrabbit allows  you to  return a
   sorted  subset  of documents  from  the  whole repository  very
   efficiently,  by design.   Accessing one  XML document  is very
   different from querying the whole database with 10k+ documents.
   Play with eXist more than 5 minutes with a serious data set and
   you will notice by yourself.

4. Jackrabbit's efficiency at importing nodes depends largely on
   the persistence  and filesystem  implementation you  are using.
   For example I've seen the  BDB storage backend perform 10 times
   faster than the XML-file-based one.

5. When  you compare  two approaches  (one XML  database, one  JCR
   repository) for your own usecase, and moreover when you ask for
   feedback about  your experiments,  publish the results  of your
   benchmarks, be very  careful to mention *what*  you tested, and
   *how*.  You also need to mention of course the numeric figures.
   Otherwise you're just spreading FUD.
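
The second-level cache mentioned in point 2 can be sketched roughly as follows. This is only a minimal illustration, assuming a small in-memory LRU cache placed in front of repository reads in application code; `LruCache` and its `loader` argument are hypothetical names, not Jackrabbit API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical application-level cache: keeps the most recently used entries in memory. */
final class LruCache<K, V> {
    private final Map<K, V> entries;

    LruCache(final int capacity) {
        // accessOrder = true keeps the least recently used entry first in iteration order.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity; // evict once the capacity is exceeded
            }
        };
    }

    /** Returns the cached value, calling the loader (e.g. a repository read) only on a miss. */
    synchronized V get(K key, Function<K, V> loader) {
        V value = entries.get(key);
        if (value == null) {
            value = loader.apply(key);
            entries.put(key, value);
        }
        return value;
    }

    /** Checks for a key without changing its recency. */
    synchronized boolean contains(K key) {
        return entries.containsKey(key);
    }

    synchronized int size() {
        return entries.size();
    }
}
```

A caller would pass the expensive lookup as the loader, for example `cache.get(path, p -> readFromRepository(p))`, where `readFromRepository` stands for whatever repository access the application already has. Invalidation on writes is left out of this sketch.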

Cheers,
