Hi,

First of all, my intention was definitely not to troll. I am looking for the best solution for XML storage. My favourite is Jackrabbit, but I've found something that in my opinion performs better, and I am only asking why. I really want to use Jackrabbit, and I like its versioning and referencing features, but I need it to be a high-performance XML store.
In fact my question was based on short testing, but not just 5 minutes :) I created a repository containing three collections nested in each other, each with three 4.5 MB XML files. Then I ran a query. (By the way, the import times are impressive: a 4.5 MB XML file in ca. 10 seconds, will you agree? If not, show me how to configure Jackrabbit to perform that well; the same import in Jackrabbit took ca. 16 minutes on the same machine. Again, please don't take it as trolling - **I really want to know how to configure Jackrabbit to be high-performance**.) The query was really simple:

    for $x in //type where $x='STRING_SINGLE' return $x

and was performed on the whole DB - correct me if I am wrong. I received the query results in less than 4 seconds.
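
For anyone who wants to reproduce the query outside either product: for simple text-only `type` elements, the FLWOR expression above should select the same nodes as the plain XPath `//type[. = 'STRING_SINGLE']`. Below is a minimal sketch using only the JDK's built-in XPath API against a tiny in-memory document. The class name `TypeQueryDemo`, the sample XML, and the timing wrapper are illustrative assumptions; this is not the eXist or Jackrabbit test setup, and its timings say nothing about either.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of eXist or Jackrabbit.
class TypeQueryDemo {
    // XPath form of: for $x in //type where $x='STRING_SINGLE' return $x
    static final String QUERY = "//type[. = 'STRING_SINGLE']";

    // Parses the given XML string and returns the number of matching <type> elements.
    static int countMatches(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xpath.evaluate(
                    QUERY, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            return hits.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("query failed", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample data, only to show the shape of the query.
        String xml = "<doc><field><type>STRING_SINGLE</type></field>"
                   + "<field><type>INT</type></field>"
                   + "<field><type>STRING_SINGLE</type></field></doc>";
        long start = System.nanoTime();
        int matches = countMatches(xml);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(matches + " matches in " + elapsedMs + " ms");
    }
}
```

Running the same expression over the real 4.5 MB files would at least give a baseline for how long a plain parse-and-scan takes on the same machine.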
I know very well how Jackrabbit performs in its default configuration and on Derby, MySQL, and Oracle databases. You can find the results of my tests somewhere in the mailing list archives; I published a comprehensive report some time ago. After that report I ran the tests again because of changes made to the Jackrabbit source code. The results were better but, in comparison to eXist, again not too optimistic.
My main question is: is there anything that can speed up Jackrabbit to get close to the performance achieved by eXist? Please take this question seriously - performance is one of the main requirements for the XML storage I need.
BR,
Marcin Nowak

Jean-Baptiste Quenot wrote:
* Marcin Nowak:

Recently I've discovered an XML database quite similar in general concepts to Jackrabbit. It does not provide versioning or referencing between nodes, but it is really fast when I compare it with Jackrabbit, especially in querying and importing nodes. The question is: why does Jackrabbit perform so badly in comparison to eXist?

You're very obviously trolling, so I won't comment on it, but there are a few things that are worth mentioning:

1. eXist is an XML database, Jackrabbit is not, so you are comparing two unrelated things. Moreover, even if the query syntax can look similar, eXist returns XML, whereas JCR returns Java objects. You need to understand the implications of this, namely that parsing the resulting XML and working with it can quickly lead to memory and CPU starvation, especially when the query returns a lot of documents. JCR plays nicely with this, as it returns an iterator on the data set.

2. Jackrabbit is mostly seen as a Java API, whereas eXist is a standalone beast with specific servlets that talk XML-RPC, REST, and so on, mostly accessed using HTTP requests, which causes additional overhead. eXist even has a front-end based on Cocoon. A *lot* of caching is done on the eXist side, while with Jackrabbit you will need a second-level cache in your own code to address that.

3. In my book, eXist is not designed to let you query the whole database at once, whereas Jackrabbit allows you to return a sorted subset of documents from the whole repository very efficiently, by design. Accessing one XML document is very different from querying the whole database with 10k+ documents. Play with eXist for more than 5 minutes with a serious data set and you will notice it yourself.

4. Jackrabbit's efficiency at importing nodes depends largely on the persistence and filesystem implementation you are using. For example, I've seen the BDB storage backend perform 10 times faster than the XML-file-based one.

5. When you compare two approaches (one XML database, one JCR repository) for your own use case, and moreover when you ask for feedback about your experiments, publish the results of your benchmarks and be very careful to mention *what* you tested, and *how*. You also need to mention the numeric figures, of course. Otherwise you're just spreading FUD.

Cheers,
