Re: importing jackrabbit into jackrabbit

Marcel Reutegger Thu, 26 Apr 2007 01:11:29 -0700

Stefan Kurla wrote:

As far as the file nodetype is concerned, this is a custom nodetype
which has 4 references per imported file. Currently all the
references point to the same UUID since we are testing; this could
change in the future.

This may be the time-consuming factor. Whenever a reference is added that points to a node N, the complete set of references pointing to N is rewritten to the persistence manager. With an increasing number of references to N, this will slow down your import. Is there a reason why all files point to the same node?
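To see why this grows so quickly, here is a rough cost model in Java. It is a hypothetical sketch, not Jackrabbit's persistence code: it only assumes the behaviour described above (the full reference set for N is rewritten on every new reference) and the 4 references per file from the quoted setup. The k-th reference to N then writes k entries, so n files write 4n(4n+1)/2 entries in total, which is quadratic in the number of files.

```java
/**
 * Hypothetical cost model (not Jackrabbit code): every time a reference to
 * target node N is added, the whole set of references pointing to N is
 * written again, so the k-th reference costs k entry writes.
 */
public class ReferenceRewriteCost {

    /** Total reference entries written after importing `files` files,
     *  each adding `refsPerFile` references to the same target node N. */
    static long entriesWritten(int files, int refsPerFile) {
        long total = 0;
        long refsToTarget = 0;
        for (int f = 0; f < files; f++) {
            for (int r = 0; r < refsPerFile; r++) {
                refsToTarget++;        // one more reference now points to N
                total += refsToTarget; // the complete set for N is rewritten
            }
        }
        return total;
    }

    /** Entries written while importing only the i-th file (1-based). */
    static long entriesForFile(int i, int refsPerFile) {
        return entriesWritten(i, refsPerFile) - entriesWritten(i - 1, refsPerFile);
    }

    public static void main(String[] args) {
        int refsPerFile = 4; // as in the quoted setup: 4 references per file
        for (int files : new int[] {1, 1000, 12000}) {
            System.out.printf("%6d files -> %,d entries written; file #%d alone writes %,d%n",
                    files, entriesWritten(files, refsPerFile),
                    files, entriesForFile(files, refsPerFile));
        }
    }
}
```

Under this model the first file writes 10 entries while file #1000 alone writes 15,994, so the per-file cost keeps climbing as the import proceeds. The model says nothing about absolute timings, only about how the work scales when all references target one node.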

Any tips or ideas? I will update the results of the test. Right now I
have imported 1K out of 12K files and the import time has gone up to 4
seconds per file. Is this normal? Remember, since I am importing the
jackrabbit SVN, all files are put under one nt:folder, which is
"jackrabbit". This is a pretty normal case of about 12K files and only
78MB. We have plans for a 1TB repository.

I did a quick test with an adapted version of http://svn.apache.org/repos/asf/jackrabbit/trunk/jackrabbit-core/src/test/java/org/apache/jackrabbit/core/query/TextExtractorTest.java
that saves changes whenever 100 files have been imported.
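The batching idea can be sketched in plain Java. This is an illustrative sketch, not the adapted TextExtractorTest itself: `Sink`, `addFile` and `importAll` are hypothetical names. In a real JCR import, `addFile` would create the `nt:file` node with its `jcr:content` child, and `save()` would call `Session.save()`. Saving every 100 files keeps the set of pending changes bounded without paying for a save after every single file.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of an import loop that persists pending changes every `batchSize`
 * files. `Sink` is a hypothetical stand-in for the repository session.
 */
public class BatchedImport {

    interface Sink {
        void addFile(String path); // stage one file as a pending change
        void save();               // persist all pending changes
    }

    /** Imports all paths and returns how many times save() was called. */
    static int importAll(List<String> paths, int batchSize, Sink sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be > 0");
        }
        int saves = 0;
        int pending = 0;
        for (String path : paths) {
            sink.addFile(path);
            pending++;
            if (pending == batchSize) { // a full batch is staged: persist it
                sink.save();
                saves++;
                pending = 0;
            }
        }
        if (pending > 0) {              // flush the last, partial batch
            sink.save();
            saves++;
        }
        return saves;
    }

    public static void main(String[] args) {
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < 2978; i++) {
            paths.add("file-" + i);
        }
        int[] staged = {0};
        int saves = importAll(paths, 100, new Sink() {
            public void addFile(String path) { staged[0]++; }
            public void save() { }
        });
        System.out.println(staged[0] + " files, " + saves + " saves");
    }
}
```

With 2978 files (the count in the test result) and a batch size of 100, this performs 29 full-batch saves plus one final save for the remaining 78 files, i.e. 30 saves in total.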

I used the svn export of jackrabbit/trunk (~3000 files in ~900 folders)

configuration:
- jackrabbit in-process
- o.a.j.c.persistence.db.DerbyPersistenceManager (externalBlobs = false)
- text extractors: pdf, xml and plain text

test result:

Imported 2978 files in 50484 ms.
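For a rough sense of scale, the result works out as follows. This is a worked calculation added for illustration; it is only a loose comparison with the 4 seconds per file quoted above, since the two setups differ.

```latex
\frac{50484\ \text{ms}}{2978\ \text{files}} \approx 16.95\ \text{ms per file},
\qquad
\frac{2978\ \text{files}}{50.484\ \text{s}} \approx 59\ \text{files/s}
```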

regards
 marcel
