Hi all,

 

We are currently working on an application built on Jackrabbit (CRX) that holds 
a lot of content (more than 15000 documents). 

 

To meet our requirements we had to create relations between our documents 
(similar to symlinks on Unix systems). For example, we have a document

 

/docs/generated/document1

 

that should be referenced from several other locations, say:

 

/some/path/refToDocument1

/some/other/path/refToDocument1

/and/another/path/refToDocument1

 

We implemented this with a property on each reference that stores the path to 
the original document. This reference is resolved in a higher layer of our 
application. 
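
To make the model concrete, here is a minimal sketch of that resolution step. It uses a plain in-memory map as a stand-in for the repository, not the JCR API, and the property name targetPath as well as the class and method names are assumptions chosen for illustration only:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class ReferenceResolver {

    // Hypothetical name of the property that holds the path of the original document.
    public static final String TARGET_PROPERTY = "targetPath";

    // In-memory stand-in for the repository: node path -> node properties.
    private final Map<String, Map<String, String>> nodes = new HashMap<>();

    public void addNode(String path, Map<String, String> properties) {
        nodes.put(path, properties);
    }

    // Returns the properties of the original document if the node at 'path' is a
    // reference, the node's own properties if it is a plain document, and an
    // empty Optional if the node (or the reference target) does not exist.
    public Optional<Map<String, String>> resolve(String path) {
        Map<String, String> node = nodes.get(path);
        if (node == null) {
            return Optional.empty();
        }
        String target = node.get(TARGET_PROPERTY);
        if (target == null) {
            return Optional.of(node);
        }
        return Optional.ofNullable(nodes.get(target));
    }

    public static void main(String[] args) {
        ReferenceResolver resolver = new ReferenceResolver();
        resolver.addNode("/docs/generated/document1", Map.of("title", "Document 1"));
        resolver.addNode("/some/path/refToDocument1",
                Map.of(TARGET_PROPERTY, "/docs/generated/document1"));
        System.out.println(resolver.resolve("/some/path/refToDocument1"));
    }
}
```

The point of the sketch is that the filter and sort properties live on the original document, so every reference has to be resolved before it can be filtered or sorted.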

 

Now we have to obtain a list of all referenced documents below a given path, 
filtered by properties that are currently stored on the document itself and 
conditionally sorted by a set of properties. So far we have tried two 
approaches:

 

1.       Setting a mixin type on the references, which we can query for. This 
returns an unsorted, unfiltered (and very large) result set that we filter and 
sort afterwards.

2.       Iterating (using multiple threads) over the whole tree below the given 
path, collecting only the nodes that match the given filter. Sorting is done 
afterwards.
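
For illustration, approach (2) boils down to something like the following sketch. It is single-threaded and works on a simplified in-memory Node class rather than on javax.jcr.Node; the class, the status property and the filter are assumptions made up for this example and are not our actual code:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class TreeFilter {

    // Simplified stand-in for a repository node (not the JCR Node interface).
    public static final class Node {
        public final String path;
        public final Map<String, String> properties;
        public final List<Node> children = new ArrayList<>();

        public Node(String path, Map<String, String> properties) {
            this.path = path;
            this.properties = properties;
        }
    }

    // Walks the whole subtree below 'root' (depth-first, using an explicit stack)
    // and collects only the nodes that match the filter. The result is unsorted.
    public static List<Node> collect(Node root, Predicate<Node> filter) {
        List<Node> matches = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node current = stack.pop();
            if (filter.test(current)) {
                matches.add(current);
            }
            for (Node child : current.children) {
                stack.push(child);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Node root = new Node("/some", Map.of());
        Node ref = new Node("/some/path/refToDocument1", Map.of("status", "published"));
        root.children.add(ref);
        // Hypothetical filter: keep only nodes whose "status" property is "published".
        List<Node> result = collect(root, n -> "published".equals(n.properties.get("status")));
        System.out.println(result.size());
    }
}
```

The cost of this approach grows with the size of the subtree, not with the number of matches, because every node has to be visited.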

 

Neither solution performed very well. In (1) the search took about 900 ms 
(which I think is OK for about 10000 entries in the result set) and the 
filtering took about 3000 ms. In (2) the traversal took 4500 ms, which includes 
the filtering but not the sorting. So neither solution is suitable for our 
project, and we are looking for a better way to model the given requirement. 
What is the best way to work with relational content in Jackrabbit?

 

The last idea I had for solving the performance issue was to reduce the size of 
the result set by querying the documents directly and letting Lucene do the 
filtering and sorting, but this failed because of the complex sorting we have 
to implement. For example: order by property a when b doesn't exist, otherwise 
order by b. Is it possible to implement conditional sorting using the 
properties available in the index?
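
To show what I mean by conditional sorting, here is a minimal sketch of the ordering done in application code. Documents are represented as plain maps, and the class and method names are invented for this example; it does not show how (or whether) the same ordering can be expressed against the Lucene index:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class ConditionalSort {

    // Sort key: the value of property "b" if the document has it,
    // otherwise the value of property "a" (null if it has neither).
    public static String sortKey(Map<String, String> doc) {
        String b = doc.get("b");
        return b != null ? b : doc.get("a");
    }

    // Returns a sorted copy of the list; documents that have neither
    // property are placed at the end.
    public static List<Map<String, String>> sortConditionally(List<Map<String, String>> docs) {
        List<Map<String, String>> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparing(ConditionalSort::sortKey,
                Comparator.nullsLast(Comparator.naturalOrder())));
        return sorted;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
                Map.of("name", "doc1", "a", "zebra"),
                Map.of("name", "doc2", "a", "apple", "b", "mango"),
                Map.of("name", "doc3", "a", "kiwi"));
        for (Map<String, String> doc : sortConditionally(docs)) {
            System.out.println(doc.get("name") + " -> " + sortKey(doc));
        }
    }
}
```

In this example doc2 is ordered by "mango" (its b) and not by "apple" (its a), so the result is doc3, doc2, doc1. One option I can think of, but have not tried, would be to write such a precomputed key into a single extra property whenever a document is saved, so that a plain order by on that one property would be enough.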

 

Any other hints regarding performance improvements are very welcome. 
(bundleCacheSize has already been increased to about 10% of the available heap 
size ;-)).



Thanks so far,

Dirk Rudolph  




T-Systems Multimedia Solutions GmbH 
Organizational unit CCS
Dirk Rudolph
Software Development, OCJP

Street address: Riesaer Straße 5, 01129 Dresden 
Postal address: Postfach 10 02 24, 01072 Dresden 
+49 351 2820-5363       (Tel)  
E-Mail: [email protected] <mailto:[email protected]> 
Internet: http://www.t-systems-mms.com <http://www.t-systems-mms.de/> 
