Re: Some performance questions about Jackrabbit

Marcel Reutegger Wed, 06 Feb 2008 02:18:54 -0800

Lorenzo Dini wrote:

Indexing and Searching
12) How much improvement comes from specifying the indexing rules? I mainly use the name property for searching, and a few others... Would setting these properties as priority speed things up a lot? I think that most of the time is spent not on the Lucene query itself but in loading and sorting the nodes.

an indexing rule has an effect on the size of the index. if fewer properties are indexed, the index will be smaller and queries will be slightly faster. the primary use of the rules, however, is the boost values that you can assign. those have an effect on the ordering of result nodes in case you do an 'order by @jcr:score'. boost values in the configuration do not have an effect on performance.

performance wrt sorting of nodes has been greatly improved in 1.4 and should now be faster than in 1.3.x.

13) When exactly are the nodes loaded from the DB by the QueryEngine?

this depends on the query, the configuration and the sort criteria. if the configuration is set to respectDocumentOrder=true (default, but will change to false in jackrabbit >= 1.5) and there are no sort criteria in the query statement, then all result nodes are loaded and they are sorted according to their document order.

What happens during query.execute()?
What happens during query.getNodes()? How many nodes are read from the DB?


none, except if respectDocumentOrder=true and there is no sort criteria

When (and how) is the sorting done?

sorting is done at the very end of the query. document order is calculated from the content directly. any other sorting (based on property values) is done using lucene.

What happens during iterator.nextNode()?

the uuid of the node is resolved into a Node instance. usually the node needs to be read from the persistence manager, unless it is already present in the cache.
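
To make the lazy resolution concrete, here is a minimal sketch in plain Java. It is not Jackrabbit code: `CountingStore`, `LazyNodeIterator` and the string "nodes" are hypothetical stand-ins for the persistence manager, the result iterator and Node instances. It only illustrates the behaviour described above, namely that nothing is read until a result is requested and that a cached entry avoids the read.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class LazyResultSketch {

    /** Hypothetical stand-in for a persistence manager; counts how often it is hit. */
    public static class CountingStore {
        public int loads = 0;

        public String load(String uuid) {
            loads++;
            return "node:" + uuid;
        }
    }

    /** Resolves each uuid into a "node" only when next() is called. */
    public static class LazyNodeIterator implements Iterator<String> {
        private final Iterator<String> uuids;
        private final Map<String, String> cache;
        private final CountingStore store;

        public LazyNodeIterator(List<String> uuids, Map<String, String> cache, CountingStore store) {
            this.uuids = uuids.iterator();
            this.cache = cache;
            this.store = store;
        }

        @Override
        public boolean hasNext() {
            return uuids.hasNext();
        }

        @Override
        public String next() {
            String uuid = uuids.next();
            String node = cache.get(uuid);
            if (node == null) {
                // Cache miss: only now is the store consulted.
                node = store.load(uuid);
                cache.put(uuid, node);
            }
            return node;
        }
    }

    public static void main(String[] args) {
        CountingStore store = new CountingStore();
        Map<String, String> cache = new HashMap<>();
        cache.put("b", "node:b"); // pretend "b" was read earlier

        Iterator<String> it = new LazyNodeIterator(Arrays.asList("a", "b", "c"), cache, store);
        System.out.println("loads before iteration: " + store.loads);
        it.next();
        it.next();
        System.out.println("loads after two results: " + store.loads);
    }
}
```

In this sketch a caller that stops after the first few results never pays for the remaining ones.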

14) How does the sorting work, since it cannot be done by the DB? Is it done by Lucene?


correct.

Or are all the nodes simply sorted using Collections.sort? That means that all nodes must be loaded before returning the first, even if you need only the first N.

this is only the case for results in document order. we assumed people would rarely need this and did not optimize it.
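
The difference between the two cases can be sketched in plain Java. This is an illustration only, not how Jackrabbit implements it: the `Store` class, the uuid strings and the `index` map are hypothetical. The point is where the sort key comes from. For document order the key lives in the content, so every hit is loaded before the first result can be returned; for a property sort the key is available from the index, so sorting itself loads nothing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SortCostSketch {

    /** Hypothetical content store: maps uuid to document position and counts loads. */
    public static class Store {
        private final Map<String, Integer> positions;
        public int loads = 0;

        public Store(Map<String, Integer> positions) {
            this.positions = positions;
        }

        public int loadPosition(String uuid) {
            loads++;
            return positions.get(uuid);
        }
    }

    /** Document order: the sort key comes from the content, so each hit is loaded. */
    public static List<String> firstNInDocumentOrder(List<String> hits, Store store, int n) {
        final Map<String, Integer> keys = new HashMap<>();
        for (String uuid : hits) {
            keys.put(uuid, store.loadPosition(uuid));
        }
        List<String> sorted = new ArrayList<>(hits);
        sorted.sort((a, b) -> Integer.compare(keys.get(a), keys.get(b)));
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    /** Property order: the sort key comes from the index; no content is loaded to sort. */
    public static List<String> firstNByIndexedValue(List<String> hits, Map<String, String> index, int n) {
        List<String> sorted = new ArrayList<>(hits);
        sorted.sort((a, b) -> index.get(a).compareTo(index.get(b)));
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }
}
```

With three hits and n = 1, the document-order variant in this sketch still performs three loads, while the indexed variant performs none until the caller actually reads the returned results.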

How to speed this up?


this depends on the query you have. can you please provide some query 
statements?

15) Is there any change in JR 1.4? I saw it is possible to limit the entries returned and set an offset; how does this work with sorting?

actually, lots. performance has been improved for property existence checks, hierarchy checks are faster and sorting has been improved as well.

16) In case I need a specific subnode with a particular property, is it faster to list all the subnodes using node.getNodes() and pick the right one, or to do a Lucene query? I imagine it depends on the number of subnodes, but for approximately 20 subnodes I would guess the overhead of Lucene outweighs that of getNodes().


if there are only 20 child nodes the manual check is probably faster than a 
query.

regards
 marcel
