Lorenzo Dini wrote:
Indexing and Searching

12) How much of an improvement does specifying indexing rules bring? I mainly use the name property for searching, plus a few others. Would giving these properties priority speed things up a lot? I think most of the time is spent not on the Lucene query itself but on loading and sorting the nodes.

an indexing rule has an effect on the size of the index. if fewer properties are indexed, the index will be smaller and queries will be slightly faster. the primary use of the rules, however, is the boost values that you can assign. those have an effect on the ordering of result nodes in case you do an 'order by @jcr:score'. boost values in the configuration do not have an effect on performance.

performance wrt sorting of nodes has been greatly improved in 1.4 and should now be faster than in 1.3.x.

13) When exactly are the nodes loaded from the DB by the QueryEngine?

this depends on the query, the configuration and the sort criteria. if the configuration is set to respectDocumentOrder=true (default, but will change to false in jackrabbit >= 1.5) and there is no sort criteria in the query statement, then all result nodes are loaded and they are sorted according to their document order.

What happens during query.execute()?
What happens during query.getNodes()? How many nodes are read from the DB?

none, except if respectDocumentOrder=true and there is no sort criteria

When (and how) is the sorting done?

sorting is done at the very end of the query. document order is calculated from the content directly. any other sorting (based on property values) is done using lucene.

What happens during iterator.nextNode()?

the uuid of the node is resolved into a Node instance. usually the node needs to be read from the persistence manager, unless it is already present in the cache.
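
The lazy resolution described above can be pictured with a small toy model. This is only a sketch and not Jackrabbit code: the class name, the map used as a cache, and the loader function standing in for the persistence manager are all hypothetical. The point is that the result holds only uuids, and a node is loaded on each next() call only when the cache does not already have it.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Toy model only: hypothetical names, not Jackrabbit classes.
class LazyNodeIterator<N> implements Iterator<N> {
    private final Iterator<String> ids;        // uuids taken from the index hits
    private final Map<String, N> cache;        // stands in for the item cache
    private final Function<String, N> loader;  // stands in for the persistence manager
    private int loads = 0;                     // number of times the loader was called

    LazyNodeIterator(List<String> uuids, Map<String, N> cache, Function<String, N> loader) {
        this.ids = uuids.iterator();
        this.cache = cache;
        this.loader = loader;
    }

    @Override
    public boolean hasNext() {
        return ids.hasNext();
    }

    @Override
    public N next() {
        String uuid = ids.next();
        N node = cache.get(uuid);
        if (node == null) {
            // Cache miss: only now is the node read from the backing store.
            node = loader.apply(uuid);
            loads++;
            cache.put(uuid, node);
        }
        return node;
    }

    int loads() {
        return loads;
    }
}
```

In this model, creating the iterator loads nothing, and iterating over only the first few hits costs only that many loads (fewer if some are cached).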

14) How does the sorting work, since it cannot be done by the DB? Is it done by Lucene?

correct.

Or are all the nodes simply sorted using Collections.sort? That would mean all nodes must be loaded before the first one is returned, even if you need only the first N.

this is only the case for results in document order. we assumed people would rarely need this and did not optimize it.
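
To illustrate why document order is the expensive case, here is a toy comparison. It is a sketch under assumptions, not the actual implementation: the Loaded type, the docPosition field and the loader function are hypothetical stand-ins. When the index has already delivered the uuids in the requested order, only the first N nodes need loading. When the order is only known from the content, every hit must be loaded before the first one can be returned.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

// Toy model only: hypothetical names, not Jackrabbit code.
class OrderingCost {
    // Minimal stand-in for a loaded node: its uuid and its position in the tree.
    static final class Loaded {
        final String uuid;
        final int docPosition;

        Loaded(String uuid, int docPosition) {
            this.uuid = uuid;
            this.docPosition = docPosition;
        }
    }

    // The index already returned the uuids in the requested order,
    // so it is enough to load the first n of them.
    static List<Loaded> firstNPresorted(List<String> sortedUuids, int n,
                                        Function<String, Loaded> loader) {
        List<Loaded> out = new ArrayList<>();
        for (String uuid : sortedUuids) {
            if (out.size() == n) {
                break;
            }
            out.add(loader.apply(uuid));
        }
        return out;
    }

    // Document order is only known from the content itself:
    // load every hit, sort, and only then cut down to n.
    static List<Loaded> firstNInDocumentOrder(List<String> uuids, int n,
                                              Function<String, Loaded> loader) {
        List<Loaded> all = new ArrayList<>();
        for (String uuid : uuids) {
            all.add(loader.apply(uuid));
        }
        all.sort(Comparator.comparingInt((Loaded l) -> l.docPosition));
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }
}
```

With three hits and n = 1, the first method calls the loader once and the second calls it three times.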

How can this be sped up?

this depends on the query you have. can you please provide some query statements?

15) Are there any changes in JR 1.4? I saw it is possible to limit the number of entries returned and to set an offset. How does this work with sorting?

actually, lots. performance has been improved for property existence checks, hierarchy checks are faster and sorting has been improved as well.

16) If I need a specific subnode with a particular property, is it faster to list all the subnodes using node.getNodes() and pick the right one, or to run a Lucene query? I imagine it depends on the number of subnodes, but roughly, for 20 subnodes, does the overhead of a Lucene query outweigh the cost of getNodes()?

if there are only 20 child nodes the manual check is probably faster than a query.

regards
 marcel
