Hi,

We have been running into some interesting behavior when searching a fairly large repository (~1,000,000 nodes). The test data is a tree of nodes of type nt:unstructured, made referenceable, each with two numeric properties (a sequential count of the node and a random number between 0 and that count). Each node has a reference to its parent and up to 100 child nodes, and is named n<m>, where m is the index of the node relative to its parent. So, for instance, /load/n0 is the first node, /load/n1 the second, and so on up to /load/n99. Each of those in turn has 100 children, and so on, so that a valid path is, for instance, /load/n23/n34/n50. One node in six also has an nt:file node attached, in order to test full-text searches. If requested, I can provide the code to create the test set.
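The tree layout described above can be modeled with a minimal in-memory sketch in plain Java, with no JCR calls involved. It assumes a fixed fan-out, a sequential count assigned breadth-first, and that "one node in six" means every node whose count is a multiple of six; the class, field and method names are hypothetical, and the parent path stands in for the JCR reference property.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical in-memory model of the test tree: names n<m> relative to the
// parent, a sequential count, a random value in [0, count], a link to the
// parent, and an nt:file flag on roughly one node in six. No repository access.
public class TestTreeSketch {

    public static final class TreeNode {
        public final String path;
        public final String parentPath; // stands in for the reference to the parent
        public final long count;        // sequential number of the node
        public final long random;       // random value between 0 and count, inclusive
        public final boolean hasFile;   // whether an nt:file child would be attached

        TreeNode(String path, String parentPath, long count, long random, boolean hasFile) {
            this.path = path;
            this.parentPath = parentPath;
            this.count = count;
            this.random = random;
            this.hasFile = hasFile;
        }
    }

    // Builds the tree breadth-first under /load. Assumption: every node has
    // exactly fanOut children (the real data has "up to" 100).
    public static List<TreeNode> build(int fanOut, int depth, long seed) {
        Random rnd = new Random(seed);
        List<TreeNode> nodes = new ArrayList<TreeNode>();
        List<String> level = new ArrayList<String>();
        level.add("/load");
        long count = 0;
        for (int d = 0; d < depth; d++) {
            List<String> next = new ArrayList<String>();
            for (String parent : level) {
                for (int m = 0; m < fanOut; m++) {
                    String path = parent + "/n" + m;
                    // nextDouble() is in [0, 1), so this lands in [0, count]
                    long random = (long) (rnd.nextDouble() * (count + 1));
                    boolean hasFile = count % 6 == 0; // assumption: every 6th node
                    nodes.add(new TreeNode(path, parent, count, random, hasFile));
                    next.add(path);
                    count++;
                }
            }
            level = next;
        }
        return nodes;
    }

    public static void main(String[] args) {
        for (TreeNode n : build(3, 2, 42L)) {
            System.out.println(n.path + " count=" + n.count + " random=" + n.random
                    + (n.hasFile ? " +nt:file" : ""));
        }
    }
}
```

Running `main` prints the 12 nodes of a tree with fan-out 3 and depth 2. In the real data set each entry would instead be an nt:unstructured node with the mix:referenceable mixin, and the fan-out would be up to 100.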
The strange behavior that prompted me to write to this mailing list is the following. Say that I am searching for a node that contains the word 'beatles' at some level under the node /load/n40, and I use the following query:

/jcr:root/load/n40//*[jcr:contains(.,'beatles')]

The execution time is 1672ms. If I use instead:

/jcr:root/load/n40/*/*/*/*[jcr:contains(.,'beatles')]

the execution time is 19749ms. In theory, the second query could execute faster than the first, because I am providing more information (only nodes at the 4th level under /load/n40), yet it takes more than ten times longer to execute. Is there a reason why?

The other, far more worrisome, problem appears to be the opposite. I have executed the following two queries:

/jcr:root/load/n50/n2/* ==> 931ms
/jcr:root/load/n50/n2/*/* ==> 661ms

The first returns all nodes one level below /load/n50/n2 and the second all nodes two levels below. There are no other nodes under /load/n50/n2. When I tried the following query, which should return the same nodes in one operation, the result was surprising (in a bad way):

/jcr:root/load/n50/n2//* ==> 353769ms

The CPU goes to 100%, and I see a lot of entries similar to this one in the Jackrabbit logs:

DocNumberCache: size=1024/1024, #accesses=17039, #hits=167, #misses=16872, cacheRatio=1% (DocNumberCache.java, line 155)

and then finally, some 5 minutes later, I get the result. Even if I restrict the query, it still takes the same time:

/jcr:root/load/n50/n2/n96//*

even though there are only about a hundred nodes under that. I see exactly the same behavior if I try the SQL syntax:

select * from nt:base where jcr:path like '/load/n50/n2/n96/%'

The Jackrabbit version is 1.2.2. The backend is Oracle 10g, and I am running the application on Tomcat 5.5 with JDK 1.5 and 1GB assigned to the JVM (on Windows).

Does anybody have any idea why this is happening, and whether there is a workaround?

Thanks,
Alessandro Bologna
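To make the scopes being compared concrete, here is a minimal sketch in plain Java of which paths each query form selects, ignoring jcr:contains and node types. No Jackrabbit code is involved, and the class and method names are hypothetical. It shows that base//* and the SQL LIKE 'base/%' pick any depth below base, that base/*/*/*/* picks exactly four levels below it, and that when a subtree is only two levels deep, the descendant form selects the same set as the union of the one-level and two-level child forms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of path scopes only; it does not model
// how Jackrabbit evaluates these queries.
public class PathScopeSketch {

    // Matches base//* and SQL "jcr:path like 'base/%'": any depth below base.
    // The trailing "/" keeps /load/n50/n20 from matching base /load/n50/n2.
    public static boolean isDescendant(String base, String path) {
        return path.startsWith(base + "/");
    }

    // Matches base followed by exactly `depth` wildcard steps, e.g. depth 4 for base/*/*/*/*.
    public static boolean isAtDepth(String base, String path, int depth) {
        if (!isDescendant(base, path)) {
            return false;
        }
        String rest = path.substring(base.length() + 1);
        return rest.split("/").length == depth;
    }

    public static void main(String[] args) {
        String base = "/load/n50/n2";
        List<String> paths = Arrays.asList(
                "/load/n50/n2",          // the base itself is not selected
                "/load/n50/n2/n96",      // one level below
                "/load/n50/n2/n96/n7",   // two levels below
                "/load/n50/n20/n1");     // sibling whose name shares a prefix
        List<String> descendants = new ArrayList<String>();
        List<String> union = new ArrayList<String>();
        for (String p : paths) {
            if (isDescendant(base, p)) {
                descendants.add(p);
            }
            if (isAtDepth(base, p, 1) || isAtDepth(base, p, 2)) {
                union.add(p);
            }
        }
        System.out.println(descendants);
        System.out.println(union);
    }
}
```

Both printed lists contain the same two paths, which is the equivalence the post relies on when comparing the timings of the child-axis queries with the descendant query.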
