Marcel,

just wanted to get back to you (and the list as well). I downloaded
jackrabbit-webapp-1.3-SNAPSHOT and ran the same tests again.
Performance is much better and the queries seem to be much more optimized.
Congratulations on the improvements.
Alessandro



On 3/28/07, Marcel Reutegger <[EMAIL PROTECTED]> wrote:

Hi Alessandro,

Alessandro Bologna wrote:
> Now I have found another unusual behavior, and I was hoping you could
> explain this too...
> These queries have been executed in sequence (without restarting):
>
>
> Executing query: /jcr:root/load/n10/n33/*[@random>10000]
> Query execution time:10245ms
> Number of nodes:91
>
>
>
> Executing query: /jcr:root/load/n10/n33/*[@random>10000 and
> @random<10000000]
> Query execution time:20409ms
> Number of nodes:91
>
>
>
> Executing query: /jcr:root/load/n10/n33/*[@random>10000 and
> @random<10000000 and @random<10000001]
> Query execution time:30053ms
> Number of nodes:91
>
>
> I think that the execution time on the first query is already quite high
> (an equality query takes just a few milliseconds),

This has already been improved with
http://issues.apache.org/jira/browse/JCR-804

> but what I am more disconcerted about is that the second query (with two
> conditions, the second being a 'dummy' one since it is true for each of
> the 91 nodes returned by the first query) takes double the time, and the
> third query (with the third condition being basically the same as the
> second one) takes three times as much.
>
> Typically I would expect an 'and' query to be executed on the results of
> the first one, and therefore to take just a little bit less.
>
> So the questions are:
> 1. why does it take so long to find 91 nodes in the first query

this is caused by:
- MultiTermDocs is expensive on large value ranges (-> fixed in JCR-804)
- @random>10000 (probably) selects a great number of nodes, which are
later excluded again because of the path constraint

> 2. why do the second and third queries take as much time as the first
> times the number of expressions?

each of the expressions is evaluated independently and in a second step
'and'ed together. therefore the predominant cost in your query seems to be
the individual expressions. because each of the range expressions selects
a lot of nodes, lucene cannot optimize the execution well. see below for a
workaround.
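to see why the cost scales with the number of expressions, here is a toy
model in Python (an illustration only, not Jackrabbit/Lucene internals):
each predicate is evaluated with its own full pass over the indexed values,
and only afterwards are the hit sets intersected.

```python
# Toy model of independent predicate evaluation -- an illustration,
# not Jackrabbit/Lucene internals.

# fake index: node id -> value of the 'random' property (all below 100000)
docs = {i: (i * 7919) % 100000 for i in range(1000)}

def select(pred):
    """One full scan per predicate -- this is the per-expression cost."""
    return {i for i, v in docs.items() if pred(v)}

# two predicates => two full scans, roughly double the work, even though
# the second ('dummy') condition excludes nothing here
hits = select(lambda v: v > 10000) & select(lambda v: v < 10000000)
```

with three predicates the model does three scans, matching the roughly
linear growth in query time observed above.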

> 3. is there a workaround to do range queries?

partitioning the random property into multiple properties may help. the
basic idea is that you split the random number into a sum of multiple
values.

@random = 34045

would become:

@random1 = 5
@random10 = 4
@random100 = 0
@random1000 = 4
@random10000 = 3
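as a concrete sketch (in Python, using the randomN naming scheme above),
the split can be computed like this:

```python
def split_value(n, powers=(1, 10, 100, 1000, 10000)):
    """Split n into one decimal digit per power of ten, named after
    the randomN scheme above (random1, random10, ...)."""
    return {"random%d" % p: (n // p) % 10 for p in powers}

print(split_value(34045))
# -> {'random1': 5, 'random10': 4, 'random100': 0, 'random1000': 4, 'random10000': 3}
```

when storing a node you would then set each entry as its own single-digit
property instead of (or alongside) the original @random value.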

later, if you search for all random properties with a value larger than
12000, you would have a query:

//*[(@random10000 = 1 and @random1000 >= 2) or (@random10000 >= 2)]

because the distinct values of the split-up properties are small, lucene
can optimize the query execution much better.
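a generator for such predicates can be sketched as follows (Python; a
strict 'greater than' variant for values below 100000 -- the clause set
differs slightly from the hand-written example above but selects the same
kind of range):

```python
POWERS = (10000, 1000, 100, 10, 1)  # handles values below 100000

def digit(n, p):
    """Decimal digit of n at power-of-ten position p."""
    return (n // p) % 10

def range_clauses(threshold):
    """Clauses whose disjunction means 'value > threshold':
    fix the higher digits with equality, exceed one digit."""
    clauses, prefix = [], []
    for p in POWERS:
        d = digit(threshold, p)
        clauses.append(prefix + [(p, ">", d)])
        prefix = prefix + [(p, "=", d)]
    return clauses

def render(prop, clauses):
    """Render the clauses as an XPath predicate over @<prop><power> properties."""
    parts = [" and ".join("@%s%d %s %d" % (prop, p, op, d) for p, op, d in c)
             for c in clauses]
    return "//*[" + " or ".join("(%s)" % x for x in parts) + "]"

print(render("random", range_clauses(12000)))
```

each disjunct only compares single-digit properties, which have at most ten
distinct values each -- exactly the situation lucene optimizes well.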

regards
marcel
