[ 
https://issues.apache.org/jira/browse/OAK-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147692#comment-14147692
 ] 

Thomas Mueller commented on OAK-2134:
-------------------------------------

Some more details on why using the path restriction is slow: Lucene first 
expands the prefix restriction, that is, it fetches all terms from the index 
that match the given path. If many paths match (in my case, potentially 
millions), this is slow. Lucene then evaluates the other full-text 
condition(s) in combination with each of those paths. Lucene is simply not 
made for this use case.
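
To illustrate the expansion step described above, here is a toy model in plain 
Java. It is not Lucene code: the sorted set stands in for the term dictionary 
of the path field, and the class and method names are made up for this sketch.

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy model of a sorted term dictionary; this is not how Lucene is implemented.
class PrefixExpansion {

    // Returns every term that starts with the given prefix.
    // Assumes no term contains the character U+FFFF.
    static SortedSet<String> expand(TreeSet<String> terms, String prefix) {
        return terms.subSet(prefix, prefix + Character.MAX_VALUE);
    }

    public static void main(String[] args) {
        TreeSet<String> pathTerms = new TreeSet<>(Arrays.asList(
                "/other/x",
                "/path/prefix/a",
                "/path/prefix/a/b",
                "/path/prefix/c"));
        // One term is returned per matching path, so the size of this set
        // depends on how many nodes live below the prefix.
        System.out.println(expand(pathTerms, "/path/prefix/").size());
    }
}
```

With millions of nodes below the prefix, the expanded set has millions of 
entries before the full-text condition is even considered.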

Two solutions to investigate are:

* (a) only index the path prefix (for example, the first 20 characters of the 
path, or only the first 5 path entries), and

* (b) also index all parent paths of a node (not store those in the document; 
just index them); when querying, use an exact match for the parent.

Solution (a) will reduce the index size in most cases, but is not guaranteed to 
solve the problem: there may still be many nodes that have distinct short paths 
(for example, counter nodes near the root node).
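
A minimal sketch of the truncation in (a), assuming absolute paths and a limit 
on the number of path entries. The class name, method name and sample paths 
are hypothetical; this is not existing Oak code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PathPrefix {

    // Keeps at most maxElements leading entries of an absolute path.
    static String truncate(String path, int maxElements) {
        String[] elements = path.split("/"); // "/a/b/c" -> ["", "a", "b", "c"]
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i < elements.length && i <= maxElements; i++) {
            sb.append('/').append(elements[i]);
        }
        return sb.length() == 0 ? "/" : sb.toString();
    }

    // Number of distinct terms the path field would hold after truncation.
    static int distinctTerms(List<String> paths, int maxElements) {
        Set<String> terms = new HashSet<>();
        for (String p : paths) {
            terms.add(truncate(p, maxElements));
        }
        return terms.size();
    }

    public static void main(String[] args) {
        // Deep paths collapse to a single term ...
        List<String> deep = Arrays.asList(
                "/content/site/en/a", "/content/site/en/b", "/content/site/de/a");
        System.out.println(distinctTerms(deep, 2));
        // ... but distinct short paths near the root do not.
        List<String> shallow = Arrays.asList("/counter-1", "/counter-2", "/counter-3");
        System.out.println(distinctTerms(shallow, 2));
    }
}
```

The second case is the one where (a) does not help: every short path is still 
its own term.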

Solution (b) will increase the index size (how much needs to be tested), and 
would solve the problem.
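
A rough sketch of (b) in plain Java, with a hash map standing in for the 
ancestor field of the index. All names are hypothetical, and how this would 
map onto Lucene fields is left open here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AncestorIndex {

    // Maps each ancestor path to the paths of the nodes below it.
    private final Map<String, List<String>> byAncestor = new HashMap<>();

    // Lists all parent paths of an absolute path, nearest parent first.
    // "/a/b/c" -> ["/a/b", "/a", "/"]; the root itself has no ancestors.
    static List<String> ancestors(String path) {
        List<String> result = new ArrayList<>();
        String current = path;
        int idx = current.lastIndexOf('/');
        while (idx > 0) {
            current = current.substring(0, idx);
            result.add(current);
            idx = current.lastIndexOf('/');
        }
        if (!current.equals("/")) {
            result.add("/");
        }
        return result;
    }

    // Index time: one entry per ancestor, so the index grows with node depth.
    void add(String path) {
        for (String ancestor : ancestors(path)) {
            byAncestor.computeIfAbsent(ancestor, k -> new ArrayList<>()).add(path);
        }
    }

    // Query time: a single exact-match lookup, with no prefix expansion.
    List<String> descendants(String parent) {
        return byAncestor.getOrDefault(parent, Collections.emptyList());
    }

    public static void main(String[] args) {
        AncestorIndex index = new AncestorIndex();
        index.add("/path/prefix/a");
        index.add("/path/prefix/a/b");
        index.add("/other/x");
        System.out.println(index.descendants("/path/prefix"));
    }
}
```

The extra entries written in add() are the index size increase mentioned 
above; in exchange, the query side needs only one term lookup.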

> Lucene: not using the path restriction can speed up queries
> -----------------------------------------------------------
>
>                 Key: OAK-2134
>                 URL: https://issues.apache.org/jira/browse/OAK-2134
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>            Reporter: Thomas Mueller
>            Assignee: Thomas Mueller
>             Fix For: 1.1, 1.0.7
>
>
> Currently, the Oak Lucene index uses the path restriction in the hope that 
> queries can be faster. However, I found that not using the path restriction 
> is better (much better) in many cases. The following queries were run:
> {noformat}
> :fulltext:test
> +:fulltext:test +:path:/path/prefix/*
> {noformat}
> A workaround is to change the query by removing the path restriction and 
> adding a 'like' condition, as follows (for XPath):
> {noformat}
> ... and jcr:like(@jcr:path, '/path/prefix/%')
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
