[ 
https://issues.apache.org/jira/browse/OAK-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704402#comment-13704402
 ] 

Thomas Mueller commented on OAK-890:
------------------------------------

> ideally the full abstract syntax tree of the query

There are a few problems. For example, the syntax tree can contain entries 
for other selectors, whereas an index is used per selector. More importantly, 
the full AST API is quite complex and not stable; it changes whenever we want 
to add a new feature or even just new syntax. If multiple index 
implementations relied on the AST, the whole system would be quite brittle 
and not modular. That's why I would try to avoid it if possible, although it 
might not always be possible.

But fulltext index conditions are slightly different. The syntax should be 
relatively stable, and providing the AST (for the fulltext conditions) would 
have the benefit of only requiring one parser. For example, I'm not sure if the 
parser in the LuceneIndex really works correctly around "OR" conditions and 
escaping. I guess nobody would be unhappy if we didn't need that parser at all :-)
                
> Query: advanced fulltext search conditions
> ------------------------------------------
>
>                 Key: OAK-890
>                 URL: https://issues.apache.org/jira/browse/OAK-890
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: query
>            Reporter: Thomas Mueller
>            Assignee: Thomas Mueller
>
> Currently, the query engine does not use a fulltext index if there are 
> multiple fulltext conditions combined with "or". Also, the QueryIndex 
> interface does not support boosts, and does not support fulltext conditions 
> on properties (just on nodes) - Filter.getFulltextConditions is a collection 
> of strings, combined with "and", but does not contain the information whether 
> a condition is on a property or on all properties. Also, the popular sorting 
> by score (especially descending) is not currently supported.
> [~mreutegg] and I discussed how we could support those features (including 
> boost) in a way that is backward compatible with Jackrabbit 2.x, but without 
> adding a lot of complexity. Example Jackrabbit 2.x query:
> {code}
> /jcr:root/content//*[(@jcr:primaryType='page' 
>   and (jcr:contains(jcr:content/@tags, 'it:blue') 
>   or jcr:contains(jcr:content/@tags, '/tags/it/blue')))]
> /jcr:root/content//element(*, nt:hierarchyNode)[
>   (jcr:contains(jcr:content, 'SomeTextToSearch') 
>   or jcr:contains(jcr:content/@jcr:title, 'SomeTextToSearch') 
>   or jcr:contains(jcr:content/@jcr:description, 'SomeTextToSearch'))]
>   /rep:excerpt(.) order by @jcr:score descending 
> {code}
> A possible solution is to extend the internal fulltext syntax to support 
> those features. The internal fulltext syntax is the one used by 
> Filter.getFulltextCondition (not the one used within the original XPath, SQL, 
> or SQL-2 query). The proposed syntax (work in progress, just a rough draft so 
> far) is:
> {code}
> FullTextSearch ::= Or
>   ['order by score' [' desc']]
> Or ::= And {' OR ' And}* 
> And ::= Term {' ' Term}*
> Term ::= '(' Or ')' | ['-'] SimpleTerm
> SimpleTerm ::= [Property ':'] '"' Word {' ' Word}* '"' ['^' Boost]
> Property ::= <property name>
> Boost ::= <number>
> {code}
> The idea is that the syntax matches the syntax used by Lucene (except for the 
> 'order by' part), so that the Lucene and Solr index implementations should 
> get simpler (only need minimal parsing, possibly just the 'order by' part). 
> Search terms (phrases, words) are always within double quotes. That means, 
> the above queries would result in the following condition:
> {code}
> jcr:content/tags:"it:blue" 
> OR jcr:content/tags:"/tags/it/blue"
> jcr:content/*:"SomeTextToSearch" 
> OR jcr:content/jcr:title:"SomeTextToSearch"
> OR jcr:content/jcr:description:"SomeTextToSearch"
> order by score desc
> {code}
> It would also allow switching back from 
> {code}
> Collection<String> getFulltextConditions()
> {code}
> to 
> {code}
> String getFulltextCondition()
> {code}
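
To make the draft grammar in the description more concrete, here is a minimal 
sketch of how a query engine might render a small fulltext expression tree into 
the proposed single-string form. The class FulltextSketch and its nested types 
(Expr, Term, And, Or) are hypothetical names for illustration, not part of the 
Oak API. Two details are assumptions, because the draft leaves them open: 
phrases with embedded double quotes are rejected (escaping is not defined), and 
the 'order by score' suffix is separated from the condition by a single space.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch, not part of Oak: renders conditions in the draft syntax. */
final class FulltextSketch {
    private FulltextSketch() {}

    interface Expr {
        String render();
    }

    /** ['-'] SimpleTerm, i.e. [Property ':'] '"' words '"' ['^' Boost]. */
    static final class Term implements Expr {
        private final String property; // null: no property prefix
        private final String phrase;
        private final Double boost;    // null: no boost
        private final boolean negated;

        Term(String property, String phrase, Double boost, boolean negated) {
            // Assumption: the draft does not define escaping, so quotes are rejected.
            if (phrase.indexOf('"') >= 0) {
                throw new IllegalArgumentException("embedded quote in: " + phrase);
            }
            this.property = property;
            this.phrase = phrase;
            this.boost = boost;
            this.negated = negated;
        }

        @Override
        public String render() {
            StringBuilder sb = new StringBuilder();
            if (negated) sb.append('-');
            if (property != null) sb.append(property).append(':');
            sb.append('"').append(phrase).append('"');
            if (boost != null) sb.append('^').append(boost);
            return sb.toString();
        }
    }

    /** And ::= Term {' ' Term}*; operands that are not simple terms get parentheses. */
    static final class And implements Expr {
        private final List<Expr> operands;

        And(Expr... operands) { this.operands = Arrays.asList(operands); }

        @Override
        public String render() {
            List<String> parts = new ArrayList<>();
            for (Expr e : operands) {
                parts.add(e instanceof Term ? e.render() : "(" + e.render() + ")");
            }
            return String.join(" ", parts);
        }
    }

    /** Or ::= And {' OR ' And}*; only a nested Or needs parentheses. */
    static final class Or implements Expr {
        private final List<Expr> operands;

        Or(Expr... operands) { this.operands = Arrays.asList(operands); }

        @Override
        public String render() {
            List<String> parts = new ArrayList<>();
            for (Expr e : operands) {
                parts.add(e instanceof Or ? "(" + e.render() + ")" : e.render());
            }
            return String.join(" OR ", parts);
        }
    }

    /** Convenience factory for a plain term without boost or negation. */
    static Term term(String property, String phrase) {
        return new Term(property, phrase, null, false);
    }

    /** FullTextSearch ::= Or ['order by score' [' desc']] */
    static String fullTextSearch(Expr condition, boolean orderByScore, boolean descending) {
        StringBuilder sb = new StringBuilder(condition.render());
        if (orderByScore) {
            sb.append(" order by score");
            if (descending) sb.append(" desc");
        }
        return sb.toString();
    }
}
```

With this sketch, an Or over term("jcr:content/*", "SomeTextToSearch"), 
term("jcr:content/jcr:title", "SomeTextToSearch") and 
term("jcr:content/jcr:description", "SomeTextToSearch"), passed to 
fullTextSearch(..., true, true), produces the second condition from the 
description on a single line. An index implementation would then receive one 
string from getFulltextCondition() instead of a collection.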
