[ 
https://issues.apache.org/jira/browse/OAK-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704365#comment-13704365
 ] 

Thomas Mueller commented on OAK-890:
------------------------------------

Hi Jukka,

Yes, a specialized index would help a lot. I'm not against doing that (on the 
contrary), but my issue is a bit different: it is about features we had in 
Jackrabbit 2.x that are not supported right now in Oak (and are hard to 
support), such as "contains(..) or contains(..)".

> rewriting the query to a join

I think automatic rewriting within the query engine (for this case) would be 
hard to achieve, so I wouldn't try to do that. (Manual rewriting within the 
application might make sense in some cases; but on the other hand we want to be 
compatible with Jackrabbit 2.x as much as possible, so it's also not my 
preferred option).

> extended full text syntax

Well, it's not so much about extending the fulltext syntax as it is about 
providing a way to let the index know the whole fulltext condition. Currently, 
for some cases, such as "contains(..) or contains(..)" as above, the query 
index filter (FilterImpl) doesn't contain either condition - it can't really, 
due to the way the Filter interface is specified. That's the main problem.

> I'd just pass the full abstract query tree to the index implementations for 
> evaluation

That's an option. We could do it in a way such that the index implementation 
doesn't _have to_ understand it, but if it does understand the AST, then it 
could use it. Especially for the fulltext condition it's an option; that way we 
might avoid having to parse the condition multiple times (avoiding the need for 
multiple parsers and the small runtime overhead), especially as soon as 
there is a "fulltext index wrapper" similar to the NodeTypeIndex.

I'm actually working on such an addition, right now just for fulltext 
conditions, to the Filter interface.
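
As a rough sketch of what such an addition could look like: the types below 
(FullTextAstSketch, FullTextExpression, Term, Or, FullTextFilter and 
getFullTextConstraint) are hypothetical names made up for illustration, not 
the actual Oak API. "And", negation and boosts are left out to keep it short.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these types are part of the Oak API.
final class FullTextAstSketch {

    /** One node of the fulltext condition tree. */
    interface FullTextExpression {
        /** Renders this node in the draft internal fulltext syntax. */
        String render();
    }

    /** A quoted word or phrase, optionally restricted to one property. */
    static final class Term implements FullTextExpression {
        private final String property; // null means "all properties"
        private final String text;

        Term(String property, String text) {
            this.property = property;
            this.text = text;
        }

        @Override
        public String render() {
            // Escaping of '"' inside the text is not covered by the draft syntax.
            String prefix = property == null ? "" : property + ":";
            return prefix + "\"" + text + "\"";
        }
    }

    /** Operands combined with " OR ". */
    static final class Or implements FullTextExpression {
        private final List<FullTextExpression> operands;

        Or(FullTextExpression... operands) {
            this.operands = Arrays.asList(operands);
        }

        @Override
        public String render() {
            StringBuilder sb = new StringBuilder();
            for (FullTextExpression e : operands) {
                if (sb.length() > 0) {
                    sb.append(" OR ");
                }
                sb.append(e.render());
            }
            return sb.toString();
        }
    }

    /** Assumed Filter addition: the whole condition, or null if there is none. */
    interface FullTextFilter {
        FullTextExpression getFullTextConstraint();
    }

    public static void main(String[] args) {
        final FullTextExpression condition = new Or(
                new Term("jcr:content/tags", "it:blue"),
                new Term("jcr:content/tags", "/tags/it/blue"));
        FullTextFilter filter = new FullTextFilter() {
            @Override
            public FullTextExpression getFullTextConstraint() {
                return condition;
            }
        };
        // An index that understands the tree can walk it directly;
        // one that does not can ignore it or fall back to the string form.
        System.out.println(filter.getFullTextConstraint().render());
    }
}
```

With something like this, an index implementation that doesn't understand the 
tree can simply not call the new method, while the "or" case is no longer lost.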

> Features like boosts, etc. could be implemented by extending the query syntax 
> and associated abstract syntax tree.

Yes, I'm working on that.


                
> Query: advanced fulltext search conditions
> ------------------------------------------
>
>                 Key: OAK-890
>                 URL: https://issues.apache.org/jira/browse/OAK-890
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: query
>            Reporter: Thomas Mueller
>            Assignee: Thomas Mueller
>
> Currently, the query engine does not use a fulltext index if there are 
> multiple fulltext conditions combined with "or". Also, the QueryIndex 
> interface does not support boosts, and does not support fulltext conditions 
> on properties (just on nodes) - Filter.getFulltextConditions is a collection 
> of strings, combined with "and", but does not contain the information whether 
> a condition is on a property or on all properties. Also, the popular sorting 
> by score (especially descending) is not currently supported.
> [~mreutegg] and I discussed how we could support those features (including 
> boost) in a way that is backward compatible with Jackrabbit 2.x, but without 
> adding a lot of complexity. Example Jackrabbit 2.x query:
> {code}
> /jcr:root/content//*[(@jcr:primaryType='page' 
>   and (jcr:contains(jcr:content/@tags, 'it:blue') 
>   or jcr:contains(jcr:content/@tags, '/tags/it/blue')))]
> /jcr:root/content//element(*, nt:hierarchyNode)[
>   (jcr:contains(jcr:content, 'SomeTextToSearch') 
>   or jcr:contains(jcr:content/@jcr:title, 'SomeTextToSearch') 
>   or jcr:contains(jcr:content/@jcr:description, 'SomeTextToSearch'))]
>   /rep:excerpt(.) order by @jcr:score descending 
> {code}
> A possible solution is to extend the internal fulltext syntax to support 
> those features. The internal fulltext syntax is the one used by 
> Filter.getFulltextCondition (not the one used within the original XPath, SQL, 
> or SQL-2 query). The proposed syntax (work in progress, just a rough draft so 
> far) is:
> {code}
> FullTextSearch ::= Or
>   ['order by score' [' desc']]
> Or ::= And {' OR ' And}* 
> And ::= Term {' ' Term}*
> Term ::= '(' Or ')' | ['-'] SimpleTerm
> SimpleTerm ::= [Property ':'] '"' Word {' ' Word}* '"' ['^' Boost]
> Property ::= <property name>
> Boost ::= <number>
> {code}
> The idea is that the syntax matches the syntax used by Lucene (except for the 
> 'order by' part), so that the Lucene and Solr index implementations should 
> get simpler (only need minimal parsing, possibly just the 'order by' part). 
> Search terms (phrases, words) are always within double quotes. That means, 
> the above queries would result in the following condition:
> {code}
> jcr:content/tags:"it:blue" 
> OR jcr:content/tags:"/tags/it/blue"
> jcr:content/*:"SomeTextToSearch" 
> OR jcr:content/jcr:title:"SomeTextToSearch"
> OR jcr:content/jcr:description:"SomeTextToSearch"
> order by score desc
> {code}
> It would also allow switching back from 
> {code}
> Collection<String> getFulltextConditions()
> {code}
> to 
> {code}
> String getFulltextCondition()
> {code}
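
To illustrate the "minimal parsing" an index implementation would still need 
under the draft syntax above, here is a rough sketch that splits off the 
trailing 'order by score' part. The class OrderBySplitSketch and its fields 
are hypothetical. It assumes the 'order by' part, if present, is always the 
last thing in the string; the draft does not say which whitespace separates 
it from the condition, so surrounding whitespace is simply trimmed.

```java
// Hypothetical sketch: splits a draft-syntax fulltext string into the
// Lucene-like condition and the optional 'order by score [desc]' suffix.
final class OrderBySplitSketch {

    private static final String ORDER = "order by score";
    private static final String DESC = " desc";

    final String condition;      // the part that could be handed to Lucene or Solr
    final boolean orderByScore;  // true if 'order by score' was present
    final boolean descending;    // true if ' desc' was present as well

    private OrderBySplitSketch(String condition, boolean orderByScore,
            boolean descending) {
        this.condition = condition;
        this.orderByScore = orderByScore;
        this.descending = descending;
    }

    static OrderBySplitSketch parse(String fullText) {
        String s = fullText.trim();
        boolean order = false;
        boolean desc = false;
        // Search terms are always within double quotes, so a condition
        // without an 'order by' part ends with '"' or ')' and can't be
        // mistaken for the suffix.
        if (s.endsWith(ORDER + DESC)) {
            order = true;
            desc = true;
            s = s.substring(0, s.length() - (ORDER + DESC).length());
        } else if (s.endsWith(ORDER)) {
            order = true;
            s = s.substring(0, s.length() - ORDER.length());
        }
        return new OrderBySplitSketch(s.trim(), order, desc);
    }

    public static void main(String[] args) {
        OrderBySplitSketch r = parse(
                "jcr:content/jcr:title:\"SomeTextToSearch\"\norder by score desc");
        System.out.println(r.condition);
        System.out.println(r.orderByScore + " " + r.descending);
    }
}
```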
