[
https://issues.apache.org/jira/browse/OAK-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704365#comment-13704365
]
Thomas Mueller commented on OAK-890:
------------------------------------
Hi Jukka,
Yes, a specialized index would help a lot. I'm not against doing that (on the
contrary), but my issue is a bit different: it is about features we had in
Jackrabbit 2.x that are not supported right now in Oak (and are hard to
support), such as "contains(..) or contains(..)".
> rewriting the query to a join
I think automatic rewriting within the query engine (for this case) would be
hard to achieve, so I wouldn't try to do that. Manual rewriting within the
application might make sense in some cases, but on the other hand we want to be
as compatible with Jackrabbit 2.x as possible, so it's not my preferred option
either.
> extended full text syntax
Well, it's not so much about extending the fulltext syntax as about providing
a way to let the index know the whole fulltext condition. For some cases, such
as "contains(..) or contains(..)" above, the query index filter (FilterImpl)
currently doesn't contain either condition; it can't really, due to the way the
Filter interface is specified. That's the main problem.
> I'd just pass the full abstract query tree to the index implementations for
> evaluation
That's an option. We could do it in a way such that the index implementation
doesn't _have to_ understand it, but if it does understand the AST, then it
could use it. It's an option especially for the fulltext condition: that way we
might avoid having to parse the condition multiple times (no need for multiple
parsers, and none of their small runtime overhead). That is even more relevant
once there is a "fulltext index wrapper" similar to the NodeTypeIndex.
I'm actually working on such an addition to the Filter interface, right now
just for fulltext conditions.
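To make the difference concrete: a flat collection of strings that are all combined with "and" has no way to express "contains(a) or contains(b)", while a small expression tree can, and an index that doesn't understand it is free to ignore it. The following Java sketch is only an illustration under stated assumptions: the classes FullTextExpr, Term, And and Or are hypothetical (they are not the Oak Filter API), a node is modeled as a map from property name to text, and matching is a naive case-insensitive substring test instead of real tokenization.

{code:java}
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical fulltext condition tree (names invented for illustration, not the Oak API).
abstract class FullTextExpr {
    // Evaluates the condition against a node, given as property name -> text value.
    abstract boolean matches(Map<String, String> properties);
}

// A single word; property == null means "any property" (like jcr:contains(., 'x')).
final class Term extends FullTextExpr {
    final String property;
    final String word;

    Term(String property, String word) {
        this.property = property;
        this.word = word;
    }

    private boolean contains(String value) {
        // Naive case-insensitive substring test; a real index would tokenize and analyze.
        return value != null
                && value.toLowerCase(Locale.ROOT).contains(word.toLowerCase(Locale.ROOT));
    }

    @Override
    boolean matches(Map<String, String> properties) {
        if (property != null) {
            return contains(properties.get(property));
        }
        return properties.values().stream().anyMatch(this::contains);
    }

    @Override
    public String toString() {
        return (property == null ? "" : property + ":") + "\"" + word + "\"";
    }
}

// All children must match; rendered as space-separated terms.
final class And extends FullTextExpr {
    final List<FullTextExpr> children;

    And(FullTextExpr... children) {
        this.children = Arrays.asList(children);
    }

    @Override
    boolean matches(Map<String, String> properties) {
        return children.stream().allMatch(c -> c.matches(properties));
    }

    @Override
    public String toString() {
        // A nested OR needs parentheses so it does not bind to its neighbours.
        return children.stream()
                .map(c -> c instanceof Or ? "(" + c + ")" : c.toString())
                .collect(Collectors.joining(" "));
    }
}

// At least one child must match; rendered with " OR ".
final class Or extends FullTextExpr {
    final List<FullTextExpr> children;

    Or(FullTextExpr... children) {
        this.children = Arrays.asList(children);
    }

    @Override
    boolean matches(Map<String, String> properties) {
        return children.stream().anyMatch(c -> c.matches(properties));
    }

    @Override
    public String toString() {
        return children.stream().map(Object::toString).collect(Collectors.joining(" OR "));
    }
}
{code}

For example, new Or(new Term("jcr:title", "blue"), new Term("jcr:description", "blue")) renders as jcr:title:"blue" OR jcr:description:"blue". In this sketch, an index that understands the tree could translate it into its own query, while the query engine could still call matches on each candidate node.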
> Features like boosts, etc. could be implemented by extending the query syntax
> and associated abstract syntax tree.
Yes, I'm working on that.
> Query: advanced fulltext search conditions
> ------------------------------------------
>
> Key: OAK-890
> URL: https://issues.apache.org/jira/browse/OAK-890
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: query
> Reporter: Thomas Mueller
> Assignee: Thomas Mueller
>
> Currently, the query engine does not use a fulltext index if there are
> multiple fulltext conditions combined with "or". Also, the QueryIndex
> interface does not support boosts, and does not support fulltext conditions
> on properties (just on nodes): Filter.getFulltextConditions is a collection
> of strings, combined with "and", but does not contain the information whether
> a condition is on a property or on all properties. Also, the popular sorting
> by score (especially descending) is currently not supported.
> [~mreutegg] and I discussed how we could support those features (including
> boosts) in a way that is backward compatible with Jackrabbit 2.x, but without
> adding a lot of complexity. Example Jackrabbit 2.x queries:
> {code}
> /jcr:root/content//*[(@jcr:primaryType='page'
> and (jcr:contains(jcr:content/@tags, 'it:blue')
> or jcr:contains(jcr:content/@tags, '/tags/it/blue')))]
>
> /jcr:root/content//element(*, nt:hierarchyNode)[
> (jcr:contains(jcr:content, 'SomeTextToSearch')
> or jcr:contains(jcr:content/@jcr:title, 'SomeTextToSearch')
> or jcr:contains(jcr:content/@jcr:description, 'SomeTextToSearch'))]
> /rep:excerpt(.) order by @jcr:score descending
> {code}
> A possible solution is to extend the internal fulltext syntax to support
> those features. The internal fulltext syntax is the one used by
> Filter.getFulltextCondition (not the one used within the original XPath, SQL,
> or SQL-2 query). The proposed syntax (work in progress, just a rough draft so
> far) is:
> {code}
> FullTextSearch ::= Or
> ['order by score' [' desc']]
> Or ::= And {' OR ' And}*
> And ::= Term {' ' Term}*
> Term ::= '(' Or ')' | ['-'] SimpleTerm
> SimpleTerm ::= [Property ':'] '"' Word {' ' Word}* '"' ['^' Boost]
> Property ::= <property name>
> Boost ::= <number>
> {code}
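A minimal recursive-descent parser for this draft grammar could look as follows. It is a hypothetical Java sketch (the class FullTextParser and its prefix-notation output are invented for illustration), assuming that property names contain no whitespace or double quotes, accepting any whitespace (including line breaks) where the grammar has a single space, and not validating the individual words inside a phrase.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical recursive-descent parser for the draft grammar (not Oak code). It returns
// the parse tree in prefix form and throws IllegalArgumentException on bad input.
final class FullTextParser {
    private final String s;
    private int pos;

    private FullTextParser(String s) { this.s = s; }

    // FullTextSearch ::= Or ['order by score' [' desc']]
    static String parse(String input) {
        FullTextParser p = new FullTextParser(input);
        String tree = p.parseOr(); // parseOr always returns with whitespace skipped
        if (p.atKeyword("order")) {
            if (!p.accept("order by score")) throw p.error("expected 'order by score'");
            p.skipSpace();
            tree += p.accept("desc") ? " order(score desc)" : " order(score)";
            p.skipSpace();
        }
        if (p.pos != p.s.length()) throw p.error("unexpected input");
        return tree;
    }

    // Or ::= And {' OR ' And}*
    private String parseOr() {
        List<String> parts = new ArrayList<>();
        parts.add(parseAnd());
        while (true) {
            skipSpace();
            if (!atKeyword("OR")) break;
            pos += 2; // consume "OR"
            parts.add(parseAnd());
        }
        return parts.size() == 1 ? parts.get(0) : "or(" + String.join(",", parts) + ")";
    }

    // And ::= Term {' ' Term}*, ending at end of input, ')', "OR" or "order"
    private String parseAnd() {
        List<String> parts = new ArrayList<>();
        parts.add(parseTerm());
        while (true) {
            skipSpace();
            if (pos == s.length() || s.charAt(pos) == ')' || atKeyword("OR") || atKeyword("order")) break;
            parts.add(parseTerm());
        }
        return parts.size() == 1 ? parts.get(0) : "and(" + String.join(",", parts) + ")";
    }

    // Term ::= '(' Or ')' | ['-'] SimpleTerm
    private String parseTerm() {
        skipSpace();
        if (accept("(")) {
            String inner = parseOr();
            if (!accept(")")) throw error("expected ')'");
            return inner;
        }
        boolean negated = accept("-");
        String term = parseSimpleTerm();
        return negated ? "not(" + term + ")" : term;
    }

    // SimpleTerm ::= [Property ':'] '"' Word {' ' Word}* '"' ['^' Boost]
    private String parseSimpleTerm() {
        int open = s.indexOf('"', pos);
        if (open < 0) throw error("expected '\"'");
        // Property names may contain ':' (jcr:title), so take everything up to the quote.
        String prefix = s.substring(pos, open);
        boolean validPrefix = prefix.isEmpty()
                || (prefix.endsWith(":") && prefix.chars().noneMatch(Character::isWhitespace));
        if (!validPrefix) throw error("expected [Property ':'] before '\"'");
        int close = s.indexOf('"', open + 1);
        if (close < 0) throw error("unterminated phrase");
        String phrase = s.substring(open + 1, close); // words inside are not validated
        pos = close + 1;
        String boost = "";
        if (accept("^")) {
            int start = pos;
            while (pos < s.length() && (Character.isDigit(s.charAt(pos)) || s.charAt(pos) == '.')) pos++;
            if (start == pos) throw error("expected boost number");
            boost = "^" + s.substring(start, pos);
        }
        String property = prefix.isEmpty() ? "" : prefix.substring(0, prefix.length() - 1) + ",";
        return "term(" + property + "\"" + phrase + "\"" + boost + ")";
    }

    private void skipSpace() { while (pos < s.length() && Character.isWhitespace(s.charAt(pos))) pos++; }

    private boolean accept(String t) { if (!s.startsWith(t, pos)) return false; pos += t.length(); return true; }

    private IllegalArgumentException error(String msg) { return new IllegalArgumentException(msg + " at " + pos); }

    // True if the keyword starts at pos and is followed by whitespace or the end of input.
    private boolean atKeyword(String kw) {
        int end = pos + kw.length();
        return s.startsWith(kw, pos) && (end == s.length() || Character.isWhitespace(s.charAt(end)));
    }
}
{code}

For instance, "a" -"b"^2 (x:"c" OR "d e") parses to and(term("a"),not(term("b"^2)),or(term(x,"c"),term("d e"))). Splitting the property at the last ':' before the opening quote is one way to cope with names such as jcr:content/jcr:title; it is a choice made for this sketch, not something the draft specifies.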
> The idea is that the syntax matches the syntax used by Lucene (except for the
> 'order by' part), so that the Lucene and Solr index implementations would get
> simpler (they would need only minimal parsing, possibly just of the 'order by'
> part).
> Search terms (phrases, words) are always within double quotes. That means
> the above queries would result in the following conditions:
> {code}
> jcr:content/tags:"it:blue"
> OR jcr:content/tags:"/tags/it/blue"
>
> jcr:content/*:"SomeTextToSearch"
> OR jcr:content/jcr:title:"SomeTextToSearch"
> OR jcr:content/jcr:description:"SomeTextToSearch"
> order by score desc
> {code}
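The mapping from the XPath conditions to the draft syntax can be sketched as below. This is an assumption-laden Java illustration, not Oak code: Contains and ContainsToFulltext are hypothetical, only the jcr:contains parts are translated (the @jcr:primaryType='page' restriction is not a fulltext condition), and the rule that a child node such as jcr:content becomes jcr:content/* is read off the example conditions above.

{code:java}
import java.util.List;
import java.util.stream.Collectors;

// One jcr:contains(arg, 'text') call from the XPath query (hypothetical helper type).
final class Contains {
    final String arg;  // first argument, e.g. "jcr:content" or "jcr:content/@jcr:title"
    final String text; // search text, e.g. "SomeTextToSearch"

    Contains(String arg, String text) {
        this.arg = arg;
        this.text = text;
    }
}

// Hypothetical translation of OR-combined jcr:contains calls into the draft syntax.
final class ContainsToFulltext {
    private ContainsToFulltext() {
    }

    // Returns the draft-syntax Property for the first jcr:contains argument,
    // or null when the condition applies to all properties of the current node.
    static String property(String arg) {
        if (arg.equals(".")) {
            return null;
        }
        int at = arg.lastIndexOf('@');
        if (at >= 0) {
            // "jcr:content/@jcr:title" -> "jcr:content/jcr:title"
            return arg.substring(0, at) + arg.substring(at + 1);
        }
        // "jcr:content" (a child node) -> "jcr:content/*" (any of its properties)
        return arg + "/*";
    }

    static String term(Contains c) {
        // The draft grammar defines no escaping, so reject embedded double quotes.
        if (c.text.indexOf('"') >= 0) {
            throw new IllegalArgumentException("double quote in search text: " + c.text);
        }
        String p = property(c.arg);
        return (p == null ? "" : p + ":") + "\"" + c.text + "\"";
    }

    static String orCondition(List<Contains> calls, boolean orderByScoreDesc) {
        String condition = calls.stream()
                .map(ContainsToFulltext::term)
                .collect(Collectors.joining(" OR "));
        return orderByScoreDesc ? condition + " order by score desc" : condition;
    }
}
{code}

Applied to the second example query, this sketch yields the second condition above on a single line, with "order by score desc" appended for the order by @jcr:score descending clause.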
> It would also allow switching back from
> {code}
> Collection<String> getFulltextConditions()
> {code}
> to
> {code}
> String getFulltextCondition()
> {code}
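Since terms separated by a space are combined with "and" in the draft grammar, an existing Collection<String> could be folded into one string by wrapping each element in parentheses. A hypothetical Java sketch (FulltextConditions.combine is not an Oak method), assuming each element is already a valid Or expression in the draft syntax without an 'order by' part:

{code:java}
import java.util.Collection;
import java.util.stream.Collectors;

// Hypothetical adapter from the old AND-ed Collection<String> to a single String
// in the draft syntax (not Oak code).
final class FulltextConditions {
    private FulltextConditions() {
    }

    static String combine(Collection<String> conditions) {
        if (conditions.isEmpty()) {
            return null; // no fulltext condition at all
        }
        if (conditions.size() == 1) {
            return conditions.iterator().next();
        }
        // Space-separated terms are AND-ed; parentheses keep an OR inside one
        // element from binding to its neighbours.
        return conditions.stream()
                .map(c -> "(" + c + ")")
                .collect(Collectors.joining(" "));
    }
}
{code}

The parentheses matter: without them, "a" OR "b" followed by title:"c" would read as "a" OR ("b" title:"c"). Because AND is commutative, the iteration order of the collection does not change the meaning.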