[ https://issues.apache.org/jira/browse/OAK-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704326#comment-13704326 ]
Jukka Zitting commented on OAK-890:
-----------------------------------
Ideally it should IMO be possible to execute the first example query against a
custom index configured as follows:
* include all nodes under {{/content}}
* include the {{jcr:primaryType}} property
* include the {{tags}} property of a {{jcr:content}} child, analyzed for full
text searching
(or any superset of such a configuration). Similarly for the second example
query. Such custom indices should be able to massively outperform Jackrabbit
2.x at least in some cases.
As far as I see it, solutions that rely on mapping such constraints to an
extended full text syntax or on rewriting the query to a join would end up
making it harder for an index like the one described above to figure out
whether it can be used to evaluate the query. Thus instead of such preprocessed
data, I'd just pass the full abstract query tree to the index implementations
for evaluation. The implementations can still opt to apply such transformations
if they're helpful (or necessary to make the query evaluable), but it should
also be possible for them to use the original query.
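To make the idea concrete, here is a minimal sketch of an index that receives the constraint tree and decides for itself whether it can evaluate the query. All type and method names below ({{Constraint}}, {{CustomIndex}}, {{canEvaluate}}, and so on) are hypothetical and chosen for illustration only; they are not Oak's actual classes or the QueryIndex API. The check is deliberately conservative: the index accepts a query only if the path is inside its subtree and every leaf refers to a property it indexes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical AST node types for illustration; these are not Oak's classes.
interface Constraint {}

final class And implements Constraint {
    final List<Constraint> operands;
    And(Constraint... operands) { this.operands = Arrays.asList(operands); }
}

final class Or implements Constraint {
    final List<Constraint> operands;
    Or(Constraint... operands) { this.operands = Arrays.asList(operands); }
}

// Exact match on a property, e.g. @jcr:primaryType='page'.
final class PropertyEquals implements Constraint {
    final String property;
    final String value;
    PropertyEquals(String property, String value) {
        this.property = property;
        this.value = value;
    }
}

// Full-text match on a property, e.g. jcr:contains(jcr:content/@tags, 'it:blue').
final class Contains implements Constraint {
    final String property;
    final String text;
    Contains(String property, String text) {
        this.property = property;
        this.text = text;
    }
}

// Hypothetical custom index: a path scope plus the properties it covers.
final class CustomIndex {
    private final String includedPath;
    private final Set<String> exactProperties;
    private final Set<String> fullTextProperties;

    CustomIndex(String includedPath, Set<String> exactProperties,
                Set<String> fullTextProperties) {
        this.includedPath = includedPath;
        this.exactProperties = exactProperties;
        this.fullTextProperties = fullTextProperties;
    }

    // True if the query path lies under the indexed subtree and every leaf
    // of the constraint tree refers to a property this index covers.
    boolean canEvaluate(String queryPath, Constraint constraint) {
        boolean pathCovered = queryPath.equals(includedPath)
                || queryPath.startsWith(includedPath + "/");
        return pathCovered && covers(constraint);
    }

    private boolean covers(Constraint c) {
        if (c instanceof And) {
            return coversAll(((And) c).operands);
        }
        if (c instanceof Or) {
            return coversAll(((Or) c).operands);
        }
        if (c instanceof PropertyEquals) {
            return exactProperties.contains(((PropertyEquals) c).property);
        }
        if (c instanceof Contains) {
            return fullTextProperties.contains(((Contains) c).property);
        }
        return false; // unknown node type: decline rather than guess
    }

    private boolean coversAll(List<Constraint> operands) {
        for (Constraint operand : operands) {
            if (!covers(operand)) {
                return false;
            }
        }
        return true;
    }
}
```

With the first example query expressed as And(PropertyEquals, Or(Contains, Contains)), an index scoped to /content with jcr:primaryType and jcr:content/tags accepts it directly, with no need to reverse-engineer a rewritten join or an extended full text string.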
Features like boosts, etc. could be implemented by extending the query syntax
and associated abstract syntax tree.
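As a rough illustration of that kind of extension, a boost can be modelled as one more node type that wraps any sub-expression. The names below ({{FullTextNode}}, {{FullTextBoost}}, etc.) and the naive scoring rule are assumptions made up for this sketch, not Oak or Lucene behaviour, and the boost factor of 2.0 in the usage note is an invented value.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical full-text AST nodes; names are illustrative, not Oak's API.
interface FullTextNode {
    // Naive score of a node's properties against this sub-expression.
    double score(Map<String, String> properties);
}

// Leaf: scores 1.0 if the property value contains the text, else 0.0.
final class FullTextTerm implements FullTextNode {
    final String property;
    final String text;

    FullTextTerm(String property, String text) {
        this.property = property;
        this.text = text;
    }

    public double score(Map<String, String> properties) {
        String value = properties.get(property);
        return value != null && value.contains(text) ? 1.0 : 0.0;
    }
}

// Disjunction: sums the scores of its operands.
final class FullTextOr implements FullTextNode {
    final List<FullTextNode> operands;

    FullTextOr(FullTextNode... operands) {
        this.operands = Arrays.asList(operands);
    }

    public double score(Map<String, String> properties) {
        double total = 0.0;
        for (FullTextNode operand : operands) {
            total += operand.score(properties);
        }
        return total;
    }
}

// The extension: wraps any sub-expression and scales its score.
final class FullTextBoost implements FullTextNode {
    final FullTextNode operand;
    final double factor;

    FullTextBoost(FullTextNode operand, double factor) {
        this.operand = operand;
        this.factor = factor;
    }

    public double score(Map<String, String> properties) {
        return factor * operand.score(properties);
    }
}
```

For example, wrapping the 'it:blue' term in FullTextBoost with a factor of 2.0 doubles its contribution, while the rest of the tree is unchanged. An index that does not support boosts could unwrap the node and evaluate the operand as before.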
> Query: advanced fulltext search conditions
> ------------------------------------------
>
> Key: OAK-890
> URL: https://issues.apache.org/jira/browse/OAK-890
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: query
> Reporter: Thomas Mueller
> Assignee: Thomas Mueller
>
> Currently, the query engine does not use a fulltext index if there are
> multiple fulltext conditions combined with "or". Also, the QueryIndex
> interface does not support boosts, and does not support fulltext conditions
> on properties (just on nodes) - Filter.getFulltextConditions is a collection
> of strings, combined with "and", but does not contain the information whether
> a condition is on a property or on all properties. Also, the popular sorting
> by score (especially descending) is not currently supported.
> [~mreutegg] and I discussed how we could support those features (including
> boost) in a way that is backward compatible with Jackrabbit 2.x, but without
> adding a lot of complexity. Example Jackrabbit 2.x queries:
> {code}
> /jcr:root/content//*[(@jcr:primaryType='page'
>   and (jcr:contains(jcr:content/@tags, 'it:blue')
>   or jcr:contains(jcr:content/@tags, '/tags/it/blue')))]
>
> /jcr:root/content//element(*, nt:hierarchyNode)[
>   (jcr:contains(jcr:content, 'SomeTextToSearch')
>   or jcr:contains(jcr:content/@jcr:title, 'SomeTextToSearch')
>   or jcr:contains(jcr:content/@jcr:description, 'SomeTextToSearch'))]
>   /rep:excerpt(.) order by @jcr:score descending
> {code}
> A possible solution is to extend the internal fulltext syntax to support
> those features. The internal fulltext syntax is the one used by
> Filter.getFulltextCondition (not the one used within the original XPath, SQL,
> or SQL-2 query). The proposed syntax (work in progress, just a rough draft so
> far) is:
> {code}
> FullTextSearch ::= Or
> ['order by score' [' desc']]
> Or ::= And {' OR ' And}*
> And ::= Term {' ' Term}*
> Term ::= '(' Or ')' | ['-'] SimpleTerm
> SimpleTerm ::= [Property ':'] '"' Word {' ' Word}* '"' ['^' Boost]
> Property ::= <property name>
> Boost ::= <number>
> {code}
> The idea is that the syntax matches the syntax used by Lucene (except for the
> 'order by' part), so that the Lucene and Solr index implementations should
> get simpler (only need minimal parsing, possibly just the 'order by' part).
> Search terms (phrases, words) are always within double quotes. That means
> the above queries would result in the following conditions:
> {code}
> jcr:content/tags:"it:blue"
>   OR jcr:content/tags:"/tags/it/blue"
>
> jcr:content/*:"SomeTextToSearch"
>   OR jcr:content/jcr:title:"SomeTextToSearch"
>   OR jcr:content/jcr:description:"SomeTextToSearch"
>   order by score desc
> {code}
> It would also allow switching back from
> {code}
> Collection<String> getFulltextConditions()
> {code}
> to
> {code}
> String getFulltextCondition()
> {code}