[jira] [Commented] (OAK-890) Query: advanced fulltext search conditions

Thomas Mueller (JIRA) Tue, 09 Jul 2013 03:22:12 -0700

    [ 
https://issues.apache.org/jira/browse/OAK-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13703128#comment-13703128
 ]


Thomas Mueller commented on OAK-890:
------------------------------------

OK, so Jackrabbit 2.x has a mechanism that plugs into Lucene to support 
relative properties. I guess this is something we want to avoid for Oak, as it 
wouldn't be easy to support for Solr (or other fulltext search engines). For 
the example above, the SQL-2 query could be rewritten as an outer join, from:

{code}
select [jcr:path], [jcr:score], [rep:excerpt] 
from [nt:hierarchyNode] as a 
where (contains([jcr:content/*], 'SomeTextToSearch') 
or contains([jcr:content/jcr:title], 'SomeTextToSearch') 
or contains([jcr:content/jcr:description], 'SomeTextToSearch')) 
and isdescendantnode(a, '/content') 
order by [jcr:score] desc
{code}
to:
{code}
select a.[jcr:path], a.[jcr:score], a.[rep:excerpt] 
from [nt:hierarchyNode] as a 
left outer join [nt:base] as b on ischildnode(b, a)
where (contains(b.*, 'SomeTextToSearch')
or contains(b.[jcr:title], 'SomeTextToSearch')
or contains(b.[jcr:description], 'SomeTextToSearch'))
and isdescendantnode(a, '/content') 
and name(b) = 'jcr:content'
order by a.[jcr:score] desc
{code}

But using outer joins is problematic: it would no longer work if there is 
another "or" constraint (for example "or a.x = 1"). The problem is the "and 
name(b) = 'jcr:content'" condition, which should be moved to the "on" clause 
but currently can't be, due to the limited SQL-2 syntax (I believe the 
condition "or name(b) is null" is not supported either). So the query could 
return wrong results.
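
A small in-memory simulation makes the problem easier to see. This is a sketch in plain Java (16+, for records), not Oak code: the `Parent`/`Child` types, the sample paths and the `x` property are made up, and the left outer join is evaluated naively, one row per child. It compares three placements of the `name(b) = 'jcr:content'` check when an extra `or a.x = 1` is present.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

public class OuterJoinPitfall {
    // Hypothetical stand-ins for the nodes "a" and their children "b"; not Oak classes.
    public record Child(String name, String text) {}
    public record Parent(String path, int x, List<Child> children) {}

    static final String TEXT = "SomeTextToSearch";

    // a1 has a matching jcr:content child, a2 has x = 1 and no children,
    // a3 has x = 1 and two non-matching children (jcr:content and another node).
    static final List<Parent> SAMPLE = List.of(
        new Parent("/content/a1", 0, List.of(new Child("jcr:content", TEXT))),
        new Parent("/content/a2", 1, List.of()),
        new Parent("/content/a3", 1, List.of(
            new Child("jcr:content", "other"), new Child("child", "other"))));

    // Left outer join over ischildnode(b, a): one row per child accepted by the
    // extra ON condition, or a single row with b == null if none is accepted.
    // The WHERE condition is then applied to every row; each row yields a.path.
    static List<String> leftOuterJoin(BiPredicate<Parent, Child> on,
                                      BiPredicate<Parent, Child> where) {
        List<String> result = new ArrayList<>();
        for (Parent a : SAMPLE) {
            List<Child> rows = new ArrayList<>();
            for (Child b : a.children()) {
                if (on.test(a, b)) {
                    rows.add(b);
                }
            }
            if (rows.isEmpty()) {
                rows.add(null); // outer join: keep "a", with all columns of "b" null
            }
            for (Child b : rows) {
                if (where.test(a, b)) {
                    result.add(a.path());
                }
            }
        }
        return result;
    }

    // contains(b.*, TEXT) and name(b) = 'jcr:content'; both are false when b is null.
    static boolean contains(Child b) { return b != null && b.text().contains(TEXT); }
    static boolean isJcrContent(Child b) { return b != null && b.name().equals("jcr:content"); }

    // ... where (contains(b.*, ...) or a.x = 1) and name(b) = 'jcr:content'
    public static List<String> nameInWhereAnded() {
        return leftOuterJoin((a, b) -> true,
            (a, b) -> (contains(b) || a.x() == 1) && isJcrContent(b));
    }

    // ... where (contains(b.*, ...) and name(b) = 'jcr:content') or a.x = 1
    public static List<String> nameInWhereOred() {
        return leftOuterJoin((a, b) -> true,
            (a, b) -> (contains(b) && isJcrContent(b)) || a.x() == 1);
    }

    // ... on ischildnode(b, a) and name(b) = 'jcr:content'
    //     where contains(b.*, ...) or a.x = 1   (not expressible in SQL-2 today)
    public static List<String> nameInOn() {
        return leftOuterJoin((a, b) -> isJcrContent(b),
            (a, b) -> contains(b) || a.x() == 1);
    }

    public static void main(String[] args) {
        System.out.println(nameInWhereAnded()); // a2 is missing
        System.out.println(nameInWhereOred());  // a3 is returned twice
        System.out.println(nameInOn());         // each expected node exactly once
    }
}
```

With the check in the "where" clause, `/content/a2` (no `jcr:content` child, so `b` is null) is dropped, or `/content/a3` is returned once per child; only the "on" placement returns each node once.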

It's probably easier if a wrapper around the Lucene/Solr index (similar to 
the NodeTypeIndex) supported relative properties. That way each component (the 
query engine and the fulltext index) could stay relatively simple.
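
A rough sketch of what such a wrapper could do, assuming the underlying index can only match properties of the nodes it returns. The wrapper splits a relative property such as `jcr:content/jcr:title` into a relative path and a property name, asks the underlying index about the property alone, and maps each hit back to the node that owns the relative property. `FulltextIndex`, `RelativePropertyIndex` and the toy index in `sample()` are hypothetical, not Oak, Lucene or Solr APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class RelativePropertyIndex {
    // Hypothetical underlying fulltext index: paths of the nodes whose own
    // property (or any property, for "*") contains the given word.
    public interface FulltextIndex {
        List<String> find(String property, String word);
    }

    private final FulltextIndex delegate;

    public RelativePropertyIndex(FulltextIndex delegate) {
        this.delegate = delegate;
    }

    // Resolves a relative property such as "jcr:content/jcr:title" (or
    // "jcr:content/*") and returns the paths of the nodes that own it.
    public List<String> find(String relativeProperty, String word) {
        int slash = relativeProperty.lastIndexOf('/');
        if (slash < 0) {
            return delegate.find(relativeProperty, word); // plain property: pass through
        }
        String suffix = "/" + relativeProperty.substring(0, slash); // e.g. "/jcr:content"
        String property = relativeProperty.substring(slash + 1);    // e.g. "jcr:title"
        List<String> result = new ArrayList<>();
        for (String hit : delegate.find(property, word)) {
            // Keep only hits located at the relative path, then strip it off.
            if (hit.endsWith(suffix)) {
                result.add(hit.substring(0, hit.length() - suffix.length()));
            }
        }
        return result;
    }

    // Toy in-memory index over node path -> (property name -> value).
    public static RelativePropertyIndex sample() {
        Map<String, Map<String, String>> nodes = Map.of(
            "/content/a/jcr:content", Map.of("jcr:title", "SomeTextToSearch"),
            "/content/b/other", Map.of("jcr:title", "SomeTextToSearch"),
            "/content/c/jcr:content", Map.of("jcr:description", "SomeTextToSearch"));
        FulltextIndex toy = (property, word) -> {
            List<String> hits = new ArrayList<>();
            for (Map.Entry<String, Map<String, String>> node : nodes.entrySet()) {
                boolean match = node.getValue().entrySet().stream().anyMatch(p ->
                    (property.equals("*") || property.equals(p.getKey()))
                        && p.getValue().contains(word));
                if (match) {
                    hits.add(node.getKey());
                }
            }
            return hits;
        };
        return new RelativePropertyIndex(toy);
    }

    public static void main(String[] args) {
        RelativePropertyIndex index = sample();
        System.out.println(index.find("jcr:content/jcr:title", "SomeTextToSearch"));
        System.out.println(index.find("jcr:content/*", "SomeTextToSearch"));
    }
}
```

Filtering by path suffix keeps the underlying index unaware of relative properties; the cost is that it may return hits the wrapper then throws away (`/content/b/other` in the sample).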
                
> Query: advanced fulltext search conditions
> ------------------------------------------
>
>                 Key: OAK-890
>                 URL: https://issues.apache.org/jira/browse/OAK-890
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: query
>            Reporter: Thomas Mueller
>            Assignee: Thomas Mueller
>
> Currently, the query engine does not use a fulltext index if there are 
> multiple fulltext conditions combined with "or". Also, the QueryIndex 
> interface does not support boosts, and does not support fulltext conditions 
> on properties (just on nodes) - Filter.getFulltextConditions is a collection 
> of strings, combined with "and", but does not contain the information whether 
> a condition is on a property or on all properties. Also, the popular sorting 
> by score (especially descending) is not currently supported.
> [~mreutegg] and I discussed how we could support those features (including 
> boost) in a way that is backward compatible with Jackrabbit 2.x, but without 
> adding a lot of complexity. Example Jackrabbit 2.x query:
> {code}
> /jcr:root/content//*[(@jcr:primaryType='page' 
>   and (jcr:contains(jcr:content/@tags, 'it:blue') 
>   or jcr:contains(jcr:content/@tags, '/tags/it/blue')))]
> /jcr:root/content//element(*, nt:hierarchyNode)[
>   (jcr:contains(jcr:content, 'SomeTextToSearch') 
>   or jcr:contains(jcr:content/@jcr:title, 'SomeTextToSearch') 
>   or jcr:contains(jcr:content/@jcr:description, 'SomeTextToSearch'))]
>   /rep:excerpt(.) order by @jcr:score descending 
> {code}
> A possible solution is to extend the internal fulltext syntax to support 
> those features. The internal fulltext syntax is the one used by 
> Filter.getFulltextCondition (not the one used within the original XPath, SQL, 
> or SQL-2 query). The proposed syntax (work in progress, just a rough draft so 
> far) is:
> {code}
> FullTextSearch ::= Or
>   ['order by score' [' desc']]
> Or ::= And {' OR ' And}* 
> And ::= Term {' ' Term}*
> Term ::= '(' Or ')' | ['-'] SimpleTerm
> SimpleTerm ::= [Property ':'] '"' Word {' ' Word}* '"' ['^' Boost]
> Property ::= <property name>
> Boost ::= <number>
> {code}
> The idea is that the syntax matches the syntax used by Lucene (except for the 
> 'order by' part), so that the Lucene and Solr index implementations get 
> simpler (they only need minimal parsing, possibly just of the 'order by' part). 
> Search terms (phrases, words) are always within double quotes. That means the 
> above queries would result in the following conditions (one per query):
> {code}
> jcr:content/tags:"it:blue"
> OR jcr:content/tags:"/tags/it/blue"
>
> jcr:content/*:"SomeTextToSearch"
> OR jcr:content/jcr:title:"SomeTextToSearch"
> OR jcr:content/jcr:description:"SomeTextToSearch"
> order by score desc
> {code}
> It would also allow us to switch back from 
> {code}
> Collection<String> getFulltextConditions()
> {code}
> to 
> {code}
> String getFulltextCondition()
> {code}
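
To check that the draft grammar really needs only minimal parsing, here is a minimal recursive-descent parser for it (Java 16+). It is a sketch under assumptions the draft leaves open: whitespace, including line breaks, may appear between tokens; a Word is anything except a double quote, with no escaping; and a Boost is a plain decimal number. The class and record names are hypothetical, not Oak APIs. Because property names such as `jcr:content/jcr:title` contain colons, the property is taken to end at the `:` directly before the opening quote.

```java
import java.util.ArrayList;
import java.util.List;

public class FulltextParser {
    // Hypothetical AST for the draft grammar; toString() prints the same syntax back.
    public interface Expr {}
    public record Or(List<Expr> list) implements Expr {
        public String toString() { return String.join(" OR ", list.stream().map(Object::toString).toList()); }
    }
    public record And(List<Expr> list) implements Expr {
        public String toString() { // an Or nested in an And gets its parentheses back
            return String.join(" ", list.stream()
                .map(e -> e instanceof Or ? "(" + e + ")" : e.toString()).toList());
        }
    }
    public record Not(Term term) implements Expr { public String toString() { return "-" + term; } }
    // property == null: the condition applies to all properties of the node.
    public record Term(String property, String phrase, Double boost) implements Expr {
        public String toString() {
            return (property == null ? "" : property + ":") + "\"" + phrase + "\""
                + (boost == null ? "" : "^" + boost);
        }
    }
    public record Query(Expr condition, boolean orderByScore, boolean descending) {}

    private final String s;
    private int pos;
    private FulltextParser(String s) { this.s = s; }

    // FullTextSearch ::= Or ['order by score' [' desc']]
    public static Query parse(String text) {
        FulltextParser p = new FulltextParser(text);
        Expr condition = p.parseOr();
        boolean byScore = p.accept("order by score");
        boolean desc = byScore && p.accept("desc");
        p.skipSpaces();
        if (p.pos != p.s.length()) throw p.error("unexpected input");
        return new Query(condition, byScore, desc);
    }
    // Or ::= And {' OR ' And}*
    private Expr parseOr() {
        List<Expr> list = new ArrayList<>(List.of(parseAnd()));
        while (accept("OR ")) list.add(parseAnd());
        return list.size() == 1 ? list.get(0) : new Or(list);
    }
    // And ::= Term {' ' Term}*, stopping before ')', 'OR ', 'order by score' or the end
    private Expr parseAnd() {
        List<Expr> list = new ArrayList<>(List.of(parseTerm()));
        while (true) {
            skipSpaces();
            if (pos >= s.length() || peek(")") || peek("OR ") || peek("order by score")) break;
            list.add(parseTerm());
        }
        return list.size() == 1 ? list.get(0) : new And(list);
    }
    // Term ::= '(' Or ')' | ['-'] SimpleTerm
    // SimpleTerm ::= [Property ':'] '"' Word {' ' Word}* '"' ['^' Boost]
    private Expr parseTerm() {
        if (accept("(")) {
            Expr inner = parseOr();
            if (!accept(")")) throw error("expected ')'");
            return inner;
        }
        boolean negated = accept("-");
        int quote = s.indexOf('"', pos);
        if (quote < 0) throw error("expected '\"'");
        String property = null;
        if (quote > pos) { // the property name ends at the ':' right before the quote
            if (s.charAt(quote - 1) != ':') throw error("expected ':' before '\"'");
            property = s.substring(pos, quote - 1);
        }
        int end = s.indexOf('"', quote + 1);
        if (end < 0) throw error("unterminated phrase");
        String phrase = s.substring(quote + 1, end);
        pos = end + 1;
        Double boost = null;
        if (peek("^")) { // Boost: digits with an optional '.'
            int start = ++pos;
            while (pos < s.length() && (Character.isDigit(s.charAt(pos)) || s.charAt(pos) == '.')) pos++;
            boost = Double.parseDouble(s.substring(start, pos)); // throws if empty
        }
        Term term = new Term(property, phrase, boost);
        return negated ? new Not(term) : term;
    }
    private void skipSpaces() {
        while (pos < s.length() && Character.isWhitespace(s.charAt(pos))) pos++;
    }
    private boolean peek(String token) { return s.startsWith(token, pos); }
    // Skips whitespace, then consumes the token if it comes next.
    private boolean accept(String token) {
        skipSpaces();
        if (!peek(token)) return false;
        pos += token.length();
        return true;
    }
    private IllegalArgumentException error(String m) { return new IllegalArgumentException(m + " at position " + pos + " in: " + s); }
}
```

Parsing the second condition from the issue yields an `Or` of three `Term`s with `orderByScore` and `descending` both set, and `toString()` prints the condition back on one line. All parse errors surface as `IllegalArgumentException` (a malformed boost raises its subclass `NumberFormatException`).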

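The examples in the issue suggest a mechanical mapping from XPath `jcr:contains` calls to terms of the draft syntax: a relative node path gets `/*` (all properties of that node), and a relative property just loses its `@`. A sketch of that mapping, inferred only from those examples; the `Contains` record and both methods are hypothetical, and treating `.` as "no property prefix" is an assumption the issue does not cover.

```java
import java.util.List;
import java.util.stream.Collectors;

public class ContainsToTerm {
    // One XPath jcr:contains(path, 'text') call, with the quotes around text removed.
    public record Contains(String path, String text) {}

    // Hypothetical mapping, inferred from the examples only:
    //   jcr:content            -> jcr:content/*          (all properties of a child node)
    //   jcr:content/@jcr:title -> jcr:content/jcr:title  (a relative property)
    //   @jcr:title             -> jcr:title              (a property of the node itself)
    //   .                      -> no property prefix     (assumption: the node itself)
    public static String toTerm(Contains c) {
        String path = c.path();
        String property;
        int at = path.lastIndexOf("/@");
        if (path.equals(".")) {
            property = null;
        } else if (at >= 0) {
            property = path.substring(0, at + 1) + path.substring(at + 2);
        } else if (path.startsWith("@")) {
            property = path.substring(1);
        } else {
            property = path + "/*";
        }
        // No escaping of '"' inside the text: the draft syntax does not define one.
        return (property == null ? "" : property + ":") + "\"" + c.text() + "\"";
    }

    // Calls combined with "or" in XPath, optionally "order by @jcr:score descending".
    public static String orCondition(List<Contains> calls, boolean byScoreDescending) {
        String condition = calls.stream()
            .map(ContainsToTerm::toTerm)
            .collect(Collectors.joining(" OR "));
        return byScoreDescending ? condition + " order by score desc" : condition;
    }

    public static void main(String[] args) {
        String text = "SomeTextToSearch";
        System.out.println(orCondition(List.of(
            new Contains("jcr:content", text),
            new Contains("jcr:content/@jcr:title", text),
            new Contains("jcr:content/@jcr:description", text)), true));
    }
}
```

Run on the second XPath query, this produces the second condition from the issue on a single line.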