Hi,

I fully agree with the idea that the Query Engine should not split the
search phrase into tokens.

If I remember correctly, this behavior exists so that the default
full-text engine works. To keep those parts working (if needed), we can
simply move this basic tokenization into the index implementation.
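
To illustrate the stop-word concern from the original mail, here is a rough
sketch in plain Java. It does not use Lucene or Oak; the class name, method
names and stop-word list are all made up for illustration, and matching is
simplified to "every required term is present" rather than a real phrase
query. It assumes the pre-split terms reach the index without being filtered
again at query time.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class FulltextSketch {
    // Hypothetical stop-word list standing in for an analyzer configuration.
    static final Set<String> STOP_WORDS = Set.of("is", "the", "a");

    // Basic whitespace tokenization, similar in spirit to what the
    // query engine does today (assumption: lowercasing plus splitting).
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Index-side analysis: tokenize, then drop stop words.
    static List<String> analyze(String text) {
        List<String> kept = new ArrayList<>();
        for (String t : tokenize(text)) {
            if (!STOP_WORDS.contains(t)) {
                kept.add(t);
            }
        }
        return kept;
    }

    // Query engine pre-splits the phrase: every token becomes a required
    // term, including stop words that were never indexed.
    static boolean matchesPreSplit(String phrase, String document) {
        Set<String> indexed = new HashSet<>(analyze(document));
        return indexed.containsAll(tokenize(phrase));
    }

    // Raw phrase handed to the index: the index applies the same analysis
    // to the query as it did to the document.
    static boolean matchesRaw(String phrase, String document) {
        Set<String> indexed = new HashSet<>(analyze(document));
        return indexed.containsAll(analyze(phrase));
    }

    public static void main(String[] args) {
        String doc = "The mountain is big";
        System.out.println("pre-split: " + matchesPreSplit("mountain is big", doc));
        System.out.println("raw:       " + matchesRaw("mountain is big", doc));
    }
}
```

In this toy model the pre-split query misses the document because 'is' is
required but was never indexed, while the raw phrase matches once the index
applies its own analysis to it.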

> So should we disable the Fulltext parsing happening in QueryEngine?
+1


alex

On Wed, Nov 19, 2014 at 2:23 PM, Chetan Mehrotra <[email protected]>
wrote:

> Following up on the earlier mail thread [1] but focusing on fulltext
> parsing happening at the Query Engine level
>
> Consider a case where we search for "mountain is big" and assume that
> no aggregation complexity is involved
>
> /jcr:root/content//element(*, test:Asset)[(jcr:contains(., 'mountain is big'))]
>
> Now, as per OAK-890, this would get broken into a full text expression
> which is an *and* of 'mountain', 'is', 'big'. LuceneIndex would get to
> see an already analyzed full text phrase and would construct a Lucene
> query like the one below
>
> +:fulltext:big +:fulltext:is +:fulltext:mountain
>
> This query might not behave as expected if the analyzer is
> configured with stop words that cause 'is' to be ignored.
>
> To avoid such cases it would be better if the QueryEngine did not
> parse the fulltext string in any form and passed the string as is.
>
> The only thing that would be lost in such a case is the boost support.
> That could possibly be handled at the LuceneIndex level.
>
> Looking at the JR2 code, I think no such parsing was performed at that
> time [2], and the text given in the query was passed *as is* to the
> Lucene QueryParser
>
> So should we disable the Fulltext parsing happening in QueryEngine?
>
> Chetan Mehrotra
> [1] http://markmail.org/thread/cyu7evezbi4u22gr
> [2]
> https://github.com/apache/jackrabbit/blob/trunk/jackrabbit-core/src/main/java/org/apache/jackrabbit/core/query/lucene/JackrabbitQueryParser.java
>
