Following up on the earlier mail thread [1], but focusing on the fulltext
parsing that happens at the Query Engine level.

Consider a case where we search for "mountain is big", and assume that
no aggregation complexity is involved:

/jcr:root/content//element(*, test:Asset)[(jcr:contains(., 'mountain is big'))]

As per OAK-890, this gets broken into a full-text expression which is
the *and* of 'mountain', 'is', and 'big'. LuceneIndex therefore only
gets to see the already analyzed full-text expression, and constructs a
Lucene query like the one below:

+:fulltext:big +:fulltext:is +:fulltext:mountain

This query might not behave as expected if the analyzer is configured
with stop words that cause 'is' to be ignored.

To avoid such cases it would be better if the QueryEngine did not
parse the fulltext string in any form and passed the string on as is.
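
To illustrate the concern, here is a minimal, self-contained sketch. It
is a toy model only, not Oak or Lucene code: the class StopWordSketch,
its stop-word list and its methods are all made up for this mail, and it
assumes the pre-split terms reach the index as required terms without
being re-analyzed.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model: stands in for an index whose analyzer drops stop words.
class StopWordSketch {

    // Hypothetical stop-word list; a real analyzer's list is configurable.
    static final Set<String> STOP_WORDS = Set.of("is", "a", "the");

    // Lower-cases, splits on whitespace and drops stop words.
    static Set<String> analyze(String text) {
        Set<String> tokens = new LinkedHashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Case 1: the query engine has already split the text and every
    // term, including the stop word, is required as is.
    static boolean matchesPreSplit(Set<String> indexed, List<String> requiredTerms) {
        return indexed.containsAll(requiredTerms);
    }

    // Case 2: the raw string is handed over and analyzed with the same
    // analyzer that was used at indexing time.
    static boolean matchesRaw(Set<String> indexed, String rawQuery) {
        return indexed.containsAll(analyze(rawQuery));
    }

    public static void main(String[] args) {
        Set<String> indexed = analyze("The mountain is big");
        // prints false: 'is' was never indexed, yet it is required
        System.out.println(matchesPreSplit(indexed, Arrays.asList("mountain", "is", "big")));
        // prints true: the stop word is dropped on both sides
        System.out.println(matchesRaw(indexed, "mountain is big"));
    }
}
```

In this model the pre-split query misses a document that clearly
contains the phrase, while the raw string matches because the index and
the query go through the same analysis.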

The only thing that would be lost in such a case is the boost support.
That can possibly be handled at the LuceneIndex level.

Looking at the JR2 code, I think no such parsing was performed there
[2], and the text given in the query was passed *as is* to the Lucene
QueryParser.

So should we disable the fulltext parsing happening in the QueryEngine?

Chetan Mehrotra
[1] http://markmail.org/thread/cyu7evezbi4u22gr
[2] https://github.com/apache/jackrabbit/blob/trunk/jackrabbit-core/src/main/java/org/apache/jackrabbit/core/query/lucene/JackrabbitQueryParser.java
