Hi, I fully agree with the idea that the Query Engine should not split the search phrase into tokens.
If I remember correctly, this behavior is there to allow the default full-text engine to work, so to keep those parts working (if needed) we can simply move this simple tokenization mechanism to the index impl.

> So should we disable the Fulltext parsing happening in QueryEngine?

+1

alex

On Wed, Nov 19, 2014 at 2:23 PM, Chetan Mehrotra wrote:
> Following up on the earlier mail thread [1] but focusing on fulltext
> parsing happening at the Query Engine level
>
> Consider a case where we search for "mountain is big" and assume that
> no aggregation complexity is involved
>
> /jcr:root/content//element(*, test:Asset)[(jcr:contains(., 'mountain is big'))]
>
> Now as per (OAK-890) this would get broken into a full text expression
> which is *and* of 'mountain', 'is', 'big'. LuceneIndex would get to
> see already analyzed full text phrase and would construct a Lucene
> query like below
>
> +:fulltext:big +:fulltext:is +:fulltext:mountain
>
> This query might not perform in expected way if the analyzer is
> configured with stop words which would ignore 'is'.
>
> To avoid such cases it would be better if the QueryEngine does not
> parse the fulltext string in any form and pass the string as is.
>
> Only thing that would be lost in such a case is the boost support.
> That can possibly be handled at LuceneIndex level
>
> Looking at JR2 code I think no such parsing was performed at that time
> [2] and text passed as part of query is passed *as is* to Lucene
> QueryParser
>
> So should we disable the Fulltext parsing happening in QueryEngine?
>
> Chetan Mehrotra
> [1] http://markmail.org/thread/cyu7evezbi4u22gr
> [2] https://github.com/apache/jackrabbit/blob/trunk/jackrabbit-core/src/main/java/org/apache/jackrabbit/core/query/lucene/JackrabbitQueryParser.java
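To illustrate the stop-word problem described in the quoted mail, here is a small, self-contained Java sketch. It does not use the real Oak or Lucene APIs: `FulltextParsingSketch`, `analyze`, `engineSplit`, `toQuery` and `matches` are hypothetical names, the stop-word list is an assumption, and the "index" is just a set of terms. It compares splitting on the query-engine side (every word becomes a required clause) with passing the phrase as is and letting the index analyze it with the same stop-word-aware analyzer that was used at index time.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical, dependency-free sketch: no real Oak or Lucene classes are used.
class FulltextParsingSketch {

    // Assumed stop-word list; a real analyzer's list is configurable.
    static final Set<String> STOP_WORDS = Set.of("is", "a", "the");

    // Stand-in for an analyzer configured with stop words:
    // lower-case, split on whitespace, drop stop words.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Stand-in for the query engine splitting the phrase into an AND of
    // terms before the index sees it; it knows nothing about stop words.
    static List<String> engineSplit(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Renders required clauses in Lucene query-syntax style, e.g. "+:fulltext:big".
    static String toQuery(List<String> terms) {
        StringBuilder sb = new StringBuilder();
        for (String t : terms) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append("+:fulltext:").append(t);
        }
        return sb.toString();
    }

    // A document matches only if every required term was indexed for it.
    static boolean matches(Set<String> indexedTerms, List<String> required) {
        return indexedTerms.containsAll(required);
    }

    public static void main(String[] args) {
        String phrase = "mountain is big";
        // Index time: the stop word "is" never reaches the index.
        Set<String> indexed = new LinkedHashSet<>(analyze("The mountain is big"));

        List<String> split = engineSplit(phrase);  // [mountain, is, big]
        List<String> passedAsIs = analyze(phrase); // [mountain, big]

        System.out.println(toQuery(split) + " -> " + matches(indexed, split));
        System.out.println(toQuery(passedAsIs) + " -> " + matches(indexed, passedAsIs));
    }
}
```

Running `main` prints `+:fulltext:mountain +:fulltext:is +:fulltext:big -> false` followed by `+:fulltext:mountain +:fulltext:big -> true`. The clause order differs from the query in the mail, which does not affect matching. The sketch ignores boosts entirely; as noted above, those would have to be handled at the LuceneIndex level if parsing moves out of the QueryEngine.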
