Actually, I have found that it is *not* mandatory to use phrase search with CommonGramsFilter.
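
For anyone who wants to repeat the comparison, here is a minimal sketch that builds the two request URLs from this thread, one with the plain query and one with the quoted (phrase) query. It only constructs the URLs. The helper name `select_url` is made up for illustration, and the host, core name, and parameters are simply the ones from my messages.

```python
from urllib.parse import urlencode, quote

# Endpoint taken from the URLs quoted in this thread.
BASE = "http://localhost:8983/solr/filesearch/select"


def select_url(q, **params):
    """Build a /select URL; hypothetical helper, not part of any Solr client."""
    query = {"q": q, **params}
    # quote_via=quote encodes spaces as %20 (not '+') and quotes as %22.
    return BASE + "?" + urlencode(query, quote_via=quote)


plain_url = select_url("not to or be", debugQuery="true")
phrase_url = select_url('"not to or be"', debugQuery="true")

print(plain_url)
print(phrase_url)
```

Comparing the `parsedquery` entry in the debug output of the two responses shows how each form of the query was parsed.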
PS: I had made another (unnecessary) code change, which was causing the behavior described in my earlier messages.

On Thu, Jan 4, 2018 at 6:56 PM, Nawab Zada Asad Iqbal <khi...@gmail.com> wrote:

> After some debugging, it seems that the search works if the query is a
> phrase search (i.e., enclosed in quotes):
>
> http://localhost:8983/solr/filesearch/select?q=%22not%20to%20or%20be%22&debugQuery=true
>
> This works with both sow=true and sow=false.
>
> Is it mandatory to use phrase search to properly pass the stopwords to the
> CommonGramsFilter?
>
> On Thu, Jan 4, 2018 at 6:08 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I am looking at this documentation and wondering if it would be better to
>> optionally skip indexing of original stopwords.
>>
>> https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html#FilterDescriptions-CommonGramsFilter
>>
>> http://localhost:8983/solr/filesearch/select?q=not%20to%20or%20be&debugQuery=true
>>
>> - parsedquery: "+(-DisjunctionMaxQuery((commongram_field2:to)~0.01)
>>   DisjunctionMaxQuery((commongram_field2:be)~0.01))~1",
>>
>> Other parameters are:
>>
>> - params: {
>>   - mm: " 1<-0% ",
>>   - q.alt: "*:*",
>>   - ps: "100",
>>   - echoParams: "all",
>>   - sort: "score desc",
>>   - rows: "35",
>>   - version: "2.2",
>>   - q: "not to or be",
>>   - tie: "0.01",
>>   - defType: "edismax",
>>   - qf: "commongram_field2",
>>   - sow: "false",
>>   - wt: "json",
>>   - debugQuery: "true"
>>   }
>>
>> And it doesn't match my document, which has the following fields:
>>
>> - id: "9191",
>> - commongram_field2: "not to or be",
>>
>> Commongram is defined as:
>>
>> <field name="commongram_field2" type="commongaram" indexed="true"
>>        stored="true" omitPositions="false"/>
>>
>> <fieldType name="commongaram" class="solr.TextField"
>>            positionIncrementGap="100">
>>   <analyzer type="index">
>>     <charFilter class="org.apache.lucene.analysis.icu.ICUNormalizer2CharFilterFactory"
>>                 name="nfkc" mode="compose"/>
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.WordDelimiterGraphFilterFactory"
>>             generateWordParts="1" generateNumberParts="1" catenateWords="0"
>>             catenateNumbers="0" catenateAll="0" preserveOriginal="0"
>>             splitOnCaseChange="1" splitOnNumerics="1" stemEnglishPossessive="0"/>
>>     <filter class="solr.FlattenGraphFilterFactory"/>
>>     <filter class="solr.PatternReplaceFilterFactory"
>>             pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$2"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.ASCIIFoldingFilterFactory"/>
>>     <filter class="solr.CommonGramsFilterFactory"
>>             words="stopwords.txt" ignoreCase="true"/>
>>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>     <filter class="solr.LimitTokenCountFilterFactory"
>>             maxTokenCount="10000" consumeAllTokens="false"/>
>>     <filter class="solr.LengthFilterFactory" min="1" max="255"/>
>>   </analyzer>
>>   <analyzer type="query">
>>     <charFilter class="org.apache.lucene.analysis.icu.ICUNormalizer2CharFilterFactory"
>>                 name="nfkc" mode="compose"/>
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.WordDelimiterGraphFilterFactory"
>>             generateWordParts="1" generateNumberParts="1" catenateWords="0"
>>             catenateNumbers="0" catenateAll="0" preserveOriginal="0"
>>             splitOnCaseChange="1" splitOnNumerics="1" stemEnglishPossessive="0"/>
>>     <filter class="solr.PatternReplaceFilterFactory"
>>             pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$2"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.ASCIIFoldingFilterFactory"/>
>>     <filter class="solr.CommonGramsFilterFactory"
>>             words="stopwords.txt" ignoreCase="true"/>
>>     <filter class="solr.LengthFilterFactory" min="1" max="255"/>
>>   </analyzer>
>> </fieldType>
>>
>> I am not sure what I am missing. I have also set sow=false so that the
>> whole query string is sent to the field's analysis chain instead of being
>> sent word by word. But that didn't seem to help.
>>
>> Thanks
>> Nawab
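
To make the question about the original stopwords concrete, here is a rough Python emulation of the terms I would expect the CommonGramsFilter step to emit for the text "not to or be". It is only a sketch under stated assumptions: it assumes all four words are listed in stopwords.txt (the file is not shown in this thread), and it ignores the other filters in the chain. The function `common_grams` is hypothetical and is not Lucene code. It follows my understanding of the filter: each original token is kept, and a bigram joined with "_" is added at the same position whenever either word of a pair is a common word.

```python
def common_grams(tokens, common_words):
    """Return (term, position) pairs: every unigram, plus an overlaid
    bigram wherever either token of an adjacent pair is a common word."""
    common = {w.lower() for w in common_words}  # mirrors ignoreCase="true"
    out = []
    for pos, tok in enumerate(tokens):
        out.append((tok, pos))  # the original token is always kept
        if pos + 1 < len(tokens):
            nxt = tokens[pos + 1]
            if tok.lower() in common or nxt.lower() in common:
                # bigram shares the position of its first token
                out.append((f"{tok}_{nxt}", pos))
    return out


# Assumption: every word of the test document is in stopwords.txt.
assumed_stopwords = {"not", "to", "or", "be"}
stream = common_grams("not to or be".split(), assumed_stopwords)

for term, pos in stream:
    print(pos, term)
```

Under these assumptions the single words "not", "to", "or" and "be" are still indexed next to the bigrams "not_to", "to_or" and "or_be", which is the behavior my original question about optionally skipping the original stopwords refers to.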