Re: Solr limit in words search - take 2

Michael Gibney Wed, 17 Nov 2021 09:07:33 -0800

This is not the most thorough answer, but hopefully it gets you headed in the
right direction:

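Very strange things can happen when your index-time analysis chain
generates "graph" token streams (as yours does, via WordDelimiterGraphFilter
(WDGF) with both splitting and catenation enabled). Lucene doesn't store
position length in the index, so a token that spans several positions (e.g.
the catenated "email", which covers both "e" and "mail") is squashed onto a
single position, and phrase queries that cross it may no longer line up. A
couple of things you could try: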
1. Experiment with setting `enableGraphQueries="false"` on the fieldType.
2. Upgrade to Solr >= 8.1, which may partially address your issue via
LUCENE-8730 (here I go out on a limb in guessing that you're not
_already_ on 8.1+ :-)).
3. Increase the phrase slop param, to be more lenient in matching
"phrases". (As I say this, I'm not sure it would actually help your case:
you're dealing with explicit phrases, and iirc phrase slop may only
configure _implicit_ ("pf") phrase searches?)
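A minimal sketch of suggestion 1, assuming the rest of the fieldType stays exactly as in your message: `enableGraphQueries` is a TextField attribute set on the fieldType itself (it defaults to `true`). Per the Solr Reference Guide it applies when querying with `sow=false` (the default), and turning it off disables the graph-aware query building Solr otherwise does for graph token streams such as WDGF's output. Whether that alone fixes this particular phrase is untested.

```xml
<!-- Same fieldType as in the question; only enableGraphQueries is added.
     The index and query <analyzer> elements are omitted here and should
     be kept unchanged. -->
<fieldType name="partial_text_general" class="solr.TextField"
           positionIncrementGap="100" multiValued="true"
           enableGraphQueries="false">
  <!-- existing <analyzer type="index"> and <analyzer type="query"> go here -->
</fieldType>
```

For suggestion 3, standard Lucene query syntax also lets you attach slop directly to an explicit phrase, e.g. `subject:"cobrancas e-mail marketing"~1`; how much slop (if any) is enough here would need to be tried against your index.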

The _best_ approach would be to configure your index-time analysis chain(s)
so that they don't have multi-term "expand" synonyms, and so that WDGF either
only splits ("generate*Parts", etc.) or only catenates ("catenate*",
"preserveOriginal"). One approach that can work is to index into two
fields, each with a dedicated index-time analysis type (split or catenate).
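A rough sketch of the two-field idea, under stated assumptions: the type and field names (`text_split`, `text_catenate`, `subject_split`, `subject_cat`) are hypothetical, only the WDGF settings differ between the two types, and the stemming and edge-n-gram filters from your original chain are left out for brevity (add them back as needed). The Solr Reference Guide says WDGF used at index time should be followed by `FlattenGraphFilterFactory`, because the indexer can't consume a graph directly, so that filter is included on the index side only.

```xml
<!-- Hypothetical split-only type: WDGF only generates word/number parts. -->
<fieldType name="text_split" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory"
            generateWordParts="1" generateNumberParts="1" splitOnCaseChange="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            preserveOriginal="0"/>
    <!-- Index time only: flatten the graph before it reaches the indexer. -->
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory"
            generateWordParts="1" generateNumberParts="1" splitOnCaseChange="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            preserveOriginal="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical catenate-only type: WDGF only catenates and keeps the original. -->
<fieldType name="text_catenate" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory"
            generateWordParts="0" generateNumberParts="0"
            catenateWords="1" catenateNumbers="1" preserveOriginal="1"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory"
            generateWordParts="0" generateNumberParts="0"
            catenateWords="1" catenateNumbers="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Both copies are fed from the existing multiValued subject field. -->
<field name="subject_split" type="text_split" indexed="true" stored="false" multiValued="true"/>
<field name="subject_cat" type="text_catenate" indexed="true" stored="false" multiValued="true"/>
<copyField source="subject" dest="subject_split"/>
<copyField source="subject" dest="subject_cat"/>
```

A phrase search can then target both copies, e.g. `subject_split:"cobrancas e-mail marketing" OR subject_cat:"cobrancas e-mail marketing"`, or list both fields in edismax's `qf`. Keeping each chain to one WDGF mode means neither field has to index overlapping multi-position tokens.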

Some relevant issues:
https://issues.apache.org/jira/browse/LUCENE-7398
https://issues.apache.org/jira/browse/LUCENE-4312

Michael

On Wed, Nov 17, 2021 at 11:18 AM Scott <[email protected]> wrote:

> My apologies for the previous e-mail… I should never have sent that as HTML.
>
> I am facing a weird issue, possibly caused by my config.
>
> I have indexed a document which has a field called subject, defined
> as:
>
> <field name="subject" type="partial_text_general"/>
>
>   <fieldType name="partial_text_general" class="solr.TextField"
>              positionIncrementGap="100" multiValued="true">
>     <analyzer type="index">
>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>       <filter class="solr.WordDelimiterGraphFilterFactory"
>               generateWordParts="1" generateNumberParts="0" splitOnCaseChange="1"
>               catenateWords="1" catenateNumbers="1" preserveOriginal="1"
>               splitOnNumerics="0"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>       <filter class="solr.EnglishPossessiveFilterFactory"/>
>       <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>       <filter class="solr.EnglishMinimalStemFilterFactory"/>
>       <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="45"/>
>     </analyzer>
>     <analyzer type="query">
>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>       <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>               ignoreCase="true" expand="true"/>
>       <filter class="solr.WordDelimiterGraphFilterFactory"
>               generateWordParts="1" generateNumberParts="0" splitOnCaseChange="1"
>               catenateWords="1" catenateNumbers="1" splitOnNumerics="0"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>       <filter class="solr.EnglishPossessiveFilterFactory"/>
>       <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>       <filter class="solr.EnglishMinimalStemFilterFactory"/>
>     </analyzer>
>   </fieldType>
>
> I have a document whose subject field is:
> <str>cobrancas E-mail marketing em dezembro, 2020 - referente ao uso de novembro</str>
>
> If I search for <str name="q">subject:"cobrancas e-mail"</str>, it finds
> the document, but if I search for
> <str name="q">subject:"cobrancas e-mail marketing"</str>, there is no match.
>
> Why would this happen?
>
> Thank you!