An update on this:

The problem occurs on edismax phrase queries where a term in the nested
query has a multi-word synonym.
In the examples quoted below, dog has the multi-term synonym "canis
familiaris", and aspirin has "acetylsalicylic acid".
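
For anyone who wants to reproduce this, here is a rough Python sketch of the request parameters. The host, collection name, and the shortened qf/pf/pf2/pf3 lists are assumptions pieced together from the debug output, not the real handler configuration, and edismax_debug_params is just an illustrative name.

```python
from urllib.parse import urlencode


def edismax_debug_params(q):
    """Build edismax request parameters with query debugging switched on."""
    return {
        "defType": "edismax",
        "q": q,
        # Assumed, shortened field lists; substitute the real ones from
        # the request handler. field~slop^boost is the edismax pf syntax.
        "qf": "title^100 text",
        "pf": "title~20^5000 text~20^1000",
        "pf2": "authors~11 species~11",
        "pf3": "title~22^1000 text~22^100",
        "debugQuery": "true",
        "rows": "0",
    }


if __name__ == "__main__":
    # Hypothetical host and collection name.
    base = "http://localhost:8983/solr/mycollection/select"
    print(base + "?" + urlencode(edismax_debug_params("aspirin dose in rats")))
```

Comparing parsedquery_toString for the same URL against both field types should show whether the phrase clauses come back as empty "()" groups.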

Creating a JIRA ticket.

Thank you,
Elizabeth


On Wed, Apr 18, 2018 at 12:38 PM, Elizabeth Haubert <
ehaub...@opensourceconnections.com> wrote:

> I'm seeing pf and pf3 clauses fail to generate in long queries containing
> synonyms.  Wondering if anyone else has run into this, or if it needs to be
> submitted as a bug in Jira.   It is a showstopper problem for the current
> project, as the pf and pf3 were pretty heavily tuned.
>
> Using Solr 7.1; all fields are using the following type:
>
> With query-time synonyms:
> <fieldType name="my_text_general" class="solr.TextField"
> positionIncrementGap="100" autoGeneratePhraseQueries="true">
> <analyzer type="index">
> <charFilter class="solr.PatternReplaceCharFilterFactory"
> pattern="(?i)\b(anti|hypo|hyper|non)[-\\/ ](\w+)\b" replacement="$1$2"/>
> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> <filter class="solr.WordDelimiterGraphFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"
> stemEnglishPossessive="1"  protected="protwords_wdff.txt"/>
> <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" />
> <filter class="solr.TrimFilterFactory"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.ASCIIFoldingFilterFactory"/>
> <filter class="solr.EnglishMinimalStemFilterFactory"/>
> <filter class="solr.KeywordMarkerFilterFactory"
> protected="protwords_nostem.txt"/>
> <filter class="solr.KStemFilterFactory"/>
> <filter class="solr.FlattenGraphFilterFactory" />
> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
> <analyzer type="query">
> <charFilter class="solr.PatternReplaceCharFilterFactory"
> pattern="(?i)\b(anti|hypo|hyper|non)[-\\/ ](\w+)\b" replacement="$1$2"/>
> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> <filter class="solr.WordDelimiterGraphFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"
> stemEnglishPossessive="1"  protected="protwords_wdff.txt"/>
> <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" />
> <filter class="solr.TrimFilterFactory"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.ASCIIFoldingFilterFactory"/>
> <filter class="solr.EnglishMinimalStemFilterFactory"/>
>     <filter class="solr.SynonymGraphFilterFactory"
>  managed="synonyms_all" />
> <filter class="solr.KeywordMarkerFilterFactory"
> protected="protwords_nostem.txt"/>
> <filter class="solr.KStemFilterFactory"/>
> </analyzer>
> <similarity class="solr.ClassicSimilarityFactory" />
> </fieldType>
>
> Without query-time synonyms (synonyms applied at index time instead):
> <fieldType name="my_text_general" class="solr.TextField"
> positionIncrementGap="100" autoGeneratePhraseQueries="true">
> <analyzer type="index">
> <charFilter class="solr.PatternReplaceCharFilterFactory"
> pattern="(?i)\b(anti|hypo|hyper|non)[-\\/ ](\w+)\b" replacement="$1$2"/>
> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> <filter class="solr.WordDelimiterGraphFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"
> stemEnglishPossessive="1"  protected="protwords_wdff.txt"/>
> <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" />
> <filter class="solr.TrimFilterFactory"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.ASCIIFoldingFilterFactory"/>
> <filter class="solr.EnglishMinimalStemFilterFactory"/>
>     <filter class="solr.SynonymGraphFilterFactory"
>  managed="synonyms_all" />
> <filter class="solr.KeywordMarkerFilterFactory"
> protected="protwords_nostem.txt"/>
> <filter class="solr.KStemFilterFactory"/>
> <filter class="solr.FlattenGraphFilterFactory" />
> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
> <analyzer type="query">
> <charFilter class="solr.PatternReplaceCharFilterFactory"
> pattern="(?i)\b(anti|hypo|hyper|non)[-\\/ ](\w+)\b" replacement="$1$2"/>
> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> <filter class="solr.WordDelimiterGraphFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"
> stemEnglishPossessive="1"  protected="protwords_wdff.txt"/>
> <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" />
> <filter class="solr.TrimFilterFactory"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.ASCIIFoldingFilterFactory"/>
> <filter class="solr.EnglishMinimalStemFilterFactory"/>
> <filter class="solr.KeywordMarkerFilterFactory"
> protected="protwords_nostem.txt"/>
> <filter class="solr.KStemFilterFactory"/>
> </analyzer>
> <similarity class="solr.ClassicSimilarityFactory" />
> </fieldType>
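
The two definitions differ only in which analyzer carries the SynonymGraphFilterFactory. A minimal Python sketch for confirming that on a fieldType pasted from the schema (with the "> " quote prefixes stripped first); synonym_analyzers is a made-up helper name, and matching on the substring "Synonym" is a simplifying assumption:

```python
import xml.etree.ElementTree as ET


def synonym_analyzers(field_type_xml):
    """Return the analyzer types ("index", "query") that contain a synonym filter."""
    root = ET.fromstring(field_type_xml)
    found = []
    for analyzer in root.findall("analyzer"):
        # Collect the class attribute of every <filter> in this analyzer.
        classes = [f.get("class", "") for f in analyzer.findall("filter")]
        if any("Synonym" in c for c in classes):
            found.append(analyzer.get("type"))
    return found
```

Run against the first definition this should report only the query analyzer, and against the second only the index analyzer.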
>
> Synonyms file is pretty long, so I'll just include the relevant bits for
> an example:
>
> allergic, hypersensitive
> aspirin, acetylsalicylic acid
> dog, canine, canis familiaris, k 9
> rat, rattus
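
Since the multi-word entries appear to be the trigger, a small Python sketch can list which terms have one. It assumes the comma-separated equivalence format shown in the excerpt (no "=>" rules), and multiword_synonyms is an illustrative name only:

```python
def multiword_synonyms(lines):
    """Map each term to the multi-word synonyms it shares a line with."""
    result = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        terms = [t.strip() for t in line.split(",") if t.strip()]
        for term in terms:
            # Any other term on the line that contains a space is multi-word.
            multi = [t for t in terms if t != term and " " in t]
            if multi:
                result[term] = multi
    return result


if __name__ == "__main__":
    excerpt = [
        "allergic, hypersensitive",
        "aspirin, acetylsalicylic acid",
        "dog, canine, canis familiaris, k 9",
        "rat, rattus",
    ]
    for term, multi in multiword_synonyms(excerpt).items():
        print(term, "->", multi)
```

On this excerpt, aspirin and dog are flagged while allergic and rat are not, which lines up with the terms that drop out of the phrase clauses in the examples.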
>
>
> The problem seems to occur when part of the query has a synonym but the
> whole phrase does not.  I added whitespace to show what is going on, and
> I believe any parenthesis errors are due to my tinkering.  Beyond that,
> the output is as returned by Solr.  I set the slop values so the clauses
> can be told apart: pf fields have a slop ending in 0, pf2 ending in 1,
> and pf3 ending in 2 (e.g. ~10, ~11, ~12).
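
The slop convention makes it possible to tally the clauses mechanically. A minimal Python sketch, assuming the parsed query is available as a string; count_phrase_clauses and the regular expression are illustrative, and phrase clauses with no slop at all are not counted:

```python
import re
from collections import Counter

# Matches the slop that follows a closing phrase quote, e.g. the 22 in
# title:\"dose ? rats\"~22. The ~0.4 tie-breaker follows ")" and is skipped.
SLOP = re.compile(r'"~(\d+)')
KIND_BY_LAST_DIGIT = {"0": "pf", "1": "pf2", "2": "pf3"}


def count_phrase_clauses(parsed_query):
    """Tally phrase clauses as pf/pf2/pf3 by the last digit of their slop."""
    counts = Counter()
    for slop in SLOP.findall(parsed_query):
        counts[KIND_BY_LAST_DIGIT.get(slop[-1], "other")] += 1
    return counts
```

Running it over the query-time and index-time outputs side by side gives a quick count of which clause types went missing.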
>
> =============
> Example 1:  "aspirin dose in rats"
> ==============
>
> With query-time synonyms:
> ===============
> /// Q terms generate as expected ///
> +((((kw1:\"acetylsalicylic acid\" kw1:aspirin)^100.0 |
> (species:\"acetylsalicylic acid\" species:aspirin) |
> (keywords_bm25_no_norms:\"acetylsalicylic acid\" 
> keywords_bm25_no_norms:aspirin)^50.0
> | (description:\"acetylsalicylic acid\" description:aspirin) |
> (kw1ranked:\"acetylsalicylic acid\" kw1ranked:aspirin)^100.0 |
> (text:\"acetylsalicylic acid\" text:aspirin) | (title:\"acetylsalicylic
> acid\" title:aspirin)^100.0 | (keywordsranked_bm25_no_norms:\"acetylsalicylic
> acid\" keywordsranked_bm25_no_norms:aspirin)^50.0 |
> (authors:\"acetylsalicylic acid\" authors:aspirin))~0.4
> ((Synonym(kw1:dosage kw1:dose kw1:dose kw1:dose))^100.0 |
> Synonym(species:dosage species:dose species:dose species:dose) |
> (Synonym(keywords_bm25_no_norms:dosage keywords_bm25_no_norms:dose
> keywords_bm25_no_norms:dose keywords_bm25_no_norms:dose))^50.0 |
> Synonym(description:dosage description:dose description:dose
> description:dose) | (Synonym(kw1ranked:dosage kw1ranked:dose kw1ranked:dose
> kw1ranked:dose))^100.0 | Synonym(text:dosage text:dose text:dose text:dose)
> | (Synonym(title:dosage title:dose title:dose title:dose))^100.0 |
> (Synonym(keywordsranked_bm25_no_norms:dosage keywordsranked_bm25_no_norms:dose
> keywordsranked_bm25_no_norms:dose keywordsranked_bm25_no_norms:dose))^50.0
> | Synonym(authors:dosage authors:dose authors:dose authors:dose))~0.4
> ((Synonym(kw1:rat kw1:rattu))^100.0 | Synonym(species:rat species:rattu) |
> (Synonym(keywords_bm25_no_norms:rat keywords_bm25_no_norms:rattu))^50.0 |
> Synonym(description:rat description:rattu) | (Synonym(kw1ranked:rat
> kw1ranked:rattu))^100.0 | Synonym(text:rat text:rattu) | (Synonym(title:rat
> title:rattu))^100.0 | (Synonym(keywordsranked_bm25_no_norms:rat
> keywordsranked_bm25_no_norms:rattu))^50.0 | Synonym(authors:rat
> authors:rattu))~0.4)~3)
>
> /// PF and PF2 are missing. ///
>  () () () () ()
>
> /// This is actually PF3 with a missing ? where the stopword 'in'
> belonged. ///
>  ((title:\"(dosage dose dose dose) (rattu rat)\"~22)^1000.0 |
> (keywordsranked_bm25_no_norms:\"(dosage dose dose dose) (rattu
> rat)\"~22)^1000.0 | (text:\"(dosage dose dose dose) (rattu
> rat)\"~22)^100.0)~0.4 ((keywords_bm25_no_norms:\"(dosage dose dose dose)
> (rattu rat)\"~12)^500.0 | (kw1ranked:\"(dosage dose dose dose) (rattu
> rat)\"~12)^100.0 | (kw1:\"(dosage dose dose dose) (rattu
> rat)\"~12)^100.0)~0.4,product(max(10.0/(3.16E-11*float(ms(
> const(1555545600000),date(dateint)))+6.0),int(documentdatefix)),scale(map(
> int(rank),-1.0,-1.0,const(0.5),null),0.5,2.0)))",
>
> With index-time synonyms:
> ===============
>
> /// Q ///
>  "boost(+((((kw1:aspirin)^100.0 | species:aspirin |
> (keywords_bm25_no_norms:aspirin)^50.0 | description:aspirin |
> (kw1ranked:aspirin)^100.0 | text:aspirin | (title:aspirin)^100.0 |
> (keywordsranked_bm25_no_norms:aspirin)^50.0 | authors:aspirin)~0.4
> ((kw1:dose)^100.0 | species:dose | (keywords_bm25_no_norms:dose)^50.0 |
> description:dose | (kw1ranked:dose)^100.0 | text:dose | (title:dose)^100.0
> | (keywordsranked_bm25_no_norms:dose)^50.0 | authors:dose)~0.4
> ((kw1:rats)^100.0 | species:rats | (keywords_bm25_no_norms:rats)^50.0 |
> description:rats | (kw1ranked:rats)^100.0 | text:rats | (title:rats)^100.0
> | (keywordsranked_bm25_no_norms:rats)^50.0 | authors:rats)~0.4)~3)
> /// PF  ///
>   ((title:\"aspirin dose ? rats\"~20)^5000.0 |
> (keywordsranked_bm25_no_norms:\"aspirin dose ? rats\"~20)^5000.0 |
> (keywords_bm25_no_norms:\"aspirin dose ? rats\"~20)^1500.0 |
> (text:\"aspirin dose ? rats\"~20)^1000.0)~0.4 ((kw1ranked:\"aspirin dose ?
> rats\"~10)^5000.0 | (kw1:\"aspirin dose ? rats\"~10)^500.0)~0.4
> ((authors:\"aspirin dose ? rats\")^250.0 | description:\"aspirin dose ?
> rats\")~0.4
>
> /// PF2 ///
>   ((text:\"aspirin dose ? rats\"~100)^500.0)~0.4 (authors:\"aspirin
> dose\"~11 | species:\"aspirin dose\"~11)~0.4
>
> /// PF3 ///
> (((title:\"aspirin dose\"~22)^1000.0 | (keywordsranked_bm25_no_norms:\"aspirin
> dose\"~22)^1000.0 | (text:\"aspirin dose\"~22)^100.0)~0.4 ((title:\"dose ?
> rats\"~22)^1000.0 | (keywordsranked_bm25_no_norms:\"dose ?
> rats\"~22)^1000.0 | (text:\"dose ? rats\"~22)^100.0)~0.4)
> (((keywords_bm25_no_norms:\"aspirin dose\"~12)^500.0 |
> (kw1ranked:\"aspirin dose\"~12)^100.0 | (kw1:\"aspirin
> dose\"~12)^100.0)~0.4 ((keywords_bm25_no_norms:\"dose ? rats\"~12)^500.0
> | (kw1ranked:\"dose ? rats\"~12)^100.0 | (kw1:\"dose ?
> rats\"~12)^100.0)~0.4),product(max(10.0/(3.16E-11*
> float(ms(const(1555545600000),date(dateint)))+6.0),int(
> documentdatefix)),scale(map(int(rank),-1.0,-1.0,const(0.5)
> ,null),0.5,2.0)))",
>
>
> ===============
> Example 2: "allergic reaction dogs"
> The underlying issue isn't specifically PF, PF2, PF3. The following
> example picks up PF2, but not PF or PF3
> ===============
>
> With Query-time synonyms:
> ///  Q ///
> parsedquery_toString":"boost(
> +((((Synonym(kw1:allergic kw1:allergy kw1:hypersensitive
> kw1:hypersensitive))^100.0 | Synonym(species:allergic species:allergy
> species:hypersensitive species:hypersensitive) | 
> (Synonym(keywords_bm25_no_norms:allergic
> keywords_bm25_no_norms:allergy keywords_bm25_no_norms:hypersensitive
> keywords_bm25_no_norms:hypersensitive))^50.0 |
> Synonym(description:allergic description:allergy description:hypersensitive
> description:hypersensitive) | (Synonym(kw1ranked:allergic kw1ranked:allergy
> kw1ranked:hypersensitive kw1ranked:hypersensitive))^100.0 |
> Synonym(text:allergic text:allergy text:hypersensitive text:hypersensitive)
> | (Synonym(title:allergic title:allergy title:hypersensitive
> title:hypersensitive))^100.0 | (Synonym(keywordsranked_bm25_no_norms:allergic
> keywordsranked_bm25_no_norms:allergy 
> keywordsranked_bm25_no_norms:hypersensitive
> keywordsranked_bm25_no_norms:hypersensitive))^50.0 |
> Synonym(authors:allergic authors:allergy authors:hypersensitive
> authors:hypersensitive))~0.4 ((kw1:reaction)^100.0 | species:reaction |
> (keywords_bm25_no_norms:reaction)^50.0 | description:reaction |
> (kw1ranked:reaction)^100.0 | text:reaction | (title:reaction)^100.0 |
> (keywordsranked_bm25_no_norms:reaction)^50.0 | authors:reaction)~0.4
> ((kw1:\"cani familiari\" kw1:canine kw1:\"k 9\" kw1:\"cani lupu familiari\"
> kw1:dog)^100.0 | (species:\"cani familiari\" species:canine species:\"k 9\"
> species:\"cani lupu familiari\" species:dog) |
> (keywords_bm25_no_norms:\"cani familiari\" keywords_bm25_no_norms:canine
> keywords_bm25_no_norms:\"k 9\" keywords_bm25_no_norms:\"cani lupu
> familiari\" keywords_bm25_no_norms:dog)^50.0 | (description:\"cani
> familiari\" description:canine description:\"k 9\" description:\"cani lupu
> familiari\" description:dog) | (kw1ranked:\"cani familiari\"
> kw1ranked:canine kw1ranked:\"k 9\" kw1ranked:\"cani lupu familiari\"
> kw1ranked:dog)^100.0 | (text:\"cani familiari\" text:canine text:\"k 9\"
> text:\"cani lupu familiari\" text:dog) | (title:\"cani familiari\"
> title:canine title:\"k 9\" title:\"cani lupu familiari\" title:dog)^100.0 |
> (keywordsranked_bm25_no_norms:\"cani familiari\"
> keywordsranked_bm25_no_norms:canine keywordsranked_bm25_no_norms:\"k 9\"
> keywordsranked_bm25_no_norms:\"cani lupu familiari\"
> keywordsranked_bm25_no_norms:dog)^50.0 | (authors:\"cani familiari\"
> authors:canine authors:\"k 9\" authors:\"cani lupu familiari\"
> authors:dog))~0.4)~3)
>
> /// PF ///
> () () () ()
>
> /// PF2 ///
> (authors:\"(hypersensitive allergy hypersensitive allergic) reaction\"~11
> | species:\"(hypersensitive allergy hypersensitive allergic)
> reaction\"~11)~0.4
>
> /// PF3 ///
> () (),
> product(max(10.0/(3.16E-11*float(ms(const(1555545600000),
> date(dateint)))+6.0),int(documentdatefix)),scale(map(
> int(rank),-1.0,-1.0,const(0.5),null),0.5,2.0)))",
>
> With index-time synonyms:
> /// Q ///
> +((((kw1:allergic)^100.0 | species:allergic | 
> (keywords_bm25_no_norms:allergic)^50.0
> | description:allergic | (kw1ranked:allergic)^100.0 | text:allergic |
> (title:allergic)^100.0 | (keywordsranked_bm25_no_norms:allergic)^50.0 |
> authors:allergic)~0.4 ((kw1:reaction)^100.0 | species:reaction |
> (keywords_bm25_no_norms:reaction)^50.0 | description:reaction |
> (kw1ranked:reaction)^100.0 | text:reaction | (title:reaction)^100.0 |
> (keywordsranked_bm25_no_norms:reaction)^50.0 | authors:reaction)~0.4
> ((kw1:dog)^100.0 | species:dog | (keywords_bm25_no_norms:dog)^50.0 |
> description:dog | (kw1ranked:dog)^100.0 | text:dog | (title:dog)^100.0 |
> (keywordsranked_bm25_no_norms:dog)^50.0 | authors:dog)~0.4)~3)
>
> /// PF ///
> ((title:\"allergic reaction dog\"~20)^5000.0 |
> (keywordsranked_bm25_no_norms:\"allergic reaction dog\"~20)^5000.0 |
> (keywords_bm25_no_norms:\"allergic reaction dog\"~20)^1500.0 |
> (text:\"allergic reaction dog\"~20)^1000.0)~0.4 ((kw1ranked:\"allergic
> reaction dog\"~10)^5000.0 | (kw1:\"allergic reaction dog\"~10)^500.0)~0.4
> ((authors:\"allergic reaction dog\")^250.0 | description:\"allergic
> reaction dog\")~0.4 ((text:\"allergic reaction dog\"~100)^500.0)~0.4
>
> /// PF2 ///
> ((authors:\"allergic reaction\"~11 | species:\"allergic reaction\"~11)~0.4
>
> /// PF3 ///
> (authors:\"reaction dog\"~11 | species:\"reaction dog\"~11)~0.4)
> ((title:\"allergic reaction dog\"~22)^1000.0 |
> (keywordsranked_bm25_no_norms:\"allergic reaction dog\"~22)^1000.0 |
> (text:\"allergic reaction dog\"~22)^100.0)~0.4 
> ((keywords_bm25_no_norms:\"allergic
> reaction dog\"~12)^500.0 | (kw1ranked:\"allergic reaction dog\"~12)^100.0 |
> (kw1:\"allergic reaction dog\"~12)^100.0)~0.4,product(
> max(10.0/(3.16E-11*float(ms(const(1555545600000),date(dateint)))+6.0),int(
> documentdatefix)),scale(map(int(rank),-1.0,-1.0,const(0.5)
> ,null),0.5,2.0)))",
>
>
> Working on getting this rigged up in the debugger, but would appreciate
> any feedback.
>
> Thank you,
> Elizabeth
>
