[jira] [Commented] (LUCENE-8767) DisjunctionMaxQuery do not work well when multiple search term+mm+query fields with different fieldType.
[ https://issues.apache.org/jira/browse/LUCENE-8767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903882#comment-16903882 ]

Chongchen Chen commented on LUCENE-8767:

Hi, [~ZhongHua]. On the master branch, I cannot reproduce your problem. Here is my patch that tries to reproduce it: [^a.diff]. You can run that test; you will find that the parsedQuery is correct. Is there something wrong in my patch?

> DisjunctionMaxQuery do not work well when multiple search term+mm+query
> fields with different fieldType.
>
> Key: LUCENE-8767
> URL: https://issues.apache.org/jira/browse/LUCENE-8767
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/queryparser
> Affects Versions: 7.3
> Environment: Solr: 7.3.1
> Backup:
> FieldType for name field:
> omitNorms="true">
> words="stopwords.txt" enablePositionIncrements="true" />
> generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"
> splitOnCaseChange="0" preserveOriginal="1" splitOnNumerics="0"/>
> protected="protwords.txt" />
>
> FieldType for partNumber field:
> omitNorms="true">
>
> Reporter: ZhongHua Wu
> Priority: Critical
> Labels: patch
> Attachments: a.diff
>
> When the fields in qf come from different fieldTypes, especially when one
> uses KeywordTokenizerFactory and another uses WhitespaceTokenizerFactory,
> the generated parsed query does not honor synonyms and mm, so it hits
> incorrect documents. The details are as follows:
> # We use Solr 7.3.1.
> # Our qf=name^10 partNumber_ntk. The fieldType of name uses
> solr.WhitespaceTokenizerFactory and solr.WordDelimiterFilterFactory, while
> partNumber_ntk is not tokenized and uses solr.KeywordTokenizerFactory.
> # mm=2<3 4<5 6<-80%25
> # The search term is versatil sundress, and 'versatile' and 'testing' are
> synonyms. We have documents named " Versatil Empire Waist Sundress" which
> should be hit, but are not.
> # The same query works fine on Solr 5.5.4; it does not work on Solr 7.3.1.
>
> q=(Versatil%20testing)%20sundress=name=edismax=2<3 4<5 6<-80%25=name^10%20partNumber_ntk=true=xml=100
> parsedQuery:
> +(DisjunctionMaxQueryname:versatil name:test)~2)^10.0 |
> partNumber_ntk:versatil testing)) DisjunctionMaxQuery(((name:sundress)^10.0 |
> partNumber_ntk:sundress)))~2
>
> It seems the name field is incorrectly parsed to: name:versatil name:test
>
> If I change the query fields so that they share the same fieldType (for
> example, shortDescription has the same fieldType as name):
> q=(Versatil%20testing)%20sundress=name=edismax=2<3 4<5 6<-80%25=name^10%20shortDescription=true=xml=100
> ParsedQuery:
> +((DisjunctionMaxQuery(((name:versatil)^10.0 | shortDescription:versatil))
> DisjunctionMaxQuery(((name:test)^10.0 | shortDescription:test)))
> DisjunctionMaxQuery(((name:sundress)^10.0 | shortDescription:sundress)))~2
> which hits correctly.
>
> Could someone check this or tell us a quick workaround? It now has a big
> impact on our customer.
> Thanks in advance! The following is backup information:

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
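
For readers unfamiliar with the conditional mm syntax used in the report (mm=2<3 4<5 6<-80%25, where %25 is the URL-encoded percent sign), the following is a minimal sketch of how such a spec could be evaluated. It follows the semantics described in the Solr documentation for the mm parameter, but the class and method names (MinShouldMatchSketch, calculate) are hypothetical and this is not Solr's actual implementation. It assumes no spaces around "<" and uses integer arithmetic for percentages.

```java
// Simplified sketch of conditional "mm" (minimum should match) evaluation.
// Assumption: mirrors the documented semantics only; names are hypothetical.
final class MinShouldMatchSketch {

    /** Returns how many of clauseCount optional clauses must match under spec. */
    static int calculate(int clauseCount, String spec) {
        spec = spec.trim();
        if (spec.contains("<")) {
            // Conditional form: "n<x" applies x only when there are more than n clauses.
            int result = clauseCount; // at or below the lowest bound, all clauses are required
            for (String part : spec.split("\\s+")) {
                String[] pair = part.split("<", 2);
                int upperBound = Integer.parseInt(pair[0]);
                if (clauseCount <= upperBound) {
                    return result;
                }
                result = calculate(clauseCount, pair[1]);
            }
            return result;
        }
        int calc;
        if (spec.endsWith("%")) {
            int percent = Integer.parseInt(spec.substring(0, spec.length() - 1));
            calc = (clauseCount * percent) / 100; // integer division truncates toward zero
        } else {
            calc = Integer.parseInt(spec);
        }
        // A negative value means "all clauses except this many".
        int result = calc < 0 ? clauseCount + calc : calc;
        return Math.max(0, Math.min(result, clauseCount));
    }

    public static void main(String[] args) {
        String mm = "2<3 4<5 6<-80%"; // URL-decoded form of the mm value in the report
        for (int n = 1; n <= 8; n++) {
            System.out.println(n + " clauses -> " + calculate(n, mm));
        }
    }
}
```

Under this reading, a query with two optional clauses requires both of them to match. That would be consistent with the ~2 suffixes in the parsedQuery output quoted in the report, although the sketch does not explain why the clauses are grouped differently for the two fieldTypes.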
[jira] [Commented] (LUCENE-8767) DisjunctionMaxQuery do not work well when multiple search term+mm+query fields with different fieldType.
[ https://issues.apache.org/jira/browse/LUCENE-8767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819882#comment-16819882 ]

ZhongHua Wu commented on LUCENE-8767:

BTW, I did test adding q.op=OR, like:

q=(Versatil%20test)%20sundress=name=edismax=2=name^10%20partNumber_ntk=true=xml=1=OR

so this is not the same issue as https://issues.apache.org/jira/browse/SOLR-3589. Even if we want to achieve the same effect, we want name:Versatil | name:test.