[jira] [Commented] (LUCENE-8767) DisjunctionMaxQuery do not work well when multiple search term+mm+query fields with different fieldType.

2019-08-09 Thread Chongchen Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903882#comment-16903882
 ] 

Chongchen Chen commented on LUCENE-8767:


Hi, [~ZhongHua]. On the master branch, I cannot reproduce your problem. Here is 
my patch that tries to reproduce it: [^a.diff]. You can run that test, and you 
will find that the parsedQuery is correct. Is there something wrong in my 
patch?

> DisjunctionMaxQuery do not work well when multiple search term+mm+query 
> fields with different fieldType.
> 
>
> Key: LUCENE-8767
> URL: https://issues.apache.org/jira/browse/LUCENE-8767
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/queryparser
>Affects Versions: 7.3
> Environment: Solr: 7.3.1
> Backup:
> FieldType for name field:
>  omitNorms="true">
>  
>  
>   words="stopwords.txt" enablePositionIncrements="true" />
>   generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0" 
>  splitOnCaseChange="0" preserveOriginal="1" splitOnNumerics="0"/>
>  
>   protected="protwords.txt" />
>  
>  
>  
> FieldType for partNumber field:
>  omitNorms="true">
>  
>  
>  
>  
>  
>  
>Reporter: ZhongHua Wu
>Priority: Critical
>  Labels: patch
> Attachments: a.diff
>
>
> When the query fields (qf) come from different fieldTypes, especially when 
> one uses KeywordTokenizerFactory and another uses 
> WhitespaceTokenizerFactory, the generated parsed query does not honor 
> synonyms and mm, so the results are incorrect. Details:
>  # We use Solr 7.3.1.
>  # Our qf=name^10 partNumber_ntk. The fieldType of name uses 
> solr.WhitespaceTokenizerFactory and solr.WordDelimiterFilterFactory, while 
> partNumber_ntk is not tokenized and uses solr.KeywordTokenizerFactory.
>  # mm=2<3 4<5 6<-80%25
>  # The search term is versatil sundress, and 'versatile' and 'testing' are 
> synonyms. We have documents named " Versatil Empire Waist Sundress" that 
> should be hit, but they are not.
>  # The same query works fine on Solr 5.5.4; it does not work on Solr 
> 7.3.1.
> q=
> (Versatil%20testing)%20sundress=name=edismax=2<3 4<5 
> 6<-80%25=name^10%20partNumber_ntk=true=xml=100
> parsedQuery:
> +(DisjunctionMaxQuery((((name:versatil name:test)~2)^10.0 | 
> partNumber_ntk:versatil testing)) DisjunctionMaxQuery(((name:sundress)^10.0 | 
> partNumber_ntk:sundress)))~2
> This appears to parse the name clause incorrectly as: name:versatil name:test
> If I change the query fields to use the same fieldType, for example 
> shortDescription, which has the same fieldType as name:
> q=(Versatil%20testing)%20sundress=name=edismax=2<3 4<5 
> 6<-80%25=name^10%20shortDescription=true=xml=100
> ParsedQuery:
> +((DisjunctionMaxQuery(((name:versatil)^10.0 | shortDescription:versatil)) 
> DisjunctionMaxQuery(((name:test)^10.0 | shortDescription:test))) 
> DisjunctionMaxQuery(((name:sundress)^10.0 | shortDescription:sundress)))~2
> which hits correctly.
> Could someone check this or suggest a quick workaround? It currently has a 
> big impact on our customer.
> Thanks in advance! The following is backup information:
>  
>  
>  
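For readers unfamiliar with the mm value quoted above, here is a minimal Python sketch of how a conditional min-should-match spec such as 2<3 4<5 6<-80% (the %25 in the URL is just the encoded % sign) could be evaluated. It is written from the documented mm semantics, not copied from Solr, and the function name min_should_match is an assumption made for illustration.

```python
import re


def min_should_match(clause_count: int, spec: str) -> int:
    """Return how many of `clause_count` optional clauses must match.

    Illustrative re-implementation of the documented mm semantics;
    this is a sketch, not Solr's own code.
    """
    spec = spec.strip()
    result = clause_count  # default: every clause is required
    if "<" in spec:
        # Conditional form "N<value": `value` applies only when there
        # are more than N clauses; specs are checked left to right.
        spec = re.sub(r"\s*<\s*", "<", spec)
        for part in spec.split():
            bound, _, value = part.partition("<")
            if clause_count <= int(bound):
                return result
            result = min_should_match(clause_count, value)
        return result
    if spec.endswith("%"):
        # Percentage of the clause count, truncated toward zero.
        calc = int(clause_count * int(spec[:-1]) / 100)
    else:
        calc = int(spec)
    # A negative value means "this many clauses may be missing".
    result = clause_count + calc if calc < 0 else calc
    return max(result, 0)


if __name__ == "__main__":
    for n in (1, 2, 3, 5, 7, 10):
        print(n, min_should_match(n, "2<3 4<5 6<-80%"))
```

Under this reading, a query with two top-level clauses, such as (Versatil testing) sundress, requires both clauses to match, which is consistent with the trailing ~2 in the parsed queries above.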



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8767) DisjunctionMaxQuery do not work well when multiple search term+mm+query fields with different fieldType.

2019-04-17 Thread ZhongHua Wu (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819882#comment-16819882
 ] 

ZhongHua Wu commented on LUCENE-8767:
-

BTW, I also tested adding q.op=OR, like:

q=(Versatil%20test)%20sundress=name=edismax=2=name^10%20partNumber_ntk=true=xml=1=OR

so this is not the same issue as 
https://issues.apache.org/jira/browse/SOLR-3589

Either way, the result we want is name:Versatil | name:test
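
To illustrate why the reported parse misses the document while the per-term shape hits it, here is a toy Python matcher. It is only a sketch: it models neither Lucene scoring nor analysis, the helper names (term, dismax, should) are made up for this example, the partNumber_ntk value of the document is hypothetical, and the "wanted" query is assumed to have the same shape as the working shortDescription parse with partNumber_ntk substituted.

```python
def term(field, value):
    """Match when the document's field contains exactly this token."""
    return lambda doc: value in doc.get(field, ())


def dismax(*clauses):
    """Disjunction: match when any sub-clause matches (scores ignored)."""
    return lambda doc: any(c(doc) for c in clauses)


def should(*clauses, mm=1):
    """Optional clauses, of which at least `mm` must match."""
    return lambda doc: sum(c(doc) for c in clauses) >= mm


# "Versatil Empire Waist Sundress", lowercased and split on whitespace.
doc = {
    "name": {"versatil", "empire", "waist", "sundress"},
    "partNumber_ntk": {"sku-0001"},  # hypothetical part number
}

# Shape of the reported parse: the synonym group is one clause with its
# own ~2, so name must contain BOTH versatil and test.
reported = should(
    dismax(
        should(term("name", "versatil"), term("name", "test"), mm=2),
        term("partNumber_ntk", "versatil testing"),
    ),
    dismax(term("name", "sundress"), term("partNumber_ntk", "sundress")),
    mm=2,
)

# Assumed wanted shape: one disjunction per term, so either synonym
# satisfies the first top-level clause.
wanted = should(
    should(
        dismax(term("name", "versatil"), term("partNumber_ntk", "versatil")),
        dismax(term("name", "test"), term("partNumber_ntk", "test")),
    ),
    dismax(term("name", "sundress"), term("partNumber_ntk", "sundress")),
    mm=2,
)

if __name__ == "__main__":
    print("reported parse hits:", reported(doc))
    print("wanted parse hits:", wanted(doc))
```

In this model the reported parse satisfies only the sundress clause, one of the two required, while the per-term shape satisfies both.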



