[jira] [Comment Edited] (SOLR-12243) Edismax missing phrase queries when phrases contain multiterm synonyms

2019-05-09 Thread Fredrik Rodland (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16836249#comment-16836249
 ] 

Fredrik Rodland edited comment on SOLR-12243 at 5/9/19 6:07 PM:


I am aware that this issue is closed, but nonetheless:

I think this actually broke something regarding expansion of synonyms for large 
queries (possibly large {{OR}}-queries).

Having {{pf}} enabled on fields with a substantial number of synonyms caused 
the pf portion of the query to grow "exponentially", to the point where a 
single query took down an entire Solr server.

By adjusting the number of {{OR}} clauses we could directly increase the memory 
required to run the query.

example (id has synonyms enabled, companyname has not):

*{{A.}}*

{{q=( samfunnsviter (klima OR miljø) ) NOT ( psykolog%20 OR rus OR ortopedi OR 
odontologi )&debugQuery=true&pf=companyname}}

results in pf-part of edismax-query

{{(+DisjunctionMaxQuery((companyname:\"? samfunnsviter klima miljø ? ? psykolog 
rus ortopedi odontologi\"~5)~0.01))}}

*{{B.}}*

{{q=( samfunnsviter (klima OR miljø) ) NOT ( psykolog%20 OR rus OR ortopedi OR 
odontologi )&debugQuery=true&pf=id companyname}}

results in pf-part of edismax-query

{{(+DisjunctionMaxQuery(((id:\"samfunnsviter klima miljø psykolog rus ortopedi 
odontologi\"~5 id:\"samfunnsviter klima miljø psykologspesialist rus ortopedi 
odontologi\"~5 id:\"samfunnsvitar klima miljø psykolog rus ortopedi 
odontologi\"~5 id:\"samfunnsvitar klima miljø psykologspesialist rus ortopedi 
odontologi\"~5 id:\"social scientist klima miljø psykolog rus ortopedi 
odontologi\"~5 id:\"social scientist klima miljø psykologspesialist rus 
ortopedi odontologi\"~5 id:\"statsviter klima miljø psykolog rus ortopedi 
odontologi\"~5 id:\"statsviter klima miljø psykologspesialist rus ortopedi 
odontologi\"~5 id:\"samfunnsøkonom klima miljø psykolog rus ortopedi 
odontologi\"~5 id:\"samfunnsøkonom klima miljø psykologspesialist rus ortopedi 
odontologi\"~5) | companyname:\"? samfunnsviter klima miljø ? ? psykolog rus 
ortopedi odontologi\"~5)~0.01))}}

 

B. above is just a reasonably short example to make our point.  Our actual 
queries (and the resulting {{pf}} {{DisjunctionMaxQuery}}) are a *lot longer*.  
Increasing the number of OR terms or synonyms makes the id part of the 
query grow "exponentially".
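
The growth in B. can be reproduced outside Solr. The following is a minimal Java sketch, not Solr or Lucene code: the class name PhraseExpansionSketch and the method expand are hypothetical, and it assumes that every combination of per-position alternatives becomes its own phrase, which is what the debug output in B. suggests. With five alternatives for samfunnsviter and two for psykolog it yields the 5 x 2 = 10 id phrases seen in B. The count is the product of the alternatives at each position, which is why adding synonyms or terms inflates it so quickly.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a hypothetical helper, not part of Solr or Lucene. */
final class PhraseExpansionSketch {

    /**
     * Expands per-position alternatives into every possible phrase.
     * Each inner list holds the alternatives (term plus synonyms) for one position.
     */
    static List<String> expand(List<List<String>> positions) {
        List<String> phrases = new ArrayList<>();
        phrases.add(""); // start with a single empty phrase
        for (List<String> alternatives : positions) {
            List<String> next = new ArrayList<>();
            for (String prefix : phrases) {
                for (String alternative : alternatives) {
                    // every existing phrase is extended by every alternative
                    next.add(prefix.isEmpty() ? alternative : prefix + " " + alternative);
                }
            }
            phrases = next;
        }
        return phrases;
    }
}
```

For example, expand(List.of(List.of("a", "b"), List.of("c"))) returns the two phrases "a c" and "b c".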


> Edismax missing phrase queries when phrases contain multiterm synonyms
> --
>
> Key: SOLR-12243
> URL: https://issues.apache.org/jira/browse/SOLR-12243
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 7.1
> Environment: RHEL, MacOS X
> Do not believe this is environment-specific.
>

[jira] [Comment Edited] (SOLR-12243) Edismax missing phrase queries when phrases contain multiterm synonyms

2019-05-09 Thread Fredrik Rodland (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16836454#comment-16836454
 ] 

Fredrik Rodland edited comment on SOLR-12243 at 5/9/19 3:05 PM:


Thanks for taking the time to explain and link other issues [~mgibney].  Good 
we're not alone here.  For the time being we've limited pf to only allow 
non-synonym fields as pf is really not that crucial for our site.


> Edismax missing phrase queries when phrases contain multiterm synonyms
> --
>
> Key: SOLR-12243
> URL: https://issues.apache.org/jira/browse/SOLR-12243
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 7.1
> Environment: RHEL, MacOS X
> Do not believe this is environment-specific.
>Reporter: Elizabeth Haubert
>Assignee: Steve Rowe
>Priority: Major
> Fix For: 7.6, 8.0
>
> Attachments: SOLR-12243.patch, SOLR-12243.patch, SOLR-12243.patch, 
> SOLR-12243.patch, SOLR-12243.patch, SOLR-12243.patch, SOLR-12243.patch, 
> multiword-synonyms.txt, schema.xml, solrconfig.xml
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> synonyms.txt:
> {code}
> allergic, hypersensitive
> aspirin, acetylsalicylic acid
> dog, canine, canis familiris, k 9
> rat, rattus
> {code}
> request handler:
> {code:xml}
> 
>  
> 
>  edismax
>   0.4
>  title^100
>  title~20^5000
>  title~11
>  title~22^1000
>  text
>  
>  3<-1 6<-3 9<30%
>  *:*
>  25
> 
> 
> {code}
> Phrase queries (pf, pf2, pf3) containing "dog" or "aspirin"  against the 
> above list will not be generated.
> "allergic reaction dog" will generate pf2: "allergic reaction", but not 
> pf:"allergic reaction dog", pf2: "reaction dog", or pf3: "allergic reaction 
> dog"
> "aspirin dose in rats" will generate pf3: "dose ? rats" but not pf2: "aspirin 
> dose" or pf3:"aspirin dose ?"
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-12243) Edismax missing phrase queries when phrases contain multiterm synonyms

2019-05-09 Thread Fredrik Rodland (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16836249#comment-16836249
 ] 

Fredrik Rodland edited comment on SOLR-12243 at 5/9/19 9:58 AM:


I am aware that this issue is closed, but nonetheless:

I think this actually broke something regarding expansion of synonyms for large 
queries (possibly large OR-queries).

Having {{pf}} enabled on fields with a substantial number of synonyms caused 
the pf portion of the query to grow "exponentially", to the point where a 
single query took down an entire Solr server.

By adjusting the number of OR clauses we could directly increase the memory 
required to run the query.

example (id has synonyms enabled, companyname has not):

q=( samfunnsviter (klima OR miljø) ) NOT ( psykolog%20 OR rus OR ortopedi OR 
odontologi )&debugQuery=true&pf=companyname

results in pf-part of edismax-query

(+DisjunctionMaxQuery((companyname:\"? samfunnsviter klima miljø ? ? psykolog 
rus ortopedi odontologi\"~5)~0.01)) 

q=( samfunnsviter (klima OR miljø) ) NOT ( psykolog%20 OR rus OR ortopedi OR 
odontologi )&debugQuery=true&pf=id companyname

results in pf-part of edismax-query

(+DisjunctionMaxQuery(((id:\"samfunnsviter klima miljø psykolog rus ortopedi 
odontologi\"~5 id:\"samfunnsviter klima miljø psykologspesialist rus ortopedi 
odontologi\"~5 id:\"samfunnsvitar klima miljø psykolog rus ortopedi 
odontologi\"~5 id:\"samfunnsvitar klima miljø psykologspesialist rus ortopedi 
odontologi\"~5 id:\"social scientist klima miljø psykolog rus ortopedi 
odontologi\"~5 id:\"social scientist klima miljø psykologspesialist rus 
ortopedi odontologi\"~5 id:\"statsviter klima miljø psykolog rus ortopedi 
odontologi\"~5 id:\"statsviter klima miljø psykologspesialist rus ortopedi 
odontologi\"~5 id:\"samfunnsøkonom klima miljø psykolog rus ortopedi 
odontologi\"~5 id:\"samfunnsøkonom klima miljø psykologspesialist rus ortopedi 
odontologi\"~5) | companyname:\"? samfunnsviter klima miljø ? ? psykolog rus 
ortopedi odontologi\"~5)~0.01))

 

Increasing the number of OR terms or synonyms makes the id part of the 
query grow "exponentially".



[jira] [Comment Edited] (SOLR-12243) Edismax missing phrase queries when phrases contain multiterm synonyms

2019-05-09 Thread Fredrik Rodland (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16836249#comment-16836249
 ] 

Fredrik Rodland edited comment on SOLR-12243 at 5/9/19 9:56 AM:


I am aware that this issue is closed, but nonetheless:

I think this actually broke something regarding expansion of synonyms for large 
queries (possibly large OR-queries).

Having {{pf}} enabled on fields with a substantial number of synonyms caused 
the pf portion of the query to grow "exponentially", to the point where a 
single query took down an entire Solr server.

By adjusting the number of OR clauses we could directly increase the memory 
required to run the query.

example (id has synonyms enabled, companyname has not):
{code:java}
q= ( samfunnsviter (klima OR miljø) ) NOT ( psykolog%20 OR rus OR ortopedi OR 
odontologi )&debugQuery=true&pf=companyname
{code}
results in pf-part of edismax-query

{code}(+DisjunctionMaxQuery((companyname:\"? samfunnsviter klima miljø ? ? 
psykolog rus ortopedi odontologi\"~5)~0.01)){code}

 
{code:java}
q= ( samfunnsviter (klima OR miljø) ) NOT ( psykolog%20 OR rus OR ortopedi OR 
odontologi )&debugQuery=true&pf=id companyname
{code}
results in pf-part of edismax-query

{code}(+DisjunctionMaxQuery(((id:\"samfunnsviter klima miljø psykolog rus 
ortopedi odontologi\"~5 id:\"samfunnsviter klima miljø psykologspesialist rus 
ortopedi odontologi\"~5 id:\"samfunnsvitar klima miljø psykolog rus ortopedi 
odontologi\"~5 id:\"samfunnsvitar klima miljø psykologspesialist rus ortopedi 
odontologi\"~5 id:\"social scientist klima miljø psykolog rus ortopedi 
odontologi\"~5 id:\"social scientist klima miljø psykologspesialist rus 
ortopedi odontologi\"~5 id:\"statsviter klima miljø psykolog rus ortopedi 
odontologi\"~5 id:\"statsviter klima miljø psykologspesialist rus ortopedi 
odontologi\"~5 id:\"samfunnsøkonom klima miljø psykolog rus ortopedi 
odontologi\"~5 id:\"samfunnsøkonom klima miljø psykologspesialist rus ortopedi 
odontologi\"~5) | companyname:\"? samfunnsviter klima miljø ? ? psykolog rus 
ortopedi odontologi\"~5)~0.01)){code}

 

Increasing the number of OR terms or synonyms makes the id part of the 
query grow "exponentially".



[jira] [Comment Edited] (SOLR-12243) Edismax missing phrase queries when phrases contain multiterm synonyms

2018-10-29 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16667457#comment-16667457
 ] 

Uwe Schindler edited comment on SOLR-12243 at 10/29/18 5:19 PM:


Thanks Elizabeth and Steve,
I think the problem Sarowe mentioned was actually the cause of the failing 
test (I expected something like this). This is also the reason why I don't like 
the current architecture. EDismax relies on the (internal) structure of the 
queries that querybuilder produces! IMHO, we should maybe add a "Lucene" 
version of the dismax parser for easier testing. I also figured out that the 
phrase expansions in particular are useful for Lucene users, too. I made custom 
query parsers for several people, and for all of those you had to reinvent the 
phrase expansion stuff.

Elizabeth: I think the permutation problem is not new with the recent Lucene 
fixes. This problem should also have happened with Span expansions, right? 
Maybe we should add an option to limit the number of phrase expansions (as a 
safety feature). If that limit is reached, the phrase expansion should be 
stopped (maybe then only bigrams and no trigrams).
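
A limit like the one suggested here could be checked before any expansion is built. This is only a sketch under assumptions: ExpansionLimitSketch, withinLimit and maxExpansions are hypothetical names, not existing Solr or Lucene options, and the number of expansions is assumed to be the product of the alternatives at each position.

```java
import java.util.List;

/** Illustrative only: a hypothetical safety check, not an existing Solr or Lucene feature. */
final class ExpansionLimitSketch {

    /**
     * Returns true if the number of phrase expansions (the product of the
     * alternatives at each position) stays within maxExpansions.
     * The division-based comparison avoids overflowing the running product.
     */
    static boolean withinLimit(List<Integer> alternativesPerPosition, long maxExpansions) {
        long total = 1;
        for (int count : alternativesPerPosition) {
            int n = Math.max(1, count); // a position always contributes at least one term
            if (total > maxExpansions / n) {
                return false; // total * n would exceed the limit
            }
            total *= n;
        }
        return true;
    }
}
```

A caller could fall back to a cheaper form (for example, bigrams only) whenever withinLimit returns false.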


> Edismax missing phrase queries when phrases contain multiterm synonyms
> --
>
> Key: SOLR-12243
> URL: https://issues.apache.org/jira/browse/SOLR-12243
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 7.1
> Environment: RHEL, MacOS X
> Do not believe this is environment-specific.
>Reporter: Elizabeth Haubert
>Assignee: Uwe Schindler
>Priority: Major
> Attachments: SOLR-12243.patch, SOLR-12243.patch, SOLR-12243.patch, 
> SOLR-12243.patch, SOLR-12243.patch, SOLR-12243.patch, multiword-synonyms.txt, 
> schema.xml, solrconfig.xml
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> synonyms.txt:
> {code}
> allergic, hypersensitive
> aspirin, acetylsalicylic acid
> dog, canine, canis familiris, k 9
> rat, rattus
> {code}
> request handler:
> {code:xml}
> 
>  
> 
>  edismax
>   0.4
>  title^100
>  title~20^5000
>  title~11
>  title~22^1000
>  text
>  
>  3<-1 6<-3 9<30%
>  *:*
>  25
> 
> 
> {code}
> Phrase queries (pf, pf2, pf3) containing "dog" or "aspirin"  against the 
> above list will not be generated.
> "allergic reaction dog" will generate pf2: "allergic reaction", but not 
> pf:"allergic reaction dog", pf2: "reaction dog", or pf3: "allergic reaction 
> dog"
> "aspirin dose in rats" will generate pf3: "dose ? rats" but not pf2: "aspirin 
> dose" or pf3:"aspirin dose ?"
>  






[jira] [Comment Edited] (SOLR-12243) Edismax missing phrase queries when phrases contain multiterm synonyms

2018-10-19 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657074#comment-16657074
 ] 

Uwe Schindler edited comment on SOLR-12243 at 10/19/18 4:51 PM:


That's what I mean, it's still linked together. The main bug is still in 
Lucene, because the Lucene query builder creates a query that does not 
correctly implement span queries on multi-term synonyms, because it uses the 
wrong query type. The issues here come from the fact that dismax relies 
on the internal implementation of the Lucene code, which is not a good thing. 
The Solr code should not do this; instead we should add something to 
Lucene that can create those pf auto-phrase queries. I was missing that in a 
query parser of my own, too. So basically it would be good to have an additional 
query builder method in Lucene that analyzes some text and then builds 
configurable shingles that are connected with span/phrase queries using a slop. 
This code should not depend on the structure of a span/boolean query that was 
parsed before.

I'd like to wait a few days until the Lucene issue is solved and then review 
the changes here and adapt them as necessary. In the longer term, I'd like to 
get rid of the query instanceof spaghetti code and move the query construction 
for dismax-like queries using term shingles (bigrams, trigrams) to a separate 
builder class, so it's more reusable.


> Edismax missing phrase queries when phrases contain multiterm synonyms
> --
>
> Key: SOLR-12243
> URL: https://issues.apache.org/jira/browse/SOLR-12243
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 7.1
> Environment: RHEL, MacOS X
> Do not believe this is environment-specific.
>Reporter: Elizabeth Haubert
>Assignee: Uwe Schindler
>Priority: Major
> Attachments: SOLR-12243.patch, SOLR-12243.patch, SOLR-12243.patch, 
> SOLR-12243.patch, SOLR-12243.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> synonyms.txt:
> {code}
> allergic, hypersensitive
> aspirin, acetylsalicylic acid
> dog, canine, canis familiris, k 9
> rat, rattus
> {code}
> request handler:
> {code:xml}
> 
>  
> 
>  edismax
>   0.4
>  title^100
>  title~20^5000
>  title~11
>  title~22^1000
>  text
>  
>  3<-1 6<-3 9<30%
>  *:*
>  25
> 
> 
> {code}
> Phrase queries (pf, pf2, pf3) containing "dog" or "aspirin"  against the 
> above list will not be generated.
> "allergic reaction dog" will generate pf2: "allergic reaction", but not 
> pf:"allergic reaction dog", pf2: "reaction dog", or pf3: "allergic reaction 
> dog"
> "aspirin dose in rats" will generate pf3: "dose ? rats" but not pf2: "aspirin 
> dose" or pf3:"aspirin dose ?"
>  






[jira] [Comment Edited] (SOLR-12243) Edismax missing phrase queries when phrases contain multiterm synonyms

2018-04-27 Thread Elizabeth Haubert (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16456929#comment-16456929
 ] 

Elizabeth Haubert edited comment on SOLR-12243 at 4/27/18 7:26 PM:
---

The fix I pushed up really only handles the case where you're starting with the 
single-word synonym well for pf2.  So matching "foo bar" queries to "foo 
tropical cyclone" documents.  This was a real problem for my use case, because 
the pf clauses weren't being generated at all.

The other direction, to match "foo tropical cyclone" queries to "foo bar" 
documents is harder.   I've gone a little ways into the pf2 "b tropical" 
problem, but it is a deeper problem than the spans getting thrown out because 
they were the wrong type of query. Start small.

Here's what I've got for the other direction:

One of the first things edismax does is generate a list of different kinds of 
clauses from the user query, and that seems to be unaffected by the sow flag. So 
"foo tropical cyclone" has three bareword clauses: "foo", "tropical", and 
"cyclone". But 'foo "tropical cyclone"' (with quotes) has two: a bareword foo 
and a phrase "tropical cyclone".   When it goes to construct pf2 and pf3, 
edismax re-assembles the bareword clauses, then makes the 2- and 3-word 
shingles. So "foo tropical cyclone" would get pf2="foo tropical" and "tropical 
cyclone". pf2="foo tropical" can't get expanded, because it is missing cyclone, 
and will go through as it is; "tropical cyclone" will get expanded, but 
then removed as not a phrase, not just because it is a Span.  That seems 
consistent if we think of "tropical cyclone" as a single entity.  So to do 
anything, we need to address how the shingle queries are being constructed.
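
The shingling step described in the paragraph above can be sketched in a few lines of Java. This is an illustration, not the edismax implementation: ShingleSketch and shingles are hypothetical names, and the input is assumed to be the already re-assembled list of bareword clauses.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: hypothetical names, not the actual edismax code. */
final class ShingleSketch {

    /** Builds all contiguous word n-grams ("shingles") of the given size. */
    static List<String> shingles(List<String> barewords, int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be at least 1");
        }
        List<String> out = new ArrayList<>();
        // slide a window of `size` words over the bareword clauses
        for (int i = 0; i + size <= barewords.size(); i++) {
            out.add(String.join(" ", barewords.subList(i, i + size)));
        }
        return out;
    }
}
```

With the barewords foo, tropical and cyclone, a size of 2 gives "foo tropical" and "tropical cyclone", and a size of 3 gives the single shingle "foo tropical cyclone". Neither step knows that "tropical cyclone" is one synonym entry, which is the gap described above.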

 

I opened Jira-12260 to start looping in the phrases to pf clauses, not just the 
barewords, because that has some other weird semantics.  So 'foo "tropical 
cyclone" baz' (with quotes) would generate pf="foo baz", which is unintuitive; 
it would make more sense if it became 'foo "tropical cyclone"' and '"tropical 
cyclone" baz'. Beyond looking a little into whether the graph queries could 
handle the phrase, I haven't really dug into how to do that yet.

That matters here, because if that works and the semantics are acceptable, 
multi-word synonyms are already handled as quoted in the logic that creates the 
graph queries.   So it would (probably) be safe to take that another step to 
stuff the multiword synonyms into a single phrase clause, rather than 
individual bareword clauses.  Maybe.

[jira] [Comment Edited] (SOLR-12243) Edismax missing phrase queries when phrases contain multiterm synonyms

2018-04-27 Thread Alessandro Benedetti (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16456834#comment-16456834
 ] 

Alessandro Benedetti edited comment on SOLR-12243 at 4/27/18 5:58 PM:
--

Hi [~ehaubert], thanks for the reply.

I think the current patch could be completed by adding a test that verifies the 
actual query parsing (building).
 The bug ultimately affects query parsing (building), so testing the 
results per query can be effective, but it does not test the bugfix itself.

I will just post a brutal copy and paste here. If the Jira is still open I will 
push a PR with the fix in the next few days.

Adding something like this should work :

public void testEdismaxQueryParsing_multiTermWithPf_shouldParseCorrectPhraseQueries() throws Exception {
  // "bar" is the single-word side of the synonym
  Query q = QParser.getParser("foo a b bar", "edismax", true,
      req(params("sow", "false", "qf", "text^10", "pf", "text^10", "pf2", "text^5", "pf3", "text^8"))).getQuery();
  assertEquals("+("
      + "((text:foo)^10.0) ((text:a)^10.0) ((text:b)^10.0) (((+text:tropical +text:cyclone) text:bar)^10.0)) "
      + "((spanNear([text:foo, text:a, text:b, spanOr([spanNear([text:tropical, text:cyclone], 0, true), text:bar])], 0, true))^10.0) "
      + "(((text:\"foo a\")^5.0) ((text:\"a b\")^5.0) ((spanNear([text:b, spanOr([spanNear([text:tropical, text:cyclone], 0, true), text:bar])], 0, true))^5.0)) "
      + "(((text:\"foo a b\")^8.0) ((spanNear([text:a, text:b, spanOr([spanNear([text:tropical, text:cyclone], 0, true), text:bar])], 0, true))^8.0))",
      q.toString());

  // "tropical cyclone" is the multi-word side of the synonym
  q = QParser.getParser("foo a b tropical cyclone", "edismax", true,
      req(params("qf", "text^10", "pf", "text^10", "pf2", "text^5", "pf3", "text^8"))).getQuery();
  assertEquals("+("
      + "((text:foo)^10.0) ((text:a)^10.0) ((text:b)^10.0) ((text:bar (+text:tropical +text:cyclone))^10.0)) "
      + "((spanNear([text:foo, text:a, text:b, spanOr([text:bar, spanNear([text:tropical, text:cyclone], 0, true)])], 0, true))^10.0) "
      + "(((text:\"foo a\")^5.0) ((text:\"a b\")^5.0) ((text:\"b tropical\")^5.0)) "
      // highlighted in the original comment: the pf2 span clause that is expected but not generated
      + "(spanOr([text:bar, spanNear([text:tropical, text:cyclone], 0, true)]))^5.0))"
      + "(((text:\"foo a b\")^8.0) ((text:\"a b tropical\")^8.0) ((spanNear([text:b, spanOr([text:bar, spanNear([text:tropical, text:cyclone], 0, true)])], 0, true))^8.0))",
      q.toString());
}

*N.B.* The second assertion fails for pf2: for the query "foo a b 
tropical cyclone", pf2 generates only
 ((text:\"foo a\")^5.0) ((text:\"a b\")^5.0) ((text:\"b tropical\")^5.0)),
which I believe is incorrect, because an additional span query should also be 
generated:
 (spanOr([text:bar, spanNear([text:tropical, text:cyclone], 0, true)]))^5.0)).
I will investigate further over the next few days; I just wanted to bring it 
to the community's attention here :)
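
To make the missing clause easier to see, here is a minimal sketch that lists the plain word n-grams of the query. It is not how edismax builds pf2/pf3 internally (the class and method names are made up for illustration, and synonym expansion is ignored); it only shows which adjacent pairs one would expect a pf2 clause for.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Solr: enumerates word n-grams of a query.
class PfShingleSketch {
    // Returns every run of n consecutive terms, joined by single spaces.
    static List<String> shingles(List<String> terms, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= terms.size(); i++) {
            out.add(String.join(" ", terms.subList(i, i + n)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("foo a b tropical cyclone".split(" "));
        // Pairs a pf2 clause would be expected for.
        System.out.println("pf2 candidates: " + shingles(terms, 2));
        // Triples a pf3 clause would be expected for.
        System.out.println("pf3 candidates: " + shingles(terms, 3));
    }
}
```

The last pair, "tropical cyclone", is the one that has no counterpart in the pf2 output quoted above; it is also the pair that matches the multi-term synonym of "bar".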


[jira] [Comment Edited] (SOLR-12243) Edismax missing phrase queries when phrases contain multiterm synonyms

2018-04-27 Thread Elizabeth Haubert (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16456720#comment-16456720
 ] 

Elizabeth Haubert edited comment on SOLR-12243 at 4/27/18 5:16 PM:
---

My understanding from the 
[HowToContribute|https://wiki.apache.org/solr/HowToContribute] is that this is 
supposed to happen automagically if the patch is named correctly, but I didn't 
knowingly do anything to cause it to happen. The code is pretty straightforward, 
but I'm having a bit of a learning curve on the non-code things that need to 
happen.


> Edismax missing phrase queries when phrases contain multiterm synonyms
> --
>
> Key: SOLR-12243
> URL: https://issues.apache.org/jira/browse/SOLR-12243
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Components: query parsers
> Affects Versions: 7.1
> Environment: RHEL, MacOS X
> Do not believe this is environment-specific.
> Reporter: Elizabeth Haubert
> Priority: Major
> Attachments: SOLR-12243.patch
>
>
> synonyms.txt:
> allergic, hypersensitive
> aspirin, acetylsalicylic acid
> dog, canine, canis familiris, k 9
> rat, rattus
> request handler:
> 
>  
> 
>  edismax
>   0.4
>  title^100
>  title~20^5000
>  title~11
>  title~22^1000
>  text
>  
>  3<-1 6<-3 9<30%
>  *:*
>  25
> 
>  
> Phrase queries (pf, pf2, pf3) containing "dog" or "aspirin"  against the 
> above list will not be generated.
> "allergic reaction dog" will generate pf2: "allergic reaction", but not 
> pf:"allergic reaction dog", pf2: "reaction dog", or pf3: "allergic reaction 
> dog"
> "aspirin dose in rats" will generate pf3: "dose ? rats" but not pf2: "aspirin 
> dose" or pf3:"aspirin dose ?"
>  
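
The two terms named in the report, "dog" and "aspirin", are exactly the ones whose synonym rules contain a multi-word entry. The sketch below is only an illustration of that observation: it is plain Java with hypothetical class and method names, it does not use Solr's synonym parser, and it assumes the simple comma-separated rule form shown in the synonyms.txt excerpt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Solr: flags synonym rules with multi-word entries.
class MultiTermSynonymSketch {
    // True when any comma-separated entry contains more than one whitespace-separated token.
    static boolean hasMultiTermEntry(String rule) {
        for (String entry : rule.split(",")) {
            if (entry.trim().split("\\s+").length > 1) {
                return true;
            }
        }
        return false;
    }

    // Returns the first entry of every rule that has at least one multi-word synonym.
    static List<String> affectedHeadTerms(List<String> rules) {
        List<String> out = new ArrayList<>();
        for (String rule : rules) {
            if (hasMultiTermEntry(rule)) {
                out.add(rule.split(",")[0].trim());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Rules copied verbatim from the issue description.
        List<String> rules = Arrays.asList(
            "allergic, hypersensitive",
            "aspirin, acetylsalicylic acid",
            "dog, canine, canis familiris, k 9",
            "rat, rattus");
        System.out.println(affectedHeadTerms(rules));
    }
}
```

A check like this can help pick test queries for the bug: queries touching only single-word rules ("allergic", "rat") should be unaffected, while those touching the flagged rules are the ones to inspect with debugQuery=true.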



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-12243) Edismax missing phrase queries when phrases contain multiterm synonyms

2018-04-25 Thread Elizabeth Haubert (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452972#comment-16452972
 ] 

Elizabeth Haubert edited comment on SOLR-12243 at 4/25/18 7:54 PM:
---

I'd really like to make a bugfix release of the 7_1 branch with this, although 
the problem is still present on 7.x as well.  Thoughts?

The actual change is quite small.
