[jira] [Updated] (SOLR-11662) Make overlapping query term scoring configurable per field type

2017-11-21 Thread Doug Turnbull (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Turnbull updated SOLR-11662:
-
Description: 
This patch makes the query-time behavior configurable when query terms overlap 
positions. Right now the only option is SynonymQuery. This is a fantastic 
default and an improvement on past versions. However, there are use cases where 
terms overlap positions but don't carry exact synonymy relationships. Synonym 
filters (or other analyzers) are often used to model hypernym/hyponym 
relationships instead. In those cases the individual term scores matter, with 
more specific terms (hyponyms) scoring higher than less specific terms 
(hypernyms).

This patch adds the fieldType setting scoreOverlaps, as in:


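{code:java}
<fieldType ... scoreOverlaps="..." class="solr.TextField" positionIncrementGap="100" multiValued="true">
{code}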


Valid values for scoreOverlaps are:

*as_one_term*
The default, suited to most synonym use cases. Uses SynonymQuery.
Treats all terms as if they're exactly equivalent, with the document frequency 
of the underlying terms blended.

*pick_best*
For a given document, score using the best-scoring synonym (i.e. dismax over 
the generated terms).
Useful when synonyms are not exactly equivalent and are instead used to model 
hypernym/hyponym relationships, such as expanding a term to its hypernyms, 
where each term's score reflects its specificity.
For example, given this query-time expansion:

tabby => tabby, cat, animal

Searching the "text" field for tabby generates the dismax query 
(text:tabby | text:cat | text:animal)

*as_distinct_terms*
(The pre-6.0 behavior.)
A compromise between pick_best and as_one_term.
Appropriate when synonyms reflect a hypernym/hyponym relationship, but lets 
scores stack, so the more occurrences of tabby, cat, or animal a document has, 
the better, with a bias towards the term with the highest specificity.
Terms are turned into a boolean OR query, with document frequencies not blended.
For example, given this query-time expansion:

tabby => tabby, cat, animal

Searching the "text" field for tabby generates the boolean query 
(text:tabby text:cat text:animal)
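
To make the difference between the three values concrete, here is a minimal 
Python sketch of how each one might combine per-term scores for a single 
document. It is not the patch's code or Lucene's scoring: the corpus 
statistics, the simplified TF*IDF formula, and the function names (idf, 
term_score, score_overlaps) are all assumptions made for illustration. In 
particular, the blended document frequency for as_one_term is modelled as the 
largest document frequency among the terms, which is an assumption of this 
sketch.

```python
import math

# Hypothetical corpus statistics, for illustration only.
NUM_DOCS = 1000
DOC_FREQ = {"tabby": 10, "cat": 100, "animal": 500}


def idf(doc_freq, num_docs=NUM_DOCS):
    """Simplified inverse document frequency: rarer terms weigh more."""
    return math.log(1 + num_docs / doc_freq)


def term_score(term, term_freqs):
    """Simplified TF*IDF score of one term in one document."""
    return math.sqrt(term_freqs.get(term, 0)) * idf(DOC_FREQ[term])


def score_overlaps(mode, terms, term_freqs):
    """Score one document against overlapping terms under a given mode."""
    if mode == "as_one_term":
        # All terms count as one: term frequencies are summed and a single
        # blended document frequency is used (assumed here: the largest one).
        total_tf = sum(term_freqs.get(t, 0) for t in terms)
        blended_df = max(DOC_FREQ[t] for t in terms)
        return math.sqrt(total_tf) * idf(blended_df)
    if mode == "pick_best":
        # Dismax: only the best-scoring term contributes.
        return max(term_score(t, term_freqs) for t in terms)
    if mode == "as_distinct_terms":
        # Boolean OR: every matching term adds its own score.
        return sum(term_score(t, term_freqs) for t in terms)
    raise ValueError(f"unknown scoreOverlaps mode: {mode}")


# tabby => tabby, cat, animal
TERMS = ["tabby", "cat", "animal"]
tabby_doc = {"tabby": 1}
animal_doc = {"animal": 1}
all_three_doc = {"tabby": 1, "cat": 1, "animal": 1}

for mode in ("as_one_term", "pick_best", "as_distinct_terms"):
    docs = [("tabby", tabby_doc), ("animal", animal_doc), ("all", all_three_doc)]
    scores = {name: round(score_overlaps(mode, TERMS, doc), 3) for name, doc in docs}
    print(mode, scores)
```

Under these assumptions, as_one_term scores a document mentioning only tabby 
the same as one mentioning only animal, pick_best favors the tabby document, 
and as_distinct_terms additionally rewards a document that mentions all three 
terms.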



> Make overlapping query term scoring configurable per field type
> ---
>
> Key: SOLR-11662
> URL: https://issues.apache.org/jira/browse/SOLR-11662
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Doug Turnbull
> Fix For: 7.2, master (8.0)

[jira] [Updated] (SOLR-11662) Make overlapping query term scoring configurable per field type

2017-11-21 Thread Doug Turnbull (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Turnbull updated SOLR-11662:
-
Summary: Make overlapping query term scoring configurable per field type  
(was: More than SynonymQuery: Let overlapping query terms model 
hypernym/hyponym relationships)


--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org


