[jira] [Updated] (SOLR-8584) Phrase Fields ranking bug for range search when using edismax queryparser

2016-01-22 Thread Thomas Egense (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Egense updated SOLR-8584:

Description: 
When phrase fields (pf) are defined for the edismax parser and the query 
contains a range clause, edismax expands the query to the phrase fields but 
adds a wrong term to the phrase. That term is the upper limit of the range query.

In the Solr tutorial example, add the following to the /browse edismax handler:
<str name="pf">name</str>

Try this query with query explain enabled for the /browse request handler:
{code}
apple AND weight:[* TO 60]
{code}
Debug query text
. DisjunctionMaxQuery((name:"apple 60").

As you can see, the term 60 (from the range query) is now added to the phrase, 
and this will boost the score of any document that matches "apple 60".
To demonstrate this, add the following document to the index:
(ipod_video.xml with minor changes)

{code}

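<add><doc>
  <field name="id">MA147LL/B</field>
  <field name="name">Apple 120 GB iPod with Video Playback Black</field>
  <field name="manu">Apple Computer Inc.</field>
  <field name="manu_id_s">apple</field>
  <field name="cat">electronics</field>
  <field name="cat">music</field>
  <field name="features">iTunes, Podcasts, Audiobooks</field>
  <field name="features">Stores up to 15,000 songs, 25,000 photos, or 150 hours of video</field>
  <field name="features">2.5-inch, 320x240 color TFT LCD display with LED backlight</field>
  <field name="features">Up to 20 hours of battery life</field>
  <field name="features">Plays AAC, MP3, WAV, AIFF, Audible, Apple Lossless, H.264 video</field>
  <field name="features">Notes, Calendar, Phone book, Hold button, Date display, Photo wallet, Built-in games, JPEG photo playback, Upgradeable firmware, USB 2.0 compatibility, Playback speed control, Rechargeable capability, Battery level indication</field>
  <field name="includes">earbud headphones, USB cable</field>
  <field name="weight">6.5</field>
  <field name="price">599.00</field>
  <field name="popularity">10</field>
  <field name="inStock">true</field>
  <field name="store">37.7752,-100.0232</field>
  <field name="manufacturedate_dt">2005-10-12T08:00:00Z</field>
</doc></add>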

{code}

When you repeat the query: 
{code}
apple AND weight:[* TO 60]
{code}
It will find two documents as expected, but their scores should be
identical. Instead they are 0.65656495 and 0.3007804.

The scores differ because the phrase "apple 60" matches one of the documents 
(the one that comes with the tutorial).

The phrase field expansion can go even further wrong than this and pick up the 
start limit, the end limit and the "TO" keyword of the range query.

/Thomas Egense









  was:
When you have PhraseFields defined in your edismax parser and make a range 
query, the edismax will expand the query to the relevant fields, but with a 
wrong term added to the query. That term is the max-limit from the range query.

In the example tutorial Solr add the following to the /browse edismax handler:
name 

Try this  query  with query explain for the /browse requesthandler:
{code}
apple AND weight:[* TO 60]
{code}
Debug query text
. DisjunctionMaxQuery((name:"apple 60").

As you can see the term 60 (from the range query) is now added to the search 
term and this will boost the score if indeed "apple 60" matches anything.
To demonstrate this, add the following document to the index:
(ipod_video.xml with minor changes)

{code}

  MA147LL/B
  Apple 120 GB iPod with Video Playback Black
  Apple Computer Inc.
  
  apple
  electronics
  music
  iTunes, Podcasts, Audiobooks
  Stores up to 15,000 songs, 25,000 photos, or 150 hours 
of video
  2.5-inch, 320x240 color TFT LCD display with LED 
backlight
  Up to 20 hours of battery life
  Plays AAC, MP3, WAV, AIFF, Audible, Apple Lossless, 
H.264 video
  Notes, Calendar, Phone book, Hold button, Date 
display, Photo wallet, Built-in games, JPEG photo playback, Upgradeable 
firmware, USB 2.0 compatibility, Playback speed control, Rechargeable 
capability, Battery level indication
  earbud headphones, USB cable
  6.5
  599.00
  10
  true
  
  37.7752,-100.0232
  2005-10-12T08:00:00Z

{code}

When you repeat the query: 
{code}
apple AND weight:[* TO 60]
{code}
It will find two documents as expected, but the ranking should be
identical! Instead they are   0.65656495 and 0.3007804 

The reason for this bug is that phrase "apple 60" matches one of the documents 
(the one that comes with the tutorial).

The phrase field expansion can go much worse than this and use both the 
start-limit, end-limit and "TO" used in the range query part. 

/Thomas Egense










> Phrase Fields ranking bug for range search when using edismax queryparser
> -
>
> Key: SOLR-8584
> URL: https://issues.apache.org/jira/browse/SOLR-8584
> Project: Solr
>  Issue Type: Bug
>  Components: query parsers
>Affects Versions: 4.7, 4.8.1, 4.9.1, 4.10.4, 5.0, 5.1, 5.2, 5.3.1, 5.4
>Reporter: Thomas Egense
>Priority: Minor
>  Labels: edismax, phrasequery
>
> When you have PhraseFields defined in your edismax parser and make a range 
> query, the edismax will expand the query to the relevant fields, but with a 
> wrong term added to the query. That term is the max-limit from the range 
> query.
> In the example tutorial Solr add the following to the /browse edismax handler:
> name 
> Try this  query 

[jira] [Updated] (SOLR-8584) Phrase Fields ranking bug for range search when using edismax queryparser

2016-01-22 Thread Thomas Egense (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Egense updated SOLR-8584:

Description: 
When you have PhraseFields defined in your edismax parser and make a range 
query, the edismax will expand the query to the relevant fields, but with a 
wrong term added to the query. That term is the max-limit from the range query.

In the example tutorial Solr add the following to the /browse edismax handler:
name 

Try this  query  with query explain for the /browse requesthandler:
{code}
apple AND weight:[* TO 60]
{code}
Debug query text
. DisjunctionMaxQuery((name:"apple 60").

As you can see the term 60 (from the range query) is now added to the search 
term and this will boost the score if indeed "apple 60" matches anything.
To demonstrate this, add the following document to the index:
(ipod_video.xml with minor changes)

{code}

  MA147LL/B
  Apple 120 GB iPod with Video Playback Black
  Apple Computer Inc.
  
  apple
  electronics
  music
  iTunes, Podcasts, Audiobooks
  Stores up to 15,000 songs, 25,000 photos, or 150 hours 
of video
  2.5-inch, 320x240 color TFT LCD display with LED 
backlight
  Up to 20 hours of battery life
  Plays AAC, MP3, WAV, AIFF, Audible, Apple Lossless, 
H.264 video
  Notes, Calendar, Phone book, Hold button, Date 
display, Photo wallet, Built-in games, JPEG photo playback, Upgradeable 
firmware, USB 2.0 compatibility, Playback speed control, Rechargeable 
capability, Battery level indication
  earbud headphones, USB cable
  6.5
  599.00
  10
  true
  
  37.7752,-100.0232
  2005-10-12T08:00:00Z

{code}

When you repeat the query: 
{code}
apple AND weight:[* TO 60]
{code}
It will find two documents as expected, but the ranking should be
identical! Instead they are   0.65656495 and 0.3007804 

The reason for this bug is that phrase "apple 60" matches one of the documents 
(the one that comes with the tutorial).

The phrase field expansion can go much worse than this and use both the 
start-limit, end-limit and "TO" used in the range query part. 

/Thomas Egense









  was:
When you have PhraseFields defined in your edismax parser and make a range 
query, the edismax will expand the query to the relevant fields, but with a 
wrong term added to the query. That term is the max-limit from the range query.

In the example tutorial Solr add the following to the /browse edismax handler:
name 

Try this  query  with query explain for the /browse requesthandler:
apple AND weight:[* TO 60]

Debug query text
. DisjunctionMaxQuery((name:"apple 60").

As you can see the term 60 (from the range query) is now added to the search 
term and this will boost the score if indeed "apple 60" matches anything.
To demonstrate this, add the following document to the index:
(ipod_video.xml with minor changes)


  MA147LL/B
  Apple 120 GB iPod with Video Playback Black
  Apple Computer Inc.
  
  apple
  electronics
  music
  iTunes, Podcasts, Audiobooks
  Stores up to 15,000 songs, 25,000 photos, or 150 hours 
of video
  2.5-inch, 320x240 color TFT LCD display with LED 
backlight
  Up to 20 hours of battery life
  Plays AAC, MP3, WAV, AIFF, Audible, Apple Lossless, 
H.264 video
  Notes, Calendar, Phone book, Hold button, Date 
display, Photo wallet, Built-in games, JPEG photo playback, Upgradeable 
firmware, USB 2.0 compatibility, Playback speed control, Rechargeable 
capability, Battery level indication
  earbud headphones, USB cable
  6.5
  599.00
  10
  true
  
  37.7752,-100.0232
  2005-10-12T08:00:00Z


When you repeat the query: apple AND weight:[* TO 60]
It will find two documents as expected, but the ranking should be
identical! Instead they are   0.65656495 and 0.3007804 

The reason for this bug is that phrase "apple 60" matches one of the documents 
(the one that comes with the tutorial).

The phrase field expansion can go much worse than this and use both the 
start-limit, end-limit and "TO" used in the range query part. 

/Thomas Egense











[jira] [Updated] (SOLR-8584) Phrase Fields ranking bug for range search when using edismax queryparser

2016-01-22 Thread Thomas Egense (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Egense updated SOLR-8584:

Description: 
When you have PhraseFields defined in your edismax parser and make a range 
query, the edismax will expand the query to the relevant fields, but with a 
wrong term added to the query. That term is the max-limit from the range query.

In the example tutorial Solr add the following to the /browse edismax handler 
in solrconfig.xml:
{code}
name 
{code}

Try this  query  with query explain for the /browse requesthandler:
{code}
apple AND weight:[* TO 60]
{code}
This is from the query explain:
{code}
. DisjunctionMaxQuery((name:"apple 60").
{code}
As you can see the term 60 (from the range query) is now added to the search 
term and this will boost the score if indeed "apple 60" matches anything.
To demonstrate this, add the following document to the index:
(ipod_video.xml with minor changes)

{code}

  MA147LL/B
  Apple 120 GB iPod with Video Playback Black
  Apple Computer Inc.
  
  apple
  electronics
  music
  iTunes, Podcasts, Audiobooks
  Stores up to 15,000 songs, 25,000 photos, or 150 hours 
of video
  2.5-inch, 320x240 color TFT LCD display with LED 
backlight
  Up to 20 hours of battery life
  Plays AAC, MP3, WAV, AIFF, Audible, Apple Lossless, 
H.264 video
  Notes, Calendar, Phone book, Hold button, Date 
display, Photo wallet, Built-in games, JPEG photo playback, Upgradeable 
firmware, USB 2.0 compatibility, Playback speed control, Rechargeable 
capability, Battery level indication
  earbud headphones, USB cable
  6.5
  599.00
  10
  true
  
  37.7752,-100.0232
  2005-10-12T08:00:00Z

{code}

When you repeat the query: 
{code}
apple AND weight:[* TO 60]
{code}
It will find two documents as expected, but the ranking should be
identical! Instead they are   0.65656495 and 0.3007804 

The reason for this bug is that phrase "apple 60" matches one of the documents 
(the one that comes with the tutorial).

The phrase field expansion can go much worse than this and use both the 
start-limit, end-limit and "TO" used in the range query part. 

/Thomas Egense and Toke Eskildsen









  was:
When you have PhraseFields defined in your edismax parser and make a range 
query, the edismax will expand the query to the relevant fields, but with a 
wrong term added to the query. That term is the max-limit from the range query.

In the example tutorial Solr add the following to the /browse edismax handler 
in solrconfig.xml:
{code}
name 
{code}

Try this  query  with query explain for the /browse requesthandler:
{code}
apple AND weight:[* TO 60]
{code}
This is from the query explain:
{code}
. DisjunctionMaxQuery((name:"apple 60").
{code}
As you can see the term 60 (from the range query) is now added to the search 
term and this will boost the score if indeed "apple 60" matches anything.
To demonstrate this, add the following document to the index:
(ipod_video.xml with minor changes)

{code}

  MA147LL/B
  Apple 120 GB iPod with Video Playback Black
  Apple Computer Inc.
  
  apple
  electronics
  music
  iTunes, Podcasts, Audiobooks
  Stores up to 15,000 songs, 25,000 photos, or 150 hours 
of video
  2.5-inch, 320x240 color TFT LCD display with LED 
backlight
  Up to 20 hours of battery life
  Plays AAC, MP3, WAV, AIFF, Audible, Apple Lossless, 
H.264 video
  Notes, Calendar, Phone book, Hold button, Date 
display, Photo wallet, Built-in games, JPEG photo playback, Upgradeable 
firmware, USB 2.0 compatibility, Playback speed control, Rechargeable 
capability, Battery level indication
  earbud headphones, USB cable
  6.5
  599.00
  10
  true
  
  37.7752,-100.0232
  2005-10-12T08:00:00Z

{code}

When you repeat the query: 
{code}
apple AND weight:[* TO 60]
{code}
It will find two documents as expected, but the ranking should be
identical! Instead they are   0.65656495 and 0.3007804 

The reason for this bug is that phrase "apple 60" matches one of the documents 
(the one that comes with the tutorial).

The phrase field expansion can go much worse than this and use both the 
start-limit, end-limit and "TO" used in the range query part. 

/Thomas Egense











[jira] [Updated] (SOLR-8584) Phrase Fields ranking bug for range search when using edismax queryparser

2016-01-22 Thread Thomas Egense (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Egense updated SOLR-8584:

Description: 
When you have PhraseFields defined in your edismax parser and make a range 
query, the edismax will expand the query to the relevant fields, but with a 
wrong term added to the query. That term is the max-limit from the range query.

In the example tutorial Solr add the following to the /browse edismax handler 
in solrconfig.xml:
{code}
name 
{code}

Try this  query  with query explain for the /browse requesthandler:
{code}
apple AND weight:[* TO 60]
{code}
This is from the query explain:
{code}
. DisjunctionMaxQuery((name:"apple 60").
{code}
As you can see the term 60 (from the range query) is now added to the search 
term and this will boost the score if indeed "apple 60" matches anything.
To demonstrate this, add the following document to the index:
(ipod_video.xml with minor changes)

{code}

  MA147LL/B
  Apple 120 GB iPod with Video Playback Black
  Apple Computer Inc.
  
  apple
  electronics
  music
  iTunes, Podcasts, Audiobooks
  Stores up to 15,000 songs, 25,000 photos, or 150 hours 
of video
  2.5-inch, 320x240 color TFT LCD display with LED 
backlight
  Up to 20 hours of battery life
  Plays AAC, MP3, WAV, AIFF, Audible, Apple Lossless, 
H.264 video
  Notes, Calendar, Phone book, Hold button, Date 
display, Photo wallet, Built-in games, JPEG photo playback, Upgradeable 
firmware, USB 2.0 compatibility, Playback speed control, Rechargeable 
capability, Battery level indication
  earbud headphones, USB cable
  6.5
  599.00
  10
  true
  
  37.7752,-100.0232
  2005-10-12T08:00:00Z

{code}

When you repeat the query: 
{code}
apple AND weight:[* TO 60]
{code}
It will find two documents as expected, but the ranking should be
identical! Instead they are   0.65656495 and 0.3007804 

The reason for this bug is that phrase "apple 60" matches one of the documents 
(the one that comes with the tutorial).

The phrase field expansion can go much worse than this and use both the 
start-limit, end-limit and "TO" used in the range query part. 

/Thomas Egense









  was:
When you have PhraseFields defined in your edismax parser and make a range 
query, the edismax will expand the query to the relevant fields, but with a 
wrong term added to the query. That term is the max-limit from the range query.

In the example tutorial Solr add the following to the /browse edismax handler 
in solrconfig.xml:
{code}
name 
{code}

Try this  query  with query explain for the /browse requesthandler:
{code}
apple AND weight:[* TO 60]
{code}
Debug query text
. DisjunctionMaxQuery((name:"apple 60").

As you can see the term 60 (from the range query) is now added to the search 
term and this will boost the score if indeed "apple 60" matches anything.
To demonstrate this, add the following document to the index:
(ipod_video.xml with minor changes)

{code}

  MA147LL/B
  Apple 120 GB iPod with Video Playback Black
  Apple Computer Inc.
  
  apple
  electronics
  music
  iTunes, Podcasts, Audiobooks
  Stores up to 15,000 songs, 25,000 photos, or 150 hours 
of video
  2.5-inch, 320x240 color TFT LCD display with LED 
backlight
  Up to 20 hours of battery life
  Plays AAC, MP3, WAV, AIFF, Audible, Apple Lossless, 
H.264 video
  Notes, Calendar, Phone book, Hold button, Date 
display, Photo wallet, Built-in games, JPEG photo playback, Upgradeable 
firmware, USB 2.0 compatibility, Playback speed control, Rechargeable 
capability, Battery level indication
  earbud headphones, USB cable
  6.5
  599.00
  10
  true
  
  37.7752,-100.0232
  2005-10-12T08:00:00Z

{code}

When you repeat the query: 
{code}
apple AND weight:[* TO 60]
{code}
It will find two documents as expected, but the ranking should be
identical! Instead they are   0.65656495 and 0.3007804 

The reason for this bug is that phrase "apple 60" matches one of the documents 
(the one that comes with the tutorial).

The phrase field expansion can go much worse than this and use both the 
start-limit, end-limit and "TO" used in the range query part. 

/Thomas Egense











[jira] [Updated] (SOLR-8584) Phrase Fields ranking bug for range search when using edismax queryparser

2016-01-22 Thread Thomas Egense (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Egense updated SOLR-8584:

Description: 
When phrase fields (pf) are defined for the edismax parser and the query 
contains a range clause, edismax expands the query to the phrase fields but 
adds a wrong term to the phrase. That term is the upper limit of the range query.

In the Solr tutorial example, add the following to the /browse edismax handler 
in solrconfig.xml:
{code}
<str name="pf">name</str>
{code}

Try this query with query explain enabled for the /browse request handler:
{code}
apple AND weight:[* TO 60]
{code}
This is from the query explain:
{code}
. DisjunctionMaxQuery((name:"apple 60").
{code}
As you can see, the term 60 (from the range query) is now added to the phrase, 
and this will boost the score of any document that matches "apple 60".
To demonstrate this, add the following document to the index:
(ipod_video.xml with minor changes)

{code}

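<add><doc>
  <field name="id">MA147LL/B</field>
  <field name="name">Apple 120 GB iPod with Video Playback Black</field>
  <field name="manu">Apple Computer Inc.</field>
  <field name="manu_id_s">apple</field>
  <field name="cat">electronics</field>
  <field name="cat">music</field>
  <field name="features">iTunes, Podcasts, Audiobooks</field>
  <field name="features">Stores up to 15,000 songs, 25,000 photos, or 150 hours of video</field>
  <field name="features">2.5-inch, 320x240 color TFT LCD display with LED backlight</field>
  <field name="features">Up to 20 hours of battery life</field>
  <field name="features">Plays AAC, MP3, WAV, AIFF, Audible, Apple Lossless, H.264 video</field>
  <field name="features">Notes, Calendar, Phone book, Hold button, Date display, Photo wallet, Built-in games, JPEG photo playback, Upgradeable firmware, USB 2.0 compatibility, Playback speed control, Rechargeable capability, Battery level indication</field>
  <field name="includes">earbud headphones, USB cable</field>
  <field name="weight">6.5</field>
  <field name="price">599.00</field>
  <field name="popularity">10</field>
  <field name="inStock">true</field>
  <field name="store">37.7752,-100.0232</field>
  <field name="manufacturedate_dt">2005-10-12T08:00:00Z</field>
</doc></add>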

{code}

When you repeat the query: 
{code}
apple AND weight:[* TO 60]
{code}
It will find two documents as expected, but their scores should be
identical. Instead they are 0.65656495 and 0.3007804.

The scores differ because the phrase "apple 60" matches one of the documents 
(the one that comes with the tutorial).

The phrase field expansion can go even further wrong than this and pick up the 
start limit, the end limit and the "TO" keyword of the range query.

Solution: Do not use anything from the range query part for the phrase fields.
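
A minimal sketch of that idea, written in Python rather than Solr's Java and not taken from the Solr code base: strip range clauses and boolean operators from the user query before the remaining terms are joined into a phrase. The regular expression, the operator list and the helper names `phrase_terms` and `phrase_clause` are illustrative assumptions. A real fix would presumably work on the parser's clause list instead of the raw query string.

```python
import re
from typing import List, Optional

# Assumed shape of a range clause: an optional "field:" prefix followed by
# "[low TO high]" or "{low TO high}". This is a simplification of the
# real query syntax, good enough for the example in this report.
RANGE_CLAUSE = re.compile(r"(?:[+\-!]?\w+:)?[\[{][^\]}]*\s+TO\s+[^\]}]*[\]}]")

# Boolean operators that must never become phrase terms.
OPERATORS = {"AND", "OR", "NOT", "&&", "||", "!"}


def phrase_terms(query: str) -> List[str]:
    """Return the plain terms of the query, ignoring range clauses and operators."""
    without_ranges = RANGE_CLAUSE.sub(" ", query)
    return [tok for tok in without_ranges.split() if tok not in OPERATORS]


def phrase_clause(field: str, query: str) -> Optional[str]:
    """Build a phrase clause for one phrase field, or None if no phrase is possible."""
    terms = phrase_terms(query)
    if len(terms) < 2:
        # With fewer than two terms there is no phrase to build.
        return None
    phrase = " ".join(terms)
    return f'{field}:"{phrase}"'


if __name__ == "__main__":
    print(phrase_clause("name", "apple AND weight:[* TO 60]"))
    print(phrase_clause("name", "apple ipod weight:[* TO 60]"))
```

For the query in this report only the term "apple" remains in the sketch, so no phrase clause is produced and the 60 from the range can no longer influence the score.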

/Thomas Egense and Toke Eskildsen









  was:
When you have PhraseFields defined in your edismax parser and make a range 
query, the edismax will expand the query to the relevant fields, but with a 
wrong term added to the query. That term is the max-limit from the range query.

In the example tutorial Solr add the following to the /browse edismax handler 
in solrconfig.xml:
{code}
name 
{code}

Try this  query  with query explain for the /browse requesthandler:
{code}
apple AND weight:[* TO 60]
{code}
This is from the query explain:
{code}
. DisjunctionMaxQuery((name:"apple 60").
{code}
As you can see the term 60 (from the range query) is now added to the search 
term and this will boost the score if indeed "apple 60" matches anything.
To demonstrate this, add the following document to the index:
(ipod_video.xml with minor changes)

{code}

  MA147LL/B
  Apple 120 GB iPod with Video Playback Black
  Apple Computer Inc.
  
  apple
  electronics
  music
  iTunes, Podcasts, Audiobooks
  Stores up to 15,000 songs, 25,000 photos, or 150 hours 
of video
  2.5-inch, 320x240 color TFT LCD display with LED 
backlight
  Up to 20 hours of battery life
  Plays AAC, MP3, WAV, AIFF, Audible, Apple Lossless, 
H.264 video
  Notes, Calendar, Phone book, Hold button, Date 
display, Photo wallet, Built-in games, JPEG photo playback, Upgradeable 
firmware, USB 2.0 compatibility, Playback speed control, Rechargeable 
capability, Battery level indication
  earbud headphones, USB cable
  6.5
  599.00
  10
  true
  
  37.7752,-100.0232
  2005-10-12T08:00:00Z

{code}

When you repeat the query: 
{code}
apple AND weight:[* TO 60]
{code}
It will find two documents as expected, but the ranking should be
identical! Instead they are   0.65656495 and 0.3007804 

The reason for this bug is that phrase "apple 60" matches one of the documents 
(the one that comes with the tutorial).

The phrase field expansion can go much worse than this and use both the 
start-limit, end-limit and "TO" used in the range query part. 

Solution: Do not use anything from the range query part for the phrase fields.

/Thomas Egense and Toke Eskildsen











[jira] [Updated] (SOLR-8584) Phrase Fields ranking bug for range search when using edismax queryparser

2016-01-22 Thread Thomas Egense (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Egense updated SOLR-8584:

Description: 
When you have PhraseFields defined in your edismax parser and make a range 
query, the edismax will expand the query to the relevant fields, but with a 
wrong term added to the query. That term is the max-limit from the range query.

In the example tutorial Solr add the following to the /browse edismax handler 
in solrconfig.xml:
{code}
name 
{code}

Try this  query  with query explain for the /browse requesthandler:
{code}
apple AND weight:[* TO 60]
{code}
Debug query text
. DisjunctionMaxQuery((name:"apple 60").

As you can see the term 60 (from the range query) is now added to the search 
term and this will boost the score if indeed "apple 60" matches anything.
To demonstrate this, add the following document to the index:
(ipod_video.xml with minor changes)

{code}

  MA147LL/B
  Apple 120 GB iPod with Video Playback Black
  Apple Computer Inc.
  
  apple
  electronics
  music
  iTunes, Podcasts, Audiobooks
  Stores up to 15,000 songs, 25,000 photos, or 150 hours 
of video
  2.5-inch, 320x240 color TFT LCD display with LED 
backlight
  Up to 20 hours of battery life
  Plays AAC, MP3, WAV, AIFF, Audible, Apple Lossless, 
H.264 video
  Notes, Calendar, Phone book, Hold button, Date 
display, Photo wallet, Built-in games, JPEG photo playback, Upgradeable 
firmware, USB 2.0 compatibility, Playback speed control, Rechargeable 
capability, Battery level indication
  earbud headphones, USB cable
  6.5
  599.00
  10
  true
  
  37.7752,-100.0232
  2005-10-12T08:00:00Z

{code}

When you repeat the query: 
{code}
apple AND weight:[* TO 60]
{code}
It will find two documents as expected, but the ranking should be
identical! Instead they are   0.65656495 and 0.3007804 

The reason for this bug is that phrase "apple 60" matches one of the documents 
(the one that comes with the tutorial).

The phrase field expansion can go much worse than this and use both the 
start-limit, end-limit and "TO" used in the range query part. 

/Thomas Egense









  was:
When you have PhraseFields defined in your edismax parser and make a range 
query, the edismax will expand the query to the relevant fields, but with a 
wrong term added to the query. That term is the max-limit from the range query.

In the example tutorial Solr add the following to the /browse edismax handler:
name 

Try this  query  with query explain for the /browse requesthandler:
{code}
apple AND weight:[* TO 60]
{code}
Debug query text
. DisjunctionMaxQuery((name:"apple 60").

As you can see the term 60 (from the range query) is now added to the search 
term and this will boost the score if indeed "apple 60" matches anything.
To demonstrate this, add the following document to the index:
(ipod_video.xml with minor changes)

{code}

  MA147LL/B
  Apple 120 GB iPod with Video Playback Black
  Apple Computer Inc.
  
  apple
  electronics
  music
  iTunes, Podcasts, Audiobooks
  Stores up to 15,000 songs, 25,000 photos, or 150 hours 
of video
  2.5-inch, 320x240 color TFT LCD display with LED 
backlight
  Up to 20 hours of battery life
  Plays AAC, MP3, WAV, AIFF, Audible, Apple Lossless, 
H.264 video
  Notes, Calendar, Phone book, Hold button, Date 
display, Photo wallet, Built-in games, JPEG photo playback, Upgradeable 
firmware, USB 2.0 compatibility, Playback speed control, Rechargeable 
capability, Battery level indication
  earbud headphones, USB cable
  6.5
  599.00
  10
  true
  
  37.7752,-100.0232
  2005-10-12T08:00:00Z

{code}

When you repeat the query: 
{code}
apple AND weight:[* TO 60]
{code}
It will find two documents as expected, but the ranking should be
identical! Instead they are   0.65656495 and 0.3007804 

The reason for this bug is that phrase "apple 60" matches one of the documents 
(the one that comes with the tutorial).

The phrase field expansion can go much worse than this and use both the 
start-limit, end-limit and "TO" used in the range query part. 

/Thomas Egense











[jira] [Updated] (SOLR-8584) Phrase Fields ranking bug for range search when using edismax queryparser

2016-01-22 Thread Thomas Egense (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Egense updated SOLR-8584:

Description: 
When you have PhraseFields defined in your edismax parser and make a range 
query, the edismax will expand the query to the relevant fields, but with a 
wrong term added to the query. That term is the max-limit from the range query.

In the example tutorial Solr add the following to the /browse edismax handler 
in solrconfig.xml:
{code}
name 
{code}

Try this  query  with query explain for the /browse requesthandler:
{code}
apple AND weight:[* TO 60]
{code}
This is from the query explain:
{code}
. DisjunctionMaxQuery((name:"apple 60").
{code}
As you can see the term 60 (from the range query) is now added to the search 
term and this will boost the score if indeed "apple 60" matches anything.
To demonstrate this, add the following document to the index:
(ipod_video.xml with minor changes)

{code}

  MA147LL/B
  Apple 120 GB iPod with Video Playback Black
  Apple Computer Inc.
  
  apple
  electronics
  music
  iTunes, Podcasts, Audiobooks
  Stores up to 15,000 songs, 25,000 photos, or 150 hours 
of video
  2.5-inch, 320x240 color TFT LCD display with LED 
backlight
  Up to 20 hours of battery life
  Plays AAC, MP3, WAV, AIFF, Audible, Apple Lossless, 
H.264 video
  Notes, Calendar, Phone book, Hold button, Date 
display, Photo wallet, Built-in games, JPEG photo playback, Upgradeable 
firmware, USB 2.0 compatibility, Playback speed control, Rechargeable 
capability, Battery level indication
  earbud headphones, USB cable
  6.5
  599.00
  10
  true
  
  37.7752,-100.0232
  2005-10-12T08:00:00Z

{code}

When you repeat the query: 
{code}
apple AND weight:[* TO 60]
{code}
It will find two documents as expected, but the ranking should be
identical! Instead they are   0.65656495 and 0.3007804 

The reason for this bug is that phrase "apple 60" matches one of the documents 
(the one that comes with the tutorial).

The phrase field expansion can go much worse than this and use both the 
start-limit, end-limit and "TO" used in the range query part. 

Solution: Do not use anything from the range query part for the phrase fields.

/Thomas Egense and Toke Eskildsen









  was:
When you have PhraseFields defined in your edismax parser and make a range 
query, the edismax will expand the query to the relevant fields, but with a 
wrong term added to the query. That term is the max-limit from the range query.

In the example tutorial Solr add the following to the /browse edismax handler 
in solrconfig.xml:
{code}
name 
{code}

Try this  query  with query explain for the /browse requesthandler:
{code}
apple AND weight:[* TO 60]
{code}
This is from the query explain:
{code}
. DisjunctionMaxQuery((name:"apple 60").
{code}
As you can see the term 60 (from the range query) is now added to the search 
term and this will boost the score if indeed "apple 60" matches anything.
To demonstrate this, add the following document to the index:
(ipod_video.xml with minor changes)

{code}

  MA147LL/B
  Apple 120 GB iPod with Video Playback Black
  Apple Computer Inc.
  
  apple
  electronics
  music
  iTunes, Podcasts, Audiobooks
  Stores up to 15,000 songs, 25,000 photos, or 150 hours 
of video
  2.5-inch, 320x240 color TFT LCD display with LED 
backlight
  Up to 20 hours of battery life
  Plays AAC, MP3, WAV, AIFF, Audible, Apple Lossless, 
H.264 video
  Notes, Calendar, Phone book, Hold button, Date 
display, Photo wallet, Built-in games, JPEG photo playback, Upgradeable 
firmware, USB 2.0 compatibility, Playback speed control, Rechargeable 
capability, Battery level indication
  earbud headphones, USB cable
  6.5
  599.00
  10
  true
  
  37.7752,-100.0232
  2005-10-12T08:00:00Z

{code}

When you repeat the query: 
{code}
apple AND weight:[* TO 60]
{code}
It will find two documents as expected, but the ranking should be
identical! Instead they are   0.65656495 and 0.3007804 

The reason for this bug is that phrase "apple 60" matches one of the documents 
(the one that comes with the tutorial).

The phrase field expansion can go much worse than this and use both the 
start-limit, end-limit and "TO" used in the range query part. 

/Thomas Egense and Toke Eskildsen









