[jira] [Updated] (LUCENE-3821) SloppyPhraseScorer sometimes misses documents that ExactPhraseScorer finds.

2012-03-06 Thread Doron Cohen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3821:


Attachment: LUCENE-3821-SloppyDecays.patch

Patch adds NonExactPhraseScorer (temporary name) as discussed above. This is work in
progress: it does not yet do any sloppy matching or scoring.

 SloppyPhraseScorer sometimes misses documents that ExactPhraseScorer finds.
 ---

 Key: LUCENE-3821
 URL: https://issues.apache.org/jira/browse/LUCENE-3821
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.5, 4.0
Reporter: Naomi Dushay
Assignee: Doron Cohen
 Attachments: LUCENE-3821-SloppyDecays.patch, LUCENE-3821.patch, 
 LUCENE-3821.patch, LUCENE-3821.patch, LUCENE-3821.patch, 
 LUCENE-3821_test.patch, schema.xml, solrconfig-test.xml


 The general bug is a case where a phrase with no slop is found,
 but if you add slop it's not.
 I committed a test today (TestSloppyPhraseQuery2) that actually triggers this case;
 Jenkins just hasn't had enough time to chew on it.
 ant test -Dtestcase=TestSloppyPhraseQuery2 -Dtests.iter=100 is enough to make it fail on trunk or 3.x
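The invariant behind this report is that slop only relaxes a phrase query: anything matched with slop 0 must also match with any larger slop. A brute-force reference makes that concrete. This is a hypothetical helper (`SloppyReference` is an illustrative name), not Lucene's SloppyPhraseScorer. It assumes the classic sloppy definition: each query term's document position is shifted by its query offset, the match length is the spread (max minus min) of those shifted positions, and repeated query terms must land on distinct document positions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical brute-force reference for sloppy phrase matching (not Lucene code).
 * Query term i (offset i) is assigned a document position p_i; repeated query terms
 * must use distinct positions. The phrase matches when max(p_i - i) - min(p_i - i) <= slop.
 */
final class SloppyReference {

    static boolean matches(String[] doc, String[] query, int slop) {
        // Index the document: term -> positions where it occurs.
        Map<String, List<Integer>> postings = new HashMap<>();
        for (int pos = 0; pos < doc.length; pos++) {
            postings.computeIfAbsent(doc[pos], k -> new ArrayList<>()).add(pos);
        }
        return search(query, postings, 0, new int[query.length], slop);
    }

    // Try every assignment of document positions to query terms (exponential; tests only).
    private static boolean search(String[] query, Map<String, List<Integer>> postings,
                                  int i, int[] chosen, int slop) {
        if (i == query.length) {
            int min = Integer.MAX_VALUE;
            int max = Integer.MIN_VALUE;
            for (int j = 0; j < chosen.length; j++) {
                int shifted = chosen[j] - j; // position relative to the phrase start
                min = Math.min(min, shifted);
                max = Math.max(max, shifted);
            }
            return max - min <= slop;
        }
        for (int pos : postings.getOrDefault(query[i], List.of())) {
            boolean taken = false;
            for (int j = 0; j < i; j++) {
                if (chosen[j] == pos) { // a repeated term may not reuse a position
                    taken = true;
                    break;
                }
            }
            if (taken) {
                continue;
            }
            chosen[i] = pos;
            if (search(query, postings, i + 1, chosen, slop)) {
                return true;
            }
        }
        return false;
    }
}
```

Because the final check is `max - min <= slop`, any assignment that works for slop 0 also works for every larger slop. For the Beatles example from the original description, matching the analyzed tokens `the beatl as musician revolv through the antholog` against themselves with slop 1 returns true: the second `the` skips document position 0 (already taken by the first `the`) and lands on position 6.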

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3821) SloppyPhraseScorer sometimes misses documents that ExactPhraseScorer finds.

2012-03-05 Thread Doron Cohen (Updated) (JIRA)


Doron Cohen updated LUCENE-3821:


Attachment: LUCENE-3821.patch

Attached updated patch. 

Repeating PPs with MultiPhraseQuery are handled as well.

This called for more cases in the sloppy phrase scorer and more code, and 
although I think the code is cleaner now, I don't know to what extent it is 
easier to maintain.
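With a MultiPhraseQuery, each query position may hold several alternative terms, so deciding which PPs (PhrasePositions) can compete for the same document position is no longer a simple equality check. The patch's actual approach is not shown here; as an assumption, one natural way is to union positions that share any term, transitively, using union-find. `RepeatGroups` and `group` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch: group multi-phrase query positions into "repeat groups".
 * Two positions are in the same group if they share a term, directly or through
 * a chain of other positions.
 */
final class RepeatGroups {

    private static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]]; // path halving keeps trees shallow
            x = parent[x];
        }
        return x;
    }

    /** positions.get(i) holds the alternative terms at query position i. */
    static List<List<Integer>> group(List<List<String>> positions) {
        int n = positions.size();
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
        }

        // Union each position with the first position that used the same term.
        Map<String, Integer> firstUse = new HashMap<>();
        for (int i = 0; i < n; i++) {
            for (String term : positions.get(i)) {
                Integer prev = firstUse.putIfAbsent(term, i);
                if (prev != null) {
                    parent[find(parent, i)] = find(parent, prev);
                }
            }
        }

        // Collect members per root; singletons can never collide, so drop them.
        Map<Integer, List<Integer>> byRoot = new TreeMap<>();
        for (int i = 0; i < n; i++) {
            byRoot.computeIfAbsent(find(parent, i), k -> new ArrayList<>()).add(i);
        }
        List<List<Integer>> result = new ArrayList<>();
        for (List<Integer> members : byRoot.values()) {
            if (members.size() > 1) {
                result.add(members);
            }
        }
        return result;
    }
}
```

For positions `[a] [b c] [c d] [e] [a]` this yields the groups `[0, 4]` and `[1, 2]`; only PPs inside a group need the extra "distinct document position" bookkeeping.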

It definitely fixes wrong behavior that exists in current 3x and trunk (patch 
is for 3x).

However, although the random test passes for me even with -Dtests.iter=2000, it 
is still possible to break the scorer, that is, to create a document and a query 
that should match each other but do not.

The patch adds just such a case as an @Ignored test case:  
TestMultiPhraseQuery.testMultiSloppyWithRepeats(). 

I don't see how to solve this specific case in the context of current sloppy 
phrase scorer. 

So there are 3 options:
# leave things as they are
# commit this patch and for now document the failing scenario (also keep it in 
the ignored test case). 
# devise a different algorithm for this.

I would love it to be the 3rd if I only knew how to do it. Otherwise I like the 
2nd; we just need to keep in mind that the random test might create this 
scenario from time to time, so there will be noise in the test builds.

Preferences?




[jira] [Updated] (LUCENE-3821) SloppyPhraseScorer sometimes misses documents that ExactPhraseScorer finds.

2012-03-04 Thread Doron Cohen (Updated) (JIRA)


Doron Cohen updated LUCENE-3821:


Attachment: LUCENE-3821.patch

Updated patch with a fixed MFQ.toString(), which prints the problematic doc and 
queries in case of failure.




[jira] [Updated] (LUCENE-3821) SloppyPhraseScorer sometimes misses documents that ExactPhraseScorer finds.

2012-03-03 Thread Doron Cohen (Updated) (JIRA)


Doron Cohen updated LUCENE-3821:


Attachment: LUCENE-3821.patch

Patch with fix for this problem. I would expect SloppyPhrase scoring 
performance to degrade, though I did not measure it.

The single test that still fails (and I think the bug is in ExactPhraseScorer) 
is testRandomIncreasingSloppiness, and can be recreated like this:
{noformat}
ant test -Dtestcase=TestSloppyPhraseQuery2 -Dtestmethod=testRandomIncreasingSloppiness -Dtests.seed=47267613db69f714:-617bb800c4a3c645:-456a673444fdc184 -Dargs=-Dfile.encoding=UTF-8
{noformat}




[jira] [Updated] (LUCENE-3821) SloppyPhraseScorer sometimes misses documents that ExactPhraseScorer finds.

2012-03-03 Thread Doron Cohen (Updated) (JIRA)


Doron Cohen updated LUCENE-3821:


Attachment: LUCENE-3821.patch

bq. Hmm patch has this: ... import backport.api...

Oops, here's a fixed patch. I also added some comments and removed the @Ignore 
from the test.

bq. I'm going to be ecstatic if that crazy test finds bugs both in exact and 
sloppy phrase scorers :)

It is a great test! Interestingly, one thing it exposed is the dependency of the 
SloppyPhraseScorer on the order of PPs in PhraseScorer when phraseFreq() is 
invoked. The way things work in the superclass, that order depends on the 
content of previously processed documents. This fix removes that wrong 
dependency, of course. The point is that deliberately devising a test that 
exposes such a bug seems almost impossible: first you would need to think of such 
a case, and even if you did, a test written to create this specific scenario 
would itself likely be buggy. Praise to random testing, and to this random test in 
particular.
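A random test in this spirit can be quite small. The sketch below is hypothetical (it is not TestSloppyPhraseQuery2, and `RandomSlopCheck`, `SloppyMatcher` and `findCounterexample` are illustrative names): it draws random documents from a tiny vocabulary so that repeated terms are common, uses a slice of each document as the query so that an exact match is guaranteed, and reports the first slop at which the matcher under test rejects it. A fixed seed makes a failure reproducible, the same role `-Dtests.seed` plays in the reproduction command earlier in this thread.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical random check: an exact match must stay a match for every slop. */
final class RandomSlopCheck {

    /** The sloppy matcher under test. */
    interface SloppyMatcher {
        boolean matches(String[] doc, String[] query, int slop);
    }

    /** Returns a description of a failing case, or null if none was found. */
    static String findCounterexample(SloppyMatcher matcher, long seed, int iterations) {
        Random rnd = new Random(seed);
        String[] vocab = {"a", "b"}; // tiny vocabulary, so queries often repeat terms
        for (int it = 0; it < iterations; it++) {
            String[] doc = new String[8 + rnd.nextInt(8)]; // 8..15 tokens
            for (int i = 0; i < doc.length; i++) {
                doc[i] = vocab[rnd.nextInt(vocab.length)];
            }

            int len = 2 + rnd.nextInt(3); // query length 2..4
            int start = rnd.nextInt(doc.length - len + 1);
            String[] query = Arrays.copyOfRange(doc, start, start + len);

            // The query occurs verbatim in doc, so every slop >= 0 must match.
            for (int slop = 0; slop <= 3; slop++) {
                if (!matcher.matches(doc, query, slop)) {
                    return "doc=" + Arrays.toString(doc) + " query=" + Arrays.toString(query)
                            + " slop=" + slop;
                }
            }
        }
        return null;
    }
}
```

A deliberately broken matcher that rejects queries with repeated terms whenever slop is above 0 is caught almost immediately with this vocabulary. Note that this invariant alone only detects missed documents; false positives need a separate reference check.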




[jira] [Updated] (LUCENE-3821) SloppyPhraseScorer sometimes misses documents that ExactPhraseScorer finds.

2012-02-23 Thread Robert Muir (Updated) (JIRA)


Robert Muir updated LUCENE-3821:


Description: 
The general bug is a case where a phrase with no slop is found,
but if you add slop it's not.

I committed a test today (TestSloppyPhraseQuery2) that actually triggers this case;
Jenkins just hasn't had enough time to chew on it.

ant test -Dtestcase=TestSloppyPhraseQuery2 -Dtests.iter=100 is enough to make it fail on trunk or 3.x

  was:
In upgrading from Solr 1.4 to Solr 3.5, the following phrase searches stopped 
working in dismax:
  The Beatles as musicians : Revolver through the Anthology
  Color-blindness [print/digital]; its dangers and its detection
Both of these queries have a repeated word and many terms. It's not the 
number of terms or the colon surrounded by spaces, because the following phrase 
search works in Solr 3.5 (and Solr 1.4):
  International encyclopedia of revolution and protest : 1500 to the present

With Robert Muir's help, we have narrowed the problem down to slop (proximity 
in lucene QueryParser, query slop in dismax). I have included debugQuery 
details for the Beatles search; I confirmed the same behavior with the 
color-blindness search.


Solr 3.5: it fails when the (query) slop setting isn't 0.

lucene QueryParser with proximity set to 1 (or anything > 0): no match
  URL: q=all_search:"The Beatles as musicians : Revolver through the Anthology"~1
  final query: all_search:"the beatl as musician revolv through the antholog"~1

lucene QueryParser with proximity set to 0: result!
  URL: q=all_search:"The Beatles as musicians : Revolver through the Anthology"
  final query: all_search:"the beatl as musician revolv through the antholog"

  6.0562754 = (MATCH) weight(all_search:"the beatl as musician revolv through the antholog" in 1064395), product of:
  snip
  48.450203 = idf(all_search: the=3531140 beatl=398 as=645923 musician=11805 revolv=872 through=81366 the=3531140 antholog=11611)
  snip

dismax QueryParser with qs=1: no match
  ps=0
  URL: qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver through the Anthology"&qs=1&ps=0
  final query: +(all_search:"the beatl as musician revolv through the antholog"~1)~0.01 (all_search:"the beatl as musician revolv through the antholog")~0.01
  ps=1
  URL: qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver through the Anthology"&qs=1&ps=1
  final query: +(all_search:"the beatl as musician revolv through the antholog"~1)~0.01 (all_search:"the beatl as musician revolv through the antholog"~1)~0.01

dismax QueryParser with qs=0: result!
  ps=0
  URL: qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver through the Anthology"&qs=0&ps=0
  final query: +(all_search:"the beatl as musician revolv through the antholog")~0.01 (all_search:"the beatl as musician revolv through the antholog")~0.01
  ps=1
  URL: qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver through the Anthology"&qs=0&ps=1
  final query: +(all_search:"the beatl as musician revolv through the antholog")~0.01 (all_search:"the beatl as musician revolv through the antholog"~1)~0.01

  8.564867 = (MATCH) sum of:
    4.2824335 = (MATCH) weight(all_search:"the beatl as musician revolv through the antholog" in 1064395), product of:
    snip
    48.450203 = idf(all_search: the=3531140 beatl=398 as=645923 musician=11805 revolv=872 through=81366 the=3531140 antholog=11611)
    snip


Solr 1.4: it works regardless of slop settings

lucene QueryParser with any proximity value: result!
  ~0
  URL: q=all_search:"The Beatles as musicians : Revolver through the Anthology"
  final query: all_search:"the beatl as musician revolv through the antholog"
  ~1
  URL: q=all_search:"The Beatles as musicians : Revolver through the Anthology"~1
  final query: all_search:"the beatl as musician revolv through the antholog"~1

  5.2672544 = fieldWeight(all_search:"the beatl as musician revolv through the antholog" in 3469163), product of:
  snip
  48.157753 = idf(all_search: the=3549637 beatl=392 as=751093 musician=11992 revolv=822 through=88522 the=3549637 antholog=11246)
  snip

dismax QueryParser with any qs: result!
  qs=0, ps=0
    URL: qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver through the Anthology"&qs=0&ps=0
    final query: +(all_search:"the beatl as musician revolv through the antholog")~0.01 (all_search:"the beatl as musician revolv through the antholog")~0.01
  qs=0, ps=1
    URL: qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver through the Anthology"&qs=0&ps=1
    final query: +(all_search:"the beatl as musician revolv through the antholog")~0.01 (all_search:"the beatl as musician revolv through the antholog"~1)~0.01
dismax QueryParser with qs=0: result!
  qs=1, ps=0
    URL: qf=all_search&pf=all_search&q="The Beatles