Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only

2012-02-23 Thread Naomi Dushay
Robert,

You found it! It is the phrase slop. What do I do now? I am using Solr
from trunk from December, and all those JIRA tickets are marked fixed …

- Naomi


Solr 1.4:

lucene QueryParser:

URL: q=all_search:"The Beatles as musicians : Revolver through the Anthology"~3
final query:  all_search:"the beatl as musician revolv through the antholog"~3

got result


Solr 3.5:

lucene QueryParser:

URL: q=all_search:"The Beatles as musicians : Revolver through the Anthology"~3
final query:  all_search:"the beatl as musician revolv through the antholog"~3

NO result


lucene QueryParser (no slop):

URL:  q=all_search:"The Beatles as musicians : Revolver through the Anthology"
final query:  all_search:"the beatl as musician revolv through the antholog"



On Feb 22, 2012, at 7:34 PM, Robert Muir [via Lucene] wrote:

 On Wed, Feb 22, 2012 at 7:35 PM, Naomi Dushay [hidden email] wrote: 
  Jonathan has brought it to my attention that BOTH of my failing searches 
  happen to have 8 terms, and one of the terms is repeated: 
  
   The Beatles as musicians : Revolver through the Anthology 
   Color-blindness [print/digital]; its dangers and its detection 
  
  but this is a PHRASE search. 
  
 
 Can you take your same phrase queries, and simply add some slop to 
 them (e.g. ~3) and ensure they still match with the lucene 
 queryparser? SloppyPhraseQuery has a bit of a history with repeats 
 since Lucene 2.9 that you were using. 
 
 https://issues.apache.org/jira/browse/LUCENE-3068
 https://issues.apache.org/jira/browse/LUCENE-3215
 https://issues.apache.org/jira/browse/LUCENE-3412
 
 -- 
 lucidimagination.com 
 
 

Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only

2012-02-23 Thread Robert Muir
Could you also provide your document?
Ideally, attach the document, the analysis config, and the queries
to a JIRA issue.


-- 
lucidimagination.com


Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only

2012-02-23 Thread Naomi Dushay
Robert,

I will create a JIRA issue with the documentation.  FYI, I tried ps values of
3, 2, 1 and 0, and none of them worked with dismax.  With the lucene QueryParser,
only the value of 0 got results.

- Naomi
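
For anyone trying to reproduce the ps experiment described above, here is a minimal sketch of building the dismax request parameters. It assumes Python and the standard Solr request parameters defType, q, qf, pf, ps and debugQuery, with the all_search field from this thread. The helper name dismax_params is hypothetical, and no particular Solr host or handler path is assumed.

```python
from urllib.parse import urlencode


def dismax_params(phrase, field="all_search", ps=3):
    """Build a query string for a single-field dismax phrase search.

    Hypothetical helper for illustration; only the parameter names are Solr's.
    """
    return urlencode({
        "defType": "dismax",
        "q": f'"{phrase}"',    # quoted so the user query is a phrase
        "qf": field,           # query field
        "pf": field,           # phrase field
        "ps": ps,              # phrase slop applied to pf
        "debugQuery": "true",  # ask Solr to return the parsed query
    })


phrase = "The Beatles as musicians : Revolver through the Anthology"
for ps in (3, 2, 1, 0):
    print(dismax_params(phrase, ps=ps))
```

Using urlencode keeps the ampersands and the quotes around the phrase intact, which are easy to lose when a URL is pasted into an email.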



Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only

2012-02-23 Thread Robert Muir
Please attach your docs if you don't mind.

I worked up tests for this (in general, for ANY phrase query,
increasing the slop should never remove results, only potentially
enlarge the result set).

It fails already... but it's good to have your test case too...
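
The invariant stated above (raising the slop must never lose a match) can be illustrated with a small brute-force model. This is only a sketch under a simplified definition of slop, namely the spread between the largest and smallest offset of a matched term relative to its place in the query. It is not Lucene's SloppyPhraseScorer and not the test mentioned in this message; the function name sloppy_match is hypothetical.

```python
from itertools import product


def sloppy_match(doc, query, slop):
    """Brute-force phrase match under a simplified slop model (illustration only)."""
    if not query:
        return False
    # Candidate document positions for each query term occurrence.
    candidates = [[i for i, tok in enumerate(doc) if tok == term] for term in query]
    for positions in product(*candidates):
        # Repeated query terms must map to distinct document positions.
        if len(set(positions)) != len(positions):
            continue
        # Offset of each matched term relative to its place in the query.
        shifts = [p - q for q, p in enumerate(positions)]
        if max(shifts) - min(shifts) <= slop:
            return True
    return False


doc = "the beatl as musician revolv through the antholog".split()
query = list(doc)

results = [sloppy_match(doc, query, slop) for slop in range(4)]
print(results)  # prints [True, True, True, True]

# Invariant: once some slop value matches, every larger value must match too.
assert all(later or not earlier for earlier, later in zip(results, results[1:]))
```

In this model the invariant holds by construction, because a larger slop only relaxes the final comparison. The repeated "the" is what forces the distinct-position check.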


-- 
lucidimagination.com


Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only

2012-02-23 Thread Naomi Dushay
Robert -

Did you mean for me to attach my docs to an existing ticket (which one?), or did
you just want to make sure I attach the docs to the new issue?

- Naomi



Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only

2012-02-23 Thread Robert Muir
Please make a new one if you don't mind!


-- 
lucidimagination.com


Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only

2012-02-23 Thread Naomi Dushay
Ticket created:

https://issues.apache.org/jira/browse/SOLR-3158

(perhaps it's a lucene problem, not a Solr one -- feel free to move it or 
whatever.)

- Naomi



Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only

2012-02-22 Thread Jonathan Rochkind
So I don't really know what I'm talking about, and I'm not really sure 
if it's related or not, but your particular query:


The Beatles as musicians : Revolver through the Anthology

With the lone word that's a ':', reminds me of a dismax stopwords-type 
problem I ran into. Now, I ran into it on 1.4.  I don't know why it 
would be different on 1.4 and 3.x. And I see you aren't even using a 
multi-field dismax in your sample query, so it couldn't possibly be what 
I ran into... I don't think. But I'll write this anyway in case it gives 
someone some ideas.


The problem I ran into is caused by different analysis in two fields 
both used in a dismax, one that ends up keeping : as a token, and one 
that doesn't.  Which ends up having the same effect as the famous 
'dismax stopwords problem'.


Maybe your schema somehow changed in a way that produces this problem in 3.x
but not in 1.4? Although, again, I realize that the fact that you are only
using a single field in your demo dismax query suggests it's not
this problem. I wonder whether the problem goes away if you try the query
without the ':'; that might be a hint. Or maybe someone more skilled
at reading those Solr debug statements than I am (it's
kind of all Greek to me) will be able to take this hint and rule out or
confirm that it has something to do with your problem.


Here I write up the issue I ran into (which may or may not have anything 
to do with what you ran into)


http://bibwild.wordpress.com/2011/06/15/more-dismax-gotchas-varying-field-analysis-and-mm/


Also, you don't say what your 'mm' is in your dismax queries, that could 
be relevant if it's got anything to do with anything similar to the 
issue I'm talking about.


Hmm, I wonder if Solr 3.x changes the way dismax calculates number of 
tokens for 'mm' in such a way that the 'varying field analysis dismax 
gotcha' can manifest with only one field, if the way dismax counts 
tokens for 'mm' differs from number of tokens the single field's 
analysis produces?


Jonathan

On 2/22/2012 2:55 PM, Naomi Dushay wrote:

I am working on upgrading Solr from 1.4 to 3.5, and I have hit a problem.   I 
have a test checking for a search result in Solr, and the test passes in Solr 
1.4, but fails in Solr 3.5.   Dismax is the desired QueryParser -- I just 
included output from lucene QueryParser to prove the document exists and is 
found

I am completely stumped.


Here are the debugQuery details:

***Solr 3.5***

lucene QueryParser:

URL:   q=all_search:"The Beatles as musicians : Revolver through the Anthology"
final query:  all_search:"the beatl as musician revolv through the antholog"

6.0562754 = (MATCH) weight(all_search:"the beatl as musician revolv through the antholog" in 1064395), product of:
  1.0 = queryWeight(all_search:"the beatl as musician revolv through the antholog"), product of:
    48.450203 = idf(all_search: the=3531140 beatl=398 as=645923 musician=11805 revolv=872 through=81366 the=3531140 antholog=11611)
    0.02063975 = queryNorm
  6.0562754 = fieldWeight(all_search:"the beatl as musician revolv through the antholog" in 1064395), product of:
    1.0 = tf(phraseFreq=1.0)
    48.450203 = idf(all_search: the=3531140 beatl=398 as=645923 musician=11805 revolv=872 through=81366 the=3531140 antholog=11611)
    0.125 = fieldNorm(field=all_search, doc=1064395)

dismax QueryParser:
URL:  qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver through the Anthology"
final query:   +(all_search:"the beatl as musician revolv through the antholog"~1)~0.01 (all_search:"the beatl as musician revolv through the antholog"~3)~0.01

(no matches)
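
The Solr 3.5 explain output for the lucene QueryParser can be checked by hand: each "product of" line is the product of the values listed under it. A minimal sketch of that arithmetic, using only the numbers printed above (the variable names are illustrative, not Solr's):

```python
# Values copied from the Solr 3.5 lucene QueryParser explain output.
tf = 1.0                 # tf(phraseFreq=1.0)
idf = 48.450203          # summed idf of the eight phrase terms
field_norm = 0.125       # fieldNorm(field=all_search, doc=1064395)
query_norm = 0.02063975  # queryNorm

field_weight = tf * idf * field_norm  # reported as 6.0562754
query_weight = idf * query_norm       # reported as 1.0
score = query_weight * field_weight   # reported as 6.0562754

print(round(field_weight, 4), round(query_weight, 4), round(score, 4))
```

The numbers are consistent with a single exact phrase occurrence (phraseFreq=1.0), so the document is indexed and matches the phrase when no slop is involved.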


***Solr 1.4***

lucene QueryParser:

URL:  q=all_search:"The Beatles as musicians : Revolver through the Anthology"
final query:  all_search:"the beatl as musician revolv through the antholog"

5.2676983 = fieldWeight(all_search:"the beatl as musician revolv through the antholog" in 3469163), product of:
  1.0 = tf(phraseFreq=1.0)
  48.16181 = idf(all_search: the=3542123 beatl=391 as=749890 musician=11955 revolv=820 through=88238 the=3542123 antholog=11205)
  0.109375 = fieldNorm(field=all_search, doc=3469163)

dismax QueryParser:
URL:  qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver through the Anthology"
final query:  +(all_search:"the beatl as musician revolv through the antholog"~1)~0.01 (all_search:"the beatl as musician revolv through the antholog"~3)~0.01

score:

7.449651 = (MATCH) sum of:
  3.7248254 = weight(all_search:"the beatl as musician revolv through the antholog"~1 in 3469163), product of:
    0.7071068 = queryWeight(all_search:"the beatl as musician revolv through the antholog"~1), product of:
      48.16181 = idf(all_search: the=3542123 beatl=391 as=749890 musician=11955 revolv=820 through=88238 the=3542123 antholog=11205)
      0.014681898 = queryNorm
    5.2676983 = fieldWeight(all_search:"the beatl as musician revolv through the antholog" in 3469163), product of:
      1.0 = tf(phraseFreq=1.0)

Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only

2012-02-22 Thread Naomi Dushay
I forgot to include the field definition information:

schema.xml:
  <field name="all_search" type="text" indexed="true" stored="false" />

solr 3.5:
  <fieldtype name="text" class="solr.TextField"
      positionIncrementGap="100" autoGeneratePhraseQueries="true">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory" />
      <filter class="solr.ICUFoldingFilterFactory" />
      <filter class="solr.WordDelimiterFilterFactory"
        splitOnCaseChange="1" generateWordParts="1" catenateWords="1"
        splitOnNumerics="0" generateNumberParts="1" catenateNumbers="1"
        catenateAll="0" preserveOriginal="0" stemEnglishPossessive="1" />
      <filter class="solr.EnglishPorterFilterFactory"
        protected="protwords.txt" />
      <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
    </analyzer>
  </fieldtype>

solr1.4:
  <fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory" />
      <filter class="schema.UnicodeNormalizationFilterFactory"
        version="icu4j" composed="false" remove_diacritics="true"
        remove_modifiers="true" fold="true" />
      <filter class="solr.WordDelimiterFilterFactory"
        splitOnCaseChange="1" generateWordParts="1" catenateWords="1"
        splitOnNumerics="0" generateNumberParts="1" catenateNumbers="1"
        catenateAll="0" preserveOriginal="0" stemEnglishPossessive="1" />
      <filter class="solr.LowerCaseFilterFactory" />
      <filter class="solr.EnglishPorterFilterFactory"
        protected="protwords.txt" />
      <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
    </analyzer>
  </fieldtype>


And the analysis page shows the same results for Solr 3.5 and 1.4


Solr 3.5:

position     1      2      3      4         5       6        7      8
term text    the    beatl  as     musician  revolv  through  the    antholog
keyword      false  false  false  false     false   false    false  false
startOffset  0      4      12     15        27      36       44     48
endOffset    3      11     14     24        35      43       47     57
type         word   word   word   word      word    word     word   word

Solr 1.4:

term position     1      2      3      4         5       6        7      8
term text         the    beatl  as     musician  revolv  through  the    antholog
term type         word   word   word   word      word    word     word   word
source start,end  0,3    4,11   12,14  15,24     27,35   36,43    44,47  48,57

- Naomi



Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only

2012-02-22 Thread Naomi Dushay
Jonathan,

I have the same problem without the colon. I tested that, but didn't mention
it.

mm can't be the issue either: in Solr 3.5, if I remove either occurrence of
"the" (it doesn't matter which), I get results.  Removing any other word does
NOT get results.  And if the query isn't a phrase query, it gets results.

And no, it can't be related to what you refer to as the "dismax stopwords
problem", since I can demonstrate the problem with a single field.
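
The remove-one-word test described above is easy to script. This is only a rough Python sketch of how the variants could be generated, not what was actually run; the helper name drop_one_term_variants is made up for illustration, and each variant would still have to be sent to Solr by hand or by a test.

```python
def drop_one_term_variants(phrase):
    """Yield (index, removed_term, quoted_variant) for every term in the phrase."""
    terms = phrase.split()
    for i, term in enumerate(terms):
        remaining = terms[:i] + terms[i + 1:]
        # Wrap in double quotes so the variant is still a phrase query.
        yield i, term, '"' + " ".join(remaining) + '"'


# The failing phrase, without the colon.
phrase = "The Beatles as musicians Revolver through the Anthology"
variants = list(drop_one_term_variants(phrase))
for i, term, variant in variants:
    print(i, term, variant)
```

If the repeated term is the trigger, only the two variants that drop "The" or "the" should return the document in Solr 3.5.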


I have run into problems in the past with a non-alpha character surrounded by 
spaces tanking my search results for dismax … but I fixed that with this 
fieldType:

<!-- single token with punctuation terms removed so dismax doesn't look for
     punctuation terms in these fields -->
<!-- On client side, Lucene query parser breaks things up by whitespace
     *before* field analysis for dismax -->
<!-- so punctuation terms ( : ;) are stopwords to allow results from other
     fields when these chars are surrounded by spaces in query -->
<!-- do not lowercase -->
<fieldType name="string_punct_stop" class="solr.TextField" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.ICUNormalizer2FilterFactory" name="nfkc"
      mode="compose" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.ICUNormalizer2FilterFactory" name="nfkc"
      mode="compose" />
    <!-- removing punctuation for Lucene query parser issues -->
    <filter class="solr.StopFilterFactory" ignoreCase="true"
      words="stopwords_punctuation.txt" enablePositionIncrements="true" />
  </analyzer>
</fieldType>

My stopwords_punctuation.txt file is

#Punctuation characters we want to ignore in queries
:
;

/

I then used this type instead of string for the fields in my dismax qf.  Thus,
the punctuation terms in the query are not present for the fields that were
formerly string fields.
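
To make the effect concrete, here is a small Python sketch of what the query-side analysis of this field type amounts to, assuming the query is split on whitespace before analysis as described in the comments above. It is an approximation, not Solr code: the function name is made up, and the stopword set contains only the entries visible in the file listing.

```python
import unicodedata

# Only the entries visible in stopwords_punctuation.txt as listed above.
PUNCT_STOPWORDS = {":", ";", "/"}


def query_terms_for_field(user_query):
    """Approximate the query analyzer: one keyword token per whitespace chunk,
    NFKC-normalized, dropped if it is a punctuation stopword."""
    kept = []
    for chunk in user_query.split():
        token = unicodedata.normalize("NFKC", chunk)
        # ignoreCase="true" in the StopFilter; the token itself is not lowercased.
        if token.lower() in PUNCT_STOPWORDS:
            continue
        kept.append(token)
    return kept


terms = query_terms_for_field(
    "The Beatles as musicians : Revolver through the Anthology")
print(terms)
```

The lone ":" never becomes a term for these fields, so dismax does not require a match on it there.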

- Naomi


Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only

2012-02-22 Thread Naomi Dushay
Jonathan has brought it to my attention that BOTH of my failing searches happen 
to have 8 terms, and one of the terms is repeated:

 The Beatles as musicians : Revolver through the Anthology
 Color-blindness [print/digital]; its dangers and its detection

but this is a PHRASE search.  

In case it's relevant, both Solr 1.4 and Solr 3.5:
 do NOT use stopwords in the fieldtype;  
 mm is  6<-1 6<90%  for dismax
 qs is 1
 ps is 3

And both use this filter last:

<filter class="solr.RemoveDuplicatesTokenFilterFactory" />

… but I believe that filter is only used for consecutive tokens.
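
To be more precise, as far as the filter's documentation goes, it drops a token only when the same term has already been emitted at the same position. Two "the" tokens at positions 1 and 7 would therefore both survive. A minimal Python sketch of that rule follows; it is an illustration of the documented behaviour, not the Lucene implementation, and the (term, position increment) pairs are a simplified stand-in for a real token stream.

```python
def remove_duplicates(tokens):
    """tokens: list of (term, position_increment) pairs in stream order.
    A token is dropped only if the same term was already seen at the
    same position (i.e. since the last increment greater than zero)."""
    kept = []
    seen_at_position = set()
    for term, pos_inc in tokens:
        if pos_inc > 0:
            seen_at_position = set()  # the stream moved on to a new position
        if term in seen_at_position:
            continue
        seen_at_position.add(term)
        kept.append((term, pos_inc))
    return kept


# The analyzed phrase from the analysis page: every token advances the position.
phrase_tokens = [(t, 1) for t in
                 "the beatl as musician revolv through the antholog".split()]
print(len(remove_duplicates(phrase_tokens)))

# Hypothetical stacked tokens: the second "beatl" sits at the same position.
stacked = [("beatl", 1), ("beatl", 0), ("as", 1)]
print(remove_duplicates(stacked))
```

Under that rule the filter leaves the repeated "the" alone, so it should not change the eight-token phrase.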

Lastly,

 Color-blindness [print/digital]; its and its detection

works ("dangers" is removed, rather than one of the repeated "its").

- Naomi



On Feb 22, 2012, at 3:41 PM, Jonathan Rochkind wrote:

 So I don't really know what I'm talking about, and I'm not really sure if 
 it's related or not, but your particular query:
 
 The Beatles as musicians : Revolver through the Anthology
 
 With the lone word that's a ':', reminds me of a dismax stopwords-type 
 problem I ran into. Now, I ran into it on 1.4.  I don't know why it would be 
 different on 1.4 and 3.x. And I see you aren't even using a multi-field 
 dismax in your sample query, so it couldn't possibly be what I ran into... I 
 don't think. But I'll write this anyway in case it gives someone some ideas.
 
 The problem I ran into is caused by different analysis in two fields both 
 used in a dismax, one that ends up keeping : as a token, and one that 
 doesn't.  Which ends up having the same effect as the famous 'dismax 
 stopwords problem'.
 
 Maybe somehow your schema changed such to produce this problem in 3.x but not 
 in 1.4? Although again I realize the fact that you are only using a single 
 field in your demo dismax query kind of suggests it's not this problem. 
 Wonder if you try the query without the :, if the problem goes away, that 
 might be a hint. Or, maybe someone more skilled at understanding what's in 
 those Solr debug statements than I am (it's kind of all greek to me) will be 
 able to take this hint and rule out or confirm that it may have something to 
 do with your problem.
 
 Here I write up the issue I ran into (which may or may not have anything to 
 do with what you ran into)
 
 http://bibwild.wordpress.com/2011/06/15/more-dismax-gotchas-varying-field-analysis-and-mm/
 
 
 Also, you don't say what your 'mm' is in your dismax queries, that could be 
 relevant if it's got anything to do with anything similar to the issue I'm 
 talking about.
 
 Hmm, I wonder if Solr 3.x changes the way dismax calculates number of tokens 
 for 'mm' in such a way that the 'varying field analysis dismax gotcha' can 
 manifest with only one field, if the way dismax counts tokens for 'mm' 
 differs from number of tokens the single field's analysis produces?
 
 Jonathan
 
 On 2/22/2012 2:55 PM, Naomi Dushay wrote:
 I am working on upgrading Solr from 1.4 to 3.5, and I have hit a problem.   
 I have a test checking for a search result in Solr, and the test passes in 
 Solr 1.4, but fails in Solr 3.5.   Dismax is the desired QueryParser -- I 
 just included output from lucene QueryParser to prove the document exists 
 and is found
 
 I am completely stumped.
 
 
 Here are the debugQuery details:
 
 ***Solr 3.5***
 
 lucene QueryParser:
 
 URL:   q=all_search:"The Beatles as musicians : Revolver through the
 Anthology"
 final query:  all_search:"the beatl as musician revolv through the antholog"
 
 6.0562754 = (MATCH) weight(all_search:the beatl as musician revolv through 
 the antholog in 1064395), product of:
   1.0 = queryWeight(all_search:the beatl as musician revolv through the 
 antholog), product of:
 48.450203 = idf(all_search: the=3531140 beatl=398 as=645923 
 musician=11805 revolv=872 through=81366 the=3531140 antholog=11611)
 0.02063975 = queryNorm
   6.0562754 = fieldWeight(all_search:the beatl as musician revolv through 
 the antholog in 1064395), product of:
 1.0 = tf(phraseFreq=1.0)
 48.450203 = idf(all_search: the=3531140 beatl=398 as=645923 
 musician=11805 revolv=872 through=81366 the=3531140 antholog=11611)
 0.125 = fieldNorm(field=all_search, doc=1064395)
 
 dismax QueryParser:
 URL:  qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver
 through the Anthology"
 final query:   +(all_search:"the beatl as musician revolv through the
 antholog"~1)~0.01 (all_search:"the beatl as musician revolv through the
 antholog"~3)~0.01
 
 (no matches)
 
 
 ***Solr 1.4***
 
 lucene QueryParser:
 
 URL:  q=all_search:"The Beatles as musicians : Revolver through the
 Anthology"
 final query:  all_search:"the beatl as musician revolv through the antholog"
 
 5.2676983 = fieldWeight(all_search:the beatl as musician revolv through the 
 antholog in 3469163), product of:
   1.0 = tf(phraseFreq=1.0)
   48.16181 = idf(all_search: the=3542123 beatl=391 as=749890 musician=11955 
 revolv=820 through=88238 the=3542123 antholog=11205)
   0.109375 = 

Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only

2012-02-22 Thread Robert Muir
On Wed, Feb 22, 2012 at 7:35 PM, Naomi Dushay ndus...@stanford.edu wrote:
 Jonathan has brought it to my attention that BOTH of my failing searches 
 happen to have 8 terms, and one of the terms is repeated:

  The Beatles as musicians : Revolver through the Anthology
  Color-blindness [print/digital]; its dangers and its detection

 but this is a PHRASE search.


Can you take your same phrase queries, and simply add some slop to
them (e.g. ~3) and ensure they still match with the lucene
queryparser? SloppyPhraseQuery has a bit of a history with repeats
since Lucene 2.9 that you were using.

https://issues.apache.org/jira/browse/LUCENE-3068
https://issues.apache.org/jira/browse/LUCENE-3215
https://issues.apache.org/jira/browse/LUCENE-3412
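
A quick way to run that check against both versions is to build the sloppy phrase queries programmatically. The Python sketch below only constructs the URLs. The base URL (localhost:8983 and the /select path) is an assumption to be replaced with the real one, and it assumes the handler defaults to the lucene query parser.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed base URL; substitute the real host, port and handler path.
SOLR_SELECT = "http://localhost:8983/solr/select"


def sloppy_phrase_url(field, phrase, slop, base=SOLR_SELECT):
    """Build a select URL for field:"phrase"~slop with debugQuery turned on."""
    q = '{}:"{}"~{}'.format(field, phrase, slop)
    return base + "?" + urlencode({"q": q, "debugQuery": "true"})


phrases = [
    "The Beatles as musicians : Revolver through the Anthology",
    "Color-blindness [print/digital]; its dangers and its detection",
]
urls = [sloppy_phrase_url("all_search", p, 3) for p in phrases]
for url in urls:
    # Round-trip the query string to show the q parameter Solr would receive.
    print(parse_qs(urlsplit(url).query)["q"][0])
    print(url)
```

If the same URL matches on 1.4 but not on 3.5, the difference is in sloppy phrase matching rather than in dismax.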

-- 
lucidimagination.com